A (Somewhat) Non-Technical Description of the Substitution Cost

Haldane's Dilemma and the substitution cost are somewhat complicated topics that very few people in the general population understand. Although I have presented Haldane's mathematical description of the substitution cost elsewhere, I wanted to provide a gentler introduction to the concept that doesn't rely on the mathematics. Perhaps this introduction will make it easier to follow the more technical descriptions of the cost.

What follows immediately is an outline of six main points that I think describe the substitution cost in (mostly) non-mathematical terms. If you understand these six points, I think you will have a pretty good idea of what the cost is (at least as defined by Haldane - others have their own definitions of the cost that depart somewhat from Haldane's). Following the outline, I describe each of the six points in detail, with examples that I hope will make the points clear to anyone who is interested. Finally, I provide a light introduction to the mathematics of actually calculating the substitution cost. This is only an introduction to the math - it is followed by links to the detailed mathematics for the calculation of the substitution cost.
 

Outline of the Substitution Cost Definition

1. Start with a population of animals.
2. Detrimental change in the environment occurs.
3. A few individuals have a genetic variation that allows them to "overcome" the detrimental change in the environment.
4. The genetic variation will spread through the population as more of the offspring that possess the favorable genetic variation will survive to adulthood each generation.
5. But in the meantime (while the favorable genetic variation is spreading), a number of deaths will occur solely because of the change in the environment.
6. These deaths - that would not have occurred if the environment had not changed - are what Haldane called the substitution cost.
 

Substitution Cost Definition

1. Start with a population of animals.
Start with a population of a single species of animals.

2. Detrimental change in the environment occurs.
A change in the environment occurs - perhaps there is a long-term drop in rainfall or a lowering of temperature, a new predator begins to prey on the animal, or the animal's food supply is reduced for some reason. The important thing is that the change hurts the vast majority of the animals in the population.

3. A few individuals have a genetic variation that allows them to "overcome" the detrimental change in the environment.
However, in this population of animals, there is a very small number that have a genetic variation that allows its carriers to overcome the disadvantage caused by the change in the environment. As an example, if the detrimental environmental change were a decrease in the local temperature, a genetic variation that led to its carriers having longer fur for better protection against the cold would overcome the problems caused by the environmental change. Similarly, if the detrimental change were the introduction of a new predator, genetic variations that allowed the animal to see better (to flee from the predator sooner), run faster, or have better camouflage would once again help the animal overcome the bad effects of the environmental change. Before the environmental change, the genetic variation was quite rare in the population because it did not give any advantage or even may have been a disadvantage. (Long fur that protects the animal after the temperature drops was of no use when the temperature was warmer and in fact may have been a disadvantage.) Even if the genetic variation was disadvantageous to the animal (before the environmental change), it would have been kept in place by mutations. Returning to our long fur example, in a warm environment, short fur might be advantageous, but if there is a simple mutation that produces long fur, there will always be a small number of animals in the population that have the long fur because the mutation that causes long fur will occur every once in a while (over many generations). The important point is that there will be a few individuals in the population that will have genetic variations that will protect the very few animals that have them from the environmental change that hurts the animals in the population that do not have that genetic variation.

4. The genetic variation will spread through the population as more of the offspring that possess the favorable genetic variation will survive to adulthood each generation.
Now that the genetic variation is favorable (because of the environmental change), it should begin to increase in the population with each generation. This is because animals that have the variation will be more likely to survive and pass the trait on to their offspring. With each generation, the trait will become more common as more and more of the animals in the population inherit the genetic variation from their parents.

5. But in the meantime (while the favorable genetic variation is spreading), a number of deaths will occur solely because of the change in the environment.
Although animals that have the favored genetic variation will not be harmed by the change in the environment, the vast majority of animals in the population will not have this advantage and will be harmed. So, while the variation increases a little bit each generation, some of the animals that don't have the genetic variation will die each generation. These deaths can be thought of as "extra" deaths - they are deaths that would not have occurred if the environment had not changed for the worse. Haldane showed that the number of these deaths would be surprisingly high if the favorable genetic variation increased somewhat slowly - typically ten times the population size or more.

6. These deaths - that would not have occurred if the environment had not changed - are what Haldane called the substitution cost.
These deaths due to the environmental change are what Haldane called the substitution cost. See quotes from Haldane from "The Cost of Natural Selection" that show that this was Haldane's definition of the substitution cost.
 

An Introduction to the Mathematics of the Substitution Cost

For the sake of simplicity, I will use a haploid organism for my mathematical example. The concepts (definition of the substitution cost) are the same as for the diploid case, but the calculations are simplified because there is only a single inherited copy of each gene in the haploid case. Once the substitution cost is understood for the haploid case, it should be easier to follow the complications introduced in the diploid case.

All of this refers to the scenario used to define the cost above. The genetic trait that is made favorable by the change in the environment (that is unfavorable to all animals that do not have that trait) has a frequency in the population. The frequency of the favorable trait in any generation is p, and the frequency of the non-favored trait is q. Since we are considering only two traits (the favored trait and the now disfavored trait), the sum of p and q will be 1 in each generation - that is, p + q = 1 will always be true. As an example, if the population is 10,000, and if there is only one individual in the population with the favored trait (and thus 9,999 individuals that have the non-favored trait), then p = 1/10,000 = 0.0001; and q = 9,999/10,000 = 0.9999.

The way Haldane used math to describe the increase in frequency of the favored trait was to assign different "fitnesses" to the favored and non-favored traits. The fitness is simply the increase (or decrease) in the frequency of each trait after selection each generation. For Haldane, the actual values of the fitnesses didn't matter so much as the relative values of the fitnesses for the two traits. So what he did was arbitrarily assign a fitness of 1 to the favored trait (carriers of the favored trait, actually), and a fitness of 1 - s to carriers of the disfavored trait. The variable s is called the selection coefficient, and is a small, positive number, typically less than 0.1. (In fact, Haldane indicated that the selection coefficient should be less than 0.1 for his substitution cost calculations to work out properly. Selection coefficients considerably higher than 0.1 have been observed in nature, but it is not known if these are frequent enough to have any real impact on the substitution cost in many cases.) Therefore, a selection coefficient of s = 0.1 would lead to carriers of the favored trait having a relative fitness of 1 and carriers of the disfavored trait having a relative fitness of 1 - s = 0.9. These relative fitnesses can then be used to predict the change of frequency of the favored and non-favored traits after a generation of selection.

At the beginning of the generation, the frequencies of the two traits will be:

p (favored trait) and q (disfavored trait).

After selection, the favored and disfavored traits will make up these fractions of the starting population (they no longer sum to 1 because some individuals have died):
p (because carriers of the favored trait had a relative fitness of 1) and (1 - s)q (because carriers of the disfavored trait had a relative fitness of 1 - s).

After selection, there will be p + (1 - s)q multiplied by the population size (the population size is often referred to by the mathematical symbol N) individuals in the population. Let's see what happens after a single round of selection on our population from the starting conditions (p = 0.0001 and q = 0.9999, s = 0.1, and N = 10,000).

After selection, there will be [p + (1 - s)q]*N individuals in the population. The new value of p in the next generation (because the frequency of the favored genetic trait has increased due to natural selection) will be p' = p/[p + (1 - s)q] = 0.0001/[0.0001 + 0.9*0.9999] = 0.0001111. The new value of q (q') will be q' = (1 - s)q/[p + (1 - s)q] = 0.9*0.9999/[0.0001 + 0.9*0.9999] = 0.9998889. In the next generation, p' will become the new value of p and q' will become the new value of q. This process can be continued, with p (the frequency of the favored trait) increasing a little each generation, and q decreasing a little each generation, until eventually the favored trait is "fixed" in the population (essentially all of the animals in the population carry the favored trait, p = 1.0, and essentially none carry the disfavored trait).
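For readers who like to check the arithmetic, here is a minimal Python sketch of a single round of selection using the formulas above. The function name next_generation and its return values are just my choices for illustration; the numbers are the starting conditions of the example (p = 0.0001, s = 0.1, N = 10,000).

```python
def next_generation(p, s):
    """Apply one round of haploid selection.

    p is the frequency of the favored trait and s is the selection
    coefficient against the disfavored trait.
    Returns (new p, new q, fraction of the population that survives).
    """
    q = 1.0 - p
    # Carriers of the favored trait survive at rate 1, the rest at 1 - s.
    surviving_fraction = p + (1.0 - s) * q
    # Rescale so the two frequencies add up to 1 again.
    p_next = p / surviving_fraction
    q_next = (1.0 - s) * q / surviving_fraction
    return p_next, q_next, surviving_fraction


N = 10_000
p_next, q_next, surviving_fraction = next_generation(p=0.0001, s=0.1)
print(f"p' = {p_next:.7f}")                            # p' = 0.0001111
print(f"q' = {q_next:.7f}")                            # q' = 0.9998889
print(f"survivors = {surviving_fraction * N:.1f}")     # survivors = 9000.1
```

Feeding p' back into the function as the new p repeats the process for the following generation.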

Now, the thing that interested Haldane was the deaths due to selection each generation. If the environment had not deteriorated, then the individuals carrying the trait that becomes disfavored (when the environment does eventually change) would have had a relative fitness of 1.0. Thus, the fitness of those individuals carrying the disfavored trait decreases by 1 - (1 - s) = s. Haldane reasoned that this means that each generation of selection leads to the loss of an additional s*q individuals (as a fraction of the population size - use s*q*N if you want the actual number of individuals). These extra deaths due to natural selection are what Haldane called the substitution cost. The number he was most interested in was the sum of these per-generation costs over all generations of the selection. These deaths are what added up to the high substitution costs Haldane reported. For the example we are looking at here (N = 10,000, initial p = 0.0001, and initial q = 0.9999), Haldane calculated that the substitution cost would be 9.21. This means that a number of animals equal to 9.21 times the population size would have to die in order for the substitution to occur - that's 92,100 individuals for our example population size of 10,000 individuals!
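The running total of s*q described above can also be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Haldane's own calculation: the function name substitution_cost, the tolerance cutoff, and the decision to stop once q is negligible are mine. I am also assuming that the 9.21 figure corresponds to the natural logarithm of 1/p for the starting p, since ln(10,000) = 9.21; the sketch prints that value next to the generation-by-generation sum.

```python
import math


def substitution_cost(p0, s, tolerance=1e-12):
    """Add up the per-generation selective deaths s*q.

    p0 is the starting frequency of the favored trait and s is the
    selection coefficient. The loop stops once the disfavored trait's
    frequency q falls below `tolerance` (an arbitrary cutoff chosen
    for this sketch). Returns (total cost, generations simulated).
    """
    p = p0
    q = 1.0 - p0
    total_cost = 0.0
    generations = 0
    while q > tolerance:
        total_cost += s * q                      # extra deaths this generation
        surviving_fraction = p + (1.0 - s) * q
        p = p / surviving_fraction               # new frequency of favored trait
        q = (1.0 - s) * q / surviving_fraction   # new frequency of disfavored trait
        generations += 1
    return total_cost, generations


N = 10_000
p0 = 1 / N
cost, generations = substitution_cost(p0, s=0.1)
print(f"summed cost: {cost:.2f} times the population size")
print(f"generations simulated: {generations}")
print(f"ln(1/p0): {math.log(1 / p0):.2f}")
print(f"extra deaths for N = {N}: {cost * N:.0f}")
```

With s = 0.1 the generation-by-generation sum comes out somewhat below 9.21, and it moves closer to ln(1/p) as s is made smaller (try s = 0.001). That fits with the requirement that s be small for the calculation to work out properly.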

Hopefully the discussion above will make these detailed mathematical descriptions of the substitution cost a little easier to follow.

 Mathematical Description of the Substitution Cost for a Haploid Organism

 Mathematical Description of the Substitution Cost for a Diploid Organism

