Haldane's Dilemma

In his book "The Biotic Message" [1], Walter ReMine claims that the evolution from ape to human (actually from the common ancestor of apes and humans to human) could not have happened in ten million years because of what Haldane called the cost of substitution in his classic 1957 paper "The Cost of Natural Selection" [2]. Haldane claimed that in a fixed population of mammals (a population that is neither growing nor shrinking in the number of its member animals), no more than 1 gene could be fixed per 300 generations due to the cost of substitution. Haldane assumed that the deaths caused by the newly disadvantageous gene's lower fitness (possibly due to a change in environment) would be over and above the "background" death rate - the naturally occurring deaths due to all reasons other than the lowered fitness of the gene. This is known as hard selection, and because most of the animal's excess reproductive capacity was thought to be used up in the replacement of progeny lost due to background mortality, Haldane allowed a figure of only 10% for the animal to "pay" the cost of substitution. Thus a single animal could only contribute 0.1 "individuals" to pay the cost of selection. Haldane estimated that the substitution cost would require the deaths of 30 times the population size for a single gene fixation, from a very rare mutation to homozygosity in, say, 99% of the population. Since he claimed just 10% excess reproductive capacity applied to the cost of 30 times the population size, he came up with 300 generations (30 / 0.1) to fix a single gene. ReMine then applies these figures to the human / chimpanzee clade by allowing a generous 10 million years since their evolutionary divergence (the fossil record and mitochondrial comparisons indicate something closer to 5 million years) and (a not so generous) 20 years per generation to arrive at a figure of 500,000 chimp/human generations since divergence.
Dividing 500,000 generations by 1 gene replacement per 300 generations yields a maximum of 1,667 gene substitutions since humans and chimps parted ways. Thanks to the DNA hybridization experiments [3-5] of Dr. Charles Sibley and others, we now know that humans and chimps differ by an amazingly low 1.6% of their genetic material. But, given that the human genome consists of 3 x 10^9 base pairs, if we assume 1 base pair per mutation (i.e. we assume all mutations that led to the differentiation of humans from chimps were point mutations), this requires 3 x 10^9 x 0.016 = 4.8 x 10^7 gene substitutions to occur in a period during which Haldane's theory predicts only 1,667 gene substitutions can occur. Hence, "Haldane's Dilemma".
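
The arithmetic behind ReMine's claim can be retraced in a few lines. This is only a sketch of the figures quoted above; the constant and function names are illustrative and do not come from ReMine or Haldane.

```python
# Figures quoted in the text; all names here are illustrative assumptions.
YEARS_SINCE_DIVERGENCE = 10_000_000    # ReMine's allowance
YEARS_PER_GENERATION = 20
GENERATIONS_PER_SUBSTITUTION = 300     # Haldane's limit: 30 / 0.1
GENOME_BASE_PAIRS = 3e9
FRACTION_DIFFERENT = 0.016             # 1.6% from DNA hybridization


def remine_arithmetic():
    """Return (generations, allowed substitutions, differences to explain)."""
    generations = YEARS_SINCE_DIVERGENCE // YEARS_PER_GENERATION
    max_substitutions = generations / GENERATIONS_PER_SUBSTITUTION
    required = GENOME_BASE_PAIRS * FRACTION_DIFFERENT
    return generations, max_substitutions, required


generations, max_subs, required = remine_arithmetic()
print(f"generations since divergence:   {generations:,}")
print(f"Haldane limit on substitutions: {max_subs:,.0f}")
print(f"point differences to explain:   {required:.1e}")
```

The gap between the last two numbers, more than four orders of magnitude, is the "dilemma" as ReMine frames it.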

Is it all over for evolutionary science? Has ReMine (an engineer working totally outside his field!) completely refuted 80 years of population genetics in one fell swoop? I don't think so, and you won't find a scientist who isn't already committed to creationist theology who thinks so either. Let me show you a few things wrong with what ReMine has to say, along with some problems with Haldane's substitution cost.

ReMine's Problems

  • ReMine neglects the fact that humans did not evolve from chimpanzees; rather, humans and chimps evolved from a common ancestor. Therefore we have actually had two different branches, each evolving independently, thus allowing for twice as many gene substitutions (roughly 3,300 vs. 1,700) as ReMine has allowed, even if all of the above is true.
  • ReMine assumes that all the differences between the human and chimp genomes are due to selection. This can't possibly be the case, because many of the differences are known to occur at the third position of codons, where a change usually does not alter the amino acid coded and so can't affect fitness. Furthermore, since 95% of the genome is not transcribed (although that does not mean it is all non-functional), most point mutations will not affect fitness. Counting only the transcribed 5% and, within it, roughly the 2/3 of codon positions where a change alters the amino acid, the number of selected substitutions falls from 4.8 x 10^7 to 4.8 x 10^7 x 0.05 x 2/3 = 1.6 x 10^6. Please remember that changes in the genome due to drift and other "random" processes do not add to the cost of substitution. I should add that Haldane's Dilemma has been viewed by scientists as possible evidence for the importance of Neutral Evolution as proposed by Kimura in 1968.
  • ReMine neglects the fact that there are only 23 pairs of human chromosomes. Thus, when there are any favorable genes on the same chromosome, their substitution cost would only have to be paid one time for the chromosome as a whole, not one time for each favorable gene. This alone could falsify ReMine's whole argument if many genes are approaching fixation on a few chromosomes.
  • ReMine ignores the possibility of gene hitchhiking - the concept that even though some mutations are neutral, they will be carried to fixation because they are physically close to a gene that is beneficial.
  • Finally, ReMine ignores the fact that due to non-point mutations (deletions and insertions due to unequal crossing over), a single mutation can affect many more than one DNA base pair. In fact, what has to be far and away the most common mutation is the change in DNA due to alignment mismatch mutations in mini-satellites. These mutations can affect some multiple of between 5 and 15 base pairs and have been observed in as many as 1 in 6 human sperm!
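
The two numerical corrections in the list above (two independent lineages, and the discount for untranscribed and silent sites) can be retraced as follows. This is a rough sketch: the 5% and 2/3 factors are the essay's own round figures, and the names are illustrative.

```python
# Round figures taken from the bullets above; names are illustrative.
RAW_DIFFERENCES = 4.8e7                       # 1.6% of 3 x 10^9 base pairs
TRANSCRIBED_FRACTION = 0.05                   # 95% of the genome is not transcribed
NON_SILENT_FRACTION = 2 / 3                   # third codon positions are mostly silent
HALDANE_LIMIT_ONE_LINEAGE = 500_000 / 300     # about 1,667 substitutions


def adjusted_figures():
    """Return (selected substitutions to explain, substitutions allowed on two branches)."""
    selected = RAW_DIFFERENCES * TRANSCRIBED_FRACTION * NON_SILENT_FRACTION
    allowed_two_lineages = 2 * HALDANE_LIMIT_ONE_LINEAGE
    return selected, allowed_two_lineages


selected, allowed = adjusted_figures()
print(f"selected substitutions to explain: {selected:.1e}")
print(f"allowed across both lineages:      {allowed:,.0f}")
```

These two corrections alone do not close the gap, which is why the remaining bullets and the problems with Haldane's own model matter.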

Haldane's Problems

  • Haldane assumed that the cost of substitution had to be paid on top of the "natural" death rate! In other words, it didn't matter that 90% of a mammal's offspring died without reproducing - any death that resulted from the substitution of one gene for another had to be additional death that the animal would not "normally" have suffered. This is known as hard selection, and we can now easily see why Haldane only allowed an excess fertility of 10% to go towards the cost of substitution. However, most biologists today consider some or all selection to occur as soft selection. In this scenario, the cost of substitution is "paid" within the natural death rate of the animal. That is, a disproportionate number of the individuals that die without reproducing in any generation are the ones that have lower fitness due to their genes. The biologist Bruce Wallace has been the champion of soft selection, and you can learn more about this topic in his book "Fifty Years of Genetic Load - An Odyssey" [6].
  • In "The Cost of Natural Selection", Haldane only considers the case of a change in the environment that causes a common allele to become disadvantageous. It's easy to see how such a change could lead to an increase in the death rate as postulated by Haldane (at least in a hard selection scenario). Obviously, environments can change so drastically that the genetic diversity of a species is insufficient to allow that species to survive under the new conditions. Haldane's calculations seem to be an entirely appropriate (although simplified) description of a common allele which has become disadvantageous due to a change in the environment under hard selection conditions. However, another possibility which is completely ignored by Haldane in "The Cost of Natural Selection" is the possibility of a rare allele becoming advantageous. Such an allele could become advantageous because of a change of behavior in the organism, a change in the environment, or migration to a different environment. Another possibility is that the allele could arise through a beneficial mutation of an existing allele. Whatever the reason for the change of fitness, the consequences of an increase in the fitness of a rare allele are completely different from the consequences of a decrease in fitness of a common allele (for hard selection). A common allele becoming disadvantageous would lead to an increased death rate for the species, but a rare allele becoming advantageous would lead to an increase in the species population (in the absence of resource limitations). Theoretically (in a naïve sense), the rare allele could approach fixation without causing a decrease in the absolute numbers of the previously common allele by causing a large increase in the species' population due to the (formerly) rare allele's increased fitness. These two scenarios are not equivalent in the patterns of deaths of individual organisms. 
Even though the rare allele is eventually going to replace the common allele by causing the deaths of individuals that carry the disadvantageous allele (due to competition), the number of deaths among individuals with the common disadvantageous allele will not increase significantly until the number of individuals carrying the new advantageous allele rises to a level high enough to compete seriously with the large numbers of individuals carrying the old allele. Thus, we will not see the loss of 10% of the population generation after generation merely because an extremely small number of individuals are 10% more fit than the rest of the population. Only when the number of more fit individuals increases to a level where competition with the less fit individuals is significant will there begin to be serious increases in the death rate of the less fit individuals. It's almost as if the selection coefficients of the two alleles are frequency dependent - the selection coefficient of the common allele decreases as the frequency of the advantageous allele increases. I will be exploring the mathematics of this situation in detail in a subsequent section of this essay.
  • Since the paper was written forty years ago (and I think it's no accident that ReMine chose this paper over the far more sophisticated equations that are available today), Haldane used the simplest equation possible to describe the effects of selection. While this was a great improvement in its day, the whole idea of a "cost" of substitution disappears (at least as a limit on the evolution rate) with only a slight improvement to this equation that allows it to model reality a little more closely. I will show this below.
  • The classic view of gene fixation involves assuming a starting frequency for two alleles of a single gene, applying the Hardy-Weinberg rule to the frequencies to calculate the frequency of each genotype in the next generation, and then applying a fitness factor to each genotype to simulate the effects of selection.

    Haldane's Calculations for the Replacement of Gene "a" by Gene "A"
    
    Symbols:
    
    p - frequency of gene A at any given time.
    
    q - frequency of gene a at any given time. Note that p + q always equals 1.
    
     s - the selection coefficient of gene A.
    
    w - the fitness of gene A, w = 1 + s.
    
    N - number of individuals in the population.
    
    F - the average number of offspring that each parent produces in a generation. 
    
    
    
    Assumptions:
    Gene A is fully dominant over gene a. Gene A is a beneficial mutation that occurs in exactly one individual at the start of our calculations. Population will remain constant through time.
    Calculations:
    
        Let N = 100,000
        F = 4

        There is 1 individual with an A gene (its genotype is Aa; all other genotypes are aa).
        There are 99,999 aa individuals.

        p = 1 / 2N = 1 / 200,000 = 5 x 10^-6
        q = 1 - p = 1 - 5 x 10^-6 = 0.999995
        s = 0.01
        w = 1 + s = 1 + 0.01 = 1.01

        w * p^2 = frequency of AA individuals in the next generation
        w * 2pq = frequency of Aa individuals in the next generation
        q^2 = frequency of aa individuals in the next generation

        As a simplifying step, it is conventional to adjust the fitnesses such that the highest fitness is 1 and the less fit individuals have a fitness less than 1. Therefore,

        p^2 = frequency of AA individuals in the next generation
        2pq = frequency of Aa individuals in the next generation
        q^2 / w = frequency of aa individuals in the next generation

        Note: Thank you to Walter ReMine for pointing out my previous error in using N rather than 2N to calculate p and q. This large error had a minor effect upon the remaining calculations in this section.
    The number of interest to Haldane was
    (q^2 / w) * N = (0.999995)^2 / 1.01 * 100,000 = 99,009 (approx.).
    Notice that about 991 aa individuals have been lost due to selection. This is what Haldane called the cost of substitution for one generation. He continued iterating this process, summing the number of aa individuals lost each generation until gene A became nearly fixed. That sum, divided by the population size, was what Haldane called the cost of substitution. But does this process reflect reality very well? Why should the addition of one individual having a slightly beneficial gene cause the death of 991 individuals who otherwise would have gotten along fine? What is especially telling is that if the population size is raised to 1 million, about 9,901 individuals now have to die, even though the fitness hasn't changed at all! This just doesn't make sense.
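
The bookkeeping just described can be sketched in Python. This is a simplified reading of the procedure in this section, not Haldane's own derivation: it treats allele frequencies as continuous (no drift), stops at an allele frequency of 0.99 (an assumed cutoff), and uses function names invented for illustration.

```python
def one_generation_loss(N, s):
    """aa individuals removed in the first generation under the normalized fitnesses."""
    w = 1.0 + s
    p = 1.0 / (2 * N)          # one Aa individual among N diploids
    q = 1.0 - p
    survivors_aa = (q * q / w) * N
    return N - survivors_aa


def haldane_cost(N, s, p_target=0.99):
    """Sum the per-generation losses (as a fraction of N) until p reaches p_target."""
    w = 1.0 + s
    p = 1.0 / (2 * N)
    cost = 0.0
    generations = 0
    while p < p_target:
        q = 1.0 - p
        cost += q * q * (1.0 - 1.0 / w)           # fraction of the population lost
        total = p * p + 2 * p * q + q * q / w     # fraction surviving selection
        p = (p * p + p * q) / total               # frequency of A among survivors
        generations += 1
    return cost, generations


for N in (100_000, 1_000_000):
    print(N, one_generation_loss(N, 0.01))

cost, generations = haldane_cost(100_000, 0.01)
print(f"cost in units of N: {cost:.2f} over {generations} generations")
```

Note how the first-generation loss scales with N even though only one individual carries the new allele in either case, which is exactly the oddity pointed out above.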

    The problem lies in the normalization of the fitness constant and the assumption of constant fitness. As Bruce Wallace puts it, although Haldane killed 991 individuals with his selection coefficient, most of those individuals were "resurrected" in the next generation with the assumption of a constant population! What we need is a slightly different fitness "constant" that allows for the fact that one slightly fit individual is not going to kill off nearly 1000 less fit individuals. And that is just what I provide in the next section.

    Haldane's Calculations for the Replacement of Gene "a" by Gene "A" With Frequency Dependent Selection

    First, let's consider a population of genetically identical individuals (except for X and Y chromosomes!). Using the symbols and assumptions above, we know that each individual will produce F offspring of which only 1 will survive to reproductive adulthood. This maintains the population at N which was a condition we imposed above. This implies a fitness of 1/F for each individual. Now let's propose a mutation in an individual such that the mutant is a small fraction s more fit than the non-mutant (i.e. s = .01 would imply an increased fitness of 1% for the new A allele over the old a allele). Just to be thorough, I am going to consider the possibility of incomplete dominance of our new, beneficial allele (as Haldane did in "The Cost of Natural Selection"). Let X be the fitness of an individual homozygous for the beneficial mutation, Y the fitness of a heterozygote, and Z is the fitness of individuals homozygous for the old, less fit allele. X, Y, and Z can be related by X = (1 + s) * Z and Y = (1 + sh) * Z . s is a positive selection coefficient as described above, and h is a factor between 0.0 and 1.0 that indicates the relative dominance of the two alleles. Thus h = 1 indicates complete dominance for the new allele, h = 0.0 indicates complete dominance for the old allele. Notice that Z is slightly less than X as expected for a less fit individual, with Y somewhere in between, tending toward X or Z depending on the value of h. Now we are going to apply a round of reproduction to our population and solve for Z and ultimately Y and X in terms of p, q, s, h, and F. The variables p, q, and N have not changed in meaning from above. Variable s has a somewhat modified meaning as described in this paragraph (but I think it is still quite analogous to the classic coefficient of selection). Variables F, h, X, Y, and Z were just introduced.

        Reproduction:

        F*N*p^2 + F*N*2pq + F*N*q^2 = F*N

    Note that if for example F is 4 (a pair of parents produce 8 offspring in their lifetimes) and N (the parent population size) is 100,000, we have 400,000 offspring before selection. Let's now apply selection:

        X*F*N*p^2 + Y*F*N*2pq + Z*F*N*q^2 = N

    Note that the subsequent generation's population has been brought back to exactly N by selection, thus meeting our requirement of a fixed population size. Now our goal is to solve for Z. Since:

        X = (1 + s) * Z and Y = (1 + sh) * Z

        (1 + s)*Z*F*N*p^2 + (1 + sh)*Z*F*N*2pq + Z*F*N*q^2 = N

        and

        (1 + s)*Z*F*p^2 + (1 + sh)*Z*F*2pq + Z*F*q^2 = 1    (divided by N)

        (1 + s)*Z*p^2 + (1 + sh)*Z*2pq + Z*q^2 = 1/F        (divided by F)

        Z*((1 + s)*p^2 + (1 + sh)*2pq + q^2) = 1/F          (factored out Z)

        multiply out the (1 + s) and (1 + sh) terms:

        Z*(p^2 + s*p^2 + 2pq + sh*2pq + q^2) = 1/F

        group terms that are not factors of s or s*h:

        Z*([p^2 + 2pq + q^2] + s*p^2 + sh*2pq) = 1/F

        since p^2 + 2pq + q^2 = (p + q)^2 and p + q = 1,

        Z*(1 + s*p^2 + sh*2pq) = 1/F

        Z = 1/[F*(1 + s*p^2 + sh*2pq)]

        From the relations for X, Y, and Z as defined above:

        Y = (1 + sh)/[F*(1 + s*p^2 + sh*2pq)] and

        X = (1 + s)/[F*(1 + s*p^2 + sh*2pq)]
    
    
    
    
    We now have fitness equations for the three genotypes in terms of F, s, h, p, and q! Now, let's take time out for a reality check. How do the fitness terms vary for a new, beneficial mutation just starting out versus what happens when it reaches fixation (let's just consider h = 1, i.e. complete dominance for the new allele)? Well, when the new A allele exists in only a few individuals, q is essentially 1, p is essentially 0, and X and Y approach (1 + s) / F. This is a fitness slightly above 1/F, and it will lead to increased numbers of A individuals. Meanwhile, the old aa individuals have a fitness Z of very nearly 1/F because q is close to 1 and p to 0. That means these individuals will hardly feel the competition with the Aa and AA individuals because they are so rare. Only as the frequency of the new mutant gene is raised (and q reduced) do the aa individuals begin to be seriously outcompeted. As q approaches 0 (and p approaches 1), the fitness of the aa individuals approaches 1 / [F * (1 + s)] while the AA and Aa individuals' fitness approaches 1/F. When the A gene becomes fixed, its fitness is 1/F, just like every other fixed gene in the population. This makes sense because it no longer has anyone to compete with - there are no longer any non-A individuals.
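
The reality check above can also be run numerically. The sketch below simply evaluates the expressions for X, Y, and Z derived in this section; the function names are illustrative, and the test values of p are arbitrary.

```python
def fitnesses(p, s, h, F):
    """Return (X, Y, Z): fitnesses of AA, Aa, and aa from the equations above."""
    q = 1.0 - p
    denom = F * (1.0 + s * p * p + s * h * 2 * p * q)
    X = (1.0 + s) / denom
    Y = (1.0 + s * h) / denom
    Z = 1.0 / denom
    return X, Y, Z


def survivors(p, s, h, F, N):
    """Number of offspring surviving selection; this should come back to N."""
    q = 1.0 - p
    X, Y, Z = fitnesses(p, s, h, F)
    return X * F * N * p * p + Y * F * N * 2 * p * q + Z * F * N * q * q


s, h, F, N = 0.01, 1.0, 4, 100_000
for p in (1e-9, 0.25, 0.5, 0.75, 1.0):
    X, Y, Z = fitnesses(p, s, h, F)
    print(f"p={p:<6} X={X:.6f} Y={Y:.6f} Z={Z:.6f} survivors={survivors(p, s, h, F, N):.3f}")
```

At the rare-allele end the aa fitness sits essentially at 1/F, and at the fixation end the AA fitness sits at 1/F, while the survivor count stays at N throughout.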

    Now, my question is (to anyone who has made it this far): What is the substitution cost? How have the patterns of death and birth differed while gene fixation was going on from when the animals were simply reproducing without substitution? To me, it looks like the exact same number of individuals have lived and died in each generation as would have lived and died if substitution were not occurring. To me, the cost of substitution appears to be an artifact of an old, simple equation to determine the survivors from one generation to the next under natural selection. This was exactly the conclusion Bruce Wallace reached in "Fifty Years of Genetic Load: An Odyssey" [6] and it seems perfectly obvious to me.
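
One way to look at this question is to iterate the frequency-dependent model from a single mutant to near fixation and record births, survivors, and deaths in every generation. The sketch below does that under stated assumptions: frequencies are treated as continuous (no drift), the stopping point of p = 0.99 is arbitrary, and the function name and defaults are illustrative.

```python
def simulate(N=100_000, F=4, s=0.01, h=1.0, p_target=0.99):
    """Iterate selection with the X, Y, Z fitnesses until allele A reaches p_target.

    Returns a list of (p, survivors, deaths) tuples, one per generation.
    """
    p = 1.0 / (2 * N)                      # a single Aa individual
    history = []
    while p < p_target:
        q = 1.0 - p
        denom = F * (1.0 + s * p * p + s * h * 2 * p * q)
        X, Y, Z = (1.0 + s) / denom, (1.0 + s * h) / denom, 1.0 / denom
        born = F * N                       # offspring before selection
        surv_AA = X * born * p * p
        surv_Aa = Y * born * 2 * p * q
        surv_aa = Z * born * q * q
        survivors = surv_AA + surv_Aa + surv_aa
        history.append((p, survivors, born - survivors))
        p = (surv_AA + 0.5 * surv_Aa) / survivors   # A frequency among survivors
    return history


history = simulate()
first, last = history[0], history[-1]
print(f"generations simulated: {len(history)}")
print(f"first generation: survivors={first[1]:.3f} deaths={first[2]:.3f}")
print(f"last generation:  survivors={last[1]:.3f} deaths={last[2]:.3f}")
```

In this sketch the deaths per generation stay at (F - 1) * N from start to finish; what changes is which genotypes make up the survivors.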

    I have found further insight into these questions by breaking down the fitness equations for X, Y, and Z using the method of partial fractions. If Z = 1/[F*(1 + s*p^2 + sh*2pq)] then

    Z = 1/F + A/(1 + s*p^2 + sh*2pq), where A is a value to be determined so as to preserve the equality. A can be solved for by noting that 1/F + A/(1 + s*p^2 + sh*2pq) = 1/[F*(1 + s*p^2 + sh*2pq)]. Multiplying both sides by F*(1 + s*p^2 + sh*2pq) gives

    1 + s*p^2 + sh*2pq + A*F = 1

    A*F = -(s*p^2 + sh*2pq)

    A = -(s*p^2 + sh*2pq) / F

    This leads to:

    Z = 1/F - s/F * (p^2 + h*2pq) / (1 + s*p^2 + sh*2pq)

    Similarly, for Y,

    Y = (1 + sh)/[F*(1 + s*p^2 + sh*2pq)]

    Let Y = 1/F + B/(1 + s*p^2 + sh*2pq), where B must be determined.

    1/F + B/(1 + s*p^2 + sh*2pq) = (1 + sh)/[F*(1 + s*p^2 + sh*2pq)]

    1 + s*p^2 + sh*2pq + B*F = 1 + sh

    B*F = sh - s*p^2 - sh*2pq

    B = 1/F * (sh - s*p^2 - sh*2pq)

    Therefore,

    Y = 1/F + s/F * (h - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)

    Lastly, applying the same treatment to X:

    X = (1 + s)/[F*(1 + s*p^2 + sh*2pq)]

    Let X = 1/F + C/(1 + s*p^2 + sh*2pq), where C must be determined.

    1/F + C/(1 + s*p^2 + sh*2pq) = (1 + s)/[F*(1 + s*p^2 + sh*2pq)]

    1 + s*p^2 + sh*2pq + C*F = 1 + s

    C*F = s - s*p^2 - sh*2pq

    C = 1/F * (s - s*p^2 - sh*2pq)

    In which case:

    X = 1/F + s/F * (1 - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)

    A round of selection using these forms of the fitness equations looks like this:

    {[1/F + s/F * (1 - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)] * p^2 +

    [1/F + s/F * (h - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)] * 2pq +

    [1/F - s/F * (p^2 + h*2pq) / (1 + s*p^2 + sh*2pq)] * q^2} =

    1/F
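
As a sanity check on the algebra, the decomposed forms can be compared numerically against the original expressions for X, Y, and Z. This is only a spot check over a handful of arbitrary parameter values, with illustrative function names.

```python
def direct(p, s, h, F):
    """X, Y, Z in their original quotient form."""
    q = 1.0 - p
    D = 1.0 + s * p * p + s * h * 2 * p * q
    return (1.0 + s) / (F * D), (1.0 + s * h) / (F * D), 1.0 / (F * D)


def decomposed(p, s, h, F):
    """X, Y, Z written as 1/F plus an s/F correction term."""
    q = 1.0 - p
    D = 1.0 + s * p * p + s * h * 2 * p * q
    T = p * p + h * 2 * p * q
    X = 1.0 / F + (s / F) * (1.0 - T) / D
    Y = 1.0 / F + (s / F) * (h - T) / D
    Z = 1.0 / F - (s / F) * T / D
    return X, Y, Z


def mean_fitness(p, s, h, F):
    """Genotype-frequency-weighted average of the decomposed fitnesses."""
    q = 1.0 - p
    X, Y, Z = decomposed(p, s, h, F)
    return X * p * p + Y * 2 * p * q + Z * q * q


for p in (0.001, 0.3, 0.7, 0.999):
    for h in (0.0, 0.5, 1.0):
        gap = max(abs(a - b) for a, b in zip(direct(p, 0.01, h, 4), decomposed(p, 0.01, h, 4)))
        print(f"p={p} h={h} max gap={gap:.2e} mean fitness={mean_fitness(p, 0.01, h, 4):.6f}")
```

For every combination tried, the two forms agree to rounding error and the mean fitness comes out at 1/F.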

    What does it all mean? Well, first of all, notice that the fitness of each genotype (AA, Aa, and aa) is a sum of 1/F and a term in s/F that depends upon the selection coefficient, dominance, and allele frequencies. Notice that if the selection coefficient is zero, the average fitness will be 1/F, just enough to maintain the species population. Also note that for all values of p and q such that p + q = 1 (I will be adding this derivation later), the s/F terms for the three genotypes, each weighted by its genotype frequency (p^2, 2pq, or q^2), will always sum to zero, so the average fitness will always be 1/F.
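
One way the zero-sum property might be shown is sketched below. The shorthand D and T is introduced here only for convenience and is not part of the original notation: D = 1 + s*p^2 + sh*2pq and T = p^2 + h*2pq.

```latex
\begin{align*}
\text{weighted sum of the } s/F \text{ terms}
  &= \frac{s}{F D}\Big[\,p^{2}(1 - T) + 2pq\,(h - T) - q^{2}\,T\,\Big] \\
  &= \frac{s}{F D}\Big[\,p^{2} + 2hpq - T\,(p^{2} + 2pq + q^{2})\,\Big] \\
  &= \frac{s}{F D}\Big[\,T - T\,(p + q)^{2}\,\Big] \\
  &= 0 \qquad \text{since } p + q = 1 .
\end{align*}
```

With the correction terms cancelling, only the three 1/F terms remain, and they sum to (p^2 + 2pq + q^2)/F = 1/F.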

    Remember, Haldane's 1957 paper was a theoretical treatise on the cost of natural selection. Here is Haldane's conclusion, which is correct in both points:
    "To conclude, I am quite aware that my conclusions will probably need drastic revision. But I am convinced that quantitative arguments of the kind here put forward should play a part in all future discussions of evolution."

    Please e-mail me (Robert Williams rwms@gate.net) with suggestions, comments, or criticisms.


    References

    1. ReMine, Walter J. 1993 The Biotic Message, St. Paul Science, Inc.

    2. Haldane, J. B. S. 1957 The Cost of Natural Selection Journal of Genetics 55:511-524

    3. Sibley, C. G., Comstock, J. A., Ahlquist, J. E. 1990 DNA Hybridization Evidence of Hominoid Phylogeny: A Reanalysis of the Data Journal of Molecular Evolution 30:202-236

    4. Sibley, C. G., Ahlquist, J. E. 1987 DNA Hybridization Evidence of Hominoid Phylogeny: Results from an Expanded Data Set Journal of Molecular Evolution 26:99-121

    5. Sibley, C. G., Ahlquist, J. E. 1984 The Phylogeny of Hominoid Primates, as Indicated by DNA-DNA Hybridization Journal of Molecular Evolution 20:2-15

    6. Wallace, Bruce 1991 Fifty Years of Genetic Load - An Odyssey Cornell University Press
    See particularly Chapter 5, Dilemmas and Options; Chapter 6, Hard and Soft Selection; Chapter 8, Self-Culling and the Persistence of Populations; and Chapter 9, Summarizing Remarks for issues that have been addressed in this essay and related topics.