Is it all over for evolutionary science? Has ReMine (an engineer working entirely outside his field!) refuted 80 years of population genetics in one fell swoop? I don't think so, and you won't find a scientist who thinks so either, unless he is already committed to creationist theology. Let me show you a few things wrong with what ReMine has to say, along with some problems with Haldane's substitution cost.
The classic view of gene fixation involves assuming a starting frequency for two alleles of a single gene, applying the Hardy-Weinberg rule to the frequencies to calculate the frequency of each genotype in the next generation, and then applying a fitness factor to each genotype to simulate the effects of selection.
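The three steps just described (start from allele frequencies, apply the Hardy-Weinberg rule, weight each genotype by a fitness) can be sketched in a few lines of Python. This is only an illustration, not code from Haldane or ReMine. It assumes that allele A (frequency p) is fully dominant over allele a (frequency q = 1 - p), that s is the selection advantage of A, and that fitnesses are scaled so the fittest genotypes have fitness 1. The function names next_allele_frequency and generations_until are hypothetical, and the starting frequency is an illustrative value.

```python
def next_allele_frequency(p, s):
    """Return the frequency of allele A after one round of selection.

    Assumes A is fully dominant, so AA and Aa share fitness 1 and aa has
    relative fitness 1 / (1 + s).
    """
    q = 1.0 - p
    # Hardy-Weinberg genotype frequencies before selection.
    freq_AA, freq_Aa, freq_aa = p * p, 2.0 * p * q, q * q
    # Weight each genotype by its relative fitness.
    surv_AA, surv_Aa, surv_aa = freq_AA, freq_Aa, freq_aa / (1.0 + s)
    total = surv_AA + surv_Aa + surv_aa
    # AA carries two copies of A, Aa carries one.
    return (surv_AA + 0.5 * surv_Aa) / total


def generations_until(p_start, s, p_target):
    """Count rounds of selection needed for A to reach p_target."""
    if not (0.0 < p_start < 1.0 and 0.0 < p_target < 1.0 and s > 0.0):
        raise ValueError("need 0 < p_start < 1, 0 < p_target < 1 and s > 0")
    p, generations = p_start, 0
    while p < p_target:
        p = next_allele_frequency(p, s)
        generations += 1
    return generations


if __name__ == "__main__":
    p0 = 1.0 / 200_000  # illustrative: one copy of A among 100,000 diploid individuals
    print("p after one generation:", next_allele_frequency(p0, 0.01))
    print("generations to reach p = 0.5:", generations_until(p0, 0.01, 0.5))
```

The sketch is deterministic: it follows expected frequencies only and ignores the chance loss of a rare allele.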
Haldane's Calculations for the Replacement of Gene "a" by Gene "A"
Symbols:
p - frequency of gene A at any given time.
q - frequency of gene a at any given time. Note that p + q always equals 1.
s - the selection coefficient of gene A.
w - the fitness of gene A, w = 1 + s.
N - number of individuals in the population.
F - the average number of offspring that each parent produces in a generation.
Assumptions:
Gene A is fully dominant over gene a.
Gene A is a beneficial mutation that occurs in exactly one individual at the start of our calculations.
Population will remain constant through time.
Calculations:
Let N = 100,000 and F = 4.
There is 1 individual with an A gene (its genotype is Aa); the other 99,999 individuals are all aa.
p = 1 / 2N = 1 / 200,000 = 5 x 10^-6
q = 1 - p = 1 - 5 x 10^-6 = 0.999995
s = 0.01
w = 1 + s = 1 + 0.01 = 1.01
w * p^2 = frequency of AA individuals in the next generation
w * 2pq = frequency of Aa individuals in the next generation
q^2 = frequency of aa individuals in the next generation
As a simplifying step, it is conventional to adjust the fitnesses so that the highest fitness is 1 and the less fit individuals have a fitness less than 1. Therefore,
p^2 = frequency of AA individuals in the next generation
2pq = frequency of Aa individuals in the next generation
q^2 / w = frequency of aa individuals in the next generation
Note: Thank you to Walter ReMine for pointing out my previous error in using N rather than 2N to calculate p and q. This large error had a minor effect upon the remaining calculations in this section.
The number of interest to Haldane was
The problem lies in the normalization of the fitness constant and the assumption of constant fitness. As Bruce Wallace puts it, although Haldane killed 991 individuals with his selection coefficient, most of those individuals were "resurrected" in the next generation with the assumption of a constant population! What we need is a slightly different fitness "constant" that allows for the fact that one slightly fit individual is not going to kill off nearly 1000 less fit individuals. And that is just what I provide in the next section.
Haldane's Calculations for the Replacement of Gene "a" by Gene "A" With Frequency Dependent Selection
First, let's consider a population of genetically identical individuals (except for X and Y chromosomes!). Using the symbols and assumptions above, we know that each individual will produce F offspring, of which only 1 will survive to reproductive adulthood. This maintains the population at N, which was a condition we imposed above. This implies a fitness of 1/F for each individual.

Now let's propose a mutation in an individual such that the mutant is a small fraction s more fit than the non-mutant (i.e. s = 0.01 would imply an increased fitness of 1% for the new A allele over the old a allele). Just to be thorough, I am going to consider the possibility of incomplete dominance of our new, beneficial allele (as Haldane did in "The Cost of Natural Selection"). Let X be the fitness of an individual homozygous for the beneficial mutation, Y the fitness of a heterozygote, and Z the fitness of an individual homozygous for the old, less fit allele. X, Y, and Z can be related by X = (1 + s) * Z and Y = (1 + sh) * Z. Here s is a positive selection coefficient as described above, and h is a factor between 0.0 and 1.0 that indicates the relative dominance of the two alleles. Thus h = 1.0 indicates complete dominance for the new allele, and h = 0.0 indicates complete dominance for the old allele. Notice that Z is slightly less than X, as expected for a less fit individual, with Y somewhere in between, tending toward X or Z depending on the value of h.

Now we are going to apply a round of reproduction to our population and solve for Z, and ultimately Y and X, in terms of p, q, s, h, and F. The variables p, q, and N have not changed in meaning from above. Variable s has a somewhat modified meaning as described above (but I think it is still quite analogous to the classic coefficient of selection). Variables F, h, X, Y, and Z were just introduced.
Reproduction: F*N*p^2 + F*N*2pq + F*N*q^2 = F*N
Note that if, for example, F is 4 (a pair of parents produces 8 offspring in their lifetimes) and N (the parent population size) is 100,000, we have 400,000 offspring before selection. Let's now apply selection:
X*F*N*p^2 + Y*F*N*2pq + Z*F*N*q^2 = N
Note that the subsequent generation's population has been brought back to exactly N by selection, thus meeting our requirement of a fixed population size. Now our goal is to solve for Z. Since:
X = (1 + s) * Z and Y = (1 + sh) * Z
(1 + s)*Z*F*N*p^2 + (1 + sh)*Z*F*N*2pq + Z*F*N*q^2 = N
and
(1 + s)*Z*F*p^2 + (1 + sh)*Z*F*2pq + Z*F*q^2 = 1 (divided by N)
(1 + s)*Z*p^2 + (1 + sh)*Z*2pq + Z*q^2 = 1/F (divided by F)
Z*((1 + s)*p^2 + (1 + sh)*2pq + q^2) = 1/F (factored out Z)
Multiply out the (1 + s) and (1 + sh) terms:
Z*(p^2 + s*p^2 + 2pq + sh*2pq + q^2) = 1/F
Group the terms that are not factors of s or s*h:
Z*([p^2 + 2pq + q^2] + s*p^2 + sh*2pq) = 1/F
Since p^2 + 2pq + q^2 = (p + q)^2 and p + q = 1,
Z*(1 + s*p^2 + sh*2pq) = 1/F
Z = 1/[F*(1 + s*p^2 + sh*2pq)]
From the relations for X, Y, and Z as defined above:
Y = (1 + sh)/[F*(1 + s*p^2 + sh*2pq)]
and
X = (1 + s)/[F*(1 + s*p^2 + sh*2pq)]
We now have fitness equations for the three genotypes in terms of F, s, h, p, and q!

Now, let's take time out for a reality check. How do the fitness terms vary for a new, beneficial mutation just starting out versus what happens when it reaches fixation (let's just consider h = 1, i.e. complete dominance for the new allele)? Well, when the new A allele exists in only a few individuals, q is essentially 1, p is essentially 0, and X and Y approach (1 + s)/F. This is a small but real advantage over 1/F, and it will lead to increased numbers of A individuals. Meanwhile, the old aa individuals have a fitness of very nearly 1/F because q is close to 1 and p is close to 0. That means these individuals will hardly feel the competition with the Aa and AA individuals, because the latter are so rare. Only as the frequency of the new mutant gene is raised (and q reduced) do the aa individuals begin to be seriously outcompeted. As q approaches 0 (and p approaches 1), the fitness of the aa individuals approaches 1/[F*(1 + s)] while the fitness of the AA and Aa individuals approaches 1/F. When the A gene becomes fixed, its fitness is 1/F, just like every other fixed gene in the population. This makes sense because it no longer has anyone to compete with: there are no longer any non-A individuals.
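The formulas for X, Y, and Z and the reality check above can be tried out numerically. The following is a minimal sketch, assuming the illustrative values N = 100,000, F = 4, s = 0.01, and h = 1 used elsewhere in this essay; the function names genotype_fitnesses and survivors are hypothetical.

```python
def genotype_fitnesses(p, s, h, F):
    """Return (X, Y, Z): the surviving fractions of AA, Aa and aa offspring."""
    q = 1.0 - p
    denom = F * (1.0 + s * p * p + s * h * 2.0 * p * q)
    Z = 1.0 / denom
    Y = (1.0 + s * h) / denom
    X = (1.0 + s) / denom
    return X, Y, Z


def survivors(p, s, h, F, N):
    """Offspring remaining after selection, starting from F*N offspring."""
    q = 1.0 - p
    X, Y, Z = genotype_fitnesses(p, s, h, F)
    return X * F * N * p * p + Y * F * N * 2.0 * p * q + Z * F * N * q * q


if __name__ == "__main__":
    N, F, s, h = 100_000, 4, 0.01, 1.0  # illustrative values
    for p in (1.0 / (2 * N), 0.25, 0.5, 0.75, 1.0):
        X, Y, Z = genotype_fitnesses(p, s, h, F)
        print(f"p={p:.6f}  X={X:.6f}  Y={Y:.6f}  Z={Z:.6f}  "
              f"survivors={survivors(p, s, h, F, N):.3f}")
```

Whatever the value of p, the number of survivors comes back to N. X and Y slide from about (1 + s)/F down to 1/F as p goes from 0 to 1, while Z slides from 1/F down to 1/[F*(1 + s)].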
Now, my question is (to anyone who has made it this far): What is the substitution cost? How have the patterns of death and birth differed while gene fixation was going on from when the animals were simply reproducing without substitution? To me, it looks like the exact same number of individuals have lived and died in each generation as would have lived and died if substitution were not occurring. To me, the cost of substitution appears to be an artifact of an old, simple equation to determine the survivors from one generation to the next under natural selection. This was exactly the conclusion Bruce Wallace reached in "Fifty Years of Genetic Load: An Odyssey"6 and it seems perfectly obvious to me.
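The question about births and deaths can be explored with a short deterministic sketch. It applies the frequency-dependent fitnesses X, Y, and Z generation after generation and records how many offspring fail to survive each time. The name substitution_run and the parameter values are assumptions made for illustration. The numbers are expected values, so they ignore random sampling and need not be whole numbers.

```python
def substitution_run(N, F, s, h, p_start, generations):
    """Return a list of (p, deaths) pairs, one per generation."""
    p = p_start
    history = []
    for _ in range(generations):
        q = 1.0 - p
        denom = F * (1.0 + s * p * p + s * h * 2.0 * p * q)
        X, Y, Z = (1.0 + s) / denom, (1.0 + s * h) / denom, 1.0 / denom
        offspring = F * N
        surv_AA = X * offspring * p * p
        surv_Aa = Y * offspring * 2.0 * p * q
        surv_aa = Z * offspring * q * q
        total_survivors = surv_AA + surv_Aa + surv_aa
        history.append((p, offspring - total_survivors))
        # Allele frequency among the survivors becomes next generation's p.
        p = (surv_AA + 0.5 * surv_Aa) / total_survivors
    return history


if __name__ == "__main__":
    N, F = 100_000, 4  # illustrative values
    p0 = 1.0 / (2 * N)
    with_sel = substitution_run(N, F, s=0.01, h=1.0, p_start=p0, generations=2000)
    without = substitution_run(N, F, s=0.0, h=1.0, p_start=p0, generations=2000)
    print("final p, s = 0.01:", with_sel[-1][0])
    print("final p, s = 0:   ", without[-1][0])
    print("deaths in first and last generation, s = 0.01:", with_sel[0][1], with_sel[-1][1])
    print("deaths in first and last generation, s = 0:   ", without[0][1], without[-1][1])
```

In this sketch the deaths per generation equal F*N - N whether or not a substitution is under way; only the identity of the survivors changes.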
I have found further insight into these questions by breaking down the fitness equations for X, Y, and Z using the method of partial fractions. If Z = 1/[F*(1 + s*p^2 + sh*2pq)], then let
Z = 1/F + A/(1 + s*p^2 + sh*2pq), where A is a value to be determined so that the equality is preserved. A can be solved for by noting that 1/F + A/(1 + s*p^2 + sh*2pq) = 1/[F*(1 + s*p^2 + sh*2pq)]. Multiplying both sides by F*(1 + s*p^2 + sh*2pq) gives
1 + s*p^2 + sh*2pq + A*F = 1
A*F = -(s*p^2 + sh*2pq)
A = -(s*p^2 + sh*2pq) / F
This leads to:
Z = 1/F - s/F * (p^2 + h*2pq) / (1 + s*p^2 + sh*2pq)
Similarly, for Y,
Y = (1 + sh)/[F*(1 + s*p^2 + sh*2pq)]
Let Y = 1/F + B/(1 + s*p^2 + sh*2pq), where B must be determined.
1/F + B/(1 + s*p^2 + sh*2pq) = (1 + sh)/[F*(1 + s*p^2 + sh*2pq)]
1 + s*p^2 + sh*2pq + B*F = 1 + sh
B*F = sh - s*p^2 - sh*2pq
B = 1/F * (sh - s*p^2 - sh*2pq)
Therefore,
Y = 1/F + s/F * (h - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)
Lastly, applying the same treatment to X:
X = (1 + s)/[F*(1 + s*p^2 + sh*2pq)]
Let X = 1/F + C/(1 + s*p^2 + sh*2pq), where C must be determined.
1/F + C/(1 + s*p^2 + sh*2pq) = (1 + s)/[F*(1 + s*p^2 + sh*2pq)]
1 + s*p^2 + sh*2pq + C*F = 1 + s
C*F = s - s*p^2 - sh*2pq
C = 1/F * (s - s*p^2 - sh*2pq)
In which case:
X = 1/F + s/F * (1 - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)
A round of selection using these forms of the fitness equations looks like this:
{[1/F + s/F * (1 - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)] * p^2 +
[1/F + s/F * (h - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)] * 2pq +
[1/F - s/F * (p^2 + h*2pq) / (1 + s*p^2 + sh*2pq)] * q^2} =
1/F
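As a numerical sanity check on the algebra, the sketch below compares the original forms of X, Y, and Z with the decomposed "1/F plus a correction" forms. It then confirms that the frequency-weighted sum comes out at 1/F. The function names are hypothetical, and the test values of p, s, h, and F are arbitrary choices.

```python
def direct_forms(p, s, h, F):
    """X, Y, Z written as a single fraction."""
    q = 1.0 - p
    d = 1.0 + s * p * p + s * h * 2.0 * p * q
    return (1.0 + s) / (F * d), (1.0 + s * h) / (F * d), 1.0 / (F * d)


def decomposed_forms(p, s, h, F):
    """X, Y, Z written as 1/F plus or minus an s/F correction term."""
    q = 1.0 - p
    d = 1.0 + s * p * p + s * h * 2.0 * p * q
    X = 1.0 / F + (s / F) * (1.0 - p * p - h * 2.0 * p * q) / d
    Y = 1.0 / F + (s / F) * (h - p * p - h * 2.0 * p * q) / d
    Z = 1.0 / F - (s / F) * (p * p + h * 2.0 * p * q) / d
    return X, Y, Z


def mean_fitness(p, s, h, F):
    """Average fitness: each genotype weighted by its Hardy-Weinberg frequency."""
    q = 1.0 - p
    X, Y, Z = decomposed_forms(p, s, h, F)
    return X * p * p + Y * 2.0 * p * q + Z * q * q


if __name__ == "__main__":
    F, s = 4, 0.01  # arbitrary test values
    for h in (0.0, 0.5, 1.0):
        for p in (0.0, 0.001, 0.3, 0.9, 1.0):
            gap = max(abs(a - b) for a, b in
                      zip(direct_forms(p, s, h, F), decomposed_forms(p, s, h, F)))
            print(f"h={h} p={p}: largest difference={gap:.2e}, "
                  f"mean fitness={mean_fitness(p, s, h, F):.6f}")
```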
What does it all mean? Well, first of all, notice that the fitness of each genotype (AA, Aa, and aa) is the sum of 1/F and a term proportional to s/F that depends on the selection coefficient, dominance, and allele frequencies. Notice that if the selection coefficient is zero, every genotype has a fitness of exactly 1/F, just enough to maintain the species population. Also note that for all values of p and q such that p + q = 1 (I will be adding this derivation later), the s/F terms for the three genotypes, each weighted by its genotype frequency (p^2, 2pq, or q^2), will always sum to zero, so the average fitness will always be 1/F.
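One way to see why the weighted s/F terms cancel is sketched below. The shorthand T and D is introduced here only for convenience and is not notation used elsewhere in this essay; the only fact used is p + q = 1.

```latex
\begin{align*}
T &= p^2 + 2hpq, \qquad D = 1 + s\,p^2 + 2shpq = 1 + sT \\[4pt]
\text{weighted sum of the } s/F \text{ terms}
  &= \frac{s}{F D}\Bigl[\, p^2\,(1 - T) \;+\; 2pq\,(h - T) \;-\; q^2\,T \,\Bigr] \\
  &= \frac{s}{F D}\Bigl[\, (p^2 + 2hpq) \;-\; T\,(p^2 + 2pq + q^2) \,\Bigr] \\
  &= \frac{s}{F D}\Bigl[\, T - T\,(p + q)^2 \,\Bigr] \;=\; 0
\end{align*}
```

With the correction terms gone, what remains of the average fitness is (p^2 + 2pq + q^2)/F = 1/F.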
Remember, Haldane's 1957 paper was a theoretical treatise on the cost of natural selection. Here is Haldane's conclusion, which is correct in both points:
"To conclude, I am quite aware that my conclusions will probably need drastic revision. But I am convinced that quantitative arguments of the kind here put forward should play a part in all future discussions of evolution."
Please e-mail me (Robert Williams rwms@gate.net) with suggestions, comments, or criticisms.
1. ReMine, Walter J. 1993 The Biotic Message, St. Paul Science, Inc.
2. Haldane, J. B. S. 1957 The Cost of Natural Selection, Journal of Genetics 55:511-524
3. Sibley, C. G., Comstock, J. A., Ahlquist, J. E. 1990 DNA Hybridization Evidence of Hominoid Phylogeny: A Reanalysis of the Data, Journal of Molecular Evolution 30:202-236
4. Sibley, C. G., Ahlquist, J. E. 1987 DNA Hybridization Evidence of Hominoid Phylogeny: Results from an Expanded Data Set, Journal of Molecular Evolution 26:99-121
5. Sibley, C. G., Ahlquist, J. E. 1984 The Phylogeny of Hominoid Primates, as Indicated by DNA-DNA Hybridization, Journal of Molecular Evolution 20:2-15
6. Wallace, Bruce 1991 Fifty Years of Genetic Load - An Odyssey, Cornell University Press. See particularly Chapter 5, Dilemmas and Options; Chapter 6, Hard and Soft Selection; Chapter 8, Self-Culling and the Persistence of Populations; and Chapter 9, Summarizing Remarks for issues that have been addressed in this essay and related topics.