The classic view of gene fixation involves assuming a starting frequency for two alleles of a single gene, calculating the frequency of each genotype in the next generation, and then applying a fitness factor to each genotype to simulate the effects of selection.
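The classic procedure just described can be sketched in a few lines of Python. This is a minimal sketch that assumes random mating (Hardy-Weinberg genotype frequencies) and fitness factors given as relative values. The function name `classic_generation` is hypothetical, and the usage loop (A dominant, aa fitness 1 - s with s = 0.01) is only an illustration, not a reconstruction of Haldane's own calculations.

```python
def classic_generation(p, w_AA, w_Aa, w_aa):
    """One generation of the classic single-locus selection model.

    p is the frequency of allele A; w_AA, w_Aa, w_aa are relative fitness
    factors for each genotype. Returns the frequency of A among survivors.
    """
    q = 1.0 - p
    # Genotype frequencies in the next generation, assuming random mating
    f_AA, f_Aa, f_aa = p * p, 2.0 * p * q, q * q
    # Apply the fitness factor to each genotype
    s_AA, s_Aa, s_aa = f_AA * w_AA, f_Aa * w_Aa, f_aa * w_aa
    mean_fitness = s_AA + s_Aa + s_aa
    # AA survivors carry two A alleles, Aa survivors one; dividing by the
    # mean fitness renormalizes the survivors back to frequencies
    return (s_AA + 0.5 * s_Aa) / mean_fitness


if __name__ == "__main__":
    # Hypothetical example: A dominant, aa has fitness 1 - s
    s = 0.01
    p = 0.001
    for generation in range(5):
        p = classic_generation(p, 1.0, 1.0, 1.0 - s)
        print(generation + 1, p)
```

With equal fitness factors the allele frequency is unchanged, which is a quick check that the renormalization is right.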
Haldane's Calculations for the Replacement of Gene "a" by Gene "A"

Symbols:
p - frequency of gene A at any given time.
q - frequency of gene a at any given time. Note that p + q always equals 1.
s - the selection coefficient of gene A.
w - the fitness of gene a, w = 1 - s.
N - number of individuals in the population.
F - the average number of offspring that each parent produces in a generation.

Assumptions:
Gene A is a beneficial mutation that occurs in exactly one individual at the start of our calculations.
The population size will remain constant through time.

Calculations:
The problem lies in the normalization of the fitness constant and the assumption of constant fitness. As Bruce Wallace puts it, although Haldane killed 991 individuals with his selection coefficient, most of those individuals were "resurrected" in the next generation by the assumption of a constant population! What we need is a slightly different fitness "constant" that allows for the fact that one slightly fitter individual is not going to kill off nearly 1000 less fit individuals. And that is just what I provide in the next section.
Haldane's Calculations for the Replacement of Gene "a" by Gene "A" With Frequency-Dependent Selection
First, let's consider a population of genetically identical individuals (except for their sex chromosomes!). Using the symbols and assumptions above, each individual produces F offspring, of which on average only 1 survives to reproductive adulthood. This keeps the population at N, the condition we imposed above, and implies a fitness of 1/F for each individual.

Now let's propose a mutation in one individual such that the mutant is a small fraction s more fit than the non-mutant (e.g. s = 0.01 implies a 1% fitness advantage for the new A allele over the old a allele). To be thorough, I am going to allow for incomplete dominance of the new, beneficial allele (as Haldane did in "The Cost of Natural Selection"). Let X be the fitness of an individual homozygous for the beneficial mutation (AA), Y the fitness of a heterozygote (Aa), and Z the fitness of an individual homozygous for the old, less fit allele (aa). X, Y, and Z are related by

$$X = (1+s)Z \quad \text{and} \quad Y = (1+sh)Z,$$

where s is a positive selection coefficient as described above and h, a factor between 0.0 and 1.0, indicates the relative dominance of the two alleles: h = 1 means complete dominance of the new allele, and h = 0 means complete dominance of the old allele. Notice that Z is slightly less than X, as expected for a less fit individual, with Y somewhere in between, tending toward X or Z depending on the value of h.

Now we are going to apply a round of reproduction to our population and solve for Z, and then Y and X, in terms of p, q, s, h, and F. The variables p, q, N, and F have the same meaning as above. Variable s has a somewhat modified meaning, as described in this paragraph (but I think it is still closely analogous to the classic selection coefficient). Variables h, X, Y, and Z were just introduced.
Reproduction:

$$FNp^2 + FN \cdot 2pq + FNq^2 = FN$$

Note that if, for example, F is 4 (a pair of parents produces 8 offspring in their lifetimes) and N (the parent population size) is 100,000, we have 400,000 offspring before selection. Let's now apply selection:
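The reproduction step is plain arithmetic. A minimal sketch follows; the helper name `offspring_by_genotype` is hypothetical, and the allele frequency p = 0.1 is an arbitrary illustration (the text gives only F = 4 and N = 100,000).

```python
def offspring_by_genotype(p, N, F):
    """Offspring counts before selection: F*N*p^2 (AA), F*N*2pq (Aa), F*N*q^2 (aa)."""
    q = 1.0 - p
    return {
        "AA": F * N * p * p,
        "Aa": F * N * 2 * p * q,
        "aa": F * N * q * q,
    }


# F = 4 and N = 100,000 come from the text; p = 0.1 is an arbitrary choice
counts = offspring_by_genotype(p=0.1, N=100_000, F=4)
total = sum(counts.values())  # F*N*(p + q)^2 = F*N, up to floating-point rounding
print(counts, round(total))
```

Whatever p is chosen, the genotype counts always add up to F·N because p² + 2pq + q² = 1.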
$$XFNp^2 + YFN \cdot 2pq + ZFNq^2 = N$$

Note that selection has brought the next generation's population back to exactly N, meeting our requirement of a fixed population size. Now our goal is to solve for Z. Since
$$X = (1+s)Z \quad \text{and} \quad Y = (1+sh)Z,$$

substituting into the selection equation gives

$$(1+s)ZFNp^2 + (1+sh)ZFN \cdot 2pq + ZFNq^2 = N$$

Dividing by N:

$$(1+s)ZFp^2 + (1+sh)ZF \cdot 2pq + ZFq^2 = 1$$

Dividing by F:

$$(1+s)Zp^2 + (1+sh)Z \cdot 2pq + Zq^2 = \frac{1}{F}$$

Factoring out Z:

$$Z\left[(1+s)p^2 + (1+sh) \cdot 2pq + q^2\right] = \frac{1}{F}$$

Multiplying out the (1 + s) and (1 + sh) terms:

$$Z\left(p^2 + sp^2 + 2pq + sh \cdot 2pq + q^2\right) = \frac{1}{F}$$

Grouping the terms that are not multiples of s or sh:

$$Z\left(\left[p^2 + 2pq + q^2\right] + sp^2 + sh \cdot 2pq\right) = \frac{1}{F}$$

Since $p^2 + 2pq + q^2 = (p+q)^2$ and $p + q = 1$:

$$Z\left(1 + sp^2 + sh \cdot 2pq\right) = \frac{1}{F}$$

$$Z = \frac{1}{F\left(1 + sp^2 + sh \cdot 2pq\right)}$$

From the relations for X, Y, and Z defined above:

$$Y = \frac{1 + sh}{F\left(1 + sp^2 + sh \cdot 2pq\right)} \quad \text{and} \quad X = \frac{1 + s}{F\left(1 + sp^2 + sh \cdot 2pq\right)}$$

We now have fitness equations for the three genotypes in terms of F, s, h, p, and q!

Now, let's take time out for a reality check. How do the fitness terms for a new, beneficial mutation just starting out compare with those when it reaches fixation? (Let's consider only h = 1, i.e. complete dominance of the new allele.) When the new A allele exists in only a few individuals, q is essentially 1 and p essentially 0, so X and Y approach (1 + s)/F. This is slightly above the replacement fitness of 1/F, so the number of A carriers will increase. Meanwhile, the old aa individuals have a fitness of very nearly 1/F because q is close to 1 and p close to 0. That means these individuals hardly feel the competition with the Aa and AA individuals, because those are so rare. Only as the frequency of the new allele rises (and q falls) do the aa individuals begin to be seriously outcompeted. As q approaches 0 (and p approaches 1), the fitness of the aa individuals approaches 1/[F(1 + s)], while the fitness of the AA and Aa individuals approaches 1/F. When the A gene becomes fixed, its fitness is 1/F, just like every other fixed gene in the population. This makes sense because it no longer has anyone to compete with: there are no non-A individuals left.
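As a numerical sanity check of this derivation and of the h = 1 reality check, here is a minimal Python sketch. The formulas are the ones derived above; the function names `fitnesses` and `survivors` and the parameter values (s = 0.01, F = 4, N = 100,000) are illustrative assumptions.

```python
def fitnesses(p, s, h, F):
    """Frequency-dependent fitnesses (X, Y, Z) for AA, Aa, aa from the derivation above."""
    q = 1.0 - p
    denom = F * (1.0 + s * p * p + s * h * 2.0 * p * q)
    Z = 1.0 / denom
    Y = (1.0 + s * h) / denom
    X = (1.0 + s) / denom
    return X, Y, Z


def survivors(p, s, h, F, N):
    """X*F*N*p^2 + Y*F*N*2pq + Z*F*N*q^2, which should come back to N."""
    q = 1.0 - p
    X, Y, Z = fitnesses(p, s, h, F)
    return X * F * N * p * p + Y * F * N * 2 * p * q + Z * F * N * q * q


if __name__ == "__main__":
    s, h, F, N = 0.01, 1.0, 4, 100_000
    # Near loss, midway, and near fixation of the new allele A
    for p in (1e-6, 0.5, 1.0 - 1e-6):
        X, Y, Z = fitnesses(p, s, h, F)
        print(f"p={p:.6f} X={X:.6f} Y={Y:.6f} Z={Z:.6f} "
              f"survivors={survivors(p, s, h, F, N):.3f}")
```

At p near 0, X is close to (1 + s)/F and Z close to 1/F; at p near 1, X is close to 1/F and Z close to 1/[F(1 + s)], matching the limits described in the text.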
Now, my question (to anyone who has made it this far) is: what is the substitution cost? How did the patterns of birth and death while gene fixation was going on differ from those when the animals were simply reproducing without substitution? To me, it looks like exactly the same number of individuals lived and died in each generation as would have lived and died if no substitution were occurring. The cost of substitution appears to me to be an artifact of an old, simple equation for determining the survivors from one generation to the next under natural selection. This was exactly the conclusion Bruce Wallace reached in "Fifty Years of Genetic Load: An Odyssey" [6], and it seems perfectly obvious to me.
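The claim that births and deaths per generation are unchanged during substitution can be checked numerically. Below is a minimal deterministic sketch (drift is ignored) that assumes the frequency-dependent fitnesses derived above, a start from a single Aa mutant (p = 1/(2N)), and illustrative values N = 1,000, F = 4, s = 0.1, h = 0.5. The name `run_substitution` is hypothetical.

```python
def fitnesses(p, s, h, F):
    """(X, Y, Z) for AA, Aa, aa under the frequency-dependent model above."""
    q = 1.0 - p
    denom = F * (1.0 + s * p * p + s * h * 2.0 * p * q)
    return (1.0 + s) / denom, (1.0 + s * h) / denom, 1.0 / denom


def run_substitution(N, F, s, h, generations):
    """Follow allele A from a single copy, recording (p, births, deaths) each generation."""
    p = 1.0 / (2 * N)  # one heterozygous mutant among N diploid individuals
    history = []
    for _ in range(generations):
        q = 1.0 - p
        X, Y, Z = fitnesses(p, s, h, F)
        births = F * N
        AA_surv = X * F * N * p * p
        Aa_surv = Y * F * N * 2 * p * q
        aa_surv = Z * F * N * q * q
        total_surv = AA_surv + Aa_surv + aa_surv
        history.append((p, births, births - total_surv))
        # Frequency of A among survivors: AA carries two copies, Aa one
        p = (2 * AA_surv + Aa_surv) / (2 * total_surv)
    return history


if __name__ == "__main__":
    for gen, (p, births, deaths) in enumerate(run_substitution(1000, 4, 0.1, 0.5, 400)):
        if gen % 50 == 0:
            print(f"gen {gen:3d}  p={p:.5f}  births={births}  deaths={deaths:.1f}")
```

In this sketch every generation has F·N births and (F - 1)·N deaths while p climbs toward 1, which is the pattern the question above is pointing at.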
I have found further insight into these questions by rewriting each of the fitness equations for X, Y, and Z as 1/F plus a correction term, in the manner of partial fractions. Starting from

$$Z = \frac{1}{F\left(1 + sp^2 + sh \cdot 2pq\right)},$$

let

$$Z = \frac{1}{F} + \frac{A}{1 + sp^2 + sh \cdot 2pq},$$

where A is a quantity to be determined so that the equality holds. Setting the two expressions for Z equal and multiplying both sides by $F(1 + sp^2 + sh \cdot 2pq)$ gives

$$1 + sp^2 + sh \cdot 2pq + AF = 1$$

$$AF = -\left(sp^2 + sh \cdot 2pq\right)$$

$$A = -\frac{sp^2 + sh \cdot 2pq}{F}$$

This leads to:

$$Z = \frac{1}{F} - \frac{s}{F} \cdot \frac{p^2 + h \cdot 2pq}{1 + sp^2 + sh \cdot 2pq}$$
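Exact rational arithmetic makes it easy to confirm that the split form of Z agrees with the closed form for any particular parameter values. A minimal sketch using Python's standard `fractions` module; the function names and sample values are hypothetical.

```python
from fractions import Fraction as Fr


def z_closed(p, s, h, F):
    """Z = 1 / [F(1 + s*p^2 + sh*2pq)]."""
    q = 1 - p
    return 1 / (F * (1 + s * p**2 + s * h * 2 * p * q))


def z_split(p, s, h, F):
    """Z = 1/F + A/(1 + s*p^2 + sh*2pq), with A = -(s*p^2 + sh*2pq)/F."""
    q = 1 - p
    D = 1 + s * p**2 + s * h * 2 * p * q
    A = -(s * p**2 + s * h * 2 * p * q) / F
    return Fr(1) / F + A / D


def z_final(p, s, h, F):
    """Z = 1/F - (s/F)(p^2 + h*2pq)/(1 + s*p^2 + sh*2pq)."""
    q = 1 - p
    D = 1 + s * p**2 + s * h * 2 * p * q
    return Fr(1, F) - s / F * (p**2 + h * 2 * p * q) / D


# Example with p = 1/3, s = 1/100, h = 1/2, F = 4 (passed as Fractions so equality is exact)
print(z_closed(Fr(1, 3), Fr(1, 100), Fr(1, 2), 4), z_final(Fr(1, 3), Fr(1, 100), Fr(1, 2), 4))
```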
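Similarly, for Y, start from

$$Y = \frac{1 + sh}{F\left(1 + sp^2 + sh \cdot 2pq\right)}$$

and let

$$Y = \frac{1}{F} + \frac{B}{1 + sp^2 + sh \cdot 2pq},$$

where B must be determined. Setting

$$\frac{1}{F} + \frac{B}{1 + sp^2 + sh \cdot 2pq} = \frac{1 + sh}{F\left(1 + sp^2 + sh \cdot 2pq\right)}$$

and multiplying both sides by $F(1 + sp^2 + sh \cdot 2pq)$:

$$1 + sp^2 + sh \cdot 2pq + BF = 1 + sh$$

$$BF = sh - sp^2 - sh \cdot 2pq$$

$$B = \frac{1}{F}\left(sh - sp^2 - sh \cdot 2pq\right)$$

Therefore,

$$Y = \frac{1}{F} + \frac{s}{F} \cdot \frac{h - p^2 - h \cdot 2pq}{1 + sp^2 + sh \cdot 2pq}$$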
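Lastly, applying the same treatment to X:

$$X = \frac{1 + s}{F\left(1 + sp^2 + sh \cdot 2pq\right)}$$

Let

$$X = \frac{1}{F} + \frac{C}{1 + sp^2 + sh \cdot 2pq},$$

where C must be determined. Setting

$$\frac{1}{F} + \frac{C}{1 + sp^2 + sh \cdot 2pq} = \frac{1 + s}{F\left(1 + sp^2 + sh \cdot 2pq\right)}$$

and multiplying both sides by $F(1 + sp^2 + sh \cdot 2pq)$:

$$1 + sp^2 + sh \cdot 2pq + CF = 1 + s$$

$$CF = s - sp^2 - sh \cdot 2pq$$

$$C = \frac{1}{F}\left(s - sp^2 - sh \cdot 2pq\right)$$

In which case:

$$X = \frac{1}{F} + \frac{s}{F} \cdot \frac{1 - p^2 - h \cdot 2pq}{1 + sp^2 + sh \cdot 2pq}$$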
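The same kind of exact check works for the Y and X decompositions, confirming the values of B and C and the final simplified forms. A minimal sketch with Python's `fractions` module; `denom` and `check_y_and_x` are hypothetical helper names, and the sample values are arbitrary.

```python
from fractions import Fraction as Fr


def denom(p, s, h):
    """The shared denominator 1 + s*p^2 + sh*2pq."""
    q = 1 - p
    return 1 + s * p**2 + s * h * 2 * p * q


def check_y_and_x(p, s, h, F):
    """True if the closed, split, and simplified forms of Y and X agree exactly."""
    q = 1 - p
    D = denom(p, s, h)
    # Closed forms from the derivation
    Y = (1 + s * h) / (F * D)
    X = (1 + s) / (F * D)
    # Coefficients B and C found above
    B = (s * h - s * p**2 - s * h * 2 * p * q) / F
    C = (s - s * p**2 - s * h * 2 * p * q) / F
    y_split = Fr(1, F) + B / D
    x_split = Fr(1, F) + C / D
    # Final simplified forms
    y_final = Fr(1, F) + s / F * (h - p**2 - h * 2 * p * q) / D
    x_final = Fr(1, F) + s / F * (1 - p**2 - h * 2 * p * q) / D
    return Y == y_split == y_final and X == x_split == x_final


print(check_y_and_x(Fr(1, 2), Fr(1, 100), Fr(1, 4), 4))
```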
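A round of selection using these forms of the fitness equations (the selection equation above divided by FN) looks like this:

$$
\begin{aligned}
&\left[\frac{1}{F} + \frac{s}{F} \cdot \frac{1 - p^2 - h \cdot 2pq}{1 + sp^2 + sh \cdot 2pq}\right] p^2 \\
&+ \left[\frac{1}{F} + \frac{s}{F} \cdot \frac{h - p^2 - h \cdot 2pq}{1 + sp^2 + sh \cdot 2pq}\right] 2pq \\
&+ \left[\frac{1}{F} - \frac{s}{F} \cdot \frac{p^2 + h \cdot 2pq}{1 + sp^2 + sh \cdot 2pq}\right] q^2 = \frac{1}{F}
\end{aligned}
$$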
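This identity can be spot-checked with exact rational arithmetic over a grid of parameter values. A minimal sketch using Python's `fractions` module; `corrections` and `mean_fitness` are hypothetical helper names, and the grid of p, s, and h values is arbitrary.

```python
from fractions import Fraction as Fr


def corrections(p, s, h, F):
    """The s/F correction terms added to 1/F for AA, Aa, and aa."""
    q = 1 - p
    D = 1 + s * p**2 + s * h * 2 * p * q
    k = Fr(s) / F
    return (
        k * (1 - p**2 - h * 2 * p * q) / D,  # AA
        k * (h - p**2 - h * 2 * p * q) / D,  # Aa
        -k * (p**2 + h * 2 * p * q) / D,     # aa
    )


def mean_fitness(p, s, h, F):
    """p^2*X + 2pq*Y + q^2*Z using the split forms of X, Y, Z."""
    q = 1 - p
    c_AA, c_Aa, c_aa = corrections(p, s, h, F)
    base = Fr(1, F)
    return (base + c_AA) * p**2 + (base + c_Aa) * 2 * p * q + (base + c_aa) * q**2


print(mean_fitness(Fr(3, 10), Fr(1, 100), Fr(1, 2), 4))
```

Because the arithmetic is exact, agreement with Fraction(1, F) is an exact equality rather than a floating-point approximation.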
What does it all mean? First, notice that the fitness of each genotype (AA, Aa, and aa) is the sum of 1/F and a multiple of s/F that depends on the selection coefficient, the dominance, and the allele frequencies. If the selection coefficient is zero, every genotype has a fitness of exactly 1/F, so the average fitness is 1/F, just enough to maintain the population. Also note that for all values of p and q such that p + q = 1 (I will be adding this derivation later), the s/F terms for the three genotypes, weighted by the genotype frequencies $p^2$, $2pq$, and $q^2$, always sum to zero, so the average fitness is always 1/F.
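One way to see this cancellation, sketched using only the fitness equations already derived: write $D = 1 + sp^2 + sh \cdot 2pq$. Every correction term carries the common factor $s/(FD)$, so it is enough to show that the frequency-weighted numerators sum to zero:

$$
\begin{aligned}
&(1 - p^2 - h \cdot 2pq)\,p^2 + (h - p^2 - h \cdot 2pq)\,2pq - (p^2 + h \cdot 2pq)\,q^2 \\
&\quad = p^2 + h \cdot 2pq - (p^2 + h \cdot 2pq)\left(p^2 + 2pq + q^2\right) \\
&\quad = (p^2 + h \cdot 2pq)\left[1 - (p + q)^2\right] = 0
\end{aligned}
$$

Because $p + q = 1$, the last line vanishes for every s and h, and so the average fitness $p^2X + 2pqY + q^2Z$ equals 1/F in every generation.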