Summary of pp. 514-516 of Haldane's "The Cost of Natural Selection"
Let $p_n$ be the frequency of an allele A in any generation n, and let $q_n$ be the frequency of an alternative allele a at the same locus, in a haploid organism.
If individuals carrying the a allele survive at (1 - s) times the rate of individuals carrying A, then selection acts as follows. At the start of generation n, the frequencies of A and a are $p_n$ and $q_n$, respectively. After selection, they become

$$\frac{p_n}{1 - s q_n} \quad \text{and} \quad \frac{(1 - s)\,q_n}{1 - s q_n}.$$
In his calculations of the substitution cost, Haldane chose to ignore the denominator $(1 - s q_n)$ for small s. For instance, if s = 0.01 and q = 0.99, the denominator is 1 - (0.01)(0.99) = 0.9901. Ignoring it therefore introduces a relative error of 1 - 0.9901 = 0.0099 in one generation. This is an error of less than 1% per generation, and it shrinks as q decreases. (Because $q_n \le 1$, the relative error $s q_n$ can never exceed s, the selection coefficient.) Haldane felt this error could be ignored, but others have disagreed with him on this point.
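The size of this approximation error can be checked numerically. The sketch below is a minimal illustration, assuming the haploid update described above; the function names `exact_q_next`, `approx_q_next`, and `relative_error` are hypothetical helpers introduced here, not anything from Haldane's paper.

```python
def exact_q_next(q, s):
    """Frequency of allele a after one round of haploid selection, denominator kept."""
    return (1 - s) * q / (1 - s * q)


def approx_q_next(q, s):
    """The same update with the denominator (1 - s*q) dropped."""
    return (1 - s) * q


def relative_error(q, s):
    """Fraction by which the approximate update underestimates the exact one (equals s*q)."""
    exact = exact_q_next(q, s)
    return (exact - approx_q_next(q, s)) / exact


# Values from the text: s = 0.01, q = 0.99.
err = relative_error(0.99, 0.01)
print(f"relative error: {err:.4f}")  # prints: relative error: 0.0099
```

Because the ratio of the approximate to the exact value is exactly $1 - s q$, the relative error is $s q$, which is largest (equal to s) when q = 1.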
The fraction of selective deaths due to natural selection for a single generation is then given as follows:
$$d_n = s q_n$$

This is because, ignoring the denominator, the new value of q is $(1 - s) q_n = q_n - s q_n$, so $s q_n$ is the fraction of the population lost to selective deaths.
If $q_{n+1}$ is defined as the frequency of the a allele after a round of selection, then

$$q_{n+1} = \frac{(1 - s)\,q_n}{p_n + (1 - s)\,q_n} = \frac{(1 - s)\,q_n}{p_n + q_n - s q_n} = \frac{(1 - s)\,q_n}{1 - s q_n}$$

(because $p_n + q_n = 1$).
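As a quick sanity check on the algebra, the three forms of $q_{n+1}$ can be evaluated side by side. This is an illustrative sketch; `q_next_forms` and the starting values are hypothetical choices made for the example.

```python
import math


def q_next_forms(p, q, s):
    """Return the three forms of q_{n+1} given in the text, in order."""
    form1 = (1 - s) * q / (p + (1 - s) * q)
    form2 = (1 - s) * q / (p + q - s * q)
    form3 = (1 - s) * q / (1 - s * q)  # valid only because p + q = 1
    return form1, form2, form3


# Illustrative values: allele a starts common (q = 0.9999) and s = 0.01.
q, s = 0.9999, 0.01
f1, f2, f3 = q_next_forms(1 - q, q, s)
assert math.isclose(f1, f2) and math.isclose(f2, f3)
print(f"q after one generation: {f3:.6f}")
```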
The change in q ($\Delta q$) due to one generation of natural selection is given by:

$$\Delta q = q_{n+1} - q_n = -\frac{s\,p_n q_n}{1 - s q_n}$$
Derivation of $\Delta q$ for a Haploid Organism
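For reference, the algebra behind $\Delta q$ takes only a few lines, starting from the simplified form of $q_{n+1}$ and using $p_n = 1 - q_n$ in the last step:

```latex
\begin{aligned}
\Delta q &= q_{n+1} - q_n = \frac{(1-s)\,q_n}{1 - s q_n} - q_n \\
         &= \frac{(1-s)\,q_n - q_n\,(1 - s q_n)}{1 - s q_n} \\
         &= \frac{-s q_n + s q_n^2}{1 - s q_n}
          = \frac{-s q_n\,(1 - q_n)}{1 - s q_n}
          = -\frac{s\,p_n q_n}{1 - s q_n}
\end{aligned}
```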
From this, we can see that for any generation n, $q_n$ is given by:

$$q_n = \left[1 + (1 - s)^{-n}\left(q_0^{-1} - 1\right)\right]^{-1}$$
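The closed form can be compared against simply applying the exact one-generation update repeatedly. The sketch below assumes illustrative values $q_0 = 0.9999$ and s = 0.01; `iterate_q` and `closed_form_q` are hypothetical names. The two should agree up to floating-point rounding.

```python
def iterate_q(q0, s, generations):
    """Apply the exact update q -> (1 - s) q / (1 - s q) and record every generation."""
    history = [q0]
    q = q0
    for _ in range(generations):
        q = (1 - s) * q / (1 - s * q)
        history.append(q)
    return history


def closed_form_q(q0, s, n):
    """Closed form from the text: q_n = [1 + (1 - s)^(-n) (1/q0 - 1)]^(-1)."""
    return 1.0 / (1.0 + (1 - s) ** (-n) * (1.0 / q0 - 1.0))


q0, s = 0.9999, 0.01  # illustrative values only
history = iterate_q(q0, s, 2000)
worst = max(abs(history[n] - closed_form_q(q0, s, n)) for n in range(len(history)))
print(f"largest discrepancy over 2000 generations: {worst:.2e}")
```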
Haldane considered this result obvious by inspection, but the link below shows how the equation can be derived.
Derivation of the equation for $q_n$
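One short route to the closed form (not necessarily the one used on the linked page) is to track the ratio $r_n = p_n / q_n$, a symbol introduced here for convenience. Dividing the two post-selection frequencies cancels their common denominator, so the ratio simply grows by a factor $1/(1-s)$ each generation:

```latex
\begin{aligned}
r_{n+1} &= \frac{p_{n+1}}{q_{n+1}}
         = \frac{p_n / (1 - s q_n)}{(1-s)\,q_n / (1 - s q_n)}
         = \frac{r_n}{1-s}, \\
r_n &= (1-s)^{-n}\, r_0, \qquad r_0 = \frac{p_0}{q_0} = q_0^{-1} - 1, \\
q_n &= \frac{q_n}{p_n + q_n} = \frac{1}{1 + r_n}
     = \left[1 + (1-s)^{-n}\left(q_0^{-1} - 1\right)\right]^{-1}
\end{aligned}
```

Note that this uses the exact update, denominator included, so the closed form is exact for the recursion given above.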
Since $d_n = s q_n$ gives the number of deaths due to selection in a single generation (as a fraction of the population size), we can find the total number of selective deaths, D (as a fraction of the initial population size), by summing the selective deaths over all generations of the selection process:
$$D = \sum_{n=1}^{\infty} s\,q_n = s \sum_{n=1}^{\infty} q_n$$

That is, D is the sum of $s q_n$ from n = 1 to infinity.
This quantity D, the sum of selective deaths over all generations of a substitution, is what Haldane defined as the Cost of Natural Selection.
Haldane goes on to derive an expression for the cost for any values of s and $p_0$. Using calculus, he shows that, to a close approximation when s is small, the cost depends on nothing but the initial value of p. The cost is given by the equation:
$$D = -\ln p_0$$

where ln is the natural logarithm.
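A rough numerical check of this result is to evaluate the sum $D = s \sum q_n$ directly, using the exact closed form for $q_n$, and compare it with $-\ln p_0$. This sketch assumes a truncation threshold of 1e-15 for the infinite sum and uses a hypothetical helper `selective_cost`; it is an illustration, not Haldane's procedure.

```python
import math


def selective_cost(p0, s, tol=1e-15):
    """Approximate D = s * sum_{n>=1} q_n, with q_n from the exact closed form.

    The infinite sum is cut off once q_n falls below `tol`; beyond that point q_n
    shrinks by roughly a factor (1 - s) per generation, so the dropped tail of the
    sum is at most about tol / s.
    """
    r0 = p0 / (1.0 - p0)  # r0 = p_0 / q_0 = 1/q_0 - 1
    total, n = 0.0, 1
    while True:
        q = 1.0 / (1.0 + r0 * (1 - s) ** (-n))
        if q < tol:
            break
        total += q
        n += 1
    return s * total


p0 = 1e-4  # one copy in a population of 10,000
for s in (0.1, 0.01, 0.001):
    print(f"s = {s}: summed D = {selective_cost(p0, s):.3f}, -ln(p0) = {-math.log(p0):.3f}")
```

With these settings the summed cost comes out a little below $-\ln p_0$ and approaches it as s decreases, which is what one would expect if $-\ln p_0$ is the small-s limit of the cost.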
Follow the link below if you would like to see details of how this equation is found:
Derivation for the Cost of Selection for a Haploid Organism
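A sketch of one standard way to reach this result, offered as an illustration of the small-s argument rather than a transcription of Haldane's own steps: treat n as continuous (reasonable when s is small, so q changes slowly from one generation to the next), drop the denominator in $\Delta q$ so that $dq/dn \approx -s\,p\,q$, and replace the sum by an integral. As n runs from 0 to infinity, q falls from $q_0$ to 0:

```latex
\begin{aligned}
D &= s \sum_{n=1}^{\infty} q_n \;\approx\; \int_0^{\infty} s\,q \, dn
   = \int_{q_0}^{0} s\,q \,\frac{dq}{-s\,p\,q} \\
  &= \int_{0}^{q_0} \frac{dq}{1 - q}
   = -\ln(1 - q_0) = -\ln p_0
\end{aligned}
```

The factor $s$ cancels in the second step, which is why the approximate cost does not depend on the selection coefficient.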
According to Haldane's equation for the cost, if an allele is present in 1 individual in a population of 10,000 (so $p_0 = 1/10000$), the cost for it to become fixed (reach 100%) is $D = -\ln(1/10000) \approx 9.21$. This means that for the substitution to take place, the selective deaths summed over all generations must total 9.21 times the population size. In other words, in our population of 10,000, approximately 92,100 individuals will have to die selectively in order for the substitution to occur.
Note that the cost depends entirely on $p_0$. Here is a table showing the cost for several values of $p_0$:
p0          Cost (-ln p0)
---------   -------------
0.01        4.6
0.001       6.9
0.0001      9.2
0.00001     11.5
0.000001    13.8
0.0000001   16.1
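The table and the 10,000-individual example can both be reproduced in a few lines. This is a minimal sketch; `haldane_cost` is a hypothetical name for $-\ln p_0$.

```python
import math


def haldane_cost(p0):
    """Haploid cost of selection, D = -ln(p0), as given in the text."""
    return -math.log(p0)


# Reproduce the table for p0 = 10^-2 ... 10^-7.
for k in range(2, 8):
    p0 = 10.0 ** (-k)
    print(f"{p0:<12.{k}f}{haldane_cost(p0):.1f}")

# Worked example: one copy of the allele in a population of 10,000.
N = 10_000
D = haldane_cost(1 / N)
print(f"D = {D:.2f}, total selective deaths = {D * N:,.0f}")
# prints: D = 9.21, total selective deaths = 92,103
```

Using the unrounded cost gives about 92,103 selective deaths, consistent with the rounded figure of about 92,100 obtained from D = 9.21.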
To summarize, the Cost of Natural Selection is the number of selective deaths, as a multiple of the population size, required to fix an allele in a population. For a haploid organism, the cost depends entirely on the initial frequency of the allele to be fixed and is given by the equation $D = -\ln p_0$.