The Cost of Natural Selection for a Diploid Organism

Summary of pp. 516-517 of Haldane's "The Cost of Natural Selection"

Let $p_n$ be the frequency of an allele A in generation $n$, and let $q_n$ be the frequency of the alternative allele a at the same locus in a diploid organism, so that $p_n + q_n = 1$.

Such an organism has three possible genotypes at this locus: AA, Aa, and aa. Under random mating, the expected frequencies of these genotypes, based on the frequencies of the two alleles (A and a), are:
  AA - $p_n^2$
  Aa - $2 p_n q_n$
  aa - $q_n^2$
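
As a quick numerical check of these expected frequencies, here is a minimal Python sketch. The function name `genotype_frequencies` is an illustrative choice, not something taken from Haldane's paper, and the sketch assumes random mating so that these proportions apply.

```python
def genotype_frequencies(q):
    """Expected (AA, Aa, aa) frequencies when allele a has frequency q.

    Assumes random mating (an assumption of this sketch).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie between 0 and 1")
    p = 1.0 - q  # frequency of allele A
    return p * p, 2.0 * p * q, q * q


if __name__ == "__main__":
    f_AA, f_Aa, f_aa = genotype_frequencies(0.3)
    # The three frequencies always add up to (p + q)^2 = 1.
    print(f"AA={f_AA:.2f}  Aa={f_Aa:.2f}  aa={f_aa:.2f}")
```

With q = 0.3 the three values come out near 0.49, 0.42, and 0.09.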

Using the usual relative fitnesses, $(1 - s)$ individuals with the aa genotype survive for every AA individual, and $(1 - sh)$ individuals with the Aa genotype survive for every AA individual. Here $s$ is the selection coefficient and $h$, with $0 \le h \le 1$, is the dominance factor. If $h = 1$, the a allele is dominant; if $h = 0$, the A allele is dominant; for intermediate values of $h$, dominance is incomplete. If the three genotypes (AA, Aa, and aa) start at the expected frequencies given above and have relative fitnesses $1$, $1 - sh$, and $1 - s$, then after a round of selection their relative frequencies will be:
  AA - $p_n^2 / (1 - 2sh\,p_n q_n - s\,q_n^2)$
  Aa - $(1 - sh) \cdot 2 p_n q_n / (1 - 2sh\,p_n q_n - s\,q_n^2)$
  aa - $(1 - s)\,q_n^2 / (1 - 2sh\,p_n q_n - s\,q_n^2)$
In his calculations of the substitution cost, Haldane chose to ignore the denominator $(1 - 2sh\,p_n q_n - s\,q_n^2)$ for small $s$. It can be shown that ignoring the denominator introduces a maximal relative error of about $s$ in one generation. If the selection coefficient is 0.01, the error is at most about 1% per generation when $q_n$ is large (nearly 1.0), and it decreases as $q_n$ falls. Haldane felt this could be ignored, but others have disagreed with him on this point.
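
The size of that error can be checked numerically. The sketch below is an illustration, not Haldane's own procedure: `after_selection` is a hypothetical helper name, and "relative error" is taken here to mean the fraction by which each unnormalized value falls short of the normalized one, which works out to one minus the denominator.

```python
def after_selection(q, s, h):
    """Genotype frequencies (AA, Aa, aa) after one round of selection.

    Returns the exact (normalized) values, the values obtained when the
    denominator is ignored, and the denominator itself (the mean fitness).
    """
    p = 1.0 - q
    # Frequency of each genotype multiplied by its relative fitness.
    unnormalized = (p * p, (1.0 - s * h) * 2.0 * p * q, (1.0 - s) * q * q)
    mean_fitness = 1.0 - 2.0 * s * h * p * q - s * q * q  # the denominator
    exact = tuple(x / mean_fitness for x in unnormalized)
    return exact, unnormalized, mean_fitness


if __name__ == "__main__":
    s, h = 0.01, 0.5  # example values chosen for illustration
    for q in (0.99, 0.5, 0.01):
        exact, approx, w = after_selection(q, s, h)
        # Each approximate value equals exact * w, so the relative
        # error of ignoring the denominator is 1 - w.
        print(f"q={q}: relative error = {1.0 - w:.5f} (s = {s})")
```

For any $h$ between 0 and 1 the printed error stays at or below $s$, and it shrinks as $q$ gets smaller.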

Under these conditions, the fraction of selective deaths in a single generation is given approximately by:
    $d_n = 2sh\,p_n q_n + s\,q_n^2$
This is because (ignoring the denominator) the frequency of the Aa genotype is reduced by $2sh\,p_n q_n$ and the frequency of the aa genotype is reduced by $s\,q_n^2$. These reductions are the fraction of selective deaths required for the change.
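
A small worked example may help here. The sketch below evaluates the per-generation fraction of selective deaths for one set of values; the function name and the values s = 0.01, h = 0.5, q = 0.9 are chosen purely for illustration.

```python
def selective_death_fraction(q, s, h):
    """Approximate fraction of selective deaths in one generation."""
    p = 1.0 - q
    heterozygote_loss = 2.0 * s * h * p * q  # reduction in the Aa frequency
    homozygote_loss = s * q * q              # reduction in the aa frequency
    return heterozygote_loss + homozygote_loss


if __name__ == "__main__":
    d = selective_death_fraction(q=0.9, s=0.01, h=0.5)
    print(f"fraction of selective deaths: {d:.4f}")
```

Here the heterozygote term is about 0.0009 and the homozygote term about 0.0081, so roughly 0.9% of the population is lost to selection in that generation.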

If $q_{n+1}$ is defined as the frequency of the a allele after a round of selection,
   $q_{n+1} = \frac{1}{2} (1 - sh) \cdot 2 p_n q_n / (1 - 2sh\,p_n q_n - s\,q_n^2) + (1 - s)\,q_n^2 / (1 - 2sh\,p_n q_n - s\,q_n^2)$
            $= [(1 - sh)\,p_n q_n + (1 - s)\,q_n^2] / (1 - 2sh\,p_n q_n - s\,q_n^2)$
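
The recursion can be followed over a few generations with a short sketch. The name `next_q` and the starting values are assumptions made for illustration; the function keeps the denominator, so it returns the exact value rather than Haldane's approximation.

```python
def next_q(q, s, h):
    """Frequency of allele a after one round of selection (exact form)."""
    p = 1.0 - q
    # Half of the alleles in Aa individuals are a, and all of those in aa
    # individuals are, so a's share is half the Aa term plus the aa term.
    numerator = (1.0 - s * h) * p * q + (1.0 - s) * q * q
    mean_fitness = 1.0 - 2.0 * s * h * p * q - s * q * q
    return numerator / mean_fitness


if __name__ == "__main__":
    q = 0.9  # illustrative starting frequency of allele a
    for generation in range(5):
        print(generation, round(q, 6))
        q = next_q(q, s=0.01, h=0.5)
```

Because a is the allele selected against, the printed frequency falls slightly each generation.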

The change in $q$ due to a single generation of selection ($\Delta q_n$) is given by:
   $\Delta q_n = q_{n+1} - q_n$
   $\Delta q_n = -s\,p_n q_n [h(p_n - q_n) + q_n] / (1 - 2sh\,p_n q_n - s\,q_n^2)$
   $\Delta q_n \approx -s\,p_n q_n [h + (1 - 2h)\,q_n]$ (ignoring the denominator). For details, see the linked page: Derivation of $\Delta q_n$ for a Diploid
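
For readers who want the intermediate algebra, one way to reach this result from the expression for $q_{n+1}$ is sketched below. It uses only $p_n + q_n = 1$ (so $1 - q_n = p_n$ and $1 - 2q_n = p_n - q_n$); the layout of the steps is this summary's own, not Haldane's.

```latex
\begin{align*}
\Delta q_n &= q_{n+1} - q_n
  = \frac{(1 - sh)\,p_n q_n + (1 - s)\,q_n^2
          - q_n\,(1 - 2sh\,p_n q_n - s\,q_n^2)}
         {1 - 2sh\,p_n q_n - s\,q_n^2} \\[4pt]
\text{numerator}
  &= \bigl[q_n(p_n + q_n) - q_n\bigr]
     - sh\,p_n q_n\,(1 - 2q_n) - s\,q_n^2\,(1 - q_n) \\
  &= -sh\,p_n q_n\,(p_n - q_n) - s\,p_n q_n^2 \\
  &= -s\,p_n q_n\,\bigl[h(p_n - q_n) + q_n\bigr] \\
  &= -s\,p_n q_n\,\bigl[h + (1 - 2h)\,q_n\bigr]
\end{align*}
```

Dividing the last line by the denominator gives the exact change, and dropping the denominator for small $s$ gives the approximate form.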
 

Therefore $D$, the total of the selective deaths (as a fraction of the population size) over the course of a substitution, is given by:
   $D = \sum_{n=0}^{\infty} [2sh\,p_n q_n + s\,q_n^2]$     (This is the summation from $n = 0$ to infinity of $2sh\,p_n q_n + s\,q_n^2$.)
 After placing the constant $s$ outside the summation, we have:
   $D = s \sum_{n=0}^{\infty} [2h\,p_n q_n + q_n^2]$
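
The sum can also be approximated numerically by stepping the allele frequency forward one generation at a time and accumulating the selective deaths. The sketch below is a rough illustration under stated assumptions: it uses the approximate per-generation change in $q$ (denominator ignored), and the names `substitution_cost`, `q_floor`, and `max_generations`, along with the cutoffs they set, are choices made here to end the infinite sum, not part of Haldane's treatment.

```python
def substitution_cost(p0, s, h, q_floor=1e-12, max_generations=1_000_000):
    """Accumulate the per-generation selective deaths over a substitution.

    p0 is the starting frequency of the favoured allele A, so allele a
    starts at 1 - p0. The loop stops once q drops below q_floor or after
    max_generations steps; both cutoffs are assumptions of this sketch.
    """
    q = 1.0 - p0
    total = 0.0
    for _ in range(max_generations):
        if q < q_floor:
            break
        p = 1.0 - q
        # Selective deaths in this generation: s * (2h p q + q^2).
        total += s * (2.0 * h * p * q + q * q)
        # Approximate change in q, with the denominator ignored.
        q -= s * p * q * (h + (1.0 - 2.0 * h) * q)
    return total


if __name__ == "__main__":
    # Illustrative values: A starts rare, weak selection, no dominance.
    cost = substitution_cost(p0=1e-4, s=0.01, h=0.5)
    print(f"D is roughly {cost:.2f} times the population size")
```

When $h$ is close to 0 the frequency of a declines very slowly once it is rare, so the loop may stop at `max_generations` before reaching `q_floor`; the result is then a truncated sum.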