# Abstract

Genetic drift, the random changes in frequencies of neutral variants, is the fundamental force of molecular evolution. Under-estimation of genetic drift is a major cause of mis-conclusions on evolution. However, the standard Wright-Fisher (WF) model of random sampling in population of size *N* only partially defines genetic drift with 1/*N* or 1/*N*_{e} (*N*_{e} being a function of varying *N*’s). In parallel, JBS Haldane proposed the branching process for genetic drift (Haldane 1927), whereby each gene copy is transmitted to *K* descendants with the mean and variance of *E*(*K*) and *V*(*K*). Genetic drift is simply *V*(*K*), i.e., the variance in transmission success, or *V*(*K*)/*N* when averaged over the population. Under the WF model, many paradoxes have emerged: i) Most curiously, genetic drift may often become stronger as *N* becomes larger at the ecological time scale, opposite to the model; ii) Sex chromosomes experience drift differently even with the same normalized *N*; iii) Genetic drift operates on advantageous mutations depends on *V*(*K*) but not on *N*. iv) Irresolution and paradoxes emerge in multi-copy gene systems, including diploidy, viruses and others, whereby evolution progresses both within and between individuals. We now show that the integration of the WF and Haldane (i.e., WFH) model can resolve these paradoxes. Most important, the WFH model can fully define genetic drift in molecular evolution.

**eLife assessment**

This study presents a **useful** modification of a standard model of genetic drift by incorporating variance in offspring numbers, claiming to address several paradoxes in molecular evolution. It is unfortunate that the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks. In addition, while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are **incomplete** to support the authors' claims.

# Introduction

Genetic drift, or neutral drift, is defined as random changes in frequencies of neutral variants (Crow and Kimura 1970; Hartl and Clark 1997; Lynch, et al. 2016). Much, or even most, of DNA sequence evolution is governed by the random drift of neutral mutations. If the strength of genetic drift is under-estimated, random changes that are missed in the analysis would result in the over-estimation of other evolutionary forces of greater biological interest, including selection, mutation, migration, meiotic drive and so on. In particular, the conclusion of pervasive selection at the molecular level has been suspected to be due to the failure to fully account for genetic drift (Crow and Kimura 1970; Fu 1997; Li 1997; Charlesworth 2009; Lynch, et al. 2016; Chen, Yang, et al. 2022).

Hagedoorn and Hagedoorn (1921) may be the first to suggest that random forces (accidents, for example) could impact long-term evolution. In the same decade, models of genetic drift were developed, obviously with some disagreements, along two lines. The one approach that became widely adopted is the Wright-Fisher (WF) model (Crow and Kimura 1970; Hartl and Clark 1997), formalized by Fisher (1930) and Wright (1931). They define genetic drift to be solely due to the random sampling of genes. Thus, in generation transition, the variance in the frequency of a variant allele, *x*, would be

where *N* is the population size. The effective population size, *N*_{e}, is a function of varying *N*’s in space, time and demography. For example, when *N* fluctuates, *N*_{e} would be the harmonic mean of *N*’s across time. Weakly or strongly, *N*_{e} should still reflect the actual population size, *N*. In the WF model, genetic drift is governed by 1/ *N*_{e} in haploid populations, or 1/(2*N*_{e}) in diploid populations.

At the same time, JBS Haldane introduced the branching process to model genetic drift (Haldane 1927; Haldane 1932). In the Haldane model, each gene copy is independently transmitted to *K* descendants with the mean and variance of *E*(*K*) and *V*(*K*). Subsequently, the model was expanded by Kimura and Crow (1963) and others (Wright 1969; Gillespie 1975; Eldon and Wakeley 2006; Chen, et al. 2017; Sackman, et al. 2019). Genetic drift is simply *V*(*K*), i.e., the variance in transmission success among neutral variants. There would be no drift if *V*(*K*) = 0. To scale drift at the population level, the variances of changes in gene frequency is governed by *V*(*K*)/*N* (or *V*(*K*)/*N*_{e}; Kimura and Crow (1963); Chen, et al. (2017); see also the Supplementary Information). In short, *V*(*K*) is the reproductive component of genetic drift while 1/*N* is the sampling component or the scaling factor.

In the WF model, drift is governed only by 1/*N* (or 1/*N*_{e}). By dropping the reproductive component (see Eqs. 2-3 for further discussions), it has led to many paradoxes. A most curious one is genetic drift in relation to changing population size. The WF model dictates stronger drift when *N* is smaller. However, when *N* is very small (say, *N* = 10 in a bacteria or yeast population), that increases to 20, 40, 80 and so on, there is in fact little drift in this exponential phase. Drift will increase when *N* grows to near the carrying capacity. This trend is the exact opposite of the main property of the WF model.

The second paradox concerns the genetic drift of sex chromosomes. When normalized by their copy numbers, genetic diversities still differ among autosomes, X and Y in mammals (Hammer, et al. 2001; Wang, et al. 2014). Although selection has been frequently invoked (Charlesworth and Charlesworth 2000; Bachtrog 2013; Cortez, et al. 2014; Wilson Sayres, et al. 2014; Makova, et al. 2023), the issue is if genetic drift due to *V*(*K*) differences between sexes is a sufficient explanation. A third paradox is about drift strength when selection is also at work. A new mutation with a selective advantage of *s* would always be fixed if there is no random drift. The fixation probability with genetic drift is 2*s*/*V*(*K*) (see Results). Drift under selection is hence dependent on *V*(*K*) but, somewhat non-intuitively, not on *N*.

The power of the Haldane model in resolving these paradoxes is the focus of this study. Furthermore, the companion study (Wang et al. 2024) will address a larger issue - the evolution of biological systems. An example is evolution in multi-copy gene systems, whereby evolution proceeds in two stages - between as well as within individuals. Multi-copy gene systems include viruses, mitochondria, transposons and multi-copy genes such as ribosomal RNA (rRNA) repeat units, which are conceptually the WF model. Strictly speaking, even diploidy is also a multi-copy system as a diploid individual usually produces large quantities of gametes that compete to be transmitted (Silver 1985; Wu, et al. 1988; Wu, et al. 1989; Lindholm, et al. 2016; Courret, et al. 2019). In discussion, we will further explore systems of biological evolution that are beyond the power of the WF Model. They include the evolution under domestication, somatic cell evolution and stochastic species competition.

Through theory development supported by experimental data, we present the integration of the Haldane and the WF models into the WFH model. The integrated model is essentially the Haldane model with the approximate solutions obtained by the WF model. Cases where the WF model is not operative will still be handled by the Haldane model as in multi-copy gene systems. In short, the WF model can be considered special cases of the integrated WFH model. Therefore, the wealth of results obtained by the WF model are usually adequate approximations. Nevertheless, paradoxes do emerge as presented in this study.

# Results

In the WFH model, genetic drift is defined as in the Haldane model of branching process (see Eq. (4) of Chen, et al. (2017). Let *K* be the number of progenies receiving the gene copy of interest from a parent; *K* would follow the same distribution for all neutral variants. In haploids, *K* is equivalent to the progeny number of each parent. In diploids, *K* is the transmission success of each gene copy, which is separately tracked. By using 2*N* for the gene copy number in the population, the WF as well as Haldane model treats each diploid individual as two-haploids. By the Taylor approximations of the ratio of two variables (Kendall, et al. 2006), we obtain the approximation as

where *x* is the variant frequency (see Supplementary Information). This is the same approximation, when *V*(*K*) is not very large, as obtained by the WF model (Kimura and Crow 1964; see Chen et al. 2017 for details). For Eq. (2), we provide further extensive simulations on the accuracy in both the fixation probability and fixation time of neutral variants (see Supplementary Information). The equivalence makes a simple point that the WF model represent special cases of the WFH model, and the results are often, but not always, shared.

In the WFH model, there would be no genetic drift if *V*(*K*) = 0, regardless of the value of *N*. When *E*(*K*) = 1 and *N* stays constant, we obtain

While Eqs. 2 and 3 are biologically compatible with the Haldane model, their application to the WF model, as has been done(Kimura and Crow 1963; Crow and Denniston 1988; Chen, et al. 2017), is dubious. This is because the sampling scheme of the WF model dictates Poisson distribution for *K*, hence, *V*(*K*) = *E*(*K*). To permit *V*(*K*) ≠ *E*(*K*), the WF model would have to be modified to resemble the Haldane model(Kimura and Crow 1963; Caballero 1994). Such an awkward “fix” in fact defeats the reason for adopting the simpler WF model over the Haldane model

In Table 1, the data of progeny number from the literature show over-dispersion in all taxa surveyed, i.e., *V*(*K*) > *E*(*K*). (Strictly speaking, the progeny number is the sum of two numbers from the same distribution of K; nevertheless, *V*(*K*) > *E*(*K*) is a valid interpretation of the data.) In fact, *V*(*K*) > 10 *E*(*K*) can often be found, e.g., the ratio of *V*(*K*)/*E*(*K*) among males of the mandrill (*Mandrillus sphinx*), the great reed warbler (*Acrocephalus arundinaceus*), and the rhesus macaque (*Macaca mulatta*) at 19.5, 12.6 and 11.3, respectively (Hasselquist 1995; Setchell, et al. 2005; Dubuc, et al. 2014). We have also found that *V*(*K*) may be far larger than *E*(*K*) in other biological systems such as viruses (Ruan, Luo, et al. 2021; Ruan, Wen, et al. 2021; Hou, et al. 2023).

## I. The paradox of genetic drift in relation to *N* changes (Paradox of changing *N*)

### 1. Empirical demonstration and simulation

This “paradox of changing *N*” is that “genetic drift increases when *N* increases”. Fig. 1a shows the results in a cell culture in the exponential growth phase where each cell doubles every 13 hours. Hence, *V*(*K*) ∼ 0 with almost no drift when *N* is as small as < 50. Genetic drift is expected to increase as *N* approaches the carrying capacity. The paradox is exemplified through computer simulations in Fig. 1b-c. These panels show the drift pattern in the Haldane model and the WF model, respectively, when the populations are growing as the logistic growth model (see Methods). In the WF model, the drift is strong initially but weakened as *N* increases. The pattern is reversed in the Haldane model. The contrast between the two panels is the paradox.

To verify the simulations of Fig. 1b-c, we study the growth of yeast populations under laboratory conditions where cells appear to increase following the logistic growth curve. As shown in Fig. 1d, the lower portion of the S-curve portrays the exponential growth in a low-density population, while the upper curve indicates a slowdown in growth near the population’s carrying capacity. We directly recorded the individual output (*K*) of each yeast cell in the early (*n* = 25) and late (*n* = 65) stage of population growth under real-time high-content fluorescence imaging (Methods).

It’s evident that *V*(*K*) is decoupled from *E*(*K*). In the early stage, *E*(*K*) is 10.04 with *V*(*K*) = 0.60 over five one-hour intervals. In the high-density phase, *E*(*K*) decreases to 2.83 but *V*(*K*) increases to 1.09 in four intervals (Fig. 1e-f). Most interestingly, during the late stage of population growth, *V*(Δ*K*) (adjusted to the four-hour time span for comparison; see Methods) indeed increases as *N* increases (**Supplementary Table 3**). At the last time point, the population is closest to the carrying capacity and *V*(Δ*K*) experiences a substantial leap from around 1.12 to 1.82. Overall, the variance in progeny number *V*(*K*) in a high-density population is always greater that in a low-density population (Fig. 1g). Indeed, *V*(*K*) may exhibit an increase with the increase in *N*, sometimes outpaces the latter.

Measurements of these two growth phases demonstrate the “paradox of changing *N*” by showing i) there is almost no drift when *N* is very small; ii) as *N* increases, *V*(*K*), hence, genetic drift would also increase. Furthermore, *V*(*K*) may increase sharply when *N* is close to the carrying capacity. This trend appears to echo the field observations by Coltman, et al. (1999) who show that *V*(*K*) increases more than 5-fold when *N* increases only 2.5-fold in the United Kingdom (UK) population of Soay rams (*Ovis aries*).

### 2. The density-dependent Haldane (DDH) model

The “paradox of changing *N*” can be explained by *N*_{e} = *N*/*V*(*K*) if the numerator and denominator move in the same direction. Most interestingly, *V*(*K*) may increase more than *N* itself. This has been shown when *V*(*K*) is close to zero while *N* is growing exponentially. In addition, when *N* is very near the carrying capacity, even a small increase in *N* would intensify the competition and inflate *V*(*K*). This is shown in the yeast experiment of Fig. 1d and supported by Coltman, et al. (1999) field study of ram populations in UK. Thus, the relationship between *N* and *V*(*K*) may realistically lead to this paradox.

We now extend the Haldane model by incorporating density-dependent regulation of *N* in order to define the conditions of this paradox. The model developed here is on an ecological, rather than an evolutionary time scale. The simplest model of *N* regulation is the logistic equation:

with *C*_{k} being the carrying capacity. In the ecological time scale, a changing *N* would mean that the population is departing from *C*_{k} or moving toward it (Fig. 2a). (As will be addressed in Discussion, changing *N* in the WF model is depicted in Fig. 2b whereby *N* is at *C*_{k}. *C*_{k} may evolve too, albeit much more slowly in an evolutionary time scale.) Here, we consider changes in *N* in the ecological time scale. Examples include the exponential growth when *N* ≪ *C*_{k}, seasonal fluctuation in *N* (Frankham 1995; Charlesworth 2009) and competition modeled by the Lotka-Volterra equations (Bomze 1983).

Details of the DDH model are given in the Supplementary Information. A synopsis is given here: We consider a haploid population with two neutral alleles. The population size at time *t* is *N*_{t}. It is assumed that *E*(*K*) > 1 when *N* < *C*_{k} and *E*(*K*) < 1 when *N* > *C*_{k} as defined in Eq. (5) below.

The slope of *E*(*K*) vs. *N*, as shown in Fig 3a, is a function of *z*. To obtain *V*(*K*), we assume that *K* follows the negative binomial distribution whereby parents would suffer reproduction-arresting injury with a probability of *p*_{t} at each birthing (Supplementary Information). *V*(*K*) can then be expressed as

By Eq. (6), the ratio of *V*(*K*)/*E*(*K*) could be constant, decrease or increase with the increase of population size. With *E*(*K*) and *V*(*K*) defined, we could obtain the effective population size by substituting Eq. (5) and Eq. (6) into Eq. (3).

Eq. (7) presents the relationship between effective population size (*N*_{e}) and the population size (*N*) as shown in Fig. 3. The density-dependent *E*(*K*) could regulate *N* with different strength (Fig. 3a). The steeper the slope in Fig. 3a, the stronger the regulation.

The main results of this study are depicted in Fig. 3b. First, with no or little regulation of *N, N*_{e} and *N* are strongly correlated. The green dashed lines portray the conventional view of decreasing drift with increasing *N*. Second, under a particular strength of *N* regulation, the red line shows no dependence of *N*_{e} on *N*, meaning that genetic drift is independent of *N*. Finally, as *N* becomes strongly regulated, *N*_{e} and *N* would be negatively correlated as the blue dashed line shows. This trend is the paradox of changing *N*.

In summary, genetic drift effect can indeed decrease, increase or stay nearly constant as the population size increases, depending on the strength of *N* regulation near *C*_{k}. We further note that the *V*(*K*)/*E*(*K*) ratio of Eq. (7) can be independent (Fig. 3c), positively dependent (Fig. 3d), or negatively dependent (Fig. 3e) on *N*. Interestingly, the results of Fig. 3b are nearly identical when the *V*(*K*)/*E*(*K*) ratio changes (**Supplementary Figs 1-3**). These results show that the strength of genetic drift depends on the ecology that governs *E*(*K*), *V*(*K*) and *N*.

In the WFH framework, the paradox of changing *N* may not be an exception but a common rule in natural populations. Note that the WF approximation yielding Eq. (2) assumes nearly constant *N*. The assumption would imply very strong *N* regulation near *C*_{k} and, according to the DDH model, this is the condition for realizing the paradox of changing *N*. Coltman et al. (1999)’s observations of the reproduction in rams may be such a realization.

## II. The paradox of genetic drift in sex chromosomes

Since the relative numbers of Y, X and each autosome (A) in the human population are 1:3:4, the WF model would predict the genetic diversity (θ_{Y}, θ_{X} and θ_{A}, respectively) to be proportional to 1:3:4. In a survey of human and primate genetic diversity, θ_{Y} is almost always less than expected as has been commonly reported (Hammer, et al. 2001; Wang, et al. 2014; Wilson Sayres, et al. 2014; Makova, et al. 2023). The low θ_{Y} value has been used to suggest that human Y chromosomes are under either positive or purifying selection (Wang, et al. 2014; Wilson Sayres, et al. 2014; Makova, et al. 2023). Similarly, the reduced diversity X-linked genes was interpreted as a signature of positive selection (Pan, et al. 2022).

As pointed out above, under-estimation of genetic drift is a major cause of over-estimation of selective strength. Our goal is hence to see if genetic drift of different strength between sexes can account for the observed genetic diversities. Details will be presented in Wang et al. (2024). Below is a synopsis.

Let *V*_{m} and *V*_{f} be the *V*(*K*) for males and females respectively and α*’* = *V*_{m}/*V*_{f} is our focus. (We use α*’* for the male-to-female ratio in *V*(*K*) since α is commonly used for the ratio of mutation rate (Miyata, et al. 1987; Makova and Li 2002).) The three ratios, θ_{Y} /θ_{X} (denoted as *R*_{YX}), Θ_{y} /θ_{A} (*R*_{YA}) and θ_{X} /θ_{A} (*R*_{XA}), could be expressed as the functions of α*’*, which incorporate the relative mutation rates of autosomes, X and Y (Supplementary Information). These relative mutation rates can be obtained by interspecific comparisons as done by Makova and Li (2002).

We note that there are many measures for the within-species genetic diversity, θ = 4*N*_{e}μ. Under strict neutrality and demographic equilibrium, these measures should all converge. In the neutral equilibrium, the infinite site model dictates the frequency spectrum to be ξ_{i} = θ/*i*, where ξ_{i} is the number of sites with the variant occurring *i* times in *n* samples. Since every frequency bin is a measure of θ, different measures put different weights on the *i*-th bin (Fay and Wu 2000; Fu 2022). While π, the mean pairwise differences between sequences, is most commonly used in the literature, we use several statistics to minimize the possible influences of selection and demography (Wang et al. 2024). In this synopsis, we used the Watterson estimator (Watterson 1975) as the measure for θ_{Y}, θ_{X} and θ_{A} by counting the number of segregating sites.

Using any of the three ratios (θ_{Y} /θ_{X}, θ_{Y} /θ_{A} and θ_{X} /θ_{A}), we can obtain α*’*; for example, *R*_{YA} = θ_{Y} /θ_{A} and, hence,

where *y* is the mutation rate of Y-linked sequences relative to autosomal sequences. With rearrangement,

Similar formulae of α*’* can be obtained for *R*_{YX} and *R*_{XA} but the accuracy for estimating α*’* is highest by *R*_{YA} whereas *R*_{XA} is the least accurate for this purpose (Supplementary Information).

Table 2 presents the α*’* estimates in chimpanzees and bonobos. This is part of a general survey in mammals with a strong emphasis on primates and humans (Wang et al. 2024). It is almost always true that α*’* > 5 in primates. Sources of data used are also given in Table 2. As shown for chimpanzees, α*’* is often far larger than 5, above which the resolution is very low as can be seen in Eq. (8). Note that, when *R*_{YA} is under-estimated and approaching *y*/8, α*’* would increase rapidly when the denominator is close to 0. That is why α*’* often becomes infinity (**Supplementary Fig. 5**). The estimated α*’* (MSE) in Table 2 alleviates the problem somewhat by using all three ratios to calculate the mean square error (MSE) (Supplementary Information).

Among primates surveyed, bonobo is the only exception with α*’* < 1. While chimpanzees and bonobos are each’s closest relatives, their sexual behaviors are very divergent (de Waal 1995; De Waal and Lanting 2023). With unusually strong matriarchs, bonobo society seems to stand out among primates in sexual dominance. The α*’* value is important in behavioral studies and, particularly, in primatology and hominoid research. If 0.1% of males sire 100 children and the rest have the same *K* distribution as females, *V*_{m}/*V*_{f} would be ∼ 10. Such outlier contribution is the equivalent of super-spreaders in viral evolution which may easily be missed in field studies. In this brief exposition, we highlight again that *V*(*K*) or, more specifically, *V*_{m} and *V*_{f} are key to genetic drift.

## III. The paradox of genetic drift under selection

Genetic drift operates on neutral as well as non-neutral mutations. Let us assume a new mutation, M, with a frequency of 1/*N* is fixed in the population with the probability of *P*_{f}. Fisher (1930) first suggested that the fixation probability of an advantageous mutation should increase in growing populations, while decrease in shrinking populations. If there is no genetic drift (i.e., infinite population size in WF model), a beneficial mutation will always be fixed and *P*_{f} = 1. In the WF model, it is well known that *P*_{f} ∼ 2*s* for a new advantageous mutation, with fitness gain of *s*, while population size is large (Haldane 1932; Kimura 1962). This seems paradoxical that the determinant of genetic drift, 1/*N*_{e}, does not influence *P*_{f} (Otto and Whitlock 1997; Lanfear, et al. 2014).

In the Method section, we show that *P*_{f} under the Haldane model is given by

where *M*_{0} and *W*_{0} are the initial number of mutant allele (M allele) and wildtype allele (W allele), with *N* = *M*_{0} + *W*_{0}. We note that *u*_{M}(*t*) and *u*_{W}(*t*), respectively, are the extinction probabilities of M allele and W allele by generation *t* with the initial number of alleles of 1. While obtaining the direct analytical solution of Eq. (9) may not be feasible, we could obtain its numerical solution due to its convergence as *t* increases (see Methods). The accuracy of the numerical solution from Eq. (9) is confirmed through simulation (**Supplementary Fig. 4**). Moreover, with the aid of the WF model (see Supplementary Information), we could obtain the approximation of Eq. (9) as follows.

We verify Eq. (10) by both the numerical solution from Eq. (9) and simulations based on the branching process (**Supplementary Fig. 4**). The fixation probabilities obtained by numerical solution vs. those inferred from Eq. (10) are shown in Fig. 4. The salient feature is that the fixation probability of 2*s*, as in the classical formula, would be a substantial over-estimate when *V*(*K*) is larger than *E*(*K*). Eq. (10) is sufficiently accurate as long as *N*≥50. When *N* is as small as 10, the theoretical result is biased. Indeed, at such a low *N* value, the population is prone to extinction. The DDH model presented above should rectify this deficiency. The main message of Eq. (10) is that genetic drift under positive selection is influenced by *V*(*K*) but not by *N*. Hence, *V*(*K*) is a vital factor in determining genetic drift, certainly no less important than 1/*N*.

# Discussion

Genetic drift should broadly include all random forces affecting evolutionary trajectories. Many of these forces are excluded from the WF model, thus resulting in the many paradoxes addressed in the two companion papers, as well as the over-estimation of selection in the literature.

Compared with the conventional WF model of genetic drift, the Haldane model is conceptually more intuitive as well as more inclusive in depicting the real populations. However, analytical solutions of the branching process can be more challenging to obtain. The WF model is often more tractable mathematically. For example, *P*_{f} ∼ 2s/*V*(*K*) is a good WF approximation (Kimura and Crow 1963) to the Haldane model when N is no smaller than 100 (Fig. 4). The integrated WFH model is therefore conceptually appealing and computationally feasible.

Nevertheless, in many circumstances where the WF model is not operative, genetic drift defined by the Haldane model would demand new solutions. The DDH model is such an example, developed to resolve the paradox of changing *N*’s. The DDH model presents the conditions under which the WF model arrives in the approximate solution (i.e., Eq. (2)). The substitution of *E*(*N*) for *N* makes the approximation possible but effectively requires strong regulation of *N*. Strong regulation is the main condition for the paradox of changing *N*’s. To arrive at the approximation, the WF model in fact makes an assumption that contradicts its own prediction (that drift becomes weaker as *N* increases).

We shall now clarify the “paradox of changing *N*’s” on two fronts. First, the DDH model is on an ecological time scale whereby a population is either approaching or departing the carrying capacity (Fig. 2a). In many studies, the population size changes in the evolutionary time scale (Fig. 2b). In such studies, it is the carrying capacity *C*_{k} that is evolving while *N* hovers around *C*_{k}. For example, by the PSMC model (Li and Durbin 2011), both the European and Chinese populations experience a severe bottleneck 10-60 kyr ago. During that Last Glacial Maximum, environmental harshness may have reduced *C*_{k} drastically. Averaged over the long-time span, *N* should be at or very near *C*_{k}. Since *C*_{k} presumably evolves slowly, *N* can be assumed constant over long stretches of time. Later models with different powers than PSMC have been proposed to study *N*’s (or *C*_{k}’s) in the evolutionary time scale (Hu, et al. 2023). Second, Gillespie has proposed the concept of “genetic draft” that may also reverse the effect of population size on the rate of substitution due to selection on linked variations (Gillespie 2000, 2001). Genetic draft of Gillespie is thus distinct from the strictly neutral process discussed in this study.

A larger issue is that the WF model only covers a portion of biological evolution. All multi-copy gene systems where the evolution happens both within and between individuals, are out of reach. Viruses, transposons, mitochondria (Bazin, et al. 2006) and rRNA genes are all such systems. This topic will be addressed in the companion paper that focuses on the evolution of multi-copy rRNA genes. In fact, all diploids are the simplest kinds of the multi-copy gene systems. Because large quantities of gametes compete within individual to be transmitted, gametic selection, meiotic drive and genetic drift may all happen within individuals (Wu, et al. 1988; Wu, et al. 1989; Wu and Hammer 1991; Lindholm, et al. 2016; Courret, et al. 2019). The WF model is in essence a haploid model with 2*N* copies of genes rather than a true diploid model.

Like multi-copy gene systems, the WF model is limited, conceptually as well as operationally, by the difficulty in delineating WF populations. That is why multi-copy genes, as well as many other systems, do not fit the WF model, thus resulting in the lack of a genetic drift component in such studies. For example, domestications of animals and plants are very rapid processes of evolution (Diamond 2002; Larson and Fuller 2014; Purugganan 2019). Due to the very large *V*(*K*) in domestication, drift must have played a large role in such processes, but is rarely considered. Somatic cell evolution is another example with “undefinable” genetic drift (Wu, et al. 2016; Chen, et al. 2017; Chen, et al. 2019; Ruan, et al. 2020; Chen, Wu, et al. 2022). The Haldane (or WFH) model, as an “individual output” model, can handle these general cases of genetic drift. Finally, the WFH model is so general that even genetic drift and random forces in species competition can be incorporated.

In conclusion, the integrated WFH model should clarify the biological meaning of genetic drift, wherever the WF and Haldane model can both be used. In those systems whereby only the Haldane model works, further development of the model will be necessary.

# Methods

## Cell culture and image analysis

NIH3T3 cells, a fibroblast cell line that was isolated from a mouse NIH/Swiss embryo, were stably transfected with the fluorescent, ubiquitination-based cell cycle indicator (Fucci) (Sakaue-Sawano, et al. 2008) plasmid using Lipofectamine 3000 Transfection Reagent (Invitrogen) following the manufacturer’s specified instructions. The Fucci-labeled cells exhibited distinct fluorescent signals indicative of G1, S, and G2/M phases, represented by red, yellow, and green, respectively. NIH3T3-Fucci cell was derived from single-cell colony and cultured in DMEM supplemented with 10% Calf Bovine Serum and penicillin/streptomycin. All cells were maintained at 37°C with 5% CO_{2}. Subsequently, the cells underwent extended time-lapse imaging using high-content fluorescence microscopy (PerkinElmer Operetta CLS) equipped with a 10x objective lens. Images were captured hourly over a 100-hour period, and the analysis was conducted using ImageJ (Fiji) (Schneider, et al. 2012) to count the number of cells (**Supplementary Table 1**).

## Yeast strain construction

Strains were constructed on the genetic background of *Saccharomyces cerevisiae* strain BY4741. A GFP or BFP fluorescent protein expression cassette, under the control of the TDH3 promoter, was inserted into the pseudogene locus YLL017W. Transformations followed a published protocol (Gietz and Schiestl 2007). Transformants were plated on synthetic complete medium without uracil (SC-Ura), and from these, single colonies were selected. Confirmation of replacements was achieved through PCR, and cassette verification was performed using fluorescence microscopy. Subsequently, the constructed strains were cultivated non-selectively in YPD medium (1% Yeast extract, 2% Peptone, 2% D-glucose) at 30 _ on a rotary shaker.

## Estimation of *V*(*K*) and *E*(*K*) in yeast cells

To discern division events, even under high concentration, we conducted co-cultures of the GFP-yeast and BFP-yeast at ratios of 1:1 and 1:25, with the initial cell concentration of 0.1% and 12.5%, respectively. Then the yeast cells were then continuously imaged under high-content fluorescence microscopy (PerkinElmer Operetta CLS) for 10 hours with 1-hour intervals to observe the individual offspring of GFP-yeast. Yeast cells with a distinct offspring number (*K*) within the initial 5 hours (with the high-density group limited to the first 4 hours) were documented (**Supplementary Table 2**). Subsequently, the mean and variance of *K* for each cell were calculated across various time intervals as follows.

According to the law of total variance,

where *I*_{t} is the offspring number at time *t* for an initial single cell (i.e., the total number of progeny cells for a single cell after *t* hours), as documented in **Supplementary Table 2**. And *E*_{t-1, t} represents the average of offspring number after single time interval (1 hour here) for each single cell from time *t*-1 to time *t*, while *V*_{t-1, t} is the corresponding variance. Utilizing the documented total cell count at time *t* (denoted by *N*_{t}), we could calculate the average of offspring number for a single yeast cell in a specific time interval, e.g., *E*_{t-1, t} = *N*_{t} / *N*_{t-1} and *E*[*I*_{t-1}] = *N*_{t} /*N*_{0}. With some rearrangement,

Applying the aforementioned equations, we observed that both *V*_{t-1, t} (i.e., *V*(*K*) within a one-hour interval) and *V*(*I*_{t}) (i.e., the *V*(*K*) from 0 to time *t*) for yeast cells under high density (12.5%) are generally greater than the values under low density (**Supplementary Table 3**). This suggests that the variance of offspring number in high-density conditions could be larger than that in a low-density context. It’s noteworthy that the estimated value of *V*_{t-1, t} could be less than zero (**Supplementary Table 3**), implying a very small variance in the number of offspring. Furthermore, the occurrence of negative values for *V*_{t-1, t} is more frequent under low density than high density, indicating the higher variance of offspring number in high density as previously suggested.

To show the variance of offspring number overtime, we also calculated the change of offspring number within successive one-hour intervals, denoted as *V*(*I*_{t} - *I*_{t-1}). This value is intricately linked to the variance of offspring overtime. To facilitate the comparison with the total variance from 0h to 4h, we set *V(*Δ*K*) = 4*V*(*I*_{t} - *I*_{t-1}) to account for the four-hour time span.

## Simulation of genetic drift in the Haldane model and the Wright-Fisher (WF) model

In both models, the average number of offspring, denoted as *E*(*K*_{t}), follow Eq. (5), leading to the logistic population growth. The initial population size is set to 4, with initial gene frequency (i.e., the frequency of mutant allele) of 0.5. And the carrying capacity is established at 100, 000, with *r* = 1 and *z* = 1. In WF model, the population size at next generation is *N*_{t+1} = *N*_{t} ×*E*(*K*_{t}). Note if the calculated value of *N*_{t+1} is not an integer, we round it to the nearest whole number. The number of mutant alleles follows a binomial distribution, allowing the simulation of gene frequency overtime in WF model. For Haldane model, the number of offspring number is assumed to follow a beta-binomial distribution. The mean of offspring number *E*(*K*_{t}) is obtained from Eq. (5). The variance *V*(*K*_{t}) is obtained from Eq. (6) with parameters *a* = 1/300 and *b* = 1.2. Given the mean and variance of the beta-binomial distribution, we simulated the number of mutant alleles and population size overtime and then traced the gene frequency overtime.

## Fixation probability of a new mutation in Haldane model

Here we obtain the fixation probability of advantageous mutation in Haldane model governed by a branching process. The Haldane model considers a well-mixed population of *N*_{t} haploid parents with only two types of alleles (W for wildtype, M for mutant). There is no migration and new mutations in this model. Each individual is assumed to independently reproduce in discrete and non-overlapping generation, *t* = 0, 1, 2, …. Thus, the number of offspring per allele (also an individual in this haploid population) at any generation is represented by a set of identically and individually distributed random variables. In particular, the numbers of offspring of M and W alleles (denoted as *K*_{M} and *K*_{W} respectively) can be represented by following distributions.

For a particular distribution, we can obtain the average number of offspring as follows.

With the selection coefficient of *s* (0 for neutral mutation, positive for advantageous mutation, and negative for deleterious mutation) of M allele, average offspring number of M allele and W allele will follow the relationship.

Now, the evolution of the number of M allele (denoted as *M*_{t}) and the number of W allele (denoted as *W*_{t}) as time process is a branching process.

To compare Haldane model with WF model (the offspring number follows Poisson distribution with variance and mean equal to 1), we will set *E*(*K*_{W}) = 1, *E*(*K*_{M}) = 1 + *s*. And then let the offspring number follows a more general and realistic distribution (including negative binomial distribution (negBin), beta-binomial distribution (betaBin)), which can let the ratio of variance to mean range from 1 to a large number. (For simplicity, we let *V*(*K*_{M})/*E*(*K*_{M}) *= V*(*K*_{W})/*E*(*K*_{W}).) Based on the branching process, we obtained the fixation probability M alleles (see Supplementary Information for more details on the derivation).

where

Note *u*_{M}(*t* = 1) = *P*(*K*_{M} = 0) = *i*_{0}, *u*_{W}(*t* = 1) = *P*(*K*_{W} = 0) = *j*_{0}. And both {*u*_{M}(*t*), *t* = 0, 1, 2, …} and {*u*_{W}(*t*), *t* = 0, 1, 2, …} are a bounded monotonic sequence. Thus, although we cannot obtain the direct analytical solution for *P*_{f}, we could easily obtain their numerical solution by iteration to the case when both *u*_{M}(*t*) and *u*_{W}(*t*) converge.

# Acknowledgements

We thank Weiwei Zhai, Xionglei He for helpful comments. The work was supported by the National Natural Science Foundation of China (32200493 to Y.R., 32293193 and 32150006 to C.I.W., 81972691 to H.W.), the National Key Research and Development Projects of the Ministry of Science and Technology of China (2021YFC2301300, 2021YFC0863400).

# Competing interest statement

Authors declare that they have no competing interests.

# References

- Y-chromosome evolution: emerging insights into processes of Y-chromosome degeneration
*Nat. Rev. Genet***14**:113–124 - Population size does not influence mitochondrial genetic diversity in animals
*Science***312**:570–572 - Lotka-Volterra equation and replicator dynamics: A two-dimensional classification
*Biol. Cybern***48**:201–211 - Developments in the prediction of effective population size
*Heredity***73**:657–679 - Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation
*Nat. Rev. Genet***10**:195–205 - The degeneration of Y chromosomes
*Philos Trans R Soc Lond B Biol Sci***355**:1563–1572 - Tumorigenesis as the Paradigm of Quasi-neutral Molecular Evolution
*Mol. Biol. Evol***36**:1430–1441 - Very large hidden genetic diversity in one single tumor: evidence for tumors-in-tumor
*Natl. Sci. Rev***9** - Two decades of suspect evidence for adaptive molecular evolution-negative selection confounding positive-selection signals
*Natl. Sci. Rev***9** - A New Formulation of Random Genetic Drift and Its Application to the Evolution of Cell Populations
*Mol. Biol. Evol***34**:2057–2064 - Density-dependent variation in lifetime breeding success and natural and sexual selection in Soay rams
*Am. Nat***154**:730–746 - Origins and functional evolution of Y chromosomes across mammals
*Nature***508**:488–493 - Meiotic drive mechanisms: lessons from Drosophila
*Proc Biol Sci***286** - Inbreeding and Variance Effective Population Numbers
*Evolution***42**:482–495 - An Introduction to Population Genetics Theory
- Bonobo sex and society
*Sci. Am***272**:82–88 - Bonobo: The forgotten ape
- Evolution, consequences and future of plant and animal domestication
*Nature***418**:700–707 - Variance in male lifetime reproductive success and estimation of the degree of polygyny in a primate
*Behav. Ecol***25**:878–889 - Coalescent processes when the distribution of offspring number among individuals is highly skewed
*Genetics***172**:2621–2633 - Hitchhiking under positive Darwinian selection
*Genetics***155**:1405–1413 - The genetical theory of natural selection
- Effective population size/adult population size ratios in wildlife: a review
*Genetics Research***66**:95–107 - Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection
*Genetics***147**:915–925 - Variances and covariances of linear summary statistics of segregating sites
*Theor. Popul. Biol***145**:95–108 - High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method
*Nat Protoc***2**:31–34 - Genetic drift in an infinite population. The pseudohitchhiking model
*Genetics***155**:909–919 - Is the population size of a species relevant to its evolution?
*Evolution***55**:2161–2169 - Natural selection for within-generation variance in offspring number II. Discrite haploid models
*Genetics***81**:403–413 - The relative value of the processes causing evolution
- The Causes of Evolution
- A mathematical theory of natural and artificial selection, part V: selection and mutation
- Great ape Y Chromosome and mitochondrial DNA phylogenies reflect subspecies structure and patterns of mating and dispersal
*Genome Res***26**:427–439 - Hierarchical patterns of global human Y-chromosome diversity
*Mol. Biol. Evol***18**:1189–1203 - Principles of Population Genetics
- Demography and Lifetime Reproductive Success in the Polygynous Great Reed Warbler
*Japanese Journal of Ornithology***44**:181–194 - Lifetime reproductive success of Snowy Plovers in coastal northern California
*The Condor***117**:473–481 - Intra-vs. Interhost Evolution of SARS-CoV-2 Driven by Uncorrelated Selection-The Evolution Thwarted
*Mol. Biol. Evol***40** - Genomic inference of a severe human bottleneck during the Early to Middle Pleistocene transition
*Science***381**:979–984 - Kendall’s advanced theory of statistics
- On the probability of fixation of mutant genes in a population
*Genetics***47**:713–719 - The measurement of effective population number
*Evolution*:279–288 - Lifetime Reproductive Success in Common Buzzard, Buteo buteo: From Individual Variation to Population Demography
*Oikos***93**:260–273 - Age-related and individual differences of reproductive success in male and female barbary macaques (Macaca sylvanus)
*Primates***36**:461–476 - Population size and the rate of evolution
*Trends Ecol. Evol***29**:33–41 - The Evolution of Animal Domestication
*Annual Review of Ecology, Evolution, and Systematics***45**:115–136 - Inference of human population history from individual whole-genome sequences
*Nature***475**:493–496 - Molecular evolution
- The Ecology and Evolutionary Dynamics of Meiotic Drive
*Trends Ecol. Evol***31**:315–326 - Genetic drift, selection and the evolution of the mutation rate
*Nat. Rev. Genet***17**:704–714 - Strong male-driven evolution of DNA sequences in humans and apes
*Nature***416**:624–626 - The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes
*bioRxiv* - Male-driven molecular evolution: a model and nucleotide sequence analysis
*Cold Spring Harb Symp Quant Biol***52**:863–867 - Extreme selective sweeps independently targeted the X chromosomes of the great apes
*Proc Natl Acad Sci U S A***112**:6413–6418 - Lifetime reproductive success in the spotted sandpiper (Actitis macularia) : sex differences and variance components
*Behav. Ecol. Sociobiol***28**:425–432 - The probability of fixation in populations of changing size
*Genetics***146**:723–733 - Lineage-specific positive selection on ACE2 contributes to the genetic susceptibility of COVID-19
*Natl. Sci. Rev* - Evolutionary Insights into the Nature of Plant Domestication
*Curr. Biol***29**:R705–R714 - Lifetime Reproductive Success and its Correlates in the Monogamous Rodent, Peromyscus californicus
*The Journal of Animal Ecology***61**:457–468 - On the founder effect in COVID-19 outbreaks: how many infected travelers may have started them all?
*Natl. Sci. Rev* - Mutations Beget More Mutations-Rapid Evolution of Mutation Rate in Response to the Risk of Runaway Accumulation
*Mol. Biol. Evol***37**:1007–1019 - A theoretical exploration of the origin and early evolution of a pandemic
*Sci Bull (Beijing)***66**:1022–1029 - Inferring Demography and Selection in Organisms Characterized by Skewed Offspring Distributions
*Genetics***211**:1019–1028 - Visualizing spatiotemporal dynamics of multicellular cell-cycle progression
*Cell***132**:487–498 - NIH Image to ImageJ: 25 years of image analysis
*Nat Methods***9**:671–675 - Sexual selection and reproductive careers in mandrills (Mandrillus sphinx)
*Behav. Ecol. Sociobiol***58**:474–485 - Mouse t haplotypes
*Annu. Rev. Genet***19**:179–208 - Natural selection on human Y chromosomes
*J Genet Genomics***41**:47–52 - On the number of segregating sites in genetical models without recombination
*Theor. Popul. Biol***7**:256–276 - Natural selection reduced diversity on human y chromosomes
*PLoS Genet***10** - Evolution and the Genetics of Populations, Volume 2: Theory of gene frequencies
- Evolution in Mendelian Populations
*Genetics***16**:97–159 - Evolution at the molecular level
- Association between a satellite DNA sequence and the Responder of Segregation Distorter in D. melanogaster
*Cell***54**:179–189 - Fitness reduction associated with the deletion of a satellite DNA array
*Nature***341**:248–251 - The Ecology and Evolution of Cancer: The Ultra-Microevolutionary Process
*Annu. Rev. Genet***50**:347–369 - Natal conditions, lifespan and lifetime reproductive success of European blackbirds
*Behav. Ecol***30**:1707–1714

# Article and author information

## Version history

- Sent for peer review:
- Preprint posted:
- Reviewed Preprint version 1:

## Copyright

© 2024, Ruan et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

# Metrics

- views
- 197
- downloads
- 0
- citations
- 0

Views, downloads and citations are aggregated across all versions of this paper published by eLife.