The Generalized Haldane (GH) model tracking population size changes and resolving paradoxes of genetic drift

  1. State Key Laboratory of Biocontrol, Innovation Center for Evolutionary Synthetic Biology, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  2. Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
  3. Institute of Zoology, Chinese Academy of Sciences, Beijing, China

Peer review process

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.


Editors

  • Reviewing Editor
    Ziyue Gao
    University of Pennsylvania, Philadelphia, United States of America
  • Senior Editor
    George Perry
    Pennsylvania State University, University Park, United States of America

Reviewer #1 (Public review):

The revision by Ruan et al clarifies several aspects of the original manuscript that were difficult to understand, and I think it presents some useful and interesting ideas. I understand that the authors are distinguishing their model from the standard Wright-Fisher model in that the population size is not imposed externally, but is instead a consequence of the stochastic reproduction scheme. Here, the authors chose a branching process but in principle any Markov chain can probably be used. Within this framework, the authors are particularly interested in cases where the variance in reproductive success changes through time, as explored by the DDH model, for example. They argue with some experimental results that there is a reason to believe that the variance in reproductive success does change over time.
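The distinction drawn above, that population size emerges from reproduction rather than being imposed, can be illustrated with a minimal Python sketch of a branching process with negative binomial offspring numbers. The functions `mean_k` and `var_k`, the carrying capacity `cap`, and the growth rate `r` are hypothetical illustrations only; they are not the manuscript's equations 5 and 6, which are not reproduced in this review.

```python
import numpy as np

def nb_offspring(rng, mean, var, size):
    """Draw offspring numbers K with the given mean and variance (needs var > mean)."""
    p = mean / var                   # NumPy's success probability
    n = mean * mean / (var - mean)   # NumPy's (possibly non-integer) size parameter
    return rng.negative_binomial(n, p, size=size)

# Hypothetical regulation, chosen only for illustration (NOT the paper's Eqs. 5-6):
cap, r = 500.0, 0.5

def mean_k(n_total):
    # Mean offspring number falls toward 1 as N approaches the carrying capacity.
    return max(0.05, 1.0 + r * (1.0 - n_total / cap))

def var_k(n_total):
    # Variance grows with N, so drift per individual is stronger at larger N.
    return mean_k(n_total) * (1.0 + n_total / cap)

def simulate(n0, a0, generations, seed=0):
    """Track total size N and the copy number of a neutral allele A."""
    rng = np.random.default_rng(seed)
    n_total, n_a = n0, a0
    history = [(n_total, n_a)]
    for _ in range(generations):
        if n_total > 0:
            m, v = mean_k(n_total), var_k(n_total)
            k_a = int(nb_offspring(rng, m, v, n_a).sum())
            k_b = int(nb_offspring(rng, m, v, n_total - n_a).sum())
            n_total, n_a = k_a + k_b, k_a   # N is an output, not an input
        history.append((n_total, n_a))
    return history

if __name__ == "__main__":
    for generation, (n_total, n_a) in enumerate(simulate(200, 100, 20, seed=1)):
        print(generation, n_total, n_a)
```

In this sketch N is never set by hand after the first generation; it is simply the sum of the offspring numbers drawn, which is the property the reviewer attributes to the authors' framework.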

One of the key aspects of the original manuscript that I want to engage with is the DDH model. As the authors point out, their equations 5 and 6 are assumptions, and not derived from any principles. In essence, the authors are positing that the variance in reproductive success, given by equation 6, changes as a function of the current population size. There is nothing "inherent" to a negative binomial branching mechanism that results in this: in fact, the variance in offspring number could in principle be the same for all time. As relates to models that exist in the literature, I believe that this is the key difference: unlike Cannings models, the authors allow for a changing variance in reproduction through time.

This is, of course, an interesting thing to consider, and I think that the situation the authors point out, in which drift is lower at small population sizes and larger at large population sizes, is not appreciated in the literature. However, I am not so sure that there is anything that needs to be resolved in Paradox 1. A very strong prediction of that model is that Ne and N could be inversely related, as shown by the blue line in Fig 3b. This suggests that you could see something very strange if you, for example, infer a population size history using a Wright-Fisher framework, because you would infer a population *decline* when there is in fact a population *expansion*. However, as far as I know there are very few "surprising population declines" found in empirical data. An obvious case where we know there is very rapid population growth is human populations; I don't think I've ever seen an inference of recent human demographic history from genetic data that suggests anything other than a massive population expansion. While I appreciate the authors' empirical data supporting their claim of Paradox 1 (more on the empirical data later), it's not clear to me that there's a "paradox" in the literature that needs explaining so much as this is a "words of caution about interpreting inferred effective population sizes". To be clear, I think those words of caution are important, and I had never considered that you might be so fundamentally misled as to infer decline when there is growth, but calling it a "paradox" seems to suggest that this is an outstanding problem in the literature, when in fact I think the authors are raising a *new* and important problem. Perhaps an interesting thing for the authors to do to raise the salience of this point would be to perform simulations under this model and then infer effective population sizes using e.g. dadi or psmc and show that you could identify a situation in which the true history is one of growth, but the best fit would be one of decline.

The authors also highlight that their approach reflects a case where the population size is determined by the population dynamics themselves, as opposed to being imposed externally as is typical in Cannings models. I agree with the authors that this aspect of population regulation is understudied. Nonetheless, several manuscripts have dealt with the case of population genetic dynamics in populations of stochastically fluctuating size. For example, Kaj and Krone (2003) show that under pretty general conditions you get something very much like a standard coalescent; for example, combining their theorem 1 with their arguments on pages 36 and 37, they find that exchangeable populations with stochastic population dynamics where the variance does not change with time still converge to exactly the coalescent you would expect from Cannings models. This is strongly suggestive that the authors' key result isn't about stochastic population dynamics per se, but instead related to arguing that variance in reproductive success could change through time. In fact, I believe that the result of Kaj and Krone (2003) is substantially more general than the models considered in this manuscript. That being said, I believe that the authors of this manuscript do a much better job of making the implications for evolutionary processes clear than Kaj and Krone, which is important: it's very difficult to understand from Kaj and Krone the conditions under which effective population sizes will be substantially impacted by stochastic population dynamics.

I also find the authors' exposition on Paradox 3 to be somewhat strange. First of all, I'm not sure there's a paradox there at all? The authors claim that the lack of dependence of the fixation probability on Ne is a paradox, but this is ultimately not surprising: fixation of a positively selected allele depends mostly on escaping the boundary layer, which doesn't really depend on the population size (see Gillespie's book "The Causes of Molecular Evolution" for great exposition on boundary layer effects). Moreover, the authors *use a Cannings-style argument* to gain a good approximation of how the fixation probability changes when there is non-Poisson reproduction. So it's not clear that the WFH model is really doing a lot of work here. I suppose they raise the interesting point that the particularly simple form of p(fix) = 2s is due to the assumption that variance in offspring is equal to 1.
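The dependence of the fixation probability on offspring variance rather than on population size can be checked numerically. The sketch below computes the survival probability of a single new mutant lineage in a branching process with mean offspring number 1 + s and variance V, and compares it with the approximation 2s/V(K). It assumes negative binomial offspring numbers (so V must exceed 1 + s); that distributional choice and the function names are illustrative assumptions, not taken from the manuscript.

```python
def nb_pgf(z, mean, var):
    """Probability generating function of a negative binomial with given mean and variance."""
    p = mean / var
    r = mean * mean / (var - mean)
    return (p / (1.0 - (1.0 - p) * z)) ** r

def survival_probability(s, var, tol=1e-14, max_iter=1_000_000):
    """Survival probability of one lineage with mean offspring 1 + s and variance var.

    The extinction probability is the smallest fixed point of q = G(q);
    iterating from q = 0 converges to it.
    """
    mean = 1.0 + s
    q = 0.0
    for _ in range(max_iter):
        q_next = nb_pgf(q, mean, var)
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1.0 - q

if __name__ == "__main__":
    s = 0.05
    for var in (2.0, 4.0, 8.0):
        exact = survival_probability(s, var)
        print(f"V(K)={var}: branching-process value={exact:.4f}, 2s/V(K)={2 * s / var:.4f}")
```

No population size appears anywhere in this calculation, which is the boundary-layer point made above: early escape from loss is governed by s and V(K) alone, and 2s/V(K) is only a small-s approximation to the fixed point.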

In addition, I raised some concerns about the analysis of empirical results on reproductive variance in my original review, and I don't believe that the authors responded to it at all. I'm not super worried about that analysis, but I think that the authors should probably respond to me.

Overall, I feel like I now have a better understanding of this manuscript. However, I think it still presents its results too strongly: Paradox 1 contains important words of caution that reflect what I am confident is an underappreciated possibility, and Paradox 3 is, as far as I'm concerned, not a paradox at all. I have not addressed Paradox 2 very much because I think that another reviewer had solid and interesting comments on that front and I am leaving it to them. That being said, I do think Paradox 2 actually presents a deep problem in the literature and that the authors' argument may actually represent a path toward a solution.

This manuscript can be a useful contribution to the literature, but as it's presented at the moment, I think most of it is worded too strongly and it continues to not engage appropriately with the literature. Theoretical advances are undoubtedly important, and I think the manuscript presents some interesting things to think about, but ultimately needs to be better situated and several of the claims strongly toned down.

References:
Kaj, I., & Krone, S. M. (2003). The coalescent process in a population with stochastically varying size. Journal of Applied Probability, 40(1), 33-48.

Reviewer #2 (Public review):

Summary:

This theoretical paper examines genetic drift in scenarios deviating from the standard Wright-Fisher model. The authors discuss Haldane's branching process model, highlighting that the variance in reproductive success equates to genetic drift. By integrating the Wright-Fisher model with the Haldane model, the authors derive theoretical results that resolve paradoxes related to effective population size.

Strengths:

The most significant and compelling result from this paper is perhaps that the probability of fixing a new beneficial mutation is 2s/V(K). This is an intriguing and potentially generalizable discovery that could be applied to many different study systems.

The authors also made a lot of effort to connect theory with various real-world examples, such as genetic diversity in sex chromosomes and reproductive variance across different species.

Comments on previous revisions:

The authors have addressed some of the concerns in my review, and I think the revised manuscript is clearer. I like the discussion about the caveats of the WFH model.

I hope the authors could also discuss the conditions needed for V(K)/Ne to be a reasonable approximation. It is currently unclear how the framework should be adopted in general.

The idea about estimating male-female V(K) ratios from population genetic data is interesting. Unfortunately, the results fell short. The accuracy of their estimators (derived using the Ne/V(K) approximation and a certain choice of theta, with theta then estimated with Watterson's estimator) should be tested with simulated results before applying them to real data. The reliability of their estimator and their results from real data are unclear.
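For reference, Watterson's estimator of theta mentioned above is the number of segregating sites divided by the harmonic number a_n = 1 + 1/2 + ... + 1/(n-1) for a sample of n sequences. A minimal Python sketch follows; the function name is illustrative, and this is only the standard estimator, not the authors' male-female V(K) ratio estimator, which is not specified in this review.

```python
def watterson_theta(num_segregating_sites, sample_size):
    """Watterson's estimator: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return num_segregating_sites / a_n

if __name__ == "__main__":
    # 10 segregating sites in a sample of 5 sequences: a_5 = 25/12, so theta_W = 4.8
    print(watterson_theta(10, 5))
```

A simulation check of the kind requested would generate data with a known V(K) ratio, apply an estimator built on this quantity, and compare the recovered ratio with the truth.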

Arguments made in this paper sometimes lack precision (perhaps the authors want to emphasize intuition, but it seems more confusing than otherwise). For example, the authors stated that "This independence from N seems intuitively obvious: when an advantageous mutation increases to say, 100 copies in determining a population (depending mainly on s), its fixation would be almost certain, regardless of N.". Assuming large Ne, and with approximation, one could assume the probability of loss is e^(-2sn), but the writing about "100 copies" and "almost certain" is very imprecise; in fact, a mutation with s=0.001 segregating at 100 copies in a large Ne population is most probably lost, whereas in a small population, it will be fixed. Yet the following sentence states "regardless of N. This may be a most direct argument against equating genetic drift, certainly no less important than 1/ N . with N, or Ne (which is supposed to be a function of N's)." I find this new paragraph misleading.
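The numbers behind this objection are easy to check. With s = 0.001 and n = 100 copies, the large-population loss probability e^(-2sn) is e^(-0.2), about 0.82. The Python sketch below evaluates that expression and, as an added assumption for comparison, Kimura's diffusion approximation for a diploid population with Ne = N, in which n copies correspond to a starting frequency of n/(2N). The function names are illustrative.

```python
import math

def loss_probability_large_pop(s, n):
    """Probability of loss e^(-2sn) for n copies with advantage s, large-Ne limit."""
    return math.exp(-2.0 * s * n)

def kimura_fixation(s, n, N):
    """Kimura's diffusion approximation, assuming diploid size N, Ne = N, genic selection.

    With starting frequency n / (2N) the formula reduces to
    (1 - e^(-2sn)) / (1 - e^(-4Ns)).
    """
    return (1.0 - math.exp(-2.0 * s * n)) / (1.0 - math.exp(-4.0 * N * s))

if __name__ == "__main__":
    s, n = 0.001, 100
    print("loss probability, large population:", loss_probability_large_pop(s, n))
    for N in (50, 500, 5000, 10**6):
        print(f"N={N}: fixation probability = {kimura_fixation(s, n, N):.3f}")
```

At N = 50 the 100 copies already make up all 2N chromosomes, so the fixation probability is 1, while for very large N it approaches 1 - e^(-0.2), about 0.18. Under these assumptions, "100 copies" therefore does not imply near-certain fixation regardless of N.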

Some of the statements/wordings in this paper still seem too strong to me.

Comments on revisions:

The authors have toned down some of their statements. I am a bit confused because I do not seem to find any point-by-point response to my review.

Reviewer #3 (Public review):

Summary:

Ruan and colleagues consider a branching process model (in their terminology the "Haldane model") and the most basic Wright-Fisher model. They convincingly show that offspring distributions are usually non-Poissonian (as opposed to what's assumed in the Wright-Fisher model), and can depend on short-term ecological dynamics (e.g., variance in offspring number may be smaller during exponential growth). The authors discuss branching processes and the Wright-Fisher model in the context of 3 "paradoxes": 1) how Ne depends on N might depend on population dynamics; 2) how Ne is different on the X chromosome, the Y chromosome, and the autosomes, and these differences do not match the expectations based on simple counts of the number of chromosomes in the populations; 3) how genetic drift interacts with selection. The authors provide some theoretical explanations for the role of variance in the offspring distribution in each of these three paradoxes. They also perform some experiments to directly measure the variance in offspring number, as well as perform some analyses of published data.

Strengths:

- The theoretical results are well-described and easy to follow.
- The analyses of different variances in offspring number (both experimentally and analyzing public data) are convincing that non-Poissonian offspring distributions are the norm.
- The point that this variance can change as the population size (or population dynamics) change is also very interesting and important to keep in mind.
- I enjoyed the Density-Dependent Haldane model. It was a nice example of the decoupling of census size and effective size.
- Equation (10) is a nice result.

Comments on revisions:

I appreciate the effort that the authors have put into the revision, but I still find the framing to be a bit confusing: these apparent paradoxes only appear in the most basic version of Wright-Fisher models, and so framing the paper as the solution to these paradoxes overlooks much previous work. Saying that existing work discussing exactly these phenomena is "beyond the scope of this study", without citing or interacting in any way with that work, is unscholarly. I agree with the authors that the apparent paradoxes that they consider are interesting, and by thinking about branching processes, the apparent paradoxes appear to be less paradoxical, but without contextualizing this work in the substantial Wright-Fisher literature (e.g., Cannings Exchangeable Models and the work of Möhle) it misrepresents the state of the field and the contributions of this paper.

Author response:

The following is the authors’ response to the previous reviews.

eLife Assessment (divided into 3 parts)

This study presents a useful modification of a standard model of genetic drift by incorporating variance in reproductive success, claiming to address several paradoxes in molecular evolution. ……

It is crucial to emphasize that our model is NOT a modification of the standard model. The Haldane model, which is generalized here for population regulation, is based on the branching process. The Haldane model and the WF model which is based on population sampling are fundamentally different. We referred to our model as the integrated WF-H model because the results obtained from the WF model over the last 90 years are often (but not always) good approximations for the Haldane model. The analogy would be the comparisons between the Diffusion model and the Coalescence model. Obviously, the results from one model are often good approximations for the other. But it is not right to say that one is a useful modification of the other.

We realize that it was a mistake to call our model the integrated WFH model, thus causing confusion over two entirely different models. Clearly, the word “integrated” did not help. We have now revised the paper by using the more accurate name for the model: the Generalized Haldane (GH) model. The text explains clearly that the original Haldane model is a special case of the GH model.

Furthermore, we present the paradoxes and resolve them by the GH model. We indeed overreached by claiming that WF models could not resolve them. Whether the WF models have done enough to resolve the paradoxes or at least will be able to resolve them should not be a central point of our study. Here is what we state at the end of this study:

“We understand that further modifications of the WF models may account for some or all of these paradoxes. However, such modifications have to be biologically feasible and, if possible, intuitively straightforward. Such possible elaborations of WF models are beyond the scope of this study. We are only suggesting that the Haldane model can be extensively generalized to be an alternative approach to genetic drift. The GH model attempts to integrate population genetics and ecology and, thus, can be applied to genetic systems far more complex than those studied before. The companion study is one such example.”

….. However, some of the claimed "paradoxes" seem to be overstatements, as previous literature has pointed out the limitations of the standard model and proposed more advanced models to address those limitations….

As stated in the last paragraph of the paper, it is outside of the scope of our study to comment on whether the earlier WF models can resolve these paradoxes. So, all such statements have been removed or at least drastically toned down in the formal presentation. That said, editors and reviewers may ask whether we are re-inventing the wheels. The answers are as follows:

First, two entirely different models reaching the same conclusion are NOT the re-invention of wheels. The coalescence theory does not merely rediscover the results obtained by the diffusion models. The process of obtaining the results is itself a new invention. This would lead to the next question: is the new process more rigorous and more efficient? I think the Haldane model is indeed more efficient in comparison with the very complex modifications of the WF models.

Second, we are not sure that the paradoxes have been resolved, or even can be resolved. Note that this skepticism has been purged from the formal presentation. Therefore, I am presenting the arguments outside of the paper for a purely intellectual discourse. Below, please allow us to address the assertions that the WF models can resolve the paradoxes.

The first paradox is that the drift strength in relation to N is often the opposite of the WF model predictions. Since the WF models (standard or modified) do not generate N from within the model, how can they resolve the paradox? In contrast, the Generalized Haldane model generates N within the model. It is the regulation of N near the carrying capacity that creates the paradox: when N increases, drift also increases.

The second paradox that the same locus experiences different drifts in males and females is accepted by the reviewers. Nevertheless, we would like to point out that this second paradox echoed the first one as newly stated in the Discussion section “The second paradox of sex-dependent drift is about different V(K)’s between sexes (generally Vm > Vf) but the same E(K) between them. In the conventional models of sampling, it is not clear what sort of biological sampling scheme could yield V(K) ≠ E(K), let alone two separate V(K)’s with one single E(K). Mathematically, given separate K distributions for males and females, it is unlikely that E(K) for the whole population could be 1, hence, the population would either explode in size or decline to zero. In short, N regulation has to be built into the genetic drift model as the GH model does to avoid this paradox.”

The third paradox stems from the fact that drift is operating even for genes under selection. But then the drift strength, 2s/V(K) for an advantage of s, is independent of N or Ne. Since the determinant of drift strength in the WF model is ALWAYS Ne, how is Paradox 3 not a paradox for the WF model?

The 4th paradox about multi-copy gene systems is the subject of the companion paper (Wang et al.). Note that the WF model cannot handle systems of evolution that experience totally different sorts of drift within vs. between hosts (viruses, rDNAs etc). This paradox can be understood by the GH model and will be addressed in the next paper.

While the modified model presented in this paper yields some intriguing theoretical predictions, the analysis and simulations presented are incomplete to support the authors' strong claims, and it is unclear how much the model helps explain empirical observations.

The objection appears to be that our claims of “paradox resolution” are too strong. We interpret this objection as based on the view (with which we agree) that these paradoxes are intrinsically difficult to resolve by the WF models. Since our model has been perceived to be a modified WF model, the claim of resolution is clearly too strong. However, the GH model is conceptually and operationally entirely different from the WF models, as we have emphasized above. In case our reading of the editorial comments is incorrect, would it be possible to provide some clarification on the nature of “incomplete support”? We would be grateful for the help.
