Author response:
eLife Assessment
This manuscript introduces a useful protein-stability-based fitness model for simulating protein evolution and unifying non-neutral models of molecular evolution with phylogenetic models. The model is applied to four viral proteins that are of structural and functional importance. The justification of some hypotheses regarding fitness is incomplete, as well as the evidence for the model's predictive power, since it shows little improvement over neutral models in predicting protein evolution.
We thank for the constructive comments that helped improve our study. Regarding the comment about justification of fitness, we will include in the revised manuscript additional information to support the relevance of modeling protein evolution accounting for protein folding stability. We agree that increasing the parameterization of the developed birth-death model is interesting, if it does not lead to overfitting. The model presented considers the fitness of protein variants to determine their reproductive success through the corresponding birth and death rates, varying among lineages, and it is biologically meaningful and technically correct (Harmon 2019). Following a suggestion of the first reviewer to allow variation of the global birth-death rate among lineages, we will additionally incorporate this aspect into the model and evaluate its performance with the data for the evaluation of the models. The integration of structurally constrained substitution models of protein evolution, as Markov models, into the birth-death process was made following standards approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012) and we will provide more information about it in the revised manuscript. Regarding the predictive power, our study showed good accuracy in predicting the real folding stability of forecasted protein variants. On the other hand, predicting the exact sequences proved to be more challenging, indicating needs in the field of substitution models of molecular evolution. Altogether, we believe our findings provide a significant contribution to the field, as accurately forecasting the folding stability of future real proteins is fundamental for predicting their protein function and enabling a variety of applications. Additionally, we implemented the models into a freely available computer framework, with detailed documentation and diverse practical examples.
Public Reviews:
Reviewer #1 (Public review):
Summary:
Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is constrained by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral structural proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which have struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death. Unfortunately, though, the model shows little improvement over neutral models in predicting protein evolution, and this ultimately appears to be due to fundamental conceptual problems with how fitness is modeled and linked to the phylodynamic birth-death model.
We thank the reviewer for the positive comments about our work.
Regarding predictive power, the study showed a good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. However, predicting the exact sequences was more challenging. For example, amino acids with similar physicochemical properties can result in similar folding stability while differ in the specific sequence, more accurate substitution models of molecular evolution are required in the field. We consider that forecasting the folding stability of future real proteins is an important advancement in forecasting protein evolution, given the essential role of folding stability in protein function and its variety of applications. Regarding the conceptual concerns related to fitness modeling, we clarify this issue in detail in our responses to the specific comments below.
Major concerns:
(1) Fitness model: All lineages have the same growth rate r = b-d because the authors assume b+d=1. But under a birth-death model, the growth r is equivalent to fitness, so this is essentially assuming all lineages have the same absolute fitness since increases in reproductive fitness (b) will simply trade off with decreases in survival (d). Thus, even if the SCS model constrains sequence evolution, the birth-death model does not really allow for non-neutral evolution such that mutations can feed back and alter the structure of the phylogeny.
We thank the reviewer for this comment that aims to improve the realism of our model. In the model presented (but see later for another model derived from the proposal of the reviewer and that we are now implementing into the framework and applying to the data used for the evaluation of the models), the fitness predicted from a protein variant is used to obtain the corresponding birth rate of that variant. In this way, protein variants with high fitness have high birth rates leading to overall more birth events, while protein variants with low fitness have low birth rates resulting in overall more extinction events, which has biological meaning for the study system. The statement “All lineages have the same growth rate r = b-d” in our model is incorrect because, in our model, b and d can vary among lineages according to the fitness. For example, a lineage might have b=0.9, d=0.1, r=0.8, while another lineage could have b=0.6, d=0.4, r=0.2. Indeed, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect. Clearly, assuming that all lineages have the same fitness would not make sense, in that situation the folding stability of the forecasted protein variants would be similar under any model, which is not the case as shown in the results. In our model, the fitness affects the reproductive success, where protein variants with a high fitness have higher birth rates leading to more birth events, while those with lower fitness have higher death rates leading to more extinction events. This parameterization is meaningful for protein evolution because the fitness of a protein variant can affect its survival (birth or extinction) without necessarily affecting its rate of evolution. While faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a faster rate. Regarding the phylogenetic structure, the model presented considers variable birth and death events across different lineages according to the fitness of the corresponding protein variants, and this alters the derived phylogeny (i.e., protein variants selected against can go extinct while others with high fitness can produce descendants). We are not sure about the meaning of the term “mutations can feed back” in the context of our system. Note that we use Markov models of evolution, which are well-stablished in the field (despite their limitations), and substitutions are fixed mutations, which still could be reverted later if selected by the substitution model (Yang 2006). Altogether, we find that the presented birth-death model is technically correct and appropriate for modeling our biological system. Its integration with structurally constrained substitution (SCS) models of protein evolution, as Markov models, is correct following general approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012). We will provide a more detailed description of the model in the revised manuscript.
Apart from these clarifications about the birth-death model used, we understand the point of the reviewer and following the suggestion we are now incorporating an additional birth-death model that accounts for variable global birth-death rate among lineages. Specifically, we are following the model proposed by Neher et al (2014), where the death rate is considered as 1 and the birth rate is modeled as 1 + fitness. In this model, the global birth-death rate varies among lineages. We are now implementing this model into the computer framework and applying it to the data used for the evaluation of the models. Preliminary results, which will be finally presented in the revised manuscript, indicate that this model yields similar predictive accuracy compared to the previous birth-death model. If this is confirmed, accounting for variability in the global birth-death rate does not appear to play a major role in the studied systems of protein evolution. We will present this additional birth-death model and its results in the revised manuscript.
(2) Predictive performance: Similar performance in predicting amino acid frequencies is observed under both the SCS model and the neutral model. I suspect that this rather disappointing result owes to the fact that the absolute fitness of different viral variants could not actually change during the simulations (see comment #1).
The study shows similar performance in predicting the sequences of the forecasted proteins under both the SCS model and the neutral model, but shows differences in predicting the folding stability of the forecasted proteins between these models. Indeed, as explained in the previous answer, the birth-death model accounts for variation in fitness among lineages, leading to differences among lineages in reproductive success. The new birth-death model that we are now implementing, which incorporates variation of the global birth-death rate among lineages, is producing similar preliminary results. In addition to these considerations, it is known that SCS models applied to phylogenetics (such as ancestral molecular reconstruction) can model protein evolution with high accuracy in terms of folding stability. However, inferring sequences (i.e., ancestral sequences) is considerably more challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). The observed sequence diversity is much greater than the observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions among amino acids with similar physicochemical properties can result in protein variants with similar folding stability but different specific amino acid sequences; further work is demanded in the field of substitution models of molecular evolution. We will expand the discussion of this aspect in the revised manuscript.
(3) Model assessment: It would be interesting to know how much the predictions were informed by the structurally constrained sequence evolution model versus the birth-death model. To explore this, the authors could consider three different models: 1) neutral, 2) SCS, and 3) SCS + BD. Simulations under the SCS model could be performed by simulating molecular evolution along just one hypothetical lineage. Seeing if the SCS + BD model improves over the SCS model alone would be another way of testing whether mutations could actually impact the evolutionary dynamics of lineages in the phylogeny.
In the present study, we compare the neutral model + birth-death (BD) with the SCS model + BD. Markov substitution models Q are applied upon an evolutionary time (i.e., branch length, t) and this allows to determine the probability of substitution events during that time period [P(t) = exp (Qt)]. This approach is traditionally used in phylogenetics to model the incorporation of substitutions over time. Therefore, to compare the neutral and SCS models, an evolutionary time is required, in this case it is provided by the birth-death process. The suggestions 1) and 2) cannot be compared without an underlined evolutionary history. However, comparisons in terms of likelihood, and other aspects, between models that ignore the protein structure and the implemented SCS models are already available in our previous studies based on coalescent simulations or given phylogenetic trees (Arenas, et al. 2013; Arenas, et al. 2015). There, SCS models produced proteins with more realistic folding stability than models that ignore evolutionary constraints from the protein structure, and those findings are consistent with the results from the present study where we explore the application of these models to forecasting protein evolution. We would like to emphasize that forecasting the folding stability of future real proteins is a significant and novel finding, folding stability is fundamental to protein function and has diverse implications. While accurately forecasting the exact sequences would indeed be ideal, this remains a challenging task with current substitution models. In this regard, we will discuss in the revised manuscript the need of developing more accurate substitution models.
(4) Background fitness effects: The model ignores background genetic variation in fitness. I think this is particularly important as the fitness effects of mutations in any one protein may be overshadowed by the fitness effects of mutations elsewhere in the genome. The model also ignores background changes in fitness due to the environment, but I acknowledge that might be beyond the scope of the current work.
This comment made us realize that more information about the features of the implemented SCS models should be included in the manuscript. In particular, the implemented SCS models consider a negative design based on the observed residue contacts in nearly all proteins available in the Protein Data Bank (Arenas, et al. 2013; Arenas, et al. 2015). This data is provided as an input file and it can be updated to incorporate new structures (see the framework documentation and the practical examples). Therefore, the prediction of folding stability is a combination of positive design (direct analysis of the target protein) and negative design (consideration of background proteins to reduce biases), thus incorporating background molecular diversity. This important feature was not sufficiently described in the manuscript, and we will add more details in the revised version. Regarding the fitness caused by the environment, we agree with the reviewer. This is a challenge for any method aiming to forecast evolution, as future environmental shifts are inherently unpredictable and may impact the accuracy of the predictions. Although one might attempt to incorporate such effects into the model, doing so risks overparameterization, especially when the additional factors are uncertain or speculative. We will include a discussion in the revised manuscript about our perspective on the potential effects of environmental changes on forecasting evolution.
(5) In contrast to the model explored here, recent work on multi-type birth-death processes has considered models where lineages have type-specific birth and/or death rates and therefore also type-specific growth rates and fitness (Stadler and Bonhoeffer, 2013; Kunhert et al., 2017; Barido-Sottani, 2023). Rasmussen & Stadler (eLife, 2019) even consider a multi-type birth-death model where the fitness effects of multiple mutations in a protein or viral genome collectively determine the overall fitness of a lineage. The key difference with this work presented here is that these models allow lineages to have different growth rates and fitness, so these models truly allow for non-neutral evolutionary dynamics. It would appear the authors might need to adopt a similar approach to successfully predict protein evolution.
We agree with the reviewer that robust birth-death models have been developed applying statistics and, in many cases, the primary aim of those studies is the development and refinement of the model itself. Regarding the study by Rasmussen and Stadler 2019, it incorporates an external evaluation of mutation events where the used fitness is specific for the proteins investigated in that study, which may pose challenges for users interested in analyzing other proteins. In contrast, our study takes a different approach. We implement a fitness function that can be predicted and evaluated for any type of protein (Goldstein 2013), making it broadly applicable. In addition, we provide a freely available and well-documented computational framework to facilitate its use. The primary aim of our study is not the development of novel or complex birth-death models. Rather, we aim to explore the integration of a standard birth-death model with structurally constrained substitution models for the purpose of predicting protein evolution. In the context of protein evolution, substitution models are a critical factor (Liberles, et al. 2012; Wilke 2012; Bordner and Mittelmann 2013; Echave, et al. 2016; Arenas, et al. 2017; Echave and Wilke 2017), and their combination with a birth-death model constitutes a first approximation upon which next studies can build to better understand this biological system. We will include these considerations in the revised manuscript.
Reviewer #2 (Public review):
Summary:
In this study, "Forecasting protein evolution by integrating birth-death population models with structurally constrained substitution models", David Ferreiro and co-authors present a forward-in-time evolutionary simulation framework that integrates a birth-death population model with a fitness function based on protein folding stability. By incorporating structurally constrained substitution models and estimating fitness from ΔG values using homology-modeled structures, the authors aim to capture biophysically realistic evolutionary dynamics. The approach is implemented in a new version of their open-source software, ProteinEvolver2, and is applied to four viral proteins from HIV-1 and SARS-CoV-2.
Overall, the study presents a compelling rationale for using folding stability as a constraint in evolutionary simulations and offers a novel framework and software to explore such dynamics. While the results are promising, particularly for predicting biophysical properties, the current analysis provides only partial evidence for true evolutionary forecasting, especially at the sequence level. The work offers a meaningful conceptual advance and a useful simulation tool, and sets the stage for more extensive validation in future studies.
We also thank this reviewer for the positive comments on our study. Regarding the predictive power, our results showed good accuracy in predicting the folding stability of the forecasted protein variants. However, predicting the specific sequences of these variants is more challenging. For example, forecasting in amino acids with similar physicochemical properties can result in different sequences but in similar folding stability. We believe that these findings are realistic and interesting as they indicate that while forecasting folding stability is feasible, forecasting the specific sequence evolution is more complex that one could anticipate.
Strengths:
The results demonstrate that fitness constraints based on protein stability can prevent the emergence of unrealistic, destabilized variants - a limitation of traditional, neutral substitution models. In particular, the predicted folding stabilities of simulated protein variants closely match those observed in real variants, suggesting that the model captures relevant biophysical constraints.
We agree with the reviewer and appreciate the consideration that forecasting the folding stability of future real proteins is a relevant finding. For instance, folding stability is fundamental for protein function and affects several other molecular properties.
Weaknesses:
The predictive scope of the method remains limited. While the model effectively preserves folding stability, its ability to forecast specific sequence content is not well supported.
It is known that structurally constrained substitution (SCS) models applied to phylogenetics (such as ancestral molecular reconstruction) can model protein evolution with high accuracy in terms of folding stability, while inferring sequences (i.e., ancestral sequences) remains considerably more challenging (Arenas, et al. 2017; Arenas and Bastolla 2020). The observed sequence diversity is much higher than the observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can result in protein variants with similar folding stability but with different specific amino acid composition. We will expand the discussion of this aspect in the manuscript.
Only one dataset (HIV-1 MA) is evaluated for sequence-level divergence using KL divergence; this analysis is absent for the other proteins. The authors use a consensus Omicron sequence as a representative endpoint for SARS-CoV-2, which overlooks the rich longitudinal sequence data available from GISAID. The use of just one consensus from a single time point is not fully justified, given the extensive temporal and geographical sampling available. Extending the analysis to include multiple timepoints, particularly for SARS-CoV-2, would strengthen the predictive claims. Similarly, applying the model to other well-sampled viral proteins, such as those from influenza or RSV, would broaden its relevance and test its generalizability.
The evaluation of forecasting evolution using real datasets is complex due to several conceptual and practical aspects. In contrast to traditional phylogenetic reconstruction of past evolutionary events and ancestral sequences, forecasting evolution often begins with a variant that is evolved forward in time and requires a rough fitness landscape to select among possible future variants (Lässig, et al. 2017). Another concern for validating the method is the need to know the initial variant that gives rise to the corresponding forecasted variants, and it is not always known. Thus, we investigated systems where the initial variant, or a close approximation, is known, such as scenarios of in vitro monitored evolution. In the case of SARS-CoV-2, the Wuhan variant is commonly used as the starting variant of the pandemic. Next, since forecasting evolution is highly dependent on the used model of evolution, unexpected external factors can be dramatic for the predictions. For this reason, systems with minimal external influences provide a more controlled context for evaluating forecasting evolution. For instance, scenarios of in vitro monitored virus evolution avoid some external factors such as host immune response. Another important aspect is the availability of data at two (i.e., present and future) or more time points along the evolutionary trajectory, with sufficient genetic divergence between them to identify clear evolutionary signatures. Additionally, using consensus sequences can help mitigate effects from unfixed mutations, which should not be modeled by a substitution model of evolution. Altogether, not all datasets are appropriate to properly evaluate forecasting evolution. We will include these considerations in the revised manuscript.
Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016). We will provide additional information on this aspect in the manuscript.
Regarding the Omicron dataset, we used 384 curated sequences of the Omicron variant of concern to construct the study dataset and we believe that it is a representative sample. The sequence used for the initial time point was the Wuhan variant (Wu, et al. 2020), which is commonly assumed to be the origin of the pandemic in SARS-CoV-2 studies. As previously indicated, the use of consensus sequences is convenient to avoid variants with unfixed mutations. Regarding extending the analysis to other timepoints (other variants of concern), we kindly disagree because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is required to properly evaluate the prediction method. We noted that earlier variants of concern show a small number of fixed mutations in the study proteins, despite the availability of large numbers of sequences in databases such as GISAID.
Additionally, we investigated the evolutionary trajectories of HIV-1 protease (PR) in 12 intra-host viral populations.
Next, following the proposal of the reviewer, we will incorporate the analysis of an additional viral dataset (probably influenza following the suggestion of the reviewer) to further assess the generalizability of the method. Still, as previously indicated, not all datasets are suitable for a proper evaluation of forecasting evolution. Factors such as the shape of the fitness landscape and the amount of genetic variation over time can influence the accuracy of predictions. We will present the results of the analysis of the new data in the revised manuscript.
It would also be informative to include a retrospective analysis of the evolution of protein stability along known historical trajectories. This would allow the authors to assess whether folding stability is indeed preserved in real-world evolution, as assumed in their model.
Our present study is not focused on investigating the evolution of the folding stability over time, although it provides this information indirectly at the studied time points. Instead, the present study shows that the folding stability of the forecasted protein variants is similar to the folding stability of the corresponding real protein variants for diverse viral proteins, which is an important evaluation of the method. Next, the folding stability can indeed vary over time in both real and modeled evolutionary scenarios, and our present study is not in conflict with this. In that regard, which is not the aim of our present study, some previous phylogenetic-based studies have reported temporal fluctuations in folding stability for diverse data (Arenas, et al. 2017; Olabode, et al. 2017; Arenas and Bastolla 2020; Ferreiro, et al. 2022).
Finally, a discussion on the impact of structural templates - and whether the fixed template remains valid across divergent sequences - would be valuable. Addressing the possibility of structural remodeling or template switching during evolution would improve confidence in the model's applicability to more divergent evolutionary scenarios.
This is an important point. For the datasets that required homology modeling (in several cases it was not necessary because the sequence was present in a protein structure of the PDB), the structural templates were selected using SWISS-MODEL, and we applied the best-fitting template. We will include additional details about the parameters of the homology modeling in the revised version. Indeed, our method assumes that the protein structure is maintained over the studied evolutionary time, which can be generally reasonable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). Over longer evolutionary timescales, structural changes may occur, and in such cases, modeling the evolution of the protein structure would be necessary. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, may offer promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real data with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We will include this discussion in the revised manuscript.
Cited references
Arenas M. 2012. Simulation of Molecular Data under Diverse Evolutionary Scenarios. PLoS Comput Biol 8:e1002495.
Arenas M, Bastolla U. 2020. ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability. Methods Ecol Evol 11:248-257.
Arenas M, Dos Santos HG, Posada D, Bastolla U. 2013. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 29:3020-3028.
Arenas M, Lorenzo-Redondo R, Lopez-Galindez C. 2016. Influence of mutation and recombination on HIV-1 in vitro fitness recovery. Molecular Phylogenetics and Evolution 94:264-270.
Arenas M, Sanchez-Cobos A, Bastolla U. 2015. Maximum likelihood phylogenetic inference with selection on protein folding stability. Molecular Biology and Evolution 32:2195-2207.
Arenas M, Weber CC, Liberles DA, Bastolla U. 2017. ProtASR: An Evolutionary Framework for Ancestral Protein Reconstruction with Selection on Folding Stability. Systematic Biology 66:1054-1064.
Bordner AJ, Mittelmann HD. 2013. A new formulation of protein evolutionary models that account for structural constraints. Molecular Biology and Evolution 31:736-749.
Carvajal-Rodriguez A. 2010. Simulation of genes and genomes forward in time. Current Genomics 11:58-61.
Echave J, Spielman SJ, Wilke CO. 2016. Causes of evolutionary rate variation among protein sites. Nature Reviews Genetics 17:109-121.
Echave J, Wilke CO. 2017. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 46:85-103.
Ferreiro D, Khalil R, Gallego MJ, Osorio NS, Arenas M. 2022. The evolution of the HIV-1 protease folding stability. Virus Evol 8:veac115.
Goldstein RA. 2013. Population Size Dependence of Fitness Effect Distribution and Substitution Rate Probed by Biophysical Model of Protein Thermostability. Genome Biol Evol 5:1584-1593.
Harmon LJ. 2019. Introduction to birth-death models. In. Phylogenetic Comparative Methods. p. https://lukejharmon.github.io/pcm/chapter10_birthdeath/.
Hoban S, Bertorelle G, Gaggiotti OE. 2012. Computer simulations: tools for population and evolutionary genetics. Nature Reviews Genetics 13:110-122.
Illergard K, Ardell DH, Elofsson A. 2009. Structure is three to ten times more conserved than sequence--a study of structural response in protein cores. Proteins 77:499-508.
Lässig M, Mustonen V, Walczak AM. 2017. Predicting evolution. Nature Ecology & Evolution 1:0077.
Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning AP, Dokholyan NV, Echave J, et al. 2012. The interface of protein structure, protein biophysics, and molecular evolution. Protein Science 21:769-785.
Neher RA, Russell CA, Shraiman BI. 2014. Predicting evolution from the shape of genealogical trees. Elife 3.
Olabode AS, Kandathil SM, Lovell SC, Robertson DL. 2017. Adaptive HIV-1 evolutionary trajectories are constrained by protein stability. Virus Evol 3:vex019.
Pascual-Garcia A, Abia D, Mendez R, Nido GS, Bastolla U. 2010. Quantifying the evolutionary divergence of protein structures: the role of function change and function conservation. Proteins 78:181-196.
Wilke CO. 2012. Bringing molecules back into molecular evolution. PLoS Comput Biol 8:e1002572.
Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, et al. 2020. A new coronavirus associated with human respiratory disease in China. Nature 579:265-269.
Yang Z. 2006. Computational Molecular Evolution. Oxford, England.: Oxford University Press.