Epistasis facilitates functional evolution in an ancient transcription factor

Brian P.H. Metzger; Yeonwoo Park; Tyler N. Starr; Joseph W. Thornton

doi:10.7554/eLife.88737.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Armita Nourmohammad
University of Washington, Seattle, United States of America
Senior Editor
Christian Landry
Université Laval, Québec, Canada

Reviewer #1 (Public Review):

Metzger et al. develop a rigorous method filling an important unmet need in protein evolution - analysis of protein genetic architecture and evolution using data from combinatorially complete 20^N variant libraries. Addressing this need has become increasingly valuable, as experimental methods for generating these datasets expand in scope and scale. Their model incorporates several key features - (1) it reports the effects of mutations relative to the average across all variants, rather than a particular genotype, making it useful for examining genetic architectures and evolution in a less biased way, (2) it infers contributions from both "specific" and "non-specific" epistasis, which is essential for some experimental measurements, and perhaps most importantly (3) it does this for all 20 possible states at each site, in contrast to the binary analyses in prior work. These features are not individually novel, but integrating them into a single analysis framework is novel and will be incredibly valuable to the protein evolution community. Using a previously published dataset generated by two of the authors, they conclude that (1) changes in function are largely attributable to pairwise but not higher-order interactions, and (2) epistasis potentiates, rather than constrains, evolutionary paths. These findings are well-supported by the data, though the authors' claim that higher-order epistasis cannot account for the variation they see could be better supported by additional analyses or discussion (as noted in recommendations for authors). Overall, this work has important implications for predicting the relationship between genotype and phenotype, which is of considerable interest to protein biochemistry, evolutionary biology, and numerous other fields.
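To make the distinction in point (1) concrete, here is a minimal sketch contrasting a reference-free first-order effect (measured against the average across all variants) with a reference-based one (measured against a single chosen genotype). It is a plain linear toy, not the authors' ordinal regression model: the three-letter alphabet, the two sites, the random phenotypes, and the function names are all assumptions made for illustration.

```python
import itertools
import numpy as np

# Hypothetical toy landscape: 2 sites, 3 states per site (real data would use 20 amino acids).
STATES = ("A", "C", "D")
N_SITES = 2

rng = np.random.default_rng(0)
genotypes = list(itertools.product(STATES, repeat=N_SITES))  # combinatorially complete
phenotype = {g: float(rng.normal()) for g in genotypes}      # made-up phenotype values


def reference_free_effect(site, state):
    """Mean phenotype of all variants carrying `state` at `site`, minus the global mean."""
    global_mean = np.mean(list(phenotype.values()))
    with_state = [y for g, y in phenotype.items() if g[site] == state]
    return float(np.mean(with_state) - global_mean)


def reference_based_effect(site, state, reference):
    """Phenotype change when `state` is introduced at `site` of one reference genotype."""
    mutant = list(reference)
    mutant[site] = state
    return phenotype[tuple(mutant)] - phenotype[reference]


# Reference-free effects are defined once for the whole landscape.
free_effects = [reference_free_effect(0, s) for s in STATES]

# Reference-based effects generally change with the reference that is chosen.
based_on_AA = reference_based_effect(0, "C", ("A", "A"))
based_on_AD = reference_based_effect(0, "C", ("A", "D"))
print(free_effects, based_on_AA, based_on_AD)
```

In this toy, the reference-free effects at a site sum to zero by construction, whereas the reference-based effect of the same substitution generally differs between the two reference backgrounds.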

https://doi.org/10.7554/eLife.88737.1.sa2

Reviewer #2 (Public Review):

The authors aimed to understand how epistasis influences the genetic architecture of the DNA-binding domain (DBD) of a steroid hormone receptor. An ordinal regression model was developed in this study to analyze a published deep mutational scanning dataset that consists of all combinatorial amino acid variants across four positions (i.e. 160,000 variants). This published dataset measured the binding of each variant to the estrogen receptor response element (ERE, sequence: AGGTCA) as well as the steroid receptor response element (SRE, sequence: AGAACA). This model has the major strengths of being reference-free and able to account for global nonlinearity in the genotype-phenotype relationship. Thorough analyses of the modelling results have been performed, which provided convincing results to support the importance of epistasis in promoting evolution of protein functions. This conclusion is impactful because many previous studies have shown that epistasis constrains evolution. However, the model in this study requires transformation of continuous functional data into categorical form, which would reduce precision in estimating the genetic architecture. In addition, the generalizability of the findings in this study is unclear. These limitations, which are acknowledged by the authors, are minor and should not affect the conclusion of this study. The novelty of this study will likely stimulate new ideas in the field. The model will also likely be utilized by other groups in the community.

https://doi.org/10.7554/eLife.88737.1.sa1

Reviewer #3 (Public Review):

In this paper, the authors analyze a large previously published deep mutational scanning data set using a reference-free regression approach. They extract the contributions of single-locus and epistatic effects to the functionality of the sequence (no, weak or strong transcription activation of two response elements). They find that pairwise epistasis plays a crucial and dominant role in creating functional sequences and in connecting the functional sequence space.

I enjoyed reading the paper and the topic (role of epistasis at creating and connecting functional sequences; development of measures of epistasis) is very exciting to me. However, I found it difficult to judge the strength of the paper both because it is written in a rather dense and yet potentially redundant fashion (see comment 1) and because I was left with a number of questions upon reading. I will focus on conceptual questions in the following comments, since I am not able to judge the statistical approach in detail.

1/ Regarding the biological result (importance of pairwise epistasis) I was wondering how potentially redundant the consecutive sections of the paper are. In which situation would the authors expect that pairwise epistasis does *not* play a crucial role for mutational steps, trajectories, or space connectedness, if it is dominant in the genotype-phenotype landscape? I would also appreciate an explanation of how many new biological results this paper delivers compared with the paper in which the data were published (which I, unfortunately, cannot access at the moment of writing this report).

2a/ Regarding the regression approach: I very much appreciate a reference-free approach to the estimation of epistasis. However, I would enjoy an explanation of how the results would (potentially) have differed if a reference-based approach had been used, and how the approach compares with other reference-free approaches to estimating epistasis (e.g., linear regression or the gamma statistics of Ferretti et al. 2015).

2b/ When comparing the outcomes with and without epistasis, I understood that the authors compare the estimated "full model" with the outcome if epistatic effects were ignored - but without a new estimation of main effects if epistasis is ignored. Wouldn't re-estimating the main effects make for a fairer comparison?

2c/ Where do the authors see the applicability of their approach to data beyond those analyzed in the present study? What are the requirements to use it? Does it only work for combinatorially complete landscapes? I did not have a chance to look at the code - how easily could other researchers apply the approach to their data?

https://doi.org/10.7554/eLife.88737.1.sa0
