Epistasis facilitates functional evolution in an ancient transcription factor

  1. Department of Ecology and Evolution, University of Chicago, 60637 USA
  2. Department of Biological Sciences, Purdue University
  3. Program in Genetics, Genomics, and Systems Biology, University of Chicago, 60637 USA
  4. Center for RNA Research, Seoul National University
  5. Department of Biochemistry and Molecular Biophysics, University of Chicago, 60637 USA
  6. Department of Biochemistry, University of Utah
  7. Department of Human Genetics, University of Chicago, 60637 USA

Editors

  • Reviewing Editor
    Armita Nourmohammad
    University of Washington, Seattle, United States of America
  • Senior Editor
    Christian Landry
    Université Laval, Québec, Canada

Reviewer #1 (Public Review):

Metzger et al. develop a rigorous method that fills an important unmet need in protein evolution: the analysis of protein genetic architecture and evolution using data from combinatorially complete 20^N variant libraries. Addressing this need has become increasingly valuable as experimental methods for generating these datasets expand in scope and scale. Their model incorporates several key features: (1) it reports the effects of mutations relative to the average across all variants, rather than relative to a particular genotype, making it useful for examining genetic architectures and evolution in a less biased way; (2) it infers contributions from both "specific" and "non-specific" epistasis, which is essential for some experimental measurements; and, perhaps most importantly, (3) it does this for all 20 possible states at each site, in contrast to the binary analyses in prior work. These features are not individually novel, but integrating them into a single analysis framework is novel and will be incredibly valuable to the protein evolution community. Using a previously published dataset generated by two of the authors, they conclude that (1) changes in function are largely attributable to pairwise but not higher-order interactions, and (2) epistasis potentiates, rather than constrains, evolutionary paths. These findings are well supported by the data, though the authors' claim that higher-order epistasis cannot account for the variation they see could be better supported by additional analyses or discussion (as noted in the recommendations for authors). Overall, this work has important implications for predicting the relationship between genotype and phenotype, which is of considerable interest to protein biochemistry, evolutionary biology, and numerous other fields.
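As an illustration of feature (1) above, the following is a minimal sketch, not the authors' implementation, of how first-order effects differ when they are measured against the average over all variants rather than against a single reference genotype. It assumes a purely linear decomposition on a small, combinatorially complete toy library and ignores the nonlinear ("non-specific") component and the ordinal treatment described in the reviews. The function names and the random toy phenotypes are hypothetical.

```python
import numpy as np


def library_size(n_states: int, n_sites: int) -> int:
    """Number of genotypes in a combinatorially complete library."""
    return n_states ** n_sites


def reference_free_first_order(phenotypes: np.ndarray):
    """First-order effects relative to the mean over all variants.

    `phenotypes` has one axis per site; the index along each axis is the
    state at that site. The effect of a state is the mean phenotype of all
    variants carrying it minus the grand mean.
    """
    grand_mean = phenotypes.mean()
    effects = []
    for site in range(phenotypes.ndim):
        other_axes = tuple(a for a in range(phenotypes.ndim) if a != site)
        effects.append(phenotypes.mean(axis=other_axes) - grand_mean)
    return grand_mean, effects


def reference_based_first_order(phenotypes: np.ndarray, reference):
    """First-order effects measured as single mutations in one reference."""
    ref_value = phenotypes[tuple(reference)]
    effects = []
    for site in range(phenotypes.ndim):
        site_effects = np.empty(phenotypes.shape[site])
        for state in range(phenotypes.shape[site]):
            mutant = list(reference)
            mutant[site] = state
            site_effects[state] = phenotypes[tuple(mutant)] - ref_value
        effects.append(site_effects)
    return effects


# Hypothetical toy library: 3 states at each of 3 sites, random phenotypes.
rng = np.random.default_rng(0)
toy = rng.normal(size=(3, 3, 3))
grand_mean, free_effects = reference_free_first_order(toy)
based_effects = reference_based_first_order(toy, reference=(0, 0, 0))
```

In this sketch the reference-free effects at each site sum to zero by construction, whereas the reference-based effects depend on which genotype is chosen as the reference whenever the toy landscape contains epistasis.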

Reviewer #2 (Public Review):

The authors aimed to understand how epistasis influences the genetic architecture of the DNA-binding domain (DBD) of a steroid hormone receptor. An ordinal regression model was developed in this study to analyze a published deep mutational scanning dataset that consists of all combinatorial amino acid variants across four positions (i.e. 160,000 variants). This published dataset measured the binding of each variant to the estrogen receptor response element (ERE, sequence: AGGTCA) as well as the steroid receptor response element (SRE, sequence: AGAACA). The model has the major strengths of being reference-free and able to account for global nonlinearity in the genotype-phenotype relationship. Thorough analyses of the modelling results have been performed, and they provide convincing support for the importance of epistasis in promoting the evolution of protein functions. This conclusion is impactful because many previous studies have shown that epistasis constrains evolution. However, the model in this study requires transformation of continuous functional data into categorical form, which would reduce precision in estimating the genetic architecture. In addition, the generalizability of the findings in this study is unclear. These limitations, which are acknowledged by the authors, are minor and should not affect the conclusion of this study. The novelty of this study will likely stimulate new ideas in the field. The model will also likely be utilized by other groups in the community.

Reviewer #3 (Public Review):

In this paper, the authors analyze a large, previously published deep mutational scanning data set using a reference-free regression approach. They extract the contributions of single-locus and epistatic effects to the functionality of the sequence (no, weak, or strong transcription activation of two response elements). They find that pairwise epistasis plays a crucial and dominant role in creating functional sequences and in connecting the functional sequence space.

I enjoyed reading the paper, and the topic (the role of epistasis in creating and connecting functional sequences; the development of measures of epistasis) is very exciting to me. However, I found it difficult to judge the strength of the paper, both because it is written in a rather dense and yet potentially redundant fashion (see comment 1) and because I was left with a number of questions upon reading. I will focus on conceptual questions in the following comments, since I am not able to judge the statistical approach in detail.

1/ Regarding the biological result (importance of pairwise epistasis), I was wondering how potentially redundant the consecutive sections of the paper are. In which situation would the authors expect that pairwise epistasis does *not* play a crucial role for mutational steps, trajectories, or space connectedness, if it is dominant in the genotype-phenotype landscape? I would also appreciate an explanation of how many new biological results this paper delivers compared with the paper in which the data were published (which I, unfortunately, cannot access at the moment of writing this report).

2a/ Regarding the regression approach: I very much appreciate a reference-free approach to the estimation of epistasis. However, I would enjoy an explanation of how the results would (potentially) have differed if a reference-based approach had been used, and how the approach compares with other reference-free approaches to estimating epistasis (e.g., linear regression or the gamma statistics of Ferretti et al. 2015).

2b/ When comparing the outcomes with and without epistasis, I understood that the authors compare the estimated "full model" with the outcome if epistatic effects were ignored, but without a new estimation of the main effects when epistasis is ignored. Wouldn't re-estimating the main effects make for a fairer comparison?

2c/ Where do the authors see the applicability of their approach to data beyond those analyzed in the present study? What are the requirements to use it? Does it only work for combinatorially complete landscapes? I did not have a chance to look at the code - how easily could other researchers apply the approach to their data?
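The comparison raised in comment 2b can be explored with a minimal sketch, which is not the authors' code. Under two loudly stated assumptions, a purely linear reference-free decomposition and a combinatorially complete library, refitting an additive-only model by least squares gives the same predictions as keeping the main effects from the full decomposition and dropping the epistatic terms. Whether this carries over to a model with a nonlinear link, ordinal data, or missing genotypes is not addressed here. The function names and the random toy phenotypes are hypothetical.

```python
import itertools
import numpy as np


def one_hot_design(n_states: int, n_sites: int) -> np.ndarray:
    """Intercept plus one-hot columns for every (site, state) pair.

    Rows follow itertools.product order, which matches NumPy's C-order
    flattening of an array with one axis per site.
    """
    genotypes = list(itertools.product(range(n_states), repeat=n_sites))
    X = np.zeros((len(genotypes), 1 + n_states * n_sites))
    X[:, 0] = 1.0
    for row, genotype in enumerate(genotypes):
        for site, state in enumerate(genotype):
            X[row, 1 + site * n_states + state] = 1.0
    return X


def additive_from_marginals(phenotypes: np.ndarray) -> np.ndarray:
    """Grand mean plus reference-free main effects, with epistasis dropped."""
    grand_mean = phenotypes.mean()
    prediction = np.full(phenotypes.shape, grand_mean)
    for site in range(phenotypes.ndim):
        other_axes = tuple(a for a in range(phenotypes.ndim) if a != site)
        effect = phenotypes.mean(axis=other_axes) - grand_mean
        shape = [1] * phenotypes.ndim
        shape[site] = -1  # broadcast the effect along its own site axis
        prediction = prediction + effect.reshape(shape)
    return prediction


# Hypothetical toy library: 3 states at each of 3 sites, random phenotypes.
rng = np.random.default_rng(1)
toy = rng.normal(size=(3, 3, 3))

# Additive-only model refit from scratch by least squares. The design is
# rank deficient, so coefficients are not unique, but fitted values are.
X = one_hot_design(n_states=3, n_sites=3)
coef, *_ = np.linalg.lstsq(X, toy.ravel(), rcond=None)
refit_prediction = (X @ coef).reshape(toy.shape)

dropped_prediction = additive_from_marginals(toy)
same = np.allclose(refit_prediction, dropped_prediction)
```

The two predictions agree here because a complete library is a balanced design, so the least-squares projection onto the additive terms coincides with the marginal means. With incomplete sampling or a nonlinear genotype-phenotype map, the refit and the truncated model can differ.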
