
Tracing a protein’s folding pathway over evolutionary time using ancestral sequence reconstruction and hydrogen exchange

  1. Shion An Lim
  2. Eric Richard Bolin
  3. Susan Marqusee (corresponding author)
  1. University of California, Berkeley, United States
  2. Chan Zuckerberg Biohub, United States
Research Article
Cite this article as: eLife 2018;7:e38369 doi: 10.7554/eLife.38369

Abstract

The conformations populated during protein folding have been studied for decades; yet, their evolutionary importance remains largely unexplored. Ancestral sequence reconstruction allows access to proteins across evolutionary time, and new methods such as pulsed-labeling hydrogen exchange coupled with mass spectrometry allow determination of folding intermediate structures at near amino-acid resolution. Here, we combine these techniques to monitor the folding of the ribonuclease H family along the evolutionary lineages of T. thermophilus and E. coli RNase H. All homologs and ancestral proteins studied populate a similar folding intermediate despite being separated by billions of years of evolution. Even though this conformation is conserved, the pathway leading to it has diverged over evolutionary time, and rational mutations can alter this trajectory. Our results demonstrate that evolutionary processes can affect the energy landscape to preserve or alter specific features of a protein’s folding pathway.

https://doi.org/10.7554/eLife.38369.001

Introduction

Protein folding, the process by which an unfolded polypeptide chain navigates its energy landscape to achieve its native structure (Dill and MacCallum, 2012; Baldwin, 1975), can be defined by the partially folded conformations (intermediates) populated during this process. Such intermediates are key features of the landscape; they can facilitate folding, but they can also lead to misfolding and aggregation, resulting in a breakdown of proteostasis and disease (Karamanos et al., 2016; Chiti and Dobson, 2017; Ahn et al., 2016). While identifying and characterizing these intermediates is critical to understanding and engineering a protein’s energy landscape, their transient nature and low populations present experimental challenges. Currently, we know very little about how evolutionary variations in the primary amino acid sequence affect these transient conformations and the overall folding pathway of a protein. Recent technological improvements in hydrogen exchange monitored by mass spectrometry (HX-MS) have provided access to the structural and temporal details of folding intermediates at near-single amino-acid resolution (Hu et al., 2013; Walters et al., 2012; Mayne et al., 2011; Aghera and Udgaonkar, 2017; Vahidi et al., 2013; Khanal et al., 2012). The pulsed-labeling HX-MS approach is particularly well suited to studies of multiple variants or families of proteins, as it does not require large amounts of purified protein or NMR assignments, and it is a powerful platform to test hypotheses on how folding pathways and protein conformations are dictated by sequence and the environment. Thus, pulsed-labeling HX-MS can be used to address long-standing questions in the field: How robust is a protein’s energy landscape to changes in the amino acid sequence, and how conserved is the folding trajectory over evolutionary time?

Ribonuclease HI (RNase H) is an ideal system to investigate protein folding over evolutionary time. RNase H from E. coli, ecRNH* (the asterisk denotes a cysteine-free variant of RNase H), is arguably one of the best-characterized proteins in terms of its folding pathway and energy landscape. Both stopped-flow ensemble studies and single-molecule optical trap experiments demonstrate that this protein populates a major obligate intermediate before the rate-limiting step in folding (Raschke and Marqusee, 1997; Raschke et al., 1999; Cecconi et al., 2005; Rosen et al., 2014). A rare population of this intermediate can also be detected under native-state conditions (Chamberlain et al., 1996). Several homologs of RNase H have also been studied, yielding insight into the folding trends of extant RNases H (Hollien and Marqusee, 2002; Kern et al., 1998; Ratcliff et al., 2009). Given the wealth of data available on several extant RNases H, this seemed like an ideal system to explore the evolutionary basis of these observations.

One can use a phylogenetic technique called ancestral sequence reconstruction (ASR) to access the evolutionary history of a protein family and study the properties of ancestral proteins (Harms and Thornton, 2013; Wheeler et al., 2016). ASR has been applied to a variety of protein families; in addition to revealing evolutionary history, the reconstructed ancestral proteins can act as intermediates in sequence space to uncover mechanisms underlying protein properties (Starr et al., 2017; Gaucher et al., 2008; Hobbs et al., 2012; Perez-Jimenez et al., 2011; Risso et al., 2013; Smock et al., 2016; Akanuma et al., 2013; Siddiq et al., 2017). Recently, ASR was applied to the RNase H family, and the thermodynamic and kinetic properties of seven ancestral proteins connecting the lineages of E. coli and T. thermophilus RNase H (ecRNH* and ttRNH*) were characterized (Hart et al., 2014; Lim et al., 2016; Lim and Marqusee, 2017). Stopped-flow kinetics monitored by circular dichroism (CD) demonstrate that all seven ancestral proteins populate a folding intermediate before the rate-limiting step. Additionally, the folding and unfolding rates show notable trends along the phylogenetic lineages, and the presence of a folding intermediate plays an important role in modulating these evolutionary trends (Lim et al., 2016). Although the kinetic mechanism appears conserved, the structural similarities and differences between the folding intermediates across these RNases H are unknown.

For ecRNH*, multiple methods have confirmed the structural details of the folding intermediates. The major folding intermediate, termed Icore, which forms before the rate-limiting step, involves secondary structure formation in the core region of the protein, including Helices A–D and Strands 4 and 5, while the rest of the protein (Helix E and Strands 1, 2, 3) remains unfolded (Figure 1A) (Raschke and Marqusee, 1997; Spudich et al., 2004; Connell et al., 2009). Pulsed-labeling HX-MS with near amino acid resolution was developed using ecRNH* as the model protein (Hu et al., 2013). This approach confirmed the structure of Icore and revealed the stepwise protection of individual helices leading up to the intermediate. Specifically, the amide hydrogens in Helix A and Strand 4 are the first elements to gain protection, followed by those in Helix D and Strand 5, and then Helices B and C to form the canonical Icore intermediate. The periphery, comprising Strands 1–3 and Helix E, gains protection in the rate-limiting step to the native state. Are this Icore folding intermediate and the stepwise folding pathway conserved across evolution?

Figure 1. RNase H structure and schematic of pulsed-labeling HX-MS experiment.

(A) Crystal structure of E. coli RNase H* (ecRNH*) (PDB: 2RN2) (Katayanagi et al., 1992). Secondary structural elements: Red: Strand 1, Strand 2, Strand 3; Blue: Helix A, Strand 4; Yellow: Helix B, Helix C; Green: Helix D, Strand 5; Purple: Helix E. The core region of the protein (Icore) involving Helix A, Strand 4, Helix B, Helix C, Helix D, Strand 5 and the periphery region of the protein involving Strand 1, Strand 2, Strand 3, Helix E are denoted. (B) Pulsed-labeling setup and workflow. Unfolded, fully deuterated protein in high [urea] is rapidly mixed with low [urea] refolding buffer to initiate refolding. After some refolding time, hydrogen exchange of unprotected amides is initiated by mixing with high-pH pulse buffer. The hydrogen exchange reaction is quenched by mixing with a low-pH quench buffer. The sample is injected onto an LC-MS for in-line proteolysis, desalting, and peptide separation by reverse-phase chromatography followed by MS analysis.

https://doi.org/10.7554/eLife.38369.002

Here, we used pulsed-labeling HX-MS on the resurrected family of RNases H to investigate the evolutionary and sequence determinants governing the folding trajectory. Specifically, we find that the structure of the major folding intermediate (Icore) has been conserved over three billion years of evolution, suggesting that this partially folded state plays a crucial role in the folding or function of the protein. However, the path to this intermediate varies, both during evolution and by designed mutations. The very first step in folding differs between the two extant homologs: for ecRNH*, Helix A gains protection before Helix D, while for ttRNH*, Helix D acquires protection before Helix A. This pattern can be followed along the evolutionary lineages: most of the ancestors fold like ttRNH* (Helix D before Helix A) and a switch to fold like ecRNH* (Helix A before Helix D) occurs late along the mesophilic lineage. These phylogenetic trends allow us to investigate how these early folding events are encoded in the amino acid sequence. By using single-point mutations to selectively modulate biophysical properties, notably intrinsic helicity of specific secondary structure elements, we are able to favor or disfavor the formation of specific conformations during folding and have engineering control over the folding pathway of RNase H.

Results

Monitoring a protein’s folding trajectory by pulsed-labeling HX-MS

We used pulsed-labeling hydrogen exchange monitored by mass spectrometry (HX-MS) on extant, ancestral, and site-directed variants of RNase H to examine the robustness of a protein’s folding pathway to sequence changes. These experiments allow us to characterize the partially folded intermediates and the order of structure formation during folding to ask whether these intermediates have changed over evolutionary time, and what role sequence might play in determining these intermediate conformations.

Figure 1B outlines the scheme for the pulsed-labeling experiment (for details, see Materials and methods). Briefly, folding is initiated by rapidly diluting an unfolded (high [urea]), fully deuterated protein into folding conditions (low [urea]) at 10°C. After various folding times (tf), a pulse of hydrogen exchange is applied to label amides in regions that have not yet folded. The amount of exchange at each folding timepoint is then detected by in-line proteolysis and LC/MS. Data are analyzed first at the peptide level by monitoring the protection of deuterons on peptides as a function of refolding time, and then at the residue level, using overlapping peptides deconvoluted by the program HDsite (Kan et al., 2013; Kan et al., 2011). Protection from exchange in these experiments arises from the formation of structure; however, hydrogen exchange alone cannot reveal the exact structure. Thus, nonnative or transient structures can be detected, but their characterization is limited.
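Peptide-level protection is reported as the fraction of deuterium a peptide retains after the labeling pulse. A common way to express this, assumed here rather than taken from the paper's Materials and methods, is to place each peptide's centroid mass between a control that has lost all exchangeable deuterium and a control that has kept all of it. The function name, control masses, and time points below are hypothetical and serve only to sketch the normalization.

```python
def fraction_deuterium_retained(m_observed, m_undeuterated, m_fully_deuterated):
    """Normalize a peptide centroid mass to the fraction of deuterium retained (0 to 1).

    m_undeuterated: centroid mass of a control that has lost all exchangeable deuterium
    m_fully_deuterated: centroid mass of a control that retains all exchangeable deuterium
    Using both controls corrects, to first order, for back-exchange during LC-MS.
    """
    span = m_fully_deuterated - m_undeuterated
    if span <= 0:
        raise ValueError("fully deuterated control must be heavier than the undeuterated control")
    fraction = (m_observed - m_undeuterated) / span
    # Clamp small excursions outside [0, 1] caused by measurement noise.
    return min(max(fraction, 0.0), 1.0)


# Hypothetical centroid masses (Da) of one peptide at three refolding times (msec).
time_course = {1: 1001.5, 13: 1004.0, 1000: 1009.5}
protection = {tf: fraction_deuterium_retained(m, 1000.0, 1010.0) for tf, m in time_course.items()}
print(protection)
```

A peptide from a region that folds early would show this fraction rising within milliseconds; one from the periphery would stay near zero until the rate-limiting step.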

Since the original folding studies on many RNases H were carried out at 25°C, we re-characterized the folding of each RNase H variant at 10°C using stopped-flow circular dichroism spectroscopy (Figure 2—figure supplement 1, Figure 2—source data 1). The refolding profiles were consistent with those at 25°C (Raschke and Marqusee, 1997; Hollien and Marqusee, 2002; Lim et al., 2016). At low [urea], all ancestors show a large signal change (burst phase) within the dead time of the stopped-flow instrument (~15 msec), followed by a slower observable phase that fits well to a single exponential. The resulting chevron plots (ln(kobs) vs [urea]) show the classic rollover at low [urea] due to the presence of a stable folding intermediate. As expected, the observed rates at 10°C are slower than those at 25°C, but the chevron profiles are similar for all RNase H variants. Thus, the overall folding trajectory, notably the population of a folding intermediate, has not changed between the two temperatures.
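The rollover can be reproduced with a simple three-state model, U ⇌ I → N, in which the intermediate is on-pathway and in rapid pre-equilibrium with the unfolded state. The sketch below assumes that model, log-linear urea dependence of every constant, and entirely invented parameters (K0, m-values, rate constants); it is not a fit to any RNase H data. The observed rate is the I → N rate weighted by the fraction of non-native molecules in I, plus the unfolding rate.

```python
import math


def kobs_three_state(urea, K0=50.0, m_ui=1.5, k_in0=5.0, m_f=0.5, k_ni0=1e-5, m_u=1.2):
    """Observed relaxation rate (s^-1) for U <-> I <-> N with U/I in rapid pre-equilibrium.

    All parameters are hypothetical; m-values are in M^-1 with RT absorbed.
    """
    K_ui = K0 * math.exp(-m_ui * urea)      # [I]/[U], falls as urea rises
    frac_i = K_ui / (1.0 + K_ui)            # fraction of non-native molecules in I
    k_in = k_in0 * math.exp(-m_f * urea)    # rate-limiting folding step I -> N
    k_ni = k_ni0 * math.exp(m_u * urea)     # unfolding step N -> I
    return frac_i * k_in + k_ni


def chevron(urea_values):
    """Return (urea, ln kobs) pairs, i.e. the coordinates of a chevron plot."""
    return [(u, math.log(kobs_three_state(u))) for u in urea_values]


def arm_slope(u1, u2):
    """Finite-difference slope of ln kobs between two urea concentrations."""
    return (math.log(kobs_three_state(u2)) - math.log(kobs_three_state(u1))) / (u2 - u1)


for u, lnk in chevron([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]):
    print(f"{u:.1f} M urea: ln kobs = {lnk:.2f}")

# Rollover: the folding arm is much shallower near 0 M urea than at moderate urea.
print(arm_slope(0.0, 0.5), arm_slope(3.0, 3.5))
```

In a two-state protein the folding arm would stay linear down to 0 M urea; here it flattens because, once I is fully populated, kobs approaches the I → N rate alone.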

Monitoring the folding pathway of ttRNH* using pulsed-labeling HX-MS

First, we characterized the conformations populated during folding of extant RNase H from T. thermophilus and compared its folding trajectory to the previously characterized folding trajectory of E. coli RNase H (Figure 2—figure supplement 1, Figure 2—figure supplement 2, Figure 2—source data 1) (Hu et al., 2013). We identified 374 unique peptides mapping to the ttRNH* sequence by MS (Figure 2A, Figure 2—source data 1). The major folding intermediate in ttRNH*, Icore, is strikingly similar to that of ecRNH* (Hu et al., 2013). As in ecRNH*, peptides associated with Icore (Helices A–D, Strands 4–5) gain protection early (within milliseconds), corresponding to the timescale for the formation of the folding intermediate. Peptides associated with the periphery of the protein (Strands 2–3, Helix E) gain protection on the order of seconds, corresponding to the rate-limiting step (Figure 2B). Although CD spectroscopy has shown that the slow observed refolding kinetics of ttRNH* fit best to a biphasic exponential process (Figure 2—figure supplement 1, Figure 2—source data 1), we were not able to observe structural changes attributable to these two phases by pulsed-labeling HX-MS. Thus, the molecular mechanism that gives rise to the biphasic behavior of ttRNH* remains an open question.

Figure 2. Determination of the folding pathway of T. thermophilus RNase H* by HX-MS.

(A) Protection of representative peptides from ttRNH* at various refolding times. Peptides are colored according to their corresponding structural element. The solid arrow indicates the refolding time point analyzed in panel B. The dotted arrow indicates the refolding time point analyzed in panel C. (B) Protection of peptides mapping to the core region (Icore) or the periphery region of ttRNH* at 21 msec after refolding. Bars represent the mean and standard deviation of each data set. *p<0.0001 (Welch’s unpaired T-test). (C) Protection of peptides of ttRNH* mapping to distinct secondary structural elements at 1 msec after refolding. Bars represent the mean and standard deviation of each data set. *p=0.0027 (Welch’s unpaired T-test). Data for B and C represent aggregate data from three separate experiments. (D) Residue-resolved folding pathway of ttRNH* at representative refolding time points. Data points in black indicate residues that are site-resolved. Data points in grey indicate residues in regions with less peptide coverage that are thus not resolved from neighboring residues. Residues where site-resolved protection could not be determined due to insufficient peptide coverage are denoted with an ‘x’.

https://doi.org/10.7554/eLife.38369.003

Looking at very early refolding times allows one to determine the individual folding steps preceding Icore. At the earliest time point (~1 msec), almost all peptides are unfolded (fully exchanged with solvent), with the exception of those in Helix D and Strand 5, which are ~40% deuterated (Figure 2C). Peptides spanning Helix A and Strand 4 are less protected (~15% deuterated) at this same time point. This order of protection (Helix D before Helix A) is notably different from that of E. coli RNase H*, where Helix A is protected before Helix D (Hu et al., 2013). Peptides spanning Helix B and Helix C gain protection in the Icore intermediate. Peptides from Strands 1–3 and Helix E do not gain full protection until significantly later (on the order of seconds), corresponding to the rate-limiting step to the native state. Thus, while the Icore intermediate is largely conserved between ttRNH* and ecRNH*, the initial steps of folding differ between the two homologs.
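The region-to-region comparisons in the figure captions use Welch's unpaired t-test, which does not assume equal variances in the two groups of peptides. The sketch below computes the Welch t statistic and Welch–Satterthwaite degrees of freedom with the Python standard library; the peptide values are invented for illustration and are not the paper's data. A p-value can then be obtained from the t distribution, for example with SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    se2_a = variance(a) / len(a)   # squared standard error of group a (sample variance, n - 1)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical fraction-protected values for peptides unique to two regions at ~1 msec.
helix_d_strand5 = [0.42, 0.38, 0.40, 0.41, 0.37]
helix_a_strand4 = [0.16, 0.14, 0.15, 0.13, 0.18]
t, df = welch_t(helix_d_strand5, helix_a_strand4)
print(f"t = {t:.1f}, df = {df:.1f}")
```

A positive t here means the first group (Helix D/Strand 5) is more protected; the degrees of freedom always fall between the smaller group size minus one and the pooled n − 2.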

The peptide data from each time point were also analyzed using HDSite to determine residue-level protection in a near site-resolved manner (Figure 2D). These site-resolved data also show protection appearing first in Helix D and Strand 5, followed by Helix A/Strand 4, Helix B/C, and finally, the periphery Helix E and Strands 1 – 3. The differences in the order of protection leading up to Icore of ecRNH* and ttRNH* are also evident in this site-resolved analysis.
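HDsite combines information from many overlapping peptides to assign protection to individual residues. The simpler subtractive approach below is not HDsite's algorithm; it only illustrates the basic idea of localizing deuterium with two overlapping peptides that share an N-terminus, so that the extra C-terminal residues of the longer peptide account for the difference in retained deuterons. The function names, the peptide sequences, and the assumption that deuteron counts are already back-exchange corrected are all hypothetical.

```python
def exchangeable_amides(seq):
    """Backbone amides expected to carry label: skip the N-terminal residue and prolines."""
    return sum(1 for i, aa in enumerate(seq) if i > 0 and aa != "P")


def localize_by_subtraction(long_pep, short_pep):
    """Assign deuterium to the residues a longer peptide has beyond a shorter one.

    Each peptide is (start_residue, sequence, deuterons_retained); deuteron counts are
    assumed to be back-exchange corrected. Both peptides must share the same start.
    Returns ((first_residue, last_residue), fraction_protected) for the extra segment.
    """
    start_l, seq_l, d_l = long_pep
    start_s, seq_s, d_s = short_pep
    if start_l != start_s or len(seq_l) <= len(seq_s) or not seq_l.startswith(seq_s):
        raise ValueError("long peptide must extend the short one at its C-terminus")
    for seq, d in ((seq_l, d_l), (seq_s, d_s)):
        if d > exchangeable_amides(seq):
            raise ValueError("more deuterons than exchangeable amides")
    extra = seq_l[len(seq_s):]
    # The extra residues are internal to the long peptide, so only prolines are excluded.
    sites = sum(1 for aa in extra if aa != "P")
    if sites == 0:
        raise ValueError("extra segment has no exchangeable amides")
    segment = (start_s + len(seq_s), start_l + len(seq_l) - 1)
    return segment, (d_l - d_s) / sites


# Hypothetical overlapping peptides (not RNase H data).
print(localize_by_subtraction((40, "ALKEDGFLVK", 6.0), (40, "ALKEDGF", 4.0)))
```

Noise in either measurement propagates directly into the difference, which is one reason a method that uses many overlapping peptides at once is preferable for residue-level analysis.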

Pulsed-labeling HX-MS on the ancestral RNases H

To look for evolutionary trends in the folding trajectory, we probed the folding pathway of ancestral RNases H along the lineages of E. coli and T. thermophilus RNase H (Figure 3A). Anc1* is the last common ancestor of ecRNH* and ttRNH*. Anc2* and Anc3* are ancestors along the thermophilic lineage leading to ttRNH*, and AncA*, AncB*, AncC*, and AncD* are ancestors along the mesophilic lineage leading to ecRNH*. Previous kinetic studies demonstrated that all of the ancestral proteins fold via a three-state pathway, populating an intermediate before the rate-limiting step (Lim et al., 2016; Lim and Marqusee, 2017). We now use pulsed-labeling HX-MS to obtain a near-site resolved trajectory of the folding pathway for each ancestor and determine whether the Icore structure is conserved over evolution.

Figure 3. Determination of the folding pathway of ancestral RNases H by HX-MS.

(A) Representation of the phylogenetic tree of the RNase H family illustrating the ancestral proteins along the two lineages leading to E. coli RNase H and T. thermophilus RNase H. Adapted from Figure 2A of Hart KM et al. 2014, PLoS Biology. 12(11) doi:10.1371/journal.pbio.1001994, published under the Creative Commons Attribution 4.0 International Public License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/) (Hart et al., 2014). Anc1* is the last common ancestor of ecRNH* and ttRNH*. Anc2* and Anc3* are ancestors along the thermophilic lineage to ttRNH*. AncA*, AncB*, AncC*, and AncD* are ancestors along the mesophilic lineage to ecRNH*. (B) Protection of representative peptides from Anc1* at various refolding times. Peptides are colored according to their corresponding structural element. The solid arrow indicates the refolding time point analyzed in panel C. The dotted arrow indicates the refolding time point analyzed in panel D. (C) Protection of peptides mapping to the core region (Icore) or the periphery region of Anc1* at 13 msec after refolding. Bars represent the mean and standard deviation of each data set. *p = 0.0011 (Welch’s unpaired T-test). (D) Protection of peptides mapping to distinct secondary structural elements of Anc1* at 1 msec after refolding. Bars represent the mean and standard deviation of each data set. *p < 0.0001 (Welch’s unpaired T-test). Data for C and D represent aggregate data from three separate experiments. (E) Residue-resolved folding pathway of Anc1* at representative refolding time points. Data points in black indicate residues that are site-resolved. Data points in grey indicate residues in regions with less peptide coverage that are thus not resolved from neighboring residues. Residues where site-resolved protection could not be determined due to insufficient peptide coverage are denoted with an ‘x’.

https://doi.org/10.7554/eLife.38369.007

We obtained good peptide coverage for all of the ancestors, with a minimum of 81 peptides seen in all time points for each variant (Figure 3, Figure 3—figure supplements 1–6, Figure 3—source data 1). As observed in both ttRNH* (above) and ecRNH* (Figure 2, Figure 2—figure supplement 2) (Hu et al., 2013), all of the ancestral RNases H populate the canonical Icore folding intermediate prior to the rate-limiting step. Peptides corresponding to the Icore region of the RNase H structure become protected on the timescale of milliseconds, while the rest of the protein gains protection on the timescale of seconds (Figure 3C, Figure 3—figure supplements 1–6). Thus, the structure of this major folding intermediate is not only present in both extant RNases H, but is conserved over nearly three billion years of evolutionary history.

As in the extant proteins, the periphery of the ancestral proteins gains protection on a much slower timescale (Figure 3C, Figure 3—figure supplements 1–6). The details of protection in this region, however, vary somewhat across the ancestors. The periphery becomes fully protected by the last time point in all ancestral proteins except AncB* (Figure 3—figure supplement 4). AncB* was previously characterized as non-two-state, with a notable population of the folding intermediate under equilibrium conditions (Lim et al., 2016), and the lack of protection in the periphery in the folded state of AncB* is consistent with this observation. For Anc1* and Anc2*, there are also notable differences in the time course of protection for the terminal helix, Helix E. For these two proteins, the peptides spanning Helix E are decoupled from Strands 1–3 (which show protection on the same timescale as global folding) and do not gain protection even in the folded state of the protein (Figure 3B, Figure 3D, Figure 3—figure supplement 1), suggesting that Helix E is improperly docked or poorly structured in Anc1* and Anc2*. Indeed, Helix E is known to be labile in ecRNH*: a deletion variant of ecRNH* lacking this final helix forms a cooperatively folded protein (Goedken et al., 1997), and recent single-molecule force spectroscopy of ecRNH* showed that Helix E can be pulled off the folded protein under low force while the remainder of the protein remains structured (manuscript in preparation). It appears that Helix E may be further destabilized in Anc1* and Anc2* such that it does not show protection in the native state.

The early folding steps of RNase H change across evolutionary time

Since the order of events leading to Icore differs between the extant homologs, we examined whether the ancestral RNases H spanning the lineages of these two homologs show any trends in their early folding steps. For each ancestor, we analyzed the fraction of deuterium protected in peptides that are uniquely associated with specific regions of the protein (Figure 3D and Figure 3—figure supplements 1–6) to determine which region folds first.

These data show that the last common ancestor of ecRNH* and ttRNH*, Anc1*, as well as all proteins along the thermophilic lineage (Anc2* and Anc3*) show similar behavior to ttRNH* and gain protection first in Helix D/Strand 5 (Figure 3D, Figure 3—figure supplements 1 and 2). For the first two ancestors along the mesophilic lineage (AncA* and AncB*), the order of protection is difficult to determine. For AncA*, there is no significant difference in the degree of protection among the peptides within Icore (this analysis is limited by the availability of peptides associated exclusively within a region) (Figure 3—figure supplement 3). However, when all overlapping peptides are analyzed using HDSite to obtain site resolution, we observe notable protection in Helix D at the earliest refolding times. Therefore, we conclude that although Helix D folding before Helix A is likely, the early folding events of AncA* cannot be unambiguously determined. For AncB*, all of Icore gains protection at the same time point, both at the peptide and residue-level, so the order of assembly cannot be determined with our time resolution (Figure 3—figure supplement 4).

The next ancestor along the mesophilic lineage, AncC*, shows protection first in Helix D, indicating that this pattern of protection is maintained through the mesophilic lineage to this ancestor (Figure 3—figure supplement 5). AncD*, the most recent ancestor along the mesophilic lineage, however, is similar to ecRNH* and gains protection first in Helix A (Figure 3—figure supplement 6). As detailed for the other ancestors, the data were also analyzed using HDSite to determine residue-level protection for each ancestral RNase H (Figure 3E, Figure 3—figure supplements 1–6). These data indicate a pattern in the order of protection in the early steps of the folding pathway across the RNase H ancestors. Early protection in Helix D is an ancestral feature of RNase H that is maintained in the thermophilic lineage, with a transition occurring late during the mesophilic lineage to a different pathway where Helix A is protected before Helix D, resulting in a distinct folding pathway for the two extant RNase H homologs (Figure 4).

Figure 4. Intrinsic helicity as a predictor for the early folding mechanism of RNases H.

Log-ratio of intrinsic helicity of Helix A and Helix D for each RNase H variant studied. Intrinsic helicity predictions were calculated using AGADIR (Muñoz and Serrano, 1994). The order of helix protection for each variant of RNase H is depicted in color. Green bars represent proteins where Helix D is the first structural element to gain protection during refolding. Blue bars represent proteins where Helix A is the first structural element to gain protection during refolding. Grey bars represent proteins where the helix protection order could not be unambiguously determined. The order of helix protection for each ancestor and homolog is also colored on the phylogenetic tree, revealing a trend in the RNase H folding trajectory along the evolutionary lineages. The phylogenetic tree shown in this figure is adapted from Figure 2A of Hart KM et al. 2014, PLoS Biology. 12(11) doi:10.1371/journal.pbio.1001994, published under the Creative Commons Attribution 4.0 International Public License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/) (Hart et al., 2014).

https://doi.org/10.7554/eLife.38369.015

Early helix protection is determined by the local sequence of the core

Relative to the vast sequence space available, these RNase H ancestors represent a set of closely related sequences with distinct folding properties and provide an excellent system to elucidate the physicochemical mechanism and the sequence determinants dictating the RNase H folding trajectory. An analysis of the intrinsic helical propensity of each region using the algorithm AGADIR (Muñoz and Serrano, 1994) shows a notable trend in helicity that correlates with the early folding events (Figure 4). For proteins that gain protection in Helix A first, the intrinsic helicity of Helix A is about four-fold higher than that of Helix D. For the variants where Helix D is protected first, the intrinsic helicity of Helix D is similar to or greater than that of Helix A. This suggests that intrinsic helix propensity may play an important role in determining which region is the first to gain protection during the folding pathway of RNase H. To investigate this hypothesis, we turned to rationally designed variants.
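The trend can be phrased as a simple decision rule on the ratio of predicted helicities. The sketch below is a hypothetical heuristic built only from the two observations above: roughly four-fold higher Helix A helicity for Helix-A-first variants, and similar or lower Helix A helicity for Helix-D-first variants. The cutoff of 1.5 for "similar", the base-10 logarithm, the function names, and the input helicities are assumptions; the inputs are not AGADIR outputs for any RNase H variant.

```python
import math


def helicity_log_ratio(helicity_a, helicity_d):
    """log10 of predicted intrinsic helicity of Helix A over Helix D (base 10 is an assumption)."""
    if helicity_a <= 0 or helicity_d <= 0:
        raise ValueError("helicity values must be positive")
    return math.log10(helicity_a / helicity_d)


def predicted_first_helix(helicity_a, helicity_d, a_first_ratio=4.0, d_first_ratio=1.5):
    """Hypothetical heuristic for which helix gains protection first.

    a_first_ratio mirrors the ~4-fold excess reported for Helix-A-first variants;
    d_first_ratio is an invented cutoff for "similar to or lower than" Helix D.
    """
    if helicity_a <= 0 or helicity_d <= 0:
        raise ValueError("helicity values must be positive")
    ratio = helicity_a / helicity_d
    if ratio >= a_first_ratio:
        return "Helix A"
    if ratio <= d_first_ratio:
        return "Helix D"
    return "undetermined"


# Hypothetical percent-helicity values, for illustration only.
for name, h_a, h_d in [("variant 1", 10.0, 2.0), ("variant 2", 1.0, 3.0), ("variant 3", 2.5, 1.0)]:
    print(name, round(helicity_log_ratio(h_a, h_d), 2), predicted_first_helix(h_a, h_d))
```

Intermediate ratios are deliberately left "undetermined", in the same spirit as the grey bars for variants whose protection order could not be assigned.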

Intrinsic helicity plays a role in determining the structure of the early intermediates

If the order of protection in the early folding events of RNase H is determined by intrinsic helix propensity, then we should be able to alter the protein sequence rationally and manipulate the folding trajectory. Thus, we asked whether single-site mutations that change the relative helix propensity of Helix A and Helix D could alter the folding trajectory of ecRNH* and make it fold in a similar fashion to ttRNH*. Two different point mutations were made in ecRNH*: A55G decreases helix propensity in Helix A, and D108L increases helix propensity in Helix D (Figure 4, Figure 5A, Supplementary file 1 – Table 1). Pulsed-labeling HX-MS indicates that both of these variants alter the early folding events of ecRNH*. The peptide-level protection of ecRNH* A55G indicates that at 13 msec, Helix A and Helix D show similar levels of protection. In contrast, for wild-type ecRNH*, Helix A shows protection by 1 msec, and Helix D does not show comparable protection until 10–20 msec (Hu et al., 2013). Thus, the A55G mutation slows the gain of protection in Helix A such that it is no longer protected before Helix D (Figure 5B, Figure 5—figure supplement 1). The peptide-level protection of ecRNH* D108L indicates a change in the order of protection. Due to the limited number of peptides available, we could only confidently determine this using peptides spanning the N-terminus of Helix D. At 13 msec, the N-terminus of Helix D (residues 106–108), near the D108L mutation, is protected significantly faster than any other region of the protein. Thus, increasing helix propensity correlates with a change in the folding trajectory (Figure 5C, Figure 5—figure supplement 1). Together, these two mutations suggest that intrinsic helicity plays a role in the early folding events of RNase H and can be used to alter the stepwise order of conformations populated during folding.

Figure 5. Engineered mutations to alter the folding pathway of ecRNH*.

(A) Crystal structure of E. coli RNase H (PDB: 2RN2) with mutations designed to alter intrinsic helicity (Katayanagi et al., 1992). A55G, located in Helix A (blue), is colored in cyan. D108L, located in Helix D (green), is colored in light green. (B) Protection of peptides mapping to distinct secondary structural elements of ecRNH* A55G at 13 msec after refolding. Bars represent the mean and standard deviation of each data set. p=0.0917 (n.s. = not significant, Welch’s unpaired T-test). (C) Protection of peptides mapping to distinct secondary structural elements of ecRNH* D108L at 13 msec after refolding. Bars represent the mean and standard deviation of each data set. *p=0.0044, **p=0.0016 (Welch’s unpaired T-test). Data for B and C represent aggregate data from three separate experiments.

https://doi.org/10.7554/eLife.38369.016

Discussion

Determining the folding pathway of multiple protein variants

Pulsed-labeling hydrogen exchange is currently the most detailed method to identify the conformations populated during protein folding. This approach was initially developed for use with NMR detection, where it benefited from NMR’s site-specific resolution of individual amides (Bai, 2006). However, using NMR with pulsed-labeling HX requires tens of milligrams of sample and NMR peak assignments for the amides in each protein studied. In addition, probes are limited to amide sites that are stable to exchange in the final folded state (protection factors of >~80,000), resulting in loss of information at individual sites, which can sometimes represent large regions of the protein. In contrast, detection by mass spectrometry, as applied in this study, requires much less protein sample, offers much faster data collection, and can theoretically cover 100% of the protein sequence. Importantly, this approach does not demand any structural information about the folded state, such as NMR assignments, for the specific protein or variant studied. These advantages enabled us to obtain the stepwise folding pathway of nine variants of RNase H and study the evolutionary history and sequence determinants of the RNase H folding pathway in detail. While pulsed-labeling HX-MS has been used to characterize the folding pathways of several model systems, this study is the first to use the higher-throughput nature of HX-MS to study an ensemble of protein variants. The ability of this technique to study many different sequences of the same fold shows great promise for probing the relationship between amino acid sequence and a protein’s energy landscape, and it will likely be particularly valuable for protein engineering and design applications. Additionally, HX-MS should also be able to identify non-native states, backtracking, and misfolded conformations. We did not, however, observe such phenomena with our RNase H proteins.
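Why only highly protected amides survive NMR-detected pulsed labeling can be illustrated with the standard EX2 approximation, in which a native-state amide exchanges at roughly k_int / P, where k_int is the intrinsic (unprotected) rate and P the protection factor. The sketch below assumes EX2 behavior, a single constant k_int, and an invented intrinsic rate (1 s⁻¹) and handling time (1 hour); it shows the relationship between P and label retention but does not reproduce how the >~80,000 threshold was derived.

```python
import math


def label_retained(k_int, t_seconds, protection_factor):
    """Fraction of amide label remaining after time t, assuming EX2 exchange at k_int / P."""
    return math.exp(-k_int * t_seconds / protection_factor)


def min_protection_factor(k_int, t_seconds, fraction_kept):
    """Smallest P for which at least `fraction_kept` of the label survives time t."""
    if not 0.0 < fraction_kept < 1.0:
        raise ValueError("fraction_kept must lie strictly between 0 and 1")
    return k_int * t_seconds / -math.log(fraction_kept)


# Hypothetical intrinsic rate (1 s^-1) and handling time (1 hour), for illustration only.
for pf in (1e3, 1e4, 8e4):
    print(f"P = {pf:>8.0f}: {label_retained(1.0, 3600, pf):.3f} of label kept")
print(round(min_protection_factor(1.0, 3600, 0.9)))
```

Because retention decays exponentially in k_int·t/P, amides with modest protection lose nearly all of their label over long handling times, whereas MS detection after a rapid quench avoids this constraint.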

Although pulsed-labeling HX-MS is a powerful technique, it is not without limitations. In particular, HX reports on the accessibility of the backbone amide to exchange with solvent, so we cannot distinguish between different protein conformations with the same protection pattern. Additionally, a variety of factors involved in protein folding, from hydrogen bond formation to hydrophobic collapse, can contribute to amide protection. Since the protection patterns we observe during folding are all consistent with protected regions in the native state, we see no evidence for non-native structure, which indicates a sequential acquisition of native secondary structure.

Icore is a structurally conserved folding intermediate over 3 billion years of evolution

The native fold of a protein is robust to changes in sequence, and proteins with >~30% sequence identity share the same fold (Sander and Schneider, 1991). Thus, small variations in sequence, such as those found among homologs or introduced by site-specific mutations, generally do not alter the overall three-dimensional structure of a protein. These mutations can, however, affect the energy landscape, which in turn can have profound effects on function. Here, we find that a high-energy structure populated during the folding of the RNase H family has been conserved over very long evolutionary timescales. Using pulsed-labeling HX-MS, we identified and characterized the structure of the major folding intermediate in seven ancestral and several mutant RNases H. Together with previous studies on extant homologs, these results suggest that conservation of this intermediate is a key feature of the RNase H energy landscape across ~3 billion years of evolutionary time.

Why does Icore persist on the energy landscape of RNase H? One explanation is a simple topological constraint: all RNases H may need to fold via a populated Icore intermediate to reach the native state. This explanation, however, is countered by a previous study in which a single mutation (I53D) in ecRNH* destabilizes Icore such that it is no longer populated during folding, yet this variant still folds to the native state (Spudich et al., 2004). Adding osmolytes, such as sodium sulfate, stabilizes this folding intermediate and switches ecRNH* I53D back to a three-state folding pathway, showing that the population of the folding intermediate can be modulated. Additionally, a fragment of RNase H containing only the Icore sequence (and variants thereof) can fold autonomously and be studied at equilibrium, indicating that this structure is stable and robust to mutations (Chamberlain et al., 1999; Rosen and Marqusee, 2015). The nature of the rate-limiting step, or folding barrier, that allows this intermediate to build up is unclear. One possibility is that Icore is populated simply because the information for folding this region is encoded entirely locally, so Icore can fold relatively quickly, before the rate-limiting step to the fully folded state. Alternatively, the rapid collapse of Icore may result from evolutionary pressure to quickly sequester regions of the protein that are particularly prone to aggregation. Indeed, several segments of RNase H are predicted to be aggregation-prone, and most of these segments are found in the core region of the protein (Supplementary file 1 – Table 2).

Icore could also be conserved because it contributes to the biological function or fitness of the protein. Partially folded states and high-energy non-native conformations are known to be important for a variety of protein functions and for proteostasis (Chiti and Dobson, 2017; Baldwin and Kay, 2009; Boehr et al., 2006). All of the ancestral RNases H studied here are active, in that they cleave RNA-DNA hybrids in vitro (Hart et al., 2014). Although the residues thought to contribute to substrate-binding affinity are contained in the core region of the protein (Kanaya et al., 1991), the active-site residues (D10, E48, D70) span both the core and the periphery. It is therefore possible that a stable folding core with an energetically independent periphery is important for the efficiency or dynamics of catalysis in RNase H.

While the Icore intermediate has been observed in all proteins studied here, recent studies have suggested that, for some RNase H variants (notably those along the thermophilic lineage), the Icore folding intermediate may also involve structure in the first β-strand (Lim and Marqusee, 2017; Rosen and Marqusee, 2015; Zhou et al., 2008). While we see slight protection in this region for ttRNH*, hydrogen exchange may not be the best probe of this: docking of Strand 1 without its hydrogen-bonding partners in the rest of the β-sheet may not be reflected by backbone amide protection. Therefore, amide protection may not be observed even if Strand 1 docks early to the core. Whether Strand 1 is involved in the other RNase H variants studied remains unclear from this study (Lim and Marqusee, 2017; Rosen and Marqusee, 2015).

Aspects of the folding pathway are malleable across evolutionary time

Our pulsed-labeling HX-MS results also illustrate how other features of a protein’s energy landscape can be altered over evolutionary timescales. Although the Icore intermediate is conserved across all RNases H studied, the individual folding steps leading up to Icore differ. Anc1*, the last common ancestor, folds through a pathway in which the Helix D/Strand 5 region is the first structural element to gain protection. This ancestral feature is maintained along the thermophilic lineage to the extant ttRNH*. Along the mesophilic branch, we observe a switch from this ancient folding pathway to one that first forms protection in Helix A/Strand 4. This suggests that while the structure of Icore has been conserved across 3 billion years of evolution, the steps that form this intermediate are malleable over time. Because an isolated helix is unlikely to show protection by HX, we expect that additional hydrophobic collapse of the polypeptide contributes to the observed protection. Nonetheless, the switch in protection between Helix A and Helix D indicates that formation of native structure nucleates in different regions of the protein across the RNase H variants studied, with a clear evolutionary trend.

Despite these trends, it remains difficult to rationalize these observations in terms of selective evolutionary pressure or fitness. These very early events occur on the order of one millisecond, significantly faster than overall folding of the protein. Furthermore, all of these RNase H proteins fold efficiently to their native state with no evidence of aggregation or misfolding. So, although partially folded states have been implicated as gateways to aggregation for some proteins (Chiti and Dobson, 2017), this does not appear to be the case for RNase H. It is possible that the change in the early folding step results from mutations coupled to another feature under selection or drift. Although the actual evolutionary significance of the RNase H folding pathway may be lost to history, the trend across evolutionary time demonstrates that folding pathways and conformations on a protein’s energy landscape can change over time, and this system provides an excellent tool to interrogate the role sequence plays in guiding protein folding.

The folding pathway of RNase H can be altered using simple sequence changes

Our study also shows how insights from evolutionary history can contribute to our understanding of the physicochemical mechanisms dictating the protein energy landscape, and how that knowledge might be used to engineer the landscape. The regions that gain protection first involve helical secondary structure elements, and their folding order correlates with the isolated-helix propensity of these regions predicted by AGADIR (Muñoz and Serrano, 1994). Proteins in which protection is first observed in Helix A have higher intrinsic helicity in Helix A than in Helix D. Proteins in which Helix D gains protection first exhibit higher helicity in Helix D or roughly equal helicity in both regions. We used this property to guide site-directed mutagenesis, selecting variants of ecRNH* predicted from intrinsic helicity to alter its folding trajectory.
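The correlation described above can be written as a simple decision rule. The sketch below is a hypothetical encoding of it, not a procedure taken from the study: it takes per-helix helicity values (for example, percent helicity predicted by AGADIR for the Helix A and Helix D segments) and returns the helix expected to gain protection first. The `tolerance` that defines "roughly equal" helicity is an assumed parameter; the text gives no numerical threshold.

```python
def predicted_first_helix(helicity_a: float, helicity_d: float, tolerance: float = 1.0) -> str:
    """Predict which helix gains HX protection first from intrinsic helicity.

    Encodes the trend reported in the text:
      * Helix A clearly more helical than Helix D   -> Helix A first
      * Helix D more helical, or both roughly equal -> Helix D first
    `tolerance` (same units as the inputs, e.g. % helicity) is a hypothetical
    cutoff for "roughly equal".
    """
    if helicity_a - helicity_d > tolerance:
        return "Helix A"
    return "Helix D"


def mutation_flips_order(wt: tuple[float, float], mutant: tuple[float, float],
                         tolerance: float = 1.0) -> bool:
    """True if a mutation is predicted to switch which helix is protected first.

    `wt` and `mutant` are (Helix A helicity, Helix D helicity) pairs.
    """
    return (predicted_first_helix(*wt, tolerance=tolerance)
            != predicted_first_helix(*mutant, tolerance=tolerance))
```

Under this rule, a mutation that lowers Helix A helicity (as A55G is described as doing) or raises Helix D helicity (as D108L is described as doing) in a Helix A-first sequence such as ecRNH* is flagged only if the predicted change moves the pair inside the tolerance band or past it.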

While these results are consistent with local helicity as a determinant of the earliest folding steps, other parameters may also dictate the formation of these conformations. The average area buried upon folding (AABUF) (Rose et al., 1985), which measures the average change in a residue’s surface area between the unfolded and folded states, has been shown to correlate with the structure of the folding intermediate in apomyoglobin (Nishimura et al., 2005; Nishimura et al., 2011). Both helicity and AABUF are altered in the mutants considered in our study (Supplementary file 1 – Table 1). Indeed, AABUF and helicity are often correlated, and the contributions of the two parameters are difficult to disentangle. Nevertheless, our data suggest that parameters encoded locally in regions of a protein can be used to engineer its energy landscape, including its folding pathway.
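One common way to turn a per-residue scale such as AABUF into a profile along the chain is a sliding-window average; the sketch below assumes that approach and shows only the smoothing step. The `scale` dictionary stands in for per-residue average-area-buried values, which would have to be taken from Rose et al. (1985) and are not reproduced here. The default 9-residue window and the truncated windows at the chain ends are assumptions, not the settings used in this study or by Nishimura et al.

```python
def window_average_profile(sequence: str, scale: dict[str, float], window: int = 9) -> list[float]:
    """Sliding-window average of a per-residue property along a sequence.

    Position i gets the mean of the scale values from i - window//2 to
    i + window//2; windows are truncated (not padded) at the chain ends.
    The window size is an assumed default.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = [scale[residue] for residue in sequence]
    half = window // 2
    profile = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        profile.append(sum(segment) / len(segment))
    return profile


# Placeholder scale for illustration only; NOT the Rose et al. (1985) values.
TOY_SCALE = {"A": 1.0, "G": 0.0}
print(window_average_profile("AAGG", TOY_SCALE, window=3))
```

Comparing such profiles for a wild-type sequence and a point mutant would show how locally a substitution perturbs the smoothed signal, which is the kind of comparison summarized in Supplementary file 1 – Table 1.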

We have used a combination of ASR and pulsed-labeling HX-MS to explore the conformations populated during the folding of multiple RNase H proteins, including homologs, ancestors, and single-site variants. All RNase H proteins studied populate the same major folding intermediate, Icore, indicating that this conformation has been maintained on the energy landscape of RNase H over long evolutionary timescales (>3 billion years). This remarkable conservation of a partially folded structure contrasts with changes in the folding pathway leading up to it. The early folding events preceding this intermediate (Helix A protected before Helix D, or vice versa) differ between the two homologs and show a notable trend along the evolutionary lineages. This pattern of protection correlates with the relative helix propensity of the sequences comprising these two helices, and we used this knowledge to alter the folding pathway of ecRNH* through rationally designed mutations. Our study illustrates how the energy landscape of a protein can be altered in complex ways over evolutionary timescales, and how insights from evolutionary history can contribute to our understanding of the physicochemical mechanisms dictating the protein energy landscape.

Materials and methods

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Strain, strain background (Escherichia coli) | BL21 (DE3) Star | QB3 Macrolab | | |
| Recombinant DNA reagent | ttRNH* (plasmid) | 10.1021/bi982684h | n/a | |
| Recombinant DNA reagent | Anc1* (plasmid) | 10.1371/journal.pbio.1001994 | n/a | |
| Recombinant DNA reagent | Anc2* (plasmid) | 10.1371/journal.pbio.1001994 | n/a | |
| Recombinant DNA reagent | Anc3* (plasmid) | 10.1371/journal.pbio.1001994 | n/a | |
| Recombinant DNA reagent | AncA* (plasmid) | 10.1371/journal.pbio.1001994 | n/a | |
| Recombinant DNA reagent | AncB* (plasmid) | 10.1371/journal.pbio.1001994 | n/a | |
| Recombinant DNA reagent | AncC* (plasmid) | 10.1371/journal.pbio.1001994 | n/a | |
| Recombinant DNA reagent | AncD* (plasmid) | 10.1371/journal.pbio.1001994 | n/a | |
| Recombinant DNA reagent | ecRNH* (plasmid) | 10.1002/pro.5560030906 | n/a | |
| Recombinant DNA reagent | ecRNH* A55G (plasmid) | this paper | n/a | A55G point mutant of ecRNH* |
| Recombinant DNA reagent | ecRNH* D108L (plasmid) | this paper | n/a | D108L point mutant of ecRNH* |
| Commercial assay or kit | QuikChange Site-Directed Mutagenesis Kit | Agilent | 200519 | |
| Commercial assay or kit | Amicon Ultra Protein Concentrator | Millipore | UFC900324 | |
| Chemical compound, drug | Glycine | JT Baker | 4059–06 | |
| Chemical compound, drug | Urea | IBI Scientific | IB72064 | |
| Chemical compound, drug | Sodium Acetate | VWR Life Sciences | 0530–1 KG | |
| Chemical compound, drug | Deuterium Oxide | Sigma Aldrich | 151882–100G | |
| Chemical compound, drug | Trifluoroacetic Acid | Sigma Aldrich | 302031–100 ML | |
| Chemical compound, drug | Acetonitrile | Thermo Fisher | 85188 | |
| Software, algorithm | ExMS | 10.1007/s13361-011-0236-3 | n/a | |
| Software, algorithm | HDSite | 10.1073/pnas.1315532110 | n/a | |
| Software, algorithm | HDExaminer | Sierra Analytics | n/a | |
| Software, algorithm | Illustrator | Adobe | n/a | |
| Software, algorithm | GraphPad Prism | GraphPad Software, Inc. | n/a | |
| Other | POROS 20 R2 Beads | Thermo Fisher | 1112810 | |
| Other | Microbore column | Upchurch Scientific | C-128 | |
| Other | Reverse-phase analytical column | Thermo Fisher | 72205–050565 | |
| Other | POROS 20 AL Beads | Thermo Fisher | 1602810 | |
| Peptide, recombinant protein | Pepsin (porcine) | Sigma Aldrich | P6887 | |
| Peptide, recombinant protein | Fungal Protease Type-XIII | Sigma Aldrich | P2143 | |
| Peptide, recombinant protein | Pfu polymerase | Agilent | 600353 | |

Protein purification

Cysteine-free T. thermophilus RNase H and ancestral RNases H were expressed and purified as previously described (Hart et al., 2014; Robic et al., 2002; Hollien and Marqusee, 1999). Point mutants were generated using site-directed mutagenesis, confirmed by Sanger sequencing, and the proteins were purified as previously described (Dabora and Marqusee, 1994). Purity was confirmed by SDS-PAGE and mass spectrometry.

HX-MS system

Hydrogen exchange mass spectrometry (HX-MS) experiments were carried out using a system similar to that described by Mayne et al. (Walters et al., 2012; Mayne et al., 2011). Briefly, a Bio-Logic SFM-4/Q quench-flow mixer with a modified head piece of reduced swept volume was used to initiate protein refolding, pulse-label unprotected amide hydrogens, and quench the labeling reaction. The minimum mixing dead time is 13 msec. Quenched samples were injected into an HPLC system constructed from two Agilent 1100 HPLC instruments. The quenched sample was flowed over columns (Upchurch C130B) packed with beads of immobilized pepsin and fungal protease at 400 μL/min in 0.05% TFA. The digested protein was loaded onto a C-4 trap column (Upchurch C-128 with POROS R2 beads) for desalting. An acetonitrile gradient (15–100% acetonitrile, 0.05% TFA, at 17 μL/min) eluted peptides from the C-4 trap column onto an analytical C-8 column (Thermo 72205–050565) for separation before injection into an ESI source for mass spectrometry analysis on a Thermo Scientific LTQ Orbitrap Discovery. The entire HPLC system is kept submerged in an ice bath at 0°C to reduce back exchange of deuterium during the chromatography steps. The workflow takes ~10–18 min from injection to peptide detection.

Refolding experiment

Similar to previous reports (Hu et al., 2013; Mayne et al., 2011), unfolded protein samples in high denaturant (80 μM protein, 20 mM NaOAc pH 4.1, 7–9 M urea) were deuterated by repeated cycles of lyophilization and resuspension in D2O, and full deuteration was confirmed by mass spectrometry. For the pulsed-labeling experiment, 1 volume of deuterated protein was mixed in the SFM-4/Q with 10 volumes of refolding buffer (10 mM sodium acetate pH 5.29, H2O) to initiate refolding. The hydrogen exchange pulse was initiated by mixing with 5 volumes of high-pH buffer (100 mM glycine pH 10.11) and quenched after 10 msec with 5 volumes of low-pH buffer (200 mM glycine pH 1.95). The length of the delay line between the first and second mixers was varied to achieve a range of refolding times. An interrupted-mixing protocol was used to measure the longest refolding time points (>373 msec). Undeuterated protein was used for tandem mass spectrometry (MS/MS) analysis to compile a list of peptides and their retention times in the HPLC system. Competition experiments, in which refolding and exchange were initiated at the same time, were performed by diluting deuterated protein in high urea into high-pH refolding buffer (100 mM glycine pH 10.11). In this experiment, each site exchanges with the surrounding solvent unless it gains protection before exchange occurs (<1 msec on average). Fully folded controls were created by diluting unfolded protein samples 1:10 in fully deuterated refolding buffer and incubating at room temperature for 4 h before loading on the SFM-4/Q to apply the same 10 msec high-pH pulse as for the other time points. All data were obtained in triplicate.
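The mixing ratios above fix the solution conditions at each stage. The sketch below works through that arithmetic for the 80 μM protein stock at both ends of the 7–9 M urea range. It assumes ideal volume additivity, treats the protein stock as essentially 100% D2O and every added buffer as H2O, and ignores deuterons carried by urea or buffer components, so the D2O fractions are approximate.

```python
# Relative volumes at each mixing step, from the Methods text.
PROTEIN_VOLUME = 1   # deuterated, unfolded protein in urea (D2O)
STEPS = (
    ("refolding", 10),  # refolding buffer, H2O, pH 5.29
    ("pulse", 5),       # 100 mM glycine, pH 10.11
    ("quench", 5),      # 200 mM glycine, pH 1.95
)


def stage_conditions(protein_uM: float, urea_M: float) -> dict[str, dict[str, float]]:
    """Concentrations after each mixing step, assuming ideal volume additivity.

    The D2O fraction assumes the protein stock is ~100% D2O and every added
    buffer is H2O (an approximation).
    """
    total = PROTEIN_VOLUME
    stages = {}
    for name, added in STEPS:
        total += added
        dilution = PROTEIN_VOLUME / total  # fraction of the mix that is protein stock
        stages[name] = {
            "total_volumes": total,
            "protein_uM": protein_uM * dilution,
            "urea_M": urea_M * dilution,
            "d2o_fraction": dilution,
        }
    return stages


for urea in (7.0, 9.0):
    for name, c in stage_conditions(80.0, urea).items():
        print(f"{urea} M stock, {name}: {c['protein_uM']:.2f} uM protein, "
              f"{c['urea_M']:.2f} M urea, {100 * c['d2o_fraction']:.1f}% D2O")
```

Under these assumptions, refolding proceeds at about 7.3 μM protein in roughly 0.64–0.82 M residual urea, and labeling takes place in about 94% H2O.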

For each time point, an identical sample was collected with the high-pH pulse omitted, to measure back-exchange (the deuterium lost during sample workup, from the quench to injection into the mass spectrometer). Typical back-exchange ranged from 10–20%, and all data except those for ttRNH* were normalized to the deuterium content of the corresponding back-exchange control. Data for ttRNH* were instead normalized to the theoretical maximum number of deuterons, because peptide coverage in the back-exchange controls for this protein was consistently poor.
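The two normalization schemes can be compared in a short sketch. Here deuterium uptake is the neutral centroid mass shift relative to the undeuterated peptide, converted to deuterons with the 2H/1H mass difference (about 1.00628 Da), and is then divided either by the uptake of the matching no-pulse back-exchange control or by a theoretical maximum. How the theoretical maximum was counted is not stated in the text; the sketch assumes a common convention of counting backbone amides while excluding prolines and the N-terminal residue. For a given peptide the two schemes differ only by a constant factor, so relative differences between time points are preserved, but values normalized to the theoretical maximum are not corrected for back-exchange.

```python
D_H_MASS_DIFFERENCE = 1.00628  # Da, mass of 2H minus mass of 1H


def max_backbone_deuterons(peptide: str, n_terminal_excluded: int = 1) -> int:
    """Theoretical maximum of exchangeable backbone amide deuterons.

    Assumed convention: skip the first `n_terminal_excluded` residues and any
    proline (no amide hydrogen). Other conventions exist.
    """
    return sum(1 for residue in peptide[n_terminal_excluded:] if residue != "P")


def deuterons(centroid_mass: float, undeuterated_mass: float) -> float:
    """Deuterium uptake (number of deuterons) from neutral centroid masses."""
    return (centroid_mass - undeuterated_mass) / D_H_MASS_DIFFERENCE


def fraction_deuterated(sample_d: float, reference_d: float) -> float:
    """Normalize uptake to a reference: a back-exchange control or a theoretical maximum."""
    if reference_d <= 0:
        raise ValueError("reference uptake must be positive")
    return sample_d / reference_d


peptide = "GSAVPLTE"                      # hypothetical peptide
n_max = max_backbone_deuterons(peptide)   # 6 under the assumed convention
control_d = 0.85 * n_max                  # illustrative 15% back-exchange
sample_d = control_d                      # a time point that retained all label
print(fraction_deuterated(sample_d, control_d), fraction_deuterated(sample_d, n_max))
```

In this illustration the same sample reads as fully deuterated against the back-exchange control but about 0.85 against the theoretical maximum, which is the systematic offset a reader should expect for the ttRNH* data.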

MS detection and data analysis

Proteome Discoverer 2.0 (Thermo Scientific) was used to identify peptides from the tandem MS data. Peptides identified in the pulsed-labeling refolding experiments with deuterated protein were used to determine the presence and deuteration level of each peptide at each refolding time point. The spectral envelope of each peptide was fit using two algorithms developed by the Englander lab to determine its deuteration state: ExMS for identification and fitting of peptides, and HDsite for deconvolution of overlapping peptides to obtain deuteration levels at near-amino-acid resolution (Kan et al., 2013; Kan et al., 2011). Site resolution was determined by the peptide coverage of each protein at each time point. If adjacent residues were not site-resolved by peptide coverage but their normalized fraction-deuterium values differed by less than 0.1, these residues were also considered site-resolved. In addition, HDExaminer (Sierra Analytics) was used to identify and fit each peptide and determine deuteration levels. Where noted, different charge states of the same peptide were averaged. The centroid of each peptide at each time point, taken from HDExaminer, was used for further analysis. The residue cutoffs for specific structural regions of each protein were determined from a multiple sequence alignment, using the structure of E. coli RNase H (PDB: 2RN2) as a guide (Hart et al., 2014), and peptides were assigned to structural regions based on these cutoffs. Peptides that spanned multiple secondary structural regions were excluded from further analysis, as were peptides not present at all time points. Peptides mapping to Strands 1–3 and Helix E were assigned to the periphery of the protein, and peptides mapping to Helices A–D and Strands 4–5 to the core. Protection in different groups was compared using GraphPad Prism (GraphPad Software).
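The peptide filtering and grouping rules above can be sketched as follows. The element names and the core/periphery split come from the text; the residue cutoffs passed in as `segments` are hypothetical placeholders, because the actual cutoffs (from the alignment guided by PDB 2RN2) are not listed here. The sketch assumes the cutoffs partition the sequence into contiguous, non-overlapping ranges, so a peptide that does not fit inside a single range spans more than one region and is dropped.

```python
from dataclasses import dataclass
from typing import Optional

CORE = {"Helix A", "Helix B", "Helix C", "Helix D", "Strand 4", "Strand 5"}
PERIPHERY = {"Strand 1", "Strand 2", "Strand 3", "Helix E"}


@dataclass(frozen=True)
class Peptide:
    start: int  # first residue number, inclusive
    end: int    # last residue number, inclusive


def element_for(peptide: Peptide, segments: dict[str, tuple[int, int]]) -> Optional[str]:
    """Name of the single element that contains the peptide, or None."""
    hits = [name for name, (lo, hi) in segments.items()
            if lo <= peptide.start and peptide.end <= hi]
    return hits[0] if len(hits) == 1 else None


def group_peptides(peptides_by_timepoint: dict[float, list[Peptide]],
                   segments: dict[str, tuple[int, int]]) -> dict[str, list[tuple[Peptide, str]]]:
    """Keep peptides seen at every time point that lie within one element,
    then split them into core and periphery groups."""
    groups: dict[str, list[tuple[Peptide, str]]] = {"core": [], "periphery": []}
    if not peptides_by_timepoint:
        return groups
    common = set.intersection(*(set(p) for p in peptides_by_timepoint.values()))
    for pep in sorted(common, key=lambda p: (p.start, p.end)):
        element = element_for(pep, segments)
        if element in CORE:
            groups["core"].append((pep, element))
        elif element in PERIPHERY:
            groups["periphery"].append((pep, element))
        # element is None: peptide spans several regions (or none) and is excluded
    return groups


# Hypothetical cutoffs and peptide lists for illustration only.
toy_segments = {"Strand 1": (1, 20), "Helix A": (21, 40), "Helix E": (41, 60)}
observed = {
    0.02: [Peptide(2, 10), Peptide(22, 35), Peptide(15, 30), Peptide(45, 55)],
    0.10: [Peptide(2, 10), Peptide(22, 35), Peptide(15, 30)],
}
print(group_peptides(observed, toy_segments))
```

In the toy input, the peptide covering residues 15–30 is dropped because it crosses two elements, and the one covering 45–55 is dropped because it is missing from the second time point.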
Peptides from all experimental replicates were aggregated, and the distributions of groups were compared pairwise using Welch's unequal variances t-test. Statistically significant differences in the mean are reported with their associated p-values throughout the text and figures where appropriate.
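For readers reproducing this comparison outside GraphPad Prism, the sketch below computes Welch's t statistic and the Welch–Satterthwaite degrees of freedom from two groups of per-peptide fraction-deuterium values (for example, core versus periphery peptides at one time point, pooled across replicates as described above). It uses only the Python standard library; the two-sided p-value then follows from Student's t distribution with these degrees of freedom, which is what `scipy.stats.ttest_ind(a, b, equal_var=False)` reports if SciPy is available. This is an independent sketch, not the authors' analysis code.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Uses sample variances (n - 1 denominator) and does not assume equal
    variances in the two groups.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    se2_a = statistics.variance(a) / len(a)  # squared standard error, group a
    se2_b = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df
```

Because the degrees of freedom are estimated from the data, they are usually non-integer; this is expected for Welch's test and is not a rounding problem.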

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. S Kanaya, C Katsuda-Nakai, M Ikehara (1991) Importance of the positive charge cluster in Escherichia coli ribonuclease HI for the effective binding of the substrate. The Journal of Biological Chemistry 266:11621–11627.
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57

Decision letter

  1. Lewis E Kay
    Reviewing Editor; University of Toronto, Canada
  2. Philip A Cole
    Senior Editor; Harvard Medical School, United States
  3. Lewis E Kay
    Reviewer; University of Toronto, Canada

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your manuscript "Tracing a protein's folding pathway over evolutionary time using ancestral sequence reconstruction and hydrogen exchange" for consideration. Your paper has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The consensus is that the work is of general interest, although some revision is requested. No further experiments are required at this time.

Summary:

The paper reports a landmark, high-resolution determination of structure formation during the folding of two extant ribonuclease H homologs (from T. thermophilus and E. coli, ttRNH* and ecRNH*, respectively), a series of their reconstructed ancestral progenitors reaching back to their last common ancestor, and designed site-directed point mutations of ecRNH* that test the molecular basis for differences in folding. The biophysical analysis of this set of variants provides a unique view of how the energy landscape of protein folding may be modulated over the course of evolution. As very little such information is currently known, this study provides unprecedented experimental detail to address critical, fundamental questions regarding how evolutionary variations in primary amino acid sequence determine tertiary structure. Such information has far-reaching impact, as the authors demonstrate how this powerful H/D method, developed previously and applied to ecRNH* (PNAS 2013), may be widely applied and provide testable knowledge of folding that is important for advancing outstanding challenges in the accurate modelling of folding as well as in protein engineering and design. The key findings are that an obligatory partially folded intermediate is conserved over 3 billion years of RNH evolution, but the path to this intermediate can vary, both during evolution and by design. A significant modulator of the protein energy landscape is the intrinsic helical propensity of different secondary structural elements, i.e. helix A and helix D. Both A and D helices are formed in the conserved intermediate, but helix D is the first secondary structure element to fold in the common ancestor and in variants up to and including ttRNH*.
In contrast, along the phylogenetic branch leading to ecRNH*, the folding of helix A becomes progressively more favourable relative to helix D, such that helix A folds first in ecRNH* and the preceding ancestral protein (AncD*). The relative formation of helices A and D during folding is mirrored in their relative intrinsic helical propensities in the different RNHs, and is shown to be rationally alterable by characterizing point mutants engineered to change helical propensity.

Although all reviewers enjoyed the paper and recognized the novelty of the work it was felt that it should be stressed more completely in the revised manuscript. This is particularly the case as several previous papers have been published by this group on the same RNH system, although structural information on the folding order is not provided there.

Minor points:

- Do the authors think that there is a functional significance at all to the order of helix folding (D before A or vice versa)?

- Is the sequence that folds first, corresponding to the intermediate, more aggregation prone than the slower folding remainder? Could it be that the goal of the intermediate is just to sequester quickly sticky amino acids?

- Could the authors discuss a bit about the limitations of their hydrogen exchange method in terms of generating intermediate structures. They stress the positives of course in their paper. Yet, it seems like it is not possible to distinguish between the formation of different structural elements in cases where protection would be the same and hence the requirement for a somewhat native-centric folding model.

- Additional information should be provided on analysis of the HX-MS data and on the associated uncertainties. This is important to better define the current results and for other investigators who may use the results or the HX-MS method in future. For example, what are the estimated uncertainties in the residue-resolved data (panels D or E)? What specific criteria are used to assign results for an individual residue to grey (increased uncertainty) or black? Why does the grey/black colour change for different folding timepoints of the same protein? Some additional detail on how differences in protection of secondary structural elements (panels C or D) were calculated to be statistically different is also warranted. In panel C/D (and B/C), I guess each point represents data for 1 peptide in 1 experiment, or were the data determined in triplicate averaged? Please also clarify details regarding how the various normalizations were done and the extent of back exchange and the impact of back exchange on uncertainties. What are the impacts/significance for data interpretation of normalizing the data for ttRNH* to the theoretical maximum (e.g. will fractions deuterated tend to be systematically low in Figure 2). Much of this information could be in the supplementary information, perhaps including additional specific information on what peptides were observed/analyzed for different proteins.

- The authors discuss the pertinent point that the level of protection of the peripheral secondary structural elements (strands 1-3 and Helix E), which are not protected in the intermediate, differs in the folded RNHs. To what extent might this difference in protection affect interpretation of the results? Also, in principle the HX-MS may reveal non-native intermediates, e.g. perhaps there is more structure in helix A in the intermediate than in the folded state (e.g. for Anc3*). This may also be worth commenting on.

- Comparison of the new results described in this paper with results obtained previously for ecRNH* (PNAS 2013) is central to the conclusions of this paper, but no data are shown for ecRNH*. We note that the ecRNH* results were obtained at 25°C rather than the 10°C used for the new HX-MS results here, and that AncD*, like ecRNH*, shows protection of Helix A before Helix D. Still, it may be helpful to include some additional information on ecRNH*, both for HX-MS and for folding kinetics, in Figure 2—figure supplement 1. Please clarify the significance of, and relationship between, the 2 phases observed for ttRNH* in the HX-MS results, and whether and why 1 or 2 phases are observed for the other proteins.

- Figure 5 shows HX-MS at an early timepoint of folding for ecRNH* point mutations A55G (decreases helical propensity of Helix A) and D108L (to increase helicity of D). It would certainly be of interest to see also the corresponding residue resolved analysis, and if the observed effects persist throughout folding (i.e. add the equivalent of panel D).

https://doi.org/10.7554/eLife.38369.022

Author response

[…] Although all reviewers enjoyed the paper and recognized the novelty of the work it was felt that it should be stressed more completely in the revised manuscript. This is particularly the case as several previous papers have been published by this group on the same RNH system, although structural information on the folding order is not provided there.

We are pleased that the reviewers share our excitement about the novelty of our work and its contribution to the field. We have added several sentences in the Introduction and Discussion section to further emphasize the novelty and key findings of the paper.

Minor points:

- Do the authors think that there is a functional significance at all to the order of helix folding (D before A or vice versa)?

Because the earliest folding events, in which helix D or helix A gains protection, take place very quickly (on the order of milliseconds), we found it difficult to rationalize a functional significance for this trend and have outlined our reasoning in the Discussion section (subsection “Aspects of the folding pathway are malleable across evolutionary time”). Although it is possible that intrinsic helicity underlies functional effects in the folded state of the protein (i.e. substrate binding, native-state dynamics, catalysis) or is involved in stability (generally the more thermodynamically stable proteins show protection in helix D first), we can only speculate, since we did not specifically probe the functional significance of the observed folding trends in this study.

- Is the sequence that folds first, corresponding to the intermediate, more aggregation prone than the slower folding remainder? Could it be that the goal of the intermediate is just to sequester quickly sticky amino acids?

Thank you for bringing up this interesting point. To examine this in the RNase H family, we ran three different aggregation predictors (Zipper DB, FISH Amyloid, AmylPred) on the RNase H sequences and we summarize the results in a new supplementary table (Table 2 in Supplementary file 1). The algorithms found several aggregation-prone protein segments, most of which are in the core region of the protein. Thus, as the reviewers hypothesized, it is entirely possible that the goal of the intermediate is to quickly sequester these sticky amino acids. We did not observe any aggregation-like behavior in our HX-MS experiments, but investigating whether this is a significant factor in RNase H folding and evolution would be an interesting future direction. We have also added some text in the Discussion to address this in the manuscript (subsection “Icore is a structurally conserved folding intermediate over 3 billion years of evolution”, second paragraph).

- Could the authors discuss a bit about the limitations of their hydrogen exchange method in terms of generating intermediate structures. They stress the positives of course in their paper. Yet, it seems like it is not possible to distinguish between the formation of different structural elements in cases where protection would be the same and hence the requirement for a somewhat native-centric folding model.

Thank you for this important feedback about the advantages and disadvantages of HX-MS. It is true that hydrogen exchange measures protection and cannot distinguish between different structures with the same residue protection. In this manuscript, we have done our best to refrain from drawing conclusions about specific structures of the partially folded states we observe in the RNase H proteins; instead, we describe our data as protection during folding that is consistent with protected regions in the native state. We have reviewed our manuscript to ensure that our data interpretation and wording are consistent with this approach. Additionally, we have added language in the main text (subsection “Determining the folding pathway of multiple protein variants”) that explains both the advantages and disadvantages of hydrogen exchange, focusing particularly on the limitations and assumptions involved in inferring partially folded structures.

- Additional information should be provided on analysis of the HX-MS data and on the associated uncertainties. This is important to better define the current results and for other investigators who may use the results or the HX-MS method in future. For example, what are the estimated uncertainties in the residue-resolved data (panels D or E)? What specific criteria are used to assign results for an individual residue to grey (increased uncertainty) or black? Why does the grey/black colour change for different folding timepoints of the same protein? Some additional detail on how differences in protection of secondary structural elements (panels C or D) were calculated to be statistically different is also warranted. In panel C/D (and B/C), I guess each point represents data for 1 peptide in 1 experiment, or were the data determined in triplicate averaged?

We have provided additional details in the Materials and methods section and figure legends to clarify how the HX-MS data were analyzed and whether the displayed data is an average, aggregated, or a representative example from replicates. We’d like to clarify that the grey/black coloring for the site-resolved data may be different for different folding time points because site-resolution was determined by the peptide coverage of each sample. For the same protein, slightly different peptides (both in identity and number) may be detected for any given MS run. We hope that this additional information can provide the details necessary for accurate interpretations of our work and for future investigators to build upon our work.

Please also clarify how the various normalizations were done, the extent of back exchange, and the impact of back exchange on uncertainties. What are the impacts on, and significance for, data interpretation of normalizing the data for ttRNH* to the theoretical maximum (e.g., will fractions deuterated tend to be systematically low in Figure 2)? Much of this information could be in the supplementary information, perhaps including specific information on which peptides were observed/analyzed for the different proteins.

We have provided additional details in the Materials and methods section to clarify how the HX-MS data were normalized, and we now state the extent of back exchange observed in our experiments. As explained in our Materials and methods, for one protein, ttRNH*, we chose to normalize the data to the theoretical maximum because of consistently poor peptide coverage in the back-exchange control (possibly due to its high stability and resulting difficulty of proteolytic digestion). Because the choice of normalization rescales all values for a given peptide by the same factor, it affects their absolute values but not the relative differences in deuterium protection between time points or secondary structure elements; we therefore believe that neither the data interpretation nor the statistical significance is affected for the ttRNH* dataset.
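The scaling argument can be made concrete with a minimal Python sketch. It rests on two assumptions that are ours, not the authors' stated procedure: the theoretical maximum for a peptide is taken as its length minus the N-terminal residue and any prolines, and back exchange is assumed to reduce the measured uptake of a given peptide by the same fraction in every sample. The peptide sequence, uptake values, and the helper names `theoretical_max_uptake` and `normalize` are hypothetical.

```python
def theoretical_max_uptake(peptide: str) -> int:
    """Hypothetical helper: count exchangeable backbone amide hydrogens.

    Assumption: the N-terminal residue contributes no measurable amide
    deuterium, and prolines (no amide hydrogen) contribute none.
    """
    return len(peptide) - 1 - peptide[1:].count("P")


def normalize(uptake_da, reference_da):
    """Fractional deuteration: observed uptake / reference uptake.

    reference_da is either the uptake of a fully deuterated (back-exchange)
    control or the theoretical maximum from theoretical_max_uptake().
    """
    return [u / reference_da for u in uptake_da]


# Hypothetical peptide and uptake (Da above the undeuterated centroid)
# at three folding time points.
peptide = "LAKEVIAKPLG"
uptake = [1.8, 3.6, 5.4]
control = 7.2  # hypothetical fully deuterated control (about 20% back exchange)

frac_control = normalize(uptake, control)
frac_theory = normalize(uptake, theoretical_max_uptake(peptide))

# Each theory-normalized value equals the control-normalized value times the
# same constant (control / theoretical max), so it is systematically lower,
# but the ordering and fold-changes between time points are unchanged.
for c, t in zip(frac_control, frac_theory):
    print(f"control-normalized {c:.2f}  theory-normalized {t:.2f}  ratio {t / c:.2f}")
```

With these made-up numbers, every theory-normalized fraction is 0.8 times its control-normalized counterpart. This illustrates both the reviewer's point (values normalized to the theoretical maximum are systematically low) and the response's point (relative comparisons are preserved), but only under the constant-back-exchange assumption stated above.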

Additionally, we have provided supplementary source data tables (Figure 2—source data 1, Figure 3—source data 1, Figure 5—source data 1) that list, for each protein studied in this manuscript, the identity of every peptide and its normalized fractional deuterium content at each time point.

- The authors discuss the pertinent point that the level of protection of the peripheral secondary structural elements (strands 1-3 and Helix E), which are not protected in the intermediate, differs among the folded RNHs. To what extent might this difference in protection affect the interpretation of the results? Also, in principle HX-MS may reveal non-native intermediates; e.g., perhaps there is more structure in Helix A in the intermediate than in the folded state (e.g., for Anc3*). This may also be worth commenting on.

As the reviewer noted, we see interesting differences in the level of protection in the folded state across the different RNase H proteins. These data are consistent with previous results, such as the non-two-state equilibrium behavior of AncB* (Hart et al., 2014) and the relative lability of Helix E (Goedken, Raschke and Marqusee, 1997). For all of these proteins, Icore forms during folding and is structurally conserved, and the folding steps leading to Icore occur on a much faster timescale than typical folding of the periphery (milliseconds vs. seconds). Thus, we believe that differences in the native ensembles of these proteins do not affect the interpretation of our results or the trends we observe in the early folding steps leading up to the conserved Icore intermediate.

As the reviewer noted, it should be possible to reveal non-native intermediates with HX-MS, which we believe is a very important and powerful application of this technique. Regarding Anc3* specifically, however, we would like to clarify that the residue-resolved data spanning Helix A (Figure 3—figure supplement 2D) are missing because of a lack of peptide coverage in that region at that time point. When site-resolved data are not available, we use the symbol “x” to denote the missing data at those residues, which the reviewer may have inadvertently interpreted as low protection values.

- Comparison of the new results described in this paper with results obtained previously for ecRNH* (PNAS 2013) is central to the conclusions of this paper, but no data are shown for ecRNH*. We note that the ecRNH* results were obtained at 25°C rather than the 10°C used for the new HX-MS results here, and that AncD*, like ecRNH*, shows protection of Helix A before Helix D. Still, it may be helpful to include some additional information on ecRNH*, both for HX-MS and for folding kinetics, in Figure 2—figure supplement 1.

First, thank you for pointing out that the manuscript would be more complete as a standalone entity if we provided more information on ecRNH* for comparison. We have added this to the manuscript (see below). However, we would like to clarify an important point: the ecRNH* HX-MS experiments in Hu et al., 2013 were actually conducted at 10°C (not 25°C, as suggested by the reviewers), and were therefore performed under exactly the same conditions as the experiments in this manuscript. To clarify this and to provide a complete set of data on all the proteins discussed in this manuscript, we have included an additional panel (panel K) in Figure 2—figure supplement 1 showing the chevron plot for ecRNH* at both 25°C and 10°C. These data are identical to those in Figure 1C of Hu et al., 2013, re-plotted for consistency with the other panels in Figure 2—figure supplement 1. Additionally, we have presented the HX-MS results for ecRNH* from Hu et al., 2013 as a new supplementary figure (Figure 2—figure supplement 2).
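For readers less familiar with chevron plots: they show the logarithm of the observed folding/unfolding relaxation rate as a function of denaturant concentration. As a generic reference (notation ours, not taken from this article), a two-state model is commonly fit with the expression below; proteins that populate a folding intermediate, as the RNases H studied here do, may deviate from it, particularly in the refolding arm.

```latex
\ln k_{\mathrm{obs}} = \ln\!\left( k_f^{\mathrm{H_2O}}\, e^{-m_f [\mathrm{D}]/RT} + k_u^{\mathrm{H_2O}}\, e^{m_u [\mathrm{D}]/RT} \right)
```

Here [D] is the denaturant concentration, k_f^H2O and k_u^H2O are the folding and unfolding rate constants extrapolated to zero denaturant, and m_f and m_u are the kinetic m-values. The first term dominates the refolding (left) arm and the second dominates the unfolding (right) arm.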

Please clarify the significance of, and relationship between, the two phases observed for ttRNH* in the HX-MS results, and whether/why one or two phases are observed for the other proteins.

Except for ttRNH*, all of the RNases H studied here show a single observable phase and a burst phase (see Lim et al., 2016). It has been known for some time that the observable refolding kinetics of ttRNH* (monitored by CD spectroscopy) are best fit by a biphasic exponential process. The underlying mechanism and structural implications of this second phase were studied extensively in a previous manuscript (Hollien and Marqusee, 2002). Although there is some evidence that proline isomerization is involved, the exact molecular process responsible for the second phase of ttRNH*, which was not observed in any of the ancestral RNases H (see Lim et al., 2016), remains elusive. We do not see any evidence of a second phase in the pulsed-labeling HX-MS data. It would have been very exciting if our HX-MS data had revealed two distinct phases in the rate-limiting step unique to ttRNH*. We have added a section to the manuscript (subsection “Monitoring the folding pathway of ttRNH* using pulsed-labeling HX-MS”, first paragraph) that expands on this topic and acknowledges that this remains an open question in RNase H folding, which we hope future HX-MS experiments or other methods will eventually resolve.
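The kinetic behaviors contrasted here can be written generically as follows (notation ours; the exact fitting functions used in Hollien and Marqusee, 2002 and Lim et al., 2016 are not reproduced here). The observable signal S(t) after a jump to refolding conditions is

```latex
S(t) = S_{\infty} + A_1 e^{-k_1 t} + A_2 e^{-k_2 t}
```

A single observable phase corresponds to A_2 = 0, whereas a biphasic fit, as for ttRNH*, requires two nonzero amplitudes with distinct rate constants k_1 and k_2. A burst phase appears as a difference between the signal extrapolated to t = 0, S(0) = S_∞ + A_1 + A_2, and the signal expected for the unfolded state under the same conditions, S_U, giving a burst amplitude of S(0) − S_U.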

- Figure 5 shows HX-MS at an early time point of folding for the ecRNH* point mutants A55G (to decrease the helical propensity of Helix A) and D108L (to increase the helicity of Helix D). It would certainly be of interest to also see the corresponding residue-resolved analysis, and whether the observed effects persist throughout folding (i.e., to add the equivalent of panel D).

As suggested, we have added a supplementary figure (Figure 5—figure supplement 1) that shows the residue-resolved data for ecRNH*-A55G and ecRNH*-D108L. For these two proteins, we only carried out the earliest folding time points and the folded state control in order to address the question and hypothesis of which helix is protected first.

https://doi.org/10.7554/eLife.38369.023

Article and author information

Author details

  1. Shion An Lim

    1. Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
    2. Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Writing—original draft, Writing—review and editing
    Contributed equally with
    Eric Richard Bolin
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2136-2732
  2. Eric Richard Bolin

    1. Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, United States
    2. Biophysics Graduate Program, University of California, Berkeley, Berkeley, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing—original draft, Writing—review and editing
    Contributed equally with
    Shion An Lim
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9265-0451
  3. Susan Marqusee

    1. Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
    2. Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, United States
    3. Department of Chemistry, University of California, Berkeley, Berkeley, United States
    4. Chan Zuckerberg Biohub, San Francisco, United States
    Contribution
    Conceptualization, Supervision, Funding acquisition, Project administration, Writing—review and editing
    For correspondence
    marqusee@berkeley.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7648-2163

Funding

National Institute of General Medical Sciences (GM050945)

  • Shion An Lim
  • Eric Richard Bolin
  • Susan Marqusee

National Science Foundation (Graduate Research Fellowship)

  • Shion An Lim

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank members of the Marqusee Lab for helpful discussions, and Dr. Ha Truong for assistance with the aggregation prediction analyses. We also thank Dr. Leland Mayne and members of the Englander Lab and Dr. Goran Stjepanovic in the Hurley Lab for support on the pulsed-labeling HX-MS instrumental setup and analysis. We thank the Vincent J Coates Proteomics/Mass Spectrometry Facility for instrumentation support. This work was funded by NIH Grant GM050945 (to SM) and a National Science Foundation Graduate Research Fellowship (to SAL). SM is a Chan Zuckerberg Biohub Investigator.

Senior Editor

  1. Philip A Cole, Harvard Medical School, United States

Reviewing Editor

  1. Lewis E Kay, University of Toronto, Canada

Reviewer

  1. Lewis E Kay, University of Toronto, Canada

Publication history

  1. Received: May 14, 2018
  2. Accepted: September 9, 2018
  3. Accepted Manuscript published: September 11, 2018 (version 1)
  4. Version of Record published: September 26, 2018 (version 2)

Copyright

© 2018, Lim et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
