Many eukaryotic membrane proteins are prone to misfolding, which compromises their functional expression at the plasma membrane. This is particularly true for mammalian gonadotropin-releasing hormone receptors (GnRHRs), which are G protein-coupled receptors involved in reproductive steroidogenesis. We recently demonstrated that evolutionary modifications within mammalian GnRHRs appear to have coincided with adaptive changes in cotranslational folding efficiency. Though changes in protein stability are known to shape evolutionary interactions, it is unclear how the energetic drivers of cotranslational folding in the membrane may modify epistatic interactions. We therefore surveyed the pairwise epistatic interactions that modify the expression of two destabilized GnRHR variants bearing mutations that selectively compromise either its membrane topology (V276T) or its native tertiary structure (W107A). Using deep mutational scanning (DMS), we evaluated how the effects of these mutations on the expression of the mature form of the protein at the plasma membrane are modified by hundreds of secondary mutations. A focused analysis of 251 mutants with high-quality measurements in three genetic backgrounds reveals that V276T and W107A form distinct epistatic interactions that depend on both the degree to which they destabilize the protein and the mechanism of their destabilization. An unsupervised learning analysis shows that V276T forms predominantly negative epistatic interactions that are most pronounced among destabilizing mutations within soluble loop regions. In contrast, W107A forms interactions with mutations in both loops and transmembrane domains that skew positive as a result of the diminishing impact of the destabilizing mutations in the context of an already unstable variant. These findings provide general insights into how pairwise epistasis is remodeled by conformational defects in membrane proteins and, more generally, in unstable proteins.
The important study describes exhaustive deep mutational scanning (DMS) of the gonadotropin-releasing hormone wild-type receptor and for two single point mutations reported to impact its folding and structure, monitoring how plasma membrane expression levels are affected by mutations. This important work is pioneering in exploring the interaction between mutations (epistasis) in a membrane protein, with a potential for explaining membrane protein evolution and genetic diseases. The evidence provided for some mutations is convincing, but it remains incomplete and harder to interpret for others without further validation of folding and stability properties of the mutants.
Mutational effects on protein stability have important consequences for evolution. Destabilized proteins misfold more often, which can attenuate their ability to function in the cell (4, 17). Though natural selection typically maximizes fitness by incorporating mutations that produce new or improved functions, this optimization process is often hindered by the destabilizing effects of most random mutations (4, 39). The cumulative energetic effects of these mutations on protein stability are generally capable of creating non-additive, context-dependent epistatic interactions (10, 26, 36). Nevertheless, there are many different aspects of protein synthesis, folding, and assembly that are governed by distinct energetic constraints. While epistatic interactions arising from changes in protein stability have been previously characterized in soluble proteins (Gong et al. 2013; Olson et al. 2014; Faber et al. 2019; Nedrud et al. 2021), the impact of epistasis on membrane protein folding and stability has yet to be explored. Although many of the mechanistic underpinnings of pairwise epistasis in soluble proteins are likely to be generalizable to all proteins, membrane proteins undergo additional cotranslational folding reactions that are governed by mechanistically distinct kinetic and energetic constraints (17, 31). Based on these considerations, we suspect that mutations in membrane proteins could potentially modify cotranslational processes in a manner that generates distinct epistatic interactions that bias their evolutionary pathways in unique ways.
Unlike water-soluble proteins, membrane proteins must be cotranslationally inserted into lipid bilayers (Stage I folding) in order to fold into their native structure (Stage II folding) (30). In eukaryotes, Stage I folding is primarily facilitated by the Sec61 translocon complex in concert with its various other accessory subunits (17, 33).
Ribosomes translating membrane proteins are targeted to this translocon complex at the endoplasmic reticulum (ER) membrane, where the nascent transmembrane domains (TMDs) partition into the lipid bilayer and establish their native topology relative to the membrane (Stage I) (28, 30, 41). After translation is complete, membrane proteins fold and assemble (Stage II) in a manner that is coupled to their passage through the secretory pathway to the plasma membrane and/ or other destination organelles (17, 30, 44). Though they are governed by distinct physicochemical constraints, failures in either Stage I or Stage II folding are capable of increasing the proportion of nascent membrane proteins that are retained in the ER and prematurely degraded (17, 44). We therefore expect that epistatic interactions can arise from the cumulative energetic effects of mutations on either of these processes. Given that three-dimensional protein structures are stabilized by similar energetic principles in water and lipid bilayers, mutations that modify the fidelity of Stage II folding energetics should potentially generate long-range epistatic interactions that are comparable to those observed in soluble proteins (17). By comparison, the topological transitions that happen during the early stages of membrane protein synthesis may be coupled to neighboring loops and helices in a manner that can play a decisive role in the formation of the native topology (13, 25, 43). Based on these considerations, we reason that the effects of mutations on each process could create distinct epistatic interactions.
To gain insights into the mechanistic basis of epistatic interactions in membrane proteins, we surveyed the effects of single and double mutants on the plasma membrane expression (PME) of the gonadotropin-releasing hormone receptor (GnRHR), a G-protein coupled receptor (GPCR) involved in reproductive steroidogenesis across many species (14, 15). We have previously demonstrated that various evolutionary sequence modifications within mammalian GnRHRs have tuned its fitness by compromising the fidelity of Stage I folding (6). Nevertheless, it remains unclear how the inefficient cotranslational folding of GnRHR reshapes its tolerance of secondary mutations. Here, we utilize deep mutational scanning (DMS) to carry out a focused analysis of 251 mutations in the background of two mouse GnRHR (mGnRHR) variants that selectively compromise either Stage I or Stage II folding. Our results show that mutations which generate Stage I and Stage II folding defects form distinct epistatic interactions throughout this receptor. An unsupervised learning analysis reveals how these interactions depend on both changes in stability and the topological context of the mutation. Together, these findings suggest that the distinct biosynthetic mechanisms of integral membrane proteins may differentially shape their fitness landscapes. The general implications of our findings for the role of protein folding and stability in protein evolution are discussed.
Mutational Library Design
To compare how perturbations of Stage I and Stage II folding shape epistatic interactions, we first identified and characterized two individual mutations that are likely to selectively disrupt either the cotranslational membrane integration (V276T) or the native tertiary structure (W107A) of GnRHR. We previously showed that T277 in TMD6 of human GnRHR (hGnRHR) contributes to the inefficient translocon-mediated membrane integration of TMD6, which decreases the receptor’s PME (6). A recent crystallographic structure of hGnRHR confirms that the polar side chain of T277, which corresponds to V276 in the mouse receptor (89.3% sequence identity), is exposed to the lipid bilayer and does not appear to make interhelical contacts that stabilize the native structure (Fig. 1 A & B) (47). Thus, mutations at this position modify the hydrophobicity of TMD6 without perturbing the native tertiary contacts that stabilize the native fold. Indeed, replacing the native valine residue in TMD6 of the mouse receptor with the threonine residue found in hGnRHR reduces the efficiency of its translocon-mediated membrane integration in vitro (Fig. S1, Table S1). Furthermore, this substitution decreases the PME of the mouse receptor by 65 ± 4% relative to that of wild-type mGnRHR (WT, Fig. 2). Consistent with previous findings, these results confirm that the inefficient translocation of this helix promotes cotranslational GnRHR misfolding (6). In contrast, substitutions at W107 within extracellular loop 1 (ECL1) preserve the hydrophobicity of TMDs and their topological energetics while disrupting a conserved hydrogen bond network that is found in a wide array of GPCR structures (Fig. 1C) (16). Mutating this conserved tryptophan to alanine (W107A) reduces the PME of mGnRHR by 88 ± 4% relative to WT (Fig. 2), which suggests that disrupting this hydrogen bond network destabilizes mGnRHR in a manner that promotes its cellular misfolding. Comparing how these mutations modify the proteostatic effects of secondary mutations will therefore reveal how changes in the fidelity of Stage I (V276T) and Stage II (W107A) folding differentially impact mutational epistasis.
To survey the epistatic interactions formed by these mutations, we generated a series of genetic libraries consisting of 1,615 missense variants in the background of V276T, W107A, and WT mGnRHR. To ensure adequate dynamic range in the downstream assay, we created these variants in the cDNA of mGnRHR, which exhibits intermediate, tunable expression (6). Briefly, we used a structural homology model of mGnRHR to select 85 residues distributed across the loops and helices of mGnRHR. We included all 19 amino acid substitutions at the 3 most solvent-accessible and the 3 most buried residues in each TMD (Fig. S2 and Table S2). We also included mutations encoding all 19 amino acid substitutions at positions that are evenly distributed across each soluble domain. We then generated an array of mutagenic oligonucleotides encoding each of these mutations (Table S3), which was used to introduce these mutations in the context of the V276T, W107A, and WT mGnRHR cDNAs by nicking mutagenesis (21). We generated each of these libraries in the context of plasmids containing a randomized ten-base region within the backbone. Unique 10-base plasmid identifiers that could be matched to specific mGnRHR variants of interests were used to score variants in the downstream assay using Illumina sequencing. Using whole plasmid PacBio sequencing of each library, we matched 1,383 of the 1,615 possible variants to one or more ten-base unique molecular identifier (UMI) in at least one of the three libraries, 320 of which were found in all three genetic backgrounds (see Methods for details).
Plasma Membrane Expression of GnRHR Mutant Libraries
To identify secondary mutations that form epistatic interactions with V276T and W107A, we measured the relative PME of single and double mutants within each genetic library by DMS as previously described (27). Briefly, we first generated a series of recombinant, pooled HEK293T cell lines in which the individual cells that each express one of the single or double mGnRHR mutants from one of the three plasmid libraries described above. A comparison of the mGnRHR surface immunostaining profiles of these cellular populations that recombinantly express this collection of variants in context of the V276T, W107A, or an otherwise WT genetic background reveals striking context-dependent differences in their proteostatic effects. Recombinant cells expressing these variants in the WT background exhibit a bimodal distribution of mGnRHR surface immunostaining, where 59% of cells exhibit immunostaining that is comparable to WT and 41% of cells exhibit reduced expression relative to WT (Fig. 3A). Although cells expressing this same collection of variants in the V276T background also exhibit bimodal surface immunostaining, 48% of the population exhibit diminished expression relative to the V276T variant (Fig. 3B). This observation suggests many of these variants synergistically decrease mGnRHR expression in combination with the V276T mutation. Unlike the WT and V276T libraries, cells expressing this same collection of variants in combination with the W107A mutation exhibit a continuous range of surface immunostaining intensities that are only slightly lower, on average, relative to cells expressing the W107A single mutant (Fig. 3C). The lack of dispersion likely reflects a compression of the distribution that arises as the folding efficiency of these variants approaches zero (see Discussion). Cells expressing this library of W107A double mutants do, however, still contain cellular subpopulations with surface immunostaining intensities that are well above or below that of the W107A single mutant, which suggests that this fluorescence signal is sensitive enough to detect subtle differences in the PME of these variants (Fig. 3C). Nevertheless, these results suggest that none of the secondary mutations can fully compensate for the proteostatic effects of the W107A mutation.
To compare the PME of these variants in each genetic background, we separated the mixed cell populations into quartiles based on the relative surface immunostaining of their expressed mGnRHR variants using fluorescence-activated cell sorting (FACS), then extracted the genomic DNA from each cellular fraction. We then used Illumina sequencing to track the relative abundance of the recombined UMIs and their associated variants within each fraction, then used these measurements to estimate the corresponding PME of each variant. A series of filters relating to the quality of the reads, the sampling of each mutation, and the similarity of variant scores across replicates were used to remove poorly sampled variants with unreliable scores (see Methods). Briefly, to restrict our analysis to variants that can be reliably scored, we removed variants that did not have at least 50 counts across the four bins in each replicate. Additionally, we compared the percentile rank of each variant across replicates and discarded variant scores for which the difference in percentile rank across the two replicates was greater than 25%. The relative PME of the remaining variants was averaged across two biological replicates. Overall, we measured the relative PME of 1,004 missense variants in the WT background, 338 missense variants in the V276T background, and 1,179 missense variants in the W107A background (Table S4). Out of the 320 variants identified in all three plasmid libraries by PacBio sequencing, 251 mutations passed these quality filters in the Illumina sequencing of the cellular isolates (Table S 4). In the following, we will focus on the comparison of the effects of these 251 mutations in each background in order to determine how their proteostatic effects are modified by secondary mutations.
To facilitate the comparison of estimated immunostaining intensities for each variant across the three genetic backgrounds, we normalized the raw immunostaining intensity values for each variant relative to that of the measured intensity values for the corresponding reference sequence (V276T, W107A, or WT mGnRHR). Relative intensity values of 1.0 correspond to no effect, whereas values over 1.0 correspond to mutations that enhance surface expression in that genetic background, and vice versa. A histogram of the relative intensity values for the 251 variants measured in all three backgrounds generally recapitulates the trends in the cellular immunostaining histograms (Fig. 3 & 4A). Most of these mutations decrease the PME of mGnRHR in all three backgrounds, which reflects the limited mutational tolerance of membrane proteins (Fig. 4A) (38). Notably, only a modest fraction of mutations measurably enhance the surface expression of WT mGnRHR (51 mutations, 20%) and V276T mGnRHR (21 mutations, 8% Fig. 4A). Mutations in the V276T background tend to decrease surface expression more than they do in the WT background, and this trend persists across most domains of the protein (Fig. 4 A & B). In contrast, a larger proportion of these mutations (100 mutants, 40%) enhance the surface expression of W107A (Fig. 4A), though these increases in immunostaining are relatively modest (Fig. 3C). Furthermore, the mutations tend to be better tolerated with respect to surface expression in combination with W107A relative to their effects on expression in the WT background (Fig. 4C).
Projecting the average relative intensities for the mutations at each residue onto the structure reveals that most positions exhibit similar trends in the mutational tolerance in each background, with loop residues being generally more permissive than those within TMDs (Fig. 4 D – F). Nevertheless, there are subdomains where the quantitative mutagenic effects deviate in a context-dependent manner. Variants bearing mutations within the C-terminal region (ICL3-TMD6-ECL3-TMD7) fare consistently worse in the V276T background relative to WT (Fig. 4 B & E). Given that V276T perturbs the cotranslational membrane integration of TMD6 (Fig. S1, Table S1), this directional bias potentially suggests that the apparent interactions between these mutations manifest during the late stages of cotranslational folding. In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form TMD4, ICL2, and ECL2 (Fig. 4 C & F). Together, these observations show how the proteostatic effects of mGnRHR mutations are modified by secondary mutations that differentially affect the fidelities of Stage I and Stage II folding.
Distinct Pairwise Epistasis in V276T and W107A
To compare epistatic trends in these libraries, we calculated the magnitude of the epistatic interactions these 251 mutations form with V276T and W107A by comparing their relative effects on PME of the WT, V276T, and W107A variants using a previously described relative epistasis model (product model, Olson et al. 2014). Briefly, we defined fitness as the ratio of the single/ double mutant PME to that of WT mGnRHR. Epistasis scores (ε) were then determined by taking the log of the double mutant fitness value divided by the difference between the single mutant fitness values (see Methods). Positive ε values denote double mutants that have greater PME than would be expected based on the effects of single mutants. Negative ε values denote double mutants that have lower PME than would be expected based on the effects of single mutants. Pairs of mutations with ε values near zero have additive effects on PME. Though epistatic interactions are typically rare in the context of stable proteins (36), epistasis scores among this set of 251 random mutations generally form positive epistatic interactions with W107A and negative epistatic interactions with V276T (Fig. 5). Interestingly, the difference in the epistasis scores for the interactions these variants form with W107A and V276T was at least 1.0 for 98 of the 251 variants (Table S4). Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD). In contrast to the sparse epistasis that is generally observed between mutations within soluble proteins, these findings suggest a relatively large proportion of random mutations form epistatic interactions in the context of unstable mGnRHR variants in a manner that appears to depend on the specific folding defect (V276T vs. W107A) and topological context.
Molecular Basis for the Observed Epistatic Interactions
To identify general trends in the observed epistatic interactions, we utilized unsupervised learning to elucidate patterns among the relative PME values of variants across each genetic background. We first utilized uniform manifold approximation projection (UMAP) (20) to identify mutations that have similar expression profiles across these conditions. A projection of the variants onto a resulting two-dimensional coordinate reveals that most variants fall into one of approximately three groups (Fig. 6A). Using a density-based hierarchical clustering analysis (HDBSCAN), we unambiguously assigned 182 variants into three distinct clusters based on their expression profiles alone (Fig. 6A, Table S4) (5). An analysis of the structural context of these mutants reveals that one of the largest clusters (Cluster 3, n = 76) primarily consists of mutations of soluble loop residues that have little impact on expression and exhibit minimal epistasis (Fig. 6 A-C). This analysis also identified a smaller cluster of loop mutations (Cluster 2, n = 30) that moderately decrease expression and exhibit a greater propensity to form epistatic interactions with V276T and/ or W107A relative to the neutral loop mutations in Cluster 3 (Fig. 6 A-C). Nevertheless, most of the mutations that exhibit strong positive epistatic interactions with W107A fell into Cluster 1 (n = 76), which primarily consists of mutations within TMDs that significantly decrease expression in all three genetic backgrounds (Fig. 6 A-C). A comparison across these clusters suggests positive epistatic interactions with W107A primarily stems from the fact that mutations that compromise expression can only cause a relatively modest reduction in the PME of W107A GnRHR (Fig. 6 B & C), which is already predominantly misfolded. In contrast, the moderately disruptive loop mutations in Cluster 2 tend to exhibit more pronounced negative epistatic interactions with V276T than do the highly disruptive TMD mutations in Cluster 1 (Fig. 6C). The divergent epistatic interactions that disruptive loop and TMD mutations form with V276T potentially arise from differences in the mechanistic basis for the destabilization caused by these two classes of mutations (see Discussion).
Many epistatic interactions arise from the additive energetic effects of mutations on protein folding (26, 40). To determine how these trends relate to the effects of mutations on thermodynamic stability, we utilized Rosetta CM to generate structural models of 243 of the 251 variants that were scored in all three backgrounds and fall within the structured regions of the receptor (35). We then utilized a membrane-optimized Rosetta energy scoring function to approximate the change in the energy of the native structural ensemble of each variant (1). The cluster of loop mutations with lower expression (Cluster 2, avg. ΔΔG = + 2.2 REU) generally contains more destabilizing mutations relative to the cluster of loop mutations with WT-like expression (Cluster 3, avg. ΔΔG = + 0.8 REU), which affirms the general relationship between stability and PME in this system (p = 0.003, Fig. 6 B & D). However, the cluster of TMD variants (Cluster 1, avg. ΔΔG = + 3.4 REU) is generally more destabilizing than either of the loop clusters (Fig. 6D). A comparison across the clusters suggests that the degree of destabilization generally tracks with the magnitude of positive epistatic interactions that form with W107A, regardless of structural context (Fig. 6 C & D). This uptick in positive epistasis among mutations that destabilize the native fold and severely reduce PME likely reflects the attenuated effect of destabilizing mutations as the equilibrium fraction of folded protein approaches zero (see Discussion, Fig. 7). In contrast to W107A, epistatic interactions with V276T are predominantly negative and are most pronounced among moderately destabilizing loop mutations (Fig. 6 C & D). The abundance of negative interactions with V276T parallels the observed pairwise epistasis in other folded proteins (26), and likely reflects the higher baseline expression and/ or stability of this variant relative to W107A (Figs. 2 & 7). Nevertheless, stability-mediated epistasis cannot account for the attenuation of the epistatic interactions that V276T forms with the highly destabilizing mutations within Cluster 1 relative to the moderately destabilizing mutations within Cluster 2 (Fig. 6 C & D). These observations suggest that the magnitude of the epistatic interactions formed by V276T is dependent on not only the change in stability, but also the topological context of the secondary mutation.
Many epistatic interactions between mutations within genes encoding soluble proteins arise from their cumulative effects on protein folding energetics (24, 26). Folded proteins can only tolerate a limited number of destabilizing mutations before their cumulative energetic effects cause a cooperative decrease in the fraction of folded protein (Fig. 7A). Although this general principle should apply to all proteins, it is unclear how the distinct energetic processes involved in membrane protein biosynthesis and folding may alter such evolutionary couplings. In this study, we utilize DMS to compare the epistatic interactions formed by two mutations that reduce the expression of a misfolding-prone GPCR by selectively destabilizing either its cotranslational membrane integration (V276T) or its native tertiary structure (W107A). We note that the introduction of the V276T mutation into the mouse receptor utilized herein mimics one of the key destabilizing modifications that reduces the expression of hGnRHR relative to mGnRHR (6). Through a focused analysis of 753 total single and double mutants, we find that these mutations give rise to divergent epistatic interactions that modify the PME of mGnRHR (Fig. 5A). Mutations that form epistatic interactions with V276T tend to synergistically decrease mGnRHR expression (Figs. 3 A & B, 4B, and 5A)-a typical epistatic trend that likely arises from the combined destabilizing effects of both mutations. Surprisingly, these interactions are most prominent among moderately destabilizing loop mutations (Fig. 6C). In contrast, many of these same secondary mutations instead form positive epistatic interactions with W107A mGnRHR, regardless of their topological context (Figs. 5A & 6C). As this comparison reflects the epistatic interactions formed by an identical set of secondary mutations, we conclude that these context-dependent epistatic interactions arise from differences in the magnitude and/ or the mechanism of the mGnRHR destabilization caused by the V276T and W107A mutations.
In the context of stable proteins, stability-mediated epistasis tends to magnify the effects of deleterious mutations due to the cumulative destabilization that arises from sequential random substitutions (39). This is a fundamental consequence of the cooperative dependence of the fraction of folded protein on the free energy of unfolding. Indeed, most of the random mGnRHR mutations characterized herein are predicted to destabilize the native structure in a manner that coincides with a measurable decrease in the PME in the context of all three genetic backgrounds (Figs. 4A & 6D). Whereas these destabilizing mutations exhibit the expected negative epistatic couplings with V276T, their interactions with W107A skew positive (Fig. 5A). This anomalous positive epistasis is potentially a byproduct of the severe destabilizing effects of W107A relative to V276T (Fig. 2). While destabilizing secondary mutations can cause significant decreases in the yield of folded WT or V276T mGnRHR, the W107A variant already exhibits marginal surface immunostaining. In the context of this predominantly misfolded, poorly expressed variant, even highly destabilizing secondary mutations can only cause a relatively small decrease in the yield of folded protein. This observation can be more generally related to the relatively shallow dependence of the fraction of folded protein on the free energy of folding under conditions where folding becomes unfavorable-unfolding transitions become shallow as the fraction of folded protein approaches zero (Fig. 7B). Stabilizing mutations can also generate potent positive epistatic couplings in this context given that even small increases in stability can push the system into the steepest portion of the transition zone (Fig. 7B). Paradoxically, the asymmetry of the folding curve in this regime suggests destabilizing mutations are more tolerable in the context of an unstable protein-folding efficiency effectively has nowhere to go but upwards. Essentially, the nature of this sigmoidal transition suggests the typical trends associated with stability-mediated epistasis should become inverted as stability decreases. Such considerations could potentially expand the accessible sequence space in metastable proteins, which constitute a significant portion of the proteome (9). Combinations of mutations that would not generally be tolerated in stable proteins may also occur through such mechanism in the context of the numerous misfolded secondary alleles that are carried within the genomes of humans and other diploids with no appreciable fitness cost (e.g. the ΔF508 variant of CFTR).
Unlike W107A, the epistatic interactions formed by V276T appear to depend on their topological context-negative epistasis is more prevalent among moderately destabilizing loop variants than highly destabilizing TMD variants (Fig. 6 C & D). This observation suggests stability-mediated epistasis between mutations that destabilize the native tertiary structure of integral membrane proteins (Stage II folding) may be distinct from the interactions that arise from the disruption of their translocon-mediated cotranslational membrane integration (Stage I folding). This divergence could potentially reflect the distinction between the energetic principles of protein folding relative to those that guide the insertion of proteins into membranes, which is fundamentally driven by a membrane depth-dependent solvation energetics (13, 23). Counterintuitively, polar mutations that disrupt the membrane integration of individual TMDs also have the potential to form interhelical hydrogen bonds that can stabilize the topological orientation of neighboring TMDs, which can give rise to long-range topological couplings. (11, 22, 25). The propensity of such contacts to form may in some cases depend on the length and hydrophobicity of the loops that connect them (45). Finally, we should note that complex epistatic interactions between residues within TMDs could potentially arise through their propensity to remodel the translocon complex-polar residues within TMDs may affect the recruitment of secondary insertases and/ or intramembrane chaperones such as TRAP, the PAT complex, and the ER membrane protein complex (8, 18, 29, 32, 34, 37). Additional insights into the mechanistic basis for the recruitment of such complexes may therefore be needed to fully rationalize evolutionary couplings between TMDs. We note that, more generally, we suspect that synergistic genetic interactions between mutations could potentially arise from changes in the energetics of many other biosynthetic processes such as cofactor binding, oligomer formation, and chaperone interactions.
Together, our results reveal divergent, mechanism-dependent patterns of pairwise epistasis in the context of a misfolding-prone integral membrane protein. These findings provide novel insights as to how the unique biosynthetic constraints of integral membrane proteins can potentially lead to unique evolutionary patterns. Moreover, our results generally suggest that the deleterious effects of destabilizing mutations are partially attenuated in the context of unstable proteins. Additional investigations are needed to explore how stability-mediated epistasis is modified by other proteostatic constraints, such as the interactions of nascent proteins with cofactors and/ or molecular chaperones.
Materials & Methods
Plasmid Preparation and Characterization of Genetic Libraries
The expression of transiently expressed single mutants was carried out using a previously described pcDNA5 FRT expression vector containing mGnRHR cDNA with an N-terminal hemagglutinin (HA) epitope and bicistronic eGFP (Fig. 2) (6). Mutations were introduced by site-directed mutagenesis with PrimeSTAR HS DNA Polymerase (Takara Bio, Shiga, Japan). Biochemical measurements of the membrane integration of TMD6 were carried out using a previously described pGEM vector containing modified leader peptidase (Lep), with TMDs of interest cloned into the H-segment (Fig. S1) (6, 13). Mutations were introduced into the H-segment by site-directed mutagenesis with PrimeSTAR HS DNA Polymerase (Takara Bio, Shiga, Japan). Plasmids were purified using a ZymoPURE Midiprep Kit (Zymo Research, Irvine, CA). DMS experiments were carried out using a previously described pcDNA5 FRT vector containing mGnRHR cDNA bearing an N-terminal influenza hemaglutinin (HA) epitope, followed by an internal ribosome entry site (IRES) and eGFP sequence, which was further modified to be compatible with recombination-based DMS approaches (6). First, an attB recombination site was inserted in place of the CMV promoter by InFusion HD Cloning (Takara Bio, Shiga, Japan). To facilitate the creation of these libraries using nicking mutagenesis, a BbvCI restriction site was introduced by site-directed mutagenesis using PrimeSTAR HS DNA Polymerase (Takara Bio, Shiga, Japan). The V276T and W107A mutations were also introduced by site-directed mutagenesis with PrimeSTAR HS DNA Polymerase (Takara Bio, Shiga, Japan).
To generate mutational libraries, a 10N unique molecular identifier (UMI), or “barcode,” was first inserted into the plasmid backbone using a previously described nicking mutagenesis method (46). Nicking mutagenesis plasmid products were transformed into NEB 10-beta electrocompetent cells (New England Biolabs, Ipswich, MA) and purified using a ZymoPURE Midiprep Kit (Zymo Research, Irvine, CA). A programmed oligo pool (Agilent, Santa Clara, CA) encoding all 19 amino acid substitutions at 85 residues throughout the mGnRHR sequence was then used to generate pools of single and double mutants in the context of WT, V276T, and W107A mGnRHR cDNA using a previously described nicking mutagenesis method (21). The resulting plasmid products were transformed into NEB 10-beta electrocompetent cells (New England Biolabs, Ipswich, MA) and purified using a ZymoPURE Midiprep Kit (Zymo Research, Irvine, CA). To limit the number of mGnRHR variants per UMI, these libraries were then bottlenecked through an additional transformation in NEB Turbo electrocompetent cells (New England Biolabs, Ipswich, MA), and were again purified using a ZymoPURE Midiprep Kit (Zymo Research, Irvine, CA). The resulting plasmid preparations contained detectable levels of a concatemer product, which were removed from the mutational libraries by gel purification using a Zymoclean Gel DNA Recovery Kit (Zymo Research, Irvine, CA). Purified products were again transformed into either NEB 10-beta (New England Biolabs, Ipswich, MA) or SIG10 (Sigma-Aldrich, St. Louis, MO) electrocompetent cells, and purified with a ZymoPURE Midiprep Kit (Zymo Research, Irvine, CA).
To associate each UMI with its corresponding GnRHR variant, the final plasmid libraries were also sequenced using PacBio SMRT sequencing. Briefly, plasmid libraries were first double-digested with PmlI and MfeI-HF restriction enzymes (New England Biolabs, Ipswich, MA), then purified using a Zymo DNA Clean & Concentrator-5 kit (Zymo Research, Irvine, CA). Complete digestion was verified by analyzing the products on an agarose gel, purity was assessed with a Synergy Neo2 microplate reader (BioTek, Winooski, VT), and the final yield was quantified using a Qubit4 fluorometer (ThermoFisher, Waltham, MA). The 1537 bp fragment for each library, containing the 10N UMI and GnRHR open reading frame, was isolated on a BluePippin instrument (Sage Science, Beverly, MA). These fragments were then assembled into PacBio libraries and sequenced on the Sequel II system with a 30-hour runtime.
The sequence of each plasmid-derived fragment containing mGnRHR variants and their corresponding UMIs was determined by reconstructing the circular consensus sequences (CCS) for each independent well in the PacBio readout. mGnRHR libraries were sequenced to a depth of 1,825,010 wells for WT, 1,545,151 wells for V276T, and 1,497,359 wells for W107A. A minimum coverage of five passes around each circular fragment (subread) was required for incorporation of sequencing data from an individual well into the analysis. CCS were mapped to the reference open reading frame using minimap2. CCS that did not contain a complete 10-base UMI sequence or fully cover the target ORF were excluded from the analysis. Variations in the mGnRHR sequence of individual fragments were called using SAMtools. Reads were then grouped according to their UMI sequences. Given that insertion and deletion (indel) artifacts are prevalent within PacBio sequencing data, we only removed UMIs from the analysis if indels were detected at a specific position within the UMI in at least 10% of the subreads. Indels observed within the open reading frame were only called if their occurrence was judged to be statistically significant relative to the background indel rate for the reads according to a binomial test (p <= .05). Codon substitutions encoded in our primer library (Table S3) were called within the mGnRHR open reading frame only if they were detected in at least 75% of the individual subreads. Only UMIs that were found to be associated with a single mGnRHR variant were included in the final analysis. Our final “dictionaries” for the WT, V276T, and W107A libraries contained 3,261, 531, and 4,962 unique UMIs, respectively, that could be indexed to a corresponding mGnRHR variant.
Preparation of Cellular Libraries and Cell Sorting
A previously described HEK293T cell line with a genomic Tet-Bxb1-BFP landing pad was obtained from D. Fowler (18). Cells were grown at 37 oC and 5.0% CO2 in complete media, made of Dulbecco’s Modified Eagle’s medium (DMEM, Gibco, Grand Island, NY) supplemented with 10% fetal bovine serum (FBS, Gibco, Grand Island, NY), 0.5% penicillin (Gibco, Grand Island, NY), and 0.5% streptomycin (Gibco, Grand Island, NY). 2 million cells were plated in 10-cm tissue culture plates and co-transfected the next day with 475 ng of plasmid encoding bxb1 recombinase and 7125 ng of the plasmid library using Lipofectamine 3000 (Invitrogen, Carlsbad, CA). Cells were grown at 33 °C for 4 days after transfection. 2 days after transfection, the cells were induced with 2 μg/mL doxycycline. All cells were expanded into 15-cm tissue culture plates 6 days after transfection, and 10 million cells were re-plated into new 15-cm tissue culture plates 3 days later.
Twelve days after transfection, the cells were washed with 1X phosphate-buffered saline (PBS, Gibco, Grand Island, NY), and harvested with TrypLE Express protease (Gibco, Grand Island, NY). Cells were washed twice with 2% fetal bovine serum in PBS (wash buffer), resuspended in PBS containing 2% FBS to 10M cells/ mL, and then sorted on a BD FACS Aria II (BD Biosciences, Franklin Lakes, NJ). Recombined cells were isolated based on their characteristic gain of bicistronic eGFP and loss of BFP expression (18). Forward and side scatter profiles were first used to gate for intact live cells, and cells with eGFP-positive (488 nm laser, 530/30 nm emission filter) and BFP-negative (405 nm laser, 450/50 nm emission filter) fluorescence profiles were collected in complete media supplemented with 10% FBS (Gibco, Grand Island, NY) and then plated in 10-cm tissue culture plates. Cells were induced with 2 μg/mL doxycycline 24 hours after sorting. 4 days after sorting, 10 million cells were passaged into 15-cm tissue culture plates.
7 days after sorting for eGFP-positive cells, cells were sorted again according to the surface expression of GnRHR variants. Briefly, cells were first washed with 1X phosphate-buffered saline (PBS, Gibco, Grand Island, NY), and harvested with TrypLE Express protease (Gibco, Grand Island, NY). GnRHRs expressed at the cell surface were immunostained for 30 minutes in the dark with DyLight 550-conjugated anti-HA antibody (Invitrogen, Carlsbad, CA). Cells were then washed twice with PBS containing 2% FBS, and sorted on a BD FACS Aria II (BD Biosciences, Franklin Lakes, NJ). Forward and side scatter profiles were used to gate for intact live cells, and eGFP fluorescence was used to gate for recombined cells. Cells were then sorted into even quartiles based on the amount of DyLight 550 fluorescence (561 nm laser, 585/15 nm emission filter). Sorted cells were collected in complete media supplemented with 10% FBS (Gibco, Grand Island, NY) and plated in 10-cm tissue culture plates. Cells were allowed to grow to confluency and then were washed with 1X phosphate-buffered saline (PBS, Gibco, Grand Island, NY) and harvested with TrypLE Express protease (Gibco, Grand Island, NY). Cell pellets were frozen for subsequent analysis.
Extraction of Genomic DNA and Next Generation Sequencing
Genomic DNA (gDNA) was extracted from cell pellets using the Sigma GenElute Mammalian gDNA Miniprep Kit (Sigma-Aldrich, St. Louis, MO). DNA amplicons for Illumina sequencing were produced from the gDNA using a previously described semi-nested polymerase chain reaction (PCR) technique (18). In order to selectively amplify recombined DNA, the first PCR utilized a primer which anneals upstream of the 10N UMI and a primer which anneals in the BFP-encoding region of the landing pad. Eight replicate PCRs were carried out with 2.5 μg of gDNA and HiFi HotStart Ready Mix (Kapa Biosystems, Wilmington, MA). These PCRs were limited to 7 cycles to avoid PCR bias. PCR products were purified using a ZR-96 DNA Clean & Concentrator-5 kit (Zymo Research, Irvine, CA), and replicate products were then pooled. 10 μL of this first PCR product was then used as the template for a second PCR, which utilized primers that incorporate Illumina adapter sequences to both ends of the 10N UMI. To avoid PCR bias, these reactions were monitored by real-time PCR on a StepOne RT-PCR instrument (Applied Biosystems, Waltham, MA) and terminated during mid-log amplification (17 or 18 cycles). Four replicate PCRs were carried out with HiFi HotStart Ready Mix (Kapa Biosystems, Wilmington, MA) and SYBR Green fluorescent dye (Applied Biosystems, Waltham, MA). Replicate PCR products were then combined and purified with a Zymoclean Gel DNA Recovery Kit (Zymo Research, Irvine, CA). To further improve the yield and quality of the amplicons, an additional 6-cycle PCR was carried out with the HiFi HotStart Ready Mix (Kapa Biosystems, Wilmington, MA) and primers that preserve the DNA sequence. These PCR products were then purified using a Zymo DNA Clean & Concentrator-5 kit (Zymo Research, Irvine, CA). Finally, amplicon yield and quality were assessed using an Agilent 2200 TapeStation with D1000 tape (Agilent Technologies, Santa Clara, CA) and a Synergy Neo2 microplate reader (BioTek, Winooski, VT). Amplicons were sequenced using an Illumina NextSeq150 flow cell, paired end, with 50% PhiX content.
Plasma Membrane Expression and Epistasis Measurements
Illumina sequencing of the recombined UMIs within each sorted cell fraction were matched to the corresponding GnRHR mutants using the PacBio sequencing data and subjected to previously described quality filtering (27). Estimations of PME for each mutant were then calculated as previously described (27). PME measurements were averaged across two biological replicates. Within each library, the average fluorescence intensity values of the mutants were ranked and converted into a percentile. Mutants which differed in their percentile values by more than 25% across the two replicates were excluded from analysis. The use of percentile-based filtering overcomes intrinsic limitations associated with the distinct dynamic range constraints that occur because of the distinct surface immunostaining profiles of these three cellular libraries (Fig. 3). Intensity values that meet these criteria are highly correlated across the two biological replicates (average Spearman’s ρ=0.936 across the three libraries, Supplementary Figs. S3 & S4). To facilitate the comparison of epistatic interactions within a common set of secondary mutations, we excluded mutants that did not pass quality filtering in all three libraries from our analysis. After data filtering, we obtained PME measurements for 251 mutants in all three libraries (Table S 4-6). These PME measurements were then utilized to calculate epistasis scores for the double mutants in the V276T and W107A libraries using the following equation:
where ε is the epistasis score, ωab is the fluorescence intensity of the double mutant relative to WT mGnRHR, ωa is the fluorescence intensity of the single mutant in the WT library relative to WT mGnRHR, and ωb is the fluorescence intensity of the background mutation (either V276T or W107A) relative to WT mGnRHR, as previously described (26).
Expression Measurements of Individual GnRHR Variants
HEK283T cells were obtained from ATCC and grown under the same conditions described above. GnRHR variants were transiently expressed in these cells, and the plasma membrane and intracellular expression of these variants was measured by flow cytometry, as previously described (6). The reported expression levels represent measurements from three biological replicates.
In vitro Translation of Chimeric Lep Proteins
Messenger RNA (mRNA) of chimeric Lep proteins was generated and used for in vitro translation reactions as previously described (6). Translation products were then analyzed by SDS-PAGE as previously described (6). The intensities of singly (G1) and doubly (G2) glycosylated protein were quantified by densitometry in ImageJ software. Apparent transfer free energy values representing transfer of the TMDs of interest from the translocon to the membrane were then calculated using the following equation:
where ΔGapp is the free energy for transfer of the TMD into the membrane, R is the universal gas constant, T is the temperature, Kapp is the equilibrium constant for transfer of the TMD into the membrane, G1 is the intensity of the singly glycosylated band, and G2 is the intensity of the doubly glycosylated band, as previously described (12). The reported transfer free energy values represent the average of three experimental replicates.
Structural Modeling of M. musculus GnRHR
The sequence of mGnRHR was obtained from Uniprot (accession number Q01776) and a homology model in the inactive state was generated as previously described (6). Images were generated by structurally aligning this mGnRHR model to an experimentally determined inactive-state human GnRHR crystal structure (GNRHR, PDB 7BR3, 2.79 Å) using the Super command in PyMol (Schrödinger, Inc., New York, NY) (47).
Computational Estimates for the Stability of mGnRHR Variants
Structural estimates for the effects of mutations on the stability of mGnRHR were determined using a previously described Rosetta based protocol featuring a membrane protein optimized energy function (1, 2). Briefly, the homology model of mGnRHR described above was used as the starting structure for computational stability estimates. A spanfile describing the transmembrane regions was created for mGnRHR using OCTOPUS predictions from Topcons (3, 42). The homology model was transformed into membrane coordinates using the mp_transform application. An ensemble of structures of each variant and wildtype (50 iterations) was generated from the homology model. For variants containing two mutations, the mutations were both introduced at the same time. The ΔΔG for each variant was calculated by subtracting the average score of the three lowest scoring variant models from the average score of the three lowest scoring WT models (ΔΔG = ΔGmut -ΔGwt).
To classify mGnRHR variants based on their expression profiles, we first standardized their scores using the Standard Scaler tool from Python’s scikit-learn library in order to ensure that the clustering algorithms are not affected by the scale of the data. Next, we then reduced the dimensionality of the standardized data using UMAP (20). We then used a density-based clustering algorithm (HDBSCAN) to cluster variants based on their expression profiles (19). Unsupervised learning analyses were carried out based on deep mutational scanning measurements alone, and did not incorporate any structural and/ or energetic parameters.
Quantification and Analysis
Flow cytometry data were analyzed in FlowJo software (Treestar, Ashland, OR), and in vitro translation data were analyzed in ImageJ software. PME measurements were analyzed in Origin software (OriginLab, Northampton, MA) and mapped onto the mGnRHR homology model in PyMol (Schrödinger, Inc., New York, NY).
Data and Resource Availability
Code for the analysis of sequencing data for deep mutational scanning experiments can be found on the Schlebach Lab GitHub page (https://github.com/schebachlab). Fluorescence intensity values for mutants in all three mutational libraries are included as Excel files in the Supplementary Materials. Additional data is available upon request.
We thank Christiane Hassel and the Indiana University Bloomington Flow Cytometry Core Facility, as well as Luke Tallon, Xuechu Zhao, Lisa Sadzewicz, and the University of Maryland Institute for Genome Sciences for their experimental support. This work was supported by a National Science Foundation (NSF) Graduate Research Fellowship (1342962 to L. M. C.) and by the National Institutes of Health (R01GM129261 to J.P.S). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
- 1.An Integrated Framework Advancing Membrane Protein Modeling and DesignPLoS Comput Biol 11
- 2.Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein–Protein Binding Affinity upon MutationJ Phys Chem B 122:5389–99
- 3.TOPCONS: consensus prediction of membrane protein topologyNucleic Acids Res 37:W465–8
- 4.Thermodynamics of neutral protein evolutionGenetics 175:255–66
- 5.Density-Based Clustering Based on Hierarchical Density EstimatesIn Advances in Knowledge Discovery and Data Mining :160–72
- 6.Molecular basis for the evolved instability of a human G-protein coupled receptorCell Rep 37
- 7.Impact of in Vivo Protein Folding Probability on Local Fitness LandscapesMol Biol Evol 36:2764–77
- 8.Visualization of translation and protein biogenesis at the ER membraneNature 614:160–67
- 9.Cellular proteomes have broad distributions of protein stabilityBiophys J 99:3996–4002
- 10.Stability-mediated epistasis constrains the evolution of an influenza proteinElife 2013:1–19
- 11.Inter-helical hydrogen bond formation during membrane protein integration into the ER membraneJ Mol Biol 334:803–9
- 12.Recognition of transmembrane helices by the endoplasmic reticulum transloconNature 433
- 13.Molecular code for transmembrane-helix recognition by the Sec61 transloconNature 450
- 14.Restoration of testis function in hypogonadotropic hypogonadal mice harboring a misfolded GnRHR mutant by pharmacoperone drug therapyPNAS 110:21030–35
- 15.Regulation of G Protein-coupled Receptor Trafficking by Inefficient Plasma Membrane Expression MOLECULAR BASIS OF AN EVOLVED STRATEGYJ Biol Chem 281:8417–25
- 16.Structural and functional characterization of G protein-coupled receptors with deep mutational scanningElife 9:1–52
- 17.Folding and Misfolding of Human Membrane Proteins in Health and Disease: From Single Molecules to Cellular ProteostasisChem Rev 119:5537–5606
- 18.A platform for functional assessment of large variant libraries in mammalian cellsNucleic Acids Res 45:1–12
- 19.hdbscan: Hierarchical density based clusteringThe Journal of Open Source Software 2
- 20.UMAP: Uniform Manifold Approximation and ProjectionJ Open Source Softw 3
- 21.User-defined single pot mutagenesis using unamplified oligo poolsProtein Engineering, Design and Selection 32:41–45
- 22.Asn- and Asp-mediated interactions between transmembrane helices during translocon-mediated membrane protein assemblyEMBO Rep 7:1111–16
- 23.Side-chain hydrophobicity scale derived from transmembrane protein folding into lipid bilayersPNAS 108:10174–77
- 24.A large-scale survey of pairwise epistasis reveals a mechanism for evolutionary expansion and specialization of PDZ domainsProteins 89:899–914
- 25.Orientational Preferences of Neighboring Helices Can Drive ER Insertion of a Marginally Hydrophobic Transmembrane HelixMol Cell 45:529–40
- 26.A comprehensive biophysical description of pairwise epistasis throughout an entire protein domainCurrent Biology 24:2643–51
- 27.Probing Biophysical Sequence Constraints within the Transmembrane Domains of Rhodopsin by Deep Mutational ScanningSci Adv 6
- 28.Organization of the native ribosome–translocon complex at the mammalian endoplasmic reticulum membraneBiochim Biophys Acta Gen Subj 1860:2122–29
- 29.Structural basis for membrane insertion by the human ER membrane protein complexScience 369:433–36
- 30.Membrane Protein Folding and OligomerizationBiochemistry 29:4031–37
- 31.The Safety Dance: Biophysics of Membrane Protein Folding and Misfolding in a Cellular ContextQ Rev Biophys 48:1–34
- 32.The ER membrane protein complex interacts cotranslationally to enable biogenesis of multipass membrane proteinsElife 7
- 33.The Biogenesis of Multipass Membrane ProteinsCold Springs Harbor Perspectives in Biology
- 34.Mechanism of an intramembrane chaperone for multipass membrane proteinsNature 611
- 35.High resolution comparative modeling with RosettaCMStructure 21
- 36.Epistasis in protein evolutionProtein Science 25:1204–18
- 37.Substrate-driven assembly of a translocon for multipass membrane proteinsNature 611:167–72
- 38.Deep sequencing of 10,000 human genomesPNAS 113:11901–6
- 39.The Stability Effects of Protein Mutations Appear to be Universally DistributedJ Mol Biol 369:1318–32
- 40.Stability effects of mutations and protein evolvabilityCurr Opin Struct Biol 19:596–604
- 41.X-ray structure of a protein-conducting channelNature 427:36–44
- 42.OCTOPUS: improving topology prediction by two-track ANN-based preference scores and an extended topological grammarBioinformatics 24:1662–68
- 43.How Translocons Select Transmembrane Helices
- 44.Protein energetics in maturation of the early secretory pathwayCurr Opin Cell Biol 19
- 45.Complete topology inversion can be part of normal membrane protein biogenesisProtein Science 26:824–33
- 46.Plasmid-based one-pot saturation mutagenesisNat Methods 13:928–30
- 47.Structure of the human gonadotropin-releasing hormone receptor GnRH1R reveals an unusual ligand binding modeNat Commun 11