Large protein complex interfaces have evolved to promote cotranslational assembly

  1. Mihaly Badonyi
  2. Joseph A Marsh  Is a corresponding author
  1. MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, United Kingdom

Abstract

Assembly pathways of protein complexes should be precise and efficient to minimise misfolding and unwanted interactions with other proteins in the cell. One way to achieve this efficiency is by seeding assembly pathways during translation via the cotranslational assembly of subunits. While recent evidence suggests that such cotranslational assembly is widespread, little is known about the properties of protein complexes associated with the phenomenon. Here, using a combination of proteome-specific protein complex structures and publicly available ribosome profiling data, we show that cotranslational assembly is particularly common between subunits that form large intermolecular interfaces. To test whether large interfaces have evolved to promote cotranslational assembly, as opposed to cotranslational assembly being a non-adaptive consequence of large interfaces, we compared the sizes of first and last translated interfaces of heteromeric subunits in bacterial, yeast, and human complexes. When considering all together, we observe the N-terminal interface to be larger than the C-terminal interface 54% of the time, increasing to 64% when we exclude subunits with only small interfaces, which are unlikely to cotranslationally assemble. This strongly suggests that large interfaces have evolved as a means to maximise the chance of successful cotranslational subunit binding.

Editor's evaluation

The authors use a combination of proteome-specific protein complex structures and publicly available ribosome profiling data to show that cotranslational assembly is favored by large N-terminal intermolecular interfaces. The manuscript represents an important contribution to the field of protein biosynthesis pathways by suggesting an intuitive evolutionary mechanism that can promote co-translational assembly pathways in mammalians, yeast, and bacteria.

https://doi.org/10.7554/eLife.79602.sa0

Introduction

The majority of proteins across all domains of life function as part of multimeric complexes. Although we have a comprehensive understanding of the diverse quaternary structure space occupied by complexes (Ahnert et al., 2015), much less is known about where, when, and how their component subunits assemble. Continuing advances in cryo-electron microscopy (Fontana et al., 2022), mass photometry (Soltermann et al., 2020), and genetic interaction mapping (Braberg et al., 2020) are facilitating a transition towards a structural view of proteomes (Levy and Vogel, 2021). While, at the present, our structural analyses are primarily limited to complexes in the Protein Data Bank (Berman et al., 2000), there is a new generation of multiscale protein complex modelling approaches (Evans et al., 2021; Humphreys et al., 2021; Fontana et al., 2022; Gao et al., 2022) promising to fill the gap and accelerate structure-based discovery. Our understanding of proteomes has also been dramatically improved by the development of ribosome profiling, which has provided us with quantitative measurements at the level of translation. Alterations of the technique revealed the cotranslational action of chaperones (Oh et al., 2011; Shiber et al., 2018; Stein et al., 2019), shed light on the role of collided ribosomes in proteostasis (Arpat et al., 2020; Han et al., 2020; Zhao et al., 2021), and supported the view of the ribosome as a signalling hub (D’Orazio and Green, 2021). To the present work, however, it is of outstanding relevance that ribosome profiling has laid down strong evidence that the assembly of protein complexes often starts on the ribosome (Duncan and Mata, 2011; Natan et al., 2017; Sepulveda et al., 2018; Shiber et al., 2018; Kamenova et al., 2019; Panasenko et al., 2019; Fujiwara et al., 2020; Bertolini et al., 2021; Seidel et al., 2022).

Two factors appear to be particularly important for cotranslational assembly: the proximity of nascent chains on adjacent (cis) or between juxtaposed (trans) ribosomes, and the localisation of interface residues towards the N-terminus of a protein, which allows more time for an interaction to occur during translation (Shieh et al., 2015; Natan et al., 2018; Kamenova et al., 2019). Recent findings demonstrated that homomers, formed from multiple copies of a single type of polypeptide chain, frequently assemble on the same transcript via the interaction of adjacent elongating ribosomes (Bertolini et al., 2021). This mechanism is highly effective because it takes advantage of the fact that homomeric subunits are identical; therefore nascent chains are essentially colocalised by the nature of their synthesis. Although homomers may benefit from polysome-driven assembly, it requires allocation of cellular resources to ensure at least two ribosomes are actively translating the same mRNA at any one time (Liu et al., 2016a). On the other hand, heteromers, products of different genes that physically interact, can only employ the trans mode of assembly in eukaryotes, providing mechanisms that colocalise their transcripts exist (Liu et al., 2016b; Pizzinga et al., 2019; Wang et al., 2020; Chen and Mayr, 2022). In contrast to homomers, cotranslational assembly of heteromers may only require a single ribosome on each mRNA, which could allow lowly abundant regulatory proteins to cotranslationally assemble (Heyer and Moore, 2016; Biever et al., 2020). Alternate ribosome usage and translation-coupled assembly can explain how cells achieve efficient construction of complexes with uneven stoichiometry, accounting for a substantial fraction of heteromeric complexes (Marsh et al., 2015).

Despite growing evidence supporting the importance of cotranslational assembly, far less is known about the properties of the interfaces involved. It has been observed that cotranslationally binding subunits have a tendency to fall out of solution or become degraded by orphan subunit surveillance mechanisms in the absence of their partner subunits (Choe et al., 2016; Juszkiewicz and Hegde, 2018; Natan et al., 2018; Shiber et al., 2018; Kamenova et al., 2019). This observation may be explained under two assumptions: N-terminal interfaces are aggregation prone due to interference with cotranslational folding (Ciryam et al., 2013; Jacobs and Shakhnovich, 2017; Kudva et al., 2018; Kramer et al., 2019), and/or that cotranslationally forming interfaces possess unique structural properties that predispose them to aggregation in the absence of binding partners. Whilst there is evidence for the former (Natan et al., 2018), interfaces involved in nascent chain assembly have not been systematically studied before. Therefore, we cannot exclude the possibility that they have structural features that make them more susceptible to a cotranslational route.

Hydrophobic surfaces play a key role in nucleation theory (Hermann, 1972; Tanford, 1978; Chandler, 2005) and protein folding (Privalov and Khechinashvili, 1974; Chothia, 1975; Gething and Sambrook, 1992), but more importantly, hydrophobicity remains the founding principle of protein–protein recognition theory (Kauzmann, 1959; Chothia and Janin, 1975). Defined as the buried surface area between subunits, interface size shows correspondence to hydrophobic area because larger interfaces contain more interface core residues (Levy, 2010). Conveniently, interface area is relatively simple to compute from structural data (Hubbard, 1993; Kleinjung and Fraternali, 2005; Winn et al., 2011; Mitternacht, 2016). Whilst the relationship between interface area and in vitro measured affinity is non-linear (Eisenberg and McLachlan, 1986; Horton and Lewis, 1992; Brooijmans et al., 2002; Vangone and Bonvin, 2015), interface area shows remarkable correspondence with subunit dissociation energy, and is reflective of the evolutionary history of subunits within complexes (Levy et al., 2008).

We hypothesised that cotranslational interactions may be distinguished from others based upon the areas of the interfaces involved. The size hierarchy of interfaces in protein complexes can be used to predict the order in which their subunits assemble, in good agreement with experimental data (Levy et al., 2008; Marsh et al., 2013; Wells et al., 2016). According to this theory, the largest interfaces in a complex correspond to the earliest forming subcomplexes within the assembly pathway, irrespective of the binding mode. While specific contacts that increase affinity would introduce compositional biases into the sequence space, exerting undue selection pressure on proteomes, variability in interface size can emerge from non-adaptive processes as the organising principle of cotranslational assembly (Conant, 2009; Ahnert et al., 2010; Gray et al., 2010; Lynch, 2013; Leonard and Ahnert, 2019; Hochberg et al., 2020).

In the present study, we address this idea by analysing experimental data on cotranslationally assembling human proteins (Bertolini et al., 2021). Our results establish a strong correspondence between cotranslational assembly and subunit interface size. To test whether large interfaces could represent an evolutionary adaptation to cotranslational assembly, we took advantage of the many protein complex subunits that have more than one interface. We compared the areas of first and last translated interfaces in bacterial, yeast, and human heteromeric subunits, and found a clear tendency for the first interface to be larger than the last interface across all species. These findings suggest that large protein complex interfaces have evolved to promote cotranslational assembly.

Results

Cotranslationally assembling subunits are characterised by large interfaces

In a recent study, a novel ribosome profiling method was used to identify over 4000 cotranslationally assembling human proteins (Bertolini et al., 2021). By design, the method can identify subunits that undergo cotranslational assembly when both subunits are in the process of translation. As recently proposed (Kamenova et al., 2019), we refer to this mode of binding as ‘simultaneous’ assembly (illustrated in Figure 1A, B).

Figure 1 with 1 supplement see all
Cotranslationally assembling subunits are characterised by large interfaces.

(A) Schematic representation of (cis) simultaneous cotranslational assembly in homomers. (B) Schematic representation of (trans) simultaneous cotranslational assembly in heteromers. (C) Interface size differences between cotranslationally assembling and all other subunits of homomeric symmetry groups and heterodimers. Error bars represent standard error of the mean (SEM) and labels on bars show the number of proteins in each group. The p values were calculated with two-sided Wilcoxon rank-sum tests. Pictograms show the basic structure of symmetry group members, with the blue dots representing isologous and red dots representing heterologous and heteromeric interfaces. (D) Interface size distributions of cotranslationally assembling and all other subunits of C2 homodimers and heterodimers, subset by the terminal location of the interface. The p values were calculated with two-sided Wilcoxon rank-sum tests.

To investigate if interface area correlates with simultaneous assembly, we computed the buried surface areas of homomeric and heterodimeric subunits, and subset the results by whether or not the protein was detected to cotranslationally assemble (Figure 1C). The arrangement of homomeric subunits with respect to one or more rotational axes allows their classification into symmetry groups. The three most common groups are the twofold symmetric (Schönflies notation, C2), cyclic (Cn [n>2]), and dihedral (Dn [n>1]) complexes, which all have distinct structural and functional characteristics (Goodsell and Olson, 2000; Levy and Teichmann, 2013; Bergendahl and Marsh, 2017) and should therefore be considered separately. For example, members of the cyclic and dihedral symmetry tend to have larger buried surfaces, because they interface with more than one subunit, while C2 symmetric homodimers tend to have larger buried surfaces than heterodimers (Jones and Thornton, 1996).

C2 homodimers represent the most highly populated symmetry group and their single isologous interface (i.e. symmetric or head-to-head) makes the analysis simple to perform. In line with our expectation, C2 symmetric subunits that assemble during translation expose 20% larger areas than those that do not (Figure 1C; p = 7.2 × 10−7, Wilcoxon rank-sum test). Considering that the cotranslational assembly annotations are derived from a laboratory technique that uses extensive biochemical fractionation, it is important consider the possibility that larger interfaces would be more persistent to these procedures. We therefore controlled for the potential confounding effect of larger interfaces by setting incrementally higher interface area cutoffs (Figure 1—figure supplement 1A), to which the trend appears robust. Note that we have not tested the effects of similar interface cutoffs for the other symmetry groups, due to their much smaller data set size.

Higher-order cyclic complexes are centred on one rotational axis so that every subunit has two distinct interfaces, each with an adjacent protomer. Both interfaces are heterologous (i.e. asymmetric or head-to-tail) and approximately the same size. Cyclic symmetry is potentially confounded by its tendency to form ring-like structures (Forrest, 2015), which are ubiquitous components of biological membranes. As a result, membrane-bound complexes are enriched in non-polar amino acids that form the interface with the alkane core of the lipid bilayer. We focussed on the analysis of cyclic homomers that do not localise to the plasma membrane, owing to competing hydrophobic forces exerted by protein–lipid interactions. Despite the limited number of structures available, we detect a significant difference in interface area among soluble members of the cyclic symmetry group (Figure 1C), with the mean of cotranslationally forming subunits being 24% larger (p = 0.039, Wilcoxon rank-sum test). Notably, we did not observe the trend in plasma membrane localised cyclic complexes (Figure 1—figure supplement 1B).

Dihedral symmetry can be thought of as the stacking of a dimeric or cyclic complex through the acquisition of a twofold axis. All dihedral complexes have isologous interfaces, and those with at least six subunits can have both isologous and heterologous interfaces (e.g. D3 dimers of cyclic trimers). We find that cotranslationally assembling dihedral complexes have on average 33% larger interfaces than those assumed to assemble after their complete synthesis (Figure 1C; p = 1.6 × 10−6, Wilcoxon rank-sum test). A dihedral complex is likely to have evolved from a C2 homodimer if its largest interface is isologous and, conversely, when its heterologous interface is largest, the complex probably arose via a cyclic intermediate (Levy et al., 2008; Marsh and Teichmann, 2014). When dihedral complexes are grouped by their likely evolutionary history, the trend is present in both groups (Figure 1—figure supplement 1B), consistent with that observed in C2 homodimers and higher-order soluble cyclic complexes.

We pooled all remaining homomers, including those with helical and cubic symmetry, and those that are asymmetric, into a single ‘other’ category, due to their relatively low representation in the human proteome. Altogether, cotranslationally assembling subunits in this heterogeneous category present 25% larger interface areas than other members (Figure 1C; p = 0.026, Wilcoxon rank-sum test). Thus, the interface size trend in cotranslationally assembling complexes appears to hold up across all types of homomers, with the exception of membrane-bound cyclic complexes.

Because simultaneous assembly in heteromers requires two different transcripts positioned in trans (Figure 1B), we were curious if they, too, showed a correspondence between cotranslational assembly and interface size. Due to the diverse quaternary structures and assembly pathways associated with heteromeric complexes (Ahnert et al., 2015), we focussed on the simplest cases, the heterodimers, which form a single heteromeric interface by the physical interaction of two different proteins. When compared, heterodimers that simultaneously assemble reveal a 15% larger interface area on average than those not detected to cotranslationally assemble (Figure 1C; p = 7.9 × 10−3, Wilcoxon rank-sum test). Similar to C2 homodimers, the trend in heterodimers is also robust to incremental interface area cutoffs (Figure 1—figure supplement 1A), making it unlikely to be an experimental artefact.

The weaker effect size in heterodimers relative to C2 homodimers may be explained by the combination of two factors. First, previous experimental evidence suggests that, in contrast to the simultaneous assembly probed here, heteromers commonly employ the ‘sequential’ mode of assembly, whereby a subunit in the process of translation recruits a fully synthesised and folded subunit (Halbach et al., 2009; Kassem et al., 2017; Shiber et al., 2018; Kamenova et al., 2019; Panasenko et al., 2019; Lautier et al., 2021; Seidel et al., 2022). This mode of assembly has not yet been experimentally probed on a proteome-wide scale, and it is possible that many heterodimers lacking cotranslational assembly annotations in our data set employ sequential assembly. As a result, assuming that interface size plays a role in sequential assembly as well, these unannotated proteins weaken the effect we can detect. Second, it is plausible that another biological process, yet uncharacterised in detail, is responsible for the colocalisation of transcripts and the subsequent subunit assembly (Wang et al., 2020; Chen and Mayr, 2022), which could make assembly in heteromers less reliant on interface area.

We considered three potentially confounding variables of the ribosome profiling method, which was used to detect cotranslationally assembling proteins. The first is protein length, in part because long polypeptide chains take more time to translate, making it more likely for a cotranslational interaction to come about, and partly because bigger proteins tend to form larger interfaces. Long proteins are also encoded by long transcripts on which structures called di-ribosomes (two ribosomes connected by interacting nascent chains) may persist for a longer time period, potentially leading to their survivorship bias to the observer. In Figure 1—figure supplement 1C, we present an analysis where both C2 homodimers and heterodimers are binned by their length into bins containing equal number of structures, and subset by cotranslational assembly. With the exception of long heterodimers, all bins follow the expected interface size trend. More importantly, for both types of complexes, the middle bin, which contains approximately 350–720 residue long proteins and thus covers a large fraction of the human proteome, shows the strongest effect size.

The second variable we accounted for is the confidence-based classification of the cotranslational assembly data set. Bertolini et al., 2021 employed an elaborate strategy to assign high or low confidence to the protein candidates (detailed in Bertolini et al., 2021). However, these high confidence proteins only make up a fifth of all annotations, which prohibits their exclusive use in our analyses. To address this, we leveraged homology models from the SWISS-MODEL repository (Bienert et al., 2017; Waterhouse et al., 2018) to increase the number of available structures for analysis. With this supplemented structural data set, we found the difference between high confidence and all other subunits (excluding low confidence) to be statistically significant at symmetry level for all homomers (Figure 1—figure supplement 1D), but not for heterodimers, consistent with the weaker effect size for heterodimers observed earlier. There are no significant differences observed between high and low confidence proteins for any of the groups. Although we might expect that high confidence proteins should, on average, have larger interfaces than low confidence proteins (assuming that a greater fraction of them represent true cases of cotranslational assembly), we note that the size of the high confidence set is relatively small, especially when split by symmetry group, and that the average interface size of the high confidence proteins is larger for all homomers, except in the very small dihedral group. Overall, we think that the small differences between high and low confidence sets, as well as the small size of the high confidence set, justify our use of the combined sets throughout this study.

The third potential confounder is the location of the interface relative to protein termini. Interactions via N-terminal interfaces are translated earlier, therefore increasing the time available for them to assemble cotranslationally. Given that cotranslationally forming interfaces identified by ribosome profiling are known to be significantly enriched towards the N-terminus of proteins (Bertolini et al., 2021), but that overall, homomeric interfaces tend to be enriched towards the C-terminus (Natan et al., 2018), we wished to control for interface location. We classified all interfaces as occurring on either the N- or C-terminal halves of proteins, based on the position of the interface midpoint, which is the residue at which half of the buried surface area of an interface is reached. This comparison is presented for C2 homodimers and heterodimers in Figure 1D. In all groups, there is a clear interface size trend wherein cotranslationally assembling subunits have a larger area. More interestingly, however, the trend is only significant and much larger in effect between N-terminally localised interfaces. In fact, N-terminal interfaces are significantly larger than C-terminal interfaces in cotranslationally assembling homodimers (p = 0.021, Wilcoxon rank-sum test). One possible explanation for this is that N-terminally localised interfaces are far more likely to represent cases of genuine cotranslational assembly.

Interface area is more important than other interfacial contact-based properties for explaining cotranslational assembly

To rule out that interface size is masking a more important property of cotranslationally assembling subunits, we explored other interface features using the same set of C2 symmetric homodimers (n = 1486) and heterodimers (n = 1640) as shown in Figure 1A, which are abundant in the structural data and possess only a single interface, making the results simple to interpret. Because hydrophobicity is essential to protein–protein interactions (Chothia and Janin, 1975), we calculated the apolar interface area and compared it to the total area (Figure 2A, B). We found that, while the difference in apolar interface area shows a stronger effect among homodimers than the total size (Wilcoxon effect size 0.24, p = 1.8 × 10−7, vs. 0.228, p = 7.2 × 10−7), the opposite is observed among heterodimers (Wilcoxon effect size 0.054, p = 0.262, vs. 0.127, p = 7.9 × 10−3). The origin of this sharp contrast is likely the fact that heteromeric interfaces are less hydrophobic (Jones and Thornton, 1996), and thus complexation is less likely to be primarily driven by the size of the hydrophobic patch of the interface. We also looked at the absolute number of residue–residue contacts within a 5.5 Å radius (Vangone and Bonvin, 2015), which echoes the results we obtained with total interface size, but with weaker effects (Figure 2C; Wilcoxon effect size, 0.211 for C2, p = 4.4 × 10−6, and 0.083 for heterodimers, p = 0.08). Next, we employed a contact-based model to estimate binding affinity from the number and character of residue–residue contacts (Vangone and Bonvin, 2015). As expected, this analysis revealed that cotranslationally assembling subunits have higher predicted affinities, or lower ΔG of binding (Figure 2D), among both homo- and heterodimers (Wilcoxon effect size, 0.166 for C2, p = 3.1 × 10−4, and 0.122 for heterodimers, p = 0.011), although these differences are also weaker than those observed with interface size.

Interface area is more important than other interfacial contact-based properties for explaining cotranslational assembly.

All panels show the sample mean ± standard error of the mean (SEM) for cotranslationally assembling and all other subunits of human C2 symmetric homodimers and heterodimers. The p values were derived from two-sided Wilcoxon rank-sum tests in panels (A–D) and (I), and from two-sided Dunn’s test of multiple comparisons in panels (E–H). Labels on bars in panel (A) represent sample sizes. The following parameters are shown. (A) Total interface size (Å2). (B) Apolar interface size (Å2). (C) The absolute number of interfacial contacts. (D) Predicted Gibbs free energy (ΔG) of binding (kcal/mol). (E) Fraction of residue–residue contacts by chemical character, in descending order of prevalence. (F) Specific interaction density (count/1000 Å2), in descending order of prevalence. (G) Non-interacting surface apolar (NISa) and charged (NISc) residue per cent. (H) Fractional content of positively (Arg + Lys) and negatively (Asp + Glu) charged amino acids in the full-length sequence. (I) Protein isoelectric point determined with continuum electrostatics on the full monomeric structures.

We further dissected interfacial contacts based upon chemical character (Figure 2E) and interaction type of the residues (Figure 2F). The differences within these categories are again weaker than with interface size, the only exception being the contribution of salt bridges to the cotranslational assembly of heterodimers (Dunn’s test effect size 0.173, p = 7.7 × 10−5), which is similarly reflected in the differences between the fraction of charged–apolar and charged–charged contacts. By contrast, the only non-negligible contribution to binding in the cotranslational assembly of homodimers is that of pi–pi interactions (Dunn’s test effect size 0.115, p = 7.1 × 10−3). While these observations are interesting, they are broadly explained by the apolar interface areas presented in Figure 2B. On the one hand, this trend suggests that homodimeric interfaces are made up of more hydrophobic residues, which in turn leave less space for other types of residues, for example charged amino acids that may form salt bridges. This compositional bias in isologous interfaces can, by exclusion, naturally increase the frequency of pi–pi stacked aromatic side chains, a phenomenon that impacts the mutational and evolutionary landscape of symmetric homomers (Monod et al., 1965; Goodsell and Olson, 2000; Ponstingl et al., 2005). On the other hand, heterodimers, whose interfaces are less hydrophobic, accommodate more polar and charged residues that may give rise to specific interactions, in accord with the higher frequency of hydrogen bonds and salt bridges. This notion is consistent with the weak correlation between interface size and binding affinity in heteromers (Brooijmans et al., 2002), because in heteromeric protein–protein recognition, interface complementarity is likely to play a more important role than in homomers. Nevertheless, as demonstrated here, interface size shows tremendous utility in discriminating cotranslationally assembling subunits because it is easily calculable and it is a fundamental property of protein–protein interactions, unlike salt bridges or pi–pi interactions, which do not necessarily occur at every interface.

A linear model developed for the estimation of binding affinity from residue–residue contacts at the interface (Vangone and Bonvin, 2015) incorporates coefficients for the terms ‘non-interfacial surface apolar/charged’ (NISa and NISc, respectively), which are the fraction of apolar or charged surface residues of a subunit in complex, and they have been shown to influence binding affinity (Kastritis et al., 2014). We found significant differences in the NISc parameter between cotranslationally assembling and all other subunits among both homo- and heterodimers (Figure 2G), suggesting that cotranslationally assembling subunits possess a larger proportion of charged residues on the surface than other subunits. To investigate this further, we calculated the fractional content of positively (Arg + Lys) and negatively (Asp + Glu) charged amino acids from protein sequences to see if one charge group in particular is responsible for the trend. This analysis is possible because other than the relatively rare internal charges (Hendsch and Tidor, 1994; Kajander et al., 2000), most charges tend to be on the surface, an assumption that is supported by our structural data, in which nearly 83% of charged residues have more than 25% relative accessible surface area in the monomeric protein (Levy, 2010). We found that negatively charged amino acids in cotranslationally assembling subunits are overrepresented relative to other subunits (Figure 2H; Dunn’s test of multiple comparisons, C2, p = 7.1 × 10−5 and heterodimers, p = 4.1 × 10−7), and we also detect a small but significant enrichment in positively charged amino acids, but only in homodimers (p = 6.0 × 10−3). We further support these observations with isoelectric points calculated in continuum electrostatics at physiological pH and salt concentration from the human AlphaFold structures (Tunyasuvunakool et al., 2021), which represent the full-length monomers translated on ribosomes. This analysis revealed that the net enrichment in charged amino acids on the surfaces of cotranslationally assembling heterodimeric subunits results in a lower isoelectric point than in other subunits (Figure 2I; Wilcoxon rank-sum test, p = 9.1 × 10−5).

What may explain the finding that cotranslationally assembling subunits display more negative surface charges than other proteins? We believe there are four mutually non-exclusive hypotheses that are compatible with the observation. The first is based on the work of Kastritis et al., 2014, who proposed based on alanine scanning mutagenesis experiments that polar and charged residues of non-interacting surfaces contribute to binding affinity. Second, one might argue that the role of charged residues on the surface is to counteract the strong water-orientation forces exerted at large interfaces by supporting protein solubility through favourable interactions with water molecules (Kramer et al., 2012) and ions (Linse et al., 1988). The third idea concerns the ribosome:nascent chain interaction, where negative charges could help avoid unproductive interactions with the ribosome surface (Cassaignau et al., 2021; Deckert et al., 2021), thus facilitating cotranslational folding and assembly. The fourth scenario would be attributable to a proteome-wide effect, whereby the higher the abundance of a protein, the more its surface has been shaped by evolution for optimal ‘stickiness’ to combat non-specific interactions upon molecular crowding (Levy et al., 2012). While further analysis of this effect is out of the scope of this study, using pooled homo- and heterodimer data, we detect a weak but significant Spearman correlation of 0.18 (p = 1.7 × 10−23) between the NISc parameter and HEK293-specific active ribosome count (an abundance measure, Clamer et al., 2018), which corroborates the fourth hypothesis.

Larger and earlier-assembling interfaces tend to form cotranslationally in heteromeric subunits with multiple interfaces

Having confirmed that subunit interface size correlates with cotranslational assembly, we next wanted to see if this trend applies within single subunits that have more than one interface. In other words, do multi-interface heteromeric subunits also employ their largest interface during the course of simultaneous assembly? A multi-interface heteromeric subunit forms at least two distinct interfaces with two other proteins in a complex that contains at least three subunits. Because of the interface hierarchy that exists within protein complexes (Levy et al., 2008; Marsh et al., 2013), we hypothesised that the largest interface, which is most likely to assemble earliest, should also be more likely to cotranslationally assemble.

To perform an analysis at the multi-interface level, we made use of the assembly-onset positions determined for every protein in the cotranslational assembly data set (Bertolini et al., 2021). An assembly-onset is a single residue in the protein sequence, whose codon is being decoded by the ribosome at the time of cotranslational assembly. In order to identify which interface the assembly-onsets belongs to, we mapped them to the closest interface midpoint in the linear protein sequence, as illustrated in Figure 3A. This is to avoid biases from large interfaces, which have many more interface residues and therefore a higher probability that an assembly-onset would map to them if the interface was not compressed into a single midpoint residue.

Larger and earlier-assembling interfaces tend to form cotranslationally in heteromeric subunits with multiple interfaces.

(A) Visual representation of the interface mapping protocol. The area plot shows the cumulative interface area build-up of individual interfaces during translation, which are shown in different colours. The midpoints (dots) are residues at which half of the eventually formed area is exposed. Assembly-onsets determined by Bertolini et al., 2021 are mapped to the nearest midpoint on condition that it is not a homomeric interface. (B) Pairwise comparison of cotranslationally forming (in the simultaneous mode) interfaces of multi-interface heteromeric subunits to the mean of all other heteromeric interfaces on them. For visual aid, see line diagram under panel (C) Error bars represent standard error of the mean (SEM) and labels on bars show the number of proteins in each group. The p value was calculated with the Wilcoxon signed-rank test. (C) Pairwise comparison of the normalised assembly order in 201 complexes between cotranslationally forming and all other heteromeric interfaces. The normalised assembly order is a 0-to-1 scale where 0 and 1 represent the first and the last steps of the predicted assembly pathway. The p value was calculated with the Wilcoxon signed-rank test. (D) Two examples of simultaneous cotranslational assembly between subunit pairs in heteromeric complexes: the subunits E and B1 of the V-type ATPase (pdb: 6wm2), and importin-9 with histone H2A (6n1z).

Using this method, we identified 281 interfaces on multi-interface heteromeric subunits that may form in a simultaneous cotranslational fashion. To see if these correspond to the largest interfaces within each protein, we calculated the mean interface area for all other interfaces on these subunits to be able to perform a paired statistical test. Our results show that the identified cotranslationally assembling interfaces are indeed larger by 19% than other interfaces on these subunits (Figure 3B; p = 0.018, Wilcoxon signed-rank test).

We wished to put these interfaces into the context of their full complexes. Do simultaneously forming interfaces represent early forming subcomplexes that then initiate further assembly events, since the first step of a protein complex assembly pathway is the most likely to occur cotranslationally (Wells et al., 2015)? Although the largest interface in a complex is always predicted to assemble earliest in the assembly pathway, subsequent steps are non-trivial because they can involve multiple subunit:subunit interfaces (Ahnert et al., 2015; Levy et al., 2008; Marsh et al., 2013). To answer this question, we predicted the assembly steps of heteromeric complexes on the basis of their structures (Wells et al., 2016). This analysis revealed that the identified interfaces tend to form much earlier than other heteromeric interfaces in the complexes (Figure 3C; p = 0.01, Wilcoxon signed-rank test). Another interpretation of this can be given by classifying assembly steps into ‘early’ and ‘late’, depending on their normalised assembly order (McShane et al., 2016), which is a 0-to-1 scale indicating the first-to-last steps of a pathway, where we defined early steps with values less than or equal to 0.5. According to this, a simultaneously forming interface is 1.7 times more likely to form early (180 [67%] of 270 vs. 633 [54%] of 1171; p = 9.5 × 10−5, Fisher’s exact test).

Some of the identified interfaces belong to complexes that have been shown to use cotranslational assembly routes, such as the proteasome (Panasenko et al., 2019) and subunits of the transcription initiation complex (Kamenova et al., 2019). However, many are not yet described in the literature, for example, the loading of histone H2A onto importin-9 (Figure 3D), which has been reported to act as a storage chaperone while transporting a histone dimer to the nucleus (Padavannil et al., 2019). Another example is the V-type ATPase (Figure 3D), whose catalytic A and B subunits have been tested for their ability to assemble in the sequential mode with a negative result (Shiber et al., 2018), but our structural approach using the assembly-onset identified the E1 subunit to form in the simultaneous mode with the catalytic B subunit. Although these two subunits can undergo major structural rearrangements in the complex, the same B subunits stay in contact with the same E1 subunits across all the observed conformational states (Vasanthakumar et al., 2022). In fact, such large post-translational conformational rearrangements may be common in cotranslationally forming interfaces, given that proteins with larger interfaces will have an inherent tendency to be more flexible (Marsh and Teichmann, 2014).

Evolutionarily more ancient subunits of complexes are more likely to undergo cotranslational assembly

Protein complexes are under evolutionary selection to minimise misassembly (Marsh et al., 2013; Leonard and Ahnert, 2019), meaning that over evolutionary timescales, ordered subunit assembly has been prioritised in cells (Wells et al., 2016). These findings have led to the formulation of the interface size hypothesis, which posits that the assembly pathway of a protein complex parallels with its evolutionary history. A simple proxy for predicting the steps of such pathways is interface size, which demonstrates exceptional correspondence with in vitro complex assembly–disassembly analyses (Levy et al., 2008; Marsh et al., 2013).

Naturally related to the interface size hypothesis is another trend that reflects the time it takes for a newly emerged interface to become strengthened by evolution, either beacause of a functional association between the proteins, or via entrenchment in a neutral ratchet (Kim et al., 2006; Dayhoff et al., 2010; Hochberg et al., 2020). To address this, we grouped heteromeric protein complex subunits based on the phylogenetic class in which a protein’s encoding gene first appeared, that is protein age (Liebeskind et al., 2016). In addition to the human proteins, we also employed a data set of yeast complexes. Due to the smaller size of the yeast data set, we supplemented it with heteromeric models computed for yeast core complexes, which were determined using residue coevolution inferred from paired multiple sequence alignments, followed by deep learning-based structure prediction of subunit pairs with experimental evidence to interact (Humphreys et al., 2021). In Figure 4A, we show that the average interface size of the groups, for both human and yeast, is ordered almost perfectly according to evolutionary age of both homo- and heteromers. When we ask what percentage of each protein age group was found to assemble cotranslationally by Bertolini et al., 2021, there is a remarkable agreement between older proteins with larger interfaces having a higher frequency of cotranslational assembly (Figure 4B). Older subunits are also more likely to be be found in more than one complex than younger subunits (Saeed and Deane, 2006; Drew et al., 2021); therefore cotranslationally assembling proteins are expected to be enriched in moonlighters. Indeed, it has recently been shown among components of the nuclear pore complex that some subunits preferentially assemble cotranslationally with one partner, but not with a different partner from another complex (Seidel et al., 2022).

Evolutionarily more ancient subunits of complexes are more likely to undergo cotranslational assembly.

(A) Average (mean ± standard error of the mean [SEM]) interface sizes of yeast and human homo- and heteromeric subunits grouped by the evolutionary age of the protein. Age group labels coloured in red are defined as ‘more recent’ proteins, while those in black represent ‘ancient’ proteins. Numbers under labels represent the number of distinct proteins in the given age group. Homomeric interface sizes are a pool of experimentally determined structures and SWISS-MODEL homology models. (B) The frequency (%) of cotranslational assembly, as detected by Bertolini et al., 2021, in the different protein age groups, split into homo- and heteromers. Heteromer annotations were supplemented with those contained in hu.MAP2.0 (Drew et al., 2021). Fractions under labels denote the number of cotranslationally assembling proteins out of the total in the given age group. Error bars represent 68% Jeffrey’s binomial confidence intervals.

N-terminal interfaces tend to be larger than C-terminal interfaces supporting evolutionary selection for cotranslational assembly

There are two possible explanations for the observation that cotranslationally forming interfaces tend to be larger. First, larger interfaces may be inherently more likely to form cotranslationally because their assembly is more energetically favourable. In this scenario, cotranslational assembly has not been evolutionarily selected for; instead, the larger interfaces are simply more likely to be formed while the protein is still in the process of being translated, without providing any functional benefit. Alternatively, cotranslational assembly may have been selected for, for example, because it increases the efficiency of assembly and avoids potentially damaging non-specific interactions. Here, large interfaces have evolved to increase the level of functionally beneficial cotranslational assembly.

One way to distinguish between these two scenarios is to compare the sizes of N- and C-terminal interfaces. Regardless of whether cotranslational assembly occurs simultaneously (Figure 1B) or sequentially (Figure 5A), due to vectorial synthesis on the ribosome, N-terminal regions of proteins are more likely to be involved in binding events during translation. Therefore, if cotranslational assembly is adaptive, we would expect that N-terminal interfaces in multi-interface heteromeric subunits, which will be translated first, should show a significant tendency to be larger than C-terminal interfaces, as illustrated in Figure 5B.

Figure 5 with 1 supplement see all
N-terminal interfaces tend to be larger than C-terminal interfaces supporting evolutionary selection for cotranslational assembly.

(A) Schematic representation of sequential cotranslational assembly in homomers. (B) Diagrammatic representation of the hypothesis test of the adaptive model of cotranslational assembly. (C) Area differences between the first and last translated interfaces in multi-interface heteromeric subunits across the species. Panels are ordered by the area cutoffs, 400, 1000, and 2000 Å2, which are satisfied if either the first or the last interface is larger than the given cutoff. Error bars represent standard error of the mean (SEM) and labels on bars show the number of proteins in each group. The p values were calculated with Wilcoxon signed-rank tests. (D) Bootstrap distributions of the area difference between the first and the last translated interfaces within two different categories. The first category (top two boxplots) is protein age, whereby yeast and human multi-interface heteromeric subunits are classified into ‘ancient’ and ‘more recent’ groups. In the second category (bottom boxplot), bacterial subunits are grouped based on whether or not they are encoded in operons. Positive values represent a larger first translated interface. The p values were calculated from 104 bootstrap resamples with correction for finite testing. (E) Example of an operon-encoded complex, the RecBCD nuclease (pdb: 5ld2). In the linear sequence of RecD, the interface with RecC is translated first, and that with RecB is last. The RecD:RecC interface is twice the area of the RecD:RecB interface, likely to promote cotranslational subunit recruitment. (F) Correlation between the relative distance of translational start points and the relative area difference of the first and last translated interfaces. Shaded lines represent the 95% confidence interval of the regression line. The Pearson’s correlation coefficient r, its p value, and the regression coefficient β1 are shown in the panels.

To address the question, we selected experimentally determined heteromeric complex structures from three model proteomes: Escherichia coli, Saccharomyces cerevisiae (yeast), and Homo sapiens (human). For each heteromeric subunit, we defined the first interface as the one that exposes the most N-terminal interface residue in the linear protein sequence, and, to treat the termini symmetrically, the last interface was defined as the one that exposes the last interface residue, that is the first interface residue from the C-terminal direction.

When we compare the areas of the first and last translated interfaces in heteromeric subunits across species, we find the first interface to be larger (Figure 5C; full distribution in Figure 5—figure supplement 1A). The strongest effect is measured in E. coli, where the first interface is larger than the last interface in 60% of cases, and on average is 205 Å2 (15%) larger (p = 5.6 × 10−3, Wilcoxon signed-rank test). In humans, the trend is weaker, with the first interface larger in 52% of cases, being 147 Å2 (13%) larger on average (p = 0.022). In yeast, although the first interfaces are 130 Å2 (9%) larger on average, the difference is not significant. Averaging over the three species, we observe first interface to be larger than the last interface in 54% of multi-interface subunits.

The above analysis includes all multi-interface heteromeric subunits. However, some of these contain only very small interfaces, and would therefore be unexpected to undergo cotranslational assembly via either interface, thus reducing the observed tendency for N-terminal interfaces to be larger. Indeed, when we filter the subunits to exclude those with only small interfaces, including only those where the size of at least one of the two interfaces is larger than 1000 or 2000 Å2, we find that the trend gets much larger across all three species (Figure 5C). When using the 2000 Å2 filter, we observe that N-terminal interfaces are on average 23%, 31%, and 19% larger for E. coli, human, and yeast, respectively, significant for all. Averaging over the three species, we observe first interface to be larger than the last interface in 64% of cases. Thus, when considering multi-interface subunits of heteromeric complexes where at least one interface is >2000 Å2, the N-terminal interface is larger than the C-terminal interface nearly two thirds of the time.

Building on our earlier observation that cotranslationally assembling proteins tend to be older in evolutionary terms (Figure 4B), we asked if a protein’s age could also influence the size of the N-terminal interface, assuming that over time selection would work to minimise misassembly by favouring larger first translated interfaces and ensuring cotranslational assembly between the correct subunit pair. We split the multi-interface subunits from human and yeast into ‘ancient’ and ‘more recent’ categories (see Materials and methods). We then generated bootstrap distributions of the area difference between the first and the last translated interfaces in yeast and human multi-interface heteromeric subunits within the respective age categories. The analysis revealed that the difference between the areas of the first and last translated interfaces is significantly larger in ancient proteins in both yeast and human (Figure 5D; mean difference of 163 Å2, p = 0.015 and 192 Å2, p = 0.026, respectively), but not in younger heteromers. We hypothesise that this is because there has not been enough time for selection to considerably act on the assembly of these subunits, or because newer subunits tend to be required less for cotranslational interactions due their later assembly or functional roles they play. Thus, these analyses support that large, typically ancient, interfaces have evolved to promote cotranslational assembly.

Our attention was next drawn to E. coli, which demonstrated the strongest first versus last interface size trend. In prokaryotes, many heteromeric complexes are encoded by operons, where the different subunits are translated off of the same polycistronic mRNA molecules. Early studies in bacteria indicated that operon gene order is correlated to physical interactions between the encoded proteins (Mushegian and Koonin, 1996; Dandekar et al., 1998). Further investigation laid down theoretical, mechanistic, and evolutionary evidence in support of this (Sneppen et al., 2010; Shieh et al., 2015; Wells et al., 2016). Cotranslational assembly is likely to be particularly common in operon-encoded heteromers, given that the translation of different subunits is inherently colocalised. We hypothesised that the tendency for N-terminal interfaces to be larger should be stronger in operon-encoded E. coli heteromers, compared to those that are not operon encoded.

We illustrate the example of the RecBCD nuclease in Figure 5E. Genes of the subunits are located in adjacent loci encoding transcriptional units for RecC and RecB/D. One study reported that purification of RecD is complicated by the formation of inclusion bodies, while the other two subunits remain in the soluble fraction (Marsh and Teichmann, 2014). Moreover, a genetic analysis suggested that partially folded RecC and RecD might interact during translation, or that RecC forms a complex with RecB first, onto which RecD is then assembled (Amundsen et al., 2002). The regulatory subunit RecD has two interfaces well separated in the sequence, where the interface with RecC is translated first. One might imagine that the nascent chain of RecD forms a complex with mature subunit of RecC, having double the interface area to accommodate RecC than that for RecB. In this scenario, the assembly efficiency is not only maximised by gene order reducing stochasticity, but also by cotranslational assembly minimising the need for post-translational association.

To further test whether large interfaces could have been selected to promote cotranslational assembly, we acquired annotations derived from RNA sequencing data sets (Chetal and Janga, 2015) to group heteromers from E. coli according to whether or not they are encoded by operons. We again generated bootstrap distributions of the area difference between the first and the last translated interfaces to visualise and derive a probability (Figure 5D). In agreement with the above idea, we found that the size difference between the two interfaces is significantly larger in operon-encoded multi-interface heteromeric subunits, favouring the first interface (mean area difference of 317 Å2, p = 0.011).

We speculate that the first versus last interface trend may be the hallmark of sequential cotranslational assembly (Figure 5A), rather than that of the simultaneous mode (Figure 1B). Bertolini et al., 2021 have suggested that simultaneous assembly is predominantly employed for the formation of homomeric protein complexes, which could mean sequential assembly is the more common cotranslational assembly mode in heteromers. The strong trend in E. coli also supports this idea, because polycistronic gene structure is more compatible with sequential cotranslational assembly (Shieh et al., 2015). In eukaryotes, large complexes and subunits of lowly abundant complexes may require an additional biological process to ensure their transcripts are colocalised for simultaneous assembly and to facilitate further assembly steps (Keene, 2007; Chen and Mayr, 2022). Sequential assembly, on the other hand, may have evolved to exploit large interface areas for the recruitment of partner subunits. This can be conceptualised as the ‘bait and prey strategy’ of cotranslational assembly, in which a large nascent interface represents the ‘bait’ bound by a fully folded ‘prey’ subunit. Although a proteome-scale data set of cotranslationally assembling proteins is not available for yeast, we have identified case studies of five multi-interface heteromeric subunits in yeast that use the sequential assembly mode, and found that all five subunits follow the first versus last interface size trend (Table 1). To substantiate the model further, we removed from the human data set those proteins that were identified by Bertolini et al., 2021, that is those that simultaneously assemble. Strikingly, removal of these proteins increases the size difference between the first and the last translated interfaces from 13% to 18% (Figure 5—figure supplement 1B; p = 0.014, Wilcoxon signed-rank test). One explanation is that the remaining heteromers are enriched in sequential assembly, and thus exhibit a greater difference.

Table 1
Table of yeast multi-interface heteromeric subunits, which have been shown to utilise the sequential mode of cotranslational assembly.

The yeast Set1, part of the COMPASS complex, binds multiple partners during its translation process (Halbach et al., 2009), but the order of these assembly steps is uncertain. In a partial structure of the histone methyltransferase complex (6b × 3), Set1 is found to have two biologically significant interfaces, the first with Swd3 and the last with Swd1. For the rest of the cases, the mature partners are known and the available structural data support a model where the first translated interface is larger relative to the last translated interface, likely to promote cotranslational subunit recruitment.

StudyNascent chainMature partnerFirst (Å2)Last (Å2)Source
Halbach et al., 2009Set1uncertain1,4671,0256b × 3
Shiber et al., 2018; Fischer et al., 2020Fas2Fas14,2264846ql9
Shiber et al., 2018Pfka1Pfk27,7965,7493o8o
Shiber et al., 2018Gcn3Gcd21,210687Humphreys et al., 2021
Panasenko et al., 2019Rpt2Rpt14,2264846fvt

A property that would be consistent with the above model is interface separation. The later the translation of the last interface starts relative to the first, the higher the chance that assembly of the first interface will be undisturbed, free of competition with the partner subunit of the last interface. Therefore, we hypothesised that the distance between translation start points of the first and last interfaces, which are the earliest emerged interface residues of each, should correlate with the size difference in favour of the first interface. Because of large variances in protein length and interface size, we normalised the translational distance between the first and last interfaces as the percentage of the protein’s sequence length, and scaled the area difference by the sum of both interfaces. Figure 5F shows the correlation between the separation of translation start points and the area differences of the first and the last interfaces (absolute values shown in Figure 5—figure supplement 1C). As expected, increasing the distance between the start points monotonically increases the extent of the area difference across all species. To rule out that the interface separation metric is confounded by sequence length, we split the structures into less and more than 400 amino acids, which is the pan-species mean of sequence lengths. In both subsets, there is a pronounced preference in all species for a larger first interface when the separation is high (Figure 5—figure supplement 1D). The causal direction of this effect, whether it reflects that cotranslational assembly happens more often in high degrees of interface separation, or that separation is driven by selection for cotranslational assembly, remains to be addressed.

Discussion

It has long been understood that interface area is important for assembly, but the capacity in which it shapes the hierarchy of individual interfaces on subunits remained elusive. In this study, we first combined information from ribosome profiling with structural data on protein complexes to probe the importance of interface area in the process of cotranslational assembly. Our results demonstrate a strong correspondence between interface size and cotranslational subunit binding. Inspired by this, we set out to test an important question about the biological significance of cotranslational assembly: do large interfaces give rise to cotranslational assembly because of simple energetic reasons or do they reflect an evolutionary adaptation for a functional benefit? We found a clear trend across three evolutionarily distant species for the first translated interface of heteromeric subunits to be larger, suggesting that large interfaces have evolved to promote cotranslational assembly.

While our results support the adaptive hypothesis of interface size, it is not entirely inconceivable that cotranslational assembly represents a ratchet-like mechanism of constructive neutral evolution (Gray et al., 2010; Hochberg et al., 2020), whereby a drift in interface properties creates ideal conditions for assembly on the ribosome. Reversion to the post-translational route is prevented by the accumulation of mutations that are neutral in the subunit entrenched in cotranslational assembly, but would otherwise be deleterious in the ancestor. A similar neutral process may sustain the differences in interface area presented in this study.

In light of current knowledge of bacterial operon structure and translation regulation, it is not surprising that we observed the strongest trend among E. coli heteromers with respect to the size of the interface that first emerges from the ribosome. Supposedly, the effect is attributable to the widespread sequential assembly between mature subunits and nascent chains, reflecting the mechanism of effective subunit recruitment by large N-terminal ‘bait’ interfaces. An interesting question for laboratory experiments is whether operon-encoded heteromers can assemble in the simultaneous mode, providing that the structural organisation of the bacterial polysome allows for such a precise coordination (Brandt et al., 2009).

How does a large interface area help translating ribosomes find one another? Its benefit in homomers for facilitating cotranslational assembly is clear, because the subunits are localised to the same mRNA, and large interfaces are more likely to form interactions before translation is complete. In heteromers, one hypothesis argues the involvement of RNA-binding proteins that orchestrate transcript colocalisation (Keene, 2007; Chen and Mayr, 2022), from where similar rules may apply to heteromer assembly as for homomers. A more parsimonious hypothesis is formed on the observation that cotranslational assembly can result in transcript colocalisation, which is ablated when subunit affinity is decreased (Heidenreich et al., 2020). This may suggest that affinity, which correlates strongly though imperfectly with interface size (Brooijmans et al., 2002; Vangone and Bonvin, 2015), can play a role in the colocalisation of transcripts belonging to the same complex.

Many more topics of inquiry remain open for future studies. Analogous to protein folding, cotranslational assembly can be thought of as a hydrophobic collapse that shapes the quaternary structure of the complex. As with folding in the cell, the involvement of other factors must not be overlooked. A wide array of ribosome-associated chaperones are vital for nascent chain homeostasis (Oh et al., 2011; Döring et al., 2017; Koldewey et al., 2017; Shiber et al., 2018) and the degree to which they choreograph assembly steps is yet to be elucidated.

Attention should be paid to the far-reaching genetic consequences of cotranslational assembly (Natan et al., 2017). How much does transcript-specific assembly buffer the dominant-negative effect, and what does it mean in the context of human genetic disease (Bergendahl et al., 2019)? This effect requires mutant subunits to be stable enough to assemble into complexes, and thus the impact of the mutations tends to be milder at the structural level than of other pathogenic mutations (McEntagart et al., 2016), making them exceptionally difficult to detect using the existing variant effect predictors (Gerasimavicius et al., 2021). Interestingly, cotranslational assembly should actually make the dominant-negative effect less common in homomers, because it can limit the mixing that occurs between wild-type and mutant subunits. It remains to be seen whether dominant-negative mutations are in fact less common in cotranslationally assembling complexes.

Finally, our results build on evidence from the past decade and emphasise the importance of protein complex assembly at the translatome level. Although evolutionary selection against N-terminal interface contacts to avoid premature assembly was previously found in homomers (Natan et al., 2018), here we report an opposite phenomenon in which proteins that do cotranslationally assemble sustain large N-terminal interfaces in order to promote cotranslational subunit recruitment. We expect that our observation will be supported by experimental data once the proteome-wide detection of sequentially assembling heteromeric subunits is made possible.

Materials and methods

Protein structural data sets

Request a detailed protocol

Starting from the entire set of structures in the Protein Data Bank (Berman et al., 2000) on 2021-02-18, we searched for all polypeptide chains longer than 50 residues with greater than 90% sequence identity to H. sapiens, S. cerevisiae, and E. coli canonical protein sequences. When proteins mapped to multiple chains, we selected a single chain sorting by sequence identity, then by the number of unique subunits in the complex, and then by the number of atoms present in the chain. Only biological assemblies (pdb1) were used and symmetry assignments were taken directly from the PDB. Polypeptides formed by cleavage were excluded. In the generation of the multi-interface heteromeric subunit data sets, to exclude proteins with yet uncharacterised interfaces, chains with an at least 70% complete structure were considered and only included if they formed interface pairs >800 Å2 with at least two different subunits. To supplement the smaller yeast data set, computed structures of yeast core complexes were downloaded from the ModelArchive link provided by Humphreys et al., 2021. For downstream analysis, mmCIF files were converted into standard Brookhaven PDB format and the chains were mapped to genes using the table provided on ModelArchive. Homology models of yeast and human homomeric complexes were obtained from the SWISS-MODEL repository (version 2022_02) (Bienert et al., 2017; Waterhouse et al., 2018). When a protein’s UniProt accession number mapped to multiple homology models, we selected a single model ranking by the number of subunits in the complex, followed by the length of the modelled chain. Symmetry groups of the homology models were assigned with the software AnAnaS (Pagès and Grudinin, 2018; Pagès et al., 2018). In the analysis of N- versus C-terminal interface sizes, we excluded very large heteromeric complexes, defined as those containing ≥10 subunits. This is because of the previous evidence that predicting assembly order based on interface size in very large complexes is not as accurate (Marsh et al., 2013; Ahnert et al., 2015), likely because of the many intersubunit interfaces these complexes possess.

Calculation of interface area-related properties

Request a detailed protocol

Interface areas of SWISS-MODEL homology models were calculated with FreeSASA (Mitternacht, 2016) using the default surface probe radius of 1.4 Å. Residue-level pairwise interfaces in complexes derived from the PDB and from Humphreys et al., 2021 were calculated between all pairs of subunits using AREAIMOL from the CCP4 suite (Winn et al., 2011) with a probe radius of 1.4 Å. The interface was defined as the difference between the solvent accessible surface area of each subunit in isolation and within the context of the full complex. Apolar interface area was calculated from the residue-level data by classifying A, F, G, I, L, V, M, P, and Y amino acids as apolar (Vangone and Bonvin, 2015). An area cutoff of >400 Å2 was used for homomeric subunits and individual interfaces of multi-interface heteromeric subunits derived from the PDB to exclude potential crystallographic interfaces and to restrict the analyses to biologically significant interfaces. Assembly order was computed by predicting the assembly pathway assuming additivity of pairwise interfaces in each complex (Marsh et al., 2013), and implemented with the assembly-prediction Perl package (Wells et al., 2016).

The relative interface location was calculated according to the formula:

Relativeinterfacelocation=(i1)/(L1)

where i marks the residue at which half of the cumulative buried surface area of the interface is passed (i.e. interface midpoint), and L is the sequence length.

The normalised distance between translational start points of two interfaces was calculated as:

Relativetranslationaldistance=(flastffirst)/L

where f marks the first residue of the given interface and L is the sequence length.

Area differences between the first and the last interfaces were normalised according to the equation:

Relativesizedifference=(BSAfirstBSAlast)/(BSAfirst+BSAlast)

where BSA is the buried surface area of the corresponding interface.

Calculation of interfacial contact-related properties

Request a detailed protocol

To make sure all software runs without errors, we converted the human structure files of homo- and heterodimers obtained by our pipeline from the Protein Data Bank to standard Brookhaven PDB format by taking the first atom locations in case of multiple AltLoc entries, converting non-canonical amino acids into equivalent standard names, renaming chain pairs with identical chain identifiers, and stripping files to only contain ATOM, TER, and END lines, thus excluding heteroatoms. Interfacial residue contacts were determined with the software PRODIGY (Vangone and Bonvin, 2015; Xue et al., 2016), using the default settings. The software RING 3.0 (Clementel et al., 2022) was used to compute the different types of residue interactions between the subunits. The network policy was set to ‘closest’ and all interactions were returned for a contact using the flag --all_edges.

Determining protein isoelectric point from structure

Request a detailed protocol

Predicted structures of the human monomeric proteome were acquired from the AlphaFold database (Tunyasuvunakool et al., 2021) in PDB format, which were converted into PQR files with PDB2PQR (Dolinsky et al., 2004) using the PARSE force field. The PQR files were piped into the software BLUUES (Walsh et al., 2012), and the.ddg output was kept, containing the pH ~ charge titration data. Then, each protein’s isoelectric point (pI) was calculated by interpolating from the pH ~ charge curve for charge = 0, using the approx() function in R. For proteins that are longer than 2700 amino acids and are contained in fragments in the AlphaFold database, we took the mean pI across the fragments.

Protein localisation

Request a detailed protocol

We obtained annotations for plasma membrane, cytoplasmic, and nuclear localisations directly from the UniProt FTP site (Bateman, 2021). Canonical UniProt entries with the gene ontology terms plasma membrane (GO:0005886), cytoplasm (GO:0005737), and nucleus (GO:0005634) were considered.

HEK293 active ribosome count

Request a detailed protocol

Normalised ribosome protected fragments of actively translating ribosomes specific to Human Embryonal Kidney 293 lineage were determined by Clamer et al., 2018, the data are available at the NCBI Gene Expression Omnibus (Edgar et al., 2002) under the accession GSE112353. We used averages from two biological replicates.

Protein age

Request a detailed protocol

The ages of proteins were obtained from the work of Liebeskind et al., 2016, at the link https://github.com/marcottelab/Gene-Ages/tree/master/Main. The main_HUMAN and the main_YEAST comma separated value files were parsed and the modeage column was used in our analyses. We combined the mode age into two categories, where ‘ancient’ proteins are those whose genes are common to all cellular life (‘Cellular_organisms’), whose genes were transferred horizontally from bacteria after eukaryotes diverged from archaea (‘Euk +Bac’), and whose genes emerged in the clades of eukaryotes and archaea (‘Eukaryota’ and ‘Euk_Archaea’). The other four age groups, ranging from genes emerged in the classes Opisthokonta to Mammalia in the human proteome, and from Ascomycota to Saccharomyceta in the yeast proteome, were classified as ‘more recent’.

hu.MAP2.0 heteromers

Request a detailed protocol

Drew et al., 2021 integrated large-scale affinity purification mass spectrometry data sets, large-scale biochemical fractionation data, proximity labelling, and RNA hairpin pulldown data to generate a complex map with >7000 complexes, which is freely available at http://humap2.proteincomplexes.org/. We used hu.MAP2.0 to supplement genes of heteromers for the analysis in Figure 4B.

Mapping simultaneously forming interfaces in multi-interface heteromeric subunits

Request a detailed protocol

Cotranslational assembly-onset positions were acquired from the supplemental material of Bertolini et al., 2021. From the onset positions, 30 residues were subtracted to account for the length of the ribosome tunnel. Our method maps the assembly-onset position to the closest interface midpoint in the linear sequence. Cases where the assembly-onset mapped to a homomeric interface were discarded under the assumption that the homomeric interface is hierarchically higher and undergoes cis-assembly. In subsequent analyses presented in Figure 3B, C, we only included comparisons between heteromeric interfaces.

Mapping bacterial subunits to operons

Request a detailed protocol

Operon annotations were downloaded from OperomeDB (Chetal and Janga, 2015). Genes were mapped to UniProt identifiers using E. coli proteome-specific mapping from the UniProt FTP site (Bateman, 2021).

Molecular graphics

Request a detailed protocol

Visualisation of structures was performed with UCSF ChimeraX version 1.1 (Pettersen et al., 2021).

Statistical analysis

Request a detailed protocol

Data exploration and statistical analyses were carried out in RStudio (Rstudio, 2022) version 2022.02.0+443 ‘Prairie Trillium’ Release, using R version 4.2.1 (R Development Core Team, 2021). The R packages used for analyses are tidyverse, tidytable, rsample, rstatix, scales, and ggbeeswarm. The Wilcoxon rank-sum or signed-rank tests were used for A/B testing of interface size distributions, because although they appear log-normal, they are also left bounded because of the minimum interface size cutoff, thus a non-parametric test was required. Wilcoxon signed-rank tests were one tailed, and their main assumption that the data are symmetric around the median was supported by boxplot distributions. In Dunn’s test of multiple comparisons the Holm–Bonferroni method (Holm, 1979) was used to correct for family wise error rate. The effect sizes were defined as the Z-score computed from the p value over the square root of sample size (Tomczak and Tomczak, 2014). In the bootstrap analyses, data were stratified for protein age or operonal localisation in 104 resamples. The p value was calculated by determining the fraction of point estimates (difference in area) greater than 0, with correction for finite sampling (Buckland et al., 1998). In the regression analysis, conditions for statistical inference, including linearity of the relationship between variables, the independence and normality of the residuals, and homoscedasticity were met; validations can be found in the analysis scriptata and code availabilit.

Data availability

Data and code to reproduce the results have been deposited on the OSF at https://osf.io/x5b2n/.

The following data sets were generated
    1. Badonyi M
    (2022) Open Science Framework
    ID x5b2n. Large protein complex interfaces have evolved to promote cotranslational assembly.
The following previously published data sets were used
    1. Humphreys IR
    (2021) ModelArchive
    Computed structures of core eukaryotic protein complexes.
    https://doi.org/10.5452/ma-bak-cepc
    1. Jumper JE
    (2021) AlphaFold Database
    ID Homo sapiens. Homo sapiens AlphaFold structures.
    1. Drew K
    2. Wallingford JB
    3. Marcotte EM
    (2021) hu.MAP
    ID 2.0. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies'.
    1. Clamer M
    2. Bernabò P
    3. Lauria F
    4. Tebaldi T
    5. Viero G
    (2018) NCBI Gene Expression Omnibus
    ID GSE112353. Active ribosome profiling with RiboLace and standard ribosome profiling in HEK-293 cells.
    1. Liebeskind BJ
    2. McWhite CD
    3. Marcotte EM
    (2016) GitHub
    ID marcottelab/Gene-Ages. Towards consensus gene ages.

References

    1. Holm S
    (1979)
    A simple sequentially rejective multiple test procedure
    Scandinavian Journal of Statistics 6:65–70.
  1. Book
    1. Hubbard ST
    (1993)
    NACCESS
    London: Department of Biochemistry and Molecular Biology, University College.
  2. Software
    1. R Development Core Team
    (2021) R: A language and environment for statistical computing
    R Foundation for Statistical Computing, Vienna, Austria.
  3. Software
    1. Rstudio T
    (2022)
    Rstudio Team
    PBC, Boston, MA.
    1. Tomczak M
    2. Tomczak E
    (2014)
    The need to report effect size estimates revisited: an overview of some recommended measures of effect size
    TRENDS in Sport Sciences 1:19–25.

Decision letter

  1. Marina V Rodnina
    Reviewing Editor; Max Planck Institute for Multidisciplinary Sciences, Germany
  2. Volker Dötsch
    Senior Editor; Goethe University, Germany

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Large protein complex interfaces have evolved to promote cotranslational assembly" for consideration by eLife. Your article has been reviewed by 4 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Volker Dötsch as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1. The authors' main claim is that N-terminal interfaces are significantly larger than other interfaces and facilitate cotranslational assembly interactions. The authors show interface size differences of high confidence cotranslational assembly complexes compared to low confidence cotranslational assembly complexes compared to all others in Figure S1C, divided into symmetry groups. However, no statistical significance is provided. This is essential, as this provides the basis for the manuscript's main claim.

2. The manuscript states that the authors compared the areas of first and last translated interfaces in bacteria and yeast. However, no experimental nascent-proteome data is presented for bacteria and yeast. There have been several recent publications exploring co-translational complex assembly in both yeast and bacteria. The authors claim that large protein complex interfaces have evolved to promote cotranslational assembly could be greatly supported by utilizing some of these examples for verification.

3. Apart from the size of the interface and its location (N-term vs C-term), an important point to consider is protein-protein interaction potential (including hydrogen bonds, electrostatic and hydrophobic interactions, number of contacts, etc). Would these parameters also differ for cotranslationally assembling complexes vs posttranslationally assembling complexes and to what extent?

4. The manuscript states "However, since larger interfaces tend to cotranslationally assemble more frequently, we filtered the subunits to include cases where the size of at least one of the two interfaces is larger than 1000 or 2000 Å2". This is further demonstrated in Figure 3.

The reasoning for selecting these criteria is not clear. The authors should show the distribution of each group (all, 1000, 2000 Å2) of coco vs. non-coco, so one can see how likely is this assumption.

5. The authors show two examples of simultaneous cotranslational assembly between subunit pairs in heteromeric complexes: the subunits E and B1 of the V-type ATPase (pdb: 6wm2), and importin-9 with histone H2A (pdb: 6n1z) (Figure 2D). Structural data indicate that the V-type ATPase is highly dynamic with several conformations to this complex (Vasanthakumar, et al. (2022) Nat Struct Mol Biol), suggesting that the complex undergoes significant post-translational conformational rotations (see PDB: 7TMQ vs. 7TMM for example). The new rotational variants seem to challenge the prediction and calculation presented here. The authors should address these findings.

6. The effect size for how common it is for the N-terminus interface to be larger than the C-terminal interface should be computed in the following manner: Calculate the probability that for proteins that have two interfaces, the first interface will be larger than the second (relevant to Figure 3 and S2). This number should be reported in the abstract (as you highlight this find in the second to last sentence of the abstract), as well as the results and discussion. The reason for this is to help the community avoid over-generalizing the findings.

Reviewer #1 (Recommendations for the authors):

1. Statistical significance should be indicated in Figure S1C, as this is an essential comparison to the experimental data.

2. The approach presented in Figure 2A for mapping the interactions is not well described and Figure legend is missing details as to (i) why the assembly onset is shown at the time indicated (and the description of what the red element is and where this is taken from) and (ii) what are the blue, gray, and yellow areas. It would be also important to validate this method on a well-defined dataset.

Reviewer #2 (Recommendations for the authors):

The manuscript provides a clear analysis of current structural and experimental data, characterizing novel interface features facilitating cotranslational assembly as well as demonstrating their conservation.

The manuscript is an important contribution to the field of protein biogenesis, however, a few key points remain unanswered.

– The authors' main claim is that N` terminal interfaces are significantly larger than other interfaces and facilitate cotranslational assembly interactions. The authors show interface size differences of high confidence cotranslational assembly complexes compared to low confidence cotranslational assembly complexes compared to all others in Figure S1C, divided into symmetry groups.

However, no statistical significance is provided for any. This is required, as this provides the basis for the manuscript's main claim.

– In the manuscript it is suggested that "According to this theory, the largest interfaces in a complex correspond to the earliest forming subcomplexes within the assembly pathway, irrespective of the binding mode." This is shown for two examples, in Figure 2D: simultaneous cotranslational assembly between subunit pairs in heteromeric complexes: the subunits E and B1 of the V-type ATPase (pdb: 6wm2), and importin-9 with histone H2A (pdb: 6n1z).

However, for the V-type ATPase, there is new structural data from Vasanthakumar, et al. (2022) Nat Struct Mol Biol. Vasanthakumar and colleagues have shown the complex is highly dynamic, assigning several conformations to this complex, whether attached or not attached to the membrane as well as rotational variants. In one conformation the interface of subunits E is with A and in another with B. The results suggest the complex undergoes significant post-translational conformational rotations (see PDB: 7TMQ vs. 7TMM for example). The new rotational variants seem to challenge the prediction and calculation presented here. The authors should address these findings.

– The manuscript states that "We compared the areas of first and last translated interfaces in bacterial, yeast…". However, no experimental nascent-proteome data is presented for bacteria and yeast.

There have been several recent publications exploring co-translational complex assembly in both yeast and bacteria. For example, Kassem, et al., 2017 explored co-translational assembly of the SAGA complex in yeast; Lautier et al., Mol cell 2021 and Seidel et al., 2022 Nature Com; which combined structural with ribosome profiling and RIP-seq data to explore the co-translational assembly of the nuclear pore complex in yeast. The authors themselves cite several studies exploring co-translational assembly in yeast, including Duncan and Mata 2011; Shiber et al., 2018; Panasenko et al. 2019.

In bacteria, for example, Fujiwara et al., 2020, Cell reports, performed a nascent-proteome-wide study of co-translational interactions in Bacillus subtilis.

The authors' main claim is that large protein complex interfaces have evolved to promote cotranslational assembly, can be greatly supported by utilizing some of these examples for verification.

– "The manuscript states "However, since larger interfaces tend to cotranslationally assemble more frequently, we filtered the subunits to include cases where the size of at least one of the two interfaces is larger than 1000 or 2000 Å2". This is further demonstrated in Figure 3.

The reasoning for selecting these criteria is not clear. The authors should show the distribution of each group (all, 1000, 2000 Å2) of coco vs. non-coco, so one can see how likely is this assumption.

Reviewer #3 (Recommendations for the authors):

1. The authors found that cotranslationally assembling complexes have (on average) larger interfaces than those assumed to assemble after their complete synthesis. The authors considered the effect size of cotranslationally assembling complexes relative to posttranslationally assembling complexes with different types of symmetry and the nature of assembling subunits (e.g. home- and heterodimers, etc). Figure 1C however also shows that the interface size is generally larger in homomeric symmetry groups in comparison with e.g. heterodimers irrespective of the assembly mechanism. Could the authors comment on these differences and discuss them too?

2. Apart from the size of the interface and its location (N-term vs C-term), an important point to consider is protein-protein interaction potential (including hydrogen bonds, electrostatic and hydrophobic interactions, number of contacts, etc). Would these parameters also differ for cotranslationally assembling complexes vs posttranslationally assembling complexes and to what extent?

3. Ribosome surface properties and charge were suggested to impose certain limits on the nature of the cotranslationally forming surfaces. Specifically, electrostatic effects of the ribosomal surface on nascent polypeptide dynamics have been considered of substantial significance. In this regard, it would be important to answer the question of whether there is any charge bias in the cotranslationally assembling surface complexes vs posttranslationally assembling surface complexes that might help cotranslationally assembling complexes avoid unproductive interactions with the ribosome.

Reviewer #4 (Recommendations for the authors):

1. The authors need to calculate the p-values for Figure S1C. If they are not significant, what are we to make of that? It would seem the data does not support the relevant conclusion.

2. The authors should show all their results are robust to using the 'high confidence' only data.

3. The effect size for how common it is for the N-terminus interface to be larger than the C-terminal interface should be computed in the following manner: Calculate the probability that for proteins that have two interfaces, the first interface will be larger than the second (relevant to Figure 3 and S2). This number should be reported in the abstract (as you highlight this find in the second to last sentence of the abstract), as well as the results and discussion. The reason this is to help the community avoid over-generalizing the findings.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Large protein complex interfaces have evolved to promote cotranslational assembly" for further consideration by eLife. Your revised article has been evaluated by Volker Dötsch (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Please provide a full answer to the remaining criticism raised by referee 2. This is an essential revision. Please also take care to clarify the concern of referee 4, as it is your chance to ensure the impact of the paper.

Reviewer #2 (Recommendations for the authors):

The manuscript "Large protein complex interfaces have evolved to promote cotranslational assembly" by Mihaly Badonyi and Joseph A Marsh combines analysis of experimental data from ribosome profiling experiments (mainly Bertolini et al., 2021) with structural data on protein complexes to test the impact of interface size on co-translational complex assembly pathways in mammalians. The authors find a strong correlation between interface size and co-translational subunit association. Expanding their structural analysis to yeast and E. coli complexes, the authors find evidence supporting the hypothesis that large interfaces have evolved to promote co-translational assembly. Thus, larger N` terminal interfaces can serve to facilitate successful co-translational assembly interactions, protecting the nascent proteome.

The revised manuscript has greatly improved and the authors have addressed most of the major concerns.

The authors now provide p-values in the dot plot-boxplot chart in Figure 1 and figure supplement 1D; showing the statistical significance of interface size differences in co-translationally assembling homomers in humans. As this is the main claim of the manuscript, it was an essential revision.

The authors also provide a new aspect of the evolutionary complex's age vs co-translational propensity, as well as explore the other interfacial contact-based properties of the co-translational assembly.

However, one major concern has not been addressed by the authors in a clear manner. In regard to figure 3, heteromers analysis of the correlation between interface size and assembly order, relying on the assumption of stable interface formation. The authors present the example of the interface between the subunits E and B1 of the V-type ATPase (pdb: 6wm2); figure 3d. Vasanthakumar, et al. (2022) cryo-EM study, has shown, that the subunits E and B1 can dissociate post-translationally, following 120º rotation, where a novel interface between the same E subunit forms with a different B subunit (B2 and then B3, corresponding to rotational states 2, 3). See figure 1c of Vasanthakumar, et al., titled "Coordinated conformational changes in the V1 complex during V-ATPase reversible dissociation". In the PDBs mentioned in the authors' reply the same chain corresponding to the B1 subunit (chain B) forms interfaces with chain K (pdb:7tmq), then forms a new, very similar interface with chain I (pdb:7tmp), and in the other rotational state with chain G (pdb:7tmo). Thus, the example of a dynamic interface here does not support the authors' claim of stable interface formation.

Reviewer #4 (Recommendations for the authors):

The authors have addressed my concern regarding figure S1c. I am slightly concerned about their response to point 1, in which they basically say we can't restrict our analysis to high-confidence co-translational dimers because otherwise there is not enough data.

However, as long as other reviewers do not have serious concerns about this latter point, then I am fine with the manuscript as it is.

https://doi.org/10.7554/eLife.79602.sa1

Author response

Essential revisions:

1. The authors' main claim is that N-terminal interfaces are significantly larger than other interfaces and facilitate cotranslational assembly interactions. The authors show interface size differences of high confidence cotranslational assembly complexes compared to low confidence cotranslational assembly complexes compared to all others in Figure S1C, divided into symmetry groups. However, no statistical significance is provided. This is essential, as this provides the basis for the manuscript's main claim.

We agree that the confidence-based classification is an important part of the work reported by Bertolini et al., 2021, in which high-confidence candidates meet two conditions: their disome enrichment profile is impacted by puromycin and proteinase K, and the protein is either cytosolic or nuclear. The first condition tests how genuine the disomes are, because puromycin causes premature chain termination, and proteinase K cleaves interconnected nascent chains C-terminal to aliphatic and aromatic amino acids. Both of these treatments should dissolve true disomes. The second condition of the sole inclusion of cytosolic and nuclear proteins is due to an experimental limitation, namely, that for proteins localised to other compartments the team could not formally exclude the possibility that the ribosomes interact with membrane components of the translocation machinery and that is why they are found in the disome fraction.

Limiting our study to high confidence candidates is prohibitive to the structural analyses, in part because high confidence proteins only comprise 20% of all cotranslationally assembling candidates, and partly because the sample size further drops after mapping to the available structures of complexes. In an attempt to address this, we extended the human homomeric structure data with homology models from the SWISS-MODEL repository, which contains structures of homomeric complexes on condition that the interface is conserved (Bienert et al., 2017; Waterhouse et al., 2018), and was also used during the annotation of homomers in the study by Bertolini et al., 2021. We have updated the dotplot-boxplot chart in Figure 1 —figure supplement 1D to also show the mean interface size, and we calculated p-values comparing high confidence versus all other proteins (excluding low confidence). The differences between high confidence and all other subunits were found significant at symmetry level for all homomers, but not for heterodimers. There are no significant differences observed between high and low confidence proteins for any of the groups. Although we might expect that high confidence proteins should, on average, have larger interfaces than low confidence proteins (assuming that a greater fraction of them represent true cases of cotranslational assembly), we note that the size of the high confidence set is relatively small, especially when split by symmetry group, and that the average interface size of the high confidence proteins is higher for all homomers, except in the very small dihedral group. Overall, we think that the small differences between high and low confidence sets, as well as the small size of the high confidence set, justify our use of the combined sets throughout this study.

We have now added a new paragraph in the manuscript discussing these findings.

2. The manuscript states that the authors compared the areas of first and last translated interfaces in bacteria and yeast. However, no experimental nascent-proteome data is presented for bacteria and yeast. There have been several recent publications exploring co-translational complex assembly in both yeast and bacteria. The authors claim that large protein complex interfaces have evolved to promote cotranslational assembly could be greatly supported by utilizing some of these examples for verification.

We thank the reviewer for their comment as this was an excellent suggestion. We were familiar with the recommended literature, but we did not consider using their data because of their generally small scale. However, in the updated manuscript we collated instances of sequential assembly events of nascent chains and mature subunits in budding yeast (Table 1), which turned out to favour our hypothesis about the first versus last interface size difference being important for sequential cotranslational assembly. In this table, we could not consider Duncan and Mata, 2011, because it was performed in fission yeast, which has diverged from budding yeast more than 500 million years ago, lacks sufficient structural data in the PDB, and the mRNA co-immunoprecipitation method is not specific enough to be considered in our analysis. Also, we could not include Kassem, Villanyi and Collart, 2017, because the Spt20:Ada2 interaction has not been experimentally determined yet, nor has it been computed by Humphreys et al., 2021.

In terms of bacteria, we are not aware of a large-scale data similar to that of Bertolini et al., 2021 on cotranslational assembly. Although, Fujiwara et al., 2020 present compelling results, their method represents a general survey of cotranslational interactions, including protein-nucleic acid binding, and is not specific enough to be considered in our analyses. The authors themselves only present one case study of the complex of SpoIIAB homodimer with the sporulation σ factor F, which does not contain a multi-interface heteromeric subunit.

The works of Lautier et al., 2021 (yeast) and Seidel et al., 2022 (human) are very exciting, but we could not identify structures of multi-interface nuclear pore complex subunits, because of a limited number of subcomplexes in the PDB. A full complex of the cytoplasmic ring of the Xenopus laevis nuclear pore has only just been created in its full detail (Fontana et al., 2022), and the composite structure is not yet available (PDB: 7tdz, hold for release).

3. Apart from the size of the interface and its location (N-term vs C-term), an important point to consider is protein-protein interaction potential (including hydrogen bonds, electrostatic and hydrophobic interactions, number of contacts, etc). Would these parameters also differ for cotranslationally assembling complexes vs posttranslationally assembling complexes and to what extent?

We are grateful to the reviewer for this comment, because at the initial stages of the project, we put a lot of work into exploring other properties that might correlate with cotranslational assembly. Initially, we omitted these results, because we did not want to distract the reader from the main point, which is interface size. However, we strongly agree that it is important to consider protein interaction potential and other interfacial properties. Therefore, in the updated manuscript we include a new section titled Interface area is more important than other interfacial contact-based properties for explaining cotranslational assembly, presenting analyses of the interface properties of C2 symmetric homodimers and heterodimers in cotranslational assembly. The reason for choosing these two groups is because they are well represented in the structural data set, and because they only have one interface, which makes it straightforward to draw conclusions from the results. We hope that the reviewers will find these results interesting, and that the community will appreciate their inclusion.

4. The manuscript states "However, since larger interfaces tend to cotranslationally assemble more frequently, we filtered the subunits to include cases where the size of at least one of the two interfaces is larger than 1000 or 2000 Å2". This is further demonstrated in Figure 3.

The reasoning for selecting these criteria is not clear. The authors should show the distribution of each group (all, 1000, 2000 Å2) of coco vs. non-coco, so one can see how likely is this assumption.

Given our observation earlier in the manuscript that interfaces involved in cotranslational assembly tend to be large, then we expect that subunits that form only small interfaces will not undergo cotranslational assembly. Inclusion of these subunits in our analysis therefore reduces the observed first vs last interface size trend. Although the interface cutoffs may seem arbitrary, the actual values may not matter, because the importance is in the fact that larger interfaces have a higher propensity to cotranslationally assemble. When only subunits with interfaces >1000 and >2000 Å2 are considered, effectively excluding subunits with little chance of undergoing cotranslational assembly, we observe the N-terminal interface size bias is much stronger. We have attempted to make this point more clearly in manuscript text.

In Figure 1 —figure supplement 1D, we show the distribution of interface sizes for coco (simultaneous cotranslational assembly) vs non-coco (all other) proteins, for homomers and heterodimers, but we note that this is not directly comparable to the sizes of our cutoffs in our N vs C interface comparison, where we are comparing the sizes of two different interfaces from the same subunit. Nevertheless, they suggest that cutoffs on the order of 1000 or 2000 Å2, as we have chosen, should serve to enrich the fraction of cotranslationally assembly subunits in our analysis. We also show the full distribution of interface sizes for first and last interfaces in Figure 5 —figure supplement 1A, with the two thresholds highlighted.

To further support the first versus last interface size trend without using arbitrarily chosen interface size cutoffs, in the updated manuscript we have also now provided analyses relating to the evolutionary age of the subunits (Liebeskind, McWhite and Marcotte, 2016). We first highlight the correlation between subunit age and interface size (Figure 4A) and subunit age and cotranslational assembly as detected by Bertolini et al., 2021 (Figure 4B), in a short new section called Evolutionarily more ancient subunits of complexes are more likely to undergo cotranslational assembly. We then use this data to classify multi-interface heteromeric subunits into “ancient” and “more recent” age groups to investigate if older proteins are more likely to have a larger first translated interface, which would provide a more direct evidence to our hypothesis. Indeed, we have found that the difference in size between the first and last interface strongly favours the first interface in both yeast and human to a significant extent (Figure 5D). Altogether, we believe these results well represent the main conclusion of our study, which is that large interfaces have evolved to promote cotranslational assembly, and make a prediction that given large-scale data on the sequential cotranslational assembly mode on heteromers, we should expect their first translated interfaces to be larger.

5. The authors show two examples of simultaneous cotranslational assembly between subunit pairs in heteromeric complexes: the subunits E and B1 of the V-type ATPase (pdb: 6wm2), and importin-9 with histone H2A (pdb: 6n1z) (Figure 2D). Structural data indicate that the V-type ATPase is highly dynamic with several conformations to this complex (Vasanthakumar, et al. (2022) Nat Struct Mol Biol), suggesting that the complex undergoes significant post-translational conformational rotations (see PDB: 7TMQ vs. 7TMM for example). The new rotational variants seem to challenge the prediction and calculation presented here. The authors should address these findings.

Both the E1 and B subunits of the V-type ATPase have been identified to cotranslationally assemble by Bertolini et al., 2021, and the E1 subunit is a high confidence protein with a coiled-coil domain, a motif found enriched in cotranslational assembly. We looked at the structures of the V1 complex (7tmm, 7tmo, 7tmp, 7tmq) and those of the three states of the full V-type ATPase (7tmr, 7tms, 7tmt), which were deposited by Vasanthakumar et al., 2022. Subunits E and B remain in contact in all conformations and states. We calculated the E:B interface midpoints in the linear sequence of subunit E in all seven structures, which resulted in residues 200 (7tmr), 206 (7tmo), and 209 (7tmm, 7tmp, 7tmq, 7tms, 7tmt). The yeast subunit E is 236 amino acids long, thus the 9-residue variation in interface midpoints represents only 3.8% of the total translated polypeptide length. These results suggest that despite the dynamic nature of the complex, using the interface midpoint is a robust and reliable way of mapping cotranslationally forming interfaces.

6. The effect size for how common it is for the N-terminus interface to be larger than the C-terminal interface should be computed in the following manner: Calculate the probability that for proteins that have two interfaces, the first interface will be larger than the second (relevant to Figure 3 and S2). This number should be reported in the abstract (as you highlight this find in the second to last sentence of the abstract), as well as the results and discussion. The reason for this is to help the community avoid over-generalizing the findings.

Although we think that simplifying this result to a single number could potentially obscure the interesting differences we observe between species and different types of complexes, we have now done this and added the following statement to the abstract: “When considering all together, we observe the N-terminal interface to be larger than the C-terminal interface 54% of the time, increasing to 64% when we exclude subunits with only small interfaces, which are unlikely to cotranslationally assemble.” We have also further expanded upon our discussion of these results with respect to different species in the text.

Reviewer #1 (Recommendations for the authors):

1. Statistical significance should be indicated in Figure S1C, as this is an essential comparison to the experimental data.

Please see Essential revisions (1).

2. The approach presented in Figure 2A for mapping the interactions is not well described and Figure legend is missing details as to (i) why the assembly onset is shown at the time indicated (and the description of what the red element is and where this is taken from) and (ii) what are the blue, gray, and yellow areas. It would be also important to validate this method on a well-defined dataset.

We specified the source data in Methods, which is the work by Bertolini et al., 2021, who developed a model based on a sigmoid fit to the monosome-to-disome enrichment profile with the aim of deriving the assembly-onsets for all candidates. We did include a paragraph in the previous Results section of the manuscript explaining how the assembly-onset is defined:

“To perform an analysis at the multi-interface level, we made use of the assembly-onset positions determined for every protein in the cotranslational assembly data set (Bertolini et al., 2021). An assembly-onset is a single residue in the protein sequence, whose codon is being decoded by the ribosome at the time of cotranslational assembly. In order to identify which interface the assembly-onsets belongs to, we mapped them to the closest interface midpoint in the linear protein sequence, as illustrated in Figure 2A. This is to avoid biases from large interfaces, which have many more interface residues and therefore a higher probability that an assembly-onset would map to them if the interface was not compressed into a single residue, the midpoint.”

We agree that it would be helpful to validate the method, but we could not identify multi-interface heteromeric subunits among the simultaneously assembling proteins discovered to date (Liu et al., 2016; Kamenova et al., 2019; Panasenko et al., 2019) whose interfaces we mapped based on the assembly-onset positions, due to limitations of the structural data. We hope that future case studies will provide more mechanistic insight into the assembly of these subunits and make the validation of our method possible.

We thank the reviewer for noticing that the figure description was lacking. We made changes to specify that individual interfaces in the area chart are represented by different colours and improved the description of cumulative interface area.

Reviewer #2 (Recommendations for the authors):

The manuscript provides a clear analysis of current structural and experimental data, characterizing novel interface features facilitating cotranslational assembly as well as demonstrating their conservation.

The manuscript is an important contribution to the field of protein biogenesis, however, a few key points remain unanswered.

– The authors' main claim is that N` terminal interfaces are significantly larger than other interfaces and facilitate cotranslational assembly interactions. The authors show interface size differences of high confidence cotranslational assembly complexes compared to low confidence cotranslational assembly complexes compared to all others in Figure S1C, divided into symmetry groups.

However, no statistical significance is provided for any. This is required, as this provides the basis for the manuscript's main claim.

Please see Essential revisions (1).

– In the manuscript it is suggested that "According to this theory, the largest interfaces in a complex correspond to the earliest forming subcomplexes within the assembly pathway, irrespective of the binding mode." This is shown for two examples, in Figure 2D: simultaneous cotranslational assembly between subunit pairs in heteromeric complexes: the subunits E and B1 of the V-type ATPase (pdb: 6wm2), and importin-9 with histone H2A (pdb: 6n1z).

However, for the V-type ATPase, there is new structural data from Vasanthakumar, et al. (2022) Nat Struct Mol Biol. Vasanthakumar and colleagues have shown the complex is highly dynamic, assigning several conformations to this complex, whether attached or not attached to the membrane as well as rotational variants. In one conformation the interface of subunits E is with A and in another with B. The results suggest the complex undergoes significant post-translational conformational rotations (see PDB: 7TMQ vs. 7TMM for example). The new rotational variants seem to challenge the prediction and calculation presented here. The authors should address these findings.

Please see Essential revisions (5).

– The manuscript states that "We compared the areas of first and last translated interfaces in bacterial, yeast…". However, no experimental nascent-proteome data is presented for bacteria and yeast.

There have been several recent publications exploring co-translational complex assembly in both yeast and bacteria. For example, Kassem, et al., 2017 explored co-translational assembly of the SAGA complex in yeast; Lautier et al., Mol cell 2021 and Seidel et al., 2022 Nature Com; which combined structural with ribosome profiling and RIP-seq data to explore the co-translational assembly of the nuclear pore complex in yeast. The authors themselves cite several studies exploring co-translational assembly in yeast, including Duncan and Mata 2011; Shiber et al., 2018; Panasenko et al. 2019.

In bacteria, for example, Fujiwara et al., 2020, Cell reports, performed a nascent-proteome-wide study of co-translational interactions in Bacillus subtilis.

The authors' main claim is that large protein complex interfaces have evolved to promote cotranslational assembly, can be greatly supported by utilizing some of these examples for verification.

We thank the reviewer for these suggestions as it has helped us provide more supporting evidence for the first versus last interface size trend. Please see Essential Revisions (2).

– "The manuscript states "However, since larger interfaces tend to cotranslationally assemble more frequently, we filtered the subunits to include cases where the size of at least one of the two interfaces is larger than 1000 or 2000 Å2". This is further demonstrated in Figure 3.

The reasoning for selecting these criteria is not clear. The authors should show the distribution of each group (all, 1000, 2000 Å2) of coco vs. non-coco, so one can see how likely is this assumption.

Please see Essential revisions (4).

Reviewer #3 (Recommendations for the authors):

1. The authors found that cotranslationally assembling complexes have (on average) larger interfaces than those assumed to assemble after their complete synthesis. The authors considered the effect size of cotranslationally assembling complexes relative to posttranslationally assembling complexes with different types of symmetry and the nature of assembling subunits (e.g. home- and heterodimers, etc). Figure 1C however also shows that the interface size is generally larger in homomeric symmetry groups in comparison with e.g. heterodimers irrespective of the assembly mechanism. Could the authors comment on these differences and discuss them too?

While we agree that these differences are interesting on their own, they also represent fundamental properties of homomeric symmetry groups and heterodimers, which were reported before (see Table 2 in Jones and Thornton, 1996; and Figure S5 in Marsh and Teichmann, 2014). The main reason members of the cyclic and dihedral symmetry have larger subunit interfaces is because they have more than one interface. The difference in interface size between homo- and heterodimers has been speculated to be because isologous interfaces are generally more ancient (Kim et al., 2006; Dayhoff et al., 2010), and ancient proteins tend to have larger interfaces, as we show in Figure 4A in the updated manuscript. We have made changes to the second paragraph of the Results section to include these ideas for the reader.

2. Apart from the size of the interface and its location (N-term vs C-term), an important point to consider is protein-protein interaction potential (including hydrogen bonds, electrostatic and hydrophobic interactions, number of contacts, etc). Would these parameters also differ for cotranslationally assembling complexes vs posttranslationally assembling complexes and to what extent?

We thank the reviewer for their helpful suggestion, as it adds great value to the manuscript, and led to interesting observations. Please see Essential Revisions (3).

3. Ribosome surface properties and charge were suggested to impose certain limits on the nature of the cotranslationally forming surfaces. Specifically, electrostatic effects of the ribosomal surface on nascent polypeptide dynamics have been considered of substantial significance. In this regard, it would be important to answer the question of whether there is any charge bias in the cotranslationally assembling surface complexes vs posttranslationally assembling surface complexes that might help cotranslationally assembling complexes avoid unproductive interactions with the ribosome.

The reviewer raises an interesting point. It is without doubt that charge plays complex roles in protein biosynthesis, including the regulation of translation speed (Charneski and Hurst, 2013), and the recruitment of cotranslational chaperones (Döring et al., 2017). There are also a number of theories concerned with effects attributed to ribosome surface properties (Knight et al., 2013; Schavemaker, Śmigiel and Poolman, 2017; Cassaignau et al., 2021; Deckert et al., 2021). In the updated manuscript, we present new analyses (Figure 2) which will be of interest to the reviewer. We observe that cotranslationally assembling subunits of both C2 symmetric homodimers and heterodimers have a larger fraction of charged residues on their surfaces, which tend to be more negative than positive. Our understanding is that it is positive residues in emerging nascent chains that interact with ribosome surfaces (Cassaignau et al., 2021; Deckert et al., 2021), which would suggest cotranslationally assembling subunits are not more likely to interact with the ribosome surface than other subunits. This is consistent with the finding that the role of these interactions is to halt cotranslational folding (Cassaignau et al., 2021), but cotranslationally assembling subunits should fold cotranslationally to some extent to be able to present at least partially complete interfaces to their partners. We also include in the Results section four mutually non-exclusive hypotheses that we think might explain our observations with surface charge, one of which was kindly suggested by the reviewer.

Reviewer #4 (Recommendations for the authors):

1. The authors need to calculate the p-values for Figure S1C. If they are not significant, what are we to make of that? It would seem the data does not support the relevant conclusion.

Please see Essential revisions (1).

2. The authors should show all their results are robust to using the 'high confidence' only data.

Please see Essential revisions (1).

3. The effect size for how common it is for the N-terminus interface to be larger than the C-terminal interface should be computed in the following manner: Calculate the probability that for proteins that have two interfaces, the first interface will be larger than the second (relevant to Figure 3 and S2). This number should be reported in the abstract (as you highlight this find in the second to last sentence of the abstract), as well as the results and discussion. The reason this is to help the community avoid over-generalizing the findings.

Please see Essential revisions (6).

References

Bergendahl, L. T. and Marsh, J. A. (2017) ‘Functional determinants of protein assembly into homomeric complexes’, Scientific Reports. Nature Publishing Group, 7(1). doi: 10.1038/s41598-017-05084-8.

Bertolini, M. et al. (2021) ‘Interactions between nascent proteins translated by adjacent ribosomes drive homomer assembly’, Science. American Association for the Advancement of Science, 371(6524), pp. 57–64. doi: 10.1126/science.abc7151.

Bienert, S. et al. (2017) ‘The SWISS-MODEL Repository-new features and functionality’, Nucleic Acids Research. Nucleic Acids Res, 45(D1), pp. D313–D319. doi: 10.1093/nar/gkw1132.

Cassaignau, A. M. E. et al. (2021) ‘Interactions between nascent proteins and the ribosome surface inhibit co-translational folding’, Nature Chemistry. Nature Publishing Group, 13(12), pp. 1214–1220. doi: 10.1038/s41557-021-00796-x.

Charneski, C. A. and Hurst, L. D. (2013) ‘Positively Charged Residues Are the Major Determinants of Ribosomal Velocity’, PLoS Biology. Public Library of Science, 11(3), p. e1001508. doi: 10.1371/journal.pbio.1001508.

Chen, X. and Mayr, C. (2022) ‘A working model for condensate RNA-binding proteins as matchmakers for protein complex assembly’, RNA. Cold Spring Harbor Laboratory Press, 28(1), pp. 76–87. doi: 10.1261/RNA.078995.121.

Dayhoff, J. E. et al. (2010) ‘Evolution of Protein Binding Modes in Homooligomers’, Journal of Molecular Biology. Academic Press, 395(4), pp. 860–870. doi: 10.1016/j.jmb.2009.10.052.

Deckert, A. et al. (2021) ‘Common sequence motifs of nascent chains engage the ribosome surface and trigger factor’, Proceedings of the National Academy of Sciences of the United States of America. National Academy of Sciences, 118(52). doi: 10.1073/pnas.2103015118.

Döring, K. et al. (2017) ‘Profiling Ssb-Nascent Chain Interactions Reveals Principles of Hsp70-Assisted Folding’, Cell, 170(2), pp. 298-311.e20. doi: 10.1016/j.cell.2017.06.038.

Duncan, C. D. S. and Mata, J. (2011) ‘Widespread cotranslational formation of protein complexes’, PLoS Genetics. Edited by D. A. Wolf, 7(12), p. e1002398. doi: 10.1371/journal.pgen.1002398.

Fontana, P. et al. (2022) ‘Structure of cytoplasmic ring of nuclear pore complex by integrative cryo-EM and AlphaFold.’, Science (New York, N.Y.). American Association for the Advancement of Science, 376(6598), p. eabm9326. doi: 10.1126/science.abm9326.

Forrest, L. R. (2015) ‘Structural Symmetry in Membrane Proteins∗’, Annual Review of Biophysics. Annual Reviews Inc, 44(1), pp. 311–337. doi: 10.1146/annurev-biophys-051013-023008.

Fujiwara, K. et al. (2020) ‘Proteome-wide Capture of Co-translational Protein Dynamics in Bacillus subtilis Using TnDR, a Transposable Protein-Dynamics Reporter’, Cell Reports, 33(2). doi: 10.1016/j.celrep.2020.108250.

Goodsell, D. S. and Olson, A. J. (2000) ‘Structural symmetry and protein function’, Annual Review of Biophysics and Biomolecular Structure. Annual Reviews 4139 El Camino Way, P.O. Box 10139, Palo Alto, CA 94303-0139, USA, pp. 105–153. doi: 10.1146/annurev.biophys.29.1.105.

Halbach, A. et al. (2009) ‘Cotranslational assembly of the yeast SET1C histone methyltransferase complex’, EMBO Journal. John Wiley and Sons, Ltd, 28(19), pp. 2959–2970. doi: 10.1038/emboj.2009.240.

Humphreys, I. R. et al. (2021) ‘Computed structures of core eukaryotic protein complexes’, Science. American Association for the Advancement of Science (AAAS), 374(6573). doi: 10.1126/science.abm4805.

Jones, S. and Thornton, J. M. (1996) ‘Principles of protein-protein interactions’, Proceedings of the National Academy of Sciences of the United States of America. National Academy of Sciences, pp. 13–20. doi: 10.1073/pnas.93.1.13.

Kamenova, I. et al. (2019) ‘Co-translational assembly of mammalian nuclear multisubunit complexes’, Nature Communications. Nature Publishing Group, 10(1). doi: 10.1038/s41467-019-09749-y.

Kassem, S., Villanyi, Z. and Collart, M. A. (2017) ‘Not5-dependent co-translational assembly of Ada2 and Spt20 is essential for functional integrity of SAGA’, Nucleic Acids Research. Oxford Academic, 45(3), pp. 1186–1199. doi: 10.1093/NAR/GKW1059.

Keene, J. D. (2007) ‘RNA regulons: Coordination of post-transcriptional events’, Nature Reviews Genetics. Nature Publishing Group, pp. 533–543. doi: 10.1038/nrg2111.

Kim, W. K. et al. (2006) ‘The many faces of protein-protein interactions: A compendium of interface geometry’, PLoS Computational Biology. Public Library of Science, 2(9), pp. 1151–1164. doi: 10.1371/journal.pcbi.0020124.

Knight, A. M. et al. (2013) ‘Electrostatic effect of the ribosomal surface on nascent polypeptide dynamics’, ACS Chemical Biology, 8(6), pp. 1195–1204. doi: 10.1021/cb400030n.

Lautier, O. et al. (2021) ‘Co-translational assembly and localized translation of nucleoporins in nuclear pore complex biogenesis’, Molecular cell. Mol Cell, 81(11), pp. 2417-2427.e5. doi: 10.1016/J.MOLCEL.2021.03.030.

Levy, E. D. et al. (2008) ‘Assembly reflects evolution of protein complexes’, Nature. Nature Publishing Group, 453(7199), pp. 1262–1265. doi: 10.1038/nature06942.

Liebeskind, B. J., McWhite, C. D. and Marcotte, E. M. (2016) ‘Towards consensus gene ages’, Genome Biology and Evolution. Oxford Academic, 8(6), pp. 1812–1823. doi: 10.1093/gbe/evw113.

Liu, F. et al. (2016) ‘Cotranslational association of mRNA encoding subunits of heteromeric ion channels’, Proceedings of the National Academy of Sciences of the United States of America. National Academy of Sciences, 113(17), pp. 4859–4864. doi: 10.1073/pnas.1521577113.

Marsh, J. A. et al. (2013) ‘Protein complexes are under evolutionary selection to assemble via ordered pathways’, Cell, 153(2), pp. 461–470. doi: 10.1016/j.cell.2013.02.044.

Marsh, J. A. and Teichmann, S. A. (2014) ‘Protein Flexibility Facilitates Quaternary Structure Assembly and Evolution’, PLoS Biology. Edited by G. A. Petsko. Public Library of Science, 12(5), p. e1001870. doi: 10.1371/journal.pbio.1001870.

Panasenko, O. O. et al. (2019) ‘Co-translational assembly of proteasome subunits in NOT1-containing assemblysomes’, Nature Structural and Molecular Biology. Nature Publishing Group, 26(2), pp. 110–120. doi: 10.1038/s41594-018-0179-5.

Ponstingl, H. et al. (2005) ‘Morphological aspects of oligomeric protein structures’, Progress in Biophysics and Molecular Biology. Pergamon, pp. 9–35. doi: 10.1016/j.pbiomolbio.2004.07.010.

Schavemaker, P. E., Śmigiel, W. M. and Poolman, B. (2017) ‘Ribosome surface properties may impose limits on the nature of the cytoplasmic proteome’, eLife. eLife Sciences Publications Ltd, 6. doi: 10.7554/ELIFE.30084.

Seidel, M. et al. (2022) ‘Co-translational assembly orchestrates competing biogenesis pathways’, Nature Communications. Nature Publishing Group, 13(1), pp. 1–15. doi: 10.1038/s41467-022-28878-5.

Shiber, A. et al. (2018) ‘Cotranslational assembly of protein complexes in eukaryotes revealed by ribosome profiling’, Nature. Nature Publishing Group, 561(7722), pp. 268–272. doi: 10.1038/s41586-018-0462-y.

Vasanthakumar, T. et al. (2022) ‘Coordinated conformational changes in the V1 complex during V-ATPase reversible dissociation’, Nature Structural and Molecular Biology 2022 29:5. Nature Publishing Group, 29(5), pp. 430–439. doi: 10.1038/s41594-022-00757-z.

Waterhouse, A. et al. (2018) ‘SWISS-MODEL: Homology modelling of protein structures and complexes’, Nucleic Acids Research. Oxford Academic, 46(W1), pp. W296–W303. doi: 10.1093/nar/gky427.

Wells, J. N., Bergendahl, L. T. and Marsh, J. A. (2015) ‘Co-translational assembly of protein complexes’, Biochemical Society Transactions, 43, pp. 1221–1226. doi: 10.1042/BST20150159.

Wells, J. N., Bergendahl, L. T. and Marsh, J. A. (2016) ‘Operon Gene Order Is Optimized for Ordered Protein Complex Assembly’, Cell Reports, 14(4), pp. 679–685. doi: 10.1016/j.celrep.2015.12.085.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Please provide a full answer to the remaining criticism raised by referee 2. This is an essential revision. Please also take care to clarify the concern of referee 4, as it is your chance to ensure the impact of the paper.

Reviewer #2 (Recommendations for the authors):

The manuscript "Large protein complex interfaces have evolved to promote cotranslational assembly" by Mihaly Badonyi and Joseph A Marsh combines analysis of experimental data from ribosome profiling experiments (mainly Bertolini et al., 2021) with structural data on protein complexes to test the impact of interface size on co-translational complex assembly pathways in mammalians. The authors find a strong correlation between interface size and co-translational subunit association. Expanding their structural analysis to yeast and E. coli complexes, the authors find evidence supporting the hypothesis that large interfaces have evolved to promote co-translational assembly. Thus, larger N` terminal interfaces can serve to facilitate successful co-translational assembly interactions, protecting the nascent proteome.The revised manuscript has greatly improved and the authors have addressed most of the major concerns.

The authors now provide p-values in the dot plot-boxplot chart in Figure 1 and figure supplement 1D; showing the statistical significance of interface size differences in co-translationally assembling homomers in humans. As this is the main claim of the manuscript, it was an essential revision.

The authors also provide a new aspect of the evolutionary complex's age vs co-translational propensity, as well as explore the other interfacial contact-based properties of the co-translational assembly.

However, one major concern has not been addressed by the authors in a clear manner. In regard to figure 3, heteromers analysis of the correlation between interface size and assembly order, relying on the assumption of stable interface formation. The authors present the example of the interface between the subunits E and B1 of the V-type ATPase (pdb: 6wm2); figure 3d. Vasanthakumar, et al. (2022) cryo-EM study, has shown, that the subunits E and B1 can dissociate post-translationally, following 120º rotation, where a novel interface between the same E subunit forms with a different B subunit (B2 and then B3, corresponding to rotational states 2, 3). See figure 1c of Vasanthakumar, et al., titled "Coordinated conformational changes in the V1 complex during V-ATPase reversible dissociation". In the PDBs mentioned in the authors' reply the same chain corresponding to the B1 subunit (chain B) forms interfaces with chain K (pdb:7tmq), then forms a new, very similar interface with chain I (pdb:7tmp), and in the other rotational state with chain G (pdb:7tmo). Thus, the example of a dynamic interface here does not support the authors' claim of stable interface formation.

While there is evidence that subunit exchange may be widespread in proteomes (Tusk, Delalez and Berry, 2018), it was suggested by the authors of the study that cotranslationally assembled components are the least likely to exchange. Vasanthakumar et al., 2022, demonstrate large conformational changes throughout the V1 subcomplex on separation from the VO subcomplex in the yeast V-ATPase. Their work represents an important contribution to understanding how the regulator of the ATPase of vacuoles and endosomes (RAVE) complex might reassemble V1 and VO by recruiting subunit C, as well as revealing the inherent dynamism of the complex in mechanistic detail. However, subunits B and E do not change relative positions or undergo subunit exchange during the conformational rearrangements described by the authors. The apparent distinct interfaces formed by the E subunit with different B subunits in the different states are due to differences in author chain name assignment between the different PDB files. The reviewer correctly notes that chain B directly interacts with chains G, I and K in the three different structures, 7tmo, 7tmp and 7tmq, respectively. However, chains G, I and K in these different structures are in fact referring to the same subunit, i.e. there is not really a ~120° rotation between the E subunits in the different states 1, 2 and 3, the authors have simply named the chains differently in the different files. This can be clearly observed in the video created by the authors to illustrate the conformational changes that occur during the cycle of dissociation and reassembly of the V1 and VO subcomplexes (Video 4, https://www.nature.com/articles/s41594-022-00757-z#Sec22). The video shows how the two subcomplexes dissociate during glucose starvation, whereby the a and d subunits (green and cyan, respectively) first initiate the mechanical inhibition of proton conductance through VO (Kishikawa et al., 2020), which is followed by the V1 subcomplex adopting an inhibited conformation as well (state 2). The latter requires the ~150° rotation of the C-terminal region of subunit H (HCT, orange) towards the central rotor and the subsequent dissociation of subunit C (dark orange). Following these steps, the V1 subcomplex, now devoid of subunit C, is allowed to transition between states 1, 2, and 3, despite lacking magnesium ATPase activity. However, the interconversion between states is not associated with exchange of the B or E subunits (red and purple, respectively), and each pair of interacting B and E subunits remains in direct contact across all states. The mechanistic events reported by Vasanthakumar et al. are therefore consistent with cotranslational assembly of subunits B and E.

We would also like to note that conformational rearrangements post-assembly are not incompatible with the cotranslational assembly of subunits, since proteins with large interfaces are inherently more flexible (Marsh and Teichmann, 2014) and thus more amenable to conformational changes, functional or otherwise. An example of this is subunits of the proteasome, which have been observed to cotranslationally assemble by multiple independent studies (Duncan and Mata, 2011; Shiber et al., 2018; Panasenko et al., 2019). Strikingly, 19 out of 32 subunits in the human 26S proteasome (pdb: 6msd) were found to cotranslationally assemble by Bertolini et al., with 8 out of 19 being high confidence. The work by Dambacher et al., 2016, illustrates well the dynamic rearrangements during lid incorporation into the 26S proteasome (Video 1), whereas the work by Dong et al., 2018, shows the complete dynamic process of the substrate engagement (Video 1). Thus, while random dissociation of cotranslationally assembled components remains unlikely, dynamic conformational changes thereof are not disfavoured.

We have now updated the text to note that while major conformational rearrangements can occur, the same B and E subunits remain in contact across all states, and mention the potential for large posttranslational rearrangements in structures being common in cotranslationally assembling subunits.

Reviewer #4 (Recommendations for the authors):

The authors have addressed my concern regarding figure S1c. I am slightly concerned about their response to point 1, in which they basically say we can't restrict our analysis to high-confidence co-translational dimers because otherwise there is not enough data.

However, as long as other reviewers do not have serious concerns about this latter point, then I am fine with the manuscript as it is.

We acknowledge the concern of the reviewer, and that our analysis is inevitably limited by the available complexes in the PDB. We hope that the growing intensity of experimental structure determination and improving computational approaches will resolve the remaining complexes in the very near future, making possible further advances upon structural studies such as ours. We would like to emphasise that the analysis presented in Figure S1A strongly suggests that there is no artificial interface size bias in response to biochemical fractionation, which justifies the use of pooled high and low confidence dimers for further use. Additionally, the first versus last interface size analysis, essentially the main conclusion of the study, does not hinge upon the results obtained via analysis of the Bertolini data.

References

Bertolini, M. et al. (2021) ‘Interactions between nascent proteins translated by adjacent ribosomes drive homomer assembly’, Science. American Association for the Advancement of Science, 371(6524), pp. 57–64. doi: 10.1126/science.abc7151.

Dambacher, C. M. et al. (2016) ‘Atomic structure of the 26S proteasome lid reveals the mechanism of deubiquitinase inhibition’, eLife. eLife Sciences Publications Ltd, 5(JANUARY2016). doi: 10.7554/eLife.13027.

Dong, Y. et al. (2019) ‘Cryo-EM structures and dynamics of substrate-engaged human 26S proteasome’, Nature. Nature Publishing Group, 565(7737), pp. 49–55. doi: 10.1038/s41586-018-0736-4.

Duncan, C. D. S. and Mata, J. (2011) ‘Widespread cotranslational formation of protein complexes’, PLoS Genetics. Edited by D. A. Wolf, 7(12), p. e1002398. doi: 10.1371/journal.pgen.1002398.

Kishikawa, J. I. et al. (2020) ‘Mechanical inhibition of isolated Vo from V/A-Atpase for proton conductance’, eLife. eLife Sciences Publications Ltd, 9, pp. 1–20. doi: 10.7554/eLife.56862.

Marsh, J. A. and Teichmann, S. A. (2014) ‘Protein Flexibility Facilitates Quaternary Structure Assembly and Evolution’, PLoS Biology. Edited by G. A. Petsko. Public Library of Science, 12(5), p. e1001870. doi: 10.1371/journal.pbio.1001870.

Panasenko, O. O. et al. (2019) ‘Co-translational assembly of proteasome subunits in NOT1-containing assemblysomes’, Nature Structural and Molecular Biology. Nature Publishing Group, 26(2), pp. 110–120. doi: 10.1038/s41594-018-0179-5.

Shiber, A. et al. (2018) ‘Cotranslational assembly of protein complexes in eukaryotes revealed by ribosome profiling’, Nature. Nature Publishing Group, 561(7722), pp. 268–272. doi: 10.1038/s41586-018-0462-y.

Tusk, S. E., Delalez, N. J. and Berry, R. M. (2018) ‘Subunit Exchange in Protein Complexes’, Journal of Molecular Biology. Academic Press, pp. 4557–4579. doi: 10.1016/j.jmb.2018.06.039.

Vasanthakumar, T. et al. (2022) ‘Coordinated conformational changes in the V1 complex during V-ATPase reversible dissociation’, Nature Structural and Molecular Biology 2022 29:5. Nature Publishing Group, 29(5), pp. 430–439. doi: 10.1038/s41594-022-00757-z.

https://doi.org/10.7554/eLife.79602.sa2

Article and author information

Author details

  1. Mihaly Badonyi

    MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
    Contribution
    Conceptualization, Formal analysis, Writing - original draft
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8305-5618
  2. Joseph A Marsh

    MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
    Contribution
    Conceptualization, Data curation, Supervision, Writing - review and editing
    For correspondence
    joseph.marsh@ed.ac.uk
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-4132-0628

Funding

Biotechnology and Biological Sciences Research Council (BB/M010996/1)

  • Mihaly Badonyi

Medical Research Council (MR/M02122X/1)

  • Joseph A Marsh

Lister Institute of Preventive Medicine

  • Joseph A Marsh

The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.

Acknowledgements

MB is supported by the Biotechnology and Biological Sciences Research Council EASTBIO DTP (BB/M010996/1), and used resources provided by the Edinburgh Compute and Data Facility. JAM was supported by a Medical Research Council Career Development Award (MR/M02122X/1) and is a Lister Institute Research Prize Fellow.

Senior Editor

  1. Volker Dötsch, Goethe University, Germany

Reviewing Editor

  1. Marina V Rodnina, Max Planck Institute for Multidisciplinary Sciences, Germany

Publication history

  1. Preprint posted: May 26, 2021 (view preprint)
  2. Received: April 20, 2022
  3. Accepted: July 27, 2022
  4. Accepted Manuscript published: July 28, 2022 (version 1)
  5. Version of Record published: August 10, 2022 (version 2)

Copyright

© 2022, Badonyi and Marsh

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,951
    Page views
  • 487
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Mihaly Badonyi
  2. Joseph A Marsh
(2022)
Large protein complex interfaces have evolved to promote cotranslational assembly
eLife 11:e79602.
https://doi.org/10.7554/eLife.79602

Further reading

    1. Computational and Systems Biology
    Shunpei Yamauchi, Takashi Nozoe ... Yuichi Wakamoto
    Research Article

    Intracellular states probed by gene expression profiles and metabolic activities are intrinsically noisy, causing phenotypic variations among cellular lineages. Understanding the adaptive and evolutionary roles of such variations requires clarifying their linkage to population growth rates. Extending a cell lineage statistics framework, here we show that a population’s growth rate can be expanded by the cumulants of a fitness landscape that characterize how fitness distributes in a population. The expansion enables quantifying the contribution of each cumulant, such as variance and skewness, to population growth. We introduce a function that contains all the essential information of cell lineage statistics, including mean lineage fitness and selection strength. We reveal a relation between fitness heterogeneity and population growth rate response to perturbation. We apply the framework to experimental cell lineage data from bacteria to mammalian cells, revealing distinct levels of growth rate gain from fitness heterogeneity across environments and organisms. Furthermore, third or higher order cumulants’ contributions are negligible under constant growth conditions but could be significant in regrowing processes from growth-arrested conditions. We identify cellular populations in which selection leads to an increase of fitness variance among lineages in retrospective statistics compared to chronological statistics. The framework assumes no particular growth models or environmental conditions, and is thus applicable to various biological phenomena for which phenotypic heterogeneity and cellular proliferation are important.

    1. Computational and Systems Biology
    Jeffrey Molendijk, Ronnie Blazev ... Benjamin L Parker
    Research Article

    Improving muscle function has great potential to improve the quality of life. To identify novel regulators of skeletal muscle metabolism and function, we performed a proteomic analysis of gastrocnemius muscle from 73 genetically distinct inbred mouse strains, and integrated the data with previously acquired genomics and >300 molecular/phenotypic traits via quantitative trait loci mapping and correlation network analysis. These data identified thousands of associations between protein abundance and phenotypes and can be accessed online (https://muscle.coffeeprot.com/) to identify regulators of muscle function. We used this resource to prioritize targets for a functional genomic screen in human bioengineered skeletal muscle. This identified several negative regulators of muscle function including UFC1, an E2 ligase for protein UFMylation. We show UFMylation is up-regulated in a mouse model of amyotrophic lateral sclerosis, a disease that involves muscle atrophy. Furthermore, in vivo knockdown of UFMylation increased contraction force, implicating its role as a negative regulator of skeletal muscle function.