First-principles model of optimal translation factors stoichiometry
Abstract
Enzymatic pathways have evolved uniquely preferred protein expression stoichiometry in living cells, but our ability to predict the optimal abundances from basic properties remains underdeveloped. Here, we report a biophysical, first-principles model of growth optimization for core mRNA translation, a multi-enzyme system that involves proteins with a broadly conserved stoichiometry spanning two orders of magnitude. We show that predictions from maximization of ribosome usage in a parsimonious flux model constrained by proteome allocation agree with the conserved ratios of translation factors. The analytical solutions, without free parameters, provide an interpretable framework for the observed hierarchy of expression levels based on simple biophysical properties, such as diffusion constants and protein sizes. Our results provide an intuitive and quantitative understanding for the construction of a central process of life, as well as a path toward rational design of pathway-specific enzyme expression stoichiometry.
Introduction
A universal challenge faced by both evolution and synthetic pathway creation is to optimize the cellular abundance of proteins. This abundance optimization problem is not only multidimensional – often involving several proteins participating in the same pathway – but also under systems-wide constraints, such as limited physical space (Klumpp et al., 2013) and finite nutrient inputs (You et al., 2013). The complexity of this problem has prevented rational design of protein expression for pathway engineering (Jeschek et al., 2017). Fundamentally, being able to predict the optimal and observed cellular protein abundances from their individual properties would reflect an ultimate understanding of molecular and systems biology.
Evolutionary comparison of gene expression across microorganisms suggests that basic principles governing the optimization problem may exist. We recently reported broad conservation of relative protein synthesis rates within individual pathways, even under circumstances in which the relative transcription and translation rates for the homologous enzymes have dramatically diverged across species (Lalanne et al., 2018). Moreover, distinct proteins that evolved convergently toward the same biological function also displayed the same stoichiometry of protein synthesis in their respective species. These results suggest that the determinants of optimal in-pathway protein stoichiometry are likely modular and independent of detailed biochemical or physiological properties that differ across clades. However, the precise nature of such determinants remains unknown.
Translation of mRNA into proteins is a central pathway required for cell growth and therefore serves as an entry point for establishing a quantitative model of growth-optimized in-pathway stoichiometry. As a group, the total amount of translation-related proteins per cell mass linearly increases with growth rate in most conditions (Scott et al., 2010; Dai et al., 2016; Schaechter et al., 1958), a relationship considered a bacterial ‘growth law’. In addition to ribosomes, which have well-coordinated synthesis of subunits (Nomura et al., 1984), the translation pathway comprises nearly 100 protein factors involved in facilitating ribosome assembly, translation initiation, elongation, and termination (Marintchev and Wagner, 2004; Dever and Green, 2012; Rodnina, 2018). The intracellular abundances of these factors vary over 100-fold (Pedersen et al., 1978; Li et al., 2014), and their ratios are often maintained in different growth conditions and across different species (Lalanne et al., 2018). What dictates the observed stoichiometry among translation factors is less understood. Early studies predicted expression of the highly expressed elongation factor Tu (EF-Tu) relative to the ribosome (Klumpp et al., 2013; Ehrenberg and Kurland, 1984) by maximizing translational flux per unit proteome. More recently, expression of several other components involved in the elongation step (ribosomes, tRNA, mRNA, EF-Tu, and EF-Ts) was predicted by minimizing the total mass of the components at a fixed translational flux (Hu et al., 2020). The selective pressure on expression levels remains to be determined for most members of the translation machinery, including initiation and termination factors that are much more lowly expressed and often assumed to be non-limiting.
Here, we sought to derive an intuitive model to understand the quantitative abundance hierarchy (Figure 1B) among the core translation factors (tlFs), which have well-characterized functions (Table 1, schematic in Figure 1A). Our goal is not to exhaustively model the heterogeneous movement of ribosomes on the transcriptome (Shaw et al., 2003; Reuveni et al., 2011; Subramaniam et al., 2014; Dykeman, 2020) or to include as many details of the underlying molecular steps as possible (Hu et al., 2020; Vieira et al., 2016). Instead, we coarse-grained global translation into a cycle that consists of sequential steps with interconnected fluxes that depend on core tlF concentrations. At steady-state cell growth, all individual fluxes are matched and the overall rate of ribosomes completing the full translation cycle is proportional to cell growth. By solving for the maximum flux under proteome allocation constraints, we obtained analytical solutions for the optimal factor concentrations, which agree well with the observed values. The ratios of optimal concentrations depend only on simple biophysical parameters that are broadly conserved across species. For instance, elongation factor EF-G is predicted to be more abundant than initiation and termination tlFs by a multiplicative factor of $\approx \sqrt{\text{average number of codons per protein}}\approx 14$, whereas EF-Tu is predicted to be more abundant than EF-G by a factor of $\approx \sqrt{\text{number of different amino acids}}\approx 4$. These results, arising from the optimization procedure and generic properties of the translation cycle, provide rationales for the order-of-magnitude expression of these important enzymes.
Results
Problem statement and model formulation
Our overall goal is to determine the growth-optimizing proteome allocation for the core translation factors. Conceptually, varying tlF concentrations has two opposing effects on cell proliferation. At the biochemical level, high tlF expression can facilitate growth by allowing more efficient usage of ribosomes. At the systems level, increased tlF expression can nonetheless limit growth by reducing the number of ribosomes and other proteins that can be produced. The tradeoffs between various tlFs and ribosomes create a multidimensional optimization problem.
We solve this multidimensional problem by treating translation as a dynamical system, in which ribosomes cycle through initiation, elongation, and termination. The resulting flux drives cell growth. During steady-state growth, every interlocked step of the translation cycle must have the same ribosome flux that is specified by the growth rate. We show that at the growth optimum, concentrations for distinct tlFs can be solved independently. The resulting analytical solutions can be expressed in terms of the growth rate and simple biophysical parameters.
Cell growth driven by tlF-dependent ribosome flux
To describe the biochemical effects of tlF concentrations on cell growth, we first introduce a coarse-grained translation cycle time ${\mathrm{\tau}}_{tl}$, or the time it takes for a ribosome to complete a typical cycle of protein synthesis (Figure 1A), which consists of three sequential steps: initiation ('$ini$'), elongation ('$el$'), and termination ('$ter$'). Each of these steps is catalyzed by multiple tlFs. The full translation cycle time is then the sum of ribosome transit times at the three steps (${\mathrm{\tau}}_{tl}={\mathrm{\tau}}_{ini}+{\mathrm{\tau}}_{el}+{\mathrm{\tau}}_{ter}$), whose dependence on individual tlF concentrations can be quantitatively described through mass action kinetic schemes (schematically depicted in Figure 1A, see Appendices 2, 3, and 4 for details and examples below). We express tlF concentrations in units of proteome fractions (dry mass fraction of a specified protein to the full proteome), denoted by $\varphi $ (Scott et al., 2010) (Materials and methods, section Conversion between concentration and proteome fraction). Using this notation, the translation cycle time ${\mathrm{\tau}}_{tl}$ is a decreasing function of the various tlF concentrations ($\left\{{\varphi}_{tlF,i}\right\}$).
In addition to its dependency on tlF concentrations, the translation cycle time provides a bridge between the cell growth rate and ribosome concentration. In steady-state growth (Monod, 1949; Scott et al., 2010; Dai et al., 2016), the growth rates of cells and of their protein content (total number of proteins) must be identical, denoted here as λ, as a result of the constant average cellular composition. The protein content grows at a rate determined by the flux of active ribosomes completing the translation cycle, that is ${N}_{ribo}^{act}/{\mathrm{\tau}}_{tl}$, where ${N}_{ribo}^{act}$ is the number of active ribosomes per cell, divided by the total number of proteins ${N}_{P}$ per cell: $\lambda ={N}_{ribo}^{act}/\left({\mathrm{\tau}}_{tl}{N}_{P}\right)$. Active ribosomes are defined as those functionally engaged in, and cycling through, the initiation, elongation, and termination reactions of peptide synthesis. Rescaling to the total mass fraction (Materials and methods, section Conversion between concentration and proteome fraction) of proteome for active ribosomes (${\varphi}_{ribo}^{act}$) yields
$$\lambda =\frac{{\varphi}_{ribo}^{act}}{{\mathrm{\tau}}_{tl}}\frac{\langle \ell \rangle}{{\mathrm{\ell}}_{ribo}},\tag{1}$$
where ${\mathrm{\ell}}_{ribo}$ is the number of amino acids in ribosomal proteins and $\langle \ell \rangle$ is the average number of codons per protein, weighted by expression levels (Materials and methods, section Average number of codons per protein: $\langle \ell \rangle$). The rescaling factor (${\mathrm{\ell}}_{ribo}/\langle \ell \rangle\approx 7300/200=36.5$) is approximately constant across growth conditions (Materials and methods, section Average number of codons per protein: $\langle \ell \rangle$). This equation establishes how tlF concentrations affect the growth rate biochemically via ${\mathrm{\tau}}_{tl}$.
We note that Equation 1 is a generalized form of the bacterial growth law that relates the mass fraction of elongating ribosomes to growth rate ($\lambda =\frac{{\varphi}_{ribo}^{el}}{{\mathrm{\tau}}_{el}}\frac{\langle \ell \rangle}{{\mathrm{\ell}}_{ribo}}=\gamma {\varphi}_{ribo}^{el}$, where γ is a rescaled translation elongation rate and ${\varphi}_{ribo}^{el}$ is the proteome fraction of actively translating ribosomes [Scott et al., 2010; Dai et al., 2016; Scott et al., 2014]). This classic growth law was derived by considering the steady-state flux of peptide bond formation by elongating ribosomes, whereas our model focuses on the flux of ribosomes that traverse the entire translation cycle, thereby allowing us to consider the effects of translation factors and ribosomes engaged in additional steps (initiation, elongation, and termination). For each step, Equation 1 can be extended to show that the growth rate is similarly proportional to the mass fraction of the corresponding ribosomes divided by the transit time at that step (Materials and methods, section Equality of ribosome flux in steady-state).
Steady-state growth thus imposes the requirement that the growth rate be inversely proportional to the translation cycle time and proportional to the number of active ribosomes engaged in the translation cycle (Equation 1). Inactive ribosomes, comprised of assembly intermediates, hibernating ribosomes, or otherwise non-functional ribosomes, have been found to constitute a small fraction (≈5%) of the total ribosome pool for fast growth (Lindahl, 1975; Dai et al., 2016). Based on Equation 1, both increasing ribosome concentration and increasing tlF concentrations (which decreases ${\mathrm{\tau}}_{tl}$) can accelerate growth. However, production of ribosomes and tlFs is subject to competition under a limited proteomic space, which we consider next.
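As a numerical illustration of Equation 1, the sketch below evaluates the growth rate for one set of inputs. It assumes the growth-law form $\lambda = ({\varphi}_{ribo}^{act}/{\mathrm{\tau}}_{tl})(\langle \ell \rangle/{\mathrm{\ell}}_{ribo})$ with the rescaling factor $7300/200$ quoted above; the active ribosome fraction (0.3) and the cycle time (15 s) are hypothetical values chosen only for illustration, not measurements from this work.

```python
import math

L_RIBO = 7300.0  # amino acids in ribosomal proteins (value quoted in the text)
L_MEAN = 200.0   # average number of codons per protein (value quoted in the text)

def growth_rate(phi_ribo_act, tau_tl, l_ribo=L_RIBO, l_mean=L_MEAN):
    """Growth rate (1/s) from the active ribosome proteome fraction and
    the translation cycle time tau_tl (s), following the assumed form of Equation 1."""
    return (phi_ribo_act / tau_tl) * (l_mean / l_ribo)

# Hypothetical inputs, for illustration only.
lam = growth_rate(phi_ribo_act=0.3, tau_tl=15.0)
doubling_time_min = math.log(2) / lam / 60.0
print(f"growth rate = {lam:.3e} per s, doubling time = {doubling_time_min:.1f} min")
```

With these assumed inputs, halving ${\mathrm{\tau}}_{tl}$ or doubling ${\varphi}_{ribo}^{act}$ doubles the growth rate, which is the tradeoff the proteome constraint in the next section acts on.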
Optimization under proteome allocation constraint
To model the production cost tradeoff between tlFs and ribosomes, we integrate the flux-based formulation above with a proteomic constraint. Assuming that components of the translation machinery together account for a fixed fraction of the proteome, that is, the ‘translation sector’ ${\varphi}_{tl}$ (denoted ${\varphi}_{R}$ in the context of growth laws [Scott et al., 2010]), the proteome fraction for active ribosomes is related to the proteome fraction for translation factors via
$${\varphi}_{ribo}^{act}={\varphi}_{tl}-{\varphi}_{ribo}^{inact}-\sum _{i}{\varphi}_{tlF,i}.\tag{2}$$
Equations 1 and 2, together with the kinetic schemes for each step of the translation cycle, constitute the core of our model. Combining the biochemical effects (Equation 1) and the systems-level constraints (Equation 2) on tlFs, we arrive at a self-contained relationship between growth and tlF concentrations:
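$$\lambda =\frac{\langle \ell \rangle}{{\mathrm{\ell}}_{ribo}}\,\frac{{\varphi}_{tl}-{\varphi}_{ribo}^{inact}-\sum _{i}{\varphi}_{tlF,i}}{{\mathrm{\tau}}_{tl}\left(\left\{{\varphi}_{tlF,i}\right\}\right)},\tag{3}$$
where we explicitly express ${\mathrm{\tau}}_{tl}$ as a function of ${\varphi}_{tlF,i}$ to reflect the dependence of ribosome transit times on translation factor abundances. The above relationship (Equation 3) allows us to ask: what is the stoichiometry of tlFs, or partitioning of the translation sector, that maximizes the growth rate (Figure 1A)?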
The condition for the optimal tlF abundances, that is, the set of ${\varphi}_{tlF,i}$ that satisfies ${\left(\partial \lambda /\partial {\varphi}_{tlF,i}\right)}^{*}=0$, can be obtained by considering the ${\varphi}_{tlF,i}$ as independent variables and taking the derivative of Equation 3 with respect to a specified tlF abundance. Under the assumptions that the translation sector (${\varphi}_{tl}$) and the proteome fraction for inactive ribosomes (${\varphi}_{ribo}^{inact}$) are both fixed in a given external nutrient condition, this yields
$${\left(\frac{\partial {\mathrm{\tau}}_{tl}}{\partial {\varphi}_{tlF,i}}\right)}^{*}=-\frac{\langle \ell \rangle}{{\mathrm{\ell}}_{ribo}}\frac{1}{{\lambda}^{*}},\tag{4}$$
where the asterisk refers to the growth optimum within our model, that is, ${\left(\partial \lambda /\partial {\varphi}_{tlF,i}\right)}^{*}=0$. Hence, under this framework, the tlF abundances are growth-optimized when the sensitivity of the translation cycle time to changing the considered tlF abundance ($\partial {\mathrm{\tau}}_{tl}/\partial {\varphi}_{tlF,i}$) reaches a value determined solely by the growth rate and protein size factors. We emphasize that the derivative above corresponds to a perturbation scenario in which the tlF abundance is changed while the total proteomic resources allocated to the translation sector are held fixed, as prescribed by our optimization procedure. As such, it does not correspond to an actual perturbation easily realizable experimentally.
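The step from the growth relationship to the optimality condition can be written out explicitly. The following is a sketch of that derivation, assuming the growth relationship takes the form $\lambda = (\langle \ell \rangle/{\mathrm{\ell}}_{ribo})\,{\varphi}_{ribo}^{act}/{\mathrm{\tau}}_{tl}$ and that the translation sector is partitioned as ${\varphi}_{tl}={\varphi}_{ribo}^{act}+{\varphi}_{ribo}^{inact}+\sum_k {\varphi}_{tlF,k}$ (our reading of the definitions above), with ${\varphi}_{tl}$ and ${\varphi}_{ribo}^{inact}$ held fixed:

```latex
\begin{align*}
\lambda &= \frac{\langle\ell\rangle}{\ell_{ribo}}\,
  \frac{\varphi_{tl}-\varphi_{ribo}^{inact}-\sum_k \varphi_{tlF,k}}
       {\tau_{tl}\left(\{\varphi_{tlF,k}\}\right)}, \\[4pt]
\frac{\partial\lambda}{\partial\varphi_{tlF,i}}
  &= \frac{\langle\ell\rangle}{\ell_{ribo}}
  \left[-\frac{1}{\tau_{tl}}
        -\frac{\varphi_{ribo}^{act}}{\tau_{tl}^{2}}\,
         \frac{\partial\tau_{tl}}{\partial\varphi_{tlF,i}}\right] = 0, \\[4pt]
\Rightarrow\quad
\left(\frac{\partial\tau_{tl}}{\partial\varphi_{tlF,i}}\right)^{*}
  &= -\frac{\tau_{tl}^{*}}{\varphi_{ribo}^{act,*}}
   = -\frac{\langle\ell\rangle}{\ell_{ribo}}\,\frac{1}{\lambda^{*}}.
\end{align*}
```

The first term in the bracket is the cost of displacing ribosomes from the fixed sector, the second is the benefit of a shorter cycle time; the last equality uses the growth relationship itself to eliminate ${\mathrm{\tau}}_{tl}^{*}/{\varphi}_{ribo}^{act,*}$.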
Although Equation 3 and the resulting optimization conditions (Equation 4, one for every tlF) correspond to a coupled nonlinear system of multiple ${\varphi}_{tlF,i}$, substantial decoupling occurs at the optimal growth rate. In this situation, most ${\varphi}_{tlF,i}$ are only connected through the resulting growth rate. The optimization problem is then further simplified by the fact that the translation cycle consists of sequential and largely independent steps. The translation cycle time ${\mathrm{\tau}}_{tl}$ corresponds to the sum of the coarse-grained initiation, elongation, and termination times, that is, ${\mathrm{\tau}}_{tl}={\mathrm{\tau}}_{ini}+{\mathrm{\tau}}_{el}+{\mathrm{\tau}}_{ter}$. Given that each tlF is involved in a specific molecular step, the sensitivity matrix of these times to tlF concentration is sparse: ${\left(\partial {\mathrm{\tau}}_{j}/\partial {\varphi}_{tlF,i}\right)}^{*}=0$ for most combinations of ${\mathrm{\tau}}_{j}$ and ${\varphi}_{tlF,i}$. This lack of ‘cross-reactivity’ expresses that, for example, the initiation time ${\mathrm{\tau}}_{ini}$ is unaffected by the tRNA synthetase concentration. This sparsity only occurs at the optimal expression levels, as the transit times typically depend on the growth rate (see an example in section Non binding-limited regime [one stop codon]) and $\partial \lambda /\partial {\varphi}_{tlF,i}\ne 0$ away from the optimum. The optimum condition for factor $i$ then simplifies to:
$${\left(\frac{\partial {\mathrm{\tau}}_{j}}{\partial {\varphi}_{tlF,i}}\right)}^{*}=-\frac{\langle \ell \rangle}{{\mathrm{\ell}}_{ribo}}\frac{1}{{\lambda}^{*}},\tag{5}$$
where $j$ above denotes the translation step(s) that tlF$_{i}$ participates in. This leads to simplifications that allow the system to be solved analytically in most cases: instead of solving the full system at once, individual reactions within the translation cycle can be considered in isolation. The resulting optimal concentrations are connected via the growth rate ${\lambda}^{*}$. Interestingly, the optimal stoichiometry among most tlFs is independent of ${\lambda}^{*}$ if the reactions are in the binding-limited regime, as we show below.
Case study: Translation termination
We first illustrate the process of solving for the optimal tlF concentration for the relatively simple case of translation termination. The principles used here and the form of solutions provide conceptual guideposts for solving other steps of the translation cycle.
In bacteria, translation termination (Bertram et al., 2001) consists of two distinct, sequential steps: (1) stop codon recognition and peptidyl-tRNA hydrolysis catalyzed by class I peptide chain release factors RF1 and RF2, followed by (2) dissociation of ribosomal subunits from the mRNA, that is, ribosome recycling, catalyzed by RF4. We do not explicitly consider the additional factors (e.g. RF3 and EF-G) due to their lack of conservation or because they are non-limiting for this specific step (Appendix 2, section Omitted molecular details). RF1 and RF2 have the same molecular functions but recognize different stop codons (Scolnick et al., 1968): RF1 recognizes stops UAA and UAG, whereas RF2 recognizes UAA and UGA. For simplicity, we describe here a scenario where RF1 and RF2 have no specificity towards the three stop codons, which allows us to combine them in a single factor (denoted RFI). The model is readily generalized, with similar results, to the case of the two RFs with their specificity towards the three stop codons (Appendix 2, section Full three stop codons model).
Under a coarse-grained description, the total ribosome transit time at termination ${\mathrm{\tau}}_{ter}$ can be decomposed into a sum of peptide release time and ribosome recycling time. In the treatment below, we consider a regime of binding-limited reactions for simplicity (rapid catalytic rate). A full model with catalytic components can also be solved analytically (Appendix 2, section Non binding-limited regime [one stop codon], Figure 2A). In the binding-limited regime (${k}_{cat}\to \mathrm{\infty}$), the peptide release time and ribosome recycling time are inversely proportional to the corresponding tlF concentrations:
$${\mathrm{\tau}}_{ter}={\mathrm{\tau}}_{stop}+{\mathrm{\tau}}_{recyc}=\frac{1}{{k}_{on}^{RFI}{\varphi}_{RFI}}+\frac{1}{{k}_{on}^{RF4}{\varphi}_{RF4}},\tag{6}$$
where the association rate constants ${k}_{on}^{i}$ are rescaled by the factors’ sizes in proteome fraction units (Materials and methods, section Conversion between concentration and proteome fraction). The above expression constitutes the solution of the mass action scheme for termination, connecting factor abundances to termination time.
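The termination time (Equation 6) can then be directly substituted into the optimality condition (Equation 5) and solved in terms of ${\lambda}^{*}$:
$${\varphi}_{RFI}^{*}=\sqrt{\frac{{\mathrm{\ell}}_{ribo}}{\langle \ell \rangle}\frac{{\lambda}^{*}}{{k}_{on}^{RFI}}},\qquad {\varphi}_{RF4}^{*}=\sqrt{\frac{{\mathrm{\ell}}_{ribo}}{\langle \ell \rangle}\frac{{\lambda}^{*}}{{k}_{on}^{RF4}}}.\tag{7}$$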
If the reactions are not binding-limited, an additional catalytic term $\propto {\lambda}^{*}/{k}_{cat}$ is added to the minimally required levels above (Appendix 2, section Non binding-limited regime [one stop codon]). The square-root dependence in the optimal RF concentrations emerges from the ${\varphi}_{i}^{-1}$ dependence of ${\mathrm{\tau}}_{i}$, for example, for ribosome recycling ${\mathrm{\tau}}_{recyc}\propto {\varphi}_{RF4}^{-1}$, which becomes ${({\varphi}_{i}^{*})}^{-2}$ upon taking the derivative in the optimality condition (Equation 5). The square root is then obtained by solving for ${\varphi}_{i}^{*}$. A similar square-root dependence has been noted in optimization of the ternary complex and tRNA abundances (Ehrenberg and Kurland, 1984; Berg and Kurland, 1997). Analysis of tlF expression across slower growth conditions supports the derived square-root dependence (Figure 4—figure supplement 2). As a result of the square root, the optimal RF concentrations are weakly affected by biophysical properties such as the association rate constants and protein sizes. In the binding-limited regime above, the ratio of the optimal concentrations between RFI and RF4 is independent of the growth rate and only depends on the kinetics of binding.
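The square-root solution can be checked numerically. The sketch below assumes the binding-limited termination time ${\mathrm{\tau}}_{ter}=1/({k}_{on}^{RFI}{\varphi}_{RFI})+1/({k}_{on}^{RF4}{\varphi}_{RF4})$, lumps all other transit times into a fixed `tau_other`, and fixes the proteome budget shared by the two factors and active ribosomes. All numerical values (rate constants, budget, `tau_other`) are hypothetical. It finds the optimum by iterating the square-root formula self-consistently with the growth rate, then compares against perturbed allocations.

```python
import math

L_RIBO, L_MEAN = 7300.0, 200.0  # protein size factors quoted in the text

def growth_rate(phi_rfi, phi_rf4, k_rfi, k_rf4, tau_other, phi_budget):
    """Growth rate when the budget is split between RFI, RF4 and active ribosomes."""
    tau_tl = tau_other + 1.0 / (k_rfi * phi_rfi) + 1.0 / (k_rf4 * phi_rf4)
    phi_ribo_act = phi_budget - phi_rfi - phi_rf4
    return (L_MEAN / L_RIBO) * phi_ribo_act / tau_tl

def optimal_termination_factors(k_rfi, k_rf4, tau_other, phi_budget, n_iter=100):
    """Fixed-point iteration: phi_i = sqrt(l_ribo * lambda / (<l> * k_i)),
    with lambda recomputed from the current allocation at every step."""
    phi_rfi = phi_rf4 = 1e-3  # arbitrary starting guess
    for _ in range(n_iter):
        lam = growth_rate(phi_rfi, phi_rf4, k_rfi, k_rf4, tau_other, phi_budget)
        phi_rfi = math.sqrt(L_RIBO * lam / (L_MEAN * k_rfi))
        phi_rf4 = math.sqrt(L_RIBO * lam / (L_MEAN * k_rf4))
    lam = growth_rate(phi_rfi, phi_rf4, k_rfi, k_rf4, tau_other, phi_budget)
    return phi_rfi, phi_rf4, lam

# Hypothetical parameters: k_on in 1/(s * proteome fraction), times in s.
params = dict(k_rfi=2e4, k_rf4=1e4, tau_other=15.0, phi_budget=0.3)
phi_rfi_opt, phi_rf4_opt, lam_opt = optimal_termination_factors(**params)

# Any 5% change in either factor should lower the growth rate.
perturbed = [growth_rate(phi_rfi_opt * a, phi_rf4_opt * b, **params)
             for a, b in [(1.05, 1.0), (0.95, 1.0), (1.0, 1.05), (1.0, 0.95)]]
print(phi_rfi_opt, phi_rf4_opt, lam_opt, max(perturbed) < lam_opt)
```

In this sketch the ratio of the two optimal fractions equals $\sqrt{{k}_{on}^{RF4}/{k}_{on}^{RFI}}$ regardless of the budget or of `tau_other`, illustrating the growth-rate independence of the stoichiometry in the binding-limited regime.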
As a side note, the expression for termination time ${\mathrm{\tau}}_{ter}$ in Equation 6 must be modified in a regime where ribosomes are frequently queued upstream of stop codons. This would occur if the termination rate were slow and approached initiation rates on mRNAs (Bergmann and Lodish, 1979; Lalanne et al., 2021). In this regime, queues of ribosomes at stop codons would incur an additional time to terminate. In a general description, the resulting additional termination time can be absorbed in a queuing factor $\mathcal{Q}$: ${\mathrm{\tau}}_{ter}^{full}:={\mathrm{\tau}}_{ter}\,\mathcal{Q}({\mathrm{\tau}}_{ter})$ (Appendix 1 for derivation and discussion). The resulting nonlinearity would forbid the decoupling in the optimization procedure between RFI and RF4. Although absolute rates of termination are difficult to measure in vivo, translation on mRNAs is generally thought to be limited at the initiation step (Laursen et al., 2005), and consistently, ribosome queuing at stop codons in bacteria is not usually observed (except under severe perturbations, e.g. Kavčič et al., 2020; Baggett et al., 2017; Mangano et al., 2020; Saito et al., 2020; Lalanne et al., 2021). In the physiological regime of fast termination, the queuing factor converges to 1, yielding simple solutions that depend only on biophysical parameters (Equation 7).
Equipartition between tlF and corresponding ribosomes
The optimal tlF concentrations (e.g. Equation 7) can also be intuitively derived from another viewpoint. For each reaction in the translation cycle, we can define an effective proteome fraction allocated to that process, combining the proteome fractions of the corresponding tlF and the ribosomes waiting at that specific step. As an example, for the case of peptide chain release factor (RFI) just treated, the effective proteome fraction includes the release factors and ribosomes with completed peptides waiting at stop codons (dashed box in Figure 2A), that is, ${\varphi}_{RFI}^{eff}:={\varphi}_{RFI}+{\varphi}_{ribo}^{stop}$. This effective proteome fraction corresponds to the total proteomic space associated to a tlF in the context of the translation cycle.
During steady-state growth, the concentration of ribosomes waiting at any specific step of the translation cycle is equal to the total active ribosome concentration multiplied by the ratio of the transit time of that step to the full cycle: for example, here ${\varphi}_{ribo}^{stop}=\frac{{\mathrm{\tau}}_{stop}}{{\mathrm{\tau}}_{tl}}{\varphi}_{ribo}^{act}$, where ${\mathrm{\tau}}_{stop}=1/({k}_{on}^{RFI}{\varphi}_{RFI})$ is the time to arrival of RFI. Using Equation 1 for ${\varphi}_{ribo}^{act}$, the effective proteome fraction satisfies:
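$$\begin{aligned}{\varphi}_{RFI}^{eff}&={\varphi}_{RFI}+{\varphi}_{ribo}^{stop}\\ &={\varphi}_{RFI}+\frac{{\mathrm{\ell}}_{ribo}}{\langle \ell \rangle}\frac{\lambda}{{k}_{on}^{RFI}{\varphi}_{RFI}}\\ &\ge 2\sqrt{\frac{{\mathrm{\ell}}_{ribo}}{\langle \ell \rangle}\frac{\lambda}{{k}_{on}^{RFI}}}.\end{aligned}\tag{8}$$
In the last line, we used the inequality of arithmetic and geometric means ($a+b\ge 2\sqrt{ab}$) to obtain the minimum of the effective proteome fraction. The equality holds when the two proteome fractions are equal (${\varphi}_{RFI}={\varphi}_{ribo}^{stop}$), which provides the solution for optimal ${\varphi}_{RFI}$:
$${\varphi}_{RFI}^{*}=\sqrt{\frac{{\mathrm{\ell}}_{ribo}}{\langle \ell \rangle}\frac{{\lambda}^{*}}{{k}_{on}^{RFI}}}.\tag{9}$$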
Hence, we recover Equation 7 by minimizing the effective proteome fraction allocated to a given process in the translation cycle (the above argument applies to the optimal free concentration in the non binding-limited regime, see Appendix 2, section Non binding-limited regime [one stop codon] for an example). From this perspective, optimization of the translation apparatus balances the production cost of the enzyme of interest against the improved efficiency of having fewer ribosomes idle at that step (Figure 2B). The optimal abundance in our model corresponds to a point of equipartition: the proteome fraction of free cognate factors equals the proteome fraction of ribosomes waiting at the corresponding step (Figure 2B).
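The equipartition argument can be illustrated with a small numerical scan. The sketch below assumes the binding-limited arrival time ${\mathrm{\tau}}_{stop}=1/({k}_{on}^{RFI}{\varphi}_{RFI})$ and a fixed growth rate, so that the fraction of ribosomes waiting at stop codons is $({\mathrm{\ell}}_{ribo}/\langle \ell \rangle)\,\lambda/({k}_{on}^{RFI}{\varphi}_{RFI})$. The growth rate and rate constant are hypothetical values.

```python
import math

L_RIBO, L_MEAN = 7300.0, 200.0  # protein size factors quoted in the text

def effective_fraction_parts(phi_rfi, lam, k_on):
    """Return (free factor fraction, fraction of ribosomes waiting at stop codons)."""
    phi_ribo_stop = (L_RIBO / L_MEAN) * lam / (k_on * phi_rfi)
    return phi_rfi, phi_ribo_stop

lam, k_on = 5e-4, 2e4  # hypothetical growth rate (1/s) and rescaled k_on

# Scan factor abundances from 1e-5 to 1e-2 and pick the smallest total.
grid = [i * 1e-5 for i in range(1, 1001)]
phi_best = min(grid, key=lambda p: sum(effective_fraction_parts(p, lam, k_on)))

# Analytical minimum from the arithmetic-geometric mean inequality.
phi_star = math.sqrt((L_RIBO / L_MEAN) * lam / k_on)
free, waiting = effective_fraction_parts(phi_star, lam, k_on)
print(phi_best, phi_star, free, waiting)
```

At the analytical minimum the two returned parts coincide: too little factor leaves many ribosomes idle at stop codons, while too much factor costs more proteome than the ribosomes it frees.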
Case study: Ternary complex and tRNA cycle (EF-Tu and aaRS)
We next consider a more complex step of the translation cycle – elongation – and demonstrate that the optimality criterion (Equation 5) can similarly provide simple analytical solutions in the physiologically relevant regime. Translation elongation involves multiple interlocked cycles (one for each chemical species) and enzymes (EF-Tu, EF-G, EF-Ts, aminoacyl-tRNA synthetases (aaRS), and more). Our simplified kinetic scheme for translation elongation is shown in Figure 3A: charged tRNAs are brought to ribosomes through a ternary complex (TC), corresponding to a bound tRNA and EF-Tu. Following tRNA delivery and GTP hydrolysis, EF-Tu is released from the ribosome, and nucleotide exchange factor EF-Ts recycles EF-Tu back into the active pool, after which EF-Tu can bind a charged tRNA again and form another TC. At the ribosome, translocation to the next codon is catalyzed by EF-G, followed by release of uncharged tRNAs. Aminoacyl-tRNA synthetases then charge tRNAs to complete the elongation cycle.
To reduce the complexity due to different tRNA isoacceptors and aaRSs, we self-consistently coarse-grained the translation elongation cycle to have a single codon (derived in Appendix 3, section Coarse-grained one-codon model). The resulting model harbors a single effective species for tRNA, aaRSs, and TCs, respectively. A rescaling factor ($1/{n}_{aa}\approx 1/20$, estimated in section Estimation of coarse-grained rates) arises in the procedure to decrease the rates of codon-specific reactions and can be attached to either the respective rate constants or chemical species concentrations. In our formulation, we choose to rescale the association rate constants such that the coarse-grained abundance for each effective species corresponds to the sum over all individual codon-specific components. For example, ${\varphi}_{aaRS}$ in our coarse-grained model corresponds to the summed proteome fraction of all aaRSs in the cell, and its association rate constant with the total tRNAs is rescaled by a factor of $1/{n}_{aa}$.
As a result of this choice of rescaling within our coarse-grained model, there are two classes of reactions in the elongation cycle that are distinguished by different kinetics: those that are codon-specific (scaled by $1/{n}_{aa}$) and those that are not. Codon-specific reactions, for example, aaRS binding to cognate tRNAs and TC binding to cognate codons, are coarse-grained into one-codon reactions with reduced association rate constants (marked by # in Figure 3A). By contrast, codon-agnostic reactions do not incur such a rescaling and are thus much faster. We refer to this as a separation of timescales between the two classes of reactions (codon-specific vs. codon-agnostic), and note that this is not a reflection of slower underlying microscopic bimolecular reaction rates, but rather a result of our choice of variable in the coarse-graining.
Similar to translation termination, the factor-dependent ribosome transit time through a single codon (${\mathrm{\tau}}_{aa}$) is comprised of two steps, corresponding to binding of the TC and EF-G, respectively (formal derivation and non binding-limited regime in Appendix 3, section Coarse-grained translation elongation time):
$${\mathrm{\tau}}_{aa}=\frac{1}{{n}_{aa}^{-1}{k}_{on}^{TC}{\varphi}_{TC}}+\frac{1}{{k}_{on}^{G}{\varphi}_{G}}.\tag{10}$$
The coarse-grained factor-dependent portion of the total translation elongation time in our model is then given by the single codon time above multiplied by the average number of codons per protein, that is, $\langle \ell \rangle{\mathrm{\tau}}_{aa}$. As discussed above, the rescaling of the TC association rate constant by ${n}_{aa}^{-1}$ arises as a result of our coarse-graining to a one-codon model (Appendix 3, section Coarse-grained one-codon model). Note that the ternary complex concentration, ${\varphi}_{TC}$, is a nonlinear function of the concentrations of all elongation factors (including ${\varphi}_{G}$).
Despite the complexity of ${\mathrm{\tau}}_{aa}$ as a function of the ${\varphi}_{tlF,i}$, the fact that all fluxes are equal in steady-state allows several steps to be isolated and solved separately (EF-Ts and EF-G, greyed out in Figure 3A, respectively solved in Appendix 3, sections Optimal EF-Ts abundance and Optimal EF-G abundance). For example, the approximate binding-limited solution for optimal EF-G concentration parallels that for termination factors:
$${\varphi}_{G}^{*}\approx \sqrt{\frac{{\mathrm{\ell}}_{ribo}{\lambda}^{*}}{{k}_{on}^{G}}}.\tag{11}$$
Importantly, the optimum for EF-G is larger than the optimum for RFs by a factor $\sqrt{\langle \ell \rangle}$, reflecting that the typical translation cycle to produce a protein requires $\langle \ell \rangle$ steps catalyzed by EF-G and only one step for RFs (i.e. $\langle \ell \rangle{\mathrm{\tau}}_{aa}$ enters the optimality condition, Equation 5, in contrast to ${\mathrm{\tau}}_{ter}$ which is not multiplied by a scaling factor). The square-root dependence arises here for the same reason as in the case of translation termination (derivative of ${\varphi}^{-1}$).
In contrast to EF-G and EF-Ts, EF-Tu and aaRS cannot a priori be treated in isolation because the TC is composed of both EF-Tu and charged tRNAs. Still, the separation of timescales within our coarse-grained model (see Appendix 3, section Interpretation of the sharp separation between aaRS and EF-Tu limited regimes) simplifies the solution considerably. Indeed, rapid binding of charged tRNAs to EF-Tu leads to either component being limiting for ternary complex concentration in most of the aaRS/EF-Tu expression space, leading to two clearly delineated regimes (Figure 3B). In one regime, charged tRNAs are limiting (low aaRS), whereas EF-Tu is limiting in the other (low EF-Tu). These regimes are separated by a narrow transition region, whose sharpness is a reflection of the smallness of the rate rescaling parameter ${n}_{aa}^{-1}$ (see Appendix 3, section Interpretation of the sharp separation between aaRS and EF-Tu limited regimes). We term the focal region separating the two regimes in the aaRS/EF-Tu expression space the ‘transition line’ (see Box 1 for derivation and additional details).
The transition line corresponds to conditions in which EF-Tu and aaRS are co-limiting for TC concentration. In the EF-Tu limited region, increasing aaRS abundance does not increase ternary complex concentration: since all EF-Tu proteins are already bound to charged tRNAs, increasing tRNA charging cannot further increase TC concentration. Conversely, in the aa-tRNA limited region, increasing EF-Tu abundance does not increase TC concentration: since all charged tRNAs are already bound by EF-Tu, increasing EF-Tu concentration does not alleviate the requirement for more charged tRNAs. Given that the optimality condition requires a non-zero increase in ternary complex concentration with increasing factor abundance (Equation 5 using ${\mathrm{\tau}}_{aa}$ from Equation 10), the optimal EF-Tu and aaRS abundances must be on the transition line.
Which point on the transition line corresponds to the optimum? Note that inside the EF-Tu limited region, the ternary complex concentration is entirely set by the total EF-Tu concentration: ${\varphi}_{TC}\approx {\varphi}_{Tu}$ (since most EF-Tu proteins are bound by charged tRNAs, Figure 3—figure supplement 1). As an approximation resulting from the narrow range of the transition region (Figure 3 and Figure 3—figure supplement 1), we assume that the EF-Tu limited regime solution ${\varphi}_{TC}\approx {\varphi}_{Tu}$ holds up to very close to the transition line. Replacing ${\varphi}_{TC}$ by ${\varphi}_{Tu}$ in the elongation time Equation 10 and substituting in the optimality condition (Equation 5), the approximate optimal abundance for EF-Tu (the full solution includes additional terms from the EF-Ts cycle, section Optimal EF-Tu and aaRS abundances) can then be obtained in the same way as for translation termination factors:
Importantly, compared to the solution for EFG, the above is multiplied by an additional factor of $\sqrt{{n}_{aa}}$. This contribution arises from the rescaling of the association rate for the ternary complex to the ribosome in our coarsegrained onecodon model, increasing the requirement on EFTu abundance.
From the necessity for the combined EFTu and aaRS solution to fall on the transition line, the approximate solution for the optimal aminoacyltRNA synthetase abundance is then the intersection (yellow star in Figure 3B) of the transition line with the EFTuonly solution described above (dashed blue line in Figure 3B, derivation of solution in Box 1).
For the above derivation to be valid, the total number of tRNAs in the cell must be sufficient to accommodate all ribosomes (about two per ribosome, A and Psites) and binding to all EFTu (about 4 or more per ribosome based on endogenous expression stoichiometry [Li et al., 2014; Lalanne et al., 2018]). The number of tRNAs per ribosome in the cell should thus be at least 6. Remarkably, estimates of this ratio in the cell suggest that this is barely the case (between 6 and 7 tRNAs/ribosome at fast growth [Dong et al., 1996]). Although our model treats the total tRNA abundance as a measured parameter and omits its selective pressure (see Hu et al., 2020 which includes RNA mass in their optimization procedure), the abundances of three core components of the tRNA cycle appear to be at the special point where the transition line plateau, which is set by total tRNA abundance, just crosses the EFTuonly optimum (blue line in Figure 3B). At this point, all three components are colimiting.
Box 1.
The EFTu and aaRS transition line.
Within our framework, optimality of translation factors is dictated by how coarsegrained ribosome transit times depend on factors’ abundances (Equation 4). For elongation factors aaRS and EFTu, contribution to the ribosome elongation time (${\mathrm{\tau}}_{el}=\langle \mathrm{\ell}\rangle {\mathrm{\tau}}_{aa}$) is through the concentration of the ternary complex (Equation 10). Obtaining the optimal EFTu and aaRS abundance therefore requires solving for the ternary complex concentration as a function of these two variables.
The steadystate solution for the ternary complex concentration in the aaRS/EFTu expression space displays two sharply separated regimes (Figure 3B), separated by a narrow transition region (the ‘transition line’). As described in the main text, the transition line plays a critical role for identifying the optimal EFTu and aaRS abundances within our model. Away from the line, there is an unproductive excess of either factor, viz. either $\partial {\varphi}_{TC}/\partial {\varphi}_{Tu}\approx 0$ or $\partial {\varphi}_{TC}/\partial {\varphi}_{aaRS}\approx 0$. Here, we derive the equation for the transition line. First, we leverage the constraint imposed by the conservation of tRNAs, which in our model is: $\displaystyle {\text{tRNA}}_{tot}=[{\text{R}}_{\mathrm{\varnothing}}]+\underbrace{2[{\text{R}}_{TC}]+2[{\text{R}}_{tRNA}]+2[{\text{R}}_{G}]}_{\propto \,\lambda /{k}_{el}^{max}}+[\text{tRNA}]+[\text{tRNA}:\text{aaRS}]+[\text{aatRNA}]+[\text{TC}].$
Above, ${\text{tRNA}}_{tot}$ corresponds to the total tRNA concentration in the cell. In addition: ${\text{R}}_{\mathrm{\varnothing}}$: elongating ribosomes with empty Asite, ${\text{R}}_{TC}$: ribosomes with bound TC, ${\text{R}}_{tRNA}$: ribosomes with filled Asite and no bound factor, ${\text{R}}_{G}$: ribosomes with bound EFG, tRNA: free uncharged tRNAs, $\text{tRNA}:\text{aaRS}$: tRNA and aaRS complex, aatRNA: free charged tRNAs, and TC: ternary complex. Here, we assume that the elongating ribosomes always have a tRNA in the Psite, and a negligible occupancy in the Esite.
Using the system of equations from the mass action scheme at steadystate (section Translation elongation: optimal solutions), variables in the tRNA conservation equation above can be solved for in terms of the total abundance of EFTu and aaRS, the growth rate, and the steadystate ternary complex concentration. We note that the three ribosome species with a filled A site (${\text{R}}_{TC}$, ${\text{R}}_{tRNA}$, and ${\text{R}}_{G}$) do not depend on EFTu concentration, and can be coarsegrained to a term proportional to $\lambda /{k}_{el}^{max}$, where ${k}_{el}^{max}$ is the maximal translation elongation rate (not including the TC diffusion contribution) (Dai et al., 2016). In the bindinglimited regime, converting to proteome fraction units, and leaving out the EFTs contribution without loss of generality (see section Optimal EFTu and aaRS abundances for a full treatment), we have:
Above, ${\psi}_{tRNA}$ is a normalized tRNA concentration (see Equation 28). We have explicitly highlighted that the growth rate is dependent on EFTu and aaRS only through the ternary complex concentration ${\varphi}_{TC}$. From the definition of the elongation time (Equation 10), we have $\lambda ({\varphi}_{TC})\propto {\varphi}_{TC}/({K}_{TC}+{\varphi}_{TC})$ (Klumpp et al., 2013; Dai et al., 2016) (definition of ${K}_{TC}$ in terms of model parameters: supplement, Equation 39). Equation 13 is closed and can be solved for ${\varphi}_{TC}$ at given abundances of EFTu (${\varphi}_{Tu}$) and aaRS (${\varphi}_{aaRS}$).
Although Equation 13 is nonlinear and cannot be solved exactly in general, the separation of timescales in our coarsegrained description simplifies the problem considerably. Indeed, numerical solutions of Equation 13 (Figure 3B, section Optimal EFTu and aaRS abundances) show that the behavior of TC concentration in the twodimensional EFTu/aaRS expression space is split into two distinct regimes, sharply delineated by a transition line (orange line in Figure 3B, a geometric heuristic explaining the sharp separation between the regimes is presented in Appendix 3, section Interpretation of the sharp separation between aaRS and EFTu limited regimes, Figure 3—figure supplement 1). Since TC concentration only increases as a function of both aaRS and EFTu on the transition line, the optimal solutions for the two factors must fall on it.
An expression for the transition line can be derived. Conceptually, the region of transition between the two regimes has both a low concentration of free EFTu molecules (${\varphi}_{T{u}^{GTP}}/{\varphi}_{Tu}\approx 0$) and a low concentration of free charged tRNAs ($[\text{aatRNAs}]/{\text{tRNA}}_{tot}\approx 0$). Although no values in the aaRS/EFTu expression plane can formally satisfy these two conditions simultaneously, the transition line is specified by setting the free charged tRNA term to 0 and replacing ${\varphi}_{TC}$ by ${\varphi}_{Tu}$ (no free EFTu) in Equation 13. We denote by $({\overline{\varphi}}_{Tu},{\overline{\varphi}}_{aaRS})$ points satisfying the resulting requirement, namely (see Equation 40 for non bindinglimited case):
where we have defined the excess tRNA (${\mathrm{\Delta}}_{tRNA}$) above. In words, ${\mathrm{\Delta}}_{tRNA}$ corresponds to the available tRNAs after the tRNAs sequestered on ribosomes and EFTu in the TC are subtracted from the total tRNA budget. At large aaRS concentrations, the transition line plateaus as a result of the finite total tRNA budget within the cell (Figure 3B, middle panel). The plateau is reached once all tRNAs are charged: the system is then no longer limited by aaRSs, but by the amount of tRNAs.
Using the requirement that the optimum must fall on the transition line and the approximate solution for the EFTu optimum, the approximate optimal solution for aaRS is, from Equation 14 (section Optimal EFTu and aaRS abundances for non bindinglimited solution):
Within our model, the optimal aaRS concentration is thus set by the excess tRNAs at the EFTu optimum (${\mathrm{\Delta}}_{tRNA}^{*}$).
Optimal stoichiometry of mRNA translation factors
Analogous to the case studies above, optimal concentrations for all core translation factors can be solved using the optimality condition (Equation 5) and their respective kinetics schemes (the case of translation initiation is solved in Appendix 4). The analytical forms of the optimal solutions are shown in Table 1. In the bindinglimited regime, the ratios of growthoptimized tlF concentrations are independent of the growth rate (except for aaRS), and are dependent only on basic biophysical parameters, such as protein sizes and diffusion constants.
To obtain the numerical values of association rate constants needed for calculating the optimal tlF stoichiometry (Table 1), we used the measured ${\widehat{k}}_{on}^{TC}$ in vivo and estimated all other association rate constants using a biophysically motivated scaling ($\widehat{k}$ denotes the raw association rate constant in units µM^{−1}s^{−1}, which is different from the rescaled $k$, see section Conversion between concentration and proteome fraction). To our knowledge, the binding between TC and ribosomes, ${\hat{k}}_{on}^{TC}=6.4$ µM^{−1}s^{−1} (Dai et al., 2016), is the only measured association rate constant for any tlFs in a physiological context. We estimate the association rate constants for other reactions by scaling ${\widehat{k}}_{on}^{TC}$ by the respective diffusion coefficients of the chemical species, that is, for a reaction involving species $A$ and $B$: ${\hat{k}}_{on}^{AB}/{\hat{k}}_{on}^{TC}=({D}_{A}+{D}_{B})/({D}_{TC}+{D}_{ribo})$, where ${D}_{i}$ is the diffusion constant for the molecular species $i$ (see Appendix 5—table 2). Diffusion constants for several tlFs have been measured experimentally (Bakshi et al., 2012; Sanamrad et al., 2014; Plochowietz et al., 2017; Volkov et al., 2018), and uncharacterized ones can be estimated using the cubicroot scaling with number of codons per protein from the StokesEinstein relation (Nenninger et al., 2010) (see Appendix 5—table 1). For simplicity, this approach assumes that reactive radii and orientational constraints are similar for the different reactions (see Discussion for additional assumptions). These strong assumptions are necessary given the lack of in vivo biochemical parameter measurements, and can be relaxed as refined empirical determinations of more physiological association rates become available in the future. 
Nonetheless, we note that the squareroot dependence on these parameters (Table 1) for our predictions makes the numerical values less sensitive to possible tlFspecific effects.
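The diffusion-based scaling of association rate constants described above can be sketched numerically. This is a minimal illustration and not the authors' code: the function names and the diffusion constants in the usage lines are placeholders chosen for illustration (only the value 6.4 µM^{−1}s^{−1} comes from the text), and the size rule assumes that the diffusion constant scales as the inverse cube root of the number of codons.

```python
K_ON_TC = 6.4  # measured TC-ribosome association rate, 1/(µM s) (Dai et al., 2016)


def scaled_on_rate(d_a, d_b, d_tc, d_ribo, k_on_tc=K_ON_TC):
    """Scale the measured TC-ribosome on-rate by relative diffusion constants.

    Implements k_on^AB / k_on^TC = (D_A + D_B) / (D_TC + D_ribo).
    All diffusion constants must share the same units (e.g. µm^2/s).
    """
    return k_on_tc * (d_a + d_b) / (d_tc + d_ribo)


def diffusion_from_size(d_ref, codons_ref, codons):
    """Estimate a diffusion constant from a reference protein.

    Assumption: Stokes-Einstein scaling with a radius proportional to the
    cube root of the number of codons, so D scales as codons**(-1/3).
    """
    return d_ref * (codons_ref / codons) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Placeholder values for illustration only (not taken from the paper).
    d_tc, d_ribo = 2.0, 0.1  # µm^2/s
    d_factor = diffusion_from_size(d_ref=2.0, codons_ref=400, codons=700)
    print(scaled_on_rate(d_factor, d_ribo, d_tc, d_ribo))
```

Because the optimal abundances depend on the square root of these rate constants, a twofold error in an estimated diffusion constant would shift the corresponding prediction by at most about 1.4-fold.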
The estimated optimal tlF concentrations show concordance with the observed ones, both in terms of the absolute levels and the stoichiometry among tlFs (Figure 4 for fast growth, see Supplementary file 1 for data and Figure 4—figure supplement 1 for additional growth conditions). A hierarchy of expression levels emerges such that the factors involved in elongation are more abundant compared to initiation and termination factors. The separation of these two classes is driven by the scaling factor $\sqrt{\langle \mathrm{\ell}\rangle}\approx 14$ in our analytical solutions, which reflects the fact that the flux for elongation factors is $\langle \mathrm{\ell}\rangle \approx 200$ times higher than that for initiation and termination factors. Within each class, the finer hierarchy of expression levels can also be further explained by simple parameters. For example, EFTu is predicted to be more abundant than EFG by a factor of $\sqrt{{n}_{aa}{\mathrm{\ell}}_{Tu}/{\mathrm{\ell}}_{G}}\approx 3.3$ (observed ${\varphi}_{Tu}/{\varphi}_{G}$: E. coli 3.9, B. subtilis 2.7, V. natriegens 3.3). A higher abundance is required for EFTu because it is bound to the different tRNAs, which effectively decreases the concentration by a factor of ${n}_{aa}\approx 20$ (see section Estimation of coarsegrained rates for derivation and discussion of why the factor is not equal to the number of different tRNAs). Taken together, our model offers straightforward explanations for the observed tlF stoichiometry.
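The two scaling factors quoted above can be checked with a few lines of arithmetic. In this sketch the protein lengths are assumptions that are not given in the text (394 and 704 amino acids, roughly the sizes of E. coli EFTu and EFG); the mean protein length of 200 codons and ${n}_{aa}\approx 20$ are the values used in the text.

```python
import math

MEAN_CODONS = 200   # <l>, average codons per protein (value used in the text)
N_AA = 20           # coarse-grained rescaling factor n_aa (value used in the text)
LEN_TU = 394        # assumed EFTu length in amino acids (not from the text)
LEN_G = 704         # assumed EFG length in amino acids (not from the text)

# Separation between elongation factors and initiation/termination factors.
class_separation = math.sqrt(MEAN_CODONS)

# Predicted EFTu-to-EFG proteome fraction ratio, sqrt(n_aa * l_Tu / l_G).
tu_over_g = math.sqrt(N_AA * LEN_TU / LEN_G)

if __name__ == "__main__":
    print(f"sqrt(<l>) = {class_separation:.1f}")
    print(f"predicted EFTu/EFG ratio = {tu_over_g:.1f}")
```

With these assumed lengths the ratio comes out near 3.3, in line with the value quoted in the text.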
For a few tlFs, the observed concentrations are two to fivefold higher than the predicted optimal levels (e.g. EFTs, RF4, and IF1 in Figure 4). A potential explanation is that the corresponding reactions may not be binding or diffusionlimited, which would lead to a nonnegligible fraction of tlFs sequestered at the catalytic step and thereby require higher total concentrations. Indeed, recent detailed modeling of the EFTs cycle (Hu et al., 2020) estimated that only a small fraction (6% to 48%) of its abundance was in the free form in the cell, consistent with the large deviation we observe for this factor from our diffusion only prediction. Our optimization model can also be solved analytically in the nonbindinglimited regime (Table 1), with the finite catalytic rate leading to an additional contribution of the form $\propto \mathrm{\ell}{\lambda}^{*}/{k}_{cat}$. However, the numerical values for these solutions are in general difficult to obtain because the estimates for catalytic rates are sparse and often inconsistent with estimates of kinetics in live cells. As an example, the median aaRS catalytic rate measured in vitro (Jeske et al., 2019) is ≈3 s^{−1}, well below the minimal value of 15 s^{−1} required to sustain translation flux at the measured value (Appendix 5), suggesting substantial deviation between in vitro and in vivo kinetics. While technically demanding, the fraction of free vs. bound factors can in principle be determined through live cell microscopy of tagged factors by partitioning the diffusive states of the tagged enzyme. Using that approach, Volkov et al., 2018 estimated that EFTu was in its bound state <10% of the time (consistent with our diffusionlimited prediction being close to the observed value for this factor).
Another potential explanation for the observed deviations from our predictions is that the selective pressure for these tlFs may be lower compared to the more highly expressed tlFs. This explanation is unlikely both because their stoichiometry is observed to be conserved (Figure 1B, Figure 4—figure supplement 2) and because the expression of other lowly expressed tlFs (e.g. RF1, RF2, and individual aaRSs) has been shown to acutely affect cell growth (Lalanne et al., 2021; Parker et al., 2020). Nevertheless, the deviations from the predicted optimal levels suggest that a more refined model may be required than our firstprinciples derivation.
Discussion
Despite the comprehensive characterization of their molecular mechanisms, the ‘mixology’ for the protein synthesis machineries inside living cells has remained elusive. Here, we establish a firstprinciples framework to provide analytical solutions for the growthoptimizing concentrations of translation factors. We find reasonable agreement between our parameterfree parsimonious predictions and the observed tlF stoichiometry (Figure 4). These results provide simple rationales for the hierarchy of expression levels, as well as insights into several construction principles for biological pathways.
An important implication from the agreement between observed stoichiometries and our predictions is that most tlFs are colimiting for growth. Previous models have focused on expression optimization for the full translation sector, ribosomes (Scott et al., 2010; Belliveau et al., 2021), and the abundant elongation factor EFTu (Ehrenberg and Kurland, 1984; Klumpp et al., 2013). In a recent study, Hu and colleagues considered additional RNA components and EFTs in their optimization procedure (Hu et al., 2020). In line with the conclusions of these previous studies, our results demonstrate that multiple components of the translation machinery, regardless of their observed expression level, are simultaneously colimiting for cell growth. By virtue of the interlocked translation cycles at steady state, the flux through every cycle must be matched. In our model, the optimality occurs when there are just enough tlFs to support the required flux in every cycle, such that the proteome fraction of free factors equals that of waiting ribosomes at that step (equipartition). If the concentration of any one tlF falls below the optimal point, it becomes the limiting factor for protein synthesis and growth. This result is supported by experimental evidence that slight knockdowns of individual RFs and aaRSs are detrimental to growth (Parker et al., 2020; Lalanne et al., 2021). Figuratively, the translation apparatus is analogous to a vulnerable supply chain, in which slowdown in any of the steps affects the full output.
In the bindinglimited regime, the optimal tlF stoichiometry is independent of the specific growth rate (except for aaRS). This is consistent with the observation that relative tlF expression remains unchanged in E. coli in conditions with doubling times ranging from 20 min to 2 hr (Lalanne et al., 2018; Li et al., 2014; Figure 4—figure supplement 2A).
Our results are also consistent with the maintenance of the relative tlF expression across large phylogenetic distances even though the underlying regulation and cellular physiology have diverged (Lalanne et al., 2018; Figure 1B, and additional comparison to slow growing C. crescentus in Figure 4—figure supplement 2A). Under the assumption of diffusionlimited association to estimate parameters, the optimal tlF stoichiometry depends only on simple biophysical parameters, including protein sizes and diffusion constants, that are likely conserved in distant species. It remains to be determined if similar biophysical principles apply to the other pathways that also exhibit conserved enzyme expression stoichiometry.
In principle, our model can also make predictions on the growth defects at suboptimal tlF concentrations. However, experimentally testing these predictions will be difficult due to secondary effects of gene regulation that are not considered in our model near optimality. For example, we have recently shown that small changes in RF levels lead to idiosyncratic induction of the general stress response in B. subtilis due to a single ultrasensitive stop codon (Lalanne et al., 2021). As a result, the growth defect not only arises from reduced translation flux, but is in fact dictated by spurious regulatory connections that are normally not activated when tlF expression is at the optimum. We propose that tlF expression may be set at the optimal levels as our firstprinciples model suggests but entrenched by connections in the regulatory network. To predict the full expressiontofitness landscape away from the optimum, a more comprehensive model may be required to take into account all the molecular interactions in the cell (Karr et al., 2012; Macklin et al., 2020).
Our coarsegraining approach has several limitations in its connection to detailed biochemical parameters. Foremost, coarsegrained association rate constants remain difficult to numerically estimate, and possibly neglect important features. In particular, given the sparsity of available in vivo rate constants, we estimate ${\widehat{k}}_{on}$ for all tlF reactions by scaling the measured TC association rate constant (${\widehat{k}}_{on}^{TC}$) by the respective diffusion coefficients. This approach generates more plausible values than the unrealistic overestimate from Smoluchowski theory (diffusionlimited rate for perfectly absorbing spheres, see Appendix 5). However, the simplifying assumptions that certain molecular properties of modeled reactions are similar (e.g. the size of the reactive surfaces, orientational constraints of the bimolecular interaction, and possible noncognate binding events) may have to be modified for more detailed models. We also do not explicitly consider offrates in our model. Instead, our parameters correspond to effective rate constants that account for possible sequential binding and unbinding events, that is, ${\tilde{k}}_{on}={k}_{on}/{n}_{bind}$, with ${n}_{bind}=({k}_{cat}+{k}_{off})/{k}_{cat}$ the mean number of binding events per productive (catalytic) event. The effective association rate constants in our model thus contain information about catalytic and possible proofreading steps, which could be tlFspecific and are challenging to estimate. All these effects may contribute to the discrepancy between our predicted and observed tlF concentrations. As more physiological and molecular data become available, these tlFspecific features could be used to individually refine our estimate for the association rate constants and our predictions. For example, elaborate calculations from structural data could account for rotational constraints (Schlosshauer and Baker, 2004), but are beyond the scope of the present work. 
Overall, we expect these tlFspecific corrections to be of limited influence on the final predictions due to the squareroot dependence of the optimal expression (Table 2). We further note that a number of conclusions from our model, such as the factor of $\sqrt{\langle \mathrm{\ell}\rangle}$ separating the optimal abundances of elongation from initiation/termination tlFs, are generic and do not depend on the specific association rates.
Taken together, our model provides the biophysical basis for the stoichiometry of translation factors in living cells. The firstprinciples approach complements more comprehensive models that include many biochemical parameters (Hu et al., 2020; Vieira et al., 2016), while providing intuitive rationales for the expression hierarchy. We anticipate that our approach will be generalizable to elucidate or design enzyme stoichiometry of other biological pathways, especially those whose activities are required for cell growth.
Materials and methods
Average number of codons per protein: $\langle \mathrm{\ell}\rangle$
We calculate the average number of codons per protein, weighted by expression, as
where ${\mathrm{\ell}}_{i}$ is the number of codons for the protein product of gene $i$, and ${e}_{i}$ is the protein synthesis rate (as estimated from ribosome profiling [Li et al., 2014; Lalanne et al., 2018]) for gene $i$. For a stable proteome (in fast growing bacteria, the cell doubling time is shorter than the active degradation time of most proteins [Larrabee et al., 1980]), the protein synthesis rate equals the proteome mass fraction (Li et al., 2014). Changes in the expression of genes across growth conditions do not lead to substantial changes in $\langle \mathrm{\ell}\rangle$. In E. coli, across growth conditions spanning ≈20 min doubling time to ≈120 min, $\langle \mathrm{\ell}\rangle$ changes by about 20%. Specifically, we find $\langle \mathrm{\ell}\rangle =$ 196, 210, and 240 in respectively MOPS complete (≈20 min doubling time [Li et al., 2014]), MOPS minimal (≈56 min doubling time [Li et al., 2014]), and NQ1390 forced glucose limitation (≈120 min doubling time [Mori et al., 2021]), based on ribosome profiling data. Here for simplicity, we take $\langle \mathrm{\ell}\rangle \approx 200$ throughout.
Conversion between concentration and proteome fraction
Throughout, we use both units of concentration (molar), denoted for example as $[A]$ for protein $A$, and proteome fraction, denoted by ${\varphi}_{A}$ (Scott et al., 2010). The correspondence between the two is ${\varphi}_{A}=[A]{\mathrm{\ell}}_{A}/P$, where ${\mathrm{\ell}}_{A}$ is the number of amino acids in protein $A$, and $P$ is the inprotein amino acid concentration in the cell. $P\approx 2.6\times {10}^{6}$ µM, and has a value approximately independent of growth rate (Klumpp et al., 2013; Bremer and Dennis, 2008). This change in units also relates to how association constants are defined in units of proteome fraction: ${\widehat{k}}_{on}[A]:={k}_{on}{\varphi}_{A}$, where the hat $\widehat{\cdot}$ refers to the association constant in usual units of µM^{−1} s^{−1} (used to connect to empirical data). Hence, ${k}_{on}:={\widehat{k}}_{on}P{\mathrm{\ell}}^{-1}$ is the rescaled association rate in units of proteome fraction.
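The unit conversions in this section can be written as three small helper functions. This is a minimal sketch: the function names are hypothetical, and the 400-amino-acid protein at 100 µM in the usage lines is an arbitrary illustration, not a measured value.

```python
P_AA = 2.6e6  # in-protein amino acid concentration P, in µM (value from the text)


def proteome_fraction(conc_um, n_aa, p=P_AA):
    """phi_A = [A] * l_A / P, with [A] in µM and l_A in amino acids."""
    return conc_um * n_aa / p


def concentration(phi, n_aa, p=P_AA):
    """Inverse conversion: [A] = phi_A * P / l_A, in µM."""
    return phi * p / n_aa


def rescaled_on_rate(k_hat, n_aa, p=P_AA):
    """k_on = k_hat * P / l, so that k_hat * [A] equals k_on * phi_A."""
    return k_hat * p / n_aa


if __name__ == "__main__":
    # Arbitrary illustrative protein: 400 amino acids at 100 µM.
    phi = proteome_fraction(100.0, 400)
    print(phi, concentration(phi, 400), rescaled_on_rate(6.4, 400))
```

The rescaled rate has units of inverse seconds, since proteome fractions are dimensionless.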
Equality of ribosome flux in steadystate
In steadystate exponential growth, the ribosome flux in and out of each intermediate state is equal to the total flux. This results from the fact that no ribosome can accumulate in any intermediate state. Since the flux out of state $i$ is given by ${\varphi}_{ribo}^{i}/{\mathrm{\tau}}_{i}$, we must have:
As a consequence, the proportion of ribosome in each state is equal to the proportion of time spent at that given step, for example for translation initiation:
Protein production flux and growth rate
In order to write the mass action kinetic scheme for more complex models, it is useful to recast our framework in terms of the protein number production flux $J$, defined as the number of full length proteins produced per cell volume per unit time. The production of each protein requires a ribosome to go through the full synthesis cycle, and as such $J$ provides a convenient quantity in mass action schemes formulated in molar units.
In steadystate of exponential growth (Monod, 1949; Scott et al., 2010; Dai et al., 2016), there is a direct relationship between the growth rate λ (defined through $\text{d}N/\text{d}t=\lambda N$, where $N$ is the number of cells per unit volume of culture) and the protein production flux $J$. Explicitly, the protein mass accumulation rate is $\lambda M$, where $M$ is the total protein mass per unit volume of culture. If $V$ is the mean cell volume, then $\lambda M/V=N{m}_{aa}\langle \mathrm{\ell}\rangle J$, where ${m}_{aa}$ is the mean amino acid mass. Defining $P:=M/({m}_{aa}NV)$, the inprotein amino acid concentration per cell (Materials and methods, section Conversion between concentration and proteome fraction), the connection between protein production flux $J$ and growth rate λ is then $J=\frac{P\lambda}{\langle \mathrm{\ell}\rangle}$. This relationship will be used to convert between molar and proteome fraction in some equations below.
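As a rough numerical illustration of the relation $J=P\lambda /\langle \mathrm{\ell}\rangle$, the sketch below evaluates the flux at a 20 min doubling time. The function names are hypothetical, and converting a doubling time to a growth rate through $\lambda =\mathrm{ln}2/{t}_{double}$ is a standard assumption rather than something stated in this section.

```python
import math

P_AA = 2.6e6        # in-protein amino acid concentration, µM (value from the text)
MEAN_CODONS = 200   # average codons per protein (value from the text)


def growth_rate(doubling_time_min):
    """Growth rate in 1/s, assuming lambda = ln(2) / doubling time."""
    return math.log(2) / (doubling_time_min * 60.0)


def protein_flux(lam, p=P_AA, mean_codons=MEAN_CODONS):
    """Protein production flux J = P * lambda / <l>, in µM of protein per second."""
    return p * lam / mean_codons


if __name__ == "__main__":
    lam = growth_rate(20.0)
    print(f"lambda = {lam:.2e} 1/s, J = {protein_flux(lam):.2f} µM/s")
```

Under these assumptions, a fast-growing cell completes on the order of a few micromolar of new protein chains per second.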
Summary of optimal solutions
Solutions for the predicted optimal factor abundances, as a function of effective biochemical parameters and the growth rate at the optimum, are presented in Table 1. The table breaks down terms in each solution by categories: direct diffusion term (arising from diffusive search time), catalytic sequestration, and delay incurred by the diffusion of other proteins in part of the cycle of the factor of interest. Solutions are listed in terms of onrate ${\widehat{k}}_{on}$ (units of µM^{−1}s^{−1}). The aaRS solution follows a different form:
Appendix 1
Coarsegrained transition times: models of ribosome traffic
Our coarsegrained model of ribosome transitions between categories of initiation, elongation, and termination needs to be distinguished from the individual molecular times of the respective steps in one important regard: ribosome traffic on mRNAs can lead to effective delays arising from transient queuing. For example, if translation termination is slow and ribosomes start to pile up and form queues upstream of stop codons on mRNAs, the molecular time of termination (time between ribosome arrival at the stop codon and its recycling to the free ribosome pool) will not be a correct reflection of the actual termination time of a ribosome, because of the additional wait time in the queue. A similar argument can be made for transient queuing forming in the body of genes for elongating ribosomes.
We connect these two (molecular and coarsegrained) levels of description by noting that our mass action schemes relating the translation factor abundance to the times of the specific steps can be used as input parameters in traffic models of ribosome movement along mRNAs taking into account possible manybody interactions (e.g. totally asymmetric exclusion processes [Shaw et al., 2003; Kavčič et al., 2020]). Solving these traffic models can then be used to obtain transition times in our coarsegrained translation cycle model. As we show below, corrections arising from transient queuing are small (for endogenous translation factor abundances) based on current estimates of the absolute rates of initiation, elongation, and termination on individual mRNAs, such that stochastic queuing does not play a dominant role in determining optimal translation factor expression levels.
As a first example, we relate the onstop codon molecular termination time ${\mathrm{\tau}}_{ter}$, which we obtain from solving our mass action scheme (see Equation 6), to the termination time in presence of queuing, ${\mathrm{\tau}}_{ter}^{full}$. The difference between the two, as described above, arises from possible queues upstream of stop codons, which further delay translation termination and thus lead to a longer termination time than that of the molecular onstop codon termination. The delay factor will be denoted $\mathcal{Q}\left({\mathrm{\tau}}_{ter}\right)$, defined through:
To derive the expression for the $\mathcal{Q}$ factor, note that in steadystate, the number of ribosomes in a given state is directly proportional to the time to transition out of that state. Let ${m}_{i}$ be the mRNA concentration for gene $i$ in the cell, and ${n}_{ter}({\alpha}_{i},{\mathrm{\tau}}_{ter})$ the number of terminating ribosomes (including queues if present) on a transcript with per mRNA translation initiation rate (i.e. translation efficiency [Li, 2015]) ${\alpha}_{i}$, then:
whereas
with ${n}_{ter}^{\mathrm{\varnothing}\mathcal{\mathcal{Q}}}({\alpha}_{i},{\tau}_{ter})$ the average number of terminating ribosomes on a transcript with translation efficiency ${\alpha}_{i}$, assuming no queue upstream of the stop codon. Note that ${n}_{ter}({\alpha}_{i},{\tau}_{ter})\ge {n}_{ter}^{\mathrm{\varnothing}\mathcal{\mathcal{Q}}}({\alpha}_{i},{\tau}_{ter})$ (the differences being queued ribosomes). Hence, the queuing factor $\mathcal{Q}$ is:
Formally, ${n}_{ter}$ can be obtained by solving a TASEP model (Shaw et al., 2003), but a simplified queue model (Bergmann and Lodish, 1979; Lalanne et al., 2021) disregarding spatial information recapitulates the statistics of queue formation (as verified by full stochastic simulations, data not shown). The state space of the queue model is the number of ribosomes $N$ in the queue. Ribosomes arrive at a rate α (initiation rate on the transcript), and leave at the molecular termination rate ${\mathrm{\tau}}_{ter}^{-1}$. The ribosome arrival rate at the queue is rigorously correct in steadystate, unless the queue becomes large enough to affect the initiation process (fully jammed transcript), or RNA degradation. The stochastic process (away from the jammed state) is then described by: $N\to N+1$ at rate α, and $N\to N-1$ at rate ${\mathrm{\tau}}_{ter}^{-1}$ for $N>0$. The probability for the queue to have $N$ ribosomes, $P(N)$, can be obtained as the steadystate from the resulting master equation, leading to a geometric distribution: $P(N)={\left(\alpha {\mathrm{\tau}}_{ter}\right)}^{N}\left(1-\alpha {\mathrm{\tau}}_{ter}\right)$. Hence, the prevalence of higher order queues scales as the ratio of the initiation to termination rate on the transcript. The average queue size, corresponding to ${n}_{ter}({\alpha}_{i},{\mathrm{\tau}}_{ter})$, is:
Above, the solution of the simple model is truncated at the value where the transcript becomes fully jammed with ${\mathrm{\ell}}_{i}/{\mathrm{\ell}}_{footprint}$ ribosomes (${\mathrm{\ell}}_{i}$ and ${\mathrm{\ell}}_{footprint}$ being the size of gene $i$ and the size occupied by a ribosome respectively). The no queue ribosome number is simply that of a model where queues with $N>1$ do not arise, hence ${n}_{ter}^{\mathrm{\varnothing}\mathcal{\mathcal{Q}}}({\alpha}_{i},{\tau}_{ter})={\alpha}_{i}{\tau}_{ter}$. Therefore, the queuing factor, under the stated assumptions (and assuming no transcript is in the jammed state), is
Expanding for fast termination gives $\mathcal{Q}-1=\frac{{\mathrm{\tau}}_{ter}\langle {\alpha}^{2}\rangle}{\langle \alpha \rangle}$ as the leading order correction, where the averages are weighted by mRNA levels. The above was derived assuming exponentially distributed initiation and termination times, but could be modified to account for more complex dynamics of the initiation and termination steps.
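The simplified queue model lends itself to a short numerical check. The sketch below assumes the geometric distribution $P(N)$ stated in the text, whose mean is $\alpha {\mathrm{\tau}}_{ter}/(1-\alpha {\mathrm{\tau}}_{ter})$, ignores the truncation at the jammed state, and takes the queuing factor as the mRNA-weighted ratio of the mean queue size to the no-queue value ${\alpha}_{i}{\mathrm{\tau}}_{ter}$. Function names and the transcript parameters in the usage lines are hypothetical.

```python
def queue_pmf(n, alpha, tau_ter):
    """P(N = n) = (alpha * tau_ter)**n * (1 - alpha * tau_ter), for alpha * tau_ter < 1."""
    rho = alpha * tau_ter
    if not 0.0 <= rho < 1.0:
        raise ValueError("requires 0 <= alpha * tau_ter < 1 (no jammed transcript)")
    return rho ** n * (1.0 - rho)


def mean_queue(alpha, tau_ter):
    """Mean of the geometric distribution above: rho / (1 - rho)."""
    rho = alpha * tau_ter
    if not 0.0 <= rho < 1.0:
        raise ValueError("requires 0 <= alpha * tau_ter < 1 (no jammed transcript)")
    return rho / (1.0 - rho)


def queuing_factor(alphas, mrna_levels, tau_ter):
    """mRNA-weighted ratio of terminating ribosomes with and without queues."""
    with_queue = sum(m * mean_queue(a, tau_ter) for a, m in zip(alphas, mrna_levels))
    no_queue = sum(m * a * tau_ter for a, m in zip(alphas, mrna_levels))
    return with_queue / no_queue


def queuing_factor_leading_order(alphas, mrna_levels, tau_ter):
    """Leading-order form 1 + tau_ter * <alpha^2> / <alpha> (mRNA-weighted averages)."""
    total = sum(mrna_levels)
    mean_a = sum(m * a for a, m in zip(alphas, mrna_levels)) / total
    mean_a2 = sum(m * a * a for a, m in zip(alphas, mrna_levels)) / total
    return 1.0 + tau_ter * mean_a2 / mean_a


if __name__ == "__main__":
    # Hypothetical transcripts: initiation rates (1/s) and relative mRNA levels.
    alphas, mrna = [0.2, 0.5, 1.0], [3.0, 2.0, 1.0]
    print(queuing_factor(alphas, mrna, tau_ter=0.05))
    print(queuing_factor_leading_order(alphas, mrna, tau_ter=0.05))
```

For these illustrative parameters the full and leading-order factors both exceed 1 by only a few percent, which is the regime the expansion is meant to describe.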
The queuing factor can be estimated based on absolute measurements of the initiation and termination rates in cells. Kennell and Riezman, 1977 estimate 3.2 s between initiation events on the lacZ mRNA (at 48 min per cell doubling). Bremer and Dennis, 2008 estimate 1 s per ribosome initiation event at 20 min doubling time. Recent calibrated high-throughput measurements report a genome-wide median of 5.6 s per initiation event (Gorochowski et al., 2019). To our knowledge, absolute in vivo termination rates have not been measured, but we can estimate bounds. Indirect assessment based on steady-state protein production measurements places the fraction of actively elongating ribosomes at about 95% (Dai et al., 2016). Assuming (upper bound) that the 5% of non-elongating ribosomes are all in the process of termination would give a termination time of $5\mathrm{\%}\times 11.1\,\mathrm{s}\approx 0.6\,\mathrm{s}$ (the fraction of ribosomes in a given state equals the ratio of transition times), where we have used that the elongation time of an average protein is about 11.1 s ($200/18\,{\mathrm{s}}^{-1}$) at fast growth (Dai et al., 2016). This upper bound is still much smaller than the reported median initiation time, suggesting that the queuing factor for termination is small. As additional support for the view that translation is far from being termination limited, queues at stop codons are only globally observed in ribosome profiling upon severe perturbations (Kavčič et al., 2020; Baggett et al., 2017; Mangano et al., 2020; Saito et al., 2020; Lalanne et al., 2021).
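The arithmetic behind this bound can be reproduced in a few lines. This is a sketch using only the numbers quoted in the paragraph above; the variable names are ours, and, as in the text, the 5% non-elongating fraction is multiplied directly by the elongation time.

```python
mean_protein_length = 200       # codons, average protein
elongation_rate = 18.0          # codons per second at fast growth
elongating_fraction = 0.95      # fraction of ribosomes actively elongating
median_initiation_time = 5.6    # seconds between initiation events (median)

# Time to elongate an average protein (about 11.1 s).
elongation_time = mean_protein_length / elongation_rate

# Upper bound on the termination time, assuming every non-elongating
# ribosome is terminating (about 0.6 s).
tau_ter_upper = (1.0 - elongating_fraction) * elongation_time

# Ratio of termination to initiation time, which sets the scale of the
# queuing correction; it stays well below 1 even at the upper bound.
alpha_tau_ter = tau_ter_upper / median_initiation_time

print(f"elongation time ~ {elongation_time:.1f} s")
print(f"termination time upper bound ~ {tau_ter_upper:.2f} s")
print(f"alpha * tau_ter (upper bound) ~ {alpha_tau_ter:.2f}")
```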
With regard to translation elongation, transient queuing in the body of a gene can also lead to a difference between molecular and coarse-grained transition times in our model. However, the fraction of ribosomes transiently stalled due to this queuing scales as $\alpha {\mathrm{\tau}}_{aa}$ in the low-density phase of the TASEP model (defined by the requirements $\alpha {\mathrm{\tau}}_{ter}<1$ and $\alpha {\mathrm{\tau}}_{aa}<{(1+\sqrt{{\mathrm{\ell}}_{footprint}})}^{-1}\approx 0.25$) (Shaw et al., 2003). Since measured estimates place $\alpha {\mathrm{\tau}}_{aa}\sim 0.01$ (Dai et al., 2016; Gorochowski et al., 2019), we do not consider the queuing effect for elongating ribosomes within our optimization framework for elongation factor abundances.
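As a rough check of the low-density criterion, the sketch below combines the median initiation time and elongation rate quoted earlier in this appendix. The ribosome footprint of 10 codons is our assumption (the source only gives the resulting threshold of about 0.25), and the variable names are illustrative.

```python
import math

median_initiation_time = 5.6   # s per initiation event (median, quoted above)
elongation_rate = 18.0         # codons per second (quoted above)
footprint_codons = 10          # ASSUMPTION: ribosome footprint in codons

alpha = 1.0 / median_initiation_time   # initiation rate (1/s)
tau_aa = 1.0 / elongation_rate         # time per elongation step (s)

alpha_tau_aa = alpha * tau_aa
low_density_threshold = 1.0 / (1.0 + math.sqrt(footprint_codons))

print(f"alpha * tau_aa ~ {alpha_tau_aa:.3f}")
print(f"low-density threshold ~ {low_density_threshold:.2f}")
print("low-density phase:", alpha_tau_aa < low_density_threshold)
```

Under these assumptions the product $\alpha \tau_{aa}$ sits more than an order of magnitude below the threshold, consistent with neglecting elongation queues.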
Appendix 2
Translation termination
Omitted molecular details
The kinetic scheme presented in Figure 2A does not include some known molecular details of translation termination. For example, the GTPase RF3 has been shown to catalyze the release of RF1/RF2 after peptide hydrolysis and to effectively prevent rebinding to ribosomes with an empty A site and no peptide (Pavlov et al., 1997). RF3 is not included in our model given our desire for a parsimonious description and the absence of identifiable homologs in multiple bacteria (e.g. B. subtilis) (Margus et al., 2007). Our scheme aggregates the RF1/RF2 recycling rate with the catalytic rate, and further assumes a unidirectional reaction without rebinding (consistent with a lower bound), effectively taking into account the action of RF3. In addition, translocation factor EFG is known to be implicated in ribosome recycling via translocation after RF4 binding (Zavialov et al., 2005). We assume EFG’s abundance requirement toward the function of termination to be a minor fraction of its total requirement (ratio of nonsense to sense codons ≈0.5%) and to be non-limiting for this step. We thus coarse-grain EFG’s role in ribosome recycling through an effective catalytic rate for RF4 (see Borg et al., 2016 for details of EFG’s involvement in ribosome recycling). As another example of simplification in our coarse-graining, we do not explicitly model RF1/RF2’s post-translational modification by the methyltransferase PrmC (Mora et al., 2007). Thus, the activity of the RFs within our description corresponds to the average within a possibly heterogeneous pool of modified and unmodified factors in the cell.
Non-binding-limited regime (one stop codon)
If translation termination is not diffusion limited, terms corresponding to the finite catalytic times must be included in addition to the diffusive contributions in the termination time (Equation 6). Under our simplified scheme (Figure 2A) and with a single stop codon (grouping RF1 and RF2), the molecular termination time is then the sum of four separate times corresponding to distinct events:
The two novelties compared to the diffusion-limited regime (Equation 6) are: (1) the addition of the catalytic times ${k}_{cat}^{-1}$ for the two steps, and importantly (2) the mass action diffusion terms now involve the free concentration of release factors. Generally, the free concentration of the tlFs can be obtained by solving for the steady state of the kinetic scheme under the constraints imposed by conservation equations. The examples in, e.g., sections B.3, C.3, and D.1 below provide the mathematical details associated with the procedure.
Here, the difference between the total and free concentration of release factor arises from the finite catalytic turnover of the enzymes, and corresponds to the concentration of ribosome-bound release factors. Given the flux $J$ through the system in steady-state growth, the concentration of ribosome-bound release factor (e.g. for RF4) is $J/{k}_{cat}^{RF4}$, which becomes $\frac{{\mathrm{\ell}}_{RF4}\lambda}{\langle \mathrm{\ell}\rangle {k}_{cat}^{RF4}}$ upon converting to proteome fraction. This quantity sets the absolute minimum for the release factor abundance necessary to sustain growth rate λ for a given ${k}_{cat}$. The free concentrations for the release factors are then:
Hence, the final solution for the steady-state termination time as a function of the total abundance of the release factors and growth rate is:
The relationship above, between termination time, total tlF abundance, and growth rate λ closes the solution of the kinetic scheme. Substituting the above in the optimality condition (Equation 5) leads to the solution:
The additional terms $\propto {\lambda}^{*}$ correspond to the contribution to the optimal abundance arising from the finite catalytic rates, not present in the diffusion-limited regime (Equation 7).
Full three stop codons model
The full model with three different stop codons (UAA, UGA, UAG) and RF1/RF2 with different specificities (RF1: UAA, UAG; RF2: UAA, UGA) can also be solved exactly, leading to a small correction on the summed optimal abundance for RF1 and RF2 of $\sqrt{1+2\sqrt{{f}_{UAG}{f}_{UGA}}}\lesssim 1.05$ (fast-growing species considered, where ${f}_{UAG}$ and ${f}_{UGA}$ are the fractional fluxes through the RF1 and RF2 stop codons, respectively) compared to the single stop codon optimum derived above (${\varphi}_{RFI}^{*}$, Equation 20). We provide details below. With three stop codons, the coarse-grained reaction scheme is shown in Appendix 2—figure 1. The relevant chemical species and parameters are listed in Appendix 2—table 1.
The corresponding mass action system of equations for peptide release:
And for ribosome recycling:
The conservation equations for RF1, RF2 and RF4 are:
With a more complex scheme such as the one above, the optimization problem can be solved in three steps. First, we obtain the steady-state concentration of the chemical species. Second, we determine the effective coarse-grained termination time. Finally, the optimal abundance is found by substituting the termination time in the optimality condition (Equation 5), and solving the resulting system of equations.
Steady-state concentrations for RFs
Note that the RF1/RF2 and RF4 equations completely decouple, and that the solution for RF4 is identical to the one stop codon case solved above (section Non-binding-limited regime [one stop codon]). For peptide chain release, the steady state of the system can be solved by expressing all chemical species in terms of $[RF1]$ and $[RF2]$:
Substituting these in the conservation equations for RF1 and RF2 leads to a closed system in terms of $[RF1]$ and $[RF2]$:
Under the assumption of identical biochemical properties for RF1 and RF2, namely ${k}_{cat}^{RF1}={k}_{cat}^{RF2}:={k}_{cat}^{RFI}$ and ${\widehat{k}}_{on}^{RF1}={\widehat{k}}_{on}^{RF2}:={\widehat{k}}_{on}^{RFI}$, the total free concentration of RF1 and RF2 simplifies to: $[RF1]+[RF2]=RF{1}_{tot}+RF{2}_{tot}-\frac{J}{{k}_{cat}^{RFI}}$, where we used ${f}_{UAA}+{f}_{UAG}+{f}_{UGA}=1$ (by definition). Using this relation to eliminate $[RF2]$ from the $[RF1]$ equation (and vice versa), we obtain, upon conversion to proteome fraction:
where
These constitute the steady-state solutions of the system of equations.
Coarse-grained translation termination time
To determine the optimal RF abundance, we need an expression for the termination time (peptide release portion) to substitute in Equation 5. The peptide chain release contribution arises from the ribosome-containing species listed in Equation 21, which sum to (under the assumption of identical biochemical properties for RF1/RF2):
Upon conversion to proteome fraction, the above becomes:
The bracketed term corresponds to the coarse-grained time associated with peptide chain release ${\mathrm{\tau}}_{pep}$, and the free concentrations are given by Equations 22.
Optimal abundances for RF1/RF2
The solved concentrations in steady state (as a function of proteome fractions) and coarse-grained times allow us to determine the optimal RF1 and RF2 solutions (within our model). The optimality condition (Equation 5) is now:
Solving the above system leads to optima ${\varphi}_{RF1}^{*}$ and ${\varphi}_{RF2}^{*}$:
where the new factor $\delta :=2\sqrt{{f}_{UAG}{f}_{UGA}}$.
The relative flux through each stop codon (${f}_{UAA},{f}_{UAG},{f}_{UGA}$) can be estimated in a variety of bacteria from ribosome profiling data (Lalanne et al., 2018) as the total synthesis fraction of genes with the respective stop codon. For the fast-growing species considered in the current study, ${f}_{UAA}\approx 0.9$, and the correction term to the optimal solution for the summed abundance of RF1 and RF2 ($\sqrt{1+\delta}$) is consequently small (E. coli: ${f}_{UAA}=0.888$, ${f}_{UAG}=0.015$, ${f}_{UGA}=0.097$, $\sqrt{1+\delta}=1.04$; B. subtilis: ${f}_{UAA}=0.888$, ${f}_{UAG}=0.064$, ${f}_{UGA}=0.049$, $\sqrt{1+\delta}=1.05$; V. natriegens: ${f}_{UAA}=0.929$, ${f}_{UAG}=0.041$, ${f}_{UGA}=0.031$, $\sqrt{1+\delta}=1.04$).
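The correction factors quoted above follow directly from the stop codon fluxes. The sketch below recomputes $\sqrt{1+\delta}$ with $\delta = 2\sqrt{f_{UAG} f_{UGA}}$ from the listed values; the dictionary layout and function name are ours, introduced only for illustration.

```python
import math

# Fractional fluxes through each stop codon, as listed in the text.
stop_codon_flux = {
    "E. coli": {"UAA": 0.888, "UAG": 0.015, "UGA": 0.097},
    "B. subtilis": {"UAA": 0.888, "UAG": 0.064, "UGA": 0.049},
    "V. natriegens": {"UAA": 0.929, "UAG": 0.041, "UGA": 0.031},
}


def rf_correction(flux):
    """Return sqrt(1 + delta), with delta = 2 * sqrt(f_UAG * f_UGA)."""
    delta = 2.0 * math.sqrt(flux["UAG"] * flux["UGA"])
    return math.sqrt(1.0 + delta)


corrections = {name: rf_correction(f) for name, f in stop_codon_flux.items()}
for name, value in corrections.items():
    print(f"{name}: sqrt(1 + delta) = {value:.2f}")
```

Because UAA carries roughly 90% of the termination flux in all three species, the product $f_{UAG} f_{UGA}$ is small and the correction to the single stop codon optimum stays at the level of a few percent.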
Appendix 3
Translation elongation
Coarse-grained one-codon model
Translation elongation is a more complicated process than termination, involving multiple factors to bring the charged tRNA to the ribosome (EFTu), charge the tRNAs (aaRS), translocate the ribosome (EFG), and perform nucleotide exchange on EFTu to drive the process (EFTs), in addition to others not included here. Our simplified kinetic scheme is illustrated in Appendix 3—figure 1. In anticipation of the coarse-graining procedure detailed below, rates rescaled in the conversion to a one-codon model are marked by *.
To simplify our model, we coarse-grain the elongation cycle by considering a single codon type (see section Estimation of coarse-grained rates below for details of the coarse-graining procedure), effectively grouping the tRNAs, tRNA synthetases, and different ternary complexes into single entities. Importantly, as a result, the on-rates associated with these processes are rescaled by a factor close to ${n}_{aa}^{-1}$, where ${n}_{aa}=20$.
An important distinction for elongation compared to initiation and termination is that multiple elongation steps (average $\langle \mathrm{\ell}\rangle \approx 200$) are required to generate a protein. Hence, the flux through the elongation cycle is $\langle \mathrm{\ell}\rangle$-fold larger than that through the initiation and termination steps (there is one initiation and one termination event for each protein made, but about 200 elongation steps on average).
The mass action reaction scheme for translation elongation:
To arrive at the above, we started with a full model of translation (not shown), with all possible codons, tRNA species, and ribosomes with different codons. To coarse-grain the model, we introduced the following effective variables, which correspond to the total concentration of each type of species involved, summed over the codon/amino acid specificity:
In the above, Greek indices correspond to different codons on mRNAs, and Roman indices to different tRNAs. Roman indices with a hat ($\widehat{i}$) correspond to tRNA synthetases recognizing specific tRNAs (multiple amino acids have more than one tRNA isoacceptor). In defining these coarse-grained species (our approach is analogous to that of Dai et al., 2016), we redefined the following two kinetic parameters:
${\widehat{k}}_{on}^{aaRS}$ and ${\widehat{k}}_{on}^{TC}$ correspond to the microscopic bimolecular rates (assumed equal for the different chemical species). ${S}_{\nu ,j}$ is the tRNA isoacceptor/codon specificity matrix (1 if tRNA $j$ can recognize codon ν, 0 otherwise) (Björk and Hagervall, 2014). The rescaling terms $n_{1}$ and $n_{2}$ are estimated below.
Estimation of coarse-grained rates
The definition of the coarse-grained parameters (Equations 26) involves sums:
These can be estimated from tRNA abundances, codon usage and individual synthetases’ levels obtained from ribosome profiling data in E. coli (Li et al., 2014).
We first consider $n_{1}$. Note that the ratio of free tRNA of type $i$ to the total free tRNA (not bound to any protein) is not readily measurable. Assuming similarities between types of tRNAs, we approximate this fraction by the ratio of total tRNA of type $i$ to the total tRNA concentration, or
The total tRNA concentration has been measured at fast growth for E. coli (Dong et al., 1996). The relative concentration of each tRNA synthetase (appropriately corrected for stoichiometry for the different classes) can be computed from the ribosome profiling data (Li et al., 2014), and we obtain
This was to be expected since the synthetases in E. coli show little variability around their mean, and in the case of equal synthetase concentration, ${n}_{1}=20$ would strictly hold.
For the second sum ($n_{2}$), we use the distribution of ribosome footprint reads across the transcriptome to estimate ribosome occupancies at different codons. We first make the following approximation for one of the sub-sums:
where ${N}_{\mu \nu}^{FP}$ is the total number of ribosome footprint reads at codon pairs $\mu ,\nu $ and ${N}_{tot}^{FP}$ is the total number of footprint reads mapping to coding sequences. The nature of the approximation is that we take the relative fraction of ribosome footprints at a given codon pair (representing ribosomes across the elongation cycle at that codon pair) to be equal to the relative fraction of ribosomes waiting for the ternary complex to deliver a tRNA to the A site. The modest differences in elongation rates at different codons seen in ribosome profiling data (Mohammad et al., 2019) justify this approximation.
From our data (not shown), we have that
holds to better than 0.5% for each codon. ${f}_{\nu}$ above is the (expression-weighted) codon usage. As before with the free tRNA concentrations, we can approximate the relative ternary complex concentrations by the corresponding total tRNA concentrations:
We used the same dataset as before for the total tRNA concentration in E. coli (Dong et al., 1996). The codon usage was determined directly from ribosome profiling data (Li et al., 2014). The sum of these products is graphically represented in Appendix 3—figure 2. The above sum of products of tRNA fraction and codon usage provides an effective number of different ternary complexes. A priori, that might have been expected to equal the number of tRNAs (≈40). However, as is apparent in Appendix 3—figure 2, certain tRNA-codon pairs are much more prevalent than others (even for amino acids with multiple codons and/or tRNA isoacceptors), which leads to a decrease in the effective concentration. The exact value depends on the detailed codon usage and tRNA abundance.
Given the results above, we take for simplicity ${n}_{1}={n}_{2}={n}_{aa}=20$.
Translation elongation: optimal solutions
The mass action reactions corresponding to the one-codon elongation cycle model are (Equations 25):
Conservation equations close the system:
The ternary complex concentration and free EFG concentration enter the translation elongation time (Equation 10, which is the diffusion-limited and factor-dependent contribution to the elongation time) and are required to infer optimal abundances of elongation factors. Both can be obtained by solving the system of nonlinear equations above.
First, the rate of each catalytic step must equal the flux through the system in steady state, and thus:
Together with the conservation equations, these allow for immediate solutions for the free concentrations $[\text{Ts}]$, $[\text{aaRS}]$, and $[\text{G}]$:
The solution for other species can then also be obtained in terms of $[{\text{Tu}}^{\text{GTP}}]$ and $[\text{TC}]$:
Substituting these in the conservation equations for tRNAs and EFTu leads to the final system to solve (converting to proteome fraction):
where the solution for ${\varphi}_{T{u}^{\text{GTP}}}$ in terms of the ternary complex concentration was obtained from the conservation equation for EFTu. Equations 28 and 29 are closed, and the only variable to solve for is ${\varphi}_{TC}$ in terms of the tlF abundances (${\varphi}_{Tu},{\varphi}_{Ts},{\varphi}_{G},{\varphi}_{aaRS}$), tRNA abundances, kinetic parameters, and the growth rate λ.
Coarse-grained translation elongation time
In order to obtain the coarse-grained translation elongation time, we proceed as for translation termination (section Coarse-grained translation termination time). The summed concentration of the ribosome-containing species for translation elongation in our model is:
Converting to proteome fraction:
From the coarse-grained flux relations through the different categories (Equation 17), which define the coarse-grained transition times, we thus have:
Above, ${\mathrm{\tau}}_{aa}$ is the effective time for a single step (by one codon) of translation elongation, and ${\mathrm{\tau}}_{ind}$ corresponds to the summed time of factor-independent transitions in each elongation step (not explicitly included in the kinetic scheme).
Optimality conditions for translation elongation factors
The optimality condition (Equation 5) applied to translation elongation factors leads to:
where Equation 30 was used for ${\mathrm{\tau}}_{aa}$. Since the free EFG concentration does not depend on EFTu, EFTs, or aaRS concentration, the conditions for EFTu, EFTs and aaRS simplify to:
Carrying through the differentiation also leads to conditions on the derivatives of the ternary complex concentration at the optimum:
These relationships will be useful to solve for the optimal abundances of some elongation factors below.
Optimal EFTs abundance
Differentiating Equation 28 with respect to ${\varphi}_{Tu}$ and ${\varphi}_{Ts}$, we get at the optimum:
By Equation 33, the above leads to the additional condition at the optimum:
Directly differentiating Equation 29, and using Equation 33, leads to:
Therefore, the optimal abundance for EFTs is:
Optimal EFG abundance
The optimality condition for EFG is complicated by the fact that the free EFG concentration appears in the solution for the steady-state ternary complex concentration through the tRNA conservation equation (Equation 28). Differentiating the tRNA conservation equation and using the optimality condition (Equation 31; replacing a number of terms with the elongation time ${\mathrm{\tau}}_{aa}$, Equation 30), we obtain:
Above, the right-hand portion corresponds to the additional constraint coming from the implication of EFG in the steady-state concentration of the ternary complex. From the equation for ${\varphi}_{T{u}^{\text{GTP}}}$ (Equation 29), we have directly:
Substituting this in Equation 35:
The derivative of the ternary complex with respect to EFG at the optimum can be obtained from the original optimality condition 31, by carrying through the differentiation:
Substituting in Equation 36, we arrive at a final equation for EFG in terms of the concentration of other elongation factors and the optimal growth rate:
The optimal solution for EFG is thus:
Note that the term $\mathrm{\Delta}$ involves ${\varphi}_{TC}^{*}$ and ${\varphi}_{T{u}^{\text{GTP}}}^{*}$, so the solution above is not a priori complete. However, using the approximate ternary complex concentration at the optimum (Equation 12, derived in detail in section Optimal EFTu and aaRS abundances), we have:
This means that the lower bound for ${\varphi}_{G}^{*}$ above (Equation 37) is a good approximation: in the physiological regime, we can approximately neglect the indirect dependence of the ternary complex concentration on EFG via the tRNA conservation equation. Hence, the approximate solution for the EFG optimal abundance is (the same as if we had initially assumed that ${\varphi}_{TC}$ was independent of ${\varphi}_{G}$, in which case the solution for EFG can be obtained in the same way as that of the release factors):
Optimal EFTu and aaRS abundances
While simplifying relations were possible with EFTs and EFG, allowing their solution (approximately) independently from the rest of the cycle, EFTu and aaRS are intricately connected through the tRNA cycle. We thus return to the tRNA conservation equation, Equation 28. For notational simplicity, we group the catalytic step of the TC, EFG binding, and EFG catalytic action (translocation) in the parameter ${k}_{el}^{max}$ (these do not depend on ${\varphi}_{Tu}$ and ${\varphi}_{aaRS}$), which we take to be the experimentally determined value of 22 s$^{-1}$ (Dai et al., 2016). Further dropping the EFTs-related and catalytic terms in the equation for the free EFTu (they will be added back at the end, as they only contribute a fixed term at the optimum), we get:
This system is first solved numerically (Figure 3B). To close the equation in terms of ${\varphi}_{TC}$ alone, we use our relationship for λ (Equation 1), with:
where, as before, ${k}_{el}^{max}$ is the maximum rate of translation elongation (from reactions other than ternary complex diffusion) estimated from in vivo kinetic measurements (≈22 s$^{-1}$; Dai et al., 2016), and ${\mathrm{\tau}}_{ini}+{\mathrm{\tau}}_{ter}\approx 0.5$ s is the estimated time for the initiation and termination steps (about 5 to 10% of the full translation cycle time), taken as fixed parameters here. Using this relationship for the translation time leads to the explicit relationship between growth and ternary complex concentration:
which is the same relationship as the one derived in Klumpp et al., 2013, with the addition of the terms corresponding to the rest of the translation cycle. Substituting the explicit relationship between growth and ternary complex concentration above (Equation 39) in the aaRS/EFTu tRNA cycle relationship (Equation 38) closes the system for ${\varphi}_{TC}$. The numerical solution of this equation is presented in Figure 3B (see section Estimation of optimal abundances for other parameters).
The main conclusion from numerically solving the reduced system (Equations 38 and 39) is that the EFTu/aaRS space is partitioned in two regimes, resulting from the separation of scales of reactions in the coarse-grained model. Specifically, ${k}_{on}^{Tu}\gg \frac{{k}_{on}^{TC}}{{n}_{aa}}$, so that any imbalance between the constituents of the ternary complex (charged tRNAs, free EFTu) results in a stoichiometric, unproductive excess of the component in surplus.
We can derive a relation for the 'transition line' in the aaRS/EFTu space where both free charged tRNAs and free EFTu are at low concentrations. This corresponds to imposing the (formally impossible) requirement ${\varphi}_{T{u}^{\text{GTP}}}\approx 0\Rightarrow {\varphi}_{TC}\approx {\varphi}_{Tu}$ and $[\text{aatRNA}]\propto \frac{1}{{k}_{on}^{Tu}{\varphi}_{T{u}^{\text{GTP}}}}\approx 0$, that is,
The $\overline{\cdot}$ signifies the transition line relationship between ${\overline{\varphi}}_{Tu}$ and ${\overline{\varphi}}_{aaRS}$, which is displayed in Figure 3B.
The heuristic to estimate the optimal EFTu concentration described in the main text can be extended to include the EFTs cycle. In particular, in the EFTu limited regime, with ${\varphi}_{T{u}^{GTP}}\approx 0$, we have (from Equation 29):
Substituting the above expression for ${\varphi}_{TC}$ in the optimality condition (Equation 32) for ${\varphi}_{Tu}$, we arrive at (using the optimal solution for EFTs, Equation 34):
Above, the last three terms (not appearing in Equation 12) correspond to the additional diffusion of the EFTs cycle, and catalytic contributions.
Following the argument (see main text) that the optimal aaRS abundance should lie on the transition line (Equation 40), we obtain:
with ${\mathrm{\Delta}}_{t}$ related to the excess tRNA (tRNAs remaining after subtracting tRNAs sequestered on the ribosome and TC from the total tRNA budget):
Interpretation of the sharp separation between aaRS and EFTu limited regimes
The sharp separation of the solution for ${\varphi}_{TC}$ in two distinct regimes (EFTu limited, and aaRS limited, illustrated in Figure 3B), can be intuitively understood from a geometrical viewpoint.
For simplicity of the argument (not strictly necessary), we neglect the short initiation and termination times in Equation 39 and use ${\text{tRNA}}_{tot}=\frac{t{\varphi}_{ribo}P}{{\mathrm{\ell}}_{ribo}}$ (with $t$ the tRNA to ribosome molar ratio). The tRNA conservation condition, Equation 38, can then be rewritten as (binding-limited regime):
At a given abundance of EFTu $({\varphi}_{Tu})$ and aaRS $({\varphi}_{aaRS})$, the solution for ${\varphi}_{TC}$ is obtained when equality in the above equation is reached. The behavior of the various terms with ${\varphi}_{TC}$ is illustrated for different values of ${\varphi}_{aaRS}$ and ${\varphi}_{Tu}$ in Figure 3—figure supplement 1: the number of uncharged tRNAs (pink line in Figure 3—figure supplement 1) is a decreasing function of aaRS, and the free charged tRNAs (red line in Figure 3—figure supplement 1) depend on ${\varphi}_{Tu}$. Specifically, the free charged tRNA contribution (red line), due to the rapid (codon-agnostic) association rate ${k}_{on}^{Tu}$ between charged tRNAs and EFTu, is negligible except for a very narrow range where ${\varphi}_{TC}\approx {\varphi}_{Tu}$, at which point a sharp divergence occurs. This rapid divergence bounds the solution for ${\varphi}_{TC}$ at the total EFTu concentration.
The aaRS limited regime corresponds to conditions in which the uncharged tRNA contribution (pink line) intersects the available tRNA budget (full black line), lower left in Figure 3—figure supplement 1. In contrast, the EFTu limited regime corresponds to conditions in which the free charged tRNA (red line) intersects the tRNA budget, upper right in Figure 3—figure supplement 1. The sharpness of the transition between the two regimes arises from the near-vertical divergence of the free charged tRNA contribution.
Appendix 4
Translation initiation
Translation initiation is also relatively complex compared to translation termination. In contrast with other steps of the translation cycle, binding of the factors necessary for the process (IF1, IF2, IF3, initiator tRNA) does not occur in a strict sequential order, leading to a 'heterogeneous assembly landscape' (Gualerzi and Pon, 2015; Chen et al., 2016) that is more complex to model. However, one assembly pathway is kinetically favored (Milón et al., 2012). We take this favored assembly pathway as our kinetic scheme (Appendix 4—figure 1; note that binding of tRNA/mRNA is coarse-grained to a single event without loss of generality). We provide some evidence below that taking a more complex assembly pathway would minimally affect the predicted optimal initiation factor abundances.
The reactions in our simplified scheme are:
with corresponding mass action equations:
and conservation equations:
We assume the steady-state concentrations of small and large ribosomal subunits to be equal.
Subpathway without subunits joining
The system of equations is complicated by the second branch of the pathway corresponding to 50S subunit binding. However, in the regime $\sqrt{\frac{{\mathrm{\ell}}_{IF}}{{\mathrm{\ell}}_{ribo}}\frac{{\widehat{k}}_{on}^{50S}}{{\widehat{k}}_{on}^{IF}}}\ll 1$ (which is realized because of the large size of the ribosome and the slower association rate constant for the large subunit compared to the initiation factors, again due to size), the effect of this branch is to add a term to the optimal abundance equal to the concentration of species ${R}_{123m}$ (see derivation in section Pathway including subunits joining). We focus here on the solution of the part of the reaction scheme boxed in Appendix 4—figure 1. This subscheme corresponds to:
with conservation equations:
This system can be solved as with the previous schemes. In steady state, we find for concentrations in terms of the free concentrations $[IF2]$ and $[IF3]$:
and the coupled equations for $[IF2]$ and $[IF3]$ that need to be solved:
As for translation termination (section Coarse-grained translation termination time) and elongation (section Coarse-grained translation elongation time), summing the ribosome-containing species:
allows us to read the initiation time directly (recast in proteome fraction units):
The above is the time that can be used in the optimality condition (Equation 5). Note that the parallel nature of the reactions with IF2 and IF3 leads to a reduction compared to a purely sequential pathway (negative term above decreasing the total initiation time, as expected if multiple reactions can occur in parallel).
Given that binding of IF1 occurs last in this scheme, its free concentration takes a simple form (${\varphi}_{IF1}^{free}={\varphi}_{IF1}-\frac{{\mathrm{\ell}}_{IF1}\lambda}{\langle \mathrm{\ell}\rangle {k}_{RNA}}$). In contrast, computing the free IF2 and IF3 concentrations requires solving the nonlinear coupled system, Equations 41. Recasting these in units of proteome fraction:
with ${\stackrel{~}{\varphi}}_{IF2}:={\varphi}_{IF2}-\frac{{\mathrm{\ell}}_{IF2}\lambda}{\langle \mathrm{\ell}\rangle {k}_{RNA}}-\frac{{\mathrm{\ell}}_{IF2}\lambda}{\langle \mathrm{\ell}\rangle {k}_{on}^{IF1}{\varphi}_{IF1}^{free}}$, and similarly for ${\stackrel{~}{\varphi}}_{IF3}$. We now show that the terms coupling the two equations for ${\varphi}_{IF2}^{free}$ and ${\varphi}_{IF3}^{free}$ (bracketed above) are small at the optimum. Indeed, based on results in simpler schemes (self-consistency confirmed below), we expect at the optimum:
Hence, we expect the two terms at the optimum in the coupled equations above to compare as (e.g. in the free IF2 equation):
coming from the large size of the ribosome compared to the initiation factors. In addition, the derivative of the coupling terms, which appear in the optimality condition and therefore in identifying the optimal abundances, are all of the form $\frac{{\lambda}^{*}{\mathrm{\ell}}_{IF}}{\u27e8\mathrm{\ell}\u27e9{k}_{on}^{IF}{({\varphi}_{IF}^{free})}^{2}}$ compared to the main term. This scales scales as ${\mathrm{\ell}}_{IF}{\mathrm{\ell}}_{ribo}^{1}\ll 1$ at the selfconsistent solution. Hence, neglecting the coupling is justified as an approximate solutions near the optimum, and we obtain for the free concentrations of IFs:
Substituting these in the expression for the initiation time, Equation 42, and using the optimality condition (Equation 5, we find that no simple solution exist for the non symmetric case of ${k}_{on}^{IF2}\ne {k}_{on}^{IF3}$). Since the onrates should be similar for IF2 and IF3 (difference in size should only lead to modest difference in onrates coefficient, by roughly ${({\mathrm{\ell}}_{IF2}/{\mathrm{\ell}}_{IF3})}^{1/3}\approx 1.7$ assuming Stokes scaling), the symmetric case is approximately correct. We report the symmetric solution for simplicity. The final optimal solutions for the three factors for the subscheme solved here is:
The form of the solution is again similar to that derived for the simpler translation termination scheme (cf. Equation 20), with three differences, each of which has an intuitive interpretation. First, the factor $\left[1+\frac{{\mathrm{\ell}}_{IF2}+{\mathrm{\ell}}_{IF3}}{{\mathrm{\ell}}_{ribo}}\right]$ in the IF1 solution arises as a result of IF1 binding being last in our initiation pathway. Indeed, the IF1 concentration also influences the free IF2 and IF3 concentrations, leading to additional selective pressure to increase its abundance. In effect, the molecular species waiting for IF1 to diffuse to its target is not only the ribosome, but the ribosome with IF2 and IF3 bound, with a total amino acid weight ${\mathrm{\ell}}_{ribo}\to {\mathrm{\ell}}_{ribo}+{\mathrm{\ell}}_{IF2}+{\mathrm{\ell}}_{IF3}$. Second, the factor of $\sqrt{3/4}\approx 0.87<1$ for IF2 and IF3 (corresponding to the symmetric case) arises from the parallel pathway for IF2 and IF3, which renders the process more efficient. We therefore see that the correction from having multiple reactions in parallel is modest (0.87 vs. 1). The third difference from the simpler case of translation termination is the second terms for IF2 and IF3, which correspond to the additional delay incurred by binding of IF1. These come from the assumed sequential nature of our initiation scheme (Appendix 4—figure 1). In such cases, factors binding earlier have to be present at higher abundances to account for their wait times for later binding events. The exact form of this correction term would be different for more complex assembly pathways (but would be captured by average delays from other factor binding).
Pathway including subunits joining
The solutions above (Equations 43) are for the reduced scheme (boxed in Appendix 4—figure 1). The full solution includes the delay arising from 50S subunit binding. Including subunit joining requires the solution of an additional equation for the steady-state concentration of the species with all three initiation factors, mRNA and initiator tRNA waiting for subunit joining (species ${R}_{123m}$ in Appendix 4—figure 1, denoted ${\varphi}_{123m}$ in units of proteome fraction). The equation to solve for ${\varphi}_{123m}$ can be obtained from the 50S ribosome subunit conservation equation:
${\varphi}_{123m}$ appears in the equations for the free concentration of the initiation factors (from the conservation equations), and also leads to the appearance of a new term in the expression for the initiation time ${\mathrm{\tau}}_{ini}$ (Equation 42) corresponding to this step: $\frac{\langle \mathrm{\ell}\rangle {\varphi}_{123m}}{{\mathrm{\ell}}_{30S}\lambda}$.
These two additions, resulting from the parallel branch of 50S joining, can be simplified due to a separation of scales between the various terms. For large initiation factor concentrations, the corresponding mass action terms in the equation for ${\varphi}_{123m}$ contribute negligibly to the solution. In this regime, the new term involving ${\varphi}_{123m}$ in the initiation time ${\mathrm{\tau}}_{ini}$ does not alter the form of the optimal abundances of IF1, IF2, and IF3 beyond adding a constant term. Hence, in the regime of high free IF concentration, the optimality condition has the same form as derived in the previous section. We can therefore obtain ${\varphi}_{123m}$ assuming large IF concentration, denoted ${\varphi}_{123m}^{\mathrm{\infty}}$:
This solution will be self-consistent provided (for all initiation factors):
It therefore suffices to show:
Using our optimality condition on ${\varphi}_{IF}^{free,*}$ (Equation 43) assuming no contribution from ${\varphi}_{123m}$ (self-consistency), and converting association rates to units of µM^{−1}s^{−1}, the above condition reduces to:
The self-consistency condition is met both because initiation factors are smaller than ribosomes (${\mathrm{\ell}}_{IF}\ll {\mathrm{\ell}}_{ribo}$), and because the on-rate for subunit joining is lower than that for initiation factor binding (${\widehat{k}}_{on}^{50S}\ll {\widehat{k}}_{on}^{IF}$), given again the size differences. The solution, including the contribution from ribosome subunit joining, is then:
where for ${k}_{RNA}$ much faster than the association between the subunits, ${\varphi}_{123m}^{\mathrm{\infty}}\approx \sqrt{\frac{{\mathrm{\ell}}_{30S}{\lambda}^{*}}{\langle \mathrm{\ell}\rangle {k}_{on}^{50S}}}$.
Appendix 5
Estimation of optimal abundances
Comparing predictions from our parsimonious framework (Table 1) with measurements requires specific values of kinetic parameters. We use empirical measurements together with scaling relations to estimate these kinetic parameters.
Catalytic rates for many enzymes have been measured in vitro, but the obtained values can be sharply incompatible with kinetic parameters that have been measured in the cell. An example is the class of tRNA synthetases. Tallying the measured ${k}_{cat}$ for all wild-type E. coli aaRSs (Jeske et al., 2019), we find a median value of ${k}_{cat}^{aaRS}\approx $ 3 s^{−1}, with 80% of reported values below 6 s^{−1}. The total molar concentration of aaRSs in the cell is comparable to that of ribosomes, and the per-step elongation speed of the ribosome is above 15 s^{−1} (Dai et al., 2016; Johnson et al., 2020). Hence, the absolute minimum catalytic rate needed to sustain the translation elongation flux must obey ${k}_{cat}^{aaRS}>15$ s^{−1}, which is much higher than most in vitro measured values. To avoid the difficulties in estimating catalytic parameters, and to derive a lower bound on factor abundance from our model, we focus on the diffusive contributions (related to the association rate) in our predictions, assuming large catalytic rates (${k}_{cat}\to \mathrm{\infty}$).
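The flux-balance argument in this paragraph can be checked with a few lines of arithmetic. The sketch below is only an illustration and not the authors' code: the function name and the `aars_per_ribosome` argument are hypothetical, and the ratio is set to 1 solely because the text states that the two molar concentrations are comparable.

```python
def min_kcat_aars(k_el_per_s, aars_per_ribosome=1.0):
    """Lower bound on the aaRS catalytic rate (s^-1) needed to sustain elongation.

    Each elongation step consumes one charged tRNA, so the total charging
    flux (aaRS count * k_cat) must be at least the total elongation flux
    (ribosome count * k_el).
    """
    if aars_per_ribosome <= 0:
        raise ValueError("aars_per_ribosome must be positive")
    return k_el_per_s / aars_per_ribosome


k_el = 15.0             # per-step elongation speed quoted in the text (s^-1)
median_in_vitro = 3.0   # median in vitro k_cat quoted in the text (s^-1)

bound = min_kcat_aars(k_el)      # assumes one aaRS per ribosome (assumption)
gap = bound / median_in_vitro    # fold difference from the in vitro median
print(f"k_cat must exceed {bound:.0f} s^-1, {gap:.0f}-fold above the in vitro median")
```

If aaRSs were less abundant than ribosomes, the required catalytic rate would rise in proportion, so the bound of 15 s^{−1} is the lenient case.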
To estimate diffusion-limited association rate constants ${\widehat{k}}_{on}$, we scaled the measured in vivo association rate constant for the ternary complex, ${\hat{k}}_{on}^{TC}=6.4$ µM^{−1}s^{−1} (Dai et al., 2016), by the diffusion of the respective components, that is, ${\widehat{k}}_{on}^{AB}/{\widehat{k}}_{on}^{TC}=({D}_{A}+{D}_{B})/({D}_{TC}+{D}_{ribo})$, where ${D}_{i}$ is the diffusion coefficient for the molecular species $i$. While in vivo diffusion coefficients exist for a number of components of the translation apparatus (Bakshi et al., 2012; Sanamrad et al., 2014; Volkov et al., 2018; Plochowietz et al., 2017), several factors do not have measured diffusion coefficients. For these, we used the cubic root scaling from the Stokes-Einstein relation (Nenninger et al., 2010), see Appendix 5—table 1.
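The scaling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the Matlab scripts submitted with the paper: the function names are hypothetical, and the diffusion coefficient of the ribosome and both sizes are placeholder numbers chosen for easy arithmetic, not the values of Appendix 5—table 1. Only the anchor value of 6.4 µM^{−1}s^{−1} and the two scaling relations come from the text.

```python
def stokes_einstein_d(d_ref, size_ref, size):
    """Estimate a diffusion coefficient from a reference one.

    Uses the cubic-root size scaling D ~ size**(-1/3); sizes can be in any
    common unit (e.g. amino acids) because only their ratio enters.
    """
    return d_ref * (size_ref / size) ** (1.0 / 3.0)


def scaled_k_on(d_a, d_b, d_tc, d_ribo, k_on_tc=6.4):
    """Scale the ternary complex on-rate (uM^-1 s^-1) by summed diffusion coefficients."""
    return k_on_tc * (d_a + d_b) / (d_tc + d_ribo)


# Illustrative placeholders only (um^2/s and amino acids), not Appendix 5 values.
D_TC, D_RIBO = 3.0, 0.05
SIZE_TC, SIZE_FACTOR = 1000.0, 125.0

d_factor = stokes_einstein_d(D_TC, SIZE_TC, SIZE_FACTOR)   # smaller factor diffuses faster
k_on_factor = scaled_k_on(d_factor, D_RIBO, D_TC, D_RIBO)  # factor binding the ribosome
print(f"D = {d_factor:.2f} um^2/s, k_on = {k_on_factor:.1f} uM^-1 s^-1")
```

By construction, passing the ternary complex and the ribosome as the two partners returns the anchor value itself, so every other estimate inherits the same empirical reduction relative to an idealized diffusion-limited rate.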
We note that an alternative estimate for ${\widehat{k}}_{on}$ using the Smoluchowski relation (${\widehat{k}}_{on}^{Smol}=4\pi DR$, where $D$ is the relative diffusion coefficient of the two reactants and $R$ the capture radius) is overly simplistic as it assumes perfectly absorbing spheres. The actual diffusion-limited association rate constant could be much lower due to orientation constraints and other factors. It is also difficult to measure the capture radius in physiological conditions. Indeed, the Smoluchowski ${\widehat{k}}_{on}^{Smol}$ calculated using the diffusion coefficient of EF-Tu in vivo (≈3 µm^{2}s^{−1}, [Volkov et al., 2018]) and a previous estimate for the capture radius ($R\approx 2$ nm, [Klumpp et al., 2013]) yields ${\hat{k}}_{on}^{TC,Smol}\approx 45$ µM^{−1}s^{−1}, which is several-fold greater than the in vivo estimate of ${k}_{on}^{TC}$ based on kinetic measurements of elongation (${\hat{k}}_{on}^{TC}=6.4$ µM^{−1}s^{−1}, [Dai et al., 2016]). This comparison illustrates that the idealized Smoluchowski formula is not applicable. That said, our scaling approach does come at the price of assuming that similar molecular properties lead to a decrease of the association rate constants for the other tlFs. These could be further refined via, for example, structural modeling (Schlosshauer and Baker, 2004), or upon new in vivo rate constant measurements.
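The Smoluchowski number quoted above can be reproduced with a short unit conversion. The sketch below is an illustration only (the function name is hypothetical); it takes the EF-Tu diffusion coefficient as the relative diffusion coefficient, as the text does, which amounts to neglecting the much slower diffusion of the ribosome.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def smoluchowski_k_on(d_um2_per_s, radius_nm):
    """Smoluchowski rate constant 4*pi*D*R, returned in uM^-1 s^-1."""
    d_m2 = d_um2_per_s * 1e-12            # um^2/s -> m^2/s
    r_m = radius_nm * 1e-9                # nm -> m
    k_m3 = 4.0 * math.pi * d_m2 * r_m     # m^3/s per reactant pair
    k_per_molar = k_m3 * 1e3 * AVOGADRO   # m^3 -> L, then per mole: M^-1 s^-1
    return k_per_molar * 1e-6             # M^-1 s^-1 -> uM^-1 s^-1


k_smol = smoluchowski_k_on(3.0, 2.0)   # D and R as quoted in the text
ratio = k_smol / 6.4                   # compared with the in vivo estimate
print(f"k_on(Smol) = {k_smol:.0f} uM^-1 s^-1, {ratio:.1f}-fold above the in vivo value")
```

The resulting ratio of roughly 7 is the uniform reduction that the scaling approach implicitly carries over to the other factors.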
Additional measured quantities required to compute our estimates are: the measured growth rate λ* = 5.5 × 10^{−4} s^{−1} (for Figure 4, taken to be the average of the fast-growing species considered, corresponding to a doubling time of 21 ± 1 min; individual species values: E. coli: 21.5 ± 1 min, B. subtilis: 21 ± 1 min, V. natriegens: 19 ± 1 min; see below for slower growth conditions), the tRNA concentration (estimated from the tRNA to ribosome ratio of 6.5 (Dong et al., 1996) using: ${\text{tRNA}}_{tot}=\text{(tRNA/ribo)}{\varphi}_{ribo}P/{\mathrm{\ell}}_{ribo}$), the maximum per-codon elongation rate, excluding ternary complex diffusion, ${k}_{el}^{max}=22$ s^{−1} (Dai et al., 2016) (used to estimate the number of tRNAs sequestered on ribosomes and therefore the excess tRNA number in the optimum for aaRS, see Equations 18 and 38), and the in-protein amino acid concentration $P=2.6$ M (Klumpp et al., 2013; Bremer and Dennis, 2008).
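Two of the conversions listed above are simple enough to sketch. The code below is an illustration with hypothetical function names; the ribosomal proteome fraction and the ribosome size passed to the tRNA formula are placeholders for this example, not values taken from the supplementary files, whereas the doubling time, the tRNA to ribosome ratio and $P$ are those quoted in the text.

```python
import math


def growth_rate_per_s(doubling_time_min):
    """Exponential growth rate (s^-1) from a doubling time in minutes."""
    return math.log(2) / (doubling_time_min * 60.0)


def trna_total_molar(trna_per_ribo, phi_ribo, p_molar, ell_ribo):
    """Total tRNA concentration (M): (tRNA/ribo) * phi_ribo * P / ell_ribo.

    phi_ribo * P is the molar concentration of amino acids locked in
    ribosomal proteins; dividing by ell_ribo (amino acids per ribosome)
    gives the ribosome concentration.
    """
    return trna_per_ribo * phi_ribo * p_molar / ell_ribo


lam = growth_rate_per_s(21.0)  # doubling time of 21 min from the text
print(f"growth rate = {lam:.2e} s^-1")

# phi_ribo and ell_ribo below are placeholders for illustration only.
trna = trna_total_molar(trna_per_ribo=6.5, phi_ribo=0.3, p_molar=2.6, ell_ribo=7300.0)
print(f"tRNA_tot = {trna * 1e6:.0f} uM (placeholder inputs)")
```

The first conversion recovers the growth rate of 5.5 × 10^{−4} s^{−1} stated in the text from the 21 min doubling time.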
For the fast growth average, results are displayed in Figure 4 and listed in Supplementary file 2. Additional predictions in individual conditions are shown in Figure 4—figure supplement 1, with numerical values for measured and predicted values listed in Supplementary files 1–4. For predictions in different growth conditions/species, we used the measured growth rates in the corresponding conditions (values listed in Supplementary files 1 and 3), association rate constants estimated based on E. coli data (Appendix 5—tables 1–3), and the tRNA abundance (only needed for the prediction of aaRS) at the corresponding growth rate in E. coli from Dong et al., 1996. Because tRNA abundance has not been quantified in other species, these values were also used for B. subtilis, V. natriegens and C. crescentus, and should be interpreted with caution given possible differences in cellular physiology for these species.
Data availability
Publicly available ribosome profiling datasets were used (GEO accessions GSE95211, GSE53767, and GSE139983). Computer scripts (Matlab) used for this study were submitted with the present work as Figure 3—source code 1. Supplementary files 1–4 contain the numerical data to reproduce figures.

NCBI Gene Expression Omnibus, ID GSE95211. Data from: Evolutionary Convergence of Pathway-specific Enzyme Expression Stoichiometry.

NCBI Gene Expression Omnibus, ID GSE53767. Data from: Absolute quantification of protein production reveals principles underlying protein synthesis rates.

NCBI Gene Expression Omnibus, ID GSE139983. Data from: From coarse to fine: The absolute Escherichia coli proteome under diverse growth conditions.
References

Elongation in translation as a dynamic interaction among the ribosome, tRNA, and elongation factors EF-G and EF-Tu. Quarterly Reviews of Biophysics 42:159–200. https://doi.org/10.1017/S0033583509990060

Elongation factors in protein biosynthesis. Trends in Biochemical Sciences 28:434–441. https://doi.org/10.1016/S09680004(03)001622

Global analysis of translation termination in E. coli. PLOS Genetics 13:e1006676. https://doi.org/10.1371/journal.pgen.1006676

Growth rate-optimised tRNA abundance and codon usage. Journal of Molecular Biology 270:544–550. https://doi.org/10.1006/jmbi.1997.1142

A kinetic model of protein synthesis. Application to hemoglobin synthesis and translational control. The Journal of Biological Chemistry 254:11927–11937.

The molecular choreography of protein synthesis: translational control, regulation, and pathways. Quarterly Reviews of Biophysics 49:e11. https://doi.org/10.1017/S0033583516000056

The elongation, termination, and recycling phases of translation in eukaryotes. Cold Spring Harbor Perspectives in Biology 4:a013706. https://doi.org/10.1101/cshperspect.a013706

Covariation of tRNA abundance and codon usage in Escherichia coli at different growth rates. Journal of Molecular Biology 260:649–663. https://doi.org/10.1006/jmbi.1996.0428

A stochastic model for simulating ribosome kinetics in vivo. PLOS Computational Biology 16:e1007618. https://doi.org/10.1371/journal.pcbi.1007618

Costs of accuracy determined by a maximal growth rate constraint. Quarterly Reviews of Biophysics 17:45–82. https://doi.org/10.1017/S0033583500005254

Absolute quantification of translational regulation and burden using combined sequencing approaches. Molecular Systems Biology 15:e8719. https://doi.org/10.15252/msb.20188719

Initiation of mRNA translation in bacteria: structural and dynamic aspects. Cellular and Molecular Life Sciences 72:4341–4367. https://doi.org/10.1007/s0001801520103

Aminoacyl-tRNA synthesis. Annual Review of Biochemistry 69:617–650. https://doi.org/10.1146/annurev.biochem.69.1.617

Combinatorial pathway optimization for streamlined metabolic engineering. Current Opinion in Biotechnology 47:142–151. https://doi.org/10.1016/j.copbio.2017.06.014

BRENDA in 2019: a European ELIXIR core data resource. Nucleic Acids Research 47:D542–D549. https://doi.org/10.1093/nar/gky1048

Mechanisms of drug interactions between translation-inhibiting antibiotics. Nature Communications 11:4013. https://doi.org/10.1038/s4146702017734z

Transcription and translation initiation frequencies of the Escherichia coli lac operon. Journal of Molecular Biology 114:1–21. https://doi.org/10.1016/00222836(77)902790

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12. Nucleic Acids Research 45:D543–D550. https://doi.org/10.1093/nar/gkw1003

Spurious regulatory connections dictate the expression-fitness landscape of translation factors. Molecular Systems Biology 17:e10302. https://doi.org/10.15252/msb.202110302

The relative rates of protein synthesis and degradation in a growing culture of Escherichia coli. Journal of Biological Chemistry 255:4125–4130. https://doi.org/10.1016/S00219258(19)856429

Initiation of protein synthesis in bacteria. Microbiology and Molecular Biology Reviews 69:101–123. https://doi.org/10.1128/MMBR.69.1.101123.2005

How do bacteria tune translation efficiency? Current Opinion in Microbiology 24:66–71. https://doi.org/10.1016/j.mib.2015.01.001

Intermediates and time kinetics of the in vivo assembly of Escherichia coli ribosomes. Journal of Molecular Biology 92:15–37. https://doi.org/10.1016/00222836(75)900893

Translation initiation: structures, mechanisms and evolution. Quarterly Reviews of Biophysics 37:197–284. https://doi.org/10.1017/S0033583505004026

Real-time assembly landscape of bacterial 30S translation initiation complex. Nature Structural & Molecular Biology 19:609–615. https://doi.org/10.1038/nsmb.2285

The growth of bacterial cultures. Annual Review of Microbiology 3:371–394. https://doi.org/10.1146/annurev.mi.03.100149.002103

Methylation of bacterial release factors RF1 and RF2 is required for normal translation termination in vivo. Journal of Biological Chemistry 282:35638–35645. https://doi.org/10.1074/jbc.M706076200

From coarse to fine: the absolute Escherichia coli proteome under diverse growth conditions. Molecular Systems Biology 17:e9536. https://doi.org/10.15252/msb.20209536

Size dependence of protein diffusion in the cytoplasm of Escherichia coli. Journal of Bacteriology 192:4535–4540. https://doi.org/10.1128/JB.0028410

Regulation of the synthesis of ribosomes and ribosomal components. Annual Review of Biochemistry 53:75–117. https://doi.org/10.1146/annurev.bi.53.070184.000451

tRNA synthetase: tRNA aminoacylation and beyond. Wiley Interdisciplinary Reviews. RNA 5:461–480. https://doi.org/10.1002/wrna.1224

In vivo single-RNA tracking shows that most tRNA diffuses freely in live bacteria. Nucleic Acids Research 45:926–937. https://doi.org/10.1093/nar/gkw787

Genome-scale analysis of translation elongation with a ribosome flow model. PLOS Computational Biology 7:e1002127. https://doi.org/10.1371/journal.pcbi.1002127

Translation in prokaryotes. Cold Spring Harbor Perspectives in Biology 10:a032664. https://doi.org/10.1101/cshperspect.a032664

Dependency on medium and temperature of cell size and chemical composition during balanced growth of Salmonella typhimurium. Journal of General Microbiology 19:592–606. https://doi.org/10.1099/00221287193592

Emergence of robust growth laws from optimal regulation of ribosome synthesis. Molecular Systems Biology 10:747. https://doi.org/10.15252/msb.20145379

Analysis of translation elongation dynamics in the context of an Escherichia coli cell. Biophysical Journal 110:2120–2131. https://doi.org/10.1016/j.bpj.2016.04.004

tRNA tracking for direct measurements of protein synthesis kinetics in live cells. Nature Chemical Biology 14:618–626. https://doi.org/10.1038/s415890180063y

Elongation factor Tu: a molecular switch in protein biosynthesis. Molecular Microbiology 6:683–688. https://doi.org/10.1111/j.13652958.1992.tb01516.x

Components of bacterial ribosomes. Annual Review of Biochemistry 51:155–183. https://doi.org/10.1146/annurev.bi.51.070182.001103
Decision letter

Pierre Sens, Reviewing Editor; Institut Curie, PSL Research University, CNRS, France

Aleksandra M Walczak, Senior Editor; École Normale Supérieure, France

Martin J Lercher, Reviewer; Heinrich-Heine-Universität, Germany

Srividya Iyer-Biswas, Reviewer; Purdue University, United States
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Acceptance summary:
This paper presents a theoretical analysis of the abundance of components of the translation machinery (ribosomes, initiation, elongation and release factors, tRNA synthetases) in bacteria. These proteins make up a large fraction of the total proteome and their abundance is closely linked to cell growth. That the stoichiometry of the different components is adjusted so as to maximize the growth rate was postulated a long time ago, but was so far only studied in detail for ribosomes and EF-Tu, the most abundant elongation factor. Here, the authors extend these earlier works to an unprecedented level of detail, provide a complete analysis based on this idea, and derive the optimal stoichiometry for all these factors, which they find to be in good agreement with the observed abundance in different bacteria. This provides new evidence supporting the idea of proteome optimization for maximal growth.
Decision letter after peer review:
Thank you for submitting your article "First-principles model of optimal translation factors stoichiometry" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Aleksandra Walczak as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Martin J Lercher (Reviewer #1); Srividya Iyer-Biswas (Reviewer #3).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Essential revisions:
The essential revisions are listed in the list of comments from the different reviewers.
– Discuss the limitations of using the properties of the "metabolic protein sector" corresponding to the optimal stoichiometry condition to derive the optimal conditions for the translation factor sector and the ribosome sector.
– Expand the discussion of the case of EF-Tu by moving parts of the Supplementary into the main text (at the end of P.12) (comment from Reviewer #2).
– Discuss in more depth the results of Figure 4, and in particular the need for "fitting parameters".
– Provide a point-by-point answer to all the reviewers' comments.
Reviewer #1 (Recommendations for the authors):
To make the parameter choices transparent, there should be a table that lists all model parameters, how they were derived, and an estimate of the uncertainty of the value.
Reviewer #2 (Recommendations for the authors):
This is a highly interesting study and there are only a few points, where it could be improved in my opinion:
– While the case of the release factors is very well described in the main text, the case of EF-Tu and aaRS is much harder to understand based on only the main text and the many references to the Supplement make the reading difficult here. I would suggest to maybe expand the description in the main text (starting with the last paragraph on p. 12).
– I think one could distinguish between a binding-limitation of the reactions and a diffusion-limitation of the binding step of the reaction. The former is needed for the analysis, while the latter is only required in the final step, where quantitative ratios are determined and binding rates have to be approximated by a diffusion-limited rate.
– Figure 4 could in principle also be done for individual organisms (using the data shown in Figure 1b).
Reviewer #3 (Recommendations for the authors):
Addressing the concern raised through clear prose in the manuscript will suffice.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Thank you for submitting your article "First-principles model of optimal translation factors stoichiometry" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Aleksandra Walczak as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Srividya Iyer-Biswas (Reviewer #3).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Essential revisions:
The referees are overall satisfied with the response you provided to their comments, and with the modification you made to the paper. There is one exception however, which regards your response to point (6) of reviewer #1. His comment is copied below.
Could you please provide an answer to his new comment, and modify the paper accordingly so that a final decision regarding your manuscript can be made.
Reviewer #1 (Recommendations for the authors):
The authors' responses adequately address my previous concerns, with one exception, the answer to my previous point (6) about the uniform rescaling of the association rates. The authors now write in the manuscript:
"Importantly, the absolute values of the optimal concentrations can be anchored by the association rate constant between TC and the ribosome obtained from translation elongation kinetic measurements in vivo (Dai et al., 2016). The latter was found to be severalfold smaller than the simplest and absolute upper bound of a Smoluchowski estimate of perfectly absorbing spheres (section Estimation of optimal abundances), and we assume that the rescaling factor is the same for all reactions."
In my interpretation, what this really means is that the presented theory based on diffusion limitations provides (for whatever reasons) good predictions for the relative levels of tlFs, but not for the absolute levels. In order to get a quantitative prediction, the authors need to reduce ("rescale") all estimated on-rates by almost an order of magnitude, based on the known discrepancy between their estimate and an experimental value for a single tlF.
It is not clear (i) what causes that discrepancy (other than that the estimates are very approximate), and (ii) if and why the magnitude of the discrepancy can be transferred to other factors, and (iii) why its necessity does not invalidate the authors' approach.
This is an important point and should be made much more explicit than is currently done: The impressive quantitative agreement between theory and observation rests on the assumption that it is appropriate to make this uniform rescaling. In my opinion, the analysis is very interesting – but based on the necessary rescaling, it is currently not clear if the authors' approach really provides the correct description of reality.
Reviewer #2 (Recommendations for the authors):
In my opinion, the authors have done a nice job responding to the comments and revising the manuscript.
Reviewer #3 (Recommendations for the authors):
The authors have adequately addressed my previous concerns. I recommend the manuscript for publication.
https://doi.org/10.7554/eLife.69222.sa1
Author response
Essential revisions:
The essential revisions are listed in the list of comments from the different reviewers.
– Discuss the limitations of using the properties of the "metabolic protein sector" corresponding to the optimal stoichiometry condition to derive the optimal conditions for the translation factor sector and the ribosome sector.
Indeed, our original formalism implicitly assumed that the metabolic sector responds to the growth rate in a predetermined way. The model was thus not about optimization of the full proteome from scratch. In the revised manuscript, we restrict the scope of the optimization problem to be within the translation sector and not the full proteome. This yields the same optimality condition. Therefore, the model predictions remain the same under this alternative formulation which relies on few assumptions. The revised model formalism is presented in section “Optimization under proteome allocation constraint” (starting p. 5, line 136, also copied on p. 8 of the current response).
– Expand the discussion of the case of EF-Tu by moving parts of the Supplementary into the main text (at the end of P.12) (comment from Reviewer #2).
We have added additional details related to the EF-Tu case study to the main text (Box 1, p. 12 in the revised main text).
– Discuss in more depth the results of Figure 4, and in particular the need for "fitting parameters".
There are no fitting parameters. The empirical parameters used in the model are now listed in Appendix 5 Tables 1–3 (p. 47–48). We have also added more discussion related to the results of Figure 4.
– Provide a point-by-point answer to all the reviewers' comments.
Our responses are detailed below.
Reviewer #1 (Recommendations for the authors):
To make the parameter choices transparent, there should be a table that lists all model parameters, how they were derived, and an estimate of the uncertainty of the value.
We thank the reviewer for raising this all-important point on parameters. The formulae for our predictions are described in detail in section “Summary of optimal solutions” (Table 2). In order to clarify parameter selection, we have added Appendix 5 Table 2 to list the formulae used to obtain the association rate constants, and Appendix 5 Table 3 to highlight additional parameters originally described in the supplementary text of the manuscript.
In particular, Appendix 5 Table 2 now makes it explicit that the association rate constants are estimated by anchoring to that inferred from in vivo kinetic measurements for the ternary complex, scaled by the diffusion coefficients of participating factors. If measured values of diffusion constants for a factor have not been reported (the majority), we estimate the diffusion constant from that of the ternary complex rescaled by the ratio of ℓ^{1/3} for the two factors (Stokes-Einstein relation (6)), as shown in Appendix 5 Table 1.
Appendix 5 Table 3 includes a number of additional miscellaneous parameters, many of which are needed for the aaRS prediction: the maximum translation elongation rate (k_{el}^{max}=22 s^{−1}, taken directly from (3)) and the tRNA abundance (calculated from the measured tRNA/ribosome ratio and the total ribosome abundance converted to molar), tRNA_{tot} = (tRNA/ribosome ratio) 𝜙_{ribo} P/ℓ_{ribo}. The tRNA/ribosome ratio is taken from (7) at the corresponding growth rate, and 𝜙_{ribo} is the estimated ribosomal protein abundance from ribosome profiling.
Reviewer #2 (Recommendations for the authors):
This is a highly interesting study and there are only a few points, where it could be improved in my opinion:
– While the case of the release factors is very well described in the main text, the case of EF-Tu and aaRS is much harder to understand based on only the main text and the many references to the Supplement make the reading difficult here. I would suggest to maybe expand the description in the main text (starting with the last paragraph on p. 12).
We thank the reviewer for raising an issue concerning the clarity of the section relating to EF-Tu and tRNA synthetases. In order to facilitate following the argument without resorting to the supplementary material, we have added a technical section (“Box 1: The EF-Tu and aaRS transition line”, starting on p. 12 line 362) providing condensed details of the solution of that portion of the elongation cycle. Box 1 contains intermediate equations and a derivation of the equation specifying the “transition line” (orange line in Figure 3B), critical for identifying the optimal solution for tRNA synthetases.
Furthermore, a new supplementary section “Interpretation of the sharp separation between aaRS and EF-Tu limited regimes” (p. 39, line 1381) was added to provide an intuitive understanding of how the separation of scales between codon-specific and codon-agnostic reactions in our coarse-grained model leads to a sharp transition between the aaRS-limited and EF-Tu-limited regimes. The geometrical argument is summarized in a new supplementary figure (Figure 3—figure supplement 1).
– I think one could distinguish between a binding-limitation of the reactions and a diffusion-limitation of the binding step of the reaction. The former is needed for the analysis, while the latter is only required in the final step, where quantitative ratios are determined and binding rates have to be approximated by a diffusion-limited rate.
Following the reviewer’s suggestion, we have changed reference from diffusion limitation to binding limitation in all but the final section in which we estimate parameters (specifically: changing “diffusion-limited” to “binding-limited”). We now explicitly mention that we are estimating parameters under the assumption of diffusion limitation (anchoring values based on quantities measured in vivo), p.11 line 431:
“To obtain the numerical values of association rates needed to estimate the optimal tlF stoichiometry (Table 2), we further assume that binding reactions are diffusion limited.”
– Figure 4 could in principle also be done for individual organisms (using the data shown in Figure 1b).
We thank the reviewer for this suggestion. We now include in the revised manuscript predictions for individual species (and new conditions including slower growth) as Figure 4—figure supplement 1.
Reviewer #3 (Recommendations for the authors):
Addressing the concern raised through clear prose in the manuscript will suffice.
We thank the reviewer for pointing out this subtle but deep point about our model formulation. We agree that in our original derivation, the derivative with respect to translation factor abundance leading to the optimality condition implicitly assumed that the system obeyed the growth law scaling under such perturbation. This is an overly strict requirement that is not necessary to arrive at the optimality condition.
Following your suggestion as well as Reviewer 1’s suggestion, in this revised submission we now instead consider the more circumscribed problem of constraining the total proteomic sector to the translation machinery, and use that constraint to derive the optimality condition. Under this new formulation, the derivative with regards to translation factor abundance is then not to be understood as what happens to actual cells under perturbed expression, but rather as what would happen in an (imagined) scenario where the total translation sector is fixed (and therefore the expression of other sectors as well). While this constraint restricts the scope of our model, it focuses attention on the core problem we are addressing, namely identifying the selective pressures operating on the subpartitioning of resources allocated to a given pathway.
We also include a clarification about the meaning of the derivative in our optimization condition (p. 5, line 158):
“We emphasize that the derivative above corresponds to a perturbation scenario in which the tlF abundance is changed while maintaining fixed the total proteomic resources to the translation sector, as prescribed by our optimization procedure. As such, it does not correspond to an actual perturbation easily realizable experimentally.”
References:
1. L. Volkov, et al., tRNA tracking for direct measurements of protein synthesis kinetics in live cells. Nat. Chem. Biol. 14, 618–626 (2018).
2. D. Rodriguez-Correa, A. E. Dahlberg, Kinetic and thermodynamic studies of peptidyltransferase in ribosomes from the extreme thermophile Thermus thermophilus. RNA 14, 2314–2318 (2008).
3. X. Dai, et al., Reduction of translating ribosomes enables Escherichia coli to maintain elongation rates during slow growth. Nat. Microbiol. 2, 1–9 (2016).
4. P. Nissen, et al., Crystal structure of the ternary complex of Phe-tRNAPhe, EF-Tu, and a GTP analog. Science 270, 1464–1472 (1995).
5. S. Klumpp, M. Scott, S. Pedersen, T. Hwa, Molecular crowding limits translation and cell growth. Proc. Natl. Acad. Sci. U. S. A. 110, 16754–16759 (2013).
6. A. Nenninger, G. Mastroianni, C. W. Mullineaux, Size dependence of protein diffusion in the cytoplasm of Escherichia coli. J. Bacteriol. 192, 4535–4540 (2010).
7. H. Dong, L. Nilsson, C. G. Kurland, Covariation of tRNA Abundance and Codon Usage in Escherichia coli at Different Growth Rates. J. Mol. Biol. 260, 649–663 (1996).
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Essential revisions:
The referees are overall satisfied with the response you provided to their comments, and with the modifications you made to the paper. There is one exception, however, which concerns your response to point (6) of reviewer #1. His comment is copied below.
Please respond to his new comment and modify the paper accordingly, so that a final decision regarding your manuscript can be made.
Reviewer #1 (Recommendations for the authors):
The authors' responses adequately address my previous concerns, with one exception, the answer to my previous point (6) about the uniform rescaling of the association rates. The authors now write in the manuscript:
"Importantly, the absolute values of the optimal concentrations can be anchored by the association rate constant between TC and the ribosome obtained from translation elongation kinetic measurements in vivo (Dai et al., 2016). The latter was found to be severalfold smaller than the simplest and absolute upper bound of a Smoluchowski estimate of perfectly absorbing spheres (section Estimation of optimal abundances), and we assume that the rescaling factor is the same for all reactions."
In my interpretation, what this really means is that the presented theory based on diffusion limitations provides (for whatever reasons) good predictions for the relative levels of tlFs, but not for the absolute levels. In order to get a quantitative prediction, the authors need to reduce ("rescale") all estimated on-rates by almost an order of magnitude, based on the known discrepancy between their estimate and an experimental value for a single tlF.
It is not clear (i) what causes that discrepancy (other than that the estimates are very approximate), and (ii) if and why the magnitude of the discrepancy can be transferred to other factors, and (iii) why its necessity does not invalidate the authors' approach.
This is an important point and should be made much more explicit than is currently done: The impressive quantitative agreement between theory and observation rests on the assumption that it is appropriate to make this uniform rescaling. In my opinion, the analysis is very interesting – but based on the necessary rescaling, it is currently not clear if the authors' approach really provides the correct description of reality.
Thank you for raising this important point. We have edited the manuscript by reorganizing and clarifying how we arrived at estimates of association rate constants.
Through these changes, we address: (i) why the physiological association rate constant might be different from the naïve estimate, (ii) the caveats associated with applying the same scaling factor to other factors, and (iii) how the scaling factor may affect our predictions. Modified paragraphs of the manuscript are reproduced below.
In the previous revision, our wording might have unintentionally led to two misconceptions. First, we did not intend to present the Smoluchowski expression for association rate constants ($k_{\mathrm{on}} = 4\pi D R$) as the true diffusion-limited rate constant. That expression assumes perfectly absorbing spherical molecules, which is not applicable to our case. Second, our use of the word ‘rescaling’ was confusing. We did not rescale the Smoluchowski association rate constants. Instead, we used the measured $k_{\mathrm{on}}$ between the TC and the ribosome and estimated the $k_{\mathrm{on}}$ for other reactions using a scaling relationship between the association rate constant and the diffusion coefficients. We revised the manuscript accordingly to avoid these misconceptions.
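For orientation, the Smoluchowski expression mentioned above can be converted into the µM^{−1}s^{−1} units used for measured rate constants. The following is a minimal sketch of that unit conversion only, not the authors' calculation: the function name and the input values (a relative diffusion coefficient of 1 µm²/s and a contact distance of 10 nm) are illustrative assumptions.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def smoluchowski_kon_per_uM_s(d_rel_um2_s, contact_radius_nm):
    """Upper-bound k_on = 4*pi*D*R for perfectly absorbing spheres, in uM^-1 s^-1.

    d_rel_um2_s: relative diffusion coefficient D (sum of both partners), um^2/s.
    contact_radius_nm: contact distance R (sum of both radii), nm.
    """
    d_m2_s = d_rel_um2_s * 1e-12            # um^2/s -> m^2/s
    r_m = contact_radius_nm * 1e-9          # nm -> m
    k_m3_s = 4.0 * math.pi * d_m2_s * r_m   # m^3/s per molecule pair
    k_per_M_s = k_m3_s * AVOGADRO * 1e3     # per molecule -> per mole, m^3 -> L
    return k_per_M_s * 1e-6                 # M^-1 s^-1 -> uM^-1 s^-1


# Hypothetical inputs, for illustration only (not values from the paper).
print(smoluchowski_kon_per_uM_s(1.0, 10.0))
```

With these placeholder inputs the estimate is roughly 76 µM^{−1}s^{−1}. Because the expression treats every collision as productive, it is an upper bound rather than an expected physiological rate.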
Paragraph starting at line 431 (page 11):
“To obtain the numerical values of association rate constants needed for calculating the optimal tlF stoichiometry (Table 2), we used the measured $\hat{k}_{\mathrm{on}}^{\mathrm{TC}}$ in vivo and estimated all other association rate constants using a biophysically motivated scaling ($\hat{k}$ denotes the raw association rate constant in units µM^{−1}s^{−1}, which is different from the rescaled $k$, see section Conversion between concentration and proteome fraction). […] Nonetheless, we note that the square-root dependence on these parameters (Table 2) for our predictions makes the numerical values less sensitive to tlF-specific effects.”
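The reduced sensitivity follows directly from the square-root form. As a short worked illustration, assume only that a predicted abundance depends on an association rate constant as a power law with an exponent of magnitude 1/2. The symbol $\phi$ for the prediction and the error factor $f$ are introduced here for illustration.

```latex
\phi \propto \hat{k}_{\mathrm{on}}^{\pm 1/2}
\quad\Longrightarrow\quad
\frac{\phi(f\,\hat{k}_{\mathrm{on}})}{\phi(\hat{k}_{\mathrm{on}})} = f^{\pm 1/2},
\qquad
\left|\frac{\delta \phi}{\phi}\right| \approx
\frac{1}{2}\left|\frac{\delta \hat{k}_{\mathrm{on}}}{\hat{k}_{\mathrm{on}}}\right| .
```

For example, a twofold error in a rate constant would shift the corresponding prediction by a factor of $\sqrt{2} \approx 1.4$.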
Paragraph starting at line 537 (page 14):
“Our coarse-graining approach has several limitations in its connection to detailed biochemical parameters. […] We further note that a number of conclusions from our model, such as the factor of $\sqrt{\langle \ell \rangle}$ separating the optimal abundances of elongation from initiation/termination tlFs, are generic and do not depend on the specific association rates.”
Paragraphs starting at line 1604 (page 47, Appendix 5):
“To estimate diffusion-limited association rate constants $\hat{k}_{\mathrm{on}}$, we scaled the measured in vivo association rate constant for the ternary complex, $\hat{k}_{\mathrm{on}}^{\mathrm{TC}} = 6.4\ \mu\mathrm{M}^{-1}\,\mathrm{s}^{-1}$ (Dai et al., 2016) by diffusion of the respective components, i.e., $\hat{k}_{\mathrm{on}}^{\mathrm{AB}} / \hat{k}_{\mathrm{on}}^{\mathrm{TC}} = (D_A + D_B)/(D_{\mathrm{TC}} + D_{\mathrm{ribo}})$, where $D_i$ is the diffusion coefficient of molecular species $i$. […] These could be further refined via e.g., structural modeling (Schlosshauer and Baker, 2004) or upon new in vivo rate constant measurements.”
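The scaling relation quoted above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the reference value of 6.4 µM^{−1}s^{−1} comes from the text, while the function name and all diffusion coefficients below are hypothetical placeholders, not values from the paper.

```python
K_ON_TC = 6.4  # measured in vivo TC-ribosome k_on, uM^-1 s^-1 (Dai et al., 2016)


def scaled_kon(d_a, d_b, d_tc, d_ribo, k_on_tc=K_ON_TC):
    """Estimate k_on for partners A and B by scaling the measured TC value
    with the ratio of summed diffusion coefficients (same units for all D)."""
    return k_on_tc * (d_a + d_b) / (d_tc + d_ribo)


# Hypothetical diffusion coefficients in um^2/s (placeholders for illustration).
D_TC, D_RIBO, D_FACTOR = 2.0, 0.05, 4.0

# Estimated k_on for a faster-diffusing factor binding the ribosome.
print(scaled_kon(D_FACTOR, D_RIBO, D_TC, D_RIBO))
```

By construction, passing the TC and ribosome coefficients as the A and B species returns the measured reference value, so the estimate is anchored to the single in vivo measurement.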
https://doi.org/10.7554/eLife.69222.sa2
Article and author information
Author details
Funding
National Institutes of Health (R35GM124732)
 Gene-Wei Li
National Science Foundation (MCB 1844668)
 Gene-Wei Li
Richard and Susan Smith Family Foundation (Smith Odyssey Award and Smith Family Award)
 Gene-Wei Li
Pew Charitable Trusts (Pew Scholar)
 Gene-Wei Li
Alfred P. Sloan Foundation (Sloan Research Fellowship)
 Gene-Wei Li
Kinship Foundation (Searle Scholar)
 Gene-Wei Li
National Research Council Canada (Doctoral fellowship)
 Jean-Benoît Lalanne
Howard Hughes Medical Institute (International Student Fellowship)
 Jean-Benoît Lalanne
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank R Battaglia, J Cascino, M Gill, M Parker, D Parker, and G Schmidt for critical reading of the manuscript, and all members of the Li lab for discussion. This research was supported by NIH grant R35GM124732, the NSF CAREER Award, the Smith Odyssey Award, the Pew Biomedical Scholars Program, a Sloan Research Fellowship, the Searle Scholars Program, the Smith Family Award for Excellence in Biomedical Research; NSERC doctoral Fellowship and HHMI International Student Research Fellowship (to JBL).
Senior Editor
 Aleksandra M Walczak, École Normale Supérieure, France
Reviewing Editor
 Pierre Sens, Institut Curie, PSL Research University, CNRS, France
Reviewers
 Martin J Lercher, Heinrich-Heine-Universität, Germany
 Srividya Iyer-Biswas, Purdue University, United States
Publication history
 Preprint posted: April 4, 2021
 Received: April 8, 2021
 Accepted: September 29, 2021
 Accepted Manuscript published: September 30, 2021 (version 1)
 Version of Record published: October 21, 2021 (version 2)
Copyright
© 2021, Lalanne and Li
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.