Overflow metabolism originates from growth optimization and cell heterogeneity

Xin Wang

doi:10.7554/eLife.94586.2

Introduction

A prominent feature of cancer metabolism is that tumor cells excrete large quantities of fermentation products in the presence of sufficient oxygen (Hanahan and Weinberg, 2011; Liberti and Locasale, 2016; Vander Heiden et al., 2009). This process, discovered by Otto Warburg in the 1920s (Warburg et al., 1924) and known as the Warburg effect, aerobic glycolysis, or overflow metabolism (Basan et al., 2015; Hanahan and Weinberg, 2011; Liberti and Locasale, 2016; Vander Heiden et al., 2009), is ubiquitous among fast-proliferating cells across a broad spectrum of organisms (Vander Heiden et al., 2009), ranging from bacteria (Basan et al., 2015; Holms, 1996; Meyer et al., 1984; Nanchen et al., 2006; Neidhardt et al., 1990) and fungi (De Deken, 1966) to mammalian cells (Hanahan and Weinberg, 2011; Liberti and Locasale, 2016; Vander Heiden et al., 2009). For microbes, cells use standard respiration when nutrients are scarce, while they use the counterintuitive aerobic glycolysis when nutrients are adequate, just analogous to normal tissues and cancer cells, respectively (Vander Heiden et al., 2009).

Over the past century, and particularly through extensive studies in the last two decades (Liberti and Locasale, 2016), various rationales for overflow metabolism have been proposed (Basan et al., 2015; Chen and Nielsen, 2019; Majewski and Domach, 1990; Molenaar et al., 2009; Niebel et al., 2019; Peebo et al., 2015; Pfeiffer et al., 2001; Shlomi et al., 2011; Vander Heiden et al., 2009; Varma and Palsson, 1994; Vazquez et al., 2010; Vazquez and Oltvai, 2016; Zhuang et al., 2011). Notably, Basan et al. (Basan et al., 2015) provided a systematic characterization of this process, including various types of experimental perturbations. Currently, prevalent explanations (Basan et al., 2015; Chen and Nielsen, 2019) hold that overflow metabolism arises from the proteome efficiency in fermentation being consistently higher than that in respiration. However, recent studies have shown that the measured proteome efficiency in respiration is actually higher than in fermentation for many yeast and cancer cells (Shen et al., 2024), even though these cells generate fermentation products through aerobic glycolysis. This finding (Shen et al., 2024) apparently contradicts the prevalent explanations (Basan et al., 2015; Chen and Nielsen, 2019). Furthermore, most explanations (Basan et al., 2015; Chen and Nielsen, 2019; Majewski and Domach, 1990; Niebel et al., 2019; Shlomi et al., 2011; Varma and Palsson, 1994; Vazquez et al., 2010; Vazquez and Oltvai, 2016; Zhuang et al., 2011) heavily rely on the assumption that cells optimize their growth rate for a given rate of carbon influx (i.e., nutrient uptake rate) under each nutrient condition (or its equivalents). However, this assumption is questionable, as the given factors in a nutrient condition are the identities and concentrations of the carbon sources (Molenaar et al., 2009; Scott et al., 2010; Wang et al., 2019), rather than the carbon influx. Therefore, the origin and function of overflow metabolism still remain unclear (DeBerardinis and Chandel, 2020; Hanahan and Weinberg, 2011; Liberti and Locasale, 2016; Vander Heiden et al., 2009).

Why have microbes and cancer cells evolved to possess the seemingly wasteful strategy of aerobic glycolysis? For unicellular organisms, there is evolutionary pressure (Vander Heiden et al., 2009) to optimize cellular resources for rapid growth (Dekel and Alon, 2005; Edwards et al., 2001; Hui et al., 2015; Li et al., 2018; Scott et al., 2010; Towbin et al., 2017; Wang et al., 2019; You et al., 2013). In particular, it has been shown that cells allocate protein resources for optimal growth (Hui et al., 2015; Scott et al., 2010; Wang et al., 2019; You et al., 2013), and the most efficient protein allocation corresponds to elementary flux mode (Müller et al., 2014; Wortel et al., 2014). For cancer cells, disrupting the growth control system and evading immune destruction from the host are prominent hallmarks of their survival (Hanahan and Weinberg, 2011), which in certain ways mimic the evolutionary pressure on microbes to optimize cell growth rate. In this study, we apply the optimal growth principle of microbes, which also roughly holds for cancer cells, to a heterogeneous framework to address the puzzle of aerobic glycolysis. We use Escherichia coli as a typical example to show that overflow metabolism can be understood from optimal protein allocation combined with heterogeneity in enzyme catalytic rates. The optimal growth strategy varies between respiration and fermentation depending on the concentration and type of the nutrient, and the combination with cell heterogeneity results in the standard picture (Basan et al., 2015; Holms, 1996; Meyer et al., 1984; Nanchen et al., 2006; van Hoek et al., 1998) of overflow metabolism. Our model quantitatively illustrates the growth rate dependence of fermentation/respiration flux and enzyme allocation under various types of perturbations in E. coli. Furthermore, it provides a quantitative explanation for the data on the Crabtree effect in yeast and the Warburg effect in cancer cells (Bartman et al., 2023; Shen et al., 2024).

Results

Coarse-grained model

Based on the topology of the metabolic network (Neidhardt et al., 1990; Nelson et al., 2008) (see Fig. 1A), we classify the carbon sources that enter from the upper part of glycolysis into Group A (Wang et al., 2019) and the precursors of biomass components (such as amino acids) into five pools. Specifically, each pool is designated according to its entry point (see Fig. 1A and Appendix 1.2 for details): a1 (entry point: G6P/F6P), a2 (entry point: GA3P/3PG/PEP), b (entry point: pyruvate/acetyl-CoA), c (entry point: a-ketoglutarate), and d (entry point: oxaloacetate). Pools al and a2 are also combined as Pool a due to the joint synthesis of precursors. Then, the metabolic network for Group A carbon source utilization (see Fig. 1A) can be coarse-grained into a model shown in Fig. 1B (see Appendix 2.1 for details), where node A represents an arbitrary carbon source of Group A. Evidently, Fig. 1B is topologically identical to Fig. 1A. Each coarse-grained arrow in Fig. 1B represents a stoichiometric flux J_i, which delivers carbon flux and may be accompanied by energy consumption or biogenesis (e.g., J₁, J_a1, see Figs. 1A-B and Appendix-fig. 1A).

Model and results of overflow metabolism in *E. coli*.
**(A)** The central metabolic network of carbon source utilization. The Group A carbon sources (Wang et al., 2019) are labeled with green squares. (B) Coarse-grained model for Group A carbon source utilization. (C) Model predictions (see Eqs. S47 and S160) and experimental results (Basan et al., 2015; Holms, 1996) of overflow metabolism, covering the data for all the Group A carbon sources shown in (A). (D) Growth rate dependence of respiration and fermentation fluxes (see Eqs. S47 and S160). (E) The proteome efficiencies for energy biogenesis in the respiration and fermentation pathways vary with growth rate as functions of the substrate quality of a Group A carbon source (see Eqs. S31 and S36). See Appendices 8 and 10 for model parameter settings and experimental data sources (Basan et al., 2015; Holms, 1996; Hui et al., 2015) for Figs. 1–4 of *E. coli*.

In fact, the stoichiometric flux J_i scales with the cell population. For comparison with experiments, we define the normalized flux , which can be regarded as the flux per unit of biomass (the superscript “(N)” stands for normalized; see Appendix 1.3–1.4 for details). Here, M_carbon represents the carbon mass of the cell population, and m₀ is the weighted average carbon mass of metabolite molecules at the entry of precursor pools (see Eq. S17). Then, the cell growth rate λ can be represented by the total outflow of the normalized fluxes: (see Appendix 1.4). The normalized fluxes of respiration and fermentation are and , respectively (see Fig. 1A-B). In practice, each is characterized by two quantities: the proteomic mass fraction ϕ_i of the enzyme dedicated to carrying the flux and the substrate quality κ_i, such that . We take the Michaelis-Menten form for the enzyme kinetics (Nelson et al., 2008), and then (see Eq. S12 and Appendix 1.4 for details), where [S_i] is the concentration of substrate S_i, and K_i is the Michaelis constant. For each intermediate node and reaction along the pathway (e.g., node M₁ in J_a1), the substrate quality κ_i can be approximated as a constant (see Appendix 1.5): , where [S_i] ≥ K_i generally holds true in bacteria (Bennett et al., 2009; Park et al., 2016). However, the nutrient quality κ_A is a variable that depends on the nutrient type and concentration of a Group A carbon source (see Eq. S27).

Generally, there are three independent fates for a Group A carbon source in the metabolic network (Chen and Nielsen, 2019): fermentation, respiration, and biomass generation (see Appendix-fig. 1C-E). Each draws a distinct proteome fraction of ϕ_f, ϕ_r and ϕ_BM, with no overlap between them (see Appendix 2.1). The net effect of the first two fates is energy biogenesis, while the last one generates precursors for biomass, accompanied by energy biogenesis. By applying the proteomic constraint (Scott et al., 2010) that there is a maximum fraction ϕ_max for proteome allocation (ϕ_max ≈ 0.48 (Scott et al., 2010)), we have:

In fact, Eq. 1 is equivalent to (see Appendix 2.1 for derivation details), where ϕ_R and ϕ_A represent the proteomic mass fractions of the active ribosome-affiliated proteins and the cargo proteins responsible for the uptake of the Group A carbon source, respectively. During cell proliferation, ribosomes serve as the factories for protein synthesis and are primarily composed of proteins (Neidhardt et al., 1990; Nelson et al., 2008), while other biomass components, such as RNA, are optimally produced (Kostinski and Reuveni, 2020) in accordance with the growth rate determined by protein synthesis. Thus, the cell growth rate is proportional to ϕ_R: λ = ϕ_R ° κ_t, where κ_t is a parameter set by the translation rate (Scott et al., 2010) (see Appendix 1.1 for details), which can be approximated as a constant within the growth rate range of interest (Dai et al., 2016).

For balanced cell growth in bacteria, the energy demand J_E, expressed as the stoichiometric energy flux in ATP, is generally proportional to the biomass production rate (Ebenhoh et al., 2024), since the proportion of maintenance energy is roughly negligible (Locasale and Cantley, 2010) (see Appendix 9 for the cases of yeast and tumor cells). Thus, the normalized flux of energy demand in ATP, denoted as , representing the energy demand per unit of biomass, is proportional to the growth rate λ (see Appendix 2.1 for details):

where η_E is an energy coefficient (see Eqs. S25–S26 for details). By converting all energy currencies (such as NADH, FADH2, etc.) into ATP, the normalized energy fluxes for respiration and fermentation are given by and , where and are the stoichiometric coefficients of ATP production per glucose in each pathway (see Appendixfig. 1C-E and Appendix 2.1 for details). The denominator coefficient of “2” is derived from the stoichiometry of the coarse-grained reaction M₁→2M₂ (see Fig. 1A-B). Applying the criteria of flux balance (i.e., mass conservation; see Appendix 1.3) at each intermediate node (M_i, i = 1, …, 5) and precursor pool (Pool i, i = a1, a2, b, c, d), along with the constraints of proteome allocation (see Eq. 1) and energy demand (see Eq. 2), we obtain the relations between normalized energy fluxes and growth rate for a given nutrient condition with a fixed κ_A (see Appendix 2.1 for details):

where φ is a constant coefficient primarily determined by η_E (see Eq. S33), and φ·λ represents the normalized flux of energy demand, excluding the energy biogenesis from the biomass synthesis pathway. The coefficients ψ, ψ_r and ψ_f are functions of κ_A · ψ^-1 denotes the proteome efficiency for biomass generation in the biomass synthesis pathway (see Eq. S32), defined as ψ^-1 = λ/ϕ_BM (see Appendix 2.1). ψ_r and ψ_f represent the proteome efficiencies for energy biogenesis in the respiration and fermentation pathways, respectively, defined as the normalized energy fluxes expressed in ATP generated per proteomic mass fraction, with and . Hence,

where both and are composite parameters that can be approximated as constants, with = 1/κ₁ + 2/κ₂ + 2/κ₃ + 2/κ₄ and = 1/κ₁ + 2/κ₂ + 2/κ₆ (see Appendices 1.5 and 2.1 for details).

Origin of overflow metabolism

The standard picture of overflow metabolism (Basan et al., 2015; Holms, 1996; Meyer et al., 1984; Nanchen et al., 2006; van Hoek et al., 1998) is exemplified by the experimental data (Basan et al., 2015) presented in Fig. 1C, where the fermentation flux exhibits a threshold-analog dependence on the growth rate λ. It is well established that respiration is significantly more efficient than fermentation in terms of energy biogenesis per unit of carbon (i.e., ) (Nelson et al., 2008; Vander Heiden et al., 2009). Then, why do cells bother to use the seemingly wasteful fermentation pathway? We proceed to address this issue by applying optimal protein allocation (Scott et al., 2010; Wang et al., 2019) within the framework of optimal growth.

For cell proliferation in a given nutrient condition (i.e., with a fixed κ_A), the values of ε_r, ε_f and ψ are determined (Eqs. 4 and S32). However, the growth rate λ can be influenced by protein allocation between respiration and fermentation, specifically λ_r and λ_f, according to the governing equation (Eq. 3). If ε_r > ε_f, that is, if the proteome efficiency in respiration is higher than that in fermentation, then . The optimal growth strategy is , meaning that the cell exclusively uses respiration. Conversely, if ε_f> ε_r, then is optimal, and the cell solely uses fermentation. In either case, the choice between respiration and fermentation for optimal growth is determined by weighing the proteome efficiencies.

In practice, both proteome efficiencies ε_r and ε_f are functions of nutrient quality κ_A, which can be influenced by the nutrient type and concentration of the carbon source (see Eqs. 4 and S27). Therefore, the choice for optimal growth may vary depending on the nutrient conditions. In nutrient-poor conditions where and , the proteome efficiencies can be approximated by and (see Eq. 4), and hence ε_r (κ_A)>ε_f(κ_A) (since ), meaning that the proteome efficiency of respiration is higher than that of fermentation under nutrient-poor conditions. In rich media, however, using the parameters of κ_i derived from in vivo/in vitro experimental data of E. coli (see Appendix-tables 1–2 and Appendix 6.1–6.2), we obtain with Eq. 4 (see also Eqs. S39–S40), where represents the substrate quality of glucose at saturated concentration (abbreviated as “ST” in the superscript).

This means that the proteome efficiency in fermentation is higher than that in respiration for bacteria in rich media. Indeed, recent studies have validated that the measured proteome efficiency in fermentation is higher than in respiration for E. coli in lactose at saturated concentration (Basan et al., 2015), i.e., . In Fig. 1E, we present the growth rate dependence of proteome efficiencies ε_r and ε_f in a three-dimensional (3D) form using the collected data shown in Appendix-table 1, where ε_r, ε_f and the growth rate λ all vary as functions of nutrient quality κ_A. Furthermore, the ratio Δ (defined as Δ(κ_A) ≡ ε_f (κ_A)/ε_r (κ_A)) is a monotonically increasing function of κ_A, and there exists a critical value of κ_A (denoted as ; see Appendix 2.2 for details) satisfying . Below , the cell grows slowly, and the choice for optimal growth is respiration (with ε_f < ε_r, λ = ϕ_max · (ψ + φ/ε_r)^-1), while above , the cell grows faster, and the optimal growth strategy is fermentation (with ε_f < ε_r, λ = ϕ_max · (ψ + φ/ε_f)^-1)). The above analysis qualitatively explains the phenomenon of aerobic glycolysis.

For a quantitative understanding of overflow metabolism, let us first consider the homogeneous case, where all cells share identical biochemical parameters. For optimal protein allocation, the relation between fermentation flux and growth rate is given by , where “Q” represents the Heaviside step function, and λ_C is the critical growth rate for a nutrient condition with (i.e., λ_C ≡ λ()). Similarly, the growth rate dependence of respiration flux is . These digital response outcomes are consistent with the numerical simulation findings of Molenaar et al. (Molenaar et al., 2009). However, they are clearly incompatible with the threshold-analog response observed in the standard picture of overflow metabolism (Basan et al., 2015; Holms, 1996; Meyer et al., 1984; Nanchen et al., 2006; van Hoek et al., 1998).

To address this issue, we take into account cell heterogeneity, which is ubiquitous in both microbes (Ackermann, 2015; Bagamery et al., 2020; Balaban et al., 2004; Nikolic et al., 2013; Solopova et al., 2014; Wallden et al., 2016; Yaginuma et al., 2014; Zhang et al., 2018) and tumor cells (Duraj et al., 2021; Hanahan and Weinberg, 2011; Hensley et al., 2016). In the context of the Warburg effect or overflow metabolism, experimental studies have reported significant metabolic heterogeneity in the choice between respiration and fermentation within a cell population (Bagamery et al., 2020; Duraj et al., 2021; Hensley et al., 2016; Nikolic et al., 2013). Motivated by the observation that the turnover number (k_cat value) of a catalytic enzyme varies considerably between in vitro and in vivo measurements (Davidi et al., 2016; Garcia-Contreras et al., 2012), we note that the concentrations of potassium and phosphate, which vary from cell to cell, have a significant impact on the k_cat values of metabolic enzymes (Garcia-Contreras et al., 2012). Therefore, within a cell population, there is a distribution of k_cat values for a catalytic enzyme, commonly referred to as extrinsic noise (Elowitz et al., 2002). For simplicity, we assume that the k_cat values for each enzyme follow a Gaussian distribution. This leads to varied distributions of single-cell growth rate in different carbon sources (see Eqs. S155–S157, S163–S165), which has been fully verified by recent experiments using isogenic E. coli at single-cell resolution (Wallden et al., 2016) (Appendix-fig. 2B). Consequently, the critical growth rate λ_C should follow a Gaussian distribution within a cell population (see Appendix 7 for details), where μ_λ_C is approximated by the deterministic result of λ_C (Eq. S43). We assume that the coefficient of variation (CV) is σ_λ_C/μ_λ_C = 12%, or equivalently, the CV for the catalytic rate of each metabolic enzyme is 25%. Thus, we obtain the growth rate dependence of fermentation and respiration fluxes (see Appendix 2.3 for details):

where “erf” represents the error function. The fermentation flux exhibits a threshold-analog relation with the growth rate (the red curves in Figs. 1C-D, 2B-C, and 3B, D, F), while the respiration flux (the blue curve in Fig. 1D) decreases with an increase in fermentation flux. In Fig. 1C-D, we observe that the model results (see Eq. 5 and Appendix 8; parameters are set based on the experimental data shown in Appendix-table S1) quantitatively agree with the experimental data from E. coli (Basan et al., 2015; Holms, 1996). The fermentation flux was determined by the acetate secretion rate , and the respiration flux was deduced from the carbon dioxide flux (the superscript “(M)” represents the measurable flux in the unit of mM/OD600/h; see Appendix 8.1 for details). Therefore, by incorporating cell heterogeneity, our model of optimal protein allocation quantitatively explains overflow metabolism.

Influence of protein overexpression on overflow metabolism in *E. coli*.
**(A)** A 3D plot of the relations among fermentation flux, growth rate, and the expression level of useless proteins. In this plot, both the acetate excretion rate and growth rate vary as bivariate functions of the nutrient quality of a Group A carbon source (denoted as *κ_A*) and the useless protein expression encoded by *lacZ* gene (denoted as *ϕ_Z* perturbation; see Eqs. S57 and S160). (B) Growth rate dependence of the acetate excretion rate upon *ϕ_Z* perturbation for each fixed nutrient condition (see Eqs. S58 and S160). (C) Growth rate dependence of the acetate excretion rate as *κ_A* varies (see Eqs. S57 and S160), with each fixed expression level of LacZ.

Influence of energy dissipation, translation inhibition, and carbon source category alteration on overflow metabolism in *E. coli*.
**(A)** A 3D plot of the relations among fermentation flux, growth rate, and the energy dissipation coefficient (see Eqs. S70 and S160). (B) Growth rate dependence of the acetate excretion rate as the nutrient quality *κ_A* varies, with each fixed energy dissipation coefficient determined by or fitted from experimental data. (C) A 3D plot of the relations among fermentation flux, growth rate, and the translation efficiency (see Eqs. 85 and S160). Here, the translation efficiency is adjusted by the dose of chloramphenicol (Cm). (D) Growth rate dependence of the acetate excretion rate as *κ_A* varies, with each fixed dose of Cm. (E) Coarse-grained model for pyruvate utilization. (F) The growth rate dependence of fermentation flux in pyruvate (see Eqs. 105 and S160) significantly differs from that of the Group A carbon sources (see Eqs. 47 and S160).

Testing the model through perturbations

To further test our model, we systematically investigate its predictions under various types of perturbations and compare them with experimental data from existing studies (Basan et al., 2015; Holms, 1996) (see Appendices 3 and 4.1 for details).

First, we consider the proteomic perturbation caused by overexpression of useless proteins encoded by the lacZ gene (i.e., ϕ_Z perturbation) in E. coli. The net effect of the ϕ_Z perturbation is that the maximum fraction of the proteome available for resource allocation changes from ϕ_max to ϕ_max − ϕ_Z (Basan et al., 2015), where ϕ_Z is the proteomic mass fraction of useless proteins. In a cell population, the critical growth rate λ_C (ϕ_Z) still follows a Gaussian distribution N (μ_λ_C (ϕ_Z), σ_λ_C (ϕ_Z)²), where the CV of λ_C (ϕ_Z) remains unchanged. Consequently, the growth rate dependence of fermentation flux changes to (see Appendix 3 for model perturbation results regarding respiration flux), where both the growth rate λ(κ_A, ϕ_Z) and the normalized fermentation flux are bivariate functions of κ_A and ϕ_Z (see Eqs. S49 and S56–S57). For each degree of LacZ expression (with fixed ϕ_Z), similar to wild-type strains, the fermentation flux exhibits a threshold-analog response to growth rate as κ_A varies (see Fig. 2C), which agrees quantitatively with experimental results (Basan et al., 2015). The shifts in the critical growth rate λ_C(ϕ_Z) are fully captured by μ_λ_C (ϕ_Z) = μ_λ_C (0) (1 − ϕ_Z/ϕ_max). In contrast, for nutrient conditions with each fixed κ_A, since the growth rate changes with ϕ_Z just like λ_C (ϕ_Z): λ(κ_A, ϕ_Z) = λ (κ_A, 0)(1 − ϕ_Z/ϕ_max), the fermentation flux is then proportional to the growth rate for the varying levels of LacZ expression: , where the slope is a monotonically increasing function of the substrate quality κ_A. These scaling relations are well validated by the experimental data (Basan et al., 2015) shown in Fig. 2B. Finally, in the case where both κ_A and ϕ_Z are free to vary, the growth rate dependence of fermentation flux presents a threshold-analog response surface in a 3D plot, where ϕ_Z appears explicitly as the y-axis (see Fig. 2A). Experimental data points (Basan et al., 2015) lie right on this surface, which is highly consistent with the model predictions.

Next, we study the influence of energy dissipation, which introduces an energy dissipation coefficient w to Eq. 2: . Similarly, the critical growth rate in this case, λ_C(w), follows a Gaussian distribution N(μ_λ_C(w), σ_λ_C(w)²) in a cell population. The relation between the growth rate and fermentation flux can be characterized by (see Appendix 3.2 for details): . In Fig. 3A-B, we present a comparison between the model results and experimental data (Basan et al., 2015) in 3D and 2D plots, which demonstrate good agreement. A notable characteristic of energy dissipation, as distinguished from ϕ_Z perturbation, is that the fermentation flux increases despite a decrease in the growth rate when κ_A is fixed.

We proceed to analyze the impact of translation inhibition with different sub-lethal doses of chloramphenicol on E. coli. This type of perturbation introduces an inhibition coefficient ι to the translation rate, thus turning κ_t into κ_t/(ι + 1). Still, the critical growth rate λ_C(ι) follows a Gaussian distribution N (μ_λ_C(ι), σ_λ_C(ι)²), and then, the growth rate dependence of fermentation flux is is given by: (see Appendix 3.3 for details). In Appendix-fig. 2D-E, we observe that the model predictions are generally consistent with the experimental data (Basan et al., 2015). However, a noticeable systematic discrepancy arises when the translation rate is low. Therefore, we consider maintenance energy, which is typically tiny and generally negligible for bacteria over the growth rate range of interest (Basan et al., 2015; Locasale and Cantley, 2010; Neidhardt, 1996). Encouragingly, by assigning a very small value to the maintenance energy coefficient w₀ (w₀ = 2.5 (h^-1)), the model results for the growth rate-fermentation flux relation quantitatively agree with experiments (Basan et al., 2015) (see Fig. 3C-D and Appendix 3.3 for details).

Finally, we consider the alteration of nutrient categories by switching to a non-Group A carbon source: pyruvate, which enters the metabolic network from the endpoint of glycolysis (Neidhardt et al., 1990; Nelson et al., 2008). The coarse-grained model for pyruvate utilization is shown in Fig. 3E (see also Fig. 1A), which shares identical precursor pools with those for Group A carbon sources, yet has several differences in the coarse-grained reactions. The growth rate dependencies of both the proteome efficiencies (see Appendix-fig. 2H) and energy fluxes (see Fig. 3F) are qualitatively similar to those of Group A carbon source utilization, while there are quantitative differences in the coarse-grained parameters (see Appendices 4.1 and 8 for details). Most notably, the critical growth rate and the ATP production per glucose in the fermentation pathway for pyruvate utilization are noticeably smaller than those for Group A sources (i.e., λ_C and , respectively). Consequently, the growth rate dependence of fermentation flux in pyruvate should present a distinctly different curve from that of Group A carbon sources (see Eqs. 5 and S105), which is fully validated by experimental results (Holms, 1996) (see Fig. 3F).

Enzyme allocation under perturbations

As mentioned above, our coarse-grained model is topologically identical to the central metabolic network (Fig. 1A) and can thus predict enzyme allocation for each gene in glycolysis and the TCA cycle (see Appendix-fig. 1B and Appendix-table 1) under various types of perturbations. In Fig. 1B, the intermediate nodes M₁, M₂, M₃, M₄, and M₅ represent G6P, PEP, acetyl-CoA, α-ketoglutarate, and oxaloacetate, respectively. Therefore, ϕ₁ and ϕ₂ correspond to enzymes involved in glycolysis (or at the junction of glycolysis and the TCA cycle), while ϕ₃ and ϕ₄ correspond to enzymes in the TCA cycle (see Fig. 1A-B and Appendix 2.1).

We first consider enzyme allocation under carbon limitation by varying the nutrient type and concentration of a Group A carbon source (i.e., κ_A perturbation). This has been extensively studied in more simplified models (Hui et al., 2015; You et al., 2013), where the growth rate dependence of enzyme allocation under κ_A perturbation is generally described by a C-line response (Hui et al., 2015; You et al., 2013). Specifically, the genes responsible for digesting carbon compounds exhibit a linear increase in gene expression as the growth rate decreases (Hui et al., 2015; You et al., 2013). However, when it comes to enzymes catalyzing reactions between intermediate nodes, we gathered experimental data from existing studies (Hui et al., 2015) and found that the enzymes in glycolysis exhibit a completely different response pattern compared to those in the TCA cycle (Appendix-fig. 3A-B). This discrepancy cannot be explained by the C-line response. To address this issue, we apply the coarse-grained model described above (see Fig. 1B) to calculate the growth rate dependence of enzyme allocation for each ϕ_i(i = 1, 2, 3, 4) using model settings for wild-type strains, with no fitting parameters influencing the shape (see Eqs. S118–S119 and Appendix 8). In Fig. 4A-B and Appendix-fig. 3C-D, we see that the model predictions overall match with the experimental data (Hui et al., 2015) for representative genes from either glycolysis or the TCA cycle, and maintenance energy (with w₀ = 2.5 (h^-1)) has a negligible effect on this process. Still, there are minor discrepancies that arise from the basal expression of metabolic genes, which may be attributed to the fact that our model deals with relatively stable growth conditions while microbes need to be prepared for fluctuating environments (Basan et al., 2020; Kussell and Leibler, 2005; Mori et al., 2017).

Relative protein expression of central metabolic enzymes in *E. coli* under nutrient limitation and proteomic perturbation.
**(A, C)** Relative protein expression of representative genes from glycolysis. (**B, D**) Relative protein expression of representative genes from the TCA cycle. (**A, B**) Results of the perturbation through changes in nutrient quality *κ_A* (see Eq. S119). (**C, D**) Results of proteomic perturbation via varied levels of expression of the useless protein LacZ (i.e., *ϕ_Z* perturbation; see Eq. S121).

We proceed to analyze the influence of ϕ_Z perturbation and energy dissipation. In both cases, our model predicts a linear response to growth rate reduction for all genes in either glycolysis or the TCA cycle (see Appendix 5.2–5.3 for details). For ϕ_Z perturbation, all predicted slopes are positive, and there are no fitting parameters involved (Eqs. S120–S121). In Fig. 4C-D and Appendix-fig. 3E-J, we show that our model quantitatively illustrates the experimental data (Basan et al., 2015) for representative genes in the central metabolic network, and there is better agreement with experiments (Basan et al., 2015) by incorporating the maintenance energy (with w₀ = 2.5 (h^-1) as aforementioned). For energy dissipation, however, the predicted slopes of the enzymes corresponding to ϕ₄ are negative, and there is a constraint that the slope signs of the enzymes corresponding to the same ϕ_i(i = 1, 2, 3) should be the same. In Appendix-fig. 3K-N, we see that the model results (Eqs. S127 and S123) are consistent with experiments (Basan et al., 2015).

Explanation of the Crabtree effect in yeast and the Warburg effect in cancer cells

We proceed to apply our model to explain the Crabtree effect in yeast (Bagamery et al., 2020; De Deken, 1966; Shen et al., 2024) and the Warburg effect in tumors (Bartman et al., 2023; Duraj et al., 2021; Hanahan and Weinberg, 2011; Liberti and Locasale, 2016; Shen et al., 2024; Vander Heiden et al., 2009) with slight modifications using the optimal growth principle combined with cell heterogeneity (see Appendix 9 and Appendix-fig. 5). For yeast and tumors, similar to the case of E. coli, the proteome efficiencies ε_r and ε_f are both increasing functions of the nutrient quality κ_A (see Eq. S170). Specifically, under poor nutrient conditions (i.e., ε_A is small), the proteome efficiency in respiration is higher than that in fermentation: ε_r > ε_f (see Eqs. S174–S175), making respiration the best choice for optimal growth (see Eq. S171). Conversely, when the nutrient is abundant and ε_f > ε_r, aerobic glycolysis (i.e., fermentation) becomes the optimal growth strategy (see Eq. S172). Further combination with cell heterogeneity results in the standard picture of overflow metabolism, which has indeed been observed in yeast (van Hoek et al., 1998). However, it remains challenging to tune the growth rate of cancer cells in vivo.

Recently, Shen et al. (Shen et al., 2024) discovered that the proteome efficiency measured at the cell population level in respiration (i.e., 〈ε_r〉; where “〈 〉” denotes the population average) is higher than that in fermentation (i.e., 〈ε_f〉) for many yeast and cancer cells, despite the presence of fermentation fluxes through aerobic glycolysis. Evidently, this finding (Shen et al., 2024) contradicts prevalent explanations (Basan et al., 2015; Chen and Nielsen, 2019), which hold that overflow metabolism arises because the proteome efficiency in fermentation is consistently higher than in respiration. Nevertheless, our model may resolve this puzzle due to the incorporation of two important features. First, our model predicts that the proteome efficiency in respiration is larger than that in fermentation when nutrient quality is low (see Eqs. S174–S175). Second, and crucially, by accounting for cell heterogeneity, our model allows a proportion of cells to have a higher proteome efficiency in fermentation than in respiration, even when the overall proteome efficiency in respiration at the cell population level is greater than that in fermentation (i.e., 〈ε_r〉 > 〈ε_f〉).

To compare our model results quantitatively with experimental data on yeast and tumors (Shen et al., 2024), we define as the fraction of ATP produced through fermentation. To account for cell heterogeneity, we apply Gaussian distributions to enzyme turnover numbers, as mentioned above. This yields the relationship between and 〈ε_r〉 and 〈ε_f〉 through derivations (see Eqs. S180–S190 and Appendix 9 for details):

where χ_εr and χ_εf represent the CVs of proteome efficiencies ε_r and ε_f, respectively. Due to the higher levels of cell heterogeneity in yeast (Bagamery et al., 2020) and cancer cells (Duraj et al., 2021; Hanahan and Weinberg, 2011; Hensley et al., 2016), the CVs of ε_r and ε_f (i.e., χ_εr and χ_εf) in these cells are expected to be significantly higher than those in E. coli, although their precise values are unknown. The values for the variables shown in Eq. 6 can be obtained from experiments. Therefore, we plot the theoretical results from Eq. 6 using χ_εr and χ_εf values of 0.25, 0.40, and 0.58 to compare with experimental data from yeast and in vivo mouse tumors (Bartman et al., 2023; Shen et al., 2024). As shown in Fig. 5A-B, the theoretical results with χ_εr = χ_εf = 0.58 align quantitatively with the experimental data (Bartman et al., 2023; Shen et al., 2024) on both logarithmic and linear scales, demonstrating that our model has the potential to quantitatively explain the Crabtree effect in yeast and the Warburg effect in cancer cells.

Model comparison with data on the Crabtree effect in yeast and the Warburg effect in tumors.
**(A)** A linear scale representation on the y-axis. (B) A log scale representation on the y-axis. In (**A-B**), 〈*ε_r*〉 and 〈*ε_f*〉 represent the population averages of *ε_r* and *ε_f*, while *χ_ε_r* and *χ_ε_f* are the coefficients of variation (CVs) of *ε_r* and *ε_f* · 〈*ε_r*〉/〈*ε_f*〉 represents the ratio of proteome efficiency between respiration and fermentation at the population-averaged level, while stands for the fraction of energy flux generated by the fermentation pathway (see Eq. 6). The data for yeast in batch culture and chemostat were calculated from experimental data of *S. cerevisiae* and *I. orientalis* (Shen et al., 2024). The data for mouse tumors were calculated from in vivo experimental data of pancreatic ductal adenocarcinoma (PDAC) and leukemic spleen of mice (Bartman et al., 2023; Shen et al., 2024). See Appendix 10 for detailed information on the experimental data sources (Bartman et al., 2023; Shen et al., 2024).

Discussion

The phenomenon of overflow metabolism, or the Warburg effect, has been a long-standing puzzle in cell metabolism. Although many rationales have been proposed over the past century (Basan et al., 2015; Chen and Nielsen, 2019; Majewski and Domach, 1990; Molenaar et al., 2009; Niebel et al., 2019; Peebo et al., 2015; Pfeiffer et al., 2001; Shlomi et al., 2011; Vander Heiden et al., 2009; Varma and Palsson, 1994; Vazquez et al., 2010; Vazquez and Oltvai, 2016; Zhuang et al., 2011), contradictions persist (Shen et al., 2024), leaving the origin and function of this phenomenon unclear (DeBerardinis and Chandel, 2020; Hanahan and Weinberg, 2011; Liberti and Locasale, 2016; Vander Heiden et al., 2009). In this study, we use E. coli as a typical example and demonstrate that overflow metabolism can be understood through optimal protein allocation combined with cell heterogeneity. Under nutrient-poor conditions, the proteome efficiency of respiration is higher than that of fermentation (see Fig. 1E), and thus the cell uses respiration to optimize growth. In rich media, however, the proteome efficiency of fermentation increases more rapidly and surpasses that of respiration (see Fig. 1E), leading the cell to adopt fermentation as the optimal growth strategy. In further combination with cell heterogeneity in enzyme catalytic rates (Davidi et al., 2016; Garcia-Contreras et al., 2012), our model quantitatively illustrates the threshold-analog response (Basan et al., 2015; Holms, 1996) in overflow metabolism (see Fig. 1C). Furthermore, it quantitatively explains the data on the Crabtree effect in yeast and the Warburg effect in cancer cells (Bartman et al., 2023; Shen et al., 2024).

In existing rationales (Basan et al., 2015; Chen and Nielsen, 2019; Majewski and Domach, 1990; Shlomi et al., 2011; Varma and Palsson, 1994; Vazquez and Oltvai, 2016), the standard picture of overflow metabolism (Basan et al., 2015; Holms, 1996; Meyer et al., 1984; Nanchen et al., 2006; van Hoek et al., 1998) has primarily been illustrated by a threshold-linear response, which heavily depends on the assumption that cells optimize their growth rate based on a given rate of carbon influx under each nutrient condition (or similar equivalents, see Appendix 6.3). However, in practice, for microbes or tumor cells grown in vitro or in vivo, the given factors are the identity and concentration of the nutrient (Molenaar et al., 2009; Scott et al., 2010; Wang et al., 2019), rather than the rate of carbon influx. Additionally, prevalent explanations (Basan et al., 2015; Chen and Nielsen, 2019) suggest that overflow metabolism originates from the proteome efficiency in fermentation always being higher than that in respiration (see Appendix 6.3 for details). While it has been observed in E. coli that proteome efficiency in fermentation is higher than that in respiration for cells cultured in lactose at saturated concentration (Basan et al., 2015), Shen et al. (Shen et al., 2024) reported that for many yeast and cancer cells, the proteome efficiency in fermentation is noticeably lower than that in respiration, despite the presence of aerobic glycolytic fermentation flux. This observation (Shen et al., 2024) evidently contradicts the prevalent explanations (Basan et al., 2015; Chen and Nielsen, 2019). Our model resolves this puzzle by significantly differing from existing rationales in its optimization principle, where we purely optimize cell growth rate without imposing a special constraint on carbon influx (see Appendix 6.3 for details). More importantly, our model incorporates cell heterogeneity, which is crucial for both explaining the threshold-analog response in overflow metabolism and for resolving this puzzle raised by Shen et al. (Shen et al., 2024).

In the homogeneous case, the optimal growth strategy for growth rate dependent fermentation flux results in a digital response (see Eq. S44), corresponding to an elementary flux mode (Müller et al., 2014; Wortel et al., 2014), which aligns with the numerical study by Molenaar et al. (Molenaar et al., 2009), yet is incompatible with the standard picture of overflow metabolism (Basan et al., 2015; Holms, 1996; Meyer et al., 1984; Nanchen et al., 2006; van Hoek et al., 1998). Furthermore, cells would not generate fermentation flux if the proteome efficiency in fermentation were higher than that in respiration under the optimal growth framework. By incorporating heterogeneity in enzyme catalytic rates (Davidi et al., 2016; Garcia-Contreras et al., 2012), the critical growth rate (i.e., threshold) shifts from a single value to a Gaussian distribution (see Eq. S45 and Appendix 7 for details; see also Appendix-fig. 4) across a cell population, thereby turning a digital response into the threshold-analog response observed in overflow metabolism (see Fig. 1C). Moreover, cell heterogeneity allows a fraction of cells to possess a larger proteome efficiency in fermentation than in respiration despite the overall proteome efficiency in respiration at the cell population level is higher than in fermentation. This mechanism facilitates the fermentation flux in yeast and cancer cells observed by Shen et al. (Shen et al., 2024) (see Fig. 5A-B).

Our model results, based on cell heterogeneity, are further supported by observed distributions of single-cell growth rates in E. coli (Wallden et al., 2016) (see Appendix-fig. 2B), and by experiments involving various types of perturbations (Basan et al., 2015; Holms, 1996; Hui et al., 2015), both in terms of acetate secretion patterns and gene expression in the central metabolic network (see Fig. 2–4, Appendix-figs. 2D-E and 3). Furthermore, the heterogeneity patterns predicted by our model for fermentation and respiration modes within a cell population are validated by experiments on intra-tumor heterogeneity in glioblastoma using optical microscopy (Duraj et al., 2021) and align with the non-genetic heterogeneity observed in experiments with S. cerevisiae (Bagamery et al., 2020) and E. coli (Nikolic et al., 2013). Finally, our model can be broadly applied to address heterogeneity-related challenges in metabolism on a quantitative basis, including diverse metabolic strategies of cells in various environments (Bagamery et al., 2020; Duraj et al., 2021; Escalante-Chong et al., 2015; Hensley et al., 2016; Liu et al., 2015; Solopova et al., 2014; Wang et al., 2019).

Additional information

Author contributions

X.W. conceived and designed the project, developed the model, carried out the analytical and numerical calculations, and wrote the paper.

Competing Interest Statement

The author declares no competing interests.

Data, Materials, and Software Availability

All study data are included in the article and/or appendices.

Acknowledgements

The author thanks Chao Tang, Qi Ouyang, Yang-Yu Liu and Kang Xia for helpful discussions. This work was supported by National Natural Science Foundation of China (Grant Nos.12004443 and 12474207), Guangzhou Municipal Innovation Fund (Grant No.202102020284) and the Hundred Talents Program of Sun Yat-sen University.

Appendix 1 Model framework

Appendix 1.1 Proteome partition

Here we adopt the proteome partition framework similar to that introduced by Scott et al. (Scott et al., 2010). All proteins in a cell are classified into three classes: the fixed portion Q-class, the active ribosome-affiliated R-class, and the remaining catabolic/anabolic enzymes C-class. Each proteome class has a mass and mass fraction Q_i, where Q_Q is a constant, and we define ϕ_max = 1 − ϕ_Q. In the exponential growth phase, the ribosome allocation for protein synthesis of each class is f_i, with f_Q + f_R + f_C = 1.

To analyze cell growth optimization, we first consider the homogeneous case where all cells share identical biochemical parameters, simplifying the mass accumulation of the cell population into a “big cell”. This simplification does not affect the value of growth rate λ. For bacteria, the protein turnover is negligible, so the mass accumulation of each class follows:

where m_AA stands for the average molecular weight of amino acids, k_T is the translation rate, is the number of ribosomes, m_R is the protein mass of a single ribosome, and is the total protein mass of ribosomes, with (Neidhardt, 1996; Scott et al., 2010). For a specific stable nutrient environment, f_R and k_T are temporal invariants. Thus,

where λ = f_R ° k_T ° m_AA/(ς °m_R) , and the total protein mass of the cell population follows:

Over a long period in the exponential growth phase (i.e., t → + ∞), we have ϕ_i = f_i (i = Q, R, C), and

where K_t = k_T ° m_AA/(ς ° m_R).

Appendix 1.2 Precursor pools

Based on the entry points of the metabolic network, we classify the precursors of biomass components into five pools (Fig. 1A-B): a1 (entry point: G6P/F6P), a2 (entry point: GA3P/3PG/PEP), b (entry point: pyruvate/acetyl-CoA), c (entry point: a-ketoglutarate), and d (entry point: oxaloacetate). These five pools draw approximately r_a1 = 24%, r_a2 = 24%, r_b = 28%, r_c = 12% and r_d = 12% of the carbon flux (Nelson et al., 2008; Wang et al., 2019). There are overlapping components between Pools a1 and a2 due to the joint synthesis of some precursors. Therefore, we use Pool a to represent both Pools a1 and a2 in the descriptions.

Appendix 1.3 Stoichiometric flux

We consider the following biochemical reaction between substrate S_i and enzyme E_i:

where a_i, d_i and are the reaction parameters, S_i+1 is the product, b_i and c_i are the stoichiometric coefficients. For most of the reactions in the central metabolism, b_i = 1 and c_i = 0. The reaction rate follows Michaelis-Menten kinetics (Nelson et al., 2008):

where , [E_i] and [S_i] are the Michaelis constant, and the concentrations of enzyme E_i and substrate S_i, respectively. For this reaction (Eq. S5), d[S_i+1]/dt = b_i ° v_i and d[S_i]/dt = −v_i. In the cell population (the “big cell”), suppose that the cell volume is V_cell, then the stoichiometric flux of the reaction is:

The copy number of enzyme E_i is N_{E_i} = V_cell °[E_i] with a total weight of M_{E_i} = N_{E_i} ° m_{E_i}, where m_{E_i} is the molecular weight of E_i. By defining the enzyme cost of an E_i molecule as n_{E_i} = m_{E_i}/m₀, where m₀ is a unit mass, then the cost of all E_i molecules is Φ_i = N_{E_i} ° n_{E_i} (Wang et al., 2019). By further defining , then:

The mass fraction of enzyme E_i in the proteome is ϕ_i = M_{E_i}/M_protein, and thus:

Appendix 1.4 Carbon flux and cell growth rate

To clarify the relation between the stoichiometric flux J_i and growth rate λ, we consider the carbon flux in the biomass production. The carbon mass of the cell population (the “big cell”) is given by M_carbon = M_protein ° r_carbon/r_Proein, where r_carbon and r_protem represent the mass fraction of carbon and protein within a cell. In the exponential growth phase, the carbon flux of the biomass production is given by:

where m_carbon is the mass of a carbon atom. In fact, the carbon mass flux per stoichiometry varies depending on the entry point of the precursor pool. Taking Pool b as an example, there are three carbon atoms in a molecule of the entry point metabolite (i.e., pyruvate). Assuming that carbon atoms are conserved from pyruvate to Pool b, then the carbon flux of Pool b is given by , where J_b is the stoichiometric flux from pyruvate to Pool b (Fig. 1A-B) and stands for the carbon number of a pyruvate molecule. Combining with Eq. S10 and noting that , we get J_b ° ° m_carbon = r_b ° λ ° M_carbon. Similarly, for each precursor pool, we have:

where the subscript “EP_i“ represents the entry point of Pool i, and is the number of carbon atoms in a molecule of the entry-point metabolite.

For each substrate in intermediate steps of the metabolic network, we define K_i as the substrate quality:

and for each precursor pool, we define:

Combining Eqs. S8, S9 and S11, we have

Then, we define the normalized flux, which can be regarded as the flux per unit of biomass:

where the superscript “(N)” stands for normalized. Combined with Eqs. S8, S9 and S12, we have:

Since , by setting

we then obtain:

and we have , and

Appendix 1.5 Intermediate nodes

In a metabolic network, the metabolites between the carbon source and precursor pools are referred to as intermediate nodes. As specified by Wang et al. (Wang et al., 2019), to optimize cell growth rate, the substrate of each intermediate node is nearly saturated, and thus:

Real cases could be more complicated due to other forms of metabolic regulations. Recent quantitative studies (Bennett et al., 2009; Park et al., 2016) have shown that, at least in E. coli, for most of the substrate-enzyme pairs, K_i is lower than the substrate concentration (i.e., [S_i] > K_i), which implies .

Appendix 2 Model and analysis

Appendix 2.1 Coarse-grained model

In the coarse-grained model shown in Fig. 1B, node A represents an arbitrary carbon source of Group A (Wang et al., 2019), which joins at the upper part of glycolysis. Nodes M1, M2, M3, M4, and M5 stand for G6P, PEP, acetyl-CoA, a-ketoglutarate, and oxaloacetate, respectively. In the analysis of carbon supply into precursor pools, we lump sum G6P/F6P as M1, GA3P/3PG/PEP as M2, and pyruvate/acetyl-CoA as M3 for approximation. For the biochemical reactions, each follows Eq. S5 with b_i = 1 except for M1 →2M2 and M3+M5→M4. Basically, there are three independent possible fates for a Group A carbon source (e.g., glucose, see Appendix-fig. 1C-E) (Chen and Nielsen, 2019): energy biogenesis through fermentation; energy biogenesis via respiration (Appendix-fig.1C-D), or conversion into biomass components accompanied by energy biogenesis in the biomass pathway. Each fate involves a distinct fraction of the proteome, with no overlap between them (Appendix-fig. 1).

By applying flux balance to the stoichiometric fluxes and combining with Eq. S8, we have:

Obviously, the stoichiometric fluxes of respiration J_r and fermentation J_f (Appendix-fig. 1C-D) are:

We further assume that the carbon atoms are conserved from each entry point metabolite to the precursor pool, and then,

In terms of energy biogenesis for the relevant reactions, for convenience, we convert all the energy currencies into ATPs, namely, NADH→2ATP (Neidhardt et al., 1990), NADPH→2ATP (Neidhardt et al., 1990; Sauer et al., 2004), FADH₂→1ATP (Neidhardt et al., 1990). Then, we have

where J_E represents the energy demand for cell proliferation, expressed as the stoichiometric energy flux in ATP. β_i is the stoichiometric coefficient with β₁ = 4, β₂ = 3, β₃ = 2 , β₄ = 6, β₆ = 1, and β_a1 = 4 for E. coli (Neidhardt et al., 1990; Sauer et al., 2004). For bacteria, the energy demand is generally proportional to the carbon flux infused into biomass production, as the proportion of maintenance energy is roughly negligible (Locasale and Cantley, 2010). Thus,

where r_E is the ratio and also a constant.

By applying the substitutions specified in Eqs. S9, S12, S14–S18, combined with Eqs. S4, S10, S21–S25, and the constraint of proteome resource allocation ϕ_R + ϕ_C = ϕ_max, we have:

where and η_E are defined as and , respectively.

Here, for each intermediate node, K_i follows Eq. S20, which can be approximated as a constant. The substrate quality of the Group A carbon source k_A varies with the identity and concentration of the Group A carbon source:

which is determined externally by the culture condition. From Eq. S26, all ϕ_i and ϕ_R can be expressed in terms of , and λ :

In Eq. S28, for each ϕ_i or ϕ_R, the - and -related proteome fraction terms belong to the fractions of the proteome dedicated to respiration (denoted as far) and fermentation (denoted as faf), respectively. The 2 -related proteome fraction terms belong to those involved in the biomass synthesis pathway (denoted as ϕ_BM). Thus, , , and . By substituting Eq. S28 into Eq. S26, we have:

Here, and stand for the normalized energy fluxes of respiration and fermentation, with

where = β₁ + 2(β₂ + β₃ + β₄) and = β₁ + 2(β₂ + β₆), with = 26 and = 12 for E. coli. ε_r and ε_f represent the proteome efficiencies for energy biogenesis in the respiration and fermentation pathways (Appendix-fig. 1C-D), defined as and ; that is, the normalized energy fluxes expressed in ATP generated per proteomic mass fraction dedicated to respiration and fermentation, respectively. Hence,

ψ⁻¹ is the proteome efficiency for biomass generation in the biomass synthesis pathway (Appendix-fig. 1E), defined as (see Eqs. S15 and S19); that is, the normalized flux (which differs from the normalized energy flux used to define ε_r and ε_f) generated per proteomic mass fraction dedicated to biomass synthesis. Hence

φ is an energy demand coefficient (a constant), with

and φ ° 2 stands for the normalized flux of energy demand other than the accompanying energy biogenesis from the biomass synthesis pathway.

Appendix 2.2 The reason for overflow metabolism

Microbes optimize their growth rate to survive through the evolutionary process (Vander Heiden et al., 2009). The optimal growth principle also roughly holds for tumor cells, which proliferate while ignoring growth restriction signals and evading immune destruction by the host (Vander Heiden et al., 2009). First, we consider the optimal growth strategy for a single cell. The coarsegrained model for bacteria is summarized in Eq. S26 and further simplified in Eq. S29. Here, ε_r, ε_f and ψ are functions of κ_A (see Eqs. S31, S32), so we also denote them as ε_r (κ_A), ε_f (κ_A), ψ(κ_A). Evidently, the fluxes of both respiration and fermentation take non-negative values, i.e., , ≥ 0, and all the coefficients are positive: ε_r (κ_A), ε_f (κ_A), ψ(κ_A),φ > 0.

Thus, if ε_r > ε_f, then (ψ + φ/ε_r)° λ = ϕ_max − (1/ε_f − 1/ε_r) ≤ ϕ_max. Obviously, the solution for optimal growth is:

Similarly, if ε_f > ε_r, then the optimal growth solution is:

In both cases, the growth rate λ takes the maximum value for a given nutrient condition (i.e., given κ_A):

So, why do microbes use the seemingly wasteful fermentation pathway when the growth rate is large under aerobic conditions? Prevalent explanations (Basan et al., 2015; Chen and Nielsen, 2019) suggest that it originates from that the proteome efficiency in fermentation is consistently higher than in respiration (i.e., ε_f > ε_r). If this is the case, why do microbes still use the normal respiration pathway when the growth rate is small? The answer lies in the fact that both ε_r(κ_A) and ε_f(κ_A) are not constants, but are dependent on nutrient conditions. In Eq. S31, when κ_A is small, consider the extreme case of κ_A → 0, and then

Since , clearly,

Combined with Eq. S36, thus cells would certainly use the respiration pathway when the growth rate is very small. Meanwhile, suppose that is the maximum value of κ_A available across different Group A carbon sources, and if there exists a κ_A (with κ_A ≤ ) satisfying ε_r (κ_A) < ε_f(κ_A), specifically,

then Δ(κ_A) ≡ ε_f (κ_A)/ε_r (κ_A) is a monotonically increasing function of κ_A. Thus,

and cells would use the fermentation pathway when the growth rate is large.

In practice, experimental studies using E. coli (Basan et al., 2015) have demonstrated that proteome efficiency in fermentation is higher than in respiration when the Group A carbon source is lactose at a saturated concentration, i.e., ε_r() < ε_f(). Here, represents the substrate quality of lactose and the superscript “(ST)” signifies saturated concentration. In fact, E. coli grows much faster in G6p than lactose (Basan et al., 2015), thus, > , and hence, Eq. S40 holds for E. coli. From a theoretical perspective, we can verify Eq. S39 and consequently Eq. S40 using Eq. S20, combined with the in vivo/in vitro biochemical parameters obtained from experimental data (see Appendix-tables 1–2). For example, it is straightforward to confirm that using this method (see Appendix 8.2), further supporting the validity of Eqs. S39–S40 (see also Appendix 9).

Now that Eqs. S38–S40 are all valid, a critical value of κ_A, denoted as , exists, satisfying Δ () =1. Thus,

Combined with Eq. S31, we have:

By substituting Eq. S42 into Eqs. S31, S32 and S36, we obtain the expressions for ε_r (), ε_f () and the critical growth rate at the transition point (i.e., λ_C ≡ λ()):

where ε_r|f represents either ε_r or ε_f. In Fig. 1E, we show the dependencies of κ_r(κ_A), ε_f(κ_A) and λ(ε_A) on ε_A in a 3-dimensional form, as ε_A changes.

Appendix 2.3 The relation between respiration/fermentation flux and growth rate

We proceed to study the relation between the respiration/fermentation flux and the cell growth rate. From Eqs. S16 and S30, we see that the stoichiometric fluxes J_r, J_f, the normalized fluxes , and the normalized energy fluxes , are all interconvertible. For convenience, we first analyze the relations between , and λ under growth rate optimization. In fact, all these terms are merely functions of κ_A (see Eqs. S34–S36), which is determined by the nutrient condition (Eq. S27).

In the homogeneous case, where all microbes share identical biochemical parameters, as λ(κ_A) increases with κ_A, appear abruptly and vanish simultaneously as κ_A exceeds (Fig. 1E, see also Eqs. S34–S35, S41). Combining Eqs. S34–S36 and S43, we obtain:

where “θ” stands for the Heaviside step function. Defining λ_max = λ(), and then, [0, λ_max] is the relevant range of the x axis. In fact, the digital responses in Eq. S44 are consistent with the numerical simulation results of Molenaar et al. (Molenaar et al., 2009). However, these results are incompatible with the threshold-analog response in the standard picture of overflow metabolism (Basan et al., 2015; Holms, 1996).

In practice, the values of can be greatly influenced by the concentrations of potassium and phosphate (Garda-Contreras et al., 2012), which vary from cell to cell. Consequently, there is a distribution of values for among cell populations, commonly referred to as extrinsic noise (Elowitz et al., 2002). For convenience, we assume that each (and thus κ_i) follows a Gaussian distribution with a coefficient of variation (CV) of 25%. Therefore, the distribution of λ_C can be approximated by a Gaussian distribution (see Appendix 7.1):

where μ_λ_C and σ_λ_C represent the mean and standard deviation of λ_C, with the CV σ_λ_C/μ_λ_C calculated to be 12% (see Appendix 8.2 for details). Note that λ is κ_A dependent, while λ_C is independent of κ_A. Thus, given the growth rate of microbes in a culturing medium (e.g., in a chemostat), the normalized energy fluxes are:

where “erf” represents the error function. In practice, given a culturing medium, there is also a probability distribution for the growth rate (Appendix-fig. 2B, see also Eq. S157). For approximation, in plotting the growth rate-flux relations, we use the deterministic (noise-free) value of the growth rate as a proxy. To compare with experiments, we essentially compare the normalized fluxes , (see Appendix 8.1 for details). Combining Eqs. S30 and S46, we obtain:

In Fig. 1C-D, we see that Eq. S47 quantitatively illustrates the experimental data (Basan et al., 2015), where the model parameters were obtained using biochemical data for the catalytic enzymes (see Appendix-table 1 for details).

Appendix 3 Model perturbations

Appendix 3.1 Overexpression of useless proteins

Here, we consider the case of overexpression of the protein encoded by the lacZ gene (i.e., ϕ_Z perturbation) in E. coli. Effectively, this limits the proteome by altering ϕ_max:

where ϕ_Z stands for the proteomic mass fraction of useless proteins, which is controllable in experiments. Then, the growth rate changes into a bivariate function of κ_A and ϕ_Z:

and thus,

Obviously, remains a constant (following Eq. S42), while λ_C(ϕ_Z) = λ(, ϕ_Z) and λ_max(ϕ_Z) ≡ λ(, ϕ_Z) become functions of ϕ_Z:

In the homogeneous case, and follow:

Combined with Eqs. S50–S51, we have:

To compare with experiments, we assume that each and κ_i follow the extrinsic noise with a CV of 25% specified in Appendix 2.3, and we neglect the noise on ϕ_Z and ϕ_max. Combining Eqs. S45 and S51, λ_C(ϕ_Z) approximately follows a Gaussian distribution:

where μ_λ_C(ϕ_Z) and σ_λ_C(ϕ_Z) represent the mean and standard deviation of λ_C(ϕ_Z), with

Here, μ_λ_C(0), σ_λ_C(0), λ_C(0), λ_max(0) and λ(κ_A, 0) represent the parameters or variables free from ϕ_Z perturbation, just as those in Appendix 2.3. Since the noise on the multiplier term (i.e., 1 − ϕ_z/ϕ_max) is negligible, the CV of λ_C(ϕ_Z) (i.e., σ_λ_C(ϕ_Z)/μ_λ_C(ϕ_Z)) is unaffected by ϕ_z. By combining Eqs. S46 and S48, we obtain the relations between the normalized energy fluxes and growth rate:

where λ(κ_A, ϕ_Z), μ_λ_C(ϕ_Z) and σ_λ_C(ϕ_Z) follow Eqs. S50 and S55 accordingly. For a given value of ϕ_Z, i.e., ϕ_Z is fixed, then, λ(κ_A, ϕ_Z) changes monotonically with ϕ_A. Combining Eqs. S55- S56 and S30, we obtain the relation between the normalized fluxes , and the growth rate (where ϕ_Z is a parameter):

In Fig. 2C. we show that the model predictions (Eq. S57) quantitatively agree with the experiments (Basan et al., 2015).

Meanwhile, we can also perturb the growth rate by tuning ϕ_Z in a stable culturing environment with fixed concentration of a Group A carbon source (i.e., given [A]). In fact, for this case there is a distribution of ϕ_A values due to the extrinsic noise in , yet this distribution is fixed. For convenience of description, we still referred to it as fixed κ_A. Then, combining Eqs. S30, S50, S55 and S56, we get:

Here, λ(κ_A, 0) remains unaltered as κ_A is fixed. Therefore, in this case, and are proportional to λ, where the slopes are both functions of κ_A. More specifically, the slope of is a monotonically increasing function of κ_A, while that of is a monotonically decreasing function of κ_A. In Fig. 2B, we see that the model predictions (Eq. S58) agree quantitatively with the experiments (Basan et al., 2015).

In fact, the growth rate can be altered by tuning ϕ_Z and κ_A simultaneously. Then, the relations among the energy fluxes, growth rate and ϕ_Z still follow Eq. S57 (where ϕ_Z is a variable). In a 3-D representation, these relations correspond to a surface. In Fig. 2A, we show that the model predictions (Eq. S57) match well with the experimental data (Basan et al., 2015).

Appendix 3.2 Energy dissipation

In practice, energy dissipation disrupts the proportional relationship between energy demand and biomass production. Thus, Eq. S25 becomes:

where w represents the dissipation coefficient. In fact, maintenance energy contributes to energy dissipation, and we define the maintenance energy coefficient as w₀. In bacteria, the impact of maintenance energy is roughly negligible, yet in tumor cells, it plays a much more significant role (Locasale and Cantley, 2010).

The introduction of energy dissipation leads to a modification of Eq. S26: combining Eq. S59 and Eq. S16, we have:

Then, Eq. S29 changes to:

Consequently, if ε_r > ε_f, the optimal growth strategy for the cell is:

and if ε_f > ε_r, the optimal growth strategy is:

Then, the growth rate becomes a bivariate function of both κ_Aand w:

Clearly, is still a constant, while λ_C (w) = λ(, w) and λ_max(w) = λ(, w) become functions of w:

For a cell population, in the homogeneous case, and follow:

To compare with experiments, we assume the same extent of extrinsic noise in (and thus κ_i) as that specified in Appendix 2.3. Combining Eqs. S45 and S65, λ_C(w) approximately follows a Gaussian distribution:

where μ_λ_C(w) and σ_λ_C(w) represent the mean and standard deviation of λ_C(w), and

Here, μ_λ_C(0), σ_λ_C(0), λ_C(0), λ_max(0) and λ(κ_A, 0) represent parameters or variables unaffected by energy dissipation. In fact, there is a distribution of values for ε_r/f(). For approximation, we use the deterministic value of ε_r/f() in Eq. S68, and then the CV of λ_C(w) remains largely unperturbed by w. Combining Eqs. S46, S66 and S67, we have:

Since the dissipation coefficient w is tunable in experiments, for a given value of w, λ(κ_A, w) changes monotonically with κ_A. Combining Eqs. S68–S69 and S30, we have (here w is a parameter):

The comparison between model predictions (Eq. S70) and experimental results (Basan et al., 2015) is shown in Fig. 3B, which shows quantitative agreement. Meanwhile, the growth rate can also be perturbed by changing κ_A and w simultaneously. The relations among the energy fluxes, growth rate and w follow Eq. S70 (here w is a variable). In a 3D representation, these relations form a surface. As shown in Fig. 3A, the model predictions (Eq. S70) agree quantitatively with the experimental results (Basan et al., 2015).

Appendix 3.3 Translation inhibition

In E. coli, the translation rate can be modified by adding different concentrations of translation inhibitors, e.g., chloramphenicol (Cm). The net effect of this perturbation is represented as:

where i stands for the inhibition coefficient with ι > 0, and (1 + ι)⁻¹ represents the translation efficiency. Thus, Eq. S32 changes to:

First, we consider the case where maintenance energy is neglected, i.e., w₀ = 0. In this case, the growth rate takes the following form:

where λ(κ_A, 0) and ψ(κ_A, 0) represent the terms unaffected by translation inhibition. Thus, λ_C(ι) ≡ λ(, ι) and λ_max(ι) ≡ λ(, ι) become functions of ι:

In the homogeneous case, and follow:

To compare with experiments, we assume that extrinsic noise exists in and κ_i as specified in Appendix 2.3. Combining Eqs. S45 and S74, λ_C(ι) can be approximated by a Gaussian distribution:

where μ_λ_C(ι) and σ_λ_C(ι) represent the mean and standard deviation of λ_C(ι), with

Here, μ_λ_C(0), σ_λ_C(0), ψ(, 0) , λ_C(0) and λ_max(0) stand for the terms unaffected by translation inhibition. Essentially, there are distributions of values for ε_r/f(), ψ(, 0) and ψ(, ι). For approximation, we use the deterministic values of these terms in Eq. S77, and then the CV of λ_C(ι) can be approximated by λ_C(0). Combining Eqs. S46, S75 and S76, we have:

In the experiments, the inhibition coefficient ι is controllable by adjusting the concentration of the translation inhibitor. For a given value of ι, λ(κ_A, ι) changes monotonically with κ_A. Combining Eqs. S30 and S78, we have (here i is a parameter):

where μ_λ_C(ι) and σ_λ_C(ι) follow Eq. S77. The growth rate can also be perturbed by altering both κ_A and ι simultaneously. In this case, the relations among the energy fluxes, growth rate and ι still follow Eq. S79 (here ι is a variable). The comparison between Eq. S79 and experimental data (Basan et al., 2015) is shown in Appendix-fig. 2D (3-D) and 2E (2-D). Overall, there is good consistency; however, there remains a noticeable discrepancy when ι is large (i.e., at high concentration of the translation inhibitor). This led us to consider the maintenance energy through the coefficient w₀, which is small but may account for this discrepancy. Then, λ(κ_A, ι) changes into:

while λ_C(ι) ≡ λ(, ι) and λ_max(ι) ≡ λ(, ι) still follow Eq. S74, though the forms of λ_C(0) and λ_max(⁰) change to:

In the homogeneous case, and follow:

To compare with experiments, we assume that the extrinsic noise follows the specification in Appendix 2.3. Combining Eqs. S45, S74 and S81, λ_C(ι) approximately follows a Gaussian distribution:

Here μ_λ_C(ι) and σ_λ_C(ι) still follow Eq. S77, while μ_λ_C(0) and σ_λ_C(0) change accordingly with λ_C(0) (see Eq. S81). For approximation, we use the deterministic values of the relevant terms in Eq. S77, and then the CV of λ_C(ι) is roughly the same as λ_C(0). Combining Eqs. S46, S82 and S83, we have:

Thus, for a given ι, λ(κ_A, ι) changes monotonically with κ_A. Combining Eqs. S30 and S84, we have (here ι is a parameter):

The growth rate and fluxes can also be perturbed by altering both κ_A and ι simultaneously. The relations among the energy fluxes, growth rate and ι would still follow Eq. S85, except that ι is now regarded as a variable. Assuming a small amount of maintenance energy by assigning w₀ = 2.5 (h^-1), we find that the experimental results (Basan et al., 2015) agree quantitatively well with the model predictions (Fig. 3C-D).

Appendix 4 Overflow metabolism in substrates other than Group A carbon sources

Due to the topology of the metabolic network, for cells using Group A carbon sources, the behavior of overflow metabolism follows Eq. 5 (or Eq. S47) upon κ_Aperturbation (i.e., varying the type or concentration of a Group A carbon source). This has been demonstrated clearly in the above analysis and agrees quantitatively with experiments. However, further analysis is required for cells using substrates other than Group A sources due to the topological differences in carbon utilization (Wang et al., 2019). In principle, substrates entering from glycolysis or the points before acetyl-CoA are potentially involved in overflow metabolism, while those joining from the TCA cycle are not relevant to this behavior. Still, mixed carbon sources are likely to induce a different profile of overflow metabolism, as long as there is a carbon source derived from glycolysis.

Appendix 4.1 Pyruvate

The coarse-grained model for pyruvate utilization is shown in Fig. 3E. Here, nodes M₁, M₂, M₃, M₄, M₅ follow the descriptions in Appendix 2.1. Each biochemical reaction follows Eq. S5 with b_i = 1 except that 2M₂→M₁ and M₃ + M₅→M₄. By applying flux balance to the stoichiometric fluxes, combining with Eq. S8, we have:

For energy biogenesis, we convert all the energy currencies into ATPs, and then,

where β₇ = 1, β₈ = 2, β₃ = 2, β₄ = 6, β₆ = 1, β₉ = 6, β₁ = 4 for E. coli (Neidhardt et al., 1990; Sauer et al., 2004), and J_E follows Eq. S25. By applying the substitutions specified in Eqs. S9, S12, S14–S18, combined with Eqs. S4, S10, S22, S23, S25, S86–S87, and the constraint of proteome resource allocation, we have:

where . κ_i is approximately a constant which follows Eq. S20 for each of the intermediate node. The substrate quality of κ_py varies with the external concentration of pyruvate ([py]),

From Eq. S88, all ϕ_i can be expressed by , , and 2:

By substituting Eq. S90 into Eq. S88, we have:

Here, and stand for the normalized energy fluxes of respiration and fermentation, respectively, with

where = β₃ + β₄ + β₈ and = β₆ + β₈, with = 10 and = 3 for E. coli. The coefficients and represent the proteome efficiencies for energy biogenesis using pyruvate in respiration and fermentation pathways, respectively, with

is the proteome efficiency for biomass generation using pyruvate in the biomass synthesis pathway, with

φ_py is an energy demand coefficient (a constant), with

Evidently, Eq. S91 is identical in form with Eq. S29. The growth rate changes into κ_py dependent:

When κ_py is very small, combined with Eq. S93, then,

Obviously, ≫ , and hence

As long as

where the superscript “(ST)” stands for the saturated concentration, then,

and there exists a critical value of ϕ_py, denoted as , with

Here, is the growth rate at the transition point, and stands for either or . In Appendix-fig. 2H, we show the dependencies of (κ_py), (κ_py) and λ(κ_py) on κ_py in a 3dimensional form. In the homogeneous case, and follow:

Defining , and then, [0, ] is the relevant range of the x axis. To compare with experiments, we assume the same extent of extrinsic noise in as specified in Appendix 2.3. Then, approximately follows a Gaussian distribution:

where and stand for the mean and standard deviation of . Then, the relations between the normalized energy fluxes and growth rate are:

Combined with Eq. S92, we have:

In Fig. 3F, we show that the model predictions (Eq. S105) align quantitatively with the experimental results (Holms, 1996).

Appendix 4.2 Mixture of a Group A carbon source with extracellular amino acids

In the case of a Group A carbon source mixed with amino acids, the coarse-grained model is shown in Appendix-fig. 2A. This model can be used to analyze mixtures with one or multiple types of extracellular amino acids. Here, Eqs. S21, S22, S24 and S25 still apply, but Eq. S23 changes to (the case of i= a1 remains the same as Eq. S23):

Here, represents the number of carbon atoms in a molecule of Pool i. For simplicity, we assume:

In the case where all 21 types of amino acids are present and each is at saturated concentration (denoted as “21AA”), we have:

where ϕ_i and κ_i are defined following Eqs. S9 and S12. Since the cell growth rate significantly increases with the mixture of amino acids, we deduce that Pools a2-d are supplied by amino acids in growth optimization, with

Amino acids should be more efficient in the supply of biomass synthesis than the Group A carbon source for Pools a2-d, i.e.,

In practice, the requirement for proteome efficiency when using amino acids is even higher, since the biomass synthesis pathway is accompanied by energy biogenesis for Group A carbon sources, but not for amino acids. Combining Eqs. S108 and S109, we have:

where , follow Eq. S30, while ε_r and ε_f follow Eq. S31. ψ_21AA⁻¹ is the proteome efficiency for biomass generation in the biomass synthesis pathway under this nutrient condition, with

φ_21AA is an energy demand coefficient, with

Combining Eqs. S111 and S31, the formula for the growth rate is:

In fact, Eqs. S37–S42 still apply. ε_r/f() satisfies Eq. S43, while and are:

When extrinsic noise is taken into account, approximately follows a Gaussian distribution:

and the normalized fluxes , change to:

The above analysis can be extended to cases where a Group A carbon source is mixed with arbitrary combinations of amino acids. Eqs. S111, S114–S117 would remain in a similar form, while Eqs. S112–S113 would change depending on the combinations of amino acid. In Appendix-fig. 2B-C, we compare model predictions (see also Appendix 7.2 and Eq. S157) with experimental data (Basan et al., 2015; Wallden et al., 2016) from mixtures of 21 or 7 types of amino acids along with a Group A carbon source, demonstrating quantitative agreement. Additionally, the increase in the critical threshold of growth rate for the growth rate-dependent fermentation flux in mixtures with extracellular amino acids (i.e., , > λ_C, see Appendix-fig. 2C) has also been observed in other experimental findings (Peebo et al., 2015).

Appendix 5 Enzyme allocation upon perturbations

Appendix 5.1 Carbon limitation within Group A carbon sources

In Eq. S28, we present the model predictions for the dependencies of enzyme proteomic mass fractions on growth rate and energy fluxes. To compare with experiments, we assume the same extent of extrinsic noise in as specified in Appendix 2.3. Relative protein expression data for enzymes within glycolysis and the TCA cycle are available from existing studies and are comparable to the ϕ₁–ϕ₄ enzymes of our model (Fig. 1B). Upon κ_A perturbation, κ_A is a variable while w₀ is fixed (see Appendix 1.5). Combining Eqs. S28 and S47 (with w₀ = 0), we obtain:

In Appendix-fig. 3C-D, we show the comparisons between model predictions (Eq. S118, w₀ = 0) and experimental data (Hui et al., 2015), which are consistent overall. We then consider the influence of maintenance energy as specified in Appendix 3.2. Here, we continue to choose w₀ = 2.5 (h⁻¹ A as previously adopted in Appendix 3.3. Thus, Eq. S28 still holds. Combined with Eq. S85 under the condition that ι = 0, we have:

In Fig. 4A-B, we show that the model predictions (Eq. S119, w₀ = 2.5 (h⁻¹) generally agree with the experiments (Hui et al., 2015). However, there are different basal expressions of these enzymes, likely due to living demands other than cell proliferation, such as preparation for starvation (Mori et al., 2017) or changes in the type of the nutrient (Basan et al., 2020; Kussell and Leibler, 2005).

Appendix 5.2 Overexpression of useless proteins

In the case of ϕ_Z perturbation under each nutrient condition with fixed κ_A (see Appendix 3.1), we consider the same extent of extrinsic noise in as specified in Appendix 2.3. The relation between enzyme allocation and growth rate can be obtained by combining Eqs. S28 and S58 (with w₀ = 0):

Here λ(κ_A, 0) is the growth rate for ϕ_Z = 0, and thus it is a parameter rather than a variable. The growth rate is defined as λ(κ_A, ϕ_Z), which follows Eq. S50. Thus, ϕ_i is proportional to the growth rate λ. In Appendix-fig. 3E-F, we observe that the model predictions (Eq. S120) generally agree with the experiments (Basan et al., 2015). Next, we consider the influence of maintenance energy with w₀ = 2.5 (h⁻¹). Combining Eqs. S28, S58 and S85 (with ι = 0), we get:

Here, the growth rate is defined as λ(κ_A, ϕ_Z), and λ(κ_A, 0) is a parameter rather than a variable.

Thus, ϕ_i is a linear function of the growth rate λ, with a positive slope and a positive y-intercept. In Fig. 4C-D and Appendix-fig. 3I-J, we show that the model predictions (Eq. S121) agree quantitively with the experimental data (Basan et al., 2015).

Appendix 5.3 Energy dissipation

In the case of energy dissipation under each nutrient condition, w is perturbed while κ_A is fixed. The relation between protein allocation and growth rate can be obtained by combining Eqs. S28 and S70. However, since w is explicitly present in Eq. S70, it is necessary to reduce this variable to obtain the growth rate dependence of enzyme allocation. From Eq. S64, we have:

Here, λ(κ_A, 0) ≡ λ(κ_A, w = 0) (satisfying Eq. S64) is a parameter rather than a variable. “θ” stands for the Heaviside step function. Thus, we have:

where the energy dissipation coefficient w is regarded as a function of the growth rate.

Combining Eqs. S28, S70 and S123, we get:

where w(λ) follows Eq. S123. When κ_A lies in the vicinity of or w is small so that

then we have:

and thus:

Note that in Eq. S123, w is a linear function of λ with a negative slope. Thus ϕ_i exhibits a linear relation with λ when Eq. S125 is satisfied (see Eq. S127). In fact, the slope of ϕ₄ is certainly negative (combining Eqs. S64, S123 and S127), while the sign of the slope for other ϕ_i depends on parameters. For a given nutrient, the enzymes corresponding to the same ϕ_i should exhibit the same slope sign. Another restriction is that if the slope sign of ϕ₁ is negative, then the slope sign of ϕ₂ is surely negative. In Appendix-fig. 3K-N, we show that our model results agree well with the experimental data (Basan et al., 2015) (Eq. S127).

Appendix 6 Other aspects of the model

Appendix 6.1 A coarse-grained model with more details

To compare with experiments, we consider a coarse-grained model with more details, as shown in Appendix-fig. 2F. Here, nodes M₆, M₇ represent GA3P and DHAP, respectively. Other nodes follow the descriptions specified in Appendix 2.1. Each biochemical reaction follows Eq. S5 with b_i = 1 except that M₁ → M₆ + M₇ and M₃ + M₅ → M₄. By applying flux balance to the stoichiometric fluxes, combined with Eq. S8, we obtain:

While Eqs. S22–S25 still hold. By applying the substitutions specified in Eqs. S9, S12, S14–S18, combined with Eqs. S4, S10, S22–S25, S128, and the constraint of proteome resource allocation, we get:

Then, Eq. S28 still holds, while ϕ₁₀ and ϕ₁₁ are:

By substituting Eqs. S28 and S130 into Eq. S129, we get:

where “dt” stands for details. Eqs. S30 and S33 still hold. and represent the proteome efficiencies for energy biogenesis in the respiration and fermentation pathways, respectively, with

is the proteome efficiency for biomass generation in the biomass synthesis pathway, with

Appendix 6.2 Estimation of the in vivo enzyme catalytic rates

We use the method introduced by Davidi et al. (Davidi et al., 2016), combined with proteome experimental data (Basan et al., 2015) (Appendix-table 2), to estimate the in vivo enzyme catalytic rates. Combining Eqs. S28 and S130, we have:

Here, , , λ and ϕ_i (i = 1-6,10-11) are measurable from experiments (see Appendix 8.1 and Appendix-table 2). Thus, we can obtain the in vivo values of κ_i from Eq. S134. Combined with Eqs. S17 and S20, we have

Eq. S135 is the in vivo result for the enzyme catalytic rate. In Appendix-fig. 2G, we show a comparison between in vivo and in vitro results for k_cat values of enzymes within glycolysis and the TCA cycle, which are roughly consistent. In the applications, we prioritized the use of in vivo results for enzyme catalytic rates, and use in vitro data as a substitute when there were gaps.

Appendix 6.3 Comparison with existing models that illustrate experimental results

For the coarse-grained model described in Appendix 2, the normalized stoichiometric influx of a Group A carbon source is given by:

Combined with the first equation in Eq. S28 and Eq. S30, we obtain:

where ϑ = η_a1 + η_c + (η_a2 + η_b + η_d)/². Evidently, , and ϑ are constant parameters. In this subsection, we highlight the major differences between our model presented in Appendix 2 and existing models that illustrate the growth rate dependence of fermentation flux in the standard picture of overflow metabolism (Basan et al., 2015; Holms, 1996; Meyer et al., 1984; Nanchen et al., 2006).

Based on the modeling principles rather than the detailed mechanisms, there are two major classes of existing models that can illustrate experimental results. Both classes of models regard the proteome efficiencies ε_r and ε_f as constants, with ε_f > ε_r if used, or follow functionally equivalent propositions. However, in our model, ε_r and ε_f are both functions of κ_A, which vary significantly upon nutrient perturbation, with ε_r (κ_A → 0) > ε_f (κ_A → 0) and (see Eqs. S38, S40–S41). Furthermore, there are significant differences in the modeling and optimization principles, as listed below.

The first class of models (Chen and Nielsen, 2019; Majewski and Domach, 1990; Niebel et al., 2019; Shlomi et al., 2011; Varma and Palsson, 1994; Vazquez et al., 2010; Vazquez and Oltvai, 2016; Zhuang et al., 2011) optimize the ratio of biomass outflow to carbon influx , either to optimize the growth rate for a given carbon influx or to minimize the carbon influx for a given growth rate. Since respiration is far more efficient than fermentation in terms of energy biogenesis per unit carbon, to optimize the ratio , cells would preferentially use respiration when the carbon influx is small. As carbon influx increases above a certain threshold, factors such as proteome allocation direct cells toward fermentation in a threshold-linear response, since they consider ε_f > ε_r. Our model is significantly different from this class of models in the optimization principle, as we purely optimize the cell growth rate for a given nutrient condition, without imposing a special constraint on the carbon influx.

The second class of models, represented by Basan et al. (Basan et al., 2015), also adopt the optimization of in the interpretation of their model results. However, the growth rate dependence of fermentation flux was derived prior to the application of growth rate optimization (although it can be derived by optimizing ). In fact, Eqs. S29 and S137 in our model are very similar in form to those in Basan et al. (Basan et al., 2015), yet there are critical differences, which we list below. In Eq. S29, by regarding and as the two variables in a system of linear equations, we obtain the following expressions:

In Basan et al. (Basan et al., 2015), Eq. S138 is considered to be the relation between and λ upon nutrient (and thus ) perturbation, while ε_r and ε_f are regarded as constants throughout the perturbation. By contrast, in our model, Eq. S138 serves as a constraint under a given nutrient condition with fixed κ_A, and is not relevant to nutrient perturbation. For wild-type strains, if ε_r (κ_A) > ε_f (κ_A) (or vice versa), then the solution for optimal growth is and , with . This solution, which satisfies Eq. S138, corresponds to a point rather than a line in the relation between growth rate λ and normalized energy flux upon κ_A perturbation.

Appendix 7 Probability density functions of variables and parameters

Appendix 7.1 Probability density function of κ_i

Enzyme catalysis is crucial for the survival of living organisms, as it significantly accelerates biochemical reactions by reducing the energy barrier between the substrate and product (Nelson et al., 2008). However, the maximal turnover rate of enzymes, k_cat, varies notably between in vivo and in vitro measurements (Davidi et al., 2016). Recent studies suggest that differences in the aquatic medium are the primary cause of this variation (Davidi et al., 2016; García-Contreras et al., 2012). In particular, potassium and phosphate concentrations have a significant influence on k_cat (García-Contreras et al., 2012), and these concentrations exhibit some degree of variation among cell populations under intracellular conditions (García-Contreras et al., 2012). For simplicity, we assume that the turnover rate of each enzyme E_i, , follows a Gaussian distribution with among cells (representing extrinsic noise (Elowitz et al., 2002), denoted as χ_ext). The probability density function of is then given by:

When the CV of the distribution (i.e., ) is less than 1/3, is almost identical to . In this case, follows the positive inverse of Gaussian (IOG) distribution, and the probability density function is:

where is the shape parameter, and is the mean.

Meanwhile, due to the stochastic nature of biochemical reactions, we apply Gillespie’s chemical Langevin equation (Gillespie, 2000) to account for intrinsic noise (Elowitz et al., 2002) (denoted as χ_int). For cell size regulation of E. coli within a cell cycle, the cell mass at the initiation of DNA replication per chromosome origin remains constant (Donachie, 1968). Thus, the time required for enzyme E_i to complete a catalytic job (with a timescale of ) can be approximated as the first passage time of a stochastic process, with

Here , where Θ is proportional to the cell volume, and Γ_i (t) represents independent, temporally uncorrelated Gaussian white noise. Then, for a given value of , the first passage time T_Γ follows an Inverse Gaussian (IG) distribution (Folks and Chhikara, 1978):

where is the shape parameter, and represents the mean. The variance of this distribution is . Thus, we can obtain the CV:

which is inversely proportional to the square root of cell volume. Evidently, the intrinsic and extrinsic noise make orthogonal contributions to the total noise (Elowitz et al., 2002) (denoted as χ_tot):

In fact, when the CV is small (i.e., CV≪1), both the IOG and IG distributions converge into Gaussian distributions (Appendix-fig. 4). In the back-of-the-envelope calculations, we approximate x in all denominator terms of IOG (x, μ, ζ) and IG (x, μ, ζ) as μ (since CV≪1). Then, both the IOG and IG distributions can be approximated as follows:

with a variance of , and

with a variance of . Rigorously, we show below that IG (x, μ, ζ) shrinks to be N (μ, μ/ζ) when the CV is small. For the IG distribution, the characteristic function of the variable x is given by (Folks and Chhikara, 1978; Van Kampen, 1992):

and therefore,

When the variance σ² = μ³/ζ is very small, we essentially require 2μ²k/ζ = 2σ²k/μ ≪ 1, and then . Thus,

This leads to:

In fact, intrinsic noise does affect the short-term measurement of enzyme catalytic rate and growth rate at the single-cell level. However, its contribution in the long term is averaged out and thus becomes negligible. For simplicity, we approximate χ_tot ≈ χ_ext. Combined with Eqs. S145–S146, it is straightforward to verify that shares roughly the same CV as :

For convenience, in the model analysis, we approximate both IOG and IG distributions as Gaussian distributions. Then, all are independent, normally distributed random variables following Gaussian distributions:

Using the properties of Gaussian distributions, for a series of constant real numbers γ_i, the summation of , which we define as , follows a Gaussian distribution (Van Kampen, 1992):

with and . The relation between κ_i and is shown in Eq. S12. To optimize cell growth rate, each κ_i of the intermediate nodes satisfies Eq. S20, while κ_A satisfies Eq. S27. Thus, for a given nutrient condition ([A] is fixed), all the ratios are constants. Combined with Eqs. S139, S145–S146, and S152, the distributions of all κ_i and 1/κ_i can be approximated as Gaussian distributions:

where and are the means of κ_i and 1/κ_i, and and are their standard deviations. Using the properties of Gaussian distributions, combined with Eq. S31, S32, S36, S42–S43, S145–S146 and S153, ε_r, ε_f, ψ, λ_r, λ_f, and λ_C also roughly follow Gaussian distributions.

Appendix 7.2 Probability density function of the growth rate λ

From Appendix 7.1, we note that λ_r and λ_f (see Eq. S36) roughly follow Gaussian distributions, with

where and represent the mean and standard deviation, respectively. We further assume that the correlation between λ_r and λ_f is ρ_rf. From Eq. S36, we see that the growth rate λ takes the maximum of λ_r and λ_f , i.e.,

Then, the cumulative distribution function of λ is , where

Thus, the probability density function of the growth rate A is given by:

In Appendix-fig. 2B, we show that Eq. S157 quantitatively matches the experimental data for E. coli under the relevant conditions.

Appendix 8 Model comparison with experiments on E. coli

Appendix 8.1 Flux comparison with experiments on E. coli

In Appendix 6.2, we see that the values of and are required to calculate the in vivo enzyme catalytic rates of the intermediate nodes. Here, we use J_acetate and to represent the stoichiometric fluxes of acetate from the fermentation pathway and CO2 from the respiration pathway, respectively. Combined with the stoichiometric coefficients of both pathways, we have:

By further combining with Eqs. S16–S17, we get:

In fact, the values of J_acetate and scale with the mass of the “big cell”, which increases over time. In experiments, the measurable fluxes are typically expressed in the unit of mM/OD₆₀₀/h (Basan et al., 2015). Thus, we define and as the fluxes of J_acetate and (per biomass) in the unit of mM/OD₆₀₀/h, respectively. The superscript “(M)” represents the measurable flux in this unit. For E. coli, we use the following biochemical data collected from published literature: 1 OD₆₀₀ roughly corresponds to 6×10⁸ cells/mL (Stevenson et al., 2016), the average mass of a cell is 1pg (Milo and Phillips, 2015), the biomass percentage of the cell weight is 30% (Neidhardt et al., 1990), the molar mass of carbon is 12g (Nelson et al., 2008), r_carbon = 0.48 (Neidhardt et al., 1990) and r_protein = 0.55 (Neidhardt et al., 1990). Combined with the values of r_i (see Appendix 1.2) and , where , , , and (Nelson et al., 2008), we have:

From Eq. S18, we obtain the values of η_i for each precursor pool: η_a1 = 0.15, η_a2 = 0.30, η_b = 0.35, η_c = 0.09, and η_d = 0.11. Still, the value of η_E is required to compare the growth rate dependence of fermentation/respiration fluxes between model results and experiments, which we will specify in Appendix 8.2.

Appendix 8.2 Model parameter settings using experimental data of E. coli

We have collected biochemical data for E. coli, as shown in Appendix-tables 1–2, to set the model parameters. This includes the molecular weight (MW) and in vitro k_cat values of the catalytic enzymes, as well as the proteome and flux data used to calculate the in vivo turnover numbers. To reduce measurement noise, we take the average rather than the maximum value of in vivo k_cat from calculations using data from four cultures (see Appendix-table 2). Here, we prioritize the use of in vivo k_cat wherever applicable unless there is a gap in the in vivo data (see Appendix-table 1).

Note that our models are coarse grained. For example, the flux J₃ shown in Fig. 1B actually corresponds to three different reactions in the metabolic network (see Fig. 1A and Appendix-table 1), which we label as (i = 1, 2, 3). For each , there are corresponding variables/parameters of , , , satisfying Eqs. S8, S9 and S12, Evidently, (i = 1, 2, 3), and it is straightforward to derive the following relation between and κ₃:

In fact, Eq. S161 can be generalized to determine the values of other κ_i in the coarse-grained models combined with the biochemical data. For the coarse-grained model of Group A carbon source utilization shown in Fig. 1B, we have the values for parameters κ_i (i = 1, …, 6), and then . Evidently, , , and thus . For pyruvate, we have (see Eqs. S43 and S101), and it is easy to check that .

For the remaining model parameters, note that we have classified the inactive ribosomal-affiliated proteins into the Q-class, and then ϕ_max = 48% (Scott et al., 2010). The value of κ_t is obtainable from experiments: the translation speed is 20.1aa/s (Scott et al., 2010), with 7336 amino acids per ribosome (Neidhardt, 1996) and ς ≈ 1.67 (Neidhardt, 1996; Scott et al., 2010) (see Appendix 1.1), hence κ_t = 1/610 (s^-1). However, there are insufficient data to determine the values of κ_i (i = a1, a2, b, c, d) for the metabolites between the entry point metabolites shown in Fig. 1A to the precursor pools. These processes involves many steps, so these values are expected to be quite large. Here, we combine the contributions of κ_t and κ_i (i = a1, a2, b, c, d) by defining a composite parameter:

We proceed to estimate the values of Ω and φ using experimental data (Basan et al., 2015) for wild-type strains on the relation (Fig. 1C), and then all the remaining model parameters are set accordingly.

For the case of w₀ = 0, where all k_cat values follow a Gaussian distribution with an extrinsic noise of 25% CV (which is the general setting we use unless otherwise specified), we have φ = 10.8 and Ω = 1345 (s). Accordingly, we obtain η_E = 14.78, (h^-1) , and , where the CV of the extrinsic noise for Ω is estimated using the averaged CV of other κ_i. For the translation inhibition effect of Cm, we estimate the values for ι as , , and , where the superscript stands for the concentration of Cm, and the subscript represents the choice of w₀.

For pyruvate, with the value of η_E, we get φ_py = 14.82. However, there is still a lack of proteome data to determine the value of κ₉, which involves many steps in the metabolic network and thus can be considerably large. Here we define another composite parameter, Ω’_Gg ≡ (η_b + η_c)/κ₈ + η_a1/κ₉, and estimate its value as Ω’_Gg = 690 (s) from growth rate data for E. coli measured under the relevant nutrient conditions (Basan et al., 2015), where the subscript “Gg” stands for glucogenesis. Then, (h^-1), and , where the same CV of extrinsic noise for Ω applies to Ω’_Gg.

For the case of a Group A carbon source mixed with 21 amino acids (21AA, with saturated concentrations), we have φ_21AA = 14.2. Comparing Eq. S32 with Eq. S112, the parameter Ω should change to . Obviously, 1/κ_t < Ω_21AA < Ω, and we estimate Ω_21AA = 1000 (s) from the growth rate data for E. coli measured under the relevant nutrient conditions (Wallden et al., 2016). Then, we have (h^-1), and .

For the case of a Group A carbon source mixed with 7 amino acids (7AA: His, Iso, Leu, Lys, Met, Phe, and Val), similar to the roles of φ_21AA and Ω_21AA, we define φ_7AA and Ω_7AA. Using the mass fraction of the 7AA combined with Eq. S18, we have φ_7AA = 11.6. For the value of Ω_7AA, evidently, Ω_21AA < Ω_7AA < Ω, and we estimate Ω_7AA = 1215 (s) from growth rate data for E. coli measured under the relevant culture media (Basan et al., 2015). Then, (h^-1), and .

For the case of w₀ = 2.5 (h^-1), we have φ = 8.3, and thus η_E = 12.28, while other parameters such as Ω, and remain the same as for w₀ = 0. Nevertheless, the values for ι under translation inhibition by Cm are influenced by the choice of w₀, where the values of ι change to , , and .

From Appendix 7.1–7.2, combined with Eq. S114, the distributions of and can be approximated by Gaussian distributions:

where and stand for the mean values, while and represent the standard deviations. For the case of glucose mixed with 21AA (labeled as “Glucose+21AA”), the distribution of the growth rate follows Eq. S157. With Ω_21AA = 1000 (s) , we have , (both definitions follow Eq. S163), and ρ_rf ≈ 1.0 (obtained from numerical results).

For the case of succinate mixed with 21AA (labeled as “Succinate+21AA”), the respiration pathway is always more efficient since succinate lies within the TCA cycle. Thus, the cell growth rate (defined as ) would take the value of the respiration one and follows a Gaussian distribution:

For the case where acetate is the sole carbon source, the cells exclusively use the respiration pathway, and the growth rate (defined as λ_acetate) follows a Gaussian distribution:

Using the measured growth rate data (Wallden et al., 2016), we estimate and . To illustrate the distribution of growth rates , and shown in Appendix-fig. 2B, if no other source of noise existed, extrinsic noise with a CV of 40% would be required for each k_cat value. Then, , , , and . Allowing for the possibility that intrinsic noise may also play a non-negligible role in the observed single-cell growth rate (which is not a longterm average), we still use extrinsic noise with a CV of 25% for the model results of E.coli, except for those shown in Appendix-fig. 2B.

Appendix 9 Explanation of the Crabtree effect in yeast and the Warburg effect in tumors

Our model, along with the analysis presented in Appendix 2, can be extended with modifications to explain the Crabtree effect in yeast and the Warburg effect in tumors. In both cases, the optimization objective remains maximizing the cell growth rate. Consequently, yeast and tumor cells use the most efficient pathway for ATP production at the single-cell level.

For model applications in yeast or tumor cell metabolism, the fermentation flux shifts from acetate secretion to ethanol and lactate secretion, respectively (see Appendix-fig. 5A-B). The respiration and biomass generation pathways remain largely similar to those of E. coli, except that the biochemical reactions within the TCA cycle and respiratory chain occur in the mitochondria (see Appendix-fig. 5C-D). This leads to an increased enzyme cost for the respiration pathway due to energy currency exchanges between NADH or FADH₂ and ATP in the mitochondria. The coarse-grained models for Group A carbon source utilization in yeast and mammalian cells are shown in Appendix-fig. 5E-F, where M₃ represents pyruvate. In yeast and mammalian cells, the stoichiometric coefficients for ATP production (i.e., β_i) are identical to each other but differ from those of E. coli (see Fig. 1B and Appendix-fig. 5C-D), with β₁ = 5, β₂ = 1 , β₃ = 5 , β₄ = 7.5, β₆ = -2.5 , and β_a1 = 5 (Nelson et al., 2008). Hence, the stoichiometric coefficients of ATP production per glucose in each pathway are and , respectively, where and .

The impact of maintenance energy in yeast and tumor cells is significantly higher than that in E. coli (Locasale and Cantley, 2010). Therefore, Eq. S25 changes to (see Eq. S59):

where w₀ is the aforementioned maintenance energy coefficient. Thus, we have (see Eq. S60):

To account for the protein cost of energy currency exchanges in the mitochondria, we introduce ϕ_MT and κ_MT to represent the proteomic mass fraction of the enzymes and the effective substrate quality of related metabolites in the mitochondria, respectively. Note that the energy currency exchanges between NADH or FADH2 and ATP only occur during respiration, as there is no net NADH or FADH2 generation during fermentation (see Appendix-fig. 5C-D). Combined with Eq. S167, Eq. S25 changes to:

Here, Eq. S28 still holds, and we have:

where and follow Eq. S30, and ψ and φ satisfy Eq. S32 and S33, respectively. The expression for ε_f follows Eq. S31. However, the expression for ε_r differs from that in Eq. S31. For yeast and mammalian cells, we have:

At the single-cell level, from Eq. S169, and similar to Eq. S61–S63, if ε_r > ε_f, the optimal growth strategy is:

while if ε_f > ε_r, the optimal growth strategy is:

In both cases, the growth rate λ reaches its maximum value for a given nutrient condition with fixed κ_A:

From Eq. S170, when κ_A is very small such that κ_A → 0, it is evident that for yeast and mammalian cells, we still have:

Thus,

since still holds. Then, as long as , there exists a critical switching point for κ_A (denoted as , see Eq. S41), below which respiration is more efficient, while above , fermentation becomes more efficient in ATP production per proteome. Combined with Eq. S170, we have:

Accordingly, we obtain the expressions for , and λ_C (i.e., ):

Consequently, yeast and tumor cells would preferentially use respiration under starvation conditions (where ε_r > ε_f), yet switch to aerobic glycolysis when nutrients are abundant (where ε_r < ε_f) for optimal cell growth. This qualitatively illustrates the Crabtree effect in yeast and the Warburg effect in tumors.

At the cell population level, cell heterogeneity resulting from intrinsic and extrinsic noise causes the turnover numbers (i.e., k_cat) of enzymes and the critical growth rates at the transition point (λ_C) to follow distributions, which we assume to be Gaussian (see Eq. S45, Appendices 2.3 and 7.1). Due to the higher level of heterogeneity observed in tumor cells (Duraj et al., 2021; Hanahan and Weinberg, 2011; Hensley et al., 2016) and yeast (Bagamery et al., 2020) compared to E. coli, the extent of noise—and thus the CVs of k_catand λ_C—in yeast and tumor cells are expected to be larger than those in E. coli. The growth rate dependence of the normalized energy fluxes is as follows:

where and are the mean and standard deviation of λ_C, respectively, similar to the case of E. coli. Therefore, the growth rate dependence of the normalized fluxes is:

Combined with Eq. S160, Eq. S179 can be compared to experimental results, although in practice, it is difficult to tune the growth rate of tumor cells in vivo in experiments.

Recently, Shen et al. (Shen et al., 2024) reported that in many yeast and tumor cells, the measured proteome efficiencies in respiration at the cell population level are higher than the corresponding proteome efficiencies in fermentation, even though aerobic glycolysis fermentation fluxes still occur. This finding apparently contradicts prevalent explanations (Basan et al., 2015; Chen and Nielsen, 2019), which assert that overflow metabolism originates from the proteome efficiency in fermentation always being higher than in respiration.

Our model can resolve the puzzle above based on two important features: First, our model predicts that as long as ATP generation per glucose in respiration is higher than in fermentation (i.e., ), which definitely holds true for all organisms, the proteome efficiency in respiration is higher than that in fermentation when the nutrient quality κ_A is low (see Eqs. S37- S38 and S174–S175). Second, and importantly, due to cell heterogeneity at the population level, a subset of cells exhibiting greater proteome efficiency in fermentation compared to respiration could exist, even if the proteome efficiency at the cell population level in respiration is higher than in fermentation.

To facilitate comparison between our model and the experiments of Shen et al. (Shen et al., 2024), we define Pr_f as the proportion of ATP generated from fermentation, and as the proteome efficiency difference between respiration and fermentation, with

and

At the cell population level, ε_r, ε_f, 1/ε_r and 1/ε_f roughly follow Gaussian distributions (see Appendix 7.1 and Eq. S170), with

Here, , , , , and , , , are the standard deviations and mean values of ε_r, ε_f, 1/ε_r and 1/ε_f, respectively. Thus,

where the angle bracket “” represents the average over the cell population, and and are the population-averaged values of ε_r and ε_f, respectively, which are both measurable in experiments. From the derivations shown in Appendix 7.1, we approximately have

Here, we use , , and to represent the CVs of ε_r, ε_f, 1/ε_r and 1/ε_f, respectively, with

Similar to Eq. S151, the CVs of 1/ε_r and 1/ε_f are roughly equal to those of ε_r and ε_f, respectively. Thus,

Combining Eqs. S181 and S182, and using the properties of Gaussian distributions, follows a Gaussian distribution:

where and are the mean and standard deviation of , respectively. Evidently, we have

Then, we proceed to calculate the relation between Pr_f and using Eq. S187, and hence we obtain:

Combining Eqs. S180, S183–S185, and S188–S189, we have:

Note that the normalized energy fluxes and are proportional to the measured ATP fluxes generated in respiration and fermentation, respectively. Hence, Eq. S190 can be directly compared to experimental data. For yeast and tumor cells, due to a higher level of heterogeneity, the CVs of ε_r and ε_f, i.e., and , could be significantly higher than the corresponding values in E. coli, though their exact values are unknown. Consequently, we plot theoretical results with the values of and chosen as 0.25, 0.40, and 0.58 to compare with the experimental data for yeast and in vivo mouse tumors (Bartman et al., 2023; Shen et al., 2024). In Fig. 5A-B, we observe that the theoretical results using agree well with the experimental data (Bartman et al., 2023; Shen et al., 2024), both on a log scale and linear scale. This demonstrates that our model has the potential to quantitatively illustrate the Crabtree effect in yeast and the Warburg effect in tumors.

Appendix 10 Notes on the application of reference data

Data calibration

Throughout our manuscript, we use experimental data from the original references, except for two calibrations. The first calibration is noted in the footnote of Appendixtable 2. With this calibration, the data for E. coli (Basan et al., 2015) in Appendix-table 2 align with the curve shown in Fig. 1C, which includes experimental data for E. coli from other sources. The second calibration applies to the data shown in Figs. 3F and 1C (chemostat data for E. coli). The unit in the original reference (Holms, 1996) is mmol/(dry mass)g/h. To convert this to the unit mM/OD₆₀₀/h, used in our text, the conversion factor should be 0.18. Here, we deduce that only 60% of the measured dry biomass in centrifuged material is effective when calibrating with other experimental results. Therefore, there is a calibration factor of 0.6, and the conversion factor changes to 0.3.

Data from the inducible strains

Some of the experimental data in the original references (Basan et al., 2015; Hui et al., 2015) were obtained using E. coli strains with titratable systems (e.g., titratable ptsG, LacY). The relation of these inducible strains generally aligns with the same curve as that of wild-type E. coli (Fig. 1C). Since evolutionary treatment was not applied to the inducible strains, we approximate titration perturbation as a technique that mimics culturing the strains in a less efficient Group A carbon source.

Experimental data sources

The batch culture data for E. coli shown in Fig. 1C (labeled as minimum/rich media or inducible strains) and Appendix-fig. 2C were taken from the source data of the reference’s figure 1 (Basan et al., 2015). The chemostat data for E. coli shown in Fig. 1C were taken from the reference’s table 7 (Holms, 1996). The data for E. coli shown in Fig. 1D were taken from the reference’s extended data figure 3a (Basan et al., 2015), with the calibration specified in the footnote to Appendix-table 2.

The data for E. coli shown in Fig. 2A were adopted from the reference’s extended data figure 4a-b (Basan et al., 2015). The data for E. coli shown in Fig. 2B were taken from the source data of the reference’s figure 2a (Basan et al., 2015). The data for E. coli shown in Fig. 2C were taken from the source data of the reference’s figure 3a (Basan et al., 2015). The data for E. coli shown in Fig. 3A-B were taken from the source data of the reference’s figure 3d (Basan et al., 2015).

The data for E. coli shown in Fig. 3C-D and Appendix-fig. 2D-E were taken from the source data of the reference’s figure 3c (Basan et al., 2015). The data for E. coli shown in Fig. 3F were taken from the reference’s table 7 (Holms, 1996), with a calibration factor specified in the above paragraph (“Data calibration”).

The data for E. coli shown in Fig. 4A-B and Appendix-fig. 3A-D were taken from the reference’s table S2 with the label “C-lim” (Hui et al., 2015). We excluded the reference’s data with λ = 0.45205 h^-1 as there are other unconsidered factors involved during slow growth (Dai et al., 2016) (for λ < 0.5 h^-1), and we suspect that unknown calibration factors may exist. The data for E. coli shown in Fig. 4C-D and Appendix-fig. 3E-N were adopted from the reference’s extended data figure 6–7 (Basan et al., 2015).

The batch culture data for yeast shown in Fig. 5 were derived from the source data of the reference’s extended figure 4c-d (Shen et al., 2024). The chemostat data for yeast shown in Fig. 5 were derived from the source data of the reference’s figure 3d-e (Shen et al., 2024), where glucose is the limiting nutrient. We excluded the reference’s data for I. orientalis under condition C2, where the ATP flux was abnormally small. The mouse tumor in vivo data shown in Fig. 5 were derived from the source data of the reference’s figure 4e-g (Shen et al., 2024), which were originally reported by Bartman et al. (Bartman et al., 2023), the same research group as Shen et al. (Shen et al., 2024). We did not include the cancer cell line data shown in figure 4a-c of Shen et al. (Shen et al., 2024) since it appears that the proteomic data and flux data were obtained from two different references with inconsistent culturing conditions.

The gene names of E. coli depicted in Appendix-fig. 1B were identified using the KEGG database. The data for E. coli shown in Appendix-fig. 2G were drawn from Appendix-table 1, which includes the original references themselves. The flux data for E. coli presented in Appendix-table 2 were obtained from the reference’s extended data figure 3a (Basan et al., 2015), with the calibration specified in the footnote. The proteome data for E. coli shown in Appendixtable 2 were taken from the reference’s supplementary Table N5 (Basan et al., 2015).

Significance of findings

Strength of evidence

Abstract

Introduction

Results

Coarse-grained model

Model and results of overflow metabolism in E. coli.

Origin of overflow metabolism

Influence of protein overexpression on overflow metabolism in E. coli.

Influence of energy dissipation, translation inhibition, and carbon source category alteration on overflow metabolism in E. coli.

Testing the model through perturbations

Enzyme allocation under perturbations

Relative protein expression of central metabolic enzymes in E. coli under nutrient limitation and proteomic perturbation.

Explanation of the Crabtree effect in yeast and the Warburg effect in cancer cells

Model comparison with data on the Crabtree effect in yeast and the Warburg effect in tumors.

Discussion

Additional information

Author contributions

Competing Interest Statement

Data, Materials, and Software Availability

Acknowledgements

Appendix 1 Model framework

Appendix 1.1 Proteome partition

Appendix 1.2 Precursor pools

Appendix 1.3 Stoichiometric flux

Appendix 1.4 Carbon flux and cell growth rate

Appendix 1.5 Intermediate nodes

Appendix 2 Model and analysis

Appendix 2.1 Coarse-grained model

Appendix 2.2 The reason for overflow metabolism

Appendix 2.3 The relation between respiration/fermentation flux and growth rate

Appendix 3 Model perturbations

Appendix 3.1 Overexpression of useless proteins

Appendix 3.2 Energy dissipation

Appendix 3.3 Translation inhibition

Appendix 4 Overflow metabolism in substrates other than Group A carbon sources

Appendix 4.1 Pyruvate

Appendix 4.2 Mixture of a Group A carbon source with extracellular amino acids

Appendix 5 Enzyme allocation upon perturbations

Appendix 5.1 Carbon limitation within Group A carbon sources

Appendix 5.2 Overexpression of useless proteins

Appendix 5.3 Energy dissipation

Appendix 6 Other aspects of the model

Appendix 6.1 A coarse-grained model with more details

Appendix 6.2 Estimation of the in vivo enzyme catalytic rates

Appendix 6.3 Comparison with existing models that illustrate experimental results

Appendix 7 Probability density functions of variables and parameters

Appendix 7.1 Probability density function of κi

Appendix 7.2 Probability density function of the growth rate λ

Appendix 8 Model comparison with experiments on E. coli

Appendix 8.1 Flux comparison with experiments on E. coli

Appendix 8.2 Model parameter settings using experimental data of E. coli

Appendix 9 Explanation of the Crabtree effect in yeast and the Warburg effect in tumors

Appendix 10 Notes on the application of reference data

Data calibration

Data from the inducible strains

Experimental data sources

Molecular weight (MW) andinvivo/in vitrokcatdata forE. coli

Proteome and flux data (Basan et al., 2015) used to calculate the in vivo kcat of E. coli

Illustrations of symbols in this manuscript.

Central metabolic network and carbon utilization pathways of E. coli.

Model and results for experimental comparison of E. coli.

Relative protein expression of central metabolic enzymes in E. coli under various types of perturbations.

Asymptotic distributions of inverse Gaussian distribution and the inverse of Gaussian distribution.

Carbon utilization in yeast and mammalian cells.

References

Article and author information

Author information

Xin Wang

Version history

Cite all versions

Copyright

Metrics

Appendix 7.1 Probability density function of κ_i

Molecular weight (MW) andinvivo/in vitrok_catdata forE. coli

Proteome and flux data (Basan et al., 2015) used to calculate the in vivo k_cat of E. coli