Abstract
Cells can adapt to various environments by changing their biomolecular profiles while maintaining physiological homeostasis. What organizational principles in cells enable the simultaneous realization of adaptability and homeostasis? To address this question, we measure Raman scattering light from Escherichia coli cells under diverse conditions, whose spectral patterns convey their comprehensive molecular composition. We reveal that dimension-reduced Raman spectra can predict condition-dependent proteome profiles. Quantitative analysis of the Raman-proteome correspondence characterizes a low-dimensional hierarchical stoichiometry-conserving proteome structure. The network centrality of each gene in the stoichiometry conservation relations correlates with its essentiality and evolutionary conservation, and these correlations are preserved from bacteria to human cells. Furthermore, stoichiometry-conserving core components obey growth law and ensure homeostasis across conditions, whereas peripheral stoichiometry-conserving components enable adaptation to specific conditions. Mathematical analysis reveals that the stoichiometrically constrained architecture is reflected in major changes in Raman spectral patterns. These results uncover coordination of global stoichiometric balance in cells and demonstrate that vibrational spectroscopy can decipher such biological constraints beyond statistical or machine-learning inference of cellular states.
Introduction
Biological cells can change their gene expression and metabolic profiles globally to adapt to their biological contexts and external conditions, while maintaining the homeostasis of their core physiological states. The simultaneous realization of adaptability and homeostasis is a hallmark of biological systems and is assumed to be a system-level property of gene expression profiles in cells (1, 2). However, understanding the underlying organizational principles in comprehensive gene expression profiles remains a fundamental problem in biology.
Vibrational spectroscopy such as Raman spectroscopy might help us investigate such principles in gene expression profiles. Raman spectroscopy is a light scattering technique that measures energy shifts of light caused by interaction with sample molecules. Raman spectra are obtainable non-destructively even from biological samples such as individual cells. In principle, cellular Raman spectra are optical signatures conveying comprehensive molecular composition of targeted cells (3–6). Furthermore, no prior treatments, such as staining and tagging, are necessary to obtain cellular Raman spectra. However, although some biomolecules have separable and intense Raman signal peaks, Raman spectra of most biomolecules overlap and are masked by signals of other molecules due to the diversity and complexity of molecular compositions of cells. Therefore, it is impractical to comprehensively determine the amounts of biomolecules by spectral decomposition.
Despite the intractability of spectral decomposition, reconstruction of comprehensive molecular profiles may be achievable by analyzing detectable global spectral patterns (Fig. 1A) thanks to effective low dimensionality of changes in molecular profile of targeted cells (7–17)(Fig. 1B and Fig. S1). Indeed, it has been demonstrated that condition-dependent global transcriptome profiles of cells can be inferred from cellular Raman spectra based on their statistical correspondence (18, 19). Importantly, this Raman-spectroscopic transcriptome inference was possible from dimension-reduced Raman spectra. Therefore, dominant changes in global Raman spectral patterns may contain vital information about the constraints on the molecular profiles in cells; an inspection of their correspondence might give us insights into architectural principles of omics profiles and biological foundation for global omics inference from spectral patterns (Fig. S1).

Cellular physiological state differences detected by Raman spectral global patterns and gene expression profiles.
(A) Condition-dependent cellular Raman spectral patterns. Raman spectra obtained from cells reflect their molecular profiles. Therefore, systematic differences in global spectral patterns may indicate their physiological states. A Raman spectrum from each cell can be represented as a vector and a point in a high-dimensional Raman space. If condition-dependent differences exist in the spectral patterns, appropriate dimensional reduction methods allow us to classify the spectra and detect cellular physiological states in a low-dimensional space. (B) Condition-dependent gene expression profiles. Global gene expression profiles (proteomes and transcriptomes) are also dependent on conditions. For each gene, we can consider a high-dimensional vector whose elements represent expression levels under different conditions. It has been suggested that these expression-level vectors are constrained to some low-dimensional manifolds (7–17). This study characterizes the statistical correspondence between dimension-reduced Raman spectral patterns and gene expression profiles. Analyzing the correspondence, we reveal a stoichiometry conservation principle that constrains gene expression profiles to low-dimensional manifolds.
In this report, we first reveal that, in addition to transcriptomes, condition-dependent proteome profiles of Escherichia coli are predictable from cellular Raman spectra. Next, we scrutinize the correspondence between Raman and proteome data, identifying several stoichiometrically conserved groups (SCGs) whose expression tightly correlates with the major changes in cellular Raman spectra. Finally, we reveal that the stoichiometry conservation centrality of each gene correlates with its essentiality, evolutionary conservation, and condition-specificity of gene expression levels, and that these correlations turn out to be general across different omics layers and organisms.
Results
Statistical correspondence between Raman spectra and proteomes
To examine the correspondence between Raman spectra and proteomes in E. coli, we reproduced 15 environmental conditions for which absolute quantitative proteome data are already available (20) and measured Raman spectra of E. coli cells under those conditions (Fig. 2A and Fig. 2B). The culture conditions we adopted include (i) exponential growth phase in minimal media with various carbon sources, (ii) exponential growth phase in rich media, (iii) exponential growth phase with various stressors, and (iv) stationary phases (Table S1). We measured Raman spectra of single cells sampled from each condition and focused on the fingerprint region of biological samples, where the signals from various biomolecules concentrate (spectral range of 700–1800 cm−1, Fig. 2B and Fig. S2). The Raman spectra were classified on the basis of the environmental conditions using a simple linear classifier, linear discriminant analysis (LDA) (3, 4, 21) (Fig. 2C–2E). This classifier calculates the most discriminatory axes by maximizing the ratio of between-condition variance to within-condition variance and reduces the dimensions of Raman data to m − 1, where m is the number of conditions (see sections 1.1 and 2.1 in (22)).

Estimation of proteomes from Raman spectra.
(A) The experimental design. We cultured E. coli cells under 15 different conditions and measured single cells’ Raman spectra. We then examined the correspondence between the measured Raman spectra and the absolute quantitative proteome data reported by Schmidt et al. (20). (B) Representative Raman spectra from single cells, one from the “Glucose” condition, and the other from the “LB” condition. The fingerprint region and representative peaks are annotated. (C to E) Cellular Raman spectra in LDA space. The dimensionality of the spectra is reduced to 14 (= 15 − 1). Each point represents a spectrum from a single cell, and each ellipse shows the 95% concentration ellipse for each condition. Their projections onto the LDA1-LDA2 plane (C), the LDA1-LDA3 plane (D), and the LDA1-LDA4 plane (E) are shown. (F) Visualization of the 14-dimensional LDA space embedded in two-dimensional space with t-SNE. (G) The scheme of leave-one-out cross-validation. The Raman and proteome data of one condition (here j) are excluded, and the matrix B is estimated using the data of the rest of the conditions as
The result shows that Raman spectral points from different environmental conditions are distinguishable in the LDA space (Fig. 2C–2E). For example, the first and second LDA axes clearly distinguish the conditions “LB” and “stationary3days” (Fig. 2C), and the third axis distinguishes “Glucose42C” and “GlycerolAA” (Fig. 2D). Notably, the first principal axis LDA1 correlated with growth rate significantly (Pearson correlation r = 0.81 ± 0.09, Fig. S2). Visualizing the Raman LDA data by embedding them on a 2D plane using t-distributed Stochastic Neighbor Embedding (t-SNE) (23) confirms that the points for each condition form a distinctive cluster (Fig. 2F). These results imply that positions in the Raman LDA space reflect condition-dependent differences in cellular physiological states.
We next asked whether these Raman spectral differences in the LDA space could be linked to the different proteome profiles. To examine this, we hypothesized linear correspondence between the proteome vector
B is a matrix that connects
We conducted leave-one-out cross-validation (LOOCV) to verify this linear correspondence (Fig. 2G). We excluded one condition (here, j) as a test condition and estimated B as
The proteome profile estimated using the first four major LDA axes (LDA1–LDA4) agreed well with the actual proteome data under most conditions (Fig. 2H and Fig. S3; see section 1.2 in (22) for the estimation with all the LDA axes). Changing the condition to exclude, we estimated the proteomes for all the 15 conditions and calculated the overall estimation error by the Euclidean distance
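The leave-one-out scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ordinary least squares, uses a single hypothetical Raman coordinate per condition instead of four LDA axes, and all names (`fit_line`, `loocv_errors`) and numbers are invented for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (assumed estimator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loocv_errors(x, proteome):
    """x[j]: Raman coordinate of condition j; proteome[l][j]: abundance of protein l.

    Returns, for each held-out condition j, the Euclidean distance between
    the predicted and the measured proteome vectors.
    """
    errors = []
    for j in range(len(x)):
        train = [i for i in range(len(x)) if i != j]
        sq = 0.0
        for row in proteome:
            a, b = fit_line([x[i] for i in train], [row[i] for i in train])
            sq += (a + b * x[j] - row[j]) ** 2
        errors.append(math.sqrt(sq))
    return errors

# Hypothetical toy data: five conditions, two proteins that are exactly linear in x.
x = [-2.0, -1.0, 0.0, 1.5, 3.0]
proteome = [[10 + 2 * xi for xi in x], [50 - 5 * xi for xi in x]]
errors = loocv_errors(x, proteome)

# Perturb one protein in condition 2; only that held-out prediction is off by 3.
noisy = [row[:] for row in proteome]
noisy[0][2] += 3.0
errors_noisy = loocv_errors(x, noisy)
```

With exactly linear toy data every held-out condition is recovered, whereas the perturbed condition shows an estimation error equal to the size of the perturbation.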
Stoichiometry conservation of proteins in the ISP COG class
The regression matrix B considered above determines how the proteomes relate to the Raman LDA axes. Therefore, analyzing B should provide some insights into constraints on condition-dependent changes in the proteomes.
The matrix B is represented as B = [b0 b1 … bm−1], where bk = (b1k b2k … bnk)T is the (k + 1)-th column of B (0 ≤ k ≤ m − 1) and n = 2,058 is the number of protein species in the proteome data. We first asked whether any shared features might exist in the coefficients of B depending on biological functions of corresponding proteins. We then classified the proteins according to functional annotations of Clusters of Orthologous Group (COG) classes (27–29) and found that, for many proteins belonging to the “information storage and processing” (ISP) COG class, the coefficients corresponding to different LDA axes are approximately proportional to the constant terms; i.e., blk ≈ ckbl0, where ck is a constant common to the ISP COG class proteins (Fig. 3A). The ISP COG class contains various proteins involved in processing genetic information such as translation, transcription, DNA replication, and DNA repair (20). Simple calculations show that these proportionality relationships imply that proteins in the ISP COG class conserve their mutual abundance ratios, i.e., stoichiometry, irrespective of environmental conditions (see section 1.3 in (22)).
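The step from proportional coefficients to conserved stoichiometry can be checked numerically. The sketch below assumes the linear model p_l = b_l0 + Σ_k b_lk r_k, with r_k the Raman LDA coordinates of a condition; all numerical values and the name `predict` are hypothetical. If b_lk = c_k b_l0, then p_l = b_l0 (1 + Σ_k c_k r_k), so the ratio p_l / p_l' = b_l0 / b_l'0 does not depend on the condition.

```python
# Hypothetical toy values; only the structure b_lk = c_k * b_l0 comes from the text.
b0 = [120.0, 45.0, 8.0]   # constant terms b_l0 for three proteins
c = [0.30, -0.10]         # shared constants c_k for two LDA axes

# Coefficient rows (b_l0, b_l1, b_l2) obeying b_lk = c_k * b_l0.
B = [[b] + [ck * b for ck in c] for b in b0]

def predict(B, r):
    """Assumed linear model: p_l = b_l0 + sum_k b_lk * r_k."""
    return [row[0] + sum(bk * rk for bk, rk in zip(row[1:], r)) for row in B]

# Hypothetical LDA coordinates of two conditions.
conditions = {"cond_a": [1.0, 0.5], "cond_b": [-0.8, 2.0]}

ratios = {}
for name, r in conditions.items():
    p = predict(B, r)
    ratios[name] = [p_l / p[0] for p_l in p]  # abundance relative to protein 0
```

The absolute abundances differ between the two toy conditions, but the entries of `ratios` agree, which is the stoichiometry conservation implied by the proportionality.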

A stoichiometrically conserved protein group identified by an analysis of the Raman-proteome coefficient matrix.
(A) Scatterplots of Raman-proteome transformation coefficients. The horizontal axes are constant terms (b0) in all the plots. The vertical axis is coefficients for LDA1 (b1), LDA2 (b2), LDA3 (b3), or LDA4 (b4) in each plot. The proteins in the ISP COG class are indicated in yellow. Yellow solid straight lines are least squares regression lines passing through the origins for the ISP proteins. Insets are enlarged views of area around the origins. In this figure, we used the average of
Since this is an implication from the Raman-proteome correspondence, we next examined the stoichiometry conservation only with the proteome data, evaluating the expression levels with Pearson correlation coefficients for all the pairs of the conditions for each COG class (Fig. 3B and Fig. S4). For the ISP COG class, the correlation coefficients were close to 1, whereas those for the other COG classes were significantly weaker depending on condition pairs. Therefore, stoichiometry conservation is stronger in the ISP COG class than in the other COG classes. Remarkably, neither shared transcription factors nor chromosome locations can account for the observed stoichiometry conservation of many protein pairs (Fig. 3C and Fig. 3D), implying multi-level regulation and coordination of their abundance.
We consulted other public quantitative proteome data of Mycobacterium tuberculosis (30), Mycobacterium bovis (30), and Saccharomyces cerevisiae (31) under environmental perturbations and consistently found strong stoichiometry conservation of the ISP COG class (Fig. S4). Furthermore, the same trend was observed for the genotype-dependent expression changes in E. coli proteomes (20) (Fig. S4).
Identifying stoichiometrically conserved groups
Inspired by the existence of a large class of proteins that conserve their stoichiometry, we considered a systematic way to extract stoichiometrically conserved groups (SCGs) without relying on the artificial functional classification of COG. Focusing only on the proteome data, we evaluated stoichiometry conservation by the cosine similarity of expression levels across conditions for all the pairs of proteins in the proteome (Fig. 4A) and extracted groups in each of which the component proteins exhibit coherent expression change patterns by setting a high threshold of cosine similarity (≥ 0.995, Fig. 4B).
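A minimal sketch of this extraction is given below. It assumes that groups correspond to connected components of the graph whose edges join protein pairs with cosine similarity at or above the threshold; the text does not specify the grouping rule beyond the threshold, and the protein names and abundances are hypothetical.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def extract_groups(expr, threshold=0.995):
    """Connected components of the thresholded cosine-similarity graph
    (assumed grouping rule). Proteins without any edge are dropped."""
    names = list(expr)
    neighbors = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(expr[a], expr[b]) >= threshold:
                neighbors[a].add(b)
                neighbors[b].add(a)
    groups, seen = [], set()
    for n in names:
        if n in seen or not neighbors[n]:
            continue
        stack, comp = [n], set()
        while stack:  # depth-first traversal of one component
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(neighbors[x] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return sorted(groups, key=len, reverse=True)

# Hypothetical abundances of six proteins under three conditions.
expr = {
    "p1": [10.0, 20.0, 30.0],
    "p2": [1.0, 2.0, 3.0],
    "p3": [5.1, 10.0, 14.9],
    "p4": [9.0, 1.0, 0.5],
    "p5": [18.0, 2.0, 1.0],
    "p6": [1.0, 8.0, 1.0],
}
groups = extract_groups(expr)
```

In this toy example, `p1`, `p2`, and `p3` keep nearly constant abundance ratios and form the largest group, `p4` and `p5` form a second group, and `p6` has no edge and is omitted, analogous to the unconnected proteins not shown in Fig. 4B.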

Extracting SCGs from proteome data.
(A) Quantifying stoichiometry conservation by cosine similarity. We consider an expression vector for each protein species whose elements represent its abundance under different conditions. The cosine similarity between the expression vectors of two protein species becomes nearly 1 when they conserve mutual stoichiometry strongly across conditions, whereas it is lower than 1 when their expression patterns are incoherent. (B) Extracted SCGs. We extracted proteins with high cosine similarity relationships. Each node represents a protein species. An edge connecting two nodes indicates that the expression patterns of the two connected protein species have high cosine similarity exceeding a threshold of 0.995. Proteins that have no edge with the other proteins are not shown. The largest and the second largest protein groups, which we refer to as SCG 1 and SCG 2, respectively, are indicated by shaded polygons. (C) Expression patterns of the extracted SCGs. The horizontal and vertical axes represent growth rate and protein abundance, respectively. Line-connected points represent expression-level changes of different protein species across conditions. The inset for SCG 2 shows the total abundances of SCG 2 proteins with a log-scaled vertical axis. Error bars are standard errors. (D) The gene loci of the homeostatic core (SCG 1) proteins on the chromosome. Magenta dots are nodes (genes), and gray lines are edges (high cosine similarity relationships). We determined the gene loci based on ASM75055v1.46 (50).
The largest SCG (SCG 1) included many proteins in the ISP COG class (91 out of 191 SCG 1 members), such as ribosomal proteins and RNA polymerase, and also proteins in the other COG classes (Fig. 4B, Table S4). We call this largest SCG the homeostatic core, as it constitutes the largest stoichiometry-conserving unit in cells. We found that the abundance of each protein in the homeostatic core (SCG 1) increased approximately linearly with the growth rate in each condition (Fig. 4C). This relationship is reminiscent of the growth law: the total ribosomal contents for translation increase linearly with growth rate (32–34). The linear increase in the abundance of each protein in Fig. 4C indicates that the growth law is valid even at the single-gene level for a large class of ribosomal and non-ribosomal proteins in the homeostatic core (Fig. S5) (see section 3.1 in (22)).
Though not evenly distributed, the gene loci of the proteins in the homeostatic core are scattered throughout the chromosome (Fig. 4D). Therefore, localization of gene loci to a single or a small number of operons is not likely a cause of the observed stoichiometry conservation.
The proteins in the second largest SCG (SCG 2) are expressed at high levels in the fast growth conditions, especially in the “LB” condition (Fig. 4C). The SCG 2 includes many proteins in the metabolism COG class (21 out of 26 SCG 2 members) (Table S5), and their abundance increases approximately exponentially with growth rate (Fig. 4C). We also identified other condition-specific small SCGs, such as a group most expressed in the “GlycerolAA” condition (SCG 3) (Table S6), a group mainly expressed in the “Fructose” condition (SCG 4) (Table S7), and a group most expressed in the stationary phase conditions (SCG 5) (Table S8) (Fig. 4C).
Biological relevance of stoichiometry conservation
To understand the overall strength of stoichiometry conservation of the proteins in the different SCGs, we calculated the sum of cosine similarity,
The proteins in the homeostatic core had high centrality scores (Fig. 5A). Therefore, these proteins tend to have more connections with other proteins in terms of stoichiometry conservation. On the other hand, the proteins in the condition-specific SCGs tend to have low centrality scores among all the proteins (Fig. 5A), which suggests that their stoichiometry conservation is localized within each SCG.
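The centrality score can be sketched as the row sum of the cosine similarity matrix. In the sketch below the self-similarity term (equal to 1) is included in the sum, which is an assumption; the protein names and abundances are hypothetical.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centrality(expr):
    """d_i = sum_j cos(theta_ij); the j = i term is included (assumption)."""
    return {a: sum(cosine_similarity(expr[a], expr[b]) for b in expr)
            for a in expr}

# Hypothetical abundances under three conditions.
expr = {
    "core1": [5.0, 5.0, 5.0],   # expressed under all conditions
    "core2": [4.0, 6.0, 5.0],
    "specA": [9.0, 0.0, 0.0],   # expressed under one condition only
    "specB": [0.0, 0.0, 7.0],
}
d = centrality(expr)
```

In this toy example the two broadly expressed proteins receive higher scores than the two condition-specific ones, mirroring the contrast between the homeostatic core and the condition-specific SCGs.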

A proteome structure characterized by global stoichiometry conservation relationships.
(A) Distributions of stoichiometry conservation centrality values for all the proteins (gray), the homeostatic core (SCG 1) proteins (magenta), and the proteins belonging to the other SCGs (cyan). (B) Correlation between stoichiometry conservation centrality and gene essentiality. The proportion of essential genes within each class of stoichiometry conservation ranking is shown. The list of essential genes was downloaded from EcoCyc (49). (C) Correlation between stoichiometry conservation and evolutionary conservation. The strength of evolutionary conservation of each protein species was estimated by the number of orthologs found in the OrthoMCL species (35). The genes with more orthologs tend to have higher stoichiometry conservation centrality (p = 6.24 × 10−15 by one-sided Brunner-Munzel test between the top 25% and the bottom 25% fractions of ortholog number ranking). Likewise, the genes with higher stoichiometry conservation centrality scores tend to have more orthologs (p = 4.04 × 10−12 by one-sided Brunner-Munzel test, Top 25%–Bottom 25% comparison; p-values in the captions for (F) to (I) were evaluated with the same statistical test scheme). (D) to (G) Stoichiometry conservation analyses of human cell atlas transcriptome data of 15 fetal organs (36). The top gray histogram in (D) shows the distribution of stoichiometry conservation centrality values for all genes. The bottom histograms in (D) show the distribution for coding genes (yellow) and that for the other genes (cyan). (E) shows a correlation between the ratio of coding genes and stoichiometry conservation centrality calculated from the human cell atlas data. (F) shows a correlation between gene essentiality and stoichiometry conservation centrality calculated from the human cell atlas data. The essentiality of each human gene was quantified by CRISPR score, which is the fitness cost imposed by CRISPR-based inactivation of the gene in KBM7 chronic myelogenous leukemia cells (35).
Genes with lower CRISPR score are regarded as more essential. The fraction with low CRISPR score (i.e. high essentiality fraction) tends to have higher stoichiometry conservation centrality (p < 10−15). The fraction with high centrality score tends to be more essential (p < 10−15). (G) shows a correlation between evolutionary conservation and stoichiometry conservation centrality based on the human cell atlas data. The gene fraction with many orthologs tends to have higher stoichiometry conservation centrality (p < 10−15). The gene fraction with high centrality score tends to have more orthologs (p < 10−15). (H) and (I) Stoichiometry conservation analyses of genome-wide Perturb-seq data (37). (H) shows a correlation between stoichiometry conservation centrality calculated from the Perturb-seq data and gene essentiality. The essentiality of each gene was quantified by the CRISPR score as in (F). The gene fraction with low CRISPR score (i.e. high essentiality fraction) tends to have higher stoichiometry conservation centrality (p < 10−15). The gene fraction with high centrality score tends to be more essential (p < 10−15). (I) shows a correlation between stoichiometry conservation based on the Perturb-seq data and evolutionary conservation of genes. The gene fraction with many orthologs tends to have higher stoichiometry conservation centrality (p < 10−15). The gene fraction with high centrality score tends to have more orthologs (p < 10−15). (J) Representation of the proteomes as a graph. A node corresponds to a protein species, and the weight of an edge is taken as the cosine similarity between the expression vectors of the two connected protein species. The matrix A can specify the whole graph. Note that the diagonal elements of A are ones, which were introduced just for simplicity. (K) Cosine similarity LE (csLE) structure in a three-dimensional space.
Each dot represents a different protein species and is color-coded on the basis of its stoichiometry conservation centrality value. We selected the axes considering the structural similarity to the Raman-based proteome structure in ΩB (see Fig. 6). (L) The csLE structure in a three-dimensional space. The views from two different angles are shown. Each gray dot represents a different protein species. The proteins belonging to each SCG are indicated with distinct markers.
The stoichiometry conservation centrality is biologically relevant because it correlates with gene essentiality. Fractions of essential genes almost monotonically decrease with the ranks of centrality score (Fig. 5B and Fig. S6). We also noted that genes with high centrality scores have more orthologs determined by OrthoMCL-DB (35) across the three domains of life (Fig. 5C and Fig. S6). Likewise, genes with many orthologs tend to have higher centrality scores (Fig. 5C and Fig. S6). Therefore, the stoichiometry conservation in cells correlates with the evolutionary conservation of proteins.
Comparable correlations of stoichiometry conservation centrality with gene essentiality and evolutionary conservation were also found for the S. pombe transcriptome data (Fig. S6). In addition, fractions of coding genes almost monotonically decreased with ranks of centrality score in the S. pombe data (Fig. S6).
We further analyzed two kinds of Homo sapiens transcriptome data. One is a human cell atlas, in which expression of both coding and noncoding genes in 15 fetal organs was quantified (36), and the other is genome-wide Perturb-seq data (37), in which genetically perturbed transcriptomes were measured mainly for coding genes. Our analysis of the human cell atlas data revealed that, while the overall distribution of stoichiometry conservation centrality was broad (Fig. 5D top), the centrality distribution of coding genes was skewed to higher values (Fig. 5D bottom) as observed for the E. coli proteome. Fractions of coding genes almost monotonically decreased with ranks of centrality (Fig. 5E) as seen in the S. pombe data (Fig. S6). Essentiality of each gene in human cells was quantified by an index called CRISPR score, which measures the fitness cost imposed by CRISPR-based inactivation of the gene (38). Genes with lower CRISPR scores are considered more essential. Our analysis revealed that genes with higher stoichiometry conservation centrality scores tend to have lower CRISPR scores and are thus more essential (Fig. 5F). Similarly, genes with lower CRISPR scores tend to have higher stoichiometry conservation centrality scores. Furthermore, genes with higher centrality scores have more orthologs across the three domains of life, and vice versa (Fig. 5G). Comparable correlations of stoichiometry conservation with essentiality and evolutionary conservation were also found in the genome-wide Perturb-seq data (Fig. 5H and Fig. 5I). Together, these results suggest that correlations of stoichiometry conservation centrality with gene essentiality and evolutionary conservation are general and preserved from E. coli to human cells regardless of the type of perturbation.
Revealing global stoichiometry conservation architecture of the proteomes with csLE
To gain further insights into the linkage between stoichiometry conservation constraints and cellular Raman spectra, we next analyzed the proteomes using a method similar to Laplacian eigenmaps (LE) (39). We consider a symmetric matrix A whose (i, j) entry is cos
In this csLE space ΩLE, the stoichiometry conservation centrality of the proteins decreased from center to periphery (Fig. 5K), which confirms that it indeed measures the extent to which each protein is close to the center in the entire stoichiometry conservation architecture. Furthermore, the proteins formed polyhedral distributions with the cluster of the proteins in the homeostatic core at the center and the clusters of the proteins in the other condition-specific SCGs at distinct vertices (Fig. 5L). This distribution is consistent with the fact that the condition-specific SCGs are the components whose expression patterns are distant from the homeostatic core and also from one another.
Representing the proteomes using the Raman LDA axes
Given that the analysis of the Raman-proteome regression coefficients B initially led us to identify the SCGs and the stoichiometry conservation architecture in the proteome data, major changes in cellular Raman spectra characterized by LDA might reflect the expression changes of these coherently-expressed components.
The coefficients in the regression matrix B must satisfy
We then found that the distribution of the proteins in ΩB closely resembled the one in ΩLE when visualized using the first few major axes (Fig. 5L and Fig. 6A). This similarity is nontrivial because ΩLE is constructed only from the proteome data, whereas ΩB depends on the Raman LDA space.

Raman-based proteome structure and its similarity to stoichiometry-based proteome structure.
(A) Proteome structure determined by Raman-proteome coefficients visualized in a three-dimensional space. The views from two different angles are shown. Each gray dot represents a protein species. The proteins belonging to each SCG are indicated with distinct markers. We note that SCGs are defined without referring to Raman data (Fig. 4). (B–D) Similarity among the distribution of LDA Raman spectra (B), the proteome structure determined by Raman-proteome coefficients (C), and the proteome structure determined by stoichiometry conservation (D). (E) Mathematical relation between the coordinates of the proteins in ΩB (C) and ΩLE (D). The two conditions, one between b0 and
We remark that each axis of ΩB is directly linked to the corresponding Raman LDA axis. Consequently, the orthants in ΩB where the condition-specific protein species reside agree with those in the Raman LDA space where the cellular Raman spectra under corresponding conditions reside (Fig. S10) (see sections 1.7 and 2.1 in (22)). Indeed, we find such orthant agreement between the proteins in the condition-specific SCGs (SCG 2–SCG 5) and the cellular Raman spectra under the corresponding conditions (Fig. 6B and Fig. 6C). This straightforward correspondence between ΩB and the Raman LDA space allows us to examine the relationship between changes in cellular Raman spectra and omics components’ stoichiometry conservation architecture by comparing the two proteome structures in ΩB and ΩLE.
Omics-level interpretation of cellular Raman spectra and a quantitative constraint between expression generality and stoichiometry conservation centrality
To understand rigorously what the similarity of the proteome structures in ΩB and ΩLE signifies (Fig. 6C and Fig. 6D), we clarified the mathematical relation between the coordinates of the proteins in these two spaces (Fig. 6E; see sections 2.1 and 2.2 in (22) for details). We then characterized the two mathematical conditions that must be satisfied simultaneously (Fig. 6E).
The first condition is that major axes of the Raman LDA space and those of the proteome csLE space correspond (Fig. 6E). Consequently, cellular Raman spectra under a condition accompanying the expression of a condition-specific SCG must be significantly different from those under conditions with the expression of other condition-specific SCGs in a manner distinguishable by LDA. We verified this first condition with the data (Fig. S9).
The second condition is that the stoichiometry conservation centrality of each protein species di must be proportional to gi := ∥pi∥1/∥pi∥2, where ∥pi∥1 and ∥pi∥2 are the L1 and L2 norms of the expression level vector of protein i across conditions (Fig. 6E).
gi can be interpreted as the expression generality score. When gi is large, the protein i is expressed generally across conditions; when gi is small, it is expressed only under specific conditions (Fig. S8) (see section 1.9 in (22)). Therefore, the proportionality between di and gi indicates that the proteins with high stoichiometry conservation centrality must be expressed non-specifically to conditions. We also verified this condition with the data, confirming that it is indeed satisfied (Fig. 7A and Fig. S9).
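The behavior of the generality score at its two extremes can be illustrated with a short sketch. The function name `expression_generality` and the toy vectors are hypothetical; for a non-negative expression vector over m conditions, the ratio of the L1 norm to the L2 norm lies between 1 (expression under a single condition) and √m (equal expression under all conditions).

```python
import math

def expression_generality(p):
    """g = ||p||_1 / ||p||_2 for an expression vector p across conditions."""
    l1 = sum(abs(x) for x in p)
    l2 = math.sqrt(sum(x * x for x in p))
    return l1 / l2

m = 15  # number of conditions, as in the E. coli dataset

uniform = [3.0] * m                    # equally expressed under all conditions
specific = [0.0] * (m - 1) + [3.0]     # expressed under a single condition

g_uniform = expression_generality(uniform)    # upper bound, sqrt(m)
g_specific = expression_generality(specific)  # lower bound, 1
```

Intermediate expression patterns give values between these bounds, so the score orders proteins from condition-specific to generally expressed.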

Proportionality between stoichiometry conservation centrality and expression generality.
(A) Relationships between stoichiometry conservation centrality (di) and expression generality (gi). Each gray dot represents a protein species. The proteins belonging to each SCG are indicated with distinct markers. The dashed lines are
The spread of the points from the proportionality diagonal line of the E. coli proteome data in Fig. 7A was found to be related to the growth rate under the condition where each protein is expressed the most (see section 2.2 in (22) for a detailed analysis on the origin of the deviation). Consequently, one can envisage a growth-rate-dependent expression pattern of each protein on the basis of its relative position in this gi-di plot (Fig. 7B and Fig. 7C). For example, both BamB and YqjD are expressed non-specifically to the conditions with nearly identical expression generality scores. However, BamB is expressed at higher levels under fast growth conditions, whereas YqjD is expressed at higher levels under slow growth conditions, as indicated by their positions relative to the proportionality line. Similar growth rate dependence is observed for PaaE and DgoA, but with more prominent condition-specificity because these proteins are characterized by their low expression generality scores. These growth-rate-dependent deviation patterns might hint at a new growth law that governs the total relative expression changes of the proteome components (see section 2.2 in (22) for detailed discussion).
Generality
We also examined the generality of the aforementioned two conditions using the Raman and proteome data of E. coli strains with different genotypes (BW25113, MG1655, and NCM3722) under two culture conditions (20) and the Raman and transcriptome data of S. pombe under 10 culture conditions (18). Applying csLE to the omics data, we again found similar omics structures between ΩLE and ΩB when visualized using the first few major axes, with homeostatic cores at the centers and condition-specific SCGs at the vertices (Fig. S11 and Fig. S12).
The proportionality between stoichiometry conservation centrality and expression generality score was also confirmed in both additional datasets (Fig. S7). We further used publicly available quantitative proteome data of M. tuberculosis, M. bovis, and S. cerevisiae (30, 31) to examine this relation and confirmed that the proportionality also holds in these organisms (Fig. S7 and Fig. S13). Almost no deviation from the proportionality line was found in the S. cerevisiae proteome data, which were measured for cells grown in different media but cultured in chemostats with an identical dilution rate (and thus an identical growth rate). This is consistent with the E. coli result, in which the deviations were related to growth rate differences.
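The relation between the two scores can be made concrete with a toy calculation. The sketch below is an illustration under stated assumptions, not the analysis used in this study: it assumes that the centrality di is the summed cosine similarity of protein i to all proteins (the node degree of a fully connected cosine-similarity graph, self-similarity included), and, purely for illustration, that the generality gi is the cosine similarity between the expression vector of protein i and a uniform vector over conditions. The precise definitions are given elsewhere in the paper and in (22). The function name `centrality_and_generality` and the synthetic proteome are hypothetical.

```python
import numpy as np

def centrality_and_generality(expr):
    """Toy scores for an (n_proteins, n_conditions) abundance matrix.

    Assumes every protein has non-zero abundance in at least one condition.
    """
    expr = np.asarray(expr, dtype=float)
    # Unit-length expression vector of each protein.
    unit = expr / np.linalg.norm(expr, axis=1, keepdims=True)
    # Assumed centrality: d_i = sum_j cos(p_i, p_j) = unit_i . (sum_j unit_j).
    direction_sum = unit.sum(axis=0)
    centrality = unit @ direction_sum
    # Assumed generality: cosine similarity to the uniform expression pattern.
    n_conditions = expr.shape[1]
    uniform = np.ones(n_conditions) / np.sqrt(n_conditions)
    generality = unit @ uniform
    return centrality, generality, direction_sum

# Synthetic proteome: 10 "core" proteins expressed equally under 4 conditions,
# plus 4 proteins each expressed under a single condition only.
expr = np.vstack([np.ones((10, 4)), np.eye(4)])
centrality, generality, direction_sum = centrality_and_generality(expr)

# Here the summed direction is parallel to the uniform vector, so the two
# scores are exactly proportional with slope |direction_sum|.
slope = np.linalg.norm(direction_sum)
print(np.allclose(centrality, slope * generality))
```

In this toy setting the proportionality is exact because the summed unit vectors point along the uniform direction; a proteome whose summed direction is tilted away from uniform would show scatter around the line. The sketch does not address how such scatter relates to the growth-rate dependence reported for the E. coli data.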
Discussion
A Raman spectrum obtained from a single cell is a superposition of the spectra of all of its constituent biomolecules. Therefore, cellular Raman spectra potentially contain rich information on essential state differences in targeted cells. The fact that both transcriptomes and proteomes are inferable from cellular Raman spectra, as demonstrated in this and a previous (18) study, supports this view. The detailed analyses of the relationship between Raman and omics data have identified functionally relevant constraints on omics changes and provided an interpretation of cellular Raman spectra (Fig. S1). Specifically, we found that major changes in cellular Raman spectra distinguishable by LDA reflect the changes in omics profiles under the constraints of stoichiometry conservation. This correspondence should help interpret global changes in cellular Raman spectra by translating them into differences in omics profiles.
We remark that linearity in our formulation enabled us to find the rigorous connection between the two omics spaces ΩB and ΩLE (Fig. 6E). Unlike the original LE, we adopted cosine similarity as weights of edges between all node pairs to measure expression stoichiometry conservation of proteins. This modification was indispensable in terms of interpretation; relative proximity of positions in ΩLE reflects the strength of stoichiometry conservation. We also remark that simple principal component analysis (PCA) applied to the normalized E. coli proteome data also finds a similar low-dimensional proteome structure (Fig. S6) (see section 1.10 in (22)). Therefore, besides interpretability, omics structures in ΩLE might reflect dominant relationships among omics components commonly characterized by several methods of omics representation.
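To illustrate the idea of using cosine similarity as edge weights in a Laplacian-eigenmaps-type embedding, a minimal sketch follows. It is not the csLE implementation used in this study (see (22) for the actual formulation). The sketch assumes a fully connected graph with self-loops retained, non-negative abundances with no all-zero protein, and the standard generalized eigenproblem (D − W)y = λDy. The function name `cosine_similarity_le` and the synthetic data are hypothetical.

```python
import numpy as np

def cosine_similarity_le(expr, n_components=2):
    """Laplacian-eigenmaps-style embedding with cosine-similarity edge weights.

    expr: (n_proteins, n_conditions) array of non-negative abundances,
          with no all-zero rows.
    Returns (coords, degree, eigenvalues) for the first non-trivial axes.
    """
    expr = np.asarray(expr, dtype=float)
    unit = expr / np.linalg.norm(expr, axis=1, keepdims=True)
    weights = unit @ unit.T            # W_ij = cos(p_i, p_j) for all node pairs
    degree = weights.sum(axis=1)       # node degree d_i
    inv_sqrt_d = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    lap_sym = np.eye(len(degree)) - inv_sqrt_d[:, None] * weights * inv_sqrt_d[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap_sym)   # eigenvalues in ascending order
    # Skip the trivial first eigenvector and map back to the generalized
    # problem (D - W) y = lambda D y via y = D^{-1/2} u.
    coords = inv_sqrt_d[:, None] * eigvecs[:, 1:n_components + 1]
    return coords, degree, eigvals[1:n_components + 1]

# Synthetic example: the last protein keeps a fixed 1:3 ratio with protein 0
# across all conditions, i.e. the pair conserves stoichiometry exactly.
rng = np.random.default_rng(0)
base = rng.uniform(0.5, 2.0, size=(6, 5))
expr = np.vstack([base, 3.0 * base[0]])
coords, degree, eigvals = cosine_similarity_le(expr)

# The stoichiometry-conserving pair lands on the same point in the embedding.
print(np.allclose(coords[0], coords[6], atol=1e-8))
```

Because cosine similarity ignores overall abundance, two proteins whose expression levels stay in a fixed ratio have identical edge weights to every other node and therefore coincide in the embedding, which is the sense in which proximity reflects the strength of stoichiometry conservation.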
Stoichiometry conservation is plausibly crucial for cellular functions and physiology. For example, the enzymes involved in evolutionarily conserved metabolic pathways conserve their stoichiometry across microorganism species despite their diverse transcriptional and translational rates (40). It is suggested that stoichiometry conservation is achieved by optimizing the metabolic flux for fast growth (41). Furthermore, a ribosome-targeting antibiotic causes an imbalance of ribosomal proteins and growth arrest in E. coli, but the balance is restored alongside growth recovery through physiological adaptation (42). These results suggest that breakage of stoichiometric balance among core components could impose significant fitness cost.
It is intriguing to ask how cells conserve stoichiometry among the components in each SCG. In particular, the homeostatic core (SCG 1) contains many components whose gene loci are scattered throughout the genome. It is known that both transcriptional and translational negative auto-regulation contribute to controlling the stoichiometry of many ribosomal proteins (43–48). The genes for the ribosomal proteins are scattered across multiple operons and co-regulated with the genes for many non-ribosomal proteins, such as RNA polymerase subunits, translation initiation/elongation factors, and transmembrane transporters (49). Therefore, the stoichiometry-conserving mechanisms established for ribosomes might be partially exploited for stoichiometry conservation within the homeostatic core.
The proportionality between stoichiometry conservation centrality and expression generality score suggests that proteins with high stoichiometry conservation centrality govern basal cellular functions required under any conditions. In fact, both essential genes and evolutionarily conserved genes are enriched in the omics fractions with high centrality scores. In contrast, proteins with low centrality scores might have been acquired at later stages of evolution and exploited to survive or increase fitness under specific conditions. Such a hierarchy in stoichiometry conservation centrality among core and peripheral processes might promote the adaptability of cells, since cells can respond to diverse environments without restructuring a large body of the functional homeostatic core. This architectural principle in omics might underlie the robustness and adaptability of biological cells.
Acknowledgements
We thank Matthias Heinemann and Silke Bonsing-Vedelaar for detailed information on the E. coli culture conditions; Doeke R. Hekstra, Tetsuya J. Kobayashi, Takafumi Miyamoto, John Russell, and Ian Hunt-Isaak for reading the manuscript and providing critical comments; Kunihiko Kaneko, Chikara Furusawa, Yasushi Okada, and members of the Wakamoto Lab and the Universal Biology Institute for discussion and encouragement. This work was supported by JST CREST Grant Number JPMJCR1927 (Y.W.); JST ERATO Grant Number JPMJER1902 (Y.W.); JSPS KAKENHI Grant Numbers 19J22448 (K.F.K.) and 21K20672 (T.N.).
References
- 1. The Strategy of the Genes. George Allen & Unwin Ltd
- 2. Nature 183:1654
- 3. Microbiology 144:1157
- 4. Journal of Raman Spectroscopy 35:525
- 5. PLoS ONE 9
- 6. Communications Biology 1:85
- 7. Proceedings of the National Academy of Sciences 95:14863
- 8. Nature Genetics 34:166
- 9. Physical Review E 67:031902
- 10. Molecular Systems Biology 9:701
- 11. Nature 500:301
- 12. Physical Review X 5:1
- 13. Molecular Systems Biology 11:784
- 14. Cell Systems 2:239
- 15. Nature Communications 8
- 16. Molecular Biology and Evolution 37:2865
- 17. Physical Review Research 2:013197
- 18. Cell Systems 7:104
- 19. Nature Biotechnology :1–9
- 20. Nature Biotechnology 34:104
- 21. Handbook of geometric computing. Springer pp. 129–167
- 22. Supplementary materials
- 23. Journal of Machine Learning Research 9:2579
- 24. The design of experiments. London: Oliver And Boyd; Edinburgh
- 25. Supplement to the Journal of the Royal Statistical Society 4:119
- 26. Statistical Applications in Genetics and Molecular Biology 9
- 27. Science 278:631
- 28. BMC Bioinformatics 4:1
- 29. Nucleic Acids Research 43:D261
- 30. Cell Host & Microbe 18:96
- 31. Cell Systems 4:495
- 32. Biochimica et Biophysica Acta 42:99
- 33. Science 330:1099
- 34. EcoSal Plus 3
- 35. Nucleic Acids Research 34:D363
- 36. Science 370:eaba7721
- 37. Cell 185:2559
- 38. Science 350:1096
- 39. Neural Computation 15:1373
- 40. Cell 173:749
- 41. eLife 10:e69222
- 42. eLife 11:e74486
- 43. Proceedings of the National Academy of Sciences 77:7084
- 44. Nature 289:89
- 45. Microbiology and Molecular Biology Reviews 71:477
- 46. Translational Regulation of Gene Expression 2. Springer pp. 23–47
- 47. RNA 14:1882
- 48. Communications Biology 3:1
- 49. Nucleic Acids Research 45:D543
- 50. Nucleic Acids Research 48:D689
Article and author information
Copyright
© 2025, Kamei et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.