Research Article

Revealing global stoichiometry conservation architecture in cells from Raman spectral patterns

Department of Basic Science, Graduate School of Arts and Sciences, The University of Tokyo, Japan
Department of Medicine, The University of Chicago, United States
Research Center for Complex Systems Biology, The University of Tokyo, Japan
Universal Biology Institute, The University of Tokyo, Japan
Department of Optical Imaging, Advanced Research Promotion Center Tokushima University, Japan
Department of Biology, New York University, United States

Apr 14, 2026

https://doi.org/10.7554/eLife.101485.3

Open access

21 figures, 12 tables and 1 additional file

Figures

Figure 1

Cellular physiological state differences detected by Raman spectral global patterns and gene expression profiles.

(A) Condition-dependent cellular Raman spectral patterns. Raman spectra obtained from cells reflect their molecular profiles. Therefore, systematic differences in global spectral patterns may indicate their physiological states. A Raman spectrum from each cell can be represented as a vector and a point in a high-dimensional Raman space. If condition-dependent differences exist in the spectral patterns, appropriate dimensional reduction methods allow us to classify the spectra and detect cellular physiological states in a low-dimensional space. (B) Condition-dependent gene expression profiles. Global gene expression profiles (proteomes and transcriptomes) are also dependent on conditions. For each gene, we can consider a high-dimensional vector whose elements represent expression levels under different conditions. It has been suggested that these expression-level vectors are constrained to some low-dimensional manifolds (Eisen et al., 1998; Segal et al., 2003; Bergmann et al., 2003; Keren et al., 2013; You et al., 2013; Kaneko et al., 2015; Hui et al., 2015; Heimberg et al., 2016; Biswas et al., 2017; Husain and Murugan, 2020; Sato and Kaneko, 2020). This study characterizes the statistical correspondence between dimension-reduced Raman spectral patterns and gene expression profiles. Analyzing the correspondence, we reveal a stoichiometry conservation principle that constrains gene expression profiles to low-dimensional manifolds.

Figure 2

Estimation of proteomes from Raman spectra.

(A) The experimental design. We cultured *E. coli* cells under 15 different conditions and measured single cells’ Raman spectra. We then examined the correspondence between the measured Raman spectra and the absolute quantitative proteome data reported by Schmidt et al., 2016. (B) Representative Raman spectra from single cells, one from the ‘Glucose’ condition and the other from the ‘LB’ condition. The fingerprint region and representative peaks are annotated. (**C–E**) Cellular Raman spectra in linear discriminant analysis (LDA) space. The dimensionality of the spectra is reduced to $14 (= 15 - 1)$. Each point represents a spectrum from a single cell, and each ellipse shows the 95% concentration ellipse for each condition. Their projections onto the LDA1-LDA2 plane (C), the LDA1-LDA3 plane (D), and the LDA1-LDA4 plane (E) are shown. (F) Visualization of the 14-dimensional LDA space embedded in two-dimensional space with t-distributed stochastic neighbor embedding (t-SNE). (G) The scheme of leave-one-out cross-validation. The Raman and proteome data of one condition (here $j$) are excluded, and the matrix $B$ is estimated from the data of the remaining conditions as $B_{- j}^{est}$. The proteome data under condition $j$ are then estimated from the Raman data ${\hat{𝒓}}_{j}$ with $B_{- j}^{est}$ and compared with the actual data to calculate estimation errors. (H) Comparison of measured and estimated proteome data. The plot for the ‘Glucose’ condition is shown as an example. Each dot corresponds to one protein species. The straight line indicates $x = y$. Proteins with negative estimated values are not shown.
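The leave-one-out scheme in (G) can be illustrated with a minimal sketch. It assumes that each condition is summarized by one low-dimensional Raman vector, that $B$ is fitted by ordinary least squares with a constant term, and that the data are synthetic. The function name `loo_estimate` and the array shapes are hypothetical, and the paper's actual procedure (for example, how the LDA axes are recomputed for each held-out condition) may differ.

```python
import numpy as np

def loo_estimate(R, P):
    """Leave-one-out estimate of proteomes from low-dimensional Raman data.

    R: (m, k) array, one k-dimensional Raman vector per condition (assumed input).
    P: (m, n) array, abundances of n protein species per condition.
    Returns an (m, n) array of estimated proteomes.
    """
    m = R.shape[0]
    X = np.hstack([np.ones((m, 1)), R])  # constant term plus Raman coordinates
    P_est = np.empty_like(P, dtype=float)
    for j in range(m):
        keep = np.arange(m) != j  # exclude condition j
        # Least-squares fit of the coefficient matrix without condition j
        B_minus_j, *_ = np.linalg.lstsq(X[keep], P[keep], rcond=None)
        P_est[j] = X[j] @ B_minus_j  # estimate the held-out proteome
    return P_est

# Synthetic example: proteomes that depend exactly linearly on Raman coordinates
rng = np.random.default_rng(0)
m, k, n = 15, 3, 50
R = rng.normal(size=(m, k))
B_true = rng.normal(size=(k + 1, n))
P = np.hstack([np.ones((m, 1)), R]) @ B_true
P_est = loo_estimate(R, P)
max_err = np.abs(P_est - P).max()
```

With noiseless synthetic data the held-out proteomes are recovered up to rounding error. With real data, the per-condition estimation error is the quantity of interest.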

Figure 3

A stoichiometrically conserved protein group identified by an analysis of the Raman-proteome coefficient matrix.

(A) Scatterplots of Raman-proteome transformation coefficients. The horizontal axes show the constant terms ($𝒃_{0}$) in all the plots. The vertical axis shows the coefficients for LDA1 ($𝒃_{1}$), LDA2 ($𝒃_{2}$), LDA3 ($𝒃_{3}$), or LDA4 ($𝒃_{4}$) in each plot. The proteins in the information storage and processing (ISP) Clusters of Orthologous Group (COG) class are indicated in yellow. Yellow solid straight lines are least squares regression lines passing through the origin for the ISP proteins. Insets are enlarged views of the area around the origin. In this figure, we used the average of $B_{- i}^{est}$ as an estimate of $B$. (B) Similarity of expression patterns between culture conditions for each COG class. We divided the proteome into COG classes (Tatusov et al., 2003; Galperin et al., 2015) and calculated the Pearson correlation coefficient of expression patterns for every pair of culture conditions. Since the data are from 15 conditions, there are 105 (=15·14/(2·1)) points for each COG class in the graph. The box-and-whisker plots summarize the distributions of the points. The lines inside the boxes denote the medians, and the top and bottom edges of the boxes denote the 75th and 25th percentiles, respectively. The numbers of protein species are 376 for the Cellular Processes and Signaling COG class, 354 for the ISP COG class, and 840 for the Metabolism COG class. See Appendix 1—figure 4 for the evaluation with the Pearson correlation coefficient of log abundances and with cosine similarity. Appendix 1—figure 4 also contains figures directly showing expression-level changes of different protein species across conditions for each COG class. (C) Examples of stoichiometry-conserving proteins in the ISP COG class. The horizontal axis represents the abundance of RplF under 15 conditions, and the vertical axis represents those of several ISP COG class proteins. These proteins are also contained in the *homeostatic core* defined later (see Figure 4). 
The solid straight lines are linear regression lines with an intercept of zero. (D) Examples of abundance ratios of non-ISP COG class proteins. The horizontal axis represents the abundance of RplF under 15 conditions, and the vertical axis represents those of compared non-ISP COG class proteins. Crp belongs to the Cellular Processes and Signaling COG class; the other proteins belong to the Metabolism COG class. In both (C) and (D), we selected the proteins expressed from distant loci on the chromosome. All sigma factors participating in the regulation of the proteins examined in (C) and (D) are listed on the right of the gene name legends. All transcription factors known to regulate multiple genes listed here are shown in the right diagrams. Arrows show activation; bars represent inhibition; and squares indicate that a transcription factor activates or inhibits depending on other factors. The information on gene regulation and functions was obtained from EcoCyc (Keseler et al., 2017) in August 2022. The error bars are standard errors calculated by using the data of Schmidt et al., 2016. The insets show the positions of the genes on the *E. coli* chromosome determined based on ASM75055v1.46 (Howe et al., 2020). No genes are in the same operon.
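The between-condition similarity in (B) can be sketched as follows. The sketch assumes an abundance matrix for one COG class with proteins in rows and conditions in columns. The random data and the name `pairwise_condition_similarity` are placeholders, not part of the original analysis.

```python
import numpy as np

def pairwise_condition_similarity(P):
    """Pearson correlation of expression patterns for every pair of conditions.

    P: (n, m) array of abundances, n protein species by m conditions.
    Returns a 1-D array with m * (m - 1) / 2 correlation coefficients.
    """
    C = np.corrcoef(P, rowvar=False)        # (m, m) correlations between columns
    upper = np.triu_indices(C.shape[0], k=1)  # each unordered pair once
    return C[upper]

# Placeholder data: 354 protein species (the ISP class size) under 15 conditions
rng = np.random.default_rng(2)
P_class = rng.lognormal(size=(354, 15))
r = pairwise_condition_similarity(P_class)
```

For 15 conditions this yields the 105 values per COG class that each box-and-whisker plot summarizes.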

Figure 4

Extracting stoichiometrically conserved groups (SCGs) from proteome data.

(A) Quantifying stoichiometry conservation by cosine similarity. We consider an $m$-dimensional expression vector for each protein species whose elements represent its abundance under different conditions. The cosine similarity between the $m$-dimensional expression vectors of two protein species is close to 1 when they strongly conserve their mutual stoichiometry across conditions, whereas it is lower than 1 when their expression patterns are incoherent. (B) Extracted SCGs. We extracted proteins with high cosine similarity relationships. Each node represents a protein species. An edge connecting two nodes indicates that the expression patterns of the two connected protein species have a cosine similarity exceeding a threshold of 0.995. Proteins that have no edge with any other protein are not shown. The largest and the second largest protein groups, which we refer to as SCG 1 and SCG 2, respectively, are indicated by shaded polygons. (C) Expression patterns of the extracted SCGs. The horizontal and vertical axes represent growth rate and protein abundance, respectively. Line-connected points represent expression-level changes of different protein species across conditions. SCG 1 (homeostatic core) is shown in two ways: the left panel with a linear-scaled vertical axis and the right panel with a log-scaled vertical axis. The inset for SCG 2 shows the total abundances of SCG 2 proteins with a log-scaled vertical axis. Error bars are standard errors. (D) The gene loci of the homeostatic core (SCG 1) proteins on the chromosome. Magenta dots are nodes (genes), and gray lines are edges (high cosine similarity relationships). We determined the gene loci based on ASM75055v1.46 (Howe et al., 2020).
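The construction in (A) and (B) can be illustrated with a small sketch: compute the cosine similarity between expression vectors, connect pairs above the 0.995 threshold, and collect groups. Treating each group as a connected component of the thresholded graph is an assumption made here for illustration, and the function names and toy data are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(P):
    """P: (n, m) abundances, n protein species by m conditions. Returns (n, n)."""
    U = P / np.linalg.norm(P, axis=1, keepdims=True)  # L2-normalize each protein
    return U @ U.T

def extract_groups(A, threshold=0.995):
    """Connected components of the graph with an edge wherever A > threshold.

    Proteins without any edge are dropped. Groups are sorted by size (largest first).
    """
    n = A.shape[0]
    adj = (A > threshold) & ~np.eye(n, dtype=bool)  # ignore self-similarity
    seen = np.zeros(n, dtype=bool)
    groups = []
    for start in range(n):
        if seen[start] or not adj[start].any():
            continue
        stack, component = [start], []
        seen[start] = True
        while stack:  # depth-first traversal of one component
            v = stack.pop()
            component.append(int(v))
            for w in np.flatnonzero(adj[v] & ~seen):
                seen[w] = True
                stack.append(int(w))
        groups.append(sorted(component))
    return sorted(groups, key=len, reverse=True)

# Toy data: proteins 0-2 share one profile, 3-4 share another, 5 is condition-specific
base = np.array([1.0, 2.0, 3.0, 4.0])
P = np.vstack([base, 2 * base, 5 * base, base[::-1], 2 * base[::-1], [1.0, 0.0, 0.0, 0.0]])
groups = extract_groups(cosine_similarity_matrix(P))
```

In the toy data, proteins whose abundances differ only by a constant factor across conditions have cosine similarity 1 and fall into the same group, which is the sense in which the groups conserve stoichiometry.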

Figure 5

A proteome structure characterized by global stoichiometry conservation relationships.

(A) Distributions of stoichiometry conservation centrality values for all the proteins (gray), the homeostatic core (SCG 1) proteins (magenta), and the proteins belonging to the other stoichiometrically conserved groups (SCGs) (cyan). (B) Correlation between stoichiometry conservation centrality and gene essentiality. The proportion of essential genes within each class of stoichiometry conservation ranking is shown. The list of essential genes was downloaded from EcoCyc (Keseler et al., 2017). (C) Correlation between stoichiometry conservation and evolutionary conservation. The strength of evolutionary conservation of each protein species was estimated by the number of orthologs found in the OrthoMCL species (Chen et al., 2006). The genes with more orthologs tend to have higher stoichiometry conservation centrality ($p = 3.42 \times 10^{- 14}$ by one-sided Brunner-Munzel test between the top 25% and the bottom 25% fractions of ortholog number ranking). Likewise, the genes with higher stoichiometry conservation centrality scores tend to have more orthologs ($p = 8.44 \times 10^{- 12}$ by one-sided Brunner-Munzel test, top 25%–bottom 25% comparison; $p$-values in the captions for (**F–I**) were evaluated with the same statistical test scheme). (**D–G**) Stoichiometry conservation analyses of human cell atlas transcriptome data of 15 fetal organs (Cao et al., 2020). The top gray histogram in (D) shows the distribution of stoichiometry conservation centrality values for all genes. The bottom histograms in (D) show the distribution for coding genes (yellow) and that for the other genes (cyan). (E) shows a correlation between the ratio of coding genes and stoichiometry conservation centrality calculated from the human cell atlas data. (F) shows a correlation between gene essentiality and stoichiometry conservation centrality calculated from the human cell atlas data. 
The essentiality of each human gene was quantified by CRISPR score, which is the fitness cost imposed by CRISPR-based inactivation of the gene in KBM7 chronic myelogenous leukemia cells (Wang et al., 2015). Genes with lower CRISPR score are regarded as more essential. The fraction with low CRISPR scores (i.e. high essentiality fraction) tends to have higher stoichiometry conservation centrality ( $p < 10^{- 15}$ ). The fraction with high centrality scores tends to be more essential ( $p < 10^{- 15}$ ). (G) shows a correlation between evolutionary conservation and stoichiometry conservation centrality based on the human cell atlas data. The gene fraction with many orthologs tends to have higher stoichiometry conservation centrality ( $p < 10^{- 15}$ ). The gene fraction with high centrality scores tends to have more orthologs ( $p < 10^{- 15}$ ). (H) and (I) Stoichiometry conservation analyses of genome-wide Perturb-seq data (Replogle et al., 2022). (H) shows a correlation between stoichiometry conservation centrality calculated from the Perturb-seq data and gene essentiality. The essentiality of each gene was quantified by the CRISPR score as in (F). The gene fraction with low CRISPR scores (i.e. high essentiality fraction) tends to have higher stoichiometry conservation centrality ( $p < 10^{- 15}$ ). The gene fraction with high centrality scores tends to be more essential ( $p < 10^{- 15}$ ). (I) shows a correlation between stoichiometry conservation based on the Perturb-seq data and evolutionary conservation of genes. The gene fraction with many orthologs tends to have higher stoichiometry conservation centrality ( $p < 10^{- 15}$ ). The gene fraction with high centrality scores tends to have more orthologs ( $p < 10^{- 15}$ ). (J) Representation of the proteomes as a graph. A node corresponds to a protein species, and the weight of an edge is taken as the cosine similarity between the $m$ -dimensional expression vectors of the two connected protein species. 
The $n \times n$ matrix $A$ can specify the whole graph. Note that the diagonal elements of $A$ are ones, which were introduced just for simplicity. (K) Cosine similarity LE (csLE) structure in a three-dimensional space. Each dot represents a different protein species and is color-coded on the basis of its stoichiometry conservation centrality value. We selected the axes considering the structural similarity to the Raman-based proteome structure in $Ω_{B}$ (see Figure 6). (L) The csLE structure in a three-dimensional space. The views from two different angles are shown. Each gray dot represents a different protein species. The proteins belonging to each SCG are indicated with distinct markers. Colors of the two-dimensional histograms in (C), (F), (G), (H), and (I) represent the height of each bar.

Figure 6

Raman-based proteome structure and its similarity to stoichiometry-based proteome structure.

(A) Proteome structure determined by Raman-proteome coefficients visualized in a three-dimensional space. The views from two different angles are shown. Each gray dot represents a protein species. The proteins belonging to each stoichiometrically conserved group (SCG) are indicated with distinct markers. We note that SCGs are defined without referring to Raman data (Figure 4). (**B–D**) Similarity among the distribution of linear discriminant analysis (LDA) Raman spectra (B), the proteome structure determined by Raman-proteome coefficients (C), and the proteome structure determined by stoichiometry conservation (D). (E) Mathematical relation between the coordinates of the proteins in $Ω_{B}$ (C) and $Ω_{LE}$ (D). The two conditions, one with $Θ$ (magenta) and the other between $𝒃_{0}$ and $𝒃_{0}^{est}$ (cyan), must hold for the similarity between the two proteome structures (yellow), as described in the gray box. $\overset{⋆}{\propto}$ denotes column-wise proportionality.

Figure 7

Proportionality between stoichiometry conservation centrality and expression generality.

(A) Relationships between stoichiometry conservation centrality ($d_{i}$) and expression generality ($g_{i}$). Each gray dot represents a protein species. The proteins belonging to each stoichiometrically conserved group (SCG) are indicated with distinct markers. The dashed lines are $y = n$, $x = 1$, and $x = \sqrt{m}$ ($n = 2058$, $m = 15$). The solid lines represent $y = \sqrt{\sum_{j = 1}^{n} d_{j} / m}\, x$ (see Section 2.2 in Appendix). The deviation of a point from the solid line is related to the growth rate under the condition where each protein is expressed the most. (B) The same plot as (A) in black and white. Overlaid red circles indicate proteins featured in (C). (C) Expression patterns of the proteins indicated by red circles in (B) across conditions. The condition differences are shown by the growth rate differences on the horizontal axes. The arrangement of the plots for the proteins corresponds to their relative positions in (B).
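The quantities plotted in (A) can be sketched numerically. The sketch assumes that $d_{i}$ is the row sum of the cosine similarity matrix (including the diagonal, which the caption of Figure 5J sets to one) and that $g_{i}$ is the $L^{1}$ norm/$L^{2}$ norm ratio of the expression vector, as stated in the caption of Appendix 1—figure 7. The data are synthetic and the function name is hypothetical.

```python
import numpy as np

def centrality_and_generality(P):
    """P: (n, m) non-negative abundances. Returns (d, g, U).

    d: row sums of the cosine similarity matrix (degree, diagonal included; assumed).
    g: L1 norm / L2 norm of each expression vector.
    U: L2-normalized expression vectors, one per row.
    """
    l2 = np.linalg.norm(P, axis=1)
    U = P / l2[:, None]
    d = (U @ U.T).sum(axis=1)
    g = np.abs(P).sum(axis=1) / l2
    return d, g, U

rng = np.random.default_rng(1)
n, m = 200, 15
P = rng.lognormal(size=(n, m))       # synthetic proteome, not real data
d, g, U = centrality_and_generality(P)

p_tot = U.sum(axis=0)                # sum of the L2-normalized expression vectors
slope = np.sqrt(d.sum() / m)         # slope of the solid line y = slope * x
```

Two identities hold exactly under these definitions: $d_{i}$ equals the inner product of the normalized expression vector with `p_tot`, and the squared norm of `p_tot` equals $\sum_{j} d_{j}$. If all elements of `p_tot` were equal, every point would lie on the solid line, so deviations from the line reflect how unevenly `p_tot` is distributed over conditions.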

Appendix 1—figure 1

Schematic illustration of the approach in this study.

Related to Figure 1. Raman spectra and gene expression profiles are both high-dimensional vectors and can be represented as points in high-dimensional spaces. Coarse-graining Raman spectra by dimensional reduction reveals condition-dependent differences in their global spectral patterns (see Figure 2). The dimension-reduced spectra were linked to and used to predict condition-dependent global gene expression profiles (see Figure 2), which implies that global changes in spectral patterns detect differences in cellular physiological states. The analysis of this linkage led us to discover a stoichiometry-conserving constraint on gene expression, which enabled us to represent gene expression profiles in a functionally relevant low-dimensional space (i; see also Figures 3–5). We then found a nontrivial correspondence between these low-dimensional Raman and gene expression spaces (ii; see also Figure 6). This correspondence provides an omics-level interpretation of global Raman spectral patterns and a quantitative constraint between expression generality and stoichiometry conservation centrality (ii; see also Figure 7, Appendix 1—figure 9).

Appendix 1—figure 2

Custom-built Raman microscope and analyses of *E. coli* Raman spectra.

Related to Figure 2. (A) Schematic diagram of the Raman microscope used in this study. (B) Representative Raman spectra from single *E. coli* cells. The fingerprint region of one spectrum is shown for each condition. (C) Linear superposition of Raman shifts. Each linear discriminant analysis (LDA) axis is a linear superposition of Raman shifts. These figures show the coefficients for LDA1 (left) and LDA2 (right). (D) Relationship between Raman LDA1 axis and growth rates. The horizontal axis represents Raman LDA1 axis. The vertical axis represents growth rates measured in Schmidt et al., 2016. Each point corresponds to the data for one condition. Pearson correlation coefficient is 0.81±0.09.

Appendix 1—figure 3

Estimation of proteomes from Raman spectra.

Related to Figure 2. Comparing the measured proteomes with those estimated from Raman spectra. The horizontal and vertical axes represent the estimated and measured proteomes, respectively. Proteins with negative estimated abundance are not shown in these figures. The conditions with the largest and the second largest numbers of proteins with negative estimated abundance were ‘stationary3days’ (666 proteins) and ‘LB’ (359 proteins). The conditions with the fewest and the second fewest negatively estimated proteins were ‘GlucosepH6’ (0 proteins) and ‘Xylose’ (7 proteins).

Appendix 1—figure 4

Comparison of stoichiometry conservation among Clusters of Orthologous Group (COG) classes.

Related to Figure 3. (**A and B**) Relations between protein abundance and constant terms of Raman-proteome coefficients. The horizontal axes are $b_{0}$ (constant terms), and the vertical axes are ${\hat{p}}_{i}^{⊤}$ (protein abundance). Dashed lines are the least squares regression lines with intercept zero for information storage and processing (ISP) COG class members. The average of $B_{- i}^{est}$ was used as an estimate of $B$ here. In (A), only ISP COG class members are shown for three representative conditions: ‘Galactose’, ‘Glucose’, and ‘GlycerolAA’. In (B), all proteins are shown for a representative condition, ‘GlycerolAA’. (C) Relations between protein abundance and growth rates of *E. coli* under 15 environmental conditions. We analyzed the absolute quantitative proteome data, growth rate data, and COG annotation reported by Schmidt et al., 2016. Lines represent different protein species. Error bars are standard errors. The top panel is for the Cellular Processes and Signaling COG class; the middle is for the ISP COG class; and the bottom is for the Metabolism COG class. (D) Relations between protein abundance and growth rates of three *E. coli* strains (BW25113, MG1655, and NCM3722) under two culture conditions. We again analyzed the data by Schmidt et al., 2016. Lines represent different protein species. Error bars are standard errors. (**E and F**) COG class-dependent expression pattern similarity of *E. coli* proteomes between conditions. The *E. coli* proteome data under the 15 different environmental conditions were analyzed. The similarity is evaluated by Pearson correlation coefficients of log expression levels in (E) and by cosine similarity in (F). We consider all the combinations of the 15 conditions. Thus, there are 105 data points for each COG class. The box-and-whisker plots summarize the distributions of the points. The lines inside the boxes denote the medians. 
The top and bottom edges of the boxes denote the 75th and 25th percentiles, respectively. Note that (E) and (F) are evaluations of the same data used in Figure 3B in the main text with different similarity indices. (G) COG class-dependent expression pattern similarity between different strains of *E. coli* (BW25113, MG1655, and NCM3722). The absolute quantitative proteome data and COG annotation were taken from Schmidt et al., 2016. The similarity was evaluated by cosine similarity. The data contain three strains. Thus, there are three points for each COG class. The top panel is for the ‘Glucose’ condition, and the bottom is for the ‘LB’ condition. (**H–J**) COG class-dependent expression pattern similarity in other organisms. (H) is for *M. tuberculosis* (data from Schubert et al., 2015; six environmental conditions [time points]), (I) for *M. bovis* (data from Schubert et al., 2015; six environmental conditions [time points]), and (J) for *S. cerevisiae* (data from Lahtvee et al., 2017; 10 environmental conditions). The COG annotations were taken from the December 2014 release of 2003-2014 COGs (Galperin et al., 2015) and Release 3 of ‘Mycobrowser’ (Kapopoulou et al., 2011) for (H) and (I) and from the Comprehensive Sake Yeast Genome Database (S288C strain) (Akao et al., 2011) for (J). The unit for protein abundance was fg/cell for (H) and (I) and fg in pg dry cell weight for (J).

Appendix 1—figure 5

Single-gene-level growth law in the homeostatic core.

Related to Figure 4. (A) Relationship between population growth rates and total abundance of SCG 1 (homeostatic core) proteins. Here, we analyzed the *E. coli* proteome data (Schmidt et al., 2016), focusing on the 15 conditions for which we obtained Raman data. The dashed line is the least squares regression line. (B) Scatterplots of log abundance of SCG 1 (homeostatic core) proteins. Here, the proteomes under three representative conditions, ‘LB’, ‘Glucose’, and ‘Galactose’, are compared with that under the standard condition ‘Glycerol’. Each colored line is the linear regression line with slope one for the points with the same color. The vertical line is $x = 0$. (C) Relationship between population growth rate and coefficient of determination of linear regression in (B). The vertical line represents the growth rate under the standard condition (‘Glycerol’). (D) Linear relationship between common abundance ratio and growth rates. The vertical axis represents $10^{Γ_{c}}$, where $Γ_{c}$ is the y-intercept of the corresponding regression line in (B) (see Section 3.1.2 in Appendix). The dashed line is the linear regression line. The horizontal line is $y = 1$, and the $x$ coordinate of the vertical line is the growth rate under the standard condition (‘Glycerol’). (E) The gene loci of the proteins belonging to the condition-specific stoichiometrically conserved groups (SCGs) on the chromosome (ASM75055v1.46; Howe et al., 2020). Colored dots are nodes (genes), and gray lines are edges (high cosine similarity relationships). The edge in the map of SCG 5 cannot be seen because the gene loci are clustered in close proximity within the same operon.
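The slope-one regression in (B) and the common abundance ratio $10^{Γ_{c}}$ in (D) can be illustrated with a short sketch. With the slope fixed to one, the least squares intercept is the mean difference of the log abundances. The coefficient of determination below uses the usual definition $1 - SS_{res}/SS_{tot}$, which is an assumption about how (C) was computed, and the abundances are made-up numbers.

```python
import numpy as np

def slope_one_fit(p_ref, p_cond):
    """Fit log10(p_cond) = log10(p_ref) + gamma with the slope fixed to one.

    Returns (gamma, r2): the least squares intercept and the coefficient of
    determination of the slope-one line.
    """
    x, y = np.log10(p_ref), np.log10(p_cond)
    gamma = np.mean(y - x)                        # least squares intercept for slope one
    ss_res = np.sum((y - x - gamma) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return gamma, 1.0 - ss_res / ss_tot

# Made-up abundances: every protein is 1.5 times more abundant than in the reference
p_ref = np.array([10.0, 200.0, 3000.0, 45000.0])
p_cond = 1.5 * p_ref
gamma, r2 = slope_one_fit(p_ref, p_cond)
ratio = 10 ** gamma                               # common abundance ratio
```

When all proteins in the group change by the same factor, the points fall exactly on a slope-one line and `ratio` recovers that factor.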

Appendix 1—figure 6

Functional relevance of stoichiometry conservation centrality.

Related to Figure 5. (A) Relationship between gene essentiality and stoichiometry conservation centrality in *E. coli*. The proportion of essential genes is plotted for each stoichiometry conservation centrality rank range. In this plot, we calculated stoichiometry conservation centrality based on the *E. coli* proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data. The list of essential genes was downloaded from EcoCyc (Keseler et al., 2017). (B) Relationship between gene essentiality and stoichiometry conservation centrality in *S. pombe*. We calculated stoichiometry conservation centrality based on the *S. pombe* transcriptome data reported in Kobayashi-Kirschvink et al., 2018. Only coding genes are considered in this plot, though stoichiometry conservation centrality values were calculated using both coding and non-coding genes. Gene classification is based on PomBase (Harris et al., 2022). Some bins do not reach 100% in sum because 11 coding genes in the *S. pombe* transcriptome data were not found in the current PomBase. (C) Relationship between ratio of coding genes and stoichiometry conservation centrality in the *S. pombe* transcriptome data. The coding/non-coding assignment is based on PomBase (Harris et al., 2022). (D) Correlation between stoichiometry conservation and evolutionary conservation. In this plot, we calculated stoichiometry conservation centrality based on the *E. coli* proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data. Colors represent the height of each bar. The distributions of stoichiometry conservation centrality were compared between the top 25% and the bottom 25% fractions in the number of orthologs rankings. The fraction with many orthologs tends to have higher stoichiometry conservation centrality (one-sided Brunner-Munzel test, $p = 7.84 \times 10^{- 15}$ ). 
The distributions of the number of orthologs were compared between the top 25% and the bottom 25% stoichiometry conservation centrality fractions. The high centrality fraction tends to have more orthologs (one-sided Brunner-Munzel test, $p = 1.46 \times 10^{- 11}$ ). Ortholog data were taken from OrthoMCL-DB (Chen et al., 2006). (**E–G**) Correlation between stoichiometry conservation and evolutionary conservation in *S. pombe*. We calculated stoichiometry conservation centrality based on the *S. pombe* transcriptome data reported in Kobayashi-Kirschvink et al., 2018. In (E), the result is shown by a two-dimensional histogram. Colors represent the height of each bar. The distributions of the number of orthologs were compared between the top 25% and the bottom 25% stoichiometry conservation centrality fractions. The high centrality fraction tends to have more orthologs (one-sided Brunner-Munzel test, $p = 0.00548$ ). The direct comparison between the two fractions is shown in (F). The distributions of stoichiometry conservation centrality were compared between the top 25% and the bottom 25% fractions in the number of orthologs rankings. The fraction with many orthologs tends to have higher stoichiometry conservation centrality (one-sided Brunner-Munzel test, $p = 0.00270$ ). The direct comparison between the two fractions is shown in (G). Ortholog data were taken from OrthoMCL-DB (Chen et al., 2006). (H) Applying principal component analysis (PCA) to $L^{2}$ -normalized proteomes. PCA (with mean centering) was applied to $L^{2}$ -normalized proteome data $[\begin{array}{lll} p_{1} / {‖ p_{1} ‖}_{2} & \dots & p_{n} / {‖ p_{n} ‖}_{2} \end{array}]$ . Here, we analyzed the *E. coli* proteome data under the 15 conditions for which we obtained Raman data. The left is a projection onto a two-dimensional space, and the right is a projection onto a three-dimensional space. 
The axes for visualization were selected by considering similarity to the cosine similarity LE (csLE) structure.
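The analysis in (H) can be sketched with a plain SVD-based PCA. The sketch assumes that each protein species is treated as one sample with $m$ features (its $L^{2}$-normalized abundances across conditions) and that mean centering is done per condition. The orientation and the choice of displayed axes in the actual analysis may differ, and the data here are synthetic.

```python
import numpy as np

def pca_l2_normalized(P, n_components=3):
    """PCA of L2-normalized expression vectors.

    P: (n, m) abundances, n protein species by m conditions.
    Returns (scores, explained): (n, n_components) coordinates and the
    fraction of variance explained by each returned component.
    """
    U = P / np.linalg.norm(P, axis=1, keepdims=True)   # L2-normalize each protein
    Uc = U - U.mean(axis=0)                            # mean centering per condition
    _, s, Vt = np.linalg.svd(Uc, full_matrices=False)
    scores = Uc @ Vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]

rng = np.random.default_rng(3)
P = rng.lognormal(size=(100, 15))                      # synthetic proteome
scores, explained = pca_l2_normalized(P)
```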

Appendix 1—figure 7

Distributions and constraints with respect to stoichiometry conservation centrality (degree).

Related to Figure 5 and Figure 7. (A) Comparison of degree (stoichiometry conservation centrality) distributions between original (yellow) and randomized (blue) *E. coli* proteome data. We created randomized proteome data by shuffling the expression levels across the protein species within each condition. We used the *E. coli* proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data. (B) Comparison of the $g_{j}$ - $d_{j}$ relationships between original (yellow) and randomized data (blue). The horizontal axis is expression generality score ( $g_{j} =$ $L^{1}$ norm/ $L^{2}$ norm), and the vertical axis is stoichiometry conservation centrality ( $d_{j}$ : degree). Each dot represents a protein species. The dashed lines are $y = n$ , $x = 1, \sqrt{m}$ ( $n = 2058, m = 15$ ). The solid lines are $y = \sqrt{\sum_{i} d_{i} / m} x$ . (**C–H**) Degree (stoichiometry conservation centrality) distributions for additional datasets. Yellow histograms are for the original data, and blue histograms are for the randomized data. (C) For the proteomes of three *E. coli* strains (BW25113, MG1655, and NCM3722) in LB (Schmidt et al., 2016); (D) for the proteomes of the three *E. coli* strains in M9 Glucose (Schmidt et al., 2016); (E) for the proteomes of *M. tuberculosis* (Schubert et al., 2015); (F) for the proteomes of *M. bovis* (Schubert et al., 2015); (G) for the proteomes of *S. cerevisiae* (Lahtvee et al., 2017); and (H) for the transcriptomes of *S. pombe* (Kobayashi-Kirschvink et al., 2018). (**I–N**) $g_{j}$ - $d_{j}$ relationships for additional datasets. Each gray dot represents a protein species. The proteins belonging to the homeostatic core in each dataset are shown in magenta; those belonging to condition-specific stoichiometrically conserved groups (SCGs) are indicated in different colors in each plot. 
See the captions of Appendix 1—figures 11 and 13 for the cosine similarity threshold to specify the homeostatic core and the condition-specific SCGs in each dataset. The dashed lines are $y = n$, $x = 1$, and $x = \sqrt{m}$. The solid lines through the origins are $y = \sqrt{\sum_{i = 1}^{n} d_{i} / m}\, x$. (I) for the proteomes of the three *E. coli* strains in LB (Schmidt et al., 2016); (J) for the proteomes of the three *E. coli* strains in M9 Glucose (Schmidt et al., 2016); (K) for the proteomes of *M. tuberculosis* (Schubert et al., 2015); (L) for the proteomes of *M. bovis* (Schubert et al., 2015); (M) for the proteomes of *S. cerevisiae* (Lahtvee et al., 2017); and (N) for the transcriptomes of *S. pombe* (Kobayashi-Kirschvink et al., 2018).
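The randomization described in (A), shuffling expression levels across protein species within each condition, can be sketched as below. The degree is again taken as the row sum of the cosine similarity matrix, which is an assumption, and the rank-one synthetic proteome is chosen only to make the effect of shuffling obvious.

```python
import numpy as np

def shuffle_within_conditions(P, rng):
    """P: (n, m). Permute protein labels independently within each condition (column)."""
    return rng.permuted(P, axis=0)

def degree(P):
    """Row sums of the cosine similarity matrix (diagonal included; assumed definition)."""
    U = P / np.linalg.norm(P, axis=1, keepdims=True)
    return (U @ U.T).sum(axis=1)

rng = np.random.default_rng(4)
n, m = 100, 5
scale = rng.lognormal(size=n)            # per-protein abundance scale
profile = rng.lognormal(size=m)          # shared condition profile
P = np.outer(scale, profile)             # perfectly conserved stoichiometry
P_shuffled = shuffle_within_conditions(P, rng)

d_original = degree(P)
d_shuffled = degree(P_shuffled)
```

Shuffling keeps the set of abundance values in each condition but breaks the coherence between conditions, so the degrees of the randomized data fall below those of the original data.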

Appendix 1—figure 8

Properties of normalized expression vectors.

Related to Figure 7. (**A and B**) Schematic explanation for the interpretation of the $L^{1}$ norm/ $L^{2}$ norm ratio of expression vectors as an index of expression generality. (A) is a two-dimensional case, and (B) is a three-dimensional case. The inset in (A) schematically explains $L^{1}$ norm and $L^{2}$ norm of an expression vector. See ‘Interpretation of $L^{1}$ norm/ $L^{2}$ norm ratio of an expression vector as a quantitative measure of expression generality’ in Materials and methods for details. (C) Schematic explanation for deviations of points from the proportionality line in the $g_{j}$ - $d_{j}$ plots. Here, we consider four condition-specific protein species a, b, c, and d labeled in the descending order of growth rates under the conditions accompanying their expression. Note that their $L^{1}$ norm/ $L^{2}$ norm ratios are all one on the horizontal axis. One can show that the degree (stoichiometry conservation centrality) $d_{j}$ is proportional to the inner product of $L^{2}$ -normalized expression vector $p_{j} / ‖ p_{j} ‖_{2}$ and the expression norm vector ${\tilde{p}}_{tot} / ‖ {\tilde{p}}_{tot} ‖_{2}$ (see Equation 2.147 in Section 2.2.2). Since the elements of ${\tilde{p}}_{tot} / ‖ {\tilde{p}}_{tot} ‖_{2}$ increase approximately linearly with growth rates of the corresponding conditions (see D), the degrees (stoichiometry conservation centrality values) decrease from a to d in the order of growth rates. (**D–F**) Correlation between elements of ${\tilde{p}}_{tot}$ and population growth rates. The vertical axis represents the elements of ${\tilde{p}}_{tot} / ‖ {\tilde{p}}_{tot} ‖_{2}$ , and the horizontal axis represents the population growth rates. The dashed lines are $y = 1 / \sqrt{m}$ . (D) is the result from the analysis of the *E. coli* proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data ( $m = 15$ ). (E) is the result from the analysis of the proteome data of three strains of *E. coli* (BW25113, MG1655, and NCM3722) under ‘LB’ and ‘Glucose’ conditions ( $m = 6$ ) (Schmidt et al., 2016). (F) is the result from the analysis of the proteome data of *S. cerevisiae* under 10 different conditions ( $m = 10$ ) (Lahtvee et al., 2017). The cells were cultured in a chemostat with the same dilution rate. The numbers of analyzable protein species and the numbers of conditions were different between (D) and (E). Thus, the values of the vertical axes cannot be compared directly between them.
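
The two quantities discussed in panels (A–C) can be written out as a short worked derivation. This is only a sketch: it assumes non-negative, non-zero expression vectors $p_{j} \in \mathbb{R}^{m}$, and it assumes that the sum of $L^{2}$-normalized expression vectors is what the caption denotes by ${\tilde{p}}_{tot}$ up to a scale factor. The authors' own definitions are in Section 2.2.2.

```latex
% Bounds on the expression generality g_j (L^1 norm / L^2 norm ratio).
% Lower bound: \|p_j\|_1^2 = \|p_j\|_2^2 + (non-negative cross terms).
% Upper bound: Cauchy-Schwarz applied to p_j and the all-ones vector.
1 \;\le\; g_{j} = \frac{\| p_{j} \|_{1}}{\| p_{j} \|_{2}} \;\le\; \sqrt{m}
% g_j = 1        when p_j has a single non-zero element (condition-specific)
% g_j = \sqrt{m} when all elements of p_j are equal (uniform expression)

% Centrality as an inner product with a summed, normalized vector.
d_{j} = \sum_{i = 1}^{n} \cos \theta_{p_{i} p_{j}}
      = \frac{p_{j}}{\| p_{j} \|_{2}} \cdot
        \sum_{i = 1}^{n} \frac{p_{i}}{\| p_{i} \|_{2}}
```

Under these assumptions, a condition-specific protein has $g_{j} = 1$, and its $d_{j}$ simply picks out the element of the summed vector for its condition, which is consistent with the ordering of a to d described in (C).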

Appendix 1—figure 9

Mathematical analyses of the main Raman-proteome data.

Related to Figure 6. Proteomes of *E. coli* under 15 conditions (Schmidt et al., 2016) and corresponding Raman data we measured in this study were analyzed in this figure. (A) Visual comparison of the unit matrix $I$ , the orthogonal matrix $Θ$ obtained from the data, and a random orthogonal matrix. Height of each bar indicates the value of each element. Colors represent the height of each bar. For clarifying the position of each element, a component form of matrix $Θ$ is shown in the middle ( $m = 15$ ). For $Θ$ (middle) and a random orthogonal matrix (right), the original matrices are displayed in the upper row, and matrices whose elements are the absolute values of the corresponding elements of the original matrices are displayed in the lower row. (In this figure, $| Θ |$ represents a matrix of which the $(i, j)$ element is the absolute value of the $(i, j)$ element of $Θ$ .) (B) Representation of matrices as scatterplots. See ‘Evaluating similarity between orthogonal matrix $Θ$ and identity matrix’ in Materials and methods for details. (C) Comparison of the unit matrix $I$ , the orthogonal matrix $Θ$ obtained from the data, and random orthogonal matrices $Q$ by Pearson correlation coefficients. Pearson correlation coefficient of the element-wise squared matrix of each matrix can be regarded as a measure of closeness to the identity matrix ( $\circ$ represents element-wise multiplication). The probability of finding a random orthogonal matrix $Q$ with Pearson correlation coefficient greater than the Pearson correlation coefficient of $Θ$ was $< 1 \times 10^{- 5}$ (no occurrence in 10⁵ samplings). See ‘Evaluating similarity between orthogonal matrix $Θ$ and identity matrix’ in Materials and methods for details. (D) Comparison of magnitudes of off-diagonal elements among the unit matrix $I$ , the orthogonal matrix $Θ$ obtained from the data, and random orthogonal matrices $Q$ . 
The lattice on the top explains the numbering of $k$ -diagonals ( $- m < k < m$ , $m = 15$ ). In the lattices on the bottom, black color indicates areas in which the elements are squared and summed at the corresponding steps (i.e. areas represented by $x$ in the graph). The sum of the squared values in each step is shown in the middle graph. Error bars of the random matrix line are standard errors of 100 samplings. See ‘Evaluating similarity between orthogonal matrix $Θ$ and identity matrix’ in Materials and methods for details. (E) Comparison of magnitudes of elements of leading principal submatrices among the unit matrix $I$ , the orthogonal matrix $Θ$ obtained from the data, and random orthogonal matrices $Q$ . In the lattices on the bottom, black color indicates an area in which elements are squared and summed at the corresponding step (i.e. an area represented by $x$ in the graphs). The sum of the squared values in each area is shown in the top graph. The results shown in the top graph are converted into ratios to the identity matrix $I$ and are shown in the middle graph. Error bars of the random matrix line are standard errors of 100 samplings. See ‘Evaluating similarity between orthogonal matrix $Θ$ and identity matrix’ in Materials and methods for details. (F) Comparison of $\sqrt{m} \mathrm{diag} (b_{0})$ and $\mathrm{diag} (b_{0}^{est})$ . $x$ axis represents $\sqrt{m} b_{0}$ and $y$ axis represents $b_{0}^{est}$ . The dashed line indicates $y = x$ . (G) Comparison between $B_{E}^{norm}$ (left) and $B_{E}^{est, norm}$ (right). Note that while $B_{E}^{norm}$ figure (left) is the same as Figure 6C, the right figure shows $B_{E}^{est, norm} = {(\sum_{i = 1}^{n} d_{i})}^{1 / 2} {\tilde{V}}_{rw}$ , where ${\tilde{V}}_{rw}$ is shown in Figure 6D.
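
The comparison in (C) can be illustrated with a small sketch. It rests on one possible reading of the caption, which is an assumption here: the element-wise squared matrix is treated as a set of weights over (row, column) index pairs, and the weighted Pearson correlation between row and column indices is the closeness score (it equals 1 for the identity matrix). The function names, the Gram-Schmidt sampler, and the sample counts are illustrative choices and may differ from the procedure in Materials and methods.

```python
import math
import random


def random_orthogonal(m, rng):
    """Draw an m x m orthogonal matrix via Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < m:
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for q in rows:  # remove components along rows already accepted
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def index_correlation(mat):
    """Weighted Pearson correlation between row and column indices,
    using the squared elements of `mat` as weights (assumed reading)."""
    m = len(mat)
    cells = [(i, j, mat[i][j] ** 2) for i in range(m) for j in range(m)]
    total = sum(w for _, _, w in cells)
    mean_i = sum(i * w for i, _, w in cells) / total
    mean_j = sum(j * w for _, j, w in cells) / total
    cov = sum(w * (i - mean_i) * (j - mean_j) for i, j, w in cells) / total
    var_i = sum(w * (i - mean_i) ** 2 for i, _, w in cells) / total
    var_j = sum(w * (j - mean_j) ** 2 for _, j, w in cells) / total
    return cov / math.sqrt(var_i * var_j)


def exceedance_fraction(observed, m, n_samples, seed=0):
    """Fraction of random orthogonal matrices scoring above `observed`."""
    rng = random.Random(seed)
    hits = sum(index_correlation(random_orthogonal(m, rng)) > observed
               for _ in range(n_samples))
    return hits / n_samples
```

In use, `index_correlation` would be evaluated once for the data-derived matrix and the result passed to `exceedance_fraction` with `m = 15`. When no random matrix exceeds the observed score, the fraction can only be reported as below one over the number of samples.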

Appendix 1—figure 10

Orthant correspondences between Raman spectra in linear discriminant analysis (LDA) space and condition-specific proteins in Raman-proteome coefficient proteome space.

Related to Figure 6. Using the main Raman and proteome data of *E. coli* under the 15 conditions, we examine the orthant correspondence between Raman spectra in the LDA space and condition-specific proteins in the Raman-proteome coefficient proteome space $Ω_{B}$ . Here, we focus on two proteins PaaE and AcrR. (A) Expression patterns of PaaE (left) and AcrR (right) across conditions. Error bars are standard errors. PaaE is expressed under the ‘LB’ condition in a condition-specific manner, whereas AcrR is expressed at high levels not only under ‘LB’ condition but also under several other conditions. (B) Positions of PaaE and AcrR in the Raman-proteome coefficient-based proteome space $Ω_{B}$ . (C) Verification of orthant correspondence. We verified the orthant correspondence described by Equation 2.76. We multiplied both sides of Equation 2.76 by ${(Σ_{R_{E}}^{norm})}^{- 1}$ , and the elements of the vectors of both sides were compared by scatterplots. The horizontal axes are related to the coordinates in the Raman LDA space; the vertical axes are related to the coordinates in the Raman-proteome coefficient proteome space. The dashed lines are $y = x$ . The nearly perfect agreement of the elements confirms the orthant correspondence for the condition-specific protein PaaE (left). Deviations from the diagonal agreement line are found for AcrR (right).

Appendix 1—figure 11

Stoichiometry-based omics structures and their correspondences to Raman-based omics structures for additional datasets.

Related to Figures 4–6. This figure summarizes the results on omics structures characterized by stoichiometry conservation relations and their correspondences to those characterized by Raman-omics relations for additional datasets. (**A–E**) show the results from the analyses of the Raman and proteome data of three *E. coli* strains (BW25113, MG1655, and NCM3722) in LB; (**F–J**) from the analyses of the Raman and proteome data of the three *E. coli* strains in M9 Glucose; and (**K–O**) from the analyses of the Raman and transcriptome data of *S. pombe* under 10 conditions. We used the *E. coli* proteome data reported in Schmidt et al., 2016, and the *S. pombe* transcriptome data reported in Kobayashi-Kirschvink et al., 2018, in the analyses. (A), (F), and (K) show distributions of omics components in cosine similarity LE (csLE) space. Stoichiometry conservation centrality of each component is indicated by color. (B), (G), and (L) show expression patterns of representative condition-specific omics components indicated in the previous figures of omics structures in the csLE spaces. Error bars are standard errors in (B) and (G), and maximum-minimum ranges (two replicates) in (L). (C), (H), and (M) show positions of averaged cellular Raman spectra under different conditions in the linear discriminant analysis (LDA) spaces. (D), (I), and (N) show omics structures in the spaces specified by the Raman-omics coefficients with the homeostatic cores and condition-specific stoichiometrically conserved groups (SCGs) indicated by colored points. (E), (J), and (O) show the omics structures in the csLE omics spaces with the homeostatic cores and condition-specific SCGs indicated by colored points. Columns $v_{rw, 1}$ (the eigenvector corresponding to $L_{rw}$ ’s smallest eigenvalue except for zero) and $v_{rw, 2}$ (the eigenvector corresponding to $L_{rw}$ ’s second smallest eigenvalue except for zero) are shown. 
We used the cosine similarity thresholds of 0.99993 to specify SCGs both for the three *E. coli* strains under LB data (D and E) and for the three *E. coli* strains under M9 Glucose data (I and J), and 0.9967 for the *S. pombe* transcriptome data (N and O).

Appendix 1—figure 12

Analyses of the mathematical relation connecting two types of omics spaces.

Related to Figure 6. This figure shows the analyses of the mathematical relation that connects coordinates of omics components in the two types of omics spaces (see Figure 6E and Section 2 in Appendix) using additional datasets. (**A–F**) show the results from the analyses of the Raman and proteome data of three *E. coli* strains (BW25113, MG1655, and NCM3722) in LB; (**G–L**) from the analyses of the Raman and proteome data of the three *E. coli* strains in M9 Glucose; and (**M–R**) from the analyses of the Raman and transcriptome data of *S. pombe* under 10 conditions. We used the *E. coli* proteome data reported in Schmidt et al., 2016, and the *S. pombe* transcriptome data reported in Kobayashi-Kirschvink et al., 2018, in the analyses. See the caption of Appendix 1—figure 9 for the explanation of each panel. The stoichiometrically conserved groups (SCGs) in (F), (L), and (R) are the same as in Appendix 1—figure 11. The probability of finding a random orthogonal matrix $Q$ with Pearson correlation coefficient greater than the Pearson correlation coefficient of $Θ$ was 0.022 in (B), 0.013 in (H), and $< 1 \times 10^{- 5}$ (no occurrence in 10⁵ samplings) in (N).

Appendix 1—figure 13

Stoichiometry-based proteome structures for additional datasets.

Related to Figures 4 and 5. This figure shows proteome structures in the cosine similarity LE (csLE) proteome spaces for additional datasets. (**A–C**) show the results from the analyses of the proteome data of *M. tuberculosis* H37Rv under gradual changes in oxygen levels (Schubert et al., 2015); (**D–F**) show the results from the analyses of the proteome data of *M. bovis* BCG under gradual changes in oxygen levels (Schubert et al., 2015); and (**G–I**) show the results from the analyses of the proteome data of *S. cerevisiae* under 10 conditions in chemostat with the same dilution rate (Lahtvee et al., 2017). (A), (D), and (G) show the proteome structures in the csLE spaces. The thresholds used to specify the stoichiometrically conserved groups (SCGs) were 0.99965 for (A), 0.9997 for (D), and 0.9989 for (G). (B), (E), and (H) show the same proteome structures as in the previous panels, but with stoichiometry conservation centrality of each protein species indicated by the color. (C), (F), and (I) show expression patterns of representative proteins indicated by the red circles in the previous panels. Error bars in (C) are standard errors.

Appendix 1—figure 14

Dependence of low-dimensional correspondence between Raman spectra and proteomes on the number of conditions.

Related to Figure 6. The dependence of the low-dimensional correspondence between Raman spectra and proteomes on the number of analyzed conditions was systematically investigated by evaluating the similarity of the orthogonal matrix $Θ$ to the identity matrix for all subsampled condition sets. Proteomes of *E. coli* under 15 conditions (Schmidt et al., 2016) and corresponding Raman data we measured in this study were analyzed in this figure. (A) The relationship between the number of conditions and the probability of obtaining higher level of low-dimensional correspondence than that of experimental data by chance. This probability is calculated as the probability of finding a random orthogonal matrix with Pearson correlation coefficient greater than the Pearson correlation coefficient of $Θ$ by creating 10⁴ random orthogonal matrices. See ‘Evaluating similarity between orthogonal matrix $Θ$ and identity matrix’ in Materials and methods and Appendix 1—figure 9 for details of the evaluation method. Each green square corresponds to one subsample, and each short horizontal black line represents the median of all the $\binom{15}{x}$ combinations of conditions (i.e. $\binom{15}{x}$ green squares) for each subsample size $x$ . The blue dashed line indicates the detection limit (i.e. one over the number of generated random orthogonal matrices). The non-subsampled case (i.e. the case with all 15 conditions) in this figure corresponds to Appendix 1—figure 9C. (B) Visual comparison of $Θ$ , $B_{E}^{norm}$ and $B_{E}^{est, norm}$ for six representative subsamples indicated in (A). As in Appendix 1—figure 9A, $Θ$ is visualized using $| Θ |$ , whose element is the absolute value of the corresponding element of $Θ$ , and height of each bar in the figures of $| Θ |$ indicates the value of each element of $| Θ |$ . Colors reflect the height of each bar. Spaces created with columns of $B_{E}^{norm}$ and $B_{E}^{est, norm}$ are $Ω_{B}$ and $Ω_{LE}$ , respectively. 
As $Θ$ deviates from the identity matrix from the cases $α$ and $β$ to the case of $ϵ$ , the low-dimensional correspondence between $Ω_{B}$ and $Ω_{LE}$ collapses naturally. Since the case $ζ$ is the non-subsampled case, the figure of $| Θ |$ is the same as Appendix 1—figure 9A, and those of $B_{E}^{norm}$ and $B_{E}^{est, norm}$ are the same as Appendix 1—figure 9G. Note that the figure of $Ω_{B}$ of the case $ζ$ is also exactly the same as Figure 6C, and that of $Ω_{LE}$ of the case $ζ$ is equal to Figure 6D up to a factor of ${(\sum_{i = 1}^{n} d_{i})}^{1 / 2}$ . The stoichiometrically conserved groups (SCGs) shown in this figure were defined in the analysis of the proteomes of all the 15 conditions (Figure 4C).
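
The bookkeeping behind panel (A) can be sketched as follows: enumerate every subset of conditions of a given size, score each subset, and take the median per subset size. The `score` callable is a hypothetical placeholder for the per-subsample probability described in the caption; only the counts (15 conditions, 10⁴ random matrices) are taken from the text.

```python
from itertools import combinations
from math import comb
from statistics import median

N_CONDITIONS = 15           # conditions in the main dataset (from the caption)
N_RANDOM = 10 ** 4          # random orthogonal matrices per subsample
DETECTION_LIMIT = 1 / N_RANDOM  # smallest non-zero probability that can be resolved


def median_over_subsamples(score, n_conditions, x):
    """Median of score(subset) over all size-x subsets of condition indices.

    `score` is a hypothetical callable that maps a tuple of condition
    indices to a number (e.g. the chance probability for that subsample).
    """
    subsets = combinations(range(n_conditions), x)
    return median(score(subset) for subset in subsets)


def n_subsamples(x):
    """Number of green squares expected at subsample size x."""
    return comb(N_CONDITIONS, x)
```

Any probability estimated as zero occurrences out of `N_RANDOM` draws would be reported as lying below `DETECTION_LIMIT` rather than as exactly zero.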

Tables

Table 1

List of scalars, vectors, and matrices in the main text.

Scalars, vectors, and matrices in the main text are listed with their sizes and descriptions. $m$ is the number of conditions, and $n$ is the number of protein species. ( $m = 15$ and $n = 2058$ in the main text.) Note that the notation summarized in this table differs in some respects from that in Materials and methods and Appendix.

Symbol	Size (#rows × #columns)	Description
${\hat{r}}_{j}$ $(j = 1, \dots, m)$	$(m - 1) \times 1$ (vector)	Mean LDA Raman profile of single cells under condition $j$
${\hat{p}}_{j}$ $(j = 1, \dots, m)$	$n \times 1$ (vector)	Proteome profile of cell population under condition $j$
$B$ $= [\begin{matrix} b_{0} & \dots & b_{m - 1} \end{matrix}]$ $= (b_{i k})_{1 \leq i \leq n, 0 \leq k \leq m - 1}$	$n \times m$	Set of condition-independent coefficients that linearly connect ${\hat{r}}_{j}$ and ${\hat{p}}_{j}$ for all conditions $j$ (Equation 1)
$p_{i}$ $(i = 1, \dots, n)$	$m \times 1$ (vector)	Expression levels of protein species $i$ across $m$ conditions
$\cos θ_{p_{i} p_{j}}$ $= (p_{i} / ‖ p_{i} ‖_{2}) \cdot (p_{j} / ‖ p_{j} ‖_{2})$ $(i, j = 1, \dots, n)$	$1 \times 1$ (scalar)	Stoichiometry (abundance ratio) conservation strength between two protein species $i$ and $j$ (Figure 4A)
$A = {(\cos θ_{p_{i} p_{j}})}_{1 \leq i, j \leq n}$	$n \times n$	Set of stoichiometry conservation strengths between all pairs of protein species (Figure 5J)
$d_{i} = \sum_{j = 1}^{n} \cos θ_{p_{i} p_{j}}$ $(i = 1, \dots, n)$	$1 \times 1$ (scalar)	Stoichiometry conservation centrality of protein species $i$
$g_{i} = ‖ p_{i} ‖_{1} / ‖ p_{i} ‖_{2}$ $(i = 1, \dots, n)$	$1 \times 1$ (scalar)	Expression generality of protein species $i$
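
The last four rows of the table can be illustrated with a minimal sketch that computes the cosine similarity, the centrality $d_{i}$, and the generality $g_{i}$ directly from their definitions. The expression vectors below are made-up toy numbers with $m = 3$, and the function names are illustrative rather than taken from the authors' code.

```python
import math


def l1_norm(v):
    return sum(abs(a) for a in v)


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


def cosine_similarity(p, q):
    """Stoichiometry conservation strength between two expression vectors."""
    return sum(a * b for a, b in zip(p, q)) / (l2_norm(p) * l2_norm(q))


def centrality(profiles):
    """d_i: sum of cosine similarities of species i to all species (self included)."""
    return [sum(cosine_similarity(p, q) for q in profiles) for p in profiles]


def generality(p):
    """g_i: L1 norm divided by L2 norm of the expression vector."""
    return l1_norm(p) / l2_norm(p)


# Toy expression vectors across m = 3 conditions (made-up values).
profiles = [
    [1.0, 2.0, 3.0],  # species 0
    [2.0, 4.0, 6.0],  # constant abundance ratio to species 0
    [5.0, 0.0, 0.0],  # expressed under one condition only
    [1.0, 1.0, 1.0],  # expressed equally under all conditions
]
```

Here species 0 and 1 keep a fixed abundance ratio, so their cosine similarity is 1 up to rounding and their centralities coincide. The condition-specific species has the minimum generality of 1, and the uniformly expressed one reaches the maximum of $\sqrt{m}$.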

Key resources table

Reagent type (species) or resource	Designation	Source or reference
Chemical compound, drug	Difco LB Broth, Miller (Luria-Bertani)	Becton, Dickinson and Company
Chemical compound, drug	Bacto Yeast Extract	Becton, Dickinson and Company
Chemical compound, drug	Bacto Tryptone	Becton, Dickinson and Company
Chemical compound, drug	Sodium Chloride	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Disodium Hydrogenphosphate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Potassium Dihydrogenphosphate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Ammonium Sulfate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Zinc Sulfate Heptahydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Copper(II) Chloride Dihydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Manganese(II) Sulfate Pentahydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Cobalt(II) Chloride Hexahydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Calcium Chloride Dihydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Magnesium Sulfate Heptahydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Thiamin Hydrochloride	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Iron(III) Chloride Hexahydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Sodium Acetate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Disodium Fumarate	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	D-Galactose	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	D-Glucose	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Glycerol	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	D-Fructose	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	D-Mannose	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	D-Xylose	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Alanine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Asparagine Monohydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Cysteine	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	L-Glutamic acid	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Glutamine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Glycine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Histidine	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	L-Isoleucine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Phenylalanine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Proline	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Serine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Adenine	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	L-Arginine	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	L-Aspartic acid	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Leucine	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	L-Lysine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Methionine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Threonine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Tryptophan	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Tyrosine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Valine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Uracil	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	8 mol/L Sodium Hydroxide Solution	Wako Pure Chemical Industries, Ltd., FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	35–37% (mass/mass) Hydrochloric Acid	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	0.1 mol/L Hydrochloric Acid	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Agar	Wako Pure Chemical Industries, Ltd., FUJIFILM Wako Pure Chemical Corporation
Strain, strain background (Escherichia coli)	BW25113	Wakamoto Laboratory stock
Strain, strain background (Escherichia coli)	MG1655	Wakamoto Laboratory stock
Strain, strain background (Escherichia coli)	NCM3722	Coli Genetic Stock Center

Appendix 1—table 1

List of culture conditions.

M9 m.m. and a.a. in this table are the abbreviations for M9 minimal media and amino acids, respectively.

Phase	Overview of composition	Temperature	pH	Name in this paper
Exponential	M9 m.m. + acetate	37°C	7	Acetate
	M9 m.m. + fructose			Fructose
	M9 m.m. + fumarate			Fumarate
	M9 m.m. + galactose			Galactose
	M9 m.m. + glucose			Glucose
	M9 m.m. + glucose	42°C		Glucose42C
	M9 m.m. + glucose	37°C	6	GlucosepH6
	M9 m.m. + glycerol		7	Glycerol
	M9 m.m. + glycerol + a.a.			GlycerolAA
	M9 m.m. + glucose + NaCl			OsmoticStressGlucose
	M9 m.m. + mannose			Mannose
	M9 m.m. + xylose			Xylose
	LB			LB
Stationary for 1 day	M9 m.m. + glucose			stationary1day
Stationary for 3 days	M9 m.m. + glucose			stationary3days

Appendix 1—table 2

Evaluation of the overall estimation error with various distance measures (the case where LDA1 to LDA4 axes were used).

The sum of estimation errors $\sum_{i} \mathrm{dist} ({\hat{p}}_{i}, {\hat{p}}_{i}^{est})$ was calculated, and a permutation test (10⁵ permutations) was conducted. In this table, LDA1 to LDA4 axes were used. $\bar{x}$ represents a vector all of whose elements are the mean of all elements of $x$ . $x_{j}$ is the $j$ -th element of $x$ . $\mathrm{median}_{j} x_{j}$ represents the median of scalars $x_{j}$ .

Metric	Definition of $\mathrm{dist} (x, y)$	$\sum_{i} \mathrm{dist} ({\hat{p}}_{i}, {\hat{p}}_{i}^{est})$	$p$ -value
Square of $L^{2}$ norm (PRESS)	$‖ x - y ‖_{2}^{2} = \sum_{j} (x - y)_{j}^{2}$	2.34 × 10³	0.00005
$L^{1}$ norm	$‖ x - y ‖_{1} = \sum_{j} | (x - y)_{j} |$	1.40 × 10³	0.00002
Cosine distance	$1 - \frac{x \cdot y}{‖ x ‖_{2} ‖ y ‖_{2}}$	1.52	0.0014
1 – Pearson correlation coefficient	$1 - \frac{(x - \bar{x}) \cdot (y - \bar{y})}{‖ x - \bar{x} ‖_{2} ‖ y - \bar{y} ‖_{2}}$	1.57	0.0012
Median of relative error	$\mathrm{median}_{j} \frac{| (x - y)_{j} |}{x_{j} + 1}$	0.0536	0.00022
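
The five definitions of $\mathrm{dist}(x, y)$ in the table translate directly into code. The following is a minimal sketch with illustrative function names; it covers only the distance measures and their sum over conditions, not the permutation test, whose details are not given in this table.

```python
import math
from statistics import median


def squared_l2(x, y):
    """Square of the L2 norm of x - y (the PRESS contribution)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def l1_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_distance(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def pearson_distance(x, y):
    """1 - Pearson correlation: cosine distance after mean-centering."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_distance([a - mean_x for a in x], [b - mean_y for b in y])


def median_relative_error(x, y):
    """Median over j of |x_j - y_j| / (x_j + 1), with x the measured vector."""
    return median(abs(a - b) / (a + 1.0) for a, b in zip(x, y))


def total_error(dist, measured, estimated):
    """Sum of dist over conditions, as in the third column of the table."""
    return sum(dist(p, p_est) for p, p_est in zip(measured, estimated))
```

For example, `total_error(squared_l2, measured, estimated)` would reproduce the PRESS-type sum if `measured` and `estimated` hold one proteome vector per condition.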

Appendix 1—table 3

Evaluation of the overall estimation error with various distance measures (the case where all the 14 LDA axes were used).

The results obtained by using all the 14 LDA axes are presented. See Appendix 1—table 2 for notations. Note that the system is underdetermined in this case; thus, we adopted the minimum-norm solution from among all least-squares solutions.

Metric	Definition of $\mathrm{dist} (x, y)$	$\sum_{i} \mathrm{dist} ({\hat{p}}_{i}, {\hat{p}}_{i}^{est})$	$p$ -value
Square of $L^{2}$ norm (PRESS)	$‖ x - y ‖_{2}^{2} = \sum_{j} (x - y)_{j}^{2}$	1.63 × 10³	0.0019
$L^{1}$ norm	$‖ x - y ‖_{1} = \sum_{j} | (x - y)_{j} |$	1.19 × 10³	0.00066
Cosine distance	$1 - \frac{x \cdot y}{‖ x ‖_{2} ‖ y ‖_{2}}$	1.18	0.0879
1 – Pearson correlation coefficient	$1 - \frac{(x - \bar{x}) \cdot (y - \bar{y})}{‖ x - \bar{x} ‖_{2} ‖ y - \bar{y} ‖_{2}}$	1.23	0.085
Median of relative error	$\mathrm{median}_{j} \frac{| (x - y)_{j} |}{x_{j} + 1}$	0.0418	0.00082
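
The minimum-norm choice mentioned in the legend of this table can be stated in generic form. The symbols $M$, $w$, and $y$ below are placeholders introduced for this sketch and are not the paper's notation: $M$ stands for a coefficient matrix with fewer rows than columns, $w$ for the unknowns, and $y$ for the observations.

```latex
% Underdetermined least-squares problem: many w attain the minimum residual.
\min_{w} \; \| M w - y \|_{2}^{2}

% Among all minimizers, the one with the smallest L^2 norm is given by the
% Moore-Penrose pseudoinverse M^{+}:
w^{\ast} = M^{+} y

% If M has full row rank, the pseudoinverse has the closed form
M^{+} = M^{\top} (M M^{\top})^{-1},
\qquad\text{so that}\qquad M w^{\ast} = y .
```

Every other least-squares solution differs from $w^{\ast}$ by a vector in the null space of $M$, which can only increase the norm.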

Appendix 1—table 4

Gene list of SCG 1 (homeostatic core).

Members of homeostatic core (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name	Description
rpoC	DNA-directed RNA polymerase subunit beta’
rpoB	DNA-directed RNA polymerase subunit beta
tufA	Elongation factor Tu 1
infB	Translation initiation factor IF-2
fusA	Elongation factor G
glyS	Glycyl-tRNA synthetase beta subunit
rpsA	30S ribosomal protein S1
leuS	Leucyl-tRNA synthetase
pheT	Phenylalanyl-tRNA synthetase beta chain
aspS	Aspartyl-tRNA synthetase
valS	Valyl-tRNA synthetase
secA	Protein translocase subunit SecA
gyrA	DNA gyrase subunit A
pepN	Aminopeptidase N
tsf	Elongation factor Ts
tig	Trigger factor
pta	Phosphate acetyltransferase
bamA	Outer membrane protein assembly factor YaeT
rne	Ribonuclease E
ftsZ	Cell division protein FtsZ
gyrB	DNA gyrase subunit B
polA	DNA polymerase I
rplB	50S ribosomal protein L2
prlC	Oligopeptidase A
rho	Transcription termination factor Rho
ftsH	ATP-dependent zinc metalloprotease FtsH
nusA	Transcription elongation protein NusA
lysS	Lysyl-tRNA synthetase
metG	Methionyl-tRNA synthetase
glnS	Glutaminyl-tRNA synthetase
lpdA	Dihydrolipoyl dehydrogenase
serS	Seryl-tRNA synthetase
surA	Chaperone SurA
rpsB	30S ribosomal protein S2
gltX	Glutamyl-tRNA synthetase
lptD	LPS-assembly protein LptD
argS	Arginyl-tRNA synthetase
fabB	3-Oxoacyl-[acyl-carrier-protein] synthase 1
pheS	Phenylalanyl-tRNA synthetase alpha chain
clpX	ATP-dependent Clp protease ATP-binding subunit ClpX
accC	Biotin carboxylase
pyrG	CTP synthase
tolC	Outer membrane protein TolC
rplE	50S ribosomal protein L5
accA	Acetyl-coenzyme A carboxylase carboxyl transferase subunit alpha
hflK	Modulator of FtsH protease HflK
pdxB	Erythronate-4-phosphate dehydrogenase
ygfZ	tRNA-modifying protein YgfZ
pmbA	Protein PmbA
rplA	50S ribosomal protein L1
hldD	ADP-L-glycero-D-manno-heptose-6-epimerase
mreB	Rod shape-determining protein MreB
acrA	Acriflavine resistance protein A
gor	Glutathione reductase
hisS	Histidyl-tRNA synthetase
rpsC	30S ribosomal protein S3
glmM	Phosphoglucosamine mutase
lepA	Elongation factor 4
ffh	Signal recognition particle protein
secD	Protein-export membrane protein SecD
lpoA	Penicillin-binding protein activator LpoA
rhlB	ATP-dependent RNA helicase RhlB
rpsG	30S ribosomal protein S7
rpsD	30S ribosomal protein S4
minD	Septum site-determining protein MinD
cyoA	Ubiquinol oxidase subunit 2
mdoG	Glucans biosynthesis protein G
rplC	50S ribosomal protein L3
glmU	Bifunctional protein GlmU
rpsF	30S ribosomal protein S6
rpsE	30S ribosomal protein S5
hemL	Glutamate-1-semialdehyde 2,1-aminomutase
hldE	Bifunctional protein HldE
ubiE	Ubiquinone/menaquinone biosynthesis methyltransferase UbiE
sspA	Stringent starvation protein A
nusG	Transcription antitermination protein NusG
prfB	Peptide chain release factor 2
dacA	D-alanyl-D-alanine carboxypeptidase DacA
rplF	50S ribosomal protein L6
fabG	3-Oxoacyl-[acyl-carrier-protein] reductase
ftsY	Cell division protein FtsY
dcrB	Protein DcrB
mlaC	Probable phospholipid-binding protein MlaC
hflC	Modulator of FtsH protease HflC
coaB	Coenzyme A biosynthesis bifunctional protein CoaBC
ybiT	Uncharacterized ABC transporter ATP-binding protein YbiT
oxyR	Hydrogen peroxide-inducible genes activator
rpsH	30S ribosomal protein S8
fkpA	FKBP-type peptidyl-prolyl cis-trans isomerase FkpA
frr	Ribosome-recycling factor
fabD	Malonyl CoA-acyl carrier protein transacylase
hslO	33 kDa chaperonin
ybeZ	PhoH-like protein
hemX	Putative uroporphyrinogen-III C-methyltransferase
rplY	50S ribosomal protein L25
rplK	50S ribosomal protein L11
rpsI	30S ribosomal protein S9
bamB	Lipoprotein YfgL
bamD	UPF0169 lipoprotein YfiO
kdgR	Transcriptional regulator KdgR
glnD	[Protein-PII] uridylyltransferase
yniC	Phosphatase YniC
rpsJ	30S ribosomal protein S10
rplX	50S ribosomal protein L24
rplD	50S ribosomal protein L4
rplQ	50S ribosomal protein L17
ppa	Inorganic pyrophosphatase
rpsM	30S ribosomal protein S13
rplN	50S ribosomal protein L14
ybaB	UPF0133 protein YbaB
yidC	Inner membrane protein OxaA
lptB	Lipopolysaccharide export system ATP-binding protein LptB
suhB	Inositol-1-monophosphatase
yejK	Nucleoid-associated protein YejK
ghrA	Glyoxylate/hydroxypyruvate reductase A
rsmI	Ribosomal RNA small subunit methyltransferase I
hemY	Protein HemY
uup	ABC transporter ATP-binding protein Uup
hrpA	ATP-dependent RNA helicase HrpA
rplJ	50S ribosomal protein L10
rplM	50S ribosomal protein L13
fur	Ferric uptake regulation protein
rplS	50S ribosomal protein L19
rcsB	Capsular synthesis regulator component B
mrp	Protein Mrp
glyQ	Glycyl-tRNA synthetase alpha subunit
greA	Transcription elongation factor GreA
nrdB	Ribonucleoside-diphosphate reductase 1 subunit beta
wbbI	Uncharacterized protein YefG
udk	Uridine kinase
mnmG	tRNA uridine 5-carboxymethylaminomethyl modification enzyme MnmG
rplL	50S ribosomal protein L7/L12
rplI	50S ribosomal protein L9
rpoZ	DNA-directed RNA polymerase subunit omega
ybbN	Uncharacterized protein YbbN
yfiF	Uncharacterized tRNA/rRNA methyltransferase YfiF
yedD	Uncharacterized lipoprotein YedD
rpmD	50S ribosomal protein L30
tatB	Sec-independent protein translocase protein TatB
yfgM	UPF0070 protein YfgM
kdsB	3-Deoxy-manno-octulosonate cytidylyltransferase
rpoN	RNA polymerase sigma-54 factor
fdx	2Fe-2S ferredoxin
rplV	50S ribosomal protein L22
rplO	50S ribosomal protein L15
fabZ	(3R)-hydroxymyristoyl-[acyl-carrier-protein] dehydratase
mipA	MltA-interacting protein
ssb	Single-stranded DNA-binding protein
yiaF	Uncharacterized protein YiaF
secY	Preprotein translocase subunit SecY
rbfA	Ribosome-binding factor A
potA	Spermidine/putrescine import ATP-binding protein PotA
rimM	Ribosome maturation factor RimM
trxA	Thioredoxin-1
rpsS	30S ribosomal protein S19
rpsU	30S ribosomal protein S21
accB	Biotin carboxyl carrier protein of acetyl-CoA carboxylase
engB	Probable GTP-binding protein EngB
tatA	Sec-independent protein translocase protein TatA
rfbD	dTDP-4-dehydrorhamnose reductase
ribF	Riboflavin biosynthesis protein RibF
folP	Dihydropteroate synthase
lepB	Signal peptidase I
sspB	Stringent starvation protein B
hupA	DNA-binding protein HU-alpha
rpsP	30S ribosomal protein S16
rplP	50S ribosomal protein L16
rpsT	30S ribosomal protein S20
rpsK	30S ribosomal protein S11
rplU	50S ribosomal protein L21
rplR	50S ribosomal protein L18
lpxA	Acyl-[acyl-carrier-protein]–UDP-N-acetylglucosamine O-acyltransferase
yceD	Uncharacterized protein YceD
queC	7-Cyano-7-deazaguanine synthase
rpmA	50S ribosomal protein L27
rpmG	50S ribosomal protein L33
rpmF	50S ribosomal protein L32
rpsN	30S ribosomal protein S14
rplT	50S ribosomal protein L20
nudK	GDP-mannose pyrophosphatase NudK
rplW	50S ribosomal protein L23
trmB	tRNA (guanine-N(7)-)-methyltransferase
rluB	Ribosomal large subunit pseudouridine synthase B
rpsR	30S ribosomal protein S18
secG	Protein-export membrane protein SecG
rlmE	Ribosomal RNA large subunit methyltransferase E
yfaY	CinA-like protein
trmA	tRNA (uracil-5-)-methyltransferase
rpmH	50S ribosomal protein L34
yajC	UPF0092 membrane protein YajC
yheU	UPF0270 protein YheU

Appendix 1—table 5

Gene list of SCG 2.

Members in SCG 2 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name	Description
fdoG	Formate dehydrogenase-O major subunit
dsdA	D-serine dehydratase
treC	Trehalose-6-phosphate hydrolase
sdaB	L-serine dehydratase 2
nanA	N-acetylneuraminate lyase
garD	D-galactarate dehydratase
proV	Glycine betaine/L-proline transport ATP-binding protein ProV
garR	2-Hydroxy-3-oxopropionate reductase
nanK	N-acetylmannosamine kinase
fdoH	Formate dehydrogenase-O iron-sulfur subunit
aphA	Class B acid phosphatase
nanE	Putative N-acetylmannosamine-6-phosphate 2-epimerase
srlB	Glucitol/sorbitol-specific phosphotransferase enzyme IIA component
ibpB	Small heat shock protein IbpB
hybC	Hydrogenase-2 large chain
proW	Glycine betaine/L-proline transport system permease protein ProW
srlE	Glucitol/sorbitol-specific phosphotransferase enzyme IIB component
fdoI	Formate dehydrogenase, cytochrome b556(fdo) subunit
preT	Uncharacterized oxidoreductase YeiT
garL	5-Keto-4-deoxy-D-glucarate aldolase
paaB	Phenylacetic acid degradation protein PaaB
paaK	Phenylacetate-coenzyme A ligase
paaE	Probable phenylacetic acid degradation NADH oxidoreductase PaaE
ykgE	Uncharacterized protein YkgE
ybjT	Uncharacterized protein YbjT
ykgG	Uncharacterized protein YkgG

Appendix 1—table 6

Gene list of SCG 3.

Members in SCG 3 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name	Description
wzc	Tyrosine-protein kinase Wzc
amiC	N-acetylmuramoyl-L-alanine amidase AmiC

Appendix 1—table 7

Gene list of SCG 4.

Members in SCG 4 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name	Description
fruB	Multiphosphoryl transfer protein
fruK	1-Phosphofructokinase
fruA	PTS system fructose-specific EIIBC component
narI	Respiratory nitrate reductase 1 gamma chain

Appendix 1—table 8

Gene list of SCG 5.

Members in SCG 5 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name	Description
hdeB	Protein HdeB
hdeA	Chaperone-like protein HdeA

Appendix 1—table 9

Interpretations of $r_{h}$ , ${\hat{r}}_{i}$ , $b_{h}$ , and ${\hat{b}}_{j}$ .

Interpretations of the columns and rows of $R_{E}$ and $B_{E}$ are summarized.

Matrix	Vector type	Vector	Dimension	Description
$R_{E}$	Column	$r_{h} (h = 0, \dots, m - 1)$	$m$	List of $h$-th LDA coordinates of mean LDA Raman of all the conditions
$R_{E}$	Row	${\hat{r}}_{i} (i = 1, \dots, m)$	$m$	Mean LDA Raman of condition $i$
$B_{E}$	Column	$b_{h} (h = 0, \dots, m - 1)$	$n$	List of coefficients of all the proteins for the $h$-th LDA axis
$B_{E}$	Row	${\hat{b}}_{j} (j = 1, \dots, n)$	$m$	Coefficients for protein $j$
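
The row and column conventions in this table can be illustrated with a short NumPy sketch. It assumes only the shapes implied by the table ($R_{E}$ is $m \times m$ and $B_{E}$ is $n \times m$). The sizes `m` and `n` and the random matrix entries are placeholders, not data from the study.

```python
import numpy as np

# Hypothetical sizes: m conditions, n proteins.
m, n = 4, 6
rng = np.random.default_rng(0)

# Placeholder matrices with the shapes implied by the table.
R_E = rng.normal(size=(m, m))  # rows: conditions, columns: LDA axes
B_E = rng.normal(size=(n, m))  # rows: proteins, columns: LDA axes

# Columns are indexed by h = 0, ..., m - 1.
h = 0
r_h = R_E[:, h]  # h-th LDA coordinate of every condition (length m)
b_h = B_E[:, h]  # coefficients of every protein for the h-th axis (length n)

# Rows are indexed from 1 in the table, so subtract 1 for NumPy.
i = 1
r_hat_i = R_E[i - 1, :]  # mean LDA Raman of condition i (length m)
j = 1
b_hat_j = B_E[j - 1, :]  # coefficients for protein j (length m)
```

Columns are thus indexed by the LDA axis $h$ and rows by condition $i$ or protein $j$, which is why only $b_{h}$ has dimension $n$.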

Appendix 1—table 10

Mathematical relation between Raman-proteome coefficients and cosine similarity LE (csLE) proteomes.

The matrices on the left-hand side of Equation 2.138 (a proteome structure based on Raman-proteome coefficients) are listed alongside their counterparts on the right-hand side (a proteome structure obtained with csLE).

Raman-omics coef. structure	csLE	Size and type of matrix	Description
$B_{E}^{\mathrm{norm}}$	$\begin{array}{l} {(\sum_{i = 1}^{n} d_{i})}^{1/2} {\tilde{V}}_{\mathrm{rw}} \\ (= B_{E}^{\mathrm{est,norm}}) \end{array}$	$n \times m$ matrix	Coefficients normalized by constants
$I$	$\Theta$	$m \times m$ orthogonal matrix	Orthogonal transformation
$\begin{array}{l} m^{-1/2} \mathrm{diag}({(1_{m})}^{\top} P) \\ (= m^{1/2} \mathrm{diag}(b_{0})) \end{array}$	$\begin{array}{l} {(\sum_{i = 1}^{n} d_{i})}^{-1/2} \mathrm{diag}{(P^{\top} P)}^{1/2} D \\ (= \mathrm{diag}(b_{0}^{\mathrm{est}})) \end{array}$	$n \times n$ diagonal matrix	Constant terms
$\Sigma_{R_{E}}^{\mathrm{norm}}$	$\Sigma_{\mathrm{LE}}$	$m \times m$ diagonal matrix	Singular values

Additional files

MDAR checklist: https://cdn.elifesciences.org/articles/101485/elife-101485-mdarchecklist1-v1.docx

Cite this article

Ken-ichiro F Kamei, Koseki J Kobayashi-Kirschvink, Takashi Nozoe, Hidenori Nakaoka, Miki Umetani, Yuichi Wakamoto (2026) Revealing global stoichiometry conservation architecture in cells from Raman spectral patterns. eLife 14:RP101485. https://doi.org/10.7554/eLife.101485.3
