Revealing global stoichiometry conservation architecture in cells from Raman spectral patterns

  1. Ken-ichiro F Kamei (corresponding author)
  2. Koseki J Kobayashi-Kirschvink
  3. Takashi Nozoe
  4. Hidenori Nakaoka
  5. Miki Umetani
  6. Yuichi Wakamoto (corresponding author)
  1. Department of Basic Science, Graduate School of Arts and Sciences, The University of Tokyo, Japan
  2. Department of Medicine, The University of Chicago, United States
  3. Research Center for Complex Systems Biology, The University of Tokyo, Japan
  4. Universal Biology Institute, The University of Tokyo, Japan
  5. Department of Optical Imaging, Advanced Research Promotion Center Tokushima University, Japan
  6. Department of Biology, New York University, United States
21 figures, 12 tables and 1 additional file

Figures

Cellular physiological state differences detected by Raman spectral global patterns and gene expression profiles.

(A) Condition-dependent cellular Raman spectral patterns. Raman spectra obtained from cells reflect their molecular profiles. Therefore, systematic differences in global spectral patterns may indicate differences in the cells' physiological states. The Raman spectrum of each cell can be represented as a vector, that is, as a point in a high-dimensional Raman space. If the spectral patterns differ systematically between conditions, appropriate dimensional reduction methods allow us to classify the spectra and detect cellular physiological states in a low-dimensional space. (B) Condition-dependent gene expression profiles. Global gene expression profiles (proteomes and transcriptomes) also depend on conditions. For each gene, we can consider a high-dimensional vector whose elements represent its expression levels under different conditions. It has been suggested that these expression-level vectors are constrained to low-dimensional manifolds (Eisen et al., 1998; Segal et al., 2003; Bergmann et al., 2003; Keren et al., 2013; You et al., 2013; Kaneko et al., 2015; Hui et al., 2015; Heimberg et al., 2016; Biswas et al., 2017; Husain and Murugan, 2020; Sato and Kaneko, 2020). This study characterizes the statistical correspondence between dimension-reduced Raman spectral patterns and gene expression profiles. By analyzing this correspondence, we reveal a stoichiometry conservation principle that constrains gene expression profiles to low-dimensional manifolds.
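
As a rough illustration of the dimensional reduction in (A), the sketch below implements classical Fisher linear discriminant analysis (LDA, the method applied to the spectra in Figure 2) with NumPy only. It is a sketch under stated assumptions, not the authors' pipeline: the function name `lda_project` and the small ridge term added to the within-class scatter are hypothetical choices made here. With C conditions, at most C − 1 discriminant axes carry information, which is why 15 conditions give a 14-dimensional LDA space.

```python
import numpy as np


def lda_project(X, labels, n_components=None, ridge=1e-6):
    """Project spectra onto Fisher linear discriminant axes.

    X      : (n_cells, n_wavenumbers) array, one spectrum per row.
    labels : (n_cells,) condition label of each spectrum.
    ridge  : small multiple of the average diagonal of the within-class
             scatter added to it (an assumption, to keep it invertible when
             wavenumber channels outnumber cells).
    Returns the projected coordinates and the discriminant directions.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    mu = X.mean(axis=0)
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))  # within-class scatter
    Sb = np.zeros((dim, dim))  # between-class scatter
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += ridge * np.trace(Sw) / dim * np.eye(dim)
    # Whitening with the Cholesky factor of Sw turns Sb w = lambda Sw w
    # into an ordinary symmetric eigenproblem.
    Linv = np.linalg.inv(np.linalg.cholesky(Sw))
    evals, V = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(evals)[::-1]
    # Sb has rank at most (number of conditions - 1), e.g. 15 - 1 = 14.
    k = len(classes) - 1 if n_components is None else n_components
    W = Linv.T @ V[:, order[:k]]
    return (X - mu) @ W, W
```

In practice a library implementation could replace this; the point here is only that each LDA axis is a fixed linear combination of Raman shifts, chosen to separate conditions relative to within-condition scatter.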

Estimation of proteomes from Raman spectra.

(A) The experimental design. We cultured E. coli cells under 15 different conditions and measured single cells’ Raman spectra. We then examined the correspondence between the measured Raman spectra and the absolute quantitative proteome data reported by Schmidt et al., 2016. (B) Representative Raman spectra from single cells, one from the ‘Glucose’ condition and the other from the ‘LB’ condition. The fingerprint region and representative peaks are annotated. (C–E) Cellular Raman spectra in linear discriminant analysis (LDA) space. The dimensionality of the spectra is reduced to 14 (= 15 − 1). Each point represents a spectrum from a single cell, and each ellipse shows the 95% concentration ellipse for each condition. Projections onto the LDA1-LDA2 plane (C), the LDA1-LDA3 plane (D), and the LDA1-LDA4 plane (E) are shown. (F) Visualization of the 14-dimensional LDA space embedded in a two-dimensional space with t-distributed stochastic neighbor embedding (t-SNE). (G) The scheme of leave-one-out cross-validation. The Raman and proteome data of one condition (here j) are excluded, and the matrix B is estimated from the data of the remaining conditions as $B_{-j}^{\mathrm{est}}$. The proteome under condition j is then estimated from the Raman data $\hat{\boldsymbol{r}}^{j}$ with $B_{-j}^{\mathrm{est}}$ and compared with the measured data to calculate estimation errors. (H) Comparison of measured and estimated proteome data. The plot for the ‘Glucose’ condition is shown as an example. Each dot corresponds to one protein species. The straight line indicates $y = x$. Proteins with negative estimated values are not shown.
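
The leave-one-out scheme in (G) can be sketched as follows, assuming the linear model $\boldsymbol{p}^{j} \approx B \hat{\boldsymbol{r}}^{j}$, where $\hat{\boldsymbol{r}}^{j}$ stacks a constant 1 (giving the constant-term column $\boldsymbol{b}_0$) on top of the condition-averaged LDA coordinates. Solving for $B_{-j}^{\mathrm{est}}$ with the Moore–Penrose pseudo-inverse (a minimum-norm least-squares solution) is an assumption made here, and the function name `loocv_estimate` is hypothetical. Depending on how many LDA axes are retained, the held-out fit can be underdetermined, in which case the choice of solver matters and may differ from the authors'.

```python
import numpy as np


def loocv_estimate(P, R):
    """Leave-one-out estimation of proteomes from Raman summaries.

    P : (n_proteins, m) abundances, one column per condition.
    R : (k, m) condition-averaged LDA coordinates, one column per condition.
    Returns an (n_proteins, m) array whose column j is estimated with the
    data of condition j held out.
    """
    P = np.asarray(P, dtype=float)
    n, m = P.shape
    # Prepend a row of ones so that the first column of B is the constant term b_0.
    R1 = np.vstack([np.ones((1, m)), np.asarray(R, dtype=float)])
    P_est = np.empty((n, m))
    for j in range(m):
        keep = np.arange(m) != j
        # Minimum-norm least-squares fit of B_{-j} on the remaining conditions
        B_minus_j = P[:, keep] @ np.linalg.pinv(R1[:, keep])
        P_est[:, j] = B_minus_j @ R1[:, j]
    return P_est
```

Comparing `P_est[:, j]` with `P[:, j]` protein by protein gives a plot of the kind shown in (H).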

A stoichiometrically conserved protein group identified by an analysis of the Raman-proteome coefficient matrix.

(A) Scatterplots of Raman-proteome transformation coefficients. The horizontal axes are the constant terms ($\boldsymbol{b}_0$) in all the plots. The vertical axis of each plot shows the coefficients for LDA1 ($\boldsymbol{b}_1$), LDA2 ($\boldsymbol{b}_2$), LDA3 ($\boldsymbol{b}_3$), or LDA4 ($\boldsymbol{b}_4$). The proteins in the information storage and processing (ISP) Clusters of Orthologous Groups (COG) class are indicated in yellow. Yellow solid straight lines are least squares regression lines through the origin for the ISP proteins. Insets are enlarged views of the areas around the origins. In this figure, we used the average of $B_{-i}^{\mathrm{est}}$ as an estimate of B. (B) Similarity of expression patterns between culture conditions for each COG class. We divided the proteome into COG classes (Tatusov et al., 2003; Galperin et al., 2015) and calculated the Pearson correlation coefficient of expression patterns for every pair of culture conditions. Since the data are from 15 conditions, there are 105 (=15·14/(2·1)) points for each COG class in the graph. The box-and-whisker plots summarize the distributions of the points. The lines inside the boxes denote the medians, and the top and bottom edges of the boxes denote the 75th and 25th percentiles, respectively. The numbers of protein species are 376 for the Cellular Processes and Signaling COG class, 354 for the ISP COG class, and 840 for the Metabolism COG class. See Appendix 1—figure 4 for the evaluation with the Pearson correlation coefficient of log abundances and with cosine similarity. Appendix 1—figure 4 also contains figures directly showing expression-level changes of different protein species across conditions for each COG class. (C) Examples of stoichiometry-conserving proteins in the ISP COG class. The horizontal axis represents the abundance of RplF under 15 conditions, and the vertical axis represents those of several ISP COG class proteins. These proteins are also contained in the homeostatic core defined later (see Figure 4). 
The solid straight lines are linear regression lines with an intercept of zero. (D) Examples of abundance ratios of non-ISP COG class proteins. The horizontal axis represents the abundance of RplF under 15 conditions, and the vertical axis represents the abundances of the compared non-ISP COG class proteins. Crp belongs to the Cellular Processes and Signaling COG class; the other proteins belong to the Metabolism COG class. In both (C) and (D), we selected proteins expressed from distant loci on the chromosome. All sigma factors participating in the regulation of the proteins examined in (C) and (D) are listed to the right of the gene name legends. All transcription factors known to regulate multiple genes listed here are shown in the diagrams on the right. Arrows show activation; bars represent inhibition; and squares indicate that a transcription factor activates or inhibits depending on other factors. The information on gene regulation and functions was obtained from EcoCyc (Keseler et al., 2017) in August 2022. The error bars are standard errors calculated from the data of Schmidt et al., 2016. The insets show the positions of the genes on the E. coli chromosome, determined based on ASM75055v1.46 (Howe et al., 2020). No two of these genes are in the same operon.
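
The regression lines with an intercept of zero in (A), (C), and (D) have a closed-form slope, $a = \sum_c x_c y_c / \sum_c x_c^2$. The minimal sketch below (helper names hypothetical) also shows why this fit connects to the cosine-similarity criterion used in Figure 4: for the best-fitting line through the origin, the squared relative residual equals $1 - \cos^2\theta$, where $\theta$ is the angle between the two expression vectors. Here x could be, for example, the RplF abundances across conditions and y those of another protein.

```python
import numpy as np


def slope_through_origin(x, y):
    """Least-squares slope a of y = a * x with the intercept fixed at zero."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / (x @ x))


def relative_residual(x, y):
    """||y - a x|| / ||y|| for the best line through the origin.

    Its square equals 1 - cos^2 of the angle between x and y, so it is zero
    exactly when y is proportional to x (conserved stoichiometry).
    """
    y = np.asarray(y, dtype=float)
    a = slope_through_origin(x, y)
    return float(np.linalg.norm(y - a * np.asarray(x, dtype=float)) / np.linalg.norm(y))
```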

Extracting stoichiometrically conserved groups (SCGs) from proteome data.

(A) Quantifying stoichiometry conservation by cosine similarity. We consider an m-dimensional expression vector for each protein species whose elements represent its abundance under different conditions. The cosine similarity between the m-dimensional expression vectors of two protein species is nearly 1 when they strongly conserve their mutual stoichiometry across conditions, and falls below 1 when their expression patterns are incoherent. (B) Extracted SCGs. We extracted proteins with high cosine similarity relationships. Each node represents a protein species. An edge connecting two nodes indicates that the expression vectors of the two connected protein species have a cosine similarity exceeding a threshold of 0.995. Proteins that have no edge with other proteins are not shown. The largest and the second largest protein groups, which we refer to as SCG 1 and SCG 2, respectively, are indicated by shaded polygons. (C) Expression patterns of the extracted SCGs. The horizontal and vertical axes represent growth rate and protein abundance, respectively. Line-connected points represent expression-level changes of different protein species across conditions. SCG 1 (homeostatic core) is shown in two ways: the left panel with a linear-scaled vertical axis and the right panel with a log-scaled vertical axis. The inset for SCG 2 shows the total abundances of SCG 2 proteins with a log-scaled vertical axis. Error bars are standard errors. (D) The gene loci of the homeostatic core (SCG 1) proteins on the chromosome. Magenta dots are nodes (genes), and gray lines are edges (high cosine similarity relationships). We determined the gene loci based on ASM75055v1.46 (Howe et al., 2020).
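
A minimal sketch of the network construction in (B): compute all pairwise cosine similarities, connect pairs above 0.995, and read off groups. Treating each SCG as a connected component of this thresholded graph is an assumption made for illustration; the function name is hypothetical, and only NumPy and the standard library are used.

```python
import numpy as np
from collections import deque


def extract_scgs(P, threshold=0.995):
    """Group protein species whose expression vectors are nearly proportional.

    P : (n_proteins, m) abundances; each row is one protein's expression vector.
    Returns groups (lists of row indices) sorted by size, largest first, so the
    first two correspond to what the caption calls SCG 1 and SCG 2.
    Proteins with no edge are omitted, as in the network figure.
    """
    P = np.asarray(P, dtype=float)
    U = P / np.linalg.norm(P, axis=1, keepdims=True)
    adj = (U @ U.T) > threshold      # cosine similarity above the threshold
    np.fill_diagonal(adj, False)     # ignore self-similarity
    seen = np.zeros(len(adj), dtype=bool)
    groups = []
    for start in range(len(adj)):
        if seen[start] or not adj[start].any():
            continue
        comp, queue = [], deque([start])
        seen[start] = True
        while queue:                 # breadth-first search over one component
            i = queue.popleft()
            comp.append(int(i))
            for j in np.flatnonzero(adj[i] & ~seen):
                seen[j] = True
                queue.append(j)
        groups.append(sorted(comp))
    return sorted(groups, key=len, reverse=True)
```

Because cosine similarity ignores overall scale, two proteins whose abundances differ by a constant factor under every condition are connected even if one is far more abundant than the other.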

A proteome structure characterized by global stoichiometry conservation relationships.

(A) Distributions of stoichiometry conservation centrality values for all the proteins (gray), the homeostatic core (SCG 1) proteins (magenta), and the proteins belonging to the other stoichiometrically conserved groups (SCGs) (cyan). (B) Correlation between stoichiometry conservation centrality and gene essentiality. The proportion of essential genes within each class of stoichiometry conservation ranking is shown. The list of essential genes was downloaded from EcoCyc (Keseler et al., 2017). (C) Correlation between stoichiometry conservation and evolutionary conservation. The strength of evolutionary conservation of each protein species was estimated by the number of orthologs found in the OrthoMCL species (Chen et al., 2006). The genes with more orthologs tend to have higher stoichiometry conservation centrality ($p = 3.42 \times 10^{-14}$ by one-sided Brunner-Munzel test between the top 25% and the bottom 25% fractions of the ortholog number ranking). Likewise, the genes with higher stoichiometry conservation centrality scores tend to have more orthologs ($p = 8.44 \times 10^{-12}$ by one-sided Brunner-Munzel test, top 25%–bottom 25% comparison; p-values in the captions for (F–I) were evaluated with the same statistical test scheme). (D–G) Stoichiometry conservation analyses of human cell atlas transcriptome data of 15 fetal organs (Cao et al., 2020). The top gray histogram in (D) shows the distribution of stoichiometry conservation centrality values for all genes. The bottom histograms in (D) show the distributions for coding genes (yellow) and for the other genes (cyan). (E) shows a correlation between the ratio of coding genes and stoichiometry conservation centrality calculated from the human cell atlas data. (F) shows a correlation between gene essentiality and stoichiometry conservation centrality calculated from the human cell atlas data. 
The essentiality of each human gene was quantified by the CRISPR score, which is the fitness cost imposed by CRISPR-based inactivation of the gene in KBM7 chronic myelogenous leukemia cells (Wang et al., 2015). Genes with lower CRISPR scores are regarded as more essential. The fraction with low CRISPR scores (i.e. the high essentiality fraction) tends to have higher stoichiometry conservation centrality ($p < 10^{-15}$). The fraction with high centrality scores tends to be more essential ($p < 10^{-15}$). (G) shows a correlation between evolutionary conservation and stoichiometry conservation centrality based on the human cell atlas data. The gene fraction with many orthologs tends to have higher stoichiometry conservation centrality ($p < 10^{-15}$). The gene fraction with high centrality scores tends to have more orthologs ($p < 10^{-15}$). (H) and (I) Stoichiometry conservation analyses of genome-wide Perturb-seq data (Replogle et al., 2022). (H) shows a correlation between stoichiometry conservation centrality calculated from the Perturb-seq data and gene essentiality. The essentiality of each gene was quantified by the CRISPR score as in (F). The gene fraction with low CRISPR scores (i.e. the high essentiality fraction) tends to have higher stoichiometry conservation centrality ($p < 10^{-15}$). The gene fraction with high centrality scores tends to be more essential ($p < 10^{-15}$). (I) shows a correlation between stoichiometry conservation based on the Perturb-seq data and the evolutionary conservation of genes. The gene fraction with many orthologs tends to have higher stoichiometry conservation centrality ($p < 10^{-15}$). The gene fraction with high centrality scores tends to have more orthologs ($p < 10^{-15}$). (J) Representation of the proteomes as a graph. A node corresponds to a protein species, and the weight of an edge is taken as the cosine similarity between the m-dimensional expression vectors of the two connected protein species. The whole graph is specified by the $n \times n$ matrix A. 
Note that the diagonal elements of A are set to one for simplicity. (K) Cosine similarity Laplacian eigenmap (csLE) structure in a three-dimensional space. Each dot represents a different protein species and is color-coded according to its stoichiometry conservation centrality value. We selected the axes considering the structural similarity to the Raman-based proteome structure in $\Omega_B$ (see Figure 6). (L) The csLE structure in a three-dimensional space. The views from two different angles are shown. Each gray dot represents a different protein species. The proteins belonging to each SCG are indicated with distinct markers. Colors of the two-dimensional histograms in (C), (F), (G), (H), and (I) represent the height of each bar.
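
The quantities in (J) to (L) can be sketched in a few lines. Stoichiometry conservation centrality is the weighted degree $d_i = \sum_j A_{ij}$ of the cosine-similarity graph, with the diagonal ones included as in (J). For the csLE coordinates we assume a standard Laplacian eigenmap: the generalized eigenproblem $(D - A)\boldsymbol{v} = \lambda D \boldsymbol{v}$ with $D = \mathrm{diag}(d_1, \ldots, d_n)$, whose solutions are eigenvectors of the random-walk Laplacian, solved here through the symmetric normalized Laplacian. The exact normalization, number of axes, and axis selection used for the figure may differ, and the function name is hypothetical.

```python
import numpy as np


def degree_and_eigenmap(P, n_dims=3):
    """Stoichiometry conservation centrality and Laplacian-eigenmap coordinates.

    P : (n_proteins, m) non-negative abundances.
    Returns (d, Y, evals): the weighted degrees, the coordinates from the
    n_dims nontrivial generalized eigenvectors, and all eigenvalues.
    """
    P = np.asarray(P, dtype=float)
    U = P / np.linalg.norm(P, axis=1, keepdims=True)
    A = U @ U.T                      # cosine similarity graph; diagonal = 1
    d = A.sum(axis=1)                # stoichiometry conservation centrality
    inv_sqrt_d = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}
    L_sym = np.eye(len(d)) - inv_sqrt_d[:, None] * A * inv_sqrt_d[None, :]
    evals, Usym = np.linalg.eigh(L_sym)   # ascending eigenvalues
    # v = D^{-1/2} u solves (D - A) v = lambda D v (random-walk eigenvectors)
    V_rw = inv_sqrt_d[:, None] * Usym
    return d, V_rw[:, 1:n_dims + 1], evals   # skip the trivial constant vector
```

The first eigenvalue is zero with a constant eigenvector, which carries no structure and is therefore dropped.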

Raman-based proteome structure and its similarity to stoichiometry-based proteome structure.

(A) Proteome structure determined by Raman-proteome coefficients visualized in a three-dimensional space. The views from two different angles are shown. Each gray dot represents a protein species. The proteins belonging to each stoichiometrically conserved group (SCG) are indicated with distinct markers. Note that SCGs are defined without reference to Raman data (Figure 4). (B–D) Similarity among the distribution of linear discriminant analysis (LDA) Raman spectra (B), the proteome structure determined by Raman-proteome coefficients (C), and the proteome structure determined by stoichiometry conservation (D). (E) Mathematical relation between the coordinates of the proteins in $\Omega_B$ (C) and $\Omega_{\mathrm{LE}}$ (D). Two conditions, one on $\Theta$ (magenta) and the other between $\boldsymbol{b}_0$ and $\boldsymbol{b}_0^{\mathrm{est}}$ (cyan), must hold for the two proteome structures to be similar (yellow), as described in the gray box. The relation symbol in the gray box denotes column-wise proportionality.

Proportionality between stoichiometry conservation centrality and expression generality.

(A) Relationships between stoichiometry conservation centrality ($d_i$) and expression generality ($g_i$). Each gray dot represents a protein species. The proteins belonging to each stoichiometrically conserved group (SCG) are indicated with distinct markers. The dashed lines are $y = n$, $x = 1$, and $x = \sqrt{m}$ ($n = 2058$, $m = 15$). The solid lines represent $y = \{(\sum_{j=1}^{n} d_j)/m\}^{1/2}\, x$ (see Section 2.2 in Appendix). The deviation of a point from the solid line is related to the growth rate under the condition in which the protein is most highly expressed. (B) The same plot as (A) in black and white. Overlaid red circles indicate proteins featured in (C). (C) Expression patterns of the proteins indicated by red circles in (B) across conditions. The condition differences are shown by the growth rate differences on the horizontal axes. The arrangement of the plots for the proteins corresponds to their relative positions in (B).
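
A back-of-the-envelope reading of the solid line, offered as a sketch rather than the derivation in the Appendix. Writing $\boldsymbol{u}_i = \boldsymbol{p}_i/\|\boldsymbol{p}_i\|_2$ and $\boldsymbol{s} = \sum_j \boldsymbol{u}_j$, the degree is exactly $d_i = \boldsymbol{u}_i \cdot \boldsymbol{s}$, so $\sum_i d_i = \|\boldsymbol{s}\|_2^2$. For non-negative abundances, $g_i = \|\boldsymbol{p}_i\|_1/\|\boldsymbol{p}_i\|_2 = \boldsymbol{u}_i \cdot \boldsymbol{1}$. If $\boldsymbol{s}$ is close to a uniform vector $c\boldsymbol{1}$, then $d_i \approx c\, g_i$ and $\sum_i d_i \approx c^2 m$, which gives the slope $c = \{(\sum_j d_j)/m\}^{1/2}$. The function below (name hypothetical) computes all three quantities.

```python
import numpy as np


def generality_and_degree(P):
    """Expression generality g, degree d, and the slope of the solid line.

    P : (n_proteins, m) non-negative abundances.
    """
    P = np.asarray(P, dtype=float)
    l2 = np.linalg.norm(P, axis=1)
    g = np.abs(P).sum(axis=1) / l2          # L1/L2 ratio, between 1 and sqrt(m)
    U = P / l2[:, None]
    s = U.sum(axis=0)                       # sum of L2-normalized vectors
    d = U @ s                               # equals row sums of the cosine matrix
    slope = np.sqrt(d.sum() / P.shape[1])   # {(sum_j d_j) / m}^{1/2}
    return g, d, slope
```

Departures of $\boldsymbol{s}$ from uniformity are what push points off the line, which is consistent with the caption's remark that the deviation depends on the growth rate of the condition in which a protein is most highly expressed.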

Appendix 1—figure 1
Schematic illustration of the approach in this study.

Related to Figure 1. Raman spectra and gene expression profiles are both high-dimensional vectors and can be represented as points in high-dimensional spaces. Coarse-graining Raman spectra by dimensional reduction reveals condition-dependent differences in their global spectral patterns (see Figure 2). The dimension-reduced spectra were linked to, and used to predict, condition-dependent global gene expression profiles (see Figure 2), which implies that global changes in spectral patterns can detect differences in cellular physiological states. The analysis of this linkage led us to discover a stoichiometry-conserving constraint on gene expression, which enabled us to represent gene expression profiles in a functionally relevant low-dimensional space (i; see also Figures 3–5). We then found a nontrivial correspondence between these low-dimensional Raman and gene expression spaces (ii; see also Figure 6). This correspondence provides an omics-level interpretation of global Raman spectral patterns and a quantitative constraint between expression generality and stoichiometry conservation centrality (ii; see also Figure 7, Appendix 1—figure 9).

Appendix 1—figure 2
Custom-built Raman microscope and analyses of E. coli Raman spectra.

Related to Figure 2. (A) Schematic diagram of the Raman microscope used in this study. (B) Representative Raman spectra from single E. coli cells. The fingerprint region of one spectrum is shown for each condition. (C) Linear superposition of Raman shifts. Each linear discriminant analysis (LDA) axis is a linear superposition of Raman shifts. These figures show the coefficients for LDA1 (left) and LDA2 (right). (D) Relationship between Raman LDA1 axis and growth rates. The horizontal axis represents Raman LDA1 axis. The vertical axis represents growth rates measured in Schmidt et al., 2016. Each point corresponds to the data for one condition. Pearson correlation coefficient is 0.81±0.09.

Appendix 1—figure 3
Estimation of proteomes from Raman spectra.

Related to Figure 2. Comparing the measured proteomes with those estimated from Raman spectra. The horizontal and vertical axes represent the estimated and measured proteomes, respectively. Proteins with negative estimated abundance are not shown in these figures. The conditions with the largest and the second largest numbers of proteins with negative estimated abundance were ‘stationary3days’ (666 proteins) and ‘LB’ (359 proteins). The conditions with the fewest and the second fewest negatively estimated proteins were ‘GlucosepH6’ (0 proteins) and ‘Xylose’ (7 proteins).

Appendix 1—figure 4
Comparison of stoichiometry conservation among Clusters of Orthologous Groups (COG) classes.

Related to Figure 3. (A and B) Relations between protein abundance and constant terms of Raman-proteome coefficients. The horizontal axes are $\boldsymbol{b}_0$ (constant terms), and the vertical axes are $\hat{\boldsymbol{p}}^{i}$ (protein abundance). Dashed lines are the least squares regression lines with intercept zero for information storage and processing (ISP) COG class members. The average of $B_{-i}^{\mathrm{est}}$ was used as an estimate of B here. In (A), only ISP COG class members are shown for three representative conditions: ‘Galactose’, ‘Glucose’, and ‘GlycerolAA’. In (B), all proteins are shown for a representative condition, ‘GlycerolAA’. (C) Relations between protein abundance and growth rates of E. coli under 15 environmental conditions. We analyzed the absolute quantitative proteome data, growth rate data, and COG annotation reported by Schmidt et al., 2016. Lines represent different protein species. Error bars are standard errors. The top panel is for the Cellular Processes and Signaling COG class; the middle is for the ISP COG class; and the bottom is for the Metabolism COG class. (D) Relations between protein abundance and growth rates of three E. coli strains (BW25113, MG1655, and NCM3722) under two culture conditions. We again analyzed the data by Schmidt et al., 2016. Lines represent different protein species. Error bars are standard errors. (E and F) COG class-dependent expression pattern similarity of E. coli proteomes between conditions. The E. coli proteome data under the 15 different environmental conditions were analyzed. The similarity is evaluated by Pearson correlation coefficients of log expression levels in (E) and by cosine similarity in (F). We consider all the combinations of the 15 conditions. Thus, there are 105 data points for each COG class. The box-and-whisker plots summarize the distributions of the points. The lines inside the boxes denote the medians. The top and bottom edges of the boxes denote the 75th and 25th percentiles, respectively. 
Note that (E) and (F) are evaluations of the same data used in Figure 3B in the main text with different similarity indices. (G) COG class-dependent expression pattern similarity between different strains of E. coli (BW25113, MG1655, and NCM3722). The absolute quantitative proteome data and COG annotation were taken from Schmidt et al., 2016. The similarity was evaluated by cosine similarity. The data contain three strains. Thus, there are three points for each COG class. The top panel is for the ‘Glucose’ condition, and the bottom is for the ‘LB’ condition. (H–J) COG class-dependent expression pattern similarity in other organisms. (H) is for M. tuberculosis (data from Schubert et al., 2015; six environmental conditions [time points]), (I) for M. bovis (data from Schubert et al., 2015; six environmental conditions [time points]), and (J) for S. cerevisiae (data from Lahtvee et al., 2017; 10 environmental conditions). The COG annotations were taken from the December 2014 release of 2003-2014 COGs (Galperin et al., 2015) and Release 3 of ‘Mycobrowser’ (Kapopoulou et al., 2011) for (H) and (I), and from the Comprehensive Sake Yeast Genome Database (S288C strain) (Akao et al., 2011) for (J). The unit for protein abundance was fg/cell for (H) and (I) and fg per pg dry cell weight for (J).
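
To reproduce the kind of comparison shown in (E), (F), and Figure 3B, the sketch below computes three similarity indices for every pair of conditions within one COG class. The function name is hypothetical, base-10 logarithms are an assumed choice (the base does not affect a Pearson correlation), and abundances are assumed strictly positive so that the logarithm is defined. With 15 conditions it returns 105 values per index, matching the number of points per class.

```python
import numpy as np
from itertools import combinations


def pairwise_condition_similarity(P):
    """Similarity of one COG class's proteome between every pair of conditions.

    P : (n_proteins, m) strictly positive abundances for the class.
    Returns three arrays of length m(m-1)/2: Pearson r, Pearson r of log10
    abundances, and cosine similarity.
    """
    P = np.asarray(P, dtype=float)
    logP = np.log10(P)
    pear, pear_log, cos = [], [], []
    for a, b in combinations(range(P.shape[1]), 2):
        x, y = P[:, a], P[:, b]
        pear.append(np.corrcoef(x, y)[0, 1])
        pear_log.append(np.corrcoef(logP[:, a], logP[:, b])[0, 1])
        cos.append(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
    return np.array(pear), np.array(pear_log), np.array(cos)
```

The medians and quartiles drawn in the box plots can then be read off with `np.percentile(values, [25, 50, 75])`.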

Appendix 1—figure 5
Single-gene-level growth law in the homeostatic core.

Related to Figure 4. (A) Relationship between population growth rates and total abundance of SCG 1 (homeostatic core) proteins. Here, we analyzed the E. coli proteome data (Schmidt et al., 2016), focusing on the 15 conditions for which we obtained Raman data. The dashed line is the least squares regression line. (B) Scatterplots of log abundance of SCG 1 (homeostatic core) proteins. Here, the proteomes under three representative conditions, ‘LB’, ‘Glucose’, and ‘Galactose’, are compared with that under the standard condition ‘Glycerol’. Each colored line is the linear regression line with slope one for the points of the same color. The vertical line is $x = 0$. (C) Relationship between population growth rate and the coefficient of determination of the linear regression in (B). The vertical line represents the growth rate under the standard condition (‘Glycerol’). (D) Linear relationship between common abundance ratio and growth rates. The vertical axis represents $10^{\Gamma_c}$, where $\Gamma_c$ is the y-intercept of the regression line for condition c in (B) (see Section 3.1.2 in Appendix). The dashed line is the linear regression line. The horizontal line is $y = 1$, and the x coordinate of the vertical line is the growth rate under the standard condition (‘Glycerol’). (E) The gene loci of the proteins belonging to the condition-specific stoichiometrically conserved groups (SCGs) on the chromosome (ASM75055v1.46; Howe et al., 2020). Colored dots are nodes (genes), and gray lines are edges (high cosine similarity relationships). The edge in the map of SCG 5 cannot be seen because the gene loci are clustered in close proximity within the same operon.
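
Panels (B) to (D) rest on a regression with the slope fixed at one in log space. Writing $p_i^{(c)}$ for the abundance of protein i under condition c, the model is $\log_{10} p_i^{(c)} = \log_{10} p_i^{(\mathrm{ref})} + \Gamma_c$, whose least-squares intercept is simply the mean log ratio; $10^{\Gamma_c}$ is then the common abundance ratio relative to the standard condition. The sketch below assumes base-10 logarithms (consistent with the $10^{\Gamma_c}$ axis) and the usual definition of the coefficient of determination; it is an illustration rather than the analysis code, and its name is hypothetical.

```python
import numpy as np


def slope_one_fit(p_ref, p_cond):
    """Fit log10(p_cond) = log10(p_ref) + gamma with the slope fixed at one.

    Returns gamma (the y-intercept), the coefficient of determination, and
    the common abundance ratio 10**gamma. Abundances must be positive.
    """
    x = np.log10(np.asarray(p_ref, dtype=float))
    y = np.log10(np.asarray(p_cond, dtype=float))
    gamma = float(np.mean(y - x))           # least-squares intercept for slope 1
    resid = y - (x + gamma)
    r2 = 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))
    return gamma, r2, 10.0 ** gamma
```

A coefficient of determination near one means the homeostatic core proteins all change by nearly the same factor between the two conditions.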

Appendix 1—figure 6
Functional relevance of stoichiometry conservation centrality.

Related to Figure 5. (A) Relationship between gene essentiality and stoichiometry conservation centrality in E. coli. The proportion of essential genes is plotted for each stoichiometry conservation centrality rank range. In this plot, we calculated stoichiometry conservation centrality based on the E. coli proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data. The list of essential genes was downloaded from EcoCyc (Keseler et al., 2017). (B) Relationship between gene essentiality and stoichiometry conservation centrality in S. pombe. We calculated stoichiometry conservation centrality based on the S. pombe transcriptome data reported in Kobayashi-Kirschvink et al., 2018. Only coding genes are considered in this plot, though stoichiometry conservation centrality values were calculated using both coding and non-coding genes. Gene classification is based on PomBase (Harris et al., 2022). The proportions in some bins do not sum to 100% because 11 coding genes in the S. pombe transcriptome data were not found in the current PomBase. (C) Relationship between the ratio of coding genes and stoichiometry conservation centrality in the S. pombe transcriptome data. The coding/non-coding assignment is based on PomBase (Harris et al., 2022). (D) Correlation between stoichiometry conservation and evolutionary conservation. In this plot, we calculated stoichiometry conservation centrality based on the E. coli proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data. Colors represent the height of each bar. The distributions of stoichiometry conservation centrality were compared between the top 25% and the bottom 25% fractions in the ortholog number ranking. The fraction with many orthologs tends to have higher stoichiometry conservation centrality (one-sided Brunner-Munzel test, $p = 7.84 \times 10^{-15}$). 
The distributions of the number of orthologs were compared between the top 25% and the bottom 25% stoichiometry conservation centrality fractions. The high centrality fraction tends to have more orthologs (one-sided Brunner-Munzel test, $p = 1.46 \times 10^{-11}$). Ortholog data were taken from OrthoMCL-DB (Chen et al., 2006). (E–G) Correlation between stoichiometry conservation and evolutionary conservation in S. pombe. We calculated stoichiometry conservation centrality based on the S. pombe transcriptome data reported in Kobayashi-Kirschvink et al., 2018. In (E), the result is shown by a two-dimensional histogram. Colors represent the height of each bar. The distributions of the number of orthologs were compared between the top 25% and the bottom 25% stoichiometry conservation centrality fractions. The high centrality fraction tends to have more orthologs (one-sided Brunner-Munzel test, $p = 0.00548$). The direct comparison between the two fractions is shown in (F). The distributions of stoichiometry conservation centrality were compared between the top 25% and the bottom 25% fractions in the ortholog number ranking. The fraction with many orthologs tends to have higher stoichiometry conservation centrality (one-sided Brunner-Munzel test, $p = 0.00270$). The direct comparison between the two fractions is shown in (G). Ortholog data were taken from OrthoMCL-DB (Chen et al., 2006). (H) Applying principal component analysis (PCA) to L2-normalized proteomes. PCA (with mean centering) was applied to the L2-normalized proteome data $[\boldsymbol{p}_1/\|\boldsymbol{p}_1\|_2, \ldots, \boldsymbol{p}_n/\|\boldsymbol{p}_n\|_2]$. Here, we analyzed the E. coli proteome data under the 15 conditions for which we obtained Raman data. The left is a projection onto a two-dimensional space, and the right is a projection onto a three-dimensional space. The axes for visualization were selected by considering similarity to the cosine similarity Laplacian eigenmap (csLE) structure.
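
The top-25%/bottom-25% comparisons reported in these captions can be reproduced in outline with SciPy's `scipy.stats.brunnermunzel`, whose `alternative="greater"` option tests whether the first sample is stochastically greater than the second. The quartile split with `np.quantile`, the inclusive cut-offs, and the helper name are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from scipy.stats import brunnermunzel


def quartile_brunner_munzel(rank_by, compare):
    """One-sided Brunner-Munzel test between top-25% and bottom-25% fractions.

    Genes are split by `rank_by` (e.g. centrality), and the test asks whether
    `compare` (e.g. ortholog number) is stochastically greater in the top
    fraction than in the bottom fraction.
    """
    rank_by = np.asarray(rank_by, dtype=float)
    compare = np.asarray(compare, dtype=float)
    q_lo, q_hi = np.quantile(rank_by, [0.25, 0.75])
    top = compare[rank_by >= q_hi]
    bottom = compare[rank_by <= q_lo]
    res = brunnermunzel(top, bottom, alternative="greater")
    return float(res.statistic), float(res.pvalue)
```

Swapping the two arguments gives the reverse comparison (ranking by ortholog number and testing centrality), which is how each caption reports two p-values for one relationship.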

Appendix 1—figure 7
Distributions and constraints with respect to stoichiometry conservation centrality (degree).

Related to Figure 5 and Figure 7. (A) Comparison of degree (stoichiometry conservation centrality) distributions between original (yellow) and randomized (blue) E. coli proteome data. We created randomized proteome data by shuffling the expression levels across the protein species within each condition. We used the E. coli proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data. (B) Comparison of the $g_j$-$d_j$ relationships between original (yellow) and randomized data (blue). The horizontal axis is the expression generality score ($g_j$ = L1 norm/L2 norm), and the vertical axis is stoichiometry conservation centrality ($d_j$: degree). Each dot represents a protein species. The dashed lines are $y = n$, $x = 1$, and $x = \sqrt{m}$ ($n = 2058$, $m = 15$). The solid lines are $y = \{(\sum_i d_i)/m\}^{1/2}\, x$. (C–H) Degree (stoichiometry conservation centrality) distributions for additional datasets. Yellow histograms are for the original data, and blue histograms are for the randomized data. (C) For the proteomes of three E. coli strains (BW25113, MG1655, and NCM3722) in LB (Schmidt et al., 2016); (D) for the proteomes of the three E. coli strains in M9 Glucose (Schmidt et al., 2016); (E) for the proteomes of M. tuberculosis (Schubert et al., 2015); (F) for the proteomes of M. bovis (Schubert et al., 2015); (G) for the proteomes of S. cerevisiae (Lahtvee et al., 2017); and (H) for the transcriptomes of S. pombe (Kobayashi-Kirschvink et al., 2018). (I–N) $g_j$-$d_j$ relationships for additional datasets. Each gray dot represents a protein species. The proteins belonging to the homeostatic core in each dataset are shown in magenta; those belonging to condition-specific stoichiometrically conserved groups (SCGs) are indicated in different colors in each plot. See the caption of Appendix 1—figures 11 and 13 for the cosine similarity threshold used to specify the homeostatic core and the condition-specific SCGs in each dataset. The dashed lines are $y = n$, $x = 1$, and $x = \sqrt{m}$. 
The solid lines through the origins are $y = \{(\sum_{i=1}^{n} d_i)/m\}^{1/2}\, x$. (I) For the proteomes of the three E. coli strains in LB (Schmidt et al., 2016); (J) for the proteomes of the three E. coli strains in M9 Glucose (Schmidt et al., 2016); (K) for the proteomes of M. tuberculosis (Schubert et al., 2015); (L) for the proteomes of M. bovis (Schubert et al., 2015); (M) for the proteomes of S. cerevisiae (Lahtvee et al., 2017); and (N) for the transcriptomes of S. pombe (Kobayashi-Kirschvink et al., 2018).
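
The randomization in (A) can be sketched as follows: each condition's column of the protein-by-condition matrix is permuted independently, which keeps every condition's abundance distribution but destroys protein-level co-variation. The helper names are hypothetical, and the degree uses the identity $d_i = \boldsymbol{u}_i \cdot \sum_j \boldsymbol{u}_j$ for L2-normalized expression vectors $\boldsymbol{u}_i$ (diagonal ones included), which avoids forming the full $n \times n$ matrix.

```python
import numpy as np


def degrees(P):
    """Stoichiometry conservation centrality (row sums of the cosine matrix)."""
    U = P / np.linalg.norm(P, axis=1, keepdims=True)
    return U @ U.sum(axis=0)


def shuffled_degrees(P, rng):
    """Shuffle abundances across protein species within each condition
    (column) independently, then recompute degrees."""
    P = np.asarray(P, dtype=float)
    P_rand = np.column_stack([rng.permutation(P[:, k]) for k in range(P.shape[1])])
    return P_rand, degrees(P_rand)
```

A typical call is `P_rand, d_rand = shuffled_degrees(P, np.random.default_rng(0))`; histograms of `degrees(P)` and `d_rand` then correspond to the yellow and blue distributions.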

Appendix 1—figure 8
Properties of normalized expression vectors.

Related to Figure 7. (A and B) Schematic explanation for the interpretation of the L1 norm/L2 norm ratio of expression vectors as an index of expression generality. (A) is a two-dimensional case, and (B) is a three-dimensional case. The inset in (A) schematically explains the L1 norm and L2 norm of an expression vector. See ‘Interpretation of L1 norm/L2 norm ratio of an expression vector as a quantitative measure of expression generality’ in Materials and methods for details. (C) Schematic explanation for deviations of points from the proportionality line in the $g_j$-$d_j$ plots. Here, we consider four condition-specific protein species a, b, c, and d, labeled in descending order of the growth rates under the conditions in which they are expressed. Note that their L1 norm/L2 norm ratios are all equal to one on the horizontal axis. One can show that the degree (stoichiometry conservation centrality) $d_j$ is proportional to the inner product of the L2-normalized expression vector $\boldsymbol{p}_j/\|\boldsymbol{p}_j\|_2$ and the expression norm vector $\tilde{\boldsymbol{p}}_{\mathrm{tot}}/\|\tilde{\boldsymbol{p}}_{\mathrm{tot}}\|_2$ (see Equation 2.147 in Section 2.2.2). Since the elements of $\tilde{\boldsymbol{p}}_{\mathrm{tot}}/\|\tilde{\boldsymbol{p}}_{\mathrm{tot}}\|_2$ increase approximately linearly with the growth rates of the corresponding conditions (see D), the degrees (stoichiometry conservation centrality values) decrease from a to d in the order of growth rates. (D–F) Correlation between elements of $\tilde{\boldsymbol{p}}_{\mathrm{tot}}$ and population growth rates. The vertical axis represents the elements of $\tilde{\boldsymbol{p}}_{\mathrm{tot}}/\|\tilde{\boldsymbol{p}}_{\mathrm{tot}}\|_2$, and the horizontal axis represents the population growth rates. The dashed lines are $y = 1/\sqrt{m}$. (D) is the result from the analysis of the E. coli proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data ($m = 15$). (E) is the result from the analysis of the proteome data of three strains of E. coli (BW25113, MG1655, and NCM3722) under ‘LB’ and ‘Glucose’ conditions ($m = 6$) (Schmidt et al., 2016). (F) is the result from the analysis of the proteome data of S. cerevisiae under 10 different conditions ($m = 10$) (Lahtvee et al., 2017). 
The cells were cultured in a chemostat with the same dilution rate. The numbers of analyzable protein species and the numbers of conditions were different between (D) and (E). Thus, the values of the vertical axes cannot be compared directly between them.
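
A quick numerical check of the argument in (C), under the assumption that $\tilde{\boldsymbol{p}}_{\mathrm{tot}}$ is proportional to the sum $\boldsymbol{s}$ of all L2-normalized expression vectors (this is what makes the degree equal to $\boldsymbol{u}_j \cdot \boldsymbol{s}$). A protein expressed in only one condition c has $\boldsymbol{u}_j = \boldsymbol{e}_c$, so its L1/L2 ratio is one and its degree is exactly $s_c$. If the elements of $\boldsymbol{s}$ rise with growth rate, the degrees of such proteins follow the growth-rate order. The toy matrix, in which conditions are indexed in increasing growth-rate order, and the helper name are hypothetical.

```python
import numpy as np


def degrees_and_norm_vector(P):
    """Degrees d_j = u_j . s and the summed normalized vector s."""
    U = P / np.linalg.norm(P, axis=1, keepdims=True)
    s = U.sum(axis=0)  # assumed proportional to the expression norm vector p~_tot
    return U @ s, s


# Toy example (hypothetical): 20 background proteins whose abundance rises
# with growth rate across 4 conditions ordered by increasing growth rate,
# plus condition-specific proteins a, b, c, d expressed only in conditions 3, 2, 1, 0.
bg = np.outer(np.linspace(0.5, 1.5, 20), [1.0, 2.0, 3.0, 4.0])
P = np.vstack([bg, np.eye(4)[::-1]])
d, s = degrees_and_norm_vector(P)
d_specific = d[20:]  # equals s[::-1], so it decreases from a to d
```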

Appendix 1—figure 9
Mathematical analyses of the main Raman-proteome data.

Related to Figure 6. Proteomes of E. coli under 15 conditions (Schmidt et al., 2016) and the corresponding Raman data we measured in this study were analyzed in this figure. (A) Visual comparison of the unit matrix $I$, the orthogonal matrix $\Theta$ obtained from the data, and a random orthogonal matrix. The height of each bar indicates the value of each element, and colors represent the height of each bar. To clarify the position of each element, the component form of matrix $\Theta$ is shown in the middle ($m=15$). For $\Theta$ (middle) and a random orthogonal matrix (right), the original matrices are displayed in the upper row, and matrices whose elements are the absolute values of the corresponding elements of the original matrices are displayed in the lower row. (In this figure, $|\Theta|$ represents a matrix whose $(i,j)$ element is the absolute value of the $(i,j)$ element of $\Theta$.) (B) Representation of matrices as scatterplots. See ‘Evaluating similarity between orthogonal matrix Θ and identity matrix’ in Materials and methods for details. (C) Comparison of the unit matrix $I$, the orthogonal matrix $\Theta$ obtained from the data, and random orthogonal matrices $Q$ by Pearson correlation coefficients. The Pearson correlation coefficient between $I$ and the element-wise squared matrix of each matrix (e.g., $\Theta\circ\Theta$, where $\circ$ represents element-wise multiplication) can be regarded as a measure of closeness to the identity matrix. The probability of finding a random orthogonal matrix $Q$ with a Pearson correlation coefficient greater than that of $\Theta$ was $<1\times10^{-5}$ (no occurrence in $10^5$ samplings). See ‘Evaluating similarity between orthogonal matrix Θ and identity matrix’ in Materials and methods for details. (D) Comparison of the magnitudes of off-diagonal elements among the unit matrix $I$, the orthogonal matrix $\Theta$ obtained from the data, and random orthogonal matrices $Q$. The lattice on the top explains the numbering of $k$-diagonals ($-m<k<m$, $m=15$).
In the lattices on the bottom, black indicates the areas in which the elements are squared and summed at the corresponding steps (i.e., the areas represented by x in the graph). The sum of the squared values in each step is shown in the middle graph. Error bars of the random matrix line are standard errors of 100 samplings. See ‘Evaluating similarity between orthogonal matrix Θ and identity matrix’ in Materials and methods for details. (E) Comparison of the magnitudes of elements of leading principal submatrices among the unit matrix $I$, the orthogonal matrix $\Theta$ obtained from the data, and random orthogonal matrices $Q$. In the lattices on the bottom, black indicates the area in which elements are squared and summed at the corresponding step (i.e., the area represented by x in the graphs). The sum of the squared values in each area is shown in the top graph. The results shown in the top graph are converted into ratios to those of the identity matrix $I$ and are shown in the middle graph. Error bars of the random matrix line are standard errors of 100 samplings. See ‘Evaluating similarity between orthogonal matrix Θ and identity matrix’ in Materials and methods for details. (F) Comparison of $\sqrt{m}\,\mathrm{diag}(\mathbf{b}_0)$ and $\mathrm{diag}(\mathbf{b}_0^{\mathrm{est}})$. The x axis represents $\sqrt{m}\,\mathbf{b}_0$, and the y axis represents $\mathbf{b}_0^{\mathrm{est}}$. The dashed line indicates $y=x$. (G) Comparison between $B_E^{\mathrm{norm}}$ (left) and $B_E^{\mathrm{est,norm}}$ (right). Note that while the $B_E^{\mathrm{norm}}$ figure (left) is the same as Figure 6C, the right figure shows $B_E^{\mathrm{est,norm}}=(\sum_{i=1}^{n}d_i)^{1/2}\tilde{V}_{\mathrm{rw}}$, where $\tilde{V}_{\mathrm{rw}}$ is shown in Figure 6D.
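The closeness test of panel (C) and the k-diagonal sums of panel (D) can be sketched as follows. This is a hedged NumPy illustration, not the authors' code: random orthogonal matrices are drawn from the Haar distribution by QR-decomposing a Gaussian matrix and correcting the signs with the diagonal of R, which is one standard choice; the sampling scheme in Materials and methods may differ, and all function names are hypothetical.

```python
import numpy as np


def identity_closeness(M):
    """Pearson correlation between the flattened identity matrix and M∘M."""
    m = M.shape[0]
    return np.corrcoef(np.eye(m).ravel(), (M * M).ravel())[0, 1]


def random_orthogonal(m, rng):
    """Haar-distributed random orthogonal matrix (QR with sign correction)."""
    Z = rng.standard_normal((m, m))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))  # multiply column j by sign(R_jj)


def closeness_p_value(Theta, n_samples=100_000, seed=0):
    """Fraction of random Q whose closeness to I exceeds that of Theta.

    A result of 0 means "below the detection limit 1 / n_samples".
    """
    rng = np.random.default_rng(seed)
    target = identity_closeness(Theta)
    hits = sum(identity_closeness(random_orthogonal(Theta.shape[0], rng)) > target
               for _ in range(n_samples))
    return hits / n_samples


def k_diagonal_energy(M):
    """Sum of squared elements on each k-diagonal, for -m < k < m."""
    m = M.shape[0]
    return {k: float(np.sum(np.diagonal(M, offset=k) ** 2)) for k in range(-m + 1, m)}
```

For any orthogonal matrix the k-diagonal sums add up to $m$ (the squared Frobenius norm), so panel (D) compares how that fixed total is distributed between the main diagonal and the off-diagonals.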

Appendix 1—figure 10
Orthant correspondences between Raman spectra in linear discriminant analysis (LDA) space and condition-specific proteins in Raman-proteome coefficient proteome space.

Related to Figure 6. Using the main Raman and proteome data of E. coli under the 15 conditions, we examine the orthant correspondence between Raman spectra in the LDA space and condition-specific proteins in the Raman-proteome coefficient proteome space $\Omega_B$. Here, we focus on two proteins, PaaE and AcrR. (A) Expression patterns of PaaE (left) and AcrR (right) across conditions. Error bars are standard errors. PaaE is expressed under the ‘LB’ condition in a condition-specific manner, whereas AcrR is expressed at high levels not only under the ‘LB’ condition but also under several other conditions. (B) Positions of PaaE and AcrR in the Raman-proteome coefficient-based proteome space $\Omega_B$. (C) Verification of orthant correspondence. We verified the orthant correspondence described by Equation 2.76. We multiplied both sides of Equation 2.76 by $(\Sigma_{R_E^{\mathrm{norm}}})^{-1}$ and compared the elements of the vectors on both sides by scatterplots. The horizontal axes are related to the coordinates in the Raman LDA space; the vertical axes are related to the coordinates in the Raman-proteome coefficient proteome space. The dashed lines are $y=x$. The nearly perfect agreement of the elements confirms the orthant correspondence for the condition-specific protein PaaE (left). Deviations from the diagonal agreement line are found for AcrR (right).

Appendix 1—figure 11
Stoichiometry-based omics structures and their correspondences to Raman-based omics structures for additional datasets.

Related to Figures 4–6. This figure summarizes the results on omics structures characterized by stoichiometry conservation relations and their correspondence to those characterized by Raman-omics relations for additional datasets. (A–E) show the results from the analyses of the Raman and proteome data of three E. coli strains (BW25113, MG1655, and NCM3722) in LB; (F–J) from the analyses of the Raman and proteome data of the three E. coli strains in M9 Glucose; and (K–O) from the analyses of the Raman and transcriptome data of S. pombe under 10 conditions. We used the E. coli proteome data reported in Schmidt et al., 2016, and the S. pombe transcriptome data reported in Kobayashi-Kirschvink et al., 2018, in the analyses. (A), (F), and (K) show distributions of omics components in cosine similarity LE (csLE) space. The stoichiometry conservation centrality of each component is indicated by color. (B), (G), and (L) show expression patterns of representative condition-specific omics components indicated in the previous figures of omics structures in the csLE spaces. Error bars are standard errors in (B) and (G), and maximum-minimum ranges (two replicates) in (L). (C), (H), and (M) show positions of averaged cellular Raman spectra under different conditions in the linear discriminant analysis (LDA) spaces. (D), (I), and (N) show omics structures in the spaces specified by the Raman-omics coefficients, with the homeostatic cores and condition-specific stoichiometrically conserved groups (SCGs) indicated by colored points. (E), (J), and (O) show the omics structures in the csLE omics spaces, with the homeostatic cores and condition-specific SCGs indicated by colored points. Columns $\mathbf{v}_{\mathrm{rw},1}$ (the eigenvector corresponding to the smallest nonzero eigenvalue of $L_{\mathrm{rw}}$) and $\mathbf{v}_{\mathrm{rw},2}$ (the eigenvector corresponding to the second smallest nonzero eigenvalue of $L_{\mathrm{rw}}$) are shown. We used a cosine similarity threshold of 0.99993 to specify SCGs both for the three E. coli strains under LB (D and E) and for the three E. coli strains under M9 Glucose (I and J), and 0.9967 for the S. pombe transcriptome data (N and O).
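The csLE coordinates referred to here can be computed from the cosine-similarity matrix $A$ and the degrees $d_i$ of Table 1. Below is a minimal sketch, assuming self-similarities ($A_{ii}=1$) are kept so that the degrees match $d_i=\sum_j\cos\theta_{\mathbf{p}_i\mathbf{p}_j}$, and assuming the similarity graph is connected so that exactly one eigenvalue is zero. It solves the eigenproblem of $L_{\mathrm{rw}}=I-D^{-1}A$ through the symmetric normalized Laplacian, which has the same eigenvalues; the eigenvector scaling used in the paper (e.g., the normalization behind $\tilde{V}_{\mathrm{rw}}$) is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def csle(P, n_components=2):
    """Cosine-similarity Laplacian eigenmap of the rows of P (n x m).

    Returns the n_components smallest nonzero eigenvalues of L_rw = I - D^{-1} A
    and the matching eigenvectors (columns v_rw,1, v_rw,2, ...).
    Assumes a connected similarity graph (a single zero eigenvalue).
    """
    U = P / np.linalg.norm(P, axis=1, keepdims=True)
    A = U @ U.T                      # cosine similarities, A_ii = 1
    d = A.sum(axis=1)                # degrees = stoichiometry conservation centrality
    inv_sqrt_d = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}
    L_sym = np.eye(len(d)) - inv_sqrt_d[:, None] * A * inv_sqrt_d[None, :]
    evals, evecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    V_rw = inv_sqrt_d[:, None] * evecs     # map back to eigenvectors of L_rw
    # Skip index 0: the zero eigenvalue with a constant eigenvector of L_rw
    return evals[1:n_components + 1], V_rw[:, 1:n_components + 1]
```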

Appendix 1—figure 12
Analyses of the mathematical relation connecting two types of omics spaces.

Related to Figure 6. This figure shows analyses of the mathematical relation that connects the coordinates of omics components in the two types of omics spaces (see Figure 6E and Section 2 in the Appendix) using additional datasets. (A–F) show the results from the analyses of the Raman and proteome data of three E. coli strains (BW25113, MG1655, and NCM3722) in LB; (G–L) from the analyses of the Raman and proteome data of the three E. coli strains in M9 Glucose; and (M–R) from the analyses of the Raman and transcriptome data of S. pombe under 10 conditions. We used the E. coli proteome data reported in Schmidt et al., 2016, and the S. pombe transcriptome data reported in Kobayashi-Kirschvink et al., 2018, in the analyses. See the caption of Appendix 1—figure 9 for the explanation of each panel. The stoichiometrically conserved groups (SCGs) in (F), (L), and (R) are the same as in Appendix 1—figure 11. The probability of finding a random orthogonal matrix $Q$ with a Pearson correlation coefficient greater than that of $\Theta$ was 0.022 in (B), 0.013 in (H), and $<1\times10^{-5}$ (no occurrence in $10^5$ samplings) in (N).

Appendix 1—figure 13
Stoichiometry-based proteome structures for additional datasets.

Related to Figures 4 and 5. This figure shows proteome structures in the cosine similarity LE (csLE) proteome spaces for additional datasets. (A–C) show the results from the analyses of the proteome data of M. tuberculosis H37Rv under gradual changes in oxygen levels (Schubert et al., 2015); (D–F) show the results from the analyses of the proteome data of M. bovis BCG under gradual changes in oxygen levels (Schubert et al., 2015); and (G–I) show the results from the analyses of the proteome data of S. cerevisiae under 10 conditions in a chemostat with the same dilution rate (Lahtvee et al., 2017). (A), (D), and (G) show the proteome structures in the csLE spaces. The thresholds used to specify the stoichiometrically conserved groups (SCGs) were 0.99965 for (A), 0.9997 for (D), and 0.9989 for (G). (B), (E), and (H) show the same proteome structures as in the previous panels, but with the stoichiometry conservation centrality of each protein species indicated by color. (C), (F), and (I) show expression patterns of representative proteins indicated by the red circles in the previous panels. Error bars in (C) are standard errors.
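The thresholds above determine which protein species are grouped together. The exact SCG definition is given in Materials and methods and is not reproduced here; as an assumption for illustration only, the sketch below treats groups as connected components of the graph that links two species whenever their cosine similarity is at least the threshold (single linkage). The function name is hypothetical.

```python
import numpy as np


def threshold_groups(P, threshold):
    """Label rows of P (n x m) by connected components of the thresholded
    cosine-similarity graph.

    ASSUMPTION: single-linkage grouping; the paper's SCG definition may differ.
    """
    U = P / np.linalg.norm(P, axis=1, keepdims=True)
    adj = (U @ U.T) >= threshold          # edge if cosine similarity >= threshold
    labels = -np.ones(len(P), dtype=int)  # -1 means "not yet assigned"
    current = 0
    for start in range(len(P)):
        if labels[start] >= 0:
            continue
        # Depth-first search from an unassigned species
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i] & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels
```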

Appendix 1—figure 14
Dependence of low-dimensional correspondence between Raman spectra and proteomes on the number of conditions.

Related to Figure 6. The dependence of the low-dimensional correspondence between Raman spectra and proteomes on the number of analyzed conditions was systematically investigated by evaluating the similarity of the orthogonal matrix $\Theta$ to the identity matrix for all subsampled condition sets. Proteomes of E. coli under 15 conditions (Schmidt et al., 2016) and the corresponding Raman data we measured in this study were analyzed in this figure. (A) The relationship between the number of conditions and the probability of obtaining, by chance, a higher level of low-dimensional correspondence than that of the experimental data. This probability is calculated as the probability of finding a random orthogonal matrix with a Pearson correlation coefficient greater than that of $\Theta$, using $10^4$ random orthogonal matrices. See ‘Evaluating similarity between orthogonal matrix Θ and identity matrix’ in Materials and methods and Appendix 1—figure 9 for details of the evaluation method. Each green square corresponds to one subsample, and each short horizontal black line represents the median over all the $\binom{15}{x}$ combinations of conditions (i.e., $\binom{15}{x}$ green squares) for each subsample size $x$. The blue dashed line indicates the detection limit (i.e., one over the number of generated random orthogonal matrices). The non-subsampled case (i.e., the case with all 15 conditions) in this figure corresponds to Appendix 1—figure 9C. (B) Visual comparison of $\Theta$, $B_E^{\mathrm{norm}}$, and $B_E^{\mathrm{est,norm}}$ for six representative subsamples indicated in (A). As in Appendix 1—figure 9A, $\Theta$ is visualized using $|\Theta|$, whose elements are the absolute values of the corresponding elements of $\Theta$, and the height of each bar in the figures of $|\Theta|$ indicates the value of each element of $|\Theta|$. Colors reflect the height of each bar. The spaces created with the columns of $B_E^{\mathrm{norm}}$ and $B_E^{\mathrm{est,norm}}$ are $\Omega_B$ and $\Omega_{\mathrm{LE}}$, respectively.
As $\Theta$ deviates increasingly from the identity matrix, from cases α and β to case ϵ, the low-dimensional correspondence between $\Omega_B$ and $\Omega_{\mathrm{LE}}$ progressively collapses. Since case ζ is the non-subsampled case, the figure of $|\Theta|$ is the same as Appendix 1—figure 9A, and those of $B_E^{\mathrm{norm}}$ and $B_E^{\mathrm{est,norm}}$ are the same as Appendix 1—figure 9G. Note that the figure of $\Omega_B$ for case ζ is also exactly the same as Figure 6C, and that of $\Omega_{\mathrm{LE}}$ for case ζ is equal to Figure 6D up to a factor of $(\sum_{i=1}^{n}d_i)^{1/2}$. The stoichiometrically conserved groups (SCGs) shown in this figure were defined in the analysis of the proteomes of all 15 conditions (Figure 4C).
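The subsampling in panel (A) enumerates every combination of conditions of a given size. A minimal sketch, assuming a hypothetical callable `p_value_of` that returns the closeness p-value for one subset of conditions (for example, the fraction of random orthogonal matrices with a Pearson correlation coefficient greater than that of $\Theta$); the condition names are those of the culture-conditions table. The number of subsets peaks at $\binom{15}{7}=6435$, so repeating the test with $10^4$ random matrices for every subset is computationally heavy.

```python
from itertools import combinations
from math import comb
from statistics import median

# The 15 condition names used in this study (m = 15).
CONDITIONS = ["Acetate", "Fructose", "Fumarate", "Galactose", "Glucose",
              "Glucose42C", "GlucosepH6", "Glycerol", "GlycerolAA",
              "OsmoticStressGlucose", "Mannose", "Xylose", "LB",
              "stationary1day", "stationary3days"]


def summarize_by_size(p_value_of, sizes, n_random=10**4):
    """Median p-value over all comb(15, x) condition subsets for each size x.

    p_value_of: hypothetical callable mapping a tuple of condition names to a
    p-value. Also returns the detection limit 1 / n_random.
    """
    medians = {}
    for x in sizes:
        subsets = combinations(CONDITIONS, x)  # comb(len(CONDITIONS), x) subsets
        medians[x] = median(p_value_of(s) for s in subsets)
    return medians, 1.0 / n_random


print(comb(len(CONDITIONS), 7))  # number of 7-condition subsets
```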

Tables

Table 1
List of scalars, vectors, and matrices in the main text.

Scalars, vectors, and matrices in the main text are listed with their sizes and descriptions. $m$ is the number of conditions, and $n$ is the number of protein species ($m=15$ and $n=2058$ in the main text). Note that the notation summarized in this table differs in some respects from that in Materials and methods and Appendix.

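Symbol | Size (#rows × #columns) | Description
$\hat{\mathbf{r}}_j$ ($j=1,\ldots,m$) | $(m-1)\times 1$ (vector) | Mean LDA Raman profile of single cells under condition $j$
$\hat{\mathbf{p}}_j$ ($j=1,\ldots,m$) | $n\times 1$ (vector) | Proteome profile of the cell population under condition $j$
$B=[\mathbf{b}_0\ \cdots\ \mathbf{b}_{m-1}]=(b_{ik})_{1\le i\le n,\,0\le k\le m-1}$ | $n\times m$ | Set of condition-independent coefficients that linearly connect $\hat{\mathbf{r}}_j$ and $\hat{\mathbf{p}}_j$ for all conditions $j$ (Equation 1)
$\mathbf{p}_i$ ($i=1,\ldots,n$) | $m\times 1$ (vector) | Expression levels of protein species $i$ across $m$ conditions
$\cos\theta_{\mathbf{p}_i\mathbf{p}_j}=(\mathbf{p}_i/\|\mathbf{p}_i\|_2)\cdot(\mathbf{p}_j/\|\mathbf{p}_j\|_2)$ ($i,j=1,\ldots,n$) | $1\times 1$ (scalar) | Stoichiometry (abundance ratio) conservation strength between two protein species $i$ and $j$ (Figure 4A)
$A=(\cos\theta_{\mathbf{p}_i\mathbf{p}_j})_{1\le i,j\le n}$ | $n\times n$ | Set of stoichiometry conservation strengths between all pairs of protein species (Figure 5J)
$d_i=\sum_{j=1}^{n}\cos\theta_{\mathbf{p}_i\mathbf{p}_j}$ ($i=1,\ldots,n$) | $1\times 1$ (scalar) | Stoichiometry conservation centrality of protein species $i$
$g_i=\|\mathbf{p}_i\|_1/\|\mathbf{p}_i\|_2$ ($i=1,\ldots,n$) | $1\times 1$ (scalar) | Expression generality of protein species $i$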
Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Chemical compound, drug | Difco LB Broth, Miller (Luria-Bertani) | Becton, Dickinson and Company
Chemical compound, drug | Bacto Yeast Extract | Becton, Dickinson and Company
Chemical compound, drug | Bacto Tryptone | Becton, Dickinson and Company
Chemical compound, drug | Sodium Chloride | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Disodium Hydrogenphosphate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Potassium Dihydrogenphosphate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Ammonium Sulfate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Zinc Sulfate Heptahydrate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Copper(II) Chloride Dihydrate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Manganese(II) Sulfate Pentahydrate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Cobalt(II) Chloride Hexahydrate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Calcium Chloride Dihydrate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Magnesium Sulfate Heptahydrate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Thiamin Hydrochloride | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Iron(III) Chloride Hexahydrate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Sodium Acetate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Disodium Fumarate | FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug | D-Galactose | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | D-Glucose | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Glycerol | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | D-Fructose | FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug | D-Mannose | FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug | D-Xylose | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Alanine | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Asparagine Monohydrate | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Cysteine | FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug | L-Glutamic acid | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Glutamine | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Glycine | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Histidine | FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug | L-Isoleucine | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Phenylalanine | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Proline | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Serine | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Adenine | FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug | L-Arginine | FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug | L-Aspartic acid | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Leucine | FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug | L-Lysine | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Methionine | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Threonine | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Tryptophan | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Tyrosine | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | L-Valine | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Uracil | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | 8 mol/L Sodium Hydroxide Solution | Wako Pure Chemical Industries, Ltd., FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug | 35–37% (mass/mass) Hydrochloric Acid | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | 0.1 mol/L Hydrochloric Acid | Wako Pure Chemical Industries, Ltd.
Chemical compound, drug | Agar | Wako Pure Chemical Industries, Ltd., FUJIFILM Wako Pure Chemical Corporation
Strain, strain background (Escherichia coli) | BW25113 | Wakamoto Laboratory stock
Strain, strain background (Escherichia coli) | MG1655 | Wakamoto Laboratory stock
Strain, strain background (Escherichia coli) | NCM3722 | Coli Genetic Stock Center
Appendix 1—table 1
List of culture conditions.

M9 m.m. and a.a. in this table are the abbreviations for M9 minimal media and amino acids, respectively.

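Phase | Overview of composition | Temperature | pH | Name in this paper
Exponential | M9 m.m. + acetate | 37°C | 7 | Acetate
Exponential | M9 m.m. + fructose | 37°C | 7 | Fructose
Exponential | M9 m.m. + fumarate | 37°C | 7 | Fumarate
Exponential | M9 m.m. + galactose | 37°C | 7 | Galactose
Exponential | M9 m.m. + glucose | 37°C | 7 | Glucose
Exponential | M9 m.m. + glucose | 42°C | 7 | Glucose42C
Exponential | M9 m.m. + glucose | 37°C | 6 | GlucosepH6
Exponential | M9 m.m. + glycerol | 37°C | 7 | Glycerol
Exponential | M9 m.m. + glycerol + a.a. | 37°C | 7 | GlycerolAA
Exponential | M9 m.m. + glucose + NaCl | 37°C | 7 | OsmoticStressGlucose
Exponential | M9 m.m. + mannose | 37°C | 7 | Mannose
Exponential | M9 m.m. + xylose | 37°C | 7 | Xylose
Exponential | LB | 37°C | 7 | LB
Stationary for 1 day | M9 m.m. + glucose | 37°C | 7 | stationary1day
Stationary for 3 days | M9 m.m. + glucose | 37°C | 7 | stationary3days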
Appendix 1—table 2
Evaluation of the overall estimation error with various distance measures (the case where LDA1 to LDA4 axes were used).

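The sum of estimation errors $\sum_i\mathrm{dist}(\hat{\mathbf{p}}_i,\hat{\mathbf{p}}_i^{\mathrm{est}})$ was calculated, and a permutation test ($10^5$ permutations) was conducted. In this table, the LDA1 to LDA4 axes were used. $\bar{\mathbf{x}}$ represents a vector all of whose elements equal the mean of the elements of $\mathbf{x}$. $x_j$ is the $j$-th element of $\mathbf{x}$. $\mathrm{median}_j\,x_j$ represents the median of the scalars $x_j$.

Metric | Definition of $\mathrm{dist}(\mathbf{x},\mathbf{y})$ | $\sum_i\mathrm{dist}(\hat{\mathbf{p}}_i,\hat{\mathbf{p}}_i^{\mathrm{est}})$ | p-value
Square of L2 norm (PRESS) | $\|\mathbf{x}-\mathbf{y}\|_2^2=\sum_j(\mathbf{x}-\mathbf{y})_j^2$ | $2.34\times10^{3}$ | 0.00005
L1 norm | $\|\mathbf{x}-\mathbf{y}\|_1=\sum_j|(\mathbf{x}-\mathbf{y})_j|$ | $1.40\times10^{3}$ | 0.00002
Cosine distance | $1-\dfrac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{x}\|_2\|\mathbf{y}\|_2}$ | 1.52 | 0.0014
1 − Pearson correlation coefficient | $1-\dfrac{(\mathbf{x}-\bar{\mathbf{x}})\cdot(\mathbf{y}-\bar{\mathbf{y}})}{\|\mathbf{x}-\bar{\mathbf{x}}\|_2\|\mathbf{y}-\bar{\mathbf{y}}\|_2}$ | 1.57 | 0.0012
Median of relative error | $\mathrm{median}_j\,\dfrac{|(\mathbf{x}-\mathbf{y})_j|}{x_j+1}$ | 0.0536 | 0.00022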
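The five distance measures in this table translate directly into code. The sketch below assumes that $\mathbf{x}$ is the measured proteome $\hat{\mathbf{p}}_i$ and $\mathbf{y}$ its estimate, both as NumPy vectors, and that proteomes are stored column-wise (one column per condition); the permutation test itself is not reproduced, and the function names are illustrative.

```python
import numpy as np


def sq_l2(x, y):
    """Square of the L2 norm (the PRESS term)."""
    return float(np.sum((x - y) ** 2))


def l1(x, y):
    return float(np.sum(np.abs(x - y)))


def cosine_distance(x, y):
    return 1.0 - float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))


def one_minus_pearson(x, y):
    xc, yc = x - x.mean(), y - y.mean()  # subtract the mean vectors x-bar, y-bar
    return 1.0 - float(xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))


def median_relative_error(x, y):
    """Median over elements of |x_j - y_j| / (x_j + 1); x is the measured vector."""
    return float(np.median(np.abs(x - y) / (x + 1.0)))


def total_error(P_obs, P_est, dist):
    """Sum over conditions i of dist(p_i, p_i^est); columns are conditions."""
    return sum(dist(P_obs[:, i], P_est[:, i]) for i in range(P_obs.shape[1]))
```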
Appendix 1—table 3
Evaluation of the overall estimation error with various distance measures (the case where all the 14 LDA axes were used).

The results obtained by using all the 14 LDA axes are presented. See Appendix 1—table 2 for notations. Note that the system is underdetermined in this case; thus, we adopted the minimum-norm solution from among all least-squares solutions.

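Metric | Definition of $\mathrm{dist}(\mathbf{x},\mathbf{y})$ | $\sum_i\mathrm{dist}(\hat{\mathbf{p}}_i,\hat{\mathbf{p}}_i^{\mathrm{est}})$ | p-value
Square of L2 norm (PRESS) | $\|\mathbf{x}-\mathbf{y}\|_2^2=\sum_j(\mathbf{x}-\mathbf{y})_j^2$ | $1.63\times10^{3}$ | 0.0019
L1 norm | $\|\mathbf{x}-\mathbf{y}\|_1=\sum_j|(\mathbf{x}-\mathbf{y})_j|$ | $1.19\times10^{3}$ | 0.00066
Cosine distance | $1-\dfrac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{x}\|_2\|\mathbf{y}\|_2}$ | 1.18 | 0.0879
1 − Pearson correlation coefficient | $1-\dfrac{(\mathbf{x}-\bar{\mathbf{x}})\cdot(\mathbf{y}-\bar{\mathbf{y}})}{\|\mathbf{x}-\bar{\mathbf{x}}\|_2\|\mathbf{y}-\bar{\mathbf{y}}\|_2}$ | 1.23 | 0.085
Median of relative error | $\mathrm{median}_j\,\dfrac{|(\mathbf{x}-\mathbf{y})_j|}{x_j+1}$ | 0.0418 | 0.00082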
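The minimum-norm choice mentioned in the caption is what NumPy's `numpy.linalg.lstsq` returns when the design matrix is rank deficient. The sketch below illustrates a leave-one-condition-out estimate under two assumptions that are not taken from the paper: the LDA coordinates of the condition means are held fixed across folds, and each fold fits an intercept plus the first `k` LDA axes. With all 14 axes, each fold has 14 training conditions but 15 coefficients per protein, so the system is underdetermined. The function name is hypothetical.

```python
import numpy as np


def loo_estimate(R, P, k):
    """Leave-one-condition-out proteome estimates from mean LDA Raman coordinates.

    R : (m, m-1) mean LDA coordinates, one row per condition (assumed fixed).
    P : (n, m) proteomes, one column per condition.
    k : number of leading LDA axes used as predictors.
    """
    m = R.shape[0]
    X = np.hstack([np.ones((m, 1)), R[:, :k]])  # intercept + k LDA axes
    P_est = np.empty(P.shape, dtype=float)
    for i in range(m):
        train = np.arange(m) != i
        # With k = m - 1 there are m - 1 equations for m unknowns per protein;
        # lstsq then returns the minimum-norm least-squares solution.
        coef, *_ = np.linalg.lstsq(X[train], P[:, train].T, rcond=None)
        P_est[:, i] = X[i] @ coef
    return P_est
```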
Appendix 1—table 4
Gene list of SCG 1 (homeostatic core).

Members of the homeostatic core (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name | Description
rpoC | DNA-directed RNA polymerase subunit beta’
rpoB | DNA-directed RNA polymerase subunit beta
tufA | Elongation factor Tu 1
infB | Translation initiation factor IF-2
fusA | Elongation factor G
glyS | Glycyl-tRNA synthetase beta subunit
rpsA | 30S ribosomal protein S1
leuS | Leucyl-tRNA synthetase
pheT | Phenylalanyl-tRNA synthetase beta chain
aspS | Aspartyl-tRNA synthetase
valS | Valyl-tRNA synthetase
secA | Protein translocase subunit SecA
gyrA | DNA gyrase subunit A
pepN | Aminopeptidase N
tsf | Elongation factor Ts
tig | Trigger factor
pta | Phosphate acetyltransferase
bamA | Outer membrane protein assembly factor YaeT
rne | Ribonuclease E
ftsZ | Cell division protein FtsZ
gyrB | DNA gyrase subunit B
polA | DNA polymerase I
rplB | 50S ribosomal protein L2
prlC | Oligopeptidase A
rho | Transcription termination factor Rho
ftsH | ATP-dependent zinc metalloprotease FtsH
nusA | Transcription elongation protein NusA
lysS | Lysyl-tRNA synthetase
metG | Methionyl-tRNA synthetase
glnS | Glutaminyl-tRNA synthetase
lpdA | Dihydrolipoyl dehydrogenase
serS | Seryl-tRNA synthetase
surA | Chaperone SurA
rpsB | 30S ribosomal protein S2
gltX | Glutamyl-tRNA synthetase
lptD | LPS-assembly protein LptD
argS | Arginyl-tRNA synthetase
fabB | 3-Oxoacyl-[acyl-carrier-protein] synthase 1
pheS | Phenylalanyl-tRNA synthetase alpha chain
clpX | ATP-dependent Clp protease ATP-binding subunit ClpX
accC | Biotin carboxylase
pyrG | CTP synthase
tolC | Outer membrane protein TolC
rplE | 50S ribosomal protein L5
accA | Acetyl-coenzyme A carboxylase carboxyl transferase subunit alpha
hflK | Modulator of FtsH protease HflK
pdxB | Erythronate-4-phosphate dehydrogenase
ygfZ | tRNA-modifying protein YgfZ
pmbA | Protein PmbA
rplA | 50S ribosomal protein L1
hldD | ADP-L-glycero-D-manno-heptose-6-epimerase
mreB | Rod shape-determining protein MreB
acrA | Acriflavine resistance protein A
gor | Glutathione reductase
hisS | Histidyl-tRNA synthetase
rpsC | 30S ribosomal protein S3
glmM | Phosphoglucosamine mutase
lepA | Elongation factor 4
ffh | Signal recognition particle protein
secD | Protein-export membrane protein SecD
lpoA | Penicillin-binding protein activator LpoA
rhlB | ATP-dependent RNA helicase RhlB
rpsG | 30S ribosomal protein S7
rpsD | 30S ribosomal protein S4
minD | Septum site-determining protein MinD
cyoA | Ubiquinol oxidase subunit 2
mdoG | Glucans biosynthesis protein G
rplC | 50S ribosomal protein L3
glmU | Bifunctional protein GlmU
rpsF | 30S ribosomal protein S6
rpsE | 30S ribosomal protein S5
hemL | Glutamate-1-semialdehyde 2,1-aminomutase
hldE | Bifunctional protein HldE
ubiE | Ubiquinone/menaquinone biosynthesis methyltransferase UbiE
sspA | Stringent starvation protein A
nusG | Transcription antitermination protein NusG
prfB | Peptide chain release factor 2
dacA | D-alanyl-D-alanine carboxypeptidase DacA
rplF | 50S ribosomal protein L6
fabG | 3-Oxoacyl-[acyl-carrier-protein] reductase
ftsY | Cell division protein FtsY
dcrB | Protein DcrB
mlaC | Probable phospholipid-binding protein MlaC
hflC | Modulator of FtsH protease HflC
coaB | Coenzyme A biosynthesis bifunctional protein CoaBC
ybiT | Uncharacterized ABC transporter ATP-binding protein YbiT
oxyR | Hydrogen peroxide-inducible genes activator
rpsH | 30S ribosomal protein S8
fkpA | FKBP-type peptidyl-prolyl cis-trans isomerase FkpA
frr | Ribosome-recycling factor
fabD | Malonyl CoA-acyl carrier protein transacylase
hslO | 33 kDa chaperonin
ybeZ | PhoH-like protein
hemX | Putative uroporphyrinogen-III C-methyltransferase
rplY | 50S ribosomal protein L25
rplK | 50S ribosomal protein L11
rpsI | 30S ribosomal protein S9
bamB | Lipoprotein YfgL
bamD | UPF0169 lipoprotein YfiO
kdgR | Transcriptional regulator KdgR
glnD | [Protein-PII] uridylyltransferase
yniC | Phosphatase YniC
rpsJ | 30S ribosomal protein S10
rplX | 50S ribosomal protein L24
rplD | 50S ribosomal protein L4
rplQ | 50S ribosomal protein L17
ppa | Inorganic pyrophosphatase
rpsM | 30S ribosomal protein S13
rplN | 50S ribosomal protein L14
ybaB | UPF0133 protein YbaB
yidC | Inner membrane protein OxaA
lptB | Lipopolysaccharide export system ATP-binding protein LptB
suhB | Inositol-1-monophosphatase
yejK | Nucleoid-associated protein YejK
ghrA | Glyoxylate/hydroxypyruvate reductase A
rsmI | Ribosomal RNA small subunit methyltransferase I
hemY | Protein HemY
uup | ABC transporter ATP-binding protein Uup
hrpA | ATP-dependent RNA helicase HrpA
rplJ | 50S ribosomal protein L10
rplM | 50S ribosomal protein L13
fur | Ferric uptake regulation protein
rplS | 50S ribosomal protein L19
rcsB | Capsular synthesis regulator component B
mrp | Protein Mrp
glyQ | Glycyl-tRNA synthetase alpha subunit
greA | Transcription elongation factor GreA
nrdB | Ribonucleoside-diphosphate reductase 1 subunit beta
wbbI | Uncharacterized protein YefG
udk | Uridine kinase
mnmG | tRNA uridine 5-carboxymethylaminomethyl modification enzyme MnmG
rplL | 50S ribosomal protein L7/L12
rplI | 50S ribosomal protein L9
rpoZ | DNA-directed RNA polymerase subunit omega
ybbN | Uncharacterized protein YbbN
yfiF | Uncharacterized tRNA/rRNA methyltransferase YfiF
yedD | Uncharacterized lipoprotein YedD
rpmD | 50S ribosomal protein L30
tatB | Sec-independent protein translocase protein TatB
yfgM | UPF0070 protein YfgM
kdsB | 3-Deoxy-manno-octulosonate cytidylyltransferase
rpoN | RNA polymerase sigma-54 factor
fdx | 2Fe-2S ferredoxin
rplV | 50S ribosomal protein L22
rplO | 50S ribosomal protein L15
fabZ | (3R)-hydroxymyristoyl-[acyl-carrier-protein] dehydratase
mipA | MltA-interacting protein
ssb | Single-stranded DNA-binding protein
yiaF | Uncharacterized protein YiaF
secY | Preprotein translocase subunit SecY
rbfA | Ribosome-binding factor A
potA | Spermidine/putrescine import ATP-binding protein PotA
rimM | Ribosome maturation factor RimM
trxA | Thioredoxin-1
rpsS | 30S ribosomal protein S19
rpsU | 30S ribosomal protein S21
accB | Biotin carboxyl carrier protein of acetyl-CoA carboxylase
engB | Probable GTP-binding protein EngB
tatA | Sec-independent protein translocase protein TatA
rfbD | dTDP-4-dehydrorhamnose reductase
ribF | Riboflavin biosynthesis protein RibF
folP | Dihydropteroate synthase
lepB | Signal peptidase I
sspB | Stringent starvation protein B
hupA | DNA-binding protein HU-alpha
rpsP | 30S ribosomal protein S16
rplP | 50S ribosomal protein L16
rpsT | 30S ribosomal protein S20
rpsK | 30S ribosomal protein S11
rplU | 50S ribosomal protein L21
rplR | 50S ribosomal protein L18
lpxA | Acyl-[acyl-carrier-protein]–UDP-N-acetylglucosamine O-acyltransferase
yceD | Uncharacterized protein YceD
queC | 7-Cyano-7-deazaguanine synthase
rpmA | 50S ribosomal protein L27
rpmG | 50S ribosomal protein L33
rpmF | 50S ribosomal protein L32
rpsN | 30S ribosomal protein S14
rplT | 50S ribosomal protein L20
nudK | GDP-mannose pyrophosphatase NudK
rplW | 50S ribosomal protein L23
trmB | tRNA (guanine-N(7)-)-methyltransferase
rluB | Ribosomal large subunit pseudouridine synthase B
rpsR | 30S ribosomal protein S18
secG | Protein-export membrane protein SecG
rlmE | Ribosomal RNA large subunit methyltransferase E
yfaY | CinA-like protein
trmA | tRNA (uracil-5-)-methyltransferase
rpmH | 50S ribosomal protein L34
yajC | UPF0092 membrane protein YajC
yheU | UPF0270 protein YheU
Appendix 1—table 5
Gene list of SCG 2.

Members in SCG 2 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name | Description
fdoG | Formate dehydrogenase-O major subunit
dsdA | D-serine dehydratase
treC | Trehalose-6-phosphate hydrolase
sdaB | L-serine dehydratase 2
nanA | N-acetylneuraminate lyase
garD | D-galactarate dehydratase
proV | Glycine betaine/L-proline transport ATP-binding protein ProV
garR | 2-Hydroxy-3-oxopropionate reductase
nanK | N-acetylmannosamine kinase
fdoH | Formate dehydrogenase-O iron-sulfur subunit
aphA | Class B acid phosphatase
nanE | Putative N-acetylmannosamine-6-phosphate 2-epimerase
srlB | Glucitol/sorbitol-specific phosphotransferase enzyme IIA component
ibpB | Small heat shock protein IbpB
hybC | Hydrogenase-2 large chain
proW | Glycine betaine/L-proline transport system permease protein ProW
srlE | Glucitol/sorbitol-specific phosphotransferase enzyme IIB component
fdoI | Formate dehydrogenase, cytochrome b556(fdo) subunit
preT | Uncharacterized oxidoreductase YeiT
garL | 5-Keto-4-deoxy-D-glucarate aldolase
paaB | Phenylacetic acid degradation protein PaaB
paaK | Phenylacetate-coenzyme A ligase
paaE | Probable phenylacetic acid degradation NADH oxidoreductase PaaE
ykgE | Uncharacterized protein YkgE
ybjT | Uncharacterized protein YbjT
ykgG | Uncharacterized protein YkgG
Appendix 1—table 6
Gene list of SCG 3.

Members in SCG 3 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name | Description
wzc | Tyrosine-protein kinase Wzc
amiC | N-acetylmuramoyl-L-alanine amidase AmiC
Appendix 1—table 7
Gene list of SCG 4.

Members in SCG 4 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name | Description
fruB | Multiphosphoryl transfer protein
fruK | 1-Phosphofructokinase
fruA | PTS system fructose-specific EIIBC component
narI | Respiratory nitrate reductase 1 gamma chain
Appendix 1—table 8
Gene list of SCG 5.

Members in SCG 5 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name | Description
hdeB | Protein HdeB
hdeA | Chaperone-like protein HdeA
Appendix 1—table 9
Interpretations of $\mathbf{r}_h$, $\hat{\mathbf{r}}_i$, $\mathbf{b}_h$, and $\hat{\mathbf{b}}_j$.

Interpretations of the columns and rows of $R_E$ and $B_E$ are summarized.

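Matrix | Vector | Dimension | Description
$R_E$ | Column $\mathbf{r}_h$ ($h=0,\ldots,m-1$) | $m$ | List of the $h$-th LDA coordinates of the mean LDA Raman profiles of all the conditions
$R_E$ | Row $\hat{\mathbf{r}}_i$ ($i=1,\ldots,m$) | $m$ | Mean LDA Raman profile of condition $i$
$B_E$ | Column $\mathbf{b}_h$ ($h=0,\ldots,m-1$) | $n$ | List of the coefficients of all the proteins for the $h$-th LDA axis
$B_E$ | Row $\hat{\mathbf{b}}_j$ ($j=1,\ldots,n$) | $m$ | Coefficients for protein $j$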
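The row/column reading of $R_E$ and $B_E$ can be made concrete with a small sketch. It assumes, for illustration only, that $R_E$ is formed by prepending a column of ones to the $m\times(m-1)$ matrix of mean LDA coordinates (so that $\mathbf{r}_0=\mathbf{1}$ pairs with the constant term $\mathbf{b}_0$) and that the $m$ proteomes are fitted exactly, so $B_E$ solves $P=B_ER_E^{\top}$ with $P$ stored as $n\times m$. Under these assumptions, if the LDA coordinates are centered across conditions, the first row of $R_E^{-1}$ is $\mathbf{1}^{\top}/m$ and $\mathbf{b}_0$ equals the mean proteome; this is a property of the sketch, not a statement about the paper's preprocessing. The function name is hypothetical.

```python
import numpy as np


def raman_proteome_coefficients(R, P):
    """Solve P = B_E @ R_E.T exactly for B_E (assumes R_E is invertible).

    R : (m, m-1) mean LDA Raman coordinates, one row per condition.
    P : (n, m) proteomes, one column per condition.
    Returns R_E (m x m; row i is hat r_i, column h is r_h) and
    B_E (n x m; row j is hat b_j, column h is b_h).
    """
    m = R.shape[0]
    R_E = np.hstack([np.ones((m, 1)), R])  # assumed: r_0 is a column of ones
    B_E = np.linalg.solve(R_E, P.T).T      # R_E @ B_E.T = P.T
    return R_E, B_E
```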
Appendix 1—table 10
Mathematical relation between Raman-proteome coefficients and cosine similarity LE (csLE) proteomes.

The matrices in the left-hand side of Equation 2.138 (a proteome structure based on Raman-proteome coefficients) and their counterparts in the right-hand side of Equation 2.138 (a proteome structure obtained with csLE) are listed.

Raman-omicscoef. structurecsLESize and type of matrixDescription
BEnorm(i=1ndi)1/2V~rw(=BEest,norm)n×m matrixCoefficients normalized by constants
IΘm×m orthogonal matrixOrthogonal transformation
m1/2diag((1m)P)(=m1/2diag(b0))(i=1ndi)1/2diag(PP)1/2D(=diag(b0est))n×n diagonal matrixConstant terms
ΣREnormΣLEm×m diagonal matrixSingular values

Additional files


Ken-ichiro F Kamei, Koseki J Kobayashi-Kirschvink, Takashi Nozoe, Hidenori Nakaoka, Miki Umetani, Yuichi Wakamoto (2026) Revealing global stoichiometry conservation architecture in cells from Raman spectral patterns. eLife 14:RP101485. https://doi.org/10.7554/eLife.101485.3