Research Article

Revealing global stoichiometry conservation architecture in cells from Raman spectral patterns

Department of Basic Science, Graduate School of Arts and Sciences, The University of Tokyo, Japan
Department of Medicine, The University of Chicago, United States
Research Center for Complex Systems Biology, The University of Tokyo, Japan
Universal Biology Institute, The University of Tokyo, Japan
Department of Optical Imaging, Advanced Research Promotion Center Tokushima University, Japan
Department of Biology, New York University, United States

Apr 14, 2026

https://doi.org/10.7554/eLife.101485.3

Version of Record: April 14, 2026
Reviewed Preprint: v2 November 27, 2025
Reviewed Preprint: v1 March 24, 2025

eLife Assessment

This paper reports the fundamental finding of how Raman spectral patterns correlate with proteome profiles. Using Raman spectra of E. coli cells from different physiological conditions, the authors found global stoichiometric regulation of proteomes. Their findings provide compelling evidence, through analysis of both bacterial and human cells, that stoichiometric regulation of proteomes is general. In the future, a similar methodology can be applied to various tissue types and microbial species to study proteome composition with Raman spectral patterns.

https://doi.org/10.7554/eLife.101485.3.sa0

Significance of the findings:

Fundamental: Findings that substantially advance our understanding of major research questions

Strength of evidence:

Compelling: Evidence that features methods, data and analyses more rigorous than the current state-of-the-art

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Cells can adapt to various environments by changing their biomolecular profiles while maintaining physiological homeostasis. What organizational principles in cells enable the simultaneous realization of adaptability and homeostasis? To address this question, we measure Raman-scattered light from Escherichia coli cells under diverse conditions; the resulting spectral patterns convey the comprehensive molecular composition of the cells. We reveal that dimension-reduced Raman spectra can predict condition-dependent proteome profiles. Quantitative analysis of the Raman-proteome correspondence characterizes a low-dimensional hierarchical stoichiometry-conserving proteome structure. The network centrality of each gene in the stoichiometry conservation relations correlates with its essentiality and evolutionary conservation, and these correlations are preserved from bacteria to human cells. Furthermore, stoichiometry-conserving core components obey the growth law and ensure homeostasis across conditions, whereas peripheral stoichiometry-conserving components enable adaptation to specific conditions. Mathematical analysis reveals that the stoichiometrically constrained architecture is reflected in major changes in Raman spectral patterns. These results uncover coordination of global stoichiometric balance in cells and demonstrate that vibrational spectroscopy can decipher such biological constraints beyond statistical or machine-learning inference of cellular states.

Introduction

Biological cells can change their gene expression and metabolic profiles globally to adapt to their biological contexts and external conditions, while maintaining the homeostasis of their core physiological states. The simultaneous realization of adaptability and homeostasis is a hallmark of biological systems and is assumed to be a system-level property of gene expression profiles in cells (Waddington, 1957; Waddington, 1959). However, understanding the underlying organizational principles in comprehensive gene expression profiles remains a fundamental problem in biology.

Vibrational spectroscopy such as Raman spectroscopy might help us investigate such principles in gene expression profiles. Raman spectroscopy is a light scattering technique that measures energy shifts of light caused by interaction with sample molecules. Raman spectra are obtainable non-destructively even from biological samples such as individual cells. In principle, cellular Raman spectra are optical signatures conveying comprehensive molecular composition of targeted cells (Goodacre et al., 1998; Huang et al., 2004; Ichimura et al., 2014; Germond et al., 2018). Furthermore, no prior treatments, such as staining and tagging, are necessary to obtain cellular Raman spectra. However, although some biomolecules have separable and intense Raman signal peaks, Raman spectra of most biomolecules overlap and are masked by signals of other molecules due to the diversity and complexity of molecular compositions of cells. Therefore, it is impractical to comprehensively determine the amounts of biomolecules by spectral decomposition.

Despite the intractability of spectral decomposition, reconstruction of comprehensive molecular profiles may be achievable by analyzing detectable global spectral patterns (Figure 1A), thanks to effective low dimensionality of changes in molecular profile of targeted cells (Eisen et al., 1998; Segal et al., 2003; Bergmann et al., 2003; Keren et al., 2013; You et al., 2013; Kaneko et al., 2015; Hui et al., 2015; Heimberg et al., 2016; Biswas et al., 2017; Husain and Murugan, 2020; Sato and Kaneko, 2020; Figure 1B and Appendix 1—figure 1). Indeed, it has been demonstrated that condition-dependent global transcriptome profiles of cells can be inferred from cellular Raman spectra based on their statistical correspondence (Kobayashi-Kirschvink et al., 2018; Kobayashi-Kirschvink et al., 2024). Importantly, this Raman-spectroscopic transcriptome inference was possible from dimension-reduced Raman spectra. Therefore, dominant changes in global Raman spectral patterns may contain vital information about the constraints on the molecular profiles in cells; an inspection of their correspondence might give us insights into architectural principles of omics profiles and biological foundation for global omics inference from spectral patterns (Appendix 1—figure 1).

Figure 1

Cellular physiological state differences detected by Raman spectral global patterns and gene expression profiles.

(A) Condition-dependent cellular Raman spectral patterns. Raman spectra obtained from cells reflect their molecular profiles. Therefore, systematic differences in global spectral patterns may indicate their physiological states. A Raman spectrum from each cell can be represented as a vector and a point in a high-dimensional Raman space. If condition-dependent differences exist in the spectral patterns, appropriate dimensional reduction methods allow us to classify the spectra and detect cellular physiological states in a low-dimensional space. (B) Condition-dependent gene expression profiles. Global gene expression profiles (proteomes and transcriptomes) are also dependent on conditions. For each gene, we can consider a high-dimensional vector whose elements represent expression levels under different conditions. It has been suggested that these expression-level vectors are constrained to some low-dimensional manifolds (Eisen et al., 1998; Segal et al., 2003; Bergmann et al., 2003; Keren et al., 2013; You et al., 2013; Kaneko et al., 2015; Hui et al., 2015; Heimberg et al., 2016; Biswas et al., 2017; Husain and Murugan, 2020; Sato and Kaneko, 2020). This study characterizes the statistical correspondence between dimension-reduced Raman spectral patterns and gene expression profiles. Analyzing the correspondence, we reveal a stoichiometry conservation principle that constrains gene expression profiles to low-dimensional manifolds.

In this report, we first reveal that, in addition to transcriptomes, condition-dependent proteome profiles of Escherichia coli are predictable from cellular Raman spectra. Next, we scrutinize the correspondence between Raman and proteome data, identifying several stoichiometrically conserved groups (SCGs) whose expression tightly correlates with the major changes in cellular Raman spectra. Finally, we reveal that the stoichiometry conservation centrality of each gene correlates with its essentiality, evolutionary conservation, and condition specificity of gene expression levels, a relationship that turns out to be general across different omics layers and organisms.

Results

Statistical correspondence between Raman spectra and proteomes

To examine the correspondence between Raman spectra and proteomes in E. coli, we reproduced 15 environmental conditions for which absolute quantitative proteome data are already available (Schmidt et al., 2016) and measured Raman spectra of E. coli cells under those conditions (Figure 2A and B). The culture conditions we adopted include (i) exponential growth phase in minimal media with various carbon sources, (ii) exponential growth phase in rich media, (iii) exponential growth phase with various stressors, and (iv) stationary phases (Appendix 1—table 1). We measured Raman spectra of single cells sampled from each condition and focused on the fingerprint region of biological samples, where the signals from various biomolecules concentrate (spectral range of 700–1800 cm⁻¹, Figure 2B and Appendix 1—figure 2). The Raman spectra were classified on the basis of the environmental conditions using a simple linear classifier, linear discriminant analysis (LDA) (Goodacre et al., 1998; Huang et al., 2004; De Bie et al., 2005; Figure 2C–E and Appendix 1—figure 1). This classifier calculates the most discriminatory axes by maximizing the ratio of between-condition variance to within-condition variance and reduces the dimensions of Raman data to $m - 1$ , where $m = 15$ is the number of conditions (see ‘Experimental methods, data acquisition, and data analyses’ in Materials and methods and Section 2.1 in Appendix).
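The variance-ratio criterion described above can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the function name `lda_project`, the synthetic data, and the use of a pseudo-inverse of the within-condition scatter matrix are assumptions made for illustration only. Real spectra have many more features than this toy example, so additional preprocessing may be needed in practice.

```python
import numpy as np

def lda_project(X, labels):
    """Project rows of X onto the (m - 1) most discriminatory axes.

    X: (num_cells, num_features) array of spectra (hypothetical input).
    labels: (num_cells,) array giving the condition index of each cell.
    """
    classes = np.unique(labels)
    m = len(classes)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))  # within-condition scatter
    S_b = np.zeros((d, d))  # between-condition scatter
    for c in classes:
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    # Axes maximizing between/within variance are the leading
    # eigenvectors of pinv(S_w) @ S_b; at most m - 1 are informative.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][: m - 1]
    W = eigvecs[:, order].real
    return (X - mean_all) @ W

# Toy data: 3 conditions, 30 "cells" each, 5 spectral features.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0, 0], [5, 0, 0, 0, 0], [0, 5, 0, 0, 0]], dtype=float)
labels = np.repeat(np.arange(3), 30)
X = centers[labels] + rng.normal(size=(90, 5))
Z = lda_project(X, labels)
print(Z.shape)  # (90, 2), i.e. m - 1 = 2 axes for m = 3 conditions
```

With $m = 15$ conditions, the same construction yields the 14 LDA axes used in the main text.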

Figure 2

Estimation of proteomes from Raman spectra.

(A) The experimental design. We cultured *E. coli* cells under 15 different conditions and measured single cells’ Raman spectra. We then examined the correspondence between the measured Raman spectra and the absolute quantitative proteome data reported by Schmidt et al., 2016. (B) Representative Raman spectra from single cells, one from the ‘Glucose’ condition, and the other from the ‘LB’ condition. The fingerprint region and representative peaks are annotated. (**C–E**) Cellular Raman spectra in linear discriminant analysis (LDA) space. The dimensionality of the spectra is reduced to $14 (= 15 - 1)$ . Each point represents a spectrum from a single cell, and each ellipse shows the 95% concentration ellipse for each condition. Their projections to the LDA1-LDA2 plane (C), the LDA1-LDA3 plane (D), and the LDA1-LDA4 plane (E) are shown. (F) Visualization of the 14-dimensional LDA space embedded in two-dimensional space with t-distributed stochastic neighbor embedding (t-SNE). (G) The scheme of leave-one-out cross-validation. The Raman and proteome data of one condition (here $j$ ) are excluded, and the matrix $B$ is estimated using the data of the rest of the conditions as $B_{- j}^{e s t}$ . The proteome data under the condition $j$ is estimated from the Raman data ${\hat{𝒓}}_{j}$ with $B_{- j}^{est}$ and compared with the actual data to calculate estimation errors. (H) Comparison of measured and estimated proteome data. The plot for the ‘Glucose’ condition is shown as an example. Each dot corresponds to one protein species. The straight line indicates $x = y$ . Proteins with negative estimated values are not shown.

The result shows that Raman spectral points from different environmental conditions are distinguishable in the $(m - 1)$ -dimensional LDA space (Figure 2C–E). For example, the first and second LDA axes clearly distinguish the conditions ‘LB’ and ‘stationary3days’ (Figure 2C), and the third axis distinguishes ‘Glucose42C’ and ‘GlycerolAA’ (Figure 2D). Notably, the first LDA axis (LDA1) correlated significantly with growth rate (Pearson correlation $r = 0.81 \pm 0.09$ , Appendix 1—figure 2). Visualizing the Raman LDA data by embedding them on a two-dimensional plane using t-distributed stochastic neighbor embedding (t-SNE) (van der Maaten and Hinton, 2008) confirms that the points for each condition form a distinctive cluster (Figure 2F). These results imply that positions in the Raman LDA space reflect condition-dependent differences in cellular physiological states.

We next asked whether these Raman spectral differences in the LDA space could be linked to the different proteome profiles (Appendix 1—figure 1). To examine this, we hypothesized linear correspondence between the $n$ -dimensional proteome column vector ${\hat{p}}_{j}$ , where $n = 2058$ is the number of protein species in the proteome data, and the low-dimensional ( $(m - 1)$ -dimensional) Raman column vector ${\hat{𝒓}}_{j}$ in condition $j$ ,

$${\hat{p}}_{j} = B \cdot [\begin{matrix} 1 \\ {\hat{r}}_{j} \end{matrix}] . \qquad (1)$$

$B$ is an $n \times m$ matrix that connects ${\hat{𝒑}}_{j}$ and ${\hat{𝒓}}_{j}$ . We calculated ${\hat{𝒓}}_{j}$ as the average of the low-dimensional LDA Raman data of single cells in condition $j$ since the proteomes were measured for cell populations (Table 1).

Table 1

List of scalars, vectors, and matrices in the main text.

Scalars, vectors, and matrices in the main text are listed with their sizes and descriptions. $m$ is the number of conditions, and $n$ is the number of protein species. ( $m = 15$ and $n = 2058$ in the main text.) Note that the notation summarized in this table differs in some respects from that in Materials and methods and Appendix.

Symbol	Size (#rows × #columns)	Description
${\hat{r}}_{j}$ $(j = 1, \dots, m)$	$(m - 1) \times 1$ (vector)	Mean LDA Raman profile of single cells under condition $j$
${\hat{p}}_{j}$ $(j = 1, \dots, m)$	$n \times 1$ (vector)	Proteome profile of cell population under condition $j$
$B$ $= [\begin{matrix} b_{0} & \dots & b_{m - 1} \end{matrix}]$ $= (b_{i k})_{1 \leq i \leq n, 0 \leq k \leq m - 1}$	$n \times m$	Set of condition-independent coefficients that linearly connect ${\hat{r}}_{j}$ and ${\hat{p}}_{j}$ for all conditions $j$ (Equation 1)
$p_{i}$ $(i = 1, \dots, n)$	$m \times 1$ (vector)	Expression levels of protein species $i$ across $m$ conditions
$\cos θ_{p_{i} p_{j}}$ $= (p_{i} / ‖ p_{i} ‖_{2}) \cdot (p_{j} / ‖ p_{j} ‖_{2})$ $(i, j = 1, \dots, n)$	$1 \times 1$ (scalar)	Stoichiometry (abundance ratio) conservation strength between two protein species $i$ and $j$ (Figure 4A)
$A = {(\cos θ_{p_{i} p_{j}})}_{1 \leq i, j \leq n}$	$n \times n$	Set of stoichiometry conservation strengths between all pairs of protein species (Figure 5J)
$d_{i} = Σ_{j = 1}^{n} \cos θ_{𝒑_{i} 𝒑_{j}}$ $(i = 1, \dots, n)$	$1 \times 1$ (scalar)	Stoichiometry conservation centrality of protein species $i$
$g_{i} = ‖ p_{i} ‖_{1} / ‖ p_{i} ‖_{2}$ $(i = 1, \dots, n)$	$1 \times 1$ (scalar)	Expression generality of protein species $i$

We conducted leave-one-out cross-validation (LOOCV) to verify this linear correspondence (Figure 2G). We excluded one condition (here, $j$ ) as a test condition and estimated $B$ as $B_{- j}^{est}$ by simple ordinary least squares (OLS) regression using the data of the rest of the conditions. We thereby estimated the proteome in condition $j$ as ${\hat{p}}_{j}^{e s t} = B_{- j}^{e s t} \cdot [\begin{matrix} 1 \\ {\hat{r}}_{j} \end{matrix}]$ .
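The leave-one-out scheme can be sketched as follows, assuming the Raman means and proteomes are stored with one row per condition. The function name `loocv_estimate`, the array layout, and the synthetic data are hypothetical; only the procedure (hold out condition $j$, fit $B_{- j}^{est}$ by OLS, predict ${\hat{p}}_{j}$ from $[1, {\hat{r}}_{j}]$) follows the text.

```python
import numpy as np

def loocv_estimate(R, P):
    """Leave-one-out OLS estimation of proteomes from Raman coordinates.

    R: (m, k) mean LDA Raman coordinates, one row per condition.
    P: (m, n) proteome abundances, one row per condition.
    Returns an (m, n) array whose row j is estimated with condition j held out.
    """
    m = R.shape[0]
    X = np.hstack([np.ones((m, 1)), R])  # prepend the constant term: [1, r_j]
    P_est = np.empty(P.shape, dtype=float)
    for j in range(m):
        train = np.arange(m) != j
        # Solve X_train @ B^T ~ P_train in the least-squares sense;
        # B_T has shape (k + 1, n) and is the transpose of B_{-j}^{est}.
        B_T, *_ = np.linalg.lstsq(X[train], P[train], rcond=None)
        P_est[j] = X[j] @ B_T
    return P_est

# Synthetic example: 15 conditions, 4 LDA axes, 50 proteins.
rng = np.random.default_rng(0)
R = rng.normal(size=(15, 4))
B_true_T = rng.normal(size=(5, 50))
P = np.hstack([np.ones((15, 1)), R]) @ B_true_T + 0.05 * rng.normal(size=(15, 50))
P_est = loocv_estimate(R, P)
print(np.sum((P_est - P) ** 2))  # overall squared estimation error
```

Note that with 14 training conditions, a fit using all 14 LDA axes plus the constant term would have 15 unknowns per protein, so the example restricts itself to a small number of axes.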

The proteome profile estimated using the first four major LDA axes (LDA1–LDA4) agreed well with the actual proteome data under most conditions (Figure 2H and Appendix 1—figure 3; see ‘Raman-proteome statistical correspondence’ in Materials and methods for the estimation with all the LDA axes). Rotating the excluded condition, we estimated the proteomes for all 15 conditions and calculated the overall estimation error as the sum of squared Euclidean distances $\sum_{j} ‖ {\hat{p}}_{j}^{e s t} - {\hat{p}}_{j} ‖^{2}$ . The result shows that the overall estimation error is significantly small ( $p = 0.00005$ by permutation test; Fisher, 1935; Pitman, 1937; Phipson and Smyth, 2010). Adopting other distance measures does not change the conclusion (Appendix 1—tables 2 and 3). These results, therefore, validate the assumption of linear correspondence between cellular Raman spectra and proteomes and confirm that condition-dependent changes in proteomes can be inferred from the corresponding low-dimensional Raman spectra.
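A permutation test of this kind can be sketched as below. The shuffling scheme (randomly reassigning proteomes to Raman conditions) is an assumption made for illustration, since the exact null model is specified in Materials and methods rather than here; the function names are likewise hypothetical. The p-value uses the $(b + 1)/(N + 1)$ convention, which never returns exactly zero.

```python
import numpy as np

def loocv_error(R, P):
    """Sum of squared leave-one-out OLS estimation errors over all conditions."""
    m = R.shape[0]
    X = np.hstack([np.ones((m, 1)), R])
    err = 0.0
    for j in range(m):
        train = np.arange(m) != j
        B_T, *_ = np.linalg.lstsq(X[train], P[train], rcond=None)
        err += np.sum((X[j] @ B_T - P[j]) ** 2)
    return err

def permutation_p_value(R, P, n_perm=199, seed=0):
    """Fraction of shuffled pairings whose error is as small as the observed one."""
    rng = np.random.default_rng(seed)
    observed = loocv_error(R, P)
    count = 0
    for _ in range(n_perm):
        # Assumed null model: break the Raman-proteome pairing across conditions.
        shuffled = P[rng.permutation(P.shape[0])]
        if loocv_error(R, shuffled) <= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Synthetic example with a genuine linear Raman-proteome relationship.
rng = np.random.default_rng(0)
R = rng.normal(size=(15, 4))
P = np.hstack([np.ones((15, 1)), R]) @ rng.normal(size=(5, 30))
P += 0.01 * rng.normal(size=P.shape)
print(permutation_p_value(R, P))
```

The smallest attainable p-value is $1/(N + 1)$, so resolving values as small as the one reported requires a correspondingly large number of permutations.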

Stoichiometry conservation of proteins in the ISP COG class

Since the dimensionality of the proteome data is significantly higher than that of the Raman data, the result above suggests that changes in proteome profiles are constrained in low-dimensional space. The regression matrix $B$ considered above determines how the proteomes relate to the Raman LDA axes. Therefore, analyzing $B$ should provide some insights into constraints on condition-dependent changes in the proteomes (Appendix 1—figure 1).

The $n \times m$ matrix $B$ is represented as $B = [\begin{matrix} b_{0} & b_{1} & \dots & b_{m - 1} \end{matrix}]$ , where the $(k + 1)$ -th column $b_{k} = {(b_{1 k} b_{2 k} \dots b_{n k})}^{⊤}$ ( $0 \leq k \leq m - 1$ ) is the collection of coefficients of all $n$ proteins for the $k$ -th LDA axis (Table 1). In the case of $k = 0$ , the coefficients are constant terms. We first asked whether any shared features might exist in the coefficients of $B$ depending on biological functions of corresponding proteins. We then classified the proteins according to functional annotations of Clusters of Orthologous Group (COG) classes (Tatusov et al., 1997; Tatusov et al., 2003; Galperin et al., 2015) and found that, for many proteins belonging to the ‘information storage and processing’ (ISP) COG class, the coefficients corresponding to different LDA axes are approximately proportional to the constant terms, i.e., $b_{l k} \approx c_{k} b_{l 0}$ , where $l$ is the index of an ISP COG class protein species and $c_{k}$ is the proportionality constant common to many ISP COG class protein species for the $k$ -th LDA axis (Figure 3A). The ISP COG class contains various proteins involved in processing genetic information such as translation, transcription, DNA replication, and DNA repair (Schmidt et al., 2016). Simple calculations show that these proportionality relationships imply that proteins in the ISP COG class conserve their mutual abundance ratios, i.e., stoichiometry, irrespective of environmental conditions (see ‘Characterizing an SCG by analyzing the Raman-proteome correspondence matrix’ in Materials and methods).
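The step from proportional coefficients to conserved abundance ratios can be sketched as follows. This is a short illustration based on Equation 1, not a reproduction of the derivation in Materials and methods. The symbols $p_{l j}$ (the abundance of protein $l$ under condition $j$, i.e., the $l$ -th element of ${\hat{p}}_{j}$ ) and $r_{j k}$ (the $k$ -th element of ${\hat{r}}_{j}$ ) are introduced here only for this sketch. Writing out one row of Equation 1 and substituting $b_{l k} \approx c_{k} b_{l 0}$ gives

$$p_{l j} = b_{l 0} + \sum_{k = 1}^{m - 1} b_{l k} r_{j k} \approx b_{l 0} \left( 1 + \sum_{k = 1}^{m - 1} c_{k} r_{j k} \right) .$$

The factor in parentheses depends on the condition $j$ but not on the protein $l$ . For two protein species $l$ and $l'$ that both satisfy the proportionality, it therefore cancels in the ratio:

$$\frac{p_{l j}}{p_{l' j}} \approx \frac{b_{l 0}}{b_{l' 0}} .$$

The right-hand side is independent of $j$ , which is the statement that the two proteins keep their abundance ratio across conditions.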

Figure 3

A stoichiometrically conserved protein group identified by an analysis of the Raman-proteome coefficient matrix.

(A) Scatterplots of Raman-proteome transformation coefficients. The horizontal axes are constant terms ( $𝒃_{0}$ ) in all the plots. The vertical axes are the coefficients for LDA1 ( $𝒃_{1}$ ), LDA2 ( $𝒃_{2}$ ), LDA3 ( $𝒃_{3}$ ), or LDA4 ( $𝒃_{4}$ ) in each plot. The proteins in the information storage and processing (ISP) Clusters of Orthologous Group (COG) class are indicated in yellow. Yellow solid straight lines are least squares regression lines passing through the origins for the ISP proteins. Insets are enlarged views of the areas around the origins. In this figure, we used the average of $B_{- i}^{est}$ as an estimate of $B$ . (B) Similarity of expression patterns between culture conditions for each COG class. We divided the proteome into COG classes (Tatusov et al., 2003; Galperin et al., 2015) and calculated the Pearson correlation coefficient of expression patterns for all the combinations of culture conditions. Since the data are from 15 conditions, there are 105 (=15·14/(2·1)) points for each COG class in the graph. The box-and-whisker plots summarize the distributions of the points. The lines inside the boxes denote the medians, and the top and bottom edges of the boxes denote the 75th and 25th percentiles, respectively. The numbers of protein species are 376 for the Cellular Processes and Signaling COG class, 354 for the ISP COG class, and 840 for the Metabolism COG class. See Appendix 1—figure 4 for the evaluation with Pearson correlation coefficient of log abundances and with cosine similarity. Appendix 1—figure 4 also contains figures directly showing expression-level changes of different protein species across conditions for each COG class. (C) Examples of stoichiometry-conserving proteins in the ISP COG class. The horizontal axis represents the abundance of RplF under 15 conditions, and the vertical axis represents those of several ISP COG class proteins. These proteins are also contained in the *homeostatic core* defined later (see Figure 4). 
The solid straight lines are linear regression lines with an intercept of zero. (D) Examples of abundance ratios of non-ISP COG class proteins. The horizontal axis represents the abundance of RplF under 15 conditions, and the vertical axis represents those of compared non-ISP COG class proteins. Crp belongs to the Cellular Processes and Signaling COG class; the other proteins belong to the Metabolism COG class. In both (C) and (D), we selected the proteins expressed from distant loci on the chromosome. All sigma factors participating in the regulation of the proteins examined in (C) and (D) are listed on the right of the gene name legends. All transcription factors known to regulate multiple genes listed here are shown in the right diagrams. Arrows show activation; bars represent inhibition; and squares indicate that a transcription factor activates or inhibits depending on other factors. The information on gene regulation and functions was obtained from EcoCyc (Keseler et al., 2017) in August 2022. The error bars are standard errors calculated by using the data of Schmidt et al., 2016. The insets show the positions of the genes on the *E. coli* chromosome determined based on ASM75055v1.46 (Howe et al., 2020). No genes are in the same operon.

Since this is an implication from the Raman-proteome correspondence, we next examined the stoichiometry conservation only with the proteome data, evaluating the expression levels with Pearson correlation coefficients for all the pairs of the conditions for each COG class (Figure 3B). For the ISP COG class, the correlation coefficients were close to 1, whereas those for the other COG classes were significantly smaller depending on condition pairs. We also evaluated the coordination of gene expression patterns within each COG class using cosine similarity and obtained consistent results (Appendix 1—figure 4). Therefore, stoichiometry conservation is stronger in the ISP COG class than in the other COG classes. Remarkably, neither shared transcription factors nor chromosome locations can account for the observed stoichiometry conservation of many protein pairs. Indeed, although the ISP COG class shows highly coordinated expression patterns (Figure 3C) compared to the non-ISP COG class (Figure 3D), the gene loci are not chromosomally clustered in either example. Additionally, the similarity/dissimilarity of expression patterns cannot easily be inferred from transcription factor regulation patterns. These results imply multi-level regulation of their abundance.

We consulted other public quantitative proteome data of Mycobacterium tuberculosis (Schubert et al., 2015), Mycobacterium bovis (Schubert et al., 2015), and Saccharomyces cerevisiae (Lahtvee et al., 2017) under environmental perturbations and consistently found strong stoichiometry conservation of the ISP COG class (Appendix 1—figure 4). Furthermore, the same trend was observed for the genotype-dependent expression changes in E. coli proteomes (Schmidt et al., 2016; Appendix 1—figure 4).

Identifying SCGs

Inspired by the existence of a large class of proteins that conserves their stoichiometry, we considered a systematic way to extract SCGs without relying on artificial functional classification of COG (Appendix 1—figure 1). Focusing only on the proteome data, we evaluated stoichiometry conservation for all the pairs of proteins in the proteome by calculating the cosine similarity of expression patterns (i.e. all $\cos θ_{p_{i} p_{j}} := (p_{i} / ‖ p_{i} ‖_{2}) \cdot (p_{j} / ‖ p_{j} ‖_{2})$ in Figure 4A and Table 1, where each element of the $m$ -dimensional vector $𝒑_{i}$ denotes the expression level of protein species $i$ under one of the $m$ conditions), and extracted groups in each of which the component proteins exhibit coherent expression change patterns by setting a high threshold of cosine similarity ( $\geq 0.995$ , Figure 4B; see ‘Direct characterization of SCGs in omics data’ in Materials and methods for details).
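The pairwise evaluation and thresholding can be sketched as below. The cosine similarity follows the definition in Table 1; treating each SCG as a connected component of the thresholded graph is an assumption made for this illustration (the actual procedure is described in Materials and methods), and the function names and toy matrix are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(P):
    """P: (n, m) array; row i is the expression vector p_i across m conditions.

    Assumes every row has a nonzero norm.
    """
    unit = P / np.linalg.norm(P, axis=1, keepdims=True)
    return unit @ unit.T

def extract_groups(A, threshold=0.995):
    """Group proteins linked by edges with cosine similarity >= threshold."""
    n = A.shape[0]
    adjacent = (A >= threshold) & ~np.eye(n, dtype=bool)  # ignore self-edges
    seen, groups = set(), []
    for start in range(n):
        if start in seen or not adjacent[start].any():
            continue  # skip visited nodes and proteins without any edge
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one connected component
            i = stack.pop()
            group.append(i)
            for j in np.flatnonzero(adjacent[i]):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return sorted(groups, key=len, reverse=True)  # largest group first

# Toy proteome: 6 proteins, 3 conditions.
P = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],   # exactly proportional to protein 0
    [3.0, 6.0, 9.01],  # nearly proportional to protein 0
    [3.0, 1.0, 1.0],
    [6.0, 2.0, 2.0],   # exactly proportional to protein 3
    [1.0, 1.0, 5.0],   # no high-similarity partner
])
A = cosine_similarity_matrix(P)
print(extract_groups(A))  # [[0, 1, 2], [3, 4]]
```

Because cosine similarity is invariant to rescaling each vector, proteins with very different absolute abundances fall into the same group as long as their abundance ratios are conserved.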

Figure 4

Extracting stoichiometrically conserved groups (SCGs) from proteome data.

(A) Quantifying stoichiometry conservation by cosine similarity. We consider an $m$ -dimensional expression vector for each protein species whose elements represent its abundance under different conditions. The cosine similarity between the $m$ -dimensional expression vectors of two protein species is nearly 1 when they strongly conserve their mutual stoichiometry across conditions, whereas it is lower than 1 when their expression patterns are incoherent. (B) Extracted SCGs. We extracted proteins with high cosine similarity relationships. Each node represents a protein species. An edge connecting two nodes indicates that the expression patterns of the two connected protein species have a high cosine similarity exceeding a threshold of 0.995. Proteins that have no edge with the other proteins are not shown. The largest and the second largest protein groups, which we refer to as SCG 1 and SCG 2, respectively, are indicated by shaded polygons. (C) Expression patterns of the extracted SCGs. The horizontal and vertical axes represent growth rate and protein abundance, respectively. Line-connected points represent expression-level changes of different protein species across conditions. SCG 1 (homeostatic core) is shown in two ways: the left panel with a linear-scaled vertical axis and the right panel with a log-scaled vertical axis. The inset for SCG 2 shows the total abundances of SCG 2 proteins with a log-scaled vertical axis. Error bars are standard errors. (D) The gene loci of the homeostatic core (SCG 1) proteins on the chromosome. Magenta dots are nodes (genes), and gray lines are edges (high cosine similarity relationships). We determined the gene loci based on ASM75055v1.46 (Howe et al., 2020).

The largest SCG (SCG 1) included many proteins in the ISP COG class (91 out of 191 SCG 1 members), such as ribosomal proteins and RNA polymerase, and also proteins in the other COG classes (Figure 4B, Appendix 1—table 4). We call this largest SCG the homeostatic core, as it constitutes the largest stoichiometry-conserving unit in cells. We found that the abundance of each protein in the homeostatic core (SCG 1) increased approximately linearly with the growth rate in each condition (Figure 4C). This relationship is reminiscent of the growth law: the total ribosomal content for translation increases linearly with growth rate (Neidhardt and Magasanik, 1960; Scott et al., 2010; Bremer and Dennis, 2008). The linear increase in the abundance of each protein in Figure 4C indicates that the growth law is valid even at the single-gene level for a large class of ribosomal and non-ribosomal proteins in the homeostatic core (Appendix 1—figure 5) (see Section 3.1 in Appendix).

Though not evenly distributed, the gene loci of the proteins in the homeostatic core are scattered throughout the chromosome (Figure 4D). Therefore, localization of gene loci to a single or a small number of operons is not likely a cause of the observed stoichiometry conservation.

The proteins in the second largest SCG (SCG 2) are expressed at high levels in the fast growth conditions, especially in the ‘LB’ condition (Figure 4C). SCG 2 includes many proteins in the metabolism COG class (21 out of 26 SCG 2 members) (Appendix 1—table 5), and their abundance increases approximately exponentially with growth rate (Figure 4C). We also identified other condition-specific small SCGs, such as a group most expressed in the ‘GlycerolAA’ condition (SCG 3) (Appendix 1—table 6), a group mainly expressed in the ‘Fructose’ condition (SCG 4) (Appendix 1—table 7), and a group most expressed in the stationary phase conditions (SCG 5) (Appendix 1—table 8; Figure 4C).

Biological relevance of stoichiometry conservation

To understand the overall strength of stoichiometry conservation of the proteins in the different SCGs, we calculated the sum of cosine similarity, $d_{i} = \sum_{j} \cos θ_{p_{i} p_{j}}$ , for each protein species $i$ , where $\cos θ_{p_{i} p_{j}}$ is cosine similarity between the $m$ -dimensional expression level vectors of protein $i$ and protein $j$ (Figure 4A), and the sum is taken over all the protein species (see ‘Global proteome structures based on stoichiometric balance’ in Materials and methods). We refer to $d_{i}$ as ‘stoichiometry conservation centrality’ (Table 1).
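As a minimal sketch, the centrality $d_{i}$ and the expression generality $g_{i}$ listed in Table 1 can be computed directly from an $n \times m$ expression matrix. The function names and the toy matrix are hypothetical, and rows are assumed to have nonzero norm. The sum for $d_{i}$ runs over all $j$ , including $j = i$ , in line with the definition in Table 1.

```python
import numpy as np

def conservation_centrality(P):
    """d_i = sum_j cos(theta_{p_i p_j}) for each row p_i of P (shape (n, m))."""
    unit = P / np.linalg.norm(P, axis=1, keepdims=True)
    A = unit @ unit.T          # n x n cosine similarity matrix, ones on the diagonal
    return A.sum(axis=1)

def expression_generality(P):
    """g_i = ||p_i||_1 / ||p_i||_2 for each row p_i of P."""
    l1 = np.linalg.norm(P, ord=1, axis=1)
    l2 = np.linalg.norm(P, ord=2, axis=1)
    return l1 / l2

# Toy example: protein 0 is expressed evenly in 4 conditions,
# protein 1 is expressed in a single condition only.
P = np.array([[1.0, 1.0, 1.0, 1.0],
              [2.0, 0.0, 0.0, 0.0]])
print(conservation_centrality(P))  # [1.5 1.5]
print(expression_generality(P))    # [2. 1.]
```

For non-negative expression vectors, $g_{i}$ ranges from 1 (expression confined to a single condition) to $\sqrt{m}$ (equal expression under all $m$ conditions), as the toy example with $m = 4$ illustrates.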

The proteins in the homeostatic core had high centrality scores (Figure 5A). Therefore, these proteins tend to have more connections with other proteins in terms of stoichiometry conservation. On the other hand, the proteins in the condition-specific SCGs tend to have low centrality scores among all the proteins (Figure 5A), which suggests that their stoichiometry conservation is localized within each SCG.

Figure 5

A proteome structure characterized by global stoichiometry conservation relationships.

(A) Distributions of stoichiometry conservation centrality values for all the proteins (gray), the homeostatic core (SCG 1) proteins (magenta), and the proteins belonging to the other stoichiometrically conserved groups (SCGs) (cyan). (B) Correlation between stoichiometry conservation centrality and gene essentiality. The proportion of essential genes within each class of stoichiometry conservation ranking is shown. The list of essential genes was downloaded from EcoCyc (Keseler et al., 2017). (C) Correlation between stoichiometry conservation and evolutionary conservation. The strength of evolutionary conservation of each protein species was estimated by the number of orthologs found in the OrthoMCL species (Chen et al., 2006). The genes with more orthologs tend to have higher stoichiometry conservation centrality ( $p = 3.42 \times 10^{- 14}$ by one-sided Brunner-Munzel test between the top 25% and the bottom 25% fractions of ortholog number ranking). Likewise, the genes with higher stoichiometry conservation centrality scores tend to have more orthologs ( $p = 8.44 \times 10^{- 12}$ by one-sided Brunner-Munzel test, top 25%–bottom 25% comparison; $p$ -values in the captions for (**F–I**) were evaluated with the same statistical test scheme). (**D–G**) Stoichiometry conservation analyses of human cell atlas transcriptome data of 15 fetal organs (Cao et al., 2020). The top gray histogram in (D) shows the distribution of stoichiometry conservation centrality values for all genes. The bottom histograms in (D) show the distribution for coding genes (yellow) and that for the other genes (cyan). (E) shows a correlation between the ratio of coding genes and stoichiometry conservation centrality calculated from the human cell atlas data. (F) shows a correlation between gene essentiality and stoichiometry conservation centrality calculated from the human cell atlas data. 
The essentiality of each human gene was quantified by CRISPR score, which is the fitness cost imposed by CRISPR-based inactivation of the gene in KBM7 chronic myelogenous leukemia cells (Wang et al., 2015). Genes with lower CRISPR score are regarded as more essential. The fraction with low CRISPR scores (i.e. high essentiality fraction) tends to have higher stoichiometry conservation centrality ( $p < 10^{- 15}$ ). The fraction with high centrality scores tends to be more essential ( $p < 10^{- 15}$ ). (G) shows a correlation between evolutionary conservation and stoichiometry conservation centrality based on the human cell atlas data. The gene fraction with many orthologs tends to have higher stoichiometry conservation centrality ( $p < 10^{- 15}$ ). The gene fraction with high centrality scores tends to have more orthologs ( $p < 10^{- 15}$ ). (H) and (I) Stoichiometry conservation analyses of genome-wide Perturb-seq data (Replogle et al., 2022). (H) shows a correlation between stoichiometry conservation centrality calculated from the Perturb-seq data and gene essentiality. The essentiality of each gene was quantified by the CRISPR score as in (F). The gene fraction with low CRISPR scores (i.e. high essentiality fraction) tends to have higher stoichiometry conservation centrality ( $p < 10^{- 15}$ ). The gene fraction with high centrality scores tends to be more essential ( $p < 10^{- 15}$ ). (I) shows a correlation between stoichiometry conservation based on the Perturb-seq data and evolutionary conservation of genes. The gene fraction with many orthologs tends to have higher stoichiometry conservation centrality ( $p < 10^{- 15}$ ). The gene fraction with high centrality scores tends to have more orthologs ( $p < 10^{- 15}$ ). (J) Representation of the proteomes as a graph. A node corresponds to a protein species, and the weight of an edge is taken as the cosine similarity between the $m$ -dimensional expression vectors of the two connected protein species. 
The $n \times n$ matrix $A$ can specify the whole graph. Note that the diagonal elements of $A$ are ones, which were introduced just for simplicity. (K) Cosine similarity LE (csLE) structure in a three-dimensional space. Each dot represents a different protein species and is color-coded on the basis of its stoichiometry conservation centrality value. We selected the axes considering the structural similarity to the Raman-based proteome structure in $Ω_{B}$ (see Figure 6). (L) The csLE structure in a three-dimensional space. The views from two different angles are shown. Each gray dot represents a different protein species. The proteins belonging to each SCG are indicated with distinct markers. Colors of the two-dimensional histograms in (C), (F), (G), (H), and (I) represent the height of each bar.

The stoichiometry conservation centrality is biologically relevant because it correlates with gene essentiality. Fractions of essential genes almost monotonically decrease with the ranks of centrality score (Figure 5B and Appendix 1—figure 6). We also noted that genes with high centrality scores have more orthologs determined by OrthoMCL-DB (Chen et al., 2006) across the three domains of life (Figure 5C and Appendix 1—figure 6). Likewise, genes with many orthologs tend to have higher centrality scores (Figure 5C and Appendix 1—figure 6). Therefore, the stoichiometry conservation in cells correlates with the evolutionary conservation of proteins.
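The rank-class comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `essential_fraction_by_rank_class`, the number of rank classes, and the toy arrays are all hypothetical and introduced here for clarity.

```python
import numpy as np

def essential_fraction_by_rank_class(centrality, is_essential, n_classes=4):
    """Fraction of essential genes in each centrality rank class.

    Class 0 holds the genes with the highest centrality scores.
    n_classes is an illustrative choice, not a value taken from the paper.
    """
    centrality = np.asarray(centrality, dtype=float)
    is_essential = np.asarray(is_essential, dtype=bool)
    # Rank genes from the highest to the lowest centrality score.
    order = np.argsort(-centrality, kind="stable")
    # Split the ranked genes into classes of (nearly) equal size.
    classes = np.array_split(is_essential[order], n_classes)
    return np.array([c.mean() for c in classes])

# Toy data (hypothetical): eight genes, the four most central ones are essential.
toy_centrality = np.array([0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.6, 0.4])
toy_essential = np.array([True, False, True, False, True, False, True, False])
fractions = essential_fraction_by_rank_class(toy_centrality, toy_essential)
```

With real data, a decreasing sequence in `fractions` would correspond to the trend reported in Figure 5B.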

To determine if the correlation of stoichiometry conservation centrality with gene essentiality and evolutionary conservation is general, we analyzed the transcriptome data from other organisms and found comparable correlations in Schizosaccharomyces pombe (Appendix 1—figure 6). In addition, we found that fractions of coding genes almost monotonically decreased with ranks of centrality score in the S. pombe data (Appendix 1—figure 6).

We further analyzed two kinds of Homo sapiens transcriptome data. One is a human cell atlas, in which expression of both coding and non-coding genes in 15 fetal organs was quantified (Cao et al., 2020), and the other is genome-wide Perturb-seq data (Replogle et al., 2022), in which genetically perturbed transcriptomes were measured mainly for coding genes. Our analysis of the human cell atlas data revealed that, while the overall distribution of stoichiometry conservation centrality was broad (Figure 5D, top), the centrality distribution of coding genes was skewed to higher values (Figure 5D, bottom) as observed for the E. coli proteome. Fractions of coding genes almost monotonically decreased with ranks of centrality (Figure 5E) as seen in the S. pombe data (Appendix 1—figure 6). Essentiality of each gene in human cells was quantified by an index called CRISPR score, which measures the fitness cost imposed by CRISPR-based inactivation of the gene (Wang et al., 2015). Genes with lower CRISPR scores are considered more essential. Our analysis revealed that genes with higher stoichiometry conservation centrality scores tend to have lower CRISPR scores and are thus more essential (Figure 5F). Similarly, genes with lower CRISPR scores tend to have higher stoichiometry conservation centrality scores. Furthermore, genes with higher centrality scores have more orthologs across the three domains of life and vice versa (Figure 5G). Comparable correlations of stoichiometry conservation with essentiality and evolutionary conservation were also found in the genome-wide Perturb-seq data (Figure 5H and I). Together, these results suggest that correlations of stoichiometry conservation centrality with gene essentiality and evolutionary conservation are general and preserved from E. coli to human cells regardless of the type of perturbation (see ‘Relevance of centrality of csLE structure to biological functions’ in Materials and methods for details).

Revealing global stoichiometry conservation architecture of the proteomes with csLE

Although the previous analysis revealed the biological relevance of stoichiometry conservation centrality, it is a one-dimensional quantity and cannot capture the global architecture of omics profiles. To gain further insights into genome-wide stoichiometry-conserving relationships among genes, we next analyzed the proteomes using a method similar to Laplacian eigenmaps (LE) (Appendix 1—figure 1; Belkin and Niyogi, 2003). We consider a symmetric $n \times n$ matrix $A$ whose $(i, j)$ entry is $\cos θ_{p_{i} p_{j}}$ (Figure 5J, Table 1). The entire proteome structure can be represented using the eigenvectors of normalized $A$ . Major differences of this method from the ordinary LE are that we consider an edge for all node pairs and that we adopt cosine similarity for weighting edges. This method places the proteins with higher cosine similarity closer in the resulting $(m - 1)$ -dimensional space (see ‘Global proteome structures based on stoichiometric balance’ in Materials and methods and Section 2.1 in Appendix); we call this linear method cosine similarity LE (csLE).
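The construction described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the symmetric normalization $D^{-1/2} A D^{-1/2}$ is the one commonly used for Laplacian eigenmaps and is assumed here, the row sums of $A$ are returned under the neutral name `degree`, and the function names and toy data are hypothetical. The exact normalization used in the paper is given in its Materials and methods.

```python
import numpy as np

def cosine_similarity_matrix(P):
    """n x n matrix of cosine similarities between rows of P (n proteins x m conditions)."""
    P = np.asarray(P, dtype=float)
    U = P / np.linalg.norm(P, axis=1, keepdims=True)  # unit-length expression vectors
    return U @ U.T

def csle_embedding(P, n_dims=3):
    """Embed proteins using eigenvectors of a normalized cosine similarity matrix.

    Assumption: the normalization is D^{-1/2} A D^{-1/2}, with D the diagonal
    matrix of row sums of A; the paper's exact choice may differ.
    """
    A = cosine_similarity_matrix(P)
    degree = A.sum(axis=1)                    # row sums of A (includes the diagonal ones)
    A_norm = A / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(A_norm)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvecs = eigvecs[:, order]
    # Rescale by D^{-1/2}; for nonnegative A the leading column becomes constant
    # (the trivial direction), so it is dropped.
    coords = eigvecs / np.sqrt(degree)[:, None]
    return degree, coords[:, 1:1 + n_dims]

# Toy data (hypothetical): 6 proteins x 4 conditions; protein 1 is an exact
# multiple of protein 0, i.e. the pair perfectly conserves stoichiometry.
rng = np.random.default_rng(0)
P_toy = rng.uniform(0.1, 1.0, size=(6, 4))
P_toy[1] = 2.5 * P_toy[0]
degree, coords = csle_embedding(P_toy, n_dims=3)
```

In this toy example, the two proteins with proportional expression vectors receive the same coordinates, which illustrates why proximity in the embedding can be read as strength of stoichiometry conservation.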

In this $(m - 1)$ -dimensional csLE space $Ω_{LE}$ , the stoichiometry conservation centrality of the proteins decreased from center to periphery (Figure 5K), which confirms that it indeed measures the extent to which each protein is close to the center in the entire stoichiometry conservation architecture. Furthermore, the proteins formed polyhedral distributions with the cluster of the proteins in the homeostatic core at the center and the clusters of the proteins in the other condition-specific SCGs at distinct vertices (Figure 5L). This distribution is consistent with the fact that the condition-specific SCGs are the components whose expression patterns are distant from the homeostatic core and also from each other.

Representing the proteomes using the Raman LDA axes

Given that the analysis of the LDA Raman-proteome regression coefficients $B$ (Figure 3A) eventually led us to identify the stoichiometry conservation architecture in the proteome data (Figure 5), the low-dimensional proteome structure in $Ω_{LE}$ might be related to major changes in cellular Raman spectra in the LDA space and provide insight into the Raman-proteome correspondence. To investigate this, we considered representing the proteomes on the basis of the Raman LDA axes (Appendix 1—figure 1).

The coefficients in the $n \times m$ regression matrix $B$ must satisfy the proportionality $b_{i k} / b_{i 0} = b_{j k} / b_{j 0}$ for all $k$ -th LDA axes ( $1 \leq k \leq m - 1$ ) for the pair of protein $i$ and protein $j$ that perfectly conserve their stoichiometry, as previously mentioned in the analysis of the ISP COG class (Figure 3A; see ‘Characterizing an SCG by analyzing the Raman-proteome correspondence matrix’ in Materials and methods and Section 2.1 in Appendix). Noting this property, we constructed another $(m - 1)$ -dimensional proteome space $Ω_{B}$ , assigning each protein species $i$ a coordinate $(β_{i}^{LDA 1}, β_{i}^{LDA 2}, \dots, β_{i}^{LDA (m - 1)})$ , where $β_{i}^{LDA k} := b_{i k} / b_{i 0}$ is the normalized coefficient of gene $i$ corresponding to the $k$ -th LDA axis. As in $(m - 1)$ -dimensional $Ω_{LE}$ , a pair of proteins with strong stoichiometry conservation is expected to position closely in this $(m - 1)$ -dimensional proteome space $Ω_{B}$ . Note that the proximity of the coordinates $β_{i}^{LDA k}$ of different proteins $i$ in $Ω_{B}$ is equivalent to the approximate proportionality of different proteins $i$ in Figure 3A, demonstrated for the ISP COG class using the proportionality constants (normalized coefficients) $c_{k}$ common to different proteins.
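As a small illustration of how the coordinates in $Ω_{B}$ follow from $B$, the sketch below divides each row of $B$ by its zeroth coefficient. It assumes that $B$ is stored with $b_{i0}$ in column 0 and the coefficients for the LDA axes in columns 1 to $m - 1$; the function name `omega_b_coordinates` and the toy matrix are hypothetical.

```python
import numpy as np

def omega_b_coordinates(B):
    """Return the (m - 1)-dimensional coordinates beta_i^{LDA k} = b_ik / b_i0.

    Assumed layout: B is n x m, column 0 holds b_i0 and columns 1..m-1 hold
    the coefficients for the LDA axes. Rows with b_i0 == 0 are not handled.
    """
    B = np.asarray(B, dtype=float)
    return B[:, 1:] / B[:, [0]]

# Toy matrix (hypothetical): rows 0 and 1 are proportional, so these two
# proteins satisfy b_ik / b_i0 = b_jk / b_j0 for every axis k.
B_toy = np.array([
    [2.0, 4.0, -6.0],
    [1.0, 2.0, -3.0],
    [1.0, 0.0, 5.0],
])
beta = omega_b_coordinates(B_toy)
```

Here the first two rows of `beta` coincide, which is the sense in which perfectly stoichiometry-conserving proteins occupy the same position in $Ω_{B}$.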

We then found that the distribution of the proteins in $Ω_{B}$ closely resembled the one in $Ω_{LE}$ when visualized using the first few major axes (Figure 5L and Figure 6A). This similarity is nontrivial because $Ω_{LE}$ is constructed only from the proteome data, whereas $Ω_{B}$ depends on the $(m - 1)$ -dimensional Raman LDA space (Figure 2C–E).

Figure 6


Raman-based proteome structure and its similarity to stoichiometry-based proteome structure.

(A) Proteome structure determined by Raman-proteome coefficients visualized in a three-dimensional space. The views from two different angles are shown. Each gray dot represents a protein species. The proteins belonging to each stoichiometrically conserved group (SCG) are indicated with distinct markers. We note that SCGs are defined without referring to Raman data (Figure 4). (**B–D**) Similarity among the distribution of linear discriminant analysis (LDA) Raman spectra (B), the proteome structure determined by Raman-proteome coefficients (C), and the proteome structure determined by stoichiometry conservation (D). (E) Mathematical relation between the coordinates of the proteins in $Ω_{B}$ (C) and $Ω_{LE}$ (D). The two conditions, one with $Θ$ (magenta) and the other between $𝒃_{0}$ and $𝒃_{0}^{est}$ (cyan), must hold for the similarity between the two proteome structures (yellow), as described in the gray box. $\overset{⋆}{\propto}$ denotes column-wise proportionality.

We remark that each axis of $Ω_{B}$ is directly linked to the corresponding Raman LDA axis. Consequently, the orthants in $Ω_{B}$ where the condition-specific protein species reside agree with those in the Raman LDA space where the cellular Raman spectra under corresponding conditions reside (Appendix 1—figure 10) (see ‘Global omics structures characterized by Raman-omics correspondences’ in Materials and methods and Section 2.1 in Appendix). Indeed, we find such orthant agreement between the proteins in the condition-specific SCGs (SCG 2–SCG 5) and the cellular Raman spectra under the corresponding conditions (Figure 6B and C). This straightforward correspondence between $Ω_{B}$ and the Raman LDA space allows us to examine the relationship between changes in cellular Raman spectra and omics components’ stoichiometry conservation architecture by comparing the two proteome structures in $Ω_{B}$ and $Ω_{LE}$ .

Omics-level interpretation of cellular Raman spectra and a quantitative constraint between expression generality and stoichiometry conservation centrality

To understand rigorously what the similarity of the proteome structures in $Ω_{B}$ and $Ω_{LE}$ signifies (Figure 6C and D), we clarified the mathematical relation between the coordinates of the proteins in these two spaces (Figure 6E and Appendix 1—figure 1; see Sections 2.1 and 2.2 in Appendix for details). We then characterized the two mathematical conditions that must be satisfied simultaneously (Figure 6E).

The first condition is that major axes of the Raman LDA space and those of the proteome csLE space correspond (Figure 6E). Consequently, cellular Raman spectra under a condition accompanying the expression of a condition-specific SCG must be significantly different from those under conditions with the expression of other condition-specific SCGs in a manner distinguishable by LDA. Mathematically, this condition is related to the $m \times m$ orthogonal matrix $Θ$ that appears in the equation in Figure 6E. For the distributions of the proteome components to be similar in the low-dimensional subspaces of $Ω_{LE}$ and $Ω_{B}$ , $Θ$ must be close to the identity matrix with small off-diagonal elements (Figure 6E). We verified this first condition with the data (Appendix 1—figure 9; see ‘Evaluating similarity between orthogonal matrix $Θ$ and identity matrix’ in Materials and methods for details).

The second condition relates to the proportionality of the $n$ -dimensional vectors $𝒃_{0}$ and $𝒃_{0}^{est}$ in Figure 6E. This proportionality relation can be transformed into another relation that the stoichiometry conservation centrality $d_{i}$ is proportional to $g_{i} := {∥ 𝒑_{i} ∥}_{1} / {∥ 𝒑_{i} ∥}_{2}$ , where ${∥ 𝒑_{i} ∥}_{1}$ and ${∥ 𝒑_{i} ∥}_{2}$ are the $L^{1}$ and $L^{2}$ norms of the expression-level $m$ -dimensional vector of protein $i$ across conditions (Figures 4A and 6E, Table 1).

$g_{i}$ can be interpreted as the expression generality score. When $g_{i}$ is large, the protein $i$ is expressed generally across conditions; when $g_{i}$ is small, it is expressed only under specific conditions (Appendix 1—figure 8) (see ‘Interpretation of $L^{1}$ norm/ $L^{2}$ norm ratio of an expression vector as a quantitative measure of expression generality’ in Materials and methods). Therefore, the proportionality between $d_{i}$ and $g_{i}$ indicates that the proteins with high stoichiometry conservation centrality must be expressed nonspecifically to conditions. We also verified this condition with the data, confirming that it is indeed satisfied (Figure 7A and Appendix 1—figure 9).
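The behavior of the $L^{1}$ / $L^{2}$ norm ratio can be checked with a short example. For a nonzero, nonnegative $m$ -dimensional expression vector, the ratio lies between 1 (expression under a single condition) and $\sqrt{m}$ (equal expression under all conditions). The function name `expression_generality` and the toy expression matrix below are hypothetical and serve only as an illustration.

```python
import numpy as np

def expression_generality(P):
    """L1 norm / L2 norm of each row of P (n proteins x m conditions)."""
    P = np.asarray(P, dtype=float)
    l1 = np.abs(P).sum(axis=1)
    l2 = np.sqrt((P ** 2).sum(axis=1))
    return l1 / l2

# Toy expression levels (hypothetical) across m = 4 conditions.
P_toy = np.array([
    [5.0, 0.0, 0.0, 0.0],  # expressed under one condition only
    [2.0, 2.0, 2.0, 2.0],  # expressed equally under all conditions
    [3.0, 1.0, 0.0, 0.0],  # intermediate case
])
g = expression_generality(P_toy)
```

The first row attains the lower bound 1, the second row attains the upper bound $\sqrt{4} = 2$, and the third row falls in between.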

Figure 7


The spread of the points from the proportionality diagonal line of the E. coli proteome data in Figure 7A was found to be related to the growth rate under the condition where each protein is expressed the most (see Section 2.2 in Appendix for a detailed analysis on the origin of the deviation). Consequently, one can envisage a growth-rate-dependent expression pattern of each protein on the basis of its relative position in this $g_{i}$ - $d_{i}$ plot (Figure 7B and C). For example, both BamB and YqjD are expressed nonspecifically to the conditions with nearly identical expression generality scores. However, BamB is expressed at higher levels under fast growth conditions, whereas YqjD is expressed at higher levels under slow growth conditions, as indicated by their positions relative to the proportionality line. A similar growth rate dependence is observed for PaaE and DgoA, but with more prominent condition specificity because these proteins are characterized by their low expression generality scores. These growth-rate-dependent deviation patterns might hint at a new growth law that governs the total relative expression changes of the proteome components (see Section 2.2 in Appendix for detailed discussion).

Generality

We also examined the generality of the aforementioned two conditions using the Raman and proteome data of E. coli strains with different genotypes (BW25113, MG1655, and NCM3722) under two culture conditions (Schmidt et al., 2016) and the Raman and transcriptome data of S. pombe under 10 culture conditions (Kobayashi-Kirschvink et al., 2018). Applying csLE to the omics data, we again found similar omics structures between $Ω_{LE}$ and $Ω_{B}$ when visualized using the first few major axes, with homeostatic cores at the centers and condition-specific SCGs at the vertices (Appendix 1—figures 11 and 12).

Proportionality between stoichiometry conservation centrality and expression generality score was also confirmed in both additional datasets (Appendix 1—figure 7). We further used publicly available quantitative proteome data of M. tuberculosis, M. bovis, and S. cerevisiae (Schubert et al., 2015; Lahtvee et al., 2017) to examine this relation and confirmed that the proportionality universally holds (Appendix 1—figures 7 and 13). Almost no deviation from the proportionality line existed in the S. cerevisiae proteome data measured for the cells in different media but cultured in chemostats with an identical dilution rate (thus, identical growth rate), which is consistent with the result of E. coli in which the deviations were related to the growth rate differences.

Discussion

A Raman spectrum obtained from a single cell is a superposition of the spectra of all of its constituent biomolecules. Therefore, cellular Raman spectra potentially contain rich information on essential state differences in targeted cells. The fact that both transcriptomes and proteomes are inferable from cellular Raman spectra, as demonstrated in this and previous (Kobayashi-Kirschvink et al., 2018) studies, endorses this speculation. The detailed analyses of the relationship between Raman and omics data have identified functionally relevant constraints on omics changes and provided an interpretation of cellular Raman spectra (Appendix 1—figure 1). Specifically, it has been revealed that major changes in cellular Raman spectra distinguishable by LDA reflect the changes in omics profiles under the constraints of stoichiometry conservation. This correspondence would help us interpret global changes in cellular Raman spectra by translating them into the differences in omics profiles.

We remark that linearity in our formulation enabled us to find the rigorous connection between the two omics spaces $Ω_{B}$ and $Ω_{LE}$ (Figure 6E). Unlike the original LE, we adopted cosine similarity as weights of edges between all node pairs to measure expression stoichiometry conservation of proteins. This modification was indispensable in terms of interpretation; relative proximity of positions in $Ω_{LE}$ reflects the strength of stoichiometry conservation. We also remark that simple principal component analysis (PCA) applied to the normalized E. coli proteome data also finds a similar low-dimensional proteome structure (Appendix 1—figure 6) (see ‘Proteome structure obtained with PCA’ in Materials and methods). Therefore, besides interpretability, omics structures in $Ω_{LE}$ might reflect dominant relationships among omics components commonly characterized by several methods of omics representation.

It should be noted that the quantitative analysis of Raman-omics correspondence resulted in the characterization of stoichiometry-conserving architecture in cells (Appendix 1—figure 1). This shows that besides distinguishing different cellular states or quantifying specific biomolecular species by focusing on spectral peaks, Raman spectra can also characterize the system-level constraints behind changes in global gene expression profiles. While the identified features, such as stoichiometry conservation centrality, expression generality score, and csLE space, can be calculated without Raman data, it is difficult to reach them directly without scrutinizing the Raman-omics correspondence. Furthermore, the definition of expression generality and its relation to stoichiometry conservation centrality were directly derived from the Raman-omics correspondence analysis (Figures 6E and 7). Therefore, as a signal reflecting comprehensive molecular profiles in cells, Raman spectra are an important modality for dissecting system-level properties and constraints in cells.

In this study, we mainly analyzed the Raman and proteome data of E. coli under 15 different environmental conditions. However, the resulting low-dimensional structures and correspondence of $Ω_{LE}$ and $Ω_{B}$ can change depending on what and how many conditions are included in the analysis. Thus, an intriguing question is how the Raman-proteome correspondence is affected by the conditions used in the analysis. A subsampling analysis focusing on the orthogonal matrix $Θ$ , which represents low-dimensional correspondence precision of $Ω_{LE}$ and $Ω_{B}$ (Figure 6E), reveals that correspondence precision tends to increase with an increasing number of conditions (Appendix 1—figure 14). This result suggests that increasing the number of conditions generally improves the low-dimensional correspondence rather than disrupting it.

Since the proteome data that we referenced (Schmidt et al., 2016) represent the averaged expression profile of the cells in each condition, we likewise averaged the single-cell Raman data in each condition in the LDA space to determine their correspondence. Once this correspondence is established, it becomes technically feasible to infer the proteomes of individual cells from their Raman spectra. However, verifying the accuracy of the inferred proteome profiles requires quantitative ground truth of single-cell proteomes, which are not yet readily obtainable, especially for bacterial cells. Despite this limitation, future studies may clarify the correspondence at the single-cell level as omics technology advances.

Stoichiometry conservation is plausibly crucial for cellular functions and physiology. For example, the enzymes involved in evolutionarily conserved metabolic pathways conserve their stoichiometry across microorganism species despite their diverse transcriptional and translational rates (Lalanne et al., 2018). It is suggested that stoichiometry conservation is achieved by optimizing the metabolic flux for fast growth (Lalanne and Li, 2021). Furthermore, a ribosome-targeting antibiotic causes an imbalance of ribosomal proteins and growth arrest in E. coli, but the balance is restored alongside growth recovery through physiological adaptation (Koganezawa et al., 2022). These results suggest that disruption of stoichiometric balance among core components could impose significant fitness cost.

It is known that functions, essentiality, and evolutionary conservation of genes can be linked to the topologies of gene networks (Jeong et al., 2001; He and Zhang, 2006; Yu et al., 2007; Fraser et al., 2002; Wuchty et al., 2003; Li et al., 2020). However, networks that have been previously analyzed, such as protein-protein interaction networks, depend on known interactions. Therefore, as our understanding of the molecular interactions evolves with new findings, the conclusions may change. Furthermore, analysis of a particular interaction network cannot account for effects of different types of interactions or multilayered regulations affecting each protein species, thus highlighting only one aspect of the inherently global coordination of molecular compositions in cells. In contrast, the stoichiometry conservation network in this study focuses solely on expression patterns as the net result of interactions and regulations among all types of molecules in cells. Consequently, the stoichiometry conservation networks are not affected by the detailed knowledge of molecular interactions and naturally reflect the global effects of multilayered interactions behind cellular physiological state changes. Additionally, stoichiometry conservation networks can easily be obtained for non-model organisms, for which detailed molecular interaction information is usually unavailable. Therefore, analysis with the stoichiometry conservation network has several advantages over existing methods from both biological and technical perspectives.

It is intriguing to ask how cells conserve stoichiometry among the components in each SCG. In particular, the homeostatic core (SCG 1) contains many components whose gene loci are scattered throughout the genome. It is known that both transcriptional and translational negative autoregulation contributes to controlling the stoichiometry of many ribosomal proteins (Nomura et al., 1980; Dean et al., 1981; Kaczanowska and Rydén-Aulin, 2007; Portier and Grunberg-Manago, 1993; Aseev et al., 2008; Roy et al., 2020). The genes for the ribosomal proteins are scattered in multiple operons and co-regulated with many other non-ribosomal proteins, such as RNA polymerase subunits, translation initiation/elongation factors, and transmembrane transporters (Keseler et al., 2017). Therefore, the stoichiometry-conserving mechanisms established for ribosomes might be partially exploited for the stoichiometry conservation within the homeostatic core.

The existence of condition-specific SCGs and genes with similar expression patterns confirms that adaptation to specific conditions is not necessarily achieved by a small number of functionally relevant genes, but is often accompanied by changes in the expression of many seemingly unrelated genes. Indeed, condition-specific SCGs contain genes with unclear roles in adaptation, including some that are functionally uncharacterized (Appendix 1—tables 5–8). Therefore, it would be important to investigate whether the coexpression of multiple genes is crucial for cellular adaptation to a wide range of perturbations while maintaining homeostasis.

The proportionality between stoichiometry conservation centrality and expression generality score suggests that proteins with high stoichiometry conservation centrality govern basal cellular functions required under any conditions. In fact, both essential genes and evolutionarily conserved genes are enriched in the omics fractions with high centrality scores. On the contrary, proteins of low centrality scores might have been acquired in later stages of the evolution and exploited to survive or increase fitness under specific conditions. Such hierarchy in the stoichiometry conservation centrality among core and peripheral processes might promote the adaptability of cells since cells can respond to diverse environments without restructuring a large body of the functional homeostatic core. This architectural principle in omics might underlie the robustness and adaptability of biological cells.

Materials and methods

Key resources table

Reagent type (species) or resource	Designation	Source or reference
Chemical compound, drug	Difco LB Broth, Miller (Luria-Bertani)	Becton, Dickinson and Company
Chemical compound, drug	Bacto Yeast Extract	Becton, Dickinson and Company
Chemical compound, drug	Bacto Tryptone	Becton, Dickinson and Company
Chemical compound, drug	Sodium Chloride	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Disodium Hydrogenphosphate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Potassium Dihydrogenphosphate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Ammonium Sulfate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Zinc Sulfate Heptahydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Copper(II) Chloride Dihydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Manganese(II) Sulfate Pentahydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Cobalt(II) Chloride Hexahydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Calcium Chloride Dihydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Magnesium Sulfate Heptahydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Thiamin Hydrochloride	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Iron(III) Chloride Hexahydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Sodium Acetate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Disodium Fumarate	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	D-Galactose	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	D-Glucose	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Glycerol	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	D-Fructose	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	D-Mannose	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	D-Xylose	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Alanine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Asparagine Monohydrate	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Cysteine	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	L-Glutamic acid	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Glutamine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Glycine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Histidine	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	L-Isoleucine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Phenylalanine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Proline	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Serine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Adenine	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	L-Arginine	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	L-Aspartic acid	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Leucine	FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	L-Lysine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Methionine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Threonine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Tryptophan	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Tyrosine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	L-Valine	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Uracil	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	$8 mol / L$ Sodium Hydroxide Solution	Wako Pure Chemical Industries, Ltd., FUJIFILM Wako Pure Chemical Corporation
Chemical compound, drug	35–37% (mass/mass) Hydrochloric Acid	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	$0.1 mol / L$ Hydrochloric Acid	Wako Pure Chemical Industries, Ltd.
Chemical compound, drug	Agar	Wako Pure Chemical Industries, Ltd., FUJIFILM Wako Pure Chemical Corporation
Strain, strain background (Escherichia coli)	BW25113	Wakamoto Laboratory stock
Strain, strain background (Escherichia coli)	MG1655	Wakamoto Laboratory stock
Strain, strain background (Escherichia coli)	NCM3722	Coli Genetic Stock Center

Note that mathematical notation in Materials and methods differs in some respects from that in the main text, Table 1, and main figures.

Experimental methods, data acquisition, and data analyses

Absolute quantitative proteome data


We utilized high-quality absolute quantitative proteome data reported by Schmidt et al., 2016. In these data, expression levels of more than 55% of genes of E. coli BW25113 strain (more than 95% of total proteome mass) were quantified under various environmental conditions.

We also used additional absolute quantitative proteome data (Schmidt et al., 2016; Schubert et al., 2015; Lahtvee et al., 2017) for checking the generality of our findings (see Appendix 3.2). In addition to the proteome data across environmental conditions, Schmidt et al. also reported proteomes of E. coli strains with different genotype backgrounds (BW25113, MG1655, and NCM3722) cultured in a rich medium or a minimal medium supplemented with glucose. Schubert et al., 2015 quantified proteomes of M. tuberculosis H37Rv strain and M. bovis BCG strain under time-course environmental change conditions starting from exponential growth conditions, followed by dormant states induced by decreasing oxygen levels, and finally regrowth conditions with re-aeration. Lahtvee et al., 2017 quantified proteomes of S. cerevisiae under a reference condition and three stressed conditions (ethanol, osmotic pressure, and high temperature, with three stress intensity steps for each type of stress) using chemostat.

For checking the generality of our findings across omics classes, we also used the transcriptome data reported by our previous paper (Kobayashi-Kirschvink et al., 2018). The data include the transcriptomes of S. pombe in rich and minimal media, in nutrient-depleted media, and under various stress conditions.

E. coli strains and culture conditions

To quantitatively analyze a linkage between the absolute proteome data generated by Schmidt et al., 2016 and Raman data, we reproduced the culture conditions used in Schmidt et al., 2016 as closely as possible in our lab. We obtained three biological replicates.

E. coli strains

We used BW25113, MG1655, and NCM3722 as in Schmidt et al., 2016. In particular, BW25113 (Datsenko and Wanner, 2000) was used for the main data in this study. The genotype of BW25113 is F⁻ Δ(araD-araB)567 ΔlacZ4787 (::rrnB-3) λ⁻ rph-1 Δ(rhaD-rhaB)568 hsdR514, that of MG1655 is F⁻ λ⁻ rph-1, and that of NCM3722 is F⁺ (Baba et al., 2006; Blattner et al., 1997; Soupene et al., 2003).

Culture conditions

We prepared 15 batch culture conditions listed in Appendix 1—table 1. We excluded three of the 18 conditions reported in Schmidt et al., 2016 because we could not obtain sufficiently strong cellular Raman signals under those conditions. See Schmidt et al., 2016 for the details of the medium compositions. The ‘GlucosepH6’ medium was prepared by titrating the ‘Glucose’ medium with 37% HCl. The medium for ‘stationary1day’ and ‘stationary3days’ was the same as the ‘Glucose’ medium. LB agar plates were prepared by adding 15 g/L agar to ‘LB’ medium.

Cultivation

Culturing E. coli cells proceeded in four steps:

Step 1: Growth on LB agar plates. Cells were taken from a −80°C glycerol stock and streaked on LB agar plates. The plates were incubated at 37°C overnight and stored at 4°C. All subsequent experiments were conducted using colonies on the LB agar plates. Picking colonies from the plates for cultivation was done within 4 days of storage at 4°C.
Step 2: Liquid culture under ‘Glucose’ condition. Several colonies picked from LB agar plates were inoculated into ‘Glucose’ liquid culture medium and grown for about 16 hr. Cells for the ‘Glucose42C’ condition were cultured at 42°C, and those for the other conditions were grown at 37°C.
Step 3: Liquid culture under each condition. Cells from Step 2 were passaged into each type of medium and grown to exponential phase. Cells for the ‘Glucose42C’ condition were grown at 42°C, and those for the other conditions were cultured at 37°C.
Step 4: Liquid culture under each condition. Cells from Step 3 were passaged into the respective fresh medium and grown to almost the same level of turbidity as that at the end of Step 3. Cells for the ‘Glucose42C’ condition were cultured at 42°C, and those for the other conditions were grown at 37°C.

For the exponential conditions, cell cultivation was conducted as described above. For the stationary conditions, cultivation of cells at Step 3 was continued instead of proceeding to Step 4 and ended 1 or 3 days after they reached the stationary phase.

The medium volume was 2 mL for all the liquid cultures in our experiments. Borosilicate glass test tubes with a diameter of 16.5 mm and a length of 165 mm were used. Fresh medium was pre-warmed to the cultivation temperature before passage. All the liquid cultures were shaken reciprocally at 200 r/min at an inclination of 45°. Liquid cultures were diluted to an OD₆₀₀ of around 0.01 for passage.

The main differences between our cultivation conditions and those of Schmidt et al., 2016 are the period of storage at 4°C in Step 1 (a maximum of 3 weeks in Schmidt et al., 2016), the number of colonies inoculated from plates into liquid medium in Step 2 (one colony per inoculation in Schmidt et al., 2016), and the medium volumes and shaking conditions of the liquid cultures (50 mL liquid culture in 500 mL unbaffled wide-neck Erlenmeyer flasks under orbital shaking at 300 r/min in Schmidt et al., 2016).

Growth rate measurements

Growth curves were obtained by continuing Step 3 of the cultivation. Cultivation of cells for growth measurements was conducted with 5 mL of culture medium rather than 2 mL because of a requirement of the device used for continuous turbidity recording (ODBox-C, TAITEC Corporation). In addition, cells were washed with each type of fresh medium before inoculation at the beginning of Step 3, and cultivation for growth recording started from an OD₆₀₀ of around 0.001. Growth rates were calculated from the growth curves using a fitting algorithm based on Gaussian processes (Swain et al., 2016).

Raman measurements and preprocessing of spectra

Cells were washed three times with a 0.9% aqueous NaCl solution, and 5 µL of the suspension was placed on a synthetic quartz slide glass (Toshin Riko Co., Ltd.) and dried. Raman spectra of cells were measured with a Raman microscope (Appendix 1—figure 2), in which a custom-built Raman system (STR-Raman, AIRIX) was integrated into a microscope (Ti-E, Nikon). Excitation light was generated by a 532 nm continuous-wave diode-pumped solid-state laser (Gem 532, Laser Quantum). We modified the first version of this Raman microscope (Kobayashi-Kirschvink et al., 2018); in this study, light from the laser oscillator was transmitted by mirrors. A 100× air objective lens with NA = 0.9 (MPLN100X, Olympus) was used. Raman scattered light was collected by an optical fiber and transmitted to a spectrometer (Acton SP2300i, Princeton Instruments). Light dispersed by a 300 gr/mm grating was projected onto the image sensor of an sCMOS camera (OrcaFlash 4.0 v2, Hamamatsu Photonics). The sCMOS camera was water-cooled at 15°C to reduce dark noise. The exposure time for each cell was 10 s. Fifteen randomly selected cells were measured per condition per replicate. A background Raman spectrum was measured for each cell with a 10 s exposure in an area close to the target cell where neither cells nor NaCl crystals were present.

In our setup, the laser power at the sample stage was 21 mW. The measurement system and processes were controlled using Micro-Manager 1.4 (Edelstein et al., 2014) and a plugin we made.

Readout noise of sCMOS image sensors is pixel-dependent. A noise reduction filter developed in Kobayashi-Kirschvink et al., 2018, on the basis of Huang et al., 2013, was applied to the measured spectral images by using 10,000 blank images obtained with the same sCMOS sensor with an exposure time of 10 s. See Kobayashi-Kirschvink et al., 2018 for details.

After noise reduction with the filter, pixel counts were summed along the direction perpendicular to the wavenumber axis. A background spectrum was subtracted from each cellular Raman spectrum. The pixel region corresponding to the range from $632\ \mathrm{cm}^{-1}$ to $1862\ \mathrm{cm}^{-1}$ was cropped. The cropped spectrum was smoothed with a Savitzky-Golay filter (Savitzky and Golay, 1964). To minimize the effect of laser excitation variations, each spectrum was normalized by subtracting its average and dividing by its standard deviation.
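The preprocessing steps after pixel summation can be sketched as follows. This is a minimal illustration in Python with NumPy and SciPy, not the MATLAB scripts used in the study. The function name `preprocess_spectrum`, the synthetic data, and the Savitzky-Golay settings (`window_length`, `polyorder`) are assumptions, since the filter parameters are not stated here.

```python
import numpy as np
from scipy.signal import savgol_filter


def preprocess_spectrum(cell, background, wavenumbers,
                        lo=632.0, hi=1862.0,
                        window_length=11, polyorder=3):
    """Background-subtract, crop, smooth, and standardize one 1-D spectrum.

    window_length and polyorder are illustrative assumptions; the text
    does not state the Savitzky-Golay settings that were used.
    """
    diff = np.asarray(cell, dtype=float) - np.asarray(background, dtype=float)
    keep = (wavenumbers >= lo) & (wavenumbers <= hi)   # 632 to 1862 cm^-1
    smoothed = savgol_filter(diff[keep], window_length, polyorder)
    # z-score: subtract the mean and divide by the standard deviation
    return (smoothed - smoothed.mean()) / smoothed.std()


# Synthetic example: a flat noisy background plus one Gaussian band.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(500.0, 2000.0, 1500)
background = 100.0 + rng.normal(0.0, 1.0, wavenumbers.size)
peak = 50.0 * np.exp(-0.5 * ((wavenumbers - 1003.0) / 8.0) ** 2)
cell = background + peak + rng.normal(0.0, 1.0, wavenumbers.size)
spectrum = preprocess_spectrum(cell, background, wavenumbers)
```

After this step every spectrum has zero mean and unit standard deviation, so overall intensity differences between measurements no longer contribute to downstream analyses.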

Data analysis

We wrote scripts and analyzed data using MATLAB (R2019a and R2023b), except for the Brunner-Munzel test, for which we used R (version 4.0.3) (see ‘Centrality-evolutionary conservation correlation’ in Materials and methods).

Related to Figure 2 in the main text, we first performed LDA on the Raman data. LDA is a linear classifier; it finds the most discriminatory bases by maximizing the ratio of the between-class variance to the within-class variance and reduces the dimensions of the data to $m - 1$, where $m$ is the number of classes (Huang et al., 2004; De Bie et al., 2005; Goodacre et al., 1998). In the case of our main data, classes are culture conditions. In the verification step of the correspondence between the LDA Raman and omics data, we conducted LOOCV. In LOOCV, one condition is used as test data and the remaining conditions are used as training data. This is repeated so that each condition is excluded once.
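The LDA projection described above can be illustrated with a small sketch on synthetic data. It solves the generalized eigenproblem of the between-class and within-class scatter matrices and keeps $m - 1$ axes. The function name `lda_axes` and the data are hypothetical, and the sketch assumes an invertible within-class scatter matrix, which may not hold for raw high-dimensional spectra without additional steps that are not described here.

```python
import numpy as np
from scipy.linalg import eigh


def lda_axes(X, y):
    """Return the (n_features, n_classes - 1) matrix of LDA axes.

    X: (n_samples, n_features) data; y: (n_samples,) class labels.
    """
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))   # within-class scatter
    Sb = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Generalized symmetric eigenproblem Sb w = lambda Sw w
    # (requires Sw to be positive definite).
    eigvals, eigvecs = eigh(Sb, Sw)
    order = np.argsort(eigvals)[::-1][: len(classes) - 1]
    return eigvecs[:, order]


# Synthetic example: 3 well-separated classes in 5 dimensions.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0, 0], [10, 0, 0, 0, 0], [0, 10, 0, 0, 0]], dtype=float)
y = np.repeat(np.arange(3), 20)
X = means[y] + rng.normal(size=(60, 5))
W = lda_axes(X, y)
Z = X @ W   # samples in the (m - 1) = 2 dimensional LDA space
```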

The details of the data analyses are provided in the sections below.

Raman-proteome statistical correspondence

Notation

We write the population-averaged 14-dimensional LDA Raman spectrum vector of each condition as a row vector ${\hat{𝒓}}_{i}$ $(i = 1, \dots, 15)$ and the 2058-dimensional absolute proteome vector of each condition as a row vector ${\hat{𝒑}}_{i}$ $(i = 1, \dots, 15)$. Note that we regarded ${\hat{𝒓}}_{i}$ and ${\hat{𝒑}}_{i}$ as column vectors in the main text to simplify the equations.

Our hypothesis of Raman-proteome linear correspondence (Equation 1 in the main text) is expressed as

$$
\hat{\boldsymbol{p}}_{i}^{\top} = B \begin{bmatrix} 1 \\ \hat{\boldsymbol{r}}_{i}^{\top} \end{bmatrix},
$$

where $B$ is a 2058 × 15 matrix and $⊤$ denotes transpose. In LOOCV, one condition is excluded (let $i$ be the excluded condition) and the remaining 14 conditions are used to estimate $B$ . We write the estimated $B$ as $B_{- i}^{est}$ , which is also a 2058 × 15 matrix. Let ${\hat{𝒑}}_{i}^{est}$ be the estimated proteome of the excluded condition in LOOCV (Figure 2G).

OLS in LOOCV scheme

In the case of LOOCV, 14 (= 15 − 1) conditions are included in the training data. Thus, if all the 14 LDA axes of the low-dimensional Raman data are considered, OLS becomes underdetermined, because 15 coefficients (the constant term and 14 LDA coefficients) must be estimated for each protein from 14 conditions. We excluded higher dimensions of the Raman space to conduct OLS in LOOCV unless otherwise noted. The results described in the main text were obtained using the first four axes (LDA1 to LDA4). In this case, $B_{- i}^{est}$ is a 2058 × 5 matrix.
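The leave-one-condition-out OLS scheme can be sketched as below. This is a minimal NumPy illustration on synthetic data; `loocv_ols` and all variable names are hypothetical, and the sketch takes the condition-averaged LDA coordinates as given rather than recomputing LDA in each fold.

```python
import numpy as np


def loocv_ols(R, P, n_axes=4):
    """Leave-one-condition-out OLS estimate of each proteome.

    R: (n_conditions, n_lda) condition-averaged LDA Raman coordinates.
    P: (n_conditions, n_proteins) absolute proteome abundances.
    Returns an array shaped like P; row i is predicted without condition i.
    """
    n_cond = R.shape[0]
    X = np.hstack([np.ones((n_cond, 1)), R[:, :n_axes]])   # rows are [1, r_i]
    P_est = np.empty_like(P, dtype=float)
    for i in range(n_cond):
        train = np.arange(n_cond) != i
        # Solve X_train @ B^T ~= P_train; B^T is (n_axes + 1, n_proteins).
        B_T, *_ = np.linalg.lstsq(X[train], P[train], rcond=None)
        P_est[i] = X[i] @ B_T
    return P_est


# Synthetic check: data that are exactly linear in LDA1 to LDA4.
rng = np.random.default_rng(0)
R = rng.normal(size=(15, 14))
B_true = rng.normal(size=(50, 5))
P = np.hstack([np.ones((15, 1)), R[:, :4]]) @ B_true.T
P_est = loocv_ols(R, P)
```

With four axes, each fold fits five coefficients per protein from 14 conditions, so the regression is overdetermined and noise-free synthetic data are recovered exactly.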

Permutation test

Let $σ$ be a permutation of all the 15 conditions. In our permutation test, we calculated overall estimation errors as $\sum_{i} dist ({\hat{𝒑}}_{i}, {\hat{𝒑}}_{σ (i)}^{est})$, where $dist ({\hat{𝒑}}_{i}, {\hat{𝒑}}_{σ (i)}^{est})$ is one of the distance measures between ${\hat{𝒑}}_{i}$ and ${\hat{𝒑}}_{σ (i)}^{est}$ listed in Appendix 1—table 2. There are $15!$ possible permutations $σ$, and evaluating all of them is computationally intensive. Thus, we randomly generated 10⁵ permutations.

The result presented in the main text is the case where the Euclidean metric (PRESS) was used as the distance measure. We also obtained small $p$-values with the other metrics (Appendix 1—table 2).
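A permutation test of this kind can be sketched as follows. The sketch assumes the squared Euclidean distance as the PRESS-type measure and counts the observed assignment as one permutation when computing the $p$-value; both conventions, like the function name `permutation_test`, are assumptions for illustration.

```python
import numpy as np


def permutation_test(P, P_est, n_perm=10_000, seed=0):
    """Return (observed error, p-value) for the condition-permutation test.

    P, P_est: (n_conditions, n_proteins) measured and estimated proteomes.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    # cost[i, k] = squared Euclidean distance between p_i and p_k^est
    cost = ((P[:, None, :] - P_est[None, :, :]) ** 2).sum(axis=2)
    observed = np.trace(cost)                      # identity assignment
    perms = np.array([rng.permutation(n) for _ in range(n_perm)])
    errors = cost[np.arange(n), perms].sum(axis=1)  # sum_i cost[i, sigma(i)]
    # Assumed convention: include the observed assignment in the count.
    p_value = (1 + np.sum(errors <= observed)) / (1 + n_perm)
    return observed, p_value


# Synthetic example: estimates close to the matching measured proteomes.
rng = np.random.default_rng(1)
P = rng.uniform(0.0, 100.0, size=(15, 200))
P_est = P + rng.normal(0.0, 1.0, size=P.shape)
observed, p_value = permutation_test(P, P_est, n_perm=2000)
```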

We could also estimate the proteomes with high accuracy using all the 14 dimensions of the LDA space (Appendix 1—table 3). As noted in ‘OLS in LOOCV scheme’ in Materials and methods, the regression is underdetermined in this case. Thus, we simply adopted the minimum-norm solution from among all least-squares solutions.
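The minimum-norm solution of an underdetermined regression can be illustrated with the Moore-Penrose pseudoinverse. The following NumPy sketch uses synthetic data and hypothetical variable names; it shows that many coefficient matrices fit the training data exactly and that the pseudoinverse picks the one with the smallest norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_axes, n_proteins = 14, 14, 6
# Design matrix with a constant column: 14 conditions x 15 coefficients.
X = np.hstack([np.ones((n_train, 1)), rng.normal(size=(n_train, n_axes))])
Y = rng.uniform(1.0, 10.0, size=(n_train, n_proteins))

# Minimum-norm least-squares solution of X @ B_T = Y.
B_T = np.linalg.pinv(X) @ Y

# Adding a null-space vector of X leaves the fit unchanged but increases the norm.
null_vec = np.linalg.svd(X)[2][-1]          # last right-singular vector
B_T_other = B_T + np.outer(null_vec, np.ones(n_proteins))
```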

Characterizing an SCG by analyzing the Raman-proteome correspondence matrix

Notation

The component representation of Equation 2 is

$$
\overbrace{\begin{pmatrix} p_{i1} \\ p_{i2} \\ \vdots \\ p_{in} \end{pmatrix}}^{\hat{\boldsymbol{p}}_{i}^{\top}}
=
\overbrace{\begin{pmatrix} b_{10} & b_{11} & \dots & b_{1(m-1)} \\ b_{20} & b_{21} & \dots & b_{2(m-1)} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n0} & b_{n1} & \dots & b_{n(m-1)} \end{pmatrix}}^{B}
\overbrace{\begin{pmatrix} 1 \\ r_{i1} \\ \vdots \\ r_{i(m-1)} \end{pmatrix}}^{[1\ \hat{\boldsymbol{r}}_{i}]^{\top}},
$$

where $n$ is the number of proteins and $m$ is the number of culture conditions. $n = 2058$ and $m = 15$ in our case. Let $𝒃_{h - 1}$ be the $h$-th column of $B$. For example, $𝒃_{0} = {(\begin{matrix} b_{10} & \dots & b_{n 0} \end{matrix})}^{⊤}$ denotes the constant term for each protein, and $𝒃_{1} = {(\begin{matrix} b_{11} & \dots & b_{n 1} \end{matrix})}^{⊤}$ the coefficient of LDA1 for each protein. The expression level of protein $j$ in condition $i$ is

$$
p_{ij} = b_{j0} + b_{j1} r_{i1} + \dots + b_{j(m-1)} r_{i(m-1)} .
$$

Stoichiometry conservation of ISP COG class

In the main text, we revealed that many proteins belonging to ISP COG class were aligned on a straight line passing through the origin when the relations between the columns of $B$ were shown in scatterplots (Figure 3A). Consider hypothetical proteins that align perfectly on a straight line through the origin. Let $e_{1}, \dots, e_{k}$ be the indices of such perfectly aligning protein species. Extracting only these rows for the proteins from Equation 3, we obtain

$$
\begin{aligned}
\begin{pmatrix} p_{i e_{1}} \\ p_{i e_{2}} \\ \vdots \\ p_{i e_{k}} \end{pmatrix}
&= \begin{bmatrix} \tilde{b}_{0} & \tilde{b}_{1} & \dots & \tilde{b}_{m-1} \end{bmatrix}
\begin{pmatrix} 1 \\ r_{i1} \\ \vdots \\ r_{i(m-1)} \end{pmatrix} \\
&= \begin{bmatrix} \tilde{b}_{0} & c_{1} \tilde{b}_{0} & \dots & c_{m-1} \tilde{b}_{0} \end{bmatrix}
\begin{pmatrix} 1 \\ r_{i1} \\ \vdots \\ r_{i(m-1)} \end{pmatrix} \\
&= \left(1 + c_{1} r_{i1} + \dots + c_{m-1} r_{i(m-1)}\right) \tilde{b}_{0},
\end{aligned}
$$

where $c_{h}$ ($h = 1, 2, \dots, m - 1$) are constants satisfying ${\tilde{b}}_{h} = c_{h} {\tilde{b}}_{0}$, and ${\tilde{b}}_{h} := {(b_{e_{1} h} \dots b_{e_{k} h})}^{⊤}$. For our data, Appendix 1—figure 4A and B correspond to Equation 7. In these plots, the $y$-axis represents ${\hat{𝒑}}_{i}^{⊤}$, and the $x$-axis ${\tilde{𝒃}}_{0}$. Many ISP proteins indeed align on a straight line through the origin with different slopes for different conditions (Appendix 1—figure 4A). In contrast, many proteins in other COG classes do not align on a straight line (Appendix 1—figure 4B).

Importantly, for a pair of proteins $e_{α}, e_{β}$ that align on the straight line,

$$
\frac{p_{i e_{\alpha}}}{p_{i e_{\beta}}} = \frac{b_{e_{\alpha} 0}}{b_{e_{\beta} 0}}
$$

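follows from Equation 7, because the condition-dependent factor $1 + c_{1} r_{i 1} + \dots + c_{m - 1} r_{i (m - 1)}$ is common to both proteins and cancels in the ratio. The right-hand side of Equation 8 does not contain the condition index $i$, which means that the abundance ratio of the proteins remains constant regardless of the conditions.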
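This argument can be checked numerically. The sketch below builds hypothetical coefficients whose columns are all proportional to the constant-term column and confirms that the resulting abundance ratios do not depend on the condition. All values and names are synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
k, m = 5, 15                               # aligned proteins, culture conditions
b0 = rng.uniform(1.0, 10.0, size=k)        # constant terms of the aligned proteins
c = rng.normal(0.0, 0.05, size=m - 1)      # constants c_1, ..., c_{m-1}
# k x m coefficient block: first column b0, then columns c_h * b0.
B_sub = np.column_stack([b0, np.outer(b0, c)])
R = rng.normal(size=(m, m - 1))            # hypothetical LDA Raman coordinates
X = np.hstack([np.ones((m, 1)), R])        # row i is [1, r_i]
P_sub = X @ B_sub.T                        # m x k abundances
ratios = P_sub / P_sub[:, [0]]             # abundances relative to protein e_1
```

Every row of `ratios` equals `b0 / b0[0]`, so the stoichiometry within the group is the same in all conditions even though the absolute abundances change.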

On the evaluation of stoichiometry conservation by Pearson correlation coefficient

In the main text, we used Pearson correlation coefficients to confirm the stoichiometry conservation of many ISP COG class members (Figure 3B). However, strictly speaking, cosine similarity is a more appropriate measure to evaluate stoichiometry conservation. In this analysis, cosine similarity can be written as

$$
\cos \theta_{\tilde{\hat{\boldsymbol{p}}}_{i} \tilde{\hat{\boldsymbol{p}}}_{j}} = \frac{\tilde{\hat{\boldsymbol{p}}}_{i} \cdot \tilde{\hat{\boldsymbol{p}}}_{j}}{\lVert \tilde{\hat{\boldsymbol{p}}}_{i} \rVert_{2} \, \lVert \tilde{\hat{\boldsymbol{p}}}_{j} \rVert_{2}},
$$

where ${\tilde{\hat{𝒑}}}_{i}$ and ${\tilde{\hat{𝒑}}}_{j}$ are the vectors representing the protein abundances of a proteome subgroup (‘ISP’ COG class, ‘Cellular processes and signaling’ COG class, or ‘Metabolism’ COG class) under conditions $i$ and $j$, respectively ($1 \leq i, j \leq m$). The cosine-similarity version of Figure 3B is shown in Appendix 1—figure 4F. The cosine similarity takes its maximum value of 1 only when the abundance ratios between all considered proteins are exactly the same in the two compared conditions.
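The difference between the two measures can be seen in a toy example. The sketch below uses made-up abundance vectors: scaling a vector preserves all abundance ratios and gives a cosine similarity of one, whereas adding a constant offset changes the ratios yet still gives a Pearson correlation of one.

```python
import numpy as np


def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def pearson(u, v):
    return float(np.corrcoef(u, v)[0, 1])


p_ref = np.array([1.0, 2.0, 4.0, 8.0])   # hypothetical abundances in one condition
p_scaled = 3.0 * p_ref                   # same ratios between proteins
p_shifted = p_ref + 5.0                  # ratios between proteins differ

cos_scaled = cosine_similarity(p_ref, p_scaled)
cos_shifted = cosine_similarity(p_ref, p_shifted)
r_shifted = pearson(p_ref, p_shifted)
```

Here `cos_scaled` and `r_shifted` both equal one up to rounding, while `cos_shifted` is clearly below one, which is why cosine similarity is the stricter test of stoichiometry conservation.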

In addition, we also examined differences between COG classes by calculating Pearson correlation coefficients of log abundances (Appendix 1—figure 4E).

Direct characterization of SCGs in omics data

Notation

Let $𝒑_{i}$ be a column vector representing the abundances of protein $i$ . Each component of this vector indicates the abundance of protein $i$ under each condition. Therefore,

$$
\boldsymbol{p}_{i} = \begin{pmatrix} p_{1i} \\ p_{2i} \\ \vdots \\ p_{mi} \end{pmatrix},
$$

where $m = 15$ in our case. Note that $𝒑_{i}$ defined here is a 15-dimensional column vector and different from ${\hat{𝒑}}_{i}$ introduced previously, which was a 2058-dimensional row vector.

Identifying SCGs in omics data

As explained in the main text, we extracted SCGs directly from the omics data, without referring to Raman data or COG classification. We evaluated the similarity of expression patterns for all the combinations of proteins using cosine similarity. Specifically, cosine similarity between proteins $i$ and $j$ is calculated as

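$$
\cos \theta_{\boldsymbol{p}_{i} \boldsymbol{p}_{j}} := \frac{\boldsymbol{p}_{i} \cdot \boldsymbol{p}_{j}}{\lVert \boldsymbol{p}_{i} \rVert_{2} \, \lVert \boldsymbol{p}_{j} \rVert_{2}} .
$$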

This is the inner product of normalized $𝒑_{i}$ and $𝒑_{j}$. Note that $0 \leq \cos θ_{𝒑_{i} 𝒑_{j}} \leq 1$ because protein abundances are non-negative. $\cos θ_{𝒑_{i} 𝒑_{j}}$ takes the maximum value 1 if and only if $𝒑_{i}$ and $𝒑_{j}$ point in the same direction, i.e., the abundance ratio of proteins $i$ and $j$ is constant under all the conditions. Therefore, if we extract only the proteins connected with high cosine similarity from all $\binom{n}{2}$ protein pairs, they would constitute proteome fractions in each of which the abundance ratios of the proteins remain almost constant across all the $m$ conditions. We hence extracted only the protein pairs whose cosine similarity was above a high threshold of 0.995. As a result, we obtained several SCGs, in each of which the protein species are linked to each other with high cosine similarity (Figure 4B and C).
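The thresholding step can be sketched as follows. The sketch computes the cosine similarity matrix, keeps only links above 0.995, and groups proteins into connected components of the resulting graph. Treating each group as a connected component is an assumption made for illustration, as are the function names and the toy data.

```python
import numpy as np


def cosine_similarity_matrix(P):
    """P: (n_proteins, n_conditions). Returns the n x n cosine similarity matrix."""
    unit = P / np.linalg.norm(P, axis=1, keepdims=True)
    return unit @ unit.T


def threshold_components(A, threshold=0.995):
    """Label connected components of the graph with edges where A > threshold."""
    n = A.shape[0]
    linked = A > threshold
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                      # depth-first search
            node = stack.pop()
            for nb in np.flatnonzero(linked[node] & (labels == -1)):
                labels[nb] = current
                stack.append(nb)
        current += 1
    return labels


# Toy data: three proteins proportional to one profile, two to another.
base1 = np.arange(1.0, 16.0)
base2 = base1[::-1]
P = np.vstack([1.0 * base1, 2.5 * base1, 0.3 * base1, 4.0 * base2, 0.7 * base2])
labels = threshold_components(cosine_similarity_matrix(P))
```

In real data, components containing a single protein would correspond to proteins without any partner above the threshold.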

The genes in each SCG are listed in Appendix 1—tables 4–8. Note that there are many other minor components (Figure 4B), some of which may have an expression pattern similar to that of another component but are separated from it because of the high threshold.

The positions of members of the SCGs on the chromosome are shown in Figure 4D (SCG 1 [homeostatic core]) and Appendix 1—figure 5E (SCGs 2–5).

Global proteome structures based on stoichiometric balance

In the previous section, we identified SCGs by setting a threshold of cosine similarity for extracting protein pairs. We next removed the threshold and considered the ‘distance’ with respect to cosine similarity for all the protein pairs to capture the global proteome structure that includes SCGs.

The cosine similarity for all the $\binom{n}{2}$ pairs of proteins can be summarized in one matrix as

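$$
A := \left(\cos \theta_{\boldsymbol{p}_{i} \boldsymbol{p}_{j}}\right)_{1 \leq i, j \leq n} = \begin{pmatrix} \cos \theta_{\boldsymbol{p}_{1} \boldsymbol{p}_{1}} & \dots & \cos \theta_{\boldsymbol{p}_{1} \boldsymbol{p}_{n}} \\ \vdots & \ddots & \vdots \\ \cos \theta_{\boldsymbol{p}_{n} \boldsymbol{p}_{1}} & \dots & \cos \theta_{\boldsymbol{p}_{n} \boldsymbol{p}_{n}} \end{pmatrix},
$$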

where the $(i, j)$ component represents the cosine similarity between proteins $i$ and $j$. Regarding this matrix as an adjacency matrix in graph theory and network theory, we can consider the entire proteome as a weighted undirected complete graph (with loops), where nodes correspond to protein species and every protein pair is connected by an undirected edge. Each edge is weighted by the cosine similarity between the two protein species at its ends. Note that all the diagonal elements of $A$ are one, which means that each node has a loop with a weight of one. These loops were introduced just for simplicity.

To ask whether the SCGs identified in the previous section have any unique features in this network, we evaluated the degree to which each node is central in the network structure. In graph theory, ‘centrality’ is known as an index to measure how ‘important’ or ‘influential’ each node is. In particular, we employed a measure called ‘degree centrality’ (for weighted graphs) (Nieminen, 1973; Segarra and Ribeiro, 2015). Degree centrality, which is also called ‘degree’, simply measures ‘influence’ of a node on a network on the basis of links with its direct neighborhood. One can obtain a degree centrality value by calculating the sum of the weights of all the edges connected to each node (see also the definition of the degree matrix $D$ in Equation 14 below). We note that in our graph, degree centrality vector $A 𝟏_{n} (= D 𝟏_{n})$ , where $𝟏_{n}$ is an $n$ -dimensional column vector of which all elements are one, is equal to the eigenvector corresponding to the largest eigenvalue of a ‘normalized’ adjacency matrix ${(D^{- 1} A)}^{⊤} = A D^{- 1}$ up to multiplication by a constant. From this perspective, the centrality index we adopted measures ‘influence’ of a node in a recursive manner depending on ‘influence’ of its neighboring nodes. A well-known example of this centrality indicator is Google’s PageRank (Brin and Page, 1998) used for ranking web pages on the World Wide Web. It can be regarded as a variant of ‘eigenvector centrality’ (the eigenvector corresponding to the largest eigenvalue of the adjacency matrix $A$ ) (Bonacich, 1972; Segarra and Ribeiro, 2015). As explained in the main text, the protein species in the homeostatic core (the largest SCG) had high centrality scores, while those in the other condition-specific SCGs had low centrality scores (Figure 5A).
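The degree centrality and its eigenvector property can be verified numerically. The sketch below uses synthetic abundances and hypothetical variable names; it computes $A 𝟏_{n}$ and checks that this vector is an eigenvector of $A D^{- 1}$ with eigenvalue one, which is the largest eigenvalue of that matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.uniform(0.1, 10.0, size=(30, 15))      # synthetic abundances, 30 proteins
unit = P / np.linalg.norm(P, axis=1, keepdims=True)
A = unit @ unit.T                              # cosine similarity adjacency matrix

degree = A.sum(axis=1)                         # degree centrality, A 1_n
M = A / degree                                 # A D^{-1}: column j divided by d_j
largest_eigenvalue = np.linalg.eigvals(M).real.max()
```

Because `M @ degree` reduces to `A @ ones`, the degree vector is mapped onto itself, which is the fixed-point property behind the recursive interpretation of this centrality.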

We directly observed the global stoichiometry conservation structure of this proteome graph using Laplacian eigenmaps (Figures 5K, L, 6D). In general, a graph can be uniquely specified not only by the adjacency matrix $A$ , but also by the Laplacian matrix $L$ defined as

$$
L := D - A,
$$

where $D = (d_{i j})$ is the degree matrix with the components of

$$
d_{ij} = \begin{cases} (A \boldsymbol{1}_{n})_{i} & (i = j) \\ 0 & (i \neq j), \end{cases}
$$

where $𝟏_{n}$ is an $n$-dimensional column vector whose elements are all one. The $(i, i)$ element of $D$ represents the sum of the weights of all the edges connected to node $i$. In our case, it represents the sum of the cosine similarity values between protein $i$ and all the proteins, including the loop of weight one on protein $i$ itself. To see the entire proteome graph structure, we specifically employed the normalized Laplacian,

$$
L_{rw} = D^{-1} L = I - D^{-1} A .
$$

We remark that there are two types of often-used normalized Laplacian matrices, $L_{rw}$ and $L_{sym} = D^{- 1 / 2} L D^{- 1 / 2} = I - D^{- 1 / 2} A D^{- 1 / 2}$ , in the field of machine learning (von Luxburg, 2007), and our mathematical analysis can provide a clear interpretation to each of them in the context of the Raman-proteome linear correspondence as described in Appendix 2.1.5.

There exist $m - 1$ nontrivial eigenvalues of $L_{rw}$ that are greater than zero and less than one. We write these $m - 1$ eigenvalues as $λ_{LE1}, \dots, λ_{LE (m - 1)}$ in ascending order and the corresponding eigenvectors as $𝒗_{rw, 1}, \dots, 𝒗_{rw, (m - 1)}$. Additionally, we denote the eigenvector corresponding to the eigenvalue zero as $𝒗_{rw, 0}$. Using these eigenvectors, one can construct a matrix ${\tilde{V}}_{rw} = \begin{bmatrix} 𝒗_{rw, 0} & 𝒗_{rw, 1} & \dots & 𝒗_{rw, (m - 1)} \end{bmatrix}$ and visualize a proteome by assigning protein $j$ the coordinate given by the elements from the second column onward in the $j$-th row of ${\tilde{V}}_{rw}$, i.e., by the $j$-th row of $\begin{bmatrix} 𝒗_{rw, 1} & \dots & 𝒗_{rw, (m - 1)} \end{bmatrix}$. The csLE structure we illustrate in the main figures was produced using these eigenvectors. For example, the csLE1-csLE2 figure in the main text (Figure 6D) is a scatterplot of $𝒗_{rw, 1}$ and $𝒗_{rw, 2}$. Note that the closer to one the cosine similarity of a protein pair is (the more similar their expression patterns are), the ‘closer’ the two protein species are placed (see Appendix 2.1.5 for details).
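A minimal sketch of this construction is given below. It obtains the eigenvectors of $L_{rw}$ from the symmetric matrix $L_{sym}$, which has the same eigenvalues and eigenvectors related by $D^{- 1 / 2}$. The function name `cosine_laplacian_eigenmaps`, the synthetic data, and the eigenvector scaling are assumptions for illustration; the scaling used for the published figures is not specified here.

```python
import numpy as np


def cosine_laplacian_eigenmaps(P, n_dims=2):
    """Laplacian eigenmaps of a complete graph weighted by cosine similarity.

    P: (n_proteins, n_conditions) non-negative abundances, no all-zero rows.
    Returns the n_dims smallest nontrivial eigenvalues of L_rw and the
    corresponding eigenvectors (columns), skipping the eigenvalue-zero mode.
    """
    unit = P / np.linalg.norm(P, axis=1, keepdims=True)
    A = unit @ unit.T                      # adjacency matrix (loops included)
    d = A.sum(axis=1)                      # diagonal of the degree matrix D
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # L_sym = I - D^{-1/2} A D^{-1/2} is symmetric, so eigh applies.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs_sym = np.linalg.eigh(L_sym)   # ascending eigenvalues
    # Eigenvectors of L_rw are D^{-1/2} times those of L_sym.
    eigvecs_rw = d_inv_sqrt[:, None] * eigvecs_sym
    return eigvals[1:n_dims + 1], eigvecs_rw[:, 1:n_dims + 1]


# Synthetic example: 40 proteins under 15 conditions.
rng = np.random.default_rng(3)
P = rng.uniform(0.1, 10.0, size=(40, 15))
lam, V = cosine_laplacian_eigenmaps(P, n_dims=2)
```

Plotting the two columns of `V` against each other gives a two-dimensional map analogous to a csLE1-csLE2 scatterplot.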

This method of obtaining a low-dimensional representation of data using eigenvectors of a graph Laplacian is known as Laplacian eigenmaps (LE) (Belkin and Niyogi, 2001; Belkin and Niyogi, 2003). Thus, what we explained above is the LE of a graph whose edges are weighted by the cosine similarity of the expression patterns of the nodes (protein species). It differs from the original and common usages of LE in that the graph we considered is a complete graph (with loops) and that the weight of edges (pairwise similarity of nodes) is cosine similarity. These choices make all the mathematical formulations linear, which allowed us to biologically interpret the results with mathematically rigorous analyses. We also remark that our graph representation of the proteome does not rely on existing knowledge of the underlying interaction and regulatory networks of proteins and is based only on the final expression levels of the proteins. Therefore, the results are robust against uncertainty in the underlying molecular details.

Relevance of centrality of csLE structure to biological functions

Centrality-essentiality correlation

As mentioned in the main text, centrality of protein species with regard to stoichiometry conservation correlates with gene essentiality (Figure 5B). In Figure 5B, we analyzed the proteome data from all the 22 conditions reported by Schmidt et al., 2016. Interestingly, the centrality-essentiality correlation became weaker when the analysis was conducted with the data from fewer conditions (Appendix 1—figure 6A).

We obtained the list of essential genes of E. coli from EcoCyc (Keseler et al., 2017) on September 23, 2020. The list contained 318 essential genes in total. The essentiality of the genes in this list was determined on the basis of whether single-gene knockouts of BW25113 (Keio Collection) could grow under LB condition at 37°C (Baba et al., 2006).

We also confirmed the centrality-essentiality correlation for S. pombe transcriptome data (Kobayashi-Kirschvink et al., 2018; Appendix 1—figure 6B, see Appendix 3.2). For this analysis, we downloaded the list of essential genes of S. pombe from PomBase (Harris et al., 2022) on May 13, 2022. The list contained 1221 essential genes in total. Here, the essentiality data from PomBase were based on the Fission Yeast Phenotype Ontology terms ‘inviable vegetative cell population’ (FYPO:0002061) and ‘viable vegetative cell population’ (FYPO:0002060) (Harris et al., 2013). Note that in our S. pombe essentiality analysis, we focused only on coding genes, whereas the csLE structure was calculated using both coding and non-coding genes. See ‘Centrality-coding/non-coding correlation’ in Materials and methods and Appendix 1—figure 6C for the proportion of coding genes in each bin in Appendix 1—figure 6B. Eleven coding genes in the S. pombe transcriptome data were not found in the current PomBase. Thus, some bins do not sum to 100% in Appendix 1—figure 6B.

Stoichiometry conservation centrality in human cells was evaluated using two kinds of H. sapiens transcriptome data: (i) human cell atlas data reported in Cao et al., 2020 (Figure 5F) and (ii) genome-wide Perturb-seq data reported in Replogle et al., 2022 (Figure 5H).

The human cell atlas data (Cao et al., 2020) contain gene expression profiles in cells from 15 fetal organs. To calculate stoichiometry conservation centrality from the human cell atlas, we analyzed the pseudobulk data (GSE156793_S4_gene_expression_tissue.txt provided at the Gene Expression Omnibus). We calculated the stoichiometry conservation centrality value of each gene using the expression level data of 53,908 genes that are expressed in at least one organ.

The Perturb-seq data we used are gene expression profiles in a chronic myeloid leukemia cell line K562 (Replogle et al., 2022). This dataset contains single-cell RNA sequencing data of genetically perturbed cells in which expression of targeted genes is inhibited by CRISPR interference. We analyzed the pseudobulk data (K562_gwps_raw_bulk_01.h5ad provided at Figshare) to calculate stoichiometry conservation centrality. We evaluated stoichiometry conservation centrality value of each gene using the expression data of all the 8248 genes in the Perturb-seq data. We remark that this dataset did not contain genes that showed no expression under all the reported genetic perturbation conditions.

Human gene essentiality was determined by referring to another dataset reported in Wang et al., 2015, in which fitness cost imposed by gene inactivation was evaluated by a CRISPR-based method (Wang et al., 2015). The fitness cost was quantified by an index called CRISPR score; genes with lower CRISPR scores are considered more essential (Wang et al., 2015). We used the CRISPR scores calculated with a human chronic myelogenous leukemia cell line KBM7.

The CRISPR scores of 16,996 genes and 7462 genes were found in Wang et al., 2015 among the genes whose stoichiometry conservation centrality was evaluated using the human cell atlas data (Cao et al., 2020) and the Perturb-seq data (Replogle et al., 2022), respectively. We evaluated the correlations between stoichiometry conservation centrality and gene essentiality (CRISPR scores) for these common genes in Figure 5F and H. The correlations were examined with the Brunner-Munzel test (Brunner and Munzel, 2000) using R (version 4.0.3) and ‘brunnermunzel’ package (version 2.0) (Ara, 2022).

Centrality-evolutionary conservation correlation

As mentioned in the main text, centrality of proteins with regard to expression stoichiometry conservation weakly correlates with evolutionary conservation represented by the number of orthologs based on protein sequences (Figure 5C). In Figure 5C, we analyzed the proteome data from all the 22 conditions reported in Schmidt et al., 2016. We also confirmed the relation for the E. coli proteome data from fewer conditions which we had used for our Raman-proteome correspondence analyses (Appendix 1—figure 6D).

We obtained the ortholog data from OrthoMCL-DB (Chen et al., 2006) (release 6.12). We used the number of orthologs in all of the ‘Core species’ and the ‘Peripheral species’ of OrthoMCL, which are across the three domains (Bacteria, Archaea, and Eukaryota), as a proxy for evolutionary conservation of each protein. To examine the correlation, we performed the Brunner-Munzel test (Brunner and Munzel, 2000) using R (version 4.0.3) and ‘brunnermunzel’ package (version 2.0) (Ara, 2022). The E. coli proteome data contain 15 proteins with IDs that were not found in OrthoMCL-DB for technical reasons such as changes in IDs in the past, and thus, we manually processed these 15 proteins.

We also examined S. pombe transcriptome data (Kobayashi-Kirschvink et al., 2018; Appendix 1—figure 6E–G, see Appendix 3.2). We obtained ortholog data from OrthoMCL-DB (Chen et al., 2006) (release 6.12). The S. pombe transcriptome data have 11 coding genes that were found in neither the current PomBase nor OrthoMCL-DB, and two coding genes that were found in PomBase but not in OrthoMCL-DB. The S. pombe transcriptome data contain not only coding genes but also non-coding genes, and we obtained the csLE structure using both.

We also evaluated stoichiometry conservation-evolutionary conservation correlation using the human cell atlas data (Cao et al., 2020; Figure 5G) and the genome-wide Perturb-seq data (Replogle et al., 2022; Figure 5I). Ortholog data for these analyses were obtained from OrthoMCL-DB (release 6.20). We found the ortholog data in OrthoMCL-DB for 18,959 genes among the 53,908 genes with stoichiometry conservation centrality evaluated with the human cell atlas data. We remark that 98.7% of the 18,959 genes were classified as coding genes in the human cell atlas data. We also found the ortholog data for 7957 genes among the 8248 genes with stoichiometry conservation centrality evaluated with the Perturb-seq data. The correlations were examined with the Brunner-Munzel test (Brunner and Munzel, 2000) using R (version 4.0.3) and ‘brunnermunzel’ package (version 2.0) (Ara, 2022).

Centrality-coding/non-coding correlation

As mentioned in the main text and ‘Centrality-essentiality correlation’ in Materials and methods, centrality of genes with regard to stoichiometry conservation clearly correlates with coding/non-coding classification of genes in S. pombe. We observed this trend using S. pombe transcriptome data (Kobayashi-Kirschvink et al., 2018; Appendix 1—figure 6C). The coding/non-coding assignment of each gene is based on PomBase (Harris et al., 2022) data downloaded on October 11, 2022.

We observed a comparable correlation even in the human cell atlas data (Figure 5E). The gene type assignment is based on the human cell atlas data. Note that almost all the genes in the Perturb-seq data were coding genes.

Global omics structures characterized by Raman-omics correspondences

Notation


Let ${\hat{b}}_{j}$ denote the $j$ -th row in $B$ (see Equation 3). It is an $m$ -dimensional row vector whose components represent coefficients of protein $j$ . The first component is the constant term, and the $i$ -th component is the coefficient for LDA( $i - 1$ ) Raman. Below, we consider the coefficients normalized with the constant terms,

{\hat{b}}_{j}^{norm} := (\begin{matrix} 1 & \frac{b_{j 1}}{b_{j 0}} & \dots & \frac{b_{j (m - 1)}}{b_{j 0}} \end{matrix}) .

Raman-proteome correspondence matrix as a low-dimensional representation of proteome changes


We asked whether the stoichiometry conservation structure of the proteomes revealed by LE (Figures 5K, L, 6D) is relevant to the low-dimensional Raman LDA space. To address this, we focused on a proteome low-dimensional structure specified by the Raman-proteome coefficients, motivated by the fact that the analysis of $B$ led to the discovery of a proteome fraction that conserves mutual stoichiometry (Figure 3). We considered a space where ${\hat{𝒃}}_{j}^{norm}$ represents the coordinate of each protein. From Equation 7 or Equation 8, protein species whose abundance ratios remain constant have an identical coordinate in this normalized coefficient space. The proteome in this normalized coefficient space is shown in Figure 6A and C.

This structure (Figure 6A and C) is constructed using the Raman LDA axes (dual basis) and is different from the csLE structure (Figures 5K, L, 6D), which is independent of Raman information. Therefore, it is nontrivial that these two structures are similar. This similarity suggests that differences in cellular Raman spectra captured by LDA might be quantitatively related to the omics structure deduced from stoichiometry-conserving relations. We will mathematically analyze this similarity in Section 2 in Appendix.

Evaluating similarity between orthogonal matrix $Θ$ and identity matrix


As we see in Appendix 2.1.5, an orthogonal matrix $Θ$ that appears in the relation connecting the two types of proteome structure must be close to an identity matrix to guarantee the structural similarity. To evaluate to what extent $Θ$ is close to an identity matrix, we generated many random orthogonal matrices (Mezzadri, 2006) and compared $Θ$ and the identity matrix with them.

We first multiplied each orthogonal matrix by itself in the sense of Hadamard product (element-wise product). Then, we regarded the resultant matrix as a scatterplot (Appendix 1—figure 9B) and calculated its Pearson correlation coefficient, assuming that $(i, j)$ element was the frequency of ‘data points’ at the coordinate $(i, j)$ . The obtained Pearson correlation coefficient can be regarded as a measure of closeness to the identity matrix. In the case of the identity matrix, the correlation coefficient takes the maximum value, one, because non-zero values are concentrated on the diagonal part.

We squared each matrix in the sense of the Hadamard product for two reasons. First, since all elements of the resultant matrix are non-negative, the number of ‘points’ is non-negative at every coordinate (although it is not necessarily an integer). Second, because every row and column of an orthogonal matrix has unit norm, the sum of all the elements of the resultant matrix is necessarily $m$ . Thus, the total number of ‘points’ equals $m$ for every $m \times m$ orthogonal matrix compared.
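The closeness measure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the study: the function names `random_orthogonal` and `closeness_to_identity`, the matrix size, and the number of random draws are all assumptions made for the example. Random orthogonal matrices are drawn with the QR-based recipe of Mezzadri (2006), and the Hadamard square of each matrix is treated as a table of weights at integer coordinates $(i, j)$ .

```python
import numpy as np


def random_orthogonal(m, rng):
    """Draw an m x m Haar-distributed orthogonal matrix via QR (Mezzadri, 2006)."""
    z = rng.standard_normal((m, m))
    q, r = np.linalg.qr(z)
    # Remove the sign ambiguity of the QR factorization (column-wise sign fix).
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs


def closeness_to_identity(theta):
    """Pearson correlation of the 'scatterplot' with weight theta[i, j]**2 at (i, j)."""
    w = theta ** 2                          # Hadamard square: non-negative, sums to m
    m = w.shape[0]
    idx = np.arange(1, m + 1, dtype=float)
    total = w.sum()
    mean_i = (w.sum(axis=1) * idx).sum() / total
    mean_j = (w.sum(axis=0) * idx).sum() / total
    di, dj = idx - mean_i, idx - mean_j
    cov = (w * np.outer(di, dj)).sum() / total
    var_i = (w.sum(axis=1) * di ** 2).sum() / total
    var_j = (w.sum(axis=0) * dj ** 2).sum() / total
    return cov / np.sqrt(var_i * var_j)


rng = np.random.default_rng(0)
m = 15                                      # illustrative size
identity_score = closeness_to_identity(np.eye(m))
random_scores = np.array(
    [closeness_to_identity(random_orthogonal(m, rng)) for _ in range(200)]
)
```

The score of an observed $Θ$ could then be placed between `identity_score` and the distribution in `random_scores` to judge how close it is to the identity matrix.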

In addition to this method, we also evaluated the closeness of $Θ$ to the identity matrix (i) by comparing the magnitudes of off-diagonal elements among $Θ$ , the identity matrix, and random orthogonal matrices, and (ii) by comparing the magnitudes of elements of leading principal submatrices among $Θ$ , the identity matrix, and random orthogonal matrices. In (i), starting from the part consisting of the $(m - 1)$ - and $- (m - 1)$ -diagonals (the $(1, m)$ and $(m, 1)$ elements), we expanded the area under consideration step by step by including the $i$ - and $- i$ -diagonals ( $m - 1 \geq i \geq 0$ ; the final step is the inclusion of the main diagonal), and calculated the sum of the squares of the elements in the area at each step. In (ii), starting from the smallest leading principal submatrix (the $(1, 1)$ element), we expanded the area step by step up to the whole matrix and calculated the sum of the squares of the elements in the area at each step. See also schematic diagrams in the figures in Appendix (e.g. Appendix 1—figure 9D and E).
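The two cumulative sums in (i) and (ii) can be illustrated as follows. This is a sketch under the reading given above; the helper names `band_cumulative_sq` and `leading_cumulative_sq` are hypothetical and the $5 \times 5$ identity matrix is only a toy input.

```python
import numpy as np


def band_cumulative_sq(mat):
    """Sum of squared elements with |row - col| >= i, for i = m - 1, ..., 0."""
    m = mat.shape[0]
    rows, cols = np.indices((m, m))
    offset = np.abs(rows - cols)
    sq = mat ** 2
    return np.array([sq[offset >= i].sum() for i in range(m - 1, -1, -1)])


def leading_cumulative_sq(mat):
    """Sum of squared elements of the k x k leading principal submatrix, k = 1..m."""
    m = mat.shape[0]
    sq = mat ** 2
    return np.array([sq[:k, :k].sum() for k in range(1, m + 1)])


eye = np.eye(5)
band_eye = band_cumulative_sq(eye)          # stays at zero until the main diagonal is added
leading_eye = leading_cumulative_sq(eye)    # grows by one with each added row and column
```

For the identity matrix, the first curve jumps only at the last step and the second grows linearly, which gives the reference shapes against which $Θ$ and random orthogonal matrices can be compared.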

Interpretation of $L^{1}$ norm/ $L^{2}$ norm ratio of an expression vector as a quantitative measure of expression generality


In Appendix 2.1.5, we will also see that even if $Θ$ is close to the identity matrix, there is another condition which must be met to guarantee the similarity of the two types of proteome structure. By considering the mathematics behind the condition, we will reveal that the two indices, stoichiometry conservation centrality (degree) $d_{j} = \sum_{i} \cos θ_{𝒑_{i} 𝒑_{j}}$ and expression generality score $g_{j} = {‖ p_{j} ‖}_{1} / {‖ p_{j} ‖}_{2}$ , must be mutually proportional. Here, we explain why $g_{j}$ is a quantitative measure of the generality (or constancy) of expression levels.

First, we note that the ratio ${‖ p_{j} ‖}_{1} / {‖ p_{j} ‖}_{2}$ is independent of the magnitude of the expression vector $𝒑_{j}$ . In other words, normalization does not affect the ratio:

\frac{{‖ \frac{p_{j}}{‖ p_{j} ‖_{2}} ‖}_{1}}{{‖ \frac{p_{j}}{‖ p_{j} ‖_{2}} ‖}_{2}} = \frac{\sum_{i = 1}^{m} \frac{| p_{i j} |}{‖ p_{j} ‖_{2}}}{\sqrt{\sum_{i = 1}^{m} \frac{(p_{i j})^{2}}{‖ p_{j} ‖_{2}^{2}}}} = \frac{\frac{‖ p_{j} ‖_{1}}{‖ p_{j} ‖_{2}}}{\frac{‖ p_{j} ‖_{2}}{‖ p_{j} ‖_{2}}} = \frac{‖ p_{j} ‖_{1}}{‖ p_{j} ‖_{2}} .

On the basis of this, we only consider normalized expression vectors $p_{j} / {‖ p_{j} ‖}_{2}$ without loss of generality.

By definition, the $L^{2}$ norm of a normalized expression vector (the denominator of the leftmost expression in Equation 17) equals one. Thus, the ratio we are considering equals the $L^{1}$ norm of the normalized expression vector:

\frac{‖ p_{j} ‖_{1}}{‖ p_{j} ‖_{2}} = {‖ \frac{p_{j}}{‖ p_{j} ‖_{2}} ‖}_{1} .

Here, we write

\frac{p_{j}}{‖ p_{j} ‖_{2}} = (\begin{matrix} {\tilde{p}}_{1 j} \\ {\tilde{p}}_{2 j} \\ ⋮ \\ {\tilde{p}}_{m j} \end{matrix}),

where ${\tilde{p}}_{1 j}, {\tilde{p}}_{2 j}, \dots, {\tilde{p}}_{m j} \geq 0$ . Then,

\frac{‖ p_{j} ‖_{1}}{‖ p_{j} ‖_{2}} = {‖ \frac{p_{j}}{‖ p_{j} ‖_{2}} ‖}_{1} = \sum_{i = 1}^{m} {\tilde{p}}_{i j} .

Note that $\sum_{i = 1}^{m} {({\tilde{p}}_{i j})}^{2} = 1$ holds because of normalization. Therefore, any normalized expression vector corresponds to a point on the first orthant division of the unit $(m - 1)$ -sphere $\sum_{i = 1}^{m} {(x_{i})}^{2} = 1 (x_{1}, \dots, x_{m} \geq 0)$ .

Next, we consider a hyperplane $\sum_{i = 1}^{m} x_{i} = k$ which passes through the point in Equation 19. Since all the coefficients of this hyperplane are equal, all the $m$ intercepts are also equal. The intercept value is $k = \sum_{i = 1}^{m} {\tilde{p}}_{i j}$ , thus equals the ratio in Equation 20. In other words, the ratio from Equation 20 appears as an intercept of the hyperplane passing through the point corresponding to the normalized vector $p_{j} / {‖ p_{j} ‖}_{2}$ with all the coefficients equal to one.

By simple calculation, one can see that the two surfaces $\sum_{i = 1}^{m} {(x_{i})}^{2} = 1 (x_{1}, \dots, x_{m} \geq 0)$ and $\sum_{i = 1}^{m} x_{i} = k$ intersect when $1 \leq k \leq \sqrt{m}$ . (In other words, ${‖ p_{j} ‖}_{2} \leq {‖ p_{j} ‖}_{1} \leq \sqrt{m} {‖ p_{j} ‖}_{2}$ holds.) The intercept value $k = \sum_{i = 1}^{m} {\tilde{p}}_{i j}$ takes the maximum $k = \sqrt{m}$ when the normalized expression vector points to the ‘center’ of the first orthant division of the unit $(m - 1)$ -sphere, i.e., when

\frac{p_{j}}{‖ p_{j} ‖_{2}} = (\begin{matrix} {\tilde{p}}_{1 j} \\ {\tilde{p}}_{2 j} \\ ⋮ \\ {\tilde{p}}_{m j} \end{matrix}) = \frac{1}{\sqrt{m}} (\begin{matrix} 1 \\ 1 \\ ⋮ \\ 1 \end{matrix}) .

This means that the expression level is even and constant across the conditions. When this evenness of expression level breaks, the intercept value $k$ decreases, and it attains the minimum $k = 1$ when the normalized expression vector overlaps with an axis, i.e., when

\frac{p_{j}}{‖ p_{j} ‖_{2}} = (\begin{matrix} {\tilde{p}}_{1 j} \\ ⋮ \\ {\tilde{p}}_{(μ - 1) j} \\ {\tilde{p}}_{μ j} \\ {\tilde{p}}_{(μ + 1) j} \\ ⋮ \\ {\tilde{p}}_{m j} \end{matrix}) = (\begin{matrix} 0 \\ ⋮ \\ 0 \\ 1 \\ 0 \\ ⋮ \\ 0 \end{matrix}),

which corresponds to a completely ‘condition-specific expression pattern’ (μ is the condition’s index).

See Appendix 1—figure 8A and B for a graphical explanation of the argument for the two- and three-dimensional cases.
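The bounds $1 \leq g_{j} \leq \sqrt{m}$ and the scale invariance of the ratio can be checked numerically. The sketch below uses made-up expression vectors for $m = 4$ conditions; the function name `expression_generality` is an illustrative choice.

```python
import numpy as np


def expression_generality(p):
    """Return ||p||_1 / ||p||_2 for a non-negative expression vector p."""
    p = np.asarray(p, dtype=float)
    return np.abs(p).sum() / np.sqrt((p ** 2).sum())


m = 4
even = np.full(m, 3.0)                       # same level in every condition
specific = np.array([0.0, 0.0, 7.0, 0.0])    # expressed in a single condition
mixed = np.array([1.0, 2.0, 3.0, 4.0])       # intermediate case

g_even = expression_generality(even)         # upper bound, sqrt(m)
g_specific = expression_generality(specific) # lower bound, 1
g_mixed = expression_generality(mixed)       # strictly between the two bounds
g_mixed_scaled = expression_generality(10.0 * mixed)  # unchanged by rescaling
```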

Proteome structure obtained with PCA


As mentioned in the main text, we confirmed that PCA could find a proteome structure (Appendix 1—figure 6H) similar to the csLE structure (Figure 5L and Figure 6D). Since the cosine similarity of expression vectors is the inner product of the $L^{2}$ -normalized expression vectors, we also performed $L^{2}$ normalization of the proteome data before applying PCA in this analysis. In other words, we applied PCA to the normalized proteome data $[p_{1} / {‖ p_{1} ‖}_{2} \dots p_{n} / {‖ p_{n} ‖}_{2}]$ (see Appendix 2.1.5).

We remark that, despite the structural similarity between the PCA structure and the csLE structure, csLE has an advantage over PCA in that the relative proximity of positions reflects the strength of stoichiometry conservation between each element. In addition, as shown in the main text and Section 2 in Appendix, csLE of omics data has a direct quantitative connection to cellular Raman spectra, which is not the case for PCA.

Appendix 1

1 Materials and methods

See Materials and methods section in the main text.

2 Mathematical analysis and details

To clarify what is nontrivial in the correspondence between LDA Raman and csLE proteome, here we derive rigorous mathematical relations for the correspondence through linear algebraic calculation.

2.1 Mathematics behind the correspondence

2.1.1 Notations

We use the following notations:

$1_{x}$ denotes an $x$ -dimensional column vector of ones, and $0_{x}$ denotes an $x$ -dimensional zero column vector.
A vector without a hat (e.g. $x$ ) is a column vector, and a vector with a hat (e.g. $\hat{x}$ ) is a row vector.
$I$ denotes an identity matrix, and $O$ denotes a zero matrix.
For a square matrix $X = (x_{i j})$ , $diag (X) = (δ_{i j} x_{i j})$ , and for a vector $x = (x_{i})$ , $diag (x) = (δ_{i j} x_{i})$ , where $δ_{i j}$ is the Kronecker delta.
For a matrix $X$ , $X [i, :]$ denotes the $i$ -th row of $X$ , and $X [:, j]$ denotes the $j$ -th column of $X$ .

2.1.2 Preparations

Let $l$ be the number of cells in each condition, $m$ be the number of conditions, $n$ be the original dimension of proteome, i.e., the number of protein species in the proteome data, and $s^{'}$ be the original dimension of Raman spectra after the application of the Savitzky-Golay filter. In our main data, $l = 38, m = 15, n = 2058$ , and $s^{'} = 599$ .

Original preprocessed Raman data: Let

\begin{array}{ll} {\hat{x}}_{j}^{' (i)} = (x_{j 1}^{' (i)} \dots x_{j s^{'}}^{' (i)}) \end{array}

be a preprocessed Raman spectrum from cell $j$ under condition $i$ (see ‘Raman measurements and preprocessing of spectra’ in Materials and methods). The prime ′ denotes that the variable is the original preprocessed data. The ${\hat{x}}_{j}^{' (i)}$ $(1 \leq j \leq l)$ are collected in an $l \times s^{'}$ matrix:

X_{i}^{'} = [\begin{matrix} {\hat{x}}_{1}^{' (i)} \\ ⋮ \\ {\hat{x}}_{l}^{' (i)} \end{matrix}] = {(x_{j k}^{' (i)})}_{1 \leq j \leq l, 1 \leq k \leq s^{'}} .

Combining $X_{i}^{'}$ from different conditions, one can define an $(l m) \times s^{'}$ matrix

X^{'} = [\begin{matrix} X_{1}^{'} \\ ⋮ \\ X_{m}^{'} \end{matrix}],

which contains all the preprocessed Raman data.

PCA and centering: Before LDA, PCA was first applied to the preprocessed Raman data to reduce noise. The covariance matrix is

C_{X^{'}} := \frac{1}{l m - 1} {(X^{'} - \frac{1}{l m} 1_{l m} (1_{l m})^{⊤} X^{'})}^{⊤} (X^{'} - \frac{1}{l m} 1_{l m} (1_{l m})^{⊤} X^{'}),

which is positive semi-definite. PCA is formulated as the following eigenvalue problem of $C_{X^{'}}$ :

C_{X^{'}} V_{P C A} = V_{P C A} Λ_{P C A},

where $Λ_{P C A}$ is a diagonal matrix with the eigenvalues of $C_{X^{'}}$ as its diagonal elements in decreasing order from the upper left, and $V_{P C A}$ is an orthogonal matrix consisting of the eigenvectors of $C_{X^{'}}$ as its columns. Using the first $s (1 \leq s \leq s^{'})$ columns of $V_{PCA} = [\begin{matrix} w_{1} & \dots & w_{s^{'}} \end{matrix}]$ , i.e., the columns corresponding to the first $s$ largest eigenvalues, we obtain an $(l m) \times s$ matrix representing the post PCA data:

X := (X^{'} - \frac{1}{l m} 1_{l m} {(1_{l m})}^{⊤} X^{'}) [\begin{matrix} w_{1} & \dots & w_{s} \end{matrix}]

Here, $w_{k}$ is the $k$ -th PCA coefficient vector, and $s$ is the reduced dimension of the Raman spectra. The subtraction of the $\frac{1}{l m} 1_{l m} (1_{l m})^{⊤} X^{'}$ is for centering the data. In our case, the top 218 principal components (i.e. $s = 218$ ) explaining 98% of the variance were used to reduce noise and dimensionality.

Let ${\hat{x}}_{j}^{(i)}$ be the post PCA Raman spectrum from cell $j$ under condition $i$ , i.e.,

\begin{array}{ll} {\hat{x}}_{j}^{(i)} = (\begin{matrix} x_{j 1}^{(i)} & \dots & x_{j s}^{(i)} \end{matrix}) \end{array},

and $X_{i}$ be the collection of ${\hat{x}}_{j}^{(i)}$ $(1 \leq j \leq l)$ , i.e.,

X_{i} = [\begin{matrix} {\hat{x}}_{1}^{(i)} \\ ⋮ \\ {\hat{x}}_{l}^{(i)} \end{matrix}] = {(x_{j k}^{(i)})}_{1 \leq j \leq l, 1 \leq k \leq s}

Then, $X$ is written as

X = [\begin{matrix} X_{1} \\ ⋮ \\ X_{m} \end{matrix}]

From Equation 2.6,

{(1_{l m})}^{⊤} X = 0,

namely, for any $k$ ,

\sum_{i = 1}^{m} \sum_{j = 1}^{l} x_{j k}^{(i)} = 0,

which means that the post PCA data is centered.
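The PCA step above can be sketched with an eigendecomposition of the covariance matrix. The data here are random numbers standing in for preprocessed spectra, and the sizes are toy values rather than those of the main data; only the 98% variance threshold is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
l, m, s_prime = 6, 4, 10                       # toy sizes, not the values of the main data
X_raw = rng.standard_normal((l * m, s_prime))  # stand-in for the preprocessed spectra X'

X_centered = X_raw - X_raw.mean(axis=0, keepdims=True)
C = X_centered.T @ X_centered / (l * m - 1)    # covariance matrix C_{X'}

eigvals, eigvecs = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]              # reorder to decreasing eigenvalues
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = np.cumsum(eigvals) / eigvals.sum()
s = int(np.searchsorted(explained, 0.98) + 1)  # smallest s reaching 98% of the variance
X = X_centered @ eigvecs[:, :s]                # post-PCA data, shape (l*m, s)
column_sums = X.sum(axis=0)                    # numerically zero: the data stay centered
```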

Population average in each condition: Let ${\hat{x}}_{i}$ be the population average of the post PCA spectra of cells under condition $i$ . Then,

\begin{array}{lcr} {\hat{x}}_{i} = \frac{1}{l} {(1_{l})}^{⊤} X_{i} . \end{array}

Also,

\begin{array}{ll} {\hat{x}}_{i} = (\begin{matrix} {\bar{x}}_{i 1} & \dots & {\bar{x}}_{i s} \end{matrix}) \end{array}

where

\begin{array}{lcr} {\overline{x}}_{i k} = \frac{1}{l} \sum_{j = 1}^{l} x_{j k}^{(i)} . \end{array}

We define an $m \times s$ matrix

\begin{array}{ll} \bar{X} = [\begin{matrix} {\hat{x}}_{1} \\ ⋮ \\ {\hat{x}}_{m} \end{matrix}] = {({\bar{x}}_{i k})}_{1 \leq i \leq m, 1 \leq k \leq s} \end{array} .

Each row of $\overline{X}$ corresponds to a condition. From Equation 2.11 and Equation 2.14, for any $k$ ,

\sum_{i = 1}^{m} {\bar{x}}_{i k} = 0.

Namely,

{(1_{m})}^{⊤} \bar{X} = 0 .

These relations mean that $\overline{X}$ is also centered.

LDA: The within-class covariance matrix is

C_{I} := \frac{1}{m} \sum_{i = 1}^{m} \frac{1}{l - 1} {(X_{i} - \frac{1}{l} 1_{l} {(1_{l})}^{⊤} X_{i})}^{⊤} (X_{i} - \frac{1}{l} 1_{l} {(1_{l})}^{⊤} X_{i})

= \frac{1}{m} \sum_{i = 1}^{m} \frac{1}{l - 1} {(X_{i} - 1_{l} {\hat{x}}_{i})}^{⊤} (X_{i} - 1_{l} {\hat{x}}_{i}) .

Here, Equation 2.12 was used. The between-class covariance matrix is

C_{B} := \frac{1}{m - 1} {(\bar{X} - \frac{1}{m} 1_{m} {(1_{m})}^{⊤} \bar{X})}^{⊤} (\bar{X} - \frac{1}{m} 1_{m} {(1_{m})}^{⊤} \bar{X})

= \frac{1}{m - 1} {\bar{X}}^{⊤} \bar{X} .

Here, Equation 2.17 was used. We assume $rank (C_{I}) = s$ and $rank (C_{B}) = m - 1$ (the maximum possible values); both equalities indeed hold in our data. Note that $s > m - 1$ . From the definitions of $C_{I}$ and $C_{B}$ above, both are positive semi-definite. LDA is formulated as the following generalized eigenvalue problem:

C_{B} V_{L D A} = C_{I} V_{L D A} Λ_{L D A}^{'},

where $Λ_{LDA}^{'}$ is a diagonal matrix, and $V_{LDA}$ is an $s \times (m - 1)$ matrix that simultaneously diagonalizes $C_{B}$ and $C_{I}$ (to $Λ_{LDA}$ ):

{(V_{L D A})}^{⊤} C_{B} V_{L D A} = {(V_{L D A})}^{⊤} C_{I} V_{L D A} Λ_{L D A}^{'} = Λ_{L D A} .

Here, the diagonal elements of $Λ_{LDA}^{'}$ are arranged in decreasing order from the upper left. In our analysis, the columns of $V_{LDA}$ were normalized.

Using $V_{LDA}$ , we obtain an $m \times (m - 1)$ matrix representing the post LDA data

R := \bar{X} V_{L D A} .

Each row of $R$ represents a dimension-reduced Raman spectrum of each condition. Let us write the $h$ -th $(1 \leq h \leq m - 1)$ column of $R$ as

r_{h} := (\begin{matrix} r_{1 h} \\ ⋮ \\ r_{m h} \end{matrix}) .

Then,

R = [\begin{matrix} r_{1} & \dots & r_{m - 1} \end{matrix}] = {(r_{i h})}_{1 \leq i \leq m, 1 \leq h \leq m - 1} .

Transforming $R^{⊤} R$ gives

R^{⊤} R = {(V_{L D A})}^{⊤} {\bar{X}}^{⊤} \bar{X} V_{L D A}

= (m - 1) {(V_{L D A})}^{⊤} C_{B} V_{L D A} (∵ 2.20)

= (m - 1) Λ_{L D A} (∵ 2.23) .

Therefore, $R^{⊤} R$ is a diagonal matrix, and the columns $r_{h}$ are orthogonal to each other. As all the diagonal elements of the diagonal matrix $Λ_{LDA}$ are positive, $rank (R) = m - 1$ . Furthermore,

{(1_{m})}^{⊤} R = {(1_{m})}^{⊤} \bar{X} V_{L D A}

= {(0_{m - 1})}^{⊤} (∵ 2.17) .

Namely, for any $h$ ,

\sum_{i = 1}^{m} r_{i h} = 0.

This means that, as the data is centered, all the columns of $R$ are perpendicular to $1_{m}$ .
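The LDA construction and the two properties of $R$ derived above (mutually orthogonal columns, each perpendicular to $1_{m}$ ) can be reproduced on synthetic data. The sketch below is an illustration only: the data are random, the sizes are toy values, and the generalized eigenproblem is solved by whitening with a Cholesky factor of $C_{I}$ , which is one of several equivalent numerical routes and not necessarily the one used in the study.

```python
import numpy as np

rng = np.random.default_rng(2)
l, m, s = 20, 4, 6                           # toy sizes; s > m - 1 as in the text
means = rng.standard_normal((m, s)) * 3.0
X = np.concatenate([mu + rng.standard_normal((l, s)) for mu in means])
X = X - X.mean(axis=0)                       # centered, as after PCA
classes = X.reshape(m, l, s)                 # classes[i] plays the role of X_i

X_bar = classes.mean(axis=1)                 # m x s matrix of condition averages
C_I = np.mean([np.cov(Xi, rowvar=False) for Xi in classes], axis=0)  # within-class
C_B = X_bar.T @ X_bar / (m - 1)              # between-class (X_bar is centered)

# Solve C_B v = lambda C_I v by whitening with the Cholesky factor of C_I.
L_chol = np.linalg.cholesky(C_I)
L_inv = np.linalg.inv(L_chol)
evals, U = np.linalg.eigh(L_inv @ C_B @ L_inv.T)
top = np.argsort(evals)[::-1][: m - 1]       # keep the m - 1 non-zero eigenvalues
V_lda = L_inv.T @ U[:, top]
V_lda = V_lda / np.linalg.norm(V_lda, axis=0)  # unit-norm columns

R = X_bar @ V_lda                            # m x (m - 1) post-LDA data
gram = R.T @ R                               # diagonal up to rounding error
```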

Proteome

Let

p_{j} = (\begin{matrix} p_{1 j} \\ ⋮ \\ p_{m j} \end{matrix})

be the absolute abundances of the $j$ -th protein. $p_{j}$ are collected in an $m \times n$ matrix:

P := [\begin{matrix} p_{1} & \dots & p_{n} \end{matrix}] = {(p_{i j})}_{1 \leq i \leq m, 1 \leq j \leq n} .

$p_{i j}$ is the absolute abundance of the $j$ -th protein in condition $i$ .

We assume $rank (P) = m$ , i.e., proteome vectors for different conditions are linearly independent; this indeed holds in our data. We also assume that proteins with zero expression in all the $m$ conditions have been excluded from the proteome data.

2.1.3 Linear transformation between LDA Raman and proteome

We define an $m \times m$ matrix

R_{E} := [\begin{matrix} 1_{m} & R \end{matrix}] = [\begin{matrix} 1_{m} & r_{1} & \dots & r_{m - 1} \end{matrix}]

We denote the first column of $R_{E}$ as $r_{0}$ . Hence,

r_{0} = 1_{m} .

From Equation 2.29 and Equation 2.31,

\begin{array}{lcr} \begin{array}{r} {(R_{E})}^{⊤} R_{E} = [\begin{array}{c} (1_{m})^{⊤} \\ R^{⊤} \end{array}] [\begin{array}{cc} 1_{m} & R \end{array}] = [\begin{array}{cc} m & 0 \\ 0 & R^{⊤} R \end{array}] = [\begin{array}{cc} m & 0 \\ 0 & (m - 1) Λ_{LDA} \end{array}] . \end{array} \end{array}

Therefore, ${(R_{E})}^{⊤} R_{E}$ is also a diagonal matrix, and $R_{E}$ has full rank. For convenience, we write

{(R_{E})}^{⊤} R_{E} = Λ_{R_{E}} = (\begin{matrix} m \\ (m - 1) λ_{L D A 1} \\ ⋱ \\ (m - 1) λ_{L D A (m - 1)} \end{matrix}) .

Here, we consider the singular value decomposition (SVD) of $R_{E}$ , i.e.,

R_{E} = U_{R_{E}} Σ_{R_{E}} {(V_{R_{E}})}^{⊤},

where $Σ_{R_{E}}$ is a diagonal matrix whose diagonal elements are the singular values of $R_{E}$ , and ${(U_{R_{E}})}^{⊤} U_{R_{E}} = {(V_{R_{E}})}^{⊤} V_{R_{E}} = I$ . Note that we can set $V_{R_{E}} = I$ in the following way. Let

U_{R_{E}} := R_{E} (Λ_{R_{E}})^{- 1 / 2} .

Then,

(U_{R_{E}})^{⊤} U_{R_{E}} = I,

(U_{R_{E}})^{⊤} R_{E} = (Λ_{R_{E}})^{1 / 2} .

Thus, SVD of $R_{E}$ can be written as

R_{E} = U_{R_{E}} (Λ_{R_{E}})^{1 / 2} I .

As $Λ_{R_{E}}$ is the eigenvalue matrix of $(R_{E})^{⊤} R_{E}$ ,

{(R_{E})}^{⊤} R_{E} = Λ_{R_{E}} = {(Σ_{R_{E}})}^{2}

and

Σ_{R_{E}} = (\begin{matrix} \sqrt{m} \\ \sqrt{(m - 1) λ_{L D A 1}} \\ ⋱ \\ \sqrt{(m - 1) λ_{L D A (m - 1)}} \end{matrix}) .

Now, we consider linear transformation between $P$ and $R_{E}$ . We introduce the $n \times m$ coefficient matrix $B_{E} = [b_{0} \dots b_{m - 1}]$ that connects $P$ and $R_{E}$ as

\begin{array}{ll} P & = R_{E} {(B_{E})}^{⊤} \end{array}

= [\begin{matrix} r_{0} & r_{1} & \dots & r_{m - 1} \end{matrix}] [\begin{matrix} {(b_{0})}^{⊤} \\ {(b_{1})}^{⊤} \\ ⋮ \\ {(b_{m - 1})}^{⊤} \end{matrix}]

= 1_{m} {(b_{0})}^{⊤} + r_{1} {(b_{1})}^{⊤} + \dots + r_{m - 1} {(b_{m - 1})}^{⊤} .

$R_{E}$ has full rank and is therefore invertible. Thus, $B_{E}$ is obtained by

B_{E} := P^{⊤} {({(R_{E})}^{- 1})}^{⊤} .

From the viewpoint of linear regression, $b_{0}$ can be regarded as the vector of constant terms, and $b_{h} (1 \leq h \leq m - 1)$ as the vector of coefficients for the $h$ -th LDA dimension.

We can rewrite $P$ using the row vectors in $R_{E}$ and $B_{E}$ . Writing the $i$ -th $(1 \leq i \leq m)$ row of $R_{E}$ as ${\hat{r}}_{i} = (\begin{matrix} 1 & r_{i 1} & \dots & r_{i (m - 1)} \end{matrix})$ ,

R_{E} = \begin{array}{ll} [\begin{matrix} {\hat{r}}_{1} \\ ⋮ \\ {\hat{r}}_{m} \end{matrix}] \end{array}

Likewise, writing the $j$ -th $(1 \leq j \leq n)$ row of $B_{E}$ as ${\hat{b}}_{j} = (\begin{matrix} b_{j 0} & \dots & b_{j (m - 1)} \end{matrix})$ ,

B_{E} = [\begin{matrix} {\hat{b}}_{1} \\ ⋮ \\ {\hat{b}}_{n} \end{matrix}]

Then, Equation 2.46 can be written in another way:

P = R_{E} {(B_{E})}^{⊤}

= [\begin{matrix} {\hat{r}}_{1} \\ ⋮ \\ {\hat{r}}_{m} \end{matrix}] [\begin{matrix} {({\hat{b}}_{1})}^{⊤} & \dots & {({\hat{b}}_{n})}^{⊤} \end{matrix}]

= (\begin{matrix} {\hat{r}}_{1} {({\hat{b}}_{1})}^{⊤} & \dots & {\hat{r}}_{1} {({\hat{b}}_{n})}^{⊤} \\ ⋮ & ⋱ & ⋮ \\ {\hat{r}}_{m} {({\hat{b}}_{1})}^{⊤} & \dots & {\hat{r}}_{m} {({\hat{b}}_{n})}^{⊤} \end{matrix}) .

The interpretation of each vector is summarized in Appendix 1—table 9.
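The linear transformation in this subsection can be illustrated numerically. In the sketch below, $R$ is not obtained from LDA; instead, a synthetic matrix with the same two properties (mutually orthogonal columns, each orthogonal to $1_{m}$ ) is built from a QR factorization, and $P$ is filled with random positive numbers. The sizes and the column scalings are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 8                                   # toy numbers of conditions and proteins

# Synthetic stand-in for R: orthogonal columns that are also orthogonal to 1_m.
Q, _ = np.linalg.qr(np.column_stack([np.ones(m), rng.standard_normal((m, m - 1))]))
R = Q[:, 1:] * np.array([4.0, 3.0, 2.0, 1.0])  # arbitrary positive column scales
R_E = np.column_stack([np.ones(m), R])        # m x m extended matrix, full rank

P = rng.uniform(0.5, 2.0, size=(m, n))        # synthetic abundances (m x n)
B_E = P.T @ np.linalg.inv(R_E).T              # n x m coefficient matrix

P_rebuilt = R_E @ B_E.T                       # reproduces P up to rounding error
gram_E = R_E.T @ R_E                          # diagonal, with m in the (1, 1) element
b0 = B_E[:, 0]                                # constant terms, one per protein
```

Because $R_{E}$ is square and invertible, the fit is exact: each protein's abundance profile across the $m$ conditions is reproduced by its $m$ coefficients without residual.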

2.1.4 Relation between LDA Raman and $Ω_{B}$

Here, we discuss the spatial correspondence between the Raman distribution in LDA space and the normalized Raman-proteome coefficient proteome structure (Figure 6B and C).

Connecting LDA Raman and Raman-omics transformation coefficients

Let us consider ${(R_{E})}^{⊤} R_{E} {(B_{E})}^{⊤}$ in two ways. In the first approach,

{(R_{E})}^{⊤} (R_{E} {(B_{E})}^{⊤}) = {(R_{E})}^{⊤} P

= [\begin{matrix} {({\hat{r}}_{1})}^{⊤} & \dots & {({\hat{r}}_{m})}^{⊤} \end{matrix}] (\begin{matrix} p_{11} & \dots & p_{1 n} \\ ⋮ & ⋱ & ⋮ \\ p_{m 1} & \dots & p_{m n} \end{matrix})

= [\begin{matrix} \sum_{i = 1}^{m} p_{i 1} {({\hat{r}}_{i})}^{⊤} & \dots & \sum_{i = 1}^{m} p_{i n} {({\hat{r}}_{i})}^{⊤} \end{matrix}]

In the second approach,

({(R_{E})}^{⊤} R_{E}) {(B_{E})}^{⊤} = {(Σ_{R_{E}})}^{2} {(B_{E})}^{⊤}

= (\begin{matrix} m \\ (m - 1) λ_{L D A 1} \\ ⋱ \\ (m - 1) λ_{L D A (m - 1)} \end{matrix}) [\begin{matrix} {({\hat{b}}_{1})}^{⊤} & \dots & {({\hat{b}}_{n})}^{⊤} \end{matrix}] .

Comparing the two calculations yields

\begin{array}{lcr} {(R_{E})}^{⊤} P = {(Σ_{R_{E}})}^{2} {(B_{E})}^{⊤} . \end{array}

This is equivalent to

\sum_{i = 1}^{m} p_{i j} {({\hat{r}}_{i})}^{⊤} = (\begin{matrix} m \\ (m - 1) λ_{L D A 1} \\ ⋱ \\ (m - 1) λ_{L D A (m - 1)} \end{matrix}) {({\hat{b}}_{j})}^{⊤}

for any protein $j$ .

Normalization with constant terms

Next, we consider the normalization of the matrices in Equation 2.58 with constant terms.

We first define the normalized coefficient matrix as

B_{E}^{norm} := (\begin{matrix} 1 & \frac{b_{11}}{b_{10}} & \dots & \frac{b_{1 (m - 1)}}{b_{10}} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & \frac{b_{n 1}}{b_{n 0}} & \dots & \frac{b_{n (m - 1)}}{b_{n 0}} \end{matrix}) = diag {(b_{0})}^{- 1} B_{E} .

This is normalization of the coefficients by the constant terms. Furthermore, we normalize $Σ_{R_{E}}$ as

Σ_{R_{E}}^{n o r m} := \frac{1}{{(Σ_{R_{E}})}_{11}} Σ_{R_{E}} = \frac{1}{\sqrt{m}} Σ_{R_{E}}

= (\begin{matrix} 1 \\ \sqrt{\frac{m - 1}{m} λ_{L D A 1}} \\ ⋱ \\ \sqrt{\frac{m - 1}{m} λ_{L D A (m - 1)}} \end{matrix}) .

Thus, the right-hand side of Equation 2.58 is

{(Σ_{R_{E}})}^{2} {(B_{E})}^{⊤} = m {(Σ_{R_{E}}^{n o r m})}^{2} {(B_{E}^{n o r m})}^{⊤} d i a g (b_{0}) .

Since the first row of ${(R_{E})}^{⊤}$ is ${(1_{m})}^{⊤}$ , one can rewrite the left-hand side of Equation 2.58 as

\begin{array}{ll} {(R_{E})}^{⊤} P & = [\begin{matrix} \frac{\sum_{i = 1}^{m} p_{i 1} {({\hat{r}}_{i})}^{⊤}}{\sum_{i = 1}^{m} p_{i 1}} & \dots & \frac{\sum_{i = 1}^{m} p_{i n} {({\hat{r}}_{i})}^{⊤}}{\sum_{i = 1}^{m} p_{i n}} \end{matrix}] (\begin{matrix} \sum_{i = 1}^{m} p_{i 1} \\ ⋱ \\ \sum_{i = 1}^{m} p_{i n} \end{matrix}) \end{array}

= [\begin{matrix} \frac{\sum_{i = 1}^{m} p_{i 1} {({\hat{r}}_{i})}^{⊤}}{\sum_{i = 1}^{m} p_{i 1}} & \dots & \frac{\sum_{i = 1}^{m} p_{i n} {({\hat{r}}_{i})}^{⊤}}{\sum_{i = 1}^{m} p_{i n}} \end{matrix}] diag ({(1_{m})}^{⊤} P) .

Here,

(1_{m})^{⊤} P = (1_{m})^{⊤} R_{E} (B_{E})^{⊤}

= [m {(0_{m - 1})}^{⊤}] (B_{E})^{⊤}

= m (b_{0})^{⊤} .

Therefore,

b_{0} = \frac{1}{m} (\begin{matrix} {(1_{m})}^{⊤} p_{1} \\ ⋮ \\ {(1_{m})}^{⊤} p_{n} \end{matrix}) = (\begin{matrix} \frac{1}{m} \sum_{i = 1}^{m} p_{i 1} \\ ⋮ \\ \frac{1}{m} \sum_{i = 1}^{m} p_{i n} \end{matrix})

This relation indicates that the constant term for each protein is its average abundance. Consequently,

{(R_{E})}^{⊤} P = m [\begin{matrix} \frac{\sum_{i = 1}^{m} p_{i 1} {({\hat{r}}_{i})}^{⊤}}{\sum_{i = 1}^{m} p_{i 1}} & \dots & \frac{\sum_{i = 1}^{m} p_{i n} {({\hat{r}}_{i})}^{⊤}}{\sum_{i = 1}^{m} p_{i n}} \end{matrix}] diag (b_{0})

Therefore, we obtain

[\begin{matrix} \frac{\sum_{i = 1}^{m} p_{i 1} {({\hat{r}}_{i})}^{⊤}}{\sum_{i = 1}^{m} p_{i 1}} & \dots & \frac{\sum_{i = 1}^{m} p_{i n} {({\hat{r}}_{i})}^{⊤}}{\sum_{i = 1}^{m} p_{i n}} \end{matrix}] = {(Σ_{R_{E}}^{n o r m})}^{2} {(B_{E}^{n o r m})}^{⊤} .

Equivalently,

\frac{\sum_{i = 1}^{m} p_{i j} {({\hat{r}}_{i})}^{⊤}}{\sum_{i = 1}^{m} p_{i j}} = (\begin{matrix} 1 \\ \frac{m - 1}{m} λ_{L D A 1} \\ ⋱ \\ \frac{m - 1}{m} λ_{L D A (m - 1)} \end{matrix}) (\begin{matrix} 1 \\ \frac{b_{j 1}}{b_{j 0}} \\ ⋮ \\ \frac{b_{j (m - 1)}}{b_{j 0}} \end{matrix})

for any protein $j$ . This means that the normalized coefficients of each protein are mainly determined by the weighted averages of the Raman vectors, where the weights are the abundances of the protein.

As we already saw in Equation 7 or Equation 8 in Materials and methods, this equation also shows that protein pairs whose abundance ratio remains constant over all the conditions have identical normalized coefficients.
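The normalized relation and its consequence for stoichiometry-conserving protein pairs can be checked on synthetic data. The sketch below again uses a synthetic $R$ with the required orthogonality properties rather than real LDA output, and forces one hypothetical protein pair to keep a constant abundance ratio.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 6                                   # toy sizes

Q, _ = np.linalg.qr(np.column_stack([np.ones(m), rng.standard_normal((m, m - 1))]))
R = Q[:, 1:] * np.array([4.0, 3.0, 2.0, 1.0])  # synthetic stand-in for LDA Raman
R_E = np.column_stack([np.ones(m), R])

P = rng.uniform(0.5, 2.0, size=(m, n))
P[:, 1] = 2.5 * P[:, 0]                       # protein 1 keeps a fixed ratio to protein 0
B_E = P.T @ np.linalg.inv(R_E).T
B_norm = B_E / B_E[:, [0]]                    # divide each row by its constant term

sigma_norm_sq = np.diag(R_E.T @ R_E) / m      # diagonal of (Sigma_norm)^2
lhs = (R_E.T @ P) / P.sum(axis=0)             # abundance-weighted averages of Raman rows
rhs = sigma_norm_sq[:, None] * B_norm.T       # (Sigma_norm)^2 times normalized coefficients
```

In this toy example `lhs` and `rhs` agree column by column, the constant terms `B_E[:, 0]` equal the mean abundances, and the two proteins with a fixed abundance ratio share the same row of `B_norm`.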

Special case – condition-specific protein

Consider an imaginary condition-specific protein $γ$ whose abundance is $c (> 0)$ under condition $Γ$ and zero under the other conditions, i.e.,

p_{i γ} = \begin{cases} 0 & (i \neq Γ) \\ c & (i = Γ) \end{cases} .

From Equation 2.59,

(\begin{matrix} c \\ c \\ ⋱ \\ c \end{matrix}) {({\hat{r}}_{Γ})}^{⊤} = (\begin{matrix} m \\ (m - 1) λ_{L D A 1} \\ ⋱ \\ (m - 1) λ_{L D A (m - 1)} \end{matrix}) {({\hat{b}}_{γ})}^{⊤},

which indicates that the LDA Raman of condition $Γ$ , ${({\hat{r}}_{Γ})}^{⊤}$ and Raman-proteome coefficients for $Γ$ -specific protein $γ$ , ${({\hat{b}}_{γ})}^{⊤}$ are in the same orthant. The normalized version is obtained by dividing both sides by the first row (or from Equation 2.72):

{({\hat{r}}_{Γ})}^{⊤} = (\begin{matrix} 1 \\ \frac{m - 1}{m} λ_{L D A 1} \\ ⋱ \\ \frac{m - 1}{m} λ_{L D A (m - 1)} \end{matrix}) (\begin{matrix} 1 \\ \frac{b_{γ 1}}{b_{γ 0}} \\ ⋮ \\ \frac{b_{γ (m - 1)}}{b_{γ 0}} \end{matrix})

= {(Σ_{R_{E}}^{n o r m})}^{2} \frac{{({\hat{b}}_{γ})}^{⊤}}{b_{γ 0}},

which shows that the LDA Raman of condition $Γ$ , ${({\hat{r}}_{Γ})}^{⊤}$ and the normalized Raman-proteome coefficients of $Γ$ -specific protein $γ$ , ${({\hat{b}}_{γ})}^{⊤} / b_{γ 0}$ are in the same orthant.
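The special case can also be verified directly. In this sketch the condition index `Gamma`, the abundance `c`, and the synthetic $R_{E}$ are illustrative assumptions; the coefficients of the imaginary protein are obtained by solving $R_{E} {({\hat{b}}_{γ})}^{⊤} = p_{γ}$ .

```python
import numpy as np

rng = np.random.default_rng(5)
m = 5
Q, _ = np.linalg.qr(np.column_stack([np.ones(m), rng.standard_normal((m, m - 1))]))
R_E = np.column_stack([np.ones(m), Q[:, 1:] * np.array([4.0, 3.0, 2.0, 1.0])])

Gamma, c = 2, 7.0                            # hypothetical condition index and abundance
p_gamma = np.zeros(m)
p_gamma[Gamma] = c                           # expressed only under condition Gamma

b_gamma = np.linalg.solve(R_E, p_gamma)      # coefficients satisfying R_E b = p_gamma
b_gamma_norm = b_gamma / b_gamma[0]          # normalization by the constant term
sigma_norm_sq = np.diag(R_E.T @ R_E) / m     # diagonal of (Sigma_norm)^2

r_Gamma = R_E[Gamma]                         # extended LDA Raman row of condition Gamma
same_orthant = np.all(np.sign(r_Gamma) == np.sign(b_gamma_norm))
```

Since every diagonal element of ${(Σ_{R_{E}}^{norm})}^{2}$ is positive, the element-wise product `sigma_norm_sq * b_gamma_norm` reproduces `r_Gamma` and the two vectors necessarily share their signs.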

Application to main data

The LDA Raman distribution shown in Figure 6B corresponds to $\hat{r_{i}}$ (scatterplots between different columns of $R_{E}$ ). On the other hand, the normalized coefficient proteome structure in Figure 6C is the scatterplots between different columns of $B_{E}^{norm}$ . The above linear algebra explains the correspondence between the two. In addition, from Equation 2.72, one can understand that the homeostatic core distributes around the center of the structure because its member proteins are expressed in all the conditions.

Equation 2.76 was obtained by considering an imaginary protein whose expression levels were zero under all conditions except for one condition. To confirm this relation with actual data, we picked an almost-condition-specific protein (PaaE, highly expressed in LB condition) and a non-condition-specific protein (AcrR), and confirmed that the former approximately satisfied Equation 2.76, while the relation did not hold for the latter (Appendix 1—figure 10).

2.1.5 Relation between $Ω_{B}$ and $Ω_{L E}$

Here, we discuss the spatial correspondence between the normalized Raman-proteome coefficient proteome structure and the csLE proteome structure (Figure 6C and D).

csLE proteome structure

Consider an undirected graph where each node corresponds to one type of protein. As previously explained, let the graph be a complete graph, namely every pair of nodes is connected by an edge. Each edge is weighted with cosine similarity between the two types of protein connected by the edge. Cosine similarity of protein $i$ and protein $j$ is given by

\cos θ_{p_{i} p_{j}} := \frac{p_{i} \cdot p_{j}}{‖ p_{i} ‖_{2} ‖ p_{j} ‖_{2}},

where $p_{i} \cdot p_{j} (= {(p_{i})}^{⊤} p_{j})$ is the inner product of $p_{i}$ and $p_{j}$ , and ${‖ p_{i} ‖}_{2} (= \sqrt{{(p_{i})}^{⊤} p_{i}})$ is the $L^{2}$ -norm (Euclidean norm) of $p_{i}$ . Cosine similarity $\cos θ_{p_{i} p_{j}}$ evaluates how similar the expression patterns are between protein $i$ and protein $j$ . When the abundance ratio between protein $i$ and protein $j$ remains constant over all the $m$ conditions, $\cos θ_{p_{i} p_{j}}$ takes the maximum value 1.

The adjacency matrix of this graph is given by

A := {(\cos θ_{p_{i} p_{j}})}_{1 \leq i, j \leq n} = {(\frac{p_{i} \cdot p_{j}}{‖ p_{i} ‖_{2} ‖ p_{j} ‖_{2}})}_{1 \leq i, j \leq n}

and the degree matrix of this graph is

D = d i a g (A 1_{n}) .

For simplicity, diagonal element $(i, i)$ of $A$ is $\cos θ_{p_{i} p_{i}} = 1$ for any protein $i$ , i.e., each node has a loop. $A$ is $n \times n$ and real symmetric and $D$ is $n \times n$ and diagonal. Then, the Laplacian matrix is given by

L := D - A,

which is an $n \times n$ symmetric matrix, and the symmetric normalized Laplacian is given by

L_{s y m} = D^{- 1 / 2} L D^{- 1 / 2} = I - D^{- 1 / 2} A D^{- 1 / 2},

which is an $n \times n$ symmetric matrix.

Here, we define $\hat{P}$ by normalizing the columns of $P = [\begin{matrix} p_{1} & \dots & p_{n} \end{matrix}]$ :

\begin{array}{ll} \hat{P} := [\begin{matrix} \frac{p_{1}}{{‖ p_{1} ‖}_{2}} & \dots & \frac{p_{n}}{{‖ p_{n} ‖}_{2}} \end{matrix}] = P diag {(P^{⊤} P)}^{- 1 / 2} \end{array}

By using $\hat{P}$ , $A$ is rewritten as

A = {(\cos θ_{p_{i} p_{j}})}_{1 \leq i, j \leq n} = {(\frac{p_{i} \cdot p_{j}}{‖ p_{i} ‖_{2} ‖ p_{j} ‖_{2}})}_{1 \leq i, j \leq n}

= {(P d i a g {(P^{⊤} P)}^{- 1 / 2})}^{⊤} P d i a g {(P^{⊤} P)}^{- 1 / 2}

= d i a g {(P^{⊤} P)}^{- 1 / 2} P^{⊤} P d i a g {(P^{⊤} P)}^{- 1 / 2}

= {\hat{P}}^{⊤} \hat{P} .

Consider an eigenproblem

\begin{array}{lcr} L_{sym} V_{sym} = V_{sym} Λ_{LE}, \end{array}

where $Λ_{L E}$ is an $n \times n$ diagonal matrix in which the eigenvalues of $L_{sym}$ are arranged in increasing order from the upper left, and columns of $V_{sym}$ are the normalized eigenvectors of $L_{sym}$ corresponding to the eigenvalues. Denote

Λ_{L E} = (\begin{matrix} λ_{L E 0} \\ ⋱ \\ λ_{L E (n - 1)} \end{matrix})

Here, $L_{sym}$ has the following four characteristics:

$L_{sym}$ is positive semi-definite. See, for example, von Luxburg, 2007 for the proof.
In an undirected graph with non-negative weights, the number of separated graph components equals the multiplicity of the eigenvalue zero of $L_{sym}$ . See, for example, von Luxburg, 2007 for the proof. Since our proteome graph is connected, our $L_{sym}$ has the single eigenvalue zero.
From Equation 2.86, $rank (A) = rank (\hat{P})$ . Since $diag {(P^{⊤} P)}^{- 1 / 2}$ has full rank, $rank (\hat{P}) = rank (P) = m$ , and therefore $rank (A) = m$ . Because $D$ also has full rank, $rank (D^{- 1 / 2} A D^{- 1 / 2}) = rank (A) = m$ . Therefore, $D^{- 1 / 2} A D^{- 1 / 2}$ has $n - m$ singular values of zero. Since $D^{- 1 / 2} A D^{- 1 / 2}$ is symmetric, its singular values are the absolute values of its eigenvalues, so it has $n - m$ eigenvalues of zero. Therefore, $L_{s y m} (= I - D^{- 1 / 2} A D^{- 1 / 2})$ has $n - m$ eigenvalues of $1 (= 1 - 0)$ .
For any $n$ -dimensional vector $x$ , $x^{⊤} D^{- 1 / 2} A D^{- 1 / 2} x$ $= {(\hat{P} D^{- 1 / 2} x)}^{⊤} \hat{P} D^{- 1 / 2} x$ $= {‖ \hat{P} D^{- 1 / 2} x ‖}_{2}^{2} \geq 0$ . Therefore, $D^{- 1 / 2} A D^{- 1 / 2}$ is positive semi-definite, and all of the eigenvalues of $L_{s y m} (= I - D^{- 1 / 2} A D^{- 1 / 2})$ are less than or equal to one.

By these four points, we see that the eigenvalues of $L_{s y m} (= I - D^{- 1 / 2} A D^{- 1 / 2})$ satisfy

0 = λ_{L E 0} < λ_{L E 1} \leq \dots \leq λ_{L E (m - 1)} < λ_{L E m} = λ_{L E (m + 1)} = \dots = λ_{L E (n - 1)} = 1.
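The eigenvalue ordering above can be checked numerically. The following NumPy sketch uses a small random positive matrix as a stand-in for the proteome matrix $P$ (the sizes, the random seed, and all variable names are assumptions made only for illustration) and computes the spectrum of $L_{sym}$ .

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 8                                # hypothetical sizes: m conditions, n proteins
P = rng.uniform(0.1, 1.0, size=(m, n))     # strictly positive toy abundances, rank m
P_hat = P / np.linalg.norm(P, axis=0)      # L2-normalize each column
A = P_hat.T @ P_hat                        # cosine similarity matrix
d = A.sum(axis=1)                          # degrees (row sums of A)
D_inv_sqrt = np.diag(d ** -0.5)
L_sym = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt
eigvals = np.linalg.eigvalsh(L_sym)        # eigenvalues in ascending order
```

Under these assumptions, `eigvals[0]` is zero up to rounding, `eigvals[1:m]` lie strictly between zero and one, and the remaining `n - m` entries equal one.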

Now we define an $m \times m$ matrix ${\tilde{Λ}}_{L E}$ as

\begin{array}{ll} {\tilde{Λ}}_{LE} = (\begin{matrix} 0 & & & \\ & λ_{LE 1} & & \\ & & ⋱ & \\ & & & λ_{LE (m - 1)} \end{matrix}) \end{array}

Then, we can write

Λ_{LE} = (\begin{matrix} {\tilde{Λ}}_{LE} & O \\ O & I \end{matrix})

Let ${\tilde{V}}_{s y m}$ be the first $m$ columns of $V_{s y m}$ :

\begin{array}{ll} {\tilde{V}}_{s y m} = [\begin{matrix} {\hat{v}}_{s y m, 1} \\ ⋮ \\ {\hat{v}}_{s y m, n} \end{matrix}] = [\begin{matrix} v_{s y m, 10} & \dots & v_{s y m, 1 (m - 1)} \\ ⋮ & ⋱ & ⋮ \\ v_{s y m, n 0} & \dots & v_{s y m, n (m - 1)} \end{matrix}] \end{array}

The truncated version of the eigenproblem is

L_{s y m} {\tilde{V}}_{s y m} = {\tilde{V}}_{s y m} {\tilde{Λ}}_{L E} .

$\begin{array}{ll} {\hat{v}}_{sym, i} := (\begin{matrix} v_{sym, i 0} & \dots & v_{sym, i (m - 1)} \end{matrix}) \end{array}$ is the $i$ -th row of ${\tilde{V}}_{s y m}$ and provides a new $m$ -dimensional representation of the $i$ -th protein. This representation of the proteome reflects distance between each protein pair in terms of cosine similarity.

Here we clarify the correspondence of the eigenproblems defined by the normalized Laplacian matrices $L_{s y m}$ and $L_{r w}$ . Let

{\tilde{V}}_{r w} = D^{- 1 / 2} {\tilde{V}}_{s y m} .

Then, this eigenproblem can also be regarded as the following generalized eigenproblem,

L {\tilde{V}}_{r w} = D {\tilde{V}}_{r w} {\tilde{Λ}}_{L E} .

Remembering that $L_{rw} = D^{- 1} L$ , we can further transform it into an eigenproblem

L_{r w} {\tilde{V}}_{r w} = {\tilde{V}}_{r w} {\tilde{Λ}}_{L E} .

This is the form of the eigenproblem that we explained previously in ‘Global proteome structures based on stoichiometric balance’ in Materials and methods. We return to it later in this section.
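The correspondence between the $L_{sym}$ and $L_{rw}$ eigenproblems can be illustrated with a short NumPy sketch. The toy matrix, its sizes, and the variable names are assumptions for illustration only; the code rescales the truncated eigenvectors of $L_{sym}$ by $D^{- 1 / 2}$ and measures how well they solve the generalized eigenproblem and the $L_{rw}$ eigenproblem.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 7                                # hypothetical sizes
P = rng.uniform(0.1, 1.0, size=(m, n))     # toy positive abundance matrix
P_hat = P / np.linalg.norm(P, axis=0)
A = P_hat.T @ P_hat                        # cosine similarity matrix
d = A.sum(axis=1)
D = np.diag(d)
L = D - A                                  # unnormalized graph Laplacian
L_sym = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)
L_rw = np.diag(1.0 / d) @ L

lam, V_sym = np.linalg.eigh(L_sym)         # ascending eigenvalues
lam_t, V_sym_t = lam[:m], V_sym[:, :m]     # truncated eigenpairs
V_rw_t = np.diag(d ** -0.5) @ V_sym_t      # rescaled eigenvectors

# Largest absolute residuals of the two eigenproblems.
gen_residual = np.abs(L @ V_rw_t - D @ V_rw_t @ np.diag(lam_t)).max()
rw_residual = np.abs(L_rw @ V_rw_t - V_rw_t @ np.diag(lam_t)).max()
```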

The eigenproblem (Equation 2.87) can be transformed to

D^{- 1 / 2} A D^{- 1 / 2} V_{s y m} = V_{s y m} M,

where

M := I - Λ_{LE} = (\begin{matrix} I - {\tilde{Λ}}_{LE} & O \\ O & O \end{matrix}) .

This means that the columns of $V_{sym}$ are also the (normalized) eigenvectors of $D^{- 1 / 2} A D^{- 1 / 2}$ , and $M$ is the corresponding eigenvalue matrix.

Defining an $m \times m$ matrix $\tilde{M}$ as

\begin{array}{ll} \tilde{M} = I - {\tilde{Λ}}_{LE} = (\begin{matrix} 1 & & & \\ & 1 - λ_{LE 1} & & \\ & & ⋱ & \\ & & & 1 - λ_{LE (m - 1)} \end{matrix}) \end{array},

we can write

M = (\begin{matrix} \tilde{M} & O \\ O & O \end{matrix}) .

Note that

\begin{array}{lcr} 1 > 1 - λ_{LE 1} \geq \dots \geq 1 - λ_{LE (m - 1)} > 0. \end{array}

Equation 2.97 is further transformed into

A = D^{1 / 2} V_{s y m} M {(V_{s y m})}^{⊤} D^{1 / 2} .

Comparing Equation 2.102 and Equation 2.85 leads to

d i a g {(P^{⊤} P)}^{- 1 / 2} P^{⊤} P d i a g {(P^{⊤} P)}^{- 1 / 2} = D^{1 / 2} V_{s y m} M {(V_{s y m})}^{⊤} D^{1 / 2} .

Connecting Raman-proteome transformation coefficients and csLE proteome

We consider $P^{⊤} P$ in two ways. First, from Equation 2.103,

\begin{array}{ll} P^{⊤} P = d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} V_{s y m} M {(V_{s y m})}^{⊤} D^{1 / 2} d i a g {(P^{⊤} P)}^{1 / 2} \\ = d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} V_{s y m} M^{1 / 2} {(d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} V_{s y m} M^{1 / 2})}^{⊤} \\ = d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} (\begin{matrix} {\tilde{V}}_{s y m} {\tilde{M}}^{1 / 2} | O \end{matrix}) {(d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} (\begin{matrix} {\tilde{V}}_{s y m} {\tilde{M}}^{1 / 2} | O \end{matrix}))}^{⊤} \\ = (\begin{matrix} d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m} {\tilde{M}}^{1 / 2} | O \end{matrix}) {(\begin{matrix} d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m} {\tilde{M}}^{1 / 2} | O \end{matrix})}^{⊤} \\ = (\begin{matrix} d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m} {\tilde{M}}^{1 / 2} | O \end{matrix}) (\begin{matrix} {(d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m} {\tilde{M}}^{1 / 2})}^{⊤} \\ O \end{matrix}) \\ = d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m} {\tilde{M}}^{1 / 2} {(d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m} {\tilde{M}}^{1 / 2})}^{⊤} . \end{array}

Since

D^{- 1 / 2} A D^{- 1 / 2} = D^{- 1 / 2} {\hat{P}}^{⊤} \hat{P} D^{- 1 / 2} (∵ (2.86))

= {(\hat{P} D^{- 1 / 2})}^{⊤} \hat{P} D^{- 1 / 2},

the diagonal elements of $\tilde{M}$ , i.e., the positive eigenvalues of $D^{- 1 / 2} A D^{- 1 / 2}$ , are the squares of the singular values of $\hat{P} D^{- 1 / 2}$ . The compact SVD of $\hat{P} D^{- 1 / 2}$ is expressed as

\hat{P} D^{- 1 / 2} = U_{L E} Σ_{L E} {(V_{L E})}^{⊤},

where $Σ_{LE}$ is an $m \times m$ diagonal matrix whose diagonal elements are the singular values in decreasing order from the upper left, and ${(U_{L E})}^{⊤} U_{L E} = {(V_{L E})}^{⊤} V_{L E} = I$ . We then obtain

Σ_{LE} = {\tilde{M}}^{1 / 2} = (\begin{matrix} 1 & & & \\ & \sqrt{1 - λ_{LE 1}} & & \\ & & ⋱ & \\ & & & \sqrt{1 - λ_{LE (m - 1)}} \end{matrix})

Thus,

P^{⊤} P = d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m} Σ_{L E} {(d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m} Σ_{L E})}^{⊤} .
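The identification of the singular values of $\hat{P} D^{- 1 / 2}$ with $\sqrt{1 - λ_{LE k}}$ can be checked on toy data. The sketch below is only an illustration under assumed sizes, seed, and variable names; it compares the singular values returned by an SVD with the values computed from the $m$ smallest eigenvalues of $L_{sym}$ .

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 9                                  # hypothetical sizes
P = rng.uniform(0.1, 1.0, size=(m, n))       # toy positive abundance matrix
P_hat = P / np.linalg.norm(P, axis=0)
d = (P_hat.T @ P_hat).sum(axis=1)            # degrees
B = P_hat / np.sqrt(d)                       # P_hat D^{-1/2}: column j scaled by d_j^{-1/2}

sing = np.linalg.svd(B, compute_uv=False)    # singular values, descending order
L_sym = np.eye(n) - B.T @ B
lam = np.linalg.eigvalsh(L_sym)[:m]          # m smallest eigenvalues, ascending order
# Clip guards against tiny negative values caused by rounding.
sigma_from_lam = np.sqrt(np.clip(1.0 - lam, 0.0, None))
```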

On the other hand, from Equation 2.44 and Equation 2.46,

P^{⊤} P = {(R_{E} {(B_{E})}^{⊤})}^{⊤} R_{E} {(B_{E})}^{⊤} = B_{E} {(R_{E})}^{⊤} R_{E} {(B_{E})}^{⊤}

= B_{E} {(Σ_{R_{E}})}^{2} B_{E}^{⊤}

= B_{E} Σ_{R_{E}} {(B_{E} Σ_{R_{E}})}^{⊤} .

Therefore, comparing Equation 2.109 and Equation 2.112 yields

B_{E} Σ_{R_{E}} = d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m} Σ_{L E} Θ,

where $Θ$ is an $m \times m$ orthogonal matrix. We define the estimate of $B_{E}$ as

B_{E}^{e s t} := d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m} .

By this notation, Equation 2.113 can be written as

B_{E} Σ_{R_{E}} = B_{E}^{e s t} Σ_{L E} Θ .

Here, the left-hand side represents Raman-proteome linear transformation, whereas the right-hand side except for $Θ$ is derived only from proteome data. Note that in order to derive Equation 2.113, LDA does not need to be applied to Raman data because $P^{⊤} P$ can be written in the form of $B_{E} {(Σ_{R_{E}})}^{2} {(B_{E})}^{⊤}$ even if LDA is not applied to Raman data.

Normalization with constant terms

Now we consider normalizing both sides of Equation 2.115 by the first columns.

With Equation 2.60 and Equation 2.61, the left-hand side of Equation 2.115 can be rewritten as

B_{E} Σ_{R_{E}} = \sqrt{m} d i a g (b_{0}) B_{E}^{n o r m} Σ_{R_{E}}^{n o r m} .

Likewise, for the right-hand side, the first column of $B_{E}^{est}$ (the estimated constant term) is

b_{0}^{e s t} := B_{E}^{e s t} [:, 1] = B_{E}^{e s t} (\begin{matrix} 1 \\ 0 \\ ⋮ \\ 0 \end{matrix}) = diag {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m} (\begin{matrix} 1 \\ 0 \\ ⋮ \\ 0 \end{matrix}) .

The first column of ${\tilde{V}}_{sym}$ is the normalized eigenvector corresponding to the eigenvalue zero of $L_{sym}$ .

By the definition of $L$ , $L 1_{n} = D 1_{n} - A 1_{n} = 0_{n}$ because both $D 1_{n}$ and $A 1_{n}$ equal the vector of the diagonal elements of $D$ . Hence, in general, $L$ has an eigenvalue zero and a corresponding eigenvector $1_{n}$ . Therefore, by the definition of $L_{rw}$ , $L_{rw} 1_{n} = D^{- 1} L 1_{n} = 0_{n}$ ; $L_{rw}$ also has an eigenvalue zero and a corresponding eigenvector $1_{n}$ . The eigenproblems $L_{rw} {\tilde{V}}_{rw} = {\tilde{V}}_{rw} {\tilde{Λ}}_{LE}$ and $L_{sym} {\tilde{V}}_{sym} = {\tilde{V}}_{sym} {\tilde{Λ}}_{LE}$ are equivalent because one can obtain the eigenproblem of $L_{sym}$ by left-multiplying both sides of the eigenproblem of $L_{rw}$ by $D^{1 / 2}$ , and that of $L_{rw}$ by left-multiplying both sides of the eigenproblem of $L_{sym}$ by $D^{- 1 / 2}$ . The eigenvalues of $L_{rw}$ and $L_{sym}$ are identical, and ${\tilde{V}}_{rw} = D^{- 1 / 2} {\tilde{V}}_{sym}$ holds for their eigenvectors. Therefore, $L_{sym}$ has an eigenvalue zero and a corresponding eigenvector $D^{1 / 2} 1_{n}$ . Because the multiplicity of the eigenvalue zero of our $L_{sym}$ is one, the corresponding eigenvector is $D^{1 / 2} 1_{n}$ up to a scalar factor.

By writing $D$ as

D = (\begin{matrix} d_{1} & & \\ & ⋱ & \\ & & d_{n} \end{matrix}),

the normalized first column of ${\tilde{V}}_{sym}$ can be written as

{\tilde{V}}_{sym} [:, 1] = {\tilde{V}}_{sym} (\begin{matrix} 1 \\ 0 \\ ⋮ \\ 0 \end{matrix}) = {(\sum_{i = 1}^{n} d_{i})}^{- 1 / 2} D^{1 / 2} 1_{n} .

Thus, Equation 2.117 can be further transformed as

\begin{array}{ll} b_{0}^{est} = B_{E}^{est} [:, 1] & = {(\sum_{i = 1}^{n} d_{i})}^{- 1 / 2} \mathrm{diag} {(P^{⊤} P)}^{1 / 2} D^{1 / 2} D^{1 / 2} 1_{n} \\ & = {(\sum_{i = 1}^{n} d_{i})}^{- 1 / 2} \mathrm{diag} {(P^{⊤} P)}^{1 / 2} D 1_{n} \end{array}

= \frac{1}{{(\sum_{i = 1}^{n} d_{i})}^{1 / 2}} (\begin{matrix} d_{1} \sqrt{{(p_{1})}^{⊤} p_{1}} \\ ⋮ \\ d_{n} \sqrt{{(p_{n})}^{⊤} p_{n}} \end{matrix}) .

Remembering that both $d i a g (P^{⊤} P)$ and $D$ are diagonal, we obtain

d i a g (b_{0}^{e s t}) = d i a g (B_{E}^{e s t} [:, 1]) = {(\sum_{i = 1}^{n} d_{i})}^{- 1 / 2} d i a g {(P^{⊤} P)}^{1 / 2} D .

Therefore, from Equation 2.114 and Equation 2.122, the ‘estimated coefficients’ normalized with the ‘estimated constants’ are

B_{E}^{e s t, n o r m} := d i a g {(b_{0}^{e s t})}^{- 1} B_{E}^{e s t}

= d i a g {(b_{0}^{e s t})}^{- 1} d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m}

= {(\sum_{i = 1}^{n} d_{i})}^{1 / 2} D^{- 1} d i a g {(P^{⊤} P)}^{- 1 / 2} d i a g {(P^{⊤} P)}^{1 / 2} D^{1 / 2} {\tilde{V}}_{s y m}

= {(\sum_{i = 1}^{n} d_{i})}^{1 / 2} D^{- 1 / 2} {\tilde{V}}_{s y m}

= {(\sum_{i = 1}^{n} d_{i})}^{1 / 2} {\tilde{V}}_{r w} .
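The chain of equalities above can be reproduced numerically. In the following NumPy sketch (toy data, sizes, and variable names are illustrative assumptions), $B_{E}^{est}$ is assembled from the truncated eigenvectors of $L_{sym}$ , divided row-wise by its first column, and compared with the rescaled $L_{rw}$ eigenvectors. The sign of the first eigenvector is arbitrary in a numerical eigensolver, so it is fixed to be positive here.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 6                                     # hypothetical sizes
P = rng.uniform(0.1, 1.0, size=(m, n))          # toy positive abundance matrix
col_norm = np.linalg.norm(P, axis=0)            # diagonal of diag(P^T P)^{1/2}
P_hat = P / col_norm
A = P_hat.T @ P_hat
d = A.sum(axis=1)
L_sym = np.eye(n) - A / np.sqrt(np.outer(d, d)) # I - D^{-1/2} A D^{-1/2}

_, V_sym = np.linalg.eigh(L_sym)
V_sym_t = V_sym[:, :m].copy()
if V_sym_t[:, 0].sum() < 0:                     # fix the arbitrary sign
    V_sym_t[:, 0] *= -1

B_est = (col_norm * np.sqrt(d))[:, None] * V_sym_t   # diag(P^T P)^{1/2} D^{1/2} V_sym
b0_est = B_est[:, 0]                                 # estimated constant terms
B_est_norm = B_est / b0_est[:, None]                 # normalize rows by the constants
V_rw_t = V_sym_t / np.sqrt(d)[:, None]               # D^{-1/2} V_sym
```

The first column of `B_est_norm` is then a column of ones, which mirrors the role of the constant term in the normalized coefficient matrix.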

We remark that the eigenproblem of $L_{rw}$ , i.e.,

\begin{array}{lcr} L_{rw} {\tilde{V}}_{rw} = {\tilde{V}}_{rw} {\tilde{Λ}}_{LE} \end{array}

is equivalent to solving a minimization problem

\operatorname{arg\,min} \sum_{i, j} {‖ {\hat{v}}_{rw, i} - {\hat{v}}_{rw, j} ‖}_{2}^{2} \cos θ_{p_{i} p_{j}} \quad \text{subject to} \quad {({\tilde{V}}_{rw})}^{⊤} D {\tilde{V}}_{rw} = I .

Thus, the closer $p_{i}$ and $p_{j}$ are in terms of cosine similarity, the closer ${\hat{v}}_{rw, i}$ and ${\hat{v}}_{rw, j}$ (the $i$ -th and $j$ -th rows of ${\tilde{V}}_{rw}$ , respectively).

The relation between the minimization problem and the eigenproblem is the following. The objective function of the minimization problem is

\sum_{i, j} {‖ {\hat{v}}_{r w, i} - {\hat{v}}_{r w, j} ‖}_{2}^{2} \cos θ_{p_{i} p_{j}} = 2 \sum_{i} d_{i} {‖ {\hat{v}}_{r w, i} ‖}_{2}^{2} - 2 \sum_{i, j} A_{i j} {\hat{v}}_{r w, i} \cdot {\hat{v}}_{r w, j} .

Here,

\mathrm{tr} ({({\tilde{V}}_{rw})}^{⊤} D {\tilde{V}}_{rw}) = \sum_{i} d_{i} {‖ {\hat{v}}_{rw, i} ‖}_{2}^{2},

\mathrm{tr} ({({\tilde{V}}_{rw})}^{⊤} A {\tilde{V}}_{rw}) = \sum_{i, j} A_{ij} {\hat{v}}_{rw, i} \cdot {\hat{v}}_{rw, j} .

Therefore, Equation 2.130 can be transformed into

\sum_{i, j} {‖ {\hat{v}}_{rw, i} - {\hat{v}}_{rw, j} ‖}_{2}^{2} \cos θ_{p_{i} p_{j}} = 2 \mathrm{tr} ({({\tilde{V}}_{rw})}^{⊤} D {\tilde{V}}_{rw}) - 2 \mathrm{tr} ({({\tilde{V}}_{rw})}^{⊤} A {\tilde{V}}_{rw})

= 2 \mathrm{tr} ({({\tilde{V}}_{rw})}^{⊤} L {\tilde{V}}_{rw}) .

Thus, the minimization problem is

\operatorname{arg\,min} \mathrm{tr} ({({\tilde{V}}_{rw})}^{⊤} L {\tilde{V}}_{rw}) \quad \text{subject to} \quad {({\tilde{V}}_{rw})}^{⊤} D {\tilde{V}}_{rw} = I .

This can be transformed into the generalized eigenproblem $L {\tilde{V}}_{r w} = D {\tilde{V}}_{r w} {\tilde{Λ}}_{L E}$ by the method of Lagrange multipliers (Ghojogh et al., 2019).
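The rewriting of the objective function as a trace holds for any embedding, not only for the minimizer. The following NumPy sketch checks this on toy data, with an arbitrary random matrix `V` standing in for the rows ${\hat{v}}_{rw, i}$ (sizes, seed, and names are assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 6, 2                                  # hypothetical: n proteins, k embedding dimensions
P = rng.uniform(0.1, 1.0, size=(3, n))       # toy positive abundance matrix
P_hat = P / np.linalg.norm(P, axis=0)
A = P_hat.T @ P_hat                          # A_ij = cosine similarity of p_i and p_j
L = np.diag(A.sum(axis=1)) - A               # L = D - A

V = rng.normal(size=(n, k))                  # arbitrary embedding; row i plays the role of v_i
diff = V[:, None, :] - V[None, :, :]         # diff[i, j] = v_i - v_j
lhs = (A * (diff ** 2).sum(axis=2)).sum()    # sum_ij ||v_i - v_j||^2 A_ij
rhs = 2.0 * np.trace(V.T @ L @ V)            # 2 tr(V^T L V)
```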

Remember that this property of ${\hat{v}}_{r w, i}$ and ${\hat{v}}_{r w, j}$ is analogous to that of ${\hat{b}}_{i}^{n o r m}$ and ${\hat{b}}_{j}^{n o r m}$ (the $i$ -th and $j$ -th rows of $B_{E}^{n o r m}$ , respectively) as explained in Section 2.1.4 in Appendix and ‘Raman-proteome correspondence matrix as a low-dimensional representation of proteome changes’ in Materials and methods.

Considering that the first (upper-left) element of $Σ_{LE}$ is one, the right-hand side of Equation 2.113 can be transformed into

B_{E}^{e s t} Σ_{L E} Θ = d i a g (b_{0}^{e s t}) B_{E}^{e s t, n o r m} Σ_{L E} Θ

= [{(\sum_{i = 1}^{n} d_{i})}^{- 1 / 2} d i a g {(P^{⊤} P)}^{1 / 2} D] [{(\sum_{i = 1}^{n} d_{i})}^{1 / 2} {\tilde{V}}_{r w}] Σ_{L E} Θ .

Therefore, from Equations 2.115, 2.116, 2.136, 2.137,

\sqrt{m} diag (b_{0}) B_{E}^{n o r m} Σ_{R_{E}}^{n o r m} = diag (b_{0}^{e s t}) B_{E}^{e s t, n o r m} Σ_{L E} Θ

= [{(\sum_{i = 1}^{n} d_{i})}^{- 1 / 2} diag {(P^{⊤} P)}^{1 / 2} D] [{(\sum_{i = 1}^{n} d_{i})}^{1 / 2} {\tilde{V}}_{r w}] Σ_{L E} Θ .

This is the equation that links the normalized Raman-proteome coefficient proteome structure and the csLE proteome structure.

Mathematical interpretation of the obtained equation

From Equation 2.139, if the distributions of $B_{E}^{norm}$ and ${\tilde{V}}_{rw}$ are similar, the orthogonal matrix $Θ$ must be close to the identity matrix, because large off-diagonal elements of $Θ$ would strongly mix the lower dimensions with the higher dimensions. In addition, the directions of the diagonal matrices $\mathrm{diag} (b_{0})$ and $\mathrm{diag} {(P^{⊤} P)}^{1 / 2} D$ must also be close to each other even if $Θ$ is close to the identity matrix. Note that the first column of $Θ$ also reflects the relation between $b_{0}$ and $b_{0}^{est} (\propto \mathrm{diag} {(P^{⊤} P)}^{1 / 2} D 1_{n})$ .

The obtained relation between Raman-proteome normalized coefficient structure and csLE structure is summarized in Appendix 1—table 10.

Note that the relation between $Σ_{R_{E}}^{norm}$ and $Σ_{LE}$ can change depending on the normalization of $V_{LDA}$ . However, this difference is not important for the spatial correspondence between the two structures because it only affects the scaling of the axes. Rather, $Λ_{LDA}^{'}$ and $\tilde{M} = {(Σ_{LE})}^{2}$ , which determine the order of the columns of $V_{LDA}$ and ${\tilde{V}}_{rw}$ , are important.

Application to main data

The normalized coefficient proteome structures in Figure 6A and C are scatterplots of different columns of $B_{E}^{norm}$ plotted against each other. The cosine similarity proteome structures in Figure 5L and Figure 6D are scatterplots of different columns of ${\tilde{V}}_{rw}$ plotted against each other.

In our data analysis, we calculated ${\tilde{V}}_{r w}$ as ${\tilde{V}}_{r w} = D^{- 1 / 2} {\tilde{V}}_{s y m}$ , where each column of ${\tilde{V}}_{s y m}$ was normalized.

On the basis of the results of the mathematical analysis, we compared $B_{E}^{norm}$ and $B_{E}^{e s t, n o r m} = {(\sum_{i = 1}^{n} d_{i})}^{1 / 2} {\tilde{V}}_{r w}$ in Appendix 1—figure 9G.

The similarity between the projections of the two distinct omics structures onto low-dimensional subspaces suggests that $Θ$ is close to the identity matrix (Equation 2.138 and Appendix 1—table 10). Appendix 1—figure 9A, C–E shows that the actual $Θ$ is indeed significantly close to the identity matrix. This suggests that the major changes in cellular Raman spectra detectable by LDA reflect the major changes in the proteome characterized by LE based on stoichiometry balance (cosine similarity).

The structural similarity also suggests that directions of $d i a g (b_{0})$ and $d i a g {(P^{⊤} P)}^{1 / 2} D$ are also similar (Equation 2.138 and Appendix 1—table 10).

In fact, Appendix 1—figure 9F confirmed good agreement between $m^{1 / 2} d i a g (b_{0})$ and ${(\sum_{i = 1}^{n} d_{i})}^{- 1 / 2} d i a g {(P^{⊤} P)}^{1 / 2} D = d i a g (b_{0}^{e s t})$ . Since these two quantities are calculated only from proteome data, this agreement is a characteristic of proteome data. See the next section (Section 2.2 in Appendix) for further analyses and discussion on this point.

2.2 Quantitative constraint on omics profiles

2.2.1 From agreement between $\sqrt{m} b_{0}$ and $b_{0}^{e s t}$ to proportionality between $L^{1}$ norm/ $L^{2}$ norm ratio and degree

We observed above that constant terms $b_{0}$ and the estimated constant terms $b_{0}^{est}$ were strongly correlated (Appendix 1—figure 9F). It is of note that both $b_{0}$ and $b_{0}^{est}$ can be calculated only from omics data. Specifically, from Equations 2.121 and 2.69,

b_{0} \propto (\begin{matrix} {(1_{m})}^{⊤} p_{1} \\ ⋮ \\ {(1_{m})}^{⊤} p_{n} \end{matrix}) = (\begin{matrix} \sum_{i = 1}^{m} p_{i 1} \\ ⋮ \\ \sum_{i = 1}^{m} p_{i n} \end{matrix}) = (\begin{matrix} \sum_{i = 1}^{m} | p_{i 1} | \\ ⋮ \\ \sum_{i = 1}^{m} | p_{i n} | \end{matrix}) = (\begin{matrix} {‖ p_{1} ‖}_{1} \\ ⋮ \\ {‖ p_{n} ‖}_{1} \end{matrix}),

b_{0}^{e s t} \propto (\begin{matrix} d_{1} \sqrt{{(p_{1})}^{⊤} p_{1}} \\ ⋮ \\ d_{n} \sqrt{{(p_{n})}^{⊤} p_{n}} \end{matrix}) = (\begin{matrix} d_{1} \sqrt{\sum_{i = 1}^{m} p_{i 1}^{2}} \\ ⋮ \\ d_{n} \sqrt{\sum_{i = 1}^{m} p_{i n}^{2}} \end{matrix}) = (\begin{matrix} d_{1} {‖ p_{1} ‖}_{2} \\ ⋮ \\ d_{n} {‖ p_{n} ‖}_{2} \end{matrix}) .

Here, ${‖ p_{j} ‖}_{1}$ and ${‖ p_{j} ‖}_{2}$ are the $L^{1}$ and $L^{2}$ norms of $p_{j}$ and reflect only the expression property of protein $j$ . On the other hand, the degree $d_{j}$ is a measure for the relationships of protein $j$ with the other proteins because $d_{j}$ is the sum of cosine similarities, $d_{j} = \sum_{i = 1}^{n} \cos θ_{p_{i} p_{j}}$ .

The observed relation

\sqrt{m} b_{0} \approx b_{0}^{e s t}

(Appendix 1—figure 9F) indicates that a proportionality relation

\frac{‖ p_{j} ‖_{1}}{‖ p_{j} ‖_{2}} \propto \sum_{i = 1}^{n} \cos θ_{p_{i} p_{j}}

must hold approximately.

As mentioned in the main text, we refer to the ratio of $L^{1}$ norm to $L^{2}$ norm in the left-hand side of Equation 2.143 as expression generality score ( $g_{j}$ ) because it can be interpreted as a measure of constancy and generality of the expression levels of the protein (see ‘Interpretation of $L^{1}$ norm/ $L^{2}$ norm ratio of an expression vector as a quantitative measure of expression generality’ in Materials and methods, Appendix 1—figure 8A and B).

When a protein is perfectly condition-specific and expressed only in a particular condition, its $L^{1}$ norm equals its $L^{2}$ norm, and the ratio takes the minimum value one. When a protein is expressed equally across all the conditions, its $L^{1}$ norm is greater than its $L^{2}$ norm and the ratio takes the maximum value, the square root of the number of conditions (Appendix 1—figure 8A and B). On the other hand, the right-hand side of Equation 2.143, which we refer to as stoichiometry conservation centrality ( $d_{j}$ ) in the main text, measures to what extent protein $j$ conserves its stoichiometry with the other proteins. Therefore, the proportionality relation (Equation 2.143) suggests a global quantitative constraint between condition specificity of expression patterns and stoichiometry conservation strength. Positions of SCGs and density of genes in the csLE structure (Figure 5A, K, and L) already suggested that genes with less condition-specific expression patterns have more genes with stoichiometrically similar expression patterns, and the proportionality here quantitatively captures this property of omics dynamics. We remark that the proportionality relation was confirmed in all the omics data we analyzed in this paper (see Section 3.2 and Appendix 1—figure 7I–N).
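The two limiting cases of the expression generality score can be illustrated with a few lines of Python. The function name `expression_generality` and the toy expression vectors are hypothetical and serve only to show the range of the $L^{1}$ norm/ $L^{2}$ norm ratio for $m = 4$ conditions.

```python
import math

def expression_generality(p):
    """Ratio of the L1 norm to the L2 norm of an expression vector p."""
    l1 = sum(abs(x) for x in p)
    l2 = math.sqrt(sum(x * x for x in p))
    return l1 / l2

m = 4  # hypothetical number of conditions
g_specific = expression_generality([0.0, 5.0, 0.0, 0.0])  # expressed in one condition only
g_even = expression_generality([2.0, 2.0, 2.0, 2.0])      # equal expression in all conditions
g_mixed = expression_generality([1.0, 3.0, 0.0, 2.0])     # intermediate pattern
```

Here `g_specific` attains the minimum value one, `g_even` attains the maximum value $\sqrt{m} = 2$ , and `g_mixed` falls in between.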

2.2.2 Mathematics behind proportionality between stoichiometry conservation centrality $d_{j}$ and expression generality score $g_{j}$

The cosine similarity-based analyses involve normalization of each expression vector $p_{j}$ by its $L^{2}$ norm ${‖ p_{j} ‖}_{2}$ (Equation 2.77). The normalized expression vectors $p_{j} / {‖ p_{j} ‖}_{2}$ represent points on the first orthant division of a unit $(m - 1)$ -sphere in an $m$ -dimensional space ( $\sum_{i = 1}^{m} {(x_{i})}^{2} = 1$ with $x_{1}, \dots, x_{m} \geq 0$ ). $L^{2}$ normalization allows us to compare expression patterns without considering expression magnitudes, and an expression pattern is represented by a position on the unit $(m - 1)$ -sphere. Therefore, our cosine similarity-based quantification of stoichiometric balance in omics data is equivalent to evaluating distances (measured by angle) between points on the $(m - 1)$ -sphere.

Since stoichiometry conservation centrality $d_{j}$ is the sum of cosine similarities,

d_{j} = \sum_{i = 1}^{n} \cos θ_{p_{j} p_{i}} = \sum_{i = 1}^{n} \frac{p_{j} \cdot p_{i}}{‖ p_{j} ‖_{2} ‖ p_{i} ‖_{2}}

= \frac{p_{j}}{‖ p_{j} ‖_{2}} \cdot (\sum_{i = 1}^{n} \frac{p_{i}}{‖ p_{i} ‖_{2}}) .

Defining

{\tilde{p}}_{t o t} := \sum_{i = 1}^{n} \frac{p_{i}}{‖ p_{i} ‖_{2}},

we obtain

\frac{1}{\sqrt{\sum_{i = 1}^{n} d_{i}}} d_{j} = \frac{p_{j}}{‖ p_{j} ‖_{2}} \cdot \frac{{\tilde{p}}_{t o t}}{‖ {\tilde{p}}_{t o t} ‖_{2}} .

Note that $‖ {\tilde{p}}_{t o t} ‖_{2} = \sqrt{\sum_{i = 1}^{n} d_{i}}$ . The last term ${\tilde{p}}_{t o t} / ‖ {\tilde{p}}_{t o t} ‖_{2}$ is the normalized vector of the sum of all the normalized expression vectors, which we refer to as ‘expression-pattern norm vector’. Therefore, Equation 2.147 means that stoichiometry conservation centrality $d_{j} = \sum_{i} \cos θ_{p_{j} p_{i}}$ is proportional to the cosine of the angle between the expression-pattern norm vector ${\tilde{p}}_{t o t} / ‖ {\tilde{p}}_{t o t} ‖_{2}$ and protein $j$ ’s expression pattern $p_{j} / ‖ p_{j} ‖_{2}$ . The more distant the expression pattern of a protein is from that specified by the expression-pattern norm vector, the smaller $d_{j}$ is.

On the other hand, expression generality score $g_{j}$ is

g_{j} = \frac{‖ p_{j} ‖_{1}}{‖ p_{j} ‖_{2}} = {‖ \frac{p_{j}}{{‖ p_{j} ‖}_{2}} ‖}_{1}

= \frac{p_{j}}{‖ p_{j} ‖_{2}} \cdot 1_{m} .

Therefore,

\frac{1}{\sqrt{m}} \frac{‖ p_{j} ‖_{1}}{‖ p_{j} ‖_{2}} = \frac{p_{j}}{‖ p_{j} ‖_{2}} \cdot \frac{1_{m}}{\sqrt{m}} .

The last term $1_{m} / \sqrt{m}$ is the normalized vector of $1_{m}$ , corresponding to the ‘center’ of the first orthant division of the unit $(m - 1)$ -sphere. In other words, $1_{m} / \sqrt{m}$ represents ‘perfectly even expression pattern’ across conditions. Therefore, Equation 2.150 means that the expression generality score $g_{j} = ‖ p_{j} ‖_{1} / ‖ p_{j} ‖_{2}$ is proportional to the cosine of the angle between the ‘perfectly even expression pattern’ $1_{m} / \sqrt{m}$ and protein $j$ ’s expression pattern $p_{j} / ‖ p_{j} ‖_{2}$ . The more distant the expression pattern of a protein is from the ‘perfectly even expression pattern’, the smaller the expression generality score is.

Comparing Equations 2.147 and 2.150, we see that if the expression-pattern norm vector ${\tilde{p}}_{t o t} / ‖ {\tilde{p}}_{t o t} ‖_{2}$ and the perfectly even expression pattern $1_{m} / \sqrt{m}$ are equal, a proportional relationship

\forall j, d_{j} = \sqrt{\frac{\sum_{i = 1}^{n} d_{i}}{m}} \frac{‖ p_{j} ‖_{1}}{‖ p_{j} ‖_{2}}

holds. Note that this is equivalent to $\sqrt{m} b_{0} = b_{0}^{e s t}$ .

Conversely, if ${\tilde{p}}_{t o t} / ‖ {\tilde{p}}_{t o t} ‖_{2}$ deviates from $1_{m} / \sqrt{m}$ , the proportional relation between $d_{j}$ and $g_{j}$ breaks. We found that the proteome data by Schmidt et al., 2016 showed ${\tilde{p}}_{t o t} / ‖ {\tilde{p}}_{t o t} ‖_{2} \neq 1_{m} / \sqrt{m}$ . Instead, we found that the values of the elements of ${\tilde{p}}_{tot}$ increased approximately linearly with the population growth rates under corresponding conditions (Appendix 1—figure 8D). Such a strong positive correlation between the elements of ${\tilde{p}}_{tot}$ and the population growth rates is nontrivial and suggests a new growth law constraining the total of relative expression level changes of all the proteins.

Next, we consider the consequence of this positive correlation between the elements of ${\tilde{p}}_{tot}$ and the population growth rates. Let us consider the proteins whose expression generality score $g_{j} = ‖ p_{j} ‖_{1} / ‖ p_{j} ‖_{2}$ takes the minimum value one. Namely, the expression of these proteins is completely condition-specific (Appendix 1—figure 8A and B). For such proteins, only one component of the expression pattern vector $p_{j} / ‖ p_{j} ‖_{2}$ is one, and the other components are zero. Thus, from Equation 2.147, their stoichiometry conservation centrality $d_{j}$ becomes proportional to the elements of ${\tilde{p}}_{tot}$ corresponding to the conditions under which they are expressed. Since the values of the elements of ${\tilde{p}}_{tot}$ are positively correlated with the population growth rates, $d_{j}$ of completely condition-specific proteins also exhibits a positive correlation with the growth rates under the conditions accompanying their expression (Appendix 1—figure 8C).

Such correlation can be confirmed for the proteins with nearly condition-specific expression patterns (PaaE, Asr, and DgoA in Figure 7B and C).

More generally, the deviation of $d_{j}$ from the perfect proportionality line can be understood by the relation

\frac{d_{j}}{\sqrt{\sum_{i = 1}^{n} d_{i}}} - \frac{‖ p_{j} ‖_{1}}{\sqrt{m} ‖ p_{j} ‖_{2}} = \frac{p_{j}}{‖ p_{j} ‖_{2}} \cdot (\frac{{\tilde{p}}_{t o t}}{‖ {\tilde{p}}_{t o t} ‖_{2}} - \frac{1_{m}}{\sqrt{m}}),

which can be derived from Equations 2.147, 2.150. Note that the values of the elements of the last term ${\tilde{p}}_{t o t} / ‖ {\tilde{p}}_{t o t} ‖_{2} - 1_{m} / \sqrt{m}$ also increase with the population growth rates, being positive under fast growth conditions and negative under slow growth conditions (Appendix 1—figure 8D). Therefore, when protein $j$ tends to be expressed higher under fast growth conditions, i.e., when the elements of $p_{j} / ‖ p_{j} ‖_{2}$ corresponding to the fast growth conditions are relatively larger than those corresponding to the slow growth conditions, the left-hand side of Equation 2.152 becomes positive, and its $d_{j}$ resides above the perfect proportionality line. On the other hand, when protein $j$ tends to be expressed higher under slow growth conditions, its $d_{j}$ resides below the perfect proportionality line.
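The geometric relations in Equations 2.147, 2.150, and 2.152 are algebraic identities and can therefore be verified on arbitrary non-negative data. The NumPy sketch below uses a random toy matrix in place of real omics data (sizes, seed, and variable names are assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 4, 10                                   # hypothetical sizes
P = rng.uniform(0.0, 1.0, size=(m, n)) + 0.05  # toy positive abundance matrix
l2 = np.linalg.norm(P, axis=0)
P_hat = P / l2                                 # normalized expression patterns
d = (P_hat.T @ P_hat).sum(axis=1)              # stoichiometry conservation centrality d_j
g = np.abs(P).sum(axis=0) / l2                 # expression generality score g_j

p_tot = P_hat.sum(axis=1)                      # sum of all normalized expression vectors
u_tot = p_tot / np.linalg.norm(p_tot)          # its normalized direction
u_even = np.ones(m) / np.sqrt(m)               # perfectly even expression pattern

lhs = d / np.sqrt(d.sum()) - g / np.sqrt(m)    # deviation from the proportionality line
rhs = P_hat.T @ (u_tot - u_even)               # projection onto the direction difference
```

A protein whose normalized pattern has more weight on conditions where `u_tot - u_even` is positive therefore gets a positive deviation, in line with the argument above.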

In the $g_{j}$ - $d_{j}$ plot in Figure 7A and B, we find several stretches of protein clusters above and below the perfect proportionality line. As expected from the argument above, each cluster corresponds to a group of proteins with similar expression patterns, and their positions relative to the proportionality line characterize the condition under which they are expressed the most (Figure 7C).

In summary, visualizing omics data by using the stoichiometry conservation centrality $d_{j}$ and the expression generality score $g_{j}$ allows us to systematically characterize the condition-dependent expression pattern of each protein on the basis of its position in the plot. Interestingly, this systematic characterization of gene expression patterns (the relation between $d_{j}$ and $g_{j}$ ) was derived from our mathematical analyses of the correspondences between Raman and omics as we explained above.

3 Extended data analysis

3.1 Growth laws

3.1.1 Single-gene-level growth law

The bacterial growth law states that the total abundance of ribosomal components increases linearly with growth rate (Neidhardt and Magasanik, 1960; Scott et al., 2010; Bremer and Dennis, 2008). The homeostatic core (the largest SCG) identified in our analysis contains many ribosomal proteins. Hence, it is plausible that the total abundance of homeostatic core proteins also increases linearly with growth rate, which we indeed found to be true (Appendix 1—figure 5A). Furthermore, the abundance ratios of homeostatic core proteins are conserved across conditions. Therefore, the intracellular abundance of each protein species in the homeostatic core is expected to increase linearly with growth rate.

Let $p_{ϵ j}$ be the abundance of protein $j$ in the homeostatic core in environment $ϵ$ . Since this protein conserves the stoichiometry with the other homeostatic core proteins across conditions,

\frac{p_{ϵ i}}{p_{ϵ j}} = α_{i j}

in any environment $ϵ$ ( $α_{i j}$ is the environment-independent abundance ratio of the homeostatic core protein $i$ to protein $j$ ).

Let $M_{ϵ} = \sum_{i} p_{ϵ i} = p_{ϵ j} \sum_{i} α_{i j}$ be the total abundance of homeostatic core proteins in environment $ϵ$ . The growth law for the homeostatic core is

M_{ϵ} = a + b g_{ϵ},

where $g_{ϵ}$ is the growth rate in environment $ϵ$ , $a$ is the $y$ -intercept, and $b$ is the slope of the linear relation. Therefore,

p_{ϵ j} = \frac{a}{\sum_{i} α_{i j}} + \frac{b}{\sum_{i} α_{i j}} g_{ϵ} .

This shows that the abundance of each homeostatic core protein satisfies a single-gene-level growth law.
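A small worked example may make the argument concrete. The sketch below assumes a hypothetical core of three proteins with fixed abundance ratios and a hypothetical core-level growth law $M_{ϵ} = a + b g_{ϵ}$ ; the numbers and function names are invented for illustration. It compares the abundance of one protein obtained by splitting the total according to the fixed ratios with the abundance predicted by the single-gene-level law.

```python
# Hypothetical fixed stoichiometry of three core proteins (1 : 2 : 5).
ratios = [1.0, 2.0, 5.0]
a, b = 10.0, 40.0   # hypothetical intercept and slope of the core-level growth law

def core_abundances(growth_rate):
    """Split the total abundance M = a + b * g according to the fixed ratios."""
    total = a + b * growth_rate
    return [total * r / sum(ratios) for r in ratios]

j = 1                                            # index of the protein of interest
alpha_sum = sum(r / ratios[j] for r in ratios)   # sum over i of alpha_ij

def single_gene_law(growth_rate):
    """Linear law for protein j with intercept a / alpha_sum and slope b / alpha_sum."""
    return a / alpha_sum + (b / alpha_sum) * growth_rate

growth_rates = [0.2, 0.5, 1.0]                   # hypothetical growth rates
direct = [core_abundances(g)[j] for g in growth_rates]
predicted = [single_gene_law(g) for g in growth_rates]
```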

3.1.2 Extended verification of stoichiometry conservation

When the abundance ratios between protein $i$ and protein $j$ are conserved,

\frac{p_{c j}}{p_{c i}} = \frac{p_{s j}}{p_{s i}},

where $c$ and $s$ specify the environments ( $s$ signifies the standard environment). Hence,

\frac{p_{c j}}{p_{s j}} = \frac{p_{c i}}{p_{s i}} = γ_{c},

where $γ_{c}$ is the common abundance ratio of stoichiometry-conserving proteins with respect to the condition $c$ . Note that $γ_{c}$ is common among all the proteins in a stoichiometry-conserving group.

From Equation 3.5,

\begin{array}{lcr} \log p_{c j} = \log p_{s j} + \log γ_{c} . \end{array}

Therefore, plotting $\log p_{c j}$ against $\log p_{s j}$ should find the stoichiometry-conserving proteins aligned on a straight line with a slope of 1. We indeed find such plots for the homeostatic core proteins (Appendix 1—figure 5B and C). This result confirms their stoichiometry conservation from a different perspective.

3.1.3 Linear dependence of common abundance ratio on growth rate

Since the total amount of homeostatic core proteins increases linearly with growth rate (Appendix 1—figure 5A and Equation 3.2),

\begin{array}{lcr} \sum_{j} p_{c j} - \sum_{j} p_{s j} = b (g_{c} - g_{s}) . \end{array}

Since $\sum_{j} p_{c j} = γ_{c} \sum_{j} p_{s j}$ ,

(γ_{c} - 1) \sum_{j} p_{s j} = b (g_{c} - g_{s}) .

Hence,

γ_{c} = (1 - \frac{b g_{s}}{M_{s}}) + \frac{b}{M_{s}} g_{c},

where $M_{s} = \sum_{j} p_{s j}$ . Therefore, the common abundance ratio $γ_{c}$ also increases linearly with growth rate.

Estimating $Γ_{c} := \log_{10} γ_{c}$ as the $y$ -intercepts of the regression lines with a slope of 1 (Appendix 1—figure 5B), we confirmed this linear dependence of common abundance ratio of homeostatic core proteins on growth rate (Appendix 1—figure 5D).
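The relations in Sections 3.1.2 and 3.1.3 can be illustrated with the same kind of toy model. The sketch below assumes hypothetical values for the intercept, the slope, the stoichiometry, and the growth rate of the standard environment; it is not the regression procedure applied to the actual data. It computes the log-abundance offsets between a condition and the standard environment and the linear expression for $γ_{c}$ .

```python
import math

a, b = 10.0, 40.0          # hypothetical intercept and slope of the core-level growth law
ratios = [1.0, 2.0, 5.0]   # hypothetical fixed stoichiometry of three core proteins
g_s = 0.5                  # hypothetical growth rate in the standard environment

def abundances(growth_rate):
    """Abundances of the core proteins when the total equals a + b * g."""
    total = a + b * growth_rate
    return [total * r / sum(ratios) for r in ratios]

p_s = abundances(g_s)
M_s = sum(p_s)             # total core abundance in the standard environment

def gamma_linear(g_c):
    """Common abundance ratio predicted by the linear relation in growth rate."""
    return (1.0 - b * g_s / M_s) + (b / M_s) * g_c

def log_offsets(g_c):
    """log10 p_cj - log10 p_sj for every protein j (the intercept of the slope-1 line)."""
    p_c = abundances(g_c)
    return [math.log10(pc) - math.log10(ps) for pc, ps in zip(p_c, p_s)]
```

For any growth rate `g_c`, all entries of `log_offsets(g_c)` coincide, and raising 10 to that common offset recovers `gamma_linear(g_c)`.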

3.2 Generality of the results

3.2.1 Additional datasets with Raman data

Correspondence of three types of space

The correspondences among LDA Raman, Raman-omics normalized coefficient omics structure, and csLE omics structure were also observed in other datasets. The datasets include Raman (this paper) and proteome (Schmidt et al., 2016) data of E. coli with different genotypes (BW25113, MG1655, and NCM3722) cultured in the ‘LB’ medium (Appendix 1—figure 11A–E) and in the ‘Glucose’ medium (Appendix 1—figure 11F–J), and Raman and transcriptome data of S. pombe cultured under 10 different environmental conditions (Appendix 1—figure 11K–O; Kobayashi-Kirschvink et al., 2018).

Comparison of matrices obtained by mathematical analyses

In addition to the comparison of three types of space, we also examined the matrices on the basis of results of the mathematical analyses (Appendix 1—table 10) using the aforementioned additional datasets: the closeness of $B_{E}^{norm}$ and $B_{E}^{est, norm}$ (Appendix 1—figure 12F for Raman-proteome of E. coli with different genotypes cultured in ‘LB’, Appendix 1—figure 12L for Raman-proteome of E. coli with different genotypes cultured in ‘Glucose’, and Appendix 1—figure 12R for Raman-transcriptome of S. pombe cultured under 10 environmental conditions), the closeness of $Θ$ to the identity matrix (Appendix 1—figure 12A–D for Raman-proteome of E. coli with different genotypes cultured in ‘LB’, Appendix 1—figure 12G–J for Raman-proteome of E. coli with different genotypes cultured in ‘Glucose’, and Appendix 1—figure 12M–P for Raman-transcriptome of S. pombe cultured under 10 environmental conditions), and the correspondence between $\sqrt{m} \, \mathrm{diag} (b_{0})$ and $\mathrm{diag} (b_{0}^{est})$ (Appendix 1—figure 12E for Raman-proteome of E. coli with different genotypes cultured in ‘LB’, Appendix 1—figure 12K for Raman-proteome of E. coli with different genotypes cultured in ‘Glucose’, and Appendix 1—figure 12Q for Raman-transcriptome of S. pombe cultured under 10 environmental conditions).

We confirmed that the same results hold for these additional datasets. Note that the correspondence between $\sqrt{m}\,\mathrm{diag}(b_{0})$ and $\mathrm{diag}(b_{0}^{est})$ does not involve Raman data; it is an intrinsic property of the omics data.

Proportionality between expression generality and stoichiometry conservation centrality

The correspondence between $\sqrt{m}\,\mathrm{diag}(b_{0})$ and $\mathrm{diag}(b_{0}^{est})$ suggests proportionality between the expression generality score $g_{j}$ and the stoichiometry conservation centrality $d_{j}$ in these omics data. We confirmed that the same results hold for these additional datasets (Appendix 1—figure 7I–N).
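
The proportionality can be checked on toy data. The following is a minimal sketch, not the authors' code. It assumes that $d_{j}$ is the row sum of the cosine similarity matrix of the expression vectors (with the diagonal included) and that $g_{j}$ is the $L^{1}$ norm/$L^{2}$ norm ratio of each expression vector; the function names and the toy matrix are hypothetical.

```python
import math

def l1_l2_ratio(p):
    """Expression generality score g_j = ||p||_1 / ||p||_2."""
    l1 = sum(abs(x) for x in p)
    l2 = math.sqrt(sum(x * x for x in p))
    return l1 / l2

def degrees(P):
    """Row sums of the cosine similarity matrix of the rows of P.

    Assumption: self-similarity (the diagonal) is included in the sum.
    """
    unit = [[x / math.sqrt(sum(v * v for v in row)) for x in row] for row in P]
    m = len(P[0])
    # Sum of all L2-normalized expression vectors, one element per condition.
    p_tot = [sum(u[i] for u in unit) for i in range(m)]
    return [sum(u[i] * p_tot[i] for i in range(m)) for u in unit]

# Toy data: 4 "proteins" (rows) x 3 "conditions" (columns), hypothetical values.
P = [
    [2.0, 2.0, 2.0],  # expressed equally under all conditions
    [5.0, 0.0, 0.0],  # specific to condition 1
    [0.0, 3.0, 0.0],  # specific to condition 2
    [0.0, 0.0, 7.0],  # specific to condition 3
]
m = len(P[0])
g = [l1_l2_ratio(row) for row in P]
d = degrees(P)
slope = math.sqrt(sum(d) / m)  # slope of the proportionality line

for gj, dj in zip(g, d):
    print(f"g = {gj:.3f}, d = {dj:.3f}, slope * g = {slope * gj:.3f}")
```

In this toy matrix the summed unit vectors have equal elements across conditions, so every point lies on the line; condition-specific rows give $g_{j} = 1$ and the uniformly expressed row gives $g_{j} = \sqrt{m}$.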

Correlation between ${\tilde{p}}_{tot}$ and growth rates

For the proteome data of E. coli with different genotypes (BW25113, MG1655, and NCM3722) cultured in ‘LB’ and in ‘Glucose’, growth rates were also reported (Schmidt et al., 2016). We confirmed a positive correlation between the elements of ${\tilde{p}}_{tot}$ and growth rates for these datasets (Appendix 1—figure 8E). See also the deviation from the proportionality line in the $g_{j} - d_{j}$ plot (Appendix 1—figure 7I–J).
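
As a rough illustration of this kind of check, the sketch below computes the elements of ${\tilde{p}}_{tot} / \| {\tilde{p}}_{tot} \|_{2}$ and their Pearson correlation with growth rates. It assumes that ${\tilde{p}}_{tot}$ is the sum of the $L^{2}$-normalized expression vectors; the growth rates and abundances are invented for illustration and the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def normalized_p_tot(P):
    """Sum the L2-normalized rows of P, then L2-normalize the sum."""
    m = len(P[0])
    p_tot = [0.0] * m
    for row in P:
        norm = math.sqrt(sum(v * v for v in row))
        for i in range(m):
            p_tot[i] += row[i] / norm
    total = math.sqrt(sum(v * v for v in p_tot))
    return [v / total for v in p_tot]

growth = [0.3, 0.6, 0.9, 1.2]  # hypothetical growth rates (1/h), one per condition
# Three proteins whose abundance increases with growth rate (hypothetical).
P = [[a * (1.0 + lam) for lam in growth] for a in (1.0, 4.0, 10.0)]
# One protein whose abundance does not depend on growth rate.
P.append([3.0, 3.0, 3.0, 3.0])

u = normalized_p_tot(P)
r = pearson(u, growth)
print([round(v, 3) for v in u], round(r, 3))
print("reference level 1/sqrt(m):", round(1 / math.sqrt(len(growth)), 3))
```

When all elements equal $1 / \sqrt{m}$, there is no condition dependence in ${\tilde{p}}_{tot}$; here the elements rise with growth rate because most toy proteins are growth-coupled.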

Biological relevance of centrality of csLE structure

We also confirmed centrality-essentiality correlation and centrality-evolutionary conservation correlation in the S. pombe transcriptome data (Appendix 1—figure 6B and E–G).

Degree distribution and its destruction by randomization

Degree (stoichiometry conservation centrality) distributions of the csLE structures of the additional datasets showed a pattern similar to that of the main data, and randomization of the omics data destroyed the strong correlation of expression patterns present in the actual data (Appendix 1—figure 7C, D, and H).
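
The effect of randomization can be sketched as follows. The shuffling scheme follows the description in the caption of Appendix 1—figure 7 (expression levels shuffled across protein species within each condition); the degree definition (row sums of the cosine similarity matrix, diagonal included), the toy data, and the function names are assumptions made for illustration.

```python
import math
import random

def degrees(P):
    """Row sums of the cosine similarity matrix of the rows of P (diagonal included)."""
    unit = [[x / math.sqrt(sum(v * v for v in row)) for x in row] for row in P]
    m = len(P[0])
    p_tot = [sum(u[i] for u in unit) for i in range(m)]
    return [sum(u[i] * p_tot[i] for i in range(m)) for u in unit]

def shuffle_within_conditions(P, rng):
    """Shuffle values across proteins (rows) independently for each condition (column)."""
    n, m = len(P), len(P[0])
    columns = []
    for i in range(m):
        col = [P[j][i] for j in range(n)]
        rng.shuffle(col)
        columns.append(col)
    return [[columns[i][j] for i in range(m)] for j in range(n)]

rng = random.Random(0)
profile = [1.0, 2.0, 4.0, 3.0]  # shared expression pattern over 4 conditions
# 10 hypothetical proteins with perfectly conserved stoichiometry
# (rows are proportional to each other but differ in magnitude).
P = [[(2.0 ** j) * x for x in profile] for j in range(10)]
n = len(P)

d_orig = degrees(P)
d_rand = degrees(shuffle_within_conditions(P, rng))
mean_orig = sum(d_orig) / n
mean_rand = sum(d_rand) / n
print(f"mean degree: original {mean_orig:.2f}, randomized {mean_rand:.2f}")
```

In the unshuffled toy data every pair of rows has cosine similarity one, so each degree equals $n$; shuffling breaks the proportionality between rows and lowers the degrees.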

3.2.2 Additional datasets without Raman data

Examining proteome structures with csLE does not require Raman data. Therefore, we additionally analyzed publicly available proteome data of M. tuberculosis and M. bovis under the growth conditions with distinct oxygen levels (Schubert et al., 2015), and the proteome data of S. cerevisiae under various environmental conditions (Lahtvee et al., 2017).

We characterized the csLE structures of these datasets (Appendix 1—figure 13). Furthermore, we confirmed the proportionality between the expression generality score $g_{j}$ and the stoichiometry conservation centrality $d_{j}$ (Appendix 1—figure 7K–M). For the proteome data of S. cerevisiae, growth rates were also reported (Lahtvee et al., 2017). The S. cerevisiae cells were cultured in a chemostat at the same dilution rate under all conditions. Consistent with this, we observed little variation in the elements of ${\tilde{p}}_{tot}$ (Appendix 1—figure 8F), which leads to little deviation from the proportionality line (Appendix 1—figure 7M).

Appendix 1—figure 1

Schematic illustration of the approach in this study.

Related to Figure 1. Raman spectra and gene expression profiles are both high-dimensional vectors and can be represented as points in high-dimensional spaces. Coarse-graining Raman spectra by dimensional reduction finds condition-dependent differences in their global spectral patterns (see Figure 2). The dimension-reduced spectra were linked to and used to predict condition-dependent global gene expression profiles (see Figure 2), which implies that global changes in spectral patterns detect differences in cellular physiological states. The analysis of this linkage led us to discover a stoichiometry-conserving constraint on gene expression, which enabled us to represent gene expression profiles in a functionally relevant low-dimensional space (i; see also Figures 3–5). We then found a nontrivial correspondence between these low-dimensional Raman and gene expression spaces (ii; see also Figure 6). This correspondence provides an omics-level interpretation of global Raman spectral patterns and a quantitative constraint between expression generality and stoichiometry conservation centrality (ii; see also Figure 7, Appendix 1—figure 9).

Appendix 1—figure 2

Custom-built Raman microscope and analyses of *E. coli* Raman spectra.

Related to Figure 2. (A) Schematic diagram of the Raman microscope used in this study. (B) Representative Raman spectra from single *E. coli* cells. The fingerprint region of one spectrum is shown for each condition. (C) Linear superposition of Raman shifts. Each linear discriminant analysis (LDA) axis is a linear superposition of Raman shifts. These figures show the coefficients for LDA1 (left) and LDA2 (right). (D) Relationship between Raman LDA1 axis and growth rates. The horizontal axis represents Raman LDA1 axis. The vertical axis represents growth rates measured in Schmidt et al., 2016. Each point corresponds to the data for one condition. Pearson correlation coefficient is 0.81±0.09.

Appendix 1—figure 3

Estimation of proteomes from Raman spectra.

Related to Figure 2. Comparing the measured proteomes with those estimated from Raman spectra. The horizontal and vertical axes represent the estimated and measured proteomes, respectively. Proteins with negative estimated abundance are not shown in these figures. The conditions with the largest and the second largest numbers of proteins with negative estimated abundance were ‘stationary3days’ (666 proteins) and ‘LB’ (359 proteins). The conditions with the fewest and the second fewest negatively estimated proteins were ‘GlucosepH6’ (0 proteins) and ‘Xylose’ (7 proteins).

Appendix 1—figure 4

Comparison of stoichiometry conservation among Clusters of Orthologous Group (COG) classes.

Related to Figure 3. (**A and B**) Relations between protein abundance and constant terms of Raman-proteome coefficients. The horizontal axes are $b_{0}$ (constant terms), and the vertical axes are ${\hat{p}}_{i}^{⊤}$ (protein abundance). Dashed lines are the least squares regression lines with intercept zero for information storage and processing (ISP) COG class members. The average of $B_{- i}^{est}$ was used as an estimate of $B$ here. In (A), only ISP COG class members are shown for three representative conditions: ‘Galactose’, ‘Glucose’, and ‘GlycerolAA’. In (B), all proteins are shown for a representative condition, ‘GlycerolAA’. (C) Relations between protein abundance and growth rates of *E. coli* under 15 environmental conditions. We analyzed the absolute quantitative proteome data, growth rate data, and COG annotation reported by Schmidt et al., 2016. Lines represent different protein species. Error bars are standard errors. The top panel is for the Cellular Processes and Signaling COG class; the middle is for the ISP COG class; and the bottom is for the Metabolism COG class. (D) Relations between protein abundance and growth rates of three *E. coli* strains (BW25113, MG1655, and NCM3722) under two culture conditions. We again analyzed the data by Schmidt et al., 2016. Lines represent different protein species. Error bars are standard errors. (**E and F**) COG class-dependent expression pattern similarity of *E. coli* proteomes between conditions. The *E. coli* proteome data under the 15 different environmental conditions were analyzed. The similarity is evaluated by Pearson correlation coefficients of log expression levels in (E) and by cosine similarity in (F). We consider all the combinations of the 15 conditions. Thus, there are 105 data points for each COG class. The box-and-whisker plots summarize the distributions of the points. The lines inside the boxes denote the medians. 
The bottom and top edges of the boxes denote the 25th and 75th percentiles, respectively. Note that (E) and (F) are evaluations of the same data used in Figure 3B in the main text with different similarity indices. (G) COG class-dependent expression pattern similarity between different strains of *E. coli* (BW25113, MG1655, and NCM3722). The absolute quantitative proteome data and COG annotation were taken from Schmidt et al., 2016. The similarity was evaluated by cosine similarity. The data contain three strains. Thus, there are three points for each COG class. The top panel is for the ‘Glucose’ condition, and the bottom is for the ‘LB’ condition. (**H–J**) COG class-dependent expression pattern similarity in other organisms. (H) is for *M. tuberculosis* (data from Schubert et al., 2015; six environmental conditions [time points]), (I) for *M. bovis* (data from Schubert et al., 2015; six environmental conditions [time points]), and (J) for *S. cerevisiae* (data from Lahtvee et al., 2017; 10 environmental conditions). The COG annotations were taken from the December 2014 release of 2003–2014 COGs (Galperin et al., 2015) and Release 3 of ‘Mycobrowser’ (Kapopoulou et al., 2011) for (H) and (I) and from the Comprehensive Sake Yeast Genome Database (S288C strain) (Akao et al., 2011) for (J). The unit for protein abundance was fg/cell for (H) and (I) and fg per pg dry cell weight for (J).
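
The pairwise evaluation in (E) and (F) can be sketched as below: with 15 conditions there are $\binom{15}{2} = 105$ condition pairs, and each pair yields one similarity value. This is an illustrative sketch on randomly generated abundances, not the authors' code; in the actual analysis the protein set would be restricted to one COG class at a time, and the function and variable names are hypothetical.

```python
import itertools
import math
import random

def cosine(x, y):
    """Cosine similarity between two expression vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
n_conditions, n_proteins = 15, 30
# Hypothetical positive abundances: rows are conditions, columns are proteins.
data = [[rng.lognormvariate(0.0, 1.0) for _ in range(n_proteins)]
        for _ in range(n_conditions)]

pairs = list(itertools.combinations(range(n_conditions), 2))
cos_values = [cosine(data[a], data[b]) for a, b in pairs]

# Pearson correlation of log expression levels; the result does not
# depend on the base of the logarithm.
log_data = [[math.log10(v) for v in row] for row in data]
pearson_values = [pearson(log_data[a], log_data[b]) for a, b in pairs]

print(len(pairs))  # 105 condition pairs for 15 conditions
```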

Appendix 1—figure 5

Single-gene-level growth law in the homeostatic core.

Related to Figure 4. (A) Relationship between population growth rates and total abundance of SCG 1 (homeostatic core) proteins. Here, we analyzed the *E. coli* proteome data (Schmidt et al., 2016), focusing on the 15 conditions for which we obtained Raman data. The dashed line is the least squares regression line. (B) Scatterplots of log abundance of SCG 1 (homeostatic core) proteins. Here, the proteomes under three representative conditions, ‘LB’, ‘Glucose’, and ‘Galactose’, are compared with that under the standard condition ‘Glycerol’. Each colored line is the linear regression line with slope one for the points with the same color. The vertical line is $x = 0$. (C) Relationship between population growth rate and coefficient of determination of linear regression in (B). The vertical line represents the growth rate under the standard condition (‘Glycerol’). (D) Linear relationship between common abundance ratio and growth rates. The vertical axis represents $10^{Γ_{c}}$, where $Γ_{c}$ is the y-intercept of the corresponding regression line in (B) (see Section 3.1.2 in Appendix). The dashed line is the linear regression line. The horizontal line is $y = 1$, and the $x$ coordinate of the vertical line is the growth rate under the standard condition (‘Glycerol’). (E) The gene loci of the proteins belonging to the condition-specific stoichiometrically conserved groups (SCGs) on the chromosome (ASM75055v1.46; Howe et al., 2020). Colored dots are nodes (genes), and gray lines are edges (high cosine similarity relationships). The edge in the map of SCG 5 cannot be seen because the gene loci are clustered in close proximity in the same operon.
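
The slope-one regression in (B) and the common abundance ratio in (D) can be illustrated with a short sketch. With the slope fixed at one, the least squares intercept is simply the mean of the log-abundance differences. The sketch assumes base-10 logarithms (suggested by the quantity $10^{Γ_{c}}$); the abundances and the function name are hypothetical.

```python
import math

def fit_slope_one(log_x, log_y):
    """Least squares fit of log_y = log_x + intercept (slope fixed at one).

    Returns the intercept and the coefficient of determination.
    """
    n = len(log_x)
    intercept = sum(y - x for x, y in zip(log_x, log_y)) / n
    mean_y = sum(log_y) / n
    ss_res = sum((y - (x + intercept)) ** 2 for x, y in zip(log_x, log_y))
    ss_tot = sum((y - mean_y) ** 2 for y in log_y)
    return intercept, 1.0 - ss_res / ss_tot

# Hypothetical abundances of five core proteins under the standard condition
# and under a faster-growth condition where all are roughly doubled.
reference = [10.0, 50.0, 200.0, 1000.0, 4000.0]
factors = [1.9, 2.1, 2.0, 2.05, 1.95]
condition = [a * f for a, f in zip(reference, factors)]

log_x = [math.log10(a) for a in reference]
log_y = [math.log10(b) for b in condition]
gamma_c, r2 = fit_slope_one(log_x, log_y)
ratio = 10 ** gamma_c  # common abundance ratio relative to the standard condition

print(f"intercept = {gamma_c:.4f}, common ratio = {ratio:.3f}, R^2 = {r2:.4f}")
```

A coefficient of determination close to one indicates that a single common ratio describes all proteins in the group, which is the sense in which stoichiometry is conserved.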

Appendix 1—figure 6

Functional relevance of stoichiometry conservation centrality.

Related to Figure 5. (A) Relationship between gene essentiality and stoichiometry conservation centrality in *E. coli*. The proportion of essential genes is plotted for each stoichiometry conservation centrality rank range. In this plot, we calculated stoichiometry conservation centrality based on the *E. coli* proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data. The list of essential genes was downloaded from EcoCyc (Keseler et al., 2017). (B) Relationship between gene essentiality and stoichiometry conservation centrality in *S. pombe*. We calculated stoichiometry conservation centrality based on the *S. pombe* transcriptome data reported in Kobayashi-Kirschvink et al., 2018. Only coding genes are considered in this plot, though stoichiometry conservation centrality values were calculated using both coding and non-coding genes. Gene classification is based on PomBase (Harris et al., 2022). Some bins do not reach 100% in sum because 11 coding genes in the *S. pombe* transcriptome data were not found in the current PomBase. (C) Relationship between ratio of coding genes and stoichiometry conservation centrality in the *S. pombe* transcriptome data. The coding/non-coding assignment is based on PomBase (Harris et al., 2022). (D) Correlation between stoichiometry conservation and evolutionary conservation. In this plot, we calculated stoichiometry conservation centrality based on the *E. coli* proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data. Colors represent the height of each bar. The distributions of stoichiometry conservation centrality were compared between the top 25% and the bottom 25% fractions in the number of orthologs rankings. The fraction with many orthologs tends to have higher stoichiometry conservation centrality (one-sided Brunner-Munzel test, $p = 7.84 \times 10^{- 15}$ ). 
The distributions of the number of orthologs were compared between the top 25% and the bottom 25% stoichiometry conservation centrality fractions. The high centrality fraction tends to have more orthologs (one-sided Brunner-Munzel test, $p = 1.46 \times 10^{-11}$). Ortholog data were taken from OrthoMCL-DB (Chen et al., 2006). (**E–G**) Correlation between stoichiometry conservation and evolutionary conservation in *S. pombe*. We calculated stoichiometry conservation centrality based on the *S. pombe* transcriptome data reported in Kobayashi-Kirschvink et al., 2018. In (E), the result is shown by a two-dimensional histogram. Colors represent the height of each bar. The distributions of the number of orthologs were compared between the top 25% and the bottom 25% stoichiometry conservation centrality fractions. The high centrality fraction tends to have more orthologs (one-sided Brunner-Munzel test, $p = 0.00548$). The direct comparison between the two fractions is shown in (F). The distributions of stoichiometry conservation centrality were compared between the top 25% and the bottom 25% fractions in the number of orthologs rankings. The fraction with many orthologs tends to have higher stoichiometry conservation centrality (one-sided Brunner-Munzel test, $p = 0.00270$). The direct comparison between the two fractions is shown in (G). Ortholog data were taken from OrthoMCL-DB (Chen et al., 2006). (H) Applying principal component analysis (PCA) to $L^{2}$-normalized proteomes. PCA (with mean centering) was applied to $L^{2}$-normalized proteome data $\left[\, p_{1} / \| p_{1} \|_{2} \;\; \cdots \;\; p_{n} / \| p_{n} \|_{2} \,\right]$. Here, we analyzed the *E. coli* proteome data under the 15 conditions for which we obtained Raman data. The left is a projection onto a two-dimensional space, and the right is a projection onto a three-dimensional space. The axes for visualization were selected by considering similarity to the cosine similarity LE (csLE) structure.

Appendix 1—figure 7

Distributions and constraints with respect to stoichiometry conservation centrality (degree).

Related to Figure 5 and Figure 7. (A) Comparison of degree (stoichiometry conservation centrality) distributions between original (yellow) and randomized (blue) *E. coli* proteome data. We created randomized proteome data by shuffling the expression levels across the protein species within each condition. We used the *E. coli* proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data. (B) Comparison of the $g_{j}$ - $d_{j}$ relationships between original (yellow) and randomized data (blue). The horizontal axis is expression generality score ( $g_{j} =$ $L^{1}$ norm/ $L^{2}$ norm), and the vertical axis is stoichiometry conservation centrality ( $d_{j}$ : degree). Each dot represents a protein species. The dashed lines are $y = n$ , $x = 1, \sqrt{m}$ ( $n = 2058, m = 15$ ). The solid lines are $y = \sqrt{\sum_{i} d_{i} / m} x$ . (**C–H**) Degree (stoichiometry conservation centrality) distributions for additional datasets. Yellow histograms are for the original data, and blue histograms are for the randomized data. (C) For the proteomes of three *E. coli* strains (BW25113, MG1655, and NCM3722) in LB (Schmidt et al., 2016); (D) for the proteomes of the three *E. coli* strains in M9 Glucose (Schmidt et al., 2016); (E) for the proteomes of *M. tuberculosis* (Schubert et al., 2015); (F) for the proteomes of *M. bovis* (Schubert et al., 2015); (G) for the proteomes of *S. cerevisiae* (Lahtvee et al., 2017); and (H) for the transcriptomes of *S. pombe* (Kobayashi-Kirschvink et al., 2018). (**I–N**) $g_{j}$ - $d_{j}$ relationships for additional datasets. Each gray dot represents a protein species. The proteins belonging to the homeostatic core in each dataset are shown in magenta; those belonging to condition-specific stoichiometrically conserved groups (SCGs) are indicated in different colors in each plot. 
See the captions of Appendix 1—figures 11 and 13 for the cosine similarity thresholds used to specify the homeostatic core and the condition-specific SCGs in each dataset. The dashed lines are $y = n$, $x = 1$, and $x = \sqrt{m}$. The solid lines through the origins are $y = \sqrt{\sum_{i = 1}^{n} d_{i} / m}\, x$. (I) is for the proteomes of the three *E. coli* strains in LB (Schmidt et al., 2016); (J) for the proteomes of the three *E. coli* strains in M9 Glucose (Schmidt et al., 2016); (K) for the proteomes of *M. tuberculosis* (Schubert et al., 2015); (L) for the proteomes of *M. bovis* (Schubert et al., 2015); (M) for the proteomes of *S. cerevisiae* (Lahtvee et al., 2017); and (N) for the transcriptomes of *S. pombe* (Kobayashi-Kirschvink et al., 2018).

Appendix 1—figure 8

Properties of normalized expression vectors.

Related to Figure 7. (**A and B**) Schematic explanation for the interpretation of the $L^{1}$ norm/$L^{2}$ norm ratio of expression vectors as an index of expression generality. (A) is a two-dimensional case, and (B) is a three-dimensional case. The inset in (A) schematically explains $L^{1}$ norm and $L^{2}$ norm of an expression vector. See ‘Interpretation of $L^{1}$ norm/$L^{2}$ norm ratio of an expression vector as a quantitative measure of expression generality’ in Materials and methods for details. (C) Schematic explanation for deviations of points from the proportionality line in the $g_{j}$–$d_{j}$ plots. Here, we consider four condition-specific protein species a, b, c, and d labeled in the descending order of growth rates under the conditions accompanying their expression. Note that their $L^{1}$ norm/$L^{2}$ norm ratios are all one on the horizontal axis. One can show that the degree (stoichiometry conservation centrality) $d_{j}$ is proportional to the inner product of $L^{2}$-normalized expression vector $p_{j} / \| p_{j} \|_{2}$ and the expression norm vector ${\tilde{p}}_{tot} / \| {\tilde{p}}_{tot} \|_{2}$ (see Equation 2.147 in Section 2.2.2). Since the elements of ${\tilde{p}}_{tot} / \| {\tilde{p}}_{tot} \|_{2}$ increase approximately linearly with growth rates of the corresponding conditions (see D), the degrees (stoichiometry conservation centrality values) decrease from a to d in the order of growth rates. (**D–F**) Correlation between elements of ${\tilde{p}}_{tot}$ and population growth rates. The vertical axis represents the elements of ${\tilde{p}}_{tot} / \| {\tilde{p}}_{tot} \|_{2}$, and the horizontal axis represents the population growth rates. The dashed lines are $y = 1 / \sqrt{m}$. (D) is the result from the analysis of the *E. coli* proteome data (Schmidt et al., 2016) under the 15 conditions for which we obtained Raman data ($m = 15$). (E) is the result from the analysis of the proteome data of three strains of *E. coli* (BW25113, MG1655, and NCM3722) under ‘LB’ and ‘Glucose’ conditions ($m = 6$) (Schmidt et al., 2016). (F) is the result from the analysis of the proteome data of *S. cerevisiae* under 10 different conditions ($m = 10$) (Lahtvee et al., 2017). The cells were cultured in a chemostat with the same dilution rate. The numbers of analyzable protein species and the numbers of conditions were different between (D) and (E). Thus, the values of the vertical axes cannot be compared directly between them.

Appendix 1—figure 9

Mathematical analyses of the main Raman-proteome data.

Related to Figure 6. Proteomes of *E. coli* under 15 conditions (Schmidt et al., 2016) and corresponding Raman data we measured in this study were analyzed in this figure. (A) Visual comparison of the unit matrix $I$ , the orthogonal matrix $Θ$ obtained from the data, and a random orthogonal matrix. Height of each bar indicates the value of each element. Colors represent the height of each bar. For clarifying the position of each element, a component form of matrix $Θ$ is shown in the middle ( $m = 15$ ). For $Θ$ (middle) and a random orthogonal matrix (right), the original matrices are displayed in the upper row, and matrices whose elements are the absolute values of the corresponding elements of the original matrices are displayed in the lower row. (In this figure, $| Θ |$ represents a matrix of which the $(i, j)$ element is the absolute value of the $(i, j)$ element of $Θ$ .) (B) Representation of matrices as scatterplots. See ‘Evaluating similarity between orthogonal matrix $Θ$ and identity matrix’ in Materials and methods for details. (C) Comparison of the unit matrix $I$ , the orthogonal matrix $Θ$ obtained from the data, and random orthogonal matrices $Q$ by Pearson correlation coefficients. Pearson correlation coefficient of the element-wise squared matrix of each matrix can be regarded as a measure of closeness to the identity matrix ( $\circ$ represents element-wise multiplication). The probability of finding a random orthogonal matrix $Q$ with Pearson correlation coefficient greater than the Pearson correlation coefficient of $Θ$ was $< 1 \times 10^{- 5}$ (no occurrence in 10⁵ samplings). See ‘Evaluating similarity between orthogonal matrix $Θ$ and identity matrix’ in Materials and methods for details. (D) Comparison of magnitudes of off-diagonal elements among the unit matrix $I$ , the orthogonal matrix $Θ$ obtained from the data, and random orthogonal matrices $Q$ . 
The lattice on the top explains the numbering of $k$-diagonals ($-m < k < m$, $m = 15$). In the lattices on the bottom, black color indicates areas in which the elements are squared and summed at the corresponding steps (i.e. areas represented by $x$ in the graph). The sum of the squared values in each step is shown in the middle graph. Error bars of the random matrix line are standard errors of 100 samplings. See ‘Evaluating similarity between orthogonal matrix $Θ$ and identity matrix’ in Materials and methods for details. (E) Comparison of magnitudes of elements of leading principal submatrices among the unit matrix $I$, the orthogonal matrix $Θ$ obtained from the data, and random orthogonal matrices $Q$. In the lattices on the bottom, black color indicates an area in which elements are squared and summed at the corresponding step (i.e. an area represented by $x$ in the graphs). The sum of the squared values in each area is shown in the top graph. The results shown in the top graph are converted into ratios to the identity matrix $I$ and are shown in the middle graph. Error bars of the random matrix line are standard errors of 100 samplings. See ‘Evaluating similarity between orthogonal matrix $Θ$ and identity matrix’ in Materials and methods for details. (F) Comparison of $\sqrt{m}\,\mathrm{diag}(b_{0})$ and $\mathrm{diag}(b_{0}^{est})$. The $x$ axis represents $\sqrt{m} b_{0}$ and the $y$ axis represents $b_{0}^{est}$. The dashed line indicates $y = x$. (G) Comparison between $B_{E}^{norm}$ (left) and $B_{E}^{est,norm}$ (right). Note that while the $B_{E}^{norm}$ figure (left) is the same as Figure 6C, the right figure shows $B_{E}^{est,norm} = {(\sum_{i = 1}^{n} d_{i})}^{1/2} {\tilde{V}}_{rw}$, where ${\tilde{V}}_{rw}$ is shown in Figure 6D.
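
The comparison with random orthogonal matrices in (C) can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of Materials and methods: random orthogonal matrices are drawn by Gram-Schmidt orthonormalization of Gaussian vectors, closeness to the identity is taken as the Pearson correlation between the entries of the element-wise squared matrix and those of $I$, and the "observed" matrix is a hypothetical small rotation rather than the $Θ$ obtained from the data. The matrix size and sample count are kept small for speed.

```python
import math
import random

def random_orthogonal(m, rng):
    """Draw a random m x m orthogonal matrix by Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < m:
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip a numerically degenerate draw
            rows.append([a / norm for a in v])
    return rows

def closeness_to_identity(Q):
    """Pearson correlation between entries of the element-wise square of Q and of I."""
    m = len(Q)
    x = [Q[i][j] ** 2 for i in range(m) for j in range(m)]
    y = [1.0 if i == j else 0.0 for i in range(m) for j in range(m)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
m = 5
angle = 0.1
# Hypothetical "observed" matrix: the identity with a small rotation in one plane.
theta = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
theta[0][0] = theta[1][1] = math.cos(angle)
theta[0][1] = -math.sin(angle)
theta[1][0] = math.sin(angle)

observed = closeness_to_identity(theta)
n_samples = 200
exceed = sum(
    closeness_to_identity(random_orthogonal(m, rng)) > observed
    for _ in range(n_samples)
)
p_emp = exceed / n_samples
print(f"observed closeness = {observed:.5f}, empirical probability = {p_emp}")
```

With this estimator, the smallest nonzero probability that can be resolved is one over the number of sampled matrices, which is why results with no occurrence are reported as upper bounds.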

Appendix 1—figure 10

Orthant correspondences between Raman spectra in linear discriminant analysis (LDA) space and condition-specific proteins in Raman-proteome coefficient proteome space.

Related to Figure 6. Using the main Raman and proteome data of *E. coli* under the 15 conditions, we examine the orthant correspondence between Raman spectra in the LDA space and condition-specific proteins in the Raman-proteome coefficient proteome space $Ω_{B}$. Here, we focus on two proteins, PaaE and AcrR. (A) Expression patterns of PaaE (left) and AcrR (right) across conditions. Error bars are standard errors. PaaE is expressed under the ‘LB’ condition in a condition-specific manner, whereas AcrR is expressed at high levels not only under the ‘LB’ condition but also under several other conditions. (B) Positions of PaaE and AcrR in the Raman-proteome coefficient-based proteome space $Ω_{B}$. (C) Verification of orthant correspondence. We verified the orthant correspondence described by Equation 2.76. We multiplied both sides of Equation 2.76 by ${(Σ_{R_{E}}^{norm})}^{-1}$, and the elements of the vectors of both sides were compared by scatterplots. The horizontal axes are related to the coordinates in the Raman LDA space; the vertical axes are related to the coordinates in the Raman-proteome coefficient proteome space. The dashed lines are $y = x$. The nearly perfect agreement of the elements confirms the orthant correspondence for the condition-specific protein PaaE (left). Deviations from the diagonal agreement line are found for AcrR (right).

Appendix 1—figure 11

Stoichiometry-based omics structures and their correspondences to Raman-based omics structures for additional datasets.

Related to Figures 4—6. This figure summarizes the results on omics structures characterized by stoichiometry conservation relations and their correspondences to those characterized by Raman-omics relations for additional datasets. (**A–E**) show the results from the analyses of the Raman and proteome data of three *E. coli* strains (BW25113, MG1655, and NCM3722) in LB; (**F–J**) from the analyses of the Raman and proteome data of the three *E. coli* strains in M9 Glucose; and (**K–O**) from the analyses of the Raman and transcriptome data of *S. pombe* under 10 conditions. We used the *E. coli* proteome data reported in Schmidt et al., 2016, and the *S. pombe* transcriptome data reported in Kobayashi-Kirschvink et al., 2018, in the analyses. (A), (F), and (K) show distributions of omics components in cosine similarity LE (csLE) space. Stoichiometry conservation centrality of each component is indicated by color. (B), (G), and (L) show expression patterns of representative condition-specific omics components indicated in the previous figures of omics structures in the csLE spaces. Error bars are standard errors in (B) and (G), and maximum-minimum ranges (two replicates) in (L). (C), (H), and (M) show positions of averaged cellular Raman spectra under different conditions in the linear discriminant analysis (LDA) spaces. (D), (I), and (N) show omics structures in the spaces specified by the Raman-omics coefficients with the homeostatic cores and condition-specific stoichiometrically conserved groups (SCGs) indicated by colored points. (E), (J), and (O) show the omics structures in the csLE omics spaces with the homeostatic cores and condition-specific SCGs indicated by colored points. Columns $v_{rw, 1}$ (the eigenvector corresponding to $L_{rw}$ ’s smallest eigenvalue except for zero) and $v_{rw, 2}$ (the eigenvector corresponding to $L_{rw}$ ’s second smallest eigenvalue except for zero) are shown. 
We used the cosine similarity thresholds of 0.99993 to specify SCGs both for the three *E. coli* strains under LB data (D and E) and for the three *E. coli* strains under M9 Glucose data (I and J), and 0.9967 for the *S. pombe* transcriptome data (N and O).

Appendix 1—figure 12

Analyses of the mathematical relation connecting two types of omics spaces.

Related to Figure 6. This figure shows the analyses of the mathematical relation that connects the coordinates of omics components in the two types of omics spaces (see Figure 6E and Section 2 in Appendix) using additional datasets. (**A–F**) show the results from the analyses of the Raman and proteome data of three *E. coli* strains (BW25113, MG1655, and NCM3722) in LB; (**G–L**) from the analyses of the Raman and proteome data of the three *E. coli* strains in M9 Glucose; and (**M–R**) from the analyses of the Raman and transcriptome data of *S. pombe* under 10 conditions. We used the *E. coli* proteome data reported in Schmidt et al., 2016, and the *S. pombe* transcriptome data reported in Kobayashi-Kirschvink et al., 2018, in the analyses. See the caption of Appendix 1—figure 9 for the explanation of each panel. The stoichiometrically conserved groups (SCGs) in (F), (L), and (R) are the same as in Appendix 1—figure 11. The probability of finding a random orthogonal matrix $Q$ with a Pearson correlation coefficient greater than that of $Θ$ was 0.022 in (B), 0.013 in (H), and $< 1 \times 10^{-5}$ (no occurrence in 10⁵ samplings) in (N).

Appendix 1—figure 13

Stoichiometry-based proteome structures for additional datasets.

Related to Figures 4 and 5. This figure shows proteome structures in the cosine similarity LE (csLE) proteome spaces for additional datasets. (**A–C**) show the results from the analyses of the proteome data of *M. tuberculosis* H37Rv under gradual changes in oxygen levels (Schubert et al., 2015); (**D–F**) show the results from the analyses of the proteome data of *M. bovis* BCG under gradual changes in oxygen levels (Schubert et al., 2015); and (**G–I**) show the results from the analyses of the proteome data of *S. cerevisiae* under 10 conditions in a chemostat with the same dilution rate (Lahtvee et al., 2017). (A), (D), and (G) show the proteome structures in the csLE spaces. The thresholds used to specify the stoichiometrically conserved groups (SCGs) were 0.99965 for (A), 0.9997 for (D), and 0.9989 for (G). (B), (E), and (H) show the same proteome structures as in the previous panels, but with the stoichiometry conservation centrality of each protein species indicated by color. (C), (F), and (I) show expression patterns of representative proteins indicated by the red circles in the previous panels. Error bars in (C) are standard errors.

Appendix 1—figure 14

Dependence of low-dimensional correspondence between Raman spectra and proteomes on the number of conditions.

Related to Figure 6. We systematically investigated how the low-dimensional correspondence between Raman spectra and proteomes depends on the number of analyzed conditions by evaluating the similarity of the orthogonal matrix $Θ$ to the identity matrix for all subsampled condition sets. This figure analyzes the proteomes of *E. coli* under 15 conditions (Schmidt et al., 2016) and the corresponding Raman data measured in this study. (A) The relationship between the number of conditions and the probability of obtaining, by chance, a higher level of low-dimensional correspondence than that of the experimental data. This probability was estimated by generating 10⁴ random orthogonal matrices and calculating the fraction whose Pearson correlation coefficient is greater than that of $Θ$. See ‘Evaluating similarity between orthogonal matrix $Θ$ and identity matrix’ in Materials and methods and Appendix 1—figure 9 for details of the evaluation method. Each green square corresponds to one subsample, and each short horizontal black line represents the median of all the $\binom{15}{x}$ combinations of conditions (i.e. $\binom{15}{x}$ green squares) for each subsample size $x$. The blue dashed line indicates the detection limit (i.e. one over the number of generated random orthogonal matrices). The non-subsampled case (i.e. the case with all 15 conditions) in this figure corresponds to Appendix 1—figure 9C. (B) Visual comparison of $Θ$, $B_{E}^{norm}$, and $B_{E}^{est,norm}$ for six representative subsamples indicated in (A). As in Appendix 1—figure 9A, $Θ$ is visualized using $|Θ|$, each element of which is the absolute value of the corresponding element of $Θ$. The height of each bar in the plots of $|Θ|$ indicates the value of the corresponding element of $|Θ|$, and the colors reflect the bar heights. The spaces created with the columns of $B_{E}^{norm}$ and $B_{E}^{est,norm}$ are $Ω_{B}$ and $Ω_{LE}$, respectively.
As $Θ$ deviates further from the identity matrix, going from cases $α$ and $β$ to case $ϵ$, the low-dimensional correspondence between $Ω_{B}$ and $Ω_{LE}$ naturally collapses. Since case $ζ$ is the non-subsampled case, its plot of $|Θ|$ is the same as Appendix 1—figure 9A, and its plots of $B_{E}^{norm}$ and $B_{E}^{est,norm}$ are the same as Appendix 1—figure 9G. Note that the plot of $Ω_{B}$ for case $ζ$ is also exactly the same as Figure 6C, and that of $Ω_{LE}$ for case $ζ$ is equal to Figure 6D up to a factor of $\left(\sum_{i=1}^{n} d_{i}\right)^{1/2}$. The stoichiometrically conserved groups (SCGs) shown in this figure were defined in the analysis of the proteomes of all 15 conditions (Figure 4C).
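
The chance-level probability in panel (A) can be illustrated with a minimal Python sketch. It assumes that similarity to the identity is scored as the Pearson correlation between the elements of $|Θ|$ and those of the identity matrix, and that random orthogonal matrices are drawn from the Haar measure by QR decomposition. Both choices, and all function names, are assumptions made here for illustration; the exact procedure is the one described in Materials and methods.

```python
import numpy as np


def random_orthogonal(m, rng):
    """Draw an m x m orthogonal matrix from the Haar measure via QR."""
    z = rng.standard_normal((m, m))
    q, r = np.linalg.qr(z)
    # Remove the sign ambiguity of QR so that the draw is Haar-distributed.
    d = np.sign(np.diag(r))
    d[d == 0] = 1.0
    return q * d  # scales column j of q by d[j]


def identity_correlation(theta):
    """Pearson correlation between the elements of |theta| and the identity.

    Assumed similarity score (hypothetical); requires m >= 2.
    """
    m = theta.shape[0]
    return np.corrcoef(np.abs(theta).ravel(), np.eye(m).ravel())[0, 1]


def chance_probability(theta, n_random=10_000, seed=0):
    """Fraction of random orthogonal matrices scoring higher than theta."""
    rng = np.random.default_rng(seed)
    m = theta.shape[0]
    observed = identity_correlation(theta)
    hits = sum(
        identity_correlation(random_orthogonal(m, rng)) > observed
        for _ in range(n_random)
    )
    return hits / n_random
```

With `n_random` random matrices, the smallest nonzero value this estimate can take is `1 / n_random`, which corresponds to the detection limit drawn as the blue dashed line.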

Appendix 1—table 1

List of culture conditions.

M9 m.m. and a.a. in this table are the abbreviations for M9 minimal media and amino acids, respectively.

Phase	Overview of composition	Temperature	pH	Name in this paper
Exponential	M9 m.m. + acetate	37°C	7	Acetate
Exponential	M9 m.m. + fructose	37°C	7	Fructose
Exponential	M9 m.m. + fumarate	37°C	7	Fumarate
Exponential	M9 m.m. + galactose	37°C	7	Galactose
Exponential	M9 m.m. + glucose	37°C	7	Glucose
Exponential	M9 m.m. + glucose	42°C	7	Glucose42C
Exponential	M9 m.m. + glucose	37°C	6	GlucosepH6
Exponential	M9 m.m. + glycerol	37°C	7	Glycerol
Exponential	M9 m.m. + glycerol + a.a.	37°C	7	GlycerolAA
Exponential	M9 m.m. + glucose + NaCl	37°C	7	OsmoticStressGlucose
Exponential	M9 m.m. + mannose	37°C	7	Mannose
Exponential	M9 m.m. + xylose	37°C	7	Xylose
Exponential	LB	37°C	7	LB
Stationary for 1 day	M9 m.m. + glucose	37°C	7	stationary1day
Stationary for 3 days	M9 m.m. + glucose	37°C	7	stationary3days

Appendix 1—table 2

Evaluation of the overall estimation error with various distance measures (the case where LDA1 to LDA4 axes were used).

The sum of estimation errors $\sum_{i} \mathrm{dist}(\hat{p}_{i}, \hat{p}_{i}^{est})$ was calculated, and a permutation test (10⁵ permutations) was conducted. In this table, LDA1 to LDA4 axes were used. $\bar{x}$ represents a vector all of whose elements equal the mean of the elements of $x$. $x_{j}$ is the $j$-th element of $x$. $\mathrm{median}_{j}\, x_{j}$ represents the median of the scalars $x_{j}$.

Metric	Definition of $\mathrm{dist}(x, y)$	$\sum_{i} \mathrm{dist}(\hat{p}_{i}, \hat{p}_{i}^{est})$	$p$-value
Square of $L^{2}$ norm (PRESS)	$‖ x - y ‖_{2}^{2} = \sum_{j} (x - y)_{j}^{2}$	2.34 × 10³	0.00005
$L^{1}$ norm	$‖ x - y ‖_{1} = \sum_{j} | (x - y)_{j} |$	1.40 × 10³	0.00002
Cosine distance	$1 - \frac{x \cdot y}{‖ x ‖_{2} ‖ y ‖_{2}}$	1.52	0.0014
1 – Pearson correlation coefficient	$1 - \frac{(x - \bar{x}) \cdot (y - \bar{y})}{‖ x - \bar{x} ‖_{2} ‖ y - \bar{y} ‖_{2}}$	1.57	0.0012
Median of relative error	$\mathrm{median}_{j} \frac{| (x - y)_{j} |}{x_{j} + 1}$	0.0536	0.00022
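
The five distance measures defined in this table can be written compactly in Python, as in the sketch below. The permutation test included with them is only an illustration: it assumes that the null distribution is built by shuffling which estimated proteome is paired with which measured proteome, and that the $p$-value is computed as (count + 1)/(permutations + 1). Neither assumption is stated in this table, and the function names are hypothetical.

```python
import numpy as np


def press(x, y):
    """Square of the L2 norm of x - y."""
    return float(np.sum((x - y) ** 2))


def l1(x, y):
    """L1 norm of x - y."""
    return float(np.sum(np.abs(x - y)))


def cosine_distance(x, y):
    return float(1.0 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def pearson_distance(x, y):
    """1 - Pearson correlation, i.e. cosine distance after mean-centering."""
    return cosine_distance(x - x.mean(), y - y.mean())


def median_relative_error(x, y):
    return float(np.median(np.abs(x - y) / (x + 1.0)))


def total_error(P, P_est, dist):
    """Sum of dist over conditions; rows of P and P_est are conditions."""
    return sum(dist(p, q) for p, q in zip(P, P_est))


def permutation_p_value(P, P_est, dist, n_perm=1000, seed=0):
    """Assumed null: shuffle the pairing between measured and estimated rows."""
    rng = np.random.default_rng(seed)
    observed = total_error(P, P_est, dist)
    count = sum(
        total_error(P, P_est[rng.permutation(len(P))], dist) <= observed
        for _ in range(n_perm)
    )
    return (count + 1) / (n_perm + 1)
```

Any of the five functions can be passed as `dist`, so the same test statistic is evaluated under each measure in turn.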

Appendix 1—table 3

Evaluation of the overall estimation error with various distance measures (the case where all the 14 LDA axes were used).

The results obtained by using all the 14 LDA axes are presented. See Appendix 1—table 2 for notations. Note that the system is underdetermined in this case; thus, we adopted the minimum-norm solution from among all least-squares solutions.

Metric	Definition of $\mathrm{dist}(x, y)$	$\sum_{i} \mathrm{dist}(\hat{p}_{i}, \hat{p}_{i}^{est})$	$p$-value
Square of $L^{2}$ norm (PRESS)	$‖ x - y ‖_{2}^{2} = \sum_{j} (x - y)_{j}^{2}$	1.63 × 10³	0.0019
$L^{1}$ norm	$‖ x - y ‖_{1} = \sum_{j} | (x - y)_{j} |$	1.19 × 10³	0.00066
Cosine distance	$1 - \frac{x \cdot y}{‖ x ‖_{2} ‖ y ‖_{2}}$	1.18	0.0879
1 – Pearson correlation coefficient	$1 - \frac{(x - \bar{x}) \cdot (y - \bar{y})}{‖ x - \bar{x} ‖_{2} ‖ y - \bar{y} ‖_{2}}$	1.23	0.085
Median of relative error	$\mathrm{median}_{j} \frac{| (x - y)_{j} |}{x_{j} + 1}$	0.0418	0.00082
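
For readers unfamiliar with the minimum-norm solution mentioned in the caption of this table, the following sketch shows the idea on a small random system. The matrix sizes and variable names are arbitrary illustrations, not the dimensions used in the analysis. For an underdetermined system, `numpy.linalg.lstsq` returns the least-squares solution with the smallest Euclidean norm, which coincides with the solution given by the Moore-Penrose pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))  # 3 equations, 5 unknowns: underdetermined
b = rng.standard_normal(3)

# Minimum-norm least-squares solution, computed in two equivalent ways.
x_min, *_ = np.linalg.lstsq(A, b, rcond=None)
x_pinv = np.linalg.pinv(A) @ b

# Any null-space vector can be added without changing the fit,
# but doing so increases the norm of the solution.
null_vec = np.linalg.svd(A)[2][-1]  # last right singular vector, A @ null_vec ~ 0
x_other = x_min + null_vec

print(np.linalg.norm(x_min), np.linalg.norm(x_other))
```

Both `x_min` and `x_other` reproduce `b` exactly, so the data alone cannot distinguish them; choosing the minimum-norm one is a convention for picking a unique answer.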

Appendix 1—table 4

Gene list of SCG 1 (homeostatic core).

Members of homeostatic core (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name	Description
rpoC	DNA-directed RNA polymerase subunit beta’
rpoB	DNA-directed RNA polymerase subunit beta
tufA	Elongation factor Tu 1
infB	Translation initiation factor IF-2
fusA	Elongation factor G
glyS	Glycyl-tRNA synthetase beta subunit
rpsA	30S ribosomal protein S1
leuS	Leucyl-tRNA synthetase
pheT	Phenylalanyl-tRNA synthetase beta chain
aspS	Aspartyl-tRNA synthetase
valS	Valyl-tRNA synthetase
secA	Protein translocase subunit SecA
gyrA	DNA gyrase subunit A
pepN	Aminopeptidase N
tsf	Elongation factor Ts
tig	Trigger factor
pta	Phosphate acetyltransferase
bamA	Outer membrane protein assembly factor YaeT
rne	Ribonuclease E
ftsZ	Cell division protein FtsZ
gyrB	DNA gyrase subunit B
polA	DNA polymerase I
rplB	50S ribosomal protein L2
prlC	Oligopeptidase A
rho	Transcription termination factor Rho
ftsH	ATP-dependent zinc metalloprotease FtsH
nusA	Transcription elongation protein NusA
lysS	Lysyl-tRNA synthetase
metG	Methionyl-tRNA synthetase
glnS	Glutaminyl-tRNA synthetase
lpdA	Dihydrolipoyl dehydrogenase
serS	Seryl-tRNA synthetase
surA	Chaperone SurA
rpsB	30S ribosomal protein S2
gltX	Glutamyl-tRNA synthetase
lptD	LPS-assembly protein LptD
argS	Arginyl-tRNA synthetase
fabB	3-Oxoacyl-[acyl-carrier-protein] synthase 1
pheS	Phenylalanyl-tRNA synthetase alpha chain
clpX	ATP-dependent Clp protease ATP-binding subunit ClpX
accC	Biotin carboxylase
pyrG	CTP synthase
tolC	Outer membrane protein TolC
rplE	50S ribosomal protein L5
accA	Acetyl-coenzyme A carboxylase carboxyl transferase subunit alpha
hflK	Modulator of FtsH protease HflK
pdxB	Erythronate-4-phosphate dehydrogenase
ygfZ	tRNA-modifying protein YgfZ
pmbA	Protein PmbA
rplA	50S ribosomal protein L1
hldD	ADP-L-glycero-D-manno-heptose-6-epimerase
mreB	Rod shape-determining protein MreB
acrA	Acriflavine resistance protein A
gor	Glutathione reductase
hisS	Histidyl-tRNA synthetase
rpsC	30S ribosomal protein S3
glmM	Phosphoglucosamine mutase
lepA	Elongation factor 4
ffh	Signal recognition particle protein
secD	Protein-export membrane protein SecD
lpoA	Penicillin-binding protein activator LpoA
rhlB	ATP-dependent RNA helicase RhlB
rpsG	30S ribosomal protein S7
rpsD	30S ribosomal protein S4
minD	Septum site-determining protein MinD
cyoA	Ubiquinol oxidase subunit 2
mdoG	Glucans biosynthesis protein G
rplC	50S ribosomal protein L3
glmU	Bifunctional protein GlmU
rpsF	30S ribosomal protein S6
rpsE	30S ribosomal protein S5
hemL	Glutamate-1-semialdehyde 2,1-aminomutase
hldE	Bifunctional protein HldE
ubiE	Ubiquinone/menaquinone biosynthesis methyltransferase UbiE
sspA	Stringent starvation protein A
nusG	Transcription antitermination protein NusG
prfB	Peptide chain release factor 2
dacA	D-alanyl-D-alanine carboxypeptidase DacA
rplF	50S ribosomal protein L6
fabG	3-Oxoacyl-[acyl-carrier-protein] reductase
ftsY	Cell division protein FtsY
dcrB	Protein DcrB
mlaC	Probable phospholipid-binding protein MlaC
hflC	Modulator of FtsH protease HflC
coaB	Coenzyme A biosynthesis bifunctional protein CoaBC
ybiT	Uncharacterized ABC transporter ATP-binding protein YbiT
oxyR	Hydrogen peroxide-inducible genes activator
rpsH	30S ribosomal protein S8
fkpA	FKBP-type peptidyl-prolyl cis-trans isomerase FkpA
frr	Ribosome-recycling factor
fabD	Malonyl CoA-acyl carrier protein transacylase
hslO	33 kDa chaperonin
ybeZ	PhoH-like protein
hemX	Putative uroporphyrinogen-III C-methyltransferase
rplY	50S ribosomal protein L25
rplK	50S ribosomal protein L11
rpsI	30S ribosomal protein S9
bamB	Lipoprotein YfgL
bamD	UPF0169 lipoprotein YfiO
kdgR	Transcriptional regulator KdgR
glnD	[Protein-PII] uridylyltransferase
yniC	Phosphatase YniC
rpsJ	30S ribosomal protein S10
rplX	50S ribosomal protein L24
rplD	50S ribosomal protein L4
rplQ	50S ribosomal protein L17
ppa	Inorganic pyrophosphatase
rpsM	30S ribosomal protein S13
rplN	50S ribosomal protein L14
ybaB	UPF0133 protein YbaB
yidC	Inner membrane protein OxaA
lptB	Lipopolysaccharide export system ATP-binding protein LptB
suhB	Inositol-1-monophosphatase
yejK	Nucleoid-associated protein YejK
ghrA	Glyoxylate/hydroxypyruvate reductase A
rsmI	Ribosomal RNA small subunit methyltransferase I
hemY	Protein HemY
uup	ABC transporter ATP-binding protein Uup
hrpA	ATP-dependent RNA helicase HrpA
rplJ	50S ribosomal protein L10
rplM	50S ribosomal protein L13
fur	Ferric uptake regulation protein
rplS	50S ribosomal protein L19
rcsB	Capsular synthesis regulator component B
mrp	Protein Mrp
glyQ	Glycyl-tRNA synthetase alpha subunit
greA	Transcription elongation factor GreA
nrdB	Ribonucleoside-diphosphate reductase 1 subunit beta
wbbI	Uncharacterized protein YefG
udk	Uridine kinase
mnmG	tRNA uridine 5-carboxymethylaminomethyl modification enzyme MnmG
rplL	50S ribosomal protein L7/L12
rplI	50S ribosomal protein L9
rpoZ	DNA-directed RNA polymerase subunit omega
ybbN	Uncharacterized protein YbbN
yfiF	Uncharacterized tRNA/rRNA methyltransferase YfiF
yedD	Uncharacterized lipoprotein YedD
rpmD	50S ribosomal protein L30
tatB	Sec-independent protein translocase protein TatB
yfgM	UPF0070 protein YfgM
kdsB	3-Deoxy-manno-octulosonate cytidylyltransferase
rpoN	RNA polymerase sigma-54 factor
fdx	2Fe-2S ferredoxin
rplV	50S ribosomal protein L22
rplO	50S ribosomal protein L15
fabZ	(3R)-hydroxymyristoyl-[acyl-carrier-protein] dehydratase
mipA	MltA-interacting protein
ssb	Single-stranded DNA-binding protein
yiaF	Uncharacterized protein YiaF
secY	Preprotein translocase subunit SecY
rbfA	Ribosome-binding factor A
potA	Spermidine/putrescine import ATP-binding protein PotA
rimM	Ribosome maturation factor RimM
trxA	Thioredoxin-1
rpsS	30S ribosomal protein S19
rpsU	30S ribosomal protein S21
accB	Biotin carboxyl carrier protein of acetyl-CoA carboxylase
engB	Probable GTP-binding protein EngB
tatA	Sec-independent protein translocase protein TatA
rfbD	dTDP-4-dehydrorhamnose reductase
ribF	Riboflavin biosynthesis protein RibF
folP	Dihydropteroate synthase
lepB	Signal peptidase I
sspB	Stringent starvation protein B
hupA	DNA-binding protein HU-alpha
rpsP	30S ribosomal protein S16
rplP	50S ribosomal protein L16
rpsT	30S ribosomal protein S20
rpsK	30S ribosomal protein S11
rplU	50S ribosomal protein L21
rplR	50S ribosomal protein L18
lpxA	Acyl-[acyl-carrier-protein]–UDP-N-acetylglucosamine O-acyltransferase
yceD	Uncharacterized protein YceD
queC	7-Cyano-7-deazaguanine synthase
rpmA	50S ribosomal protein L27
rpmG	50S ribosomal protein L33
rpmF	50S ribosomal protein L32
rpsN	30S ribosomal protein S14
rplT	50S ribosomal protein L20
nudK	GDP-mannose pyrophosphatase NudK
rplW	50S ribosomal protein L23
trmB	tRNA (guanine-N(7)-)-methyltransferase
rluB	Ribosomal large subunit pseudouridine synthase B
rpsR	30S ribosomal protein S18
secG	Protein-export membrane protein SecG
rlmE	Ribosomal RNA large subunit methyltransferase E
yfaY	CinA-like protein
trmA	tRNA (uracil-5-)-methyltransferase
rpmH	50S ribosomal protein L34
yajC	UPF0092 membrane protein YajC
yheU	UPF0270 protein YheU

Appendix 1—table 5

Gene list of SCG 2.

Members in SCG 2 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name	Description
fdoG	Formate dehydrogenase-O major subunit
dsdA	D-serine dehydratase
treC	Trehalose-6-phosphate hydrolase
sdaB	L-serine dehydratase 2
nanA	N-acetylneuraminate lyase
garD	D-galactarate dehydratase
proV	Glycine betaine/L-proline transport ATP-binding protein ProV
garR	2-Hydroxy-3-oxopropionate reductase
nanK	N-acetylmannosamine kinase
fdoH	Formate dehydrogenase-O iron-sulfur subunit
aphA	Class B acid phosphatase
nanE	Putative N-acetylmannosamine-6-phosphate 2-epimerase
srlB	Glucitol/sorbitol-specific phosphotransferase enzyme IIA component
ibpB	Small heat shock protein IbpB
hybC	Hydrogenase-2 large chain
proW	Glycine betaine/L-proline transport system permease protein ProW
srlE	Glucitol/sorbitol-specific phosphotransferase enzyme IIB component
fdoI	Formate dehydrogenase, cytochrome b556(fdo) subunit
preT	Uncharacterized oxidoreductase YeiT
garL	5-Keto-4-deoxy-D-glucarate aldolase
paaB	Phenylacetic acid degradation protein PaaB
paaK	Phenylacetate-coenzyme A ligase
paaE	Probable phenylacetic acid degradation NADH oxidoreductase PaaE
ykgE	Uncharacterized protein YkgE
ybjT	Uncharacterized protein YbjT
ykgG	Uncharacterized protein YkgG

Appendix 1—table 6

Gene list of SCG 3.

Members in SCG 3 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name	Description
wzc	Tyrosine-protein kinase Wzc
amiC	N-acetylmuramoyl-L-alanine amidase AmiC

Appendix 1—table 7

Gene list of SCG 4.

Members in SCG 4 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name	Description
fruB	Multiphosphoryl transfer protein
fruK	1-Phosphofructokinase
fruA	PTS system fructose-specific EIIBC component
narI	Respiratory nitrate reductase 1 gamma chain

Appendix 1—table 8

Gene list of SCG 5.

Members in SCG 5 (Figure 4, cosine similarity threshold: 0.995). The description of each gene is cited from Schmidt et al., 2016.

Name	Description
hdeB	Protein HdeB
hdeA	Chaperone-like protein HdeA
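
The cosine similarity threshold quoted in the captions of the SCG gene lists can be illustrated with a short Python sketch. Proteins whose abundance ratios stay constant across conditions have parallel abundance vectors, so their cosine similarity is close to 1. The sketch assumes that proteins are rows of an abundance matrix, links every pair whose similarity reaches the threshold, and takes connected components of the resulting graph as groups. The grouping rule and the function names are assumptions made here for illustration; the procedure actually used to define the SCGs is the one given with Figure 4.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between rows of X (proteins x conditions)."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return unit @ unit.T


def threshold_groups(X, threshold=0.995):
    """Label connected components of the graph linking rows with
    cosine similarity >= threshold (assumed grouping rule)."""
    adj = cosine_similarity_matrix(X) >= threshold
    n = len(X)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        # Depth-first search over the thresholded similarity graph.
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] < 0:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels
```

In this toy setting, two proteins end up in the same group only if a chain of near-proportional abundance profiles connects them.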

Appendix 1—table 9

Interpretations of $r_{h}$, $\hat{r}_{i}$, $b_{h}$, and $\hat{b}_{j}$.

Interpretations of the columns and rows of $R_{E}$ and $B_{E}$ are summarized.

Matrix	Row or column	Vector	Dimension	Description
$R_{E}$	Column	$r_{h} (h = 0, \dots, m - 1)$	$m$	List of $h$-th LDA coordinates of mean LDA Raman of all the conditions
$R_{E}$	Row	$\hat{r}_{i} (i = 1, \dots, m)$	$m$	Mean LDA Raman of condition $i$
$B_{E}$	Column	$b_{h} (h = 0, \dots, m - 1)$	$n$	List of coefficients of all the proteins for the $h$-th LDA axis
$B_{E}$	Row	$\hat{b}_{j} (j = 1, \dots, n)$	$m$	Coefficients for protein $j$

Appendix 1—table 10

Mathematical relation between Raman-proteome coefficients and cosine similarity LE (csLE) proteomes.

The matrices on the left-hand side of Equation 2.138 (a proteome structure based on Raman-proteome coefficients) and their counterparts on the right-hand side of Equation 2.138 (a proteome structure obtained with csLE) are listed.

Raman-omics coef. structure	csLE	Size and type of matrix	Description
$B_{E}^{norm}$	$\begin{array}{ll} {(\sum_{i = 1}^{n} d_{i})}^{1/2} \tilde{V}_{rw} \\ (= B_{E}^{est,norm}) \end{array}$	$n \times m$ matrix	Coefficients normalized by constants
$I$	$Θ$	$m \times m$ orthogonal matrix	Orthogonal transformation
$\begin{array}{ll} m^{-1/2}\, \mathrm{diag}({(1_{m})}^{⊤} P) \\ (= m^{1/2}\, \mathrm{diag}(b_{0})) \end{array}$	$\begin{array}{ll} {(\sum_{i = 1}^{n} d_{i})}^{-1/2}\, \mathrm{diag}{(P^{⊤} P)}^{1/2} D \\ (= \mathrm{diag}(b_{0}^{est})) \end{array}$	$n \times n$ diagonal matrix	Constant terms
$Σ_{R_{E}}^{norm}$	$Σ_{LE}$	$m \times m$ diagonal matrix	Singular values

Data availability

All data and analysis codes have been deposited in Zenodo and are publicly available at https://doi.org/10.5281/zenodo.17090710.

The following data sets were generated

(2025) Zenodo
Code and data for "Revealing global stoichiometry conservation architecture in cells from Raman spectral patterns".

https://doi.org/10.5281/zenodo.17090710

The following previously published data sets were used

1. Kobayashi-Kirschvink K
2. Nakaoka H
3. Oda A
4. Kamei KF
5. Nosho K
6. Fukushima H
7. Kanesaki Y
8. Yajima S
9. Masaki H
10. Ohta K
11. Wakamoto Y
(2018) Mendeley Data
Data for: Linear Regression Links Transcriptomic Data and Cellular Raman Spectra.

https://doi.org/10.17632/2fx3h2rx2m.1
1. Cao J
2. O'Day DR
3. Pliner HA
4. Kingsley P
5. Deng M
6. Daza RM
7. Zager MA
8. Kimberly A
9. Blecher R
10. Zhang F
11. O'Day DR
12. Spielmann M
13. Palis J
14. Doherty D
15. Steemers FJ
16. Glass IA
17. Trapnell C
18. Shendure J
(2020) NCBI Gene Expression Omnibus
ID GSE156793. A human cell atlas of fetal gene expression.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE156793
1. Replogle J
2. Weissman J
(2022) figshare
"Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq" Replogle et al. 2022 processed Perturb-seq datasets.

https://doi.org/10.25452/figshare.plus.20029387.v1
1. Keseler IM
(2017) The EcoCyc
ID Version 24.1. The EcoCyc database.

https://ecocyc.org/

References

1. Akao T
2. Yashiro I
3. Hosoyama A
4. Kitagaki H
5. Horikawa H
6. Watanabe D
7. Akada R
8. Ando Y
9. Harashima S
10. Inoue T
11. Inoue Y
12. Kajiwara S
13. Kitamoto K
14. Kitamoto N
15. Kobayashi O
16. Kuhara S
17. Masubuchi T
18. Mizoguchi H
19. Nakao Y
20. Nakazato A
21. Namise M
22. Oba T
23. Ogata T
24. Ohta A
25. Sato M
26. Shibasaki S
27. Takatsume Y
28. Tanimoto S
29. Tsuboi H
30. Nishimura A
31. Yoda K
32. Ishikawa T
33. Iwashita K
34. Fujita N
35. Shimoi H
(2011) Whole-genome sequencing of sake yeast Saccharomyces cerevisiae Kyokai no. 7
DNA Research 18:423–434.

https://doi.org/10.1093/dnares/dsr029
Software
1. Ara T
(2022) Brunnermunzel: (permuted) brunner-munzel test, version 2.0
R Package.

https://doi.org/10.32614/CRAN.package.brunnermunzel
(2008) A new regulatory circuit in ribosomal protein operons: S2-mediated control of the rpsB-tsf expression in vivo
RNA 14:1882–1894.

https://doi.org/10.1261/rna.1099108
1. Baba T
2. Ara T
3. Hasegawa M
4. Takai Y
5. Okumura Y
6. Baba M
7. Datsenko KA
8. Tomita M
9. Wanner BL
10. Mori H
(2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection
Molecular Systems Biology 2:2006.0008.

https://doi.org/10.1038/msb4100050
Conference
1. Belkin M
2. Niyogi P
(2001)
Laplacian eigenmaps and spectral techniques for embedding and clustering

NIPS’01: Proceedings of the 15th International Conference on Neural Information Processing Systems: Natural and Synthetic. pp. 585–591.
1. Belkin M
2. Niyogi P
(2003) Laplacian eigenmaps for dimensionality reduction and data representation
Neural Computation 15:1373–1396.

https://doi.org/10.1162/089976603321780317
(2003) Iterative signature algorithm for the analysis of large-scale gene expression data
Physical Review E 67:031902.

https://doi.org/10.1103/PhysRevE.67.031902
1. Biswas S
2. Kerner K
3. Teixeira PJPL
4. Dangl JL
5. Jojic V
6. Wigge PA
(2017) Tradict enables accurate prediction of eukaryotic transcriptional states from 100 marker genes
Nature Communications 8:15309.

https://doi.org/10.1038/ncomms15309
1. Blattner FR
2. Plunkett G
3. Bloch CA
4. Perna NT
5. Burland V
6. Riley M
7. Collado-Vides J
8. Glasner JD
9. Rode CK
10. Mayhew GF
11. Gregor J
12. Davis NW
13. Kirkpatrick HA
14. Goeden MA
15. Rose DJ
16. Mau B
17. Shao Y
(1997) The complete genome sequence of Escherichia coli K-12
Science 277:1453–1462.

https://doi.org/10.1126/science.277.5331.1453
1. Bonacich P
(1972) Factoring and weighting approaches to status scores and clique identification
The Journal of Mathematical Sociology 2:113–120.

https://doi.org/10.1080/0022250X.1972.9989806
1. Bremer H
2. Dennis PP
(2008) Modulation of chemical composition and other parameters of the cell at different exponential growth rates
EcoSal Plus 3:ecosal.5.2.3.

https://doi.org/10.1128/ecosal.5.2.3
1. Brin S
2. Page L
(1998) The anatomy of a large-scale hypertextual Web search engine
Computer Networks and ISDN Systems 30:107–117.

https://doi.org/10.1016/S0169-7552(98)00110-X
1. Brunner E
2. Munzel U
(2000) The nonparametric Behrens‐Fisher problem: asymptotic theory and a small‐sample approximation
Biometrical Journal 42:17–25.

https://doi.org/10.1002/(SICI)1521-4036(200001)42:1<17::AID-BIMJ17>3.0.CO;2-U
1. Cao J
2. O’Day DR
3. Pliner HA
4. Kingsley PD
5. Deng M
6. Daza RM
7. Zager MA
8. Aldinger KA
9. Blecher-Gonen R
10. Zhang F
11. Spielmann M
12. Palis J
13. Doherty D
14. Steemers FJ
15. Glass IA
16. Trapnell C
17. Shendure J
(2020) A human cell atlas of fetal gene expression
Science 370:eaba7721.

https://doi.org/10.1126/science.aba7721
1. Chen F
2. Mackey AJ
3. Stoeckert CJ
4. Roos DS
(2006) OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups
Nucleic Acids Research 34:D363–D368.

https://doi.org/10.1093/nar/gkj123
1. Datsenko KA
2. Wanner BL
(2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products
PNAS 97:6640–6645.

https://doi.org/10.1073/pnas.120163297
1. Dean D
2. Yates JL
3. Nomura M
(1981) Escherichia coli ribosomal protein S8 feedback regulates part of spc operon
Nature 289:89–91.

https://doi.org/10.1038/289089a0
Book
(2005) Eigenproblems in pattern recognition
In: Corrochano EB, editors. Handbook of Geometric Computing. Springer. pp. 129–167.

https://doi.org/10.1007/3-540-28247-5_5
(2014) Advanced methods of microscope control using μManager software
Journal of Biological Methods 1:e10.

https://doi.org/10.14440/jbm.2014.36
(1998) Cluster analysis and display of genome-wide expression patterns
PNAS 95:14863–14868.

https://doi.org/10.1073/pnas.95.25.14863
Book
1. Fisher RA
(1935)
The Design of Experiments

Edinburgh: Oliver and Boyd.
(2002) Evolutionary rate in the protein interaction network
Science 296:750–752.

https://doi.org/10.1126/science.1068696
(2015) Expanded microbial genome coverage and improved protein family annotation in the COG database
Nucleic Acids Research 43:D261–D269.

https://doi.org/10.1093/nar/gku1223
(2018) Raman spectral signature reflects transcriptomic features of antibiotic resistance in Escherichia coli
Communications Biology 1:85.

https://doi.org/10.1038/s42003-018-0093-8
Preprint
(2019) Eigenvalue and Generalized Eigenvalue Problems: Tutorial
arXiv.

https://doi.org/10.48550/arXiv.1903.11240
1. Goodacre R
2. Timmins ÉM
3. Burton R
4. Kaderbhai N
5. Woodward AM
6. Kell DB
7. Rooney PJ
(1998) Rapid identification of urinary tract infection bacteria using hyperspectral whole-organism fingerprinting and artificial neural networks
Microbiology 144 (Pt 5):1157–1170.

https://doi.org/10.1099/00221287-144-5-1157
1. Harris MA
2. Lock A
3. Bähler J
4. Oliver SG
5. Wood V
(2013) FYPO: the fission yeast phenotype ontology
Bioinformatics 29:1671–1678.

https://doi.org/10.1093/bioinformatics/btt266
1. Harris MA
2. Rutherford KM
3. Hayles J
4. Lock A
5. Bähler J
6. Oliver SG
7. Mata J
8. Wood V
(2022) Fission stories: using PomBase to understand Schizosaccharomyces pombe biology
Genetics 220:iyab222.

https://doi.org/10.1093/genetics/iyab222
1. He X
2. Zhang J
(2006) Why do hubs tend to be essential in protein networks?
PLOS Genetics 2:e88.

https://doi.org/10.1371/journal.pgen.0020088
(2016) Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing
Cell Systems 2:239–250.

https://doi.org/10.1016/j.cels.2016.04.001
1. Howe KL
2. Contreras-Moreira B
3. De Silva N
4. Maslen G
5. Akanni W
6. Allen J
7. Alvarez-Jarreta J
8. Barba M
9. Bolser DM
10. Cambell L
11. Carbajo M
12. Chakiachvili M
13. Christensen M
14. Cummins C
15. Cuzick A
16. Davis P
17. Fexova S
18. Gall A
19. George N
20. Gil L
21. Gupta P
22. Hammond-Kosack KE
23. Haskell E
24. Hunt SE
25. Jaiswal P
26. Janacek SH
27. Kersey PJ
28. Langridge N
29. Maheswari U
30. Maurel T
31. McDowall MD
32. Moore B
33. Muffato M
34. Naamati G
35. Naithani S
36. Olson A
37. Papatheodorou I
38. Patricio M
39. Paulini M
40. Pedro H
41. Perry E
42. Preece J
43. Rosello M
44. Russell M
45. Sitnik V
46. Staines DM
47. Stein J
48. Tello-Ruiz MK
49. Trevanion SJ
50. Urban M
51. Wei S
52. Ware D
53. Williams G
54. Yates AD
55. Flicek P
(2020) Ensembl Genomes 2020-enabling non-vertebrate genomic research
Nucleic Acids Research 48:D689–D695.

https://doi.org/10.1093/nar/gkz890
(2004) Raman spectroscopic signature of life in a living yeast cell
Journal of Raman Spectroscopy 35:525–526.

https://doi.org/10.1002/jrs.1219
1. Huang F
2. Hartwich TMP
3. Rivera-Molina FE
4. Lin Y
5. Duim WC
6. Long JJ
7. Uchil PD
8. Myers JR
9. Baird MA
10. Mothes W
11. Davidson MW
12. Toomre D
13. Bewersdorf J
(2013) Video-rate nanoscopy using sCMOS camera-specific single-molecule localization algorithms
Nature Methods 10:653–658.

https://doi.org/10.1038/nmeth.2488
1. Hui S
2. Silverman JM
3. Chen SS
4. Erickson DW
5. Basan M
6. Wang J
7. Hwa T
8. Williamson JR
(2015) Quantitative proteomic analysis reveals a simple strategy of global resource allocation in bacteria
Molecular Systems Biology 11:784.

https://doi.org/10.15252/msb.20145697
1. Husain K
2. Murugan A
(2020) Physical Constraints on Epistasis
Molecular Biology and Evolution 37:2865–2874.

https://doi.org/10.1093/molbev/msaa124
1. Ichimura T
2. Chiu L
3. Fujita K
4. Kawata S
5. Watanabe TM
6. Yanagida T
7. Fujita H
(2014) Visualizing cell state transition using Raman spectroscopy
PLOS ONE 9:e84478.

https://doi.org/10.1371/journal.pone.0084478
(2001) Lethality and centrality in protein networks
Nature 411:41–42.

https://doi.org/10.1038/35075138
1. Kaczanowska M
2. Rydén-Aulin M
(2007) Ribosome biogenesis and the translation process in Escherichia coli
Microbiology and Molecular Biology Reviews 71:477–494.

https://doi.org/10.1128/MMBR.00013-07
(2015) Universal relationship in gene-expression changes for cells in steady-growth state
Physical Review X 5:011014.

https://doi.org/10.1103/PhysRevX.5.011014
(2011) The MycoBrowser portal: a comprehensive and manually annotated resource for mycobacterial genomes
Tuberculosis 91:8–13.

https://doi.org/10.1016/j.tube.2010.09.006
1. Keren L
2. Zackay O
3. Lotan-Pompan M
4. Barenholz U
5. Dekel E
6. Sasson V
7. Aidelberg G
8. Bren A
9. Zeevi D
10. Weinberger A
11. Alon U
12. Milo R
13. Segal E
(2013) Promoters maintain their relative activity levels under different growth conditions
Molecular Systems Biology 9:701.

https://doi.org/10.1038/msb.2013.59
(2017) The EcoCyc database: reflecting new knowledge about Escherichia coli K-12
Nucleic Acids Research 45:D543–D550.

https://doi.org/10.1093/nar/gkw1003
1. Kobayashi-Kirschvink KJ
2. Nakaoka H
3. Oda A
4. Kamei KF
5. Nosho K
6. Fukushima H
7. Kanesaki Y
8. Yajima S
9. Masaki H
10. Ohta K
11. Wakamoto Y
(2018) Linear regression links transcriptomic data and cellular Raman spectra
Cell Systems 7:104–117.

https://doi.org/10.1016/j.cels.2018.05.015
1. Kobayashi-Kirschvink KJ
2. Comiter CS
3. Gaddam S
4. Joren T
5. Grody EI
6. Ounadjela JR
7. Zhang K
8. Ge B
9. Kang JW
10. Xavier RJ
11. So PTC
12. Biancalani T
13. Shu J
14. Regev A
(2024) Prediction of single-cell RNA expression profiles in live cells by Raman microscopy with Raman2RNA
Nature Biotechnology 42:1726–1734.

https://doi.org/10.1038/s41587-023-02082-2
(2022) History-dependent physiological adaptation to lethal genetic modification under antibiotic exposure
eLife 11:e74486.

https://doi.org/10.7554/eLife.74486
(2017) Absolute quantification of protein and mRNA abundances demonstrate variability in gene-specific translation efficiency in yeast
Cell Systems 4:495–504.

https://doi.org/10.1016/j.cels.2017.03.003
1. Lalanne J-B
2. Taggart JC
3. Guo MS
4. Herzel L
5. Schieler A
6. Li G-W
(2018) Evolutionary convergence of pathway-specific enzyme expression stoichiometry
Cell 173:749–761.

https://doi.org/10.1016/j.cell.2018.03.007
1. Lalanne JB
2. Li GW
(2021) First-principles model of optimal translation factors stoichiometry
eLife 10:e69222.

https://doi.org/10.7554/eLife.69222
1. Li X
2. Li W
3. Zeng M
4. Zheng R
5. Li M
(2020) Network-based methods for predicting essential genes or proteins: a survey
Briefings in Bioinformatics 21:566–583.

https://doi.org/10.1093/bib/bbz017
Preprint
1. Mezzadri F
(2006) How to generate random matrices from the classical compact groups
arXiv.

https://arxiv.org/abs/math-ph/0609050
1. Neidhardt FC
2. Magasanik B
(1960) Studies on the role of ribonucleic acid in the growth of bacteria
Biochimica et Biophysica Acta 42:99–116.

https://doi.org/10.1016/0006-3002(60)90757-5
1. Nieminen UJ
(1973) On the centrality in a directed graph
Social Science Research 2:371–378.

https://doi.org/10.1016/0049-089X(73)90010-0
1. Nomura M
2. Yates JL
3. Dean D
4. Post LE
(1980) Feedback regulation of ribosomal protein gene expression in Escherichia coli: structural homology of ribosomal RNA and ribosomal protein mRNA
PNAS 77:7084–7088.

https://doi.org/10.1073/pnas.77.12.7084
1. Phipson B
2. Smyth GK
(2010) Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn
Statistical Applications in Genetics and Molecular Biology 9:Article39.

https://doi.org/10.2202/1544-6115.1585
1. Pitman EJG
(1937) Significance tests which may be applied to samples from any populations
Supplement to the Journal of the Royal Statistical Society 4:119–130.

https://doi.org/10.2307/2984124
Book
1. Portier C
2. Grunberg-Manago M
(1993) Regulation of ribosomal protein mRNA translation in bacteria
In: Ilan J, editors. Translational Regulation of Gene Expression 2. Springer. pp. 23–47.

https://doi.org/10.1007/978-1-4615-2894-4_2
1. Replogle JM
2. Saunders RA
3. Pogson AN
4. Hussmann JA
5. Lenail A
6. Guna A
7. Mascibroda L
8. Wagner EJ
9. Adelman K
10. Lithwick-Yanai G
11. Iremadze N
12. Oberstrass F
13. Lipson D
14. Bonnar JL
15. Jost M
16. Norman TM
17. Weissman JS
(2022) Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq
Cell 185:2559–2575.

https://doi.org/10.1016/j.cell.2022.05.013
1. Roy B
2. Granas D
3. Bragg F Jr
4. Cher JAY
5. White MA
6. Stormo GD
(2020) Autoregulation of yeast ribosomal proteins discovered by efficient search for feedback regulation
Communications Biology 3:761.

https://doi.org/10.1038/s42003-020-01494-z
1. Sato TU
2. Kaneko K
(2020) Evolutionary dimension reduction in phenotypic space
Physical Review Research 2:013197.

https://doi.org/10.1103/PhysRevResearch.2.013197
1. Savitzky A
2. Golay MJE
(1964) Smoothing and differentiation of data by simplified least squares procedures
Analytical Chemistry 36:1627–1639.

https://doi.org/10.1021/ac60214a047
1. Schmidt A
2. Kochanowski K
3. Vedelaar S
4. Ahrné E
5. Volkmer B
6. Callipo L
7. Knoops K
8. Bauer M
9. Aebersold R
10. Heinemann M
(2016) The quantitative and condition-dependent Escherichia coli proteome
Nature Biotechnology 34:104–110.

https://doi.org/10.1038/nbt.3418
(2015) Absolute proteome composition and dynamics during dormancy and resuscitation of Mycobacterium tuberculosis
Cell Host & Microbe 18:96–108.

https://doi.org/10.1016/j.chom.2015.06.001
1. Scott M
2. Gunderson CW
3. Mateescu EM
4. Zhang Z
5. Hwa T
(2010) Interdependence of cell growth and gene expression: origins and consequences
Science 330:1099–1102.

https://doi.org/10.1126/science.1192588
1. Segal E
2. Shapira M
3. Regev A
4. Pe’er D
5. Botstein D
6. Koller D
7. Friedman N
(2003) Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data
Nature Genetics 34:166–176.

https://doi.org/10.1038/ng1165
- PubMed
- Google Scholar
1. Segarra S
2. Ribeiro A
(2015) Stability and continuity of centrality measures in weighted graphs
IEEE Transactions on Signal Processing 64:543–555.

https://doi.org/10.1109/TSP.2015.2486740
- Google Scholar
1. Soupene E
2. van Heeswijk WC
3. Plumbridge J
4. Stewart V
5. Bertenthal D
6. Lee H
7. Prasad G
8. Paliy O
9. Charernnoppakul P
10. Kustu S
(2003) Physiological studies of Escherichia coli strain MG1655: growth defects and apparent cross-regulation of gene expression
Journal of Bacteriology 185:5611–5626.

https://doi.org/10.1128/JB.185.18.5611-5626.2003
- PubMed
- Google Scholar
(2016) Inferring time derivatives including cell growth rates using Gaussian processes
Nature Communications 7:13766.

https://doi.org/10.1038/ncomms13766
- PubMed
- Google Scholar
(1997) A genomic perspective on protein families
Science 278:631–637.

https://doi.org/10.1126/science.278.5338.631
- PubMed
- Google Scholar
1. Tatusov RL
2. Fedorova ND
3. Jackson JD
4. Jacobs AR
5. Kiryutin B
6. Koonin EV
7. Krylov DM
8. Mazumder R
9. Mekhedov SL
10. Nikolskaya AN
11. Rao BS
12. Smirnov S
13. Sverdlov AV
14. Vasudevan S
15. Wolf YI
16. Yin JJ
17. Natale DA
(2003) The COG database: an updated version includes eukaryotes
BMC Bioinformatics 4:41.

https://doi.org/10.1186/1471-2105-4-41
- PubMed
- Google Scholar
1. van der Maaten L
2. Hinton G
(2008)
Visualizing data using t-SNE

Journal of Machine Learning Research 9:2579–2605.
- Google Scholar
1. von Luxburg U
(2007) A tutorial on spectral clustering
Statistics and Computing 17:395–416.

https://doi.org/10.1007/s11222-007-9033-z
- Google Scholar
Book
1. Waddington CH
(1957)
The Strategy of the Genes

George Allen & Unwin Ltd.
- Google Scholar
1. Waddington CH
(1959) Canalization of development and genetic assimilation of acquired characters
Nature 183:1654–1655.

https://doi.org/10.1038/1831654a0
- PubMed
- Google Scholar
1. Wang T
2. Birsoy K
3. Hughes NW
4. Krupczak KM
5. Post Y
6. Wei JJ
7. Lander ES
8. Sabatini DM
(2015) Identification and characterization of essential genes in the human genome
Science 350:1096–1101.

https://doi.org/10.1126/science.aac7041
- PubMed
- Google Scholar
(2003) Evolutionary conservation of motif constituents in the yeast protein interaction network
Nature Genetics 35:176–179.

https://doi.org/10.1038/ng1242
- PubMed
- Google Scholar
1. You C
2. Okano H
3. Hui S
4. Zhang Z
5. Kim M
6. Gunderson CW
7. Wang Y-P
8. Lenz P
9. Yan D
10. Hwa T
(2013) Coordination of bacterial proteome with metabolism by cyclic AMP signalling
Nature 500:301–306.

https://doi.org/10.1038/nature12446
- PubMed
- Google Scholar
1. Yu H
2. Kim PM
3. Sprecher E
4. Trifonov V
5. Gerstein M
(2007) The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics
PLOS Computational Biology 3:e59.

https://doi.org/10.1371/journal.pcbi.0030059
- PubMed
- Google Scholar

Article and author information

Author details

Ken-ichiro F Kamei

Department of Basic Science, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan

Contribution
Conceptualization, Data curation, Software, Formal analysis, Funding acquisition, Validation, Investigation, Methodology, Writing – original draft, Writing – review and editing

For correspondence
kenichiro_kamei@cell.c.u-tokyo.ac.jp

Competing interests
No competing interests declared

ORCID iD: 0009-0002-8026-4454

Koseki J Kobayashi-Kirschvink

Department of Medicine, The University of Chicago, Chicago, United States

Contribution
Conceptualization, Writing – review and editing

Competing interests
Inventor on patents (JP6993682 and US10,379,052 B2) filed by The University of Tokyo

ORCID iD: 0000-0001-6590-3823

Takashi Nozoe
1. Department of Basic Science, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
2. Research Center for Complex Systems Biology, The University of Tokyo, Tokyo, Japan
3. Universal Biology Institute, The University of Tokyo, Tokyo, Japan
Contribution
Formal analysis, Funding acquisition, Investigation, Writing – review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0003-2556-6484

Hidenori Nakaoka

Department of Optical Imaging, Advanced Research Promotion Center Tokushima University, Tokushima, Japan

Contribution
Conceptualization, Writing – review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0001-8465-5853

Miki Umetani

Department of Biology, New York University, New York, United States

Contribution
Investigation, Writing – review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-3171-4327

Yuichi Wakamoto
1. Department of Basic Science, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
2. Research Center for Complex Systems Biology, The University of Tokyo, Tokyo, Japan
3. Universal Biology Institute, The University of Tokyo, Tokyo, Japan
Contribution
Conceptualization, Supervision, Funding acquisition, Writing – original draft, Writing – review and editing

For correspondence
cwaka@mail.ecc.u-tokyo.ac.jp

Competing interests
Inventor on patents (JP6993682 and US10,379,052 B2) filed by The University of Tokyo

ORCID iD: 0000-0002-6233-0844

Funding

- Japan Science and Technology Agency (JPMJCR1927): Yuichi Wakamoto
- Japan Science and Technology Agency (JPMJER1902): Yuichi Wakamoto
- Japan Society for the Promotion of Science (19J22448): Ken-ichiro F Kamei
- Japan Society for the Promotion of Science (21K20672): Takashi Nozoe

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Matthias Heinemann and Silke Bonsing-Vedelaar for detailed information on the E. coli culture conditions; Doeke R Hekstra, Tetsuya J Kobayashi, Takafumi Miyamoto, John Russell, and Ian Hunt-Isaak for reading the manuscript and providing critical comments; Kunihiko Kaneko, Chikara Furusawa, Yasushi Okada, and members of the Wakamoto Lab and the Universal Biology Institute for discussion and encouragement. This work was supported by JST CREST Grant Number JPMJCR1927 (YW); JST ERATO Grant Number JPMJER1902 (YW); JSPS KAKENHI Grant Numbers 19J22448 (KFK) and 21K20672 (TN).

Version history

Preprint posted: June 17, 2024
Sent for peer review: September 24, 2024
Reviewed Preprint version 1: March 24, 2025
Reviewed Preprint version 2: November 27, 2025
Version of Record published: April 14, 2026

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.101485. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

Ken-ichiro F Kamei, Koseki J Kobayashi-Kirschvink, Takashi Nozoe, Hidenori Nakaoka, Miki Umetani, Yuichi Wakamoto (2026) Revealing global stoichiometry conservation architecture in cells from Raman spectral patterns. eLife 14:RP101485. https://doi.org/10.7554/eLife.101485.3


