Channel capacity of signaling networks quantifies their fidelity in sensing extracellular inputs. Low estimates of channel capacities for several mammalian signaling networks suggest that cells can barely detect the presence/absence of environmental signals. However, given the extensive heterogeneity in cell states, we hypothesize that the sensing ability itself varies from cell to cell in a cell state dependent manner. In this work, we present an information theoretic framework to quantify the distribution of sensing abilities from single cell data. Using data on two mammalian pathways, we show that sensing abilities are widely distributed in the population and most cells achieve better resolution of inputs than what is implied by traditional cell state agnostic estimates. We verify these predictions using live cell imaging data on the IGFR/FoxO pathway. Importantly, we identify cell state variables that correlate with cells’ sensing abilities. This information theoretic framework will significantly improve our understanding of how cells sense their environment.
In this valuable paper, the authors use an existing theoretical framework relying on information theory and maximum entropy inference in order to quantify how much information single cells can carry, taking into account their internal state. They reanalyze experimental data in this light. Despite some limitations of the data, the study convincingly highlights the difference between single-cell and population channel capacities. This result should be of interest to the quantitative biology community, as it contributes to explaining why channel capacities are apparently low in cells.
In cell populations, there is a significant overlap in cellular responses to environmental stimuli of differing strengths. This raises a fundamental question1: do signaling networks in cells relay accurate information about their environment to take appropriate action? And if not, where along the signal transduction pathway is the information lost2,3? Mutual information (MI) quantifies the information content in a cellular output about extracellular inputs. For an input u (e.g., concentration of a ligand) and an output x (e.g., intracellular species concentration4–6 or a cellular phenotype7,8), the MI is defined as9

$$I = \int\!\!\int p(u)\, p(x|u)\, \log_2 \frac{p(x|u)}{p(x)}\, dx\, du, \qquad p(x) = \int p(u)\, p(x|u)\, du. \tag{1}$$
Experimental single cell methods such as flow cytometry10, immunofluorescence10, mass spectrometry11, or live cell imaging12 allow us to estimate response histograms p(x|u) across several inputs. Using these distributions, we can estimate the maximum of the MI (the channel capacity, CC) by optimizing Eq. 1 over input distributions p(u). The CC quantifies fidelity of signal transduction13. For example, a CC of 1 bit implies that the cells can at best distinguish between two input levels, with higher CCs indicating that cells can resolve multiple input states. Importantly, CC can be used to identify bottlenecks in signaling3,14.
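One standard way to carry out this optimization for a discrete set of inputs is the Blahut-Arimoto algorithm. The sketch below is illustrative and is not taken from the analysis described here. It assumes the response histograms have been binned into a matrix `P[u, x]` whose rows sum to one; the function names `mutual_information` and `channel_capacity` are hypothetical.

```python
import numpy as np

def mutual_information(p_u, P):
    """MI in bits (Eq. 1) for input weights p_u (n_u,) and channel P (n_u, n_x)."""
    p_u = np.asarray(p_u, dtype=float)
    P = np.asarray(P, dtype=float)
    p_x = p_u @ P                                   # marginal output distribution
    mask = (P > 0) & (p_x > 0)                      # skip terms that contribute 0
    ratio = np.ones_like(P)
    np.divide(P, p_x, out=ratio, where=mask)        # p(x|u) / p(x) where defined
    return float(np.sum(p_u[:, None] * P * np.log2(ratio)))

def channel_capacity(P, tol=1e-10, max_iter=10000):
    """Blahut-Arimoto iteration; returns (capacity in bits, optimal p(u))."""
    P = np.asarray(P, dtype=float)
    p_u = np.full(P.shape[0], 1.0 / P.shape[0])     # start from uniform inputs
    for _ in range(max_iter):
        p_x = p_u @ P
        mask = (P > 0) & (p_x > 0)
        ratio = np.ones_like(P)
        np.divide(P, p_x, out=ratio, where=mask)
        kl = np.sum(P * np.log(ratio), axis=1)      # D(p(x|u) || p(x)) in nats
        new_p = p_u * np.exp(kl)                    # reweight informative inputs
        new_p /= new_p.sum()
        converged = np.max(np.abs(new_p - p_u)) < tol
        p_u = new_p
        if converged:
            break
    return mutual_information(p_u, P), p_u

# Toy channel (hypothetical numbers): two doses, output flipped 10% of the time.
toy_channel = np.array([[0.9, 0.1],
                        [0.1, 0.9]])
capacity_bits, optimal_input = channel_capacity(toy_channel)
```

For real data the rows of `P` would be the normalized response histograms at each ligand dose; finite-sample bias in such histograms is not handled by this sketch.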
The CC has been estimated for input-output relationships in several mammalian signaling networks4,6,15–18. When the output is defined as levels of a single protein at a fixed time, the CC was found to be surprisingly low, ∼1 − 1.5 bits. These estimates have been improved by considering multidimensional outputs5 or time varying inputs19. While these modifications led to somewhat higher CC estimates, the overall conclusion that cells know little about their environment remains well established. In contrast, significantly higher CC estimates have been found when the output is defined as an average over the cell population6, suggesting that the only way to overcome low sensing fidelity is collective response.
These previous calculations estimated one channel capacity for all cells in a population, implicitly assuming that individual cells have similar sensing capabilities. We now know that cells exhibit extensive heterogeneity in cell state variables such as abundances of key signaling proteins11,20,21, gene expression levels21, epigenetic status22, etc. These variables can remain constant throughout cells’ entire life cycle and beyond23. Heterogeneity in cell state variables leads to a variability in response to extracellular cues, including chemotherapeutic drugs23,24, mitogens25, hormones26, and chemotactic signals7,8. Therefore, it is quite reasonable to expect that cells have variable abilities to sense their environment27,28, and that the differences in cells’ abilities are explained by differences in cell states.
There are currently no conceptual frameworks or computational algorithms to estimate the distribution of sensing abilities in cell populations. To that end, we introduce an information theoretic quantity CeeMI: Cell state dependent Mutual Information. We show that using typically collected single cell data and a computational model of the underlying signaling network, we can estimate the distribution pCeeMI (I) of single cell signaling fidelities (single cell mutual information values). We also show that we can identify cell state variables that make some cells better and others worse at sensing their environment.
Using an illustrative example, we show that in heterogeneous cell populations, the traditional cell state agnostic estimate of mutual information is significantly lower than the mutual information of signaling networks in typical cells. Next, using previously collected experimental data, we estimate pCeeMI (I) for two important mammalian signaling pathways: the epidermal growth factor (EGF)/EGF receptor pathway29 and the insulin-like growth factor (IGF)/Forkhead Box protein O (FoxO) pathway29. We show that while the cell state agnostic CC estimates for both pathways are ∼ 1 bit, most individual cells are predicted to be significantly better at resolving different inputs. Using live cell imaging data for the IGF/FoxO pathway, we show that our estimate of variability in sensing abilities matches closely with experimental estimates. Importantly, for this pathway, we also verify our prediction that specific cell state variables dictate cells’ sensing abilities. We believe that CeeMI will be an important tool to explore variability in cellular sensing abilities and in identifying what makes some cells better and others worse in sensing their environment.
Conditional mutual information models single cells as cell state dependent channels
Consider a cell population where cells are characterized by state variables θ. These include abundances of signaling proteins and enzymes, epigenetic states, etc. We assume that cell states are temporally stable30, that is, θ remains constant over a time scale that is much longer than typical fluctuations in environmental inputs. We assume that these variables are distributed in the population according to a distribution p(θ). If x denotes an output of choice (e.g., phosphorylation levels of some protein at a specific time) and u denotes the input (e.g., ligand concentration), the experimentally measured response distribution p(x|u) can be decomposed as:

$$p(x|u) = \int p(x|u,\theta)\, p(\theta)\, d\theta, \tag{2}$$
where p(x|u, θ) is the distribution of the output x conditioned on the input u and cell state variables θ. In most cases, p(x|u, θ) may not be experimentally accessible. However, it is conceptually well defined.
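Using p(x|u, θ), we can define the cell state dependent mutual information I(θ) for a fixed input distribution p(u) analogously to Eq. 1:

$$I(\theta) = \int\!\!\int p(u)\, p(x|u,\theta)\, \log_2 \frac{p(x|u,\theta)}{p(x|\theta)}\, dx\, du, \qquad p(x|\theta) = \int p(u)\, p(x|u,\theta)\, du. \tag{3}$$

The distribution of single cell sensing abilities in the population is then

$$p_{\rm CeeMI}(I) = \int \delta\big(I - I(\theta)\big)\, p(\theta)\, d\theta, \tag{4}$$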
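where δ(x) is the Dirac delta function. We can also compute the joint distribution between the single cell mutual information and any cell state variable of interest χ (e.g., cell surface receptor levels):

$$p_{\rm CeeMI}(I, \chi) = \int \delta\big(I - I(\theta)\big)\, \delta\big(\chi - \chi(\theta)\big)\, p(\theta)\, d\theta, \tag{5}$$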
where χ(θ) is the value of the biochemical parameter when cell state parameters are fixed at θ. The distributions defined in Eq. 5 quantify the interdependency between a cell’s signaling fidelity I(θ) and cell specific biochemical parameters χ(θ). As we will see below, the distributions in Eq. 4 and Eq. 5 can be experimentally verified when appropriate measurements are available. Finally, we define the population average of the cell state dependent mutual information:

$$I_{\rm Cee} = \int I\, p_{\rm CeeMI}(I)\, dI = \int I(\theta)\, p(\theta)\, d\theta. \tag{6}$$
In information theory, ICee is known as the conditional mutual information9 between the input u and the output x conditioned on θ. From these definitions, it can be shown that ICee ≥ I (SI Section 1). That is, if cell state variables remain approximately constant, at least some cells in the population have better sensing abilities compared to what is implied by the cell state agnostic mutual information (Eq. 1). Since ICee depends on the input distribution p(u), we can find an optimal input distribution p(u) that maximizes ICee (SI Section 1). Going forward, unless an input distribution is specified, the distributions pCeeMI (I) and pCeeMI (I, χ) are discussed in the context of this optimal input distribution.
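The inequality between the conditional and the cell state agnostic mutual information can be sketched with the chain rule for mutual information. The derivation below is a short outline and not a reproduction of SI Section 1; it assumes that the input u is drawn independently of the cell state θ, so that I(u; θ) = 0.

```latex
\begin{align}
I(u; x, \theta) &= I(u;\theta) + I(u; x \mid \theta) = I(u; x) + I(u; \theta \mid x), \\
I_{\rm Cee} \equiv I(u; x \mid \theta) &= I(u; x) + I(u; \theta \mid x) - I(u;\theta) \\
&= I(u; x) + I(u; \theta \mid x) \;\ge\; I(u; x) \equiv I ,
\end{align}
```

where the last step uses I(u; θ) = 0 and the non-negativity of conditional mutual information. Equality holds only when knowing θ adds no information about u beyond what the output x already provides.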
Maximum entropy inference can estimate pCeeMI (I)
Estimation of pCeeMI (I) requires decomposing the experimentally observed response p(x|u) into cell specific output distributions p(x|u, θ) and the distribution of cell state variables p(θ) (Eq. 3 and 4). This problem is difficult to solve given that neither of these distributions is accessible in typical experiments. For many signaling networks, stochastic biochemical models can be constructed to obtain the cell specific output distribution p(x|u, θ). Here, θ comprise protein concentrations and effective rates of enzymatic reactions and serve as a proxy for cell state variables. Given the experimentally measured cell state averaged response p(x|u) and the model predicted cell specific output distribution p(x|u, θ), we need a computational method to infer p(θ) (see Eq. 2). The problem of inference of parameter heterogeneity is a non-trivial inverse problem31. We use our previously developed maximum entropy-based approach32 to infer p(θ). This way, we can estimate pCeeMI (I) using experimentally obtained cell state agnostic response p(x|u) and a stochastic biochemical model of the underlying signaling network.
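To make the inverse problem in Eq. 2 concrete, the sketch below infers maximum entropy weights over a grid of parameter values so that model-predicted bin fractions match measured ones. It is a generic maximum entropy construction and not the implementation of ref. 32. The function name `infer_maxent_weights`, the toy Poisson output model, the parameter grid, and the bin edges are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import gamma, poisson

def infer_maxent_weights(phi, f):
    """Maximum entropy weights p_k over parameter points theta_k.

    phi[k, i]: model-predicted probability that a cell with theta_k lands in bin i.
    f[i]:      measured fraction of cells in bin i.
    The solution has the form p_k ~ exp(-sum_i lam_i * phi[k, i]); the Lagrange
    multipliers lam are found by minimizing the convex dual function.
    """
    phi = np.asarray(phi, dtype=float)
    f = np.asarray(f, dtype=float)

    def dual(lam):
        log_w = -phi @ lam
        log_z = logsumexp(log_w)
        p = np.exp(log_w - log_z)
        return log_z + lam @ f, f - p @ phi         # value and gradient

    res = minimize(dual, np.zeros(phi.shape[1]), jac=True, method="BFGS")
    log_w = -phi @ res.x
    return np.exp(log_w - logsumexp(log_w))

# Toy problem (all numbers hypothetical): theta is a receptor level and the
# output B is Poisson with mean 0.5 * theta at a single ligand dose.
theta_grid = np.linspace(10.0, 300.0, 60)
p_true = gamma.pdf(theta_grid, a=4.0, scale=25.0)
p_true /= p_true.sum()

# Bins [0,25), [25,50), [50,100); the bin B >= 100 is left out because the
# bin probabilities sum to one, which would make that constraint redundant.
bin_edges = np.array([0, 25, 50, 100])
cdf = poisson.cdf(bin_edges[None, :] - 1, 0.5 * theta_grid[:, None])  # P(B < edge)
phi = np.diff(cdf, axis=1)

f_measured = p_true @ phi                           # synthetic "measurement"
p_inferred = infer_maxent_weights(phi, f_measured)
```

The inferred weights reproduce the constrained bin fractions but need not equal `p_true`: among all distributions consistent with the data, the maximum entropy solution is the least structured one.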
Cell state agnostic mutual information underestimates cells’ ability to sense their environment
To illustrate the effect of heterogeneity of cell state variables on the cell state agnostic estimate of mutual information (which we call cell state agnostic mutual information or ICSA from now onwards), we consider a simple stochastic biochemical network of a receptor-ligand system. Details of the model and the calculations presented below can be found in SI Section 2. Briefly, extracellular ligand L binds to cell surface receptors R. The steady state level of the ligand bound receptor B is considered the output. The signaling network is parametrized by several cell state variables θ such as receptor levels, rates of binding/unbinding, etc. For simplicity, we assume that only one variable, steady state receptor level R0 in the absence of the ligand, varies in the population. Calculations with variability in other parameters are presented in SI Section 2.
In this population, cells’ response p(B|L,R0) is distributed as a Poisson distribution whose mean is proportional to the cell state variable R0 (SI Section 2). That is, when all other parameters are fixed, a higher R0 leads to lower relative noise, because the coefficient of variation of a Poisson variable decreases as the inverse square root of its mean. To calculate cell state dependent mutual information (Eq. 3), we assume that p(L) is a gamma distribution. As expected, I(R0) (Eq. 3) between the output B and the input L increases monotonically with R0 (inset in Fig. 2A). Moreover, given that R0 varies in the population (assumed to be a gamma distribution as well), the sensing ability varies in the population as well (Fig. 2A). Notably, the average ICee of I(R0) remains relatively robust to variation in R0. At the same time, the traditional estimate ICSA, which is the mutual information between the input L and the cell state agnostic population response p(B|L) = ∫ p(B|L, R0) p(R0) dR0 (Eq. 1 and Eq. 2), decreases as the population heterogeneity in R0 increases. Importantly, ICSA is significantly lower than the sensing ability of most cells in the population (Fig. 2A). This is because the overlap in the population response distributions is significantly larger than the overlap in single cell response distributions (Fig. 2B), which arises due to the variability in cell state variables. This simple example illustrates that the traditional mutual information estimates may severely underestimate cells’ ability to resolve inputs, especially when parameters are heterogeneously distributed. Moreover, it is crucial to explicitly account for heterogeneity in cell state variables when estimating fidelity of cellular communication channels.
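The comparison between single cell and cell state agnostic mutual information can be reproduced qualitatively in a few lines. The sketch below is not the model of SI Section 2. It assumes a hypothetical binding curve L/(L + K_d), a small discrete set of ligand levels with uniform weights in place of the gamma-distributed p(L), and arbitrary parameter values.

```python
import numpy as np
from scipy.stats import poisson

def mi_bits(p_u, P):
    """Mutual information in bits for input weights p_u and channel P[u, x]."""
    p_x = p_u @ P
    mask = (P > 0) & (p_x > 0)
    ratio = np.ones_like(P)
    np.divide(P, p_x, out=ratio, where=mask)
    return float(np.sum(p_u[:, None] * P * np.log2(ratio)))

rng = np.random.default_rng(0)

# Hypothetical inputs: five ligand levels, equally likely.
K_d = 1.0
L_levels = np.array([0.1, 0.3, 1.0, 3.0, 10.0])
p_L = np.full(L_levels.size, 1.0 / L_levels.size)
occupancy = L_levels / (L_levels + K_d)             # assumed binding curve

# Gamma-distributed receptor levels R0 (mean 100, coefficient of variation 0.5).
mean_R0, cv_R0 = 100.0, 0.5
shape = 1.0 / cv_R0**2
R0 = rng.gamma(shape, mean_R0 / shape, size=500)

B = np.arange(0, 1001)                              # support of bound receptors

def response(r0):
    """Single cell channel p(B | L, R0): Poisson with mean R0 * occupancy."""
    return poisson.pmf(B[None, :], r0 * occupancy[:, None])

I_single = np.empty(R0.size)                        # samples of I(R0), cf. Eq. 3
P_population = np.zeros((L_levels.size, B.size))
for k, r0 in enumerate(R0):
    P_cell = response(r0)
    I_single[k] = mi_bits(p_L, P_cell)
    P_population += P_cell / R0.size                # Monte Carlo form of Eq. 2

I_cee = I_single.mean()                             # population average, cf. Eq. 6
I_csa = mi_bits(p_L, P_population)                  # cell state agnostic MI
frac_better = np.mean(I_single > I_csa)             # cells that beat the estimate
```

A histogram of `I_single` approximates pCeeMI(I) for this toy population. Because mutual information is convex in the channel for a fixed input distribution, `I_cee` cannot fall below `I_csa` here; increasing `cv_R0` is expected to widen the gap.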
Experimental verification of pCeeMI(I) using growth factor signaling networks
In real cell populations, state variables that govern signaling dynamics such as protein levels20,26 (receptors, kinases, phosphatases, etc.), endocytosis rates33, ligand binding rates34, etc. are known to vary from cell to cell. Therefore, we expect the sensing abilities to be widely distributed as well. To experimentally verify the computational prediction of the distribution pCeeMI (I) of sensing abilities, we need an experimental system that allows us to approximate the cell state specific response distribution p(x|u, θ). The IGF/FoxO pathway is an ideal system for these explorations for several reasons. First, following IGF stimulation, the transcription factor FoxO is pulled out of the nucleus29. A GFP-tagged variant of FoxO can be used to detect the dynamics of nuclear levels of FoxO at the single cell level28. Second, nuclear FoxO levels reach an approximate steady state within 30-45 minutes of constant IGF stimulation with FoxO levels decreasing with increasing IGF dose28. As a result, an approximate cell state specific distribution p(x|u, θ) of steady state levels of FoxO can be obtained by stimulating the same cell with successive IGF levels28. Finally, the biochemical interactions in the IGF/FoxO pathway are well studied29 (Fig. 2C), allowing us to build a stochastic biochemical model based on previous computational efforts35 that fits the single cell data accurately. Another system where pCeeMI (I) can in principle be verified is the EGF/EGFR pathway29 (Fig. 2C). Here too, cell surface EGFR levels can be tracked in live cells following EGF stimulation36,37, allowing us to obtain the cell state specific response distribution p(x|u, θ). Below, we show estimates of pCeeMI(I) for both pathways and an experimental verification of our estimates for the IGF/FoxO pathway where live-cell imaging data was previously collected28.
The details of the calculations presented below can be found in SI Sections 3 and 4. Briefly, we first constructed stochastic biochemical models for the two pathways based on known interactions29 (Fig. 2C) and previous models35,38. The output for the EGFR pathway was defined as the steady state levels of cell surface EGF receptors and the output for the IGF/FoxO pathway was defined as the steady state nuclear levels of the transcription factor FoxO. Using previously collected single cell data on the two pathways28,32 and our maximum entropy-based framework32, we estimated the distribution over parameters p(θ) for the two pathways (SI Section 3). Using these inferred distributions, and the model-predicted cell state specific response distribution p(x|u, θ), we can compute pCeeMI (I) for any specified input distribution p(u). We choose the support of the input distribution as the ligand concentrations used in the experimental setup. The estimates shown in Fig. 2D and Fig. 2E show pCeeMI(I) corresponding to the input distribution that maximizes the conditional mutual information ICee (see Eq. 6).
Similar to the illustrative example (Fig. 2A), there is a wide distribution of single cell sensing fidelities in real populations (Fig. 2D and Fig. 2E). Moreover, most cells are better sensors compared to what is indicated by the maximum of ICSA which was estimated to be ∼ 1 bit for both pathways. Indeed, as seen in the insets of Fig. 2D and Fig. 2E, the input distribution corresponding to the maximum of ICSA is concentrated on the lowest and the highest input for both pathways, indicating that cells may be able to detect only two input levels. In contrast, the input distribution corresponding to the maximum of ICee is close to uniform, suggesting that individual cells can in fact resolve different ligand levels28.
To verify our computational estimate of pCeeMI (I) for the IGF/FoxO pathway, we reanalyzed previously collected data28 wherein several cells were stimulated using successive IGF levels. The details of our calculations can be found in SI Section 4. Briefly, the cells reached an approximate steady state within 60 minutes of each stimulation and nuclear FoxO levels measured between 60 and 90 minutes were used to approximate an experimental cell state specific response distribution p(x|u, θ). The distribution pCeeMI (I) was then obtained by maximizing the average mutual information ICee (averaged over all cells) with respect to the input distribution. As seen in Fig. 2D and Fig. 2E, the experimentally evaluated single cell signaling fidelities match closely with our computational estimates. Moreover, as predicted using our computational analysis, individual cells in the population were significantly better at sensing than what is implied by the maximum of ICSA. Indeed, the distributions of steady state FoxO levels were found to be well-resolved at the single cell level as well (Fig. 2E). Live cell imaging data for the EGFR pathway was not available and we leave it to future studies to validate our predictions.
Our calculations show that real cell populations comprise cells that have differing sensing fidelities, individual cells are significantly better at sensing their environment than what traditional estimates would indicate, and importantly, the CeeMI approach can accurately estimate the variation in signaling performances using readily collected experimental data and stochastic biochemical models.
CeeMI identifies biochemical parameters that determine cells’ ability to sense their environment
We expect that cells’ ability to sense their environment depends on their state variables. Our approach allows us to quantify the joint distribution pCeeMI (I, χ) of single cell signaling fidelity and biochemical state variables that take part in the signaling network. To test whether we can identify variables that are predictive of cellular fidelity, we estimated the joint distribution pCeeMI(I, χ) for two variables that were experimentally accessible: the response range of nuclear FoxO (Fig. 3 left, see inset) and total nuclear FoxO levels prior to IGF stimulation (Fig. 3 right, see inset). In both figures, the shaded regions show computational estimates of the joint probability distributions, and the red circles represent real cells. The green and the cyan trend lines represent computational and experimental binned averages.
One may expect that higher total nuclear FoxO levels could result in lower noise and therefore better sensing abilities. Surprisingly, we find that total nuclear FoxO levels only weakly correlate with cell state dependent mutual information (correlation coefficient r = 0.16 for computational estimates and r = 0.04 for experimental data). In contrast, cell state dependent mutual information depended strongly on the dynamic range of the response (correlation coefficient r = 0.53 for computational estimates and r = 0.29 for experimental data). Importantly, the model captured the observation that cells with a small response range had variable sensing abilities while cells with a large response range all performed well in resolving extracellular IGF levels. In SI Section 5, we show the predicted joint distributions pCeeMI(I, χ) for several other biochemical variables that can potentially govern single cells’ response to extracellular IGF stimuli. This way, CeeMI can be used to systematically identify cell state variables that differentiate between good sensors and bad sensors of the environmental stimuli.
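The correlation coefficients and binned trend lines used in this kind of comparison can be computed from paired per-cell values of I and χ as sketched below. The synthetic data, the saturating relationship between χ and I, and the helper name `binned_average` are hypothetical and only stand in for real single cell estimates.

```python
import numpy as np

def binned_average(chi, I, n_bins=10):
    """Mean of I within equal-count (quantile) bins of chi."""
    chi = np.asarray(chi, dtype=float)
    I = np.asarray(I, dtype=float)
    edges = np.quantile(chi, np.linspace(0.0, 1.0, n_bins + 1))
    # Assign each cell to a bin; the largest value is folded into the last bin.
    idx = np.clip(np.searchsorted(edges, chi, side="right") - 1, 0, n_bins - 1)
    centers = np.array([chi[idx == b].mean() for b in range(n_bins)])
    means = np.array([I[idx == b].mean() for b in range(n_bins)])
    return centers, means

# Synthetic stand-in for per-cell estimates (hypothetical numbers).
rng = np.random.default_rng(1)
chi = rng.gamma(4.0, 0.25, size=500)                # e.g. a response range
I = 0.5 + 0.8 * chi / (1.0 + chi) + rng.normal(0.0, 0.1, size=500)

r = np.corrcoef(chi, I)[0, 1]                       # Pearson correlation
centers, trend = binned_average(chi, I)             # binned trend line
```

Quantile bins keep the number of cells per bin roughly equal, which avoids noisy averages in the sparsely populated tails of χ.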
Cell populations are characterized by heterogeneity in cell state variables that is responsible for important phenotypic differences including sensitivity to drugs23,24, response to chemotactic signals7,8, and response to proliferation cues25. Therefore, it is reasonable to expect that cells’ ability to sense their environment depends on their state and is therefore variable across cells in a population. To quantify this heterogeneity, here, we developed a novel information theoretic framework that takes as input easily measurable single cell data and models of signaling networks to quantify the distribution pCeeMI (I) of single cell sensing abilities. Our framework also quantifies the joint distribution pCeeMI (I, χ) of cell specific sensing ability and any biochemical cell state variable. Importantly, using two growth factor pathways, we showed that individual cells in real cell populations were much better at sensing their environment compared to what is implied by the traditional estimate of channel capacities of signaling networks.
Our framework will be useful in identifying bottlenecks in signal transduction. Many cellular phenotypes such as chemotaxis7,8 and cell proliferation39 exhibit a weak correlation between cellular outputs (e.g., directional alignment with chemical gradients) and the input (e.g., gradient strength), resulting in a low channel capacity even for individual cells. In such cases, it is important to understand where exactly along the information transduction pathway the information about the gradient is lost. If traditional calculations are pursued, for example, for movement of mammalian cells under EGF gradients7,8, one may conclude that the information loss likely happens right at the receptor level (Fig. 2D). In contrast, CeeMI will allow us to disentangle the effect of cell state heterogeneity and noisy cellular response to precisely pinpoint intracellular signaling nodes that are responsible for signal corruption.
A limitation of the current approach is the assumption that cell state variables remain constant over the time scale of typical environmental fluctuations. While many cell state variables remain constant over the life spans of cells and beyond23,40, it is likely that state changes may occur within the lifespan of a cell41. These dynamical changes can be accommodated easily in the framework. Here, instead of fixing the cell state variables θ, we can treat them as initial conditions and propagate them stochastically with their own dynamics. In the limit of very fast dynamics where individual cells rapidly transition cell states, our framework will agree with traditional cell state agnostic estimates of the channel capacity. In contrast, if the cell state dynamics are slow, our framework highlights differences between cells in the population.
In summary, we showed that like other phenotypes, the ability to sense the environment is itself heterogeneously distributed in a cell population. Moreover, we also showed that mammalian cells appear to be significantly better at sensing their environment than what traditional mutual information calculations suggest. Finally, we showed that we could identify cell state variables that made some cells better sensors compared to others. We believe that CeeMI will be an important framework in quantifying fidelity of input/output relationships in heterogeneous cell populations.
AG, HA, and PD are supported by NIGMS grant R35GM142547. The authors would like to thank Andre Levchenko for useful discussions.
- 1. Cellular noise and information transmission. Curr. Opin. Biotechnol 28:156–164
- 2. Fundamental Limits to Cellular Sensing. J. Stat. Phys 162:1395–1424
- 3. The application of information theory to biochemical signaling systems. Phys. Biol 9
- 4. Information Transduction Capacity of Noisy Biochemical Signaling Networks. Science 334:354–358
- 5. Accurate information transmission through dynamic biochemical signaling networks. Science 346:1370–1373
- 6. Fundamental trade-offs between information flow in single cells and cellular populations. Proc. Natl. Acad. Sci 114:5755–5760
- 7. Physical constraints on accuracy and persistence during breast cancer cell chemotaxis. PLOS Comput. Biol 15
- 8. Signal processing capacity of the cellular sensory machinery regulates the accuracy of chemotaxis under complex cues. iScience 24
- 9. Elements of information theory
- 10. Single-cell protein analysis. Curr. Opin. Biotechnol 23:83–88
- 11. Single-cell protein analysis by mass spectrometry. Curr. Opin. Chem. Biol 60:1–9
- 12. Live-cell imaging in the era of too many microscopes. Curr. Opin. Cell Biol 66:34–42
- 13. Quantifying information of intracellular signaling: progress with machine learning. Rep. Prog. Phys 85
- 14. Limits on information transduction through amplitude and frequency regulation of transcription factor activity. eLife 4
- 15. Robustness and Compensation of Information Transmission of Signaling Pathways. Science 341:558–561
- 16. Sensing relative signal in the Tgf-β/Smad pathway. Proc. Natl. Acad. Sci 114
- 17. Information Transfer in Gonadotropin-releasing Hormone (GnRH) Signaling. J. Biol. Chem 291:2246–2259
- 18. Robustness and Information Transfer within IL-6-induced JAK/STAT Signalling. Commun. Biol 2
- 19. Mapping the dynamic transfer functions of eukaryotic gene regulation. Cell Syst 12:1079–1093
- 20. Heterogeneous kinetics of AKT signaling in individual cells are accounted for by variable protein concentration. Front. Physiol 3
- 21. Highly multiplexed simultaneous detection of RNAs and proteins in single cells. Nat. Methods 13:269–275
- 22. Molecular Signals of Epigenetic States. Science 330:612–616
- 23. Rare cell variability and drug-induced reprogramming as a mode of cancer drug resistance. Nature 546:431–435
- 24. Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. Nature 459:428–432
- 25. Cell cycle proliferation decisions: the impact of single cell analyses. FEBS J 284:362–375
- 26. Signaling Heterogeneity is Defined by Pathway Architecture and Intercellular Variability in Protein Expression. iScience 24
- 27. High capacity in G protein-coupled receptor signaling. Nat. Commun 9
- 28. Individual Cells Can Resolve Variations in Stimulus Intensity along the IGF-PI3K-AKT Signaling Axis. Cell Syst 9:580–588
- 29. Signal transduction: principles, pathways, and processes
- 30. Defining cell types and states with single-cell genomics. Genome Res 25:1491–1498
- 31. A review of selected techniques in inverse problem nonparametric probability distribution estimation. J. Inverse Ill-Posed Probl 20
- 32. Maximum Entropy Framework for Predictive Inference of Cell Population Heterogeneity and Responses in Signaling Networks. Cell Syst 10:204–212
- 33. Correlated receptor transport processes buffer single-cell heterogeneity. PLOS Comput. Biol 13
- 34. Heterogeneity of epidermal growth factor binding kinetics on individual cells. Biophys. J 73:1089–1102
- 35. Mathematical modeling reveals modulation of both nuclear influx and efflux of Foxo1 by the IGF-I/PI3K/Akt pathway in skeletal muscle fibers. Am. J. Physiol.-Cell Physiol 306:C570–C584
- 36. Regulation of EGF-Stimulated EGF Receptor Endocytosis During M Phase. Traffic 12:201–217
- 37. Live cell fluorescence imaging reveals high stoichiometry of Grb2 binding to the EGF receptor sustained during endocytosis. J. Cell Sci https://doi.org/10.1242/jcs.137786
- 38. Receptor-based mechanism of relative sensing and cell memory in mammalian signaling networks. eLife 9
- 39. Unraveling Growth Factor Signaling and Cell Cycle Progression in Individual Fibroblasts. J. Biol. Chem 291:14628–14638
- 40. Network plasticity of pluripotency transcription factors in embryonic stem cells. Nat. Cell Biol 17:1235–1246
- 41. Exploring intermediate cell states through the lens of single cells. Curr. Opin. Syst. Biol 9:32–41