Introduction

In cell populations, there is a significant overlap in cellular responses to environmental stimuli of differing strengths. This raises a fundamental question1: do signaling networks in cells relay accurate information about their environment to take appropriate action? And if not, where along the signal transduction pathway is the information lost2,3? Mutual information (MI) quantifies the information content in a cellular output about extracellular inputs. For an input u (e.g., concentration of a ligand) and an output x (e.g., intracellular species concentration4–6 or a cellular phenotype7,8), the MI is defined as9
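With u denoting the input and x the output, as in the response histograms p(x|u) used below, the standard definition can be written as follows. This is a reconstruction from the surrounding text; the base-2 logarithm (so that MI is measured in bits) and the integral over u (a sum when the inputs are discrete doses) are assumptions.

```latex
\begin{equation}
I = \int du\, p(u) \int dx\, p(x \mid u)\, \log_2 \frac{p(x \mid u)}{p(x)},
\qquad
p(x) = \int du\, p(u)\, p(x \mid u). \tag{1}
\end{equation}
```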

Experimental single cell methods such as flow cytometry10, immunofluorescence10, mass spectrometry11, or live cell imaging12 allow us to estimate response histograms p(x|u) across several inputs. Using these distributions, we can estimate the maximum of the MI (the channel capacity, CC) by optimizing Eq. 1 over input distributions p(u). The CC quantifies fidelity of signal transduction13. For example, a CC of 1 bit implies that the cells can at best distinguish between two input levels, with higher CCs indicating that cells can resolve multiple input states. Importantly, CC can be used to identify bottlenecks in signaling3,14.
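As a minimal sketch of this optimization, the snippet below computes the MI of a binned response table and maximizes it over p(u). The text does not say which optimizer is used; the Blahut-Arimoto iteration is swapped in here as a standard choice, and the function names and the toy two-input channel are illustrative assumptions rather than part of the original analysis.

```python
import numpy as np

def mutual_information(p_u, channel):
    """MI in bits between input u and output x.

    p_u: shape (n_u,), input distribution.
    channel: shape (n_u, n_x), rows are p(x | u), e.g. normalised histograms.
    """
    joint = p_u[:, None] * channel          # p(u, x)
    p_x = joint.sum(axis=0)                 # p(x)
    indep = p_u[:, None] * p_x[None, :]     # p(u) p(x)
    nz = joint > 0                          # skip empty bins (0 log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / indep[nz])))

def channel_capacity(channel, n_iter=5000, tol=1e-12):
    """Blahut-Arimoto iteration; returns (capacity in bits, optimising p(u))."""
    n_u = channel.shape[0]
    p_u = np.full(n_u, 1.0 / n_u)           # start from a uniform input
    for _ in range(n_iter):
        p_x = p_u @ channel
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(channel > 0, np.log(channel / p_x), 0.0)
        kl = np.sum(channel * log_ratio, axis=1)   # KL(p(x|u) || p(x)) in nats
        new_p = p_u * np.exp(kl)                   # reweight informative inputs
        new_p /= new_p.sum()
        converged = np.max(np.abs(new_p - p_u)) < tol
        p_u = new_p
        if converged:
            break
    return mutual_information(p_u, channel), p_u

# Toy check (hypothetical data): two input levels, each misread 10% of the time.
toy_channel = np.array([[0.9, 0.1],
                        [0.1, 0.9]])
cc_bits, p_opt = channel_capacity(toy_channel)
```

For this symmetric toy channel the optimizing input distribution is uniform and the capacity is below 1 bit, reflecting the residual overlap between the two response histograms.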

The CC has been estimated for input-output relationships in several mammalian signaling networks4,6,15–18. When the output is defined as levels of a single protein at a fixed time, the CC was found to be surprisingly low, ∼1–1.5 bits. These estimates have been improved by considering multidimensional outputs5 or time varying inputs19. While these modifications led to somewhat higher CC estimates, the overall conclusion that cells know little about their environment remains well established. In contrast, significantly higher CC estimates have been found when the output is defined as an average over the cell population6, suggesting that the only way to overcome low sensing fidelity is collective response.

These previous calculations estimated one channel capacity for all cells in a population, implicitly assuming that individual cells have similar sensing capabilities. We now know that cells exhibit extensive heterogeneity in cell state variables such as abundances of key signaling proteins11,20,21, gene expression levels21, epigenetic status22, etc. These variables can remain constant throughout cells’ entire life cycle and beyond23. Heterogeneity in cell state variables leads to a variability in response to extracellular cues, including chemotherapeutic drugs23,24, mitogens25, hormones26, and chemotactic signals7,8. Therefore, it is quite reasonable to expect that cells have variable abilities to sense their environment27,28, and that the differences in cells’ abilities are explained by differences in cell states.

There are currently no conceptual frameworks or computational algorithms to estimate the distribution of sensing abilities in cell populations. To that end, we introduce an information theoretic quantity CeeMI: Cell state dependent Mutual Information. We show that using typically collected single cell data and a computational model of the underlying signaling network, we can estimate the distribution pCeeMI(I) of single cell signaling fidelities (single cell mutual information values). We also show that we can identify cell state variables that make some cells better and others worse at sensing their environment.

Using an illustrative example, we show that in heterogeneous cell populations, the traditional cell state agnostic estimate of mutual information is significantly lower than the mutual information of signaling networks in typical cells. Next, using previously collected experimental data, we estimate pCeeMI(I) for two important mammalian signaling pathways: the Epidermal growth factor (EGF)/EGF receptor pathway29 and the Insulin like growth factor (IGF)/Forkhead Box protein O (FoxO) pathway29. We show that while the cell state agnostic CC estimates for both pathways are ∼1 bit, most individual cells are predicted to be significantly better at resolving different inputs. Using live cell imaging data for the IGF/FoxO pathway, we show that our estimate of variability in sensing abilities matches closely with experimental estimates. Importantly, for this pathway, we also verify our prediction that specific cell state variables dictate cells’ sensing abilities. We believe that CeeMI will be an important tool to explore variability in cellular sensing abilities and in identifying what makes some cells better and others worse in sensing their environment.

Results

Conditional mutual information models single cells as cell state dependent channels

Consider a cell population where cells are characterized by state variables θ. These include abundances of signaling proteins and enzymes, epigenetic states, etc. We assume that cell states are temporally stable30, that is, θ remains constant over a time scale that is much longer than typical fluctuations in environmental inputs. We assume that these variables are distributed in the population according to a distribution p(θ). If x denotes an output of choice (e.g., phosphorylation levels of some protein at a specific time) and u denotes the input (e.g., ligand concentration), the experimentally measured response distribution p(x|u) can be decomposed as:
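Written out, the decomposition is a marginalization over cell states. The form below is a reconstruction from the definitions in this paragraph and agrees with the expression quoted later for the receptor-ligand example.

```latex
\begin{equation}
p(x \mid u) = \int d\theta\, p(x \mid u, \theta)\, p(\theta). \tag{2}
\end{equation}
```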

where p(x|u, θ) is the distribution of the output x conditioned on the input u and cell state variables θ. In most cases, p(x|u, θ) may not be experimentally accessible. However, it is conceptually well defined.

Using p(x|u, θ), we can define the cell state dependent mutual information I(θ) for a fixed input distribution p(u) analogously to Eq. 1:
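A form consistent with the population level definition, with every response distribution additionally conditioned on θ, is given below. It is a reconstruction; the base-2 logarithm is an assumption.

```latex
\begin{equation}
I(\theta) = \int du\, p(u) \int dx\, p(x \mid u, \theta)\,
\log_2 \frac{p(x \mid u, \theta)}{p(x \mid \theta)},
\qquad
p(x \mid \theta) = \int du\, p(u)\, p(x \mid u, \theta). \tag{3}
\end{equation}
```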

I(θ) quantifies individual cells’ ability to sense their environment as a function of the cell state parameters θ. The distribution pCeeMI (I) of single cell sensing abilities is
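One way to write this distribution, reconstructed from the delta-function definition referred to in the surrounding text, is:

```latex
\begin{equation}
p_{\mathrm{CeeMI}}(I) = \int d\theta\, p(\theta)\, \delta\big(I(\theta) - I\big). \tag{4}
\end{equation}
```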

where δ (x) is the Dirac delta function. We can also compute the joint distribution between the single cell mutual information and any cell state variable of interest χ (e.g., cell surface receptor levels):
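By analogy with the single-variable distribution, the joint distribution can be written with one additional delta function (again a reconstruction from the surrounding definitions):

```latex
\begin{equation}
p_{\mathrm{CeeMI}}(I, \chi) = \int d\theta\, p(\theta)\,
\delta\big(I(\theta) - I\big)\, \delta\big(\chi(\theta) - \chi\big). \tag{5}
\end{equation}
```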

where χ (θ) is the value of the biochemical parameter when cell state parameters are fixed at θ. The distributions defined in Eq. 5 quantify the interdependency between a cell’s signaling fidelity I(θ) and cell specific biochemical parameters χ (θ). As we will see below, the distributions in Eq. 4 and Eq. 5 can be experimentally verified when appropriate measurements are available. Finally, we define the population average of the cell state dependent mutual information:
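In symbols, the population average is the mean of the distribution of single cell mutual information values, or equivalently the average of I(θ) over p(θ) (a reconstruction from the definitions above):

```latex
\begin{equation}
I_{\mathrm{Cee}} = \int dI\, I\, p_{\mathrm{CeeMI}}(I)
= \int d\theta\, p(\theta)\, I(\theta). \tag{6}
\end{equation}
```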

In information theory, ICee is known as the conditional mutual information9 between the input u and the output x conditioned on θ. From these definitions, it can be shown that ICee ≥ I (SI Section 1). That is, if cell state variables remain approximately constant, at least some cells in the population have better sensing abilities compared to what is implied by the cell state agnostic mutual information (Eq. 1). Since ICee depends on the input distribution p(u), we can find an optimal input distribution p(u) that maximizes ICee (SI Section 1). Going forward, unless an input distribution is specified, the distributions pCeeMI(I) and pCeeMI(I, χ) are discussed in the context of this optimal input distribution.
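The proof that the conditional mutual information is at least as large as the cell state agnostic one is deferred to SI Section 1. One standard argument, sketched here under the assumption (implicit in the decomposition of p(x|u)) that the input u is drawn independently of the cell state θ, applies the chain rule for mutual information in two ways:

```latex
\begin{align}
I(u; x, \theta) &= I(u; \theta) + I(u; x \mid \theta) = I_{\mathrm{Cee}}
  && \text{since } I(u; \theta) = 0, \\
I(u; x, \theta) &= I(u; x) + I(u; \theta \mid x) \;\ge\; I(u; x) = I
  && \text{since } I(u; \theta \mid x) \ge 0 .
\end{align}
```

Combining the two lines gives I_Cee ≥ I, with equality only when knowing the cell state adds no information about the input once the output is known.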

Maximum entropy inference can estimate pCeeMI (I)

Estimation of pCeeMI(I) requires decomposing the experimentally observed response p(x|u) into cell specific output distributions p(x|u, θ) and the distribution of cell state variables p(θ) (Eq. 3 and 4). This problem is difficult to solve given that neither of these distributions is accessible in typical experiments. For many signaling networks, stochastic biochemical models can be constructed to obtain the cell specific output distribution p(x|u, θ). Here, the parameters θ comprise protein concentrations and effective rates of enzymatic reactions and serve as a proxy for cell state variables. Given the experimentally measured cell state averaged response p(x|u) and the model predicted cell specific output distribution p(x|u, θ), we need a computational method to infer p(θ) (see Eq. 2). The inference of parameter heterogeneity is a non-trivial inverse problem31. We use our previously developed maximum entropy-based approach32 to infer p(θ). This way, we can estimate pCeeMI(I) using the experimentally obtained cell state agnostic response p(x|u) and a stochastic biochemical model of the underlying signaling network.

Cell state agnostic mutual information underestimates cells’ ability to sense their environment

To illustrate the effect of heterogeneity of cell state variables on the cell state agnostic estimate of mutual information (which we call cell state agnostic mutual information or ICSA from now onwards), we consider a simple stochastic biochemical network of a receptor-ligand system. Details of the model and the calculations presented below can be found in SI Section 2. Briefly, extracellular ligand L binds to cell surface receptors R. Steady state levels of the ligand bound receptor B is considered the output. The signaling network is parametrized by several cell state variables θ such as receptor levels, rates of binding/unbinding, etc. For simplicity, we assume that only one variable, steady state receptor level R0 in the absence of the ligand, varies in the population. Calculations with variability in other parameters are presented in SI Section 2.

In this population, cells’ response p(B|L, R0) is distributed as a Poisson distribution whose mean is proportional to the cell state variable R0 (SI Section 2). That is, when all other parameters are fixed, a higher R0 leads to lower relative noise. To calculate cell state dependent mutual information (Eq. 3), we assume that p(L) is a gamma distribution. As expected, I(R0) (Eq. 3) between the output B and the input L increases monotonically with R0 (inset in Fig. 2A). Moreover, given that R0 varies in the population (assumed to be a gamma distribution as well), the sensing ability varies in the population as well (Fig. 2A). Notably, the average ICee of I(R0) remains relatively robust to variation in R0. At the same time, the traditional estimate ICSA, which is the mutual information between the input L and the cell state agnostic population response p(B|L) = ∫ p(B|L, R0) p(R0) dR0 (Eq. 1 and Eq. 2), decreases as the population heterogeneity in R0 increases. Importantly, ICSA is significantly lower than the sensing ability of most cells in the population (Fig. 2A). This is because the overlap in the population response distributions is significantly larger than the overlap in single cell response distributions (Fig. 2B), which arises due to the variability in cell state variables. This simple example illustrates that the traditional mutual information estimates may severely underestimate cells’ ability to resolve inputs, especially when parameters are heterogeneously distributed. Moreover, it is crucial to explicitly account for heterogeneity in cell state variables when estimating fidelity of cellular communication channels.

A schematic of our computational approach.

(top) Single cell data across different input conditions and time points are integrated with a stochastic model of a signaling network using a previously developed maximum entropy approach, leading to a distribution over signaling network parameters p(θ) (middle). (bottom) In silico cells are generated using the inferred parameter distribution, and the cell-state specific mutual information I(θ) and the population distribution of cell performances pCeeMI(I) are estimated. The model also evaluates the correlation between cells’ performance and biochemical parameters.

A. The distribution of single cell sensing abilities (horizontal blue histograms) and its average plotted as a function of the coefficient of variation of the distribution of one cell state variable, the cell surface receptor number R0. The dashed blue lines show the traditional cell state averaged mutual information (Eq. 1). The inset shows the dependence between cell state specific mutual information and the cell state variable R0. The input distribution is assumed to be a gamma distribution. B. A schematic showing the effect of heterogeneity in cell states on population level response. Even when individual cells have little overlap in their responses to extracellular signal (bottom), the population level responses could have significant overlap (top), leading to a low mutual information between cell state averaged response and the input. C. A combined schematic of the two growth factor pathways. Extracellular growth factor ligand (red circle) binds to cell surface receptors which are shuttled to and from the plasma membrane continuously. Ligand bound receptors are phosphorylated and activate Akt. Phosphorylated Akt leads to phosphorylation of FoxO which effectively shuttles it out of the nucleus. D. The estimated distribution of single cell mutual information values for the EGF/EGFR pathway. The inset shows the input distribution corresponding to the maximum of the average ICee (blue), along with the input distribution corresponding to the channel capacity of ICSA (green). E. Same as D for the IGF/FoxO pathway. We additionally show the experimentally estimated pCeeMI(I) (pink).
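The receptor-ligand illustration can be mimicked numerically. The sketch below is not the calculation of SI Section 2: it assumes a hypothetical occupancy L/(1 + L) with ligand in units of the dissociation constant, a mean receptor number of 100, a discretized gamma input distribution, and a finite sample of in silico cells. It only illustrates how the single cell values I(R0), their average, and the pooled estimate ICSA relate to one another.

```python
import numpy as np

def mutual_information(p_u, channel):
    """MI in bits for input distribution p_u and rows p(x | u) in channel."""
    joint = p_u[:, None] * channel
    p_x = joint.sum(axis=0)
    indep = p_u[:, None] * p_x[None, :]
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / indep[nz])))

def poisson_channel(r0, occupancy, n_max):
    """Rows p(B | L, R0): Poisson with mean R0 * occupancy(L), truncated at n_max."""
    k = np.arange(n_max + 1)
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(k[1:]))))  # log(k!)
    mu = r0 * occupancy[:, None]
    pmf = np.exp(k[None, :] * np.log(mu) - mu - log_fact[None, :])
    return pmf / pmf.sum(axis=1, keepdims=True)   # renormalise truncated tail

def sensing_summary(cv, mean_r0=100.0, n_cells=200, seed=0):
    """Return (I(R0) per cell, their average, pooled I_CSA) for a given CV of R0."""
    rng = np.random.default_rng(seed)
    ligand = np.linspace(0.1, 5.0, 20)        # assumed dose grid, units of Kd
    p_l = ligand * np.exp(-ligand)            # gamma(shape=2, scale=1) weights
    p_l /= p_l.sum()
    occupancy = ligand / (1.0 + ligand)       # assumed binding curve
    shape = 1.0 / cv**2                       # gamma-distributed R0 with this CV
    r0 = rng.gamma(shape, mean_r0 / shape, size=n_cells)
    n_max = int(r0.max() + 10.0 * np.sqrt(r0.max())) + 10
    channels = np.stack([poisson_channel(r, occupancy, n_max) for r in r0])
    i_cells = np.array([mutual_information(p_l, c) for c in channels])
    i_csa = mutual_information(p_l, channels.mean(axis=0))  # pooled response
    return i_cells, float(i_cells.mean()), i_csa

i_low, i_cee_low, i_csa_low = sensing_summary(cv=0.1)
i_high, i_cee_high, i_csa_high = sensing_summary(cv=0.8)
```

Because mutual information is convex in the channel for a fixed input distribution, the average of the single cell values can never fall below the pooled estimate in this construction; the gap widens as the coefficient of variation of R0 grows.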

Experimental verification of pCeeMI(I) using growth factor signaling networks

In real cell populations, state variables that govern signaling dynamics such as protein levels20,26 (receptors, kinases, phosphatases, etc.), endocytosis rates33, ligand binding rates34, etc. are known to vary from cell to cell. Therefore, we expect the sensing abilities to be widely distributed as well. To experimentally verify the computational prediction of the distribution pCeeMI(I) of sensing abilities, we need an experimental system that allows us to approximate the cell state specific response distribution p(x|u, θ). The IGF/FoxO pathway is an ideal system for these explorations for several reasons. First, following IGF stimulation, the transcription factor FoxO is pulled out of the nucleus29. A GFP-tagged variant of FoxO can be used to detect the dynamics of nuclear levels of FoxO at the single cell level28. Second, nuclear FoxO levels reach an approximate steady state within 30-45 minutes of constant IGF stimulation, with FoxO levels decreasing with increasing IGF dose28. As a result, an approximate cell state specific distribution p(x|u, θ) of steady state levels of FoxO can be obtained by stimulating the same cell with successive IGF levels28. Finally, the biochemical interactions in the IGF/FoxO pathway are well studied29 (Fig. 2C), allowing us to build a stochastic biochemical model based on previous computational efforts35 that fits the single cell data accurately. Another system where pCeeMI(I) can in principle be verified is the EGF/EGFR pathway29 (Fig. 2C). Here too, abundance of cell surface EGFR levels can be tracked in live cells following EGF stimulation36,37, allowing us to obtain the cell-state specific response distribution p(x|u, θ). Below, we show estimates of pCeeMI(I) for both pathways and an experimental verification of our estimates for the IGF/FoxO pathway where live-cell imaging data was previously collected28.

The details of the calculations presented below can be found in SI Sections 3 and 4. Briefly, we first constructed stochastic biochemical models for the two pathways based on known interactions29 (Fig. 2C) and previous models35,38. The output for the EGFR pathway was defined as the steady state levels of cell surface EGF receptors and the output for the IGF/FoxO pathway was defined as the steady state nuclear levels of the transcription factor FoxO. Using previously collected single cell data on the two pathways28,32 and our maximum entropy-based framework32, we estimated the distribution over parameters p(θ) for the two pathways (SI Section 3). Using these inferred distributions, and the model-predicted cell state specific response distribution p(x|u, θ), we can compute pCeeMI (I) for any specified input distribution p(u). We choose the support of the input distribution as the ligand concentrations used in the experimental setup. The estimates shown in Fig. 2D and Fig. 2E show pCeeMI(I) corresponding to the input distribution that maximizes the conditional mutual information ICee (see Eq. 6).

Similar to the illustrative example (Fig. 2A), there is a wide distribution of single cell sensing fidelities in real populations (Fig. 2D and Fig. 2E). Moreover, most cells are better sensors compared to what is indicated by the maximum of ICSA which was estimated to be ∼ 1 bit for both pathways. Indeed, as seen in the insets of Fig. 2D and Fig. 2E, the input distribution corresponding to the maximum of ICSA is concentrated on the lowest and the highest input for both pathways, indicating that cells may be able to detect only two input levels. In contrast, the input distribution corresponding to the maximum of ICee is close to uniform, suggesting that individual cells can in fact resolve different ligand levels28.
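The contrast between the two optimal input distributions can be reproduced in a deliberately simple toy, which is not one of the pathway models. Three hypothetical cells each respond to three input levels without noise but with different baseline offsets. The Blahut-Arimoto iteration is applied once to the pooled response and once to an augmented channel whose output is the pair (x, cell identity); when inputs are independent of cell state, the mutual information of that augmented channel equals ICee, so maximizing it is one possible way to maximize ICee. The optimizer choice and all names are assumptions.

```python
import numpy as np

def kl_rows_bits(channel, p_u):
    """KL(p(y|u) || p(y)) in bits for every input u, given input distribution p_u."""
    p_y = p_u @ channel
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(channel > 0, np.log2(channel / p_y), 0.0)
    return np.sum(channel * log_ratio, axis=1)

def blahut_arimoto(channel, n_iter=5000):
    """Return (maximal MI in bits, maximising input distribution)."""
    p_u = np.full(channel.shape[0], 1.0 / channel.shape[0])
    for _ in range(n_iter):
        p_u = p_u * np.exp2(kl_rows_bits(channel, p_u))  # multiplicative update
        p_u /= p_u.sum()
    return float(p_u @ kl_rows_bits(channel, p_u)), p_u

# Hypothetical cells: cell i maps input u to output bin u + i without noise.
n_u, n_x = 3, 5
cells = np.zeros((3, n_u, n_x))
for offset in range(3):
    cells[offset, np.arange(n_u), np.arange(n_u) + offset] = 1.0

# Cell state agnostic channel: pool the responses of all cells.
pooled = cells.mean(axis=0)
cc_csa, p_csa = blahut_arimoto(pooled)

# Augmented channel p(x, cell | u) = p(cell) p(x | u, cell); its MI is I_Cee.
augmented = np.concatenate(list(cells), axis=1) / cells.shape[0]
icee_max, p_cee = blahut_arimoto(augmented)
```

In this toy, the pooled capacity is about two thirds of a bit with the input weight pushed onto the lowest and highest levels, whereas the maximum of ICee is log2(3) bits with a uniform input distribution, mirroring qualitatively the pattern described for the insets.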

To verify our computational estimate of pCeeMI(I) for the IGF/FoxO pathway, we reanalyzed previously collected data28 wherein several cells were stimulated using successive IGF levels. The details of our calculations can be found in SI Section 4. Briefly, the cells reached an approximate steady state within 60 minutes of each stimulation and nuclear FoxO levels measured between 60 and 90 minutes were used to approximate an experimental cell state specific response distribution p(x|u, θ). The distribution pCeeMI(I) was then obtained by maximizing the average mutual information ICee (averaged over all cells) with respect to the input distribution. As seen in Fig. 2E, the experimentally evaluated single cell signaling fidelities match closely with our computational estimates. Moreover, as predicted using our computational analysis, individual cells in the population were significantly better at sensing than what is implied by the maximum of ICSA. Indeed, the distributions of steady state FoxO levels were found to be well-resolved at the single cell level as well (Fig. 2E). Live cell imaging data for the EGFR pathway was not available and we leave it to future studies to validate our predictions.

Our calculations show that real cell populations comprise cells that have differing sensing fidelities, individual cells are significantly better at sensing their environment than what traditional estimates would indicate, and importantly, the CeeMI approach can accurately estimate the variation in signaling performances using readily collected experimental data and stochastic biochemical models.

CeeMI identifies biochemical parameters that determine cells’ ability to sense their environment

We expect that cells’ ability to sense their environment depends on their state variables. Our approach allows us to quantify the joint distribution pCeeMI(I, χ) of single cell signaling fidelity and biochemical state variables that take part in the signaling network. To test whether we can identify variables that are predictive of cellular fidelity, we estimated the joint distribution pCeeMI(I, χ) for two variables that were experimentally accessible: the response range of nuclear FoxO (Fig. 3 left, see inset) and total nuclear FoxO levels prior to IGF stimulation (Fig. 3 right, see inset). In both figures, the shaded regions show computational estimates of the joint probability distributions, and the red circles represent real cells. The green and the cyan trend lines represent computational and experimental binned averages.

Dependence of cell state dependent mutual information on biochemical parameters.

(left) The joint distribution pCeeMI(I, χ) of cell state specific mutual information and biochemical parameter χ chosen to be the single cell response range of nuclear FoxO levels (x-axis, see inset for a cartoon). The shaded blue regions are model predictions, and the green line is the model average. The darker shades represent higher probabilities. The red dots represent experimental cells. The cyan line represents experimental averages. (right) Same as (left) with biochemical parameter χ chosen to be steady state nuclear FoxO levels in the absence of stimulation. The contours represent 1% to 10%, 10% to 50%, and 50% to 100% of the total probability mass (from faint to dark shading).

One may expect that higher total nuclear FoxO levels could result in lower noise and therefore better sensing abilities. However, we find that total nuclear FoxO levels only weakly correlate with cell state dependent mutual information (correlation coefficient r = 0.16 for computational estimates and r = 0.04 for experimental data). In contrast, cell state dependent mutual information depended strongly on the dynamic range of the response (correlation coefficient r = 0.53 for computational estimates and r = 0.29 for experimental data). Importantly, the model captured the observation that cells with a small response range had variable sensing abilities while cells with a large response range all performed well in resolving extracellular IGF levels. In SI Section 5, we show the predicted joint distributions pCeeMI(I, χ) for several other biochemical variables that can potentially govern single cells’ response to extracellular IGF stimuli. This way, CeeMI can be used to systematically identify cell state variables that differentiate between good sensors and bad sensors of the environmental stimuli.
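The two summaries used in this comparison, a correlation coefficient and binned averages of single cell mutual information against a cell state variable, can be computed from paired samples as sketched below. The quantile-based binning and the synthetic data are assumptions for illustration; the values reported above come from the original analysis, not from this snippet.

```python
import numpy as np

def correlation_and_binned_mean(chi, mi, n_bins=5):
    """Pearson r between chi and mi, plus the mean of mi within quantile bins of chi."""
    chi = np.asarray(chi, dtype=float)
    mi = np.asarray(mi, dtype=float)
    r = float(np.corrcoef(chi, mi)[0, 1])
    edges = np.quantile(chi, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.digitize(chi, edges[1:-1])          # bin index 0 .. n_bins - 1
    centers = np.array([chi[idx == b].mean() for b in range(n_bins)])
    means = np.array([mi[idx == b].mean() for b in range(n_bins)])
    return r, centers, means

# Synthetic stand-in for (chi(theta), I(theta)) pairs of in silico cells.
rng = np.random.default_rng(1)
chi_samples = rng.uniform(0.0, 1.0, size=500)          # e.g. a response range
mi_samples = 1.0 + chi_samples + 0.3 * rng.normal(size=500)
r_value, bin_centers, bin_means = correlation_and_binned_mean(chi_samples, mi_samples)
```

Applying the same function to model-generated cells and to experimentally tracked cells gives directly comparable trend lines for any candidate variable χ.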

Discussion

Cell populations are characterized by heterogeneity in cell state variables that is responsible for important phenotypic differences including sensitivity to drugs23,24, response to chemotactic signals7,8, and response to proliferation cues25. Therefore, it is reasonable to expect that cells’ ability to sense their environment depends on their state and is therefore variable across cells in a population. To quantify this heterogeneity, we developed a novel information theoretic framework that takes as input easily measurable single cell data and models of signaling networks to quantify the distribution pCeeMI(I) of single cell sensing abilities. Our framework also quantifies the joint distribution pCeeMI(I, χ) of cell specific sensing ability and any biochemical cell state variable. Importantly, using two growth factor pathways, we showed that individual cells in real cell populations were much better at sensing their environment compared to what is implied by the traditional estimate of channel capacities of signaling networks.

Our framework will be useful in identifying bottlenecks in signal transduction. Many cellular phenotypes such as chemotaxis7,8 and cell proliferation39 exhibit a weak correlation between cellular outputs (e.g., directional alignment with chemical gradients) and the input (e.g., gradient strength), resulting in a low channel capacity even for individual cells. In such cases, it is important to understand where exactly along the information transduction pathway the information about the gradient is lost. If traditional calculations are pursued, for example, for movement of mammalian cells under EGF gradients7,8, one may conclude that the information loss likely happens right at the receptor level (Fig. 2D). In contrast, CeeMI will allow us to disentangle the effect of cell state heterogeneity and noisy cellular response to precisely pinpoint intracellular signaling nodes that are responsible for signal corruption.

A limitation of the current approach is the assumption that cell state variables remain constant over the time scale of typical environmental fluctuations. While many cell state variables remain constant over the life spans of cells and beyond23,40, it is likely that state changes may occur within the lifespan of a cell41. These dynamical changes can be accommodated easily in the framework. Here, instead of fixing the cell state variables θ, we can treat them as initial conditions and propagate them stochastically with their own dynamics. In the limit of very fast dynamics where individual cells rapidly transition cell states, our framework will agree with traditional cell state agnostic estimates of the channel capacity. In contrast, if the cell state dynamics are slow, our framework highlights differences between cells in the population.

In summary, we showed that like other phenotypes, the ability to sense the environment is itself heterogeneously distributed in a cell population. Moreover, we also showed that mammalian cells appear to be significantly better at sensing their environment than what traditional mutual information calculation suggests. Finally, we showed that we could identify cell state variables that made some cells better sensors compared to others. We believe that CeeMI will be an important framework in quantifying fidelity of input/output relationships in heterogeneous cell populations.

Acknowledgements

AG, HA, and PD are supported by NIGMS grant R35GM142547. The authors would like to thank Andre Levchenko for useful discussions.