The ability to sense the environment is heterogeneously distributed in cell populations

  1. Andrew Goetz
  2. Hoda Akl
  3. Purushottam Dixit  Is a corresponding author
  1. Department of Biomedical Engineering, Yale University, United States
  2. Department of Physics, University of Florida, United States
  3. Systems Biology Institute, Yale University, United States


Channel capacity of signaling networks quantifies their fidelity in sensing extracellular inputs. Low estimates of channel capacities for several mammalian signaling networks suggest that cells can barely detect the presence/absence of environmental signals. However, given the extensive heterogeneity and temporal stability of cell state variables, we hypothesize that the sensing ability itself may depend on the state of the cells. In this work, we present an information-theoretic framework to quantify the distribution of sensing abilities from single-cell data. Using data on two mammalian pathways, we show that sensing abilities are widely distributed in the population and most cells achieve better resolution of inputs compared to an ‘average cell’. We verify these predictions using live-cell imaging data on the IGFR/FoxO pathway. Importantly, we identify cell state variables that correlate with cells’ sensing abilities. This information-theoretic framework will significantly improve our understanding of how cells sense in their environment.

eLife assessment

In this valuable article, the authors use an existing theoretical framework relying on information theory and maximum entropy inference in order to quantify how much information single cells can carry, taking into account their internal state. They reanalyze experimental data in this light. Despite some limitations of the data, the study convincingly highlights the difference between single-cell and population channel capacities. This result should be of interest to the quantitative biology community as it contributes to explaining why channel capacities are apparently low in cells.


In cell populations, there is a significant overlap in responses to environmental stimuli of differing strengths. This raises a fundamental question (Levchenko and Nemenman, 2014): do signaling networks in cells relay accurate information about their environment to take appropriate action? And if not, where along the signal transduction pathway is the information lost (Rhee et al., 2012; ten Wolde et al., 2016)? Mutual information (MI) quantifies the information content in an intracellular output about extracellular inputs. For an input u (e.g., concentration of a ligand) and an output x (e.g., intracellular species concentration, Cheong et al., 2011; Selimkhanov et al., 2014; Suderman et al., 2017) or a cellular phenotype (Varennes et al., 2019; Mattingly et al., 2021), the MI is defined as (Cover and Thomas, 2006)

(1) I=x,up(x|u)p(u)log2p(x|u)up(x|u)p(u)

Experimental single-cell methods such as flow cytometry (Wu and Singh, 2012), immunofluorescence (Wu and Singh, 2012), mass spectrometry (Slavov, 2021), or live-cell imaging (Lemon and McDole, 2020) allow us to estimate response histograms pxu across several inputs. Using these distributions, we can estimate the maximum of the MI (the channel capacity, CC) by optimizing Equation 1 over input distributions p(u). The CC quantifies fidelity of signal transduction (Levchenko and Nemenman, 2014; Rhee et al., 2012). For example, a CC of 1 bit implies that that the cells can at best distinguish between two input levels (e.g., presence versus absence), with higher CCs indicating that cells can resolve multiple input states. Importantly, CC can be used to identify bottlenecks in signaling (Rhee et al., 2012; Hansen and O’Shea, 2015).

The CC has been estimated for input–output relationships in several mammalian signaling networks (Cheong et al., 2011; Selimkhanov et al., 2014; Suderman et al., 2017; Billing et al., 2019; Garner et al., 2016; Frick et al., 2017). When the output is defined as levels of a single protein a fixed time, the CC was found to be surprisingly low, ∼1–1.5 bits. These estimates have been improved by considering multidimensional outputs (Selimkhanov et al., 2014) or time-varying inputs (Lee et al., 2021). While these modifications led to somewhat higher CC estimates, the overall conclusion that cells know little about their environment remains well established. In contrast, significantly higher CC estimates have been found when the output at the level of cell population averages is considered (Suderman et al., 2017), suggesting that the only way to overcome low-sensing fidelity is population average response.

These previous calculations estimated one-channel capacity for all cells in a population, implicitly assuming that individual cells have similar sensing abilities. However, we now know that cells exhibit extensive heterogeneity in cell state variables (Trapnell, 2015; Wollman, 2018) such as abundances of key signaling proteins (Meyer et al., 2012; Frankel et al., 2014), mRNA abundances (Chen et al., 2015), and chromatin conformation (Nagano et al., 2013) and accessibility (Buenrostro et al., 2015). Notably, the time scale of fluctuations in these variables can be significantly slower than relevant signaling time scales (Flusberg et al., 2013; Spencer et al., 2009), sometimes extending across multiple generations (Pleška et al., 2021). Heterogeneity in cell state variables leads to a heterogeneity in response to extracellular cues, including chemotherapeutic drugs (Flusberg et al., 2013; Spencer et al., 2009; Emert et al., 2021), mitogens (Matson and Cook, 2017), hormones (Norris et al., 2021), chemotactic signals (Varennes et al., 2019; Pleška et al., 2021; Moon et al., 2021; Waite et al., 2016), and other electrical and chemical stimuli (Clark and Campbell, 2019; Yao et al., 2016). Therefore, we hypothesize that that the ability to sense the environment varies from cell to cell in populations in a cell state-dependent manner.

There is no conceptual framework to estimate the variation in sensing abilities in cell populations and its dependence on cell state variables. To that end, we introduce an information-theoretic quantity CeeMI: Cell state-dependent Mutual Information. We show that using typically collected single-cell data and computational models of signaling networks, we can estimate the distribution pCeeMI(I) of single-cell signaling fidelities (single-cell mutual information values). We also show that we can identify cell state variables that make some cells better and others worse at sensing their environment.

Using an illustrative example, we show that in heterogeneous cell populations, estimates of mutual information that average over cell states can be significantly lower than the mutual information of signaling networks in typical cells. Next, using previously collected experimental data, we estimate pCeeMI(I) for two important mammalian signaling pathways (Lewis Cantley and Sever, 2013): the epidermal growth factor (EGF)/EGF receptor pathway and the insulin-like growth factor (IGF)/Forkhead Box protein O (FoxO) pathway. We show that while the cell state-agnostic CC estimates for both pathways are ∼1 bit, most individual cells are predicted to be significantly better at resolving different inputs. Using live-cell imaging data for the IGF/FoxO pathway, we show that our estimate of variability in sensing abilities matches closely with experimental estimates. Importantly, for this pathway, we also verify our prediction that specific cell state variables dictate cells’ sensing abilities. Finally, using a simple receptor/ligand model, we show how the time scales cell state dynamics affects cells’ individuality and therefore sensing abilities. We believe that CeeMI will be an important tool to explore variability in cellular sensing abilities and in identifying what makes some cells better and others worse in sensing their environment.


Conditional mutual information models single cells as cell state-dependent channels

Consider a cell population where cells are characterized by state variables θ (Figure 1). These include abundances of signaling proteins and enzymes, epigenetic states, etc. We assume that cell states are temporally stable, that is, θ remains constant over a time scale that is longer than typical fluctuations in environmental inputs. Later, using a toy simple model, we explicitly study the effect of cell state dynamics on cells’ ability to sense their environment.

A schematic of our computational approach.

Top: single-cell data across different input conditions and time points are integrated with a stochastic model of a signaling network using a previous developed maximum entropy approach leading to a distribution over signaling network parameters p(θ) (middle). Bottom: in silico cells are generated using the inferred parameter distribution and cell state-specific mutual information I(θ) and population distribution of cell performances pCeeMI(I) is estimated. The model also evaluates the correlation between cells’ performance and biochemical parameters.

We assume that cell state variables are distributed in the population according to a distribution pθ . If x denotes an output of choice (e.g., phosphorylation levels of one or more protein(s) at one or more time point(s)) and u denotes the input (e.g., ligand concentration), the experimentally measured response distribution pxu can be decomposed as

(2) p(x|u)=p(x|u,θ)p(θ)dθ

where pxu,θ is the distribution of the output x conditioned on the input u and cell state variables θ. We note that in most cases the same cell cannot be probed multiple times. Consequently, pxu,θ may not be experimentally accessible. However, it is conceptually well defined and mathematically accessible when interactions among molecules are specified (Feinberg and Levchenko, 2023).

Using pxu,θ , we can define the cell state-dependent mutual information I(θ) for a fixed input distribution pu analogously to Equation 1:

(3) I(θ)=x,up(x|u,θ)p(u)log2p(x|u,θ)up(x|u,θ)p(u)

I(θ) quantifies individual cells’ ability to sense their environment as a function of the cell state parameters θ. The distribution pCeeMII of single-cell sensing abilities is

(4) pCeeMI(I)=p(θ)δ(II(θ))dθ

where δ is the Dirac delta function. We can also compute the joint distribution between the single-cell mutual information and any cell state variable of interest χ (e.g., abundance of cell surface receptors):

(5) pCeeMI(I,χ)=p(θ)δ(II(θ))δ(χχ(θ))dθ

where χ(θ) is the value of the biochemical parameter when cell state variables are fixed at θ. The distributions in Equation 5 quantify the interdependency between a cell’s signaling fidelity I(θ) and cell-specific biochemical parameters χ(θ). As we will see below, the distributions in Equations 4 and 5 can be experimentally verified when appropriate measurements are available. Finally, we define the population average of the cell state-dependent mutual information:

(6) ICee=I(θ)p(θ)dθ

In information theory, ICee is known as the conditional mutual information (Cover and Thomas, 2006) between the input u and the output x conditioned on θ. If cell state variables remain constant and if distribution over cell states is independent of the input distribution, it can be shown that ICeeI (Section 1 of Materials and Methods). It then follows that at least some cells in the population have better sensing abilities compared to the cell state-agnostic mutual information (Equation 1).

Since ICee depends on the input distribution pu, we can find an optimal input distribution pu that maximizes ICee (Section 1 of Materials and Methods). Going forward, unless an input distribution is specified, the distributions pCeeMII and pCeeMII,χ are discussed in the context of this optimal input distribution.

Maximum entropy inference can estimate pCeeMI(I)

Estimation of pCeeMI(I) requires decomposing the experimentally observed response pxu into cell-specific output distributions pxu,θ and the distribution of cell state variables p(θ) (Equations 3 and 4). This problem is difficult to solve given that neither pxu,θ nor pθ are accessible in typical experiments. However, for many signaling networks, stochastic biochemical models can be constructed to obtain the cell-specific output distribution pxu,θ . Here, θ comprise protein concentrations and effective rates of enzymatic reactions and serve as a proxy for cell state variables. Given the experimentally measured cell state-averaged response pxu and the model-predicted cell-specific output distribution pxu,θ , we need a computational method to infer pθ (see Equation 2). The problem of inference of parameter heterogeneity is a non-trivial inverse problem (Dixit et al., 2020). While there are several proposed methods to solve this problem (reviewed in Loos and Hasenauer, 2019), most cannot efficiently infer pθ for signaling networks with even a moderate (n10) number of parameters. Here, we use our previously develop maximum entropy-based approach to infer pθ (Dixit et al., 2020). This way, we can estimate pCeeMI(I) using experimentally obtained cell state-agnostic response pxu and a stochastic biochemical model pxu,θ of the underlying signaling network.

An ‘average cell’ can discern much less than a typical cell about the environment

To illustrate the effect of heterogeneity of cell state variables on the cell state-agnostic estimate of mutual information (which we call ICSA from now onward), we consider a simple stochastic biochemical network of a receptor-ligand system. Details of the model and the calculations presented below can be found in Section 2 of Materials and Methods. Briefly, extracellular ligand L binds to cell surface receptors R. Steady-state levels of the ligand-bound receptor B are considered the output. The signaling network is parameterized by several cell state variables θ such as receptor levels, rates of binding/unbinding, etc. For simplicity, we assume that only one variable, steady-state receptor level R0 in the absence of the ligand, varies in the population. Calculations with variability in other parameters are presented in Section 2 of Materials and Methods.

In this population, cells’ response pBL,R0 is distributed as a Poisson distribution whose mean is proportional to the cell state variable R0 (Section 2 of Materials and Methods). That is, when all other parameters are fixed, a higher R0 leads to lower noise (coefficient of variation). To calculate cell state-dependent mutual information (Equation 3), we assume that p(L) is a gamma distribution. As expected, I(R0) (Equation 3) between the output B and the input L increases monotonically with R0 (inset in Figure 2A). Moreover, given that R0 varies in the population (also assumed to be a gamma distribution), the sensing ability varies in the population as well (Figure 2A). Notably, the average ICee of I(R0) remains relatively robust to variation in R0 . At the same time, the traditional estimate ICSA , which is the mutual information between the input L and the cell state-agnostic population response pBL=pBL,R0pR0dR0 (response of the ‘average’ cell, Equations 1 and 2), decreases as the population heterogeneity in R0 increases. Importantly, ICSA is significantly lower than the sensing ability of most cells in the population (Figure 2A). This is because the overlap in the population response distributions is significantly larger than that in single-cell response distributions (Figure 2B). This simple example illustrates that the traditional mutual information estimates may severely underestimate cells’ ability to resolve inputs, especially when cell state variables are heterogeneously distributed. Moreover, it is crucial to explicitly account for heterogeneity in cell state variables when estimating fidelity of cellular communication channels.

Figure 2 with 7 supplements see all
Channel capacities are broadly distributed in a population of cells.

(A) The distribution of single-cell sensing abilities pCeeMII (horizontal blue histograms) and its average ICee plotted as a function of the coefficient of variation of the distribution of one cell state variable, the cell surface receptor number R0. The dashed blue lines show the traditional cell state-averaged mutual information ICSA (Equation 1). The inset shows the dependence between cell state-specific mutual information I(θ) and cell state variable R0. The input distribution p(L) is assumed to be a gamma distribution. (B) A schematic showing the effect of heterogeneity in cell states on population-level response. Even when individual cells have little overlap in their responses to extracellular signal (bottom), the population-level responses could have significant overlap (top), leading to a low mutual information between cell state-averaged response and the input. (C) A combined schematic of the two growth factor pathways. Extracellular growth factor ligand (red circle) binds to cell surface receptors which are shuttled to and from the plasma membrane continuously. Ligand-bound receptors are phosphorylated and activate Akt. Phosphorylated Akt leads to phosphorylation of FoxO, which bars it from entering the nucleus. EGF/EGFR model is limited to the reactions on the plasma membrane. The corresponding cell state variables are given by θ{kprod, kbind, kunbind, kp, kdp, kdeg, kdeg}. The cell state variables for the IGF/FoxO model are given by θ{kprod, kbind, kunbind, kp, kdp, kdeg, kdeg, [Akt]total, kap, kadp, kin, kef, kfp, kfdp, [FoxO]total}. (D) The estimated distribution of single-cell mutual information values pCeeMI(I) for the EGF/EGFR pathway using a maximum entropy estimation of p(θ). The inset shows the input distribution p(u) corresponding to the maximum of the average ICee of pCeeMII (blue), along with the input distribution corresponding to the channel capacity of ICSA (green). (E) Same as (D) for the IGF/FoxO pathway. We additionally show the experimentally estimated pCeeMII (pink).

Experimental verification of pCeeMI(I) using growth factor signaling networks

In cell populations, state variables that govern signaling dynamics such as protein levels (receptors, kinases, dephosphatases, etc.) (Meyer et al., 2012; Frankel et al., 2014), as well as effective rate constants such as endocytosis rates (Kallenberger et al., 2017), ligand binding rates (Chung et al., 1997), etc., vary from cell to cell. Therefore, we expect the downstream phenotype of environmental sensing to be widely distributed as well. To experimentally verify the computational prediction of the distribution pCeeMI(I) of sensing abilities, we need a system that allows us to approximate the cell state-specific response distribution pxu,θ . The IGF/FoxO pathway (Lewis Cantley and Sever, 2013) is an ideal system for these explorations for several reasons. First, following IGF stimulation, the transcription factor FoxO is pulled out of the nucleus. GFP-tagged variant of FoxO can be used to detect the dynamics of nuclear levels of FoxO at the single-cell level (Gross et al., 2019). Second, nuclear FoxO levels reach an approximate steady state within 30–45 min of constant IGF stimulation, with FoxO levels decreasing with increasing IGF dose (Gross et al., 2019). As a result, an approximate cell state-specific distribution pxu,θ of steady-state levels of FoxO can be obtained by stimulating the same cell with successive IGF levels. Finally, the biochemical interactions in the IGF/FoxO are well studied (Figure 2C), allowing us to build a stochastic biochemical model based on previous computational efforts (Wimmer et al., 2014; Lyashenko et al., 2020) that fits the single-cell data accurately. Another system where pCeeMI(I) can in principle be verified is the EGF/EGFR pathway (Figure 2C). Here too, abundance of cell surface EGFR levels can be tracked in live cells following EGF stimulation (Fortian and Sorkin, 2014), allowing us to obtain cell state-specific response distribution pxu,θ . Below, we show estimates of pCeeMI(I) for both pathways and an experimental verification of our estimates for the IGF/FoxO pathway where live-cell imaging data was previously collected (Gross et al., 2019).

The details of the calculations presented below can be found in Materials and Methods (Figure 2—figure supplement 7; Figure 2—figure supplement 6; Figure 2—figure supplement 5; Figure 2—figure supplement 3; Figure 2—figure supplement 2; Figure 2—figure supplement 1; Figure 2—figure supplements 4). Briefly, we first constructed stochastic biochemical models for the two pathways based on known interactions (Figure 2C) and previous models (Wimmer et al., 2014; Lyashenko et al., 2020). The output for the EGFR pathway was defined as the steady-state levels of cell surface EGF receptors, and the output for the IGF/FoxO pathway was defined as the steady-state nuclear levels of the transcription factor FoxO. Using previously collected single-cell data on the two pathways (Dixit et al., 2020; Gross et al., 2019; Lyashenko et al., 2020) and our maximum entropy-based framework (Dixit et al., 2020), we estimated the distribution over parameters p(θ) for the two pathways (Section 3 of Materials and Methods). Using these inferred distributions, and the model-predicted cell state-specific response distribution pxu,θ , we could compute pCeeMI(I) for any specified input distribution p(u). We choose the support of the input distribution as the ligand concentrations used in the experimental setup. The estimates shown in Figure 2D and E show pCeeMI(I) corresponding to the input distribution p(u) that maximizes the conditional mutual information ICee (see Equation 6). This input distribution is shown in the inset of Figure 2D and E.

Similar to the illustrative example (Figure 2A), there is a wide distribution of single-cell sensing fidelities in real populations (Figure 2D and E). Moreover, most cells are better sensors compared to the ‘average cell’, a cell whose response pxu is averaged over cell state variability which was estimated to be ∼1 bit for both pathways. The cellular signaling fidelity skews toward the upper limit of 2 bits, which corresponds to the logarithm of the number of inputs used in the experiment. Indeed, as seen in the insets of Figure 2D and E and , the input distribution corresponding to the maximum of the cell state-agnostic mutual information ICSA is concentrated on the lowest and the highest input for both pathways, indicating that cells may be able to detect only two input levels. In contrast, the input distribution corresponding to the maximum of ICee is close to uniform, suggesting that individual cells can in fact resolve different ligand levels.

To verify our computational estimate of pCeeMI(I) for the IGF/FoxO pathway, we reanalyzed previously collected data wherein several cells were stimulated using successive IGF levels (Gross et al., 2019). The details of our calculations can be found in Section 4 of Materials and methods. Briefly, the cells reached an approximate steady state within 60 min of each stimulation and nuclear FoxO levels measured between 60 and 90 min were used to approximate an experimental cell state-specific response distribution pxu,θ . The distribution pCeeMI(I) was then obtained by maximizing the average mutual information ICee (averaged over all cells) with respect to the input distribution. As seen in Figure 2D and E, the experimentally evaluated single-cell signaling fidelities match closely with our computational estimates. Moreover, as predicted using our computational analysis, individual cells in the population were significantly better at sensing than what is implied by the maximum of ICSA. Indeed, the distribution of steady-state FoxO levels was found to be well resolved at the single-cell level as well (Figure 2E). Live-cell imaging data for the EGFR pathway was not available, and we leave it to future studies to validate our predictions.

Our calculations show that real cell populations comprise cells that have differing sensing fidelities, individual cells are significantly better at sensing their environment than what traditional estimates would indicate, and importantly, the CeeMI approach can accurately estimate the variation in signaling performances using readily collected experimental data and stochastic biochemical models. Notably, the variability in cell states and therefore the heterogeneity in sensing abilities are likely to be stable over time; the same cell’s FoxO responses to the same input were found to have significantly less variation compared to the variation within the population (Figure 2—figure supplement 7).

CeeMI identifies biochemical parameters that determine cells’ ability to sense their environment

Like other phenotypes (Frankel et al., 2014) and signaling dynamics (Meyer et al., 2012; Norris et al., 2021), we expect that cells’ ability to sense their environment depends on their state variables. For example, cells with faster endocytosis rates (Kallenberger et al., 2017) may integrate environmental fluctuations with higher accuracy (Lyashenko et al., 2020). Or cells with higher receptor numbers (Frankel et al., 2014) will lead to a lowered relative noise in bound ligand concentration.

To systematically identify the variables that differentiate between cells’ ability to sense the environment, we quantify the joint distribution pCeeMII,χ of single-cell signaling fidelity and biochemical state variables that take part in the signaling network. To test whether we can identify variables that are predictive of cellular fidelity, we estimated the joint distribution pCeeMII,χ for two variables that were experimentally accessible, response range of nuclear FoxO (Figure 3, left, see inset) and total nuclear FoxO levels prior to IGF stimulation (Figure 3, right, see inset). In both figures, the shaded regions show computational estimates of the joint probability distributions, and the red circles represent real cells. The green and the cyan trend lines represent computational and experimental binned averages.

Figure 3 with 1 supplement see all
Dependence on cell state-dependent mutual information on biochemical parameters.

Left: the joint distribution pCeeMII,χ of cell state-specific mutual information and biochemical parameter χ chosen to be the single-cell response range of nuclear FoxO levels (x-axis, see inset for a cartoon). The shaded blue regions are model predictions, and the green line is the model average. The darker shades represent higher probabilities. The red dots represent experimental cells. The cyan line represents experimental averages. Right: same as left, with biochemical parameter χ chosen to be steady-state nuclear Foxo levels in the absence of stimulation. The contours represent 1–10%, 10–50%, and 50–100% of the total probability mass (from faint to dark shading).

One may expect that higher total nuclear FoxO levels could result in lower noise (coefficient of variation) and therefore better sensing abilities. However, we find that total nuclear FoxO levels only weakly correlate with cell state-dependent mutual information (correlation coefficient r=0.16 for computational estimates and r=0.04 for experimental data). In contrast, cell state-dependent mutual information depended strongly on the dynamic range of the response (correlation coefficient r=0.53 for computational estimates and r=0.29 for experimental data). Importantly, the model captured the observation that cells with a small response range had a variable sensing abilities while cells with a large response range all performed well in resolving extracellular IGF levels. Surprisingly, the total nuclear FoxO levels only weakly correlated with the cell-specific mutual information. In Section 5 of Materials and Methods (Figure 3—figure supplement 1), we show the predicted joint distributions pCeeMII,χ for several other biochemical variables that can potentially govern single cells’ response to extracellular IGF stimuli. This way, CeeMI can be used to systematically identify cell state variables that differentiate between good sensors and bad sensors of the environmental stimuli.

Time scale of stochastic dynamics of cell states dictates the divergence between state-specific and state-agnostic sensing abilities

A key limitation of the presented analysis is the assumption that cell state variables remain approximately constant over the time scale of typical environmental fluctuations. While many cell state variables change slowly, remaining roughly constant over the life spans of cells and beyond (Flusberg et al., 2013; Spencer et al., 2009), state changes may occur within the life span of a cell as well (MacLean et al., 2018). These dynamical changes can be accommodated easily in our calculations. Here, instead of fixing the cell state variables θ, we can treat them as initial conditions and propagate them stochastically with their own dynamics. In the limit of very fast dynamics where individual cells rapidly transition cell states, we expect that the individuality of cells in a population vanishes and consequently cell state-specific mutual information for each cell will agree with traditional cell state-agnostic estimates of the channel capacity. In contrast, if the cell state dynamics are slow, our framework highlights differences between cells in the population.

To elucidate the role of cell state dynamics, we built a simple model of ligand–receptor system (Section 6 of Materials and Methods, Figure 4). Briefly, the model included production and degradation of the receptor mRNA, translation of the receptor protein from the mRNA, and ligand binding to the receptor (Figure 4A). The number of ligand-free receptors and the number mRNA molecules were together considered the state variables. We tuned the mRNA dynamics by keeping the average mRNA number constant while simultaneously changing mRNA production and degradation rates. The time scale of mRNA dynamics is denoted by τ.

Cell state dynamics governs cell state-conditioned mutual information.

(A) In a simple stochastic model, receptor mRNA is produced at a constant rate from the DNA and the translated into ligand-free receptors. The number of ligand-bound receptors after a short exposure to ligands is considered the output. (B) A schematic showing dynamics of receptor numbers when mRNA dynamics are slower compared to signaling time scales. (C) Conditioning on receptor numbers leads to differing abilities in sensing the environment when the time scale of mRNA dynamics is slow. In contrast, when the mRNA dynamics are fast (large τ-1), conditioning on cell state variables does not lead to difference in sensing abilities.

When the mRNA dynamics are slower than translation, ligand binding, and receptor degradation, individual cells are effectively frozen around a particular receptor number (Figure 4B). The distribution of the number of ligand-bound receptors after a short-term ligand exposure reflects this frozen state. As a result, the cell state-conditioned mutual information depends strongly the cell state, with higher mutual information for cells with higher receptor numbers. In contrast, as the mRNA dynamics become faster, cells quickly lose the memory of cell state conditioning. As a result, the distribution of the number of ligand-bound receptors after a short-term ligand exposure reflects an ergodic averaging over the underlying mRNA dynamics. This averaging effectively washes out differences between cells (Figure 4C).

This example shows that when cells states change at time scales slower than signaling dynamics, cells in a population can be identified by their cell state and their ability to sense the environment can differ from one another.


Cell populations are characterized by heterogeneity in cell state variables (Trapnell, 2015; Wollman, 2018) that is responsible for important phenotypic differences and selective advantage, for example, in sensitivity to drugs (Flusberg et al., 2013; Spencer et al., 2009; Emert et al., 2021), response to chemotactic signals (Mattingly and Emonet, 2022), and following proliferation cues (Matson and Cook, 2017). Therefore, it is reasonable to expect that cells’ ability to sense their environment depends on their state and is therefore variable across cells in a population. To quantify this heterogeneity, here, we developed a novel information-theoretic perspective that allowed us to quantify the distribution pCeeMII of single-cell sensing abilities using easily measurable single-cell data and stochastic models of signaling networks. We also quantified the joint distribution pCeeMII,χ of cell-specific sensing ability and biochemical cell state variable. Importantly, using two growth factor pathways, we showed that individual cells in real cell populations were much better at sensing their environment compared to what is implied by the traditional estimate of channel capacities of signaling networks. Typical single-cell data are time stamped and do not give information about dynamics in the single cell. Moreover, these data do not provide information about dependence of cell-to-cell variability on cell state variables. Therefore, computational modeling of cellular trajectories from single-cell data and stochastic models is essential in deciphering mechanistic origins of heterogeneity in cellular phenotypes.

The approach presented here will be useful in identifying bottlenecks in signal transduction. Many cellular phenotypes such as chemotaxis (Varennes et al., 2019; Mattingly et al., 2021) and cell proliferation (Gross and Rotwein, 2016) exhibit a weak correlation between cellular outputs (e.g., directional alignment with chemical gradients) with the input (e.g., gradient strength), resulting in a low channel capacity even for individual cells. In such cases, it is important to understand where exactly along the information transduction pathway is the information about the gradient is lost. If traditional calculations are pursued, for example, for movement of mammalian cells under growth factor gradients (Varennes et al., 2019; Moon et al., 2021), one may conclude that the information loss likely happens right at the receptor level (Figure 2D). In contrast, CeeMI will allow us to disentangle the effect of cell state heterogeneity and noisy cellular response to precisely pinpoint intracellular signaling nodes that are responsible for signal corruption.

How do we contrast our results with previous low estimates of channel capacities? There is extensive population heterogeneity in cell state variables and this heterogeneity often is stable over time. Nonetheless, this heterogeneity arises from stochasticity in underlying processes including dynamics of epigenetic transitions and gene expression. When one specifies state variables (Wollman, 2018) θ, one effectively conditions on the stochastic trajectory of the underlying dynamics to a specific narrow window (Figure 4). This extra conditioning narrows the cells’ response distributions, revealing the heterogeneous nature of cell populations (Wollman, 2018).

In summary, we showed that like other phenotypes the ability to sense the environment is itself heterogeneously distributed in a cell population. Moreover, we also showed that when conditioned on cell state variables, mammalian cells appear to be significantly better at sensing their environment than what traditional mutual information calculation suggests. Finally, we showed that we could identify cell state variables that made some cells better sensors compared to others. We believe that CeeMI will be an important framework in quantifying fidelity of input/output relationships in heterogeneous cell populations.

Materials and methods

Scripts used to generate mutual information results as well the experimental data used to train our models can be obtained at (copy archived at Goetz, 2024). Script for maximum entropy inference of cellular parameters can be obtained at (copy archived at Akl, 2024).

The description of the methods is organized as follows: Section 1 describes basic information theoretical concepts and our numerical approach to estimate mutual information and channel capacity. Section 2 describes the in silico toy model. Section 3 describes the maximum entropy approach. Section 4 describes numerical methods for quantification of mutual information and channel capacity using live-cell imagining data. Section 5 shows the joint distribution of mutual information and biochemical parameters. Section 6 describes the methods to examine the role of cell state dynamics on sensing ability.

1. Information theory primer

Here, we give a brief description of information theoretic quantities used in the article. The readers are referred to a textbook (Cover and Thomas, 2006) for a detailed discussion.

A communication channel (e.g., a signaling network) is an input/output relationship between two random variables: the input U (say, a ligand concentration) and the response R (e.g., levels of some intracellular protein). The mutual information (MI) between U and R is the reduction in the uncertainty about U due to the access of the outcome of R. The MI is defined as

(S1) II(U;R)=rR,uUp(r|u)p(u)log2p(r|u)up(r|u)p(u)

where the choice of two in the base of the logarithm gives the value of information in bits. The summation is replaced by integrals when considering continuous random variables. In the context of cell signaling, we will use the notation ICSA (instead of I in Equation S1) to denote the cell state-agnostic mutual information, which is the mutual information between the response distribution p(r|u) and the input distribution p(u) .

We also consider a more nuanced situation where instead of a single channel we have a family of channels whose states are characterized by a random variable Θ. The channel state θΘ uniquely determines the probabilistic response relationship p(r|u,θ) between U and R when Θ is fixed. Similar to Equation S1, we can define the MI between U and R conditioned for a specific realization θΘ

(S2) I(θ)=rR,uUp(r|u,θ)p(u)log2p(r|u,θ)up(r|u,θ)p(u)

The average of I(θ) over p(θ) is traditionally known as the conditional mutual information (Cover and Thomas, 2006). We define the Cell state-dependent mutual information ICee

(S3) ICeeI(U;R|Θ)=θΘI(θ)p(θ)

The channel capacity (CC) is a measure of the optimal performance of the signaling network with respect to the distribution of U. It can be defined for both I and ICee as follows:

(S4) CCI=maxp(u)I(U;R)orCCICee=maxp(u)I(U;R|Θ)

The difference between I and ICee is called the interaction information I(U;R;Θ). The interaction information can be simplified as

(S5) I(U;R;Θ)=I(U;R)I(U;R|Θ)=I(U;Θ)I(U;Θ|R)=I(U;Θ|R)0

where I(U;Θ)=0 follows from the statistical independence of the input signal U and channel state Θ. Equation S5 shows that ICee>I as long as I(U;Θ)=0. We note that this may not be true in general. However, it is true in the context of cellular signaling networks where the inputs are chosen by the experimentalists while the cell states are an inherent property of the cell population. It is trivial to extend this argument to the channel capacities of both terms.

Numerical estimation of mutual information

Request a detailed protocol

Evaluating the mutual information between an input and an output (Equation S1 and S2) requires numerical integration over the input and the output distribution. We limit the input distributions to a finite support, specifically to the ligand concentrations that were used in the experimental setup. These are L=[0,0.0078,0.01,0.03,0.06,0.125,0.25,0.5,1,100] ng/mL for the EGF/EGFR pathway and L=[0, 17.5, 37.5, 125] pM for the IGF/FoxO pathway.

The numerical integration in Equation S1 and S2 requires summing over probabilities of all possible responses and inputs. When the response distributions are approximated using histograms (either from experimental data or from Markov chain Monte Carlo simulations of a model), the summation can be error prone. To avoid this, we assume that the responses are distributed according to a gamma distribution (see Figure 2—figure supplement 4). The gamma distribution was chosen here and in several other places because it has been shown to accurately approximate real distributions of protein/mRNA abundances (Taniguchi et al., 2010).

To speed up the calculations, inspired by previous approaches (Rhee et al., 2012), we use a binning strategy. Specifically, we bin the response distribution using a constant bin width. The bin width was chosen to be equal to 5% of the smallest interquartile range across response distributions corresponding to all considered inputs. The binning procedure requires truncating the response distributions to a finite support. We ensured that our binning captured at least 99.95% of the entire mass of the distribution. The same strategy was used to obtain ICSA in Equation S1 and I(θ) in Equation S2. The only difference in computing I(θ) compared to ICSA was that the response distribution p(r|u,θ) was obtained computationally using a stochastic differential equation model with network parameters fixed at θ. Samples from the joint distribution pCeeMI(I,χ) where χ is a cell state variable were obtained by sampling cell state variables θ from p(θ) (see Section 3) and simultaneously evaluating I(θ) and χ(θ).

Once the mutual information could be obtained numerically for a given input distribution p(u), the corresponding channel capacity, the maximum of the mutual information over all possible input distributions, can be obtained by solving the following optimization problem:

(S6) maxp(u)ICSAs.t.p(u)=1andp(u)0

The optimization problem was solved via a trust region constrained algorithm (Byrd et al., 1999) using the SciPy optimization library (Virtanen et al., 2020). When the distribution over cell state variables p(θ) is available, we can estimate ICee as the average I(θ)θ (Equation S3). The optimum value of ICee over all input distributions serves as a cell state-dependent analogue to the channel capacity of ICSA and is obtained by solve a similar optimization problem:

(S7) maxp(u)ICees.t.p(u)=1andp(u)0

The procedure needed to evaluate single-cell mutual information values using live-cell imaging data on the IGF/FoxO pathway is described in Section 4.

2. Description of the in silico toy model

The in silico cell receptor/ligand system comprises two components (unbound receptors and bound receptors) and five reactions (Figure 2—figure supplement 1).

In the model, receptors are constantly shuttled to the membrane and removed from the membrane and degraded. The extracellular ligand (concentration denoted by L) binds to cell surface receptors. We assume that the ligand concentration is kept constant in the environment. Steady-state abundance of ligand-bound receptor (denoted by B) is taken as the output of the system. At steady state, the mean field equations for the average species level are

(S8) kprodkbindLR+kunbindBkdegR=0
(S9) kbindLRkunbindBkdegB=0

Solving for steady state, at the single-cell level, the mean number of bound receptors is given by

(S10) μB=R0LkbindLkbind+kdeg+kunbind

In Equation S10, kbind is the ligand binding rate, kunbind is the ligand unbinding rate, kdeg is the degradation rate, and R0 is the average value of the receptor level in the absence of the ligand. The bound receptor levels are Poisson distributed with a mean given by Equation S10 (Figure 2—figure supplement 2).

Using this toy network, we created two in silico cell populations. In both populations, we fixed kbind=1sec1a.u.1 and kunbind=10sec1 . In the first population, every parameter was kept constant across cells except for the cell surface receptor levels R0 . In the second population, every parameter was kept constant across cells except for the receptor degradation rate kdeg . In the first population (when R0 was varied), we fixed kdeg=5sec1 . In the second population (when kdeg was varied), we fixed R0=50molecules/cell. The parameter that varied across cells was assumed to be distributed according to a gamma distribution. For the two populations, we kept fixed mean value of the variable parameter to be R0=500molecules/cell and kdeg=5sec1 . We varied coefficients of variation for both populations between CV=10-1.5 and CV=10-0.5.

Obtaining ICSA for the toy network

Request a detailed protocol

Equation 1 shows that calculation of MI in a cell state-agnostic manner requires the knowledge of the cell state-averaged response distribution p(B|L) and the distribution of inputs p(L). In our calculations, we restrict p(L) to be a discrete version of the gamma distribution obtained by equal percentile binning of a gamma distribution. The mean of the gamma distribution was taken to be 10, the coefficient of variation 1, and the number of bins was 25. The discretization step was not essential but was taken to simplify the calculations by making all variables discrete.

We assumed that p(B|L), which is obtained by averaging over the cell state variables, was distributed as a negative binomial distribution (Figure 2—figure supplement 2). We estimated its first two moments by numerically averaging the first two moments of the single-cell response distribution p(B|L,θ) (Poisson distribution with mean given by Equation S10) according to the population distribution p(θ) of the variable parameter(s) θ (θR0 for the first cell population and θkdeg for the second cell population). As mentioned above, p(θ) was assumed to be a gamma distribution whose coefficient was systematically varied. Using the first two moments, we inferred the parameters for the negative binomial distribution.

Once the population-level response p(B|L) is obtained and the input distribution p(L) is fixed, we can calculate ICSA using Equation 1. The values of ICSA for the population with variable R0 are given in Figure 2C while those for the population with variable kdeg are given in Figure 2—figure supplement 3.

Obtaining pCeeMI(I) and ICee for the toy receptor network

Request a detailed protocol

As indicated by Equations 4 and 6, calculation of the distribution of cell state-dependent mutual information values requires the cell state-specific response distribution p(B|L,θ) and the distribution p(θ) of cell states. In the toy model, p(B|L,θ) is modeled as a Poisson distribution with a mean given by Equation S8 and p(θ) is assumed to be gamma distributed in the variable parameter.

Using p(B|L,θ) and gamma distributed input distribution p(L), we obtain cell state-specific mutual information I(θ) using Equation S2. We find pCeeMI(I) by sampling multiple values of θ (R0 for the first population and kdeg for the second population). ICee is simply the average of pCeeMI(I). ICee and pCeeMI(I) for the population with variable R0 are given in Figure 2C. Figure 2—figure supplement 3 shows the same for the population with variable kdeg .

3. Maximum entropy inference of cell state variability

Using experimentally collected single-cell data on heterogeneity in protein abundances, we estimate the distribution over cell state variables (biochemical parameters) using the maximum entropy approach. Below, we first describe the data that was used in our analysis. Next, we briefly discuss the maximum entropy approach.

EGF/EGFR pathway, data, and model

Request a detailed protocol

We used previously collected single-cell data on cell surface EGFR levels (Dixit et al., 2020; Lyashenko et al., 2020). Briefly, MCF10A cells were stimulated with 10 different extracellular EGF levels ranging between 0 ng/mL and 100 ng/mL. Cell surface EGFR levels were measured in ∼7000 cells for each ligand concentration after 3 hr of continuous EGF stimulation. The data was measured in arbitrary fluorescence units. To convert the data to the units of number of receptors per cell, we used a mean number of cell surface receptors R=2.5×105 for MCF10A cells (Shi et al., 2016). The population mean of the experimentally measured steady-state receptor count distribution in the absence of the ligand was matched to this number. Receptor count data at every other ligand concentration was scaled appropriately.

Using a previously validated model (Dixit et al., 2020; Lyashenko et al., 2020), we constructed a simplified model of the EGF/EGFR pathway. Specifically, we incorporated ligand binding to receptor, receptor activation, and preferential endocytosis of activated receptors. To keep our model simple, we did not incorporate receptor dimerization and oligomerization following ligand binding. Notably, evidence suggests that oligomers may be preformed in the absence of the ligand as well, which could make them effective monomers. Finally, we assumed that the extracellular ligand concentration was kept constant. The model was represented by the following reaction network.

(S11) ϕkprodR
(S12) RLkbindB,BkunbindL+R
(S13) BkpP,PkdpB
(S14) Rkdegϕ,Bkdegϕ,Pkdegϕ

In Equations S11–S14, R is the level of free receptor, B is the level of ligand-bound receptor, and P is the level of phosphorylated receptor. The total number of receptors is given by RT=R+B+P. Cell state variables θ comprised θ{kprod,kbind,kunbind,kp,kdp,kdeg,kdeg}.

IGF/FoxO pathway, data, and model

Request a detailed protocol

We used previously collected single-cell data on nuclear FoxO levels following continuous IGF stimulation in HeLa cells (Gross et al., 2019). Briefly, cells were continuously stimulated with IGF, and nuclear FoxO levels were measured using GFP-tagged FoxO using live-cell imaging every 3 min for 90 min. To convert the arbitrary fluorescence units to units of copies of FoxO per cell, we first removed the background fluorescence intensity. Then, we used the previously estimated total FoxO levels in HeLa cells (Bekker-Jensen et al., 2017) (∼710 molecules/cell) and the nuclear-to-cytoplasmic ratio in the absence of stimulation (Wimmer et al., 2014) (2/3 of the total in the nucleus). There was a small disagreement in mean nuclear FoxO levels in the absence of stimulation across different experiments. To remove this artifact and to start all experiments with the average nuclear FoxO levels, we offset individual experiments such that the mean nuclear FoxO was identical across all experiments.

Similar to the EGF/EGFR pathway, using a previously validated model (Wimmer et al., 2014), we constructed a simplified model of the IGF/FoxO pathway. Specifically, we incorporated ligand binding to IGF receptor, receptor activation, activation of Akt, and Akt-driven phosphorylation of FoxO. Phosphorylated FoxO was prohibited from entering the nucleus. Finally, we assumed that the extracellular ligand concentration was kept constant. The model was represented by the following reaction network.

(S15) ϕkprodR
(S16) RLkbindB,BkunbindL+R
(S17) BkpP,PkdpB
(S18) Rkdegϕ,Bkdegϕ,Pkdegϕ
(S19) AktPkappAkt,pAktkadpAkt
(S20) FoxOckinFoxOn,FoxOnkefFoxOc
(S21) FoxOcpAktkfppFoxOc,pFoxOckfdpFoxOc

In Equations S15–S21, R is the level of free IGF receptor, B is the level of ligand-bound receptor, and P is the level of phosphorylated receptor. pAkt is phosphorylated Akt, pFoxOc is cytoplasmic phosphorylated FoxO, FoxOc is cytoplasmic unphosphorylated FoxO, and FoxOn is nuclear unphosphorylated FoxO. Cell state variables θ comprised θ{kprod,kbind,kunbind,kp,kdp,kdeg,kdeg,[Akt]total,kap,kadp,kin,kef,kfp,kfdp,[FoxO]total}.

Inference of model parameters

Request a detailed protocol

We assume that cells in a population can be assigned a cell-specific state denoted by a state vector θ that comprises biochemical parameters relevant to the modeled signaling network. The population variability in cell state parameters is represented by the joint probability density pθ. Typically, pθ is not experimentally accessible. Therefore, we infer it using a previously developed technique called MEDIRIAN (maximum entropy-based framework for inference of heterogeneity in dynamics of signaling networks) (Dixit et al., 2020). MERIDIAN infers the maximum entropy distribution pθ that reproduces a set of averages computed from experimental single-cell measurements.

MERIDIAN requires a mechanistic model of the signaling network that can predict cell’s response to extracellular perturbation (e.g., ligand) and user-specified population averages computed from experimental data. For the EGF/EGFR and IGF/FoxO networks, we used stochastic biochemical models described by Equations S11–S21, respectively. We use the moment closure approximation (Gillespie, 2009) to approximate the single-cell distributions using the first two moments (see below). The differential equations for the pathways can be found on the GitHub.

Now, we briefly describe the MERIDIAN approach. The entropy of any distribution p(θ) is given by

(S22) S=p(θ)logp(θ)dθ

In MERIDIAN, we find p(θ) that maximizes S while requiring it to reproduce a set of average constraints evaluated using experimental data. Following Dixit et al., 2020, entropy maximized p(θ) is given by the Gibbs–Boltzmann distribution

(S23) p(θ)=1Ωexp(mλmψm(θ))

where λm are the Lagrange multiplier corresponding to the mth constraint, ψmθ is the quantity whose average is constrained, and a Ω is the normalization constant.

Given a set of constraints (see below), the Lagrange multipliers can be numerically tuned such that the predictions from the distribution pθ match their experimental value. We optimize the Lagrange multipliers using gradient-based search (Equation S24) using the ADAM algorithm (Kingma, 2014). The gradients for minimizing a Lagrangian cost function are given by Dixit et al., 2020.

(S24) L=logΩ+mλmRmLλm=Rmψm(θ)θ

In Equation S24, ψm(θ)θ denotes the ensemble average computed using pθ and Rm are the corresponding measurements. We stop the iterative procedure when the mean absolute relative error 1Mm|Rmψm(θ)θ|Rm reaches a predefined value.

The predictions from the Max Ent model depend on the choice of the experimental constraints. To ensure that the constraints represent the entire range of single-cell behaviors, we opt for percentile constraints. Specifically, for experimentally collected single-cell data for a given condition (ligand dose, time point, etc.), we first approximate the single-cell histogram using a gamma distribution. Then, we identify abundances that represent 10th–90th percentiles of this distribution. The fraction of cells belonging to each of these percentile windows is exactly 10%. These become our experimental constraints Rem=0.1. Here, ‘e’ denotes the experimental condition (ligand dose, time point, etc.) and m[1,10] denotes the percentile window.

The corresponding model predictions of the fraction of cells in a given percentile window ψem(θ)θ are given by

(S25) ψem(θ)θ=p(θ)ψem(θ)dθ


(S26) ψem(θ)=lemuempe(r|θ)dr

In Equation S26, per|θ is the model-predicted single-cell distribution of responses (surface EGFR levels or nuclear FoxO levels) for experimental condition e. The integration bounds lem and uem represent the lower and upper bounds of the mth percentile window. The distribution per|θ in principle can be approximated by several runs of an explicit simulation using Gillespie’s algorithm (Billing et al., 2019). This may prove to be computationally expensive, especially when sampling through multiple parameter sets θ. Therefore, we resort to moment closure techniques (Gillespie, 2009) to approximate per|θ as a gamma distribution. We used a previously developed package called MOCA (moment closure analysis) (Schnoerr et al., 2015). We used a Gaussian moment closure to obtain the first and the second moments of the distributions. The moment closure approximation was quite accurate compared to the explicit Gillespie simulation and allowed us to rapidly predict single cell response distributions without performing multiple calculations (Figure 2—figure supplement 4).

The averages in Equation S25 cannot be computed analytically. We therefore resort to Markov chain Monte Carlo techniques to approximate them. Briefly, for a fixed set of Lagrange multipliers, we start 150 parallel MCMC chains in the parameter space with a starting point chosen randomly from the previous iteration. Each step in the MCMC calculation attempted to change between 1 and 5 parameters. The step size for the change was a uniform random number whose maximum was 10% of the parameter bounds for individual parameters. Each MCMC chain was run for approximately for 104 steps for the EGF/EGFR pathway and 2.5×105 for the IGF/FoxO pathway. The first 1000 steps were discarded, and parameter sets were stored every 50th step after that.

We used ADAM to optimize the Lagrange multipliers (Figure 2—figure supplement 5). The hyperparameters for ADAM were as follows: the exponential decay rate for the first moment estimates was set to 0.8. The exponential decay rate for the second-moment estimates was set to 0.999, the step size was 0.1. This procedure led to a decrease in the relative error. Using the final set of Lagrange multipliers, we sampled parameter sets that represented an in silico cell population. This population was used for further analysis (see below).

4. Numerical quantification of mutual information and channel capacity using live-cell imaging data

Request a detailed protocol

Here, we describe how we estimated cell state-specific mutual information from live-cell imaging data. The IGF/FoxO pathway reaches an approximate steady state within 30–45 min after the introduction of the IGF ligand. We used data collected on HeLa cells where cells were treated with subsequent doses of IGF every 90 min to approximate the single-cell response to multiple IGF levels. We approximated the single-cell steady-state nuclear FoxO distributions as gamma distributions and estimated their means and variances from data collected between 60 and 90 min of IGF stimulation (Figure 2—figure supplement 6). Using these first two moments, we approximated the single-cell response distribution p(r|u,θ) as a gamma distribution. The mutual information at the single-cell level and the population average cell state-specific mutual information ICee were obtained using these response distributions. For ICSA , the first two moments of the single-cell response distributions were averaged to obtain the population-level means and variances. This, along with the assumption that the responses were gamma distributions, provided the cell state-agnostic response distribution p(r|u) from which ICSA was obtained.

To verify that cell states are indeed conserved at the time scale of the experiment, we reanalyzed data generated by Gross et al., 2019 wherein cells were perturbed with IGF (37.5 pM), followed by a washout which allowed the cells to reach pre-stimulation nuclear FoxO levels, followed by a re-perturbation with the same amount of IGF. Nuclear FoxO response was measured at the single-cell level after 90 min with IGF exposure both these times. Since the response x to the same input u was measured twice in the same cell (x1 and x2), we could evaluate the intrinsic variability in response at the single-cell level. We then compared this intrinsic variability to the extrinsic cell state-dependent variability in the population.

To do so, we computed for each cell δ=x1-x2 the difference between the two responses. Figure 2—figure supplement 7 shows the histogram p(δ) as computed from the data (pink) and the same computed from the model that was trained on the single-cell data (blue). We also computed p(δ0), which represented the difference between responses of two different cells from the same population, shown for both data and the model.

As shown in Figure 2—figure supplement 7, the distribution p(δ) is significantly narrower than p(δ0), suggesting that intracellular variability is significantly smaller than across-population variability and that cells’ response to the same stimuli is quite conserved, especially when compared to responses in randomly picked pairs of cells. This shows that cell states and the corresponding response to extracellular perturbations are conserved, at least at the time scale of the experiment. Therefore, our estimates of cell-to-cell variability signaling fidelity are stable and reliable.

5. Model-predicted joint distributions pCeeMI(I,χ) for several biochemical parameters

Request a detailed protocol

See Figure 3—figure supplement 1.

6. Examining the role of cell state dynamics on sensing ability

Request a detailed protocol

To elucidate the role of cell state dynamics, we built a simple model of ligand–receptor system (Figure 4). We define the system as follows:

(S27) ϕkmprodmRNA,mRNAkmdegϕ
(S28) mRNAkprodmRNA+R
(S29) RLkbindB,BkunbindL+R
(S30) Rkdegϕ,Bkdegϕ

Here all cells are assigned identical kinetic parameters. Instead, the state of the system is defined by the abundances of each molecule prior to ligand dose. Thus, each cell may be described by two state variables, θ={R0,mRNA0}, where mRNA0 is the initial mRNA levels and R0 is the initial ligand-free receptor count. We tune the mRNA dynamics by changing mRNA production and degradation rates while keeping the average mRNA copy number constant. A relaxation of conditional responses requires both states have sufficiently high turnover, while we will modulate mRNA turnover, we will simply select a sufficiently high receptor turnover rate as described below.

For our simulations, we set kmprodkmdeg=5a.u., kprod=50a.u.s1 , kdeg=0.5s1, and kunbind=1s1 . We also set kmdeg=τ-1, where τ takes on five values, equally distributed in the log space, between 102 and 104 s. We select two distinct cells to study, θa={300,3} and θb={700,7}. For each cell and mRNA turnover time scale of interest, we introduce 20 different doses equally distributed in log scale, leading to values of Lkbind ranging from 10-2 to 103(s1). We ran 2500 Gillespie simulations for each condition to obtain approximate distributions of the response. We then fit these samples to gamma distributions to obtain conditional responses from which the channel capacity mutual information can be calculated using an approach similar to the one described in Section 1.

Data availability

The current manuscript is a computational study of previously collected data. No data have been generated for this manuscript. Scripts used to generate mutual information results as well the experimental data used to train our models can be obtained at: (copy archived at Goetz, 2024). Script for maximum entropy inference of cellular parameters can be obtained at: (copy archived at Akl, 2024).


  1. Book
    1. Cover TM
    2. Thomas JA
    Elements of Information Theory (2nd ed)
    Hoboken: Wiley-Interscience.
  2. Book
    1. Lewis Cantley TH
    2. Sever R
    Jeremy Thorner Signal Transduction: Principles, Pathways, and Processes (1st ed)
    New York: Cold Spring Harbor Laboratory Press.

Article and author information

Author details

  1. Andrew Goetz

    Department of Biomedical Engineering, Yale University, New Haven, United States
    Conceptualization, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Contributed equally with
    Hoda Akl
    Competing interests
    No competing interests declared
  2. Hoda Akl

    Department of Physics, University of Florida, Gainesville, United States
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Contributed equally with
    Andrew Goetz
    Competing interests
    No competing interests declared
  3. Purushottam Dixit

    1. Department of Biomedical Engineering, Yale University, New Haven, United States
    2. Systems Biology Institute, Yale University, West Haven, United States
    Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3282-0866


National Institute of General Medical Sciences (R35GM142547)

  • Andrew Goetz
  • Hoda Akl
  • Purushottam Dixit

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


AG, HA, and PD were supported by NIGMS grant R35GM142547. The authors thank Andre Levchenko for useful discussions.

Version history

  1. Preprint posted: March 13, 2023 (view preprint)
  2. Sent for peer review: March 30, 2023
  3. Preprint posted: June 22, 2023 (view preprint)
  4. Preprint posted: January 4, 2024 (view preprint)
  5. Version of Record published: January 31, 2024 (version 1)

Cite all versions

You can cite all versions using the DOI This DOI represents all versions, and will always resolve to the latest one.


© 2023, Goetz, Akl et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


  • 884
  • 50
  • 1

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Andrew Goetz
  2. Hoda Akl
  3. Purushottam Dixit
The ability to sense the environment is heterogeneously distributed in cell populations
eLife 12:RP87747.

Share this article

Further reading

    1. Cell Biology
    2. Physics of Living Systems
    Ivan Castello-Serrano, Frederick A Heberle ... Ilya Levental
    Research Article

    The organelles of eukaryotic cells maintain distinct protein and lipid compositions required for their specific functions. The mechanisms by which many of these components are sorted to their specific locations remain unknown. While some motifs mediating subcellular protein localization have been identified, many membrane proteins and most membrane lipids lack known sorting determinants. A putative mechanism for sorting of membrane components is based on membrane domains known as lipid rafts, which are laterally segregated nanoscopic assemblies of specific lipids and proteins. To assess the role of such domains in the secretory pathway, we applied a robust tool for synchronized secretory protein traffic (RUSH, Retention Using Selective Hooks) to protein constructs with defined affinity for raft phases. These constructs consist solely of single-pass transmembrane domains (TMDs) and, lacking other sorting determinants, constitute probes for membrane domain-mediated trafficking. We find that while raft affinity can be sufficient for steady-state PM localization, it is not sufficient for rapid exit from the endoplasmic reticulum (ER), which is instead mediated by a short cytosolic peptide motif. In contrast, we find that Golgi exit kinetics are highly dependent on raft affinity, with raft preferring probes exiting the Golgi ~2.5-fold faster than probes with minimal raft affinity. We rationalize these observations with a kinetic model of secretory trafficking, wherein Golgi export can be facilitated by protein association with raft domains. These observations support a role for raft-like membrane domains in the secretory pathway and establish an experimental paradigm for dissecting its underlying machinery.

    1. Microbiology and Infectious Disease
    2. Physics of Living Systems
    Chi Zhang, Rongjing Zhang, Junhua Yuan
    Research Article

    Bacteria in biofilms secrete potassium ions to attract free swimming cells. However, the basis of chemotaxis to potassium remains poorly understood. Here, using a microfluidic device, we found that Escherichia coli can rapidly accumulate in regions of high potassium concentration on the order of millimoles. Using a bead assay, we measured the dynamic response of individual flagellar motors to stepwise changes in potassium concentration, finding that the response resulted from the chemotaxis signaling pathway. To characterize the chemotactic response to potassium, we measured the dose–response curve and adaptation kinetics via an Förster resonance energy transfer (FRET) assay, finding that the chemotaxis pathway exhibited a sensitive response and fast adaptation to potassium. We further found that the two major chemoreceptors Tar and Tsr respond differently to potassium. Tar receptors exhibit a biphasic response, whereas Tsr receptors respond to potassium as an attractant. These different responses were consistent with the responses of the two receptors to intracellular pH changes. The sensitive response and fast adaptation allow bacteria to sense and localize small changes in potassium concentration. The differential responses of Tar and Tsr receptors to potassium suggest that cells at different growth stages respond differently to potassium and may have different requirements for potassium.