A mechanism for hunchback promoters to readout morphogenetic positional information in less than a minute
Abstract
Cell fate decisions in the fly embryo are rapid: hunchback genes decide in minutes whether nuclei follow the anterior/posterior developmental blueprint by reading out positional information in the Bicoid morphogen. This developmental system is a prototype of regulatory decision processes that combine speed and accuracy. Traditional arguments based on fixedtime sampling of Bicoid concentration indicate that an accurate readout is impossible within the experimental times. This raises the general issue of how speedaccuracy tradeoffs are achieved. Here, we compare fixedtime to onthefly decisions, based on comparing the likelihoods of anterior/posterior locations. We found that these more efficient schemes complete reliable cell fate decisions within the short embryological timescales. We discuss the influence of promoter architectures on decision times and error rates, present concrete examples that rapidly readout the morphogen, and predictions for new experiments. Lastly, we suggest a simple mechanism for RNA production and degradation that approximates the loglikelihood function.
Introduction
From development to chemotaxis and immune response, living organisms make precise decisions based on limited information cues and intrinsically noisy molecular processes, such as the readout of ligand concentrations by specialized genes or receptors (Houchmandzadeh et al., 2002; Perry et al., 2012; Takeda et al., 2012; Marcelletti and Katz, 1992; Bowsher and Swain, 2014). Selective pressure in biological decisionmaking is often strong, for reasons that range from predator evasion to growth maximization or fast immune clearance. In development, early embryogenesis of insects and amphibians unfolds outside of the mother, which arguably imposes selective pressure for speed to limit the risks of predation and infection by parasitoids (O'Farrell, 2015). In Drosophila embryos, the first 13 cycles of DNA replication and mitosis occur without cytokinesis, resulting in a multinucleated syncytium containing about 6000 nuclei (O'Farrell et al., 2004). Speed is witnessed both by the rapid and synchronous cleavage divisions observed over the cycles, and the successive fast decisions on the choice of differentiation blueprints, which are made in less than 3 min (Lucas et al., 2018).
In the early fly embryo, the map of the future body structures is set by the segmentation gene hierarchy (NüssleinVolhard et al., 1984; Houchmandzadeh et al., 2002; Jaeger, 2011). The definition of the positional map starts by the emergence of two (anterior and posterior) regions of distinct hunchback (hb) expression, which are driven by the readout of the maternal Bicoid (Bcd) morphogen gradient (Houchmandzadeh et al., 2002, Figure 1a). hunchback spatial profiles are sharp and the variance in hunchback expression of nuclei at similar positions along the AP axis is small (Desponds et al., 2016; Lucas et al., 2018). Taken together, these observations imply that the shorttime readout is accurate and has a low error. Accuracy ensures spatial resolution and the correct positioning of future organs and body structures, while low errors ensure reproducibility and homogeneity among spatially close nuclei. The amount of positional information available at the transcriptional locus is close to the minimal amount necessary to achieve the required precision (Gregor et al., 2007b; Porcher et al., 2010; Garcia et al., 2013; Petkova et al., 2019). Furthermore, part of the morphogenetic information is not accessible for reading by downstream mechanisms (Tikhonov et al., 2015), as information is channeled and lost through subsequent cascades of gene activity. In spite of that, by the end of nuclear cycle 14 the positional information encoded in the gap gene readouts is sufficient to correctly predict the position of each nucleus within 2% of the egg length (Petkova et al., 2019). Adding to the time constraints, mitosis resets the binding of transcription factors (TF) during the phase of synchronous divisions (Lucas et al., 2018), suggesting that the decision about the nuclei’s position is made by using information accessible within one nuclear cycle. Experiments additionally show that during the nuclear cycles 10–13 the positional information encoded by the Bicoid gradient is read out by hunchback promoters precisely and within 3 min (Lucas et al., 2018).
Effective speedaccuracy tradeoffs are not specific to developmental processes, but are shared by a large number of sensing processes (Rinberg et al., 2006; Heitz and Schall, 2012; Chittka et al., 2009). This generality has triggered interest in quantitative limits and mechanisms for accuracy. Berg and Purcell derived the seminal tradeoff between integration time and readout accuracy for a receptor evaluating the concentration of a ligand (Berg and Purcell, 1977) based on its average binding occupancy. Later studies showed that this limit takes more complex forms when rebinding events of detached ligands (Bialek and Setayeshgar, 2005; Kaizu et al., 2014) or spatial gradients (Endres and Wingreen, 2008) are accounted for. The accuracy of the averaging method in Berg and Purcell, 1977 can be improved by computing the maximum likelihood estimate of the time series of receptor occupancy for a given model (Endres and Wingreen, 2009; Mora and Wingreen, 2010). However, none of these approaches can result in a precise anteriorposterior (AP) decision for the hunchback promoter in the short time of the early nuclear cycles, which has led to the conclusion that there is not enough time to apply the fixedtime BergPurcell strategy with the desired accuracy (Gregor et al., 2007a). Additional mechanisms to increase precision (including internuclear diffusion) do yield a speedup (Erdmann et al., 2009; Aquino et al., 2016), yet they are not sufficient to meet the 3min challenge. The issue of the embryological speedaccuracy tradeoff remains open.
The approaches described above are fixedtime strategies of decisionmaking, that is, they assume that decisions are made at a predefined deterministic time T that is set long enough to achieve the desired level of error and accuracy. As a matter of fact, fixing the decision time is not optimal because the amount of available information depends on the specific statistical realization of the noisy signal that is being sensed. The time of decision should vary accordingly and therefore depend on the realization. This intuition was formalized by A. Wald by his Sequential Probability Ratio Test (SPRT) (Wald, 1945a). SPRT achieves optimality in the sense that it ensures the fastest decision strategy for a given level of expected error. The adaption of the method to biological sensing posits that the cell discriminates between two hypothetical concentrations by accumulating information through binding events, and by computing on the fly the ratio of the likelihoods (or appropriate proxies) for the two concentrations to be discriminated (Siggia and Vergassola, 2013). When the ratio ‘strongly’ favors one of the two hypotheses, a decision is triggered. The strength of the evidence required for decisionmaking depends on the desired level of error. For a given level of precision, the average decision time can be substantially shorter than for traditional averaging algorithms (Siggia and Vergassola, 2013). SPRT has also been proposed as an efficient model of decisionmaking in the fields of social interactions and neuroscience (Gold and Shadlen, 2007; Marshall et al., 2009) and its connections with nonequilibrium thermodynamics are discussed in Roldán et al., 2015.
Wald’s approach is particularly appealing for biological concentration readouts since many of them, including the anteriorposterior decision faced by the hunchback promoter, appear to be binary decisions. Our first goal here is to specifically consider the paradigmatic example of the hunchback promoter and elucidate the degree of speedup that can be achieved by decisions on the fly. Second, we investigate specific implementations of the decision strategy in the form of possible hunchback promoter architectures. We specifically ask how cooperative TF binding affects the sensing limits. Our results have implications beyond fly development and are generally relevant to regulatory processes. We identify promoter architectures that, by approximating Wald’s strategy, do satisfy several key experimental constraints and reach the experimentally observed level of accuracy of hunchback expression within the (apparently) very stringent time limits of the early nuclear cycles.
Results
Methodological setup
The decision process of the anterior vs posterior hunchback expression
The problem faced by nuclei in their decision of anterior vs posterior developmental fate is sketched in Figure 1a. By decision we mean that nuclei commit to a cell fate through a process that is mainly irreversible leading to one of two classes of cell states that correspond to either the anterior or the posterior regions of the embryo, based on positional information acquired through gene activity. We limit our investigation of promoter architectures to the six experimentally identified Bicoidbinding sites (Figure 1b). We do not consider the known Hunchbackbinding sites because before nuclear cycle 13 there is little time to produce sufficient concentrations of zygotic proteins for a significant feedback effect and the measured maternal hunchback profile has not been shown to alter anteriorposterior decisionmaking. Following the observation that Bcd readout is the leading factor in nuclei fate determination (OchoaEspinosa et al., 2009), we also neglect the role of other maternal gradients, for example Caudal, Zelda or Capicua (Jiménez et al., 2000; Sokolowski et al., 2012; Tran et al., 2018; Lucas et al., 2018), since the readout of these morphogens can only contribute additional information and decrease the decision time. We focus on the proximal promoter since no active enhancers have been identified for the hunchback locus in nuclear cycles 11–13 (Perry et al., 2011). Our results can be generalized to enhancers (Hannon et al., 2017), the addition of which only further improves the speedaccuracy efficacy, as we explicitly show for a simple model of Bicoid activated enhancers in the section ’Joint dynamics of Bicoid enhancer and promoter’. Since our goal is to show that accurate decisions can be made rapidly, we focus on the worst case decisionmaking scenario: the positional information (Wolpert et al., 2015) is gathered through a readout of the Bicoid concentration only, and the decision is assumed to be made independently in each nucleus. Having additional information available and/or coupling among nuclei can only strengthen our conclusion.
The profile of the average concentration of the maternal morphogen Bicoid $L(x)$ is well represented by an exponential function that decreases from the anterior toward the posterior of the embryo : $L(x)={L}_{0}{e}^{5(x{x}_{0})/100}$, where x is the position along the anteriorposterior axis measured in terms of percentage egglength (EL), and x_{0} is the position of half maximum hb expression corresponding to L_{0} Bcd concentration. The decay length $\lambda =5$ corresponds to 20% EL (Gregor et al., 2007a). Nuclei convert the graded Bicoid gradient into a sharp border of hunchback expression (Figure 1a), with high and low expressions of the hunchback promoter at the left and the right of the border respectively (Driever and NüssleinVolhard, 1988; Struhl et al., 1992; Crauk and Dostatni, 2005; Gregor et al., 2007b; Houchmandzadeh et al., 2002; Porcher et al., 2010; Garcia et al., 2013; Lucas et al., 2013; Tran et al., 2018; Lucas et al., 2018). We define the border region of width δ_{x} symmetrically around x_{0} by the dashed lines in Figure 1a. δ_{x} is related to the positional resolution (Erdmann et al., 2009; Tran et al., 2018) of the anteriorposterior decision: it is the minimal distance measured in percentages of egglength between two nuclei’s positions, at which the nuclei can distinguish the Bcd concentrations. Although this value is not known exactly, a lower bound is estimated as ${\delta}_{x}\sim 2\%$ EL (Gregor et al., 2007a), which corresponds to the width of one nucleus.
We denote the Bcd concentration at the anterior (respectively posterior) boundary of the border region by L_{1} (respectively L_{2}) (see Figure 1a). At each position x, nuclei compare the probability that the local concentration $L(x)$ is greater than L_{1} (anterior) or smaller than L_{2} (posterior). By using current best estimates of the parameters (see Appendix 1), a classic fixedtimedecision integration process and an integration time of 270 s (the duration of the interphase in nuclear cycle 11 [Tran et al., 2018]), we compute in Figure 1c the probability of error per nucleus for each position in the embryo (see Appendix 2 for details). As expected, errors occur overwhelmingly in the vicinity of the border region, where the decision is the hardest (Figure 1c). For nuclei located within the border region, both anterior and posterior decisions are correct since the nuclei lie close to both regions. It follows that, although the error rate can formally be computed in this region, the errors do not describe positional resolution mistakes and do not contribute to the total error (white zone in Figure 1c).
In view of Figure 1c and to simplify further analysis we shall focus on the boundaries of the border region : each nucleus discriminates between hypothesis 1 – the Bcd concentration is $L={L}_{1}$, and hypothesis 2 – the Bcd concentration is $L={L}_{2}$. To achieve a positional resolution of ${\delta}_{x}=2\%$ EL, nuclei need to be able to discriminate between differences in Bcd concentrations on the order of 10%. In addition to the variation in Bcd concentration estimates that are due to biological precision, concentration estimated using many trials follows a statistical distribution. The central limit theorem suggests that this distribution is approximately Gaussian. This assumption means that the probability that the Bcd concentration estimate deviates from the actual concentration by more than the prescribed 10% positional resolution is 32% (see the subsection 'How many nuclei make a mistake?’ for variations on the value and arguments on the error rate). In Figure 1d, we show that the time required under a fixedtimedecision strategy for a promoter with six binding sites to estimate the Bcd concentration within the 32% Gaussian error rate (Gregor et al., 2007a) close to the boundary is much longer than 270 s, $\simeq 40$ minutes (see Appendix 2 for details of the calculation). The activation rule for the promoter architecture in the figure is that all binding sites need to be bound for transcription initiation.
Identifying fast decision promoter architectures
The promoter model
We model the hb promoter as six Bcd binding sites (Schröder et al., 1988; Driever et al., 1989; Struhl et al., 1989; OchoaEspinosa et al., 2005) that determine the activity of the gene (Figure 2a). Bcd molecules bind to and unbind from each of the $i=1,\mathrm{\dots},6$ sites with rates μ_{i} and ν_{i}, which are allowed to be different for each site. For simplicity, the gene can only take two states : either it is silenced (OFF) and mRNA is not produced, or the gene is activated (ON) and mRNA is produced at full speed. While models that involve different levels of polymerase loading are biologically relevant and interesting, the simplified model allows us to gain more intuition and follows the worstcase scenario logic that we discussed in the previous subsection ’The decision process of the anterior vs posterior hunchback expression’. The same remark applies for the wide variety of promoter architectures considered in previous works (Estrada et al., 2016; Tran et al., 2018). In particular, we assume that only the number of sites that are bound matters for gene activation (and not the specific identity of the sites). Such architectures are again a subset of the range of architectures considered in Estrada et al., 2016; Tran et al., 2018.
The dynamics of our model is a Markov chain with seven states with probability P_{i} corresponding to the number of sites occupied: from all sites unoccupied (probability P_{0}) to all six sites bound by Bcd molecules (probability P_{6}). The minimum number k of bound Bicoid sites required to activate the gene divides this chain into the two disjoint and complementary subsets of active states (${P}_{\text{ON}}={\sum}_{i=k}^{6}{P}_{i}$, for which the gene is activated) and inactive states (${P}_{\text{OFF}}={\sum}_{i=0}^{k1}{P}_{i}$, for which the gene is silenced) as illustrated in Figure 2b and d.
As Bicoid ligands bind and unbind the promoter (Figure 2b), the gene is successively activated and silenced (Figure 2c). This binding/unbinding dynamics results in a series of OFF and ON activation times that constitute all the information about the Bcd concentration available to downstream processes to make a decision. We note that the idea of translating the statistics of bindingunbinding times into a decision remains the same as in the BergPurcell approach, where only the activation times are translated into a decision (and not the deactivation times). The promoter architecture determines the relationship between Bcd concentration and the statistics of the ONOFF activation time series, which makes it a key feature of the positional information decision process. Following (Siggia and Vergassola, 2013), we model the decision process as a Sequential Probability Ratio Test (SPRT) based on the time series of gene activation. At each point in time, SPRT computes the likelihood P of the observed time series under both hypotheses $P({L}_{1})$ and $P({L}_{2})$ and takes their ratio : $R(t)=P({L}_{1})/P({L}_{2})$. The logarithm of $R(t)$ undergoes stochastic changes until it reaches one of the two decision threshold boundaries K or −K (symmetric boundaries are used here for simplicity) (Figure 2d). The decision threshold boundaries K are set by the error rate e for making the wrong decision between the hypothetical concentrations : $K=\text{log}\left((1e)/e\right)$ (see Siggia and Vergassola, 2013 and Appendix 3). The choice of K or e depends on the level of reproducibility desired for the decision process. We set $K\simeq 0.75$, corresponding to the widely used error rate $e\simeq 0.32$ of being more than one standard deviation away from the mean of the estimate for the concentration assumed to be unbiased in a Gaussian model (see the subsection ’The decision process of the anterior vs posterior hunchback expression’). The statistics of the fluctuations in likelihood space are controlled by the values of the Bcd concentrations: when Bcd concentration is low, small numbers of Bicoid ligands bind to the promoter (Figure 2e) and the hb gene spends little time in the active expression state (Figure 2f), which leads to a negative drift in the process and favors the lower one of the two possible concentrations (Figure 2g).
Mean decision time: connecting driftdiffusion and Wald’s approaches
In this section, we develop new methods to determine the statistics of gene switches between the OFF and ON expression states. Namely, by relating Wald’s approach (Wald, 1945a) with driftdiffusion, we establish the equality between the drift and diffusion coefficients in decision making space for difficult decision problems, that is, when the discrimination is hard, we elucidate the reason underlying the equality. That allows us to effectively determine longterm properties of the likelihood logratio and compute mean decision times for complex architectures.
A gene architecture consists of N binding sites and is represented by N + 1 Markov states corresponding to the number of bound TF, and the rates at which they bind or unbind (Figure 3a). For a given architecture, the dynamics of binding/unbinding events and the rules for activation define the two probability distributions ${P}_{\text{OFF}}(t,L)$ and ${P}_{\text{ON}}(s,L)$ for the duration of the OFF and ON times, respectively (Figure 3b). The two series are denoted ${\{{t}_{i}\}}_{1\le i\le {J}^{+}}$ and ${\{{s}_{j}\}}_{1\le j\le {J}^{}}$, where ${J}^{+}$ and ${J}^{}$ are the number of switching events in time t from OFF to ON and vice versa. For those cases where the two concentrations L_{1} and L_{2} are close and the discrimination problem is difficult (which is the case of the Drosophila embryo), an accurate decision requires sampling over many activation and deactivation events to achieve discrimination. The logarithm of the ratio $R(t)$ can then be approximated by a drift–diffusion equation: $d\text{log}R(t)/dt=V+\sqrt{2D}\eta $, where V is the constant drift, that is, the bias in favor of one of the two hypotheses, D is the diffusion constant and $\eta $ a standard Gaussian white noise with zero mean and deltacorrelated in time (Wald, 1945a; Siggia and Vergassola, 2013). The decision time for the case of symmetric boundaries $K={K}_{}={K}_{+}$, is given by the mean firstpassage time for this biased random walk in the loglikelihood space (Redner, 2001; Siggia and Vergassola, 2013):
Note that in this approximation all the details of the promoter architecture are subsumed into the specific forms of the drift V and the diffusion D.
Drift. We assume for simplicity that the time series of OFF and ON times are independent variables (when this assumption is relaxed, see Appendix 8). This assumption is in particular always true when gene activation only depends on the number of bound Bicoid molecules. Under these assumptions, we can apply Wald’s equality (Wald, 1945b; Durrett, 2010) to the loglikelihood ratio, $\text{log}R(t)$. Wald considered the sum of a random number M of independent and identically distributed (i.i.d.) variables. The equality that he derived states that if M is independent of the outcome of variables with higher indices ${({X}_{i})}_{i>M}$ (i.e. M is a stopping time), then the average of the sum is the product $\u27e8M\u27e9\u27e8{X}_{i}\u27e9$.
Wald’s equality applies to our likelihood sum (${\sum}_{i}^{M}\text{log}{R}_{i}$ of the likelihoods, where M is the number of ON and OFF times before a given (large) time t). We conclude the drift of the loglikelihood ratio, $\text{log}R(t)$, is inversely proportional to $({\tau}_{\text{ON}}+{\tau}_{\text{OFF}})$, where ${\tau}_{\text{ON}}$ is the mean of the distribution of ON times ${P}_{\text{ON}}(t,L)$ and ${\tau}_{\text{OFF}}$ is the mean of the distribution of OFF times ${P}_{\text{OFF}}(s,L)$. The term $({\tau}_{\text{ON}}+{\tau}_{\text{OFF}})$ determines the average speed at which the system completes an activation/deactivation cycle, while the average $\u27e8\text{log}{R}_{i}\u27e9$ describes how much deterministic bias the system acquires on average per activation/deactivation cycle. The latter can be reexpressed in terms of the KullbackLeibler divergence ${D}_{KL}(fg)={\int}_{0}^{\mathrm{\infty}}d{t}^{\prime}f({t}^{\prime})\text{log}(f({t}^{\prime})/g({t}^{\prime}))$ between the distributions of the OFF and ON times calculated for the actual concentration L and each one of the two hypotheses, L_{1} and L_{2} :
Equation 2 quantifies the intuition that the drift favors the hypothetical concentration with the time distribution which is the closest to that of the real concentration L (Figure 3b).
Diffusivity : Why it is more involved to calculate and how we circumvent it. While the drift has the closed simple form in Equation 2, the diffusion term is not immediately expressed as an integral. The qualitative reason is as follows. Computing the likelihood of the two hypotheses requires computing a sum where the addends are stochastic (ratios of likelihoods) and the number of terms is also stochastic (the number of switching events). These two random variables are correlated: if the number of switching events is large, then the times are short and the likelihood is probably higher for large concentrations. While the drift is linear in the above sum (so that the average of the sum can be treated as shown above), the diffusivity depends on the square of the sum. The diffusivity involves then the correlation of times and ratios (CarballoPacheco et al., 2019), which is harder to obtain as it depends a priori on the details of the binding site model (see the subsection ’Equality between drift and diffusivity’ and the subsection ’When are correlations between the times of events leading to decision important?’ of Appendix 3 for details).
We circumvent the calculation of the diffusivity by noting that the same methods used to derive Equation (1) also yield the probability of first absorption at one of the two boundaries, say +K (see the subsection ’Equality between drift and diffusivity’ of Appendix 3):
By imposing ${\mathrm{\Pi}}_{K}=1e$, we obtain $VK/D=\text{log}\left((1e)/e\right)$ and the comparison with the expression of $K=\text{log}\left((1e)/e\right)$ leads to the equality $D=V$.
The above equality is expected to hold for difficult decisions only. Indeed, driftdiffusion is based on the continuity of the loglikelihood process and Wald’s arguments assume the absence of substantial jumps in the loglikelihood over a cycle. In other words, the two approaches overlap if the hypotheses to be discriminated are close. For very distinct hypotheses (easy discrimination problems), the two approaches may differ from the actual discrete process of decision and among themselves. We expect then that $V=D$ holds only for hypotheses that are close enough, which is verified by explicit examples (see the subsection ’Equality between drift and diffusivity’ of Appendix 3). The Appendix subsection also verifies $V=D$ by expanding the general expression of V and D for close hypotheses. The origin of the equality is discussed below.
Using $V=D$, we can reduce the general formula Equation (1) to
which is formula 4.8 in Wald, 1945a and it is the expression that we shall be using (unless stated otherwise) in the remainder of the paper.
The additional consequence of the equality $V=D$ is that the argument $VK/2D$ of the hyperbolic tangent in Equation (1) is $\simeq K/2=\text{ln}\left(\sqrt{(1e)/e}\right)$. It follows that for any problem where the error $e\ll 1$, the argument of the hyperbolic tangent is large and the decision time is weakly dependent on deviations to $V=D$ that occur when the two hypotheses differ substantially. A concrete illustration is provided in the subsection ’The first passage time to decision’ of Appendix 3.
Single binding site example. As an example of the above equations, we consider the simplest possible architecture with a single binding site ($N=1$), where the gene activation and deactivation processes are Markovian. In this case, the deactivation rate ν is independent of TF concentration and the activation rate is exponentially distributed ${P}_{\text{OFF}}(L,t)=kL{e}^{kLt}$. We can explicitly calculate the drift $V=\nu kL/(\nu +kL)(\text{log}({L}_{1}/{L}_{2})+({L}_{2}{L}_{1})/L)$ (Equation 2) and expand it for ${L}_{2}=L$ and ${L}_{1}=L+\delta L$, at leading order in $\delta L$. Inserting the resulting expression into Equation 4, we conclude that
decreases with increasing relative TF concentration difference $\delta L/L$ and gives a very good approximation of the complete formula (see Appendix 3—figure 2a–c with different values of $\delta L/L$).
Equations 2 and 4 greatly reduce the complexity of evaluating the performance of architectures, especially when the number of binding sites is large. Alternatively, computing the correlation of times and loglikelihoods would be increasingly demanding as the size of the gene architecture transfer matrices increase. As an illustration, Figure 3 compares the performance of different activation strategies : the 2ormore rule ($k=2$), which requires at least two Bcdbinding sites to be occupied for hb promoter activation (Figure 3a–d in blue), and the 4ormore rule ($k=4$) (Figure 3a–d in red) for fixed binding and unbinding parameters. Figure 3c shows that stronger drifts lead to faster decisions. The full decision time probability distribution is computed from the explicit formula for its Laplace transform (Siggia and Vergassola, 2013, Figure 3d). With the rates chosen for Figure 3, the $k=4$ rule leads to an ON time distribution that varies strongly with the concentration, making it easier to discriminate between similar concentrations: it results in a stronger average drift that leads to a faster decision than $k=2$ (Figure 3d).
What is the origin of the $V\mathrm{=}D$ equality? The special feature of the SPRT random process is that it pertains to a loglikelihood. This is at the core of the $V=D$ equality that we found above. First, note that the equality is dimensionally correct because loglikelihoods have no physical dimensions so that both V and D have units of $tim{e}^{1}$. Second, and more important, loglikelihoods are built by Bayesian updating, which constrains their possible variations. In particular, given the current likelihoods ${P}_{1}(t)={\mathrm{\Pi}}_{j}{P}_{\text{ON}}({t}_{j},{L}_{1}){P}_{\text{OFF}}({s}_{j},{L}_{1})$ and ${P}_{2}(t)={\mathrm{\Pi}}_{j}{P}_{\text{ON}}({t}_{j},{L}_{2}){P}_{\text{OFF}}({s}_{j},{L}_{2})$ at time t for the two concentrations L_{1} and L_{2} and the respective probabilities ${Q}_{1}(t)={P}_{1}/({P}_{1}+{P}_{2})$ and ${Q}_{2}(t)=1{Q}_{1}$ of the two hypotheses, it must be true that the expected values after a certain time $\mathrm{\Delta}t$ remain the same if the expectation is taken with respect to the current ${P}_{i}(t)$ (see, e.g. Reddy et al., 2016). In formulae, this implies that the average variation of the probability ${Q}_{2}$ over a given time $\mathrm{\Delta}t$ that is
should vanish (see the subsection 'Equality between drift and diffusivity’ of Appendix 3 for a derivation). Here, ${\u27e8\mathrm{\Delta}{Q}_{2}\u27e9}_{1}$ is the expected variation of ${Q}_{2}$ under the assumption that hypothesis 1 is true and ${\u27e8\mathrm{\Delta}{Q}_{2}\u27e9}_{2}$ is the same quantity but under the assumption that hypothesis 2 is true. We notice now that ${Q}_{2}(t)=\frac{1}{1+{e}^{\mathcal{L}(t)}}$, where $\mathcal{L}=\text{log}R$ is the loglikelihood, and that the driftdiffusion of the loglikelihood implies that ${\u27e8\mathrm{\Delta}\mathcal{L}\u27e9}_{1}=V\mathrm{\Delta}t$, ${\u27e8\mathrm{\Delta}\mathcal{L}\u27e9}_{2}=V\mathrm{\Delta}t$ and ${\u27e8{\left(\mathrm{\Delta}\mathcal{L}{\u27e8\mathrm{\Delta}\mathcal{L}\u27e9}_{1}\right)}^{2}\u27e9}_{1}={\u27e8{\left(\mathrm{\Delta}\mathcal{L}{\u27e8\mathrm{\Delta}\mathcal{L}\u27e9}_{2}\right)}^{2}\u27e9}_{2}=2D\mathrm{\Delta}t$. By using that $d{Q}_{2}/d\mathcal{L}={Q}_{1}{Q}_{2}$ and ${d}^{2}{Q}_{2}/d{\mathcal{L}}^{2}={Q}_{1}{Q}_{2}({Q}_{2}{Q}_{1})$, we finally obtain that
and imposing $\u27e8\mathrm{\Delta}{Q}_{2}\u27e9=0$ yields the equality $V=D$. Note that the above derivation holds only for close hypotheses, otherwise the velocity and the diffusivity under the two hypotheses do not coincide.
Additional embryological constraints on promoter architectures
In addition to the requirements imposed by their performance in the decision process (green dashed line in Figure 4a), promoter architectures are constrained by experimental observations and properties that limit the space of viable promoter candidates for the fly embryo. A discussion about their possible function and their relation to downstream decoding processes is deferred to the final section.
First, we require that the average probability for a nucleus to be active in the boundary region is equal to 0.5, as it is experimentally observed (Lucas et al., 2013; Figure 1a). This requirement mainly impacts and constrains the ratio between binding rates ${\mu}_{i}$ and unbinding rates ${\nu}_{i}$.
Second, there is no experimental evidence for active search mechanisms of Bicoid molecules for its targets. It follows that, even in the best case scenario of a Bcd ligand in the vicinity of the promoter always binding to the target, the binding rate is equal to the diffusion limited arrival rate ${\mu}_{\text{max}}L\simeq 0.124{s}^{1}$ (Appendix 1). As a result, the binding rates ${\mu}_{i}$ are limited by diffusion arrivals and the number of available binding sites: ${\mu}_{i}\le {\mu}_{\text{max}}(7i)$ (black dashed line in Figure 4b), where L is the concentration of Bicoid. This sets the timescale for binding events. In Appendix 1, we explore the different measured values and estimates of parameters defining the diffusion limit ${\mu}_{\text{max}}L$ and their influence on the decision time (see Appendix 1—table 1 for all the predictions).
Third, as shown in Figure 1, the hunchback response is sharp, as quantified by fitting a Hill function to the expression level vs position along the egg length. Specifically, the hunchback expression (in arbitrary units) ${f}_{\text{hb}}$ is well approximated as a function of the Bicoid concentration $L(x)$ by the Hill function ${f}_{\text{hb}}(x)\simeq L{(x)}^{H}/(L{(x)}^{H}+{L}_{0}^{H})$, where ${L}_{0}$ is the Bcd concentration at the halfmaximum hb expression point and H is the Hill coefficient (Figure 1a). Experimentally, the measured Hill coefficient for mRNA expression from the WT hb promoter is $H\sim 78$(Lucas et al., 2018; Tran et al., 2018). Recent work (Tran et al., 2018) suggests that these high values might not be achieved by Bicoidbinding sites only. Given current parameter estimates and an equilibrium binding model, (Tran et al., 2018) shows that a Hill coefficient of 7 is not achievable within the duration of an early nuclear cycle ($\simeq 5$ min). That points at the contribution of other mechanisms to pattern steepness. Given these reasons (and the fact that we limit ourselves only to a model with six equilibrium Bcdbinding sites only), we shall explore the space of possible equilibrium promoter architectures limiting the steepness of our profiles to Hill coefficients $H\sim 45$.
Numerical procedure for identifying fast decisionmaking architectures
Using Equations 2 and 4, we explore possible hb promoter architectures and activation rules to find the ones that minimize the time required for an accurate decision, given the constraints listed in the paragraph ‘Additional embryological constraints on promoter architectures’. We optimize over all possible binding rates ${({\mu}_{i})}_{1\le i\le 6}$ (μ_{1} is the binding rate of the first Bcd ligand and ${\mu}_{6}$ the binding rate of the last Bcd ligand when 5 Bcd ligands are already bound to the promoter), and the unbinding rates ${({\nu}_{i})}_{1\le i\le 6}$ (ν_{1} is the unbinding rate of a single Bcd ligand bound to the promoter and ${\nu}_{6}$ is the unbinding rate of all Bcd ligands when all six Bcdbinding sites are occupied). We also explore different activation rules by varying the minimal number of Bcd ligands k required for activation in the kormore activation rule (Estrada et al., 2016; Tran et al., 2018). We use the most recent estimates of biological constants for the hb promoter and Bcd diffusion (see Appendix 1) and set the error rate at the border to 32% (Gregor et al., 2007b; Petkova et al., 2019). Reasons for this choice were given in the subsection ‘The decision process of the anterior vs posterior hunchback expression’ and will be revisited in the subsection ‘How many nuclei make a mistake?’, where we shall introduce some embryological considerations on the number of nuclei involved in the decision process and determine the error probability accordingly. The optimization procedure that minimized the average decision time for different values of k and H is implemented using a mixed strategy of multiple random starting points and steepest gradient descent (Figure 4a).
Logic and properties of the identified fast decision architectures
The main conclusion we reach using the methodology presented in the 'Methodological setup' section is that there exist promoter architectures that reach the required precision within a few minutes and satisfy all the additional embryological constraints that were discussed previously (Figure 4a). The fastest promoters (blue crosses in Figure 4a) reach a decision within the time of nuclear cycle 11 (green line in Figure 4a) for a wide range of activation rules. Even imposing steep readouts ($H>4$) allows us to identify relatively fast promoters, although imposing the nuclear cycle time limit, pushes the activation rule to smaller k. Interestingly, we find that the fastest architectures identified perform well over a range of high enough concentrations (Appendix 4—figure 1). The optimal architectures differ mainly by the distribution of their unbinding rates (Figure 4b). We shall now discuss their properties, namely the binding times of Bicoid molecules to the DNA binding sites, and the dependence of the promoter activity on various features, such as activation rules and the number of binding sites in detail. Together, these results elucidate the logic underlying the process of fast decisionmaking.
How many nuclei make a mistake?
The precision of a stochastic readout process is defined by two parameters: the resolution of the readout $\delta x$, and the probability of error, which sets the reproducibility of the readout. In Figure 4, we have used the statistical Gaussian error level (32%) to obtain our results. However, the error level sets a crucial quantity for a developing organism and it is important to connect it with the embryological process, namely how many nuclei across the embryo will fail to properly decide (whether they are positioned in the anterior or in the posterior part of the embryo). To make this connection, we compute this number for a given average decision time t and we integrate the error probability along the AP axis to obtain the error per nucleus ${e}_{s}$. The expected number of nuclei that fail to correctly identify their position is given by $\u27e8{n}_{\text{error}}\u27e9={e}_{s}{2}^{c1}$, where c is the nuclear cycle and we have neglected the loss due to yolk nuclei remaining in the bulk and arresting their divisions after cycle 10 (Foe and Alberts, 1983). Assuming a 270 s readout time – the total interphase time of nuclear cycle 11 (Tran et al., 2018) – for the fastest architecture identified above and an error rate of 32%, we find that $\u27e8{n}_{\text{error}}\u27e9\simeq 0.3$, that is an essentially failproof mechanism. This number can be compared with >30 nuclei in the embryo that make an error in a $\u27e8T\u27e9=270\mathrm{s}$ readout in a BergPurcelllike fixedtime scheme (integrated blue area in Figure 1c).
Conversely, for a given architecture, reducing the error level increases drastically the mean firstpassage time to decision: the mean time for decision as a function of the error rate for the fastest architecture identified with $H>5$ and $k=1$ is shown in Appendix 2—figure 1. The decision can be made in about a minute for $e=32\%$ but requires on average 10 min for $e=10\%$ (Appendix 2—figure 1). Note that, because the mean firstpassage depends simply on the inverse of the drift per cycle (Equation 4), the relative performance of two architectures is the same for any error rate so that the fastest architectures identified in Figure 4a are valid for all error levels.
Just like for the fixedtime strategy (Figure 1c and d), nuclei located in the midembryo region are more likely to make mistakes and take longer on average to trigger a decision (Appendix 4—figure 2).
Residence times among the various states
As shown in Tran et al., 2018, high Hill coefficients in the hunchback response are associated with frequent visits of the extreme expression states where available binding sites are either all empty (state 0), or all occupied (state 6). Figure 4c provides a concrete illustration by showing the distribution of residence times for the promoter architectures that yield the fastest decision times for $k=3$ and no constraints (blue bars), $H>4$ (red bars) and $H>5$ (black bars). When there are no constraints on the slope of the hunchback response, the most frequently occupied states are close to the ONOFF transition (2 and 3 occupied binding sites in Figure 4c) to allow for fast back and forth between the active and inactive states of the gene and thereby gather information more rapidly by reducing ${\tau}_{\text{ON}}+{\tau}_{\text{OFF}}$ (see formulae 2 and 4).
We notice that for higher Hill coefficients, the system transits quickly through the central states (in particular states with 3 and 4 occupied Bcd sites, Figure 4 red and black bars). As expected for high Hill coefficients, such dynamics requires high cooperativity. Cooperativity helps the recruitment of extra transcription factors once one or two of them are already bound and thus speeds up the transitioning through the states with 2, 3 and 4 occupied binding sites. An even higher level of cooperativity is required to make TF DNA binding more stable when 5 or 6 of them are bound, reducing the OFF rates ${\nu}_{5}$ or ${\nu}_{6}$ (Figure 4b).
The (short) binding times of Bicoid on DNA
The distribution of times spent bound to DNA of individual Bicoid molecules is shown in Figure 4d obtained from Monte Carlo simulations using rates from the fastest architecture with $H>5$ and $k=2$. We find an exponential decay, an average bound time of about 7.1 s and a median around 0.5 s. Our mediantimebound prediction is of the same order of magnitude as the observed bound times seen in recent experiments by Mir et al., 2017; Mir et al., 2018, who found short (mean ∼0.629 s and median ∼0.436 s based on exponential fits), yet quite distinguishable from background, bound times to DNA. These results were considered surprising because it seemed unclear how such short binding events could be consistent with the processing of ON and OFF gene switching events. Our results show that such short binding times may actually be instrumental in achieving the tradeoff between accuracy and speed, and rationalize how longer activation events are still achieved despite the fast binding and unbinding. High cooperativity architectures lead to nonexponential bound times to DNA (Figure 4d) for which the typical bound time (median) is short but the tail of the distribution includes slower dynamics that can explain longer activation events (the mean is much larger than the median). This result suggests that cells can use the bursty nature of promoter architectures to better discriminate between TF concentrations.
In Mir et al., 2017, the raw distribution comprises both nonspecific and specific binding and cannot be directly compared to simulation results. Instead, we use the largest of the two exponents fit for the boundary region (Mir et al., 2017), which should correspond to specific binding. The agreement between the distributions in Figure 4d is overall good, and we ascribe discrepancies to the fact that (Mir et al., 2017) fit two exponential distributions assuming the observed times were the convolution of exponential specific and nonspecific binding times. Yet the true specific binding time distribution is likely not exponential, e.g. due to the effect of binding sites having different binding affinities. We show in Appendix 5—figure 1 that the two distributions are very similar and hard to distinguish once they are mixed with the nonspecific part of the distribution.
Activation rules
In the parameter range of the early fly embryo, the fastest decisionmaking architectures share the oneormore ($k=1$) activation rule : the promoter switches rapidly between the ON and OFF expression states and the extra binding sites are used for increasing the size of the target rather than building a more complex signal. Architectures with $k=2$ and $k=3$ activation rules can make decisions in less than 270 s and satisfy all the required biological constraints. Generally speaking, our analysis predicts that fast decisions require a small number of Bicoidbinding sites (less than three) to be occupied for the gene to be active. The advantage of the $k=2$ or $k=3$ activation rules is that the ON and OFF times are on average longer than for $k=1$, which makes the downstream processing of the readout easier. We do not find any architecture satisfying all the conditions for the $k=4,5,6$ activation rules, although we cannot exclude there could be some architectures outside of the subset that we managed to sample, especially for the $k=4$ activation rules where we did identify some promoter structures that are close to the time constraint.
Activation rules with higher k can give higher information per cycle for the ON rate, yet they do not seem to lead to faster decisions because of the much longer duration of the cycles. To gain insight on how the tradeoff between fast cycles and information affects the efficiency of activation rules, we consider architectures with only two binding sites, which lend to analytical understanding (Figure 5a and b). Both of these considered architectures are out of equilibrium and require energy consumption (as opposed to the two equilibrium architectures of Figure 5c and d).
When is 1ormore faster than allornothing activation?
A first model has the promoter consisting of two binding sites with the allornothing rule $k=2$ (Figure 5a). We consider the mathematically simpler, although biologically more demanding, situation where TFs cannot unbind independently from the intermediate states – once one TF binds, all the binding sites need to be occupied before the promoter is freed by the unbinding with rate ν of the entire complex of TFs. This situation can be formulated in terms of a nonequilibrium cycle model, depicted for two binding sites in Figure 5a. The activation time is a convolution of the exponential distributions ${P}_{\text{OFF}}(t,L)=\frac{{\mu}_{1}{\mu}_{2}L}{{\mu}_{2}{\mu}_{1}}\left({e}^{{\mu}_{1}Lt}{e}^{{\mu}_{2}Lt}\right)$. In the simple case, when the two binding rates are similar (${\mu}_{1}\simeq {\mu}_{2}$), the OFF times follow a Gamma distribution and the drift and diffusion can be computed analytically (see Appendix 4). When the two binding rates are not similar the drift and diffusion must be obtained by numerical integration (see Appendix 4).
In the first model described above (Figure 5a), deactivation times are independent of the concentration and do not contribute to the information gained per cycle and, as a result, to $V/({\tau}_{\text{ON}}+{\tau}_{\text{OFF}})$. To explore the effect of deactivation time statistics on decision times, we consider a cycle model where the gene is activated by the binding of the first TF (the 1ormore $k=1$ rule) and deactivation occurs by complete unbinding of the TFs complex (Figure 5b). The resulting activation times are exponentially distributed and contribute to drift and diffusion as in the simple two state promoter model (Figure 5d). The deactivation times are a convolution of the concentrationdependent second binding and the concentrationindependent unbinding of the complex and their probability distribution is ${P}_{\text{ON}}(t,L)=\frac{\nu {\mu}_{2}L}{\nu {\mu}_{2}L}\left({e}^{{\mu}_{2}Lt}{e}^{\nu t}\right)$. Drift and diffusion can be obtained analytically (Appendix 4). The concentrationdependent deactivation times prove informative for reducing the mean decision time at low TF concentrations but increase the decision time at high TF concentrations compared to the simplest irreversible binding model. In the limit of unbinding times of the complex ($1/\nu $) much larger than the second binding time ($1/{\mu}_{2}L$), no information is gained from deactivation times. In the limit of ${\mu}_{1}L/\nu \to \mathrm{\infty}$, the $k=1$ model reduces to a one binding site exponential model and the two architectures (Figure 5b and d) have the same asymptotic performance.
Within the irreversible schemes of Figure 5a and Figure 5b, the average time of one activation/deactivation cycle is the same for the allornothing $k=2$ and 1ormore $k=1$ activation schemes. The difference in the schemes comes from the information gained in the drift term $V/({\tau}_{\text{ON}}+{\tau}_{\text{OFF}})$, which begs the question : is it more efficient to deconvolve the second binding event from the first one within the allornothing $k=2$ activation scheme, or from the deactivation event in the $k=1$ activation scheme?
In general, the convolution of two concentrationdependent events is less informative than two equivalent independent events, and more informative than a single binding event. For small concentrations L, activation events are much longer than deactivation events. In the $k=1$ scheme, OFF times are dominated by the concentrationdependent step ${\mu}_{2}L$ and the two activation events can be read independently. This regime of parameters favors the $k=1$ rule (Figure 5e). However, when the concentration L is very large the two binding events happen very fast and for ${\mu}_{2}L\gg \nu $, in the $k=1$ scheme, it is hard to disentangle the binding and the unbinding events. The information gained in the second binding event goes to 0 as $L\to \mathrm{\infty}$ and the oneormore $k=1$ activation scheme (Figure 5b) effectively becomes equivalent to a single binding site promoter (Figure 5d), making the allornothing $k=2$ activation (Figure 5a) scheme more informative (Figure 5e). The fastest decision time architecture systematically convolves the second binding event with the slowest of the other reactions (Figure 5e), with the transition between the two activation schemes when the other reactions have exactly equal rates (${\mu}_{1}L=\nu $ line in Figure 5e) (see Appendix 6 for a derivation).
How the number of binding sites affects decisions
The above results have been obtained with six binding sites. Motivated by the possibility of building synthetic promoters (Park et al., 2019) or the existence of yet undiscovered binding sites, we investigate here the role of the number of binding sites. Our results suggest that the main effect of additional binding sites in the fly embryo is to increase the size of the target (and possibly to allow for higher cooperativity and Hill coefficients). To better understand the influence of the number of binding sites on performance at the diffusion limit, we compare a model with one binding site (Figure 5d) to a reversible model with two binding sites where the gene is activated only when both binding sites are bound (allornothing $k=2$, Figure 5c). Just like for the six binding site architectures, we describe this two binding site reversible model by using the transition matrix of the $N+1$ Markov chain and calculate the total activation time ${P}_{\text{OFF}}(t,L)$.
For fixed values of the real concentration L, the two hypothetical concentrations L_{1} and L_{2}, the error e and the second offrate ${\nu}_{2}$, we optimize the remaining parameters μ_{1}, μ_{2} and ν_{1} for the shortest average decision time.
For high gene deactivation rates ${\nu}_{2}$, the fastest decision time is achieved by a promoter with one binding site (Figure 5f): once one ligand has bound, the promoter never goes back to being completely unbound (${\nu}_{1}^{*}$=0 in Figure 5c) but toggles between one and two bound TF (Figure 5d with $\nu ={\nu}_{2}$ and $\mu ={\mu}_{2}$). For lower values of gene deactivation rates ${\nu}_{2}$, there is a sharp transition to a minimal $\u27e8T\u27e9$ solution using both binding sites. In the allornothing activation scheme that is used here, the distribution of deactivation times is ligand independent and the concentration is measured only through the distribution of activation times, which is the convolution of the distributions of times spent in the 0 and 1 states before activation in the two state. For very small deactivation ${\nu}_{2}$ rates, it is more informative to 'measure’ the ligand concentration by accumulating two binding events every time the gene has to go through the slow step of deactivating (Figure 5c). However, for large deactivation rates little time is 'lost’ in the uninformative expressing state and there is no need to try and deconvolve the binding events but rather use direct independent activation/deactivation statistics from a single binding site promoter (Figure 5d, see Appendix 7 for a more detailed calculation).
The role of weak binding sites
An important observation about the strength of the binding sites that emerge from our search is that the binding rates are often below the diffusion limit ${\mu}_{\text{max}}{L}_{0}\simeq 0.124{s}^{1}$ (see black dashed line in Figure 4b) : some of the ligands reach the receptor, they could potentially bind but the decision time decreases if they do not. In other words, binding sites are 'weak’ and, since this is also a feature of many experimental promoters (Gertz et al., 2009), the purpose of this section is to investigate the rationale for this observation by using the models described in Figure 5.
Naively, it would seem that increasing the binding rate can only increase the quality of the readout. This statement is only true in certain parameter regimes, and weaker binding sites can be advantageous for a fast and precise readout. To provide concrete examples, we fix the deactivation rate ν and the first binding rate μ_{1} in the 1ormore irreversible binding model of Figure 5b and we look for the unbinding rate ${\mu}_{2}^{*}$ that leads to the fastest decision. We consider a situation where the two binding sites are not interchangeable and binding must happen in a specific order. In this case, the diffusion limit states that ${\mu}_{2}\le {\mu}_{1}$ if the first binding is strong and happens at the diffusion limit. We optimize the mean decision time for $0\le {\mu}_{2}\le {\mu}_{1}$ (see Appendix 9—figure 1 for an example) and find a range of parameters where the fastestdecision value ${\mu}_{2}^{*}<{\mu}_{1}$ is not as fast as parameter range allows (Figure 5g). We note that this weaker binding site that results in fast decision times can only exist within a promoter structure that features cooperativity. In this specific case, the first binding site needs to be occupied for the second one to be available. If the two binding sites are independent, then the diffusion limit is ${\mu}_{2}\le {\mu}_{1}$ and the fastest $\u27e8T\u27e9$ solution always has the fastest possible binding rates.
Predictions for Bicoidbinding sites mutants
In addition to results for wild type embryos, our approach also yields predictions that could be tested experimentally by using synthetic hb promoters with a variable numbers of Bicoidbinding sites (Figure 6a). For any of the fast decisionmaking architectures identified and activation rules chosen, we can compute the effects of reducing the number of binding sites. Specifically, our predictions for the $k=3$ activation rule and $H>4$ in Figure 6b can be compared to FISH or fluorescent live imaging measurements of the fraction of active loci at a given position along the anteriorposterior axis. Bcdbinding site mutants of the WT promoter have been measured by immunostaining in cycle 14 (Park et al., 2019), although mRNA experiments in earlier cell cycles of well characterized mutants are needed to provide for a more quantitative comparison.
An important consideration for the comparison to experimental data is that there is a priori no reason for the hb promoter to have an optimal architecture. We do find indeed many architectures that satisfy all the experimental constraints and are not the fastest decisionmaking but 'good enough’ hb promoters. A relevant question then is whether or not similarity in performance is associated with similarity in the microscopic architecture. This point is addressed in Figure 6c, where we compare the fraction of active loci along the AP axis using several constraintconforming architectures for $H>4$ and the $k=2$ and $k=3$ activation rules. The plot shows that solutions corresponding to the same activation rule are clustered together and quite distinguishable from the rest. This result suggests that the precise values of the binding and unbinding constants are not important for satisfying the constraints, that many solutions are possible, and that FISH or MS2 imaging experiments can be used to distinguish between different activation rules. The fraction of active loci in the boundary region is an even simpler variable that can differentiate between different activation rules (Figure 6d). Lastly, we make a prediction for the displacement of the anteriorposterior boundary in mutants, showing that a reduced numbers of Bcd sites results in a strong anterior displacement of the hb expression border compared to six binding sites, regardless of the activation rule (Figure 6e). Error bars in Figure 6e, that correspond to different closetofastest architectures, confirm that these share similar properties and different activation rules are distinguishable.
Joint dynamics of Bicoid enhancer and promoter
The Bicoid transcription factor has been shown to target more than a thousand enhancer loci in the Drosophila embryo with a wide concentration range of sensitivities (Driever and NüssleinVolhard, 1988; Struhl et al., 1989; Hannon et al., 2017). Enhancers are of special interest because they can be located far away from the promoter (Ribeiro et al., 2010; Krivega and Dean, 2012) and perform a statistically independent sample of the concentration that is later combined with that of the promoter. Evidence suggests that promoterbased conformational changes can be stable over long times (Fukaya et al., 2016), which mimics information storage during a process of signal integration. To explore these effects, we consider a simple model of enhancer dynamics where a Bicoidspecific enhancer switches between two states ON and OFF independently of the promoter dynamics. We assume a simple rule for the gene activity: the gene is transcribed when both the promoter and the enhancer are ON. As an example, we consider an enhancer made of one binding site so that the ON rate ${e}_{\text{ON}}$ of the enhancer is limited by the diffusion rate ${\mu}_{\text{max}}L$. As an example, we perform a parameter search for the promoter activation rule $k=2$ (see Appendix 12), while still assuming that about half the nuclei are active at the boundary and a required Hill coefficient greater than 4, looking for architectures yielding the shortest decision time for an error rate of 32% for the 10% relative concentration difference discrimination problem. We find that the enhancer improves the performance of the readout, reducing the time to decision by about $\simeq 6\%$. We find that adding extra binding sites to the promoter increases the computing power of the enhancerpromoter system and can reduce the time to decision to about 60% of the performance of the best architectures without enhancers.
Estimating the loglikelihood function with RNA concentrations
To illustrate how a biochemical network can approximate the calculation of the loglikelihood, we consider the case of the fastest architecture identified in the paragraph ’Numerical procedure for identifying fast decisionmaking architectures’ with $k=1$ and $H>4$. In Figure 7a, we show the contributions of the OFF times (blue) and the ON times (red) to the loglikelihood for this architecture. We notice that the behavior of the loglikelihood contributions at long times is simply linear in time with a positive rate for ON times and a negative rate for OFF times. Conversely, short ON times contribute negatively to the loglikelihood while short OFF times contribute positively to the loglikelihood (Figure 7a). This observation suggests a simple model of RNA production with delay to approximate the computation of the loglikelihood. We consider a model of RNA production with five parameters (Figure 7b, details of the model are given in Appendix 11). We assume that when the promoter is in the ON state, polymerase is loaded and RNA is transcribed at a constant rate ${r}_{\text{ON}}$ while when the promoter is in the OFF state, RNA is produced at a lower basal rate ${r}_{b}$. In order to approximate the linear behavior of the loglikelihood function at long times we suggest the existence of an enzyme actively degrading hunchback RNA (Chanfreau, 2017) and include in our model of RNA production the delay ${d}_{\text{ON}}$ and inertia ${d}_{\text{OFF}}$ associated with promoter dynamics and polymerase loading. RNA levels fluctuate and trigger decisions when the concentration $[\text{RNA}]$ is greater than a threshold ${c}_{1}(t)$ (anterior decision) or lower than a threshold ${c}_{2}(t)$ (posterior decision). Since many forms of switch have already been presented in the literature (Goldbeter and Koshland, 1981; Tyson et al., 2003; Ozbudak et al., 2004; Siggia and Vergassola, 2013; Sandefur et al., 2013), we shall concentrate on the loglikelihood calculation and refer to previous references for the implementation of a decision when reaching a threshold.
We look for parameters that satisfy both a high speed and high accuracy requirement for a decision between two points located 2% egg lengths apart across the midembryo boundary. For the fastest architecture identified with $k=1$ and $H>4$, we identify parameters that satisfy $e<32\%$ and a mean decision time $T<3$ min (see Appendix 11—figure 1). We check that this model produces a profile of RNA that is consistent with the observed high Hill coefficient (Figure 7c). Interestingly, we find that for this particular set of parameters the RNA profile Hill coefficient is increased by the delayed transcription dynamics and the active degradation from $\simeq 4$ (blue line in Figure 7c) up to $\simeq 5.2$ (red line in Figure 7c, details of the calculation are given in Appendix 11). This result could shed new light on the fundamental limits to Hill coefficients in the context of cooperative TF binding (Estrada et al., 2016; Tran et al., 2018) and provide a possible mechanism to explain how mRNA profiles can reach higher steepness than the corresponding TF activities do. We also looked for parameter sets that approximate the loglikelihood for the optimal architecture identified for $k=2$ and $H>4$ and find several candidates that fall close to the requirement of speed and accuracy (Appendix 11—figure 1). Together these results show that implementing the loglikelihood using a molecular circuit with a hb promoter is possible. They do not show this is what is happening in the embryo itself.
Discussion
The issue of precision in the Bicoid readout by the hunchback promoter has a long history (NüssleinVolhard and Wieschaus, 1980; Tautz, 1988). Recent interest was sparked by the argument that the amount of information at the hunchback locus available during one nuclear cycle is too small for the observed 2% EL distance between neighboring nuclei that are able to make reproducible distinct decisions (Gregor et al., 2007b). By using updated estimates of the biophysical parameters (Porcher et al., 2010; Tran et al., 2018), and the BergPurcell error estimation, we confirm that establishing a boundary with 2% variability between neighbouring nuclei would take at least about 13.4 min – roughly the nontransient expression time in nuclear cycle 14 (Lucas et al., 2018; Tran et al., 2018) (Appendix 1). This holds for a single Bicoidbinding site. An intuitive way to achieve a speed up is to increase the number of binding sites: multiple occupancy time traces are thereby made available, which provides a priori more information on the Bicoid concentration.
Possible advantages of multiple sites are not so easy to exploit, though. First, the various sites are close and their respective bindings are correlated (Kaizu et al., 2014), so that their respective occupancy time traces are not independent. That reduces the gain in the amount of information. Second, if the activation of gene expression requires the joint binding of multiple sites, the transition to the active configuration takes time. The overall process may therefore be slowed down with respect to a single binding site model, in spite of the additional information. Third, and most importantly, information is conveyed downstream via the expression level of the gene, which is again a single time trace. This channeling of the multiple sites’ occupancy traces into the single time trace of gene expression makes gene activation a real information bottleneck for concentration readout. All these factors can combine and even lead to an increase in the decision time. To wit, an allornothing equilibrium activation model with six identical binding sites functioning at the diffusion limit and no cooperativity takes about 38 min to achieve the same above accuracy. In sum, the binding site kinetics and the gene activation rules are essential to harness the potential advantage of multiple binding sites.
Our work addresses the question of which multisite promoters architecture minimize the effects of the activation bottleneck. Specifically, we have shown that decision schemes based on continuous updating and variable decision times significantly improve speed while maintaining the desired high readout accuracy. This should be contrasted to previously considered fixedtime integration strategies. In the case of the hunchback promoter in the fly embryo, the continuous update schemes achieve the 2% EL positional resolution in less than 1 min, always outperforming fixedtime integration strategies for the same promoter architecture (see Appendix 1—table 1). While 1 min is even beyond what is required for the fly embryo, this margin in speed allows to accommodate additional constraints, viz. steep spatial boundary and biophysical constraints on kinetic parameters. Our approach ultimately yields many promoter architectures that are consistent with experimental observables in fly embryos, and results in decision times that are compatible with a precise readout even for the fast nuclear cycle 11 (Lucas et al., 2018; Tran et al., 2018).
Several arguments have been brought forward to suggest that the duration of a nuclear cycle is the limiting time period for the readout of Bicoid concentration gradient. The first one concerns the reset of gene activation and transcription factor binding during mitosis. In that sense, any information that was stored in the form of Bicoid already bound to the gene is lost. The second argument is that the hunchback response integrated over a single nuclear cycle is already extremely precise. However, none of these imply that the hunchback decision is made at a fixedtime (corresponding to mitosis) so that strategies involving variable decision times are quite legitimate and consistent with all the known phenomenology.
We have performed our calculations in a worstcase scenario. First, we did not consider averaging of the readout between neighbouring nuclei. While both protein (Gregor et al., 2007a) and mRNA concentrations (Little et al., 2013) are definitely averaged, and it has been shown theoretically that averaging can both increase and decrease (Erdmann et al., 2009) readout variability between nuclei, we do not take advantage of this option. The fact that we achieve less than 3 min in nuclear cycle 11, demonstrates that averaging is a priori dispensable. Second, we demand that the hunchback promoter results in a readout that gives the positional resolution observed in nuclear cycle 14, in the time that the hunchback expression profile is established in nuclear cycle 11. The reason for this choice is twofold. On the one hand, we meant to show that such a task is possible, making feasible also less constrained setups. On the other hand, the hunchback expression border established in nuclear cycle 11 does not move significantly in later nuclear cycles in the WT embryo, suggesting that the positional resolution in nuclear cycle 11 is already sufficient to reach the precision of later nuclear cycles. The positional resolution that can be observed in nuclear cycle 11 at the gene expression level is $\sim 10\%$ EL (Tran et al., 2018), but this is also due to smaller nuclear density.
Two main factors generally affect the efficiency of decisions: how information is transmitted and how available information is decoded and exploited. Decoding depends on the representation of available information. Our calculations have considered the issue of how to convey information across the bottleneck of gene activation, under the constraint of a given Hill coefficient. The latter is our empirical way of taking into account the constraints imposed by the decoding process. High Hill coefficients are a very convenient way to package and represent positional information: decoding reduces to the detection of a sharp transition, an edge in the limit of very high coefficients. The interpretation of the Hill coefficient as a decoding constraint is consistent with our results that an increase in the coefficient slows down the decision time. The resulting picture is that promoter architecture results from a balance between the constraints imposed by a quick and accurate readout and those stemming from the ease of its decoding. The very possibility of a balance is allowed by the main conclusion demonstrated here that promoter structures can go significantly below the time limit imposed by the duration of the early nuclear cycles. That leaves room for accommodating other features without jeopardising the readout timescale. While the constraint of a fixed Hill coefficient is an effective way to take into account constraints on decoding, it will be of interest to explore in future work if and how one can go beyond this empirical approach. That will require developing a joint description for transmission and decoding via an explicit modeling of the mechanisms downstream of the activation bottleneck.
Recent work has shed light on the role of out of equilibrium architectures on steepness of response (Estrada et al., 2016) and gradient establishment (Tran et al., 2018; Park et al., 2019). Here, we showed that equilibrium architectures perform very well and achieve short decision times, and that out of equilibrium architectures do not seem to significantly improve the performance of promoters, except for making some switches from gene states a bit faster. Nonequilibrium effect can, however, increase the Hill coefficient of the response without adding extra binding sites, which is useful for the downstream readout of positional information that we formulated above as decoding.
We also showed how short bound times of Bicoid molecules to the DNA (Mir et al., 2017; Mir et al., 2018) are translated into accurate and fast decisions. Our fast decisionmaking architectures also display short DNAbound times. However, the constraint of high cooperativity means that the distribution of bound times to the DNA is nonexponential and the rare long binding times that occur during the bursty binding process are exploited during the readout. The combination of high cooperativity and high temporal variance due to bursty dynamics is a possible recipe for an accurate readout.
At the technical level, we developed new methods for the mean decision time of complex gene architectures within the framework of variable time decisionmaking (SPRT). This allowed us to establish the equality $V=D$ between drift and diffusion of the loglikelihood between two close hypotheses. Its underlying reason is the martingale property that the conditional expectation of probabilities for two hypotheses, given all prior history, is equal to their present value. The methodology developed here will be useful for the broad range of decision processes where SPRT is relevant, including neuroscience (Gold and Shadlen, 2007; Bitzer et al., 2014) and synthetic biology (O'Brien and Murugan, 2019; Pittayakanchit et al., 2018).
We made predictions about how promoter architectures with different activation schemes can be compared in synthetic embryos with different numbers of Bcd binding sites. Furthermore, experiments that change the composition of the syncytial medium would influence the diffusion constant and assay the assumption of diffusionlimited activation. Our model predicts that these changes would result in modifications of hunchback activation profiles: higher or lower diffusion rates slide the hunchback profile towards the anterior or the posterior end of the embryo, respectively, similarly to an increase or a decrease of the number of Bicoid binding sites. Any of the above experiments would greatly advance our understanding of the molecular control of spatial patterning in Drosophila embryo and, more generally, of regulatory processes.
Finally, we showed how a simple model of RNA production with delay and active degradation could easily approximate the seemingly complex loglikelihood calculation. The specific implementation that we described is tentative and aimed at simplicity, yet it illustrates several points, namely that the loglikelihood can be approximated by preserving a high Hill coefficients, observed characteristics of transcription dynamics and ensuring speedaccuracy limits. Future experiments could identify candidates for the enzyme responsible for the active degradation of RNA, or image the formation and dissolution of clusters using superresolution imaging methods. This mechanism is also very close to the multifactor clusters (Mir et al., 2018) observed recently in early Drosophila in conjunction with active transcription sites. We suggest this mechanism as an example of implementation of the loglikelihood calculation, but note that the added complexity of the computation could happen at different levels of the expression machinery, including upstream of promoter activation through enhancer dynamics and chromatin arrangement.
Appendix 1
Biological parameters in the embryo
To build a model of the promoter, we combine parameters from recent experimental work.
The embryo at nuclear cycle n is modelled as having ${2}^{n1}$ nuclei/cells. For simplicity, we assume they are equidistributed on the periphery of the embryo and across the embryo’s length and do not take into account the effect of the geometry of the embryo because the curvature of the embryo is small (the embryo is about 500 µmlong along the anteriorposterior axis and only about 100 µmlong along the dorsoventral axis). We also neglect the few nuclei forming pole cells and remaining in the bulk (Foe and Alberts, 1983).
The Bicoid concentration in the embryo is given by $L(x)={L}_{0}{e}^{(x{x}_{0})/\lambda}$, where x is the position along the anteriorposterior (AP) embryo axis measured in % of the egg length, $\lambda $ is the decay length also measured in % of the egg length ($\lambda \simeq 100\mu m$ which is roughly 20% of the egg length (Gregor et al., 2007b) and x_{0} is the position of halfmaximum hunchback expression (x_{0} is of the order of 250 µm and varies slightly depending on cell cycle, usually close to 45% egg length for the WT hunchback promoter (Porcher et al., 2010). ${L}_{0}\simeq 5.6\mu {m}^{3}$ is the concentration of free Bicoid molecules at the AP boundary (AbuArish et al., 2010) (that also corresponds to the point of largest hunchback expression slope (Tran et al., 2018).
To compute the diffusion limited arrival rate at the locus, we use the following parameters: diffusivity $D\simeq 7.4\mu {m}^{2}{s}^{1}$ (Porcher et al., 2010; AbuArish et al., 2010), concentration of free Bicoid molecules at the AP boundary ${L}_{0}$, size of the binding target $a\simeq 3nm$ (Gregor et al., 2007a), which leads to an effective ${\mu}_{\text{max}}{L}_{0}=Da{L}_{0}\simeq 0.124{s}^{1}$ at the boundary. This value is an upper bound, assuming that every encounter between a transcription factor and a binding site results in successful binding. We note in Main Text Figure 4b that most of the ON rates are close to the diffusion limit. We conclude that in this parameter regime, the most efficient strategy is to have ON events that are as fast as possible. The only reason to reduce them is to achieve the required Hill coefficient. That can be done by either adjusting the ON or the OFF rates.
The above estimate $\mu}_{\mathrm{m}\mathrm{a}\mathrm{x}}{L}_{0}=0.124{s}^{1$ may be inaccurate for various reasons and we ought to explore the sensitivity of results to those uncertainties. Appendix 1—table 1 recapitulates the timeperformance of different strategies for different choices of the parameters. A first source of uncertainty is the value of the diffusivity, which is estimated to vary between 4 and $7\mu {m}^{2}{s}^{1}$ (Porcher et al., 2010; AbuArish et al., 2010). We consider then two possible values for the diffusivity from Porcher et al., 2010: ${D}_{1}=4.5\mu {m}^{2}{s}^{1}$ and the aforementioned value ${D}_{2}=7.4\mu {m}^{2}{s}^{1}$. A second source of uncertainty is that the actual Bicoid concentration at the boundary could vary by up to a factor two depending on estimates of the concentration and its decay length (Gregor et al., 2007b; Tran et al., 2018; AbuArish et al., 2010). We consider then two possible value for the concentration: the aforementioned ${L}_{0}^{(1)}=5.6$ molecules per $\mu {m}^{3}$, and ${L}_{0}^{(2)}=11.2$ molecules per $\mu {m}^{3}$. Finally, we assumed above that the size of the target is the full Bicoid operator site, which we took about ten base pairs following Gregor et al., 2007a. However, assuming that the TF must reach a specific position on the promoterbinding site could reduce the size of the target to a single base pair, that is a by a factor 10. In terms of parameters, we consider then two possible values for a : either ${a}_{1}\simeq {3.10}^{4}\mu m$, or ${a}_{2}={3.10}^{3}\mu m$. All in all, taking into account the various sources of uncertainty ${\mu}_{\text{max}}{L}_{0}$ can range in the interval $[0.0076,0.25]{s}^{1}$.
We consider four possible decision strategies. The first one is a single binding site making a fixedtime decision. This computation is made using the original BergPurcell formula (Berg and Purcell, 1977). In the BergPurcell strategy, the concentration of the ligand is inferred based on the total time that the receptor or the binding site has spent occupied by ligands. Due to averaging, the relative error of the concentration readout is inversely proportional to the number of independent measurements of the concentration that can be made within the total fixed time T, that is, to the arrival rate of new Bicoid molecules at the binding site, multiplied by the probability to find the binding site empty (in our case, at the boundary, the probability is roughly one half). Since the rate of arrival of new Bicoid molecules to the binding site is ${\mu}_{\text{max}}{L}_{0}=Da{L}_{0}$, the relative error of the concentration readout is given by
where ${L}_{0}$ is the estimate of the concentration, T is the integration time and $\overline{n}$ is the probability that the binding site is full.
In the second strategy, there are two sets of six binding sites being read independently. In that case, the information from each binding site can be accessed individually and their contributions averaged to give a more precise estimate. This calculation again can be made using the original BergPurcell formula (Berg and Purcell, 1977) for several receptors, dividing the relative error by the square root of the total number of binding sites in Equation 8:
where N is the total number of binding sites (in our case, $N=12$).
For the third possibility, we consider a decision made at a fixed time using the fastest architecture identified (Main Text Figure 4) without constraint on the slope (activation rule $k=1$). We compute the decision time using the driftdiffusion approximation with fixed time (see Appendix 2).
Finally, we consider the fastest architecture identified with a random decision time and the Sequential Probability Ratio Test (SPRT) strategy.
The result of the above calculations is that for a single receptor estimating Bcd concentration with 10% precision within a fixedtime BergPurcell type calculation (see Appendix 2), decisions take between 6 min for the fastest binding rates and ∼4 hr for the slowest estimates. Conversely, by using the onthefly SPRT decisionmaking process and the oneormore $k=1$ scheme at equilibrium, the time needed to make decisions with 10% precision and an error rate of 32% at the boundary is $\simeq 30\mathrm{s}$ for the fastest rates and $\simeq 17$ min for the slowest rates. For all sets of parameters, the onthefly SPRT decisionmaking process gives a ∼3.5fold faster decision time than the $N=12$ BergPurcell estimate and a >10fold faster decision making time than the onebindingsite BergPurcell estimate. For the fastest rates, a decision with an error rate of less than 5% can be achieved in about 5 min within the SPRT scheme.
Appendix 2
Error rate and decision time for the fixedtime decision strategy
In this section, we describe how we compute the decision time for a fixedtime strategy (or 'BergPurcell type decision’) and a complex promoter architecture.
The classic BergPurcell calculation is based on the idea of averaging the time spent by the ligand bound to a receptor (or, in our case, a binding site). The original calculation assumed that the waiting times between binding and unbinding are exponential and that the bound times are not informative about the concentration. Neither of these assumptions hold in the case of the hunchback promoter. Indeed, in the context of a complex promoter architecture, the waiting times that are available to the nucleus or cell downstream are the gene’s ON and OFF switching times. They are not exponentially distributed, and, depending on the activation rule, the OFF times can be just as informative about the concentration as the ON times. For these two reasons, we ought to readapt the BergPurcell idea to compute the mean decision time.
To that purpose, we consider a decision with a given rate of error e, fix a time of decision T and choose the concentration that has the highest likelihood between the two options L_{1} and L_{2}. In other words, if $\text{log}R(T)=\text{log}P({L}_{1})/P({L}_{2})>0$ then the nucleus chooses $L={L}_{1}$, while if $\text{log}R(T)<0$ then it chooses $L={L}_{2}$. For instance, if the actual concentration $L={L}_{1}$, then the probability of error at time T is given by $P(\text{log}R(t)<0)$.
To calculate the above error, we use the driftdiffusion approximation for $\text{log}R(t)$, compute the drift V and diffusivity D from Main Text Equations 14 and approximate the distribution of $\text{log}R(T)$ by a normal distribution with mean $VT$ and standard deviation $\sqrt{DT}$. We compute the error rate e for the fixedtime decision process based on the Gaussian approximation. Finally, to find the mean decision time for a given error rate, we perform a quick onedimensional search over T, which yields the value of the fixed decision time appropriate for the prescribed error e.
Appendix 3
The mean firstpassage time for the decisionmaking process
To investigate the role of promoter architectures in decisionmaking, we apply the SPRT approach (Sequential Probability Ratio Test) (Wald, 1945a; Siggia and Vergassola, 2013). In the simplest formulation of this approach, a decision is made between two hypotheses about the concentration of a surrounding TF: either the TF is at concentration L_{1} or it is at concentration L_{2}. The decision is based on the binding and unbinding events of the TF to a promoter. At each point in time, the error of committing to a given concentration (scenario) is estimated by computing the ratio $R(t)$ of the probability of one hypothesis, $P({L}_{1})$ (the surrounding concentration is L_{1}) over the other (the concentration is L_{2}), $P({L}_{2})$:
Technically, the time dependent likelihood ratio, $R(t)$ undergoes a random walk as TFs bind and unbind to the promoter, activating and deactivating the gene. The process is terminated and a decision is made when the likelihood ratio falls below the desired error level, which is expressed in terms of absorbing boundaries at K_{+} (the concentration is L_{1}) and ${K}_{}$ (the concentration is L_{2}): $\text{log}R(t)\le {K}_{}$ or $\text{log}R(t)\ge {K}_{+}$ (see Main Text Figure 2). The boundaries ${K}_{\pm}$ can be expressed in terms of the error e and for symmetric errors, ${K}_{+}={K}_{}=(1e)/e$ (Siggia and Vergassola, 2013). In our case $e=32\%$. The mean time for decision can be computed for each discrimination task as the mean firstpassage time of a random walk (in the limit of long decision times). In Appendix 2—figure 1, we show the mean decisiontime for different values of e.
Assuming the gene has two levels of activation ON and OFF, the information available to downstream mechanisms is a series of ON times ${s}_{i}$ and OFF times ${t}_{j}$ of gene activity. The probability of a concentration hypothesis is then $P({L}_{m})=P(\{{t}_{i}\},\{{s}_{j}\}{L}_{m})$. If successive ON and OFF times are independent then the probabilities factorize. The log ratio is then written as a function of the ON (respectively OFF) time probability distribution ${P}_{\text{ON}}(t,L)$ (respectively ${P}_{\text{OFF}}(t,L)$)
where ${J}_{}$ is the number of ON to OFF switching events and ${J}_{+}$ the number of OFF to ON switching events (Siggia and Vergassola, 2013). To understand the dynamics of the log ratio, it is necessary to compute the distributions ${P}_{\text{ON}}(t,L)$ and ${P}_{\text{OFF}}(t,L)$ based on the promoter dynamics, which is the focus of the following subsection.
From binding to gene activation
In this subsection, we describe how the highdimensional complete state of the promoter is mapped onto the lowdimensional activity of the gene. We use a formalism similar to that of Desponds et al., 2016; Tran et al., 2018.
The promoter features N binding sites : its complete state at time T is described by an N dimensional vector $\overrightarrow{B}(t)$, where ${B}_{i}=0/1$ if the binding site i is empty/full. We assume the gene has two levels of activity: either it is OFF and no mRNA is produced, or it is ON and mRNA is produced at a fixed rate. Activity is then described by a simple Boolean variable $a(t)$ equal to 0/1, corresponding to the gene being OFF/ON.
We also assume that the activity of the gene depends only on the total number of transcription factors bound to the promoter so that there is an integer $1\le k\le N$ such that $a(t)={\mathbb{\U0001d7d9}}_{\sum _{i}{B}_{i}\ge k}$ (where $\mathbb{\U0001d7d9}$ is the indicator function). From the point of view of gene activity, we are only interested in the dynamics of $\text{\mathbf{B}}(t)={\sum}_{i}{B}_{i}(t)$. We make another simplifying assumption: the probabilities of $\text{\mathbf{B}}(t)$ increasing or decreasing only depend on the value of $\text{\mathbf{B}}(t)$ and not on which binding sites specifically are bound or unbound, that is $\text{\mathbf{B}}(t)$ itself has a Markovian dynamics in $\{0,1,\mathrm{\dots}N\}$.
At a given time t, we describe the stochastic state of the promoter as a vector of probability $\overrightarrow{X}(t)$ where ${X}_{i}(t)$ is the probability of having $\text{\mathbf{B}}=i1$ and ${\sum}_{i}{X}_{i}(t)=1$. The Markovian dynamics of B translate into an $(N+1)\times (N+1)$ transition matrix M for $\overrightarrow{X}:\overrightarrow{X}(t+s)={e}^{Ms}\overrightarrow{X}(t)$. M describes the dynamics of the promoter from the point of view of gene activity. If transcription factors do not dimerize or form complexes, then ${M}_{i,j}\ne 0$ only if $ij\le 1$ since the probability of two of them binding or unbinding decreases as the square of time for short times.
Let us now relate the statistics of the switching times for the gene activity a to the transition matrix M. Starting from a distribution of OFF states $\overrightarrow{{X}_{0}}$ we compute the cumulative distribution function of the waiting time $\tau $ before switching to the $ON$ state
where Trans is the transpose, ${\overrightarrow{X}}_{\text{ON}}$ is a vector of 1 for states corresponding to active genes and 0 for states corresponding to inactive genes, and ${M}_{\text{OFF}}^{*}$ is a modified version of M where all ON states act as 'sinks’ (i.e. all the transition rates from ON states have been set to 0). The expression in Equation 12 computes the cumulative probability of transitions to an ON state before time t, that is ${P}_{\text{OFF}}(t)=dP(\tau \le t)/dt$.
For simplicity, we restrict ourselves to promoter structures where there is only one point of entry into the ON or the OFF states (i.e. any switching event from ON to OFF or vice versa will end with the promoter having a specific number of binding sites). Under this hypothesis, the probability vector $\overrightarrow{{X}_{0}}$ in Equation 12 is uniquely determined and independent of the specific dynamics of the previous ON or OFF times. We relax these constraints on the promoter structure in Appendix 8. We denote by ${\tau}_{\text{OFF}}$ (respectively ${\tau}_{\text{ON}}$) the average of the OFF (respectively ON) times for the real concentration L.
Random walk of the log ratio
When the two hypothesized concentrations are very similar to each other as compared to the concentration scale L set by the actual concentration value, ${L}_{1}{L}_{2}\ll L$, the discrimination is hard and requires a large number of binding and unbinding events. In this limit, the random walk $\text{log}R(t)$ can be approximated by a driftdiffusion process with drift V and diffusion D:
where $\eta $ is a Gaussian white noise. The decision time for the case of symmetric boundaries ${K}_{+}={K}_{}$ is a random variable T with mean given by Equation 1 in the main text (Siggia and Vergassola, 2013). $\u27e8T\u27e9$ depends on the details of the biochemical sensing of the TF concentration and promoter architecture via the drift and diffusivity. Computing V and D in Equation 13 is enough to derive the mean firstpassage time to the decision and even its full distribution by computing its Laplace transform as a solution of the driftdiffusion equation using standard techniques of firstpassage for Gaussian processes (Siggia and Vergassola, 2013).
Drift
For many switching events ${J}^{+}\gg 1$ and ${J}^{}\gg 1$, the sums in Equation 11 can be replaced by continuous averages over binding time distributions. Since ${J}_{}$ and ${J}_{+}$ differ at most by one, we can also replace the two values ${J}_{}\simeq {J}_{+}$ by a unique value J, which is the number of ONOFF cycles. The times are distributed according to the ON and OFF probability distributions for the real concentration L. At a given large time t, the number J of terms in the sum and the logratios appearing in Equation 11 are a priori correlated but Wald’s equality (Wald, 1945a) ensures that the average of the sum is the product of the two averages. Since the expected number of cycles grows linearly in time as $\u27e8J\u27e9\propto t/\left({\tau}_{\mathrm{O}\mathrm{N}}+{\tau}_{\mathrm{O}\mathrm{F}\mathrm{F}}\right)$, we conclude that the drift term in Equation 13 is given by:
where ${D}_{KL}(PQ)={\int}_{0}^{\mathrm{\infty}}dsP(s)\text{log}(P(s)/Q(s))$ is the KullbackLeibler divergence between the two distributions P and Q.
In sum, the drift is determined by how informative the distribution of waiting times in the ON and OFF states are about the concentration differences, with an average rate equal to the inverse of the cycle time ${\tau}_{\text{OFF}}+{\tau}_{\text{ON}}$. The KullbackLeibler divergence appearing in Equation 14 is intuitive : it represents the distance between the real concentration and the hypothetical concentration in the space of probabilistic models of switching times. The larger that distance, the easier it becomes to tell the difference between the two distributions through random sampling.
Expansion of the drift for small concentration differences
When the two candidate concentrations are similar, $L\simeq {L}_{1}\simeq {L}_{2}$, the quantities computed in the previous subsection can be expanded at first order in the differences in concentrations $\delta {L}_{1}={L}_{1}L$ and $\delta {L}_{2}={L}_{2}L$. Starting from Equation 14, we expand the drift V at first and second orders:
where $\hookrightarrow}_{{P}_{\mathrm{O}\mathrm{N}}$ means that the same operations and integrations are performed for P_{ON}.
Due to conservation of probability, the integral of the first and second Lderivatives of ${P}_{\text{OFF}}(t,L)$ vanish. The firstorder expansion of V vanishes and we have at first nonvanishing order in ${L}_{i}L$
If for instance ${L}_{2}L>{L}_{1}L$ then the drift is positive, favouring the concentration L_{1}, closer to the real concentration.
Equality between drift and diffusivity
In this subsection, we present different ways of proving the equality between V and D in SPRT when the two hypotheses are close by connecting different approaches. We show that this equality is a general property of random walks in Bayesian belief space. We check that these results are true in the controlled case of one binding site in Appendix 3—figure 1.
First approach: the exit points of the decision process
As proved by Wald in the original paper where he introduced sequential probability ratio tests (Wald, 1944), the nature of the test (the ratio between the likelihood of two hypotheses) requires a specific relationship between the error and the boundaries that define the regions of decision. In our case, we assume the same probability of calling L_{1} when L_{2} is true and calling L_{2} when L_{1} is true, leading to the definition of only one error level e and two symmetric boundaries K and −K. Specifically, Wald shows that $K=\text{log}\frac{1e}{e}$ (see also Siggia and Vergassola, 2013).
When the two hypotheses are close enough that the variations of the loglikelihood can be approximated by a Gaussian process, one can compute the exit probabilities in terms of the drift V and diffusivity D. The equation for the probability of absorption $\mathrm{\Pi}(X)$ at the upper boundary K is
with the boundary condition $\mathrm{\Pi}(K)=1$ and $\mathrm{\Pi}(K)=0$. The corresponding solution is
as shown in the supplementary material of Siggia and Vergassola, 2013. Setting $x=0$ in Equation 18, we find that
We note that this probability of absorption is also $1e$ by definition of the error (assuming for instance that $L={L}_{1}$) leading to
which in turn gives
And so we find that in the limit of close hypotheses, $V=D$.
Second approach: expansion for small concentration differences
Let us consider the SPRT process between two hypotheses L_{1} and L_{2} and assume for simplicity that $L={L}_{2}$. In this version of the proof, we consider that the difference $\delta L={L}_{1}L$ is small compared to L (i.e $\delta L/L\ll 1$) and expand the expressions for drift and diffusion in increasing orders of $\delta L/L$ to find that they match. We have shown in the subsection ’Expansion of the drift for small concentration differences’ of Appendix 3 that the first nonvanishing term in the expansion of the drift is of order 2 in $\delta L/L$ and is given by Equation 16. An integral formula for the second moment of the loglikelihood is given in Equation A15 of CarballoPacheco et al., 2019 from which we get that the diffusivity of the log ratio is the sum of four terms. The first term is given by
$D}_{1$ is proportional to ${V}^{2}$ and so is of order ${(\delta L/L)}^{4}$. The second term is given by
The prefactor V is of order ${(\delta L/L)}^{2}$. Expanding the first two terms in the brackets gives the same type of terms as in Equation 15. Because of probability conservation we find that these terms are of the same order as V (i.e ${(\delta L/L)}^{2}$). The last two terms have prefactors s and t respectively which break the argument for vanishing first order terms in Equation 15 and these terms are of order $\delta L/L$. Putting the pieces together, we find that ${D}_{2}$ is of order ${(\delta L/L)}^{3}$. The third term is given by
Because ON and OFF times are independent, cross terms in ${D}_{3}$ are products of two $\u27e8\text{log}P(.,{L}_{1})/P(.,{L}_{2})\u27e9$ and are of subleading order ${(\delta L/L)}^{4}$. Using the symmetry between ${P}_{\text{ON}}$ and ${P}_{\text{OFF}}$, we expand one of the square terms in ${D}_{3}$
The fourth term is ${V}^{2}$ which is of order ${(\delta L/L)}^{4}$. So we find that, at leading order, $D\simeq {D}_{3}$ which has the same expansion as V (see Equation 16). And so we recover that for small concentration differences, $V=D$.
In the case of one binding site with ON rate $\mu L$ and OFF rate ν we check that the formula from CarballoPacheco et al., 2019 gives the exact expression computed in Siggia and Vergassola, 2013
Replacing L_{2} with L in Equation 26 and expanding in $\delta L/L$ we have
We identify the drift computed for the one binding site example in the paragraph ’Mean decision time : connecting driftdiffusion and Wald’s approaches’ of the main text and find that at first order drift and diffusivity are equal.
Third approach: general properties of Bayesian random walks
To understand the origin of the equality between drift and diffusion, we go back to the definition of SPRT as the update of the Bayesian beliefs in two competing hypotheses. We no longer condition the process on one hypothesis being true but rather on the probability that each one is true given the value of the log ratio a certain time point. A very general property of posterior distributions updated with evidence is that their average does not change. This martingale property of belief update can be shown in the following way: assume that the nucleus receives extra evidence $\theta $ with probability $p(\theta )$. Before that evidence comes, the probability that hypothesis 1 is true ${p}_{1}$ has a certain distribution $P({p}_{1})$. Then we have that the probability that hypothesis 1 is true varies as :
where we used Bayes rule. The integral of $p(\theta {p}_{1})$ over $\theta $ sums up to one and we are left with
In the context of two hypothetical concentrations, we consider the normalized likelihood of the two hypotheses at time t: ${Q}_{2}(t)={P}_{2}(t)/({P}_{1}(t)+{P}_{2}(t))$ and ${Q}_{1}(t)={P}_{1}(t)/({P}_{1}(t)+{P}_{2}(t))$, where ${P}_{i}(t)$ is the likelihood that hypothesis i is true based on the evidence accumulated up to time t. The martingale property translates as
where ${\u27e8\u27e9}_{\alpha}$ are averages taken assuming hypothesis $\alpha $ is true and $\mathrm{\Delta}Q$ is the variation of Q through time (or accumulated evidence). We note that ${Q}_{1}={e}^{\mathcal{L}(t)}/(1+{e}^{\mathcal{L}(t)})$ and ${Q}_{2}=1/(1+{e}^{\mathcal{L}(t))}$, where $\mathcal{L}(t)=\text{log}{P}_{1}(t)/{P}_{2}(t)$ is the loglikelihood ratio at a given time. We express the variations of the probabilities in terms of the loglikelihood so that we can connect it to the driftdiffusion parameters. The drift and diffusivity that depend a priori on the real concentration are the average and variance of the variation of $\mathcal{L}$, respectively. We have ${\u27e8\mathrm{\Delta}\mathcal{L}(t)\u27e9}_{1}={V}_{(1)}\mathrm{\Delta}t$, ${\u27e8\mathrm{\Delta}\mathcal{L}(t)\u27e9}_{2}={V}_{(2)}\mathrm{\Delta}t$, ${\u27e8{\left(\mathrm{\Delta}\mathcal{L}(t){\u27e8\mathrm{\Delta}\mathcal{L}\u27e9}_{1}\right)}^{2}\u27e9}_{1}=2{D}_{(1)}\mathrm{\Delta}t$ and ${\u27e8{\left(\mathrm{\Delta}\mathcal{L}(t){\u27e8\mathrm{\Delta}\mathcal{L}\u27e9}_{2}\right)}^{2}\u27e9}_{2}=2{D}_{(2)}\mathrm{\Delta}t$ (where $\mathrm{\Delta}$ is the total change over time $\mathrm{\Delta}t$). When the two concentrations are close, the statistics of the acquisition of evidence are symmetric for L_{1} and L_{2} and we get that ${D}_{(1)}={D}_{(2)}=D$ and ${V}_{(1)}={V}_{(2)}=V$. As explained in the paragraph 'Mean decision time : connecting driftdiffusion and Wald’s approaches’ of the main text, computing the derivatives of ${Q}_{i}$ with respect to $\mathcal{L}$ is straightforward. Plugging these relationships into Equation 30 and gathering terms in V and D we find that
from which we derive that $V=D$.
The first passage time to decision
The first exit time of a driftdiffusion process starting from 0 with two symmetric boundaries at +K and −K is given by $\u27e8T\u27e9=K\text{tanh}(VK/2D)/V$ (see for instance [Redner, 2001]). Plugging in $V=D$, we recover Equation 4 of the main text:
We check that this approximation is correct for small concentration differences in Appendix 3—figure 2a,b and c. As mentioned in the main text, because the hyperbolic tangent is very flat for high values of its argument, we find that the mean decision time depends weakly on corrections to $V=D$ when the error rate e is small (which is equivalent to K large). We check that this result in true in Appendix 3—figure 2d. We note that from $K=\text{log}(1e)/e$ we have $\text{tanh}(K/2)=12e$. And so we recover the second part of Equation 4 from the main text
In one of the original papers about SPRT (Wald, 1945a), Wald derives a similar formula for the total number of events before the decision:
where J_{exit} is the number of ONOFF events when decision happens. Combining Equation 34 with Wald’s equality (Wald, 1945b; Durrett, 2010) that states that $\u27e8T\u27e9=\u27e8{J}_{\text{exit}}\u27e9({\tau}_{\text{ON}}+{\tau}_{\text{OFF}})$, we recover Equation 33.
When are correlations between the times of events leading to decision important?
To compute the drift or diffusivity, one must compute the variation of the loglikelihood up to a given time T. The fact that the total time T is fixed introduces correlation between the duration of the different ONOFF events. Can these correlations be ignored, leading to an effective drift or diffusion per cycle? They can only when the two concentrations are close, as we check in Appendix 3—figure 3.
Following the arguments in CarballoPacheco et al., 2019, let us consider the following generating function for the cumulants of the loglikelihood difference ${e}^{\lambda \left({\mathcal{L}}_{2}{\mathcal{L}}_{1}\right)}$ at a given time T assuming hypothesis i is true ($i=1$ or 2) :
where ${\mathcal{L}}_{\alpha}$ is the loglikelihood of hypothesis $\alpha $, n is the number of ONOFF cycles and ${s}_{j}$ and ${t}_{j}$ are respectively the ON and OFF waiting times. This is not exactly what appears in CarballoPacheco et al., 2019, but it is sufficient to grasp the consequences of the correlations introduced by a global constraint on the duration of the process, that is, the $\mathrm{\Theta}$ function. Note that such constraint breaks the statistical independence of the various cycles, and introduces dependencies even though the ${P}_{\text{ON}}$ and ${P}_{\text{OFF}}$ factorize. We rewrite the Heaviside function using the Laplace transform form $\mathrm{\Theta}(x)={\int}_{\gamma i\mathrm{\infty}}^{\gamma +i\mathrm{\infty}}\frac{d\sigma}{2i\pi \sigma}{e}^{\sigma x}$ and the product of the probabilities as ${e}^{{\scriptscriptstyle \sum {p}_{\mathrm{ON}}({s}_{j},{L}_{\alpha})}+{p}_{\text{OFF}}({t}_{j},{L}_{\alpha})}$, where ${p}_{\text{ON}}(.,{L}_{\alpha})$ and ${p}_{\text{OFF}}(.,{L}_{\alpha})$ are the loglikelihoods for hypothesis $\alpha $. Putting the pieces together we obtain
where
One can remark that the expression is factorized, that is, f is raised to the power n, and even f itself can be interpreted as 'expectation value' of ${e}^{\lambda \left[{p}_{\text{ON}}(s,{L}_{2}){p}_{\text{ON}}(s,{L}_{1})\right]}$ (and same with ${p}_{\text{OFF}}(t,{L}_{\alpha})$) over the 'distribution' ${e}^{{p}_{\text{ON}}(s,{L}_{i})+{q}_{\text{OFF}}(t,{L}_{i})\sigma (t+s)}$, that is, the constraint introduces an exponential factor that distorts the weight of the various durations. These elements are consistent with the idea of a random variable per cycle that can be averaged to compute the drift and diffusivity. However, the idea of ignoring correlations between duration of events is not valid because $\sigma $ is a function of $\lambda $. Indeed, summing the series over n gives $f(\sigma ,\lambda )/(1f(\sigma ,\lambda ))$ and the asymptotic behavior is then determined by the first singularity encountered as $\gamma $ is moved to the left for the inverse Laplace transform. For each value of $\lambda $, this determines a value of $\sigma $, that is ${\sigma}_{s}(\lambda )$ (see CarballoPacheco et al., 2019 for the complete calculation).
What the above says more qualitatively is that the variables per cycle are correlated because of the global constraint and there is not a singlecycle effective probability distribution that can account for that because of the dependence of $\sigma $ on $\lambda $. In other words, different moments involve different configurations and distortions of the weights for the durations of the cycles. The only exception is when $d{\sigma}_{s}(\lambda )/d\lambda $ vanishes, which is the case for two close hypotheses. Indeed, from $f({\sigma}_{s}(\lambda ),\lambda )=1$, one obtains $d{\sigma}_{s}/d\lambda \propto \partial f/\partial \lambda $ and it also holds $\partial f/\partial \lambda (\lambda =0)=\u27e8({p}_{\text{ON}}(s,{L}_{2}){p}_{\text{ON}}(s,{L}_{1}))+({p}_{\text{OFF}}(t,{L}_{2}){p}_{\text{ON}}(t,{L}_{1}))\u27e9$. We checked explicitly in the subsection 'Equality between drift and diffusivity' of Appendix 3 that in that limit in the formula from CarballoPacheco et al., 2019 the first two terms are negligible and the diffusivity reduces to the variance of the loglikelihoods only.
Indeed, when the two hypothetical concentrations are close and the correlations can be ignored, a diffusivity per cycle can be defined naturally as
where ${D}_{3}$ is the average of the squared loglikelihood variation (see Equation 24). We check in Appendix 3—figure 3 that this formula is a good approximation for small concentration differences in the context of one binding site. In that context we have
where ν is the OFF rate and $\mu L$ is the ON rate. We find that this approximation is better than the $V=D$ approximation only when the cycle times depend weakly on the concentration (i.e, for one binding site, when ν is small, see Appendix 3—figure 3).
Appendix 4
The mean firstpassage time for different architectures
Activation by multiple TF  irreversible binding allornothing models
We first consider the allornothing $k=2$ two binding site cycle model depicted in Main Text Figure 5a.
In this model, ON events end when the complex formed by two copies of the TF unbinds. It follows that ON times are exponentially distributed and independent of the concentration ${P}_{\text{ON}}(t)=\nu {e}^{\nu t}$. Because they are independent of the concentration, they will not contribute to the log ratios. The mean ON time is given by ${\tau}_{\text{ON}}=1/\nu $.
The activation time (OFF time) requires two copies of the TF to bind and is the sum of the times it takes for each of them to bind. From a probabilistic point of view, this sum becomes a convolution and the OFF times are given by
The average OFF time is ${\tau}_{\text{OFF}}=1/{\mu}_{1}L+1/{\mu}_{2}L$.
We can now compute the drift (the ON times do not contribute to information)
The calculations above are only valid when the two binding rates are different ${\mu}_{1}\ne {\mu}_{2}$. Let us now assume that ${\mu}_{1}={\mu}_{2}$. The ON time distribution is unchanged but the OFF time distribution is now
where we have identified $\gamma (2,1/{\mu}_{1}L)$, the standard Gamma distribution with exponent 2 and parameter $1/{\mu}_{1}L$. The expression of the drift simplifies as:
When the two binding rates μ_{1} and μ_{2} are similar but not exactly equal the activation time distribution can be approximated by the Gamma distribution ${P}_{\text{OFF}}(t,L)\approx \gamma (2,\frac{{\mu}_{1}{\mu}_{2}L}{{\mu}_{2}+{\mu}_{1}})$. It is then convenient to use Equation 43 with $\frac{{\mu}_{1}{\mu}_{2}L}{{\mu}_{2}+{\mu}_{1}}$ as a parameter.
Activation by multiple TF – irreversible binding with 1ormore $k\mathrm{=}\mathrm{1}$ activation
We now consider the nonequilibrium two binding site 1ormore $k=1$ activation model depicted in Main Text Figure 5b. The OFF times are exponentially distributed ${P}_{\text{OFF}}(t,L)={\mu}_{1}L{e}^{{\mu}_{1}Lt}$. They will contribute as in the one bindingsite exponential model.
The ON times are now a convolution of a concentrationdependent exponential step with rate ${\mu}_{2}L$ and a concentrationindependent unbinding with rate ν. Their distribution is
The expression is valid for ${\mu}_{2}L\ne \nu $ and, as previously, reduces to a $\gamma $ distribution with exponent two in case of equality.
The corresponding drift is
Appendix 5
Appendix 6
Allornothing vs 1ormore activation
In this section, we give a more detailed derivation of the result given in the subsection ’Activation rules’ of the main text. Specifically, we compare two activation rules for the same promoter architecture: two binding sites with binding rates μ_{1} and μ_{2}. We consider that the two binding sites bind TF independently so that ${\mu}_{1}=2{\mu}_{2}$ (there are two free targets for the first binding). The two copies of the TF unbind together with unbinding rate ν. It is a cycle model. We consider the 1ormore $k=1$ activation rule (Main Text Figure 5b) and the allornothing activation rule (Main Text Figure 5a). We will prove that the fastest activation scheme is the allornothing scheme when ${\mu}_{1}L>\nu $ and 1ormore when ${\mu}_{1}L<\nu $. This result corresponds to the following intuitive idea : given a choice to associate the second binding with either the first binding or the unbinding for the readout by the cell, it is more informative to convolve it with the fastest of the rates to maximize the information extracted from it. This is in general true if the time per cycle is the same for the different architectures considered.
The total time it takes to go through a cycle is the same for the two architectures: ${\tau}_{\text{ON}}+{\tau}_{\text{OFF}}=1{\mu}_{1}+1/{\mu}_{2}+1/\nu $, so the difference in performance will come from the amount of information extracted per cycle. We are interested in the limit of similar concentrations, that is, $L={L}_{2}$ and ${L}_{1}=L+\delta L$, $\delta L\ll L$. Since $D=V$ and the decision time is inversely proportional to V, it is enough to compare ${V}^{(\text{all})}$ (the drift in the allornothing scheme) with ${V}^{(\text{one})}$ (the drift in the 1ormore scheme) to determine the fastest architecture.
To compute ${V}^{(\text{all})}$, we start from Equation 41, plugging in ${L}_{2}=L$, ${L}_{1}=L+\delta L$ and ${\mu}_{1}=2{\mu}_{2}$. We expand for $\delta L/L\to 0$:
where $\zeta $ is the Riemann zeta function. Eventually we have
While the expansion of ${V}^{\text{one}}$ does not take a simple form in general, it can be computed for the specific value of ${\mu}_{1}L=\nu $. When $\nu =L{\mu}_{1}=2L{\mu}_{2}$, $L={L}_{2}$ and ${L}_{1}=L+\delta L$, expanding to second order Equation 45 becomes
We can now use the above equality and the following arguments to reach the conclusion stated at the beginning of the section. ${V}^{(\text{all})}$ is independent of ν because no information is gained about the concentrations from the step involving unbinding. Conversely, ${V}^{(\text{one})}$ is an increasing function of ν: as ν decreases, the unbinding part takes over in the convolution for the ON times. Since the difference between the two convolutions can only decrease as ν decreases, it becomes harder and harder to differentiate the two ON time distributions, their KullbackLeibler divergence becomes smaller, and the drift term ${V}^{(\text{one})}$ is reduced.
In sum, for $\nu ={\mu}_{1}L$, ${V}^{(\text{all})}={V}^{(\text{one})}$; ${V}^{(\text{all})}$ is independent of ν whilst ${V}^{(\text{one})}$ increases with ν. We conclude that for $\nu >{\mu}_{1}L$, ${V}^{(\text{one})}>{V}^{(\text{all})}$ and the 1ormore scheme is preferred. Conversely, for $\nu <{\mu}_{1}L$, ${V}^{(\text{one})}<{V}^{(\text{all})}$ and the allornothing scheme is preferred.
Appendix 7
Comparing one and two bindingsite architectures
In this section we compare the performance of a one bindingsite architecture (Main Text Figure 5d) to that of an equilibrium allornothing $k=2$ two bindingsite architecture (Main Text Figure 5c). The motivation is to provide detailed explanations for the results given in subsection 'How the number of binding sites affects decisions' of the main text. To rank their performance, we compare, as in the previous appendix 6, the drift for the two architectures.
To do the comparison, we optimize the two binding site architecture rates μ_{1}, μ_{2} and ν_{1} for a fixed value of the error rate e, the real concentration L, the hypothetical concentrations L_{1} and L_{2} and the second unbinding rate ${\nu}_{2}$. The fixed second unbinding rate sets a time scale for the process. For concreteness, we suppose that the real concentration $L={L}_{2}$ and decision is hard ${L}_{1}=L+\delta L$, $\delta L/L\ll 1$.
Decisions can be trivially sped up by increasing indefinitely all the rates to very high values. To avoid this, in the optimization process we set an upper bound to be $5{s}^{1}$. We choose this upper bound to be smaller than the largest values of ${\nu}_{2}$ considered ($9{s}^{1}$) and larger than the smaller values of ${\nu}_{2}$ considered ($1{s}^{1}$). From a biological point of view, this upper bound for the ON rates can correspond to the diffusion limited arrival rate. For the OFF rates, it can correspond to a minimum bound time required to trigger a downstream mechanism (for instance, the activation of the gene). We check in Appendix 7—figure 2 that curves and effects similar to those discussed below are obtained if the upper bounds are modified.
For all values of parameters, we find that the optimal ON rates are maximal (${\mu}_{1}={\mu}_{2}=5{s}^{1}$). We observed and discussed in subsection 'How the number of binding sites affects decision' of the main text that for certain values of the parameters (ν and L), the optimal first unbinding rate ν_{1} reaches the upper bound $5{s}^{1}$ while for smaller values the optimal first unbinding rate remains at 0 (Main Text Figure 5f inset). The transition between the two regions is sharp. Setting ν_{1} to 0 is equivalent to having an effective one binding site model because the promoter never goes back to the 0 state. For that reason, we want to compare the performance of a two binding site model with parameters μ_{1}, μ_{2}, ν_{1} and ${\nu}_{2}$ to that of a one binding site model with parameters μ_{2} and ${\nu}_{2}$.
Since we are in the limit of small concentration differences, the speed of decision is set by V and $\tau ={\tau}_{\text{ON}}+{\tau}_{\text{OFF}}$. We denote the mean drifts of the one and the two bindingsite models as ${V}^{(1)}$ and ${V}^{(2)}$, respectively. Similarly, ${\tau}_{1}$ and ${\tau}_{2}$ are the mean times per cycle of the two models.
The one bindingsite architecture activates with rate ${\mu}_{2}L$ and deactivates with rate ${\nu}_{2}$. Both waiting times for ON and OFF expression states are exponential. Following results in the paragraph 'Mean decision time : connecting driftdiffusion and Wald’s approaches' of the main text, we have ${\tau}_{1}=1/{\nu}_{2}+1/({\mu}_{2}L)$ and ${V}^{(1)}=(1/{\nu}_{2}+1/{\mu}_{2}L)\left[\text{log}({L}_{1}/{L}_{2})+({L}_{2}{L}_{1})/L\right]$. We conclude that :
In the two binding site architecture, the ON times are exponentially distributed with rate ${\nu}_{2}$. The OFF times are more complex as they can result from several cycles from a promoter with a binding site occupied to an empty promoter before finally switching to two full binding sites and gene activation. We compute ${P}_{\text{OFF}}(t,L)$ using the modified transition matrix of the Markov chain with the ON states acting as sinks as described in Appendix 3. The OFF time distribution is given by the derivative of Equation 12.
In the two binding site model, the transition matrix is a 3 by 3 matrix and can be diagonalized analytically to compute explicitly the exponential in Equation 12 . We set ${\mu}_{1}{L}_{\text{eq}}={\mu}_{2}{L}_{\text{eq}}={\nu}_{1}$ since it is always the case in the identified optimal architectures for the two binding site model (where ${L}_{\text{eq}}$ is the specific value of L for which upper bounds on ν_{1} and $\mu L$ are equal, ${L}_{\text{eq}}=1\mu {m}^{3}$ in Main Text Figure 5f). We compute the distribution of the OFF times explicitly and find
where $r=L/{L}_{\text{eq}}$.
Computing the mean of the distribution in Equation 50 , we find that the average time for a cycle in the two binding site architecture is
We can now numerically integrate ${V}^{(2)}={\tau}^{(2)}\int {P}_{\text{OFF}}(t,L)\text{log}\left({P}_{\text{OFF}}(t,{L}_{1})/{P}_{\text{OFF}}(t,L)\right)$. We use this simplified formula to compare the performance of the two architectures and recover the results of Main Text Figure 5f . We show an example for a specific value of L, varying ${\nu}_{2}$ in Appendix 7—figure 1.
Appendix 8
Out of equilibrium architectures and averaging over several steps
In some architectures where several copies of the TF can bind or unbind at once, there can be correlations between successive ON and OFF times. This depends on the structure of the promoter Markov chain. It is made of OFF and ON states. There can be correlations between the OFF and ON times if the chain can enter the ON state subgraph through different ON states, coming from different OFF states (or the same situation reverting ON and OFF states). In that case, the time that the chain will spend in the ON state will depend on the initial entry state, and so it will depend on the OFF state from which the chain entered the ON subgraph, giving a correlation with the previous OFF time to the ON time. If the structure is particularly complex, these correlations can span over several ONOFF cycle. We do however assume that the chain is ergodic so that they vanish at long times.
To deal with this situation, a solution is to average the contribution to the information of ON/OFF events over several ON/OFF times, until the correlation with the initial times is lost. The event in the log ratio becomes a series of ON and OFF times and their joint probability (as they no longer factorize). The next series of events can be considered approximately (or exactly in certain cases) independent of the previous series and the rest of the theory can be applied to this sum of independent variables.
We did not explore these architectures in detail as they are extremely complex, do not seem to increase the performance of the promoter for the readout, and most likely the type of information about the concentration that is hidden in the correlation between ON and OFF would be very hard to decode for downstream mechanisms.
Appendix 9
Appendix 10
Parameters for Main Text Figure 6
Parameters for Main Text Figure 5 panel a, panel b blue full line, panel c red and panel d red are those identified as optimal for $k=3$, $H>4$: ${\mu}_{1}=0.1262\mu {m}^{3}{s}^{1}$, ${\mu}_{2}=0.1104\mu {m}^{3}{s}^{1}$, ${\mu}_{3}=0.0868\mu {m}^{3}{s}^{1}$, ${\mu}_{4}=0.0662\mu {m}^{3}{s}^{1}$, ${\mu}_{5}=0.0441\mu {m}^{3}{s}^{1}$, ${\mu}_{6}=0.0220\mu {m}^{3}{s}^{1}$, ${\nu}_{1}=1.0793{s}^{1}$, ${\nu}_{2}=0.5933{s}^{1}$, ${\nu}_{3}=0.7961{s}^{1}$, ${\nu}_{4}=0.5169{s}^{1}$, ${\nu}_{5}=0.0908{s}^{1}$, ${\nu}_{6}=0.1225{s}^{1}$.Parameters for Main Text Figure 5 panel b green dashed line, panel c blue and panel d blue are those identified as optimal for $k=2$, $H>4$: ${\mu}_{1}=0.1325\mu {m}^{3}{s}^{1}$, ${\mu}_{2}=0.1088\mu {m}^{3}{s}^{1}$, ${\mu}_{3}=0.0861\mu {m}^{3}{s}^{1}$, ${\mu}_{4}=0.0662\mu {m}^{3}{s}^{1}$, ${\mu}_{5}=0.0431\mu {m}^{3}{s}^{1}$, ${\mu}_{6}=0.0215\mu {m}^{3}{s}^{1}$, ${\nu}_{1}=1.0384{s}^{1}$, ${\nu}_{2}=1.5926{s}^{1}$, ${\nu}_{3}=0.6007{s}^{1}$, ${\nu}_{4}=0.2966{s}^{1}$, ${\nu}_{5}=0.1581{s}^{1}$, ${\nu}_{6}=0.0947{s}^{1}$.Other lines in panel b correspond to architectures that are close to optimality, with parameters in the same range as the ones given above.
Appendix 11
Approximating the loglikelihood function with RNA levels
In this section, we give details of the calculations linked to the model of RNA production presented in the section 'Estimating the loglikelihood function with RNA concentrations' of the main text. In this model, we assume that when the promoter enters the ON state, it takes a time ${d}_{\text{ON}}$ before polymerase is loaded. This time can be associated with the formation of polymerase clusters at the transcription sites (Cho et al., 2016a; Cho et al., 2016b or of more complex multifactor transient hubs recently observed in Drosophila embryos (Park et al., 2019). Once the promoter returns to the OFF state, our model also includes a delay ${d}_{\text{OFF}}$ during which the gene continues loading polymerase. This delay can be associated with the dissolution of the clusters and hubs mentioned above or simply with the inertia of polymerase loading. Under these assumptions, the contribution of an ON time t to RNA level is
The contribution of an OFF time s to RNA level is given by
We have that the RNA level at time T is
where the ${\{{t}_{i}\}}_{i}$ are the ON times up to time T and ${\{{s}_{j}\}}_{j}$ the OFF times up to time T.
We look for sets of parameters that give a positive drift ${V}_{1}$ when $L>1.05{L}_{0}$ and negative drift ${V}_{2}$ when $L<0.95{L}_{0}$. For such architectures, the ON and OFF production levels will roughly balance each other at the boundary. To ensure a positive RNA level at the boundary, we redefine the ON rate: ${r}_{\text{ON}}^{*}={r}_{\text{ON}}{r}_{b}>0$ so that the basal rate is now a factor proportional to time in both ON time and OFF time RNA contributions. In that sense, the basal rate is a systematic drift of the RNA level that is proportional to time and can be removed from the equations by defining moving boundaries ${c}_{1}(t)={c}_{1}(0)+{r}_{b}t$ and ${c}_{2}(t)={c}_{2}(0)+{r}_{b}t$. With these redefined rates we now look for architectures such that the drift at the anterior boundary is positive:
and the drift at the posterior boundary is negative:
Specifically, we look for architectures that satisfy these conditions and for which the ratio of the drift over the variance contributions to the loglikelihood is as large as possible to avoid dynamics that are dominated by noise. Finally we look for a set of threshold concentrations that give the most favorable speedaccuracy tradeoff. We show the results of our nonexhaustive search in Appendix 11—figure 1. For one of the architectures that fall below the error level of 32% and the decision time threshold of 3 min, we check the RNA expression profile (Main text Figure 7). For RNA degradation, we use the rate ${r}_{\text{OFF}}[\text{RNA}]/(1+[\text{RNA}])$ that reduces to ${r}_{\text{OFF}}$ close to the boundary and yields the desired linear in time degradation level, but that is proportional to $[\text{RNA}]$ for small values of the concentration. For this function, we compute the average number of RNA molecules in the cell after 3 min of dynamics (Main text Figure 7) and find that the RNA profile has a Hill coefficient of 5.2.
Appendix 12
Exploring the effect of joint enhancer and promoter dynamics
In this section, we explore the effect of the computational power of an enhancer when added to that of the promoter. Bicoid is known to target many gene enhancers (Driever and NüssleinVolhard, 1988; Struhl et al., 1989 following complex concentration dependent patterns (Hannon et al., 2017) . We choose one of the many possible models of enhancer dynamics as en example. For simplicity, we assume a twostate enhancer that is made of one Bicoidbinding site, with dynamics independent of the local promoter dynamics. We assume that the enhancer turns on with a rate ${e}_{ON}$ and turns off with a rate ${e}_{OFF}$ following Markovian dynamics (i.e the waiting times are exponentially distributed). We take a simple rule for gene activity: the gene is ON when both the promoter (which is still a six binding site promoter as in the previous models) and the enhancer are in the ON states.
From a mathematical point of view, the complete system can be viewed as a 14state Markov chain with states $(i,j)$, $0\le i\le 6$, $0\le j\le 1$ corresponding to the promoter having i Bicoid molecules bound and the enhancer having j Bicoid molecules bound. We use this Markov chain to compute the waiting time statistics of the ON and OFF states of the hunchback gene activity. An added difficulty is that the OFF state can now be entered through two transitions: the promoter turning OFF, or the enhancer turning OFF. To account for this degeneracy, we compute the OFF time statistics for each transition. We then compute the complete probability distribution using a weighted average of the two distributions with specific transitions, where the weights are the probability of each transition happening first. We check that this method predicts the correct waiting times (see Appendix 12—figure 1). We note that this method overlooks the correlation between successive ON and OFF times. A perfect Bayesian machine could take advantage of these correlations to improve its estimate of the concentration. Here we assume that the cell cannot.
Once the waiting times for the ON and OFF states of the gene are computed, the drift and diffusion of the SPRT process is obtained by integration and the first passage time to decision between anterior and posterior given by Equation 32. Using that scheme, we optimize the parameters of the enhancers and promoter for the fastest decision with a given error between the anterior and posterior boundary regions. As an example, we picked the case $k=2$ for the promoter activation rule and imposed $H>4$. We still impose that half the nuclei at the boundary be active on average (i.e the product of the activities of the enhancer and the promoter at the boundary is equal to $1/2$). We also limit the binding rate to be diffusion limited. We find that enhancer dynamics improve the performance of the cell: the optimal scheme performs the decision in an average of $100s$ instead of $108s$ for the promoter alone. In these optimal schemes the enhancer is mostly ON due to slower binding dynamics than the promoter (only one binding site versus 6). Assuming the enhancer target is six times larger (or equivalently that the enhancer also has six binding sites), but that the enhancers dynamics can still be approximated by exponential waiting times, we find that in the optimal scheme, the enhancer is ON for a reasonable fraction of the time (about 75 %) and that the performance is improved down to $\simeq 70s$.
Enhancer presence seems to improve the flexibility of promoter states to get good performance while maintaining a high Hill coefficient and half the genes active at the boundary.
A complete study of enhancer dynamics is beyond the scope of this paper but would include other models such as genes that activate when either the enhancer or the promoter are ON, or generally dynamics where the enhancer state influences the transition rates of the promoter.
Data availability
All data analyzed during this study were previously published in the literature and references are included in the paper.
References

Know the SingleReceptor sensing limit? think againJournal of Statistical Physics 162:1353–1364.https://doi.org/10.1007/s1095501514129

Physics of chemoreceptionBiophysical Journal 20:193–219.https://doi.org/10.1016/S00063495(77)855446

Physical limits to biochemical signalingPNAS 102:10040–10045.https://doi.org/10.1073/pnas.0504321102

Perceptual decision making: driftdiffusion model is equivalent to a Bayesian modelFrontiers in Human Neuroscience 8:102.https://doi.org/10.3389/fnhum.2014.00102

Environmental sensing, information transfer, and cellular decisionmakingCurrent Opinion in Biotechnology 28:149–155.https://doi.org/10.1016/j.copbio.2014.04.010

Impact of RNA modifications and RNAModifying enzymes on eukaryotic ribonucleasesThe Enzymes, RNA Modification 41:299–329.https://doi.org/10.1016/bs.enz.2017.03.008

Speedaccuracy tradeoffs in animal decision makingTrends in Ecology & Evolution 24:400–407.https://doi.org/10.1016/j.tree.2009.02.010

Precision of readout at the hunchback gene: analyzing short transcription time traces in living fly embryosPLOS Computational Biology 12:e1005256.https://doi.org/10.1371/journal.pcbi.1005256

BookProbability: Theory and ExamplesCambridge: Cambridge University Press.https://doi.org/10.1017/CBO9780511779398

Maximum likelihood and the single receptorPhysical Review Letters 103:158101.https://doi.org/10.1103/PhysRevLett.103.158101

Role of spatial averaging in the precision of gene expression patternsPhysical Review Letters 103:258101.https://doi.org/10.1103/PhysRevLett.103.258101

Studies of nuclear and cytoplasmic behaviour during the five mitotic cycles that precede gastrulation in Drosophila embryogenesisJournal of Cell Science 61:31.

The neural basis of decision makingAnnual Review of Neuroscience 30:535–574.https://doi.org/10.1146/annurev.neuro.29.051605.113038

The gap gene networkCellular and Molecular Life Sciences 68:243–274.https://doi.org/10.1007/s000180100536y

The BergPurcell limit revisitedBiophysical Journal 106:976–985.https://doi.org/10.1016/j.bpj.2013.12.030

Enhancer and promoter interactionslong distance callsCurrent Opinion in Genetics & Development 22:79–85.https://doi.org/10.1016/j.gde.2011.11.001

Live imaging of bicoiddependent transcription in Drosophila embryosCurrent Biology 23:2135–2139.https://doi.org/10.1016/j.cub.2013.08.053

3 minutes to precisely measure morphogen concentrationPLOS Genetics 14:e1007676.https://doi.org/10.1371/journal.pgen.1007676

On optimal decisionmaking in brains and social insect coloniesJournal of the Royal Society Interface 6:1065–1074.https://doi.org/10.1098/rsif.2008.0511

Dense Bicoid hubs accentuate binding along the morphogen gradientGenes & Development 31:1784–1794.https://doi.org/10.1101/gad.305078.117

Limits of sensing temporal concentration changes by single cellsPhysical Review Letters 104:248101.https://doi.org/10.1103/PhysRevLett.104.248101

Mutations affecting the pattern of the larval cuticle in Drosophila melanogaster : I. zygotic loci on the second chromosomeWilhelm Roux's Archives of Developmental Biology 193:267–282.https://doi.org/10.1007/BF00848156

Temporal pattern recognition through analog molecular computationACS Synthetic Biology 8:826–832.https://doi.org/10.1021/acssynbio.8b00503

Embryonic cleavage cycles: how is a mouse like a fly?Current Biology 14:R35–R45.https://doi.org/10.1016/j.cub.2003.12.022

Growing an embryo from a single cell: a hurdle in animal lifeCold Spring Harbor Perspectives in Biology 7:a019042.https://doi.org/10.1101/cshperspect.a019042

Precision of hunchback expression in the Drosophila embryoCurrent Biology 22:2247–2252.https://doi.org/10.1016/j.cub.2012.09.051

Infomax strategies for an optimal balance between exploration and exploitationJournal of Statistical Physics 163:1454–1476.https://doi.org/10.1007/s1095501615210

BookA Guide to FirstPassage ProcessesCambridge: Cambridge University Press.https://doi.org/10.1017/CBO9780511606014

Decision making in the arrow of timePhysical Review Letters 115:250602.https://doi.org/10.1103/PhysRevLett.115.250602

Network representations and methods for the analysis of chemical and biochemical pathwaysMolecular BioSystems 9:2189–2200.https://doi.org/10.1039/c3mb70052f

Mutual repression enhances the steepness and precision of gene expression boundariesPLOS Computational Biology 8:e1002654.https://doi.org/10.1371/journal.pcbi.1002654

Only accessible information is useful: insights from gradientmediated patterningRoyal Society Open Science 2:150486.https://doi.org/10.1098/rsos.150486

Precision in a rush: tradeoffs between reproducibility and steepness of the hunchback expression patternPLOS Computational Biology 14:e1006513.https://doi.org/10.1371/journal.pcbi.1006513

Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cellCurrent Opinion in Cell Biology 15:221–231.https://doi.org/10.1016/S09550674(03)000176

On cumulative sums of random variablesThe Annals of Mathematical Statistics 15:283–296.https://doi.org/10.1214/aoms/1177731235

Sequential tests of statistical hypothesesThe Annals of Mathematical Statistics 16:117–186.https://doi.org/10.1214/aoms/1177731118

Some generalizations of the theory of cumulative sums of random variablesThe Annals of Mathematical Statistics 16:287–293.https://doi.org/10.1214/aoms/1177731092
Article and author information
Author details
Funding
National Science Foundation (PoLS Grant 1411313)
 Massimo Vergassola
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We are grateful to Gautam Reddy and Huy Tran for multiple relevant discussions. This work was supported by the National Science Foundation, Program PoLS, Grant 1411313. JD was partially supported by the NSFSimons Center for Quantitative Biology (Simons Foundation SFARI/597491RWC and the National Science Foundation 17764421). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Version history
 Received: June 28, 2019
 Accepted: July 29, 2020
 Accepted Manuscript published: July 29, 2020 (version 1)
 Version of Record published: August 14, 2020 (version 2)
Copyright
© 2020, Desponds et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,530
 views

 355
 downloads

 21
 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Developmental Biology
 Physics of Living Systems
Geometric criteria can be used to assess whether cell intercalation is active or passive during the convergent extension of tissue.

 Computational and Systems Biology
 Physics of Living Systems
Locomotion is a fundamental behavior of Caenorhabditis elegans (C. elegans). Previous works on kinetic simulations of animals helped researchers understand the physical mechanisms of locomotion and the musclecontrolling principles of neuronal circuits as an actuator part. It has yet to be understood how C. elegans utilizes the frictional forces caused by the tension of its muscles to perform sequenced locomotive behaviors. Here, we present a twodimensional rigid body chain model for the locomotion of C. elegans by developing Newtonian equations of motion for each body segment of C. elegans. Having accounted for frictioncoefficients of the surrounding environment, elastic constants of C. elegans, and its kymogram from experiments, our kinetic model (ElegansBot) reproduced various locomotion of C. elegans such as, but not limited to, forwardbackward(omega turn)forward locomotion constituting escaping behavior and deltaturn navigation. Additionally, ElegansBot precisely quantified the forces acting on each body segment of C. elegans to allow investigation of the force distribution. This model will facilitate our understanding of the detailed mechanism of various locomotive behaviors at any given frictioncoefficients of the surrounding environment. Furthermore, as the model ensures the performance of realistic behavior, it can be used to research actuatorcontroller interaction between muscles and neuronal circuits.