Predictive modeling reveals that higher-order cooperativity drives transcriptional repression in a synthetic developmental enhancer

  1. Yang Joon Kim
  2. Kaitlin Rhee
  3. Jonathan Liu
  4. Selene Jeammet
  5. Meghan A Turner
  6. Stephen J Small
  7. Hernan G Garcia  Is a corresponding author
  1. Chan Zuckerberg Biohub, United States
  2. Department of Chemical Biology, University of California, Berkeley, United States
  3. Department of Physics, University of California, Berkeley, United States
  4. Department of Biology, Ecole Polytechnique, France
  5. Biophysics Graduate Group, University of California, Berkeley, United States
  6. Department of Biology, New York University, United States
  7. Department of Molecular and Cell Biology, University of California, Berkeley, United States
  8. Institute for Quantitative Biosciences–QB3, University of California at Berkeley, United States

Abstract

A challenge in quantitative biology is to predict output patterns of gene expression from knowledge of input transcription factor patterns and from the arrangement of binding sites for these transcription factors on regulatory DNA. We tested whether widespread thermodynamic models could be used to infer parameters describing simple regulatory architectures that inform parameter-free predictions of more complex enhancers in the context of transcriptional repression by Runt in the early fruit fly embryo. By modulating the number and placement of Runt binding sites within an enhancer, and quantifying the resulting transcriptional activity using live imaging, we discovered that thermodynamic models call for higher-order cooperativity between multiple molecular players. This higher-order cooperativity captures the combinatorial complexity underlying eukaryotic transcriptional regulation and cannot be determined from simpler regulatory architectures, highlighting the challenges in reaching a predictive understanding of transcriptional regulation in eukaryotes and calling for approaches that quantitatively dissect their molecular nature.

Editor's evaluation

The work by Kim et al., used synthetic constructs in Drosophila to examine the relationship between regulators and transcription initiation. By measuring regulator concentrations and the corresponding RNA polymerase initiation rates in different synthetic constructs and using a thermodynamic model, the authors concluded that higher-order cooperativities between the repressor on adjacent binding sites, and that between the repressor and RNA polymerase are needed to explain the observed response curves in RNA polymerase loading rate. This work targets a challenging question in eukaryotic transcription regulation, where higher-order cooperativity between different molecular components, in addition to simple transcription factor binding and unbinding, is often necessary to account for observed promoter behaviors when multiple elements (repressors, mediators, activators) exist.

https://doi.org/10.7554/eLife.73395.sa0

Introduction

During embryonic development, transcription factors bind stretches of regulatory DNA termed enhancers to dictate the spatiotemporal dynamics of gene expression patterns that will lay out the future body plan of multicellular organisms (Spitz and Furlong, 2012; Small and Arnosti, 2020). One of the greatest challenges in quantitative developmental biology is to predict these patterns from knowledge of the number, placement, and affinity of transcription factor binding sites within enhancers. The early embryo of the fruit fly Drosophila melanogaster has become one of the main workhorses in this attempt to achieve a predictive understanding of cellular decision-making in development due to its well-characterized gene regulatory network and transcription factor binding motifs, and the ease with which its development can be quantified using live imaging (Garcia et al., 2020; Small and Arnosti, 2020; Rivera et al., 2019).

Predictive understanding calls for the derivation of theoretical models that generate quantitative and experimentally testable predictions. Thermodynamic models based on equilibrium statistical mechanics have emerged as a widespread theoretical framework to achieve this goal (Ackers et al., 1982; Vilar and Leibler, 2003; Bolouri and Davidson, 2003; Bintu et al., 2005b; Bintu et al., 2005a; Segal et al., 2008; Fakhouri et al., 2010; Sayal et al., 2016; Phillips et al., 2019; Eck et al., 2020). For instance, over the last decade, a dialogue between these thermodynamic models and experiments demonstrated the capacity to quantitatively predict bacterial transcriptional regulation from knowledge of the DNA regulatory architecture (He et al., 2010; Garcia and Phillips, 2011; Brewster et al., 2014; Garcia et al., 2012; Sepúlveda et al., 2016).

The predictive power of these models is evident when inferring model parameters from simple regulatory architectures and using those parameters to make parameter-free predictions of more complex architectures (Boedicker et al., 2013a; Boedicker et al., 2013b, Razo-Mejia et al., 2018; Phillips et al., 2019). Consider, for example, that RNA polymerase II (RNAP)—which we take as a proxy for the whole basal transcriptional machinery—binds to a promoter with a dissociation constant Kp. When RNAP is bound, transcription is initiated at a rate R (Figure 1A). In the absence of any regulation, a thermodynamic model will only have Kp and R as its free parameters which can be experimentally determined by, for example, measuring mRNA distributions (Razo-Mejia et al., 2020). Now, we assume that the parameters Kp and R inferred in this step do not just enable a fit to the data, but that their values represent physical quantities that remain unaltered as more complex regulatory architectures are iteratively considered. As a result, when we consider the case where a single repressor molecule can bind, our model calls for only two new free parameters: a dissociation constant for repressor to its binding motif Kr, and a negative cooperativity between repressor and RNAP, ωrp, that makes the recruitment of RNAP to the DNA less favorable when the repressor is bound to its binding site (Figure 1B). Once again, after determining Kr and ωrp experimentally (Phillips et al., 2019), we consider the case where two repressors can bind simultaneously (Figure 1C). If the repressors interact with RNAP independently of each other, then our model has no remaining free parameters such that we will have reached complete predictive power. However, protein-protein interactions between repressors could exist or even higher-order interactions giving rise to a repressor-repressor-RNAP ternary complex might be present. This extra complexity would require yet another round of experimentation to quantify these interactions represented by ωrr and ωrrp in Figure 1C, respectively. Even after quantifying these parameters, predictive power might not be reached if, after adding yet another repressor binding site, a complex between all three repressors and RNAP can be formed (Figure 1D).

Building up predictive models of transcriptional repression.

(A) In the absence of repressor binding, gene expression can be characterized by a dissociation constant between RNAP and the promoter Kp and the rate of transcription initiation when the promoter is bound by RNAP R. (B) In the presence of a single repressor binding site, models need to account for two additional parameters describing the repressor dissociation constant Kr and a repressor-RNAP interaction term ωrp. (C) For two-repressor architectures, parameters accounting for repressor-repressor interactions ωrr and for interactions giving rise to a repressor-repressor-RNAP complex could also have to be incorporated. (D) For the case of three repressor binding sites, additional parameters ωrrr and ωrrrp capturing the higher-order cooperativity between three repressor molecules and between three Runt molecules and RNAP, respectively, could be necessary. Note the nomenclature shown below each construct, which indicates which Runt binding sites are present in each construct.

While protein-protein cooperativity captured by ωrr has been studied both in bacteria (Ackers et al., 1982; Ptashne and Gann, 2002) and eukaryotes (Giniger and Ptashne, 1988; Ma et al., 1996; Lebrecht et al., 2005; Parker et al., 2011; Fakhouri et al., 2010; Sayal et al., 2016), the necessity of accounting for higher-order interactions such as those described in our example by the ωrrp and ωrrrp terms had only been demonstrated in archeae (Peeters et al., 2013) and bacteria (Dodd et al., 2004). The need to invoke this higher-order cooperativity in eukaryotes only became apparent in the last few years (Estrada et al., 2016b; Park et al., 2019; Biddle et al., 2020). These higher-order cooperativities might be necessary in order to account for the complex interactions mediated by, for example, the recruitment of co-repressors (Courey and Jia, 2001; Walrad et al., 2011), mediator complex (Park et al., 2019), or any other element of the transcriptional machinery. As a result, while posing a challenge to reaching a parameter-free predictive understanding of transcriptional regulation, higher-order cooperativity provides an avenue for quantifying the complexity of the molecular processes underlying eukaryotic cellular decision-making.

In this paper, we sought to test whether an iterative and predictive approach, such as that outlined in Figure 1, was possible for transcriptional repression in the early embryo of the fruit fly Drosophila melanogaster or whether it is necessary to invoke higher-order cooperativities that challenge the reach of our predictive models as we add more complexity to the system. To make this possible, we engineered binding sites for the Runt repressor into the Bicoid-activated hunchback P2 minimal enhancer. We systematically varied the number and placement of Runt binding sites within this enhancer (Chen et al., 2012) in order to determine whether model fits to real-time transcriptional measurements from the enhancer constructs containing only one-Runt binding site could accurately predict repression in two- and three-Runt binding site constructs (Figure 1). We found that a thermodynamic model can recapitulate all our data. However, we also discovered that, while the model could describe repression by a single Runt repressor, protein-protein and higher-order cooperativities had to be invoked in order to quantitatively account for regulation by two or more repressor molecules. While these higher-order cooperativities limit the iterative bottom-up discourse between theory and experiment that has been successful in bacteria (Phillips et al., 2009), they also provide a concrete theoretical framework for quantifying the complexities behind eukaryotic transcriptional control, and call for the development of new theories and experiments specifically conceived to uncover the the molecular underpinnings of this complexity.

Results

Predicting transcription rate using a thermodynamic model of Bicoid activation and Runt repression

Inspired by the theory-experiment dialogue leading to predictive understanding of the lac operon in E. coli over the last four decades (Phillips et al., 2019; Razo-Mejia et al., 2018; Garcia and Phillips, 2011; Garcia et al., 2012; Ackers et al., 1982; Buchler et al., 2003), we built a predictive model of Runt repression on the Bicoid-activated hunchback P2 enhancer using the thermodynamic model framework (Phillips et al., 2019; Bintu et al., 2005b; Bintu et al., 2005a) with the goal of predicting the rate of transcription initiation as a function of input transcription factor concentration, and the number and placement of Runt repressor binding sites. Our model rests on the ‘occupancy hypothesis’ that states that the rate of mRNA production, d[mRNA]/dt, is proportional to the probability of the promoter being bound by RNA polymerase II (RNAP), pbound, such that

(1) d[mRNA]dt=Rpbound,

where R is the rate of mRNA production when the promoter is occupied by RNAP. Note that, throughout this study, we treat the rate of transcription initiation and the rate of RNAP loading interchangeably.

To generate intuition, we start by modeling the case of hunchback P2 with one Runt binding site. Figure 2A illustrates the possible states the system can be found in. Each state has an associated statistical weight which can be calculated as prescribed by equilibrium statistical mechanics (Bintu et al., 2005b; Bintu et al., 2005a). Here, we assume that there are six Bicoid binding sites with the same dissociation constant given by Kb, one Runt binding site with a dissociation constant specified by Kr, and a promoter with a dissociation constant for RNAP prescribed by Kp. In the absence of Runt, we consider four states as shown in the top two rows of Figure 2A. Here, we assume that Bicoid-Bicoid cooperativity is so strong that the enhancer can either be unoccupied or completely bound by Bicoid molecules (Gregor et al., 2007; Park et al., 2019). Further, we consider an interaction between Bicoid and RNAP given by ωbp. For simplicity, we use the dimensionless parameters b=[Bicoid]/Kb, r=[Runt]/Kr and p=[RNAP]/Kp. These assumptions lead to a functional form reminiscent of a Hill function that explains the sharp step-like expression pattern along the embryo’s anterior-posterior axis of the hunchback gene (Gregor et al., 2007; Park et al., 2019; Driever and Nüsslein-Volhard, 1988; Driever et al., 1989). A full thermodynamic model in which we do not make this assumption of high Bicoid-Bicoid cooperativity is discussed in detail in Section ‘Derivation of the general thermodynamic model for the hunchback P2 enhancer’ and Section ‘Derivation of the general and simpler thermodynamic model for the hunchback P2 enhancer with one Runt binding site’.

Figure 2 with 2 supplements see all
Thermodynamic model of transcriptional regulation by Bicoid activator and Runt repressor.

(A) States and statistical weights for the regulation of hunchback P2 with one Runt binding site in the limit of strong Bicoid-Bicoid cooperativity. Here, we use the dimensionless parameters b=[Bicoid]/Kb, r=[Runt]/Kr, and p=[RNAP]/Kp, where Kb, Kr, and Kp are the dissociation constants of Bicoid, Runt, and RNAP, respectively. ωbp represents the cooperativity between Bicoid and RNAP, ωrp captures the cooperativity between Runt and RNAP, and R represents the rate of transcription when the promoter is occupied by RNAP. The top two rows correspond to states where only Bicoid and RNAP act, while the bottom two rows represent repression by Runt. (B) Representative prediction of RNAP loading rate as a function of Bicoid and Runt concentrations for ωbp=3,ωrp=0.001,p=0.001,R=1(AU/min).

The molecular mechanism by which Runt downregulates transcription of its target genes remains unclear (Chen et al., 2012; Hang and Gergen, 2017; Koromila and Stathopoulos, 2017; Koromila and Stathopoulos, 2019). Here, we assume the so-called ‘direct repression’ model (Gray et al., 1994) that posits that Runt operates by inhibiting RNAP binding to the promoter through a direct Runt-RNAP interaction term given by ωrp<1 independently of Bicoid. As a result, in the presence of Runt, we consider four additional states as shown in the bottom two rows of Figure 2A. Other potential mechanisms of Runt repression are further discussed in Supplementary Section ‘Comparison of different modes of repression’, where we also show that the choice of specific mechanism does not change our conclusions.

Given these assumptions, we arrive at the microstates and corresponding statistical weights shown in Figure 2A. The probability of finding RNAP bound to the promoter, pbound, is calculated by dividing the sum of all statistical weights featuring RNAP by the sum of the weights of all possible microstates. The calculation of pbound combined with Equation 1 leads to the expression

(2) Rate=Rpbound=Rp+b6pωbp+rpωrp+b6rpωbpωrp1+b6+r+b6r+p+b6pωbp+rpωrp+b6rpωbpωrp,

which makes it possible to predict the output rate of mRNA production as a function of the input concentrations of Bicoid and Runt (Figure 2B). With this theoretical framework in hand, we experimentally tested the predictions of this model.

Measuring transcriptional input-output to test model predictions

The transcriptional input-output function in Figure 2B indicates that, in order to predict the rate of RNAP loading and to test our theoretical model, we need to first measure the concentration of the input Bicoid and Runt transcription factors. In order to quantify the concentration profile of Bicoid, we used an established eGFP-Bicoid line (Gregor et al., 2007) and measured mean Bicoid nuclear concentration dynamics along the anterior-posterior axis of the embryo over nuclear cycles 13 and 14 (nc13 and nc14, respectively) as shown in Movie Figure 3—video 1 (Eck et al., 2020). An example snapshot and time trace of Bicoid nuclear concentration dynamics at 40% of the embryo length appear in Figure 3A and B.

Figure 3 with 11 supplements see all
Measurement of input transcription factor concentrations and output rate of transcription to test model predictions.

(A) Snapshot of an embryo expressing eGFP-Bicoid spanning 20–60% of the embryo length. (For a full time-lapse movie, see Movie Figure 3—video 1) (B) Bicoid nuclear fluorescence dynamics taken at 40% of the embryo. (C) Snapshot of an embryo expressing eGFP:LlamaTag-Runt spanning 20–60% of the embryo length. (For a full time-lapse movie, see Movie Figure 3—video 2) (D) Runt nuclear concentration dynamics in males and females. (E) Measured transcription factor concentration profiles along the anterior-posterior axis of the embryo. The concentration profiles are averaged over the gray shaded regions shown in (B) and (D) which corresponds to a time window between 5 and 10min into nc14. (F) Predicted RNAP loading rate for hunchback P2 with one Runt binding site over the anterior-posterior axis generated for a reasonable set of model parameters Kb=30 AU, Kr=100 AU, ωbp=100, p=0.001, and R=1 AU/min for varying values of the Runt-RNAP interaction term ωrp=[10-2,1]. (G) Schematic of the MS2 system where 24 repeats of the MS2 loop sequence are inserted downstream of the promoter followed by the lacZ gene. The MS2 coat protein (MCP) fused to GFP binds the MS2 loops. (H) Example snapshot of an embryo expressing MCP-GFP and Histone-RFP. Green spots correspond to active transcriptional loci and red circles correspond to nuclei. Spot intensities are proportional to the number of actively transcribing RNAP molecules. (I) Representative MS2 fluorescence averaged over a narrow window (2.5% of the embryo length) along the anterior-posterior axis of the embryo. The initial rate of RNAP loading was obtained by fitting a line (brown) to the initial rise of the data and the x-intercept is defined as the onset of transcription (TON). (J) Measured initial rate of RNAP loading (over a spatial bin of 2.5% of the embryo length) across the anterior-posterior axis of the embryo, from the hunchback P2 enhancer. (B, D, E, and J, error bars represent standard error of the mean over 3 embryos; I, error bars represent standard error of the mean over the spatial averaging corresponding to roughly ten nuclei; A, C, and H, white scale bars represent 20 μm.).

Quantification of the Runt concentration using standard fluorescent protein fusions is not possible due to the slow maturation times of these proteins (Bothma et al., 2018). We therefore measured Runt concentration dynamics using our recently developed LlamaTags, which are devoid of such maturation dynamics artifacts (Bothma et al., 2018). Specifically, we generated a new fly line harboring a fusion of a LlamaTag against eGFP to the endogenous runt gene using CRISPR/Cas9-mediated homology-directed repair (Materials and Methods; Harrison et al., 2010, Gratz et al., 2015).

Using this LlamaTag fusion, we measured the mean Runt nuclear fluorescence along the anterior-posterior axis of the embryo over nc13 and nc14 (Materials and Methods; Figure 3B; Movie Figure 3—video 2). As expected due to the location of the runt gene on the X chromosome (Lott et al., 2011), there is a sex dependence in the nuclear concentration levels in nc13, with males displaying lower Runt levels than females; this difference is compensated by early nc14 (Figure 3C and D). As a result, for ease of analysis, we focused subsequent quantitative dissection on nc14.

We used the measured input protein concentration profiles to predict the output transcription rate. To make this possible, we invoked previous observations stating that the concentration dynamics of input transcription factors does not significantly affect the initial rate of RNAP loading (Garcia et al., 2013; Eck et al., 2020). As a result, we decided to use the time-averaged concentration dynamics of Bicoid and Runt over a time window spanning 5 min after the 13th anaphase to 10 min after this anaphase (gray shaded region in Figure 3B and D) as inputs to our model, resulting in the static spatial concentration profiles shown in Figure 3E. We then used these time-averaged concentration profiles of input transcription factors to calculate the time-averaged rate of transcription initiation over the same time window. In the Supplementary Information Section ‘Comparing using static versus dynamic transcription factor concentrations as model inputs’ we compare this methodology with one that acknowledges input transcription factor concentration dynamics and show that the prediction stemming from both approaches leads to equivalent theoretical predictions. Specifically, the time-averaged rate of transcription predicted by the dynamic inputs was similar to the rate of transcription predicted by the static inputs.

Along the anterior-posterior axis of the embryo, the measured Bicoid and Runt concentration profiles define a trajectory through the input-output function (Figure 2B). Given a set of parameters, this trajectory predicts the initial rate of RNAP loading. This quantitative prediction can be directly compared with experimentally measured transcription initiation rates. For example, given the concentration profiles shown in Figure 3E, we calculate the RNAP loading rate as a function of the position along the embryo for different values of the Runt-RNAP interaction, captured by ωrp. Figure 3F illustrates how ωrp shapes the predicted profiles for the RNAP loading rate. As expected, the prediction shows that the rate of transcription decreases as the strength of the Runt-RNAP interaction decreases.

Next, we sought to experimentally test these predictions by measuring the rate of RNAP loading using the MS2 system (Bertrand et al., 1998; Lucas et al., 2013; Garcia et al., 2013). Here, we inserted 24 repeats of the MS2 loop sequence following the hunchback P2 enhancer and even-skipped promoter in our reporter construct, which leads to the fluorescent labeling of sites of active transcription in living embryos (Figure 3G and H; Movie Figure 3—video 3). The fluorescence intensity of each MS2 spot is proportional to the number of actively transcribing RNAP molecules (Garcia et al., 2013). In order to quantify the transcriptional activity reported by MS2, we measured the mean MS2 spot fluorescence over nuclei in a narrow spatial window (Figure 3I; Garcia et al., 2013; Eck et al., 2020). To measure the initial rate of RNAP loading, we obtained the slope of the initial rise in the number of actively transcribing RNAP molecules over the same time window used to average input transcription factor concentration (Figure 3I, brown line). The resulting RNAP loading rate plotted over the anterior-posterior axis is in qualitative agreement with the classic pattern driven by the hunchback P2 minimal enhancer (Figure 3J; Garcia et al., 2013, Chen et al., 2012, Park et al., 2019).

While we chose the initial rate of transcription as the experimental measurable to confront against our model predictions, the MS2 technique can also report on other dynamical features of transcription such as the time window over which transcription occurs and the fraction of loci that engage in transcription at any point over the nuclear cycle. Although these two quantities have been shown to be relevant in shaping gene expression patterns in other regulatory contexts (Garcia et al., 2013; Lammers et al., 2020; Eck et al., 2020; Dufourt et al., 2018; Reimer et al., 2021), we found that the transcription time window was not significantly regulated in the presence of Runt (Figure 3—figure supplement 3). As described in Section ‘Quantitative interpretation of MS2 signals’, we did find some modulation of the fraction of transcriptionally engaged loci for a subset of our synthetic enhancer constructs but, as we could not detect a clear trend in how this fraction of active loci was modulated, we did not pursue a theoretical dissection of the control of this quantity by Runt.

Enhancer sequence dictates unrepressed transcription rates by determining RNAP-promoter interactions

With these theoretical models and our experimental platform in hand, we designed a set of synthetic enhancer constructs with differing number and placement of Runt binding sites as shown in Figure 4A (top) , and Figure 4—figure supplement 1. Our enhancer sequences are identical to those created and validated by Chen et al., 2012, which kept the length of the enhancer sequence consistent and inserted experimentally validated Runt binding sites (Melnikova et al., 1993; Lewis et al., 1999; Chen et al., 2012; Koromila and Stathopoulos, 2017) by mutating the base pairs within the enhancer that are not mapped to binding sites for any known transcripiton factor in the early fruit fly embryo (Hertz et al., 1990; Hertz and Stormo, 1999).

Figure 4 with 2 supplements see all
Enchancer-to-enhancer variability in the unrepressed transcription level stems from unique RNAP-dependent parameters.

(A) Measured initial rates of RNAP loading across the anterior-posterior axis of the embryo for all synthetic enhancer constructs in the absence of Runt protein. (The [111] synthetic enhancer construct with the position of Bicoid (red) and Runt (green) binding sites is shown in genomic length scale on top as a reference.) (B) Representative best MCMC fit and (C) associated corner plot for the [001] construct in the runt null background. (D) Inferred model parameters for all synthetic enhancers in the absence of Runt repressor. Note the large spread in ωbp, consistent with the corner plot shown in (C), which indicates that our model does not constrain this parameter well compared to the other parameters. (E) Coefficient of variation of inferred parameters. (A, B), shaded regions represent the standard error of the mean over>3 embryos; (B) error bars from MCMC fit represent 95% confidence interval; (D) error bars represent standard deviations calculated from the MCMC posterior chains; (E) error bars are calculated by propagating the standard deviation of individual parameters from their MCMC chains.

A major assumption of our theoretical approach is that the model parameters obtained from simple regulatory architectures can be used as inputs for more complex constructs. For instance, we assume that the Runt-independent model parameters for Bicoid and RNAP action—Kb, ωbp, p and R (Figure 2A)—are conserved for all constructs containing Runt binding sites regardless of their number and placement in the enhancer. If model parameters can be shared across constructs, then our model should predict the same profile for the rate of transcription across all synthetic enhancer constructs.

To test this assumption, we measured the initial rate of RNAP loading in all of our reporter constructs, in runt null embryos (Materials and Methods). Notably, unrepressed transcription rates varied significantly across synthetic enhancers (Figure 4A). For example, despite no Runt being present, the [001] construct had almost twice the unrepressed rate of [000].

This large construct-to-construct variability in unrepressed transcription rates likely originates from the Runt binding site sequences interfering with some combination of Bicoid and RNAP function. To uncover the mechanistic effect of these Runt binding sites sequences on unrepressed activity, we sought to determine which parameters in our thermodynamic model varied across constructs. In the absence of Runt repressor, only four states remain corresponding to the two top rows of Figure 2A. In this limit, the predicted rate of transcription is given by

(3)  Rate=Rp+([Bicoid]Kb)6pωbp1+p+([Bicoid]Kb)6+([Bicoid]Kb)6pωbp,

where we have invoked the same parameters as in Figure 2 and Equation 2. For clarity, the free parameters in this equation are marked using the red color.

To obtain the model parameters for each construct measured in Figure 4A, we invoked the Bayesian inference technique of Markov Chain Monte Carlo (MCMC) sampling that has been widely used for inferring the biophysical parameters from theoretical models (Liu et al., 2021, Razo-Mejia et al., 2018, Geyer and Thompson, 1992; Supplementary Section ‘Markov Chain Monte Carlo inference protocol’). A representative comparison of the MCMC fit to the experimental data reveals a good agreement between theory and experiment (Figure 4B). MCMC sampling also gives the distribution of the posterior probability for each parameter as well as their cross-correlation (Figure 4C). These corner plots reveal relatively unimodal posterior distributions, suggesting that a unique set of parameters can explain the data.

Note that, while the Bicoid dissociation constant Kb and the Bicoid-RNAP interaction term ωbp remain largely unchanged regardless of enhancer sequence, there is considerable variability in the inferred mean RNAP-dependent parameters p and R (Figure 4D). This variability can be further quantified by examining the coefficient of variation,

(4) CV=σμ,

where σ and μ are the standard deviation and the mean of each parameter, respectively, calculated over all constructs. The coefficients of variation for the RNAP and promoter-dependent parameters are much higher than those for Bicoid-dependent parameters (≈ 40% versus < 10%; Figure 4E). This suggests that the variability in unrepressed transcription rates due to the presence of Runt binding sites stems from differences in the behavior of RNAP at the promoter rather than differences in Bicoid binding or activation. As a result, as we consider increasingly more complex regulatory architectures, we associated each construct with its own specific Bicoid- and RNAP-dependent parameters as inferred in Figure 4D. In contrast, as we will show below, we will conserve Runt-dependent parameters as we consider increasingly more complex constructs featuring more Runt binding sites.

The thermodynamic model recapitulates repression by one Runt binding site

Next, we asked whether our model recapitulates gene expression for the hunchback P2 enhancer with a one-Runt binding site in the presence of Runt repressor as predicted by Equation 2. We posited that, since the binding site sequence remains unaltered throughout our constructs (Figure 4—figure supplement 1), the value of the Runt dissociation constant Kr would also remain unchanged across these enhancers regardless of Runt binding site position; however, we assumed that, as the distance between Runt and the promoter varied, so could the Runt-RNAP interaction term ωrp.

We measured the initial rate of transcription along the embryo for all our constructs containing one Runt binding site in the presence of Runt protein. In this case of a single Runt binding site, Equation 2 predicts that the initial rate of RNAP loading will be given by

(5) Rate=Rpbound=Rp+b6pωbp+[Runt]Krpωrp+b6[Runt]Krpωbpωrp1+b6+[Runt]Kr+b6r+p+b6pωbp+[Runt]Krpωrp+b6[Runt]Krpωbpωrp.

Here, we have have rewritten Equation 2 to clarify which parameters are fixed and which parameters are inferred using color coding. Specifically, we took Runt-independent parameters (Kb, ωbp, p and R), shown in black, as given by the inference from our previous experiments in the absence of Runt (Figure 4). Further, Runt-dependent parameters (Kr and ωrp) which we will infer, are shown in red. We then used MCMC sampling to infer these Runt-dependent parameters for each of our constructs while retaining the mean values of Runt-independent parameters.

The resulting MCMC fits show significant agreement with the experimental data (Figure 5A), confirming that, within our model, the same dissociation constant Kr can be used for all Runt binding sites regardless of their position within the enhancer. Further, the corner plot yielded a unimodal distribution of posterior probability of the inferred parameters (Figure 5B), indicating the existence of a unique set of most-likely model parameters. We challenged our assumption of constant Kr across our constructs in Section Figure 5—figure supplement 3, where we show that, even if we posit that each construct has a different Runt dissociation constant, the obtained Kr values are comparable.

Figure 5 with 3 supplements see all
Testing the direct repression model in the presence of one Runt binding site.

(A) Initial transcription rate as a function of position along the embryo for the three constructs containing one Runt binding site in the presence and absence of Runt repressor, together with their best MCMC fits. (B) Corner plots from MCMC inference for all constructs with one Runt binding site. (C) Inferred ωrp value as a function of distance between the promoter and the Runt binding site. (A, data points represent mean and standard error of the mean over the embryos and shaded error bars represent 95% confidence intervals for the best MCMC fits for Runt WT datasets; C, data and error bars represent the mean and standard deviation of the posterior chains, respectively.).

The observed trend in the Runt-RNAP interaction captured by ωrp qualitatively agrees with the “direct repression” model. Specifically, because the model assumes that Runt interacts directly with RNAP, it predicts that, the farther apart Runt and the promoter are, the lower this interaction should be (Gray et al., 1994). In agreement with this prediction, the mean value of ωrp obtained from our fits changes from high repression (ωrp0.1) in the [001] construct to almost no repression (ωrp1) in the [100] construct as the Runt site is moved away from the promoter (Figure 5C). Thus, the direct repression model recapitulates repression by a single Runt molecule using the the same dissociation constant regardless of Runt binding site position, and displays the expected dependence of the Runt-RNAP interaction term on the distance between these two molecules.

Predicting repression by two-Runt binding sites requires both Runt-Runt and Runt-Runt-RNAP higher-order cooperativity

Could the parameters inferred in the preceding section be used to accurately predict repression in the presence of two Runt binding sites? An extra Runt binding site enables new protein-protein interactions between Runt molecules and RNAP (Figure 6A). First, we considered individual Runt-RNAP interaction terms, ωrp1 and ωrp2, whose values were already inferred from the one-Runt binding site constructs as ωrp[001],ωrp[010],andωrp[100] (Figure 5D). Second, we considered protein-protein interactions (positive or negative) between two Runt molecules, ωrr. Third, following recent studies of Bicoid activation of the hunchback P2 minimal enhancer (Estrada et al., 2016a; Park et al., 2019), we also posited the existence of simultaneous Runt-Runt-RNAP higher-order cooperativity ωrrp. Given these different cooperativities, and as shown in detail in Figure 6—figure supplement 6B, the predicted rate of transcription is

(6) Rate=R(p+b6pωbp+rp(ωrp1+ωrp2)+r2pωrp1ωrp2ωrrωrrp+b6rpωbp(ωrp1+ωrp2)+b6r2pωbpωrp1ωrp2ωrrωrrp)(1+b6(1+2r+pωbp)+2r+p+rp(ωrp1+ωrp2)+r2(ωrr+pωrp1ωrp2ωrrωrrp)+b6rpωbp(ωrp1+ωrp2)+b6r2ωrr+b6r2pωbpωrp1ωrp2ωrrωrrp)1.
Figure 6 with 7 supplements see all
Prediction for the transcription initiation rate of hunchback P2 with two-Runt binding sites under different models of cooperativity.

(A) Direct repression model for hunchback P2 with two Runt binding sites featuring Runt-RNAP interaction terms given by ωrp1 and,ωrp2 Runt-Runt cooperativity captured by ωrr, and Runt-Runt-RNAP higher-order cooperativity accounted for by ωrrp. (B) Parameter-free model prediction for two Runt binding sites when the two Runt molecules bind the DNA and interact with RNAP independently of each other. (C,D,E) Best MCMC fits for the data for two-Runt binding site constructs for models with various combinations of cooperativity parameters. (C) Model incorporating Runt-Runt cooperativity. (D) Model incorporating Runt-Runt-RNAP higher-order cooperativity. (E) Model accounting for both Runt-Runt cooperativity and Runt-Runt-RNAP higher-order cooperativity. (F) Fixed or inferred parameters ωrr and ωrrp for all two-Runt binding site constructs. Note that ωrr is fixed to 1 for [011] and [101] constructs due to the fact that no Runt-Runt cooperativity is necessary to quantitatively describe the expression driven by these constructs; only the [110] construct is used to infer both ωrr and ωrrp. The horizontal line of ω=1 denotes the case of no cooperativity. (G) Akaike Information Criterion (AIC) for all four scenarios of different free parameters shown throughout (B–E). (B-E, data points represent mean and standard error of the mean over the embryos. C-E, shaded error bars represent 95% confidence intervals for the best MCMC fits for the Runt WT datasets; F, data and error bars represent the mean and standard deviation of the posterior chain, while the standard deviation for the fixed ωrr is set to 0.).

Here, once again, we have color-coded parameters to be inferred in red to differentiate them from fixed parameters that were already inferred in previous sections. Despite the complexity of this equation, note that its only free parameters are the cooperativity parameters ωrr and ωrrp. As a result, we sought to determine whether the Runt-RNAP cooperativity terms, ωrp1 and ωrp2, are sufficient to predict repression by two Runt molecules, or whether the cooperativities given by ωrr and ωrrp also need to be invoked.

Consider the simplest case where two Runt molecules bind and interact with RNAP independently from each other. Here, ωrr=1, and ωrrp=1. This model has no free parameters; all parameters have already been determined by the inferences performed on Runt null datasets and one-Runt binding site constructs (Figure 4 and Figure 5, respectively). While there was some agreement between the model and the data for the [101] construct (Figure 6B, center), significant deviations from the prediction occurred for the other two constructs. These deviations ranged from less repression than predicted for [011] (Figure 6B, left) to more repression than predicted for [110] (Figure 6B, right). Thus, this simple model of Runt independent repression is not supported by the experimental data, suggesting additional regulatory interactions between the Runt molecules and RNAP.

A first alternative to the independent repression model is the consideration of Runt-Runt cooperative interactions such as those that characterize many transcription factors (Park et al., 2019; Estrada et al., 2016b; He et al., 2010; Segal et al., 2008; Ptashne, 2004). However, adding a Runt-Runt cooperativity term, ωrr, was insufficient to account for the observed regulatory behavior (Figure 6C; Figure 6—figure supplement 4 more thoroughly analyzes this discrepancy). A second alternative consists in incorporating a Runt-Runt-RNAP higher-order cooperativity term, ωrrp. While the best MCMC fits revealed significant improvements in predictive power, important deviations still existed for the [110] construct (Figure 6D, right; Figure 6—figure supplement 5 more thoroughly analyzes the MCMC inference results).

Not surprisingly, given the agreement of the higher-order cooperativity model with the data for the [011] and [101] constructs (Figure 6D, left and center), this agreement persisted when both Runt-Runt cooperativity and Runt-Runt-RNAP higher-order cooperativity were considered (Figure 6E, left and center). However, including these two cooperativities also significantly improved the ability of the model at explaining the [110] experimental data (Figure 6E, right). Thus, while higher-order cooperativity is the main interaction necessary to quantitatively describe repression by two Runt repressors, pairwise cooperativity also needs to be invoked. This conclusion is supported by our MCMC sampling: posterior distributions for the Runt-Runt cooperativity term are not well constrained for the [011] or [101] constructs, whereas Runt-Runt-RNAP higher-order cooperativity is constrained very well across all constructs (Figure 6—figure supplement 6D; Figure 6—figure supplement 6 more thoroughly analyzes the MCMC inference results). As a result, accounting for both pairwise and higher-order cooperativity is necessary for the model to explain the observed rate of RNAP loading of all three constructs.

The higher-order cooperativity revealed by our analysis can lead to more or less repression than predicted by the independent repression model, motivating us to determine the magnitude of this cooperativity across constructs. To make this possible, we inferred the magnitude of the Runt-Runt cooperativity ωrr and the Runt-Runt-RNAP higher-order cooperativity ωrrp. As shown in Figure 6F, depending on the spatial arrangement of Runt binding sites, the Runt-Runt-RNAP higher-order cooperativity term ωrrp can be below or above 1. Note that, in doing these fits, we first set the Runt-Runt cooperativity, ωrr, values for [011] and [101] to 1 because, as we had demonstrated in Figure 6D, only the higher-order Runt-Runt-RNAP cooperativity was necessary. Thus, different placements of Runt molecules on the enhancer lead to distinct higher-order interactions with RNAP which, in turn, can result in less or more repression than predicted by a model where Runt molecules act independently of each other.

Repression by three-Runt binding sites also requires higher-order cooperativity

Building on our success in deploying thermodynamic models to explain repression by one- and two-Runt binding sites, we investigated repression by three-Runt binding sites. First, we accounted for pairwise interactions between Runt and RNAP, which were inferred from measurements of the one-Runt binding site constructs (Figure 1B), yielding ωrp[001],ωrp[010], and ωrp[100] from [001], [010], and [100]. Second, we considered pairwise protein-protein interactions between Runt molecules (Figure 1C), which were inferred from the two-Runt binding sites constructs through the parameters ωrr[011],ωrr[101], and ωrr[110]. Finally, we incorporated Runt-Runt-RNAP higher-order cooperativity acquired from the two-Runt binding sites constructs (Figure 1C) captured by ωrrp[011],ωrrp[101], and ωrrp[110]. we tested our model predictions using a similar scheme to that described in the previous section: we generated a parameter-free prediction for the initial rate of transcription by using the inferred parameters from the one- and two-Runt binding sites constructs, including the pairwise and higher-order interactions described above.

Figure 7A shows the resulting parameter-free prediction. As seen in the figure, our model could not qualitatively recapitulate the experimental data as it predicted too much repression. Such disagreement suggests that additional regulatory interactions are at play. Building on the need for higher-order cooperativity in the two-Runt binding site case, we propose the existence of higher-order cooperativities necessary to describe regulation by three Runt molecules—Runt-Runt-Runt higher-order cooperativity, ωrrr and Runt-Runt-Runt-RNAP higher-order cooperativity, ωrrrp (Figure 1D). The resulting expression for the predicted rate of transcription in the presence of all these sources of cooperativity is shown in Equation S10 in Section ‘Derivation of the general and simpler thermodynamic model for the hunchback P2 enhancer with one Runt binding site’. For simplicity, we assumed that the Runt-Runt-Runt cooperativity is one, and only determined the Runt-Runt-Runt-RNAP higher-order cooperativity. By including only a Runt-Runt-Runt-RNAP higher-order cooperativity parameter, our model recapitulated the experimental data (Figure 7B). Thus, our results further support the view in which the addition of Runt repressor binding motifs in an enhancer calls for the incorporation of cooperativities of increasingly higher-order.

Prediction for hunchback P2 with three-Runt binding sites and multiple sources of cooperativity.

(A) Prediction using previously inferred Runt-RNAP, Runt-Runt, and Runt-Runt-RNAP cooperativity parameters. (B) Best MCMC fit obtained by incorporating an additional Runt-Runt-Runt-RNAP higher-order cooperativity parameter of ωrrrp=857, corresponding to roughly 7kBT of free energy. (A,B, data points represent mean and standard error of the mean over>3 embryos; B, shaded regions represent 95% confidence intervals for the best MCMC fit.).

Discussion

One of the challenges in generating predictions to probe thermodynamic models is that, often, these models are contrasted against experimental data from endogenous regulatory regions (Segal et al., 2008; Sayal et al., 2016; Park et al., 2019). Here, the presence of multiple binding sites for several transcription factors—known and unknown (Vincent et al., 2016)—leads to models with a combinatorial explosion of free parameters. Like the proverbial elephant that can be fit with four parameters (Mayer et al., 2010), experiments with endogenous enhancers typically contain enough parameters to render it possible to explain away apparent disagreement between theory and experiment (Garcia et al., 2020).

To close this gap, synthetic minimal enhancers have emerged as an attractive alternative to endogenous enhancers (Fakhouri et al., 2010; Sayal et al., 2016; Park et al., 2019; Crocker et al., 2016). Here, the presence of only a handful of transcription factor binding sites and the ability to systematically control their placement and affinity dramatically reduce the number of free parameters in the model (Garcia et al., 2020). Inferences performed on these synthetic constructs could then inform model parameters that would make it possible to quantitatively predict transcriptional output of de novo enhancers (Sayal et al., 2016).

Building on these works, we sought to predict how the Runt repressor, which counteracts activation by Bicoid along the anterior-posterior axis of the early fly embryo (Chen et al., 2012), dictates output levels of transcription. To dissect repression, a strong and detectable level of expression in the absence of the repressor was needed, prompting us to choose a simple system of synthetic enhancers based on the strong hunchback P2 minimal enhancer (Garcia et al., 2013; Chen et al., 2012). This enhancer has been carefully studied in terms of its activator Bicoid and the pioneer-like transcription factor Zelda in the early embryo (Driever and Nüsslein-Volhard, 1988; Garcia et al., 2013; Park et al., 2019; Eck et al., 2020), making it easier to identify neutral sequences within the enhancer for introducing Runt binding sites (Chen et al., 2012). Further, when inserted into hunchback P2, Runt binding site number determines the level of transcription incrementally (Chen et al., 2012). Thus, hunchback P2 provided an ideal scaffold for quantitatively and systematically dissecting repression by Runt.

Previous studies using synthetic enhancers relied on measurements of input transcription factor patterns using fluorescence immunostaining, and of cytoplasmic mRNA patterns using fluorescence in situ hybridization (FISH) or single-molecule FISH. These fixed-tissue techniques have key differences from the live-imaging approach adopted here. First, given the dynamical nature of development, it is necessary to know when data were acquired. Doing so with high temporal resolution using FISH is challenging, although it can be accomplished to some degree by synchronizing embryo deposition before fixation (Park et al., 2019). Second, while most transcription factors directly dictate the rate of RNAP loading, and hence the rate of mRNA production (Spitz and Furlong, 2012; Garcia et al., 2013; Eck et al., 2020), typical FISH measurements report on the accumulated mRNA in the cytoplasm, which is a convolution of all processes of the transcription cycle—initiation, elongation, and termination (Liu et al., 2021; Alberts et al., 2015)—as well as mRNA nuclear export dynamics, diffusion, and degradation. These processes could be modulated in space and time, potentially confounding measurements. Here, we overcame these challenges by using the MS2 technique to precisely time our embryos and acquire the rate of transcription initiation. Of course, despite the ease of measuring the rate of transcription initiation using MS2, the accumulated mRNA is presumably a more relevant quantity for predicting downstream cellular decision making. Previous studies have shown that the MS2-MCP technique can also be used to quantify such patterns of accumulated mRNA, and that this quantification leads to results comparable to those obtained by smFISH (Garcia et al., 2013; Lammers et al., 2020). Following the same quantification method, we assessed the relationship between the initial rate of RNAP loading and the accumulated mRNA (Figure 3—figure supplement 5, Figure 3—figure supplement 6) by plotting them against each other. Reassuringly, as shown in Figure 3—figure supplement 8, our analysis revealed a strong correlation (with Pearson’s correlation coefficient of 0.90), supporting our claim that higher-order cooperativity is essential for explaining the action of multiple transcription factors during the development.

Interestingly, our initial dissection of constructs containing various combinations of Runt binding sites, but in the absence of Runt protein, revealed that unrepressed gene expression levels depend strongly on the number and placement of the binding sites within the enhancer (Figure 4A). These results challenge previous assumptions that unregulated gene expression levels stay unchanged as enhancer architecture is modulated (Sayal et al., 2016; Fakhouri et al., 2010; Barr et al., 2017), but they are in accordance with observations in bacterial systems (Garcia et al., 2012). As a result, our measurements call for accounting for unregulated levels in future quantitative dissections of eukaryotic enhancers, or to study relative magnitudes such as the fold-change in gene expression that has driven the dissection of bacterial transcriptional regulation (Phillips et al., 2019).

Using the thermodynamic model shown in Equation 3, we determined that the Bicoid-dependent parameters remain constant while RNAP-dependent parameters vary across these synthetic enhancer constructs. We speculate that the overall enhancer sequence, which changed as a result of the placement of different combinations of Runt binding sites within it, might affect the binding of the transcriptional machinery. Specifically, since the enhancer is proximal to the promoter, the transcriptional machinery might see slightly different DNA sequences in the vicinity of the promoter as suggested by published structures of the transcriptional machinery assembled on DNA (Louder et al., 2016).

Once we accounted for this difference in unrepressed gene expression levels, we determined that the repression profiles obtained for constructs bearing one-Runt binding site could be described by a simple thermodynamic model (Figure 2). Specifically, we showed that the same dissociation constant described Runt binding regardless of the position of its binding site along the enhancer (Figure 5A). Further, the Runt-RNAP interaction terms describing repressor action decreased as the binding site was placed farther from the promoter (Figure 5C), qualitatively consistent with a ‘direct repression’ model in which Runt needs to physically contact RNAP in order to realize its function (Jaynes and O’Farrell, 1991; Gray et al., 1994; Hewitt et al., 1999).

Although our model recapitulated repression by a one-Runt binding site, the inferred parameters were insufficient to quantitatively predict repression by two-Runt binding sites (Figure 6B). These results suggest that multiple repressors do not act independently of each other. Instead, new parameters describing both Runt-Runt cooperativity and Runt-Runt-RNAP higher-order cooperativity had to be incorporated into our models to quantitatively describe Runt action in these constructs (Figure 6—figure supplement 1C–E). An examination of the various cooperativity values inferred in the language of interaction energies (Table 1) revealed that these energies were of a magnitude comparable to protein-protein interaction energies previously measured in bacterial systems (Dodd et al., 2004; Bintu et al., 2005a; Amit et al., 2011). Interestingly, these interaction energies were both positive and negative, suggesting that both cooperativity or anti-cooperativity are at play depending on enhancer architecture (Amit et al., 2011). Additionally, the [101] construct showed a closer agreement with the parameter-free prediction, without invoking higher-order cooperativity, than the other two constructs ([110] or [011]). This further supports a picture where higher-order cooperativity is sensitive to the placement and orientation of transcription factor binding sites within regulatory regions.

Table 1
Interaction energies for the Runt-related cooperativity parameters from one-, two-, and three-Runt sites constructs.

Note that we used the Boltzmann relation of ω=exp(-E/(kBT)), where the E is the interaction energy, kB is the Boltzmann constant, and T is the temperature.

Interaction energies for the Runt-related cooperativity parameters
model parameterconstructinteraction energy (KBT)
Runt-RNAP interaction,ωrp[001]2.34 ± 0.63
[010]1.36 ± 0.36
[100]0.18 ± 0.24
Runt-Runt interaction,ωrr[011]0 (manually set)
[110]-0.95 ± 0.12
[101]0 (manually set)
Runt-Runt-RNAP interaction,ωrrp[011]-2.09 ± 0.27
[110]4.15 ± 1.14
[101]1.12 ± 0.51
Runt-Runt-Runt-RNAP interaction,ωrrrp[111]-2.12 ± 0.14

While we have long known about protein-protein cooperative interactions (Ackers et al., 1982), in the last few years it has become clear that higher-order cooperativity can also be at play in eukaryotic systems (Estrada et al., 2016a; Park et al., 2019; Biddle et al., 2020) as well as in bacteria (Dodd et al., 2004) and archaea (Peeters et al., 2013). The existence of this higher-order cooperativity suggests that, to predict gene expression from DNA sequence, it might be necessary to build an understanding of the many simultaneous interactions that precede transcriptional initiation. Our discovery of higher-order cooperativity in the action of multiple Runt molecules opens up new avenues to uncover the molecular nature of this phenomenon. For example, following an approach developed in Park et al., 2019, it could be possible to determine whether and how these cooperativity parameters are modulated upon perturbation of molecular players such as the Groucho or CtBP co-repressors, Big-brother, a co-factor facilitating the Runt binding to DNA, and components of the mediator complex (Park et al., 2019; Courey and Jia, 2001; Walrad et al., 2011). Indeed, Park et al., 2019 recently showed that co-activators and mediator units are involved in dictating the magnitude of similar higher-order cooperativity terms in activation by Bicoid. Thus, our thermodynamic models provide a lens through which to dissect the molecular underpinnings of Runt interactions with itself and with the transcriptional machinery.

Notably, the need to invoke cooperative interactions as more Runt binding sites are being added opposes our goal of predicting complex regulatory architectures from experiments with simpler architectures without the need to invoke new parameters. However, it will be interesting to determine whether more parameters need to be invoked as the number of Runt binding sites increases beyond three, or whether the parameters already inferred are sufficient to endow our models with parameter-free predictive power.

Importantly, while our model adopted a ‘direct repression’ view of the mechanism of Runt action, other mechanisms of repression such as ‘quenching’ could also describe the data. While all such models call for higher-order cooperativity to describe the data (Supplementary Section ‘Comparison of different modes of repression’), our data cannot differentiate among those models. Thus, we did not attempt to distinguish different molecular mechanisms of Runt transcriptional repression.

Finally, even though the work presented here has relied exclusively on thermodynamic models, it is important to note that a much more general approach based on kinetic models that are not in thermodynamic equilibrium could also be appropriate for describing our data. Indeed, an increasing body of work over the last few years has provided evidence for the necessity of invoking these more complex models in the context of transcriptional regulation in eukaryotes (Estrada et al., 2016a; Li et al., 2018; Park et al., 2019; Eck et al., 2020). In future work, it will be interesting to determine whether, when our data is viewed through the lens of these non-equilibrium models, invoking higher-order cooperativity is still necessary or whether, instead, simple pairwise protein-protein interactions suffice to reach an agreement between theory and experiment.

Overall, the work presented here establishes a framework for systematically and quantitatively studying repression in the early fly embryo. As showcased here, synthetic enhancers based on the hunchback P2 minimal enhancer constitute an ideal scaffold for the study of other repressors in early fly embryos. For example, we envision that this approach could be used to dissect repression by other transcription factors such as Capicua or Krüppel (Löhr et al., 2009; Sauer and Jäckle, 1991; Papagianni et al., 2018; Chen et al., 2012), and to probe observations of multiple repressors working together to oppose activation by Bicoid in establishing gene expression patterns along the anterior-posterior axis (Chen et al., 2012; Briscoe and Small, 2015). We anticipate that a similar approach could be used to dissect repression along the dorso-ventral axis of the embryo, by for example, adding repressor binding sites to well-established reporter constructs that are only regulated by the Dorsal activator (Jiang and Levine, 1993). Critically, we need to understand not only how one species of repressor works in concert with an activator, but also how multiple species of repressors work together as a system. The approach presented here provides a way forward for predictively understanding the complex gene regulatory network that shapes gene expression patterns in the early fly embryo.

Materials and methods

Generation of synthetic enhancers with MS2 reporter

Request a detailed protocol

The synthetic enhancer constructs used in this study are based off of Chen et al., 2012. In summary, the hunchback P2 enhancer was used as a scaffold to introduce Runt binding sites at different positions that are thought to be neutral (i.e. these Runt binding sites do not interfere with any other obvious binding sites for other transcription factors in the early Drosophila embryos as shown in Figure 4—figure supplement 1). For the three positions chosen to introduce Runt binding sites in Chen et al., 2012, the Gene Synthesis service from Genscript was used to generate synthetic enhancers with all possible configurations of zero-, one-, two-, and three-Runt binding sites in hunchback P2 as shown in Figure 1A. The enhancer sequences were placed into the original plasmid pIB backbone (Chen et al., 2012) using the Gene Fragment Synthesis service in Genscript, followed by the even-skipped promoter, and 24 repeats of the MS2v5 loop (Wu et al., 2015), the lacZ coding sequence, and the α-Tubulin 3’UTR sequence (Chen et al., 2012) as shown in Table 2. These plasmids were injected into the 38F1 landing site using the RMCE method (Bateman et al., 2006) by BestGene Inc Flies were screened by selecting for white eye color and made homozygous. The orientation of the insertion was determined by genomic PCR to ensure a consistent orientation across all of our constructs. Specifically, we used two sets of primers that each amplified one of these two possible orientations: ‘Upward’, where the forward primer binds to a genomic location outside of 38F1 (TTCTAGTTCCAGTGAAATCCAAGCA) and the reverse primer binds to a location in our reporter transgene (ACGCCAGGGTTTTCCCAG), and ‘Downward’, where the forward primer remains the same as the ‘Upward’ set and the reverse primer binds to a location in our reporter transgene (CTCTGTTCTCGCTATTATTCCAACC) when the insertion is the opposite orientation to the ‘Upward’ orientation. As a result, only amplicons from either one of the orientations of insertion in the 38F1 landing site can be obtained. We chose the ‘Downward’ orientation for all our constructs.

Table 2
List of plasmids used to create the transgenic fly lines used in this study.
Plasmids
Name (hyperlinked to Benchling)Function
pIB-hbP2-evePr-MS2v5-LacZ-Tub3UTR[000]-MS2v5 reporter construct
pIB-hbP2+r1-far-evePr-MS2v5-LacZ-Tub3UTR[100]-MS2v5 reporter construct
pIB-hbP2+r1-mid-evePr-MS2v5-LacZ-Tub3UTR[010]-MS2v5 reporter construct
pIB-hbP2+r1-close-evePr-MS2v5-LacZ-Tub3UTR[001]-MS2v5 reporter construct
pIB-hbP2+r2-2+3-evePr-MS2v5-LacZ-Tub3UTR[011]-MS2v5 reporter construct
pIB-hbP2+r2-1+3-evePr-MS2v5-LacZ-Tub3UTR[101]-MS2v5 reporter construct
pIB-hbP2+r2-1+2-evePr-MS2v5-LacZ-Tub3UTR[110]-MS2v5 reporter construct
pIB-hbP2+r3-evePr-MS2v5-LacZ-Tub3UTR[111]-MS2v5 reporter construct
pHD-scarless-LlamaTag-RuntDonor plasmid for LlamaTag-Runt CRISPR knock-in fusion for the N-terminal
pU6:3-gRNA(Runt-N-2)gRNA plasmid for LlamaTag-Runt CRISPR knock-in fusion for the N-terminal
pCasper-vasa-eGFPvasa maternal driver for ubiquitous eGFP expression in the early embryo

CRISPR-Cas9 knock-in of the green LlamaTag in the endogenous runt locus

Request a detailed protocol

We used CRISPR-Cas9 mediated Homology Directed Repair (HDR) to insert the LlamaTag against eGFP into the N-terminal of the runt endogenous locus (Bothma et al., 2018; Gratz et al., 2015). The donor plasmid was constructed by stitching individual fragments—PCR amplified left/right homology arms from the endogenous runt locus roughly 1 kb in length each, LlamaTag, and pHD-scarless vector—using Gibson assembly (Gratz et al., 2015). The PAM sites in the donor plasmid were mutated such that the Cas9 only cleaved the endogenous locus, not the donor plasmid, without changing the amino acid sequence of the Runt protein. The final donor plasmid contained the 3xP3-dsRed marker such that dsRed is expressed in the fly eye and ocelli for screening. Positive transformant flies were screened using a fluorescence dissection scope and set up for single fly crosses to establish individual lines that were then verified with PCR amplification and Sanger sequencing (UC Berkeley Sequencing Facility). Importantly, this llamaTag-runt allele rescues development to adulthood as a homozygous. Thus we concluded that the LlamaTag-Runt allele can be used to monitor the behavior of endogenous Runt protein.

Fly strains

Request a detailed protocol

Transcription from the synthetic enhancer reporter constructs was measured by using embryos from crossing yw;his2av-mRFP1;MCP-eGFP(2) females and yw;synthetic enhancer-MS2v5-lacZ;+ males as described in Garcia et al., 2013; Eck et al., 2020; Lammers et al., 2020.

eGFP-Bicoid measurements were performed using the fly line from Gregor et al., 2007. The LlamaTag-Runt measurements were done using the fly line LlamaTag-Runt; +; vasa-eGFP, His2Av-iRFP illustrated in Table 3. Briefly, eGFP was supplied by a vasa maternal driver. Females carrying both the LlamaTag-Runt and the vasa-driven eGFP were crossed with males carrying the LlamaTag-Runt, the progeny from this cross were imaged and then recovered to determine the embryo’s sex using PCR. PCR was run with three sets of primers: Y chr1 (Forward: CGATCCAGCCCAATCTCTCATATCACTA, Reverse: ATCGTCGGTAATGTGTCCTCCGTAATTT), Y chr2 (Forward: AACGTAACCTAGTCGGATTGCAAATGGT, Reverse: GAGGCGTACAATTTCCTTTCTCATGTCA), and Auto1 (Forward: GATTCGATGCACACTCACATTCTTCTCC, Reverse: GCTCAGCGCGAAACTAACATGAAAAACT). Two of primers in the set (Y chr1 and Y chr2) bind to the Y chromosome while the other one (Auto1) binds to one of the autosomes and constitutes a positive control (Lott et al., 2011). The Histone-iRFP fly line was from Pan et al., 2022, and was used for nuclei segmentation to extract nuclear flourescence from the eGFP channel.

Table 3
List of fly lines used in this study and their experimental usage.
Fly lines
GenotypeUse
LlamaTag-Runt; +; vasa-eGFP, His2Av-iRFPVisualize LlamaTagged Runt protein and label nuclei
LlamaTag-Runt; +; MCP-eGFP(4F), His2Av-iRFPVisualize LlamaTagged Runt protein, nascent transcripts and label nuclei
run3/FM6; +; +Visualize LlamaTagged Runt protein, nascent transcripts and label nuclei
yw; His2Av-mRFP; MCP-eGFPFemales to label nascent RNA and nuclei
yw; [000]-MS2v5; +Males carrying the MS2 reporter transgene
yw; [100]-MS2v5; +Males carrying the MS2 reporter transgene
yw; [010]-MS2v5; +Males carrying the MS2 reporter transgene
yw; [001]-MS2v5; +Males carrying the MS2 reporter transgene
yw; [011]-MS2v5; +Males carrying the MS2 reporter transgene
yw; [101]-MS2v5; +Males carrying the MS2 reporter transgene
yw; [110]-MS2v5; +Males carrying the MS2 reporter transgene
yw; [111]-MS2v5; +Males carrying the MS2 reporter transgene

To generate the embryos that are zygotic null for the runt allele, we used a fly cross scheme consisting of two crosses. In the first generation, we crossed LlamaTag-Runt;+;+ males with run3/FM6;+;MCP-eGFP(4 F),his2av-mRFP1 females. run3 is the null allele for runt, missing around 5 kb including the coding sequence of the runt locus (Gergen and Butler, 1988; Chen et al., 2012). The MCP-eGFP(4 F) transgene expresses approximately twice the amount of MCP protein than the MCP-eGFP(2) (Garcia et al., 2013; Eck et al., 2020) and thus results in similar levels of MCP to those of MCP-eGFP(2) in the trans-heterozygotes. The female progeny from this cross, LlamaTag-Runt/run3;+;MCP-eGFP(4 F),his2av-mRFP1/+ was then crossed with males whose genotype was LlamaTag-Runt/Y;synthetic enhancer-MS2v5-lacZ;+ to produce the embryos that we used for live imaging. The resulting embryos carried maternally supplied MCP-eGFP and His-RFP for visualization of nascent transcripts and nuclei. The X chromosome contained a LlamaTag-Runt allele or run3 null allele. We could differentiate between these two genotypes because, when the embryo had the Runt allele, a stripe pattern would appear in late nc14. We imaged all embryos until late nc14 to make sure that we were capturing the nulls.

Sample preparation and data collection

Request a detailed protocol

Sample preparation was done following the protocols described in Garcia et al., 2013. Briefly, embryos were collected, dechorionated with bleach for 1–2 min, and then mounted between a semipermeable membrane (Lumox film, Starstedt, Germany) and a coverslip while embedded in Halocarbon 27 oil (Sigma-Aldrich). Live imaging was performed using a Leica SP8 scanning confocal microscope, a White Light Laser and HyD dectectors (Leica Microsystems, Biberach, Germany). Imaging settings for the MS2 experiments with the presence of MCP-eGFP and Histone-RFP were the same as in Eck et al., 2020 except that we used a 1024x245 pixel format to image a wider field of view along the anterior-posterior axis. The settings for the eGFP-Bicoid measurements were the same as described in Eck et al., 2020.

The settings for the eGFP:LlamaTag-Runt measurements were similar to that of eGFP-Bicoid except for the following. To increase our imaging throughput, we utilized the ‘Mark and Position’ functionality in the LASX software (Leica SP8) to image 5–6 embryos simultaneously. To account for the decreased time resolution, we lowered the z-stack size from 10 μm to 2.5 μm, keeping the 0.5 μm z-step. By doing this, we could maintain 1-min frame rate for each imaged embryo. Additionally, these flies expressed Histone-iRFP, instead of Histone-RFP as in Eck et al., 2020, so that we used a 670 nm laser at 40 μW (measured at a 10x objective) for excitation of the histone channel, and the HyD detector was set to a 680 nm-800 nm spectral window (Figure 3—figure supplement 7).

Image analysis

Request a detailed protocol

Images were analyzed using custom-written software (MATLAB, mRNA Dynamics Github repository; Garcia Lab @ UC Berkeley, 2022) following the protocol in Garcia et al., 2013 and Eck et al., 2020. Briefly, this procedure involved segmentation and tracking of nuclei and transcription spots. First, segmentation and tracking of individual nuclei were done using the histone channel as a nuclear mask. Second, segmentation of each transcription spot was done based on its fluorescence intensity and existence over multiple z-stacks. The intensity of each MCP-GFP transcriptional spot was calculated by integrating pixel intensity values in a small window around the spot and subtracting the background fluorescence measured outside of the active transcriptional locus. When there was no detectable transcriptional activity, we assigned NaN values for the intensity. The tracking of transcriptional spots was done by using the nuclear tracking and proximity of transcriptional spots between consecutive time points. The nuclear protein fluorescence intensities from the eGFP-Bicoid and LlamaTag-Runt fly lines, which we use as a proxy for the protein nuclear concentration, were calculated as follows. Using the nuclear mask generated from the histone channel, we performed the same nuclear segmentation and tracking as described above for the MS2 spots. Then, for every z-section, we extracted the integrated fluorescence over a 2μm diameter circle on the xy-plane centered on each nucleus. For each nucleus, the recorded fluorescence corresponded to the z-position where the fluorescence was maximal. This resulted in an average nuclear concentration as a function of time for each single nucleus. These concentrations from individual nuclei were then averaged over a narrow spatial window (2.5% of the embryo length) to generate the spatially averaged protein concentration reported in the main text. For the eGFP:LlamaTag-Runt datasets, we had to subtract the background eGFP fluorescence due to the presence of an unbound eGFP population (Bothma et al., 2018). We used the same protocol described in Bothma et al., 2018 and in the Supplementary Section ‘Quantifying the nuclear concentration of LlamaTag-Runt’ to extract this background.

Bayesian inference procedure: Markov Chain Monte Carlo sampling

Request a detailed protocol

Parameter inference was done using the Markov Chain Monte Carlo (MCMC) method. We used a well-established package MCMCstat that uses an adaptive MCMC algorithm (Haario et al., 2006; Haario et al., 2001). A detailed description on how we performed the MCMC parameter inference, for example setting the priors and bounds for parameters, can be found in Supplementary Section ‘Markov Chain Monte Carlo inference protocol’.

Appendix 1

S1 Derivation of the general thermodynamic model for the hunchback P2 enhancer

In this section, we rederive the thermodynamic model presented in the main text, now without the assumption of strong Bicoid-Bicoid cooperativity. The equilibrium thermodynamic modeling framework that we used in this paper is described in more detail in Bintu et al., 2005a; Bintu et al., 2005b.

We start by modeling the case of hunchback P2 without any Runt binding sites, which is believed to have at least six Bicoid binding sites (Park et al., 2019; Driever et al., 1989). As shown by the states and weights presented in Figure 2—figure supplement 1A, in our thermodynamic model, we assume that the six Bicoid binding sites have the same dissociation constant given by Kb, and we posit that RNAP-promoter binding is governed by a dissociation constant given by Kp. We also assume pairwise cooperativity between Bicoid molecules given by ωb, and cooperativity between each Bicoid molecule and RNAP given by ωbp. For simplicity, we will use the dimensionless parameters b=[Bicoid]/Kb and p=[RNAP]/Kp, where [Bicoid], and [RNAP] are the concentrations of Bicoid and RNAP, respectively, and Kb and Kp are their corresponding dissociation constants.

We factor the total partition function into two categories: Zb corresponding to states that only have Bicoid bound, and Zbp describing states with both Bicoid and RNAP bound. Then, we calculate each component separately. The sum of microstates for Zb is

(S1) Zb=1+6b+15b2ωb++b6ωb5=1+i=16(6i)biωbi-1.

Using the binomial theorem, we can simplify Equation S1 leading to

(S2) Zb=1+i=16(6i)biωbi1=1+1ωb[(1+b ωb)61].

Using the same logic, we obtain Zbp such that

(S3) Zbp=(p+pi=16(6i)biωbi1ωbpi)=p+pωb[(1+bωbωbp)61].

Using these two partition functions, we then calculate the probability of the promoter being bound by RNAP, pbound as

(S4) Pbound=ZbpZb+Zbp=p+pωb[(1+bωbωbp)6-1]1+1ωb[(1+bωb)6-1]+p+pωb[(1+bωbωbp)6-1].

Following recent work (Gregor et al., 2007; Park et al., 2019), we now assume that the Bicoid-Bicoid pairwise cooperativity is very strong (ωb1). We can then simplify Equation S4 to obtain

(S5) Pbound=p+pb6ωb5ωbp61+p+b6ωb5+pb6ωb5ωbp6.

If we now define a new binding constant for Bicoid, Kb=Kb*(1ωb)56, such that b=bωb56, and a new cooperativity term between Bicoid and RNAP given by ωbp=ωbp6, we can then rewrite Equation S5 as

(S6) Pbound=p+b6pωbp1+p+b6+b6pωbp,

which is the expression we use throughout the main text. Thus, strong pairwise cooperativity between Bicoid molecules leads to a functional form where only the state with all Bicoid molecules bound remain (six in this case). This strong cooperativity can explain the sharp step-like expression pattern along the embryo’s anterior-posterior axis of the hunchback gene (Figure 3J; Gregor et al., 2007, Park et al., 2019, Driever and Nüsslein-Volhard, 1988; Driever and Nüsslein-Volhard, 1989).

S2 Derivation of the general and simpler thermodynamic model for the hunchback P2 enhancer with one Runt binding site

Having derived the equation for the strong cooperative binding of Bicoid to the wild-type hunchback P2 enhancer, we will now extend that model to the case of hunchback P2 with one Runt binding site. The corresponding states and weights of our full model are shown in Figure 2—figure supplement 2A.

Using a similar logic for calculating the partition functions as described in the previous section, we can compute the probability of the promoter being bound by RNAP as

(S7) pbound=(p+p i=16(6i)biωbi1ωbpi)Bicoid and RNAP+(r p ωrp+r p ωrpi=16(6i)biωbi1ωbpi)Bicoid, Runt, and RNAP(1+i=16(6i)biωbi1)Bicoid only+(p+pi=16(6i)biωbi1ωbpi)Bicoid and RNAP+(r+ri=16(6i)biωbi1)Bicoid and Runt+(rpωrp+rpωrpi=16(6i)biωbi1ωbpi)Bicoid, Runt, and RNAP,

where, in addition to the parameters defined in the above section for the wild-type hunchback P2 case in the absence of Runt, we have added two parameters: the dissociation constant for Runt given by Kr, and a Runt-RNAP interaction term (an anti-cooperativity), ωrp. Using the binomial theorem as in Equation S2, we can simplify Equation S7 to obtain

(S8) pbound=p+pωb[(1+bωbωbp)61]+rpωrp+rpωrpωb[(1+bωbωbp)61]1+1ωb[(1+bωb)61]+p+pωb[(1+bωbωbp)61]+r+rωb[(1+bωb)61]+rpωrp+rpωrpωb[(1+bωbωbp)61].

We now again assume that Bicoid-Bicoid cooperativity is very strong such that ωb1. Then, we can combine Equation S8 with Equation 1 to obtain

(S9) Rate=Rpbound=Rp+b6pωbp+rpωrp+b6rpωbpωrp1+b6+r+b6r+p+b6pωbp+rpωrp+b6rpωbpωrp,

where the new parameters, b and ωbp are defined in the same way as in Equation S6. The effective states and weights remaining after taking this limit are shown in Figure 2—figure supplement 2B. Similarly, we can derive expressions for pbound in the presence of two and three Runt binding sites, and in the strong Bicoid-Bicoid cooperativity limit in order to obtain the predictions used throughout this text. We show this expression for two Runt binding sites in Equation 6. Further, for the case of repression by three Runt binding sites, the rate of transcription is given by

(S10) Rate=R(p+b6pωbp+rp(ωrp1+ωrp2+ωrp3)+b6rpωbp(ωrp1+ωrp2+ωrp3)+r2p(ωrp1ωrp2ωrr1ωrrp1+ωrp2ωrp3ωrr2ωrrp2+ωrp3ωrp1ωrr3ωrrp3)+r3pωrp1ωrp2ωrp3ωrr1ωrr2ωrr3ωrrp1ωrrp2ωrrp3ωrrrωrrrp+b6r2pωbp(ωrp1ωrp2ωrr1ωrrp1+ωrp2ωrp3ωrr2ωrrp2+ωrp3ωrp1ωrr3ωrrp3)+b6r3pωbpωrp1ωrp2ωrp3ωrr1ωrr2ωrr3ωrrp1ωrrp2ωrrp3ωrrrωrrrp)(1+b6(1+3r+pωbp)+3r+p+rp(ωrp1+ωrp2+ωrp3)+r2p(ωrp1ωrp2ωrr1ωrrp1+ωrp2ωrp3ωrr2ωrrp2ωrp3ωrp1ωrr3ωrrp3)+b6r2(ωrr1+ωrr2+ωrr3)+b6r3ωrr1ωrr2ωrr3ωrrr+r2(ωrr1+ωrr2+ωrr3)+r3ωrr1ωrr2ωrr3+r3pωrp1ωrp2ωrp3ωrr1ωrr2ωrr3ωrrp1ωrrp2ωrrp3ωrrrωrrrp+b6rpωbp(ωrp1+ωrp2+ωrp3)+b6r2pωbp(ωrp1ωrp2ωrr1ωrrp1+ωrp2ωrp3ωrr2ωrrp2+ωrp3ωrp1ωrr3ωrrp3)+b6r3pωbpωrp1ωrp2ωrp3ωrr1ωrr2ωrr3ωrrp1ωrrp2ωrrp3ωrrrωrrrp)1,

where the parameters are defined as in Figure 1 and Section ‘Repression by three-Runt binding sites also requires higher-order cooperativity’.

S3 Comparing using static versus dynamic transcription factor concentrations as model inputs

In this section, we tested whether using static, time-averaged transcription factor concentration profiles yielded comparable theoretical predictions than when instead acknowledging the fact that input transcription factor concentration changes over time. Briefly, we compared the predicted rate of transcription calculated in two ways: (1) time-averaging the instantaneous rate from the dynamic transcription factor concentration profiles over a specified time window (from 5 to 10 minutes from the 13th anaphase) and (2) using static input transcription factors already time-averaged over the same time window.

As a concrete example, we focused on the hunchback P2 enhancer with one Runt binding site. We calculated the predicted rate of transcription using the thermodynamic model given by Equation 2. First, we performed this calculation using the dynamic concentration profiles of Bicoid and Runt shown in Figure 3B and D, respectively. Briefly, the terns b and r in Equation 2 now become functions of time such that

(S11) Rate(t)=Rp+b6(t)pωbp+r(t)pωrp+b6(t)r(t)pωbpωrp1+b6(t)+r(t)+b6(t)r(t)+p+b6(t)pωbp+r(t)pωrp+b6(t)r(t)pωbpωrp,

where b(t)=[Bicoid](t)/Kb and r(t)=[Runt](t)/Kr. We choose a set of reasonable values for the model parameters to illustrate the calculation of Rate(t) at 30% of the embryo length. The resulting dynamic rate of transcription profile is shown in Figure 3—figure supplement 1A (blue curve). We then use this profile to calculate the time-averaged rate of transcription over the time window of 5–10 minutes from the 13th anaphase, resulting in the green area shown in Figure 3—figure supplement 1A.

The predicted average rate of RNAP loading given dynamic input transcription factors can be compared to the predicted rate of RNAP loading given the average input concentrations that we used throughout the main text (Figure 3E). Specifically, we plug the static concentration profiles of Bicoid and Runt shown in Figure 3E into Equation 2 to obtain the red area shown in Figure 3—figure supplement 1A. As shown in the figure, the predicted rate of transcription obtained by these two analysis methodologies are equivalent within error.

Finally, we performed this comparison between different approaches to calcualate the rate of transcription as a function of position along the embryo (from 20% to 70% of the embryo length). As shown in Figure 3—figure supplement 1B, the resulting spatial profiles are comparable within error. Thus, we have shown that our approach of using time-averaged, static transcription factor concentrations as inputs to our model yield quantitatively equivalent result as accounting for the dynamic concentration profiles of these transcription factors.

S4 Quantitative interpretation of MS2 signals

The MS2 signal reports on three features of transcriptional dynamics: (1) the initial RNAP loading rate, (2) the duration of transcription, and (3) the fraction of loci that engage in transcription at any time point in the nuclear cycle. In this section, we will explain in further detail how we extract these features from the MS2 signal over nuclear cycle 14.

S4.1 Extracting the initial RNAP loading rate

The initial rate of RNAP loading corresponds the average transcription rate observed after transcriptional onset and until the MS2 signal reaches its peak value during nuclear cycle 14. In order to measure this rate, we followed the protocol described in Garcia et al., 2013. Briefly, as shown in Figure 3—figure supplement 2A, we fitted a line to the MS2 time trace (averaged over nuclei within a spatial window of 2.5% of the embryo length) within the time window of 5–10 minutes after the 13th anaphase. The slope of this line reported on the initial rate of RNAP loading (Figure 3G). The spatial profiles of this initial rate of RNAP loading across all our synthetic enhancer constructs and genotypes are shown in Figure 3—figure supplement 2B.

S4.2 Extracting the duration of transcription

In the main text, we focused on the theoretical prediction of the initial rate of transcription. However, the length of the time window over which transcription occurs (Lammers et al., 2020) is another regulatory knob that, in principle, Runt could modulate to dictate gene expression patterns. We sought to determine the duration of time over which transcription occurs to assess whether Runt affects not only the initial rate of transcription, but also the time window over which transcription could initiate. To quantify the effective duration of transcription initiation, we resorted to the analysis methodology developed in Garcia et al., 2013. Briefly, we parametrized the MS2 signal decay regime—after transcription reaches its peak and becomes slower than the unloading rate (Garcia et al., 2013)—as an exponential decay (Figure 3—figure supplement 3A). Thus, we can describe the MS2 spot fluorescence trace in the decay regime as

(S12) Fluo(t)=Fluomaxe-(t-Tpeak)/τ,

where Tpeak represents the time point where the MS2 spot fluorescence reaches its peaks, Fluomax is the maximum level of fluorescence, and τ is the decay time.

Given the sometimes noisy MS2 traces (data not shown), we fitted an exponential curve to the more robust integral of the MS2 spot fluorescence over time from Tpeak to the end of nuclear cycle 14 as shown in Figure 3—figure supplement 3B. This quantity is proportional to the amount of mRNA produced between the integration bounds (Garcia et al., 2013). The resulting accumulated mRNA time trace is then fitted to the integrated form of Equation S12, which is given by

(S13) mRNA(t)=mRNAmax(1-e(t-Tpeak)/τ),

where mRNAmax is the accumulated mRNA at the end of nuclear cycle 14.

The resulting profiles of the duration of transcription along the embryo for our all synthetic enhancer constructs are illustrated in Figure 3—figure supplement 3C in the presence and absence of Runt protein. As shown in the figure, this duration time is not significantly modulated by Runt repressor.

S4.3 Calculation of the fraction of competent nuclei

Another quantity that could be modulated by Runt repressor is the fraction of loci that ever engage in transcription during a given nuclear cycle, which we termed as the “fraction of competent loci”. As demonstrated by Garcia et al., 2013, Dufourt et al., 2018, Lammers et al., 2020 and Eck et al., 2020, this fraction of transcriptionally competent loci is modulated along the anterior-posterior axis, presumably due to the action of transcription factor gradients.

To show a concrete example of how this quantity is calculated, we take data from one construct ([000]) showing the MS2 spot fluorescence time traces from individual loci of transcription as shown in Figure 3—figure supplement 4A. Here, columns represent time points during nuclear cycle 14, and rows represent individual transcriptional loci. As shown in the figure, roughly 80% of the loci, labeled as “competent loci”, show active transcription during nuclear cycle 14. However, the remaining 20% of the loci never engage in transcription, which we termed as “incompetent loci”. Because these two populations exhibit wildly different behaviors, we define the fraction of competent loci as

(S14) fraction of competent loci=number of competent locinumber of total loci.

Thus, in this example in Figure 3—figure supplement 4A, the fraction of competent loci is approximately 0.8.

Figure 3—figure supplement 4B shows the measured fraction of active loci for all synthetic enhancer constructs in the presence and absence of Runt repressor. As seen in the figure, although this quantity can be modulated by the presence of Runt repressor, this is not always the case (e.g., [011] and [111]). Moreover, we could not find a trend for how the fraction of competent loci is modulated by different combinations of Runt binding sites. For example, the [100] construct alone did show a change in the fraction of active loci in the presence of Runt, whereas the [010] construct did not. When these two binding sites were combined as the [110], there was no significant modulation of the fraction of competent loci when adding Runt repressor. In another example, the [001] construct showed a mild modulation of the fraction of competent loci. However, when this Runt binding site was combined with the [010], which did not show any modulation, the [011] construct showed a much bigger modulation of the fraction of competent loci than the [001]. Thus, the [010] Runt binding site could drive more or less modulation of the fraction of competent loci when combined with different Runt binding sites in a context-dependent manner. As a result of our failure to uncover an apparent trend in terms of which regulatory architectures lead to a stronger modulation of the fraction of active loci, we did not attempt to theoretically explain the regulation of this fraction of active loci in this study.

S5 Design of synthetic enhancer constructs based on the hunchback P2 enhancer

The Runt binding sites were introduced into the hunchback P2 minimal enhancers at the positions determined by Chen et al., 2012. To make this possible, the authors chose positions containing presumed neutral DNA sequences, meaning that these DNA locations did not contain obvious motifs for Bicoid or Zelda, the major input transcription factors that regulate this enhancer. Then, these DNA sequences were mutated to turn them into Runt binding sites.

To ensure that this process did not perturb the binding sites for Bicoid and Zelda we resorted to the Advanced PATSER entry form (Hertz et al., 1990; Hertz and Stormo, 1999) which identifies the location of transcription factor binding sites from a sequence of DNA based on position weight matrices. We used position weight matrices for Bicoid and Zelda from Park et al., 2019. PATSER was run with the settings described in Eck et al., 2020 for both the hunchback P2 enhancer and the hunchback P2 enchancer with three Runt binding sites (from Chen et al., 2012) for Bicoid and Zelda, respectively. The result of this analysis for these two constructs is shown for each transcription factor in Figure 4—figure supplement 1A. Here, we took a PATSER score cutoff—for considering a given sequence to be a binding site—of 3 as in Eck et al., 2020. We observed that the recognized binding motifs for both Bicoid and Zelda were identical between the two constructs, meaning that we did not add additional Bicoid or Zelda binding sites by introducing the Runt motifs. The resulting synthetic enhancer with three Runt binding sites with mapped binding sites for Bicoid, Zelda (Figure 4—figure supplement 1A), and Runt (Chen et al., 2012) is shown in Figure 4—figure supplement 1B as a reference. The position of the Runt binding sites are noted from their distance from the promoter (which is marked as 0).

S6 Markov Chain Monte Carlo inference protocol

Markov Chain Monte Carlo (MCMC) sampling is a widely used technique for robust parameter estimation using Bayesian statistics (Geyer and Thompson, 1992; Sivia and Skilling, 2006). We used the MATLAB package MCMCstat, an adaptive MCMC technique, which we could directly implement downstream of our data analysis pipeline (Haario et al., 2006; Haario et al., 2001). Detailed instructions on how to implement the MCMCstat package can be found in https://mjlaine.github.io/mcmcstat/, (copy archived at swh:1:rev:ca0cbd288f03c7a29050b3d6698a96b45ccfa4b2; Laine, 2021).

MCMC allows for an estimation of the set of parameter values of a model that best explain the experimental data along with their associated errors. In this work, we used MCMC to infer the set of best fit values of the parameters in our thermodynamic models given the observed profile of the rate of transcription initiation along the anterior-posterior axis of the embryo.

MCMC calculates a Bayesian posterior probability distribution of each free parameter given the data by stochastically sampling different parameter values. For a given set of observations D and a model with parameters θ, the posterior probability distribution of a particular set of values is given by Bayes’ theorem

(S15) p(θ|D)posteriorp(D|θ)likelihoodp(θ)prior.

The prior function represents the a priori assumption about the probability distribution of parameter values θ. Here, we assumed a uniform prior distribution for all parameters to reflect our ignorance about the model parameters within the following intervals:

  • Kb: [0,100] AU

  • ωbp: [0,200]

  • p: [0,1]

  • R: [0,400] AU/min

  • Kr: [0,100] AU

  • ωrp: [0,1.2]

  • ωrr: [0,100]

  • ωrrp: [0,100]

These intervals were justified using the following arguments.

First, because we observed a gradual modulation of the rate of transcription by both Bicoid and Runt in the middle region of the embryo we reasoned that the binding sites for these transcription factors were not saturated. As a result, we posited that the real dissociation constant should be between the minimum and maximum measured values of Bicoid and Runt (Figure 3—figure supplement 2). Our measurements of Bicoid and Runt concentration yield fluorescence values over the 0–100 AU range for the embryo region that we used for contrasting our model and experimental data (20–50% of the embryo length), such that the dissociation constants (Kb and Kr) should not exceed the maximum value of the Bicoid or Runt concentration.

Second, ωbp represents the cooperativity between Bicoid complex and RNAP. In the statistical mechanics framework, this cooperativity can be expressed using the interaction energy between Bicoid and RNAP, Δϵbp, such that ωbp=exp(-βΔϵbp), where β=1kBT, kB is the Boltzmann constant and T is the temperature. There is not much known about in vivo interaction energies between Bicoid and RNAP complex, thus we tried several different bounds until we found a narrow enough parameter bound with unimodal distribution of the posterior chain. As we could see from the corner plots in Figure 4C, there is a positive correlation between Kb and ωbp. Thus, we constrained the ωbp intervals by finding an interval that gives both well-constrained Kb and ωbp (Figure 4C).

Third, R represents the rate of RNAP loading when the promoter is occupied, thus it is constrained by the maximum observed rate of RNAP loading (Figure 3—figure supplement 2).

Fourth, p=([RNAP]/Kp) represents the concentration of RNAP divided by its dissociation constant. Recall that the predicted rate of transcription from hunchback P2 in the limit where the Bicoid concentration reaches zero is given by

(S16) Rate([Bicoid]0)=Rp1+p.

This rate of transcription at the posterior region, where Bicoid reaches zero, is much lower than that at the anterior region where Bicoid saturates given by R (Figure 3—figure supplement 2). As a result, we can write the inequality

(S17) Rp1+pR.

such that

(S18) p1+p1,

which holds if p1.

Finally, we did not have good estimates for the intervals of either Runt-Runt cooperativity, ωrr, or higher-order cooperativity, ωrrp. Thus, we initially started with an interval of [0,100], of the same order as the interval we used ωbp. We then explored whether this parameter bound was sufficient to give us constrained values of ωrr and ωrrp. As we showed in Figure 6—figure supplement 6D, this interval gives reasonably constrained values of ωrr and ωrrp. As shown in Figure 6 and Figure 6—figure supplement 6, we posit that the ωrr parameter is not well-constrained not because of its width of the interval, but because it is not as essential for the model fit to the data as it is to include ωrrp into the model. Overall, our MCMC inference results as well as the corner plots shown demonstrate that our parameter intervals chosen were reasonable.

S6.1 Calculation of the Akaike Information Criterion

The Akaike Information Criterion (AIC) is a quantitative metric to assess how well a model performs compared to other models (Akaike, 1974). AIC is defined by two terms that account for: (1) how well the model explains the data (log-likelihood), and (2) how many parameters are used (to penalize the model). The AIC is expressed by

(S19) AIC=-2log(likelihood)+2p,

where p is the number of free parameters in the model. As a result, given a set of models, the model with minimal AIC value is preferred (Akaike, 1974).

First, we assume that our measurement error follows a Gaussian probability distribution. As a result, we can write down the probability of measuring a k-th datum, xk from a Gaussian distribution whose mean is μk and standard deviation is σk, termed prob(xk|μk,σk)(Sivia and Skilling, 2006)

(S20) prob(xk|μk,σk)=1σk2πexp(-(xk-μk)22σk2).

We can write the likelihood L of a given set of N measurements by multiplying the probabilities of each independent measurement such that

(S21) L=prob({x1,x2,,xN}|{μ1,μ2,,μN},{σ1,σ2,,σN})=Πk=1N1σk2πexp(-(xk-μk)22σk2),

where xk is the k-th datum with standard deviation of σk from an expected value of μk from a model (Sivia and Skilling, 2006).

By taking the logarithm of the likelihood (L) we get

(S22) log(L)=k=1Nlog(σk2π)k=1N(xkμk)22σk2.

We can now plug in this expression for the log-likelihood into the definition of the AIC from Equation S19, resulting in

(S23) AIC=2k+2k=1N(σk2π)+2k=1N(xkμk)22σk2.

All of the Akaike Information Criteria values shown in Figure 6—figure supplement 1 and Figure 5—figure supplement 3 are calculated using Equation S23.

S7 Comparison of different modes of repression

Transcriptional repressors have been classified into two broad categories: short-range and long-range, depending on the genomic length scale that they act on Courey and Jia, 2001; Li and Gilmour, 2011. Long-range repression is realized by the recruitment of chromatin modifiers. In contrast, short-range repressors act within 100–150 bp by interacting with nearby transcription factors or with the promoter (Li and Gilmour, 2011). Traditionally, the molecular mechanism of short-range repressors, such as Runt, have been further classified into three categories: “direct repression”, “competition”, and “quenching” (Gray et al., 1994; Jaynes and O’Farrell, 1991; Arnosti et al., 1996; Kulkarni and Arnosti, 2005). In “direct repression”, the repressor inhibits the binding of RNAP to the promoter (Figure 5—figure supplement 1A). “Competition” denotes a repressor that competes with an activator for the same DNA binding location (Figure 5—figure supplement 1B). This molecular mechanisms has been proposed for the action of Giant and Krüppel repressors on the even-skipped stripe 2 enhancer, where some activator and repressor binding sites partially overlap (Small et al., 1992). Lastly, “quenching” corresponds to the case where the repressor and activator do not interact with each other directly. Instead, the repressor inhibits the activators’ action of recruiting the RNAP (Figure 5—figure supplement 1C).

Despite several classic studies of the molecular mechanism of repressors in the early fly embryo (Gray et al., 1994; Ip et al., 1992; Bothma et al., 2011; Jaynes and O’Farrell, 1991), the mechanisms of many repressors remain unknown. Note that, even for the same repressor, the mode of repression might not be the same depending on, for example, its sequence context (Koromila and Stathopoulos, 2019; Hang and Gergen, 2017). For example, it has been proposed that Runt repressor acts with different mechanisms in different regulatory elements of the sloppy-paired gene (Hang and Gergen, 2017). In this section, we derive a thermodynamic model from each mode of repression and compare their explanatory power in the context of our data stemming from the hunchback P2 enhancer containing one Runt binding site. Note that, in the main text, we already developed a thermodynamic model for the “direct repression” scenario (Section ‘Derivation of the general and simpler thermodynamic model for the hunchback P2 enhancer with one Runt binding site’). As a result, in this section, we focus on deriving the thermodynamic models for the “competition” and “quenching” scenarios, but repeat the result of the derivation for the “direct repression” here for ease comparison between different models.

S7.1 Derivation of models for each scenario of repression for hunchback P2 with one Runt binding site

S7.1.1 Modeling repression for hunchback P2 with one Runt binding site: direct repression

For completeness, we repeat the expression for the direct repression scenario as shown in Section ‘Derivation of the general and simpler thermodynamic model for the hunchback P2 enhancer with one Runt binding site’ and Figure 5—figure supplement 1A. The probability of finding RNAP bound to the promoter, pbound, is calculated by dividing the sum of all statistical weights featuring RNAP by the sum of the weights of all possible microstates. The calculation of pbound, combined with Equation 1, leads to the expression

(S24) Rate=Rpbound=Rp+b6pωbp+rpωrp+b6rpωbpωrp1+b6+r+b6r+p+b6pωbp+rpωrp+b6rpωbpωrp,

where the parameters are as defined in Figure 2.

S7.1.2 Modeling repression for hunchback P2 with one Runt binding site: competition

In the competition scenario, Runt binding makes Bicoid binding less likely. This mechanism can be captured by an interaction term between Bicoid and Runt given by ωbr. Building on our assumption of strong Bicoid-Bicoid cooperativity, we posit that Runt disfavors the state with six bound Bicoid molecules. We can enumerate the states and weights from Figure 5—figure supplement 1B to calculate the Rate (pbound), which leads to

(S25) Rate=Rp+b6pωbp+rp+b6rpωbpωbr1+p+b6+r+b6rωbr+b6pωbp+rp+b6rpωbpωbr.
S7.1.3 Modeling repression for hunchback P2 with one Runt binding site: quenching

In the quenching scenario, Runt reduces the magnitude of the cooperativity between the Bicoid complex and RNAP by a factor ωbrp. We can enumerate the states and weights from Figure 5—figure supplement 1C, leading to a rate of transcription given by

(S26) Rate=Rp+b6pωbp+rp+b6rpωbpωbrp1+p+b6+r+b6r+b6pωbp+rp+b6rpωbpωbrp.

With these expressions for each repression mechanism in hand, we can now compare how each model fares against our experimental data.

S7.2 Comparing the three models of repression with the one-Runt binding site data

We used the MCMC sampling to fit each model to our experimentally measured initial rate of transcription over the anterior-posterior axis of the embryo. As shown in Figure 5—figure supplement 2A, B and C, we see that all three models can explain the [100] and [010] construct data relatively well. However, the competition model resulted in a qualitatively poor fit to the [001] construct as shown by the lack of saturation in the most anterior region of the embryo (Figure 5—figure supplement 2C), (ii). The direct repression and quenching models showed equally good fits to the data stemming from this construct.

S7.3 Predicting two-Runt binding sites data for each mode of repression

We further tested these different models of repression by using the parameters inferred from the one-Runt binding site constructs to predict the rate of initiation for the two-Runt binding sites constructs. As reasoned in the main text, we began by assuming that the two Runt molecules act independently of each other such that there are no interactions between Runt molecules. Figure 6—figure supplement 1 shows this parameter-free prediction for our two-Runt binding sites constructs for all three modes of repression. As shown in the figure, none of the models can explain the data, suggesting the need to invoke additional interactions between the molecular players of our model.

Next, we considered whether Runt-Runt pairwise or higher-order cooperativities had to be invoked in order to explain the two-Runt binding sites data for both the competition and quenching mechanisms. For the competition model, we considered Runt-Runt cooperativity, ωrr, and Runt-Runt-Bicoid higher-order cooperativity, ωbrr in addition to the Runt-Bicoid interaction term ωbr. In the quenching scenario, we accounted for Runt-Runt cooperativity, ωrr, and Runt-Runt-Bicoid-RNAP higher-order cooperativity, ωbrrp. For both the competition (Figure 6—figure supplement 2) and quenching (Figure 6—figure supplement 3) mechanisms, we observed a qualitatively similar trend to that observed for direct repression (Figure 6). Specifically, as shown in Figure 6—figure supplements 2C and 3C, considering pairwise cooperativity did not significantly improve the MCMC fits to the data for either model considered. Further, considering only the higher-order cooperativity also did not improve the fits for both competition and quenching mechanisms as shown in Figure 6—figure supplement 2D and Figure 6—figure supplement 3D. Invoking both Runt-Runt cooperativity and higher-order cooperativity improved the fits qualitatively for both competition and quenching mechanisms as shown in Figure 6—figure supplement 2E and Figure 6—figure supplement 3E.

While the quenching model showed almost equally good MCMC fits to the data as the direct repression model, the competition model showed qualitatively poor fits in any combination of cooperativities. In particular, there was a significant mismatch in the most anterior region of the embryo, where Bicoid is thought to saturate hunchback expression. While we do not view these fits as conclusive evidence to support one mechanism over the other, an exercise that would require a new round of experimentation, we conclude that higher-order cooperativity is required to explain the data from the two-Runt binding sites constructs regardless of the choice of mechanism of Runt.

S8 Quantifying the nuclear concentration of LlamaTag-Runt

The major caveat in the eGFP:LlamaTag-Runt fluorescence measurements is that the raw nuclear fluorescence that we measured consists of two populations: eGFP bound to the LlamaTag-Runt, and free, unbound eGFP. Thus, in order to measure nuclear Runt concentration, we need to factor out the contribution from free eGFP to the overall fluorescence.

We followed the procedure described in Bothma et al., 2018 which consists of using cytoplasmic fluorescence to calculate the free nuclear eGFP under two assumptions. First, we posit that most of the transcription factors reside in the nucleus such that the cytoplasmic fluorescence mostly reports on free cytoplasmic eGFP. Second, we assume that the nucleus-to-cytoplasm ratio of free eGFP is kept constant at a measured chemical equilibrium of KG=GFPC/GFPN=0.8, where GFPC and GFPN are the eGFP fluorescence in nuclei and cytoplasm in the absence of LlamaTag (Bothma et al., 2018).

As shown in Bothma et al., 2018, the nuclear concentration of the GFP-tagged transcription factor, GFP-TFN, is given by

(S27) GFP-TFN=FluoN-FluoCKG,

where FluoN and FluoC are the eGFP fluorescence in nuclei and cytoplasm, respectively, that we measured in the embryos with both eGFP and LlamaTagged Runt. The resulting nuclear concentration of LlamaTag-Runt is shown in Figure 3B.

Data availability

All data (both input transcription factor concentration and output transcription from all synthetic enhancers, both pre- and post-processed data) have been deposited in Dryad under the DOI https://doi.org/10.5061/dryad.7sqv9s4sv.

The following data sets were generated
    1. Kim Y
    2. Rhee K
    3. Liu J
    4. Jeammet S
    5. Turner M
    6. Small S
    7. Garcia HG
    (2021) Dryad Digital Repository
    Predictive modeling reveals that higher-order cooperativity drives transcriptional repression in a synthetic developmental enhancer.
    https://doi.org/10.5061/dryad.7sqv9s4sv

References

  1. Book
    1. Alberts B
    2. Johnson A
    3. Lewis J
    4. Raff M
    5. Roberts K
    6. Walter P
    (2015)
    Molecular Biology of the Cell
    New York, NY: Taylor and Francis Group.
  2. Book
    1. Phillips R
    2. Kondev J
    3. Theriot J
    (2009)
    Physical Biology of the Cell
    New York, United States: Garland Science.
  3. Book
    1. Ptashne M
    2. Gann A
    (2002)
    Genes and Signals
    New York: Cold Spring Harbor Laboratory Press.
  4. Book
    1. Ptashne M
    (2004)
    A Genetic Switch: Phage Lambda Revisited
    Cold Spring Harbor, N.Y: Cold Spring Harbor Laboratory Press.
  5. Book
    1. Sivia D
    2. Skilling J
    (2006)
    Data Analysis: A Bayesian Tutorial
    Oxford, United Kingdom: Oxford University Press.

Decision letter

  1. Jie Xiao
    Reviewing Editor; Johns Hopkins University, United States
  2. Naama Barkai
    Senior Editor; Weizmann Institute of Science, Israel

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Predictive modeling reveals that higher-order cooperativity drives transcriptional repression in a synthetic developmental enhancer" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Naama Barkai as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission. While the work is potentially important, the uniqueness of the model and experimental verification of model predictions need to be further justified.

The essential revisions are:

1. Modeling: Please provide a better articulation/construction of the model regarding the fitting parameters (which to keep constant and which to vary, the number of parameters to use, error bars in the experimental data), whether an alternative model should be considered, whether the provided model can indeed account quantitatively observed data (not just to demonstrate that previous models do not work), and whether statistically the model fits experimental data adequately (not by overfitting, examine using Akaike Information Criterion AIC or Bayesian Information Criterion BIC). See reviewers #1 pt 1, #2 pt 1-5, and #3's overall review, and pt1.

2. Experiments: please provide experimental measurement for a few conditions where the outcome of transcription regulation by Runt is measured by smFISH or protein concentrations to demonstrate that the measured promoter activity using RNAP loading rate is equivalent or better (without complications in mRNA/protein degradation).

Reviewer #1 (Recommendations for the authors):

1. Modeling: I would like to see a better articulation of what the alternative is to a 'thermodynamic model.' I understand the authors meaning to be equilibrium models arrived at through partition functions. But kinetic models based on detailed balance are also thermodynamic. I don't want to get twisted around in the language, but the over-emphasis on the ideological purity of thermodynamics was distracting and seems to refer to an ongoing dialog / conversation which is likely lost on the general reader.

a. One example is when they assume that the Runt dissociation constant remains unchanged with different positions (Kr) because the sequence is the same. The problem stems from their previous section where they found that there was an unsuspected relationship between the position of the Runt Binding sites and the parameter "p=[RNAP]/Kp" --- the point being that the sequence is the same in each construct, and therefore, based on their model Kp is changing. So why is it preferrable to assume that Kr is constant with position, but Kp changes?

b. Expanding on the previous point, they then proceed to state that an unchanging Kr is supported by the fact that by only varying the wrp they were able to fit their data. Considering the similar role of Kr with wrp within the equation, I wonder if one could reach the same result varying the Kr and holding wrp constant? Furthermore, who's to say that both don't vary between the constructs? Could the authors address this concern by varying Kr and further discussing this possibility and what it means for the interpretations of the model? In summary, the authors have a model which explains their data. Great. I'm not sure this agreement can be used as an argument for the propriety of the 'thermodynamic approach.'

2. A simple plot showing mRNA/cell as a function of repressor concentration for the different configurations in Figure 6 would be much appreciated. Perhaps I missed the inference, but the ultimate goal of an input/output understanding is to map concentration of a regulator to expression level of the target gene.

Reviewer #2 (Recommendations for the authors):

The manuscript has several issues as noted below.

1) For MCMC fitting of experimental data, the absolute error or relative error of all the results should be given. One would like to know the confidence degree of the fitting results.

2) Formula (5) in the main text is used to fit the data, and the results are shown in figure6. However, the number of the parameters in the fitting formula is almost the same as the amount of the experimental data, and the formula is too complicated. The corresponding parameter space is quite large. Therefore, it is necessary to further analyze the accuracy of the fitting and the necessity of adding the cooperativity.

3) What is source of error bar of the initial RNAP loading rate? Need to give detailed explanation. If the source of error bar is from the data of multiple traces, why not fit multiple trajectories separately?

4) The fitting equations (1), (2) and (5), are all the fitting function for the experimental data. Are the parameter b, p and r the fitting parameters? If they are the fitting parameters, the fitting result needs to be given. These parameters are related to the weights of different binding states in your model, which need to be listed in detail, and compared the differences of fitting results under variant conditions.

5) In this paper, a variety of data is shown, which are Runt binding to different sites such as [001][010],[011] and so on. Please elaborate on how Runt is experimentally controlled or determined to bind to the specified sites.

6) The fluorescence of nucleus and cytoplasm is quantitatively analyzed, how to distinguish nucleus and cytoplasm? Please give the process and corresponding results.

7) It is shown in Fiugre6b that if the cooperativity is not included, a good fitting result cannot be obtained. Has the fitting covered a large enough parameter space search? The fitting results in figure 6 are quite different from the experiment results regardless of the cooperativity. If so, does it mean that the cooperativity plays a major role in the fitting, or even that the individual interaction between Runt and binding sites can be ignored? Could you separate out the part of cooperativity in the fitting results so as to confirm that the cooperativity plays a key role in the experiment?

Reviewer #3 (Recommendations for the authors):

1) Results of Figure 5 should be accompanied by reports of fitting alternative models and suitable model comparisons performed. Also, a baseline model with constant KR and \omegaRP (same for all three constructs) should be evaluated and compared to. Such comparison should ideally involve model comparison techniques such as likelihood ratio and AIC/BIC.

2) Results of Figure 6 should be accompanied by statistical assessment of improvements due to higher-order cooperativity parameters.

3) The fact that fitting of more complex models typically relied on setting different values for the additional parameters for each construct (rather than the same values for all tested constructs of that category) should be examined more closely. How well can the data be modeled without this flexibility?

4) The authors should address the concern that alternative model structures different from that used in the thermodynamics-based models here might have explained the data without requiring higher-order cooperativity.

Other suggestions:

One of the first experimental tests reported is that of predicted expression profile in the hunchback promoter with one Runt binding site. Figure 3J is supposed to be a qualitative match to the predictions in Figure 3F. Could the authors elaborate on what reasonable regimes of the model parameters would provide predictions that do not qualitatively match the observation? It seems like with an activator (Bicoid) progressively decreasing and a repressor (Runt) progressively increasing from anterior to posterior, the enhancer expression will obviously decrease in some form. Perhaps the authors might wish to make it clear that their goal here was to establish some basic parameter estimates to be used in the remaining analyses, and perhaps also to assess the uncertainties associated with those model parameters.

The results presented in Figure 4 are puzzling. Figure 4D suggests that the differences among the constructs arises purely from differences in the p and R parameters, and not from the bicoid or bicoid-RNAP related parameters. Could the authors add a possible mechanistic speculation here regarding how making changes solely in the enhancer sequence might result in biochemical changes solely in the behavior of RNAP at the promoter (without any change in the protein-DNA interactions at the enhancer itself)?

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Predictive modeling reveals that higher-order cooperativity drives transcriptional repression in a synthetic developmental enhancer" for further consideration by eLife. Your revised article has been evaluated by Naama Barkai (Senior Editor) and a Reviewing Editor.

The manuscript has been significantly improved but please further address one issue on the validation of transcription rate measurement, the reviewers asked to use an orthogonal method such as smFISH to validate the measurement using MS2, which was not done in the revision. Please provide at least a thorough comparison and contrast of the current method with other orthogonal methods in the Discussion to equip readers with a full context of the quantifications that are central to the work.

https://doi.org/10.7554/eLife.73395.sa1

Author response

The essential revisions are:

1. Modeling: Please provide a better articulation/construction of the model regarding the fitting parameters (which to keep constant and which to vary, the number of parameters to use, error bars in the experimental data), whether an alternative model should be considered, whether the provided model can indeed account quantitatively observed data (not just to demonstrate that previous models do not work), and whether statistically the model fits experimental data adequately (not by overfitting, examine using Akaike Information Criterion AIC or Bayesian Information Criterion BIC). See reviewers #1 pt 1, #2 pt 1-5, and #3's overall review, and pt1.

We apologize for the lack of clarity in our original manuscript regarding the model construction as well as the statistical interpretation of the MCMC inference. In our revised manuscript, we colored the free parameters in red to emphasize which parameters are being inferred from a given model equation. We also included error bars representing 95% of confidence intervals for all of our MCMC inference results in the main Figures (Figure 4, 5, 6, and 7).

For the question of whether an alternative model should be considered, specifically for one-Runt binding site constructs, we tested alternative models under four different scenarios: (1) Kr and wrp being constant across constructs, (2) Kr remains constant and wrp varies, (3) Kr varies and wrp remains constant, and (4) both Kr and wrp vary across constructs. The results are given in Figure S16, where we can see that our original assumption of keeping Kr constant and varying wrp indeed explains the data better than the other models. This result is further supported by the quantification of model performance using the Akaike Information Criterion (as suggested by the reviewers) shown in Figure S16F. Finally, the inferred parameters from the case where we let both Kr and wrp vary across constructs are shown in Figure S16G. These results suggest that the Kr values do not vary significantly across the constructs whereas wrp values vary significantly between the constructs. From the aforementioned evidence, we argue that our assumption of constant Kr across constructs is well supported.

To assuage the reviewers’ concern on whether the improved model with higher-order cooperativities considered in the main text indeed explains the data better than the others, we computed Akaike Information Criterion for all models presented in Figure 6, resulting in a new panel, Figure 6G. The result indeed supports that both Runt-Runt cooperativity and higher-order cooperativity (emphasis on the higher-order cooperativity) are required to explain the experimental data without overfitting.

Finally, in addition to the “direct repression” that was assumed throughout the main text, we tested alternative models based on different modes of repression that have been proposed in the literature (Gray et al., Genes and Development 8:1829, 1994,). Specifically, we tested two additional modes of repression, “competition” and “quenching” in Figure S7 and S8, respectively, and summarized the results in Section S5. In short, our goal was to see if the pairwise cooperativities are sufficient to explain the two-Runt binding sites cases in these alternate models of repression, or whether we had to invoke higher-order cooperativities as in the “direct repression” mechanism. We concluded that both modes of repression required higher-order cooperativities to explain the two-Runt binding sites cases (even though the competition mechanism generally gave worse fits than direct repression or quenching). Thus, we believe that our conclusion of the necessity of higher-order cooperativity to account for our experimental data holds regardless of the specific model of repression that is assumed.

2. Experiments: please provide experimental measurement for a few conditions where the outcome of transcription regulation by Runt is measured by smFISH or protein concentrations to demonstrate that the measured promoter activity using RNAP loading rate is equivalent or better (without complications in mRNA/protein degradation).

We have revised our manuscript to include the results for accumulated mRNA patterns. Here is the summary of our major revisions.

First, we quantified the accumulated mRNA level (pattern) by integrating the MS2 time traces during nuclear cycle 14. This method was previously validated to give quantitatively comparable results to single-molecule fluorescence in situ hybridization (smFISH) (Garcia et al., Current Biology, 23:2140, 2013). The patterns of accumulated mRNA corresponding to each construct with and without the Runt protein are given in Figure S17. The figure clearly shows that the effect of repression is present at the level of accumulated mRNA.

Second, to demonstrate that the measured promoter activity using RNAP loading rate is equivalent to measures of accumulated mRNA, we compared the initial rate of RNAP loading to the accumulated mRNA at each position along the anterior-posterior axis for all of our constructs. The result is now included in a new figure, Figure S19. Interestingly, there is substantial correlation between these two metrics, with an average Pearson correlation coefficient of 0.9. Thus, we argue that the initial rate of RNAP loading, the quantity that we focused on in this paper, is comparable to the accumulated mRNA, suggesting that higher-order cooperativity is necessary to predict patterns of accumulated mRNA.

Regardless, we would like to emphasize that, even though there was a good correlation between the rate of transcription and the accumulated mRNA as shown in Figure S19, the objective of our work was not to predict accumulated mRNA patterns. Instead, our goal was to uncover the molecular mechanisms by which multiple transcription factors work together and to quantitatively predict the consequences of these mechanisms. We argue that the action of transcription factors is more accurately captured by monitoring the rate of transcription, rather than the accumulated mRNA, which could be potentially confounded by the regulation of different aspects of the transcription cycle (initiation, elongation and termination), as well as degradation and diffusion. We now emphasized this point in the Discussion (Lines 417-422).

Note that the reviewer requested to make this comparison using orthogonal techniques such as smFISH. However, we hope that the reviewers will agree that the already established relationship between the integral of the MS2 signal and the accumulated mRNA makes invoking such orthogonal techniques (which lack the temporal resolution of the MS2 system) unnecessary.

Reviewer #1 (Recommendations for the authors):

1. Modeling: I would like to see a better articulation of what the alternative is to a 'thermodynamic model.' I understand the authors meaning to be equilibrium models arrived at through partition functions. But kinetic models based on detailed balance are also thermodynamic. I don't want to get twisted around in the language, but the over-emphasis on the ideological purity of thermodynamics was distracting and seems to refer to an ongoing dialog / conversation which is likely lost on the general reader.

We agree with the reviewer’s comment. Kinetic models that account for the transition rates between the different binding states the enhancer become thermodynamic models in the extreme case where the rates that characterize these transitions are much faster than the rate of mRNA production. In this limit, we can invoke quasi-equilibrium, making it possible to forego rates altogether and focus on the probabilities of each state. As the reviewer suggests, we could have entertained such a more general kinetic model. However, our goal was to ask whether the simple thermodynamic model, which has been very successful in bacteria, could also predict transcriptional regulation in the animal context. We have now included a discussion of alternative models in the main text (Lines 107-108) and Discussion (Lines 492-493) to clarify our points.

a. One example is when they assume that the Runt dissociation constant remains unchanged with different positions (Kr) because the sequence is the same. The problem stems from their previous section where they found that there was an unsuspected relationship between the position of the Runt Binding sites and the parameter "p=[RNAP]/Kp" --- the point being that the sequence is the same in each construct, and therefore, based on their model Kp is changing. So why is it preferrable to assume that Kr is constant with position, but Kp changes?

We agree that the justification for the assumption that the Runt dissociation constant Kr is equivalent throughout our constructs should be clarified further.

In Section 2.3 and Figure 4, we attempted to determine whether the simple thermodynamic model derived in Equation 3 could explain our data, and if so, which parameters were consistent across our synthetic enhancer constructs, and which parameters varied. As the reviewer pointed out, the parameter “p=[RNAP]/Kp” changes between constructs, thus suggesting that Kp changes even though the sequence of the promoter is exactly the same across constructs. We do not know why this is the case, but we could speculate on a potential scenario.

We speculate that, since the enhancer is proximal to the promoter, the overall enhancer sequence might affect the binding of the transcriptional machinery itself. Therefore, the transcriptional machinery might see slightly different DNA sequences, a combination of the promoter and part of synthetic enhancer for each construct. Indeed, Louder et al. (Nature 531:604, 2018), showed using cryo-EM that promoter-bound general transcription factor IID (TFIID) occupies roughly 60 bp of DNA (from -20 to +40, where 0 represents the start of the transcription site in the promoter) when it forms pre-initiation complex. Thus, it is conceivable that the general transcriptional machinery would interact with the general DNA sequence of an enhancer located proximally upstream from the promoter. In contrast, the binding site sequence for Runt is small enough (8 bp) compared to the enhancer sequence (~300 bp) that Runt protein might see a similar local DNA sequence for each binding site. Thus, our results could be explained if the general transcriptional machinery interacts with much longer DNA sequences than Runt repressor. This discussion is now included in the Discussion (Lines 433-440)

Regardless of the reason behind the variation in Kp between reporter constructs, our assumption that the Runt dissociation constant remains equivalent across all binding sites is supported by relaxing this assumption and performing MCMC inference on an unconstrained model. In a new figure, Figure S16G, we show the inferred parameters corresponding to the case where we let both Kr and wrp vary. In Author response image 1, it is clear that the inferred Kr values do overlap between constructs within error bars, whereas the p=[RNAP]/Kp varies across synthetic enhancer constructs. Thus, we believe that our assumption of Kr remaining constant across all of our constructs is valid, and chose to continue making this assumption throughout the main text. We have incorporated this evidence that the inferred Kr values do overlap between constructs within error bars in Figure S16.

Author response image 1
Inferred values of Kr and [RNAP]/Kp across all synthetic enhancer constructs.

Kr is inferred from one-Runt binding site constructs, and [RNAP]/Kp is inferred from all constructs in the absence of Runt protein.

b. Expanding on the previous point, they then proceed to state that an unchanging Kr is supported by the fact that by only varying the wrp they were able to fit their data. Considering the similar role of Kr with wrp within the equation, I wonder if one could reach the same result varying the Kr and holding wrp constant? Furthermore, who's to say that both don't vary between the constructs? Could the authors address this concern by varying Kr and further discussing this possibility and what it means for the interpretations of the model? In summary, the authors have a model which explains their data. Great. I'm not sure this agreement can be used as an argument for the propriety of the 'thermodynamic approach.'

This is a great suggestion. We tested alternative models under four different scenarios: (1) Kr and wrp being constant across constructs, (2) Kr remains constant and wrp varies, (3) Kr varies and wrp remains constant, and (4) both Kr and wrp vary across constructs. The results are given in Figure S16, where we can see that our original assumption of keeping Kr constant and varying wrp indeed explains the data better than the other models. Briefly, the best MCMC fits to our data look qualitatively similar for the [001] and [010] constructs regardless of our assumptions, but differ for the [100] construct. For the [100] construct, the best MCMC fits predict that there should be a repression in the presence of Runt protein for the (1) and (3) scenarios. However, no repression is present in the data, suggesting that the assumption of keeping Kr constant is essential in explaining all three constructs, especially the [100] construct. This result is further supported by the quantification of model performance given by the Akaike Information Criterion shown in Figure S16F. The Akaike Information Criterion is a metric that reports the performance of the model. Briefly, Akaike Information Criterion quantifies the deviation between the model and the data, then penalizes the model by the number of free parameters. As a result, the smaller the Akaike Information Criterion is, the better the model performs. Our analysis suggests that the assumption of Kr remaining constant and wrp varying is favorable. Finally, the inferred parameters from the case where we let both Kr and wrp vary across constructs are shown in Figure S16G, showing that the inferred Kr values are comparable between constructs (within error bars), whereas wrp values vary.

2. A simple plot showing mRNA/cell as a function of repressor concentration for the different configurations in Figure 6 would be much appreciated. Perhaps I missed the inference, but the ultimate goal of an input/output understanding is to map concentration of a regulator to expression level of the target gene.

In the new Figure S18, we now present the accumulated mRNA as a function of repressor concentration for all of our synthetic enhancers. We would like to emphasize that this visualization could be misleading since Bicoid and Runt both vary along the anterior posterior axis of the embryo. Thus, to avoid confusion regarding the simultaneous variation of Bicoid and Runt across the embryo, in the main text we favor our current visualization where we plot our transcriptional output along the embryo’s body axis (anterior-posterior axis).

Reviewer #2 (Recommendations for the authors):

The manuscript has several issues as noted below.

1) For MCMC fitting of experimental data, the absolute error or relative error of all the results should be given. One would like to know the confidence degree of the fitting results.

Thank you for pointing this out. We agree that we missed this important information in our first submission. We have now included the 95% confidence interval for all our MCMC fits and prediction plots in our revised submission. As a result, Figures 4, 5, 6, and 7 now contain proper confidence intervals from the MCMC inference.

2) Formula (5) in the main text is used to fit the data, and the results are shown in figure6. However, the number of the parameters in the fitting formula is almost the same as the amount of the experimental data, and the formula is too complicated. The corresponding parameter space is quite large. Therefore, it is necessary to further analyze the accuracy of the fitting and the necessity of adding the cooperativity.

We agree that what is now Equation 6 is quite complicated and that we did not clearly present which parameters are free and which parameters are fixed because they were determined using previous measurements of reporter constructs with simpler regulatory architectures. We revised the way we display the formula to emphasize that only wrr and wrrp are the free parameters by coloring these two parameters in red. Thus, the number of free parameters is indeed smaller than the number of data points.

3) What is source of error bar of the initial RNAP loading rate? Need to give detailed explanation. If the source of error bar is from the data of multiple traces, why not fit multiple trajectories separately?

We thank the reviewer for pointing out this lack of clarification in our manuscript. Figure 3I shows how the initial rate of RNAP loading is measured by extracting the slope resulting from a linear fit to the averaged MS2 time traces at the beginning of nuclear cycle 14. Here, the MS2 traces were averaged over a set of nuclei in an individual embryo that are in the same spatial bin along the anterior-posterior axis of the embryo (2.5% of the embryo length) as previously described (Garcia et al., Current Biology 23:2140, 2013; Eck and Liu et al., eLife 9:e56429, 2020). Then, for all figures throughout this paper, we averaged these initial RNAP loading rates over multiple embryos (>3 embryos) at each spatial position, with an error bar corresponding to the standard error of the mean. Thus, the source of the error bar in the initial RNAP loading rate stems from embryo-to-embryo variability. This information has now been included throughout the paper, especially in Figure 3 caption.

To address the reviewer’s concern of whether our parameter inference would change if we inferred the model parameters from individual embryos, instead of the mean initial RNAP loading rates averaged over multiple embryos, we now included a new figure, Figure S20. In Figure S20B, we show that the initial RNAP loading rate obtained from individual embryos and then averaged over these embryos does not differ significantly from the initial RNAP loading rate calculated from an averaged embryo. Additionally, for a construct with the largest embryo-to-embryo variability, [010], we compared the results of performing MCMC inference using these two different approaches to calculating the initial RNAP loading rate. In Figure S20C, we now show that the values of parameters inferred using both approaches are comparable.

4) The fitting equations (1), (2) and (5), are all the fitting function for the experimental data. Are the parameter b, p and r the fitting parameters? If they are the fitting parameters, the fitting result needs to be given. These parameters are related to the weights of different binding states in your model, which need to be listed in detail, and compared the differences of fitting results under variant conditions.

We apologize for the lack of clarity in the presentation of these equations in our first draft. The parameters b, p, and r are b=[Bicoid]/Kb, p=[RNAP]/Kp, and r=[Runt]/Kr, respectively. Thus, the fitting parameters are Kb, p, and Kr. However, Kb and p were already fitted in Figure 4 in the absence of Runt protein. These inferred parameters were taken as given in order to determine Kr in Figure 5. Thus, parameters are not inferred simultaneously, but sequentially. To clarify further, we now mark all free parameters in our equations in red in our manuscript to more easily distinguish free and fixed parameters.

5) In this paper, a variety of data is shown, which are Runt binding to different sites such as [001][010],[011] and so on. Please elaborate on how Runt is experimentally controlled or determined to bind to the specified sites.

We now elaborate on our synthetic enhancer construct design in Section 2.3 (Lines 219-226). Briefly, we used the same sequence motif that was identified in a study by Chen et al. (Cell 149:618, 2012). In that work, using synthetic enhancers, the binding motif used in our manuscript was validated to recruit Runt to repress the transcription of a lacZ reporter. Specifically, the resulting expression patterns were quantified using in situ hybridization in the presence and absence of Runt protein. We used the exact same sequence and positions of Runt binding sites across all our synthetic enhancer constructs, and we also observed transcriptional repression for all constructs by comparing the presence and absence of Runt protein. Additionally, this binding motif contains the canonical binding motifs for Run domain proteins, which have conserved amino acid sequences for the DNA binding domains between Drosophila Run and its human homologue Runx1 (Chen et al., Cell 149:618, 2012; Koromila et al., Cell Reports 28:855, 2019). Several studies using ChIP-chip, or electric mobility shift assay (EMSA) have identified this consensus sequence to be directly bound by Runx1 protein’s DNA binding domain (Melnikova et al., Journal of Virology 67:2408, 1993; Lewis et al., Journal of Virology 73:5535, 1999). We have now included these references supporting Runt’s direct binding to its binding sites where we introduced our synthetic enhancer constructs in the main text.

6) The fluorescence of nucleus and cytoplasm is quantitatively analyzed, how to distinguish nucleus and cytoplasm? Please give the process and corresponding results.

We segmented nuclei by using the signal stemming from a histone fusion to infrared fluorescent protein (His-iRFP). We have included a new figure (Figure S22) to clearly show this. We have now clarified this point in the Materials and methods.

7) It is shown in Figure 6b that if the cooperativity is not included, a good fitting result cannot be obtained. Has the fitting covered a large enough parameter space search? The fitting results in figure 6 are quite different from the experiment results regardless of the cooperativity. If so, does it mean that the cooperativity plays a major role in the fitting, or even that the individual interaction between Runt and binding sites can be ignored? Could you separate out the part of cooperativity in the fitting results so as to confirm that the cooperativity plays a key role in the experiment?

We agree with the reviewer that this point has to be clarified further. In brief, the curves in Figure 6B did not have any free parameters, as we used the values for the parameters Kr and wrp that were inferred from Figure 5 for each Runt binding site (wrr = 1, and wrrp=1 in Equation 6 in this particular case).

To address the reviewer’s point about whether the higher-order cooperativity is sufficient to get a good fit without the interaction between Runt and its binding site, we performed a sensitivity analysis for the initial RNAP loading rate as a function of the Runt dissociation constant for its binding site, Kr. Briefly, we fixed the Kr value to the inferred value from one-Runt binding site constructs shown in Figure 5, varied the Kr values by 100-fold (0.1x or 10x Kr) and then repeated the MCMC inference for different scenarios of cooperativities to see if the higher-order cooperativity alone could give equivalently good fits to the data, regardless of the magnitude of Kr. The result is shown in our new figure, Figure S21. Figure S21B-E, center column shows the case of a 10 times larger value of Kr, where we found that the best MCMC fit cannot recapitulate our data even in the presence of both Runt-Runt cooperativity and higher-order cooperativity. On the other hand, Figure S21B-E, right column shows the case of Kr/10, the best MCMC fit could recapitulate the data quite well if we invoked higher-order cooperativity, but ignored Runt-Runt pairwise cooperativity. In conclusion, we found that the Runt dissociation constant, Kr, plays a pivotal role in fitting as well as the combination of different cooperativities, but that, even as we vary this dissociation constant, higher-order cooperativity needs to be invoked to obtain a quantitative agreement between theory and experiment.

Reviewer #3 (Recommendations for the authors):

1) Results of Figure 5 should be accompanied by reports of fitting alternative models and suitable model comparisons performed. Also, a baseline model with constant KR and \omegaRP (same for all three constructs) should be evaluated and compared to. Such comparison should ideally involve model comparison techniques such as likelihood ratio and AIC/BIC.

We have included a new figure, Figure S16 to address this point about alternative model assumptions in the context of the one-Runt binding site constructs. We tested alternative models with four different scenarios: (1) Kr and wrp being constant across constructs, (2) Kr remains constant and wrp varies, (3) Kr varies and wrp remains constant, and (4) both Kr and wrp vary across constructs. The results are given in Figure S16, where we show that our original assumption of keeping Kr constant and varying wrp indeed explains the data better than the other models with the minimal set of free parameters. These results are further supported by a new calculation of the Akaike Information Criterion shown in Figure S16F.

2) Results of Figure 6 should be accompanied by statistical assessment of improvements due to higher order cooperativity parameters.

We again thank the reviewer for bringing up this point. We now include an Akaike Information Criterion (AIC) assessment for all models presented in Figure 6 (Figure 6G) to show how the model improves as we change our assumptions.

3) The fact that fitting of more complex models typically relied on setting different values for the additional parameters for each construct (rather than the same values for all tested constructs of that category) should be examined more closely. How well can the data be modeled without this flexibility?

Indeed, the simplest, baseline model would have been to consider equal values of additional parameters across the synthetic enhancer constructs such as when we introduce Kr and wrp in Figure 5 to explain the repression by Runt protein for the one-Runt binding site cases. We considered the reviewer’s suggestion and performed MCMC inference keeping Kr and wrp constant. However, the result was quite off from the experimental data as shown in Figure S16B.

4) The authors should address the concern that alternative model structures different from that used in the thermodynamics-based models here might have explained the data without requiring higher order cooperativity.

As the reviewer pointed out, our objective was to test whether thermodynamic models could predict Runt repressor action for a wide array of regulatory architectures. Within the thermodynamic modeling framework, we have tested different modes of repression, such as “competition” and “quenching'' as shown in Figure S7 and S8, and shown that they all require higher-order cooperativities in order to quantitatively recapitulate the data.

Of course, it is interesting to consider what would happen if we expanded our models to, for example, the non-equilibrium regime. While we are very interested in this approach, we believe that testing the non-equilibrium models is beyond the scope of this manuscript, and now address this point explicitly in the Discussion (Lines 491-499).

Other suggestions:

One of the first experimental tests reported is that of predicted expression profile in the hunchback promoter with one Runt binding site. Figure 3J is supposed to be a qualitative match to the predictions in Figure 3F. Could the authors elaborate on what reasonable regimes of the model parameters would provide predictions that do not qualitatively match the observation? It seems like with an activator (Bicoid) progressively decreasing and a repressor (Runt) progressively increasing from anterior to posterior, the enhancer expression will obviously decrease in some form. Perhaps the authors might wish to make it clear that their goal here was to establish some basic parameter estimates to be used in the remaining analyses, and perhaps also to assess the uncertainties associated with those model parameters.

Figure 3J was given as an illustrative example of one of our measured profiles of the initial RNAP loading. Our objective with this figure was not to go into the detail of the reasonable parameter regimes that would be consistent with these data. Indeed, as the reviewer points out, the role of Figure 3J is to confirm the qualitative prediction that, given the Bicoid and Runt profiles, the initial rate of RNAP loading should decrease towards the posterior of the embryo. Figure 5 then attempts to quantitatively confront this type of data with our models to explore the reasonable regime of the model parameters as well as uncertainties. We clarified this in the main text, Section 2.2 (Lines 189-190) and Figure 3 caption.

The results presented in Figure 4 are puzzling. Figure 4D suggests that the differences among the constructs arises purely from differences in the p and R parameters, and not from the bicoid or bicoid-RNAP related parameters. Could the authors add a possible mechanistic speculation here regarding how making changes solely in the enhancer sequence might result in biochemical changes solely in the behavior of RNAP at the promoter (without any change in the protein-DNA interactions at the enhancer itself)?

Indeed, this result was puzzling to us. We think that our result reveals an important potential caveat in the study of synthetic enhancers: the basal level of expression should be measured experimentally in the absence of the transcription factor of interest. This can, of course, be experimentally challenging in cases where there are more than two transcription factors of interest.

As pointed out by the reviewer, the MCMC inference results shown in Figure 4 lead us to conclude that the different basal activities of our synthetic enhancer constructs in the absence of Runt protein is due to the variability in p=[RNAP]/Kp and the maximum rate of transcription R. Currently, we do not have a satisfactory explanation for this observation. We would like to leave this for future investigations using experimental methods that can directly address these possibilities, but want to offer some speculations which we repeat from our response to Reviewer #1.

We speculate that, since the enhancer is proximal to the promoter, the overall enhancer sequence might affect the binding of the transcriptional machinery itself. Therefore, the transcriptional machinery might see slightly different DNA sequences, a combination of the promoter and part of synthetic enhancer for each construct. Indeed, Louder et al. (Nature 531:604, 2018), used cryo-EM to show that promoter-bound general transcription factor IID (TFIID) occupies roughly 60 bp of DNA (from -20 to +40, where 0 represents the start of the transcription site in the promoter) when it forms pre-initiation complex. Thus, it is conceivable that the general transcriptional machinery would interact with the general DNA sequence of an enhancer located proximally upstream from the promoter. In contrast, the binding site sequence for Runt is small enough (8 bp) compared to the enhancer sequence (~300 bp) that Runt protein might see a similar local DNA sequence for each binding site. Thus, our results could be explained if the general transcriptional machinery interacts with much longer DNA sequences than Runt repressor. This discussion is now included in the Discussion (Lines 433-440)

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been significantly improved but please further address one issue on the validation of transcription rate measurement-- the reviewers asked to use an orthogonal method such as smFISH to validate the measurement using MS2, which was not done in the revision. Please provide at least a thorough comparison and contrast of the current method with other orthogonal methods in the Discussion to equip readers with a full context of the quantifications that are central to the work.

Once again, we thank the editors and reviewers for their insightful comments and suggestions on our manuscript. We further address the issue raised by the editors on the validation of transcription rate measurement via comparison to orthogonal methods, such as smFISH below.

First, we quantified the accumulated mRNA level (pattern) by integrating the MS2 time traces during nuclear cycle 14. This method was previously validated to give quantitatively comparable results to single-molecule fluorescence in situ hybridization (smFISH) (Garcia et al., Current Biology 23:2140, 2013; Lammers et al., PNAS 17:836, 2020). Briefly, Garcia et al. compared the patterns of accumulated mRNA along the anterior-posterior axis of the embryo at late nuclear cycle 14 as quantified via MS2 and smFISH. These two methods yielded quantitatively comparable results as shown in Figure S3H from Garcia et al., Current Biology, 23:2140, 2013. We have elaborated further on this validation study in the Discussion (Lines 419-429).

Furthermore, we have compared two publicly available datasets from the same reporter construct driven by hunchback P2 enhancer by fluorescence in situ hybridization (FISH; Park et al., eLife 8: e41266, 2019) and MS2 ( Eck and Liu et al., eLife 9:e56429, 2020). Again, we integrated the MS2 time traces during nuclear cycle 13 for the datasets from Eck and Liu et al., and then compared that to the patterns of accumulated mRNA quantified via fluorescence in situ hybridization at early nuclear cycle 14 from Park et al. (shown in Author response image 2). By normalizing the accumulated mRNA profiles, we could see a good match along the anterior-posterior axis. From these comparisons to orthogonal methods widely used in the field (both smFISH and FISH), we claim that the MS2 technique is an accurate method for quantifying the patterns of accumulated mRNA as much as smFISH or FISH.

Note that the reviewer requested we repeat these experiments in the context of our work. However, we hope that the reviewers will agree that the already established relationship between the integral of the MS2 signal and the accumulated mRNA makes repeating these measurements unnecessary.

Author response image 2
Comparison of accumulated mRNA profiles measured by FISH and MS2 for a reporter construct driven by the hunchback P2 enhancer.

Normalized profiles of accumulated mRNA averaged over embryos from FISH (blue, Park et al., eLife 8:e41266, 2019) and MS2 (red, Eck and Liu et al., eLife 9:e56429). The blue and red curves show the mean and standard error of the mean over 16 and 9 embryos, respectively. The accumulated mRNA profiles were normalized to the average level within 25%-35% of the embryo length.

https://doi.org/10.7554/eLife.73395.sa2

Article and author information

Author details

  1. Yang Joon Kim

    Chan Zuckerberg Biohub, San Francisco, United States
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-1742-5657
  2. Kaitlin Rhee

    Department of Chemical Biology, University of California, Berkeley, Berkeley, United States
    Contribution
    Data curation, Formal analysis, Investigation
    Competing interests
    No competing interests declared
  3. Jonathan Liu

    Department of Physics, University of California, Berkeley, Berkeley, United States
    Contribution
    Software, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Selene Jeammet

    Department of Biology, Ecole Polytechnique, Paris, France
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  5. Meghan A Turner

    Biophysics Graduate Group, University of California, Berkeley, Berkeley, United States
    Contribution
    Resources
    Competing interests
    No competing interests declared
  6. Stephen J Small

    Department of Biology, New York University, New York, United States
    Contribution
    Conceptualization, Resources, Supervision
    Competing interests
    No competing interests declared
  7. Hernan G Garcia

    1. Chan Zuckerberg Biohub, San Francisco, United States
    2. Department of Physics, University of California, Berkeley, Berkeley, United States
    3. Biophysics Graduate Group, University of California, Berkeley, Berkeley, United States
    4. Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
    5. Institute for Quantitative Biosciences–QB3, University of California at Berkeley, Berkeley, United States
    Contribution
    Conceptualization, Supervision, Funding acquisition, Investigation, Visualization, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    hggarcia@berkeley.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5212-3649

Funding

Burroughs Wellcome Fund (Career Award)

  • Hernan G Garcia

Sloan Research Foundation

  • Hernan G Garcia

Human Frontier Science Program

  • Hernan G Garcia

Searle Scholars Program

  • Hernan G Garcia

Shurl and Kay Curci Foundation

  • Hernan G Garcia

Hellman Foundation

  • Hernan G Garcia

National Institute of Health (DP2 OD024541-01)

  • Hernan G Garcia

National Science Foundation (1652236)

  • Hernan G Garcia

Korea Foundation for Advanced Studies (Graduate Student Fellowship)

  • Yang Joon Kim

Chan Zuckerberg Biohub (Chan Zuckerberg Biohub Investigator Award)

  • Hernan G Garcia

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We are grateful to Armando Reimer, Brandon Schlomann, Elizabeth Eck, Gabriella Martini, Jacques Bothma, Jeehae Park, Jia Ling, Matthew Norstad, Michael Eisen, Nicholas Lammers, Nipam Patel, Rob Phillips, Simon Alamos, Xavier Darzacq, and Yasemin Kirişçioğlu for their guidance and comments on our manuscript. This work was supported by the Burroughs Wellcome Fund Career Award at the Scientific Interface, the Sloan Research Foundation, the Human Frontiers Science Program, the Searle Scholars Program, the Shurl and Kay Curci Foundation, the Hellman Foundation, the NIH Director’s New Innovator Award (DP2 OD024541-01), NSF CAREER Award (1652236), and an NIH R01 Award (R01GM139913) to HGG, and a KFAS scholarship to YJK. HGG is also a Chan Zuckerberg Biohub investigator.

Senior Editor

  1. Naama Barkai, Weizmann Institute of Science, Israel

Reviewing Editor

  1. Jie Xiao, Johns Hopkins University, United States

Publication history

  1. Preprint posted: July 29, 2021 (view preprint)
  2. Received: August 27, 2021
  3. Accepted: December 9, 2022
  4. Accepted Manuscript published: December 12, 2022 (version 1)
  5. Version of Record published: January 12, 2023 (version 2)

Copyright

© 2022, Kim et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 424
    Page views
  • 73
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Yang Joon Kim
  2. Kaitlin Rhee
  3. Jonathan Liu
  4. Selene Jeammet
  5. Meghan A Turner
  6. Stephen J Small
  7. Hernan G Garcia
(2022)
Predictive modeling reveals that higher-order cooperativity drives transcriptional repression in a synthetic developmental enhancer
eLife 11:e73395.
https://doi.org/10.7554/eLife.73395