Cellular resolution models for even skipped regulation in the entire Drosophila embryo

Abstract

Transcriptional control ensures genes are expressed in the right amounts at the correct times and locations. Understanding quantitatively how regulatory systems convert input signals to appropriate outputs remains a challenge. For the first time, we successfully model even skipped (eve) stripes 2 and 3+7 across the entire fly embryo at cellular resolution. A straightforward statistical relationship explains how transcription factor (TF) concentrations define eve’s complex spatial expression, without the need for pairwise interactions or cross-regulatory dynamics. Simulating thousands of TF combinations, we recover known regulators and suggest new candidates. Finally, we accurately predict the intricate effects of perturbations including TF mutations and misexpression. Our approach imposes minimal assumptions about regulatory function; instead, we infer underlying mechanisms, such as the lack of TF-specific thresholds and the positional value of homotypic interactions, from the models that best fit the data. Our study provides a general and quantitative method for elucidating the regulation of diverse biological systems.

https://doi.org/10.7554/eLife.00522.001

eLife digest

The transcription of genes into messenger RNA (mRNA) molecules is one of the most important processes in biology, but our present understanding of this process is largely qualitative. Molecules such as transcription factors and regions of DNA other than the region that codes for the mRNA are known to interact with each other to influence the onset of transcription, and also the rate at which it occurs. However, given the cellular concentrations of transcription factors in a developing organism, it is not known if it is possible to accurately predict their effects on transcription. Being able to make such predictions would greatly improve our understanding of how transcription and the development of an organism are controlled.

Ilsley et al. have tackled this problem by analysing a large volume of data called the Virtual Embryo dataset: produced by the Berkeley Drosophila Transcription Network Project, this dataset includes the results of mRNA expression measurements on 95 different genes at six different times during the early development of Drosophila melanogaster, a species of fruit fly. In particular, Ilsley et al. focussed on the expression at one point in time of the even skipped (eve) gene, a widely studied gene that is important for embryo development in these fruit flies. The eve gene is one of the genes responsible for dividing the fly into segments which form part of its body plan.

Without making any assumptions about the biological mechanisms that might be involved, Ilsley et al. built a statistical model that was able to predict the pattern of gene expression for a fruit fly, given the concentrations of the relevant transcription factors in the various cells within the embryo as input. The model was also able to predict the patterns of gene expression observed in other experiments involving mutations and the misexpression of fruit fly genes. Moreover, Ilsley et al. have made various predictions involving the genes Bicoid and Hunchback that can be tested experimentally in future studies.

https://doi.org/10.7554/eLife.00522.002

Introduction

A detailed knowledge of transcriptional control will have profound consequences for our understanding of myriad biological processes, including development, homeostasis, and evolution of new phenotypes. To this end, through a combination of genomic, genetic, and molecular experiments, the field continues to accumulate considerable information documenting components of regulatory systems and regulator-target interactions (Gerstein et al., 2010; The modENCODE Consortium, 2010; The ENCODE Project Consortium, 2012). At present however, many of these descriptions are qualitative. A major goal going forward is to interpret these data in a quantitative manner (Wilczynski and Furlong, 2010): how do regulators and regulatory interactions convert input signals to the appropriate output expression pattern? In general, answering these questions remains a significant challenge. The experiments needed to probe regulatory functions in detail are technically demanding; moreover, many systems involve multiple layers of control that cannot be investigated within a single experimental set-up. Theoretical models can help advance experimental investigations by providing a framework for deriving general principles and developing testable hypotheses (Reeves et al., 2006; Tomlin and Axelrod, 2007; Lewis, 2008; Oates et al., 2009; Davidson, 2010). An effective model should be able to define and predict expression accurately by describing how and by how much regulators influence target gene expression (Hasty et al., 2001; Segal and Widom, 2009).

Transcription in animals is controlled by interaction among transcription factors (TFs), enhancers, core promoters, silencers, insulators, and chromatin structure (Lemon and Tjian, 2000; Arnosti, 2003; Levine, 2010; Ohler and Wassarman, 2010; Dean, 2011). It is thought that core promoter elements and chromatin structure provide general competence for transcription at transcription start sites (Lenhard et al., 2012), whereas more distant enhancers up-regulate expression of genes under specific conditions (Bulger and Groudine, 2011; Ong and Corces, 2011). A single gene can be regulated by multiple enhancers, each directing a portion of the overall gene expression pattern in space and time. Enhancers operate by binding TFs, which in turn recruit regulatory co-factors and/or interact directly with the core promoter where RNA polymerase acts (Spitz and Furlong, 2012). A comprehensive model of transcriptional regulation would therefore include many factors, such as regulatory DNA sequence, DNA conformation, TF concentrations and nucleosome position among others (Segal and Widom, 2009). However, many of the parameters in such a model are currently impossible to measure. In the absence of such measurements, a partial yet predictive model based on available data is still valuable.

Here, we propose models of transcriptional control that are highly predictive of target gene expression given only TF concentrations at cellular resolution. Our goal is to make few assumptions about the underlying molecular mechanism. Instead, by generating models that predict experimental measurements as accurately as possible, we infer probable biological mechanisms and insights suggested by the parameters of the models. To achieve this, we focus on modeling the functional link between TF inputs and the resulting output (i.e., the ‘regulatory input function’). These models are specific to individual enhancers: they capture how genomic loci interpret TF concentrations to control the output expression level of their target genes. Though multiple previous modeling studies have explicitly included protein–DNA interactions (e.g., in Drosophila, see He et al., 2010; Janssens et al., 2006; Junion et al., 2012; Kazemian et al., 2010; Segal et al., 2008; Zinzen et al., 2009), here, we choose to model the relationship between inputs and outputs directly as this offers several advantages. First and most importantly, this type of model encapsulates numerous relevant levels of biophysical interactions (i.e., TF-DNA, TF-TF, enhancer-promoter etc.). Second, it enables us to evaluate the utility of higher-order interactions between TFs, propose potential regulators and consider alternative hypotheses of experimental results. Third, in the context of developmental biology, it allows us to explore the minimal information required to define positional information in the early embryo. Finally, focusing on input and output measurements means that the approach is applicable to relatively uncharacterized systems, for instance where enhancer regions have not yet been identified, or in assessing the conservation of regulatory input functions between species (Wunderlich et al., 2012).

We develop and test our models in the context of the well-studied even skipped (eve) enhancers in order to demonstrate their accuracy and utility. eve is expressed in a symmetrical pattern of seven stripes that subdivide the embryo along the anteroposterior axis (Nüsslein-Volhard and Wieschaus, 1980). Each stripe is only a few nuclei wide and any regulatory input function of an enhancer must define at least two borders at a high level of precision. A number of well-characterized enhancers direct expression of the seven eve stripes individually or in pairs (Goto et al., 1989; Harding et al., 1989; Fujioka et al., 1999). Here, we focus on the enhancers eve 2 and eve 3+7, which have been shown to control stripe 2 and stripes 3 and 7 respectively (Goto et al., 1989; Harding et al., 1989; Stanojevic et al., 1991; Small et al., 1992, 1996). Many of the input TFs and their roles in regulating eve expression have been defined; however, there remain unexplained properties underlying their regulation. An advantage of modeling eve is that we can use the available information as independent validations of our ability to recover known regulators and predict the outcome of regulatory perturbations, while also producing new insights. It is notable that though there has been some success in simulating the simpler gap gene expression pattern and in predicting eve expression on a portion of the anteroposterior axis, modeling pair-rule expression accurately across the whole embryo has remained a significant challenge (Jaeger et al., 2004; Janssens et al., 2006; Papatsenko and Levine, 2008; Segal et al., 2008; Kazemian et al., 2010; Kim et al., 2013).

To fit regulatory input functions, we require accurate measurements of expression levels for both the regulating TFs and eve. The Virtual Embryo from the Berkeley Drosophila Transcription Network Project provides the best available data for this purpose (Fowlkes et al., 2008). It is a cellular resolution, spatiotemporal atlas of gene expression and morphology for a whole Drosophila melanogaster blastoderm embryo. The dataset contains the three-dimensional coordinates for 6078 nuclei, along with mRNA expression measurements of 95 different genes at six time points during the 50 min leading to gastrulation: these genes include critical TFs that direct patterning in the early Drosophila embryo.

Using our modeling framework, we (i) predict expression patterns with accuracy and explanatory power at cellular resolution across the whole embryo; (ii) recover previously described regulatory relationships and test whether they provide sufficient positional information to define the resulting expression pattern; (iii) propose potential new regulatory relationships by comparing alternative models; and (iv) predict expression patterns under perturbation of input TFs, capturing the outcome of knockdown and misexpression experiments. Given the high level of accuracy of our models, we conclude with observations regarding mechanism and principles of enhancer function.

Results

Approach of this study

Our strategy is to find the logistic regression coefficients that most accurately describe the relationship between measured regulator concentrations and specific stripes of eve expression (Figure 1). We first train our models using the known regulators described in the literature to evaluate if they are sufficient for determining eve expression. At this stage, we also test the model for consistency across different subsets of the data. Next, we ask generally which regulators are able to specify eve expression (regulator discovery) and consider the plausibility of concentration-dependent dual regulation. Finally, we assess whether our models are able to predict beyond the conditions of the training data: specifically, we test whether our models can predict expression under perturbation, such as mutation of TFs and their cognate enhancer binding sites, by comparing our predictions with independently published experimental results.

Figure 1 (with 1 supplement)

Schematic representation of method used to model *eve* expression.

(A) Logistic regression is used to calculate the probability p_i that *eve* is ON in a given nucleus i, given TF concentrations. A logistic model linearly combines the values of independent variables (in this case, the concentrations, x_ki, of regulators 1 to k) to produce a prediction; the predictor, η_i, is then transformed by the logistic function to give the probability, p_i, of *eve* being ON. The weight parameters β_k are optimized to provide the best fit with the training data: positive weights indicate activators and negative weights indicate repressors. (B) Schematic representation of the data preparation, model training and prediction steps using *eve* stripe 2 as an example. The plots represent how logistic regression operates; the lateral perspectives of the embryo show the Virtual Embryo and processed expression data for *eve* and four regulators (Bcd, Hb, Kr, and Gt). In Step 1, *eve*’s expression is discretized whereas TF concentrations are retained as continuous values. Each nucleus corresponds to a data point. In Step 2, the logistic model is trained to classify whether *eve* is ON or OFF using all nuclei in stripe 2, and all OFF nuclei in the embryo. In Step 3, the trained model is used to predict *eve* expression in every nucleus of the entire embryo using the concentrations of the relevant regulators within them (shown in green for activators, and purple for repressors). In Step 4, the effects of perturbations are predicted by adjusting the concentration of the regulator under consideration (in this case, Hb), but without changing any model parameters.

https://doi.org/10.7554/eLife.00522.003

A classification approach for modeling eve expression

Preparing the data

We trained our models on a single time point: this has the advantage of eliminating uncertainties regarding nuclei assignments across time points in the Virtual Embryo dataset, yet still provides sufficient data for fitting. All models were trained using the third time point corresponding to ∼30 min before gastrulation, when according to the Virtual Embryo data, the borders are sharpening and the stripes are not moving dramatically (Figure 1—figure supplement 1). This gave us confidence that eve is transcriptionally active in the relevant nuclei. At this time point, eve’s expression changes from high to low over only a few nuclei across all seven stripes along the anteroposterior axis. Thus we categorized each nucleus as ON or OFF depending on whether eve is above or below a value of 0.2 since this defines the stripe borders reasonably (Figure 1—figure supplement 1). The Virtual Embryo dataset contains expression measurements normalized to a range of 0 to just over 1 across the entire embryo and time points; thus 0.2 corresponds to ∼20% of the maximal expression.
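The ON/OFF categorization described above can be sketched in a few lines. This is only an illustration: the function name and the sample values are assumptions, and only the 0.2 cut-off comes from the text.

```python
EVE_THRESHOLD = 0.2  # cut-off stated in the text (about 20% of maximal expression)

def discretize_eve(expression_levels, threshold=EVE_THRESHOLD):
    """Label each nucleus ON (1) if eve expression exceeds the threshold, else OFF (0)."""
    return [1 if level > threshold else 0 for level in expression_levels]

# Hypothetical normalized eve levels for five nuclei (not real Virtual Embryo data).
example_levels = [0.02, 0.15, 0.35, 0.90, 0.20]
labels = discretize_eve(example_levels)
```

A nucleus sitting exactly at 0.2 is treated as OFF here, matching the "expression>0.2" definition used for Table 1.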

We made use of mRNA measurements for 34 regulatory genes and protein measurements for an additional four genes (bicoid, hunchback, Kruppel and giant). Since our model does not require absolute concentration measurements, mRNA expression is a reasonable proxy for protein assuming that the spatial distribution of the two is similar. We distinguish between protein and mRNA measurements by indicating the regulator name in italics for mRNA (e.g., gt) or normal case for protein (e.g., Gt). In contrast to eve, the expression profiles of these regulators were retained as continuous measurements because many of them are expressed in a graded fashion. Four pair-rule genes (fushi tarazu, odd skipped, hairy and paired) that have similar stripe patterns to eve were excluded from the data set; although some of them might help modulate the expression of eve, they were removed so that we could assess whether eve’s complex spatial pattern could be derived directly from simpler patterns of the regulators upstream of the pair-rule genes. Moreover, eve expression looks qualitatively normal in these TF mutants (Schroeder et al., 2011).

Modeling eve expression using logistic regression

We selected logistic regression for modeling eve expression because it provides a framework for linking continuous input variables (i.e., the regulator concentrations) to a binary output (i.e., eve’s expression state). Like linear regression, a logistic model linearly combines the values of the independent variables to produce a prediction; but the linear predictor is further transformed by the logistic function to give the probability, p, of eve being ON (see Figure 1 for a schematic and ‘Methods’ for details). As in any regression model, the weight parameters are optimized so that the output shows the greatest agreement with the training data. The weight assigned to each TF indicates its regulatory role, with positive weights indicating activators and negative weights indicating repressors. Importantly, since each regulator in the linear combination has independent weight parameters, the model needs only relative concentration measurements. Models were trained for classification using the nuclei of the stripe(s) under consideration as well as all OFF nuclei in the embryo. It is important to note that although this can be viewed as a training step, the ability of the model to classify at this stage is of direct interest to us: do the regulators contain sufficient positional information to explain eve expression in the given stripe(s)? We then use the model to predict eve expression in every nucleus across the entire embryo using the concentrations of the relevant regulators within them. This step reveals the model’s applicability across the whole embryo, rather than just for the nuclei that were used for training.
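To make the model concrete, the following is a minimal sketch of a logistic fit, assuming the usual form p = 1 / (1 + exp(-η)) with η = β_0 + Σ_k β_k x_k. The intercept β_0, the gradient-ascent optimizer, the function names and the toy data are all assumptions made for illustration; the fitting procedure actually used is the one described in ‘Methods’.

```python
import math

def sigmoid(eta):
    """Logistic function: maps the linear predictor eta to a probability."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def predict_on_probability(weights, intercept, concentrations):
    """p = sigmoid(intercept + sum_k weights[k] * concentrations[k])."""
    eta = intercept + sum(w * x for w, x in zip(weights, concentrations))
    return sigmoid(eta)

def fit_logistic(X, y, learning_rate=1.0, n_iterations=5000):
    """Maximize the mean log likelihood by batch gradient ascent (a stand-in optimizer)."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    intercept = 0.0
    for _ in range(n_iterations):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, label in zip(X, y):
            error = label - predict_on_probability(weights, intercept, x)
            grad_b += error
            for k, x_k in enumerate(x):
                grad_w[k] += error * x_k
        intercept += learning_rate * grad_b / len(X)
        weights = [w + learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
    return weights, intercept

# Toy nuclei: (activator-like, repressor-like) relative concentrations.
# These values are invented for illustration, not taken from the Virtual Embryo.
X_toy = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.9, 0.8),
         (0.2, 0.1), (0.1, 0.9), (0.3, 0.7), (0.8, 0.9)]
y_toy = [1, 1, 1, 0, 0, 0, 0, 0]

weights, intercept = fit_logistic(X_toy, y_toy)
probabilities = [predict_on_probability(weights, intercept, x) for x in X_toy]
```

On this toy set the first weight comes out positive and the second negative, which is how a fitted coefficient is read as an activator or a repressor. Rescaling an input column rescales only that column's weight, which is why relative concentrations suffice.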

Linear logistic modeling accurately recapitulates eve 2 expression

First we focused on the expression of the second stripe of eve, as it is directed by a very well-characterized enhancer, eve 2. Through detailed molecular analysis, it is known that eve stripe 2 is controlled by the gap genes (Frasch and Levine, 1987; Stanojevic et al., 1991; Small et al., 1992), a class of TFs that are present in broad regions of the early embryo. In the generally accepted minimal mechanism, two activators Hunchback (Hb) and Bicoid (Bcd) enable broad permissive eve expression in the region of stripe 2, while two repressors Giant (Gt) and Kruppel (Kr) define the anterior and posterior borders respectively by suppressing eve outside the stripe.

Modeling with known regulators recapitulates eve 2 expression

We trained the logistic model to define the expression of eve stripe 2 using a linear combination of the measured concentrations of Hb, Bcd, Gt, and Kr (Figure 2A). Figure 2B shows the model’s output for every nucleus plotted from two perspectives according to their coordinates in the Virtual Embryo. Every nucleus is assigned a probability of eve expression and the color scale ranges from light (p=0) to dark (p=1); nuclei within the stripes (defined by actual eve expression) are shown in grey-scale from white to black, and predictions outside stripes are presented on a red-scale with peach for values near 0. Figure 2C depicts the probability of eve expression being above the threshold in the nuclei of the lateral midline along the anteroposterior axis. It is immediately apparent that the model successfully combines the four known regulators to define precisely the location of eve stripe 2. (Bcd’s role as a potential repressor is discussed below).

Figure 2 (with 4 supplements)

Logistic models accurately predict *eve 2* expression.

(A) Lateral perspectives of the *Drosophila* embryo depicting the contribution of four regulators (Bcd, Hb, Gt, and Kr) to the model output. Embryos are drawn with the anterior (A) to the left and posterior (P) to the right, along with regulator names and corresponding coefficients in the model. Each nucleus is shaded to indicate the level of contribution by regulators, with darker colors signifying stronger effects (in this case, due to higher regulator concentrations): green represents a positive, activating effect and purple a negative, repressive one. Inputs are continuous, but drawn using a discrete color scale for simplicity. (B) Lateral and 3D perspectives of the embryo show the model prediction of *eve* stripe 2 expression. Each nucleus is colored from light to dark for low to high probability of *eve* being ON: within stripes the color scale is from white to black and outside the stripes it is on a red scale, with peach for values below 0.15. (C) A ribbon plot showing the probability of *eve* expression (y-axis) for nuclei within 10 μm of the lateral midline along the anteroposterior axis (x-axis). The plot demonstrates that the stripe borders are sharply defined. It also allows easy comparisons with other models that are generally performed in one dimension. (D) For regulator discovery, for every possible pair of regulators, we determined the best-scoring model of four regulators containing the pair. The 38 regulators in the dataset are shown on the x- and y-axes of the heat map, and the highest scores for every pair are depicted in the intersecting cell on a color scale from light (minimum score in the heat map) to dark (highest score in the heat map). Regulators making consistently informative contributions to models can be identified by the dark bands running across the heat map. Using linear logistic models, Gt, Hb and Bcd can be clearly seen to be informative regulators (highlighted in black). 
(E–G) Prediction made using a quadratic logistic model, in which Bcd is assigned a concentration-dependent dual regulatory activity: it is an activator (green) at low concentrations in the region of stripe 2, and a repressor (purple) at higher concentrations everywhere else. The model outputs a better prediction for stripe 2 expression as shown in Table 1. Most importantly, it reconciles Bcd’s apparently paradoxical behavior compared with the literature (Small et al., 1992; Andrioli et al., 2002). (H) Regulator discovery using quadratic models identifies Gt, Hb, Bcd, and Kr as informative regulators (highlighted in black).

https://doi.org/10.7554/eLife.00522.005

To the best of our knowledge, this is the first time that eve 2’s expression has been predicted so accurately across the entire embryo including the anteriormost region. Most nuclei inside the stripe are correctly classified as having a high probability of being ON, and there is minimal ‘over-spill’ on either side of the stripe (Figure 2A–C and Table 1). The model defines eve expression around the entire circumference of the embryo, following the dorsal-ventral curvature of the stripe: this demonstrates that the four standard regulators of eve 2 already encode this information, implying that dorsoventral factors are not required to provide this information directly to eve 2. Finally, it is notable that the model predicts a small amount of expression near stripe 7; ectopic expression of stripe 7 is sometimes observed in transgenic reporters driven by eve 2 enhancers (Small et al., 1992; Janssens et al., 2006; Hare et al., 2008).

Table 1

Measurements quantifying the accuracy of eve 2 predictions

https://doi.org/10.7554/eLife.00522.010

Model	In the stripe (%)	Immediate neighbors (%)	2nd-degree neighbors (%)
eve 2 linear logistic	86	23	2
eve 2 quadratic logistic	93	21	2

The table shows the percentage of nuclei with predicted eve expression (p>0.5). This is a stringent measure of the accuracy of classification, and is particularly useful for assessing the accuracy of the stripe borders. Nuclei with eve expression>0.2 are defined as those ‘in the stripe’; neighboring nuclei are outside this thresholded region, either immediately adjacent to it, or two nuclei away. A perfect prediction should identify all stripe 2 nuclei as having a high probability of ON with the probability of being ON dropping off rapidly further from the stripe. Though both linear and quadratic models output excellent predictions, the latter provides a slightly more accurate fit to the data.

It is worth noting here that the model’s performance is most reliably assessed by visually comparing the predicted and actual distributions of eve expression as in Figure 2; this enables one to evaluate thousands of individual predictions, as well as the overall shape of the prediction, which are not easily captured in a single statistical measure. Nonetheless, Table 1 quantifies the accuracy of the model—particularly in defining the borders of the stripe—by calculating the percentage of nuclei with a high fitted probability of eve being ON (threshold p>0.5; see ‘Methods’ for description of p). Almost all nuclei within the stripe are correctly identified as ON, and the percentage of nuclei having a high fitted probability quickly drops off further from the stripe.
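The percentages reported in Table 1 correspond to a simple count, sketched below. The function name and the example probabilities are hypothetical; only the p>0.5 cut-off and the grouping into stripe, immediate neighbors and 2nd-degree neighbors come from the text.

```python
def percent_predicted_on(probabilities, cutoff=0.5):
    """Percentage of nuclei in a group whose fitted probability of ON exceeds the cutoff."""
    if not probabilities:
        return 0.0
    n_on = sum(1 for p in probabilities if p > cutoff)
    return 100.0 * n_on / len(probabilities)

# Hypothetical fitted probabilities for three groups of nuclei.
groups = {
    "in the stripe": [0.9, 0.8, 0.4, 0.7],
    "immediate neighbors": [0.6, 0.3, 0.2, 0.1],
    "2nd-degree neighbors": [0.1, 0.05, 0.2, 0.0],
}
summary = {name: percent_predicted_on(values) for name, values in groups.items()}
```

A sharp border shows up as a high value for the first group and a rapid fall across the next two.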

The model performs consistently across different subsets of the data

We were interested in the extent to which our model fit is dependent on the subset of the embryo chosen as training data (Figure 2—figure supplement 2). We found that the model performs well in a cross-validation test in which we averaged 100 predictions with the training data restricted to 50 randomly selected nuclei; that is, less than 2% of the training data. Conversely, we also assessed whether the model is overly flexible in being able to train and predict the expression of any arbitrary stripe in the embryo using the above four regulators. We found this is not the case, suggesting that the positional information provided by these regulators is specific for the eve 2 enhancer and that our model interprets this information accurately (Figure 2—figure supplement 3).
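The subsampling test can be outlined as follows, assuming the fit and predict steps are supplied as functions. The midpoint classifier and the one-dimensional data are deliberately trivial stand-ins so the sketch stays self-contained; only the 100 repeats and the 50-nucleus training subsets come from the text.

```python
import random

def averaged_subsample_predictions(X, y, fit, predict,
                                   n_repeats=100, subset_size=50, seed=0):
    """Train on random subsets of nuclei and average the whole-embryo predictions."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    totals = [0.0] * len(X)
    for _ in range(n_repeats):
        chosen = rng.sample(indices, subset_size)
        model = fit([X[i] for i in chosen], [y[i] for i in chosen])
        for i, x in enumerate(X):
            totals[i] += predict(model, x)
    return [total / n_repeats for total in totals]

# Stand-in model (hypothetical): threshold halfway between the ON and OFF means.
def fit_midpoint(xs, labels):
    on = [x for x, label in zip(xs, labels) if label == 1]
    off = [x for x, label in zip(xs, labels) if label == 0]
    if not on or not off:
        return None  # subset contained a single class; no threshold can be set
    return (sum(on) / len(on) + sum(off) / len(off)) / 2.0

def predict_midpoint(threshold, x):
    if threshold is None:
        return 0.5  # uninformative prediction for a degenerate subset
    return 1.0 if x > threshold else 0.0

# Toy one-dimensional "embryo": ON in the upper half of the axis.
X_line = [i / 199 for i in range(200)]
y_line = [1 if x > 0.5 else 0 for x in X_line]
averaged = averaged_subsample_predictions(X_line, y_line, fit_midpoint, predict_midpoint)
```

In practice the logistic fit would replace the stand-in functions; the averaging loop is unchanged.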

Regulator discovery ascertains the known regulators

Having successfully applied the model using known regulators, we next developed a method to identify a parsimonious set of regulators from the dataset informative for the target enhancer’s expression. Such techniques are broadly applicable in discovering potential regulators of uncharacterized enhancers, and therefore useful in producing testable hypotheses. We tested a stepwise selection process, but found that it generally includes more regulators than necessary for a good visual fit (e.g., a stepwise selection procedure for eve 2 with the Bayesian information criterion finds 11 regulators). The stopping point (i.e., the penalty for adding an extra parameter) is effectively arbitrary in this case, or at least difficult to determine a priori in a justifiable manner. Additionally, stepwise selection does not consider all models exhaustively.

Instead, since we are particularly interested in identifying parsimonious models that can explain eve expression, and logistic models are fast to fit, we took the approach of fitting all possible models of four regulators out of the possible 38 in the dataset (73,815 models) and used the log likelihood of each fitted model as its score (or equivalently here, the Akaike information criterion). Gratifyingly, the best-scoring model comprises the known regulators Hb, Bcd, Gt, and Kr.
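The size of this search follows from choosing 4 regulators out of 38, and the enumeration itself is a loop over combinations, as sketched below. The regulator names and scores are placeholders. Because every candidate has the same number of parameters, ranking by log likelihood and ranking by AIC (2k minus twice the log likelihood) give the same order.

```python
import itertools
import math

N_REGULATORS = 38  # regulators available in the dataset
MODEL_SIZE = 4     # regulators per candidate model
n_models = math.comb(N_REGULATORS, MODEL_SIZE)  # number of four-regulator models

def best_subset(regulators, size, score):
    """Return the highest-scoring subset of the given size (exhaustive search)."""
    return max(itertools.combinations(regulators, size), key=score)

# Toy example: placeholder names and an invented additive score.
# In the real procedure the score would be the log likelihood of a fitted model.
toy_informativeness = {"R1": 0.9, "R2": 0.8, "R3": 0.7, "R4": 0.6, "R5": 0.2, "R6": 0.1}

def toy_score(subset):
    return sum(toy_informativeness[name] for name in subset)

best = best_subset(sorted(toy_informativeness), MODEL_SIZE, toy_score)
```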

To make use of the scores more generally, we developed a method that summarizes the scores for all 73,815 predictions and highlights regulators that work well together (Figure 2D). For each possible pair of regulators (the fixed pair), we determined the best-scoring model of four regulators containing the pair. The two regulators of the fixed pair are shown on the axes on the heat map, with the highest score for the pair depicted in the intersecting cell on a color scale from light (the minimum score on the heat map) to dark (the maximum score). Dark bands crossing the heat map highlight individual regulators that consistently make informative contributions to stripe 2 expression. Hence, it is clear that Bcd, Hb, and Gt are key regulators. Although Kr is actually in the top-scoring model, the heat map does not show it as consistently informative.
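The pairwise summary behind the heat map reduces to taking, for each pair, the maximum score over all models that contain it. A minimal sketch, with placeholder regulator names and invented log-likelihood scores:

```python
import itertools

def best_score_per_pair(model_scores):
    """Map each regulator pair to the best score among models containing that pair.

    model_scores: dict mapping a tuple of regulator names to that model's score.
    """
    best = {}
    for regulators, score in model_scores.items():
        for pair in itertools.combinations(sorted(regulators), 2):
            if pair not in best or score > best[pair]:
                best[pair] = score
    return best

# All five four-regulator models over five placeholder regulators (scores invented).
toy_scores = {
    ("A", "B", "C", "D"): -10.0,
    ("A", "B", "C", "E"): -25.0,
    ("A", "B", "D", "E"): -30.0,
    ("A", "C", "D", "E"): -40.0,
    ("B", "C", "D", "E"): -50.0,
}
pair_best = best_score_per_pair(toy_scores)
```

A regulator whose row of pair scores is uniformly high corresponds to a dark band across the heat map.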

A quadratic logistic model suggests a dual regulatory role for Bcd

The linear model successfully recapitulates stripe 2 expression; however, it identifies Bcd as a repressor, whereas most existing literature defines the TF as an activator. Despite the apparent consensus, Bcd’s function is not straightforward. The need for Bcd-binding sites for successful eve expression suggests an activating function (Small et al., 1992); but this does not explain why the enhancer is inactive in the anteriormost region of the embryo despite Bcd being present at high concentrations and the known repressors Gt and Kr having low concentrations (Figure 2A). Our linear models reflect this apparent paradox: Bcd is highlighted as one of the most important TFs during regulator discovery in spite of consistently having a negative coefficient, but a model trained by excluding the anterior region of the embryo assigns Bcd an activating function (Figure 2—figure supplement 1). These observations strongly suggest that Bcd—as both a repressor and activator—provides useful positional information to eve.

We asked whether these two functions could be reconciled if Bcd’s regulatory effect were dependent on its concentration, either directly, or mediated through other factors or post-translational modifications (Janody et al., 2000, 2001; Andrioli et al., 2002). This is readily modeled by adding a single parameter: a quadratic term for Bcd (Figure 2E–G). The result is clear: the modified model retains a repressive function for Bcd in the anterior of the embryo where it is present in high concentrations, but enables an activating function in the region of stripe 2 where it has lower concentrations (Figure 2E). The modification doesn’t lead to over-fitting on small training subsets and in fact improves the model’s ability to generalize to the whole embryo from an anteroposteriorly restricted training subset (Figure 2—figure supplements 2 and 4). In addition, regulator discovery now identifies all four TFs as important, with a more consistently informative role for Kr than in the simple linear model (Figure 2H).
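How a single quadratic term yields concentration-dependent behavior can be seen from Bcd's contribution to the linear predictor, β_1·x + β_2·x². With β_1 > 0 and β_2 < 0, the contribution is positive below x = -β_1/β_2 and negative above it. The coefficients below are hypothetical and chosen only to show the sign change; they are not the fitted values.

```python
def quadratic_contribution(x, beta_linear, beta_quadratic):
    """Contribution of one regulator to the linear predictor: b1*x + b2*x**2."""
    return beta_linear * x + beta_quadratic * x * x

def sign_switch_concentration(beta_linear, beta_quadratic):
    """Concentration at which the contribution changes sign (besides x = 0)."""
    return -beta_linear / beta_quadratic

# Hypothetical coefficients on a 0-to-1 relative concentration scale.
BETA_LINEAR = 4.0
BETA_QUADRATIC = -8.0

switch = sign_switch_concentration(BETA_LINEAR, BETA_QUADRATIC)
low = quadratic_contribution(0.25, BETA_LINEAR, BETA_QUADRATIC)   # below the switch
high = quadratic_contribution(0.9, BETA_LINEAR, BETA_QUADRATIC)   # above the switch
```

Here the regulator acts as an activator at the lower concentration and as a repressor at the higher one, with no change to the other terms of the model.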

Independent experiments validate eve 2 model predictions

We next tested whether our model is predictive of experimental perturbations. We considered experiments that test the role of eve 2 regulators by either knocking down the input TF (Stanojevic et al., 1991), or by mutating binding sites for that TF in the eve 2 enhancer (Arnosti et al., 1996). To simulate these perturbations, we set the concentrations of Bcd or Hb to zero without further adjustment of the coefficients. Strictly speaking, this models the direct effect of the perturbation and is akin to the removal of the relevant binding sites from the enhancer.
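Such a perturbation amounts to changing an input while holding the fitted coefficients fixed, as sketched here. The coefficient values, the intercept and the regulator names are assumptions made for illustration.

```python
import math

def predict_on_probability(coefficients, intercept, concentrations):
    """Logistic prediction from named coefficients and named concentrations."""
    eta = intercept + sum(coefficients[name] * concentrations[name]
                          for name in coefficients)
    return 1.0 / (1.0 + math.exp(-eta))

def knock_out(concentrations, regulator):
    """Return a copy of the inputs with one regulator's concentration set to zero."""
    perturbed = dict(concentrations)
    perturbed[regulator] = 0.0
    return perturbed

# Hypothetical fitted model and a single nucleus just outside a stripe.
coefficients = {"activator": 6.0, "repressor": -8.0}
intercept = -2.0
nucleus = {"activator": 0.8, "repressor": 0.5}

wild_type = predict_on_probability(coefficients, intercept, nucleus)
without_repressor = predict_on_probability(
    coefficients, intercept, knock_out(nucleus, "repressor"))
without_activator = predict_on_probability(
    coefficients, intercept, knock_out(nucleus, "activator"))
```

In this toy nucleus, removing the repressor switches the prediction from OFF to ON, the analogue of a stripe border expanding, while removing the activator lowers the probability further.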

The results of these perturbations are shown in Figure 3. Only the quadratic model correctly predicts the expression pattern in a Bcd null mutant (Figure 3C). In the linear model, Bcd is designated a repressor and so its mutant causes broad eve expression in the anterior of the embryo in contrast to the experimental result (Figure 3C). In the quadratic model the lack of either activator (Bcd or Hb) abolishes the expression of stripe 2 as expected (Figure 3C,D). In both the linear and quadratic models, the loss of the repressors Gt or Kr causes eve expression to extend towards the anterior and posterior of the embryo respectively, in line with their roles in defining the stripe borders (Figure 3A,B).

Figure 3

The quadratic model accurately predicts *eve 2* expression under perturbation of input TFs.

The effects of regulatory perturbations on stripe 2 expression are predicted by altering regulator concentrations but keeping all the model coefficients unchanged; for TF deletion or binding site mutants, this involves setting the relevant regulator’s concentrations to 0. Predictions are made for perturbations using the linear and quadratic models. Comparisons to experiments provide robust, independent validations of model predictions. Loss of (A) gt or (B) Kr causes *eve* expression to extend towards the anterior and posterior of the embryo respectively, in excellent agreement with experimental evidence. (C) For the *bcd* mutant, the linear model predicts expression at the anterior of the embryo, something that is not observed in experiments. In contrast, the quadratic model does not suffer from this. (D) Perturbing hb leads to complete loss of *eve* stripe 2 for both models. The better agreement between predictions and experimental evidence suggests that the quadratic is a more plausible model of *eve 2* regulation.

In situ images in panels 1, 2, and 6 are reproduced from Figure 4B–C and 6C, Small et al. (1992), *The EMBO Journal*; Nature Publishing Group has granted permission to reproduce these images under the terms of the Creative Commons Attribution 3.0 Unported License (CC BY 3.0).

© 1991, Cold Spring Harbor Laboratory Press, All Rights Reserved. The in situ image in panel 3 is reprinted with permission from Figure 2D, Small et al. (1991), *Genes & Development*.

© 1991, American Association for the Advancement of Science, All Rights Reserved. In situ images in panels 4 and 5 are reprinted with permission from Figure 3A and 3C, Stanojevic et al. (1991), *Science*.

© 1996, The Company of Biologists, All Rights Reserved. The in situ image in panel 7 is reproduced with permission from Figure 6B, Arnosti et al. (1996), *Development*.

https://doi.org/10.7554/eLife.00522.011

Both models predict the observed response to binding site mutations: the expansions of stripe 2 in the correct directions and extent along the length of the embryo. The models demonstrate that the extent of posterior extension in the Kr mutant is restricted because of decreasing activator concentrations (Figure 3B). For the anterior extension in the Gt mutant, the restriction requires a repressor since activator concentrations remain high to the end of the embryo (Figure 3A; Andrioli et al., 2002). Bcd can provide this repression in both linear and quadratic models: however, only the quadratic can reconcile this with Bcd’s known activating function. For the linear model to work with a Bcd activator, one would need a fifth regulator as an anterior repressor. Indeed, multiple studies have searched for a repressor in this region, and multiple candidates have been identified though none have been conclusive (Bellaïche et al., 1996; Janody et al., 2000; Andrioli et al., 2002; Zhao et al., 2002; Singh et al., 2005).

Models successfully predict eve 3+7 expression

eve stripes 3 and 7 are regulated together by a single enhancer (Small et al., 1996; Clyde et al., 2003; Struffi et al., 2011). Such an arrangement requires appropriate TF concentrations for eve activation to be present in nuclei separated by some distance. We tested whether our modeling framework can contend with the challenge of specifying two extra stripe borders using the available regulator concentrations.

A combination of modeling and regulator discovery suggests two plausible models

We first fit our models using only the known regulators of eve stripes 3 and 7, Hb and Kni. Kni is thought to repress the region between the stripes and Hb is thought to repress in the anterior and posterior regions outside the stripes (Clyde et al., 2003). The measured concentrations of Hb (protein) and kni (mRNA) alone are not sufficient for our models of stripe 3 and 7 expression; in particular, the concentration of Hb is too low to repress expression to the posterior of stripe 7 (Figure 4—figure supplement 1).

Using regulator discovery (Figure 4—figure supplements 6 and 7), we identified two alternative models that are able to define stripes 3 and 7 (Figure 4). The first is a linear logistic model that includes two additional gap genes, Giant (Gt) and tailless (tll); including both Gt (protein) and tll (mRNA) improves predictions over including tll alone (Figure 4—figure supplement 2). In this model, all regulators function as repressors: the model has a positive intercept which can represent a ubiquitous activator (Figure 4A). Our second model is a quadratic logistic regression model that treats Hb as a dual regulator, in a similar manner to Bcd for eve 2 (Figure 4D). Concentration-dependent regulation by Hb—as an activator at low concentrations and repressor at higher levels—has been suggested by previous experimental work (Hülskamp et al., 1990, 1994; Zuo et al., 1991; Schulz and Tautz, 1994); and used to model stripe 3 expression (Papatsenko and Levine, 2008) and gap gene regulation (Bieler et al., 2011). Using regulator discovery, we again identified tll as the top candidate for repressing expression posterior to stripe 7 (Figure 4—figure supplement 7). tll has previously been proposed as a regulator of stripe 7, in some cases as an activator (Small et al., 1996) and in others as a repressor (Janssens et al., 2006; Morán and Jiménez, 2006). Both our linear and quadratic models output good predictions of eve stripes 3 and 7 (Figure 4B,C,E,F and Table 2). As with predictions for eve 2, the high probability predictions are within the stripes and the models successfully replicate eve expression around the embryo. Further, using these chosen regulators, the models are not able to train and predict the expression of any arbitrary pair of stripes (Figure 4—figure supplements 4,5).

Figure 4 with 7 supplements

Linear and quadratic logistic models accurately predict *eve 3+7* expression.

Regulator inputs and model output are shown as in Figure 2. (A–C) The linear model including Hb, *kni*, *tll,* and Gt; (D–F) the quadratic model comprises Hb, *kni,* and *tll*, with a quadratic term for Hb as a concentration-dependent dual regulator. Both models clearly define the two stripes, though the midline ribbon plots show that the quadratic model defines the sharpest borders. The initial predictions therefore suggest that the quadratic model provides the best output.

https://doi.org/10.7554/eLife.00522.012

Table 2

Measurements quantifying the accuracy of eve 3+7 predictions

https://doi.org/10.7554/eLife.00522.020

| Model | In the stripe (%) | Immediate neighbors (%) | 2nd-degree neighbors (%) |
|---|---|---|---|
| eve 3+7 linear logistic | | | |
| kni and Hb | 0 | 0 | 0 |
| kni, Hb, and tll | 56 | 39 | 27 |
| kni, Hb, tll, and gt | 76 | 37 | 17 |
| eve 3+7 quadratic logistic | 86 | 39 | 9 |

The table shows similar measures of accuracy for stripes 3 and 7 as in Table 1. It is clear that the quadratic model and the 4-regulator linear model provide the best predictions, with the most sharply defined borders.

There are reasons to favor the quadratic model

We prefer the quadratic model over the linear for a variety of reasons. First, it is simpler: the quadratic requires only three regulators, compared to four in the linear model. Both models have five parameters, which include three shared regulators (Hb, kni, tll) and the intercept. Second, the quadratic model has more clearly defined stripe borders than the linear model. Third, the quadratic model is more robust to the choice of training data (Figure 4—figure supplement 3), indicating that it describes the regulatory relationship uniformly across the embryo: the model performs consistently whether it is trained on either of the two stripes, a restricted region around the lateral midline, or only on the stripes and their immediately neighboring nuclei. Finally, the quadratic model retains accurate expression of the stripes even when all 38 candidate regulators are included; by contrast the prediction from the linear model begins to fragment spatially, which suggests localized over-fitting.

Independent experimental perturbations are consistent with the quadratic model

As with eve 2, we can further compare the models by predicting the outcomes of regulatory perturbations of input TFs (Figure 5). Here we consider perturbations of kni and hb, the best characterized regulators of eve 3+7. It is again important to distinguish between expression in a mutant background, which reveals both direct and indirect interactions, and corresponding binding site mutations within the eve 3+7 enhancer, which probe only direct interactions.

Figure 5 with 3 supplements

Linear and quadratic logistic models accurately predict *eve 3+7* expression under perturbation of Kni and Hb.

The effects of regulatory perturbations on *eve 3+7* expression are predicted as described in the main text. (A) Perturbation of *kni* and its binding sites causes full reporter expression between the stripes. The linear model predicts this observed extension, but the quadratic does not. (B) Perturbation of hb causes stripe 3 to expand and move anteriorly, and stripe 7 to expand slightly. Binding site mutations show similar effects, though perhaps without the anterior shift of stripe 3. The linear model provides a good prediction of both stripes. The quadratic produces a good stripe 3 prediction, including its anterior shift, but fails to predict any expression in stripe 7. (C–F) Given the initial preference for the quadratic, we considered minor and biologically plausible assumptions that allow the model to make accurate predictions. For the *kni* mutants, these are (C) a minor adjustment of the intercept and (D) inclusion of indirect effects of *kni* on hb by increasing Hb by 50% of wild-type *kni*. For the hb mutants, these are (E) the inclusion of residual maternal Hb in the posterior and (F) simulating the effects of residual Hb binding sites.

In situ images in panels 1 and 3 are reprinted with permission from Figure 4B–C, Small et al. (1996), *Developmental Biology* (© copyright Elsevier, 1996, All Rights Reserved). In situ images in panels 2 and 4 are reproduced with permission from Figures 4H and 6D, Struffi et al. (2011), *Development* (© copyright The Company of Biologists, 2011, All Rights Reserved).

https://doi.org/10.7554/eLife.00522.021

Perturbing kni

In the kni mutant, expression of an eve 3+7 reporter transgene extends fully between the two stripes before partially retreating towards wild-type expression (Small et al., 1996). Similarly, when Kni-binding sites in the eve 3+7 enhancer are removed, the expression pattern matches the kni null mutant (Struffi et al., 2011), although an earlier transgenic reporter with fewer mutated Kni-binding sites showed only partial extension (Clyde et al., 2003).

To mimic both of these types of perturbations, we eliminated kni as an input. Under these conditions, the linear model predicts the observed full extension between the two stripes, whereas the quadratic does not (Figure 5A). However, given our reasons for preferring the quadratic model described above, it is worth considering some minor and biologically plausible assumptions to reconcile these perturbation experiments with the quadratic model (Figure 5C,D). We discuss these in terms of direct and indirect effects below.

Direct effects

The direct effects can be understood by considering two related minor adjustments (Figure 5C). First, we can assume that the Kni protein is ubiquitously expressed in the embryo at low concentrations, but that this is not reflected in the Virtual Embryo dataset; it is possible that in situ hybridization was not sufficiently sensitive for these low-level transcripts or that the protein has a slightly different profile to the kni mRNA. If we increase kni concentrations in the Virtual Embryo dataset by just 0.1 (∼10% of the maximum measured value across all time points), the retrained quadratic model predicts full extension between the stripes. The prediction for wild-type expression is not affected and this adjustment is sufficient for explaining both the kni mutant and the eve 3+7 binding site mutations. Second, since this adjustment produces a model with identical coefficients (i.e., β_k) except for the intercept (i.e., β₀)—which increases from −8.1 to −3.6—we can change the intercept directly in the quadratic model. This adjustment is also sufficient to predict full extension between the stripes, though it alters the wild-type prediction slightly (Figure 5—figure supplement 2). This change in the intercept corresponds to potential differences in expression between the endogenous gene and the transgenic reporter. For instance, the reporter might have a lower barrier to activation and be more efficiently transcribed relative to the endogenous enhancer. Alternatively, the mutations in the transgenic enhancer may have abolished Kni repression, but then introduced the binding of another weak, ubiquitous activator.
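
The equivalence between the two adjustments follows from the form of the logistic model: adding a constant to one input can be absorbed entirely by the intercept. The sketch below illustrates this arithmetic. The two intercepts and the 0.1 offset are the values quoted above; the kni coefficient is not quoted in the text, so the value computed here is only what those numbers would imply if every other coefficient were exactly unchanged. The function name `linear_predictor` is hypothetical.

```python
def linear_predictor(intercept, beta_kni, kni, other_terms=0.0):
    """Log-odds of expression; other_terms stands in for the Hb and tll terms."""
    return intercept + beta_kni * kni + other_terms

beta0_original = -8.1  # intercept before the adjustment (from the text)
beta0_shifted = -3.6   # intercept after the adjustment (from the text)
offset = 0.1           # constant added to every kni value (from the text)

# Assumption: the shifted fit reproduces the original fit exactly, so that
# beta0_shifted = beta0_original - beta_kni * offset.
beta_kni_implied = (beta0_original - beta0_shifted) / offset

# Wild type: the offset in kni and the change in intercept cancel out.
wild_type_diffs = [
    linear_predictor(beta0_shifted, beta_kni_implied, kni + offset)
    - linear_predictor(beta0_original, beta_kni_implied, kni)
    for kni in (0.0, 0.3, 0.9)  # arbitrary measured kni values
]

# Simulated mutant: kni is set to 0 in both cases, so only the intercept differs.
mutant_gain = (
    linear_predictor(beta0_shifted, beta_kni_implied, 0.0)
    - linear_predictor(beta0_original, beta_kni_implied, 0.0)
)
```

Under this assumption the wild-type log-odds are unchanged, whereas the simulated mutant gains the full difference between the two intercepts, which is why the adjusted model can predict expression between the stripes.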

Indirect effects

Next, we consider the effects of Kni on downstream regulators (Figure 5D). There is strong evidence that kni is a repressor of hb, as its loss causes hb expression to extend from stripe 7 towards stripe 3 (Hülskamp et al., 1990; Clyde et al., 2003). We simulated this indirect interaction by increasing Hb concentration in proportion to the relative loss of kni. Since Hb is an activator at low concentrations in the quadratic model, this indirect effect can drive eve expression between the stripes. This adjustment is not relevant to binding site mutants, but interestingly it does provide a tentative explanation for the partial retreat of eve’s extension towards a wild-type expression pattern: as Hb concentrations increase over time, the TF eventually switches from an activator to a repressor of eve between the stripes (Figure 5—figure supplement 1).
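
A minimal sketch of this indirect effect follows, assuming a logistic model with a quadratic Hb term. All coefficients and concentrations are hypothetical and were chosen only so that Hb is activating at low levels and repressing at high levels; the 50% scaling follows the description in the legend of Figure 5D. The function name `quadratic_hb_model` is likewise an assumption.

```python
import math

def quadratic_hb_model(hb, kni, intercept=-8.0, b_hb=60.0, b_hb2=-100.0, b_kni=-45.0):
    """Logistic model with a quadratic Hb term; all coefficients are hypothetical."""
    z = intercept + b_hb * hb + b_hb2 * hb ** 2 + b_kni * kni
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical wild-type nucleus between stripes 3 and 7: no Hb, abundant kni.
hb_wt, kni_wt = 0.0, 0.6

wild_type = quadratic_hb_model(hb_wt, kni_wt)

# Direct effect only: kni is removed and Hb is left untouched.
direct_only = quadratic_hb_model(hb_wt, 0.0)

# Indirect effect: Hb rises by 50% of the wild-type kni level once kni is lost.
hb_derepressed = hb_wt + 0.5 * kni_wt
with_indirect = quadratic_hb_model(hb_derepressed, 0.0)

# If Hb keeps accumulating, the quadratic term dominates and expression falls again.
later = quadratic_hb_model(0.8, 0.0)
```

In this toy setting, removing kni alone leaves the nucleus silent, a moderate rise in Hb switches it on, and a further rise in Hb switches it off again, mirroring the partial retreat described above.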

Perturbing hb

hb is both maternally and zygotically expressed. In embryos null for hb zygotic expression, eve stripe 3 moves anteriorly and expands, whereas stripe 7 shows more limited widening (Small et al., 1996). Mutating Hb-binding sites in the eve 3+7 enhancer leads to similar expansion, though perhaps without the anterior shift of stripe 3 (Struffi et al., 2011). Maternally deposited hb mRNA is ubiquitous but differentially translated in the anterior (Hülskamp et al., 1989). Zygotically, hb is transcribed in both an anterior domain that largely overlaps the maternal hb pattern and in a posterior stripe (Margolis et al., 1995). Thus, the zygotic mutant likely contains residual Hb protein in the anterior at the time point we use in this study.

We simulated the zygotic mutant by eliminating Hb altogether in the posterior and decreasing its expression to 20% in the anterior domain (Figure 5B). In these conditions, the linear model produces a good prediction for both stripes. The quadratic model arguably produces a better prediction for stripe 3, capturing its movement towards the anterior; however, it predicts zero expression in stripe 7. Based on our preference for the quadratic model as described in previous sections, and on the evidence presented in the following section, we again consider minor adjustments to reconcile it with experimental results (Figure 5E,F).
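
The input manipulation for the zygotic mutant can be sketched as follows. The 20% residual fraction comes from the text; the position of the boundary between the anterior and posterior domains, the example concentrations, and the function name are assumptions made for illustration only.

```python
def simulate_zygotic_hb_mutant(hb, ap_position, boundary=0.5, anterior_fraction=0.2):
    """Reduce Hb to a residual fraction in the anterior and remove it in the posterior.

    hb and ap_position are parallel lists with one entry per nucleus;
    ap_position runs from 0 (anterior tip) to 1 (posterior tip).
    The boundary value is an assumption of this sketch.
    """
    return [
        h * anterior_fraction if x < boundary else 0.0
        for h, x in zip(hb, ap_position)
    ]

# Hypothetical nuclei along the anteroposterior axis.
ap_position = [0.1, 0.3, 0.45, 0.6, 0.85]
hb_wild_type = [1.0, 0.9, 0.5, 0.0, 0.4]

hb_mutant = simulate_zygotic_hb_mutant(hb_wild_type, ap_position)
```

The perturbed Hb values would then be passed to the fitted linear or quadratic model in place of the wild-type values, with all coefficients left unchanged.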

Direct effects

The first adjustment is relevant for the zygotic mutant, and assumes some active Hb in the posterior around stripe 7 (Figure 5E). Specifically, having as little as 0.15 of Hb (∼15% of maximal expression) in this region is sufficient for a good prediction of stripe 7. Next, we simulated the effects of having some residual Hb-binding sites in the enhancer, as a mutagenesis experiment may not abolish all binding (e.g., see Clyde et al., 2003 compared to Struffi et al., 2011). A simple way to model this is to have the same low level of Hb activity as in the zygotic mutant, which produces a good prediction as just described. As an alternative, we also considered whether incomplete mutagenesis could affect Hb’s dual-regulatory behavior. Papatsenko and Levine (2008) proposed that the dual role is facilitated by adjacently bound Hb molecules masking each other’s active sites; in this scenario, we would expect dual regulation to be attenuated as binding sites are lost through mutagenesis. We simulated this by dampening the regulatory effect of Hb at higher concentrations and found that the predicted expression patterns agree with experimental results (Figure 5F, Figure 5—figure supplement 3). Notably, this prediction does not show movement of stripe 3 towards the anterior or expansion in stripe 7, correctly reflecting the experimental results of binding site mutagenesis.

Models predict eve 2 and 3+7 expression at earlier time points

We also tested the linear and quadratic models for both eve 2 and eve 3+7 on the two previous time points in the Virtual Embryo, which were not used for training (Figure 6). The results for eve 2 show a wider stripe forming before it narrows to the boundaries of stripe 2. This mirrors published results for an eve 2 reporter as well as the endogenous expression of eve (Small et al., 1992; Andrioli et al., 2002). The predictions for eve 3+7 are also consistent in terms of the positions of the stripes, although they have stripe 3 appearing earlier than stripe 7. This timing difference is not obvious in the endogenous eve expression recorded in the Virtual Embryo, although stripe 7 does appear relatively weak in some transgenic reporters (Small et al., 1996). At the earlier time points the difference in sharpness of the stripe borders between the quadratic and linear model is more pronounced suggesting that the interpretation of positional information by the quadratic model is more stable and precise.

Figure 6

Models predict *eve 2* and *3+7* expression at earlier time points.

Model predictions for earlier time points in the Virtual Embryo are shown for the (A) *eve 2* and (B) *eve 3+7* linear and quadratic models. The time points are labeled from the start of the dataset; the third time point is the one used throughout the main text. For *eve 2,* the linear and quadratic models show a wider stripe at the second time point and a well-defined stripe at time point 3. This matches the in situ images below from Andrioli et al. (2002) which show a transgenic reporter at early and mid cycle. The predictions for *eve 3+7* are consistent in terms of the positions of the stripes, with stripe 3 appearing earlier than stripe 7. At the earlier time points the difference in sharpness of the stripe borders between the quadratic and linear model is more pronounced suggesting that the interpretation of positional information by the quadratic model is more stable and precise.

The in situ image for *eve 3+7* is reprinted with permission from Figure 2C, Small et al. (1996), *Developmental Biology* (© copyright Elsevier, 1996, All Rights Reserved). In situ images for *eve 2* are reproduced with permission from Figure 4A,B, Andrioli et al. (2006), *Development* (© copyright The Company of Biologists, 2006, All Rights Reserved).

https://doi.org/10.7554/eLife.00522.025

Quadratic eve 2 and 3+7 models predict TF misexpression results better than linear models

As a final test of our models, we compare our predictions to the misexpression work of Clyde and colleagues (Figure 7, Figure 7—figure supplement 1; Clyde et al., 2003). In this study, the authors constructed transgenes with a snail promoter that misexpressed hb or kni in the ventral region of the embryo and recorded the effects of expressing one or two copies of these transgenes. The experiments confirmed that Hb and Kni repress eve 3+7 and 4+6 at different concentrations as suggested by Fujioka et al. (1999). However, the experiments also revealed some curious observations that are not easily explained. First, stripes 3 and 7 respond differently to the same additional concentrations of Hb despite being regulated by the same enhancer. Secondly, and most intriguingly, the results show substantial bending of stripes: in the presence of one copy of the hb transgene, stripe 3 extends towards the posterior of the embryo, and with two copies, stripe 7 bends towards the anterior. These behaviors cannot be explained readily by simple, qualitative inspection of the embryos and they were not explored in the original study.

Figure 7 with 1 supplement

Quadratic models accurately predict fine-scale features of expression patterns due to input misexpression.

The study by Clyde et al. (2003) misexpressed hb and *kni* along the ventral surface of the embryo using transgenes driven by a *snail* promoter and recorded the effects of one or two copies of these transgenes on *eve* expression. We replicated these experiments using quadratic models for *eve 2* and *eve 3+7* (trained on stripe 3), by adding Hb and *kni* in proportion to the distribution of *snail* in the Virtual Embryo dataset. As described in the main text, we also added an indirect effect from Hb activating Kr. (A) With *kni* misexpression, the model accurately predicts the thinning (x1 transgene), then cutting of stripe 3 (x2). (B) With Hb misexpression, the model successfully predicts the bulging, then cutting and bending of stripe 3 (x2), and the bulging of stripe 7 (x2). Stripe 2 remains unaffected in both perturbations, in agreement with the experimental results. The accuracy of the predictions indicates that the quadratic model for *eve 3+7* can explain the experimental results very well. In contrast the linear models are unable to predict these results.

In situ images are reproduced from Figures 1F–H and 1K–M, Clyde et al. (2003), published in *Nature*; Nature Publishing Group has granted permission to reproduce these images under the terms of the Creative Commons Attribution 3.0 Unported License (CC BY 3.0).

https://doi.org/10.7554/eLife.00522.026

To model this experiment, we added Hb and kni to the Virtual Embryo in proportion to the measured distribution of snail at different concentrations ranging from 0.1 to 0.4 (∼10% to ∼40% of maximal expression). We simulated the responses of eve 2 and eve 3+7 using both the linear and quadratic models above and a quadratic model trained on stripe 3. Figure 7 and Figure 7—figure supplement 1 display the predictions of these models at 0.2 and 0.4 of Hb and kni, simulating the effects of one or two copies of the transgene respectively. Figure 7 includes a putative indirect effect on eve 2 via Kr (see below).
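
As a rough illustration of this input manipulation, the sketch below adds a TF to each nucleus in proportion to its snail level. It assumes that the snail profile is rescaled so that its maximum equals 1, which the text does not state; the example values and the function name `misexpress` are also hypothetical.

```python
def misexpress(tf, snail, level):
    """Add up to `level` units of a TF, scaled by the local snail concentration.

    tf and snail are parallel lists with one entry per nucleus. snail is
    rescaled by its maximum (an assumption), so that the most strongly
    expressing nucleus receives exactly `level` additional units.
    """
    peak = max(snail)
    if peak == 0:
        return list(tf)  # no snail expression, so nothing is added
    return [t + level * s / peak for t, s in zip(tf, snail)]

# Hypothetical nuclei ordered from dorsal (no snail) to ventral (peak snail).
snail = [0.0, 0.5, 1.0]
hb_wild_type = [0.1, 0.1, 0.1]

hb_one_copy = misexpress(hb_wild_type, snail, level=0.2)    # one transgene copy
hb_two_copies = misexpress(hb_wild_type, snail, level=0.4)  # two transgene copies
```

The same function could be applied to kni, and the perturbed inputs fed to the fitted models without changing their coefficients.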

The quadratic models predict the fine-scale effects of this misexpression experiment with remarkable accuracy whereas the linear models do not (compare Figure 7 to Figure 7—figure supplement 1A,B). Specifically, the quadratic model trained on stripe 3 successfully reproduces the bending and bulging of both stripes 3 and 7 under hb misexpression (Figure 7B), as well as the repression seen with kni misexpression (Figure 7A). Our models for eve 2 predict that since Hb is an activator, increasing the concentration of Hb should lead to increased activation of eve 2 and a resultant broadening of the stripe (Figure 7—figure supplement 1A,C,E); but this is not in fact observed in the misexpression study. Indirect effects of Hb on another factor, such as Kr, can resolve this discrepancy: adding Kr in proportion (50%) to the increase of Hb does indeed prevent stripe 2 from expanding, but only in the quadratic model and not in the linear (Figure 7 and Figure 7—figure supplement 1B,D). Given all the evidence provided here, on balance we conclude that the quadratic models are the most likely for both eve 2 and eve 3+7.

Discussion

Summary

Our goal was to understand the regulatory system underlying spatiotemporal patterning in the early Drosophila embryo by fitting regulatory input functions to the output of individual enhancers. Our models are accurate, predictive and simple to apply and interpret. We showed that simple functional forms relating TF concentrations to eve expression outputs are highly predictive of wild-type and mutant expression patterns. In doing so, we have demonstrated that precise positional information—in other words, information interpreted by individual nuclei to produce an expression pattern—is available in the early embryo. We determined whether TFs are most informative when serving as activators or repressors of each enhancer, and we also explored whether a dual-regulatory role for some TFs improved expression predictions. Here we discuss our work in relation to other models of regulatory function, the insights our models provide into transcriptional regulation and positional information in the embryo, and the experimentally testable hypotheses proposed by our models.

Previous models

The regulation underlying anteroposterior patterning of the Drosophila blastoderm has long been a favorite system for modeling work; for recent reviews, see for example Jaeger et al. (2009) and Papatsenko (2009). Some models have been successful in reproducing the gap gene patterns (Jaeger et al., 2004, 2007; Bieler et al., 2011; Papatsenko and Levine, 2011), but none have succeeded in accurately predicting precise stripes of even skipped across the whole anteroposterior axis of the embryo (Levine, 2008). In general, previous models have focused on utilizing information contained in the cis-regulatory sequence; for example predicting expression and evaluating potential TFs of a 1.7kb region of regulatory DNA upstream of eve (Janssens et al., 2006), fitting models to fusions of eve enhancers and predicting expression from different regulatory DNA (Kim et al., 2013), testing models of TF binding and synergy by predicting expression across many enhancers (Segal et al., 2008; He et al., 2010) and identifying enhancer sequences within the genome based on the fit between predicted and observed expression patterns (Kazemian et al., 2010).

Our choices in modeling the regulatory function of enhancers differ from these previous studies in a number of important respects. First, our models are highly accurate in fitting the eve expression pattern in the entire embryo. This is in part because we chose to model the regulatory function of each enhancer separately, rather than fitting a single model that applies across many enhancers simultaneously. By defining parameters that are specific for each enhancer, we are able to assign the regulatory roles for TFs in a context-specific manner. Second, our models also perform well because, unlike previous studies, we do not impose any biological mechanisms on our models (e.g., a ‘thermodynamic score’ for protein–DNA interactions). Instead we worked the other way round: we tested models that fit data as accurately as possible and then inferred the underlying mechanisms. This simple framework nonetheless allows us to propose experimentally testable hypotheses. Third, our modeling framework is quick to apply. This allowed us to search comprehensively for informative regulators, a property that is particularly valuable for studying poorly characterized enhancers.

Inferred mechanisms for regulatory input function

Since the models are accurate and predictive, they may reflect the underlying molecular mechanism for transcriptional regulation. Further, the models are relatively easy to interpret, so we can infer what they mean in terms of biological mechanism. Here we highlight three features.

Thresholding a combination of TFs is sufficient for positional information

One of the important questions in animal development is how each cell determines its position in the embryo. Early work on positional information in the Drosophila embryo was inspired by the idea of a morphogen gradient that is interpreted by the nuclei according to a set threshold (Ashe and Briscoe, 2006; Crick, 1970; Wolpert, 1969, 1996). More recently, it has been concluded that a lone-acting morphogen is insufficient for providing precise positional information to the embryo, especially since no gradient with this characteristic has been measured in an embryo (Kerszberg and Wolpert, 2007; Wolpert, 2011). Our model, however, shows that the combined action of multiple morphogens and a corresponding interpretative threshold is indeed able to read positional information from measured gradients alone. In particular, it succeeds by applying the threshold to the overall balance of activators and repressors rather than to each factor individually.

Focusing on the contributions of individual TFs also tends to emphasize the role of repressors in providing positional information to the eve stripes. Since repressors are often crucial in defining the borders of the stripes, it is natural to suppose that the activators are merely permissive, and that precision in positional information is provided by the repressors. Here we show that an alternative view is compatible with the data. In particular, activators and repressors contribute symmetrically to positional information: they work to increase or decrease the probability of transcription, but neither class acts separately according to a threshold that is independent of the concentrations of other factors. Thus, if more activators are present in a nucleus, a higher concentration of repressors will be required to reduce transcription to the same level. This means that positional information cannot be defined by any one factor in isolation, and nor can mutant results be interpreted reliably in the absence of data on other factors.
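
The dependence of the repressor threshold on activator concentration can be made concrete with a small worked example. It assumes a logistic model with one activator and one repressor; the coefficients and the function name `repressor_needed` are hypothetical.

```python
import math

def repressor_needed(activator, target_prob, intercept, beta_act, beta_rep):
    """Repressor concentration at which the model outputs target_prob.

    Solves intercept + beta_act * activator + beta_rep * R = logit(target_prob) for R.
    """
    logit = math.log(target_prob / (1.0 - target_prob))
    return (logit - intercept - beta_act * activator) / beta_rep

# Hypothetical coefficients: positive for the activator, negative for the repressor.
intercept, beta_act, beta_rep = -2.0, 8.0, -10.0

# Repressor level that brings the probability of expression down to 0.5.
low_activator = repressor_needed(0.5, 0.5, intercept, beta_act, beta_rep)
high_activator = repressor_needed(1.0, 0.5, intercept, beta_act, beta_rep)
```

With these numbers the nucleus with more activator needs three times as much repressor to reach the same output, so no single repressor concentration acts as a fixed threshold.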

Pairwise cooperative interactions between TFs are not necessary for synergy

Our model can help clarify the concept of synergy, where the effect of one regulator depends on the concentration of another. This has been proposed in the context of transcriptional activators in general (Struhl, 2001) and observed between Hb and Bcd in controlling expression of eve 2 (Stanojevic et al., 1991; Simpson-Brose et al., 1994). Our model shows that this effect is observed with a linear combination of concentrations: that is, without any pairwise interactions between Hb and Bcd or other factors. Thus, our model is compatible with the early findings of Arnosti et al. (1996), which suggest that eve transcription is controlled by the total balance of activators and repressors rather than through complex and intricate combinatorial interactions between TFs. However, this is not to say that cooperative interactions do not take place, or are not important in other contexts, but rather that it is necessary to distinguish between synergistic interactions that can be explained by independent binding of multiple factors (as in our model), and those that occur as a result of pairwise interactions between TFs. We expect pairwise interactions between TFs on the DNA to require particular arrangements of binding sites. Therefore, the success of our model without pairwise interactions suggests that the ordering and exact spacing of binding sites are not important, except potentially in the case of dual regulation. This agrees well with multiple observations about the flexibility of enhancer sequences, which can tolerate rearrangement over evolutionary time while maintaining their function (reviewed in Borok et al., 2010).
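
To see how synergy in this sense can arise without an interaction term, consider a logistic model in which two activators enter only additively. The sketch below uses hypothetical coefficients and concentrations; the names `expression_prob`, `bcd` and `hb` are illustrative.

```python
import math

def expression_prob(bcd, hb, intercept=-6.0, beta_bcd=6.0, beta_hb=6.0):
    """Logistic model with additive inputs and no Bcd x Hb interaction term."""
    z = intercept + beta_bcd * bcd + beta_hb * hb
    return 1.0 / (1.0 + math.exp(-z))

baseline = expression_prob(0.0, 0.0)

# Gain in predicted expression from each activator alone, and from both together.
bcd_gain = expression_prob(0.5, 0.0) - baseline
hb_gain = expression_prob(0.0, 0.5) - baseline
joint_gain = expression_prob(0.5, 0.5) - baseline
```

Here the gain from both activators together is several times the sum of the individual gains, even though the model contains no pairwise term: the effect comes from the sigmoidal mapping between the summed inputs and the output.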

Dual regulatory function of Hb and Bcd

Hb has a dual role: it acts as a repressor in some enhancers (e.g., eve 3+7) and an activator in others (e.g., eve 2) (Small et al., 1992, 1996). Here, like Papatsenko and Levine (2008), we model a dual role for Hb in the context of a single enhancer. In our model of eve 3+7, including a quadratic term for concentration-dependent dual regulation produces better wild-type predictions, explains experimental perturbations accurately (with certain assumptions), and produces consistent fits across different training subsets. Although Kazemian et al. (2010) did not find a quadratic term for Hb generally useful for fitting logistic models to Drosophila expression patterns, they did find this to be true for Bcd, particularly for the anterior parts of the expression patterns. In our work, this term is not needed for a good fit, but we add it for eve 2 to show how a repressive role in the anterior can be reconciled with an activating role around stripe 2 (Zhao et al., 2002; Singh et al., 2005).
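
The way a quadratic term encodes dual regulation can be sketched as follows. For a contribution of the form β₁x + β₂x² with β₁ > 0 and β₂ < 0, an increase in concentration x raises the log-odds of expression below x = −β₁/(2β₂) and lowers it above that point. The coefficient values and function names below are hypothetical.

```python
def switch_point(beta_linear, beta_quadratic):
    """Concentration at which the TF's marginal effect changes sign."""
    return -beta_linear / (2.0 * beta_quadratic)

def marginal_effect(x, beta_linear, beta_quadratic):
    """Derivative of beta_linear * x + beta_quadratic * x**2 with respect to x."""
    return beta_linear + 2.0 * beta_quadratic * x

# Hypothetical coefficients: positive linear term, negative quadratic term.
beta_linear, beta_quadratic = 60.0, -100.0

x_switch = switch_point(beta_linear, beta_quadratic)
effect_low = marginal_effect(0.1, beta_linear, beta_quadratic)   # positive: activating
effect_high = marginal_effect(0.6, beta_linear, beta_quadratic)  # negative: repressing
```

With these values the switch occurs at a concentration of 0.3. The net contribution β₁x + β₂x² itself becomes negative only at the higher concentration −β₁/β₂, so which point is taken as the transition from activator to repressor depends on the definition used.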

Concentration-dependent regulatory activities have been observed in other systems: for instance in humans, at low concentrations, Sp1 is an activator of the folate receptor gene in conjunction with Ets TFs; at higher concentrations it becomes a repressor by blocking Ets binding (Kelley et al., 2003). Our model does not reveal how Hb and Bcd achieve dual-regulatory activity, and it is quite possible that they make use of different mechanisms. One possibility is a change in protein–protein interactions, through formation of homo-oligomers or interactions with co-factors (e.g., Janody et al., 2001). Alternatively, as with Sp1, changes in DNA occupancies may alter how regulators interact with adjacent TF molecules. We discuss experimental tests of these possibilities below. Regardless of mechanism, however, we propose that concentration-dependent effects are important, in contrast to the hypothesis that concentrations above a predefined threshold are neutral in effect. Moreover, we suggest that similar analysis techniques could be used to test potential dual-regulatory capabilities of other regulators, such as Gli and Lef/Tcf in the Hedgehog and Wnt signaling pathways (Logan and Nusse, 2004; Arce et al., 2006; Varjosalo and Taipale, 2008; Whitington et al., 2011).

Experimentally testable hypotheses

Our models predict which input TFs are relevant for a given enhancer, and whether they act as activators or repressors. In the case of eve 2 and eve 3+7, we showed that many of these predictions are confirmed by independent experiments already in the literature. These studies involve either perturbing a candidate regulator by mutation, over-expression or misexpression, or mutagenizing binding sites for a candidate regulator in an enhancer sequence, and then measuring the expression of eve. To confirm our predictions, we made qualitative comparisons between published data (in the form of a single representative image) and our model predictions. Having validated our modeling framework on these well-characterized enhancers, we can now broadly apply this framework to discover regulators for less well-characterized enhancers in this system. While many enhancers in this network have been mapped by computational studies and functional genomics (Berman et al., 2002; Schroeder et al., 2004; Kazemian et al., 2010; Négre et al., 2011; Schroeder et al., 2011), our knowledge of most of their regulatory input functions remains incomplete. Our modeling framework complements existing functional genomic and bioinformatics approaches: combined they will allow a comprehensive description of the relevant inputs of each of these enhancers, and how those inputs work together to produce an output expression pattern.

Our models also point to a role for concentration-dependent effects of Hb and Bcd on their targets. We hypothesize that this is due to concentration-dependent differences in protein–protein interactions, perhaps mediated by the arrangement of TF binding sites in an enhancer, as has been proposed for Hb (Papatsenko and Levine, 2008). To test whether binding site arrangements are important, the binding sites for Bcd and Hb can be rearranged within the eve 2 and eve 3+7 enhancers, and the output of these mutated enhancers measured. To test which parts of the TFs are involved in mediating protein–protein interactions, the TFs themselves can be mutated, and protein–protein interactions can be assayed by in vitro binding studies. Finally, to test the concentration-dependent effects directly, the concentration of Hb and Bcd can be manipulated in vivo by over-expression, knock-down and misexpression. Our modeling framework is especially useful in this last case, as predictions with and without concentration-dependent effects can be compared. We propose that misexpression studies are likely to be particularly informative, based on the fine-scale differences such as stripe bending and bulging that we were able to predict.

Instead of making qualitative comparisons to experimental data, it would be ideal to test our models quantitatively at cellular resolution. This is possible if we create additional Virtual Embryo data where perturbations, both to input TFs and enhancer sequences, are measured. For knock-down, over-expression or misexpression of TFs, we will need to create a new Virtual Embryo for each perturbation. This will capture all of the direct and indirect consequences of perturbing the TF. We can assess the consequences of mutating enhancer sequences by integrating transgenic reporters into any given Virtual Embryo dataset, as in Wunderlich et al., 2012. Creating these new datasets is not a trivial undertaking technically but it would provide the framework for us to directly compare the output of our model predictions to experimental data at cellular resolution to detect fine-scale differences, and without making assumptions about indirect effects. For example, this would allow us to test our proposed role for tll in repressing the posterior border of eve stripe 7, where classic experiments have been inconclusive and to validate future predictions for other enhancers in the segmentation network. We fully anticipate that analyzing this type of data will lead to further refinements of our models.

General applicability of our modeling approach

Clearly, our model depends on the quality of the data in the Virtual Embryo, which was derived from many in situ hybridization images of the Drosophila blastoderm (Keränen et al., 2006; Luengo Hendriks et al., 2006; Fowlkes et al., 2008). To predict spatiotemporal expression patterns, it is important that the measurements are quantitative and at the resolution of individual cells. One advantage of the blastoderm is that the relevant nuclei are near the surface of the embryo, making it easier to segment the overall fluorescence signal and assign it to individual nuclei. However, microscopy and other techniques such as single-cell transcriptomics are continually improving (Kalisky et al., 2011); we anticipate that many comparable datasets will become available over time, both for other developmental time points in Drosophila and in other model systems. Our study demonstrates how theoretical models can be applied to such data in order to make new biological discoveries.

Methods

Virtual Embryo dataset

Release 2.0 of the Virtual Embryo dataset was downloaded from the Berkeley Drosophila Transcription Network Project website (http://bdtnp.lbl.gov/Fly-Net/) (Fowlkes et al., 2008). The release contains composited mRNA expression measurements for 95 genes in 6078 nuclei at six time points (or ‘cohorts’). Also provided are protein expression data for four gene products (Bcd, Hb, Kr, and Gt) for some of the time points. Data for the current study were extracted from a ‘comma-separated values’ (CSV) format Virtual Embryo file (D_mel_wt__alas_r2.vpc): each row corresponds to a nucleus in the embryo, with columns containing measurements including three-dimensional coordinates, the average expression level for a given gene, and the time point of the measurement. Expression measurements are provided as relative values for each nucleus, ranging from ‘0’ for minimum expression across all six time points to a little over ‘1’ for maximum expression (e.g., the maximum for eve is 1.11 and for Hb it is 1.05). The variability in the maximum is a result of the method used to determine the relative variation between nuclei across different time points in the Virtual Embryo (Fowlkes et al., 2008).

The coordinates of the Virtual Embryo are along the anteroposterior (x), left-right (y) and dorsoventral (z) axes. The difference between the minimum and maximum is 404 μm for the x-coordinate, 154 μm for the y-coordinate and 155 μm for the z-coordinate.

Training data preparation

Training was performed using expression measurements at the third time point (Cohort 3). All 6078 nuclei were classed as ON (2444) or OFF (3634) depending on whether eve expression was above or below a threshold of 0.2 (approximately 20% of maximum). Nuclei were grouped into the seven eve stripes using the neighboring-nuclei information provided in the Virtual Embryo (stripe 2 = 348 nuclei, stripe 3 = 342 nuclei, stripe 7 = 383 nuclei). mRNA expression measurements for 34 genes were included in the training data (brk, bun, cad, CG10924, CG17786, CG4702, cnc, croc, Cyp310a1, D, Dfd, Doc2, emc, fj, fkh, hkb, kni, knrl, oc, path, rho, sala, slp1, slp2, sna, sob, srp, term, tll, Traf1, trn, tsh, twi, zen). For four TFs (Bcd, Hb, Kr, Gt), we used the protein expression measurements instead.
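The ON/OFF labelling step described above can be sketched in a few lines. This is a minimal illustration rather than the code used in the study: the function name and the example values are hypothetical, and because the text does not say how a value exactly equal to the threshold is treated, the sketch assumes that only values strictly above 0.2 count as ON.

```python
# Threshold from the text: eve expression above 0.2 counts as ON.
EVE_ON_THRESHOLD = 0.2


def classify_nuclei(eve_levels, threshold=EVE_ON_THRESHOLD):
    """Return one label per nucleus: 1 (ON) or 0 (OFF).

    Assumption: a value exactly at the threshold is treated as OFF.
    """
    return [1 if level > threshold else 0 for level in eve_levels]


# Hypothetical relative expression values for five nuclei (not atlas data).
example_levels = [0.05, 0.19, 0.21, 0.80, 1.11]
labels = classify_nuclei(example_levels)
n_on = sum(labels)
n_off = len(labels) - n_on
```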

Training logistic regression models

Logistic regression was used to model eve expression by linking the regulator concentrations, as continuous input variables, to eve’s expression state as the binary output. For a nucleus i, the predictor, η_i, is calculated as a linear combination of concentrations:

η_{i} = β_{0} + β_{1} x_{1 i} + ... + β_{k} x_{k i}

where x_{k i} is the expression measurement of the kth gene in the ith nucleus, and the β coefficients are to be estimated. For the quadratic models, a single quadratic term was added for the regulator in question, q:

η_{i} = β_{0} + β_{1} x_{1 i} + ... + β_{k} x_{k i} + β_{k + 1} x_{q i}^{2}
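
To see how a single quadratic term can encode a concentration-dependent dual role, it may help to look at how the quadratic predictor above responds to regulator q. The short derivation below is only an illustration: β_{q} denotes the linear coefficient of regulator q (one of β_{1} ... β_{k}), x_{q}^{*} is notation introduced here, and nothing is assumed about the signs or sizes of the fitted coefficients.

```latex
\frac{\partial η_{i}}{\partial x_{q i}} = β_{q} + 2 β_{k + 1} x_{q i}
\qquad \Rightarrow \qquad
x_{q}^{*} = - \frac{β_{q}}{2 β_{k + 1}}
```

If β_{q} > 0 and β_{k + 1} < 0, increasing the concentration of q raises η_{i} below x_{q}^{*} (activation) and lowers it above x_{q}^{*} (repression); the opposite signs reverse the two regimes. If x_{q}^{*} falls outside the measured concentration range, the regulator acts in one direction only across the embryo.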

The predictor is linked to the estimated probability p_i of eve being ON in the ith nucleus:

p_{i} = \frac{1}{1 + e^{- η_{i}}}
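
As a concrete reading of the equations above, the following sketch evaluates the predictor and the logistic link for one nucleus. The function names, argument layout and coefficient values are hypothetical and chosen for illustration; they are not fitted values from the study.

```python
import math


def linear_predictor(beta, x, quad_index=None, beta_quad=0.0):
    """Compute eta for one nucleus.

    beta: [beta_0, beta_1, ..., beta_k] (intercept first).
    x: [x_1, ..., x_k], regulator concentrations in this nucleus.
    quad_index: 0-based position in x of regulator q, or None for a
        purely linear model; beta_quad is the coefficient of x_q squared.
    """
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    if quad_index is not None:
        eta += beta_quad * x[quad_index] ** 2
    return eta


def probability_on(eta):
    """Logistic link p = 1 / (1 + exp(-eta)), written to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical coefficients for a one-regulator model.
eta_linear = linear_predictor([-2.0, 4.0], [0.5])
eta_quadratic = linear_predictor([-2.0, 4.0], [1.0], quad_index=0, beta_quad=-4.0)
p_linear = probability_on(eta_linear)
p_quadratic = probability_on(eta_quadratic)
```

With these made-up numbers the same regulator pushes the nucleus to the ON/OFF boundary at a concentration of 0.5 under the linear model, but the negative quadratic coefficient pulls the prediction towards OFF at a concentration of 1.0.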

Models for eve 2 were trained using the 348 nuclei defined as ON in stripe 2, as well as the nuclei defined as OFF excluding the nuclei of other stripes and their immediate neighbors (2588 nuclei). Similarly, models for eve 3+7 were trained using 725 ON nuclei in stripes 3 and 7, and 2756 OFF nuclei.

The models were fitted using the R function glm from the stats package, which uses Iteratively Re-weighted Least Squares. For our best-fitting models, glm issued a fitting and evaluation warning message. This was because most of the logistic models that classify the eve stripes successfully have some fitted probabilities very near 0 or 1 (the nuclei on the borders of the stripes have intermediate values). Although this can suggest problems in certain situations, here, in agreement with Ripley (2008), it is viewed as a desirable outcome of classification. The trained model was then used to predict eve expression in all 6078 nuclei across the entire embryo, using the concentrations of the relevant regulators.
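
For readers working outside R, the fitting step can be sketched as follows. This is a minimal pure-Python illustration of iteratively re-weighted least squares for logistic regression, written as the equivalent Newton update; it is not the glm implementation, and the function names and the toy data are assumptions made for the example.

```python
import math


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_logistic_irls(rows, y, iterations=25, tol=1e-8):
    """rows: per-nucleus concentration lists; y: 0/1 labels.

    Returns [beta_0, beta_1, ..., beta_k].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k, n = len(X[0]), len(X)
    beta = [0.0] * k
    for _ in range(iterations):
        p = [sigmoid(sum(b * v for b, v in zip(beta, xi))) for xi in X]
        w = [pi * (1.0 - pi) for pi in p]  # IRLS weights
        # Newton step: solve (X^T W X) delta = X^T (y - p)
        xtwx = [[sum(w[i] * X[i][a] * X[i][c] for i in range(n))
                 for c in range(k)] for a in range(k)]
        grad = [sum(X[i][a] * (y[i] - p[i]) for i in range(n)) for a in range(k)]
        delta = solve(xtwx, grad)
        beta = [b + d for b, d in zip(beta, delta)]
        if max(abs(d) for d in delta) < tol:
            break
    return beta


# Toy data (not from the atlas): one regulator, deliberately not separable.
toy_x = [[0.0]] * 4 + [[1.0]] * 4
toy_y = [0, 0, 0, 1, 1, 1, 1, 0]
beta_hat = fit_logistic_irls(toy_x, toy_y)
```

The toy labels overlap on purpose. When the classes can be separated perfectly, the coefficients can grow very large and the fitted probabilities approach 0 or 1, which is the kind of situation that the warning mentioned above reports.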

Tests of model consistency across different training subsets

The consistency of the model across different training subsets was tested in several ways. (i) Each model was trained on a subset of the training dataset and then used to predict eve expression for the whole embryo. Subsets used included: nuclei within 20 μm either side of the lateral midline; nuclei within the relevant stripe(s) and only their immediate neighbors; and a cross-validation test, which was the average of 100 predictions each trained on a random subset of 50 nuclei. For eve 3+7, two extra subsets excluded the ON nuclei from either stripe 3 or 7. Less consistent models produce poor predictions after training on some subsets. (ii) Each model was trained using all 38 regulators and then used to predict eve expression for the whole embryo. Models suffering from localized over-fitting show fragmented eve expression. (iii) Models were trained for each of the stripes in turn, using the regulators of the best-fitting models (such as Bcd, Hb, Gt, and Kr for eve 2). This showed that the given regulators are not able to fit any arbitrary stripe well.
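
The random-subset test in (i) can be sketched generically. The version below is a hedged illustration only: fit and predict are caller-supplied placeholders for the model-fitting and prediction steps, the function name is hypothetical, and the sketch assumes that subsets are drawn uniformly without replacement, which the text does not specify.

```python
import random


def averaged_subset_predictions(fit, predict, training_rows, all_rows,
                                n_repeats=100, subset_size=50, seed=0):
    """Average whole-embryo predictions over models trained on random subsets.

    fit(subset) returns a model; predict(model, row) returns a probability.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(all_rows)
    for _ in range(n_repeats):
        subset = rng.sample(training_rows, subset_size)  # without replacement
        model = fit(subset)
        for i, row in enumerate(all_rows):
            totals[i] += predict(model, row)
    return [total / n_repeats for total in totals]


# Toy stand-ins for the real fitting and prediction steps.
def toy_fit(rows):
    """'Model' is simply the fraction of ON labels in the subset."""
    return sum(label for _, label in rows) / len(rows)


def toy_predict(model, row):
    return model


toy_rows = [(0.1 * i, 1) for i in range(60)]  # (concentration, label) pairs
toy_average = averaged_subset_predictions(toy_fit, toy_predict,
                                          toy_rows, toy_rows[:3])
```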

Predictions of regulatory perturbations

The effects of regulatory perturbations were simulated by adjusting the concentrations of the relevant regulator without changing any model parameters (i.e., without retraining), and then predicting eve expression across the whole embryo. Binding site mutations and null mutants were simulated by setting the regulator concentration to ‘0’ in all nuclei. Where indicated, indirect effects were simulated by adjusting the expression level of downstream regulators and again, predicting eve expression without any model adjustments. Other types of regulatory perturbations, such as the misexpression studies, were performed similarly by adjusting regulator concentrations as described in the main text.
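
The key point of this procedure, that the inputs change while the fitted coefficients stay fixed, can be shown with a small sketch. The regulator names, coefficients and concentrations below are hypothetical placeholders, not values from the study.

```python
import math


def predict_on_probability(coefficients, concentrations):
    """coefficients: {'intercept': b0, regulator_name: b, ...};
    concentrations: {regulator_name: level} for one nucleus."""
    eta = coefficients["intercept"] + sum(
        b * concentrations[name]
        for name, b in coefficients.items()
        if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-eta))


def knock_out(nuclei, regulator):
    """Return copies of the per-nucleus inputs with one regulator set to 0."""
    return [{**nucleus, regulator: 0.0} for nucleus in nuclei]


# Hypothetical fitted coefficients; they are reused unchanged (no retraining).
coefficients = {"intercept": -1.0, "activator": 4.0, "repressor": -6.0}
wild_type = [{"activator": 0.5, "repressor": 0.5}]
mutant = knock_out(wild_type, "repressor")

p_wild_type = [predict_on_probability(coefficients, n) for n in wild_type]
p_mutant = [predict_on_probability(coefficients, n) for n in mutant]
```

In this made-up case the nucleus is predicted OFF in the wild type and ON once the repressor is removed. Indirect effects would be handled the same way, by also editing the concentrations of downstream regulators before predicting.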

Visual display of model outputs

Model predictions of wild-type and mutant eve expression are displayed graphically for each nucleus in the embryo. The Virtual Embryo contains three-dimensional coordinates for each nucleus, making it possible to show the predictions in their spatial context. In most figures, embryos are shown from two perspectives: lateral and three-dimensional. In the lateral perspective, each nucleus is plotted using the (x,z) coordinate, ignoring the y coordinate. The x- and z-axes are aligned to the anteroposterior (left to right) and dorsoventral (top to bottom) axes respectively, so showing a view from the left side of the embryo. Since predictions for the left and right sides are similar, all nuclei (i.e., both left and right) are plotted in one composite view from the left side of the embryo. The three-dimensional perspective is plotted using the cloud function from the lattice package in R, similarly from an anterior perspective. Nuclei are colored according to the model’s prediction, from p=0 (light) to p=1 (dark). The color scale for predictions within stripes is grey-scale and predictions outside of stripes are shown on a red scale, with peach for values below 0.15.

Calculating the accuracy of model outputs

To accompany the visual display of wild-type predictions, we also calculated percentage accuracies to aid comparison between alternative models. These values provide good indications of model performance in predicting the stripe boundaries. For each model, we calculated the proportion of nuclei predicted as being ON (p>0.5) within the stripe(s) under consideration (i.e., true positives), in nuclei immediately adjacent to the stripe nuclei, and two nuclei away (i.e., false positives). The identities of neighboring nuclei are provided by the Virtual Embryo dataset.
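
One way to compute these three proportions is sketched below. It assumes that predictions can be indexed by nucleus identifier and that neighbor information is available as a mapping from each nucleus to its adjacent nuclei; the function names and the six-nucleus example are hypothetical.

```python
def fraction_on(predictions, nucleus_ids, cutoff=0.5):
    """Proportion of the given nuclei predicted ON (probability > cutoff)."""
    ids = list(nucleus_ids)
    if not ids:
        return float("nan")
    return sum(1 for i in ids if predictions[i] > cutoff) / len(ids)


def neighbour_ring(seed_ids, neighbours, exclude):
    """Nuclei adjacent to any seed nucleus, minus those in 'exclude'."""
    ring = set()
    for i in seed_ids:
        ring.update(neighbours[i])
    return ring - set(exclude)


def stripe_accuracy(predictions, stripe_ids, neighbours):
    """Fractions predicted ON in the stripe, one step out, and two steps out."""
    stripe = set(stripe_ids)
    ring1 = neighbour_ring(stripe, neighbours, exclude=stripe)
    ring2 = neighbour_ring(ring1, neighbours, exclude=stripe | ring1)
    return {
        "in_stripe": fraction_on(predictions, stripe),  # true positives
        "adjacent": fraction_on(predictions, ring1),    # false positives
        "two_away": fraction_on(predictions, ring2),    # false positives
    }


# Toy example: six nuclei in a row, with the 'stripe' at positions 2 and 3.
toy_neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
toy_predictions = [0.1, 0.6, 0.9, 0.8, 0.2, 0.05]
toy_accuracy = stripe_accuracy(toy_predictions, [2, 3], toy_neighbours)
```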

Regulator discovery

For eve 2, we trained all possible linear models using four out of 38 regulators in the dataset (total 73,815 models), using the log likelihood of each fitted model as its score. A similar approach was used for exploring quadratic models, except that any model containing Bcd and/or Hb also included the corresponding quadratic term(s). The results are summarized as heat maps as shown in Figure 3.
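
The size of this search follows from choosing 4 regulators out of 38, which the sketch below confirms. The ranking helper is a hypothetical outline in which score stands in for fitting a model and returning its log likelihood; it is not the code used in the study.

```python
from itertools import combinations
from math import comb

N_REGULATORS = 38
MODEL_SIZE = 4

# Number of distinct four-regulator models.
n_models = comb(N_REGULATORS, MODEL_SIZE)


def rank_models(regulators, score, size=MODEL_SIZE, top=10):
    """Score every regulator subset of the given size, best first.

    'score' is a caller-supplied function, e.g. one that fits a logistic
    model on the subset and returns its log likelihood.
    """
    scored = [(score(subset), subset) for subset in combinations(regulators, size)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top]
```

For example, calling rank_models with the 38 regulator names and a log-likelihood function would return the ten highest-scoring combinations out of the 73,815 candidates.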

Software

Analysis was performed with R version 2.15.1 (R Core Team, 2012), using colors from the ColorBrewer palettes in the RColorBrewer package. Plots made use of the lattice, ggplot2 and RBGL packages. The graph package was used to select neighboring nuclei.

Data availability

The following previously published data sets were used

Fowlkes CC, Hendriks CL, Keränen SV, Weber GH, Rübel O, Huang MY, et al. (2008) A quantitative spatiotemporal atlas of gene expression in the Drosophila blastoderm. Publicly available at http://bdtnp.lbl.gov/.

http://bdtnp.lbl.gov/Fly-Net/bidatlas.jsp

References

(2002) Anterior repression of a Drosophila stripe enhancer requires three position-specific mechanisms. Development 129:4931–4940.
(2006) Diversity of LEF/TCF action in development and disease. Oncogene 25:7492–7504. https://doi.org/10.1038/sj.onc.1210056
Arnosti DN (2003) Analysis and function of transcriptional regulatory elements: insights from Drosophila. Annu Rev Entomol 48:579–602. https://doi.org/10.1146/annurev.ento.48.091801.112749
Arnosti DN, Barolo S, Levine M, Small S (1996) The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development 122:205–214.
Ashe HL, Briscoe J (2006) The interpretation of morphogen gradients. Development 133:385–394. https://doi.org/10.1242/dev.02238
(1996) Neither the homeodomain nor the activation domain of Bicoid is specifically required for its down-regulation by the Torso receptor tyrosine kinase cascade. Development 122:3499–3508.
Berman BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M, et al. (2002) Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl Acad Sci USA 99:757–762. https://doi.org/10.1073/pnas.231608898
(2011) Whole-embryo modeling of early segmentation in Drosophila identifies robust and fragile expression domains. Biophys J 101:287–296. https://doi.org/10.1016/j.bpj.2011.05.060
Borok MJ, Tran DA, Ho MC, Drewell RA (2010) Dissecting the regulatory switches of development: lessons from enhancer evolution in Drosophila. Development 137:5–13. https://doi.org/10.1242/dev.036160
Bulger M, Groudine M (2011) Functional and mechanistic diversity of distal transcription enhancers. Cell 144:327–339. https://doi.org/10.1016/j.cell.2011.01.024
Clyde DE, Corado MS, Wu X, Pare A, Papatsenko D, Small S (2003) A self-organizing system of repressor gradients establishes segmental complexity in Drosophila. Nature 426:849–853. https://doi.org/10.1038/nature02189
Crick F (1970) Diffusion in embryogenesis. Nature 225:420–422. https://doi.org/10.1038/225420a0
Davidson EH (2010) Emerging properties of animal gene regulatory networks. Nature 468:911–920. https://doi.org/10.1038/nature09645
Dean A (2011) In the loop: long range chromatin interactions and gene regulation. Brief Funct Genomics 10:3–10. https://doi.org/10.1093/bfgp/elq033
The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74. https://doi.org/10.1038/nature11247
Fowlkes CC, Hendriks CL, Keränen SV, Weber GH, Rübel O, Huang MY, et al. (2008) A quantitative spatiotemporal atlas of gene expression in the Drosophila blastoderm. Cell 133:364–374. https://doi.org/10.1016/j.cell.2008.01.053
Frasch M, Levine M (1987) Complementary patterns of even-skipped and fushi tarazu expression involve their differential regulation by a common set of segmentation genes in Drosophila. Genes Dev 1:981–995. https://doi.org/10.1101/gad.1.9.981
(1999) Analysis of an even-skipped rescue transgene reveals both composite and discrete neuronal and early blastoderm enhancers, and multi-stripe positioning by gap gene repressor gradients. Development 126:2527–2538.
Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, et al. (2010) Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330:1775–1787. https://doi.org/10.1126/science.1196914
(1989) Early and late periodic patterns of even skipped expression are controlled by distinct regulatory elements that respond to different spatial cues. Cell 57:413–422. https://doi.org/10.1016/0092-8674(89)90916-1
Harding K, Hoey T, Warrior R, Levine M (1989) Autoregulatory and gap gene response elements of the even-skipped promoter of Drosophila. EMBO J 8:1205–1212.
Hare EE, Peterson BK, Iyer VN, Meier R, Eisen MB (2008) Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLOS Genet 4:e1000106. https://doi.org/10.1371/journal.pgen.1000106
(2001) Computational studies of gene regulatory networks: in numero molecular biology. Nat Rev Genet 2:268–279. https://doi.org/10.1038/35066056
He X, Samee MA, Blatti C, Sinha S (2010) Thermodynamics-based models of transcriptional regulation by enhancers: the roles of synergistic activation, cooperative binding and short-range repression. PLOS Comput Biol 6:e1000935. https://doi.org/10.1371/journal.pcbi.1000935
(1994) Differential regulation of target genes by different alleles of the segmentation gene hunchback in Drosophila. Genetics 138:125–134.
(1990) A morphogenetic gradient of hunchback protein organizes the expression of the gap genes kruppel and knirps in the early Drosophila embryo. Nature 346:577–580. https://doi.org/10.1038/346577a0
Hülskamp M, Schroder C, Pfeifle C, Jackle H, Tautz D (1989) Posterior segmentation of the Drosophila embryo in the absence of a maternal posterior organizer gene. Nature 338:629–632. https://doi.org/10.1038/338629a0
(2007) Known maternal gradients are not sufficient for the establishment of gap domains in Drosophila melanogaster. Mech Dev 124:108–128. https://doi.org/10.1016/j.mod.2006.11.001
Jaeger J, Surkova S, Blagov M, Janssens H, Kosman D, Kozlov KN, et al. (2004) Dynamic control of positional information in the early Drosophila embryo. Nature 430:368–371. https://doi.org/10.1038/nature02678
Janody F, Sturny R, Catala F, Desplan C, Dostatni N (2000) Phosphorylation of bicoid on MAP-kinase sites: contribution to its interaction with the torso pathway. Development 127:279–289.
Janody F, Sturny R, Schaeffer V, Azou Y, Dostatni N (2001) Two distinct domains of Bicoid mediate its transcriptional downregulation by the Torso pathway. Development 128:2281–2290.
Janssens H, Hou S, Jaeger J, Kim AR, Myasnikova E, Sharp D, et al. (2006) Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet 38:1159–1165. https://doi.org/10.1038/ng1886
Junion G, Spivakov M, Girardot C, Braun M, Gustafson EH, Birney E, et al. (2012) A transcription factor collective defines cardiac cell fate and reflects lineage history. Cell 148:473–486. https://doi.org/10.1016/j.cell.2012.01.030
(2011) Genomic analysis at the single-cell level. Annu Rev Genet 45:431–445. https://doi.org/10.1146/annurev-genet-102209-163607
et al. (2010) Quantitative analysis of the Drosophila segmentation regulatory network using pattern generating potentials. PLOS Biol 8:e1000456. https://doi.org/10.1371/journal.pbio.1000456
(2003) Dual regulation of ets-activated gene expression by SP1. Gene 307:87–97. https://doi.org/10.1016/S0378-1119(03)00445-1
et al. (2006) Three-dimensional morphology and gene expression in the Drosophila blastoderm at cellular resolution II: dynamics. Genome Biol 7:R124. https://doi.org/10.1186/gb-2006-7-12-r124
Kerszberg M, Wolpert L (2007) Specifying positional information in the embryo: looking beyond morphogens. Cell 130:205–209. https://doi.org/10.1016/j.cell.2007.06.038
Kim AR, Martinez C, Ionides J, Ramos AF, Ludwig MZ, Ogawa N, et al. (2013) Rearrangements of 2.5 kilobases of noncoding DNA from the Drosophila even-skipped locus define predictive rules of genomic cis-regulatory logic. PLOS Genet 9:e1003243. https://doi.org/10.1371/journal.pgen.1003243
Lemon B, Tjian R (2000) Orchestrated response: a symphony of transcription factors for gene control. Genes Dev 14:2551–2569. https://doi.org/10.1101/gad.831000
(2012) Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nat Rev Genet 13:233–245. https://doi.org/10.1038/nrg3163
Levine M (2008) A systems view of Drosophila segmentation. Genome Biol 9:207. https://doi.org/10.1186/gb-2008-9-2-207
Levine M (2010) Transcriptional enhancers in animal development and evolution. Curr Biol 20:R754–R763. https://doi.org/10.1016/j.cub.2010.06.070
Lewis J (2008) From signals to patterns: space, time, and mathematics in developmental biology. Science 322:399–403. https://doi.org/10.1126/science.1166154
Logan CY, Nusse R (2004) The Wnt signaling pathway in development and disease. Annu Rev Cell Dev Biol 20:781–810. https://doi.org/10.1146/annurev.cellbio.20.010403.113126
et al. (2006) Three-dimensional morphology and gene expression in the Drosophila blastoderm at cellular resolution I: data acquisition pipeline. Genome Biol 7:R123. https://doi.org/10.1186/gb-2006-7-12-r123
(1995) Posterior stripe expression of hunchback is driven from two promoters by a common enhancer element. Development 121:3067–3077.
The modENCODE Consortium (2010) Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330:1787–1797. https://doi.org/10.1126/science.1198374
Morán E, Jiménez G (2006) The tailless nuclear receptor acts as a dedicated repressor in the early Drosophila embryo. Mol Cell Biol 26:3446–3454. https://doi.org/10.1128/MCB.26.9.3446-3454.2006
Négre N, Brown CD, Ma L, Bristow CA, Miller SW, Wagner U, et al. (2011) A cis-regulatory map of the Drosophila genome. Nature 471:527–531. https://doi.org/10.1038/nature09990
Nüsslein-Volhard C, Wieschaus E (1980) Mutations affecting segment number and polarity in Drosophila. Nature 287:795–801. https://doi.org/10.1038/287795a0
(2009) Quantitative approaches in developmental biology. Nat Rev Genet 10:517–530. https://doi.org/10.1038/nrg2548
Ohler U, Wassarman DA (2010) Promoting developmental transcription. Development 137:15–26. https://doi.org/10.1242/dev.035493
Ong CT, Corces VG (2011) Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat Rev Genet 12:283–293. https://doi.org/10.1038/nrg2957
Papatsenko D (2009) Stripe formation in the early fly embryo: principles, models, and networks. BioEssays 31:1172–1180. https://doi.org/10.1002/bies.200900096
Papatsenko D, Levine M (2011) The Drosophila gap gene network is composed of two parallel toggle switches. PLOS ONE 6:e21145. https://doi.org/10.1371/journal.pone.0021145
Papatsenko D, Levine MS (2008) Dual regulation by the Hunchback gradient in the Drosophila embryo. Proc Natl Acad Sci USA 105:2901–2906. https://doi.org/10.1073/pnas.0711941105
R Core Team (2012) R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. http://www.r-project.org/
(2006) Quantitative models of developmental pattern formation. Dev Cell 11:289–300. https://doi.org/10.1016/j.devcel.2006.08.006
Ripley BD (2008) Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
(2011) How to make stripes: deciphering the transition from non-periodic to periodic patterns in Drosophila segmentation. Development 138:3067–3078. https://doi.org/10.1242/dev.062141
Schroeder MD, Pearce M, Fak J, Fan H, Unnerstall U, Emberly E, et al. (2004) Transcriptional control in the segmentation gene network of Drosophila. PLOS Biol 2:E271. https://doi.org/10.1371/journal.pbio.0020271
Schulz C, Tautz D (1994) Autonomous concentration-dependent activation and repression of kruppel by hunchback in the Drosophila embryo. Development 120:3043–3049.
(2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451:535–540. https://doi.org/10.1038/nature06496
Segal E, Widom J (2009) From DNA sequence to transcriptional behaviour: a quantitative approach. Nat Rev Genet 10:443–456. https://doi.org/10.1038/nrg2591
(1994) Synergy between the hunchback and bicoid morphogens is required for anterior patterning in Drosophila. Cell 78:855–865. https://doi.org/10.1016/S0092-8674(94)90622-X
Singh N, Zhu W, Hanes SD (2005) Sap18 is required for the maternal gene bicoid to direct anterior patterning in Drosophila melanogaster. Dev Biol 278:242–254. https://doi.org/10.1016/j.ydbio.2004.11.011
Small S, Blair A, Levine M (1992) Regulation of even-skipped stripe 2 in the Drosophila embryo. EMBO J 11:4047–4057.
Small S, Blair A, Levine M (1996) Regulation of two pair-rule stripes by a single enhancer in the Drosophila embryo. Dev Biol 175:314–324. https://doi.org/10.1006/dbio.1996.0117
Small S, Kraut R, Hoey T, Warrior R, Levine M (1991) Transcriptional regulation of a pair-rule stripe in Drosophila. Genes Dev 5:827–839. https://doi.org/10.1101/gad.5.5.827
Spitz F, Furlong EE (2012) Transcription factors: from enhancer binding to developmental control. Nat Rev Genet 13:613–626. https://doi.org/10.1038/nrg3207
(1991) Regulation of a segmentation stripe by overlapping activators and repressors in the Drosophila embryo. Science 254:1385–1387. https://doi.org/10.1126/science.1683715
Struffi P, Corado M, Kaplan L, Yu D, Rushlow C, Small S (2011) Combinatorial activation and concentration-dependent repression of the Drosophila even skipped stripe 3+7 enhancer. Development 138:4291–4299. https://doi.org/10.1242/dev.065987
Struhl K (2001) Gene regulation. A paradigm for precision. Science 293:1054–1055. https://doi.org/10.1126/science.1064050
Tomlin CJ, Axelrod JD (2007) Biology by numbers: mathematical modelling in developmental biology. Nat Rev Genet 8:331–340. https://doi.org/10.1038/nrg2098
Varjosalo M, Taipale J (2008) Hedgehog: functions and mechanisms. Genes Dev 22:2454–2472. https://doi.org/10.1101/gad.1693608
(2011) Beyond the balance of activator and repressor. Sci Signal 4:pe29. https://doi.org/10.1126/scisignal.2002183
Wilczynski B, Furlong EE (2010) Challenges for modeling global gene regulatory networks during development: insights from Drosophila. Dev Biol 340:161–169. https://doi.org/10.1016/j.ydbio.2009.10.032
Wolpert L (1969) Positional information and the spatial pattern of cellular differentiation. J Theor Biol 25:1–47. https://doi.org/10.1016/S0022-5193(69)80016-0
Wolpert L (1996) One hundred years of positional information. Trends Genet 12:359–364. https://doi.org/10.1016/S0168-9525(96)80019-9
Wolpert L (2011) Positional information and patterning revisited. J Theor Biol 269:359–365. https://doi.org/10.1016/j.jtbi.2010.10.034
(2012) Dissecting sources of quantitative gene expression pattern divergence between Drosophila species. Mol Syst Biol 8:604. https://doi.org/10.1038/msb.2012.35
Zhao C, York A, Yang F, Forsthoefel DJ, Dave V, Fu D, et al. (2002) The activity of the Drosophila morphogenetic protein bicoid is inhibited by a domain located outside its homeodomain. Development 129:1669–1680.
Zinzen RP, Girardot C, Gagneur J, Braun M, Furlong EE (2009) Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462:65–70. https://doi.org/10.1038/nature08531
Zuo P, Stanojević D, Colgan J, Han K, Levine M, Manley JL (1991) Activation and repression of transcription by the gap proteins hunchback and kruppel in cultured Drosophila cells. Genes Dev 5:254–264. https://doi.org/10.1101/gad.5.2.254

Article and author information

Author details

Garth R Ilsley
1. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
2. Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
Contribution
GRI, Selection and preparation of data, Conception and design, Analysis and interpretation of data, Drafting or revising the article

For correspondence
garth.ilsley@oist.jp

Competing interests
The authors declare that no competing interests exist.
Jasmin Fisher
1. Microsoft Research Cambridge, Cambridge, United Kingdom
2. Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
Contribution
JF, Supervisory role, Discussions, Drafting or revising the article

Competing interests
The authors declare that no competing interests exist.
Rolf Apweiler

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom

Contribution
RA, Supervisory role, Discussions, Drafting or revising the article

Competing interests
The authors declare that no competing interests exist.
Angela H DePace

Department of Systems Biology, Harvard Medical School, Boston, United States

Contribution
AHD, Supervisory role, Discussions, Interpretation of data, Drafting or revising the article

Contributed equally with
Nicholas M Luscombe

Competing interests
The authors declare that no competing interests exist.
Nicholas M Luscombe
1. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
2. Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
3. UCL Genetics Institute, Department of Genetics, Evolution, and Environment, University College London, London, United Kingdom
4. London Research Institute, Cancer Research UK, London, United Kingdom
Contribution
NML, Supervisory role, Discussions, Interpretation of data, Drafting or revising the article

Contributed equally with
Angela H DePace

Competing interests
The authors declare that no competing interests exist.

Funding

EMBL

Garth R Ilsley
Rolf Apweiler
Nicholas M Luscombe

Cancer Research UK

Nicholas M Luscombe

Okinawa Institute of Science and Technology

Garth R Ilsley
Nicholas M Luscombe

National Institutes of Health

Rolf Apweiler
Angela H DePace

Microsoft Research

Jasmin Fisher

Peterhouse, Cambridge

Garth R Ilsley

University College London

Nicholas M Luscombe

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We would like to thank the members of the DePace laboratory, in particular Tara Lydiard-Martin, Max V Staller, Zeba Wunderlich, and Ben Vincent, for insightful discussions throughout the project.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.