Ancient mechanisms for the evolution of the bicoid homeodomain's function in fly development

  1. Qinwen Liu
  2. Pinar Onal
  3. Rhea R Datta
  4. Julia M Rogers
  5. Urs Schmidt-Ott
  6. Martha L Bulyk
  7. Stephen Small  Is a corresponding author
  8. Joseph W Thornton  Is a corresponding author
  1. University of Chicago, United States
  2. New York University, United States
  3. Harvard University, United States
  4. Brigham and Women’s Hospital and Harvard Medical School, United States


The ancient mechanisms that caused developmental gene regulatory networks to diversify among distantly related taxa are not well understood. Here we use ancestral protein reconstruction, biochemical experiments, and developmental assays of transgenic animals carrying reconstructed ancestral genes to investigate how the transcription factor Bicoid (Bcd) evolved its central role in anterior-posterior patterning in flies. We show that most of Bcd’s derived functions are attributable to evolutionary changes within its homeodomain (HD) during a phylogenetic interval >140 million years ago. A single substitution from this period (Q50K) accounts almost entirely for the evolution of Bcd’s derived DNA specificity in vitro. In transgenic embryos expressing the reconstructed ancestral HD, however, Q50K confers activation of only a few of Bcd’s transcriptional targets and yields a very partial rescue of anterior development. Adding a second historical substitution (M54R) confers regulation of additional Bcd targets and further rescues anterior development. These results indicate that two epistatically interacting mutations played a major role in the evolution of Bcd’s controlling regulatory role in early development. They also show how ancestral sequence reconstruction can be combined with in vivo characterization of transgenic animals to illuminate the historical mechanisms of developmental evolution.


Animal body plans are established during development by interacting networks of transcription factors and other molecules, which organize temporal and spatial cascades of gene expression that control cell division, differentiation, and migration (Davidson and Peter, 2015; Gilbert, 2013; Stathopoulos and Levine, 2005). How specific genetic changes during evolution modified these networks and the developmental processes they control is the fundamental question in modern studies of the evolution of development (evo-devo) (Carroll, 2008; Davidson and Erwin, 2006; Levine and Tjian, 2003; Peter and Davidson, 2011; Rebeiz et al., 2015).

Considerable progress has been made toward this goal, particularly among closely related species and populations. Comparative analyses have identified differences in gene expression that correlate with phenotypic differences, and manipulative experiments have shown that altering expression of specific genes can disrupt or mimic specific forms of developmental diversity (Angelini and Kaufman, 2005; Bhullar et al., 2015; Cretekos et al., 2008; Jarvis et al., 2012; Martin et al., 2016; Nakamura et al., 2016; Tarazona et al., 2016; Mazo-Vargas et al., 2017; Zhang et al., 2017; Roeske et al., 2018). Transgenic experiments in which cis-regulatory regions are swapped between species or populations have identified polymorphisms that reproduce diversity in gene expression and developmental outcomes, such as pigmentation patterns in insects (Gompel et al., 2005; Jeong et al., 2008) and mice (Manceau et al., 2011), patterns of cuticular projections on the abdomens of flies (Frankel et al., 2011; McGregor et al., 2007), and the presence/absence of armor, spines, and other decorations among fish populations (Chan et al., 2010; Colosimo et al., 1928; Miller et al., 2007; Shapiro et al., 2004).

For deeper taxonomic divergences, it has been more difficult to establish the specific causes of evolutionary changes in gene regulation and development (Rebeiz et al., 2015). Distantly related taxa cannot be mated to each other, so genetic crosses cannot be used. Comparisons among species have demonstrated that candidate genes in key regulatory pathways differ in expression, and experimental manipulations have provided evidence that changes in expression of transcription factors in these pathways are involved in phenotypic differences (Martin et al., 2016; Nakamura et al., 2016; Gehrke and Shubin, 2016; Leal and Cohn, 2016; Petit et al., 2017). But the specific mutations and causal mechanisms that caused loss, gain, or modification of ancestral modes of regulation and developmental outcomes have not been identified, for several reasons. Distantly related genes (or regulatory elements) typically differ at many sequence sites – not only those that caused functional change but also those that diversified during subsequent periods of functional conservation and sequence drift – complicating efforts to identify the effect of specific substitutions (Rebeiz et al., 2015; Hochberg and Thornton, 2017). Epistasis among sites within loci represents a further challenge, because mutations introduced into present-day sequences often have effects different from those they had during historical evolution (Bridgham et al., 2009; Natarajan et al., 2013; Ortlund et al., 2007). In such cases, sequence differences that were causal in the past can be masked in horizontal swap experiments, leading to false negative inferences concerning the mutation’s effects; alternatively, if a sequence state from one lineage is swapped into a present-day gene with which it is epistatically incompatible, that state may appear to have been the evolutionary cause of a null phenotype, a particular problem if the diversity being studied involves the absence of a structure or function (Hochberg and Thornton, 2017). Epistasis between loci presents another difficulty: when a putatively causal factor, such as a regulatory protein or DNA element, is swapped horizontally from one species into a model organism to test its causal role and recapitulates the phenotype from the ‘source’ species, this provides strong evidence for causality, but a negative result could arise because of an epistatic ‘mismatch’ between the swapped factor and other loci. The more distantly diverged the two species, the more significant this problem becomes (Gehrke and Shubin, 2016).

Ancestral sequence reconstruction (ASR) can, in principle, help to address these difficulties. ASR uses statistical phylogenetic methods to infer the sequences of ancient genes or proteins, which can then be synthesized and experimentally characterized (Hochberg and Thornton, 2017; Harms and Thornton, 2010). This approach allows the direction of evolutionary change to be determined and localized to a specific interval of phylogenetic history. The relatively small number of candidate mutations that occurred during this interval can then be introduced into the appropriate ancestral protein to directly test the hypothesis that they caused the evolutionary shift in biochemical properties and/or biological function. By using the ancestral background, this approach also minimizes the confounding effect of intragenic epistasis. ASR can also help to reduce the complications of intragenic epistasis that may affect horizontal transgenesis: the mismatch between the reconstructed ancestral protein and the genetic background of the model organism in which it is tested is less severe, because divergence between the source and the model occurred along only one branch from the ancestor, instead of two. Moreover, testing ancestral factors from successive phylogenetic nodes, which may differ by only a few sequence sites, drastically limits the scope of the inference that might be compromised by epistasis.

ASR is now widely used in evolutionary molecular biology and biochemistry (Bridgham et al., 2009; Natarajan et al., 2013; Ortlund et al., 2007; Anderson et al., 2016; Bridgham et al., 2006; Finnigan et al., 2012; Hart et al., 2014; Howard et al., 2014; McKeown et al., 2014; Nguyen et al., 2016) and has been used to study the evolution of transcriptional networks in microbes (Sorrells et al., 2015; Baker et al., 2012; Baker et al., 2013). Recently, transgenic animals carrying reconstructed ancestral genes have been engineered to test the effects of historical genetic changes on physiology and fitness (Siddiq et al., 2017a), but this approach has not yet been applied to the study of development.

Here we use ASR to study the mechanisms for the historical evolution of a controlling molecular determinant of the anterior-posterior (AP) axis during embryonic development in flies (Dearden and Akam, 1999; Patel, 1994). In Drosophila melanogaster and other cyclorrhaphan flies – a large taxonomic group that appeared >140 Mya (Grimaldi and Engel, 2005) AP patterning depends on an anterior gradient of Bicoid (Bcd), a transcription factor translated in the embryo from maternally deposited, anteriorly localized mRNA (Driever and Nüsslein-Volhard, 1988a; Driever and Nüsslein-Volhard, 1988b; Berleth et al., 1988; Little and Wieschaus, 2011). Embryos lacking Bcd fail to produce any head or thoracic structures, and duplicate posterior structures are formed in the embryo’s anterior ([Frohnhöfer and Nüsslein-Volhard, 1986]; see also Figure 1—figure supplement 1 panels K, L). Bcd directly regulates transcription of dozens of target genes (McGregor et al., 2007; Chen et al., 2012; Driever et al., 1989; Struhl et al., 1989) by using its homeodomain (HD) to bind specific cis-regulatory DNA sequences, particularly variants of the canonical motif TAATCC, with flanking sequences providing some additional specificity (Driever et al., 1989; Struhl et al., 1989; Noyes et al., 2008; Treisman et al., 1989; Dave et al., 2000). Bcd also binds the 3' untranslated region of the messenger RNA of Caudal (Cad), a transcription factor involved in development of the posterior end, and represses its translation (Chan and Struhl, 1997; Rivera-Pomar et al., 1996). In some insect taxa outside the Cyclorrhapha – such as beetles, wasps, and midges – the gene regulatory network that carries out AP development is similar to that in D. melanogaster, but its anterior deployment is controlled not by Bcd but by other upstream transcription factors (Brent et al., 2007; Klomp et al., 2015; Lynch et al., 2006).

Bcd is unique to cyclorrhaphan flies. In winged insects, the Hox3 gene is called Zen; a duplication of this gene deep in the cyclorrhaphan lineage yielded two paralogs, Bcd and Zen (Stauber et al., 1999; Stauber et al., 2002). Bcd’s developmental role, anterior localization, and DNA specificity all appear to be derived evolutionary features, whereas many of Zen’s appear to be ancestral (Schmidt-Ott et al., 2010). Specifically, neither Hox3 nor Zen is anteriorly localized in the embryo or involved in establishing AP polarity, and both seem to play conserved roles in extraembryonic tissue formation (Rafiqi et al., 2008; Rafiqi et al., 2010; van der Zee et al., 2005). Both Hox3 and Zen strongly prefer the consensus DNA motif TAATTA over Bcd’s TAATCC consensus motif (Noyes et al., 2008; Berger et al., 2008), and neither regulates Cad translation (Schoppmeier et al., 2009). Evolution of Bcd’s developmental role in cyclorrhaphan flies therefore required, at minimum, two major changes after its emergence by gene duplication: (1) Bcd must have evolved the capacity to regulate expression of TFs that specify anterior and posterior structures and (2) maternal localization of Bcd in the anterior of the egg must also have evolved. Changes in Bcd’s target loci and in proteins with which Bcd interacts may also have contributed to the evolution of Bcd function. Here, we focus on the mechanisms that drove the first category of evolutionary changes: the acquisition by the Bcd protein of the capacity to regulate its target genes.

Numerous studies have addressed the sequence-structure function relations of Bcd and other HD proteins in D. melanogaster, but little is known about the mechanisms that drove the evolution of Bcd and its developmental role. Biochemical analyses in vitro suggest the hypothesis that acquisition of a lysine residue at position 50 (K50), which is located on the recognition helix (RH) that directly contacts the DNA major groove, may have played a role. K50 is present in several HD proteins that bind canonical target sites similar to Bcd’s, but HDs that prefer Zen-like motifs contain a glutamine at this position (Q50). Biochemical studies have shown that swapping these two states between extant HDs affects relative affinity for these target sequences and suggest potential structural mechanisms for this effect (Noyes et al., 2008; Treisman et al., 1989; Ades and Sauer, 1994; Hanes and Brent, 1989; Niessing et al., 2000; Percival-Smith et al., 1990; Schier and Gehring, 1993; Baird-Titus et al., 2006; Otting et al., 1990). Other residues in the protein’s recognition helix, including an arginine at site 54 (R54) have also been shown to affect DNA and RNA recognition in vitro (Dave et al., 2000; Niessing et al., 2000; Hanes and Brent, 1991). There has been no direct work, however, to assess what portion of the shift in DNA specificity that occurred during Bcd evolution is directly attributable to these or any other historical sequence substitutions.

In vivo, there is no evidence to indicate the extent to which Q50K or any other substitution can account for the evolution of Bcd’s regulatory or developmental functions in the embryo. The only prior manipulative study to examine the in vivo effects of Q50K on Bcd function introduced this mutation into D. melanogaster Fushi tarazu (Ftz), a distantly related HD protein, and found that Q50K does not confer activation of any Bcd targets in fly embryos (Schier and Gehring, 1993) or cultured cells (Zhao et al., 2000). This observation could indicate that Q50K made no contribution to the evolution of Bcd’s developmental functions; alternatively, it may have contributed to Bcd evolution, but only in combination with other Bcd-specific residues not introduced in this experiment; yet another possibility is that other characteristics of the present-day Ftz protein, not present in either Bcd or the Bcd-Zen ancestor, are incompatible with Bcd-like functions in the D. melanogaster embryo and therefore masked the contribution of Q50K in this experiment. A recent study found that replacing the HD of D. melanogaster Bcd with that of orthodenticle (otd), which also contains K50 and many other sequence differences, abolishes activation of most of Bcd’s targets and does not rescue anterior development in transgenic embryos, suggesting that this residue is not sufficient to account for most of the regulatory functions of the Bcd HD (Datta et al., 2018). No studies have addressed the in vivo developmental effect of historically relevant substitutions at other sites on Bcd’s functions.

To directly and explicitly test hypotheses about the effects of historical sequence changes on evolution of Bcd and its HD, we reconstructed ancestral forms of Bcd and used both in vitro and in vivo experiments to dissect the genetic and evolutionary trajectory of Bcd’s functional evolution. This strategy allowed us to localize the change in Bcd function to a specific phylogenetic branch, quantify the biochemical and developmental aspects of that change, and characterize the extent to which historical amino acid substitutions within the HD, including but not limited to Q50K, were necessary and sufficient to account for the evolution of Bcd-specific functions.


Developmental roles of Zen and Bcd homeodomains

Only the HD region of Bcd, Zen, and Hox3 retains sufficient phylogenetic signal to enable high-confidence ancestral sequence reconstruction. We therefore sought first to determine the extent to which the developmental functions of Bcd and Zen in extant D.melanogaster are attributable to differences between their HDs.

We established an in vivo gene replacement assay using bcd-null (bcdE1) D. melanogaster females and characterized both morphological and gene-regulatory phenotypes (Figure 1; Figure 1—figure supplement 1). As expected, bcd-null females produce embryos that lack all cephalic and thoracic structures; they have only one to three of the anterior-most abdominal segments and develop ectopic anterior filzkörper -- posterior structures that are normally restricted to the eighth abdominal segment (Frohnhöfer and Nüsslein-Volhard, 1986) (Figure 1—figure supplement 1K,L). These embryos fail to activate transcription of known Bcd target genes, including hb, kni, gt, otd, ems, btd, and the first two stripes of eve; translational repression of cad in the anterior is also lost (Figure 1—figure supplement 1M–T). We introduced into bcd-null embryos constructs coding for full-length or chimeric versions of D. melanogaster Bcd or Zen under the control of the bcd promoter (BP) and the bcd 3’UTR (Figure 1; See Materials and methods). RNAs from all transgenes were anteriorly localized in a fashion indistinguishable from the endogenous bcd mRNA pattern (Figure 1—figure supplement 1EE-GG). We assessed morphological rescue by the following criteria: repression of ectopic anterior filzkörper, development of all eight abdominal segments, development of all three thoracic segments, and development of internal skeletal elements in the cephalic region. Target gene regulation was considered partially rescued if expression in the anterior could be detected by in situ hybridization, even if the patterns were reduced or spatially shifted.

Figure 1 with 1 supplement see all
Role of the homeodomain in functional divergence of D. melanogaster Zen and Bcd.

Offspring of transgenic bcd-deficient D. melanogaster containing two copies of various replacement transgenes are shown in cuticle preparations of larval head (A, K, U) and whole body (B, L, V); in situ hybridization experiments and antibody stains in transgenic embryos show expression of Bcd’s translational (C, M, W) and transcriptional target genes (D–J, N–T, X–DD). Replacement transgenes were Bcd from D. melanogaster (DmBcd, A–J), Zen from D. melanogaster (DmZen, K–T), and a chimera of DmZen in which the homeodomain (HD) has been replaced by the HD of DmBcd (DmZen-BcdHD, U–DD). All three transgenes contain the D. melanogaster bcd promoter (BP), which directs maternal expression, and the bcd 3’UTR sequence required for anterior localization of the transgenic RNA. In the icons for each construct, protein-coding sequences from DmBcd are light blue and DmZen are red. Triangles show AttB sites used for recombination-mediated cassette exchange. For cuticle preps, mouthhooks (MH), dorsal arm (DA), ventral arm (VA), cephalic structures (Ceph), thoracic segments (T1–T3), abdominal segments (A1–A8), and Filzkörper (Fk) are indicated when present. An anti-Cad antibody was used to detect Cad protein. Antisense probes used in in situ hybridization experiments are indicated (hb: hunchback, kni: knirps, gt: giant, otd: orthodenticle, ems: empty-spiracle, btd: buttonhead, eve: even-skipped mRNAs). For comparison to wild-type, bcd-deficient, and a reciprocal construct of Bcd containing the ZenHD, see Figure 1—figure supplement 1.

As expected, the wild-type D. melanogaster Bcd transgene (DmBcd) rescued the bcdE1 phenotype by all molecular and morphological criteria, and over 95% of embryos survived to adulthood (Figure 1A–J). In contrast, Drosophila Zen (DmZen) failed to rescue any aspect of the bcd-null phenotype (Figure 1K–T compared to Figure 1—figure supplement 1K–T). The Zen protein therefore shares few if any regulatory activities with Bcd, even when it is expressed in Bcd-like fashion.

When DmZen was modified by replacing only its HD with that from D.melanogaster Bcd (DmZen-BcdHD, Figure 1U–DD), we observed substantial rescue of morphological and regulatory phenotypes. All thoracic and abdominal segments were formed, ectopic filzkörper structures were suppressed, and some head structures could be observed (Figure 1U,V). All seven of Bcd’s transcriptional targets were activated, although their expression domains were shifted to the anterior (Figure 1X–DD), and Cad translation in the most anterior regions was partially repressed (Figure 1W). In contrast, a reciprocal construct of the Bcd coding sequence containing the Zen HD (DmBcd-ZenHD) conferred no rescue of any developmental or gene regulatory phenotypes (Figure 1Figure 1—figure supplement 1U–DD). Sequence differences within the HD are therefore sufficient to explain a major portion of the Bcd protein’s anterior patterning and gene regulatory activities in present-day D. melanogaster embryos, with much smaller contributions from differences elsewhere in the protein.

Reconstruction of ancestral homeodomains

To understand when and how the Bcd HD's functions evolved, we reconstructed the phylogenetic history of the Bcd and Zen HDs and inferred ancestral sequences. The protein family phylogeny was inferred from an alignment of 33 Bcd, Zen, and Hox3 HD sequences from cyclorrhaphan flies and other insects (Figure 2; Supplementary file 1). Branch lengths and model parameters were estimated by maximum likelihood; topological relationships were inferred by maximum likelihood (Figure 2—figure supplement 1) and by constraining the topology to reflect relationships among species established in prior studies (Figure 2). As expected, both phylogenies indicate that duplication of Hox3/Zen before the radiation of the Cyclorrhapha produced two new paralogous genes – one lineage leading to cyclorrhaphan Zen and the other to Bcd. The phylogenies’ branch lengths suggest that a burst of rapid evolution occurred in the Bcd HD after this duplication, whereas Zen’s HD continued to evolve at a slower rate (Figure 2; Figure 2—figure supplement 1).

Figure 2 with 1 supplement see all
Reconstruction of ancestral HD sequences.

(A) Phylogeny of Bcd, Zen, and Hox3 HD amino acid sequences from insects and other arthropods. One representative sequence is shown among sequence entries with no sequence diversity. Reconstructed ancestral nodes are labeled; the mean posterior probability across sites for each maximum likelihood ancestral sequence is shown. Branch support labels show the approximate likelihood ratio statistic. Scale bar, 0.7 amino acid changes per site. The tree was constrained to reproduce well-corroborated species relationships; for the maximum likelihood phylogeny, see Figure 2—figure supplement 1. For species names and sequences, see Supplementary file 1. (B) Amino acid sequences of AncZB, AncBcd and Bcd from D. melanogaster (Dm) HDs. For AncBcd and DmBcd, only residues that differ from the AncZB HD are shown. Of these, 11 substitutions (shown in blue) are diagnostic differences conserved among all sequences in the Bcd vs. Zen clades. Differences between AncBcd and DmBcd are labeled red. For site-specific posterior probabilities of ancestral states, see Supplementary file 2.

Using this alignment and the species-constrained phylogeny, we inferred the maximum likelihood (ML) amino acid sequence of the HD at two key ancestral nodes: (1) AncZB, the common ancestor of all Zen and Bcd proteins in cyclorrhaphan flies, which represents the single ancestral sequence that existed just before the duplication and (2) AncBcd, one of AncZB’s daughter nodes, which represents the Bcd protein in the last common ancestor of all cyclorrhaphan flies (Figure 2A). Both sequences are reconstructed with little statistical ambiguity (mean posterior probability = 0.970 and 0.986, with only 3 and 2 sites ambiguously reconstructed in AncZB and AncBcd, respectively (Supplementary file 2). AncZB and AncBcd HD differ at 31 amino acid positions (Figure 2B).

Divergence of DNA-binding preferences between AncBcd and AncZB

We first characterized the evolutionary changes in DNA specificity that occurred in the Bcd gene after its birth by duplication of AncZB. We synthesized coding sequences for AncZB and AncBcd HDs and characterized each protein’s global DNA specificity using protein-binding microarrays (PBMs [Berger et al., 2006]) consisting of all possible DNA 10-mers (Figure 3). AncBcd’s overall occupancy of DNA motifs was very similar to that of D. melanogaster Bcd (r = 0.94, Figure 3A). In contrast, the correlation between the sequences bound by AncBcd and those bound by AncZB was far weaker (r = 0.56, Figure 3B), with thousands of target sequences changing in relative occupancy between AncZB and AncBcd. Bcd’s present-day global specificity is therefore a derived state that evolved on the phylogenetic branch after duplication of AncZB but before the last common ancestor of cyclorrhaphan flies.

Figure 3 with 2 supplements see all
In vitro and in vivo functions of AncBcd and AncZB HDs.

(A, B) Protein binding microarray (PBM) analysis of the DNA specificity of AncBcd, DmBcd, and AncZB HDs. Each dot represents one DNA 6-mer, plotted according to its binding enrichment (E-scores) in PBMs performed with two different HDs; Pearson’s correlation coefficient for each comparison is shown. (C–E) Energy logos showing the site-specific sequence determinants of DNA binding by AncZB, AncBcd, and DmBcd HDs as determined from PBM data. The size of each letter is proportional to the absolute value of that base’s main effect on the energy of binding relative to the mean of all bases at that position. The y-axis is unitless after division by RT. (F) Affinity of ancestral HDs for consensus Zen and Bcd DNA motifs (ZM and BM, sequences shown) in a quantitative fluorescence electrophoretic mobility shift assay. Several noncanonical Bcd targets previously shown to bind Drosophila Bcd were also tested. Column heights and error bars show the mean and SEM for three replicates. (G–Z) Cuticle phenotypes and gene expression patterns for offspring produced by bcdE1 females carrying two copies of a rescue construct expressing Dm Bcd protein (light blue) in which the HD has been replaced by the reconstructed AncBcd HD (dark blue, panels G-P) or the reconstructed AncZB HD (purple, panels Q-Z). Individual panels and constructs are labeled as in Figure 1. For replicates and alternative ancestral reconstructions, see Figure 3—figure supplement 1, Figure 4—figure supplement 1 and Supplementary files 3 and 4. For SPR analysis, see Figure 3—figure supplement 2 and Supplementary file 4.

To characterize the evolutionary shift in preference for DNA motifs, we computed from the PBM data the additive effect of each possible nucleotide at each sequence position on the estimated binding energy of the protein-DNA complex (Figure 3C,D,E). AncZB, AncBcd, and DmBcd all share the core consensus motif TAAT (positions 1–4), but with some degeneracy. The major differences are at positions 5 and 6: AncZB prefers T or G at position 5 and A or G at position 6 (Figure 3C), but AncBcd and DmBcd strongly prefer sequences containing CC at these two positions (Figure 3D,E). An additional difference is that AncZB strongly excludes binding to motifs with T at position 2, whereas AncBcd excludes binding to C at this position (Figure 3C,D). These findings were robust to uncertainty about the sequences of the ancestral HDs: PBMs performed using alternative reconstructions of AncBcd and AncZB, which incorporated plausible alternative amino acid states at all ambiguously reconstructed sites (Eick et al., 2017), yielded almost identical results (Supplementary file 3).

To quantitatively characterize evolutionary changes in DNA preference that occurred between AncZB and AncBcd HDs, we used electrophoretic mobility shift assays (F-EMSA) to measure binding affinity to the PBM consensus Zen motif (ZM, TAATTA), the consensus Bcd motif (BM, TAATCC) and five other noncanonical motifs to which Bcd has previously been shown to bind (Dave et al., 2000; Hanes and Brent, 1991; Zhao et al., 2000; Driever and Nüsslein-Volhard, 1989; Rivera-Pomar et al., 1995; Ma et al., 1996). We found that relative preference for Bcd motifs increased dramatically across the interval from AncZB and AncBcd, with affinity for BM increasing 25-fold and that for ZM declining by a factor of 10 (Figure 3F). Binding to the noncanonical motifs also increased substantially from AncZB to AncBcd. Alternative reconstructions of AncZB and AncBcd HDs yielded nearly identical results (Figure 3—figure supplement 1).

We further investigated DNA binding by the ancestral proteins using a surface plasmon resonance assay (SPR) (Figure 3—figure supplement 2). Corroborating the results from EMSA, AncZB’s affinity for ZM is higher than for BM, whereas AncBcd prefers BM. Changes in affinity were caused primarily by changes in dissociation rates: AncBcd’s off-rate from BM was 66-fold slower than AncZB’s, and its off-rate from ZM was three times faster; changes in on-rates were small (Supplementary file 4). As a result, there was a strong evolutionary shift in average residence times on the two motifs: between AncZB and AncBcd, residence time on BM increased by 70-fold, while that on ZM declined by a factor of 3 (Figure 3—figure supplement 2, Supplementary file 4). Although changes in preference were qualitatively similar when assessed by F-EMSA and SPR, the dissociation constants estimated by the two methods were somewhat different, presumably because buffers differ between the two techniques and because EMSA, unlike SPR, is a nonequilibrium assay.

Taken together, these data indicate that during the phylogenetic interval immediately after the duplication of AncZB, a major change in DNA recognition and affinity occurred along the lineage leading to AncBcd. The ensemble of occupied DNA binding sites shifted dramatically, and the protein’s preference for canonical Zen vs. Bicoid motifs was inverted.

In vivo characterization of ancestral HDs

We next tested the effects of the AncZB and AncBcd HDs on development and gene regulation in vivo. We swapped these domains precisely into the DmBcd rescue transgene and tested whether they rescue the defects of embryos from bcd-null females. The reconstructed AncBcd HD yielded a phenotype indistinguishable from that of the DmBcd transgene, with essentially complete morphological rescue (Figure 3G,H), activation of all Bcd targets (Figure 3J–P), and repression of Cad (Figure 3I), with 37% of embryos surviving to adulthood. In contrast, when the AncZB HD was swapped into the same construct, no morphological or molecular rescue was observed (Figure 3Q–Z). The derived in vivo functions of the Bcd homeodomain – including its capacity to regulate key developmental target genes and to specify formation of cephalic and thoracic structures in anterior regions of the embryo – therefore evolved almost entirely during the interval after duplication of AncZB but before AncBcd.

Genetic causes for the evolution of AncBcd’s DNA specificity

The genetic causes for the evolutionary change in the Bcd HD’s functions must be among the 31 amino acid substitutions that occurred between AncZB and AncBcd (Figure 2B). Although previous research into the genetics of Bcd function focused on specific candidate residues, we took an unbiased approach. To identify the most likely causal changes, we examined phylogenetic patterns of conservation and divergence. Eleven of the 31 amino acid substitutions that occurred between AncZB and AncBcd are phylogenetically diagnostic, with the AncBcd state conserved in all descendant Bcd proteins (Figure 2B); these diagnostic substitutions constitute an initial set of candidate causal changes.

We first tested whether each of these 11 substitutions was necessary for the evolution of AncBcd’s preference for the canonical Bicoid motif (BM). We individually reverted each diagnostic residue in AncBcd HD to the state present in AncZB HD and measured the effect on DNA affinity using F-EMSA. Only one reversal – changing lysine 50 back to the ancestral glutamine (K50q) – shifted the protein’s preference for BM back toward ZM; this reversion caused a 38-fold change in relative affinity and restored the ancestral preference for ZM. Mutating arginine (R) to the ancestral methionine (m) at position 54 (R54m) – had no effect on relative preference, but it lowered affinity to both BM and ZM by a factor of ~15 (Figure 4A). No other ancestral state affected affinity to either BM or ZM or relative preference by a factor of three or more in either direction, indicating that none of these were necessary for the historical evolution of the derived DNA preference of the Bcd HD.

Figure 4 with 1 supplement see all
A single amino acid substitution changes AncZB's in vitro binding preference.

(A) Effect of reversing diagnostic historical substitutions on AncBcd-HD’s affinity for consensus Bcd and Zen motifs (BM and ZM), in a fluorescent EMSA assay. Column height and error bars show mean and SEM of three replicates. Upper and lower cases denote derived and ancestral states, respectively. (B) Effect of introducing historical substitutions to the derived state at site 50 and 54 into AncZB HD. (C) Binding of DNA motifs by AncZBHD-q50K is tightly correlated with binding by AncBcdHD. Each point is a DNA 6-mer plotted by the average of E-scores for 8-mers containing that 6-mer. Pearson correlation coefficient is shown. (E) Binding of DNA motifs by AncBcdHD-K50q is tightly correlated with binding by AncZB-HD. (D, F) The PBM energy logos of AncZBHD-q50K and AncBcdHD-K50q (compare to Figure 3C). (G) Effect of historical substitutions on global binding specificities in PBMs. Each column represents an 8-mer, colored by its PBM E-score, according to the color scale shown. Each row shows PBM of a reconstructed HD protein with or without residue 50 swapped between ancestral and derived states. 8-mers were clustered using the Manhattan distance metric. Only 8-mers with E-score ≥ 0.45 for at least one HD variant are shown. Logos at the bottom of the heatmap were created from the 8-mers in the two large clusters.

To test whether the substitutions at positions 50 and 54 were sufficient to cause the evolutionary change in DNA recognition in vitro, we introduced the derived states from AncBcd into the AncZB HD (Figure 4B). Substitution q50K had a very large effect, shifting the ancestral protein’s relative binding preference by four orders of magnitude, increasing affinity for BM 100-fold and reducing that for ZM by a factor of 100 (Figure 4B). By comparison, simultaneously introducing into AncZB all 30 other substitutions from this interval except q50K (construct AncBcd-K50q) caused no improvement in affinity for BM and just a seven-fold reduction in affinity to ZM (Figure 4A). Substitution m54R alone increased affinity for both ZM and BM, but in the presence of q50K, it had no significant effect, indicating an epistatic interaction by which q50K silences the effect of m54R (Figure 4B). These data indicate that the single substitution q50K was sufficient to cause the entire historical shift in in vitro DNA preference from ZM to BM, whereas m54R did not make an observable contribution to the new specificity.

We next used PBMs to determine whether the evolution of Bcd’s global DNA specificity was also attributable primarily to substitution q50K. We compared the ensemble of DNA motifs bound by AncZB to those bound by AncZB-q50K. This substitution shifted the overall enrichment and de-enrichment across all oligomers to be extremely similar to those bound by AncBcd (Figure 4C, r = 0.93, compared to r = 0.56 for the correlation of AncZB and AncBcd (Figure 3B). Substitution q50K also transformed the binding-energy logo of AncZB into one almost identical to that of AncBcd, with the ‘CC’ dinucleotide strongly favored at positions 5 and 6 and the ancestrally preferred bases now de-enriched (Figure 4D, compared to Figure 3D). Substitution q50K was therefore sufficient to confer on AncZB the derived in vitro DNA-binding profile of AncBcd.

The other HD substitutions did not contribute to the evolutionary shift in global DNA specificity, as shown by a PBM comparison of AncZB to AncBcd-K50q, which is AncZB plus all 30 other derived states that evolved in the AncZB-AncBcd interval (except for q50K). The global DNA specificity of this protein remained strongly correlated to that of AncZB (r = 0.93, Figure 4E); its binding-energy motif was also like that of AncZB, with a strong preference for G/T and A at the last two positions, and depletion of binding to the CC bases that are enriched among AncBcd’s targets (Figure 4F). The overall profiles of DNA targets bound by AncZB and AncBcd-K50q are highly similar, with enrichment of sequences similar to TAATTA and de-enrichment of those similar to TAATCC; conversely, AncBcd and AncBcdq50K show the opposite pattern (Figure 4G).

Taken together, these results indicate that substitution q50K was the primary determinant of the evolutionary shift in in vitro DNA-binding preferences that occurred during the transition from AncZB to AncBcd. The other 30 substitutions that occurred on this branch, including m54R, had very minor effects on the in vitro DNA-binding specificity of the evolving HD.

In vivo effects of historical substitutions on gene expression and development

To test the contribution of historical substitutions to the evolution of Bcd’s role in AP specification and the regulatory network that drives it, we incorporated candidate historical residues into transgenic Bcd constructs containing the AncZB or AncBcd HDs, transformed bcd-null females with these constructs, and assayed their offspring.

We found that reversing only q50K (AncBcdHD-K50q) completely abolished the rescue conferred by AncBcd HD: larval cuticles were similar to bcd-null embryos (Figure 5A,B), expression of Bcd's transcriptional targets was completely lost (Figure 5D–J), and Cad translation was no longer repressed (Figure 5C). Substitution q50K was therefore necessary for evolution of the developmental and gene regulatory functions of the Bcd HD.

Effect of historical substitutions on Bcd functions in vivo.

(A–J) Testing the necessity of q50K for AncBcd function. Cuticle phenotypes and Bcd-target gene expression patterns of embryos from bcdE1 females carrying two copies of a Bcd rescue construct containing AncBcd-HD (dark blue) with an amino acid reversion to the AncZB ancestral state at residue 50 (purple arrow). (K–NN) Testing the sufficiency of historical substitutions of q50K and m54R. Embryos expressing Bcd rescue constructs containing AncZB-HD (purple) with historical substitutions to the derived AncBcd state (blue arrows) at site 50 (K–T), 54 (U–DD), or both (EE–NN). Individual panels and constructs are labeled as in Figure 1. Stars indicate Bcd-dependent target gene activation induced by the indicated constructs.

To determine whether q50K was also a sufficient historical cause, we introduced the derived lysine into the AncZB HD. We observed partial rescue (Figure 5K–T). Ectopic filzkörper formation was suppressed, and all eight abdominal segments were formed in ~90% of the embryos (Supplementary file 5). The posterior-most thoracic segment (T3) was observed in around 60% of larvae. These embryos failed to form anterior thoracic segments (T1 and T2) and any head structures (Figure 5K,L, Supplementary file 5). Of Bcd’s seven transcriptional targets, three (hb, kni, and eve stripe 2) were activated, but four (gt, otd, ems, and btd) were not (Figure 5N–T), and Cad translation was not repressed (Figure 5M). Thus, q50K partially accounts for the evolution of Bcd’s role in development and gene regulation in vivo.

We therefore examined whether substitution m54R – the only other historical substitution with a significant effect on DNA binding in vitro – affected the functions of the ancestral proteins. Introducing m54R alone into AncZB HD conferred no observable rescue of developmental or gene regulatory phenotypes (Figure 5U–DD). Introducing m54R together with q50K, however, yielded a notably more complete rescue than q50K alone: all larvae carrying the double substitution contained T3 and all eight abdominal segments,>80% also contained T2, and >50% formed anterior structures similar to the sclerotized structures of the internal cephalic skeleton (Figure 5EE,FF; Supplementary file 5). Compared to the single q50K substitution, two additional Bcd-dependent transcriptional targets, gt and otd, were activated (Figure 5HH–KK,NN), but two others, btd and ems, remained silent (Figure 5LL,MM), and Cad translation was still not repressed in the anterior (Figure 5GG). These data point to an epistatic interaction between m54R and q50K in vivo, in which m54R has an effect only when introduced together with q50K; this interaction differs from that observed in vitro, where m54R affects affinity in isolation but not in combination with q50K.

Thus, just two large-effect substitutions – q50K and m54R – are sufficient in the ancestral HD to recapitulate the evolution of major aspects of the Bcd HD’s derived function, including regulation of many target genes and partial specification of anterior structures. Compared to the phenotype of AncBcd-HD, the rescue conferred by these two substitutions is incomplete, with several targets remaining unregulated and anterior development still compromised. Additional contributions must therefore have been made by some of the other historical substitutions in the HD during the interval between AncZB and AncBcd.


Relation to prior work

The evolution of a molecule and its functions can be conceived of as a specific historical trajectory through the vast space of possible sequences from the ancestral to the derived form. A complete understanding of the evolution of a protein’s role in development would identify the sequence substitutions that changed its functions, order them in a temporal sequence, and characterize how each change altered specific molecular and developmental processes. This ‘vertical’ approach is different from that used by most studies, which compare the molecular sequences and developmental functions of present-day proteins. For example, experiments that perturb the sequence of a present-day protein, introducing amino acid states that have different biochemical properties or are found in other proteins that have different functions, explore the local sequence space immediately surrounding that protein. Although this approach can help to identify some of the genetic and biochemical constraints that shape a molecule's current functions, it does not directly address the causal effects of the set of historical changes in gene sequence that occurred during its evolution. (Hochberg and Thornton, 2017)

Previous work investigated the genetic determinants of D. melanogaster Bcd’s functions using non-vertical approaches. In vitro experiments have shown that mutating Q50 to K in distantly related extant HDs increases affinity for or activation from consensus Bcd motifs and reduces it from Zen motifs; the converse experiment in DmBcd HD has the opposite effect (Treisman et al., 1989; Ades and Sauer, 1994; Percival-Smith et al., 1990; Hanes and Brent, 1991; Hanes and Brent, 1991; Zhao et al., 2000). Mutations at some other sites, including position 54, also affect affinity and/or preference (Dave et al., 2000; Niessing et al., 2000; Damante et al., 1996). These studies established that certain residues are important for the present-day protein’s functions, but their design could not characterize how much of the evolutionary acquisition of Bcd’s functions was caused by these or any other historical sequence substitutions. First, they provided no evolutionary scale to refer to: the quantitative change in function that occurred during the key evolutionary interval was unknown, so how much of that change was attributable to Q50K or other substitutions could not be assessed. Second, most mutation studies (with the exception of Q50K) exchanged amino acids for non-ancestral states that never occurred during Bcd evolution. For example, mutating Bcd’s Arg54 to alanine, serine, or lysine affects in vitro functions, as does introducing Arg54 into distantly related HDs that contain Ala or Tyr (Dave et al., 2000; Niessing et al., 2000; Damante et al., 1996; Pellizzari et al., 1997); however, none of these amino acids were present in the ancestral HD, so these results provided no information about the impacts of the Met-Arg substitution that occurred at this site during Bcd evolution. Indeed, without a phylogenetic analysis, the substitutions that occurred during history– the set of transitions from ancestral to derived amino acids that temporally co-occurred with the change in Bcd function – had not been identified. Third, even when the relevant derived and ancestral states were studied, their effects were not assayed in the appropriate historical sequence context, a critical issue because mutations can have different effects on function in extant vs. ancestral backgrounds (Hochberg and Thornton, 2017). Thus, it has been unknown whether any historical substitution – even the well-studied position 50 – was an entirely sufficient cause, a partial contributor, an epistatic modifier, or causally irrelevant in the historical evolution of Bcd’s derived functions in vitro.

Even less was previously known about the genetic causes of the evolution of Bcd’s developmental and gene regulatory functions in vivo. Mutating Q50 to K in the HD of the distantly related protein Ftz failed to confer activation of Bcd-responsive genes in transgenic fly embryos or in cultured cells (Schier and Gehring, 1993; Zhao et al., 2000). This negative result might imply that Q50K did not cause the evolution of any of Bcd’s derived functions in vivo, but it could also be an artifact of epistasis, if other sequence states in Ftz are incompatible with Bcd functions but were not present in the ancestral Bcd protein when the substitution occurred. Q50K has also been introduced into other HDs, but effects on other developmental processes, not on Drosophila anterior development or expression of Bcd-responsive genes, were studied (Ibrahim et al., 2013; Percival-Smith et al., 1997; Capovilla et al., 1994; Plaza et al., 2001) At site 54, the mutations studied in vivo have involved states that did not occur during Bcd evolution, such as alanine and serine (Niessing et al., 2000), so the effects of M54R on its own, or in combination with Q50K, have been unknown.

Our strategy was designed to directly characterize the causal contribution of historical substitutions to the acquisition of Bcd’s derived functions, both in vitro and in vivo, using the reconstructed context of the ancestral proteins in which they occurred. By functionally characterizing successive ancestral nodes on the phylogeny, we found that all the derived functions of the Drosophila Bcd HD were acquired on one phylogenetic branch, which represents the evolutionary interval between the duplication of AncZB, which produced a separate bcd locus, and before the bcd gene in the last common ancestor of all cyclorrhaphan flies (AncBcd). During this interval, a dramatic shift occurred in the global set of DNA sequences to which AncBcd binds, and the relative preference for canonical BMs vs. ZMs changed by a factor of 250. This interval was also the key period for the evolution of Bcd’s in vivo functions: whereas AncZB-HD rescued no Bcd-specific functions, AncBcd-HD activated all seven Bcd target genes tested, inhibited Cad translation, and organized development of many anterior structures in the embryo, yielding functions and phenotypes indistinguishable from those conferred by the D. melanogaster Bcd HD.

The sequence changes that occurred along this phylogenetic branch represent the candidate genetic causes for the evolution of Bcd HD’s derived functions. We focused on the contribution of 11 substitutions from this interval, because the derived states of these are conserved among all known Bcd proteins. When each of these was introduced singly into AncZB, only Q50K made a measurable contribution to the new specificity, and this substitution alone was sufficient to recapitulate the entire evolutionary shift in local specificity (affinity for BM and ZM) and global DNA affinity (as measured in a PBM), yielding in vitro functions indistinguishable from AncBcd. The other substitutions had no effect on motif preference, and only M54R had a significant effect on affinity, conferring a nonspecific increase in binding to both motifs; however, when M54R was introduced together with Q50K, it had no measurable effect on affinity for either motif, compared to Q50K alone. Thus, a single large-effect substitution appears to provide a nearly complete causal explanation for the evolution of the Bcd HD’s derived DNA specificity as measured in our in vitro assays.

By performing transgenic experiments with reconstructed ancestral HDs, we sought to directly assess the contribution of historical substitutions to the shift in embryonic gene expression and developmental phenotypes that occurred during Bcd evolution. We found that Q50K was a major evolutionary cause: introducing it into AncZB-HD conferred regulation of three of seven Bcd target genes and recovery of some aspects of anterior development. Unlike its minor role in vitro, M54R was an important causal contributor in vivo, conferring regulation of two additional targets and a more complete developmental rescue, but only when introduced together with Q50K. At least some of the 29 other substitutions that occurred during this interval made additional contributions, conferring activation of the last two transcriptional targets (ems and btd), increasing expression levels and sharpening localization of the entire suite of targets, yielding anterior repression of Cad translation, and rescuing the full anterior developmental phenotype of AncBcd-HD. Further work will be required to determine the specific contribution of these substitutions to the evolution of Bcd function.


There are several limitations to our approach, but none are likely to undermine our major claims. First, ancestral sequences are inferred statistically and are therefore never known with certainty. The ML sequences of AncZB and AncBcd were inferred with very high confidence, with only 3 and 2 sites, respectively, having a plausible alternative amino acid, and the expected number of erroneously reconstructed sites is just 1.7 and 0.9 in the two proteins. We performed the in vitro assays using ‘AltAll’ versions of the reconstructed proteins that contain all plausible alternative residues and found no differences in function from the ML ancestors. Our inferences concerning ancestral functions therefore appear to be robust to stochastic uncertainty about the precise sequence of the ancestral proteins. We did not test these alternative reconstructions in transgenic animals and therefore cannot strictly rule out the possibility that the true ancestral sequences of AncZB and AncBcd may have functioned differently from the ML versions in vivo. But this possibility seems unlikely, given the high degree of statistical confidence and the robustness of the protein’s in vitro functions to sequence uncertainty.

A second limitation is that we assayed reconstructed ancestral HDs in the context of an extant model organism. The developmental gene regulatory environment is likely to have changed in some ways from the ancestral cyclorrhaphan to D. melanogaster; it is possible that the ancestral proteins that we tested could have had different effects on the ancestral organism in which they occurred. All transgenic experiments, including horizontal swaps of genes or regulatory elements from one species into a model organism, are potentially complicated by this kind of epistatic mismatch (Gehrke and Shubin, 2016). In general, epistatic mismatch should be less of a problem when ancestral genes are used because the genetic distance between ancestral gene and host organism is shorter than it is in horizontal swaps between extant species that descend from the same ancestor. We observed virtually no developmental or regulatory differences between D. melanogaster transformed with its own DmBcd-HD and those carrying the AncBcd-HD, suggesting no incompatibility between the ancestral HD and the developmental environment in which it was tested. Moreover, our assays of AncBcd, AncZB, and variants containing Q50K and/or M54R were all carried out in the common genetic background of D. melanogaster; so our inferences would be artifacts only if interactions with this background differentially affected these very similar sequences.

A third limitation is that we focused our study on evolutionary change within the Bcd HD, because only the HD can be reconstructed with confidence. Changes elsewhere in the Bcd coding sequence may have made some contribution to the emergence of the protein’s new functions. But our experiments establish that changes within the HD are the predominant causes within the Bcd protein of its functional evolution: swapping just the HD of DmBcd-HD into the rest of the Zen protein confers the majority of Bcd-like in vivo functions, including regulation of all major Bcd target genes and rescue of many anterior structures. The level of expression and precise localization of some Bcd targets is not fully rescued by this Zen-BcdHD chimera, however, so amino acid changes outside the HD likely had additional, albeit relatively minor effects, on tuning Bcd’s derived gene regulatory functions.

Bcd’s evolution involved a shift in not only the protein’s intrinsic functional properties but also its localization; our experimental design did not address the genetic changes that drove maternally expressed bcd to become localized in the anterior of the embryo. The evolution of Bcd functions may also have involved changes in proteins with which Bcd interacts or in Bcd’s target regulatory elements. These questions could, in principle, be addressed in the future using ancestral reconstruction. Indeed, the strategy we use here can be applied to any developmental functions and phenotypes for which phylogenetic signal is adequate to allow relevant ancestral sequences to be reconstructed. Ancestral regulatory regions may be amenable to reconstruction and in vitro/in vivo characterization (Rebeiz et al., 2011; Rogers et al., 2013), if they are conserved enough that traces of the ancestral state are retained in extant sequences. It may therefore be possible, in some cases at least, to provide more complete causal explanations across biological levels for the ancient evolution of differences in development among distantly related taxa.


Our results shed light on several issues concerning developmental evolution. First, they illuminate the number and effect-size of substitutions during the evolution of development, a classic question in evolutionary genetics (Fisher, 1919; Lewontin, 1974; Orr, 2005) and evo-devo (Colosimo et al., 1928; Martin and Orgogozo, 2013; Rockman, 2012; Stern and Orgogozo, 2008). We found that two very large-effect substitutions drove the evolution of the majority of the Bcd HD’s derived functions. This result is consistent with findings in other protein families that just a few historical substitutions drove evolutionary changes in biochemical specificity in enzymes, transcription factors, and binding proteins (Siddiq et al., 2017b). In contrast, previous studies of development that have identified the genetic basis of differences between populations or closely related species have mostly found a more diffuse distribution of effect sizes, with many substitutions of small effect accounting for new phenotypes (Gompel et al., 2005; Frankel et al., 2011; Linnen et al., 2013; Rebeiz et al., 2009); however, a major role for large-effect mutations has been observed when the derived phenotype is conferred by loss-of-function mutations (e.g. refs. [Colosimo et al., 1928; Jeong et al., 2006; Prud'homme et al., 2006; Hoekstra et al., 2006]).

In the case of Bcd, both large- and small-effect substitutions appear to have been important, with additional contributions by some of the 29 other substitutions on the same branch. These results are not consistent with a model by which evolution occurs through a very large number of changes of extremely small effect or a simple ‘hopeful monster’ model by which a single mutation is sufficient to confer a novel, high-fitness phenotype. Our findings are more consistent with a model in which a very small number of large-effect substitutions can yield a new but non-optimal function, which is subsequently refined by additional changes of smaller effect (Orr, 2005; Charlesworth and Charlesworth, 1975).

Our results also have implications for understanding the evolution of Bcd’s role in cyclorrhaphan development. The difference we observed between the in vitro and in vivo contributions of the key historical substitutions indicates that the evolution of Bcd’s developmental functions cannot be explained entirely by a shift in binding to canonical ZM and BM. Q50K causes the entire shift in preference measured in vitro, but it accounts for a relatively small portion of the HD’s in vivo evolution – activation of three of seven major transcriptional target genes, and only a very limited rescue of anterior development; two of these, hb and eve2, are the same targets that are activated by a construct of the D. melanogaster bcd gene containing the otd HD, which also contains K50 (Datta et al., 2018). Adding M54R to Q50K had no further effect on preference or affinity in vitro, but it conferred a notably more complete rescue in vivo. Residue Arg54 has been proposed to be critical for Cad inhibition (Niessing et al., 2000), but the AncZB-Q50K/M54R construct showed no detectable inhibition of Cad in vivo, indicating that M54R’s effects are mediated by some other mechanism. Effects on non-canonical, lower-occupancy DNA targets are one possibility (Datta et al., 2018; Ramos and Barolo, 2013; Crocker et al., 2015), as are changes in cooperative interactions with some other transcription factor (Baker et al., 2012). When the remaining historical substitutions have been identified that confer inhibition of Cad translation and activation of Bcd’s two remaining transcriptional targets (ems and btd), it should be possible to use the data we present here to disentangle how Bcd’s regulatory effects on its various targets interact to specify AP polarity.

A major puzzle is how a new transcription factor could evolve a controlling role in a crucial process of early development, because intermediate evolutionary states that would disrupt AP development would presumably be strongly selected against. It has been suggested that Bcd may have usurped its role as a primary anterior organizing transcription factor from a more ancient protein with Bcd-like specificity, such as Otd, which plays a role in anterior organization in some wasps (Lynch et al., 2006). In this scenario, as soon as Bcd’s new specificity evolved, it would immediately have taken on a controlling position in the AP gene regulatory network, so long as it was localized to the embryo’s anterior. There are other examples of transcription factors taking control of existing integrated regulatory networks during evolution, although the mechanisms by which they did so are not yet known (Booth et al., 2010; Mank and Avise, 2009; Matsuda et al., 2002; Wallis et al., 2008).

Our finding that only one or two changes is sufficient to drastically alter Bcd’s target specificity both in vitro and in vivo seems to increase the plausibility of the idea that AncBcd could have commandeered control over an ancestral suite of targets that already contained BMs. But a key premise of this model — that Otd or other HD-containing proteins with Bcd-like specificity were the ancestral anterior organizers — is not yet established. In other insect species, including some more closely related to cyclorrhaphans than are wasps and beetles, non-HD transcription factors unrelated to Otd or Bcd are the upstream controlling determinants of AP development (Klomp et al., 2015), so it is currently premature to conclude that Otd played this role in the ancestral cyclorrhaphan. Moreover, replacing the HD of D. melanogaster Bcd with that of D. melanogaster Otd in transgenic flies yields activation of only a few Bcd targets and virtually no rescue of anterior development (Datta et al., 2018); this result seem incompatible with the model that Bcd simply commandeered Otd’s targets during evolution, unless subsequent evolution of the system has obscured the putative ancestral relationship.

A second premise of this model is that the network of downstream genes that executes AP specification in response to Otd or some other ancestral master anterior factor was already in place in a form similar or identical to that in extant D. melanogaster, ready to be commandeered by Bcd when it evolved. Current knowledge of the mechanisms of AP specification in other species is inadequate to establish the long-term stasis of this network. Subsequent changes in the regulatory sequences of some present-day Bcd target genes may have been important in initiating or modifying their response to Bcd, after Bcd’s intrinsic molecular functions were already acquired. Our experiments, which were conducted in the context of the D. melanogaster embryo, cannot address this possibility. To infer the likely components of the ancestral network and the relationships among them, it will be necessary to assess the mechanisms of AP specification in numerous non-cyclorrhaphan dipterans, as well as basally branching cyclorrhapans, and evaluate that evidence in a phylogenetic context. Such efforts will refute or corroborate the ‘Otd replacement’ hypothesis and may suggest other plausible scenarios for how the relatively rapid acquisition by Bcd of new molecular functions allowed it to evolve a controlling role in a crucial integrated process of early animal development.

Materials and methods

Drosophila stocks, cloning, and transgenesis

Request a detailed protocol

We used the following stocks in these experiments: yw (wild type), ±/Cyo bcd+;bcdE1/bcdE1, Cyo bcd+/Sco;bcdE1/bcdE1, and ΦC31 (y+);38F1 (w+). We cloned an injection plasmid (piattB40-Bcd) containing two inverted ΦC31-specific recombination sequences (AgeI/HindIII) for Recombination Mediated Cassette Exchange (RMCE), Gmr-GFP (HindIII/AscI), and a polylinker flanked by 1.9 kb bcd promoter and 0.8 kb 3’UTR (AgeI/AscI). The bcd and zen coding regions were amplified by PCR from pBS-SK + cDNA clones, digested with RsrII and AscI and ligated into piattB40-Bcd. We used standard cloning techniques to generate homeodomain swaps and residue changes. Gene Blocks coding for the ancestral HD and variant sequences together with the flanking Bcd coding sequence were obtained from Integrated DNA Technologies (IDT). They were digested with AscI and BspEI and ligated to the piattB40-Bcd vector digested with the same restriction enzymes. All transgenic lines were generated using the ΦC31 integration system, and constructs were integrated into the 38F1 landing site on the second chromosome (Bateman et al., 2006). Each transgene was crossed to Cyo bcd+/Sco; bcdE1/bcdE1 to generate Cyo bcd+/[transgene]; bcdE1/bcdE1 stocks. Embryos and larvae from homozygous transgenic females were assayed for gene expression and cuticle phenotype.

In situ hybridization, immunohistochemistry, and image processing

Request a detailed protocol

In situ hybridizations were performed as previously described (Kosman et al., 2004). Briefly, embryos 1–3 hr AEL (after egg laying) were dechorionated 2 min in bleach and fixed and devitellinized in a biphasic fixation solution containing 3 ml 1X PBS, 1 ml 37% Formaldehyde and 4 ml Heptane for 25 min on a shaker at RT. Fixed and permeabilized embryos were incubated with DIG or Fluorescein-labeled RNA probes and the labeled probes were detected by Alkaline Phosphatase (AP)-conjugated primary antibodies (Cat #11093274910 and #11426338910) by using NBT/BCIP solution. RNA expression was observed by Zeiss Axioskop microscopy. Cuticle preparations were performed on embryos aged 20–24 hr. Larvae were dechorionated for 2 min in bleach, and a 1:1 mixture of methanol and heptane was used to remove the vitelline membrane. Larvae were fixed overnight at 65°C in a 1:4 mixture of glycerol and acetic acid, and mounted in a 1:1 mixture of Hoyer’s medium (Anderson, 1954) and lactic acid. Guinea pig anti-Cad (1:400) and Alexa Fluor conjugated 647 donkey anti-guinea pig (1:500, Invitrogen) were used to examine Cad protein expression. All antibodies were diluted in PBT (1X PBS with 0.1% Tween). Data for immunostaining images were collected on a Leica TCS SP5 confocal microscope using the Leica confocal analysis software.

Phylogenetics and ancestral sequence construction

Request a detailed protocol

Protein sequences of Bicoid (Bcd), Zerknullt (Zen) and Hox3 orthologs were collected by BLAST against NCBI databases using Drosophila melanogaster Bcd and Zen HD sequences. Three zen cDNA sequences were reverse transcribed from insect species, Lonchoptera lutea, Platypeza consobrina and Rhajio tringarius (Stauber, 2001). Protein sequences were aligned using the multiple sequence alignment program MSAProbs (v0.9.7) (Liu et al., 2010). The alignment was manually examined and HD regions were extracted, realigned, and then checked for proper alignment according to known conserved positions. Only one representative sequence was kept if multiple sequence entries showed no sequence diversity among them. The curated HD alignment included 33 HDs from 27 insect species (Supplementary file 1). Phylogenetic analysis was performed using PHYML (v3.0) (Guindon et al., 2010) and LG model (Le and Gascuel, 2008) with gamma-distributed among-site rate variation and model defined state frequencies, which was the best-fit evolutionary model selected using the Akaike Information criterion implemented in ProtTest software (v2.4) (Abascal et al., 2005). Statistical support for each node was evaluated by calculating the approximate likelihood ratio statistic (Anisimova and Gascuel, 2006).

Homedomain sequences are short and highly conserved at many sites, so they contain inadequate information to resolve some species relationships by maximum likelihood (Figure 2—figure supplement 1). We therefore imposed a tree topology a priori to constrain all species relationships, which are well corroborated from extensive prior research on insect phylogenetics. Branch lengths and model parameters were then optimized on this tree by maximum likelihood. Ancestral HD sequences were reconstructed by the maximum likelihood method using the codeml module of PAML (v4.7) (Yang, 2007), assuming the constrained tree, a free four-category gamma distribution of among-site rate variation, and the LG model. A customized Python script was used to extract marginal posterior probability of each amino acid at each site for the nodes of interest from PAML output file (available at; copy archived at We identified all ambiguously reconstructed sites as those at which multiple states have posterior probability >0.20; we synthesized alternative ancestral reconstructions containing the state with the second highest likelihood at all such sites (Supplementary file 2). The resulting ‘alt-all’ ancestors represent the far edge of the cloud of plausible ancestral sequences and allow a conservative experimental test (Figure 3—figure supplement 1, Figure 4—figure supplement 1, Supplementary file 3) of the robustness of functional inferences to statistical uncertainty about the ancestral sequences (Eick et al., 2017).

Protein synthesis and purification

Request a detailed protocol

Ancestral HD cDNA sequences were synthesized (GeneScript). For SPR and F-EMSA experiments, HD cDNAs were cloned into the pETMALc-H10T vector (Pryor and Leiting, 1997) C-terminal to a cassette containing a 6xHis tag, maltose binding protein (MBP) and a TEV protease cleavage site. For protein binding microarray experiments, cDNA sequences of ancestral HDs plus a flanking region containing 15 amino acids from D. melanogaster Bcd or Zen were synthesized (GeneScript) and cloned into the pDEST15 vector through Gateway cloning (Life Technologies) to produce N-terminal GST fusion proteins. Ancestral HDs were expressed in BL21 (DE3) pLysS Rosetta cells.

Protein expression was induced by 0.5 mM IPTG at A600 of 0.8–1.2. Cells were then grown overnight at 18°C, harvested and frozen at −20°C. Cells were lysed using BPER Protein Extraction Reagent Kit (ThermoScientific). For MBP-His fusion ancestral HDs, the supernatant of the cell lysate was loaded onto a pre-equilibrated 5 mL HisTrap HP column (GE) and eluted with a linear imidazole gradient (40 mM to 1 M) in 20 mM sodium phosphate and 200 mM NaCl buffer [pH 7.4] on AKTA FPLC system. Then, ancestral HDs were cleaved from MBP-His fusions using TEV protease in dialysis buffer consisting of 20 mM sodium phosphate and 250 mM NaCl [pH 7.0]. For GST fusion ancestral HDs, the supernatant of cell lysate was loaded onto a pre-equilibrated gravity Glutathione column (ThermoScientific), eluted with 10 mM glutathione, 125 mM Tris and 150 mM NaCl [pH 8.0], and dialysed in dialysis buffer [pH 6.0]. For the second purification step, samples were loaded onto a 5 mL HiPrep SP FF cation exchange column (GE) and eluted with a linear NaCl gradient (250 mM to 1 M) in 20 mM sodium phosphate buffer [pH 7.0]. Whenever needed, ancestral HDs were further purified on a HiPrep 16/60 Sephacryl S-100 HR size exclusion column (GE) with 20 mM Tris and 150 mM NaCl [pH 7.5]. Protein purity was assayed after purification by visualization on a 12% SDS-PAGE gel stained with Bio-Safe Coomassie G-250 stain (Bio-Rad). Purified protein aliquots were flash frozen in liquid nitrogen and stored at −80°C in 20 mM sodium phosphate, 150 mM NaCl, 1 mM DTT and 10% glycerol [pH7.5]. Protein folding and aggregation states were measured using circular dichroism (Jasco J-1500 CD) and dynamic light scattering (Wyatt DynaPro NanoStar) spectroscopy after freeze-thaw cycle. For biochemical assays, protein aliquots were thawed, centrifuged, and the supernatant was used for Quick StartTM Bradford Assay (BioRad) to quantify protein concentrations.

Protein binding microarray analysis

Request a detailed protocol

Protein binding microarrays (PBM) were performed essentially as described previously (Berger et al., 2006) using a custom-designed ‘all 10-mer’ universal array (Berger et al., 2008) in the 8 × 60K array format (Agilent Technologies, Inc.; AMADID # 030236) (Nakagawa et al., 2013). All proteins were tested at a concentration of 100 nM. Duplicate PBMs were performed for each protein. The array data were quantified as described previously (Berger et al., 2006) using the Universal PBM Analysis Suite (Berger and Bulyk, 2009). Replicate experiments for each protein yielded similar results, although the signal in one replicate was in most cases stronger than in the other (Figure 4—figure supplement 1, Supplementary file 6). For PBM analysis, 15 amino acids flanking the HD were included to follow previously optimized protocols (Berger et al., 2008); we found almost identical results whether these residues were those from D. melanogaster Bcd or Zen (Figure 4—figure supplement 1, Supplementary file 3).

For heat-map and clustering analysis, E-scores for 8-mers scoring≥0.45 for at least one tested construct were clustered and plotted using the heatmap.2 function in the gplots R package, with the Manhattan distance metric. Clusters of 8-mers selected from the heatmaps were aligned as described (Jiang et al., 2013), and sequence logos were made using the makePWM and seqLogo functions from the Biostrings and seqLogo R packages. For correlation analysis, binding of a HD protein to every 6-mer was calculated as the mean E-score averaged over all 8-mers containing that 6-mer; 6-mers were used because our F-EMSA and SPR experiments were performed using canonical 6-mers, and because the strongest information about sequence specificity observed in our PBMs and those of other homeodomains typically comes from 6-mers, although flanking bases can provide some additional information (see, for example, Figure 4 of [Berger et al., 2008]). The Pearson correlation coefficient was then calculated for the relationship between 6-mer E-scores of PBMs performed using two different HDs.

PBM binding logos were derived from site-specific effects on binding energies estimated using a maximum likelihood method implemented in the BEEML-PBM algorithm (Zhao and Stormo, 2011). BEEML-PBM models the nonlinear relationship between the logarithm of binding probability and the binding free energy by including transcription factor concentration as a separate parameter and also taking into account background and position effects. In practice, for each ancestral HD, the normalized probe intensity data and a set of initial parameters specifying its position weight matrix (PWM) energy model were used as input files for nonlinear regression to obtain the parameters of its PWM energy model that maximize the fit to the observed probe intensity data. To evaluate PWM energy model performance, PWMs trained on replicate one were used to predict replicate 2 8-mer median intensities. Observed or predicted median intensity of an 8mer was calculated by taking the median of the observed or predicted intensities of all probes containing this 8-mer. The coefficient of determination (R2) was calculated to quantify the proportion of variance of observed median intensities that could be explained by predicted median intensities. The predictive performance was then compared with experimental reproducibility (R2 of 8mer median intensities) between the two replicates. To evaluate the robustness of our statistical sequence reconstruction, a PWM energy model trained on an alternate reconstructed ancestral HD was used to predict 8-mer median intensities of its corresponding ML ancestor. PWM energy models were graphically represented using energy logos implemented in Energy Model program (Foat et al., 2006; Zhao et al., 2012), which represent the relative energy contribution for each nucleotide at each position, and directly display binding affinity preference of favorable and unfavorable nucleotides.

Quantitative fluorescence electrophoretic mobility shift assays (F-EMSA)

Request a detailed protocol

Quantitative F-EMSAs were used to measure the binding affinities of ancestral HDs to 14 bp 5’ fluorescein-labeled dsDNA oligonucleotides (FAM-DNA). FAM-DNAs were produced by annealing equimolar complementary ssDNA strands (HPLC purified; Eurofins). 5 µM annealed FAM-DNA aliquots were stored at −20°C. Typical reactions consisted of 2.5 nM FAM-DNA equilibrated with varying concentrations of protein in assay buffer for 30 min at room temperature (20 mm Tris, pH 7.5, 150 mm NaCl and 1 mM DTT). Immediately prior to loading, one-twentieth volume of 30% (v/v) glycerol, 0.01% (w/v) bromcresol green was added to each reaction as a dye marker. A 10 µl sample of each reaction was loaded onto a pre-run 8% native polyacrylamide gel in pre-chilled 0.5 × TBE buffer. The gels were run for 40 min at 100 volts then immediately scanned using a fluor-imager (Bio-Rad GelDoc) with a blue laser at 491 nm. The fluorescence intensity ratio of bound verse total FAM-DNA (F) was determined as a function of total protein concentration (P) and DNA concentration (D), and the data were fitted to a quadratic equation (Equation 1) to determine the binding association constants (KA). In Equation 1, b0 and bM represent the baseline and maximum response as normalization factor for data fitting. All procedures associated with image processing and model fitting were carried out using a customized R script.

(1) F=b0+(bM-b0)×[D+P+KA-1-(D+P+KA-1)2-4PD2D]

Surface plasmon resonance assays

Request a detailed protocol

All SPR kinetic experiments were performed with a four-channel Biacore 3000 biosensor system (GE) at 25°C. 20 bp 3′-Biotin-labeled DNA oligonucleotides (HPLC purified; Eurofins) were captured with streptavidin on CM5 sensor chips (GE) on which the carboxymethyl groups were activated and coupled with streptavidin to amine linkages. Flow cell #1 was left blank as a control while others contained a specific immobilized DNA sequence. Preliminary experiments were done to find appropriate buffer and surface regeneration conditions, DNA immobilization levels, protein concentrations, flow rate, association and dissociation contact time for each protein variant. Low levels of DNA immobilization, 100 ~ 150 RUs (response unit), and fast flow rate, 30 µl /min, were used to minimize mass transfer effects. Before each kinetic experiment, the system was stabilized for 3 min by flowing HBS-PE buffer at 30 µl /min (10 mM HEPES pH 7.4, 150 mM NaCl, 3 mM EDTA, 0.005% v/v Surfactant P20). During the kinetic association phase, a protein dilution in modified HBS-PE buffer (10 mM HEPES pH 8.5, 150 mM NaCl, 3 mM EDTA, 0.005% v/v Surfactant P20, 1 mM DTT) was injected. Following a kinetic dissociation phase in which only running buffer was injected, DNA binding surface was regenerated by flowing 20 µl 0.05% SDS at 100 µl /min. The kinetic parameters (kon and koff) were estimated by using BIAevaluation software (GE) to fit the observed data (R) to a function (Equation 2), with terms for concentration of protein variants (c0) at various time points (t) and correction terms for refractive index change (RRI), drifting baseline (cD) and local maximum response (Rmax).

(2) R=konc0Rmaxkonc0+koff[1-e-(konc0+koff)t]+RRI+cDt

Data availability

The PBM data generated is available via Dryad (doi:10.5061/dryad.pm3g4r3)

The following data sets were generated


    1. Colosimo PF
    2. Hosemann KE
    3. Balabhadra S
    4. Villarreal G
    5. Dickson M
    6. Grimwood J
    Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles
    Science 2005:307.
  1. Book
    1. Davidson EH
    2. Peter I
    Genomic Control Process
    Oxford: Academic Press.
  2. Book
    1. Gilbert SF
    Developmental Biology (10th ed)
    Sunderland, MA: Sinauer Associates.
  3. Book
    1. Grimaldi D
    2. Engel MS
    Evolution of the Insects
    New York: Cambridge University Press.
  4. Book
    1. Lewontin RC
    The Genetic Basis of Evolutionary Change
    New York: Columbia University Press.
    1. Ma X
    2. Yuan D
    3. Diepold K
    4. Scarborough T
    5. Ma J
    The Drosophila morphogenetic protein bicoid binds DNA cooperatively
    Development 122:1195–1206.
    1. Nguyen V
    2. Wilson C
    3. Hoemberger M
    4. Stiller JB
    5. Agafonov RV
    6. Kutter S
    Evolutionary drivers of thermoadaptation in enzyme catalysis
    Science 355:aah3717.
    1. Percival-Smith A
    2. Müller M
    3. Affolter M
    4. Gehring WJ
    The interaction with DNA of wild-type and mutant fushi tarazu homeodomains
    The EMBO Journal 9:3967.
    1. Percival-Smith A
    2. Weber J
    3. Gilfoyle E
    4. Wilson P
    Genetic characterization of the role of the two HOX proteins, proboscipedia and sex combs reduced, in determination of adult antennal, tarsal, maxillary palp and Proboscis identities in Drosophila Melanogaster
    Development 124:5049–5062.
  5. Book
    1. Schmidt-Ott U
    2. Rafiqi AM
    3. Lemke S
    Hox3/zen and the Evolution of Extraembryonic Epithelia in Insects
    In: Deutsch J. S, editors. Hox Genes: Studies From the 20th to the 21st Century. New York: Springer. pp. 133–144.
    1. Stauber M
    Studien zur Evolution des Musterbildungsgens bicoid in Dipteren. PhD thesis. 
    Universität Witten/Herdecke.

Decision letter

  1. Patricia J Wittkopp
    Senior and Reviewing Editor; University of Michigan, United States
  2. Artyom Kopp
    Reviewer; UC Davis, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your work entitled "Ancient mechanism for the evolution of Bicoid function in fly development" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Artyom Kopp (Reviewer #3). The other two reviewers remain anonymous.

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

All three reviewers agree that the methodology is a significant strength of the paper. After discussion, however, the reviewers and editors felt that the results presented here are very much in line with what is already known about the functional changes brought about by evolutionary amino acid substitutions in the BCD homeodomain. Furthermore, the conclusions were based on testing variants in Drosophila melanogaster, which does not allow for an analysis of the contributions of "trans" evolutionary changes in target sequences, and in interacting proteins. In the end, it was the lack of a significantly new advance in understanding BCD protein evolution/function that led to the decision.

Reviewer #1:

This study sought to characterize the amino acids changes that have evolved to confer functionality on Bicoid since the ancestral duplication that produced this gene in the lineage leading to cyclorrhaphan flies. The study focused on the homeodomain and used a transgenic rescue assay to test the effect of variants on target gene expression and overall ability to rescue in vivoin vivo. These experiments were complimented by in vitroin vitro analysis of binding site preferences of variants.

The authors found that the Bcd homeodomain in an otherwise Zen protein could perform much like Bcd. They then used ancestral sequence reconstruction (ASR) to generate and test ancestral sequences. This allowed them to identify an ancestral-like sequence that could function like Bcd although it was unable to fully function like this transcription factor, in for example, repressing translation of caudal. The authors then sought to determine which Bcd specific amino acids in the homeodomain were responsible. They found that the K at position 50 conferred much of the characteristics of Bcd in vitroin vitro and in vivoin vivo and that the R at position 54 also contributed to Bcd function in vivoin vivo. None of the other amino acid differences tested made any detectable contribution, although since rescue was not complete they and other changes outside the homeodomain presumably also have minor roles.

Overall the experiments are described clearly and the work confirms that the amino acids at positions 50 and 54 have made a major contribution to the evolution of Bcd functionality as inferred from several previous studies cited in the manuscript.

A case is made in the Discussion sectionDiscussion section for how these amino acid changes may have facilitated the usurpation of the ancestral gene regulatory network for anterior patterning. However this study only assesses changes in the homeodomain of Bcd and therefore there appears to be an assumption that the binding sites in the target genes have not changed in sequence and nor has there been any other changes in trans. Therefore this speculation should be toned down or at least the caveats should be explained in more detail.

It would also have been interesting to determine how Otd with a Bcd homeodomain and other variants with particular amino acid substitutions performed in these assays to further test what ancestral system Bcd replaced and how this was achieved.

The Introduction sets out the case for the merits of the ASR approach to identify mutations underlying evolutionary differences. However, while ASR can be a powerful approach, I think the Introduction is too biased and there is an implication that horizontal comparisons do not generally allow the genetic changes underlying phenotypic differences to be identified (e.g. from wording such as 'differences that correlate with phenotypic differences' and 'identified allelic differences that reproduce diversity in gene expression and developmental outcomes'. On the contrary mapping approaches and functional analyses of variants between species have identified many mutations underlying population and species differences (e.g. see Martin and Orgogozo, 2013) and continue to do so. Therefore a more balanced Introduction is needed.

Reviewer #2:

The article by Liu et al. aims to understand the evolution of the very interesting fly protein Bicoid. They use a cutting edge tool for reconstructing the most likely ancestor of it and its paralog Zen, and testing the functions of variations on the modern and ancestral proteins in establishing and patterning the anterior posterior axis. I think the work is very well done, but in the end, does not advance our knowledge of the properties and evolution of Bicoid significantly. The importance of the transition from Q50 to K50 is well known and has been for around 30 years, and it is no surprise that it is far and away the most significant change in the protein. Position 54 is also well known to have important functions.

These results should be the starting point of the paper, not the main conclusions. The really interesting thing to me is why these are not sufficient to give complete Bicoid rescue? Is it still a question of intramolecular epistasis that prevents the combination of k50 and r54 from activating all bicoid targets? If so, which sites are involved? Or is something else going on? Perhaps the cis-regulatory elements in D., melanogaster have coevolved with the Drosophila Bcd homeodomain? I realize testing the latter hypothesis is outside the scope of the approach of this paper, but a more systematic probing of the potential of interactions among residues within the homeodomain with the two large effect changes could support or eliminate the intramolecular epistasis hypothesis, and could give more support to searching for other explanations.

As it stands I feel that the field would not be advanced significantly by this work, since we are still at the point of knowing that q50 to k50 led to a big change in the sequences bound by Bicoid, but we have no further understanding of how this led to Bicoid becoming the anterior determinant of the cyclorrhaphans.

Reviewer #3:

I think this is an exemplary paper. As the authors explain in their very eloquent Introduction, incorporating explicit phylogenetic structure and ancestral character reconstruction into comparative studies produces much more rigorous results than simple pairwise comparisons between species. This is as true for development as it is for morphology and DNA sequences, but unfortunately the application of phylogenetic approaches is much rarer in evo-devo studies than in other sub-fields of evolutionary biology – – due at least in part to the difficulty of collecting appropriate data, but also perhaps to cultural factors. This paper raises the bar in both respects. The authors have performed a very nicely structured comparison of reconstructed ancestral proteins to investigate how their function evolved through time on a defined evolutionary lineage. At the same time, they have integrated a wealth of in vivoin vivo, in vitro, and biophysical data to characterize the functions of both extant and ancestral proteins. The result is a very thorough and rigorous paper. In some respects, the results are not entirely surprising – – the single mutation that was found to play by far the largest role in the evolution of protein function is the one that has been hypothesized to do just that for decades. However, in addition to confirming the role of this mutation, the authors were able to quantify its contribution to phenotypic change at different levels of biological organization (from binding affinity/occupancy rate to target gene expression to morphological phenotype), identify a second AA substitution that acts in concert with it, and to rule out the role of multiple other fixed AA changes (within the limits of their assays, of course). So the exceptionally laborious approach taken by the authors has paid off.

[The decision letter after re-review follows.]

As you know, the initial reviews and discussion identified this as interesting and unusually rigorous work demonstrating the phenotypic effects of lineage specific amino acid substitutions in flies. However, concerns were also expressed about the novelty of this work and impact on the field given that the amino acids tested had previously been suggested to be involved in the trait studied. After much debate, I decided initially to reject this work based on these grounds. You have done a great job highlighting the differences between this prior work and your new findings in the comprehensive response to reviewers and appeal letter. After consulting with all three of the original reviewers, we agree that the changes to the manuscript address our concerns. I am thus happy to now accept this work for publication.

Author response

[Editors’ note: the author responses to the first round of peer review follow.]

Thank you for the decision letter and reviews of our paper. As the letter states, all reviewers viewed the paper’s methodology as very strong, and there were no concerns about the quality of the experiments, the data, or their interpretation. The major comments pertained to the paper’s impact in relation to prior work and, to a lesser extent, to the way we framed a few issues in the introduction and discussion. We have revised the paper to address these concerns, as detailed below.

1) Reviewers 1 and 2 commented that the impact of the paper was low, because the importance of sites 50 and 54 was already known. We believe this comment arises because we inadequately described previous work in our relation to our findings, because a close reading of the literature shows that the extent to which Q50K and M54R can account for the evolution of Bcd’s derived functions in vitro was not previously known, and in vivo, there was no evidence whatsoever to indicate that these substitutions can confer Bcd-like functions or played any role in Bcd evolution.

As the reviewers pointed out, numerous previous studies showed that swapping lysine and glutamine at site 50 between several HD proteins strongly affects in vitro DNA preference for Bcd vs. Zen motifs, and other experiments studied the in vitro effects of various mutations at site 54. Thus, it was reasonable to hypothesize that these might have been involved Bcd evolution. However, these studies could not and did not quantify the extent to which the evolutionary acquisition of Bcd’s derived specificity was caused by the specific historical sequence changes that occurred at sites 50 and 54. That goal would require quantifying the historical shift in Bcd affinity for both kinds of target sequence and assessing the effects of the historical substitutions, in the ancestral background in which they occurred; none of this was previously done. in vivo, no studies have provided any evidence that these substitutions caused the evolution of Bcd’s role in gene regulation and development: the only relevant work found that introducing Q50K into a related HD confers none of Bcd’s derived functions, and the M54R substitution’s effects on Bcd functions was never previously assessed.

Our findings are therefore unprecedented that (1) Q50K alone can confer significant (but partial) in vivo rescue of anterior gene regulation and development, (2) adding M54R yields a substantially more complete rescue of these phenotypes, (3) M54’s effect is epistatically dependent on co-introduction of Q50K, (4) Other substitutions from the same phylogenetic interval are required for complete acquisition of all of the Bcd HD’s regulatory activity and role in anterior development, and (5) the acquisition of Bcd’s derived DNA specificity in vitro is entirely attributable to Q50K.

We have revised the Introduction and Discussion section to thoroughly discuss prior work and to explicitly place our strategy and the impact of our findings in the context of that work.

2) The decision letter and reviewer 2 expressed concern that our in vivo experiments were conducted by introducing variants of the ancestral Bcd into D. melanogaster; this design, the comment says, did not allow us to test the evolutionary consequences of changes at Bcd’s regulatory targets or interacting proteins, which could also have been important in the evolution of its role in AP development. It is true that we focused on the evolutionary effect of changes in the HD, using D. melanogaster as a transgenic platform, but we believe this design is a strength of our study, so long as the interpretations are drawn appropriately. All transgenic experiments are tests of specific putative causal factors when introduced into a particular genetic background; they cannot address the contributions of other potential causes that are already “embedded” within the host organism, and they must be interpreted accordingly. In our study, changes at other loci could indeed have played a role in the evolution of Bcd’s developmental role, but our experiments quite decisively establish the developmental effects of the genetic changes within the Bcd HD that we tested. Further work will be necessary to study the impacts of changes at other loci.

A second potential concern could relate to the possibility of an epistatic mismatch between the ancestral gene and the host organism into which it is introduced, potentially leading to artifactual results. This issue, which potentially affects all transgenic studies, is important to acknowledge. But a strength of our ancestral reconstruction strategy is that it can help to reduce this potential problem compared to horizontal transgenic experiments, because an ancestor and descendant differ only by the changes that occurred along only one branch, whereas the two extant species differ by all changes that occurred along both branches. Further, it is unlikely that our particular findings are attributable to epistasis between ancestral alleles and the transgenic host in which they were assayed, because the developmental and regulatory phenotypes of D. melanogaster transformed with its own DmBcd-HD and those transformed with the AncBcd-HD are essentially indistinguishable. Finally, our inferences involved tightly focused comparisons, in which we tested ancestral proteins from successive phylogenetic nodes and examined the effect of specific sequence changes on the phenotypes they produce; this design drastically limits the scope of the inferences that might plausibly be compromised by epistasis with the assay system, because the proteins we tested are so similar to each other. Only if epistasis differentially affects the specific substitutions Q50K, M54R, and the other differences between AncZB and AncBcd which were assessed in otherwise identical proteins in a common genetic background could our results be misleading.

We have modified the DDiscussion section and Iintroduction to explicitly acknowledge the potential importance of evolutionary change at Bcd targets at interacting proteins and to carefully describe the issue that arise when ancestral genes are assessed by introducing them into an extant model organism.

3) Reviewer 1 felt that our introduction was too dismissive of comparative strategies for identifying the genetic basis for the evolution of development. We agree that such approaches have been very important, particularly among closely related species among populations, and would not want imply otherwise. Among more distant taxa, they have also made significant contributions to our understanding, although typically at considerably lower resolution. We have modified the Introduction to acknowledge these contributions.

4) Reviewer 1 suggested that we remove the phrase “higher taxa.” We have replaced it with “more distantly related taxa.”

5) Reviewer 1 suggested that we refer to the Bcd motif using a longer sequence than the canonical 6-bp core sequence we described. We prefer the classic 6-bp core because it contains essentially all of the key information about Bcd’s mode of DNA recognition relative to Zen, which is the relevant change on the evolutionary timescale studied here. Further, it is consonant with our PBM and EMSA experiments, and it is far more accessible for the reader than a longer motif that includes a small amount of degenerate information at flanking sites. We have provided appropriate citations for the consensus sequence we use and have modified the text to acknowledge that there is some additional sequence preference at flanking sites.

6) Reviewer 1 asked us to fix mismatches in the references and check all figure pointers; we did so.

7) Reviewer 2 and reviewer 3 found aspects of our introductory description of the relationships among Hox3, Zen, and Bcd to be confusing. We have rewritten this section for accuracy and clarity.

8) Reviewer 2 suggested we change description of the evolutionary relationship between Bcd and Ftz. We have done so, referring to Ftz as a distantly-related HD protein rather than a paralog.

9) Reviewer 3 asked for more information about the constrained topology and why we used it. We have modified the methods section to more fully address this. Briefly, HD sequences are too short and have inadequate phylogenetic signal to resolve relationships among all the species with sequences in our alignment. These species relationships, however, are all well established by previous work using much larger datasets. We therefore imposed these relationships in a single tree topology, and then used maximum likelihood to optimize branch lengths and model parameters and infer ancestral sequences. We have explained this in the text.

10) Reviewer 3 asked if including the Syrphid sequence affects the ancestral sequence. We checked, and the effect is trivial: reconstructions of AncBcd with or without EbaBcdHD have the same ML sequence, and the difference in their marginal average posterior probabilities is 0.006.

11) Reviewer 3 asked about the correlation of DNA occupancy in PBM between Dn-Zen and AncZB. We cannot answer that question, because we did not perform a PBM for DmZen.

Article and author information

Author details

  1. Qinwen Liu

    Department of Ecology and Evolution, University of Chicago, Chicago, United States
    Conceived the project; performed phylogenetic analyses, quantitative gel shifts, and surface plasmon resonance assays; analysed the protein binding microarray experiments data; interpreted data and contributed to writing the manuscript
    Contributed equally with
    Pinar Onal and Rhea R Datta
    Competing interests
    No competing interests declared
  2. Pinar Onal

    Department of Biology, New York University, New York, United States
    Conceived the project; generated transgenic lines containing HDs from AncBcd, AncZB, AncBcdK50q, AncZBq50K, AncZBm54R, and AncZBK50/R54; performed cuticle preparations, in situ hybridizations, and immunofluorescence assays; interpreted data and contributed to writing the manuscript
    Contributed equally with
    Qinwen Liu and Rhea R Datta
    Competing interests
    No competing interests declared
  3. Rhea R Datta

    Department of Biology, New York University, New York, United States
    Present address
    Biology Department, Hamilton College, Clinton, New York, United States
    Conceived the project; generated transgenic lines containing HDs from DmBcd, DmZen, and DmZenBcdHD; performed cuticle preparations, in situ hybridizations, and immunofluorescence assays; analysed the protein binding microarray experiments data; interpreted data and contributed to writing the manuscript
    Contributed equally with
    Qinwen Liu and Pinar Onal
    Competing interests
    No competing interests declared
  4. Julia M Rogers

    1. Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, United States
    2. Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
    Performed protein binding microarray experiments; analysed the protein binding microarray experiments data; interpreted data and contributed to writing the manuscript
    Competing interests
    No competing interests declared
  5. Urs Schmidt-Ott

    Department of Organismal Biology and Anatomy, University of Chicago, Chicago, United States
    Conceived the project; interpreted data and contributed to writing the manuscript
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-1351-9472
  6. Martha L Bulyk

    1. Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, United States
    2. Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
    3. Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
    Contributed to the design of experiments and analyses; interpreted data and contributed to writing the manuscript
    Competing interests
    No competing interests declared
  7. Stephen Small

    Department of Biology, New York University, New York, United States
    Conceived and supervised the project; contributed to design of experiments and analyses; interpreted data and contributed to writing the manuscript
    For correspondence
    Competing interests
    No competing interests declared
  8. Joseph W Thornton

    1. Department of Ecology and Evolution, University of Chicago, Chicago, United States
    2. Department of Human Genetics, University of Chicago, Chicago, United States
    Conceived and supervised the project; contributed to the design of experiments and analyses; interpreted data and contributed to writing the manuscript
    For correspondence
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9589-6994


National Institutes of Health (R01GM051946)

  • Stephen J Small

National Institutes of Health (R01GM104397)

  • Joseph W Thornton

National Institutes of Health (R01GM121931)

  • Joseph W Thornton

National Institutes of Health (R01-HG005287)

  • Martha L Bulyk

National Institutes of Health (F32GM112351)

  • Qinwen Liu

National Institutes of Health (NSF IOS0744966)

  • Stephen J Small

National Institutes of Health (IOS1355057)

  • Urs Schmidt-Ott

New York University Dean for Science Research Fund

  • Stephen J Small

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


We thank Elena Solomaha, Jackie Moore, Christine Yoo, Anastasia Vedenko, Leila Shokri, and Jesse Kurland for assistance with experiments and analysis. We thank Claude Desplan, Lionel Christiaen, Vincent Lynch, and members of the Thornton and Small laboratories for helpful discussions and comments on the manuscript. Supported by NIH R01GM051946 (SS), R01GM104397 (JWT), R01GM121931 (JWT), R01-HG005287 (MLB), and F32GM112351 (QL), NSF IOS0744966 (SS) and IOS1355057 (US-O) and NYU’s Dean for Science Research Fund (SS).

Senior and Reviewing Editor

  1. Patricia J Wittkopp, University of Michigan, United States


  1. Artyom Kopp, UC Davis, United States

Version history

  1. Received: December 22, 2017
  2. Accepted: July 28, 2018
  3. Version of Record published: October 9, 2018 (version 1)


© 2018, Liu et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


  • 3,764
    Page views
  • 464
  • 18

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Qinwen Liu
  2. Pinar Onal
  3. Rhea R Datta
  4. Julia M Rogers
  5. Urs Schmidt-Ott
  6. Martha L Bulyk
  7. Stephen Small
  8. Joseph W Thornton
Ancient mechanisms for the evolution of the bicoid homeodomain's function in fly development
eLife 7:e34594.

Further reading

    1. Epidemiology and Global Health
    2. Genetics and Genomics
    Arturo Torres Ortiz, Michelle Kendall ... Louis Grandjean
    Research Article

    Accurate inference of who infected whom in an infectious disease outbreak is critical for the delivery of effective infection prevention and control. The increased resolution of pathogen whole-genome sequencing has significantly improved our ability to infer transmission events. Despite this, transmission inference often remains limited by the lack of genomic variation between the source case and infected contacts. Although within-host genetic diversity is common among a wide variety of pathogens, conventional whole-genome sequencing phylogenetic approaches exclusively use consensus sequences, which consider only the most prevalent nucleotide at each position and therefore fail to capture low frequency variation within samples. We hypothesized that including within-sample variation in a phylogenetic model would help to identify who infected whom in instances in which this was previously impossible. Using whole-genome sequences from SARS-CoV-2 multi-institutional outbreaks as an example, we show how within-sample diversity is partially maintained among repeated serial samples from the same host, it can transmitted between those cases with known epidemiological links, and how this improves phylogenetic inference and our understanding of who infected whom. Our technique is applicable to other infectious diseases and has immediate clinical utility in infection prevention and control.

    1. Cancer Biology
    2. Genetics and Genomics
    Ann-Kathrin Herzfeldt, Marta Puig Gamez ... Lee Kim Swee
    Research Article

    Cytotoxic CD8+ T lymphocytes (CTLs) are key players of adaptive anti-tumor immunity based on their ability to specifically recognize and destroy tumor cells. Many cancer immunotherapies rely on unleashing CTL function. However, tumors can evade killing through strategies which are not yet fully elucidated. To provide deeper insight into tumor evasion mechanisms in an antigen-dependent manner, we established a human co-culture system composed of tumor and primary immune cells. Using this system, we systematically investigated intrinsic regulators of tumor resistance by conducting a complementary CRISPR screen approach. By harnessing CRISPR activation (CRISPRa) and CRISPR knockout (KO) technology in parallel, we investigated gene gain-of-function as well as loss-of-function across genes with annotated function in a colon carcinoma cell line. CRISPRa and CRISPR KO screens uncovered 187 and 704 hits respectively, with 60 gene hits overlapping between both. These data confirmed the role of interferon‑γ (IFN-γ), tumor necrosis factor α (TNF-α) and autophagy pathways and uncovered novel genes implicated in tumor resistance to killing. Notably, we discovered that ILKAP encoding the integrin-linked kinase-associated serine/threonine phosphatase 2C, a gene previously unknown to play a role in antigen specific CTL-mediated killing, mediate tumor resistance independently from regulating antigen presentation, IFN-γ or TNF-α responsiveness. Moreover, our work describes the contrasting role of soluble and membrane-bound ICAM-1 in regulating tumor cell killing. The deficiency of membrane-bound ICAM-1 (mICAM-1) or the overexpression of soluble ICAM-1 (sICAM-1) induced resistance to CTL killing, whereas PD-L1 overexpression had no impact. These results highlight the essential role of ICAM-1 at the immunological synapse between tumor and CTL and the antagonist function of sICAM-1.