1. Chromosomes and Gene Expression
Download icon

GAF is essential for zygotic genome activation and chromatin accessibility in the early Drosophila embryo

  1. Marissa M Gaskill
  2. Tyler J Gibson
  3. Elizabeth D Larson
  4. Melissa M Harrison  Is a corresponding author
  1. Department of Biomolecular Chemistry, University of Wisconsin School of Medicine and Public Health, United States
Research Article
  • Cited 3
  • Views 1,780
  • Annotations
Cite this article as: eLife 2021;10:e66668 doi: 10.7554/eLife.66668

Abstract

Following fertilization, the genomes of the germ cells are reprogrammed to form the totipotent embryo. Pioneer transcription factors are essential for remodeling the chromatin and driving the initial wave of zygotic gene expression. In Drosophila melanogaster, the pioneer factor Zelda is essential for development through this dramatic period of reprogramming, known as the maternal-to-zygotic transition (MZT). However, it was unknown whether additional pioneer factors were required for this transition. We identified an additional maternally encoded factor required for development through the MZT, GAGA Factor (GAF). GAF is necessary to activate widespread zygotic transcription and to remodel the chromatin accessibility landscape. We demonstrated that Zelda preferentially controls expression of the earliest transcribed genes, while genes expressed during widespread activation are predominantly dependent on GAF. Thus, progression through the MZT requires coordination of multiple pioneer-like factors, and we propose that as development proceeds control is gradually transferred from Zelda to GAF.

eLife digest

Most cells in an organism share the exact same genetic information, yet they still adopt distinct identities. This diversity emerges because only a selection of genes is switched on at any given time in a cell. Proteins that latch onto DNA control this specificity by activating certain genes at the right time. However, to perform this role they first need to physically access DNA: this can be difficult as the genetic information is tightly compacted so it can fit in a cell. A group of proteins can help to unpack the genome to uncover the genes that can then be accessed and activated. While these ‘pioneer factors’ can therefore shape the identity of a cell, much remains unknown about how they can work together to do so. For instance, the pioneer factor Zelda is essential in early fruit fly development, as it enables the genetic information of the egg and sperm to undergo dramatic reprogramming and generate a new organism. Yet, it was unclear whether additional helpers were required for this transition.

Using this animal system, Gaskill, Gibson et al. identified GAGA Factor as a protein which works with Zelda to open up and reprogram hundreds of different sections along the genome of fruit fly embryos. This tag-team effort started with Zelda being important initially to activate genes; regulation was then handed over for GAGA Factor to continue the process. Without either protein, the embryo died.

Getting a glimpse into early genetic events during fly development provides insights that are often applicable to other animals such as fish and mammals. Ultimately, this research may help scientists to understand how things can go wrong in human embryos.

Introduction

Pronounced changes in cellular identity are driven by pioneer transcription factors that act at the top of gene regulatory networks. While nucleosomes present a barrier to the DNA binding of many transcription factors, pioneer factors can bind DNA in the context of nucleosomes. Pioneer-factor binding establishes accessible chromatin domains, which serve to recruit additional transcription factors that drive gene expression (Zaret and Mango, 2016; Iwafuchi-Doi and Zaret, 2014; Zaret and Carroll, 2011). These unique characteristics of pioneer factors enable them to facilitate widespread changes in cell identity. Nonetheless, cell-fate transitions often require a combination of pioneering transcription factors to act in concert to drive the necessary transcriptional programs. Indeed, the reprogramming of a specified cell type to an induced pluripotent stem cell requires a cocktail of transcription factors, of which Oct4, Sox2, and Klf4 function as pioneer factors (Takahashi and Yamanaka, 2006; Takahashi et al., 2007; Chronis et al., 2017; Soufi et al., 2015; Soufi et al., 2012). Despite the many examples of multiple pioneer factors functioning together to drive reprogramming, how these factors coordinate gene expression changes within the context of organismal development remains unclear.

Pioneer factors are also essential for the reprogramming that occurs in the early embryo. Following fertilization, specified germ cells must be rapidly and efficiently reprogrammed to generate a totipotent embryo capable of differentiating into all the cell types of the adult organism. This reprogramming is initially driven by mRNAs and proteins that are maternally deposited into the oocyte. During this time, the zygotic genome remains transcriptionally quiescent. Only after cells have been reprogrammed is the zygotic genome gradually activated. This maternal-to-zygotic transition (MZT) is broadly conserved among metazoans and essential for future development (Schulz and Harrison, 2019; Vastenhouw et al., 2019). Activators of the zygotic genome have been identified in a number of species (zebrafish – Pou5f3, Sox19b, Nanog; mice – Dux, Nfy; humans – DUX4, OCT4; fruit flies – Zelda), and all share essential features of pioneer factors (Schulz and Harrison, 2019; Vastenhouw et al., 2019).

Early Drosophila development is characterized by 13, rapid, synchronous nuclear divisions. Zygotic transcription gradually becomes activated starting about the eighth nuclear division, and widespread transcription occurs at the 14th nuclear division when the division cycle slows and the nuclei are cellularized (Schulz and Harrison, 2019; Vastenhouw et al., 2019). The transcription factor Zelda (Zld) is required for transcription of hundreds of genes throughout zygotic genome activation (ZGA) and was the first identified global genomic activator (Liang et al., 2008; Harrison et al., 2011; Nien et al., 2011). Embryos lacking maternally encoded Zld die before completing the MZT (Liang et al., 2008; Harrison et al., 2011; Nien et al., 2011; Fu et al., 2014; Staudt et al., 2006). Zld has the defining features of a pioneer transcription factor: it binds to nucleosomal DNA (McDaniel et al., 2019), facilitates chromatin accessibility (Schulz et al., 2015; Sun et al., 2015) and this leads to subsequent binding of additional transcription factors (Yáñez-Cuna et al., 2012; Xu et al., 2014; Foo et al., 2014).

By contrast to the essential role for Zld in flies, no single global activator of zygotic transcription has been identified in other species. Instead, multiple transcription factors function together to activate zygotic transcription (Schulz and Harrison, 2019; Vastenhouw et al., 2019). Work from our lab and others has implicated additional factors in regulating reprogramming in the Drosophila embryo. Specifically, the enrichment of GA-dinucleotides in regions of the genome that remain accessible in the absence of Zld and at loci that gain accessibility late in ZGA, suggest that a protein that binds to these loci functions with Zld to define cis-regulatory regions during the initial stages of development (Schulz et al., 2015; Sun et al., 2015; Blythe and Wieschaus, 2016). Three proteins, GAGA factor (GAF), chromatin-linked adaptor for MSL proteins (CLAMP), and Pipsqueak (Psq) are known to bind to GA-dinucleotide repeats and are expressed in the early embryo, implicating one or all these proteins in reprogramming the embryonic transcriptome (Rieder et al., 2017; Soruco et al., 2013; Kuzu et al., 2016; Biggin and Tjian, 1988; Bhat et al., 1996; Soeller et al., 1993; Lehmann et al., 1998).

CLAMP was first identified based on its role in targeting the dosage compensation machinery, and it preferentially localizes to the X chromosome (Larschan et al., 2012; Soruco et al., 2013). Psq is essential for oogenesis (Horowitz and Berg, 1996), and it has been suggested to have a role as a repressor and in chromatin looping (Gutierrez-Perez et al., 2019; Huang et al., 2002). GAF, encoded by the Trithorax-like (Trl) gene, has broad roles in transcriptional regulation, including functioning as a transcriptional activator (Farkas et al., 1994; Bhat et al., 1996), repressor (Mishra et al., 2001; Busturia et al., 2001; Bernués et al., 2007; Horard et al., 2000), and insulator (Ohtsuki and Levine, 1998; Wolle et al., 2015; Kaye et al., 2017). Through interactions with chromatin remodelers, GAF is instrumental in driving regions of accessible chromatin both at promoters and distal cis-regulatory regions (Okada and Hirose, 1998; Tsukiyama et al., 1994; Xiao et al., 2001; Tsukiyama and Wu, 1995; Fuda et al., 2015; Judd et al., 2021). Analysis of hypomorphic alleles has suggested an important function for GAF in the early embryo in both driving expression of Ultrabiothorax (Ubx), Abdominal B (Abd-B), engrailed (en), and fushi tarazu (ftz) and in maintaining normal embryonic development (Bhat et al., 1996; Farkas et al., 1994). Given these diverse functions for GAF, and that it shares many properties of a pioneer transcription factor, we sought to investigate whether it has a global role in reprogramming the zygotic genome for transcriptional activation.

Investigation of the role of GAF in the early embryo necessitated the development of a system to robustly eliminate GAF, which had not been possible since GAF is essential for maintenance of the maternal germline and is resistant to RNAi knockdown in the embryo (Bhat et al., 1996; Bejarano and Busturia, 2004; Rieder et al., 2017). For this purpose, we generated endogenously GFP-tagged GAF, which provided the essential functions of the untagged protein, and used the deGradFP system to deplete GFP-tagged GAF in early embryos expressing only the tagged construct (Caussinus et al., 2012). Using this system, we identified an essential function for GAF in driving chromatin accessibility and gene expression during the MZT. Thus, at least two pioneer-like transcription factors, Zld and GAF, must cooperate to reprogram the zygotic genome of Drosophila following fertilization.

Results

GAF binds the same loci throughout ZGA

To investigate the role of GAF during the MZT, we used Cas9-mediated genome engineering to tag the endogenous protein with super folder Green Fluorescent Protein (sfGFP) at either the N- or C-termini, sfGFP-GAF(N) and GAF-sfGFP(C), respectively (Pédelacq et al., 2006). There are two protein isoforms of GAF (Benyajati et al., 1997). Because the N-terminus of GAF is shared by both isoforms, the sfGFP tag on the N-terminus labels all GAF protein (Figure 1—figure supplement 1A). By contrast, the two reported isoforms differ in their C-termini, and thus the C-terminal sfGFP labels only the short isoform (Figure 1—figure supplement 1B). Whereas null mutants in Trithorax-like (Trl), the gene encoding GAF, are lethal (Farkas et al., 1994), both sfGFP-tagged lines are homozygous viable and fertile. Additionally, in embryos from both lines sfGFP-labeled GAF is localized to discrete nuclear puncta and is retained on the mitotic chromosomes in a pattern that recapitulates what has been previously described for GAF based on antibody staining in fixed embryos (Figure 1A, Figure 1—figure supplement 1C; Raff et al., 1994). Together, these data demonstrate that the sfGFP tag does not interfere with essential GAF function and localization.

Figure 1 with 3 supplements see all
GAF binds thousands of loci throughout the MZT.

(A) Images of His2Av-RFP; sfGFP-GAF(N) embryos at the nuclear cycles (NC) indicated above. sfGFP-GAF(N) localizes to puncta during interphase and is retained on chromosome during mitosis. His2AvRFP is shown in magenta. sfGFP-GAF(N) is shown in green. Scale bars, 5 µm. (B) Binding motif enrichment of GA-dinucleotide repeats at GAF peaks identified at stage 3 and stage 5 determined by MEME-suite (left). Distribution of the GA-repeat motif within peaks (right). Gray line indicates peak center. (C) Representative genome browser tracks of ChIP-seq peaks for GAF-sfGFP(C) from stage 3 and stage 5 embryos and GAF ChIP-seq from S2 cells (Fuda et al., 2015). (D) Venn diagram of the peak overlap for GAF as determined by ChIP-seq for GAF-sfGFP(C) from sorted stage 3 embryos and stage 5 embryos and by ChIP-seq for GAF from S2 cells (Fuda et al., 2015). Total number of peaks identified at each stage is indicated in parentheses. See also Figure 1—figure supplements 13.

To begin to elucidate the role of GAF during early embryogenesis, we determined the genomic regions occupied by GAF during the MZT. We hand sorted homozygous GAF-sfGFP(C) stage 3 (nuclear cycle (NC) 9) and stage 5 (NC14) embryos and performed chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) using an anti-GFP antibody. While alternative splicing generates two GAF protein isoforms that differ in their C-terminal polyQ domain, in the early embryo only the short isoform is detectable (Benyajati et al., 1997). We confirmed the expression of the short isoform in the early embryo by blotting extract from 0 to 4 hr AEL (after egg laying) N- and C-terminally tagged GAF embryos with an anti-GFP antibody (Figure 1—figure supplement 1D). The long isoform was undetectable in extract from embryos of both sfGFP-tagged lines harvested 0–4 hr AEL, but was detectable at low levels in extract from sfGFP-GAF(N) embryos harvested 13–16 hr AEL. Thus, we conclude that during the MZT the C-terminally sfGFP-labeled short isoform comprises the overwhelming majority of GAF present in the GAF-sfGFP(C) embryos. We identified 3391 GAF peaks at stage 3 and 4175 GAF peaks at stage 5 (Supplementary file 1). To control for possible cross-reactivity with the anti-GFP antibody, we performed ChIP-seq on w1118 stage 3 and stage 5 embryos in parallel. Since there are no GFP-tagged proteins in the w1118 embryos, any peaks identified with the anti-GFP antibody would be the result of non-specific interactions and excluded from further analysis. No peaks were called in the w1118 dataset for either stage, confirming the specificity of the peaks identified in the GAF-sfGFP(C) embryos (Figure 1—figure supplement 2A,B). Further supporting the specificity of the ChIP data, the canonical GA-rich GAF-binding motif was the most highly enriched sequence identified in ChIP peaks from both stages, and these motifs were centrally located in the peaks (Figure 1B). Peaks were identified in the regulatory regions of several previously identified GAF-target genes, including the heat shock promoter (hsp70), even-skipped (eve), Krüppel (Kr), Ubx, and en (Figure 1C; Biggin and Tjian, 1988; Gilmour et al., 1989; Soeller et al., 1988; Lee et al., 1992; Read et al., 1990; Kerrigan et al., 1991). Peaks were enriched at promoters, which fits with the previously defined role of GAF in establishing paused RNA polymerase (Figure 1—figure supplement 2C; Lee et al., 2008; Fuda et al., 2015; Judd et al., 2021).

There was a substantial degree of overlap between GAF peaks identified at both stage 3 and stage 5. A total of 2955 peaks were shared between the two time points, representing 87% of total stage 3 peaks and 71% of total stage 5 peaks (Figure 1C,D, Figure 1—figure supplement 2D). This demonstrates that GAF binding is established prior to widespread ZGA and remains relatively unchanged during early development, similar to what has been shown for Zld, the major activator of the zygotic genome (Harrison et al., 2011). We compared GAF-binding sites we identified in the early embryo to previously identified GAF-bound regions in S2 cells, which are derived from 20 to 24 hr old embryos (Fuda et al., 2015). Despite the difference in cell-type and antibody used, 56% (1906) of stage 3 peaks and 43% (1811) of stage 5 peaks overlapped peaks identified in S2 cells (Figure 1C,D, Figure 1—figure supplement 2D). It was previously noted that GAF binding in 8–16 hr embryos was highly similar to GAF occupancy in the wing imaginal disc harvested from the larva (Slattery et al., 2014), and our data indicate that this binding is established early in development prior to activation of the zygotic genome. Nonetheless, peaks unique to each tissue likely represent GAF-binding events as they are centrally enriched for GA-rich GAF-binding motifs (Figure 1—figure supplement 3). Thus, while a majority of GAF-binding sites are maintained in both the embryo and cell culture, a subset of GAF-binding sites is likely tissue specific.

Maternal GAF is required for embryogenesis and progression through the MZT

Early embryonic GAF is maternally deposited in the oocyte, and this maternally deposited mRNA can sustain development until the third instar larval stage (Farkas et al., 1994). To investigate the role of GAF during the MZT therefore necessitated a system to eliminate this maternally encoded GAF. RNAi failed to successfully knockdown GAF in the embryo (Rieder et al., 2017), and female germline clones cannot be generated as GAF is required for egg production (Bhat et al., 1996; Bejarano and Busturia, 2004). To overcome these challenges, we leveraged our N-terminal sfGFP-tagged allele and the previously developed deGradFP system to target knockdown at the protein level (Caussinus et al., 2012). The deGradFP system uses a genomically encoded F-box protein fused to a nanobody recognizing GFP, which recruits GFP-tagged proteins to a ubiquitin ligase complex. Ubiquitination of the GFP-tagged protein subsequently leads to efficient degradation by the proteasome. To adapt this system for efficient use in the early embryo, we generated transgenic flies in which the deGradFP nanobody fusion was driven by the nanos (nos) promoter for strong expression in the embryo 0–2 hr after fertilization (Wang and Lehmann, 1991). In embryos laid by nos-deGradFP; sfGFP-GAF(N) females all maternally encoded GAF protein is tagged with GFP and thus subject to degradation by the deGradFP nanobody fusion. These embryos are hereafter referred to as GAFdeGradFP.

We verified the efficiency of the deGradFP system by imaging living embryos in which nuclei were marked by His2Av-RFP. GAFdeGradFP embryos lack the punctate, nuclear GFP signal identified in control embryos that do not carry the deGradFP nanobody fusion, indicating efficient depletion of sfGFP-GAF(N) (Figure 2A). This knockdown was robust, as we failed to identify any NC10 - 14 embryos with nuclear GFP signal, and none of the embryos carrying both the deGradFP nanobody and the sfGFP-tagged GAF hatched (Figure 2B). Based on live embryo imaging, the majority of embryos died prior to NC14, indicating that maternal GAF is essential for progression through the MZT. We identified a small number of GFP-expressing, gastrulating escapers. It is unclear if these embryos had an incomplete knockdown of maternally encoded sfGFP-GAF(N), or if a small percentage of embryos survived until gastrulation in the absence of GAF and that the GFP signal was the result of zygotic gene expression. Nonetheless, none of these embryos survived until hatching. Despite being able to maintain a strain homozygous for the N-terminal, sfGFP-tagged GAF, quantitative analysis revealed an effect on viability. Embryos homozygous for sfGFP-GAF(N) had only a 30% hatching rate (Figure 2B). By contrast, embryos homozygous for GAF-sfGFP(C) hatched at a rate of 66%. To determine whether the differences in hatching rates were due to differences in protein expression levels, we performed in vivo fluorescent quantification of the GFP signal in nuclei of sfGFP-GAF(N) and GAF-sfGFP(C) homozygous stage 5 embryos (2–2.5 hr AEL). There was no statistically significant difference in GFP signal between the genotypes (Figure 2—figure supplement 1), suggesting that the lower hatching rate for sfGFP-GAF(N) embryos is unlikely to be caused by reduced protein levels. Nonetheless, all future experiments controlled for the effect of the N-terminal tag by using sfGFP-GAF(N) homozygous embryos as paired controls with GAFdeGradFP embryos.

Figure 2 with 1 supplement see all
Embryos lacking maternal GAF die during early embryogenesis with nuclear and mitotic defects.

(A) Images of control (maternal genotype: His2Av-RFP; sfGFP-GAF(N)) and GAFdeGradFP (maternal genotype: His2Av-RFP/nos-deGradFP; sfGFP-GAF(N)) embryos at NC14, demonstrating loss of nuclear GFP signal specifically in GAFdeGradFP embryos. His2Av-RFP marks the nuclei. (B) Hatching rates after > 24 hr. (C) Confocal images of His2Av-RFP in arrested/dying GAFdeGradFP embryos with blocky nuclei, mitotic arrest, and nuclear fallout. (D) Confocal images of His2Av-RFP in NC13 control and GAFdeGradFP embryos, showing disordered nuclei and anaphase bridges in GAFdeGradFP embryos. Scale bars, 50 µm except where indicated. See also Figure 2—figure supplement 1.

Having identified a dramatic effect of eliminating maternally expressed GAF on early embryonic development, we used live imaging to investigate the developmental defects in our GAFdeGradFP embryos. GAFdeGradFP embryos in which nuclei were marked by a fluorescently labeled histone (His2Av-RFP) were imaged through several rounds of mitosis. We observed defects such as asynchronous mitosis, anaphase bridges, disordered nuclei, and nuclear dropout (Figure 2C,D; Videos 1 and 2). Nevertheless, embryos were able to complete several rounds of mitosis without GAF before arresting. Nuclear defects became more pronounced as GAFdeGradFP embryos developed and approached NC14. The arrested/dead embryos were often arrested in mitosis or had large, irregular nuclei (Figure 2C), similar to the nuclear ‘supernovas’ in GAF-deficient nuclei reported in Bhat et al., 1996. Our live imaging allowed us to detect an additional nuclear defect: that in the absence of maternal GAF nuclei become highly mobile and, in some cases, adopt a ‘swirling’ pattern (Video 2). While it is unclear what causes in this phenotype, GAF is suggested to have a role in maintaining genome stability. Thus, loss of GAF may lead to activation of the DNA-damage response pathway, which can result in similar nuclear phenotypes (Bhat et al., 1996; Sibon et al., 2000; Takada et al., 2003). As a control, we imaged sfGFP-GAF(N) homozygous embryos through several rounds of mitosis. These embryos proceeded normally through NC10-14, demonstrating that the mitotic defects in the GAFdeGradFP embryos were caused by the absence of GAF and not the sfGFP tag (Figure 2D; Video 3). The defects in GAFdeGradFP embryos are reminiscent of the nuclear-division defects previously reported for maternal depletion of zld (Liang et al., 2008; Staudt et al., 2006) and identify a fundamental role for GAF during the MZT.

Video 1
Video of a GAFdeGradFP embryo going through several rounds of mitosis prior to gastrulation.

Nuclei are marked by His2Av-RFP.

Video 2
Video of a severely disordered GAFdeGradFP embryo going through several rounds of mitosis prior to gastrulation.

Nuclei are marked by His2Av-RFP.

Video 3
Video of a control (His2Av-RFP; sfGFP-GAF(N)) embryo going through several rounds of mitosis prior to gastrulation.

Nuclei are marked by His2Av-RFP.

GAF is required for the activation of hundreds of zygotic genes

During the MZT, there is a dramatic change in the embryonic transcriptome as developmental control shifts from mother to offspring. Having demonstrated that maternal GAF was essential for development during the MZT, we investigated the role of GAF in regulating these transcriptional changes. We performed total-RNA seq on bulk collections of GAFdeGradFP and control (sfGFP-GAF(N)) embryos harvested 2–2.5 hr AEL, during the beginning of NC14 when the widespread genome activation has initiated. Our replicates were reproducible (Figure 3—figure supplement 1), allowing us to identify 1452 genes that were misexpressed in GAFdeGradFP embryos as compared to controls. Importantly, by using sfGFP-GAF(N) homozygous embryos as a control we have excluded from our analysis any genes misexpressed as a result of the sfGFP tag on GAF. Of the misexpressed genes 884 were down-regulated and 568 were up-regulated in the absence of GAF (Figure 3A, Figure 3—figure supplement 2A). The gene encoding GAF, Trithorax-like, was named because it was required for expression of the homeotic genes Ubx and Abd-B (Farkas et al., 1994). Our RNA-seq analysis identified 7 of the 8 Drosophila homeotic genes (Ubx, Abd-B, adb-A, pb, Dfd, Scr, and Antp) down-regulated in GAFdeGrad embryos. Additionally, many of the gap genes, essential regulators of anterior-posterior patterning, are down-regulated: giant (gt), knirps (kni), huckebein (hkb), Krüppel (Kr), and tailless (tll) (Supplementary file 2). Gene ontology (GO)-term analysis showed down-regulated genes were enriched for functions in system development and developmental processes as would be expected for essential genes activated during ZGA (Figure 3—figure supplement 2B). GO-term analysis of the up-regulated genes showed weak enrichment for response to stimulus and metabolic processes (Figure 3—figure supplement 2C).

Figure 3 with 3 supplements see all
GAF is required for zygotic genome activation.

(A) Volcano plot of transcripts mis-expressed in GAFdeGradFP embryos as compared to sfGFP-GAF(N) controls. Stage 5 GAF-sfGFP(C) ChIP-seq was used to identify GAF-bound target genes. (B) The percentage of up-regulated and down-regulated transcripts in GAFdeGradFP embryos classified as maternal, zygotic, or maternal-zygotic based on Lott et al., 2011. (C) Overlap of down-regulated embryonic transcripts in the absence of GAF or Zld activity (p<2.2×10−16, log2(odds ratio)=3.16, two-tailed Fisher’s exact test). Down-regulated genes for Zld are from McDaniel et al., 2019. (D) Transcripts down-regulated in both GAFdeGradFP and ZldCRY2 embryos or in either condition alone were classified based on temporal expression during ZGA (Li et al., 2014). Early = NC10-11, Mid = NC12-13, Late = early NC14, Later = late NC14. Only genes that were assigned to one of the four classes are shown. See also Figure 3—figure supplements 13.

To determine whether GAF is functioning predominantly in transcriptional activation or repression, we used our stage 5 ChIP-seq data to determine the likely direct targets of GAF by assigning GAF-ChIP peaks to the nearest gene. We found that 45% (397) of the down-regulated genes were proximal to a GAF peak. By contrast, only 17% (99) of up-regulated genes are near GAF peaks, similar to the 15% of genes with unchanged expression levels that were proximal to GAF sites (Figure 3A). The significant enrichment for GAF-binding sites proximal to down-regulated genes as compared to up-regulated supports a role for GAF specifically in transcriptional activation (p<2.2×10−16, log2(odds ratio)=1.95, two-tailed Fisher’s exact test). Genes activated late in NC14, during widespread genome activation, with a proximal GAF-binding site are significantly more down-regulated than genes activated at the same time point but lacking a GAF-binding site (Figure 3—figure supplement 2D). This analysis suggests that the down-regulated genes identified by RNA-seq are unlikely to be due to a general failure in activating zygotic gene expression, but rather specifically identify genes that depend on GAF for expression. Because zygotic gene expression is required for degradation of a subset of maternal mRNAs, if the genome is not activated, maternal transcripts are not properly degraded and therefore are increased in RNA-seq data (Harrison et al., 2011; Hamm et al., 2017; Liang et al., 2008). Indeed, down-regulated transcripts were enriched for zygotically expressed genes, and up-regulated transcripts were largely maternally contributed (p<2.2×10−16, log2(odds ratio)=4.21, two-tailed Fisher’s exact test; Figure 3B). Thus, GAF is essential for transcriptional activation during the MZT and likely functions along with Zld to drive zygotic genome activation.

Previous data defined a role for the additional GA-dinucleotide binding protein, CLAMP, in activating zygotic gene expression (Larschan et al., 2012; Rieder et al., 2017; Urban et al., 2017b; Soruco et al., 2013). Therefore, we compared genes down-regulated in GAFdeGradFP embryos to genes down-regulated in clamp-RNAi embryos 2–4 hr AEL (Rieder et al., 2017). A total of 174 genes are down-regulated in both datasets, comprising 19.7% of total down-regulated GAF targets and 50.1% of total down-regulated CLAMP targets (Figure 3—figure supplement 3A). While this demonstrates that a subset of genes requires both GAF and CLAMP for proper expression during ZGA, a majority of GAF-regulated genes only depend on GAF for expression, independent of CLAMP. GAF and CLAMP have similar, but not identical binding preferences (Kaye et al., 2018), which is reflected in their partially, but not completely overlapping genome occupancy (Figure 3—figure supplement 3B). These binding site differences likely explain their differential requirement during ZGA.

Having identified that GAF was required for ZGA, we wanted to ensure that the effects were not due to changes in the levels of Zld, the previously identified activator of the zygotic genome. Immunoblots for Zld on extract from GAFdeGradFP and control (sfGFP-GAF(N)) 2–2.5 hr AEL (stage 5) embryos confirmed that Zld levels were consistent between extracts (Figure 3—figure supplement 3C). Therefore, the effects of GAF on genome activation are not due to a loss of Zld. Reciprocally, we performed immunoblots on 2–2.5 hr AEL embryo extract for sfGFP-GAF(N) in a background in which maternal zld was depleted using RNAi driven by matα-GAL4-VP16 (Sun et al., 2015). We found that sfGFP-GAF(N) levels were unchanged upon zld knockdown compared to controls (Figure 3—figure supplement 3D), demonstrating that neither GAF nor Zld protein levels are affected by the knockdown of the other factor. Based on the roles for Zld and GAF in activating the zygotic genome, we investigated whether these proteins were required to activate distinct or overlapping target genes. We compared genes down-regulated in GAFdeGradFP embryos to genes down-regulated when Zld was inactivated optogenetically throughout zygotic genome activation (NC10-14) (McDaniel et al., 2019). We identified 135 genes down-regulated in both datasets, comprising 42% of the total number of down-regulated genes dependent on Zld and 15% of the total down-regulated genes dependent on GAF (Figure 3C). An even lower degree of overlap is observed when only direct targets are considered with 49 down-regulated targets shared between Zld and GAF, which have 232 and 397 direct targets, respectively. By contrast only 29 up-regulated genes were shared between the two datasets (Figure 3—figure supplement 3E). Genes that required both factors for activation include the gap genes previously mentioned (gt, kni, hkb, Kr, tll) as well as genes involved in cellular blastoderm formation such as slow as molasses (slam) and bottleneck (bnk). While Zld and GAF share some targets, they each are required for expression of hundreds of individual genes.

Activation of the zygotic genome is a gradual process that initiates with transcription of a small number of genes around NC8. Transcripts can therefore be divided into categories based on the timing of their initial transcription (Li et al., 2014; Lott et al., 2011). Previous data suggested that while Zld was required for activation of genes throughout the MZT, early genes were particularly sensitive to loss of Zld and that GAF might be functioning later (Schulz et al., 2015; Blythe and Wieschaus, 2016). To determine when GAF-dependent genes were expressed during ZGA, we took all the genes that could be classified based on their timing of activation during the MZT (as determined in Li et al., 2014) and divided them based on their dependence on Zld and GAF for activation: those down-regulated in both GAFdeGradFP and ZldCRY2 embryos (Both), those down-regulated only in ZldCRY2 embryos (ZldCRY2), and those down-regulated only in GAFdeGradFP embryos (GAFdeGradFP) (Figure 3D). Genes activated by Zld, both those regulated by Zld alone (ZldCRY2) and those regulated by both Zld and GAF (Both) were enriched for genes expressed early (NC10-11) and Mid (NC12-13). By contrast, 73% of genes activated by GAF alone (GAFdeGradFP) were expressed either late (early NC14) or later (late NC14) (p=1.3×10−14, log2(odds ratio)=3.13, two-tailed Fisher’s exact test). This analysis supports a model in which GAF and Zld are essential activators for the initial wave of ZGA and that GAF has an additional role, independent of Zld, in activating widespread transcription during NC14.

The majority of GAF and Zld-binding sites are occupied independently of the other factor

To further delineate the relationship between Zld- and GAF-mediated transcriptional activation during the MZT, we determined whether GAF binding was dependent on Zld. Not only are a substantial subset of genes dependent on both Zld and GAF for wild-type levels of expression (Figure 3C), but 42% of GAF-binding sites at stage 5 are also occupied by Zld (Harrison et al., 2011; Figure 4A,B). The GAF peaks that are co-bound with Zld include many of the strongest GAF peaks identified at stage 5 (Figure 4—figure supplement 1A). Zld is known to facilitate the binding of multiple different transcription factors (Twist, Dorsal, and Bicoid), likely by forming dynamic subnuclear hubs (Yáñez-Cuna et al., 2012; Xu et al., 2014; Foo et al., 2014; Mir et al., 2018; Dufourt et al., 2018; Yamada et al., 2019). Indeed, 19% of Dorsal sites depend on Zld for occupancy (Sun et al., 2015). We therefore examined GAF localization upon depletion of maternal zld using RNAi driven in the maternal germline by matα-GAL4-VP16 in a background containing sfGFP-GAF(N) (Sun et al., 2015). Immunoblot confirmed a nearly complete knockdown of Zld (Figure 4C), and we verified that RNAi-treated embryos failed to hatch. To assess the depletion of Zld on the subnuclear localization of GAF, we imaged sfGFP-GAF(N); zld-RNAi and control (sfGFP-GAF(N)) embryos using identical acquisition settings. We observed no difference in puncta formation of GAF at the beginning of ZGA (NC10) or late ZGA (NC14) (Figure 4D). We conclude that Zld is not required for GAF to form subnuclear puncta during the MZT.

Figure 4 with 2 supplements see all
At the majority of loci, GAF and Zld bind chromatin independently.

(A) Overlap of Zld- and GAF-binding sites determined by GAF-sfGFP(C) stage 5 ChIP-seq and Zld NC14 ChIP-seq (Harrison et al., 2011). (B) Representative genome browser tracks of Zld and GAF ChIP-seq peaks at the tailless locus. (C) Immunoblot for Zld on embryo extracts from zld-RNAi; sfGFP-GAF(N) and sfGFP-GAF(N) control embryos harvested 2–3 hr AEL. Tubulin was used as a loading control. (D) Images of zld-RNAi; sfGFP-GAF(N) embryos and sfGFP-GAF(N) control embryos at NC10 and NC14 as marked. Arrowhead shows nuclear dropout, a phenotype indicative of zld loss-of-function. Scale bar, 10 µm. (E) Correlation between log2(RPKM) of ChIP peaks for GAF from GAF-sfGFP(N) stage 5 embryos (control) and zld-RNAi; sfGFP-GAF(N) embryos (zld-RNAi). Color highlights significantly changed peaks (adjusted p-value<0.05, fold change > 2) and those that are bound by both GAF and Zld as indicated below. (F) Correlation between log2(RPKM) of ChIP peaks for Zld from sfGFP-GAF(N) embryos (control) and GAFdeGradFP embryos fixed 2–2.5 hr AEL. Color highlights significantly changed peaks (adjusted p-value<0.05, fold change > 2) and those that are bound by both GAF and Zld as indicated below. See also Figure 4—figure supplements 12.

To more specifically determine the impact of loss of Zld on GAF chromatin occupancy, we performed ChIP-seq with an anti-GFP antibody on embryos expressing zld-RNAi and sfGFP-GAF(N) along with paired sfGFP-GAF(N) controls at 2–2.5 hr AEL (stage 5). Mouse H3.3-GFP chromatin was used as a spike-in to normalize for immunoprecipitation efficiency between samples. In control sfGFP-GAF(N) embryos, we identified 6373 peaks, and these largely overlapped with the peaks identified in the GAF-sfGFP(C) embryos; 91% of the 4175 peaks identified in the GAF-sfGFP(C) embryos overlap with the peaks identified for the N-terminally tagged GAF (Figure 4—figure supplement 1B). This high degree of overlap indicates that our ChIP-seq experiments identified a robust set of high-confidence GAF-bound regions. To determine whether GAF requires Zld for binding, we analyzed GAF occupancy upon zld knockdown at this set of 3800 high-confidence peaks. The majority (3674) of these high-confidence GAF peaks were maintained in the zld-RNAi background and were bound at roughly equivalent levels when normalized to the spike-in control (Figure 4E and Figure 4—figure supplement 2A). Using DESeq2, we identified 126 GAF-bound regions that were significantly different between the sfGFP-GAF(N) controls and the zld-RNAi embryos (Figure 4E): 101 sites that were decreased and 25 sites that were increased.

To assess the impact of the loss of GAF on Zld binding, we completed the reciprocal experiment and performed ChIP-seq for Zld on GAFdeGradFP and control homozygous (sfGFP-GAF(N)) embryos at 2–2.5 hr AEL (stage 5). We identified a set of high-confidence Zld peaks, by overlapping the Zld peaks from our control (sfGFP-GAF(N)) embryos with previously published data for Zld at NC14 (Harrison et al., 2011). Similar to GAF binding when Zld is depleted, the majority of the 6003 high-confidence Zld peaks were maintained when GAF was depleted (Figure 4F). However, in the GAFdeGradFP ChIP data we observed a global reduction in Zld peak heights as compared to the control (Figure 4—figure supplement 2B). Therefore, it is possible that GAF knockdown causes a global reduction in Zld binding. However, technical, rather than biological differences, may account for this overall decrease (see Materials and methods). Among the high-confidence Zld peaks, DESeq2 identified 105 Zld-binding sites that differed in occupancy between GAFdeGradFP embryos and controls: 86 sites decreased and 19 sites increased.

Because of the limited number of significantly different peaks identified upon removal of either Zld (3.3% of GAF peaks changed) or GAF (1.7% of Zld peaks changed), we sought to determine whether there were global changes in the distribution of these factors. For this purpose, we ranked peaks in each dataset based on the number of reads per kilobase per million mapped reads (RPKM) and determined if the peak ranks were correlated between the mutant and control. We identified a high degree of correlation in peak rank for both comparisons (Pearson correlation, r = 0.93 for GAF ChIP in sfGFP-GAF(N) and sfGFP-GAF(N); zld-RNAi embryos and r = 0.88 for Zld ChIP in sfGFP-GAF(N) and GAFdeGradFP embryos; Figure 4—figure supplement 2C,D). Thus, the overall distribution of binding sites is maintained for each factor in the absence of the other.

At some loci, pioneer factors have been shown to function together to stabilize binding (Donaghey et al., 2018; Chronis et al., 2017). If either GAF or Zld stabilized genomic occupancy of the other factor, we would expect to identify a loss of binding specifically at loci that are co-bound by both factors. We identified a set of GAF and Zld co-occupied sites by overlapping our high-confidence GAF peaks and high-confidence Zld peaks. Peaks with at least 100 bp overlap were considered to be shared and used for our set of co-bound regions. To test likely direct effects of GAF on Zld occupancy, we determined the number regions that showed significant changes in Zld binding in GAFdeGradFP embryos that were bound by GAF. One out of the 86 regions at which Zld binding decreased in the GAF knockdown and 5 out of 19 that increased overlapped with regions bound by GAF (Figure 4F). Contrary to our expectation if GAF was directly promoting Zld occupancy, there was no enrichment for co-bound regions and those that lost Zld binding. We investigated the reciprocal effect of Zld on GAF occupancy by determining the number regions that showed significant changes in GAF binding in zld-RNAi embryos that were also bound by Zld. Fifty-one of the 101 regions that had decreased GAF binding were also bound by Zld, suggesting a potential direct effect of Zld on GAF occupancy at these regions. By contrast, only 6 of the 25 regions that had increased GAF binding were also bound by Zld (Figure 4E). GAF-binding regions that were decreased in the absence of Zld were significantly enriched for co-bound sites as compared to those GAF-binding sites that were maintained upon Zld depletion (p=4.8×10−5, log2(odds ratio)=1.19, two-tailed Fisher’s exact test). Based on the enrichment for Zld binding at these GAF-bound decreased sites, we hypothesized that Zld might be pioneering chromatin accessibility at these regions to promote GAF binding. Indeed, these regions were enriched for loci that depend on Zld for chromatin accessibility (p=2.2×10−16, log2(odds ratio)=4.07, two-tailed Fisher’s exact test): 38 of the 51 GAF and Zld co-bound sites at which GAF requires Zld for occupancy also require Zld for accessibility (Hannon et al., 2017). Thus, at this limited set of regions the pioneer factor Zld may function to establish accessible chromatin that is necessary to promote robust GAF binding. Nonetheless, the majority of Zld- and GAF-binding sites are occupied in the absence of the other factor.

GAF is essential for accessibility at hundreds of loci during ZGA

GAF interacts with chromatin remodelers and is required to maintain chromatin accessibility in tissue culture (Okada and Hirose, 1998; Tsukiyama et al., 1994; Tsukiyama and Wu, 1995; Xiao et al., 2001; Judd et al., 2021; Fuda et al., 2015). Furthermore, GAF-binding motifs are enriched at regions of open chromatin that are established at NC12 and NC13 and that are bound by Zld but do not depend on Zld for accessibility (Schulz et al., 2015; Sun et al., 2015; Blythe and Wieschaus, 2016). To directly test the function of GAF in determining accessible chromatin domains during the MZT, we performed the assay for transposase-accessible chromatin (ATAC)-seq on six replicates of single GAFdeGradFP and control (sfGFP-GAF(N)) embryos harvested 2–2.5 hr AEL. His2AV-RFP signal for each embryo was visually inspected prior to performing ATAC-seq to ensure that embryos did not have grossly distorted nuclear morphology. In contrast to our control, replicates from the GAFdeGradFP embryos showed higher variability (Figure 5—figure supplement 1A), suggesting that developmental defects caused by the lack of GAF might have lowered our ability to detect subtle changes in accessibility. Nonetheless, we identified 1523 regions with significant changes in accessibility in GAFdeGradFP embryos as compared to controls; 607 regions lost accessibility and 916 regions gained accessibility (Figure 5A, Figure 5—figure supplement 1B,C). Sites that lost accessibility were among the regions with the highest ATAC-signal (Figure 5—figure supplement 1C). 32% (197) of the regions that lost accessibility were significantly enriched for GAF binding, as determined by stage 5 GAF-sfGFP(C) ChIP-seq, when compared to open regions with no change in accessibility (p=2.2×10−16, log2(odds ratio)=2.4, two-tailed Fisher’s exact test, Figure 5—figure supplement 1D). The enrichment for GAF-binding sites in those regions that depend on GAF for accessibility suggests that GAF may directly drive accessibility at these regions. Therefore, we focused our analysis on the subset of regions that depend on GAF for accessibility, which we define as those that lost accessibility in the absence of GAF. Consistent with the enrichment of GAF-binding sites in promoters, 45% of all regions that depend on GAF for accessibility were in promoters. Enrichment for promoters was even larger in those regions that were bound by GAF and dependent on GAF for accessibility (Figure 5B). By contrast, regions that gain accessibility in the absence of GAF are not enriched for promoters, supporting that this is likely an indirect effect of GAF knockdown (Figure 5B). Ubx, a previously identified embryonic GAF-target gene, is an example of a gene that requires GAF for chromatin accessibility at the promoter (Figure 5C; Farkas et al., 1994). Ubx similarly requires GAF occupancy for gene expression as the transcript was significantly down-regulated in GAFdeGradFP embryos (Figure 5C). Thus, GAF binding at the Ubx promoter is required for both chromatin accessibility and gene expression. Together, our data demonstrate that GAF is required for chromatin accessibility at hundreds of loci during the MZT, and that this activity occurs preferentially at promoters.

Figure 5 with 1 supplement see all
GAF is required for chromatin accessibility.

(A) Volcano plot of regions that change in accessibility in GAFdeGradFP embryos as compared to sfGFP-GAF(N) controls, stage 5 GAF-sfGFP(C) ChIP-seq was used to identify GAF-bound target regions. (B) Genomic distribution of all regions that lose accessibility (Lose accessibility (all)), regions that lose accessibility and are GAF-bound (Lose accessibility (GAF-bound), regions that did not change significantly in accessibility (Non-significant), and regions that gain accessibility (Gain accessibility)). (C) Genome browser tracks of GAF-sfGFP(C) ChIP-seq at stage 3 and stage 5, ATAC-seq on wild-type embryos at NC12, NC13, and stage 5 along with control (sfGFP-GAF(N)) and GAFdeGradFP embryos, and RNA-seq from control (sfGFP-GAF(N)) and GAFdeGradFP embryos. NC12 and NC13 ATAC-seq data is from Blythe and Wieschaus, 2016. ATAC-seq data for stage 5 embryos is from Nevil et al., 2020. Region highlighted in green indicates the GAF-dependent, GAF-bound Ubx promoter. See also Figure 5—figure supplement 1.

GAF and Zld are individually required for chromatin accessibility at distinct regions

Previous work from our lab and others suggested that during the MZT, GAF may be responsible for maintaining chromatin accessibility at Zld-bound regions in the absence of Zld (Schulz et al., 2015; Moshe and Kaplan, 2017; Sun et al., 2015). To more broadly investigate how GAF and Zld shape chromatin accessibility during early development, we focused on regions co-occupied by both factors as identified by ChIP-seq (this work and Harrison et al., 2011). We compared our single embryo ATAC-seq data to ATAC-seq data for wild-type NC14 embryos and NC14 embryos lacking maternal zld (zld-) (Figure 6A; Hannon et al., 2017). Only seven of the 1192 regions co-bound by both Zld and GAF decreased in accessibility upon removal of either factor. By contrast, 104 regions require GAF for accessibility and 190 require Zld. Regions that require GAF for accessibility had a higher average GAF ChIP-seq peak height than regions that require Zld. Similarly, the Zld ChIP-seq signal was higher in regions where Zld is necessary for accessibility. Thus, both GAF and Zld are individually required for accessibility at distinct genomic regions co-occupied by both factors, and this requirement is correlated with occupancy as reflected in ChIP-seq peak height (Figure 6A). GAF is also required for accessibility at 83 additional regions at which GAF is bound without Zld (Figure 6—figure supplement 1). The majority of sites bound by both Zld and GAF (891) did not change in chromatin accessibility when either factor was removed, suggesting that at these locations GAF and Zld may function redundantly to facilitate chromatin accessibility or that other factors were sufficient to maintain accessibility at these sites. Thus, during the MZT GAF and Zld are individually required for chromatin accessibility at distinct co-bound regions. At very few co-bound regions are both factors individually required.

Figure 6 with 2 supplements see all
GAF and Zld independently shape chromatin accessibility over the MZT.

(A) Heatmaps of ChIP-seq and ATAC-seq data, as indicated above, for regions bound by both GAF and Zld and subdivided based on the change of accessibility in the absence of either factor. (B) Number of regions that depend on GAF for accessibility and are bound by GAF that are accessible at NC11, NC12, and NC13. (C) Number of regions that depend on Zld for accessibility and are bound by Zld that are accessible at NC11, NC12, and NC13. NC11, NC12, and NC13 data are from Blythe and Wieschaus, 2016. Zld ChIP-seq data are from Harrison et al., 2011. ATAC-seq data from zld germline clones (zld-) are from Hannon et al., 2017. Total number of regions accessible at NC11 = 3084, NC12 = 6487, and NC13 = 9824. See also Figure 6—figure supplements 12.

Previous work demonstrated that the Zld-binding motif is enriched at accessible regions that are established by NC11. By contrast, GAF-binding motifs are enriched at regions that dynamically gain chromatin accessibility later during the MZT at NC12 or NC13 (Blythe and Wieschaus, 2016). If Zld preferentially drives chromatin accessibility early in the MZT and GAF is preferentially required later, we would expect that regions that require GAF for accessibility would be enriched for regions that gain accessibility at NC12 and NC13. To test this prediction, we compared our ATAC-seq data to ATAC-seq on embryos precisely staged by nuclear cycle (Blythe and Wieschaus, 2016). Of the GAF-bound, GAF dependent accessible regions that were identified in the staged ATAC-seq data, three were open by NC11, 17 were newly opened at NC12, and 42 were newly opened at NC13 (Figure 6B, Figure 6—figure supplement 2A), demonstrating that GAF-bound, GAF-dependent regions are enriched for regions that dynamically open at NC12 or NC13 as compared to regions that are already accessible at NC11 (p=5.7×10−7, log2(odds ratio)=3.2, two-tailed Fisher’s exact test). Indeed, the Ubx promoter dynamically gained accessibility at NC13 (Figure 5C). Confirming that the loss of accessibility observed at late-opening regions was not due to differences in staging between the control (sfGFP-GAF(N)) and GAFdeGradFP embryos, the vast majority of sites that gained accessibility at NC13 (2452) were unchanged in accessibility in the GAFdeGradFP embryos (Figure 6—figure supplement 2). In contrast to GAF, Zld-bound, Zld-dependent accessible regions were enriched for regions already accessible at NC11: 1182 were open by NC11, 343 were newly opened at NC12, and 244 were newly opened at NC13 (p<2.2×10−16, log2(odds ratio)=2.7, two-tailed Fisher’s exact test) (Figure 6C). Together, these analyses show that GAF is preferentially required for chromatin accessibility at regions that gain accessibility over the MZT, while Zld is required for accessibility at regions that are open early.

The preferential requirement for GAF-mediated accessibility during NC12 and NC13 allowed us to test whether GAF binding preceded the establishment of chromatin accessibility at these regions or whether this accessibility was co-incident with GAF binding. We determined whether the 42 regions that depended on GAF for accessibility and that gained accessibility at NC13 were already bound by GAF at stage 3 (NC9) or whether these regions were among those that were only occupied by GAF at stage 5 (NC14). All 42 of these regions were bound by GAF at stage 3 and stage 5, demonstrating that GAF occupancy at these regions precedes accessibility and supports a role for GAF as a pioneer factor at these regions.

Discussion

Through a combination of Cas9-genome engineering and the deGradFP system, we have depleted maternally encoded GAF and demonstrated that GAF is required for progression through the MZT. Along with Zld, GAF is broadly required for activation of the zygotic genome and for shaping chromatin accessibility. During the major wave of ZGA, when thousands of genes are transcribed, Zld and GAF are largely independently required for both transcriptional activation and chromatin accessibility (Figure 7). Thus, in Drosophila, as in mice, zebrafish, and humans, transcriptional activation during the MZT is driven by multiple factors with pioneering characteristics (Schulz and Harrison, 2019; Vastenhouw et al., 2019). Together our system has enabled us to begin to determine how these two powerful factors collaborate to ensure the rapid, efficient transition from specified germ cells to a pluripotent cell population.

Zld and GAF independently regulate embryonic reprogramming.

(A) During the minor wave of ZGA (NC10-13), Zld is the predominant factor required for driving expression of genes bound by Zld alone and genes bound by both Zld and GAF. (B) As the genome is more broadly activated during NC14, GAF becomes the major factor in driving zygotic transcription. (C) Early in the MZT, Zld is required for chromatin accessibility at many regions co-bound by GAF and Zld. When zld is depleted, accessibility is lost at a subset of regions, but GAF remains bound at the majority of sites. (D) Late during the MZT, GAF is required for chromatin accessibility at many Zld-bound regions. In GAFdeGradFP embryos accessibility is lost at a subset of sites, but Zld remains bound.

Maternally encoded GAF is essential for development beyond the MZT

Using the deGradFP system, we demonstrated that maternally encoded GAF is required for embryogenesis. In contrast to zygotic GAF null embryos, which survive until the third instar larval stage, GAFdeGradFP embryos do not hatch (Farkas et al., 1994). Our imaging showed that depletion of GAF in the early embryo resulted in severe defects, including asynchronous mitosis, anaphase bridges, nuclear dropout, and high nuclear mobility (Figure 2C,D, Videos 1 and 2). Similar defects are seen when zld is maternally depleted from embryos (Liang et al., 2008; Staudt et al., 2006) and like embryos lacking maternal zld, GAFdeGradFP embryos died before the completion of the MZT. Distinct from embryos lacking maternal zld, a subset of GAFdeGradFP embryos display more intense defects. We identified defects in a subset of GAFdeGradFP embryos as early as NC10, which is consistent with our genomics data indicating that GAF is required for proper expression of some of the earliest genes expressed during ZGA (Figure 3D). It is possible that some of the phenotypic defects observed in GAFdeGradFP embryos are the result of GAF knockdown in the female germline. However, GAF null female germline clones are unable to produce eggs (Bejarano and Busturia, 2004), which stands in contrast to the large number of GAFdeGradFP embryos produced by females expressing sfGFP-GAF(N) and the deGradFP nanobody fusion. Together these phenotypic differences between null mutants and our deGrad-based knockdown along with our genomics data identifying misregulation of previously identified embryonic GAF-targets in the GAFdeGradFP embryos demonstrate an essential function for GAF in the early embryo.

Live imaging of GFP-tagged, endogenously encoded GAF demonstrated that GAF is mitotically retained in small foci. This is similar to what was reported for antibody staining on fixed embryos, which showed GAF localized at pericentric heterochromatin regions of GA-rich satellite repeats (Raff et al., 1994; Platero et al., 1998). Because this prior imaging necessitated fixing embryos, it was unclear if mitotic retention of GAF was required for mitosis. Our system enabled us to determine that in the absence of GAF nuclei can undergo several rounds of mitosis albeit with noticeable defects (Videos 1 and 2). We conclude that GAF is not strictly required for progression through mitosis. However, the nuclear defects observed in GAFdeGradFP embryos support the model that GAF is broadly required for nuclear division and chromosome stability, in addition to its role in transcriptional activation during ZGA (Bhat et al., 1996). Our imaging also identified high nuclear mobility in GAFdeGradFP embryos as compared to control embryos; nuclei of a subset of embryos showed a dramatic ‘swirling’ pattern of movement (Video 2). This defect is potentially due a DNA damage response that inactivates centrosomes to promote nuclear fallout (Sibon et al., 2000; Takada et al., 2003), or general disorder in the cytoskeletal network that is responsible for nuclear migration and division in the syncytial embryo (Sullivan and Theurkauf, 1995). Altogether, our phenotypic analysis of GAFdeGradFP embryos shows for the first time that maternal GAF is required for progression through the MZT, and suggests GAF has an early, global role in nuclear division and chromosome stability.

In addition to GAF and Zld, the transcription factor CLAMP is expressed in the early embryo and functions in chromatin accessibility and transcriptional activation (Rieder et al., 2017; Soruco et al., 2013; Urban et al., 2017a; Urban et al., 2017b; Rieder et al., 2019). While CLAMP, like GAF, binds GA-dinucleotide repeats, the two proteins preferentially bind to slightly different GA-repeats (Kaye et al., 2018). GAF and CLAMP can compete for binding sites in vitro, and, in cell culture, when one factor is knocked down the occupancy of the other increases, suggesting that CLAMP and GAF compete for a subset of binding sites and may have partial functional redundancy (Kaye et al., 2018). We demonstrate that GAF, like CLAMP, is essential in the early embryo, indicating that these two GA-dinucleotide-binding proteins cannot completely compensate for each other in vivo during early development (Rieder et al., 2017). Furthermore, our sequencing analysis identified that a majority of genes that require GAF for expression are distinct from those that are regulated by CLAMP. Thus, while GAF and CLAMP may have some overlapping functions, they are independently required to regulate embryonic development during the MZT.

GAF is necessary for widespread zygotic genome activation and chromatin accessibility

GAF is a multi-purpose transcription factor with known roles in transcriptional regulation at promoters and enhancers as well as additional suggested roles in high-order chromatin structure. Our analysis showed that during the MZT GAF acts largely as an activator, directly binding and activating hundreds of zygotic transcripts during ZGA (Figure 3). We identified thousands of regions bound by GAF throughout the MZT, and these regions were preferentially associated with genes whose transcription decreased when maternally encoded GAF was degraded. This function may be driven, in part, through GAF-mediated chromatin accessibility as we identified hundreds of regions that depend on GAF for accessibility (Figure 5). This activity, in both transcriptional activation and mediating open chromatin, is similar to Zld, the only previously identified essential activator of the zygotic genome in Drosophila, and is shared with genome activators in other species, such as Pou5f3 and Nanog (zebrafish) and Dux (mammals) (Schulz and Harrison, 2019; Vastenhouw et al., 2019). Our data support a pioneer-factor like role for GAF in this process as we demonstrated that GAF is already bound early in the MZT to regions that require GAF for accessibility later. Thus, GAF occupancy precedes GAF-mediated accessibility at a subset of sites.

In addition to a direct role in mediating chromatin accessibility through the recruitment of chromatin remodelers (Okada and Hirose, 1998; Tsukiyama et al., 1994; Tsukiyama and Wu, 1995; Xiao et al., 2001; Judd et al., 2021; Fuda et al., 2015), GAF may also indirectly affect chromatin accessibility through a role in shaping three-dimensional chromatin structure. Sixty-seven percent of the regions that lost chromatin accessibility in the absence of GAF did not overlap a GAF-binding site as identified by ChIP-seq. Thus, at these regions GAF may function indirectly through the ability to facilitate enhancer-promoter loops (Mahmoudi et al., 2002; Melnikova et al., 2004; Petrascheck et al., 2005). In addition, GAF-binding motifs are enriched at TAD boundaries which form during the MZT (Hug et al., 2017), and GAF binding is enriched at Polycomb group dependent repressive loops that form following NC14 (Ogiyama et al., 2018). Further investigation is necessary to determine the role of GAF in establishing three-dimensional chromatin architecture in the early embryo.

GAF and Zld are uniquely required to reprogram the zygotic genome

Reprogramming in culture requires a cocktail of transcription factors that possess pioneer factor activity (Takahashi and Yamanaka, 2006; Soufi et al., 2012; Soufi et al., 2015). Similarly, in zebrafish and mice multiple pioneering factors are required for the rapid and efficient reprogramming that occurs during the MZT (Schulz and Harrison, 2019; Vastenhouw et al., 2019). Here we have shown that, in addition to the essential pioneer factor Zld, GAF is a pioneer-like factor required for both gene expression and chromatin accessibility in the early Drosophila embryo. By analyzing the individual contributions of these two factors, we have begun to elucidate the different mechanisms by which multiple pioneer factors can drive dramatic changes in cell identity.

While some pioneer factors work together to stabilize their interaction on chromatin (Chronis et al., 2017; Donaghey et al., 2018; Liu and Kraus, 2017; Swinstead et al., 2016) our data support primarily independent genome occupancy by Zld and GAF. Our reciprocal ChIP-seq data demonstrated that at regions bound by GAF and Zld, binding of each factor was largely retained in the absence of the other (Figure 4). Analysis of chromatin accessibility further supports these independent roles. We would predict that if Zld binding were lost in the absence of GAF or vice versa that at regions occupied by both factors either Zld or GAF individually would be required for accessibility. However, we identified only seven regions that lost accessibility in the absence of either Zld or GAF. By contrast, we identified more than one hundred regions that were individually dependent on each factor alone, and 891 regions that did not change in accessibility upon loss of either Zld or GAF. These data support independent roles for Zld and GAF in establishing or maintaining accessibility. Furthermore, we propose that because each factor can retain genome occupancy in the absence of the other that this may explain the 891 regions that remain accessible in the absence of either Zld or GAF: in the absence of Zld, GAF may be able to maintain accessibility and vice versa. However, until we investigate the chromatin landscape upon the removal of both Zld and GAF we cannot rule out that at these regions other factors may be instrumental in maintaining accessibility. Indeed, Duan et al. identify an essential role for the GA-dinucleotide-binding protein CLAMP in directing Zld binding and promoting chromatin accessibility (Duan et al., 2020).

At most loci, our data support independent binding for GAF and Zld. Nonetheless, we cannot eliminate a possible global role for GAF on Zld occupancy as the Zld ChIP-seq signal is considerably lower in the GAFdeGradFP embryos as compared to controls. We would predict that if GAF is directly functioning to stabilize Zld binding then this effect would be more evident at regions where both GAF and Zld bind, but this is not what we observed. Nonetheless, because of the proposed role of GAF in regulating three-dimensional chromatin architecture, GAF may be globally required for robust Zld occupancy. Furthermore, at a small subset of loci our data support a pioneering role for Zld in promoting GAF binding. In this experiment, the endogenously tagged GAF allowed us to use antibodies directed against the GFP epitope for ChIP. The use of the epitope tag enabled us to robustly control for immunoprecipitation efficiency by using another GFP-tagged protein as a spike-in control for the immunoprecipitation of GAF. In this manner, we identified 101 GAF-bound regions that decreased upon Zld knockdown. Of these, 51 were Zld-bound regions and 38 of these depend on Zld for chromatin accessibility. Thus, at a small subset of sites the pioneering function of Zld may be required to stabilize GAF binding.

We propose that there is a handoff between Zld and GAF pioneer-like activity as the embryo progresses through the MZT: Zld functions primarily at the initiation of zygotic genome activation and GAF functions primarily later during the major wave of zygotic transcription. We identified that genes regulated by Zld are enriched for those activated during NC10-13, while genes that depend on GAF are enriched for those that initiate expression during NC14. Similarly, we determined that the majority of regions that require Zld for accessibility are already accessible at NC11 while those that require GAF are enriched for regions that gain accessibility at NC12 and NC13. This is supported by prior analysis that showed that the Zld-binding motif is enriched at regions that are already accessible at NC11 and the GAF-binding motif is enriched at regions that dynamically gain accessibility later during the MZT (Blythe and Wieschaus, 2016). Based on this evidence, we propose that there is a gradual handoff in the control of transcriptional activation and chromatin remodeling from Zld to GAF as the MZT progresses (Figure 7). During the initial stages of the MZT, the nuclear division cycle is an incredibly rapid series of synthesis and mitotic phases. At NC14, this cycle slows, and these different dynamics likely influence the mechanisms by which accessibility can be established. While it is unclear how Zld establishes accessibility, GAF interacts with chromatin remodelers. It is possible that the activities of these complexes may have a more substantial impact once the division cycle slows.

Together our data support the requirement for at least two pioneer-like transcription factors, Zld and GAF, to sequentially reprogram the zygotic genome following fertilization and allow for future embryonic development. It is likely that there are additional factors that function along with Zld and GAF to define accessible cis-regulatory regions and drive genome activation. Future studies will enable more detailed mechanistic insights into how multiple pioneering factors work together to reshape the transcriptional landscape and transform cell fate in the early embryo.

Materials and methods

Key resources table
Reagent type (species) or resourceDesignationSource or referenceIdentifiersAdditional information
Genetic reagent (Drosophila melanogaster)w1118Bloomington Drosophila Stock CenterBDSC:3605;
FLYB:FBal0018186;
RRID:BDSC_3605
Genetic reagent (D. melanogaster)His2AV-RFP (II)Bloomington Drosophila Stock CenterBDSC:23651;
FLYB:FBti0077845;
RRID:BDSC_23651
Genetic reagent (D. melanogaster)mat-α-GAL4-VP16Bloomington Drosophila Stock CenterBDSC:7062;
FLYB:FBti0016915;
RRID:BDSC_7062
Genetic reagent (D. melanogaster)UAS-shRNA-zldSun et al., 2015
DOI:10.1101/gr.192542.115
Genetic reagent (D. melanogaster)sfGFP-GAF(N)This paperCas9 edited allele
Genetic reagent (D. melanogaster)GAF-sfGFP(C)This paperCas9 edited allele
Genetic reagent (D. melanogaster)nos-deGradFPThis paperTransgenic insertion into PBac{yellow[+]-attP-3B}VK00037 docking site (BDSC:9752) (FLYB: FBti0076455)
NSlmb-vhhGFP4 amplified from BDSC:58740
AntibodyAnti-GFP (rabbit polyclonal)AbcamCat# ab290ChIP (6 μg)
WB (1:2000)
AntibodyAnti-Zld (rabbit polyclonal)Harrison et al., 2010
DOI:10.1016/j.ydbio.2010.06.026
ChIP (8 μg)
WB (1:750)
AntibodyAnti-alpha tubulin (mouse monoclonal)Sigma-AldrichCat# T6199WB (1:5000)
AntibodyAnti-rabbit IgG-HRP (Goat, secondary)Bio-RadCat#1706515WB (1:3000)
AntibodyAnti-mouse IgG-HRP (Goat, secondary)Bio-RadCat#1706516WB (1:3000)
cell line Mus musculusH3.3-GFPThis paperCell line maintained in the lab of Peter Lewis
software, algorithmRhttp://www.R-project.org
software, algorithmbowtie 2 v2.3.5Langmead and Salzberg, 2012
software, algorithmSamtools v1.11http://www.htslib.org/
software, algorithmMACS v2Zhang et al., 2008
software, algorithmGenomicRanges R packageLawrence et al., 2013
software, algorithmDeepToolsRamírez et al., 2016
software, algorithmMEME-suiteBailey et al., 2009
software, algorithmGviz R packageHahne and Ivanek, 2016`
software, algorithmSubread (v1.6.4)Liao et al., 2014
software, algorithmDESeq2 R packageLove et al., 2014
software, algorithmHISAT v2.1.0Kim et al., 2015
software, algorithmNGMergeGaspar, 2018

Drosophila strains and genetics

Request a detailed protocol

All stocks were grown on molasses food at 25°C. Fly strains used in this study: w1118, His2Av-RFP (II) (Bloomington Drosophila Stock Center (BDSC) #23651), mat-α-GAL4-VP16 (BDSC #7062), UAS-shRNA-zld (Sun et al., 2015). sfGFP-GAF(N) and GAF-sfGFP(C) mutant alleles were generated using Cas9-mediated genome engineering (see below).

nos-deGradFP (II) transgenic flies were made by PhiC31 integrase-mediated transgenesis into the PBac{yellow[+]-attP-3B}VK00037 docking site (BDSC #9752) by BestGene Inc. The sequence for NSlmb-vhhGFP4 was obtained from Caussinus et al., 2012 and amplified from genomic material from UASp-Nslmb.vhhGFP4 (BDSC #58740). The NSlmb-vhhGFP4 sequence was cloned using Gibson assembly into pattB (DGRC #1420) with the nanos promoter and 5’UTR.

To obtain the embryos for ChIP-seq and live embryo imaging in a zld knockdown background, we crossed mat-α-GAL4-VP16 (II)/Cyo; sfGFP-GAF N(III) flies to UAS-shRNA-zld (III) flies and took mat-α-GAL4-VP16/+ (II); sfGFP-GAF(N)/ UAS-shRNA-zld (III) females. These females were crossed to their siblings, and their embryos were collected. For controls, mat-α-GAL4-VP16 (II)/Cyo; sfGFP-GAF N(III) flies were crossed to w1118 flies and embryos from mat-α-GAL4-VP16/+(II): sfGFP-GAF(N)/+(III) females crossed to their siblings were collected.

To obtain embryos for live imaging, hatching rate assays, RNA-seq, ATAC-seq, and ChIP-seq in a GAF knockdown background we crossed nos-degradFP (II); sfGFP-GAF(N)/TM6c (III) flies to His2Av-RFP (II); sfGFP-GAF(N) (III) flies and selected females that were nos-degradFP/His2Av-RFP (II); sfGFP-GAF(N) (III). These females were crossed to their siblings of the same genotype, and their embryos were collected. Embryos from His2Av-RFP (II); sfGFP-GAF(N) (III) females were used as paired controls.

Cas9-genome engineering

Request a detailed protocol

Cas9-mediated genome engineering as described in Hamm et al., 2017 was used to generate the N-terminal and C-terminal super folder Green Fluorescent Protein (sfGFP)-tagged GAF. The double-stranded DNA (dsDNA) donor was created using Gibson assembly (New England BioLabs, Ipswich, MA) with 1 kb homology arms flanking the sfGFP tag and GAF N-terminal or C-terminal open reading frame. sfGFP sequence was placed downstream of the GAF start codon (N-terminal) or just upstream of the stop codon in the fifth GAF exon, coding for the short isoform (C-terminal). Additionally, a 3xP3-DsRed cassette flanked by the long-terminal repeats of PiggyBac transposase was placed in the second GAF intron (N-terminal) or fourth GAF intron (C-terminal) for selection. The guide RNA sequences (N-terminal- TAAACATTAAATCGTCGTGT), (C-terminal- AAATGAATACTCGATTA) were cloned into pBSK under the U63 promoter using inverse PCR. Purified plasmid was injected into embryos of yw; attP40{nos-Cas9}/CyO for the N-terminal line and y1 M{vas-Cas9.RFP-}ZH-2A w1118 (BDSC#55821) for the C-terminal line by BestGene Inc Lines were screened for DsRed expression to verify integration. The entire 3xP3-DsRed cassette was cleanly removed using piggyBac transposase, followed by sequence confirmation of precise tag integration.

Live embryo imaging

Request a detailed protocol

Embryos were dechorionated in 50% bleach for 2 min and subsequently mounted in halocarbon 700 oil. Due to the fragility of the GAFdegradFP embryos, embryos used for videos were mounted in halocarbon 700 oil without dechorionation. The living embryos were imaged on a Nikon A1R+ confocal at the University of Wisconsin-Madison Biochemistry Department Optical Core. Nuclear density, based on the number of nuclei/2500 μm2, was used to determine the cycle of pre-gastrulation embryos. Nuclei were marked with His2AV-RFP. Image J (Schindelin et al., 2012) was used for post-acquisition image processing. Videos were acquired at 1 frame every 10 s. Playback rate is seven frames/second.

Hatching rate assays

Request a detailed protocol

A minimum of 50 females and 25 males of the indicated genotypes were allowed to mate for at least 24 hr before lays were taken for hatching rate assays. Embryos were picked from overnight lays and approximately 200 were lined up on a fresh molasses plate. Unhatched embryos were counted 26 hr or more after embryos were selected.

Immunoblotting

Request a detailed protocol

Proteins were transferred to 0.45 μm Immobilon-P PVDF membrane (Millipore, Burlington, MA) in transfer buffer (25 mM Tris, 200 mM Glycine, 20% methanol) for 60 min (75 min for Zld) at 500mA at 4°C. The membranes were blocked with blotto (2.5% non-fat dry milk, 0.5% BSA, 0.5% NP-40, in TBST) for 30 min at room temperature and then incubated with anti-GFP (1:2000, #ab290) (Abcam, Cambridge, United Kingdom) anti-Zld (1:750) (Harrison et al., 2010), or anti-tubulin (DM1A, 1:5000) (Sigma, St. Louis, MO), overnight at 4°C. The secondary incubation was performed with goat anti-rabbit IgG-HRP conjugate (1:3000) (Bio-Rad, Hercules, CA) or anti-mouse IgG-HRP conjugate (1:3000) (Bio-Rad) for 1 hr at room temperature. Blots were treated with SuperSignal West Pico PLUS chemiluminescent substrate (Thermo Fisher Scientific, Waltham, MA) and visualized using the Azure Biosystems c600 or Kodak/Carestream BioMax Film (VWR, Radnor, PA).

Fluorescent quantification of nuclei

Request a detailed protocol

2–2.5 hr AEL embryos were dechorionated in bleach and subsequently mounted in halocarbon 700 oil. Embryos were imaged on a Nikon Ti-2e Epifluorescent microscope using ×60 magnification. Images were acquired of a single z-plane. Analysis was performed using the Nikon analysis software. Ten circular regions of interest (ROIs) were drawn around individual nuclei and the mean fluorescent intensity of each nucleus was calculated. Ten circular ROIs were drawn in the regions outside of the nuclei to measure the background fluorescent level of the embryo, and the mean fluorescent intensity of the background was calculated. To normalize values to the background fluorescent intensity, the final mean intensity of the nuclei was determined to be the mean fluorescent intensity of the nuclei after subtracting the mean fluorescent intensity of the background. This analysis was performed on images from 27 GAF-sfGFP(C) homozygous embryos and 26 sfGFP-GAF(N) homozygous embryos.

Chromatin immunoprecipitation

Request a detailed protocol

ChIP was performed as described previously (Blythe and Wieschaus, 2015) on: stage 3 and stage five hand selected GAF-sfGFP(C) homozygous embryos, stage 3 and stage five hand selected w1118 embryos, 2–2.5 hr AEL hand selected embryos from mat-α-GAL4-VP16/+ (II); sfGFP-GAF(N)/ UAS-shRNA-zld (III) females and mat-α-GAL4-VP16/+ (II); sfGFP-GAF(N)/+ (III) females, 2–2.5 hr AEL embryos from nos-degradFP/His2Av-RFP (II); sfGFP-GAF(N) (III) females, and 2–2.5 hr AEL embryos from His2Av-RFP (II); sfGFP-GAF(N) (III) females. For each genotype, two biological replicates were collected. Briefly, 1000 stage 3 embryos or 400–500 stage five embryos were collected, dechorionated in 50% bleach for 3 min, fixed for 15 min in 0.45% formaldehyde and then lysed in 1 mL of RIPA buffer (50 mM Tris-HCl pH 8.0, 0.1% SDS, 1% Triton X-100, 0.5% sodium deoxycholate, and 150 mM NaCl). The fixed chromatin was then sonicated for 20 s 11 times at 20% output and full duty cycle (Branson Sonifier 250). Chromatin was incubated with 6 μg of anti-GFP antibody (Abcam #ab290) or 8 μl of anti-Zld antibody (Harrison et al., 2010) overnight at 4°C, and then bound to 50 μl of Protein A magnetic beads (Dynabeads Protein A, Thermo Fisher Scientific). The purified chromatin was then washed, eluted, and treated with 90 μg of RNaseA (37°C, for 30 min) and 100 μg of Proteinase K (65°C, overnight). The DNA was purified using phenol/chloroform extraction and concentrated by ethanol precipitation. Each sample was resuspended in 25 μl of water. Sequencing libraries were made using the NEB Next Ultra II library kit and were sequenced on the Illumina Hi-Seq4000 using 50 bp single-end reads or the Illumina NextSeq 500 using 75 bp single-end reads at the Northwestern Sequencing Core (NUCore).

ChIP-seq data analysis

Request a detailed protocol

ChIP-seq data was aligned to the Drosophila melanogaster reference genome (version dm6) using bowtie 2 v2.3.5 (Langmead and Salzberg, 2012) with the following non-default parameters: -k 2, --very-sensitive. Aligned reads with a mapping quality < 30 were discarded, as were reads aligning to scaffolds or the mitochondrial genome. To identify regions that were enriched in immunoprecipitated samples relative to input controls, peak calling was performed using MACS v2 (Zhang et al., 2008) with the following parameters: -g 1.2e8, --call-summits. To focus analysis on robust, high-quality peaks, we used 100 bp up- and downstream of peak summits, and retained only peaks that were detected in both replicates and overlapped by at least 100 bp. All downstream analysis focused on these high-quality peaks. Peak calling was also performed for control ChIP samples performed on w1118 with the α-GFP antibody. No peaks were called in any of the w1118 controls, indicating high specificity of the α-GFP antibody. To compare GAF-binding sites at stage 3, stage 5 and in S2 cells, and to compare GAF, Zld and CLAMP binding, we used GenomicRanges R package (Lawrence et al., 2013) to compare different sets of peaks. Peaks overlapping by at least 100 bp were considered to be shared. To control for differences in data processing and analysis between studies, previously published ChIP-seq datasets for Zld (Harrison et al., 2011, GSE30757), GAF (Fuda et al., 2015, GSE40646), and CLAMP (Rieder et al., 2019, GSE133637) were processed in parallel with ChIP-seq data sets generated in this study. DeepTools (Ramírez et al., 2016) was used to generate read depth for 10 bp bins across the genome. A z-score was calculated for each 10 bp bin using the mean and standard deviation of read depth across all 10 bp bins. Z-score normalized read depth was used to generate heatmaps and metaplots. De novo motif discovery was performed using MEME-suite (Bailey et al., 2009). Genome browser tracks were generated using raw bigWig files with the Gviz package in R (Hahne and Ivanek, 2016). Numbers used for all Fisher’s exact tests are included in Supplementary file 4.

Spike-in normalization for GAF ChIP in zld-RNAi background

Request a detailed protocol

Chromatin for GAF ChIP following depletion of Zld was prepared as described above. Prior to addition of the anti-GFP antibody, mouse chromatin prepared from cells expression an H3.3-GFP fusion protein was added to Drosophila chromatin at a 1:750 ratio. Following sequencing, reads were aligned to a combined reference genome containing both the Drosophila genome (version dm6) and the mouse genome (version mm39). Only reads that could be unambiguously aligned to one of the two reference genomes were retained. To control for any variability in the proportion of mouse chromatin in the input samples, the ratio of percentage of spike in reads in the IP relative to the input were used. A scaling factor was calculated by dividing one by this ratio. Z-score normalized read depth was adjusted by this scaling factor, and the resulting spike-in normalized values were used for heatmaps.

Analysis of differential binding in GAF- and Zld-depleted embryos

Request a detailed protocol

The global decrease in Zld ChIP signal in GAF-depleted embryos (Figure 4—figure supplement 2C) caused us to consider technical and biological factors that may have affected these experiments. Global Zld-binding signal may have been lower in the GAFdeGradFP background because GAFdeGradFP embryos die at variable times around the MZT. To attempt account for this variability, we collected embryos from a tight 2–2.5 hr AEL timepoint rather than sorting. Therefore, a portion of these embryos collected for ChIP-seq may have been dead. Additionally, the immunoprecipitation efficiency can vary between experiments and thus might have been lower in the GAFdeGradFP ChIP-seq as compared to the control. Alternatively, as discussed in the Results section, GAF may be broadly required for robust Zld occupancy.

To control for these technical factors, we performed an analysis based on peak rank, in addition to peak intensity. The number of reads aligning within each peak was quantified using featureCounts from the Subread package (v1.6.4) (Liao et al., 2014). Peaks were then ranked based on the mean RPKM-normalized read count between replicates, allowing comparison of peak rank between different conditions.

DESeq2 (Love et al., 2014) was used to identify potential differential binding sites in a more statistically rigorous way. To control for variable detection of low-intensity peaks, only high-confidence GAF stage 5 peaks identified both in homozygous GAF-sfGFP(C) and heterozygous sfGFP-GAF(N) embryos (3800 peaks) were analyzed. For Zld ChIP-seq, analysis was restricted to 6003 high-confidence peaks shared between our control dataset and previously published Zld ChIP (Harrison et al., 2011). A table of read counts for these peaks, generated by featureCounts as described above, was used as input to DESeq2. Peaks with an adjusted p-value<0.05 and a fold change >2 were considered to be differentially bound. Numbers used for all Fisher’s exact tests are included in Supplementary file 4.

Total RNA-seq

Request a detailed protocol

A total of 150–200 embryos from His2Av-RFP/nos-degradFP (II); sfGFP-GAF(N) (III) and His2Av-RFP (II); sfGFP-GAF(N) (III) females were collected from a half hour lay and aged for 2 hr. For each genotype, three biological replicates were collected. Embryos were then picked into Trizol (Invitrogen, Carlsbad, CA) with 200 μg/ml glycogen (Invitrogen). RNA was extracted and RNA-seq libraries were prepared using the Universal RNA-Seq with NuQuant, Drosophila AnyDeplete Universal kit (Tecan, Männedorf, Switzerland). Samples were sequenced on the Illumina NextSeq500 using 75 bp single-end reads at the Northwestern Sequencing Core (NUCore).

RNA-seq analysis

Request a detailed protocol

RNA-seq data was aligned to the Drosophila melanogaster genome (dm6) using HISAT v2.1.0 (Kim et al., 2015). Reads with a mapping quality score <30 were discarded. The number of reads aligning to each gene was quantified using featureCounts, generating a read count table that was used to analyze differential expression with DESeq2. Genes with an adjusted p-value<0.05 and a fold change >2 were considered statistically significant. To identify GAF-target genes, GAF ChIP peaks were assigned to the nearest gene (this study). Zygotically and maternally expressed genes (Lott et al., 2011, GSE25180) zygotic gene expression onset (Li et al., 2014, GSE58935), Zld-dependent genes (McDaniel et al., 2019, GSE121157), CLAMP-dependent genes (Rieder et al., 2017, GSE102922), and Zld targets (Harrison et al., 2011, GSE30757) were previously defined. Genome browser tracks were generated using bigWigs with the Gviz package in R (Hahne and Ivanek, 2016). Numbers used for all Fisher’s exact tests are included in Supplementary file 4.

Assay for transposase-accessible chromatin

Request a detailed protocol

Embryos from His2Av-RFP/nos-degradFP (II); sfGFP-GAF(N) (III) and His2Av-RFP (II); sfGFP-GAF(N) (III) females were collected from a half hour lay and aged for 2 hr. Embryos were dechorionated in bleach, mounted in halocarbon 700 oil, and imaged on a Nikon Ti-2e Epifluorescent microscope using 60x magnification. Embryos with nuclei that were not grossly disordered as determined by the His2Av-RFP marker were selected for single embryo ATAC-seq. Six replicates were analyzed for each genotype. Single-embryo ATAC-seq was performed as described previously (Blythe and Wieschaus, 2016; Buenrostro et al., 2013). Briefly, a single dechorionated embryo was transferred to the detached cap of a 1.5 ml microcentrifuge tube containing 10 µl of ice-cold ATAC lysis buffer (10 mM Tris pH 7.5, 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40). Under a dissecting microscope, a microcapillary tube was used to homogenize the embryo. The cap was placed into a 1.5 ml microcentrifuge tube containing an additional 40 µl of cold lysis buffer. Tubes were centrifuged for 10 min at 500 g at 4°C. The supernatant was removed, and the resulting nuclear pellet was resuspended in 5 µl buffer TD (Illumina, San Diego, CA) and combined with 2.5 µl H2O and 2.5 µl Tn5 transposase (Tagment DNA Enzyme, Illumina). Tubes were placed at 37°C for 30 min and the resulting fragmented DNA was purified using the Minelute Cleanup Kit (Qiagen, Hilden, Germany), with elution performed in 10 µl of the provided elution buffer. Libraries were amplified for 12 PCR cycles with unique dual index primers using the NEBNext Hi-Fi 2X PCR Master Mix (New England Biolabs). Amplified libraries were purified using a 1.2X ratio of Axygen magnetic beads (Corning Inc, Corning, NY). Libraries were submitted to the University of Wisconsin-Madison Biotechnology Center for 150 bp, paired-end sequencing on the Illumina NovaSeq 6000.

ATAC-seq analysis

Request a detailed protocol

Adapter sequences were removed from raw sequence reads using NGMerge (Gaspar, 2018). ATAC-seq reads were aligned to the Drosophila melanogaster (dm6) genome using bowtie2 with the following parameters: --very-sensitive, --no-mixed, --no-discordant, -X 5000, -k 2. Reads with a mapping quality score < 30 were discarded, as were reads aligning to scaffolds or the mitochondrial genome. Analysis was restricted to fragments < 100 bp, which, as described previously, are most likely to originate from nucleosome-free regions (Buenrostro et al., 2013). To maximize the sensitivity of peak calling, reads from all replicates of GAFdeGradFP and control embryos were combined. Peak calling was performed on combined reads using MACS2 with parameters -f BAMPE --keep-dup all -g 1.2e8 --call-summits. This identified 64,133 accessible regions. To facilitate comparison to published datasets with different sequencing depths, peaks were filtered to include only those with a fold-enrichment over background > 2.5 (as determined by MACS2). This resulted in 36,571 peaks that were considered for downstream analysis. Because greater sequencing depth can result in the detection of a large number of peaks with a lower level of enrichment over background, this filtering step ensured that peak sets were comparable across all datasets. 201 bp peak regions (100 bp on either side of the peak summit) were used for downstream analysis. Reads aligning within accessible regions were quantified using featureCounts, and differential accessibility analysis was performed using DESeq2 with an adjusted p-value<0.05 and a fold change > 2 as thresholds for differential accessibility. Previously published ATAC-seq datasets (Hannon et al., 2017), GSE86966; (Blythe and Wieschaus, 2016), GSE83851; (Nevil et al., 2020), GSE137075. were re-analyzed in parallel with ATAC-seq data from this study to ensure identical processing of data sets. Heatmaps and metaplots of z score-normalized read depth were generated with DeepTools. MEME-suite was used to for de novo motif discovery for differentially accessible ATAC peaks and to match discovered motifs to previously known motifs from databases. Genome browser tracks were generated using bigWigs with the Gviz package in R (Hahne and Ivanek, 2016). Numbers used for all Fisher’s exact tests are included in Supplementary file 4.

Data availability

Sequencing data have been deposited in GEO under accession code GSE152773.

The following data sets were generated
    1. Gaskill MM
    2. Gibson TJ
    3. Larson ED
    4. Harrison MM
    (2020) NCBI Gene Expression Omnibus
    ID GSE152773. GAF is essential for zygotic genome activation and chromatin accessibility in the early Drosophila embryo.
The following previously published data sets were used
    1. Guertin MJ
    (2013) NCBI Gene Expression Omnibus
    ID GSE40646. The Genomic Binding Profile of GAGA Element Associated Factor (GAF) in Drosophila S2 cells.
    1. Rieder L
    2. Jordan W
    3. Larschan E
    (2019) NCBI Gene Expression Omnibus
    ID GSE133637. Targeting of the dosage-compensated male X-chromosome during early Drosophila development.
    1. Lott SE
    2. Eisen MB
    (2011) NCBI Gene Expression Omnibus
    ID GSE25180. Non-canonical compensation of zygotic X transcription in Drosophila melanogaster development revealed through single embryo RNA-Seq.
    1. Li XY
    2. Harrison MM
    3. Kaplan T
    4. Eisen MB
    (2014) NCBI Gene Expression Omnibus
    ID GSE58935. Establishment of regions of genomic activity during the Drosophila maternal-to-zygotic transition.
    1. Hannon CE
    2. Blythe SA
    3. Wieschaus EF
    (2017) NCBI Gene Expression Omnibus
    ID GSE86966. Concentration dependent binding states of the Bicoid Homeodomain Protein.
    1. McDaniel SL
    2. Gibson TJ
    3. Schulz KN
    4. Garcia MF
    5. Nevil M
    6. Jain SU
    7. Lewis PW
    8. Zaret KS
    9. Harrison MM
    (2019) NCBI Gene Expression Omnibus
    ID GSE121157. Continued activity of the pioneer factor Zelda is required to drive zygotic genome activation.
    1. Blythe SA
    2. Wieschaus EF
    (2016) NCBI Gene Expression Omnibus
    ID GSE83851. ATAC-seq analysis of chromatin accessibility and nucleosome positioning in Drosophila melanogaster precellular blastoderm embryos.
    1. Nevil M
    2. Gibson TJ
    3. Bartolutti C
    4. Iyengar A
    5. Harrison MM
    (2020) NCBI Gene Expression Omnibus
    ID GSE137075. Developmentally regulated requirement for the conserved transcription factor Grainy head in determining chromatin accessibility.

References

    1. Bhat KM
    2. Farkas G
    3. Karch F
    4. Gyurkovics H
    5. Gausz J
    6. Schedl P
    (1996)
    The GAGA factor is required in the early Drosophila embryo not only for transcriptional regulation but also for nuclear division
    Development 122:1113–1124.
    1. Busturia A
    2. Lloyd A
    3. Bejarano F
    4. Zavortink M
    5. Xin H
    6. Sakonju S
    (2001)
    The MCP silencer on the Drosophila Abd-B gene requires both pleiohomeotic and GAGA factor for the maintenance of repression
    Development 128:2163–2173.
  1. Book
    1. Hahne F
    2. Ivanek R
    (2016) Visualizing genomic data using Gviz and bioconductor
    In: Mathé E, Davis S, editors. Methods in Molecular Biology. Humana Press. pp. 335–351.
    https://doi.org/10.1007/978-1-4939-3578-9_16
    1. Horowitz H
    2. Berg CA
    (1996)
    The Drosophila pipsqueak gene encodes a nuclear BTB-domain-containing protein required early in oogenesis
    Development 122:1859–1871.

Decision letter

  1. Yukiko M Yamashita
    Reviewing Editor; Whitehead Institute/MIT, United States
  2. Kevin Struhl
    Senior Editor; Harvard Medical School, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This paper will be of interest to a broad audience of developmental biologists and molecular biologists in the field of transcriptional control and epigenetics. It evaluates the pioneer factor activity associated with GAGA-Factor during the process of zygotic genome activation. The experiments are rigorously performed and the data analysis supports the conclusions.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "GAF is essential for zygotic genome activation and chromatin accessibility in the early Drosophila embryo" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The reviewers have opted to remain anonymous. Our decision has been reached after consultation between the reviewers.

This manuscript by Gaskill and Gibson et al. presents compelling evidence that GAGA-factor acts as an additional pioneer factor in the early Drosophila embryo. The authors use state of the art genome engineering technology and protein degradation techniques to demonstrate that GAF is essential for ZGA and embryonic viability and is needed for the expression of a set of genes that are known to be bound by GAF.

All the reviewers appreciated the importance and the potential impact of the work, but all raised issues with experimental approaches with so many confounding factors making it difficult to interpret the results with confidence (the detailed critiques can be found below in individual reviews).

In the light of eLife's policy to invite revisions only when revision experiments are unlikely to change major conclusion of the paper, we decided that this manuscript must be rejected at this point. However, we would like to note that all the reviewers are quite enthusiastic about this manuscript, and if you can address major concerns, we would be ready to review the revised manuscript as a new submission, which will be handled by the same set of editors/reviewers.

Reviewer #1:

In this manuscript, Gaskill et al. perform a thorough investigation of the role of GAF as a pioneer transcription factor during zygotic genome activation (ZGA) in Drosophila embryos. Using an elegant system for tagging and acute degradation of GAF in embryos, the authors show that: (i) the tagged version of the protein is functional and recapitulates the expected binding properties (Figure 1); (ii) acute proteasomal degradation of the protein prior to ZGA leads to severe defects in embryo viability (Figure 2); (iii) degradation of GAF leads to a significant downregulation of zygotic gene expression at ZGA (Figure 3) independently of Zelda (Figure 4); and, (iv) loss of GAF leads to a significant reduction of chromatin accessibility.

The manuscript is very well written, the data are presented in a logical manner and the findings are supported by the data. The formal identification of GAF as a critical factor for ZGA is important and the results will open up new avenues for investigation of how these molecular mechanisms orchestrate the maternal-to-zygotic transition.

My main criticism of the work is the lack of a biochemical characterisation of GAF binding to nucleosomes to demonstrate a pioneer activity, and further biochemical evidence regarding the level of interaction with Zelda at the protein level. Combined, these experiments might bring a significant level of clarity to some of the results presented in Figure 4 and 5 by elucidating whether GAF is able to bind nucleosomes independently of Zelda.

In addition, I have another three major points:

1) I find the logic regarding the conclusions drawn in Figure 4 difficult to follow. In particular, the authors conclude that "GAF does not broadly depend on Zld for chromatin occupancy in the early embryo". However, Figure 4—figure supplement 1B shows significant differences in the binding intensities upon Zelda RNAi. Therefore, I find it difficult to reconcile these two aspects. I appreciate that the effect might be global, since both Zelda targets and non-Zelda GAF target peaks seem to change, but this needs to be clarified in the manuscript.

2) The same point applies where the authors state: "Zld binding to target loci is largely unperturbed in the absence of GAF". However, Figure 4—figure supplement 1D again shows a significant effect in the binding of Zelda upon GAF removal.

3) I find Figure 5F very difficult to follow and I cannot match the numbers reported in the main text that describe this analysis. This is also the case for Figure 5—figure supplement 1C. The authors should make this part more accessible. In addition, it might be more informative to perform a direct comparison of the ATAC-seq maps generated here and those for Zelda mutants from Hannon et al., 2017. Further analysis on the specific changes in nucleosome occupancy changes after Zelda or GAF depletion measured by these datasets might already help in demonstrating the pioneer activity of GAF.

Reviewer #2:

In the manuscript under review, Gaskill et al., address the longstanding question of the role of GAF in the process of MZT including zygotic genome activation. The question of GAF's role in early development was addressed in the past (Bhat et al., 1996) through conventional genetic means where it was found that elimination of maternal GAF results in a highly pleiotropic, messy, lethal phenotype. Gaskill et al. bring a welcome, timely, and modern approach to revisiting this problem by combining CRISPR, new approaches for generating conditional loss of function (DegradFP), and genomics. In doing so, they recover essentially the same messy pleiotropic loss of function phenotype.

While it would not be impossible to work with such a messy phenotype, in my opinion the authors have not taken sufficient care to ensure that their conclusions are free of possible confounding effects of the GAF loss of function phenotype. As highlighted below, there are several examples where conclusions are over-stated, technical or biological issues remain unaddressed, and -significantly- at least one of the presented experiments is designed so poorly that the rigor of the entire study has to be called into serious doubt (See major issue 1 as well as 3 below).

If done well, the experiments in this study are important to do and the results would be valuable for not only the developmental biology community, but to the broader field of transcriptional regulation and epigenetics. I make the comments below mindful of the difficulties of life in the pandemic, and I have tried to see a way to fix the study without suggesting that significant parts of it be re-done more carefully. Unfortunately, I am not confident that the existing data are suitable for publication as detailed below.

1) Significant lethality of sfGFP-GAF (30% hatch rate). While this phenotype is concerning, what is more of an issue is that my interpretation of the westerns in Figure 1—figure supplement 1 suggest that substantially less protein is produced from the N-terminally tagged line as well. This expression problem does not appear to be mentioned in the text, but it definitely should be. What is the quantitative difference between protein levels in these westerns? Does this raise concerns that the “wild-type” samples are not really very wild-type? When does the lethality occur? Specifically, does this effect of the N-terminal GFP-GAF allele affect any of the conclusions?

More concerning is that the authors mixed these two alleles (in different copy numbers as well) for the GAF vs Zld ChIP-seq experiments. As stated in the Materials and methods: these experiments done comparing one versus two copies of the GFP tagged GAF allele, but *also* that the zld RNAi sample used the N-terminal tagged GAF allele (in one copy) and the control used the C-terminal tagged allele (in two copies). As mentioned above, it appears that there is a difference in the expression levels of these two alleles, which is presumably magnified significantly by choosing to compare hets and homozygotes. This experiment could easily have been done on N-terminal hets as controls. This falls well below the bar for acceptable experimental design.

2) Re: conclusion: "maternal GAF is essential for progression through the MZT". The method of knockdown via DegradFP driven by mat-α-GAL4-VP16 begins eliminating GAF in mid-oogenesis. To underscore, this is not an “embryo-specific” knockdown. At the very least, it is a knockdown slightly later than what would be achieved in a germline clone. To the extent that the lethal phenotype has been examined in this manuscript, it remains unclear if it reflects a requirement of GAF specifically for the MZT, or if it reflects a requirement of GAF in some other biological process. There are high degrees of similarity between the DegradFP phenotype and the Trl[13C] phenotype reported in (Bhat '96) where ambiguity about the timing of GAF function also limited insight to when and how GAF is truly required. While the conclusion is not technically incorrect, the implication that this phenotype unambiguously implicates GAF as a regulator of the biological process of MZT (as opposed to oogenesis, genome integrity, chromatin architecture in general, et cetera) is not true.

Also, the GAF loss-of-function phenotype bears absolutely no resemblance to the Zld loss of function phenotype, as the authors further conclude. As the authors are aware, the vast majority of zld germline clones are virtually indistinguishable from wild-type embryos through the cleavage stages, and nearly perfectly phenocopy embryos injected with the Pol 2 inhibitor α amanitin at least at the gross morphological scale. Some minor changes in cell cycle timing occurs in zld mutants, and occasionally an extra mitotic division (also seen with Pol 2). But these cleavages are mostly completed with the fidelity typical of a wild type embryo. In dramatic contrast, GAF loss-of-function results in likely pleiotropic phenotypes that either result in or are caused by errors in mitosis or extensive DNA damage. Importantly, since effects like this are not achieved by inhibiting transcription, this phenotype is likely not due to effects on the ZGA component of MZT. This is consistent with the acute effect of GAF loss of function resulting in disruption of additional unknown maternal processes. But the extent that these disrupted maternal processes specifically impact the MZT, as opposed to other maternally-supplied biological functions, has not been demonstrated in this study.

3) Given the GAF loss of function phenotype: how can the authors be sure that the GAF[degradFP] embryos are at a comparable stage to "wild-type" in their genomic experiments? When gene expression is observed to be GAF-dependent, what steps have the authors taken to ensure that quantitative differences in gene expression stem from loss of GAF and are not confounded by the fact that in a 2-2.5 hr bulk collection that nearly all of the DegradFP embryos are in the process of dying, developing asynchronously, with likely extensive DNA Damage, et cetera? From the Materials and methods section, it seems that for some of the experiments, nothing at all was done. I have a hard time imagining how there is a way around this, since embryos with extensive damaged nuclei cannot be expected to continue transcribing RNA normally, and in fact there is evidence that they very much do not (Iampietro, Dev Cell 2014). Timing is also important here, since many of the genes that are listed as being sensitive to GAF loss of function are just beginning to be expressed, any delay in development would only exacerbate differences between the two experimental samples. For instance, the enrichment for up-regulated maternal genes could either reflect a role for GAF in the process of MZT, in general, or just that the embryos are either significantly delayed or outright die before they have a chance to undergo ZGA. As described above, this does not prove that GAF is involved in the process of MZT. Again, on the basis of these results, it cannot reliably be concluded, that GAF is directly playing a role in the process of ZGA.

4) Looking at the GAF[degradFP] coverage in the bigWig files supplied to GEO, I am strongly concerned that there is a major global difference in the normalized read counts (at least five fold lower peak heights all over the genome, but also with substantially higher background in the DegradFP sample), which should not really be there if the different samples are truly comparable to one another. Is this difference biological or technical? Such huge differences (if they are not just an error in generating the bigWig files) I think would certainly overburden algorithmic normalization in programs like DESeq2. Not to mention that it makes evaluation of the magnitude of differences seen in the ATAC experiments impossible to do through a genome browser, and has significantly colored my interpretation of the results with the DegradFP approach towards the negative (see above). The authors need to address the reason for this difference in magnitude, as well as provide information such as read counts per sample, mapping rates, fraction of mitochondrial reads (for ATAC data). If the issue is due to significant differences in recovered reads between the two genotypes, the question should be asked whether it is a good idea to publish these particular datasets.

Reviewer #3:

This manuscript by Gaskill and Gibson et al. presents compelling evidence that GAGA-factor acts as an additional pioneer factor in the early Drosophila embryo. The authors use state of the art genome engineering technology and protein degradation techniques to demonstrate that GAF is essential for ZGA and embryonic viability and is needed for the expression of a set of genes that are known to be bound by GAF. This data provides confirmation that previous results in cell lines translate well in vivo and that the previously established mechanistic role for GAF in opening chromatin and establishing paused Pol II holds true in animals. While this manuscript contains compelling findings and will be of high general interest, the authors' reciprocal ChIP-seq data is not convincing and needs to be repeated.

1) The reciprocal ChIP-seq experiments are simply not convincing in their current form. The authors want to use this data to claim that GAF or Zelda depletion does not impact the binding of the other factor, so that they can say that there is not cooperativity on the "vast majority of target sites". However, they note that there is a global decrease in GAF binding after Zelda depletion and a global decrease in Zelda binding after GAF depletion. They attempt to get around this issue by citing vague technical issues and using rank order comparisons instead, but this analysis is not appropriate in this case. If GAF and Zelda were binding cooperatively, wouldn't one expect there to be a global decrease in the binding of one after depletion of the other, and not necessarily a change in peak rank order? The authors speculate in the Materials and methods section as to why technical reasons could have caused this difference, however, what they should do is perform properly controlled experiments that control for these technical variables so that they can nail this finding with certainty. This is not a trivial issue, as a major conclusion of the paper is that GAF and Zelda act as independent pioneer factors with slightly different activation times in the embryo

a) Why was the zld-RNAi GAF ChIP-seq experiment performed in the sfGFP-GAF(N) background and then compared to the sfGFP-GAF(C) dataset? This experiment should be repeated with the proper control, GAF ChIP-seq in the sfGFP-GAF(C) background. Furthermore, it is concerning that it seems the authors simply performed the zld-RNAi GAF ChIP-seq experiment and then compared back to a previously generated GAF-ChIP-seq dataset; ChIP-seq is an assay subject to a large amount of technical variance and if experiments are to be quantitatively compared as they are in the manuscript, they must be performed at the same time and processed in parallel to minimize this variance. Were the ZLD ChIP-seq experiments and controls performed at the same time and processed in parallel?

b) Why are the authors not utilizing input controls in these ChIP-seq comparisons? Instead of using CPM, why not display fold change over input? This might be a more robust way of dealing with technical differences that lead to global changes as a result of IP efficiency, etc.

https://doi.org/10.7554/eLife.66668.sa1

Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

Reviewer #1:

In this manuscript, Gaskill et al. perform a thorough investigation of the role of GAF as a pioneer transcription factor during zygotic genome activation (ZGA) in Drosophila embryos. Using an elegant system for tagging and acute degradation of GAF in embryos, the authors show that: (i) the tagged version of the protein is functional and recapitulates the expected binding properties (Figure 1); (ii) acute proteasomal degradation of the protein prior to ZGA leads to severe defects in embryo viability (Figure 2); (iii) degradation of GAF leads to a significant downregulation of zygotic gene expression at ZGA (Figure 3) independently of Zelda (Figure 4); and, (iv) loss of GAF leads to a significant reduction of chromatin accessibility.

The manuscript is very well written, the data are presented in a logical manner and the findings are supported by the data. The formal identification of GAF as a critical factor for ZGA is important and the results will open up new avenues for investigation of how these molecular mechanisms orchestrate the maternal-to-zygotic transition.

My main criticism of the work is the lack of a biochemical characterisation of GAF binding to nucleosomes to demonstrate a pioneer activity, and further biochemical evidence regarding the level of interaction with Zelda at the protein level. Combined, these experiments might bring a significant level of clarity to some of the results presented in Figure 4 and 5 by elucidating whether GAF is able to bind nucleosomes independently of Zelda.

We are pleased that reviewer 1 finds the manuscript clear and that the findings are supported by the data.

Our lab has done several experiments to determine which factors, if any, Zelda is interacting with on the protein level. A graduate student performed IP mass spec, yeast 2-hybrid, and BioID experiments to identify proteins that directly interact with Zelda. Although multiple proteins were identified as potential Zld-interacting factors, we were unable to verify any of these through co-immunoprecipitation followed by immunoblot. We speculate that Zld may interact transiently with a large number of factors, potentially through it’s low-complexity transcription activation domain (Hamm et al., JBC 2015). GAF-interacting partners have been identified using affinity purification coupled with high-throughput mass spectrometry (Lomaev et al., 2017), and Zelda was not identified as a GAF-interacting factor in this study. Based on these previously generated data, we anticipate that if we were to perform co-immunoprecipitation studies with GAF and Zld we would not identify an interaction. We recognize that the biochemical identification of interactions depend on many factors, including purification conditions, making it impossible to rule out protein:protein interactions with a negative result. Therefore, although we cannot definitively state that GAF and Zelda are not interacting on a protein level, previous data suggests it is unlikely. Further supporting independent roles for GAF and Zld in shaping the chromatin landscape, GAF is required for chromatin accessibility in S2 tissue-culture cells, in which Zld is not expressed at detectable levels.

We also agree that nucleosome binding assays with GAF would be interesting and informative, however we believe they are beyond the scope of the current manuscript. A major challenge to this line of experimentation is that we have thus far been unable to purify soluble full length GAF protein in bacterial culture, and therefore anticipate purifying sufficient amounts of full-length GAF will require extensive troubleshooting. Given the difficulty of these experiments we believe it is outside the scope of this study. Although GAF has not yet been shown to directly bind nucleosomes, it has been extensively shown in vitro and in cell culture to promote chromatin accessibility at promoters through the recruitment of chromatin remodelers (Okada and Hirose, 1998; Tsukiyama et al., 1994; Xiao et al., 2001; Tsukiyama and Wu, 1995; Fuda et al., 2015; Judd et al., 2020). Our new analysis includes data demonstrating that GAF binds to regions before accessibility is established. However, we acknowledge that GAF has not been shown to directly bind nucleosomes and therefore it is not proven to have all of the features of a canonical pioneer factor. Therefore, we have removed “pioneer factor” from the title of our manuscript and refer to GAF within the manuscript as a “pioneer-like factor”.

In addition, I have another three major points:

1) I find the logic regarding the conclusions drawn in Figure 4 difficult to follow. In particular, the authors conclude that "GAF does not broadly depend on Zld for chromatin occupancy in the early embryo". However, Figure 4—figure supplement 1B shows significant differences in the binding intensities upon Zelda RNAi. Therefore, I find it difficult to reconcile these two aspects. I appreciate that the effect might be global, since both Zelda targets and non-Zelda GAF target peaks seem to change, but this needs to be clarified in the manuscript.

We apologize for the lack of clarity in this section. In response to the concerns of other reviewers, we have repeated these GAF ChIP-seq experiments to better control for any global effects the loss of Zld might have on GAF occupancy. We performed ChIP-seq for sfGFP-GAF (N) in a zld-RNAi background with paired heterozygous sfGFP-GAF (N) embryos as controls. We further included a spike-in control to allow for normalization (chromatin from mouse cells expression GFP-tagged histone H3). In the new GAF ChIP data after z-score and spike-in normalization, we no longer observe a global decrease in GAF binding in the zld-RNAi background. When we perform differential binding analysis on these data we again identify relatively few sites (3.3% of total GAF peaks) that are significantly changed. We have extensively rewritten this section of the manuscript and changed Figure 4E and Figure 4—figure supplement 2A, C to incorporate the new data and clarify our conclusions.

2) The same point applies where the authors state: "Zld binding to target loci is largely unperturbed in the absence of GAF". However, Figure 4—figure supplement 1D again shows a significant effect in the binding of Zelda upon GAF removal.

We appreciate the reviewer’s feedback and have rewritten this section to clearly convey our conclusions. We emphasize that although we observe a global decrease in Zld binding upon GAF depletion, a very small subset of Zld binding sites were statistically significantly changed in the absence of GAF (1.7%). Therefore, we conclude that at the majority of Zld binding sites the ability of Zld to bind target loci is not ablated when GAF is depleted. See rewritten section in the comment above.

3) I find Figure 5F very difficult to follow and I cannot match the numbers reported in the main text that describe this analysis. This is also the case for Figure 5—figure supplement 1C. The authors should make this part more accessible. In addition, it might be more informative to perform a direct comparison of the ATAC-seq maps generated here and those for Zelda mutants from Hannon et al., 2017. Further analysis on the specific changes in nucleosome occupancy changes after Zelda or GAF depletion measured by these datasets might already help in demonstrating the pioneer activity of GAF.

We apologize for the lack of clarity in this section. In response to another reviewer’s comments, we performed ATAC-seq on single-embryo replicates. We repeated and expanded our analysis with these new data (new Figure 5 and Figure 6). We now incorporate the ATAC-seq in zld mutants from Hannon et al., 2017 in Figure 6. We also investigated when during the MZT regions that require either Zld or GAF for gain accessibility based on data from Blythe and Wieschaus, 2016. We have therefore re-written the Results and Discussion and worked to make these sections more accessible. We have also updated Figure 5, Figure 5—figure supplement1, Figure 6, Figure 6—figure supplement1, and Figure 6—figure supplement 2 to incorporate our new data and analysis.

Reviewer #2:

In the manuscript under review, Gaskill et al., address the longstanding question of the role of GAF in the process of MZT including zygotic genome activation. The question of GAF's role in early development was addressed in the past (Bhat et al., 1996) through conventional genetic means where it was found that elimination of maternal GAF results in a highly pleiotropic, messy, lethal phenotype. Gaskill et al. bring a welcome, timely, and modern approach to revisiting this problem by combining CRISPR, new approaches for generating conditional loss of function (DegradFP), and genomics. In doing so, they recover essentially the same messy pleiotropic loss of function phenotype.

While it would not be impossible to work with such a messy phenotype, in my opinion the authors have not taken sufficient care to ensure that their conclusions are free of possible confounding effects of the GAF loss of function phenotype. As highlighted below, there are several examples where conclusions are over-stated, technical or biological issues remain unaddressed, and -significantly- at least one of the presented experiments is designed so poorly that the rigor of the entire study has to be called into serious doubt (See major issue 1 as well as 3 below).

If done well, the experiments in this study are important to do and the results would be valuable for not only the developmental biology community, but to the broader field of transcriptional regulation and epigenetics. I make the comments below mindful of the difficulties of life in the pandemic, and I have tried to see a way to fix the study without suggesting that significant parts of it be re-done more carefully. Unfortunately, I am not confident that the existing data are suitable for publication as detailed below.

We appreciate that the reviewer notes that the “experiments are important to do and that the results would be valuable” to both the developmental biology and transcription fields and regret that the reviewer did not feel we have “taken sufficient care to ensure our free of confounding effects”. To address these concerns, we have included new ChIP-seq and ATAC-seq data as well as additional data and analysis to further support our conclusions.

1) Significant lethality of sfGFP-GAF (30% hatch rate). While this phenotype is concerning, what is more of an issue is that my interpretation of the westerns in Figure 1—figure supplement 1 suggest that substantially less protein is produced from the N-terminally tagged line as well. This expression problem does not appear to be mentioned in the text, but it definitely should be. What is the quantitative difference between protein levels in these westerns? Does this raise concerns that the “wild-type” samples are not really very wild-type? When does the lethality occur? Specifically, does this effect of the N-terminal GFP-GAF allele affect any of the conclusions?

We appreciate the reviewer’s concerns about the sfGFP-GAF (N) line. We were also concerned about any confounding effects, which was the reason we used sfGFP-GAF(N) homozygous embryos as our controls for every genomics experiment. By comparing to this control, we will have eliminated from our analysis effects caused by the sfGFP-GAF(N) allele alone. We are confident the sfGFP-GAF(N) is rescuing essential GAF function, as we have verified the line is homozygous viable and fertile and recapitulates GAF nuclear localization characteristics. Additionally, in this revised manuscript we discuss additional ChIP-seq experiments performed on sfGFP-GAF(N) heterozygous embryos. Through comparison of these newly generated data for sfGFP-GAF(N) binding to binding sites identified for the C-terminally tagged version (GAF-sfGFP(C)), which has a 66% hatching rate, we identify a high-degree of overlap. 91% of loci identified in GAF-sfGFP(C) embryos are also identified in sfGFP-GAF(N) embryos (Figure 4—figure supplement 1B). This demonstrates that the N-terminal sfGFP tag does not alter the ability of GAF to localize to subnuclear domains or to bind specific target loci in the genome. Together, we feel this is strong evidence that, despite the reduced hatching rate, sfGFP-GAF(N) recapitulates the majority of GAF function. This analysis is described in the text.

To address the question of protein levels in the two tagged versions of GAF, we performed fluorescent quantification of GFP signal in nuclei of sfGFP-GAF(N) and GAF-sfGFP(C) homozygous embryos. This strategy allowed us to directly compare single embryos, while our western blots (Figure 1—figure supplement 1D) were performed on bulk collections. This quantification did not identify any significant difference in the fluorescent GFP signal between the two genotypes, suggesting GAF protein levels are not reduced in sfGFP-GAF(N) embryos compared to GAF-sfGFP(C) embryos. We have included these data in the manuscript in Figure 2—figure supplement 1 and described this experiment in the text.

It is challenging to determine the precise timing of the lethality of the sfGFP-GAF(N) embryos as it appears to be variable. We observe that approximately half of the sfGFP-GAF(N) homozygous embryos successfully progress to gastrulation. Therefore, a subset of sfGFP-GAF(N) embryos die prior to gastrulation, while the remainder die at some point following gastrulation.

More concerning is that the authors mixed these two alleles (in different copy numbers as well) for the GAF vs Zld ChIP-seq experiments. As stated in the Materials and methods: these experiments done comparing one versus two copies of the GFP tagged GAF allele, but *also* that the zld RNAi sample used the N-terminal tagged GAF allele (in one copy) and the control used the C-terminal tagged allele (in two copies). As mentioned above, it appears that there is a difference in the expression levels of these two alleles, which is presumably magnified significantly by choosing to compare hets and homozygotes. This experiment could easily have been done on N-terminal hets as controls. This falls well below the bar for acceptable experimental design.

To address this concern, we performed ChIP-seq for sfGFP-GAF(N) in a zld-RNAi background with paired heterozygous sfGFP-GAF(N) embryos as controls. Our data for sfGFP-GAF(N) binding from heterozygous embryos identified 6373 GAF peaks. When we compared these to the 4175 GAF peaks identified in the GAF-sfGFP(C) homozygous ChIP-seq dataset, we identified that 91% of GAF-sfGFP(C) peaks identified were maintained in the sfGFP-GAF(N) dataset, and peaks unique to either dataset are among the weakest sites identified (Figure 4—figure supplement 1B). Thus, the location of the sfGFP tag on GAF does not significantly alter the ability of GAF to bind specific genomic loci. To further enable comparison of GAF binding between control and zld-RNAi embryos, we included a spike-in control to allow for normalization (chromatin from mouse cells expression GFP-tagged histone H3.3). With the new GAF ChIP data after z-score and spike-in normalization, we no longer observe a global decrease in GAF binding in the zld-RNAi background. These new ChIP-seq data support the previous conclusions that GAF remains bound to nearly all the same loci in the presence and absence of Zld. We have updated Figure 4E and Figure 4—figure supplement 2A,C as well as the text to reflect these new data. We were also able to identify 101 GAF-binding sites that were altered upon the depletion of zld, and these have been discussed in the text.

2) Re: conclusion: "maternal GAF is essential for progression through the MZT". The method of knockdown via DegradFP driven by mat-α-GAL4-VP16 begins eliminating GAF in mid-oogenesis. To underscore, this is not an “embryo-specific” knockdown. At the very least, it is a knockdown slightly later than what would be achieved in a germline clone. To the extent that the lethal phenotype has been examined in this manuscript, it remains unclear if it reflects a requirement of GAF specifically for the MZT, or if it reflects a requirement of GAF in some other biological process. There are high degrees of similarity between the DegradFP phenotype and the Trl[13C] phenotype reported in (Bhat '96) where ambiguity about the timing of GAF function also limited insight to when and how GAF is truly required. While the conclusion is not technically incorrect, the implication that this phenotype unambiguously implicates GAF as a regulator of the biological process of MZT (as opposed to oogenesis, genome integrity, chromatin architecture in general, et cetera) is not true.

While we highlight that the expression of the degradFP enzyme is driven by the nanos promoter and not mat-α-GAL4-VP16, we acknowledge the point of the reviewer has raised, and we have changed the language in the manuscript to better reflect this caveat.

We agree that is important to acknowledge potential downstream effects of GAF knockdown at oogenesis. Nonetheless, a number of reasons support our hypothesis that the majority of the effects we identify are the result of GAF knockdown in the embryo:

1) Females in which GAF null germline clones have been generated are rarely able to produce eggs. As reported in Bejarano and Busturia, 2004, of 807 female GAF null germline clones only 23 could produce eggs. Furthermore, most of the eggs produced by these germline clones were unfertilized. These observations suggest that GAF is required during oogenesis for egg production. By contrast, our knockdown system does not show this egg-laying defect, suggesting that the essential functions of GAF in the germline are not inhibited.

2) In Bhat et al., 1996 the authors investigate GAF protein and mRNA levels in the ovary and embryos of wild-type control and Trl[13C] hypomorphic GAF mutants. Based on their studies, they conclude that the reduction in Trl mRNA deposited in the Trl[13C] embryos, and the resulting reduction of GAF protein levels in the embryo, likely produce the nuclear division defects observed. Although they cannot conclusively rule out that the described phenotypes are an indirect effect of GAF knockdown in the germline, they conclude that is unlikely given their observations that GAF protein levels are similar in the ovaries of wild-type controls and Trl[13C] mutants. Therefore, the shared phenotypes between our GAFdeGradFP embryos and the hypomorphic embryos from Trl[13C] mothers in Bhat et al., 1996 are suggestive that GAF knockdown is primarily occurring in the embryo. Nonetheless, our system has advantages over the Trl[13C] allele. While Bhat et al. observed variation in phenotype and GAF-expression levels both between and within embryos, our system robustly eliminates GAF protein in every embryo examined and uniformly in all nuclei.

3) Analysis of our genomics data indicates that the effects we observe are the direct consequence of GAF knockdown in the embryo. Our RNA-seq analysis identified only a subset of genes expressed at NC14 that were downregulated in the GAFdeGradFP embryos and these were significantly enriched for genes with a proximal GAF-binding site (Figure 3—figure supplement 2D). Further supporting the RNA-seq data and that it reflects the loss of GAF function in the embryo, we identify multiple, zygotically expressed, previously identified GAF-target genes as down-regulated in our RNA-seq analysis, such as engrailed and Ultrabithorax.

In summary, we agree that we cannot rule out that some of the phenotypic and gene expression differences between the control and GAFdeGradFP embryos are due to loss of GAF protein in the female germline. Nonetheless, we feel that the evidence strongly supports that many of the effects we identify are due to protein loss in the embryo.

Also, the GAF loss-of-function phenotype bears absolutely no resemblance to the Zld loss of function phenotype, as the authors further conclude. As the authors are aware, the vast majority of zld germline clones are virtually indistinguishable from wild-type embryos through the cleavage stages, and nearly perfectly phenocopy embryos injected with the Pol 2 inhibitor α amanitin at least at the gross morphological scale. Some minor changes in cell cycle timing occurs in zld mutants, and occasionally an extra mitotic division (also seen with Pol 2). But these cleavages are mostly completed with the fidelity typical of a wild type embryo. In dramatic contrast, GAF loss-of-function results in likely pleiotropic phenotypes that either result in or are caused by errors in mitosis or extensive DNA damage. Importantly, since effects like this are not achieved by inhibiting transcription, this phenotype is likely not due to effects on the ZGA component of MZT. This is consistent with the acute effect of GAF loss of function resulting in disruption of additional unknown maternal processes. But the extent that these disrupted maternal processes specifically impact the MZT, as opposed to other maternally-supplied biological functions, has not been demonstrated in this study.

We agree with reviewer 2 that the phenotypic defects of a subset of GAF knockdown embryos are more dramatic than the effects of maternal zld null embryos during NC10-13. Nonetheless, we would like to point out that there are multiple reports of defects in zld null embryos prior to NC14. For example, in Liang et al., 2008, nuclear fallout is reported in maternal zld null embryos prior to NC14, which we have also observed when imaging maternal zld-RNAi embryos. In that same paper, it is described that in early NC14 maternal zld null embryos the actin network becomes disorganized and begins to degenerate, leading to highly disorganized nuclei. It is reported in Staudt et al., 2006 that embryos injected with vielfaltig (renamed zld) dsRNA begin to display anaphase bridges and abnormally condensed chromatin as early as NC10, with these defects becoming more frequent through subsequent nuclear cycles. (Indeed, the gene was originally named vielfaltig, meaning versatile or manifold, because of these wide-ranging phenotypic defects.) Therefore, we have edited the text for clarity to reflect that a subset of the GAF knockdown embryos have similar phenotypes to what has been previously published for maternal zld knockdown embryos, while a subset of GAF knockdown embryos display more dramatic phenotypes likely due to additional roles of GAF for genome stability.

3) Given the GAF loss of function phenotype: how can the authors be sure that the GAF[degradFP] embryos are at a comparable stage to "wild-type" in their genomic experiments? When gene expression is observed to be GAF-dependent, what steps have the authors taken to ensure that quantitative differences in gene expression stem from loss of GAF and are not confounded by the fact that in a 2-2.5 hr bulk collection that nearly all of the DegradFP embryos are in the process of dying, developing asynchronously, with likely extensive DNA Damage, et cetera? From the Materials and methods section, it seems that for some of the experiments, nothing at all was done. I have a hard time imagining how there is a way around this, since embryos with extensive damaged nuclei cannot be expected to continue transcribing RNA normally, and in fact there is evidence that they very much do not (Iampietro, Dev Cell 2014). Timing is also important here, since many of the genes that are listed as being sensitive to GAF loss of function are just beginning to be expressed, any delay in development would only exacerbate differences between the two experimental samples. For instance, the enrichment for up-regulated maternal genes could either reflect a role for GAF in the process of MZT, in general, or just that the embryos are either significantly delayed or outright die before they have a chance to undergo ZGA. As described above, this does not prove that GAF is involved in the process of MZT. Again, on the basis of these results, it cannot reliably be concluded, that GAF is directly playing a role in the process of ZGA.

We chose to perform a bulk collection of embryos for our RNA-seq experiment precisely because the nuclear defects in the GAFdeGradFP embryos makes exactly staging the embryos challenging and because of the issues the reviewer raises given the initiation of gene expression. Given the dramatic changes in gene expression that can occur within minutes during zygotic genome activation, we used tightly timed bulk collection to alleviate potential timing differences that would be exacerbated by single-embryo RNA-seq. To address the concerns above, we analyzed our RNA-seq data by selecting for those genes that initiate expression late in ZGA (late nuclear cycle 14, NC14; the “later” class of genes defined by Li et al., 2014). If expression defects are caused by a failure of the embryos to reach ZGA, we would expect a global decrease in all genes expressed at this timepoint in the GAFdeGradFP embryos as compared to control. Instead, our analysis demonstrated that many non-GAF target transcripts activated during late NC14 are expressed at similar levels in GAFdeGradFP embryos as compared to control embryos. By contrast, a substantial proportion of genes activated at NC14 with a proximal GAF-binding site (as identified by our ChIP-seq) are downregulated in the GAFdeGradFP embryos as compared to controls (Figure 3—figure supplement 2D). Thus, during NC14 direct GAF target transcript expression is affected by GAF knockdown while numerous non-GAF target transcripts are activated normally. This additional analysis suggests our tightly timed bulk collection succeeded in enriching for NC14 embryos and that expression differences identified in our RNA-seq data are not caused by a timing delay in GAFdeGradFP embryos compared to controls.

4) Looking at the GAF[degradFP] coverage in the bigWig files supplied to GEO, I am strongly concerned that there is a major global difference in the normalized read counts (at least five fold lower peak heights all over the genome, but also with substantially higher background in the DegradFP sample), which should not really be there if the different samples are truly comparable to one another. Is this difference biological or technical? Such huge differences (if they are not just an error in generating the bigWig files) I think would certainly overburden algorithmic normalization in programs like DESeq2. Not to mention that it makes evaluation of the magnitude of differences seen in the ATAC experiments impossible to do through a genome browser, and has significantly colored my interpretation of the results with the DegradFP approach towards the negative (see above). The authors need to address the reason for this difference in magnitude, as well as provide information such as read counts per sample, mapping rates, fraction of mitochondrial reads (for ATAC data). If the issue is due to significant differences in recovered reads between the two genotypes, the question should be asked whether it is a good idea to publish these particular datasets.

We agree with reviewer 2 that our ATAC-seq data were noisy. Since our lab has had success with single embryo ATAC-seq, we decided to collect single embryos from control and GAFdeGradFP embryos from the 2-2.5 hr time point. Single-embryo ATAC-seq assays are not subject to the same timing issues as RNA-seq as the changes in accessibility identified by Blythe and Wieschaus, 2016 over the MZT are less dramatic than the changes in transcript levels during the same time point. Nonetheless, we performed these experiments in six biological replicates to minimize effects of differences in developmental timing. We imaged individual embryos to confirm that they had nuclei that were not grossly disordered using His2av-RFP prior to performing the ATAC-seq. The data from this experiment were much improved, and we have updated and expanded new Figures 5 and 6 using the data from our single embryo ATAC-seq experiment.

We have updated Figure 5, Figure 5—figure supplement1, Figure 6, Figure 6—figure supplement 1, Figure 6—figure supplement 2 with our new data and expanded analysis, as well as the text.

Reviewer #3:

This manuscript by Gaskill and Gibson et al. presents compelling evidence that GAGA-factor acts as an additional pioneer factor in the early Drosophila embryo. The authors use state of the art genome engineering technology and protein degradation techniques to demonstrate that GAF is essential for ZGA and embryonic viability and is needed for the expression of a set of genes that are known to be bound by GAF. This data provides confirmation that previous results in cell lines translate well in vivo and that the previously established mechanistic role for GAF in opening chromatin and establishing paused Pol II holds true in animals. While this manuscript contains compelling findings and will be of high general interest, the authors' reciprocal ChIP-seq data is not convincing and needs to be repeated.

We appreciate that the reviewer finds the that the manuscript contains “compelling findings and will be of high general interest.” We have repeated the reciprocal ChIP-seq data and re-analyzed these data to address the concerns raised.

1) The reciprocal ChIP-seq experiments are simply not convincing in their current form. The authors want to use this data to claim that GAF or Zelda depletion does not impact the binding of the other factor, so that they can say that there is not cooperativity on the "vast majority of target sites". However, they note that there is a global decrease in GAF binding after Zelda depletion and a global decrease in Zelda binding after GAF depletion. They attempt to get around this issue by citing vague technical issues and using rank order comparisons instead, but this analysis is not appropriate in this case. If GAF and Zelda were binding cooperatively, wouldn't one expect there to be a global decrease in the binding of one after depletion of the other, and not necessarily a change in peak rank order? The authors speculate in the Materials and methods section as to why technical reasons could have caused this difference, however, what they should do is perform properly controlled experiments that control for these technical variables so that they can nail this finding with certainty. This is not a trivial issue, as a major conclusion of the paper is that GAF and Zelda act as independent pioneer factors with slightly different activation times in the embryo.

We agree with the point raised by reviewer 3 and the other reviewers that better controlled experiments eliminating technical variables would bring clarity to the global decrease in GAF and Zld binding observed in the knockdown backgrounds of the other factor. We therefore performed ChIP-seq for sfGFP-GAF(N) in a zld-RNAi background with paired heterozygous sfGFP-GAF(N) embryos as controls. We further included a spike-in control to allow for normalization (chromatin from mouse cells expression GFP-tagged histone H3.3). In our new GAF ChIP-seq data after z-score and spike-in normalization we no longer observe a global decrease in GAF binding in the zld-RNAi background. These new data support the previous conclusion that GAF remains bound to the majority of target loci in both the presence and absence of Zld, with only 3.3% of GAF peaks statistically significantly changed in the zld-RNAi background. However, we performed additional analysis on the 101 GAF-binding sites that decrease when Zld is lost. We find these sites are enriched for Zld-bound regions that depend on Zld for accessibility. Therefore, at a small subset of GAF-binding sites, pioneering activity of Zld may be required for GAF occupancy. However at the majority of GAF-binding sites GAF does not depend on Zld for occupancy. We used this new GAF ChIP data to generate Figure 4E, Figure 4—figure supplement 2A,C, and we have incorporated textual changes.

We agree with reviewer 3’s assessment that we cannot rule out that the global decrease we observed in Zld binding upon the loss of GAF is biological. However, we expect that if Zld required GAF for occupancy we would see a change in Zld binding in the absence of GAF specifically at regions where the two factors are co-bound. We based this assumption on previously published data on pioneer-factor cooperativity. For example, in Donaghey et al., 2018 the authors found that GATA4 can stabilize FOXA2 binding at a subset of sites where the two factors are bound when they are ectopically expressed in cell culture. Additionally, when OSKM occupancy was investigated during MEF reprogramming it was found that the cooperative binding of O, S, and K was required for targeting of these factors specifically to sites that were co-occupied by the three factors (Chronis et al., 2017). In contrast, the Zld-binding sites that decrease in the absence of GAF are depleted for GAF and Zld co-bound regions when compared to unchanged Zld binding sites. Therefore, at these sites GAF is not directly stabilizing Zld binding or creating local chromatin accessibility. We have clarified the text to more clearly convey this.

Comparisons to previously published reciprocal ChIP-seq data also reinforces our conclusion that neither GAF nor Zld is broadly required for the binding of the other factor to target loci. In Sun et al., 2015 the authors tested the dependency of the transcription factors Dorsal and Zld on the binding of the other factor during the MZT in Drosophila. Like us, they used DESeq to call differential binding sites for Dorsal peaks between control and zld-RNAi embryos and for Zld peaks in Dorsal mutant and control embryos. They determined that Zld is required for Dorsal binding to a subset of sites, as 19.4% of Dorsal peaks are significantly decreased and 37.8% are significantly changed in the zld RNAi background. By contrast, only 2.5% of Zld peaks are significantly changed in the absence of Dorsal. Therefore, the authors conclude that Zld is required for Dorsal binding, but that Dorsal is not required for Zld binding. Our data shows only 3.3% of GAF peaks significantly change upon the loss of Zld and 1.7% of Zld peaks significantly change upon the loss of GAF when we call differential binding with DESeq2 (Figure 4E,F). Therefore, both of our ChIP-seq experiments more closely resemble the impact of Dorsal loss on Zld than the impact of Zld loss on Dorsal. Nevertheless, we have modified the language in our manuscript to make it clear that we cannot rule out an influence of GAF on global Zld binding (Results, Discussion) and to emphasize that there is a small subset of GAF-binding sites that depend on the pioneering activity of Zld for occupancy (Results, Discussion).

a) Why was the zld-RNAi GAF ChIP-seq experiment performed in the sfGFP-GAF(N) background and then compared to the sfGFP-GAF(C) dataset? This experiment should be repeated with the proper control, GAF ChIP-seq in the sfGFP-GAF(C) background. Furthermore, it is concerning that it seems the authors simply performed the zld-RNAi GAF ChIP-seq experiment and then compared back to a previously generated GAF-ChIP-seq dataset; ChIP-seq is an assay subject to a large amount of technical variance and if experiments are to be quantitatively compared as they are in the manuscript, they must be performed at the same time and processed in parallel to minimize this variance. Were the ZLD ChIP-seq experiments and controls performed at the same time and processed in parallel?

As mentioned above, we have repeated the ChIP-seq for sfGFP-GAF(N) in the zld RNAi background with paired heterozygous sfGFP-GAF(N) controls. In our Zld ChIP-seq experiment, we performed the ChIP-seq for Zld in the GAFdeGradFP and paired homozygous sfGFP-GAF(N) controls in parallel.

b) Why are the authors not utilizing input controls in these ChIP-seq comparisons? Instead of using CPM, why not display fold change over input? This might be a more robust way of dealing with technical differences that lead to global changes as a result of IP efficiency, etc.

Previously published ChIP-seq data for Zld at NC14 (Harrison et al., 2011) and GAF in S2 cells (Fuda et al., 2015) that we used for comparisons to our ChIP-seq in the paper did not have publicly available input controls. Therefore, we used CPM normalization for all ChIP-seq data so that we would have a consistent normalization method for ChIP-seq data throughout the paper. Upon revision we have decided to use z-score normalization for all ChIP-seq heatmaps. With our new ChIP-seq data for GAF in a zld RNAi background with paired controls using spike-in and z-score normalization we no longer observe a global decrease in GAF signal in the zld-RNAi background. However, we continue to observe a global decrease in Zld ChIP signal in GAF knockdown embryos compared to controls using both z-score normalized and input normalization heatmap generation. For consistency, we used z-score normalization throughout.

https://doi.org/10.7554/eLife.66668.sa2

Article and author information

Author details

  1. Marissa M Gaskill

    Department of Biomolecular Chemistry, University of Wisconsin School of Medicine and Public Health, Madison, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Tyler J Gibson
    Competing interests
    No competing interests declared
  2. Tyler J Gibson

    Department of Biomolecular Chemistry, University of Wisconsin School of Medicine and Public Health, Madison, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - review and editing
    Contributed equally with
    Marissa M Gaskill
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2796-2176
  3. Elizabeth D Larson

    Department of Biomolecular Chemistry, University of Wisconsin School of Medicine and Public Health, Madison, United States
    Contribution
    Formal analysis, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  4. Melissa M Harrison

    Department of Biomolecular Chemistry, University of Wisconsin School of Medicine and Public Health, Madison, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Writing - original draft, Writing - review and editing
    For correspondence
    mharrison3@wisc.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8228-6836

Funding

National Institutes of Health (R01GM111694)

  • Melissa M Harrison

National Institutes of Health (R35GM136298)

  • Melissa M Harrison

Vallee Foundation

  • Melissa M Harrison

National Institutes of Health (T32GM007215)

  • Marissa M Gaskill
  • Tyler J Gibson

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Erica Larschan and Jingyue Duan for helpful discussions. We also thank Peter Lewis, Christine Rushlow, the Bloomington Stock Center, and the Drosophila Genome Resource Center for providing reagents and fly lines. We acknowledge the University of Wisconsin-Madison Biochemistry Department Optical Core for access to microscopes for imaging and the University of Wisconsin-Madison Biotechnology Center and the NUSeq Core Facility for sequencing. MMG and TJG were supported by National Institutes of Health (NIH) National Research Service Award T32 GM007215. Experiments were supported by a R01 GM111694 and R35 GM136298 from the National Institutes of Health (NIH) and a Vallee Scholar Award (MMH).

Senior Editor

  1. Kevin Struhl, Harvard Medical School, United States

Reviewing Editor

  1. Yukiko M Yamashita, Whitehead Institute/MIT, United States

Publication history

  1. Received: January 19, 2021
  2. Accepted: March 14, 2021
  3. Accepted Manuscript published: March 15, 2021 (version 1)
  4. Version of Record published: April 27, 2021 (version 2)

Copyright

© 2021, Gaskill et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,780
    Page views
  • 285
    Downloads
  • 3
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Chromosomes and Gene Expression
    2. Structural Biology and Molecular Biophysics
    Inigo Urrutia-Irazabal et al.
    Research Article Updated

    The PcrA/UvrD helicase binds directly to RNA polymerase (RNAP) but the structural basis for this interaction and its functional significance have remained unclear. In this work, we used biochemical assays and hydrogen-deuterium exchange coupled to mass spectrometry to study the PcrA-RNAP complex. We find that PcrA binds tightly to a transcription elongation complex in a manner dependent on protein:protein interaction with the conserved PcrA C-terminal Tudor domain. The helicase binds predominantly to two positions on the surface of RNAP. The PcrA C-terminal domain engages a conserved region in a lineage-specific insert within the β subunit which we identify as a helicase interaction motif present in many other PcrA partner proteins, including the nucleotide excision repair factor UvrB. The catalytic core of the helicase binds near the RNA and DNA exit channels and blocking PcrA activity in vivo leads to the accumulation of R-loops. We propose a role for PcrA as an R-loop suppression factor that helps to minimize conflicts between transcription and other processes on DNA including replication.

    1. Cancer Biology
    2. Chromosomes and Gene Expression
    William F Richter et al.
    Research Article Updated

    MLL-rearranged leukemia depends on H3K79 methylation. Depletion of this transcriptionally activating mark by DOT1L deletion or high concentrations of the inhibitor pinometostat downregulates HOXA9 and MEIS1, and consequently reduces leukemia survival. Yet, some MLL-rearranged leukemias are inexplicably susceptible to low-dose pinometostat, far below concentrations that downregulate this canonical proliferation pathway. In this context, we define alternative proliferation pathways that more directly derive from H3K79me2 loss. By ICeChIP-seq, H3K79me2 is markedly depleted at pinometostat-downregulated and MLL-fusion targets, with paradoxical increases of H3K4me3 and loss of H3K27me3. Although downregulation of polycomb components accounts for some of the proliferation defect, transcriptional downregulation of FLT3 is the major pathway. Loss-of-FLT3-function recapitulates the cytotoxicity and gene expression consequences of low-dose pinometostat, whereas overexpression of constitutively active STAT5A, a target of FLT3-ITD-signaling, largely rescues these defects. This pathway also depends on MLL1, indicating combinations of DOT1L, MLL1 and FLT3 inhibitors should be explored for treating FLT3-mutant leukemia.