Condensin controls recruitment of RNA polymerase II to achieve nematode X-chromosome dosage compensation

Abstract
eLife digest
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

The X-chromosome gene regulatory process called dosage compensation ensures that males (1X) and females (2X) express equal levels of X-chromosome transcripts. The mechanism in Caenorhabditis elegans has been elusive due to improperly annotated transcription start sites (TSSs). Here we define TSSs and the distribution of transcriptionally engaged RNA polymerase II (Pol II) genome-wide in wild-type and dosage-compensation-defective animals to dissect this regulatory mechanism. Our TSS-mapping strategy integrates GRO-seq, which tracks nascent transcription, with a new derivative of this method, called GRO-cap, which recovers nascent RNAs with 5′ caps prior to their removal by co-transcriptional processing. Our analyses reveal that promoter-proximal pausing is rare, unlike in other metazoans, and promoters are unexpectedly far upstream from the 5′ ends of mature mRNAs. We find that C. elegans equalizes X-chromosome expression between the sexes, to a level equivalent to autosomes, by reducing Pol II recruitment to promoters of hermaphrodite X-linked genes using a chromosome-restructuring condensin complex.

https://doi.org/10.7554/eLife.00808.001

eLife digest

In many species, including humans, females have two X chromosomes whereas males have only one. To ensure that females do not end up with a double dose of the proteins encoded by genes on the X chromosome, animals employ a strategy called dosage compensation to control the expression of X-linked genes.

The mechanisms underlying dosage compensation vary between species, but they typically involve a regulatory complex that binds to the X chromosomes of one sex to modify gene expression. In the nematode worm Caenorhabditis elegans—which consists of hermaphrodites (XX) and males (XO)—this regulatory complex, called the dosage compensation complex (DCC), binds to both X chromosomes of XX individuals, reducing gene expression from each by 50%. DCC shares many subunits with a protein complex called condensin, which regulates the structure of chromosomes to achieve proper chromosome segregation. However, it is unclear exactly how the DCC controls the expression of X-linked genes.

For a gene to be expressed, an enzyme called RNA polymerase II must bind to the gene’s promoter—a stretch of DNA upstream of the protein-coding part of the gene—so that it can begin transcribing the DNA into RNA. Promoters have been difficult to define in C. elegans, but Kruesi et al. devised a strategy to map transcription start sites, and hence promoters, throughout the worm genome. The strategy integrates the results of two methods: One measures the extent and orientation of each gene’s transcribed region, and the other locates the distinctive cap structures that mark the true 5′ ends of newly made RNAs.

Using this new promoter information, coupled with genome-wide measurements of the levels of newly synthesized transcripts from wild-type and dosage-compensation-defective animals, they showed that C. elegans achieves dosage compensation by reducing the recruitment of RNA polymerase II to the promoters of X-linked genes in XX individuals.

Kruesi et al. also identified a second regulatory mechanism that acts in both sexes to increase the level of transcription of genes on the X chromosome. This ensures that after dosage compensation, genes on the X chromosome are expressed at a similar level to those on the autosomes (all chromosomes other than X and Y).

As well as shedding light on the mechanism by which dosage compensation occurs in C. elegans, the study by Kruesi et al. provides a valuable data set on transcription start sites in the worm, and puts forward a general strategy that could be used to map these sites in other species.

https://doi.org/10.7554/eLife.00808.002

Introduction

The essential, X-chromosome-wide regulatory process called dosage compensation ensures that males (XO or XY) and females (XX), from worms to humans, express equivalent levels of X-chromosome products despite their unequal dose of X chromosomes (Gelbart and Kuroda, 2009; Meyer, 2010; Conrad and Akhtar, 2012; Jeon et al., 2012). The failure to dosage compensate is lethal. Dosage compensation strategies differ across species, but invariably a regulatory complex is targeted to the X chromosomes of one sex to modulate transcription along the entire X. The molecular mechanisms by which these complexes regulate gene expression remain elusive. Here we analyzed X-chromosome dosage compensation in the nematode Caenorhabditis elegans to determine the step of transcription controlled by its dosage compensation complex (DCC). The DCC binds to both X chromosomes of hermaphrodites to reduce transcription by half (Meyer, 2010; Pferdehirt et al., 2011). Sequence-specific DNA binding sites recruit the DCC to X and facilitate its spreading along X (Ercan et al., 2009; Jans et al., 2009; Pferdehirt et al., 2011). The DCC shares subunits with condensin (Csankovszki et al., 2009; Mets and Meyer, 2009), a protein complex required for the compaction, resolution, and segregation of mitotic and meiotic chromosomes (Wood et al., 2010), suggesting that DCC-dependent changes in chromosome structure facilitate transcription regulation. In principle, the DCC could control any step of transcription: recruitment of RNA polymerase II (Pol II) to the gene promoter, initiation of transcription, escape of Pol II from the promoter or proximal pause sites, elongation of RNA transcripts, or termination of transcription.

To understand the mechanism of C. elegans dosage compensation, we first developed a procedure to map the position, density, and orientation of transcriptionally engaged Pol II genome-wide in C. elegans and then devised a strategy to identify the transcription start sites (TSSs). Nascent RNA transcripts from approximately 70% of C. elegans genes undergo a rapid co-transcriptional processing event in which the 5′ end is replaced by a common 22-nucleotide leader RNA (SL1) through a trans-splicing mechanism (Blumenthal, 2012). Because trans-splicing removes information about Pol II initiation from nascent RNAs, TSSs have been difficult to identify from accumulated mRNAs (Morton and Blumenthal, 2011). The paucity of annotated promoters has made transcription regulation a challenge to study in C. elegans. By comparing the quantity, location, and direction of engaged Pol II from wild-type and dosage-compensation-defective embryos relative to TSSs, we determined the step of transcription controlled by the DCC.

Our work establishes a general strategy for TSS mapping in any organism and provides an invaluable TSS data set for dissecting C. elegans gene regulation. We show that C. elegans equalizes X-chromosome-wide gene expression between the sexes by reducing Pol II recruitment to the promoters of X-linked genes in XX embryos via a mechanism that utilizes a chromosome-restructuring complex. We also show that a separate regulatory mechanism functions in C. elegans to elevate the intrinsic level of transcription from the X chromosomes of both sexes, so that after dosage compensation, X chromosomes and the two sets of autosomes have equivalent expression.

Results

Genome-wide mapping of transcriptionally engaged Pol II and transcription start sites reveals promoters to be far upstream of mature mRNA 5′ ends

To map the distribution of transcriptionally engaged Pol II genome-wide, we performed global run-on sequencing (GRO-seq) experiments using nuclei from three stages of wild-type animals (embryos, starved L1 larvae, and L3 larvae) and dosage-compensation-defective embryos. In GRO-seq reactions, engaged polymerases are allowed to transcribe (run-on) short distances (100 nucleotides) and incorporate affinity tags into their nascent RNAs under conditions that prohibit new initiation (Core et al., 2008). Tagged transcripts are affinity purified, amplified, sequenced, and aligned to the genome to map engaged Pol II (Figure 1A–E).

Figure 1 with 12 supplements see all

Download asset Open asset

Genome-wide annotation of *Caenorhabditis* *elegans* transcription start sites.

(A)–(E) Examples of newly annotated transcription start sites (TSSs) for protein-coding genes, non-coding RNA genes, and multigenic transcription units called operons identified using the combination of GRO-seq and GRO-cap. Red arrows demark the WormBase (WB) gene models. Dashed vertical lines show the WB gene starts. The GRO-seq signal is in reads per kilobase per million (RPKM). For protein coding genes, the GRO-seq signal was averaged across 25 bp windows with a 25 bp step. The GRO-cap signal is in reads per million (RPM). TAP+ is the signal from capped mRNAs, and TAP− is the background. For (D) and (E), the GRO-cap signal is the TAP+ signal after subtracting the TAP− signal. (A) TSS for a trans-spliced gene. The TSS maps 981 bp upstream of the WB start, with a continuous intervening GRO-seq signal. (B) TSS for the polycistronc microRNA cluster *mir-54-56* maps 158 bp upstream of the primary transcript start. (C) TSS for a non-trans-spliced gene. The TSS from GRO-cap and GRO-seq aligns with the WB start site. (D) Identification of the operon TSS shows the operon includes an additional gene, *tag-175*. The TSS for the operon maps 781 bp upstream of *tag-175*. (E) TSSs for genes in operons that also use independent promoters, including the TSS for a snoRNA gene within the intron of a gene. *vha-10* mRNA is trans-spliced with an SL2 RNA, indicating processing from a polycistronic RNA and an SL1 RNA, indicating transcription from an independent promoter. (F) Heat maps show that TSSs vastly improve gene models. The GRO-seq signal from embryos was plotted, one gene per row, for each of 4246 genes relative to the WB start (left) or the new TSS (right). The genes were ordered with increasing distance between the TSS and WB start. The light line moving rightward in the right panel does not represent TSSs. It reflects reduced GRO-seq signal immediately downstream of the trans-splice acceptor site that has been commonly annotated as the WB start site. (G) Heat maps showing the GRO-cap signal from embryos that was plotted for each of 4246 genes relative to the WB start (left) or the new TSS (right). The genes were ordered with increasing distance between the TSS and WB start.

https://doi.org/10.7554/eLife.00808.003

Figure 1—source data 1 Aligned reads for GRO-seq, GRO-cap, and ChIP-seq experiments. The number of reads from each replicate of GRO-seq, GRO-cap, and ChIP-seq that uniquely aligned to the Caenorhabditis elegans genome are listed.: https://doi.org/10.7554/eLife.00808.004
Download elife-00808-fig1-data1-v1.xls
Figure 1—source data 2 Annotation of transcription start sites for protein-coding genes.: https://doi.org/10.7554/eLife.00808.005
Download elife-00808-fig1-data2-v1.xls
Figure 1—source data 3 Annotation of transcription start sites for non-coding RNAs.: https://doi.org/10.7554/eLife.00808.006
Download elife-00808-fig1-data3-v1.xls

The two GRO-seq biological replicates for each stage had high statistical correlation throughout the genome (Spearman correlation, ρ > 0.94) and across gene bodies (Spearman correlation, ρ > 0.98) (Figure 1—figure supplements 1–3 and Figure 1—source data 1). Gene expression levels calculated from GRO-seq data correlated well with expression data from microarrays and RNA-seq experiments (Figure 1—figure supplement 4). For the majority of expressed genes, we found continuous GRO-seq signal upstream of the WormBase (WB)-annotated transcription starts (Figure 1A,B), suggesting that GRO-seq reactions contain nascent RNAs with true 5′ ends.

To map TSSs unambiguously, we performed a series of enzymatic selections on our GRO-seq run-on RNAs to capture only those RNAs with a 5′ cap, the 7-methyl guanosine residue added shortly after transcription initiation (Rasmussen and Lis, 1993). This modified GRO-seq procedure, called GRO-cap (Figure 2), enabled us to map TSSs with nucleotide resolution by tracking nascent RNAs prior to trans-splicing. Use of nascent RNAs without size selection also reduces the background from RNAs that are capped post-transcriptionally and increases the probability of identifying TSSs from promoters with low Pol II occupancy.

Figure 2

Download asset Open asset

GRO-cap strategy for identifying TSSs.

GRO-cap is a modified form of GRO-seq that utilizes the tagging and extensive purification of nascent RNAs from GRO-seq (Core et al., 2008) and then employs redundant enzymatic steps to enrich for RNAs with 5′ caps. Of particular importance to this study, GRO-cap permits analysis of RNAs prior to their co-transcriptional processing, which replaces true transcription start sites (TSSs) with trans-spliced leader RNAs in *Caenorhabditis elegans*. GRO-seq run-ons have been tuned to extend the length of nascent RNAs by only 100 nucleotides on average, thus minimizing any possibility that independent transcription units might been artifactually linked. In GRO-cap, nuclei are isolated and RNA polymerases are allowed to transcribe briefly in a run-on reaction in the presence of Br-UTP, as in GRO-seq. RNA is isolated but the base-hydrolysis step of GRO-seq is omitted to increase the probability of capturing nascent RNA molecules with a 5′ 7-methyl-GTP cap. BrU-RNAs made during the run-on reaction are enriched by selection with anti-BrdU beads to ensure the identification of true TSSs from capped nascent RNAs rather than 5′ ends from RNAs that received post-transcriptional capping (Fejes-Toth et al., 2009). A 3′ RNA adapter (red) is ligated to the RNAs, followed by another round of bead enrichment. Selection against 5′ mono-phosphate RNAs that do not represent capped RNAs (and any carry-through 5′ RNA adapters) is achieved by sequential enzymatic treatment with Terminator exonuclease to degrade 5′ mono-phosphate RNAs and then alkaline phosphatase to remove 5′ phosphates from 5′ mono-phosphate RNAs resistant to the exonuclease. Half of the nuclear run-on (NRO) RNA pool is treated with tobacco acid pyrophosphatase (TAP+) to remove the 5′ cap from the RNA, thereby exposing a 5′ mono-phosphate. The other half is left untreated (TAP−) to provide a control population of residual 5′ mono-phosphate RNAs that never had 5′ caps. The 5′ mono-phosphate RNAs are ligated to 5′ RNA adapters (blue). The TAP+ and TAP− samples are prepared for Illumina sequencing as in GRO-seq by reverse transcription of RNA into DNA and then amplification of DNA from 5′ and 3′ adapter regions. We note that transcripts <500 bp are captured most efficiently on Illumina sequencing platforms. The enriched TSS regions are identified by mapping the 5′ ends of the sequence reads back to the genome and comparing the TAP+ and TAP− sites to eliminate false TSSs. Comparing the GRO-cap candidate TSSs to the 5′ ends of transcription units defined by GRO-seq permits reliable assignment of TSSs to transcription units.

https://doi.org/10.7554/eLife.00808.019

For genes that lack trans-splicing, the site of maximum GRO-cap signal was coincident with both the 5′-most GRO-seq signal and the WB-annotated start (Figure 1C), confirming that GRO-cap and GRO-seq together permit high-confidence mapping of TSSs. For trans-spliced genes with no previously identified TSSs, strong GRO-cap and GRO-seq signals were found upstream of the WB-annotated starts, and TSS calls were supported by uninterrupted GRO-seq signal between the maximum GRO-cap signal and the WB start (Figure 1A).

In total, a TSS was identified for 31.7% (6353 genes) of all C. elegans protein-coding genes from at least one of the three developmental stages examined (Figure 1—source data 2). Of our TSS calls, 77% are for genes shown previously to be trans-spliced (Figure 1—source data 2) (Allen et al., 2011). Plotting the average GRO-seq and GRO-cap signals from each developmental stage relative to the TSSs from the same stage revealed the vast improvement in gene models (Figure 1F,G and Figure 1—figure supplements 5, 6 and 7A–C). Independent GRO-cap reactions from the same developmental stage gave very similar results (Figure 1G and Figure 1—figure supplement 7D).

TSSs were also annotated for the majority of genes encoding short non-coding RNAs such as snoRNAs, 21 U-RNAs, and microRNAs, including TSSs for the five polycistronic microRNA clusters (Figure 1C,E and Figure 1—source data 3) (see ‘Materials and methods’). The TSSs for the 21 U-RNAs were 2 bp upstream from the mature RNA (Figure 1—figure supplement 9).

In addition, the primary TSSs and gene composition were determined for many multigenic transcription units called operons in which the polycistronic pre-mRNAs are processed to monocistronic mRNAs through 3′ end formation and trans-splicing using SL2 RNA (Figure 1D,E) (Blumenthal, 2012). Identification of the TSS for one operon showed it to include a 5′ gene not previously ascribed to it (Figure 1D). TSSs for several operons showed the TSS to be far downstream (>2 kb) of the WB start, a result we also found for genes not included in operons (Figure 1—figure supplement 10A,B). Within operons some genes have independent promoters to transcribe their pre-mRNAs (Allen et al., 2011), and their TSSs were determined, including snoRNA genes within introns of internal genes (Figure 1E and Figure 1—figure supplement 10A,C). In general, we used conservative statistical criteria for calling TSSs, and additional TSSs can be identified by visual inspection of the data.

The new TSS calls revealed promoters to be further upstream from the 5′ ends of mature mRNAs than previously thought, as demonstrated by heat maps showing the GRO-seq signal or GRO-cap signal of re-annotated genes relative to WB starts or TSSs (Figure 1F,G and Figure 1—figure supplement 7A–D) and histograms showing distances between TSSs and WB starts or SL1 trans-splice acceptor sites (TSA) (Figure 1—figure supplement 11A,B). The TSS-to-TSA, called the outron, was previously thought to be 50–500 bp. We found instead that outrons can be as long as 14 kb and have a median of 260 bp and mean of 753 bp (Figure 1—figure supplement 11B). Fully 59% of outrons are longer than 200 bp, 21% are longer than 1 kb, and 2.3% are longer than 5 kb (e.g., Figure 1—figure supplement 8A–C).

Multiple lines of evidence indicate that the GRO-seq signal between newly called TSSs and previously identified TSAs reflects legitimate outrons rather than independent overlapping upstream transcripts. First, 3′ UTRs or polyA signals are rare in outrons of >1 kb in length, indicating the engaged Pol II is not from independent polyadenylated transcripts. From 565 such outrons, only 1.4% had an identified 3′ UTR in the Jan et al. (2011) study, and 0.7% had a 3′ UTR in the Mangone et al. (2010) study. Furthermore, only 3.5% had a polyA site (Mangone et al., 2010). Second, regions corresponding to long outrons have a continuous ChIP-chip signal from antibodies enriched for either the ser2 phosphorylated form of Pol II or the hypo-phosphorylated form of Pol II (Figure 1—figure supplement 8A–C). These results and the restricted run-on length of ∼100 nucleotides (Core et al., 2008) indicate that the GRO-seq signal corresponds to bound Pol II in vivo and is not an artifact of the nuclear run-on (NRO) reactions in vitro extending beyond the 3’ ends defined in vivo (Figure 1—figure supplement 8A–C and Figure 3A). Third, a heat map of individual genes showing the GRO-cap signal relative to TSSs reveals that a dominant TSS contributes the majority of the vast GRO-cap signal (Figure 1G). Together these observations strongly support the argument that the GRO-cap signal paired with the continuous GRO-seq signal from WB starts defines true TSSs. Recently published studies of TSSs in C. elegans that used cutoffs for TSS calls of either 1 kb upstream (Gu et al., 2012) or 200 bp upstream (Chen et al., 2013) of WB starts identified some outron TSSs but could not identify TSSs for a large class of genes with longer outrons (see ‘Discussion’).

Figure 3 with 2 supplements see all

Download asset Open asset

Features of promoters and TSSs.

(A) Trans-spliced genes can have multiple transcription start sites (TSSs), suggesting that trans-splicing eliminates the pressure to have only one precise TSS per gene. Shown are GRO-seq and corrected GRO-cap (TAP+ signal after subtracting TAP− signal) signals with ChIP-seq data of hypo-phosphorylated Pol II (8WG16 antibody, modENCODE_2439) for the trans-spliced gene *sca-1* expressed in the L3 larval stage. The total GRO-seq signal becomes more intense as additional TSSs (from 5′ to 3′) contribute to the pool of engaged Pol II molecules that transcribe through the upstream regulatory region of *sca-1*. The combination of continuous Pol II signal in the upstream region and the lack of 3′ UTRs or polyA signals (Mangone et al., 2010) strengthens the interpretation that the GRO-cap signal combined with the continuous GRO-seq signal identified true TSSs for *sca-1*. From left to right, the TSSs reside upstream of the WormBase (WB) gene model by 5728 bp, 4582 bp, 3044 bp, 1669 bp, and 159 bp. The ChIP-seq signal 3′ of the *sca-1* 3′ UTR is from the *klp-7* gene on the opposite strand. (B) A gene can use different primary TSSs in different developmental states. The primary TSS for *tag-294* in embryos and L3 larvae is 1529 bp upstream of the WB start, while the primary TSS in starved L1 larvae is 656 bp upstream. DNA sequences flanking newly annotated TSSs have evolutionarily conserved core promoter elements, including (C) TATA-box elements and (D) initiator elements (Inr). Of 4547 embryo genes with TSSs, 162 genes (3.6%) have a TATA element with a perfect match to the consensus 15–45 bp upstream of it, and 745 genes (16.4%) have an Inr with the adenine residing at the TSS (+1 bp). Consensus sequences for TATA elements and the Inr are above the graphs. RPKM: reads per kilobase per million; RPM: reads per million.

https://doi.org/10.7554/eLife.00808.020

Features of C. elegans promoters

Several other noteworthy features of promoters emerged from this comprehensive mapping of C. elegans transcription units and TSSs. (1) Many trans-spliced genes have multiple TSSs (Figure 3A), suggesting that trans-splicing has removed the selective pressure to form promoters with a single TSS. (2) Genes can use different TSSs across developmental stages, indicating developmental stage-specific regulation of transcription initiation (Figure 3B). (3) DNA sequences flanking the newly annotated TSSs have strong sequence conservation across nematode species (Figure 3—figure supplement 1B) and also have evolutionarily conserved core promoter elements, including the TATA-box (worm consensus TATAWAWR) (Figure 3C and Figure 3—figure supplements 1A and 2A,B) and the initiator element (Inr) (worm consensus YCAYTY) (Figure 3D and Figure 3—figure supplement 2C), both of which facilitate formation of the Pol II pre-initiation complex (Juven-Gershon and Kadonaga, 2010).

Features of C. elegans transcription: abundant 3′ Pol II accumulation and divergent transcription, but scarce promoter-proximal pausing

Three prominent features of transcription emerged. Accumulation of Pol II at 3′ ends of genes is abundant (Figure 4A–C and Figure 4—figure supplement 1A,B), as is divergent transcription from promoters lacking upstream divergent genes (Figure 4D–F). In contrast, promoter-proximal RNA Pol II pausing in C. elegans is rare under normal growth conditions, unlike in other metazoans, as shown later in ‘Results’.

Figure 4 with 3 supplements see all

Download asset Open asset

Features of *Caenorhabditis* *elegans* transcription: 3′ Pol II pausing and divergent transcription.

(A)–(C) Pol II 3′ accumulation is prevalent in worms. 3′ End pausing ratios were calculated by dividing the highest average GRO-seq signal at the 3′ end by the average GRO-seq signal in the gene body. (A) A histogram of the 3′ end pausing ratios shows 3′ accumulation of Pol II is more extensive in *Caenorhabditis elegans* than in Drosophila. The histogram compares 3′ accumulation for 3984 genes expressed in C. *elegans* embryos with 6107 genes expressed in Drosophila cell lines. (B) The GRO-seq signal surrounding the 3′ end (cleavage and polyadenylation site [CPS]) was averaged for genes in or not in operons. Genes at the beginning (n = 430) and middle (n = 276) of operons were more highly expressed and had higher 3′ accumulation than genes at the end (n = 474) of operons or not in operons (n = 5048). Genes plotted had to be greater than 3 kb in length. (C) 3′ End pausing ratios were calculated for all classes of genes in (B) and plotted as boxplots. For this analysis, genes had to be greater than 3 kb in length and the gene body RPKM (reads per kilobase per million) had to be ≥1. Genes found at the beginning (n = 415) and middle (n = 275) of operons had higher 3′ pausing than genes at the end (n = 467) of operons. Genes lacking a downstream gene had similar 3′ pausing ratios whether or not (n = 3670) they were in operons. The 3′ pausing ratios for genes in all classes were greater than for Drosophila genes (n = 3260). (D)–(F) Upstream divergent transcription is common at promoters of C. *elegans* genes. GRO-seq and GRO-cap profiles show transcription of a divergent gene pair (D) or divergent transcription from a promoter without an upstream divergent gene partner (E). Gene on plus strand (red gene and signal). Gene on minus strand (blue gene and signal). (F) Upstream divergent transcription from C. *elegans* promoters is intermediate between that in humans and Drosophila. Plot compares the log₂(sense/antisense) transcription ratio of human and fly promoters to C. *elegans* promoters without divergent gene pairs. The median log₂ ratios are 0.3 for humans, 2.3 for C. *elegans*, and 5.0 for Drosophila. RPM: reads per million.

https://doi.org/10.7554/eLife.00808.023

Figure 4—source data 1 Gene expression, and 5′ pausing and 3′ pausing data for protein-coding genes.: https://doi.org/10.7554/eLife.00808.024
Download elife-00808-fig4-data1-v1.xls

Pol II 3′ accumulation, likely caused by slow 3′ end formation and RNA processing (Gromak et al., 2006), is positively correlated in C. elegans with the expression level of the gene (Figure 4—figure supplement 1A and Figure 4—source data 1) and is more extensive in C. elegans than in Drosophila (Figure 4A and Figure 4—figure supplement 1B). Multiple peaks of 3′ accumulation within a gene help identify genes having several isoforms with distinct 3′ ends (Figure 1—figure supplement 10D). The prevalence in C. elegans of polycistronic operons requiring extensive RNA processing to produce monocistronic mRNAs caused us to ask whether 3′ Pol II accumulation was positively correlated with the gene’s position in an operon and hence the level of RNA processing. First and middle genes in an operon require two forms of co-transcriptional RNA processing at the 3′ end to generate monocistronic mRNAs, polyadenylation of the upstream gene, and trans-splicing of the downstream gene, while the terminal gene requires only polyadenylation at the 3′ end. First and middle genes showed more 3′ accumulation than terminal genes or genes not in operons (Figure 4B,C). In other words, 3′ pausing is longer at genes with another gene to process just downstream. The 3′ Pol II accumulation at terminal genes and genes not in operons was equivalent, yet greater than in Drosophila genes. Lastly, 3′ accumulation was similar between terminal genes of operons and monocistronic genes undergoing trans-splicing, and both gene sets had greater 3′ accumulation than monocistronic genes lacking trans-splicing, which nonetheless had more 3′ accumulation than Drosophila genes (Figure 4—figure supplement 2A). Cues triggering trans-splicing, particularly operon-specific trans-splicing, appear to facilitate 3′ accumulation and perhaps predispose C. elegans Pol II to greater 3′ pausing genome-wide, whether or not a gene resides in an operon. It should be noted that Pol II accumulation at 3′ ends does not overlap with U-rich regions at 3′ ends. Therefore, the high GRO-seq signal is not due to selective enrichment of U-rich RNAs (Figure 4—figure supplement 2B,C).

GRO-seq readily detects divergent transcription from promoters. In C. elegans, divergent transcripts are short and initiated 75–150 bp upstream from TSSs of promoters lacking upstream divergent genes (Figure 4E and Figure 4—figure supplement 3A–C). The frequency of upstream divergent transcription in C. elegans appears to be intermediate in degree between that in mammals, where it occurs at the majority of active promoters (Kapranov et al., 2007; Core et al., 2008; Seila et al., 2008), and that in Drosophila, where it occurs only rarely (Nechaev et al., 2010; Core et al., 2012) (Figure 4F). For both mammals and worms, promoters with TATA elements rarely support divergent transcription (Figure 4—figure supplement 3D,E) (Core et al., 2012). These results underscore fundamental similarities and differences in the architecture, evolution, and function of eukaryotic promoters.

Genome-wide consequences of disrupting dosage compensation

Analysis of transcriptionally engaged Pol II in wild-type vs dosage-compensation-defective embryos provided a robust assessment of the genome-wide impact of disrupting dosage compensation. To assess dosage compensation, we disrupted sdc-2 (sex determination and dosage compensation), the central XX-specific factor that triggers assembly of all DCC components onto X and induces hermaphrodite sexual differentiation by repressing the autosomal male sex-determining gene her-1 (Figure 5A) (Dawes et al., 1999). Without sdc-2, DCC subunits fail to bind to X, and her-1 is expressed, causing XX embryos to become severely masculinized and die from overexpression of X-chromosome genes (Figure 5A). The pivotal role of sdc-2 in dosage compensation made its depletion the most effective way to disrupt dosage compensation, although depletion of any DCC condensin subunit causes similar XX lethality and elevation of X gene expression, as shown previously by our genome-wide measurements of gene expression (Jans et al., 2009).

Figure 5 with 3 supplements see all

Download asset Open asset

GRO-seq analysis of dosage-compensation.

(A) Genetic hierarchy for coordinate control of sex determination and dosage compensation. *sdc-2* is expressed solely in XX embryos and triggers the hermaphrodite fate. *sdc-2* acts together with *sdc-1* and *sdc-3*, both zinc finger proteins, to induce hermaphrodite sexual development by repressing transcription of the male sex-determining gene *her-1. sdc-2* acts together with *sdc-3* and *dpy-30*, also a member of the MLL/COMPASS gene activating complex, to load the DCC onto X and thereby turn dosage compensation on. *sdc-2* is the single gene required for all DCC components to assemble onto X. Without *sdc-2*, *her-1* is expressed, causing sexual transformation of XX embryos to the male fate, and the DCC fails to assemble onto X, causing severe dosage compensation disruption and the death of all XX embryos. The DCC contains not only the X loaders (red, orange) but also five homologs of the mitotic condensin complex (yellow, blue, green). The DCC binds to the X chromosomes of only XX animals to reduce transcription by half, thereby equalizing X-chromosome gene expression between males (XO) and hermaphrodites (XX). (B) GRO-seq shows that both RNA isoforms of *her-1* are elevated in *sdc-2* XX mutants. *her-1* is expressed at such a low level in XX embryos that the two transcription start sites (TSSs) are only evident with GRO-cap in the *sdc-2* mutants. The gene model (red arrow) incorporates 3′ end data from Jan et al. (2011). (C) and (D) The X-linked protein coding gene *pdi-2* is elevated in expression in *sdc-2* mutants as is the gene encoding the *mir-62* microRNA. For *pdi-2*, the elevation starts at the TSS and is evident throughout the gene. Red arrows show our re-annotated gene models. (E) The DCC subunit DPY-27 binds just upstream of the TSS. Comparison of the average DPY-27 ChIP-seq signal relative to WormBase (WB) starts and TSSs of X-linked genes. RPKM: reads per kilobase per million; RPM: reads per million.

https://doi.org/10.7554/eLife.00808.028

Figure 5—source data 1 Genome-wide changes in gene expression caused by the disruption of dosage compensation. Genes were separated into different gene sets based on their length, origin, and chromosome (X vs autosome) to compare GRO-seq gene expression between the sdc-2 mutant and control RNAi embryos. Shown are the total number of genes, the median sdc-2 mutant/control RNAi expression ratio, the average sdc-2 mutant/control RNAi expression ratio together with the standard error of the mean, and the number of genes in each set that are more highly expressed in each condition. Across numerous protein-coding gene sets, the X chromosome is more highly expressed and the autosomes are slightly less expressed in the sdc-2 mutant. Furthermore, when X-linked genes are significantly changed in expression, they are almost exclusively increased in expression. Each gene set is separated into two sets, one containing all genes and the other containing genes that are significantly changed in expression as determined by analysis with DESeq (p<0.05) (Anders and Huber, 2010). For the first two lists (labeled with ‘≥250 bp’), average GRO-seq gene expression was calculated from the beginning to the end of either the WormBase (WB) model or the newly annotated transcription start site (TSS) gene model. WB genes had to be expressed at greater than 1 RPKM (reads per kilobase per million), and have at least 250 uniquely mappable bases in both sets. For the next set (labeled with ‘WormBase WS230 Genes ≥1.1 kb’), average GRO-seq expression was calculated for genes greater than 1.1 kb, with the first and last 300 bp of the gene excluded. The level of expression had to be ≥1 RPKM for WB genes, and have at least 250 uniquely mappable bases for a gene to be included. For the final two sets of genes (labeled with ‘≥1.1 kb’ and ‘≥1.5 kb’), expression was calculated for genes of the indicated length that have a newly annotated TSS, with the first and last 300 bp of the gene excluded. A gene had to have at least 250 uniquely mappable bases for it to be included. RNA polymerase II transcribed microRNAs are controlled by dosage compensation, while RNA polymerase III transcribed tRNAs are not. Average GRO-seq gene expression from sdc-2 mutant and control RNAi embryos was compared across ncRNAs. For microRNAs, expression values were calculated from the full length of the WB ‘primary transcript’ or re-annotated TSS gene models. For tRNAs, expression values were calculated from the beginning of the ‘mature transcript’ to 50 bp downstream of the stop. Because tRNAs are highly repetitive and transcription of highly transcribed tRNAs continues downstream of the stop, the extra 50 bp was included to increase the unique mappability of each tRNA. For a gene to be considered for analysis, it had to have at least 25 bp of uniquely mappable DNA, and to have an average expression of at least 1 RPKM in both control RNAi and sdc-2 mutant embryos. The median and mean sdc-2/control expression levels show that X-linked microRNAs are more susceptible to dosage compensation than autosomal microRNAs. X-linked tRNAs are decreased slightly in expression in the sdc-2 mutant, suggesting that its expression is not controlled by dosage compensation.: https://doi.org/10.7554/eLife.00808.029
Download elife-00808-fig5-data1-v1.xls

Severe disruption of sdc-2 function was achieved by treating an sdc-2 partial loss-of-function XX mutant with sdc-2 RNAi [sdc-2(y93, RNAi)] (Figure 5—figure supplement 1). GRO-seq data showed that expression of her-1 (Figure 5B) and protein-coding genes on X (Figure 5C) were elevated in sdc-2 mutants, while expression of genes on autosomes was mildly reduced (Figure 5—figure supplement 2A–D). her-1 was de-repressed 12.7-fold, and the increase in GRO-seq signal was uniform from the promoter to the 3′ end, suggesting that SDC-2 controls sex determination by reducing Pol II recruitment or initiation at the her-1 promoter in XX embryos (Figure 5B).

The median change of gene expression in sdc-2 mutant vs control embryos was an increase of 1.63- to 1.67-fold for X-linked genes of all lengths and a slight reduction of 0.79- to 0.81-fold for autosome-linked genes (Figure 5—source data 1 and Figure 4—source data 1). For the subset of X and autosomal genes whose expression was statistically different (p≤0.05) between mutant and control embryos, 99% of X-linked genes had elevated expression in the mutants, while 63–66% of autosomal genes had reduced expression (Figure 5—source data 1 and Figure 4—source data 1). The reduced autosomal gene expression was robust to normalization procedures that counteract potential complications from increased X expression (see ‘Materials and methods’).

Our GRO-seq experiments also provided the first indication that dosage compensation controls the expression of small non-coding RNAs. We found most embryonically expressed X-linked microRNAs to be dosage compensated (Figure 5D and Figure 5—source data 1, and ‘Materials and methods’), while X-linked tRNAs were not, implying that RNA polymerase II is broadly sensitive to dosage compensation, and RNA polymerase III is insensitive (Figure 5—source data 1).

Previous studies showed the DCC binds to sequence-specific DNA recruitment sites on X and disperses to promoter regions of actively transcribed genes (Ercan et al., 2009; Jans et al., 2009; Pferdehirt et al., 2011), but the lack of precise TSS calls had prevented an accurate alignment of DCC binding sites with promoters. Mapping of new TSSs relative to DCC binding sites called from our ChIP-seq experiments performed for this comparison showed the peak of DCC binding to be immediately upstream of the TSS (Figure 5E). Although the DCC binds to promoter regions, prior studies showed that DCC binding to a gene is neither necessary nor sufficient for the compensation of that gene, and not all genes are dosage compensated (Jans et al., 2009). We assessed the generality of that conclusion by comparing DCC binding upstream of a TSS to the increase in gene expression caused by disrupting sdc-2. Our results confirmed and extended the original conclusion (Figure 5—figure supplement 3A,B). Thus, the DCC can act at a distance to control gene expression, and DCC binding intensity is not a proxy for the dosage compensation status of the gene.

The C. elegans dosage compensation process does not regulate promoter-proximal pausing

To determine the step of transcription controlled by the DCC, we compared the distribution of the GRO-seq signal along X and autosomal genes between control and dosage-compensation-defective embryos. The change in Pol II distribution expected from disrupting dosage compensation differs according to the step of transcription affected by the DCC. If the DCC restricts Pol II recruitment, a uniform increase in engaged Pol II would be expected from the promoter to the 3′ end of genes of mutants. If the DCC reduces transcription by preventing the release of Pol II promoter-proximal pausing, or by reducing transcription elongation at stages downstream of the pause site, an increase in the level of engaged Pol II would be expected in the gene body and 3′ end in mutants, with the increase beginning more promoter proximal for a mechanism that controls pausing.

Promoter-proximal pausing of transcriptionally engaged Pol II is a rate-limiting step of transcription in metazoans (Adelman and Lis, 2012) observed at ∼ 40% of active genes in mammalian cells (Core et al., 2008) and more than 60% of active genes in Drosophila (Core et al., 2012). To determine whether control of 5′ pausing was a plausible mechanism for dosage compensation, we calculated 5′ pausing ratios (GRO-seq signal for 5′ end/gene body) for genes in each developmental stage using TSSs calls for the respective stage (Figure 4—source data 1). In contrast to Drosophila and mammalian genomes, very few genes (0.38%, 15 of 3975) of wild-type C. elegans embryos showed evidence of 5′ pausing (Figure 4—source data 1, Figure 6A,B, and Figure 6—figure supplement 1A–D). The paused genes were not enriched on X (Figure 6C). Moreover, the number of 5′ paused genes was not decreased in sdc-2 mutants (Figure 4—source data 1 and Figure 6D,E).

Figure 6 with 1 supplement see all

Download asset Open asset

Promoter-proximal RNA Pol II pausing is rare in *Caenorhabditis elegans* and is not the target of dosage compensation.

(A) GRO-seq and GRO-cap signals show that a gene not paused in embryos becomes paused in L1 larvae deprived of food. (B) Comparison of average GRO-seq signal from embryos, starved L1 larvae, and L3 larvae within 2 kb of transcription start sites (TSSs) called in all three stages shows that promoter-proximal pausing in embryos and L3s is rare compared to that in starved L1 larvae. (C) Promoter-proximal pausing is not enriched on the X chromosome relative to autosomes in embryos. If dosage compensation prevented the release of Pol II from promoter-proximal pause sites, there should be higher levels of pausing on the X chromosome. (D) and (E) The level of promoter-proximal pausing is not decreased in *sdc-2* mutants compared to control embryos. If dosage compensation reduced gene expression by preventing the release of Pol II from promoter-proximal pause sites, the *sdc-2* mutant should exhibit lower levels of pausing. (D) Although X-linked genes have increased expression in *sdc-2* mutants, their level of pausing is not decreased. (E) The level of pausing displayed by autosomal genes is unchanged in *sdc-2* mutants. RPKM: reads per kilobase per million; RPM: reads per million.

https://doi.org/10.7554/eLife.00808.033

Nonetheless, the GRO-seq assay is capable of detecting 5′ pausing in C. elegans, since genes not paused in embryos become paused in L1s deprived of food (Figure 6A and Figure 6—figure supplement 1A–C). In starved L1s, we found 7.7% of genes (166 of 2133) to exhibit 5′ pausing, and most were on autosomes (Figure 4—source data 1). A prior genome-wide analysis of Pol II binding in L1 animals cultured with and without food discovered an enrichment of Pol II only at some promoters of food-deprived larvae, consistent with 5′ proximal pausing (Baugh et al., 2009). Our results confirm their proposal that Pol II pausing can be induced by food deprivation. These cumulative results make it highly implausible that 5′ pausing control is the mechanism of X-chromosome dosage compensation in embryos, and they are consistent with the lack of a C. elegans negative elongation factor complex (NELF), which contributes, in part, to pausing in other organisms (Yamaguchi et al., 1999).

The C. elegans dosage compensation process regulates the recruitment of Pol II to promoters

Analysis of GRO-seq data comparing transcriptionally engaged Pol II in control vs dosage-compensation-defective XX embryos revealed a uniform increase in the level of engaged Pol II across the entire length of X-linked genes, from TSSs to 3′ ends, in the sdc-2 mutant. This conclusion was reached both by metagene analysis (Figure 7A) and by the analysis of individual genes exhibiting a range of overexpression in the mutant (Figures 5C and 7C, and Figure 7—figure supplement 1). Analysis of individual genes averts complications in interpretations that might arise from averaging of data.

Figure 7 with 4 supplements see all

Download asset Open asset

The DCC condensin complex reduces X-chromosome gene expression in XX embryos by restricting Pol II recruitment to promoters.

(A) Uniform increase in GRO-seq signal across the length of X-linked genes results from disrupting dosage compensation. Metagene analysis comparing the average GRO-seq signal from 665 X-linked genes ≥1.5 kb in control RNAi or *sdc-2* mutant embryos. Genes were scaled to the same length as follows: 5′ ends (−1 kb to +500 bp of the transcription start site [TSS]) and 3′ ends (500 bp upstream to 1 kb downstream of 3′ end) were not scaled, and the gene body was scaled to 2 kb. The signal was averaged at each base pair and then averaged across 25 bp windows. The GRO-seq signal is elevated approximately 1.6-fold across genes in *sdc-2* mutant versus control RNAi embryos (below). (B) The GRO-seq signal is decreased slightly across autosomal genes in *sdc-2* mutant versus RNAi control embryos. Metagene analysis of 2949 autosomal genes ≥1.5 kb performed as in (D). The ratio of the GRO-seq signal in mutant versus control embryos is about 0.9. (C) Heat map shows that the GRO-seq signal is increased along the length of each X-linked gene in *sdc-2* mutants. For each of 665 genes, the GRO-seq signal from mutant or control embryos was summed across 100 bp windows and the *sdc-2*/control ratio was calculated for each window. The log₂(*sdc-2*/control) value was plotted across the scaled gene. (D) Heat map shows that the GRO-seq signal is moderately decreased along the length of individual autosomal genes in *sdc-2* mutants. For each of 2949 autosomal genes, the log₂(*sdc-2*/control) value was plotted across the scaled gene as in (F). (E) Dosage compensation does not specifically affect Pol II elongation. An elongation density index was calculated for each gene greater than 2 kb in length that did not have another gene on the same strand within 1 kb of the TSS. After excluding the first and last 500 bp of the gene, the average signal across the last 75% of the remaining gene was divided by the average signal across the first 25% of the remaining gene. Ratios of the indices between the *sdc-2* mutant and control RNAi embryos are not significantly different for genes on the X compared to the autosomes. Error bars represent a 95% confidence interval for the mean indices. n = 481 (X); n = 1861 (autosomes). (F) Occupancy of hypo-phosphorylated Pol II at the promoters of X-linked genes is increased in dosage compensation mutants, showing greater Pol II recruitment. Comparison of normalized Pol II ChIP-chip signal from control RNAi or *sdc-2* mutant embryos relative to newly annotated TSSs of X-linked genes. (G) Sense and upstream divergent transcription are coordinately increased for X-linked genes in *sdc-2* mutants. Comparison of average sense or antisense GRO-seq signal from *sdc-2* mutant and control RNAi embryos for a 3 kb window surrounding TSSs for genes with no divergent gene partner. The GRO-cap signal was only evaluated in this analysis for genes having a log₂(sense/antisense) ratio ≤1.5 in control RNAi embryos.

https://doi.org/10.7554/eLife.00808.035

For the metagene analysis, the ratio of engaged polymerase in mutant vs control was consistently elevated by at least 1.6-fold across scaled X-linked genes, from TSSs to 3′ ends (Figure 7A). This elevation in the GRO-seq signal was also readily apparent in metagene analyses of X-linked genes subdivided by quartiles of expression (Figure 7—figure supplement 2) or by gene length (Figure 7—figure supplement 3). In contrast, metagene analysis of autosomal genes from the same data sets showed only a slight decrease in engaged Pol II across their lengths (Figure 7B), consistent with the slight decrease in autosomal gene expression in mutants (Jans et al., 2009).

Heat maps displaying the log₂ ratio of mutant to control GRO-seq signal across individual X-linked (Figure 7C) and autosomal (Figure 7D) genes confirmed the uniform elevation of transcriptionally engaged Pol II across X-linked genes and the reduction across individual autosomal genes of sdc-2 mutants. Together, these results demonstrate that the dosage compensation mechanism reduces X-linked gene expression in C. elegans hermaphrodites by restricting either the recruitment or initiation of Pol II. This conclusion is strongly supported by the analysis of elongation density indices (average GRO-seq signal across the last 75% of genes/average signal across the first 25%) in mutant vs control embryos for genes on X and autosomes. The ratios of indices between control and mutant embryos were not significantly different for genes on X compared to autosomes, indicating that dosage compensation does not preferentially affect Pol II elongation (Figure 7E).

Two further lines of evidence show Pol II recruitment to be the predominant step of transcription targeted by the dosage compensation process. First, re-analysis of our previous genome-wide ChIP-chip analysis of Pol II binding (Pferdehirt et al., 2011) relative to the new TSSs showed an approximately twofold increase in the occupancy of hypo-phosphorylated Pol II at the promoters of X-linked genes in sdc-2 mutant vs control embryos (Figure 7F). DNA-bound hypo-phosphorylated Pol II at promoters is enriched for non-initiated Pol II. The increase in Pol II occupancy at promoters of mutants (measured by ChIP) was nearly equivalent to the increase in post-initiated Pol II (measured by GRO-seq), indicating that Pol II promoter recruitment is rate-limiting when dosage compensation is active.

Second, in sdc-2 mutants, the levels of sense and upstream divergent transcription were elevated coordinately at promoters on X lacking upstream divergent genes (Figure 7G and Figure 7—figure supplement 4A). At those promoters, the level of hypo-phosphorylated Pol II was also elevated about twofold in mutant vs control embryos, as assayed by ChIP (Figure 7—figure supplement 4B). Since we found increased Pol II occupancy in mutants and observed virtually no promoter-proximal pausing in control embryos, escape from pausing into productive elongation, or any other form of post-initiation regulation, cannot be the cause of elevated transcription at the divergent promoters in sdc-2 mutants. Furthermore, an increase in transcription initiation by bound polymerases cannot account for these results, but an increase in Pol II recruitment can. Our combined experiments demonstrate that C. elegans dosage compensation controls X-chromosome-wide gene expression predominantly by reducing Pol II recruitment to promoters of hermaphrodites.

C. elegans balances gene expression between X chromosomes and autosomes

In organisms that equalize X-chromosome gene expression between the sexes by a dosage compensation process, the question arises as to whether the compensated level of X-chromosome expression is equivalent to or half of the expression from the two sets of autosomes. The answer to this question has been controversial, although evidence has mounted in favor of a mechanism to balance total expression between X chromosomes and the two sets of autosomes based on measurements of accumulated transcripts (Xiong et al., 2010; Deng et al., 2011; Disteche, 2012; Lin et al., 2012; Deng et al., 2013; Jue et al., 2013). Our GRO-seq experiments have addressed this question in the most definitive way to date by measuring the levels of nascent transcripts prior to co-transcriptional processing. We show that in wild-type C. elegans embryos, X and autosomes have nearly equivalent levels of total gene expression, and that levels of transcribing Pol II are uniform across X and autosomal genes (Figure 8A). In dosage-compensation-defective mutants, the level of X expression and engaged Pol II exceeds that of autosomes by 1.7-fold, from the TSSs to the 3′ ends (Figure 8B). These results demonstrate the existence of a separate mechanism in C. elegans to elevate the intrinsic rate of transcription from the X chromosomes of both sexes, so that after dosage compensation, X chromosomes and the two sets of autosomes have equivalent expression. Our experiments provide evidence that it, like dosage compensation, works at the level of controlling Pol II recruitment.

Figure 8

Download asset Open asset

Gene expression is balanced between X chromosomes and autosomes.

(A) *Caenorhabditis elegans* has a mechanism to equalize expression between X chromosomes and autosomes. Metagene analysis comparing the average GRO-seq signal from X-linked and autosome-linked genes of control RNAi embryos. The X to autosome expression ratio is 0.9. (B) In dosage-compensation-defective mutants, the level of X-chromosome expression exceeds that of autosomes by 1.7-fold. Metagene analysis comparing the average GRO-seq signal from X-linked and autosome-linked genes of *sdc-2* mutant embryos. RPKM: reads per kilobase per million.

https://doi.org/10.7554/eLife.00808.040

Discussion

A critical step in dissecting the mechanism of dosage compensation in any organism is to determine the primary step of transcription controlled by the dosage compensation complex to balance X-chromosome gene expression between the sexes. A major obstacle towards attaining this goal for C. elegans had been the lack of properly annotated transcription start sites for the majority of its genes. We therefore initiated our studies by developing a general and efficient strategy to map TSSs for Pol II-regulated genes with nucleotide resolution. This method, called GRO-cap, utilized purified nascent transcripts from the NRO reactions developed for GRO-seq and employed redundant enzymatic steps to capture RNAs with 5′ caps. GRO-cap tracks RNAs prior to their co-transcriptional processing. This approach was critical for our analysis because co-transcriptional processing removes TSSs from the transcripts of most nematode genes, including protein-coding genes and small non-coding RNA genes.

We found TSSs for protein-coding genes to be unexpectedly far upstream from the 5′ ends of mature mRNAs. True TSSs were as far as 14 kb upstream of previously annotated 5′ ends (typically the trans-spliced acceptor sites), and 59% of TSSs were >200 bp upstream and 21% >1 kb upstream. GRO-cap was also effective at identifying TSSs for the majority of small processed RNAs, including microRNAs, tRNAs, snoRNAs, and 21 U-RNAs.

To determine the step of transcription controlled by the DCC, we compared the genome-wide distribution of transcriptionally engaged Pol II (from GRO-seq) relative to the newly defined TSSs (from GRO-cap) in wild-type vs dosage-compensation-defective XX embryos. We found 5′ promoter-proximal pausing to be rare, unlike in other metazoans, and not likely to be the target of dosage compensation. Instead, dosage compensation reduces recruitment of RNA Pol II to the promoters of hermaphrodite X-linked genes, thereby decreasing transcription from the two X chromosomes of hermaphrodites to the level of transcription from the single X chromosome of males. A separate regulatory mechanism elevates the intrinsic level of transcription from X chromosomes of both sexes so that after dosage compensation total expression from X chromosomes and autosomes is equivalent.

Advantages of GRO-cap

GRO-cap has advantages over TSS mapping strategies that use total accumulated RNA or short-capped (sc) RNA as the starting material. The use of nascent RNAs without base hydrolysis or size selection, as in GRO-cap, enriches the proportion of 5′ capped RNAs within the starting RNA population, reduces the level of false TSS calls from RNAs that are capped post-transcriptionally (Fejes-Toth et al., 2009), and increases the probability of identifying TSSs from promoters that are transcribed at low levels. Most importantly, the GRO-cap strategy permits reliable assignment of TSSs by pairing TSS calls with uninterrupted GRO-seq signal for transcriptionally engaged Pol II between the GRO-cap TSS and the previously annotated 5′ end. Multiple lines of evidence showed that GRO-cap signal combined with continuous GRO-seq signal from WB starts defines true TSSs.

Two reports of nematode TSSs have recently been published. The first used two approaches to identify the transcription starts (Gu et al., 2012). The first, called CapSeq, used total accumulated RNA as the starting material for the enzymatic enrichment of RNAs with 5′ caps and set the cutoff for TSS calls to be within 1 kb upstream and 100 bp downstream from previously annotated 5′ ends. As a consequence, CapSeq did not identify TSSs for a large class of protein-coding genes. TSSs of small processed non-coding RNAs were also difficult to identify by CapSeq. The second approach, called CIP-TAP, used scRNAs and was equivalently effective as GRO-cap for identifying TSSs of small non-coding RNAs.

The second report also used scRNAs to map TSSs and set the cutoff for TSS calls to be no greater then 200 bp upstream of the previously annotated 5′ ends, thereby also not defining TSSs for a large class of genes (Chen et al., 2013). The authors found numerous clusters of Pol II initiation upstream of their calls and classified many of these initiation events as enhancer-like chromatin signatures based on overlap with bound transcription factors. Based on comparison with our data (e.g., Figure 1—figure supplement 12A,B), we propose that a proportion of the upstream enhancer-like signatures are TSSs giving rise to full-length transcripts. Two clear examples of genes are shown in Figure 1—figure supplement 12B,C, where the single TSS (either 2534 bp or 2878 bp upstream of the WB start) called from GRO-cap and GRO-seq data was classified as an enhancer by Chen et al. (2013).

Features of C. elegans transcription

Although 5′ promoter-proximal pausing in metazoans is a common rate-limiting step of transcription that is highly regulated to control gene expression (Adelman and Lis, 2012), we found it to be rare in C. elegans under normal growth conditions. However, in L1 larvae deprived of food, we found 5′ pausing at 7.7% of genes with TSS calls. A prior study using ChIP-seq discovered the accumulation of Pol II at promoters of starved larvae and proposed that 5′ promoter-proximal pausing was responsible for Pol II accumulation (Baugh et al., 2009). Our results confirm that interpretation. Two factors promote 5′ pausing in metazoans, negative elongation factor (NELF) and DRB sensitivity-inducing factor (DSIF), although the relative contribution of each is not fully understood (Adelman and Lis, 2012; Yamaguchi et al., 2013). C. elegans appears to lack NELF, suggesting either that DSIF is sufficient in the infrequent cases of pausing or that the core promoter complex (Kwak et al., 2013) or another not-yet identified negative elongation factor participates.

The 3′ accumulation of Pol II has been documented previously in flies and humans using GRO-seq (Core et al., 2008, 2012). We found 3′ Pol II accumulation to be more extensive in nematodes than flies. Pol II 3′ accumulation is likely caused by slow 3′ end formation and RNA processing (Gromak et al., 2006). Because C. elegans has numerous polycistronic operons requiring extensive RNA processing to produce monocistronic mRNAs (Blumenthal, 2012), we were able to discover a strong positive correlation between the amount of 3′ Pol II accumulation and the amount of RNA processing. Curiously, though, even genes that were not part of operons exhibited greater 3′ accumulation than genes in flies. This observation raises the question of whether RNA processing in C. elegans is generally slower than that in other organisms to accommodate extensive trans-splicing, or whether C. elegans accommodates operons by imposing additional regulation on Pol II to enhance its 3′ pausing at all genes to assess whether to halt transcription or continue elongation.

C. elegans dosage compensation controls RNA Pol II recruitment

Gene expression in metazoans is controlled by diverse regulatory mechanisms that function over widely different distances. Some mechanisms act locally on individual genes, while others such as dosage compensation function across large chromosomal territories or along entire chromosomes to regulate a large set of genes coordinately. In general, the step of transcription controlled by long-range mechanisms is not understood. In our study, multiple lines of evidence supported the conclusion that C. elegans dosage compensation regulates gene expression along X primarily by reducing the recruitment of RNA Pol II to the promoters of hermaphrodite X-linked genes.

First, regulation of 5′ promoter-proximal pausing cannot be the mechanism underlying C. elegans dosage compensation. If the DCC reduced X-chromosome gene expression by increasing 5′ promoter-proximal pausing, numerous genes on X would be paused in wild-type embryos, and disruption of dosage compensation would reduce the level of pausing. GRO-seq experiments showed instead that 5′ promoter-proximal pausing is rare in wild-type XX embryos. The few genes that exhibited 5′ pausing were not enriched on X chromosomes relative to autosomes, and the level of pausing was not decreased in dosage-compensation defective mutants.

Second, Pol II-mediated transcription elongation is not preferentially affected by the DCC. The ratios of elongation density indices (average GRO-seq signal in last 75% of a gene/average GRO-seq signal in first 25% of a gene) calculated for genes on X chromosomes and autosomes in control vs dosage-compensation-defective embryos were not significantly different between X-linked and autosomal genes, indicating that Pol II-mediated transcription elongation is not selectively changed on X chromosomes of mutants.

Third, the level of transcriptionally engaged Pol II assayed genome-wide by GRO-seq in control vs dosage-compensation-defective XX embryos revealed a uniform increase in engaged Pol II across the entire length of X-chromosome genes, but not autosomal genes, in mutants. Hence, the DCC controls X-chromosome gene expression in XX animals by reducing either the recruitment or initiation of Pol II. This conclusion was validated both by metagene analysis and by analysis of hundreds of individual X-linked genes showing different levels of de-repression in dosage-compensation-defective mutants.

Fourth, genome-wide quantification of Pol II occupancy by ChIP plotted relative to the new TSSs showed an increase in the hypo-phosphorylated form of Pol II at promoters of dosage-compensation-defective embryos vs control embryos that was equivalent to the increase in post-initiated Pol II measured by GRO-seq. Since DNA-bound hypo-phosphorylated Pol II at promoters is enriched for non-initiated Pol II, these results indicate that Pol II promoter recruitment is rate limiting when dosage compensation is active.

Our combined experiments reveal that the primary mechanism by which the C. elegans dosage compensation process reduces X-chromosome gene expression by half in XX embryos is to limit Pol II recruitment to promoters of X-linked genes. Our study does not eliminate the possibility of a minor repressive influence acting through another step of transcription or through a post-transcriptional mechanism such as RNA stability.

DCC function

How might condensin reduce Pol II recruitment to the promoters of X-linked genes by approximately twofold? Our current and prior (Jans et al., 2009) studies showed that DCC binding to the promoter of a gene is neither necessary nor sufficient to elicit repression of the gene. Hence, the DCC influences gene expression over long distance, likely by imposing changes in higher-order chromosome structure. Clues to such a DCC function were suggested originally by the simultaneous discovery of the DCC and condensin’s biochemical properties in vitro as an ATPase that alters DNA topology (Chuang et al., 1994; Kimura and Hirano, 1997; Hagstrom et al., 2002; Hirano, 2012; Piazza et al., 2013) and its canonical roles in vivo of compacting and resolving mitotic and meiotic chromosomes for proper chromosome segregation (Hirano and Mitchison, 1994; Hagstrom et al., 2002; Chan et al., 2004; Hirano, 2012; Piazza et al., 2013). However, mechanisms of DCC function are perhaps best deduced from its non-canonical roles in vivo of regulating interphase chromosome structure (Bauer et al., 2012) and meiotic crossover recombination (Mets and Meyer, 2009; Wood et al., 2010; Aragon et al., 2013). Condensin II in Drosophila induces axial compaction of interphase chromosomes, globally disrupts inter-chromosomal interactions, and promotes dispersal of peri-centric heterochromatin (Bauer et al., 2012). These activities serve to compartmentalize the interphase nucleus into discrete chromosomal territories. Furthermore, nematodes carrying mutations that disrupt condensin I, which shares four subunits with the DCC, display elongated chromosomal axes during meiotic prophase and exhibit chromosome-wide changes in the distribution of double strand breaks and crossovers (Mets and Meyer, 2009), providing strong evidence that DCC subunits control intra-chromosomal structure.

These cytological, biochemical, and genetic observations of condensin function suggest that chromosome structure might affect Pol II recruitment in several ways, which are not mutually exclusive. Chromosome compaction and associated topological changes in DNA could broadly affect promoter accessibility of Pol II and the regulatory factors that recruit or stabilize it, thereby reducing Pol II recruitment in a quantitatively similar manner at different sites. Compaction of interphase chromosomal territories could reduce the local concentration of bound transcription activators and hence bound Pol II. Changes in intra-chromosomal interactions could alter the relationships between distal regulatory regions and their target promoters, thereby limiting Pol II recruitment. These and other models are topics of future studies.

Dosage compensation mechanisms differ between nematodes and flies

Dosage compensation strategies differ across species. Mammals inactivate one of the two female X chromosomes, flies double expression of the single male X chromosome, and nematodes halve expression of both hermaphrodite X chromosomes. A central question is whether the molecular mechanisms underlying these diverse forms of chromosome-wide transcriptional regulation are the same or different.

In Drosophila melanogaster, dosage compensation is achieved by the male-specific lethal (MSL) complex that binds along the single X chromosome of males to double transcription of X-linked genes (Gelbart and Kuroda, 2009; Conrad and Akhtar, 2012). The complex contains two long non-coding RNAs and five proteins, including the H4K16 histone acetyltransferase MOF (Hilfiker et al., 1997; Conrad et al., 2012a) and the H2BK34 ubiquitin ligase MSL2 (Wu et al., 2011). The MSL complex was proposed to regulate gene expression by controlling transcription elongation (Smith et al., 2001). This model was subsequently supported by genome-wide mapping of MSL proteins and MSL-dependent H4K16ac to the bodies of male X-linked genes, with a bias toward 3′ ends (Alekseyenko et al., 2006; Gilfillan et al., 2006; Legube et al., 2006), and GRO-seq experiments in S2 cells with or without RNAi of the msl2 gene (Larschan et al., 2011). While a recent study analyzing genome-wide Pol II occupancy in males and females by ChIP-seq experiments suggested an alternative view that fly dosage compensation operates at the level of Pol II recruitment or initiation (Conrad et al., 2012b), a mathematical computation error discovered in this study (Ferrari et al., 2013; Straub and Becker, 2013) rendered the results insufficient to distinguish between the competing models. The elongation model received recent further support from the discovery that dosage compensation is disrupted by impairing the function of SPT5, a transcription elongation factor that co-localizes with the MSL complex on male X chromosomes and interacts physically with the MSL complex (Prabhakaran and Kelley, 2012). Thus, the weight of evidence strongly favors enhancement of transcription elongation as the primary mechanism of fly dosage compensation.

Not only does the overall dosage compensation strategy differ between worms and flies, the underlying molecular mechanism appears to differ as well. In worms, reduction in X-chromosome gene expression is primarily achieved by reducing recruitment of Pol II to promoters, while in flies, elevation in X-chromosome gene expression is primarily achieved by facilitating Pol II transcription elongation. Multiple solutions have evolved to coordinately control gene expression across an entire chromosome.

Balancing gene expression between X chromosomes and autosomes

In many species, the evolution of sex chromosomes to be the primary determinants of sexual fate resulted in males having one X chromosome and females having two. Such chromosome sex-determining mechanisms had the potential to cause two problems in gene expression, an imbalance in X-chromosome gene expression between the sexes and an imbalance in gene expression between the male X chromosome and his two sets of autosomes (Disteche, 2012). X-chromosome dosage compensation strategies such as the Drosophila strategy solved both problems by doubling transcription of the single male X chromosome. An alternative solution would be to co-evolve two mechanisms, one to increase expression of X chromosomes in both sexes, thereby preventing hypo X expression in males, and a second to decrease total expression from female X chromosomes to prevent hyper X expression relative to female autosomes and male X chromosomes. Nematodes and mammals evolved the second strategy for equalizing X-chromosome expression between the sexes. It has been controversial whether these organisms also evolved a strategy to elevate X expression in both sexes and thereby balance expression between X chromosomes and autosomes (Xiong et al., 2010; Deng et al., 2011; Disteche, 2012; Lin et al., 2012; Deng et al., 2013; Jue et al., 2013).

The bulk of evidence now favors the presence of a mechanism to up-regulate X-chromosome expression in males and females of both organisms (Deng et al., 2011, 2013; Jue et al., 2013), and our results provide the most compelling evidence to date for C. elegans. Our approach of examining nascent transcripts in mixed-stage embryos by GRO-seq conferred three advantages. First, it enabled us to quantify transcription specifically from somatic embryonic cells and thereby avert the complication of quantifying maternally contributed germline transcripts that contaminate mature embryo mRNA. Second, we could examine nascent transcription at a stage for which most germline-specific mechanisms of gene regulation would have been erased. Third, we could quantify transcriptionally engaged Pol II across the entire length of a gene, starting from the bona fide TSS, and assess the step of transcription controlled by the process. We found that a major part of the mechanism to increase X expression in C. elegans is to increase the level of Pol II recruitment to promoters, as was shown recently for mammals (Deng et al., 2013). Together, these results reinforce the evolutionary importance of balancing gene expression across all chromosomes of a genome.

Share this article

Cite this article

Genome-wide annotation of Caenorhabditis elegans transcription start sites.

Figure 1—source data 1

Figure 1—source data 2

Figure 1—source data 3

GRO-cap strategy for identifying TSSs.

Features of promoters and TSSs.

Features of Caenorhabditis elegans transcription: 3′ Pol II pausing and divergent transcription.

Figure 4—source data 1

GRO-seq analysis of dosage-compensation.

Figure 5—source data 1

Promoter-proximal RNA Pol II pausing is rare in Caenorhabditis elegans and is not the target of dosage compensation.

The DCC condensin complex reduces X-chromosome gene expression in XX embryos by restricting Pol II recruitment to promoters.

Gene expression is balanced between X chromosomes and autosomes.

Author details

William S Kruesi

Contribution

Contributed equally with

Competing interests

Leighton J Core

Contribution

Contributed equally with

Competing interests

Colin T Waters

Contribution

Competing interests

John T Lis

Contribution

Competing interests

Barbara J Meyer

Contribution

For correspondence

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism