Gene transcription can be activated by decreasing the duration of RNA polymerase II pausing in the promoter-proximal region, but how this is achieved remains unclear. Here we use a ‘multi-omics’ approach to demonstrate that the duration of polymerase pausing generally limits the productive frequency of transcription initiation in human cells (‘pause-initiation limit’). We further engineer a human cell line to allow for specific and rapid inhibition of the P-TEFb kinase CDK9, which is implicated in polymerase pause release. CDK9 activity decreases the pause duration and increases the productive initiation frequency. This shows that CDK9 stimulates release of paused polymerase and activates transcription by increasing the number of transcribing polymerases and thus the amount of mRNA synthesized per unit time. CDK9 activity is also associated with long-range chromatin interactions, suggesting that enhancers can influence the pause-initiation limit to regulate transcription. https://doi.org/10.7554/eLife.29736.001
Genes can contain the coded instructions to make proteins. These instructions must first be copied, or transcribed, into an intermediate molecule called a messenger RNA by an enzyme known as RNA polymerase II. Shortly after it begins, this enzyme – which is called Pol II for short – pauses, and it only starts again after it recruits other proteins, including one called CDK9.
The number of RNA copies made of a gene depends upon how many Pol II enzymes begin transcription. Pol II pausing also has an effect – if the enzymes pause for longer, less messenger RNA is transcribed. But why does this happen? One hypothesis is that paused Pol II enzymes interfere with other Pol II enzymes initiating transcription. Yet, until recently it was not possible to measure if this actually happens in living cells.
Now, Gressel, Schwalb et al. used a new biochemical method together with a compound that blocks CDK9 to measure pausing and transcription initiation for active genes in living human cells. The CDK9 inhibitor was used to make Pol II enzymes pause for longer than normal. Gressel, Schwalb et al. found that different genes responded differently to CDK9 inhibition, meaning that some remained paused for longer than others. The number of Pol II enzymes that initiated transcription was calculated by measuring how many RNA copies had been made locally at the site of transcription. These experiments showed that blocking the release of paused Pol II strongly reduced the number of RNA copies made.
Gressel, Schwalb et al. conclude that Pol II pausing can control initiation of transcription. Cells may use Pol II pausing to adjust how many copies of an RNA are made, helping to ensure that different cell types make the appropriate number of RNA copies from a gene. Many diseases are associated with gene transcription being incorrectly regulated. This and future studies will help scientists to better understand how Pol II pausing contributes to the control of transcription in both normal and diseased cells. https://doi.org/10.7554/eLife.29736.002
Transcription in metazoan cells is often regulated at the level of promoter-proximal pausing (Core et al., 2008; Day et al., 2016; Henriques et al., 2013; Nechaev et al., 2010; Rougvie and Lis, 1988; Strobl and Eick, 1992), which can be detected by measuring the occupancy of paused Pol II by ChIP-seq (Johnson et al., 2007), GRO-seq (Core et al., 2008), (m)NET-seq (Mayer et al., 2015; Nojima et al., 2015), or PRO-seq (Kwak et al., 2013). Genes with paused Pol II are conserved across mammalian cell types and states (Day et al., 2016). The mechanisms underlying how Pol II pausing can regulate RNA transcript synthesis remain unclear.
Transcription of a human protein-coding gene of average length takes at least half an hour to be completed. The duration of pausing, however, lies in the range of minutes (Jonkers et al., 2014) and does not considerably change the overall time it takes to complete a transcript. Thus, how can changes in the pause duration lead to synthesis of a different number of RNA transcripts per unit time? It has been suggested that a decreased pause duration goes along with a higher initiation frequency, because occupancy peaks for promoter-proximal Pol II can increase upon gene activation (Boehm et al., 2003) or can remain high even when pausing is impaired (Henriques et al., 2013).
The height of Pol II occupancy peaks however cannot directly inform on initiation frequency or pause duration because it depends not only on the number of polymerases that pass the pause site but also on their residence time (Ehrensberger et al., 2013). A kinetic model of transcription predicted that pause duration delimits the initiation frequency and suggested that paused Pol II sterically interferes with initiation (Ehrensberger et al., 2013). Indeed, modeling reveals that a paused polymerase positioned up to around 50 bp downstream of the TSS could sterically interfere with formation of the Pol II initiation complex (Figure 1—figure supplement 1). Even if a paused polymerase is located further downstream, it may still interfere with initiation if one or more additional elongating polymerases line up behind it.
The critical relationship between pausing and initiation has thus far not been tested experimentally, because no methods were available to measure initiation frequencies. A recently developed method, transient transcriptome sequencing (TT-seq) (Schwalb et al., 2016), now makes it possible to follow the flow of polymerases, because it measures local RNA synthesis rates genome-wide at nucleotide resolution.
Here we investigate whether changes in pause duration alter initiation frequency in living cells. We specifically inhibit the kinase CDK9, which facilitates Pol II pause release (Laitem et al., 2015; Marshall and Price, 1992; Peterlin and Price, 2006), and monitor RNA synthesis and initiation frequencies by TT-seq. A combination of TT-seq data with mNET-seq data allows us to derive pause durations for active genes. We conclude that the duration of pausing can control transcription initiation at human genes, and derive determinants for CDK9-dependent pause release and initiation activation.
To specifically inhibit CDK9, we used a chemical biology approach (Lopez et al., 2014) that circumvents off-target effects of standard CDK9 inhibitors (Morales and Giordano, 2016). We introduced a CDK9 analog sensitive mutation (CDK9as) into human Raji B cells by CRISPR-Cas9 (Materials and methods, Figure 1—figure supplement 2A–B). This allows for rapid and highly specific CDK9 inhibition with the adenine analog 1-NA-PP1 (Lopez et al., 2014), which does not have any effect on wild type cells (Figure 1—figure supplement 2C). CDK9 protein levels were unchanged in CDK9as mutant cells compared to wild type cells (Figure 1—figure supplement 2D). After 72 hr of incubation with 1-NA-PP1, growth of CDK9as cells ceased, whereas wild type cells grew normally (Figure 1—figure supplement 2E).
We treated CDK9as cells with 5 μM of 1-NA-PP1 for 10 min and monitored changes in RNA synthesis by TT-seq (Schwalb et al., 2016), using an RNA labeling time of 5 min (Figure 1A). TT-seq data were highly reproducible (Spearman correlation coefficient 1) and monitored transcription activity before and after CDK9 inhibition (Figure 1B). CDK9 inhibition resulted in reduced TT-seq signals at the beginning of genes, indicating that less Pol II was released into gene bodies (Figure 1B, Figure 2—figure supplement 1A–B). This gave rise to a ‘response window’ revealing the distance traveled by Pol II during 10 min inhibitor treatment (Figure 1C). Downstream of the response window, the TT-seq signal was largely unchanged, indicating continued RNA synthesis from Pol II elongation complexes that had been released before CDK9 inhibition.
To determine the relative response of genes to CDK9 inhibition, we calculated response ratios for those transcribed units (TUs, Materials and methods) that synthesized RNA, harbored a single TSS, and exceeded 10 kbp in length (2,538 TUs). The response ratio of TUs varied between 0% and 100% (fully responding TUs) with a median of 58% (Figure 1C–E). A remaining TT-seq signal in the response window likely reflects the proportion of polymerases that move to productive elongation without CDK9 kinase activity, but we cannot exclude that it stems from incomplete CDK9 inhibition. However, assuming that the inhibitor is evenly distributed across and within cells, the portion of CDK9 that has not been fully inhibited must be very low.
The width of the response window differs between TUs (Figure 1D) and informs on Pol II elongation velocity (Materials and methods). The average width of the response window was 23 kbp, and thus the average elongation velocity was 2.3 kbp/min (Figure 2A–B), which agrees with previous estimates (Fuchs et al., 2014; Jonkers et al., 2014; Saponaro et al., 2014; Veloso et al., 2014). Gene-specific elongation velocities (Figure 2C, Figure 2—figure supplement 1A–B) were significantly higher in TUs with longer first introns (Figure 2D, Wilcoxon rank sum test, p-value<1.916·10^−11), consistent with faster transcription of introns (Jonkers et al., 2014). Elongation velocity correlated positively with nucleosome density, and negatively with the stability of the DNA-RNA hybrid, CpG density and topoisomerase occupancy (Figure 2—figure supplement 1C).
To study the kinetics of CDK9-dependent Pol II pause release, we generated mNET-seq data that map the RNA 3’-end of engaged Pol II and extracted the position of paused polymerases (Materials and methods). mNET-seq data were highly reproducible (Spearman correlation coefficient 0.93). Of the above TUs, 2135 (84 %) showed mNET-seq signal peaks above background (Materials and methods). The called pause sites were distributed around a maximum located ~84 bp downstream of the TSS (Figure 3A, Figure 3—figure supplement 1A). At these sites we detected an enrichment for G/C-C/G dinucleotides (Figure 3—figure supplement 1B) with a strongly conserved cytosine at the RNA 3’-end (Figure 3B). We also observed a minimum of the predicted melting temperature of the DNA-RNA hybrid (Materials and methods) immediately downstream of the pause site (Figure 3C). A weak DNA-RNA hybrid in the active center of Pol II is known to destabilize the elongation complex (Kireeva et al., 2000), and could be a major determinant for establishing the paused state.
To quantify pausing, we defined the pause duration d as the time a polymerase needs to pass through a 200 bp ‘pause window’ located ±100 bp around the pause site. The pause duration d can now be derived from a combination of mNET-seq and TT-seq data. In particular, the mNET-seq signal corresponds to the number of polymerases in the pause window, which is determined by d and by the initiation frequency I (Figure 4A) (Ehrensberger et al., 2013). Thus, d is proportional to the ratio of the mNET-seq signal over I. To calculate I we integrated TT-seq signals over exons, excluding the first exon (Materials and methods). This provides the ‘productive initiation frequency’, that is the number of polymerases that initiate and successfully exit from the pause window. We use the term ‘productive’ because we do not know whether there is a small fraction of polymerases terminating within the pause window. Finally, to derive absolute values of d, we scaled the reciprocal of d (the elongation velocity in the pause window) according to the elongation velocity obtained from CDK9 inhibition (Materials and methods).
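The steady-state relation used here can be sketched as follows. This is a minimal illustration, assuming that the number of polymerases in the pause window equals I × d (Little's law) and that occupancy has already been converted into polymerases per cell; the conversion of mNET-seq reads into absolute numbers, which the text performs via the CDK9-derived elongation velocity, is not reproduced. The function names are hypothetical.

```python
"""Minimal sketch: pause duration d from pause-window occupancy and productive initiation frequency I.

Assumes steady state, so occupancy N = I * d (Little's law). Function names are hypothetical.
"""

PAUSE_WINDOW_BP = 200  # pause window of +/-100 bp around the called pause site


def pause_duration(occupancy: float, initiation_freq: float) -> float:
    """Return d [min] from occupancy N [polymerases/cell] and I [polymerases/cell/min]."""
    if initiation_freq <= 0:
        raise ValueError("initiation frequency must be positive")
    return occupancy / initiation_freq


def pause_window_velocity(d_min: float, window_bp: int = PAUSE_WINDOW_BP) -> float:
    """Apparent velocity [bp/min] through the pause window, i.e. window length divided by d."""
    if d_min <= 0:
        raise ValueError("pause duration must be positive")
    return window_bp / d_min


if __name__ == "__main__":
    # Hypothetical TU: 5.4 polymerases/cell in the pause window, I = 2.7 polymerases/cell/min
    d = pause_duration(5.4, 2.7)
    print(f"d = {d:.1f} min, velocity = {pause_window_velocity(d):.0f} bp/min")
    # d = 2.0 min, velocity = 100 bp/min
```

Because occupancy enters only as N = I × d, a TU with twice the occupancy but the same I has twice the pause duration, which is why peak height alone cannot separate long pausing from frequent initiation.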
We obtained a mean productive initiation frequency of 2.7 polymerases cell^−1 min^−1, and pause durations in the range of minutes, with strong variations between TUs. The pause durations are generally consistent with reported half-lives of paused Pol II in mouse (Jonkers et al., 2014) and Drosophila cells (Buckley et al., 2014; Henriques et al., 2013) but slightly shorter. Pause durations were also consistent with kinetic modeling of TT-seq data alone. At TUs with long pause durations we observed less labeled RNA in the short region between the TSS and the pause site (Figure 4—figure supplement 1). This confirms that initiation frequencies are indeed altered. It also indicates that the fraction of Pol II enzymes that terminate within the pause window is low, in agreement with previous findings (Henriques et al., 2013). For strongly CDK9-responding TUs, we obtained a significantly longer pause duration (Wilcoxon rank sum test, p-value<10^−12) and lower initiation frequencies (Figure 4B–C).
These results prompted us to ask whether the pause duration is generally related to the initiation frequency. We indeed found a robust anti-correlation between I and d in normally growing cells, and an upper boundary for combinations of I and d, which we call the ‘pause-initiation limit’ (Figure 4D, Figure 4—figure supplement 2A). Thus, genes with shorter pausing show higher initiation frequencies and more RNA synthesis. This fundamental relationship can be verified by calculating the pause duration without using the initiation frequency I (Materials and methods, Figure 4—figure supplement 2B–C,E). Repeated random shuffling of mNET-seq signal assignment to TUs abolishes the correlation between d and I (Figure 4—figure supplement 2D). It also shows that impossible combinations of pause duration and initiation frequency I (points above the ‘pause-initiation limit’) are rare (Figure 4—figure supplement 2F). In conclusion, independent mNET-seq and TT-seq data led to independent measures of pause duration and productive initiation frequency for each gene, which were globally anti-correlated.
These findings now allowed us to test directly whether longer pause durations lead to lower initiation frequencies, by analyzing TT-seq data after CDK9 inhibition. CDK9 inhibition resulted in significantly reduced labeled RNA in the short region between the TSS and the pause site (Wilcoxon rank sum test, p-value<10^−16) (Figure 5A–B). Productive initiation frequencies were significantly downregulated after CDK9 inhibition (Wilcoxon rank sum test, p-value<10^−16) (Figure 5C). Because CDK9 specifically targets paused Pol II, and not initiating polymerase, these results show that pausing limits initiation, and not the other way around. Thus, human genes have a ‘pause-initiation limit’.
To monitor the occupancy of engaged Pol II we generated mNET-seq data before and after CDK9 inhibition (Materials and methods). CDK9 inhibition resulted in increased mNET-seq signal at the beginning of genes and decreased signal in the gene body, indicating that less Pol II was released from the pause site (Figure 6A). Indeed, calculation of pause durations from mNET-seq and TT-seq data after CDK9 inhibition showed that Pol II resides significantly longer at the pause site after CDK9 inhibition (Wilcoxon rank sum test, p-value<10^−16) (Figure 6B). Taken together, CDK9 inhibition increases the pause duration and decreases the initiation frequency at human genes (Figure 6C–D).
To investigate possible reasons for polymerase pausing and its consequences, we compared different properties of TUs with long and short pause durations. For the 5’-region of TUs with longer pause durations, the transcript adopts more RNA secondary structure in vivo and in silico (Wilcoxon rank sum test, p-value<10^−16) (Figure 7A, Figure 7—figure supplement 1A) (Rouskin et al., 2014). TUs with longer pause durations were also enriched for hyper-methylated CpG islands (ENCODE Project Consortium, 2012) upstream of the pause site (Figure 7B), consistent with a previous report (Hendrix et al., 2008). Comparison of strongly and weakly CDK9-responding TUs around the pause site showed that TUs that responded strongly to CDK9 inhibition showed a higher tendency to establish long-range chromatin interactions (Figure 7C) as observed by Hi-C (Ma et al., 2015). This is consistent with the idea that interactions of an enhancer with its target promoter can stimulate Pol II pause release (Ghavi-Helm et al., 2014; Rahl et al., 2010). This tendency, however, appears to be independent of the pause duration, as a comparison of TUs with long and short pause durations revealed no observable difference in Hi-C signal.
Finally, we investigated which factors preferentially occupy pause windows with longer pause durations. This is now possible because ChIP-seq signals can be normalized with the productive initiation frequency. Without such normalization, ChIP-seq derived factor occupancies are artificially high in pause windows with long pause durations (Ehrensberger et al., 2013). Correlation of such normalized ChIP-seq signals in the pause window with pause durations (Figure 7—figure supplement 1B–C) resulted in a positive correlation for Pol II phosphorylation at sites that are associated with elongation, and also for NELF-E, CDK9, and Brd4, which are all factors involved in Pol II pausing and release.
Taken together, our results show that Pol II pausing can control transcription initiation and demonstrate the central role of CDK9 in controlling pause duration and thereby the productive initiation frequency. Our results have implications for understanding gene regulation. Genes that show initiation frequencies below the pause-initiation limit may be activated by increasing the initiation frequency without changing pause duration. However, activation of genes that are transcribed at the pause-initiation limit requires a decrease in pause duration, that is stimulation of pause release, to enable higher initiation frequencies. We suggest that pause-controlled initiation evolved because mutations in the promoter-proximal region can change pause duration, and thereby limit initiation, but do not compromise a high initiation capacity of the core promoter around the TSS. This may have enabled the evolution of genes that remain highly inducible but can be efficiently downregulated.
After our work had been completed, a publication appeared that concluded that polymerase pausing inhibits new transcription initiation (Shao and Zeitlinger, 2017). The conclusion in this paper is consistent with our general finding of an interdependency of Pol II pausing and transcription initiation, but the two studies differ in three aspects. First, we used human cells whereas the published work was conducted in Drosophila cells. Second, our work uses a multi-omics approach to enable a kinetic description, whereas the published work is based on changes in factor occupancy. Third, we selectively inhibited CDK9 using CRISPR-Cas9-based engineering and chemical biology, whereas the published work used small molecule inhibitors that may target multiple kinases. Despite these differences, the general conclusion that promoter-proximal pausing of Pol II sets a limit to the frequency of transcription initiation holds for both human and Drosophila cells and is likely a general feature of metazoan gene regulation.
| Reagent type (species) | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| cell line (Homo sapiens; male) | Raji B lymphocyte cells (wild type) | DSMZ | DSMZ Cat# ACC-319; RRID:CVCL_0511 | |
| cell line (Homo sapiens; male) | Raji B lymphocyte cells (CDK9as) | This paper | | Raji B cells were obtained from DSMZ Cat# ACC-319, RRID:CVCL_0511. Homozygous mutation of F103 at the CDK9 gene locus in Raji B cells was performed using the CRISPR-Cas9 system. |
| antibody | anti-CDK9 | Santa Cruz, Dallas, TX USA | sc-484 | |
| antibody | anti-alpha-tubulin | Sigma-Aldrich, St. Louis, MO USA | DM1A | |
| antibody | anti-Pol II (total, unphos + phos) | BIOZOL, Eching, Germany | MABI0601 | |
| commercial assay or kit | CellTiter 96 AQueous One Solution Cell Proliferation Assay (MTS) | Promega, Madison, WI USA | G3582 | |
| commercial assay or kit | Plasmo Test Mycoplasma Detection Kit | InvivoGen, San Diego, CA USA | rep-pt1 | |
| commercial assay or kit | Ovation Universal RNA-Seq System | NuGEN, Leek, The Netherlands | 0343-32 | |
| commercial assay or kit | TruSeq Small RNA Library Prep Kit | Illumina, Massachusetts USA | RS-200-0012 | |
| chemical compound, drug | CDK9as inhibitor; 1-NA-PP1 | Calbiochem, EMD Millipore, Danvers, MA USA | 529579 | CAS 221243-82-9 |
| chemical compound, drug | Solvent control; DMSO | Sigma-Aldrich, St. Louis, MO USA | D8418 | |
| chemical compound, drug | 4-thiouracil (4sU) | Sigma-Aldrich, St. Louis, MO USA | T4509 | |
| chemical compound, drug | empigen BB detergent | Sigma-Aldrich, St. Louis, MO USA | 30326 | |
Raji B cells were obtained from DSMZ (DSMZ no.: ACC 319; RRID:CVCL_0511). CDK9as Raji B cells were generated in this study by CRISPR-Cas9-based engineering of Raji B cells obtained from DSMZ (DSMZ no.: ACC 319; RRID:CVCL_0511). Raji B cells and CDK9as Raji B cells were grown in RPMI 1640 medium (Thermo Fisher Scientific, Waltham, MA USA) supplemented with 10% foetal calf serum (bio-sell, Nürnberg, Germany), 100 U/mL penicillin and 100 µg/mL streptomycin (Thermo Fisher Scientific, Waltham, MA USA), and 2 mM L-glutamine (Thermo Fisher Scientific, Waltham, MA USA) at 37°C and 5% CO2. Cells were verified to be free of mycoplasma contamination using Plasmo Test Mycoplasma Detection Kit (InvivoGen, San Diego, CA USA).
CDK9as carries a point mutation of the so-called gatekeeper residue that enables the kinase active site to accept bulky ATP analogs such as 1-NA-PP1 (4-amino-1-tert-butyl-3-(1ʹ-naphthyl)pyrazolo[3,4-d]pyrimidine). To identify the gatekeeper residue (Lopez et al., 2014), the amino acid sequence of human CDK9 (UniProt, P50750-1) was aligned with the sequences of previously characterized kinases carrying analog-sensitive mutations. Multiple sequence alignment was performed with the web tool Clustal Omega 1.2.4 (Sievers et al., 2011). For the canonical isoform of CDK9, phenylalanine (F) 103 was identified as the gatekeeper residue and selected for mutation to alanine (A). Mutation of F103 at the CDK9 gene locus in Raji B cells was performed using the CRISPR-Cas9 system (Doudna and Charpentier, 2014; Hsu et al., 2014) as described (Mulholland et al., 2015) with minor modifications. Briefly, the single guide RNA (sgRNA) for editing CDK9 was designed with the web tool Optimized CRISPR design (http://crispr.mit.edu/) and cloned into the pSpCas9(BB)-2A-GFP (PX458) vector (Addgene plasmid # 48138) (Ran et al., 2013) via BpiI restriction sites. For nucleotide replacement (gttc to cgcg), 200 nt single-stranded DNA oligonucleotides (ssODNs) were synthesized by Integrated DNA Technologies (IDT, Leuven, Belgium) and used as the homology-directed repair (HDR) template. A BstUI cutting site was incorporated into the HDR template for screening. The vector and HDR template were introduced into human Raji B cells using the Amaxa Mouse ES Cell Nucleofector Kit (Lonza, Basel, Switzerland) according to the manufacturer’s instructions. Two days after transfection, GFP-positive cells were single-cell sorted into 96-well plates using a FACS Aria II instrument (Becton Dickinson, Franklin Lakes, NJ USA). After two weeks, individual colonies were expanded for genomic DNA isolation.
The mutant lines were validated by PCR using respective primers, BstUI digestion (Figure 1—figure supplement 2A) and DNA sequencing (Figure 1—figure supplement 2B).
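The screening logic, in which the gttc to cgcg replacement turns the F103 codon (TTC) into an alanine codon (GCG) and simultaneously creates a BstUI site (BstUI recognizes CGCG and cuts CG^CG, leaving blunt ends), can be checked in silico. The sketch below is illustrative only: the flanking sequence is invented and is not the CDK9 locus (in this invented context the g to c change falls on the third position of a valine codon and is silent, which may not hold at the real locus), the codon table is deliberately partial, and the helper names are hypothetical.

```python
"""Illustrative check of an HDR design: codon change plus a new BstUI site for screening.

The sequence below is hypothetical, not the CDK9 locus; the codon table is partial.
"""

BSTUI_SITE = "CGCG"  # BstUI recognition sequence; cuts CG^CG (blunt)
PARTIAL_CODONS = {"GTG": "V", "GTC": "V", "TTC": "F", "GCG": "A", "AAA": "K"}


def bstui_sites(seq: str) -> list[int]:
    """0-based start positions of every CGCG occurrence (overlaps included)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == BSTUI_SITE]


def bstui_fragments(seq: str) -> list[int]:
    """Fragment lengths after complete BstUI digestion of a linear sequence."""
    cuts = [i + 2 for i in bstui_sites(seq)]  # cut between the two CG dinucleotides
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def translate(seq: str) -> str:
    """Translate in frame 0; codons missing from the partial table become 'X'."""
    seq = seq.upper()
    return "".join(PARTIAL_CODONS.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


wild_type = "GTGTTCAAA"                        # hypothetical ...V F K... context
mutant = wild_type.replace("GTTC", "CGCG", 1)  # the gttc -> cgcg replacement

print(translate(wild_type), bstui_fragments(wild_type))  # VFK [9]
print(translate(mutant), bstui_fragments(mutant))        # VAK [4, 5]
```

On a real PCR amplicon the same idea applies: wild type alleles stay uncut, edited alleles yield two shorter fragments, and heterozygous clones show both patterns.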
HDR template (A103 is underlined, BstUI cutting site in small letters):
Primers for sgRNA generation and screening:
Proteins equivalent to 1 × 10^5 Raji B cells were loaded in Laemmli buffer and subjected to SDS-PAGE before transfer to nitrocellulose. Nonspecific antibody binding was blocked by incubating the membrane with 5% milk in Tris-buffered saline containing 1% Tween. Primary antibodies were anti-CDK9 (sc-484) (Santa Cruz, Dallas, TX USA) and anti-α-tubulin (DM1A) (Sigma-Aldrich, St. Louis, MO USA). Fluorophore-coupled secondary antibodies (Rockland Immunochemicals Inc., Pottstown, PA USA) were used and blots were visualized using the Odyssey system (LI-COR, Lincoln, NE USA).
Cell proliferation at increasing 1-NA-PP1 inhibitor concentrations was measured in four biological replicates using the CellTiter 96 AQueous One Solution Cell Proliferation Assay System (Promega, Madison, WI USA). Cells were seeded in a 96-well plate and increasing concentrations of 1-NA-PP1 (Calbiochem, EMD Millipore, Danvers, MA USA) or DMSO (Sigma-Aldrich, St. Louis, MO USA) were added. After 72 hr, MTS tetrazolium compound was added to each well for one hour. Subsequently, the quantity of the MTS formazan product was measured as absorbance at 490 nm with a Sunrise photometer (TECAN, Männedorf, Switzerland) that was operated using the Magellan data analysis software (v7.2, TECAN, Männedorf, Switzerland). Relative signals for each concentration were calculated by dividing the signals of the CDK9as inhibitor treated cells by the corresponding signals of the control.
Two biological replicates of TT-seq reactions including RNA spike-ins were performed essentially as described (Schwalb et al., 2016). Briefly, 3.3 × 10^7 Raji B (CDK9as or wild type) cells were treated for 15 min with solvent DMSO (control) or 5 µM of 1-NA-PP1 (CDK9as inhibitor). After 10 min of treatment, labeling was performed by adding 500 µM of 4-thiouracil (4sU) (Sigma-Aldrich, St. Louis, MO, USA) for 5 min at 37°C and 5% CO2. Cells were harvested by centrifugation at 3000 x g for 2 min. Total RNA was extracted using QIAzol according to the manufacturer’s instructions. RNAs were sonicated to generate fragments of <1.5 kbp using AFAmicro tubes in an S220 Focused-ultrasonicator (Covaris Inc., Woburn, MA USA). 4sU-labeled RNA was purified from 150 µg total fragmented RNA. Separation of labeled RNA was achieved with streptavidin beads (Miltenyi Biotec, Bergisch Gladbach, Germany) as described (Schwalb et al., 2016). Prior to library preparation, 4sU-labeled RNA was purified and quantified. Enrichment of 4sU-labeled RNA was analyzed by RT-qPCR as described (Schwalb et al., 2016). Input RNA was treated with HL-dsDNase (ArcticZymes, Tromsø, Norway) and used for strand-specific library preparation according to the Ovation Universal RNA-Seq System (NuGEN, Leek, The Netherlands). The size-selected and pre-amplified fragments were analyzed on a Fragment Analyzer before clustering and sequencing on the Illumina HiSeq 1500.
Paired-end 50 base reads with additional 6 base reads of barcodes were obtained for each sample, that is, two TT-seq replicates with 1-NA-PP1 (CDK9as inhibitor) and two TT-seq replicates with DMSO (control) treatment. Reads were demultiplexed and mapped with STAR 2.3.0 (Dobin and Gingeras, 2015) to the hg38 (GRCh38) genome assembly (Human Genome Reference Consortium). Samtools (Li et al., 2009) was used to quality filter SAM files: alignments with MAPQ smaller than 7 (-q 7) were skipped and only proper pairs (-f2) were selected. Further data processing was carried out using the R/Bioconductor environment. We used a spike-in (RNA) normalization strategy essentially as described (Schwalb et al., 2016) to allow detection of global shifts and determination of antisense bias (the ratio of spurious reads originating from the opposite strand, introduced by the RT reactions). Read counts for spike-ins were calculated using HTSeq (Anders et al., 2015). Sequencing depth calculations did not detect global differences. Antisense bias ratios were calculated for each sample j according to
for all available spike-ins i.
For each annotated gene, transcription units (TUs) were defined as the union of all existing inherent transcript isoforms (UCSC RefSeq GRCh38). Read counts for all features were calculated using HTSeq (Anders et al., 2015) and corrected for antisense bias using the antisense bias ratios calculated as described above. The true number of read counts s_ij for transcribed unit i in sample j was calculated as
where S_ij and A_ij are the observed numbers of read counts on the sense and antisense strand. Read counts per kilobase (RPK) were calculated as the bias-corrected read counts falling into the region of a transcribed unit divided by its length in kilobases. Based on the antisense bias-corrected RPKs, a subgroup of expressed TUs was defined to comprise all TUs with an RPK of 100 or higher in two summed replicates of TT-seq without inhibitor treatment. Given an average fragment size of 200 bases, an RPK of 100 corresponds to a coverage of approximately 10 per sample. This subset was used throughout the analysis unless stated otherwise.
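The antisense correction and the expression cutoff can be sketched as follows. The equations are not reproduced in this section, so the sketch rests on stated assumptions: each strand leaks a fraction c of its reads to the opposite strand (observed S = s + c·a and A = a + c·s, solved for s), c is estimated from spike-ins, which carry no true antisense signal, and the per-spike-in ratios are combined with a median. All names are hypothetical.

```python
"""Sketch of antisense-bias correction and RPK, under an assumed symmetric leakage model.

Assumption: observed sense S = s + c*a and antisense A = a + c*s, where c is the
fraction of reads assigned to the wrong strand. Solving for s gives (S - c*A) / (1 - c**2).
"""
from statistics import median


def antisense_bias_ratio(spike_sense: list[float], spike_antisense: list[float]) -> float:
    """Estimate c for one sample from spike-ins (no true antisense, so A/S = c); median is assumed."""
    return median(a / s for s, a in zip(spike_sense, spike_antisense))


def corrected_sense_counts(sense: float, antisense: float, c: float) -> float:
    """Bias-corrected sense counts s for one TU in one sample."""
    if not 0 <= c < 1:
        raise ValueError("c must lie in [0, 1)")
    return (sense - c * antisense) / (1 - c * c)


def rpk(counts: float, length_bp: int) -> float:
    """Read counts per kilobase of TU length."""
    return counts / (length_bp / 1000)


def mean_coverage(rpk_value: float, fragment_bp: float = 200) -> float:
    """Average per-base coverage implied by an RPK value and a mean fragment size."""
    return rpk_value * fragment_bp / 1000


c = antisense_bias_ratio([1000, 2000, 500], [20, 40, 10])   # every spike-in gives c = 0.02
s = corrected_sense_counts(sense=1002, antisense=120, c=c)  # true s = 1000, a = 100
print(round(c, 4), round(s, 6), mean_coverage(100))        # 0.02 1000.0 20.0
```

With two summed replicates, an RPK of 100 and 200-base fragments imply about 20-fold coverage in total, that is about 10 per sample, matching the cutoff described above.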
Aligned duplicate fragments were discarded for each sample. Of the resulting unique fragment isoforms, only those with a positive inner mate distance were kept. The number of transcribed bases (tb_j) for each sample was calculated as the sum of the coverage of evident (sequenced) fragment parts (read pairs only) for all fragments smaller than 500 bases in length and with an inner mate interval not entirely overlapping a RefSeq-annotated intron (UCSC RefSeq GRCh38, ~96% of all fragments), in addition to the sum of the coverage of non-evident fragment parts (entire fragment).
We first confirmed that no significant global shifts were detected when comparing two TT-seq replicates with 1-NA-PP1 (CDK9as inhibitor) treatment against two TT-seq replicates with DMSO treatment (control) using the spike-in normalization strategy described above. All samples were then subjected to an alternative, more robust normalization procedure. For each sample j, the antisense bias-corrected number of transcribed bases tb_j was calculated on all expressed TUs i exceeding 125 kbp in length. From each end of the selected TUs, 50 kbp were removed to avoid influence of the response to CDK9as inhibition (Laitem et al., 2015). On the resulting intervals, size factors for each sample j were determined as
where m denotes the number of samples. This formula was adapted from Anders and Huber (2010) and used to correct for variations in library size and sequencing depth.
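The size-factor formula is not reproduced here, but the cited method (Anders and Huber, 2010) is the median-of-ratios estimator: each sample's value for a TU is divided by that TU's geometric mean across all m samples, and the median of these ratios is the sample's size factor. A minimal sketch, computed on the log scale as DESeq2 does and skipping TUs with a zero count in any sample; how exactly the authors adapted the formula is not specified, and the function name is hypothetical.

```python
"""Median-of-ratios size factors (Anders and Huber, 2010), sketched on the log scale."""
import math
from statistics import median


def size_factors(tb: list[list[float]]) -> list[float]:
    """tb[i][j] = transcribed bases of TU i in sample j; returns one size factor per sample."""
    m = len(tb[0])
    rows = [row for row in tb if all(x > 0 for x in row)]  # geometric mean needs positive values
    if not rows:
        raise ValueError("no TU with positive counts in every sample")
    log_geo_means = [sum(math.log(x) for x in row) / m for row in rows]
    return [
        math.exp(median(math.log(row[j]) - g for row, g in zip(rows, log_geo_means)))
        for j in range(m)
    ]


# Hypothetical data: sample 2 is sequenced exactly twice as deeply as sample 1
tb = [[10, 20], [30, 60], [5, 10]]
f = size_factors(tb)
normalized = [[x / f[j] for j, x in enumerate(row)] for row in tb]
print([round(x, 4) for x in f])  # [0.7071, 1.4142]
```

Dividing each sample by its factor removes the twofold depth difference, so both columns of `normalized` agree for every TU.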
For each condition j (control or CDK9as inhibited), the antisense bias-corrected number of transcribed bases was calculated on all expressed TUs i exceeding 10 kbp in length. Of these, only TUs harboring one unique TSS across all RefSeq-annotated isoforms (UCSC RefSeq GRCh38) were kept. Response ratios were calculated for a window from the TSS to 10 kbp downstream (excluding the first 200 bp) for each TU i as
where negative values were set to 0.
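The response-ratio expression is missing from this section, but the text constrains its form: 0% means the signal in the window is unchanged after CDK9 inhibition, 100% means it disappears, and negative values are set to 0. A minimal sketch under that reading, assuming the inputs are normalized transcribed bases in the TSS to +10 kbp window (first 200 bp excluded); the function name is hypothetical.

```python
def response_ratio(tb_control: float, tb_inhibited: float) -> float:
    """Assumed form: fractional loss of TT-seq signal in the response window, clipped at 0."""
    if tb_control <= 0:
        raise ValueError("control signal must be positive")
    return max(0.0, 1.0 - tb_inhibited / tb_control)


print(response_ratio(100.0, 42.0))   # about 0.58, a TU near the reported median
print(response_ratio(100.0, 120.0))  # 0.0, increased signal is set to 0
print(response_ratio(100.0, 0.0))    # 1.0, a fully responding TU
```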
For each condition j (control or CDK9as inhibited), the antisense bias-corrected number of transcribed bases was calculated on all expressed TUs i with a given response ratio, excluding the first 200 bp. Prior to calculation, all TUs were truncated by 5 kbp at the 3’ end to avoid the influence of signal alterations around the pA site after CDK9as inhibition (Laitem et al., 2015). A robust common elongation velocity estimate was obtained by finding an optimal fit for all TUs i between 25 and 200 kbp in length L_i, that is, by minimizing the function
on the interval [0,10000] with inhibitor treatment duration t*=15 [min] and labeling duration t = 5 [min], given that
that is, the difference in transcribed bases caused by CDK9as inhibitor treatment equals the number of transcribed bases per nucleotide times the number of nucleotides traveled in minutes, corrected by the response ratio.
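The objective function itself is not reproduced here. Reading the verbal description literally, the inhibitor-induced loss of transcribed bases in TU i equals its transcribed bases per nucleotide (tb_i / L_i) times the distance traveled v·Δt, scaled by its response ratio r_i. The sketch assumes Δt = t* − t = 10 min, which is consistent with the 23 kbp response window and 2.3 kbp/min reported in the Results, and fits one v by least squares. Because the squared-error objective is a convex quadratic in v, its minimizer on [0, 10000] is the unconstrained least-squares solution clipped to that interval. All names are hypothetical.

```python
"""Sketch: one common elongation velocity from many TUs, under an assumed linear model.

Assumed model per TU i:  delta_tb_i = r_i * (tb_i / L_i) * v * (t_star - t_label)
"""


def fit_common_velocity(delta_tb, tb_control, lengths_bp, response, t_star=15.0, t_label=5.0,
                        min_len=25_000, max_len=200_000, v_bounds=(0.0, 10_000.0)):
    """Least-squares v [bp/min] over TUs with min_len <= L_i <= max_len, clipped to v_bounds."""
    dt = t_star - t_label
    sxy = sxx = 0.0
    for dtb, tb, length, r in zip(delta_tb, tb_control, lengths_bp, response):
        if not min_len <= length <= max_len:
            continue
        x = r * (tb / length) * dt  # predicted loss of transcribed bases per unit velocity
        sxy += x * dtb
        sxx += x * x
    if sxx == 0:
        raise ValueError("no informative TUs in the length range")
    lo, hi = v_bounds
    return min(max(sxy / sxx, lo), hi)


# Synthetic TUs generated with v = 2300 bp/min; the 300 kbp TU falls outside the length filter
lengths = [50_000, 100_000, 300_000]
tb = [100_000, 100_000, 300_000]
r = [0.5, 1.0, 0.8]
delta = [0.5 * 2 * 2300 * 10, 1.0 * 1 * 2300 * 10, 0.0]
print(fit_common_velocity(delta, tb, lengths, r))  # 2300.0
```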
For each condition j (control or CDK9as inhibited), the antisense bias-corrected number of transcribed bases was calculated on all expressed TUs i exceeding 35 kbp in length, excluding the first 200 bp. Prior to calculation, all TUs were truncated by 5 kbp at the 3’ end to avoid the influence of signal alterations around the pA site after CDK9as inhibition (Laitem et al., 2015). Of the remaining TUs, only those harboring one unique TSS across all RefSeq-annotated isoforms (UCSC RefSeq GRCh38) were kept. For each TU i with the elongation velocity v_i [kbp/min] was calculated as
with inhibitor treatment duration t*=15 [min] and labeling duration t = 5 [min].
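For individual TUs, the same assumed model can be solved for v_i directly. A minimal sketch, again assuming Δt = t* − t and using hypothetical names; it also reports the implied response-window width v_i·Δt, so that 2.3 kbp/min over 10 min corresponds to the 23 kbp window quoted in the Results.

```python
"""Sketch: per-TU elongation velocity from the assumed model delta_tb = r * (tb / L) * v * dt."""


def tu_velocity_kbp_per_min(delta_tb: float, tb_control: float, length_bp: float, response: float,
                            t_star: float = 15.0, t_label: float = 5.0) -> float:
    """Solve the assumed model for v_i and return it in kbp/min."""
    if response <= 0 or tb_control <= 0:
        raise ValueError("TU must respond to inhibition and carry signal")
    dt = t_star - t_label
    v_bp_per_min = delta_tb * length_bp / (tb_control * response * dt)
    return v_bp_per_min / 1000


def response_window_kbp(v_kbp_per_min: float, t_star: float = 15.0, t_label: float = 5.0) -> float:
    """Distance traveled during the assumed interval dt, i.e. the response-window width."""
    return v_kbp_per_min * (t_star - t_label)


v = tu_velocity_kbp_per_min(delta_tb=23_000, tb_control=100_000, length_bp=50_000, response=0.5)
print(f"{v:.1f} kbp/min, window {response_window_kbp(v):.0f} kbp")  # 2.3 kbp/min, window 23 kbp
```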
Two biological replicates of reactions including empigen BB detergent treatment during immunoprecipitation (IP) were performed essentially as described (Nojima et al., 2016; Schlackow et al., 2017), with minor modifications. Briefly, 1.6 × 108 Raji B (CDK9as) cells were treated for 15 min with solvent DMSO (control) or 5 µM of 1-NA-PP1 (CDK9as inhibitor). Cell fractionation was performed as described (Conrad and Ørom, 2017). Isolated chromatin was digested with micrococcal nuclease (MNase) (NEB, Ipswich, MA USA) at 37°C and 1,400 rpm for 90 s. To inactivate MNase, EGTA was added to a final concentration of 25 mM. Digested chromatin was collected by centrifugation at 4°C and 13,000 rpm for 5 min. The supernatant was diluted tenfold with IP buffer containing 50 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.05% (vol/vol) NP-40, and 1% (vol/vol) empigen BB (Sigma-Aldrich, St. Louis, MO USA). For each IP, 50 µg of Pol II antibody clone MABI0601 (BIOZOL, Eching, Germany) was conjugated to Dynabeads M-280 Sheep Anti-Mouse IgG (Thermo Fisher Scientific, Waltham, MA USA). Pol II antibody-conjugated beads were added to diluted sample. IP was performed on a rotating wheel at 4°C for 1 hr. The beads were washed six times with IP buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.05 % NP-40, and 1% empigen BB) and once with 500 µL of PNKT buffer containing 1 x T4 polynucleotide kinase (PNK) buffer (NEB, Ipswich, MA USA) and 0.1% (vol/vol) Tween-20 (Sigma-Aldrich, St. Louis, MO USA). Beads were incubated in 100 µL of PNK reaction mix containing 1 x PNK buffer, 0.1% (vol/vol) Tween-20, 1 mM ATP, and T4 PNK, 3’ phosphatase minus (NEB, Ipswich, MA USA) at 37°C for 10 min. Beads were washed once with IP buffer. RNA was extracted with TRIzol reagent. RNA was precipitated with GlycoBlue co-precipitant (Thermo Fisher Scientific, Waltham, MA USA) and resolved on 6% denaturing acrylamide containing 7 M urea (PanReac AppliChem, Darmstadt, Germany) gel for size purification. 
Fragments of 35–100 nt were eluted from the gel with elution buffer containing 1 M NaOAc and 1 mM EDTA, and precipitated with ethanol. RNA libraries were prepared according to the TruSeq Small RNA Library Kit (Illumina, Massachusetts USA) protocol and as described (Nojima et al., 2016). The size-selected and pre-amplified fragments were analyzed on a Fragment Analyzer before clustering and sequencing on an Illumina HiSeq 2500 sequencer.
Paired-end 50-base reads with additional 6-base barcode reads were obtained for each sample, that is, for the mNET-seq samples with 1-NA-PP1 (CDK9as inhibitor) and with DMSO (control) treatment. Reads were demultiplexed and mapped with STAR 2.3.0 (Dobin and Gingeras, 2015) to the hg20/hg38 (GRCh38) genome assembly (Human Genome Reference Consortium). Samtools (Li et al., 2009) was used to quality-filter SAM files: alignments with MAPQ below 7 (-q 7) were skipped and only proper pairs (-f2) were selected. Further data processing was carried out in the R/Bioconductor environment. Antisense bias (the ratio of spurious reads originating from the opposite strand, introduced by the RT reactions) was determined using positions with a coverage of at least 100 in regions without antisense annotation according to Refseq annotated genes (UCSC RefSeq GRCh38). mNET-seq coverage tracks were normalized with size factors computed on 260 TUs that showed a response of less than 5% in the TT-seq signal upon 1-NA-PP1 (CDK9as inhibitor) treatment. The response ratio was determined as described above, additionally including TUs with multiple TSSs to extend the number of TUs available for normalization. Note that varying the response ratio cutoff, and thereby the number of TUs available for normalization, has virtually no effect on the normalization parameters. Coverage tracks for further analysis were restricted to the last nucleotide incorporated by the polymerase in the aligned mNET-seq reads.
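The size factor normalization on non-responsive TUs can be sketched as follows. This is a minimal sketch assuming a DESeq-style median-of-ratios estimator restricted to the reference TUs; the text does not name the estimator, and the function name `size_factors` is hypothetical.

```python
import numpy as np


def size_factors(counts, reference_rows):
    """Median-of-ratios size factors computed only on reference TUs.

    counts: array of shape (n_TUs, n_samples) with mNET-seq counts per TU.
    reference_rows: indices of TUs assumed unaffected by the treatment.
    """
    ref = np.asarray(counts, dtype=float)[reference_rows]
    # Geometric means require strictly positive counts in every sample.
    ref = ref[(ref > 0).all(axis=1)]
    log_ref = np.log(ref)
    # Log geometric mean of each reference TU across samples.
    log_geo_mean = log_ref.mean(axis=1, keepdims=True)
    # Per sample: median ratio of its counts to the per-TU geometric means.
    return np.exp(np.median(log_ref - log_geo_mean, axis=0))


counts = np.array([[10, 20], [30, 60], [5, 10], [100, 5]])
sf = size_factors(counts, reference_rows=[0, 1, 2])
normalized = counts / sf  # divide each sample column by its size factor
```

Only the reference rows enter the estimate, so a strongly responding TU (the last row here) does not shift the factors, which mirrors the idea of normalizing on TUs with a response below 5%.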
For all expressed TUs i exceeding 10 kbp in length with one unique TSS given all Refseq annotated isoforms (UCSC RefSeq GRCh38) the pause site m* was calculated for all bases m in a window from the TSS to the end of the first exon (excluding the last 5 bases) via maximizing the function
where the mNET-seq signal at m* had to exceed 5 times the median signal strength of all non-negative, antisense-bias-corrected mNET-seq coverage values (Nojima et al., 2015). Note that all provided coverage tracks were used.
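The pause site call can be sketched as a windowed argmax with a signal threshold. This minimal sketch assumes that the maximized function is the mNET-seq coverage itself, that the TU lies on the plus strand with 0-based indices, and that the median is taken over a caller-supplied background vector; `call_pause_site` and its parameters are hypothetical.

```python
import numpy as np


def call_pause_site(coverage, tss, first_exon_end, background, fold=5.0):
    """Return the pause site (array index) of one TU, or None if too weak.

    coverage: antisense-bias-corrected mNET-seq coverage along the TU (plus strand).
    tss: index of the TSS; first_exon_end: exclusive end index of the first exon.
    background: coverage values used to compute the median signal strength.
    """
    cov = np.asarray(coverage, dtype=float)
    # Search window from the TSS to the end of the first exon, minus its last 5 bases.
    window = cov[tss:first_exon_end - 5]
    if window.size == 0:
        return None
    bg = np.asarray(background, dtype=float)
    threshold = fold * np.median(bg[bg >= 0])  # 5x median of non-negative values
    best = int(np.argmax(window))
    return tss + best if window[best] > threshold else None
```

A peak inside the last 5 bases of the first exon is ignored by construction, and a TU whose highest window value does not clear the threshold gets no pause site.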
The known sequence and mixture of the spike-ins used allow calculation of a conversion factor to RNA amount per cell [cell⁻¹] from their molecular weight, assuming perfect RNA extraction. The number of spike-in molecules per cell N [cell⁻¹] was calculated as

$$N = \frac{m \, N_A}{n \, M}$$

with the mass of spike-ins m = 25 × 10⁻⁹ [g], the number of cells n = 3.27 × 10⁷, the Avogadro constant N_A = 6.022140857 × 10²³ [mol⁻¹], and the molar mass (molecular weight) of the spike-ins M [g mol⁻¹] calculated as
where An, Un, Cn, Gn and 4sUn are the numbers of the respective nucleotides within each spike-in polynucleotide. is set to 0.1 in case of a labeled spike-in and 0 otherwise. The addition of 159 to the molecular weight accounts for the 5' triphosphate. Given the above, the conversion factor to RNA amount per cell [cell⁻¹] can be calculated as
for all labeled spike-in species i with length Li. Note that imperfect RNA extraction efficiency would lead to an underestimation of cellular labeled RNA relative to the amount of added spike-ins, and thus to an underestimation of initiation frequencies. A strong underestimation, however, would imply that the real initiation frequencies lie above the pause-initiation limit, which is theoretically impossible. We therefore assume this effect to be negligible.
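The spike-in arithmetic can be sketched in Python. This is a minimal sketch, not the authors' code: the per-nucleotide masses for A, U, C and G are the values commonly used by oligonucleotide calculators (together with the +159 term from the text, they are an assumption about the formula used), the 4sU mass is a hypothetical approximation, and the 0.1 parameter for labeled spike-ins is not modeled. The function names and the example composition are hypothetical.

```python
AVOGADRO = 6.022140857e23  # [mol^-1], value given in the text

# Anhydrous nucleotide masses [g/mol] as used by common oligo calculators
# (assumption); the 4sU entry is a hypothetical approximation (U + ~16 g/mol).
NT_MASS = {"A": 329.2, "U": 306.2, "C": 305.2, "G": 345.2, "4sU": 322.2}
TRIPHOSPHATE_5P = 159.0  # accounts for the 5' triphosphate, as in the text


def molar_mass(nucleotide_counts):
    """Molar mass M [g/mol] of one spike-in from its nucleotide counts."""
    return sum(NT_MASS[nt] * n for nt, n in nucleotide_counts.items()) + TRIPHOSPHATE_5P


def molecules_per_cell(mass_g, n_cells, molar_mass_g_per_mol):
    """N = m * N_A / (n * M): spike-in molecules added per cell."""
    return mass_g * AVOGADRO / (n_cells * molar_mass_g_per_mol)


# Hypothetical 1,000 nt spike-in composition, for illustration only.
M = molar_mass({"A": 250, "U": 225, "C": 250, "G": 250, "4sU": 25})
N = molecules_per_cell(25e-9, 3.27e7, M)
```

Dividing the spike-in mass by M gives moles, multiplying by N_A gives molecules, and dividing by the number of cells gives molecules per cell.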
The antisense-bias-corrected number of transcribed bases was calculated for all expressed TUs i exceeding 10 kbp in length. Of these, only TUs harboring one unique TSS given all Refseq annotated isoforms (UCSC RefSeq GRCh38) were kept. For each TU i, the productive initiation frequency Ii [cell⁻¹ min⁻¹], which corresponds to the pause release rate, was calculated as
with labeling duration t = 5 [min] and length Li. Note that both the number of transcribed bases and Li were restricted to regions of non-first constitutive exons (exonic bases common to all isoforms).
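The reasoning behind an initiation frequency derived from transcribed bases can be made explicit: in steady state, every base of a region of length L is transcribed I·t times during a labeling window t, so the region accumulates I·t·L transcribed bases, and solving for I gives the expression below. This is a minimal sketch of that steady-state reading, assuming the transcribed bases were already converted to per-cell units with the spike-in factor; it is an interpretation, not a reproduction of the paper's formula, and `initiation_frequency` is a hypothetical name.

```python
def initiation_frequency(transcribed_bases_per_cell, region_length_bp, label_min=5.0):
    """Productive initiation frequency I [cell^-1 min^-1] under steady state.

    In steady state every base of the region is transcribed I * t times during
    the labeling window, so the region accumulates I * t * L transcribed bases.
    """
    if region_length_bp <= 0 or label_min <= 0:
        raise ValueError("region length and labeling duration must be positive")
    return transcribed_bases_per_cell / (region_length_bp * label_min)
```

For example, 1,000 transcribed bases per cell over a 2,000 bp constitutive exonic region and a 5 min labeling pulse correspond to 0.1 initiation events per cell per minute under these assumptions.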
For all expressed TUs i exceeding 10 kbp in length with one unique TSS given all Refseq annotated isoforms (UCSC RefSeq GRCh38), the pause duration di [min] was calculated as the residence time of the polymerase in a window of ±100 bases m around the pause site (see above) as
with the pause release rate Ii (see above) and the number of polymerases (antisense-bias-corrected mNET-seq coverage values [Nojima et al., 2015]) in a window of ±100 bases around the pause site. For pause sites less than 100 bp downstream of the TSS, the first 200 bp of the TU were considered. Note that the right part of the formula is restricted to mNET-seq instances above the 50% quantile for robustness and adjusts di to an absolute scale by comparing the CDK9-derived elongation velocities with those derived from combining mNET-seq and TT-seq data in the response window.
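A residence time computed from a polymerase count and a release rate is an instance of Little's law: the number of polymerases in the window equals the flux through it times the time each one spends there. This minimal sketch assumes the mNET-seq counts were already converted to polymerases per cell and omits the quantile-based absolute-scale adjustment described in the text; `pause_duration` is a hypothetical name.

```python
def pause_duration(polymerases_in_window, initiation_per_min):
    """Residence time d [min] of Pol II in the +/-100 bp pause window.

    Little's law: polymerases in the window = flux through it * residence time.
    """
    if initiation_per_min <= 0:
        raise ValueError("initiation frequency must be positive")
    return polymerases_in_window / initiation_per_min
```

For example, two polymerases per cell in the window and a release rate of 0.5 per minute give a pause duration of 4 min.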
The previously derived inequality from (Ehrensberger et al., 2013)
states that new initiation events into productive elongation are limited by the velocity of the polymerase in the promoter-proximal region and that steric hindrance occurs below a distance of 50 bp between the active sites of the initiating Pol II and the paused Pol II. Given the calculations of pause duration and (productive) initiation frequency above, we can reformulate this inequality to
with 200 [bp] being the above defined pause window.
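One way to read the reformulated inequality, assuming the promoter-proximal velocity equals the 200 bp pause window divided by the pause duration d: with at least 50 bp between active sites, at most 200/50 = 4 polymerases fit into the window, so the productive initiation frequency cannot exceed 4/d. The sketch below encodes that reading; it is an assumption about the exact form of the inequality, and both function names are hypothetical.

```python
def pause_initiation_limit(pause_min, window_bp=200.0, min_spacing_bp=50.0):
    """Maximal productive initiation frequency [min^-1] for a pause duration.

    At most window_bp / min_spacing_bp polymerases fit into the pause window
    without steric clash, and each resides there for pause_min minutes.
    """
    if pause_min <= 0:
        raise ValueError("pause duration must be positive")
    return (window_bp / min_spacing_bp) / pause_min


def respects_limit(initiation_per_min, pause_min):
    """True if an observed (I, d) pair lies on or below the limit."""
    return initiation_per_min <= pause_initiation_limit(pause_min)
```

Under these assumptions, a pause duration of 2 min caps the productive initiation frequency at 2 per minute.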
Based on the following model, we simulated TT-seq coverage values given elongation velocity profiles, a labeling duration, and a uracil-content-dependent labeling bias
in which the labeling probability is set to 0.05 and the number of uracil residues of a given fragment is set to 0.28 times the fragment length. The elongation velocity profile can be used to calculate the number of positions the polymerase has elongated at a given time point as
Given the transcription start site, the number of elongated positions can be used to determine the end of an emerging nascent fragment. Based on that, we determined the start position of a fragment for each labeling duration as the position of the polymerase at the beginning of the labeling process. Subsequently, we used the number of uracil residues present in the RNA fragment to weight the coverage contributed by this fragment. Additionally, we applied a size selection similar to that of the original protocol to fragments below 80 bp in length, using a sigmoidal curve that mimics a typical size selection spread. Given a pause position 80 bp downstream of the TSS and a pause duration of 1 or 2 min, we adjusted the elongation velocity profile to simulate polymerase pausing. Note that reasonable changes in labeling probability, size selection probability, or uracil content do not alter the general observation that longer pause durations induce a greater shortage of TT-seq coverage in the region between the TSS and the pause site.
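The simulation steps above can be sketched in Python. This is a minimal sketch under several assumptions: initiation events are evenly spaced in time, the velocity profile is given per position in bp/min, the pause is modeled as extra dwell time at one position, the labeling bias takes the form 1 − (1 − p)^(number of uracils), and the sigmoid midpoint (80 bp) and slope (0.2) are hypothetical choices. All function names are hypothetical.

```python
import numpy as np


def arrival_times(velocity_bp_per_min, pause_pos, pause_min):
    """Time [min] at which Pol II has synthesized x nt, for x = 0..G."""
    dwell = 1.0 / np.asarray(velocity_bp_per_min, dtype=float)  # min per bp
    dwell[pause_pos] += pause_min  # extra residence time at the pause site
    return np.concatenate(([0.0], np.cumsum(dwell)))


def position(arrival, elapsed_min):
    """Number of nucleotides synthesized after elapsed_min minutes."""
    pos = np.searchsorted(arrival, elapsed_min, side="right") - 1
    return int(np.clip(pos, 0, len(arrival) - 1))


def simulate_ttseq(velocity, pause_pos=80, pause_min=1.0, label_min=5.0,
                   init_interval_min=0.01, p_label=0.05, u_frac=0.28,
                   size_mid_bp=80.0, size_slope=0.2):
    """Simulated TT-seq coverage along a TU for steady, evenly spaced initiation."""
    arrival = arrival_times(velocity, pause_pos, pause_min)
    coverage = np.zeros(len(arrival) - 1)
    # Initiation times s relative to labeling start (s < 0: started before labeling).
    for s in np.arange(-arrival[-1], label_min, init_interval_min):
        start = position(arrival, -s)            # position when labeling begins
        end = position(arrival, label_min - s)   # position when labeling ends
        length = end - start
        if length <= 0:
            continue
        n_u = u_frac * length                      # uracil residues in the fragment
        p_labeled = 1.0 - (1.0 - p_label) ** n_u   # uracil-dependent labeling bias
        keep = 1.0 / (1.0 + np.exp(-size_slope * (length - size_mid_bp)))  # size selection
        coverage[start:end] += p_labeled * keep
    return coverage


velocity = np.full(2000, 2000.0)  # uniform 2 kbp/min over a 2 kbp TU
tss_region = [simulate_ttseq(velocity, pause_min=d)[:80].mean() for d in (0.0, 1.0, 2.0)]
```

With a uniform velocity of 2 kbp/min, the mean simulated coverage between the TSS and the pause site decreases as the pause gets longer: polymerases that initiate during labeling and are still paused when it ends yield short, weakly labeled fragments that the size selection largely removes.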
For each condition j (control or CDK9as inhibited), the antisense-bias-corrected number of transcribed bases was calculated for all expressed TUs i exceeding 35 kbp in length, excluding the first 200 bp. Prior to the calculation, all TUs were truncated by 5 kb from the 3’ end to avoid the influence of signal alterations around the pA site after CDK9as inhibition (Laitem et al., 2015). Of the remaining TUs, only those harboring one unique TSS given all Refseq annotated isoforms (UCSC RefSeq GRCh38) were kept. For each TU i, the cumulative sum of the difference in the number of transcribed bases between the two conditions was calculated for each base k as
starting at the unique TSS (position 0) and extending to the end of the TU. An elongation length estimate was then calculated by finding the optimal n between 0 and the TU length, that is, by maximizing the function
on the interval from 0 to the TU length. In words, this finds the maximum of the cumulative sum of coverage differences after rotating it 45 degrees clockwise. The elongation velocity [kbp/min] was subsequently calculated as
with inhibitor treatment duration t*=15 [min] and labeling duration t = 5 [min].
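The rotated-cumulative-sum step can be sketched as a knee detection. This minimal sketch assumes that both axes are scaled to [0, 1] before the 45 degree rotation (similar to the "Kneedle" knee-detection approach); the text does not state this scaling. It returns only the elongation length estimate; the conversion to kbp/min uses the formula above with t* = 15 min and t = 5 min and is not reproduced here. Truncating the TU and skipping the first 200 bp are left to the caller, and `elongation_length` is a hypothetical name.

```python
import numpy as np


def elongation_length(control, inhibited):
    """Estimate how far (bp) the CDK9-inhibition front has moved into a TU.

    control, inhibited: transcribed bases per position, starting at the TSS.
    """
    diff = np.asarray(control, dtype=float) - np.asarray(inhibited, dtype=float)
    cum = np.cumsum(diff)  # cumulative sum of the coverage difference
    if cum[-1] <= 0:
        raise ValueError("expected reduced coverage upon inhibition")
    n = np.arange(1, cum.size + 1)
    # Scale both axes to [0, 1] so that a 45 degree rotation is meaningful.
    x = n / n[-1]
    y = cum / cum[-1]
    # Rotating (x, y) 45 degrees clockwise gives y' = (y - x) / sqrt(2);
    # the position with the largest y' is the knee of the curve.
    return int(n[np.argmax(y - x)])
```

In an idealized TU where inhibition removes all signal up to the front and leaves the rest unchanged, the cumulative difference rises linearly and then plateaus, and the knee falls exactly at the front.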
For all expressed TUs i exceeding 10 kbp in length with one unique TSS given all Refseq annotated isoforms (UCSC RefSeq GRCh38), the pause duration [min] was calculated as the residence time of the polymerase in a window of ±100 bases m around the pause site (see above) as
with the elongation length estimate (see above) and the number of polymerases (antisense-bias-corrected mNET-seq coverage values) in a window of ±100 bases around the pause site. For pause sites less than 100 bp downstream of the TSS, the first 200 bp of the TU were considered. Note that this pause duration estimate was rescaled by a single proportionality factor for visualization purposes.
The gene-wise DMS-seq coverage (300 μl in vivo) in a window of [−15, −65] bp upstream of the pause site was normalized by subtraction from the respective DMS-seq coverage (denatured), allowing for a maximum of 5% negative values, which were set to 0 (sequencing depth adjustment). The gene-wise mean values were subsequently normalized by dividing by the initiation frequency. Note that the latter normalization has a negligible effect.
The gene-wise mean minimum free energy in a window of [−15, −65] bp upstream of the pause site was calculated from successive minimum free energy estimates of 13-base pair RNA fragments tiling this region, using RNAfold from the ViennaRNA package (Lorenz et al., 2011).
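The tiling step can be sketched with the ViennaRNA Python bindings, whose `RNA.fold` returns a (structure, MFE) pair. This minimal sketch assumes 0-based plus-strand coordinates, a half-open window [pause − 65, pause − 15), and a step of 1 between successive 13-nt fragments; the step size and coordinate convention are assumptions. The `fold_energy` parameter is a hypothetical hook that allows swapping in another energy function, for example for testing without ViennaRNA installed.

```python
from statistics import mean


def mean_upstream_mfe(seq, pause_site, fold_energy=None, frag_len=13, step=1,
                      far=65, near=15):
    """Mean minimum free energy of short fragments tiling [-far, -near] of the pause site.

    seq: plus-strand DNA or RNA sequence of the TU; pause_site: 0-based index in seq.
    fold_energy: callable returning the MFE (kcal/mol) of an RNA string.
    """
    if fold_energy is None:
        import RNA  # ViennaRNA Python bindings

        def fold_energy(fragment):
            return RNA.fold(fragment)[1]  # RNA.fold returns (structure, mfe)

    if pause_site - far < 0:
        raise ValueError("window extends beyond the start of the sequence")
    # Extract the upstream window and convert DNA to RNA.
    region = seq[pause_site - far:pause_site - near].upper().replace("T", "U")
    energies = [fold_energy(region[i:i + frag_len])
                for i in range(0, len(region) - frag_len + 1, step)]
    return mean(energies)
```

With the default parameters, a 50 nt window yields 38 overlapping 13-nt fragments whose MFEs are averaged per gene.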
Cellular fractionation and isolation of chromatin-associated RNA. Methods in Molecular Biology 1468:1–9. https://doi.org/10.1007/978-1-4939-4035-6_1
The 8-nucleotide-long RNA:DNA hybrid is a primary stability determinant of the RNA polymerase II elongation complex. Journal of Biological Chemistry 275:6530–6536. https://doi.org/10.1074/jbc.275.9.6530
CDK9 inhibitors define elongation checkpoints at both ends of RNA polymerase II-transcribed genes. Nature Structural & Molecular Biology 22:396–403. https://doi.org/10.1038/nsmb.3000
Control of formation of two distinct classes of RNA polymerase II elongation complexes. Molecular and Cellular Biology 12:2078–2090. https://doi.org/10.1128/MCB.12.5.2078
Paused RNA polymerase II inhibits new transcriptional initiation. Nature Genetics 49:1045–1051. https://doi.org/10.1038/ng.3867
Thermodynamic parameters to predict stability of RNA/DNA hybrid duplexes. Biochemistry 34:11211–11216. https://doi.org/10.1021/bi00035a029
Katherine A Jones, Reviewing Editor; Salk Institute for Biological Studies, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "CDK9-dependent RNA polymerase II pausing controls transcription initiation" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Kevin Struhl as the Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
In this manuscript, 4sU-seq and NET-seq were used in combination with an analogue-sensitive CDK9 mutant to investigate the relationship between transcriptional pausing and initiation. The reviewers agree that the findings definitively show that transcription initiation rates vary at strongly versus weakly paused RNA polymerase II promoters, and that inactivation of CDK9 thus affects transcription initiation, whether directly or indirectly, in a manner that correlates with pausing strength. The reviewers consider these conclusions extremely important and timely; they help explain how strongly paused genes can effectively and synchronously upregulate mRNA levels.
Comments are provided for revision:
1) Prior cell-free transcription experiments have shown large effects on mRNA synthesis due to pause release under conditions that do not support enhancer-linked transcription initiation or chromatin templates to facilitate transcription reinitiation. The differences in pause frequency and strength between different promoters can also be recapitulated in vitro in the absence of long-range connections between enhancers or chromatin; the resulting changes in mRNA levels in these studies are quite large and P-TEFb/NELF-DSIF-dependent. Should the changes in initiation secondary to pausing observed here be more accurately described as indirect? Or would 4sU-seq and NET-seq reveal a stronger effect on initiation in these systems?
2) The authors should comment on the fact that the estimates for pause duration and initiation frequency in subsection “Multi-omics analysis provides pause duration d and initiation frequency I” disregard any differences there might be in transcriptional processivity between different genes and regions.
4) Introduction: Consider inserting 'underlying' so the sentence reads: "The mechanisms underlying how Pol II pausing can regulate….."
5) Cite Saponaro et al., 2014, who also published on elongation speeds using the techniques referenced in subsection “Pol II elongation velocity is gene-specific”.
6) The authors have not directly analyzed initiation or the formation of pre-initiation complexes as was done by Shao and Zeitlinger, 2017. As such, they should use the term "pause release" rather than "initiation frequency", given that the data are focused on pol II release from the pausing site.
7) In subsection “TT-seq monitors immediate response to CDK9 inhibition”, they can't claim that pol II that is still going through the EEC after CDK9 inhibition is independent of CDK9 function. Is it not more likely that not every CDK9 has been fully inhibited by 10 min treatment of NA? The only (indirect) experimental evidence they have showing that pol II pausing is regulating pol II initiation at the promoter is the decrease in TT-seq signal between the TSS and the pause site after CDK9 inhibition. The problem here is that it is not entirely clear whether CDK9 activity is doing something upstream of the pausing site and is involved in elongation between the TSS and the pause site.
https://doi.org/10.7554/eLife.29736.043
- Patrick Cramer
- Heinrich Leonhardt
- Dirk Eick
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We would like to thank Helmut Blum and Stefan Krebs (LAFUGA, LMU Munich) for sequencing. We also thank Merle Hantsche for structural modeling. We thank Julien Gagneur (Technical University of Munich) for initial discussions. HL was funded by SFB 1064 TP A17. DE was funded by SFB 1064 (Chromatin Dynamics). PC was funded by Advanced Grant TRANSREGULON of the European Research Council and the Volkswagen Foundation.
© 2017, Gressel et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.