Introduction

In humans, most protein-coding genes initiate transcription bidirectionally. Usually, only sense transcription is extended and leads to mRNA synthesis. Antisense transcription terminates within ∼1 kilobase (kb), and the short non-coding (nc)RNA is degraded1. Similar asymmetry frequently occurs in unicellular eukaryotes (e.g., budding yeast) and at some plant promoters2, 3. Thus, promoter directionality is a broadly observed phenomenon. Interestingly, bidirectionality is the ground state of promoters and directionality is acquired over evolutionary time4. This is proposed to be through a combination of DNA sequences and proteins that favour the directional initiation and elongation of transcription.

The present explanation for the directionality of mammalian RNAPII promoters involves the arrangement of U1 snRNA binding sites and polyadenylation signals (PASs). U1 snRNA promotes RNAPII elongation through protein-coding genes by binding to RNA and preventing early termination5–7. It does this by silencing PASs and antagonizing other attenuation mechanisms, which include the PP1 regulator, PNUTS, and the restrictor complex8, 9. In contrast, U1-binding splice sites are rarer in short antisense transcripts, which are often rich in PAS sequences that are proposed to promote transcriptional termination10. This model predicts that polyadenylation factors control a large fraction of antisense transcriptional termination.

The multi-subunit Integrator complex also regulates promoter-proximal transcription11–15. The Integrator complex comprises the backbone, arm, phosphatase, and endonuclease modules11. Its endoribonuclease is INTS11, and its phosphatase activity is mediated by INTS6 and protein phosphatase 2A (PP2A). INTS11 endonuclease broadly affects promoter-proximal transcriptional attenuation whereas the Integrator phosphatase is proposed to regulate the escape of RNAPII into elongation16, 17. INTS6/PP2A phosphatase functionally antagonises CDK9, which is vital for RNAPII promoter escape and productive elongation18. By this model, CDK9 activity promotes elongation across protein-coding genes and INTS6/PP2A opposes it.

We tested the prediction that PAS factors control transcriptional directionality by terminating antisense transcription and found that this is not usually the case. Instead, we find that promoter directionality is conferred by preferential initiation in the sense direction and is thereafter maintained by INTS11 and CDK9. The termination of most antisense transcription is constitutively INTS11-dependent, whereas sense transcription is hypersensitive to INTS11 only when CDK9 activity is simultaneously inhibited. We hypothesise that CDK9 activity desensitises sense transcription to INTS11 and that reduced CDK9 activity in the antisense direction exposes RNAPII to INTS11-dependent termination.

Results and discussion

The present explanation of mammalian promoter directionality invokes early PAS-dependent termination of antisense transcription10. This is partly based on the direct detection of polyadenylated antisense transcripts, which provides evidence that some transcriptional termination is PAS-dependent. However, there are non-polyadenylated antisense RNAs that might be attenuated in other ways19. To test the relevance of PAS-dependent termination for antisense transcriptional termination and promoter directionality we tagged RBBP6 with the dTAG degron. Because RBBP6 is required to activate the PAS cleavage machinery20, 21, its depletion should inhibit any PAS-dependent transcriptional termination. Three homozygous dTAG-RBBP6 clones were isolated (Supplemental Figure 1A) and tagged RBBP6 was efficiently depleted after exposure to the dTAGv-1 degrader (Figure 1A). To test the overall contribution of RBBP6 to transcription, we used POINT (Polymerase Intact Nascent Transcript)-seq22, which maps nascent transcription by sequencing full-length RNA extracted from immunoprecipitated RNAPII.

RBBP6 loss disrupts PAS-dependent termination of sense transcription.

A. Western blot demonstrating the depletion of dTAG-RBBP6 over a time course of dTAGv-1 addition. SPT5 serves as a loading control.

B. Genome browser track of NEDD1 in POINT-seq data from dTAG-RBBP6 cells treated or not (2hr) with dTAGv-1. RBBP6 depletion induces a termination defect in the protein-coding direction (downstream of the indicated PAS) but not the upstream antisense direction. The Y-axis shows Reads Per Kilobase per Million mapped reads (RPKM).

C. Metaplot of POINT-seq data from dTAG-RBBP6 cells treated (RBBP6 dep) or not (Ctrl) (2hr) with dTAGv-1. This shows 1316 protein-coding genes selected as separated from any expressed transcription unit by ≥10kb. Signals above and below the x-axis are sense and antisense reads, respectively. The Y-axis scale is RPKM. TSS=transcription start site; TES=transcription end site (this marks the PAS position).

D. Heatmap representation of the data in C, which displays signal as a log2 fold change (log2FC) in RBBP6 depleted versus un-depleted conditions.

Figure 1B shows POINT-seq coverage over NEDD1 including the upstream antisense transcript. RBBP6 loss causes a termination defect beyond the PAS shown by the extended POINT-seq signal beyond the NEDD1 gene. However, RBBP6 does not affect the termination of upstream antisense RNA which is not extended when RBBP6 is depleted. Metaplot and heatmap analyses across 1316 genes confirm this as a global trend (Figures 1C and D). Although many antisense transcripts contain multiple AAUAAA sequences, most still terminate RBBP6-independently (Supplementary Figures 1B and C). Similarly, we recently showed that the PAS-dependent 5’→3’ exonucleolytic torpedo terminator, XRN2, does not affect antisense transcriptional termination9. Therefore, although a fraction of antisense transcription is polyadenylated, most of it can terminate using PAS-independent mechanisms.

Our data argue that PAS-independent termination mechanisms are prevalent for antisense transcription. A major PAS-independent termination pathway is driven by the Integrator complex, which was first identified as the 3’ end processing complex for snRNAs23. Several reports show that Integrator terminates transcription from most promoters, including those that initiate antisense transcription, so might affect promoter directionality11, 16, 17. To analyse this, we tagged its endonucleolytic subunit (INTS11) with a dTAG degron24, which enables rapid depletion of its catalytic activity (Figure 2A). We then performed POINT-seq on INTS11-dTAG cells depleted or not of INTS11 to assay the global impact of INTS11 on RNAPII transcription. The established effect of INTS11 at snRNAs was detected in our POINT-seq data and demonstrates the efficacy of this approach (Figure 2B).

INTS11 loss disrupts the termination of antisense transcription.

A. Western blot demonstrating homozygous tagging of INTS11 with dTAG and the depletion of INTS11-dTAG after 1.5 hr treatment with dTAGv-1. INTS8 serves as a loading control.

B. Genome browser track showing POINT-seq signal over RNU5A-1 and RNU5B-1 in INTS11-dTAG cells treated or not (1.5 hr) with dTAGv-1. Note that INTS11 depletion induces a termination defect in each case. Y-axis shows RPKM following spike-in normalisation.

C. Genome browser track showing POINT-seq signal over PGK1 and its upstream antisense region in INTS11-dTAG cells treated or not (1.5 hr) with dTAGv-1. Y-axis shows RPKM following spike-in normalisation.

D. Metaplot of POINT-seq data from INTS11-dTAG cells treated or not (1.5 hr) with dTAGv-1. This shows 1316 expressed protein-coding genes that are separated from any expressed transcription unit by ≥10kb. Signals above and below the x-axis are sense and antisense reads, respectively. Y-axis shows RPKM following spike-in normalisation.

E. Heatmap representation of the data in D, which displays signal as a log2 fold change (log2FC) in INTS11 depleted versus undepleted conditions over a region 3kb upstream and downstream of annotated TSSs.

As exemplified by PGK1 (Figure 2C), INTS11 does not affect the termination of protein-coding transcription but its loss causes a strong upregulation of antisense transcription. Meta-analysis shows the generality of antisense transcriptional attenuation via INTS11 (Figure 2D) and heatmap analysis demonstrates the dominant impact of INTS11 on antisense versus sense transcription at most promoters (Figure 2E). Transcription over protein-coding gene bodies is generally more modestly affected by INTS11 loss, but there is evidence of mildly increased transcription over the 5’ end of some genes as was recently reported (Supplemental Figure 2A)12–17. Overall, INTS11 frequently attenuates antisense transcription whereas a smaller fraction of sense transcription is affected.

Our analyses show that antisense RNAs are more frequently terminated via INTS11 than sense transcripts, which ultimately terminate via the PAS-dependent mechanism. The sensitivity of antisense transcripts to INTS11 might be due to the nature of antisense transcription or some other promoter feature. To interrogate this further, we analysed two other promoter classes: those where protein-coding transcripts are initiated in both directions and those that initiate the bidirectional transcription of unstable enhancer (e)RNAs. In the former case, both transcripts are extended and stable (i.e., like most sense transcripts); in the latter case, both are short and unstable like most antisense transcripts. When both directions are protein-coding, INTS11 depletion causes modest reductions in transcription (Supplemental Figure 2B). Conversely, eRNA transcription was bidirectionally upregulated upon INTS11 elimination, like the antisense transcripts described above (Supplemental Figure 2C). We conclude that short ncRNAs are more strongly affected by INTS11 than protein-coding transcripts. At directional promoters this results in the attenuation of antisense transcription.

The increased antisense POINT-seq signal following INTS11 loss is consistent with defects in transcriptional termination. However, it could result from increased transcriptional initiation. We were also interested in whether preferential sense initiation could contribute to observed directionality as was proposed in budding yeast4. To precisely resolve directional aspects of initiation and assay any impact of INTS11, we devised a variant of POINT-seq called short (s)POINT. Briefly, sPOINT follows the POINT-seq protocol but library preparation employs the selective amplification of 5’ capped RNAs <150nts (Figure 3A, Supplemental Figure 3A, Experimental Procedures). The sPOINT signal over PGK1 exemplifies this and demonstrates a very restricted signal close to the TSS (Figure 3B). Figure 3C compares the meta profile of POINT- and sPOINT-seq. While POINT-seq reads span entire transcribed sequences the sPOINT signal is restricted to the promoter-proximal region. Thus, the well-known promoter-proximal RNAPII pause is efficiently assayed by sPOINT-seq. Our metaplot in Figure 3C shows that most capped RNAPII-associated RNA <150nts are promoter-proximal and thus relate to paused polymerases.

Transcription initiation is more efficient and focused in the sense direction.

A. Schematic of sPOINT-seq protocol. The POINT-seq protocol is followed, in which chromatin is isolated and engaged RNAPII is immunoprecipitated. Short transcripts are preferentially amplified during library preparation (see Experimental Procedures for full details).

B. Comparison of POINT-(top trace) and sPOINT-seq (lower trace) on PGK1. Y-axis units are RPKM.

C. Metaplot comparison of POINT-(top plot) and sPOINT-seq (lower plot) profiles across the 684 highest expressed protein-coding genes that are separated from expressed transcription units by ≥10kb. Signals above and below the x-axis are sense and antisense reads, respectively. Y-axis shows RPKM following spike-in normalisation.

D. Top metaplot shows full read coverage for sPOINT-seq performed in INTS11-dTAG cells treated or not (1.5hr) with dTAGv-1. The lower metaplot is the same data but only the 5’ end of each read is plotted. Both plots display a region +/-1kb from the TSS of the top 20% (based on sPOINT signal) promoters. Y-axis signals are RPKM following spike-in normalisation.

E. Metaplot zoom of the antisense TSS signals deriving from the lower plot in D. This makes clear the dispersed sites of initiation. The Y-axis scale is RPKM following spike-in normalisation.

F. Genome browser track of PGK1 promoter region in sPOINT-seq. This showcases the focused sense TSS (black arrows) and the dispersed antisense reads (brackets) generalised in the metaplots in D and E. Note the higher y-axis scale (RPKM) for sense vs. antisense.

We then performed sPOINT-seq in INTS11-dTAG cells treated or not with dTAGv-1 and plotted the coverage of full-length reads within 1kb of TSSs (Figure 3D). Because sPOINT-seq maps the 5’ and 3’ ends of these reads, it also detects TSSs at single-nucleotide resolution and these 5’ ends are plotted in the lower meta profile in Figure 3D. INTS11 loss results in lower sPOINT-seq signal, arguing against increased transcriptional initiation in its absence. Together with the increased POINT-seq signal, especially for antisense transcription, this indicates more RNAPII promoter escape upon INTS11 loss. Interestingly, this experiment reveals sense transcriptional initiation to be more efficient and focused compared to the antisense direction. This is clear from the TSS-aligned metaplot in Figure 3D and the dispersed nature of antisense transcription shown in Figure 3E. Quantitation of the TSS-derived sense vs. antisense reads demonstrates the higher signal in the sense direction in untreated cells (Supplemental Figure 3B). Figure 3F (PGK1) and Supplemental Figure 3C (ACTB) exemplify these features on individual genes. INTS11 loss does not dramatically alter TSS position or focus. Although a lower resolution technique, RNAPII chromatin-immunoprecipitation and sequencing (ChIP-seq) confirmed these promoter characteristics and the mild impact of INTS11 (Supplemental Figures 3D and E). Overall, these data show that directionality is partly determined by preferential initiation in the sense direction.

Directionality is also maintained after initiation because, as is well established, sense transcription extends further than antisense transcription. If INTS11 regulates this, an opposing force is required to prevent it from attenuating sense transcription. INTS11 affects transcription very early, occupies promoters, and becomes less active as RNAPII moves into elongation16–18, 25, 26. Therefore, if INTS11 is counteracted to allow sense transcription, any responsible mechanism needs to act early. One of the first transcriptional checkpoints involves the phosphorylation of RNAPII and other factors by CDK9, which releases promoter-proximally paused RNAPII into elongation27. During this process, the Integrator-associated phosphatase, PP2A, antagonizes CDK9 and presumably regulates the sensitivity of RNAPII to INTS1118. Because Integrator phosphatase has little effect on antisense transcription16, we hypothesised that INTS11 sensitivity and CDK9 activity are inversely correlated to maintain directionality after initiation.

This hypothesis predicts that sense transcription will be attenuated via INTS11 when CDK9 is inactive. To test this genome-wide, we depleted INTS11 from INTS11-dTAG cells in the presence or absence of a specific CDK9 inhibitor (NVP-2)28 and performed POINT-seq. As exemplified by CDCA7, the depletion of INTS11 alone caused an antisense transcriptional termination defect with a milder impact on the protein-coding sense direction (Figure 4A). As expected, NVP-2 treatment reduced transcription over the protein-coding gene body. Antisense transcription also displays some CDK9 sensitivity. Importantly, in NVP-2-treated cells, INTS11 loss increased transcription in both directions, effectively compromising the maintenance of directionality. Further examples are displayed in Supplemental Figures 4A and B. This is a genome-wide trend as shown in the metaplots in Figures 4B and C. Heatmap analysis of the effect of INTS11 loss after CDK9 inhibition shows bidirectional upregulation of transcription at most assayed genes (Figure 4D). This experiment shows a clearer effect of INTS11 loss over the 5’ end of sense transcripts versus our experiment in Figure 2 (although this is again more modest than the antisense effect). We speculate this to be due to the slightly extended period of INTS11 depletion employed alongside NVP-2 treatment (2.5 hr here vs. 1.5 hr for Figure 2).

Promoters lose their directionality when CDK9 is inhibited.

A. Genome browser track of CDCA7 in POINT-seq data derived from INTS11-dTAG cells either untreated, dTAG treated, NVP-2 treated or dTAG and NVP-2 treated (2.5 hr). Signals above and below the x-axis are sense and antisense reads, respectively. The Y-axis scale shows RPKM following spike-in normalisation.

B. Metaplot of POINT-seq analysis in INTS11-dTAG cells depleted or not of INTS11 and treated or not with NVP-2 (2.5 hr). This shows 1316 protein-coding genes selected as separated from any expressed transcription unit by ≥10kb. The regions 3kb upstream and downstream of genes are included. Y-axis units are RPKM following spike-in normalisation.

C. Metaplot of the same CDK9i + and -dTAG data shown in B but zoomed into the region 3kb upstream and downstream of the TSS.

D. Heatmap representation of the data in C, which displays signal as a log2 fold change (log2FC) in INTS11 depleted versus un-depleted conditions covering a region 3kb upstream and downstream of the TSS.

E. qRT-PCR analysis of INTS11-dTAG cells transfected with the HIV reporter construct with or without TAT then depleted or not of INTS11. Quantitation shows signals relative to those obtained in the presence of INTS11 and the absence of TAT after normalising to MALAT1 RNA. n=3. Error bars show standard deviation. ** denotes p=0.01. Note that INTS11 depletion was performed concurrently with transfection (14hr in total).

F. Model for promoter directionality depicting higher levels of focused transcriptional initiation in the sense direction together with opposing gradients of CDK9 and INTS11 activity that peak in sense and antisense directions, respectively.

Following CDK9 inhibition and INTS11 loss, the largest recovery of protein-coding transcription is often over the 5’ end of genes. This presumably reflects a continued requirement of CDK9 for elongation, even when INTS11 is absent. Furthermore, Integrator phosphatase remains RNAPII-associated after acute INTS11 loss17, which would antagonise elongation especially if CDK9 is lacking. Importantly, the POINT-seq signal remains higher in the sense vs. antisense direction even when CDK9 and INTS11 are both compromised. This is consistent with our sPOINT-seq conclusion that levels of sense transcriptional initiation are higher compared to the antisense direction. Finally, we confirmed the CDK9-mediated suppression of INTS11 on four selected protein-coding genes using an alternative inhibitor, 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole (DRB) (Supplemental Figure 4C). In sum, most RNAPII promoters are bidirectionally affected by INTS11 loss when CDK9 is inhibited.

If CDK9 counteracts INTS11, its recruitment should also prevent transcriptional attenuation by Integrator. To assay this, we employed a plasmid where transcription is driven by the human immunodeficiency virus (HIV) promoter. Transcription from the HIV promoter results in the synthesis of the trans-activating response (TAR) element and promoter-proximal RNAPII pausing. Pause release requires the trans-activator of transcription (TAT), which promotes RNAPII elongation by recruiting CDK929. INTS11 suppresses transcription from the HIV promoter when TAT is not present30. To test the effect of CDK9 on this process, INTS11-dTAG cells were transfected with the HIV reporter with or without TAT before treatment or not with dTAGv-1. Transcription from the reporter was analysed by qRT-PCR (Figure 4E). HIV transcription is induced by INTS11 loss in the absence of TAT. Transcription was greatly stimulated by TAT (∼200 fold) but was no longer enhanced by INTS11 loss (there was slightly less reporter RNA recovered). Therefore, CDK9 recruitment by TAT alleviates INTS11-dependent attenuation of transcription.

How CDK9 opposes INTS11 is unresolved but, of possible relevance, INTS11 and SPT5 are adjacent in the RNAPII:Integrator structure31. As SPT5 is a substrate of CDK9 (refs 32–34), its phosphorylation might evict INTS11 or prevent its association with the complex. We show that sense transcriptional initiation dominates over the antisense direction. This could reflect the positioning of DNA elements that attract initiation factors to advantage sense transcription4. Consistently, mammalian initiation components show peak occupancy over annotated sense TSSs35. Thereafter, the maintenance of directionality by CDK9 and INTS11 is consistent with their well-established roles in promoting and terminating transcription. Published ChIP-seq indicates their respective enrichment in sense and antisense directions, which supports our model (Supplemental Figure 4D)18. Nevertheless, some antisense transcription is NVP-2 sensitive, revealing CDK9 activity on a fraction of RNAPII (Figure 4). As CDK9 opposes INTS11, this fraction of antisense transcription may be terminated via other mechanisms like restrictor and the PAS-dependent machinery9, 10, 36–38. Overall, transcriptional initiation and the opposing activities of CDK9 and INTS11 establish and maintain promoter directionality (see model in Figure 4F).

Acknowledgements

We thank Hiroshi Kimura for the total RNAPII antibody used for POINT-seq. Our research was funded by a Wellcome Trust Investigator Award to SW (223106/Z/21/Z). This project used the University of Exeter Sequencing Service, and their equipment was funded by the Wellcome Trust (Multi-User Equipment Grant award number 218247/Z/19/Z).

Author contributions

Conceptualization, J.D.E. and S.W.; data curation, J.D.E., J.B., L.D., and C.E.; formal analysis, J.D.E., J.B., L.D., C.E., and S.W.; methodology, J.D.E. and S.W.; investigation, J.D.E., J.B., L.D., C.E., and S.W.; supervision and funding acquisition, S.W.; writing and editing, J.D.E., J.B., L.D., C.E., and S.W.

Experimental procedures

Sequencing data

Deposited at Gene Expression Omnibus under accession: GSE243266.

Cloning

HIV reporter constructs were made by removing the CMV promoter, the entire beta-globin sequence, and its PAS from a pcDNA5 FRT/TO plasmid containing the WT β-globin (βWT) gene (described in39) and inserting an HIV promoter and downstream TAR element derived from βΔ5-7 (described in40). INTS11 targeting constructs were modified from those we previously described to generate INTS11-SMASh cells (described in41). The SMASh tag was removed and replaced with 2xHA dTAG derived from Addgene plasmid 9179224. Guide RNA expressing Cas9 plasmids to modify INTS11 or RBBP6 were made by inserting annealed oligonucleotides, containing the targeting sequence, into px330 digested with BbsI.

Cell culture and cell lines

HCT116 cells were maintained in DMEM supplemented with penicillin/streptomycin at 37°C, 5% CO2. dTAG-RBBP6 cells were generated using the “CHoP in” protocol42. Briefly, a 24-well dish was transfected with 250ng of px33043 containing the RBBP6-targeting guide and 250ng of PCR product containing the dTAG degron preceded by a blasticidin or puromycin selection marker (derived from Addgene plasmid 9179224). Jetprime (Polyplus) was used for transfection. Three days later, cells were passaged into media containing 10µg/ml Blasticidin/1µg/ml Puromycin and colonies were PCR screened ∼10 days later. INTS11-dTAG cells were generated by homology-directed repair. A 6-well dish of cells was transfected with 1µg px330 containing the INTS11-targeting guide (described in41) and 1µg each of the repair templates. Three days later, cells were passaged into media containing 30µg/ml Hygromycin and 800µg/ml Neomycin (G418). ∼10 days later, colonies were picked and screened by PCR. For protein depletion 1µM dTAGv-1 (Tocris) was used for 1-14hrs (see figure legends for timings used in each experiment). NVP-2 was used at 250nM for 2.5 hrs. DRB was used at 100µM for 2.5 hr.

POINT-seq

For POINT-seq, we followed the protocol provided in22. The only modification was that we started with a confluent 10cm dish of cells and performed the immunoprecipitation with 6µg of anti-RNAPII. ∼2% cell volume of Drosophila S2 cells was included as a spike-in control. Libraries were prepared using the NEBNext Ultra™ II Directional RNA Library Prep Kit for Illumina (New England Biolabs). sPOINT was performed in the same manner except that immunoprecipitated RNA was treated with Terminator™ 5’-Phosphate-Dependent Exonuclease (Lucigen) to remove any uncapped transcripts and libraries were prepared with the SMARTer® smRNA-Seq Kit for Illumina (Takara Bio).

ChIP-seq

For each experiment, 1×10cm dish of cells was used. Protein: DNA crosslinks were formed by adding Formaldehyde (1% v/v) to culture media for 10 mins then quenching with 125mM glycine. Cells were rinsed 2x with PBS, scraped off the dish, and pelleted in 10ml PBS at 500xg for 5 mins. We then employed the simple ChIP enzymatic kit (Cell Signalling Technologies) to fragment chromatin and purify RNAPII-bound DNA. We followed the kit protocol except for conjugating 5µg of anti-total RNAPII to sheep anti-mouse dynabeads (Life Technologies). Sequencing libraries were generated using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina.

Total RNA isolation and qRT-PCR

A 24-well dish of cells was transfected with 100ng HIV reporter plasmid and, where co-transfected, 50ng TAT plasmid44 using Jetprime (Polyplus) following the manufacturer’s protocol. Media was refreshed 5 hours post-transfection and dTAGv-1 was added where appropriate. The next day, total RNA was isolated using Trizol (Thermo Fisher) following the manufacturer’s protocol. RNA was DNase treated then 1µg was reverse transcribed using Protoscript II reverse transcriptase (New England Biolabs). qPCR was performed using LUNA SYBR green reagent (New England Biolabs) on a Qiagen Rotorgene instrument. Quantitative analysis used the ΔCT method.
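The ΔCT quantitation can be illustrated with a minimal Python sketch. It assumes 100% amplification efficiency (one doubling per cycle); the function names and CT values are hypothetical and are not taken from this study.

```python
def relative_expression(ct_target, ct_reference):
    """Return target abundance relative to the reference RNA (2^-dCT)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


def fold_change(sample, control):
    """Fold change of a sample versus a control.

    Each argument is a (ct_target, ct_reference) tuple.
    """
    return relative_expression(*sample) / relative_expression(*control)


# Hypothetical CT values: (reporter CT, reference RNA CT)
control = (28.0, 18.0)
depleted = (26.0, 18.0)
print(fold_change(depleted, control))  # 4.0
```

Here a reporter CT that is two cycles lower, with an unchanged reference CT, corresponds to a four-fold increase relative to the control sample.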

Antibodies

INTS11 (Abbexa; abx234340); total RNAPII (gift from Hiroshi Kimura), INTS8 (RRID:AB_2683403); RBBP6 (RRID:AB_2621169); SPT5 (RRID:AB_2196394).

qRT-PCR primers and gRNA target sites

See Supplemental Table 1.

Bioinformatics

sPOINT

For sPOINT TSS metaplots showing full-length and 5’ derived coverage, gene lists were determined by selecting principal protein-coding transcript isoforms from gencode v42 human annotation – specifically, those containing both “appris_principal_1” and “Ensembl_canonical” labels (15301 in total). Of these, the top 20% expressed (based on - dTAG) were used to generate meta profiles (3060 TSSs in total).
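The top 20% selection can be sketched in Python as follows. This is an illustration only: the function name, the dictionary input, and the made-up signal values are assumptions, and the fraction is truncated to a whole number of TSSs, which reproduces the 3060 of 15301 stated above.

```python
def top_fraction(signal_by_tss, fraction=0.2):
    """Return the TSS identifiers with the highest signal.

    The number kept is the given fraction of all TSSs, rounded down.
    """
    n_keep = int(len(signal_by_tss) * fraction)
    ranked = sorted(signal_by_tss, key=signal_by_tss.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical input: 15301 TSS identifiers with made-up signal values
signal = {f"tss_{i}": float(i) for i in range(15301)}
selected = top_fraction(signal)
print(len(selected))  # 3060
```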

Antisense PAS analysis

Bed files from Figure 1C were edited to map the antisense transcript from each gene by fitting each gene’s start and end point to 3kb upstream of their TSS. Sequences underlying these regions were obtained via BEDTools getFASTA (hg38) and consensus PAS (AATAAA) enumerated in R. PAS frequency per transcript was plotted using ggplot2. Heatmaps for transcripts containing > 1 or no PAS motifs were plotted with DeepTools.
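A minimal Python sketch of this kind of analysis is given below; the analysis itself was performed with BEDTools and R, so this is an illustration rather than the code used. The function names, the handling of strand, and the counting of overlapping motifs are all assumptions.

```python
import re

# Lookahead so that overlapping AATAAA motifs are each counted (assumption)
PAS = re.compile(r"(?=AATAAA)")

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_window(start, end, strand, size=3000):
    """Return (start, end, strand) of the upstream antisense window.

    Coordinates are BED-style (0-based, half-open). The window covers
    `size` bases upstream of the TSS on the strand opposite to the gene.
    """
    if strand == "+":
        return max(0, start - size), start, "-"
    return end, end + size, "+"


def count_pas(forward_seq, strand):
    """Count AATAAA on the given strand of a forward-strand DNA sequence."""
    seq = forward_seq.upper()
    if strand == "-":
        # Reverse complement to read the minus strand 5' to 3'
        seq = seq.translate(COMPLEMENT)[::-1]
    return len(PAS.findall(seq))


print(antisense_window(10000, 20000, "+"))  # (7000, 10000, '-')
print(count_pas("TTTATT", "-"))  # 1
```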

POINT-Seq alignment and visualisation

Adapters were removed from raw reads using Trim Galore! and reads were mapped to GRCh38 using HISAT2 with default parameters. Reads were also mapped to Dm6 to quantify unique Drosophila reads in each sample for spike-in normalisation. Reads with a MAPQ score <30 were removed from alignment files using SAMtools45. Biological replicates were normalised and merged using SAMtools merge before spike-in scaled bigwig files, generated using DeepTools, were visualised using IGV46.
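One common way to derive spike-in scale factors from Drosophila read counts is sketched below in Python. The exact formula used here is not specified in the text, so the choice to scale every sample to the one with the fewest spike-in reads is an assumption, as are the function name and the read counts.

```python
def spike_in_scale_factors(dm6_read_counts):
    """Map each sample to a factor inversely proportional to its spike-in reads.

    The sample with the fewest Drosophila reads receives a factor of 1.0
    (an assumed convention; other reference choices are possible).
    """
    reference = min(dm6_read_counts.values())
    return {sample: reference / count
            for sample, count in dm6_read_counts.items()}


# Hypothetical numbers of uniquely mapped Dm6 reads per sample
counts = {"ctrl": 2_000_000, "dTAG": 1_000_000}
print(spike_in_scale_factors(counts))  # {'ctrl': 0.5, 'dTAG': 1.0}
```

A factor of this kind could, for example, be supplied to the `--scaleFactor` option of DeepTools `bamCoverage` when generating bigwig files.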

ChIP-seq alignment

Adapters were removed from raw reads using Trim Galore! and mapped to GRCh38 using HISAT2 with default parameters. Reads were also mapped to Dm6 to identify spike-in signal. Reads with MAPQ score of ≤30 were removed with SAMtools45. Replicates were normalised and merged using SAMtools merge.

POINT-seq metaplots and heat maps

A metagene list of 1316 genes was generated containing genes with no overlapping regions within 10kb of any other expressed transcription unit. Split strand metagene plots were produced using Drosophila spike-in normalised sense and antisense (scaled to −1) bigwig coverage files separately with further graphical processing performed in R. For heat maps, computeMatrix (DeepTools) was used to generate score files from the normalised bigwig files using the 10kb non-overlapping gene list. A log2 ratio (depletion/control) was applied to identify changes in reads. Plots were redrawn in R; parameters used for each heat map are detailed in figure legends.
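The two transformations described here, the log2 ratio of depleted over control signal and the scaling of antisense coverage by −1, can be sketched in Python. This is a hedged illustration: the function names are hypothetical and the pseudocount is an assumption added to avoid division by zero, which the text does not specify.

```python
import math


def log2_fold_change(depleted, control, pseudocount=1.0):
    """Per-bin log2(depleted / control).

    The pseudocount (an assumption) avoids division by zero in empty bins.
    """
    return [math.log2((d + pseudocount) / (c + pseudocount))
            for d, c in zip(depleted, control)]


def mirror_antisense(values):
    """Scale antisense coverage by -1 so that it plots below the x-axis."""
    return [-v for v in values]


# Hypothetical binned coverage values
print(log2_fold_change([3.0, 0.0, 7.0], [1.0, 0.0, 1.0]))  # [1.0, 0.0, 2.0]
print(mirror_antisense([1.0, 2.5]))  # [-1.0, -2.5]
```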

ChIP-seq datasets

Data used to produce CDK9 and INTS11 metaplot (GSE163804), GSM numbers: INTS11 (GSM4987768) and CDK9 (GSM4987764).

ChIP-seq metaplot

ChIP-seq metaplots were created from bigwig coverage data files using DeepTools with further processing performed within R. For Supplementary Figure 3E, plots used the same gene list described above for sPOINT promoter analyses. For Supplementary Figure 4D, a metagene list was generated from 918 non-overlapping antisense (promoter upstream transcript) regions determined by de novo transcript assembly of previously published nuclear RNA-seq from cells depleted of the catalytic exosome subunit, DIS347.

Supplemental figure legends

A. Western blot showing three separate dTAG-RBBP6 cell clones. In each case, homozygous tagging is demonstrated by the size shift versus endogenous RBBP6 (HCT116 lanes). Tagged RBBP6 is completely depleted by 2hr treatment with dTAGv-1 whereas endogenous RBBP6 is unaffected. SPT5 serves as a loading control. Clone number 2 (red asterisk) was selected for the experiments in Figure 1.

B. Graph plotting the number of AAUAAA sequences in antisense transcripts derived from a 3kb window upstream of TSSs. Y-axis is the number of transcripts, and the x-axis shows the AAUAAA count per transcript.

C. Heatmap of control or RBBP6-depleted POINT-seq showing the RBBP6 effect on antisense transcripts without an AAUAAA or those that contain at least one AAUAAA. Most are unaffected by RBBP6 loss. The Y-axis is RPKM.

A. Genome browser track showing POINT-seq signal over the first 7kb of GLCCI1 in INTS11-dTAG cells treated or not (1.5 hr) with dTAGv-1. This is an example of where INTS11 loss affects protein-coding transcription as previously described17. Y-axis shows RPKM following spike-in normalisation.

B. Metaplot of POINT-seq data from INTS11-dTAG cells treated or not (1.5hr) with dTAGv-1. These are protein-coding genes arranged head-to-head. Signals above and below the x-axis are sense and antisense reads, respectively. Y-axis shows RPKM following spike-in normalisation.

C. Metaplot of POINT-seq data from INTS11-dTAG cells treated or not (1.5hr) with dTAGv-1. This shows RNAs derived from enhancer clusters, which generally initiate transcription bidirectionally. Signals above and below the x-axis are sense and antisense reads, respectively. Y-axis shows RPKM following spike-in normalisation.

A. Plot showing the size distribution of fragments mapped by sPOINT-seq. This demonstrates its selectivity toward transcripts <150nts.

B. Violin plot of the sum sPOINT 5’end spike-in normalised RPKM signal across a 500bp window from the protein-coding TSSs shown in Figure 3D (n=3060). These samples are untreated with dTAG to quantitate the normal levels of sense vs. antisense initiation/pausing in a window -/+ 500bp from the TSS in respective directions for antisense and sense. This shows more initiation/pausing in sense vs. antisense directions.

C. Genome browser track of ACTB promoter region in sPOINT-seq from INTS11-dTAG cells treated or not with dTAGv-1 (1.5 hr). This showcases the focused sense TSS (black arrows) and the dispersed antisense reads (brackets). Note the different y-axis scales (RPKM) for sense vs. antisense.

D. Genome track of RNU5A-1 and RNU5B-1 in RNAPII ChIP-seq performed on INTS11-dTAG cells treated or not with dTAGv-1 (2hr). INTS11 depletion causes RNAPII build-up beyond both genes indicating a transcription termination defect. The Y-axis scale is RPKM.

E. Metaplot of RNAPII occupancy of protein-coding TSS regions (the same gene set employed for the sPOINT analysis in Figure 3) derived from RNAPII ChIP-seq performed on INTS11-dTAG cells treated or not with dTAGv-1 (2hr). The Y-axis scale is log10 qvalue of peak pileup values normalised to the spike-in control.

A. Genome browser track of TARDBP in POINT-seq data derived from INTS11-dTAG cells either untreated, dTAG treated, NVP-2 treated or dTAG and NVP-2 treated (2.5 hr). The Y-axis scale shows RPKM.

B. As for A, but for NEDD1.

C. qRT-PCR analysis of PGK1, ENY2, PLEKHF2, and NUDCD1 pre-mRNAs in INTS11-dTAG cells treated or not with dTAGv-1 and at the same time exposed or not to DRB (all 2.5 hr). To enrich nascent transcripts, primers detect intronic RNA. Quantitation shows fold change versus spliced actin relative to samples untreated with dTAGv-1 or DRB. DRB treatment substantially reduces signal, which is restored when INTS11 is co-depleted. Error bars show standard deviation. n=3, * denotes p≤0.05.

D. Analysis of published18 INTS11 and CDK9 ChIP-seq data showing their respective occupancy within 1kb of the TSS. Based on these data, they are slightly enriched upstream (INTS11) and downstream (CDK9) of the TSS. The Y-axis units are RPKM.