Chromatin-associated RNA sequencing (ChAR-seq) maps genome-wide RNA-to-DNA contacts

Abstract

RNA is a critical component of chromatin in eukaryotes, both as a product of transcription, and as an essential constituent of ribonucleoprotein complexes that regulate both local and global chromatin states. Here, we present a proximity ligation and sequencing method called Chromatin-Associated RNA sequencing (ChAR-seq) that maps all RNA-to-DNA contacts across the genome. Using Drosophila cells, we show that ChAR-seq provides unbiased, de novo identification of targets of chromatin-bound RNAs including nascent transcripts, chromosome-specific dosage compensation ncRNAs, and genome-wide trans-associated RNAs involved in co-transcriptional RNA processing.

https://doi.org/10.7554/eLife.27024.001

Introduction

Much of the eukaryotic genome is transcribed into non-coding RNA (ncRNA), and several studies have established that a subset of these ncRNAs form ribonucleoprotein complexes that bind and regulate chromatin (Guttman and Rinn, 2012; Meller et al., 2015; Cech and Steitz, 2014). Some of the most well studied ncRNAs are those involved in dosage compensation, which include roX1 and roX2 in Drosophila and Xist in mammals. In Drosophila, roX1 and roX2 are part of the male-specific lethal (MSL) complex that coats the single male X chromosome to acetylate histone H4K16 and increase transcription (Conrad and Akhtar, 2012). In female mammals, Xist is expressed from a single X-chromosomal locus and coats the X chromosome from which it is expressed in order to silence transcription (Augui et al., 2011). Other ncRNAs, such as HOTAIR (Rinn et al., 2007; Chu et al., 2011), HOTTIP (Wang et al., 2011), and enhancer RNAs (Sigova et al., 2015), have been shown to regulate expression of specific genes by localizing to chromatin and recruiting activating or repressing proteins. Finally, repetitive ncRNA transcripts have roles at chromosomal loci essential in maintaining genomic integrity over many cell divisions, including TERRA at telomeres (Bunting et al., 2010) and alpha-satellites near centromeres (Hall et al., 2012). Despite these well-studied examples, the genomic targets of most chromatin-associated ncRNAs are unknown, and the mechanisms by which these ncRNAs regulate the epigenetic and spatial organization of chromatin remain largely unexplored.

Genomic methods for studying the localization of specific RNA transcripts include ChIRP (Chu et al., 2011), CHART (Simon et al., 2011), and RAP (Engreitz et al., 2013). These techniques use hybridization of complementary oligonucleotides to pull down a single target RNA and then next-generation sequencing or mass spectrometry to identify its DNA- or protein-binding partners (Simon et al., 2011). However, de novo discovery of chromatin-associated RNAs remains limited to computational predictions (Guttman and Rinn, 2012) or association with previously known factors (Khalil et al., 2009). Nuclear fractionation allows isolation of bulk chromatin and subsequent identification of chromatin-bound RNAs via sequencing, but does not provide sequence-resolved maps of RNA binding locations along the genome (Werner and Ruthenburg, 2015). To overcome these limitations, we have developed ChAR-seq, a proximity ligation and sequencing method (Figure 1A) that both identifies chromatin-associated RNAs and maps them to genomic loci (Figure 1B).

Figure 1 (with 10 supplements)

ChAR-seq uses proximity ligation of chromatin-associated RNA and deep sequencing to map RNA-DNA contacts in situ.

(A) Overview of the ChAR-seq method wherein RNA-DNA contacts are preserved by crosslinking, followed by in situ ligation of the 3’ end of RNAs to the adenylated 5’ end of the ssDNA tail of an oligonucleotide ‘bridge’ containing a biotin modification and a DpnII-complementary overhang on the opposite end. After extending the bridge by reverse transcription to generate a strand of cDNA complementary to the RNA, the genomic DNA is then digested with DpnII and then re-ligated, capturing proximally-associated bridge molecules and RNA. The chimeric molecules are reverse-transcribed, purified and sequenced. (B) Chimeric molecules are sequenced and the RNA and DNA ends are distinguished owing to the polarity of the bridge, which preferentially ligates to RNA via the 5'-adenylated tail and to DNA via the DpnII overhang. The RNA and DNA reads are then computationally recombined to produce contact maps for each annotated RNA in the genome. (C) Representative examples of genome-wide RNA coverage plots generated for Total RNA (black), mRNA (red), *Hsromega* (green), *chinmo* (green), *ten-m* (green), *snRNA:U2* (cyan), *snRNA:7SK* (cyan), *roX1* (blue) and *roX2* (purple). Arrows show the transcription start site for each gene. In chromosome cartoons throughout the paper, light gray represents the primary chromosome scaffolds, darker gray regions are heterochromatic scaffolds, and black circles are centromeres. (D) Zoomed-in view of an 850 kilobase region of chromosome 3L (chr3L). ChAR-seq tracks for Total RNA, *ten-m*, *snRNA:U2*, and *snRNA:7SK* are shown in comparison with PRO-seq tracks (*Drosophila* S2 [Kwak et al., 2013]) and ATAC-seq (this study, CME-W1-cl8+). (E) ChAR-seq contact matrix (RNA-to-DNA, *top*) plotted and aligned with the same 850 kb region as panel D. ChAR-seq was performed without bridge addition (Hi-C/Mock-ChAR), resulting in DNA-DNA proximity ligation as in Hi-C (‘Hi-C, DNA-to-DNA’, *bottom*).

https://doi.org/10.7554/eLife.27024.002

Results

We developed and performed ChAR-seq using Drosophila melanogaster CME-W1-cl8+ cells, a male wing disc derived cell line with a normal karyotype and well-characterized epigenome and transcriptome (Cherbas et al., 2011; Roy et al., 2010). ChAR-seq is a chromosome conformation capture method that maps genome-wide RNA-to-DNA contacts in crosslinked nuclei (Dekker et al., 2002; Rao et al., 2014). Briefly, cells are cross-linked with formaldehyde and permeabilized, then RNA is partially fragmented and soluble RNA is washed away. The chromatin-cross-linked RNA is then ligated to an oligonucleotide duplex 'bridge' molecule and reverse transcribed. Genomic DNA is then digested and ligated onto the other end of the oligonucleotide 'bridge', creating a link between chromatin-associated RNA and proximal DNA. The ligated RNA is fully converted to cDNA during second strand synthesis. Finally, the DNA is sonicated and the chimeric molecules are purified, processed, and sequenced.

To enable the capture and analysis of RNA-to-DNA contacts, the oligonucleotide bridge (see Figure 1—figure supplement 1) was designed to have several key features: (1) the 5'-adenylated end (5'App) of the bridge ensures that it is the only 5’ end competent for ligation to the 3'-ends of ssRNA by truncated T4 Rnl2tr R55K K227Q mutant ligase (Viollet et al., 2011), which cannot adenylate 5' ends (Figure 1—figure supplement 2), (2) the sequence of the bridge does not exist in the yeast, fly, mouse or human genomes and encodes a defined polarity, (3) the end of the bridge contains a restriction site for specific ligation to digested genomic DNA, and (4) the bridge is biotinylated so that it can be captured and enriched. After the bridge is ligated to RNA in situ, the molecules are stabilized by reverse transcription using Bst3.0 polymerase, which can traverse the DNA-RNA junction. The genomic DNA is then digested using the restriction enzyme DpnII, which has a median spacing of ~200 bp between sites in the fly genome. The digested genomic DNA is then ligated to the bridge adaptor using T4 DNA ligase. Second strand synthesis is completed using limiting RNase H and DNA Polymerase I. Crosslinks are then reversed followed by DNA precipitation, sonication and purification using streptavidin beads. Libraries are constructed by repairing and dA-tailing the DNA fragments, ligating TruSeq adaptors to the ends and PCR amplifying (Figure 1A).
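The DpnII site spacing mentioned above can be estimated with a simple in-silico digest. The following is a minimal sketch, not the authors' pipeline: it relies only on the fact that DpnII recognizes GATC, and the function names and toy sequence are hypothetical.

```python
from statistics import median

DPNII_SITE = "GATC"  # DpnII recognition sequence

def dpnii_sites(seq):
    """Return the 0-based start position of every GATC in seq."""
    seq = seq.upper()
    sites = []
    pos = seq.find(DPNII_SITE)
    while pos != -1:
        sites.append(pos)
        pos = seq.find(DPNII_SITE, pos + 1)
    return sites

def median_site_spacing(seq):
    """Median distance between consecutive sites, or None if fewer than two sites."""
    sites = dpnii_sites(seq)
    if len(sites) < 2:
        return None
    return median(b - a for a, b in zip(sites, sites[1:]))

# Toy sequence (illustrative only), built piecewise so the site positions are obvious.
toy = "AA" + "GATC" + "TTTT" + "GATC" + "CC" + "GATC" + "A" * 10 + "GATC"
print(dpnii_sites(toy), median_site_spacing(toy))
```

Applied per chromosome to a real genome FASTA, the same spacing list would also give the number of DpnII sites per bin, which is the quantity used for normalization later in the analysis.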

Upon conversion of RNA-DNA contacts to a covalent chimera, the chimeric molecules were sequenced using 152 bp single-end reads. Sequencing across the bridge junction ensures identification of the RNA and DNA portions of the chimeric molecule by reading the polarity of the bridge (Figure 1B). The RNA/cDNA (Figure 1B, red) and the genomic DNA side (Figure 1B, black) of each read were computationally split and aligned to the transcriptome and genome. After post-processing for unique alignments, repeat removal, and removal of blacklisted regions, each RNA molecule was mapped to the genomic location to which it was ligated (see Materials and methods and Figure 1—figure supplement 3), resulting in 22.2 million high-confidence unique mapping events for ~16,800 RNA transcripts. All individual RNA-to-DNA contacts for a given transcript were then combined to produce a genome-wide association map for each individual transcript (Figure 1C). To ensure that ChAR-seq signal was not due to spurious bridge-to-DNA ligation, we performed a control experiment in which we added RNase A and RNase H to lysed cells before the RNA-to-bridge ligation. This RNase treatment reduced the number of unique bridge molecules identified six-fold, demonstrating that the vast majority of bridge ligation events are indeed RNA-dependent (Figure 1—figure supplement 4). The ChAR-seq protocol is highly reproducible, with the number of normalized RNA-to-DNA contacts observed for each RNA showing high concordance between replicates (Figure 1—figure supplement 5). To estimate the specificity of our observed RNA-to-DNA ligation events and to ensure that the contacts that we observed were not due to diffusion of RNA after fragmentation, we performed a spike-in experiment wherein we added increasing amounts of exogenous RNA to the cells after lysis but before bridge ligation.
After lysis and RNA fragmentation, we recovered and quantified the total soluble RNA from the supernatant, then spiked-in purified in vitro transcribed RNA fragments (~200 nt) from commonly used protein expression vectors (MBP, HALO and GFP). The spike-in controls were added at 0.1%, 1% and 10% of the total, recoverable RNA by mass. Though we see a clear concentration-dependent increase in the number of false positives, even in the scenario where we spiked in RNA at 10% of the total soluble RNA we observed fewer than 0.5% false positive RNA-to-DNA contacts (Figure 1—figure supplement 6), which compares favorably to the false positive rates in related RNA-DNA mapping methods (Li et al., 2017; Sridhar et al., 2017).
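The computational splitting of each read around the bridge, described earlier in this paragraph, can be sketched as follows. This is an illustration under stated assumptions rather than the published code: the `BRIDGE` string is a hypothetical placeholder (the real bridge sequence is given in Figure 1—figure supplement 1), matching is exact, and the `min_len` cutoff is an arbitrary choice. The orientation rule follows the text: the 5'-adenylated end of the bridge is joined to the RNA 3' end, so the RNA-derived sequence lies upstream of the forward-orientation bridge.

```python
BRIDGE = "ACCGGCGTCCAAG"  # hypothetical placeholder, NOT the real bridge sequence

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def split_chimeric_read(read, bridge=BRIDGE, min_len=20):
    """Split a read into (rna_side, dna_side) around the bridge.

    Returns None if the bridge is not found in either orientation,
    or if either flanking sequence is shorter than min_len.
    """
    read = read.upper()
    idx = read.find(bridge)
    if idx == -1:
        # The molecule may have been sequenced from the opposite strand.
        read = revcomp(read)
        idx = read.find(bridge)
        if idx == -1:
            return None
    rna_side = read[:idx]                  # upstream of the bridge
    dna_side = read[idx + len(bridge):]    # downstream of the bridge
    if len(rna_side) < min_len or len(dna_side) < min_len:
        return None
    return rna_side, dna_side
```

A production implementation would tolerate sequencing errors in the bridge and trim adaptors first; the two sides would then be aligned separately to the transcriptome and the genome.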

Only the 3'-hydroxyl of each RNA is available for ligation to the bridge, thus the polarity of each RNA molecule with respect to its transcriptional direction can be determined by its orientation with respect to the bridge. The majority (85% of total) of the RNAs captured in our assay were sense, with the largest single subtype represented by sense-stranded mRNA (32% of total), owing to the capture of nascent transcripts (Figure 1—figure supplement 7). Most of the chromatin-associated antisense transcripts that we identified arose from ncRNA or intronic regions. In fact, 96% of the antisense mRNAs were intronic in origin with 64% of these originating from a single 119 kb gene (CG42339), suggesting the presence of unannotated ncRNAs in this region. The remaining chromatin-associated RNA detected in our assay arose from non-protein coding transcripts (see Figure 1—figure supplement 7), of which 18% were small nucleolar RNA (snoRNA) and 19% were small nuclear RNA (snRNA).

ChAR-seq generated RNA-to-DNA contacts can be aggregated (Figure 1C, see e.g., Total RNA), grouped by RNA class (Figure 1C, see e.g., mRNA) or viewed individually (Figure 1C). Individual RNAs mapped by ChAR-seq generally fell into one of three classes. In the first class, RNAs were found around the locus from which they are transcribed (Figure 1C, see, e.g., Hsromega, chinmo, ten-m). In the second class, RNAs were found bound to chromatin in trans, generally distributed across most or all of the genome, often in addition to a peak around the gene body from which the RNA is transcribed (Figure 1C, see e.g., snRNA:U2, snRNA:7SK). In a third class, RNAs that are part of the dosage compensation complex (Figure 1C, see roX1 and roX2) were enriched on and coat the X chromosome. To investigate this first class of RNAs, we compared aggregated RNA-to-DNA contacts with data from nascent transcription sequencing using PRO-seq (Kwak et al., 2013), and observed qualitative agreement between PRO-seq and ChAR-seq data sets (Figure 1D, see PRO-seq and mRNA). Nevertheless, many RNA-to-DNA contacts in our dataset are associated in trans to genomic regions outside of the gene body from which the RNA is transcribed (Figure 1—figure supplement 8).

To determine if the ChAR-seq protocol disrupts genome organization, we omitted the bridge to produce a mock-treated sample. We next biotinylated the DpnII-digested ends in our mock-sample and ligated them, essentially preparing a Hi-C library (‘Hi-C/Mock-ChAR’) (see Materials and methods for details) from the mock-treated sample (Figure 1E, bottom). Topologically associated domains (TADs) are preserved in the mock-treated sample and the DNA-DNA contacts show high correlation with previously published CME-W1-cl8+ cell Hi-C (Ramírez et al., 2015) (Figure 1—figure supplements 9–10). Thus, the three-dimensional genome organization is largely preserved in our protocol. Furthermore, the profile of RNA-to-DNA contacts detected by ChAR-seq was distinctly different from that of DNA-to-DNA contacts in our mock-treated Hi-C library (Figure 1E and Figure 1—figure supplement 9), indicating that ChAR-seq signal is not simply a byproduct of DNA-DNA contacts.

ChAR-seq data can also be visualized in a two-dimensional contact plot, where the genomic locus from which the RNA is transcribed is represented on the y-axis in linear genome coordinates, and the x-axis defines the genomic location where each RNA was bound. These plots provide a useful overview visualization of the entire dataset. When we generated these contact plots for ncRNA (Figure 2A), mRNA (Figure 2B) and snRNA (Figure 2C), we observed strong horizontal lines that represent RNA transcripts that are transcribed from a single locus but are found distributed throughout the genome (class II), or in the special case of roX1 and roX2, specifically along the X chromosome (class III). Furthermore, RNAs found at sites from which they are transcribed clustered tightly along the diagonal, a feature most pronounced for mRNAs (class I) (Figure 2B). Many of the RNAs we found distributed broadly across the genome are transcription associated small nuclear RNAs (snRNAs) (Figure 2C). One of these, snRNA:7SK, is an abundant snRNA that functions as a scaffold for a transcriptional regulatory ribonucleoprotein complex that includes p-TEFb, Hexim and LARP7. Other broadly distributed snRNAs are components of the spliceosome (e.g., snRNA:U2) which largely functions co-transcriptionally (Perales and Bentley, 2009).
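The two-dimensional contact plot described above amounts to binning each contact by the position of the RNA's gene (y) and the position of the ligated DNA (x). A minimal sketch, assuming contacts have already been converted to linear genome coordinates; the function names and the bin size are hypothetical:

```python
from collections import Counter

def contact_matrix(contacts, bin_size):
    """Count contacts per (origin_bin, target_bin).

    contacts: iterable of (rna_origin_pos, dna_contact_pos) pairs in
              linear genome coordinates.
    Returns a sparse matrix as a Counter keyed by (y_bin, x_bin).
    """
    counts = Counter()
    for origin, target in contacts:
        counts[(origin // bin_size, target // bin_size)] += 1
    return counts

def diagonal_fraction(counts):
    """Fraction of contacts whose DNA target falls in the RNA's own bin."""
    total = sum(counts.values())
    on_diagonal = sum(n for (y, x), n in counts.items() if y == x)
    return on_diagonal / total if total else 0.0
```

In this representation a class I RNA contributes mostly to diagonal cells, whereas a broadly distributed RNA fills one row across many x bins, which appears as a horizontal line.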

Figure 2 (with 2 supplements)

ChAR-seq is an ‘all to all’ RNA-to-DNA proximity ligation method.

(A) Genome-wide plot of RNA to DNA contacts for non-coding RNAs. The y-axis represents the region of the genome from which a given RNA was transcribed and the x-axis represents the region of the genome where each RNA was found to be associated through proximity ligation (i.e., the binding site for each RNA). Genome-wide contact plots generated in the same way for (B) mRNA, and (C) snRNA. (D) Cumulative frequency of length-normalized contacts for 16,812 RNAs identified on the ‘RNA-side’ of chimeric reads. The majority (88%) of RNAs have fewer than 10 contacts per kilobase per million reads (CPKM) in our dataset and were not further analyzed owing to low coverage. The remaining 1952 RNAs account for 18.5 million (83%) of the total RNA-to-DNA contacts. (E) Scatter plot of length normalized chromatin-contacts versus total expression for each RNA. The 138 RNAs that had more than 100 CPKM and were enriched more than ten-fold are highlighted in red.

https://doi.org/10.7554/eLife.27024.013

To identify RNAs that are highly enriched for chromatin interactions, we plotted the normalized cumulative distribution of the number of sense contacts observed for each gene (Figure 2D). The majority of the RNAs in our dataset (14,860 out of 16,812, 88%) had fewer than 10 contacts per kilobase per million reads (CPKM) (Figure 2D) and were excluded from further analysis. The remaining 1952 RNAs (12%) accounted for 83% (18.5 million) of all chromatin contacts in our data set. To estimate the contribution of total RNA abundance to this interaction signal, we performed RNA-seq for the CME-W1-cl8+ cell line and compared RNA expression levels with RNA-to-DNA contacts identified by ChAR-seq (Figure 2E, Supplementary file 1). We observed a correlation between RNA expression level and chromatin-RNA contacts; however, a cluster of RNAs clearly generated more chromatin interactions than would be expected from the overall expression levels (Figure 2E). Using both the length- and read-normalized contacts (CPKM) and the fold-enrichment over RNA expression as measured by RNA-seq, we identified 138 RNAs that had more than 100 CPKM and were enriched more than ten-fold, though many were enriched by 2–5 orders of magnitude (Figure 2E, red symbols; Figure 2—figure supplement 1). Notably, we observe good concordance between the RNAs identified using this methodology between replicates (Figure 2—figure supplement 2).
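The CPKM normalization and the two-threshold filter can be written out explicitly. The sketch below is illustrative: the CPKM formula follows directly from its definition (contacts per kilobase per million reads), but computing fold enrichment as CPKM divided by an RNA-seq expression value in comparable length- and depth-normalized units is an assumption, as are the function names and the decision to skip RNAs with zero expression.

```python
def cpkm(contacts, length_bp, total_contacts):
    """Contacts per kilobase of transcript per million total contacts."""
    return contacts / (length_bp / 1e3) / (total_contacts / 1e6)

def chromatin_enriched(rnas, total_contacts, min_cpkm=100.0, min_fold=10.0):
    """Select RNAs above both thresholds.

    rnas: dict mapping name -> (contacts, length_bp, expression), where
          expression is an RNA-seq value in length- and depth-normalized
          units (assumption: directly comparable to CPKM).
    Returns dict mapping name -> (cpkm_value, fold_enrichment).
    """
    hits = {}
    for name, (contacts, length_bp, expression) in rnas.items():
        if expression <= 0:
            continue  # fold enrichment undefined; skipping is an assumption
        value = cpkm(contacts, length_bp, total_contacts)
        fold = value / expression
        if value > min_cpkm and fold > min_fold:
            hits[name] = (value, fold)
    return hits
```

The default thresholds mirror the criteria stated in the text (more than 100 CPKM and more than ten-fold enrichment).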

We developed ChAR-seq using the male CME-W1-cl8+ line, reasoning that the ncRNAs roX1 and roX2 would serve as an internal positive control. Both roX1 and roX2 are part of the MSL2 complex, which binds across the X-chromosome in male flies to recruit chromatin-modifiers that increase transcriptional output (Figure 3A) (Lucchesi and Kuroda, 2015). Indeed, ChAR-seq data showed roX1 and roX2 to be 7.6-fold (p-value<10⁻¹⁰) and 8.1-fold (p-value<10⁻¹⁰) enriched for interactions on the X chromosome, respectively (Figure 3B,C, Figure 3—figure supplement 1). In contrast, female flies express Sex lethal (Sxl), which binds to msl2 mRNA to prevent its translation, blocking assembly of the MSL2 complex (Lucchesi and Kuroda, 2015). Importantly, roX1 and roX2 require MSL2 for X-chromosome specific localization (Lucchesi and Kuroda, 2015), therefore female cells should lack detectable spreading of these ncRNAs along the X-chromosome. When we performed ChAR-seq in a female Drosophila melanogaster cell line, Kc167, we did not detect any significant roX2 localization on the X chromosome (Figure 3D) but observed excellent agreement in interaction signal from other RNAs across both cell lines (Figure 3—figure supplement 2, Figure 3C male, CME-W1-cl8+ and Figure 3D, female, Kc167, see e.g., snRNA:7SK and Hsromega).
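One way to read a chromosome-level fold enrichment such as the one reported for roX1 and roX2 is as the observed share of an RNA's contacts on chrX divided by the share expected under a null model. The sketch below assumes a null in which contacts are distributed in proportion to chromosome size (the number of DpnII sites per chromosome could be substituted); this null, the function name, and the inputs are illustrative assumptions, and the statistical test behind the reported p-values is not reproduced here.

```python
def chromosome_fold_enrichment(contacts_by_chrom, size_by_chrom, chrom="chrX"):
    """Observed / expected fraction of contacts on one chromosome.

    contacts_by_chrom: dict chrom -> number of contacts for one RNA.
    size_by_chrom:     dict chrom -> chromosome size (bp or DpnII site count);
                       the proportional null is an assumption.
    """
    total_contacts = sum(contacts_by_chrom.values())
    total_size = sum(size_by_chrom.values())
    if total_contacts == 0:
        raise ValueError("no contacts for this RNA")
    observed = contacts_by_chrom.get(chrom, 0) / total_contacts
    expected = size_by_chrom[chrom] / total_size
    return observed / expected
```

A value near 1 indicates no chromosome preference, which is what would be expected for a broadly distributed RNA such as snRNA:7SK.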

Figure 3 (with 2 supplements)

Mapping *roX1* and *roX2* of the X chromosome dosage compensation complex.

(A) Illustration of the *roX1*/*roX2* spreading across the solitary X chromosome in male flies (CME-W1-cl8+ cell line). In contrast, the female-derived Kc167 cell line expresses significantly lower levels of the MSL2 complex, which mediates the association of *roX1* and *roX2*, which therefore do not coat either of the two X-chromosomes in females. (B) Circos plot showing *roX2* spreading from its site of transcription (red arrow) and binding with high density along the X-chromosome but low density binding throughout the genome. (C) Coverage plots of *roX1* (blue), *roX2* (purple), *snRNA:7SK* (cyan) and *Hsromega* (green) in male CME-W1-cl8+ cells. Tracks are DpnII normalized reads. ChAR-seq data were subsampled to match the read depth of the Kc167 sample. (D) Complementary coverage plots generated from female Kc167 cells. (E) Comparison of ChAR-seq (*this work*) to an alternative RNA-to-chromatin mapping method called ChIRP-seq (data from reference [Quinn et al., 2014]). Tracks for *roX1* (upper, blue) and *roX2* (lower, purple) were generated from 32,308 and 87,453 contacts, respectively, from a ChAR-seq dataset containing a total of 22.2 million contacts. For comparison, the *roX1* and *roX2* tracks derived from ChIRP-seq each represent greater than 20 million reads. To compare tracks at different read depths, the contact number was autoscaled, with the maximum peak height given a value of 1. (F) Comparison of the signal-to-noise ratio (see Materials and methods) between ChAR-seq and ChIRP-seq for *roX* genes. ‘Raw reads’ is the number of *roX* reads present in each data set analyzed. (G) Correlation coefficients were calculated for *roX1* and *roX2* coverage tracks generated using ChIRP-seq and ChAR-seq and plotted relative to increasing bin size to estimate the resolution of the ChAR-seq assay.

https://doi.org/10.7554/eLife.27024.016

High-resolution maps of roX1 and roX2 localization have previously been generated using ChIRP-seq, which hybridizes probes against a known RNA and pulls down the associated chromatin for sequencing (Chu et al., 2011; Quinn et al., 2014). Comparing ChIRP-seq to ChAR-seq for both roX1 and roX2 (Figure 3E), we found that DNA contact locations were in surprisingly good agreement despite the fact that ChAR-seq reads are spread across all RNAs while ChIRP-seq reads map the specific RNA target, resulting in a large disparity in the effective sequencing depth between the methods. In ChIRP-seq, virtually all of the signal is attributable to interactions between chromatin and the target RNA. In contrast, ChAR-seq captures all RNA and DNA contacts, so that any given target RNA will comprise only a subset of the total RNA-chromatin contacts in the dataset. In the case of roX1 and roX2, we observed 32,308 and 87,453 contacts, representing 0.1% and 0.36% of the ChAR-seq dataset. In contrast, the ChIRP-seq datasets plotted in Figure 3E represent ~24M and ~21M reads for roX1 and roX2, respectively. This indicates that ChAR-seq can identify RNA peaks along chromatin with high sensitivity for a given RNA.

To more quantitatively compare ChAR-seq to ChIRP-seq, we compared the signal-to-noise ratio (SNR) for roX1 and roX2 binding to DNA in each assay (Figure 3F). We defined signal as the most densely bound regions on chrX, whereas we defined noise as binding events distributed randomly on autosomes (see Materials and methods for details). We found that ChAR-seq roX1 SNR was much higher than that of ChIRP-seq (49.8 vs 3.9), and ChAR-seq roX2 SNR was about two-fold higher than that of ChIRP-seq (45.2 vs 22.0), despite having 2 orders of magnitude fewer reads in ChAR-seq. Altogether, these data suggest that ChAR-seq has excellent sensitivity and sufficient signal-to-noise to characterize accurate chromatin-binding events for individual RNAs.

The resolution with which we can measure the localization of an RNA to a given genomic site constrains our ability to assess its potential modes of action. To measure the accuracy of ChAR-seq measurements of RNA interaction with DNA, we compared ChAR-seq data to ChIRP-seq data to estimate the base-pair resolution of ChAR-seq. We expected this resolution to be bounded, in part, by the local DpnII cut frequency and the number of contacts for any given RNA. We divided the X chromosome into evenly sized bins and calculated correlation coefficients between ChIRP-seq and ChAR-seq datasets at increasing bin sizes for both roX1 and roX2 (Figure 3G). Using this method, we noted a bi-phasic increase of the correlation coefficient, corresponding to a minor plateau around 200 bp and a major plateau at ~25 kbp. The minor plateau is likely due to the DpnII distribution bias in the ChAR-seq tracks, while the major plateau is an estimate of the resolution of our assay, which is on the order of other proximity-ligation sequencing assays like Hi-C (van Berkum et al., 2010).
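The binning-and-correlation procedure used to estimate resolution can be sketched as follows. This is a minimal illustration, assuming each dataset is reduced to a list of contact or read positions on one chromosome and that the Pearson coefficient is the correlation measure; the function names are hypothetical.

```python
from math import sqrt

def bin_counts(positions, chrom_length, bin_size):
    """Histogram of 0-based positions in evenly sized bins."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)

def correlation_by_bin_size(pos_a, pos_b, chrom_length, bin_sizes):
    """Correlation between two position sets at each bin size."""
    return {
        size: pearson(bin_counts(pos_a, chrom_length, size),
                      bin_counts(pos_b, chrom_length, size))
        for size in bin_sizes
    }
```

Plotting the returned values against bin size gives a curve of the kind shown in Figure 3G; the bin size at which the curve levels off serves as the resolution estimate.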

To test if we could identify the functional roles for our most highly enriched RNAs, we clustered the snRNA class of RNAs based on their genomic contacts. These snRNAs collectively comprised 23% of all the RNA-to-DNA contacts in our dataset (Figure 4A) and are a substantial component of the spliceosome, a multi-megadalton ribonucleoprotein complex that catalyzes pre-mRNA splicing (Zhou et al., 2002; Will and Lührmann, 2011). The composition and conformation of the spliceosome is highly dynamic, though two dominant species exist in eukaryotes: the major spliceosome comprised of U1, U2, U4, U5 and U6 snRNAs, and the minor spliceosome comprised of U4:atac, U6:atac, U5, U11, and U12 (Will and Lührmann, 2011). Many members of this class of snRNAs have highly similar gene duplication variants in the Drosophila genome. We therefore first calculated the base sequence similarity of these variants to one another and aggregated signals that were tightly clustered (Figure 4—figure supplement 1). When we then correlated genome-wide binding signal within this class, we found that the distribution patterns of the major spliceosome snRNAs U1, U2, U4, U5, U6 clustered together along with snRNA:7SK (Figure 4B), which is part of the p-TEFb complex that relieves pausing of RNA Polymerase II at promoters (Kwak and Lis, 2013) and may participate in the release of paused polymerase during RNA splicing (Barboric et al., 2009). The components of the minor spliceosome did not cluster together, likely due to their low abundance (Will and Lührmann, 2011) and consequently low representation in our dataset.

Figure 4 (with 2 supplements)

Correlation of chromatin-associated RNAs with genome features.

(A) Relative abundance of snRNAs identified by ChAR-seq. The size of the circles is proportional to the abundance of the snRNAs found by ChAR-seq. RNA components of the major and minor spliceosome are bounded by the gray boxes. (B) Cluster analysis of the pairwise correlation between genome-wide tracks of snRNAs. (C) Meta-analysis plots aggregating the signal of *snRNA:7SK*, *snRNA:U2*, *snRNA:U5*, *roX1*, *roX2* and ATAC-seq over gene bodies (red), putative enhancers (blue dashed line) and random regions (black). (D) Hierarchical clustering based on pairwise Pearson correlation between representative ChAR-seq RNA-to-DNA contact coverage tracks (black) and modENCODE datasets available for the CME-W1-cl8+ cell line. Notable associations for the dosage compensation complex (green) and heterochromatin (‘het’) are indicated in the right margin.

https://doi.org/10.7554/eLife.27024.019

We next reasoned that spliceosome RNAs, as part of the co-transcriptional RNA processing machinery, should also be enriched in regions of active transcription. We therefore aggregated spliceosomal RNA signals over gene bodies (Figure 4C, red lines), putative enhancers (Kvon et al., 2014) (Figure 4C, blue dashed lines) and a random distribution of genomic bins of similar size (Figure 4C, black lines). We observed an enrichment of snRNAs (7SK, U2 and U6), but not roX1 or roX2, over gene bodies (Figure 4C) with a broad peak around transcription start sites, in good agreement with ChIRP data for 7SK in mice (Flynn et al., 2016). Because active transcription is also correlated with topological boundaries in flies (Hou et al., 2012; Ulianov et al., 2016), we examined the relationship between RNA contacts and genome organization. To test whether RNA-DNA contacts are enriched at topological boundaries, we measured the DNA contact frequency for snRNA:7SK and snRNA:U2 in 20 kb windows spanning TAD boundaries and TAD centers. Both 7SK and U2 RNAs were modestly but significantly (Wilcoxon rank-sum test) enriched at TAD boundaries (Figure 4—figure supplement 2A (top)). We also aggregated 7SK and U2 contact signals across all TAD boundaries (Figure 4—figure supplement 2A (bottom), red lines) and a random distribution of identically sized genomic bins (Figure 4—figure supplement 2A (bottom), black lines), and found that these RNAs were ~1.4-fold enriched over boundaries. We also noted that roX2 was ~2.2-fold enriched over TAD boundaries on the X chromosome (Figure 4—figure supplement 2B), consistent with previous data showing that MSL2 dosage compensation complex High Affinity Sites (HAS) are preferentially found at TAD boundaries (Ramírez et al., 2015).
Examination of chromatin accessibility (ATAC-seq) over TAD boundaries (Figure 4—figure supplement 2B) showed that open chromatin is enriched at TAD boundaries, supporting the idea that TAD boundaries are transcriptionally active in flies.

In contrast to the small number of well-defined and well-characterized snRNAs involved in splicing, there are more than 200 snoRNAs in flies (Huang et al., 2005) that are significantly divergent in sequence and, surprisingly, were highly represented in our dataset (Figure 1—figure supplement 7 and Figure 2—figure supplements 1–2). Most of these snoRNAs either have unknown function or are computationally identified and indirectly implicated in the maturation and modification of ribosomal RNA (rRNA) (Huang et al., 2005).

To determine if our enriched chromatin-associated RNAs, in particular snRNAs and snoRNAs, might localize to euchromatic or heterochromatic states or with specific transcription factors, we cross-correlated our ChAR-seq signal against modENCODE datasets available for the CME-W1-cl8+ cell line. To normalize the signals for comparison, we first calculated the expected contacts per 2 kb bin for each RNA under a uniform distribution, based on the total number of genome-wide contacts for each RNA and the number of DpnII sites per bin. This null model was then used to calculate the log2 ratio of the observed to the expected contacts per bin for each RNA, which was then transformed into a z-score ((x-μ)/σ) based on the whole-genome mean (μ) and standard deviation (σ). Similarly, we re-binned the modENCODE tracks, removed bins that did not contain a DpnII site, and transformed the log2 mean-shift values to a z-score. We then calculated the pair-wise Pearson correlation coefficients between each signal track, and then clustered the data (Figure 4D). We observed discrete clustering of roX1 and roX2 with known dosage compensation complex factors, MOF, the histone modifications H4K16ac and H3K36me3 (Bell et al., 2008), and JIL-1 kinase (Bai et al., 2004), validating this analytical approach (Figure 4D). Beyond this sub-cluster of dosage compensation factors, the remainder of the chromatin-associated RNAs fell into two distinct and anti-correlated categories: those associated with active chromatin and transcription (e.g., RNAPII, H4K8ac, H3K18ac) or heterochromatin (e.g., HP2, H3K9me3, HP1a) (Figure 4D). In particular, we note that snRNA:U2 and snRNA:7SK cluster tightly with the transcription-associated chromatin marks, while many of the snoRNAs and minor spliceosome snRNAs that we identified are associated with heterochromatin, likely due to co-localization of heterochromatin factors to the nucleolus. 
Interestingly, snRNA:U5, a component of both the major and minor spliceosome, has variants that clearly cluster with either transcriptionally active chromatin (63BC) or heterochromatin (23D, 38ABa, and 34A). Previous work has shown that the snRNA:U5:38ABa variant (Figure 4D, heterochromatin cluster) exhibits a unique tissue-specific expression profile with the greatest abundance in neural tissue, which led the authors to propose isoform-dependent functions in alternative splicing (Chen et al., 2005). The differential clustering that we observe for snRNA:U5, and between major and minor spliceosome snRNAs, for euchromatin and heterochromatin might reflect such isoform-specific functions of the spliceosome in different chromatin states.
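The normalization used for the cross-correlation analysis (expected contacts per bin under a uniform model weighted by DpnII sites, log2 observed/expected, then a z-score) can be sketched as below. Several details are assumptions made for illustration: the pseudocount that keeps the log defined for empty bins, the use of the population standard deviation, dropping bins without a DpnII site from the ChAR-seq track as well, and the function name.

```python
from math import log2
from statistics import mean, pstdev

def zscore_log2_enrichment(observed, dpnii_sites, pseudocount=1.0):
    """Per-bin z-score of log2(observed / expected) for one RNA.

    observed:    contacts per bin for this RNA.
    dpnii_sites: number of DpnII sites per bin (same binning).
    Bins without a DpnII site are dropped. The pseudocount is an
    assumption, added so that bins with zero contacts stay finite.
    Returns dict bin_index -> z-score.
    """
    total_contacts = sum(observed)
    total_sites = sum(dpnii_sites)
    ratios = {}
    for i, (obs, sites) in enumerate(zip(observed, dpnii_sites)):
        if sites == 0:
            continue
        # Uniform null: contacts spread in proportion to DpnII sites per bin.
        expected = total_contacts * sites / total_sites
        ratios[i] = log2((obs + pseudocount) / (expected + pseudocount))
    values = list(ratios.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("signal is uniform; z-score undefined")
    return {i: (r - mu) / sigma for i, r in ratios.items()}
```

Tracks transformed this way share a common scale, so pairwise Pearson correlations between ChAR-seq and modENCODE tracks can be computed over the bins they have in common and then clustered.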

Discussion

ChAR-seq maps the chromosomal binding sites of all chromatin-associated RNAs, independent of whether they are associated as nascent transcripts or bound as part of ribonucleoprotein complexes (RNPs). In this way, ChAR-seq can be thought of as a massively parallelized de novo RNA mapping assay capable of generating hundreds to thousands of RNA-binding maps. ChAR-seq also detects multiple classes of chromatin-associated RNAs. ChAR-seq preserves the three-dimensional structure of the genome and provides an RNA-DNA interaction matrix that complements DNA-DNA proximity measurements using Hi-C. We validated ChAR-seq using chromosome-specific ncRNAs roX1 and roX2 associated with dosage compensation. The comparison between ChAR-seq and ChIRP-seq, which vary dramatically in the sequencing depth needed to analyze a specific RNA, highlights the utility of ChAR-seq as a de novo chromatin-associated RNA discovery tool. ChAR-seq also maps nascent RNAs found at the loci from which they are transcribed.

ChAR-seq is similar to a recently published method (Sridhar et al., 2017), but has two key distinctions. First, proximity ligations are performed in situ in intact nuclei, which reduces nonspecific interactions (Rao et al., 2014). Second, ChAR-seq uses long single-end reads to sequence across the entire junction of the ‘bridge’, ensuring that RNA-to-DNA contacts are mapped with high confidence and reporting on the polarity of the bridge-ligated RNA.

While this work was in revision, a genome-wide RNA-DNA contact method—GRID-seq—was published that also uses proximity ligation of a directional bridge to RNA and DNA (Li et al., 2017). One key difference is that GRID-seq uses a restriction enzyme to cut cDNA and gDNA fragments 19–23 bp distal to the bridge, which allows size selection of molecules that contain both RNA and DNA. This likely reduces the number of uninformative molecules sequenced. A disadvantage, however, is that the resulting reads are 19–23 bp, which have a greater chance of falsely mapping during genome alignment than the 20–100 bp RNA and DNA fragments obtained with ChAR-seq. Although their details differ, ChAR-seq and GRID-seq appear to have similar ability to detect roX2 binding to chrX in Drosophila (see Figure 3—figure supplement 1B).

We used ChAR-seq to discover and map several dozen ncRNAs that are pervasively bound across the genome. Many of these ncRNAs are components of ribonucleoprotein complexes associated with transcription elongation (snRNA:7SK), splicing (snRNA:U2, etc.) and RNA processing (snRNAs, snoRNAs and scaRNAs). Interestingly, more than half of the chromatin-associated RNAs identified based on our enrichment criteria are snoRNAs, most of which, though not all, correlate with heterochromatin. Generally, snoRNA ribonucleoproteins (snoRNPs) use intermolecular base pairing to direct chemical modification of the 2'-hydroxyl groups or the isomerization of uridines to pseudouridine (Cech and Steitz, 2014), and snoRNAs are known to be abundant components of chromatin in both flies (Schubert et al., 2012) and mice (Meng et al., 2016). Despite their abundance and their known role in RNA modification, we do not yet understand the functions of these modifications, or the implications of snoRNA and scaRNA chromatin association in cells (Cech and Steitz, 2014). We also demonstrate that ChAR-seq can be used with orthogonal genome-wide datasets to identify and classify RNAs that are associated with specific chromatin states (e.g., euchromatin vs heterochromatin). We expect this approach will be particularly useful in organisms that use lncRNAs such as HOTAIR, HOTTIP and BRAVEHEART as scaffolds for ribonucleoproteins that regulate facultative heterochromatin. Finally, we observed enrichment of transcription-associated RNAs with TAD boundaries, reflecting a potential role for active transcription in the topological organization of the genome. This finding is consistent with previous observations that active transcription is a stronger predictor for TAD partitioning in flies (Hou et al., 2012; Ulianov et al., 2016) than CTCF and cohesin, the prototypical TAD boundary markers in mice and humans (Merkenschlager and Nora, 2016).

We anticipate that ChAR-seq will be a powerful new high-throughput discovery platform capable of simultaneously identifying new chromatin-associated RNAs and mapping their chromatin binding sites (and associated epigenetic chromatin states). These capabilities will be particularly useful in comparing ‘epigenomic’ changes that coincide with cellular differentiation or tumorigenesis.

ChAR-seq uses proximity ligation of chromatin-associated RNA and deep sequencing to map RNA-DNA contacts in situ.

ChAR-seq is an ‘all to all’ RNA-to-DNA proximity ligation method.

Mapping roX1 and roX2 of the X chromosome dosage compensation complex.

Correlation of chromatin-associated RNAs with genome features.

Author details

Jason C Bell (for correspondence)
David Jukam
Nicole A Teran
Viviana I Risca
Owen K Smith
Whitney L Johnson
Jan M Skotheim
William James Greenleaf
Aaron F Straight (for correspondence)
