Overview of RNase-treated iMARGI data in H1 ES cells. a) Schematic comparison of standard iMARGI (control) and RNase-treated iMARGI protocols, illustrating differences in RNA-chromatin interactions, RNA transcript profiles, and protected chromatin-associated RNA (caRNA) interacting genomic regions. b) Percentage of RNA reads mapping to genic (blue) and intergenic (gray) regions in control and RNase-treated iMARGI samples. Error bars represent standard error across replicates. c) Percentage of RNA reads mapping to repeat (green) and non-repeat (gray) regions in control and RNase-treated iMARGI samples. d) Fold change of RNase-insensitive repeat caRNA species compared to control. Red line indicates no change (fold change = 1). e) Differential gene expression analysis between control and RNase-treated samples. Red and blue dots represent significantly up- and down-regulated genes, respectively. f) Proportions of lncRNA, protein-coding, and other RNA types among control-specific, non-differentially expressed, and RNase-specific genes. g) Diseases associated with RNase-specific lncRNA, ranked by statistical significance (-log2(FDR)).

Sequence conservation of RNase inaccessible RNA ends. a) Sequence conservation (PhaseCons 100 score) of each read’s 1000 bp flanking region. All pairs in the control and RNase library were divided into three groups based on their RNA and DNA end genomic distance. short-distance pairs (<1k bp), cis pairs (1k bp ∼ 2Mb), trans pairs (>2Mb and interchromosomal). b) Average read conservation score in each library and each group. Differences of conservation score in each group were tested using t-test. c) RNA ends coverage of SNORA74A gene in three replicates of RNase libraries and one combined control library. Sequence conservation score phastCons100 were shown on the top track. Higher value indicates higher level of conserveness. Secondary structure of SNORA74A was computed by R2DT. RNA end reads enriched regions were highlighted on the secondary structure. Enriched binding sequence motif was called by MEME. d) Same information as c) for gene RNU5B-1. Green box corresponds to the Sm motif.

RNAs interacting with functional genomic regions genome wide. a) Heatmap showing overlap between RAHs (n=25,333) and various genomic features (transcription factor binding sites, histone modifications, chromatin accessibility). Rows represent individual RAHs, columns represent genomic features. Color intensity indicates degree of overlap. Tracks below show feature type, proportion of RAHs overlapping each feature, and proportion of feature peaks overlapping RAHs. Right-side tracks display a number of features overlapping each RAH, genomic location, and CpG content. b) Proportion of DNA end peaks explained by chromatin accessibility (AC), transcription factor binding (TF), and histone modifications (HIST) in control, RNase, and RNase RAH conditions. c) Distribution of RAHs across different genomic features. d) Comparison of genomic feature composition between RAHs, transcription factor binding sites, and background genomic regions. e) Nuclear Dam-LaminB1 signals in RAHs with (blue) and without known functional annotations (purple). f) Proportion of RAHs overlapping with repeat elements (RE) and their distribution across RE categories. g) Top 5 repeat element species showing highest fold-change enrichment in RAHs compared to genome-wide RepeatMasker proportions.

RNase treatment magnifies stringent interaction of caRNA-TF interactions. a) Heatmaps and summary plots for CTCF, BCL11A, and SOX2 binding sites. Each row represents a ChIP-seq peak (peak center ±5kb) from H1 cells. Color intensity indicates caRNA interaction level. Class 1 transcription factors show enhanced caRNA-TF peak interactions after RNase treatment. b) Boxplots comparing caRNA concentration scores in control vs RNase-treated samples for the transcription factors in (a). Concentration score is the ratio of caRNA attachment in the central 1kb to the surrounding 3kb of each peak. c) Genome browser view of caRNA attachment levels at a CTCF binding site on Chr11. Tracks show control and three RNase-treated replicates. d-f) Similar to (a-c), but for histone modifications H3K27me3, H3K36me3, and H3K79me2. Class 3 histone modifications show diminished caRNA signals after RNase treatment. g) Scatter plot comparing caRNA concentration scores between control and RNase-treated samples for 76 chromatin markers. Dashed line represents y=x. h) Bar plot showing changes in concentration scores ((RNase-Control)/Control) for 69 invariant chromatin markers. Orange and blue indicate enriched and decreased scores, respectively.

Transcription factor binding sites enriched caRNA species. a) Contingency table of association test to identify TFBS enriched caRNA species. b) Venn plot showing the overlap between CTCF enriched caRNAs (in H1 cells) and RedChIP identified CTCF associated caRNAs (in K562 cells). c) Venn plot showing the overlap between CTCF enriched caRNAs (in H1 cells) and fRIP-seq identified CTCF associated caRNAs (in K562 cells). d) Heatmap showing each TF enriched caRNAs with color mapped to odds ratio from association test. Columns are clustered using hierarchical clustering based on Jaccard distance. Rows are genes split by gene type. Bottom track showing the proportion of each RNA species among all enriched caRNAs for each TF. e) Correlation between TF-TF similarities measured by TF genome footprints (number of TFBS in each non-overlapped 100kbp bin on the genome) or enriched RNA species. f) Boxplot showing the number of TFs each RNA species associated with. g) Top enriched GO terms of all mRNAs that are enriched with any TF examined.

Reproducibility of iMARGI library data for RNA and DNA ends. iMARGI pairs with RNA and DNA end distance larger than 1000bp used. a) Reproducibility of RNA end read counts per gene across three RNase-treated iMARGI replicates. Each point represents a gene, with read counts plotted on a log scale. RNA ends were counted in a strand-specific manner. Spearman correlation coefficients and their associated p-values are shown for each pairwise comparison. b) Similar to (a), but showing reproducibility across five control iMARGI replicates. The matrix of scatter plots illustrates pairwise comparisons between all control samples, with correlation coefficients displayed for each pair. c) Reproducibility of DNA end read counts mapped to 100 kbp genomic bins across three RNase-treated iMARGI replicates. Bins were derived by sliding across the whole genome without overlap. As in (a), Spearman correlation coefficients and p-values are provided for pairwise comparisons. d) Similar to (c), but showing reproducibility of DNA end read counts across five control iMARGI replicates, presented in a matrix of pairwise comparisons with correlation coefficients. p < 0.001, indicated by ***.

iMARGI RNA, DNA ends distance distribution. a) Barplot showing the proportion of read pairs that with RNA, DNA end distance belongs to <1kbp, 1kbp∼2Mbp, >2Mbp but within one chromosome and finally interchromosomal read pairs. Error bar of the proportion was derived from library replicates. b) Similar to a), but without <1kbp category. c) Genomic distance versus contact frequency for all RNA, DNA read pairs with two ends distance longer than 1kbp. d) Venn plot showing the overlap between RNase treated library specific genes and DisGeNET or LncRNADisease databases. e) RNase specific genes enriched disease terms.

RNase treatment magnifies stringent interaction of caRNA-TF interactions. a) Heatmap and summary plot for five class I TFs (stronger caRNA enrichment at TFBS after RNase digestion) that have been suggested to have RNA binding properties: CHD7, JUN, SIN3A, EP300, YY1. Rows represent all peaks of a TF. Chip-seq data annotated from Cistrome database H1 cell line. Columns include 1000 bins represent 10kb surrounding regions from the center of a chip-seq peak. Color maps to the number of caRNAs interacting with each bin. Summary plot shows the average coverage across all peaks for a transcription factor. b, d, f) Boxplot of concentration score in Control or RNase library. The score is defined as peak central 1000bp region caRNA attachment level divided by peak central 3000bp caRNA attachment level. c) Similar to a), showing five TFs have enriched caRNA binding after RNase digestion but have not been reported has RNA binding property. e) Similar to a), showing five markers including TF TRIM28 or histone modifications whose DNA sites showing depleted caRNA level after RNase digestion.

a) Q-Q plot showing the p value distributions of all 52 TF markers (color dots, each color corresponds to one transcription factor) and comparing them to the uniform distribution (dashed line). b) Number of enriched genes for each transcription factor. c) Histogram of p-value distribution from association test. d) Overlap between CHD7 TFBS enriched caRNA species and CHD7 fRIP-seq identified CHD7 protein targeting RNAs. e-f) Similar to c-d) but for RBBP5. g) Relationship between number of caRNA associated transcription factors and gene total transcribed read counts in untreated control library.

iMARGI libraries overview