Reduction and depurination of DMS-modified RNA enables detection of N7G-DMS adducts with misincorporations.

(A) Strategy for the misincorporation detection of N7G methylation. N7-methylated G is reduced by potassium borohydride and then heated in acidic conditions to yield an abasic site. The abasic site then proceeds to induce misincorporations in the cDNA following reverse transcription.

(B) Overall schematic for the BASH MaP experimental workflow. RNA is first treated with dimethyl sulfate (DMS) which produces the following adducts: m1A, m3C, m7G, and to a lesser extent m1G and m3U. DMS-modified RNA is then subjected to reduction by potassium borohydride (800 mM) for 4 h at room temperature followed by purification and heating in a pH 2.9 buffer of acetic acid and sodium acetate for 4 h at 45°C. RNA is then purified and subjected to reverse transcription with enzymes and buffer conditions which promote cDNA misincorporations at methylated bases.

(C) Optimization of reduction duration and efficiency of abasic sites to induce misincorporations. To determine the optimal reduction duration, we treated total HeLa RNA with potassium borohydride (800 mM) for various durations of time, performed depurination for 4 h in acid buffer, and then prepared RT-PCR amplicons surrounding the endogenously methylated base m7G1638 in the 18S rRNA. We then quantified the fraction of reads containing a misincorporations at G1638 divided by the total number of reads to yield a misincorporations rate at G1638 following amplicon sequencing. Plots of misincorporations rates revealed that 4 h of borohydride treatment induced an average misincorporations rate of 70% at G1638.

(D) Misincorporation signature of abasic sites under reverse transcription conditions for detection of methylated bases. On the left, quantification of the types of misincorporations at G1638 as described in Fig. 1C for SuperScript II, a reverse transcriptase enzyme commonly used to detect methylated bases with cDNA misincorporations. On the right, fraction of each type of misincorporation calculated collectively from all G residues in Spinach following reverse transcription with SuperScript II. Spinach RNA was either modified with DMS (170 mM) for 8 min at 25°C or treated with an ethanol control. Modified and control RNA was then reduced with potassium borohydride (800 mM) for 4 h or incubated in water for 4 h. All three Spinach samples underwent identical heating in acidic buffer conditions before undergoing reverse transcription. Comparison of types of misincorporations shows that reduction of DMS treated Spinach RNA produces a misincorporation signature at G residues which mirrors the positive control G1638 when reverse transcribed with SuperScript II.

(E) Reduction of DMS treated Spinach RNA produces novel misincorporation data at G bases. To determine if Spinach is highly modified by DMS at N7G we utilized the experimental data as described in Fig. 1E with an additional control group in which DMS was omitted but the sample underwent reduction and depurination. Not shown, all four samples underwent identical heating in acidic buffer prior to reverse transcription. We then plotted the misincorporation rate of each G in Spinach for each experimental condition. This misincorporation rate reveals a dramatic increase in misincorporation rates for G bases in Spinach modified with DMS and reduced with potassium borohydride.

(F) Reproducibility of BASH MaP. Spinach RNA was probed with either 85 mM or 170 mM DMS for 8 min at 25°C and then reduced and depurinated. The misincorporation rate at each position in Spinach was compared between the two samples and a linear regression was performed which showed an R2 of 0.9928 demonstrating high reproducibility.

(G) Effect of reduction and depurination on the detection of m1A, m3C, and m3U. To determine whether reduction and depurination of DMS-treated RNA impaired the detection of other methylated bases, we treated Spinach with DMS (170 mM) for 8 min at 25°C using buffer conditions which promote the methylation of m1G and m3U2, 20. Then, DMS-treated Spinach was either directly reverse transcribed (DMS MaP) or subjected to reduction and depurination (BASH MaP) before reverse transcription. We then compared the misincorporation rate at each A, C, and U position in Spinach and performed a linear regression. The R2 of 0.9222 demonstrates that reduction and depurination do not impair the detection of m1A, m3C, and m3U generated by the modification of RNA by DMS.

(H-J) Receiver operator characteristic curves demonstrate that BASH MaP identifies single-stranded regions of RNA. To determine if BASH MaP could accurately distinguish single-stranded from base-paired A, C, and U bases, we constructed Receiver Operator Characteristic (ROC) curves for A, C, and U bases for both BASH MaP and DMS MaP as described in Fig. 1G. The larger the area under the curve (AUC), the better a method is at discriminating paired vs unpaired RNA bases. An AUC = 1.0 demonstrates perfect discrimination ability. Panels H-J demonstrate that BASH MaP accurately discriminates between single-stranded and base-paired A, C, and U bases.

N7G reactivity reveals guanosines involved in tertiary interactions.

(A) N7G reactivity overlaid on the secondary structure model of Spinach as derived from the crystal structure 4TS230. G-quadruplex G’s are indicated with a red asterisk. N7G reactivity is colored on a continuous gradient from grey (low reactivity) to yellow (moderate reactivity) to red (high reactivity). N7G reactivities represent normalized values derived from misincorporation rates. Spinach BASH MaP data was obtained as described in Fig. 1G and no background subtraction was utilized.

(B) Comparison between structural features in Spinach and observed BASH MaP misincorporation rate at G bases. To determine which structural features impact N7G methylation rate we utilized the crystal structure 4TS2, and classified individual G’s as either base-paired, single-stranded, engaged in an RNA G-quadruplex, or engaging in a base triple. This plot reveals single-stranded G’s have relatively high methylation rates whereas RNA G-quadruplex G’s are strongly protected from N7G methylation.

(C) Comparison between RNA G-quadruplex G’s and all other G’s in Spinach reveals RNA G-quadruplex G’s are the most protected G’s from N7G methylation in Spinach.

(D) Schematic of alternative conformations the 15x polyUG (pUG) repeat RNA can adopt in solution. The 15x pUG repeat can adopt four distinct RNA G-quadruplex conformations of 12 consecutive GU repeats. Each of these four distinct conformations is indicated as Register 1 through 4. Each register is uniquely defined by a set of three G’s not engaged in an RNA G-quadruplex (colored red). These G’s are predicted to be single stranded or base-paired and thus display much higher reactivity than the G’s engaged in the RNA G-quadruplex. To show how the G-quadruplex is rearranged in each conformation, G42 is labeled with a red asterisk.

(E) BASH MaP misincorporations rate plot of pUG G’s. 15x pUG RNA was refolded in potassium buffer (100 mM) and modified with DMS (170 mM) for 6 min at 37°C. G’s engaged in a G-quadruplex for each unique register are indicated at the bottom of the plot. G’s predicted to be engaged in a G-quadruplex in each of the four registers is bolded and bracketed below the plot. The misincorporation rate plot reveals that G’s engaged in a G-quadruplex in all four registers display a strong protection from DMS methylation. Consequently, G20 and G48, which are only predicted to be protected in a single register show the highest misincorporation rates.

(F) Quantification of Fig. 2E shows the relationship between G inclusion in RNA G-quadruplex registers and measured misincorporation rate. The plot reveals increased protection from N7G methylation for G’s as the number of G-quadruplex conformations increases.

(G) BASH MaP of a G-quadruplex in the 3’UTR of the AKT2 mRNA. SH-SY5Y cells were treated with DMS (170 mM) for 6 min at 37°C. Primers were designed to specifically amplify the putative G-quadruplex region in the AKT2 mRNA. Misincorporation rates were converted to normalized reactivity values as described in METHODS. Putative G-quadruplex G’s as identified in rG4-seq39 are highlighted in red. Red bars below the sequence indicate putative G-quadruplex G’s with strong protections from N7G-DMS methylations. G’s previously identified as engaged in a G-quadruplex in cells are indicated below with a black bar. The normalized reactivity plot reveals N7G protections from DMS at previously identified in cell G-quadruplex G’s as well as other G-tracts. Together, these data support the formation of a G-quadruplex in 3’UTR of the AKT2 mRNA in SH-SY5Y cells.

BASH MaP reveals networks of co-occurring modifications in the Spinach G-quadruplex.

(A-B) Heatmap of correlation strength between misincorporation that co-occur on the same sequencing read for Spinach treated with DMS MaP (A) or BASH MaP (B). Spinach was treated with DMS (170 mM) for 8 min at 25°C. Each point represents a G-test significance value as calculated by RING Mapper which was then scaled to a value between zero and one. Values closer to one appear darker in the correlation heatmap and represent higher G-test correlation strength.

(C) Three-dimensional model of Spinach core ligand binding domain with numbering scheme used in Fig. 3A-D and Fig. 3F-G. G-quadruplex and mixed tetrad interactions in Spinach are indicated with a grey plane. The ligand DFHBI-1T interacts with G52 and sits between the upper G-quadruplex tetrad and a U-A-U base triple. Structural domains P2, P3, and the core domain are indicated.

(D) Close up of boxed region in Fig. 3A shows the pattern of co-occurring misincorporations surrounding the bases involved in the G-quadruplex of Spinach (marked in red). This heatmap displays predominantly co-occurring misincorporations between A – A, A – C, and A – G positions.

(E) Close up of boxed region in Fig. 3B shows the pattern of co-occurring misincorporations surrounding the bases involved in the G-quadruplex of Spinach (marked in red). This heatmap is enriched in co-occurring misincorporations between G – G positions and displays correlations between G’s involved in the G-quadruplex of Spinach.

(F-G) Network analysis of G – G correlations in DMS MaP (F) and BASH MaP (G) of Spinach. A network was constructed by representing G positions in Spinach as vertices and creating edges between G positions that display co-occurring misincorporations above a Z-score threshold of 2.0. BASH MaP of Spinach reveals that G’s form two major clusters. The smaller cluster contains six of nine G’s which comprise the Spinach G-quadruplex.

Multiplexed probing of single nucleotide mutants identifies N7G and base stacking interactions.

(A-E) Mutate and Map (M2) heatmaps of DMS MaP (A) and BASH MaP (B) of randomly mutagenized Spinach RNA. A mutate and map heatmap plots the chemical reactivity profile for RNAs with a PCR-derived mutation at a specific position along the length of the RNA. When a position in an RNA is mutated through mutagenic PCR, it is predicted that all interacting nucleotides will display an increase in chemical reactivity. Each row of the heatmap (Mutation Position) represents sequencing reads with a PCR-derived mutation at the indicated position within Spinach. Each column of the heatmap (Mapped Position) represents the misincorporation rate at the indicated position in Spinach. Visualization of changes in reactivity to DMS which are induced by point mutations is enabled by performing Z-score normalizations for each column individually. Only positive Z-scores are plotted to display increases in chemical reactivity due to PCR-derived mutations. M2 BASH MaP displays unique signals in the mutate- and-map heatmap in the G-quadruplex (red box) and P3 (grey box) region of Spinach.

(C) G-quadruplex topology of Spinach. Spinach contains a two-tiered G-quadruplex with a mixed tetra stacked below. G-quadruplex numberings and colors are used to denote G-tetrads. The top G-tetrad is colored blue. The middle G-tetrad is colored black. The bottom mixed tetrad is colored red. N7G interactions are denoted with a yellow dotted line.

(D) Mutate-and-Map has previously been used to identify base-pair interactions in RNA. To determine whether N7G interactions could be identified by mutate-and-map we examined the mutate-and-map heatmap generated by M2 BASH MaP of Spinach shown in Fig. 4B zoomed in on the Spinach G-quadruplex core domain. G-quadruplex G’s are indicated in red. The plot reveals that PCR-derived mutations in G-quadruplex G’s induce increases in N7G reactivity to DMS throughout the entire Spinach core domain. This suggests that PCR-derived mutations within the Spinach G-quadruplex perturb the structure of the entire core domain.

(E) Zoom in of the M2 DMS MaP heatmap for the P3 domain of Spinach. P3 stem and loop nucleotides are indicated below the plot. The base-pairing pattern of the P3 domain is clearly identified as a diagonal line in the mutate-and-map heatmap. Base pairs are highlighted with dotted arrows.

(F) Zoom in of M2 BASH MaP heatmap for the P3 domain of Spinach. G’s which display long vertical lines in the heatmap and give the P3 region a jagged appearance are highlighted in blue. The vertical lines indicate that PCR-derived mutations at multiple adjacent nucleotides all cause increased reactivity to DMS of highlighted G’s. This suggests that N7G reactivity to DMS is sensitive to local disruption of helix stacking.

(G) Crystal structure of the Spinach P3 stem. The P3 stem is denoted in grey. G bases which display signals in the M2 BASH MaP heatmap are highlighted in blue. Hyper-reactive G’s at helix termini such as G81 are colored red.

Tertiary folding constraints enable accurate secondary structure modeling of Spinach.

(A) Spinach secondary structure model derived from the crystal structure 4TS2. This secondary structure model was utilized as a benchmark for comparing structure modeling approaches.

(B) Spinach secondary structure models generated by the indicated combination of experimental data (DMS MaP or BASH MaP) and folding algorithm (mFold, RNAstructure, DAGGER). To determine whether structure probing data could improve the modeling of Spinach secondary structure, we assessed the sensitivity and specificity of a variety of computational approaches. Base pairs which are correctly predicted are indicated by green bars. Base pairs which are incorrectly predicted are indicated with red dashed lines. A base pair was determined to be correct if the true base pairing partner was within one base (+/-) from the indicated pairing partner. For detailed explanation of settings used for each RNA secondary structure modeling approach see METHODS. Incorporation of experimental data improved Spinach secondary structure modeling; however, all structures included false helices and lacked the P2 domain of Spinach.

(C) Tertiary-folding constraints derived from N7G-reactivity data are implemented through modification of the DaVinci data analysis pipeline (DAGGER). To generate tertiary constraints, G’s in the bottom quartile of N7G reactivity are first identified. Then, all pairs of bottom quartile G’s which display significant rates of co-occurring misincorporations with each other are identified as likely to be engaged in a tertiary interaction. These positions are indicated by annotating the base as lowercase in the input FASTA file for a modified DaVinci analysis pipeline. Each sequencing read is first converted to a bitvector where a zero represents no misincorporation and a one represents a misincorporation. The DaVinci pipeline forces sites of misincorporations to be single stranded upon subsequent folding. G’s identified as likely to be engaged in a tertiary interaction are forced to be single stranded by editing the bitvectors and setting the value at each tertiary G to one. Sequencing reads with a misincorporation at any G identified as tertiary are treated separately because a modification at these positions indicates a change in tertiary structure. For these bitvectors, tertiary G’s are allowed to be considered for base pairing by ContraFold, the folding engine used in DaVinci. After folding of each unique sequencing read, RNA secondary structures are converted to forgi vectors which utilize the Forgi library to encode RNA structure in a string of numbers. Principle component analysis (PCA) is then performed on the forgi vectors, and the ensemble of RNA structures is visualized through a plot of the first two principle components. Clustering of related RNA structures is performed in the PCA reduced dimensional space using techniques such as Kmeans clustering. Together, this pipeline enables more accurate structure modeling of G-quadruplex containing RNA.

(D) Tertiary constraints and DAGGER analysis of BASH MaP treated Spinach accurately model Spinach secondary structure. To determine whether tertiary folding constraints could improve Spinach structure modeling, we implemented the technique as described in Fig. 5C and applied it to Spinach BASH MaP data. Base pairs which are correctly predicted are indicated by green bars. Base pairs which are incorrectly predicted are indicated with red dashed lines. A base pair was determined to be correct if the true base pairing partner was within one base (+/-) from the indicated pairing partner. The resulting secondary structure most closely matches the crystallographic secondary structure through formation of the P2 domain and absence of false helices.

RNA structure deconvolution reveals G-quadruplex and P2 misfolding in Spinach.

(A) RNA structure deconvolution identifies two conformations of Spinach. To identify multiple conformations of Spinach, we applied the program DANCE which utilizes a Bernoulli mixture model to identify mutually exclusive patterns of misincorporation in sequencing data. Spinach was subjected to BASH MaP and the sequencing data was input into the DANCE pipeline. DANCE identified two conformations denoted as State 1 and State 2 of 80.6% and 19.4% abundance. Misincorporation rates for each state were converted into reactivity values through normalization and plotted for comparison (see METHODS).

(B) Reactivity differences between DANCE-identified states in Spinach reveal changes in G-quadruplex core and P2 domains. To identify which regions of Spinach adopted different structures between the two states identified by DANCE, we plotted the difference between reactivities of State 1 and State 2. The plot of change in reactivity shows that State 1 and State 2 differ in the core and P2 domains but remain unchanged in stems P1 and P3.

(C) Spinach alternative states display differential N7G reactivity at G-quadruplex G’s. To determine whether the alternative states of Spinach display differences in N7G reactivity, we compared the misincorporation rate of each G for State 1 and State 2. G-quadruplex G’s are indicated below the plot in red. The plot shows that most G’s in Spinach display no change in N7G reactivity to DMS whereas G-quadruplex G’s show marked changes in N7G reactivity to DMS.

(D-E) Spinach G-quadruplex G’s are differentiated from all other G’s in State 1 (D) ****p < 0.0001, unpaired t-test with Welch’s correction. G-quadruplex G’s show no difference in misincorporation rates for State 2 (E) ns p=0.2262, unpaired t-test with Welch’s correction.

(F) The Spinach G-quadruplex is unfolded in State 2. To determine whether the G-quadruplex in Spinach was unfolded in State 2 we compared the misincorporation rate of G-quadruplex G’s for the population average and DANCE deconvolved States 1 and 2. The plot shows that G-quadruplex G’s display increased misincorporation rates in State 2 which suggests State 2 consists of an unfolded G-quadruplex. ****p < 0.0001, Tukey’s multiple comparison test.

(G) Single-stranded loops in the Spinach G-quadruplex show decreased reactivity in State 2. To determine whether the single-stranded loops in the Spinach G-quadruplex core remain unpaired, we compared misincorporation rates between State 1 and State 2. The plot shows that all single-stranded loop residues show reduced reactivity to DMS in State 2 which suggests these positions display increased base-pairing interactions in State 2.

(H) Nucleotide substitutions in Spinach G-quadruplex loop residues reduce Spinach fluorescence. To determine whether Spinach G-quadruplex loop residues make base-pair interactions with G-quadruplex G’s in the Spinach alternative conformation, we systematically changed loop residues from A to C which should stabilize any base-pairing interactions with G’s. We quantified Spinach fluorescence through an in-gel fluorescence assay (see METHODS). The plot shows that conversion of A residues to C residues in the G-quadruplex loop region induces a progressive loss in Spinach fluorescence.

(I) DAGGER clustering with tertiary constraints applied to Spinach BASH MaP data identifies two clusters with altered base pairing in the P2 domain. To identify alternative base pairing conformations of the misfolded Spinach, we utilized the orthogonal single molecule analysis method DaVinci with N7G reactivity data (DAGGER). We incorporated N7G reactivity data to create tertiary folding constraints before DAGGER folding and clustering (see METHODS). Dimensional reduction and clustering identified two major clusters denoted Cluster 1 and Cluster 2. The most representative secondary structure of each cluster is boxed and indicated by the bit number. The DaVinci clustering plot reveals that the misfolded Spinach displays a register shift in the P2 domain. Cluster 1, colored green, is consistent with the Spinach crystallographic secondary structure. Cluster 2 is colored orange.

Step 1 PCR Primers

Oligos and primers