RNA sequence to structure analysis from comprehensive pairwise mutagenesis of multiple self-cleaving ribozymes
Abstract
Self-cleaving ribozymes are RNA molecules that catalyze the cleavage of their own phosphodiester backbones. These ribozymes are found in all domains of life and are also a tool for biotechnical and synthetic biology applications. Self-cleaving ribozymes are also an important model of sequence-to-function relationships for RNA because their small size simplifies synthesis of genetic variants and self-cleaving activity is an accessible readout of the functional consequence of the mutation. Here, we used a high-throughput experimental approach to determine the relative activity for every possible single and double mutant of five self-cleaving ribozymes. From this data, we comprehensively identified non-additive effects between pairs of mutations (epistasis) for all five ribozymes. We analyzed how changes in activity and trends in epistasis map to the ribozyme structures. The variety of structures studied provided opportunities to observe several examples of common structural elements, and the data was collected under identical experimental conditions to enable direct comparison. Heatmap-based visualization of the data revealed patterns indicating structural features of the ribozymes including paired regions, unpaired loops, non-canonical structures, and tertiary structural contacts. The data also revealed signatures of functionally critical nucleotides involved in catalysis. The results demonstrate that the data sets provide structural information similar to chemical or enzymatic probing experiments, but with additional quantitative functional information. The large-scale data sets can be used for models predicting structure and function and for efforts to engineer self-cleaving ribozymes.
Editor's evaluation
This is a valuable study that provides compelling evidence for important nucleotides in five self-cleaving ribozymes. Epistasis analyses are novel in this field.
https://doi.org/10.7554/eLife.80360.sa0Introduction
Challenges with predicting the functional effects of changing an RNA sequence continues to limit the study and design of RNA molecules. Recently, machine learning approaches have made considerable advancements in predicting an RNA structure from a sequence. However, these approaches rely heavily on crystal structures of RNA molecules and sequence conservation of homologs, both of which are limited for RNA molecules compared to proteins (Calonaci et al., 2020; Townshend et al., 2021). In addition, describing an RNA molecule as a single structure can be inaccurate, and regulatory elements such as riboswitches demonstrate the importance of an ensemble of structures for an RNA function. It is unclear that predictions based on individual structures alone will be able to predict the functional effects of mutations with the precision needed for many biotechnical and synthetic biology applications, or to predict disease-associated mutations in RNA molecules (Halvorsen et al., 2010). This suggests that new experimental data types might be important for understanding, designing, and manipulating the transcriptome.
Self-cleaving ribozymes provide a useful model to study sequence-structure-function relationships in RNA molecules. Self-cleaving ribozymes are catalytic RNA molecules that cleave their own phosphodiester backbone. They were first discovered in viruses and viroids, but numerous families of self-cleaving ribozymes have since been discovered in all domains of life (Prody et al., 1986). The CPEB3 ribozyme, for example, was discovered in the human genome and found to be highly conserved in mammals (Bendixsen et al., 2021; Salehi-Ashtiani et al., 2006). Other self-cleaving ribozymes, such as the hammerhead and twister ribozymes, are found broadly distributed across eukaryotic and prokaryotic genomes (Perreault et al., 2011; Roth et al., 2014). The biological roles of ribozymes in different genomes and different genetic contexts remain an active area of investigation (Jimenez et al., 2015). In addition to being widespread across the tree of life, self-cleaving ribozymes have also been used for several bioengineering applications (Liang et al., 2011; Peng et al., 2021; Wei and Smolke, 2015; Zhong et al., 2016). For example, self-cleaving ribozymes are being combined with aptamers to develop synthetic gene regulatory devices, which have biotechnical and biomedical applications where ligand-dependent control of gene expression is desired (Kobori et al., 2017; Kobori et al., 2015; Stifel et al., 2019; Townshend et al., 2015).
The testing of mutational effects in ribozyme sequences has been accelerated by high-throughput experimental approaches. Most self-cleaving ribozymes are fairly small (<200 nt), and genetic variants can be made by chemical synthesis of a single DNA oligonucleotide that is then used as a template for in vitro transcription. The self-cleavage activity of the ribozyme requires a precise three-dimensional structure, and therefore activity can be used as a sensitive indirect readout of native structure. Mutations that disrupt the native structure are detected as reduced activity compared to the unmutated ‘wild-type’ ribozyme. Several methods have been developed to enable the detection of ribozyme function by high-throughput sequencing of biochemical reactions (Bendixsen et al., 2019; Hayden, 2016; Kobori and Yokobayashi, 2016; Shen et al., 2021). For self-cleaving ribozymes, each read from the data reports both the mutations and whether or not that molecule was reacted (cleaved) or unreacted (uncleaved). Therefore, high-throughput sequencing allows numerous genetic variants to be pooled together and still observed hundreds to thousands of times in the data. This provides confidence in the fraction cleaved (FC) for each genetic variant in a given experiment, and genetic variants are compared to determine relative activity (RA). Importantly, the data are internally controlled because both reacted and unreacted molecules are observed, which controls for differences in their abundance due to synthesis steps (chemical DNA synthesis, transcription, reverse-transcription, and PCR).
A common approach to confirm structural interactions in RNA and proteins is through analysis of pairs of mutations (Dutheil et al., 2010; Olson et al., 2014). In this context, it can be useful to calculate pairwise epistasis, which measures deviations in the mutational effects of double mutants relative to the effects of each individual mutation (assuming an additive model of mutational effects). For example, in the case of a base pair, each single mutation would disrupt the base-pairing interaction, destabilizing the catalytically active RNA structure and reducing activity. However, if two mutants together restore a base pair, the RA of the double mutant would have much higher activity than expected from the additive effects of the individual mutations (positive epistasis). In contrast to paired nucleotides, double mutants at non-paired nucleotides tend to have a more reduced activity than expected from each individual mutation (negative epistasis) (Bendixsen et al., 2017; Li et al., 2016). In the case of two mutations that create a different base pair (i.e., G-C to A-U), it is known that the stacking with neighboring base pairs is also structurally important, and some base pair substitutions will not be equivalent in a given structural context. This creates a range of possible epistatic effects even for two mutations at paired nucleotide positions. In addition, some non-canonical base interactions within tertiary contacts may also show epistasis even when they do not involve Watson-Crick or GU wobble base-pairing interactions. Nevertheless, the propensity for positive epistasis between physically interacting nucleotides suggests that a comprehensive evaluation of pairwise mutational effects should contain considerable structural information.
Here, we report comprehensive analysis of mutational effects for all single and double mutants for five different self-cleaving ribozymes. RA effects of all single and double mutations were determined by high-throughput sequencing of co-transcriptional self-cleavage reactions, and this data was used to calculate epistasis between pairs of mutations. The ribozymes studied include a mammalian CPEB3 ribozyme, a hepatitis delta virus (HDV) ribozyme, a twister ribozyme from Oryza sativa, a hairpin ribozyme derived from the satellite RNA from tobacco ringspot virus, and a hammerhead ribozyme (Bendixsen et al., 2021; Burke and Greathouse, 2005; Chadalavada et al., 2007; Liu et al., 2014; Müller et al., 2012). For each reference ribozyme, a single DNA oligo template library was synthesized with 97% wild-type nucleotides at each position, and 1% of each of the three other nucleotides. This mutagenesis strategy was expected to produce all possible single and double mutants, as well as a random sampling of combinations of three or more mutations. The mutagenized templates were transcribed in vitro, all under identical conditions, where active ribozymes had the opportunity to self-cleave co-transcriptionally. All ribozyme constructs studied cleave near the 5′-end of the RNA, and a template switching reverse transcription protocol was used to append a common primer binding site to both cleaved and uncleaved molecules. Subsequently, low-cycle PCR was used to add indexed Illumina adapters for high-throughput sequencing. Each mutagenized ribozyme template was transcribed separately and in triplicate, and amplified with unique indexes so that all replicates could be pooled and sequenced together on an Illumina sequencer. The sequencing data was then used to count the number of times each unique sequence was observed as cleaved or uncleaved, and this data was used to calculate the FC. The FC of single and double mutants was normalized to the unmutated reference sequence to determine RA. The RA values of the single and double mutants were used to calculate all possible pairwise epistatic interactions in all five ribozymes. We mapped epistasis values to each ribozyme structure to evaluate correlations between structural elements and patterns of pairwise epistasis values. The results indicated that structural features of the ribozymes are revealed in the data, suggesting that these data sets will be useful for developing models for predicting sequence-structure-function relationships in RNA molecules.
Results and discussion
Epistatic effects in paired nucleotide positions show stability-dependent signatures
To evaluate how the effects of mutations mapped to the ribozyme structures, we plotted the RA values as heatmaps, similar to previous publications by others (Andreasson et al., 2020; Kobori and Yokobayashi, 2016; Figure 1, Figure 2, Figure 3, Figure 4, Figure 5, panel A, large plot). We then used this data to calculate epistasis between pairs of mutations. We first inspected nucleotide positions known to be involved in base-paired regions of the secondary structure of each ribozyme. In this heatmap layout, many paired regions showed an anti-diagonal line of high-activity double mutant variants with strong positive epistasis (Figures 1—5, insets, red to blue plots). In addition, pairs of mutations off the anti-diagonal tended to show negative or non-positive epistasis. Pseudoknot elements that involve Watson-Crick base pairs also showed this pattern, including the single base pair T1 element in CPEB3 (Figure 1) and the two base pair T1 element in HDV (Figure 2). The layout of mutations in the heatmap places paired nucleotide positions along the anti-diagonal and compensatory double mutants that change one Watson-Crick base pair to another are found on this anti-diagonal. Individual mutations that break a base pair will often reduce ribozyme activity, but the activity can be restored by a second compensatory mutation resulting in positive epistasis. In contrast, double mutants off-diagonal usually disrupt two base pairs (unless they result in a GU wobble base pair). It is expected that breaking two base pairs in the same paired region would be more deleterious to ribozyme activity than breaking one base pair. The epistasis data indicates that two non-compensatory mutations in the same paired region are more deleterious than expected from an additive assumption, and frequently create negative epistasis off-diagonal within paired regions.

Effects of mutations and pairwise epistasis in a CPEB3 ribozyme.
(A) Relative activity heatmap depicting all possible pairwise effects of mutations on the cleavage activity of a mammalian CPEB3 ribozyme. Base-paired regions P1, P2, P3, P4, and T1 are highlighted and color coordinated along the axes, and surrounded by black squares within the heatmap. Pairwise epistasis interactions observed for each paired regions are each shown as expanded insets for easy identification of the specific epistatic effects measured for each pair of mutations. Instances of positive epistasis are shaded blue, and negative epistasis is shaded red, with higher color intensity indicating a greater magnitude of epistasis. Catalytic residues are indicated by stars along the axes (A is reproduced from Figure 1B from Beck et al., 2022). (B) Secondary structure of the CPEB3 ribozyme used in this study. Each nucleotide is shaded to indicate the average relative cleavage activity of all single mutations at that position. (C) Distributions of epistasis values in the paired regions of the CPEB3 ribozyme. Data were categorized as double mutations that result in two mismatches (2 Mismatch), a single mismatch (1 Mismatch), or no mismatches because of a new Watson-Crick base pair or GU wobble results (WC/GU).

Effects of mutations and pairwise epistasis in a HDV self-cleaving ribozyme.
(A) Relative activity heatmap depicting all possible pairwise effects of mutations on the cleavage activity of an HDV ribozyme. Base-paired regions P1, P2, P3, P4, and T1 are highlighted and color coordinated along the axes, and surrounded by black squares within the heatmap. Pairwise epistasis interactions observed for each paired regions are each shown as expanded insets for easy identification of the specific epistatic effects measured for each pair of mutations. Instances of positive epistasis are shaded blue, and negative epistasis is shaded red, with higher color intensity indicating a greater magnitude of epistasis. Catalytic residues are indicated by stars along the axes. (B) Secondary structure of the HDV ribozyme used in this study. Each nucleotide is shaded to indicate the average relative cleavage activity of all single mutations at that position. (C) Distributions of epistasis values in the paired regions of the HDV ribozyme. Data were categorized as double mutations that result in two mismatches (2 Mismatch), a single mismatch (1 Mismatch), or no mismatches because of a new Watson-Crick base pair or GU wobble results (WC/GU). HDV, hepatitis delta virus.

Effects of mutations and pairwise epistasis in a twister self-cleaving ribozyme.
(A) Relative activity heatmap depicting all possible pairwise effects of mutations on the cleavage activity of a twister ribozyme. Base-paired regions P2, P4, T1, and T2 are highlighted and color coordinated along the axes, and surrounded by black squares within the heatmap. Pairwise epistasis interactions observed for each paired region are each shown as expanded insets for easy identification of the specific epistatic effects measured for each pair of mutations. Instances of positive epistasis are shaded blue, and negative epistasis is shaded red, with higher color intensity indicating a greater magnitude of epistasis. Catalytic residues are indicated by stars along the axes. (B) Secondary structure of the twister ribozyme used in this study. Each nucleotide is shaded to indicate the average relative cleavage activity of all single mutations at that position. (C) Distributions of epistasis values in the paired regions of the twister ribozyme. Data were categorized as double mutations that result in two mismatches (2 Mismatch), a single mismatch (1 Mismatch), or no mismatches because of a new Watson-Crick base pair or GU wobble results (WC/GU).

Effects of mutations and pairwise epistasis in a hairpin self-cleaving ribozyme.
(A) Relative activity heatmap depicting all possible pairwise effects of mutations on the cleavage activity of a hairpin ribozyme. Base-paired regions P1, P2, and P3 are highlighted and color coordinated along the axes, and surrounded by black squares within the heatmap. Pairwise epistasis interactions observed for each paired region are each shown as expanded insets for easy identification of the specific epistatic effects measured for each pair of mutations. Instances of positive epistasis are shaded blue, and negative epistasis is shaded red, with higher color intensity indicating a greater magnitude of epistasis. Catalytic residues are indicated by stars along the axes. (B) Secondary structure of the hairpin ribozyme used in this study. Each nucleotide is shaded to indicate the average relative cleavage activity of all single mutations at that position. (C) Distributions of epistasis values in the paired regions of the hairpin ribozyme. Data were categorized as double mutations that result in two mismatches (2 Mismatch), a single mismatch (1 Mismatch), or no mismatches because of a new Watson-Crick base pair or GU wobble results (WC/GU). (D) The distributions of epistasis values in all terminal stem loops across all five ribozymes, and epistasis observed within loop A, loop B, and between loop A and loop B in the hairpin ribozyme.

Effects of mutations and pairwise epistasis in a hammerhead self-cleaving ribozyme.
(A) Relative activity heatmap depicting all possible pairwise effects of mutations on the cleavage activity of a hammerhead ribozyme. Base-paired regions, P1 and P2, are highlighted and color coordinated along the axes, and surrounded by black squares within the heatmap. Pairwise epistasis interactions observed for each paired region are each shown as expanded insets for easy identification of the specific epistatic effects measured for each pair of mutations. Instances of positive epistasis are shaded blue, and negative epistasis is shaded red, with higher color intensity indicating a greater magnitude of epistasis. Catalytic residues are indicated by stars along the axes. (B) Secondary structure of the hammerhead ribozyme used in this study. Each nucleotide is shaded to indicate the average relative cleavage activity of all single mutations at that position. (C) Distributions of epistasis values in the paired regions of the hammerhead ribozyme. Data were categorized as double mutations that result in two mismatches (2 Mismatch), a single mismatch (1 Mismatch), or no mismatches because of a new Watson-Crick base pair or GU wobble results (WC/GU). (D) Crystal structure of a hammerhead ribozyme (3ZD5) with C20 and G25 indicated (orange) and hydrogen bonds between the nucleotides shown as yellow dashed lines.
To further evaluate epistasis within base-paired regions, we separated epistasis data into three categories based on the number of base pairs that the mutations disrupt. For each ribozyme, we plotted the distribution of epistasis values as violin plots (Figures 1—5, panel C). For all ribozymes, the analysis revealed a clear trend. On average, disrupting two base pairs resulted in negative epistasis (mean of distribution), disrupting one base pair shifted the distribution toward more positive epistasis values, and the highest epistasis values (mean and max) were found for double mutants that result in zero disrupted base pairs because the two mutations together create a new Watson-Crick or GU Wobble pair. This trend was observed for paired regions in every ribozyme, and in all cases the distributions were significantly different (p<0.05–0.001, Mann-Whitney U test). This pattern of epistasis in paired regions demonstrates the potential for identifying base-paired regions in RNA structures using comprehensive double-mutant activity data.
To further evaluate the potential of epistasis data to identify base-paired regions, we analyzed the epistasis values for each paired region individually. For this analysis, we separated the epistasis values calculated for double mutants that result in a Watson-Crick base pair (‘on-diagonal’ in heatmaps) from all other double mutants (‘off-diagonal’ in heatmaps) in each paired region (Figure 6). Short-paired regions showed the largest differences in the distributions of epistatic effects for on-diagonal and off-diagonal double mutants, while longer-paired regions showed small differences in these distributions. For example, short-paired regions P3 in CPEB3 and HDV (3 bp), and T1 in the twister (4 bp) showed very large differences in the mean of the distributions. These small regions were highly sensitive to individual mutations, and most pairs of mutations within this region resulted in almost no detectable activity except when they created a different Watson-Crick base pair, leading to the large positive epistasis values (Figures 1—5). In addition, in these short-paired regions, we do not see strong negative epistasis. It appears that the strong deleterious effect of a single mutation in these short regions makes a second mutation no more disruptive to activity, resulting in a mean of the distribution near zero for double mutants off-diagonal. In contrast, the largest paired region (HDV P4, 14 bp) showed a very small difference between the distribution of epistasis values found on-diagonal and off-diagonal. This can be rationalized because losing one base pair was not deleterious to the HDV ribozyme activity under our experimental conditions (Figure 2), and this does not allow for positive epistasis upon a second mutation. Even the loss of two base pairs in P4 was somewhat tolerated, leading to very little negative epistasis for two mutations at unpaired positions. Taken together, the results are consistent with other observations in both RNA and proteins, where it has been observed that the effects of mutations, and their additivity, have been shown to be dependent on the local thermodynamics of the structured region (Kraut et al., 2003; Moody and Bevilacqua, 2003).

Distributions of epistasis values calculated for individual paired regions in all five ribozymes.
For each region, epistasis values were separated into double mutants that restore a Watson-Crick base pair (‘on-diagonal’, blue) and all other double mutants (‘off-diagonal’, gray). The mean of each distribution (µ) is reported and indicated by the dashed line. The p value is the probability that values were drawn from the same distribution by chance (Mann-Whitney U test).
To explicitly investigate the influence of thermodynamic stability on mutational effects in the data, we calculated the minimum free energy for each paired region and compared mutational effects. We split each paired region into two separate RNA sequences that contained only the base-paired nucleotides, eliminating loop nucleotides, and used nearest neighbor rules to calculate the minimum free energy of their interaction (NUPACK). This approach neglects thermodynamic contributions from terminal loops, but allowed for a consistent approach to compare internal and terminal paired regions. We found a significant negative correlation between the median deleterious effects of single mutations and the minimum free energy of the paired regions (Figure 6—figure supplement 1). Clearly, though, thermodynamic stability alone does not explain every mutational effect. For example, CPEB3 P1 is more sensitive to mutations than CPEB3 P2 or P4 even though the latter are less stable. This is likely because P1 is immediately adjacent to the site of self-cleavage, while P2 and P4 are not. Overall, this analysis of thermodynamic stability indicates that for RNA’s with unknown structures, more stable structural elements may be harder to identify from epistatic effects alone when there is not a strong deleterious effect of individual mutations. However, it is also possible that more stable elements would show stronger epistasis under different experimental conditions, such as different temperatures or magnesium concentrations (Peri et al., 2022).
Catalytic residues do not have any high-activity mutants
Self-cleaving ribozymes often utilize a concerted acid-base catalysis mechanism where specific nucleobases act as proton donors (acid) or acceptors (base) (Jimenez et al., 2015), and mutations at these positions abolish activity. Analyzing the effects of individual mutations will not distinguish catalytic nucleotides from structurally important nucleotides. Comprehensive pairwise mutations, on the other hand, can potentially distinguish between catalytic residues that cannot be rescued by a second mutation, and structurally important nucleotides that can be rescued (positive epistasis). The catalytic cytosines of the CPEB3 (C57) and HDV (C75) act as proton donors due to perturbed pKa values (Nakano et al., 2000; Skilandat et al., 2016). For the twister ribozyme (Figure 3), the guanosine at position G39 acts as a general base, and the adenosine at position A1 acts as a general acid (Wilson et al., 2016). The catalytic nucleotides for the Hammerhead ribozyme (Figure 5) are the Guanosines located at positions G25 and G39 (Scott et al., 2013). The hairpin ribozyme (Figure 4) contains catalytic nucleotides at positions G29 and A59 (Wilson et al., 2006). In the RA heatmaps, the columns and rows associated with these nucleotides result in low activity values (Figures 1—5, Figure 6—figure supplement 2). It is important to note that because there is complete coverage of all double mutants in this data set, we can be certain that there are no possible compensatory mutations. These results show how catalytic residues can be identified in the comprehensive pairwise mutagenesis data.
Unpaired nucleotides show mutational effects that depend on tertiary structure
Ribozymes with mutations to nucleotides found in terminal loops that are not involved in tertiary structure elements showed high RA for most single and double mutants, and essentially no epistasis. This is not surprising if these loops reside on the periphery of the ribozyme and are not involved in structural contacts with other nucleotides. This is the case for L4 of CPEB3 (Figure 1), L4 of HDV (Figure 2), and L1 and L3 of the hairpin ribozyme (Figure 4). Two mutations within these loops do not reduce activity, and mutations in these loops do not rescue other deleterious mutations such as those that break a base pair.
The internal loops LA and LB of the hairpin ribozyme are structurally important (Figure 4). Interactions between nucleotides within LB include six non-Watson-Crick base-pairing interactions that are important for the formation of an active ribozyme structure (Figure 4—figure supplement 1). Several non-canonical base-base and sugar-base hydrogen bonds between nucleotides within LA are also important for the formation of the active site (Fedor, 2000; Wilson et al., 2006). Docking between LA and LB is necessary for the formation of a catalytically active ribozyme and is facilitated by a Watson-Crick base pair between nucleotides numbered G1 and C46 in the version of the ribozyme used here (Rupert and Ferré-D’Amaré, 2001). In contrast to terminal loop regions, most single mutations within LA and LB resulted in low self-cleavage activity in our data (Figure 4). In addition, the double mutants within and between loop A and loop B show several instances of strong positive epistasis (Figure 4—figure supplement 1C), and the distributions of epistasis within and between these loops are significantly different than the terminal loops that are not structurally important (Figure 4D). This positive epistasis indicates that many of the important structural contacts can be achieved by other specific pairs of nucleotides. For example, the double mutant G1C and C46G shows strong epistasis suggesting that swapping a C-G base pair for the G-C base pair can restore activity by facilitating docking between the two loops. Several double mutants at positions that form non-canonical interactions in LB show positive epistasis. For example, mutation A41G shows positive epistasis when the interacting nucleotide C65 is mutated to a G or U. The non-canonical A45:A59 interaction shows positive epistasis for several pairs of mutations (A45U A59C, A45C A59C, and A45G A59U). Finally, the non-canonical base pair A47:G57 in LB, shows positive epistasis for the A47U:G57A double mutant. The difference between terminal loops and loops with structural importance highlights how activity-based data can help identify non-canonical structures that are challenging to predict computationally, and that might be difficult to identify by other common approaches, such as chemical probing experiments (Walter et al., 2000).
Another example of structurally important unpaired regions can be found in the CUGA uridine turn (U-turn) motif in the hammerhead ribozyme (Figure 5). This CUGA turn forms the catalytic pocket and positions a catalytic cytosine (-1C) at the cleavage site (Doudna, 1995). Crystal structures of the sTRSV ribozyme show a base pair between the nucleotides corresponding to the nucleotides numbered here as C20 and G25 in the ribozyme construct used for our experiments (Chi et al., 2008; Martick and Scott, 2006). These two nucleotides showed strong positive epistasis for the mutations C20G and G25C, which substitutes a G:C base pair for the original C:G base pair. All other single and double mutants in this region showed low activity, and no instances of strong positive epistasis within or between this motif (Figure 5). The low activity resulting from mutations in this region confirms the functional importance of this motif, and indicates that this motif cannot be easily formed or rescued by sequences with up to two mutational differences, except for the G:C base pair swap.
Tertiary interactions between loops in the hammerhead ribozyme provide another example of structurally important loop regions. Type III hammerhead ribozymes, like the one used in this study, contain tertiary interactions between nucleotides in the loops of P1 and P2 that are implicated in structural organization of the catalytic core. A crystal structure of this loop-loop interaction showed a network of interhelical non-canonical base pairs and stacks, with several nucleobases in stem-loop I interacting with more than one nucleobase in stem-loop II (Chi et al., 2008; Martick and Scott, 2006). However, there are numerous different loop sequences in naturally occurring hammerhead ribozymes indicating that this loop-loop interaction can be formed by a variety of different sequences (Burke and Greathouse, 2005; Perreault et al., 2011). We therefore anticipated that we would observe a significant level of positive epistasis between these two loops for double mutations that were capable of maintaining these tertiary interactions. Surprisingly, however, we found that most individual and double mutations do not reduce activity (Figure 5), and double mutants do not show positive epistasis (Figure 5—figure supplement 1). This indicates that the multiple interactions between the loops compensate for mutations that break a single interaction. It is interesting to note that the mutational robustness of these loops has been exploited in bioengineering applications, where insertion of an aptamer into one of the loops and randomization of the other allowed for the selection of synthetic riboswitches (Townshend et al., 2015). The identification of robust structural elements through high-throughput mutational data could be useful for identifying better targets for aptamer integration in other ribozymes.
We also find support for a two-nucleotide T1 pseudoknot in CPEB3 involving a non-canonical U-U base pair. While no crystal structure of the CPEB3 ribozyme has been solved, this U:U base pair has been confirmed and implicated as a magnesium binding site based on NMR and Tb3+ cleavage data (Skilandat et al., 2016). In our data, we find that G:C and C:G base pairs can support activity. We see negative epistasis for several pairs of mutations that result in ‘mismatches’ (A:G, A:A, and A:C) and positive or no epistasis for pairs of mutations that result in G:C and C:G base pairs, which all supports the formation of a second base pair in T1. We note that because the starting base pair is a U:U, the location of double mutants resulting in WC/GU pairs do not lie on the anti-diagonal in the heatmaps. Because a crystal structure of the CPEB3 ribozyme has not been solved, the CPEB3 data provides an example of how comprehensive mutational data can be useful for RNA with unknown structures.
Evaluation of read depth and mutational coverage
The accuracy of our RA measurements depends on the number of reads we observe that map to each unique ribozyme sequence (read depth). Each reference ribozyme has a different nucleotide length resulting in different numbers of possible single and double mutants. In addition, the pooling of experimental replicates for sequencing does not result in equal mixtures of each replicate. In order to determine read depth, we mapped reads to the reference sequences and counted the number of reads that matched each ribozyme, while allowing for one or two mutations. We observed every single and double mutant for all ribozymes in each replicate, indicating 100% coverage of these mutant classes for all of our data sets. The distributions of observations for each single and double mutant of each ribozyme are shown in Figure 6—figure supplement 3. The HDV data showed the lowest depth, possibly because it is a larger ribozyme (87 nt), and fewer reads mapped to the single and double mutants (Table 1). Nevertheless, this analysis confirms that the data contain complete coverage of all single and double mutants and ample read depth for all five ribozymes.
Summary of the lengths of each self-cleaving ribozyme used in this study, the number of single and double mutants whose cleavage activity was analyzed, and the average fraction cleaved observed for all single and double mutants.
Ribozyme name | Ribozyme length | Possible single Mmutants | Possible double mutants | Total mapped reads | Wild-type fraction cleaved | Single mutant average fraction cleaved | Double mutant average fraction cleaved |
---|---|---|---|---|---|---|---|
CPEB3 | 69 | 207 | 21,114 | 9,238,603 | 0.90 | 0.69 | 0.44 |
HDV | 87 | 261 | 33,669 | 3,316,380 | 0.60 | 0.40 | 0.25 |
Twister | 48 | 144 | 10,152 | 7,762,863 | 0.60 | 0.41 | 0.21 |
Hairpin | 71 | 213 | 22,365 | 5,067,216 | 0.52 | 0.29 | 0.17 |
Hammerhead | 45 | 135 | 8,910 | 8,054,498 | 0.34 | 0.27 | 0.19 |
Epistasis plots are an informative approach to visualizing high-throughput activity data
Previous studies have reported comprehensive pairwise mutagenesis of ribozymes that provide interesting opportunities for comparison to the data presented here. For example, all pairwise mutations in a 42-nucleotide region of the same twister ribozyme were previously reported (Kobori and Yokobayashi, 2016). Compared to our experiments, these previous experiments used a later transcriptional time point (2 hr) and lower magnesium concentration (6 mM). They did not calculate epistasis, and reported the RA of all double mutants using heatmaps, inspiring the figures presented here. The results were highly similar, and the authors were able to identify paired regions in the data. The similarity between the results illustrates the reliability of this sequencing-based approach, which is promising for future data sharing and meta-analysis efforts. In another prior work, all pairwise mutations in the glmS ribozyme were analyzed using a custom-built fluorescent RNA array (Andreasson et al., 2020). The power of this approach is that they were able to monitor self-cleavage over short and long time scales, which enables differentiating both very slow and very fast self-cleaving variants. While the authors did not calculate pairwise epistasis, they reported RA heatmaps and also ‘rescue effects’ when the activity of a double mutant is sufficiently higher than the activity of a single mutant. This rescue analysis is very similar to positive epistasis, but only takes into account one mutation at a time. This analysis was also able to identify many of the know base-pair interactions and some tertiary contacts in the glmS ribozyme. In addition, they were able to observe some minor secondary structure rearrangement, where mutations in some nucleotides were able to rescue neighboring nucleotides by shifting the base-pairing slightly. The pairwise epistasis analysis presented here adds an additional approach to extract information from such high-throughput sequencing-based analysis of self-cleaving ribozymes. Unlike the rescue analysis, which can only identify positive interactions, the ability to detect negative epistatic interactions may help further identify structurally important regions for RNA sequence design and engineering efforts. It is possible that all of these analysis approaches could be used for RNA functions other than self-cleavage, if they can be detected by high-throughput sequencing. This could include ribozyme activities that can be enriched by in vitro selections (Pressman et al., 2019), or mutations in natural RNA molecules that affect growth rates (Li et al., 2016).
Conclusion
We have determined the RA for all single and double mutants of five self-cleaving ribozymes and use this data to calculate epistasis for all possible pairs of nucleotides. The data was collected under identical co-transcriptional conditions, facilitating direct comparison of the data sets. The data revealed signatures of structural elements including paired regions and non-canonical structures. In addition, the comprehensiveness of the double mutants enabled identification of catalytic residues. Recently, there has been significant progress toward predicting RNA structures from sequence using machine learning approaches (Calonaci et al., 2020; Townshend et al., 2021; Zhao et al., 2021). The machine learning models are typically trained on structural biology data from x-ray crystallography, chemical probing (SHAPE), and natural sequence conservation. Self-cleaving ribozymes have been central to this effort. Our approach is similar to SHAPE in that it can be obtained with common lab equipment and commercially available reagents. The activity data presented provides information similar to natural sequence conservation, except that it provides quantitative effects of mutations, not just frequency. For example, secondary structures have been predicted based on comparative sequence analysis by identifying covarying nucleotide positions in homologous RNA sequences. These approaches are important because they do not require any experimental evaluation of sequences. However, this ‘comparative approach’ may not be able to identify important nucleotides or structural elements other than canonical base pairs. We hope that the activity-based data presented here will provide information not present in these other training data sets and help advance computational predictions.
Materials and methods
Mutational library design and preparation of self-cleaving ribozymes
Request a detailed protocolSingle-stranded DNA molecules used as templates for in vitro transcription were synthesized as described previously (Kobori and Yokobayashi, 2016), using doped oligos containing 97% of the base of the reference sequence and 1% of the three other remaining bases at each position (Keck Oligo Synthesis Resource, Yale). A constant structured RNA cassette was appended to the template sequences to provide a reverse transcription primer binding site (Wilkinson et al., 2006). The ssDNA library was made double-stranded to allow for T7 transcription via low-cycle PCR using Taq DNA polymerase.
Co-transcriptional self-cleavage assay
Request a detailed protocolThe co-transcriptional self-cleavage reactions were carried out in triplicate by combining 20 μL 10× T7 transcription buffer (500 μL 1 M Tris pH 7.5, 50 μL 1 M DTT, 20 μL 1 M Spermidine, 150 μL 1 M MgCl2, 280 μL RNase-free water), 4 μL rNTP (25 mM, NEB, Ipswich, Ma), 8 μL T7 RNA Polymerase-Plus enzyme mix (1600 U, Invitrogen, Waltham, MA), 160 μL nuclease-free water, and 8 μL of double-stranded DNA template (4 pmol, 0.5 μM PCR product) at 37°C for 30 min. The transcription and co-transcription self-cleavage reactions were quenched by adding 60 μL of 50 mM EDTA. The resulting RNA was purified and concentrated using Direct-zol RNA MicroPrep Kit with TRI-Reagent (Zymo Research, Irvine, CA), and eluted in 7 μL nuclease-free water. Concentrations were determined via absorbance at 260 nm (ThermoFisher NanoDrop, Waltham, MA), and normalized to 5 μM.
Reverse transcription and Illumina indexing PCR
Request a detailed protocolReverse transcription was carried out using a 5′ RACE protocol using phased template switching oligo’s (TSO1-4, Supplementary file 1) as described previously (Bendixsen et al., 2020). Briefly, reverse transcription reactions used 5 pmol RNA and 20 pmol of reverse transcription primer in a volume of 10 μL. RNA and primer were heated to 72°C for 3 min and cooled on ice. Reverse transcription was initiated by adding 4 μL SMARTScribe 5× First-Strand Buffer (TaKaRa, San Jose, CA), 2 μL dNTP (10 mM), 2 μL DTT (20 mM), 2 μL phased template switching oligo mix (10 μM), and 2 μL SMARTScribe Reverse Transcriptase (200 units, TaKaRa). Four different template switching oligos (TSO 1–4) with different lengths and nucleotide compositions were used such that the first nucleotides read during sequencing are a balance of all four nucleotides, and the ribozymes are sequenced in four different frames relative to the primer. The mixture was incubated at 42°C for 90 min and the reaction was stopped by heating to 72°C for 15 min. The resulting cDNA was purified on a silica-based column (DCC-5, Zymo Research) and eluted into 7 μL water. Illumina adapter sequences and indexes were added using high-fidelity PCR. A unique index combination was assigned to each ribozyme and for each replicate. The PCR reaction contained 3 μL purified cDNA, 12.5 μL KAPA HiFi HotStart ReadyMix (2×, KAPA Biosystems, Wilmington, MA), 2.5 μL forward, 2.5 μL reverse primer (Illumina Nextera Index Kit), and 5 μL water. Several cycles of PCR were examined using gel electrophoresis and a PCR cycle was chosen that was still in logarithmic amplification, prior to saturation. Each PCR cycle consisted of 98°C for 10 s, 63°C for 30 s, and 72°C for 30 s. PCR DNA was purified on silica-based columns (DCC-5, Zymo Research) and eluted in 22.5 μL water. The final product was then verified using gel electrophoresis.
High-throughput sequencing
Request a detailed protocolThe indexed PCR products for all replicates were pooled together at equimolar concentrations based on absorbance at 260 nm. Paired-end sequencing reads were obtained for the pooled libraries using an Illumina HiSeq 4000 (Genomics and Cell Characterization Core Facility, University of Oregon).
Sequencing data analysis
Request a detailed protocolPaired-end sequencing reads were joined using FLASh, allowing ‘outies’ due to overlapping reads. The joined sequencing reads were analyzed using custom Julia scripts available at https://gitlab.com/bsu/biocompute-public/mut_12 (Beck, 2023). Our analysis implemented a sequence-length sliding window to screen for double mutant variants of a reference ribozyme. Nucleotide identities for each mutant were identified and then counted as either cleaved or uncleaved based on the presence or absence of the 5′-cleavage product sequence. The RA was calculated as previously described (Kobori and Yokobayashi, 2016). Briefly, a FC was calculated for each genotype in each replicate as FC=Nclv/(Nclv+Nunclv). This value was normalized to the reference/wild-type FC as RA=FC/FCwt. The RA values were averaged across the three replicates and then plotted as a heatmap. Epistasis interactions for each double mutant (i, j) were quantified as previously described (Bendixsen et al., 2017), where . In order to eliminate false positive detection of epistasis interactions, values were filtered to eliminate instances where the difference between the double and any of the single mutants was less than 1−3σ of the overall distribution of differences between the single and double mutant relative activities. Values greater than 1 indicate positive epistasis, and values less than 0 indicate negative epistasis. Mann-Whitney U test was used to determine the probability that epistasis or activity values of different structural elements were from the same distribution.
Correlation of thermodynamic stability of paired regions and observed mutational effects
Request a detailed protocolEach base-paired region was split into two separate RNA sequences containing only the nucleotides involved in base-pairing, omitting nucleotides belonging to stem loops. Complex formation between each pair of strands at was analyzed in Nupack using Serrra and Turner RNA energy parameters in order to obtain minimum free energy values for each paired region (37°C, [1 μM]). Using custom Julia scripts, the median RA for single mutations to each paired region was plotted as a function of the calculated free energy and a Pearson correlation coefficient was calculated.
Data availability
Sequencing reads in FastQ format are available at ENA (PRJEB52899 and PRJEB51631). Sequences, activity data, and computer code is available at GitLab (https://gitlab.com/bsu/biocompute-public/mut_12 copy archived at swh:1:rev:7be31e58784ab6ffe5f15790ebbda98279af84d2).
-
European Nucleotide ArchiveID PRJEB52899. RNA sequence to structure analysis from comprehensive pairwise mutagenesis of multiple self-cleaving ribozymes.
-
European Nucleotide ArchiveID PRJEB51631. Predicting higher-order mutational effects in an RNA enzyme by machine learning of high-throughput experimental data.
References
-
Predicting higher-order mutational effects in an RNA enzyme by machine learning of high-throughput experimental dataFrontiers in Molecular Biosciences 9:893864.https://doi.org/10.3389/fmolb.2022.893864
-
Negative epistasis in experimental RNA fitness landscapesJournal of Molecular Evolution 85:159–168.https://doi.org/10.1007/s00239-017-9817-5
-
Experimental resurrection of ancestral mammalian CPEB3 ribozymes reveals deep functional conservationMolecular Biology and Evolution 38:2843–2853.https://doi.org/10.1093/molbev/msab074
-
Machine learning a model for RNA structure predictionNAR Genomics and Bioinformatics 2:lqaa090.https://doi.org/10.1093/nargab/lqaa090
-
Base pairing constraints drive structural epistasis in ribosomal RNA sequencesMolecular Biology and Evolution 27:1868–1876.https://doi.org/10.1093/molbev/msq069
-
Structure and function of the hairpin ribozymeJournal of Molecular Biology 297:269–291.https://doi.org/10.1006/jmbi.2000.3560
-
Chemistry and biology of self-cleaving ribozymesTrends in Biochemical Sciences 40:648–661.https://doi.org/10.1016/j.tibs.2015.09.001
-
High-Throughput assay and engineering of self-cleaving ribozymes by sequencingNucleic Acids Research 43:e85.https://doi.org/10.1093/nar/gkv265
-
High-throughput mutational analysis of a twister ribozymeAngewandte Chemie 55:10354–10357.https://doi.org/10.1002/anie.201605470
-
Deep sequencing analysis of aptazyme variants based on a pistol ribozymeACS Synthetic Biology 6:1283–1288.https://doi.org/10.1021/acssynbio.7b00057
-
Challenges in enzyme mechanism and energeticsAnnual Review of Biochemistry 72:517–571.https://doi.org/10.1146/annurev.biochem.72.121801.161617
-
Crystal structure and mechanistic investigation of the twister ribozymeNature Chemical Biology 10:739–744.https://doi.org/10.1038/nchembio.1587
-
Folding of a stable DNA motif involves a highly cooperative network of interactionsJournal of the American Chemical Society 125:16285–16293.https://doi.org/10.1021/ja038897y
-
Self-cleaving ribozymes: substrate specificity and synthetic biology applicationsRSC Chemical Biology 2:1370–1383.https://doi.org/10.1039/d0cb00207k
-
Dynamic RNA fitness landscapes of a group I ribozyme during changes to the experimental environmentMolecular Biology and Evolution 39:msab373.https://doi.org/10.1093/molbev/msab373
-
Identification of hammerhead ribozymes in all domains of life reveals novel structural variationsPLOS Computational Biology 7:e1002031.https://doi.org/10.1371/journal.pcbi.1002031
-
Mapping a systematic ribozyme fitness landscape reveals a frustrated evolutionary network for self-aminoacylating RNAJournal of the American Chemical Society 141:6213–6223.https://doi.org/10.1021/jacs.8b13298
-
A widespread self-cleaving ribozyme class is revealed by bioinformaticsNature Chemical Biology 10:56–60.https://doi.org/10.1038/nchembio.1386
-
The hammerhead ribozyme: structure, catalysis, and gene regulationProgress in Molecular Biology and Translational Science 120:1–23.https://doi.org/10.1016/B978-0-12-381286-5.00001-9
-
High-Throughput cellular RNA device engineeringNature Methods 12:989–994.https://doi.org/10.1038/nmeth.3486
-
Geometric deep learning of RNA structureScience 373:1047–1051.https://doi.org/10.1126/science.abe5650
-
Probing non-selective cation binding in the hairpin ribozyme with TB (III)Journal of Molecular Biology 298:539–555.https://doi.org/10.1006/jmbi.2000.3691
-
Engineering dynamic cell cycle control with synthetic small molecule-responsive RNA devicesJournal of Biological Engineering 9:21.https://doi.org/10.1186/s13036-015-0019-7
-
The novel chemical mechanism of the twister ribozymeJournal of the American Chemical Society 138:6151–6162.https://doi.org/10.1021/jacs.5b11791
-
Review of machine learning methods for RNA secondary structure predictionPLOS Computational Biology 17:e1009291.https://doi.org/10.1371/journal.pcbi.1009291
Decision letter
-
Timothy W NilsenReviewing Editor; Case Western Reserve University, United States
-
James L ManleySenior Editor; Columbia University, United States
-
Philip BevilacquaReviewer; Pennsylvania State University, United States
-
Benoît MasquidaReviewer; Centre National de la Recherche Scientifique, France
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "RNA sequence to structure analysis from comprehensive pairwise mutagenesis of multiple self-cleaving ribozymes" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by Timothy Nilsen as Reviewing Editor and James Manley as the Senior Editor. The following individuals involved in the review of your submission have agreed to reveal their identity: Philip Bevilacqua (Reviewer #1); Benoît Masquida (Reviewer #2).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
All the reviewers felt that the work was, in principle, suitable for publication in eLife. Nevertheless, the reviewers felt that the data was under-interpreted and that analyses from a 3D perspective would improve the paper significantly. Please see the reviews below. In this case, any revised manuscript will be subject to rereview.
Reviewer #1 (Recommendations for the authors):
1. p8. The longer size of HDV does not explain the fewer reads for it because it is only slightly longer than the other ribozymes. What about the stability of the HDV ribozyme and RT not being able to read through it?
2. p8. What is the minimum number of reads per single mutant in HDV? double mutants? For the latter, the average is only 50 (Figure S1) and the minimum appears to be ~10 reads. Can reliable data be attained with so few reads? What is the statistical significance for low read singles and double mutations?
3. p8. more precise language is needed. Suggestions below. These words do not have to be used but some should be provided to guide the reader. "We plotted the relative activity value as heat maps (Figures 1-5 a large plot, shows only blue color)." "We then used this data to calculate epistasis between pairs of mutations (Figures 1-5 insets red to blue colors)".
4. p8. "many paired regions showed an anti-diagonal line of high activity double mutant variants with strong positive epistasis". It would be helpful to dissect this anti-diagonal into two different distributions: WC+wobbles and mismatches. In other words, there are off-diagonal elements within the anti-diagonal square that are meaningful. This should provide an interesting sub-anativealysis in panel C; specifically, it will allow a look at double mutants that lead to the loss of a single base pair (off-diagonal elements within the anti-diagonal square) vs. a loss of two pairs (off-diagonal squares).
5. Overall fraction cleaved varies wildly between the five ribozymes, from 0.19 to 0.68 (Table 1). It would be helpful to know what the fraction cleaved was for the wild-type ribozymes that emerged from the deep sequencing versus just making the wild-type ribozyme alone and measuring the bulk fraction cleaved from a radiolabeled experiment. This could help the authors discuss how sensitive or robust a given ribozyme is to mutations and then speculate why.
6. Why is panel A different colors for different ribozymes? This seems unnecessary and random (e.g. two of them are blue).
7. p9. Not only were the epistasis values for on- anti-diagonal "consistently more positive than two mutations in positions that are not directly base-paired (off-diagonal)" the latter were often negative (at least the mean was), which should be stated. i.e. "more positive" implies both distributions were positive.
8. Figure 4. The inset for LA/LA is not positioned correctly on the main figure. A28-A31 are should be left-shifted. Moreover, epistasis between LA and LB could not be judged, as the authors would like the reader to do on p12. The authors should provide an LA/LB inset for the reader to look at. Additionally, the authors would like the reader to think about interactions of 1 and 46 but don't draw the pair in panel B, making this hard to visualize. The "positive epistasis" for G42U and A64G is missing in the inset which is entirely white for this square. And the A47:G57 showing "positive epistasis for double mutants that result in an AU base pair" doesn't make sense because this can happen with a single mutant at G57U.
9. p14. The authors give details on epistasis between positions 20 and 25 but nothing is shown in the off-diagonals and this data cannot be had on the main diagram in A.
10. p10. The discussion that "differences between epistasis in short and long base-paired regions suggests that the thermodynamic stability of each paired region is important for the observed activity" is clearly supported by the data but should be explained. Short regions cannot withstand the loss of one and especially two base pairs because they are short and once broken cannot be broken again. Bevilacqua and Herschlag have discussed this and could be consulted and referenced. 1. Moody, E. M. and Bevilacqua, P. C. (2003). Folding a stable DNA motif involves a highly cooperative network of interactions. J. Am. Chem. Soc. 125, 16285-16293. 2. Kraut, D. A., Carroll, K. S. and Herschlag, D. (2003). Challenges in enzyme mechanism and energetics. Annu. Rev. Biochem. 72, 517-571.
11. Methods. The authors did reverse transcription. From Table S1, it appears that extra bases were appended to the 3'end of each ribozyme to provide an RT primer binding site [it appears this way from the first unbolded bases listed at the 5'-end of the template in Table S1]. If so how do those bases affect the fraction cleaved of the wild-type? And then the mutants? See comment 5 above. Also, I was unable to understand how the sequence of the RT primer in Table S1 works with the templates.
12. In the epistasis equation, which should be numbered, what are the authors taking the log of? RA? Shouldn't there be a plus sign between the two terms in the denominator? For e.g. assume additivity RAi = RAj = 10^-1, and RAij = 10^-2, we need a plus sign to have this come out to unity; as it would come out to -2. Also, how does it follow that negative epistasis is less than 0? Shouldn't negative be >1 and positive be <1?
13. We could see some trend in the length of paired regions and the intensity of the epistasis effect, but it was hard to tell whether the negative correlation between the median deleterious effects of single mutations and the minimum free energy of the paired regions was significant from the plot in Supplementary Figure 3 and the given Pearson Correlation = -0.53. Also, it may be good to address and explain the difference in the distribution of epistasis value of CPEB3 P1, P2, and P4, which all have 7 base pairs but are very different in the distribution of epistasis value.
Reviewer #2 (Recommendations for the authors):
Regarding the CPEB3 ribozyme, an open question is about the role of the U21-U42 base pair. Figure 1 indicates that positive epistasis has been measured for some sequence combinations. It would be very sound to frame this region of the heat map together with the T1 interaction to discuss the heuristic power of the presented approach since the CPEB3 ribozyme is the only ribozyme studied in this manuscript for which no crystal structure has yet been made available.
Reviewer #3 (Recommendations for the authors):
1) For example, the way the ribozymes have been randomized is already described in Kobori and Yokobayashi (2016). On the other side, it would be necessary to provide more experimental information on the way the template switching reverse transcription is performed. This is not well explained as there is no clear explanation for the usage of the TSO1-4 oligos… The authors should rewrite the experimental method section of their article so that anybody who wants to use this approach could do it without struggling…
2) While some of this 3D information is seldom mentioned in the text, there is no easy way for the reader to find this information in the data provided. A figure exemplifying some of the key 3D interactions of these ribozymes would be most useful.
For example, it would be nice to have more structural details by eventually showing some of the results within the context of the 3D structures of each ribozyme.
For instance, the Watson-Crick base pair interaction G1C-C46G between LA/LB in the hairpin ribozyme could be shown… The same thing with the base pair C20G-G25C in the hammerhead ribozyme. In fact, the 2D structure of the hammerhead ribozyme could be improved as it does not correspond to the active form.
The authors should also provide more data information (as well as supportive figures) about the tertiary interactions that involve non-canonical base pairs and that show positive epistasis… After all, this is a piece of information that has not yet been obtained before for these ribozymes as it cannot be easily obtained by other approaches.
3) The authors could have certainly enhanced dramatically the scope of their article by trying to validate the structure of a self-cleaving ribozyme for which the 3D structure is not known yet. This would have provided a clear test for their approach and would have enhanced dramatically their claim that it could complement chemical and enzymatic probing.
https://doi.org/10.7554/eLife.80360.sa1Author response
Reviewer #1 (Recommendations for the authors):
1. p8. The longer size of HDV does not explain the fewer reads for it because it is only slightly longer than the other ribozymes. What about the stability of the HDV ribozyme and RT not being able to read through it?
There are several factors that may have resulted in fewer reads for HDV. We did normalize the amount of RNA going into the RT reaction, but we did not quantify the amount of resulting cDNA going into the indexing PCR reaction. As you mentioned, the stability of HDV could have influenced this, but we are lacking any direct information. We used Fragment Analyzer data to quantify the amount of correct sized PCR DNA before pooling the samples, so some unequal mixing is possible, but unlikely. A very probable cause is that all samples were sequenced on a single flow cell. It is reported by Illumina that shorter sequences cluster more efficiently, likely due to biased bridge amplification on the flow cell, which would have favored the shorter ribozyme sequences. Finally, the doped oligo synthesis used to generate the library using a 3% mutation rate results in a mean number of mutations of 2.6 for HDV, whereas we would expect a mean number of mutations closer to 2 for the other ribozymes. This results in the library containing more sequences with three or more mutations, which were excluded from our analysis. The combined effects of lower numbers of single and double mutants, clustering bias and possible RT efficiency are all probably contributing to the lower coverage.
2. p8. What is the minimum number of reads per single mutant in HDV? double mutants? For the latter, the average is only 50 (Figure S1) and the minimum appears to be ~10 reads. Can reliable data be attained with so few reads? What is the statistical significance for low read singles and double mutations?
The minimum number of reads for single mutants for HDV is 1945, and the minimum number of reads for double mutants is 4. We acknowledge that sequences that have very few reads have lower confidence in the calculated fraction cleaved and relative activity. In fact, we found a weak correlation between the standard deviation of the fraction cleaved between replicates and the number of reads (pearson = -0.02). The lowest abundance reads have the least confidence in the mean relative activity, which we agree could affect some of our epistasis calculations because all of the lowest abundance reads are double mutants. We are confident in the single mutants, but some of the double mutants could be reported higher or lower due to low abundance. To address this concern, we inspected the data and found that there are only 36 HDV sequences that have 10 or fewer reads (0.1% of 33,669 double mutant genotypes). We therefore looked at the structural location of each pair of mutations. For 34 out of the 36 each of the two mutations were located in different structural regions, and therefore do not contribute to our observations of activity or epistasis within structural elements, which is the main analysis in the text. There were only two instances of low abundance reads where both of the mutations fell within a single structural element: (1) U12C and A16G are both in the same side of HDV P2, and do not form a base pair. The activity of the double mutant sequence is used to calculate one epistasis data point that is included in the “off diagonal” epistasis distribution presented in the figure. (2) U20C and U27G is a double mutant where both mutations reside within the loop L3. The activity of the double mutant is used to calculate one epistasis data point in the Terminal loop data used for comparison to the hairpin loop data in Figure 4D. Taken together we feel that the low abundance reads in the HDV data set do not contribute in any significant way to the analysis and conclusions in the manuscript.
It is also important to note that we took another precaution in all of our epistasis analysis. We were concerned that very low activity single mutants could lead to positive epistasis even when the double mutant had only slightly higher activity, but was still very low activity. We felt that this minor rescue effects were likely of little structural relevance. In our analysis, all epistasis values were evaluated based on the differences between single and double mutants, and likely “false positives” were filtered from our plots. To do this, we determined the distribution of all difference between double mutants and their comprising single mutations. We calculated the standard deviation of this distribution. We set epistasis to zero for instances where the difference between the double mutant and either of the single mutants was less than 1-3stdev of the overall distribution. This filtering approach has also been used by others referenced below.
1. Andreasson JOL, Savinov A, Block SM, Greenleaf WJ (2020) Comprehensive sequence-to-function mapping of cofactor-dependent RNA catalysis in the glmS ribozyme. Nature Communications, 11(1):1663. https://doi.org/10.1038/s41467-020-15540-1
3. p8. more precise language is needed. Suggestions below. These words do not have to be used but some should be provided to guide the reader.
"We plotted the relative activity value as heat maps (Figures 1-5 a large plot, shows only blue color)."
"We then used this data to calculate epistasis between pairs of mutations (figures 1-5 insets red to blue colors)".
Thank you for this suggestion, we agree more precise language would help the reader discern between the RA plot and the epistasis insets. We have edited the text on page 8 to state “(Figures 1-5 A, large plot)”, and “(Figures 1-5, insets, red to blue plots)”.
4. p8. "many paired regions showed an anti-diagonal line of high activity double mutant variants with strong positive epistasis". It would be helpful to dissect this anti-diagonal into two different distributions: WC+wobbles and mismatches. In other words, there are off-diagonal elements within the anti-diagonal square that are meaningful. This should provide an interesting sub-anativealysis in panel C; specifically, it will allow a look at double mutants that lead to the loss of a single base pair (off-diagonal elements within the anti-diagonal square) vs. a loss of two pairs (off-diagonal squares).
Thank you for this suggestion. We performed the suggested analysis and did find there was a significant difference between anti-diagonal elements which formed a WC base pair or GU wobble, from those that broke a single base pair, and these were also significantly different from off-diagonal mutations that broke 2 base pairs. We have updated panel C in Figures 1-5 with these distributions shown as violin plots, and have also revised the text on page 9 to reflect this new analysis.
5. Overall fraction cleaved varies wildly between the five ribozymes, from 0.19 to 0.68 (Table 1). It would be helpful to know what the fraction cleaved was for the wild-type ribozymes that emerged from the deep sequencing versus just making the wild-type ribozyme alone and measuring the bulk fraction cleaved from a radiolabeled experiment. This could help the authors discuss how sensitive or robust a given ribozyme is to mutations and then speculate why.
Thank you for this suggestion. We have updated Table 1 to indicate the fraction cleaved that we observed for the wild-type, for single, and for double mutants for each ribozyme.
6. Why is panel A different colors for different ribozymes? This seems unnecessary and random (e.g. two of them are blue).
Yes, we agree this is arbitrary. However, we would prefer to leave them colored as they are because they coordinate with several supplemental figures, and we believe it makes the data sets easier to differentiate. We chose color blind friendly colors, so we used two different blues that are easy to distinguish by color blind individuals.
7. p9. Not only were the epistasis values for on- anti-diagonal "consistently more positive than two mutations in positions that are not directly base-paired (off-diagonal)" the latter were often negative (at least the mean was), which should be stated. i.e. "more positive" implies both distributions were positive.
We agree that this language could be improved for clarity and to emphasize our observation of prevalent negative epistasis ‘off-diagonal’. We edited the text on page 9 to clarify these points.
8. Figure 4. The inset for LA/LA is not positioned correctly on the main figure. A28-A31 are should be left-shifted. Moreover, epistasis between LA and LB could not be judged, as the authors would like the reader to do on p12. The authors should provide an LA/LB inset for the reader to look at. Additionally, the authors would like the reader to think about interactions of 1 and 46 but don't draw the pair in panel B, making this hard to visualize. The "positive epistasis" for G42U and A64G is missing in the inset which is entirely white for this square. And the A47:G57 showing "positive epistasis for double mutants that result in an AU base pair" doesn't make sense because this can happen with a single mutant at G57U.
Thank you, we corrected the placement of the box for the LA/LA inset on the hairpin figure.
Moreover, epistasis between LA and LB could not be judged, as the authors would like the reader to do on p12. The authors should provide an LA/LB inset for the reader to look at. Additionally, the authors would like the reader to think about interactions of 1 and 46 but don't draw the pair in panel B, making this hard to visualize.
In order to help readers visualize the interaction between G1 and C46, we have drawn a dashed line between these positions on the secondary structure in Figure 4. In addition, we have generated a new supplementary figure (S.Figure 5) that we think better highlights our epistasis data as it corresponds to the loop-loop interactions in hairpin. This figure contains a secondary structure of the loopA-loopB interactions that depicts which bases have been previously shown to interact (panel A), a crystal structure rendering of the loop loop interaction (panel B), as well as the LoopA-LoopB epistasis insets (panel C). We think that the linking of Supplemental figures to main figures in eLife should make this easily accessible for the reader.
The "positive epistasis" for G42U and A64G is missing in the inset which is entirely white for this square. And the A47:G57 showing "positive epistasis for double mutants that result in an AU base pair" doesn't make sense because this can happen with a single mutant at G57U.
You are correct that there were errors in some of our examples of epistasis in these loops. Thank you for catching these mistakes! We have closely re-inspected epistasis in these areas and have updated the description of the results on page 13 of the manuscript. Regarding the A47:G57 mutations that result in an AU basepair, you are correct that the A47:G57U would only be a single mutant, and would not exhibit epistasis. We have edited this to simply state we see positive epistasis for the A47G:G57U mutations, and have removed the language stating that this forms an AU basepair.
9. p14. The authors give details on epistasis between positions 20 and 25 but nothing is shown in the off-diagonals and this data cannot be had on the main diagram in A.
We agree that this data should be shown in the main figure for hammerhead. We have added an inset to show epistasis values for the nucleotides in the CUGA motif that contains this interaction. We have also added a panel (D) to the figure showing a crystal structure with this base pair interaction highlighted.
10. p10. The discussion that "differences between epistasis in short and long base-paired regions suggests that the thermodynamic stability of each paired region is important for the observed activity" is clearly supported by the data but should be explained. Short regions cannot withstand the loss of one and especially two base pairs because they are short and once broken cannot be broken again. Bevilacqua and Herschlag have discussed this and could be consulted and referenced. 1. Moody, E. M. and Bevilacqua, P. C. (2003). Folding a stable DNA motif involves a highly cooperative network of interactions. J. Am. Chem. Soc. 125, 16285-16293. 2. Kraut, D. A., Carroll, K. S. and Herschlag, D. (2003). Challenges in enzyme mechanism and energetics. Annu. Rev. Biochem. 72, 517-571.
Thank you for this suggestion, we added those references and your suggested explanation. We agree that for regions that are less thermodynamically stable, mutational effects seem more extreme. Our data shows that this does not result in more negative epistasis, but rather a shift towards more positive epistasis for compensatory double mutations, and that appears to be responsible for the large differences in the epistasis distributions for these areas. We also clarified the text to explain this.
11. Methods. The authors did reverse transcription. From Table S1, it appears that extra bases were appended to the 3'end of each ribozyme to provide an RT primer binding site [it appears this way from the first unbolded bases listed at the 5'-end of the template in Table S1]. If so how do those bases affect the fraction cleaved of the wild-type? And then the mutants? See comment 5 above. Also, I was unable to understand how the sequence of the RT primer in Table S1 works with the templates.
Thank you for your close inspection of the oligos used. Our templates did contain additional bases to provide an RT primer site. The sequence used was developed by others as a primer binding site for RNA SHAPE experiments. It was reported in the literature that the sequence did not interfere with the folding of several different RNA molecules. To prevent structural interference, the sequence is designed to have 2 internal hairpins each stabilized by a UUCG tetraloop which are separated by short dinucleotide linkers to reduce stacking of the hairpins on the internal RNA structure. Because it is on the 3’-end of the RNA, and we are studying co-transcriptional self-cleavage, the ribozyme structure has a kinetic advantage as well because it is transcribed first and has the opportunity to fold before the primer binding site is transcribed. We do not have any direct information on how this sequence may affect the fraction cleaved of every mutant sequence, but have observed high observed rates of self-cleavage for the wild type (see table 1). In previous publications with the CPEB3 ribozymes, we observed no difference in activity with and without the primer binding site (unpublished data). We have added the following reference to the manuscript because it describes the design and effectiveness of the primer binding sequence in SHAPE experiments:
Wilkinson KA, Merino EJ, Weeks KM. 2006. Selective 2′ -hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nat Protoc 1: 1610–1616. doi:10.1038/nprot.2006.249 Wu L, Wen C, Qin Y, Yin H, Tu Q, Van N
Also, I was unable to understand how the sequence of the RT primer in Table S1 works with the templates.
The RT primer listed in the table was an erroneous duplication of our TSO4 sequence. The sequence was replaced with the correct RT primer sequence. Thank you for finding this mistake.
12. In the epistasis equation, which should be numbered, what are the authors taking the log of? RA? Shouldn't there be a plus sign between the two terms in the denominator? For e.g. assume additivity RAi = RAj = 10^-1, and RAij = 10^-2, we need a plus sign to have this come out to unity; as it would come out to -2. Also, how does it follow that negative epistasis is less than 0? Shouldn't negative be >1 and positive be <1?
Thank you for catching this, we made a mistake when writing the equation we used to calculate epistasis in the manuscript. We have corrected it in the manuscript and also double checked our code that we used to calculate epistasis. We found that the code was correct, so this was just a typo that is now corrected. Because there are only a few equations reported and none are referenced elsewhere in the manuscript, we did not number them. We will defer to the editorial preference of the journal.
13. We could see some trend in the length of paired regions and the intensity of the epistasis effect, but it was hard to tell whether the negative correlation between the median deleterious effects of single mutations and the minimum free energy of the paired regions was significant from the plot in Supplementary Figure 3 and the given Pearson Correlation = -0.53. Also, it may be good to address and explain the difference in the distribution of epistasis value of CPEB3 P1, P2, and P4, which all have 7 base pairs but are very different in the distribution of epistasis value.
We have added a trendline to the plot, and have added the p = value of 0.029 to show significance. CPEB3 P1 appears to be more sensitive to the first mutation, likely because of proximity to the cleavage site. We have added this to the discussion of thermodynamic stability starting on p. 10.
Reviewer #2 (Recommendations for the authors):
Regarding the CPEB3 ribozyme, an open question is about the role of the U21-U42 base pair. Figure 1 indicates that positive epistasis has been measured for some sequence combinations. It would be very sound to frame this region of the heat map together with the T1 interaction to discuss the heuristic power of the presented approach since the CPEB3 ribozyme is the only ribozyme studied in this manuscript for which no crystal structure has yet been made available.
Thank you for this suggestion. We have taken a closer look at our data for single and double mutants involving the analogous U21 and U38 for our construct used. We believe that our data does support the presence of this flexible non-WC base pair interaction as described in Skilindat, et al. 2016. For nearly all double mutations involving a U->A mutation, we observe negative epistasis. This could be caused by single U->A mutations forming a stable AU basepair, and exhibiting an increase (or no decrease) in RA for those single mutants. Therefore, double mutations involving a U->A mutation that do not form a BP will disrupt this favorable single mutant interaction and would be expected to exhibit negative epistasis. Additionally, we do see weak positive epistasis for the U21C:U38G double mutant, which could hypothetically form a GC basepair. While we did not observe positive epistasis for the U21G:U38C double mutant, we believe this could be influenced by the effects of single mutations going through a GU wobble.
Worth noting however, is that this basepair interaction is not as sensitive to single and double mutations as we would expect for such a short paired region. Most single and double mutations to either of these positions do not result in a significant reduction in self-cleavage activity. In fact, the most sensitive single mutation is U38G, which we would expect to be a favorable mutation if it formed a GU wobble with U21. Skilindat, et al. 2016 found that U38 is a site of Mg2+ binding. Our data suggests that this Mg2+ binding interaction is not necessarily nucleotide specific, as U38G is the only deleterious single mutation at this position.
We have expanded the inset for T1 for CPEB3 to show the epistatic effects of all double mutations between these two positions, and have indicated the presence of that interaction on the secondary structure in figure 1. We have also added a discussion to the text on page 15.
Reviewer #3 (Recommendations for the authors):
1) For example, the way the ribozymes have been randomized is already described in Kobori and Yokobayashi (2016). On the other side, it would be necessary to provide more experimental information on the way the template switching reverse transcription is performed. This is not well explained as there is no clear explanation for the usage of the TSO1-4 oligos… The authors should rewrite the experimental method section of their article so that anybody who wants to use this approach could do it without struggling…
Thank you for your comment, we have added the appropriate Kobori and Yokobayashi reference for our doped oligo synthesis library design to our methods section.
Our lab has previously published a manuscript detailing the template switching reverse transcription using phased oligos methods in the RNA journal, we have made sure to add a clear citation that can be referenced by those wishing for more detailed information to use this approach.
Bendixsen DP, Roberts JM, Townshend B, Hayden EJ. 2020. Phased nucleotide inserts for sequencing low-diversity RNA samples from in vitro selection experiments. RNA 26:1060–1068. doi:10.1261/rna.072413.119
2) While some of this 3D information is seldom mentioned in the text, there is no easy way for the reader to find this information in the data provided. A figure exemplifying some of the key 3D interactions of these ribozymes would be most useful.
For example, it would be nice to have more structural details by eventually showing some of the results within the context of the 3D structures of each ribozyme.
For instance, the Watson-Crick base pair interaction G1C-C46G between LA/LB in the hairpin ribozyme could be shown.
Thank you for this suggestion. We agreed that our manuscript would be improved by the incorporation of 3D figures exemplifying the key 3D interactions that we discussed. We have added a new supplementary figure 5 which highlights the 3D interactions within and between the internal loops of hairpin. This figure includes a secondary structure depiction of these specific interactions, as well as a crystal structure rendering that shows the loop-loop interactions and highlights the G1-C46 basepair. The same thing with the base pair C20G-G25C in the hammerhead ribozyme. We have also added a new panel D to the hammerhead figure (5), which shows a crystal structure for hammerhead highlighting the C20G-G25C basepair interaction.
In fact, the 2D structure of the hammerhead ribozyme could be improved as it does not correspond to the active form…
We are not aware of a more accurate 2D depiction of the hammerhead ribozyme, as our version is shown in the form that allows for the tertiary contact between L1 and L2.
The authors should also provide more data information (as well as supportive figures) about the tertiary interactions that involve non-canonical base pairs and that show positive epistasis… After all, this is a piece of information that has not yet been obtained before for these ribozymes as it cannot be easily obtained by other approaches…
We have added all of our epistasis data for double mutants to the new supplementary figure 5 highlighting the loop-loop interaction in hairpin. We believe this will allow readers to explore instances of positive epistasis measured for the tertiary interactions involving non-canonical pairs in the hairpin loops.
3) The authors could have certainly enhanced dramatically the scope of their article by trying to validate the structure of a self-cleaving ribozyme for which the 3D structure is not known yet. This would have provided a clear test for their approach and would have enhanced dramatically their claim that it could complement chemical and enzymatic probing.
Thank you for this comment. We do agree this would have strengthened the manuscript, but is beyond the scope of the work contained herein.
https://doi.org/10.7554/eLife.80360.sa2Article and author information
Author details
Funding
National Science Foundation (OIA-1738865)
- Eric J Hayden
National Science Foundation (OIA-1826801)
- Eric J Hayden
National Aeronautics and Space Administration (80NSSC17K0738)
- Eric J Hayden
Human Frontier Science Program (RGY0077/2019)
- Eric J Hayden
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
The authors acknowledge funding from the National Science Foundation (EH, Grant numbers OIA-1738865 and OIA-1826801), the National Aeronautics and Space Administration (EH, Grant number 80NSSC17K0738), and the Human Frontier Science Program (EH, Ref no.: RGY0077/2019).
Senior Editor
- James L Manley, Columbia University, United States
Reviewing Editor
- Timothy W Nilsen, Case Western Reserve University, United States
Reviewers
- Philip Bevilacqua, Pennsylvania State University, United States
- Benoît Masquida, Centre National de la Recherche Scientifique, France
Publication history
- Preprint posted: May 17, 2022 (view preprint)
- Received: May 18, 2022
- Accepted: December 28, 2022
- Accepted Manuscript published: January 19, 2023 (version 1)
- Version of Record published: February 6, 2023 (version 2)
Copyright
© 2023, Roberts et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 656
- Page views
-
- 120
- Downloads
-
- 0
- Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Biochemistry and Chemical Biology
- Chromosomes and Gene Expression
In nucleosomes, histone N-terminal tails exist in dynamic equilibrium between free/accessible and collapsed/DNA-bound states. The latter state is expected to impact histone N-termini availability to the epigenetic machinery. Notably, H3 tail acetylation (e.g. K9ac, K14ac, K18ac) is linked to increased H3K4me3 engagement by the BPTF PHD finger, but it is unknown if this mechanism has a broader extension. Here, we show that H3 tail acetylation promotes nucleosomal accessibility to other H3K4 methyl readers, and importantly, extends to H3K4 writers, notably methyltransferase MLL1. This regulation is not observed on peptide substrates yet occurs on the cis H3 tail, as determined with fully-defined heterotypic nucleosomes. In vivo, H3 tail acetylation is directly and dynamically coupled with cis H3K4 methylation levels. Together, these observations reveal an acetylation ‘chromatin switch’ on the H3 tail that modulates read-write accessibility in nucleosomes and resolves the long-standing question of why H3K4me3 levels are coupled with H3 acetylation.
-
- Biochemistry and Chemical Biology
- Medicine
Extracellular vesicles (EVs) are released by all cells into biofluids such as plasma. The separation of EVs from highly abundant free proteins and similarly sized lipoproteins remains technically challenging. We developed a digital ELISA assay based on Single Molecule Array (Simoa) technology for ApoB-100, the protein component of several lipoproteins. Combining this ApoB-100 assay with previously developed Simoa assays for albumin and three tetraspanin proteins found on EVs (Ter-Ovanesyan, Norman et al., 2021), we were able to measure the separation of EVs from both lipoproteins and free proteins. We used these five assays to compare EV separation from lipoproteins using size exclusion chromatography with resins containing different pore sizes. We also developed improved methods for EV isolation based on combining several types of chromatography resins in the same column. We present a simple approach to quantitatively measure the main impurities of EV isolation in plasma and apply this approach to develop novel methods for enriching EVs from human plasma. These methods will enable applications where high-purity EVs are required to both understand EV biology and profile EVs for biomarker discovery.