1. Biochemistry and Chemical Biology
Download icon

Random-sequence genetic oligomer pools display an innate potential for ligation and recombination

  1. Hannes Mutschler  Is a corresponding author
  2. Alexander I Taylor
  3. Benjamin T Porebski
  4. Alice Lightowlers
  5. Gillian Houlihan
  6. Mikhail Abramov
  7. Piet Herdewijn
  8. Philipp Holliger  Is a corresponding author
  1. MRC Laboratory of Molecular Biology, United Kingdom
  2. Katholieke Universiteit Leuven, Belgium
Research Article
  • Cited 0
  • Views 934
  • Annotations
Cite this article as: eLife 2018;7:e43022 doi: 10.7554/eLife.43022

Abstract

Recombination, the exchange of information between different genetic polymer strands, is of fundamental importance in biology for genome maintenance and genetic diversification and is mediated by dedicated recombinase enzymes. Here, we describe an innate capacity for non-enzymatic recombination (and ligation) in random-sequence genetic oligomer pools. Specifically, we examine random and semi-random eicosamer (N20) pools of RNA, DNA and the unnatural genetic polymers ANA (arabino-), HNA (hexitol-) and AtNA (altritol-nucleic acids). While DNA, ANA and HNA pools proved inert, RNA (and to a lesser extent AtNA) pools displayed diverse modes of spontaneous intermolecular recombination, connecting recombination mechanistically to the vicinal ring cis-diol configuration shared by RNA and AtNA. Thus, the chemical constitution that renders both susceptible to hydrolysis emerges as the fundamental determinant of an innate capacity for recombination, which is shown to promote a concomitant increase in compositional, informational and structural pool complexity and hence evolutionary potential.

https://doi.org/10.7554/eLife.43022.001

Introduction

The phenotypic richness of a biopolymer pool, that is the density of functional sequences within the sequence space (the totality of all available or possible sequences), is a central, if largely unexplored parameter of its evolutionary potential. This property critically informs many aspects of biological function including the adaptability of biological systems as a function of combinatorial diversity ranging from bacterial populations to the mammalian immune system and to abiotically-generated sequence pools. In all such cases, ‘survival’ (in a Darwinian sense) depends on the availability of functional entities (such as a neutralizing antibody) within the combinatorial space available to the system (such as the number of circulating B-lymphocytes or the size of a phage display repertoire). The same concept is also of central importance to the proposed spontaneous emergence of function from prebiotic pools of informational polymers, a key conjecture of current scenarios for the origin of life. Indeed, the phenotypic richness of random-sequence pools of the natural nucleic acids RNA and DNA, as well as a variety of xeno nucleic acids (XNAs), has been studied by in vitro selection methods reliant on iterative rounds of enrichment and amplification, yielding ligands (aptamers) and catalysts (Wachowius et al., 2017). Extrapolation from such selection experiments and analysis of the mutational landscapes of functional oligomers indicated a low frequency of such functional molecules (1 in 1010-1013) (Wilson and Szostak, 1999), suggesting that, globally, the pools are essentially inert, although weak ligands/catalysts might be more frequent (Ekland et al., 1995; Knight et al., 2005). Together, experimental data and theoretical considerations suggest that function scales with sequence (i.e. information) and structural complexity (or a related property, termed ‘functional information’ (Carothers et al., 2004)). Thus, sequences with any given function (and among these, those with highest functionality) are more frequently found in longer, informationally and compositionally more complex, and by inference structurally more diverse pools, despite a potential trade-off through increased misfolding (Ekland et al., 1995; Sabeti et al., 1997; Carothers et al., 2004; Legiewicz et al., 2005; Jiménez et al., 2013). In agreement with this, in silico models predict a sharp drop in the propensity of both folded structures and structural complexity in RNA pools with shorter oligomer lengths (<25 nucleotides, nt) (Briones et al., 2009). Consequently, the pools of short, random RNA oligomers of mixed sequence composition (<20 nt) accessible from prebiotic chemistry (Monnard et al., 2003) would be predicted to contain few defined three-dimensional structures and even fewer functional sequences. This raises the question how functions that would constitute a living system could ever have emerged spontaneously from such pools.

Starting from the premise that the global potential of oligomer pools for function may have been underestimated by an undue focus on single ‘hit’ sequences (i.e. fitness peaks), we have sought to examine the innate, global functional potential of the pools themselves, which has never been measured directly. Specifically, we examine pools of short RNA oligomers of semi- and fully-random sequence (with or without activation chemistry) and compare them to equivalent pools of chemically related but distinct genetic polymers, including DNA as well as the xeno nucleic acids (XNAs) ANA (arabino-), HNA (hexitol-) and AtNA (altritol-nucleic acids), with regards to a simple functional test: the ability to undergo intermolecular ligation and/or recombination. While DNA, ANA and HNA pools proved inert, we discover a pervasive tendency of diverse-sequence RNA (and to a lesser extent AtNA) pools to undergo ligation and/or recombination even in the absence of extraneous activation chemistry, and analyse the constitution and emergent properties of recombined pools by deep sequencing, bioinformatic analysis and in silico simulation. Our results suggest that RNA (and to a lesser extent AtNA) hold a privileged position among simple alternatives for genetic information storage and propagation due to an innate capacity for recombination enabled by energetically near-neutral transesterification chemistry (Eftink and Biltonen, 1983; Usher and McHale, 1976; Verlander et al., 1973; Verlander and Orgel, 1974). Furthermore, our results demonstrate that such spontaneous, non-enzymatic recombination among random-sequence pool members acts to counterbalance hydrolytic decomposition and in the process, progressively increases pool compositional, informational and structural diversity.

Results

We chose to specifically examine 1015-member pools of eicosamer (20 nt long) RNAs. While still within the range of oligomer lengths likely to be accessible via prebiotic chemistry (Ferris et al., 1996; Monnard et al., 2003), 20-mer sequences are predicted to exhibit at least some tendency for secondary structure formation (Briones et al., 2009). Furthermore, a 1015-member eicosamer pool (1.7 nmol) comprises significant redundancy (909 copies) of each possible 20-mer sequence. Finally, as shown previously, naturally occurring 20-mer sequences can associate into non-covalent multimer complexes reconstituting function such as catalysis, when incubated in eutectic ice phases at −9 °C, as exemplified by the hairpin ribozyme (Mutschler et al., 2015; Vlassov et al., 2004).

In order to examine the innate potential of such pools for functionality (such asligation), we first examined the reactivity of 2’, 3’-cyclic phosphate (>p)-activated random RNA pools. >p groups form as part of RNA degradation reactions, as well as during proposed prebiotic nucleotide synthesis (Jarvinen et al., 1991; Li and Breaker, 1999; Powner et al., 2009; Gibard et al., 2018) and can mediate non-enzymatic ligation in the presence of an organizing template (Lutay et al., 2006; Usher and McHale, 1976). In order to maximize reactivity, we incubated pools in ice; eutectic ice phase formation has been shown to promote molecular concentration, reduce RNA hydrolysis, and provide a benign, compartmentalized medium, making water ice a potentially beneficial medium for RNA propagation at the origin of life (Attwater et al., 2010; Attwater et al., 2013; Mutschler et al., 2015). We first incubated a >p activated random RNA eicosamer pool (N20>p, including 6% 5’-Carboxyfluorescein (FAM) labelled N20>p to facilitate detection of ligation products) in ice without any organizing template and sought to detect higher molecular weight (MW) product bands by gel shift in denaturing gel electrophoresis (Urea-PAGE) (Figure 1A). We observed significant intermolecular, covalent N20>p pool ligation with yields of ~10% (after 2 months, −9 °C) both in the presence or absence of magnesium (+ / - Mg2+) counterions (10 mM MgCl2, Supplementary file 1). No ligation was observed in equivalent control samples incubated at −80 °C (Figure 1B). Deep sequencing of the most prominent higher MW band confirmed that >80% of sequences corresponded to ligation events between two eicosamers (N20 × N20, Figure 1C). Thus, RNA eicosamer pools activated by >p undergo a disproportionation reaction by intermolecular ligation, yielding a subpopulation of pool sequences extended in length (predominantly 40-mers and some 60-mers (Figure 1C)).

Figure 1 with 2 supplements see all
Reactivity of 2’, 3’ cyclic phosphate (>p) activated N20 RNA pools.

(A) N20>p pools are incubated in eutectic ice phases, spiked with 6% 5’FAM (Carboxyfluorescein)-labelled N20>p to facilitate detection of ligation products (orange) by Urea-PAGE. (B) Scan of a denaturing gel for FAM-labelled products after 57 days of incubation either in eutectic ice (−9 °C) or at −80 °C, and +/- MgCl2. Note that ligation of FAM-N20>p is inhibited due to the blocked 5’-OH. (C) Densitogram of a SYBR-gold stained Urea-PAGE gel trace (-MgCl2, −9 °C) (bottom panel) showing total ligation (both FAM-labelled and unlabelled ligation products (indicated by arrows)). The size distribution and size of the main ligation product (40mer) from deep sequencing indicates direct ligation of two eicosamers (inset). (D) Upper panel: Average nucleotide distribution profile of the 40mer ligation products. Lower panel: Changes in nucleotide composition compared to the unligated input N20>p pool. Black arrows indicate the ligation site. (E) Upper panel: Average base-pairing frequencies of sequenced 40mers (as predicted by RNAfold (Lorenz et al., 2011): ‘Opening’ base pairs (obp, green; defined as hybridization to a downstream nucleotide) and ‘closing’ base pairs (cpb, hybridization to an upstream nucleotide, orange). Lower panel: Average difference of predicted base pairing between experimental pool and synthetic sequence pools (N = 3; 300,000 sequences each) generated in silico using experimental nucleotide frequencies shown in panel D. Grey indicates predicted unpaired bases (unp), black standard deviations.

https://doi.org/10.7554/eLife.43022.002

Comparison of the 40mer ligation product sequences with the sequence distribution of pre-ligation eicosamers revealed a clear nucleotide signature at the N-1pN+1 ligation junction (predominantly CpN, with UpG also overrepresented) (Figure 1D, Figure 1—figure supplement 1A). While the mechanistic basis for this bias remains to be elucidated, we hypothesize that the stable pairing and the slightly higher reactivity of pyrimidine nucleoside 3′-phosphodiesters (hydroxide ion catalysed cleavage of 3′, 5′-YpN is ca. 2x-3x faster than 3′, 5′-RpN cleavage (Oivanen et al., 1998)) aides >p dependent ligation. Regardless of the mechanism, the resulting sequence fingerprint allows an assignment of ligation junctions and reaction trajectories proceeding through an analogous, putative >p like intermediate or transition state in diverse RNA contexts (see below).

Other global features of the ligation junction are notable. In silico RNA folding using RNAfold (Lorenz et al., 2011) comparing experimental versus computer-generated sequence pools using the same post-ligation nucleotide distribution (shown in Figure 1D) indicates a significant enrichment of secondary structure in the ligated pool with ca. 55% of intermolecular ligation junctions located within helical regions (Figure 1E). In accordance with previous studies on >p reactivity, this suggests that ligation is accelerated by inter-strand hybridization, which positions reactive groups in proximity and increases their effective concentration, while reducing the entropic cost of intermolecular ligation. In support of this, we found evidence for extensive intramolecular association and non-covalent complex formation in random eicosamer pools as judged by non-denaturing gel electrophoresis (Figure 1—figure supplement 2). Thus, random pools contain a significant proportion of organizing templates through intermolecular hybridization among pool oligomers and this provides gapped duplex junctions poised for ligation (Lutay et al., 2006; Usher and McHale, 1976).

We wondered if these pools contained RNA sequence motifs that would promote ligation by mechanisms other than gapped duplex formation. To aiddiscovery of specific motifs, we repeated the ligation experiments using semi-random RNA sequence pools comprising random eicosamers (N20) flanked by (unstructured) 20 nt constant sequence primer binding sites (C120, C220). The use of such ‘bait’ (C120-N20>p, B>p) and ‘prey’ (5’OH-N20-C220, OH-P) oligonucleotides (2.5 µM each) enabled rapid detection and analysis of intermolecular bait × prey (B × P) ligation products with high sensitivity by RT-PCR using primers complementary to C120 and C220 as well as much shorter incubation times than needed for direct detection (Figure 2A,B and Figure 2—figure supplement 1A). Deep sequencing of RT-PCR products revealed ligation of semi-random RNA sequence pools (40 nt) into 80 nt B-P ligation products (5’-C120-N20-N20-C220-3’), with very similar features as in the fully-random pool, including the above described CpN ligation junction ‘fingerprint’ and the increased base-pairing bias at the ligation site (Figure 1—figure supplement 1B, Figure 2—figure supplement 1B). Indeed, when we randomly sampled clones from the ligated fraction and analysed the ability of the constituent B, P segments of the most active clone for ligation, activity depended on formation of a gapped duplex junction with distal sequences having little impact on ligation (Figure 2—figure supplement 2).

Figure 2 with 6 supplements see all
Reactivity of semi-random RNA pools with different termini.

(A) Assembly scheme: The random N20 regions of both semi-random RNAs are illustrated in orange (bait) and cyan (prey). Only direct ligation products comprising the constant primer binding site segments 5’-C120 (dark grey) and 3’-C220 (light grey) are detected by RT-PCR. (B) RT-PCR products for bait>p × 5’OH-prey reactions after incubation at −9 °C or −80 °C for 51 days. Shorter PCR products recovered at −80 °C result from unspecific annealing of the RT-primer in the 3’-random region of the bait fragment (Figure 2—figure supplement 1A). (C, D) Ligation frequencies for motif J (C) respectively motif H (D) in bait>p × 5’OH-prey products. Motif J is defined by four unpaired bases flanked by 3 nt helical regions. The plot shows the normalized amount of ligation sites located between two adjacent nucleotides within motif J. Although the ligation frequency generally drops in regions predicted to be unpaired, in submotifs J4 and H4 it is enriched (shown in red). (E) Consensus motif for J4 extracted from the sequence peak in (C). The constant regions form the strand opposite of the ligation site (green letters). (F) Consensus motif for H4. As for J4, the ligation site comprises parts of the constant region (green). (G) A minimal RNA construct with the J4 core-motif catalyses bimolecular ligation under frozen conditions in ice (-MgCl2, cyan; +10 mM MgCl2, dark blue) as well as under ambient temperature (17 °C) in presence of MgCl2 (brown). No ligation under ambient conditions is observed in absence of MgCl2 (light brown). (H) H4 catalyses slow 3’−5’ ligation reactions exclusively under frozen conditions and in absence of Mg2+ (blue).

https://doi.org/10.7554/eLife.43022.007

Applying a more stratified analysis, we searched for other RNA motifs than a gapped duplex that may mediate RNA ligation by comparing ligated B-P-sequences from two different semi-random RNA pools (differing in their C20 sequences, Supplementary file 2). The use of different C20 anchor sequences (C320, C420) allowed us to dissociate motif structure from sequence-dependent effects such as the imprint of pairing with the conserved C20 termini. We clustered ligation junction structures according to (predicted) minimal free energy using 408 different secondary structure sub-motifs based on 25 main-motifs (Motif A - Motif Y) including internal loops and hairpins (Figure 2—figure supplement 3). While ligation frequencies of most of these motifs broadly correlated with predicted base pairing frequency at the ligation junction (consistent with formation of a helical junction and proximity ligation, see above), we identified high ligation frequencies at specific positions in the unpaired regions of some predicted internal loop motifs (motif H and J, Figure 2C,D). The consensus sequence of these submotifs, H4 and J4, could subsequently be extracted using the motif discovery tool DREME (Bailey, 2011). In both motifs, internal loop segments are formed opposite the putative ligation junctions with involvement of sequences from the C20 constant regions, which seemed to enhance ligation (and therefore enrichment) of compatible junctions from a subpopulation of random segments (Figure 2E,F).

Strikingly, motif J4 and H4 were sufficient to promote regioselective ligation in minimal hairpin constructs (Figure 2G,H, Figure 2—figure supplements 4 and 5): A minimized version of the J4 motif promoted reversible ligation via 2’−5’ regiochemistry both in ice (in the absence of Mg2+) and at ambient conditions (strictly Mg2+-dependent) with an up to 10-fold enhanced apparent ligation rate (kobs) compared to a simple gapped duplex reaction under the same conditions (Figure 2—figure supplement 6). A search for the J4 consensus motif in the JAR3D structure database (Zirbel et al., 2015) revealed that J4-like motifs are present in in naturally occurring RNAs such as the domain II of the 23S rRNA of Deinococcus radiodurans (Harms et al., 2001) and the Escherichia coli adenosylcobalamin riboswitch (Johnson et al., 2012), for which structural information is available. In both structures, the J4-like motifs form a stable pseudohelical structure comprising a purine-rich 4 × 4 internal loop with a triple-sheared GA motif that immediately suggests a likely catalytic involvement of a cross-strand adenine in the reversible 2’−5’ ligation reaction (Figure 3). By contrast, no structural information is currently available for the H4 motif, which mediates ligation via canonical 3’−5’ regiochemistry (exclusively under eutectic ice conditions (−9 °C) in the absence of Mg2+) albeit with a 2- to 3-fold slower kobs than an equivalent gapped duplex-mediated reaction (Figure 2—figure supplement 4, Figure 2—figure supplement 6). Thus, random pools comprise a wide variety of RNA motifs capable of promoting >p ligation via either 2’−5’ or 3’−5’ regiochemistry.

Structure of J4-like motifs that occur in natural RNAs.

(A) J4 likely forms a purine-rich 4 × 4 internal loop classified as a triple-sheared GA motif (RNA 3D Motif Atlas id IL_93568.6) by JAR3D (Zirbel et al., 2015), which also occurs naturally in the 23S rRNA of Deinococcus radiodurans (Harms et al., 2008) (PDB ID: 2ZJR) and the Escherichia coli adenosylcobalamin riboswitch (Johnson et al., 2012) (PDB ID: 4GMA). (B) Consecutive GA base pairs in such internal loops have a strong stabilizing effect on the pseudohelical structure (Chen and Turner, 2006). Shown are the consensus motif (top) and the PDB overlay of the 4GMA (yellow) and 2ZJR (green) substructures. Note, the canonical 3’−5’ linkage between the G-1 and G+1 equivalents in 4GMA and 2ZJR is a 2’−5’ linkage in ligated J4. (C) In 2ZJR, the central A (red) of the GAAG coding strand forms a sheared AG base pair with the first G of the opposite CGGA motif (G-1 in J4) indicating that a similar conformation could exist in J4. The adenine moiety of the central A (red) is in close proximity to the 2’OH of the G-1 equivalent suggesting a potential catalytic activation of the 2’OH nucleophile in the reversible transesterification of the 2’−5’ phosphodiester bond between G+1 and G-1 in J4 through H-bonding.

https://doi.org/10.7554/eLife.43022.016

Next, we sought to dissect mechanistic modes of intermolecular RNA ligation using binary combinations of semi-random RNA bait x prey (B × P) sequence pools (2.5 µM each) each bearing different 2’, 3’- or 5’-substituents (Figure 4A). As described above, we observed efficient and rapid ligation of B>p × 5’OH-P pools (and no ligation in negative controls incubated at −80 °C), as judged by RT-PCR (Figure 4B). Furthermore, we also observed intermolecular assembly in 2’- (or 3’-) phosphorylated B-2’p (or B-3’p) × 5’OH-P reactions. However, these predominantly yielded shorter products, varying between 40–80 nt in length (Figure 4B,C), while still displaying the characteristic CpN ligation ‘fingerprint’ suggestive of reaction via a >p intermediate (Figure 4D, Figure 1—figure supplement 1E). Thus, the CpN sequence signature allowed mapping of presumed ligation junctions, which revealed variable-length assembly products derived from progressive truncation of bait segments, while preserving prey segment length (Figure 4D, Video 1), suggesting a mechanism involving intermolecular recombination (rather than ligation). Recombination likely proceeds via a transesterification mechanism involving a nucleophilic attack of the prey 5’-OH on bait segments either directly (leading to internal bait segment cleavage) or on the >p of bait segments truncated by hydrolysis (Figure 4—figure supplement 1). Consistent with this mechanistic hypothesis, blocking the prey pool 5’-OH by phosphorylation (5’p-P, Figure 4A,B) abolished recombination (Figure 4B,C). Furthermore, qRT-PCR suggests that recombination proceeds with a >10 fold slower rate than direct >p dependent ligation (Figure 4—figure supplement 2), consistent with a putative multi-step reaction trajectory involving a slow, hydrolytic generation of truncated, activated bait (B>p) intermediates for ligation, as well as potentially direct in-line attack in a subset of reactions (Figure 4—figure supplement 1).

Figure 4 with 4 supplements see all
Reactivity of semi-random sequence RNA pools with different 2’, 3’- or 5’-substituents

(A) Binary combinations of semi-random sequence bait and prey RNA pools and their reactivity as visualised by RT-PCR (B) after incubation at −9 °C or −80 °C (51 days). As in Figure 2B, smaller PCR products recovered at −80 °C (C) result from unspecific annealing of the RT-primer in the 3’-random region of the bait fragment. (C) Product size distributions from deep sequencing of bait>p × 5’OH-prey (green), bait-2’/3’p × 5’OH-prey (dark red), bait-OH × 5’p-prey (grey) and bait-OH × 5’OH-prey (light red) co-incubation reactions. (D) Nucleotide signature for products created by recombination of bait-2’/3’p × 5’OH-prey. The frequency of the four nucleotides is represented by a linear combination of RGB values (C-blue, U-red, A-green, G-yellow) resulting in mixed colours in random segments (Material and methods). The characteristic CpN signature at the presumed recombination site (arrow) indicates that the 5’-end of the full-length prey ligates to progressively truncated fragments of >p-activated bait suggesting recombination between bait and prey RNAs in both 2’, 3’ phosphorylated (bait-2’/3’p × 5’OH-prey) and unmodified RNA (bait-OH × 5’OH-bait) reactions.

https://doi.org/10.7554/eLife.43022.017
Video 1
Animated scan through all sequenced products of different length of a C120-N20-2’/3’p (bait) × 5’OH-N20-C220 (prey) recombination reaction.

The nucleotide signature of the different-length products indicate that they were predominantly formed through ligation of full-length prey to truncated and therefore >p activated bait segments.

https://doi.org/10.7554/eLife.43022.025

In order to explore the consequence of alternative activating groups, prey pool activation with a 5’-triphosphate (5’ppp-P) was also examined (Figure 4—figure supplement 3). These yielded an even broader length distribution of products, comprising full-length 80 nt B-P ligation products (likely originating by direct nucleophilic attack of the bait pool terminal bait-3’OH (or −2’OH) attack on the 5’ppp of the prey pool), as well as a wide range of shorter products, derived from both B and P truncationlikely as a result of non-canonical, multi-step reaction trajectories. Indeed, we note that our method of RT-PCR amplification omits the detection of any potential intermolecular recombination or ligation product that does not display a B-P sequence arrangement. Therefore, bait or prey-pool self-ligation (or recombination) (yielding B-B, P-P type products) or reverse ligation (yielding P-B) or indeed non-canonical intramolecular reactions yielding circular, lariat or branched products (unless resolved into a B-P arrangement) are not detected (Figure 4—figure supplement 4). Thus, although rapid and convenient, our PCR assay likely underestimates the true level of intra- and inter-pool reactivity (and the true molecular diversity of reaction products).

Nevertheless, most significantly (and consistent with the reactivities observed above), we observed comparable intermolecular reactivity in unmodified semi-random RNA sequence bait and prey pools (C120-N20-3’OH × 5’OH-N20-C220) - that is pure RNA pools devoid of phosphorylation or any other activation chemistry. Remarkably, these showed intermolecular assembly, yielding a similar distribution of truncated products as observed for bait-2’/3’p×5’OH prey reactions, suggesting intermolecular recombination via transesterification chemistry as outlined above (Figure 4B,C).

In order to better understand the reactivity of unmodified RNA pools without the obscuring imprint of conserved C20 sequences (and PCR amplification bias), we next sought to directly observe reactivity of random RNA eicosamer pools devoid of any activation chemistry. In these (bait × prey: 5’FAM-N20-3’OH × 5’OH-N20A10-3’OH), the prey pool included an invariant A10 tail sequence to aid recovery and analysis (by poly-dT primed reverse transcription), and to allow detection of prey self-recombination products (P-P). After an extended incubation (5 months, −9 °C) we were able to directly visualize formation of two distinct recombination products by gel electrophoresis without any amplification (Figure 5A). The two main product bands (Rec1 and Rec2, together ca. 2.3% yield) were analysed using either poly-dT-primed (for Rec1) or poly-U tailing followed by poly-dA primed SMARTer RT-PCR (Rec1 and Rec2) and deep sequencing (Figure 5—figure supplement 1). For Rec1, we found recombination fragment distributions from ~40 to ~50 nt that include the characteristic fingerprint of a >p ligation junction (Figure 5B,C). In contrast, the longer (on average) Rec2 products (~45 to ~55 nt) showed an ApN ligation junction as they were formed (albeit less efficiently (ca. 0.2% yield)) from recombination of two prey segments within the A10 tail (Figure 5B,C). Thus, as with the unmodified semi-random sequence RNA pools (Figure 4B,C), intermolecular ligation products in random, non-activated RNA pools are likely formed by a transesterification process initiated either by direct recombination or by hydrolytic cleavage (of either B or P), followed by putative ligation of truncated RNA>p sub-fragments. These mechanisms give rise to recombination product species such as bait-prey (B-P, forming Rec1) or prey-prey (P-P present in both Rec1 and Rec2). Bait-bait recombination (B-B) products are not observed, as the bait 5’OH is blocked by FAM modification to aid analysis.

Figure 5 with 1 supplement see all
Reactivity of unmodified random RNA pools.

(A). Denaturing gel electrophoresis of unmodified N20A10 and FAM-N20 RNA pools after incubation (−9 °C or −80 °C, 5 months) imaged both for FITC fluorescence (showing FAM-N20 and its products) (left panel) and after SYBRGold staining (showing all RNAs). Two main product bands (Rec1, Rec2) are apparent. Note the RNA degradation products arising upon prolonged -9 °C but not -80 °C incubation. (B) Size distributions of the sequenced Rec1 (yellow) and Rec2 (blue) bands. (C) Color-coded nucleotide frequencies of Rec1 and Rec2 sequences recovered (according to the protocol shown in Figure 5—figure supplement 1A,B). (D) Average base-pairing differences between the experimental 50mer product pool from Rec1 and randomly generated in silico sequence pools (N = 3; 300,000 sequences each) using the experimental nucleotide frequencies from (C) with secondary structure calculated using RNAfold. Grey indicates predicted unpaired (unp) nucleotides, green nucleotides involved in opening base pairs (obp) and orange closing base pairs (cbp). Standard deviations are shown in black. (E) Predicted energies of the minimal free energy (MFE) structures from the three experimental and in silico pools (as in (D)) suggest that recombination product pools show enhanced folding stabilities. Note that the error bars for the energies calculated from the three synthetic datasets are too small to be displayed.

https://doi.org/10.7554/eLife.43022.026

Comparison of Rec1 sequence pools with three random pools (3 × 300,000 sequences) of equal nucleotide distribution generated in silico again indicated that recombination sites were preferentially localized in base-paired (gapped duplex motif) regions (Figure 5D). Furthermore, selected pools not only exhibited an overall increase in local duplex formation, but also generally more stable (predicted) minimal free energy structures (Figure 5E). Thus, pure random-sequence RNA pools, devoid of any activation chemistry or conserved sequence regions (apart from an A10 tag) display an innate capacity for intermolecular recombination, yielding subsets with altered oligomer length distribution (including longer RNA oligomers), enhanced sequence diversity and structural stability.

Finally, we sought to dissect critical chemical parameters required for an innate recombination potential. To this end, we expanded our investigation beyond RNA to compare and contrast the potential of pools of a range of RNA congeners (representing systematic variations of the canonical ribofuranose ring structure), with respect to intermolecular ligation or recombination. We first examined unmodified, semi-random sequence pools of DNA or prebiotically plausible ANA (arabino nucleic acid) (Noronha et al., 2000; Roberts et al., 2018). In these chemistries, the canonical ribofuranose ring of RNA is replaced with either a 2’-deoxyribose (DNA) lacking the 2’ hydroxyl of RNA, or an arabinofuranose ring (ANA), in which the analogous 2’-hydroxyl group is in the axial (trans) position. Like RNA, both DNA and ANA are capable of forming functional catalytic motifs (Silverman, 2016; Taylor et al., 2015), including small self-cleaving DNA catalysts (Gu et al., 2013). Therefore, both DNA and ANA pools contain sequences with potential function, and in the case of DNA should be able to perform autocatalytic DNA strand cleavage, as a first step of a recombination reaction. However, while we readily detected recombination in the semi-random-sequence RNA pools (as above), neither DNA nor ANA pools showed detectable recombination even after prolonged incubation (up to 6 months, −9 °C; Figure 6A) and despite the 10–100-fold higher sensitivity of PCR (DNA) compared to RT-PCR (RNA and ANA) detection (Ohuchi et al., 1998; Okello et al., 2010). Even inclusion of transition metal ions (Zn2+), which had been shown to strongly promote DNA self-cleavage (Gu et al., 2013), did not result in recombination in DNA (or ANA) pools, while apparently inhibiting it in RNA pools (Figure 6—figure supplement 1).

Figure 6 with 5 supplements see all
Reactivity of unmodified semi-random sequence pools of RNA and related genetic polymers.

Binary combinations of semi-random sequence bait and prey pools of RNA, DNA, ANA, HNA and AtNA and their reactivity are visualised by RT PCR as in Figure 4. (A) In contrast to semi-random sequence pools of RNA (left panel), DNA and ANA pools show no detectable recombination (−9 °C, 5 months). (B) Unmodified semi-random sequence RNA and AtNA pools show recombination after a brief exposure to alkaline pH, while HNA does not. Note that the typical low-molecular weight band seen in −80 °C negative control reactions is reduced or absent potentially due to the pre-incubation NaOH treatment, which leads to fragmentation of semi-randomers (in the case of RNA and AtNA) and the generally lower RT-sensitivity for HNA and AtNA. (C) Size distributions of the RNA (teal) and AtNA (light red) recombination products. Compared to previous experiments (see Figures 4 and 5) RNA recombination products are biased towards shorter fragments, presumably due to increased hydrolysis during NaOH treatment. (D) Color-coded nucleotide signatures of AtNA recombination products from (B) shows similar bait segment truncation as in RNA (see Figure 4) but with a distinct ApN ligation signature.

https://doi.org/10.7554/eLife.43022.030

Next, we explored the reactivity of pools of two additional RNA congeners, in which the five-membered ribofuranose ring is replaced by six-membered ring analogues: D-altritol nucleic acid (AtNA), which shares an analogous vicinal 2’, 3’-cis-diol configuration with RNA (Figure 6B), and its 2’-deoxy-analogue, hexitol nucleic acid (HNA). While HNA random-sequence pools had been shown to contain catalytic motifs (Taylor et al., 2015), AtNA random-pools had not previously been explored.

To test whether the AtNA vicinal hydroxyl can participate in transesterification reactions despite its different ring geometry, we first exposed defined-sequence HNA and AtNA (as well as equivalent RNA) oligonucleotides to alkaline pH (NaOH). Indeed, while HNA proved inert, AtNA (like RNA) proved to be sensitive to alkaline pH-induced hydrolysis, albeit with an apparent 6-fold slower strand scission rate than RNA (Figure 6—figure supplement 2). This indicates that (akin to RNA) the equivalent vicinal 2’-OH in AtNA can participate in intramolecular transesterification reactions presumably via an analogous cyclic phosphate (AtNA>p) intermediate (Figure 6—figure supplement 3). At the same time, the observed lower hydrolytic reactivity suggests that formation of AtNA>p cyclic phosphate may be energetically less favourable than in RNA. Next, we examined whether HNA and AtNA pools (devoid of extraneous activation chemistry) could participate in intermolecular recombination reactions. To this end, we compared the reactivity of semi-random sequence pools of HNA and AtNA to equivalent RNA pools, but observed unambiguous recombination only in the RNA pools (as seen previously) (Figure 6—figure supplement 4). To expedite potentially weak tendencies for recombination, we exposed semi-random sequence pools of HNA and AtNA (as well as control RNA) pools briefly to alkaline pH (50 mM NaOH at 65 °C (RNA (3 min), HNA/AtNA (27 min), neutralized with 1 M Tris•HCl pH 7.4) in order to generate an increased amount of >p products, before a prolonged incubation (−9 °C ice, 5 months). Using a novel engineered reverse transcriptase (RT TK2, see Materials and methods), which, unlike commercial enzymes, is able to reverse transcribe both RNA as well as both HNA and AtNA oligomers, we analysed ligation products by RT-PCR. While HNA pools again proved inert, we now observed clear intermolecular recombination in the AtNA pools, although to a notably lesser extent than equivalent RNA pools.

Deep sequencing of the recombination products from both RNA and AtNA pools revealed a similar size distribution indicative of recombination but, in the case of RNA, biased towards smaller products, presumably due to increased hydrolytic degradation induced by exposure to alkaline pH (Figure 6C). Furthermore, a clear ApN nucleotide signature was observed at the AtNA recombination junction, distinct from the CpN signature of RNA, presumably reflecting the different chemical requirements for AtNA recombination (Figure 6D). The finding that only RNA (and to a lesser extent AtNA) pools are capable of intermolecular recombination, while DNA, ANA and HNA pools were inert, strongly suggests that a vicinal cis-diol configuration forms a critical chemical determinant of an innate intermolecular recombination potential with the propensity for recombination modulated by ring geometry.

Discussion

Recombination is a fundamental process in biology both central to genetic diversification and genome repair (Pesce et al., 2016; de Visser and Elena, 2007), as well as a protective mechanism against the progressive corruption of genetic information by drift (i.e. Muller’s ratchet (Muller, 1964)). RNA recombination is widespread among RNA viruses (Simon-Loriere and Holmes, 2011); while generally attributed to template switching, recombination by purely chemical processes has been suggested to occur in polio (Gmyl et al., 1999) and rubella viruses (Adams et al., 2003), as well as Qbeta bacteriophage (Chetverin et al., 1997).

Here, we show that a capacity for spontaneous ligation and recombination is an innate property of random RNA (and to a lesser extent AtNA) oligomer pools and requires neither recombinase enzymes nor extraneous activating chemistry. As we demonstrate, such non-enzymatic recombination and ligation involves, to some degree, specific sequence motifs (e.g. in RNA pools) that promote ligation with either 2’−5’ or 3’−5’ regioselectivity (Figure 2, Figure 2—figure supplement 5). However, the bulk of observed reactivity derives from the spontaneous organization of random sequence oligomers into non-covalent complexes through intermolecular hybridization, which can be visualized by native gel electrophoresis (Figure 1—figure supplement 2).There is staggering potential combinatorial diversity of such intra-pool interactions. Within the N20 pools examined herein (with a potential sequence diversity of ca. 1012), such combinatorial diversity amounts to >1024 potential binary and an even vaster number of potential tertiary or higher order interactions. In the case of random RNA as well as semi-random RNA and AtNA oligomer pools, a subset of these non-covalent intermolecular assemblies do not remain inert (and non-covalent), but display a pervasive tendency to undergo spontaneous disproportionation reactions resulting in intermolecular recombination and ligation.

The discrepancy between the extent of ligation / recombination we observe (2–10 % of pool sequences (Figure 1, Figure 4) to the predicted frequency of functional sequences of, for example, aptamers / ribozymesfrom repertoire selection / SELEX experiments (1 in 1010-1013 (Wilson and Szostak, 1999)) should be viewed within the same context. Indeed, standard SELEX experiments are designed to iteratively deplete the interaction network's underlying pool reactivity and instead isolate the exceedingly rare unitary sequences capable of function. While such experiments have been ground-breaking in revealing the potential of RNA (as well as DNA and some XNAs) for complex catalytic or ligand-binding functions (Silverman, 2016; Taylor et al., 2015; Wilson and Szostak, 1999), our results show that a back-extrapolation from the hit frequency of SELEX experiments leads to a gross underestimation of global pool functionality, as only single sequences are considered and systems-level interactions and networks are ignored.

We were able to visualize the intermolecular recombination and / or ligation products both directly by gel shift in denaturing gel electrophoresis and by specific RT-PCR amplification. Both assays provide evidence for bona fide ligation and recombination products as PCR or sequencing artefacts can be ruled out on multiple grounds. For one, incubation at −80 °C, which freezes the eutectic phase in our buffer system (Mutschler et al., 2015; Mutschler and Holliger, 2014), never yields products, establishing that fragments do not recombine during workup. Furthermore, the appearance (or absence) of recombination products is clearly and reproducibly linked to chemical features of the original pool (e.g. free 5’OH (RNA) (Figure 4), or cis diol (RNA, AtNA) (Figure 6)), which are erased in the RT step. Finally, in both the case of the >p activated and unmodified random RNA sequence pools, we are able to observe the formation of ligation / recombination products directly by Urea-PAGE, which would dissociate even highly stable non-covalent complexes. Indeed, we note that our assays likely underestimate total pool reactivity (as well as the diversity of potential reaction products) as a range of more complex potential recombination modes, including intra- and intermolecular circular, lariat or branched products are only partially detected by our methods, or not at all (Figure 4—figure supplement 4).

Previous reports of non-enzymatic RNA recombination with defined substrates suggests that the majority of such reactions are likely to involve either a concerted mechanism of direct in-line attack on an internal phosphodiester linkage or, more frequently, a two step-mechanism of two consecutive transesterification reactions via an >p intermediate generated by hydrolysis (Lutay et al., 2007; Nechaev et al., 2009; Staroseletz et al., 2018). In both reaction trajectories, ligation /recombination is promoted by proximity through in trans intermolecular hybridization (Figure 4—figure supplement 1) as well as the concentration and entropic effects of eutectic ice phase formation. Whilst we observe formation of canonical 3’−5’ linkages with some RNA motifs (H4), we expect that the majority of RNA pool linkages likely conform to 2’−5’ regiochemistry as these linkages form preferentially in RNA duplex geometry (Lutay et al., 2006; Orgel, 2004), although such information is lacking for AtNA. In RNA, sporadic 2’−5’ linkages have been found to be broadly compatible with folding and function (Engelhart et al., 2013) and can progressively be converted to the canonical 3’−5’ linkage (Mariani and Sutherland, 2017).

The mechanistic basis for the distinctive sequence signature (CpN) in RNA is currently unclear, although CpA and UpA junctions have been shown to be generally more reactive (Kaukinen et al., 2002). Thus, the bias may reflect the higher propensity for C>p formation and/or reactivity, which may be more reactive due to the stronger electron withdrawal by the cytosine ring system (Krishnamurthy, 2012) or hydrogen bond interactions between the C-2 oxygen of the pyrimidine and the 2'-hydroxyl of the ribose (Bibillo et al., 2000). Earlier reports suggest that reactivity of phosphodiester bonds in a gapped duplex can be context dependent being strongly modulated by the local sequence (stacking) and base-pairing (H-bonding) (Kaukinen et al., 2002; Oivanen et al., 1998), which may explain the different AtNA recombination signature (ApN) (Figure 6D).

Repertoire selection experiments had previously shown that random sequence pools of different genetic polymers (including RNA, DNA and some XNAs) all contain functional sequences, including highly active ribozymes (Wachowius et al., 2017), DNAzymes (Silverman, 2016) and to a lesser extent ANA-, HNA- and other XNAzymes (Taylor et al., 2015). Although rare, these functional sequences can be isolated by iterative rounds of selection and amplification. In contrast, in the experiments presented here (which do not involve enrichment), we observe striking differences in innate reactive potential of different chemistries, with only RNA (and to a lesser extent AtNA) pools undergoing ligation and recombination. In contrast, DNA, ANA and HNA pools remained inert in our experiments, even under forcing conditions such as alkaline pH or high concentrations of transition metal ions like Zn2+, which have been shown to promote self-cleavage in DNA (Gu et al., 2013). While nucleophilicity and pKa of the 5’-OH are likely to be comparable in RNA, DNA and the XNAs examined herein, classic studies using model nucleotide phosphodiester and -triester compounds suggest an at least 30-fold higher reactivity of RNA internucleotide diester linkages compared to DNA and ANA (Lönnberg and Korhonen, 2005). This increased reactivity is thought to be due to the vicinal cis-diol configuration of RNA, with the 2´-OH stabilizing the negative charge developing on the phosphorane intermediate and /or the departing 3´-oxygen by both H-bonding (Lönnberg and Korhonen, 2005; Oivanen et al., 1998) and local electronic effects (Aström et al., 2004). Consistent with this mechanistic model, AtNA, an RNA congener, sharing an analogous vicinal cis-diol configuration (in a six-membered ring context), was found to be capable of intermolecular recombination, although AtNA reactivity appears to be significantly lower than RNA. The latter may be due to greater torsional strain required for in-line attack or formation of an analogous cyclic phosphate (AtNA>p) intermediate (with an almost planar configuration) in the context of a six-membered ring (Bruzik et al., 1996).

Although >p is only weakly activated towards nucleophilic attack, in the context of high effective concentrations of a 5’OH nucleophile in a gapped duplex (or pre-organized internal loop motifs like H4, J4), intermolecular bond formation can be efficient (Usher and McHale, 1976; Lutay et al., 2006). Crucially, >p groups form spontaneously as part of RNA (and presumably AtNA) hydrolytic degradation. While DNA pools have been found to comprise a number of small self-cleaving motifs (Gu et al., 2013), these yield 3’OH and 5’p termini, which require further activation for ligation. Thus a vicinal diol configuration not only accelerates the initial strand cleavage step of recombination, but also yields cleavage product termini (>p and 5’OH) competent for re-ligation. Indeed, other vicinal cis-diol containing genetic polymers have previously been examined by solid-phase synthetic chemistry as part of systematic variation along the ribo- and lyxopyranosyl series (Eschenmoser, 1999; Wippo et al., 2001) (Figure 6—figure supplement 5), and ligative polymerization of 2’, 3’>p activated pyranosyl tetranucleotides on a template has been demonstrated (Bolli et al., 1997). Therefore, a vicinal diol configuration appears to be a critical component of a spontaneous capacity for recombination, setting RNA apart from a range of other prebiotically plausible genetic polymers including TNA (Schöning et al., 2000), ANA (Roberts et al., 2018) and PNA (Böhler et al., 1995).

The pervasive spontaneous ligation and recombination activity displayed by oligomer pools described herein may seem counterintuitive, but such behaviour is both predicted and supported by theoretical arguments from statistical mechanics and thermodynamics. These suggest that under the right boundary conditions, recombination - with an concomitant increase in the diversity of length distribution of (RNA) polymer chains - is favoured by entropic factors (Blokhuis and Lacoste, 2017). Remarkably, not only is the diversity in oligomer length distribution altered but recombined pools also display a shift towards an increased tendency for secondary structure formation (Figure 5E). This observation likely reflects both the tendency of longer oligomers to form more stable and more diverse secondary structures (Briones et al., 2009; Stich et al., 2008) as well as some degree of intrapool selection for sequences capable of intermolecular hybridization and duplex formation (Figure 1—figure supplement 2). Finally, recombination also scrambles parts of the pool sequence information, which could lead to an increase in informational complexity in redundant pools such as the random 20-mer RNA and semi-random 40-mer RNA and AtNA oligomer pools used in our experiments. To better understand this effect, we explored recombination outcomes by in silico simulations of a broad range of boundary conditions (including a range of hydrolytic cleavage (degradation) vs. ligation rates, pool redundancy and code complexity (2-letter vs. 4-letter code)). Using the Shannon population level entropy index D (Derr et al., 2012) as an established measure of informational complexity (see Material and methods), we find that recombination in simulated, redundant N10 and N20 pools disproportionates pool sequences into a distribution of both shorter (cleavage) and longer (ligation) strands and spontaneously increases D under a wide range of boundary conditions (Figure 7, Figure 7—figure supplement 1). Furthermore, in silico recombination reveals critical pool and reaction parameters that enhance information gains, whereby increases in pool informational complexity (as measured by D) scale with pool redundancy, code complexity and oligomer length (Figure 7—figure supplement 1). Under the conditions used in our experiments, i.e. ca. 1000-fold pool redundancy, a 4-letter base alphabet and N20 oligomer length, information gains occur under almost all boundary conditions (Figure 7). Thus, spontaneous recombination in random oligomer pools under any but the most strongly degradative regimes expands both sequence size distribution and sequence diversity, and thereby progressively increases pool compositional (sequence length distribution), structural (secondary structure formation) and informational complexity.

Figure 7 with 1 supplement see all
The change in information content in a random sequence pool as a consequence of recombination.

Informational content / complexity change is computed as the population level Shannon entropy (Diversity index: D (Derr et al., 2012)), where N is the total number of unique sequences in the pool, and ni is the proportion of individuals belonging to the ith unique sequence. D is zero if all molecules are identical and D is maximal when molecules are uniformly distributed through sequence space (top panel). D changes after recombination of a N20 4-letter code, random sequence pool of 100-fold redundancy (each sequence is present in 100 copies). The difference of pool diversity before and after recombination (ΔD) plotted as a heat map as a function of hydrolysis and ligation rates (bottom panel). Under these boundary conditions (mimicking the RNA pools experimentally analysed for example in Figure 5), D increases under all examined combinations of hydrolysis and ligation rates.

https://doi.org/10.7554/eLife.43022.038

In the context of evolution of early random sequence oligomer pools (and potentially viral quasispecies), the impact of the above recombination processes is significant, in particular considering that such pools would at the same time likely to be under fractionation and/or adaptive pressure. Indeed, a number of physicochemical processes (including thermophoretic cycles in porous substrates (Kreysing et al., 2015) or adsorption to mineral surfaces (Ferris et al., 1996)) would favour length-extended oligomers over pool members or short cleavage products. However, adaptive pressure is arguably the most important process that favours more diverse and length extended oligomer sequences. Previous studies have underlined the importance of oligomer length for encoding the most active (large) functional motifs or more frequent smaller motifs (Ekland et al., 1995; Velez et al., 2012; Bartel and Szostak, 1993), as well as the importance of sequence diversity and the power of (continuous) recombination in accelerating adaptive walks (Stemmer, 1994; Hayden et al., 2005; Lehman, 2008; Arnold, 2009; Mutschler et al., 2015).

In conclusion, our findings indicate that native random-sequence RNA (and to a lesser extent AtNA) pools are naturally predisposed to recombination via near energetically neutral transesterification chemistry enabled by their specific chemical structure. As a result, such pools have an unique innate capacity to ‘bootstrap’ themselves towards higher compositional, informational and structural complexity (Briones et al., 2009) and thus a likely enhanced adaptive potential.

Materials and methods

Oligonucleotides

Oligonucleotides are described in detail in Supplementary file 2 (RNA) and Supplementary file 3 (DNA). RNAs with terminal 2’, 3’-cyclic phosphate were created from 2’/3’-phosphorylated RNAs using 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC, Sigma) as described previously.(Mutschler et al., 2015; Mutschler and Holliger, 2014) Non-random activated RNAs were further purified by Urea-PAGE. Random EDC-treated RNAs were precipitated with isopropanol and washed extensively with 70% ethanol. Dried RNA pellets (3–8 nmol RNA) were resuspended in nuclease-free H2O and stored at −30 °C until usage. All oligonucleotide concentrations were determined spectroscopically using the extinction coefficients at 260 nm calculated by OligoCalc (Kibbe, 2007). To approximate the concentration of random RNAs, an even nucleotide distribution was assumed. Random XNA pools were prepared using DNA primers and templates described in Supplementary file 2 and polymerase D4K (Pinheiro et al., 2012) for ANA, or 6G12 I521L (Taylor et al., 2015) for HNA as described previously (Pinheiro et al., 2012; Taylor et al., 2015; Taylor and Holliger, 2015). For AtNA synthesis, we developed a novel polymerase, polLG, which will be described elsewhere. Following XNA synthesis, DNA components were degraded using 0.1 U/µl TURBO DNase (Invitrogen) for 2 hr at 37 °C in 10 mM Tris•HCl (pH 7.6), 2.5 mM MgCl2 and 0.5 mM CaCl2, and all-XNA oligonucleotides purified by Urea-PAGE.

Ligation / recombination reactions

For semi-random bait/prey RNA pairs (Supplementary file 2), 2.5 μM of both sets of fragments were frozen on dry ice or at −30 °C in 25 mM NaCl, 1 mM Tris•HCl pH 8.3 and transferred in a TXF200-R3 refrigerated water-baths (Grant) filled with 50% (v/v) Blue Star Antifreeze (Carplan) equilibrated at −9 °C. Fully random oligonucleotides pools were frozen at a higher initial concentration of 16 µM (N20>p) supplemented with 1 µM of FITC-labelled N20>p pool. For analytical experiments with defined RNA sequences (H4, J4, splint samples, minimal hairpin constructs) 0.5 μM of a FITC-labelled 3’-fragment was mixed with 1 μM of non-fluorescent, >p activated 5’-ligation partner. For analytical trans reactions, RNA splints concentrations were 1 μM. For preparative-scale reactions, concentrations were increased 10-fold to improve ligation efficiencies and product yields. Naive FITC-N20 / N20A10 pools (2.5 µM each) were incubated at for 5 months before recovery and denaturing PAGE analysis. Partially hydrolysed semi-random RNA and AltNA bait/prey pairs (see below) were incubated at a final concentration of 0.6 µM for each oligonucleotide for 5 months at −9 °C.

All samples, with the exception of those used for qRT-PCR and deep-sequencing, were quenched with an excess of 4x quench buffer (98% formamide, 10 mM EDTA pH 8, 0.04 wt% bromophenol blue) then analysed by denaturing PAGE using 20% (w/v) Acrylamide/Bis 19:1 TB/8 M urea gels. For ligation kinetics, reaction master mixes were aliquoted into 10 µl reactions and frozen/quenched independently to prevent repeated freeze-thaw cycles of individual reactions. Gels were imaged using a Typhoon Trio scanner (GE Healthcare). Percentage intensity of starting material and ligation products were determined using ImageQuant TL. For quantifications of SYBRgold-stained N20>p ligation products, band intensities were normalized using the expected oligonucleotide length (Supplementary file 1) before calculating the relative ligation yields. Gel image brightness and contrast were adjusted to illustrate ligation product banding patterns. PAGE gels do not show any bands other than those presented in the figures. Samples used for qRT-PCR and deep sequencing were stored at −80 °C until needed. cDNA synthesis and library preparation from N20>p ligation reactions.

Ice pellets of preparative ligation reactions (1.6 nmol of total N20>p in 100 µl) from −9 °C long-term incubation experiments were thawed on ice. 5 µl were mixed with 4 volumes of denaturing gel loading buffer (95% formamide, 10 mM EDTA pH 8, 0.1% (w/v) bromophenol blue) and analysed by denaturing PAGE followed by fluorescent gel imaging. To phosphorylate free 5’-ends and dephosphorylate unreacted 3’-ends, the remaining 95 µl of the reaction was treated with 90 U T4-PNK (NEB) for 45 min at 37 °C in presence of 1 mM ATP and 1x T4-PNK reaction buffer. 10 pmol of fresh N20>p was treated the same way to obtain sequence information of the starting material. T4-PNK was quenched by the addition of 8.8 µmol EDTA and formamide (final concentration 40%). For in-ice ligation reaction, ligated products were separated from the N20>p starting material by preparative denaturing PAGE followed by staining with SYBR Gold RNA stain (ThermoFisher Scientific). The bands corresponding to the N40 ligation product were excised, eluted from the gel matrix in 0.3 M sodium acetate (pH 5) and precipitated using 2 volumes of isopropanol. To remove residual gel-contaminations, both RNA samples were further purified using the RNeasy Mini Kit (Qiagen). Next, 3.5 pmol of the RNA pools (or the whole 10 pmol of N20>p starting material) were ligated to preadenylated adapter Appa-3RND-Cy5 (20 pmol, Supplementary file 3) using 200 U T4 RNA Ligase 2, truncated KQ (NEB) in presence of 15% (w/v) PEG 8000 and RNA ligase buffer (1x) for 90 min at 25 °C. After addition of 22 pmol 3RND_rev primer in 5.5 µl H2O, the sample was incubated for 5 min at 72 °C, 15 min at 37 °C and 15 min at 25 °C followed by addition of 14.5 µl 5’ linker ligation mix (4 µl of 10 mM ATP, 2 µl of 10x T4 RNA Ligase buffer (NEB), 5.5 µl of 50% PEG 8000, 1 µl of 20 µM 5_RNDM linker and 2 µl RNA Ligase 1 (NEB, 10 U /µl). The ligation mix was incubated for 2 hr at 25 °C and stored at −20 °C. RT-PCR (50 µl) was performed using 1 µl of the ligation reaction and 20 pmol of the recovery primers 5RNDM_fw/3RNDM_rv in a SuperScript III One-Step RT-PCR reaction supplemented with Platinum Taq DNA Polymerase (ThermoFisher Scientific) according to the supplier’s instructions. Amplified cDNA samples were prepared for Illumina deep sequencing by appending the bridge-amplification sequences by PCR using the NEBNext High-Fidelity 2X PCR Master Mix (NEB) using the primers P3_3RND and P5_5RND according to the supplier’s protocol. Libraries were barcoded using P5 primers containing 6 nt sequences from the NEXTflex series (Illumina).

cDNA synthesis from semi-random RNA reactions

For semi-randomer RNA reactions, RT-PCR was performed using the SuperScript III One-Step RT-PCR System with Platinum Taq DNA Polymerase (Invitrogen), according to the manufacturer’s protocol. In each reaction, 1.25 pM of total RNA from ligation or control reactions was used as RT-template with forward and reverse primers complementary to the respective constant C120/C320 and C220/C420 primer binding sites (Supplementary file 3). Deep sequencing libraries were prepared using the respective P5_C1/P5_C3 and P3_C2/P3_C4 primers using the NEBNext High-Fidelity 2X PCR Master Mix (NEB). To prepare C120/C320-N20>p pre-ligation for sequencing, the >p group was removed by dephosphorylation using T4-Polynucleotide Kinase (NEB) followed by adapter ligation using miRNA Cloning Linker 1 (IDT) as described in the previous section. cDNA was generated by RT-PCR using primers C1_fw (or C3_fw) and miRNA_CL1_rev in a SuperScript III One-Step RT-PCR reaction supplemented with Platinum Taq DNA Polymerase (ThermoFisher Scientific) according to the supplier’s instructions. Libraries for Illumina deep sequencing were prepared by PCR using the NEBNext High-Fidelity 2X PCR Master Mix (NEB) using the primers P3_miRNA_CL1_rv and P5_C1 (or P5_C3) according to the supplier’s protocol. DNA libraries from the 5’OH-N20-C220 and 5’OH-N20-C420 pre-ligation semi-random RNA pools were generated using a two-step protocol: First, single stranded cDNA was prepared using C2_rv/C4_rv using the SuperScript III according to the supplier’s instructions. Next, miRNA Cloning Linker 1 was ligated to the purified cDNA and ds cDNA was generated using GoTaq Green mastermix (Promega). Finally, reverse complement cDNA libraries of the 5’OH-N20-C220 and 5’OH-N20-C420 were generated using P3_miRNA_CL1_rv and P5_C2 (or P5_C3).

cDNA synthesis from N20A10 / N20 recombination reactions

For selected reactions with fully random pools, polyU tails were added to provide a primer-binding site prior to RT-PCR. PolyU tailing was performed using 0.2 U/µl polyU polymerase (NEB) in 1X NEB Buffer 2, 0.5 mM UTP for 20 min at 37 °C, followed by heat inactivation for 1 min at 75 °C. Next, RT-PCR was performed (with or without prior polyU-tailing) using the SMARTer RT-PCR system (Clontech), according to the manufacturer’s protocol, using the supplied SMARTer oligo as forward primer, and an oligo containing either a polyA- or polyT as reverse primer (Supplementary file 3; Figure 5—figure supplement 1). cDNA was amplified using OneTaq hot-start master mix (NEB) supplemented with appropriate primers (Supplementary file 3).

DNA synthesis from semi-random XNA reactions.

XNA reverse transcription was primed using DNA or mixed DNA-2’-O-methyl-RNA primers (Supplementary file 3). ANA was reverse transcribed using RTI521L as described previously (Pinheiro et al., 2012; Taylor and Holliger, 2015). HNA and AltNA were reverse transcribed using a novel polymerase, polTK2, which will be described elsewhere. XNA RT reactions were isolated using streptavidin beads as described previously (Taylor and Holliger, 2015), prior to amplification using OneTaq hot-start master mix (NEB), or a blend of OneTaq and ThermoScript (Thermo Fisher), with appropriate primers (Supplementary file 3).

qRT-PCR

qRT-PCR was performed according to the supplier protocol for SuperScript III One-Step RT-PCR System with Platinum Taq DNA Polymerase in a DNA Engine Opticon System 2 (Bio-Rad). 1.25 pmol RNA was used with 6 pmol C1_fwd and C2_rev primers in 30 μl reaction volume.

Deep sequencing and pool analysis

Amplified polyclonal cDNA samples were prepared for deep sequencing by the Illumina Miseq method by appending the bridge-amplification sequences by PCR, as described previously (Taylor and Holliger, 2015). PCR products were purified using either a QIAquick PCR Purification kit (Qiagen), or by agarose gel followed by a QIAquick Gel Extraction kit (Qiagen). 6–12 pmol of pooled libraries plus 20% - 30% PhiX control (Illumina) were denatured and sequenced (single-end read, 75 or 150 cycles) using a MiSeq reagent kit and instrument (Illumina) according to manufacturer’s instructions. Libraries were barcoded using P5 primers containing 6 nt sequences from the NEXTflex series (Illumina). Data files were trimmed, quality filtered (90%, Q20), and collapsed using scripts combining the FASTX command line toolkit (http://hannonlab.cshl.edu/fastx_toolkit/commandline.html) and cutadapt (Martin, 2011) on a Linux workstation. All FASTA files containing sequences of ligation and recombination products were split according to their product size using either awk or custom bash scripts implementing the egrep command (an example script is provided as Source code 1). Position-specific nucleotide frequencies were determined using the fastx_quality_stats command. To create nucleotide signature plots, these frequencies were converted into RGB code at each position i using the following transformations:

Ri=255-fCi255-fAi255
Gi=255-fCi255-fUi255
Bi=255-fAi255-fGi255-fUi255

where fCi, fAi, fUi, and fGi are the positional nucleotide frequencies for C, A, U and G at each position irespectively with fCi+ fAi+ fUi+fGi=1.

To determine ligation/recombination-specific secondary structure (ss) biases in the experimental data sets, large (>100,000 sequences) datasets were generated with a custom Python script (Source code 2) using the experimental position-specific nucleotide frequencies as input. Minimal free energy (MFE) structures for both, experimental and synthetic datasets were calculated using RNAfold (Lorenz et al., 2011) using standard settings. The position-specific dot-bracket element frequencies were extracted using custom bash scripts (an example script is provided as Source code 3).

To screen for secondary structures promoting ligation, the minimal free energy ss for each sequence from a C320-N20>p × 5’OH-N20-C220 and C120-N20>p × 5’OH-N20-C420 ligation experiment was calculated using RNAfold 2.3.5 with default settings. Using custom bash scripts (example script provided as Source code 4), all position-specific matches between ss submotifs from Figure 2—figure supplement 3 and the calculated ss in proximity to the ligation site were clustered. Nucleotide sequences of matched ss-motifs were extracted into separate FASTA-files and consensus motifs for selected matches with untypically increased ligation frequencies in unpaired regions were determined using DREME (Bailey, 2011). Motif H4 and J4 gave the most unambiguous nucleotide consensus sequences and were therefore experimentally tested for ligation.

Regiospecificity of H4

3’−5’-linked controls of full-length H4 or H4T1 were produced in 75 μl reactions containing 5 nmol of 5H4 or 5H4T1 and 3H4 or 3H4T1 RNA in T4 DNA Ligase Buffer (NEB). In a first step, the 3’-fragment was 5’ phosphorylated and the 5’-fragment dephosphorylated using 30 U of full-length T4 Polynucleotide Kinase (NEB) followed by incubation for 1 hr at 37 °C. After addition of an equimolar amount of H4_splintG (for H4T1) the reaction was heated to 80 °C for 1 min and transferred to 65 °C for 15 min. The sample volume was made up to 100 μl with 2.5 μl 10x T4 DNA Ligase Buffer, H2O and 30 U T4 RNA Ligase 2 (NEB). No splint was necessary for enzymatic ligation of H4. Samples were incubated at 37 °C for 90 min followed by gel purification of the ligated RNA. Self-ligated H4 RNA was obtained by gel purification of the 80 nt band after large-scale self-ligation of 5H4>p and 3H4 at −9 °C in 25 mM NaCl, 1 mM Tris•HCl pH 8 in a TXF200-R3 refrigerated water-baths for 2 weeks. Similarly, self-ligated H4T was produced by in-ice ligation 5H4T>p and 3H4T in presence of equimolar amounts of H4_splintA.

For analytical DNAzyme cleavage, 0.1 μM ligated H4 sample or control was mixed with 1 μM E5112_H4 designed to cleave the H4 GrA ligation junction (Cruz et al., 2004) in E5112-DNAzyme buffer (250 mM KCl, 400 mM NaCl, 125 mM HEPES•KOH pH 7, 50 mM MgCl2, 18.75 mM MnCl2) and incubated for 2 h hours at 37 °C. Reactions were quenched by adding 4 volumes quenching buffer (98% formamide, 10 mM EDTA, 0.1% (w/v) Bromophenol Blue) and analysed by Urea-PAGE followed by fluorescent imaging. For RNaseT1 cleavage, 0.15 μM ligated H4T1 was incubated at 55 °C in 25 μl RNase T1 Buffer (30 mM sodium acetate pH 5, 1 μM EDTA, 30 mM urea, 0.04 U/μl RNaseT1). Samples were quenched at 0, 10 and 30 min using 4 volumes of quenching buffer and analysed using Urea-PAGE followed by fluorescence gel imaging.

Regioselectivity of J4

An all 3’−5’-linked control for J4-shrt was generated by annealing 5 nmol of each J4_shrt-P and J4_shrt-F with 5 nmol of the DNA splint J4_shrt-splint followed by ligation using T4-RNA Ligase two as described above. Self-ligated J4-shrt product was generated by co-incubating J4_shrt>p and J4_shrt-F in 25 mM NaCl, 1 mM Tris•HCl pH 8 in a TXF200-R3 refrigerated water-baths equilibrated at −9 °C for 2 weeks. Both self-ligated and control ligation products were gel-purified. Self-cleavage of J4-shrt RNAs was probed in 25 mM HEPES•NaOH pH 7.5, 5 mM MgCl2 and 5 mM MnCl2 (room temperature, 15 h). Cleavage of the 3’−5’ J4-shrt control at the J4 ligation junction was performed using the custom 8–17-derived DNAzyme under the same conditions. The same DNAzyme was also used to inhibit initial self-cleavage of in-ice self-ligated J4.

Partial hydrolysis of random XNA and RNA pools

For experiments in which random oligonucleotide pools were partially hydrolysed prior to ligation reactions, purified 40mer oligonucleotides (equivalent to C1-bait, Supplementary file 2) were incubated in 50 mM NaOH at 65 °C for 3 min (RNA) or 27 min (AtNA and HNA), neutralized with 1 M Tris•HCl (pH 7.4).

Altritol triphosphates

Altritol nucleosides (Abramov and Herdewijn, 2007) were converted to their 5’-triphosphates in one-pot reaction by Lugwig method (Ludwig, 1981). In this procedure, regioselective phosphorylation of 5’-hydroxy group of sugar was carried out with phosphoryl oxychloride in trimethylphosphate followed by the addition of tetrabutylammonium pyrophosphate. The triphosphates were deprotected with ammonia, isolated with ion-exchange chromatography and finally purified by RP HPLC.

Nu = ABz, CBz, dmfG, and U.

i) POCl₃, trimethylphosphate; ii) tetrabytylammonium pyrophosphate, NBu3, DMF; iii) 25% NH3.

https://doi.org/10.7554/eLife.43022.040

For all reactions, analytical grade solvents were used. All moisture-sensitive reactions were carried out in oven-dried glassware under N2. Reagents and solvents were provided by Acros, Fluka, or TCI. TLC: 31P NMR spectra: a Bruker Avance 300-MHz, or a Bruker Avance 500-MHz spectrometer. HRMS spectra were acquire on a quadrupole orthogonal acceleration time-of-flight mass spectrometer (Synapt G2 HDMS, Waters, Milford, MA). Samples were infused at 3 µl/min and spectra were obtained in positive (or: negative) ionization mode with a resolution of 15000 (FWHM) using leucine enkephalin as lock mass.

General procedure for the synthesis of altritol nucleoside 5’ triphosphates: To an ice-cold solution or suspension of corresponding altritol nucleoside (Abramov and Herdewijn, 2007) (0.1 mmol) in trimethyl phosphate (TMP) (1.0 mL) was added phosphoryl chloride (20 µl, 0.22 mmol) and the solution was stirred at 0 °C for 5 hr. Tributylamine (300 µl, 1.6 mmol) and tetrabutylammonium pyrophosphate solution (0.5 M in DMF, 1.1 mmol) were added simultaneously, and the solution was stirred for a further 30 min. The reaction was then quenched by the addition of 0.5 M triethylammonium bicarbonate (TEAB) buffer (10 mL), and stored at 4 °C overnight. The reaction product was deprotected with 25% ammonia, evaporated to dryness, re-dissolved in 0.1 M TEAB (5 mL) and applied to a Sephadex A25 column. The column was eluted with a linear gradient of 0.1–1.0 M TEAB. Appropriate fractions were pooled and evaporated to dryness to give desired product. The triphosphates were finally purified by HPLC (Alltima 5μ C-18 reverse phase column 10 × 250 mm, buffer A, 0.1 M TEAB; buffer B, 0.1 M TEAB, 25% MeCN. 0% to 100% buffer B over 60 min at 4 mL / min).

1’,5’-ANHYDRO-2’-DEOXY-2’-(ADENIN-9-YL)-D-ALTRITOL 6’-TRIPHOSPHATE TRIETHYLAMMONIUM SALT (Vastmans et al., 2001). 31P NMR (D2O): δ −9.2 (1P, d), −11.1 (1P,d), −23.0 (1P,t). ESIHRMS found: (M-H)- 520.0044. calcd for C11H18N5O13P3: (M-H)- 520.0041.

1’,5’-ANHYDRO-2’-DEOXY-2’-(GUANIN-9-YL)-D-ALTRITOL 6’-TRIPHOSPHATE TRIETHYLAMMONIUM SALT 31P NMR (D2O): δ −8.4 (1P, br s), −10.9 (1P, d), −22.5 (1P, t). ESIHRMS found: (M-H)- 535.9991. calcd for C11H18N5O14P3: (M-H)-5359990.

1’,5’-ANHYDRO-2’-DEOXY-2’-(CYTOSIN −1-YL)-D-ALTRITOL 6’-TRIPHOSPHATE, TRIETHYLAMMONIUM SALT 31P NMR (D2O): δ −6.9 (1P, br d), −11.0 (1P,d), −22.7 (1P,br t). ESIHRMS found: (M-H)- 495.9938. calcd for C10H18N3O14P3: (M-H)- 495.9928.

1’,5’-ANHYDRO-2’-DEOXY-2’-(URACIL-1-YL)-D-ALTRITOL 6’-TRIPHOSPHATE, TRIETHYLAMMONIUM SALT 31P NMR (D2O): δ −6.5 (1P, d), −11.1 (1P,d), −22.6 (1P,t). ESIHRMS found: (M-H)- 496.9795. calcd for C10H17N2O15P3: (M-H)- 496.9768

Measuring population-level entropy

To quantify the informational content, D, of a population of sequences, we compute the Shannon index (Derr et al., 2012):

D= -i=1Nnilog2(ni)

Where N is the total number of unique sequences in the pool, and ni is the proportion of individuals belonging to the ith unique sequence. D is zero if all molecules are identical and D is maximal when molecules are uniformly distributed through sequence space.

Simulations of recombination reactions

To understand the effect of recombination on informational content, we simulated a population of random sequences that can undergo hydrolysis according to a normal distribution around the midpoint of the sequence in question. Hydrolysed sequences of length greater than two nucleotides are then allowed to randomly ligate with either another hydrolysed sequence, or an unhydrolysed sequence, weighted by the size of each population. The Shannon index is computed for the population before and after recombination, and the delta Shannon index, ΔD, is reported. Here, a negative ΔD indicates a loss in informational content and a positive ΔD indicates a gain of informational content. The Python script for the simulation is available as Source code 5.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
    Nonreplicative RNA recombination in poliovirus
    1. AP Gmyl
    2. EV Belousov
    3. SV Maslova
    4. EV Khitrina
    5. AB Chetverin
    6. VI Agol
    (1999)
    Journal of Virology 73:8958–8965.
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
    A new route to nucleoside 5'-triphosphates
    1. J Ludwig
    (1981)
    Acta Biochimica Et Biophysica; Academiae Scientiarum Hungaricae 16:131–133.
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
    The relation of recombination to mutational advance
    1. HJ Muller
    (1964)
    Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis 1:2–9.
    https://doi.org/10.1016/0027-5107(64)90047-8
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
    Prebiotic chemistry and the origin of the RNA world
    1. LE Orgel
    (2004)
    Critical Reviews in Biochemistry and Molecular Biology 39:99–123.
    https://doi.org/10.1080/10409230490460765
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83

Decision letter

  1. Detlef Weigel
    Senior Editor; Max Planck Institute for Developmental Biology, Germany
  2. Ulrich Muller
    Reviewing Editor; University of California, San Diego, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: a previous version of this study was rejected after peer review, but the authors submitted for reconsideration. The first decision letter after peer review is shown below.]

Thank you for submitting your article "Innate potential of random genetic oligomer pools for recombination" for consideration by eLife. Your article has been evaluated by three peer reviewers, and the evaluation has been overseen by a guest Reviewing Editor and Detlef Weigel as Senior Editor.

The reviewers have discussed the reviews with one another. As you will see from the individual reviews, there was a range of opinions regarding the importance and validity of the work. Overall, the work was seen as technically strong, but there were major concerns about the impact of the work. All three reviewers and the guest editor agreed that very substantial revisions would be necessary before the manuscript could be considered again for eLife. Most importantly, reviewer 2 raises the point that the manuscript is too focused on the audience of nucleic acid chemists; a re-submitted manuscript would have to demonstrate, supported by literature, that the study is of interest to a wider audience. Otherwise, reviewer 2 and the guest editor feel that the manuscript would be appropriate for more specialized journals such as Nucleic Acids Research or Angewandte Chemie.

At this point we cannot directly invite revision, but we do remain interested in the work. We would likely reconsider the work, if you can make a much more convincing case regarding the general interest and advance of your findings, plus fully address the following concerns (and address the rest of the reviewers' comments as much as possible).

Essential revisions:

Reviewer 1: major comments 1 and 4.

Reviewer 2, major comment 1: please provide clear arguments, supported by literature, why this study is of interest to a broader audience than nucleic acid chemists, by inserting a section into the introduction. The first paragraph of reviewer 3 (last sentence) could be used as starting point. Major comment 2: Based on this concern, please describe in additional sentences how the reactions monitored in the study could give rise to an overall increase in compositional complexity if both cleavage and ligation events are considered.

Reviewer 3: all major comments.

Reviewer #1:

This study investigates the ability of random sequence pools of RNA, DNA, ANA, HNA, and AtNA to promote ligation and recombination reactions. The work is original, and explores important concepts related to the functional potential of polymers. However, from my perspective it could be significantly improved by considering the following issues.

1) You used gel purification and high-throughput sequencing to identify molecules from different types of random sequence pools that undergo ligation or recombination reactions after long incubations. My major criticism of this work is that you did not show that these sequences react at enhanced rates relative to the average polymer sequence. Each pool probably contains a small number of molecules with enhanced ligation or recombination rates and a large excess of sequences that react at background levels. After a single purification (analogous to a single round of selection), it is possible that these pools will still contain mostly inactive sequences. If this is the case, sequencing will not necessarily provide information about motifs that promote ligation/recombination (but see point two). The clearest way to address this point would be to experimentally measure the ligation/recombination rates of randomly chosen variants from at least one of the evolved pools. Are the ligation/recombination rates of these sequences higher than those of arbitrary sequences or the unevolved starting pool? Figure 2—figure supplement 4 shows that you can measure the rates of some of these sequences in a reasonable amount of time, but not whether they are enhanced relative to arbitrary sequences or the starting pool. The fact that the sequence compositions of the ligated/recombined pools are different from the starting pools is encouraging in this respect. On the other hand, because the results of many experiments in this study cannot be correctly interpreted without knowing why particular sequences reacted, additional characterization is needed. I would also consider putting at least some of these results in the main figures.

2) I am aware of several studies (such as Pitt and Ferré-D'Amaré, 2010 and Jiménez et al., 2013) in which high-throughput sequencing was used to characterize pools after in vitro selection experiments. In these studies, sequenced pools contained multiple copies of some variants, and a key component of the analysis was to show that copy number was correlated with catalytic or binding activity. I could imagine analyzing your experiments in a similar way. Furthermore, showing that some sequences appear multiple times in the high-throughput sequencing reads would provide additional evidence that they have enhanced ligations/recombination rates. I suspect that your evolved pools contained too many sequences to perform this type of analysis, but these issues should be at least briefly discussed.

3) It seems to me that a model of recombination in which a 5' hydroxyl group in one oligonucleotide attacks a phosphodiester bond in the other is simpler than the one proposed here (in which recombination requires both self-cleavage and ligation). I would recommend providing evidence from either experiments or the literature in support of your proposed mechanism.

4) RNA secondary structure can only be predicted to a limited extent. For this reason, I am not convinced that the ligation junctions of these sequences typically occur in the context of duplexes. Can you demonstrate this experimentally? Furthermore, I am not aware of reliable methods to predict RNA tertiary structure. For this reason (and because it is does not change any of the main conclusions of the paper) I would remove the statement in the Results section that J4 forms a purine-rich 4x4 internal loop with a triple-shared GA motif.

5) For me, one of the most exciting and interesting parts of the manuscript was your analysis of the reactivity of polymers other than RNA and DNA. However, the way in which the results were presented was not logical to me. I would recommend first comparing the ability of each of these polymers to promote ligation/recombination reactions under the same conditions, and then discuss new types of experiments (such as treating with base prior to incubation).

Reviewer #2:

Mutschler et al., have investigated spontaneous cleavage and ligation reactions among a diverse population of RNA (and RNA analogue) molecules. These reactions are well known, and the ability of a complementary template to accelerate the rate of ligation was first described over 40 years ago. Although these reactions have previously been carried out with mixed populations, this is the first study, to my knowledge, to apply deep sequencing to assess sequence preferences, especially for ligation. Not surprisingly, the authors find that there are some sequence preferences, which reflect a combination of intrinsic chemical reactivity and the potential for templated interactions. The reasons for the former are obscure, whereas the latter are due to both canonical and non-canonical templating effects. It is not surprising that DNA, ANA (arabino nucleic acid), and HNA (hexitol nucleic acid), all of which lack a vicinal cis-hydroxyl, do not undergo these reactions, whereas AtNA (altritol nucleic acid), which contains a vicinal cis-hydroxyl, does.

This is a well-executed study that will be of interest to nucleic acid chemists, but will not have broader appeal, as would be required for publication in eLife. The conclusions are generally well supported by the data. However, it is somewhat misleading to point to the "increase in the compositional and structural complexity of recombined pools" while ignoring the decrease in oligomer length that occurs due to the same reactions. If one considers only the ligated products, then those materials can be said to have increased complexity. But for the assembly of materials that involve both cleavage and ligation events (as in Figure 3), one must also consider the many shorter cleavage products that arise. It is not clear from this study how the reactions could achieve an overall increase in compositional complexity. Presumably this would require a fractionation process or means to shift the chemical equilibrium in favor of ligation. The present study would be more suitable for a specialized journal such as Nucleic Acids Research or Angewandte Chemie.

Reviewer #3:

This manuscript describes and extensively analyzes the ability of random sequences to ligate and recombine. The testing was primarily performed with RNA sequences and then compared across multiple genetic polymers (DNA, HNA, ANA, AtNA). The authors show (for the most part) that recombination via trans-ligation of two termini and via sequence displacement is detectable either directly using denaturing PAGE or, in cases when constant regions were added to the distal termini, using (RT)PCR. The reactions were performed in the eutectic phase of water-ice and control reactions incubated at -80 °C showed no activity. Overall this is a comprehensively designed and analyzed work with far-reaching importance, particularly for the Origin of Life field, but also for synthetic and chemical biology.

The ligated junctions for RNA seem to favor a G on the downstream of the ligation site. The RNA ligation site is thus biased, though not completely, towards a C/G sequence, not just CN. The authors use the unligated pool as a control; however, sequencing of unligated prey sequences is strongly biased by the method of introducing sequencing primers on the 5′ end (which depends on cross-templating by the RT enzyme and the SMARTER oligo substrate), so the statistics of the ligated vs unligated nucleotide at the 3′ side of the junction are not reliable. I understand the authors are being careful with their interpretation of the data, but the biases of the trans-templating method are known and can therefore be incorporated into the analysis, at least at the level of discussion if not directly in the Results section.

Figure 4B is key for the conclusions of this manuscript, but the bands for HNA are practically invisible and for AtNA the -80 control is very poorly visible. When intensified, both HNA and AtNA show bands corresponding to longer products. Ditto for DNA and ANA lanes in Figure Figure 4—figure supplement 1, where little to nothing is visible in either the experiment or the control lanes. These gels should be presented intensified as in Figure 3A. Related to this point, why are the -80 °C controls in Figure 4 invisible in the first place?

What are the diversities of libraries used? The theoretical limit of a 20-mer is 42≈1012. Are these libraries exhaustive with respect to the potential diversity limit? Are there known biases in composition for the non-biological genetic polymers? The authors do not mention the amounts of material used, just concentrations. Clearly, sequencing of the entire starting pool (or even recombined and directly isolated ones) is not practical, but for the PCR-amplified pools, the sequencing results may be exhaustive. For that, however, an estimate of the starting diversity and the total amplification of the recombined products would have to be presented. The manuscript would be enhanced if these numbers were included for all experiments and pools, particularly for the RT-PCR-amplified pools, where the HTS results may represent nearly all ligated sequences.

The free energy of activation of a >p vs a triphosphate is known. Assuming similar activation energy for the ligation of a 5′-OH with >p and a triphosphate, the expected kinetics for the background reaction can be estimated from the Rohatgi and Szostak, 1996 paper. An estimate of the background ligation rate and acceleration by the individual motifs should be presented.

https://doi.org/10.7554/eLife.43022.053

Author response

[Editors’ note: the author responses to the first round of peer review follow.]

[…] Reviewer #1:

This study investigates the ability of random sequence pools of RNA, DNA, ANA, HNA, and AtNA to promote ligation and recombination reactions. The work is original, and explores important concepts related to the functional potential of polymers. However, from my perspective it could be significantly improved by considering the following issues.

1) You used gel purification and high-throughput sequencing to identify molecules from different types of random sequence pools that undergo ligation or recombination reactions after long incubations. Each pool probably contains a small number of molecules with enhanced ligation or recombination rates and a large excess of sequences that react at background levels. After a single purification (analogous to a single round of selection), it is possible that these pools will still contain mostly inactive sequences. If this is the case, sequencing will not necessarily provide information about motifs that promote ligation/recombination (but see point 2). Figure 2—figure supplement 4 shows that you can measure the rates of some of these sequences in a reasonable amount of time, but not whether they are enhanced relative to arbitrary sequences or the starting pool. The fact that the sequence compositions of the ligated/recombined pools are different from the starting pools is encouraging in this respect. On the other hand, because the results of many experiments in this study cannot be correctly interpreted without knowing why particular sequences reacted, additional characterization is needed. I would also consider putting at least some of these results in the main figures.

We welcome this comment, which touches upon key aspects of our work and are addressing it below in detail.

The main criticism is based on a misinterpretation of the objective of the paper, which was not to bypass selection / SELEX strategies to isolate defined functional sequences (fitness peaks) (which has been done many times) but rather to investigate the global, systems-level fitness of random sequence pools as a whole (a measurement of which, to our knowledge has never before been attempted). In a sense, what we are trying to determine is what the reviewer 1 refers to as the ‘background’. In this respect, our main findings are the following:

1) Contrary to expectations, the global functionality of the pools exceeds what would be expected by extrapolation from SELEX-type experiments by orders of magnitude (see Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5).

2) RNA pools in particular show an unprecedented innate and spontaneous propensity for recombination. Recombination increases the compositional, structural and informational complexity of RNA pools (as we demonstrate) and hence their adaptive potential (see Figure 4, Figure 5 and Figure 7).

3) Pools of other genetic polymers (specifically Altritol nucleic acids (AtNA)) not found in nature are also capable of recombination, establishing (1) that RNA is not unique in this regard and (2) revealing key chemical and structural parameters required for spontaneous recombination (see Figure 6).

We would argue that all three of these are significant findings with the potential to fundamentally change our perspective on early evolution and indeed on the functional potential of diverse oligomer pools in both biology and chemistry, and as such are of broad interest. However, we do accept that our arguments were not presented in sufficient breadth. In the revised manuscript, we have endeavoured to set out our findings both more clearly for the general reader and discuss them more fully within the context of previous work and the fundamental importance of the observed processes such as recombination in biology.

Please find below a more detailed examination of our main points:

I) in vitro selection experiments (SELEX etc.) have demonstrated repeatedly that diverse random sequence pools of RNA, DNA (as well as some unnatural XNA) oligomers comprise functional sequences including catalysts capable of RNA cleavage and ligation. We therefore may take it as a given that such functional sequences are present in those pools. However, various attempts at extrapolating the frequency of functional sequences all suggest that such functional sequences are extremely rare (10-10-10-13, see e.g. Bartel and Szostak, 1993; Wilson and Szostak, 1999 or Jiménez et al., 2013) depending on the study and functional criteria)). This is turn would suggest that such pools by themselves should be largely inert and that functional sequences would be impossible to discover in the absence of powerful selection methods involving i.e. efficient means for enrichment, replication and amplification. If true, this would clearly call into question one of the key conjectures about the origin of life, which posits a spontaneous emergence of function from random oligomer pools.

In the present work we set out to subject the above conjecture to a critical test based on the hypothesis that while individual functional sequences may be vanishingly rare, global pool functionality may be higher due to system-level interactions among pool sequences. Put simply, selection experiments identify individual sequences that are functional but discard simple interaction networks among pool sequences that may also be functional, e.g. as hetero-dimeric, -trimeric,. -oligomeric assemblies. Thereby, selection experiments likely underestimate the global phenotypic potential of the pools by an undue focus on individual sequences. Indeed, in some cases, as in our own selection experiments seeking to isolate improved RNA triplet polymerase ribozymes, the best adaptive solution turned out to be a heterodimer discovered fortuitously within our selection experiment designed to isolate the most active single sequences (Attwater et al., 2018).

II) Another argument in favour of considering the importance of intrapool interaction networks in potentiating global pool functional potential is their truly vast potential combinatorial diversity. For example, within the eicosamer pools examined herein, which encode a potential diversity of 1012 unique sequences, there are 1012 x 1012 = 1024 potential binary interactions (larger than Avogadro’s number!).

Indeed, the data described in our manuscript indicates that such pools are far from inert but display a pervasive tendency (of between 2-10% of pool sequences) for disproportionation reactions through transesterification chemistry resulting in ligation and recombination reactions among pool members.

III) As outlined above, our intent was never to “bypass” SELEX experiments but rather examine pools global functional potential. However, we accept that we may have given this impression inadvertently by our attempts to dissect the different RNA motifs contributing to pool functional potential in the case of RNA ligation via a 2’, 3’-cyclic phosphate intermediates. We now discuss these results more extensively and provide a more detailed breakdown of the different classes of reactive RNA molecules contributing to this result including an expanded Figure 2 and a new Figure 3 in the main text as suggested by the reviewer.

My major criticism of this work is that you did not show that these sequences react at enhanced rates relative to the average polymer sequence.

With regards to this specific point of reviewer 1, we would like to argue that we in fact did just that.

Our argument runs as follows:

1) The “average polymer sequence” is likely to be unreactive on the timescale examined (hence our proportions of 10% (Figure 1), respectively 2% (Figure 4) of reactivity (depending on pre-activation)). Hence, all ligated or recombined random or semi-random sequence pools show enhanced reaction rates compared to the “average” sequence.

2) Among these subsets of reactive sequences, we find that a majority of intermolecular ligation and recombination reactions proceed via a gapped helical intermediate (as had been described previously for templated >p ligation, see e.g. by Lutay et al., 2006, Biogeosciences (2006), 3: 243 or Usher et al., 1976, Science, 192: 53). In these, intermolecular hybridization provides an internal template and ligation as well as recombination transesterification chemistry via a >p intermediate is accelerated by proximity due to mutual sequence complementarity at the ligation junction.

To better illustrate point and the activity of these, we have now included data on four randomly picked sequence pairs as suggested by the reviewer. Of these, one pair (pair 1) is predicted to form a helical junction at the ligation site, whereas the other three sequence pairs (pairs 2-4) are predicted to form complexes but without a gapped duplex formation at the ligation junction. As expected, only pair 1 shows ligation under our experimental conditions (presumably ligation reaction promoted by proximity), while no ligation is observed for the other three pairs. This data is now shown in the new Figure 2—supplement 2.

3) This naturally leads to the question if all reactions would proceed via a gapped duplex intermediate or if among reactive sequence pairs we could identify other, distinct RNA motifs that would also accelerate intermolecular ligation. Indeed, bioinformatics analysis of a subset of reactive pool sequences allowed us to identify putative secondary structure elements promoting ligation, two examples of which (J4, H4) we analysed in more detail. As we show, J4 clearly accelerates cleavage / ligation beyond the rate provided by simple sequence complementarity and could therefore be referred to as nascent catalytic motif. H4 accelerates the reaction compared to the vast bulk of the pool sequences. However, while it is somewhat slower than a comparable helical intermediate, it promotes formation of a canonical 3’-5’ linkage, which is disfavoured in the helical arrangement (Lutay et al., 2006).

We would therefore argue that we have not only shown that some sequences react with an enhanced rate compared to the average sequence/background rate but that among these, there are again specific motifs, which react with even higher rates and/or distinct regiochemistry. More motifs are very likely to exist but their frequency in the subpool of sequenced RNAs was too low to allow their unambiguous identification.

In order to better illustrate the above arguments as suggested by the reviewer and outline this hierarchical analysis of RNA motifs involved in ligation, we have expanded this section of the manuscript including an expanded Figure 2 and a new Figure 3 in the main text.

The clearest way to address this point would be to experimentally measure the ligation/recombination rates of randomly chosen variants from at least one of the evolved pools. Are the ligation/recombination rates of these sequences higher than those of arbitrary sequences or the unevolved starting pool? Figure 2

As discussed above, there are potentially distinct levels of “background” reactivity. To better illustrate this point, we now specifically include analysis of two typical gapped duplexes as well as the two defined non-duplex RNA motifs identified by bioinformatic analysis. This data is now shown in the new Figure 2—supplement 2 and 6 and main text Figures 2 and 3.

With regards to reconstruction of oligomer reactivities, we note that we can only recapitulate bimolecular reactivity. Any sequences ligated by a 3rd partner oligonucleotide (e.g. acting as a splint) or via more complicated interaction networks or trajectories cannot be reconstructed as the participating oligonucleotides do not form part of the product. Thus, our analysis likely underestimates the extent and diversity of reaction trajectories and reactive motifs as our method is unable to sample in “trans-ligases”. Furthermore, our analysis also excludes non-canonical intra- or intermolecular ligation trajectories that yield circles, lariats or branched products. The lack of reactivity among some randomly picked sequence pairs in Figure 2—supplement 2 should be viewed in this context. Nevertheless, we show that salient features of reactivity as well as individual motifs can be extracted by careful analysis of the data.

Thus, our data unequivocally shows that random RNA sequence pools comprise a diverse range of functionalities both at the level of rare individual sequences (as was known before) and much more prominently at the systems-level / global pool level (which is a key result from the present work).

2) I am aware of several studies (such as Pitt and Ferré-D'Amaré, 2010 and Jiménez et al., 2013) in which high-throughput sequencing was used to characterize pools after in vitro selection experiments. In these studies, sequenced pools contained multiple copies of some variants, and a key component of the analysis was to show that copy number was correlated with catalytic or binding activity. I could imagine analyzing your experiments in a similar way. Furthermore, showing that some sequences appear multiple times in the high-throughput sequencing reads would provide additional evidence that they have enhanced ligations/recombination rates. I suspect that your evolved pools contained too many sequences to perform this type of analysis, but these issues should be at least briefly discussed.

We are of course aware of the work of Pitt and Ferré-D'Amaré, 2010 and Jiménez et al., 2013 but these studies differ from our work in that they sought to correlate the activity-sequence relationships in either a repertoire of mutants of a known small ribozyme motifs or in a population of selected anti-GTP aptamers. Thus, both of these studies sought to map fitness peaks centred around discrete sequences either identified previously or known to occur at high frequency upon selection in the pool examined.

In contrast, our work is aimed at a better understanding not of individual fitness peaks for a given activity but rather the global pool reactivity and potential for function (the fitness landscape in total) as it arises from systems-level interaction networks within the pools.

While we have also mapped reactivity to individual motifs, we did so as a means to better understand the underlying pool reactivity, not to map out fitness peaks.

With regard to sequencing coverage, reviewer 1 is indeed correct that, although there is ca. 1000-fold redundancy of 20-mer sequences in our 1015 sequence pools, the maximum available sequencing depth (109) is insufficient to capture or map sequence enrichment without multiple rounds of selection. Taking this into account, we would therefore argue that the level of reactivity and identification of individual functional sequence pairs in our experiments is striking and unprecedented given the rarity of such sequences observed in previous selection experiments.

The latter is due to the fact that standard SELEX experiments are designed, to iteratively deplete and disrupt the interaction networks underlying global pool reactivity and instead isolate the exceedingly rate unitary sequences capable of function. While such experiments have been ground-breaking in revealing the potential of RNA (as well as DNA and various XNA) sequences for complex catalytic or ligand-binding functions, our results show that back-extrapolation of their frequency leads to a gross underestimation of pool functionality as only single sequences are considered.

Without wishing to overstretch the analogy, our results in many ways parallel similar findings in related fields such as chemistry, where increasingly systems-chemistry approaches are found to yield pathways to products believed to be impossible to obtain by conventional linear synthetic approaches.

In conclusion, in this manuscript we map for the first the time the innate, global functional potential of oligomer pools ab initio. Rather than seeking to isolate the “winners” in an evolutionary experiment (“race”) through cycles of selection, enrichment and amplification, we seek an overview of the potential of all “starters”, i.e. all pool sequences. We have now expanded the discussion of these points to set our work (and its motivation and conclusions) much clearer apart from “conventional” SELEX-type experiments.

3) It seems to me that a model of recombination in which a 5' hydroxyl group in one oligonucleotide attacks a phosphodiester bond in the other is simpler than the one proposed here (in which recombination requires both self-cleavage and ligation). I would recommend providing evidence from either experiments or the literature in support of your proposed mechanism.

We welcome this insightful comment. Previous studies examining RNA ligation via 2’, 3’-cyclic phosphate activation (>p) on an instructing template (e.g. Chetverin et al., 1997; Nechaev et al., 2009) suggested a 2-step mechanism (whereby a first step generates RNA oligomers terminating in >p by hydrolysis). Our own experiments (detailed in Figure 1) of RNA ligation in >p activated random eicosamer pools provides a baseline for reactivity via this chemistry in a random pool setting.

Our observation that recombination reactions in the same pools lacking >p proceeds on a ca. 10-fold slower timescale suggests that either a direct attack mechanism is kinetically slower or that recombination predominantly proceeds via a 2-step mechanism involving (1) (a slow step of) RNA hydrolysis generating >p and (2) ligation via >p. This has been our rationale for favouring this mechanistic trajectory over a direct in-line attack.

Nevertheless, reviewer 1 is of course correct that both mechanisms are likely to occur and indeed may be operating concurrently. In the light of this, we have amended the text to make this clear and now discuss both reaction trajectories in the context of ours as well as published findings.

4) RNA secondary structure can only be predicted to a limited extent. For this reason, I am not convinced that the ligation junctions of these sequences typically occur in the context of duplexes. Can you demonstrate this experimentally? Furthermore, I am not aware of reliable methods to predict RNA tertiary structure. For this reason (and because it is does not change any of the main conclusions of the paper) I would remove the statement on page 5 that J4 forms a purine-rich 4x4 internal loop with a triple-shared GA motif.

This is an important comment, but our arguments do not rely on accurate RNA tertiary (or even secondary) structure prediction.

We very much agree with the referee that exact topologies of RNA secondary structures (that ultimately determine the tertiary structure) are poorly predicted by most software packages (Zhao et al., (2018)). Physics-based free energy minimization approach of RNA secondary structure prediction has been evaluated in detail and found to have a base-pairing consistency of 70% between native and predicted secondary structures for moderately long RNAs (~70 - 500 nts, see e.g. Zhao et al., 2018; Doshi et al., 2004 or Mathews et al., 1999).

However, for our raw analyses only the accuracy of the base-pairing matters (where 70% accuracy is sufficient for a global picture) while “average” tertiary structures are not (and cannot) be analysed using this method. Furthermore, our finding that >p ligation is favoured at helical junctions is not only in full agreement with the literature (Lutay et al., 2006; Usher et al.,1976; Bolli et al. 1997), but we have provide experimental verification in individual examples in this manuscript (Figure 2—figure supplement 2, Figure 2—figure supplement 4, Figure 2—figure supplement 6) suggesting that at the level of the comparatively short RNA oligomers examined herein, predictions are largely trustworthy.

Finally, we note that our suggestion that the J4 motif folds into a 4x4-loop with a triple-sheared GA motif derives not from bioinformatic predictions, but from the fact that J4-like sequence motifs are found in several natural RNAs, for some of which high-resolution structural information is available (23S rRNA of Deinococcus radiodurans (PDB ID: 2ZJR) / Escherichia coli adenosylcobalamin riboswitch (PDB ID: 4GMA)). These structures provide a rationale as to why and how J4 may promote ligation via a 2’, 5’ linkage (as shown in the new Figure 3). Thus, our statement is backed up by multiple lines of structural evidence and we have therefore not altered it in the revised manuscript. Nevertheless, we now provide more context and discussion on the structural folds of J4 and the other RNA motifs to present our arguments more clearly including main text Figure 2, Figure 3 and Figure 2—figure supplement 3, Figure 2—figure supplement, Figure 2—figure supplement 3, Figure 2—figure supplement 4, Figure 2—figure supplement 5 and – Figure 2—figure supplement 6.

5) For me, one of the most exciting and interesting parts of the manuscript was your analysis of the reactivity of polymers other than RNA and DNA. However, the way in which the results were presented was not logical to me. I would recommend first comparing the ability of each of these polymers to promote ligation/recombination reactions under the same conditions, and then discuss new types of experiments (such as treating with base prior to incubation).

We welcome this comment and agree that the differential reactivity of chemically closely related but distinct genetic polymers are a key finding.

As suggested by reviewer 1, we had originally compared all genetic polymers under the same conditions (e.g. neutral pH, 10 mM Mg2+), where we only observed reactivity with RNA pools (Figure 6A). We subsequently introduced modified conditions in order to expedite reactivity in the inert pools e.g. addition of Zn2+ in the case of DNA and ANA pools, which had previously been shown to be an essential cofactor in autocatalytic cleavage by small DNA motifs (Gu et al., 2013). However, we found that even in the presence of Zn2+, we observed no reactivity in the DNA (or ANA) pools. These negative results are now provided in Figure 6—figure supplement 1.

As we had already observed degradation of AtNA (but not HNA) in response to alkaline pH (although 6-fold slower rate for AtNA than RNA, Figure 6—figure supplement 2), we tested RNA, HNA and AtNA pool reactivity following a pulse of NaOH (now Figure 6B). Indeed, this revealed that AtNA, like RNA- is able to recombine, although reactivity was noticeably weaker (ca. 10-20-fold). In contrast, we did not observe recombination of AtNA (and HNA) in absence of NaOH. We had previously not included these negative results but now the data is presented in the new Figure 6—figure supplement 5. A direct comparison of the observed recombination yields between RNA and AtNA is also complicated by the fact that observed yields depend on the efficacy of the respective reverse transcriptase, the sensitivity of which undoubtedly better for RNA. We have now expanded the discussion of the above results in the revised manuscript to better outline the rationale for our different experimental strategies.

Reviewer #2:

Mutschler et al., have investigated spontaneous cleavage and ligation reactions among a diverse population of RNA (and RNA analogue) molecules. These reactions are well known, and the ability of a complementary template to accelerate the rate of ligation was first described over 40 years ago. Although these reactions have previously been carried out with mixed populations, this is the first study, to my knowledge, to apply deep sequencing to assess sequence preferences, especially for ligation. Not surprisingly, the authors find that there are some sequence preferences, which reflect a combination of intrinsic chemical reactivity and the potential for templated interactions. The reasons for the former are obscure, whereas the latter are due to both canonical and non-canonical templating effects. It is not surprising that DNA, ANA (arabino nucleic acid), and HNA (hexitol nucleic acid), all of which lack a vicinal cis-hydroxyl, do not undergo these reactions, whereas AtNA (altritol nucleic acid), which contains a vicinal cis-hydroxyl, does.

1) This is a well-executed study that will be of interest to nucleic acid chemists, but will not have broader appeal, as would be required for publication in eLife.

We would like to respectfully disagree with this assertion of reviewer 2. Recombination, the exchange of information between different genetic polymer strands, is of fundamental importance in biology (see e.g. Visser and Elena, 2007 Pesce et al., 2016) both for genetic diversification in the germline (meiosis) and soma (e.g. gene conversion in avian antibody generation) as well as genome maintenance and repair. It is furthermore of critical importance to guard against the progressive corruption of genetic information by the deleterious effects of genetic drift (i.e. Muller’s ratchet), enabling elimination of neutral or weakly deleterious mutations by backcrossing (Muller, (1964)).

RNA recombination is widespread among RNA viruses (Simon-Loriere and Holmes, 2011). While generally attributed to template switching etc. recombination by inferred purely chemical processes has been observed in various viruses such as Poliovirus (Gmyl et al., 1999), rubella virus (Adams et al., 2003) or Qbeta bacteriophage (Chetverin et al., 1997), pointing to the possibility of a wider importance of non-enzymatic RNA recombination in present day biology with implications for non-viral transcriptome complexity and fluidity.

We would therefore argue that our finding that RNA – due to its specific chemical constitution- has an innate capacity and tendency to recombine even in the absence of recombinase enzyme or extraneous activating chemistry has important implications both for origins research as well as indeed the analysis of all diverse RNA oligomer assemblies.

In the context of primordial RNA pools, this innate capacity has fundamental implications as it provides for a progressive disproportionation of RNA sequences, which does increase pool complexity by the de novo generation of diversity of RNA oligomer length and sequence content and secondary structure driven by intrapool RNA base pairing networks (see also response to point 2 below).

The potential for spontaneous recombination is likely innate to any diverse assembly of RNA sequences (including those found in biology) and should be considered when analysing such assemblies and their dynamics including studies of the transcriptome and viral populations and quasi-species.

2) The conclusions are generally well supported by the data. However, it is somewhat misleading to point to the "increase in the compositional and structural complexity of recombined pools" while ignoring the decrease in oligomer length that occurs due to the same reactions. If one considers only the ligated products, then those materials can be said to have increased complexity. But for the assembly of materials that involve both cleavage and ligation events (as in Figure 3), one must also consider the many shorter cleavage products that arise. It is not clear from this study how the reactions could achieve an overall increase in compositional complexity.

We welcome this insightful comment by reviewer 2 and agree that our claims of increased compositional and informational diversity/complexity need to be supported by a more comprehensive analysis of the impact of recombination on pool complexity.

1) We would like to point out that while notions of spontaneous generation of increased informational and compositional complexity by RNA pools may seem counterintuitive, they are in fact both predicted and supported by theoretical arguments from statistical mechanics and thermodynamics, which suggest that under the right boundary conditions an increase in length and informational complexity of (RNA) polymer chains is favoured by entropic factors (see e.g. Blokhuis and Lacoste, 2017). We now outline these arguments as part of an expanded Discussion section in the revised manuscript.

2) We now provide in silico simulations and analysis of the population level Shannon entropy (an established measure of pool sequence diversity i.e. informational complexity (Derr et al., 2012)) of pre- and post-recombination pools. These calculations consider total pool informational content, under conditions of random oligomer cleavage and ligation and show that such recombination indeed increases the pool informational diversity under almost any boundary condition, provided that recombination is applied to a pool with significant sequence redundancy. Indeed, this applies to our experimental system: the 1015- member eicosamer pools analysed herein are redundant with respect to encoded information, covering the total combinatorial diversity (420 =1.09 x 1012) ca. 1000-fold (i.e. there are approx. 900 copies of each eicosamer sequence within the pool).

Our finding that recombination diversifying parts of a redundant sequence pool generates new information is also intuitively understandable, as recombination of part of a redundant pool generates new sequences (both in length distribution and information content), while maintaining (i.e. not destroying) the previous sequence and information content (as redundancy persists but is reduced). In other words, sequence / compositional / informational complexity is increased at the expense of redundancy.

Other outcomes of our calculations may be less obvious but no less interesting. These include the finding that the informational gains through recombination scale both with sequence code complexity (4 letter code > 2 letter code) and oligomer length as Recombination of longer oligomer pools generates more diversity. The simulations illustrate the trade-offs between enhanced cleavage rates, which generate more short fragments (loss of information) but also generate more ligatable fragments (and thus enable information gains) and ligation rates, which fix informational gains.

In addition to detailed calculations, we now also provide an extended discussion of these points in our revised manuscript including a new Figure 7 and accompanying figure supplements.

3) It is not clear from this study how the reactions could achieve an overall increase in compositional complexity. Presumably this would require a fractionation process or means to shift the chemical equilibrium in favor of ligation. The present study would be more suitable for a specialized journal such as Nucleic Acids Research or Angewandte Chemie.

Reviewer 2 points out an important aspect of our study. While the above (see point 2) simply examines the generation of diversity as a theoretical concept of informational complexity in the context of early evolution, the impact of recombination on sequence pools under fractionation and / or adaptive pressure is more significant and more relevant to real world scenarios.

Indeed, there are a number of physicochemical processes (including thermophoretic cycles in porous substrates / adsorption to (mineral) surfaces etc.), which would favour length-extended oligomers over pool members or short cleavage products (see for example Kreysing et al., 2015). However, the simplest fractionation process that favours more diverse and length extended oligomer sequences is selection for (higher order) function.

Previous studies had established the importance of pool diversity as well as increased oligomer length not only for the discovery of functional motifs but, more importantly, for the discovery of the most active or complex functional motifs (reviewed in Pobanza and Luptáka, 2016), even in the presence of trade-off from an increased tendency for ambiguous folding of longer RNA oligomers (Sabeti et al., 1997). Furthermore, the power of continued recombination of sequence pools under selective pressure to accelerate adaptive walks and evolutionary optimization is clearly demonstrated both by examples from biology (gene conversion, RNA virus recombination (see above)) and protein engineering (see e.g. Stemmer, 1994, reviewed in Arnold, 2009).

Therefore, we would argue that this all leads to an inescapable conclusion, which is that the innate potential of RNA oligomer pools for recombination provides a route unique to RNA (and to some degree other genetic polymers, see below) to bootstrap themselves towards higher informational, compositional and structural diversity and hence evolutionary potential.

Given the challenges of prebiotic chemistry to generate oligomer pools of substantial length (the here investigated 20-mers are very much at the upper end of RNA oligomer lengths reported containing all four nucleotides) the innate potential of such pools to extend themselves in length and diversity by recombination is highly significant. In light of theoretical arguments, which suggest a sharp drop in the potential for folding into stable secondary (and by inference tertiary) structures needed for function below a certain oligomer length limit (Briones et al., 2009), recombination might have been of fundamental importance for the emergence of the first functional RNAs.

Furthermore, in the presence of a fractionation process selecting for increased oligomer length (e.g. thermophoresis (see above)) or a given function (such as binding to a substrate and / or catalysis), RNA pools capable of recombination would yield a continuous supply of extended, more structured and more functional motifs and thereby outcompeting putative rival genetic polymers (such as DNA) unable to recombine. Such process would be potentiated by an ongoing process of recombination during selection (or cyclical iteration of either) as recombination has previously been shown to be a powerful means to accelerate adaptive walks (see above).

Finally, having established the potentially profound importance of recombination for the emergence of function from oligomer pools, our work begins to define key chemical parameters required for the display of an innate potential for recombination. Among these, a vicinal diol geometry on the ring appears to be crucial enabling formation of a cyclic phosphate upon hydrolysis, which supports subsequent ligation.

Reviewer #3:

[…] The ligated junctions for RNA seem to favor a G on the downstream of the ligation site. The RNA ligation site is thus biased, though not completely, towards a C/G sequence, not just CN. The authors use the unligated pool as a control; however, sequencing of unligated prey sequences is strongly biased by the method of introducing sequencing primers on the 5′ end (which depends on cross-templating by the RT enzyme and the SMARTER oligo substrate), so the statistics of the ligated vs unligated nucleotide at the 3′ side of the junction are not reliable. I understand the authors are being careful with their interpretation of the data, but the biases of the trans-templating method are known and can therefore be incorporated into the analysis, at least at the level of discussion if not directly in the Results section.

We welcome this perceptive comment.

The bias of trans-templating is of course real and similar issues exist in other library preparation methods for diverse nucleic acid pools. However, we have several lines of evidence that the CpN dinucleotide signature is a direct result of the transesterification reaction. These are detailed below:

1) We prepared ligated and recombined RNA pool fractions using a range of different methods (incuding adaptor ligation, direct RT-PCR, SMARTER RT-PCR) but observed the CpN bias regardless. Even the N20 x N20A10 products produced by polyT-primed SMARTer RT-PCR (Figure 5), which we do not compare to preligation material show a raw C-bias at the NpN recombination site in products ≤ 50 nts (Figure 5C). Here, the random nucleotide at the position of the recombination event was never exposed to terminal adapter ligation or template switching.

2) The reference pools of unligated N20>p and C1/C3-bait>p semi-random RNA (controls for experiments described in Figure 1 respectively Figure 2) were prepared by dephosphorylation of the >p followed by adapter ligation of a preadenylated oligonucleotide and not by SMARTer RT-PCR. While this protocol is also prone to sequence biases at the 3’-end (and close to it), due to sequence preferences of the ligase, the reference pools used in Figure 1-supplement 1 are unbiased further upstream of the adapter-ligated 3’-end, where the comparison happens and where we observe the same CpN signature in all recombination reactions. Furthermore, even without taking the pre-ligation material as reference, (and as already stated in 1), a raw C-bias is directly visible in all recombination products shown in the colour plot in Figure 4D, Figure 5C and the supplementary movie, where the C-bias follows the recombination “seam” of the differently sized products independently from the library preparation method.

Thus, the overall ligation junction sequence bias / motif CpN was apparent throughout all RNA recombination experiments with some (minor) variations as detailed in Figure 1, Figure 4, Figure 5 and Figure 1—figure supplement 1. While referee 3 is correct at noting the bias for a downstream G (CpG) in the N20>p experiments, this bias is not observed in all cases (e.g. the semi-random RNAs shown in Figure 1—figure supplement 1 and Figure 2—figure supplement 1D) and we have therefore not elaborated on it.

Figure 4B is key for the conclusions of this manuscript, but the bands for HNA are practically invisible and for AtNA the -80 control is very poorly visible. When intensified, both HNA and AtNA show bands corresponding to longer products. Ditto for DNA and ANA lanes in Figure 4—figure supplement 1, where little to nothing is visible in either the experiment or the control lanes. These gels should be presented intensified as in Figure 3A. Related to this point, why are the -80 °C controls in Figure 4 invisible in the first place?

The reviewer is correct that bands for HNA (both -9 oC & -80 oC), AtNA (-80 °C), DNA (both -9 oC & -80 oC) and ANA (both -9 °C & -80 °C) are practically invisible. Indeed, these reactions are what we consider negative (i.e. ligation / recombination did not take place, and hence no unambiguous PCR amplicons (larger than primer-dimers) were detected (in contrast with clear amplification products of RNA (-9 oC) and AtNA (-9 oC)).

As reviewer 3 correctly observes, it is also possible to make out faint bands in some of the other reactions including negative control PCRs, e.g. at -80 °C. However, none of these weak bands relate to (weak) recombination / ligation activity in e.g. the DNA, ANA or HNA pools for a number of reasons:

The appearance of weak amplification products is a common occurrence in all PCR reactions, where (spurious) bands will eventually appear after sufficient PCR cycles, as even minute amounts of mispriming or cross-contamination are exponentially amplified. The problem is greatly increased in PCR amplifications from random (or semi-random) sequence pools due to the increased likelihood of mispriming events on the random sequence segments. Furthermore, the weak bands pointed out by the referee occur to the same extent in both the -9 °C samples (incubated for prolonged periods) and -80 °C negative controls (which were immediately deep frozen at -80oC) and can therefore not be the consequence of a reaction occurring during the incubation period.

The typical low-molecular weight band seen in the -80 °C negative control reactions (likely due to mispriming of the RT-primer on the random sequence part of the bait fragment) is reduced or absent in Figure 6B (previously Figure 4B), due a) to the pre-incubation NaOH treatment, which leads to fragmentation of semi-randomers (in the case of RNA and AtNA) and b) a lower sensitivity of the RT enzyme for HNA and AtNA. Both a) and b) cause less opportunity for spurious background mispriming of the RT primer in the experiment shown in Figure 6B. At the same time, the intensity of the bona fide recombination products in both RNA (and AtNA) (when incubated at -9 oC) is enhanced by NaOH (as would be expected).

We therefore stand by our previous conclusion that only RNA (and to a lesser extent AtNA) pools have an innate potential for ligation and recombination while this capacity is either absent or negligible in DNA, ANA and HNA pools.

What are the diversities of libraries used? The theoretical limit of a 20-mer is 42≈1012. Are these libraries exhaustive with respect to the potential diversity limit? Are there known biases in composition for the non-biological genetic polymers? The authors do not mention the amounts of material used, just concentrations. Clearly, sequencing of the entire starting pool (or even recombined and directly isolated ones) is not practical, but for the PCR-amplified pools, the sequencing results may be exhaustive. For that, however, an estimate of the starting diversity and the total amplification of the recombined products would have to be presented. The manuscript would be enhanced if these numbers were included for all experiments and pools, particularly for the RT-PCR-amplified pools, where the HTS results may represent nearly all ligated sequences.

We thank reviewer 3 for this insightful comment and apologize for having omitted a fuller discussion of the diversities explored, which we now provide in the revised manuscript.

We chose eicosamer pools because, within the repertoire sizes explored within our experiments (1014-1015 molecules), there is considerable redundancy within these pools (diversity 420 = 1.09 x 1012). Although we are unable to completely capture this within the current limits of NGS sequencing (ca. 109 reads), we can infer whole pool diversity from the degree of randomness of the partially sequenced pool.

Even within reacted pools where we observe 10% (Figure 1, 1014 sequences) resp. 2% or 0.2% (Figure 5, 1013 or 1012) of pool sequences ligated / recombined, NGS cannot provide full coverage. We are therefore, in all cases, observing a representative sample. Within that sample we can exploit similarities and redundancy such as is present in structural classes of RNA motifs. We leverage this in the bioinformatic RNA motif classification analysis we present (Figure 2, Figure 3 and associated supplementary figures) and show that it enables us to identify reactive motifs.

Another issue worthy of consideration is the fact that while we obtain RNA and DNA pools directly by solid-phase synthesis), XNA pools (ANA, HNA, AtNA) need to be obtained by enzymatic synthesis using bespoke engineered polymerases, as solid phase synthesis is not available. While enzymatic synthesis is efficient on random DNA templates, inevitable biases are introduced, which reduce the diversity of XNA pools compared to DNA and RNA pools. Nevertheless, this is unlikely to significantly impact pool reactivity, given the considerable redundancy within eicosamer pools.

The free energy of activation of a >p vs a triphosphate is known. Assuming similar activation energy for the ligation of a 5′-OH with >p and a triphosphate, the expected kinetics for the background reaction can be estimated from the Rohatgi and Szostak, 1996 paper. An estimate of the background ligation rate and acceleration by the individual motifs should be presented.

We welcome this perceptive comment. However, we would like to refrain from using triphosphate dependent ligation as a reference as the activation energy is critically dependent on the presence and concentration of divalent metal ions like Mg2+ (which are absent in the majority of our reactions). Furthermore, in triphosphate-based ligations, the diphosphate leaving group renders the ligation reaction quasi-irreversible (kobs = kligation). In contrast, >p-dependent ligation is reversible due to the absence of a leaving group (kobs = kligation + kcleavage) and the (chemically) almost isoenergetic reaction equilibrium.

Finally, it is challenging to extrapolate triphosphate-dependent ligation rates determined under ambient in-solution conditions to reactions taking place at much lower temperatures in eutectic in-ice conditions.

In order to provide a (better) estimate of the rate acceleration, we now compare the apparent reaction rates (kobs) for motif J4 and H4 to the kobs of three template / splinted ligation reactions (which may be considered the background ligation rate occurring in a gapped duplex) and summarize this data in the new Figure 2—figure supplement 6.

https://doi.org/10.7554/eLife.43022.054

Article and author information

Author details

  1. Hannes Mutschler

    MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    Present address
    Max Planck Institute of Biochemistry, Martinsried, Germany
    Contribution
    Conceptualization, Resources, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    Contributed equally with
    Alexander I Taylor
    For correspondence
    mutschler@biochem.mpg.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8005-1657
  2. Alexander I Taylor

    MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    Contribution
    Conceptualization, Resources, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    Contributed equally with
    Hannes Mutschler
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-7684-1437
  3. Benjamin T Porebski

    MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    Contribution
    Software, Validation, Visualization, Methodology
    Competing interests
    No competing interests declared
  4. Alice Lightowlers

    MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    Present address
    Weatherall Institute of Molecular Medicine, Department of Medicine, University of Oxford, Oxford, United Kingdom
    Contribution
    Validation, Investigation
    Competing interests
    No competing interests declared
  5. Gillian Houlihan

    MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    Contribution
    Resources, Developed, produced and tested the new RT-enzyme
    Competing interests
    No competing interests declared
  6. Mikhail Abramov

    REGA Institute, Katholieke Universiteit Leuven, Leuven, Belgium
    Contribution
    Resources, Synthesized and quality-checked the HNA and AtNA nucleosides used in this study
    Competing interests
    No competing interests declared
  7. Piet Herdewijn

    REGA Institute, Katholieke Universiteit Leuven, Leuven, Belgium
    Contribution
    Resources, Funding acquisition
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3589-8503
  8. Philipp Holliger

    MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    Contribution
    Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Validation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    ph1@mrc-lmb.cam.ac.uk
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3440-9854

Funding

Medical Research Council (MC_U105178804)

  • Alexander I Taylor
  • Benjamin T Porebski
  • Gillian Houlihan
  • Philipp Holliger

Federation of European Biochemical Societies (FEBS Long-Term Fellowship)

  • Hannes Mutschler

KU Leuven (OT/1414/128)

  • Mikhail Abramov
  • Piet Herdewijn

Fonds voor Wetenschappelijk Onderzoek - Vlaanderen (G078014N)

  • Mikhail Abramov
  • Piet Herdewijn

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors thank J Attwater for discussions and comments on the manuscript.

This work was supported by a Federation of European Biochemical Societies (FEBS) Long-Term Fellowship (HM), the Medical Research Council (PhH, AIT, GH, program no. MC_U105178804) and from KU Leuven, Research Council (OT/1414/128) and FWO Vlaanderen (G078014N) (MA and PiH).

Senior Editor

  1. Detlef Weigel, Max Planck Institute for Developmental Biology, Germany

Reviewing Editor

  1. Ulrich Muller, University of California, San Diego, United States

Publication history

  1. Received: October 21, 2018
  2. Accepted: November 16, 2018
  3. Accepted Manuscript published: November 21, 2018 (version 1)
  4. Version of Record published: December 11, 2018 (version 2)

Copyright

© 2018, Mutschler et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 934
    Page views
  • 167
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Biochemistry and Chemical Biology
    2. Neuroscience
    Apurwa M Sharma et al.
    Research Advance
    1. Biochemistry and Chemical Biology
    2. Chromosomes and Gene Expression
    Jeremy G Bird et al.
    Research Article