tRNA sequences can assemble into a replicator
Abstract
Can replication and translation emerge in a single mechanism via self-assembly? The key molecule, transfer RNA (tRNA), is one of the most ancient molecules and contains the genetic code. Our experiments show how a pool of oligonucleotides, adapted with minor mutations from tRNA, spontaneously formed molecular assemblies and replicated information autonomously using only reversible hybridization under thermal oscillations. The pool of cross-complementary hairpins self-selected by agglomeration and sedimentation. The metastable DNA hairpins bound to a template and then interconnected by hybridization. Thermal oscillations separated replicates from their templates and drove an exponential, cross-catalytic replication. The molecular assembly could encode and replicate binary sequences with a replication fidelity corresponding to 85–90 % per nucleotide. The replication by a self-assembly of tRNA-like sequences suggests that early forms of tRNA could have been involved in molecular replication. This would link the evolution of translation to a mechanism of molecular replication.
eLife digest
The genetic code stored within DNA contains the instructions for manufacturing all the proteins organisms need to develop, grow and survive. This requires molecular machines that ‘transcribe’ regions of the genetic code into RNA molecules which are then ‘translated’ into the string of amino acids that form the final protein. However, these molecular machines and other proteins are also needed to replicate and synthesize the sequences stored in DNA. This presents evolutionary biologists with a ‘chicken-and-egg’ situation: which came first, the DNA sequences needed to manufacture proteins or the proteins needed to transcribe and translate DNA?
Understanding the order in which DNA replication and protein translation evolved is challenging as these processes are tightly intertwined in modern-day species. One theory, known as the ‘RNA world hypothesis’, suggests that all life on Earth began with a single RNA molecule that was able to make copies of itself, as DNA does today. To investigate this hypothesis, Kühnlein, Lanzmich and Braun studied a molecule called transfer RNA (or tRNA for short) which is responsible for translating RNA into proteins. tRNA is assumed to be one of the earliest evolved molecules in biology. Yet, why it was present in early life forms before it was needed for translation still remained somewhat of a mystery.
To gain a better understanding of tRNA’s role early in evolution, Kühnlein, Lanzmich and Braun made small changes to its genetic code and then carried out tests on these tRNA-like sequences. The experiments showed these ‘early’ forms of tRNA can actually self-assemble into a molecule which is capable of replicating the information stored in its sequence. It suggests early forms of tRNA could have been involved in replication before modern tRNA developed its role in protein translation.
With these experiments, Kühnlein, Lanzmich and Braun have identified a possible evolutionary link between DNA replication and protein translation, suggesting the two processes emerged through one shared pathway: tRNA. This deepens our understanding about the origins of early life, while taking biochemists one step closer to their distant goal of recreating self-replicating molecular machines in the laboratory.
Introduction
A machine that creates a replica of itself is an old dream of engineering (von Neumann, 1951). Biological systems solved this problem long ago at the nanoscale with DNA and RNA. Their replication machinery was optimized to perfection through Darwinian evolution. In modern living systems, the replication of DNA and RNA requires the formation of covalent bonds and an interconnected machinery: proteins to perform base-by-base replication of sequence information, a modern metabolism to supply activated molecules, and tRNA as well as the ribosome to create the required proteins.
This is a complex system to set up in the first place at the emergence of life. The RNA world hypothesis proposes that, early on, the catalytic function of highly defined RNA sequences was used for self-replication (Horning and Joyce, 2016; Orgel, 2004; Turk et al., 2011). These ribozymes catalyze the ligation of RNA (Doudna et al., 1991; Mutschler et al., 2015; Paul and Joyce, 2002; Robertson et al., 2001; Walton et al., 2020) and the addition of individual bases (Attwater et al., 2013; Horning and Joyce, 2016). These very special sequences were engineered using in vitro evolution. It is unclear how autonomous evolution of early life could have reached such levels of sequence complexity.
Here, we focus on how such replication may have been predated by simpler forms of self-replication. A replicator must fulfill a series of requirements. It must copy with sufficient fidelity, be fast, enable exponential growth, be fed by an autonomous energy source, not require complex sequences, and not form too many replicates in the absence of a template.
We show that replication of information can be realized by the reversible hybridization interactions between tRNA-like molecules alone. The proposed mechanism is driven by an external physical non-equilibrium setting, in our case thermal oscillations. Since the process does not involve chemical ligation, it does not rely on a particular non-enzymatic or catalytic ligation chemistry (Dolinnaya et al., 1988; Engelhart et al., 2012; Patzke et al., 2014; Pino et al., 2011; Rohatgi et al., 1996; Sievers and von Kiedrowski, 1994; von Kiedrowski, 1986) or particular catalytically active sequences, but merely requires sequence complementarity. The advantage of reversible hybridization is the re-usability of educts and products. Moreover, sequence-encoded interactions can self-select by forming agglomerates.
Nature’s approach to achieve exponential growth is the usage of cross-catalysis: the replicate of a template serves as a template for the next round of replication. For short replicators under isothermal conditions, the binding between template and replicate has to be weak such that the dissociation of strands happens spontaneously and is not rate limiting (Paul and Joyce, 2002; Sievers and von Kiedrowski, 1994; von Kiedrowski, 1986). For longer replicates, temperature change has successfully been used to separate strands for replication catalyzed by thermostable proteins (Barany, 1991; Saiki et al., 1985). For catalytic RNA, elevated salt concentrations disfavor strand separation by temperature and catalyze hydrolysis (Horning and Joyce, 2016). In an interesting alternative to strand separation by temperature, Schulman et al. used moderate shear flows to separate DNA tile assemblies (Schulman et al., 2012).
Apart from nucleotide-based replicators, very interesting replication systems using non-covalent interactions have been developed with non-biological compounds (Bottero et al., 2016; Sadownik and Philp, 2008; Tjivikua et al., 1990), peptide-based approaches (Altay et al., 2017; Bourbo et al., 2011; Carnall et al., 2010; Lee et al., 1996; Rubinov et al., 2012), and peptide nucleic acids (Ura et al., 2009). We also want to point to several instructive reviews about the state-of-the-art systems chemistry regarding self-replication (Adamski et al., 2020; Ashkenasy et al., 2017; Kosikova and Philp, 2017).
In the past, metastable hairpin states have been prepared in a physically separated manner. The reaction was then triggered by mixing. For example, the mixing of hairpins with a trigger sequence has been shown to form long concatemers (Dirks and Pierce, 2004). With a similar logic, mixing a low entropy combination of molecules was used to create entropically driven DNA machines, including exponentially amplifying assemblies (Zhang et al., 2007). These reactions run downwards into the binding equilibrium. However, the preparation of the initial low entropy state required human intervention or a unique flow setting for mixing.
Sequence design
We designed a set of cooperatively replicating DNA strands using the program package NUPACK (Zadeh et al., 2011). The sequences are designed to have self-complementary double hairpins and are pairwise complementary within the molecule pool, such that the 3’ hairpin of one strand is complementary to the 5’ hairpin of the next. Their structure resembles the secondary structure of proto-tRNAs proposed by stereochemical theories (Figure 1a), comprising two hairpin loops that surround the anticodon with a few neighboring bases (Krammer et al., 2012). At 82–84 nt, the double hairpins have the length of average tRNA molecules (Sharp et al., 1985), with stem loops consisting of 30–33 nt and the information-encoding interjacent domains of 15 nt. As the replication mechanism is based on hybridization only, it is expected to perform equally well for DNA and RNA. Here, we implemented the system with DNA rather than with RNA as done previously (Krammer et al., 2012), because the 82–84 nt long sequences are simpler and less expensive to synthesize as DNA. In both the design and the implementation, we did not see significant differences between the two versions. Due to short heating times and moderate magnesium concentrations, we estimate that an RNA version could survive for days if not weeks (Li and Breaker, 1999; Mariani et al., 2018). The most critical step regarding RNA stability would be the initial temperature spike to 95 °C, which remains unchanged from our previous study (Krammer et al., 2012) and did not prove critical. We also show that an RNA version behaves structurally identically to the implemented DNA version (Figure 1—figure supplement 1).
Replication mechanism
The replication mechanism is a template-based replication, where instead of single nucleotides, information is encoded by a succession of oligomers. The domain, at the location of the anticodon in tRNA, is the template sequence and thus contains the information to be replicated. We therefore term it information domain. The goal is to replicate the succession of information domains.
To allow longer replicates, we chose the resulting meta-sequences to be periodic with a periodicity of four different hairpins. This makes the minimal cyclic meta-sequence large enough to keep the information domains accessible even in cyclic configuration. The information domains feature a binary system and contain sequences marked by '0' and '1' (blue/red). For replication, two sets of strands replicate strings of codons in a cross-catalytic manner (Figure 1b), using complementary information domains (light/dark colors).
The replication is driven by thermal oscillations and operates in four steps (Figure 1b): (0) Fast cooling within seconds brings the strands to their activated state with both hairpins closed. (1) At the base temperature, activated strands with complementary information domains can bind to an already assembled template. (2) Thermal fluctuations cause open-close fluctuations of the hairpins. When strands are already bound to a template at the information domain, those fluctuations permit adjacent complementary hairpins of different strands to bind. In this way, the succession of information domains is replicated. (3) Subsequent heating splits the newly formed replicate from the template at the information domains. Due to their higher melting temperatures, the backbone of hairpin strands remains stable. Both replicate and template are then available as templates for a new replication round, which makes the replication cross-catalytic. Later, high-temperature spikes can unbind and recycle all molecules for new rounds of replication.
Because of the initial fast cooling, all hairpins are closed in free solution. This inhibits the formation of replicates without template. While the binding of adjacent hairpins with template happens within minutes, hairpins in free solution connect without template only on timescales slower than hours and thus give false positives at a very low rate.
The basic principle of this replication mechanism was previously explored by Krammer et al. using a set of four hairpins using half a tRNA sequence (36 nt) that amplified into dimers (Krammer et al., 2012). This amplification could not encode information and suffered from a high rate (>50 %) of unspecific amplification without template (Figure 4 therein). Here, in contrast, we demonstrate exponential amplification, and the replicator can now encode sequence information ‘0’ and ‘1’ with four bits. Moreover, the strands making up the new replicator are double hairpins with the sequence structure and length of tRNA. The replicator now shows a significantly decreased unspecific amplification without template of approximately 10 % (Figure 5a).
Results
Analysis of molecule conformations
Native polyacrylamide gel electrophoresis (PAGE) showed that the double hairpins assembled as intended (Figure 2). Comparing different subsets of strands allowed all gel bands to be identified.
All complexes were formed at concentrations of 200 nM of each strand and could be resolved despite their branched tertiary structure. Friction coefficients of complexes of two to four strands were 1.6–1.8-fold higher than for linear dsDNA, and 2.4-fold higher for larger complexes (4:4 configuration, ca. 660 nt, Figure 2—figure supplement 1). This agrees with the branched structure of the suggested strand assembly geometry (Figure 1a). Partially assembled complexes of two or three strands bound to a four-strand template could be resolved (Figure 6—figure supplement 1). Complexes containing single bound information domains were not stable during electrophoresis (Figure 2, lanes 2, 7 and Figure 6—figure supplement 1). This made it possible to differentiate fully assembled complexes from those where individual strands are bound to a template but have not formed backbone duplexes. Covalent end labels and two reference lanes on each gel were used to quantify concentrations from gel intensities using image analysis as described in Materials and methods.
Selection by agglomeration and sedimentation
For a replicator to be autonomous, there must be a mechanism in place to select, assemble and (re-)accumulate its molecular components purely at one location. We argue that DNA hydrogels could offer such a solution. While DNA often, also in our case, assembles into agglomerates, DNA hydrogels have been shown to be able to form fluid phases if gaps of single bases were added to create flexible linkers between molecules (Nguyen and Saleh, 2017).
We combined eight matching hairpin sequences of the design introduced in Figure 1 at moderately elevated concentrations and cooled the system to only 25 °C after separating the molecules at 95 °C (Figure 3). We observed the spontaneous formation of agglomerates that were large enough to sediment under gravity. The initially homogeneous fluorescence turned into micrometer-sized grains that sedimented within hours. The fluorescence was provided by a label covalently attached to one of two strands. Since the double hairpins have a periodic boundary condition, they can create large assemblies (Figure 3a).
It is evident from Figure 3—video 1 that the sedimentation was very selective. When only seven of the eight matching hairpins were present, sedimentation was much weaker and, in most cases, undetectable (Figure 3b,c). For the full system, the sedimentation kinetics proved to be strongly concentration dependent (Figure 3—figure supplement 1b). Analogous experiments with random sequences (random pool of 84 nt strands) at equal concentration showed neither agglomeration nor sedimentation (Figure 3—figure supplement 1c). We have previously found that similar hairpin molecules provided the shortest sequences capable of forming agglomerates (Morasch et al., 2016).
The above results suggest that agglomeration could serve as an efficient way to assemble matching hairpins from much less structured and selected sequences in an autonomous way. After the molecules have been assembled as sedimented agglomerates, a convection flow can carry the large assemblies into regions of warmer temperatures, where the molecules would be disassembled by heat and activated for replication with a cooling step. Similar recycling behavior is seen in thermal gradient traps (Morasch et al., 2016), which were also found to enhance the molecular assembly (Mast et al., 2013) with characteristics that can match the above scenario.
Templating kinetics
Hybridization between stems of neighboring hairpins (Figure 1b, step 2) was catalyzed by the presence of already assembled complexes, confirming their role as a template. Assembly kinetics at 45 °C were recorded in reactions containing 200 nM of each strand for a range of template concentrations. At 120 nM template concentration, 40 % yield was achieved within 10 min (Figure 4b, black line). The untemplated, spontaneous reaction proceeded significantly slower (1.4 % yield, light gray line).
Assembly rates showed a strong dependence on incubation temperature (Figure 4c). At 39 °C, the reaction proceeded significantly slower than at 42 °C or 45 °C. This is because the hairpins are predominantly in closed configuration and cannot bind to neighboring molecules in the assembly. Binding between complementary information domains still occurs, but the formation of bonds between neighboring strands becomes rate limiting. Above the melting temperature of the information domain (48 °C) (see Figure 4—figure supplement 1), template-directed assembly becomes slower. However, the slower kinetics of template-directed product formation are partially superposed by the spontaneous product formation lacking an initial template (Figure 4c, small circles), which becomes an additional reaction channel due to the now open hairpins.
Exponential amplification
As an intermediate step toward replication, we studied amplification reactions under thermal oscillations (Figure 5). The amplification reactions only contained strands encoding for information domain '0'. The strands were subjected to thermal oscillations between Tbase = 45 °C and Tpeak = 67 °C. The lower temperature was held for 20 min and the upper for one second, with temperature ramps amounting to 20±1 s in each full cycle. This asymmetric shape of the temperature cycle reflects the different kinetics of the elongation step and the melting of the information domain. It is typical for trajectories in thermal convection settings with local heating (Braun et al., 2003).
The growth of molecular assemblies with different initial concentrations of template revealed an almost linear dependence of the reaction velocity on the initial amount of template (Figure 5a, b). This confirms the exponential nature of the replication. The cross-catalytic replication kinetics can be described by a simplistic model that only considers the concentrations $c$ of the template and $\bar{c}$ of its complement as a function of the cycle number $n$:

$$\frac{dc}{dn} = k\,\bar{c} + k_0, \qquad \frac{d\bar{c}}{dn} = k\,c + k_0$$

Here, $k$ is the rate of cross-catalysis and $k_0$ the spontaneous formation rate. For $k_0 = 0$, the model corresponds to simple exponential growth on a per-cycle basis. The model can be solved in closed form but does not account for saturation effects from the depletion of monomers. Therefore, it is not valid for concentrations similar to the total concentration of each strand. Fitting the model to the amplification reactions with 0–45 nM of template revealed rate constants of $k$ = 0.16 cycle⁻¹ and $k_0$ = 0.4 nM cycle⁻¹ (Figure 5b). Amplification was robust with regard to the peak temperature of the oscillations. For Tpeak below 74 °C, the reaction remained almost unaffected (Figure 5c). Above this value, the peak temperature is too close to the melting transitions of the hairpin-hairpin duplexes, ranging from 76 to 79 °C (Figure 4—figure supplement 1).
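The cross-catalytic model can be explored numerically. The sketch below assumes the simplest form consistent with the description: the template concentration `c` and the complement concentration `cbar` each grow per cycle at a rate `k` times the concentration of the other plus a constant `k0`, with the cycle number treated as a continuous variable. The function names and this exact form are illustrative assumptions; only the fitted values of 0.16 cycle⁻¹ and 0.4 nM cycle⁻¹ are taken from the text.

```python
import math

def cross_catalytic(n, c0, cbar0, k=0.16, k0=0.4):
    """Closed-form solution of the assumed model
    dc/dn = k*cbar + k0 and dcbar/dn = k*c + k0 (concentrations in nM)."""
    offset = 2.0 * k0 / k
    # The sum c + cbar grows exponentially, the difference c - cbar decays.
    total = (c0 + cbar0 + offset) * math.exp(k * n) - offset
    diff = (c0 - cbar0) * math.exp(-k * n)
    return (total + diff) / 2.0, (total - diff) / 2.0

def integrate(n, c0, cbar0, k=0.16, k0=0.4, steps_per_cycle=1000):
    """Explicit Euler integration of the same model, used as a cross-check."""
    h = 1.0 / steps_per_cycle
    c, cbar = c0, cbar0
    for _ in range(int(round(n * steps_per_cycle))):
        c, cbar = c + h * (k * cbar + k0), cbar + h * (k * c + k0)
    return c, cbar

if __name__ == "__main__":
    for cycles in range(0, 7, 2):
        c, cbar = cross_catalytic(cycles, 30.0, 0.0)
        print(f"cycle {cycles}: template {c:.1f} nM, complement {cbar:.1f} nM")
```

Like the model itself, this sketch ignores monomer depletion, so it should only be trusted while the concentrations stay well below the total concentration of each strand.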
The ability to withstand consecutive dilutions is characteristic of exponentially growing replicators and was tested in serial transfer experiments. Strands encoding for '0' were thermally cycled with 30 nM of template. After every three cycles, samples were diluted one to one with buffer containing all eight strands as monomers at 200 nM each (Figure 5d). This high frequency of dilutions prevented the reaction from transitioning into the saturating regime. The cross-catalytic model was fitted to the data with the dilution factor as the single free parameter, which was found to be 0.43. The difference from the theoretical value of 0.50 was likely due to strands sticking to the reaction vessels before dilution. As a control, a reaction with the same initial concentration of template, but lacking some of the monomers, was subjected to the same protocol. As the control could not grow exponentially, it gradually died out (Figure 5d, open circles).
Sequence replication
The above-mentioned reactions amplified but did not replicate actual sequence information, as they only contained strands with '0' information domains and their complements. To study the replication of arbitrary sequences of binary code, replication reactions with all 16 strands encoding for '0' and '1' were performed. To discriminate sequences encoded in equally sized complexes and deduce error rates, we compared these results to those from different reaction runs with defects, that is, lacking one or two of the hairpin sequences required for the faithful replication of a particular template. Reference reactions contained all 16 strands at 100 nM each and were run for each of three different template sequences (Figure 6). The product yields were quantified from reaction time traces, extracted by integrating the intensities of all gel bands containing tetramers with the labeled strand.
Leaving out a single strand (reaction label “+++−”) reduced the yield of full-size product to about 40 % (Figure 6a, b). The non-zero product yield with a missing strand is most likely due to the incorporation of the corresponding strand with an information domain mismatch. This type of mismatch allows the hairpin backbone to form regardless, and the unfaithful product can propagate since both strands needed for an amplification of '1' at position D are provided.
In particular during the first few cycles, mostly the 3:4 complex was detected in the gel, instead of the desired tetramer product (Figure 6—figure supplement 1). This was expected given the missing strand and provides an upper limit on the error rate of the full replication. The fact that the full reaction produced almost no 3:4 or 4:3 complexes indicates that the incomplete product was indeed caused by the lack of a particular strand.
Removal of a further strand, either directly next to the previous one ('++−−') or not ('+−+−'), reduced the yield of product tetramers even further. Due to the periodic design, those two variants represent all defective sets with two missing strands. Replication of the other two templates produced very similar results. Product concentrations after six cycles are given in Figure 6c for each of the three templates as well as an average over the template sequences (horizontal lines). A single defect reduced the yield of tetramer complexes to about 40 % and two defects to 15–20 %, which is close to (40 %)² = 16 %, that is, the combined probability of two independent mismatches.
Replication fidelity
The observed rate of erroneous product formation can be attributed to the spontaneous background rate (Figure 4b,c, Figure 5a,b and Figure 6b). The reaction ‘+−+−' (dark green) amplified similarly to the untemplated reference reaction (solid line), as it did not contain any strands that could bind next to each other to the template and form a backbone duplex (Figure 6b). For the templated reactions '+++−' and '++−−', templating worked for partial sequences, producing intermediate yields.
The reduction in yield caused by a single defect (i.e. a missing strand) to ~40 % (and to ~16 % for two defects) translates into a replication fidelity per information domain of ~60 %. The exact value for the replication fidelity is 62 % and can be calculated from Figure 6b by extracting the endpoint concentrations (blue vs. yellow line) and subtracting the ratio of the defective to the complete reaction from one.
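The arithmetic behind these numbers can be written out as follows. The symbols are introduced here for illustration only: $y_0$ is the endpoint yield of the complete reaction, $y_1$ and $y_2$ are the yields with one and two missing strands, and $F$ is the fidelity per information domain. The expression for $F$ is one reading that is consistent with the quoted values (a yield ratio of about 0.38 gives 62 %); it is not necessarily the exact formula evaluated for Figure 6b.

```latex
\begin{align}
F &= 1 - \frac{y_1}{y_0} \approx 1 - 0.38 = 0.62 \\
\frac{y_2}{y_0} &\approx \left(\frac{y_1}{y_0}\right)^{2} \approx 0.4^{2} = 0.16
\end{align}
```

The second line treats the two defects as independent events, which matches the observed range of 15–20 % for two missing strands.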
However, this is a worst-case estimate, and the replication fidelity is likely higher due to binding competition. The mutations caused by a single defect ('+++-') in Figure 6b were imposed by not providing the matching strand for position D and only leaving the option to incorporate the mismatched strand instead. For the full system ('++++'), however, the matching strand is present and competes for binding at position D. Since the matching strand binds preferentially, the unfaithful incorporation of the wrong strand would be reduced. A similar effect of competition was observed in a protein-catalyzed ligation reaction (Toyabe and Braun, 2019). There, a comparable binding competition led to a sevenfold decrease of the inferior ligation reaction (Figure 2a, b therein). Therefore, we expect the real fidelity to be better than the above lower-bound estimate.
It is interesting to project and compare this per information domain replication fidelity to a per nucleotide replicator (i.e. polymerization). To do so, we define a threshold in the decrease of melting temperature per information domain as the criterion for when the replication mechanism is still functional. Then, we estimate how many point mutations in the information domain can maximally be tolerated to stay within this range of decrease in melting temperature. From this, we can calculate a hypothetical, corresponding per nucleotide fidelity to the measured information domain fidelity.
We compared the properties of the duplex of information domain 0 with its complement to duplexes of 0 with mutated complements that differ from the original complement by point mutations. We assumed that within the temperature range of this replication mechanism (Figure 7b, gray box) a reduction in information domain melting temperature Tm of the mutated duplex by up to 10 °C compared to the original duplex would be tolerated by the replication reaction. This was inferred from the width of the melting transition of the original duplex (Figure 7b), where a shift of 10 °C corresponds to an increase of the unbound fraction from 0.08 at Tbase = 45 °C to 0.66 at 55 °C. In terms of free energies of the information domain duplex, this difference corresponds to ΔG ≥ −12.5 kcal/mol for the mutated duplex compared to ΔG = −15.4 kcal/mol for the original duplex. 99 % of all mutated duplexes containing three point mutations met that criterion (Figure 7a). Therefore, up to three point mutations can be allowed.
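To get a feeling for what this free energy threshold means, the difference between the two values can be converted into a ratio of equilibrium binding constants with the standard relation K = exp(−ΔG/RT). The sketch below assumes that both free energies refer to the base temperature of 45 °C, which the text does not state explicitly, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_constant_ratio(dg_ref, dg_mut, temp_celsius=45.0):
    """Ratio K_ref / K_mut of equilibrium binding constants, K = exp(-dG / RT).

    dg_ref, dg_mut: duplex free energies in kcal/mol (more negative = stronger).
    temp_celsius: temperature the free energies refer to (assumed, see above).
    """
    rt = R_KCAL * (temp_celsius + 273.15)
    return math.exp((dg_mut - dg_ref) / rt)

ratio = binding_constant_ratio(-15.4, -12.5)
print(f"binding weakened by a factor of about {ratio:.0f}")
```

Under these assumptions, a duplex at the threshold binds on the order of a hundredfold more weakly than the unmutated duplex.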
We will assume that the replication did not differentiate between an information domain and a mutated variant of it if the two differ by a tolerated number of point mutations. The fidelity per information domain is then given by a cumulative binomial distribution over the number of mismatches, whose parameters are the information domain length and the per nucleotide replication fidelity. The reduction in binding energy of the mutated information domain duplex and the subsequent change in melting temperature was used as the criterion to define the functionality of the replicator and to translate between a per information domain and a per nucleotide approach. As justified above, we calculate with three mutations within the 15 bases of the information domain, that is, the replication can tolerate up to three mismatches in the information domain. From Figure 6 we extracted a per information domain fidelity of 62 %, and deduce a per nucleotide fidelity of about 85 %. In fact, mutated information domain duplexes with mutations at two internal bases all show similar properties as information domains with a total of three mutations (Figure 7—figure supplement 1). This refinement would increase the per nucleotide fidelity to about 90 %. We therefore estimate that a per nucleotide replication process would need a replication fidelity of 85–90 % to produce sequences with an error rate equivalent to the presented mechanism. Detailed calculations of the per nucleotide fidelities can be found in the supplementary information.
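The translation from a per information domain fidelity to a per nucleotide fidelity can be sketched numerically. The code below assumes that a copied domain counts as faithful when it carries fewer than `m` point mutations, so that the domain fidelity is the binomial sum over 0 to `m − 1` mismatches among `n_bases` positions, each copied correctly with probability `p`. This convention, the symbol names and the function names are assumptions made for illustration; the convention was chosen because it approximately reproduces the 85–90 % range quoted in the text.

```python
from math import comb

def domain_fidelity(p, n_bases=15, m=3):
    """Probability that a copied domain carries fewer than m mismatches,
    given a per nucleotide fidelity p (cumulative binomial distribution)."""
    return sum(comb(n_bases, i) * p ** (n_bases - i) * (1 - p) ** i
               for i in range(m))

def per_nucleotide_fidelity(f_domain, n_bases=15, m=3, tol=1e-10):
    """Invert domain_fidelity for p by bisection.

    Works because domain_fidelity increases monotonically with p,
    from 0 at p = 0 to 1 at p = 1.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if domain_fidelity(mid, n_bases, m) < f_domain:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_three = per_nucleotide_fidelity(0.62, m=3)
p_two = per_nucleotide_fidelity(0.62, m=2)
print(f"m = 3: p = {p_three:.3f}; m = 2: p = {p_two:.3f}")
```

With a domain fidelity of 62 % and 15 bases, this gives roughly 85 % for `m = 3` and roughly 91 % for `m = 2`. The detailed calculation in the supplementary information remains the authoritative reference.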
Discussion
A cross-catalytic replicator can be made from short sequences and without covalent bonds under a simple non-equilibrium setting of periodic thermal oscillations. The replication is fast and proceeds within a few thermal oscillations of 20 min each. This velocity is comparable to other replicators (Kindermann et al., 2005), cross-ligating ribozymes (Robertson and Joyce, 2014), or autocatalytic DNA networks (Yin et al., 2008). The required thermal oscillations can be obtained by laminar convection in thermal gradients (Braun et al., 2003; Salditt et al., 2020), which also accumulates oligonucleotides (Mast et al., 2013). Depending on the envisioned environment, the mechanism could also be driven by thermochemical oscillations (Ball and Brindley, 2014) or convection in pH gradients (Keil et al., 2017). It should be noted, however, that the creation of >80 nt RNA is not yet understood within the current state of the art of prebiotic polymerization and ligation chemistry.
It is likely that a slower prebiotic ligation chemistry could later fix the replication results over long timescales. Such an additional non-enzymatic ligation (Stadlbauer et al., 2015) that joins successive strands would relax the constraint that backbone duplexes must not melt during high-temperature steps. Early on, this is difficult to achieve in aqueous solution against the high concentration of water. In order to overcome this competition and to favor the reaction entropically by a leaving group, individual bases are typically activated by triphosphates (Attwater et al., 2013; Horning and Joyce, 2016) or imidazoles, which are especially interesting in this context since they can replicate RNA directly (O'Flaherty et al., 2019; Zhou et al., 2019). However, the required chemical conditions of enhanced Mg2+ concentration hinder strand separation.
The overall replication fidelity is limited by the spontaneous bond formation rate between pairs of hairpin sequences, caused by the interaction of strands in free solution. At lower concentrations, as one would imagine in a prebiotic setting, this rate would decrease at the expense of an overall slower reaction. To some degree and despite ongoing design efforts, such a background rate is inherent to hairpin-fuelled DNA or RNA reactions (Green et al., 2006; Krammer et al., 2012; Yin et al., 2008).
The replication mechanism is expected to also work with shorter strands, as long as the order of the melting temperatures of the information domain and the backbone duplexes is preserved. Shorter strands would also be easier to produce by an upstream polymerization process, simply because they contain fewer nucleotides. In addition, binding of shorter information domain duplexes could discriminate even single base mismatches, resulting in an increased selectivity. It is not straightforward to estimate a minimal sequence length for the demonstrated mechanism. However, it is worth noting that it has been suggested that tRNA arose from two proto-tRNA sequences (Hopfield, 1978).
Pre-selection of nucleic acids for the presented hairpin-driven replication mechanism can be provided by highly sequence-specific gelation of DNA. This gel formation has been shown to be most efficient with double hairpin structures very similar to the tRNA-like sequences used in this study (Morasch et al., 2016). For our replication system, we have demonstrated this in Figure 3 by showing the spontaneous formation of agglomerates and their sedimentation under gravity if all molecules of the assembly are present. This self-selection shows a possible pathway by which the system can emerge from random or semi-random sequences, for example in a flow or a convection system where the molecules are selected as macroscopic agglomerates (Mast et al., 2013). Another selection pressure could stem from the biased hydrolysis of double-stranded nucleotide backbones, which favors assembled complexes over the initial hairpins (Obermayer et al., 2011).
The replication mechanism could serve as a mutable assembly strategy for larger functional RNAs (Mutschler et al., 2015; Vaidya et al., 2012). As an evolutionary route toward a more mRNA-like replication product with chemically ligated information domains, the mechanism would be supplemented by self-cleavage next to the information domains that cuts out the non-coding backbone duplexes, followed by ligation of the information domains. Both operations could potentially be performed by very small ribozymatic centers (Dange et al., 1990; Szostak, 2012; Vlassov et al., 2005).
The proposed replication mechanism of assemblies from tRNA-like sequences allows us to speculate about a transition from an autonomous replication of successions of information domains to the translation of codon sequences encoded in modern mRNA (Figure 1a). Short peptide-RNA hybrids (Griesser et al., 2017; Jauker et al., 2015), combined with specific interactions between 3’-terminal amino acids and the anticodons, could have given rise to a primitive genetic code. The spatial arrangement of tRNA-like sequences that are replicated by the presented mechanism would translate into a spatial arrangement of the amino acid or short peptide tails that are attached to the strands in a codon-encoded manner (Schimmel and Henderson, 1994). The next stage would then be the detachment and linking of the tails to form longer peptides. Eventually, tRNA would transition to its modern role in protein translation. The mechanism thus offers a hypothesis for the emergence of predecessors of tRNA, independent of protein translation. This is crucial for models of the evolution of translation, because it could justify the existence of tRNA before it was utilized in an early translation process. However, many questions about the evolutionary steps that created translation remain open.
Therefore, replication and translation could have, at an early stage, emerged along a common evolutionary trajectory. This supports the notion that predecessors of tRNA could have featured a rudimentary replication mechanism: starting with a double hairpin structure of tRNA-like sequences, the replication of a succession of informational domains would emerge. The interesting aspect is that the replication is first encoded by hybridization and can later be fixed by a much slower ligation of the hairpins. The demonstrated mechanism could therefore jumpstart a non-enzymatic replication chemistry, which was most likely restricted in fidelity due to working on a nucleotide-by-nucleotide basis (Robertson and Joyce, 2012; Szathmáry, 2006).
Materials and methods
Strand design
DNA double-hairpin sequences were designed using the NUPACK software package (Zadeh et al., 2011). In addition to the secondary structures of the double-hairpins, the design algorithm was constrained by all target dimers. Candidate sequences were selected for optimal homogeneity of binding energies and melting temperatures. Backbone domains connecting consecutive strands (e.g. ) had to be the most stable bonds in the system, in particular more stable than between a template and a newly formed product complex (e.g. :). On the other hand, hairpin melting temperatures had to be low enough to allow for a sufficient degree of thermal fluctuations. To reconcile this with the length of the strands, mismatches were introduced in the hairpin stems. The sequences of all strands are listed in Supplementary file 1.
Thermal cycling assays
All reactions were performed in a buffer of 20 mM Tris-HCl pH 8 and 150 mM NaCl, supplemented with 20 mM MgCl2. DNA oligonucleotides (Biomers, Germany) were used at 200 nM concentration per strand in reactions containing a fixed-sequence subset of eight strands (e.g. 0/ only) and 100 nM per strand in reactions containing all 16 different strands.
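As a quick arithmetic check on these concentrations, both reaction types end up with the same total DNA concentration. The following is a minimal Python sketch; the function name is hypothetical and the calculation assumes exactly the strand counts and per-strand concentrations stated above.

```python
def total_concentration_nM(per_strand_nM, n_strands):
    """Total oligonucleotide concentration when every strand is at the same concentration."""
    return per_strand_nM * n_strands

# Fixed-sequence subset: 8 strands at 200 nM each
subset_total = total_concentration_nM(200, 8)
# Full pool: 16 strands at 100 nM each
full_total = total_concentration_nM(100, 16)

print(subset_total, full_total)  # 1600 1600 (nM, i.e. 1.6 µM in both cases)
```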
Thermal cycling was done in a standard PCR cycler (Bio-Rad C1000). Reaction kinetics were obtained by running each reaction for different run times or numbers of cycles in parallel. The products were analyzed using native PAGE. The time between thermal cycling and PAGE analysis was minimized to exclude artifacts from storage on ice.
Template sequences were prepared using a two-step protocol: annealing from 95 °C to 70 °C within 1 hr, followed by incubation at 70 °C for 30 min. Afterwards, samples were cooled to 2 °C and stored on ice. When assembling complexes containing paired information domains (Figure 2), samples were slowly cooled down from 70 to 25 °C within 90 min before being transferred onto ice. DNA double hairpins were quenched into the monomolecular state by heating to 95 °C and subsequent fast transfer into ice water.
Product analysis
DNA complexes were analyzed using native polyacrylamide gel electrophoresis (PAGE) in gels at 5 % acrylamide concentration and 29:1 acrylamide / bisacrylamide ratio (Bio-Rad, Germany). Gels were run at electric fields of 14 V/cm at room temperature. Strand / was covalently labeled with Cy5. Cy5 fluorescence intensities were later used to compute strand concentrations. As an additional color channel, strands were stained using SYBR Green I dye (New England Biolabs). Complexes were identified by comparing the products obtained from annealing different strand subsets.
To correctly identify bands in the time-resolved measurements, gels were run with a marker lane. The marker contained strands (200 nM), (150 nM), (50 nM), and (100 nM), and was prepared using the two-step annealing protocol from 95 to 70 °C. The unequal strand concentrations ensured that the sample contained a mixture of mono-, di-, tri-, and tetramers.
Electrophoresis gels were imaged in a multi-channel imager (Bio-Rad ChemiDoc MP). Image post-processing and data analysis were performed using self-developed LabVIEW software. Post-processing corrected for inhomogeneous illumination by the LEDs, image rotation, and distortions of the gel lanes if applicable. Background fluorescence was determined from empty lanes on the gel, although it was generally low in the Cy5 channel.
For the determination of reaction yields, the intensities of all gel bands containing strands of the sequence length of interest were added up. For strings of four strands, these were the single tetramer as well as its complexes with di-, tri-, and tetramers. Single strands separated from their complements during electrophoresis (Figure 2 and Figure 6—figure supplement 1).
Thermal melting curves
Thermal melting curves were measured using UV absorbance at 260 nm wavelength in a UV/Vis spectrometer (JASCO V-650, 1 cm optical path length), via quenching of the Cy5 label at the 5'-end of strand (excitation: 620–650 nm, detection: 675–690 nm), or using fluorescence of the intercalating dye SYBR Green I (excitation: 450–490 nm, detection: 510–530 nm). Fluorescence measurements were performed in a PCR cycler (Bio-Rad C1000). Samples measured via fluorescence contained 200 nM of each strand; those measured via UV absorption contained 1 µM total DNA concentration to improve the signal-to-noise ratio. Before analysis of the melting curves (Mergny and Lacroix, 2003), data were corrected for baseline signals from reference samples containing buffer and intercalating dye, if applicable.
Self-assembly and sedimentation analysis
The samples were mixed in the replication buffer (150 mM NaCl, 20 mM MgCl2, 20 mM Tris-HCl pH 8) at a total oligomer concentration of 5 µM, so that the concentration per strand varied with the number of different strands in the configuration (4, 7, or 8). The microfluidic chamber was assembled with a custom-cut, 500 µm thick Teflon foil placed between two plane sapphires (Figure 3—figure supplement 2). Three Peltier elements (QuickCool QC-31–1.4-3.7AS, purchased from Conrad Electronics, Germany) were attached to the backside of the chamber to provide full temperature control. The chamber was initially flushed with 3M Novec7500 (3M, Germany) to avoid bubble formation. The samples were pipetted into the microfluidic chamber through the 0.5 mm channels using microloader pipette tips (Eppendorf, Germany). The chamber was then sealed with Parafilm, heated to 95 °C for 10 s to fully separate the strands, and cooled rapidly (within 30 s) to 25 °C. Assembly and sedimentation were monitored for 20 hr on a fluorescence microscope (Axiotech Vario, Zeiss, Germany) with two LEDs (490 nm and 625 nm, Thorlabs, Germany) using a 2.5 x objective (Fluar, Zeiss, Germany). The observed sedimentation was independent of the attached dye and its position (Figure 3—figure supplement 1c). Prior to image analysis, the image stacks were stabilized using an ImageJ plugin (Li, 2008). The ratio of sedimented fluorescence relative to the first frame after heating was used to quantify sedimentation (Figure 3). The sedimentation time-traces (Figure 3b) were fitted with a sigmoid function to determine the final concentration increase c/c0 (Figure 3c). The experiment was also performed with random 84 nt DNA strands at 5 µM total concentration to exclude unspecific agglomeration (Figure 3—figure supplement 1c).
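The per-strand concentrations implied by the fixed 5 µM total can be worked out directly. This is a minimal Python sketch that assumes the strands in each configuration are equimolar, which the text implies but does not state explicitly; the function and constant names are hypothetical.

```python
TOTAL_NM = 5000  # 5 µM total oligomer concentration, as stated in the text

def per_strand_nM(n_strands, total_nM=TOTAL_NM):
    """Concentration of each strand, assuming all strands are equimolar (assumption)."""
    return total_nM / n_strands

# Configurations with 4, 7, or 8 different strands
for n in (4, 7, 8):
    print(n, "strands:", round(per_strand_nM(n), 1), "nM each")
```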
Appendix 1
Calculation of fidelity rate
From the experiments shown in Figure 6, we know that the replication fidelity per information domain is 62 %. We now assume that the presented replication mechanism translates into a base-by-base replication and ask (i) how tolerant the replication would be to point mutations in the information domain and (ii) given that threshold, what per-nucleotide fidelity a base-by-base replication would need in order to perform equally well.
Question (i) is answered in Figure 7, where we see that on the 15 nt information domain we can allow up to three base mismatches to stay within the bounds of the temperature cycling (gray box, Figure 7b). In order to calculate how the measured replication fidelity per information domain translates into a hypothetical replication fidelity per nucleotide we assume a cumulative binomial distribution:
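In its standard form, and with symbols chosen here for illustration (they may not match the authors' notation), the probability F of finding at most k mismatches among n nucleotides, each copied correctly with probability p, can be written as:

```latex
F(k; n, p) = \sum_{i=0}^{k} \binom{n}{i} \, (1-p)^{i} \, p^{\,n-i}
```

Here the index i counts the mismatches, so the term with i = 0 is the probability of a perfect copy.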
We know that the overall likelihood to get a 'correctly' replicated information domain is 62 %. From Figure 7 we know that in a base-by-base replication, 'correctly' means with up to three mismatches. Therefore, we must find the number of combinatorial possibilities of spatially distributing 0, 1 or 2 mismatches on the 15 nt information domain (using nucleotides and allowing up to mismatches). Using this, we can determine the probability for a success, that is the correct replication of a single nucleotide, to meet the overall likelihood.
For and , we measure the replication fidelity per information domain to be . Therefore, we calculate:
From the information domain energy statistics shown in Figure 7—figure supplement 1, one can see that strands with two internal mutations behave nearly identically to strands with a total of three mutations (accepting internal and terminal mutations). Therefore, we simplify the calculation and only consider internal mutations.
Accordingly, we calculate for and and a per information domain fidelity :
Therefore, a comparable base-by-base replication would need a per nucleotide fidelity of 85–90 % to perform equally well as the presented replication mechanism.
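The per-nucleotide estimate can be checked numerically. The sketch below assumes the standard cumulative binomial model and solves it for the per-nucleotide fidelity p by bisection. The function names and the (n, k) pairs in the loop are illustrative assumptions, not values taken from the authors' analysis.

```python
from math import comb

def domain_fidelity(p, n, k):
    """Probability that a domain of n nucleotides carries at most k mismatches
    when each nucleotide is copied correctly with probability p."""
    return sum(comb(n, i) * (1 - p) ** i * p ** (n - i) for i in range(k + 1))

def per_nucleotide_fidelity(target, n, k, tol=1e-9):
    """Find p such that domain_fidelity(p, n, k) equals target.
    Bisection works because domain_fidelity increases monotonically with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if domain_fidelity(mid, n, k) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    measured = 0.62  # per-domain fidelity from Figure 6
    # (n, k) pairs are illustrative assumptions: domain length and tolerated mismatches
    for n, k in [(15, 2), (13, 1)]:
        p = per_nucleotide_fidelity(measured, n, k)
        print(f"n={n}, k={k}: p = {p:.3f}")
```

With a per-domain fidelity of 0.62, n = 15 with k = 2 gives p of roughly 0.85, and n = 13 with k = 1 gives roughly 0.90, so these assumed parameters bracket the range quoted above.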
Data availability
No data sets (e.g. sequencing data, clinical trial data etc.) were produced in this study. The source data files (Igor incl. macros) and data analysis (LabVIEW) tools used are provided as supporting files (zip).
References
- From self-replication to replicator systems en route to de novo life. Nature Reviews Chemistry 4:386–403. https://doi.org/10.1038/s41570-020-0196-x
- Emergence of a new Self-Replicator from a dynamic combinatorial library requires a specific Pre-Existing replicator. Journal of the American Chemical Society 139:13612–13615. https://doi.org/10.1021/jacs.7b07346
- In-ice evolution of RNA polymerase ribozyme activity. Nature Chemistry 5:1011–1018. https://doi.org/10.1038/nchem.1781
- Hydrogen peroxide thermochemical oscillator as driver for primordial RNA replication. Journal of the Royal Society Interface 11:20131052. https://doi.org/10.1098/rsif.2013.1052
- A synthetic replicator drives a propagating Reaction-Diffusion front. Journal of the American Chemical Society 138:6723–6726. https://doi.org/10.1021/jacs.6b03372
- Self-assembly and Self-replication of short amphiphilic β-sheet peptides. Origins of Life and Evolution of Biospheres 41:563–567. https://doi.org/10.1007/s11084-011-9257-y
- Exponential DNA replication by laminar convection. Physical Review Letters 91:158103. https://doi.org/10.1103/PhysRevLett.91.158103
- Site-directed modification of DNA duplexes by chemical ligation. Nucleic Acids Research 16:3721–3738. https://doi.org/10.1093/nar/16.9.3721
- DNA hairpins: fuel for autonomous DNA devices. Biophysical Journal 91:2966–2975. https://doi.org/10.1529/biophysj.106.084681
- Amino Acid-Specific, Ribonucleotide-Promoted peptide formation in the absence of enzymes. Angewandte Chemie 129:1244–1248. https://doi.org/10.1002/ange.201610651
- Spontaneous formation of RNA strands, peptidyl RNA, and cofactors. Angewandte Chemie International Edition 54:14564–14569. https://doi.org/10.1002/anie.201506593
- Systems chemistry: kinetic and computational analysis of a nearly exponential organic replicator. Angewandte Chemie International Edition 44:6750–6755. https://doi.org/10.1002/anie.200501527
- Exploring the emergence of complexity using synthetic replicators. Chemical Society Reviews 46:7274–7305. https://doi.org/10.1039/C7CS00123A
- Thermal, autonomous replicator made from transfer RNA. Physical Review Letters 108:238104. https://doi.org/10.1103/PhysRevLett.108.238104
- Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2'-Hydroxyl Group. Journal of the American Chemical Society 121:5364–5372. https://doi.org/10.1021/ja990592p
- Analysis of thermal melting curves. Oligonucleotides 13:515–537. https://doi.org/10.1089/154545703322860825
- Heat-Flow-Driven oligonucleotide gelation separates Single-Base differences. Angewandte Chemie International Edition 55:6676–6679. https://doi.org/10.1002/anie.201601886
- Freeze-thaw cycles as drivers of complex ribozyme assembly. Nature Chemistry 7:502–508. https://doi.org/10.1038/nchem.2251
- Nonenzymatic Template-Directed synthesis of Mixed-Sequence 3'-NP-DNA up to 25 nucleotides long inside model protocells. Journal of the American Chemical Society 141:10481–10488. https://doi.org/10.1021/jacs.9b04858
- Emergence of information transmission in a prebiotic RNA reactor. Physical Review Letters 107:018101. https://doi.org/10.1103/PhysRevLett.107.018101
- Prebiotic chemistry and the origin of the RNA world. Critical Reviews in Biochemistry and Molecular Biology 39:99–123. https://doi.org/10.1080/10409230490460765
- DNA with 3'-5'-disulfide links--rapid chemical ligation through isosteric replacement. Angewandte Chemie International Edition 53:4222–4226. https://doi.org/10.1002/anie.201310644
- Sequence complementarity-driven nonenzymatic ligation of RNA. Biochemistry 50:2994–3003. https://doi.org/10.1021/bi101981z
- The origins of the RNA world. Cold Spring Harbor Perspectives in Biology 4:a003608. https://doi.org/10.1101/cshperspect.a003608
- Highly efficient self-replicating RNA enzymes. Chemistry & Biology 21:238–245. https://doi.org/10.1016/j.chembiol.2013.12.004
- Nonenzymatic, template-directed ligation of oligoribonucleotides is highly regioselective for the formation of 3'-5' phosphodiester bonds. Journal of the American Chemical Society 118:3340–3344. https://doi.org/10.1021/ja9537134
- A simple synthetic replicator amplifies itself from a dynamic reagent pool. Angewandte Chemie 120:10113–10118. https://doi.org/10.1002/ange.200804223
- Thermal habitat for RNA amplification and accumulation. Physical Review Letters 125:048104. https://doi.org/10.1103/PhysRevLett.125.048104
- Structure and transcription of eukaryotic tRNA genes. Critical Reviews in Biochemistry 19:107–144. https://doi.org/10.3109/10409238509082541
- Tetraloop-like geometries could form the basis of the catalytic activity of the most ancient ribooligonucleotides. Chemistry - a European Journal 21:3596–3604. https://doi.org/10.1002/chem.201406140
- The origin of replicators and reproducers. Philosophical Transactions of the Royal Society B: Biological Sciences 361:1761–1776. https://doi.org/10.1098/rstb.2006.1912
- The eightfold path to non-enzymatic RNA replication. Journal of Systems Chemistry 3:2. https://doi.org/10.1186/1759-2208-3-2
- Self-replicating system. Journal of the American Chemical Society 112:1249–1250. https://doi.org/10.1021/ja00159a057
- Catalyzed and spontaneous reactions on ribozyme ribose. Journal of the American Chemical Society 133:6044–6050. https://doi.org/10.1021/ja200275h
- The RNA world on ice: a new scenario for the emergence of RNA information. Journal of Molecular Evolution 61:264–273. https://doi.org/10.1007/s00239-004-0362-7
- A Self-Replicating hexadeoxynucleotide. Angewandte Chemie International Edition in English 25:932–935. https://doi.org/10.1002/anie.198609322
- Book: The general and logical theory of automata. In: Jeffress L. A, editor. Cerebral Mechanisms in Behavior. Wiley. pp. 1–41.
- NUPACK: analysis and design of nucleic acid systems. Journal of Computational Chemistry 32:170–173. https://doi.org/10.1002/jcc.21596
Article and author information
Author details
Funding
Deutsche Forschungsgemeinschaft (TRR 235)
- Alexandra Kühnlein
- Dieter Braun
Deutsche Forschungsgemeinschaft (Project-ID 364653263)
- Alexandra Kühnlein
- Dieter Braun
Deutsche Forschungsgemeinschaft (CRC 1032 (A04) Project-ID 201269156)
- Simon Alexander Lanzmich
- Dieter Braun
Deutsche Forschungsgemeinschaft (Student fellowship)
- Alexandra Kühnlein
Deutsche Forschungsgemeinschaft (Graduate school "Quantitative Bioscience Munich")
- Alexandra Kühnlein
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We gratefully acknowledge financial support from the Deutsche Forschungsgemeinschaft (DFG) through the TRR 235 Emergence of Life (Project-ID 364653263) and the CRC 1032 NanoAgents (Project-ID 201269156). We are grateful for funding from the Graduate School ‘Quantitative Bioscience Munich’ (QBM). We appreciate the fruitful discussions in the Simons Collaboration on the Origins of Life, thank Thomas Rind for measurements, and acknowledge discussions with Tim Liedl, Christof Mast and Lorenz Keil. We thank Filiz Civril, Adriana Serrão and Thomas Matreux for comments on the manuscript.
Copyright
© 2021, Kühnlein et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.