tRNA sequences can assemble into a replicator
Abstract
Can replication and translation emerge in a single mechanism via self-assembly? The key molecule, transfer RNA (tRNA), is one of the most ancient molecules and contains the genetic code. Our experiments show how a pool of oligonucleotides, adapted with minor mutations from tRNA, spontaneously formed molecular assemblies and replicated information autonomously using only reversible hybridization under thermal oscillations. The pool of cross-complementary hairpins self-selected by agglomeration and sedimentation. The metastable DNA hairpins bound to a template and then interconnected by hybridization. Thermal oscillations separated replicates from their templates and drove an exponential, cross-catalytic replication. The molecular assembly could encode and replicate binary sequences with a replication fidelity corresponding to 85–90 % per nucleotide. The replication by a self-assembly of tRNA-like sequences suggests that early forms of tRNA could have been involved in molecular replication. This would link the evolution of translation to a mechanism of molecular replication.
eLife digest
The genetic code stored within DNA contains the instructions for manufacturing all the proteins organisms need to develop, grow and survive. This requires molecular machines that ‘transcribe’ regions of the genetic code into RNA molecules which are then ‘translated’ into the string of amino acids that form the final protein. However, these molecular machines and other proteins are also needed to replicate and synthesize the sequences stored in DNA. This presents evolutionary biologists with a ‘chicken-and-egg’ situation: which came first, the DNA sequences needed to manufacture proteins or the proteins needed to transcribe and translate DNA?
Understanding the order in which DNA replication and protein translation evolved is challenging as these processes are tightly intertwined in modern-day species. One theory, known as the ‘RNA world hypothesis’, suggests that all life on Earth began with a single RNA molecule that was able to make copies of itself, as DNA does today. To investigate this hypothesis, Kühnlein, Lanzmich and Braun studied a molecule called transfer RNA (or tRNA for short) which is responsible for translating RNA into proteins. tRNA is assumed to be one of the earliest evolved molecules in biology. Yet, why it was present in early life forms before it was needed for translation still remained somewhat of a mystery.
To gain a better understanding of tRNA’s role early in evolution, Kühnlein, Lanzmich and Braun made small changes to its genetic code and then carried out tests on these tRNA-like sequences. The experiments showed these ‘early’ forms of tRNA can actually self-assemble into a molecule which is capable of replicating the information stored in its sequence. It suggests early forms of tRNA could have been involved in replication before modern tRNA developed its role in protein translation.
With these experiments, Kühnlein, Lanzmich and Braun have identified a possible evolutionary link between DNA replication and protein translation, suggesting the two processes emerged through one shared pathway: tRNA. This deepens our understanding about the origins of early life, while taking biochemists one step closer to their distant goal of recreating self-replicating molecular machines in the laboratory.
Introduction
A machine that creates replicates of itself is an old dream of engineering (von Neumann, 1951). Biological systems solved this problem long ago at the nanoscale with DNA and RNA. Their replication machinery was optimized to perfection through Darwinian evolution. In modern living systems, the replication of DNA and RNA requires the formation of covalent bonds and depends on interconnected machinery: proteins to perform base-by-base replication of sequence information, a modern metabolism to supply activated molecules, and tRNA as well as the ribosome to create the required proteins.
Such a complex system is difficult to set up at the emergence of life. The RNA world hypothesis proposes that, early on, the catalytic function of highly defined RNA sequences was used for self-replication (Horning and Joyce, 2016; Orgel, 2004; Turk et al., 2011). These ribozymes catalyze the ligation of RNA (Doudna et al., 1991; Mutschler et al., 2015; Paul and Joyce, 2002; Robertson et al., 2001; Walton et al., 2020) and the addition of individual bases (Attwater et al., 2013; Horning and Joyce, 2016). These very special sequences were engineered using in vitro evolution. It is unclear how autonomous evolution of early life could have reached such levels of sequence complexity.
Here, we focus on how such replication may have been predated by simpler forms of self-replication. Creating a replicator must fulfill a series of requirements. Replication must yield fidelity in copying, be fast, enable exponential replication, be fed by an autonomous energy source, not require complex sequences and should not form too many replicates without the existence of a template.
We show that replication of information can be realized by the reversible hybridization interactions between tRNA-like molecules alone. The proposed mechanism is driven by an external physical non-equilibrium setting, in our case thermal oscillations. Since the process does not involve chemical ligation, it does not rely on a particular non-enzymatic or catalytic ligation chemistry (Dolinnaya et al., 1988; Engelhart et al., 2012; Patzke et al., 2014; Pino et al., 2011; Rohatgi et al., 1996; Sievers and von Kiedrowski, 1994; von Kiedrowski, 1986) or particular catalytically active sequences, but merely requires sequence complementarity. The advantage of reversible hybridization is the re-usability of educts and products. Moreover, sequence-encoded interactions can self-select by forming agglomerates.
Nature achieves exponential growth through cross-catalysis: the replicate of a template serves as a template for the next round of replication. For short replicators under isothermal conditions, the binding between template and replicate has to be weak such that the dissociation of strands happens spontaneously and is not rate limiting (Paul and Joyce, 2002; Sievers and von Kiedrowski, 1994; von Kiedrowski, 1986). For longer replicates, temperature change has successfully been used to separate strands for replication catalyzed by thermostable proteins (Barany, 1991; Saiki et al., 1985). For catalytic RNA, elevated salt concentrations disfavor strand separation by temperature and catalyze hydrolysis (Horning and Joyce, 2016). In an interesting alternative to strand separation by temperature, Schulman et al. used moderate shear flows to separate DNA tile assemblies (Schulman et al., 2012).
Apart from nucleotide-based replicators, very interesting replication systems using non-covalent interactions have been developed with non-biological compounds (Bottero et al., 2016; Sadownik and Philp, 2008; Tjivikua et al., 1990), peptide-based approaches (Altay et al., 2017; Bourbo et al., 2011; Carnall et al., 2010; Lee et al., 1996; Rubinov et al., 2012), and peptide nucleic acids (Ura et al., 2009). We also want to point to several instructive reviews about the state-of-the-art systems chemistry regarding self-replication (Adamski et al., 2020; Ashkenasy et al., 2017; Kosikova and Philp, 2017).
In the past, metastable hairpin states have been prepared in a physically separated manner. The reaction was then triggered by mixing. For example, the mixing of hairpins with a trigger sequence has been shown to form long concatemers (Dirks and Pierce, 2004). With a similar logic, mixing a low entropy combination of molecules was used to create entropically driven DNA machines, including exponentially amplifying assemblies (Zhang et al., 2007). These reactions run downwards into the binding equilibrium. However, the preparation of the initial low entropy state required human intervention or a unique flow setting for mixing.
Sequence design
We designed a set of cooperatively replicating DNA strands using the program package NUPACK (Zadeh et al., 2011). The sequences are designed to have self-complementary double hairpins and are pairwise complementary within the molecule pool, such that the 3’ hairpin of one strand is complementary to the 5’ hairpin of the next. Their structure resembles the secondary structure of proto-tRNAs proposed by stereochemical theories (Figure 1a), comprising two hairpin loops that surround the anticodon with a few neighboring bases (Krammer et al., 2012). The lengths of 82–84 nt of the double hairpins are that of average tRNA molecules (Sharp et al., 1985), with stem loops consisting of 30–33 nt and the information-encoding interjacent domains of 15 nt. As the replication mechanism is based on hybridization only, it is expected to perform equally well for DNA and RNA. Here, we implemented the system with DNA rather than RNA, which was used previously (Krammer et al., 2012). In both the design and the implementation, we did not see significant differences between the two versions. Because synthesis of the 82–84 nt long sequences is simpler and less expensive for DNA, we implemented the replicator in DNA. Due to short heating times and moderate magnesium concentrations, we estimate that an RNA version could survive for days if not weeks (Li and Breaker, 1999; Mariani et al., 2018). The most critical step regarding the RNA stability would be the initial temperature spike to 95 °C, which remains unchanged from our previous study (Krammer et al., 2012) and did not prove critical. We also show that an RNA version behaves structurally identically to the implemented DNA version (Figure 1—figure supplement 1).

Heat-driven replication by hybridization using hairpin structures inspired by transfer RNA.
(a) Transfer RNA folds into a double-hairpin conformation upon very few base substitutions. In that configuration, the 3’-terminal amino acid binding site (green) is close to the anticodon (blue) and a double hairpin structure forms. A set of pairwise complementary double hairpins can encode and replicate sequences of information. A binary code implemented in the position of the anticodon, the information domain, allows binary sequences to be encoded and replicated (red vs blue). Each strand (82–84 nt) comprises two hairpin loops (gray) and an interjacent unpaired information domain of 15 nt length (blue/red, here: ). The displayed structure of eight strands shows replication of a template corresponding to the binary code 0010. Note that no covalent linkage is involved in the process. (b) Replication is driven by thermal oscillations in four steps: (0) The hairpins are activated into their closed conformation by fast cooling indicated by triangles. (1) Strands with matching information domain bind to the template. (2) Fluctuations in the bound strands’ hairpins facilitate the hybridization of neighboring strands. (3) Subsequent heating splits replica from template, while keeping the longer hairpin sequences connected, freeing both as templates for the next cycle.
Replication mechanism
The replication mechanism is a template-based replication, where instead of single nucleotides, information is encoded by a succession of oligomers. The domain, at the location of the anticodon in tRNA, is the template sequence and thus contains the information to be replicated. We therefore term it information domain. The goal is to replicate the succession of information domains.
To allow longer replicates, we chose the resulting meta-sequences to be periodic with a periodicity of four different hairpins. This makes the minimal cyclic meta-sequence large enough to keep the information domains accessible even in cyclic configuration. The information domains feature a binary system and contain sequences marked by '0' and '1' (blue/red). For replication, two sets of strands replicate strings of codons in a cross-catalytic manner (Figure 1b), using complementary information domains (light/dark colors).
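The combinatorics of the strand pool described above can be made concrete with a small sketch. It assumes four hairpin positions per period, a binary information domain, and two mutually complementary strand sets, which gives the 16 strands used in the full replication reactions. The position labels `A` to `D` and the set names `template` and `complement` are hypothetical placeholders chosen for illustration and are not the strand names used in the figures.

```python
from itertools import product

POSITIONS = "ABCD"  # four hairpin positions per period (hypothetical labels)
BITS = "01"         # binary information domain
SETS = ("template", "complement")  # two cross-complementary strand sets (hypothetical names)


def strand_pool():
    """Enumerate all strands: 2 sets x 4 positions x 2 bits = 16."""
    return [(pos, bit, s) for s, pos, bit in product(SETS, POSITIONS, BITS)]


def replicate_strands(template_bits):
    """Return the complementary-set strands needed to copy a 4-bit template."""
    if len(template_bits) != len(POSITIONS) or set(template_bits) - set(BITS):
        raise ValueError("template must be a 4-character string of 0s and 1s")
    return [(pos, bit, "complement") for pos, bit in zip(POSITIONS, template_bits)]


if __name__ == "__main__":
    print(len(strand_pool()))
    for strand in replicate_strands("0010"):
        print(strand)
```

For the template 0010 shown in Figure 1a, the sketch selects one complementary strand per position, with the '1' variant at the third position only.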
The replication is driven by thermal oscillations and operates in four steps (Figure 1b): (0) Fast cooling within seconds brings the strands to their activated state with both hairpins closed. (1) At the base temperature, activated strands with complementary information domains can bind to an already assembled template. (2) Thermal fluctuations cause open-close fluctuations of the hairpins. When strands are already bound to a template at the information domain, those fluctuations permit adjacent complementary hairpins of different strands to bind. In this way, the succession of information domains is replicated. (3) Subsequent heating splits the newly formed replicate from the template at the information domains. Due to their higher melting temperatures, the backbone of hairpin strands remains stable. Both replicate and template are then available as templates for a new replication round, which makes the replication cross-catalytic. Later, high temperature spikes can unbind and recycle all molecules for new rounds of replication.
Because of the initial fast cooling, all hairpins are closed in free solution. This inhibits the formation of replicates without template. While the binding of adjacent hairpins with template happens within minutes, hairpins in free solution connect without template only on timescales slower than hours and thus give false positives at a very low rate.
The basic principle of this replication mechanism was previously explored by Krammer et al. with a set of four hairpins, each based on half a tRNA sequence (36 nt), that amplified into dimers (Krammer et al., 2012). This amplification could not encode information and suffered from a high rate (>50 %) of unspecific amplification without template (Figure 4 therein). Here, in contrast, we demonstrate exponential amplification, and the replicator can now encode sequence information ‘0’ and ‘1’ with four bits. Moreover, the strands making up the new replicator are double hairpins with the sequence structure and length of tRNA. The replicator now shows a significantly decreased unspecific amplification without template of approximately 10 % (Figure 5a).
Results
Analysis of molecule conformations
Native polyacrylamide gel electrophoresis (PAGE) showed that the double hairpins assembled as intended (Figure 2). Comparing different subsets of strands allowed all gel bands to be identified.

Assembly of different subsets of the cross-replicating system of strands observed by native gel electrophoresis.
Samples contained strands at 200 nM concentration each and were slowly annealed as described in Materials and methods. Lane contents are indicated at the top of each lane. Comparison of different lanes allowed for the attribution of bands to complexes. Complexes incorporating all present strands are marked (•). The red channel shows the intensity -Cy5, the cyan channel shows SYBR Green I fluorescence. Single information domain bonds (lane 2, 7) break during gel electrophoresis.
Figure 2—source data 1
Source data for assembly of different subsets of the cross-replicating system of strands observed by native gel electrophoresis.
https://cdn.elifesciences.org/articles/63431/elife-63431-fig2-data1-v1.zip
All complexes were formed at concentrations of 200 nM of each strand and could be resolved despite their branched tertiary structure. Friction coefficients of complexes of two to four strands were 1.6–1.8-fold higher than for linear dsDNA, and 2.4-fold higher for larger complexes (4:4 configuration, ca. 660 nt, Figure 2—figure supplement 1). This agrees with the branched structure of the suggested strand assembly geometry (Figure 1a). Partially assembled complexes of two or three strands bound to a four-strand template could be resolved (Figure 6—figure supplement 1). Complexes containing single bound information domains were not stable during electrophoresis (Figure 2, lanes 2, 7 and Figure 6—figure supplement 1). This made it possible to differentiate fully assembled complexes from those where individual strands are bound to a template but have not formed backbone duplexes. Covalent end labels and two reference lanes on each gel were used to quantify concentrations from gel intensities using image analysis as described in Materials and methods.
Selection by agglomeration and sedimentation
For a replicator to be autonomous, there must be a mechanism in place to select, assemble and (re-)accumulate its molecular components purely at one location. We argue that DNA hydrogels could offer such a solution. While DNA often, also in our case, assembles into agglomerates, DNA hydrogels have been shown to be able to form fluid phases if gaps of single bases were added to create flexible linkers between molecules (Nguyen and Saleh, 2017).
We combined eight matching hairpin sequences of design as introduced in Figure 1 at moderately elevated concentrations and cooled the system to only 25 °C after separating the molecules at 95 °C (Figure 3). We found the spontaneous formation of agglomerates that were large enough to sediment under gravity. The initial homogeneous fluorescence turned into micrometer-sized grains and sedimented within hours. The fluorescence was provided by a covalently attached label to either strand or . Since the double hairpins have a periodic boundary condition, they can create large assemblies (Figure 3a).

Spontaneous self-assembly and sedimentation of matching hairpins.
(a) In a simple, sealed microfluidic chamber (Figure 3—figure supplement 2), the hairpin strands can self-assemble into agglomerates and sediment on a timescale of hours. The sample was initially heated to 95 °C for 10 s to ensure an unbound initial state, then rapidly (within 30 s) cooled to 25 °C, where self-assembly and sedimentation occurred. Note that agglomeration and sedimentation only occurred if all eight matching hairpins were provided (top two rows) but not in the case of a knockout (-, bottom row). For quantification, the bulk and sediment intensities were normalized by the first frame after heating. Samples contained strands at total concentration of 5 µM, about threefold higher than in Figure 2 and the following replication experiments. (b) Time traces of concentration increase for sediment and bulk of different configurations, same examples as shown in a. The time traces of all further knockout permutations are shown in Figure 3—figure supplement 1b. (c) Final concentration increase of sediment, relative to first frame after heating, for all configurations. The final values (N≥3) for c/c0 are retrieved from fitting the time traces. For the full set of complementary hairpins, self-assembly and sedimentation is most pronounced.
Figure 3—source data 1
Source data for spontaneous self-assembly and sedimentation of matching hairpins.
https://cdn.elifesciences.org/articles/63431/elife-63431-fig3-data1-v1.zip
It is evident from Figure 3—video 1 that the sedimentation was very selective. When only seven of the eight matching hairpins were present, sedimentation was much weaker and, in most cases, undetectable (Figure 3b,c). For the full system, the sedimentation kinetics proved to be strongly concentration dependent (Figure 3—figure supplement 1b). Analogous experiments with random sequences (random pool of 84 nt strands) at equal concentration showed neither agglomeration nor sedimentation (Figure 3—figure supplement 1c). We have previously found that similar hairpin molecules provided the shortest sequences capable of forming agglomerates (Morasch et al., 2016).
The above results suggest that agglomeration could serve as an efficient way to assemble matching hairpins from much less structured and selected sequences in an autonomous way. After the molecules have been assembled as sedimented agglomerates, a convection flow can carry the large assemblies into regions of warmer temperatures, where the molecules would be disassembled by heat and activated for replication with a cooling step. Similar recycling behavior is seen in thermal gradient traps (Morasch et al., 2016), which were also found to enhance the molecular assembly (Mast et al., 2013) with characteristics that can match the above scenario.
Templating kinetics
Hybridization between stems of neighboring hairpins (Figure 1b, step 2) was catalyzed by the presence of already assembled complexes , confirming its role as a template. Assembly kinetics at 45 °C were recorded in reactions containing 200 nM of each strand for a range of template concentrations. At 120 nM template concentration, 40 % yield was achieved within 10 min (Figure 4b, black line). The untemplated, spontaneous reaction proceeded significantly slower (1.4 % yield, light gray line).

Isothermal template assisted product formation.
(a) Schematic representation of the templating step at constant temperature. (b) Kinetics of tetramer formation at 45 °C with different starting concentrations of template (). Data includes concentrations of all complexes containing tetramers. (c) Templating observed over a broad temperature range. Large circles show data for reactions at nM of template , small circles show the spontaneous formation (). The latter increases at T > 45 °C. Above 48 °C, binding of monomers to the template gets weaker, slowing down the rate of template assisted formation. This is consistent with the melting temperatures of the information domains (see Figure 4—figure supplement 1).
Figure 4—source data 1
Source data for determination of thermal oscillation temperatures.
https://cdn.elifesciences.org/articles/63431/elife-63431-fig4-data1-v1.zip
Assembly rates showed a strong dependence on incubation temperature (Figure 4c). At 39 °C, the reaction proceeded significantly slower than at 42 °C or 45 °C. This is because the hairpins are predominantly in closed configuration and cannot bind to neighboring molecules in the assembly. Binding between complementary information domains still occurs, but the formation of bonds between neighboring strands becomes rate limiting. Above the melting temperature of the information domain (48 °C) (see Figure 4—figure supplement 1), template-directed assembly becomes slower. However, the slower kinetics of template-directed product formation are partially superposed by the spontaneous product formation lacking an initial template (Figure 4c, small circles), which becomes an additional reaction channel due to the now open hairpins.
Exponential amplification
As an intermediate step toward replication, we studied amplification reactions under thermal oscillations (Figure 5). The amplification reactions only contained strands encoding for information domain '0', that is , , , , …, . The strands were subjected to thermal oscillations between Tbase = 45 °C and Tpeak = 67 °C. The lower temperature was held for 20 min, the upper for one second with temperature ramps amounting to 20±1 s in each full cycle. This asymmetric shape of the temperature cycle reflects the different kinetics of the elongation step and the melting of the information domain. It is typical for trajectories in thermal convection settings with local heating (Braun et al., 2003).
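The asymmetry of the temperature cycle is easier to appreciate when written out as a piecewise-linear profile. The sketch below uses the stated values (45 °C held for 20 min, 67 °C held for 1 s, about 20 s of ramps per cycle). Splitting the ramp time evenly into 10 s up and 10 s down, and treating the ramps as linear, are assumptions made only for illustration; the function names are hypothetical.

```python
def cycle_profile(t_base=45.0, t_peak=67.0, hold_base_s=20 * 60,
                  hold_peak_s=1.0, ramp_s=10.0):
    """Return (time_s, temperature_C) breakpoints for one thermal cycle.

    ramp_s is the duration of each ramp; an even 10 s / 10 s split of the
    roughly 20 s total ramp time is an assumption, not a reported value.
    """
    t = 0.0
    points = [(t, t_base)]
    for duration, temperature in ((hold_base_s, t_base), (ramp_s, t_peak),
                                  (hold_peak_s, t_peak), (ramp_s, t_base)):
        t += duration
        points.append((t, temperature))
    return points


def temperature_at(time_s, points):
    """Linearly interpolate the temperature at time_s within one cycle."""
    for (t0, temp0), (t1, temp1) in zip(points, points[1:]):
        if t0 <= time_s <= t1:
            if t1 == t0:
                return temp0
            return temp0 + (temp1 - temp0) * (time_s - t0) / (t1 - t0)
    raise ValueError("time_s lies outside the cycle")


if __name__ == "__main__":
    profile = cycle_profile()
    cycle_length = profile[-1][0]
    print(f"cycle length: {cycle_length:.0f} s")
    print(f"fraction of cycle at base temperature: {20 * 60 / cycle_length:.3f}")
```

Under these assumptions, the strands spend about 98 % of each cycle at the base temperature, where templated assembly takes place, and only a brief interval near the peak, where replicate and template separate.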

Exponential amplification of a restricted sequence subset with thermal oscillations.
(a) Amplification time traces for concentration c for sequence 0000 during the first four to six cycles (Tpeak = 67 °C) for template () concentrations from 0 to 45 nM. The data was fitted using the cross-catalytic model from equation (1). Strands , , , …, were used at 200 nM concentration each. Data points show concentrations of complexes 4:4. (b) Initial reaction velocity as a function of initial template concentration . The data points show good agreement with the line calculated from the fits in panel a. (c) Amplification proceeded for peak temperatures below 74 °C. Above, backbone duplexes start to melt, and the complexes are no longer stable. The base temperature was 45 °C, reactions initially contained 30 nM of complex as template. (d) Serial transfer experiment. The reaction containing strands , , , …, (black circles) survived successive dilution by a factor of 1/2 every three cycles at almost constant concentration. In contrast, a reaction with the same amount of template , but lacking monomers , fades out (open circles). The solid line shows the model from Equation 1.
Figure 5—source data 1
Source data for exponential amplification of a restricted sequence subset with thermal oscillations.
https://cdn.elifesciences.org/articles/63431/elife-63431-fig5-data1-v1.zip
The growth of molecular assemblies with different initial concentrations of template revealed an almost linear dependence of the reaction velocity on the initial amount of template (Figure 5a, b). This confirms the exponential nature of the replication. The cross-catalytic replication kinetics can be described by a simplistic model that only considers the concentrations of the template and its complement of :
Here, is the rate of cross-catalysis and the spontaneous formation rate. For , the model corresponds to simple exponential growth on a per-cycle basis. The model can be solved in closed form but does not account for saturation effects from the depletion of monomers. Therefore, it is not valid for concentrations similar to the total concentration of each strand. Fitting the model to the amplification reactions with 0–45 nM of template revealed rate constants of = 0.16 cycle−1 and = 0.4 nM cycle−1 (Figure 5b). Amplification was robust with regard to the peak temperature of the oscillations. For Tpeak below 74 °C, the reaction remained almost unaffected (Figure 5c). Above, the temperature is too close to the melting transitions of the hairpin-hairpin duplexes, ranging from 76 to 79 °C (Figure 4—figure supplement 1).
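To illustrate the kind of per-cycle cross-catalytic model described here, the following sketch iterates one plausible form: in each cycle, the template gains an amount proportional to its complement plus a constant spontaneous term, and vice versa. This update rule is an assumed reading of the description (a cross-catalysis rate in cycle−1, a spontaneous rate in nM cycle−1, no saturation) and is not taken from Equation 1 itself. The symbols `k` and `k0`, the function name, and the choice of equal starting concentrations for template and complement are likewise assumptions. The default rate values are the fitted constants quoted above.

```python
def cross_catalytic(c0, cbar0, k=0.16, k0=0.4, cycles=6):
    """Iterate an assumed per-cycle cross-catalytic update (concentrations in nM).

    c_next    = c    + k * cbar + k0
    cbar_next = cbar + k * c    + k0

    k  : cross-catalysis rate per cycle (0.16 is the fitted value from the text)
    k0 : spontaneous formation per cycle in nM (0.4 is the fitted value)
    Saturation from monomer depletion is ignored, as in the text.
    """
    c, cbar = float(c0), float(cbar0)
    trace = [(c, cbar)]
    for _ in range(cycles):
        c, cbar = c + k * cbar + k0, cbar + k * c + k0
        trace.append((c, cbar))
    return trace


if __name__ == "__main__":
    for start in (0.0, 15.0, 30.0, 45.0):  # initial template in nM (range from the text)
        trace = cross_catalytic(start, start)
        velocity = trace[1][0] - trace[0][0]  # first-cycle increase, k*start + k0
        print(f"start {start:5.1f} nM  initial velocity {velocity:5.2f} nM/cycle  "
              f"after 6 cycles {trace[-1][0]:6.1f} nM")
```

In this form, the first-cycle increase is linear in the starting concentration, in line with the almost linear dependence reported for Figure 5b, and setting `k0 = 0` with equal starting concentrations reduces the update to multiplication by `1 + k` per cycle, that is, simple exponential growth.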
The ability to withstand consecutive dilutions is characteristic of exponentially growing replicators and was tested in serial transfer experiments. Strands encoding for '0' (i.e. , , , etc.) were thermally cycled with 30 nM of template . After every three cycles, samples were diluted one to one with buffer containing all eight strands as monomers at 200 nM each (Figure 5d). This high frequency of dilutions prevented the reaction from transitioning into the saturating regime. The cross-catalytic model was fitted to the data with the dilution factor as the single free parameter, which was found to be 0.43. The difference from the theoretical value of 0.50 was likely due to strands sticking to the reaction vessels before dilution. As a control, a reaction with the same initial concentration of template , but without monomers , , , , was subjected to the same protocol. As the control could not grow exponentially, it gradually died out (Figure 5d, open circles).
Sequence replication
The above-mentioned reactions amplified but did not replicate actual sequence information, as they only contained strands with 0/ information domains. To study the replication of arbitrary sequences of binary code, replication reactions with all 16 strands encoding for '0' and '1' were performed. To discriminate sequences encoded in equally sized complexes and deduce error rates, we compared these results to those from different reaction runs with defects, that is, lacking one or two of the hairpin sequences required for the faithful replication of a particular template. Reference reactions contained all 16 strands (, , , , , …, ) at 100 nM each, and were run for each of three different template sequences (, , and ) (Figure 6). The product yields were quantified from reaction time traces, extracted by integrating the intensities of all gel bands containing tetramers with the labeled strand .

Sequence replication with thermal oscillations and fidelity check by forcing mutations from '0' to '1' at different locations.
(a) Replication of sequence . Reactions were started with 15 nM initial template . All strands (, , , …, ) were present at 100 nM each. Native-PAGE results comparing the reaction of all 16 strands ('++++') with the reaction lacking strand ('+++−'). The defective set '+++−' mostly produced 3:4 complexes instead of 4:4 complexes (see schematics on the right). The overall yield of tetramer-containing complexes was greatly reduced. As size reference, the marker lane contained complexes , , , and monomers . The complete gel is presented in Figure 6—figure supplement 1. (b) Product concentration over time for the complete sequence network (yellow) and three defective sets with missing strands. Data was integrated by quantitative image analysis from electrophoresis gels using covalent markers on the -strand counting all product complexes containing tetramers. Mutations of information in the product from '0' to '1' were induced by defective reactions that lacked strands ('+++−'), ('++−−'), and ('+−+−'). All reactions were initiated with 15 nM of . The solid line shows data from reaction '++++' without template. (c) End point comparison of reactions with templates (panels a, b), , and after six cycles. Horizontal lines indicate averages of the three template sequences. A single missing strand reduced product yield to about 40 %, two missing strands to 15–20 %.
Figure 6—source data 1
Source data for sequence replication with thermal oscillations and fidelity check by forcing mutations from '0' to '1' at different locations.
https://cdn.elifesciences.org/articles/63431/elife-63431-fig6-data1-v1.zip
Leaving out a single strand (reaction label “+++−”, for example omitting for template ) reduced the yield of full-size product to about 40 % (Figure 6a, b). The non-zero product yield with a missing strand is most likely due to the incorporation of the corresponding strand with an information domain mismatch (here ). This type of mismatch allows the hairpin backbone to form regardless, and the unfaithful product can propagate since both strands needed for an amplification of '1' at position D ( and ) are provided.
In particular during the first few cycles, mostly complex : (3:4) was detected in the gel, instead of the desired tetramer product (Figure 6—figure supplement 1). This was expected given the lack of strand and provides an upper limit on the error rate of the full replication. The fact that the full reaction produced almost no complexes 3:4 or 4:3 indicates that the incomplete product was indeed caused by the lack of a particular strand.
Removal of a further strand either directly next to the previous one ('++−−', missing strands ) or not ('+−+−', missing strands ) reduced the yield of product tetramers even further. Due to the periodic design those two variants represent all defective sets with two missing strands. Replication of the other two templates and produced very similar results. Product concentrations after six cycles are given in Figure 6c for each of the three templates as well as an average over the template sequences (horizontal lines). A single defect reduced the yield of tetramer complexes to about 40 %, two defects to 15–20 %, which is close to %, that is the combined probability of two independent mismatches.
Replication fidelity
The observed rate of erroneous product formation can be attributed to the spontaneous background rate (Figure 4b,c, Figure 5a,b and Figure 6b). The reaction ‘+−+−' (dark green) amplified similarly to the untemplated reference reaction (solid line), as it did not contain any strands that could bind next to each other to the template and form a backbone duplex (Figure 6b). For the templated reactions '+++−' and '++−−', templating worked for partial sequences, producing intermediate yields.
The reduction in yield caused by a single defect (i.e. missing strand) to ~40 % (and to ~16 % for two defects) translates into a replication fidelity per information domain of ~60 %. The exact value for the replication fidelity is 62 % and can be calculated from Figure 6b by extracting the endpoint concentrations (blue vs. yellow line) and calculating .
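The arithmetic behind these estimates can be sketched as follows. The sketch assumes that the fidelity per information domain is one minus the endpoint yield of the forced-mutation reaction relative to the full reaction, and that independent mismatches multiply. Both are readings consistent with the quoted numbers rather than a formula taken from the text. The function names and the example endpoint concentrations are hypothetical and are not values read from Figure 6b.

```python
def fidelity_from_yields(defect_yield_nM, full_yield_nM):
    """Assumed reading: fidelity = 1 - (yield with forced mutation / full yield)."""
    if full_yield_nM <= 0:
        raise ValueError("full_yield_nM must be positive")
    return 1.0 - defect_yield_nM / full_yield_nM


def expected_relative_yield(n_defects, single_defect_yield):
    """Relative yield for n independent defects, assuming mismatches multiply."""
    return single_defect_yield ** n_defects


if __name__ == "__main__":
    # Hypothetical endpoint concentrations, chosen so the ratio is 0.38.
    print(f"fidelity: {fidelity_from_yields(38.0, 100.0):.2f}")
    # A single defect leaves about 40 % yield; two independent defects
    # would then leave about 0.4 * 0.4 of the full yield.
    print(f"two defects: {expected_relative_yield(2, 0.4):.2f}")
```

With a single-defect yield of about 40 %, the product for two independent defects is 16 %, which falls within the measured 15–20 % range.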
However, this is a worst-case estimate, and the replication fidelity is likely higher due to binding competition. The mutations caused by a single defect ('+++-') in Figure 6b were imposed by not providing strand for a template ending with and only leaving the option to incorporate instead. For the full system ('++++'), however, with the presence of the matching strand, there is a binding competition for position D. Since the matching strand preferentially binds, the unfaithful incorporation of the wrong strand would be reduced. A similar effect of competition was observed in a protein-catalyzed ligation reaction (Toyabe and Braun, 2019). There, a comparable binding competition led to a sevenfold decrease of the inferior ligation reaction in the presence of competition (Figure 2a, b therein). Therefore, we expect the real fidelity to be better than the above lower-bound estimate.
It is interesting to project and compare this per information domain replication fidelity to a per nucleotide replicator (i.e. polymerization). To do so, we define a threshold in the decrease of melting temperature per information domain as the criterion for when the replication mechanism is still functional. Then, we estimate how many point mutations in the information domain can maximally be tolerated to stay within this range of decrease in melting temperature. From this, we can calculate a hypothetical, corresponding per nucleotide fidelity to the measured information domain fidelity.
We compared the properties of the duplex 0: to duplexes 0:*, where * differs from by point mutations. We assumed that within the temperature range of this replication mechanism (Figure 7b, gray box) a reduction in information domain melting temperature Tm of the mutated duplex 0:* by up to 10 °C compared to the original duplex 0: would be tolerated by the replication reaction. This was inferred from the width of the melting transition of duplex 0: (Figure 7b), where a shift of 10 °C corresponds to an increase of the unbound fraction from 0.08 at Tbase = 45 °C to 0.66 at 55 °C. In terms of free energies of the information domain duplex, this difference corresponds to ΔG(0:*) ≥ −12.5 kcal/mol compared to ΔG(0:) = −15.4 kcal/mol. 99 % of all duplexes 0:*, with * containing three point mutations, met that criterion (Figure 7a). Therefore, up to point mutations can be allowed.
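To put the two free energies quoted above in perspective, the following minimal sketch converts the difference between −15.4 kcal/mol and −12.5 kcal/mol into a ratio of equilibrium binding constants. It assumes a simple two-state duplex model and evaluates the ratio at Tbase = 45 °C. The evaluation temperature, the function name, and the constant name are illustrative assumptions and are not taken from the analysis itself.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)


def binding_constant_ratio(dg_original, dg_mutant, temp_celsius):
    """Ratio K_original / K_mutant for a two-state model with K = exp(-dG / RT).

    Assumption: both free energies (kcal/mol) refer to the same temperature.
    """
    rt = R_KCAL_PER_MOL_K * (temp_celsius + 273.15)
    return math.exp((dg_mutant - dg_original) / rt)


# Free energies quoted in the text; 45 °C is an assumed evaluation temperature.
ratio = binding_constant_ratio(dg_original=-15.4, dg_mutant=-12.5, temp_celsius=45.0)
print(f"Binding constant reduced roughly {ratio:.0f}-fold")
```

Under these assumptions, a 2.9 kcal/mol weaker duplex corresponds to a binding constant that is lower by about two orders of magnitude.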

Figure 7. Sequence space analysis of information domain binding.
The binding energies quantify the ability of the replication mechanism to discriminate nucleotide mutations. (a) Cumulative free energy distributions of information domain duplexes 0: (red), : (light red), as well as all 0:* and :* with up to three point mutations in * and * (yellow, green, blue). 99 % of duplexes 0:* with three point mutations have free energies ΔG ≥ -12.5 kcal/mol (dashed line), significantly weaker than that of 0: (ΔG = -15.4 kcal/mol). (b) Melting curves of information domain duplexes 0: (red), : (light red), and the two duplexes 0:* indicated by arrows in panel a. Even the 0:* duplex (i) at the low end of the ΔG distribution has a melting temperature of about 10 °C below that of 0:. This difference in melting temperature destabilizes binding of the information domain and causes the replication mechanism to reject these sequences in the thermal oscillation regime between Tbase = 45 °C and Tpeak = 67 °C (gray box).
Figure 7—source data 1. Source data for information domain binding energy statistics split into information domains containing terminal mutations and those with internal mutations only.
https://cdn.elifesciences.org/articles/63431/elife-63431-fig7-data1-v1.zip
We will assume that the replication did not differentiate between information domain and any information domain * if and * differ by fewer than $K$ point mutations. The fidelity per information domain is given by a cumulative binomial distribution:

$$p_K(N) = \sum_{k=0}^{K-1} \binom{N}{k}\, p^{N-k}\,(1-p)^{k}$$

Here, $N$ is the information domain length, and $p$ the per nucleotide replication fidelity. The reduction in binding energy of the information domain duplex 0:* and subsequent change in melting temperature was used as criterion to define the functionality of the replicator and to translate between a per information domain and a per nucleotide approach. As justified above, we calculate with $K = 3$ mutations within the $N = 15$ bases of the information domain, that is the replication can tolerate up to three mismatches in the information domain. From Figure 6 we extracted a per information domain fidelity of $p_K(N) = 62$ %, and deduce a per nucleotide fidelity of $p \approx 85$ %. In fact, information domain duplexes 0:* with mutations at two internal bases all show similar properties as information domains with a total of three mutations (Figure 7—figure supplement 1). This refinement would increase the per nucleotide fidelity to about 90 %. We therefore estimate that a per nucleotide replication process would need a replication fidelity of 85–90 % to produce sequences with an error rate equivalent to the presented mechanism. Detailed calculations of the per nucleotide fidelities can be found in the supplementary information.
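The conversion from a per information domain fidelity to a per nucleotide fidelity can be sketched numerically. The code below is a minimal illustration, not the authors' analysis script: it treats the domain fidelity as the cumulative binomial probability of fewer than K errors in N bases and inverts it by bisection. The symbols (N, K, p), the function names, and the parameter pairs (N = 15, K = 3 for all mutations; N = 13, K = 2 as one reading of the internal-mutation refinement) are assumptions made for this example.

```python
from math import comb


def domain_fidelity(p, n, k_max):
    """Probability that a domain of n bases is copied with fewer than k_max errors."""
    return sum(comb(n, k) * p ** (n - k) * (1 - p) ** k for k in range(k_max))


def per_nucleotide_fidelity(target, n, k_max, tol=1e-9):
    """Invert domain_fidelity by bisection; it increases monotonically with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if domain_fidelity(mid, n, k_max) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# 0.62 is the per information domain fidelity quoted in the text.
p_all = per_nucleotide_fidelity(0.62, n=15, k_max=3)       # assumed: all 15 bases
p_internal = per_nucleotide_fidelity(0.62, n=13, k_max=2)  # assumed: 13 internal bases
print(f"{p_all:.2f} {p_internal:.2f}")
```

Under these assumptions the two values come out near 0.85 and 0.90, in line with the 85–90 % range stated above.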
Discussion
A cross-catalytic replicator can be made from short sequences and without covalent bonds under a simple non-equilibrium setting of periodic thermal oscillations. The replication is fast and proceeds within a few thermal oscillations of 20 min each. This velocity is comparable to other replicators (Kindermann et al., 2005), cross-ligating ribozymes (Robertson and Joyce, 2014), or autocatalytic DNA networks (Yin et al., 2008). The required thermal oscillations can be obtained by laminar convection in thermal gradients (Braun et al., 2003; Salditt et al., 2020), which also accumulates oligonucleotides (Mast et al., 2013). Depending on the envisioned environment, the mechanism could also be driven by thermochemical oscillations (Ball and Brindley, 2014) or convection in pH gradients (Keil et al., 2017). It should be noted, however, that given the current state of the art in prebiotic polymerization and ligation chemistry, the creation of >80 nt RNA is not yet understood.
It is likely that a slower prebiotic ligation chemistry could later fix the replication results over long timescales. Such an additional non-enzymatic ligation (Stadlbauer et al., 2015) that joins successive strands would relax the constraint that backbone duplexes must not melt during high-temperature steps. Early on, this is difficult to achieve in aqueous solution against the high concentration of water. In order to overcome this competition and to favor the reaction entropically by a leaving group, individual bases are typically activated by triphosphates (Attwater et al., 2013; Horning and Joyce, 2016) or imidazoles, which are especially interesting in this context since they can replicate RNA directly (O'Flaherty et al., 2019; Zhou et al., 2019). However, the required chemical conditions of enhanced Mg2+ concentration hinder strand separation.
The overall replication fidelity is limited by the spontaneous bond formation rate between pairs of hairpin sequences, caused by the interaction of strands in free solution. At lower concentrations, as one would imagine in a prebiotic setting, this rate would decrease at the expense of an overall slower reaction. To some degree and despite ongoing design efforts, such a background rate is inherent to hairpin-fuelled DNA or RNA reactions (Green et al., 2006; Krammer et al., 2012; Yin et al., 2008).
The replication mechanism is expected to also work with shorter strands, as long as the order of the melting temperatures of the information domain and the backbone duplexes is preserved. Smaller strands would also be easier to produce by an upstream polymerization process, simply because they contain fewer nucleotides. In addition, binding of shorter information domain duplexes could discriminate even single base mismatches, resulting in an increased selectivity. It is not straightforward to estimate a minimal sequence length for the demonstrated mechanism. However, it is worth noting that it has been suggested that tRNA arose from two proto-tRNA sequences (Hopfield, 1978).
Pre-selection of nucleic acids for the presented hairpin-driven replication mechanism can be provided by highly sequence-specific gelation of DNA. This gel formation has been shown to be most efficient with double hairpin structures very similar to the tRNA-like sequences used in this study (Morasch et al., 2016). For our replication system, we have demonstrated this in Figure 3 by showing the spontaneous formation of agglomerates and sedimentation under gravity if all molecules of the assembly are present. This self-selection shows a possible pathway by which the system could emerge from random or semi-random sequences, for example in a flow or a convection system where the molecules are selected as macroscopic agglomerates (Mast et al., 2013). Another selection pressure could stem from the biased hydrolysis of double-stranded nucleotide backbones, which favors assembled complexes over the initial hairpins (Obermayer et al., 2011).
The replication mechanism could serve as a mutable assembly strategy for larger functional RNAs (Mutschler et al., 2015; Vaidya et al., 2012). As an evolutionary route toward a more mRNA-like replication product with chemically ligated information domains, the mechanism would be supplemented by self-cleavage next to the information domains that cuts out the non-coding backbone duplexes, followed by ligation of the information domains. Both operations could potentially be performed by very small ribozymatic centers (Dange et al., 1990; Szostak, 2012; Vlassov et al., 2005).
The proposed replication mechanism of assemblies from tRNA-like sequences allows speculation about a transition from an autonomous replication of successions of information domains to the translation of codon sequences encoded in modern mRNA (Figure 1a). Short peptide-RNA hybrids (Griesser et al., 2017; Jauker et al., 2015), combined with specific interactions between 3’-terminal amino acids and the anticodons, could have given rise to a primitive genetic code. The spatial arrangement of tRNA-like sequences that are replicated by the presented mechanism would translate into a spatial arrangement of the amino acid or short peptide tails that are attached to the strands in a codon-encoded manner (Schimmel and Henderson, 1994). The next stage would then be the detachment and linking of the tails to form longer peptides. Eventually, tRNA would transition to its modern role in protein translation. The mechanism thus proposes a hypothesis for the emergence of predecessors of tRNA, independent of protein translation. This is crucial for models of the evolution of translation, because it could justify the existence of tRNA before it was utilized in an early translation process. However, many questions around the evolutionary steps that created translation remain open.
Therefore, replication and translation could have, at an early stage, emerged along a common evolutionary trajectory. This supports the notion that predecessors of tRNA could have featured a rudimentary replication mechanism: starting with a double hairpin structure of tRNA-like sequences, the replication of a succession of informational domains would emerge. The interesting aspect is that the replication is first encoded by hybridization and can later be fixed by a much slower ligation of the hairpins. The demonstrated mechanism could therefore jumpstart a non-enzymatic replication chemistry, which was most likely restricted in fidelity due to working on a nucleotide-by-nucleotide basis (Robertson and Joyce, 2012; Szathmáry, 2006).
Materials and methods
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Sequence-based reagent | | Biomers | P - GCAGCGTTAATTCCCGC GCCTATCGGGAATGTAA CGCAGTGGGTAATAATG ACGATAGCCGTTCGGGA AAAGCGAACGGTATCG | |
Sequence-based reagent | | Biomers | P - GCAGCGATACCGTTCG CTTTTCCCGAACGGCT ATCGCAGTGGGTAATA ATGAGCGAACTGTCGG TGCTTGCGACAGTGTCGC | |
Sequence-based reagent | | Biomers | P - GCAGGCGACACTGTCG CAAGCACCGACAGTTC GCCAGTGGGTAATAAT GAGCGGTTCCTTGCGG AGTAGGCAAGGAATCCGC | |
Sequence-based reagent | | Biomers | P - GCAGGCGGATTCCTTG CCTACTCCGCAAGGAA TCGCCAGTGGGTAATA ATGACGTTACATTCCC GATAGGCGCGGGAATTAACG | |
Sequence-based reagent | | Biomers | P - GCTGCGCATTAACGCG CTTGTCCCGCGTTAAT TGCGCTCATTATTACC CACTCGCTCTCGGCTG TTTTGCCCAGCCGAGCAGCG | |
Sequence-based reagent | | Biomers | P - GCTGCGTTGCATTGGC GATCAAAGCCAATGCG AACGCTCATTATTACC CACTCGCAATTAACGC GGGACAAGCGCGTTAATGCG | |
Sequence-based reagent | | Biomers | P - GCTGGTTGGAGAAGGC GAACAGCACGCCTTCC CAACCTCATTATTACCC ACTCGTTCGCATTGGC TTTGATC GCCAATGCAACG | |
Sequence-based reagent | | Biomers | P - GCTGCGCTGCTCGGCT GGGCAAAACAGCCGAG AGCGCTCATTATTACCC ACTGTTGGGAAGGCGT GCTGTTCGCCTTCTCCAAC | |
Sequence-based reagent | | Biomers | P - GCAGCGTTAATTCCCG CGCCTATCGGGAATGT AACGCAAAAGAAGAGA AAGACGATAGCCGTTC GGGAAAAGCGAACGGTATCG | |
Sequence-based reagent | | Biomers | P - GCAGCGATACCGTTCG CTTTTCCCGAACGGCT ATCGCAAAAGAAGAGA AAGAGCGAACTGTCGG TGCTTGCGACAGTGTCGC | |
Sequence-based reagent | | Biomers | P - GCAGGCGACACTGTCG CAAGCACCGACAGTTC GCCAAAAGAAGAGAAA GAGCGGTTCCTTGCGG AGTAGGCAAGGAATCCGC | |
Sequence-based reagent | | Biomers | P - GCAGGCGGATTCCTTG CCTACTCCGCAAGGAA TCGCCAAAAGAAGAGA AAGACGTTACATTCCC GATAGGCGCGGGAATTAACG | |
Sequence-based reagent | | Biomers | P - GCTGCGCATTAACGCG CTTGTCCCGCGTTAAT TGCGCTCTTTCTCTTC TTTTCGCTCTCGGCTG TTTTGCCCAGCCGAGCAGCG | |
Sequence-based reagent | | Biomers | P - GCTGCGTTGCATTGGC GATCAAAGCCAATGCG AACGCTCTTTCTCTTC TTTTCGCAATTAACGC GGGACAAGCGCGTTAATGCG | |
Sequence-based reagent | | Biomers | P - GCTGGTTGGAGAAGGC GAACAGCACGCCTTCC CAACCTCTTTCTCTTC TTTTCGTTCGCATTGG CTTTGATCGCCAATGCAACG | |
Sequence-based reagent | | Biomers | P - GCTGCGCTGCTCGGCT GGGCAAAACAGCCGAG AGCGCTCTTTCTCTTC TTTTGTTGGGAAGGCG TGCTGTTCGCCTTCTCCAAC | |
Sequence-based reagent | – Cy5 | Biomers | Cy5 -GCAGCGTTAATTCCCGC GCCTATCGGGAATGTAA CGCAGTGGGTAATAATG ACGATAGCCGTTCGGGA AAAGCGAACGGTATCG | |
Sequence-based reagent | – Cy5 | Biomers | Cy5 - GCAGCGTTAATTCCCG CGCCTATCGGGAATGT AACGCAAAAGAAGAGA AAGACGATAGCCGTTC GGGAAAAGCGAACGGTATCG | |
Sequence-based reagent | R (random) | Biomers | NNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNN | |
Sequence-based reagent | R (random) – Cy5 | Biomers | Cy5 - NNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNN | |
Software, algorithm | NUPACK | nupack.org | https://doi.org/10.1002/jcc.21596 | |
Software, algorithm | ImageJ | ImageJ http://imagej.nih.gov/ij/ | RRID:SCR_002285 | |
Software, algorithm | ImageJ stabilization plugin | http://www.cs.cmu.edu/~kangli/code/Image_Stabilizer.html | | |
Strand design
DNA double-hairpin sequences were designed using the NUPACK software package (Zadeh et al., 2011). In addition to the secondary structures of the double-hairpins, the design algorithm was constrained by all target dimers. Candidate sequences were selected for optimal homogeneity of binding energies and melting temperatures. Backbone domains connecting consecutive strands (e.g. ) had to be the most stable bonds in the system, in particular more stable than between a template and a newly formed product complex (e.g. :). On the other hand, hairpin melting temperatures had to be low enough to allow for a sufficient degree of thermal fluctuations. To reconcile this with the length of the strands, mismatches were introduced in the hairpin stems. The sequences of all strands are listed in Supplementary file 1.
Thermal cycling assays
All reactions were performed in a buffer of 20 mM Tris-HCl pH 8 and 150 mM NaCl with added 20 mM MgCl2. DNA oligonucleotides (Biomers, Germany) were used at 200 nM concentration per strand in reactions containing a fixed-sequence subset of eight strands (e.g. 0/ only) and 100 nM per strand in reactions containing all 16 different strands.
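As a quick arithmetic check, the two conditions described above contain the same total oligonucleotide concentration. The short sketch below only multiplies the quoted per-strand concentrations by the number of strands; the function name is illustrative, and no statement about the reasoning behind the chosen concentrations is implied.

```python
def total_dna_nM(per_strand_nM, n_strands):
    """Total oligonucleotide concentration of a reaction mix in nM."""
    return per_strand_nM * n_strands


fixed_sequence_total = total_dna_nM(per_strand_nM=200, n_strands=8)
full_pool_total = total_dna_nM(per_strand_nM=100, n_strands=16)
print(fixed_sequence_total, full_pool_total)  # 1600 1600
```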
Thermal cycling was done in a standard PCR cycler (Bio-Rad C1000). Reaction kinetics were obtained by running each reaction for different run times or numbers of cycles in parallel. The products were analyzed using native PAGE. The time between thermal cycling and PAGE analysis was minimized to exclude artifacts from storage on ice.
Template sequences were prepared using a two-step protocol: annealing from 95 °C to 70 °C within 1 hr, followed by incubation at 70 °C for 30 min. Afterwards, samples were cooled to 2 °C and stored on ice. When assembling complexes containing paired information domains (Figure 2), samples were slowly cooled down from 70 to 25 °C within 90 min before being transferred onto ice. DNA double hairpins were quenched into monomolecular state by heating to 95 °C and subsequent fast transfer into ice water.
Product analysis
DNA complexes were analyzed using native polyacrylamide gel electrophoresis (PAGE) in gels at 5 % acrylamide concentration and 29:1 acrylamide / bisacrylamide ratio (Bio-Rad, Germany). Gels were run at electric fields of 14 V/cm at room temperature. Strand / was covalently labeled with Cy5. Cy5 fluorescence intensities were later used to compute strand concentrations. As an additional color channel, strands were stained using SYBR Green I dye (New England Biolabs). Complexes were identified by comparing the products obtained from annealing different strand subsets.
To correctly identify bands in the time-resolved measurements, gels were run with a marker lane. The marker contained strands (200 nM), (150 nM), (50 nM), and (100 nM), and was prepared using the two-step annealing protocol from 95 to 70 °C. The unequal strand concentrations ensured that the sample contained a mixture of mono-, di-, tri-, and tetramers.
Electrophoresis gels were imaged in a multi-channel imager (Bio-Rad ChemiDoc MP). Image post-processing and data analysis were performed using self-developed LabVIEW software. Post-processing corrected for inhomogeneous illumination by the LEDs, image rotation, and distortions of the gel lanes if applicable. Background fluorescence, which was generally low in the Cy5 channel, was determined from empty lanes on the gel.
For the determination of reaction yields, the intensities of all gel bands containing strands of the sequence length of interest were added up. For strings of four strands, these were the single tetramer as well as its complex with di- and tri- and tetramers. Single strands separated from their complements during electrophoresis (Figure 2 and Figure 6—figure supplement 1).
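The summation of band intensities described above can be sketched as follows. This is a hypothetical illustration rather than the LabVIEW analysis: it assumes that the background-corrected Cy5 intensity is proportional to the amount of labeled strand, so that the total intensity of a lane corresponds to the known concentration of the labeled strand. The band names, intensity values, and function name are invented for the example.

```python
def product_concentration_nM(band_intensities, product_bands, labeled_strand_nM):
    """Convert background-corrected Cy5 band intensities of one lane to a concentration.

    Assumption: intensity is proportional to the amount of labeled strand, so the
    lane total corresponds to the known labeled-strand concentration.
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("lane contains no signal")
    product = sum(band_intensities[name] for name in product_bands)
    return labeled_strand_nM * product / total


# Hypothetical intensities (arbitrary units) for a single gel lane.
lane = {"monomer": 700.0, "dimer": 150.0, "tetramer": 100.0, "tetramer:tetramer": 50.0}
conc = product_concentration_nM(
    lane, ["tetramer", "tetramer:tetramer"], labeled_strand_nM=200.0
)
print(conc)
```

All bands that contain a string of the length of interest are added before normalization, which mirrors the rule stated above for strings of four strands.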
Thermal melting curves
Thermal melting curves were measured using UV absorbance at 260 nm wavelength in a UV/Vis spectrometer (JASCO V-650, 1 cm optical path length), quenching of the Cy5 label at the 5'-end of strand (excitation: 620–650 nm, detection: 675–690 nm), or fluorescence of the intercalating dye SYBR Green I (excitation: 450–490 nm, detection: 510–530 nm). Fluorescence measurements were performed in a PCR cycler (Bio-Rad C1000). Samples measured via fluorescence were at 200 nM of each strand; those measured via UV absorption contained 1 µM total DNA concentration to improve the signal-to-noise ratio. Before analysis of the melting curves (Mergny and Lacroix, 2003), data were corrected for baseline signals from reference samples containing buffer and intercalating dye, if applicable.
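A melting curve of this kind is commonly reduced to an unbound fraction and a melting temperature. The sketch below shows one simple way to do so, assuming constant (temperature-independent) baselines for the fully bound and fully unbound states. This is a simplification; the cited analysis procedure may use sloping baselines. The readings and function names are hypothetical.

```python
def fraction_unbound(signal, bound_baseline, unbound_baseline):
    """Normalize a signal between the fully bound (0) and fully unbound (1) baselines."""
    return (signal - bound_baseline) / (unbound_baseline - bound_baseline)


def melting_temperature(temps, fractions):
    """Temperature at which the unbound fraction crosses 0.5 (linear interpolation)."""
    points = list(zip(temps, fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 != f0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("unbound fraction never crosses 0.5")


# Hypothetical absorbance readings at 260 nm (temperatures in °C).
temps = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
absorbance = [0.100, 0.102, 0.110, 0.130, 0.138, 0.140]
fractions = [fraction_unbound(a, 0.100, 0.140) for a in absorbance]
tm = melting_temperature(temps, fractions)
print(f"Tm = {tm:.1f} °C")
```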
Self-assembly and sedimentation analysis
The samples were mixed in the replication buffer (150 mM NaCl, 20 mM MgCl2, 20 mM Tris-HCl pH 8) at a total oligomer concentration of 5 µM, that is, with a varying concentration per strand depending on the number of different strands in the configuration (4, 7, or 8). The microfluidic chamber was assembled with a custom cut, 500 µm thick, Teflon foil placed between two plane sapphires (Figure 3—figure supplement 2). Three Peltier elements (QuickCool QC-31–1.4-3.7AS, purchased from Conrad Electronics, Germany) were attached to the backside of the chamber to provide full temperature control. The chamber was initially flushed with 3M Novec7500 (3M, Germany) to avoid bubble formation. The samples were pipetted into the microfluidic chamber through the 0.5 mm channels using microloader pipette tips (Eppendorf, Germany). The chamber was then sealed with Parafilm and heated to 95 °C for 10 s to fully separate the strands and cooled rapidly (within 30 s) to 25 °C. Assembly and sedimentation were monitored for 20 hr on a fluorescence microscope (Axiotech Vario, Zeiss, Germany) with two LEDs (490 nm and 625 nm, Thorlabs, Germany) using a 2.5 x objective (Fluar, Zeiss, Germany). The observed sedimentation was independent of the attached dye and its position (Figure 3—figure supplement 1c). Prior to image analysis the image stacks were stabilized using an ImageJ plugin (Li, 2008). The ratio of sedimented fluorescence relative to the first frame after heating was used to quantify sedimentation (Figure 3). The sedimentation time-traces (Figure 3b) were fitted with a sigmoid function to determine the final concentration increase c/c0 (Figure 3c). The experiment was also performed with random 84 nt DNA strands at 5 µM total concentration to exclude unspecific agglomeration (Figure 3—figure supplement 1c).
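The sigmoid fit of a sedimentation time trace can be illustrated with a small, dependency-free sketch. The functional form used here, ratio(t) = 1 + A / (1 + exp(−(t − t_half)/tau)), is an assumption, since the exact parameterization is not stated above, and the grid search stands in for whatever fitting routine was actually used. The plateau 1 + A is read as the final concentration increase c/c0. All names and the synthetic data are hypothetical.

```python
import math


def sigmoid(t, t_half, tau):
    """Logistic step rising from 0 to 1 around t_half with width tau."""
    return 1.0 / (1.0 + math.exp(-(t - t_half) / tau))


def fit_sedimentation(times, ratios, t_half_grid, tau_grid):
    """Fit ratio(t) = 1 + A * sigmoid(t) by grid search over (t_half, tau).

    For each grid point the amplitude A follows from linear least squares.
    Returns (sum of squared residuals, A, t_half, tau) of the best fit.
    """
    best = None
    for t_half in t_half_grid:
        for tau in tau_grid:
            s = [sigmoid(t, t_half, tau) for t in times]
            amp = sum(si * (r - 1.0) for si, r in zip(s, ratios)) / sum(si * si for si in s)
            sse = sum((1.0 + amp * si - r) ** 2 for si, r in zip(s, ratios))
            if best is None or sse < best[0]:
                best = (sse, amp, t_half, tau)
    return best


# Synthetic, noise-free trace (time in hours) generated from the same model.
times = [float(h) for h in range(21)]
ratios = [1.0 + 4.0 * sigmoid(t, 5.0, 1.5) for t in times]
_, amp, t_half, tau = fit_sedimentation(
    times,
    ratios,
    t_half_grid=[0.5 * i for i in range(41)],
    tau_grid=[0.25 * j for j in range(1, 21)],
)
final_increase = 1.0 + amp  # plateau of the fitted curve, read as c/c0
print(final_increase, t_half, tau)
```

Solving for the amplitude in closed form keeps the search two-dimensional; with measured data a finer grid or a dedicated optimizer would be the more usual choice.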
Appendix 1
Calculation of fidelity rate
Through the experiments shown in Figure 6 we already know that the replication fidelity per information domain is 62 %. Now, we want to assume that the presented replication mechanism would translate into a base-by-base replication and look at (i) how tolerant the replication would be to point mutations in the information domain and (ii) given that threshold, how well a base-by-base replication would have to perform to do equally well, that is, what per nucleotide fidelity it would need to have.
Question (i) is answered in Figure 7, where we see that on the 15 nt information domain we can allow up to three base mismatches to stay within the bounds of the temperature cycling (gray box, Figure 7b). In order to calculate how the measured replication fidelity per information domain translates into a hypothetical replication fidelity per nucleotide we assume a cumulative binomial distribution:

$$p_K(N) = \sum_{k=0}^{K-1} \binom{N}{k}\, p^{N-k}\,(1-p)^{k}$$

We know that the overall likelihood to get a 'correctly' replicated information domain is 62 %. From Figure 7 we know that in a base-by-base replication, 'correctly' means with up to three mismatches. Therefore, we must find the number of combinatorial possibilities of spatially distributing 0, 1 or 2 mismatches on the 15 nt information domain (using $N$ nucleotides and allowing up to $K$ mismatches). Using this, we can determine the probability $p$ for a success, that is the correct replication of a single nucleotide, to meet the overall likelihood.
For $N = 15$ and $K = 3$, we measure the replication fidelity per information domain to be $p_K(N) = 0.62$. Therefore, we calculate:

$$0.62 = \sum_{k=0}^{2} \binom{15}{k}\, p^{15-k}\,(1-p)^{k} \quad\Rightarrow\quad p \approx 0.85$$
From the information domain energy statistics shown in Figure 7—figure supplement 1, one can see that strands with two internal mutations behave nearly identical to strands with a total of three mutations (accepting internal and terminal mutations). Therefore, we simplify the calculation and only consider internal mutations.
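Accordingly, we calculate for $N = 13$ and $K = 2$ and a per information domain fidelity $p_K(N) = 0.62$:

$$0.62 = \sum_{k=0}^{1} \binom{13}{k}\, p^{13-k}\,(1-p)^{k} = p^{13} + 13\,p^{12}\,(1-p) \quad\Rightarrow\quad p \approx 0.90$$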
Therefore, a comparable base-by-base replication would need a per nucleotide fidelity of 85–90 % to perform equally well as the presented replication mechanism.
Data availability
No data sets (e.g. sequencing data, clinical trial data etc.) were produced in this study. The source data files (Igor incl. macros) and data analysis (LabVIEW) tools used are provided as supporting files (zip).
References
- From self-replication to replicator systems en route to de novo life. Nature Reviews Chemistry 4:386–403. https://doi.org/10.1038/s41570-020-0196-x
- Emergence of a new Self-Replicator from a dynamic combinatorial library requires a specific Pre-Existing replicator. Journal of the American Chemical Society 139:13612–13615. https://doi.org/10.1021/jacs.7b07346
- In-ice evolution of RNA polymerase ribozyme activity. Nature Chemistry 5:1011–1018. https://doi.org/10.1038/nchem.1781
- Hydrogen peroxide thermochemical oscillator as driver for primordial RNA replication. Journal of the Royal Society Interface 11:20131052. https://doi.org/10.1098/rsif.2013.1052
- A synthetic replicator drives a propagating Reaction-Diffusion front. Journal of the American Chemical Society 138:6723–6726. https://doi.org/10.1021/jacs.6b03372
- Self-assembly and Self-replication of short amphiphilic β-sheet peptides. Origins of Life and Evolution of Biospheres 41:563–567. https://doi.org/10.1007/s11084-011-9257-y
- Exponential DNA replication by laminar convection. Physical Review Letters 91:158103. https://doi.org/10.1103/PhysRevLett.91.158103
- Site-directed modification of DNA duplexes by chemical ligation. Nucleic Acids Research 16:3721–3738. https://doi.org/10.1093/nar/16.9.3721
- DNA hairpins: fuel for autonomous DNA devices. Biophysical Journal 91:2966–2975. https://doi.org/10.1529/biophysj.106.084681
- Amino Acid-Specific, Ribonucleotide-Promoted peptide formation in the absence of enzymes. Angewandte Chemie 129:1244–1248. https://doi.org/10.1002/ange.201610651
- Spontaneous formation of RNA strands, peptidyl RNA, and cofactors. Angewandte Chemie International Edition 54:14564–14569. https://doi.org/10.1002/anie.201506593
- Systems chemistry: kinetic and computational analysis of a nearly exponential organic replicator. Angewandte Chemie International Edition 44:6750–6755. https://doi.org/10.1002/anie.200501527
- Exploring the emergence of complexity using synthetic replicators. Chemical Society Reviews 46:7274–7305. https://doi.org/10.1039/C7CS00123A
- Thermal, autonomous replicator made from transfer RNA. Physical Review Letters 108:238104. https://doi.org/10.1103/PhysRevLett.108.238104
- Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2'-Hydroxyl Group. Journal of the American Chemical Society 121:5364–5372. https://doi.org/10.1021/ja990592p
- Analysis of thermal melting curves. Oligonucleotides 13:515–537. https://doi.org/10.1089/154545703322860825
- Heat-Flow-Driven oligonucleotide gelation separates Single-Base differences. Angewandte Chemie International Edition 55:6676–6679. https://doi.org/10.1002/anie.201601886
- Freeze-thaw cycles as drivers of complex ribozyme assembly. Nature Chemistry 7:502–508. https://doi.org/10.1038/nchem.2251
- Nonenzymatic Template-Directed synthesis of Mixed-Sequence 3'-NP-DNA up to 25 nucleotides long inside model protocells. Journal of the American Chemical Society 141:10481–10488. https://doi.org/10.1021/jacs.9b04858
- Emergence of information transmission in a prebiotic RNA reactor. Physical Review Letters 107:018101. https://doi.org/10.1103/PhysRevLett.107.018101
- Prebiotic chemistry and the origin of the RNA world. Critical Reviews in Biochemistry and Molecular Biology 39:99–123. https://doi.org/10.1080/10409230490460765
- DNA with 3'-5'-disulfide links--rapid chemical ligation through isosteric replacement. Angewandte Chemie International Edition 53:4222–4226. https://doi.org/10.1002/anie.201310644
- Sequence complementarity-driven nonenzymatic ligation of RNA. Biochemistry 50:2994–3003. https://doi.org/10.1021/bi101981z
- The origins of the RNA world. Cold Spring Harbor Perspectives in Biology 4:a003608. https://doi.org/10.1101/cshperspect.a003608
- Highly efficient self-replicating RNA enzymes. Chemistry & Biology 21:238–245. https://doi.org/10.1016/j.chembiol.2013.12.004
- Nonenzymatic, template-directed ligation of oligoribonucleotides is highly regioselective for the formation of 3'-5' phosphodiester bonds. Journal of the American Chemical Society 118:3340–3344. https://doi.org/10.1021/ja9537134
- A simple synthetic replicator amplifies itself from a dynamic reagent pool. Angewandte Chemie 120:10113–10118. https://doi.org/10.1002/ange.200804223
- Thermal habitat for RNA amplification and accumulation. Physical Review Letters 125:048104. https://doi.org/10.1103/PhysRevLett.125.048104
- Structure and transcription of eukaryotic tRNA genes. Critical Reviews in Biochemistry 19:107–144. https://doi.org/10.3109/10409238509082541
- Tetraloop-like geometries could form the basis of the catalytic activity of the most ancient ribooligonucleotides. Chemistry - a European Journal 21:3596–3604. https://doi.org/10.1002/chem.201406140
- The origin of replicators and reproducers. Philosophical Transactions of the Royal Society B: Biological Sciences 361:1761–1776. https://doi.org/10.1098/rstb.2006.1912
- The eightfold path to non-enzymatic RNA replication. Journal of Systems Chemistry 3:2. https://doi.org/10.1186/1759-2208-3-2
- Self-replicating system. Journal of the American Chemical Society 112:1249–1250. https://doi.org/10.1021/ja00159a057
- Catalyzed and spontaneous reactions on ribozyme ribose. Journal of the American Chemical Society 133:6044–6050. https://doi.org/10.1021/ja200275h
- The RNA world on ice: a new scenario for the emergence of RNA information. Journal of Molecular Evolution 61:264–273. https://doi.org/10.1007/s00239-004-0362-7
- A Self-Replicating hexadeoxynucleotide. Angewandte Chemie International Edition in English 25:932–935. https://doi.org/10.1002/anie.198609322
- (Book) The general and logical theory of automata. In: Jeffress L. A, editor. Cerebral Mechanisms in Behavior. Wiley. pp. 1–41.
- NUPACK: analysis and design of nucleic acid systems. Journal of Computational Chemistry 32:170–173. https://doi.org/10.1002/jcc.21596
Decision letter
- Patricia J Wittkopp, Senior Editor; University of Michigan, United States
- Gonen Ashkenasy, Reviewing Editor; Ben-Gurion University, Israel
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
We have found special interest in your new system, since it can serve as a new platform for building a replicator or an amplifier, and hence may help understanding how DNA/RNA self-replicate and pass on information.
Decision letter after peer review:
Thank you for submitting your article "tRNA sequences can assemble into a replicator" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Patricia Wittkopp as the Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
We would like to draw your attention to changes in our policy on revisions we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.
Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.
Summary:
The authors describe a new replication system, driven by DNA molecules for which the sequence design was inspired by natural tRNAs. The new system assembles into four hairpins that can template the replication of complementary sequences. One of the main challenges in obtaining efficient (exponential) replication originates from product inhibition, namely that the template-replicate duplexes are more stable than their single-stranded forms, and thus tend to have longer lifetimes and slow down the next replication cycles. In the current work, the authors developed a smart thermal strategy to drive the template amplification. Their design offers a good mechanism for building a replicator or an amplifier and may serve as a simplified platform for understanding more about how DNA/RNA may self-replicate and pass on information. The experiments of templated kinetics and exponential replication seem to be conducted appropriately, and the article is well-written and supported by previous literature from the field.
The paper can be of interest to the growing community exploring the chemistry of the origin of life, and particularly replicative chemistry. The paper is recommended to be published after addressing the following comments.
Essential revisions:
1) For the sequence replication in Figure 6b, which concentration is reported on the y-axis? Is it the tetramer concentration (including and ) or does it only include ? If it includes both, it should be specified; if not, can the authors explain why it is possible to form new tetramers when there is no fuel? Also, as all 4 reactions (++++, +++-, ++--, +-+-) start with 15 nM of tetramers, should they not begin with at least 15 nM in the 4 cases compared to the no-template case?
2) Can the authors better explain the section of the calculation of Replication fidelity? As there are no mutations happening in the hairpins, wouldn't it make more sense for calculating the fidelity rate on information-domain-basis instead of on the nucleotide-basis? Also, the equation for fidelity per information domain 𝑝𝐾(𝑁) may need more explanation and clarification, it would be helpful, thus, to add a section in the supplementary information clearing up the definition of fidelity rate (of the final tetramers, how many of them are ) and how these equations are derived. The 71 % fidelity per information domain caused by the 40 % decrease is also a bit confusing, as in this case, all these products should be errors, doesn't it translate to 60 % of leak for such cases?
3) No comments are made along the paper regarding the possibility of non-nucleic acid molecules to replicate, but many recent studies have highlighted this possibility. As a minimum, to complete the discussion it would be nice to refer the reader to review papers on this topic. In addition, since the replication process described in this paper is based on self-assembly, some references to previous works where self-assembly and replication were also intimately related are missing.
4) By reading the end of the Abstract (and from the title) one may understand that today's tRNA sequences, or very similar sequences, could confer any advantages with respect to replication and selection. As this point is not directly demonstrated in the paper, the authors could consider a more conservative discussion of this issue and clarify how far this argument holds.
5) The statement "Here, we implemented the system with DNA for practical reasons. Nevertheless, due to short heating times and moderate magnesium concentrations, we also estimate that an RNA version can survive for weeks (Li and Breaker, 1999)" may be true according to that reference, but it would have such important implications that it requires some experimental verification. Otherwise, it would be better to be more conservative about this point.
https://doi.org/10.7554/eLife.63431.sa1
Author response
Essential revisions:
1) For the sequence replication in Figure 6b, which concentration is reported on the y-axis? Is it the tetramer concentration (including and ) or does it only include ? If it includes both, it should be specified; if not, can the authors explain why it is possible to form new tetramers when there is no fuel? Also, as all 4 reactions (++++, +++-, ++--, +-+-) start with 15 nM of tetramers, should they not begin with at least 15 nM in the 4 cases compared to the no-template case?
In these settings, as the reviewers noted correctly, Figure 6b shows indeed the product concentration. As the label sits on strand , the template is unlabeled and only the formation of product (containing ) is recorded. This also becomes clear in Figure 6a, where the labeled strand at timepoint zero shows no product complexes but over time gets incorporated in the replicated product complexes. Therefore, the starting concentration of the product is indeed 0 nM, as shown in the Figure 6b. Only the unlabeled template is present in the first cycle at a concentration of 15 nM.
To study the replication of information, the performed reactions contained all 16 fuel strands (i.e. , , , ), and started with one of the three templates , , . We then compare the amount of product for the complete reaction (“++++”) to the output for reactions lacking one or more of the fuel strands required to replicate the template after six oscillations. For example, in the case “+++-”, when providing the template , we omitted and thus forced the reaction to create a mutation from “0” to “1” at position D. For the case “++--” and the template , both and were omitted, which forced two mutations from “0” to “1” at positions C and D.
One might also ask why we did not use differentially labeled strands. We deliberately decided against this, as we wanted to keep the labeled strand constant and not introduce additional labels to avoid differences in binding due to potential stacking interactions of or with the dyes.
To better show all this, we have changed the y-label of Figure 6b and adapted the figure legend.
In all cases, when a fuel strand required for the correct product formation is removed, e.g. removing when providing template , the product yield of tetramers is reduced to about 40 %. Since it is impossible to form the correct product, the detected tetramers must contain mismatches at the position where the correct strand is missing. In principle, this can be two kinds of mismatches:
i) The incorporation of the correct information domain but at the wrong position, leading to an insufficiently formed backbone. For example, could be incorporated at the position of . But location mismatches of this type will break up during the next temperature cycle as could only bind at the information domain but is incompatible along the hairpins and would therefore not lead to the formation of a tetramer. Such a reaction would not be reflected in the final yield of tetramers.
ii) The second type of mismatch will in contrast alter the final concentration of tetramers. A correctly formed backbone is created, but an information domain mismatch occurs, e.g. is incorporated at the position of . This tetramer with the mutation from “0” to “1” at position D will not break up during the temperature cycles due to its correctly formed backbone. Therefore, we enter the next round of replication with an additional tetramer with a correctly formed backbone but a different sequence, e.g. here . This will continue to replicate since for this tetramer all fuel strands are available. From now on, can act as a template for an unfaithful replication. We argue that mismatches of type (ii) are most likely the reason for a tetramer product yield of 40 % despite one missing strand.
In Figure 6c, we also tested the above scheme for two other templates and , which behaved in a very similar manner and showed that the probability of two mutations is in good approximation equal to the squared probability of a single mutation ( %). This indicates that the processes causing the non-zero product yields are, to a first approximation, independent, which matches our explanation of mismatch type (ii). Due to the periodicity of the design, the two defective sets (“++--” and “+-+-”) cover the whole combinatorial space of two mismatches.
Clarifying those points, we now write:
“Reference reactions contained all 16 strands (, , , , , …, ) at 100 nM each, and were run for each of three different template sequences (, , and ) (Figure 6). […] A single defect reduced the yield of tetramer complexes to about 40 %, two defects to 15–20 %, which is close to %, i.e. the combined probability of two independent mismatches.”
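The independence argument can be checked with a few lines of arithmetic. The sketch below uses only the rounded yields quoted in this response (about 40 % for one defect, 15–20 % for two); the variable names are illustrative and not taken from the manuscript.

```python
# Yields relative to the complete "++++" reaction, as quoted in the text.
single_defect_yield = 0.40               # one missing fuel strand
double_defect_observed = (0.15, 0.20)    # two missing fuel strands

# If the two mismatches are independent events, their probabilities multiply.
double_defect_expected = single_defect_yield ** 2

low, high = double_defect_observed
print(f"expected for two independent defects: {double_defect_expected:.0%}")
print(f"within observed range: {low <= double_defect_expected <= high}")
```

A squared single-defect yield that falls inside the observed two-defect range is what one would expect if the two mismatches arise independently.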
2) Can the authors better explain the section of the calculation of Replication fidelity? As there are no mutations happening in the hairpins, wouldn't it make more sense for calculating the fidelity rate on information-domain-basis instead of on the nucleotide-basis? Also, the equation for fidelity per information domain may need more explanation and clarification, it would be helpful, thus, to add a section in the SI clearing up the definition of fidelity rate (of the final tetramers, how many of them are 0A0B0C0D) and how these equations are derived. The 71 % fidelity per information domain caused by the 40 % decrease is also a bit confusing, as in this case, all these products should be errors, doesn't it translate to 60 % of leak for such cases?
We indeed agree that the replication fidelity was estimated wrongly by assigning the 100 % to the sum of the concentrations in the calculation of the fraction of perfect matches. We therefore revised the numbers given regarding the fidelity of the replicator.
This reduces our estimation of the replicator’s fidelity from 71 % down to 62 %. However, we want to point out that in the way we test for mutations, competition for binding sites is neglected, which is why a replication fidelity of 62 % is a lower bound estimation.
As correctly noted by the reviewers, the replication fidelity is defined by how much of the replicated information is replicated accurately. To stay with the example from (M1) and Figure 6b, this means how many of the product tetramers from template and “++++” do actually contain the accurate product sequence . In the experiments shown in Figure 6b, c we have determined the probability of mutations in the absence of where we obtain a ~40 % yield. The exact value for the replication fidelity of 62 % can directly be calculated from Figure 6b by extracting the endpoint concentrations (blue line vs. yellow lane) and calculating . Please note that for the calculation of the replication fidelity we now use a 2-digit precision, whereas for simplicity we stick with a 1-digit precision in Figure 6.
The experiments presented in Figure 6 measured the rate of incorporation of if no was present and there was no competition in binding. For the case of provided template – in Figure 6c different templates are analyzed – this means that the faithfully replicated products without competition amount to a ratio of 62 %, a little bit lower than initially stated in the manuscript where we incorrectly put the 100 % reference to the sum of both above concentrations.
We think this is a worst-case scenario. The mutations in the “+++-” case of Figure 6b were forced from a template ending with and could only bind as the optimal fuel was not provided. In the full system, the presence of the matching fuel strand, which binds preferentially at position D, would have reduced the unfaithful incorporation of the wrong strand. We have seen a similar effect of competition for a protein-catalyzed ligation reaction (Toyabe and Braun, 2019). There, a comparable binding competition led to a 7-fold decrease of the inferior ligation reaction in the presence of competition (Figure 2a, b therein).
Following this argumentation, we expect that the mutations from “0” to “1” would occur much less under competition, when the fuel for “0” is provided in the mutation experiment shown in Figure 6b. How much this will actually be the case is however hard to estimate as the analysis cannot distinguish between sequences.
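The change of reference behind the revised estimate can be illustrated numerically. The exact expression used in the response is not reproduced here, so the second function below is only one reading consistent with the quoted numbers: it takes the complete-reaction yield as the 100 % reference. The ratio of 0.38 is a hypothetical two-digit value chosen for illustration, not a measured endpoint from Figure 6b.

```python
def fidelity_sum_reference(defect_ratio):
    """Earlier estimate: 100 % is assigned to the sum of both yields."""
    return 1.0 / (1.0 + defect_ratio)


def fidelity_full_reference(defect_ratio):
    """Assumed revised form: 100 % is the yield of the complete reaction."""
    return 1.0 - defect_ratio


# defect_ratio = (yield with one fuel strand missing) / (yield of "++++")
print(round(fidelity_sum_reference(0.40), 2))   # rounded ~40 % yield from the text
print(round(fidelity_full_reference(0.38), 2))  # hypothetical two-digit ratio
```

Under these assumptions, the first call gives the earlier figure of about 71 % and the second gives about 62 %.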
The calculation of the replication fidelity per nucleotide is a projection. The aim in calculating a number for the per nucleotide replication fidelity is to compare our work to other studies, which are in comparison base-by-base replicators and provide a number for the replication fidelity per nucleotide.
Through the experiments shown in Figure 6 we already know that the replication fidelity per information domain is 62 %. Now, we assume that the presented replication mechanism would translate into a base-by-base replication and look at (i) how tolerant would the replication scheme be to point mutations at the information domain and (ii) given that threshold, how well would a base-by-base replication have to perform in terms of per nucleotide fidelity.
We explain the reasoning in calculating the per nucleotide fidelity more thoroughly now in the manuscript. We also added a section, where we provide more detail on why a binomial distribution is used and how the exact numbers for the replication fidelity per nucleotide are calculated.
“Question (i) is answered in Figure 7, where we see that on the 15 nt information domain we can allow up to three base mismatches to stay within the bounds of the temperature cycling (gray box, Figure 7b). We make this clearer in the manuscript now. In order to calculate how the measured replication fidelity per information domain translates into a hypothetical replication fidelity per nucleotide we assume a cumulative binomial distribution:
We know that the overall likelihood to get a “correctly” replicated information domain is 62 %. From Figure 7 we know that in a base-by-base replication, “correctly” translates to three mismatches to sustain the replication. Therefore, we must find the number of combinatorial possibilities of spatially distributing 0, 1 or 2 mismatches on the 15 nt information domain (using nucleotides and allowing up to mismatches). Using this, we can determine the probability for a success, i.e. the correct replication of a single nucleotide, to meet the overall likelihood.”
For and , the cumulative binomial distribution can be solved for (the per-nucleotide fidelity needed) , which yields . When neglecting the terminal mismatches (Figure 7—figure supplement 1), we calculate after solving . Therefore, a comparable base-by-base replication would need a per nucleotide fidelity of 85-90 % to perform equally well as the presented replication mechanism.
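A minimal numerical sketch of this calculation follows. It assumes the inputs stated in the prose above (a 15 nt information domain, 0, 1 or 2 tolerated mismatches, and a per-domain fidelity of 62 %) and solves the cumulative binomial for the per-nucleotide fidelity by bisection. The function names and the choice of bisection are illustrative and not taken from the manuscript.

```python
from math import comb


def domain_fidelity(p, n, max_mismatches):
    """Probability of at most `max_mismatches` errors among n nucleotides,
    given a per-nucleotide fidelity p (cumulative binomial)."""
    return sum(
        comb(n, k) * p ** (n - k) * (1 - p) ** k
        for k in range(max_mismatches + 1)
    )


def solve_per_nucleotide_fidelity(target, n, max_mismatches, tol=1e-9):
    """Find p with domain_fidelity(p) == target by bisection.
    Valid because domain_fidelity increases monotonically with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if domain_fidelity(mid, n, max_mismatches) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Assumed inputs: 15 nt domain, up to 2 mismatches, 62 % per-domain fidelity.
p = solve_per_nucleotide_fidelity(0.62, n=15, max_mismatches=2)
print(f"per-nucleotide fidelity: {p:.3f}")
```

With these inputs the solution lies in the mid-80s percent, at the lower end of the 85–90 % range quoted above. Changing `n` or `max_mismatches` shows how sensitive the projection is to the assumed tolerance.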
Reflecting the above discussion, we now write in the manuscript:
“The reduction in yield caused by a single defect (i.e. missing strand) to ~40 % (and to ~16 % for two defects) translates into a replication fidelity per information domain of ~60 %. […] Detailed calculations of the per nucleotide fidelities can be found in the subsection “Calculation of fidelity rate”.”
In addition, we included the following extra section:
“Calculation of fidelity rate
Through the experiments shown in Figure 6 we already know that the replication fidelity per information domain is 62 %. […] Therefore, a comparable base-by-base replication would need a per nucleotide fidelity of 85 - 90 % to perform equally well as the presented replication mechanism.”
We also adjusted the number given for the replication fidelity per nucleotide in the Abstract to 88 % and reformulated the sentence to minimize confusion about the per information domain and per nucleotide fidelity. We now write:
“The molecular assembly could encode and replicate binary sequence information with a replication fidelity corresponding to 85 - 90 % per nucleotide.”
3) No comments are made along the paper regarding the possibility of non-nucleic acid molecules to replicate, but many recent studies have highlighted this possibility. As a minimum, to complete the discussion it would be nice to refer the reader to review papers on this topic. In addition, since the replication process described in this paper is based on self-assembly, some references to previous works where self-assembly and replication were also intimately related are missing.
This is indeed a very good point – and apologies for our nucleotide-centered point of view. We now add to the manuscript:
“Apart from nucleotide-based replicators, very interesting replication systems using non-covalent interactions have been developed with non-biological compounds (Bottero et al., 2016; Sadownik & Philp, 2008; Tjivikua et al., 1990), peptide-based approaches (Altay et al., 2017; Bourbo et al., 2011; Carnall et al., 2010; Lee et al., 1996; Rubinov et al., 2012) and peptide nucleic acids (Ura et al., 2009). We also want to point to several instructive reviews about the state-of-the-art systems chemistry regarding self-replication (Adamski et al., 2020; Ashkenasy et al., 2017; Kosikova and Philp, 2017).”
4) By reading the end of the Abstract (and from the title) one may understand that today's tRNA sequences, or very similar sequences, could confer any advantages with respect to replication and selection. As this point is not directly demonstrated in the paper, the authors could consider a more conservative discussion of this issue and clarify how far this argument holds.
We understand the reviewers’ reservations about this point. We want to stress that we do not claim that today’s tRNA sequences have any advantages in today’s replication or selection. We merely argue that our experiments support a hypothesis under which tRNA, which is one of the most ancient molecules of modern biology, might have transformed its role over time. While today it is responsible for the translation of proteins, it might much earlier, perhaps in a slightly different form, have been involved in a molecular replication scheme like the one presented in this manuscript.
We have therefore reformulated the sentence in the Abstract to make it clear that our argument is about a connection in very early replication mechanisms. Of course, it is difficult to argue that modern tRNA sequences are very close to ancient ones. But we hope that our discussion of the replication mechanism on the basis of melting temperatures and the kinetics of hybridization makes it clear that we do not rely on a very specific sequence, but merely on hybridization and a conserved order of melting temperatures.
We now write in the Abstract:
“The replication by a self-assembly of tRNA-like sequences suggests that early forms of tRNA could have been involved in molecular replication. This would link the evolution of translation to a mechanism of molecular replication.”
5) The statement "Here, we implemented the system with DNA for practical reasons. Nevertheless, due to short heating times and moderate magnesium concentrations, we also estimate that an RNA version can survive for weeks (Li and Breaker, 1999)" may be true according to that reference, but it would have such important implications that it requires some experimental verification. Otherwise, it would be better to be more conservative about this point.
Krammer et al. reported a much more primitive replicator with single and not double hairpins and therefore only half the sequence length (Krammer et al., 2012). It is important to note that in this 2012 study the replicator was implemented in RNA. After remodeling the replicator in DNA, we can say that for both replicators, the RNA and the DNA version, we could explain their behavior based on hybridization, and did not have to include any extra considerations for the RNA version.
We also included an overview (see Figure 1—figure supplement 1b) over the predicted secondary structures and the free energies for an RNA version of the presented replicator, when substituting every 'T' with a 'U', using NUPACK (Zadeh et al., 2011). The secondary structure is identical to the DNA version (compare with Figure 1—figure supplement 1a). Only the free energies are slightly higher (+30 %) which could be compensated by a reduction of salt concentrations for an RNA implementation.
Therefore, we argue that the replication scheme can readily be implemented in RNA. Even though the timescales on which Krammer et al. performed their experiments are much shorter, the initial heating step to 95 °C (20 min > 80 °C) is identical and would arguably have the strongest effect on RNA stability compared to the moderate temperatures during cycling, in both Krammer et al. (10 °C (27 s) - 40 °C (3 s)) and this study (45 °C (20 min) - 67 °C (20 s)).
We also want to quote another recent study looking at the hydrolysis of RNA by Mariani et al. They determined the half-life for a 10 nt RNA at 10 mM Mg2+ at 90 °C to be seven days. For a 30 nt RNA under the same conditions they measured 16 % unspecific degradation after seven days (Mariani et al., 2018). Even though our Mg2+ concentration is 2-fold higher, we are operating at much more moderate temperatures. We also want to mention a recent study from our own lab, where replication with a 200 nt ribozyme was performed at much higher Mg2+ concentration (50 mM) including temperature spikes (Salditt et al., 2020), which were tolerated well and confirm the hydrolysis studies cited in this manuscript.
We understand that RNA stability at high magnesium concentration and high temperature is critical, but as the time of exposure to high temperature in the presented replication scheme is limited, we would stick with our claim, albeit in a reformulated form. We also now elaborate more on the previous implementation with RNA. We now write:
"Here, we implemented the system with DNA and not RNA as done previously (Krammer et al., 2012). Both, in the design and the implementation we did not see significant differences between the two versions. Because of the simpler and more inexpensive synthesis of the 82-84 nt long sequences we now implemented the replicator in DNA. Due to short heating times and moderate magnesium concentrations, we estimate that an RNA version could survive for days if not weeks (Li & Breaker, 1999, Mariani et al. 2018). The most critical step regarding the RNA stability would be the initial temperature spike to 95 °C, which remains unchanged from our previous study (Krammer et al., 2012) and did not prove critical. In Figure S1 we also show that an RNA version behaves structurally identical to the implemented DNA version."
https://doi.org/10.7554/eLife.63431.sa2
Article and author information
Author details
Funding
Deutsche Forschungsgemeinschaft (TRR 235)
- Alexandra Kühnlein
- Dieter Braun
Deutsche Forschungsgemeinschaft (Project-ID 364653263)
- Alexandra Kühnlein
- Dieter Braun
Deutsche Forschungsgemeinschaft (CRC 1032 (A04) Project-ID 201269156)
- Simon Alexander Lanzmich
- Dieter Braun
Deutsche Forschungsgemeinschaft (Student fellowship)
- Alexandra Kühnlein
Deutsche Forschungsgemeinschaft (Graduate school "Quantitative Bioscience Munich")
- Alexandra Kühnlein
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We gratefully acknowledge financial support from the Deutsche Forschungsgemeinschaft (DFG) through the TRR 235 Emergence of Life (Project-ID 364653263) and the CRC 1032 NanoAgents (Project-ID 201269156). We are grateful for funding from the Graduate School ‘Quantitative Bioscience Munich’ (QBM). We appreciate the fruitful discussions in the Simons Collaboration on the Origins of Life, thank Thomas Rind for measurements, and acknowledge discussions with Tim Liedl, Christof Mast and Lorenz Keil. We thank Filiz Civril, Adriana Serrão and Thomas Matreux for comments on the manuscript.
Senior Editor
- Patricia J Wittkopp, University of Michigan, United States
Reviewing Editor
- Gonen Ashkenasy, Ben-Gurion University, Israel
Publication history
- Received: September 24, 2020
- Accepted: January 28, 2021
- Version of Record published: March 2, 2021 (version 1)
Copyright
© 2021, Kühnlein et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.