
tRNA sequences can assemble into a replicator

  1. Alexandra Kühnlein
  2. Simon A Lanzmich
  3. Dieter Braun (corresponding author)
  1. Systems Biophysics, Physics Department, Center for NanoScience, Ludwig-Maximilians-Universität München, Germany
Research Article
Cite this article as: eLife 2021;10:e63431 doi: 10.7554/eLife.63431

Abstract

Can replication and translation emerge in a single mechanism via self-assembly? The key molecule, transfer RNA (tRNA), is one of the most ancient molecules and contains the genetic code. Our experiments show how a pool of oligonucleotides, adapted with minor mutations from tRNA, spontaneously formed molecular assemblies and replicated information autonomously using only reversible hybridization under thermal oscillations. The pool of cross-complementary hairpins self-selected by agglomeration and sedimentation. The metastable DNA hairpins bound to a template and then interconnected by hybridization. Thermal oscillations separated replicates from their templates and drove an exponential, cross-catalytic replication. The molecular assembly could encode and replicate binary sequences with a replication fidelity corresponding to 85–90 % per nucleotide. The replication by a self-assembly of tRNA-like sequences suggests that early forms of tRNA could have been involved in molecular replication. This would link the evolution of translation to a mechanism of molecular replication.

eLife digest

The genetic code stored within DNA contains the instructions for manufacturing all the proteins organisms need to develop, grow and survive. This requires molecular machines that ‘transcribe’ regions of the genetic code into RNA molecules which are then ‘translated’ into the string of amino acids that form the final protein. However, these molecular machines and other proteins are also needed to replicate and synthesize the sequences stored in DNA. This presents evolutionary biologists with a ‘chicken-and-egg’ situation: which came first, the DNA sequences needed to manufacture proteins or the proteins needed to transcribe and translate DNA?

Understanding the order in which DNA replication and protein translation evolved is challenging as these processes are tightly intertwined in modern-day species. One theory, known as the ‘RNA world hypothesis’, suggests that all life on Earth began with a single RNA molecule that was able to make copies of itself, as DNA does today. To investigate this hypothesis, Kühnlein, Lanzmich and Braun studied a molecule called transfer RNA (or tRNA for short) which is responsible for translating RNA into proteins. tRNA is assumed to be one of the earliest evolved molecules in biology. Yet, why it was present in early life forms before it was needed for translation still remained somewhat of a mystery.

To gain a better understanding of tRNA’s role early in evolution, Kühnlein, Lanzmich and Braun made small changes to its genetic code and then carried out tests on these tRNA-like sequences. The experiments showed these ‘early’ forms of tRNA can actually self-assemble into a molecule which is capable of replicating the information stored in its sequence. It suggests early forms of tRNA could have been involved in replication before modern tRNA developed its role in protein translation.

With these experiments, Kühnlein, Lanzmich and Braun have identified a possible evolutionary link between DNA replication and protein translation, suggesting the two processes emerged through one shared pathway: tRNA. This deepens our understanding about the origins of early life, while taking biochemists one step closer to their distant goal of recreating self-replicating molecular machines in the laboratory.

Introduction

A machine that creates replicates of itself is an old dream of engineering (von Neumann, 1951). Biological systems solved this problem long ago at the nanoscale with DNA and RNA. Their replication machinery was optimized to perfection through Darwinian evolution. In modern living systems, the replication of DNA and RNA necessitates the formation of covalent bonds. It requires an interconnected machinery: proteins to perform base-by-base replication of sequence information, a modern metabolism to supply activated molecules, and tRNA as well as the ribosome to create the required proteins.

This is a complex system to set up in the first place at the emergence of life. The RNA world hypothesis proposes that, early on, the catalytic function of highly defined RNA sequences was used for self-replication (Horning and Joyce, 2016; Orgel, 2004; Turk et al., 2011). These ribozymes catalyze the ligation of RNA (Doudna et al., 1991; Mutschler et al., 2015; Paul and Joyce, 2002; Robertson et al., 2001; Walton et al., 2020) and the addition of individual bases (Attwater et al., 2013; Horning and Joyce, 2016). These very special sequences were engineered using in vitro evolution. It is unclear how autonomous evolution of early life could have reached such levels of sequence complexity.

Here, we focus on how such replication may have been predated by simpler forms of self-replication. A replicator must fulfill a series of requirements: it must copy with sufficient fidelity, be fast, enable exponential replication, be fed by an autonomous energy source, not require complex sequences, and not form too many replicates in the absence of a template.

We show that replication of information can be realized by the reversible hybridization interactions between tRNA-like molecules alone. The proposed mechanism is driven by an external physical non-equilibrium setting, in our case thermal oscillations. Since the process does not involve chemical ligation, it does not rely on a particular non-enzymatic or catalytic ligation chemistry (Dolinnaya et al., 1988; Engelhart et al., 2012; Patzke et al., 2014; Pino et al., 2011; Rohatgi et al., 1996; Sievers and von Kiedrowski, 1994; von Kiedrowski, 1986) or particular catalytically active sequences, but merely requires sequence complementarity. The advantage of reversible hybridization is the re-usability of educts and products. Moreover, sequence-encoded interactions can self-select by forming agglomerates.

Nature’s approach to achieve exponential growth is the usage of cross-catalysis: the replicate of a template serves as a template for the next round of replication. For short replicators under isothermal conditions, the binding between template and replicate has to be weak such that the dissociation of strands happens spontaneously and is not rate limiting (Paul and Joyce, 2002; Sievers and von Kiedrowski, 1994; von Kiedrowski, 1986). For longer replicates, temperature change has successfully been used to separate strands for replication catalyzed by thermostable proteins (Barany, 1991; Saiki et al., 1985). For catalytic RNA, elevated salt concentrations disfavor strand separation by temperature and catalyze hydrolysis (Horning and Joyce, 2016). In an interesting alternative to strand separation by temperature, Schulman et al. used moderate shear flows to separate DNA tile assemblies (Schulman et al., 2012).

Apart from nucleotide-based replicators, very interesting replication systems using non-covalent interactions have been developed with non-biological compounds (Bottero et al., 2016; Sadownik and Philp, 2008; Tjivikua et al., 1990), peptide-based approaches (Altay et al., 2017; Bourbo et al., 2011; Carnall et al., 2010; Lee et al., 1996; Rubinov et al., 2012), and peptide nucleic acids (Ura et al., 2009). We also want to point to several instructive reviews about the state-of-the-art systems chemistry regarding self-replication (Adamski et al., 2020; Ashkenasy et al., 2017; Kosikova and Philp, 2017).

In the past, metastable hairpin states have been prepared in a physically separated manner. The reaction was then triggered by mixing. For example, the mixing of hairpins with a trigger sequence has been shown to form long concatemers (Dirks and Pierce, 2004). With a similar logic, mixing a low entropy combination of molecules was used to create entropically driven DNA machines, including exponentially amplifying assemblies (Zhang et al., 2007). These reactions run downwards into the binding equilibrium. However, the preparation of the initial low entropy state required human intervention or a unique flow setting for mixing.

Sequence design

We designed a set of cooperatively replicating DNA strands using the program package NUPACK (Zadeh et al., 2011). The sequences are designed to have self-complementary double hairpins and are pairwise complementary within the molecule pool, such that the 3’ hairpin of one strand is complementary to the 5’ hairpin of the next. Their structure resembles the secondary structure of proto-tRNAs proposed by stereochemical theories (Figure 1a), comprising two hairpin loops that surround the anticodon with a few neighboring bases (Krammer et al., 2012). The lengths of 82–84 nt of the double hairpins are that of average tRNA molecules (Sharp et al., 1985), with stem loops consisting of 30–33 nt and the information-encoding interjacent domains of 15 nt. As the replication mechanism is based on hybridization only, it is expected to perform equally well for DNA and RNA. Here, we implemented the system with DNA rather than with RNA as done previously (Krammer et al., 2012). In both the design and the implementation, we did not see significant differences between the two versions. Because the 82–84 nt long sequences are simpler and less expensive to synthesize as DNA, we now implemented the replicator in DNA. Due to short heating times and moderate magnesium concentrations, we estimate that an RNA version could survive for days if not weeks (Li and Breaker, 1999; Mariani et al., 2018). The most critical step regarding the RNA stability would be the initial temperature spike to 95 °C, which remains unchanged from our previous study (Krammer et al., 2012) and did not prove critical. We also show that an RNA version behaves structurally identically to the implemented DNA version (Figure 1—figure supplement 1).

Figure 1 (with 1 supplement)
Heat-driven replication by hybridization using hairpin structures inspired from transfer RNA.

(a) Transfer RNA folds into a double-hairpin conformation upon very few base substitutions. In that configuration, the 3’-terminal amino acid binding site (green) is close to the anticodon (blue) and a double hairpin structure forms. A set of pairwise complementary double hairpins can encode and replicate sequences of information. A binary code implemented at the position of the anticodon, the information domain, makes it possible to encode and replicate binary sequences (red vs blue). Each strand (82–84 nt) comprises two hairpin loops (gray) and an interjacent unpaired information domain of 15 nt length (blue/red, here: 0D). The displayed structure of eight strands shows replication of a template corresponding to the binary code 0010. Note that no covalent linkage is involved in the process. (b) Replication is driven by thermal oscillations in four steps: (0) The hairpins are activated into their closed conformation by fast cooling, indicated by triangles. (1) Strands with matching information domain bind to the template. (2) Fluctuations in the bound strands’ hairpins facilitate the hybridization of neighboring strands. (3) Subsequent heating splits replica from template, while keeping the longer hairpin sequences connected, freeing both as templates for the next cycle.

Replication mechanism

The replication mechanism is a template-based replication in which information is encoded by a succession of oligomers instead of single nucleotides. The domain at the location of the anticodon in tRNA is the template sequence and thus contains the information to be replicated. We therefore term it the information domain. The goal is to replicate the succession of information domains.

To allow longer replicates, we chose the resulting meta-sequences to be periodic with a periodicity of four different hairpins. This makes the minimal cyclic meta-sequence large enough to keep the information domains accessible even in cyclic configuration. The information domains feature a binary system and contain sequences marked by '0' and '1' (blue/red). For replication, two sets of strands replicate strings of codons in a cross-catalytic manner (Figure 1b), using complementary information domains (light/dark colors).
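The binary naming scheme can be made concrete with a small sketch. It is only an illustration of the bookkeeping, not of the underlying sequences: the helper names are hypothetical, and a trailing apostrophe after the bit stands in for the overbar that marks the complementary set of strands.

```python
# Hypothetical naming helpers. Positions A-D and bits 0/1 follow the text;
# the "'" after the bit is an assumed stand-in for the overbar notation.
POSITIONS = "ABCD"


def strands_for(bits: str, complement: bool = False) -> list[str]:
    """Return the four strand names that spell out a 4-bit sequence."""
    if len(bits) != len(POSITIONS) or set(bits) - {"0", "1"}:
        raise ValueError("expected a 4-character string of 0s and 1s")
    mark = "'" if complement else ""
    return [f"{bit}{mark}{pos}" for bit, pos in zip(bits, POSITIONS)]


def replicate_names(template: list[str]) -> list[str]:
    """Names of the strands whose information domains pair with a template."""
    paired = []
    for name in template:
        if "'" in name:
            paired.append(name.replace("'", ""))       # 0'A pairs with 0A
        else:
            paired.append(name[0] + "'" + name[1:])    # 0A pairs with 0'A
    return paired


def full_pool() -> list[str]:
    """All 16 strands: 2 bits x 4 positions x 2 complementary sets."""
    return [f"{b}{m}{p}" for p in POSITIONS for b in "01" for m in ("", "'")]


if __name__ == "__main__":
    template = strands_for("0010", complement=True)
    print(template, "->", replicate_names(template))
```

Under this naming, a template built from the complementary set selects the strands of the other set position by position, which is the cross-catalytic pairing described above.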

The replication is driven by thermal oscillations and operates in four steps (Figure 1b): (0) Fast cooling within seconds brings the strands to their activated state with both hairpins closed. (1) At the base temperature, activated strands with complementary information domains can bind to an already assembled template. (2) Thermal fluctuations cause open-close fluctuations of the hairpins. When strands are already bound to a template at the information domain, those fluctuations permit adjacent complementary hairpins of different strands to bind. In this way, the succession of information domains is replicated. (3) Subsequent heating splits the newly formed replicate from the template at the information domains. Due to their higher melting temperatures, the backbone of hairpin strands remains stable. Both replicate and template are then available as templates for a new replication round, which makes the replication cross-catalytic. Later, high temperature spikes can unbind and recycle all molecules for new rounds of replication.

Because of the initial fast cooling, all hairpins are closed in free solution. This inhibits the formation of replicates without template. While the binding of adjacent hairpins with template happens within minutes, hairpins in free solution connect without template only on timescales slower than hours and thus give false positives at a very low rate.

The basic principle of this replication mechanism was previously explored by Krammer et al. with a set of four hairpins, each derived from half a tRNA sequence (36 nt), that amplified into dimers (Krammer et al., 2012). This amplification could not encode information and suffered from a high rate (>50 %) of unspecific amplification without template (Figure 4 therein). Here, in contrast, we demonstrate exponential amplification, and the replicator can now encode sequence information ‘0’ and ‘1’ with four bits. Moreover, the strands making up the new replicator are double hairpins with the sequence structure and length of tRNA. The replicator now shows a significantly decreased unspecific amplification without template of approximately 10 % (Figure 5a).

Results

Analysis of molecule conformations

Native polyacrylamide gel electrophoresis (PAGE) showed that the double hairpins assembled as intended (Figure 2). Comparing different subsets of strands allowed all gel bands to be identified.

Figure 2 (with 1 supplement)
Assembly of different subsets of the cross-replicating system of strands observed by native gel electrophoresis.

Samples contained strands at 200 nM concentration each and were slowly annealed as described in Materials and methods. Lane contents are indicated at the top of each lane. Comparison of different lanes allowed for the attribution of bands to complexes. Complexes incorporating all present strands are marked (•). The red channel shows the intensity of 0A-Cy5, the cyan channel shows SYBR Green I fluorescence. Single information domain bonds (lanes 2, 7) break during gel electrophoresis.

Figure 2—source data 1

Source data for assembly of different subsets of the cross-replicating system of strands observed by native gel electrophoresis.

https://cdn.elifesciences.org/articles/63431/elife-63431-fig2-data1-v1.zip

All complexes were formed at concentrations of 200 nM of each strand and could be resolved despite their branched tertiary structure. Friction coefficients of complexes of two to four strands were 1.6–1.8-fold higher than for linear dsDNA, and 2.4-fold higher for larger complexes (4:4 configuration, ca. 660 nt, Figure 2—figure supplement 1). This agrees with the branched structure of the suggested strand assembly geometry (Figure 1a). Partially assembled complexes of two or three strands bound to a four-strand template could be resolved (Figure 6—figure supplement 1). Complexes containing single bound information domains were not stable during electrophoresis (Figure 2, lanes 2, 7 and Figure 6—figure supplement 1). This made it possible to differentiate fully assembled complexes from those where individual strands are bound to a template but have not formed backbone duplexes. Covalent end labels and two reference lanes on each gel were used to quantify concentrations from gel intensities using image analysis as described in Materials and methods.

Selection by agglomeration and sedimentation

For a replicator to be autonomous, there must be a mechanism in place to select, assemble and (re-)accumulate its molecular components purely at one location. We argue that DNA hydrogels could offer such a solution. While DNA often, also in our case, assembles into agglomerates, DNA hydrogels have been shown to be able to form fluid phases if gaps of single bases were added to create flexible linkers between molecules (Nguyen and Saleh, 2017).

We combined eight matching hairpin sequences of design as introduced in Figure 1 at moderately elevated concentrations and cooled the system to only 25 °C after separating the molecules at 95 °C (Figure 3). We found the spontaneous formation of agglomerates that were large enough to sediment under gravity. The initial homogeneous fluorescence turned into micrometer-sized grains and sedimented within hours. The fluorescence was provided by a covalently attached label to either strand 0A or 1A. Since the double hairpins have a periodic boundary condition, they can create large assemblies (Figure 3a).

Figure 3 (with 3 supplements)
Spontaneous self-assembly and sedimentation of matching hairpins.

(a) In a simple, sealed microfluidic chamber (Figure 3—figure supplement 2), the hairpin strands can self-assemble into agglomerates and sediment on a timescale of hours. The sample was initially heated to 95 °C for 10 s to ensure an unbound initial state, then rapidly (within 30 s) cooled to 25 °C, where self-assembly and sedimentation occurred. Note that agglomeration and sedimentation only occurred if all eight matching hairpins were provided (top two rows) but not in the case of a knockout (-1D, bottom row). For quantification, the bulk and sediment intensities were normalized by the first frame after heating. Samples contained strands at a total concentration of 5 µM, about threefold higher than in Figure 2 and the following replication experiments. (b) Time traces of concentration increase for sediment and bulk of different configurations, same examples as shown in a. The time traces of all further knockout permutations are shown in Figure 3—figure supplement 1b. (c) Final concentration increase of sediment, relative to first frame after heating, for all configurations. The final values (N≥3) for c/c0 are retrieved from fitting the time traces. For the full set of complementary hairpins, self-assembly and sedimentation is most pronounced.

Figure 3—source data 1

Source data for spontaneous self-assembly and sedimentation of matching hairpins.

https://cdn.elifesciences.org/articles/63431/elife-63431-fig3-data1-v1.zip

It is evident from Figure 3—video 1 that the sedimentation was very selective. When only seven of the eight matching hairpins were present, sedimentation was much weaker and, in most cases, undetectable (Figure 3b,c). For the full system, the sedimentation kinetics proved to be strongly concentration dependent (Figure 3—figure supplement 1b). Analogous experiments with random sequences (random pool of 84 nt strands) at equal concentration showed neither agglomeration nor sedimentation (Figure 3—figure supplement 1c). We have previously found that similar hairpin molecules provided the shortest sequences capable of forming agglomerates (Morasch et al., 2016).

The above results suggest that agglomeration could serve as an efficient way to assemble matching hairpins from much less structured and selected sequences in an autonomous way. After the molecules have been assembled as sedimented agglomerates, a convection flow can carry the large assemblies into regions of warmer temperatures, where the molecules would be disassembled by heat and activated for replication with a cooling step. Similar recycling behavior is seen in thermal gradient traps (Morasch et al., 2016), which were also found to enhance the molecular assembly (Mast et al., 2013) with characteristics that can match the above scenario.

Templating kinetics

Hybridization between stems of neighboring hairpins (Figure 1b, step 2) was catalyzed by the presence of already assembled complexes 0¯A0¯B0¯C0¯D, confirming its role as a template. Assembly kinetics at 45 °C were recorded in reactions containing 200 nM of each strand for a range of template concentrations. At 120 nM template concentration, 40 % yield was achieved within 10 min (Figure 4b, black line). The untemplated, spontaneous reaction proceeded significantly slower (1.4 % yield, light gray line).

Figure 4 (with 1 supplement)
Isothermal template assisted product formation.

(a) Schematic representation of the templating step at constant temperature. (b) Kinetics of tetramer formation at 45 °C with different starting concentrations of template (c¯0). Data includes concentrations of all complexes containing tetramers. (c) Templating observed over a broad temperature range. Large circles show data for reactions at c¯0=120 nM of template 0¯A0¯B0¯C0¯D, small circles show the spontaneous formation (c¯0=0). The latter increases at T > 45 °C. Above 48 °C, binding of monomers to the template gets weaker, slowing down the rate of template assisted formation. This is consistent with the melting temperatures of the information domains (see Figure 4—figure supplement 1).

Figure 4—source data 1

Source data for determination of thermal oscillation temperatures.

https://cdn.elifesciences.org/articles/63431/elife-63431-fig4-data1-v1.zip

Assembly rates showed a strong dependence on incubation temperature (Figure 4c). At 39 °C, the reaction proceeded significantly slower than at 42 °C or 45 °C. This is because the hairpins are predominantly in closed configuration and cannot bind to neighboring molecules in the assembly. Binding between complementary information domains still occurs, but the formation of bonds between neighboring strands becomes rate limiting. Above the melting temperature of the information domain (48 °C) (see Figure 4—figure supplement 1), template-directed assembly becomes slower. However, the slower kinetics of template-directed product formation are partially superposed by the spontaneous product formation lacking an initial template (Figure 4c, small circles), which becomes an additional reaction channel due to the now open hairpins.

Exponential amplification

As an intermediate step toward replication, we studied amplification reactions under thermal oscillations (Figure 5). The amplification reactions only contained strands encoding for information domain '0', that is 0A, 0¯A, 0B, 0¯B, …, 0¯D. The strands were subjected to thermal oscillations between Tbase = 45 °C and Tpeak = 67 °C. The lower temperature was held for 20 min and the upper for one second, with temperature ramps amounting to 20±1 s in each full cycle. This asymmetric shape of the temperature cycle accords with differences in kinetics of the elongation step and the melting of the information domain. It is typical for trajectories in thermal convection settings with local heating (Braun et al., 2003).
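As a rough illustration of how asymmetric this temperature cycle is, the protocol can be written as a piecewise profile. The hold times and temperatures are taken from the text; splitting the 20 s of ramping into two equal, linear ramps is an assumption made only for this sketch, and the function and constant names are hypothetical.

```python
# Sketch of one thermal cycle. Values marked "from the text" are quoted above;
# the symmetric linear ramps are an assumption, not a reported detail.
T_BASE, T_PEAK = 45.0, 67.0            # deg C, from the text
HOLD_BASE = 20 * 60.0                  # s at base temperature, from the text
HOLD_PEAK = 1.0                        # s at peak temperature, from the text
RAMP_TOTAL = 20.0                      # s of ramping per cycle, from the text
RAMP_UP = RAMP_DOWN = RAMP_TOTAL / 2   # ASSUMPTION: equal up and down ramps

CYCLE = HOLD_BASE + RAMP_UP + HOLD_PEAK + RAMP_DOWN  # full cycle length in s


def temperature(t: float) -> float:
    """Temperature in deg C at time t (seconds) for the periodic profile."""
    t = t % CYCLE
    if t < HOLD_BASE:                  # long hold at the base temperature
        return T_BASE
    t -= HOLD_BASE
    if t < RAMP_UP:                    # linear ramp up
        return T_BASE + (T_PEAK - T_BASE) * t / RAMP_UP
    t -= RAMP_UP
    if t < HOLD_PEAK:                  # short spike at the peak temperature
        return T_PEAK
    t -= HOLD_PEAK                     # linear ramp back down
    return T_PEAK - (T_PEAK - T_BASE) * t / RAMP_DOWN


if __name__ == "__main__":
    print(f"cycle length: {CYCLE:.0f} s")
    print(f"fraction of cycle at base temperature: {HOLD_BASE / CYCLE:.3f}")
```

With these numbers the system spends nearly the whole cycle at the base temperature, where templated assembly takes place, and only about a second at the peak that separates replicate from template.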

Figure 5
Exponential amplification of a restricted sequence subset with thermal oscillations.

(a) Amplification time traces for concentration c for sequence 0000 during the first four to six cycles (Tpeak = 67 °C) for template (0¯A0¯B0¯C0¯D) concentrations c¯0 from 0 to 45 nM. The data was fitted using the cross-catalytic model from Equation 1. Strands 0A, 0¯A, 0B, …, 0¯D were used at 200 nM concentration each. Data points show concentrations of complexes 4:4. (b) Initial reaction velocity as a function of initial template concentration c¯0. The data points show good agreement with the line calculated from the fits in panel a. (c) Amplification proceeded for peak temperatures below 74 °C. Above this temperature, backbone duplexes start to melt, and the complexes are no longer stable. The base temperature was 45 °C, and reactions initially contained 30 nM of complex 0¯A0¯B0¯C0¯D as template. (d) Serial transfer experiment. The reaction containing strands 0A, 0¯A, 0B, …, 0¯D (black circles) survived successive dilution by a factor of 1/2 every three cycles at almost constant concentration. In contrast, a reaction with the same amount of template 0¯A0¯B0¯C0¯D, but lacking monomers 0¯A, 0¯B, 0¯C, 0¯D, fades out (open circles). The solid line shows the model from Equation 1.

Figure 5—source data 1

Source data for exponential amplification of a restricted sequence subset with thermal oscillations.

https://cdn.elifesciences.org/articles/63431/elife-63431-fig5-data1-v1.zip

The growth of molecular assemblies with different initial concentrations of template 0¯A0¯B0¯C0¯D revealed an almost linear dependence of the reaction velocity on the initial amount of template (Figure 5a, b). This confirms the exponential nature of the replication. The cross-catalytic replication kinetics can be described by a simplistic model that only considers the concentrations c(t) of the template 0A0B0C0D and c¯(t) of its complement 0¯A0¯B0¯C0¯D:

(1) $\frac{d}{dt}c(t) = k\,\bar{c}(t) + k_0, \qquad \frac{d}{dt}\bar{c}(t) = k\,c(t) + k_0$

Here, k is the rate of cross-catalysis and k0 the spontaneous formation rate. For c(t) ≈ c¯(t), the model corresponds to simple exponential growth on a per-cycle basis. The model can be solved in closed form but does not account for saturation effects from the depletion of monomers. Therefore, it is not valid for concentrations similar to the total concentration of each strand. Fitting the model to the amplification reactions with 0–45 nM of template 0¯A0¯B0¯C0¯D revealed rate constants of k = 0.16 cycle⁻¹ and k0 = 0.4 nM cycle⁻¹ (Figure 5b). Amplification was robust with regard to the peak temperature of the oscillations. For Tpeak below 74 °C, the reaction remained almost unaffected (Figure 5c). Above this temperature, the oscillation peak is too close to the melting transitions of the hairpin-hairpin duplexes, which range from 76 to 79 °C (Figure 4—figure supplement 1).
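The closed-form solution mentioned above can be sketched as follows. Adding the two rate equations gives an exponentially growing sum c + c¯, while subtracting them gives an exponentially decaying difference c − c¯. The function names are hypothetical, the rate constants are the fitted values quoted in the text, and the Euler integrator is included only as an independent cross-check. Like the model itself, the sketch ignores monomer depletion.

```python
import math

K = 0.16    # cross-catalytic rate per cycle (fitted value quoted in the text)
K0 = 0.4    # spontaneous formation rate in nM per cycle (fitted value quoted in the text)


def closed_form(t, c0, cbar0, k=K, k0=K0):
    """Closed-form solution of dc/dt = k*cbar + k0, dcbar/dt = k*c + k0.

    s = c + cbar obeys ds/dt = k*s + 2*k0 (exponential growth);
    d = c - cbar obeys dd/dt = -k*d      (exponential decay).
    t is measured in cycles, concentrations in nM.
    """
    s = (c0 + cbar0 + 2 * k0 / k) * math.exp(k * t) - 2 * k0 / k
    d = (c0 - cbar0) * math.exp(-k * t)
    return (s + d) / 2, (s - d) / 2


def euler(t_end, c0, cbar0, k=K, k0=K0, steps=20_000):
    """Plain forward-Euler integration of the same equations (cross-check)."""
    dt = t_end / steps
    c, cbar = c0, cbar0
    for _ in range(steps):
        c, cbar = c + (k * cbar + k0) * dt, cbar + (k * c + k0) * dt
    return c, cbar


def initial_velocity(cbar0, k=K, k0=K0):
    """dc/dt at t = 0 when no product is present yet: linear in the template."""
    return k * cbar0 + k0


if __name__ == "__main__":
    for template_nM in (0, 15, 30, 45):
        c, cbar = closed_form(6, 0.0, template_nM)
        print(template_nM, round(c, 1), round(cbar, 1),
              round(initial_velocity(template_nM), 2))
```

The linear form of `initial_velocity` corresponds to the almost linear dependence of the reaction velocity on the initial template concentration, with k0 as the intercept for the untemplated reaction.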

The ability to withstand consecutive dilutions is characteristic of exponentially growing replicators and was tested in serial transfer experiments. Strands encoding for '0' (i.e. 0A, 0¯A, 0B, etc.) were thermally cycled with 30 nM of template 0¯A0¯B0¯C0¯D. After every three cycles, samples were diluted one to one with buffer containing all eight strands as monomers at 200 nM each (Figure 5d). This high frequency of dilutions prevented the reaction from transitioning into the saturating regime. The cross-catalytic model was fitted to the data with the dilution factor as the single free parameter, which was found to be 0.43. The difference from the theoretical value of 0.50 was likely due to strands sticking to the reaction vessels before dilution. As a control, a reaction with the same initial concentration of template 0¯A0¯B0¯C0¯D, but without monomers 0¯A, 0¯B, 0¯C, 0¯D, was subjected to the same protocol. As the control could not grow exponentially, it gradually died out (Figure 5d, open circles).

Sequence replication

The above-mentioned reactions did amplify, but not replicate actual sequence information, as they only contained strands with 0/0¯ information domains. To study the replication of arbitrary sequences of binary code, replication reactions with all 16 strands encoding for '0' and '1' were performed. To discriminate sequences encoded in equally sized complexes and deduce error rates, we compared these results to those from different reaction runs with defects, that is lacking one or two of the hairpin sequences required for the faithful replication of a particular template. Reference reactions contained all 16 strands (0A, 0¯A, 1A, 1¯A, 0B, …, 1¯D) at 100 nM each, and were run for each of three different template sequences (0¯A0¯B0¯C0¯D, 0¯A1¯B0¯C1¯D, and 0¯A0¯B1¯C1¯D) (Figure 6). The product yields were quantified from reaction time traces, extracted by integrating the intensities of all gel bands containing tetramers with the labeled strand 0A.

Figure 6 (with 1 supplement)
Sequence replication with thermal oscillations and fidelity check by forcing mutations from '0' to '1' at different locations.

(a) Replication of sequence 0A0B0C0D. Reactions were started with 15 nM initial template 0¯A0¯B0¯C0¯D. All strands (0A, 0¯A, 1A, …, 1¯D) were present at 100 nM each. Native PAGE results compare the reaction of all 16 strands ('++++') with the reaction lacking strand 0D ('+++−'). The defective set '+++−' mostly produced 3:4 complexes instead of 4:4 complexes (see schematics on the right). The overall yield of tetramer-containing complexes was greatly reduced. As size reference, the marker lane contained complexes 0A0B0C0D, 0A0B0C, 0A0B, and monomers 0A. The complete gel is presented in Figure 6—figure supplement 1. (b) Product concentration over time for the complete sequence network (yellow) and three defective sets with missing strands. Data was integrated by quantitative image analysis from electrophoresis gels using covalent markers on the 0A-strand, counting all product complexes containing tetramers. Mutations of information in the product from '0' to '1' were induced by defective reactions that lacked strands 0D ('+++−'), 0C and 0D ('++−−'), and 0B and 0D ('+−+−'). All reactions were initiated with 15 nM of 0¯A0¯B0¯C0¯D. The solid line shows data from reaction '++++' without template. (c) End point comparison of reactions with templates 0¯A0¯B0¯C0¯D (panels a, b), 0¯A1¯B0¯C1¯D, and 0¯A0¯B1¯C1¯D after six cycles. Horizontal lines indicate averages of the three template sequences. A single missing strand reduced product yield to about 40 %, two missing strands to 15–20 %.

Figure 6—source data 1

Source data for sequence replication with thermal oscillations and fidelity check by forcing mutations from '0' to '1' at different locations.

https://cdn.elifesciences.org/articles/63431/elife-63431-fig6-data1-v1.zip

Leaving out a single strand (reaction label “+++−”, for example omitting 0D for template 0¯A0¯B0¯C0¯D) reduced the yield of full-size product to about 40 % (Figure 6a, b). The non-zero product yield with a missing strand is most likely due to the incorporation of the corresponding strand with an information domain mismatch (here 1D). This type of mismatch allows the hairpin backbone to form regardless, and the unfaithful product can propagate since both strands needed for an amplification of '1' at position D (1D and 1¯D) are provided.

In particular during the first few cycles, mostly complex 0A0B0C:0¯A0¯B0¯C0¯D (3:4) was detected in the gel, instead of the desired tetramer product (Figure 6—figure supplement 1). This was expected given the lack of strand 0D and provides an upper limit on the error rate of the full replication. The fact that the full reaction produced almost no complexes 3:4 or 4:3 indicates that the incomplete product was indeed caused by the lack of a particular strand.

Removal of a further strand either directly next to the previous one ('++−−', missing strands 0C and 0D) or not ('+−+−', missing strands 0B and 0D) reduced the yield of product tetramers even further. Due to the periodic design, those two variants represent all defective sets with two missing strands. Replication of the other two templates 0¯A1¯B0¯C1¯D and 0¯A0¯B1¯C1¯D produced very similar results. Product concentrations after six cycles are given in Figure 6c for each of the three templates as well as an average over the template sequences (horizontal lines). A single defect reduced the yield of tetramer complexes to about 40 %, two defects to 15–20 %, which is close to 0.4 × 0.4 = 0.16 ≈ 15–20 %, that is the combined probability of two independent mismatches.

Replication fidelity

The observed rate of erroneous product formation can be attributed to the spontaneous background rate (Figure 4b,c, Figure 5a,b and Figure 6b). The reaction ‘+−+−' (dark green) amplified similarly to the untemplated reference reaction (solid line), as it did not contain any strands that could bind next to each other to the template and form a backbone duplex (Figure 6b). For the templated reactions '+++−' and '++−−', templating worked for partial sequences, producing intermediate yields.

The reduction in yield caused by a single defect (i.e. missing strand) to ~40 % (and to ~16 % for two defects) translates into a replication fidelity per information domain of ~60 %. The exact value for the replication fidelity is 62 % and can be calculated from Figure 6b by extracting the endpoint concentrations (blue vs. yellow line) and calculating $1 - \frac{14\,\mathrm{nM}}{37\,\mathrm{nM}} = 0.62$.
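The arithmetic behind these numbers can be sketched in a few lines of Python. This assumes that the two endpoint concentrations read from Figure 6b are 37 nM for the complete reaction and 14 nM for the single-defect reaction, and that the fidelity is one minus their ratio; the function name is illustrative.

```python
def fidelity_from_yields(defect_nM: float, full_nM: float) -> float:
    """Per information domain fidelity as 1 - (defect yield / full yield)."""
    return 1.0 - defect_nM / full_nM


# Assumed endpoint concentrations (nM) for '+++-' and '++++'.
single_defect_fraction = 14.0 / 37.0
fidelity = fidelity_from_yields(14.0, 37.0)

# Two independent defects multiply the single-defect yield fraction.
double_defect_fraction = 0.4 * 0.4

print(round(single_defect_fraction, 2), round(fidelity, 2),
      round(double_defect_fraction, 2))
```

The single-defect fraction of about 0.38 matches the "about 40 %" yield quoted above, and its square is consistent with the 15–20 % observed for two defects.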

However, this is a worst-case estimate, and the replication fidelity is likely higher due to binding competition. The mutations caused by a single defect ('+++-') in Figure 6b were imposed by not providing strand 0D for a template ending with 0¯D, leaving only the option to incorporate 1D instead. For the full system ('++++'), however, the matching strand is present and competes for binding at position D. Since the matching strand binds preferentially, the unfaithful incorporation of the wrong strand would be reduced. A similar effect of competition was observed in a protein-catalyzed ligation reaction (Toyabe and Braun, 2019). There, a comparable binding competition led to a sevenfold decrease of the inferior ligation reaction in the presence of competition (Figure 2a, b therein). Therefore, we expect the real fidelity to be better than the above lower-bound estimate.

It is interesting to project and compare this per information domain replication fidelity to a per nucleotide replicator (i.e. polymerization). To do so, we define a threshold in the decrease of melting temperature per information domain as the criterion for when the replication mechanism is still functional. Then, we estimate how many point mutations in the information domain can maximally be tolerated to stay within this range of decrease in melting temperature. From this, we can calculate a hypothetical, corresponding per nucleotide fidelity to the measured information domain fidelity.

We compared the properties of the duplex 0:0¯ to duplexes 0:0¯*, where 0¯* differs from 0¯ by K point mutations. We assumed that within the temperature range of this replication mechanism (Figure 7b, gray box) a reduction in information domain melting temperature Tm of the mutated duplex 0:0¯* by up to 10 °C compared to the original duplex 0:0¯ would be tolerated by the replication reaction. This was inferred from the width of the melting transition of duplex 0:0¯ (Figure 7b), where a shift of 10 °C corresponds to an increase of the unbound fraction from 0.08 at Tbase = 45 °C to 0.66 at 55 °C. In terms of free energies of the information domain duplex, this difference corresponds to ΔG(0:0¯*) ≥ −12.5 kcal/mol compared to ΔG(0:0¯) = −15.4 kcal/mol. 99 % of all duplexes 0:0¯*, with 0¯* containing three point mutations, met that criterion (Figure 7a). Therefore, up to K=3 point mutations can be allowed.

Figure 7 (with 1 supplement)
Sequence space analysis of information domain binding.

The binding energies quantify the ability of the replication mechanism to discriminate nucleotide mutations. (a) Cumulative free energy distributions of information domain duplexes 0:0¯ (red), 1:1¯ (light red), as well as all 0:0¯* and 1:1¯* with up to three point mutations in 0¯* and 1¯* (yellow, green, blue). 99 % of duplexes 0:0¯* with three point mutations have free energies ΔG ≥ -12.5 kcal/mol (dashed line), significantly weaker than that of 0:0¯ (ΔG = -15.4 kcal/mol). (b) Melting curves of information domain duplexes 0:0¯ (red), 1:1¯ (light red), and the two duplexes 0:0¯* indicated by arrows in panel a. Even the 0:0¯* duplex (i) at the low end of the ΔG distribution has a melting temperature of about 10 °C below that of 0:0¯. This difference in melting temperature destabilizes binding of the information domain and causes the replication mechanism to reject these sequences in the thermal oscillation regime between Tbase = 45 °C and Tpeak = 67 °C (gray box).

Figure 7—source data 1

Source data for information domain binding energy statistics split into information domains containing terminal mutations and those with internal mutations only.

https://cdn.elifesciences.org/articles/63431/elife-63431-fig7-data1-v1.zip

We will assume that the replication did not differentiate between information domain 0¯ and any information domain 0¯* if 0¯ and 0¯* differ by fewer than K point mutations. The fidelity per information domain $p_K(N)$ is given by a cumulative binomial distribution:

$$p_K(N) = \sum_{k=0}^{K-1} \binom{N}{k}\, p^{N-k} (1-p)^{k} \tag{2}$$

Here, N is the information domain length, and p the per nucleotide replication fidelity. The reduction in binding energy of the information domain duplex 0:0¯* and subsequent change in melting temperature was used as criterion to define the functionality of the replicator and to translate between a per information domain and a per nucleotide approach. As justified above, we calculate with K=3 mutations within the N=15 bases of the information domain, that is the replication can tolerate up to three mismatches in the information domain. From Figure 6 we extracted a per information domain fidelity of $p_3(15) = 0.62$, and deduce a per nucleotide fidelity of p=85 %. In fact, information domain duplexes 0:0¯* with mutations at two internal bases all show similar properties as information domains with a total of three mutations (Figure 7—figure supplement 1). This refinement ($p_2(13) = 0.62$) would increase the per nucleotide fidelity to p=90 %. We therefore estimate that a per nucleotide replication process would need a replication fidelity of 85–90 % to produce sequences with an error rate equivalent to the presented mechanism. Detailed calculations of the per nucleotide fidelities can be found in the supplementary information.
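The conversion from a per information domain fidelity to a per nucleotide fidelity amounts to inverting the cumulative binomial distribution numerically. The sketch below is one way to do this, using bisection rather than whatever solver the authors used; the function names are illustrative, and the sum runs over k = 0 to K−1 as in equation (2).

```python
from math import comb


def domain_fidelity(p: float, n: int, k_max: int) -> float:
    """Cumulative binomial: probability of fewer than k_max errors in n bases."""
    return sum(
        comb(n, k) * p ** (n - k) * (1 - p) ** k for k in range(k_max)
    )


def solve_per_nucleotide_fidelity(target: float, n: int, k_max: int,
                                  tol: float = 1e-9) -> float:
    """Bisection on p in [0, 1]; domain_fidelity increases monotonically in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if domain_fidelity(mid, n, k_max) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# All 15 bases with K=3, and the 13 internal bases with K=2.
p_all = solve_per_nucleotide_fidelity(0.62, n=15, k_max=3)
p_internal = solve_per_nucleotide_fidelity(0.62, n=13, k_max=2)
print(f"{p_all:.3f} {p_internal:.3f}")
```

Both roots fall close to the 85 % and 90 % quoted in the text.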

Discussion

A cross-catalytic replicator can be made from short sequences and without covalent bonds under a simple non-equilibrium setting of periodic thermal oscillations. The replication is fast and proceeds within a few thermal oscillations of 20 min each. This velocity is comparable to other replicators (Kindermann et al., 2005), cross-ligating ribozymes (Robertson and Joyce, 2014), or autocatalytic DNA networks (Yin et al., 2008). The required thermal oscillations can be obtained by laminar convection in thermal gradients (Braun et al., 2003; Salditt et al., 2020), which also accumulates oligonucleotides (Mast et al., 2013). Depending on the envisioned environment, the mechanism could also be driven by thermochemical oscillations (Ball and Brindley, 2014) or convection in pH gradients (Keil et al., 2017). It should be noted, however, that with the current state of the art in prebiotic chemistry regarding polymerization and ligation, the creation of >80 nt RNA is not yet understood.

It is likely that a slower prebiotic ligation chemistry could later fix the replication results over long timescales. Such an additional non-enzymatic ligation (Stadlbauer et al., 2015) that joins successive strands would relax the constraint that backbone duplexes must not melt during high-temperature steps. Early on, this is difficult to achieve in aqueous solution against the high concentration of water. In order to overcome this competition and to favor the reaction entropically by a leaving group, individual bases are typically activated by triphosphates (Attwater et al., 2013; Horning and Joyce, 2016) or imidazoles, which are especially interesting in this context since they can replicate RNA directly (O'Flaherty et al., 2019; Zhou et al., 2019). However, the required chemical conditions of enhanced Mg2+ concentration hinder strand separation.

The overall replication fidelity is limited by the spontaneous bond formation rate between pairs of hairpin sequences, caused by the interaction of strands in free solution. At lower concentrations, as one would imagine in a prebiotic setting, this rate would decrease at the expense of an overall slower reaction. To some degree and despite ongoing design efforts, such a background rate is inherent to hairpin-fuelled DNA or RNA reactions (Green et al., 2006; Krammer et al., 2012; Yin et al., 2008).

The replication mechanism is expected to also work with shorter strands, as long as the order of the melting temperatures of the information domain and the backbone duplexes is preserved. Smaller strands would also be easier to produce by an upstream polymerization process, simply because they contain fewer nucleotides. In addition, binding of shorter information domain duplexes could discriminate even single base mismatches, resulting in an increased selectivity. It is not straightforward to estimate a minimal sequence length for the demonstrated mechanism. However, it is worth noting that it has been suggested that tRNA arose from two proto-tRNA sequences (Hopfield, 1978).

Pre-selection of nucleic acids for the presented hairpin-driven replication mechanism can be provided by highly sequence-specific gelation of DNA. This gel formation has been shown to be most efficient with double hairpin structures very similar to the tRNA-like sequences used in this study (Morasch et al., 2016). For our replication system, we have demonstrated this in Figure 3 by showing the spontaneous formation of agglomerates and sedimentation under gravity if all molecules of the assembly are present. This self-selection shows a possible pathway how the system can emerge from random or semi-random sequences, for example in a flow or a convection system where the molecules are selected as macroscopic agglomerate (Mast et al., 2013). Another selection pressure could stem from the biased hydrolysis of double-stranded nucleotide backbones, which favors assembled complexes over the initial hairpins (Obermayer et al., 2011).

The replication mechanism could serve as a mutable assembly strategy for larger functional RNAs (Mutschler et al., 2015; Vaidya et al., 2012). As an evolutionary route toward a more mRNA-like replication product with chemically ligated information domains, the mechanism would be supplemented by self-cleavage next to the information domains that cuts out the non-coding backbone duplexes, followed by ligation of the information domains. Both operations could potentially be performed by very small ribozymatic centers (Dange et al., 1990; Szostak, 2012; Vlassov et al., 2005).

The proposed replication mechanism of assemblies from tRNA-like sequences allows speculation about a transition from an autonomous replication of successions of information domains to the translation of codon sequences encoded in modern mRNA (Figure 1a). Short peptide-RNA hybrids (Griesser et al., 2017; Jauker et al., 2015), combined with specific interactions between 3’-terminal amino acids and the anticodons, could have given rise to a primitive genetic code. The spatial arrangement of tRNA-like sequences that are replicated by the presented mechanism would translate into a spatial arrangement of the amino acid or short peptide tails that are attached to the strands in a codon-encoded manner (Schimmel and Henderson, 1994). The next stage would then be the detachment and linking of the tails to form longer peptides. Eventually, tRNA would transition to its modern role in protein translation. The mechanism thus proposes a hypothesis for the emergence of predecessors of tRNA, independent of protein translation. This is crucial for models of the evolution of translation, because it could justify the existence of tRNA before it was utilized in an early translation process. However, many questions around the evolutionary steps that created translation are still unclear.

Therefore, replication and translation could have, at an early stage, emerged along a common evolutionary trajectory. This supports the notion that predecessors of tRNA could have featured a rudimentary replication mechanism: starting with a double hairpin structure of tRNA-like sequences, the replication of a succession of informational domains would emerge. The interesting aspect is that the replication is first encoded by hybridization and can later be fixed by a much slower ligation of the hairpins. The demonstrated mechanism could therefore jumpstart a non-enzymatic replication chemistry, which was most likely restricted in fidelity due to working on a nucleotide-by-nucleotide basis (Robertson and Joyce, 2012; Szathmáry, 2006).

Materials and methods

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Sequence-based reagent | 0A | Biomers | | P - GCAGCGTTAATTCCCGCGCCTATCGGGAATGTAACGCAGTGGGTAATAATGACGATAGCCGTTCGGGAAAAGCGAACGGTATCG |
| Sequence-based reagent | 0B | Biomers | | P - GCAGCGATACCGTTCGCTTTTCCCGAACGGCTATCGCAGTGGGTAATAATGAGCGAACTGTCGGTGCTTGCGACAGTGTCGC |
| Sequence-based reagent | 0C | Biomers | | P - GCAGGCGACACTGTCGCAAGCACCGACAGTTCGCCAGTGGGTAATAATGAGCGGTTCCTTGCGGAGTAGGCAAGGAATCCGC |
| Sequence-based reagent | 0D | Biomers | | P - GCAGGCGGATTCCTTGCCTACTCCGCAAGGAATCGCCAGTGGGTAATAATGACGTTACATTCCCGATAGGCGCGGGAATTAACG |
| Sequence-based reagent | 0¯A | Biomers | | P - GCTGCGCATTAACGCGCTTGTCCCGCGTTAATTGCGCTCATTATTACCCACTCGCTCTCGGCTGTTTTGCCCAGCCGAGCAGCG |
| Sequence-based reagent | 0¯B | Biomers | | P - GCTGCGTTGCATTGGCGATCAAAGCCAATGCGAACGCTCATTATTACCCACTCGCAATTAACGCGGGACAAGCGCGTTAATGCG |
| Sequence-based reagent | 0¯C | Biomers | | P - GCTGGTTGGAGAAGGCGAACAGCACGCCTTCCCAACCTCATTATTACCCACTCGTTCGCATTGGCTTTGATCGCCAATGCAACG |
| Sequence-based reagent | 0¯D | Biomers | | P - GCTGCGCTGCTCGGCTGGGCAAAACAGCCGAGAGCGCTCATTATTACCCACTGTTGGGAAGGCGTGCTGTTCGCCTTCTCCAAC |
| Sequence-based reagent | 1A | Biomers | | P - GCAGCGTTAATTCCCGCGCCTATCGGGAATGTAACGCAAAAGAAGAGAAAGACGATAGCCGTTCGGGAAAAGCGAACGGTATCG |
| Sequence-based reagent | 1B | Biomers | | P - GCAGCGATACCGTTCGCTTTTCCCGAACGGCTATCGCAAAAGAAGAGAAAGAGCGAACTGTCGGTGCTTGCGACAGTGTCGC |
| Sequence-based reagent | 1C | Biomers | | P - GCAGGCGACACTGTCGCAAGCACCGACAGTTCGCCAAAAGAAGAGAAAGAGCGGTTCCTTGCGGAGTAGGCAAGGAATCCGC |
| Sequence-based reagent | 1D | Biomers | | P - GCAGGCGGATTCCTTGCCTACTCCGCAAGGAATCGCCAAAAGAAGAGAAAGACGTTACATTCCCGATAGGCGCGGGAATTAACG |
| Sequence-based reagent | 1¯A | Biomers | | P - GCTGCGCATTAACGCGCTTGTCCCGCGTTAATTGCGCTCTTTCTCTTCTTTTCGCTCTCGGCTGTTTTGCCCAGCCGAGCAGCG |
| Sequence-based reagent | 1¯B | Biomers | | P - GCTGCGTTGCATTGGCGATCAAAGCCAATGCGAACGCTCTTTCTCTTCTTTTCGCAATTAACGCGGGACAAGCGCGTTAATGCG |
| Sequence-based reagent | 1¯C | Biomers | | P - GCTGGTTGGAGAAGGCGAACAGCACGCCTTCCCAACCTCTTTCTCTTCTTTTCGTTCGCATTGGCTTTGATCGCCAATGCAACG |
| Sequence-based reagent | 1¯D | Biomers | | P - GCTGCGCTGCTCGGCTGGGCAAAACAGCCGAGAGCGCTCTTTCTCTTCTTTTGTTGGGAAGGCGTGCTGTTCGCCTTCTCCAAC |
| Sequence-based reagent | 0A – Cy5 | Biomers | | Cy5 - GCAGCGTTAATTCCCGCGCCTATCGGGAATGTAACGCAGTGGGTAATAATGACGATAGCCGTTCGGGAAAAGCGAACGGTATCG |
| Sequence-based reagent | 1A – Cy5 | Biomers | | Cy5 - GCAGCGTTAATTCCCGCGCCTATCGGGAATGTAACGCAAAAGAAGAGAAAGACGATAGCCGTTCGGGAAAAGCGAACGGTATCG |
| Sequence-based reagent | R (random) | Biomers | | NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN |
| Sequence-based reagent | R (random) – Cy5 | Biomers | | Cy5 - NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN |
| Software, algorithm | NUPACK | nupack.org | https://doi.org/10.1002/jcc.21596 | |
| Software, algorithm | ImageJ | ImageJ http://imagej.nih.gov/ij/ | RRID:SCR_002285 | |
| Software, algorithm | ImageJ stabilization plugin | http://www.cs.cmu.edu/~kangli/code/Image_Stabilizer.html | | |

Strand design

DNA double-hairpin sequences were designed using the NUPACK software package (Zadeh et al., 2011). In addition to the secondary structures of the double-hairpins, the design algorithm was constrained by all target dimers. Candidate sequences were selected for optimal homogeneity of binding energies and melting temperatures. Backbone domains connecting consecutive strands (e.g. 0A0B0C) had to be the most stable bonds in the system, in particular more stable than between a template and a newly formed product complex (e.g. 0B:0¯B). On the other hand, hairpin melting temperatures had to be low enough to allow for a sufficient degree of thermal fluctuations. To reconcile this with the length of the strands, mismatches were introduced in the hairpin stems. The sequences of all strands are listed in Supplementary file 1.

Thermal cycling assays

All reactions were performed in a buffer of 20 mM Tris-HCl (pH 8) and 150 mM NaCl, supplemented with 20 mM MgCl2. DNA oligonucleotides (Biomers, Germany) were used at 200 nM concentration per strand in reactions containing a fixed-sequence subset of eight strands (e.g. 0/0¯ only) and 100 nM per strand in reactions containing all 16 different strands.

Thermal cycling was done in a standard PCR cycler (Bio-Rad C1000). Reaction kinetics were obtained by running each reaction for different run times or numbers of cycles in parallel. The products were analyzed using native PAGE. The time between thermal cycling and PAGE analysis was minimized to exclude artifacts from storage on ice.

Template sequences were prepared using a two-step protocol: annealing from 95 °C to 70 °C within 1 hr, followed by incubation at 70 °C for 30 min. Afterwards, samples were cooled to 2 °C and stored on ice. When assembling complexes containing paired information domains (Figure 2), samples were slowly cooled down from 70 to 25 °C within 90 min before being transferred onto ice. DNA double hairpins were quenched into the monomolecular state by heating to 95 °C and subsequent fast transfer into ice water.

Product analysis

DNA complexes were analyzed using native polyacrylamide gel electrophoresis (PAGE) in gels at 5 % acrylamide concentration and 29:1 acrylamide / bisacrylamide ratio (Bio-Rad, Germany). Gels were run at electric fields of 14 V/cm at room temperature. Strand 0A/1A was covalently labeled with Cy5. Cy5 fluorescence intensities were later used to compute strand concentrations. As an additional color channel, strands were stained using SYBR Green I dye (New England Biolabs). Complexes were identified by comparing the products obtained from annealing different strand subsets.

To correctly identify bands in the time-resolved measurements, gels were run with a marker lane. The marker contained strands 0A (200 nM), 0B (150 nM), 0C (50 nM), and 0D (100 nM), and was prepared using the two-step annealing protocol from 95 to 70 °C. The unequal strand concentrations ensured that the sample contained a mixture of mono-, di-, tri-, and tetramers.

Electrophoresis gels were imaged in a multi-channel imager (Bio-Rad ChemiDoc MP). Image post-processing and data analysis were performed using self-developed LabVIEW software. Post-processing corrected for inhomogeneous illumination by the LEDs, image rotation, and distortions of the gel lanes, if applicable. Background fluorescence was determined from empty lanes on the gel and was generally low in the Cy5 channel.

For the determination of reaction yields, the intensities of all gel bands containing strands of the sequence length of interest were added up. For strings of four strands, these were the single tetramer as well as its complexes with di-, tri-, and tetramers. Single strands separated from their complements during electrophoresis (Figure 2 and Figure 6—figure supplement 1).

Thermal melting curves

Thermal melting curves were measured using either UV absorbance at 260 nm wavelength in a UV/Vis spectrometer (JASCO V-650, 1 cm optical path length), via quenching of the Cy5 label at the 5'-end of strand 0A (excitation: 620–650 nm, detection: 675–690 nm), or using fluorescence of the intercalating dye SYBR Green I (excitation: 450–490 nm, detection: 510–530 nm). Fluorescence measurements were performed in a PCR cycler (Bio-Rad C1000). Samples measured via fluorescence were at 200 nM of each strand, those measured via UV absorption contained 1 µM total DNA concentration to improve the signal-to-noise ratio. Before analysis of the melting curves (Mergny and Lacroix, 2003), data were corrected for baseline signals from reference samples containing buffer and intercalating dye, if applicable.

Self-assembly and sedimentation analysis

The samples were mixed in the replication buffer (150 mM NaCl, 20 mM MgCl2, 20 mM Tris-HCl pH 8) at a total oligomer concentration of 5 µM, that is varying concentration per strand depending on the number of different strands in the configuration (4, 7, or 8). The microfluidic chamber was assembled with a custom cut, 500 µm thick, Teflon foil placed between two plane sapphires (Figure 3—figure supplement 2). Three Peltier elements (QuickCool QC-31–1.4-3.7AS, purchased from Conrad Electronics, Germany) were attached to the backside of the chamber to provide full temperature control. The chamber was initially flushed with 3M Novec7500 (3M, Germany) to avoid bubble formation. The samples were pipetted into the microfluidic chamber through the 0.5 mm channels using microloader pipette tips (Eppendorf, Germany). The chamber was then sealed with Parafilm and heated to 95 °C for 10 s to fully separate the strands and cooled rapidly (within 30 s) to 25 °C. Assembly and sedimentation were monitored for 20 hr on a fluorescence microscope (Axiotech Vario, Zeiss, Germany) with two LEDs (490 nm and 625 nm, Thorlabs, Germany) using a 2.5 x objective (Fluar, Zeiss, Germany). The observed sedimentation was independent of the attached dye and its position (Figure 3—figure supplement 1c). Prior to image analysis the image stacks were stabilized using an ImageJ plugin (Li, 2008). The ratio of sedimented fluorescence relative to the first frame after heating was used to quantify sedimentation (Figure 3). The sedimentation time-traces (Figure 3b) were fitted with a Sigmoid function to determine the final concentration increase c/c0 (Figure 3c). The experiment was also performed with random 84 nt DNA strands at 5 µM total concentration to exclude unspecific agglomeration (Figure 3—figure supplement 1c).
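The per-strand concentrations implied by the fixed total oligomer concentration can be worked out as follows. This is a minimal sketch that assumes the 5 µM total is split equally among the different strands of each configuration, which the text does not state explicitly; the function name is illustrative.

```python
TOTAL_OLIGOMER_UM = 5.0  # total oligomer concentration from the protocol


def per_strand_concentration_uM(total_uM: float, n_strands: int) -> float:
    """Concentration of each strand, assuming an equal split of the total."""
    if n_strands <= 0:
        raise ValueError("n_strands must be positive")
    return total_uM / n_strands


# Configurations with 4, 7, or 8 different strands.
per_strand = {
    n: per_strand_concentration_uM(TOTAL_OLIGOMER_UM, n) for n in (4, 7, 8)
}
for n, conc in per_strand.items():
    print(f"{n} strands: {conc:.3f} uM per strand")
```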

Appendix 1

Calculation of fidelity rate

Through the experiments shown in Figure 6 we already know that the replication fidelity per information domain is 62 %. Now, we assume that the presented replication mechanism would translate into a base-by-base replication and ask (i) how tolerant the replication would be to point mutations in the information domain and (ii) given that threshold, how well a base-by-base replication would have to perform to do equally well, that is, what per nucleotide fidelity it would need to have.

Question (i) is answered in Figure 7, where we see that on the 15 nt information domain we can allow up to three base mismatches to stay within the bounds of the temperature cycling (gray box, Figure 7b). In order to calculate how the measured replication fidelity per information domain translates into a hypothetical replication fidelity per nucleotide we assume a cumulative binomial distribution:

$$p_K(N) = \sum_{k=0}^{K-1} \binom{N}{k}\, p^{N-k} (1-p)^{k}$$

We know that the overall likelihood to get a 'correctly' replicated information domain is 62 %. From Figure 7 we know that in a base-by-base replication, 'correctly' means with up to three mismatches. Therefore, we must find the number of combinatorial possibilities of spatially distributing 0, 1 or 2 mismatches on the 15 nt information domain (using N=15 nucleotides and allowing up to K=3 mismatches). Using this, we can determine the probability p for a success, that is the correct replication of a single nucleotide, to meet the $p_K(N) = 0.62$ overall likelihood.

For K=3 and N=15, we measure the replication fidelity per information domain to be $p_K(N) = 0.62$. Therefore, we calculate:

$$
\begin{aligned}
\sum_{k=0}^{K-1} \binom{N}{k} p^{N-k} (1-p)^{k} &= \sum_{k=0}^{2} \binom{15}{k} p^{15-k} (1-p)^{k} = 0.62 \\
\binom{15}{0} p^{15} (1-p)^{0} + \binom{15}{1} p^{14} (1-p)^{1} + \binom{15}{2} p^{13} (1-p)^{2} &= 0.62 \\
1 \cdot p^{15} + 15\, p^{14} (1-p)^{1} + 105\, p^{13} (1-p)^{2} &= p^{15} + 15 p^{14} - 15 p^{15} + 105 p^{13} - 210 p^{14} + 105 p^{15} \\
&= 91 p^{15} - 195 p^{14} + 105 p^{13} = 0.62 \\
\Rightarrow p &= 0.853 = 85\,\%
\end{aligned}
$$

From the information domain energy statistics shown in Figure 7—figure supplement 1, one can see that strands with two internal mutations behave nearly identical to strands with a total of three mutations (accepting internal and terminal mutations). Therefore, we simplify the calculation and only consider internal mutations.

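Accordingly, we calculate for K=2 and N = #all bases − #terminal bases = 15 − 2 = 13 and a per information domain fidelity $p_K(N) = 0.62$:

$$
\begin{aligned}
\sum_{k=0}^{K-1} \binom{N}{k} p^{N-k} (1-p)^{k} &= \sum_{k=0}^{1} \binom{13}{k} p^{13-k} (1-p)^{k} = 0.62 \\
\binom{13}{0} p^{13} (1-p)^{0} + \binom{13}{1} p^{12} (1-p)^{1} &= 0.62 \\
1 \cdot p^{13} + 13\, p^{12} (1-p)^{1} &= p^{13} + 13 p^{12} - 13 p^{13} \\
&= -12 p^{13} + 13 p^{12} = 0.62 \\
\Rightarrow p &= 0.900 = 90\,\%
\end{aligned}
$$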

Therefore, a comparable base-by-base replication would need a per nucleotide fidelity of 85–90 % to perform equally well as the presented replication mechanism.
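As a sanity check on the algebra, the expanded polynomials can be compared numerically against the binomial sums they were derived from. The sketch below is only a verification aid under the definitions given in this appendix; the function names are illustrative.

```python
from math import comb, isclose


def cumulative(p: float, n: int, k_max: int) -> float:
    """Binomial sum over k = 0 .. k_max - 1 errors in n bases."""
    return sum(
        comb(n, k) * p ** (n - k) * (1 - p) ** k for k in range(k_max)
    )


def poly_n15_k3(p: float) -> float:
    """Expanded form for N=15, K=3."""
    return 91 * p ** 15 - 195 * p ** 14 + 105 * p ** 13


def poly_n13_k2(p: float) -> float:
    """Expanded form for N=13, K=2."""
    return -12 * p ** 13 + 13 * p ** 12


# The expansions should agree with the sums for any p in [0, 1].
for p in (0.1, 0.5, 0.853, 0.9, 0.99):
    assert isclose(cumulative(p, 15, 3), poly_n15_k3(p), abs_tol=1e-9)
    assert isclose(cumulative(p, 13, 2), poly_n13_k2(p), abs_tol=1e-9)

value_15 = round(poly_n15_k3(0.853), 2)
value_13 = round(poly_n13_k2(0.900), 2)
print(value_15, value_13)
```

Both polynomials evaluate to about 0.62 at the stated roots, in agreement with the measured per information domain fidelity.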

Data availability

No data sets (e.g. sequencing data, clinical trial data etc.) were produced in this study. The source data files (Igor incl. macros) and data analysis (LabVIEW) tools used are provided as supporting files (zip).

References

Orgel LE (2004) Prebiotic chemistry and the origin of the RNA world. Critical Reviews in Biochemistry and Molecular Biology 39:99–123. https://doi.org/10.1080/10409230490460765

Tjivikua T, Ballester P, Rebek J (1990) Self-replicating system. Journal of the American Chemical Society 112:1249–1250. https://doi.org/10.1021/ja00159a057

von Kiedrowski G (1986) A self-replicating hexadeoxynucleotide. Angewandte Chemie International Edition in English 25:932–935. https://doi.org/10.1002/anie.198609322

von Neumann J (1951) The general and logical theory of automata. In: Jeffress LA, editor. Cerebral Mechanisms in Behavior. Wiley. pp. 1–41.

Decision letter

  1. Patricia J Wittkopp
    Senior Editor; University of Michigan, United States
  2. Gonen Ashkenasy
    Reviewing Editor; Ben-Gurion University, Israel

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

We have found special interest in your new system, since it can serve as a new platform for building a replicator or an amplifier, and hence may help understanding how DNA/RNA self-replicate and pass on information.

Decision letter after peer review:

Thank you for submitting your article "tRNA sequences can assemble into a replicator" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Patricia Wittkopp as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our policy on revisions we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Summary:

The authors describe a new replication system, driven by DNA molecules for which the sequence design was inspired by natural tRNAs. The new system assembles into four hairpins that can template the replication of complementary sequences. One of the main challenges in obtaining efficient (exponential) replication originates from product inhibition, namely that the template-replicate duplexes are more stable than their single-stranded forms and thus tend to have longer lifetimes and slow down the next replication cycles. In the current work, the authors developed a smart thermal strategy to drive template amplification. Their design offers a good mechanism for building a replicator or an amplifier and may serve as a simplified platform for understanding more about how DNA/RNA may self-replicate and pass on information. The experiments on templated kinetics and exponential replication seem to be conducted appropriately, and the article is well written and supported by previous literature from the field.

The paper can be of interest to the growing community exploring the chemistry of the origin of life, and particularly replicative chemistry. The paper is recommended to be published after addressing the following comments.

Essential revisions:

1) For the Sequence replication, Figure 6b, what is the concentration of y-axis that is being reported? Is it the tetramer concentration (including 0A0B0C0D and 0¯A0¯B0¯C0¯D) or does it only include 0A0B0C0D? If it includes both, it should be specified, if not, can the authors explain why it is possible to form new 0A0B0C0D tetramers when there is no 0D fuels? Also, as all 4 reactions (++++, +++-, ++--, +-+-) starts with 15 nM of 0A0B0C0D tetramers, should they not begin with at least 15 nM in the 4 cases compared to the no template case?

2) Can the authors better explain the section of the calculation of Replication fidelity? As there are no mutations happening in the hairpins, wouldn't it make more sense for calculating the fidelity rate on information-domain-basis instead of on the nucleotide-basis? Also, the equation for fidelity per information domain 𝑝𝐾(𝑁) may need more explanation and clarification, it would be helpful, thus, to add a section in the supplementary information clearing up the definition of fidelity rate (of the final tetramers, how many of them are 0A0B0C0D) and how these equations are derived. The 71 % fidelity per information domain caused by the 40 % decrease is also a bit confusing, as in this case, all these products should be errors, doesn't it translate to 60 % of leak for such cases?

3) No comments are made along the paper regarding the possibility of non-nucleic acid molecules to replicate, but many recent studies have highlighted this possibility. As a minimum, to complete the discussion it would be nice to refer the reader to review papers on this topic. In addition, since the replication process described in this paper is based on self-assembly, some references to previous works where self-assembly and replication were also intimately related are missing.

4) By reading the end of the Abstract (and from the title) one may understand that today's tRNA sequences, or very similar sequences, could confer any advantages with respect to replication and selection. As this point is not directly demonstrated in the paper, the authors could consider a more conservative discussion of this issue and clarify how far this argument holds.

5) The statement "Here, we implemented the system with DNA for practical reasons. Nevertheless, due to short heating times and moderate magnesium concentrations, we also estimate that an RNA version can survive for weeks (Li and Breaker, 1999)" may be true according to that reference, but it would have such important implications that it requires some experimental verification. Otherwise, it would be better to be more conservative about this point.

https://doi.org/10.7554/eLife.63431.sa1

Author response

Essential revisions:

1) For the Sequence replication, Figure 6b, what is the concentration of y-axis that is being reported? Is it the tetramer concentration (including 0A0B0C0D and 0¯A0¯B0¯C0¯D) or does it only include 0A0B0C0D? If it includes both, it should be specified, if not, can the authors explain why it is possible to form new 0A0B0C0D tetramers when there is no 0D fuels? Also, as all 4 reactions (++++, +++-, ++--, +-+-) starts with 15 nM of 0A0B0C0D tetramers, should they not begin with at least 15 nM in the 4 cases compared to the no template case?

As the reviewers correctly noted, Figure 6b indeed shows the product concentration. As the label sits on strand 0A, the template is unlabeled and only the formation of product (containing 0A) is recorded. This also becomes clear in Figure 6a, where the labeled strand 0A shows no product complexes at time point zero but is incorporated into the replicated product complexes over time. Therefore, the starting concentration of the product is indeed 0 nM, as shown in Figure 6b. Only the unlabeled template 0¯A0¯B0¯C0¯D is present in the first cycle, at a concentration of 15 nM.

To study the replication of information, the reference reactions contained all 16 fuel strands (i.e. 0A, 0B, 0C, 0D, 0¯A, 0¯B, 0¯C, 0¯D, 1A, 1B, 1C, 1D, 1¯A, 1¯B, 1¯C, 1¯D) and started with one of the three templates 0¯A0¯B0¯C0¯D, 0¯A0¯B1¯C1¯D, 0¯A1¯B0¯C1¯D. After six oscillations, we compared the amount of product from the complete reaction (“++++”) with the output of reactions lacking one or more of the fuel strands required to replicate the template. For example, in the case “+++-”, when providing the template 0¯A0¯B0¯C0¯D, we omitted 0D and thus forced the reaction to create a mutation from “0” to “1” at position D. For the case “++--” and the template 0¯A0¯B0¯C0¯D, both 0C and 0D were omitted, which forced two mutations from “0” to “1” at positions C and D.

One might also ask why we did not use differentially labeled strands. We deliberately decided against this, as we wanted to keep the labeled strand constant and not introduce additional labels to avoid differences in binding due to potential stacking interactions of or with the dyes.

To better show all this, we have changed the y-label of Figure 6b and adapted the figure legend.

In all cases, when a fuel strand required for the correct product formation is removed, e.g. removing 0D when providing template 0¯A0¯B0¯C0¯D, the product yield of tetramers is reduced to about 40 %. Since it is impossible to form the correct product, the detected tetramers must contain mismatches at the position where the correct strand is missing. In principle, this can be two kinds of mismatches:

i) The incorporation of the correct information domain but at the wrong position, leading to an insufficiently formed backbone. For example, 0B could be incorporated at the position of 0D. But location mismatches of this type will break up during the next temperature cycle as 0B could only bind at the information domain but is incompatible along the hairpins and would therefore not lead to the formation of a tetramer. Such a reaction would not be reflected in the final yield of tetramers.

ii) The second type of mismatch will in contrast alter the final concentration of tetramers. A correctly formed backbone is created, but an information domain mismatch occurs, e.g. 1D is incorporated at the position of 0D. This tetramer with the mutation from “0” to “1” at position D will not break up during the temperature cycles due to its correctly formed backbone. Therefore, we enter the next round of replication with an additional tetramer with a correctly formed backbone but a different sequence, e.g. here 0A0B0C1D. This will continue to replicate since for this tetramer all fuel strands are available. From now on, 0A0B0C1D can act as a template for an unfaithful replication. We argue that mismatches of type (ii) are most likely the reason for a tetramer product yield of 40 % despite one missing strand.

In Figure 6c, we also tested the above scheme for two other templates, 0¯A1¯B0¯C1¯D and 0¯A0¯B1¯C1¯D, which behaved very similarly and showed that the probability of two mutations is, to a good approximation, the squared probability of a single mutation (0.4 × 0.4 = 0.16 ≈ 15–20 %). This indicates that the processes causing the non-zero product yields are, to a first approximation, independent, which matches our explanation of mismatch type (ii). Due to the periodicity of the design, the two defective sets (“++--” and “+-+-”) cover the whole combinatorial space of two mismatches.

Clarifying those points, we now write:

“Reference reactions contained all 16 strands (0A, 0¯A, 1A, 1¯A, 0B, …, 1¯D) at 100 nM each, and were run for each of three different template sequences (0¯A0¯B0¯C0¯D, 0¯A1¯B0¯C1¯D, and 0¯A0¯B1¯C1¯D) (Figure 6). […] A single defect reduced the yield of tetramer complexes to about 40 %, two defects to 15–20 %, which is close to 0.4 × 0.4 = 0.16 ≈ 15–20 %, i.e. the combined probability of two independent mismatches.”

2) Can the authors better explain the section of the calculation of Replication fidelity? As there are no mutations happening in the hairpins, wouldn't it make more sense for calculating the fidelity rate on information-domain-basis instead of on the nucleotide-basis? Also, the equation for fidelity per information domain pK(N) may need more explanation and clarification, it would be helpful, thus, to add a section in the SI clearing up the definition of fidelity rate (of the final tetramers, how many of them are 0A0B0C0D) and how these equations are derived. The 71 % fidelity per information domain caused by the 40 % decrease is also a bit confusing, as in this case, all these products should be errors, doesn't it translate to 60 % of leak for such cases?

We indeed agree that the replication fidelity was estimated wrongly by assigning the 100 % to the sum of the concentrations in the calculation of the fraction of perfect matches. We therefore revised the numbers given regarding the fidelity of the replicator.

This reduces our estimate of the replicator’s fidelity from 71 % to 62 %. However, we want to point out that the way we test for mutations neglects competition for binding sites, which is why a replication fidelity of 62 % is a lower-bound estimate.

As correctly noted by the reviewers, the replication fidelity is defined by how much of the replicated information is replicated accurately. To stay with the example from (M1) and Figure 6b, this means how many of the product tetramers from template 0¯A0¯B0¯C0¯D and “++++” actually contain the accurate product sequence 0A0B0C0D. In the experiments shown in Figure 6b, c we determined the probability of mutations in the absence of 0D, where we obtain a ~40 % yield. The exact value for the replication fidelity of 62 % can be calculated directly from Figure 6b by extracting the endpoint concentrations (blue line vs. yellow line) and calculating 1 − 14 nM / 37 nM = 0.62. Please note that for the calculation of the replication fidelity we now use a 2-digit precision, whereas for simplicity we stick with a 1-digit precision in Figure 6.
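The arithmetic above can be reproduced in a few lines. This is only a minimal sketch: it assumes that the 14 nM endpoint belongs to the defective “+++-” reaction and the 37 nM endpoint to the complete “++++” reaction (an assignment inferred from the ~40 % yield quoted above), and the variable names are illustrative.

```python
# Endpoint concentrations quoted in the text (Figure 6b), in nM.
# Assumption: 14 nM is the defective "+++-" reaction,
# 37 nM is the complete "++++" reaction.
defect_yield_nM = 14.0
full_yield_nM = 37.0

# Fraction of product that still forms when the correct strand is missing.
relative_defect_yield = defect_yield_nM / full_yield_nM

# Fidelity per information domain, as defined in the response.
domain_fidelity = 1.0 - relative_defect_yield

print(f"relative yield with one missing strand: {relative_defect_yield:.2f}")
print(f"fidelity per information domain: {domain_fidelity:.2f}")
```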

The experiments presented in Figure 6 measured the rate of incorporation of 1D if no 0D was present and there was no competition in binding. For the case of provided template 0¯A0¯B0¯C0¯D – in Figure 6c different templates are analyzed – this means that the faithfully replicated products without competition amount to a ratio of 62 %, a little bit lower than initially stated in the manuscript where we incorrectly put the 100 % reference to the sum of both above concentrations.

We think this is a worst-case scenario. The mutations in the “+++-” case of Figure 6b were forced from a template ending with 0¯D and could only bind 1D as the optimal fuel 0D was not provided. In the full system, the presence of the matching fuel strand, which binds preferentially at position D, would have reduced the unfaithful incorporation of the wrong strand. We have seen a similar effect of competition for a protein-catalyzed ligation reaction (Toyabe and Braun, 2019). There, a comparable binding competition led to a 7-fold decrease of the inferior ligation reaction in the presence of competition (Figure 2a, b therein).

Following this argumentation, we expect that the mutations from “0” to “1” would occur much less under competition, when the fuel for “0” is provided in the mutation experiment shown in Figure 6b. How much this will actually be the case is however hard to estimate as the analysis cannot distinguish between sequences.

The calculation of the replication fidelity per nucleotide is a projection. The aim in calculating a number for the per nucleotide replication fidelity is to compare our work to other studies, which are in comparison base-by-base replicators and provide a number for the replication fidelity per nucleotide.

Through the experiments shown in Figure 6 we already know that the replication fidelity per information domain is 62 %. Now, we assume that the presented replication mechanism would translate into a base-by-base replication and look at (i) how tolerant would the replication scheme be to point mutations at the information domain and (ii) given that threshold, how well would a base-by-base replication have to perform in terms of per nucleotide fidelity.

We explain the reasoning in calculating the per nucleotide fidelity more thoroughly now in the manuscript. We also added a section, where we provide more detail on why a binomial distribution is used and how the exact numbers for the replication fidelity per nucleotide are calculated.

“Question (i) is answered in Figure 7, where we see that on the 15 nt information domain we can allow up to three base mismatches to stay within the bounds of the temperature cycling (gray box, Figure 7b). We make this clearer in the manuscript now. In order to calculate how the measured replication fidelity per information domain translates into a hypothetical replication fidelity per nucleotide we assume a cumulative binomial distribution:

$$p_K(N) = \sum_{k=0}^{K-1} \binom{N}{k}\, p^{N-k} (1-p)^{k}$$

We know that the overall likelihood to get a “correctly” replicated information domain is 62 %. From Figure 7 we know that in a base-by-base replication, “correctly” translates to three mismatches to sustain the replication. Therefore, we must find the number of combinatorial possibilities of spatially distributing 0, 1 or 2 mismatches on the 15 nt information domain (using N=15 nucleotides and allowing up to K=3 mismatches). Using this, we can determine the probability p for a success, i.e. the correct replication of a single nucleotide, to meet the $p_K(N) = 0.62$ overall likelihood.”

For N=15 and K=3, the cumulative binomial distribution $p_3(15) = 0.62$ can be solved for p (the per-nucleotide fidelity needed), which yields p=0.85. When neglecting the terminal mismatches (Figure 7—figure supplement 1), we calculate p=0.90 after solving $p_2(13) = 0.62$. Therefore, a comparable base-by-base replication would need a per nucleotide fidelity of 85–90 % to perform equally well as the presented replication mechanism.
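The two values of p can be checked numerically. The sketch below is not the code used for the manuscript: it assumes that the cumulative sum runs over k = 0 … K−1 mismatches (the form that reproduces the quoted numbers) and uses plain bisection, and the function names are hypothetical.

```python
from math import comb


def domain_fidelity(p, n, k_max):
    """Probability of fewer than k_max mismatches among n nucleotides,
    given a per-nucleotide fidelity p (cumulative binomial sum)."""
    return sum(comb(n, k) * p ** (n - k) * (1 - p) ** k for k in range(k_max))


def solve_per_nucleotide_fidelity(target, n, k_max, tol=1e-9):
    """Find p with domain_fidelity(p, n, k_max) == target by bisection.
    This works because the sum increases monotonically with p on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if domain_fidelity(mid, n, k_max) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Full 15 nt information domain, K = 3.
p_full = solve_per_nucleotide_fidelity(0.62, n=15, k_max=3)
# Terminal mismatches neglected: 13 nt, K = 2.
p_inner = solve_per_nucleotide_fidelity(0.62, n=13, k_max=2)

print(f"p for N=15, K=3: {p_full:.2f}")
print(f"p for N=13, K=2: {p_inner:.2f}")
```

Under these assumptions the solver should land close to the 0.85 and 0.90 quoted above.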

Reflecting the above discussion, we now write in the manuscript:

“The reduction in yield caused by a single defect (i.e. missing strand) to ~40 % (and to ~16 % for two defects) translates into a replication fidelity per information domain of ~60 %. […] Detailed calculations of the per nucleotide fidelities can be found in the subsection “Calculation of fidelity rate”.”

In addition, we included the following extra section:

“Calculation of fidelity rate

Through the experiments shown in Figure 6 we already know that the replication fidelity per information domain is 62 %. […] Therefore, a comparable base-by-base replication would need a per nucleotide fidelity of 85–90 % to perform equally well as the presented replication mechanism.”

We also adjusted the number given for the replication fidelity per nucleotide in the Abstract to 88 % and reformulated the sentence to minimize confusion about the per information domain and per nucleotide fidelity. We now write:

“The molecular assembly could encode and replicate binary sequence information with a replication fidelity corresponding to 85–90 % per nucleotide.”

3) No comments are made along the paper regarding the possibility of non-nucleic acid molecules to replicate, but many recent studies have highlighted this possibility. As a minimum, to complete the discussion it would be nice to refer the reader to review papers on this topic. In addition, since the replication process described in this paper is based on self-assembly, some references to previous works where self-assembly and replication were also intimately related are missing.

This is indeed a very good point – and apologies for our nucleotide-centered point of view. We now add to the manuscript:

“Apart from nucleotide-based replicators, very interesting replication systems using non-covalent interactions have been developed with non-biological compounds (Bottero et al., 2016; Sadownik and Philp, 2008; Tjivikua et al., 1990), peptide-based approaches (Altay et al., 2017; Bourbo et al., 2011; Carnall et al., 2010; Lee et al., 1996; Rubinov et al., 2012) and peptide nucleic acids (Ura et al., 2009). We also want to point to several instructive reviews about the state-of-the-art systems chemistry regarding self-replication (Adamski et al., 2020; Ashkenasy et al., 2017; Kosikova and Philp, 2017).”

4) By reading the end of the Abstract (and from the title) one may understand that today's tRNA sequences, or very similar sequences, could confer any advantages with respect to replication and selection. As this point is not directly demonstrated in the paper, the authors could consider a more conservative discussion of this issue and clarify how far this argument holds.

We understand the reviewers’ reservations about this point. We want to stress that we do not claim that today’s tRNA sequences have any advantages in today’s replication or selection. We merely argue that our experiments support a hypothesis under which tRNA, which is one of the most ancient molecules of modern biology, might have transformed its role over time. While it is responsible for the translation of proteins today, it might much earlier, perhaps in a slightly different form, have been involved in a molecular replication scheme like the one presented in this manuscript.

We have therefore reformulated the sentence in the Abstract to make it clear that our argument is about a connection in very early replication mechanisms. Of course, it is difficult to argue that modern tRNA sequences are very close to ancient ones. But we hope that our discussion of the replication mechanism on the basis of melting temperatures and the kinetics of hybridization makes it clear that we do not rely on a very specific sequence, but merely on hybridization and a conserved order of melting temperatures.

We now write in the Abstract:

“The replication by a self-assembly of tRNA-like sequences suggests that early forms of tRNA could have been involved in molecular replication. This would link the evolution of translation to a mechanism of molecular replication.”

5) The statement "Here, we implemented the system with DNA for practical reasons⁠. Nevertheless, due to short heating times and moderate magnesium concentrations, we also estimate that an RNA version can survive for weeks (Li and Breaker, 1999)" may be true according to that reference, but it would have so important implications that it requires some experimental verification. Otherwise, it would be better to be more conservative about this point.

Krammer et al. reported a much more primitive replicator with single and not double hairpins and therefore only half the sequence length (Krammer et al., 2012). It is important to note that in this 2012 study the replicator was implemented in RNA. After remodeling the replicator in DNA, we can say that for both replicators, the RNA and the DNA version, we could explain their behavior based on hybridization, and did not have to include any extra considerations for the RNA version.

We also included an overview (see Figure 1—figure supplement 1b) of the predicted secondary structures and the free energies for an RNA version of the presented replicator, obtained with NUPACK (Zadeh et al., 2011) by substituting every 'T' with a 'U'. The secondary structure is identical to the DNA version (compare with Figure 1—figure supplement 1a). Only the free energies are slightly higher (+30 %), which could be compensated by a reduction of salt concentrations for an RNA implementation.

Therefore, we argue that the replication scheme can readily be implemented in RNA. Even though the timescales on which Krammer et al. performed their experiments are much shorter, the initial heating step to 95 °C (20 mins > 80 °C) is identical and would arguably have the strongest effect on RNA stability compared to the moderate temperatures during cycling, in both Krammer et al. (10 °C (27 s) - 40 °C (3 s)) and this study (45 °C (20 mins) - 67 °C (20 s)). We also want to quote another recent study looking at the hydrolysis of RNA by Mariani et al. They determined the half-life for a 10 nt RNA at 10 mM Mg2+ at 90 °C to be seven days. For a 30 nt RNA under the same conditions they measure 16 % unspecific degradation after seven days (Mariani et al., 2018). Even though our Mg2+ concentration is 2-fold higher, we are operating at much more moderate temperatures. We also want to mention a recent study from our own lab, where replication with a 200 nt ribozyme was performed at much higher Mg2+ concentration (50 mM) including temperature spikes (Salditt et al., 2020), which were tolerated well and confirm the hydrolysis studies cited in this manuscript.

We understand that RNA stability at high magnesium concentration and high temperature is critical. However, as the time of exposure to high temperature in the presented replication scheme is limited, we would stick with our claim, albeit in a reformulated form, and we now elaborate more on the previous implementation with RNA. We now write:

"Here, we implemented the system with DNA and not RNA as done previously (Krammer et al., 2012). Both, in the design and the implementation we did not see significant differences between the two versions. Because of the simpler and more inexpensive synthesis of the 82-84 nt long sequences we now implemented the replicator in DNA. Due to short heating times and moderate magnesium concentrations, we estimate that an RNA version could survive for days if not weeks (Li & Breaker, 1999, Mariani et al. 2018). The most critical step regarding the RNA stability would be the initial temperature spike to 95 °C, which remains unchanged from our previous study (Krammer et al., 2012) and did not prove critical. In Figure S1 we also show that an RNA version behaves structurally identical to the implemented DNA version."

https://doi.org/10.7554/eLife.63431.sa2

Article and author information

Author details

  1. Alexandra Kühnlein

    Systems Biophysics, Physics Department, Center for NanoScience, Ludwig-Maximilians-Universität München, Munich, Germany
    Contribution
    Conceptualization, Data curation, Formal analysis, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Simon A Lanzmich
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9582-6304
  2. Simon A Lanzmich

    Systems Biophysics, Physics Department, Center for NanoScience, Ludwig-Maximilians-Universität München, Munich, Germany
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Visualization, Methodology, Writing - original draft
    Contributed equally with
    Alexandra Kühnlein
    Competing interests
    No competing interests declared
  3. Dieter Braun

    Systems Biophysics, Physics Department, Center for NanoScience, Ludwig-Maximilians-Universität München, Munich, Germany
    Contribution
    Conceptualization, Software, Funding acquisition, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    dieter.braun@lmu.de
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7751-1448

Funding

Deutsche Forschungsgemeinschaft (TRR 235)

  • Alexandra Kühnlein
  • Dieter Braun

Deutsche Forschungsgemeinschaft (Project-ID 364653263)

  • Alexandra Kühnlein
  • Dieter Braun

Deutsche Forschungsgemeinschaft (CRC 1032 (A04) Project-ID 201269156)

  • Simon Alexander Lanzmich
  • Dieter Braun

Deutsche Forschungsgemeinschaft (Student fellowship)

  • Alexandra Kühnlein

Deutsche Forschungsgemeinschaft (Graduate school "Quantitative Bioscience Munich")

  • Alexandra Kühnlein

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We gratefully acknowledge financial support from the Deutsche Forschungsgemeinschaft (DFG) through the TRR 235 Emergence of Life (Project-ID 364653263) and the CRC 1032 NanoAgents (Project-ID 201269156). We are grateful for funding from the Graduate School ‘Quantitative Bioscience Munich’ (QBM). We appreciate the fruitful discussions in the Simons Collaboration on the Origins of Life, thank Thomas Rind for measurements and acknowledge discussions with Tim Liedl, Christof Mast and Lorenz Keil. We thank Filiz Civril, Adriana Serrão and Thomas Matreux for comments on the manuscript.

Senior Editor

  1. Patricia J Wittkopp, University of Michigan, United States

Reviewing Editor

  1. Gonen Ashkenasy, Ben-Gurion University, Israel

Publication history

  1. Received: September 24, 2020
  2. Accepted: January 28, 2021
  3. Version of Record published: March 2, 2021 (version 1)

Copyright

© 2021, Kühnlein et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

