Abstract
The transition from prebiotic chemistry to living systems requires the emergence of a scheme for enzyme-free genetic replication. Here, we analyze a recently proposed prebiotic replication scenario, the so-called Virtual Circular Genome (VCG) [Zhou et al., RNA 27, 1-11 (2021)]: Replication takes place in a pool of oligomers, where each oligomer contains a subsequence of a circular genome, such that the oligomers encode the full genome collectively. While the sequence of the circular genome may be reconstructed based on long oligomers, monomers and short oligomers merely act as replication feedstock. We observe a competition between the predominantly error-free ligation of a feedstock molecule to a long oligomer and the predominantly erroneous ligation of two long oligomers. Increasing the length of long oligomers and reducing their concentration decreases the fraction of erroneous ligations, enabling high-fidelity replication in the VCG. Alternatively, the formation of erroneous products can be suppressed if each ligation involves at least one monomer, while ligations between two long oligomers are effectively prevented. This kinetic discrimination (favoring monomer incorporation over oligomer–oligomer ligation) may be an intrinsic property of the activation chemistry, or can be externally imposed by selectively activating only monomers in the pool. Surprisingly, under these conditions, shorter oligomers are extended by monomers more quickly than long oligomers, a phenomenon which has already been observed experimentally [Ding et al., JACS 145, 7504-7515 (2023)]. Our work provides a theoretical explanation for this behavior, and predicts its dependence on system parameters such as the concentration of long oligomers. Taken together, the VCG constitutes a promising scenario of prebiotic information replication: It could mitigate challenges of non-enzymatic copying via template-directed polymerization, such as short lengths of copied products and high error rates.
Introduction
In order to delineate possible pathways toward the emergence of life, it is necessary to understand how a chemical reaction network capable of storing and replicating genetic information might arise from prebiotic chemistry. RNA is commonly assumed to play a central role on this path, as it can store information in its sequence and catalyze its own replication (Joyce, 1989; Robertson and Joyce, 2012; Higgs and Lehman, 2015). While ribozymes capable of replicating strands of their own length have been demonstrated in the laboratory (Attwater et al., 2013), it remains elusive how enzyme-free self-replication might have worked before the emergence of such complex ribozymes.
One possible mechanism is template-directed primer extension (Kervio et al., 2016; Walton and Szostak, 2016; Ding et al., 2021; Leveau et al., 2022; Welsch et al., 2023): In this process, a primer hybridizes to a template and is extended by short oligonucleotides, thereby forming a (complementary) copy of the template strand. Considerable progress has been made in optimizing template-directed primer extension, but challenges remain: (i) The produced copies are likely to be incomplete. So far, at most 12 nt have been successfully added to an existing primer (Leveau et al., 2022). Moreover, as the pool of primer strands needs to emerge via random polymerization, the primer is likely to hybridize to the template at various positions, and not only to its 3′-end, leaving part of the 3′-end of the template uncopied (Szostak, 2011). (ii) Errors in enzyme-free copying are frequent due to the limited thermodynamic discrimination between correct Watson-Crick pairing and mismatches (Kervio et al., 2010; Leu et al., 2011, 2013). While some activation chemistries (relying on bridged dinucleotides) have been shown to exhibit improved fidelity (Duzdevich et al., 2021), the error probability still constrains the length of the genome that can be reliably replicated.
The issue of insufficient thermodynamic discrimination can, in principle, be mitigated by making use of kinetic stalling after the incorporation of a mismatch (Rajamani et al., 2010; Leu et al., 2013). By introducing a competition between the reduced polymerization rate and a characteristic timescale of the non-equilibrium environment, it is possible to filter correct sequences from incorrect ones (Göppel et al., 2021). To address the challenge of incomplete copies, Zhou et al. propose a new scenario of replication, the so-called Virtual Circular Genome (VCG) (Zhou et al., 2021). In this scenario, genetic information is stored in a pool of oligomers that are shorter than the circular genome they collectively encode: Each oligomer bears a subsequence of the circular genome, such that the collection of all oligomers encodes the full circular genome virtually. Within the pool, each oligomer can act as template or primer (Zhou et al., 2021). The oligomers hybridize to each other and form complexes that allow for templated ligation of two oligomers, or for the extension of an oligomer by a monomer. Because the sequences of the ligated strands and the template are part of the genome, most of the products should also retain the sequence of the genome. That way, long oligomers encoding the circular genome can be produced at the expense of short oligomers (Zhou et al., 2021). The long strands, in turn, can assemble into catalytically active ribozymes. With a continuous influx of short oligomers, the VCG might allow for continuous replication of the virtually encoded circular genome. Importantly, replication in the VCG is expected to avoid the issue of incomplete copies. Since the genome is circular, it does not matter which part of the genome an oligomer encodes, as long as the sequence is compatible with the genome sequence.
An additional feature of the VCG scenario is that replication should be achievable without the need to add many nucleotides to a primer: Provided the concentration of oligomers decreases exponentially with their length, the concentration of each oligomer in the pool can be doubled by extending each oligomer by only a few nucleotides (Zhou et al., 2021). The extension of an oligomer by a few nucleotides in a VCG pool has already been demonstrated experimentally (Ding et al., 2023).
A recent computational study points out that the VCG scenario is prone to loss of genetic information via “sequence scrambling” (Chamanian and Higgs, 2022): If the genome contains identical sequence motifs at multiple different loci, replication in the VCG will mix the sequences of these loci, thus destroying the initially defined genome. It is currently unclear which conditions could prevent this genome instability of VCGs, such that their genetic information is retained. Length distribution, sequence composition, oligonucleotide concentration and environmental conditions, such as temperature, all affect the stability of complexes and thus the replication dynamics of the VCG pool. Here, we characterize the replication fidelity and yield of VCG pools using a kinetic model, which explicitly incorporates association and dissociation of RNA strands as well as templated ligation. We study a broad spectrum of prebiotically plausible and experimentally accessible oligomer pools, from pools containing only monomers and long oligomers of a single length to pools including a range of long oligomers with uniform or exponential concentration profile. The length of the included oligomers as well as their concentration are free parameters of our model.
We find that, regardless of the pool composition, three competing types of template-directed ligation reactions emerge: (i) ligations between two short oligomers (or monomers), producing products too short to specify a unique genomic locus, (ii) ligations between a short and a long oligomer, typically generating longer products compatible with the genome sequence, and (iii) ligations between two long oligomers, which often yield sequences incompatible with the genome. These erroneous ligations of type (iii) are a key driver of sequence scrambling, as they covalently link oligomers originating from non-adjacent genomic loci, effectively “mixing” distant regions of the genome. Fidelity is primarily determined by the competition between the correct extension of a long oligomer and the erroneous ligation of two long oligomers. The likelihood of the latter can be reduced by decreasing the relative abundance of long oligomers, even though this increases the frequency of unproductive ligations between short oligomers. As a result, fidelity can be improved at the cost of reduced yield. The efficiency, meaning the yield attainable at a fixed high fidelity, thus depends on the length distribution of the oligomers in the pool.
Alternatively, the issue of erroneous ligations is mitigated if ligations of long oligomers are kinetically suppressed, such that each ligation incorporates only one monomer at a time, as in the experimental study by Ding et al. (Ding et al., 2023). In this case, the VCG concentration can be chosen arbitrarily large without compromising fidelity. Interestingly, our model predicts an unexpected feature: In the limit of high VCG concentrations, short oligomers are more likely to be extended than long oligomers, even though, intuitively, complexes containing longer oligomers are expected to be more stable and thus more productive. The same behavior was indeed observed experimentally (Ding et al., 2023). We provide an explanation for this feature, and discuss its dependence on system parameters such as the length and the concentration of long oligomers in the pool.
Model
In the VCG scenario, a circular genome is stored in a pool of oligomers, with each oligomer shorter than the genome it helps encode. Each oligomer bears a subsequence of the circular genome, such that, collectively, the oligomers encode the full genome (Fig. 1A). As the spontaneous emergence of such VCG pools is expected to be rare (Chamanian and Higgs, 2022), our study focuses on the conditions under which an existing VCG pool can reliably replicate. We therefore begin with a known genome and an associated VCG pool, without addressing the question of origin. To set up our model of VCG dynamics, we specify (i) the circular genome used, (ii) the procedure by which the genome is mapped to a set of oligomers, and (iii) the chemical reactions governing the system’s evolution.

Model.
(A) In the Virtual Circular Genome (VCG) scenario, a circular genome (depicted in green) as well as its sequence complement are encoded in a pool of oligomers (depicted in blue and orange). Collectively, the pool of oligomers encodes the whole sequence of the circular genome. Depending on their length, two types of oligomers can be distinguished: Long VCG oligomers specify a unique locus on the genome, while feedstock molecules (monomers or short oligomers) are too short to do so. (B) The length-distribution of oligomers included in the VCG pool is assumed to be exponential. The concentration of feedstock and VCG oligomers as well as their respective length scales of exponential decay
Circular Genomes
For a given genome length, LG, there are
Construction of VCG Pools
To specify a VCG pool that encodes a genomic sequence, one must select which subsequences are included in the pool at which concentrations. We consider unbiased pools, where the concentration of subsequences, c(L), depends only on their length, L, i.e., all subsequences of a given length are included at equal concentration. We refer to the length-dependent concentration profile as the length distribution of the pool. Depending on their length, oligomers fall into two categories (Fig. 1B): (i) short feedstock molecules (monomers and oligomers) and (ii) long VCG oligomers. Feedstock oligomers are oligomers that are shorter than the unique motif length LU. Since their sequence appears multiple times on the genome, they do not encode a specific position on the genome. Thus, they serve as feedstock for the elongation of VCG oligomers rather than as information storage. Conversely, VCG oligomers, which are at least as long as the unique motif length LU, have a unique locus on the circular genome. Collectively, the VCG oligomers enable the reconstruction of the full genome.
The full length distribution, c(L), can be decomposed into the contributions of feedstock and VCG oligomers,
We assume that both cF and cV follow an exponential length distribution. In our model, the concentration of VCG oligomers can be varied independently of the concentration of feedstock, and the length scales for the exponential decay (
For any other oligomer length, the concentrations equal zero. This parametrization includes uniform length distributions as a special case (κF = 0 and κV = 0), and also allows for concentration profiles that are peaked. Peaked length distributions can emerge from the interplay of templated ligation, (de)hybridization and outflux in open systems (Rosenberger et al., 2021). We define the total concentration of feedstock,
(De)hybridization Kinetics
Oligomers can hybridize to each other to form double-stranded complexes, or dehybridize from an existing complex. For simplicity, we do not include self-folding within a strand, which is a reasonable assumption for short oligomers. The stability of a complex is determined by its hybridization energy, with lower hybridization energy indicating greater stability. We use a simplified nearest-neighbor energy model to compute the hybridization energy (Rosenberger et al., 2021; Göppel et al., 2022; Laurent et al., 2024): The total energy equals the sum of the energy contributions of all nearest-neighbor blocks in a given complex (Fig. 1C). The energy contribution associated with a block of two Watson-Crick base pairs (matches) is denoted γ < 0, and dangling end blocks involving one Watson-Crick pair and one free base contribute γ/2. Nearest-neighbor blocks with mismatches increase the hybridization energy by γMM > 0 per block, thus reducing the stability of the complex. The rate constants of hybridization and dehybridization are related via
where c° = 1 M is the standard concentration, and ΔG is the free energy of hybridization. The association rate constant kon is proportional to the encounter rate constant, kenc = 1/(c°t0). The encounter timescale t0 serves as the elementary time unit of the kinetic model, with all reaction timescales measured relative to it.
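As a rough numerical illustration of the energy model and the rate relation described above, the following sketch sums nearest-neighbor blocks for a mismatch-free duplex and converts the resulting energy into a dissociation constant and a dehybridization rate. It assumes the standard detailed-balance form k_off = k_on c° exp(ΔG/kBT), sets k_on equal to k_enc (the text only states proportionality), and uses γ = −2.5 kBT and t0 = 1 µs as example values. The function names and the block-counting convention are hypothetical and only meant to make the bookkeeping concrete.

```python
import math

GAMMA = -2.5        # energy of one matching nearest-neighbor block, in kBT (example value)
T0_SECONDS = 1e-6   # encounter timescale t0 (example value)
C_STANDARD = 1.0    # standard concentration c°, in M


def duplex_energy(n_paired, n_dangling_ends=0, gamma=GAMMA):
    """Hybridization energy (in kBT) of a duplex without mismatches.

    Assumption: n_paired contiguous base pairs form n_paired - 1 full
    blocks, and each dangling-end block contributes gamma / 2.
    """
    if n_paired < 1:
        raise ValueError("a duplex needs at least one base pair")
    return (n_paired - 1) * gamma + n_dangling_ends * gamma / 2


def dissociation_constant(delta_g):
    """K_d = c° exp(ΔG / kBT), with ΔG given in units of kBT."""
    return C_STANDARD * math.exp(delta_g)


def dehybridization_rate(delta_g, t0=T0_SECONDS):
    """k_off in 1/s, assuming k_on = k_enc = 1 / (c° t0)."""
    k_on = 1.0 / (C_STANDARD * t0)
    return k_on * dissociation_constant(delta_g)


if __name__ == "__main__":
    for n_paired in (3, 6, 8):
        dg = duplex_energy(n_paired)
        print(n_paired, dg, dissociation_constant(dg), dehybridization_rate(dg))
```

Under these assumptions, each additional matching block lowers the dissociation constant by a factor exp(γ), which is why longer overlaps dominate complex stability.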
Templated Ligation
Two oligomers A and B that are hybridized adjacently to each other on a third oligomer C can produce a new oligomer A-B via templated ligation (Fig. 1D). Depending on the length of A and B, we distinguish three types of ligation reactions (Fig. 1E): (i) F+F ligations, in which two feedstock molecules ligate, (ii) F+V ligations, where a VCG oligomer is extended by a feedstock molecule, and (iii) V+V ligations involving two VCG oligomers. The formation of a covalent bond via templated ligation is not spontaneous, but requires the presence of an activation reaction. Usually, these reactions add a leaving group to the 5′-end of the nucleotide, which is cleaved during bond formation (Kervio et al., 2016; Walton and Szostak, 2016). We assume that the concentration of activating agent is sufficiently high for the activation to be far quicker than the formation of the covalent bond, such that activation and covalent bond formation can be treated as a single effective reaction. When not otherwise stated, we assume that all possible templated ligation reactions occur with the same rate constant klig.
Observables
Templated ligation in the pool forms longer oligomers at the expense of shorter oligomers and monomers. While the product of an F+V ligation (or V+V ligation) is always a VCG oligomer, F+F ligations can produce feedstock or VCG oligomers. In both cases, a produced VCG oligomer can be correct (compatible with the genome) or incorrect (incompatible). We quantify these processes by measuring extension fluxes in units of nucleotides ligated to an existing strand (counting the length of the shorter ligated strand as the number of incorporated bases). In particular, we define the fidelity f as the extension flux resulting in correct VCG oligomers relative to the flux resulting in any VCG oligomer,
In addition, we introduce the yield y as the proportion of total extension flux that produces VCG oligomers,
Efficient replication of the VCG requires both high fidelity and high yield. Hence, we introduce the efficiency of replication η as the product of fidelity and yield,
Moreover, we define the ligation share s of a ligation type, which allows us to discern the contributions of different types of templated ligations (F+F, F+V, V+V) to fidelity, yield, and efficiency,
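The verbal definitions of fidelity, yield, and efficiency in this section can be made concrete with a small bookkeeping sketch. It assumes a list of individual ligation events, each weighted equally, whereas the model works with fluxes. The `Ligation` record, the function names, and the toy event list are hypothetical. The extension flux of an event is taken as the length of the shorter ligated strand, and a product counts as a VCG oligomer if its length is at least LU.

```python
from dataclasses import dataclass


@dataclass
class Ligation:
    len_a: int      # length of the first ligated strand
    len_b: int      # length of the second ligated strand
    correct: bool   # True if the product is compatible with the genome


def ligation_type(event, l_unique):
    """Classify an event as F+F, F+V or V+V from the educt lengths."""
    n_vcg = sum(length >= l_unique for length in (event.len_a, event.len_b))
    return ("F+F", "F+V", "V+V")[n_vcg]


def extension_flux(event):
    """Nucleotides incorporated: the length of the shorter educt strand."""
    return min(event.len_a, event.len_b)


def observables(events, l_unique):
    """Return (fidelity, yield, efficiency) for a list of ligation events."""
    total = vcg = correct = 0
    for event in events:
        flux = extension_flux(event)
        total += flux
        if event.len_a + event.len_b >= l_unique:  # product is a VCG oligomer
            vcg += flux
            if event.correct:
                correct += flux
    fidelity = correct / vcg if vcg else float("nan")
    yield_ = vcg / total if total else float("nan")
    return fidelity, yield_, fidelity * yield_


if __name__ == "__main__":
    toy_events = [
        Ligation(1, 6, True),    # F+V: correct monomer extension
        Ligation(6, 6, False),   # V+V: erroneous ligation of two long strands
        Ligation(1, 1, True),    # F+F: dimerization, product is still feedstock
    ]
    print([ligation_type(e, l_unique=3) for e in toy_events])
    print(observables(toy_events, l_unique=3))
```

In this toy list, the single erroneous V+V event dominates the flux into VCG oligomers because it incorporates six nucleotides at once, which illustrates why such ligations weigh heavily on fidelity.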
Results
Replication efficiency reaches a maximum at intermediate concentrations of VCG oligomers
We begin our analysis of the dynamics of VCG pools with an exemplary genome of length LG = 16 nt,
This genome contains all possible monomers and dimers with equal frequency, ensuring that all motifs up to LE = 2 nt are represented. Identifying a unique address on this genome requires at least three nucleotides. Therefore, the unique motif length is LU = 3 nt, and VCG oligomers need to be at least 3 nt long. Further below, we also explore genomes of different lengths LG, as well as varying characteristic sequence length scales LE and LU (method for genome construction detailed in Supplementary Section S1).
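The two sequence length scales can be computed for any circular genome with a short script. The sketch below is an illustration under stated assumptions, not the construction procedure of Supplementary Section S1: it takes LE as the largest length at which every possible motif occurs on the genome or its reverse complement, and LU as the smallest length at which every motif on either strand occurs at exactly one locus. The three-letter genome in the usage example is a hypothetical toy sequence, not the 16 nt genome used in this work. The last function evaluates the counting bound 4^L ≥ 2 LG, which follows from a genome of length LG offering 2 LG loci across both strands.

```python
from collections import Counter

COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def circular_motifs(seq, length):
    """All windows of the given length on a circular sequence, one per start locus."""
    n = len(seq)
    extended = seq * (length // n + 2)  # long enough for wrap-around windows
    return [extended[i:i + length] for i in range(n)]


def motif_counts(genome, length):
    """Count motifs of the given length on the genome and its reverse complement."""
    counts = Counter()
    for strand in (genome, reverse_complement(genome)):
        counts.update(circular_motifs(strand, length))
    return counts


def exhaustive_coverage_length(genome):
    """Largest L for which all 4**L motifs appear (assumed definition of LE)."""
    length = 0
    while len(motif_counts(genome, length + 1)) == 4 ** (length + 1):
        length += 1
    return length


def unique_motif_length(genome):
    """Smallest L at which every motif occurs exactly once (assumed definition of LU)."""
    for length in range(1, len(genome) + 1):
        if max(motif_counts(genome, length).values()) == 1:
            return length
    return None  # repetitive genome: no length gives unique addresses


def min_unique_length_bound(genome_length):
    """Smallest L with 4**L >= 2 * genome_length (counting bound)."""
    length = 1
    while 4 ** length < 2 * genome_length:
        length += 1
    return length


if __name__ == "__main__":
    toy_genome = "AAC"  # hypothetical example sequence
    print(exhaustive_coverage_length(toy_genome), unique_motif_length(toy_genome))
    print(min_unique_length_bound(16))
```

For a genome of length 16 nt, the counting bound gives 3 nt, in line with the unique motif length quoted above.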
Based on the genome, we construct the initial oligomer pool. As a first step, we focus on a simple scenario in which the pool contains only monomers (serving as feedstock) and VCG oligomers of a single, defined length. The VCG pools are evolved in time using a stochastic simulation based on the Gillespie algorithm (Rosenberger et al., 2021; Göppel et al., 2022; Laurent et al., 2024). Since the Gillespie algorithm operates on the level of counts of molecules instead of concentrations, we must assign a volume to each system (in the range 1 µm³ to 10 000 µm³, see Supplementary Material Section S2). Besides the volume, we also need to choose the reaction rate constants appropriately: (i) The association time t0 is the fundamental time unit in our kinetic model, and all other times are expressed relative to t0. Experimentally determined association rate constants are typically around 10⁶–10⁷ M⁻¹ s⁻¹ (Wetmur and Davidson, 1968; Braunlin and Bloomfield, 1991; Ashwood et al., 2023; Todisco et al., 2024a). For the purpose of estimating absolute timescales, we assume a constant association timescale of t0 ≈ 1 µs in the following. (ii) The timescale of dehybridization is computed via Eq. (3) using the energy contribution γ = −2.5 kBT for a matching nearest-neighbor block and γMM = 25.0 kBT in case of mismatches. The high energy penalty of nearest-neighbor blocks involving mismatches, γMM, is chosen to suppress the formation of mismatches, while the value of γ roughly matches the average energy of all matching nearest-neighbor blocks given by the Turner energy model of RNA hybridization (Mathews et al., 2004). (iii) For templated ligation, we select a reaction rate constant of
Based on the ligation events observed in the simulation, we compute the observables introduced above (Model section). Due to the small ligation rate constant, it is computationally unfeasible to simulate the time evolution for more than a few ligation time units. Consequently, ligation events are scarce, which leads to high variances in the computed observables. We mitigate this issue by calculating the observables based on the concentration of complexes that are in a productive configuration, even if they do not undergo templated ligation within the time window of the simulation (Supplementary Material Section S2).
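To illustrate why a volume must be assigned before a Gillespie simulation can run, the following sketch converts a concentration into a molecule count and then applies the Gillespie direct method to a single reversible hybridization A + B ⇌ AB. It is a generic textbook-style example, not the simulation code used in this work: the function names, the two-species system, and the rate values in the usage example are hypothetical.

```python
import random

AVOGADRO = 6.02214076e23   # molecules per mole
LITERS_PER_UM3 = 1e-15     # one cubic micrometer in liters


def molecule_count(conc_molar, volume_um3):
    """Number of molecules at a given molar concentration in a given volume."""
    return round(conc_molar * AVOGADRO * volume_um3 * LITERS_PER_UM3)


def gillespie_hybridization(n_a, n_b, k_on, k_off, volume_um3, t_end, seed=0):
    """Direct-method simulation of A + B <-> AB.

    k_on is given in 1/(M s) and k_off in 1/s. Returns the final counts
    (n_a, n_b, n_ab) at time t_end.
    """
    rng = random.Random(seed)
    # Bimolecular rate constant per pair of molecules in this volume.
    c_on = k_on / (AVOGADRO * volume_um3 * LITERS_PER_UM3)
    n_ab, t = 0, 0.0
    while True:
        a_bind = c_on * n_a * n_b
        a_unbind = k_off * n_ab
        a_total = a_bind + a_unbind
        if a_total == 0.0:
            break
        t += rng.expovariate(a_total)   # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * a_total < a_bind:
            n_a, n_b, n_ab = n_a - 1, n_b - 1, n_ab + 1
        else:
            n_a, n_b, n_ab = n_a + 1, n_b + 1, n_ab - 1
    return n_a, n_b, n_ab


if __name__ == "__main__":
    # 0.1 mM in 1 µm³ corresponds to roughly 6e4 molecules.
    print(molecule_count(1e-4, 1.0))
    print(gillespie_hybridization(100, 100, k_on=1e6, k_off=10.0,
                                  volume_um3=1.0, t_end=1.0))
```

The count conversion also shows why low concentrations call for larger volumes: at nanomolar concentrations, a volume of 1 µm³ would contain less than one molecule on average.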
From the ligation events, we compute the total flux of oligomer formation. The observable y (yield) quantifies the fraction of this total flux directed to producing VCG oligomers. Fig. 2B shows how the yield depends on the concentration of VCG oligomers,

Replication performance of VCG pools containing VCG oligomers of a single length (single-length VCG pools).
(A) The pool contains a fixed concentration of monomers,
Fig. 2C also shows that the relative contribution of V+V ligations increases with increasing
Characterizing replication efficiency via full simulations is computationally expensive: Depending on parameters, obtaining a single data point in Fig. 2B–D can require hundreds of simulations, each lasting several days. To explore a broad parameter space more easily, we introduce an approximate adiabatic method that (i) assumes ligation is much slower than any hybridization or dehybridization event, and (ii) relies on a coarse-grained sequence-independent representation of oligomers. Details are provided in Supplementary Material Sections S3 and S4. In brief, because ligation is rare, we first compute the equilibrium distribution of free and bound oligomers. Oligomers of the same length share a common concentration, and complex concentrations are determined via the mass action law using length-dependent dissociation constants. Combining the mass action law with a mass conservation constraint for each oligomer length allows us to compute the equilibrium concentrations of free VCG and feedstock strands,
The results of the adiabatic approach agree well with the simulation data (Fig. 2B-D), supporting the conclusion that the replication efficiency depends non-monotonically on the concentration of VCG oligomers, with a maximum at intermediate concentration. While the available simulation data only allows for a qualitative characterization of this trend, the adiabatic approach enables a quantitative analysis. For instance, we use the adiabatic approach to determine the equilibrium concentration ratio at which replication efficiency is maximal as a function of the VCG oligomer length LV (solid line in Fig. 2F). As expected from the qualitative trend observed in the simulation, pools containing longer oligomers reach their maximum for lower concentration of VCG oligomers. The shaded area indicates the range of VCG concentrations within which the pool’s efficiency deviates by no more than one percent from its optimum. We observe that this range of close-to-optimal VCG concentrations increases with LV. Thus, pools containing longer oligomers require less fine-tuning of the VCG concentration for replication with high efficiency.
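The equilibrium step of the adiabatic approach, which combines the mass action law with a mass conservation constraint per species, can be sketched in a strongly reduced form. The example below considers only duplexes between distinct species and solves for the free concentrations by fixed-point iteration. The actual calculation in Supplementary Material Sections S3 and S4 also includes larger complexes, so the function name, the duplex-only restriction, and the numbers in the usage example are assumptions made for illustration.

```python
def free_concentrations(totals, kd, tol=1e-12, max_iter=10_000):
    """Solve mass action plus mass conservation for duplex formation.

    totals: dict mapping species name -> total concentration (M)
    kd:     dict mapping frozenset({i, j}) -> dissociation constant (M)
            of the duplex formed by the distinct species i and j
    Returns a dict of free (unbound) concentrations.
    """
    free = dict(totals)  # initial guess: nothing is bound
    for _ in range(max_iter):
        max_rel_change = 0.0
        for i, total in totals.items():
            if total == 0.0:
                continue
            # Mass conservation: total_i = free_i * (1 + sum_j free_j / Kd_ij)
            bound_factor = sum(
                free[j] / kd[frozenset((i, j))]
                for j in totals
                if j != i and frozenset((i, j)) in kd
            )
            updated = total / (1.0 + bound_factor)
            max_rel_change = max(max_rel_change, abs(updated - free[i]) / total)
            free[i] = updated
        if max_rel_change < tol:
            return free
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Hypothetical example: feedstock F and a VCG oligomer V forming a duplex.
    totals = {"F": 1e-4, "V": 1e-6}
    kd = {frozenset(("F", "V")): 1e-3}
    free = free_concentrations(totals, kd)
    duplex = free["F"] * free["V"] / kd[frozenset(("F", "V"))]
    print(free, duplex)
```

This simple iteration may converge slowly when binding is strong; a root finder applied to the conservation equations would be a more robust choice in that regime.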
In addition to the numerical results, we utilize the adiabatic approach to study the optimal equilibrium VCG concentration analytically (Supplementary Material Section S6). We find that, for any choice of LV, replication efficiency reaches its maximum when the fractions of dimerization (1+1) reactions and erroneous V+V ligations are equal (Fig. 2C for LV = 6 nt). This criterion can be used to derive a scaling law for the optimal equilibrium concentration ratio
which is shown as a dashed line in Fig. 2F (the length-scale ΛF+F is defined in Supplementary Material Section S6). The optimal equilibrium concentration ratio decreases exponentially with LV, while the hybridization energy γ/2 sets the inverse length-scale of the exponential decay. Analytical estimate and numerical solution agree well, as long as the hybridization is weak and oligomers are sufficiently short. For strong binding and long oligomers, complexes involving more than three strands play a non-negligible role, but such complexes are neglected in the analytical approximation.
Fig. 2G shows how the maximal replication efficiency depends on the VCG oligomer length. Consistent with the qualitative trend observed in Fig. 2D, longer oligomers enable higher maxima in replication efficiency. Regardless of the choice of γ, replication efficiency reaches 100% if LV is sufficiently high. Starting from Eq. (4), we find the following approximation for the maximal replication efficiency attainable at a given oligomer length (dashed lines in Fig. 2G),
where η° is a genome-dependent constant (Supplementary Material Section S7). This equation can provide guidance for the construction of VCG pools with high replication efficiency: Given a target efficiency, the necessary oligomer length and hybridization energy, i.e., temperature, can be calculated. In Fig. 2G, we determine the oligomer length necessary to achieve ηmax = 95% for varying hybridization energies γ. At higher temperature (weaker binding), VCG pools require longer oligomers to replicate with high efficiency.
Eq. (5) is not restricted to the specific example genome of length LG = 16 nt, but applies more generally to genomes of arbitrary length. Any genome of length LG can contain at most 2 LG distinct motifs. Consequently, the minimum length required to specify a unique address on the genome equals
In genomes with different choices of LE and LU (
Multi-length VCG pools are dominated by longest oligomers
In the previous section, we characterized the behavior of pools containing VCG oligomers of a single length. We observed that V+V ligations are error-prone due to insufficient overlap between the educt strands and the template, whereas F+V ligations extend the VCG oligomers with high efficiency. The F+V ligations will gradually broaden the length distribution of the VCG pool, raising the question of how this broadening affects the replication behavior. In principle, introducing multiple oligomer lengths into the VCG pool might even improve the fidelity of V+V ligations, since a long VCG oligomer could serve as a template for the correct ligation of two shorter VCG oligomers.
To analyze this question quantitatively, we first consider the simple case of a VCG pool that only contains monomers, tetramers, and octamers. The concentration of monomers is set to c(1) = 0.1 mM, while the concentrations of the VCG oligomers are varied independently (Fig. 3A). Replication efficiency reaches its maximum at c(8) ≈ 0.1 µM and very low tetramer concentration, c(4) ≈ 7.4 pM, effectively resembling a single-length VCG pool containing only octamers. As shown in Fig. 3B, the maximal efficiency is surrounded by a plateau of close-to-optimal efficiency. The octamer concentration can be varied by more than one order of magnitude without significant change in efficiency. Similarly, adding tetramers does not affect efficiency as long as the tetramer concentration does not exceed the octamer concentration. Fig. 3C illustrates that the plateau of close-to-optimal efficiency coincides with the concentration regime where the ligation of a monomer to an octamer with another octamer acting as template,

Replication performance of pools containing VCG oligomers of two different lengths.
(A) The pool contains a fixed concentration of monomers,
We observe similar behavior in VCG pools containing a range of oligomer lengths from

Replication performance of multi-length VCG pools.
(A) The pool contains a fixed concentration of monomers,
In a realistic prebiotic scenario, the concentration profile of the VCG pool would not be uniform. Depending on the mechanism producing the pool and its coupling to the non-equilibrium environment, it might have a concentration profile that decreases or increases exponentially with length. We use the parameter κV to control this exponential length dependence (Fig. 4A): For negative κV, the concentration increases as a function of length, while exponentially decaying length distributions have positive κV. We find that replication efficiency is high if the concentration of long VCG oligomers exceeds or at least matches the concentration of short VCG oligomers (κV ≤ 0 in Fig. 4C). In that case, replication efficiency is dominated by the long oligomers in the pool, since these form the most stable complexes. As the concentration of long oligomers is decreased further (κV > 0), the higher stability of complexes formed by longer oligomers is eventually insufficient to compensate for the reduced concentration of long oligomers. Replication efficiency is then governed by short VCG oligomers, which exhibit lower replication efficiency. In the limit κV → ∞, replication efficiency approaches the replication efficiency of a single-length VCG pool containing only oligomers of length
Adding dinucleotides to the feedstock decreases replication efficiency
So far, we focused on ensembles that contain solely monomers as feedstock. However, examining the influence of dimers on replication in VCG pools is of interest, since dinucleotides have proven to be interesting candidates for enzyme-free RNA copying (Walton and Szostak, 2016; Sosson et al., 2019; Leveau et al., 2022). For this reason, we study oligomer pools like those illustrated in Fig. 5A: The ensemble contains monomers, dimers and VCG oligomers of a single length, LV. As our default scenario, the dimer concentration is set to 10% of the monomer concentration, corresponding to κF = 2.3, but this ratio can be modified by changing κF.
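The correspondence between the dimer-to-monomer ratio and κF can be checked with a few lines of arithmetic. The sketch assumes a profile of the form c(L) ∝ exp(−κ L), normalized to a given total concentration, which is consistent with κ = 0 describing a uniform distribution. The function names and the normalization convention are hypothetical.

```python
import math


def decay_constant(ratio_per_nucleotide):
    """kappa such that c(L + 1) / c(L) equals the given ratio."""
    return -math.log(ratio_per_nucleotide)


def exponential_profile(lengths, kappa, total):
    """Concentrations c(L) proportional to exp(-kappa * L), summing to `total`."""
    weights = [math.exp(-kappa * length) for length in lengths]
    norm = sum(weights)
    return {length: total * w / norm for length, w in zip(lengths, weights)}


if __name__ == "__main__":
    kappa_f = decay_constant(0.1)   # dimers at 10% of the monomer concentration
    feedstock = exponential_profile([1, 2], kappa_f, total=1e-4)  # 0.1 mM in total
    print(round(kappa_f, 2), feedstock)
```

With a total feedstock concentration of 0.1 mM, this yields about 0.091 mM monomers and about 9.1 µM dimers.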

Replication performance of single-length VCG pools containing monomers and dimers as feedstock.
(A) The pool contains a fixed total concentration of feedstock,
Fig. 5B compares the replication efficiency of a pool with
Replication efficiency reaches its maximum at intermediate VCG concentrations, where replication is dominated by F+V ligations. Notably, the maximal attainable efficiency is significantly lower for pools with dimers than without, as dimers increase the number and the stability of complex configurations that can form incorrect products (Fig. 5C). Without dimers, ligation products are only incorrect if the overlap between the VCG educt oligomer and template is shorter than the unique motif length, LU. With dimers, however, dangling end dimers can cause incorrect products even in case of long overlap of educt oligomer and template (right column in Fig. 5C). The stability of the latter complexes depends on the length of the VCG oligomers, LV, whereas the stability of complexes facilitating incorrect monomer addition is independent of oligomer length (Fig. 5C).
In the presence of dimers, the length-dependent stability of complexes allowing for correct and incorrect F+V ligations causes a competition, which sets an upper bound on the efficiency of replication (Supplementary Material Section S10),
Here, we introduced effective association constants 𝒦a, which depend differently on the VCG oligomer length, LV. While the effective association constant
The effective association constants for correct 1+V and 2+V ligations also scale exponentially with the oligomer length (Supplementary Material Fig. S14).
In systems without dimers, i.e., 𝒦F → ∞, ηmax approaches 100%, which is consistent with the behavior observed in the previous sections. Conversely, in systems containing dimers, the maximal efficiency remains at a value below 100%, which depends on the concentration of dimers in the feedstock. Fig. 5C shows the dependence of maximal replication efficiency on the length of VCG oligomers in pools containing monomers, dimers and VCG oligomers. As LV increases, ηmax converges towards the upper bound defined in Eq. (6) (dashed line in Fig. 5C).
Kinetic suppression of error-prone VCG oligomer ligation
In all scenarios considered so far, the efficiency of replication is limited by a common mechanism, regardless of the specifics of the VCG pool: Ligations involving two oligomers that hybridize to the template over a region shorter than LU are prone to generate incorrect products. In previous sections, we minimized these erroneous ligations by fine-tuning the concentration and length of VCG oligomers. However, such control may become unnecessary if the typically error-prone ligation of two oligomers (V+V) is kinetically suppressed. Kinetic suppression can be an intrinsic property of the activation chemistry: Templated ligation of two oligomers can be several orders of magnitude slower than the extension of an oligomer by a single monomer (Prywes et al., 2016; Ding et al., 2021). As a result, V+V ligations are disfavored purely by their slower kinetics. In addition, it is conceivable that only monomers are chemically activated while longer oligomers remain inactive, which would further reduce the likelihood of erroneous ligations. This scenario has already been explored experimentally (Ding et al., 2023). In natural environments, it could occur, for instance, when activated monomers are produced externally and then diffuse into compartments containing the VCG but lacking internal activation pathways (Toparlak et al., 2023; Kriebisch et al., 2024).
Within our model, we capture the kinetic suppression by introducing two different ligation rate constants: klig,1 for ligations involving a monomer and klig,>1 for ligations involving no monomer. We explore the resulting replication efficiency in the limit of perfect kinetic discrimination (klig,>1/klig,1 → 0), where only monomers are reactive for ligation. We first consider a pool where the reactive monomers are mixed with VCG oligomers of a single length as well as non-reactive dimers (Fig. 6A). We vary the concentration of VCG oligomers, but keep the feedstock concentrations constant. For small VCG concentrations, we observe low efficiencies (Fig. 6B), as the ligation of two monomers, or of one monomer and one dimer, is most likely. Conversely, high

Replication performance of single-length VCG pools with kinetic suppression of ligation between oligomers.
(A) The pool contains reactive monomers alongside non-reactive dimers and VCG oligomers of a single length. The concentrations of monomers and dimers are fixed, c(1) = 0.091 mM and c(2) = 9.1 µM, adding up to a total feedstock concentration of
While replication efficiency characterizes the relative amount of nucleotides used for the correct elongation of VCG oligomers, it is also interesting to analyze which fraction of VCG oligomers is in a monomer-extension-competent state,
Here,
While the asymptotic value
This scaling implies that longer oligomers require exponentially lower VCG concentration to achieve a given ratio r1+V (Fig. 6C), as their greater length allows them to form more stable complexes. This observation implies that pools with longer oligomers will always be more productive than pools with shorter oligomers (at equal VCG concentration).
The behavior becomes more complex in pools containing VCG oligomers of multiple lengths, due to the competitive binding within such heterogeneous pools. To illustrate this, we examine an ensemble containing VCG oligomers ranging from
where

Replication performance of multi-length VCG pools with kinetic suppression of ligation between oligomers.
(A) The pool contains reactive monomers as well as non-reactive dimers and VCG oligomers. The concentrations of monomers and dimers are fixed, c(1) = 0.091 mM and c(2) = 9.1 µM, adding up to a total feedstock concentration of
To understand the mechanism underlying the inversion of productivity, we analyze how different complex types contribute to r1+V(L). We introduce
which denotes the fraction of oligomers of length LE that are in a monomer-extension-competent complex configuration that uses an oligomer of length LT as template. The term
It is noteworthy that Ding et al. have already observed the inversion of productivity experimentally (Ding et al., 2023). Their study included activated monomers, activated imidazolium-bridged dinucleotides and oligomers up to a length of 12 nt, and found that the primer extension rate for short primers is higher than that for long primers. Even though our model differs from their setup in some aspects (e.g., different circular genome, no bridged dinucleotides), evaluating our model with parameters similar to those of the experimentally studied system predicts an inversion of productivity that qualitatively agrees with the experimental findings (Supplementary Material Section S14). We therefore assume that the mechanism underlying inversion of productivity described here also applies to the experimental observations.
Discussion and Conclusion
While significant progress has been made in understanding the prebiotic formation of ribonucleotides (Powner et al., 2009; Kim et al., 2011; Benner et al., 2012; Becker et al., 2016) and characterizing ribozymes that might play a role in an RNA world (Mutschler et al., 2015; Attwater et al., 2018; Pressman et al., 2019; Tjhung et al., 2020), a convincing scenario bridging the gap between prebiotic chemistry and ribozyme-catalyzed replication is still missing. Here, we studied a scenario proposed by Zhou et al. (Zhou et al., 2021) (the ‘Virtual Circular Genome’, VCG) using theoretical and computational approaches. We analyzed the process whereby template-directed ligation replicates the genomic information that is collectively stored in the VCG oligomers. Our analysis revealed a trade-off between the fidelity and the yield of this process: At low concentration of VCG oligomers, most of the ligations produce oligomers that are too short to specify a unique locus on the genome, resulting in a low yield of replication (Fig. 2B-C). At high VCG concentration, erroneous templated ligations cause sequence scrambling and consequently limit the fidelity of replication (Fig. 2C-D). We considered two solutions to these issues: (i) a VCG pool composition that optimizes its replication behavior within the bounds of the fidelity-yield trade-off, and (ii) breaking the fidelity-yield trade-off given that error-prone ligations can be kinetically suppressed.
The first solution maximizes the yield of replication for fixed fidelity. In pools containing only monomers and VCG oligomers of a single length, replication efficiency can be maximized by increasing the length of VCG oligomers and decreasing their concentration (Fig. 2F-G). This reduces the likelihood of error-prone templated ligation of long oligomers. When the pool contains VCG oligomers of multiple lengths, replication efficiency is typically governed by the longest oligomer in the pool (Fig. 4D). Including dimers as feedstock for the replication increases the error fraction (Fig. 5B-C), as dimers that bind to a template with a dangling end are prone to form an incorrect product (Fig. 5D).
The second solution eliminates the error-prone templated ligation of two VCG oligomers by suppressing it kinetically, e.g., by ensuring that only monomers are chemically activated. This enables both fidelity and yield to remain high at high VCG concentrations (Fig. 6B), effectively breaking the fidelity-yield trade-off. Longer VCG oligomers are then more likely to be extended than shorter oligomers at equal concentration (Fig. 6C). However, this is only true for pools with VCG oligomers of a single length; once multiple VCG oligomer lengths compete with each other, shorter oligomers can be more productive than longer ones (Fig. 7B). This feature, which has also been observed experimentally (Ding et al., 2023), is caused by an asymmetry in the productive interaction between short and long oligomers (Fig. 7C): While short oligomers can sequester longer oligomers as templates for their extension by a monomer, short oligomers are unlikely to serve as templates for longer oligomers (Fig. 7D-E).
As we intended to study the pathways responsible for sequence scrambling and to explore possible mitigation strategies, we based our analysis on a coarse-grained model which neglects some experimental details. First, we assumed that a complex instantaneously dehybridizes if it contains a non-complementary base pair, whereas in reality, short duplexes can tolerate a limited number of mismatches (Todisco et al., 2024a). While such mismatches can facilitate incorrect hybridization and introduce additional replication errors, we expect this effect to be moderate: Mismatches preferentially occur near the ends of the hybridized region, where their destabilizing effect on binding is weakest (Todisco et al., 2024a). However, such terminal mismatches have also been shown to significantly reduce ligation rates (Rajamani et al., 2010; Leu et al., 2013), which in turn limits the likelihood of forming incorrect products.
Second, we simplified the hybridization dynamics by assuming that all oligomers bind to each other at equal rates, and that dehybridization rates are determined by the hybridization energy computed via a nearest-neighbor model. However, recent work has shown that hybridization to a gap flanked by two oligomers proceeds more slowly than binding to an unoccupied template. Moreover, the resulting nicked complexes (two oligomers hybridized adjacently on a template) are more stable than predicted by standard nearest-neighbor models due to enhanced stacking interactions at the nick site (Todisco et al., 2024b). While this added stability is not expected to affect overall replication efficiency of the VCG (since all productive complexes, correct or incorrect, contain a nick), it can impact the kinetics of the system. In particular, the extended lifetime of such complexes may challenge the adiabatic approximation used in much of our analysis, which assumes ligation is always slower than hybridization and dehybridization.
Third, we do not model the activation chemistry explicitly, but instead assume that all monomers (and, depending on the scenario, also oligomers) are always reactive. As a result, some activated intermediates that are known to form in experiments, such as imidazolium-bridged dinucleotides (Walton and Szostak, 2016), are not modeled. Nonetheless, we include aspects of activation chemistry in a coarse-grained manner. Specifically, to capture the experimentally observed difference in reactivity between monomer incorporation and templated ligation of oligomers under aminoimidazolium activation, we introduce two distinct ligation rate constants. With this approach, we describe the experimental setup well enough to qualitatively reproduce features observed in experiments, for example, the preferential extension of shorter oligomers by monomers in pools containing VCG oligomers of varying lengths (Ding et al., 2023).
The VCG scenario was proposed to close the gap between prebiotic chemistry and ribozyme-catalyzed replication. To this end, VCG pools need to be capable of replicating (parts of) ribozymes that play a role in the emergence of life. While there are cases of small ribozymes (Pressman et al., 2019) or ribozymes with small active sites (e.g., the Hammerhead ribozyme (Scott et al., 2013)), ribozymes obtained experimentally via in vitro evolution are often more than a hundred nucleotides long (Johnston et al., 2001; Müller and Bartel, 2008; Wochner et al., 2011; Attwater et al., 2013). Remarkably, our model suggests that the VCG scenario enables high-fidelity replication of long genomes, even in pools containing relatively short VCG oligomers. For a genome of length LG, a sequence of at least
It is noteworthy that replication in the VCG scenario imposes a selection pressure on prebiotic genomes to reduce their unique motif length, LU. A circular genome requiring many nucleotides to specify a unique locus (high LU) replicates less efficiently than one with a shorter LU, assuming all other properties of the VCG pool — particularly the oligomer length distribution — remain identical. This length distribution arises from the interplay between the chemical kinetics and molecular transport governed by the physical environment. For instance, templated ligation in an open system with continuous oligomer influx and outflux can produce a non-monotonic length distribution, with a concentration peak at a characteristic oligomer length Lc, determined by the interplay between dehybridization and outflux (Rosenberger et al., 2021). Through this emergent length scale, the environment shapes replication in the VCG scenario. If the environment facilitates long oligomers (Lc > LU), replication proceeds efficiently. Conversely, in environments with a small Lc, repeating motifs longer than Lc are selected against. In such cases, mutational errors may replace long repeated motifs with functionally equivalent sequences composed of shorter unique motifs, thereby increasing replication efficiency.
Given the broad range of prebiotically plausible non-equilibrium environments (Ianeselli et al., 2023), it is reasonable to expect that some environments provide the conditions required for efficient replication. The constraints formulated in this work can help to guide the search for self-replicating oligomer pools in the vast space of possible concentration profiles and non-equilibrium environments.
Supplementary Material
S1. Constructing circular genomes
In the Virtual Circular Genome (VCG) scenario, genomes are encoded in a pool of oligomers. The encoded genomes are assumed to be circular sequences of length LG, containing both the original sequence and its reverse complement. Each genome is characterized by two fundamental length scales that reflect different aspects of motif distribution along the sequence. The minimal unique motif length, LU, is defined as the shortest subsequence length for which all motifs of length L ≥ LU appear at most once in the genome. In contrast, the exhaustive coverage length, LE, denotes the largest motif length for which all
Similarly, for a motif to be uniquely addressable on the genome, its length must be at least
We note that
The characteristic length scales LE and LU impose constraints on how motifs are distributed. For example, when L = LU, all motifs of length L must appear at most once, while at least one motif of length L − 1 must occur more than once. To quantify motif distributions, we introduce the motif entropy,
where fi denotes the relative frequency of motif i across the genome and its reverse complement. Motif entropy ranges from zero (a homogeneous sequence with only one motif) to a maximum value that depends on the subsequence length,
For a motif length L to qualify as the unique motif length LU, its entropy must be maximal, S(L) = Smax(L), while S(L − 1) must be smaller than its respective maximum, S(L − 1) < Smax(L − 1).
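The definitions above can be illustrated with a minimal Python sketch that counts motifs on a circular genome and its reverse complement, evaluates the motif entropy, and searches for the unique motif length LU. The function names, the RNA alphabet (A, C, G, U) and the use of the natural logarithm are assumptions made for illustration; they are not taken from the original implementation.

```python
from collections import Counter
from math import log

# Assumption: RNA alphabet with Watson-Crick pairing only.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def motif_counts(genome, L):
    """Count all motifs of length L on a circular genome and its reverse complement."""
    counts = Counter()
    for strand in (genome, reverse_complement(genome)):
        n = len(strand)
        for start in range(n):
            # Indices wrap around because the genome is circular.
            motif = "".join(strand[(start + k) % n] for k in range(L))
            counts[motif] += 1
    return counts


def motif_entropy(genome, L):
    """Shannon entropy (natural log, an assumption) of the motif frequencies f_i."""
    counts = motif_counts(genome, L)
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())


def unique_motif_length(genome):
    """Smallest L for which every motif of length L occurs exactly once, or None."""
    for L in range(1, len(genome) + 1):
        if max(motif_counts(genome, L).values()) == 1:
            return L
    return None
```

For the toy genome "AACC" (reverse complement "GGUU"), single nucleotides repeat but all eight dinucleotide motifs are distinct, so this sketch returns a unique motif length of 2 and a dinucleotide entropy of ln 8.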
The correspondence between characteristic length scales and motif entropy provides a way to construct genome sequences with specified motif characteristics. By treating the entropy function as an effective “Hamiltonian” ℋ, we can generate genome sequences through Metropolis–Hastings sampling. Each update step in the Metropolis–Hastings algorithm involves either a single-nucleotide mutation or a cut-and-paste operation that relocates a segment of the genome to a new position (the cut-and-paste operation is 10 times more likely than the single-nucleotide mutation). The acceptance criterion follows the standard Metropolis rule: modifications that reduce the Hamiltonian are always accepted, while increases in energy are accepted with probability exp[−β(Enew − Eold)], where β^−1 is an effective temperature chosen to be small compared to the typical energy to ensure convergence to the minimum. Simulations are run either until a predefined entropy target is reached or until the energy converges to a plateau.
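One update step of such a sampler might look like the following sketch. It uses the conventional Metropolis acceptance probability exp[−β(Enew − Eold)] for uphill moves and picks the cut-and-paste move with probability 10/11, matching the stated 10:1 ratio. Treating the genome as a linear string during the cut-and-paste move, and all function names, are simplifying assumptions; `energy_fn` is a hypothetical placeholder for whatever entropy-based Hamiltonian is used.

```python
import math
import random


def metropolis_accept(e_old, e_new, beta, rng=random):
    """Standard Metropolis rule: always accept downhill, uphill with exp(-beta * dE)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-beta * (e_new - e_old))


def point_mutation(genome, rng):
    """Replace one randomly chosen nucleotide by a different one."""
    pos = rng.randrange(len(genome))
    new_base = rng.choice([b for b in "ACGU" if b != genome[pos]])
    return genome[:pos] + new_base + genome[pos + 1:]


def cut_and_paste(genome, rng):
    """Cut a random segment and reinsert it at a random position (linear approximation)."""
    n = len(genome)
    start = rng.randrange(n)
    length = rng.randrange(1, n - start + 1)
    segment = genome[start:start + length]
    rest = genome[:start] + genome[start + length:]
    insert_at = rng.randrange(len(rest) + 1)
    return rest[:insert_at] + segment + rest[insert_at:]


def metropolis_step(genome, energy_fn, beta, rng, p_cut=10 / 11):
    """Propose one move and return the accepted genome (old or new)."""
    move = cut_and_paste if rng.random() < p_cut else point_mutation
    candidate = move(genome, rng)
    if metropolis_accept(energy_fn(genome), energy_fn(candidate), beta, rng):
        return candidate
    return genome
```

Both moves preserve the genome length, and the cut-and-paste move additionally preserves the nucleotide composition, so only point mutations can change which nucleotides are present.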
To generate genomes with
Starting from a random sequence, we perform 10,000 Metropolis–Hastings steps at an inverse temperature β = 10^−5 to construct genome sequences of lengths LG = 16 nt (Main Text) and LG = 64 nt (Table S2).
To explore genomes, where
using two different sampling protocols. In the first, the simulation is terminated as soon as the genome reaches the desired values of LE and LU. The resulting motif distributions on the intermediate length scales (LE < L < LU) remain close to uniform, with only minor biases sufficient to enforce the length-scale constraints. In the second protocol, entropy minimization continues beyond the point at which the target values are achieved, leading to more strongly biased motif distributions on intermediate length scales. These construction strategies allow us to systematically tune genome complexity and motif structure, enabling controlled investigations of how the characteristic length scales influence replication dynamics (Section S8).
S2. Computing replication observables based on the kinetic simulation
We simulate the dynamics of VCG pools using a kinetic simulation that is based on the Gillespie algorithm. In the simulation, oligomers can hybridize to each other to form complexes, or dehybridize from an existing complex. Moreover, two oligomers can undergo templated ligation if they are hybridized adjacent to each other on a third oligomer. At each time t, the state of the system is determined by a list of all single-stranded oligomers and complexes as well as their respective copy numbers. We refer to the state of the system at time t as the ensemble of compounds ℰt. Given the copy numbers, the rates ri of all possible chemical reactions i ∈ ℐ can be computed. To evolve the system in time, we need to perform two steps: (i) We sample the waiting time until the next reaction, τ, from an exponential distribution with mean (∑i∈ℐ ri)^−1, and (ii) we select the reaction to be performed, choosing reaction i with probability ri/∑j∈ℐ rj.
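A single step of the standard (direct-method) Gillespie algorithm can be sketched as follows. This is a generic illustration under the assumption that the reaction rates are available as a plain list; it does not reproduce the bookkeeping of oligomers and complexes in the actual simulation, and the function name is hypothetical.

```python
import math
import random


def gillespie_step(rates, rng):
    """Return (waiting time, index of the chosen reaction) for one Gillespie step."""
    total = sum(rates)
    if total <= 0:
        raise ValueError("no reaction can fire: total rate is not positive")
    # Waiting time: exponentially distributed with mean 1 / total.
    tau = -math.log(1.0 - rng.random()) / total
    # Reaction choice: reaction i is selected with probability rates[i] / total.
    threshold = rng.random() * total
    cumulative = 0.0
    for index, rate in enumerate(rates):
        cumulative += rate
        if threshold < cumulative:
            return tau, index
    return tau, len(rates) - 1  # guard against floating-point round-off
```

After each step, the copy numbers are updated according to the chosen reaction and the rates are recomputed before the next step is drawn.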
Our goal is to compute observables characterizing replication in the VCG scenario based on the full kinetic simulation. For clarity, we focus on one particular observable (yield) for the derivation. The results for other observables are stated directly, as their derivations follow analogously. Recall the definition of the yield introduced in the main text,
As we are interested in the initial replication performance of the VCG, we compute the yield based on the ligation events that take place until the characteristic timescale of ligations
Instead, we compute the replication observables based on the copy number of complexes that could potentially perform a templated ligation, i.e., complexes, in which two strands are hybridized adjacent to each other, such that they could form a covalent bond. It can be shown analytically that the number of potentially productive complexes is a good approximation for the number of incorporated nucleotides. The number of incorporated nucleotides can be computed as the integral over the ligation flux, weighted by the number of nucleotides that are added in each templated ligation reaction,
Here, N(C) denotes the copy number of the complex C in the pool ℰt. Le,1 and Le,2 denote the lengths of the oligomers that undergo ligation, and 𝟙 is an indicator function which enforces that only complexes in a configuration that allows for templated ligation contribute to the reaction flux. As only few ligation events are expected to happen until τlig, it is reasonable to assume that the ensembles ℰt do not change significantly during t ∈ [0, τlig]. Therefore, the integration over time may be interpreted as a multiplication by τlig,
where ⟨… ⟩ denotes the average over realizations of the ensembles ℰt within the time interval t ∈ [τeq, τlig]. Note that, at this point, we made the additional assumption that no templated ligations are taking place between [0, τeq]. This assumption is reasonable, as (i) the equilibration process is very short compared to the characteristic timescale of ligation, and (ii) the number of complexes that might allow for templated ligation during equilibration is lower than in the equilibrium (we start the simulation with an ensemble of single-stranded oligomers), implying that the rate of templated ligation is small.
In order to compute the average over different realizations of ensembles ℰ, we need to sample a set of uncorrelated ensembles that have reached the hybridization equilibrium. For this purpose, we run a full kinetic simulation. The simulation starts with a pool containing only single-stranded oligomers, and reaches the (de)hybridization equilibrium after a time τeq. We identify this timescale of equilibration by fitting an exponential function to the total hybridization energy of all complexes in the system, ΔGtot (Fig. S1A). In the set of ensembles used to evaluate the average in Eq. (S1), we only include ensembles for time t > τeq to ensure that the ensembles have reached (de)hybridization equilibrium. To ensure that the ensembles are uncorrelated, we require that the time between two ensembles that contribute to the average is at least τcorr. The correlation time, τcorr, is determined via an exponential fit to the autocorrelation function of ΔGtot (Fig. S1B). Besides computing the expectation value (Eq. S1), we are also interested in the “uncertainty” of this expectation value, i.e., in the standard deviation of the sample mean σ⟨X⟩. (We use X as a short-hand notation for ∑C∈ℰ N(C) min(Le,1, Le,2) 𝟙(C allows for templated ligation).) The standard deviation of the sample mean, σ⟨X⟩, is related to the standard deviation of X, σX, by the number of samples, σ⟨X⟩ = Ns^(−1/2) σX. Moreover, based on the van Kampen system size expansion, we expect the standard deviation of X to be proportional to V^(−1/2). Thus, σ⟨X⟩ ∝ (Ns V)^(−1/2).
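The sampling procedure described here (discard the equilibration phase, keep snapshots at least τcorr apart, then report the mean and the standard deviation of the sample mean) can be sketched in a few lines. The function names and the representation of the trajectory as parallel lists of times and values of X are assumptions for illustration.

```python
import math


def thinned_samples(times, values, t_eq, t_corr):
    """Keep values recorded after t_eq that are spaced at least t_corr apart."""
    kept = []
    last_time = None
    for t, x in zip(times, values):
        if t <= t_eq:
            continue  # still equilibrating
        if last_time is None or t - last_time >= t_corr:
            kept.append(x)
            last_time = t
    return kept


def mean_and_sem(samples):
    """Sample mean and standard deviation of the sample mean, sigma_X / sqrt(N_s)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two uncorrelated samples")
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, math.sqrt(variance / n)
```

Because the retained samples are treated as independent, the uncertainty shrinks as Ns^(−1/2); samples spaced more closely than τcorr would make this estimate too optimistic.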
Using Eq. (S1) (as well as an analogous expression for the number of nucleotides that are incorporated in VCG oligomers), the yield can be expressed as
The additional condition 𝟙 (Le,1 + Le,2 ≥ LS) in the numerator ensures that the product oligomer is long enough to be counted as a VCG oligomer, i.e., at least LS nucleotides long. Analogously, the expression for the fidelity of replication reads
Multiplying fidelity and yield results in the efficiency of replication,
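As a rough illustration of how these three observables relate, the sketch below evaluates them for a single snapshot of complexes. Each ligation-competent complex is weighted by N(C)·min(Le,1, Le,2), as in the definition of X. The `Complex` record, its field names, and the precise form of the fidelity (correct VCG-length products divided by all VCG-length products) are assumptions; the averaging over ensembles used in the text is omitted.

```python
from dataclasses import dataclass


@dataclass
class Complex:
    copies: int       # copy number N(C)
    len_e1: int       # length of the first ligating oligomer
    len_e2: int       # length of the second ligating oligomer
    ligatable: bool   # True if two strands are hybridized adjacently on a template
    correct: bool     # True if the product is compatible with the genome


def replication_observables(complexes, l_s):
    """Return (yield, fidelity, efficiency) for one snapshot of complexes."""
    total = vcg = vcg_correct = 0
    for c in complexes:
        if not c.ligatable:
            continue
        weight = c.copies * min(c.len_e1, c.len_e2)
        total += weight
        if c.len_e1 + c.len_e2 >= l_s:  # product counts as a VCG oligomer
            vcg += weight
            if c.correct:
                vcg_correct += weight
    if total == 0 or vcg == 0:
        raise ValueError("no ligation-competent complexes of the required type")
    yield_ = vcg / total
    fidelity = vcg_correct / vcg
    return yield_, fidelity, yield_ * fidelity
```

With these definitions, the efficiency reduces to the weight of correct VCG-length products divided by the weight of all ligation-competent complexes, since the VCG-length weight cancels in the product.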

Characteristic timescales in the kinetic simulation.
A The equilibration timescale is determined based on the total hybridization energy of all strands in the pool, ΔGtot. By fitting an exponential function to ΔGtot, we obtain a characteristic timescale, τ * (vertical dotted line), which is then used to calculate the equilibration time as τeq = 5τ * (vertical dashed line). The horizontal dashed line shows the total hybridization energy expected in (de)hybridization equilibrium according to the coarse-grained adiabatic approach (Sections S3 and S4). B The correlation timescale is determined based on the autocorrelation of ΔGtot. We obtain τcorr (vertical dashed line) by fitting an exponential function to the autocorrelation. In both panels, we show simulation data obtained for a VCG pool containing monomers and VCG oligomers with a concentration of
The ligation share of a particular type of templated ligation s(type), that is, the relative contribution of this templated-ligation type to the nucleotide extension flux, can be represented in a similar form as the other observables,
As all observables are expressed as the ratio of two expectation values,
Since the variances,

Simulation runtime of the full kinetic simulation for a VCG pool that includes monomers and VCG oligomers of length L = 8.
The total concentration of feedstock monomers equals

System parameters used to compute the replication observables yield, y, and replication efficiency, η, based on the kinetic simulation.
The computed observables are shown in Fig. 2.
S3. Coarse-grained representation of complexes in the adiabatic approach
It is computationally expensive to evaluate the replication observables via the full kinetic simulation. For this reason, we develop an adiabatic approach, which allows us to compute the replication observables, provided that templated ligation is far slower than (de)hybridization. The adiabatic approach relies on a coarse-grained representation of the oligomers in the pool, which is introduced in this section.
Single Strands
In the coarse-grained description, oligomers of identical length are assumed to have equal concentration, irrespective of their sequence. This assumption is justified for two reasons: (i) We initialize the VCG pool without sequence bias, i.e., all oligomers compatible with the genome sequence are included at equal concentration. (ii) Hybridization energy in our simplified energy model (and therefore also the stability of complexes) only depends on the length of the hybridization site, not on its sequence, provided there is no mismatch. Each coarse-grained oligomer is uniquely identified by its length L, and it represents a group of oligomers with ℭ(L) distinct sequences. We refer to the number ℭ(L) as the combinatorial multiplicity of the coarse-grained oligomer. The value of ℭ(L) depends on the choice of the encoded genome. By construction (see main text), we assume that all possible oligomer sequences of length L < LS are included in the genome. For L ≥ LS, only a subset of all possible 4^L sequences is included, but no sequence is repeated multiple times across the genome. Therefore, the combinatorial multiplicity equals ℭ(L) = 4^L for L < LS, and ℭ(L) = 2LG for L ≥ LS (one sequence per starting position on the genome and on its reverse complement).
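The two regimes described above translate into a short helper. The sketch assumes that for L ≥ LS every starting position on the circular genome and on its reverse complement yields a distinct sequence, giving 2LG sequences; the function name and argument names are hypothetical.

```python
def combinatorial_multiplicity(length, l_s, l_g):
    """Number of distinct oligomer sequences of a given length in the pool.

    Assumes all 4**length sequences exist below l_s, and one sequence per
    starting position on both strands of the circular genome otherwise.
    """
    if length < 1:
        raise ValueError("oligomer length must be positive")
    if length < l_s:
        return 4 ** length
    return 2 * l_g
```

For a genome of length LG = 16 nt with LS = 3, dimers then have multiplicity 16, whereas any oligomer of length 3 or more has multiplicity 32.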
Duplexes

Schematic representation of complexes considered in the adiabatic approach.
A A duplex is comprised of two strands, which we refer to as W (Watson) and C (Crick). The relative position of the strands is characterized by the alignment index i; for the depicted duplex, i = −2. The length of the hybridization region is called Lo. B A triplex contains three strands. By convention, we denote the two strands that are on the same “side” of the complex as W1 and W2, and the complementary strand as C. The alignment indices i and j denote the positions of W1 and W2 relative to C. For the depicted triplex, i = −2 and j = 3. The lengths of the hybridization regions are called Lo,1 and Lo,2.
Two strands can form a duplex by hybridizing to each other. We refer to the bottom oligomer as “Crick” strand C and to the top oligomer as “Watson” strand W. A duplex is uniquely characterized by the lengths of the oligomers, LC and LW, as well as their relative alignment (Fig. S3A). The alignment index i denotes the position of the Watson strand with respect to the Crick strand. As there needs to be at least one nucleotide of overlap between the strands for a duplex to exist, the alignment index needs to lie in the interval i ∈ [−(LW − 1), LC − 1]. Using the alignment index, we can also determine if the duplex has a left (or right) dangling end. The corresponding indicator variables are called dl (or dr),
Moreover, the length of the hybridization region Lo can be computed via
The hybridization energy of the duplex depends on the length of the hybridization region as well as on the existence/absence of dangling ends. For a hybridization site of length Lo, there are Lo − 1 nearest-neighbor energy blocks, each of which contributes γ to the energy. Moreover, each dangling end contributes γ/2, so that the total energy is γ(Lo − 1) + (γ/2)(dl + dr).
To compute the combinatorial multiplicity for a duplex with fixed LC, LW and alignment index i, we need to multiply the combinatorial multiplicity of the Crick strand by the number of possible hybridization partners. We assume that a hybridization partner is possible if its sequence is perfectly complementary to the lower strand within the hybridization region, whereas hybridization partners with mismatches are not accounted for. This is sensible as long as the energetic penalty for mismatches in the full kinetic simulation is sufficiently large to suppress mismatches. The number of possible hybridization partners is determined by the length of the overlap region Lo: If Lo ≥ LS, the pool contains only one oligomer sequence that can act as hybridization partner by construction of the genome. For shorter hybridization regions, multiple hybridization partners might be possible. Their number is set by the combinatorial multiplicity of the Watson oligomer divided by the combinatorial multiplicity of the hybridization region,
To avoid double-counting, we only account for complexes in which the Crick strand is at least as long as the Watson strand, LC ≥ LW, and multiply ℭ (LW, LC, i) by 1/2 if LW = LC.
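The duplex bookkeeping of this subsection can be collected in one function. The sketch below is a reading of the verbal description, not the original code: it assumes that a dangling end is present whenever the strand ends on that side are not flush, that each dangling end contributes γ/2, and that ℭ(L) equals 4^L below LS and 2LG otherwise. The value of γ passed in is a placeholder.

```python
def multiplicity(length, l_s, l_g):
    """Assumed combinatorial multiplicity of a coarse-grained oligomer."""
    return 4 ** length if length < l_s else 2 * l_g


def duplex_properties(l_c, l_w, i, gamma, l_s, l_g):
    """Overlap length, dangling ends, energy and multiplicity of a duplex."""
    if l_c < l_w:
        raise ValueError("by convention the Crick strand is at least as long")
    if not -(l_w - 1) <= i <= l_c - 1:
        raise ValueError("alignment index leaves no overlap")
    overlap = min(l_c, i + l_w) - max(0, i)
    d_left = int(i != 0)            # left ends are not flush
    d_right = int(i + l_w != l_c)   # right ends are not flush
    energy = gamma * (overlap - 1) + 0.5 * gamma * (d_left + d_right)
    # Crick multiplicity times the number of compatible Watson partners.
    mult = multiplicity(l_c, l_s, l_g) * multiplicity(l_w, l_s, l_g) \
        / multiplicity(overlap, l_s, l_g)
    if l_w == l_c:
        mult *= 0.5  # symmetry correction to avoid double-counting
    return {"overlap": overlap, "d_left": d_left, "d_right": d_right,
            "energy": energy, "multiplicity": mult}
```

For example, with LC = 8, LW = 4 and i = −2, the overlap is 2 nt and both ends dangle, so the energy is γ + γ = 2γ. Note that the partner count ℭ(LW)/ℭ(Lo) reduces to 1 automatically once Lo ≥ LS under the assumed form of ℭ.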
Ternary Complexes
Ternary complexes, i.e., complexes comprised of three strands, are uniquely characterized by the lengths of the three oligomers, LC, LW,1, LW,2, as well as their respective alignment (Fig. S3B). The alignment index i denotes the position of strand W1 relative to strand C. Analogously, j denotes the position of strand W2 relative to strand C. Two strands that are hybridized to each other need to have a hybridization region of at least one nucleotide. Moreover, the strands W1 and W2 must not occupy the same position on the template strand C. Taking both requirements together, the alignment indices fall within the intervals,
A triplex may have a dangling end not only on its left or right end, but also in the gap between strands W1 and W2. Three boolean variables are necessary to denote the presence/absence of the respective dangling ends,
The lengths of the two hybridization regions are given by
The hybridization energy depends on the lengths of the overlap regions as well as on the existence of dangling ends: As in the duplex, each overlap region of length Lo,1 (or Lo,2) comprises Lo,1 − 1 (or Lo,2 − 1) nearest-neighbor blocks, each of which contributes γ to the total energy. Moreover, every dangling end contributes γ/2. Note that the presence of a gap between strands W1 and W2, i.e., dm = 1, implies that there are two dangling ends, one for W1 and another for W2. A gap between the two strands therefore contributes γ/2 per dangling end, adding up to γ. If there is no gap between the strands, i.e., dm = 0, there are no dangling-end contributions, but a new full nearest-neighbor block emerges, which contributes γ to the energy. Therefore, the total energy reads,
The combinatorial multiplicity of a triplex is computed in the same way as for the duplex: The combinatorial multiplicity of the strand C is multiplied by the number of possible hybridization partners W1 and W2. Again, the number of possible partners is set by the length of the hybridization regions,
We use 𝟙 to denote the indicator function which returns 1 in case the condition in the bracket is fulfilled and zero otherwise. As all ternary complexes are asymmetric, there is no need to introduce a symmetry correction factor.
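The geometric rules for ternary complexes can be sketched in the same spirit. The index ranges below are derived from the two stated requirements (each W strand overlaps C by at least one nucleotide, and W1 and W2 do not occupy the same template position, with W1 to the left of W2); the function names and the placeholder value of γ are assumptions, and the combinatorial multiplicity is left out.

```python
def triplex_alignments(l_c, l_w1, l_w2):
    """Enumerate alignment index pairs (i, j) allowed for a ternary complex."""
    pairs = []
    for i in range(-(l_w1 - 1), l_c - l_w1):   # W1 overlaps C and leaves room for W2
        for j in range(i + l_w1, l_c):         # W2 starts right of W1 and overlaps C
            pairs.append((i, j))
    return pairs


def triplex_energy(l_c, l_w1, l_w2, i, j, gamma):
    """Hybridization energy and nick indicator of a ternary complex."""
    if (i, j) not in triplex_alignments(l_c, l_w1, l_w2):
        raise ValueError("alignment indices are not allowed")
    overlap1 = (i + l_w1) - max(0, i)
    overlap2 = min(l_c, j + l_w2) - j
    d_left = int(i != 0)
    d_mid = int(j != i + l_w1)          # gap between W1 and W2
    d_right = int(j + l_w2 != l_c)
    energy = gamma * (overlap1 - 1) + gamma * (overlap2 - 1)
    energy += 0.5 * gamma * (d_left + d_right)
    energy += gamma * d_mid             # two inner dangling ends, gamma/2 each
    energy += gamma * (1 - d_mid)       # stacking block across the nick
    return energy, d_mid == 0           # second entry: ligation-competent?
```

Under these rules the middle region contributes γ whether or not a gap is present, so a gapped and a nicked complex with the same overlap lengths and outer dangling ends have the same energy; only the nicked one (dm = 0) is ligation-competent.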
Quaternary complex
The largest complexes to be accounted for in our coarse-grained adiabatic approach are quaternary complexes, i.e., complexes comprised of four strands. We need to distinguish three types of such complexes: (i) 3-1 quaternary complexes, (ii) left-tilted 2-2 quaternary complexes and (iii) right-tilted 2-2 quaternary complexes. In 3-1 quaternary complexes, three Watson strands are hybridized to one Crick strand (Fig. S4), whereas in 2-2 quaternary complexes, two Watson strands are hybridized to two Crick strands (Fig. S5 and Fig. S6).
3-1 quaternary complexes

Schematic representation of a 3-1 tetraplex.
Three strands (in the following referred to as Watson strands W1, W2, and W3) hybridize to a single template strand (Crick strand C). The positions relative to the left end of the C strand are given by the alignment indices i, j, and k; here, i = −2, j = 2, k = 5. The lengths of the overlap regions are denoted Lo,1, Lo,2 and Lo,3.
Fig. S4 depicts a typical 3-1 tetraplex. Such a tetraplex is uniquely characterized by the lengths of its oligomers, LC, LW,1, LW,2, LW,3, as well as their positions relative to each other, denoted by the alignment indices i, j, and k. All positions within the tetraplex are measured relative to the left end of the C strand. Any W strand needs to have at least one nucleotide of overlap with the C strand, but two W strands must never occupy the same position on the C strand. Consequently, the alignment indices fall within the intervals,
There are two dangling ends (left and right) and potentially two gaps between the W strands: one gap between W1 and W2 and another one between W2 and W3. The following boolean variables indicate the presence/absence of the respective dangling ends,
The length of the hybridization regions is given by
Following the same reasoning as in the case of ternary complexes, the energy equals
Similarly, the combinatorial multiplicity of 3-1 quaternary complexes is constructed using the same reasoning as in the case of ternary complexes,
As 3-1 quaternary complexes are not symmetric under rotation, no symmetry correction of the combinatorial multiplicity is necessary.

Schematic representation of a left-tilted 2-2 tetraplex.
A Two Watson strands (W1 and W2) are hybridized to two Crick strands (C1 and C2). Both Watson strands are hybridized to the left Crick strand C1, whereas only W2 is hybridized to the right Crick strand C2. The alignment indices i, j and k denote the positions of the strands relative to the left end of C1; here, i = −2, j = 3 and k = 6. The lengths of the hybridization regions are called Lo,1, Lo,2 and Lo,3. B Rotating the schematic representation of a left-tilted 2-2 tetraplex by 180° produces an alternative representation of the same complex, which is again a left-tilted 2-2 tetraplex. The panel depicts the rotated tetraplex representation (variables with superscript “rot”) as well as the un-rotated representation (variables without superscript). There is a unique linear mapping between the un-rotated and the rotated representation, e.g., C2 after rotation always corresponds to W1 before rotation.
Left-Tilted 2-2 quaternary complexes
A 2-2 tetraplex is comprised of two C strands and two W strands. We call a 2-2 tetraplex left-tilted if strand W1 is connected to strand W2 via strand C1 (Fig. S5A). The lengths of the oligomers are called LW,1, LW,2, LC,1 and LC,2. The positions of the strands relative to each other are governed by the alignment indices. All positions are measured relative to the position of the left end of strand C1. The alignment indices may take on the following values,
The complex can have dangling ends on the right and on the left end of the complex; the presence of these dangling ends is indicated by the boolean variables dl and dr. Moreover, two gaps are possible: There might be a gap between strands W1 and W2, or a gap between C1 and C2. The respective boolean variables read
We refer to the hybridization region of strand W1 and C1 as overlap region 1, to the hybridization region of strand W2 and C1 as overlap region 2 and to the hybridization region of strand W2 and C2 as overlap region 3. The length of these hybridization regions is computed via
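The geometric bookkeeping described here can be illustrated with a minimal Python sketch. It is not the authors' implementation: it simply treats every strand as a half-open interval on the axis defined by the left end of C1 and counts shared positions. The class name `LeftTilted22`, the helper functions, and the oligomer lengths in the example are hypothetical; only the alignment indices i = −2, j = 3 and k = 6 are taken from the caption of Fig. S5. The gap indicators are written as plain geometric conditions, which is an assumption about how the boolean variables are defined.

```python
from typing import NamedTuple, Tuple


class LeftTilted22(NamedTuple):
    """Left-tilted 2-2 tetraplex; positions are relative to the left end of C1."""
    L_C1: int  # length of the left Crick strand
    L_C2: int  # length of the right Crick strand
    L_W1: int  # length of the left Watson strand
    L_W2: int  # length of the right Watson strand
    i: int     # position of the left end of W1
    j: int     # position of the left end of W2
    k: int     # position of the left end of C2


def overlap(start_a: int, len_a: int, start_b: int, len_b: int) -> int:
    """Number of positions shared by the intervals [start, start + len)."""
    return max(0, min(start_a + len_a, start_b + len_b) - max(start_a, start_b))


def overlap_lengths(t: LeftTilted22) -> Tuple[int, int, int]:
    """Lengths of overlap regions 1 (W1/C1), 2 (W2/C1) and 3 (W2/C2)."""
    L_o1 = overlap(t.i, t.L_W1, 0, t.L_C1)
    L_o2 = overlap(t.j, t.L_W2, 0, t.L_C1)
    L_o3 = overlap(t.j, t.L_W2, t.k, t.L_C2)
    return L_o1, L_o2, L_o3


def gaps(t: LeftTilted22) -> Tuple[bool, bool]:
    """Assumed geometric gap indicators: (gap between W1 and W2, gap between C1 and C2)."""
    gap_w = t.j > t.i + t.L_W1  # W2 starts after the right end of W1
    gap_c = t.k > t.L_C1        # C2 starts after the right end of C1
    return gap_w, gap_c


# Alignment indices from the caption of Fig. S5; the lengths are hypothetical.
example = LeftTilted22(L_C1=6, L_C2=5, L_W1=5, L_W2=6, i=-2, j=3, k=6)
print(overlap_lengths(example))  # (3, 3, 3)
print(gaps(example))             # (False, False)
```

With these hypothetical lengths, the strands abut without gaps, so both pairs of neighboring strands would be in a ligation-competent arrangement.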
Given the length of the hybridization region as well as the presence/absence of dangling ends, we can compute the hybridization energy,
The combinatorial multiplicity of a left-tilted 2-2 tetraplex is constructed using the same reasoning as in the case of a 3-1 tetraplex,
To prevent double-counting the same tetraplex, we include either the complex or its rotated representation in the container of possible complexes, but not both. If the complex is symmetric under rotation, we multiply the combinatorial multiplicity by 1/2. Given a left-tilted 2-2 tetraplex (LC,1,LC,2,LW,1,LW,2, i, j, k), we can compute the corresponding rotated tetraplex
In order to compute the map of the alignment indices under rotation, we need to express the relative positions of the strands with respect to the position of strand C1 after rotation, which corresponds to W2 before rotation. For example, irot corresponds to the number of nucleotides by which strand C2 (before rotation) protrudes beyond strand W2 (before rotation). Expressed in terms of variables before rotation, this distance may be written as k + LC,2 − j − LW,2. Analogous relations can be derived for all alignment indices,
Right-tilted 2-2 quaternary complexes

Schematic representation of a right-tilted 2-2 tetraplex.
A Two Watson strands (W1 and W2) are hybridized to two Crick strands (C1 and C2). Unlike in the left-tilted 2-2 tetraplex, both Watson strands are hybridized to the right Crick strand C2, whereas only W1 is hybridized to the left Crick strand C1. The alignment indices i, j, and k denote the positions of the strands relative to C1; here, i = 1, k = 3, and j = 6. The lengths of the overlap regions are called Lo,1, Lo,2 and Lo,3. B Rotating the schematic representation of a right-tilted 2-2 tetraplex by 180° produces an alternative representation of the same complex, which is again a right-tilted 2-2 tetraplex. The panel depicts the rotated tetraplex representation (variables with superscript “rot”) as well as the un-rotated representation (variables without superscript). There is a unique linear mapping between un-rotated and rotated representation, e.g., C2 after rotation always corresponds to W1 before rotation. The mapping is identical for left- and right-tilted 2-2 quaternary complexes.
A 2-2 tetraplex is called right-tilted if strand W1 is connected to strand W2 via strand C2 (Fig. S6). As in the case of the left-tilted 2-2 tetraplex, the oligomer lengths are again called LW,1, LW,2, LC,1 and LC,2, but the values of the alignment indices that are possible for the right-tilted tetraplex differ from the ones of the left-tilted tetraplex,
Note that the range of i is chosen such that at least one nucleotide of strand W1 always extends to the right beyond the end of strand C1, allowing for a hybridization region between strand C2 and W1. The boolean variables denoting the presence/absence of dangling ends read
The length of the overlap regions is given by
Like in the case of left-tilted 2-2 quaternary complexes, the total hybridization energy is computed via
and the combinatorial multiplicity via
We include either the tetraplex or its rotated representation in the list of possible complexes to avoid double-counting of quaternary complexes. Moreover, the combinatorial multiplicity is divided by 2 if the tetraplex is symmetric under rotation. It turns out that the rotation map for the right-tilted 2-2 tetraplex is identical to the one of the left-tilted 2-2 tetraplex,
S4. Numerical solution of the (de)hybridization equilibrium in the adiabatic approach
Based on the list of complexes constructed in the previous section, we can compute the equilibrium concentration of strands and complexes reached in the (de)hybridization equilibrium. In the following, we denote the concentration of a coarse-grained oligomer with length L as c(L), and the concentration of an oligomer with length L and known sequence as cs(L). Recall that we assumed that all sequences of a given length that are compatible with the circular genome are equally likely in the pool (Section S3). Thus, the concentration of the coarse-grained oligomer and the concentration of an oligomer with specified sequence are related by the combinatorial multiplicity,
In order to compute the concentration of a complex based on the concentration of single strands, we make use of the law of mass action. The concentration of a specific sequence realization of a complex is computed as the product of concentrations of the strands forming the complex divided by the dissociation constant Kd,
Here,
where c° = 1 M is the standard concentration. Just as in the case of single strands, the concentration of the sequence-independent coarse-grained complex is related to the concentration of a complex with specific sequence realization via the combinatorial prefactor,
It can be useful to combine the combinatorial multiplicity and the dissociation constant into a single effective association constant,
Note that the effective association constant including the combinatorial multiplicity is denoted by 𝒦a (in curly font), while the association constant without combinatorial multiplicity is denoted
In the adiabatic approach, we study the behavior of the system on timescales that are long enough for the system to reach the (de)hybridization equilibrium, but too short for templated ligation events to take place. Therefore, the lengths of the oligomers are expected not to change throughout the equilibration process, and we need to introduce a separate mass conservation law for each coarse-grained oligomer, i.e., each oligomer length, that is included in the pool. For each length, the concentration of single-stranded coarse-grained oligomers of length L and the concentration of coarse-grained oligomers of length L bound in a complex need to add up to their total concentration ctot(L) set by the initial condition,
In this equation,
Combining the mass conservation requirement with the mass action law gives a set of coupled polynomial equations. The number of equations equals the number of distinct oligomer lengths included in the pool. The polynomial equations are of degree 4, as quaternary complexes are the largest complexes to be accounted for and their concentration equals the product of the concentrations of the four strands comprising the complex, divided by the dissociation constant. We determine the equilibrium concentrations by finding the root of this set of fourth-degree polynomial equations using the Levenberg-Marquardt algorithm.
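The structure of this root-finding problem can be illustrated with a deliberately reduced toy case. The sketch below is not the procedure used in this work: it considers a single oligomer length, keeps only duplexes (so the polynomial is of degree 2 rather than 4), and uses Newton's method in place of the Levenberg-Marquardt algorithm. The function names, the assumed form of the mass conservation law (c + 2 𝒦a c² = ctot, i.e., two strands per duplex), and the numerical values are all hypothetical.

```python
import math


def free_concentration(c_tot: float, K_a: float,
                       tol: float = 1e-12, max_iter: int = 100) -> float:
    """Solve c + 2*K_a*c**2 = c_tot for the free strand concentration c.

    Toy mass conservation law (assumption): every duplex binds two strands
    of the same coarse-grained species. Solved by Newton's method.
    """
    c = c_tot  # initial guess: all strands are single-stranded
    for _ in range(max_iter):
        residual = c + 2.0 * K_a * c * c - c_tot
        slope = 1.0 + 4.0 * K_a * c
        step = residual / slope
        c -= step
        if abs(step) <= tol * c_tot:
            break
    return c


def free_concentration_closed_form(c_tot: float, K_a: float) -> float:
    """Positive root of the same quadratic, used as a cross-check."""
    return (math.sqrt(1.0 + 8.0 * K_a * c_tot) - 1.0) / (4.0 * K_a)


# Hypothetical numbers: 100 uM total strands, effective association constant 1e6 / M.
c_tot, K_a = 1e-4, 1e6
c_newton = free_concentration(c_tot, K_a)
c_exact = free_concentration_closed_form(c_tot, K_a)
print(c_newton, c_exact)
```

In the full problem there is one such equation per oligomer length, the equations are coupled through complexes that contain strands of different lengths, and no closed-form solution is available, which is why a numerical solver is needed.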
S5. Computing replication observables based on the adiabatic approach

Schematic representation of a complex that allows for templated ligation.
The strands E1 and E2 are adjacent to each other, such that a covalent bond can form between their ends. The length of the product strand, LP, is set by the length of the educt strands, Le,1 and Le,2. The likelihood for the complex to form a product oligomer whose sequence is compatible with the true circular genome, pcorr, is determined by the length of the educts and the length of their hybridization region with the template. The parts of the complex that are depicted with hatching do not affect pcorr.
As we are not modeling templated ligation events explicitly in the adiabatic approach, we compute replication observables based on the equilibrium concentration of complexes that are in a configuration which allows for a templated ligation reaction to happen. Templated ligations are possible if two strands in the complex are adjacent to each other, i.e., there is no gap in between two oligomers that are hybridized to the same template strand (Fig. S7). Recall that the absence of a gap between two oligomers in the complex implies that the dangling end indicator variable dm = 0 (Section S3). The length of the product strand P is equal to the sum of the lengths of the two educt strands E1 and E2, LP = LE,1 + LE,2. We can use the information about the length of the product strand to compute the yield of replication. By definition (see main text), the yield equals the fraction of nucleotides used to form VCG oligomers, i.e., strands longer than LS,
We can express this quantity in terms of the equilibrium concentration of complexes facilitating templated ligation,
In order to compute the fidelity of replication, we need to distinguish between product oligomer sequences that are compatible with the genome (correct sequences) and sequences that are incompatible with the genome (false sequences). As the coarse-grained representation of the complexes does not resolve the sequences, we need to invoke a combinatorial argument to determine the fraction of correct products. To this end, we compare the number of product sequences that might be produced in a complex of given oligomer lengths and relative oligomer position to the number of correct products. The combinatorial multiplicity of the products that could be produced by a complex of given configuration is set by the combinatorial multiplicity of the possible templates, ℭ (Lo,1 + Lo,2), multiplied by the multiplicity of the educt strands hybridizing to the template with given lengths of the hybridization regions, Lo,1 and Lo,2,
The multiplicity of correct products equals the combinatorial multiplicity of strands that have the same length as the product,
This implies that the probability for a complex of given shape
Using this probability, we can compute the fidelity of replication,
as well as the replication efficiency,
S6. Scaling law of the equilibrium concentration ratio that maximizes replication efficiency
We consider a pool containing monomers and VCG oligomers of a single length. As the feedstock only contains monomers, the total monomer concentration and the total feedstock concentration are identical,
where the concentrations denote the equilibrium concentration of all complexes facilitating the indicated type of templated ligation. Note that we do not include dimerizations, i.e., F+F ligations, in the numerator as they do not contribute to the yield. Moreover, the V+V ligations are multiplied by the length of the oligomers LV, as each V+V ligation extends an existing oligomer by LV nucleotides.
Assuming that complexes comprised of three strands are the dominant type of complexes, we can express the equilibrium concentrations as product of the equilibrium concentrations of the free strands and the effective association constants,
Here,
where we introduced the ratio of equilibrium concentrations,
The effective association constants are computed using the combinatorial rules outlined in Section S4. We find that
and,
These observations allow us to obtain a significantly simplified approximate expression for
We used the identity
scales linearly with LV. In F+F ligations, a VCG oligomer acts as template facilitating the ligation of two monomers. Independent of the length of the VCG oligomer, the two monomers always have the same binding affinity to their template. However, the number of possible positions that the two adjacent monomers can have on the template scales linearly in template length, causing the effective association constant to depend linearly on LV,
scales exponentially with LV. As illustrated in Fig. 2E in the main text, the length of the hybridization region equals L in complexes facilitating incorrect V+V ligations. This leads to exponential scaling with LV, as the hybridization energy is proportional to the length of the hybridization site. The number of complexes facilitating incorrect product formation is only a function of the unique subsequence length LS, but does not depend on the oligomer length LV. Therefore, there is no multiplicative prefactor proportional to LV,

Effective association constants of complexes facilitating F+F ligations (A), false V+V ligations (B), and F+V ligations (C).
The dots depict the effective association constants derived based on the combinatoric rules presented in S4, the solid lines represent the respective scaling laws introduced in Eq. (S4-S7). Different colors correspond to different hybridization energies per matching nearest neighbor block γ.
Fig. S8 shows the LV-dependence of the effective association constants. The circles represent the effective association constants derived based on the combinatoric rules discussed in Section S4, while the solid lines show the length-dependent scaling computed via Eq. (S4) and (S5).
Given the scaling of
We obtain ropt (Fig. 2 in the main text)
S7. Characteristic length-scale of VCG oligomers enabling high replication efficiency
Starting from Eq. S3, the optimal efficiency of replication can be expressed as
To understand the scaling of ηopt as function of LV, we need to know the scaling of
Therefore, we use the following scaling ansatz for
Combining the scaling laws for
S8. Dependence of replication efficiency on subsequence lengthscales of the circular genome
The coarse-grained representation (Section S3) used in the adiabatic approach applies only to a specific class of genomes – those in which the exhaustive coverage length LE is maximal and the unique motif length LU is minimal. This corresponds to genomes that satisfy
To analyze replication behavior beyond this limited case, we developed a fully sequence-resolved extension of the adiabatic approach. Rather than using a coarse-grained view of oligomers, this method considers each distinct strand sequence as a separate chemical species. For example, for a genome of length LG = 64 nt that is encoded in a pool containing monomers and octamers, a total of 132 individual oligomers must be tracked: 4 monomers and 128 distinct octamers. This contrasts sharply with the coarse-grained scenario, which involves only two variables (total monomer and total octamer concentrations). Starting from the single-stranded oligomers, all possible complexes involving up to three strands are enumerated, and their hybridization free energies are calculated using the same energetic model applied in the coarse-grained framework. In the aforementioned example, this results in 351 200 distinct sequence-resolved complexes, compared to just 135 complexes in the coarse-grained model, highlighting the increased computational demands in terms of memory and runtime. The hybridization equilibrium is computed by solving the algebraic system defined by the law of mass action and mass conservation. The procedure mirrors that of the coarse-grained approach (Section S4), with the key difference that combinatorial prefactors are no longer required: these are inherently encoded in the full enumeration of unique sequences and their binding configurations.
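The species count quoted above can be illustrated with a short Python sketch that enumerates the distinct oligomer sequences of a given length encoded by a circular genome and its complementary strand. This is a minimal illustration, not the published code; the function names and the 4-nt toy genome are hypothetical. For a genome of length LG, both strands together contribute at most 2 LG distinct subsequences of a given length, which gives 2 × 64 = 128 octamers when all octamers are unique, plus the 4 monomers.

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, read in the same chemical direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def circular_subsequences(genome: str, length: int) -> set:
    """All distinct subsequences of the given length on a circular strand."""
    if not 1 <= length <= len(genome):
        raise ValueError("length must be between 1 and the genome length")
    doubled = genome + genome  # lets windows wrap around the circle
    return {doubled[start:start + length] for start in range(len(genome))}


def pool_species(genome: str, length: int) -> set:
    """Distinct oligomers of one length encoded by both strands of the genome."""
    return (circular_subsequences(genome, length)
            | circular_subsequences(reverse_complement(genome), length))


toy_genome = "AACG"  # hypothetical 4-nt circular genome, for illustration only
for length in (1, 2, 3):
    print(length, sorted(pool_species(toy_genome, length)))

# Upper bound for the example in the text: 4 monomers + 2 * 64 octamers.
print(4 + 2 * 64)  # 132
```

For the toy genome, the dimer CG occurs on both strands, so only 7 of the 8 possible windows are distinct, whereas all 8 trimers are distinct. The bound 2 LG is therefore only reached once the oligomers are long enough for every subsequence to be unique.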
For our further analysis, we fix the genome length at LG = 64 nt and systematically vary the motif length scales LE and LU. Genomes with desired values of LE and LU are constructed as described in Section S1: Starting from a genome with maximal motif entropy

Genomes sampled via the Metropolis-Hastings algorithm using the motif entropy as “Hamiltonian”.
The table summarizes the sequence of the genome, its characteristic length scales LE and LU, as well as the motif entropy on all length scales of interest. The keyword “bias” is used to distinguish two different sampling procedures: Weakly biased genomes are designed to obey the desired length scales LE and LU while retaining a close-to-uniform motif distribution for subsequences of length LE < L < LU, whereas the motif distribution is far from uniform for strongly biased genomes.

Replication efficiency as a function of the concentration of VCG oligomers in the pool for different choices of genomes and varying VCG oligomer length.
All genomes are LG = 64 nt long, and include all motifs up to length LE = 2 nt, but differ with respect to their minimal unique subsequence length LU : (A) LU = 6 nt, (B) LU = 8 nt, and (C) LU = 10 nt (shown as dotted lines). For comparison, every panel shows the replication efficiency of a genome with LE = 3 nt, LU = 4 nt (solid line). Different colors are used to distinguish different VCG oligomer lengths. Under otherwise identical conditions (e.g., identical oligomer length), replication proceeds with lower efficiency in genomes with higher unique subsequence length LU.

Maximal replication efficiency as a function of the oligomer length for different genomes (all LG = 64 nt long).
Regardless of the genome, the oligomer length needs to exceed LU to enable replication with high efficiency (e.g., higher than 95%). The difference between the oligomer length required for high-efficiency replication and the unique motif length LU depends on the motif distribution on intermediate length scales (LE < L < LU): Genomes with strong bias require longer oligomers (A) than genomes with weak bias (B) (see Table S2 for the genomes and their motif entropies).
Each genome is mapped to a Virtual Circular Genome (VCG) pool containing monomers and oligomers of variable length LV. Assuming all oligomers are chemically activated, we compute the replication efficiency of each pool as a function of the relative oligomer concentration, for different values of LV and across genome types (Fig. S9). As expected, replication efficiency exhibits a maximum at intermediate oligomer concentrations. The maximum efficiency achieved depends on both the VCG oligomer length and the genome’s motif properties. In general, longer oligomers enable more efficient replication. However, at a fixed oligomer length, genomes with higher unique motif lengths LU (or lower LE) replicate with lower efficiency, implying that these genomes require longer oligomers for successful replication. For example, a VCG pool with oligomers of length LV = 9 nt can replicate a genome with LE = 3 nt and LU = 4 nt at 97% efficiency (see solid green curve in Fig. S9B). In contrast, replicating a genome with LE = 3 nt and LU = 10 nt requires oligomers of at least LV = 11 nt to reach comparable efficiency (see dotted green curve in Fig. S9C).
Across all tested genomes, the oligomer length required to achieve > 95% efficiency (denoted
S9. Enhanced replication efficiency in multi-length VCG pools
We investigate whether a pool containing a range of VCG oligomer lengths exhibits higher replication efficiency than single-length pools. To this end, we study a VCG pool comprised of monomers (with fixed concentration), as well as tetramers and octamers (with variable concentration). For moderate binding affinity, γ = −2.5 kBT, the pool reaches optimal replication efficiency if the tetramer concentration is smaller than the octamer concentration (Main Text Fig. 3).

Length-modulated enhanced ligation in pools containing tetramers and octamers for strong binding affinity, γ = −5.0 kBT.
The replication efficiency reaches its maximum in the concentration regime that supports templated ligation of tetramers on octamer templates.

Length-modulated enhanced ligation in pools containing heptamers and octamers for weak binding affinity, γ = −2.5 kBT.
The replication efficiency reaches its maximum in the concentration regime that is dominated by the addition of monomers to the VCG oligomers.

Length-modulated enhanced ligation in pools containing heptamers and octamers for strong binding affinity, γ = −5.0 kBT.
The replication efficiency reaches its maximum in the concentration regime that is dominated by the addition of monomers to the VCG oligomers.
S10. Dimer-related upper bound of replication efficiency
In a VCG pool comprised of monomers, dimers and VCG oligomers of single length, the replication efficiency can be expressed as
provided the pool operates in the concentration regime where all types of ligations other than F+V ligations are negligible. In this equation,
Complexes allowing for templated ligation need to involve at least three oligomers, but may be comprised of more strands. For the following analytical derivation, we restrict ourselves to ternary complexes, while the full numerical solution (see continuous lines in Main Text Fig. 5)

Effective association constants of complexes facilitating 1+V ligations (A) and 2+V ligations (B).
Using this expression, the equation for the replication efficiency may be simplified,
The effective association constants in the denominator may be expressed as the sum of the association constants of correct and false products,
In the following, we simplify the notation by referring to
we can express η as
Taking the limit LV → ∞ yields an upper bound for the replication efficiency,
Provided that the ratio of equilibrium concentrations of dimers and monomers is comparable to the ratio of total concentration of dimers and monomers, ceq(2)/ceq(1) ≈ ctot(2)/ctot(1), we can write
Therefore, the upper bound of the replication efficiency may be expressed as
For the genome that is analyzed in the main text,
S11. Asymptotic fraction of oligomers undergoing extension by monomers
In pools containing only activated monomers, as well as non-activated dimers and VCG oligomers of a single length, we observe that the fraction of oligomers that can be extended by a monomer reaches an asymptotic value for high VCG concentrations. Notably, this asymptotic value is independent of the oligomer length.
The fraction of oligomers that are in a monomer extension-competent state can be computed based on the equilibrium concentration of complexes facilitating the extension of an oligomer by a monomer,
Note that, in principle, the monomer can be added to the 5′- or the 3′-end of the oligomer, leading to two contributions in the numerator, which are identical due to symmetry,
Just like in the previous section,
Therefore, the fraction of monomer-extendable oligomers reads,
In the limit of high VCG concentration, almost all VCG oligomers form duplexes. Hence, the equilibrium concentration of free oligomers can be approximated by,
implying,

The effective association constants of reactive ternary complexes (complexes comprising three strands) can be computed based on the effective association constant of the duplexes.
A If the hybridization region of the two VCG oligomers is LV − 1 nucleotides long, the monomer hybridizes to the end (start) of the template strand. As the template has no dangling end, the energy contribution of the hybridizing monomer is γ/2. B and C For hybridization regions that are shorter than LV − 1, but at least 1 nucleotide long, the energy contribution due to the added nucleotide is γ. In all cases, the prefactor 2 accounts for the two possible positions at which a monomer might be added.
As almost all monomers are free in solution, their equilibrium concentration may be approximated by the total concentration of monomers,
From this representation of
The notation
We write the duplex association constants in terms of the binding energy due to their overlap length,
Therefore, the asymptotic ratio of monomer-extended oligomers reads,
Addition of a single base pair will decrease the energy by the energy of a half nearest neighbor block, i.e., by γ/2. Thus, we assign Kd(1) = exp (−|γ|/2), and find,
S12. Analytical solution for equilibrium concentrations in multi-length VCG pools
We consider pools comprised of monomers, dimers and VCG oligomers of multiple lengths. For simplicity, we restrict ourselves to profiles with a uniform distribution of VCG oligomers, such that all VCG oligomers have the same concentration. Assuming that almost all oligomer mass is contained in single strands and duplexes, the mass conservation equations read
Note that there is a mass conservation equation for each oligomer length individually, i.e., we are dealing with a set of multiple coupled quadratic equations. We can make the system of equations dimensionless by introducing a dimensionless concentration
This rescaling is chosen as it is the solution to the (approximative) mass conservation equation in the high concentration limit,
under the assumption that
Writing the full mass conservation equation in terms of the dimensionless concentration yields,
We drop all ratios of total concentrations, as we assume that the total concentration is the same for each oligomer length of VCG oligomer (uniform concentration profile),
By introducing the dimensionless prefactors,
we can rewrite the mass conservation equation,
To solve for
We can use this representation to compute the equilibrium concentrations recursively: We start the recursion with the assumption
In the first recursion step, we compute
This scheme can be applied until the approximated values of
Note that the equilibrium concentration obtained after the (i + 1)-th iteration step

Comparison between approximate (analytical) and true (numerical) solution of the (de)hybridization equilibrium in multi-length VCG pools.
A The approximation converges after roughly five iteration steps. The relative error drops below 1% in the first iteration step already. B Equilibrium concentrations as a function of total VCG concentration. Continuous lines show the numerical solution, while the dotted and dashed lines depict the approximation obtained in the zeroth or first iteration step respectively. The feedstock concentration is fixed,
S13. Threshold concentration for the inversion of productivity
In VCG pools that contain VCG oligomers of multiple lengths as well as activated monomers, we observe “inversion of productivity”: The fraction of short VCG oligomers in a monomer extension-capable complex configuration exceeds that of long oligomers. However, this is only the case for sufficiently high concentration of VCG oligomers. The threshold concentration that is necessary for oligomers of length M to exceed the monomer-extension fraction of oligomers of length L (assuming M < L) is set by the condition r1+V(L) = r1+V(M). To compute r1+V(L), we need to account for all possible complexes in which an oligomer of length L can be extended by a monomer,
As we are considering a uniform concentration profile, all VCG oligomers have the same total concentration, i.e., ctot(L) = ctot(M). Thus, we can express the condition for the threshold concentration as
We combine this condition with the analytical approximation of the equilibrium concentration up to the first iteration step derived in Section S12, cf. Eqs. S9-S11. As the equilibrium concentrations
S14. Inversion of productivity using parameters of the system studied by Ding et al.
In their experimental study, Ding et al. focus on a genome of length LG = 12 nt. In the VCG pool encoding the genome, Ding et al. include (Tab. S1 in [9])
dimers with 11 different sequences,
trimers with 20 different sequences, and
tetramers up to 12-mers with 24 different sequences each.
Every oligomer sequence is included with a concentration of 1 µM (so-called 1x profile), adding up to a total concentration of
We construct a genome that mimics the properties of the genome used by Ding et al., but obeys our genome construction principles outlined in the method section of the main text. To this end, we consider a genome of length LG = 12 nt and a minimal unique subsequence length of LS = 3 nt. This implies that our VCG pool contains
dimers with 16 different sequences, and
trimers up to 12-mers with 24 different sequences.
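The composition of the two pools can be tallied with a few lines of Python. This is a bookkeeping sketch only; the dictionary layout and variable names are hypothetical. It assumes that the experimental total simply sums the listed oligomer sequences at 1 µM each, and it checks that, for this example, the model pool follows the pattern min(4^L, 2 LG) sequences per length, which is an observation about the numbers given here rather than a general rule taken from the text.

```python
L_G = 12  # genome length in nucleotides

# Number of distinct sequences per oligomer length, as listed above.
ding_pool = {2: 11, 3: 20}
ding_pool.update({length: 24 for length in range(4, 13)})   # tetramers to 12-mers

model_pool = {2: 16}
model_pool.update({length: 24 for length in range(3, 13)})  # trimers to 12-mers

ding_sequences = sum(ding_pool.values())
model_sequences = sum(model_pool.values())

# 1x profile of the experimental pool: 1 uM per oligomer sequence (assumed to be
# summed over the listed oligomers only).
ding_total_uM = ding_sequences * 1.0

print(ding_sequences, ding_total_uM)  # 247 247.0
print(model_sequences)                # 256

# Pattern observed for the model pool in this example: all 4**L sequences for
# short oligomers, 2 * L_G sequences (both strands) once subsequences are unique.
pattern_holds = all(count == min(4 ** length, 2 * L_G)
                    for length, count in model_pool.items())
print(pattern_holds)  # True
```

The two pools differ only in the dimers and trimers: the experimental pool contains fewer short sequences than the model pool, while both contain 24 sequences for every longer oligomer length.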
Our model does not include imidazolium-bridged dinucleotides explicitly. Instead, we assume that the concentration of activated mononucleotides in our model equals the total concentration of activated homo-dinucleotides (20 mM) used in the experimental study. The total concentration of non-activated oligomers is treated as a free parameter.
We find that the system exhibits inversion of productivity: For the entire considered range of concentration of non-activated oligomers, 10-mers are more likely to be extended by a monomer than 12-mers (Fig. S17). Moreover, 8-mers are more productive than 12-mers (provided the concentration of non-activated oligomers exceeds roughly 0.3 µM), and more productive than 10-mers (provided

Replication performance of multi-length VCG pools, in which only the monomers are activated.
The pool includes activated monomers (
Acknowledgements
We thank Paul Higgs and members of the Gerland group for stimulating discussions. This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) via the CRC/TRR 392 Molecular Evolution (Project-ID 521256690), and under Germany’s Excellence Strategy (EXC-2094-390783311, ORIGINS).
Additional information
Code Availability
Most of the software used in this study is publicly available at https://github.com/gerland-group/VirtualCircularGenome. This repository includes code for sampling circular genomes using a Metropolis-Hastings-type algorithm, as well as code for the coarse-grained adiabatic approach used in the numerical analysis of replication performance. The full kinetic simulation code is available from the authors upon reasonable request.
Funding
German Research Foundation CRC/TRR 392 Molecular Evolution (521256690)
Excellence Cluster Origins (EXC-2094-390783311)
References
- Thermodynamics and kinetics of DNA and RNA dinucleotide hybridization to gaps and overhangs. Biophys J 122:3323–3339. https://doi.org/10.1016/j.bpj.2023.07.009
- Ribozyme-catalysed RNA synthesis using triplet building blocks. eLife 7:e35255. https://doi.org/10.7554/eLife.35255
- In-ice evolution of RNA polymerase ribozyme activity. Nat Chem 5:1011–1018. https://doi.org/10.1038/nchem.1781
- A high-yielding, strictly regioselective prebiotic purine nucleoside formation pathway. Science 352:833–836. https://doi.org/10.1126/science.aad2808
- Asphalt, Water, and the Prebiotic Synthesis of Ribose, Ribonucleosides, and RNA. Acc Chem Res 45:2025–2034. https://doi.org/10.1021/ar200332w
- Proton NMR study of the base-pairing reactions of d(GGAATTCC): salt effects on the equilibria and kinetics of strand association. Biochemistry 30:754–758. https://doi.org/10.1021/bi00217a026
- Computer simulations of Template-Directed RNA Synthesis driven by temperature cycling in diverse sequence mixtures. PLoS Comput Biol 18:e1010458. https://doi.org/10.1371/journal.pcbi.1010458
- Kinetic explanations for the sequence biases observed in the nonenzymatic copying of RNA templates. Nucleic Acids Res 50:35–45. https://doi.org/10.1093/nar/gkab1202
- Experimental Tests of the Virtual Circular Genome Model for Nonenzymatic RNA Replication. J Am Chem Soc 145:7504–7515. https://doi.org/10.1021/jacs.3c00612
- Competition between bridged dinucleotides and activated mononucleotides determines the error frequency of nonenzymatic RNA primer extension. Nucleic Acids Res 49:3681–3691. https://doi.org/10.1093/nar/gkab173
- A kinetic error filtering mechanism for enzyme-free copying of nucleic acid sequences. bioRxiv. https://doi.org/10.1101/2021.08.06.455386
- Thermodynamic and Kinetic Sequence Selection in Enzyme-Free Polymer Self-Assembly inside a Non-equilibrium RNA Reactor. Life 12:567. https://doi.org/10.3390/life12040567
- The RNA World: molecular cooperation at the origins of life. Nat Rev Genet 16:7–17. https://doi.org/10.1038/nrg3841
- Physical non-equilibria for prebiotic nucleic acid chemistry. Nat Rev Phys 5:185–195. https://doi.org/10.1038/s42254-022-00550-3
- RNA-Catalyzed RNA Polymerization: Accurate and General RNA-Templated Primer Extension. Science 292:1319–1325. https://doi.org/10.1126/science.1060786
- RNA evolution and the origins of life. Nature 338:217–224. https://doi.org/10.1038/338217a0
- Templating efficiency of naked DNA. Proc Natl Acad Sci USA 107:12074–12079. https://doi.org/10.1073/pnas.0914872107
- The effect of leaving groups on binding and reactivity in enzyme-free copying of DNA and RNA. Nucleic Acids Res 44:5504–5514. https://doi.org/10.1093/nar/gkw476
- Synthesis of Carbohydrates in Mineral-Guided Prebiotic Cycles. J Am Chem Soc 133:9457–9468. https://doi.org/10.1021/ja201769f
- Template-based information transfer in chemically fueled dynamic combinatorial libraries. Nat Chem 16:1240–1249. https://doi.org/10.1038/s41557-024-01570-5
- Emergence of Homochirality via Template-Directed Ligation in an RNA Reactor. PRX Life 2:013015. https://doi.org/10.1103/PRXLife.2.013015
- Cascade of Reduced Speed and Accuracy after Errors in Enzyme-Free Copying of Nucleic Acid Sequences. J Am Chem Soc 135:354–366. https://doi.org/10.1021/ja3095558
- The prebiotic evolutionary advantage of transferring genetic information from RNA to DNA. Nucleic Acids Res 39:8135–8147. https://doi.org/10.1093/nar/gkr525
- Enzyme-Free Copying of 12 Bases of RNA with Dinucleotides. Angew Chem Int Ed Engl 61:e202203067. https://doi.org/10.1002/anie.202203067
- Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci USA 101:7287–7292. https://doi.org/10.1073/pnas.0401799101
- Freeze–thaw cycles as drivers of complex ribozyme assembly. Nat Chem 7:502–508. https://doi.org/10.1038/nchem.2251
- Improved polymerase ribozyme efficiency on hydrophobic assemblies. RNA 14:552–562. https://doi.org/10.1261/rna.494508
- Synthesis of activated pyrimidine ribonucleotides in prebiotically plausible conditions. Nature 459:239–242. https://doi.org/10.1038/nature08013
- Mapping a Systematic Ribozyme Fitness Landscape Reveals a Frustrated Evolutionary Network for Self-Aminoacylating RNA. J Am Chem Soc 141:6213–6223. https://doi.org/10.1021/jacs.8b13298
- Nonenzymatic copying of RNA templates containing all four letters is catalyzed by activated oligonucleotides. eLife 5:e17756. https://doi.org/10.7554/elife.17756
- Effect of Stalling after Mismatches on the Error Catastrophe in Nonenzymatic Nucleic Acid Replication. J Am Chem Soc 132:5880–5885. https://doi.org/10.1021/ja100780p
- The Origins of the RNA World. Cold Spring Harb Perspect Biol 4:a003608. https://doi.org/10.1101/cshperspect.a003608
- Self-Assembly of Informational Polymers by Templated Ligation. Phys Rev X 11:031055. https://doi.org/10.1103/PhysRevX.11.031055
- The Hammerhead Ribozyme. Prog Mol Biol Transl Sci 120:1–23. https://doi.org/10.1016/B978-0-12-381286-5.00001-9
- Enzyme-free ligation of dimers and trimers to RNA primers. Nucleic Acids Res 47:3836–3845. https://doi.org/10.1093/nar/gkz160
- An optimal degree of physical and chemical heterogeneity for the origin of life? Philos Trans R Soc B 366:2894–2901. https://doi.org/10.1098/rstb.2011.0140
- An RNA polymerase ribozyme that synthesizes its own ancestor. Proc Natl Acad Sci USA 117:2906–2913. https://doi.org/10.1073/pnas.1914282117
- Transient states during the annealing of mismatched and bulged oligonucleotides. Nucleic Acids Res 52:2174–2187. https://doi.org/10.1093/nar/gkae091
- RNA Complexes with Nicks and Gaps: Thermodynamic and Kinetic Effects of Coaxial Stacking and Dangling Ends. J Am Chem Soc 146:18083–18094. https://doi.org/10.1021/jacs.4c05115
- Cyclophospholipids Enable a Protocellular Life Cycle. ACS Nano 17:23772–23783. https://doi.org/10.1021/acsnano.3c07706
- A Highly Reactive Imidazolium-Bridged Dinucleotide Intermediate in Nonenzymatic RNA Primer Extension. J Am Chem Soc 138:11996–12002. https://doi.org/10.1021/jacs.6b07977
- A Kinetic Model of Nonenzymatic RNA Polymerization by Cytidine-5’-phosphoro-2-aminoimidazolide. Biochemistry 56:5739–5747. https://doi.org/10.1021/acs.biochem.7b00792
- Prolinyl Nucleotides Drive Enzyme-Free Genetic Copying of RNA. Angew Chem Int Ed Engl 62:e202307591. https://doi.org/10.1002/anie.202307591
- Kinetics of renaturation of DNA. J Mol Biol 31:349–370. https://doi.org/10.1016/0022-2836(68)90414-2
- Ribozyme-Catalyzed Transcription of an Active Ribozyme. Science 332:209–212. https://doi.org/10.1126/science.1200752
- The virtual circular genome model for primordial RNA replication. RNA 27:1–11. https://doi.org/10.1261/rna.077693.120
Article and author information
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.104043. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, Ludwig Burger & Ulrich Gerland
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.