Model. A In the Virtual Circular Genome (VCG) model, a circular genome (depicted in green) as well as its sequence complement are encoded in a pool of oligomers (depicted in blue and orange). Collectively, the pool of oligomers encodes the whole sequence of the circular genome. Depending on their length, two types of oligomers can be distinguished: Feedstock oligomers (depicted in blue) are too short to specify a unique locus on the genome, while long VCG oligomers (depicted in orange) do. B The length-distribution of oligomers included in the VCG pool is assumed to be exponential. The concentration of feedstock oligomers and VCG oligomers as well as their respective length scales of exponential decay and can be varied independently. The set of included oligomer lengths can be restricted via and . C The hybridization energy of complexes is computed using a simplified nearest-neighbor model: Each full block comprised of two base pairs (depicted in pink) contributes γ, while dangling end blocks (depicted in blue) contribute γ/2. D Oligomers form complexes via hybridization reactions, or dehybridize from an existing complex. The ratio of hybridization and dehybridization rate is governed by the hybridization energy. If two oligomers are adjacent to each other in a complex, they can undergo templated ligation. E Based on the length of the reacting oligomers, we distinguish three types of templated ligation: Ligation of two feedstock oligomers (F+F), ligation of feedstock oligomer to VCG oligomer (F+V) and ligation of two VCG oligomers (V+V).
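The block-counting rule of panel C can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `duplex_energy` and its arguments are hypothetical, and it assumes that a hybridization region of Lo paired nucleotides contains Lo − 1 full nearest-neighbor blocks, plus one half-weight dangling-end block for each end of the region at which a strand overhangs. The energy is returned in the same units and with the same sign as the γ that is passed in.

```python
def duplex_energy(overlap_len, n_dangling_ends, gamma):
    """Hybridization energy of a duplex in a simplified nearest-neighbor model.

    Assumption (not taken from the source): a region of `overlap_len` paired
    nucleotides holds `overlap_len - 1` full blocks; each overhanging end of
    the region adds one dangling-end block worth gamma / 2.
    """
    if overlap_len < 1:
        raise ValueError("a duplex needs at least one base pair")
    if n_dangling_ends not in (0, 1, 2):
        raise ValueError("a duplex has at most two dangling ends")
    full_blocks = overlap_len - 1
    return gamma * (full_blocks + 0.5 * n_dangling_ends)


# Two 8-mers paired over 7 nt (one overhang at each end), in units of gamma.
print(duplex_energy(7, 2, gamma=1.0))
# Two 8-mers paired over all 8 nt (blunt ends), in units of gamma.
print(duplex_energy(8, 0, gamma=1.0))
```

Under this counting, a shifted duplex with two dangling ends and a blunt duplex that is one nucleotide longer have the same energy, because the two half-weight blocks together replace one full block.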

Replication performance of VCG pools containing VCG oligomers of a single length (single-length VCG pools). A The pool contains a fixed concentration of monomers, mM, as well as VCG oligomers of a single length, LV, at variable concentration (the VCG oligomers cover all possible subsequences of the genome and its complement at equal concentration). B The yield increases as a function of , because dimerizations become increasingly unlikely for high VCG concentrations. C The ligation share of different ligation types depends on the total VCG concentration: In the low-concentration limit, dimerization (F+F) dominates; for intermediate concentrations, F+V ligations reach their maximum, while, for high concentrations, a substantial fraction of reactions are V+V ligations. The panel depicts the behavior for LV = 6 nt. D Replication efficiency is limited by the small yield for small . In the limit of high , replication efficiency decreases due to the growing number of error-prone V+V ligations. Maximal replication efficiency is reached at intermediate VCG concentration. E V+V ligations are prone to the formation of incorrect products due to the short overlap between educt strand and template. In general, the probability of correct product formation, pcorr, depends on the choice of circular genome as well as its mapping to the VCG pool. The probabilities listed here refer to a VCG pool with LG = 16 nt and LS = 3 nt. F The optimal equilibrium concentration ratio of free VCG strands to free feedstock strands, which maximizes replication efficiency, decays as a function of length (continuous line). The analytical scaling law (Eq. (4), dashed line) captures this behavior. The window of close-to-optimal replication, within which efficiency deviates by no more than 1% from its optimum (shaded areas), increases with LV, facilitating reliable replication without fine-tuning to match the optimal concentration ratio. 
G Maximal replication efficiency, which is attained at the optimal VCG concentration depicted in panel F, increases as a function of LV and approaches a plateau of 100%. For high efficiency, Eq. (5) provides a good approximation of the length-dependence of ηmax (dashed lines). The oligomer length at which replication efficiency equals 95% is determined using Eq. (5) (vertical dotted lines). H By construction, the unique subsequence length, LS, increases logarithmically as a function of the length of the genome, LG. The length of VCG oligomers, LV, at which the optimal replication efficiency reaches 95% (computed via Eq. (5)) exhibits the same logarithmic dependence on LG.
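The window of close-to-optimal replication described in panel F can be extracted from any sampled efficiency curve as sketched below. This is an illustration under stated assumptions, not the procedure used for the figure: the helper `near_optimal_window` is hypothetical, the 1% criterion is read as a relative deviation from the maximum, and the efficiency curve in the usage example is made up purely to exercise the function.

```python
import numpy as np


def near_optimal_window(x, eta, tolerance=0.01):
    """Return (x_opt, x_low, x_high) for a sampled efficiency curve eta(x).

    The window is the contiguous range around the maximum in which eta stays
    within a relative `tolerance` of its maximal value (assumed reading of
    "no more than 1% from its optimum").
    """
    x = np.asarray(x, dtype=float)
    eta = np.asarray(eta, dtype=float)
    i_max = int(np.argmax(eta))
    ok = eta >= eta[i_max] * (1.0 - tolerance)
    lo = i_max
    while lo > 0 and ok[lo - 1]:
        lo -= 1
    hi = i_max
    while hi < len(x) - 1 and ok[hi + 1]:
        hi += 1
    return x[i_max], x[lo], x[hi]


# Made-up, single-peaked toy curve on a logarithmic concentration-ratio axis.
ratio = np.logspace(-3, 3, 601)
eta_toy = 1.0 / (1.0 + 0.1 * np.log10(ratio) ** 2)
x_opt, x_low, x_high = near_optimal_window(ratio, eta_toy)
print(x_opt, x_low, x_high)
```

Walking outward from the maximum, rather than thresholding the whole array, keeps the window contiguous even if the curve has a secondary region of high efficiency elsewhere.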

Replication performance of VCG pools containing VCG oligomers of multiple lengths (multi-length VCG pools). A The pool contains a fixed concentration of monomers, , as well as tetramers and octamers at variable concentration. The hybridization energy per nearest-neighbor block is γ = 2.5 kBT. B Replication efficiency reaches its maximum for c(8) ≈ 0.1 μM and significantly lower tetramer concentration, c(4) ≈ 7.4 pM. Efficiency remains close-to-maximal on a plateau around the maximum spanning almost two orders of magnitude in tetramer and octamer concentration. In addition, efficiency exhibits a ridge of increased efficiency for high tetramer concentration and intermediate octamer concentration. C The concentrations of complexes that facilitate templated ligation are grouped by the length of the template and the educts, . We distinguish complexes producing correct (labeled “c”) and false products (labeled “f”). For each relevant type of complex, we highlight the region of concentration where it contributes most significantly, i.e., at least 20% of the total ligation flux. The plateau of high efficiency is dominated by the ligation of monomers to octamers, whereas the ridge of increased efficiency is due to the correct ligation of two tetramers templated by an octamer.

Replication performance of multi-length VCG pools. A The pool contains a fixed concentration of monomers, , as well as long oligomers in the range at variable concentration . The length dependence of the concentration profile is assumed to be uniform (for panels B, D, and E) or exponential (for panel C); its steepness is set by the parameter κV. B If the length distribution is uniform, reducing decreases the maximal efficiency, whereas increasing increases it. Pools containing a range of oligomer lengths are always outperformed by single-length VCGs (blue curve). C An exponential length distribution of VCG oligomers allows tuning from a poorly performing regime (dominated by oligomers of length ) to a well-performing regime (dominated by oligomers of length ). In the limit κV → ∞, ηmax approaches the replication efficiency of single-length pools containing only oligomers of length (dashed lines). D Especially for high , replication is dominated by primer extension of the long oligomers in the VCG, here . In this limit, addition of shorter oligomers leaves the dominant F+V ligations almost unchanged. E Reducing for fixed nt gives rise to increased significance of erroneous ligation reactions.

Replication performance of single-length VCG pools containing monomers and dimers as feedstock. A The pool contains a fixed total concentration of feedstock oligomers, , partitioned into monomers and dimers, as well as VCG oligomers of a single length, LV. The proportion of monomers and dimers can be adjusted via κF, and the concentration of the VCG oligomers is a free parameter. B Replication efficiency exhibits a maximum at intermediate VCG concentration in systems with (dashed blue curve) and without dimers (solid blue curve). The presence of dimers reduces replication efficiency significantly, as they enhance the ligation share of incorrect F+V ligations (dashed green curve). The panel depicts the behavior for LV = 7 nt and κF = 2.3. C Optimal replication efficiency increases as a function of oligomer length, LV, and asymptotically approaches a plateau (dashed lines, Eq. (6)). The value of this plateau, , is determined by the competition between correct and false 2+V reactions, both of which grow exponentially with LV. Thus, depends on the relative concentration of the dimers in the pool: the more dimers are included, the lower is . D Erroneous 1+V ligations are possible if the educt oligomer has a short overlap region with the template. The hybridization energy for such configurations is small, and independent of the length of the VCG oligomers (left). 2+V ligations may produce incorrect products via the same mechanism (middle). In addition, erroneous 2+V ligations can be caused by complexes in which two VCG oligomers hybridize perfectly to each other, but the dimer has a dangling end. The stability of these complexes increases exponentially with oligomer length (right).
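How a single parameter can set the monomer-to-dimer proportion is illustrated by the sketch below. It assumes an exponential profile c(L) ∝ exp(−κF L); the sign convention, the helper name `exponential_profile`, and the choice κF = ln 10 ≈ 2.3 are assumptions made for illustration. The total of 0.1001 mM is simply the sum of the monomer and dimer concentrations quoted for the monomer-activated pools (0.091 mM and 9.1 μM).

```python
import math


def exponential_profile(total, lengths, kappa):
    """Split `total` concentration over `lengths` with weights exp(-kappa * L).

    Assumed functional form: larger kappa shifts the pool toward shorter
    oligomers. Returns a dict mapping length -> concentration.
    """
    weights = [math.exp(-kappa * length) for length in lengths]
    norm = sum(weights)
    return {length: total * w / norm for length, w in zip(lengths, weights)}


# kappa = ln(10) gives a 10:1 monomer-to-dimer ratio (illustrative choice).
feed = exponential_profile(total=0.1001, lengths=[1, 2], kappa=math.log(10))
print(feed)  # concentrations in mM
```

With these inputs the split comes out at roughly 0.091 mM monomers and 0.0091 mM dimers, which suggests (but does not prove) that the quoted concentrations correspond to κF ≈ 2.3.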

Replication performance of single-length VCG pools in which only the monomers are activated. A The pool contains activated monomers alongside non-activated dimers and VCG oligomers of a single length. The concentrations of monomers and dimers are fixed, c(1) = 0.091 mM and c(2) = 9.1 μM, adding up to a total feedstock concentration of , while the concentration of VCG oligomers, , can be varied. B Unlike in pools in which all oligomers are activated, replication efficiency does not decrease at high VCG concentration if only monomers are activated, as erroneous V+V ligations are impossible. Instead, replication efficiency approaches an asymptotic value of 1. C The fraction of oligomers that are in a monomer extension-competent state depends on the total concentration of VCG oligomers. At low VCG concentration, most oligomers are single-stranded, and extension of oligomers by monomers is scarce. At high VCG concentration, r1+V approaches the asymptotic value (Eq. (7), grey dashed line). In this limit, almost all oligomers form duplexes, which facilitate monomer addition upon hybridization of a monomer. Thus, the asymptotic fraction of oligomers that gets extended by monomers is not determined by the oligomer length, but by the binding affinity of monomers to existing duplexes. Conversely, the threshold concentration at which depends on oligomer length (colored dashed lines): Longer oligomers reach higher r1+V at lower VCG concentration.

Replication performance of multi-length VCG pools in which only the monomers are activated. A The pool contains activated monomers as well as non-activated dimers and VCG oligomers. The concentrations of monomers and dimers are fixed, c(1) = 0.091 mM and c(2) = 9.1 μM, adding up to a total feedstock concentration of , while the total concentration of VCG oligomers, , can be varied. All VCG oligomers are assumed to have the same concentration. B At low VCG concentration, the fraction of oligomers that are in a monomer extension-competent state is higher for long oligomers than for short oligomers, while, at high VCG concentration, monomers preferentially ligate to short oligomers (“inversion of productivity”). The threshold concentrations at which a short oligomer starts to outperform a longer oligomer depend on the lengths of the compared oligomers (dashed lines). C The mechanism underlying inversion of productivity can be understood based on the pairwise competition of different VCG oligomers, e.g., 8-mers vs. 9-mers. Over the entire range of VCG concentrations, complexes with 8-mer templates have a lower relative equilibrium concentration than complexes with 9-mer templates (bottom two curves vs. top two curves). However, as the concentration of VCG oligomers is increased, the fraction of 8-mers that are extended by monomers using a 9-mer as a template exceeds the fraction of extended 9-mers. D The equilibrium concentration of free oligomers decreases with increasing . Long oligomers have a lower equilibrium concentration ratio of free oligomers, as they can form more stable complexes with longer hybridization sites. E Complexes in which 8-mers serve as template are less stable than complexes with 9-mer templates, explaining the higher relative equilibrium concentration of the latter complex type (see panel C). Complexes with 9-mer template have similar stability regardless of the length of the educt oligomer, i.e., and are similarly stable. 
This similar stability together with the higher concentration of free 8-mers compared to 9-mers (see panel D) is the reason why the fraction of monomer-extended 8-mers exceeds that of 9-mers (see panel C).

Characteristic timescales in the kinetic simulation. A The equilibration timescale is determined based on the total hybridization energy of all strands in the pool, ΔGtot. By fitting an exponential function to ΔGtot, we obtain a characteristic timescale, τ * (vertical dotted line), which is then used to calculate the equilibration time as τeq = 5τ * (vertical dashed line). The horizontal dashed line shows the total hybridization energy expected in (de)hybridization equilibrium according to the coarse-grained adiabatic approach (Sections S-II and S-III). B The correlation timescale is determined based on the autocorrelation of ΔGtot. We obtain τcorr (vertical dashed line) by fitting an exponential function to the autocorrelation. In both panels, we show simulation data obtained for a VCG pool containing monomers and VCG oligomers with a concentration of as well as oligomers of length L = 8 nt with a concentration of .
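The timescale extraction of panel A can be sketched as follows. This is a minimal illustration rather than the fitting routine used for the figure: it assumes a single-exponential relaxation ΔGtot(t) = ΔGeq + A·exp(−t/τ*), takes the equilibrium value ΔGeq as known, and fits a straight line to the logarithm of the deviation. The function name `relaxation_time` and the synthetic trace are hypothetical.

```python
import numpy as np


def relaxation_time(t, g, g_eq):
    """Estimate tau* from a trace g(t) relaxing exponentially toward g_eq.

    Fits log|g - g_eq| = log|A| - t / tau* by linear least squares, ignoring
    points that are already too close to the plateau to carry information.
    """
    t = np.asarray(t, dtype=float)
    g = np.asarray(g, dtype=float)
    deviation = np.abs(g - g_eq)
    mask = deviation > 1e-3 * deviation.max()
    slope, _intercept = np.polyfit(t[mask], np.log(deviation[mask]), 1)
    return -1.0 / slope


# Synthetic, noise-free trace with tau* = 2 (arbitrary time units).
t = np.linspace(0.0, 10.0, 200)
g = -50.0 + 50.0 * np.exp(-t / 2.0)
tau_star = relaxation_time(t, g, g_eq=-50.0)
tau_eq = 5.0 * tau_star  # equilibration time as defined in the caption
print(tau_star, tau_eq)
```

The same log-linear fit could be applied to the autocorrelation of ΔGtot to estimate τcorr, since an exponentially decaying autocorrelation relaxes toward zero; for noisy simulation data a nonlinear least-squares fit may be more robust.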

Simulation runtime of the full kinetic simulation for a VCG pool that includes monomers and VCG oligomers of length L = 8. The total concentration of feedstock monomers equals , while the total concentration of VCG oligomers equals . The energy contribution per matching nearest-neighbor block equals γ = 2.5 kBT. The volume of the system is varied, and the time-evolution is simulated until t = 5.0 · 10^7 t0. The runtime of the simulation scales linearly with the volume of the system.

System parameters used to compute the replication observables yield, y, and replication efficiency, η, based on the kinetic simulation. The computed observables are shown in Fig. 2 in the main text.

Schematic representation of complexes considered in the adiabatic approach. A A duplex is comprised of two strands, which we refer to as W (Watson) and C (Crick). The relative position of the strands is characterized by alignment index i; for the depicted duplex, i = −2. The length of the hybridization region is called Lo. B A triplex contains three strands. By convention, we denote the two strands that are on the same “side” of the complex as W1 and W2, and the complementary strand as C. The alignment indices i and j denote the positions of W1 and W2 relative to C. For the depicted triplex, i = −2 and j = 3. The lengths of the hybridization regions are called Lo,1 and Lo,2.
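The relation between alignment index and hybridization length can be made concrete with a short sketch. The convention used here is an assumption, since the caption does not spell it out: i is taken as the offset, in nucleotides, of the left end of W relative to the left end of C, with negative values meaning that W starts to the left of C. The helper names are hypothetical.

```python
def duplex_overlap(len_w, len_c, i):
    """Length Lo of the hybridization region of a duplex.

    Assumed convention: W occupies positions i .. i + len_w - 1 on an axis
    whose origin is the left end of C, and C occupies 0 .. len_c - 1.
    """
    left = max(i, 0)
    right = min(i + len_w, len_c)
    return max(0, right - left)


def overhanging_ends(len_w, len_c, i):
    """Number of ends of the hybridization region with an overhanging strand."""
    if duplex_overlap(len_w, len_c, i) == 0:
        return 0
    return int(i != 0) + int(i + len_w != len_c)


# Two 8-mers with alignment index i = -2, as in the depicted duplex.
print(duplex_overlap(8, 8, -2), overhanging_ends(8, 8, -2))
```

The same interval arithmetic extends to triplexes by applying it once per Watson strand, with i and j as the respective offsets.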

Schematic representation of a 3-1 tetraplex. Three strands (in the following referred to as Watson strands W1, W2, and W3) hybridize to a single template strand (Crick strand C). The positions relative to the left end of the C strand are given by the alignment indices i, j, and k; here, i = −2, j = 2, k = 5. The lengths of the overlap regions are denoted Lo,1, Lo,2 and Lo,3.

Schematic representation of a left-tilted 2-2 tetraplex. A Two Watson strands (W1 and W2) are hybridized to two Crick strands (C1 and C2). Both Watson strands are hybridized to the left Crick strand C1, whereas only W2 is hybridized to the right Crick strand C2. The alignment indices i, j and k denote the position of the strands relative to the left end of C1; here, i = −2, j = 3 and k = 6. The lengths of the hybridization regions are called Lo,1, Lo,2 and Lo,3. B Rotating the schematic representation of a left-tilted 2-2 tetraplex by 180° produces an alternative representation of the same complex, which is again a left-tilted 2-2 tetraplex. The panel depicts the rotated tetraplex representation (variables with superscript “rot”) as well as the un-rotated representation (variables without superscript). There is a unique linear mapping between un-rotated and rotated representation, e.g., C2 after rotation always corresponds to W1 before rotation.

Schematic representation of a right-tilted 2-2 tetraplex. A Two Watson strands (W1 and W2) are hybridized to two Crick strands (C1 and C2). Unlike in the left-tilted 2-2 tetraplex, both Watson strands are hybridized to the right Crick strand C2, whereas only W1 is hybridized to the left Crick strand C1. The alignment indices i, j, and k denote the positions of the strands relative to C1; here, i = 1, k = 3, and j = 6. The lengths of the overlap regions are called Lo,1, Lo,2 and Lo,3. B Rotating the schematic representation of a right-tilted 2-2 tetraplex by 180° produces an alternative representation of the same complex, which is again a right-tilted 2-2 tetraplex. The panel depicts the rotated tetraplex representation (variables with superscript “rot”) as well as the un-rotated representation (variables without superscript). There is a unique linear mapping between un-rotated and rotated representation, e.g., C2 after rotation always corresponds to W1 before rotation. The mapping is identical for left- and right-tilted 2-2 tetraplexes.

Schematic representation of a complex that allows for templated ligation. The strands E1 and E2 are adjacent to each other, such that a covalent bond can form between their ends. The length of the product strand, LP, is set by the length of the educt strands, Le,1 and Le,2. The likelihood for the complex to form a product oligomer whose sequence is compatible with the true circular genome, pcorr, is determined by the length of the educts and the length of their hybridization region with the template. The parts of the complex that are depicted with hatching do not affect pcorr.

Effective association constants of complexes facilitating F+F ligations (A), false V+V ligations (B), and F+V ligations (C). The dots depict the effective association constants derived from the combinatoric rules presented in Section S-III; the solid lines represent the respective scaling laws introduced in Eqs. (S4)-(S7). Different colors correspond to different hybridization energies per matching nearest-neighbor block, γ.

Length-modulated enhanced ligation in pools containing tetramers and octamers for strong binding affinity, γ = −5.0 kBT. The replication efficiency reaches its maximum in the concentration regime that supports templated ligation of tetramers on octamer templates.

Length-modulated enhanced ligation in pools containing heptamers and octamers for weak binding affinity, γ = −2.5 kBT. The replication efficiency reaches its maximum in the concentration regime that is dominated by the addition of monomers to the VCG oligomers.

Length-modulated enhanced ligation in pools containing heptamers and octamers for strong binding affinity, γ = −5.0 kBT. The replication efficiency reaches its maximum in the concentration regime that is dominated by the addition of monomers to the VCG oligomers.

Effective association constants of complexes facilitating 1+V ligations (A) and 2+V ligations (B).

The effective association constants of reactive triplexes can be computed based on the effective association constant of the duplexes. A If the hybridization region of the two VCG oligomers is LV − 1 nucleotides long, the monomer hybridizes to the end (start) of the template strand. As the template has no dangling end, the energy contribution of the hybridizing monomer is γ/2. B and C For hybridization regions that are shorter than LV − 1, but at least 1 nucleotide long, the energy contribution due to the added nucleotide is γ. In all cases, the prefactor 2 accounts for the two possible positions at which a monomer might be added.

Comparison between approximate (analytical) and true (numerical) solution of the (de)hybridization equilibrium in multi-length VCG pools. A The approximation converges after roughly five iteration steps. The relative error already drops below 1% in the first iteration step. B Equilibrium concentrations as a function of total VCG concentration. Continuous lines show the numerical solution, while the dotted and dashed lines depict the approximation obtained in the zeroth and first iteration step, respectively. The feedstock concentration is fixed, .

Replication performance of multi-length VCG pools in which only the monomers are activated. The pool includes activated monomers (ctot(1) = 20 mM) as well as non-activated oligomers of length L = 2 nt up to L = 12 nt with variable total concentration . The system exhibits inversion of productivity: 10-mers are more likely to be in a monomer-extension competent state than 12-mers, and 8-mers are more likely to be extended by monomers than 10-mers (for provided ). For the experimentally used concentration (vertical dashed line), 8-mers are more productive than 10-mers, and those are more productive than 12-mers. However, unlike in the experimental system, 6-mers are less productive than 8-mers and 10-mers.