Figures and data

Model.
(A) In the Virtual Circular Genome (VCG) scenario, a circular genome (depicted in green) as well as its sequence complement are encoded in a pool of oligomers (depicted in blue and orange). Collectively, the pool of oligomers encodes the whole sequence of the circular genome. Depending on their length, two types of oligomers can be distinguished: Long VCG oligomers specify a unique locus on the genome, while feedstock molecules (monomers or short oligomers) are too short to do so. (B) The length distribution of oligomers included in the VCG pool is assumed to be exponential. The concentration of feedstock and VCG oligomers as well as their respective length scales of exponential decay

Replication performance of VCG pools containing VCG oligomers of a single length (single-length VCG pools).
(A) The pool contains a fixed concentration of monomers,

Replication performance of pools containing VCG oligomers of two different lengths.
(A) The pool contains a fixed concentration of monomers,

Replication performance of multi-length VCG pools.
(A) The pool contains a fixed concentration of monomers,

Replication performance of single-length VCG pools containing monomers and dimers as feedstock.
(A) The pool contains a fixed total concentration of feedstock,

Replication performance of single-length VCG pools with kinetic suppression of ligation between oligomers.
(A) The pool contains reactive monomers alongside non-reactive dimers and VCG oligomers of a single length. The concentrations of monomers and dimers are fixed, c(1) = 0.091 mM and c(2) = 9.1 µM, adding up to a total feedstock concentration of

Replication performance of multi-length VCG pools with kinetic suppression of ligation between oligomers.
(A) The pool contains reactive monomers as well as non-reactive dimers and VCG oligomers. The concentrations of monomers and dimers are fixed, c(1) = 0.091 mM and c(2) = 9.1 µM, adding up to a total feedstock concentration of

Characteristic timescales in the kinetic simulation.
A The equilibration timescale is determined based on the total hybridization energy of all strands in the pool, ΔGtot. By fitting an exponential function to ΔGtot, we obtain a characteristic timescale, τ* (vertical dotted line), which is then used to calculate the equilibration time as τeq = 5τ* (vertical dashed line). The horizontal dashed line shows the total hybridization energy expected in (de)hybridization equilibrium according to the coarse-grained adiabatic approach (Sections S3 and S4). B The correlation timescale is determined based on the autocorrelation of ΔGtot. We obtain τcorr (vertical dashed line) by fitting an exponential function to the autocorrelation. In both panels, we show simulation data obtained for a VCG pool containing monomers and VCG oligomers with a concentration of
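The two fits described in this caption can be sketched as follows. This is a minimal illustration rather than the procedure used in the simulation: it assumes that the equilibrium value of ΔGtot is known, it substitutes a log-linear least-squares fit for the nonlinear exponential fit, and the function names (`fit_exponential_timescale`, `autocorrelation`) as well as the synthetic data are hypothetical.

```python
import math


def fit_exponential_timescale(times, values, plateau=0.0):
    """Estimate tau in values(t) ~ plateau + A * exp(-t / tau).

    Assumption: a log-linear least-squares fit stands in for the
    nonlinear exponential fit mentioned in the caption.
    """
    xs, ys = [], []
    for t, v in zip(times, values):
        deviation = abs(v - plateau)
        if deviation > 0.0:  # the logarithm is undefined exactly at the plateau
            xs.append(t)
            ys.append(math.log(deviation))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # slope of log|deviation| versus time equals -1 / tau
    return -1.0 / slope


def autocorrelation(values, max_lag):
    """Normalized autocorrelation of a time series for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    result = []
    for lag in range(max_lag + 1):
        cov = sum(
            (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
        ) / (n - lag)
        result.append(cov / variance)
    return result


# Synthetic relaxation of the total hybridization energy (made-up numbers).
times = [0.1 * k for k in range(100)]
dg_tot = [-50.0 + 40.0 * math.exp(-t / 2.0) for t in times]

tau_star = fit_exponential_timescale(times, dg_tot, plateau=-50.0)
tau_eq = 5.0 * tau_star  # equilibration time as defined in the caption
```

Under the same assumptions, τcorr would follow from applying `fit_exponential_timescale` (with `plateau=0.0`) to the initial, positive part of the list returned by `autocorrelation`.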

Simulation runtime of the full kinetic simulation for a VCG pool that includes monomers and VCG oligomers of length L = 8.
The total concentration of feedstock monomers equals

System parameters used to compute the replication observables yield, y, and replication efficiency, η, based on the kinetic simulation.
The computed observables are shown in Fig. 2.

Schematic representation of complexes considered in the adiabatic approach.
A A duplex comprises two strands, which we refer to as W (Watson) and C (Crick). The relative position of the strands is characterized by the alignment index i; for the depicted duplex, i = −2. The length of the hybridization region is denoted Lo. B A triplex contains three strands. By convention, we denote the two strands that are on the same “side” of the complex as W1 and W2, and the complementary strand as C. The alignment indices i and j denote the positions of W1 and W2 relative to C. For the depicted triplex, i = −2 and j = 3. The lengths of the hybridization regions are denoted Lo,1 and Lo,2.
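The relation between alignment index and hybridization length can be made concrete with a short sketch. It rests on an assumption that the caption does not state explicitly: the alignment index is taken to be the offset, in nucleotides, of the left end of a Watson strand relative to the left end of the Crick strand. The strand lengths and the helper names `overlap_length` and `are_adjacent` are hypothetical; only i = −2 and j = 3 come from the caption.

```python
def overlap_length(len_w, len_c, i):
    """Length of the hybridization region between strands W and C.

    Assumption: i is the offset (in nucleotides) of the left end of W
    relative to the left end of C; negative i means W overhangs to the left.
    """
    left = max(i, 0)
    right = min(i + len_w, len_c)
    return max(0, right - left)


def are_adjacent(len_w1, i, j):
    """True if W2 (at index j) starts directly where W1 (at index i) ends."""
    return i + len_w1 == j


# Hypothetical strand lengths; only i = -2 and j = 3 are taken from the caption.
LEN_C, LEN_W1, LEN_W2 = 8, 5, 4
lo_1 = overlap_length(LEN_W1, LEN_C, i=-2)
lo_2 = overlap_length(LEN_W2, LEN_C, i=3)
print(lo_1, lo_2, are_adjacent(LEN_W1, -2, 3))  # prints: 3 4 True
```

With these example lengths, W1 ends exactly where W2 begins, which is the geometry that would allow a covalent bond to form between the two strands.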

Schematic representation of a 3-1 tetraplex.
Three strands (in the following referred to as Watson strands W1, W2, and W3) hybridize to a single template strand (Crick strand C). The positions relative to the left end of the C strand are given by the alignment indices i, j, and k; here, i = −2, j = 2, k = 5. The lengths of the overlap regions are denoted Lo,1, Lo,2 and Lo,3.

Schematic representation of a left-tilted 2-2 tetraplex.
A Two Watson strands (W1 and W2) are hybridized to two Crick strands (C1 and C2). Both Watson strands are hybridized to the left Crick strand C1, whereas only W2 is hybridized to the right Crick strand C2. The alignment indices i, j and k denote the positions of the strands relative to the left end of C1; here, i = −2, j = 3 and k = 6. The lengths of the hybridization regions are denoted Lo,1, Lo,2 and Lo,3. B Rotating the schematic representation of a left-tilted 2-2 tetraplex by 180° produces an alternative representation of the same complex, which is again a left-tilted 2-2 tetraplex. The panel depicts the rotated tetraplex representation (variables with superscript “rot”) as well as the un-rotated representation (variables without superscript). There is a unique linear mapping between the un-rotated and rotated representations, e.g., C2 after rotation always corresponds to W1 before rotation.

Schematic representation of a right-tilted 2-2 tetraplex.
A Two Watson strands (W1 and W2) are hybridized to two Crick strands (C1 and C2). Unlike in the left-tilted 2-2 tetraplex, both Watson strands are hybridized to the right Crick strand C2, whereas only W1 is hybridized to the left Crick strand C1. The alignment indices i, j, and k denote the positions of the strands relative to C1; here, i = 1, k = 3, and j = 6. The lengths of the overlap regions are denoted Lo,1, Lo,2 and Lo,3. B Rotating the schematic representation of a right-tilted 2-2 tetraplex by 180° produces an alternative representation of the same complex, which is again a right-tilted 2-2 tetraplex. The panel depicts the rotated tetraplex representation (variables with superscript “rot”) as well as the un-rotated representation (variables without superscript). There is a unique linear mapping between the un-rotated and rotated representations, e.g., C2 after rotation always corresponds to W1 before rotation. The mapping is identical for left- and right-tilted 2-2 quaternary complexes.

Schematic representation of a complex that allows for templated ligation.
The strands E1 and E2 are adjacent to each other, such that a covalent bond can form between their ends. The length of the product strand, LP, is set by the length of the educt strands, Le,1 and Le,2. The likelihood for the complex to form a product oligomer whose sequence is compatible with the true circular genome, pcorr, is determined by the length of the educts and the length of their hybridization region with the template. The parts of the complex that are depicted with hatching do not affect pcorr.

Effective association constants of complexes facilitating F+F ligations (A), false V+V ligations (B), and F+V ligations (C).
The dots depict the effective association constants derived from the combinatoric rules presented in Section S4; the solid lines represent the respective scaling laws introduced in Eq. (S4-S7). Different colors correspond to different hybridization energies per matching nearest-neighbor block, γ.

Genomes sampled via the Metropolis-Hastings algorithm using the motif entropy as “Hamiltonian”.
The table summarizes the sequence of the genome, its characteristic length scales LE and LU, as well as the motif entropy on all length scales of interest. The keyword “bias” is used to distinguish two different sampling procedures: Weakly biased genomes are designed to obey the desired length scales LE and LU while retaining a close-to-uniform motif distribution for subsequences of length LE < L < LU, whereas the motif distribution is far from uniform for strongly biased genomes.
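The sampling idea behind this table can be illustrated with a small sketch. It is not the sampler used to generate the listed genomes: the energy function below (the negative dimer entropy of a single circular strand), the single-base proposal, the inverse temperature `beta`, and all function names are assumptions chosen for illustration. Because the proposal is symmetric, the acceptance rule reduces to the plain Metropolis criterion.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"


def motif_counts(genome, length):
    """Count all motifs of the given length on a circular genome (one strand)."""
    extended = genome + genome[: length - 1]  # wrap around the circle
    return Counter(extended[s : s + length] for s in range(len(genome)))


def motif_entropy(genome, length):
    """Shannon entropy (in nats) of the motif distribution at one length scale."""
    counts = motif_counts(genome, length)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def metropolis_step(genome, energy, beta, rng):
    """Propose a single-base substitution and accept it with the Metropolis rule."""
    pos = rng.randrange(len(genome))
    new_base = rng.choice([b for b in ALPHABET if b != genome[pos]])
    candidate = genome[:pos] + new_base + genome[pos + 1 :]
    delta = energy(candidate) - energy(genome)
    if delta <= 0 or rng.random() < math.exp(-beta * delta):
        return candidate
    return genome


def example_energy(genome):
    """Hypothetical Hamiltonian: low energy for a close-to-uniform dimer distribution."""
    return -motif_entropy(genome, 2)


rng = random.Random(0)
genome = "A" * 16  # deliberately poor starting point
for _ in range(2000):
    genome = metropolis_step(genome, example_energy, beta=50.0, rng=rng)
print(genome, motif_entropy(genome, 2))
```

A Hamiltonian that targets specific values of LE and LU, or that distinguishes weakly from strongly biased genomes, would need additional terms on the intermediate length scales; those are not specified in this caption and are left out here.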

Replication efficiency as a function of the concentration of VCG oligomers in the pool for different choices of genomes and varying VCG oligomer length.
All genomes are LG = 64 nt long, and include all motifs up to length LE = 2 nt, but differ with respect to their minimal unique subsequence length LU: (A) LU = 6 nt, (B) LU = 8 nt, and (C) LU = 10 nt (shown as dotted lines). For comparison, every panel shows the replication efficiency of a genome with LE = 3 nt, LU = 4 nt (solid line). Different colors are used to distinguish different VCG oligomer lengths. Under otherwise identical conditions (e.g., identical oligomer length), replication proceeds with lower efficiency in genomes with higher unique subsequence length LU.
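The two length scales used in this caption can be computed for a given sequence as sketched below. The sketch makes a simplifying assumption that may differ from the definition used for the figure: motifs are counted on a single circular strand, without the sequence complement. The function names and the 16 nt example genome are hypothetical.

```python
from collections import Counter


def circular_motifs(genome, length):
    """All subsequences of the given length on a circular genome (one strand)."""
    extended = genome + genome[: length - 1]  # wrap around the circle
    return [extended[s : s + length] for s in range(len(genome))]


def exhaustive_coverage_length(genome, alphabet="ACGT"):
    """Largest L_E such that every possible motif up to length L_E occurs."""
    length = 0
    while length + 1 <= len(genome):
        observed = set(circular_motifs(genome, length + 1))
        if len(observed) != len(alphabet) ** (length + 1):
            break
        length += 1
    return length


def minimal_unique_length(genome):
    """Smallest L_U such that every subsequence of length L_U occurs exactly once."""
    for length in range(1, len(genome) + 1):
        counts = Counter(circular_motifs(genome, length))
        if all(c == 1 for c in counts.values()):
            return length
    return None  # e.g., a perfectly periodic genome has no unique length


# Hypothetical 16 nt circular genome in which each dimer occurs exactly once.
example = "AACAGATCCGCTGGTT"
print(exhaustive_coverage_length(example), minimal_unique_length(example))  # prints: 2 2
```

For this example the two length scales coincide; the genomes compared in the figure have LU larger than LE, which leaves a range of intermediate lengths LE < L < LU.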

Maximal replication efficiency as a function of the oligomer length for different genomes (all LG = 64 nt long).
Regardless of the genome, the oligomer length needs to exceed LU to enable replication with high efficiency (e.g., higher than 95%). The difference between the oligomer length required for high-efficiency replication and the unique motif length LU depends on the motif distribution on intermediate length scales (LE < L < LU): Genomes with strong bias require longer oligomers (A) than genomes with weak bias (B) (see Table S2 for the genomes and their motif entropies).

Length-modulated enhanced ligation in pools containing tetramers and octamers for strong binding affinity, γ = −5.0 kBT.
The replication efficiency reaches its maximum in the concentration regime that supports templated ligation of tetramers on octamer templates.

Length-modulated enhanced ligation in pools containing heptamers and octamers for weak binding affinity, γ = −2.5 kBT.
The replication efficiency reaches its maximum in the concentration regime that is dominated by the addition of monomers to the VCG oligomers.

Length-modulated enhanced ligation in pools containing heptamers and octamers for strong binding affinity, γ = −5.0 kBT.
The replication efficiency reaches its maximum in the concentration regime that is dominated by the addition of monomers to the VCG oligomers.

Effective association constants of complexes facilitating 1+V ligations (A) and 2+V ligations (B).

The effective association constants of reactive ternary complexes (complexes comprising three strands) can be computed based on the effective association constant of the duplexes.
A If the hybridization region of the two VCG oligomers is LV − 1 nucleotides long, the monomer hybridizes to the end (start) of the template strand. As the template has no dangling end, the energy contribution of the hybridizing monomer is γ/2. B and C For hybridization regions that are shorter than LV − 1, but at least 1 nucleotide long, the energy contribution due to the added nucleotide is γ. In all cases, the prefactor 2 accounts for the two possible positions at which a monomer might be added.

Comparison between approximate (analytical) and true (numerical) solution of the (de)hybridization equilibrium in multi-length VCG pools.
A The approximation converges after roughly five iteration steps. The relative error already drops below 1% in the first iteration step. B Equilibrium concentrations as a function of total VCG concentration. Continuous lines show the numerical solution, while the dotted and dashed lines depict the approximation obtained in the zeroth and first iteration steps, respectively. The feedstock concentration is fixed,

Replication performance of multi-length VCG pools, in which only the monomers are activated.
The pool includes activated monomers (