Physics of Living Systems

Toward Stable Replication of Genomic Information in Pools of RNA Molecules

Ludwig Burger
Ulrich Gerland

Physics of Complex Biosystems, Department of Bioscience, School of Natural Sciences, Technical University of Munich, Garching, Germany

https://doi.org/10.7554/eLife.104043.2

Figures and data

Model.
(A) In the Virtual Circular Genome (VCG) scenario, a circular genome (depicted in green) and its sequence complement are encoded in a pool of oligomers (depicted in blue and orange). Collectively, the oligomers in the pool encode the whole sequence of the circular genome. Depending on their length, two types of oligomers can be distinguished: Long VCG oligomers specify a unique locus on the genome, whereas feedstock molecules (monomers or short oligomers) are too short to do so. (B) The length distribution of oligomers included in the VCG pool is assumed to be exponential. The concentrations of feedstock and VCG oligomers as well as their respective length scales of exponential decay and can be varied independently. The set of included oligomer lengths can be restricted via and . (C) The hybridization energy of complexes is computed using a simplified nearest-neighbor model: Each full block consisting of two base pairs (depicted in pink) contributes γ, while each dangling-end block (depicted in blue) contributes γ/2. (D) Oligomers form complexes via hybridization and leave existing complexes via dehybridization. The ratio of the hybridization and dehybridization rates is governed by the hybridization energy (Eq. 3). If two oligomers are directly adjacent to each other in a complex, they can undergo templated ligation. (E) Based on the lengths of the reacting oligomers, we distinguish three types of templated ligation: Ligation of two feedstock molecules (F+F), ligation of a feedstock molecule to a VCG oligomer (F+V), and ligation of two VCG oligomers (V+V).
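The energy rule of panel (C) and the rate relation of panel (D) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: we assume that a contiguous hybridization region of L_o base pairs contains L_o − 1 full nearest-neighbor blocks, that the caller supplies the number of dangling-end blocks, and that Eq. 3 has the detailed-balance form k_hyb/k_dehyb = exp(−ΔG/k_BT) with concentrations measured relative to a fixed reference. The function names are hypothetical; the default γ = −2.5 k_BT is the value quoted in the captions below.

```python
import math


def hybridization_energy(overlap: int, n_dangling: int, gamma: float = -2.5) -> float:
    """Hybridization energy (in k_BT) of one contiguous hybridization region.

    Assumption: a region of `overlap` consecutive base pairs contains
    overlap - 1 full nearest-neighbor blocks (two base pairs each), each
    contributing gamma; every dangling-end block contributes gamma / 2.
    """
    if overlap < 1:
        raise ValueError("a hybridization region needs at least one base pair")
    full_blocks = overlap - 1
    return full_blocks * gamma + n_dangling * gamma / 2


def dehybridization_rate(k_hyb: float, delta_g: float) -> float:
    """Dehybridization rate implied by an assumed detailed-balance relation.

    Assumption (hypothetical form of Eq. 3): k_hyb / k_dehyb = exp(-delta_g),
    with delta_g in units of k_BT, so more negative energies give slower
    dehybridization.
    """
    return k_hyb * math.exp(delta_g)


# A 4-bp region with a dangling end on each side: 3 * gamma + 2 * gamma / 2.
energy = hybridization_energy(overlap=4, n_dangling=2)
print(energy)                             # -10.0
print(dehybridization_rate(1.0, energy))  # exp(-10), roughly 4.5e-05
```

Because the energy is additive over blocks, each additional base pair in the overlap slows dehybridization by a factor exp(−γ) ≈ 12 for γ = −2.5 k_BT, under the assumed rate law.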

Replication performance of VCG pools containing VCG oligomers of a single length (single-length VCG pools).
(A) The pool contains a fixed concentration of monomers, , as well as VCG oligomers of a single length, L_V, at variable concentration (the VCG oligomers cover all possible subsequences of the genome and its complement at equal concentration). (B) The yield increases as a function of , because dimerizations become increasingly unlikely at high VCG concentrations. (C) The ligation share of the different ligation types depends on the total VCG concentration: In the low-concentration limit, dimerization (F+F) dominates; at intermediate concentrations, F+V ligations reach their maximum; and at high concentrations, a substantial fraction of reactions are V+V ligations. The panel depicts the behavior for L_V = 6 nt. (D) Replication efficiency is limited by the small yield at small . In the limit of high , replication efficiency decreases due to the growing number of error-prone V+V ligations. Maximal replication efficiency is therefore reached at intermediate VCG concentration. (E) V+V ligations are prone to forming incorrect products because of the short overlap between educt strand and template. In general, the probability of correct product formation, p_corr, depends on the choice of circular genome and as well as its mapping to the VCG pool. The probabilities listed here refer to a VCG pool with L_G = 16 nt, L_E = 2 nt, and L_U = 3 nt. (F) The optimal equilibrium concentration ratio of free VCG strands to free feedstock strands, which maximizes replication efficiency, decays as a function of oligomer length, L_V (continuous line). The analytical scaling law (dashed line, Eq. (4)) captures this behavior. The window of close-to-optimal replication, within which efficiency deviates by no more than 1% from its optimum (shaded areas), widens with L_V, facilitating reliable replication without fine-tuning the concentration ratio to its optimum.
(G) Maximal replication efficiency, which is attained at the optimal VCG concentration depicted in panel F, increases as a function of L_V and approaches a plateau of 100%. For high efficiency, Eq. (5) provides a good approximation of the length dependence of η_max (dashed lines). The oligomer length at which replication efficiency equals 95% is determined using Eq. (5) (vertical dotted lines). (H) The unique motif length, , increases logarithmically with the length of the genome, L_G. The length of VCG oligomers, L_V, at which the optimal replication efficiency reaches 95% (computed using Eq. (5)) exhibits the same logarithmic dependence on L_G.
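The unique motif length in panel (H) can be computed directly for a given genome. As a sketch, we assume that L_U is the smallest motif length at which every subsequence of the circular genome and of its reverse complement occurs exactly once, and that sequences use the RNA alphabet ACGU; both are our assumptions, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional

# Assumption: RNA alphabet, Watson-Crick pairing only.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand read in the same 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def circular_kmers(seq: str, k: int) -> List[str]:
    """All length-k windows of a circular sequence, one per start position."""
    extended = seq + seq[: k - 1]  # wrap around the circular genome
    return [extended[i : i + k] for i in range(len(seq))]


def unique_motif_length(genome: str) -> Optional[int]:
    """Smallest k such that every k-mer of the circular genome and of its
    reverse complement occurs exactly once (assumed definition of L_U).

    Returns None if no k up to the genome length satisfies this, e.g. for
    periodic or self-complementary genomes.
    """
    rc = reverse_complement(genome)
    for k in range(1, len(genome) + 1):
        counts = Counter(circular_kmers(genome, k) + circular_kmers(rc, k))
        if max(counts.values()) == 1:
            return k
    return None


print(unique_motif_length("AAC"))  # 2: AA, AC, CA, GU, UU, UG are all distinct
```

Since the 2·L_G motifs of length L_U must all be distinct among the 4^(L_U) possible sequences, L_U ≥ log_4(2 L_G); for L_G = 16 nt this gives L_U ≥ 3 nt, in line with the value quoted in panel (E). This counting bound is consistent with the logarithmic growth shown in panel (H).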

Replication performance of pools containing VCG oligomers of two different lengths.
(A) The pool contains a fixed concentration of monomers, , as well as tetramers and octamers at variable concentrations. The hybridization energy per nearest-neighbor block is γ = −2.5 k_BT. (B) Replication efficiency reaches its maximum at c(8) ≈ 0.1 µM and a much lower tetramer concentration, c(4) ≈ 7.4 pM. Efficiency remains close to maximal on a plateau around the maximum that spans almost two orders of magnitude in tetramer and octamer concentration. In addition, efficiency exhibits a ridge of increased values at high tetramer concentration and intermediate octamer concentration. (C) Complexes that facilitate templated ligation are grouped by the lengths of the template and the educts, . We distinguish complexes producing correct (labeled “c”) and false products (labeled “f”). For each relevant type of complex, we highlight the region in the concentration plane where it contributes significantly, i.e., at least 20% of the total ligation flux. The plateau of high efficiency is dominated by the ligation of monomers to octamers, whereas the ridge of increased efficiency is due to the correct ligation of two tetramers templated by an octamer.
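The 20% criterion in panel (C) amounts to computing each complex type's share of the total ligation flux. The sketch below assumes the fluxes have already been measured (for instance, counted in a simulation) and keyed by (template length, educt 1 length, educt 2 length, correct product?); the flux values, the function names, and the feedstock length cutoff `max_feedstock_len` (1 for monomer-only feedstock) are hypothetical. It also aggregates the shares into the F+F, F+V, and V+V classes introduced in the model figure.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical key: (template length, educt 1 length, educt 2 length, correct product?)
ComplexType = Tuple[int, int, int, bool]


def ligation_type(len1: int, len2: int, max_feedstock_len: int = 1) -> str:
    """Classify a ligation as F+F, F+V or V+V from the educt lengths."""
    n_vcg = int(len1 > max_feedstock_len) + int(len2 > max_feedstock_len)
    return ("F+F", "F+V", "V+V")[n_vcg]


def flux_shares(fluxes: Dict[ComplexType, float]) -> Dict[ComplexType, float]:
    """Fraction of the total ligation flux carried by each complex type."""
    total = sum(fluxes.values())
    return {key: flux / total for key, flux in fluxes.items()}


def dominant_complexes(fluxes: Dict[ComplexType, float],
                       threshold: float = 0.2) -> Dict[ComplexType, float]:
    """Complex types contributing at least `threshold` of the total flux."""
    return {key: s for key, s in flux_shares(fluxes).items() if s >= threshold}


def shares_by_ligation_type(fluxes: Dict[ComplexType, float],
                            max_feedstock_len: int = 1) -> Dict[str, float]:
    """Aggregate flux shares into the F+F, F+V and V+V classes."""
    shares: Dict[str, float] = defaultdict(float)
    for (_template, len1, len2, _correct), share in flux_shares(fluxes).items():
        shares[ligation_type(len1, len2, max_feedstock_len)] += share
    return dict(shares)


# Hypothetical fluxes for a monomer/tetramer/octamer pool (arbitrary units).
fluxes = {
    (8, 1, 8, True): 5.0,   # monomer added to an octamer, correct
    (8, 4, 4, True): 3.0,   # two tetramers joined on an octamer, correct
    (4, 4, 4, False): 1.0,  # two tetramers joined on a tetramer, false
    (8, 1, 1, True): 1.0,   # dimerization of two monomers (unproductive)
}
print(dominant_complexes(fluxes))       # first two entries, shares 0.5 and 0.3
print(shares_by_ligation_type(fluxes))  # F+V: 0.5, V+V: about 0.4, F+F: 0.1
```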

Replication performance of multi-length VCG pools.
(A) The pool contains a fixed concentration of monomers, , as well as long oligomers in the range at variable concentration . The length dependence of the concentration profile is assumed to be uniform (panels B, D, and E) or exponential (panel C); its steepness is set by the parameter κ_V. (B) If the length distribution is uniform, reducing decreases the maximal efficiency, whereas increasing increases it. Pools containing a range of oligomer lengths are always outperformed by single-length VCGs (blue curve). (C) Assuming an exponential length distribution of VCG oligomers allows us to tune from a poorly performing regime (dominated by oligomers of length ) to a well-performing regime (dominated by oligomers of length ). In the limit κ_V → ∞, η_max approaches the replication efficiency of single-length pools containing only oligomers of length (dashed lines). (D) For high , replication is dominated by primer extension of the long oligomers in the VCG (here ). In this limit, adding shorter oligomers leaves the dominant F+V ligations almost unchanged. (E) Reducing for fixed increases the fraction of unproductive (i.e., dimerization) or erroneous ligation reactions.
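Panel (A) parametrizes the VCG length profile by a single steepness parameter κ_V. As a sketch, we assume c(L) ∝ exp(κ_V·L) on the included length range, normalized to the total VCG concentration. With this assumed sign convention, κ_V = 0 gives the uniform profile, κ_V → ∞ concentrates all mass on the longest included length, and the opposite sign favors the shortest one. The paper's exact parametrization may differ, and the function name is hypothetical.

```python
import numpy as np


def vcg_length_distribution(c_total: float, l_min: int, l_max: int, kappa: float):
    """Concentrations c(L) for L = l_min..l_max with c(L) proportional to exp(kappa * L).

    Assumed sign convention: kappa > 0 favors long oligomers, kappa < 0 favors
    short ones, and kappa = 0 gives a uniform profile. The concentrations sum
    to c_total.
    """
    lengths = np.arange(l_min, l_max + 1)
    log_weights = kappa * lengths
    weights = np.exp(log_weights - log_weights.max())  # avoid overflow for large |kappa|
    return lengths, c_total * weights / weights.sum()


lengths, conc = vcg_length_distribution(c_total=1.0, l_min=4, l_max=8, kappa=0.0)
print(conc)  # uniform: [0.2 0.2 0.2 0.2 0.2]
```

Subtracting the largest log-weight before exponentiating leaves the normalized profile unchanged but keeps the computation finite for steep profiles.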

Replication performance of single-length VCG pools containing monomers and dimers as feedstock.
(A) The pool contains a fixed total concentration of feedstock, , partitioned into monomers and dimers, as well as VCG oligomers of a single length, L_V. The proportion of monomers and dimers can be adjusted via κ_F, and the concentration of the VCG oligomers is a free parameter, . (B) Replication efficiency exhibits a maximum at intermediate VCG concentration in systems with (dashed blue curve) and without dimers (solid blue curve). The presence of dimers reduces replication efficiency substantially, because they increase the ligation share of incorrect F+V ligations (dashed green curve). The panel depicts the behavior for L_V = 7 nt and κ_F = 2.3. (C) Optimal replication efficiency increases as a function of oligomer length, L_V, and asymptotically approaches a plateau (dashed lines, Eq. (6)). The value of this plateau, , is determined by the competition between correct and false 2+V reactions, both of which grow exponentially with L_V. Thus, depends on the relative concentration of the dimers in the pool: the more dimers are included, the lower is . (D) Erroneous 1+V ligations are possible if the educt oligomer has a short overlap region with the template. The hybridization energy for such configurations is small and independent of the length of the VCG oligomers (left). 2+V ligations may produce incorrect products via the same mechanism (middle), but they can also result from complexes in which two VCG oligomers hybridize perfectly to each other while the dimer has a dangling end. The stability of these complexes increases exponentially with oligomer length (right).
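One way to read the κ_F parametrization of panel (A) is as an exponential split of the total feedstock over lengths 1 and 2. Assuming c(L) ∝ exp(−κ_F (L − 1)), which is our assumption and not an equation given in this section, the helper below partitions a total feedstock concentration. With κ_F = 2.3 and a total of 0.1 mM it returns c(1) ≈ 0.091 mM and c(2) ≈ 9.1 µM, which coincide with the monomer and dimer concentrations quoted in the captions on kinetic suppression. The function name is hypothetical.

```python
import math
from typing import Dict


def feedstock_partition(c_feed_total: float, kappa_f: float, max_len: int = 2) -> Dict[int, float]:
    """Split a total feedstock concentration over lengths 1..max_len.

    Assumption: c(L) is proportional to exp(-kappa_f * (L - 1)), so larger
    kappa_f means fewer dimers relative to monomers.
    """
    lengths = range(1, max_len + 1)
    weights = [math.exp(-kappa_f * (length - 1)) for length in lengths]
    norm = sum(weights)
    return {length: c_feed_total * w / norm for length, w in zip(lengths, weights)}


# Total feedstock of 0.1 mM (all concentrations in mM).
partition = feedstock_partition(c_feed_total=0.1, kappa_f=2.3)
print({length: round(c, 4) for length, c in partition.items()})  # {1: 0.0909, 2: 0.0091}
```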

Replication performance of single-length VCG pools with kinetic suppression of ligation between oligomers.
(A) The pool contains reactive monomers alongside non-reactive dimers and VCG oligomers of a single length. The concentrations of monomers and dimers are fixed at c(1) = 0.091 mM and c(2) = 9.1 µM, adding up to a total feedstock concentration of approximately 0.1 mM, while the concentration of VCG oligomers, , is varied. (B) Unlike in pools in which all ligation processes occur, replication efficiency does not decrease at high VCG concentration if ligations not involving monomers are kinetically suppressed. Instead, replication efficiency approaches an asymptotic value of 100%, as erroneous V+V ligations are impossible. (C) The fraction of oligomers that are in a monomer-extension-competent state, r_1+V, depends on the total concentration of VCG oligomers. At low VCG concentration, most oligomers are single-stranded, and extension of oligomers by monomers is scarce. At high VCG concentration, r_1+V approaches the asymptotic value (Eq. (7), grey dashed line). In this limit, almost all oligomers are part of duplexes, which enable monomer addition once a monomer hybridizes to them. Thus, the asymptotic fraction of oligomers that are extended by monomers is determined not by the oligomer length but by the binding affinity of monomers to existing duplexes. In contrast, the threshold concentration at which depends on oligomer length (colored dashed lines): Longer oligomers reach higher r_1+V at lower VCG concentration.

Replication performance of multi-length VCG pools with kinetic suppression of ligation between oligomers.
(A) The pool contains reactive monomers as well as non-reactive dimers and VCG oligomers. The concentrations of monomers and dimers are fixed at c(1) = 0.091 mM and c(2) = 9.1 µM, adding up to a total feedstock concentration of approximately 0.1 mM, while the total concentration of VCG oligomers, , is varied. All VCG oligomers are assumed to have the same concentration. (B) At low VCG concentration, long oligomers are more likely to be in a monomer-extension-competent state than short oligomers, whereas at high VCG concentration, the trend reverses and short oligomers are more likely to be extended by monomers (“inversion of productivity”). The threshold concentration at which a short oligomer starts to outperform a longer oligomer depends on the lengths of the compared oligomers (dashed lines). (C) The mechanism underlying inversion of productivity can be understood from the pairwise competition of different VCG oligomers, e.g., 8-mers vs. 9-mers. Over the entire range of VCG concentrations, complexes with 8-mer templates have a lower relative equilibrium concentration than complexes with 9-mer templates (bottom two curves vs. top two curves). As the concentration of VCG oligomers increases, ligations of type exceed ligations of type , i.e., the fraction of 8-mers that are extended by monomers using a 9-mer as a template exceeds the corresponding fraction of extended 9-mers. (D) The equilibrium concentration of free oligomers decreases with increasing . For longer oligomers, the equilibrium fraction of free oligomers is lower, as they can form more stable complexes with longer hybridization sites. (E) Complexes in which 8-mers serve as template are less stable than complexes with 9-mer templates, explaining why complexes with 8-mer templates are less abundant than complexes with 9-mer templates (see panel C). Complexes with 9-mer templates have similar stability regardless of the length of the educt oligomer, i.e., and are similarly stable.
This similar stability, together with the higher concentration of free 8-mers compared to 9-mers (see panel D), is the reason why the fraction of monomer-extended 8-mers exceeds that of 9-mers (see panel C).

Characteristic timescales in the kinetic simulation.
(A) The equilibration timescale is determined based on the total hybridization energy of all strands in the pool, ΔG_tot. By fitting an exponential function to ΔG_tot, we obtain a characteristic timescale, τ* (vertical dotted line), which is then used to calculate the equilibration time as τ_eq = 5τ* (vertical dashed line). The horizontal dashed line shows the total hybridization energy expected in (de)hybridization equilibrium according to the coarse-grained adiabatic approach (Sections S3 and S4). (B) The correlation timescale is determined based on the autocorrelation of ΔG_tot. We obtain τ_corr (vertical dashed line) by fitting an exponential function to the autocorrelation. In both panels, we show simulation data for a VCG pool containing feedstock monomers at a concentration of as well as VCG oligomers of length L = 8 nt at a concentration of .
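Both timescales follow from standard exponential fits. The sketch below uses `scipy.optimize.curve_fit`; it assumes that ΔG_tot relaxes as ΔG_eq + a·exp(−t/τ*) and that the normalized autocorrelation decays as exp(−s/τ_corr), and it applies the factor 5 from panel (A) to obtain τ_eq. The function names, initial guesses, and the synthetic AR(1) test signal are our own choices, not the authors' analysis code.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import lfilter


def _relaxation(t, g_eq, amplitude, tau):
    """Assumed relaxation form of the total hybridization energy."""
    return g_eq + amplitude * np.exp(-t / tau)


def equilibration_time(t, delta_g_tot, factor=5.0):
    """Fit delta_g_tot(t) and return (tau_star, tau_eq = factor * tau_star)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(delta_g_tot, dtype=float)
    p0 = (y[-1], y[0] - y[-1], (t[-1] - t[0]) / 5)  # rough initial guess
    (_g_eq, _amp, tau_star), _cov = curve_fit(_relaxation, t, y, p0=p0)
    return float(tau_star), float(factor * tau_star)


def autocorrelation(x, max_lag):
    """Normalized autocorrelation of a stationary series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.mean(x * x)
    acf = [1.0] + [np.mean(x[:-k] * x[k:]) / var for k in range(1, max_lag + 1)]
    return np.array(acf)


def correlation_time(x, dt, max_lag):
    """Fit exp(-s / tau_corr) to the autocorrelation of x sampled every dt."""
    lags = dt * np.arange(max_lag + 1)
    acf = autocorrelation(x, max_lag)
    (tau_corr,), _cov = curve_fit(lambda s, tau: np.exp(-s / tau), lags, acf,
                                  p0=(dt * max_lag / 5,))
    return float(tau_corr)


def ar1_series(phi, n, seed=0):
    """Synthetic stationary signal whose autocorrelation is phi**lag."""
    noise = np.random.default_rng(seed).normal(size=n)
    return lfilter([1.0], [1.0, -phi], noise)


# Noise-free relaxation with tau* = 7: the fit recovers tau* ≈ 7 and tau_eq ≈ 35.
t = np.linspace(0.0, 50.0, 501)
print(equilibration_time(t, -100.0 + 40.0 * np.exp(-t / 7.0)))
# AR(1) with phi = 0.9 has tau_corr = -1 / ln(0.9), about 9.5 sampling steps.
print(correlation_time(ar1_series(0.9, 100_000), dt=1.0, max_lag=40))
```

Fitting the autocorrelation only up to a moderate maximum lag keeps the fit from being dominated by the noisy tail of the estimated autocorrelation.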

Simulation runtime of the full kinetic simulation for a VCG pool that includes monomers and VCG oligomers of length L = 8 nt.
The total concentration of feedstock monomers equals , while the total concentration of VCG oligomers equals . The energy contribution per matching nearest-neighbor block equals γ = −2.5 k_BT. The volume of the system is varied, and the time evolution is simulated until t = 5.0 · 10⁷ t₀. The runtime of the simulation scales linearly with the volume of the system.
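Linear runtime scaling is what one would expect from a stochastic simulation at fixed concentrations if the cost per reaction event does not grow with system size: molecule numbers, and hence the number of reaction events up to a fixed simulated time, grow in proportion to the volume. The helpers below (hypothetical names) convert a concentration and a volume into a molecule count and estimate the scaling exponent from measured (volume, runtime) pairs via a log-log fit; an exponent close to 1 corresponds to the linear scaling reported here. The sample numbers in the usage lines are made up for illustration.

```python
import numpy as np

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecule_count(concentration_molar: float, volume_liters: float) -> float:
    """Number of molecules at a given molar concentration in a given volume."""
    return concentration_molar * volume_liters * AVOGADRO


def scaling_exponent(volumes, runtimes) -> float:
    """Slope of log(runtime) vs. log(volume); 1.0 means linear scaling."""
    slope, _intercept = np.polyfit(np.log(volumes), np.log(runtimes), 1)
    return float(slope)


# 0.1 mM in one femtoliter corresponds to roughly 6.0e4 molecules.
print(molecule_count(1e-4, 1e-15))
# Made-up runtimes that double with the volume give an exponent of approximately 1.
print(scaling_exponent([1.0, 2.0, 4.0, 8.0], [3.0, 6.0, 12.0, 24.0]))
```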

System parameters used to compute the replication observables yield, y, and replication efficiency, η, based on the kinetic simulation.
The computed observables are shown in Fig. 2 of the main text.

System parameters used to compute the replication observables yield, y, and replication efficiency, η, based on the kinetic simulation.
The computed observables are shown in Fig. 2. Replication performance of VCG pools containing VCG oligomers of a single length (single-length VCG pools). (A) The pool contains a fixed concentration of monomers, , as well as VCG oligomers of a single length, L_V, at variable concentration (the VCG oligomers cover all possible subsequences of the genome and its complement at equal concentration). (B) The yield increases as a function of , because dimerizations become increasingly unlikely for high VCG concentrations. (C) The ligation share of different ligation types depends on the total VCG concentration: In the low concentration limit, dimerization (F+F) dominates; for intermediate concentrations, F+V ligations reach their maximum, while, for high concentrations, a substantial fraction of reactions are V+V ligations. The panel depicts the behavior for L_V = 6 nt. (D) Replication efficiency is limited by the small yield for small . In the limit of high , replication efficiency decreases due to the growing number of error-prone V+V ligations. Maximal replication efficiency is reached at intermediate VCG concentration. (E) V+V ligations are prone to the formation of incorrect products due to the short overlap between educt strand and template. In general, the probability of correct product formation, p_corr, depends on the choice of circular genome and as well as its mapping to the VCG pool. The probabilities listed here refer to a VCG pool with L_G = 16 nt, L_E = 2 nt and L_U = 3 nt. (F) The optimal equilibrium concentration ratio of free VCG strands to free feedstock strands, which maximizes replication efficiency, decays as a function of length (continuous line). The analytical scaling law (dashed line, Eq. (4equation.8)) captures this behavior. 
The window of close-to-optimal replication, within which efficiency deviates by no more than 1% from its optimum (shaded areas), widens with increasing L_V, enabling reliable replication without fine-tuning the concentration ratio to its optimal value. (G) Maximal replication efficiency, which is attained at the optimal VCG concentration depicted in panel F, increases as a function of L_V and approaches a plateau of 100%. For high efficiency, Eq. (5) provides a good approximation of the length dependence of η_max (dashed lines). The oligomer length at which replication efficiency equals 95% is determined using Eq. (5) (vertical dotted lines). (H) The unique motif length, L_U, increases logarithmically with the genome length L_G. The VCG oligomer length L_V at which the optimal replication efficiency reaches 95% (computed using Eq. (5)) exhibits the same logarithmic dependence on L_G.
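The near-optimal window shown in panel F can be read off any sampled efficiency curve. The sketch below uses a hypothetical helper, `near_optimal_window` (not from the paper), that walks outward from the sampled maximum of η while the efficiency stays within a tolerance of its optimum. It assumes the 1% criterion is a relative deviation (η ≥ 0.99 η_max); if an absolute deviation of one percentage point is meant, the threshold line changes accordingly. The window is resolved only to the grid spacing of the input, and the input values below are toy numbers, not simulation data.

```python
import numpy as np


def near_optimal_window(conc, eta, tol=0.01):
    """Return (c_opt, c_low, c_high) for a sampled efficiency curve eta(conc).

    Hypothetical helper. Assumes `conc` is sorted in ascending order and that
    the window is the contiguous region around the maximum where
    eta >= (1 - tol) * max(eta), i.e. a relative tolerance.
    """
    conc = np.asarray(conc, dtype=float)
    eta = np.asarray(eta, dtype=float)
    i_opt = int(np.argmax(eta))
    threshold = (1.0 - tol) * eta[i_opt]

    # Walk left from the optimum while efficiency stays above the threshold.
    lo = i_opt
    while lo > 0 and eta[lo - 1] >= threshold:
        lo -= 1
    # Walk right from the optimum in the same way.
    hi = i_opt
    while hi < len(eta) - 1 and eta[hi + 1] >= threshold:
        hi += 1
    return conc[i_opt], conc[lo], conc[hi]


# Toy input (not simulation data): efficiency peaks at c = 3.
c_opt, c_low, c_high = near_optimal_window([1, 2, 3, 4, 5],
                                           [0.50, 0.95, 0.96, 0.955, 0.70])
print(c_opt, c_low, c_high)
```

Walking outward from the maximum, rather than thresholding the whole array, keeps the window contiguous even if a noisy curve briefly re-enters the tolerance band far from the optimum.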

Schematic representation of complexes considered in the adiabatic approach.
(A) A duplex consists of two strands, which we refer to as W (Watson) and C (Crick). The relative position of the strands is characterized by the alignment index i; for the depicted duplex, i = −2. The length of the hybridization region is denoted L_o. (B) A triplex contains three strands. By convention, we denote the two strands on the same “side” of the complex as W1 and W2, and the complementary strand as C. The alignment indices i and j denote the positions of W1 and W2 relative to C. For the depicted triplex, i = −2 and j = 3. The lengths of the hybridization regions are denoted L_o,1 and L_o,2.
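The alignment-index convention lends itself to a few lines of geometry. The sketch below assumes positions measured in nucleotides along C with its left end at position 0, so that a Watson strand with alignment index a and length L_W occupies positions a to a + L_W − 1; the function names and the strand lengths in the example are hypothetical. It returns the overlap lengths L_o for any number of Watson strands on a single Crick strand, and flags neighboring Watson strands whose ends meet at a template-covered position, which is the geometric precondition for templated ligation.

```python
def overlap_length(align, length_w, length_c):
    """Overlap L_o (in nt) of a Watson strand with the Crick strand.

    Assumed convention: C occupies positions 0 .. length_c - 1, and a Watson
    strand with alignment index `align` occupies align .. align + length_w - 1.
    """
    return max(0, min(length_c, align + length_w) - max(0, align))


def watson_overlaps(watsons, length_c):
    """Overlaps for several Watson strands hybridized to one Crick strand.

    `watsons` is a list of (alignment index, length) pairs. Also returns, for
    each pair of neighbouring Watson strands, whether their ends meet at a
    position covered by the template (precondition for templated ligation).
    """
    ordered = sorted(watsons)
    for (a1, l1), (a2, _) in zip(ordered, ordered[1:]):
        if a1 + l1 > a2:
            raise ValueError("Watson strands overlap each other")
    l_o = [overlap_length(a, l, length_c) for a, l in ordered]
    # The junction lies between positions a2 - 1 and a2; both must be on C.
    adjacent = [a1 + l1 == a2 and 0 < a2 < length_c
                for (a1, l1), (a2, _) in zip(ordered, ordered[1:])]
    return l_o, adjacent


# Depicted duplex (i = -2) with hypothetical lengths L_W = 5 nt, L_C = 8 nt.
print(overlap_length(-2, 5, 8))               # 3
# Depicted triplex (i = -2, j = 3) with hypothetical lengths 5 nt and 4 nt.
print(watson_overlaps([(-2, 5), (3, 4)], 8))  # ([3, 4], [True])
```

The same function handles complexes with three Watson strands on one template by passing three (alignment index, length) pairs.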

Schematic representation of a 3-1 tetraplex.
Three strands (in the following referred to as Watson strands W1, W2, and W3) hybridize to a single template strand (Crick strand C). Their positions relative to the left end of the C strand are given by the alignment indices i, j, and k; here, i = −2, j = 2, k = 5. The lengths of the overlap regions are denoted L_o,1, L_o,2 and L_o,3.

Schematic representation of a left-tilted 2-2 tetraplex.
(A) Two Watson strands (W1 and W2) are hybridized to two Crick strands (C1 and C2). Both Watson strands are hybridized to the left Crick strand C1, whereas only W2 is hybridized to the right Crick strand C2. The alignment indices i, j and k denote the positions of the strands relative to the left end of C1; here, i = −2, j = 3 and k = 6. The lengths of the hybridization regions are denoted L_o,1, L_o,2 and L_o,3. (B) Rotating the schematic representation of a left-tilted 2-2 tetraplex by 180° produces an alternative representation of the same complex, which is again a left-tilted 2-2 tetraplex. The panel depicts the rotated tetraplex representation (variables with superscript “rot”) as well as the un-rotated representation (variables without superscript). There is a unique linear mapping between the un-rotated and the rotated representation, e.g., C2 after rotation always corresponds to W1 before rotation.

Schematic representation of a right-tilted 2-2 tetraplex.
(A) Two Watson strands (W1 and W2) are hybridized to two Crick strands (C1 and C2). Unlike in the left-tilted 2-2 tetraplex, both Watson strands are hybridized to the right Crick strand C2, whereas only W1 is hybridized to the left Crick strand C1. The alignment indices i, j, and k denote the positions of the strands relative to C1; here, i = 1, k = 3, and j = 6. The lengths of the overlap regions are denoted L_o,1, L_o,2 and L_o,3. (B) Rotating the schematic representation of a right-tilted 2-2 tetraplex by 180° produces an alternative representation of the same complex, which is again a right-tilted 2-2 tetraplex. The panel depicts the rotated tetraplex representation (variables with superscript “rot”) as well as the un-rotated representation (variables without superscript). There is a unique linear mapping between the un-rotated and the rotated representation, e.g., C2 after rotation always corresponds to W1 before rotation. The mapping is identical for left- and right-tilted 2-2 tetraplexes.
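The rotation mapping can be made concrete with a short sketch. It assumes that a 180° rotation reflects every position, x → −x, and swaps top and bottom strands; the captions state only the pair W1 → C2, and the remaining correspondences (W2 → C1, C1 → W2, C2 → W1) are inferred from the same rotation. Positions are re-anchored so that the new C1 starts at 0. `rotate_tetraplex`, `ROTATION_MAP`, and the strand lengths in the example are hypothetical; because the captions state that the mapping is the same for both tilts, one function covers left- and right-tilted complexes.

```python
# Strand correspondence under a 180-degree rotation (old name -> new name).
# W1 -> C2 is stated in the caption; the other pairs are inferred.
ROTATION_MAP = {"W1": "C2", "W2": "C1", "C1": "W2", "C2": "W1"}


def rotate_tetraplex(strands):
    """Rotate a 2-2 tetraplex by 180 degrees.

    `strands` maps names to (start, length), with positions measured in nt
    relative to the left end of C1. Returns the rotated representation in the
    same convention, i.e. re-anchored so that the new C1 starts at 0.
    """
    reflected = {}
    for old, (start, length) in strands.items():
        # x -> -x maps the interval [start, start + length - 1]
        # onto [-(start + length - 1), -start].
        reflected[ROTATION_MAP[old]] = (-(start + length - 1), length)
    shift = -reflected["C1"][0]
    return {name: (start + shift, length)
            for name, (start, length) in reflected.items()}


# Left-tilted tetraplex with i = -2, j = 3, k = 6 and hypothetical lengths.
before = {"C1": (0, 6), "W1": (-2, 4), "W2": (3, 5), "C2": (6, 4)}
after = rotate_tetraplex(before)
print(after)  # W1 at -2, W2 at 2, C2 at 6 relative to the new C1
```

In this example the rotated complex is again left-tilted, and since the strand correspondence is an involution, rotating twice returns the original representation.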

Schematic representation of a complex that allows for templated ligation.
The strands E1 and E2 are adjacent to each other, such that a covalent bond can form between their ends. The length of the product strand is the sum of the educt lengths, L_P = L_e,1 + L_e,2. The probability that the complex forms a product oligomer whose sequence is compatible with the true circular genome, p_corr, is determined by the lengths of the educts and the lengths of their hybridization regions with the template. The hatched parts of the complex do not affect p_corr.

Effective association constants of complexes facilitating F+F ligations (A), false V+V ligations (B), and F+V ligations (C).
The dots depict the effective association constants derived from the combinatoric rules presented in S4; the solid lines represent the respective scaling laws introduced in Eqs. (S4–S7). Different colors correspond to different hybridization energies per matching nearest-neighbor block, γ.

Genomes sampled via the Metropolis-Hastings algorithm using the motif entropy as “Hamiltonian”.
The table summarizes the sequence of the genome, its characteristic length scales L_E and L_U, as well as the motif entropy on all length scales of interest. The keyword “bias” is used to distinguish two different sampling procedures: Weakly biased genomes are designed to obey the desired length scales L_E and L_U while retaining a close-to-uniform motif distribution for subsequences of length L_E < L < L_U, whereas the motif distribution is far from uniform for strongly biased genomes.
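A minimal sketch of this kind of sampling follows, under several assumptions not stated in the caption: motifs are counted on the circular genome and on its reverse complement, the motif entropy S_L is the Shannon entropy (natural log) of the motif frequencies at length L, and the "Hamiltonian" is minus the summed motif entropy over a chosen set of lengths, so that β > 0 favors near-uniform motif distributions. The helpers also compute L_E (largest L for which all 4^L motifs occur) and L_U (smallest L for which every motif occurs exactly once). All function names are hypothetical, and the paper's exact Hamiltonian and bias settings are not reproduced here.

```python
import math
import random
from collections import Counter

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def motifs(genome, length):
    """All length-`length` subsequences of a circular genome and of its
    reverse complement (assumed: both strands count)."""
    strands = [genome, "".join(COMPLEMENT[b] for b in reversed(genome))]
    out = []
    for s in strands:
        wrapped = s + s[:length - 1]  # wrap around the circle
        out.extend(wrapped[i:i + length] for i in range(len(s)))
    return out


def motif_entropy(genome, length):
    """Shannon entropy (nats) of the motif frequency distribution."""
    counts = Counter(motifs(genome, length))
    n = sum(counts.values())
    return -sum(c / n * math.log(c / n) for c in counts.values())


def exhaustive_length(genome):
    """L_E: largest L such that all 4**L motifs of length L occur."""
    L = 0
    while L < len(genome) and len(set(motifs(genome, L + 1))) == 4 ** (L + 1):
        L += 1
    return L


def unique_length(genome):
    """L_U: smallest L such that every motif of length L occurs exactly once
    (None if no such L exists)."""
    for L in range(1, len(genome) + 1):
        m = motifs(genome, L)
        if len(set(m)) == len(m):
            return L
    return None


def sample_genome(length_g, lengths, beta, n_steps, seed=0):
    """Metropolis-Hastings over genome sequences with the hypothetical
    Hamiltonian H = -sum_L S_L (minus the summed motif entropy)."""
    rng = random.Random(seed)
    genome = [rng.choice("ACGU") for _ in range(length_g)]

    def energy(seq):
        s = "".join(seq)
        return -sum(motif_entropy(s, L) for L in lengths)

    e = energy(genome)
    for _ in range(n_steps):
        # Symmetric proposal: mutate one random position to one of the 3 other bases.
        pos = rng.randrange(length_g)
        old = genome[pos]
        genome[pos] = rng.choice([b for b in "ACGU" if b != old])
        e_new = energy(genome)
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            e = e_new          # accept the mutation
        else:
            genome[pos] = old  # reject and restore the previous base
    return "".join(genome)


genome = sample_genome(16, lengths=[1, 2, 3], beta=20.0, n_steps=2000)
print(genome, exhaustive_length(genome), unique_length(genome))
```

Because the proposal picks a position uniformly and a replacement base uniformly among the three alternatives, it is symmetric, so the plain Metropolis acceptance ratio suffices.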

Replication efficiency as a function of the concentration of VCG oligomers in the pool for different choices of genomes and varying VCG oligomer length.
All genomes are L_G = 64 nt long and include all motifs up to length L_E = 2 nt, but differ in their minimal unique subsequence length L_U (dotted lines): (A) L_U = 6 nt, (B) L_U = 8 nt, and (C) L_U = 10 nt. For comparison, every panel shows the replication efficiency of a genome with L_E = 3 nt, L_U = 4 nt (solid line). Different colors distinguish different VCG oligomer lengths. Under otherwise identical conditions (e.g., identical oligomer length), replication proceeds with lower efficiency for genomes with a larger unique subsequence length L_U.

Maximal replication efficiency as a function of the oligomer length for different genomes (all L_G = 64 nt long).
Regardless of the genome, the oligomer length needs to exceed L_U to enable replication with high efficiency (e.g., higher than 95%). The difference between the oligomer length required for high-efficiency replication and the unique motif length L_U depends on the motif distribution on intermediate length scales (L_E < L < L_U): genomes with strong bias require longer oligomers (A) than genomes with weak bias (B) (see Table S2 for the genomes and their motif entropies).

Length-modulated enhanced ligation in pools containing tetramers and octamers for strong binding affinity, γ = −5.0 k_BT.
The replication efficiency reaches its maximum in the concentration regime that supports templated ligation of tetramers on octamer templates.

Length-modulated enhanced ligation in pools containing heptamers and octamers for weak binding affinity, γ = −2.5 k_BT.
The replication efficiency reaches its maximum in the concentration regime that is dominated by the addition of monomers to the VCG oligomers.

Length-modulated enhanced ligation in pools containing heptamers and octamers for strong binding affinity, γ = −5.0 k_BT.
The replication efficiency reaches its maximum in the concentration regime that is dominated by the addition of monomers to the VCG oligomers.

Effective association constants of complexes facilitating 1+V ligations (A) and 2+V ligations (B).

The effective association constants of reactive ternary complexes (complexes comprising three strands) can be computed based on the effective association constant of the duplexes.
(A) If the hybridization region of the two VCG oligomers is L_V − 1 nucleotides long, the monomer hybridizes to the end (start) of the template strand. As the template has no dangling end, the energy contribution of the hybridizing monomer is γ/2. (B) and (C) For hybridization regions that are shorter than L_V − 1 but at least 1 nucleotide long, the energy contribution of the added nucleotide is γ. In all cases, the prefactor 2 accounts for the two possible positions at which a monomer can be added.
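The rule stated in this caption can be written as K_3 = 2 K_2 exp(−ΔG/k_BT), with ΔG = γ/2 when the overlap is L_V − 1 and ΔG = γ for overlaps between 1 and L_V − 2. The sketch below encodes that rule; it assumes energies in units of k_BT with γ < 0 for favourable binding, an association constant proportional to exp(−ΔG/k_BT), and no prefactors beyond the factor 2. The function name `ternary_assoc_constant` is hypothetical.

```python
import math


def ternary_assoc_constant(k_duplex, overlap, length_v, gamma):
    """Effective association constant of a VCG duplex plus one hybridized monomer.

    Assumes energies in units of k_B T (gamma < 0 favours binding) and
    K proportional to exp(-dG / k_B T); the factor 2 counts the two possible
    positions for the monomer.
    """
    if overlap == length_v - 1:
        d_g = gamma / 2   # template has no dangling end: contribution gamma/2
    elif 1 <= overlap < length_v - 1:
        d_g = gamma       # the monomer adds one full nearest-neighbour block
    else:
        raise ValueError("requires 1 <= overlap <= length_v - 1")
    return 2.0 * k_duplex * math.exp(-d_g)


# Example: L_V = 8 nt, gamma = -2.5 k_B T, duplex constant normalised to 1.
print(ternary_assoc_constant(1.0, 7, 8, -2.5))  # 2 * exp(1.25)
print(ternary_assoc_constant(1.0, 4, 8, -2.5))  # 2 * exp(2.5)
```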

Comparison between approximate (analytical) and true (numerical) solution of the (de)hybridization equilibrium in multi-length VCG pools.
(A) The approximation converges after roughly five iteration steps; the relative error already drops below 1% after the first iteration step. (B) Equilibrium concentrations as a function of total VCG concentration. Continuous lines show the numerical solution, while the dotted and dashed lines depict the approximations obtained in the zeroth and first iteration step, respectively. The feedstock concentration is fixed.
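The iterate-and-compare idea can be illustrated on the simplest possible case: a single pair of complementary strands A and B forming a duplex AB with association constant K. This is a toy model, not the multi-length scheme of the paper. The zeroth iterate treats all strands as free, each subsequent iterate applies mass conservation, a = a_tot / (1 + K b) and b = b_tot / (1 + K a), and the result is compared with the closed-form solution of the quadratic for the duplex concentration. The function names are hypothetical.

```python
import math


def free_conc_iterative(a_tot, b_tot, k, n_iter=50):
    """Fixed-point iteration for A + B <-> AB with association constant k.

    Iterate 0 treats all strands as free; every later iterate enforces mass
    conservation a = a_tot / (1 + k b), b = b_tot / (1 + k a).
    Returns the list of (a, b) iterates.
    """
    a, b = a_tot, b_tot
    history = [(a, b)]
    for _ in range(n_iter):
        a, b = a_tot / (1.0 + k * b), b_tot / (1.0 + k * a)  # uses previous a, b
        history.append((a, b))
    return history


def free_conc_exact(a_tot, b_tot, k):
    """Closed-form free concentrations from the quadratic for d = [AB]:
    k d^2 - (k (a_tot + b_tot) + 1) d + k a_tot b_tot = 0 (smaller root)."""
    p = k * (a_tot + b_tot) + 1.0
    d = (p - math.sqrt(p * p - 4.0 * k * k * a_tot * b_tot)) / (2.0 * k)
    return a_tot - d, b_tot - d


a_exact, _ = free_conc_exact(3.0, 1.0, 2.0)
for step, (a, _) in enumerate(free_conc_iterative(3.0, 1.0, 2.0, n_iter=5)):
    print(step, abs(a - a_exact) / a_exact)  # relative error of free A
```

The smaller root is the physical one because the duplex concentration cannot exceed min(a_tot, b_tot). In this toy model the iteration contracts by a factor below one per step, so the iterates approach the exact solution geometrically.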

Replication performance of multi-length VCG pools, in which only the monomers are activated.
The pool includes activated monomers as well as non-activated oligomers of length L = 2 nt up to L = 12 nt at variable total concentration. The system exhibits an inversion of productivity: 10-mers are more likely to be in a monomer-extension-competent state than 12-mers, and 8-mers are more likely to be extended by monomers than 10-mers. At the experimentally used concentration (vertical dashed line), 8-mers are more productive than 10-mers, which in turn are more productive than 12-mers. However, unlike in the experimental system, 6-mers are less productive than 8-mers and 10-mers.
