Toward stable replication of genomic information in pools of RNA molecules
Figures
Model.
(A) In the Virtual Circular Genome (VCG) scenario, a circular genome (depicted in green) as well as its sequence complement are encoded in a pool of oligomers (depicted in blue and orange). Collectively, the pool of oligomers encodes the whole sequence of the circular genome. Depending on their length, two types of oligomers can be distinguished: Long VCG oligomers specify a unique locus on the genome, while feedstock molecules (monomers or short oligomers) are too short to do so. (B) The length distribution of oligomers included in the VCG pool is assumed to be exponential. The concentration of feedstock and VCG oligomers as well as their respective length scales of exponential decay and can be varied independently. The set of included oligomer lengths can be restricted via , and , . (C) The hybridization energy of complexes is computed using a simplified nearest-neighbor model: Each full block comprised of two base pairs (depicted in pink) contributes , while dangling end blocks (depicted in blue) contribute . (D) Oligomers form complexes via hybridization reactions, or dehybridize from an existing complex. The ratio of hybridization and dehybridization rate is governed by the hybridization energy (Equation 1). If two oligomers are adjacent to each other in a complex, they can undergo templated ligation. (E) Based on the length of the reacting oligomers, we distinguish three types of templated ligation: Ligation of two feedstock molecules (F+F), ligation of a feedstock molecule to a VCG oligomer (F+V) and ligation of two VCG oligomers (V+V).
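The energy bookkeeping of panel C and the rate ratio of panel D can be illustrated with a minimal Python sketch. It assumes that a hybridization region of n base pairs contains n − 1 full nearest-neighbor blocks and at most two dangling-end blocks, that energies are expressed in units of the thermal energy, and that the rate ratio takes a Boltzmann form. The function names and the parameters `gamma` and `gamma_dangling` are hypothetical placeholders, and the exact form of Equation 1 is not reproduced here.

```python
import math


def hybridization_energy(n_paired, n_dangling_blocks, gamma, gamma_dangling):
    """Energy of one hybridization region in a simplified nearest-neighbor model.

    Assumptions (not taken from the source): a region of n_paired base pairs
    holds n_paired - 1 full blocks, each contributing `gamma`, and 0, 1, or 2
    dangling-end blocks, each contributing `gamma_dangling`.
    """
    if n_paired < 1:
        raise ValueError("a hybridization region needs at least one base pair")
    if n_dangling_blocks not in (0, 1, 2):
        raise ValueError("a region has at most two dangling-end blocks")
    n_full_blocks = n_paired - 1
    return n_full_blocks * gamma + n_dangling_blocks * gamma_dangling


def relative_off_rate(energy):
    """Dehybridization rate relative to hybridization rate (assumed Boltzmann form).

    `energy` is in units of the thermal energy; more negative energies
    correspond to more stable complexes and therefore slower dissociation.
    """
    return math.exp(energy)
```

With illustrative values such as `gamma = -2.5` and `gamma_dangling = -1.25`, a four-base-pair region with two dangling-end blocks has three full blocks and an energy of −10 in these units.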
Replication performance of VCG pools containing VCG oligomers of a single length (single-length VCG pools).
(A) The pool contains a fixed concentration of monomers, , as well as VCG oligomers of a single length, , at variable concentration (the VCG oligomers cover all possible subsequences of the genome and its complement at equal concentration). (B) The yield increases as a function of , because dimerizations become increasingly unlikely for high VCG concentrations. (C) The ligation share of different ligation types depends on the total VCG concentration: In the low concentration limit, dimerization (F+F) dominates; for intermediate concentrations, F+V ligations reach their maximum, while, for high concentrations, a substantial fraction of reactions are V+V ligations. The panel depicts the behavior for . (D) Replication efficiency is limited by the small yield for small . In the limit of high , replication efficiency decreases due to the growing number of error-prone V+V ligations. Maximal replication efficiency is reached at intermediate VCG concentration. (E) V+V ligations are prone to the formation of incorrect products due to the short overlap between educt strand and template. In general, the probability of correct product formation, , depends on the choice of circular genome and as well as its mapping to the VCG pool. The probabilities listed here refer to a VCG pool with , and . (F) The optimal equilibrium concentration ratio of free VCG strands to free feedstock strands, which maximizes replication efficiency, decays as a function of length (continuous line). The analytical scaling law (dashed line, Equation 2) captures this behavior. The window of close-to-optimal replication, within which efficiency deviates no more than 1% from its optimum (shaded areas), increases with , facilitating reliable replication without fine-tuning to match the optimal concentration ratio. (G) Maximal replication efficiency, which is attained at the optimal VCG concentration depicted in panel F, increases as a function of and approaches a plateau of 100%. For high efficiency, Equation 3 provides a good approximation of the length-dependence of (dashed lines). The oligomer length at which replication efficiency equals 95% is determined using Equation 3 (vertical dotted lines). (H) The unique motif length, , increases logarithmically with the length of the genome, . The VCG oligomer length at which the optimal replication efficiency reaches 95% (computed using Equation 3) exhibits the same logarithmic dependence on .
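The notion of a unique motif length in panel H can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the procedure used in the article: a motif length counts as unique if every window of that length occurs exactly once among all circular windows of the genome and, optionally, of its reverse complement. The function names and the `include_complement` switch are hypothetical. Under this counting, a genome of length G has 2G windows across both strands but only 4^L distinct motifs of length L, so the unique motif length cannot fall below roughly log₄(2G), in line with the logarithmic growth described in the caption.

```python
from collections import Counter

# Watson-Crick complement for an RNA alphabet.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq):
    """Return the reverse complement of an RNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def circular_windows(seq, length):
    """All windows of the given length on a circular sequence, one per start position."""
    doubled = seq + seq[:length - 1]
    return [doubled[i:i + length] for i in range(len(seq))]


def unique_motif_length(genome, include_complement=True):
    """Smallest window length at which every window occurs exactly once.

    Returns None if no such length exists, for example when the genome equals
    its own reverse complement and both strands are counted.
    """
    strands = [genome]
    if include_complement:
        strands.append(reverse_complement(genome))
    for length in range(1, len(genome) + 1):
        counts = Counter()
        for strand in strands:
            counts.update(circular_windows(strand, length))
        if all(count == 1 for count in counts.values()):
            return length
    return None


def pigeonhole_lower_bound(genome_length):
    """Smallest L with 4**L >= 2 * genome_length (both strands counted)."""
    length = 1
    while 4 ** length < 2 * genome_length:
        length += 1
    return length
```

Counting windows per start position keeps the check independent of where the circular sequence is cut open.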
Replication performance of pools containing VCG oligomers of two different lengths.
(A) The pool contains a fixed concentration of monomers, , as well as tetramers and octamers at variable concentration. The hybridization energy per nearest-neighbor block is . (B) Replication efficiency reaches its maximum for and significantly lower tetramer concentration, . Efficiency remains close to maximal on a plateau around the maximum spanning almost two orders of magnitude in tetramer and octamer concentration. In addition, efficiency exhibits a ridge of increased efficiency for high tetramer concentration and intermediate octamer concentration. (C) Complexes that facilitate templated ligation are grouped by the length of the template and the educts, . We distinguish complexes producing correct (labeled ‘c’) and false products (labeled ‘f’). For each relevant type of complex, we highlight the region in the concentration plane where it contributes most significantly, that is at least 20% of the total ligation flux. The plateau of high efficiency is dominated by the ligation of monomers to octamers, whereas the ridge of increased efficiency is due to the correct ligation of two tetramers templated by an octamer.
Replication performance in pools containing tetramers and octamers for strong binding affinity, .
Replication efficiency reaches its maximum in the concentration regime that supports templated ligation of tetramers on octamer templates.
Replication performance in pools containing heptamers and octamers for weak binding affinity, .
Replication efficiency reaches its maximum in the concentration regime dominated by the addition of monomers to the VCG oligomers.
Replication performance in pools containing heptamers and octamers for strong binding affinity, .
Replication efficiency reaches its maximum in the concentration regime dominated by the addition of monomers to the VCG oligomers.
Replication performance of multi-length VCG pools.
(A) The pool contains a fixed concentration of monomers, , as well as long oligomers in the range at variable concentration . The length dependence of the concentration profile is assumed to be uniform (for panels B, D, and E) or exponential (for panel C); its steepness is set by the parameter . (B) If the length distribution is uniform, reducing decreases the maximal efficiency, whereas increasing increases it. Pools containing a range of oligomer lengths are always outperformed by single-length VCGs (blue curve). (C) Assuming an exponential length distribution of VCG oligomers allows us to tune from a poorly-performing regime (dominated by oligomers of length ) to a well-performing regime (dominated by oligomers of length ). In the limit , approaches the replication efficiency of single-length pools containing only oligomers of length (dashed lines). (D) For high , replication is dominated by primer extension of the long oligomers in the VCG (here ). In this limit, addition of shorter oligomers leaves the dominant F+V ligations almost unchanged. (E) Reducing for fixed increases the fraction of unproductive (i.e. dimerization) or erroneous ligation reactions.
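The uniform and exponential concentration profiles of panel A can be sketched in a few lines of Python. The sketch assumes a weight proportional to exp(−κL) for oligomers of length L, normalized to a given total concentration. The sign convention for κ, the function name, and the argument names are assumptions made for illustration and may differ from the article's definition.

```python
import math


def length_profile(total_concentration, min_length, max_length, kappa=0.0):
    """Concentration per oligomer length for a uniform or exponential profile.

    Assumed convention (not confirmed by the source): the weight of length L
    is exp(-kappa * L). kappa = 0 gives a uniform profile, kappa > 0 favors
    short oligomers, and kappa < 0 favors long oligomers.
    """
    lengths = range(min_length, max_length + 1)
    weights = [math.exp(-kappa * length) for length in lengths]
    norm = sum(weights)
    return {
        length: total_concentration * weight / norm
        for length, weight in zip(lengths, weights)
    }
```

For strongly negative `kappa` under this convention, nearly all of the material sits at the longest included length, which mirrors the single-length limit mentioned for panel C.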
Replication performance of single-length VCG pools containing monomers and dimers as feedstock.
(A) The pool contains a fixed total concentration of feedstock, , partitioned into monomers and dimers, as well as VCG oligomers of a single length, . The proportion of monomers and dimers can be adjusted via , and the concentration of the VCG oligomers is a free parameter, . (B) Replication efficiency exhibits a maximum at intermediate VCG concentration in systems with (dashed blue curve) and without dimers (solid blue curve). The presence of dimers reduces replication efficiency significantly, as they enhance the ligation share of incorrect F+V ligations (dashed green curve). The panel depicts the behavior for and . (C) Optimal replication efficiency increases as a function of oligomer length, , and asymptotically approaches a plateau (dashed lines, Equation 4). The value of this plateau, , is determined by the competition between correct and false 2+V reactions, both of which grow exponentially with . Thus, depends on the relative concentration of the dimers in the pool: the more dimers are included, the lower is . (D) Erroneous 1+V ligations are possible if the educt oligomer has a short overlap region with the template. The hybridization energy for such configurations is small and independent of the length of the VCG oligomers (left). While 2+V ligations may produce incorrect products via the same mechanism (middle), incorrect product can also be caused by complexes in which two VCG oligomers hybridize perfectly to each other, but the dimer has a dangling end. The stability of these complexes increases exponentially with oligomer length (right).
Replication performance of single-length VCG pools with kinetic suppression of ligation between oligomers.
(A) The pool contains reactive monomers alongside non-reactive dimers and VCG oligomers of a single length. The concentrations of monomers and dimers are fixed, and , adding up to a total feedstock concentration of , while the concentration of VCG oligomers, , is varied. (B) Unlike in pools in which all ligation processes occur, replication efficiency does not decrease at high VCG concentration if ligations that do not involve monomers are kinetically suppressed. Instead, replication efficiency approaches an asymptotic value of 100%, as erroneous V+V ligations are impossible. (C) The fraction of oligomers that are in a monomer-extension-competent state depends on the total concentration of VCG oligomers. At low VCG concentration, most oligomers are single-stranded, and extension of oligomers by monomers is scarce. At high VCG concentration, approaches the asymptotic value (grey dashed line, Figure 7.). In this limit, almost all oligomers form duplexes, which facilitate monomer addition upon hybridization of a monomer. Thus, the asymptotic fraction of oligomers that gets extended by monomers is not determined by the oligomer length, but by the binding affinity of monomers to existing duplexes. Conversely, the threshold concentration at which depends on oligomer length (colored dashed lines): Longer oligomers reach higher at lower VCG concentration.
Replication performance of multi-length VCG pools with kinetic suppression of ligation between oligomers.
(A) The pool contains reactive monomers as well as non-reactive dimers and VCG oligomers. The concentrations of monomers and dimers are fixed, and , adding up to a total feedstock concentration of , while the total concentration of VCG oligomers, , is varied. All VCG oligomers are assumed to have the same concentration. (B) At low VCG concentration, long oligomers are more likely to be in a monomer-extension-competent state than short oligomers, whereas at high VCG concentration, the trend reverses and short oligomers are more likely to be extended by monomers (‘productivity inversion’). The threshold concentration at which a short oligomer starts to outperform a longer oligomer depends on the lengths of the compared oligomers (dashed lines). (C) The mechanism underlying productivity inversion can be understood based on the pair-wise competition of different VCG oligomers, for example 8-mers vs. 9-mers. Over the entire range of VCG concentrations, complexes with 8-mer templates have a lower relative equilibrium concentration than complexes with 9-mer templates (bottom two curves vs. top two curves). As the concentration of VCG oligomers is increased, ligations of type exceed ligations of type , that is, the fraction of 8-mers that are extended by monomers using a 9-mer as a template exceeds the fraction of extended 9-mers. (D) The equilibrium concentration of free oligomer decreases with increasing . For longer oligomers, the equilibrium fraction of free oligomers is lower, as they can form more stable complexes with longer hybridization sites. (E) Complexes in which 8-mers serve as template are less stable than complexes with 9-mer templates, explaining why complexes with 8-mer templates are less abundant than complexes with 9-mer templates (see panel C). Complexes with 9-mer templates have similar stability regardless of the length of the educt oligomer, that is, and are similarly stable. This similar stability, together with the higher concentration of free 8-mers compared to 9-mers (see panel D), is the reason why the fraction of monomer-extended 8-mers exceeds that of 9-mers (see panel C).
Simulation runtime of the full kinetic simulation for a VCG pool that includes monomers and VCG oligomers of length .
The total concentration of feedstock monomers equals , while the total concentration of VCG oligomers is . The energy contribution per matching nearest-neighbor block is set to . The volume of the system is varied, and the time evolution is simulated until . The runtime of the simulation scales linearly with the volume of the system.
Characteristic timescales in the kinetic simulation.
(A) The equilibration timescale is determined based on the total hybridization energy of all strands in the pool, . By fitting an exponential function to , we obtain a characteristic timescale (vertical dotted line), which is then used to calculate the equilibration time as (vertical dashed line). The horizontal dashed line shows the total hybridization energy expected in (de)hybridization equilibrium according to the coarse-grained adiabatic approach (Methods). (B) The correlation timescale is determined based on the autocorrelation of . We obtain (vertical dashed line) by fitting an exponential function to the autocorrelation. In both panels, we show simulation data obtained for a VCG pool containing monomers and VCG oligomers with a concentration of as well as oligomers of length with a concentration of .
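Both timescales in this figure rest on fitting an exponential function, which can be sketched with NumPy as follows. The sketch assumes a single-exponential relaxation toward a known plateau and uses a log-linear least-squares fit instead of a nonlinear fit. The function names are hypothetical, and the factor that converts the fitted timescale into the equilibration time is not reproduced here.

```python
import numpy as np


def fit_relaxation_time(times, signal, plateau):
    """Fit signal(t) = plateau + A * exp(-t / tau) and return tau.

    Uses a straight-line fit to log|signal - plateau|, which assumes a
    single-exponential decay and a known plateau value.
    """
    times = np.asarray(times, dtype=float)
    deviation = np.abs(np.asarray(signal, dtype=float) - plateau)
    mask = deviation > 0  # the logarithm is undefined at exactly zero deviation
    slope, _intercept = np.polyfit(times[mask], np.log(deviation[mask]), 1)
    return -1.0 / slope


def autocorrelation(series, max_lag):
    """Normalized autocorrelation of a stationary series for lags 0..max_lag-1."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    variance = np.dot(x, x) / len(x)
    return np.array([
        np.dot(x[:len(x) - lag], x[lag:]) / ((len(x) - lag) * variance)
        for lag in range(max_lag)
    ])
```

A correlation time could then be estimated by passing the autocorrelation values and their lags to `fit_relaxation_time` with a plateau of zero, restricted to lags at which the autocorrelation is still clearly above the noise level.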
Schematic representation of complexes considered in the adiabatic approach.
(A) A duplex consists of two strands, which we refer to as W (Watson) and C (Crick) strands. The relative position of the strands is characterized by the alignment index ; for the depicted duplex, . The length of the hybridization region is called . (B) A ternary complex contains three strands. By convention, we denote the two strands that are on the same ‘side’ of the complex as W1 and W2, and the complementary strand as C. The alignment indices and denote the positions of W1 and W2 relative to C. For the depicted complex, and . The lengths of the hybridization regions are called and .
Schematic representation of a 3-1 quaternary complex.
Three strands (referred to as Watson strands W1, W2, and W3) hybridize to a single template strand (Crick strand C). The positions relative to the left end of the C strand are given by the alignment indices , and ; here, . The length of the overlap regions is denoted as , and .
Schematic representation of a left-tilted 2-2 quaternary complex.
(A) Two Watson strands (W1 and W2) are hybridized to two Crick strands (C1 and C2). Both Watson strands are hybridized to the left Crick strand C1, whereas only W2 is hybridized to the right Crick strand C2. The alignment indices and denote the position of the strands relative to the left end of C1; here, , and . The length of the hybridization regions is called , , and . (B) Rotating the schematic representation of a left-tilted 2-2 quaternary complex by produces an alternative representation of the same complex, which is again a left-tilted 2-2 complex. The panel depicts the rotated complex representation (variables with superscript ‘rot’) as well as the non-rotated representation (variables without superscript). There is a unique linear mapping between non-rotated and rotated representation, for example C2 after rotation always corresponds to W1 before rotation.
Schematic representation of a right-tilted 2-2 quaternary complex.
(A) Two Watson strands (W1 and W2) are hybridized to two Crick strands (C1 and C2). Unlike in the left-tilted 2-2 quaternary complex, both Watson strands are hybridized to the right Crick strand C2, whereas only W1 is hybridized to the left Crick strand C1. The alignment indices , and denote the positions of the strands relative to C1; here, , , and . The length of the overlap regions is called and . (B) Rotating the schematic representation of a right-tilted 2-2 quaternary complex by produces an alternative representation of the same complex, which is again a right-tilted 2-2 complex. The panel depicts the rotated complex representation (variables with superscript ‘rot’) as well as the non-rotated representation (variables without superscript). There is a unique linear mapping between non-rotated and rotated representation, for example C2 after rotation always corresponds to W1 before rotation. The mapping is identical for left- and right-tilted 2-2 quaternary complexes.
Schematic representation of a complex that allows templated ligation.
The strands E1 and E2 are adjacent to each other, such that a covalent bond can form between their ends. The length of the product strand, , is set by the length of the educt strands, and . The likelihood for the complex to form a product oligomer whose sequence is compatible with the true circular genome, , is determined by the length of the educts and the length of their hybridization region with the template. The parts of the complex shown with hatching do not affect .
Effective association constants of complexes facilitating F+F ligations (A), false V+V ligations (B), and F+V ligations (C).
The dots depict the effective association constants derived from the combinatorial rules presented in the Methods section, and the solid lines represent the respective scaling laws introduced in Equations 10 and 11. Different colors correspond to different hybridization energies per matching nearest-neighbor block, γ.
Replication efficiency as a function of the concentration of VCG oligomers for different choices of genomes and varying VCG oligomer length.
All genomes are long and include all motifs up to length , but differ with respect to their minimal unique subsequence length : (A) , (B) , and (C) (shown as dotted lines). For comparison, every panel shows the replication efficiency of a genome with (solid line). Different colors are used to distinguish different VCG oligomer lengths. Under otherwise identical conditions (e.g. identical oligomer length), replication proceeds with lower efficiency in genomes with higher unique subsequence length .
Maximal replication efficiency as a function of the oligomer length for different genomes (all long).
Regardless of the genome, the oligomer length needs to exceed to enable replication with high efficiency (e.g., higher than 95%). The difference between the oligomer length required for high-efficiency replication and the unique motif length depends on the motif distribution on intermediate length scales : Genomes with strong bias require longer oligomers (A) than genomes with weak bias (B) (see Supplementary file 1 for the genomes and their motif entropies).
Effective association constants of complexes facilitating 1+V ligations (A) and 2+V ligations (B).
The effective association constants of reactive ternary complexes can be computed based on the effective association constant of the duplexes.
(A) If the hybridization region of the two VCG oligomers is nucleotides long, the monomer hybridizes to the end (start) of the template strand. As the template has no dangling end, the energy contribution of the hybridizing monomer is . (B) and (C) For hybridization regions that are shorter than , but at least 1 nucleotide long, the energy contribution due to the added nucleotide is . In all cases, the factor 2 accounts for the two possible positions at which a monomer might be added.
Comparison between approximate (analytical) and exact (numerical) solution of the (de)hybridization equilibrium in multi-length VCG pools.
(A) The approximation converges after roughly five iteration steps. The relative error already drops below 1% in the first iteration step. (B) Equilibrium concentrations as a function of total VCG concentration. Continuous lines show the numerical solution, while the dotted and dashed lines depict the approximation obtained in the zeroth or first iteration step, respectively. The feedstock concentration is fixed, .
Replication performance of multi-length VCG pools, in which only the monomers are activated.
The pool includes activated monomers , as well as non-activated oligomers of length up to with variable total concentration . The system exhibits inversion of productivity: 10-mers are more likely to be in a monomer-extension-competent state than 12-mers, and 8-mers are more likely to be extended by monomers than 10-mers (for provided ). For the experimentally used concentration (vertical dashed line), 8-mers are more productive than 10-mers, which in turn are more productive than 12-mers. However, unlike in the experimental system, 6-mers are less productive than 8-mers and 10-mers.
Tables
Input parameters and resulting observables (yield and efficiency) from the full kinetic simulation of replication in pools containing monomers and VCG oligomers of a single length . The observables (yield and efficiency) listed in this table are shown in Figure 2.
| VCG oligomer length | Concentration ratio | Volume | Equilibration time | Correlation time | Number of samples | Yield | Efficiency |
|---|---|---|---|---|---|---|---|
| 6 | 1.0 ⋅ 10⁻⁴ | 5.0 ⋅ 10⁴ | 3.4 ⋅ 10⁶ | 1.9 ⋅ 10⁶ | 3805 | 0.04 ± 0.01 | 0.04 ± 0.01 |
| 6 | 1.0 ⋅ 10⁻³ | 5.0 ⋅ 10³ | 1.2 ⋅ 10⁷ | 2.6 ⋅ 10⁶ | 3264 | 0.38 ± 0.02 | 0.36 ± 0.02 |
| 6 | 3.3 ⋅ 10⁻³ | 8.0 ⋅ 10² | 1.3 ⋅ 10⁷ | 2.7 ⋅ 10⁶ | 5400 | 0.68 ± 0.02 | 0.64 ± 0.02 |
| 6 | 1.0 ⋅ 10⁻² | 9.1 ⋅ 10¹ | 1.4 ⋅ 10⁷ | 2.7 ⋅ 10⁶ | 5440 | 0.87 ± 0.01 | 0.77 ± 0.03 |
| 6 | 3.3 ⋅ 10⁻² | 9.1 ⋅ 10⁰ | 1.3 ⋅ 10⁷ | 2.4 ⋅ 10⁶ | 6170 | 0.96 ± 0.01 | 0.63 ± 0.03 |
| 7 | 1.0 ⋅ 10⁻⁴ | 3.9 ⋅ 10⁴ | 1.7 ⋅ 10⁸ | 2.6 ⋅ 10⁷ | 784 | 0.33 ± 0.05 | 0.33 ± 0.05 |
| 7 | 1.0 ⋅ 10⁻³ | 7.6 ⋅ 10² | 1.9 ⋅ 10⁸ | 4.0 ⋅ 10⁷ | 2041 | 0.87 ± 0.02 | 0.81 ± 0.05 |
| 7 | 3.3 ⋅ 10⁻³ | 7.7 ⋅ 10¹ | 1.9 ⋅ 10⁸ | 3.3 ⋅ 10⁷ | 2980 | 0.95 ± 0.01 | 0.87 ± 0.04 |
| 7 | 1.0 ⋅ 10⁻² | 1.1 ⋅ 10¹ | 1.9 ⋅ 10⁸ | 2.6 ⋅ 10⁷ | 3465 | 0.99 ± 0.01 | 0.81 ± 0.05 |
| 7 | 3.3 ⋅ 10⁻² | 1.7 ⋅ 10⁰ | 1.9 ⋅ 10⁸ | 3.1 ⋅ 10⁷ | 3235 | 0.99 ± 0.04 | 0.73 ± 0.05 |
| 8 | 1.0 ⋅ 10⁻⁴ | 6.3 ⋅ 10³ | 2.5 ⋅ 10⁹ | 1.1 ⋅ 10⁸ | 466 | 0.81 ± 0.05 | 0.81 ± 0.05 |
| 8 | 1.0 ⋅ 10⁻³ | 9.9 ⋅ 10¹ | 1.9 ⋅ 10⁹ | 3.6 ⋅ 10⁸ | 615 | 0.99 ± 0.01 | 0.99 ± 0.01 |
| 8 | 3.3 ⋅ 10⁻³ | 1.6 ⋅ 10¹ | 1.0 ⋅ 10⁹ | 2.2 ⋅ 10⁸ | 1100 | 0.95 ± 0.03 | 0.95 ± 0.03 |
| 8 | 1.0 ⋅ 10⁻² | 3.8 ⋅ 10⁰ | 5.6 ⋅ 10⁸ | 1.4 ⋅ 10⁸ | 1700 | 1.00 ± 0.01 | 0.93 ± 0.05 |
| 8 | 3.3 ⋅ 10⁻² | 0.9 ⋅ 10⁰ | 4.9 ⋅ 10⁸ | 7.4 ⋅ 10⁷ | 3195 | 1.00 ± 0.03 | 0.82 ± 0.05 |
Additional files
- Supplementary file 1: List of genomes sampled via the Metropolis-Hastings algorithm for a genome length of 64 nucleotides. https://cdn.elifesciences.org/articles/104043/elife-104043-supp1-v1.pdf