Stochastic yield catastrophes and robustness in self-assembly

Abstract

A guiding principle in self-assembly is that, for high production yield, nucleation of structures must be significantly slower than their growth. However, details of the mechanism that impedes nucleation are broadly considered irrelevant. Here, we analyze self-assembly into finite-sized target structures employing mathematical modeling. We investigate two key scenarios to delay nucleation: (i) introducing a slow activation step for the assembling constituents and (ii) decreasing the dimerization rate. These scenarios have widely different characteristics. While the dimerization scenario exhibits robust behavior, the activation scenario is highly sensitive to demographic fluctuations. These demographic fluctuations ultimately disfavor growth compared to nucleation and can suppress yield completely. The occurrence of this stochastic yield catastrophe does not depend on model details but is generic as soon as number fluctuations between constituents are taken into account. From a broader perspective, our results reveal that stochasticity is an important limiting factor for self-assembly and that the specific implementation of the nucleation process plays a significant role in determining the yield.

eLife digest

The self-assembly of a large biological molecule from small building blocks is like finishing a puzzle of magnetic pieces by shaking the box. Even though each piece of the puzzle is attracted to its correct neighbours, the limited control makes it very hard to finish the puzzle in a short amount of time.

The problem becomes even more difficult if several copies of the same puzzle are assembled in one box. If several puzzles start at the same time, the different parts might steal pieces from each other, making it impossible to successfully complete any of the puzzles. This is called a depletion trap. If the box is only shaken and there is no real control over individual pieces, these traps occur at random.

Overcoming these random depletion traps is an important challenge when assembling nanostructures and other artificial molecules designed by humans without wasting many, potentially expensive, components. Previous studies have shown that when multiple copies of the same structure are assembled simultaneously, slowing the rate of initiation increases the yield of correctly-made structures. This prevents new structures from stealing pieces from existing structures before they are fully completed.

Now, Gartner, Graf, Wilke et al. have used a mathematical model to show that changing the way initiation is delayed leads to different yields. This was especially true for small systems where fluctuations in the availability of the different pieces strongly enhanced the initiation of new structures. In these cases, the self-assembly process terminated undesirably with many incomplete structures.

Nanostructures have various applications ranging from drug delivery to robotics. These findings suggest that in order to efficiently assemble biological molecules, the concentrations of the different building blocks need to be tightly controlled. A question for further research is to investigate strategies that reduce fluctuations in the availability of the building blocks to develop more efficient assembly protocols.

Introduction

Efficient and accurate assembly of macromolecular structures is vital for living organisms. Not only must resource use be carefully controlled, but malfunctioning aggregates can also pose a substantial threat to the organism itself (Jucker and Walker, 2013; Drummond and Wilke, 2009). Furthermore, artificial self-assembly processes have important applications in a variety of research areas like nanotechnology, biology, and medicine (Zhang, 2003; Whitesides and Grzybowski, 2002; Whitesides et al., 1991). In these areas, we find a broad range of assembly schemes. For example, while a large number of viruses assemble capsids from identical protein subunits, some others, like the Escherichia virus T4, form highly complex and heterogeneous virions encompassing many different types of constituents (Zlotnick et al., 1999; Zlotnick, 2003; Hagan, 2014; Leiman et al., 2010). Furthermore, artificially built DNA structures can reach up to Gigadalton sizes and can, in principle, comprise an unlimited number of different subunits (Ke et al., 2012; Reinhardt and Frenkel, 2014; Gerling et al., 2015; Wagenbauer et al., 2017). Notwithstanding these differences, a generic self-assembly process always includes three key steps: First, subunits must be made available, for example by gene expression, or rendered competent for binding, for example by nucleotide exchange (Alberts and Johnson, 2015; Chen et al., 2008; Whitelam, 2015) (‘activation’). Second, the formation of a structure must be initiated by a nucleation event (‘nucleation’). Due to cooperative or allosteric effects in binding, there might be a significant nucleation barrier (Chen et al., 2008; Jacobs and Frenkel, 2015; Sear, 2007; Lazaro and Hagan, 2016; Hagan and Elrad, 2010). Third, following nucleation, structures grow via aggregation of substructures (‘growth’). 
To avoid kinetic traps that may occur due to irreversibility or very slow disassembly of substructures (Hagan et al., 2011; Grant et al., 2011), structure nucleation must be significantly slower than growth (Zlotnick et al., 1999; Ke et al., 2012; Reinhardt and Frenkel, 2014; Wei et al., 2012; Jacobs et al., 2015; Hagan and Elrad, 2010). Physically speaking, there are no irreversible reactions. However, in the biological context, self-assembly describes the (relatively fast) formation of long-lasting, stable structures. Therefore, at least part of the assembly reactions are often considered to be irreversible on the time scale of the assembly process. In this manuscript we investigate, for a given target structure, whether the nature of the specific mechanism employed in order to slow down nucleation influences the yield of assembled product. To address this question, we examine a generic model that incorporates the key elements of self-assembly outlined above.

Model definition

We model the assembly of a fixed number of well-defined target structures from limited resources. Specifically, we consider a set of $S$ different species of constituents denoted by $1, \dots, S$ which assemble into rings of size $L$ . The cases $S = 1$ and $1 < S \leq L$ ( $S = L$ ) are denoted as homogeneous and partially (fully) heterogeneous, respectively. The homogeneous model builds on previous work on virus capsid assembly (Chen et al., 2008; Hagan et al., 2011), linear protein filament assembly (Michaels et al., 2016; Michaels et al., 2017; D'Orsogna et al., 2012), and aggregation and polymerization models (Krapivsky et al., 2010). The heterogeneous model in turn links to previous model systems used to study, for example, DNA-brick-based assembly of heterogeneous structures (Murugan et al., 2015; Hedges et al., 2014; D'Orsogna et al., 2013). We emphasize that, even though strikingly similar experimental realizations of our model exist (Gerling et al., 2015; Wagenbauer et al., 2017; Praetorius and Dietz, 2017), it is not intended to describe any particular system. The ring structure represents a general linear assembly process involving building blocks with equivalent binding properties and resulting in a target of finite size. The main assumption in the ring model is that the different constituents assemble linearly in a sequential order. In many biological self-assembling systems, like bacterial flagellum assembly or the biogenesis of the ribosome subunits, the assumption of a linear binding sequence appears to be justified (Peña et al., 2017; Chevance and Hughes, 2008). In order to test the validity of our results beyond these constraints, we also perform stochastic simulations of generalized self-assembling systems that do not obey a sequential binding order: (i) by explicitly allowing for polymer-polymer binding and (ii) by considering the assembly of finite-sized squares that grow independently in two dimensions (see Figures 6 and 7).

The assembly process starts with $N$ inactive monomers of each species. We use $C = N / V$ to denote the initial concentration of each monomer species, where $V$ is the reaction volume. Monomers are activated independently at the same per capita rate $α$ and, once active, are available for binding. Binding takes place only between constituents of species with periodically consecutive indices, for example 1 and 2 or $S$ and 1 (leading to structures such as $\dots 1231 \dots$ for $S = 3$ ); see Figure 1. To avoid ambiguity, we restrict ring sizes to integer multiples of the number of species $S$ . Furthermore, we neglect the possibility of incorrect binding, for example species 1 binding to 3 or $S - 1$ . Polymers, that is, incomplete ring structures, grow via consecutive attachment of monomers. For simplicity, polymer-polymer binding is disregarded at first, as it is typically assumed to be of minor importance (Zlotnick et al., 1999; Chen et al., 2008; Murugan et al., 2015; Haxton and Whitelam, 2013). To probe the robustness of the model, we later consider an extended model including polymer-polymer binding, for which the results are qualitatively the same (see Figure 6 and the discussion). Furthermore, nucleation phenomena have been observed to play a critical role in self-assembly processes (Ke et al., 2012; Wei et al., 2012; Reinhardt and Frenkel, 2014; Chen et al., 2008). It is therefore generally necessary to take into account a critical nucleation size, which marks the transition between slow particle nucleation and the faster subsequent structure growth (Michaels et al., 2016; Lazaro and Hagan, 2016; Morozov et al., 2009; Murugan et al., 2015). We denote this critical nucleation size by $L_{nuc}$ , which in terms of classical nucleation theory corresponds to the structure size at which the free energy barrier has its maximum. 
For $l < L_{nuc}$ , attachment of monomers to existing structures and decay of structures into monomers (reversible binding) take place at size-dependent reaction rates $μ_{l}$ and $δ_{l}$ , respectively (Figure 1). Here, we focus on identical rates $μ_{l} = μ$ and $δ_{l} = δ$ . A discussion of the general case is given in Appendix 4. Above the nucleation size, polymers grow by attachment of monomers with reaction rate $ν \geq μ$ per binding site. As we consider successfully nucleated structures to be stable on the observational time scales, monomer detachment from structures above the critical nucleation size is neglected (irreversible binding) (Murugan et al., 2015; Chen et al., 2008). Complete rings neither grow nor decay (absorbing state).

Figure 1

Schematic description of the model.

(a) Rings of size $L$ are assembled from $S$ different particle species. $N$ monomers of each species are initially in an inactive state (blue) and are activated at the same per-capita rate $α$ . Once active (green), species with periodically consecutive index can bind to each other. Structures grow by attachment of single monomers. Below a critical nucleation size ( $L_{nuc}$ ), structures of size $l$ (light yellow) grow and decay into monomers at size-dependent rates $μ_{l}$ and $δ_{l}$ , respectively. Above the critical size, polymers (dark yellow) are stable and grow at size-independent rate $ν$ until the ring is complete (the absorbing state; red). (b) Illustration of depletion traps. If nucleation is slow compared to growth, initiated structures are likely to be completed. Otherwise, many stable nuclei will form that cannot be completed before resources run out.

We investigate two scenarios for the control of nucleation speed, first separately and then in combination. For the ‘activation scenario’ we set $μ = ν$ (all binding rates are equal) and control the assembly process by varying the activation rate $α$ . For the ‘dimerization scenario’ all particles are inherently active ( $α \to \infty$ ) and we control the assembly process by varying the dimerization rate $μ$ (we focus on $L_{nuc} = 2$ ). It has been demonstrated previously (Chen et al., 2008; Endres and Zlotnick, 2002; Hagan and Elrad, 2010; Morozov et al., 2009) that either a slow activation or a slow dimerization step is in principle suitable to retard nucleation and favour growth of the structures over the initiation of new ones. We quantify the quality of the assembly process in terms of the assembly yield, defined as the number of successfully assembled ring structures relative to the maximal possible number $N S / L$ . Yield is measured when all resources have been used up and the system has reached its final state. We do not discuss the assembly time in this manuscript; however, Appendix 5 shows typical trajectories for the time evolution of the yield in the activation and dimerization scenarios. Since the assembly product is stable (absorbing state), the yield can only increase with time. Consequently, the final yield is an upper limit for the yield irrespective of additional time constraints, which makes it an informative and unambiguous observable for the efficiency of the assembly reaction.

We simulated our system both stochastically via Gillespie’s algorithm (Gillespie, 2007) and deterministically as a set of ordinary differential equations corresponding to chemical rate equations (see Appendix 1).
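To make the stochastic procedure concrete, here is a minimal Gillespie sketch of the ring model in plain Python, written for illustration and not taken from the authors' code. It assumes $L_{nuc} = 2$ (so every dimer is stable and no decay reactions occur), at least three species ( $S \geq 3$ , numbered from 0 here) with $L$ a multiple of $S$ , bimolecular propensities of the form `mu / V` or `nu / V` times the two particle numbers, and no polymer-polymer binding. The function name `simulate_ring_assembly` and the encoding of a polymer as `[start_species, length]` are hypothetical choices; only `random.Random.expovariate` and `random.Random.random` from the standard library are used.

```python
import random


def simulate_ring_assembly(S, L, N, alpha, mu, nu, V=1.0, seed=None):
    """Gillespie sketch of the ring model with L_nuc = 2 (every dimer is stable).

    Assumptions (ours): S >= 3 species numbered 0..S-1, L a multiple of S,
    bimolecular propensities rate / V * n_i * n_j, no polymer-polymer binding.
    Returns (final_yield, final_time).
    """
    if S < 3 or L % S != 0:
        raise ValueError("this sketch assumes S >= 3 and L divisible by S")
    rng = random.Random(seed)
    inactive = [N] * S   # inactive monomers per species
    active = [0] * S     # free active monomers per species
    polymers = []        # incomplete polymers as [start_species, length]
    rings = 0            # completed rings (absorbing state)
    t = 0.0

    while True:
        # (propensity, kind, species, polymer index) for every possible reaction
        events = []
        for s in range(S):
            if inactive[s]:
                events.append((alpha * inactive[s], "activate", s, None))
            nxt = (s + 1) % S
            if active[s] and active[nxt]:
                # dimerization: right end of species s binds left end of s + 1
                events.append((mu / V * active[s] * active[nxt], "dimerize", s, None))
        for i, (start, length) in enumerate(polymers):
            right = (start + length) % S  # species that extends the right end
            left = (start - 1) % S        # species that extends the left end
            if active[right]:
                events.append((nu / V * active[right], "grow_right", right, i))
            if active[left]:
                events.append((nu / V * active[left], "grow_left", left, i))

        total = sum(e[0] for e in events)
        if total == 0:
            break  # no reaction possible: final state reached
        t += rng.expovariate(total)  # exponentially distributed waiting time

        # choose a reaction with probability proportional to its propensity;
        # if round-off prevents a break, the last event is used
        pick = rng.random() * total
        acc = 0.0
        for rate, kind, s, i in events:
            acc += rate
            if pick < acc:
                break

        if kind == "activate":
            inactive[s] -= 1
            active[s] += 1
        elif kind == "dimerize":
            active[s] -= 1
            active[(s + 1) % S] -= 1
            polymers.append([s, 2])  # L >= 3, so a dimer is never a complete ring
        else:
            active[s] -= 1
            if kind == "grow_left":
                polymers[i][0] = s  # the new monomer becomes the leftmost species
            polymers[i][1] += 1
            if polymers[i][1] == L:
                rings += 1
                polymers.pop(i)

    return rings * L / (N * S), t
```

Setting `mu = nu` and sweeping `alpha` (averaging over many seeds) corresponds to the activation scenario, while a very large `alpha` approximates the dimerization scenario. This sketch has not been validated against the figures of this article and is meant to illustrate the algorithm rather than reproduce the results.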

Results

Deterministic behavior in the macroscopic limit

First, we consider the macroscopic limit, $N ≫ 1$ , and investigate how assembly yield depends on the activation rate $α$ (activation scenario) and the dimerization rate $μ$ (dimerization scenario) for $L_{nuc} = 2$ . Here, the deterministic description coincides with the stochastic simulations (Figure 2a and b). For both high activation and high dimerization rates, yield is very poor. Upon decreasing either the activation rate (Figure 2a) or the dimerization rate (Figure 2b), however, we find a threshold value, $α_{th}$ or $μ_{th}$ , below which a rapid transition to the perfect yield of 1 is observed both in the deterministic and stochastic simulation. By exploiting the symmetries of the system with respect to relabeling of species, one can show that, in the deterministic limit, the behavior is independent of the number of species $S$ (for fixed $L$ and $N$ , see Appendix 1). Consequently, all systems behave equivalently to the homogeneous system and yield becomes independent of $S$ in this limit. Note, however, that equivalent systems with differing $S$ have different total numbers of particles $S N$ and hence assemble different total numbers of rings.

Figure 2

Deterministic behavior in the macroscopic limit $N ≫ 1$ .

(a, b) Yield for different particle numbers $N$ (symbols) and ring sizes $L$ (colors) for $L_{nuc} = 2$ . Decreasing either (a) the activation rate (‘activation scenario’: $μ = ν$ ) or (b) the dimerization rate (‘dimerization scenario’: $α \to \infty$ ) achieves perfect yield. The stochastic simulation results (symbols) average over 16 realizations and agree exactly with the integration of the chemical rate equations (lines). The threshold values (Equation 1) are indicated by the vertical dashed lines. Plotting yield against the dimensionless quantity $α / (ν C)$ causes the curves for different $C$ to collapse into a single master curve (inset in a). For both scenarios there is no dependency on the number of species $S$ in the deterministic limit. (c, d) Illustration showing how depletion traps are avoided by either slow activation (c) or slow dimerization (d). If the activation or the dimerization rate is small (large) compared to the growth rate, assembly paths leading to complete rings are favored (disfavored). The color scheme is the same as in Figure 1. (e) Deterministically, the size distribution of polymers behaves like a wave, and is shown for large and small activation rate for $L = 60$ , $L_{nuc} = 2$ , $N = 10000$ and $μ = ν = 1$ . The distributions are obtained from a numerical integration of the deterministic mean-field dynamics, Equation 6, and are plotted for early, intermediate and final simulation times. The respective percentage of inactive monomers and complete rings is indicated by the symbols in the scale bar on the left or right.

Decreasing the activation rate reduces the concentration of active monomers in the system. Hence growth of the polymers is favored over nucleation, because growth depends linearly on the concentration of active monomers while nucleation shows a quadratic dependence. Likewise, lower dimerization rates slow down nucleation relative to growth. Both mechanisms therefore restrict the number of nucleation events, and ensure that initiated structures can be completed before resources become depleted (see Figure 2c and d).
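As a rough scaling sketch of this argument (our own simplification, ignoring factors of order one and not taken from the appendices), let $a$ denote the concentration of active monomers and $c_{pol}$ the concentration of stable but still incomplete polymers. A nucleation event requires two active monomers, while a growth step requires one active monomer and one polymer end:

$$J_{nuc} \propto μ a^{2}, \qquad J_{grow} \propto ν a \, c_{pol}, \qquad \frac{J_{nuc}}{J_{grow}} \propto \frac{μ}{ν} \frac{a}{c_{pol}} .$$

Slow activation keeps $a$ small, and slow dimerization keeps $μ / ν$ small; in both cases fewer new nuclei are formed per growth step.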

Mathematically, the deterministic time evolution of the polymer size distribution $c (l, t)$ is described by an advection-diffusion equation (Endres and Zlotnick, 2002; Yvinec et al., 2012) with advection and diffusion coefficients depending on the instantaneous concentration of active monomers (see Appendix 2). Solving this equation results in the wavefront of the size distribution advancing from small to large polymer sizes (Figure 2e). Yield production sets in as soon as the distance travelled by this wavefront reaches the maximal ring size $L$ . Exploiting this condition, we find that in the deterministic system for $L_{nuc} = 2$ , a non-zero yield is obtained if either the activation rate or the dimerization rate remains below a corresponding threshold value, that is if $α < α_{th}$ or $μ < μ_{th}$ , where

$$α_{th} = P_{α} \frac{ν}{μ} \frac{ν C}{(L - \sqrt{L})^{3}} \qquad \text{and} \qquad μ_{th} = P_{μ} \frac{ν}{(L - \sqrt{L})^{2}} \qquad (1)$$

(see Appendix 3) with proportionality constants $P_{α} = [\sqrt{π} Γ (2 / 3) / Γ (7 / 6)]^{3} / 3 \approx 5.77$ and $P_{μ} = π^{2} / 2 \approx 4.93$ . These relations generalize previous results (Morozov et al., 2009) to finite activation rates and to heterogeneous systems. A comparison between the threshold values given by Equation 1 and the simulated yield curves is shown in Figure 2a,b. The relations highlight important differences between the two scenarios (activation scenario: $μ = ν$ ; dimerization scenario: $α \to \infty$ ): while $α_{th}$ decreases cubically with the ring size $L$ , $μ_{th}$ decreases only quadratically. Furthermore, the threshold activation rate $α_{th}$ increases linearly with the initial monomer concentration $C$ . Consequently, for a fixed activation rate, the yield can be improved by increasing $C$ . In contrast, the threshold dimerization rate is independent of $C$ , and the yield curves for different $C$ coincide for $N ≫ 1$ . Finally, if $α$ is finite and $μ < ν$ , the interplay between the two slow-nucleation scenarios may lead to enhanced yield. This is reflected by the factor $ν / μ$ in $α_{th}$ , and we will return to this point when we discuss stochastic effects.
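A short sketch shows where the advection-diffusion picture comes from. Assume $L_{nuc} = 2$ and symmetric species concentrations, and let $a (t)$ be the concentration of active monomers per species and $c_{l} (t)$ the discrete counterpart of $c (l, t)$ (our notation, which may differ from Appendix 2). Since every incomplete polymer has two ends, each accepting one specific species at rate $ν$ per binding site, the rate equations for $3 \leq l \leq L - 1$ read

$$\frac{d c_{l}}{d t} = 2 ν a \, (c_{l - 1} - c_{l}) \approx - 2 ν a \, \partial_{l} c + ν a \, \partial_{l}^{2} c ,$$

where the last step uses the second-order expansion $c_{l - 1} \approx c_{l} - \partial_{l} c + \frac{1}{2} \partial_{l}^{2} c$ . In this approximation the distribution is advected with velocity $2 ν a (t)$ and spreads with diffusion coefficient $ν a (t)$ , so both coefficients are set by the instantaneous concentration of active monomers.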

In summary, for large particle numbers ( $N ≫ 1$ ), perfect yield can be achieved in two different ways, independently of the heterogeneity of the system: by decreasing either the activation rate (activation scenario) or the dimerization rate (dimerization scenario) below its respective threshold value.
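Equation 1 is straightforward to evaluate numerically. The following Python sketch (the function names are our own, hypothetical choices) computes the constants $P_{α}$ and $P_{μ}$ with `math.gamma` from the standard library and returns the deterministic thresholds for $L_{nuc} = 2$ . It encodes only the final formulas, not the derivation in Appendix 3.

```python
import math

# Proportionality constants of Equation 1
P_ALPHA = (math.sqrt(math.pi) * math.gamma(2 / 3) / math.gamma(7 / 6)) ** 3 / 3  # ~5.77
P_MU = math.pi ** 2 / 2                                                          # ~4.93


def alpha_threshold(L, C, mu, nu):
    """Deterministic threshold activation rate alpha_th for L_nuc = 2."""
    return P_ALPHA * (nu / mu) * nu * C / (L - math.sqrt(L)) ** 3


def mu_threshold(L, nu):
    """Deterministic threshold dimerization rate mu_th for L_nuc = 2 (independent of C)."""
    return P_MU * nu / (L - math.sqrt(L)) ** 2


if __name__ == "__main__":
    print(alpha_threshold(L=60, C=1.0, mu=1.0, nu=1.0))
    print(mu_threshold(L=60, nu=1.0))
```

For $L = 60$ with $μ = ν = 1$ and $C = 1$ (arbitrary units), this gives $α_{th} \approx 4.0 \times 10^{-5}$ and $μ_{th} \approx 1.8 \times 10^{-3}$ . Halving $μ$ at fixed $ν$ doubles $α_{th}$ , which reflects the factor $ν / μ$ .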

Stochastic effects in the case of reduced resources

Next, we consider the limit where the particle number becomes relevant for the physics of the system. In the activation scenario, we find a markedly different phenomenology if resources are sparse. Figure 3a shows the dependence of the average yield on the activation rate for different, low particle numbers in the completely heterogeneous case ( $S = L$ ). Here, we restrict our discussion to the average yield. The standard error of the mean is negligible due to the large number of simulations used to calculate the average yield. Still, due to the randomness in binding and activation, the yield can differ between individual simulations. A figure with the average yield and its standard deviation is shown in Appendix 6. For very low and very high average yield, the standard deviation has to be small because the yield is bounded between 0 and 1. For intermediate values of the average, the standard deviation is largest but still small compared to the average yield. Thus, the average yield suffices for an essential understanding of the assembly process. Whereas the deterministic theory predicts perfect yield for small activation rates, in the stochastic simulation the yield saturates at an imperfect value $y_{\max} < 1$ . Reducing the particle number $N$ decreases this saturation value $y_{\max}$ until no finished structures are produced ( $y_{\max} \to 0$ ). The magnitude of this effect strongly depends on the size of the target structure $L$ if the system is heterogeneous. Figure 3c shows a diagram characterizing different regimes for the saturation value of the yield, $y_{\max} (N, L)$ , as a function of the particle number $N$ and the size of the target structure $L$ for fully heterogeneous systems $(S = L)$ . We find that the threshold particle number $N_{y}^{t h}$ necessary to obtain a fixed yield $y$ increases nonlinearly with the target size $L$ . 
For the depicted range of $L$ , the dependence of the threshold for nonzero yield, $N_{> 0}^{t h}$ , on $L$ can approximately be described by a power law, $N_{> 0}^{t h} \sim L^{ξ}$ , with exponent $ξ \approx 2.8$ for $L \leq 600$ . Consequently, for $L = 600$ , more than 10⁵ rings must already be assembled in order to obtain a yield larger than zero. Appendix 8 contains two additional plots that show the dependence of $y_{\max}$ on $N$ for fixed $L$ and on $L$ for fixed $N$ , respectively. The suppression of the yield is caused by fluctuations (see explanation below) and is not captured by a deterministic description. Because these stochastic effects can decrease the yield from a perfect value in a deterministic description to zero (see Figure 3a), we term this effect ‘stochastic yield catastrophe’. For fixed target size $L$ and fixed maximum number of target structures $\frac{N S}{L}$ , $y_{\max}$ increases with decreasing number of species, see Figure 3d. In the fully homogeneous case, $S = 1$ , a perfect yield of 1 is always achieved for $α \to 0$ . The decrease of the maximal yield with the number of species $S$ thus suggests that, in order to obtain high yield, it is beneficial to design structures with as few different species as possible. In large part, this effect is due to the constraint $S N = const$ , whereby the more homogeneous systems (small $S$ ) require larger numbers of particles per species $N$ and, correspondingly, exhibit less stochasticity. If $N$ is fixed instead of $S N$ , the yield still initially decreases with increasing number of species $S$ but then quickly reaches a plateau and becomes independent of $S$ for $S ≫ 1$ , see Appendix 7. Moreover, increasing the nucleation size $L_{nuc}$ , and with it the reversibility of binding, also increases $y_{\max}$ , see Figure 3d. 
This indicates that, besides heterogeneity of the target structure, irreversibility of binding on the relevant time scale makes the system susceptible to stochastic effects.
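Exponents such as $ξ \approx 2.8$ are typically obtained as the slope of a straight-line fit in log-log coordinates. The helper below is a hypothetical sketch of such an ordinary least-squares fit; it is exercised here only on synthetic data that follow an exact power law, not on the thresholds underlying Figure 3c.

```python
import math


def fit_power_law_exponent(sizes, thresholds):
    """Least-squares slope xi of log(threshold) versus log(size).

    Hypothetical helper: assumes thresholds ~ A * size**xi with positive values.
    """
    xs = [math.log(v) for v in sizes]
    ys = [math.log(v) for v in thresholds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic example only: an exact power law with exponent 2.8
sizes = [20, 40, 80, 160, 320]
thresholds = [0.5 * size ** 2.8 for size in sizes]
xi = fit_power_law_exponent(sizes, thresholds)
```

With $ξ \approx 2.8$ , doubling the target size would raise the threshold particle number by a factor of roughly $2^{2.8} \approx 7$ .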

Figure 3

Stochastic effects in the case of reduced resources.

(a, b) Yield of the fully heterogeneous system ( $S = L$ ) for reduced numbers of particles (symbols) for $L = 60$ and $L_{nuc} = 2$ , averaged over 1024 ensembles. In the activation scenario, at low activation rates the yield saturates at an imperfect value $y_{\max}$ , which decreases with the number of particles (a). This finding disagrees with the deterministic prediction (black line) of perfect yield for $α \to 0$ . In contrast, the dimerization scenario robustly exhibits the maximal yield of 1 for small $N$ , in agreement with the deterministic prediction (black line) (b). (c) Diagram showing different regimes of $y_{\max} (N, L)$ as a function of the particle number $N$ and target size $L$ (for the fully heterogeneous system $S = L$ ) as obtained from stochastic simulations in the limit $α \to 0$ . The minimal number of particles necessary to obtain a fixed yield increases in a strongly nonlinear way with the target size. The symbols along the line $L = 60$ represent the saturation values of the yield curves in (a). (d) Dependence of $y_{\max}$ on the number of species $S$ for fixed $L = 60$ and fixed number of ring structures $N S / L$ . Symbols indicate different values of the critical nucleation size $L_{nuc}$ . Under the constraint of a fixed total number of particles $N S$ and fixed target size $L$ , the impact of stochastic effects strongly depends on the number of species. The homogeneous system is not subject to stochastic effects at all. Higher reversibility for larger $L_{nuc}$ also mitigates stochastic effects.

The stochastic yield catastrophe is mainly attributable to fluctuations in the number of active monomers. In the deterministic (mean-field) equation the different particle species evolve in balanced stoichiometric concentrations. However, if activation is much slower than binding, the number of active monomers present at any given time is small, and the mean-field assumption of equal concentrations is violated due to fluctuations (for $S > 1$ ). Activated monomers then might not fit any of the existing larger structures and would instead initiate new structures. Figure 4a illustrates this effect and shows how fluctuations in the availability of active particles lead to an enhanced nucleation and, correspondingly, to a decrease in yield. Due to the effective enhancement of the nucleation rate, the resulting polymer size distribution has a higher amplitude than that predicted deterministically (Figure 4b) and the system is prone to depletion traps. A similar broadening of the size distribution has been reported in the context of stochastic coagulation-fragmentation of identical particles (D'Orsogna et al., 2015).
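The role of the activation order can be isolated in the limit $α \to 0$ with $μ = ν$ , where all binding reactions are finished between consecutive activation events. The following is a minimal sketch of that limit under explicitly stated assumptions ( $L_{nuc} = 2$ , $S \geq 3$ species numbered from 0, $L$ a multiple of $S$ , no polymer-polymer binding; the function name is hypothetical). Because all species are activated at the same per-capita rate, the activation order is a uniformly random permutation. After each activation, the newly active monomer and any leftover free monomers bind at random, with every available monomer pair or binding site equally likely.

```python
import random


def yield_slow_activation_limit(S, L, N, seed=None):
    """Sketch of the alpha -> 0 limit of the activation scenario (mu = nu, L_nuc = 2).

    Assumptions (ours): S >= 3 species numbered 0..S-1, L a multiple of S,
    no polymer-polymer binding. Returns the final yield.
    """
    if S < 3 or L % S != 0:
        raise ValueError("this sketch assumes S >= 3 and L divisible by S")
    rng = random.Random(seed)
    order = [s for s in range(S) for _ in range(N)]
    rng.shuffle(order)   # equal per-capita rates => uniformly random activation order
    active = [0] * S     # free active monomers per species
    polymers = []        # incomplete polymers as [start_species, length]
    rings = 0

    for species in order:
        active[species] += 1
        while True:
            # all binding channels, weighted by the number of pairs or sites
            channels = []
            for s in range(S):
                if active[s] and active[(s + 1) % S]:
                    channels.append((active[s] * active[(s + 1) % S], "dimerize", s, None))
            for i, (start, length) in enumerate(polymers):
                right, left = (start + length) % S, (start - 1) % S
                if active[right]:
                    channels.append((active[right], "grow_right", right, i))
                if active[left]:
                    channels.append((active[left], "grow_left", left, i))
            if not channels:
                break  # nothing can bind: wait for the next activation
            _, kind, s, i = rng.choices(channels, weights=[c[0] for c in channels])[0]
            if kind == "dimerize":
                active[s] -= 1
                active[(s + 1) % S] -= 1
                polymers.append([s, 2])
            else:
                active[s] -= 1
                if kind == "grow_left":
                    polymers[i][0] = s
                polymers[i][1] += 1
                if polymers[i][1] == L:
                    rings += 1
                    polymers.pop(i)
    return rings * L / (N * S)
```

For example, with a single copy of each species ( $N = 1$ ) and $S = L = 6$ , the activation order 1, 4, 2, 5, 3, 6 (in the numbering of the text) leaves two fragments that can never be joined, so the yield is 0 instead of 1. The homogeneous case $S = 1$ is deliberately excluded because it would require counting binding orientations differently.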

Figure 4

Cause and effect of stochasticity in the activation scenario.

(a) Illustration of the significance of stochastic effects when resources are sparse. Arrows indicate possible transitions and the probabilities in the depicted situation for sufficiently small activation rate $α$ . For small $α$ , the random order of activation alone determines the availability of monomers and therefore the order of binding. In the depicted situation, the complete structure is assembled only with probability 1/2. In all other cases, only fragments of the structure are assembled such that the final yield is decreased. (b) Polymer size distribution for the activation scenario (symbols) as obtained from stochastic simulations, in comparison with its deterministic prediction (lines) for $S = L = 100$ , $N = 1000$ and $L_{nuc} = 2$ . Due to the enhanced number of nucleation events, the stochastic wave encompasses far more structures and moves more slowly. As a result, it does not quite reach the absorbing boundary.

In the dimerization scenario, in contrast, there is no stochastic activation step. All particles are available for binding from the outset. Consequently, stochastic effects do not play an essential role in the dimerization scenario and perfect yield can be reached robustly for all system sizes, regardless of the number of species $S$ (Figure 3(b)).

Non-monotonic yield curves for a combination of slow dimerization and activation

So far, the two implementations of the ‘slow nucleation principle’ have been investigated separately. We observe counter-intuitive behavior in a mixed scenario in which both dimerization and activation occur slowly (i.e., $μ < ν$ , $α < \infty$ ). Figure 5 shows that, depending on the ratio $μ / ν$ , the yield can become a non-monotonic function of $α$ . In the regime where $α$ is large, nucleation is dimerization-limited; therefore activation is irrelevant and the system behaves as in the dimerization scenario ( $α \to \infty$ ). Upon decreasing $α$ , we then encounter a second regime, where activation and dimerization jointly limit nucleation. Here the yield increases due to synergism between slow dimerization and activation (see the $μ / ν$ dependence of $α_{th}$ , Equation 1), while the average number of active monomers is still high and fluctuations are negligible. Finally, a stochastic yield catastrophe occurs if $α$ is reduced further and activation becomes the limiting step. This decline is caused by an increase in nucleation events due to relative fluctuations in the availability of the different species (‘fluctuations between species’). This contrasts with the deterministic description, in which nucleation is always slower for smaller activation rates. Depending on the ratio $μ / ν$ , the ring size $L$ and the particle number $N$ , maximal yield is obtained either in the dimerization-limited (red curves, Figure 5), activation-limited (blue curve, Figure 5b) or intermediate regime (green and orange curves, Figure 5).

Figure 5

Yield for a combination of slow dimerization and activation.

(a, b) Dependence of the yield of the fully heterogeneous system on the activation rate $α$ for $N = 100$ and different values of the dimerization rate (colors/symbols) for $L = 60$ (a) and $L = 40$ (b) (averaged over 1024 ensembles). For large activation rates the yield behaves deterministically (lines). In contrast, for small activation rates, stochastic effects (blue shading) lead to a decrease in yield. Depending on the parameters, the yield maximum is attained in either the deterministic, stochastic or intermediate regime. (c) Table summarizing the qualitative behavior of the yield (poor/intermediate/perfect) for a combination of dimerization and activation rates for both the deterministic and the stochastic limit. The columns correspond to low and high values of the dimerization rate, as indicated by the marker in the corresponding deterministic yield curve at the top of the column. Similarly, the rows correspond to low, intermediate and high activation rates. Arrows and colors indicate where and for which curve this behavior can be observed in (a) and (b). Deviations between the deterministic and stochastic limits are most prominent for low activation rates.

Robustness of the results to model modifications

In our model, the stochastic yield catastrophe arises because fluctuations between species strongly enhance the effective nucleation rate. Hence, if binding to a larger structure is temporarily impossible, activated monomers tend to initiate new structures, causing an excess of structures that ultimately cannot be completed. Natural questions are whether (i) relaxing the constraint that polymers cannot bind other polymers or (ii) abandoning the assumption of a linear assembly path will resolve the stochastic yield catastrophe. To answer these questions, we performed stochastic simulations of extensions of our model system, which show that the stochastic yield catastrophe persists. We start by considering the ring model from the previous section but take polymer-polymer binding into account in addition to growth via monomer attachment (Figure 6). In detail, we assume that two structures of arbitrary size (and with combined length $\leq L$ ) bind at rate $ν$ if they fit together, that is, if the left (right) end of the first structure is periodically continued by the right (left) end of the second one. Realistically, the rate of binding between two structures is expected to decrease with structure size, since larger structures are less mobile. In order to assess the effect of polymer-polymer binding, we focus on the worst case, where the rate for binding is independent of the size of both structures. If a stochastic yield catastrophe occurs for this choice of parameters, we expect it to be even more pronounced in all ‘intermediate cases’. Figure 6 shows the dependence of the yield on the activation rate in the polymer-polymer model. As before, yield increases below a critical activation rate and then saturates at an imperfect value for small activation rates. Decreasing the number of particles per species decreases this saturation value. 
Compared to the original model, the stochastic yield catastrophe is mitigated but still significant: for structures of size $S = L = 100$ , yield saturates at around 0.87 for $N = 100$ particles per species and at around 0.33 for $N = 10$ particles per species. We thus conclude that polymer-polymer binding alleviates the stochastic yield catastrophe but does not resolve it. Since binding only happens between consecutive species, structures with overlapping parts intrinsically cannot bind together, and depletion traps continue to occur. Taken together, in the extended model as well, fluctuations in the availability of the different species lead to an excess of intermediate-sized structures that become kinetically trapped due to structural mismatches. Note that in the extreme case of $N = 1$ , incomplete polymers can always combine into one final ring structure, so that in this case the yield is always 1. Analogously, for high activation rates yield is improved for $N = 10$ compared to $N \geq 50$ (Figure 6b).
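The "fit together" rule for polymer-polymer binding can be made concrete with a small helper. This is a minimal sketch, not the authors' simulation code: it assumes species are labelled 0, …, S−1 (the text uses 1, …, S) and represents each linear structure by the hypothetical pair (species of its leftmost monomer, length). It counts how many of the two possible junctions are compatible, so that a pair would bind at rate ν per binding site, as stated in the Figure 6 caption.

```python
def binding_sites(a, b, S, L):
    """Number of compatible junctions (0, 1 or 2) between linear structures a and b.

    Each structure is a hypothetical (start, length) pair, where `start` is the
    species of its leftmost monomer (labelled 0..S-1) and the target ring has L
    positions. The pair would then bind at rate nu * binding_sites(a, b, S, L).
    """
    (s_a, l_a), (s_b, l_b) = a, b
    if l_a + l_b > L:
        return 0  # combined length must not exceed the target size
    sites = 0
    if (s_a + l_a) % S == s_b:  # right end of a is continued by the left end of b
        sites += 1
    if (s_b + l_b) % S == s_a:  # right end of b is continued by the left end of a
        sites += 1
    return sites


# Fully heterogeneous ring with S = L = 10:
print(binding_sites((0, 6), (6, 3), 10, 10))  # [0..5] + [6..8] fit: 1
print(binding_sites((0, 6), (5, 3), 10, 10))  # both contain species 5: 0
print(binding_sites((8, 4), (2, 3), 10, 10))  # [8, 9, 0, 1] + [2, 3, 4] across the seam: 1
```

The second example is the structural mismatch described above: the two fragments overlap in species 5, so neither end of one can continue the other, and both stay trapped unless free monomers complete them.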

Figure 6

Extended model including polymer-polymer binding.

(a) In the extended model, structures not only grow by monomer attachment but also by binding with another polymer (colored arrow). As before, binding only happens between periodically consecutive species, with rate $ν$ per binding site. Hence, the reaction rate for two polymers equals that for monomer-polymer binding, $ν$ . Furthermore, only polymers with combined length $\leq L$ can bind. All other processes and rules are the same as in the original model described in Figure 1. (b) The yield of the extended model as obtained from stochastic simulations is shown as a function of the activation rate $α$ for $S = L = 100$ , $μ = ν = 1$ , $L_{nuc} = 2$ and different values of the number of particles per species, $N$ (averaged over 1024 ensembles). The qualitative behavior is the same as for the original model. In particular, yield saturates (in the stochastic limit) at an imperfect value for slow activation rates. Note that for small particle numbers, polymer-polymer binding increases the minimal yield (here for large activation rates). This is because polymers can combine into final structures even if, a priori, too many nucleation events occur.

Kinetic trapping due to structural mismatches can occur in every (partially) irreversible heterogeneous assembly process with a finite-sized target structure and limited resources. From our results, we thus expect a stochastic yield catastrophe to be common to such systems. To further test this hypothesis, we simulated another variant of our model in which finite-sized squares assemble via monomer attachment from a pool of initially inactive particles (Figure 7). In contrast to the original model, the assembled structures are non-periodic and exhibit a non-linear assembly path, in which structures can grow independently in two dimensions. While the ring model assumes a sequential order of binding of the monomers, the square allows for a variety of distinct assembly paths that all lead to the same final structure. Note that, because of the absence of periodicity, the square model is only well defined for the completely heterogeneous case. Figure 7 depicts the dependence of the yield on the activation rate for a square of size $S = 100$ . In this case, too, the yield saturates at an imperfect value for small activation rates. Hence, the stochastic yield catastrophe is resolved neither by accounting for polymer-polymer binding nor by considering more general assembly processes with multiple parallel assembly paths. This observation supports the general validity of our findings and indicates that stochastic yield catastrophes are a general phenomenon of (partially) irreversible and heterogeneous self-assembling systems that occurs if particle number fluctuations are non-negligible.

Figure 7

Assembly of squares of size $\sqrt{L} \times \sqrt{L}$ from $L$ different particle species.

(a) As in the ring models, there are $N$ monomers of each species in the system. All particles are initially in an inactive state (blue) and are activated at the same per-capita rate $α$ . Once active (green), species with neighboring positions within the square (left/right, up/down) can bind to each other. Structures grow by attachment of single monomers until the square is complete (absorbing state). Depending on the number $b$ of contacts between the monomer and the structure, the corresponding rate is $b ν$ . For simplicity, all polymers (yellow) are stable ( $L_{nuc} = 2$ ) and we do not consider polymer-polymer binding. (b) The yield of the square model as obtained from stochastic simulations is shown as a function of the activation rate $α$ for $S = L = 100$ , $μ = ν = 1$ and different values of the number of particles per species, $N$ (averaged over 256 ensembles). The qualitative behavior is the same as for the previous models: whereas the yield is poor for large activation rates, it strongly increases below a threshold value and saturates (in the stochastic limit) at an imperfect value $< 1$ for small activation rates. The saturation value decreases with decreasing number of particles in the system.
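The contact-dependent attachment rate $b ν$ of panel (a) can be sketched as follows. The sketch assumes that a structure is stored as the set of occupied (row, column) positions in the $\sqrt{L} \times \sqrt{L}$ square. This is a hypothetical representation: because the square is fully heterogeneous, each position corresponds to exactly one species. Only monomer attachment to an existing structure is covered.

```python
def attachment_rate(structure, site, side, nu=1.0):
    """Rate b * nu for the monomer belonging at grid position `site` to join `structure`.

    `structure` is a set of occupied (row, col) positions in a side x side square
    (hypothetical representation); b counts occupied left/right/up/down neighbours.
    """
    row, col = site
    if site in structure or not (0 <= row < side and 0 <= col < side):
        return 0.0  # position already filled or outside the square
    neighbours = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    b = sum(1 for n in neighbours if n in structure)
    return b * nu


# An L-shaped trimer in a 10 x 10 square (L = 100 species):
trimer = {(0, 0), (0, 1), (1, 0)}
print(attachment_rate(trimer, (1, 1), 10))  # two contacts: 2.0
print(attachment_rate(trimer, (0, 2), 10))  # one contact: 1.0
print(attachment_rate(trimer, (5, 5), 10))  # no contact: 0.0
```

Because a site with two occupied neighbours is filled twice as fast, corners inside a growing structure close quickly. Several distinct orders of attachment still lead to the same final square.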

Discussion

Our results show that different ways to slow down nucleation are indeed not equivalent, and that the explicit implementation is crucial for assembly efficiency. Susceptibility to stochastic effects is highly dependent on the specific scenario. Whereas systems for which dimerization limits nucleation are robust against stochastic effects, stochastic yield catastrophes can occur in heterogeneous systems when resource supply limits nucleation. The occurrence of stochastic yield catastrophes is not captured by the deterministic rate equations, for which the qualitative behavior of both scenarios is the same. Therefore, a stochastic description of the self-assembly process, which includes fluctuations in the availability of the different species, is required. The interplay between stochastic and deterministic dynamics can lead to a plethora of interesting behaviors. For example, the combination of slow activation and slow nucleation may result in a non-monotonic dependence of the yield on the activation rate. Deterministically, yield always improves as the activation rate decreases. Stochastically, however, fluctuations between species strongly suppress the yield for small activation rates by effectively enhancing the nucleation speed. This observation demonstrates that a deterministically slow nucleation speed is not sufficient to obtain high yield in heterogeneous self-assembly. In particular, a slow activation step does not necessarily result in few nucleation events, even though this is what the deterministic description predicts. Thus, our results indicate that the slow nucleation principle has to be interpreted within a stochastic framework, which has important implications for yield optimization.

We showed that demographic noise can cause stochastic yield catastrophes in heterogeneous self-assembly. However, other types of noise, such as spatiotemporal fluctuations induced by diffusion, are also expected to trigger stochastic yield catastrophes. Hence, our results have broad implications for complex biological and artificial systems, which typically exhibit various sources of noise. We characterize conditions under which stochastic yield catastrophes occur, and demonstrate how they can be mitigated. These insights could usefully inform the design of experiments to circumvent yield catastrophes: in particular, while slow provision of constituents is a feasible strategy for experiments, it is highly susceptible to stochastic effects. On the other hand, despite its robustness to stochastic effects, the experimental realization of the dimerization scenario relies on cooperative or allosteric effects in binding, and may therefore require more sophisticated design of the constituents (Sacanna et al., 2010; Zeravcic et al., 2017). Our theoretical analysis shows that stochasticity can be alleviated either by decreasing heterogeneity (presumably lowering realizable complexity) or by increasing reversibility (potentially requiring fine-tuning of bond strengths and reducing the stability of the assembly product). Alternative approaches to control stochasticity include the promotion of specific assembly paths (Murugan et al., 2015; Gartner, Graf and Frey, in preparation) and the control of fluctuations (Graf, Gartner and Frey, in preparation). These ideas and the ensuing control strategies could be tested in experiments based on DNA origami. Instead of building homogeneous ring structures as in Wagenbauer et al. (2017), one would have to design heterogeneous ring structures made from several different types of constituents with specified binding properties.
By varying the opening angle of the ‘wedges’ (and thus the preferred number of building blocks in the ring) and/or the number of constituents, both the target structure size $L$ and its heterogeneity $S$ could be controlled.

Moreover, the ideas presented in this manuscript are relevant for the understanding of intracellular self-assembly. In cells, provision of building blocks is typically a gradual process, as synthesis is either inherently slow or an explicit activation step, such as phosphorylation, is required. In addition, the constituents of the complex structures assembled in cells are usually present in small numbers and subject to diffusion. Hence, stochastic yield catastrophes would be expected to have devastating consequences for self-assembly, unless the relevant cellular processes use elaborate control mechanisms to circumvent stochastic effects. Further exploration of these control mechanisms should enhance the understanding of self-assembly processes in cells and help improve synthesis of complex nanostructures.

Materials and methods

All our simulation data was generated with either C++ or MATLAB. The source code is available at the eLife website.

Here we show the derivation of Equation 1 in the main text, giving the threshold values for the rate constants below which finite yield is obtained. The details can be found in Appendices 1–3.

Master equation and chemical rate equations

We start with the general Master equation and derive the chemical rate equations (deterministic/mean-field equations) for the heterogeneous self-assembly process. Rather than writing out the full Master equation, we state the system that describes the evolution of the first moments. To this end, we denote by $n_{ℓ}^{s} (t)$ the random variable that describes the number of polymers of size $ℓ$ and species $s$ in the system at time $t$ , with $2 \leq ℓ < L$ and $1 \leq s \leq S$ . The species of a polymer is defined by the species of the monomer at its left end. Furthermore, $n_{0}^{s}$ and $n_{1}^{s}$ denote the number of inactive and active monomers of species $s$ , respectively, and $n_{L}$ the number of complete rings. We denote the reaction rate for binding of a monomer to a polymer of size $ℓ$ by $ν_{ℓ}$ ; $α$ denotes the activation rate and $δ_{ℓ}$ the decay rate of a polymer of size $ℓ$ . Angular brackets $⟨ \dots ⟩$ indicate ensemble averages. The system governing the evolution of the first moments (the averages) of the $\{n_{ℓ}^{s}\}$ is then given by:

\frac{d}{d t} ⟨ n_{0}^{s} ⟩ = - α ⟨ n_{0}^{s} ⟩,

\frac{d}{d t} ⟨ n_{1}^{s} ⟩ = α ⟨ n_{0}^{s} ⟩ - \sum_{ℓ = 1}^{L - 1} ν_{ℓ} (⟨ n_{1}^{s} n_{ℓ}^{s + 1} ⟩ + ⟨ n_{1}^{s} n_{ℓ}^{s - ℓ} ⟩) + \sum_{ℓ = 2}^{L_{nuc} - 1} \sum_{k = s + 1 - ℓ}^{s} δ_{ℓ} ⟨ n_{ℓ}^{k} ⟩,

\frac{d}{d t} ⟨ n_{2}^{s} ⟩ = ν_{1} ⟨ n_{1}^{s} n_{1}^{s + 1} ⟩ - ν_{2} ⟨ n_{2}^{s} n_{1}^{s + 2} ⟩ - ν_{2} ⟨ n_{2}^{s} n_{1}^{s - 1} ⟩ - δ_{2} ⟨ n_{2}^{s} ⟩ \mathbb{1}_{\{2 < L_{nuc}\}},

\frac{d}{d t} ⟨ n_{ℓ}^{s} ⟩ = ν_{ℓ - 1} ⟨ n_{ℓ - 1}^{s} n_{1}^{ℓ + s - 1} ⟩ + ν_{ℓ - 1} ⟨ n_{ℓ - 1}^{s + 1} n_{1}^{s} ⟩ - ν_{ℓ} ⟨ n_{ℓ}^{s} n_{1}^{s + ℓ} ⟩ - ν_{ℓ} ⟨ n_{ℓ}^{s} n_{1}^{s - 1} ⟩ - δ_{ℓ} ⟨ n_{ℓ}^{s} ⟩ \mathbb{1}_{\{ℓ < L_{nuc}\}}, \quad 3 \leq ℓ < L,

\frac{d}{d t} ⟨ n_{L}^{s} ⟩ = ν_{L - 1} ⟨ n_{L - 1}^{s} n_{1}^{L + s - 1} ⟩ + ν_{L - 1} ⟨ n_{L - 1}^{s + 1} n_{1}^{s} ⟩ .

The different terms of these equations are illustrated graphically in Figure 8. The first equation describes the loss of inactive particles due to activation at rate $α$ . Equation 2b gives the temporal change of the number of active monomers, which is governed by the following processes: activation of inactive monomers at rate $α$ , binding of active monomers to the left or to the right end of an existing structure of size $ℓ$ at rate $ν_{ℓ}$ , and decay of below-critical polymers of size $ℓ$ into monomers at rate $δ_{ℓ}$ (disassembly). Equations 2c and 2d describe the dynamics of dimers and of larger polymers of size $3 \leq ℓ < L$ , respectively. The terms account for reactions of polymers with active monomers (polymerization) as well as decay in the case of below-critical polymers (disassembly). The indicator function $\mathbb{1}_{\{x < L_{nuc}\}}$ equals 1 if the condition $x < L_{nuc}$ is satisfied and 0 otherwise. Note that a polymer of size $ℓ \geq 3$ can be formed by attaching a monomer to either the left or the right end of a smaller polymer, whereas a dimer of a specific species can only be formed via one reaction pathway (dimerization reaction). Finally, polymers of length $L$ , the complete ring structures, form an absorbing state and therefore include only the respective gain terms (cf. Equation 2e).
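The Master equation underlying Equation 2 is simulated stochastically with Gillespie's algorithm. A minimal sketch of that procedure for the ring model is given below. It is not the authors' C++/MATLAB implementation, and the function name is hypothetical. The sketch assumes $L_{nuc} = 2$ (so there are no decay channels), $S \geq 2$ with $S$ dividing $L$, species labelled 0, …, S−1, and $S N \geq L$. Each term of Equation 2 becomes one reaction channel whose propensity is the rate times the current particle numbers. For example, $ν_{ℓ} n_{1}^{s} n_{ℓ}^{s+1}$ is the propensity for a monomer of species $s$ to attach to the left end of a polymer of species $s+1$.

```python
import random
from collections import Counter


def gillespie_ring_yield(S, L, N, alpha, mu, nu, seed=0):
    """Minimal Gillespie sketch of heterogeneous ring assembly with L_nuc = 2.

    Species are labelled 0..S-1; a polymer is keyed by (species of its leftmost
    monomer, length). Runs until no reaction is possible; returns (yield, state).
    """
    assert S >= 2 and L % S == 0
    rng = random.Random(seed)
    inactive, active = [N] * S, [0] * S
    polymers = Counter()  # (start species, length) -> count, for 2 <= length < L
    rings, t = 0, 0.0
    while True:
        # Enumerate all reaction channels together with their propensities.
        channels = []
        for s in range(S):
            if inactive[s]:
                channels.append((alpha * inactive[s], ("act", s)))
            if active[s] and active[(s + 1) % S]:
                channels.append((mu * active[s] * active[(s + 1) % S], ("dim", s)))
        for (s, ell), count in polymers.items():
            right, left = (s + ell) % S, (s - 1) % S
            if active[right]:
                channels.append((nu * count * active[right], ("right", s, ell)))
            if active[left]:
                channels.append((nu * count * active[left], ("left", s, ell)))
        total = sum(p for p, _ in channels)
        if total == 0:
            break  # absorbing configuration: nothing can react any more
        t += rng.expovariate(total)  # exponentially distributed waiting time
        # Select one channel with probability proportional to its propensity.
        threshold = rng.random() * total
        for p, move in channels:
            threshold -= p
            if threshold < 0.0:
                break
        kind = move[0]
        if kind == "act":
            inactive[move[1]] -= 1
            active[move[1]] += 1
            continue
        if kind == "dim":
            s = move[1]
            active[s] -= 1
            active[(s + 1) % S] -= 1
            new = (s, 2)
        else:
            _, s, ell = move
            polymers[(s, ell)] -= 1
            if polymers[(s, ell)] == 0:
                del polymers[(s, ell)]
            if kind == "right":
                active[(s + ell) % S] -= 1
                new = (s, ell + 1)
            else:
                active[(s - 1) % S] -= 1
                new = ((s - 1) % S, ell + 1)
        if new[1] == L:
            rings += 1  # complete rings are an absorbing state
        else:
            polymers[new] += 1
    return rings / (S * N // L), (inactive, active, polymers, rings, t)


y, (inactive, active, polymers, rings, t) = gillespie_ring_yield(
    S=6, L=6, N=5, alpha=0.1, mu=1.0, nu=1.0, seed=1)
print(y, rings)
```

Averaging the returned yield over many seeds gives an ensemble average of the kind shown in the figures. The naive enumeration of channels costs time proportional to the number of distinct polymer classes per step. That is adequate for a sketch but would be slow for runs with $L = 100$ and large $N$.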

Figure 8

Graphical illustration of Equations 2 and 6.

(a) Visualization of the gain and loss terms in the dynamics of the active monomers in Equation 2b. Gain of active monomers is due to activation of inactive monomers as well as decay of unstable polymers. Loss of active monomers is due to dimerization and attachment of monomers to larger polymers. (b) Visualization of the transitions between clusters of different sizes (without distinction of species). The first and second boxes represent the inactive and active monomers in the system; each subsequent box represents the ensemble of polymers of a certain size. The arrows between the boxes show possible reactions and transitions, with the reaction rates indicated accordingly. Each arrow starting from or leading to a box is associated with a corresponding loss or gain term on the right-hand side of Equation 2 and Equation 6.

We simulated the Master equation underlying Equation 2 stochastically using Gillespie’s algorithm. For the following deterministic analysis, we neglect correlations between the particle numbers $\{n_{ℓ}^{s}\}$ , which is a valid assumption for large particle numbers. Then the two-point correlator can be approximated as the product of the corresponding mean values (mean-field approximation)

⟨ n_{i}^{s} n_{j}^{k} ⟩ = ⟨ n_{i}^{s} ⟩ ⟨ n_{j}^{k} ⟩ \forall s, k

Furthermore, the expectation values must satisfy

⟨ n_{ℓ}^{s} ⟩ = ⟨ n_{ℓ}^{1} ⟩ \forall s

because all species have equivalent properties (no species is distinguished), and hence the system is invariant under relabelling of the upper index. By

c_{ℓ} := \frac{⟨ n_{ℓ}^{s} ⟩}{V},

we denote the concentration of any monomer or polymer species of size $ℓ$ , where $V$ is the reaction volume. Due to the symmetry formulated in Equation 4, the heterogeneous assembly process decouples into a set of $S$ identical and independent homogeneous assembly processes in the deterministic limit. The corresponding homogeneous system is then described by the following set of equations, obtained by applying Equations 3–5 to Equation 2:

\frac{d}{d t} c_{0} = - α c_{0},

\frac{d}{d t} c_{1} = α c_{0} - 2 c_{1} \sum_{ℓ = 1}^{L - 1} ν_{ℓ} c_{ℓ} + \sum_{ℓ = 2}^{L_{nuc} - 1} ℓ δ_{ℓ} c_{ℓ},

\frac{d}{d t} c_{2} = ν_{1} c_{1}^{2} - 2 ν_{2} c_{1} c_{2} - δ_{2} c_{2} \mathbb{1}_{\{2 < L_{nuc}\}},

\frac{d}{d t} c_{ℓ} = 2 ν_{ℓ - 1} c_{1} c_{ℓ - 1} - 2 ν_{ℓ} c_{1} c_{ℓ} - δ_{ℓ} c_{ℓ} \mathbb{1}_{\{ℓ < L_{nuc}\}}, \quad \text{for } 3 \leq ℓ < L,

\frac{d}{d t} c_{L} = 2 ν_{L - 1} c_{1} c_{L - 1} .

The rate constants $ν_{ℓ}$ in Equations 6 and 2 differ by a factor of $V$ . For convenience, however, we use the same symbol in both cases. The rate constants $ν_{ℓ}$ in Equation 6 can be interpreted in the usual units of $\mathrm{liter\,mol^{-1}\,s^{-1}}$ . Due to the symmetry, the yield, defined as the ratio of the number of completely assembled rings to the maximum possible number of complete rings, becomes independent of the number of species $S$

yield(t) = \frac{S c_{L} (t) V}{S N L^{- 1}} = \frac{c_{L} (t) V L}{N} .

Hence, it suffices to study the dynamics of the homogeneous system, Equation 6, to identify the condition under which nonzero yield is obtained.
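To see the deterministic yield emerge, Equation 6 can be integrated directly. The sketch below makes several assumptions. It takes $L_{nuc} = 2$ (so the decay terms vanish), $ν_{1} = μ$ and $ν_{ℓ} = ν$ for $ℓ \geq 2$, and measures concentrations in units of $C$. It uses a plain fixed-step fourth-order Runge–Kutta scheme, chosen for brevity rather than as the solver used for the paper's figures. All function names are hypothetical.

```python
def rhs(c, alpha, mu, nu, L):
    """Equation 6 with L_nuc = 2 (no decay), nu_1 = mu and nu_l = nu for l >= 2.

    c[0]: inactive monomers, c[1]: active monomers, c[l]: polymers of size l,
    c[L]: complete rings (all concentrations per species).
    """
    dc = [0.0] * (L + 1)
    dc[0] = -alpha * c[0]
    dc[1] = alpha * c[0] - 2.0 * c[1] * (mu * c[1] + nu * sum(c[2:L]))
    dc[2] = mu * c[1] ** 2 - 2.0 * nu * c[1] * c[2]
    for l in range(3, L):
        dc[l] = 2.0 * nu * c[1] * (c[l - 1] - c[l])
    dc[L] = 2.0 * nu * c[1] * c[L - 1]
    return dc


def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def deterministic_yield(L, alpha, mu=1.0, nu=1.0, C=1.0, dt=0.01, t_end=100.0):
    """Integrate Equation 6 from c_0(0) = C and return (yield, final state)."""
    c = [C] + [0.0] * L
    for _ in range(int(round(t_end / dt))):
        c = rk4_step(lambda y: rhs(y, alpha, mu, nu, L), c, dt)
    return L * c[L] / C, c  # Equation 7 with N / V = C


y_fast, c_fast = deterministic_yield(L=10, alpha=10.0, dt=0.01, t_end=100.0)
y_slow, c_slow = deterministic_yield(L=10, alpha=3e-3, dt=0.5, t_end=4000.0)
print(y_fast, y_slow)
```

Every term in Equation 6 only moves material between size classes, so the weighted sum $c_{0} + c_{1} + \sum_{ℓ \geq 2} ℓ c_{ℓ}$ stays equal to $C$. This is a convenient check on the integration. With $μ = ν = 1$ and $L = 10$, the slow-activation run should give a markedly higher yield than the fast one, in line with the deterministic picture.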

Effective description by an advection-diffusion equation

The dynamical properties of the evolution of the polymer-size distribution become evident if the set of ODEs, Equation 6, is rewritten as a partial differential equation. This approach was previously described in the context of virus capsid assembly (Zlotnick et al., 1999; Morozov et al., 2009). For simplicity, we restrict ourselves to the case $L_{nuc} = 2$ and let $ν_{1} = μ$ and $ν_{ℓ \geq 2} = ν$ . Then, for the polymers with $ℓ > 2$ we have

\partial_{t} c_{ℓ} = 2 ν c_{1} [c_{ℓ - 1} - c_{ℓ}] .

As a next step, we approximate the index $ℓ \in \{2, 3, \dots, L\}$ indicating the length of the polymer as a continuous variable $x \in [2, L]$ and define $c (x = ℓ) := c_{ℓ}$ . By $A := c_{1}$ we denote the concentration of active monomers in the following to emphasize their special role. Formally expanding the right-hand side of Equation 8 in a Taylor series up to second order

c (ℓ - 1) = c (ℓ) - \partial_{x} c (ℓ) + \frac{1}{2} \partial_{x}^{2} c (ℓ),

one arrives at the advection-diffusion equation with both advection and diffusion coefficients depending on the concentration of active monomers $A (t)$

\partial_{t} c (x) = - 2 ν A \partial_{x} c (x) + ν A \partial_{x}^{2} c (x) .
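The advection velocity $2 ν A$ and the diffusion coefficient $ν A$ can be traced back to the discrete chain, Equation 8, by integrating it with $A$ held fixed. Holding $A$ fixed is a simplifying assumption; in the model, $A (t)$ varies. A unit pulse started at $x = 2$ should then have its mean displaced by $2 ν A t$ and its variance grow as $2 (ν A) t$, which is exactly what the drift and diffusion terms above imply. The helper name and parameter values are hypothetical.

```python
def chain_moments(rate, t_end, n_sites=80, dt=0.01):
    """Integrate dc_l/dt = rate * (c_{l-1} - c_l), i.e. Equation 8 with constant
    rate = 2 * nu * A, from a unit pulse at the first site (x = 2) using RK4, and
    return the mean displacement and the variance of the resulting profile."""

    def f(y):
        return [rate * ((y[i - 1] if i > 0 else 0.0) - y[i]) for i in range(len(y))]

    c = [1.0] + [0.0] * (n_sites - 1)
    for _ in range(int(round(t_end / dt))):
        k1 = f(c)
        k2 = f([ci + 0.5 * dt * k for ci, k in zip(c, k1)])
        k3 = f([ci + 0.5 * dt * k for ci, k in zip(c, k2)])
        k4 = f([ci + dt * k for ci, k in zip(c, k3)])
        c = [ci + dt / 6.0 * (p + 2 * q + 2 * r + s)
             for ci, p, q, r, s in zip(c, k1, k2, k3, k4)]
    total = sum(c)
    mean = sum(i * ci for i, ci in enumerate(c)) / total
    var = sum((i - mean) ** 2 * ci for i, ci in enumerate(c)) / total
    return mean, var


# nu = 1 and A = 0.5 give rate 2 * nu * A = 1.
nu, A, t_end = 1.0, 0.5, 20.0
mean, var = chain_moments(rate=2 * nu * A, t_end=t_end)
print(mean, 2 * nu * A * t_end)   # advected distance vs. 2 nu A t
print(var, 2 * (nu * A) * t_end)  # spread vs. 2 D t with D = nu A
```

The discrete profile is a Poisson distribution in the number of attached monomers, whose mean and variance coincide. This is why the advective distance and the diffusive variance in the continuum description are both proportional to $2 ν \int A \, dt$.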

Equation 10 can be written in the form of a continuity equation $\partial_{t} c (x) = - \partial_{x} J (x)$ with flux $J = 2 ν A c - ν A \partial_{x} c$ . The flux at the left boundary $x = 2$ equals the influx of polymers due to dimerization of free monomers $J (2, t) = μ A^{2}$ . This enforces a Robin boundary condition at $x = 2$

2 ν A c (2, t) - ν A \partial_{x} c (2, t) = μ A^{2} .

At $x = L$ we set an absorbing boundary $c (L, t) = 0$ so that completed structures are removed from the system. The time evolution of the concentration of active monomers is given by

\partial_{t} A = α C e^{- α t} - 2 μ A^{2} - 2 ν A \int_{2}^{L} c (x, t) 𝑑 x .

The terms on the right-hand side account for activation of inactive particles, dimerization, and binding of active particles to polymers (polymerization).

Qualitatively, Equation 10 describes a profile that emerges at $x = 2$ from the boundary condition Equation 11, moves to the right with time-dependent velocity $2 ν A (t)$ due to the advection term, and broadens with a time-dependent diffusion coefficient $ν A (t)$ . In Appendices 2–3 we show how the full solution of Equations 10 and 11 can be found assuming knowledge of $A (t)$ . Here, we focus only on the derivation of the threshold activation rate and threshold dimerization rate that mark the onset of non-zero yield. Yield production starts as soon as the density wave reaches the absorbing boundary at $x = L$ . Therefore, finite yield is obtained if the sum of the advectively travelled distance $d_{adv}$ and the diffusively travelled distance $d_{diff}$ exceeds the system size $L - 2$

d_{adv} + d_{diff} \geq L - 2 .

According to Equation 10, $d_{adv} = 2 ν \int_{0}^{\infty} A (t) 𝑑 t$ and $d_{diff} = \sqrt{2 ν \int_{0}^{\infty} A (t) 𝑑 t}$ , giving as condition for the onset of finite yield

2 ν \int_{0}^{\infty} A (t) 𝑑 t \overset{!}{=} \frac{1}{4} {(\sqrt{1 + 4 (L - 2)} - 1)}^{2} \approx L - \sqrt{L},

where the last approximation is valid for large $L$ .
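The closed form follows by treating the onset condition as a quadratic in $u = \sqrt{D}$ with $D := 2 ν \int_{0}^{\infty} A (t) \, dt$. The condition $u^{2} + u - (L - 2) = 0$ has the positive root $u = (\sqrt{1 + 4 (L - 2)} - 1)/2$. Squaring gives exactly $D = L - 3/2 - \sqrt{L - 7/4}$, which equals $L - \sqrt{L}$ up to an offset of order one for large $L$. A short numerical check follows; the function name is hypothetical.

```python
import math


def onset_distance(L):
    """Return D = 2 * nu * int A dt solving D + sqrt(D) = L - 2 (onset of finite yield)."""
    sqrt_D = (math.sqrt(1.0 + 4.0 * (L - 2)) - 1.0) / 2.0  # positive root of u^2 + u - (L - 2)
    return sqrt_D ** 2


for L in (10, 100, 1000):
    D = onset_distance(L)
    # exact rewriting L - 3/2 - sqrt(L - 7/4) next to the large-L form L - sqrt(L)
    print(L, D, L - 1.5 - math.sqrt(L - 1.75), L - math.sqrt(L))
```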

In order to obtain $\int_{0}^{\infty} A (t) 𝑑 t$ , we derive an effective two-component system that governs the evolution of $A (t)$ . To this end, we denote the total polymer concentration in Equation 12 by $B (t) := \int_{2}^{\infty} c (x, t) 𝑑 x$ (as long as yield is zero, the upper boundary is irrelevant and we can set $L = \infty$ ). Equation 12 then reads

\frac{d}{d t} A = α C e^{- α t} - 2 μ A^{2} - 2 ν A B,

and the dynamics of $B$ is determined from the boundary condition, Equation 11

\frac{d}{d t} B = \int_{2}^{\infty} \partial_{t} c (x, t) \, d x = - \int_{2}^{\infty} \partial_{x} J (x, t) \, d x = - \underbrace{J (\infty, t)}_{= 0} + J (2, t) = μ A (t)^{2} .

Measuring $A$ and $B$ in units of the initial monomer concentration $C$ and time in units of ${(ν C)}^{- 1}$ , the equations are rewritten in dimensionless units as

\frac{d}{d t} A = ω e^{- ω t} - 2 η A^{2} - 2 A B,

\frac{d}{d t} B = η A^{2},

where $ω = \frac{α}{ν C}$ and $η = \frac{μ}{ν}$ . Equation 17 describes a closed two-component system for the concentration of active monomers $A$ and the total concentration of polymers $B$ . It describes the dynamics exactly as long as yield is zero. In order to evaluate the condition (14) we need to determine the integral over $A (t)$ as a function of $ω$ and $η$

\int_{0}^{\infty} A_{ω, η} (t) 𝑑 t := g (ω, η) .

To that end, we proceed by looking at both scenarios separately. The numerical analysis, confirming our analytic results, is given in Appendix 3.

Dimerization scenario

The activation rate in the dimerization scenario is $α \to \infty$ , and instead of the term $ω e^{- ω t}$ in $d A / d t$ , we set the initial condition $A (0) = 1$ (and $B (0) = 0$ ). Furthermore, $η = μ / ν ≪ 1$ and we can neglect the term proportional to $η$ in $d A / d t$ . As a result,

\frac{d A}{d B} = - \frac{2 B}{η A} .

Solving this equation for $A$ as a function of $B$ with the initial condition $A (B = 0) = 1$ , we find the total distance travelled by the wave to be

2 g (ω, η) = 2 \frac{π}{2 \sqrt{2}} \frac{1}{\sqrt{η}},

where for the evaluation of the integral we used the substitution $η A^{2} d t = d B$ .
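For reference, the intermediate steps behind this result can be written out as follows. This is a worked reconstruction from the equations above: it drops the $η$ term in $d A / d t$, uses $A \, dt = dB / (η A)$, and substitutes $u = B \sqrt{2/η}$ in the last integral.

```latex
\begin{aligned}
\frac{dA}{dB} &= -\frac{2B}{\eta A}
  \;\Longrightarrow\; A^{2} = 1 - \frac{2B^{2}}{\eta},
  \qquad A = 0 \text{ at } B_{\infty} = \sqrt{\eta/2}, \\
g(\omega,\eta) &= \int_{0}^{\infty} A\,dt
  = \frac{1}{\eta}\int_{0}^{B_{\infty}} \frac{dB}{\sqrt{1 - 2B^{2}/\eta}}
  = \frac{1}{\eta}\sqrt{\frac{\eta}{2}}\,\frac{\pi}{2}
  = \frac{\pi}{2\sqrt{2}}\,\frac{1}{\sqrt{\eta}} .
\end{aligned}
```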

Activation scenario

In the activation scenario, yield sets in only if the activation rate, and thus the effective nucleation rate, is slow, so that $ω ≪ 1$ . As in the dimerization scenario, we can neglect the term proportional to $η$ in $d A / d t$ . This time, however, we have to keep the term $ω e^{- ω t}$ . As a next step, we assume that $d A / d t$ is much smaller than the remaining terms on the right-hand side, $ω e^{- ω t}$ and $- 2 A B$ . This assumption might seem crude at first sight but is justified a posteriori by the solution of the equation (see Appendix 3). Hence, we obtain the algebraic equation $A (t) = ω e^{- ω t} / (2 B (t))$ . Using it to solve $d B / d t = η A^{2}$ for $B$ , and then to determine $A$ , we deduce the total distance travelled by the wave as

2 g (ω, η) = 2 \frac{3^{2 / 3} \sqrt{π} Γ (2 / 3)}{6 Γ (7 / 6)} {(ω η)}^{- 1 / 3} .
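Written out, the intermediate steps are as follows. This is a worked reconstruction that integrates $B^{2} \, dB = (η ω^{2}/4) e^{- 2 ω t} dt$ with $B (0) = 0$ and substitutes $s = e^{- ω t}$. It also uses $\int_{0}^{1} (1 - s^{2})^{-1/3} ds = \tfrac{1}{2} \mathrm{B}(1/2, 2/3) = \sqrt{π} Γ (2/3) / (2 Γ (7/6))$.

```latex
\begin{aligned}
\frac{dB}{dt} &= \eta A^{2} = \frac{\eta\,\omega^{2} e^{-2\omega t}}{4B^{2}}
  \;\Longrightarrow\; B(t)^{3} = \frac{3\eta\omega}{8}\left(1 - e^{-2\omega t}\right), \\
g(\omega,\eta) &= \int_{0}^{\infty} \frac{\omega e^{-\omega t}}{2B(t)}\,dt
  = \frac{1}{2}\left(\frac{8}{3\eta\omega}\right)^{1/3} \int_{0}^{1} \frac{ds}{(1 - s^{2})^{1/3}}
  = \frac{1}{2}\left(\frac{8}{3\eta\omega}\right)^{1/3} \frac{\sqrt{\pi}\,\Gamma(2/3)}{2\,\Gamma(7/6)} \\
  &= \frac{3^{2/3}\sqrt{\pi}\,\Gamma(2/3)}{6\,\Gamma(7/6)}\,(\omega\eta)^{-1/3} .
\end{aligned}
```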

Taken together, we obtain two conditions, at least one of which must be fulfilled to obtain finite yield:

2 a (η ω)^{- \frac{1}{3}} \geq L - \sqrt{L} \Rightarrow α < α_{th} := P_{α} \frac{ν}{μ} \frac{ν C}{(L - \sqrt{L})^{3}}

\text{or} \quad 2 b η^{- \frac{1}{2}} \geq L - \sqrt{L} \Rightarrow μ < μ_{th} := P_{μ} \frac{ν}{(L - \sqrt{L})^{2}},

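where $a = 3^{2/3} \sqrt{π} Γ (2/3) / (6 Γ (7/6)) \approx 0.897$ and $b = π / (2 \sqrt{2}) \approx 1.111$ are the numerical prefactors of $g (ω, η)$ obtained above for the activation and dimerization scenarios, respectively, so that $P_{α} = 8 a^{3} \approx 5.77$ and $P_{μ} = 4 b^{2} = π^{2} / 2 \approx 4.93$ . This verifies Equation 1 in the main text.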
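The prefactors and the asymptotic forms of $g (ω, η)$ can be checked numerically. The sketch below evaluates $a$, $b$, $P_{α}$ and $P_{μ}$ from their closed forms. It then integrates the two-component system for $A$ and $B$ (Equation 17, keeping the $η$ term that the asymptotic analysis drops) with a fixed-step fourth-order Runge–Kutta scheme. The function name, step sizes and values of $ω$ and $η$ are illustrative assumptions, not the settings of Appendix 3. The dimerization scenario is represented by switching off the activation term and starting from $A (0) = 1$.

```python
import math

# Numerical prefactors of g(omega, eta) and the threshold constants
a = 3 ** (2 / 3) * math.sqrt(math.pi) * math.gamma(2 / 3) / (6 * math.gamma(7 / 6))
b = math.pi / (2 * math.sqrt(2))
P_alpha, P_mu = 8 * a ** 3, 4 * b ** 2
print(round(a, 3), round(b, 3), round(P_alpha, 2), round(P_mu, 2))  # 0.897 1.111 5.77 4.93


def integral_of_A(omega, eta, A0, t_end, dt):
    """Integrate Equation 17 with fixed-step RK4 and return g = int_0^t_end A dt.

    dA/dt = omega*exp(-omega*t) - 2*eta*A**2 - 2*A*B,  dB/dt = eta*A**2,  B(0) = 0.
    """
    def f(t, A, B):
        return (omega * math.exp(-omega * t) - 2 * eta * A * A - 2 * A * B, eta * A * A)

    t, A, B, g = 0.0, A0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = f(t, A, B)
        A2, B2 = A + dt / 2 * k1[0], B + dt / 2 * k1[1]
        k2 = f(t + dt / 2, A2, B2)
        A3, B3 = A + dt / 2 * k2[0], B + dt / 2 * k2[1]
        k3 = f(t + dt / 2, A3, B3)
        A4, B4 = A + dt * k3[0], B + dt * k3[1]
        k4 = f(t + dt, A4, B4)
        g += dt / 6 * (A + 2 * A2 + 2 * A3 + A4)  # dg/dt = A, same RK4 stages
        A += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        B += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return g


# Dimerization scenario: no activation term, A(0) = 1, eta << 1
eta_dim = 1e-4
g_dim = integral_of_A(omega=0.0, eta=eta_dim, A0=1.0, t_end=3000.0, dt=1.0)
print(g_dim, b / math.sqrt(eta_dim))

# Activation scenario: A(0) = 0, omega << 1
omega_act, eta_act = 1e-4, 1e-2
g_act = integral_of_A(omega=omega_act, eta=eta_act, A0=0.0, t_end=1.2e5, dt=4.0)
print(g_act, a * (omega_act * eta_act) ** (-1 / 3))
```

For small $η$ (and, in the activation scenario, small $ω$), the numerical integrals should approach $b η^{-1/2}$ and $a (ω η)^{-1/3}$. The remaining deviation reflects the terms neglected in the asymptotic analysis.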

Appendix 1

Chemical reaction equations and the equivalence of models with different numbers of species

In this section we derive the chemical rate equations (deterministic equations) for the self-assembly process as described in the main text. Furthermore, we show that for general $S$ in the deterministic limit the model is equivalent to a set of $S$ independent assembly processes with only one species.

Homogeneous structures

First, we consider the homogeneous model ( $S = 1$ ). By $c_{ℓ} (t)$ we denote the concentration of complexes of length $ℓ$ ( $ℓ \geq 2$ ) at time $t$ , $c_{1} (t)$ is the concentration of active monomers and $c_{0} (t)$ the concentration of inactive monomers at time $t$ . In the following we will usually skip the time argument for better readability. We denote the reaction rate for binding of a monomer to a polymer of size $ℓ$ by $ν_{ℓ}$ . The model from the main text is recovered by setting $ν_{ℓ} := μ_{ℓ}$ if $ℓ < L_{nuc}$ , and $ν_{ℓ} := ν$ otherwise. The ensuing set of ordinary differential equations then reads:

\frac{d}{d t} c_{0} = - α c_{0},

\frac{d}{d t} c_{1} = α c_{0} - 2 c_{1} \sum_{ℓ = 1}^{L - 1} ν_{ℓ} c_{ℓ} + \sum_{ℓ = 2}^{L_{nuc} - 1} ℓ δ_{ℓ} c_{ℓ},

\frac{d}{d t} c_{2} = ν_{1} c_{1}^{2} - 2 ν_{2} c_{1} c_{2} - δ_{2} c_{2} \mathbb{1}_{\{2 < L_{nuc}\}},

\frac{d}{d t} c_{ℓ} = 2 ν_{ℓ - 1} c_{1} c_{ℓ - 1} - 2 ν_{ℓ} c_{1} c_{ℓ} - δ_{ℓ} c_{ℓ} \mathbb{1}_{\{ℓ < L_{nuc}\}}, \quad \text{for } 3 \leq ℓ < L,

\frac{d}{d t} c_{L} = 2 ν_{L - 1} c_{1} c_{L - 1} .

The indicator function $𝟏_{{x < L_{nuc}}}$ equals 1 if the condition $x < L_{nuc}$ is satisfied and 0 otherwise. The first equation describes loss of inactive particles due to activation at rate $α$ . It is uncoupled from the remainder of the equations and is solved by $c_{0} (t) = C e^{- α t}$ , with $C$ denoting the initial concentration of inactive monomers. The temporal change of the active monomers is governed by the following processes (Equation A1b): activation of inactive monomers at rate $α$ , binding of active monomers to existing structures at rate $ν_{ℓ}$ (polymerization), and decay of below-critical polymers into monomers at rate $δ_{ℓ}$ (disassembly). All binding rates appear with a factor of 2 because a monomer can attach to a polymer on its left or on its right end.

Note that there is a subtlety with the dimerization term $2 ν_{1} c_{1}^{2}$ in Equation A1b: this term also bears a factor of 2, because two identical monomers $A$ and $B$ can form a dimer in two possible ways, either as $A B$ or as $B A$ . Additionally, there is a stoichiometric factor of 2 for the monomers in this reaction. However, one factor of 2 cancels, because for $n$ monomers the number of unordered pairs of possible reaction partners is $\frac{1}{2} n (n - 1) \approx n^{2} / 2$ (for large $n$ ) rather than $n^{2}$ (the number of reaction partners when two different species react). This leaves a single factor of 2, as for all the other binding reactions.

Equations A1c and A1d describe the dynamics of dimers and larger polymers of size $3 \leq ℓ < L$ , respectively. The terms account for reactions of polymers with active monomers (polymerization) as well as decay in the case of below-critical polymers (disassembly). The dimerization term in the equation for $\partial_{t} c_{2}$ lacks the factor of 2 because the stoichiometric factor is missing for the dimers as compared with the dimerization term for the monomers in the line above. Finally, polymers of length $L$ – the complete ring structures – form an absorbing state and therefore only include a reactive gain term (Equation A1e).

Heterogeneous structures

Next we consider systems with more than one particle species ( $S > 1$ ). The heterogeneous system can be described by dynamical equations equivalent to the homogeneous system. We show this starting from a full description that distinguishes both monomers and polymers into a set of different species $1, \dots, S$ . The species of a polymer is defined by the species of the respective monomer at its left end. As polymers assemble in consecutive order of species, a polymer is uniquely determined by its length and species (i.e. species of leftmost monomer). In that sense, $c_{ℓ}^{s}$ with $0 \leq ℓ < L$ and $1 \leq s \leq S$ denotes the concentration of a polymer of length $ℓ$ and species $s$ ( $c_{0}^{s}$ and $c_{1}^{s}$ again denote inactive and active monomers of species $s$ , respectively). For example, $c_{4}^{5}$ denotes the concentration of polymers [5678] if $S \geq 8$ , or of polymers [5612] if $S = 6$ . Upper indices are always assumed to be taken modulo $S$ whenever they lie outside the range $[1, S]$ . Therefore, the dynamics of the concentrations $c_{ℓ}^{s}$ with $3 \leq ℓ < L$ is given by

\frac{d}{d t} c_{ℓ}^{s} = ν_{ℓ - 1} c_{ℓ - 1}^{s} c_{1}^{ℓ + s - 1} + ν_{ℓ - 1} c_{ℓ - 1}^{s + 1} c_{1}^{s} - ν_{ℓ} c_{ℓ}^{s} c_{1}^{s + ℓ} - ν_{ℓ} c_{ℓ}^{s} c_{1}^{s - 1} - δ_{ℓ} c_{ℓ}^{s} \mathbb{1}_{\{ℓ < L_{nuc}\}} .

The terms on the right-hand side account for the influx due to binding of the respective polymers of length $ℓ - 1$ with a monomer either on the right or on the left (first and second term), and for the outflux due to reactions of a polymer of length $ℓ$ and species $s$ with a monomer on the right or on the left (third and fourth term), as well as for decay into monomers for $ℓ < L_{nuc}$ (last term). For the dynamics of the dimers, however, there is only one gain term arising from dimerization:

\frac{d}{d t} c_{2}^{s} = ν_{1} c_{1}^{s} c_{1}^{s + 1} - ν_{2} c_{2}^{s} c_{1}^{s + 2} - ν_{2} c_{2}^{s} c_{1}^{s - 1} - δ_{2} c_{2}^{s} \mathbb{1}_{\{2 < L_{nuc}\}} .

Equivalently, for the active monomers we find:

\frac{d}{d t} c_{1}^{s} = α C e^{- α t} - c_{1}^{s} \sum_{ℓ = 1}^{L - 1} ν_{ℓ} (c_{ℓ}^{s + 1} + c_{ℓ}^{s - ℓ}) + \sum_{ℓ = 2}^{L_{nuc} - 1} \sum_{k = s + 1 - ℓ}^{s} δ_{ℓ} c_{ℓ}^{k} .

Now we exploit the symmetry of the system with respect to the species index, that is, the upper index in ${c_{ℓ}^{s}}$ : Since all species in the system are equivalent, the dynamic equations are invariant under relabelling of the upper indices. Consequently, it must hold that:

c_{ℓ}^{s} (t) = c_{ℓ}^{k} (t) \quad \text{for any } s, k \leq S \text{ at any time } t .

In other words, the upper index is irrelevant and can be discarded. The variable $c_{ℓ}$ then denotes the concentration of any one polymer species of length $ℓ$ . Exploiting this symmetry in the equations of the heterogeneous system (Equations A2–A4) and collecting equal terms leads to a set of equations identical to those for the homogeneous system (Equation A1). We show the equivalence to the homogeneous model exemplarily for the dynamics of the polymers with size $ℓ \geq 3$ in Equation A2. Applying $c_{ℓ}^{s} (t) = c_{ℓ} (t)$ to Equation A2 yields for the dynamics of the concentration of an arbitrary polymer species of size $ℓ$ :

\begin{aligned} \frac{d}{d t} c_{ℓ} & = ν_{ℓ - 1} c_{ℓ - 1} c_{1} + ν_{ℓ - 1} c_{ℓ - 1} c_{1} - ν_{ℓ} c_{ℓ} c_{1} - ν_{ℓ} c_{ℓ} c_{1} - δ_{ℓ} c_{ℓ} \mathbb{1}_{\{ℓ < L_{nuc}\}} \\ & = 2 ν_{ℓ - 1} c_{ℓ - 1} c_{1} - 2 ν_{ℓ} c_{ℓ} c_{1} - δ_{ℓ} c_{ℓ} \mathbb{1}_{\{ℓ < L_{nuc}\}}, \end{aligned}

which is identical to the respective dynamic Equation A1d for the homogeneous model. The other equations for the heterogeneous system reduce to those for the homogeneous system in an analogous manner.
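The symmetry argument can also be checked numerically. Evaluating the right-hand sides of the heterogeneous equations on a species-symmetric state should reproduce Equation A1 term by term. The sketch makes several assumptions. It takes $L_{nuc} = 2$ (no decay terms), $ν_{1} = μ$ and $ν_{ℓ} = ν$ otherwise. It writes the activation term as $α c_{0}^{s}$, which equals $α C e^{- α t}$ in Equation A4, and labels species 0, …, S−1. Both function names are hypothetical.

```python
import random


def hetero_rhs(c, S, L, alpha, mu, nu):
    """Heterogeneous equations (A2-A4 plus inactive monomers and rings), L_nuc = 2.

    c[l][s] is the concentration of species s (0..S-1) and length l; c[0] and
    c[1] hold inactive and active monomers. Species indices are taken modulo S.
    """
    rate = [0.0, mu] + [nu] * (L - 2)  # rate[l] = nu_l for 1 <= l <= L-1
    a = c[1]  # active monomers, indexed by species
    d = [[0.0] * S for _ in range(L + 1)]
    for s in range(S):
        d[0][s] = -alpha * c[0][s]
        d[1][s] = alpha * c[0][s] - a[s] * sum(
            rate[l] * (c[l][(s + 1) % S] + c[l][(s - l) % S]) for l in range(1, L))
        d[2][s] = mu * a[s] * a[(s + 1) % S] - nu * c[2][s] * (a[(s + 2) % S] + a[(s - 1) % S])
        for l in range(3, L):
            gain = nu * (c[l - 1][s] * a[(s + l - 1) % S] + c[l - 1][(s + 1) % S] * a[s])
            loss = nu * c[l][s] * (a[(s + l) % S] + a[(s - 1) % S])
            d[l][s] = gain - loss
        d[L][s] = nu * (c[L - 1][s] * a[(s + L - 1) % S] + c[L - 1][(s + 1) % S] * a[s])
    return d


def homo_rhs(x, L, alpha, mu, nu):
    """Equation A1 with L_nuc = 2; x[l] is the concentration per species."""
    d = [0.0] * (L + 1)
    d[0] = -alpha * x[0]
    d[1] = alpha * x[0] - 2 * x[1] * (mu * x[1] + nu * sum(x[2:L]))
    d[2] = mu * x[1] ** 2 - 2 * nu * x[1] * x[2]
    for l in range(3, L):
        d[l] = 2 * nu * x[1] * (x[l - 1] - x[l])
    d[L] = 2 * nu * x[1] * x[L - 1]
    return d


# Species-symmetric state: every species carries the same concentrations x[l].
rng = random.Random(0)
S, L, alpha, mu, nu = 4, 8, 0.3, 0.5, 1.0
x = [rng.random() for _ in range(L + 1)]
c = [[x[l]] * S for l in range(L + 1)]
d_het, d_hom = hetero_rhs(c, S, L, alpha, mu, nu), homo_rhs(x, L, alpha, mu, nu)
max_dev = max(abs(d_het[l][s] - d_hom[l]) for l in range(L + 1) for s in range(S))
print(max_dev)  # zero up to floating-point rounding
```

For a state that is not species-symmetric the two right-hand sides differ in general. The reduction to Equation A1 relies on the symmetric initial condition, which the dynamics then preserves.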

Summarizing, we have shown that the (deterministic) heterogeneous assembly process decouples into a set of $S$ identical and independent homogeneous processes. In particular, yield, which is given by the quotient of the number of completely assembled rings and the maximal possible number of complete rings, becomes independent of $S$ :

yield(t) = \frac{S c_{L} (t)}{S N L^{- 1}} = \frac{c_{L} (t) L}{N} .

Appendix 2

Effective description of the evolution of the polymer size distribution as an advection-diffusion equation

The dynamical properties of the evolution of the polymer size distribution become evident if the set of ODEs, Equation A1, is rewritten as a partial differential equation. This approach was previously described in the context of virus capsid assembly (Morozov et al., 2009; Zlotnick et al., 1999; Endres and Zlotnick, 2002), but we restate the essential steps here for the convenience of the reader. To this end, we interpret the length index of the polymer, $ℓ \in \{2, 3, \dots, L\}$ , as a continuous variable that we rename $x \in [2, L]$ . With such a continuous description in view, we write $c (x = ℓ) := c_{ℓ}$ to denote the concentration of polymers of size $ℓ$ .

Since the active monomers play a special role, we denote their concentration in the following by $A$ . For simplicity we restrict our discussion to the case $L_{nuc} = 2$ and let $ν_{1} = μ$ and $ν_{ℓ \geq 2} = ν$ . Generalizations to $L_{nuc} > 2$ can be done in a similar way. Then, for the polymers with $ℓ \geq 3$ we have:

\partial_{t} c (ℓ) = 2 ν A [c (ℓ - 1) - c (ℓ)] .

Formally, expanding the right-hand side in a Taylor series up to second order

c (ℓ - 1) = c (ℓ) - \partial_{x} c (ℓ) + \frac{1}{2} \partial_{x}^{2} c (ℓ),

we arrive at an advection-diffusion equation with both advection and diffusion coefficients depending on the concentration of active monomers $A (t)$ ,

\partial_{t} c (x) = - 2 ν A \partial_{x} c (x) + ν A \partial_{x}^{2} c (x) .

Equation A9 can be written in the form of a continuity equation $\partial_{t} c (x) = - \partial_{x} J (x)$ with flux $J = 2 ν A c - ν A \partial_{x} c$ . The flux at the left boundary, $x = 2$ , equals the influx of polymers due to dimerization of free monomers, $J (2, t) = μ A^{2}$ . This enforces a Robin boundary condition at $x = 2$ ,

2 ν A c (2, t) - ν A \partial_{x} c (2, t) = μ A^{2} .

At $x = L$ , we have an absorbing boundary $c (L, t) = 0$ so that completed structures are removed from the system. Furthermore, the time evolution of the concentration of active particles is given by

\partial_{t} A = α C e^{- α t} - 2 μ A^{2} - 2 ν A \int_{2}^{L} c (x, t) 𝑑 x .

The terms on the right-hand side account for activation of inactive particles, dimerization, and binding of active particles to polymers (polymerization).

Qualitatively, Equation A9 describes a profile that emerges at $x = 2$ from the boundary condition, Equation A10, moves to the right with the time-dependent velocity $2 ν A (t)$ due to the advection term, and broadens with the time-dependent diffusion coefficient $ν A (t)$. The concentration of active particles $A$ thus determines the influx of dimers at $x = 2$ as well as the speed and the broadening of the wave profile.
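The drift and diffusion coefficients of Equation A9 can be checked against the discrete equation at a fixed concentration of active monomers. The following Python sketch (assuming NumPy; the function name `evolve_size_distribution` and all parameter values are hypothetical) integrates $\partial_{t} c (ℓ) = 2 ν A [c (ℓ - 1) - c (ℓ)]$ with forward Euler for constant $A$ and no dimer influx, starting with all polymers at $ℓ = 2$. The mean size should advance as $2 ν A t$ and the variance should grow as $2 D t$ with $D = ν A$.

```python
import numpy as np


def evolve_size_distribution(nu, A, t_end, n_sizes=200, dt=1e-3):
    """Forward-Euler integration of d/dt c(l) = 2*nu*A*[c(l-1) - c(l)] at fixed A.

    Index 0 corresponds to size l = 2; all mass starts there and there is
    no dimer influx, so only the growth terms act. With forward Euler each
    step is a binomial shift: the mean is reproduced up to rounding, and the
    variance is smaller than the continuum value by a factor (1 - 2*nu*A*dt)."""
    c = np.zeros(n_sizes)
    c[0] = 1.0
    rate = 2.0 * nu * A
    for _ in range(int(round(t_end / dt))):
        shifted = np.concatenate(([0.0], c[:-1]))  # c(l - 1); nothing below l = 2
        c = c + dt * rate * (shifted - c)
    return c


nu, A, t_end = 1.0, 0.5, 20.0
c = evolve_size_distribution(nu, A, t_end)
sizes = np.arange(2, 2 + c.size)
mass = c.sum()
mean = (sizes * c).sum() / mass
var = ((sizes - mean) ** 2 * c).sum() / mass
print("drift:", mean - 2.0, "expected 2*nu*A*t =", 2 * nu * A * t_end)
print("variance:", var, "expected 2*D*t =", 2 * (nu * A) * t_end)
```

For constant $A$ the discrete dynamics is a Poisson process in polymer size, whose mean and variance both grow as $2 ν A t$; this is why the second-order Taylor expansion gives exactly the diffusion coefficient $D = ν A$.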

Next, we derive an expression that solves Equation A9 for a given $A (t)$. We first solve for the concentration at the left boundary, $c (2, t)$, and then translate the resulting expression to obtain a solution for $c (x, t)$. To obtain $c (2, t)$ as a function of $A (t)$, we solve $\frac{d}{d t} c (2, t) = μ A^{2} - 2 ν A c (2, t)$ (see Equation A1c) by variation of constants:

c (2, t) = \int_{0}^{t} μ A (\tilde{t})^{2} \exp [- 2 \int_{\tilde{t}}^{t} ν A (t^{'}) d t^{'}] d \tilde{t} .
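The variation-of-constants formula can be checked numerically against direct integration of the boundary ODE. In this sketch (assuming NumPy and SciPy), the profile `A_of_t`, the rates `mu` and `nu`, and the helper names are hypothetical choices for illustration only; the quadrature evaluates the formula above, and `solve_ivp` integrates $\frac{d}{d t} c (2, t) = μ A^{2} - 2 ν A c (2, t)$ with $c (2, 0) = 0$.

```python
import numpy as np
from scipy.integrate import quad, solve_ivp

mu, nu = 0.1, 1.0  # hypothetical rate constants


def A_of_t(t):
    # Hypothetical active-monomer profile, chosen only for illustration.
    return 1.0 / np.cosh(0.5 * t)


def c2_variation_of_constants(t):
    # c(2,t) = int_0^t mu A(s)^2 exp(-2 nu int_s^t A(t') dt') ds
    def integrand(s):
        inner, _ = quad(A_of_t, s, t)
        return mu * A_of_t(s) ** 2 * np.exp(-2.0 * nu * inner)

    value, _ = quad(integrand, 0.0, t)
    return value


def c2_direct(t_end):
    # Direct integration of d/dt c(2,t) = mu A^2 - 2 nu A c(2,t), c(2,0) = 0
    def rhs(t, c):
        return [mu * A_of_t(t) ** 2 - 2.0 * nu * A_of_t(t) * c[0]]

    sol = solve_ivp(rhs, (0.0, t_end), [0.0], rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]


for t in (1.0, 3.0, 6.0):
    print(t, c2_variation_of_constants(t), c2_direct(t))
```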

With help of this expression we find $c (x, t)$ : Given $c (2, t)$ , the advective part of Equation A9,

\partial_{t} \tilde{c} (x) = - 2 ν A \partial_{x} \tilde{c} (x) ,

is solved by

c_{advec} (x, t) = c (2, τ (x, t)) .

Here, $τ (x, t)$ denotes the time when a particle now at position $x$ and time $t$ was at $x = 2$ . In other words, a particle at time $t$ and position $x$ has entered the system at $x = 2$ at time $τ (x, t)$ . This ansatz solves the PDE (Equation A13) if and only if $τ (x, t)$ satisfies

τ (x, t) = {\tilde{A}}^{- 1} (\tilde{A} (t) - \frac{x - 2}{2 ν})

with $\tilde{A}$ being an arbitrary antiderivative of $A$, such that $\partial_{t} \tilde{A} (t) = A (t)$, and ${\tilde{A}}^{- 1}$ denoting its inverse. More directly, we find this form of $τ$ by requiring that the integral over the velocity from time $τ$ to $t$ equals the travelled distance $x - 2$:

\int_{τ}^{t} 2 ν A (t^{'}) 𝑑 t^{'} = x - 2 .
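For a given $A (t)$, the entry time $τ (x, t)$ can also be found numerically by root-finding on this travelled-distance condition instead of inverting $\tilde{A}$. A minimal sketch, assuming SciPy's `quad` and `brentq`; the function name `entry_time` is hypothetical:

```python
from scipy.integrate import quad
from scipy.optimize import brentq


def entry_time(x, t, A_of_t, nu):
    """Time tau at which material now at size x (time t) entered at x = 2,
    defined by int_tau^t 2 nu A(t') dt' = x - 2.

    Returns None if even material injected at t = 0 has not yet reached x."""
    def residual(tau):
        travelled, _ = quad(A_of_t, tau, t)
        return 2.0 * nu * travelled - (x - 2.0)

    if residual(0.0) < 0.0:
        return None
    if x == 2.0:
        return t
    # residual(0) >= 0 and residual(t) = -(x - 2) < 0, so a root is bracketed
    return brentq(residual, 0.0, t)


# Constant A = 0.5 and nu = 1: the wave moves at speed 1, so tau = t - (x - 2)
print(entry_time(7.0, 10.0, lambda s: 0.5, 1.0))
```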

To account for the diffusive contribution neglected in Equation A13, we use the diffusion kernel,

k (x, y, t) = {(4 π \int_{τ (y, t)}^{t} D (t^{'}) d t^{'})}^{- 1 / 2} \exp (\frac{- x^{2}}{4 \int_{τ (y, t)}^{t} D (t^{'}) d t^{'}}),

with the time-dependent diffusion constant $D (t) = ν A (t)$. The kernel $k (x, y, t)$ accounts for the mass that has been diffusively transported from $y$ over a distance $x$. Because this mass entered the system at $x = 2$ at time $τ (y, t)$, it has diffused for the time $t - τ (y, t)$. The complete expression for $c (x, t)$ is then obtained as the convolution of $c_{advec} (x, t)$ (Equation A14), which follows from Equation A12 and Equation A15, with the diffusion kernel $k (x, y, t)$ (Equation A17):

c (x, t) = \int c_{advec} (s, t) k (x - s, s, t) 𝑑 s = \int c (2, τ (s, t)) k (x - s, s, t) 𝑑 s .

Interpreting the terms in the equations and the general form of the solution allows us to understand the qualitative behavior of the system. If both the activation and the dimerization rate are large, the system produces zero yield, because both advection and diffusion are driven by the concentration of active monomers $A$. If activation is fast, $A$ initially becomes large, since activation outpaces the reaction dynamics. Consequently, provided $μ \sim ν$, dimerization dominates over binding because it depends quadratically on $A$, see Equation A11. The reservoir of free particles then depletes quickly and cannot sustain the motion of the wave long enough for it to reach the absorbing boundary, resulting in a very low yield. Only if the activation rate is low enough or $μ ≪ ν$ can the motion of the wave be sustained until it reaches the absorbing boundary.

Appendix 3

Threshold values for the activation and dimerization rate

Based on the analysis from the previous section, we will now determine the threshold activation rate and threshold dimerization rate which mark the onset of non-zero yield. Yield production starts as soon as the density wave reaches the absorbing boundary at $x = L$ . Therefore, finite yield is obtained if and only if the sum of the advectively travelled distance $d_{adv}$ and the diffusively travelled distance $d_{diff}$ exceeds the system size $L - 2$ :

d_{adv} + d_{diff} \geq L - 2 .

The condition for the onset of non-zero yield is obtained by assuming equality in this relation. The advectively travelled distance follows from Equation A16 by setting the limits of the velocity integral to $τ = 0$ and $t = \infty$:

d_{adv} = \int_{0}^{\infty} 2 ν A (t^{'}) 𝑑 t^{'} .

The diffusively travelled distance is approximately given by the standard deviation of the Gaussian diffusion kernel, Equation A17, again with $τ = 0$ and $t = \infty$ ,

d_{diff} = \sqrt{2 ν \int_{0}^{\infty} A (t) 𝑑 t} .

Taken together, we obtain a condition for the onset of finite yield:

2 ν \int_{0}^{\infty} A (t) 𝑑 t + \sqrt{2 ν \int_{0}^{\infty} A (t) 𝑑 t} = L - 2 .

Substituting $y = \sqrt{2 ν \int A}$ turns Equation A22 into the quadratic equation $y^{2} + y = L - 2$. Solving it and requiring that $y$ is positive, we find that Equation A22 is equivalent to

2 ν \int_{0}^{\infty} A (t) 𝑑 t = y^{2} = \frac{1}{4} {(\sqrt{1 + 4 (L - 2)} - 1)}^{2} \approx L - \sqrt{L},

where the last approximation is valid for large $L$ .
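The quality of the large-$L$ approximation can be checked directly. This short Python sketch (the function name `onset_distance` is hypothetical) evaluates the exact positive root of $y^{2} + y = L - 2$ and compares $y^{2}$ with $L - \sqrt{L}$:

```python
import math


def onset_distance(L):
    """Exact y**2 for the positive root of y**2 + y = L - 2."""
    y = (math.sqrt(1.0 + 4.0 * (L - 2)) - 1.0) / 2.0
    return y * y


for L in (20, 60, 1000):
    exact = onset_distance(L)
    approx = L - math.sqrt(L)
    print(L, exact, approx, abs(exact - approx) / exact)
```

Expanding the root further gives $y^{2} \approx L - \sqrt{L} - \frac{3}{2}$, so the neglected terms are of order one and the relative error of $L - \sqrt{L}$ decreases like $1 / L$.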

We determine the threshold values for the activation rate $α$ and the dimerization rate $μ$ by finding solutions of the dynamical equation for the active particles $A (t)$ , Equation A11, such that the condition, Equation A23, is fulfilled. Thus, we start by deriving the dependence of $\int_{0}^{\infty} A (t) 𝑑 t$ on $α$ and $μ$ .

The concentration $c (x, t)$ enters Equation A11 only through the integral $\int_{2}^{L} c (x, t) 𝑑 x$, which counts the total number of polymers in the system. As long as yield is zero, there is no outflux of polymers at the absorbing boundary $x = L$, and the total number of polymers only increases due to the influx at the left boundary $x = 2$. In this regime we can therefore equivalently consider the limit $L \to \infty$. We denote the total number of polymers in Equation A11 by $B (t) := \int c (x, t) 𝑑 x$, whose dynamics follow from the boundary condition, Equation A10:

\frac{d}{d t} B = \int_{2}^{\infty} \partial_{t} c (x, t) 𝑑 x = \int_{2}^{\infty} - \partial_{x} J (x, t) d x = - \underset{= 0}{\underset{⏟}{J (\infty, t)}} + J (2, t) = μ A {(t)}^{2} .

Hence, as long as yield is zero, the total number of polymers increases with the rate of the dimerization events. The system then simplifies to a set of two coupled ordinary differential equations for $A$ and $B$ :

\frac{d}{d t} A = α C e^{- α t} - 2 μ A^{2} - 2 ν A B

\frac{d}{d t} B = μ A^{2} .

The dynamics of $A$ and $B$ is equivalent to a two-state activator-inhibitor system, where $A$ dimerizes into $B$ at rate $μ$ , and $B$ degrades (inhibits) $A$ at rate $2 ν$ . Note that Equation A25 describes the exact dynamics of the active monomers $A$ and total number of polymers $B$ in the deterministic system as long as yield is zero. The system has therefore been greatly reduced from originally $S N$ coupled ODEs to now only two coupled ODEs.

For the further analysis it is useful to non-dimensionalize Equation A25 by measuring $A$ and $B$ in units of the initial concentration of inactive monomers $C$ and time in units of ${(ν C)}^{- 1}$ :

\frac{d}{d t} A = ω e^{- ω t} - 2 η A^{2} - 2 A B,

\frac{d}{d t} B = η A^{2},

with the remaining dimensionless parameters $ω = \frac{α}{ν C}$ and $η = \frac{μ}{ν}$ . We are interested in the integral over $A (t)$ as a function of $ω$ and $η$ ,

\int_{0}^{\infty} A_{ω, η} (t) 𝑑 t := g (ω, η),

which determines the total distance travelled by the wave. Note that, in the case of zero yield, $2 g (ω, η)$ is the total advectively travelled distance of the wave (cf. Equation A20) and the square of the diffusively travelled distance (cf. Equation A21).
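The function $g (ω, η)$ can be evaluated numerically by integrating Equation A26 together with a third variable $G$ obeying $\frac{d}{d t} G = A$, so that $G (T) \approx g$ for a late end time $T$. The sketch below assumes SciPy's `solve_ivp` with the LSODA integrator; this is a swapped-in technique, not the semi-implicit scheme plus adaptive Simpson quadrature described for the numerical analysis in this appendix. The name `g_numeric`, the flag `include_dimerization`, and the heuristic choice of $T$ are hypothetical. The two checks drop the dimerization term, as in the asymptotic analyses that follow, and compare against $\frac{π}{2 \sqrt{2}} η^{- 1 / 2}$ and $0.8969 (ω η)^{- 1 / 3}$.

```python
import numpy as np
from scipy.integrate import solve_ivp


def g_numeric(omega, eta, include_dimerization=True):
    """Approximate g(omega, eta) = int_0^inf A(t) dt for the dimensionless system A26.

    A third state G with dG/dt = A is integrated together with A and B.
    include_dimerization=False drops the -2*eta*A**2 loss term."""
    loss = 2.0 * eta if include_dimerization else 0.0

    def rhs(t, y):
        A, B, _ = y
        dA = omega * np.exp(-omega * t) - loss * A * A - 2.0 * A * B
        dB = eta * A * A
        return [dA, dB, A]

    # Heuristic end time: many multiples of the slower of the activation
    # time scale 1/omega and the decay time scale 1/sqrt(2*eta).
    T = 50.0 / min(omega, np.sqrt(2.0 * eta))
    sol = solve_ivp(rhs, (0.0, T), [0.0, 0.0, 0.0], method="LSODA",
                    rtol=1e-9, atol=1e-12)
    return sol.y[2, -1]


# Dimerization scenario (fast activation, eta << 1)
g_dim = g_numeric(1e3, 0.01, include_dimerization=False)
g_dim_theory = np.pi / (2 * np.sqrt(2)) * 0.01 ** -0.5
# Activation scenario (omega << eta << 1)
g_act = g_numeric(1e-5, 0.1, include_dimerization=False)
g_act_theory = 0.8969 * (1e-5 * 0.1) ** (-1 / 3)
print(g_dim, g_dim_theory)
print(g_act, g_act_theory)
```

The activation-scenario comparison is only asymptotic: the early phase before the quasi-steady state is established contributes a correction that shrinks relative to $g$ as $ω$ decreases.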

Analysis of the dimerization scenario

The dimerization scenario is characterized by fast activation $α ≫ C ν$ and slow dimerization $μ ≪ ν$. For the dimensionless parameters, these assumptions translate to $η ≪ 1$ and $η ≪ ω$. Because nucleation is much slower than growth for $η ≪ 1$, we neglect the dimerization term in Equation A26a compared with the growth term. Furthermore, because $η ≪ ω$, activation happens on a fast time scale compared with nucleation, and we may therefore eliminate the fast time scale by assuming that all particles are activated instantaneously at the beginning. The system Equation A26 then reduces to

\frac{d}{d t} A = - 2 A B,

\frac{d}{d t} B = η A^{2},

with the initial condition $A (0) = 1$ and $B (0) = 0$ . We divide the first equation by the second one (formally applying the chain rule and the inverse function theorem) to obtain a single equation for the dynamics of $A (B)$ :

\frac{d A}{d B} = - \frac{2}{η} \frac{B}{A},

where $A (B = 0) = 1$ . This first order ODE can be solved by separation of variables and subsequent integration, yielding

A (B) = \sqrt{1 - \frac{2}{η} B^{2}} .

Because the number of active monomers $A (t)$ must vanish for $t \to \infty$ , the final value of $B$ is

B_{\infty} := B (t = \infty) = \sqrt{\frac{η}{2}} .

With this, we calculate the function $g (η)$ via the variable substitution $d t = \frac{d B}{η A^{2}}$:

g (η) = \int_{0}^{\infty} A (t) 𝑑 t = \int_{0}^{B_{\infty}} A (B) \frac{d B}{η A {(B)}^{2}} = \frac{1}{η} \int_{0}^{B_{\infty}} \frac{d B}{\sqrt{1 - \frac{2}{η} B^{2}}} = \frac{π}{2 \sqrt{2}} η^{- \frac{1}{2}} .

So, the dependence of the travelled distance of the wave on $η$ obeys a power law with exponent $- \frac{1}{2}$ , confirming the previous result (Morozov et al., 2009). For the coefficient we find $\frac{π}{2 \sqrt{2}} \approx 1.1107$ .
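The closed form can be confirmed by quadrature. Writing $1 - \frac{2}{η} B^{2} = (B_{\infty} - B)(B_{\infty} + B) / B_{\infty}^{2}$ isolates the inverse-square-root endpoint singularity, which SciPy's `quad` handles through its algebraic weight option (`weight="alg"`). A sketch assuming NumPy and SciPy; the name `g_dimerization_quad` is hypothetical:

```python
import numpy as np
from scipy.integrate import quad


def g_dimerization_quad(eta):
    """(1/eta) * int_0^{B_inf} dB / sqrt(1 - 2 B^2 / eta), with B_inf = sqrt(eta/2)."""
    B_inf = np.sqrt(eta / 2.0)
    # Integrand B_inf / sqrt(B_inf + B) times the weight (B_inf - B)^(-1/2)
    value, _ = quad(lambda B: B_inf / np.sqrt(B_inf + B), 0.0, B_inf,
                    weight="alg", wvar=(0.0, -0.5))
    return value / eta


for eta in (1e-4, 1e-2, 0.5):
    print(eta, g_dimerization_quad(eta), np.pi / (2 * np.sqrt(2)) / np.sqrt(eta))
```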

Additionally, we can determine the time dependent solutions $A (t)$ and $B (t)$ . Using the solution for $A (B)$ from Equation A30 in Equation A28b we obtain $B (t)$ as

B (t) = \sqrt{\frac{η}{2}} \tanh (\sqrt{2 η} t) .

We insert this expression for $B (t)$ into Equation A28a to obtain $A (t)$. The resulting ODE can again be solved by separation of variables:

A (t) = \frac{1}{\cosh (\sqrt{2 η} t)} .
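As a quick consistency check, the closed-form pair can be substituted back into Equation A28 using central finite differences. This NumPy sketch uses a hypothetical value of $η$ and a hypothetical helper `closed_form`; both residuals should vanish up to discretization and rounding error, and $A^{2} + \frac{2}{η} B^{2} = 1$ should hold as in Equation A30.

```python
import numpy as np


def closed_form(t, eta):
    k = np.sqrt(2.0 * eta)
    A = 1.0 / np.cosh(k * t)
    B = np.sqrt(eta / 2.0) * np.tanh(k * t)
    return A, B


eta = 0.05
t = np.linspace(0.0, 40.0, 4001)
h = 1e-6
A, B = closed_form(t, eta)
A_plus, B_plus = closed_form(t + h, eta)
A_minus, B_minus = closed_form(t - h, eta)
dA = (A_plus - A_minus) / (2 * h)  # central differences
dB = (B_plus - B_minus) / (2 * h)
res_A = dA + 2.0 * A * B           # Equation A28a: dA/dt = -2 A B
res_B = dB - eta * A ** 2          # Equation A28b: dB/dt = eta A^2
print(np.max(np.abs(res_A)), np.max(np.abs(res_B)))
```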

Analysis of the activation scenario

In the activation scenario, $α ≪ C ν$ , such that $ω ≪ 1$ and $ω ≪ η$ . As we know already that decreasing $ω$ will slow down nucleation relative to growth we can again neglect the dimerization term in Equation A26a. In contrast to the dimerization scenario, however, we have to keep the activation term. Transforming time via $τ := 1 - e^{- ω t}$ such that $τ \in [0, 1]$ and writing $a (τ) = a (1 - e^{- ω t}) := A (t)$ and $b (τ) = b (1 - e^{- ω t}) := B (t)$ the system in Equation A26 becomes:

\frac{d}{d τ} a = 1 - \frac{2}{ω (1 - τ)} a b,

\frac{d}{d τ} b = \frac{η}{ω (1 - τ)} a^{2},

with the initial condition $a (0) = b (0) = 0$ . The function $g (ω, η)$ transforms as

g (ω, η) = \int_{0}^{\infty} A (t) 𝑑 t = \frac{1}{ω} \int_{0}^{1} \frac{a (τ)}{1 - τ} 𝑑 τ .

In the following we derive the asymptotic solution for $a (τ)$ in the limit of small $ω$ in order to evaluate the integral in Equation A36. In the limit $τ \to 1$ ( $\Leftrightarrow t \to \infty$ ) both $a (τ)$ and $\frac{d}{d τ} a (τ)$ will become small whereas $b (τ)$ increases monotonically. The reaction term in Equation A35a is furthermore weighted by a factor $\frac{1}{ω}$ which will become large if $ω ≪ 1$ . We therefore postulate that for sufficiently large $τ$ the derivative $\frac{d}{d τ} a (τ)$ is much smaller than the two terms on the right-hand side of Equation A35a and hence negligible. This assumption has to be justified a posteriori with the obtained solution. Neglecting the derivative term $\frac{d}{d τ} a$ in (Equation A35a) reduces the equation to an algebraic equation and we find

a = \frac{ω (1 - τ)}{2 b} .

Using this result in Equation A35b we can solve for $b$ by separation of variables and subsequent integration:

b (τ) = {(ω η)}^{\frac{1}{3}} \cdot {(\frac{3}{4} τ - \frac{3}{8} τ^{2})}^{\frac{1}{3}} .

From Equation A37 we immediately obtain $a (τ)$ :

a (τ) = \frac{ω^{\frac{2}{3}}}{η^{\frac{1}{3}}} \cdot \frac{1 - τ}{{(6 τ - 3 τ^{2})}^{\frac{1}{3}}} := \frac{ω^{\frac{2}{3}}}{η^{\frac{1}{3}}} h (τ),

where by $h (τ)$ we denote the part of the solution that depends only on $τ$ . Hence, we find that $a$ and hence also $\frac{d}{d τ} a$ scale like $\sim ω^{\frac{2}{3}}$ , and will thus become small if $ω ≪ 1$ and $τ$ is large enough. Therefore the solution is consistent and justifies the approximation in which we neglected the derivative term in the limit of small $ω$ and sufficiently large $τ$ .
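The quasi-steady solution can be checked in the same way. The sketch below (NumPy only; the helper name `quasi_steady` and the parameter values are hypothetical) evaluates Equations A38 and A39, confirms that $a b = ω (1 - τ) / 2$ (Equation A37) and that Equation A35b holds exactly, and shows that the neglected derivative $\frac{d}{d τ} a$ is small compared with the source term 1 away from $τ = 0$.

```python
import numpy as np


def quasi_steady(tau, omega, eta):
    """Quasi-steady approximation a(tau), b(tau) of Equations A38 and A39."""
    b = (omega * eta) ** (1 / 3) * (0.75 * tau - 0.375 * tau ** 2) ** (1 / 3)
    a = (omega ** (2 / 3) / eta ** (1 / 3)
         * (1 - tau) / (6 * tau - 3 * tau ** 2) ** (1 / 3))
    return a, b


omega, eta = 1e-4, 0.1
tau = np.linspace(0.05, 0.95, 91)
h = 1e-6
a, b = quasi_steady(tau, omega, eta)
a_p, b_p = quasi_steady(tau + h, omega, eta)
a_m, b_m = quasi_steady(tau - h, omega, eta)
da = (a_p - a_m) / (2 * h)
db = (b_p - b_m) / (2 * h)

print(np.max(np.abs(a * b / (omega * (1 - tau) / 2) - 1)))          # Equation A37
print(np.max(np.abs(db / (eta * a ** 2 / (omega * (1 - tau))) - 1)))  # Equation A35b
print(np.max(np.abs(da)))  # neglected derivative, of order omega^(2/3)
```

Moving the lower end of `tau` towards 0 makes the derivative grow like $τ^{- 4 / 3}$, which reflects the breakdown of the approximation for small $τ$ discussed below.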

Note that consistency of the solution with the approximation is a sufficient criterion for the validity of the approximation: we can solve the system for $a$ and $b$ in Equation A35 iteratively by defining

\frac{d}{d τ} a_{i - 1} = 1 - \frac{2}{ω (1 - τ)} a_{i} b_{i},

\frac{d}{d τ} b_{i} = \frac{η}{ω (1 - τ)} a_{i}^{2} .

Assuming that for $i \to \infty$ , $a_{i}$ and $b_{i}$ converge to the correct solutions $a (τ)$ and $b (τ)$ when starting with $a_{0} = 0$ , we obtain $a_{1}$ and $b_{1}$ as given by Equation A39 and Equation A38 and can iteratively refine the approximation. The next iteration step then reads: $\frac{d}{d τ} a_{1} = 1 - \frac{2}{ω (1 - τ)} a_{2} b_{2}$ . As $a_{1} \sim ω^{\frac{2}{3}}$ we know that the left-hand side will be small and $a_{1}$ and $b_{1}$ solve the system if the left-hand side equals 0. Writing $a_{2} = a_{1} + {\tilde{a}}_{2}$ and $b_{2} = b_{1} + {\tilde{b}}_{2}$ this gives:

\frac{d}{d τ} a_{1} = 1 - \frac{2}{ω (1 - τ)} (a_{1} + {\tilde{a}}_{2}) (b_{1} + {\tilde{b}}_{2}) \approx \frac{- 2}{ω (1 - τ)} (a_{1} {\tilde{b}}_{2} + b_{1} {\tilde{a}}_{2}) .

From dimensional analysis it follows that the correction terms ${\tilde{a}}_{2}$ and ${\tilde{b}}_{2}$ must scale like ${\tilde{a}}_{2} \sim ω^{\frac{4}{3}}$ and ${\tilde{b}}_{2} \sim ω$ and are hence much smaller than the first order approximations $a_{1}$ and $b_{1}$ . Higher order corrections will give even smaller contributions showing that if $\frac{d}{d τ} a_{1} ≪ 1$ , $a_{1}$ is indeed a very good approximation.

In the limit $τ \to 0$ , however, the expression for $a (τ)$ in Equation A39 diverges and consistency is violated. Hence, the obtained solution is valid only for sufficiently large $τ$ .

We fix some small $ϵ > 0$ such that the approximation can be assumed to be sufficiently good if $|\frac{d}{d τ} a| < ϵ$. Furthermore, we define $τ_{ϵ}$ such that $|\frac{d}{d τ} a| < ϵ$ for all $τ > τ_{ϵ}$. Using Equation A39, we can write this as $|\frac{d}{d τ} h| < ϵ η^{\frac{1}{3}} / ω^{\frac{2}{3}}$ for all $τ > τ_{ϵ}$, where the left-hand side depends only on $τ$. Hence, by decreasing $ω$ we can make $τ_{ϵ}$ arbitrarily small: $\lim_{ω \to 0} τ_{ϵ} = 0$. To calculate $g (ω, η)$, we split the integral in Equation A36 into a domain where the approximation $a (τ)$ is accurate and a domain where the exact solution $\tilde{a} (τ)$ deviates strongly from $a (τ)$:

g (ω, η) = \frac{1}{ω} \int_{0}^{τ_{ϵ}} \frac{\tilde{a} (τ)}{1 - τ} 𝑑 τ + \frac{1}{ω} \int_{τ_{ϵ}}^{1} \frac{a (τ)}{1 - τ} 𝑑 τ .

Equation A35a shows that $\frac{d}{d τ} \tilde{a} \leq 1$, since the reaction term is nonpositive; hence $\tilde{a} (τ) \leq τ$. Therefore we can bound the contribution of the first integral as $\int_{0}^{τ_{ϵ}} \frac{\tilde{a} (τ)}{1 - τ} 𝑑 τ \leq \int_{0}^{τ_{ϵ}} \frac{τ}{1 - τ_{ϵ}} 𝑑 τ = \frac{1}{2} \frac{τ_{ϵ}^{2}}{1 - τ_{ϵ}}$. Because this upper bound goes to 0 as $ω$, and hence $τ_{ϵ}$, becomes small, the first integral becomes negligible compared with the second one. Asymptotically, we therefore only need to consider the second integral with the solution for $a (τ)$ as given by Equation A39:

\begin{array}{ll} g (ω, η) & = {(ω η)}^{- \frac{1}{3}} \int_{0}^{1} (6 τ - 3 τ^{2})^{- \frac{1}{3}} d τ = {(ω η)}^{- \frac{1}{3}} \int_{0}^{3} \frac{d z}{6 z^{\frac{1}{3}} \sqrt{1 - \frac{z}{3}}} \\ & = \frac{3^{\frac{2}{3}} \sqrt{π} Γ (\frac{2}{3})}{6 Γ (\frac{7}{6})} {(ω η)}^{- \frac{1}{3}} \approx 0.8969 \cdot {(ω η)}^{- \frac{1}{3}}, \end{array}

where we used the substitution $τ = 1 - \sqrt{1 - z / 3}$ and $Γ (x)$ is the (Euler) Gamma function. Thus, in the limit of small $ω$, $g$ scales with $ω$ and $η$ with the identical exponent $- \frac{1}{3}$. This contrasts with the dimerization scenario, where $g$ as well as $A$ and $B$ depend only on $η$ and are independent of $ω$ (cf. Equation A32, A33 and A34).
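The prefactor can be reproduced in two independent ways: from the Gamma-function expression and by direct quadrature of $\int_{0}^{1} (6 τ - 3 τ^{2})^{- 1 / 3} d τ$. The sketch assumes SciPy's `quad` with `weight="alg"`, which absorbs the integrable factor $τ^{- 1 / 3}$ after writing $6 τ - 3 τ^{2} = τ (6 - 3 τ)$:

```python
import math
from scipy.integrate import quad

# Closed form 3^(2/3) sqrt(pi) Gamma(2/3) / (6 Gamma(7/6))
closed = 3 ** (2 / 3) * math.sqrt(math.pi) * math.gamma(2 / 3) / (6 * math.gamma(7 / 6))

# int_0^1 tau^(-1/3) (6 - 3 tau)^(-1/3) d tau with the algebraic weight tau^(-1/3)
numeric, _ = quad(lambda tau: (6.0 - 3.0 * tau) ** (-1 / 3), 0.0, 1.0,
                  weight="alg", wvar=(-1 / 3, 0.0))

print(closed, numeric)
```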

Numerical analysis and the threshold values for the rate constants

To confirm the results of the previous two subsections and to see how $g (ω, η)$ behaves in the intermediate regime where $ω$ and $η$ are of the same order of magnitude, we also investigate $g (ω, η)$ numerically. For that purpose, we numerically integrate the ODE system for $A (t)$ and $B (t)$ in Equation A26 for different values of $ω$ and $η$ with a semi-implicit method. Subsequently, we integrate the solution $A (t)$ using an adaptive recursive Simpson's rule. Plotting $g$ as a function of $ω$ for fixed $η$ on a double-logarithmic scale reveals a rather simple bipartite form of $g$, see Appendix 3—figure 1a:

g (ω, η) = \begin{cases} g_{1} (η) ω^{- \frac{1}{3}} & ω ≪ 1, \\ g_{2} (η) & ω ≫ 1 . \end{cases}

Appendix 3—figure 1


Fit of $g (ω, η)$ on log-log scale.

The function $g (ω, η) = \int_{0}^{\infty} A_{ω, η} (t) 𝑑 t$ describes (half) the travelled distance of the profile of the polymer size distribution in dependence of $ω = \frac{α}{ν C}$ and $η = \frac{μ}{ν}$ . Marker points show solutions for $g (ω, η)$ as obtained numerically from integration of Equation A26. Red lines are linear fits on log-log scale. In (a) we plot $g (ω, η)$ for fixed $η$ (here exemplarily for $η = 0.01$ ) over 25 orders of magnitude in $ω$ and find a markedly bipartite behavior: For small $ω$ the dependence on $ω$ is perfectly matched by a power law with exponent $- \frac{1}{3}$ and $η$ -dependent coefficient $g_{1} (η)$ , whereas for large $ω$ it is a constant $g_{2} (η)$ . (b) Plotting $g_{2} (η) = g (ω = \infty, η)$ in dependence of $η$ reveals again strictly bipartite behavior. Here, however, only the branch for small $η \leq 1$ is relevant. With the coefficient $g_{1} (η)$ that can be determined in a similar way this leads to the final form of $g (ω, η)$ as given by Equation A46.

The transition between these two regimes is rather sharp so that $g$ is best described in a piecewise fashion

g (ω, η) = \max (g_{1} (η) ω^{- \frac{1}{3}}, g_{2} (η)) .

Next, we plot the coefficients $g_{1} (η)$ and $g_{2} (η)$ against $η$ . Here we find that $g_{1} (η) = a η^{- \frac{1}{3}}$ with $a = const \approx 0.90$ and $g_{2} (η)$ is again bipartite with a sharp kink in between (Appendix 3—figure 1b):

g_{2} (η) = \min (b η^{- \frac{1}{2}}, b^{'} η^{- 0.85}),

where $b \approx 1.11$ and $b^{'} \approx 1.37$ . The transition between both regimes is at $η \approx 1.82$ . The second regime is not relevant for self-assembly since it refers to both large $ω$ and large $η$ , hence the travelled distance $2 g$ is too small to give finite yield in this regime. Therefore, we discard the second regime and obtain as final result

g (ω, η) = \max (a {(η ω)}^{- \frac{1}{3}}, b η^{- \frac{1}{2}}),

with $a \approx 0.90$ and $b \approx 1.11$. This confirms both the exponents and the coefficients found in the previous two subsections. It is, however, surprising that the transition between both regimes is so sharp that $g (ω, η)$ can be defined in a piecewise fashion. This behavior must be the result of a series of lower order terms in $g (ω, η)$ which are unimportant in the limits $ω ≪ η$ and $η ≪ ω$ but cause the sharp transition when $ω$ and $η$ are of the same order of magnitude.
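The piecewise fit is easy to evaluate, and equating its two branches gives the crossover activation rate $ω^{*} = (a / b)^{3} η^{1 / 2}$ below which the activation branch dominates. A minimal Python sketch using the fitted values $a \approx 0.90$ and $b \approx 1.11$; the names `g_fit` and `crossover_omega` are hypothetical:

```python
def g_fit(omega, eta, a=0.90, b=1.11):
    """Piecewise fit max(a (eta*omega)^(-1/3), b eta^(-1/2)) for g(omega, eta)."""
    activation_branch = a * (eta * omega) ** (-1 / 3)
    dimerization_branch = b * eta ** (-0.5)
    return max(activation_branch, dimerization_branch)


def crossover_omega(eta, a=0.90, b=1.11):
    """omega at which both branches coincide: omega* = (a/b)^3 * eta^(1/2)."""
    return (a / b) ** 3 * eta ** 0.5


eta = 0.01
print(crossover_omega(eta), g_fit(1e-6, eta), g_fit(10.0, eta))
```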

Finally, we return to our original task of finding the threshold values of the activation and dimerization rate for the onset of yield. Using our result for $g (ω, η)$ in Equation A23 we find as necessary and sufficient condition to obtain finite yield in the deterministic system:

2 \max (a {(η ω)}^{- \frac{1}{3}}, b η^{- \frac{1}{2}}) \geq L - \sqrt{L} .

Alternatively, we can state this result as two separate conditions out of which at least one must be fulfilled to obtain finite yield:

2 a (η ω)^{- \frac{1}{3}} \geq L - \sqrt{L} \Rightarrow α < α_{th} := P_{α} \frac{ν}{μ} \frac{ν C}{(L - \sqrt{L})^{3}}

or 2 b η^{- \frac{1}{2}} \geq L - \sqrt{L} \Rightarrow μ < μ_{th} := P_{μ} \frac{ν}{(L - \sqrt{L})^{2}}

where $P_{α} = 8 a^{3} \approx 5.77$ and $P_{μ} = 4 b^{2} \approx 4.93$ . This verifies Equation 1 in the main text.
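The two threshold conditions translate directly into code. In this sketch, the coefficients are taken from the asymptotic results ($a \approx 0.8969$ from the Gamma-function expression and $b = π / (2 \sqrt{2})$); the function names and the example parameters ($L = 60$, $C = ν = 1$, $μ = 0.01$) are hypothetical. The checks confirm that the activation and dimerization conditions hold with equality exactly at $α_{th}$ and $μ_{th}$, respectively.

```python
import math

A_COEFF = 0.8969                         # coefficient a of the activation branch
B_COEFF = math.pi / (2 * math.sqrt(2))   # coefficient b of the dimerization branch
P_ALPHA = 8 * A_COEFF ** 3               # approx. 5.77
P_MU = 4 * B_COEFF ** 2                  # approx. 4.93


def alpha_threshold(L, C, mu, nu):
    """alpha_th = P_alpha (nu/mu) nu C / (L - sqrt(L))^3."""
    return P_ALPHA * (nu / mu) * nu * C / (L - math.sqrt(L)) ** 3


def mu_threshold(L, nu):
    """mu_th = P_mu nu / (L - sqrt(L))^2."""
    return P_MU * nu / (L - math.sqrt(L)) ** 2


L, C, mu, nu = 60, 1.0, 0.01, 1.0
alpha = alpha_threshold(L, C, mu, nu)
print(P_ALPHA, P_MU, alpha, mu_threshold(L, nu))
```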

Appendix 4

Impact of the implementation of sub-nucleation reactions

In the main text we focused our discussion on irreversible binding ($L_{n u c} = 2$). In this section we investigate the effect of different implementations of the sub-nucleation reactions.

In general, perfect yield is trivially achieved if the complete ring is the only stable structure. However, yield can be maximal already for smaller nucleation sizes $L_{n u c}$ depending on the explicit decay rate $δ$ . In the deterministic limit without the dimerization and activation mechanisms ( $μ = ν$ , $α \to \infty$ ) a rapid transition from zero yield to perfect yield occurs in dependence of the critical nucleation size (see Appendix 4—figure 1). The threshold value in this case is approximately half the ring size and is weakly affected by the decay rate $δ$ . In order to obtain finite yield for small nucleation sizes, an extremely high decay rate would be necessary. Hence, maximizing the yield solely by increasing the nucleation size is not very feasible.

Appendix 4—figure 1


Yield maximization due to increased nucleation size.

Without the activation and dimerization mechanisms $(α \to \infty, μ = ν)$ the yield can still be optimized by increasing the critical nucleation size $L_{n u c}$. However, a significant improvement is only achieved for critical sizes larger than half the ring size. Above this threshold, a rapid transition to perfect yield takes place; below it, no effect is observed at all. Increasing $δ$ shifts the onset of yield to slightly smaller critical nucleation sizes. Other parameters: $L = 60$, $N = 10000$.

In our model, the subcritical reaction rates $μ_{i}$ may take different values. Here, we want to restrict our discussion to two scenarios. First, all rates have an identical value $μ_{i} = μ$ and second, the rates increase linearly up to the super-nucleation reaction rate: $μ_{i} = μ + (ν - μ) \frac{i - 1}{L_{n u c} - 1}$ .
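The two rate profiles can be written down explicitly. The sketch below assumes, consistent with $ν_{1} = μ$ for $L_{n u c} = 2$ in Appendix 2, that the sub-nucleation steps are indexed by $i = 1, \dots, L_{n u c} - 1$; the function name `subnucleation_rates` and its `profile` argument are hypothetical.

```python
import numpy as np


def subnucleation_rates(mu, nu, L_nuc, profile="constant"):
    """Rates mu_i for the sub-nucleation steps i = 1, ..., L_nuc - 1."""
    i = np.arange(1, L_nuc)
    if profile == "constant":
        return np.full(i.shape, mu, dtype=float)
    if profile == "linear":
        # mu_i = mu + (nu - mu) (i - 1) / (L_nuc - 1): starts at mu and would reach nu at i = L_nuc
        return mu + (nu - mu) * (i - 1) / (L_nuc - 1)
    raise ValueError(f"unknown profile: {profile!r}")


print(subnucleation_rates(0.01, 1.0, 5, "constant"))
print(subnucleation_rates(0.01, 1.0, 5, "linear"))
```

In the linear profile only the first step proceeds at the slow rate $μ$, which is why the dimerization step dominates the total slowdown of nucleation in that case.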

In the deterministic limit, both implementations show the same qualitative behavior as the dimerization mechanism with $L_{n u c} = 2$ in the main text (see Appendix 4—figure 2). The only relevant aspect for the final yield is the extent to which nucleation is slowed down in total. In the constant scenario, all reaction steps contribute equally. As a result, there is a strong dependence on the number of such reaction steps, that is, on the critical nucleation size. If, however, the reaction rates increase linearly with the size of the polymers, the dimerization rate dominates. Only in the case $μ ≪ ν$ is finite yield observed at all. In this limit the dimerization rate is much smaller than the subsequent growth rates. The explicit form of the different $μ_{i}$ is thus not of major importance for the yield; the total slowdown of nucleation is the central feature. Structure decay does not play any role for intermediate nucleation sizes.

Appendix 4—figure 2


Yield for the dimerization mechanism ( $α \to \infty$ ) with different nucleation sizes (colors).

(a) If all sub-nucleation growth rates are identical $(μ_{i} = μ)$, increasing the nucleation size increases the threshold value $μ_{t h}$. The total slowdown of nucleation due to the individual sub-nucleation steps determines the yield. (b) If the sub-nucleation growth rates increase linearly $(μ_{i} = μ + (ν - μ) \frac{i - 1}{L_{n u c} - 1})$, no dependence on the nucleation size is observed. The dimerization rate $μ_{1} = μ$ (which is the most limiting step) dominates entirely. Other parameters: $L = 60$, $N = 10000$, $δ = 1$.

The last question we want to address is how the combination of the activation and dimerization mechanisms, and the corresponding non-monotonic behavior, is affected by the nucleation size. Again, we compare constant sub-nucleation growth rates with linearly increasing ones (see Appendix 4—figure 3). In the deterministic regime, both implementations behave qualitatively similarly to the dimerization mechanism discussed in the main text. However, in both cases the stochastic yield catastrophe is less pronounced. For constant growth rates, the maximal yield saturates for sufficiently low $μ$. For a linear profile, this effect is weaker than in the constant case, and the maximal yield still depends on the explicit value of $μ$; the saturation value is not reached for these reaction rates.

Appendix 4—figure 3


Combined mechanisms for different nucleation sizes (symbols) and dimerization rates (color).

(a) If the sub-nucleation growth rates are identical $(μ_{i} = μ)$ the stochastic yield catastrophe is weakened but still has a drastic impact. The qualitative behavior remains unchanged. (b) For a linearly increasing sub-nucleation growth rate $(μ_{i} = μ + (ν - μ) \frac{i - 1}{L_{n u c} - 1})$ in the deterministic regime no changes are observed at all. The effect of the stochastic yield catastrophe is less pronounced. This improvement is mainly caused by structure decay which mitigates stochastic fluctuations. However, a slight dependency of the saturation value on the rate $μ$ is observed. Other parameters: $L = 60$ , $S = L$ , $N = 100$ , $δ = 0.1$ .

Taking all our results for the sub-nucleation behavior together, we draw the following conclusions: First, structure decay by itself is not a very efficient way to maximize yield. Second, the explicit choice of the sub-nucleation rates is of minor importance for the qualitative behavior; the system behaves similarly to the case $L_{n u c} = 2$. Third, larger nucleation sizes generally mitigate the stochastic yield catastrophe.

Appendix 5

Time evolution of the yield in the activation and dimerization scenario

In the main text we focus on the final yield, which represents the maximal yield that can be obtained in the assembly reaction for $t \to \infty$. Here, we briefly discuss the temporal evolution of the yield in the two scenarios. Appendix 5—figure 1 shows the yield as a function of time for the dimerization scenario (blue) and the activation scenario (red) for the parameters indicated in the plot. Solid lines show the evolution of the yield in the stochastic simulation, whereas dashed lines represent its deterministic evolution obtained by integrating the corresponding mean-field rate equations (only shown for the activation scenario). In both scenarios, yield production sets in after a short lag time (Hagan and Elrad, 2010). The emergence of a lag time can be understood by interpreting the assembly process as the progression of a travelling wave (see Appendix 2). The travelling wave describes the polymer size distribution, and the time needed for the wave to reach the absorbing boundary equals the lag time for yield production observed in Appendix 5—figure 1. After the lag time, the yield increases very abruptly in the dimerization scenario and more gradually in the activation scenario. Since monomers are provided gradually in the activation scenario, the emerging wave is flatter and extends over a larger range (in polymer size space) than in the dimerization scenario. Consequently, yield production is more gradual in the activation scenario than in the dimerization scenario. For the same reason, the dimerization scenario is generally ‘faster’, that is, more time efficient, than the activation scenario. For a detailed analysis of the time efficiency of these and other self-assembly scenarios we refer the reader to our manuscript in preparation (Gartner, Graf and Frey, in preparation).

In all depicted situations, the yield increases monotonically with time. This is, of course, generally true since the completed ring structures define an absorbing state in our system. The final yield, which is indicated in the right bar, therefore represents the upper limit for the yield that can be achieved in the assembly reaction. Appendix 5—figure 1 shows that the temporal yield curves initially are rather steep and quickly reach a value that lies within 10% of the final yield (‘quickly’ thereby refers to the respective time scale), before the curves flatten and increase more slowly. This underlines that the final yield is a meaningful observable that not only describes the upper limit for the yield but also approximates the typical yield of the assembly reaction under appropriate time constraints that are not too restrictive (on the time scale set by the respective lag time).

Appendix 5—figure 1


Time evolution of the yield in the activation and dimerization scenario.

The time dependence of the yield is depicted for a dimerization scenario (blue) with $μ = 5 \times 10^{- 4}$ and $N = 100$ and for two activation scenarios (red) with $α = 0.1$ and $N = 2 \times 10^{2}$ and $N = 10^{4}$, respectively, for target structures of size $L = 20$. Solid lines show the time evolution of the stochastic systems, while dashed lines describe the time evolution in the corresponding deterministic systems (where the final yield may be higher in the activation scenario). In all cases the yield increases monotonically with time. The final yield, indicated in the right bar, represents the upper limit of the yield at any time. Yield production in the activation scenario is generally more gradual than in the dimerization scenario. Therefore, the dimerization scenario is, in general, more time efficient than the activation scenario.

Appendix 6

Standard deviation of the yield

In the main text, the analysis focuses on the average yield. A priori, however, it is not apparent that this average quantity is informative, in particular given the strong effect of stochasticity in the system. Here, we therefore complement this picture by also considering a simple measure for the fluctuations of the yield, its standard deviation. Appendix 6—figure 1 extends Figure 3a in the main text and shows the dependence of the average yield and its sample standard deviation on the activation rate. Since the yield is nonnegative, its standard deviation has to be small if the average yield is close to 0 ($N = 500$ in Appendix 6—figure 1). The same holds for an average yield close to 1, as the yield is bounded from above by 1 ($N = 5000$ in Appendix 6—figure 1). For intermediate values of the average yield, the standard deviation is highest but still small compared to the average yield ($N = 1000$ in Appendix 6—figure 1). The average yield is, thus, meaningful. Naturally, the ratio of the standard deviation to the average yield also depends on the number of particles per species $N$ and on the number of species $S$. Generally speaking, this ratio decreases for higher $N$ and $S$ (see Appendix 7—figure 1 for the dependency on $S$).

Appendix 6—figure 1


Average yield and its sample standard deviation.

For an average yield close to 0 or close to 1, the standard deviation must be small because the yield is bounded to the interval [0, 1]. For intermediate values, the standard deviation is largest; it is, however, still considerably smaller than the average yield. The parameters are $L = 60$, $S = L$, $μ = ν = 1$, and different particle numbers $N$ (colors/symbols). The average yield was obtained by averaging over 1000 simulations; the standard deviation is the unbiased sample standard deviation.
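The boundedness argument can be made quantitative. For values in [0, 1], each $y_i^2 \le y_i$, so the (biased) variance of $n$ runs with mean $m$ is at most $m(1-m)$, and the unbiased sample variance is at most $\frac{n}{n-1} m(1-m)$, which vanishes as $m \to 0$ or $m \to 1$. The Python sketch below computes the average yield and the unbiased sample standard deviation from a list of per-run yields and evaluates this bound. It is an illustration only, not the authors' analysis code: the function names are hypothetical, and the per-run yields are assumed to be available as a list.

```python
import math
import statistics


# Hypothetical helper for illustration; not part of the article's source code.
def yield_summary(yields):
    """Average yield and unbiased sample standard deviation over independent runs.

    `yields` holds one final yield per stochastic run, each in [0, 1].
    """
    if len(yields) < 2:
        raise ValueError("need at least two runs for a sample standard deviation")
    if any(not 0.0 <= y <= 1.0 for y in yields):
        raise ValueError("yields must lie in [0, 1]")
    mean = statistics.fmean(yields)
    sd = statistics.stdev(yields)  # normalized by n - 1 (unbiased sample variance)
    return mean, sd


# Hypothetical helper: the largest possible sample SD for bounded data.
def max_sample_sd(mean, n):
    """Largest unbiased sample SD that n values in [0, 1] with this mean can have.

    Because y**2 <= y on [0, 1], the biased variance is at most mean * (1 - mean);
    the factor n / (n - 1) converts it to the unbiased estimator.
    """
    return math.sqrt(n / (n - 1) * mean * (1.0 - mean))


mean, sd = yield_summary([0.0, 0.0, 1.0, 1.0])
print(round(mean, 3), round(sd, 3), round(max_sample_sd(mean, 4), 3))  # 0.5 0.577 0.577
```

Four runs ending with yields 0, 0, 1, 1 reach the bound exactly, which is the most dispersed configuration possible for a mean of 0.5; as the mean approaches 0 or 1, the bound, and hence any observed standard deviation, shrinks to zero.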

Appendix 7

Influence of the heterogeneity of the target structure for fixed number of particles per species

Figure 3d in the main text shows how the maximal yield $y_{\max}$ depends on the number of species $S$ when the ring size $L$ and the number of possible ring structures $N S / L$ are fixed. This comparison at fixed $N S$ is motivated by the question of what role the heterogeneity of a structure plays for assembly efficiency when a given number of structures is to be realized. Figure 3d illustrates that a higher number of species $S$ (more heterogeneous structures) leads to a lower maximal yield, suggesting that it is beneficial to build structures from as few different species as possible. However, this situation does not correspond to the deterministically equivalent case of a fixed number of particles per species $N$ (note, though, that in the deterministic case the maximal yield is always 1, namely for $α \to 0$). Instead, for a higher number of species $S$, the number of particles per species $N \propto 1 / S$ decreases. How does the heterogeneity $S$ of the structures alter the maximal yield if $L$ and $N$ (instead of $L$ and $N S$) are fixed? Appendix 7—figure 1 shows how the maximal yield $y_{\max}$ and its standard deviation depend on the number of species $S$. Both were obtained as the average yield and the sample standard deviation for $α = 10^{- 8}$, where the yield has saturated and the dynamics (apart from the timescale) become independent of the exact value of the rate-limiting activation rate. For homogeneous structures ($S = 1$), the yield is always perfect, since in this case there can be no fluctuations between species; the average yield is 1 and the standard deviation is 0. For increasing $S$, the average yield decreases until it levels off for $S ≫ 1$. This behavior indicates that the decreasing number of particles per species $N$ for larger $S$ is indeed essential for the decrease of the maximal yield with $S$ in Figure 3d. As mentioned above, the standard deviation is largest for small $S > 1$ and decreases with $S$.
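The difference between the two comparisons follows from simple bookkeeping. Writing $R = N S / L$ for the number of possible ring structures (the symbol $R$ is introduced here only for this illustration):

$$ R = \frac{N S}{L} \ \text{fixed} \quad \Longrightarrow \quad N = \frac{R L}{S} \propto \frac{1}{S}. $$

For $L = 60$, going from $S = 1$ to $S = 60$ at fixed $R$ therefore lowers the number of particles per species from $N = 60R$ to $N = R$. Conversely, at the fixed $N = 1000$ of Appendix 7—figure 1, the number of possible rings $R = 1000\,S/60$ grows with $S$, from about 17 at $S = 1$ to 1000 at $S = 60$.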

Appendix 7—figure 1


Influence of the heterogeneity of the target structure on the yield for fixed number of particles per species $N$ .

The maximal yield and its standard deviation (obtained as average yield and sample standard deviation for $α = 10^{- 8}$ ) are plotted against the number of species $S$ making up the structure of size $L = 60$ . The number of particles per species $N = 1000$ is fixed. Yield drops from a perfect value of 1 for $S = 1$ to a smaller value and levels off for $S ≫ 1$ . The standard deviation is largest for small $S$ (except for $S = 1$ where the yield is always perfect) and decreases with increasing number of species.

Appendix 8

Dependence of the maximal yield $y_{max}$ in the activation scenario on $N$ and $L$

Figure 3c in the main text characterizes the dependence of the maximal yield $y_{max}$ in the activation scenario by a ‘phase diagram’ that distinguishes different regimes of $y_{max}$ as a function of the particle number $N$ and the target size $L$. Complementing this figure, Appendix 8—figure 1 shows the maximal yield obtained in the activation scenario in the limit $α \to 0$ as a function of $N$ for fixed $L$ (Appendix 8—figure 1a) and as a function of $L$ for fixed $N$ (Appendix 8—figure 1b). With increasing particle number $N$, the maximal yield exhibits a transition from 0 to 1 that spans roughly three orders of magnitude. Increasing $L$ shifts this transition to larger $N$. The threshold particle number at which the transition starts is denoted by $N_{th}^{> 0} (L)$ (see main text). For $L \leq 600$, we find approximately $N_{th}^{> 0} (L) \sim L^{2.8}$ (cf. main text, Figure 3c). Similarly, when the target size $L$ is decreased at fixed $N$, the maximal yield exhibits a transition from 0 to 1 that spans roughly one order of magnitude in $L$. The corresponding threshold value $L_{th}^{> 0}$ as a function of $N$ is the inverse function of $N_{th}^{> 0} (L)$. Hence, at least for $N \leq 10^{5}$, approximately $L_{th}^{> 0} (N) \sim N^{0.36}$ (since $1/2.8 \approx 0.36$). Since $y_{max}$ is largely independent of the number of species $S$ for fixed $N$ and $L$ (see Appendix 7), the maximal yield in the activation scenario (for $L_{nuc} = 2$) can be fully characterized as a function $y_{max} (N, L)$ of $N$ and $L$. In terms of the threshold particle number $N_{th}^{> 0} (L)$, $y_{max}$ can then roughly be expressed as

$$ y_{max} (N, L) \begin{cases} \approx 1 & \text{if } N > 10^{3} N_{th}^{> 0} (L), \\ < 1 & \text{if } N_{th}^{> 0} (L) < N < 10^{3} N_{th}^{> 0} (L), \\ = 0 & \text{if } N < N_{th}^{> 0} (L). \end{cases} $$
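The three regimes can be encoded as a simple classifier. The sketch below is hypothetical (the function names are not from the article): it takes the threshold $N_{th}^{> 0}(L)$ as an input, for example read off Figure 3c, because only the approximate scaling $N_{th}^{> 0} \sim L^{2.8}$ is stated here, not its prefactor. Points exactly on a boundary, which the strict inequalities leave unassigned, are arbitrarily placed in the intermediate regime. The second function checks the exponent of the inverse threshold, $1/2.8 \approx 0.36$.

```python
# Hypothetical helper for illustration; N_th must be supplied by the user.
def yield_regime(N, N_th):
    """Classify y_max(N, L) into the three regimes of the piecewise estimate.

    N_th stands for N_th^{>0}(L) at the target size L of interest.
    """
    if N < N_th:
        return "zero"          # y_max = 0
    if N > 1e3 * N_th:
        return "near-perfect"  # y_max ≈ 1
    # Transition region, y_max < 1; also receives the boundary points
    # N == N_th and N == 1e3 * N_th (an arbitrary choice).
    return "partial"


def inverse_exponent(a):
    """If N_th ~ L**a, then inverting gives L_th ~ N**(1 / a)."""
    return 1.0 / a


print(yield_regime(5e2, 1e2), round(inverse_exponent(2.8), 2))  # partial 0.36
```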

Appendix 8—figure 1


Dependence of the maximal yield $y_{max}$ in the activation scenario on $N$ and $L$.

For each data point, $y_{max}$ was determined as the average yield of 100 independent stochastic simulations of the activation scenario with $α = 10^{- 12}$. (a) Variation of the particle number $N$ for different target sizes $L$. The maximal yield increases from 0 to 1 over roughly three orders of magnitude in $N$; the onset of the transition depends on $L$. (b) Variation of the target size $L$ for different particle numbers $N$. Increasing the target size $L$ at fixed $N$ causes the maximal yield to drop to 0. The transition from 1 to 0 spans roughly one order of magnitude in $L$, and its position is determined by $N$.

As can be seen from Figure 3c in the main text, the transition line between zero and nonzero yield slightly flattens with increasing $L$ . Hence, the power law $N_{th}^{> 0} (L) \sim L^{2.8}$ (and similarly for $L_{th}^{> 0}$ ) only holds approximately and for a restricted range in $L$ and $N$ . The asymptotic behavior of $N_{th}^{> 0}$ in the limit $L \to \infty$ remains elusive.
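The appendix does not state how the exponent 2.8 was estimated. One common approach, shown here as an assumption rather than the authors' procedure, is a least-squares fit of $\log N_{th}^{> 0}$ against $\log L$; comparing the slopes between consecutive points then reveals the kind of flattening described above. The function names are hypothetical, and the numbers in the usage line are synthetic values that only exercise the code; they are not results from the article.

```python
import math


# Hypothetical helper: exponent of a power law from a log-log least-squares fit.
def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x): the exponent a in y ~ x**a."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    return sxy / sxx


# Hypothetical helper: local exponents; a decreasing sequence means the curve flattens.
def local_slopes(xs, ys):
    """Log-log slopes between consecutive points."""
    return [
        (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))
        for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])
    ]


# Synthetic thresholds that follow an exact power law (not data from the article).
sizes = [10, 20, 40, 80, 160]
thresholds = [3.0 * L ** 2.8 for L in sizes]
print(round(loglog_slope(sizes, thresholds), 3))  # 2.8
```

A single global fit hides curvature; if the local slopes decrease with $L$, a fitted exponent is only meaningful for the range of $L$ over which it was obtained, which is the caveat stated above.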

Data availability

All data was generated from stochastic simulations in C++ and deterministic simulations in Matlab. The source code files are included with the article.

References

Alberts B, Johnson A (2015) Molecular Biology of the Cell. New York: Garland Science.

Chen C, Kao CC, Dragnea B (2008) Self-assembly of brome mosaic virus capsids: insights from shorter time-scale experiments. The Journal of Physical Chemistry A 112:9405–9412. https://doi.org/10.1021/jp802498z

Chevance FF, Hughes KT (2008) Coordinating assembly of a bacterial macromolecular machine. Nature Reviews Microbiology 6:455–465. https://doi.org/10.1038/nrmicro1887

(2012) Stochastic self-assembly of incommensurate clusters. The Journal of Chemical Physics 136:084110. https://doi.org/10.1063/1.3688231

D'Orsogna MR, Zhao B, Berenji B, Chou T (2013) Combinatoric analysis of heterogeneous stochastic self-assembly. The Journal of Chemical Physics 139:121918. https://doi.org/10.1063/1.4817202

(2015) First assembly times and equilibration in stochastic coagulation-fragmentation. The Journal of Chemical Physics 143:014112. https://doi.org/10.1063/1.4923002

Drummond DA, Wilke CO (2009) The evolutionary consequences of erroneous protein synthesis. Nature Reviews Genetics 10:715–724. https://doi.org/10.1038/nrg2662

Endres D, Zlotnick A (2002) Model-based analysis of assembly kinetics for virus capsids or other spherical polymers. Biophysical Journal 83:1217–1230. https://doi.org/10.1016/S0006-3495(02)75245-4

(2015) Dynamic DNA devices and assemblies formed by shape-complementary, non-base pairing 3D components. Science 347:1446–1452. https://doi.org/10.1126/science.aaa5372

Gillespie DT (2007) Stochastic simulation of chemical kinetics. Annual Review of Physical Chemistry 58:35–55. https://doi.org/10.1146/annurev.physchem.58.032806.104637

(2011) Analyzing mechanisms and microscopic reversibility of self-assembly. The Journal of Chemical Physics 135:214505. https://doi.org/10.1063/1.3662140

(2011) Mechanisms of kinetic trapping in self-assembly and phase transformation. The Journal of Chemical Physics 135:104115. https://doi.org/10.1063/1.3635775

Hagan MF (2014) Modeling viral capsid assembly. Advances in Chemical Physics 155:1. https://doi.org/10.1002/9781118755815.ch01

Hagan MF, Elrad OM (2010) Understanding the concentration dependence of viral capsid assembly kinetics--the origin of the lag time and identifying the critical nucleus size. Biophysical Journal 98:1065–1074. https://doi.org/10.1016/j.bpj.2009.11.023

Haxton TK, Whitelam S (2013) Do hierarchical structures assemble best via hierarchical pathways? Soft Matter 9:6851–6861. https://doi.org/10.1039/c3sm27637f

(2014) Growth of equilibrium structures built from a large number of distinct component types. Soft Matter 10:6404–6416. https://doi.org/10.1039/C4SM01021C

(2015) Rational design of self-assembly pathways for complex multicomponent structures. PNAS 112:6313–6318. https://doi.org/10.1073/pnas.1502210112

Jacobs WM, Frenkel D (2015) Self-assembly protocol design for periodic multicomponent structures. Soft Matter 11:8930–8938. https://doi.org/10.1039/C5SM01841B

Jucker M, Walker LC (2013) Self-propagation of pathogenic protein aggregates in neurodegenerative diseases. Nature 501:45–51. https://doi.org/10.1038/nature12481

Ke Y, Ong LL, Shih WM, Yin P (2012) Three-dimensional structures self-assembled from DNA bricks. Science 338:1177–1183. https://doi.org/10.1126/science.1227268

(2010) A Kinetic View of Statistical Physics. Cambridge University Press.

Lazaro GR, Hagan MF (2016) Allosteric control of icosahedral capsid assembly. The Journal of Physical Chemistry B 120:6306–6318. https://doi.org/10.1021/acs.jpcb.6b02768

(2010) Morphogenesis of the T4 tail and tail fibers. Virology Journal 7:355. https://doi.org/10.1186/1743-422X-7-355

Michaels TC, Dear AJ, Kirkegaard JB, Saar KL, Weitz DA, Knowles TP (2016) Fluctuations in the kinetics of linear protein self-assembly. Physical Review Letters 116:258103. https://doi.org/10.1103/PhysRevLett.116.258103

(2017) Kinetic constraints on self-assembly into closed supramolecular structures. Scientific Reports 7:12295. https://doi.org/10.1038/s41598-017-12528-8

(2009) Assembly of viruses and the pseudo-law of mass action. The Journal of Chemical Physics 131:155101. https://doi.org/10.1063/1.3212694

(2015) Undesired usage and the robust self-assembly of heterogeneous structures. Nature Communications 6:6203. https://doi.org/10.1038/ncomms7203

Peña C, Hurt E, Panse VG (2017) Eukaryotic ribosome assembly, transport and quality control. Nature Structural & Molecular Biology 24:689–699. https://doi.org/10.1038/nsmb.3454

Praetorius F, Dietz H (2017) Self-assembly of genetically encoded DNA-protein hybrid nanoscale shapes. Science 355:eaam5488. https://doi.org/10.1126/science.aam5488

Reinhardt A, Frenkel D (2014) Numerical evidence for nucleated self-assembly of DNA brick structures. Physical Review Letters 112:238103. https://doi.org/10.1103/PhysRevLett.112.238103

(2010) Lock and key colloids. Nature 464:575–578. https://doi.org/10.1038/nature08906

Sear RP (2007) Nucleation: theory and applications to protein solutions and colloidal suspensions. Journal of Physics: Condensed Matter 19:033101. https://doi.org/10.1088/0953-8984/19/3/033101

(2017) Gigadalton-scale shape-programmable DNA assemblies. Nature 552:78–83. https://doi.org/10.1038/nature24651

Wei B, Dai M, Yin P (2012) Complex shapes self-assembled from single-stranded DNA tiles. Nature 485:623–626. https://doi.org/10.1038/nature11075

Whitelam S (2015) Hierarchical assembly may be a way to make large information-rich structures. Soft Matter 11:8225–8235. https://doi.org/10.1039/C5SM01375E

(1991) Molecular self-assembly and nanochemistry: a chemical strategy for the synthesis of nanostructures. Science 254:1312–1319. https://doi.org/10.1126/science.1962191

Whitesides GM, Grzybowski B (2002) Self-assembly at all scales. Science 295:2418–2421. https://doi.org/10.1126/science.1070821

(2012) First passage times in homogeneous nucleation and self-assembly. The Journal of Chemical Physics 137:244107. https://doi.org/10.1063/1.4772598

(2017) Colloquium: Toward living matter with colloidal particles. Reviews of Modern Physics 89:031001. https://doi.org/10.1103/RevModPhys.89.031001

Zhang S (2003) Fabrication of novel biomaterials through molecular self-assembly. Nature Biotechnology 21:1171–1178. https://doi.org/10.1038/nbt874

(1999) A theoretical model successfully identifies features of hepatitis B virus capsid assembly. Biochemistry 38:14644–14652. https://doi.org/10.1021/bi991611a

Zlotnick A (2003) Are weak protein-protein interactions the general rule in capsid assembly? Virology 315:269–274. https://doi.org/10.1016/S0042-6822(03)00586-5

Article and author information

Author details

Florian M Gartner

Arnold Sommerfeld Center for Theoretical Physics (ASC) and Center for NanoScience (CeNS), Department of Physics, Ludwig-Maximilians-Universität München, München, Germany

Contribution
Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Project administration

Contributed equally with
Isabella R Graf and Patrick Wilke

Competing interests
No competing interests declared

ORCID: 0000-0002-9801-4288

Isabella R Graf

Arnold Sommerfeld Center for Theoretical Physics (ASC) and Center for NanoScience (CeNS), Department of Physics, Ludwig-Maximilians-Universität München, München, Germany

Contribution
Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Project administration

Contributed equally with
Florian M Gartner and Patrick Wilke

Competing interests
No competing interests declared

ORCID: 0000-0001-9169-9109

Patrick Wilke

Arnold Sommerfeld Center for Theoretical Physics (ASC) and Center for NanoScience (CeNS), Department of Physics, Ludwig-Maximilians-Universität München, München, Germany

Contribution
Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Project administration

Contributed equally with
Florian M Gartner and Isabella R Graf

Competing interests
No competing interests declared

Philipp M Geiger

Arnold Sommerfeld Center for Theoretical Physics (ASC) and Center for NanoScience (CeNS), Department of Physics, Ludwig-Maximilians-Universität München, München, Germany

Contribution
Conceptualization, Validation, Investigation, Visualization, Project administration

Competing interests
No competing interests declared

Erwin Frey

Arnold Sommerfeld Center for Theoretical Physics (ASC) and Center for NanoScience (CeNS), Department of Physics, Ludwig-Maximilians-Universität München, München, Germany

Contribution
Conceptualization, Resources, Supervision, Funding acquisition, Validation, Methodology, Project administration

For correspondence
frey@lmu.de

Competing interests
No competing interests declared

ORCID: 0000-0001-8792-3358

Funding

Deutsche Forschungsgemeinschaft (GRK2062)

Patrick Wilke

Deutsche Forschungsgemeinschaft (QBM)

Florian M Gartner
Isabella R Graf

Aspen Center for Physics (PHY-1607611)

Erwin Frey

Deutsche Forschungsgemeinschaft (EXC-2094 - 390783311)

Erwin Frey

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Nigel Goldenfeld for a stimulating discussion, and Raphaela Geßele and Laeschkir Hassan for helpful feedback on the manuscript. This research was supported by the German Excellence Initiative via the program ‘NanoSystems Initiative Munich’ (NIM) and was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2094–390783311. FMG and IRG are supported by a DFG fellowship through the Graduate School of Quantitative Biosciences Munich (QBM). We also gratefully acknowledge financial support by the DFG Research Training Group GRK2062 (Molecular Principles of Synthetic Biology). Finally, EF thanks the Aspen Center for Physics, which is supported by National Science Foundation grant PHY-1607611, for its hospitality and for inspiring discussions with colleagues.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.