Stochastic yield catastrophes and robustness in self-assembly
Abstract
A guiding principle in self-assembly is that, for high production yield, nucleation of structures must be significantly slower than their growth. However, details of the mechanism that impedes nucleation are broadly considered irrelevant. Here, we analyze self-assembly into finite-sized target structures employing mathematical modeling. We investigate two key scenarios to delay nucleation: (i) by introducing a slow activation step for the assembling constituents and (ii) by decreasing the dimerization rate. These scenarios have widely different characteristics. While the dimerization scenario exhibits robust behavior, the activation scenario is highly sensitive to demographic fluctuations. These demographic fluctuations ultimately disfavor growth compared to nucleation and can suppress yield completely. The occurrence of this stochastic yield catastrophe does not depend on model details but is generic as soon as number fluctuations between constituents are taken into account. On a broader perspective, our results reveal that stochasticity is an important limiting factor for self-assembly and that the specific implementation of the nucleation process plays a significant role in determining the yield.
eLife digest
The self-assembly of a large biological molecule from small building blocks is like finishing a puzzle of magnetic pieces by shaking the box. Even though each piece of the puzzle is attracted to its correct neighbours, the limited control makes it very hard to finish the puzzle in a short amount of time.
The problem becomes even more difficult if several copies of the same puzzle are assembled in one box. If several puzzles start at the same time, the different parts might steal pieces from each other, making it impossible to successfully complete any of the puzzles. This is called a depletion trap. If the box is only shaken and there is no real control over individual pieces, these traps occur at random.
Overcoming these random depletion traps is an important challenge when assembling nanostructures and other artificial molecules designed by humans without wasting many, potentially expensive, components. Previous studies have shown that when multiple copies of the same structure are assembled simultaneously, slowing the rate of initiation increases the yield of correctly made structures. This prevents new structures from stealing pieces from existing structures before they are fully completed.
Now, Gartner, Graf, Wilke et al. have used a mathematical model to show that changing the way initiation is delayed leads to different yields. This was especially true for small systems where fluctuations in the availability of the different pieces strongly enhanced the initiation of new structures. In these cases, the self-assembly process terminated undesirably with many incomplete structures.
Nanostructures have various applications ranging from drug delivery to robotics. These findings suggest that in order to efficiently assemble biological molecules, the concentrations of the different building blocks need to be tightly controlled. A question for further research is to investigate strategies that reduce fluctuations in the availability of the building blocks to develop more efficient assembly protocols.
Introduction
Efficient and accurate assembly of macromolecular structures is vital for living organisms. Not only must resource use be carefully controlled, but malfunctioning aggregates can also pose a substantial threat to the organism itself (Jucker and Walker, 2013; Drummond and Wilke, 2009). Furthermore, artificial self-assembly processes have important applications in a variety of research areas like nanotechnology, biology, and medicine (Zhang, 2003; Whitesides and Grzybowski, 2002; Whitesides et al., 1991). In these areas, we find a broad range of assembly schemes. For example, while a large number of viruses assemble capsids from identical protein subunits, some others, like the Escherichia virus T4, form highly complex and heterogeneous virions encompassing many different types of constituents (Zlotnick et al., 1999; Zlotnick, 2003; Hagan, 2014; Leiman et al., 2010). Furthermore, artificially built DNA structures can reach up to gigadalton sizes and can, in principle, comprise an unlimited number of different subunits (Ke et al., 2012; Reinhardt and Frenkel, 2014; Gerling et al., 2015; Wagenbauer et al., 2017). Notwithstanding these differences, a generic self-assembly process always includes three key steps: First, subunits must be made available, for example by gene expression, or rendered competent for binding, for example by nucleotide exchange (Alberts and Johnson, 2015; Chen et al., 2008; Whitelam, 2015) (‘activation’). Second, the formation of a structure must be initiated by a nucleation event (‘nucleation’). Due to cooperative or allosteric effects in binding, there might be a significant nucleation barrier (Chen et al., 2008; Jacobs and Frenkel, 2015; Sear, 2007; Lazaro and Hagan, 2016; Hagan and Elrad, 2010). Third, following nucleation, structures grow via aggregation of substructures (‘growth’).
To avoid kinetic traps that may occur due to irreversibility or very slow disassembly of substructures (Hagan et al., 2011; Grant et al., 2011), structure nucleation must be significantly slower than growth (Zlotnick et al., 1999; Ke et al., 2012; Reinhardt and Frenkel, 2014; Wei et al., 2012; Jacobs et al., 2015; Hagan and Elrad, 2010). Physically speaking, there are no irreversible reactions. However, in the biological context, self-assembly describes the (relatively fast) formation of long-lasting, stable structures. Therefore, at least part of the assembly reactions are often considered to be irreversible on the time scale of the assembly process. In this manuscript we investigate, for a given target structure, whether the nature of the specific mechanism employed in order to slow down nucleation influences the yield of assembled product. To address this question, we examine a generic model that incorporates the key elements of self-assembly outlined above.
Model definition
We model the assembly of a fixed number of well-defined target structures from limited resources. Specifically, we consider a set of $S$ different species of constituents denoted by $1,\mathrm{\dots},S$ which assemble into rings of size $L$. The cases $S=1$ and $1<S\le L$ ($S=L$) are denoted as homogeneous and partially (fully) heterogeneous, respectively. The homogeneous model builds on previous work on virus capsid (Chen et al., 2008; Hagan et al., 2011), linear protein filament assembly (Michaels et al., 2016; Michaels et al., 2017; D'Orsogna et al., 2012) and aggregation and polymerization models (Krapivsky et al., 2010). The heterogeneous model in turn links to previous model systems used to study, for example, DNA-brick-based assembly of heterogeneous structures (Murugan et al., 2015; Hedges et al., 2014; D'Orsogna et al., 2013). We emphasize that, even though strikingly similar experimental realizations of our model exist (Gerling et al., 2015; Wagenbauer et al., 2017; Praetorius and Dietz, 2017), it is not intended to describe any particular system. The ring structure represents a general linear assembly process involving building blocks with equivalent binding properties and resulting in a target of finite size. The main assumption in the ring model is that the different constituents assemble linearly in a sequential order. In many biological self-assembling systems like bacterial flagellum assembly or biogenesis of the ribosome subunits the assumption of a linear binding sequence appears to be justified (Peña et al., 2017; Chevance and Hughes, 2008). In order to test the validity of our results beyond these constraints we also perform stochastic simulations of generalized self-assembling systems that do not obey a sequential binding order: (i) by explicitly allowing for polymer-polymer binding and (ii) by considering the assembly of finite-sized squares that grow independently in two dimensions (see Figures 6 and 7).
The assembly process starts with $N$ inactive monomers of each species. We use $C=N/V$ to denote the initial concentration of each monomer species, where $V$ is the reaction volume. Monomers are activated independently at the same per capita rate $\alpha $, and, once active, are available for binding. Binding takes place only between constituents of species with periodically consecutive indices, for example 1 and 2 or $S$ and 1 (leading to structures such as $\mathrm{\dots}1231\mathrm{\dots}$ for $S=3$); see Figure 1. To avoid ambiguity, we restrict ring sizes to integer multiples of the number of species $S$. Furthermore, we neglect the possibility of incorrect binding, for example species 1 binding to 3 or $S-1$. Polymers, that is incomplete ring structures, grow via consecutive attachment of monomers. For simplicity, polymer-polymer binding is disregarded at first, as it is typically assumed to be of minor importance (Zlotnick et al., 1999; Chen et al., 2008; Murugan et al., 2015; Haxton and Whitelam, 2013). To probe the robustness of the model, we later consider an extended model including polymer-polymer binding, for which the results are qualitatively the same (see Figure 6 and the discussion). Furthermore, it has been observed that nucleation phenomena play a critical role for self-assembly processes (Ke et al., 2012; Wei et al., 2012; Reinhardt and Frenkel, 2014; Chen et al., 2008). It is therefore in general necessary to take into account a critical nucleation size, which marks the transition between slow particle nucleation and the faster subsequent structure growth (Michaels et al., 2016; Lazaro and Hagan, 2016; Morozov et al., 2009; Murugan et al., 2015). We denote this critical nucleation size by ${L}_{\mathrm{nuc}}$, which in terms of classical nucleation theory corresponds to the structure size at which the free energy barrier has its maximum.
For $l<{L}_{\mathrm{nuc}}$ attachment of monomers to existing structures and decay of structures (reversible binding) into monomers take place at size-dependent reaction rates ${\mu}_{l}$ and ${\delta}_{l}$, respectively (Figure 1). Here, we focus on identical rates ${\mu}_{l}=\mu $ and ${\delta}_{l}=\delta $. A discussion of the general case is given in Appendix 4. Above the nucleation size, polymers grow by attachment of monomers with reaction rate $\nu \ge \mu $ per binding site. As we consider successfully nucleated structures to be stable on the observational time scales, monomer detachment from structures above the critical nucleation size is neglected (irreversible binding) (Murugan et al., 2015; Chen et al., 2008). Complete rings neither grow nor decay (absorbing state).
We investigate two scenarios for the control of nucleation speed, first separately and then in combination. For the ‘activation scenario’ we set $\mu =\nu $ (all binding rates are equal) and control the assembly process by varying the activation rate $\alpha $. For the ‘dimerization scenario’ all particles are inherently active ($\alpha \to \mathrm{\infty}$) and we control the assembly process by varying the dimerization rate $\mu $ (we focus on ${L}_{\mathrm{nuc}}=2$). It has been demonstrated previously (Chen et al., 2008; Endres and Zlotnick, 2002; Hagan and Elrad, 2010; Morozov et al., 2009) that either a slow activation step or a slow dimerization step is suitable in principle to retard nucleation and favour growth of the structures over the initiation of new ones. We quantify the quality of the assembly process in terms of the assembly yield, defined as the number of successfully assembled ring structures relative to the maximal possible number $NS/L$. Yield is measured when all resources have been used up and the system has reached its final state. We do not discuss the assembly time in this manuscript; however, in Appendix 5 we show typical trajectories for the time evolution of the yield in the activation and dimerization scenario. If the assembly product is stable (absorbing state), the yield can only increase with time. Consequently, the final yield constitutes the upper limit for the yield irrespective of additional time constraints. Therefore, the final yield is an informative and unambiguous observable to describe the efficiency of the assembly reaction.
We simulated our system both stochastically via Gillespie’s algorithm (Gillespie, 2007) and deterministically as a set of ordinary differential equations corresponding to chemical rate equations (see Appendix 1).
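To illustrate the stochastic side of this approach, the following is a minimal Gillespie-style sketch of the activation scenario. It assumes a fully heterogeneous ring ($S=L\ge 3$), ${L}_{\mathrm{nuc}}=2$, unit volume $V=1$ and growth by monomer attachment only. The function names `simulate_rings` and `mean_yield`, the data layout, and the convention that a polymer of length $L-1$ can be closed from either end are illustrative assumptions and not the authors' C++/MATLAB implementation. Waiting times are not sampled because only the final yield is evaluated.

```python
import random

def simulate_rings(L=6, N=20, alpha=0.01, mu=1.0, nu=1.0, seed=0):
    """One stochastic run of the ring model (assumed: S = L >= 3, L_nuc = 2, V = 1).

    Returns (yield, inactive, active, polymers, rings) for the final state.
    """
    rng = random.Random(seed)
    S = L                    # fully heterogeneous: one species per ring position
    inactive = [N] * S       # inactive monomers per species
    active = [0] * S         # active monomers per species
    poly = {}                # (leftmost species, length) -> count, 2 <= length < L
    rings = 0
    while True:
        # Collect every reaction that can currently fire, with its propensity.
        events = []
        for s in range(S):
            nxt = (s + 1) % S
            if inactive[s]:
                events.append((alpha * inactive[s], "activate", s, 0))
            if active[s] and active[nxt]:
                events.append((mu * active[s] * active[nxt], "dimerize", s, 0))
        for (s, l), n in poly.items():
            left, right = (s - 1) % S, (s + l) % S
            if active[left]:
                events.append((nu * n * active[left], "grow_left", s, l))
            if active[right]:
                events.append((nu * n * active[right], "grow_right", s, l))
        total = sum(e[0] for e in events)
        if total == 0.0:
            break            # nothing can react any more: final state reached
        # Choose one reaction with probability proportional to its propensity.
        r = rng.random() * total
        for a, kind, s, l in events:
            r -= a
            if r < 0.0:
                break
        if kind == "activate":
            inactive[s] -= 1
            active[s] += 1
        elif kind == "dimerize":
            active[s] -= 1
            active[(s + 1) % S] -= 1
            poly[(s, 2)] = poly.get((s, 2), 0) + 1
        else:
            poly[(s, l)] -= 1
            if poly[(s, l)] == 0:
                del poly[(s, l)]
            if kind == "grow_left":
                new_s = (s - 1) % S
                active[new_s] -= 1
            else:
                new_s = s
                active[(s + l) % S] -= 1
            if l + 1 == L:
                rings += 1   # the ring is complete (absorbing state)
            else:
                poly[(new_s, l + 1)] = poly.get((new_s, l + 1), 0) + 1
    # Maximal number of rings is N*S/L, which equals N for S = L.
    return rings / N, inactive, active, poly, rings

def mean_yield(runs=50, **kwargs):
    """Average final yield over independent runs with seeds 0..runs-1."""
    return sum(simulate_rings(seed=i, **kwargs)[0] for i in range(runs)) / runs
```

Calling, for example, `mean_yield(runs=100, L=6, N=20, alpha=0.01)` for a range of `alpha` values gives a rough yield curve for this toy setup. Recomputing all propensities in every step keeps the sketch short but scales poorly, so larger $L$ and $N$ would call for an incremental update of the propensities.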
Results
Deterministic behavior in the macroscopic limit
First, we consider the macroscopic limit, $N\gg 1$, and investigate how assembly yield depends on the activation rate $\alpha $ (activation scenario) and the dimerization rate $\mu $ (dimerization scenario) for ${L}_{\mathrm{nuc}}=2$. Here, the deterministic description coincides with the stochastic simulations (Figure 2a and b). For both high activation and high dimerization rates, yield is very poor. Upon decreasing either the activation rate (Figure 2a) or the dimerization rate (Figure 2b), however, we find a threshold value, ${\alpha}_{\mathrm{th}}$ or ${\mu}_{\mathrm{th}}$, below which a rapid transition to the perfect yield of 1 is observed both in the deterministic and stochastic simulation. By exploiting the symmetries of the system with respect to relabeling of species, one can show that, in the deterministic limit, the behavior is independent of the number of species $S$ (for fixed $L$ and $N$, see Appendix 1). Consequently, all systems behave equivalently to the homogeneous system and yield becomes independent of $S$ in this limit. Note, however, that equivalent systems with differing $S$ have different total numbers of particles $SN$ and hence assemble different total numbers of rings.
Decreasing the activation rate reduces the concentration of active monomers in the system. Hence growth of the polymers is favored over nucleation, because growth depends linearly on the concentration of active monomers while nucleation shows a quadratic dependence. Likewise, lower dimerization rates slow down nucleation relative to growth. Both mechanisms therefore restrict the number of nucleation events, and ensure that initiated structures can be completed before resources become depleted (see Figure 2c and d).
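The competition described above can be made explicit with a rough scaling sketch for ${L}_{\mathrm{nuc}}=2$. Here $m$ denotes the concentration of active monomers and $p$ the total concentration of incomplete polymers; both symbols are introduced only for this illustration, and numerical prefactors are omitted:

$$
r_{\mathrm{nuc}} \propto \mu \, m^{2}, \qquad r_{\mathrm{grow}} \propto \nu \, m \, p, \qquad \frac{r_{\mathrm{nuc}}}{r_{\mathrm{grow}}} \propto \frac{\mu}{\nu}\,\frac{m}{p}.
$$

Under these assumptions, a lower activation rate reduces $m$ and a lower dimerization rate reduces the prefactor $\mu /\nu $, so either change lowers the ratio of nucleation to growth.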
Mathematically, the deterministic time evolution of the polymer size distribution $c(l,t)$ is described by an advection-diffusion equation (Endres and Zlotnick, 2002; Yvinec et al., 2012) with advection and diffusion coefficients depending on the instantaneous concentration of active monomers (see Appendix 2). Solving this equation results in the wavefront of the size distribution advancing from small to large polymer sizes (Figure 2e). Yield production sets in as soon as the distance travelled by this wavefront reaches the maximal ring size $L$. Exploiting this condition, we find that in the deterministic system for ${L}_{\mathrm{nuc}}=2$, a non-zero yield is obtained if either the activation rate or the dimerization rate remains below a corresponding threshold value, that is if $\alpha <{\alpha}_{\mathrm{th}}$ or $\mu <{\mu}_{\mathrm{th}}$, where
the threshold values ${\alpha}_{\mathrm{th}}$ and ${\mu}_{\mathrm{th}}$ are given by Equation 1 (see Appendix 3), with proportionality constants ${P}_{\alpha}=[\sqrt{\pi}\mathrm{\Gamma}(2/3)/\mathrm{\Gamma}(7/6){]}^{3}/3\approx 5.77$ and ${P}_{\mu}={\pi}^{2}/2\approx 4.93$. These relations generalize previous results (Morozov et al., 2009) to finite activation rates and to heterogeneous systems. A comparison between the threshold values given by Equation 1 and the simulated yield curves is shown in Figure 2a,b. The relations highlight important differences between the two scenarios (where $\alpha \to \mathrm{\infty}$ and $\mu =\nu $, respectively): While ${\alpha}_{\mathrm{th}}$ decreases cubically with the ring size $L$, ${\mu}_{\mathrm{th}}$ does so only quadratically. Furthermore, the threshold activation rate ${\alpha}_{\mathrm{th}}$ increases with the initial monomer concentration $C$. Consequently, for fixed activation rate, the yield can be optimized by increasing $C$. In contrast, the threshold dimerization rate is independent of $C$ and the yield curves coincide for $N\gg 1$. Finally, if $\alpha $ is finite and $\mu <\nu $, the interplay between the two slow-nucleation scenarios may lead to enhanced yield. This is reflected by the factor $\nu /\mu $ in ${\alpha}_{\mathrm{th}}$, and we will come back to this point later when we discuss the stochastic effects.
In summary, for large particle numbers ($N\gg 1$), perfect yield can be achieved in two different ways, independently of the heterogeneity of the system: by decreasing either the activation rate (activation scenario) or the dimerization rate (dimerization scenario) below its respective threshold value.
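The stated constants and scalings can be checked numerically with a short sketch. The two constants follow directly from the expressions quoted in the text. The threshold functions, by contrast, use an assumed leading-order form for $L\gg {L}_{\mathrm{nuc}}$, namely ${\alpha}_{\mathrm{th}}\approx {P}_{\alpha}(\nu /\mu )\nu C/{L}^{3}$ and ${\mu}_{\mathrm{th}}\approx {P}_{\mu}\nu /{L}^{2}$. This form is inferred from the dependences described in the text and from dimensional consistency; it is not a transcription of Equation 1, which may contain offsets in $L$. The function names are hypothetical.

```python
import math

# Proportionality constants as quoted in the text.
P_ALPHA = (math.sqrt(math.pi) * math.gamma(2 / 3) / math.gamma(7 / 6)) ** 3 / 3
P_MU = math.pi ** 2 / 2

def alpha_threshold(L, C, nu=1.0, mu=1.0):
    """Assumed leading-order threshold activation rate (valid only for L >> L_nuc)."""
    return P_ALPHA * (nu / mu) * nu * C / L ** 3

def mu_threshold(L, nu=1.0):
    """Assumed leading-order threshold dimerization rate (valid only for L >> L_nuc)."""
    return P_MU * nu / L ** 2

if __name__ == "__main__":
    print(f"P_alpha = {P_ALPHA:.3f}, P_mu = {P_MU:.3f}")
    for L in (50, 100, 200):
        print(L, alpha_threshold(L, C=1.0), mu_threshold(L))
```

Within this assumed form, doubling $L$ lowers the activation threshold by a factor of 8 but the dimerization threshold only by a factor of 4, which is the difference between the two scenarios emphasized in the text.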
Stochastic effects in the case of reduced resources
Next, we consider the limit where the particle number becomes relevant for the physics of the system. In the activation scenario, we find a markedly different phenomenology if resources are scarce. Figure 3a shows the dependence of the average yield on the activation rate for different, low particle numbers in the completely heterogeneous case ($S=L$). Here, we restrict our discussion to the average yield. The error of the mean is negligible due to the large number of simulations used to calculate the average yield. Still, due to the randomness in binding and activation, the yield can differ between simulations. A figure with the average yield and its standard deviation is shown in Appendix 6. For very low and very high average yield, the standard deviation has to be small due to the boundedness of the yield. For intermediate values of the average, the standard deviation is highest but still small compared to the average yield. Thus, the average yield is meaningful for the essential understanding of the assembly process. Whereas the deterministic theory predicts perfect yield for small activation rates, in the stochastic simulation yield saturates at an imperfect value ${y}_{\mathrm{max}}<1$. Reducing the particle number $N$ decreases this saturation value ${y}_{\mathrm{max}}$ until no finished structures are produced (${y}_{\mathrm{max}}\to 0$). The magnitude of this effect strongly depends on the size of the target structure $L$ if the system is heterogeneous. Figure 3c shows a diagram characterizing different regimes for the saturation value of the yield, ${y}_{\mathrm{max}}(N,L)$, as a function of the particle number $N$ and the size of the target structure $L$ for fully heterogeneous systems $(S=L)$. We find that the threshold particle number ${N}_{y}^{th}$ necessary to obtain a fixed yield $y$ increases non-linearly with the target size $L$.
For the depicted range of $L$, the dependence of the threshold for non-zero yield, ${N}_{>0}^{th}$, on $L$ can approximately be described by a power law: ${N}_{>0}^{th}\sim {L}^{\xi}$, with exponent $\xi \approx 2.8$ for $L\le 600$. Consequently, for $L=600$ already more than $10^{5}$ rings must be assembled in order to obtain a yield larger than zero. In Appendix 8 we included two additional plots that show the dependence of ${y}_{\mathrm{max}}$ on $N$ for fixed $L$ and the dependence on $L$ for fixed $N$, respectively. The suppression of the yield is caused by fluctuations (see explanation below) and is not captured by a deterministic description. Because these stochastic effects can decrease the yield from a perfect value in a deterministic description to zero (see Figure 3a), we term this effect ‘stochastic yield catastrophe’. For fixed target size $L$ and fixed maximum number of target structures $\frac{NS}{L}$, ${y}_{\mathrm{max}}$ increases with decreasing number of species, see Figure 3d. In the fully homogeneous case, $S=1$, a perfect yield of 1 is always achieved for $\alpha \to 0$. The decrease of the maximal yield with the number of species $S$ thus suggests that, in order to obtain high yield, it is beneficial to design structures with as few different species as possible. In large part this effect is due to the constraint $SN=\text{const}$, whereby the more homogeneous systems (small $S$) require larger numbers of particles per species $N$ and, correspondingly, exhibit less stochasticity. If $N$ is fixed instead of $SN$, the yield still initially decreases with increasing number of species $S$ but then quickly reaches a stationary plateau and becomes independent of $S$ for $S\gg 1$, see Appendix 7. Moreover, increasing the nucleation size ${L}_{\mathrm{nuc}}$, and with it the reversibility of binding, also increases ${y}_{\mathrm{max}}$, see Figure 3d.
This indicates that, besides heterogeneity of the target structure, irreversibility of binding on the relevant time scale makes the system susceptible to stochastic effects.
The stochastic yield catastrophe is mainly attributable to fluctuations in the number of active monomers. In the deterministic (mean-field) equation the different particle species evolve in balanced stoichiometric concentrations. However, if activation is much slower than binding, the number of active monomers present at any given time is small, and the mean-field assumption of equal concentrations is violated due to fluctuations (for $S>1$). Activated monomers then might not fit any of the existing larger structures and would instead initiate new structures. Figure 4a illustrates this effect and shows how fluctuations in the availability of active particles lead to an enhanced nucleation and, correspondingly, to a decrease in yield. Due to the effective enhancement of the nucleation rate, the resulting polymer size distribution has a higher amplitude than that predicted deterministically (Figure 4b) and the system is prone to depletion traps. A similar broadening of the size distribution has been reported in the context of stochastic coagulation-fragmentation of identical particles (D'Orsogna et al., 2015).
In the dimerization scenario, in contrast, there is no stochastic activation step. All particles are available for binding from the outset. Consequently, stochastic effects do not play an essential role in the dimerization scenario and perfect yield can be reached robustly for all system sizes, regardless of the number of species $S$ (Figure 3b).
Non-monotonic yield curves for a combination of slow dimerization and activation
So far, the two implementations of the ‘slow nucleation principle’ have been investigated separately. Surprisingly, we observe counterintuitive behavior in a mixed scenario in which both dimerization and activation occur slowly (i.e., $\mu <\nu $, $\alpha <\mathrm{\infty}$). Figure 5 shows that, depending on the ratio $\mu /\nu $, the yield can become a non-monotonic function of $\alpha $. In the regime where $\alpha $ is large, nucleation is dimerization-limited; therefore activation is irrelevant and the system behaves as in the dimerization scenario for $\alpha \to \mathrm{\infty}$. Upon decreasing $\alpha $ we then encounter a second regime, where activation and dimerization jointly limit nucleation. The yield increases due to synergism between slow dimerization and activation (see $\mu /\nu $ dependence of ${\alpha}_{\mathrm{th}}$, Equation 1), whilst the average number of active monomers is still high and fluctuations are negligible. Finally, a stochastic yield catastrophe occurs if $\alpha $ is further reduced and activation becomes the limiting step. This decline is caused by an increase in nucleation events due to relative fluctuations in the availability of the different species (‘fluctuations between species’). This contrasts with the deterministic description, where nucleation is always slower for smaller activation rate. Depending on the ratio $\mu /\nu $, the ring size $L$ and the particle number $N$, maximal yield is obtained either in the dimerization-limited (red curves, Figure 5), activation-limited (blue curve, Figure 5b) or intermediate regime (green and orange curves, Figure 5).
Robustness of the results to model modifications
In our model, the reason for the stochastic yield catastrophe is that, due to fluctuations between species, the effective nucleation rate is strongly enhanced. Hence, if binding to a larger structure is temporarily impossible, activated monomers tend to initiate new structures, causing an excess of structures that ultimately cannot be completed. Natural questions that arise are whether (i) relaxing the constraint that polymers cannot bind other polymers or (ii) abandoning the assumption of a linear assembly path will resolve the stochastic yield catastrophe. To answer these questions, we performed stochastic simulations for extensions of our model system showing that the stochastic yield catastrophe indeed persists. We start by considering the ring model from the previous section but take polymer-polymer binding into account in addition to growth via monomer attachment (Figure 6). In detail, we assume that two structures of arbitrary size (and with combined length $\le L$) bind at rate $\nu $ if they fit together, that is if the left (right) end of the first structure is periodically continued by the right (left) end of the second one. Realistically, the rate of binding between two structures is expected to decrease with the motility and thus the sizes of the structures. In order to assess the effect of polymer-polymer binding, we focus on the worst case where the rate for binding is independent of the size of both structures. If a stochastic yield catastrophe occurs for this choice of parameters, we expect it to be even more pronounced in all the ‘intermediate cases’. Figure 6 shows the dependence of the yield on the activation rate in the polymer-polymer model. As before, yield increases below a critical activation rate and then saturates at an imperfect value for small activation rates. Decreasing the number of particles per species decreases this saturation value.
Compared to the original model, the stochastic yield catastrophe is mitigated but still significant: For structures of size $S=L=100$, yield saturates at around 0.87 for $N=100$ particles per species and at around 0.33 for $N=10$ particles per species. We thus conclude that polymer-polymer binding indeed alleviates the stochastic yield catastrophe but does not resolve it. Since binding only happens between consecutive species, structures with overlapping parts intrinsically cannot bind together and depletion traps continue to occur. Taken together, also in the extended model, fluctuations in the availability of the different species lead to an excess of intermediate-sized structures that get kinetically trapped due to structural mismatches. Note that in the extreme case of $N=1$, incomplete polymers can always combine into one final ring structure so that in this case the yield is always 1. Analogously, for high activation rates yield is improved for $N=10$ compared to $N\ge 50$ (Figure 6b).
Kinetic trapping due to structural mismatches can occur in every (partially) irreversible heterogeneous assembly process with finite-sized target structure and limited resources. From our results, we thus expect a stochastic yield catastrophe to be common to such systems. In order to further test this hypothesis, we simulated another variant of our model where finite-sized squares assemble via monomer attachment from a pool of initially inactive particles, see Figure 7. In contrast to the original model, the assembled structures are non-periodic and exhibit a non-linear assembly path where structures can grow independently in two dimensions. While the ring model assumes a sequential order of binding of the monomers, the square allows for a variety of distinct assembly paths that all lead to the same final structure. Note that, because of the absence of periodicity, the square model is only well defined for the completely heterogeneous case. Figure 7 depicts the dependence of the yield on the activation rate for a square of size $S=100$. Also in this case, we find that the yield saturates at an imperfect value for small activation rates. Hence, we showed that the stochastic yield catastrophe is resolved neither by accounting for polymer-polymer combination nor by considering more general assembly processes with multiple parallel assembly paths. This observation supports the general validity of our findings and indicates that stochastic yield catastrophes are a general phenomenon of (partially) irreversible and heterogeneous self-assembling systems that occur if particle number fluctuations are non-negligible.
Discussion
Our results show that different ways to slow down nucleation are indeed not equivalent, and that the explicit implementation is crucial for assembly efficiency. Susceptibility to stochastic effects is highly dependent on the specific scenario. Whereas systems for which dimerization limits nucleation are robust against stochastic effects, stochastic yield catastrophes can occur in heterogeneous systems when resource supply limits nucleation. The occurrence of stochastic yield catastrophes is not captured by the deterministic rate equations, for which the qualitative behavior of both scenarios is the same. Therefore, a stochastic description of the self-assembly process, which includes fluctuations in the availability of the different species, is required. The interplay between stochastic and deterministic dynamics can lead to a plethora of interesting behaviors. For example, the combination of slow activation and slow nucleation may result in a non-monotonic dependence of the yield on the activation rate. While deterministically, yield is always improved by decreasing the activation rate, stochastic fluctuations between species strongly suppress the yield for small activation rate by effectively enhancing the nucleation speed. This observation clearly demonstrates that a deterministically slow nucleation speed is not sufficient in order to obtain good yield in heterogeneous self-assembly. For example, a slow activation step does not necessarily result in few nucleation events although deterministically this behavior is expected. Thus, our results indicate that the slow nucleation principle has to be interpreted in terms of the stochastic framework and have important implications for yield optimization.
We showed that demographic noise can cause stochastic yield catastrophes in heterogeneous self-assembly. However, other types of noise, such as spatiotemporal fluctuations induced by diffusion, are also expected to trigger stochastic yield catastrophes. Hence, our results have broad implications for complex biological and artificial systems, which typically exhibit various sources of noise. We characterize conditions under which stochastic yield catastrophes occur, and demonstrate how they can be mitigated. These insights could usefully inform the design of experiments to circumvent yield catastrophes: In particular, while slow provision of constituents is a feasible strategy for experiments, it is highly susceptible to stochastic effects. On the other hand, irrespective of its robustness to stochastic effects, the experimental realization of the dimerization scenario relies on cooperative or allosteric effects in binding, and may therefore require more sophisticated design of the constituents (Sacanna et al., 2010; Zeravcic et al., 2017). Our theoretical analysis shows that stochasticity can be alleviated either by decreasing heterogeneity (presumably lowering realizable complexity) or by increasing reversibility (potentially requiring fine-tuning of bond strengths and reducing the stability of the assembly product). Alternative approaches to control stochasticity include the promotion of specific assembly paths (Murugan et al., 2015; Gartner, Graf and Frey, in preparation) and the control of fluctuations (Graf, Gartner and Frey, in preparation). One possibility to test these ideas and the ensuing control strategies could be via experiments based on DNA origami. Instead of building homogeneous ring structures as in Wagenbauer et al. (2017), one would have to design heterogeneous ring structures made from several different types of constituents with specified binding properties.
By varying the opening angle of the ‘wedges’ (and thus the preferred number of building blocks in the ring) and/or the number of constituents, both the target structure size $L$ as well as the heterogeneity of the target structure $S$ could be controlled.
Moreover, the ideas presented in this manuscript are relevant for the understanding of intracellular selfassembly. In cells, provision of building blocks is typically a gradual process, as synthesis is either inherently slow or an explicit activation step, such as phosphorylation, is required. In addition, the constituents of the complex structures assembled in cells are usually present in small numbers and subject to diffusion. Hence, stochastic yield catastrophes would be expected to have devastating consequences for selfassembly, unless the relevant cellular processes use elaborate control mechanisms to circumvent stochastic effects. Further exploration of these control mechanisms should enhance the understanding of selfassembly processes in cells and help improve synthesis of complex nanostructures.
Materials and methods
All our simulation data was generated with either C++ or MATLAB. The source code is available at the eLife website.
Here we show the derivation of Equation 1 in the main text, giving the threshold values for the rate constants below which finite yield is obtained. The details can be found in Appendices 1–3.
Master equation and chemical rate equations
We start with the general Master equation and derive the chemical rate equations (deterministic/mean-field equations) for the heterogeneous selfassembly process. We refrain from showing the full Master equation here and instead state the system that describes the evolution of the first moments. To this end, we denote the random variable that describes the number of polymers of size $\mathrm{\ell}$ and species $s$ in the system at time $t$ by ${n}_{\mathrm{\ell}}^{s}(t)$ with $2\le \mathrm{\ell}<L$ and $1\le s\le S$. The species of a polymer is defined by the species of the respective monomer at its left end. Furthermore, ${n}_{0}^{s}$ and ${n}_{1}^{s}$ denote the number of inactive and active monomers of species $s$, respectively, and ${n}_{L}$ the number of complete rings. We denote the reaction rate for binding of a monomer to a polymer of size $\mathrm{\ell}$ by ${\nu}_{\mathrm{\ell}}$. $\alpha$ denotes the activation rate and ${\delta}_{\mathrm{\ell}}$ the decay rate of a polymer of size $\mathrm{\ell}$. By $\langle \mathrm{\dots}\rangle$ we indicate (ensemble) averages. The system governing the evolution of the first moments (the averages) of the $\{{n}_{\mathrm{\ell}}^{s}\}$ is then given by:
The different terms of this equation are illustrated graphically in Figure 8. The first equation describes loss of inactive particles due to activation at rate $\alpha $. Equation 2b gives the temporal change of the number of active monomers that is governed by the following processes: activation of inactive monomers at rate $\alpha $, binding of active monomers to the left or to the right end of an existing structure of size $\mathrm{\ell}$ at rate ${\nu}_{\mathrm{\ell}}$, and decay of below-critical polymers of size $\mathrm{\ell}$ into monomers at rate ${\delta}_{\mathrm{\ell}}$ (disassembly). Equations 2c and 2d describe the dynamics of dimers and larger polymers of size $3\le \mathrm{\ell}<L$, respectively. The terms account for reactions of polymers with active monomers (polymerization) as well as decay in the case of below-critical polymers (disassembly). The indicator function ${\mathbb{1}}_{\{x<{L}_{\text{nuc}}\}}$ equals 1 if the condition $x<{L}_{\text{nuc}}$ is satisfied and 0 otherwise. Note that a polymer of size $\mathrm{\ell}\ge 3$ can grow by attaching a monomer to its left or to its right end whereas the formation of a dimer of a specific species is only possible via one reaction pathway (dimerization reaction). Finally, polymers of length $L$ – the complete ring structures – form an absorbing state and, therefore, include only the respective gain terms (cf. Equation 2e).
We simulated the Master equation underlying Equation 2 stochastically using Gillespie’s algorithm. For the following deterministic analysis, we neglect correlations between particle numbers $\{{n}_{\mathrm{\ell}}^{s}\}$, which is a valid assumption for large particle numbers. Then the two-point correlator can be approximated as the product of the corresponding mean values (mean-field approximation), $\langle {n}_{i}^{s}{n}_{j}^{k}\rangle \approx \langle {n}_{i}^{s}\rangle \langle {n}_{j}^{k}\rangle$ (Equation 3).
Furthermore, the expectation values must satisfy $\langle {n}_{\mathrm{\ell}}^{s}\rangle =\langle {n}_{\mathrm{\ell}}^{k}\rangle$ for all species $s$ and $k$ (Equation 4), because all species have equivalent properties (there is no distinct species) and hence the system is invariant under relabelling of the upper index. By ${c}_{\mathrm{\ell}}:=\langle {n}_{\mathrm{\ell}}^{s}\rangle /V$ (Equation 5) we denote the concentration of any monomer or polymer species of size $\mathrm{\ell}$, where $V$ is the reaction volume. Due to the symmetry formulated in Equation 4, the heterogeneous assembly process decouples into a set of $S$ identical and independent homogeneous assembly processes in the deterministic limit. The corresponding homogeneous system is then described by the following set of equations, which is obtained by applying Equations 3, 4 and 5 to Equation 2
The rate constants ${\nu}_{\mathrm{\ell}}$ in Equations 6 and 2 differ by a factor of $V$. For convenience, however, we use the same symbol in both cases. The rate constants ${\nu}_{\mathrm{\ell}}$ in Equation 6 can be interpreted in the usual units $[\frac{\text{liter}}{\text{mol sec}}]$. Due to the symmetry, the yield, which is given by the quotient of the number of completely assembled rings and the maximum number of complete rings, becomes independent of the number of species $S$
Hence, it is enough to study the dynamics of the homogeneous system, Equation 6, to identify the condition under which nonzero yield is obtained.
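A stochastic simulation of the kind referred to above (Gillespie's algorithm) can be sketched as follows. This is a minimal illustration and not the authors' C++/MATLAB code: it assumes irreversible binding (${L}_{\text{nuc}}=2$), a ring size $L$ that is a multiple of $S$, and rates `alpha`, `mu`, `nu` given per particle or per particle pair (stochastic units). The function name and the data layout are hypothetical.

```python
import random

def gillespie_ring_assembly(S, L, N, alpha, mu, nu, seed=0):
    """One stochastic run of heterogeneous ring assembly (assumed L_nuc = 2).

    S: number of species, L: ring size (multiple of S), N: copies per species.
    Returns a dict with the final yield, number of rings, time and a mass check.
    """
    assert S >= 2 and L >= 3 and L % S == 0
    rng = random.Random(seed)
    inactive = [N] * S
    active = [0] * S
    # poly[l][s]: number of polymers of length l whose leftmost monomer is species s
    poly = [[0] * S for _ in range(L)]
    rings, t = 0, 0.0
    while True:
        # Enumerate every possible reaction together with its propensity.
        reactions = []
        for s in range(S):
            if inactive[s]:
                reactions.append((alpha * inactive[s], ("act", s, 0)))
            if active[s] and active[(s + 1) % S]:
                reactions.append((mu * active[s] * active[(s + 1) % S], ("dim", s, 0)))
            for l in range(2, L):
                n = poly[l][s]
                if n == 0:
                    continue
                if active[(s + l) % S]:      # monomer fitting the right end
                    reactions.append((nu * n * active[(s + l) % S], ("right", s, l)))
                if active[(s - 1) % S]:      # monomer fitting the left end
                    reactions.append((nu * n * active[(s - 1) % S], ("left", s, l)))
        total = sum(p for p, _ in reactions)
        if total == 0.0:
            break                            # no reaction possible: final state
        t += rng.expovariate(total)          # exponential waiting time
        x = rng.random() * total             # choose reaction proportional to propensity
        for p, (kind, s, l) in reactions:
            x -= p
            if x < 0.0:
                break
        if kind == "act":
            inactive[s] -= 1
            active[s] += 1
        elif kind == "dim":
            active[s] -= 1
            active[(s + 1) % S] -= 1
            poly[2][s] += 1
        else:
            poly[l][s] -= 1
            if kind == "right":
                active[(s + l) % S] -= 1
                new_s = s
            else:
                active[(s - 1) % S] -= 1
                new_s = (s - 1) % S
            if l + 1 == L:
                rings += 1                   # completed ring: absorbing state
            else:
                poly[l + 1][new_s] += 1
    mass = sum(inactive) + sum(active) + L * rings
    mass += sum(l * poly[l][s] for l in range(2, L) for s in range(S))
    return {"yield": L * rings / (S * N), "rings": rings, "time": t,
            "mass": mass, "inactive": sum(inactive)}
```

Averaging the returned yield over many seeds, for example `gillespie_ring_assembly(S=4, L=8, N=30, alpha=0.01, mu=1.0, nu=1.0, seed=k)` for a range of `k`, gives a stochastic estimate that can be compared with the deterministic prediction.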
Effective description by an advection-diffusion equation
The dynamical properties of the evolution of the polymer-size distribution become evident if the set of ODEs, Equation 6, is rewritten as a partial differential equation. This approach was previously described in the context of virus capsid assembly (Zlotnick et al., 1999; Morozov et al., 2009). For simplicity, we restrict ourselves to the case ${L}_{\text{nuc}}=2$ and let ${\nu}_{1}=\mu $ and ${\nu}_{\mathrm{\ell}\ge 2}=\nu $. Then, for the polymers with $\mathrm{\ell}>2$ we have
As a next step, we approximate the index $\mathrm{\ell}\in \{2,3,\mathrm{\dots},L\}$ indicating the length of the polymer as a continuous variable $x\in [2,L]$ and define $c(x=\mathrm{\ell}):={c}_{\mathrm{\ell}}$. By $A:={c}_{1}$ we denote the concentration of active monomers in the following to emphasize their special role. Formally expanding the righthand side of Equation 8 in a Taylor series up to second order
one arrives at the advectiondiffusion equation with both advection and diffusion coefficients depending on the concentration of active monomers $A(t)$
Equation 10 can be written in the form of a continuity equation ${\partial}_{t}c(x)=-{\partial}_{x}J(x)$ with flux $J=2\nu Ac-\nu A{\partial}_{x}c$. The flux at the left boundary $x=2$ equals the influx of polymers due to dimerization of free monomers $J(2,t)=\mu {A}^{2}$. This enforces a Robin boundary condition at $x=2$
At $x=L$ we set an absorbing boundary $c(L,t)=0$ so that completed structures are removed from the system. The time evolution of the concentration of active monomers is given by
The terms on the righthand side account for activation of inactive particles, dimerization, and binding of active particles to polymers (polymerization).
Qualitatively, Equation 10 describes a profile that emerges at $x=2$ from the boundary condition Equation 11, moves to the right with time-dependent velocity $2\nu A(t)$ due to the advection term, and broadens with a time-dependent diffusion coefficient $\nu A(t)$. In Appendices 2–3 we show how the full solution of Equations 10 and 11 can be found assuming knowledge of $A(t)$. Here, we focus only on the derivation of the threshold activation rate and threshold dimerization rate that mark the onset of nonzero yield. Yield production starts as soon as the density wave reaches the absorbing boundary at $x=L$. Therefore, finite yield is obtained if the sum of the advectively travelled distance ${d}_{\text{adv}}$ and the diffusively travelled distance ${d}_{\text{diff}}$ exceeds the system size $L-2$
According to Equation 10, ${d}_{\text{adv}}=2\nu \underset{0}{\overset{\mathrm{\infty}}{\int}}A(t)\mathit{d}t$ and ${d}_{\text{diff}}=\sqrt{2\nu \underset{0}{\overset{\mathrm{\infty}}{\int}}A(t)\mathit{d}t}$, giving as condition for the onset of finite yield
where the last approximation is valid for large $L$.
In order to obtain ${\int}_{0}^{\mathrm{\infty}}A(t)\mathit{d}t$ we derive an effective twocomponent system that governs the evolution of $A(t)$. To this end, we denote the total number of polymers in Equation 12 by $B(t):={\int}_{2}^{\mathrm{\infty}}c(x,t)\mathit{d}x$ (as long as yield is zero the upper boundary is irrelevant and we can consider $L=\mathrm{\infty}$). Equation 12 then reads
and the dynamics of $B$ is determined from the boundary condition, Equation 11
Measuring $A$ and $B$ in units of the initial monomer concentration $C$ and time in units of ${(\nu C)}^{-1}$, the equations are rewritten in dimensionless units as
where $\omega =\frac{\alpha}{\nu C}$ and $\eta =\frac{\mu}{\nu}$. Equation 17 describes a closed twocomponent system for the concentration of active monomers $A$ and the total concentration of polymers $B$. It describes the dynamics exactly as long as yield is zero. In order to evaluate the condition (14) we need to determine the integral over $A(t)$ as a function of $\omega $ and $\eta $
To that end, we proceed by looking at both scenarios separately. The numerical analysis, confirming our analytic results, is given in Appendix 3.
Dimerization scenario
The activation rate in the dimerization scenario is $\alpha \to \mathrm{\infty}$, and instead of the term $\omega {e}^{-\omega t}$ in $\mathrm{d}A/\mathrm{d}t$, we set the initial condition $A(0)=1$ (and $B(0)=0$). Furthermore, $\eta =\mu /\nu \ll 1$ and we can neglect the term proportional to $\eta $ in $\mathrm{d}A/\mathrm{d}t$. As a result,
Solving this equation for $A$ as a function of $B$ using the initial condition $A(B=0)=1$, the totally travelled distance of the wave is determined to be
where for the evaluation of the integral we used the substitution $\eta {A}^{2}\mathrm{d}t=\mathrm{d}B$.
Activation scenario
In the activation scenario, yield sets in only if the activation rate and thus the effective nucleation rate is slow. As a result, in addition to $\omega \ll 1$, we can again neglect the term proportional to $\eta $ in $\mathrm{d}A/\mathrm{d}t$. This time, however, we have to keep the term $\omega {e}^{-\omega t}$. As a next step, we assume that $\mathrm{d}A/\mathrm{d}t$ is much smaller than the remaining terms on the right-hand side, $\omega {e}^{-\omega t}$ and $2AB$. This assumption might seem crude at first sight but is justified a posteriori by the solution of the equation (see Appendix 3). Hence, we get the algebraic equation $A(t)=\omega {e}^{-\omega t}/(2B(t))$. Using it to solve $\mathrm{d}B/\mathrm{d}t=\eta {A}^{2}$ for $B$, and then to determine $A$, the totally travelled distance of the wave is deduced as
Taken together, we therefore obtain two conditions out of which one must be fulfilled in order to obtain finite yield
where $a$ and $b$ are numerical factors, and ${P}_{\alpha}=8{a}^{3}\approx 5.77$ and ${P}_{\mu}=4{b}^{2}\approx 4.93$. This verifies Equation 1 in the main text.
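The quoted numerical values can be checked with a few lines of code. The value $b=\pi /(2\sqrt{2})$ is the coefficient stated in Appendix 3. The closed form used for $a$ is an assumption: it follows from neglecting $\mathrm{d}A/\mathrm{d}t$ as described above and evaluating the resulting integral as a Beta function, and it is not quoted in the text.

```python
import math

# b: prefactor of the travelled distance in the dimerization scenario,
#    g = b * eta**(-1/2), with b = pi / (2*sqrt(2)) as stated in Appendix 3.
b = math.pi / (2.0 * math.sqrt(2.0))

# a: prefactor in the activation scenario, g = a * (omega*eta)**(-1/3).
# Assumed closed form (not given explicitly in the text):
#   a = 3**(-1/3) * integral_0^1 (1 - u**2)**(-1/3) du
#     = 3**(-1/3) * sqrt(pi) * Gamma(2/3) / (2 * Gamma(7/6))
a = (3.0 ** (-1.0 / 3.0) * math.sqrt(math.pi) * math.gamma(2.0 / 3.0)
     / (2.0 * math.gamma(7.0 / 6.0)))

P_alpha = 8.0 * a ** 3   # threshold prefactor for the activation rate
P_mu = 4.0 * b ** 2      # threshold prefactor for the dimerization rate

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"P_alpha = {P_alpha:.2f}, P_mu = {P_mu:.2f}")
```

Under these assumptions the script reproduces ${P}_{\alpha}\approx 5.77$ and ${P}_{\mu}={\pi}^{2}/2\approx 4.93$.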
Appendix 1
Chemical reaction equations and the equivalence of models with different numbers of species
In this section we derive the chemical rate equations (deterministic equations) for the selfassembly process as described in the main text. Furthermore, we show that for general $S$ in the deterministic limit the model is equivalent to a set of $S$ independent assembly processes with only one species.
Homogeneous structures
First, we consider the homogeneous model ($S=\mathrm{\hspace{0.17em}1}$). By ${c}_{\mathrm{\ell}}(t)$ we denote the concentration of complexes of length $\mathrm{\ell}$ ($\mathrm{\ell}\ge 2$) at time $t$, ${c}_{1}(t)$ is the concentration of active monomers and ${c}_{0}(t)$ the concentration of inactive monomers at time $t$. In the following we will usually skip the time argument for better readability. We denote the reaction rate for binding of a monomer to a polymer of size $\mathrm{\ell}$ by ${\nu}_{\mathrm{\ell}}$. The model from the main text is recovered by setting ${\nu}_{\mathrm{\ell}}:={\mu}_{\mathrm{\ell}}$ if $\mathrm{\ell}<{L}_{\text{nuc}}$, and ${\nu}_{\mathrm{\ell}}:=\nu $ otherwise. The ensuing set of ordinary differential equations then reads:
The indicator function ${\mathbb{1}}_{\{x<{L}_{\text{nuc}}\}}$ equals 1 if the condition $x<{L}_{\text{nuc}}$ is satisfied and 0 otherwise. The first equation describes loss of inactive particles due to activation at rate $\alpha $. It is uncoupled from the remainder of the equations and is solved by ${c}_{0}(t)=C{e}^{-\alpha t}$, with $C$ denoting the initial concentration of inactive monomers. The temporal change of the active monomers is governed by the following processes (Equation A1b): activation of inactive monomers at rate $\alpha $, binding of active monomers to existing structures at rate ${\nu}_{\mathrm{\ell}}$ (polymerization), and decay of below-critical polymers into monomers at rate ${\delta}_{\mathrm{\ell}}$ (disassembly). All binding rates appear with a factor of 2 because a monomer can attach to a polymer on its left or on its right end.
Note that there is a subtlety with the dimerization term $2\,{\nu}_{1}\,{c}_{1}^{2}$ in Equation A1b: the dimerization term also bears a factor of 2 because two identical monomers $A$ and $B$ can form a dimer in two possible ways, either as $AB$ or $BA$. Additionally, there is a stoichiometric factor of 2 for the monomers in this reaction. However, one factor of 2 is cancelled again because, assuming there are $n$ monomers, the number of distinct (unordered) pairs of monomers that describe possible reaction partners is $\frac{1}{2}n(n-1)\approx {n}^{2}/2$ (if $n$ is large) rather than ${n}^{2}$ (the number of reaction partners when two different species react). This leaves us with a single factor of 2, as for all the other binding reactions.
Equations A1c and A1d describe the dynamics of dimers and larger polymers of size $3\le \mathrm{\ell}<L$, respectively. The terms account for reactions of polymers with active monomers (polymerization) as well as decay in the case of below-critical polymers (disassembly). The dimerization term in the equation for ${\partial}_{t}{c}_{2}$ lacks the factor of 2 because the stoichiometric factor is missing for the dimers as compared with the dimerization term for the monomers in the line above. Finally, polymers of length $L$ – the complete ring structures – form an absorbing state and therefore only include a reactive gain term (Equation A1e).
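The rate equations described above can be integrated numerically to obtain the deterministic yield. The following sketch assumes ${L}_{\text{nuc}}=2$ (no decay terms), ${\nu}_{1}=\mu $ and ${\nu}_{\mathrm{\ell}\ge 2}=\nu $, and uses the factor conventions stated in the text (factor 2 for monomer loss and for polymer growth, no factor 2 in the gain term of the dimers). The explicit right-hand side is a reconstruction from that description, the integrator is a plain fixed-step Runge-Kutta scheme, and all function names are hypothetical.

```python
def rhs(c, alpha, mu, nu, L):
    """Right-hand side of the homogeneous rate equations (assumed L_nuc = 2).

    c[0]: inactive monomers, c[1]: active monomers,
    c[l]: polymers of size l (2 <= l < L), c[L]: complete rings.
    """
    A = c[1]
    d = [0.0] * (L + 1)
    growing = sum(c[l] for l in range(2, L))        # polymers that can still bind
    d[0] = -alpha * c[0]
    d[1] = alpha * c[0] - 2.0 * mu * A * A - 2.0 * nu * A * growing
    d[2] = mu * A * A - 2.0 * nu * A * c[2]
    for l in range(3, L):
        d[l] = 2.0 * nu * A * (c[l - 1] - c[l])
    d[L] = 2.0 * nu * A * c[L - 1]                  # rings only gain
    return d

def rk4_step(c, dt, *params):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(c, *params)
    k2 = rhs([x + 0.5 * dt * k for x, k in zip(c, k1)], *params)
    k3 = rhs([x + 0.5 * dt * k for x, k in zip(c, k2)], *params)
    k4 = rhs([x + dt * k for x, k in zip(c, k3)], *params)
    return [x + dt / 6.0 * (p + 2.0 * q + 2.0 * r + s)
            for x, p, q, r, s in zip(c, k1, k2, k3, k4)]

def yield_at_time(alpha, mu, nu=1.0, L=20, C=1.0, T=500.0, dt=0.1,
                  active_at_start=False):
    """Integrate up to time T; return (yield, total monomer mass)."""
    c = [0.0] * (L + 1)
    c[1 if active_at_start else 0] = C              # all monomers active or inactive
    for _ in range(int(round(T / dt))):
        c = rk4_step(c, dt, alpha, mu, nu, L)
    mass = c[0] + c[1] + sum(l * c[l] for l in range(2, L + 1))
    return L * c[L] / C, mass
```

For example, `yield_at_time(0.0, 1e-3, active_at_start=True)` mimics the dimerization scenario with slow dimerization, whereas `yield_at_time(0.0, 1.0, active_at_start=True)` corresponds to $\mu =\nu $, where essentially no rings are completed. The step size must stay small compared with the inverse of the largest rate in the system.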
Heterogeneous structures
Next we consider systems with more than one particle species ($S>1$). The heterogeneous system can be described by dynamical equations equivalent to the homogeneous system. We show this starting from a full description that distinguishes both monomers and polymers into a set of different species $1,\mathrm{\dots},S$. The species of a polymer is defined by the species of the respective monomer at its left end. As polymers assemble in consecutive order of species, a polymer is uniquely determined by its length and species (i.e. species of leftmost monomer). In that sense, ${c}_{\mathrm{\ell}}^{s}$ with $0\le \mathrm{\ell}<L$ and $1\le s\le S$ denotes the concentration of a polymer of length $\mathrm{\ell}$ and species $s$ (${c}_{0}^{s}$ and ${c}_{1}^{s}$ again denote inactive and active monomers of species $s$, respectively). For example, ${c}_{4}^{5}$ denotes the concentration of polymers [5678] if $S\ge 8$, or of polymers [5612] if $S=6$. Upper indices are always assumed to be taken modulo $S$ whenever they lie outside the range $[1,S]$. Therefore, the dynamics of the concentrations ${c}_{\mathrm{\ell}}^{s}$ with $3\le \mathrm{\ell}<L$ is given by
The terms on the right-hand side account for the influx due to binding of the respective polymers of length $\mathrm{\ell}-1$ with a monomer either on the right or on the left (first and second term), and for the outflux due to reactions of a polymer of length $\mathrm{\ell}$ and species $s$ with a monomer on the right or on the left (third and fourth term), as well as for decay into monomers for $\mathrm{\ell}<{L}_{\text{nuc}}$ (last term). For the dynamics of the dimers, however, there is only one gain term arising from dimerization:
Equivalently, for the active monomers we find:
Now we exploit the symmetry of the system with respect to the species index, that is, the upper index in $\{{c}_{\mathrm{\ell}}^{s}\}$: Since all species in the system are equivalent, the dynamic equations are invariant under relabelling of the upper indices. Consequently, it must hold that:
In other words, the upper index is irrelevant and can also be discarded. The variable ${c}_{\mathrm{\ell}}$ then denotes the concentration of any one polymer species of length $\mathrm{\ell}$. Taking advantage of this symmetry for the equations of the heterogeneous system, (Equation A2, Equation A3 and Equation A4), and collecting equal terms leads to a set of equations fully identical to those for the homogeneous system (Equation A1). We show the equivalence to the homogeneous model exemplarily for the dynamics of the polymers with size $\mathrm{\ell}\ge 3$ in Equation A2. Applying ${c}_{\mathrm{\ell}}^{s}(t)={c}_{\mathrm{\ell}}(t)$ to Equation A2 yields for the dynamics of the concentration of an arbitrary polymer species of size $\mathrm{\ell}$:
which is identical to the respective dynamic Equation A1d for the homogeneous model. The other equations for the heterogeneous system reduce to those for the homogeneous system in an analogous manner.
Summarizing, we have shown that the (deterministic) heterogeneous assembly process decouples into a set of $S$ identical and independent homogeneous processes. In particular, yield, which is given by the quotient of the number of completely assembled rings and the maximal possible number of complete rings, becomes independent of $S$:
Appendix 2
Effective description of the evolution of the polymer size distribution as an advection-diffusion equation
The dynamical properties of the evolution of the polymer size distribution become evident if the set of ODEs, Equation A1, is rewritten as a partial differential equation. This approach was previously described in the context of virus capsid assembly (Morozov et al., 2009; Zlotnick et al., 1999; Endres and Zlotnick, 2002) but we will restate the essential steps here for the convenience of the reader. To this end we interpret the length index of the polymer $\mathrm{\ell}\in \{2,3,\mathrm{\dots},L\}$ as a continuous variable that we rename $x\in [2,L]$. With such a continuous description in view we write $c(x=\mathrm{\ell}):={c}_{\mathrm{\ell}}$ to denote the concentration of polymers of size $\mathrm{\ell}$.
Since the active monomers play a special role, we denote their concentration in the following by $A$. For simplicity we restrict our discussion to the case ${L}_{\text{nuc}}=\mathrm{\hspace{0.17em}2}$ and let ${\nu}_{1}=\mu $ and ${\nu}_{\mathrm{\ell}\ge 2}=\nu $. Generalizations to ${L}_{\text{nuc}}>2$ can be done in a similar way. Then, for the polymers with $\mathrm{\ell}\ge 3$ we have:
Formally, expanding the righthand side in a Taylor series up to second order
we arrive at an advectiondiffusion equation with both advection and diffusion coefficients depending on the concentration of active monomers $A(t)$,
Equation A9 can be written in the form of a continuity equation ${\partial}_{t}c(x)=-{\partial}_{x}J(x)$ with flux $J=2\nu Ac-\nu A{\partial}_{x}c$. The flux at the left boundary, $x=2$, equals the influx of polymers due to dimerization of free monomers, $J(2,t)=\mu {A}^{2}$. This enforces a Robin boundary condition at $x=2$,
At $x=L$, we have an absorbing boundary $c(L,t)=0$ so that completed structures are removed from the system. Furthermore, the time evolution of the concentration of active particles is given by
The terms on the righthand side account for activation of inactive particles, dimerization, and binding of active particles to polymers (polymerization).
Qualitatively, Equation A9 describes a profile that emerges at $x=\mathrm{\hspace{0.17em}2}$ from the boundary condition, Equation A10, moves to the right with time dependent velocity $2\nu A(t)$ due to the advection term, and broadens with a timedependent diffusion coefficient $\nu A(t)$. The concentration of active particles $A$ determines both the influx of dimers at $x=\mathrm{\hspace{0.17em}2}$, as well as the speed and diffusion of the wave profile.
Next, we derive an expression that solves Equation A9, assuming that we know $A(t)$. We start by solving Equation A9 at the left boundary $c(2,t)$, and then translate the resulting expression to obtain a solution for $c(x,t)$. To obtain $c(2,t)$ as a function of $A(t)$ we can solve $\frac{d}{dt}c(2,t)=\mu {A}^{2}-2\nu Ac(2,t)$ (see Equation A1c) by ‘variation of the constants’ as
With help of this expression we find $c(x,t)$: Given $c(2,t)$, the advective part of Equation A9,
is solved by
Here, $\tau (x,t)$ denotes the time when a particle now at position $x$ and time $t$ was at $x=2$. In other words, a particle at time $t$ and position $x$ has entered the system at $x=2$ at time $\tau (x,t)$. This ansatz solves the PDE (Equation A13) if and only if $\tau (x,t)$ satisfies
with $\stackrel{~}{A}$ being an arbitrary antiderivative of $A$ such that ${\partial}_{t}\stackrel{~}{A}(t)=A(t)$ and ${\stackrel{~}{A}}^{-1}$ denoting its inverse. More easily, we find this form of $\tau $ by requiring that the integral over the velocity $2\nu A$ from time $\tau $ to $t$ equals the travelled distance $x-2$, that is, $2\nu {\int}_{\tau (x,t)}^{t}A({t}^{\prime})\,d{t}^{\prime}=x-2$.
To include the diffusive contribution in Equation A13, we use the diffusion kernel,
with the time-dependent diffusion constant $D(t)=\nu A(t)$. The kernel $k(x,y,t)$ accounts for the mass that has been diffusively transported from $y$ over a distance of $x$. Because the mass has entered the system at $x=2$ at time $\tau (y,t)$, it diffused for the time $t-\tau (y,t)$. The complete expression for $c(x,t)$ is then obtained as the convolution of ${c}_{\text{advec}}(x,t)$ (Equation A14), which is obtained from Equation A12 and Equation A15, and the diffusion kernel $k(x,y,t)$ (Equation A17):
Interpreting the terms in the equations and the general form of the solution, we are able to understand the qualitative behavior of the system. If both the activation and the dimerization rate are large, the system produces zero yield: both advection and diffusion are driven by the concentration of active monomers $A$. If activation is fast, the concentration of active monomers $A$ will become large initially since activation is faster than the reaction dynamics. Consequently, provided $\mu \sim \nu $, dimerization dominates over binding because it depends quadratically on $A$, see Equation A11. The reservoir of free particles then depletes quickly and cannot sustain the motion of the wave for long enough to reach the absorbing boundary, resulting in a very low yield. Only if either the activation rate is low enough or if $\mu \ll \nu $, the motion of the wave can be sustained until it reaches the absorbing boundary.
Appendix 3
Threshold values for the activation and dimerization rate
Based on the analysis from the previous section, we will now determine the threshold activation rate and threshold dimerization rate which mark the onset of nonzero yield. Yield production starts as soon as the density wave reaches the absorbing boundary at $x=L$. Therefore, finite yield is obtained if and only if the sum of the advectively travelled distance ${d}_{\text{adv}}$ and the diffusively travelled distance ${d}_{\text{diff}}$ exceeds the system size $L-2$:
The condition for the onset of nonzero yield is obtained by assuming equality in this relation. The advectively travelled distance is obtained from Equation A16 by setting the borders of the integral over the velocity to $\tau =0$ and $t=\mathrm{\infty}$:
The diffusively travelled distance is approximately given by the standard deviation of the Gaussian diffusion kernel, Equation A17, again with $\tau =0$ and $t=\mathrm{\infty}$,
Taken together, we obtain a condition for the onset of finite yield:
Substituting $y=\sqrt{2\nu \int A}$ and requiring that $y$ is positive, we solve the quadratic equation and find that Equation A22 is equivalent to
where the last approximation is valid for large $L$.
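The step from the distance condition to the large-$L$ approximation can be written out explicitly. The following lines are a worked sketch that only uses the definitions ${d}_{\text{adv}}={y}^{2}$ and ${d}_{\text{diff}}=y$ with $y=\sqrt{2\nu \int A}$ from above; the intermediate expressions are reconstructed here and should be read as an illustration of the argument.

```latex
\begin{aligned}
y^{2} + y &\ge L-2, \qquad y := \sqrt{2\nu\int_{0}^{\infty}A(t)\,dt} > 0,\\
y &\ge \tfrac{1}{2}\left(-1+\sqrt{4L-7}\right),\\
2\nu\int_{0}^{\infty}A(t)\,dt = y^{2} &\ge L-\tfrac{3}{2}-\sqrt{L-\tfrac{7}{4}} \;\approx\; L .
\end{aligned}
```

The correction to $L$ grows only like $\sqrt{L}$, which is why the diffusive contribution becomes negligible for large rings.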
We determine the threshold values for the activation rate $\alpha $ and the dimerization rate $\mu $ by finding solutions of the dynamical equation for the active particles $A(t)$, Equation A11, such that the condition, Equation A23, is fulfilled. Thus, we start by deriving the dependence of ${\int}_{0}^{\mathrm{\infty}}A(t)\mathit{d}t$ on $\alpha $ and $\mu $.
The concentration $c(x,t)$ appears in Equation A11 only in terms of an integral ${\int}_{2}^{L}c(x,t)\mathit{d}x$, counting the total number of polymers in the system. As long as yield is zero there is no outflux of polymers at the absorbing boundary $x=L$ and the total number of polymers in the system only increases due to the influx at the left boundary $x=\mathrm{\hspace{0.17em}2}$. As long as yield is zero we can therefore equivalently consider the limit $L\to \mathrm{\infty}$. We denote the total number of polymers in Equation A11 by $B(t):=\int c(x,t)\mathit{d}x$ for which the dynamics is determined from the boundary condition, Equation A10:
Hence, as long as yield is zero, the total number of polymers increases with the rate of the dimerization events. The system then simplifies to a set of two coupled ordinary differential equations for $A$ and $B$:
The dynamics of $A$ and $B$ is equivalent to a twostate activatorinhibitor system, where $A$ dimerizes into $B$ at rate $\mu $, and $B$ degrades (inhibits) $A$ at rate $2\nu $. Note that Equation A25 describes the exact dynamics of the active monomers $A$ and total number of polymers $B$ in the deterministic system as long as yield is zero. The system has therefore been greatly reduced from originally $SN$ coupled ODEs to now only two coupled ODEs.
For the further analysis it is useful to nondimensionalize Equation A25 by measuring $A$ and $B$ in units of the initial concentration of inactive monomers $C$ and time in units of ${(\nu C)}^{-1}$:
with the remaining dimensionless parameters $\omega =\frac{\alpha}{\nu C}$ and $\eta =\frac{\mu}{\nu}$. We are interested in the integral over $A(t)$ as a function of $\omega $ and $\eta $,
which relates to the totally travelled distance of the wave. Note that, in case of zero yield, $2g(\omega ,\eta )$ is the total advectively travelled distance of the wave (cf. Equation A20) and the square of the diffusively travelled distance (cf. Equation A21).
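The function $g(\omega ,\eta )$ can also be evaluated numerically. The sketch below assumes that the dimensionless two-component system has the form $\mathrm{d}A/\mathrm{d}t=\omega {e}^{-\omega t}-2AB-2\eta {A}^{2}$ and $\mathrm{d}B/\mathrm{d}t=\eta {A}^{2}$, which is reconstructed from the terms named in the text (activation, polymerization, dimerization). The function name and its switches are hypothetical.

```python
import math

def travelled_g(omega, eta, T, dt, preactivated=False, keep_dimer_loss=True):
    """Return g = integral_0^T A(t) dt for the (assumed) two-component system.

    preactivated=True mimics alpha -> infinity: A(0) = 1 and no source term.
    keep_dimer_loss=False drops the term 2*eta*A**2 in dA/dt.
    """
    def f(t, A, B):
        source = 0.0 if preactivated else omega * math.exp(-omega * t)
        dimer_loss = 2.0 * eta * A * A if keep_dimer_loss else 0.0
        # derivatives of A, B and of the running integral g
        return source - 2.0 * A * B - dimer_loss, eta * A * A, A

    A = 1.0 if preactivated else 0.0
    B, g, t = 0.0, 0.0, 0.0
    for _ in range(int(round(T / dt))):
        # classical fourth-order Runge-Kutta step
        k1 = f(t, A, B)
        k2 = f(t + 0.5 * dt, A + 0.5 * dt * k1[0], B + 0.5 * dt * k1[1])
        k3 = f(t + 0.5 * dt, A + 0.5 * dt * k2[0], B + 0.5 * dt * k2[1])
        k4 = f(t + dt, A + dt * k3[0], B + dt * k3[1])
        A += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        B += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
        g += dt / 6.0 * (k1[2] + 2.0 * k2[2] + 2.0 * k3[2] + k4[2])
        t += dt
    return g
```

With `preactivated=True` and `keep_dimer_loss=False` the result can be compared directly with the power law of the dimerization scenario. For the activation scenario, `T` should cover several multiples of $1/\omega $ so that activation has finished.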
Analysis of the dimerization scenario
The dimerization scenario is characterized by fast activation $\alpha \gg C\nu $ and slow dimerization $\mu \ll \nu $. For the dimensionless parameters these assumptions translate to $\eta \ll 1$ and $\eta \ll \omega $. Because for small $\eta \ll 1$ nucleation is much slower than growth we neglect the dimerization term in Equation A26a against the growth term. Furthermore, because $\eta \ll \omega $ activation happens on a fast time scale compared with nucleation and we may therefore integrate out the fast time scale assuming that all particles are activated instantaneously at the beginning. The system Equation A26 then reduces to
with the initial condition $A(0)=1$ and $B(0)=0$. We divide the first equation by the second one (formally applying the chain rule and the inverse function theorem) to obtain a single equation for the dynamics of $A(B)$:
where $A(B=0)=1$. This first order ODE can be solved by separation of variables and subsequent integration, yielding
Because the number of active monomers $A(t)$ must vanish for $t\to \mathrm{\infty}$, the final value of $B$ is
Thereby, we calculate the function $g(\eta )$ via variable substitution $dt=\frac{dB}{\eta {A}^{2}}$:
So, the dependence of the travelled distance of the wave on $\eta $ obeys a power law with exponent $-\frac{1}{2}$, confirming the previous result (Morozov et al., 2009). For the coefficient we find $\frac{\pi}{2\sqrt{2}}\approx 1.1107$.
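The individual steps of this calculation can be summarized as follows. This is a worked sketch that assumes the reduced system $\mathrm{d}A/\mathrm{d}t=-2AB$, $\mathrm{d}B/\mathrm{d}t=\eta {A}^{2}$ with $A(0)=1$ and $B(0)=0$, as described in the text; the intermediate expressions are reconstructed from that description.

```latex
\begin{aligned}
\frac{dA}{dB} &= -\frac{2B}{\eta A}, \qquad A(B=0)=1
  \;\;\Rightarrow\;\; A(B)=\sqrt{1-\tfrac{2}{\eta}B^{2}},\\
B_{\infty} &= \sqrt{\eta/2} \qquad (A\to 0 \text{ for } t\to\infty),\\
g(\eta) &= \int_{0}^{\infty}A\,dt
  = \frac{1}{\eta}\int_{0}^{B_{\infty}}\frac{dB}{A(B)}
  = \frac{1}{\eta}\sqrt{\frac{\eta}{2}}\int_{0}^{1}\frac{du}{\sqrt{1-u^{2}}}
  = \frac{\pi}{2\sqrt{2}}\,\eta^{-1/2}.
\end{aligned}
```

The substitution $u=B\sqrt{2/\eta}$ turns the remaining integral into $\mathrm{arcsin}(1)=\pi /2$, which yields the coefficient $\pi /(2\sqrt{2})$.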
Additionally, we can determine the time dependent solutions $A(t)$ and $B(t)$. Using the solution for $A(B)$ from Equation A30 in Equation A28b we obtain $B(t)$ as
We use this expression for $B(t)$ in Equation A28a to obtain $A(t)$. The resulting ODEs can again be solved by separation of variables as
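For the reduced system $\mathrm{d}A/\mathrm{d}t=-2AB$, $\mathrm{d}B/\mathrm{d}t=\eta {A}^{2}$ with $A(0)=1$ and $B(0)=0$, the time-dependent solutions take a simple closed form. The expressions below are a reconstruction under these assumptions and can be verified by direct substitution.

```latex
\begin{aligned}
\frac{dB}{dt} &= \eta\left(1-\tfrac{2}{\eta}B^{2}\right) = \eta - 2B^{2}
  \;\;\Rightarrow\;\; B(t)=\sqrt{\eta/2}\,\tanh\!\left(\sqrt{2\eta}\,t\right),\\
A(t) &= \sqrt{1-\tfrac{2}{\eta}B(t)^{2}} = \operatorname{sech}\!\left(\sqrt{2\eta}\,t\right),\\
\int_{0}^{\infty}A(t)\,dt &= \frac{1}{\sqrt{2\eta}}\cdot\frac{\pi}{2} = \frac{\pi}{2\sqrt{2}}\,\eta^{-1/2}.
\end{aligned}
```

The last line recovers the power law for $g(\eta )$ and shows that the monomer pool is depleted on the time scale ${(2\eta )}^{-1/2}$ in dimensionless units.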
Analysis of the activation scenario
In the activation scenario, $\alpha \ll C\nu $, such that $\omega \ll 1$ and $\omega \ll \eta $. As we know already that decreasing $\omega $ will slow down nucleation relative to growth we can again neglect the dimerization term in Equation A26a. In contrast to the dimerization scenario, however, we have to keep the activation term. Transforming time via $\tau :=1-{e}^{-\omega t}$ such that $\tau \in [0,1]$ and writing $a(\tau )=a(1-{e}^{-\omega t}):=A(t)$ and $b(\tau )=b(1-{e}^{-\omega t}):=B(t)$ the system in Equation A26 becomes:
with the initial condition $a(0)=b(0)=0$. The function $g(\omega ,\eta )$ transforms as
In the following we derive the asymptotic solution for $a(\tau )$ in the limit of small $\omega $ in order to evaluate the integral in Equation A36. In the limit $\tau \to 1$ ($\iff t\to \mathrm{\infty}$) both $a(\tau )$ and $\frac{d}{d\tau}a(\tau )$ will become small whereas $b(\tau )$ increases monotonically. The reaction term in Equation A35a is furthermore weighted by a factor $\frac{1}{\omega}$ which will become large if $\omega \ll 1$. We therefore postulate that for sufficiently large $\tau $ the derivative $\frac{d}{d\tau}a(\tau )$ is much smaller than the two terms on the righthand side of Equation A35a and hence negligible. This assumption has to be justified a posteriori with the obtained solution. Neglecting the derivative term $\frac{d}{d\tau}a$ in (Equation A35a) reduces the equation to an algebraic equation and we find
Using this result in Equation A35b we can solve for $b$ by separation of variables and subsequent integration:
From Equation A37 we immediately obtain $a(\tau )$:
where by $h(\tau )$ we denote the part of the solution that depends only on $\tau $. Hence, we find that $a$ and hence also $\frac{d}{d\tau}a$ scale like $\sim {\omega}^{\frac{2}{3}}$, and will thus become small if $\omega \ll 1$ and $\tau $ is large enough. Therefore the solution is consistent and justifies the approximation in which we neglected the derivative term in the limit of small $\omega$ and sufficiently large $\tau$.
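The chain of steps leading to this scaling can be made explicit. The sketch below assumes that the transformed system reads $\frac{d}{d\tau}a=1-\frac{2ab}{\omega (1-\tau )}$ (dimerization loss neglected) and $\frac{d}{d\tau}b=\frac{\eta {a}^{2}}{\omega (1-\tau )}$, which follows from the time transformation $\tau =1-{e}^{-\omega t}$ applied to the two-component system. The explicit forms of $b(\tau )$, $h(\tau )$ and the numerical prefactor are reconstructed under this assumption.

```latex
\begin{aligned}
a &= \frac{\omega(1-\tau)}{2b}, \qquad
\frac{db}{d\tau} = \frac{\eta\,a^{2}}{\omega(1-\tau)} = \frac{\eta\,\omega\,(1-\tau)}{4b^{2}}
  \;\;\Rightarrow\;\; b(\tau)=\left[\tfrac{3}{8}\,\eta\,\omega\,\tau(2-\tau)\right]^{1/3},\\
a(\tau) &= \omega^{2/3}\,\eta^{-1/3}\,h(\tau), \qquad
h(\tau)=\frac{1-\tau}{\left[3\,\tau(2-\tau)\right]^{1/3}},\\
g(\omega,\eta) &= \frac{1}{\omega}\int_{0}^{1}\frac{a(\tau)}{1-\tau}\,d\tau
  = (\omega\eta)^{-1/3}\int_{0}^{1}\frac{d\tau}{\left[3\,\tau(2-\tau)\right]^{1/3}}
  \approx 0.897\,(\omega\eta)^{-1/3}.
\end{aligned}
```

The prefactor $\approx 0.897$ is consistent with $8{a}^{3}\approx 5.77$ quoted for the threshold activation rate, and $h(\tau )$ diverges for $\tau \to 0$, which is why the approximation holds only for sufficiently large $\tau $.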
Note that consistency of the solution with the approximation is a sufficient criterion for the validity of the approximation: We can solve the system for $A$ and $B$ in Equation A35 iteratively by defining
Assuming that for $i\to \mathrm{\infty}$, ${a}_{i}$ and ${b}_{i}$ converge to the correct solutions $a(\tau )$ and $b(\tau )$ when starting with ${a}_{0}=0$, we obtain ${a}_{1}$ and ${b}_{1}$ as given by Equation A39 and Equation A38 and can iteratively refine the approximation. The next iteration step then reads: $\frac{d}{d\tau}{a}_{1}=1-\frac{2}{\omega (1-\tau )}{a}_{2}{b}_{2}$. As ${a}_{1}\sim {\omega}^{\frac{2}{3}}$ we know that the left-hand side will be small and ${a}_{1}$ and ${b}_{1}$ solve the system if the left-hand side equals 0. Writing ${a}_{2}={a}_{1}+{\stackrel{~}{a}}_{2}$ and ${b}_{2}={b}_{1}+{\stackrel{~}{b}}_{2}$ this gives:
From dimensional analysis it follows that the correction terms ${\stackrel{~}{a}}_{2}$ and ${\stackrel{~}{b}}_{2}$ must scale like ${\stackrel{~}{a}}_{2}\sim {\omega}^{\frac{4}{3}}$ and ${\stackrel{~}{b}}_{2}\sim \omega $ and are hence much smaller than the first order approximations ${a}_{1}$ and ${b}_{1}$. Higher order corrections will give even smaller contributions showing that if $\frac{d}{d\tau}{a}_{1}\ll 1$, ${a}_{1}$ is indeed a very good approximation.
In the limit $\tau \to 0$, however, the expression for $a(\tau )$ in Equation A39 diverges and consistency is violated. Hence, the obtained solution is valid only for sufficiently large $\tau $.
We fix some small $\epsilon>0$ such that the approximation can be assumed to be sufficiently good if $\frac{d}{d\tau}a<\epsilon$. Furthermore, we define ${\tau}_{\epsilon}$ such that $\frac{d}{d\tau}a<\epsilon$ for all $\tau >{\tau}_{\epsilon}$. Using Equation A39 we can write this as $\frac{d}{d\tau}h<\epsilon{\eta}^{\frac{1}{3}}/{\omega}^{\frac{2}{3}}$ for all $\tau >{\tau}_{\epsilon}$, where the left-hand side, $\frac{d}{d\tau}h$, depends only on $\tau $. Hence, by decreasing $\omega $ we can make ${\tau}_{\epsilon}$ arbitrarily small: ${lim}_{\omega \to 0}{\tau}_{\epsilon}=0$. In order to calculate $g(\omega ,\eta )$, the integral in Equation A36 can be split into a domain where the approximation $a(\tau )$ is accurate and a domain where the correct solution $\stackrel{~}{a}(\tau )$ deviates strongly from $a(\tau )$:
We see from Equation A35a that $\frac{d}{d\tau}\stackrel{~}{a}=1$ describes an upper bound to $\stackrel{~}{a}$ showing that $\stackrel{~}{a}(\tau )\le \tau $. Therefore we can bound the contribution of the first integral as ${\int}_{0}^{{\tau}_{\epsilon}}\frac{\stackrel{~}{a}(\tau )}{1-\tau}\mathit{d}\tau \le {\int}_{0}^{{\tau}_{\epsilon}}\frac{\tau}{1-{\tau}_{\epsilon}}\mathit{d}\tau =\frac{1}{2}\frac{{\tau}_{\epsilon}^{2}}{1-{\tau}_{\epsilon}}$. Because this upper bound for the integral goes to 0 if $\omega $ and hence ${\tau}_{\epsilon}$ become small the first integral will become negligible against the second one. Asymptotically, we therefore only need to consider the second integral with the solution for $a(\tau )$ as given by Equation A39:
where we used the substitution $\tau =1-\sqrt{1-z/3}$ and $\mathrm{\Gamma}(x)$ is the (Euler) Gamma function. So, in the limit of small $\omega $, $g$ scales with $\omega $ and $\eta $ with identical exponent $-\frac{1}{3}$. This contrasts with the dimerization scenario, where $g$ as well as $A$ and $B$ depend only on $\eta $ and are independent of $\omega $ (cf. Equation A32, A33 and A34).
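As a plausibility check on the prefactor, the Gamma-function expression can be evaluated numerically. The sketch below is not a transcription of Equations A35 to A41. It assumes, as our reading of the surrounding text, that the reduced system is $\frac{d}{d\tau}a=1-\frac{2}{\omega (1-\tau )}ab$ and $\frac{d}{d\tau}b=\frac{\eta}{\omega (1-\tau )}{a}^{2}$, and that $g=\frac{1}{\omega}{\int}_{0}^{1}\frac{a}{1-\tau}d\tau$. Under these assumptions, neglecting $\frac{d}{d\tau}a$ gives $g\approx c\,{(\omega \eta )}^{-\frac{1}{3}}$ with $c=\frac{1}{4}{(8/3)}^{\frac{1}{3}}\,\mathrm{\Gamma}(\frac{1}{2})\mathrm{\Gamma}(\frac{2}{3})/\mathrm{\Gamma}(\frac{7}{6})$. The function name is hypothetical.

```python
import math


def activation_prefactor():
    """Prefactor c in g ~ c * (omega * eta)**(-1/3).

    ASSUMPTION: this follows from the reduced system stated in the lead-in,
    which is a reconstruction and not a quote of the original equations.
    """
    # Integral of (1 - u**2)**(-1/3) over u in [0, 1], written as half a
    # Beta function: 0.5 * B(1/2, 2/3) = 0.5 * G(1/2) * G(2/3) / G(7/6).
    integral = 0.5 * math.gamma(0.5) * math.gamma(2.0 / 3.0) / math.gamma(7.0 / 6.0)
    # Remaining factor from b**3 = (3 * omega * eta / 8) * (1 - (1 - tau)**2).
    return 0.5 * (8.0 / 3.0) ** (1.0 / 3.0) * integral


c = activation_prefactor()
print("prefactor c:", c)
print("8 * c**3   :", 8.0 * c**3)
```

With these assumptions the prefactor comes out close to 0.90, and $8{c}^{3}$ close to 5.77, which is compatible with the numerically fitted coefficient and threshold constant quoted later in this appendix.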
Numerical analysis and the threshold values for the rate constants
In order to confirm the results of the last two paragraphs and to see how $g(\omega ,\eta )$ behaves in the intermediate regime where $\omega $ and $\eta $ are of the same order of magnitude, we also investigate the function $g(\omega ,\eta )$ numerically. For that purpose we numerically integrate the ODE system for $A(t)$ and $B(t)$ in Equation A26 for different values of $\omega $ and $\eta $ with a semi-implicit method. Subsequently, we integrate the solution $A(t)$ using an adaptive recursive Simpson’s rule. Plotting $g$ as a function of $\omega $ for fixed $\eta $ on a double-logarithmic scale reveals a rather simple bipartite form of $g$, see Appendix 3—figure 1a:
The transition between these two regimes is rather sharp so that $g$ is best described in a piecewise fashion
Next, we plot the coefficients ${g}_{1}(\eta )$ and ${g}_{2}(\eta )$ against $\eta $. Here we find that ${g}_{1}(\eta )=a{\eta}^{-\frac{1}{3}}$ with $a=\text{const}\approx 0.90$ and ${g}_{2}(\eta )$ is again bipartite with a sharp kink in between (Appendix 3—figure 1b):
where $b\approx 1.11$ and ${b}^{\prime}\approx 1.37$. The transition between both regimes is at $\eta \approx 1.82$. The second regime is not relevant for self-assembly since it refers to both large $\omega $ and large $\eta $, hence the travelled distance $2g$ is too small to give finite yield in this regime. Therefore, we discard the second regime and obtain as final result
with $a\approx 0.90$ and $b\approx 1.11$. This confirms perfectly the exponents as well as the coefficients found in the last two paragraphs. It is, however, surprising that there is such a sharp transition between both regimes, which allows $g(\omega ,\eta )$ to be defined in a piecewise fashion. This behavior must be the result of a series of lower-order terms in $g(\omega ,\eta )$ which are unimportant in the limits $\omega \ll \eta $ and $\eta \ll \omega $ but cause the sharp transition when $\omega $ and $\eta $ are of the same order of magnitude.
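For reference, an adaptive recursive Simpson's rule of the kind used for the integration step can be sketched as follows. This is a generic textbook version in Python, not the Matlab code used for the article; the function names, the default tolerance and the recursion limit are illustrative assumptions.

```python
import math


def _simpson(f, a, fa, b, fb):
    """Simpson estimate on [a, b]; also returns the midpoint and f(midpoint)."""
    m = 0.5 * (a + b)
    fm = f(m)
    return m, fm, (b - a) / 6.0 * (fa + 4.0 * fm + fb)


def _refine(f, a, fa, b, fb, whole, m, fm, tol, depth):
    """Split [a, b] at m and recurse until the error estimate is below tol."""
    lm, flm, left = _simpson(f, a, fa, m, fm)
    rm, frm, right = _simpson(f, m, fm, b, fb)
    delta = left + right - whole
    if depth <= 0 or abs(delta) <= 15.0 * tol:
        # Richardson correction of the two-panel estimate.
        return left + right + delta / 15.0
    return (_refine(f, a, fa, m, fm, left, lm, flm, 0.5 * tol, depth - 1)
            + _refine(f, m, fm, b, fb, right, rm, frm, 0.5 * tol, depth - 1))


def adaptive_simpson(f, a, b, tol=1e-9, max_depth=50):
    """Integrate f over [a, b] with an adaptive recursive Simpson's rule."""
    fa, fb = f(a), f(b)
    m, fm, whole = _simpson(f, a, fa, b, fb)
    return _refine(f, a, fa, b, fb, whole, m, fm, tol, max_depth)


print(adaptive_simpson(math.sin, 0.0, math.pi))  # close to 2
```

The interval is refined only where the local error estimate is large, which is useful when the integrand, like $A(t)$ here, varies on very different time scales.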
Finally, we return to our original task of finding the threshold values of the activation and dimerization rate for the onset of yield. Using our result for $g(\omega ,\eta )$ in Equation A23 we find as necessary and sufficient condition to obtain finite yield in the deterministic system:
Alternatively, we can state this result as two separate conditions out of which at least one must be fulfilled to obtain finite yield:
where ${P}_{\alpha}=8{a}^{3}\approx 5.77$ and ${P}_{\mu}=4{b}^{2}\approx 4.93$. This verifies Equation 1 in the main text.
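The two constants can be recomputed directly from the fitted coefficients. A minimal sketch (the function name is hypothetical):

```python
def threshold_prefactors(a, b):
    """Return (P_alpha, P_mu) = (8 * a**3, 4 * b**2)."""
    return 8.0 * a**3, 4.0 * b**2


# Rounded coefficients as quoted in the text.
print(threshold_prefactors(0.90, 1.11))
# A slightly less rounded value of a, used here only for illustration.
print(threshold_prefactors(0.897, 1.11))
```

Note that the rounded value $a=0.90$ gives $8{a}^{3}\approx 5.83$, whereas the quoted $5.77$ corresponds to $a\approx 0.897$. The quoted constant was therefore presumably computed from the unrounded fit value.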
Appendix 4
Impact of the implementation of subnucleation reactions
In the main text we focused our discussion on irreversible binding (${L}_{nuc}=2$). In this section we investigate the effect of different implementations of the sub-nucleation reactions.
In general, perfect yield is trivially achieved if the complete ring is the only stable structure. However, yield can be maximal already for smaller nucleation sizes ${L}_{nuc}$ depending on the explicit decay rate $\delta $. In the deterministic limit without the dimerization and activation mechanisms ($\mu =\nu $, $\alpha \to \mathrm{\infty}$) a rapid transition from zero yield to perfect yield occurs as a function of the critical nucleation size (see Appendix 4—figure 1). The threshold value in this case is approximately half the ring size and is only weakly affected by the decay rate $\delta $. In order to obtain finite yield for small nucleation sizes, an extremely high decay rate would be necessary. Hence, maximizing the yield solely by increasing the nucleation size is not very feasible.
In our model, the subcritical reaction rates ${\mu}_{i}$ may take different values. Here, we want to restrict our discussion to two scenarios. First, all rates have an identical value ${\mu}_{i}=\mu $ and second, the rates increase linearly up to the super-nucleation reaction rate: ${\mu}_{i}=\mu +\left(\nu -\mu \right)\frac{i-1}{{L}_{nuc}-1}$.
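The two rate profiles can be written out explicitly. The following is a minimal sketch; the function name, the `profile` argument and the index range $i=1,\dots ,{L}_{nuc}-1$ (polymer sizes below the critical nucleation size) are assumptions made for illustration.

```python
def subnucleation_rates(mu, nu, l_nuc, profile="linear"):
    """Growth rates mu_i for sub-critical polymer sizes i = 1, ..., l_nuc - 1.

    ASSUMPTION: i runs over sizes below the critical nucleation size, so that
    mu_1 = mu is the dimerization rate and the profile reaches nu at i = l_nuc.
    """
    if l_nuc < 2:
        raise ValueError("l_nuc must be at least 2")
    if profile == "constant":
        return [mu for _ in range(1, l_nuc)]
    if profile == "linear":
        # mu_i = mu + (nu - mu) * (i - 1) / (l_nuc - 1)
        return [mu + (nu - mu) * (i - 1) / (l_nuc - 1) for i in range(1, l_nuc)]
    raise ValueError("profile must be 'constant' or 'linear'")


print(subnucleation_rates(0.1, 1.0, 5, profile="constant"))
print(subnucleation_rates(0.1, 1.0, 5, profile="linear"))
```

For ${L}_{nuc}=2$ both profiles reduce to the single dimerization rate $\mu $, which is the case treated in the main text.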
In the deterministic limit, both implementations show the same qualitative behavior as the dimerization mechanism with ${L}_{nuc}=2$ in the main text (see Appendix 4—figure 2). The only relevant aspect for the final yield is the extent to which nucleation is slowed down in total. In the constant scenario all reaction steps contribute equally. As a result, there is a strong dependence on the number of such reaction steps, that is, on the critical nucleation size. If, however, the reaction rates increase linearly with the size of the polymers, the dimerization rate dominates. Finite yield is observed only in the case $\mu \ll \nu $. In this limit the dimerization rate is much smaller than the subsequent growth rates. The explicit form of the different ${\mu}_{i}$ is not of major importance for the yield. The total slowdown of nucleation is the central feature. Structure decay does not play any role for intermediate nucleation sizes.
The last question we want to address is how the combination of activation and dimerization mechanism and the corresponding non-monotonic behavior is affected by the nucleation size. Again, we compare constant sub-nucleation growth with a linearly increasing growth rate (see Appendix 4—figure 3). In the deterministic regime both implementations behave qualitatively similarly to the dimerization mechanism discussed in the main text. However, in both cases the stochastic yield catastrophe is less pronounced. For the constant growth rates a saturation of the maximal yield is observed for sufficiently low $\mu $. If the profile is linear, this effect is weaker than in the constant case and a dependency on the explicit value of $\mu $ is still observed. The saturation value is not reached for these reaction rates.
Taking all our results for the sub-nucleation behavior together we draw the following conclusions: First, structure decay by itself is not very efficient in maximizing yield. Second, the explicit choice of the sub-nucleation rates is of minor importance for the qualitative behavior. The system behaves similarly to the case ${L}_{nuc}=2$. Third, larger nucleation sizes mitigate the stochastic yield catastrophe in general.
Appendix 5
Time evolution of the yield in the activation and dimerization scenario
In the main text we focus on the final yield, which represents the maximal yield that can be obtained in the assembly reaction for $t\to \mathrm{\infty}$. Here, we briefly discuss the temporal evolution of the yield in the two scenarios. Appendix 5—figure 1 shows the yield as a function of time for the dimerization scenario (blue) and the activation scenario (red) for the corresponding parameters indicated in the plot. Solid lines show the evolution of the yield in the stochastic simulation whereas dashed lines represent its deterministic evolution obtained by integrating the corresponding mean-field rate equations (only shown for the activation scenario). In both scenarios, yield production sets in after a short lag time (Hagan and Elrad, 2010). The emergence of a lag time can be understood in terms of the interpretation of the assembly process as the progression of a travelling wave (see Sec. B). The travelling wave thereby describes the polymer size distribution, and the time that is needed for the wave to reach the absorbing boundary equals the lag time for yield production observed in Appendix 5—figure 1. After the lag time, the yield increases very abruptly in the dimerization scenario and somewhat more gradually in the activation scenario. Since monomers are provided gradually in the activation scenario, the emerging wave is flatter and extends over a larger range (in polymer size space) as compared to the dimerization scenario. Consequently, yield production is more gradual in the activation scenario than in the dimerization scenario. For the same reason, the dimerization scenario is generally ‘faster’ or more time efficient than the activation scenario. For a detailed analysis of the time efficiency of these and other self-assembly scenarios we refer the reader to our manuscript in preparation (Gartner, Graf and Frey, in preparation).
In all depicted situations, the yield increases monotonically with time. This is, of course, generally true since the completed ring structures define an absorbing state in our system. The final yield, which is indicated in the right bar, therefore represents the upper limit for the yield that can be achieved in the assembly reaction. Appendix 5—figure 1 shows that the temporal yield curves initially are rather steep and quickly reach a value that lies within 10% of the final yield (‘quickly’ thereby refers to the respective time scale), before the curves flatten and increase more slowly. This underlines that the final yield is a meaningful observable that not only describes the upper limit for the yield but also approximates the typical yield of the assembly reaction under appropriate time constraints that are not too restrictive (on the time scale set by the respective lag time).
Appendix 6
Standard deviation of the yield
In the main text, the analysis focuses on the average yield. A priori it is, however, not apparent that this average quantity is informative, in particular due to the strong effect of stochasticity in the system. Here, we therefore complement this picture by additionally considering a simple measure for the fluctuations of the yield, its standard deviation. Appendix 6—figure 1 is an extension of Figure 3a in the main text, showing the dependence of the average yield and its sample standard deviation on the activation rate. Since yield is always non-negative, the standard deviation of the yield has to be small if the average yield is close to 0 ($N=500$ in Appendix 6—figure 1). The same holds true for average yield close to 1 as the yield is bounded by one from above ($N=5000$ in Appendix 6—figure 1). For intermediate values of the average yield, the standard deviation is highest but still small compared to the average yield ($N=1000$ in Appendix 6—figure 1). The average yield is, thus, meaningful. Naturally, the ratio of the standard deviation to the average yield also depends on the number of particles per species $N$ and on the number of species $S$. Generally speaking, for higher $N$ and $S$, this ratio decreases (see Appendix 7—figure 1 for the dependency on $S$).
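The statement that the standard deviation must be small near both ends of the interval can be made quantitative with a standard bound for a random variable confined to $[0,1]$. This is a textbook inequality added here for illustration, not part of the original analysis. Writing $y$ for the yield of a single run, $\bar{y}$ for its mean and ${\sigma}_{y}$ for its standard deviation:

```latex
y \in [0,1] \;\Rightarrow\; y^{2} \le y
\;\Rightarrow\;
\mathrm{Var}(y) = \langle y^{2} \rangle - \bar{y}^{2} \le \bar{y}\,(1-\bar{y}),
\qquad \text{hence} \qquad
\sigma_{y} \le \sqrt{\bar{y}\,(1-\bar{y})} \le \tfrac{1}{2}.
```

The bound vanishes for $\bar{y}\to 0$ and $\bar{y}\to 1$ and is largest at $\bar{y}=\frac{1}{2}$. It refers to the population variance; for the sample variance of $n$ runs the right-hand side carries an additional factor $n/(n-1)$.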
Appendix 7
Influence of the heterogeneity of the target structure for fixed number of particles per species
Figure 3d in the main text shows how the maximal yield ${y}_{\mathrm{max}}$ depends on the number of species $S$ if the ring size $L$ and the number of possible ring structures $NS/L$ are fixed. This comparison for fixed $NS$ is motivated by the question of which role the heterogeneity of a structure plays for assembly efficiency if a certain number of structures should be realized. Figure 3d illustrates that a higher number of species $S$ (more heterogeneous structures) leads to a lower maximally possible yield, suggesting that it is beneficial to build structures with as few different species as possible. However, this situation does not correspond to the deterministically equivalent case of fixed number of particles per species $N$ (note, though, that in the deterministic case the maximally possible yield is always 1, namely for $\alpha \to 0$). Instead, for higher number of species $S$, the number of particles per species $N\propto 1/S$ decreases. How does the heterogeneity of the structures $S$ alter the maximally possible yield if $L$ and $N$ (instead of $L$ and $NS$) are fixed? Appendix 7—figure 1 shows how the maximal yield ${y}_{\mathrm{max}}$ and its standard deviation depend on the number of species $S$. Both are obtained as average yield and sample standard deviation for $\alpha ={10}^{-8}$, where the yield has well saturated and the dynamics (except for the timescale) become independent of the exact value of the rate-limiting activation rate. For homogeneous structures ($S=1$) yield is always perfect since in this case there can be no fluctuations between species. As a result, the average yield is 1 and the standard deviation is 0. For increasing $S$, the average yield decreases until it levels off for $S\gg 1$. This behavior indicates that indeed the decreasing number of particles per species $N$ for larger $S$ is essential for the decrease of the maximal yield with $S$ in Figure 3d.
As mentioned above, the standard deviation is largest for small $S>1$ and decreases with $S$.
Appendix 8
Dependence of the maximal yield ${y}_{\text{max}}$ in the activation scenario on $N$ and $L$
Figure 3c in the main text characterizes the dependence of the maximal yield ${y}_{\text{max}}$ in the activation scenario as a ‘phase diagram’ distinguishing different regimes of ${y}_{\text{max}}$ in dependence of the particle number $N$ and target size $L$. Supplementing this figure in the main text, Appendix 8—figure 1 shows the maximum yield that is obtained in the activation scenario in the limit $\alpha \to 0$ for fixed $L$ in dependence of $N$ (Appendix 8—figure 1a) as well as for fixed $N$ in dependence of $L$ (Appendix 8—figure 1b). For larger particle number $N$, the maximal yield exhibits a transition from 0 to 1 over roughly three orders of magnitude. Increasing $L$ shifts the transition to larger $N$. The threshold particle number where the transition starts is characterised by ${N}_{\text{th}}^{>0}(L)$ (see main text). Approximately, for $L\le 600$, we find ${N}_{\text{th}}^{>0}(L)\sim {L}^{2.8}$ (cf. main text, Figure 3c). Similarly, decreasing the target size $L$ for fixed $N$, the maximal yield exhibits a transition from 0 to 1 over roughly one order of magnitude in $L$. The corresponding threshold value ${L}_{\text{th}}^{>0}$ as a function of $N$ is obtained as the inverse function of ${N}_{\text{th}}^{>0}(L)$. Hence, at least for $N\le {10}^{5}$, approximately it holds ${L}_{\text{th}}^{>0}(N)\sim {N}^{0.36}$. Since ${y}_{\text{max}}$ is largely independent of the number of species $S$ for fixed $N$ and $L$ (see Appendix 7), the maximal yield in the activation scenario (for ${L}_{\text{nuc}}=2$) can be fully characterized as a function ${y}_{\text{max}}(N,L)$ of $N$ and $L$. Hence, ${y}_{\text{max}}$ can roughly be expressed in terms of the threshold particle number ${N}_{\text{th}}^{>0}(L)$ as
As can be seen from Figure 3c in the main text, the transition line between zero and nonzero yield slightly flattens with increasing $L$. Hence, the power law ${N}_{\text{th}}^{>0}(L)\sim {L}^{2.8}$ (and similarly for ${L}_{\text{th}}^{>0}$) only holds approximately and for a restricted range in $L$ and $N$. The asymptotic behavior of ${N}_{\text{th}}^{>0}$ in the limit $L\to \mathrm{\infty}$ remains elusive.
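The relation between the two exponents quoted in this appendix follows from inverting the power law. As a short worked step, valid only within the range where the power law itself holds:

```latex
N_{\text{th}}^{>0}(L) \sim L^{2.8}
\quad \Longleftrightarrow \quad
L_{\text{th}}^{>0}(N) \sim N^{1/2.8} \approx N^{0.36}.
```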
Data availability
All data was generated from stochastic simulations in C++ and deterministic simulations in Matlab. The source code files are included with the article.
References

Self-assembly of brome mosaic virus capsids: insights from shorter timescale experiments. The Journal of Physical Chemistry A 112:9405–9412. https://doi.org/10.1021/jp802498z

Coordinating assembly of a bacterial macromolecular machine. Nature Reviews Microbiology 6:455–465. https://doi.org/10.1038/nrmicro1887

Stochastic self-assembly of incommensurate clusters. The Journal of Chemical Physics 136:084110. https://doi.org/10.1063/1.3688231

Combinatoric analysis of heterogeneous stochastic self-assembly. The Journal of Chemical Physics 139:121918. https://doi.org/10.1063/1.4817202

First assembly times and equilibration in stochastic coagulation-fragmentation. The Journal of Chemical Physics 143:014112. https://doi.org/10.1063/1.4923002

The evolutionary consequences of erroneous protein synthesis. Nature Reviews Genetics 10:715–724. https://doi.org/10.1038/nrg2662

Stochastic simulation of chemical kinetics. Annual Review of Physical Chemistry 58:35–55. https://doi.org/10.1146/annurev.physchem.58.032806.104637

Analyzing mechanisms and microscopic reversibility of self-assembly. The Journal of Chemical Physics 135:214505. https://doi.org/10.1063/1.3662140

Mechanisms of kinetic trapping in self-assembly and phase transformation. The Journal of Chemical Physics 135:104115. https://doi.org/10.1063/1.3635775

Modeling viral capsid assembly. Advances in Chemical Physics 155:1. https://doi.org/10.1002/9781118755815.ch01

Allosteric control of icosahedral capsid assembly. The Journal of Physical Chemistry B 120:6306–6318. https://doi.org/10.1021/acs.jpcb.6b02768

Morphogenesis of the T4 tail and tail fibers. Virology Journal 7:355. https://doi.org/10.1186/1743422X7355

Fluctuations in the kinetics of linear protein self-assembly. Physical Review Letters 116:258103. https://doi.org/10.1103/PhysRevLett.116.258103

Assembly of viruses and the pseudo-law of mass action. The Journal of Chemical Physics 131:155101. https://doi.org/10.1063/1.3212694

Undesired usage and the robust self-assembly of heterogeneous structures. Nature Communications 6:6203. https://doi.org/10.1038/ncomms7203

Eukaryotic ribosome assembly, transport and quality control. Nature Structural & Molecular Biology 24:689–699. https://doi.org/10.1038/nsmb.3454

Numerical evidence for nucleated self-assembly of DNA brick structures. Physical Review Letters 112:238103. https://doi.org/10.1103/PhysRevLett.112.238103

Nucleation: theory and applications to protein solutions and colloidal suspensions. Journal of Physics: Condensed Matter 19:033101. https://doi.org/10.1088/09538984/19/3/033101

First passage times in homogeneous nucleation and self-assembly. The Journal of Chemical Physics 137:244107. https://doi.org/10.1063/1.4772598

Colloquium: Toward living matter with colloidal particles. Reviews of Modern Physics 89:031001. https://doi.org/10.1103/RevModPhys.89.031001

Fabrication of novel biomaterials through molecular self-assembly. Nature Biotechnology 21:1171–1178. https://doi.org/10.1038/nbt874
Decision letter

Frank Jülicher, Reviewing Editor; Max Planck Institute for the Physics of Complex Systems, Germany

Naama Barkai, Senior Editor; Weizmann Institute of Science, Israel

Pablo Sartori, Reviewer; The Rockefeller University, United States
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
The authors study the role of fluctuations for the yield of self-assembly of heterogeneous molecular complexes in specific arrangements. The approach captures the combination of subunit activation, nucleation of complex formation and complex assembly, starting from a set of different subunits in solution. This study shows in particular that when nucleation is limited by activation of subunits, yield can drop catastrophically due to stochasticity of activation and nucleation. This occurs in regimes where yield would be high when fluctuations are neglected. This work provides deep new insights in the role of stochasticity for the reliable self-assembly of molecular complexes and a framework for the study of molecular assembly in cells.
Decision letter after peer review:
Thank you for submitting your article "Stochastic yield catastrophes and robustness in self-assembly" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Naama Barkai as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Pablo Sartori (Reviewer #2).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
The authors study the role of fluctuations for the yield of self-assembly of heterogeneous molecular structures. An elegant and simple model is introduced which captures fluctuations of both activation of monomers and stochastic nucleation. The paper first recapitulates known results that slow nucleation is key to high yield. It is shown that in the deterministic or mean field limit slow nucleation by activation or by cooperative assembly (dimerization) play a similar role. However, in the presence of fluctuations, both scenarios can show very different behaviors. In particular the yield can drop in the activation scenario due to fluctuations which the authors term stochastic yield catastrophe. Interesting is the general case where activation and dimerization occur together. The authors show that in this case there can be an optimal activation rate at which yield is maximized. The work provides fundamental new insights in the self-assembly of macromolecular structures in cells.
However, the referees also raised several points that need to be addressed in a revision. In particular the depth and relevance of some of the results is not fully clear. The authors should address the specific points given below.
Essential revisions:
The relevance and strength of the results remains partly unclear. Several points need clarification:
1) The authors use the term "Stochastic yield catastrophe" to describe the drop of yield for increased fluctuations. In contrast to the catastrophic drop of yield in the deterministic case, it is unclear whether for these stochastic effects the term stochastic yield catastrophe is meaningful and appropriate as the drop is gradual and seems soft rather than representing a transition.
2) In Figure 3D the effect of increasing the number of species for a fixed size L is studied. It is shown that more species results in lower maximal yield. This could simply be an effect of the constraint NS = constant, which results in a decreasing number of components per species, N. The authors should check whether, when keeping N = constant, changing S still has an effect on the yield.
3) Still in Figure 3, the characterization of the "stochastic yield catastrophe" seems incomplete in view of its relevance to the authors. It seems that there is a certain dependence y_{max}(N,L,S) (forgetting about L_{nuc} for now). Immediate questions that are not addressed are: how does y_{max} scale with N, for fixed L and S? How does it then scale with L? And how with the heterogeneity, L/S? One step in this direction is Figure 3D, in which the authors show y_{max}(S) for several values of L, but variable N (why then the label N = 1000 in the panel?). More such plots should be presented to clarify the key properties of the system.
4) There are additional questions which relate to the relevance of the results in realistic situations. The model is elegant but very much simplified. For example, growth is fully reversible which may not be the case in many systems of macromolecular assembly. It is unclear how general and robust the results of the simplified model are if the simplifications are relaxed.
5) The time needed for assembly is not discussed. In a biological context it is essential that assembly occurs sufficiently fast. High yield in the long term alone is not enough for assembly to happen. I understand that only looking at yield at long times is an elegant simplification of the problem, but it remains unclear how useful these insights are for real situations.
https://doi.org/10.7554/eLife.51020.sa1
Author response
Essential revisions:
The relevance and strength of the results remains partly unclear. Several points need clarification:
1) The authors use the term "Stochastic yield catastrophe" to describe the drop of yield for increased fluctuations. In contrast to the catastrophic drop of yield in the deterministic case, it is unclear whether for these stochastic effects the term stochastic yield catastrophe is meaningful and appropriate as the drop is gradual and seems soft rather than representing a transition.
The question alerted us to the fact that we have to be more explicit about the reason why we use the term ‘catastrophe’. The stochastic yield catastrophe is not the change in yield with respect to the parameter α. The term is intended to refer to situations in which we deterministically expect a perfect yield but due to stochastic effects we may end up with zero yield instead. This drop in yield of up to 100% compared to the deterministically expected value is what we refer to as a ‘stochastic catastrophe’. We have revised the definition of the term in the subsection “Stochastic effects in the case of reduced resources” of the main text to make this point clearer. Furthermore, we extended our discussion of the nonlinearities in the stochastic effects with respect to different model parameters (especially the ring size L) to give a more complete picture of their significance as well. This directly links to answer 3 in which those aspects are discussed in detail.
– We have adapted the introduction of the term “Stochastic yield catastrophe” in subsection “Stochastic effects in the case of reduced resources”.
2) In Figure 3D the effect of increasing the number of species for a fixed size L is studied. It is shown that more species results in lower maximal yield. This could simply be an effect of the constraint NS = constant, which results in a decreasing number of components per species, N. The authors should check whether, when keeping N = constant, changing S still has an effect on the yield.
Indeed, if N (instead of NS) is fixed in Figure 3D, the maximal yield becomes independent of S for large S >> 1. However, this saturation value of the yield differs from the yield for the homogeneous system S = 1, where the maximal yield is always 1 as the system behaves deterministically. So, the yield (sharply) drops from a perfect value of 1 (for S = 1) to a non-perfect value for S > 1 and then levels off. Intriguingly, for some cases, one even observes non-monotonic behavior of the yield with respect to the number of species: Yield decreases from 1 (for S = 1) to a lower value and then increases again before it saturates for S >> 1. This rather non-intuitive behavior needs a more thorough discussion which we provide in detail in a follow-up manuscript (in preparation). In short, we believe that similar to the deterministic description also the stochastic limit shows equivalent behavior for different number of species S, as long as N and L are fixed and S >> 1.
In this manuscript, we decided to focus on the case where NS = const since this ensures that all compared systems can build the same number of rings, NS/L. In this context, Figure 3D suggests that in order to achieve high yield, it is beneficial to build structures that are as homogeneous as possible. So, we believe that the relevance of our findings does not change. Nonetheless, in order to pick up the important point made by the referees, we added a remark to the manuscript and a figure that shows the flattening of the maximal yield for S >> 1.
– We have added a remark to subsection “Stochastic effects in the case of reduced resources”.
– We have added subsection “Influence of the heterogeneity of the target structure for fixed number of particles per species” and a figure to the Appendix.
3) Still in Figure 3, the characterization of the "stochastic yield catastrophe" seems incomplete in view of its relevance to the authors. It seems that there is a certain dependence y_{max}(N,L,S) (forgetting about L_{nuc} for now). Immediate questions that are not addressed are: how does y_{max} scale with N, for fixed L and S? How does it then scale with L? And how with the heterogeneity, L/S? One step in this direction is Figure 3D, in which the authors show y_{max}(S) for several values of L, but variable N (why then the label N=1000 in the panel?). More such plots should be presented to clarify the key properties of the system.
We are happy to take this opportunity to clarify the dependence of y_{max}(N,L,S) on N,L and S. As already explained in point (2) and shown by the added plot, y_{max} quickly becomes independent of S when S > 1. Hence, what remains to be described is the dependence of y_{max} on N and L. Phenomenologically, by increasing the particle number N (for fixed L), we encounter a threshold value N_{th} where the yield starts to increase sigmoidally from 0 to 1 (roughly over three orders of magnitude in N), quite similar to the behavior of the deterministic yield with respect to α and µ in Figure 1. Analogously, for fixed particle number N, by decreasing L, y_{max} exhibits a rapid transition from 0 to 1 at some threshold L_{th}. We included two additional plots in the Appendix that show the dependences of y_{max} on N and L, respectively. In the main text, in Figure 3C, we characterize the dependence of y_{max} on N and L in combined form as a “phase diagram” that shows the four regimes y_{max} = 0, 0 < y_{max} < 0.5, 0.5 < y_{max} < 0.99 and 0.99 < y_{max} in N−L phase space (we now include the regime 0.99 < y_{max} to show that the transitions from 0 to almost perfect yield extend over a finite range in N or L, respectively).
In order to assess the relevance of stochastic effects for a particular system it is, therefore, crucial to know N_{th}(L), the threshold particle number as a function of L, whose inverse gives L_{th}(N). The function N_{th}(L) is exactly described by the line that separates the two regimes y_{max} = 0 and y_{max} > 0 in the “phase diagram” in Figure 3C. On the double logarithmic scale, we find that there is an approximate power-law dependence of N_{th} on L with an exponent of around 2.8. It is not totally clear to us why there is this dependence with such a high exponent. Simple scaling arguments that we tried could only explain an exponent of maximally 1.5. We investigate the underlying reasons for this dependence in a technical follow-up project (manuscript in preparation). The strongly nonlinear dependence of N_{th} on L partly explains why we argue that the expression “stochastic yield catastrophe” is justified, as the relevance of this effect strongly increases with the size of the target structure. We extended the caption of Figure 3C and the description in the main text in order to clarify the relevance of the stochastic yield catastrophe and the dependence of y_{max} on the system parameters.
The label “N = 1000” in Figure 3D was, of course, incorrect and misleading. We thank the attentive referee for this remark and have corrected the label.
– We have included the regime y_{max} > 0.99 in Figure 3C and extended the description in the Figure caption and the main text.
– We have added subsection “Dependence of the maximal yield y_{max} in the activation scenario on N and L” and two additional figures to the Appendix.
– We have corrected the false label in Figure 3D.
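As an illustration of how an exponent such as the one quoted for N_{th}(L) can be read off a double logarithmic plot, the following minimal Python sketch fits a straight line to log N versus log L by least squares. The function name fit_power_law and the data points are hypothetical: the data are generated from an exact power law with exponent 2.8 and prefactor 0.5 purely for demonstration and are not values from the study.

```python
import math

def fit_power_law(L_values, N_values):
    """Least-squares fit of log N = log c + k * log L; returns (k, c)."""
    xs = [math.log(L) for L in L_values]
    ys = [math.log(N) for N in N_values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope of the regression line in log-log space is the exponent k.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    k = sxy / sxx
    # Intercept of the regression line gives the prefactor c.
    c = math.exp(y_mean - k * x_mean)
    return k, c

# Hypothetical placeholder data (NOT measured thresholds from the study):
# an exact power law N = 0.5 * L**2.8 sampled at a few target sizes.
L_values = [10, 20, 40, 80, 160]
N_values = [0.5 * L ** 2.8 for L in L_values]

exponent, prefactor = fit_power_law(L_values, N_values)
print(f"fitted exponent: {exponent:.3f}, prefactor: {prefactor:.3f}")
```

With real threshold data the points would scatter around the line, and the fitted slope would only approximate the exponent.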
4) There are additional questions which relate to the relevance of the results in realistic situations. The model is elegant but very much simplified. For example, growth is fully reversible which may not be the case in many systems of macromolecular assembly. It is unclear how general and robust the results of the simplified model are if the simplifications are relaxed.
In our model two different modes of binding exist. Below a polymer size L_{nuc} binding is fully reversible whereas above L_{nuc} irreversible growth takes place. This irreversibility is meant to represent a separation of time scales between assembly and disassembly of a stable structure. Introducing a very small dissociation rate would not affect our results significantly on the time scale of observation (see answer to question 5) but makes the definition of the yield ambiguous. If binding is reversible for all unfinished structures (L_{nuc} = L − 1), trivially, only a yield of 1 is possible at the end of the assembly process. However, the process then may take an unphysical amount of time. The relevance of the size of L_{nuc} for the stochastic yield catastrophe is studied in Figure 3D. The robustness of all other effects with respect to the size of L_{nuc} (which corresponds to the degree of reversibility) is discussed in Appendix 4. Furthermore, the two additional models illustrated in Figure 6A and Figure 7A were introduced to test for robustness against other model assumptions like monomer binding and linear growth. In all cases, we still find stochastic yield reduction, which leads us to the conclusion that the observed effects do not rely on model specifics. We added remarks to the main text to clarify the relation between L_{nuc} and the existence of reversible and irreversible binding. Finally, we want to remark that the assumption of a finite L_{nuc} < L and irreversible growth above this threshold has been applied successfully in experimental studies, as for example of the Brome Mosaic Virus Capsids (Chevance and Hughes, 2008).
– We made the relation between reversible and irreversible binding and L_{nuc} more explicit in the main text.
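The distinction between reversible binding below L_{nuc} and irreversible growth above it can be made concrete with a minimal stochastic simulation sketch in Python. This is not the authors' implementation. It assumes a strongly simplified single-species variant of the model, and it assumes that polymers smaller than L_{nuc} decay into monomers at a rate delta. The function name simulate_assembly and all parameter names and values are hypothetical. Because it contains only one species, the sketch cannot reproduce the stochastic yield catastrophe, which depends on number fluctuations between different constituents. It only illustrates how L_{nuc} enters the reaction rules and how the final yield is defined.

```python
import random

def simulate_assembly(N, L, L_nuc=2, alpha=1.0, mu=1.0, nu=1.0, delta=1.0,
                      seed=0, max_steps=200_000):
    """Stochastic simulation of a simplified single-species assembly model.

    Only the final yield is of interest, so waiting times are not sampled;
    reactions are chosen with probability proportional to their rates.
    Returns (final_yield, state).
    """
    if L < 3:
        raise ValueError("this sketch assumes a target size L >= 3")
    rng = random.Random(seed)
    inactive, active, complete = N, 0, 0
    polymers = {s: 0 for s in range(2, L)}  # incomplete polymers by size

    for _ in range(max_steps):
        # Collect all reactions as (rate, label, polymer size).
        reactions = [(alpha * inactive, "activate", 0),
                     (mu * active * (active - 1) / 2, "dimerize", 0)]
        for s, count in polymers.items():
            reactions.append((nu * active * count, "grow", s))
            if s < L_nuc:  # assumed rule: subcritical polymers can decay
                reactions.append((delta * count, "decay", s))
        reactions = [r for r in reactions if r[0] > 0]
        if not reactions:
            break  # absorbing state: no reaction is possible any more

        # Choose one reaction with probability proportional to its rate.
        threshold = rng.random() * sum(rate for rate, _, _ in reactions)
        for rate, label, s in reactions:
            threshold -= rate
            if threshold < 0:
                break

        if label == "activate":
            inactive -= 1
            active += 1
        elif label == "dimerize":
            active -= 2
            polymers[2] += 1
        elif label == "grow":
            active -= 1
            polymers[s] -= 1
            if s + 1 == L:
                complete += 1  # structure reached the target size
            else:
                polymers[s + 1] += 1
        else:  # decay: a subcritical polymer falls apart into monomers
            polymers[s] -= 1
            active += s

    state = {"inactive": inactive, "active": active,
             "polymers": polymers, "complete": complete}
    # Yield: fraction of all particles bound in complete structures.
    return complete * L / N, state

# Hypothetical example: 200 particles, target size 5, stable dimers.
final_yield, state = simulate_assembly(N=200, L=5, L_nuc=2, seed=1)
print(final_yield)
```

With L_nuc = 2 no polymer can decay, so the run always ends in an absorbing state. For larger L_nuc the max_steps guard prevents an endless cycle of dimer formation and decay when only two free monomers remain.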
5) The time needed for assembly is not discussed. In a biological context it is essential that assembly occurs sufficiently fast. High yield in the long term alone is not enough for assembly to happen. I understand that only looking at yield at long times is an elegant simplification of the problem, but it remains unclear how useful these insights are for real situations.
The importance of time in the biological context is a very pertinent observation and we fully agree with the referee that this question arises naturally: time efficiency plays a significant role in biological and also artificial self-assembly. Due to its significance and due to the following three reasons, we believe the question of assembly time and time efficiency is beyond the scope of the current study. First, a thorough discussion of time in the context of self-assembly turned out to be extensive and hence would have considerably increased the length of the manuscript. Second, a simultaneous discussion of both factors, stochasticity and time efficiency, would complicate the analysis (if not treated separately) and obscure the significance and relevance of each one of these factors. Third, the final yield still is a most significant observable that characterizes the efficiency of the assembly process in an explicit way: Since the yield monotonically increases with time, the final yield represents an upper bound for the yield at any finite time. In other words, the final yield is the maximum yield that can be achieved in the assembly process irrespective of time constraints. As such, the final yield is an important and informative observable that can be defined and analyzed in an unambiguous way. For those reasons, we decided to focus here entirely on the analysis of the final yield and how it is affected by stochastic effects and to treat the time aspect separately.
However, in order to give the reader an impression of what the time evolution of the yield typically looks like, we now included a corresponding plot and a new section in the Appendix. The plot exhibits typical time trajectories of the yield in both scenarios and shows, in particular, that the yield increases monotonically with time. It also shows that typically, after some delay time, the yield increases rather quickly to a value that lies within 10% of the final yield and then continues to grow more slowly. This underlines that the final yield is a meaningful observable that describes the typical outcome of the assembly reaction well, under appropriate time constraints that are not too restrictive.
– We have added a paragraph to clarify the significance of the observable of the final yield and to state more explicitly that the time efficiency is not considered here.
– We have added subsection “Time evolution of the yield in the dimerization and activation scenario” and a figure to the Appendix.
https://doi.org/10.7554/eLife.51020.sa2
Article and author information
Author details
Funding
Deutsche Forschungsgemeinschaft (GRK2062)
 Patrick Wilke
Deutsche Forschungsgemeinschaft (QBM)
 Florian M Gartner
 Isabella R Graf
Aspen Center for Physics (PHY1607611)
 Erwin Frey
Deutsche Forschungsgemeinschaft (EXC2094–390783311)
 Erwin Frey
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Nigel Goldenfeld for a stimulating discussion, and Raphaela Geßele and Laeschkir Hassan for helpful feedback on the manuscript. This research was supported by the German Excellence Initiative via the program ‘NanoSystems Initiative Munich’ (NIM) and was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC2094–390783311. FMG and IRG are supported by a DFG fellowship through the Graduate School of Quantitative Biosciences Munich (QBM). We also gratefully acknowledge financial support by the DFG Research Training Group GRK2062 (Molecular Principles of Synthetic Biology). Finally, EF thanks the Aspen Center for Physics, which is supported by National Science Foundation grant PHY1607611, for their hospitality and inspiring discussions with colleagues.
Senior Editor
 Naama Barkai, Weizmann Institute of Science, Israel
Reviewing Editor
 Frank Jülicher, Max Planck Institute for the Physics of Complex Systems, Germany
Reviewer
 Pablo Sartori, The Rockefeller University, United States
Publication history
 Received: August 12, 2019
 Accepted: February 4, 2020
 Accepted Manuscript published: February 5, 2020 (version 1)
 Version of Record published: March 23, 2020 (version 2)
Copyright
© 2020, Gartner et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.