Experimentally guided models reveal replication principles that shape the mutation distribution of RNA viruses
 Cited 14
 Views 2,128
 Annotations
Abstract
Life history theory posits that the sequence and timing of events in an organism's lifespan are finetuned by evolution to maximize the production of viable offspring. In a virus, a life history strategy is largely manifested in its replication mode. Here, we develop a stochastic mathematical model to infer the replication mode shaping the structure and mutation distribution of a poliovirus population in an intact single infected cell. We measure production of RNA and poliovirus particles through the infection cycle, and use these data to infer the parameters of our model. We find that on average the viral progeny produced from each cell are approximately five generations removed from the infecting virus. Multiple generations within a single cell infection provide opportunities for significant accumulation of mutations per viral genome and for intracellular selection.
https://doi.org/10.7554/eLife.03753.001eLife digest
Viruses with genetic information made up of molecules of RNA can multiply quickly, but not very accurately. This means that many errors, or mutations, occur when the RNA is copied to create new viruses. The advantage of this rapid, but mistakefilled, RNA replication process is that some of the mutations will be beneficial to the virus. This allows viruses to rapidly evolve, for example, to develop resistance against drugs.
The poliovirus is an RNA virus that can cause paralysis and death in humans. To prevent such infections, scientists have extensively studied the poliovirus and have developed effective vaccines against it that have eliminated the virus from all but a few countries. Because so much is known about the poliovirus and because it has a very simple structure, scientists continue to use the poliovirus as a model to study virus behavior.
One unknown aspect of the poliovirus' behavior is how it replicates after invading a cell. Are all of its RNA copies made from the original viral RNA that first infected the cell, in what is known as a ‘stamping machine’ model? Or do the new copies of the RNA also get copied themselves in a ‘geometric replication mode’ that increases the likelihood of mutations and enables the virus to evolve more rapidly?
Viral RNA molecules are copied by one of the virus's own proteins and so before the viral RNA can be replicated, it must first be translated to form viral proteins. When and where replication begins depends on the concentration of translated proteins around the RNA and so replication tends to begin in particular areas of the cell at different times. Schulte, Draghi et al. used mathematical modeling to create computer simulations of the number of polioviruses in a cell that take into account these time and space constraints. By including random elements in the model, the simulated behavior more accurately follows experimentally recorded data than previously used models.
The results of the model led Schulte, Draghi et al. to conclude that the poliovirus replicates by the ‘geometric mode’; as new copies of the poliovirus RNA are made, each copy goes on to make more copies. This means that in a single infected cell there are multiple generations of RNA, and each generation may undergo distinct mutations that are passed on to the next set of RNA copies. In fact, Schulte, Draghi et al. found that the average virus released from an infected cell is the greatgreatgreatgranddaughter of the original virus that infected the cell. With so many different generations of virus coexisting in a cell, there are a lot of opportunities for new genetic combinations to occur and for viruses to evolve new abilities.
https://doi.org/10.7554/eLife.03753.002Introduction
RNA viruses are excellent models for evolution. They replicate quickly and have extremely high mutation rates, orders of magnitude greater than those of most DNAbased life forms (Drake, 1993). While this combination of traits creates the potential for rapid adaptation, it necessitates a life history strategy that balances the need for explosive, exponential growth with the requirement to maintain genomic integrity. The life history strategies of viruses are largely reflected by their mode of intracellular replication. Two classic replication modes have been described for singlestranded RNA viruses: the ‘stamping machine’ mode (Stent, 1963) and the ‘geometric replication’ mode (Luria, 1951). In the stamping machine mode (SM), templates made from the original infecting genomes are used for the production of all progeny genomes. In the geometric replication mode (GR), newly made progeny genomes are used to create further templates for additional rounds of replication within a single cellular infection cycle (Figure 1). Progeny produced from stamping machine replication are all a single generation away from the parental strand whereas progeny generated from geometric growth represent a distribution of generations from the parental strand, often resulting in a fractional mean number of generations (see Figure 1). The iterative nature of GR creates branched genealogies that allow for expansive exploration of sequence space and results in a mutation distribution that differs from the SM mode (Luria, 1951). Recent studies with populationgenetic models (Draghi et al., 2010) and RNA enzyme populations (Hayden et al., 2011) have shown that differences in the distribution of mutants can significantly impact the adaptability of a population. Recent studies with poliovirus (PV) have also demonstrated that mutational differences within a population can have dramatic effects on pathogenicity (Pfeiffer and Kirkegaard, 2005; Vignuzzi et al., 2006) as well as fitness, virulence, and robustness (Lauring et al., 2012).
Poliovirus' simple genomic architecture and medical importance have made it one of the most extensively studied viruses (Racaniello, 2006). However, despite decades of mechanistic studies and recent revelations of the importance of population structures, the replication mode and resulting mutation distribution have yet to be determined. PV therefore proves an excellent candidate for the rigorous construction of a computational model of virus replication to predict population structure and mutation distribution. A major feature of PV intracellular dynamics is that the genome participates in multiple reactions: translation, replication, and encapsidation. Its 7.5 kb genome contains a single open reading frame, which encodes 7 nonstructural proteins and 4 capsid proteins. Translation produces a single polyprotein, which is cleaved into individual functional viral proteins. Replication of the positivesense genome by the virusencoded RNAdependent RNA polymerase produces a negativesense strand, which is used as a template for further genome synthesis. Evidence suggests that the initial, infecting positivesense genomes must be translated before they can replicate (Novak and Kirkegaard, 1994). The switch from translation to replication appears to be dependent on the concentration of a viral protein product, 3CD, which stimulates a transition from a linear, translating RNA to a noncovalently associated ‘circular’ RNA competent for replication (Gamarnik and Andino, 1998, 2000; Herold and Andino, 2001). Encapsidation is thought to result from protein–protein associations of capsid pentamers with the RNA replication machinery and protein–RNA association of capsid pentamers with viral RNA (Pfister et al., 1992; Nugent and Kirkegaard, 1995; Liu et al., 2010). Actively replicating genomes are preferentially encapsidated and packaging is biased to exclude negativesense strands, although the mechanism of this is not understood (Nugent et al., 1999). Although multiple ribosomes can translate a genome at the same time and multiple viral polymerases can replicate a genome at the same time, the processes are mutually exclusive (Gamarnik and Andino, 1998). Similarly, neither translation nor replication can occur after a genome is packaged into a virion.
Several studies have demonstrated that PV genomes are often localized to the cytosolic surfaces of the endoplasmic reticulum, Golgi bodies, lysosomes, or vesicles derived from these (Schlegel et al., 1996; Bolten et al., 1998; Cui et al., 2005; Egger and Bienz, 2005; den Boon et al., 2010). Replication complexes are thought to form on these membranes in cis, resulting in a close association of translation products and positivesense genomes (Novak and Kirkegaard, 1994; Egger et al., 2000). Compartmentalization of replication complexes likely accounts for the observation that many functions of nonstructural proteins cannot be complemented in trans (Novak and Kirkegaard, 1994; Ansardi et al., 1996). Only capsid proteins, 3CD, and 3D have been demonstrated to transcomplement (Novak and Kirkegaard, 1994; Nugent et al., 1999; Oh et al., 2009). Taken together, these studies suggest that the essential transitions—from translation to replication, and from replication to encapsidation—are largely localized and influenced by the dynamics of the molecules in each compartment.
In recent years, modeling approaches have begun to examine the tradeoffs that come with having a genome that is a template for both replication and translation (Krakauer and Komarova, 2003; Regoes et al., 2005; Sardanyés et al., 2009; Thébaud et al., 2010; Martinez et al., 2011). These studies have raised mechanistic and evolutionary questions about the life cycle of singlestranded, positivesense RNA viruses, but most have not produced models that can be directly compared to data. Several previous models are deterministic in nature (Krakauer and Komarova, 2003; Regoes et al., 2005; Martinez et al., 2011) and assume a wellmixed, spatially uniform cellular environment (Krakauer and Komarova, 2003; Regoes et al., 2005; Sardanyés et al., 2009; Thébaud et al., 2010; Martinez et al., 2011). Experimental evidence suggests that each of these assumptions is problematic and do not reflect the biological constraints and properties of viral replication. The small numbers of the critical molecules that initiate an infection suggest that a stochastic model would more accurately describe early reactions and could make distinct predictions from previous deterministic approaches (Srivastavawz et al., 2002). Often infections begin with relatively few virions that release their genomes into the cell and continue with the translation of these few initial genomes. Random variation in the switch from translation to replication is amplified by the subsequent exponential phase of the infection, and this amplification is likely to bias the mean dynamics of a set of infections. Indeed, recent singlecell studies demonstrated the significant impact of stochastic effects on poliovirus infections (Schulte and Andino, 2014).
Here, we have developed a stochastic simulation model in which we compartmentalize reactions in an effort to accurately describe intracellular dynamics in both space and time. Additionally, rather than fixing each parameter on an estimated value, an approach used by previous models, we use an Approximate Bayesian Computation approach to fit our parameters from temporal quantitative data. We find that by combining stochasticity and spatial structure, our model reflects and describes the population dynamics and structure of the viral population during an infection cycle more accurately than previous models.
Fitting our model to RNA abundances over time, we find that poliovirus follows the geometric replication mode: multiple iterative generations of genomic replication produce progeny virus. Posterior parameter fits indicate that progeny of a single cellular infection are approximately five generations away from the initial, infecting genomes. This replication mode produces populations with expansive, branched genealogies, creating the dramatic potential for the exploration of sequence space, as well as creating the potential for intracellular selection among related mutant genomes.
Results
Inference of replication parameters
We used temporal, quantitative RTPCR data of both positivesense genomes and negativesense strands to estimate the free parameters in our model. The role of each parameter in poliovirus replication and in the mathematics of our model are diagrammed in Figure 2 and described in detail in the ‘Materials and methods’. We chose to use measurements of positive and negativesense RNA at multiple time points for three multiplicities of infection (1, 10, and 100), as well as measurements of virion numbers at multiple times for MOI 10; this amounted to 27 measured means, with three data points for each mean. Strandspecific qRTPCR was performed to quantify positivesense and negativesense poliovirus RNA against in vitro transcribed standard RNAs of each sense (Burrill et al., 2013). Along with cell counts, this allowed for temporal measurements of the average positivesense and negativesense poliovirus RNA copies per cell. Negativesense RNA was not detectable until 2 hr post infection for MOIs 10 and 100 and 3 hr post infection for MOI 1. Positivesense RNA was clearly quantifiable for all time points at the MOI 10 and 100 but did not rise above background levels until 3 hr post infection for MOI 1. Using a newly developed virion immunoprecipitation assay (Burrill et al., 2013), we observed de novo virion assembly between 2 hr and 3 hr post infection. Along with total positivesense RNA measurements from this time course, we obtained a percentage of genomes encapsidated in quadruplicate at 3 hr, 4 hr, and 5 hr post infection. Figure 3 illustrates this data alongside projections from inferred parameters from the second round of SMC (see Figure 3—source data 1).
The relatively high number of data dimensions, combined with the computationally intensive and highly stochastic nature of our simulations, made a traditional maximum likelihood approach impractical. Instead, we turned to Approximate Bayesian Computation, using as our summary statistic the sum of the squared deviations of the average simulated RNA concentrations (and average fraction of virions for MOI 10) from their corresponding empirical means. This algorithm produces progressively more accurate estimates of each parameter over several rounds; Figure 4—figure supplement 1 illustrates that, for most parameters, round one restricts the credible range of each parameter in comparison to the flat prior and round two leads to further focusing. The data appear to be uninformative for at least one parameter, c_{pack}; a second parameter, com_{max}, appears to be poorly constrained by the comparison to MOI = 10 in round one, but somewhat constrained by the broader measurement against all three MOIs in round two. Round two also appears to significantly move the mode of two other parameters, c_{com} and c_{3A}.
Figure 4—figure supplement 1 indicates that ABC inference informed the values of nine of our ten parameters, but these marginal parameter distributions alone do not capture correlations between parameter values. Figure 4—figure supplement 2 shows evidence of significant correlations, and Figure 4—figure supplement 3 shows that parameter sets drawn from the marginal distributions in Figure 4—figure supplement 1 (i.e., uncorrelated parameter values) do a poor job of matching the data. While not unexpected, these significant correlations require that we work directly with the sampled parameter sets arising from our inference process, which is the approach we take below.
Each parameter in the posterior is supported over a significant range of possibilities. This remaining uncertainty reflects two factors: the data may be insufficient to determine each parameter, and the inference process may not have fully exploited the inferential power of the data. We took several approaches to quantify the sufficiency of the data and the effectiveness of the inference process. First, we measured the mean error of parameter sets when compared to the data for each multiplicity of infection independently; we asked if performance at one MOI predicted performance at the other two. If so, the dimensionality of our data would be effectively lower than we had initially assumed. Surprisingly, pairwise correlations between mean error at one MOI and another were very weak: Spearman's rho is 0.031 for MOIs 1 and 10, −0.092 for MOIs 1 and 100, and 0.129 for MOIs 10 and 100. This suggests that measurements at each MOI are contributing distinct information to our inference process.
Second, we determined the sensitivity of our measure of model ‘fit’ to variation in each of the parameters. This analysis, described in detail in the ‘Materials and methods’, showed that the data significantly informed the values of eight of ten of the parameters (Figure 4—figure supplement 4). We also performed a separate validation analysis which attempted to infer the replication phenotype, $\overline{g}$, from mock data simulated from parameter sets drawn from our prior distribution. As described more fully in the ‘Materials and methods’, this exercise confirms that the data and method are adequate to infer the trait of interest, albeit with some degree of inaccuracy (Figure 4—figure supplement 5).
Finally, we examine the fit between the data and the mean dynamics of inferred parameter sets. Figure 3 shows that the inferred parameter sets generally capture the information in the RNA and virion data, although some parameter sets deviate consistently from the data for some values. Variability among replicate sets of twenty singlecell simulations is substantial, correlated across a time series, and greatest for the smallest MOI. Further inferential effort could improve either the accuracy of the mean predicted dynamics or the precision of replicate simulation dynamics, though Figure 3 suggests that such improvements could only be modest. This variability is expected due to the stochastic nature of the simulations, and it may better reflect the biological noise of the infection (Schulte and Andino, 2014).
Predicted replication dynamics
Figure 4 shows the inferred posterior distribution of $\overline{g}$, the mean number of generations for a packaged virion based on two rounds of inference with measured RNA and virion abundances. This distribution is plotted for MOI = 10; the predicted values at MOI = 1 and MOI = 100 are very similar and highly correlated (weighted means: MOI = 1, 4.96; MOI = 10, 5.06; MOI = 100, 4.85; Spearman's rho (unweighted): MOI 1 and 10, 0.92; MOI 1 and 100, 0.85; MOI 10 and 100, 0.96). While this distribution does show substantial variance, it is strongly inconsistent with a ‘stamping machine’ mode of replication, which would have a $\overline{g}$ near one.
To explore the robustness of this inference, we compared the predicted dynamics of the model to an additional type of data: the fraction of positivesense RNA molecules translating at each time point. We fractionated infected cell lysates and quantified positivesense RNAs in monosome and polysome fractions relative to total positivesense RNA copies. These data render a percentage of genomes associated with translation machinery and provide an additional set of data to evaluate the parameter sets produced by SMC. When measured at an MOI of 10 at 1, 2, 3, 4, and 5 hr post infection, the majority of positivesense RNAs appeared to be associating with translation machinery, consistently averaging near 85%. Many of the inferred parameter sets are consistent with the measured values but a substantial fraction is clearly inconsistent (Figure 4—figure supplement 6). The summed squared error of the translating fractions is also correlated with $\overline{g}$ (Figure 4—figure supplement 7). To estimate how these new data inform our prediction of $\overline{g}$, we calculated a weighting factor based on the relative rank of the summed squared error of translating fractions, such that the parameter set with the best fit was assigned a weight of 1, the next a weight of 1134/1135, etc. Reweighting the distribution of $\overline{g}$ by this additional factor produced the distribution shown in Figure 4; the mean $\overline{g}$ shifts from 5.06 to 4.78.
Predicting the distribution of mutations
We simulated mutation and selection during infections to understand how replication dynamics shape the distribution of mutation frequencies among virions. To illustrate how mutant frequencies depended on $\overline{g}$, we chose two parameter sets with values of $\overline{g}$ at the low and high end of the range supported by the posteriors in Figure 4 and included the ‘best’ parameter set as a representative of the more common values of $\overline{g}$. Mutation frequencies for these parameter sets (‘best’, ‘low’, and ‘high’—see Figure 3—source data 1) are plotted in Figure 5A for a range of mutants that have a diminished rate of replication relative to the wild type. We chose to model this particular type of deficiency because we expected that replication deficits would directly affect the growth and packaging of the mutants. We observed that deficits in a different trait, the rate of complex formation, were effectively invisible to intracellular selection (the frequency of a mutation with an 80% reduction in complex formation was estimated to be reduced by 0.6–4.6% compared to a neutral mutation in the ‘best’ parameter set); we expect that mutations in traits like the rate of translation would also be complemented by the wildtype phenotype and so would not experience significant selection during the infection in which they arose.
Several distinct features of mutation in this model are evident from Figure 5A. Mutation frequency does not decrease linearly as intracellular selection approaches its maximal value; the curve results from the fact that mutant genomes that are immediately packaged are not subject to selection, while the contribution of rare, early mutants to average mutation frequency may be reduced by multiple rounds of intracellular selection. Knowing $\overline{g}$ and the mutation rate allows us to directly calculate the fate of neutral or very unfit mutations, but estimating the frequency of mutations of intermediate fitness requires additional simulations using our model. A third feature is the sizable confidence intervals relative to the number of infections sampled (10 million for each point). This high variability reflects the large contribution of very rare mutations that arise early in an infection and can contribute 1000s of mutant virions, especially when selection is weak.
The effect of these rare, early mutations in the overall mutant distribution can be seen as a departure from a Poisson process. To remove a potential confounding variation in burst size, we compare the distribution of mutations from infections within 10% of the median burst size and calculate a Poisson expectation for a mediansized burst with the same expected frequency. For the ‘best’ parameter set, median infections produced many more bursts with no copies of a given mutation (79.4% vs 22.5% for the Poisson), but also many more bursts with five or more copies of the mutant (8% vs 1.82% for the Poisson; n = 51,365).
The distribution of the number of generations between progeny virions and initial infecting genomes is displayed for three parameter sets (‘best’, ‘low’, and ‘high’—see Figure 3—source data 1) in Figure 5B. Only a very small percentage of progeny are produced via a single genomic replication cycle. Although all three parameter sets have means close to five generations, the distributions show a portion of the progeny virions representing up to 10 generations between the infecting genotypes and packaged virions within a single cellular infection.
Discussion
The intracellular replication mode of a virus strongly influences the frequency and distribution of mutations among progeny, which shape the longterm behavior of an infecting population (Vignuzzi et al., 2006; Lauring et al., 2012). Due to the complex nature of intracellular dynamics, assessing the mode of replication of viruses is a difficult task (but see Chao et al., 2002). Here, we built on decades of mechanistic studies and recent modeling efforts to construct a stochastic computational model coupled with new Bayesian inference methods. We combined these mathematical and computational techniques with accurate temporal data to produce a detailed picture of viral infection. We found that positive and negativesense RNA measurements made across multiple MOIs, along with quantitative data on virion packaging, are sufficient to infer that poliovirus replication occurs in several layers of intermediate replication, in contrast to the oftassumed ‘stamping machine’ model. The implications of the inferred geometric replication mode are as follows: (1) error rates perreplication are considerably lower than measured rates from fullreplicationcycle in vivo studies, (2) for a given viral polymerase error rate, mutation will progressively accumulate in both genome and antigenome RNAs, which should result in a more accentuated departure from the master sequence, allowing a better exploration of the available sequence space during a single infection cycle, and (3) there exists a significant potential for intracellular selection and competition among related genomes, even in infections initiated by only a single genome.
Accurate estimates of viral mutation rates are essential for studying viral evolution and have crucial practical applications in drug and vaccine design. While estimates of mutation rates exist for nearly two dozen viruses, estimates of replication modes exist for only a few (Sanjuan et al., 2010). Calculating perreplication event mutation rates from observed mutant frequencies is not possible, or even meaningful, without knowledge of the replication mode. Thus, estimates of poliovirus perreplication event mutation rates can vary over 10fold depending on the assumed replication mode (Drake, 1993; Sanjuan et al., 2010). By inferring the mode of replication, we have been able to link estimates of perreplication event mutation rates to published mutant frequencies. The most extensive poliovirus mutant frequency data set estimated an average mutant frequency of 2 × 10^{−4} (Acevedo et al., 2014). Using our inferred value of approximately five intracellular generations, we calculate a perreplication event mutation rate of 2 × 10^{−4}/5 × 2 = 2 × 10^{−5}, which is in agreement with the average estimates of poliovirus mutation rates calculated in vivo from lethal mutation frequencies (Acevedo et al., 2014). Rates of specific types of mutations, such as transversions and transitions, could each be inferred from their mutation frequencies by the same approach. Our inference of five intracellular generations is also in line with previous inferences of replication mode using the LuriaDelbruck fluctuation test nullclass method (Sanjuan et al., 2010). However, our results highlight some limitations for inferring mutation rates from frequencies: intracellular selection may strongly affect mutation frequencies, and the strong stochastic nature of virus replication appears to deeply modulate minor allele distribution, which in turn will result in imprecise estimates of the expected frequency. In particular, assuming that mutation frequency can be modeled as a Poisson process will lead to inappropriate confidence in measured frequencies. As a consequence, multiple empirical mutation frequencies measurements will be required to obtain a more precise determination of true mutation frequencies.
The branched genealogy inferred in our study implies the potential for significant amounts of intracellular complementation, selection, and competition between mutant genomes, even in infections initiated by a single genome (Novak and Kirkegaard, 1994; Turner and Chao, 1999; Vignuzzi et al., 2006). Figure 5A demonstrates the extent to which the frequency of a mutation can be skewed by negative selection during the course of an infection. On the other hand, a mutational event that occurred early in replication and conveyed an intracellular replication advantage could potentially give rise to hundreds or thousands of descendant virions in a single generation. If the mutation distribution data in Figure 5B were displayed as a tree (as in Figure 1), it would contain over 7000 terminal nodes, too many to resolve in a figure. Hence, the apparent potential for mutant interactions is vast. These results suggest that the evolutionary fate of mutations may depend strongly on their intracellular competitive ability, even when multiplicities of infection are low. Additionally, studies that rely on bottlenecks to reduce selection in viral mutation studies (e.g., de la Peña et al., 2000) may be allowing more selection than anticipated. Future population dynamics studies should consider the implications of the intracellular expansion of mutant phenotypes.
Virus infections are normally depicted as deterministic processes that follow a stereotypical path from infection to progeny production and death of the infected cell. However, experimental data show that some infected cells produce few progeny while others produce large populations of progeny (Schulte and Andino, 2014). These observations support the notion that stochasticity is an important factor shaping the outcome of infection. By combining accurate experimental measurements with a stochastic model of viral replication, we have obtained a realistic description of how the molecular events driving the life cycle of the virus govern the outcome of infection in each cell.
A significant benefit of computational modeling is that the information learned in the empirical process of the development of a model can yield important insights in the biologic processes under study. For example, our initial attempts to fit temporal strand measurement data were unable to match the sharp transition to exponential growth seen in the data. Only after removing the requirement for positivesense genomes to be translated before becoming replicationcompetent was our model flexible enough to rapidly create templates for exponential replication. While Novak and Kirkegaard (1994) demonstrate a requirement of the initial, infecting genomes to be translated before replication can occur, their data did not implicate that all genomes produced at any time during infection must be translated before replicating. Our study suggests that newly synthesized positivesense genomes may or may not disperse to nucleate new replication complexes within a single cellular infection, allowing us to model intracellular dynamics in a novel way by permitting a portion of newly made positivesense strands to immediately act as templates for replication without the requirement of translation.
Our model succeeds in describing many experimentally observed features of viral replication and is an excellent staging point for future and more accurate models of viral replication and evolution. With the realistic benefits of stochasticity, compartmentalized reactions, and parameters inferred from quantitative, temporal data, it acts as a baseline intracellular viral replication algorithm. More quantitative data, including data on the formation and number of replication compartments, would further inform the model. Potential additions of intracellular selection, complementation, and recombination parameters would allow population evolution studies to explore intracellular dynamics with more precision than previous approaches. The ultimate goal is to generate a comprehensive model incorporating mechanistic replication dynamics learned from virology with selection and complementation dynamics learned from population genetics. This tool could be very powerful for informing future therapeutic and preventative strategies.
Materials and methods
Experimental procedures
Cells and viruses
Request a detailed protocolHeLaS3 cells (ATCC CCL2.2) were maintained in 50% DMEM/50% F12 medium supplemented with 10% newborn calf serum, 100 U/ml penicillin, 100 U/ml streptomycin, and 2 mM glutamine (Invitrogen). Poliovirus Mahoney type I genomic RNA was generated from in vitro transcription of prib(+)XpAlong. To generate virus, 20 µg of RNA was electroporated into 4 × 10^{6} HeLaS3 cells in a 4mm cuvette with the following pulse: 300 V, 24 Ω, 1000 µF. The resulting virus was passaged at high multiplicity of infection (MOI ∼1–10) three times then subjected to ultracentrifugation through a 30% sucrose cushion.
Infections
Request a detailed protocolFour wells of HeLaS3 cells in 12well plates were washed, trypsinized, and counted twice each on a hemocytometer then averaged to determine cell count. To synchronize infections, plates were placed on ice, cells were washed with cold serumfree media and infected at MOIs 1, 10, and 100. Plates were incubated at 4°C for 30 min with rocking every 10 min to adhere virus. After removal of the inoculum, cells were washed 2× with warm serumfree media. Cells were then incubated at 37°C in 2% serum media until harvest. To harvest, plates were frozen at −70°C.
RNA extraction, reverse transcription (RT), and qPCR
Request a detailed protocolPlates were thawed on ice and refrozen at −70°C 3×. RNA was extracted via the PureLink RNA Micro Kit (Life Technologies) according to the manufacturer's instructions. cDNA was synthesized from total RNA using SuperScript III Reverse Transcriptase (Life Technologies) and 125 nM strandspecific RT primer (+strand_RT: 5′GGCCGTCATGGTGGCGAATAATGTGATGGATCCGGGGGTAGCG3′; strand_RT: 5′GGCCGTCATGGTGGCGAATAACATGGCAGCCCCGGAACAGG3′) in a 5µl reaction. Separate RT reactions for positive and negativestrand RNAs were performed for each sample. RT products were treated with 0.5 units of Exonuclease I (Fermentas) to remove excess RT primer prior to qPCR. Strandspecific qPCR was based on a published protocol (Burrill et al., 2013). cDNAs were analyzed by qPCR using 2× SYBR FAST Master Mix (Kapa Biosystems), 200 nM strandspecific qPCR primer (+strand_For: 5′CATGGCAGCCCCGGAACAGG3′; strand_Rev: 5′TGTGATGGATCCGGGGGTAGCG3′), and 200 nM Tag primer (5′GGCCGTCATGGTGGCGAATAA3′) in a 10µl reaction. A 10× dilution series of in vitro transcribed positive and negativestrand RNA standards was run alongside experimental samples and used to construct a standard curve.
Virion immunoprecipitation
Request a detailed protocolLysates from MOI 10 infections were homogenized with a final concentration of 0.06% NP40. Immunoprecipitation was performed using Protein Acoated Dynabeads and antipoliovirus antibody according to a published protocol (Burrill et al., 2013).
Sucrose gradients
Request a detailed protocolHeLaS3 cells were infected for 1, 2, 3, 4, and 5 hr at an MOI of 10 in 15cm dishes then simultaneously treated with 100 µg/ml cycloheximide (CHX) for 2 min at 37°C. Cells were washed with PBS+CHX and lysed with 0.5% NP40 lysis buffer containing CHX on ice for 20 min. Cell debris was pelleted in a tabletop centrifuge at 2000 rpm for 10 min at 4°C, then nuclei were pelleted at 9000 rpm for 10 min at 4°C. Cell lysates were loaded on a 10–50% sucrose gradient containing CHX and ultracentrifuged at 35,000 rpm for 3 hr. Fractions were collected on a Biocomp Gradient Station with a BioRad Econo UV Monitor. Fractions were pooled based on the spectrophotographic trace into two fractions (ribonucleoprotein and monosome/polysome fractions), RNA was extracted and subjected to qRTPCR.
Modeling replication
Outline of stochastic simulation
Request a detailed protocolWe developed a stochastic simulation model that tracks discrete abundances of poliovirus molecular species within a cell and simulates individual reactions. This model is based on the Gillespie algorithm (Gillespie, 1976). In this model, stochastic events, such as the production, decay, or transformation of molecular species, are represented by reaction rates. Each rate can depend on x, the vector of abundances of all species in the system. Given an initial state x_{0} at t = 0, the algorithm proceeds for a set duration (t_{max}) as follows:
a. Sum the rates of all reactions 1..n in the system; ${r}_{total}={\displaystyle \sum}_{i=1}^{n}{r}_{i}\left(\mathit{x}\right).$
b. Draw the time until the next event, Δt, from an exponential distribution with a mean of r_{total}.
c. Advance to time t + Δt.
d. Choose which event occurred by drawing from a multinomial with probabilities r_{i}(x)/r_{total}.
e. Change x to reflect the chosen event.
f. If t < t_{max}, return to step (a).
Similar to Hensel et al. (2009), we have modified the basic Gillespie algorithm to balance accuracy and speed. When the r_{total} is below a threshold (1000 events/min), we draw exponential times as described above; when it is above this threshold, we use the inverse of the rate—the expected time—as our interval between reactions. Figure 3—figure supplemental 1 shows that this approximation delivers accurate results for the best (lowest error) inferred parameter set. This procedure allows us to efficiently generate stochastic realizations of replication, translation, and other reactions unfolding in a single infected cell, based on a system of equations that describes each essential reaction in the poliovirus life cycle. Results from many replicate simulations are then averaged to predict the dynamics across a population of infected cells.
Figure 2 depicts the events in poliovirus replications captured quantitatively by our model, each of which is described in detail below.
Binding (step 1)
Request a detailed protocolWe assume that the number of virions that bind to, and subsequently infect, a cell is Poisson distributed with a mean equal to the multiplicity of infection (MOI). This formulation assumes that bound virions do not interfere with the binding of additional virions during the period of infection. We denote these coated positivesense RNA genomes at RNA^{+}_{initial}; their distribution is therefore:
Uncoating (step 2)
Request a detailed protocolA quantitative description of uncoating was derived from the data presented in Brandenberg et al. (2007); based on these data, we choose the twoparameter gamma distribution to model stochasticity in this process. To account for differences in experimental protocols, we excluded the t = 0 measurement from Brandenberg et al.'s data and, taking the t = 8 min measurement as the starting point, fit the gamma distribution to the average cumulative measurements. Using the optim() function in R, we obtained an estimate of 0.678 for the shape parameter, and 0.02 for the rate parameter (n = 28, R^{2} ≈ 0.92). Each of the RNA^{+}_{initial} molecules transitions to a translationally competent, linearform positivesense RNA, RNA^{+}_{lin}, after a waiting time, t_{uncoat}, drawn from Equation 2.
Translation (step 3)
Request a detailed protocolTranslation is the first role of positivesense genomes in a cell, and it continues as the primary role throughout the infection. Because poliovirus translates a single polyprotein, we assume that all protein products are produced at equal rates based upon a single rateconstant of initiation. We also assume that poliovirus genomes, and not cellular factors, are ratelimiting, and neglect the delay between the initiation of translation and the appearance of the protein products. With these assumptions, translation can be modeled as a firstorder equation with a single parameter, c_{trans}, yielding a rate of translation:
Here, and throughout the model, we consider these rates to describe Poisson processes, rather than changes in continuously valued quantities. Square brackets are used to indicate the concentration per cell, or abundance, of each molecular species. We track three protein products of translation: the procapsid units, which we abbreviate CAP, and protein products 3A and 3CD. Based on evidence from complementation experiments, we assume that CAP units and 3A diffuse freely, while 3CD accumulates within complexes with translating genomes (Novak and Kirkegaard, 1994; Ansardi et al., 1996; Nugent et al., 1999). Equation 3 applies to translation of both complexassociated and free genomes. Global abundances of CAP and 3A are tracked, while abundances of 3CD are tracked individually for each replication complex. 3CD arising from the translation of free genomes is ignored.
Replication complex formation (step 4)
Request a detailed protocolWe assume that two events must happen before a translating positivesense strand can replicate: it must attach to a membrane, representing nucleation of a replication complex, and it must circularize through association with 3CD (Gamarnik and Andino, 1998; Herold and Andino, 2001). Once a strand associates with a membrane, we consider that it has formed a complex, and assume that all subsequent translation events will add to the local concentration of 3CD. We model this first step by introducing a rate, r_{compart}, at which the RNA^{+}_{lin} species forms complexes. We also assume that the viral protein product 3A facilitates this complex formation (Hsu et al., 2010). Finally, we assume that complex formation is limited by the supply for suitable membrane, which limits the number of possible complexes in a cell to com_{max} (Guinea and Carrasco, 1990). We therefore introduce a firstorder reaction scaled by the number of existing complexes, com, the maximum, com_{max}, and the concentration of the protein 3A:
While other viral and cellular proteins are involved in complex formation, we assume that their influence is adequately represented by tracking the concentration of 3A. We also represent the consumption of some number of 3A molecules in the formation of each complex by a parameter c_{3A}. If insufficient 3A is available upon complex formation, newly translated proteins are consumed by the existing complex until c_{3A} have been allocated. Therefore, we are assuming that 3A binding is cooperative, and that incomplete complexes have much higher affinity for 3A than does the reaction to form a new complex.
Circularization (step 5)
Request a detailed protocolWe model circularization—the transition of a positivesense genome from a linear, translating molecule to a noncovalently associated circularized molecule competent for replication—as a firstorder reaction driven by the concentration of the viral protein 3CD in each complex (indexed by i).
This formulation reflects experimental data supporting the direct role of 3CD in circularization (Gamarnik and Andino, 1998; Herold and Andino, 2001), and the low rate of rescue of 3CDdeficient strains by complementation in trans (Novak and Kirkegaard, 1994).
Replication (step 6)
Request a detailed protocolWe distinguish replication rates for positive and negative strand synthesis with separate rate constants c_{rep+} and c_{rep−}. We ignore polymerase concentrations and instead assume that both types of replication are firstorder reactions modified by a common cellular resource limit. This limitation is parameterized by rep_{max}, the maximum number of replication events per cell permitted by some limited resource and implemented with a running counter, rep, of synthesized RNAs. We also assume that percapita replication does not differ between replication complexes, allowing us to write a massaction equation for both replication reactions.
Note that r_{rep+} measures the rate at which replication is initiated on positivesense templates, producing negativesense strands, and similarly, r_{rep−} measures the rate of positivestrand production.
We allowed newly synthesized positivesense genomes one of three fates: (1) associate with capsid protomers and become encapsidated, (2) diffuse into the cytoplasm where they can translate and potentially create new, independent compartments, or (3) remain in the complex in which they were generated and act as templates for further RNA replication. We assume that positivesense genomes that remain in the complex are immediately competent for replication. We were unable to fit the sharp transition to exponential growth seen in our strand measurement data without allowing for this third option. Allowing newly synthesized positivesense genomes to remain in the complex and act immediately as replication templates is consistent with previous reports indicating a coupling between translation and replication as we still require initial, infecting genomes to be translated before transitioning to replication (Novak and Kirkegaard, 1994). Negativesense strands also stay in the complex in which they were produced and are immediately competent for replication.
We assume that only the positivesense replicationcompetent form is packaged (Ansardi et al., 1996, but see; Liu et al., 2010), and that, following Nugent et al. (1999), genomes can only be packaged as they are synthesized from a negativesense strand. We therefore first determine whether the newly synthesized positivesense strand is packaged; then, for unpackaged genomes, we calculate whether they remain in the replication complex.
Packaging (step 7)
Request a detailed protocolWe assume that the rate of initiation of packaging is proportional to the global concentration of a virusderived protein product, CAP, representing capsid protomers. These protomers form pentamers, of which 12 are required for each capsid; each packaging event therefore consumes 60 units of CAP. To account for the evidence that deficiencies in capsid proteins can be complemented in trans, we allowed capsid proteins to diffuse freely throughout the cell. As with 3A molecules and complex formation, if a packaging event begins when the global abundance of CAP is less than 60, then further packaging is halted until this deficit is filled.
Using the approximation that each available CAP molecule independently contributes to the probability of packaging, we derive the probability that a newly synthesized positivesense strand is packaged to be:
Positive strand dispersal (step 8)
Request a detailed protocolThe probability of a newly synthesized positivesense strand to remain within its replication complex, assuming it was not packaged, is given by the parameter c_{stay}. The total probability is therefore:
Replication phenotype
Request a detailed protocolOur primary goal is to infer the number of replication cycles between the infecting and the progeny virions. Defining a complete replication cycle to include both copying to a negativesense strand, then back to a positivesense strand, we label the mean of this value $\overline{g}$. The principal purpose of this mean is to link the mutation rate of replication to the mean mutation frequency in the progeny population, so the appropriate measure is to average over virions, not infected cells. For k replicate simulations, let n_{i} represent the number of progeny produced by each replicate i, and g_{ij} represent the number of replication cycles in the ancestry of each virion j in replicate i; then we calculate $\overline{g}$ as follows.
Parameter inference
Inference method
Request a detailed protocolWe chose to implement a version of Approximate Bayesian computation called Sequential Monte Carlo (Sisson et al., 2007; Toni et al., 2009; Beaumont, 2010; Csilléry et al., 2010; Lopes and Beaumont, 2010; Toni and Stumpf, 2010). This method consists of several rounds of parameter selection which form successively better approximations of the posterior distribution. In each round x, a population of size n_{x} parameters sets is generated iteratively by choosing a parameter set from the preceding round x − 1, perturbing its values, then accepting or discarding the new parameter set based on the distance of its measured summary statistic from the summary statistic representing the data. Parameter sets with distances less than ε_{x} are accepted; a diminishing series of thresholds, ε_{1} > ε_{2} > ε_{3}, etc, progressively focuses the search on those parameter values that best match the data. In round 1, parameter sets are drawn from the prior distributions; this first round is therefore identical to the basic rejection algorithm, but with a fairly large ε_{1} to reduce computation time.
The advantage of searching for better parameter sets near previously identified good values is a much higher frequency of acceptance, and therefore much less computational time. However, the parameters accepted in later rounds are then biased toward common values in the previous rounds. The SMC algorithm removes this bias by weighting the selection of parameter sets against those that are most similar to their parent round, and toward those that resemble the prior distribution. Let K_{σ}(θ_{a}, θ_{b}) represent the probability of perturbing θ_{a} into θ_{b} with a Gaussian kernel of standard deviation σ, w_{xi} represent the weight of parameter set i in round x, θ_{xij} represent the value of parameter j in set i of round x, and π(θ) represent the prior probability of θ. Then, for round two and later, we calculate these importance weights as in Equation 10 (adapted from Toni et al., 2009).
Beaumont (2010) suggests that the Gaussian perturbations applied to each proposed parameter set should be scaled with regard to the variance in that parameter in the previous round. In practice, we identified a tradeoff based on the scaling of these perturbations; smaller perturbations lead to increased acceptance rates but more positively skewed importance weights; because the weighting of each accepted parameter set is normalized relative to the highest observed importance weight, this strong skew effectively dilutes the inferential power of the analysis. We found that using the standard deviation of each parameter as the standard deviation for the perturbation balanced this tradeoff adequately for our model: the weights of the 1135 sampled parameter sets had an entropy of 10.01 bits, compared to a maximal value of 10.15 bits. This high entropy confirms that any remaining skew in the importance weights does not severely diminish the effective sample size.
Implementing this method requires several additional choices: the shapes of prior distributions, the number of rounds, the values of ε, and the number of replicates, n, to perform for each evaluation of the model. This last decision turned out to be crucial; inferring based on the mean of a larger number of replicates (n ≥ 1000) tended to select parameter sets with highly variable behavior. Reducing n led to a higher rate of parameter set rejection but more biologically plausible dynamics. A simple explanation for this pattern is that a number of parameter sets produce acceptable mean behavior, but differ in the degree of stochastic variation around that mean in a way that does not reflect measured stochastic variation (Schulte and Andino, 2014). We therefore chose to accept parameter sets that passed a given ε for multiple, sequential sets of n replicates. For round 1, one thousand parameters sets were accepted based on five sets of n = 20 replicates at MOI = 10 only with ε = 12. For round 2, 1135 parameters sets were accepted based on five sets of n = 20 replicates at MOI = 1 with ε = 16, five sets of n = 20 replicates at MOI = 10 with ε = 12, and five sets of n = 20 replicates at MOI = 100 with ε = 7. Thresholds for round 2 were calibrated to achieve an acceptance rate of about 1 in 10,000.
Approximate Bayesian computation uses summary statistics to judge the fit of model predictions to data. We chose to compute the sum squared error of the log of the mean abundances of positive and negativesense RNA, and the log ratio of positivesense RNA in capsids to the total positive pool. This produced a single summary statistic that captured the total error of the predictions from a particular parameter set. Using the natural logs for each data point effectively weights each deviation by its relative magnitude, which prevents the large absolute size of errors at later time steps from dominating the error measurement.
To aid in visual exploration of the data, we chose five representative parameter sets as follows. From an initial batch of 513 parameter sets, we chose the 50 sets with the overall lowest error. From these, we sampled sets of five at random and calculated the summed pairwise distance in parameter space of those five sets from each other. To adjust for the different scales and uncertainties in each parameter, the contribution of each parameter to the distance measure was divided by its standard deviation over the whole set of 513 values. These summed distances provided a metric of the parameter diversity captured in a choice of five parameter sets; we examined one thousand randomly drawn sets of five and chose the set with the highest summed distance. These five parameter sets are shown in Figure 3—source data 1.
Method validation
Request a detailed protocolTo test the effectiveness of the inference process and the adequacy of our data, we chose parameter sets from the prior, produced simulated data from these parameter sets, and then performed the sequential Monte Carlo method described above. We chose three sets of parameters based on two criteria: representation of a high diversity of the replication phenotype, $\overline{g}$, and biological plausibility. We achieved these criteria by sampling the prior, measuring $\overline{g}$ and the mean number of virions at MOI = 10, and choosing three parameter sets than spanned the range of $\overline{g}$ and produced a number of virions similar to the empirically measured value.
For each parameter set, we measured RNA abundances and, for MOI = 10, virion abundances with 10,000 replicate simulations. These data were then treated exactly as the empirical data were handled; the log of the averages was used to measure error in the Bayesian inference procedure described above. At least one thousand accepted parameter sets were collected for both inference rounds (except in one case in which fast convergence and lengthy computation times made round two unnecessary and computationally costly).
The mean RNA abundances for MOI = 10 and the inference results are plotted in Figure 4—figure supplement 5. In each case, the inference process produced a narrow posterior, relative to the prior, with clear similarity to the actual value of $\overline{g}$ for each starting parameter set. However, the mean of the final posterior fails to perfectly match the true value in all three cases. Also, the second round of inference, which more than doubles the amount of data used to assess goodnessoffit, achieved little in these validation experiments. The error thresholds (values of ε) used here may have been too permissive to achieve complete convergence to the correct value. Nonetheless, these experiments show that our statistical method, when combined with the types and quantities of data we have available, can produce a reliable inference.
Sensitivity analysis
Request a detailed protocolTo further investigate if the parameter values identified by ABC minimize the error in predicted RNA and virion dynamics, we explored the sensitivity of mean error to variation in each parameter for a single parameter set (‘Best’ in Figure 3—source data 1). As before, error was assessed by comparing the log of the mean RNA and virion abundances to empirical data for sets of 20 simulations. 500 trials of 20 simulations each at each MOI were averaged to produce a highresolution estimate of true deviation from the data. These results are plotted as 1/(1 + mean error) to show an intuitive goodnessoffit measure, where high values indicate similarity to the data. Figure 4—figure supplement 4 shows that the ‘best’ parameter set is at or near a local maximum for goodnessoffit for eight of ten parameters; the effects of the remaining two parameters, c_{pack} and com_{max}, appear to be minimal for this parameter set. These results suggest that convergence of the posterior distributions is linked to the sensitivity of the model to each parameter, which supports the effectiveness of the ABC algorithm.
References
 1

2
Poliovirus assembly and encapsidation of genomic RNAAdvances in Virus Research 46:1–68.https://doi.org/10.1016/S00653527(08)60069X
 3

4
Approximate Bayesian Computation in evolution and ecologyAnnual Review of Ecology, Evolution, and Systematics 41:379–406.https://doi.org/10.1146/annurevecolsys102209144621

5
Intracellular localization of poliovirus plus and minusstrand RNA visualized by strandspecific fluorescent in situ hybridizationJournal of Virology 72:8578–8585.
 6

7
Poliovirus: generation and characterization of mutantsCurrent Protocols in Microbiology, Chapter 15: Unit 15H.2, 10.1002/9780471729259.mc15h02s29.
 8

9
Approximate bayesian computation (ABC) in practiceTrends in Ecology & Evolution 25:410–418.https://doi.org/10.1016/j.tree.2010.04.001

10
Visualizing the dynamic behavior of poliovirus plusstrand RNA in living host cellsNucleic Acids Research 33:3245–3252.https://doi.org/10.1093/nar/gki629
 11

12
Cytoplasmic viral replication complexesCell Host & Microbe 8:77–85.https://doi.org/10.1016/j.chom.2010.06.010
 13

14
Rates of spontaneous mutation among RNA virusesProceedings of the National Academy of Sciences of USA 90:4171–4175.https://doi.org/10.1073/pnas.90.9.4171

15
Intracellular location and translocation of silent and active poliovirus replication complexesThe Journal of General Virology 86:707–718.https://doi.org/10.1099/vir.0.804420
 16

17
Switch from translation to RNA replication in a positivestranded RNA virusGenes & Development 12:2293–2304.https://doi.org/10.1101/gad.12.15.2293
 18

19
General method for numerically simulating stochastic time evolution of coupled chemicalreactionsJournal of Computational Physics 22:403–434.https://doi.org/10.1016/00219991(76)900413

20
Phospholipid biosynthesis and poliovirus genome replication, two coupled phenomenaThe EMBO Journal 9:2011–2016.
 21

22
Stochastic kinetic modeling of vesicular stomatitis virus intracellular growthBulletin of Mathematical Biology 71:1671–1692.https://doi.org/10.1007/s1153800994195
 23
 24

25
Levels of selection in positivestrand virus dynamicsJournal of Evolutionary Biology 16:64–73.https://doi.org/10.1046/j.14209101.2003.00481.x
 26
 27

28
ABC: a useful Bayesian tool for the analysis of population dataInfection, Genetics and Evolution 10:826–833.https://doi.org/10.1016/j.meegid.2009.10.010

29
The frequency distribution of spontaneous bacteriophage mutants as evidence for the exponential rate of phage reproductionCold Spring Harbor Symposia on Quantitative Biology 16:463–470.https://doi.org/10.1101/SQB.1951.016.01.033
 30

31
Coupling between genome translation and replication in an RNA virusGenes & Development 8:1726–1737.https://doi.org/10.1101/gad.8.14.1726

32
Functional coupling between replication and packaging of poliovirus replicon RNAJournal of Virology 73:427–435.

33
RNA binding properties of poliovirus subviral particlesJournal of Virology 69:13–22.
 34
 35
 36
 37

38
Optimal replication of poliovirus within cellsThe American Naturalist 165:364–373.https://doi.org/10.1086/428295
 39

40
Replication mode and landscape topology differentially affect RNA virus mutational load and robustnessJournal of Virology 83:12579–12589.https://doi.org/10.1128/JVI.0076709

41
Cellular origin and ultrastructure of membranes induced during poliovirus infectionJournal of Virology 70:6576–6588.

42
Singlecell analysis uncovers extensive biological noise in poliovirus replicationJournal of Virology 88:6205–6212.https://doi.org/10.1128/JVI.0353913

43
Sequential monte carlo without likelihoodsProceedings of the National Academy of Sciences of USA 104:1760–1765.https://doi.org/10.1073/pnas.0607208104

44
Stochastic vs. deterministic modeling of intracellular viral kineticsJournal of Theoretical Biology 218:309–321.https://doi.org/10.1006/jtbi.2002.3078

45
Molecular Biology of Bacterial VirusesSan Francisco, Calif: W H Freeman and Company.

46
The relationship between mutation frequency and replication strategy in positivesense singlestranded RNA virusesProceedings Biological Sciences/ the Royal Society 277:809–817.https://doi.org/10.1098/rspb.2009.1247

47
Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systemsJournal of the Royal Society, Interface 6:187–202.https://doi.org/10.1098/rsif.2008.0172
 48
 49
 50
Decision letter

Stephen P GoffsReviewing Editor; Howard Hughes Medical Institute, Columbia University, United States
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
[Editors’ note: this article was originally rejected after review, but the authors were later invited to resubmit.]
Thank you for choosing to send your work entitled “Experimentally guided mathematical modeling reveals RNA virus replication principles shaping the mutation distribution” for consideration at eLife. Your full submission has been evaluated by Aviv Regev (Senior editor), a Reviewing editor, and 3 peer reviewers, and the decision was reached after discussions between the reviewers. We regret to inform you that your work will not be considered further for publication.
The three reviews are below for your consideration, with the following comments from the Reviewing editor.
This paper reports a new attempt to model viral replication, including stochastic effects from cell to cell, and pools of viral RNA that are in replicative vs. translation pools. The model fits data best with a geometric increase of RNA (no surprise) with about 5 generations occurring intracellularlly (before cell death?). The large number (ten or so) of parameters were explored by an Approximation Bayesian Computation method, not all were independent, and some having no major impact. While I cannot comment on the math, most of the assumptions of the model seem plausible and the predictions seem to fit the limited data. The model then is used to predict the distribution of mutations, with good correspondence to data.
While it seems like a useful framework has been devised, novel predictions of the model are few and far between.
I have a few issues that reflect my uncertainty about the working of the model.
1) I have a freefloating concern with identifying a cycle as a complete genometo antigenometogenome cycle. How do we deal with the fact that there are many more genomes than antigenomes? In fact, this ratio is measured and predicted (Figure 3); wouldn't this be a good thing to include as a plotted output? Does one assume here that the excess of genomes are all moved out of the replication pool, while all the antigenomes are always in the pool? How do we deal with the necessary halfcycles? Some discussion of this would be helpful.
2) In the Results section, I might question the restriction that “following Nugent et al. (1999), genomes can only be packaged as they are synthesized from a negativesense strand…”. Could we consider a more open view that any genome, new or old, could be packaged (from any pool)? Would this be compatible with the data? Was this tried?
Reviewer #1
Life history theory aims to quantify different parameters, such as time to reproduction or number of offsprings, in potential strategies in the life of organisms to maximize the number of surviving offspring. RNA viruses present a particularly simple lifestyle and their fast replication makes them particularly appropriate to test evolutionary models. The authors apply these principles to the evolution of poliovirus. This work is the collaboration between two strong groups in the area of mathematical modeling and experimental viral evolution. Of particularly interest in this work is the determination of parameters in single cell infection. For instance, the authors contrast two simple models of replication, stamping machine where the progeny is one generation away from the initial and the geometric replication model where several rounds of replication occur within a single cell. Main conclusions of this work the lifetime of viral progeny (∼5 generations) and the abundance RNA increase following an exponential pattern which allow to test different models of replication.
The mathematical model is a stochastic model a la Gillespie where the cell infection is decomposed into a set of individual steps (binding, uncoating, translation, replication complex formation, circularization, replication, packaging, strand dispersal). The model has many parameters. To infer the value of these parameters the authors use quantitative RTPCR data of both positivesense genomes and negativesense, amounting to 27 measures. The stochastic approach coupled to high dimension of data and large numbers of parameters led the authors to choose an approximate Bayesian computation using as summary statistic the sum of the squared deviations of the average simulated RNA concentration.
Initially I was extremely excited about the manuscript, but I became less enthusiastic as I was reading along.
1) The model proposed discretized the infection into a set of simple steps each of this depending on a few parameters, creating a complex chain of steps. Each step has its own parameters that add complexity in addition to the challenges of stochasticity. The model fits many parameters. It would be more interesting to make some specific hypotheses to be tested experimentally beyond parameter fitting.
2) The steps follow a virology book cartoon model, but many other factors are known to be important (transport, cell factors, etc.), and it is unclear that the factors considered are the only important ones.
3) The variability among cells could be large, as reflected by recent single cell studies, and it is unclear how this variability is taken into account in these models. A critical question in this work is the cell to cell variability. In a system where there is geometric growth a progeny the mean number of viruses in a cell is not as informative as the whole distribution. One would think that the larger scale dynamics of the virus are determined by the cells producing more viruses, in particular, in a geometric growth model.
Reviewer #2
The authors have built on their previous work in poliovirus cell interaction to develop an elaborate model of intracellular replication and consequent accumulation of mutations. In combination with careful experimental analysis of replication kinetics of both plus and minus strands as well as virions at several different MOIs, they are able to establish reasonably tight ranges for the 10 or so free parameters required by their model and draw several interesting conclusions, including that intracellular replication is best fit to a branching (rather than “stamping machine”) process, that, on average, progeny of an infected cell are about 5 copying generations removed from the infecting genome, and that, therefore, the underlying mutation rate (per copying event) is about 5fold less than estimated from singlestep experiments.
Although the modeling analysis seems to me to be quite elegant, to use upto date techniques, and to be well supported by careful experimentation, and the conclusions seem reasonable and well justified, I am not an expert in either modeling or positivestrand virus replication, so I will defer to other reviewers on these issues. That said, my major concern is that the manuscript is rather inaccessible to the general readership expected for a journal like eLife. I found it quite difficult to read and understand, and I had to go back to a virology textbook a few times. It would help a lot if Figure 2 contained more information for the general reader, such as a map of the genome indicating the names and functions of the important gene products. Also very helpful would be a summary table listing the various parameters and the steps they refer to. Overall, the reader needs to have a clearer vision of the replication cycle than is accessible from the introductory material here. As another example, the description of the necessity for “circularization” of the RNA, and the drawing in Figure 2, leave the nonexpert reader with the strong impression that the circles referred to are covalently bound RNAonly structures, rather than RNAprotein complexes, as the authors have nicely shown in previous work.
Although the stated goal of the analysis is to understand how the replication mode shapes the mutation distribution, the treatment of mutations seems rather lightweight, by comparison to replication dynamics, and also has confusing aspects. For starters, in most modeling I am accustomed to, μ is used to refer to the mutation rate, and s the selection coefficient. The use of different notation here made reading more difficult, for sure. Second, the use of a “one size fits all” approach to mutations seems oversimplified and inaccurate, because the kinetics of accumulation of different types of mutants will depend very strongly on how and where they act, whether in cis or trans to the genome, whether they affect one protein (e.g. missense mutations) or others (nonsense mutations), etc. If and how the authors included these issues in their simulations is not made clear. Obviously, there must be accumulation of some mutations that would be lethal in clonal replication, but not others, so the meaning of “replication deficit” is quite unclear.
Reviewer #3
Schulte et al. present a detailed analysis of the replication process of polio virus inside a cell. They develop a stochastic model of viral replication, fit it to empirical data via Approximate Bayesian Computation (ABC), and infer several parameters of interest. Most importantly, they show that polio virus does not replicate using the stampingmachine mechanism. Instead, there are intermediate rounds of replication inside the cell.
Overall, the paper is interesting, and the authors have a wealth of novel results. However, the presentation of the work is lacking at places, and the paper will require substantial revisions before it is publishable.
Most importantly, I feel the authors can't entirely decide on what their story is. Is the main motivation to identify the replication mode of the virus, or to show that a stochastic model is better than a deterministic model? At places, the paper seems to argue strongly for stochastic over deterministic treatment, but in the end there isn't a single result I could see that would demonstrate how a deterministic approach fails. I think it would be better to focus on the biological question (method of replication mode) and not worry so much about justifying the stochastic approach.
Secondly, the computational methods are not that well described. Notably, the methods section talks only about experimental work. Yet the main body of the text is not sufficient to fully understand the computational work, and some paragraphs currently in the Results would be better placed in the Materials and methods (see below). Also, the code and raw data for this project need to be made available.
Other comments:
I couldn't always figure out what parts of the work were simulation and what parts were fitting of the model to measured data. This needs to be worked out more clearly. Also, it would be good to have a table that lists the final parameter estimates obtained from fitting the model to the data.
It would be good to add a step of model verification where you simulate the model given some parameter set, and then see whether you can recover the parameters using the ABC method. In particular, can you distinguish SM and GR scenarios when all other parameters are the same?
In the Introduction: I would argue that if a deterministic model agrees well with measured data then that is sufficient evidence to conclude the stochastic fluctuations can be neglected.
In the Results: The modified Gillespie algorithm needs to be explained in detail in the Materials and methods section. For example, what is the threshold parameter at which you switch from stochastic to deterministic treatment? Also, the code needs to be made available.
In the Results, “we also assume that poliovirus genomes, and not cellular factors, are ratelimiting”: it is not clear to me to what extent this assumption biases results. Certainly, eventually cellular factors will be limiting viral replication inside a cell. So how does your model take this into account?
In the Results, “inferring based on mean of a larger number of replicates (n ≥ 1000) tended to select parameter sets with highly variable behavior. Reducing n led to a higher rate of parameter set rejection but more biologically plausible dynamics”: any idea why your results depend on n in this way?
Figure 1: Where does the number of 2.33 come from for GR? I doubt it is part of the definition for GR, but the sentence is written as if it were.
https://doi.org/10.7554/eLife.03753.018Author response
We have incorporated many of the reviewers’ suggestions and modified the paper to clarify several aspects that were unclear in the previous version. We have also included additional information and novel analysis of our data. We believe that the revised version is much stronger now.
Critical and innovative advances in our study are:
1) This is the first stochastic model for any virus replication that uses experimental information to infer the replication parameters;
2) We clearly establish that poliovirus replication significantly deviates from the accepted “stamping machine” strategy;
3) We use this model to examine how the genetic structure of a virus population is established during replication in single infected cells.
1) I have a freefloating concern with identifying a cycle as a complete genometo antigenometogenome cycle. How do we deal with the fact that there are many more genomes than antigenomes? In fact, this ratio is measured and predicted (Figure 3); wouldn't this be a good thing to include as a plotted output? Does one assume here that the excess of genomes are all moved out of the replication pool, while all the antigenomes are always in the pool? How do we deal with the necessary halfcycles? Some discussion of this would be helpful.
We have revised our manuscript to use the word “generation” in place of “cycle”—this change in terminology should help clear up this issue for most readers. We use “generation” to denote the completion of a full round of the intracellular replication process of a genome produced from a negativesense strand that itself arose from a positivesense genome. This definition does not assume that genomic and antigenomic molecules must balance or have any particular stoichiometry. Of course, the premise of our model is that the relative abundances over time of genomic and antigenomic RNAs are informative about replication, and those abundances are the primary data we fit (Figure 3). However, each genomic RNA is related to the progenitor genomes by some number of generations, each consisting of a genomictoantigenomic step, then a corresponding antigenomictogenomic step. It is the counts of these generations, not the ratios, which are plotted in Figure 3, and which define the opportunity for mutation. There is therefore no contradiction among our results: the balance of rates of replication and other processes determine the ratios, and also determine the mean number of generations, which we now call $\overline{g}$.
2) In the Results section, I might question the restriction that “following Nugent et al. (1999), genomes can only be packaged as they are synthesized from a negativesense strand…”. Could we consider a more open view that any genome, new or old, could be packaged (from any pool)? Would this be compatible with the data? Was this tried?
Rather than try every possible derivation of our model, we choose to build off the wealth of insights to poliovirus biology found in the literature. In addition to Nugent et al. (1999) showing a coupling of replication and packaging, Baltimore et al. (1966) show an increase in viral particles only in the presence of active replication, suggesting that genomes destined for viral particles are siphoned off of the replication machinery.
Reviewer #1
1) The model proposed discretized the infection into a set of simple steps each of this depending on a few parameters, creating a complex chain of steps. Each step has its own parameters that add complexity in addition to the challenges of stochasticity. The model fits many parameters. It would be more interesting to make some specific hypotheses to be tested experimentally beyond parameter fitting.
Our primary result—that around five generations separate a typical progeny from the infecting genome—is a specific and testable hypothesis, and we have sought to emphasize that point more explicitly. To clarify, this result provides a quantitative prediction of the ratio of the frequency of a neutral mutation after a generation to the rate of such mutations. We are working to develop methods to test these predictions and hope that the publication of these results will stimulate other such experiments. We also produce these predictions for a range of deleterious mutations (Figure 5A). We have revised the presentation of these results to make this point more clearly.
2) The steps follow a virology book cartoon model, but many other factors are known to be important (transport, cell factors, etc.), and it is unclear that the factors considered are the only important ones.
We agree completely with this conclusion, which is why it is critical to provide testable predictions that could point to inadequacies in our model. As noted above, the model is already complex; we believe that the best approach is to construct a model with minimal complexity that both captures the essential biological steps and recapitulates the measured dynamics (as seen in Figure 3). Publishing this model will provide a muchneeded testbed for researchers to assess the quantitative contribution of the many other details of poliovirus replication.
3) The variability among cells could be large, as reflected by recent single cell studies, and it is unclear how this variability is taken into account in these models. A critical question in this work is the cell to cell variability. In a system where there is geometric growth a progeny the mean number of viruses in a cell is not as informative as the whole distribution. One would think that the larger scale dynamics of the virus are determined by the cells producing more viruses, in particular, in a geometric growth model.
We agree that stochastic variability is extremely important—our revised Materials and methods section hopefully do a better job of communicating the details behind our stochastic model.
Reviewer #2
Although the modeling analysis seems to me to be quite elegant, to use upto date techniques, and to be well supported by careful experimentation, and the conclusions seem reasonable and well justified, I am not an expert in either modeling or positivestrand virus replication, so I will defer to other reviewers on these issues. That said, my major concern is that the manuscript is rather inaccessible to the general readership expected for a journal like eLife. I found it quite difficult to read and understand, and I had to go back to a virology textbook a few times. It would help a lot if Figure 2 contained more information for the general reader, such as a map of the genome indicating the names and functions of the important gene products. Also very helpful would be a summary table listing the various parameters and the steps they refer to. Overall, the reader needs to have a clearer vision of the replication cycle than is accessible from the introductory material here. As another example, the description of the necessity for “circularization” of the RNA, and the drawing in Figure 2, leave the nonexpert reader with the strong impression that the circles referred to are covalently bound RNAonly structures, rather than RNAprotein complexes, as the authors have nicely shown in previous work.
Thank you for these positive comments. We have extensively revised the Materials and methods section and believe it now gives a more complete picture of both the model and the lifecycle. We have revised our description of the “circular” RNA form to state its noncovalent nature explicitly.
Although the stated goal of the analysis is to understand how the replication mode shapes the mutation distribution, the treatment of mutations seems rather lightweight, by comparison to replication dynamics, and also has confusing aspects. For starters, in most modeling I am accustomed to, μ is used to refer to the mutation rate, and s the selection coefficient. The use of different notation here made reading more difficult, for sure. Second, the use of a “one size fits all” approach to mutations seems oversimplified and inaccurate, because the kinetics of accumulation of different types of mutants will depend very strongly on how and where they act, whether in cis or trans to the genome, whether they affect one protein (e.g. missense mutations) or others (nonsense mutations), etc. If and how the authors included these issues in their simulations is not made clear. Obviously, there must be accumulation of some mutations that would be lethal in clonal replication, but not others, so the meaning of “replication deficit” is quite unclear.
We have changed the notation for the number of generations to avoid the ambiguity of μ. We absolutely agree that different types of mutations would accumulate differently, and have expanded the text to describe this issue and justify our modeling of a particular class of mutation—those that directly affect intracellular replication rates.
Reviewer #3
Most importantly, I feel the authors can't entirely decide on what their story is. Is the main motivation to identify the replication mode of the virus, or to show that a stochastic model is better than a deterministic model? At places, the paper seems to argue strongly for stochastic over deterministic treatment, but in the end there isn't a single result I could see that would demonstrate how a deterministic approach fails. I think it would be better to focus on the biological question (method of replication mode) and not worry so much about justifying the stochastic approach.
You are correct—the point of the paper is not to argue for the superiority of a stochastic model, although we think it is both useful and necessary to differentiate our work from previous models. We have edited the Introduction and Discussion to shift the emphasis to our main result, namely the prediction that poliovirus follows the geometric replication mode, i.e. multiple iterative generations of genomic replication produce progeny virus.
Secondly, the computational methods are not that well described. Notably, the methods section talks only about experimental work. Yet the main body of the text is not sufficient to fully understand the computational work, and some paragraphs currently in the Results would be better placed in the Materials and methods (see below). Also, the code and raw data for this project need to be made available.
Other comments:
I couldn't always figure out what parts of the work were simulation and what parts were fitting of the model to measured data. This needs to be worked out more clearly. Also, it would be good to have a table that lists the final parameter estimates obtained from fitting the model to the data.
We have reorganized the Materials and methods and the Results to help clarify these and other issues. We have also reorganized the figure supplements to make this kind of information easier to find. The full posterior distribution for each parameter is graphed in Figure 4—figure supplement 1, and point estimates are given in Figure 3—figure supplement 2.
It would be good to add a step of model verification where you simulate the model given some parameter set, and then see whether you can recover the parameters using the ABC method. In particular, can you distinguish SM and GR scenarios when all other parameters are the same?
This is an excellent suggestion and we have added a figure doing exactly this (Figure 5—figure supplement 5).
In the Introduction: I would argue that if a deterministic model agrees well with measured data then that is sufficient evidence to conclude the stochastic fluctuations can be neglected.
Our view is that the biological process is clearly stochastic, making a stochastic model a reasonable starting place. However, we agree that the original statement was misleading—if a deterministic data was found to adequately explain all data, then a stochastic model would be unnecessary.
In the Results: the modified Gillespie algorithm needs to be explained in detail in the Materials and methods section. For example, what is the threshold parameter at which you switch from stochastic to deterministic treatment? Also, the code needs to be made available.
This parameter value, as well as other missing details, have been added to the thoroughly revised Materials and methods section. The code has also been made available.
In the Results, “we also assume that poliovirus genomes, and not cellular factors, are ratelimiting”: it is not clear to me to what extent this assumption biases results. Certainly, eventually cellular factors will be limiting viral replication inside a cell. So how does your model take this into account?
We do implement a limitation on the replication of poliovirus genomes as the parameter rep_{max}, and an additional limitation on the number of replication complexes as the parameter com_{max}. We found that the addition of these parameters allowed the replication rate to decline toward the end of an infection, in keeping with the empirical data (and common sense). Without evidence that resources specifically limited translation, we didn’t see the need to add this complicating parameter to the model.
In the Results, “inferring based on mean of a larger number of replicates (n ≥ 1000) tended to select parameter sets with highly variable behavior. Reducing n led to a higher rate of parameter set rejection but more biologically plausible dynamics”: any idea why your results depend on n in this way?
We have added the text: “A simple explanation for this pattern is that a number of parameter sets produce acceptable mean behavior, but differ in the degree of stochastic variation around that mean in a way that does not reflect measured stochastic variation (Schulte and Andino, 2014).”
Figure 1: Where does the number of 2.33 come from for GR? I doubt it is part of the definition for GR, but the sentence is written as if it were.
The number refers to the particular example plotted on the right side of Figure 1. The misleading sentence has been rephrased to indicate this.
https://doi.org/10.7554/eLife.03753.019Article and author information
Author details
Funding
National Institute of Allergy and Infectious Diseases (R01 AI36178)
 Michael B Schulte
 Raul Andino
National Institute of Allergy and Infectious Diseases (R01 AI40085)
 Michael B Schulte
 Raul Andino
Defense Advanced Research Projects Agency (BAA1093)
 Michael B Schulte
 Raul Andino
Burroughs Wellcome Fund (JBP)
 Joshua B Plotkin
David and Lucile Packard Foundation (JBP)
 Joshua B Plotkin
U.S. Department of the Interior (D12AP00025)
 Joshua B Plotkin
Army Research Office (W911NF1210552)
 Joshua B Plotkin
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
 Stephen P Goffs, Howard Hughes Medical Institute, Columbia University, United States
Publication history
 Received: June 22, 2014
 Accepted: December 31, 2014
 Version of Record published: January 30, 2015 (version 1)
Copyright
© 2015, Schulte et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 2,128
 Page views

 373
 Downloads

 14
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Download citations (links to download the citations from this article in formats compatible with various reference manager tools)
Open citations (links to open the citations from this article in various online reference manager services)
Further reading

 Computational and Systems Biology
 Microbiology and Infectious Disease
This study assembles DNA adenine methylomes for 93 Mycobacterium tuberculosis complex (MTBC) isolates from seven lineages paired with fullyannotated, finished, de novo assembled genomes. Integrative analysis yielded four key results. First, methyltransferase allelemethylome mapping corrected methyltransferase variant effects previously obscured by referencebased variant calling. Second, heterogeneity analysis of partially active methyltransferase alleles revealed that intracellular stochastic methylation generates a mosaic of methylomes within isogenic cultures, which we formalize as ‘intercellular mosaic methylation’ (IMM). Mutationdriven IMM was nearly ubiquitous in the globally prominent Beijing sublineage. Third, promoter methylation is widespread and associated with differential expression in the ΔhsdM transcriptome, suggesting promoter HsdMmethylation directly influences transcription. Finally, comparative and functional analyses identified 351 sites hypervariable across isolates and numerous putative regulatory interactions. This multiomic integration revealed features of methylomic variability in clinical isolates and provides a rational basis for hypothesizing the functions of DNA adenine methylation in MTBC physiology and adaptive evolution.

 Epidemiology and Global Health
 Microbiology and Infectious Disease
Multiple studies have reported a male bias in incidence and/or prevalence of malaria infection in males compared to females. To test the hypothesis that sexbased differences in hostparasite interactions affect the epidemiology of malaria, we intensively followed Plasmodium falciparum infections in a cohort in a malaria endemic area of eastern Uganda and estimated both force of infection (FOI) and rate of clearance using amplicon deepsequencing. We found no evidence of differences in behavioral risk factors, incidence of malaria, or FOI by sex. In contrast, females cleared asymptomatic infections at a faster rate than males (hazard ratio [HR]=1.82, 95% CI 1.20 to 2.75 by clone and HR = 2.07, 95% CI 1.24 to 3.47 by infection event) in multivariate models adjusted for age, timing of infection onset, and parasite density. These findings implicate biological sexbased differences as an important factor in the host response to this globally important pathogen.