Experimentally guided models reveal replication principles that shape the mutation distribution of RNA viruses

Abstract

Life history theory posits that the sequence and timing of events in an organism's lifespan are fine-tuned by evolution to maximize the production of viable offspring. In a virus, a life history strategy is largely manifested in its replication mode. Here, we develop a stochastic mathematical model to infer the replication mode shaping the structure and mutation distribution of a poliovirus population in an intact single infected cell. We measure production of RNA and poliovirus particles through the infection cycle, and use these data to infer the parameters of our model. We find that on average the viral progeny produced from each cell are approximately five generations removed from the infecting virus. Multiple generations within a single cell infection provide opportunities for significant accumulation of mutations per viral genome and for intracellular selection.

https://doi.org/10.7554/eLife.03753.001

eLife digest

Viruses with genetic information made up of molecules of RNA can multiply quickly, but not very accurately. This means that many errors, or mutations, occur when the RNA is copied to create new viruses. The advantage of this rapid, but mistake-filled, RNA replication process is that some of the mutations will be beneficial to the virus. This allows viruses to rapidly evolve, for example, to develop resistance against drugs.

The poliovirus is an RNA virus that can cause paralysis and death in humans. To prevent such infections, scientists have extensively studied the poliovirus and have developed effective vaccines against it that have eliminated the virus from all but a few countries. Because so much is known about the poliovirus and because it has a very simple structure, scientists continue to use the poliovirus as a model to study virus behavior.

One unknown aspect of the poliovirus' behavior is how it replicates after invading a cell. Are all of its RNA copies made from the original viral RNA that first infected the cell, in what is known as a ‘stamping machine’ model? Or do the new copies of the RNA also get copied themselves in a ‘geometric replication mode’ that increases the likelihood of mutations and enables the virus to evolve more rapidly?

Viral RNA molecules are copied by one of the virus's own proteins and so before the viral RNA can be replicated, it must first be translated to form viral proteins. When and where replication begins depends on the concentration of translated proteins around the RNA and so replication tends to begin in particular areas of the cell at different times. Schulte, Draghi et al. used mathematical modeling to create computer simulations of the number of polioviruses in a cell that take into account these time and space constraints. By including random elements in the model, the simulated behavior more accurately follows experimentally recorded data than previously used models.

The results of the model led Schulte, Draghi et al. to conclude that the poliovirus replicates by the ‘geometric mode’; as new copies of the poliovirus RNA are made, each copy goes on to make more copies. This means that in a single infected cell there are multiple generations of RNA, and each generation may undergo distinct mutations that are passed on to the next set of RNA copies. In fact, Schulte, Draghi et al. found that the average virus released from an infected cell is the great-great-great-granddaughter of the original virus that infected the cell. With so many different generations of virus coexisting in a cell, there are a lot of opportunities for new genetic combinations to occur and for viruses to evolve new abilities.

https://doi.org/10.7554/eLife.03753.002

Introduction

RNA viruses are excellent models for evolution. They replicate quickly and have extremely high mutation rates, orders of magnitude greater than those of most DNA-based life forms (Drake, 1993). While this combination of traits creates the potential for rapid adaptation, it necessitates a life history strategy that balances the need for explosive, exponential growth with the requirement to maintain genomic integrity. The life history strategies of viruses are largely reflected by their mode of intracellular replication. Two classic replication modes have been described for single-stranded RNA viruses: the ‘stamping machine’ mode (Stent, 1963) and the ‘geometric replication’ mode (Luria, 1951). In the stamping machine mode (SM), templates made from the original infecting genomes are used for the production of all progeny genomes. In the geometric replication mode (GR), newly made progeny genomes are used to create further templates for additional rounds of replication within a single cellular infection cycle (Figure 1). Progeny produced from stamping machine replication are all a single generation away from the parental strand whereas progeny generated from geometric growth represent a distribution of generations from the parental strand, often resulting in a fractional mean number of generations (see Figure 1). The iterative nature of GR creates branched genealogies that allow for expansive exploration of sequence space and results in a mutation distribution that differs from the SM mode (Luria, 1951). Recent studies with population-genetic models (Draghi et al., 2010) and RNA enzyme populations (Hayden et al., 2011) have shown that differences in the distribution of mutants can significantly impact the adaptability of a population. 
Recent studies with poliovirus (PV) have also demonstrated that mutational differences within a population can have dramatic effects on pathogenicity (Pfeiffer and Kirkegaard, 2005; Vignuzzi et al., 2006) as well as fitness, virulence, and robustness (Lauring et al., 2012).

Figure 1

Illustrations of the genealogies of different replication modes.

Red dots indicate positive-sense strands. Blue dots indicate negative-sense templates. Stamping machine (SM) progeny are one generation from the initial infecting genome (left). In an example of geometric replication (GR), progeny are an average of 2.33 generations from the initial infecting genome (right).

https://doi.org/10.7554/eLife.03753.003
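
The generation counts in Figure 1 can be reproduced with a short calculation. The sketch below is only an illustration: the genealogies, the strand labels (`P0`, `a`, `b`, `v1`, ...) and the helper names are hypothetical and are not taken from the figure. Negative-sense templates are collapsed, so each link runs from a positive-sense strand to the positive-sense strand it descends from. The geometric example was chosen so that its mean matches the 2.33 quoted in the caption.

```python
from statistics import mean

def generation(parent_of, genome):
    """Number of positive-to-positive steps from `genome` back to the infecting genome."""
    g = 0
    while parent_of.get(genome) is not None:
        genome = parent_of[genome]
        g += 1
    return g

def mean_generations(parent_of, packaged):
    """Mean generation number over the packaged progeny."""
    return mean(generation(parent_of, v) for v in packaged)

# Stamping machine: every progeny strand descends directly from the infecting genome P0.
stamping = {"v1": "P0", "v2": "P0", "v3": "P0"}

# Geometric replication (hypothetical tree): progeny strands become templates themselves.
geometric = {"a": "P0", "v1": "a", "v2": "a", "b": "a", "v3": "b"}

progeny = ["v1", "v2", "v3"]
sm_mean = mean_generations(stamping, progeny)   # all progeny at generation 1
gr_mean = mean_generations(geometric, progeny)  # generations 2, 2 and 3
```

In this toy tree the fractional mean arises simply because progeny sit at different depths of the genealogy.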

Poliovirus' simple genomic architecture and medical importance have made it one of the most extensively studied viruses (Racaniello, 2006). However, despite decades of mechanistic studies and recent revelations of the importance of population structures, the replication mode and resulting mutation distribution have yet to be determined. PV therefore proves an excellent candidate for the rigorous construction of a computational model of virus replication to predict population structure and mutation distribution. A major feature of PV intracellular dynamics is that the genome participates in multiple reactions: translation, replication, and encapsidation. Its 7.5 kb genome contains a single open reading frame, which encodes 7 nonstructural proteins and 4 capsid proteins. Translation produces a single polyprotein, which is cleaved into individual functional viral proteins. Replication of the positive-sense genome by the virus-encoded RNA-dependent RNA polymerase produces a negative-sense strand, which is used as a template for further genome synthesis. Evidence suggests that the initial, infecting positive-sense genomes must be translated before they can replicate (Novak and Kirkegaard, 1994). The switch from translation to replication appears to be dependent on the concentration of a viral protein product, 3CD, which stimulates a transition from a linear, translating RNA to a noncovalently associated ‘circular’ RNA competent for replication (Gamarnik and Andino, 1998, 2000; Herold and Andino, 2001). Encapsidation is thought to result from protein–protein associations of capsid pentamers with the RNA replication machinery and protein–RNA association of capsid pentamers with viral RNA (Pfister et al., 1992; Nugent and Kirkegaard, 1995; Liu et al., 2010). Actively replicating genomes are preferentially encapsidated and packaging is biased to exclude negative-sense strands, although the mechanism of this is not understood (Nugent et al., 1999). 
Although multiple ribosomes can translate a genome at the same time and multiple viral polymerases can replicate a genome at the same time, the processes are mutually exclusive (Gamarnik and Andino, 1998). Similarly, neither translation nor replication can occur after a genome is packaged into a virion.

Several studies have demonstrated that PV genomes are often localized to the cytosolic surfaces of the endoplasmic reticulum, Golgi bodies, lysosomes, or vesicles derived from these (Schlegel et al., 1996; Bolten et al., 1998; Cui et al., 2005; Egger and Bienz, 2005; den Boon et al., 2010). Replication complexes are thought to form on these membranes in cis, resulting in a close association of translation products and positive-sense genomes (Novak and Kirkegaard, 1994; Egger et al., 2000). Compartmentalization of replication complexes likely accounts for the observation that many functions of nonstructural proteins cannot be complemented in trans (Novak and Kirkegaard, 1994; Ansardi et al., 1996). Only capsid proteins, 3CD, and 3D have been demonstrated to trans-complement (Novak and Kirkegaard, 1994; Nugent et al., 1999; Oh et al., 2009). Taken together, these studies suggest that the essential transitions—from translation to replication, and from replication to encapsidation—are largely localized and influenced by the dynamics of the molecules in each compartment.

In recent years, modeling approaches have begun to examine the trade-offs that come with having a genome that is a template for both replication and translation (Krakauer and Komarova, 2003; Regoes et al., 2005; Sardanyés et al., 2009; Thébaud et al., 2010; Martinez et al., 2011). These studies have raised mechanistic and evolutionary questions about the life cycle of single-stranded, positive-sense RNA viruses, but most have not produced models that can be directly compared to data. Several previous models are deterministic in nature (Krakauer and Komarova, 2003; Regoes et al., 2005; Martinez et al., 2011) and assume a well-mixed, spatially uniform cellular environment (Krakauer and Komarova, 2003; Regoes et al., 2005; Sardanyés et al., 2009; Thébaud et al., 2010; Martinez et al., 2011). Experimental evidence suggests that each of these assumptions is problematic and does not reflect the biological constraints and properties of viral replication. The small numbers of the critical molecules that initiate an infection suggest that a stochastic model would more accurately describe early reactions and could make distinct predictions from previous deterministic approaches (Srivastava et al., 2002). Often infections begin with relatively few virions that release their genomes into the cell and continue with the translation of these few initial genomes. Random variation in the switch from translation to replication is amplified by the subsequent exponential phase of the infection, and this amplification is likely to bias the mean dynamics of a set of infections. Indeed, recent single-cell studies demonstrated the significant impact of stochastic effects on poliovirus infections (Schulte and Andino, 2014).

Here, we have developed a stochastic simulation model in which we compartmentalize reactions in an effort to accurately describe intracellular dynamics in both space and time. Additionally, rather than fixing each parameter on an estimated value, an approach used by previous models, we use an Approximate Bayesian Computation approach to fit our parameters from temporal quantitative data. We find that by combining stochasticity and spatial structure, our model reflects and describes the population dynamics and structure of the viral population during an infection cycle more accurately than previous models.
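
To make the idea of a stochastic simulation concrete, the following is a minimal Gillespie-style sketch of a single infected cell. It is not the model described in the 'Materials and methods': it omits translation, compartments and all fitted parameters, and the function name `simulate_cell`, the three reactions and the rate constants `k_neg`, `k_pos` and `k_pack` are assumptions made purely for illustration. Each strand is stored as its generation number, so the generation distribution of packaged virions can be read off directly, and the `geometric` flag switches between the two classic replication modes.

```python
import random

def simulate_cell(geometric=True, k_neg=0.5, k_pos=2.0, k_pack=0.3,
                  t_end=6.0, max_events=20000, seed=0):
    """Toy Gillespie simulation of one infected cell.

    Each strand is stored as its generation number g (infecting genome: g = 0).
    All rate constants are illustrative placeholders, not fitted values.
    """
    rng = random.Random(seed)
    pos, neg, virions = [0], [], []      # generations of +strands, -strands, virions
    t = 0.0
    for _ in range(max_events):
        # Stamping machine: only the infecting genome (g == 0) is copied into
        # negative-sense templates. Geometric: any positive-sense strand can be.
        sources = pos if geometric else [g for g in pos if g == 0]
        rates = [k_neg * len(sources),   # +strand -> new -strand template (same g)
                 k_pos * len(neg),       # -strand -> new +strand (g + 1)
                 k_pack * len(pos)]      # +strand -> packaged virion
        total = sum(rates)
        if total == 0:
            break
        t += rng.expovariate(total)      # exponential waiting time to the next event
        if t > t_end:
            break
        r = rng.random() * total         # choose a reaction in proportion to its rate
        if r < rates[0]:
            neg.append(rng.choice(sources))
        elif r < rates[0] + rates[1]:
            pos.append(rng.choice(neg) + 1)
        else:
            virions.append(pos.pop(rng.randrange(len(pos))))
    return pos, neg, virions
```

Running `simulate_cell` over many seeds and averaging the generation numbers of the returned virions gives a toy analogue of the mean number of generations; with `geometric=False` no strand can exceed generation one.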

Fitting our model to RNA abundances over time, we find that poliovirus follows the geometric replication mode: multiple iterative generations of genomic replication produce progeny virus. Posterior parameter fits indicate that progeny of a single cellular infection are approximately five generations away from the initial, infecting genomes. This replication mode produces populations with expansive, branched genealogies, creating the dramatic potential for the exploration of sequence space, as well as creating the potential for intracellular selection among related mutant genomes.

Results

Inference of replication parameters

We used temporal, quantitative RT-PCR data of both positive-sense genomes and negative-sense strands to estimate the free parameters in our model. The role of each parameter in poliovirus replication and in the mathematics of our model is diagrammed in Figure 2 and described in detail in the ‘Materials and methods’. We chose to use measurements of positive- and negative-sense RNA at multiple time points for three multiplicities of infection (1, 10, and 100), as well as measurements of virion numbers at multiple times for MOI 10; this amounted to 27 measured means, with three data points for each mean. Strand-specific qRT-PCR was performed to quantify positive-sense and negative-sense poliovirus RNA against in vitro transcribed standard RNAs of each sense (Burrill et al., 2013). Along with cell counts, this allowed for temporal measurements of the average positive-sense and negative-sense poliovirus RNA copies per cell. Negative-sense RNA was not detectable until 2 hr post infection for MOIs 10 and 100 and 3 hr post infection for MOI 1. Positive-sense RNA was clearly quantifiable for all time points at MOIs 10 and 100 but did not rise above background levels until 3 hr post infection for MOI 1. Using a newly developed virion immunoprecipitation assay (Burrill et al., 2013), we observed de novo virion assembly between 2 hr and 3 hr post infection. Along with total positive-sense RNA measurements from this time course, we obtained a percentage of genomes encapsidated in quadruplicate at 3 hr, 4 hr, and 5 hr post infection. Figure 3 illustrates these data alongside projections from inferred parameters from the second round of SMC (see Figure 3—source data 1).

Figure 2

The replication cycle of poliovirus as represented in our model.

Numbered steps correspond to sections and equations in the ‘Materials and methods’.

https://doi.org/10.7554/eLife.03753.004

Figure 3

Projected mean abundances of positive-sense RNA (solid line simulations vs filled circle experimental measurements), negative-sense RNA (dashed line simulations vs hollow circle experimental measurements) and virions (orange dotted line simulations vs star experimental measurements; measured only for MOI = 10).

Each row represents a different example parameter set (see ‘Results’); each line is the mean of 20 individual cell simulations, and the means of five sets of 20 replicate simulations are plotted in each panel. Parameter values are given in Figure 3—source data 1.

https://doi.org/10.7554/eLife.03753.005

Figure 3—source data 1. ‘Best’ parameter set used in Figure 3—figure supplement 1 and Figure 4—figure supplement 4; representative parameter sets (sets 1–5) used in Figure 3. Note that natural log values are provided for all parameters except c_stay. https://doi.org/10.7554/eLife.03753.006

The relatively high number of data dimensions, combined with the computationally intensive and highly stochastic nature of our simulations, made a traditional maximum likelihood approach impractical. Instead, we turned to Approximate Bayesian Computation, using as our summary statistic the sum of the squared deviations of the average simulated RNA concentrations (and average fraction of virions for MOI 10) from their corresponding empirical means. This algorithm produces progressively more accurate estimates of each parameter over several rounds; Figure 4—figure supplement 1 illustrates that, for most parameters, round one restricts the credible range of each parameter in comparison to the flat prior and round two leads to further focusing. The data appear to be uninformative for at least one parameter, c_pack; a second parameter, com_max, appears to be poorly constrained by the comparison to MOI = 10 in round one, but somewhat constrained by the broader measurement against all three MOIs in round two. Round two also appears to significantly move the mode of two other parameters, c_com and c_3A.
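
The logic of Approximate Bayesian Computation with a summed-squared-deviation statistic can be sketched in a few lines. The example below is a deliberately simplified stand-in and should be read as such: it uses plain rejection sampling rather than the sequential scheme described here, a one-parameter toy growth model instead of the replication model, and made-up "observed" means. The names `toy_model`, `distance` and `abc_rejection`, the noise level and the flat prior bounds are all assumptions.

```python
import random

def toy_model(rate, times, rng, n_cells=20):
    """Mean log-abundance over n_cells noisy single-cell runs (toy stand-in model)."""
    return [sum(rate * t + rng.gauss(0, 0.3) for _ in range(n_cells)) / n_cells
            for t in times]

def distance(simulated, observed):
    """Summary statistic: sum of squared deviations of simulated from observed means."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))

def abc_rejection(observed, times, prior, n_draws, keep_frac, rng):
    """Draw rates from a flat prior and keep the fraction with the smallest distance."""
    scored = []
    for _ in range(n_draws):
        rate = rng.uniform(*prior)
        scored.append((distance(toy_model(rate, times, rng), observed), rate))
    scored.sort()
    n_keep = max(1, int(keep_frac * n_draws))
    return [rate for _, rate in scored[:n_keep]]

rng = random.Random(42)
times = [1, 2, 3, 4, 5]
observed = [1.2 * t for t in times]           # hypothetical empirical means
round_one = abc_rejection(observed, times, (0.0, 3.0), 500, 0.1, rng)
# A second round samples only from the range that survived round one.
round_two = abc_rejection(observed, times, (min(round_one), max(round_one)), 500, 0.1, rng)
```

The accepted values in `round_one` approximate a posterior that is narrower than the flat prior, and `round_two` focuses it further, which mirrors the qualitative behaviour described for the marginal distributions.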

Figure 4—figure supplement 1 indicates that ABC inference informed the values of nine of our ten parameters, but these marginal parameter distributions alone do not capture correlations between parameter values. Figure 4—figure supplement 2 shows evidence of significant correlations, and Figure 4—figure supplement 3 shows that parameter sets drawn from the marginal distributions in Figure 4—figure supplement 1 (i.e., uncorrelated parameter values) do a poor job of matching the data. While not unexpected, these significant correlations require that we work directly with the sampled parameter sets arising from our inference process, which is the approach we take below.

Each parameter in the posterior is supported over a significant range of possibilities. This remaining uncertainty reflects two factors: the data may be insufficient to determine each parameter, and the inference process may not have fully exploited the inferential power of the data. We took several approaches to quantify the sufficiency of the data and the effectiveness of the inference process. First, we measured the mean error of parameter sets when compared to the data for each multiplicity of infection independently; we asked if performance at one MOI predicted performance at the other two. If so, the dimensionality of our data would be effectively lower than we had initially assumed. Surprisingly, pairwise correlations between mean error at one MOI and another were very weak: Spearman's rho is 0.031 for MOIs 1 and 10, −0.092 for MOIs 1 and 100, and 0.129 for MOIs 10 and 100. This suggests that measurements at each MOI are contributing distinct information to our inference process.
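
The rank correlations used in this check are straightforward to compute. The sketch below implements Spearman's rho as the Pearson correlation of average ranks; the function names and the example error values are hypothetical and are not the errors measured for any parameter set.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical mean errors of five parameter sets at two MOIs.
error_moi_a = [0.10, 0.20, 0.30, 0.40, 0.50]
error_moi_b = [0.25, 0.15, 0.45, 0.35, 0.55]
rho = spearman_rho(error_moi_a, error_moi_b)
```

A value of rho near zero, as reported for the three pairs of MOIs, means that a parameter set's rank at one MOI says little about its rank at another.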

Second, we determined the sensitivity of our measure of model ‘fit’ to variation in each of the parameters. This analysis, described in detail in the ‘Materials and methods’, showed that the data significantly informed the values of eight of ten of the parameters (Figure 4—figure supplement 4). We also performed a separate validation analysis which attempted to infer the replication phenotype, $\bar{g}$ , from mock data simulated from parameter sets drawn from our prior distribution. As described more fully in the ‘Materials and methods’, this exercise confirms that the data and method are adequate to infer the trait of interest, albeit with some degree of inaccuracy (Figure 4—figure supplement 5).

Finally, we examine the fit between the data and the mean dynamics of inferred parameter sets. Figure 3 shows that the inferred parameter sets generally capture the information in the RNA and virion data, although some parameter sets deviate consistently from the data for some values. Variability among replicate sets of twenty single-cell simulations is substantial, correlated across a time series, and greatest for the smallest MOI. Further inferential effort could improve either the accuracy of the mean predicted dynamics or the precision of replicate simulation dynamics, though Figure 3 suggests that such improvements could only be modest. This variability is expected due to the stochastic nature of the simulations, and it may better reflect the biological noise of the infection (Schulte and Andino, 2014).

Predicted replication dynamics

Figure 4 shows the inferred posterior distribution of $\bar{g}$, the mean number of generations for a packaged virion based on two rounds of inference with measured RNA and virion abundances. This distribution is plotted for MOI = 10; the predicted values at MOI = 1 and MOI = 100 are very similar and highly correlated (weighted means: MOI = 1, 4.96; MOI = 10, 5.06; MOI = 100, 4.85; Spearman's rho (unweighted): MOI 1 and 10, 0.92; MOI 1 and 100, 0.85; MOI 10 and 100, 0.96). While this distribution does show substantial variance, it is strongly inconsistent with a ‘stamping machine’ mode of replication, which would have a $\bar{g}$ near one.

Figure 4

Left: posterior distribution of the mean number of generations of replication ($\bar{g}$). Right: distribution reweighted by the fit of predicted fractions of translating positive-sense RNA to empirical measurements.

https://doi.org/10.7554/eLife.03753.008

To explore the robustness of this inference, we compared the predicted dynamics of the model to an additional type of data: the fraction of positive-sense RNA molecules translating at each time point. We fractionated infected cell lysates and quantified positive-sense RNAs in monosome and polysome fractions relative to total positive-sense RNA copies. These data render a percentage of genomes associated with translation machinery and provide an additional set of data to evaluate the parameter sets produced by SMC. When measured at an MOI of 10 at 1, 2, 3, 4, and 5 hr post infection, the majority of positive-sense RNAs appeared to be associating with translation machinery, consistently averaging near 85%. Many of the inferred parameter sets are consistent with the measured values but a substantial fraction is clearly inconsistent (Figure 4—figure supplement 6). The summed squared error of the translating fractions is also correlated with $\bar{g}$ (Figure 4—figure supplement 7). To estimate how these new data inform our prediction of $\bar{g}$ , we calculated a weighting factor based on the relative rank of the summed squared error of translating fractions, such that the parameter set with the best fit was assigned a weight of 1, the next a weight of 1134/1135, etc. Reweighting the distribution of $\bar{g}$ by this additional factor produced the distribution shown in Figure 4; the mean $\bar{g}$ shifts from 5.06 to 4.78.
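
The rank-based weighting described above (best fit 1, next 1134/1135, and so on) corresponds to a weight of (N − rank)/N for N parameter sets, with rank counted from zero. A minimal sketch follows, using four hypothetical parameter sets; the error and $\bar{g}$ values are invented for illustration, and for simplicity the existing posterior weights of the parameter sets are assumed to be equal.

```python
def rank_weights(errors):
    """Weight (n - rank) / n for each entry, where rank 0 is the smallest error."""
    n = len(errors)
    order = sorted(range(n), key=errors.__getitem__)
    weights = [0.0] * n
    for rank, index in enumerate(order):
        weights[index] = (n - rank) / n
    return weights

def weighted_mean(values, weights):
    """Mean of values under the given (unnormalised) weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical summed squared errors of translating fractions and matching g-bar values.
translation_error = [0.4, 0.1, 0.9, 0.2]
g_bar = [5.5, 4.5, 6.0, 4.8]

weights = rank_weights(translation_error)          # [0.5, 1.0, 0.25, 0.75]
unweighted = sum(g_bar) / len(g_bar)
reweighted = weighted_mean(g_bar, weights)
```

Because poorly fitting sets are down-weighted rather than discarded, the reweighted mean moves toward the values favoured by the translation data without changing the support of the distribution.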

Predicting the distribution of mutations

We simulated mutation and selection during infections to understand how replication dynamics shape the distribution of mutation frequencies among virions. To illustrate how mutant frequencies depended on $\bar{g}$ , we chose two parameter sets with values of $\bar{g}$ at the low and high end of the range supported by the posteriors in Figure 4 and included the ‘best’ parameter set as a representative of the more common values of $\bar{g}$ . Mutation frequencies for these parameter sets (‘best’, ‘low’, and ‘high’—see Figure 3—source data 1) are plotted in Figure 5A for a range of mutants that have a diminished rate of replication relative to the wild type. We chose to model this particular type of deficiency because we expected that replication deficits would directly affect the growth and packaging of the mutants. We observed that deficits in a different trait, the rate of complex formation, were effectively invisible to intracellular selection (the frequency of a mutation with an 80% reduction in complex formation was estimated to be reduced by 0.6–4.6% compared to a neutral mutation in the ‘best’ parameter set); we expect that mutations in traits like the rate of translation would also be complemented by the wild-type phenotype and so would not experience significant selection during the infection in which they arose.

Figure 5

Left: mean mutation frequencies for three parameter sets (‘low’, $\bar{g}$ = 3.94; ‘best’, $\bar{g}$ = 4.65; ‘high’, $\bar{g}$ = 5.76). Mutation rate is 2 × 10⁻⁵ per replication event; ‘relative replication rate’ reflects the reduced probability of a mutant template to replicate, relative to an unmutated strand. Grey lines indicate the expected mean for each parameter set with no selection (deficit of zero); the black line shows the mutation rate in one replication step, and therefore the expected frequency when mutants cannot replicate. Bars indicate 95% confidence intervals. Right: distributions of g of progeny from single cell infections for three parameter sets (‘low’, $\bar{g}$ = 3.94; ‘best’, $\bar{g}$ = 4.65; ‘high’, $\bar{g}$ = 5.76).

https://doi.org/10.7554/eLife.03753.016

Three distinct features of mutation in this model are evident from Figure 5A. First, mutation frequency does not decrease linearly as intracellular selection approaches its maximal value; the curve results from the fact that mutant genomes that are immediately packaged are not subject to selection, while the contribution of rare, early mutants to average mutation frequency may be reduced by multiple rounds of intracellular selection. Second, knowing $\bar{g}$ and the mutation rate allows us to directly calculate the fate of neutral or very unfit mutations, but estimating the frequency of mutations of intermediate fitness requires additional simulations using our model. Third, the confidence intervals are sizable relative to the number of infections sampled (10 million for each point). This high variability reflects the large contribution of very rare mutations that arise early in an infection and can contribute thousands of mutant virions, especially when selection is weak.

The effect of these rare, early mutations in the overall mutant distribution can be seen as a departure from a Poisson process. To remove a potential confounding variation in burst size, we compare the distribution of mutations from infections within 10% of the median burst size and calculate a Poisson expectation for a median-sized burst with the same expected frequency. For the ‘best’ parameter set, median infections produced many more bursts with no copies of a given mutation (79.4% vs 22.5% for the Poisson), but also many more bursts with five or more copies of the mutant (8% vs 1.82% for the Poisson; n = 51,365).
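
The Poisson side of this comparison can be checked with a few lines of arithmetic. The sketch below does not repeat the calculation from burst sizes; instead it assumes that the Poisson mean can be backed out from the quoted zero class (22.5% of bursts with no mutant copies) and then computes the implied probability of five or more copies. The variable names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Assumption: recover the Poisson mean from the reported zero class, P(0) = exp(-lam).
p_zero = 0.225
lam = -math.log(p_zero)

# Probability of five or more mutant copies in a burst under the Poisson expectation.
p_five_or_more = 1.0 - sum(poisson_pmf(k, lam) for k in range(5))
```

Under this assumption the mean is about 1.5 mutant copies per burst and `p_five_or_more` comes out close to the 1.82% quoted for the Poisson expectation. The simulated infections have both a larger zero class and a heavier tail, which is the signature of rare early mutations producing large clones.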

The distribution of the number of generations between progeny virions and initial infecting genomes is displayed for three parameter sets (‘best’, ‘low’, and ‘high’—see Figure 3—source data 1) in Figure 5B. Only a very small percentage of progeny are produced via a single genomic replication cycle. Although all three parameter sets have means close to five generations, the distributions show a portion of the progeny virions representing up to 10 generations between the infecting genotypes and packaged virions within a single cellular infection.

Discussion

The intracellular replication mode of a virus strongly influences the frequency and distribution of mutations among progeny, which shape the long-term behavior of an infecting population (Vignuzzi et al., 2006; Lauring et al., 2012). Due to the complex nature of intracellular dynamics, assessing the mode of replication of viruses is a difficult task (but see Chao et al., 2002). Here, we built on decades of mechanistic studies and recent modeling efforts to construct a stochastic computational model coupled with new Bayesian inference methods. We combined these mathematical and computational techniques with accurate temporal data to produce a detailed picture of viral infection. We found that positive- and negative-sense RNA measurements made across multiple MOIs, along with quantitative data on virion packaging, are sufficient to infer that poliovirus replication occurs in several layers of intermediate replication, in contrast to the oft-assumed ‘stamping machine’ model. The implications of the inferred geometric replication mode are as follows: (1) error rates per-replication are considerably lower than measured rates from full-replication-cycle in vivo studies, (2) for a given viral polymerase error rate, mutation will progressively accumulate in both genome and anti-genome RNAs, which should result in a more accentuated departure from the master sequence, allowing a better exploration of the available sequence space during a single infection cycle, and (3) there exists a significant potential for intracellular selection and competition among related genomes, even in infections initiated by only a single genome.

Accurate estimates of viral mutation rates are essential for studying viral evolution and have crucial practical applications in drug and vaccine design. While estimates of mutation rates exist for nearly two dozen viruses, estimates of replication modes exist for only a few (Sanjuan et al., 2010). Calculating per-replication event mutation rates from observed mutant frequencies is not possible, or even meaningful, without knowledge of the replication mode. Thus, estimates of poliovirus per-replication event mutation rates can vary over 10-fold depending on the assumed replication mode (Drake, 1993; Sanjuan et al., 2010). By inferring the mode of replication, we have been able to link estimates of per-replication event mutation rates to published mutant frequencies. The most extensive poliovirus mutant frequency data set estimated an average mutant frequency of 2 × 10⁻⁴ (Acevedo et al., 2014). Using our inferred value of approximately five intracellular generations, we calculate a per-replication event mutation rate of 2 × 10⁻⁴/(5 × 2) = 2 × 10⁻⁵, which is in agreement with the average estimates of poliovirus mutation rates calculated in vivo from lethal mutation frequencies (Acevedo et al., 2014). Rates of specific types of mutations, such as transversions and transitions, could each be inferred from their mutation frequencies by the same approach. Our inference of five intracellular generations is also in line with previous inferences of replication mode using the Luria–Delbrück fluctuation test null-class method (Sanjuan et al., 2010). However, our results highlight some limitations for inferring mutation rates from frequencies: intracellular selection may strongly affect mutation frequencies, and the strongly stochastic nature of virus replication appears to shape the distribution of minor alleles, which in turn will result in imprecise estimates of the expected frequency. In particular, assuming that mutation frequency can be modeled as a Poisson process will lead to inappropriate confidence in measured frequencies. As a consequence, multiple empirical measurements of mutation frequency will be required to obtain a more precise determination of true mutation frequencies.
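
The conversion from mutant frequency to a per-replication-event rate is a single division, sketched below. The function name and the `copies_per_generation` parameter are hypothetical; the factor of two is interpreted here as two strand-copying events per generation (positive to negative, then negative to positive), which is an assumption about how the figure quoted above was obtained.

```python
def per_replication_rate(mutant_frequency, generations, copies_per_generation=2):
    """Per-replication-event mutation rate implied by an observed mutant frequency.

    Assumes neutral mutations that accumulate evenly over every copying event.
    """
    return mutant_frequency / (generations * copies_per_generation)

observed_frequency = 2e-4                                           # value quoted in the text
rate_geometric = per_replication_rate(observed_frequency, 5)       # five generations
rate_stamping = per_replication_rate(observed_frequency, 1)        # single generation
```

With five generations the rate is 2 × 10⁻⁵, and with a single generation it would be 1 × 10⁻⁴, so under these assumptions the same observed frequency implies a five-fold different polymerase error rate depending on the replication mode.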

The branched genealogy inferred in our study implies the potential for significant amounts of intracellular complementation, selection, and competition between mutant genomes, even in infections initiated by a single genome (Novak and Kirkegaard, 1994; Turner and Chao, 1999; Vignuzzi et al., 2006). Figure 5A demonstrates the extent to which the frequency of a mutation can be skewed by negative selection during the course of an infection. On the other hand, a mutational event that occurred early in replication and conveyed an intracellular replication advantage could potentially give rise to hundreds or thousands of descendant virions in a single generation. If the mutation distribution data in Figure 5B were displayed as a tree (as in Figure 1), it would contain over 7000 terminal nodes, too many to resolve in a figure. Hence, the apparent potential for mutant interactions is vast. These results suggest that the evolutionary fate of mutations may depend strongly on their intracellular competitive ability, even when multiplicities of infection are low. Additionally, studies that rely on bottlenecks to reduce selection in viral mutation studies (e.g., de la Peña et al., 2000) may be allowing more selection than anticipated. Future population dynamics studies should consider the implications of the intracellular expansion of mutant phenotypes.

Virus infections are normally depicted as deterministic processes that follow a stereotypical path from infection to progeny production and death of the infected cell. However, experimental data show that some infected cells produce few progeny while others produce large populations of progeny (Schulte and Andino, 2014). These observations support the notion that stochasticity is an important factor shaping the outcome of infection. By combining accurate experimental measurements with a stochastic model of viral replication, we have obtained a realistic description of how the molecular events driving the life cycle of the virus govern the outcome of infection in each cell.

A significant benefit of computational modeling is that the information learned in the empirical process of developing a model can yield important insights into the biological processes under study. For example, our initial attempts to fit temporal strand measurement data were unable to match the sharp transition to exponential growth seen in the data. Only after removing the requirement for positive-sense genomes to be translated before becoming replication-competent was our model flexible enough to rapidly create templates for exponential replication. While Novak and Kirkegaard (1994) demonstrated a requirement for the initial, infecting genomes to be translated before replication can occur, their data did not imply that all genomes produced at any time during infection must be translated before replicating. Our study suggests that newly synthesized positive-sense genomes may or may not disperse to nucleate new replication complexes within a single cellular infection, allowing us to model intracellular dynamics in a novel way by permitting a portion of newly made positive-sense strands to immediately act as templates for replication without the requirement of translation.

Our model succeeds in describing many experimentally observed features of viral replication and is an excellent staging point for future and more accurate models of viral replication and evolution. With the realistic benefits of stochasticity, compartmentalized reactions, and parameters inferred from quantitative, temporal data, it acts as a baseline intracellular viral replication algorithm. More quantitative data, including data on the formation and number of replication compartments, would further inform the model. Potential additions of intracellular selection, complementation, and recombination parameters would allow population evolution studies to explore intracellular dynamics with more precision than previous approaches. The ultimate goal is to generate a comprehensive model incorporating mechanistic replication dynamics learned from virology with selection and complementation dynamics learned from population genetics. This tool could be very powerful for informing future therapeutic and preventative strategies.

Author details

Michael B Schulte, Jeremy A Draghi, Joshua B Plotkin, Raul Andino (for correspondence)