Viral load and contact heterogeneity predict SARSCoV2 transmission and superspreading events
Abstract
SARSCoV2 is difficult to contain because many transmissions occur during presymptomatic infection. Unlike influenza, most SARSCoV2infected people do not transmit while a small percentage infect large numbers of people. We designed mathematical models which link observed viral loads with epidemiologic features of each virus, including distribution of transmissions attributed to each infected person and duration between symptom onset in the transmitter and secondarily infected person. We identify that people infected with SARSCoV2 or influenza can be highly contagious for less than 1 day, congruent with peak viral load. SARSCoV2 superspreader events occur when an infected person is shedding at a very high viral load and has a high number of exposed contacts. The higher predisposition of SARSCoV2 toward superspreading events cannot be attributed to additional weeks of shedding relative to influenza. Rather, a person infected with SARSCoV2 exposes more people within equivalent physical contact networks, likely due to aerosolization.
Introduction
The SARSCoV2 pandemic is an ongoing tragedy that has caused nearly 2 million deaths and massively disrupted the global economy. The pandemic is rapidly expanding in the United States and is reemerging focally in many countries that had previous success in limiting its spread (https://coronavirus.jhu.edu/map.html).
Two features have proven challenging in containing outbreaks. First, most transmissions occur during the presymptomatic phase of infection (He et al., 2020; Moghadas et al., 2020; Tindale et al., 2020). Underlying this observation is a highly variable incubation period, defined as time between infection and symptom onset, which often extends beyond an infected person’s peak viral shedding (Ganyani et al., 2020).
Second, there is substantial overdispersion of the secondary infection distribution (individual R_{0}) for an individual infected with SARSCoV2 (Endo et al., 2020). An overdispersed R_{0} means that most infected people do not transmit at all (individual R_{0} = 0) while a minority of infected people are superspreaders (individual R_{0} >5). If the average population R_{0} is greater than 1, then exponential growth of cases occurs in the absence of effective interventions (LloydSmith et al., 2005). Overdispersion has been quantified: approximately 10–20% of infected people account for 80% of SARSCoV2 transmissions (Endo et al., 2020; Bi et al., 2020). SARSCoV2 superspreader events, in which the duration of contact between a single transmitter and large number of secondarily infected people is often limited to hours, are well documented (Hamner et al., 2020; Park et al., 2020).
This pattern is not evident for influenza which has more homogeneous individual transmission (Cowling et al., 2009; Brugger and Althaus, 2020). Differing viral load kinetics between the two viruses might explain this distinction; SARSCoV2 is often present intermittently in the upper airways for many weeks (Qi et al., 2020; Cao et al., 2020), while influenza is rarely shed for more than a week (Pawelek et al., 2012). Alternatively, SARSCoV2 aerosolization may effectively increase the number of people with true viral exposures given the same contact network. This means that a SARSCoV2infected person in a crowded indoor space could lead to more transmissions relative to an influenzainfected person.
Viral load is recognized as a strong determinant of transmission risk (Watanabe et al., 2010). For influenza, the dose of viral exposure is related to the probability of infection in human challenge studies (Memoli et al., 2015) and early antiviral treatment reduces household transmission (Pebody et al., 2011; Goldstein et al., 2010). Household shedding of human herpesvirus6 is closely linked to subsequent infection in newborns (Mayer et al., 2020) and infants shedding high levels of cytomegalovirus in the oropharynx predictably transmit the virus back to their mothers (Boucoiran et al., 2018). Studies in mice definitively demonstrated that viral exposure dose determines likelihood of SARSCoV1 infection, (Watanabe et al., 2010) and SARSCoV2 experiments in golden hamsters are also highly suggestive of dosedependent infection (Sia et al., 2020).
The epidemiology of viral infections can also be perturbed by biomedical interventions that lower viral load at mucosal transmission surfaces. Reduction of genital herpes simplex virus2 shedding with antiviral treatments decreases probability of transmission (Corey et al., 2004). Suppressive antiretroviral therapy (ART) for HIV virtually eliminates the possibility of partnertopartner sexual transmission and has limited community transmission dramatically (Rodger et al., 2019; Cohen et al., 2016).
SARSCoV2 overdispersion, aerosolization, and dosedependent infection all require urgent attention as possible avenues to understand and mitigate the pandemic as it continues to wreak havoc. Early therapies that lower peak viral load may reduce the severity of COVID19 but may also decrease the probability of transmission and superspreader events (Schiffer et al., 2020). Similarly, the effectiveness of policies such as limiting mass gatherings, and enforcing mask use can be directly evaluated by their ability to reduce exposure viral load and transmission risk (Leung et al., 2020). Here, we developed a transmission simulation framework to capture the contribution of viral load to observed epidemiologic transmission metrics for influenza and SARSCoV2 and used this approach to explain why SARSCoV2 is predisposed to superspreading events.
Results
Overall approach
We designed a series of steps to estimate the viral load required for SARSCoV2 and influenza transmission, as well as conditions required to explain the observed over dispersion of secondary infections (individual R_{0}) and frequent superspreader events associated with SARSCoV2 but not influenza. This process included withinhost modeling of viral loads, simulations of exposures and possible transmissions based on various transmission dose response curves, testing of various parameter sets against epidemiologic data and exploratory analyses with the best fitting model (Figure 1, Figure 1—figure supplement 1).
Withinhost mathematical model of SARS CoV2 shedding
First, we used our previously developed withinhost mathematical model (equations in the Materials and methods), (Goyal et al., 2020) to generate plausible viral load patterns in the upper airway of an infected person or transmitter who could potentially transmit the virus to others (Figure 1, Figure 1—figure supplement 2a). Briefly, the model captures observed upper airway viral kinetics from 25 people from four different countries (Wölfel et al., 2020; Lescure et al., 2020; Young et al., 2020; Kim et al., 2020). Key observed features include an early viral peak followed by a decelerating viral clearance phase, which in turn leads to a temporary plateau at a lower viral load, ultimately followed by rapid viral elimination. Our model captures these patterns by including a densitydependent term for early infected cell elimination and a nonspecific acquired immune term for late infected cell elimination.
One limitation of our model is that only half of study participants provided longitudinal viral load data from the very early days of infection when COVID19 is often presymptomatic. Therefore, the model's output is most reliable for later time points. In particular, we have somewhat limited information on viral expansion rate and duration of peak shedding. To impute possible variability, we generated a set of heterogeneous shedding curves in which the viral upslope, the downslope of viral load after peak and the viral load during plateau phase were varied (Figure 1—figure supplement 2b). Overall, the model generated several distinct patterns of infection: rapid elimination after the initial peak, a prolonged plateau phase with a low viral load, and a prolonged plateau phase with higher viral load. We simulated the transmission model with and without imputed heterogeneity.
Transmission dose response curves
We defined an exposure event in very specific biologic terms as a discrete event consisting of sufficient contact in time and space between a transmitter and one or more uninfected persons (exposure contacts) to allow for the possibility of a successful transmission. An exposure contact is defined as a susceptible person who is exposed to a SARSCoV2infected person for a sufficient period of time and at a close enough distance to allow for the possibility of a successful transmission, which is then determined by the viral load of the infected person. The number of exposure contacts can in theory be reduced by social distancing measures, quarantine, or masking.
We next designed hundreds of dose response curves which separately predict contagiousness (CD curves) and infectiousness (ID curves) at a certain viral dose given an exposure contact. Contagiousness is defined as the viralloaddependent probability of passage of virusladen droplets or airborne particles from the airways of a potential transmitter to the airway of an exposure contact. Infectiousness is defined as the viralloaddependent probability of transmission given direct airway exposure to virus in an exposure contact. Transmission risk is the product of these two mechanistic probabilities derived from the ID and CD curves and results is a transmission dose (TD) response curve. Each CD or ID curve is defined by its ID50 (λ) or viral load at which contagion or infection probability is 50% (Figure 1—figure supplement 2c), as well as its slope (α) (Figure 1—figure supplement 2d; Brouwer et al., 2017). The TD50 is defined as viral load at which there is 50% transmission probability. We assumed equivalent curves for contagiousness and infectiousness for model fitting purposes. We also considered a simpler model with only a single TD curve (for infectiousness) and obtained qualitatively similar results (Materials and methods). Of note, a null model in which there is an assumed fixed probability of infection at all timepoints during infection poorly fit the observed data.
Exposure contact rate simulations
We introduced heterogeneity of exposure contact rates among possible transmitters by randomly selecting from a gamma distribution defined by mean number of exposure contacts per day (θ) and a scaling factor ($\rho $) that controls daily variability (Figure 1—figure supplement 3).
Transmission simulations
For each defined exposure contact, viral load in the transmitter was sampled and transmission risk was then identified based on the product of the CD and ID curves, or the TD curve (Figure 1—figure supplement 2e,f; Figure 1). Based on these probabilities, we stochastically modeled whether a transmission occurred for each exposure contact. This process was repeated when there were multiple possible exposure events within a given discretized time interval and the total number of exposures and transmissions within that interval was calculated.
For each successful transmission, we assumed that it takes $\tau $ days for the first infected cell to produce virus. To inform simulated values of serial interval (SI or time between symptom onset in the secondarily infected and transmitter), we randomly selected the incubation period (IP), for both the transmitter and the newly infected person, from a gamma distribution based on existing data (Figure 1—figure supplement 4a). (Ganyani et al., 2020; Lauer et al., 2020) Incubation period was defined as time from infection to the time of the onset of symptoms, where the mean incubation for SARSCoV2 is 5.2 days compared to 2 days for influenza (Ganyani et al., 2020; Cowling et al., 2009; Lauer et al., 2020).
Model fitting
In order to identify the parameter set that best recapitulated the observed data, we first performed a grid search simulating 41,7792 parameter sets with 256 possible TD curves defined by ID50 and CD50 (λ) and slope (α), along with 408 combinations of the mean exposed contact rate per day ($\theta $) and associated variance parameter ($\rho $), and values of $\tau \in [0.5,1,2,3]$ days. We aimed to identify the parameter set that best recapitulated the following features of the observed epidemiologic and individuallevel data for SARSCoV2: mean R_{0} across individuals (R_{0}∈[1.4, 2.5]), (Ganyani et al., 2020; Endo et al., 2020; Bi et al., 2020; Du et al., 2020; World Health Organization, 2020) mean serial interval across individuals (SI ∈[4.0, 4.5]), (Ganyani et al., 2020; Du et al., 2020; Nishiura et al., 2020) cumulative distribution functions of individual R_{0}, (Endo et al., 2020; Bi et al., 2020; Zhang et al., 2020; Dillon, 2020; Miller, 2020) and cumulative distribution functions of serial intervals derived from SARSCoV2 transmission pair studies that were conducted early during the pandemic (Du et al., 2020), prior to any confounding influence of social distancing measures with the exception of likely postsymptomatic selfisolation behavior. Here, we define individual R_{0} as the total number of secondary transmissions from the transmitter in a fully susceptible population (Materials and methods). Given that viral RNA is composed mostly of noninfectious material, we further checked the closeness of the solved ID curve with the observed relationship between viral RNA and probability of positive viral culture from a longitudinal cohort of infected people (van Kampen, 2020).
Influenza modeling
Next, we performed equivalent analyses for influenza (results detailed below and in Figures 6–8) to explain the lower frequency of observed superspreader events with this infection. Influenza viral kinetics were modeled using a previously datavalidated model (Baccam et al., 2006). Incubation periods for influenza are lower and less variable than for SARSCoV2 and were randomly selected for each simulation of the model using a gamma distribution (Figure 1—figure supplement 4b; Lessler et al., 2009). We again fit the model to: mean R_{0} across individuals (R_{0}∈[1.1, 1.5]), (Opatowski et al., 2011; Cowling et al., 2010; Roberts and Nishiura, 2011) mean serial interval (SI ∈[2.9, 4.3]), (Cowling et al., 2009) cumulative distribution functions of individual R_{0} corresponding to the 2008–2009 influenza A H1N1 epidemic season with mean R_{0} = 1.26 and dispersion parameter = 2.36 in the negative binomial distribution, and cumulative distribution functions of serial intervals (Cowling et al., 2009; Brugger and Althaus, 2020; Opatowski et al., 2011).
Modelpredicted individual R_{0} and serial intervals for SARSCoV2 infection
A single model parameter set ([$\alpha $, $\lambda $, $\tau $, $\theta $, $\rho $] = [0.8, 10^{7}, 0.5, 4, 40]) most closely reproduced empirically observed individual R_{0} and serial interval histograms (Figure 2a,c) and cumulative distribution functions (Figure 2b,d).
The preciseness of the estimated parameter set was also independently confirmed with the use of Approximate Bayesian Computation (ABC) rejection sampling method, (Liepe et al., 2014) and a finer grid search in proximity to the aforementioned optimal solution (shown in Materials and methods). Despite assuming that each infected person sheds at a high viral load for a period of time (Figure 1, Figure 1—figure supplement 2b), the model captured the fact that ~75% of 10,000 simulated transmitters do not infect any other people and that each increase in the number of individual transmissions is associated with a lower probability (Figure 2a). Furthermore, we also fit the model to a higher population R_{0} of 2.82.9 and arrived at a similar set of parameter values but with a higher daily rate of exposure contacts ([$\alpha $, $\lambda $, $\tau $, $\theta $, $\rho $] = [0.8, 10^{7.5}, 0.5, 20, 30]) confirming the robustness of the qualitative results of the model.
SARSCoV2 viral load was recently measured with viral RNA levels and mapped to concurrent probability of positive viral culture in a Dutch cohort (van Kampen, 2020). Our model output demonstrated a nearly equivalent infectious dose response curve if we multiplied modeled viral RNA levels by 25 (Figure 2—figure supplement 1): this adjustment was likely necessary because viral loads in the Dutch study participants were notably higher than those in German, Singaporean, Korean, and French participants used in our intrahost model fitting, perhaps due to different sampling technique or primers used in PCR assays (Wölfel et al., 2020; Lescure et al., 2020; Young et al., 2020; Kim et al., 2020; van Kampen, 2020).
The model also generated superspreader events with 10,000 simulated transmissions (Figure 2b). If superspreaders are defined as those who produce at least five secondary infections, we estimate that ~10% of all infected people and ~35% of all transmitters are superspreaders. If superspreaders are defined as those who produce at least 10 secondary infections, we estimate that ~6% of all infected people and ~25% of all transmitters are superspreaders. If superspreaders are defined as those who produce at least 20 secondary infections, we estimate that ~2.5% of all infected people and ~10% of all transmitters are superspreaders. If superspreaders are defined as those producing ≥5, ≥10, or ≥20 secondary infections, the contribution to all secondary infections is estimated at ~85%, ~70%, or ~44%, respectively (Table 1).
The model also recapitulated the high variance of the serial interval observed within SARSCoV2 transmission pairs, including negative values observed in the data (Figure 2c,d). We next projected generation time, defined as the period between when an individual becomes infected and when they transmit the virus, for all transmission pairs and identified that the mean serial interval (4.4 days) provides an accurate approximation of mean generation time. However, the variance of generation time was considerably lower and by definition does not include negative values. A majority of generation times fell between 4 and 7 days, compared to −5 to 12 days for the serial interval (Figure 2e).
Viral load thresholds for SARSCoV2 transmission
The optimized ID curve has an ID50 of 10^{7} viral RNA copies and a moderately steep slope (Figure 3a). The TD50 for SARSCoV2 was slightly higher at 10^{7.5} viral RNA copies (Figure 3a). To assess the impact of these parameters on transmission, we performed simulations with 10,000 transmitters and concluded that transmission is very unlikely (~0.00005%) given an exposure to an infected person with an upper airway viral load of <10^{4} SARSCoV2 RNA copies, and unlikely (~0.002%) given an exposure to an infected person with a viral load of <10^{5} SARSCoV2 RNA copies. On the other hand, transmission is much more likely (39%) given an exposure to an infected person who is shedding >10^{7} SARSCoV2 RNA copies, and 75% given an exposure to an infected person with a viral load of >10^{8} SARSCoV2 RNA copies. We obtain similar results (not shown) when we solve our model using the assumption of homogeneous viral load trajectories (according to viral take off time, peak viral load and first phase decline) as in Figure 1—figure supplement 2a.
Narrow duration of high infectivity during SARSCoV2 infection
We next plotted the probability of infection given an exposure to a transmitter. Under multiple shedding scenarios, the window of high probability transmission is limited to time points around peak viral load, and some heterogeneity in regard to peak infectivity is noted between people (Figure 3b–d). In general, infected persons are likely to be most infectious (i.e., above TD50) for a ~0.5–1.0 day period between days 2 and 6 after infection. We therefore conclude that the observed wide variance in serial interval (Figure 2c) results primarily from the possibility of highly discrepant incubation periods between the transmitter and infected person, rather than wide variability in shedding patterns across transmitters.
Requirements for SARS CoV2 superspreader events
The solved value for exposed contact network heterogeneity (ρ) is 40 indicating high variability in daytoday exposure contact rates (Figure 1—figure supplement 3d) with an average number of exposed contacts per day (θ = 4). We generated a heat map from our TD curve to identify conditions required for superspreader events which included viral load exceeding 10^{7} SARS CoV2 RNA copies and a high number of exposure contacts on that day. We observed an inflection point between 10^{6} and 10^{7} SARS CoV2 RNA copies, after which large increases in the number of daily exposure contacts prominently increases the number of transmissions from a single person (Figure 4a). The exposure contact network occasionally resulted in days with ≥150 exposure contacts per day, which may allow an extremely high number of secondary infections from a single person (Figure 4a).
We next plotted transmission events simulated on a daily basis over 30 days since infection, from 10,000 transmitters, according to viral load at exposure and number of exposure contacts on that day (Figure 4b). Secondary transmissions to only 1–3 people occurred almost exclusively with daily numbers of exposure contacts below 10 with any exposure viral load exceeding 10^{6} RNA copies or with higher numbers of exposure contacts per day and viral loads exceeding 10^{5} RNA copies. Massive superspreader events with over 50 infected people almost always occurred at viral loads exceeding 10^{7} RNA copies with high levels of concurrent exposure contacts (Figure 4b).
We next identified that over 50% of simulated secondary infections were associated with a transmitter who has a high number of exposed contacts (11–100 per day) and a viral load exceeding 10^{6} RNA copies (Figure 4c), which is the mechanistic underpinning of why ~70% of all simulated secondary infections arose from transmitters who produced more than 10 secondary infections (Table 1).
Longer duration of infectivity during the earliest phase of the pandemic in Wuhan
The serial interval was reported to be longer during the initial stages of the pandemic in Wuhan (before January 22, 2020, termed as prelockdown) possibly because infected people were initially less likely to selfisolate after developing symptoms (Li et al., 2020; Ali et al., 2020). Therefore, we refit our model to this data with mean serial interval of ~7.5 days, mean ${\mathrm{R}}_{0}\in \left[2.2,\text{}2.5\right]$ and the individual R_{0} distribution as shown in Figure 1a. The new parameter estimates ([$\alpha $, $\lambda $, $\tau $, $\theta $, $\rho $] = [0.6, 10^{5.0}, 0.5, 0.7, 40]) yields good fits to the distribution of individual R_{0} as well as serial interval (Figure 5a–b). Importantly, the optimized ID curve has an ID50 of 10^{5} viral RNA copies during the early phase of the pandemic (prelockdown), ~100fold lower than postlockdown (Figure 3a). Keeping in mind that ID50 is an average measure, we propose that ID50 estimates leading to more prolonged and intimate exposure contacts that were of higher risk given an equivalent viral load, whereas estimates from later in the pandemic (Figure 3) reflect greater social distancing and higher use of masking. This is also evident in the longer duration of infectivity during prelockdown (Figure 5d). The predicted exposure contact network in Wuhan is highly dispersed and resembles Figure 1—figure supplement 3d.
Model predicted individual R_{0} and serial intervals for influenza infection
A single model parameter set most closely reproduced empirically observed histograms and cumulative distribution functions for individual R_{0} and serial intervals for influenza: ($\alpha $, $\lambda $, $\tau $, $\theta $, $\rho $) = (0.7, 10^{5.5}, 00.5, 4, 1). ID50 values for influenza were lower than SARSCoV2, but a direct comparison cannot be made because tissue culture infectious dose (TCID) has been more commonly used for measurements of influenza viral load, whereas viral RNA is used for SARSCoV2. Nevertheless, TCID is a closer measure of infectious virus and it is thus reasonable that ID50 based on TCID for influenza would be ~30fold lower than ID50 based on total viral RNA (infectious and noninfectious virus) for SARSCoV2 (van Kampen, 2020).
The other notable difference was a considerably lower $\rho $ value for influenza (Figure 1—figure supplement 3b), denoting much less heterogeneity in the number of exposure contacts per person while the average daily exposure contact was the same for both viruses (4 per day). The model captures the fact that 40% of people in this influenza infected cohort do not transmit to anyone else and that each increase in the number of individual transmissions is associated with a lower probability (Figure 6a). In keeping with the observed data, our model simulations predicted that relative to SARSCoV2, superspreader events involving five or more people were ~5fold less common overall and 10fold less common among transmitters (~2% of all infected people and ~3% of transmitters) (Figure 6b, Table 1). Superspreaders defined as those infecting ≥5 individuals contributed to only ~10% to all transmissions (Table 1).
The model also recapitulated the lower variance of serial interval for influenza relative to SARSCoV2 (Figure 6c,d). We next identified that the mean and variance of the serial interval provide good approximations of the mean and variance for generation time. A majority of generation times fell between 2 and 6 days (Figure 6e).
Viral load thresholds for influenza transmission
Based on the optimized TD curve for influenza (Figure 7a), we next plotted the probability of infection given an exposure to an infected person. The TD50 for influenza was 10^{6.1} TCID/mL. Under various shedding scenarios, the window of high probability transmission was limited to time points around peak viral load (Figure 7b–d). In general, infected persons were likely to be most infectious (i.e., above TD50) for a ~ 0.5–1.0 day period. The observed low variance in serial interval (Figure 6c) primarily results from the low variance in incubation period (Figure 1—figure supplement 4b) and the limited variability in timing of peak viral loads across transmitters.
Determinants of influenza individual R_{0}
We generated a heat map from our TD curve to identify conditions governing influenza transmission to multiple people including viral load exceeding 10^{6} influenza TCID and a high number of exposure contacts per day. The contact network never resulted in days with more than 15 exposure contacts per day, which severely limited the possible number of transmissions from a single person relative to SARSCoV2 (Figure 8a, Figure 1—figure supplement 3b).
We plotted transmission events simulated on a daily basis over 30 days since infection from 10,000 transmitters according to viral load at exposure and number of exposure contacts on that day (Figure 8b). Secondary transmissions to fewer than five people accounted for 90% of infections (Table 1) and occurred with fewer than 10 daily exposure contacts and exposure viral loads exceeding 10^{4} TCID. Small scale superspreader events with 5–10 infected people almost always occurred at viral loads exceeding 10^{5} TCID with 5–10 concurrent exposure contacts (Figure 8b).
We next identified that over 50% of infections were associated with a transmitter who had fewer than 10 exposure contacts per day and a viral load exceeding 10^{4.5} TCID (Figure 8c), which is why no infected person ever transmitted to more than 10 other people (Table 1).
Differing exposed contact distributions, rather than viral kinetics, explain SARS CoV2 superspreader events
We sought to explain why SARSCoV2 has a more overdispersed distribution of individual R_{0} relative to influenza. To assess viral kinetics as a potential factor, we comparatively plotted transmission risk per exposure contact as a function of time since infection in 10,000 transmitters for each virus. The median per contact transmission risk among simulated transmitters was slightly higher for influenza; however, the upper bounds of 75% and 95% percentile of transmission risks among simulated transmitters were marginally higher for SARSCoV2 compared to influenza along with a longer tail of low transmission risk beyond 7 days after exposure (Figure 9a). The transmission risk was considerably higher for the 25% of simulated SARSCoV2 infections with the highest viral loads, suggesting that a substantial subset of infected people may be more predisposed to superspreading. When plotted as time since onset of symptoms, the variability in the time at which transmissions take place relative to symptom onset was considerably larger for persons with high SARSCoV2 viral load, owing to the variable incubation period of this virus (Figure 9b).
The median duration of shedding over infectivity thresholds was short and nearly equivalent for both viruses. For SARSCoV2 and influenza, median [range] time above ID10 was 2.7 [0, 7] and 2.4 [1.6, 3.7] days, respectively; median time above ID25 was 1.7 [0, 3] and 1.5 [0, 2.2] days, respectively; median time above ID50 was 0.8 [0, 1.3] and 0 [0, 1.3] days, respectively; median time above ID75 was 0 [0, 0.4] and 0 [0, 0] days, respectively; median time above ID90 was 0 [0, 0] and 0 [0, 0] days, respectively. ID10, ID25 and ID50 values were more variable across SARSCoV2 simulations due to more heterogeneous viral kinetics among simulated infected people.
For SARSCoV2 and influenza, median [range] time above TD10 was 1.4 [0, 2.5] and 1.2 [0, 2.0] days, respectively; median time above TD25 was 0.8 [0, 1.3] and 0.3 [0, 1.3] days, respectively; median time above TD50 was 0 [0, 0.5] and 0 [0, 0.4] days, respectively; median time above TD75 was 0 [0, 0] and 0 [0, 0] days, respectively. TD10, TD25 and TD50 values were more variable across SARSCoV2 simulations due to a minority of trajectories with prolonged moderate viral loads (Figure 9c).
We next plotted the frequency of exposure contacts per day for both viruses and noted a higher frequency of days with no exposed contacts (Figure 9d), but also a higher frequency of days with more than 10 exposure contacts (Figure 9e) for SARSCoV2 relative to influenza, despite an equivalent mean number of daily exposure contacts. To confirm that this distribution drives the different observed distributions of individual R_{0} values (Figure 9f), we simulated SARSCoV2 infection with an assumed $\rho $=1 (low dispersion of exposure contacts) and generated a distribution of individual R_{0} similar to that of influenza (Figure 9—figure supplement 1a). Similarly, we simulated influenza infection with an assumed $\rho $=40 (high dispersion of exposure contacts) and generated a distribution of individual R_{0} similar to that of SARSCoV2 (Figure 9—figure supplement 1b.). Under all scenarios, predicted distributions of serial interval (Figure 9g, Figure 9—figure supplement 1) and generation time (Figure 9h, Figure 9—figure supplement 1) were unchanged by shifts in the exposed contact network.
In summary, we conclude that despite differing viral shedding kinetics (Figure 10a, top), the kinetics of infectivity are extremely similar between influenza and SARSCoV2 (Figure 10a, bottom). Overdispersed contact networks, likely due to wider airborne dispersal of virus in equivalent crowded environments, explain the ability of SARSCoV2 to initiate superspreader events (Figure 10b).
Projections of targeted physical distancing
Physical distancing is a strategy to decrease R_{0}. We simulated a decrease in the contact rate uniformly across the population and noted a decrease in population R_{0} (Figure 10—figure supplement 1a) as well the percent of infected people who will transmit (Figure 10—figure supplement 1b) and become superspreaders (Figure 10—figure supplement 1c–d). An approximately 40% decrease in the average exposed contact rate lowered R_{0} below 1 (Figure 10—figure supplement 1a). We further investigated whether lowering contact rate among larger groups only, in particular by banning exposure events with a high number of exposure contacts, could control the epidemic. We identified that limiting exposure contacts to no more than five per day is nearly equivalent to limiting exposure contacts altogether and that only a small decrease in mean exposure contact rate can achieve R_{0} < 1 if exposure events with more than 20 contacts are eliminated (Figure 10—figure supplement 2).
Presymptomatic transmission and superspreading risk
Much of the highest transmission risk for SARSCoV2 exists in the presymptomatic phase (Figure 9b) which explains why 62% of simulated transmissions occurred in the presymptomatic phase for SARSCoV2, compared to 10% for influenza. Similarly, 62% and 21% of SARSCoV2 and influenza superspreader events with secondary transmissions ≥ 5, and 39% of SARSCoV2 superspreader events with secondary transmissions ≥ 10 fell exclusively in the presymptomatic period.
Discussion
Our model provides a plausible link between SARSCoV2 shedding kinetics and the virus’ most fundamental epidemiologic properties. First, we identify a transmission dose response curve which specifies that a nasal viral load below a certain threshold (conservatively ~10^{4} RNA copies) is unlikely to result in transmission – consistent with the overall rarity of positive cultures at these levels (van Kampen, 2020). We also predict a relatively steep TD curve such that transmission becomes much more likely when a susceptible person contacts an infected person shedding above 10^{8} viral RNA copies. The amount of viral RNA can be roughly converted to the probability of a positive viral culture which approximates infectiousness, and this simulated relationship qualitatively matches the dose response observed in formal dose challenge experiments performed with SARSCoV1 in mice (Watanabe et al., 2010).
Our results may have relevance for dosing of SARSCoV2 in human challenge experiments that are being conducted for testing vaccines and therapies. However, we emphasize that it would first be valuable to test the model’s predictions with graded challenge models of infection in nonhuman primates or golden hamsters (Chandrashekar et al., 2020). Our estimates for viral load transmission thresholds are inherently imprecise based on the fact that there is no international standard for PCR or clinical sampling site (saliva versus nasopharyngeal swab). Indeed, viral loads have varied considerably across studies and average transmission dose may have increased as social distancing measures were implemented across the globe, because higher viral load may be required to achieve transmission across larger distances and over shorter periods of time.
While the duration of shedding for SARSCoV2 is often three weeks or longer, (Qi et al., 2020; Cao et al., 2020) our model predicts a short period—averaging less than two days—of high transmission risk. We note that our model was fit to pandemic settings where selfisolation upon development of symptoms was normative, and this highrisk window increases to 2–3 days when fit to prelockdown conditions in Wuhan. This duration of infectivity is predicted to be comparable to that of influenza. Our model predicts that transmission after the first week of infection is quite rare. Output is also consistent with the observation that transmissions commonly but not always occurs during the presymptomatic phase of infection (He et al., 2020; Moghadas et al., 2020; Tindale et al., 2020). This variability is attributed less to timing of peak viral load and more to the variable incubation period of the virus.
The observed high heterogeneity in serial interval is attributable almost entirely to the variable nature of the incubation period, rather than transmission occurring extremely late after infection. While our estimate for mean generation time is equivalent to that of mean serial interval, it is notable that the range of SARSCoV2 serial intervals is much wider than the range of generation times. This result is evident even though we extended heterogeneity of viral shedding curves beyond that observed in the somewhat limited existing shedding data.
The finding of limited duration of SARSCoV2 infectivity has practical implications. First, considerable resources are being used in hospitals and skilled nursing facilities to isolate patients with persistent SARSCoV2 shedding. Our model suggests that a low nasal viral load, particularly during late infection, may not require full patient isolation procedures in the absence of aerosolizing procedures. Experimental verification would be required before the implementation of such policies. If true, substantial hospital resources and valuable isolation beds might be gained for subsequent waves of infection. Similarly, employees would be able to return to work sooner after infection, saving lost labor and wages. Our results also suggest that time since first positive test may be predictive of lack of contagion, although more viral load kinetic studies will be needed to confirm the existing observation that viral loads after a week of infection are usually low and associated with negative viral cultures (van Kampen, 2020). Finally, our conclusions are generally supportive of rapid, less sensitive assays which are more likely to detect infection at periods of contagion (Larremore, 2020).
Many of these conclusions, including specific viral load thresholds for transmission, a steep dose response curve and a maximum 2 day duration of high contagion within an infected individual are equally relevant for influenza infection. One important difference is that incubation periods for influenza are far less variable, which means that at the individual level the serial interval is much more likely to be predictive of the generation time.
Another finding is that SARSCoV2 superspreading events are dependent on a large number of exposure contacts during the relatively narrow 1–2 days peak infectivity window. Because we predict that superspreader potential may be a signature property of infection, rather than a characteristic of a tiny subset of infected people, this result also has practical implications. A common experience during the pandemic has been early identification of a small cluster of infected people within a specific confined environment such as a senior living home, crowded work environment, athletic team, or restaurant. Our results demonstrate that newly diagnosed people within small clusters may be past the peak of their superspreading potential. At this stage, many more infections have often been established and drastic quarantine procedures should be considered. Other undiagnosed, presymptomatic infected people may have superspreader potential, while the known infected person is no longer contagious, highlighting the importance of effective contact tracing.
At the prevention level, school opening and work opening strategies should focus on severely limiting the possible number of exposure contacts per day. Where large numbers of exposure contacts are unavoidable, rigorous masking policies should be considered, perhaps with N95 masks that may more significantly lower exposure viral loads (Leung et al., 2020).
Influenza infection is much less predisposed to superspreader events than SARSCoV2. Yet, influenza shedding at levels above those required for a high probability of transmission occurs with only slightly lower frequency. Therefore, viral kinetics are unlikely to drive the markedly different probability of superspreader events between the two viruses—despite the fact that the overall duration of SARSCoV2 shedding exceeds duration of influenza shedding often by more than 2 weeks.
Rather, our analysis suggests that the exposure contact networks of SARSCoV2 transmitters are highly dispersed relative to those of influenza. This observation is unlikely to relate to different societal contact matrices as both viruses share the respiratory transmission route, with demonstrated ability to spread in schools, homes, workplaces. and other crowded environments. The most likely explanation underlying differing exposure contact networks is that SARSCoV2 is more predisposed to airborne transmission than influenza (van Doremalen et al., 2020). Here, our precise definition of an exposure contact (sufficient contact between a transmitter and an uninfected person to potentially allow transmission) is of high relevance. Our result suggests that a SARSCoV2infected person in a crowded poorly ventilated room will generate more exposure contacts than an influenza infected person in the exact same room, likely based on wider dispersal and / or longer airborne survival of the virus. Thus, airborne transmission of SARSCoV2 has a significant downstream effect on epidemiology. This prediction reinforces current public health recommendation to avoid crowded indoor spaces with poor air recirculation.
On the other hand, a much higher proportion of SARSCoV2infected people than influenza infected people do not transmit at all. This result lacks a clear mechanistic explanation but may imply that aerosolization occurs only in a subset of infected people. One theoretical explanation is that high viral load shedding in the presymptomatic phase is defined by lack of cough or sneeze leading to limited spatial diffusion of virus. Alternatively, it is also possible that a proportion of infected people never shed virus at high enough viral loads to allow efficient transmission. This possibility speaks to the need for more quantitative viral load data gathered during the initial stages of infection (Kissler, 2020).
Age cohort structure differs between the two infections, with a lower proportion of observed pediatric infections for SARSCoV2. If adults have more high exposure events than children, then this could also explain superspreader events. We are less enthusiastic about this hypothesis. First, SARSCoV2 superspreader events have occurred in schools and camps and would likely be more common in the absence of widespread global school closures in high prevalence regions. Second, a sufficient proportion of influenza cases occur in adults to rule out the presence of frequent large superspreading events in this population.
Our analysis has important limitations. First, exposure contacts were assumed to be homogeneous and we do not capture the volume of the exposing aerosol or droplet. If a largevolume droplet contains 10 times more viral particles than an aerosol droplet, then the exposure could be dictated by this volume as well as the viral load of the potential transmitter. It is possible that under rare circumstances with extremely highvolume exposures, even persons with extremely low viral loads may transmit.
Second, based on the quality of available data, we fit our models for SARSCoV2 and influenza to viral RNA and viral culture, respectively. This might impact the quantitative results but is unlikely to affect qualitative predictions of the model. Existing data suggest that kinetics of viral RNA and culture are similar during both infections, with culture having lower sensitivity to detect virus (van Kampen, 2020).
Third, data during early SARSCoV2 infection is fairly limited such that we may be underestimating variance of initial viral growth rates. We accounted for this by imputing additional heterogeneity in viral kinetic characteristics and obtained similar results regarding mechanisms of transmission. Nevertheless, it is possible that viral shedding kinetics among infected people may be more variable than observed to date. Importantly, we cannot rule out the possibility that a small minority of infected people shed at sufficient levels for transmission for much longer than has been observed to date. We also assume that viral load does not change substantially over the course of the subsequent day when simulating transmission which may allow for slight misclassification of exposure viral load.
Fourth, our intrahost model of SARSCoV2 was fit to heterogeneous data from different continents and with different sampling techniques and PCR assays (Goyal et al., 2020). Similarly, R_{0} estimates for SARSCoV2 have varied temporally and spatially across the globe and the same is likely true for influenza. Social interaction matrices are very likely to differ among countries and between urban and rural areas. For these reasons, our estimates of TD50 are necessarily imprecise based on available data and should serve only as a conservative benchmark. Regarding overdispersion of individual R_{0}, we do not capture the fact that social interaction networks in certain municipalities may predispose to more frequent and severe superspreader events. Nevertheless, we are confident that differing aerosolization properties of the two viruses underly the observed overdispersion of individual R_{0} for SARSCoV2 relative to influenza, regardless of geographic location.
Fifth, contagiousness could have different dose response dynamics than viral load dependent infectiousness and may require investigation in the future upon the availability of epidemiologically relevant additional data.
Overall, the model is intended to capture a universal property of SARSCoV2 infection but is not specific for local epidemics. Nevertheless, it is extremely clear that superspreader events are a globally generalizable feature of SARSCoV2 epidemiology.
In conclusion, fundamental epidemiologic features of SARSCoV2 and influenza infections can be directly related to viral shedding patterns in the upper airway as well as the nature of exposure contact networks. We contend that this information should be leveraged for more nuanced public health practice in the next phase of the pandemic.
Materials and methods
SARSCoV2 withinhost model
Request a detailed protocolTo simulate SARSCoV2 shedding dynamics, we employed our previously described viral infection model (Goyal et al., 2020). In this model, susceptible cells (S) after coming into contact with SARSCoV2 (V) become infected at rate $\beta VS$. The infected cells (I) produce new virus at a percapita rate $\pi $. The model also includes the clearance of infected cells in two ways: (1) by an innate response with density dependent rate $\delta {I}^{k}$; and (2) an acquired response with rate $\frac{m{E}^{r}}{{E}^{r}+{\varphi}^{r}}$ mediated by SARSCoV2specific effector cells ($E$). The clearance mediated by innate immunity depends on the infected cell density and is controlled by the exponent $k$. The Hill coefficient $r$ parameterizes the nonlinearity of the second response and allows for rapid saturation of the killing. Parameter $\varphi $ defines the effector cell level by which killing of infected cells by $E$ is half maximal.
In the model, SARSCoV2specific effector cells rise after two stages from precursors cells (${M}_{1}$ and ${M}_{2}$). The first precursor cell compartment (${M}_{1}$) proliferates in the presence of infection with rate $\omega I{M}_{1}$ and differentiates into the effector cell at a per capita rate $q$ during the next intermediate stage. Finally, effector cells die at rate ${\delta}_{E}$. The model is expressed as a system of ordinary differential equations:
We assumed $S\left(0\right)={10}^{7}$ cells/mL, $I\left(0\right)=1$ cells/mL, $V\left(0\right)=\frac{\pi I\left(0\right)}{c}$ copies/mL, ${M}_{1}\left(0\right)=1$, ${M}_{2}\left(0\right)=0$ and ${E}_{0}=0$.
When we introduce simulated heterogeneity in cases of SARSCoV2 (by increasing the standard deviation of the random effects of parameters β by 20, δ by 2, k by 2, and π by 5 in the original distribution from Goyal et al., 2020), some of the viral shedding curves suggest that viral shedding could continue for long period (over 6 weeks). Indeed, while median viral shedding duration has been estimated at 12–20 days, shedding for many months is also observed commonly (Widders et al., 2020). We assumed that viral loads after day 20 drop to a exposurelevel viral load level (i.e. $V\left(0\right)$) as most viral shedding observed after this point is transient and at an extremely low viral load (Huang, 2020). The population distribution of parameters to simulate artificial SARSCoV2 viral shedding dynamics is provided in Table 2.
Influenza withinhost model
Request a detailed protocolTo simulate viral shedding dynamics of influenza viral, we employ a model (Baccam et al., 2006) that is a simplified version of the viral dynamics model presented for SARSCoV2. This model assumes $k=0$ and $m=0$ and can be expressed as a system of ordinary differential equations:
Following this model, (Baccam et al., 2006) we assumed $S\left(0\right)=4\times {10}^{8}$ cells/mL, $I\left(0\right)=1$ cells/mL, $V\left(0\right)=\frac{\pi I\left(0\right)}{c}$ copies/mL. To simulate artificial influenza viral shedding dynamics, we assumed the population distribution of parameters $Log10\left(\beta \right)$, $Log10\left(\pi \right)$, $Log10\left(\gamma \right)$ and $Log10\left(\delta \right)$ are 4.56 (0.17), 1.98 (0.14), 0.47 (0.03), and 0.60 (0.06), respectively.
Doseresponse model
Request a detailed protocolFor both viruses, to estimate the infectiousness ${P}_{t}\left[V\right(t\left)\right]$ based on viral loads $V\left(t\right)$, we employed the function, ${P}_{t}\left[V\right(t\left)\right]=\frac{V{\left(t\right)}^{\alpha}}{{\lambda}^{\alpha}+V{\left(t\right)}^{\alpha}}$. Here, $\lambda $ is the infectivity parameter that represents the viral load that corresponds to 50% infectiousness and 50% contagiousness, and $\alpha $ is the Hill coefficient that controls the slope of the doseresponse curve.
Transmission model and reproduction number
Request a detailed protocolOur transmission model assumes that only some contacts of an infected individual with viral load dependent infectiousness are physically exposed to the virus (defined as exposure contacts), that only some exposure contacts have virus passaged to their airways (contagiousness) and that only some exposed contacts with virus in their airways become secondarily infected (successful secondary infection). Contagiousness and infectiousness are then treated as viral load dependent multiplicative probabilities with transmission risk for a single exposure contact being the product. Contagiousness is considered to be viral load dependent based on the concept that a transmitter’s dispersal cloud of virus is more likely to prove contagious at higher viral load, which is entirely separate from viral infectivity within the airway once a virus contacts the surface of susceptible cells.
We next assume that the total exposed contacts within a time step $\left({\eta}_{{\mathrm{\Delta}}_{t}}\right)$ is gamma distributed, that is, ${\eta}_{{\mathrm{\Delta}}_{t}}~\Gamma \left(\frac{\theta}{\rho},\rho \right){\mathrm{\Delta}}_{t}$, using the average daily contact rates $\left(\theta \right)$ and the dispersion parameter $\left(\rho \right)$. To obtain the true number of exposure contacts with airway exposure to virus, we simply multiply the contagiousness of the transmitter with the total exposed contacts within a time step (i.e., ${\zeta}_{t}={\eta}_{{\mathrm{\Delta}}_{t}}{P}_{t}$).
Transmissions within a time step are simulated stochastically using timedependent viral load to determine infectiousness (${P}_{t}$). Successful transmission is modeled stochastically by drawing a random uniform variable ($U\left(\mathrm{0,1}\right)$) and comparing it with infectiousness of the transmitter. In the case of successful transmission, the number of secondary infections within that time step $\left({T}_{{\mathrm{\Delta}}_{t}}\right)$ is obtained by the product of the infectiousness (${P}_{t}$) and the number of exposure contacts drawn from the gamma distribution $\left({\zeta}_{t}\right)$. In other words, the number of secondary infections for a time step is ${T}_{{\mathrm{\Delta}}_{t}}=Ber\left({P}_{t}\right){P}_{t}{\eta}_{{\mathrm{\Delta}}_{t}}$. If we disregard contagiousness by assuming ${P}_{t}=1$ in ${\zeta}_{t}$, we identify that there are little to no differences on overall results other than the emergent TD curve and optimal parameter set describing doseresponse curve and exposed contact network, which no longer agrees as closely with in vitro probability of positive virus culture (Figure 2—figure supplement 1; van Kampen, 2020).
We obtain the number of secondary infections from a transmitter on a daily basis assuming that viral load, and subsequent risk, does not change substantially within a day. We then summed up the number of secondary infections over 30 days since the time of exposure to obtain the individual reproduction number, that is,${R}_{0}={\sum}_{{\mathrm{\Delta}}_{t}}{T}_{{\mathrm{\Delta}}_{t}}$.
Serial interval and generation time
Request a detailed protocolWe further assume that upon successful infection, it takes $\tau $ days for the virus to move withinhost, reach infection site and produce the first infected cell.
To calculate serial interval (time between the onset of symptoms of transmitter and secondarily infected person), we sample the incubation period in the transmitter and in the secondarily infected person from a gamma distribution with a shape described in Figure 1—figure supplement 4 (Ganyani et al., 2020; Lauer et al., 2020). In cases in which symptom onset in the newly infected person precedes symptom onset in the transmitter, the serial interval is negative; otherwise, serial interval is nonnegative. We calculate generation time as the difference between the time of infection of transmitter and the time of infection of secondarily infected person.
Individual R_{0} and serial interval data for model fitting
Request a detailed protocolThere is abundance of data confirming overdispersed R_{0} for SARSCoV2. From contact tracing of 391 SARSCoV2 cases in Shenzhen, China, 1286 close contacts were identified: the distribution of individual R_{0} values in this cohort was highly overdispersed, with 80% of secondary infections being caused by 8–9% of infected people (Bi et al., 2020). In another study, authors analyzed the contact/travel history of 135 infected cases in Tianjin, China and determined heterogeneity in the individual R_{0}. (Zhang et al., 2020) Another contract tracing study also identified and characterized SARSCoV2 clusters in Hong Kong and estimated that 20% of cases were responsible for 80% of local transmission (Dillon, 2020).
A modeling study that simulated observed outbreak sizes in ~40 affected countries during the early phase of epidemics also confirmed that ~80% of secondary transmissions may have been caused by a small fraction of infectious individuals (~10%) (Endo et al., 2020). The latter study provided the distribution of individual R_{0} (Figure 2a) that we employed for fitting purposes. Using the data on 468 COVID19 transmission events reported in mainland China, Du et al. estimated the mean serial interval as well as the distribution of serial interval (Figure 2c). (Du et al., 2020) We employed this data for fitting purposes. Alternatively, we also fit to the mean serial interval as well as the distribution of serial interval formulated from data of 162 transmission pairs that were observed before January 22, 2020 (termed as prelockdown) during the initial stages of the pandemic in Wuhan (Ganyani et al., 2020).
The cumulative distribution function of individual R_{0} for influenza was obtained from a modeling study that simulated the transmission dynamics of seasonal influenza in Switzerland from 2003 to 2015 (Brugger and Althaus, 2020). We picked the parameters mean R_{0} = 1.26 and dispersion parameter = 2.36 in the negative binomial distribution that corresponded to the 2008–2009 influenza A H1N1 epidemic season (Brugger and Althaus, 2020). Another modeling study that simulated the agespecific cumulative incidence of 2009 H1N1 influenza in eight Southern Hemisphere Countries yielded similar results (Opatowski et al., 2011). By following the household members of index cases, a study estimated the cumulative distribution of serial interval based on symptomonset times from 14 transmission pairs (Cowling et al., 2009). We employed these cumulative distribution functions of individual R_{0} and serial interval of influenza for fitting purposes.
Fitting procedure
Request a detailed protocolTo estimate the values of unknown parameters in cases of SARSCoV2, we performed a grid search comprehensively exploring a total of 417,792 combinations of 5 parameters taking the following values:
$\tau \in$ [0.5, 1, 2, 3] days,
$\alpha \in$ [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
$\lambda \in$ [10^{0}, 10^{0.5}, 10^{1.0}…, 10^{8}]
$\theta \in$ [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 50.0].
$\rho \in$ [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 75.0, 100, 200, 500].
The parameter sets of ($\lambda ,\tau ,\alpha ,\theta ,\rho $) were simulated for 1000 infected individuals to determine how well each set generates the summary statistics of mean R_{0}, mean SI and the R_{0} histograms by following a procedure explained in Figure 1—figure supplement 1 and below:
Step A
Request a detailed protocolSimulate viral load $V\left(t\right)$ of 1000 simulated infected individuals using the differential equation for virus described for SARSCoV2 and influenza above:
For each combination of ($\lambda ,\tau ,\alpha ,\theta ,\rho $)
For each time step ${\mathrm{\Delta}}_{t}$
Compute ${P}_{t}\left[V\left(t\right);\lambda ,\alpha \right]$
Draw ${\eta}_{{\mathrm{\Delta}}_{t}}~\Gamma \left(\frac{\theta}{\rho},\rho \right){\mathrm{\Delta}}_{t}$
Calculate ${T}_{{\mathrm{\Delta}}_{t}}=Ber\left({P}_{t}\right){P}_{t}{\eta}_{{\mathrm{\Delta}}_{t}}$
Calculate ${R}_{0}={\sum}_{{\mathrm{\Delta}}_{t}}{T}_{{\mathrm{\Delta}}_{t}}$
Check if calculated mean ${R}_{0}$ is in the range: (Ganyani et al., 2020; Du et al., 2020)
Calculate Serial Interval based on $\tau $ and incubation period
Check if calculated $SI$ is in the range in: (Ganyani et al., 2020; Du et al., 2020; Nishiura et al., 2020)
Step B
Request a detailed protocolIf the parameter combination in Step A satisfy the criteria, then
Compute residual sum of squares (RSS) for the obtained ${R}_{0}$ and histogram from: (Endo et al., 2020; Bi et al., 2020; Zhang et al., 2020; Miller, 2020)
We visually checked whether our doseresponse curve matched the observed probability of positive virus culture (van Kampen, 2020). We assumed that viral loads derived from positive culture (van Kampen, 2020) can be considered equivalent to viral loads in the withinhost model if divided by a positive integer. We identified an integer of 25 to provide closest fit to the empirical data (Figure 2—figure supplement 1).
We performed a global sensitivity analysis to identify which parameter variability accounted for fit to different components of the data. Only narrow ranges of λ permitted close fit to the mean of R_{0} and distribution functions of individual R_{0} (Figure 10—figure supplement 3), while a specific value for α was necessary to fit to mean serial interval and distribution functions of individual R_{0} (Figure 10—figure supplement 3). Only narrow ranges of θ permitted close fit to the mean of R_{0} and distribution functions of individual R_{0} (Figure 10—figure supplement 4), while a specific value for ρ was necessary to fit to distribution functions of individual R_{0} (Figure 10—figure supplement 4).
To obtain TD50 (${\lambda}_{T})$ based on ID50 ($\lambda $), we use the following relation for real positive values of α,
From solving the second half ($\frac{1}{{\left(\frac{{10}^{{\lambda}_{T}}}{V}\right)}^{{\alpha}_{T}}+1}=0.5$), we get
Substituting $V={10}^{{\lambda}_{T}}$ in the firsthalf, we have,
Alternative fitting procedures
Request a detailed protocolTo further confirm the validity and robustness of our results, particularly in relation to the estimated parameter values, we alternatively employed Approximate Bayesian Computation (ABC) rejectionsampling method to estimate 1000 combinations of four parameters ($\alpha ,\lambda ,\theta ,\rho $) with an error threshold of 0.1 while assuming $\tau =0.5$ days. We assumed uniform prior distribution of each parameter with ranges described in the previous section. This approach yields a parameter distribution for all parameters except the parameter $\alpha $ that is centered at parameter values generated by the grid search method (Figure 10—figure supplement 5). The value for the parameter $\alpha $ can be further narrowed by the use of additional fitting to the data of the probability of positive virus culture in vitro.
Furthermore, we also employed a narrow grid search close to the solution yielded by both the extensive grid search in the previous section and ABC. In this procedure, we searched a total of ~8000 parameter combinations varying, $\alpha $ from 0.5 to 0.1 at intervals of 0.05; $\lambda $ from 6.5 to 7.5 at intervals of 0.1; $\theta $ from 3 to 5 at intervals of 0.2, and $\rho $ from 30 to 50 at intervals of 5 while keeping $\tau $ fixed at 0.5 days. This process yielded most likely parameter estimates (with the lowest error threshold of 0.02) with median $\lambda $, $\theta $, $\rho $ and $\alpha $ taking values 6.9, 3.8, 45, and 0.9, respectively. We have not visually represented the results as the figure looks very similar to Figure 10—figure supplement 3 and Figure 10—figure supplement 4 but on a narrow range of parameter values.
Overall, these two alternative approaches confirm that our parameter estimates obtained for our model using the grid search method in the previous section are the most likely parameter values given the features of the COVID19 pandemic.
Data and materials availability
Request a detailed protocolThe original data and code is shared at: https://github.com/ashish2goyal/SARS_CoV_2_Super_Spreader_Event; Goyal, 2021; copy archived at swh:1:rev:1b445c424f077d248da4860e56825e82a5307eb6.
Data availability
The original data and code is shared at: https://github.com/ashish2goyal/SARS_CoV_2_Super_Spreader_Event copy archived at https://archive.softwareheritage.org/swh:1:rev:1b445c424f077d248da4860e56825e82a5307eb6/.
References

Kinetics of influenza A virus infection in humansJournal of Virology 80:7590–7599.https://doi.org/10.1128/JVI.0162305

Nonprimary maternal Cytomegalovirus infection after viral shedding in infantsPediatric Infectious Disease Journal 37:627–631.https://doi.org/10.1097/INF.0000000000001877

Doseresponse relationships for environmentally mediated infectious disease transmission modelsPLOS Computational Biology 13:e1005481.https://doi.org/10.1371/journal.pcbi.1005481

A trial of LopinavirRitonavir in adults hospitalized with severe Covid19New England Journal of Medicine 382:1787–1799.https://doi.org/10.1056/NEJMoa2001282

Antiretroviral therapy for the prevention of HIV1 transmissionNew England Journal of Medicine 375:830–839.https://doi.org/10.1056/NEJMoa1600693

Oncedaily valacyclovir to reduce the risk of transmission of genital herpesNew England Journal of Medicine 350:11–20.https://doi.org/10.1056/NEJMoa035144

Estimation of the serial interval of influenzaEpidemiology 20:344–347.https://doi.org/10.1097/EDE.0b013e31819d1092

Serial interval of COVID19 among publicly reported confirmed casesEmerging Infectious Diseases 26:1341–1343.https://doi.org/10.3201/eid2606.200357

SoftwareSARS_CoV_2_Super_Spreader_Event, version swh:1:rev:1b445c424f077d248da4860e56825e82a5307eb6Software Heritage.

High SARSCoV2 attack rate following exposure at a choir practice  Skagit county, Washington, march 2020MMWR. Morbidity and Mortality Weekly Report 69:606–610.https://doi.org/10.15585/mmwr.mm6919e6

Viral load kinetics of SARSCoV2 infection in first two patients in KoreaJournal of Korean Medical Science 35:e86.https://doi.org/10.3346/jkms.2020.35.e86

Clinical and virological data of the first cases of COVID19 in Europe: a case seriesThe Lancet Infectious Diseases 20:697–706.https://doi.org/10.1016/S14733099(20)302000

Outbreak of 2009 pandemic influenza A (H1N1) at a New York City schoolNew England Journal of Medicine 361:2628–2636.https://doi.org/10.1056/NEJMoa0906089

Early transmission dynamics in Wuhan, China, of novel CoronavirusInfected pneumoniaNew England Journal of Medicine 382:1199–1207.https://doi.org/10.1056/NEJMoa2001316

Serial interval of novel coronavirus (COVID19) infectionsInternational Journal of Infectious Diseases 93:284–286.https://doi.org/10.1016/j.ijid.2020.02.060

Coronavirus disease outbreak in call center, South KoreaEmerging Infectious Diseases 26:1666–1670.https://doi.org/10.3201/eid2608.201274

Modeling withinhost dynamics of influenza virus infection including immune responsesPLOS Computational Biology 8:e1002588.https://doi.org/10.1371/journal.pcbi.1002588

Use of antiviral drugs to reduce household transmission of pandemic (H1N1) 2009, united kingdomEmerging Infectious Diseases 17:990–999.https://doi.org/10.3201/eid/1706.101161

Factors associated with the duration of viral shedding in adults with COVID19 outside of Wuhan, China: a retrospective cohort studyInternational Journal of Infectious Diseases 96:531–537.https://doi.org/10.1016/j.ijid.2020.05.045

An early TestandTreat strategy for severe acute respiratory syndrome coronavirus 2Open Forum Infectious Diseases 7:ofaa232.https://doi.org/10.1093/ofid/ofaa232

Aerosol and surface stability of SARSCoV2 as compared with SARSCoV1New England Journal of Medicine 382:1564–1567.https://doi.org/10.1056/NEJMc2004973

SARSCoV2: the viral shedding vs infectivity dilemmaInfection, Disease & Health 25:210–215.https://doi.org/10.1016/j.idh.2020.05.002

ReportStatement on the Meeting of the International Health Regulations (2005) Emergency Committee Regarding the Outbreak of Novel Coronavirus (2019nCoV)WHO.

Evaluating transmission heterogeneity and SuperSpreading event of COVID19 in a metropolis of ChinaInternational Journal of Environmental Research and Public Health 17:3705.https://doi.org/10.3390/ijerph17103705
Decision letter

Aleksandra M WalczakSenior Editor; École Normale Supérieure, France

Lauren ChildsReviewing Editor

Jonathan FordeReviewer
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This work assesses the role of withinhost viral shedding dynamics and contact heterogeneity on distribution of transmission events in SARSCoV2 and influenza. Using multiscale modeling, predictions are made on the manner and contribution of super spreading to transmission. This model has the potential to provide insight into transmission dynamics of SARSCoV2, which could help inform policy.
Decision letter after peer review:
Thank you for submitting your article "Viral load and contact network predict SARSCoV2 transmission and superspreading events" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Aleksandra Walczak as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Jonathan Forde (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
As the editors have judged that your manuscript is of interest, but as described below that additional experiments are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)
Summary:
This work assesses the role of withinhost viral shedding dynamics and contact heterogeneity on distribution of transmission events in SARSCoV2 and influenza. Using multiscale modeling, with similar resulting generation time and serial interval distributions to published work, predictions are made on the manner and contribution of super spreading to transmission. Distinctions are seen when comparing to applying a similar modeling framework to influenza.
Essential revisions:
1) Statistical analysis:
The model parameters are estimated using an exhaustive grid search, which yields good fits for the bestfit values, but there is no assessment of statistical certainty in the parameter values. The authors essentially adopted a strategy in the spirit of approximate Bayesian computation (ABC), by proposing parameter values, simulating from a model, and comparing summary statistics of the simulated output to known values from the literature. The analysis would be helped by doing a more formal ABC analysis, as this would provide a better sense of how narrowly constrained the parameter values are given the available data. At minimum, it would be more convincing to consider additional parameter sets grided across a narrowed region of parameter space before selecting an optimal fit.
2) Model validation
The state of our knowledge about these infections is limited, both by the short time during with this research has been conducted, and the paper's need to rely on data taken from before the introduction of confounding factors such as social distancing and widespread mask usage. For this reason, in addition to the included sensitivity analysis for the model parameters, a sense of the sensitivity of the model's conclusions to the data set to which it is being fitted is needed. How much would these results change if there are errors in our understanding of the distribution of individual R0 values, or serial intervals?
3) Distinction in assumptions for flu and covid
The populations on which the histograms for the two diseases are based are quite different. For SARSCoV2, the studies are from China (Shenzhen, Tianjin and Hong Kong), while those for influenza are from Switzerland. Could cultural differences be relevant? What about seasonal differences, as the time during which the early SARSCoV2 studies occurred was necessarily restricted?
Furthermore, the explanation for the difference between influenza and COVID is based primarily on differences in contact patterns. While the Discussion clarifies this to be based on the efficiency with which exposures lead to infections (and presymptomatic transmission), which does sound like a viral parameter, rather than a social one. These viral factors do seem more believable than having to explain why the patterns of social contact exhibited by influenza patients would differ from those of SARSCoV2 patients. More focus on possible mechanistic explanations is warranted.
Title: "Contact heterogeneity" rather than "contact network" is more appropriate for the title as no network is considered.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Thank you for submitting your article "Viral load and contact heterogeneity predict SARSCoV2 transmission and superspreading events" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Aleksandra Walczak as the Senior Editor. The Reviewing Editor has drafted this decision to help you prepare a revised submission.
We would like to draw your attention to changes in our revision policy that we have made in response to COVID19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.
Revisions:
A key definition in the work (as stated in the Discussion) is the exposure contact. However, apart from the parenthetical definition in the Discussion, exposure contact is not clearly defined. This is particularly important as estimates on the exposure contact seem to impact the estimates on the viral load to infectiousness functional relationship. In particular, infectiousness is defined "… as the viral load dependent probability of transmission given direct airway exposure to virus in an exposure contact." Why does the parameterization for this change so significantly pre and post lockdown when the main difference appears to be in exposure contacts? The explanation given refers to "… more prolonged and intense exposure contacts…" This seems to imply an unequal reflection of what an "exposure contact" truly means. Furthermore, how does the "exposed contact rate" compare with exposure contacts.
Figure 10. This new figure is confusing. In the text, viral shedding kinetics are referred to as panel (A) and kinetics of infectivity as panel (B), but they are both part of panel (A) in the figure. What is different between the SARSCoV2 and Influenza schematics in panel (B). Panel (C) is referred to in the text but is not in the figure.
https://doi.org/10.7554/eLife.63537.sa1Author response
Essential revisions:
1) Statistical analysis:
The model parameters are estimated using an exhaustive grid search, which yields good fits for the bestfit values, but there is no assessment of statistical certainty in the parameter values. The authors essentially adopted a strategy in the spirit of approximate Bayesian computation (ABC), by proposing parameter values, simulating from a model, and comparing summary statistics of the simulated output to known values from the literature. The analysis would be helped by doing a more formal ABC analysis, as this would provide a better sense of how narrowly constrained the parameter values are given the available data. At minimum, it would be more convincing to consider additional parameter sets grided across a narrowed region of parameter space before selecting an optimal fit.
Thank you for this valuable feedback. We agree that a more detailed evaluation of the parameter space would be valuable. In the first version of the manuscript, we demonstrate in Figure 10—figure supplement 3 and Figure 10—figure supplement 4 that there is a fairly narrow parameter space both for the dose response curve (Figure 10—figure supplement 3) and contact network space (Figure 10—figure supplement 4) that provides best model fit. Given that we are only exploring 4 parameters, we are confident that our parameter estimates are fairly precise.
We appreciate the idea to test the model fit within a tighter parameter space, in proximity to the existing optimal solution. We performed this analysis with λ varied between 6.5 and 7.5 logs at intervals of 0.1, α varied between 0.5 and 1.0 at intervals of 0.05, θ varied between 3 and 5 at intervals of 0.2, and ρ varied between 30 and 50 at intervals of 5, with narrower intervals to allow more precise parameter estimation. The estimates, now reported in the manuscript, only change slightly with this approach.
We also performed ABC to gather 1,000 parameter combinations with an error threshold of 0.1. This approach further confirms that our previous parameter estimates are the most likely parameter values given the features of the COVID19 pandemic (see Author response image 1). We have summarized these new results in the revised manuscript as in “The preciseness of parameter estimates was independently confirmed with the use of Approximate Bayesian Computation (ABC) rejection sampling method (https://www.nature.com/articles/npre.2011.5964.1) and a finer grid search in proximity to the aforementioned optimal solution” in the revised manuscript and in Figure 10—figure supplement 5.
2) Model validation
The state of our knowledge about these infections is limited, both by the short time during with this research has been conducted, and the paper's need to rely on data taken from before the introduction of confounding factors such as social distancing and widespread mask usage. For this reason, in addition to the included sensitivity analysis for the model parameters, a sense of the sensitivity of the model's conclusions to the data set to which it is being fitted is needed. How much would these results change if there are errors in our understanding of the distribution of individual R0 values, or serial intervals?
This is an excellent point. We now include data which demonstrates that the serial interval was longer during the initial stages of the pandemic in Wuhan than in subsequent cohorts across the globe (PMID: 31995857). This makes sense because infected people are currently much more likely to selfisolate after developing symptoms than in the very earliest phases of the pandemic. Essentially, a higher proportion of prelockdown transmissions occurred during the symptomatic phase compared to postlockdown. We therefore have additionally refit the model to this data and show that the duration of infectivity is predicted to be significantly longer under this assumption. This is now included as Figure 5 in the revised version.
We similarly fit the model to presumed data with higher R0 of 2.82.9 (see Author response image 2, which demonstrates excellent fits to the individual R0 distribution drawn from the negative binomial distribution with assumed mean R0=2.9 and borrowed dispersion parameter as k=0.2 from PMC7338915, PMID: 33024095 and https://www.medrxiv.org/content/10.1101/2020.06.28.20142133v2). The main difference is that only ~60% of individuals do not produce secondary infections when the mean R0 is high in comparison to ~75% when we assumed low mean R0. These results are briefly described in the revised paper but we do not include the figure in the revisions as we do not wish to bloat the paper as it already contains a high number of figures and supplementary figures.
3) Distinction in assumptions for flu and covid
The populations on which the histograms for the two diseases are based are quite different. For SARSCoV2, the studies are from China (Shenzhen, Tianjin and Hong Kong), while those for influenza are from Switzerland. Could cultural differences be relevant? What about seasonal differences, as the time during which the early SARSCoV2 studies occurred was necessarily restricted?
This is a very important point as well. We have amended the Discussion to make the more conservative conclusion that the model explains the differences in these two settings for these two viruses. That said, we are extremely confident that our qualitative conclusions can be generalized for global influenza and SARSCoV2 for the simple reason that superspreader events are a fundamental feature of the coronavirus in every country where it has been assessed, whereas these events are very rarely reported for influenza. We make this point in the revised Discussion.
We also repeatedly emphasize that the estimate of our 4 parameter values are bound to differ in different countries with different cultural practices and contact matrices.
Furthermore, the explanation for the difference between influenza and COVID is based primarily on differences in contact patterns. While the Discussion clarifies this to be based on the efficiency with which exposures lead to infections (and presymptomatic transmission), which does sound like a viral parameter, rather than a social one. These viral factors do seem more believable than having to explain why the patterns of social contact exhibited by influenza patients would differ from those of SARSCoV2 patients. More focus on possible mechanistic explanations is warranted.
Thank you. It is a subtle point but we feel that the contact network parameters that differ between the two viruses are dictated less by social factors and more by critical differences in viral biology. First, both viruses spread in very similar populations. Moreover, as stated in the paper, if a SARSCoV2 infected person and an influenza infected person (both at peak viral loads) are placed in an equivalent crowded and poorly ventilated room, the SARSCoV2 infected person will have the potential to have exposure contacts with a greater proportion of people with the virus. The most parsimonious explanation is aerosolization for SARSCoV2 being much more important than for influenza. We now conclude the paper with a conceptual figure (Figure 10) which illustrates this point.
Title: "Contact heterogeneity" rather than "contact network" is more appropriate for the title as no network is considered.
We absolutely agree. Thank you for this point.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Revisions:
A key definition in the work (as stated in the Discussion) is the exposure contact. However, apart from the parenthetical definition in the Discussion, exposure contact is not clearly defined. This is particularly important as estimates on the exposure contact seem to impact the estimates on the viral load to infectiousness functional relationship. In particular, infectiousness is defined "… as the viral load dependent probability of transmission given direct airway exposure to virus in an exposure contact." Why does the parameterization for this change so significantly pre and post lockdown when the main difference appears to be in exposure contacts?
Thank you for pointing this omission out. We have now added the following definition: an exposure contact is a susceptible person who is exposed to a SARSCoV2 infected person for a sufficient period of time and at a close enough distance to allow for the possibility of a successful transmission. The probability of transmission then becomes dependent on viral load of the infected person. For this reason, the number of exposure contacts may go down under conditions of quarantine or selfisolation upon developing symptoms.
The explanation given refers to "… more prolonged and intense exposure contacts…" This seems to imply an unequal reflection of what an "exposure contact" truly means. Furthermore, how does the "exposed contact rate" compare with exposure contacts.
We emphasize that exposure contacts were likely different (longer and more intimate) prior to widespread knowledge of the virus in Wuhan allowing for lower viral load transmissions. We eliminate the paragraph with “exposed contact rate” as we agree this is confusing.
Figure 10. This new figure is confusing. In the text, viral shedding kinetics are referred to as panel (A) and kinetics of infectivity as panel (B), but they are both part of panel (A) in the figure. What is different between the SARSCoV2 and Influenza schematics in panel (B). Panel (C) is referred to in the text but is not in the figure.
We fixed the paper such that the text matches the paper.
https://doi.org/10.7554/eLife.63537.sa2Article and author information
Author details
Funding
National Institute of Allergy and Infectious Diseases (R01 AI12112905S1)
 Joshua T Schiffer
Council of State and Territorial Epidemiologists (Inform Public Health Decision Making Funding Opportunity)
 Joshua T Schiffer
Fred Hutchinson Cancer Research Center
 Joshua T Schiffer
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We are grateful to study participants from around the globe who donated critical virologic data early during the pandemic. We thank Jeroen van Kampen and Marion Koopmans for helpful discussions. Funding: This study was supported by Fred Hutchinson Cancer Research Center faculty discretionary funds and by National Institute of Allergy and Infectious Diseases (grant # 5R0 1AI12112905).
Senior Editor
 Aleksandra M Walczak, École Normale Supérieure, France
Reviewing Editor
 Lauren Childs
Reviewer
 Jonathan Forde
Publication history
 Received: September 28, 2020
 Accepted: February 22, 2021
 Accepted Manuscript published: February 23, 2021 (version 1)
 Accepted Manuscript updated: February 24, 2021 (version 2)
 Version of Record published: March 3, 2021 (version 3)
Copyright
© 2021, Goyal et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 11,058
 Page views

 662
 Downloads

 40
 Citations
Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Further reading

 Medicine
 Microbiology and Infectious Disease
 Epidemiology and Global Health
 Immunology and Inflammation
eLife has published articles on a wide range of infectious diseases, including COVID19, influenza, tuberculosis, HIV/AIDS, malaria and typhoid fever.

 Epidemiology and Global Health
 Evolutionary Biology
Evaluating the characteristics of emerging SARSCoV2 variants of concern is essential to inform pandemic risk assessment. A variant may grow faster if it produces a larger number of secondary infections ('R advantage') or if the timing of secondary infections (generation time) is better. So far, assessments have largely focused on deriving the R advantage assuming the generation time was unchanged. Yet, knowledge of both is needed to anticipate impact. Here we develop an analytical framework to investigate the contribution of both the R advantage and generation time to the growth advantage of a variant. It is known that selection on a variant with larger R increases with levels of transmission in the community. We additionally show that variants conferring earlier transmission are more strongly favoured when the historical strains have fast epidemic growth, while variants conferring later transmission are more strongly favoured when historical strains have slow or negative growth. We develop these conceptual insights into a new statistical framework to infer both the R advantage and generation time of a variant. On simulated data, our framework correctly estimates both parameters when it covers time periods characterized by different epidemiological contexts. Applied to data for the Alpha and Delta variants in England and in Europe, we find that Alpha confers a +54% [95% CI, 4563%] R advantage compared to previous strains, and Delta +140% [98182%] compared to Alpha, and mean generation times are similar to historical strains for both variants. This work helps interpret variant frequency dynamics and will strengthen risk assessment for future variants of concern.