Inference of the SARS-CoV-2 generation time using UK household data

  1. William S Hart (corresponding author)
  2. Sam Abbott
  3. Akira Endo
  4. Joel Hellewell
  5. Elizabeth Miller
  6. Nick Andrews
  7. Philip K Maini
  8. Sebastian Funk
  9. Robin N Thompson
  1. Mathematical Institute, University of Oxford, United Kingdom
  2. Centre for the Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, United Kingdom
  3. Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, United Kingdom
  4. Immunisation and Countermeasures Division, UK Health Security Agency, United Kingdom
  5. Data and Analytical Sciences, UK Health Security Agency, United Kingdom
  6. Mathematics Institute, University of Warwick, United Kingdom
  7. Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research, University of Warwick, United Kingdom

Abstract

The distribution of the generation time (the interval between individuals becoming infected and transmitting the virus) characterises changes in the transmission risk during SARS-CoV-2 infections. Inferring the generation time distribution is essential to plan and assess public health measures. We previously developed a mechanistic approach for estimating the generation time, which provided an improved fit to data from the early months of the COVID-19 pandemic (December 2019-March 2020) compared to existing models (Hart et al., 2021). However, few estimates of the generation time exist based on data from later in the pandemic. Here, using data from a household study conducted from March to November 2020 in the UK, we provide updated estimates of the generation time. We considered both a commonly used approach in which the transmission risk is assumed to be independent of when symptoms develop, and our mechanistic model in which transmission and symptoms are linked explicitly. Assuming independent transmission and symptoms, we estimated a mean generation time (4.2 days, 95% credible interval 3.3–5.3 days) similar to previous estimates from other countries, but with a higher standard deviation (4.9 days, 3.0–8.3 days). Using our mechanistic approach, we estimated a longer mean generation time (5.9 days, 5.2–7.0 days) and a similar standard deviation (4.8 days, 4.0–6.3 days). As well as estimating the generation time using data from the entire study period, we also considered whether the generation time varied temporally. Both models suggest a shorter mean generation time in September-November 2020 compared to earlier months. Since the SARS-CoV-2 generation time appears to be changing, further data collection and analysis is necessary to continue to monitor ongoing transmission and inform future public health policy decisions.

Editor's evaluation

This paper is a timely update to the authors' previous work and will be of interest to those working on public health responses and the mathematical modelling of infectious diseases. In this work, the authors infer the generation interval of SARS-CoV-2, which can allow for the assessment of public health measures. The derivation of the likelihood function is also of interest to mathematical modellers, as it allows for the inference of the generation interval from data sets where susceptible depletion may dominate infection dynamics.

https://doi.org/10.7554/eLife.70767.sa0

Introduction

The generation time (or generation interval) of a SARS-CoV-2 infector-infectee pair is defined as the period of time between the infector and infectee each becoming infected (Anderson and May, 1992; Diekmann and Heesterbeek, 2000; Griffin et al., 2020; Svensson, 2007; Wallinga and Lipsitch, 2007). The generation time distribution of many infector-infectee pairs characterises the temporal profile of the transmission risk of an infected host (averaged over all hosts and normalised so that it represents a valid probability distribution; Fraser, 2007). Inferring the generation time distribution of SARS-CoV-2 is important in order to predict the effects of non-pharmaceutical interventions such as contact tracing and quarantine (Ashcroft et al., 2021; Ferretti et al., 2020b; Hart et al., 2021). In addition, the generation time distribution is widely used in epidemiological models for estimating the time-dependent reproduction number from case notification data (Abbott et al., 2020; Fraser, 2007; Gostic et al., 2020; Thompson et al., 2020) and is crucial for understanding the relationship between the reproduction number and the epidemic growth rate (Fraser, 2007; Parag et al., 2021; Park et al., 2020a; Wallinga and Lipsitch, 2007).

The SARS-CoV-2 generation time distribution has previously been estimated using data from known infector-infectee transmission pairs (Ferretti et al., 2020a; Ferretti et al., 2020b; Hart et al., 2021) or entire clusters of cases (Ganyani et al., 2020; Hu et al., 2021; Sun et al., 2021). These studies involved data (Cheng et al., 2020; Ferretti et al., 2020b; Ganyani et al., 2020; He et al., 2020; Xia et al., 2020; Zhang et al., 2020) collected between December 2019 and April 2020, almost entirely from countries in East and Southeast Asia (with the exception of four transmission pairs from Germany and four from Italy in Ferretti et al., 2020b). Evidence from January and February 2020 in China suggested a temporal reduction in the mean generation time due to non-pharmaceutical interventions (Sun et al., 2021). Specifically, effective isolation of infected individuals is likely to have reduced the proportion of transmissions occurring when potential infectors were in the later stages of infection, thereby shortening the generation time (Sun et al., 2021). Similarly, two other studies found a decrease in the serial interval (the difference between symptom onset times of an infector and infectee; Ali et al., 2020) and an increase in the proportion of presymptomatic transmissions (Bushman et al., 2021) in China over the same time period, which can be attributed to symptomatic hosts being isolated increasingly quickly over time.

Despite estimation of the SARS-CoV-2 generation time in Asia early in the pandemic, relatively little is known about the generation time distribution outside Asia, and whether or not any changes have occurred in the generation time since the early months of the pandemic. At the time of writing, we are aware of only one previous study in which the generation time was estimated using data from the UK (Challen et al., 2021). In that study (Challen et al., 2021), data describing symptom onset dates for 50 infector-infectee pairs, collected by Public Health England (PHE; now the UK Health Security Agency) between January and March 2020 as part of the ‘First Few Hundred’ case protocol (Boddington et al., 2021; Public Health England, 2020), were used to infer the generation time distribution. However, since these transmission pairs mostly consisted of international travellers and their household contacts, the authors concluded that their estimates of the generation time may have been biased downwards due to enhanced surveillance and isolation of these cases (Challen et al., 2021).

Here, we use data from a household study (Miller et al., 2021), conducted between March and November 2020, to estimate the SARS-CoV-2 generation time distribution in the UK under two different underlying transmission models. In the first model (the ‘independent transmission and symptoms model’), a parsimonious assumption is made that the generation time and the incubation period of the infector are independent (i.e. there is no link between the times at which infectors transmit the virus and the times at which they develop symptoms), as has often been employed in studies in which the SARS-CoV-2 generation time has been estimated (Challen et al., 2021; Ferretti et al., 2020a; Ganyani et al., 2020; Hart et al., 2021; Lehtinen et al., 2021; Table 1). In the second model (the ‘mechanistic model’), we use a mechanistic approach in which potential infectors progress through different stages of infection, first becoming infectious before developing symptoms (Hart et al., 2021). Infectiousness is therefore explicitly linked to symptoms in the mechanistic model. A feature of the mechanistic model is that individuals with longer incubation periods will (on average) be infectious for longer before developing symptoms, and so generate more transmissions, compared to those with shorter incubation periods.

By fitting separately to data from three different time intervals within the study period, we explore whether or not there was a detectable temporal change in the generation time distribution.

Table 1
Previous SARS-CoV-2 generation time estimates.

Estimates of the mean and standard deviation of the generation time distribution, obtained under the assumption of independent transmission and symptoms. 95% credible intervals are shown in brackets where available.

Study | Location | Time period | Mean generation time (days) | Standard deviation of generation time distribution (days)
Ferretti et al., 2020b | Various | December 2019-February 2020 | 5.0 | 1.9
Ganyani et al., 2020 | Singapore | January-February 2020 | 5.20 (3.78–6.78) | 1.72 (0.91–3.93)
Ganyani et al., 2020 | China | January-February 2020 | 3.95 (3.01–4.91) | 1.51 (0.74–2.97)
Hart et al., 2021 | Various | December 2019-March 2020 | 5.57 (5.08–6.09) | 2.32 (1.83–2.91)
Ferretti et al., 2020a | Various | December 2019-March 2020 | 5.5 | 1.8
Challen et al., 2021 | UK | January-March 2020 | 4.8 (4.3–5.41) | 1.7 (1.0–2.6)

Results

Inferring the generation time from UK household data

We fitted two models of infectiousness (the independent transmission and symptoms model and the mechanistic model) to data collected from 172 UK households in a study (Miller et al., 2021) conducted by PHE between March and November 2020 (Figure 1—source data 1). Each household was recruited to the study following a confirmed SARS-CoV-2 infection, and all household members were then followed to investigate whether or not they became infected (this was determined through PCR and antibody testing). If a household member was infected and developed symptoms, their symptom onset date was recorded (see Methods).

In our previous work (Hart et al., 2021), we fitted the same two models of infectiousness to data from infector-infectee transmission pairs collected in the early months of the COVID-19 pandemic. Here, we adapted the approach presented in that article (Hart et al., 2021) in order to estimate the generation time using household transmission data. Specifically, we used data augmentation MCMC, augmenting the observed data with both estimated times of infection and estimated precise times at which symptomatic infected hosts developed symptoms (within recorded symptom onset dates). This enabled us (in the likelihood function) to account for uncertainty about exactly who-infected-whom within a household by summing together likelihood contributions corresponding to infection by different possible infectors. In addition, we corrected for the regularity of household contacts to derive more widely applicable estimates of the generation time. We did this by including a factor in the likelihood to account for each infected individual avoiding infection from household contacts that occurred prior to their actual time of infection (see Methods for full details of our approach).
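The two ingredients described above (summing over possible infectors, and a factor for having avoided infection up to the actual infection time) can be illustrated with a minimal sketch. It is a simplified illustration rather than the likelihood used in the study: it assumes a lognormal generation time density, a single per-pair infectiousness `beta`, and known infection times, and every function and parameter name is hypothetical.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Lognormal density at time t (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))


def lognormal_cdf(t, mu, sigma):
    """Lognormal cumulative distribution at time t (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    return 0.5 * (1 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2))))


def log_likelihood_contribution(t_infected, infector_times, beta, mu, sigma):
    """Log-likelihood contribution of one infected household member.

    The hazard term sums over every possible infector (who-infected-whom is
    unknown); the cumulative term accounts for escaping infection from all of
    them up to t_infected. beta is an assumed per-pair infectiousness.
    """
    hazard = sum(
        beta * lognormal_pdf(t_infected - t_i, mu, sigma) for t_i in infector_times
    )
    cumulative = sum(
        beta * lognormal_cdf(t_infected - t_i, mu, sigma) for t_i in infector_times
    )
    if hazard <= 0:
        return float("-inf")  # nobody could have infected this individual
    return math.log(hazard) - cumulative
```

In a data augmentation scheme, a term of this kind would be evaluated for the currently proposed (augmented) infection times at each MCMC step; household members infected after `t_infected` contribute nothing because the density and cumulative distribution are zero for non-positive arguments.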

For the two fitted models, we calculated posterior estimates of the mean (Figure 1A) and standard deviation (Figure 1B) of the generation time distribution, in addition to the proportion of transmissions occurring prior to symptom onset (among infectors who develop symptoms; Figure 1C) and the overall infectiousness parameter, β0 (see Methods; Figure 1D). Under the commonly used independent transmission and symptoms model, we obtained a point estimate of 4.2 days (95% credible interval (CrI) 3.3–5.3 days) for the mean generation time (Figure 1A, blue violin; we calculated point estimates for each model using the posterior means of fitted model parameters because the mode of the joint posterior distribution could not easily be calculated from the output of the MCMC procedure). This value is similar to a previous estimate obtained using data from China by Ganyani et al., 2020. It is slightly lower than estimates for Singapore obtained by Ganyani et al., 2020 and for several countries (predominantly in Asia) obtained by Ferretti et al., 2020b (Table 1), although those estimates lie within our credible interval. On the other hand, our estimated standard deviation of 4.9 days (95% CrI 3.0–8.3 days; Figure 1B, blue violin) is substantially higher than previous estimates (Table 1). Using our mechanistic model, we obtained a higher estimate for the mean generation time of 5.9 days (95% CrI 5.2–7.0 days; Figure 1A, red violin), and a similar estimate for the standard deviation (4.8 days, 95% CrI 4.0–6.3 days; Figure 1B, red violin), compared to those predicted by the independent transmission and symptoms model.

Figure 1 (with 12 supplements)
Comparison of posterior predictions.

Violin plots indicating posterior distributions of the mean (A) and standard deviation (B) of the generation time distribution, proportion of transmissions occurring prior to symptom onset (among infectors who develop symptoms; C), and overall infectiousness parameter, β0 (describing the expected number of household transmissions generated by a single infected host in a large, otherwise entirely susceptible, household; D). We show results obtained both using a model in which infectiousness is assumed to be independent of when symptoms develop (‘independent transmission and symptoms model’, blue), and using the mechanistic model from Hart et al., 2021 in which infectiousness is explicitly linked to symptoms (‘mechanistic model’, red).

Figure 1—source data 1

Household transmission data.

The transmission data from 172 households used in our analyses.

https://cdn.elifesciences.org/articles/70767/elife-70767-fig1-data1-v2.xlsx

The two models gave similar posterior distributions for the proportion of transmissions prior to symptom onset (Figure 1C). Specifically, point estimate values of model parameters led to an estimated proportion of transmissions prior to symptom onset of 0.72 (95% CrI 0.63–0.80) for the independent transmission and symptoms model, and 0.73 (95% CrI 0.61–0.83) for the mechanistic model. These estimates are higher than obtained in some previous studies in which the infectiousness profile of SARS-CoV-2 infected hosts at each time since infection and/or time since symptom onset has been estimated (Ashcroft et al., 2020; Ferretti et al., 2020a; He et al., 2020). On the other hand, our point estimates for the two models both lie within the 95% credible interval obtained for the mechanistic model in our previous work (0.53–0.77, point estimate 0.65; Hart et al., 2021). Similar or higher estimates also exist in the wider literature (Casey-Bryars et al., 2021; Ganyani et al., 2020; Tindale et al., 2020).

Posterior distributions for fitted model parameters are shown in Figure 1—figure supplement 1 and Figure 1—figure supplement 2, and point estimates and 95% credible intervals are given in Appendix 1—table 2 and Appendix 1—table 3. Since only the likelihood with respect to augmented data was calculated in the MCMC procedure, direct comparisons of the goodness of fit between the models were not readily available. However, comparing model predictions of the distribution of the interval between successive symptom onset dates in households to the analogous distribution in the data indicated that both models provided a similar fit to the data (Figure 1—figure supplement 3).

In Figure 1 (and elsewhere, unless otherwise stated), we characterise the generation time distribution assuming that a constant supply of susceptible individuals are available to infect during the course of infection. This distribution corresponds to the normalised expected infectiousness profile of an infected host at each time since infection, and is widely applicable to transmission outside of, as well as within, households. However, realised household generation times are expected to be shorter than the estimates shown in Figure 1. This is due to the depletion of susceptible household members before longer generation times can be obtained, especially in small households (Cauchemez et al., 2009; Fraser, 2007; Park et al., 2020a). As a result, we also predicted the mean and standard deviation of realised generation times within the study households (Figure 1—figure supplement 4A,B), accounting for the precise distribution of household sizes in the study. For both the independent transmission and symptoms model and the mechanistic model, the mean (point estimates 3.6 days and 4.9 days for the two models, respectively) and standard deviation (3.8 days and 4.1 days) of realised household generation times were lower than our main generation time estimates shown in Figure 1. Since household transmission typically occurs earlier in the infector’s course of infection than indicated by the estimates shown in Figure 1, we predicted a higher proportion of presymptomatic transmissions within the study households (Figure 1—figure supplement 4C) compared to the estimates in Figure 1C.

For both models, we then used point estimates of fitted model parameters to infer the distributions of the generation time (Figure 2A), the time from onset of symptoms to transmission (TOST; Figure 2B) and the serial interval (Figure 2C). The TOST distribution (which characterises the relative expected infectiousness of a host (who develops symptoms) at each time from symptom onset, as opposed to from infection [Ashcroft et al., 2020; Ferretti et al., 2020a; He et al., 2020; Lehtinen et al., 2021; Wells et al., 2021]) obtained using the mechanistic model was more concentrated around the time of symptom onset compared to that predicted assuming independent transmission and symptoms (Figure 2B), as we found in our previous work (Hart et al., 2021). In contrast, the estimated serial interval distributions were similar for the two models (Figure 2C). The means and standard deviations of the distributions shown in Figure 2 are given in Appendix 1—table 4.

Figure 2
Generation time, TOST and serial interval distributions.

Inferred generation time (A), TOST (B) and serial interval (C) distributions for the two models, obtained using point estimate (posterior mean) parameters. The means and standard deviations of these distributions are given in Appendix 1—table 4. Similarly to Hart et al., 2021, the discontinuity in the red curve in (B) occurs because different transmission rates were fitted for infectors in the presymptomatic infectious (P) and symptomatic infectious (I) stages of infection. The reduction in transmission following symptom onset can be attributed to changes in behaviour in response to symptoms (Manfredi and D’Onofrio, 2013).

Temporal variation in the generation time distribution

To explore whether or not the generation time distribution changed during the study period, we separately fitted the independent transmission and symptoms model to the data from households in which the index case was recruited in (i) March-April, (ii) May-August, or (iii) September-November 2020 (Figure 3). We chose these time periods to ensure the numbers of households recruited into the study during each interval were similar (Figure 3—figure supplement 1).

Figure 3 (with 6 supplements)
Temporal changes in the generation time.

Violin plots indicating posterior distributions of the mean (A) and standard deviation (B) of the generation time distribution, proportion of transmissions occurring prior to symptom onset (among infectors who develop symptoms; C), and overall infectiousness parameter, β0 (D), for the independent transmission and symptoms model fitted to data from March-April (blue), May-August (red), or September-November 2020 (orange).

The results shown in Figure 3A suggest a shorter mean generation time in September-November 2020 (2.9 days, 95% CrI 1.8–4.3 days) compared to earlier months (4.9 days, 95% CrI 3.6–6.3 days, for March-April and 5.2 days, 95% CrI 3.4–7.2 days, for May-August). Comparing the posterior estimates for May-August and September-November (the red and orange violins in Figure 3A, respectively) indicated a 97% posterior probability of a shorter mean generation time in the later of these two time periods. A similar temporal reduction in the mean generation time was found when we instead fitted the mechanistic model to the data from the three time intervals (Figure 3—figure supplement 2). Estimates of the mean generation time using the mechanistic model were 6.5 days (95% CrI 5.6–8.1 days) for March-April, 7.1 days (95% CrI 5.7–9.6 days) for May-August, and 5.1 days (95% CrI 4.3–6.4 days) for September-November, with a 98% posterior probability of a shorter mean generation time in September-November than May-August. We also used point estimates of model parameters to compare the distributions of the generation time, TOST and serial interval between the time periods (Figure 3—figure supplement 3), with both models indicating that the transmission risk peaked earlier in infection for individuals infected in September-November compared to earlier months (Figure 3—figure supplement 3A,D).
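A posterior probability of this kind can be approximated directly from MCMC output. The sketch below assumes that the two periods were fitted separately, so that their posterior samples can be treated as independent and every pair of samples compared; the article does not spell out the exact calculation, and the function name is hypothetical.

```python
def prob_shorter(samples_later, samples_earlier):
    """Monte Carlo estimate of P(later mean < earlier mean).

    Assumes the two sets of posterior samples come from independent fits,
    so all (later, earlier) pairs are compared and the fraction of pairs in
    which the later value is smaller is returned.
    """
    n_pairs = len(samples_later) * len(samples_earlier)
    if n_pairs == 0:
        raise ValueError("both sample lists must be non-empty")
    count = sum(1 for a in samples_later for b in samples_earlier if a < b)
    return count / n_pairs
```

With real output, `samples_later` and `samples_earlier` would hold the posterior draws of the mean generation time for September-November and May-August, respectively.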

Figure 3C shows posterior estimates for the proportion of transmissions occurring prior to symptom onset (among symptomatic infectors) across the three time periods for the independent transmission and symptoms model, indicating a very high proportion of presymptomatic transmissions in September-November (0.83, 95% CrI 0.72–0.93) compared to lower estimates for March-April (0.64, 95% CrI 0.51–0.77) and May-August (0.62, 95% CrI 0.41–0.79). Our results for the mechanistic model indicate a similar temporal increase in the proportion of presymptomatic transmissions during the study period (Figure 3—figure supplement 2C).

To explore the lower estimated generation time for September-November further, we also fitted the independent transmission and symptoms model to the data from each of these months individually (Figure 3—figure supplement 4). The shorter estimated generation time compared to earlier in the pandemic was consistent across each of the three months (Figure 3—figure supplement 4A). We note that, while the Alpha (B.1.1.7) variant had begun to emerge in the UK by the end of the study period (Public Health England, 2021), genomic surveillance as part of the study showed that this variant caused infections in only two study households. This variant was therefore unlikely to have been responsible for the temporal reduction in the generation time that we observed.

In Figure 3—figure supplement 5, we show the posterior distributions of the fitted parameters for the mechanistic model (other than the overall infectiousness, β0, which is shown in Figure 3D) over the different time periods. These parameters represent the mean duration of the latent period (expressed as a proportion of the mean incubation period; Figure 3—figure supplement 5A), the mean duration of the symptomatic infectious period (Figure 3—figure supplement 5B), and the relative infectiousness of presymptomatic infectious hosts compared to those with symptoms (Figure 3—figure supplement 5C). However, there was substantial overlap in the credible intervals of posterior estimates of each parameter between the three time periods. We were therefore unable to identify the precise parameter(s) responsible for the decrease in generation time and increase in the proportion of presymptomatic transmissions that we observed.

Sensitivity analyses

When we fitted the two models to the household transmission data, we assumed that each household transmission chain was initiated by a single primary case and all other infected household members were infected from within the household. However, we also extended our framework to account for the possibility of co-primary cases (Appendix 1, Figure 1—figure supplement 5 and Figure 3—figure supplement 6). This led to slightly higher estimates of the mean generation time (Figure 1—figure supplement 5A) under each model compared to the corresponding estimates shown in Figure 1A, with point estimates of 4.8 days (95% CrI 3.6–6.3 days) for the independent transmission and symptoms model and 6.8 days (95% CrI 5.7–8.6 days) for the mechanistic model. Estimates of the standard deviation of the generation time distribution were similar to those in Figure 1 (Figure 1—figure supplement 5B); point estimates were 4.8 days (95% CrI 2.9–7.9 days) for the independent transmission and symptoms model and 5.1 days (95% CrI 4.0–6.9 days) for the mechanistic model. As part of the fitting procedure, we estimated the probability that each household member was infected during the primary transmission event (Figure 1—figure supplement 5E), obtaining point estimates of 0.17 (95% CrI 0.02–0.33) under the independent transmission and symptoms model and 0.27 (95% CrI 0.10–0.41) under the mechanistic model. We also repeated the analyses in Figure 3 but accounting for the possibility of co-primary cases (Figure 3—figure supplement 6). Our main qualitative finding remained unchanged: the mean generation time was found to decrease during the study period (Figure 3—figure supplement 6A).

In the independent transmission and symptoms model, we assumed that both the generation time and incubation period follow lognormal distributions. The mean and standard deviation of the generation time distribution were estimated by fitting the model to the household transmission data. In the fitting procedure, we assumed that the incubation period followed a lognormal distribution that was obtained in a previous meta-analysis (McAloon et al., 2020). In contrast, we assumed in our mechanistic approach that each infection could be decomposed into three gamma distributed stages (latent, presymptomatic infectious and symptomatic infectious), so that the incubation period was also gamma distributed (with the same mean and standard deviation as the lognormal distribution obtained by McAloon et al., 2020). An expression for the generation time distribution in the mechanistic model, which does not take a simple parametric form, is given in the Appendix. However, we conducted supplementary analyses in which we instead assumed that either the generation time (Figure 1—figure supplement 6) or incubation period (Figure 1—figure supplement 7) in the independent transmission and symptoms model was gamma distributed. In both cases, we obtained similar results to those shown for that model in Figure 1.
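Because both distribution families are specified here through their mean and standard deviation, it may help to see how those two moments map onto the usual parameters. The sketch below uses standard moment-matching formulas; the function names are hypothetical, and the input values in the usage note are simply the point estimates quoted in this article, used for illustration.

```python
import math


def lognormal_params(mean, sd):
    """Return (mu, sigma) of a lognormal distribution with the given mean and sd."""
    sigma_sq = math.log(1 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma_sq
    return mu, math.sqrt(sigma_sq)


def gamma_params(mean, sd):
    """Return (shape, scale) of a gamma distribution with the given mean and sd."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale
```

For example, `lognormal_params(4.2, 4.9)` gives the log-scale parameters of a lognormal generation time distribution with the mean and standard deviation estimated under the independent transmission and symptoms model, and `gamma_params` applied to the same two moments gives the matching gamma distribution used in the sensitivity analysis.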

We also relaxed the assumption of a fixed incubation period distribution (Figure 1—figure supplement 8), using the confidence intervals obtained by McAloon et al., 2020 to account for uncertainty in the incubation period distribution (Figure 1—figure supplement 8A, B). For both the independent transmission and symptoms model and the mechanistic model, accounting for this uncertainty did not substantially affect posterior estimates of either the mean (Figure 1—figure supplement 8C) or the standard deviation (Figure 1—figure supplement 8D) of the generation time distribution.

In our main analyses, we assumed that household transmission was frequency-dependent, so that the force of infection exerted by an infected household member on each susceptible household member scales with 1/n, where n is the household size (Cauchemez et al., 2014; Cauchemez et al., 2004). However, since some studies of influenza virus transmission in households have found transmission to lie somewhere in between frequency- and density-dependent (Endo et al., 2019; Ferguson et al., 2005), we also considered alternative possibilities where infectiousness scales with n^(-ρ), for different values of ρ. In Figure 1—figure supplement 9A-C, we compared estimates under our baseline value of ρ=1 (frequency-dependent transmission) with those obtained assuming either ρ=0 (density-dependent transmission) or the intermediate possibility of ρ=0.5 considered by Endo et al., 2019. In addition, we conducted an analysis in which the dependency, ρ, was estimated alongside other model parameters (Figure 1—figure supplement 9D). We found that our estimates of the mean and standard deviation of the generation time distribution were robust to the assumed value of ρ (Figure 1—figure supplement 9A, B). Moreover, when the value of ρ was fitted (Figure 1—figure supplement 9D), we estimated a value of 1.0 (95% CrI 0.6–1.5). This supported our assumption of frequency-dependent transmission, although the credible interval was relatively wide. In addition, we considered the possibility that infectiousness instead scales with 1/(n-1), so that the infector under consideration is not included in this scaling, and again obtained similar estimates of the mean and standard deviation of the generation time distribution compared to those shown in Figure 1 (Figure 1—figure supplement 10).
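The household-size scaling can be written as a one-line function. This is a minimal sketch in which the function name and the `exclude_infector` flag are hypothetical; it only reproduces the scaling factors described in the text, not the full force of infection.

```python
def household_scaling(n, rho=1.0, exclude_infector=False):
    """Factor by which per-pair infectiousness scales with household size n.

    rho = 1 gives frequency-dependent transmission (1/n), rho = 0 gives
    density-dependent transmission (no scaling), and 0 < rho < 1 is
    intermediate. With exclude_infector=True the infector is left out of the
    count, so the factor is based on n - 1 rather than n.
    """
    size = n - 1 if exclude_infector else n
    if size < 1:
        raise ValueError("household size used for scaling must be at least 1")
    return size ** (-rho)
```

Under these assumptions, a household of four gives a factor of 0.25 for ρ=1, 0.5 for ρ=0.5 and 1 for ρ=0.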

We also considered the sensitivity of our results to the assumed relative infectiousness of asymptomatic infected hosts (Figure 1—figure supplement 11). In most of our analyses, we assumed that the expected infectiousness of an infected host who remained asymptomatic throughout infection was a factor αA=0.35 times that of a host who develops symptoms, at each time since infection (Buitrago-Garcia et al., 2020). However, similar estimates of the mean (Figure 1—figure supplement 11A) and standard deviation (Figure 1—figure supplement 11B) of the generation time distribution were obtained when we instead assumed αA=0.1 or αA=1.27 (these values corresponded to the lower and upper confidence bounds obtained by Buitrago-Garcia et al., 2020). Lower values of αA did lead to slightly higher estimates of the overall infectiousness of infectors who develop symptoms, β0 (Figure 1—figure supplement 11D). However, this effect was minimal, likely because very few cases in the household study were asymptomatic (27 out of 357).

Finally, we explored the robustness of our results to the exclusion of household members of unknown infection status (see Methods), considering the extreme possibilities where these individuals were instead assumed to have either all remained uninfected, or all become infected (Figure 1—figure supplement 12). Although the estimates of β0 were affected by this assumption (Figure 1—figure supplement 12D), the estimated generation time distribution was robust to the assumed infection status of these individuals (Figure 1—figure supplement 12A,B).

Discussion

In this study, we estimated the generation time distribution of SARS-CoV-2 in the UK by fitting two different models to data describing the infection status and symptom onset dates of individuals in 172 households. The first model was predicated on an assumption that transmission and symptoms are independent. While this assumption has often been made in previous studies in which the SARS-CoV-2 generation time has been estimated (Challen et al., 2021; Deng et al., 2021; Ferretti et al., 2020b; Ganyani et al., 2020; Knight and Mishra, 2020), it is not an accurate reflection of the underlying epidemiology (Bacallado et al., 2020; Lehtinen et al., 2021). Therefore, we also considered a mechanistic model based on compartmental modelling, which was shown in our earlier work (Hart et al., 2021) to provide an improved fit to data from 191 SARS-CoV-2 infector-infectee pairs compared to previous models that have been used to estimate the generation time. Here, infection times and the order of transmissions within households were unknown, whereas in Hart et al., 2021 the direction of transmission was assumed to be known for each infector-infectee pair. For that reason, we needed to extend the statistical inference methods underlying our previous work (Hart et al., 2021) to fit the two models to household data. To do this, we used a data augmentation MCMC approach similar to previous studies of household influenza virus transmission (Cauchemez et al., 2009; Cauchemez et al., 2004; Ferguson et al., 2005).

Under the model assuming independent transmission and symptoms, we estimated a mean generation time of 4.2 days (95% CrI 3.3–5.3 days) and a standard deviation of 4.9 days (95% CrI 3.0–8.3 days). The estimate of the mean generation time was comparable to previous estimates obtained under this assumption using data from elsewhere (Ferretti et al., 2020a; Ferretti et al., 2020b; Ganyani et al., 2020; Table 1). On the other hand, while our credible interval for the standard deviation was wide, the estimates obtained in those previous studies (Ferretti et al., 2020a; Ferretti et al., 2020b; Ganyani et al., 2020) all lay below our lower 95% credible limit of 3.0 days. One potential cause of this disparity is the difference in isolation policies for symptomatic hosts between countries. In particular, the UK’s policy of self-isolation at home may be expected to lead to a longer-tailed generation time distribution compared to countries with a policy of isolation outside the home, since under home isolation, some within-household transmission is likely to occur even following isolation. Isolation outside the home was commonplace in the East and Southeast Asian countries where the majority of the data underlying those previous estimates (Ferretti et al., 2020a; Ferretti et al., 2020b; Ganyani et al., 2020) were collected.

Using the mechanistic model, we estimated a longer mean generation time of 5.9 days (95% CrI 5.2–7.0 days) compared to the value estimated under the assumption of independent transmission and symptoms. On the other hand, the inferred serial intervals for the independent transmission and symptoms model and mechanistic model were more similar (Figure 2C), with means of 4.2 days and 4.7 days, respectively. Temporal information in our household transmission data consisted mostly of symptom onset dates, with very few individuals testing positive before developing symptoms. Therefore, the variation in estimates of the generation time between the models can be attributed to differences in the assumed relationships between the generation time and serial interval under those models. For the independent transmission and symptoms model, the generation time and serial interval distributions have the same mean, as is commonly assumed to be the case (Lehtinen et al., 2021). However, this was not true for the mechanistic model, in which infected hosts with longer presymptomatic infectious periods generate (on average) a higher number of transmissions. As a result, under the mechanistic model, a randomly chosen infection is more likely to arise from an infector with a longer incubation period than from a host with a shorter incubation period, thereby leading to a longer generation time than serial interval (an analytical expression for the exact difference between the mean generation time and serial interval for that model is derived in the Appendix).

Our results do not indicate any clear difference in goodness of fit to the data between the two models (Figure 1—figure supplement 3). A range of factors should therefore be considered when deciding which of our estimates of epidemiological parameters to use in subsequent analyses. Although any model requires simplifying assumptions to be made, our mechanistic approach allows the standard assumption of independent transmission and symptoms to be relaxed by providing a mechanistic underpinning to the relationship between the times at which individuals display symptoms and become infectious. Furthermore, as described above, this model was shown in our previous work (Hart et al., 2021) to provide a better fit to an earlier SARS-CoV-2 dataset than a model assuming independence between transmission and symptoms (in our earlier work [Hart et al., 2021], the simpler setting of transmission pairs rather than households facilitated direct model comparison). On the other hand, the independent transmission and symptoms model has the advantage of producing an estimated generation time distribution with a simple parametric form. The choice of estimates to use may also depend on precisely what the estimates are being used for. For example, the generation time distribution inferred under the assumption of independent transmission and symptoms may be better suited for use in some models for estimating the time-dependent reproduction number, since those models often also involve the assumption that transmission and symptoms are independent (Abbott et al., 2020). In contrast, the parameter estimates from our mechanistic approach correspond naturally to parameters in compartmental epidemic models.

By fitting separately to data from three different intervals within the study period (March-November 2020), we investigated whether or not the generation time distribution in the UK changed as the pandemic progressed. Our results indicate a shorter mean generation time in September-November compared to earlier months (Figure 3A). One possible explanation for this is a higher proportion of time spent indoors in colder months leading to an increased transmission risk, particularly in the early stages of infection before symptoms develop (since symptomatic infected hosts are still likely to self-isolate). This explanation is consistent with our finding in Figure 3C of a higher proportion of transmissions occurring prior to symptom onset in September-November compared to March-April and May-August.

While behavioural changes may have been responsible for our finding of a temporal decrease in the generation time, an alternative explanation could be that evolutionary changes in the SARS-CoV-2 virus that occurred during the study period affected the generation time. For example, the B.1.177 lineage emerged in Spain in early summer 2020, and became the dominant SARS-CoV-2 lineage in the UK around the beginning of October 2020 (Vöhringer et al., 2021). Subsequently, the Alpha (B.1.1.7) variant, which was first detected in September 2020, became dominant in the UK in December 2020 (Public Health England, 2021). The Alpha variant has been shown to possess different characteristics from earlier variants (Davies et al., 2021; Volz et al., 2021), causing an increased epidemic growth rate in the UK that has been attributed to an increase in transmissibility of 43%–90% (Davies et al., 2021). While in principle evolutionary changes could explain the variation in the generation time that we observed, sequencing data show that the Alpha variant was responsible for infections in only two households within our dataset. Consequently, the Alpha variant was not responsible for our main finding of a temporally decreasing generation time, and additional data are required to quantify the impact of the emergence of that variant (and subsequent variants, such as the Delta (B.1.617.2) and Omicron (B.1.1.529) variants) on the SARS-CoV-2 generation time.

In data collected from infector-infectee transmission pairs, shorter generation times are expected to be over-represented at times when case numbers are rising (Britton and Scalia Tomba, 2019; Ferretti et al., 2020b; Lehtinen et al., 2021), and vice versa. While we used data from households (rather than transmission pairs) in our analyses, a similar effect may have contributed to our shorter estimated mean generation time for September-November 2020 (national case numbers were mostly increasing in September-October 2020) compared to earlier months of the study (during which case numbers were mostly decreasing; Knock et al., 2021; Pouwels et al., 2021). However, we estimated the mean generation time to be similar in November (when case numbers were mostly decreasing [Knock et al., 2021; Pouwels et al., 2021]) compared to September and October (Figure 3—figure supplement 4), suggesting that this effect of background epidemic dynamics alone did not drive the temporal changes in generation time that we observed. We note, however, that sample sizes for individual months were small (Figure 3—figure supplement 1). Extending our household inference framework to explicitly account for background epidemic dynamics in generation time estimates (similar to methods that have been developed for transmission pair data [Britton and Scalia Tomba, 2019; Ferretti et al., 2020b]) is an avenue for future work.

Our finding of a temporal decrease in the mean generation time during the study period highlights the importance of obtaining up-to-date generation time estimates specific to the location under study. Should variations in the generation time distribution occur and not be accounted for, estimates of the time-dependent reproduction number may be incorrect (Park et al., 2021; Wallinga and Lipsitch, 2007). Specifically, if the mean generation time is shorter than assumed, then the true value of the time-dependent reproduction number is likely to be closer to one than the inferred value (Wallinga and Lipsitch, 2007), and vice versa.
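The direction of this bias can be illustrated with the relationship R = 1/M(-r) between the epidemic growth rate, r, and the reproduction number, where M is the moment generating function of the generation time distribution (Wallinga and Lipsitch, 2007). The sketch below is an illustration only and was not part of our analysis: it assumes a gamma distributed generation time, uses hypothetical growth rates, and borrows the two mean generation times estimated under our two models purely as example inputs with a common standard deviation.

```python
def reproduction_number(growth_rate, gt_mean, gt_sd):
    """R = 1 / M(-r) for a gamma-distributed generation time (an illustrative assumption)."""
    shape = (gt_mean / gt_sd) ** 2   # gamma shape from mean and standard deviation
    scale = gt_sd ** 2 / gt_mean     # gamma scale from mean and standard deviation
    if 1 + growth_rate * scale <= 0:
        raise ValueError("growth rate too negative for this generation time distribution")
    return (1 + growth_rate * scale) ** shape


# Hypothetical growth rates (per day), one for a growing and one for a declining epidemic.
for r in (0.05, -0.05):
    r_long = reproduction_number(r, gt_mean=5.9, gt_sd=4.8)
    r_short = reproduction_number(r, gt_mean=4.2, gt_sd=4.8)
    print(f"r = {r:+.2f}: R = {r_long:.2f} (mean 5.9 d), R = {r_short:.2f} (mean 4.2 d)")
```

For both growth rates, the shorter mean generation time gives a reproduction number closer to one, so assuming too long a generation time moves the inferred value away from one.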

One advantage of our approach compared to previous studies in which the SARS-CoV-2 generation time has been estimated (Ferretti et al., 2020a; Ganyani et al., 2020; Hart et al., 2021) is that we were able to include the contribution of asymptomatic infected hosts to household transmission chains in our analyses. We showed that our estimated generation time distribution was robust to the assumed relative infectiousness of infected hosts who remain asymptomatic, αA (Figure 1—figure supplement 11). Similarly, while we assumed frequency-dependent household transmission in most of our analyses, we found that the exact relationship between the household size and transmission had little effect on our estimates of the mean and standard deviation of the generation time distribution (Figure 1—figure supplement 9 and Figure 1—figure supplement 10). We also considered estimating the exponent governing the dependency of transmission on household size (Figure 1—figure supplement 9D). This supported our assumption of frequency-dependent transmission, and is consistent with the finding of an inverse relationship between household size and secondary attack rate in the household study underlying our analyses (Miller et al., 2021). In previous studies of influenza transmission within households, evidence has been found both in favour of (Cauchemez et al., 2004) and against (Endo et al., 2019) frequency-dependent transmission.

While our generation time estimates were robust to the assumed relative infectiousness of infected hosts who remain asymptomatic and whether transmission was assumed to be frequency- or density-dependent, extending our approach to account for the possibility that household transmission chains originate with multiple co-primary cases led to slightly higher estimates of the generation time (Figure 1—figure supplement 5) compared to our main estimates (Figure 1). Despite the overall higher estimated generation time, our main qualitative finding of a temporal decrease in the generation time held when co-primary cases were incorporated (Figure 3—figure supplement 6).

Like any mathematical modelling study, our approach has some limitations. We used household data in our analyses, whereas some characteristics of wider community transmission may differ from those of transmission within households. However, we corrected for the regularity of household contacts to estimate the (expected) infectiousness profile of an infected host at each time since infection (accounting for behavioural factors), which provides a widely applicable generation time estimate (Figure 1). Specifically, the infectiousness profile gives the generation time distribution under the assumption that a constant supply of susceptible individuals is available throughout the course of infection. This distribution can then be conditioned to specific population structures, as we demonstrated by estimating the realised generation time distribution within the study households (Figure 1—figure supplement 4). The household generation time estimates shown in Figure 1—figure supplement 4 are shorter than our main generation time estimates (Figure 1), due to the regularity of household contacts and the depletion of susceptible individuals within households before longer generation times can be realised.

We also note that, while our dataset involved a larger sample size than used in most other studies in which the SARS-CoV-2 generation time was estimated (Ferretti et al., 2020a; Ferretti et al., 2020b; Ganyani et al., 2020; Hart et al., 2021), the demographics of the study households may not have been completely representative of the wider population. Exploring heterogeneity in the generation time distribution between individuals and/or households with different characteristics is an important topic for future work. This could involve, but is not limited to, estimating the generation time distribution for individuals of different age, sex, ethnicity, and socio-economic status. Nonetheless, as well as providing updated SARS-CoV-2 generation time estimates, our study demonstrates that changes in the generation time can be detected using data from household studies. Our finding that the generation time has become shorter highlights both the importance of continued monitoring of the generation time and the role of household studies in such monitoring efforts, particularly in light of the more recent emergence of novel SARS-CoV-2 variants.

In summary, we have inferred the SARS-CoV-2 generation time distribution in the UK using household data and two different transmission models. A key output of this research is one of the first estimates of the SARS-CoV-2 generation time outside Asia. Another crucial feature of our analysis is that it was based on data from beyond the first few months of the pandemic. Since this research suggests that the generation time may be changing, continued data collection and analysis is of clear importance.

Methods

Data

Data were obtained from a household study (Miller et al., 2021) conducted in 172 UK households (with 603 household members in total) by Public Health England (PHE) between March and November 2020 (Figure 1—source data 1). In each household, an index case was recruited following a positive PCR test. The following were then recorded for each household member:

  • The timing and outcome of (up to) two subsequent PCR tests.

  • The outcome of an antibody test (carried out for 541 individuals – 90% of the study cohort).

  • Whether or not the household member developed symptoms.

  • The date of symptom onset (only for symptomatic individuals with a positive PCR or antibody test).

In the study, all household members who tested positive in either a PCR or antibody test were assumed to have been infected. Conversely, all individuals who tested negative for antibodies and did not return a positive PCR test (i.e. the two PCR tests were either negative or were not carried out) were assumed to have remained uninfected, irrespective of symptom status. For 34 individuals (6% of the study cohort), no antibody test was carried out and any PCR tests were negative. Since the available data were considered insufficient to determine whether or not these 34 individuals were infected, these individuals were excluded from our main analyses (but were counted in the household size), although we also considered the sensitivity of our results to this assumption.

In two households, at least one household member developed symptoms 55–56 days prior to the symptom onset date of the index case, with no other household members developing symptoms (or returning a positive PCR or antibody test) between these dates. In contrast, the maximum gap between successive symptom onset dates in the remaining households was 25 days (Figure 1—figure supplement 3). Data from these two households were excluded from our analyses, on the basis that the virus was most likely introduced multiple times into these households. Three other households were also excluded from our analyses because, other than the index cases in each household, all other household members were of unknown infection status (i.e. they were among the individuals for whom no antibody test was carried out and any PCR tests were negative).

Overall, aside from the five excluded households, the 167 remaining households comprised 587 individuals, of whom 330 became infected and developed symptoms, 27 became infected but remained asymptomatic, 200 remained uninfected, and the remaining 30 were of unknown infection status. The number of households and individuals recruited into the study by month is shown in Figure 3—figure supplement 1.

Models

General modelling framework

Throughout, we denote the expected force of infection exerted by an infected host onto each susceptible member of their household, at time since infection τ, by β(τ), where we assumed

$$\beta(\tau) = (\beta_0/n)\,f(\tau),$$

for a host who develops symptoms, and

$$\beta(\tau) = \alpha_A\,(\beta_0/n)\,f(\tau),$$

for a host who remains asymptomatic throughout infection. Here:

  • β0 is the overall infectiousness parameter, describing the expected number of household transmissions generated by a single infected host (who develops symptoms) in a large, otherwise entirely susceptible, household.

  • n is the household size. The scaling of β(τ) with 1/n corresponds to frequency-dependent transmission, as assumed by Cauchemez et al., 2014; Cauchemez et al., 2004, although we carried out a sensitivity analysis in which we considered alternative possibilities where household transmission is density-dependent (without the scaling factor 1/n), scales with $1/n^{0.5}$ (Endo et al., 2019), or scales with $1/(n-1)$.

  • f(τ) is the generation time distribution (which was assumed to be the same for entirely asymptomatic hosts as those who develop symptoms).

  • αA is the relative infectiousness of infected hosts who remain asymptomatic throughout infection. We assumed a value of 0.35 (Buitrago-Garcia et al., 2020) in most of our analyses, although we considered different values of αA in a sensitivity analysis.

Except where otherwise stated, we considered the generation time distribution assuming a constant supply of susceptibles during infection, f(τ), which corresponds to the normalised expected infectiousness profile and gives a widely applicable generation time estimate (see Discussion). However, realised generation times within a household may be shorter than predicted by this distribution due to the depletion of susceptible household members before longer generation times can be realised (Cauchemez et al., 2009; Fraser, 2007; Park et al., 2020b). For example, if infected hosts are (on average) equally infectious at two times since infection, τ1<τ2, then f(τ1)=f(τ2). However, because the number of susceptible household members may decrease between these two times (i.e. either the host under consideration, or another infected household member, may transmit the virus within the household in the intervening time), then transmission is in fact more likely to occur in a household at the earlier time, τ1, when more susceptibles are available. Therefore, we also predicted the mean and standard deviation of realised generation times within the study households in Figure 1—figure supplement 4.

We considered two different models of infectiousness, which are outlined below. Under each model, expressions were derived in Hart et al., 2021 for the generation time, TOST and serial interval distributions, in addition to the proportion of transmissions occurring before symptom onset. These expressions are given in the Appendix here (other than the generation time distribution and proportion of presymptomatic transmissions for the independent transmission and symptoms model, which are stated below).

Independent transmission and symptoms model

In this model, the infectiousness of an infected host (who does not remain asymptomatic throughout infection; asymptomatic infected hosts are considered separately) at a given time since infection, τ, is assumed to be independent of exactly when the host develops symptoms – that is, the generation time and incubation period are independent. In our main analyses using this model, we assumed that the generation time distribution, f(τ), is the probability density function of a lognormal distribution (Ferguson et al., 2005; an alternative case of a gamma distributed generation time is considered in Figure 1—figure supplement 6). The mean and standard deviation of this distribution, in addition to β0, were estimated when we fitted the model to the household transmission data.

Under the assumption of independent transmission and symptoms, the proportion of transmissions occurring prior to symptom onset (among infectors who develop symptoms) is given by (Ferretti et al., 2020b; Fraser et al., 2004)

$$\int_0^{\infty} f(\tau)\,\bigl(1 - F_{\mathrm{inc}}(\tau)\bigr)\,\mathrm{d}\tau,$$

where Finc is the cumulative distribution function of the incubation period (this was assumed to be known; the exact incubation period distribution we used is given under ‘Parameter estimation’ below).
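As a minimal numerical sketch of this integral (not the code used in our analysis), the example below evaluates it with a simple midpoint rule. It assumes lognormal generation time and incubation period distributions, and plugs in the point estimates of the generation time mean and standard deviation (4.2 and 4.9 days) together with the assumed incubation period (mean 5.8 days, standard deviation 3.1 days). The function names, integration grid and upper limit are illustrative choices. Because the integral equals the probability that an independent generation time is shorter than the incubation period, a closed form is available for two lognormals and is used as a check.

```python
import math
from statistics import NormalDist


def lognormal_params(mean, sd):
    """Convert a mean and standard deviation to lognormal (mu, sigma)."""
    sigma2 = math.log(1 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


def lognormal_pdf(x, mu, sigma):
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def lognormal_cdf(x, mu, sigma):
    return NormalDist(mu, sigma).cdf(math.log(x)) if x > 0 else 0.0


def presymptomatic_proportion(gt, inc, upper=100.0, steps=20_000):
    """Midpoint-rule approximation of the integral of f(tau) * (1 - F_inc(tau))."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h
        total += lognormal_pdf(tau, *gt) * (1 - lognormal_cdf(tau, *inc))
    return total * h


gt = lognormal_params(4.2, 4.9)    # point estimates of the generation time mean and SD
inc = lognormal_params(5.8, 3.1)   # assumed incubation period mean and SD
numerical = presymptomatic_proportion(gt, inc)
# For two independent lognormals: P(generation time < incubation period).
closed_form = NormalDist().cdf((inc[0] - gt[0]) / math.hypot(gt[1], inc[1]))
```

Evaluating the expression at point estimates in this way is not the same as summarising its posterior distribution, which requires evaluating it for each posterior sample.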

Mechanistic model

Under the mechanistic model (Hart et al., 2021), infectors who develop symptoms progress through independent latent (E), presymptomatic infectious (P) and symptomatic infectious (I) stages of infection. We assumed the duration of each stage to be gamma distributed, and infectiousness was assumed to be constant during each stage. Under these assumptions, an expression can be derived for the expected infectiousness, $\beta(\tau \mid \tau_{\mathrm{inc}})$, of a host (who develops symptoms) at each time since infection $\tau$, conditional on their incubation period $\tau_{\mathrm{inc}}$. We assumed that entirely asymptomatic infected hosts follow the same stage progression as those who develop symptoms, although in this case the distinction between the P and I stages has no epidemiological meaning. Details of the mechanistic approach, including the formula for $\beta(\tau \mid \tau_{\mathrm{inc}})$, are provided in the Appendix.

When we fitted this model to the household transmission data, three model parameters were estimated in addition to β0. These parameters correspond to:

  • The ratio between the mean latent (E) period and the mean incubation (combined E and P) period (where the latter was assumed to be known).

  • The mean symptomatic infectious (I) period.

  • The ratio between the transmission rates when potential infectors are in the P and I stages.

Likelihood function

Here, we consider a household of size $n$, in which $n_I$ household members become infected (of whom $n_S$ develop symptoms and $n_A$ remain asymptomatic throughout infection) and $n_U = n - n_I$ remain uninfected. We derive an expression for the likelihood of the parameters of either model of infectiousness, given the entire sequence of infection times of individuals in the household ($t_1 < \dots < t_{n_I}$) as well as the precise symptom onset time ($t_{s,j}$) of each host, $j$, who develops symptoms. In the case of the mechanistic model, the likelihood also depends on the times at which entirely asymptomatic infected hosts enter the I stage of infection (these times are also denoted by $t_{s,j}$, although for asymptomatic infected individuals these times have no epidemiological meaning). Since exact infection times were not available within study households, and it was unknown exactly when each symptomatic infected host developed symptoms within their recorded symptom onset date, we used data augmentation MCMC to fit the two models to the UK household transmission data using this likelihood function (see further details below).

When deriving the likelihood, we made several simplifying assumptions:

We denote the expected infectiousness of household member $j$, at time $\tau$ since infection, by $\beta_j(\tau)$. For the mechanistic model in which transmission and symptoms are not independent, this infectiousness is conditional on the duration of the incubation period, $t_{s,j} - t_j$, for a host who develops symptoms (the infectiousness is also conditional on $t_{s,j} - t_j$ for an entirely asymptomatic infected host, although this interval has no epidemiological meaning for such individuals). The total (instantaneous) force of infection exerted at time $t$ on each susceptible household member is then

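$$\lambda(t) = \sum_{j=1}^{n_I} \beta_j(t - t_j),$$

where $\beta_j(t - t_j) = 0$ for $t \le t_j$, and the cumulative force of infection is

$$\Lambda(t) = \int_{-\infty}^{t} \lambda(s)\,\mathrm{d}s.$$

For $k = 2, \dots, n_I$, conditional on the sequence of infection times up to time $t_k$, the probability that host $k$ becomes infected at time $t_k$ is given by

$$\lambda(t_k)\,\exp(-\Lambda(t_k)),$$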

where $\exp(-\Lambda(t_k))$ represents the probability of host $k$ avoiding infection from household contacts that occurred before their actual time of infection, $t_k$ (Cauchemez et al., 2004; Ferguson et al., 2005). This factor, which was not included in the likelihood when we previously estimated the generation time using data from infector-infectee transmission pairs (Hart et al., 2021), is required here because of the regularity of household contacts. Since household contacts occur frequently, it is necessary to account explicitly for contacts between infected and susceptible individuals that did not lead to transmission. The inclusion of this factor in the likelihood therefore corrects for the regularity of household contacts to ensure widely applicable generation time estimates (note that this factor is equal to one in the limit of a very small overall household infectiousness parameter, $\beta_0$).

For $k = n_I + 1, \dots, n$, conditional on the entire sequence of infection times, $t_1, \dots, t_{n_I}$, the probability of host $k$ never being infected is given by $\exp(-\Lambda(\infty))$. In the case of independent transmission and symptoms, we have

$$\exp(-\Lambda(\infty)) = \exp\bigl(-\beta_0\,(n_S + \alpha_A n_A)/n\bigr),$$

whereas for the mechanistic model, $\exp(-\Lambda(\infty))$ instead depends on the incubation periods of those hosts who develop symptoms, as well as the corresponding time periods for entirely asymptomatic infected hosts (see the Appendix).

The likelihood contribution from the household, L(θ), where θ is the vector of unknown model parameters, can therefore be written as

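$$L(\theta) = \prod_{k=1}^{n} L_{k,1}(\theta)\,L_{k,2}(\theta).$$

Here, $L_{k,1}(\theta)$ is the contribution to the likelihood from the transmission, or absence of transmission, to host $k$, that is,

$$L_{k,1}(\theta) = \begin{cases} 1, & \text{for } k = 1; \\ \lambda(t_k)\,\exp(-\Lambda(t_k)), & \text{for } k = 2, \dots, n_I; \\ \exp(-\Lambda(\infty)), & \text{for } k = n_I + 1, \dots, n. \end{cases}$$

$L_{k,2}(\theta)$ is the contribution from the incubation period of host $k$ (where applicable), that is, for the independent transmission and symptoms model,

$$L_{k,2}(\theta) = \begin{cases} f_{\mathrm{inc}}(t_{s,k} - t_k), & \text{if host } k \text{ becomes infected and develops symptoms}; \\ 1, & \text{otherwise}; \end{cases}$$

where $f_{\mathrm{inc}}$ is the probability density function of the incubation period (this was assumed to be known; the exact incubation period distribution we used is given below). For the mechanistic model, we also have a contribution to the likelihood from the (in this case not epidemiologically meaningful) times $t_{s,k} - t_k$ for entirely asymptomatic infected hosts, so that

$$L_{k,2}(\theta) = \begin{cases} f_{\mathrm{inc}}(t_{s,k} - t_k), & \text{for } k = 1, \dots, n_I; \\ 1, & \text{for } k = n_I + 1, \dots, n. \end{cases}$$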
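The household likelihood for the independent transmission and symptoms model can be sketched in code as follows. This is an illustrative reimplementation rather than the code used in our analysis: the function and argument names are hypothetical, lognormal generation time and incubation period distributions are assumed, and household members of unknown infection status are not handled. Under this model, the cumulative force of infection exerted by host j up to time t is that host's infectiousness weight multiplied by the generation time cumulative distribution function evaluated at t - t_j.

```python
import math
from statistics import NormalDist


def lognormal_params(mean, sd):
    """Convert a mean and standard deviation to lognormal (mu, sigma)."""
    sigma2 = math.log(1 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


def lognormal_pdf(x, mu, sigma):
    if x <= 0:
        return 0.0  # no infectiousness at or before the time of infection
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def lognormal_cdf(x, mu, sigma):
    return NormalDist(mu, sigma).cdf(math.log(x)) if x > 0 else 0.0


def household_log_likelihood(infection_times, onset_times, n, beta0, gt, inc, alpha_a=0.35):
    """Log-likelihood contribution of one household (independent model).

    infection_times: strictly increasing infection times of the infected hosts.
    onset_times: symptom onset time for each infected host, or None if asymptomatic.
    n: household size; gt, inc: lognormal (mu, sigma) of the generation time
    and incubation period.
    """
    n_inf = len(infection_times)
    # Total infectiousness of each infected host: beta0/n, scaled by alpha_a if asymptomatic.
    weights = [beta0 / n * (alpha_a if s is None else 1.0) for s in onset_times]
    loglik = 0.0
    for k in range(1, n_inf):  # the first infected host contributes a factor of 1
        t_k = infection_times[k]
        lam = sum(w * lognormal_pdf(t_k - t_j, *gt) for w, t_j in zip(weights, infection_times))
        cum = sum(w * lognormal_cdf(t_k - t_j, *gt) for w, t_j in zip(weights, infection_times))
        loglik += math.log(lam) - cum
    # Each uninfected member escapes the total force of infection, Lambda(infinity).
    loglik -= (n - n_inf) * sum(weights)
    # Incubation period contributions from hosts who develop symptoms.
    for t_j, s_j in zip(infection_times, onset_times):
        if s_j is not None:
            loglik += math.log(lognormal_pdf(s_j - t_j, *inc))
    return loglik


gt = lognormal_params(4.2, 4.9)
inc = lognormal_params(5.8, 3.1)
# Hypothetical household of three: one symptomatic case, one asymptomatic case, one uninfected.
example = household_log_likelihood([0.0, 3.0], [5.0, None], n=3, beta0=2.0, gt=gt, inc=inc)
```

Working on the log scale avoids numerical underflow when contributions from many households are multiplied together.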

Parameter estimation

Incubation period

For the independent transmission and symptoms model, we assumed a lognormal incubation period distribution with mean 5.8 days and standard deviation 3.1 days (McAloon et al., 2020). For the mechanistic model, we assumed a gamma distributed incubation period with the same mean and standard deviation; this was for mathematical convenience, since the incubation period could then be decomposed into the sum of independent gamma distributed latent and presymptomatic infectious periods. Results for the independent transmission and symptoms model using a gamma distributed incubation period are shown in Figure 1—figure supplement 7, and uncertainty in the exact parameters of the incubation period distribution is accounted for in Figure 1—figure supplement 8.
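The two incubation period distributions share a mean and standard deviation but have different parameterisations. The sketch below shows the standard moment-matching conversions for a lognormal distribution (log-scale parameters mu and sigma) and a gamma distribution (shape and scale); the function names are illustrative, and the resulting values are computed here rather than quoted from our parameter tables.

```python
import math


def lognormal_from_moments(mean, sd):
    """Log-scale (mu, sigma) of a lognormal with the given mean and standard deviation."""
    sigma2 = math.log(1 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


def gamma_from_moments(mean, sd):
    """(shape, scale) of a gamma distribution with the given mean and standard deviation."""
    return (mean / sd) ** 2, sd ** 2 / mean


mu, sigma = lognormal_from_moments(5.8, 3.1)
shape, scale = gamma_from_moments(5.8, 3.1)

# Recover the moments from the fitted parameters as a consistency check.
lognormal_mean = math.exp(mu + sigma ** 2 / 2)
lognormal_sd = lognormal_mean * math.sqrt(math.exp(sigma ** 2) - 1)
gamma_mean = shape * scale
gamma_sd = math.sqrt(shape) * scale
```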

Parameter fitting procedure

Unknown model parameters were estimated using data augmentation MCMC. The observed data comprised information about whether or not individuals were ever infected and/or displayed symptoms, symptom onset dates, and for some individuals an upper bound on their infection time (corresponding to the date of a positive PCR test). These data were augmented with (estimated) precise times of infection and symptom onset (where applicable) for each infected host. No prior assumptions were made about the order of transmissions within each household.

Below, we outline the parameter fitting procedure that we used for the independent transmission and symptoms model. The procedure used for the mechanistic model was similar and is described in the Appendix.

Lognormal priors were assumed for fitted model parameters (these parameters were the mean and standard deviation of the generation time distribution, in addition to the overall infectiousness, β0). The priors for the mean and standard deviation of the generation time distribution had medians of 5 days and 2 days, respectively (these choices were informed by previous estimates of the SARS-CoV-2 generation time distribution [Ferretti et al., 2020a; Ferretti et al., 2020b; Ganyani et al., 2020]), and were chosen to ensure a prior probability of only 0.025 that these parameters exceeded very high values of 10 days and 7 days, respectively. The exact priors we used are given in Appendix 1—table 2.
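A lognormal distribution is fully determined by its median and one upper quantile, so the two constraints stated above fix each prior. The following sketch shows that calculation; it is an illustration of how such priors can be constructed, with a hypothetical function name, and the values it produces should be checked against the table of exact priors rather than treated as authoritative.

```python
import math
from statistics import NormalDist


def lognormal_prior(median, upper, tail_prob=0.025):
    """Log-scale (mu, sigma) of a lognormal with the given median and P(X > upper) = tail_prob."""
    z = NormalDist().inv_cdf(1 - tail_prob)  # standard normal quantile, about 1.96 here
    mu = math.log(median)
    sigma = (math.log(upper) - mu) / z
    return mu, sigma


mean_prior = lognormal_prior(median=5, upper=10)  # prior for the mean generation time
sd_prior = lognormal_prior(median=2, upper=7)     # prior for the standard deviation


def tail_probability(prior, threshold):
    """P(X > threshold) under a lognormal prior given as (mu, sigma)."""
    return 1 - NormalDist(*prior).cdf(math.log(threshold))
```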

Here, we denote the vector of model parameters by θ, and the augmented data by

$$t = \bigl(t^{(1)}, \dots, t^{(M)}\bigr),$$

where $t^{(m)}$ represents the augmented data from household $m = 1, \dots, M$, and $M$ is the total number of households. We write the (overall) likelihood as

$$L(\theta; t) = \prod_{m=1}^{M} L^{(m)}\bigl(\theta; t^{(m)}\bigr),$$

where the likelihood contribution, $L^{(m)}(\theta; t^{(m)})$, from each household, $m$, was computed as described in the previous section (i.e. all households in the study were assumed to be independent), and we denote the prior density of $\theta$ by $\pi(\theta)$.

In each step of the chain, we carried out (in turn) one of the following:

  1. Propose new values for each entry of the vector of model parameters, $\theta$, using independent normal proposal distributions for each parameter (around the corresponding parameter values in the previous step of the chain). Accept the proposed parameters, $\theta_{\mathrm{prop}}$, with probability

    $$\min\left(\frac{L(\theta_{\mathrm{prop}}; t)\,\pi(\theta_{\mathrm{prop}})}{L(\theta_{\mathrm{old}}; t)\,\pi(\theta_{\mathrm{old}})},\ 1\right),$$

    where $\theta_{\mathrm{old}}$ denotes the vector of parameter values from the previous step of the chain, and where the augmented data, $t$, remain unchanged in this step.

  2. Propose new values for the precise symptom onset times of each symptomatic infected host, using independent uniform proposal distributions (within the day of symptom onset for each host). For each household, $m$, accept the proposed augmented data, $t_{\mathrm{prop}}^{(m)}$, from that household with probability

    $$\min\left(\frac{L^{(m)}\bigl(\theta; t_{\mathrm{prop}}^{(m)}\bigr)}{L^{(m)}\bigl(\theta; t_{\mathrm{old}}^{(m)}\bigr)},\ 1\right),$$

    where $t_{\mathrm{old}}^{(m)}$ denotes the corresponding augmented data from the previous step of the chain, and where the model parameters, $\theta$, remain unchanged in this step (i.e. proposed times are accepted/rejected independently for each household, according to the likelihood contribution from that household).

  3. Propose new values for the infection time of one randomly chosen symptomatic infected host in each household (in households where there was at least one), using independent normal proposal distributions (around the equivalent times in the previous step of the chain). For each household, $m$, accept the proposed augmented data, $t_{\mathrm{prop}}^{(m)}$, from that household with probability

    $$\min\left(\frac{L^{(m)}\bigl(\theta; t_{\mathrm{prop}}^{(m)}\bigr)}{L^{(m)}\bigl(\theta; t_{\mathrm{old}}^{(m)}\bigr)},\ 1\right).$$

  4. Propose new values for the infection time of one randomly chosen asymptomatic infected host in each household (in households where there was at least one), using independent normal proposal distributions (around the equivalent times in the previous step of the chain). For each household, $m$, accept the proposed augmented data, $t_{\mathrm{prop}}^{(m)}$, from that household with probability

    $$\min\left(\frac{L^{(m)}\bigl(\theta; t_{\mathrm{prop}}^{(m)}\bigr)}{L^{(m)}\bigl(\theta; t_{\mathrm{old}}^{(m)}\bigr)},\ 1\right).$$

The chain was run for 10,000,000 iterations; the first 2,000,000 iterations were discarded as burn-in. Posteriors were obtained by recording every 100 iterations of the chain.
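The parameter update (the first of the steps above), together with burn-in and thinning, can be sketched as a generic random-walk Metropolis sampler. This is a toy illustration, not our implementation: the target is a standard normal distribution standing in for the product of likelihood and prior, the chain is far shorter than the one described above, and the updates of the augmented data, which are accepted household by household, are omitted. All names and settings are hypothetical.

```python
import math
import random


def metropolis(log_post, theta0, step_sd, n_iter, burn_in, thin, rng):
    """Random-walk Metropolis sampler with independent normal proposals."""
    theta = list(theta0)
    current = log_post(theta)
    samples = []
    for it in range(n_iter):
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(theta, step_sd)]
        proposed = log_post(proposal)
        # Symmetric proposal: accept with probability min(posterior ratio, 1).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            theta, current = proposal, proposed
        # Keep every `thin`-th state once the burn-in period is over.
        if it >= burn_in and (it - burn_in) % thin == 0:
            samples.append(list(theta))
    return samples


def toy_log_post(theta):
    """Toy target (standard normal), standing in for log(likelihood * prior)."""
    return -0.5 * theta[0] ** 2


samples = metropolis(toy_log_post, [3.0], [1.0], n_iter=20_000,
                     burn_in=4_000, thin=10, rng=random.Random(1))
posterior_mean = sum(s[0] for s in samples) / len(samples)

# Retained samples for the settings described above, assuming that thinning
# is applied to the post-burn-in iterations only.
n_retained = (10_000_000 - 2_000_000) // 100
```

Under that assumption, the chain settings described above correspond to 80,000 retained samples.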

Governance statement

The household study was approved by the PHE Research Ethics and Governance Group as part of the portfolio of PHE’s enhanced surveillance activities in response to the pandemic.

Appendix 1

Details of mechanistic model

In this model, each infected host (who develops symptoms) progresses through independent latent (E), presymptomatic infectious (P) and symptomatic infectious (I) stages of infection. The infectiousness of the host during the P and I stages is denoted by $\beta_P$ and $\beta_I$, respectively, and we denote the ratio $\alpha_P=\beta_P/\beta_I$. We assumed the duration of each stage, denoted $y_{E/P/I}$, to be gamma distributed:

$$y_E \sim \mathrm{Gamma}\left(k_E, \frac{1}{k_{\mathrm{inc}}\gamma}\right), \quad y_P \sim \mathrm{Gamma}\left(k_P, \frac{1}{k_{\mathrm{inc}}\gamma}\right), \quad y_I \sim \mathrm{Gamma}\left(k_I, \frac{1}{k_I\mu}\right),$$

where we write $X \sim \mathrm{Gamma}(a,b)$ for a gamma distributed random variable with shape parameter $a$ and scale parameter $b$. We assumed that $k_E+k_P=k_{\mathrm{inc}}$, so that the incubation period, $\tau_{\mathrm{inc}}=y_E+y_P$, is gamma distributed, with

$$\tau_{\mathrm{inc}} \sim \mathrm{Gamma}\left(k_{\mathrm{inc}}, \frac{1}{k_{\mathrm{inc}}\gamma}\right).$$

We fixed the values of the parameters $k_{\mathrm{inc}}$ and $1/\gamma$ (which represent the shape parameter of the incubation period distribution and the mean incubation period, respectively) in order to obtain the specified incubation period distribution (the exact values that we assumed are given in Appendix 1—table 1). For simplicity, we also assumed that $k_I=1$, so the symptomatic infectious period is exponentially distributed. The parameters $k_E$ (the shape parameter of the latent (E) period distribution), $1/\mu$ (the mean symptomatic infectious (I) period) and $\alpha_P$ (the ratio between the transmission rates of hosts in the P and I stages) were estimated when we fitted the model to the household transmission data.
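The additivity property used above (two independent gamma stages with a common scale sum to a gamma incubation period) can be checked by simulation. The sketch below is illustrative only: it uses the rounded incubation period values from table 1 of Appendix 1 and, as an assumption, the posterior mean of $k_E/k_{\mathrm{inc}}$ from table 3 of Appendix 1. It compares only the simulated mean and variance with their theoretical values, not the whole distribution.

```python
import random

K_INC = 3.5        # shape of the incubation period distribution (rounded)
MEAN_INC = 5.8     # mean incubation period 1/gamma, in days (rounded)
KE_FRACTION = 0.2  # illustrative choice of k_E / k_inc

scale = MEAN_INC / K_INC   # common scale parameter 1 / (k_inc * gamma)
k_e = KE_FRACTION * K_INC  # shape of the latent (E) stage
k_p = K_INC - k_e          # shape of the presymptomatic (P) stage

rng = random.Random(0)
n_samples = 200_000
# random.gammavariate takes (shape, scale), matching the notation in the text.
incubation = [
    rng.gammavariate(k_e, scale) + rng.gammavariate(k_p, scale)
    for _ in range(n_samples)
]

sim_mean = sum(incubation) / n_samples
sim_var = sum((x - sim_mean) ** 2 for x in incubation) / n_samples

theory_mean = K_INC * scale      # equals 1 / gamma
theory_var = K_INC * scale ** 2  # variance of Gamma(k_inc, scale)
```

Any other split of $k_{\mathrm{inc}}$ between $k_E$ and $k_P$ leaves the incubation period distribution unchanged, which is why $k_E$ can be fitted while the incubation period is held fixed.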

Hosts who remain asymptomatic throughout infection were assumed to follow the same E/P/I stages, although in this case the distinction between the P and I stages has no epidemiological meaning. Stage durations, as well as the value of $\alpha_P$, were assumed to be identical for entirely asymptomatic hosts and those who develop symptoms, so that the generation time distribution is the same for all infected hosts.

Conditional infectiousness

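For a host who develops symptoms, conditional on incubation period $\tau_{\mathrm{inc}}$, the expected infectiousness at time since infection $\tau$ is (Hart et al., 2021)

$$\beta(\tau \mid \tau_{\mathrm{inc}}) = \begin{cases} \alpha_P C \dfrac{\beta_0}{n} \left(1 - F_{\mathrm{Beta}}(1 - \tau/\tau_{\mathrm{inc}};\, k_P, k_E)\right), & 0 < \tau < \tau_{\mathrm{inc}}, \\[1ex] C \dfrac{\beta_0}{n} \left(1 - F_I(\tau - \tau_{\mathrm{inc}})\right), & \tau > \tau_{\mathrm{inc}}. \end{cases}$$

Here, $\beta_0$ is the overall infectiousness parameter (see Methods in the main text), $n$ is the household size, $F_I(y_I)$ is the cumulative distribution function of the duration of the I stage, $F_{\mathrm{Beta}}(x;a,b)$ is the cumulative distribution function of a beta distributed random variable with shape parameters $a$ and $b$, and

$$C = \frac{k_{\mathrm{inc}}\gamma\mu}{\alpha_P k_P \mu + k_{\mathrm{inc}}\gamma}.$$

The cumulative conditional infectiousness can therefore be calculated to be

$$B(\tau \mid \tau_{\mathrm{inc}}) = \int_0^{\tau} \beta(\tau' \mid \tau_{\mathrm{inc}})\,\mathrm{d}\tau' = \begin{cases} (\tau - \tau_{\mathrm{inc}})\,\beta(\tau \mid \tau_{\mathrm{inc}}) + \alpha_P C \dfrac{\beta_0}{n} \left[\dfrac{k_P \tau_{\mathrm{inc}}}{k_{\mathrm{inc}}} \left(1 - F_{\mathrm{Beta}}(1 - \tau/\tau_{\mathrm{inc}};\, k_P + 1, k_E)\right)\right], & 0 \le \tau < \tau_{\mathrm{inc}}, \\[2ex] (\tau - \tau_{\mathrm{inc}})\,\beta(\tau \mid \tau_{\mathrm{inc}}) + C \dfrac{\beta_0}{n} \left[\dfrac{\alpha_P k_P \tau_{\mathrm{inc}}}{k_{\mathrm{inc}}} + \dfrac{1}{\mu} F_{\mathrm{Gamma}}\left(\tau - \tau_{\mathrm{inc}};\, k_I + 1, \dfrac{1}{k_I \mu}\right)\right], & \tau \ge \tau_{\mathrm{inc}}, \end{cases}$$

where $F_{\mathrm{Gamma}}(x;a,b)$ is the cumulative distribution function of a gamma distributed random variable with shape parameter $a$ and scale parameter $b$. The total force of infection exerted on each household member (over the course of infection) is then

$$B(\infty \mid \tau_{\mathrm{inc}}) = \frac{\beta_0}{n} \left(\frac{\alpha_P k_P \gamma \mu \tau_{\mathrm{inc}} + k_{\mathrm{inc}}\gamma}{\alpha_P k_P \mu + k_{\mathrm{inc}}\gamma}\right).$$

The mean of this expression over the incubation period distribution is $\beta_0/n$.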
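The statement that the total force of infection averages to $\beta_0/n$ follows because the expression is linear in the incubation period, so its mean equals its value at the mean incubation period, $1/\gamma$. A minimal numerical sketch is given below. The parameter values are the rounded fixed values and posterior means from the supplementary tables, the household size of four is hypothetical, and the function name is an assumption made for illustration.

```python
K_INC, MEAN_INC = 3.5, 5.8   # incubation period shape and mean (days), rounded
KE_FRACTION = 0.2            # k_E / k_inc (posterior mean, used illustratively)
MEAN_I = 5.0                 # mean symptomatic infectious period 1/mu (days)
ALPHA_P = 3.1                # ratio of transmission rates in the P and I stages
BETA_0 = 1.8                 # overall infectiousness parameter
HOUSEHOLD_SIZE = 4           # hypothetical household size n

gamma = 1.0 / MEAN_INC
mu = 1.0 / MEAN_I
k_p = (1.0 - KE_FRACTION) * K_INC


def total_infectiousness(tau_inc):
    """Total force of infection on each household member, given tau_inc."""
    numerator = ALPHA_P * k_p * gamma * mu * tau_inc + K_INC * gamma
    denominator = ALPHA_P * k_p * mu + K_INC * gamma
    return (BETA_0 / HOUSEHOLD_SIZE) * numerator / denominator


# Linear in tau_inc, so the mean over the incubation period distribution
# is the value at the mean incubation period.
mean_total = total_infectiousness(MEAN_INC)
short_incubation = total_infectiousness(2.0)
long_incubation = total_infectiousness(10.0)
```

Under this model, hosts with longer incubation periods exert a larger total force of infection, because they spend longer in the presymptomatic infectious stage.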

For a host who remains asymptomatic throughout infection, conditional on the combined duration of the E and P stages, $\tau_{\mathrm{inc}}=y_E+y_P$, the infectiousness, $\beta(\tau \mid \tau_{\mathrm{inc}})$, is given by $\alpha_A$ times the corresponding expression for a host who develops symptoms. We note that in this case, $\tau_{\mathrm{inc}}$ has no epidemiological interpretation, but this conditional infectiousness was useful when fitting the model to data (see ‘Parameter estimation’ below).

Generation time distribution

The generation time, $\tau_{\mathrm{gen}}$, for an individual transmission can be written as

$$\tau_{\mathrm{gen}} = y_E + y,$$

where $y_E$ is the length of the latent (E) stage, and $y$ is the time from the start of the presymptomatic infectious (P) stage to the transmission occurring. As shown by Hart et al., 2021, if the effect of susceptible depletion during infection is neglected, $y$ has density

$$f(y) = C\left(\alpha_P \left(1 - F_P(y)\right) + \int_0^{y} \left(1 - F_I(y - y_P)\right) f_P(y_P)\,\mathrm{d}y_P\right).$$

Using this density, it can be shown that the moments of this distribution are

$$E[y^m] = \frac{C}{m+1}\left(\alpha_P E\left[y_P^{m+1}\right] + E\left[(y_P + y_I)^{m+1} - y_P^{m+1}\right]\right).$$

In particular,

$$E[y] = \frac{C}{2}\left(\alpha_P E[y_P^2] + 2E[y_P]E[y_I] + E[y_I^2]\right),$$

and

$$\mathrm{Var}[y] = \frac{C}{3}\left(\alpha_P E[y_P^3] + 3E[y_P^2]E[y_I] + 3E[y_P]E[y_I^2] + E[y_I^3]\right) - \left(E[y]\right)^2.$$

Note that for a gamma distributed random variable, $X \sim \mathrm{Gamma}(a,b)$, we have

$$E[X^m] = \frac{\Gamma(a+m)}{\Gamma(a)}\, b^m = a(a+1)\cdots(a+(m-1))\, b^m.$$

Therefore, for gamma distributed stage durations, explicit expressions can be obtained for the mean and variance of the generation time distribution,

$$E[\tau_{\mathrm{gen}}] = E[y_E] + E[y], \qquad \mathrm{Var}[\tau_{\mathrm{gen}}] = \mathrm{Var}[y_E] + \mathrm{Var}[y],$$

where the last equality holds because $y_E$ and $y$ are assumed to be independent.
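The moment formulas above can be evaluated directly. The sketch below plugs in the rounded fixed values and posterior mean parameter estimates from the supplementary tables; this is an assumption made for illustration, since the posterior mean of the generation time is not the same as the generation time evaluated at the posterior mean parameters, so the output should only be expected to land near the reported estimates. The helper `gamma_moment` is a hypothetical name.

```python
import math


def gamma_moment(shape, scale, m):
    """m-th raw moment of Gamma(shape, scale): a(a+1)...(a+m-1) * b**m."""
    result = 1.0
    for j in range(m):
        result *= (shape + j) * scale
    return result


K_INC, MEAN_INC = 3.5, 5.8   # incubation period shape and mean (days), rounded
KE_FRACTION = 0.2            # k_E / k_inc (posterior mean, used illustratively)
MEAN_I, K_I = 5.0, 1.0       # mean and shape of the symptomatic infectious period
ALPHA_P = 3.1                # ratio of transmission rates in the P and I stages

gamma, mu = 1.0 / MEAN_INC, 1.0 / MEAN_I
scale_inc = MEAN_INC / K_INC
k_e = KE_FRACTION * K_INC
k_p = K_INC - k_e

# Raw moments E[y_P^m] and E[y_I^m] for m = 0, 1, 2, 3.
yp = [gamma_moment(k_p, scale_inc, m) for m in range(4)]
yi = [gamma_moment(K_I, MEAN_I / K_I, m) for m in range(4)]

c = K_INC * gamma * mu / (ALPHA_P * k_p * mu + K_INC * gamma)

mean_y = c / 2 * (ALPHA_P * yp[2] + 2 * yp[1] * yi[1] + yi[2])
second_moment_y = c / 3 * (
    ALPHA_P * yp[3] + 3 * yp[2] * yi[1] + 3 * yp[1] * yi[2] + yi[3]
)
var_y = second_moment_y - mean_y ** 2

# Add the latent (E) stage, which is independent of y.
mean_gen = k_e * scale_inc + mean_y
sd_gen = math.sqrt(k_e * scale_inc ** 2 + var_y)
```

With these inputs, `mean_gen` and `sd_gen` come out close to the mean and standard deviation reported for the mechanistic model in table 4 of Appendix 1.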

Proportion of presymptomatic transmissions

Among infectors who develop symptoms, the proportion of transmissions occurring prior to symptom onset (neglecting the effect of susceptible depletion during infection) is given by (Hart et al., 2021)

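$$q_P = \frac{\left(\dfrac{\beta_P k_P}{k_{\mathrm{inc}}\gamma}\right)}{\left(\dfrac{\beta_P k_P}{k_{\mathrm{inc}}\gamma} + \dfrac{\beta_I}{\mu}\right)} = \frac{\alpha_P k_P \mu}{\alpha_P k_P \mu + k_{\mathrm{inc}}\gamma}.$$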

Parameter estimation

The vector of model parameters,

$$\theta = \left(k_E/k_{\mathrm{inc}},\ 1/\mu,\ \alpha_P,\ \beta_0\right),$$

was estimated by fitting the mechanistic model to the household transmission data.

We assumed independent prior distributions for each entry of $\theta$. Lognormal priors were assumed for $1/\mu$, $\alpha_P$ and $\beta_0$. Since $\alpha_P$ represents the ratio between the transmission rates of hosts in the P and I stages, a prior with median one was used to ensure equal prior probabilities of values above and below one. This prior was also chosen to limit the prior probability of extreme values, with a prior 95% credible interval of [0.2,5]. A beta prior was used for $k_E/k_{\mathrm{inc}}$ (which was constrained to lie between 0 and 1), and was chosen to restrict the prior probability of values very close to either 0 or 1. The exact priors we used are given in Appendix 1—table 3.

A slightly amended version of the parameter fitting algorithm described in the main text for the independent transmission and symptoms model was used. In particular, we augmented the observed data with:

  1. The infection time, $t_j$, of each infected host.

  2. The time, $t_{s,j}$, at which each infected host transitioned from the P to I stage.

Note that for hosts who develop symptoms, the time of entry into the I stage corresponds to the symptom onset time. The data were also augmented with this transition time for entirely asymptomatic infected hosts because the conditional infectiousness, $\beta(\tau \mid t_{s,j} - t_j)$, is more straightforward to calculate than $\beta(\tau)$.

In each step of the chain, we carried out (in turn) one of the following:

  1. Propose new values for each entry of the vector of model parameters, $\theta$, using a multivariate normal proposal distribution (around the value of $\theta$ in the previous step of the chain; a correlation of 0.5 was used between the proposal distributions of $k_E/k_{\mathrm{inc}}$ and $\alpha_P$, and between those of $1/\mu$ and $\alpha_P$). Accept the proposed parameters, $\theta_{\mathrm{prop}}$, with probability

    $$\min\left(\frac{L(\theta_{\mathrm{prop}};\, t)\,\pi(\theta_{\mathrm{prop}})}{L(\theta_{\mathrm{old}};\, t)\,\pi(\theta_{\mathrm{old}})},\, 1\right),$$

    where $\theta_{\mathrm{old}}$ denotes the vector of parameter values from the previous step of the chain, and where the augmented data, $t$, remain unchanged in this step.

  1. Propose new values for the precise symptom onset times of each symptomatic infected host, using independent uniform proposal distributions (within the day of symptom onset for each host). For each household, $m$, accept the proposed augmented data, $t^{(m)}_{\mathrm{prop}}$, from that household with probability

    $$\min\left(\frac{L^{(m)}\left(\theta;\, t^{(m)}_{\mathrm{prop}}\right)}{L^{(m)}\left(\theta;\, t^{(m)}_{\mathrm{old}}\right)},\, 1\right),$$

    where $t^{(m)}_{\mathrm{old}}$ denotes the corresponding augmented data from the previous step of the chain, and where the model parameters, $\theta$, remain unchanged in this step (i.e. proposed times are accepted/rejected independently for each household, according to the likelihood contribution from that household).

  1. Propose new values for the infection time of one randomly chosen infected host in each household (either symptomatic or asymptomatic), using independent normal proposal distributions (around the equivalent times in the previous step of the chain). For each household, $m$, accept the proposed augmented data, $t^{(m)}_{\mathrm{prop}}$, from that household with probability

    $$\min\left(\frac{L^{(m)}\left(\theta;\, t^{(m)}_{\mathrm{prop}}\right)}{L^{(m)}\left(\theta;\, t^{(m)}_{\mathrm{old}}\right)},\, 1\right).$$

  1. Propose new values for both the infection time, $t$, and the time of the start of the I stage, $t_s$, holding $(t_s - t)$ constant, for one randomly chosen asymptomatic infected host in each household (in households where there was at least one), using independent normal proposal distributions (around the equivalent times in the previous step of the chain). For each household, $m$, accept the proposed augmented data, $t^{(m)}_{\mathrm{prop}}$, from that household with probability

    $$\min\left(\frac{L^{(m)}\left(\theta;\, t^{(m)}_{\mathrm{prop}}\right)}{L^{(m)}\left(\theta;\, t^{(m)}_{\mathrm{old}}\right)},\, 1\right).$$

Relationship between generation time, TOST and serial interval

Here, we consider a randomly chosen infector-infectee pair (in which both the infector and the infectee develop symptoms) within a large, well-mixed population, of which only a small proportion is infected. In that setting, the observed generation time distribution is equal to the normalised infectiousness profile, which will not be true within a household (compare Figure 1 and Figure 1—figure supplement 4). We define:

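$$\begin{aligned}
\tau_{\mathrm{inc},1} &= \text{(incubation period of the infector)}, \\
\tau_{\mathrm{inc},2} &= \text{(incubation period of the infectee)}, \\
\tau_{\mathrm{gen}} &= \text{(generation time)}, \\
x_{\mathrm{tost}} &= \text{(time from onset of symptoms (of infector) to transmission (TOST))}, \\
x_{\mathrm{ser}} &= \text{(serial interval)},
\end{aligned}$$

where we use $\tau$ for time intervals relative to the time of infection and $x$ for those relative to the time of symptom onset. We denote the probability density functions of these time periods by $f_{\mathrm{inc},1}$, $f_{\mathrm{inc},2}$, $f_{\mathrm{gen}}$, $f_{\mathrm{tost}}$ and $f_{\mathrm{ser}}$, respectively. Note that

$$x_{\mathrm{tost}} = \tau_{\mathrm{gen}} - \tau_{\mathrm{inc},1},$$

and

$$x_{\mathrm{ser}} = x_{\mathrm{tost}} + \tau_{\mathrm{inc},2},$$

so that

$$x_{\mathrm{ser}} = \tau_{\mathrm{gen}} + \tau_{\mathrm{inc},2} - \tau_{\mathrm{inc},1}.$$

In the independent transmission and symptoms model, $\tau_{\mathrm{gen}}$ and $\tau_{\mathrm{inc},1}$ are assumed to be independent, and the incubation periods of the infector and infectee are assumed to be drawn independently from the population incubation period distribution, $f_{\mathrm{inc}}=f_{\mathrm{inc},1}=f_{\mathrm{inc},2}$. Therefore, the TOST distribution is given by the convolution

$$f_{\mathrm{tost}}(x_{\mathrm{tost}}) = \int_0^{\infty} f_{\mathrm{gen}}(x_{\mathrm{tost}} + \tau)\, f_{\mathrm{inc}}(\tau)\,\mathrm{d}\tau. \tag{1}$$

Assuming that $x_{\mathrm{tost}}$ and $\tau_{\mathrm{inc},2}$ are independent, the serial interval distribution can be calculated from the TOST distribution as

$$f_{\mathrm{ser}}(x_{\mathrm{ser}}) = \int_0^{\infty} f_{\mathrm{tost}}(x_{\mathrm{ser}} - \tau)\, f_{\mathrm{inc}}(\tau)\,\mathrm{d}\tau. \tag{2}$$

Note that

$$E[x_{\mathrm{ser}}] = E[\tau_{\mathrm{gen}}] + E[\tau_{\mathrm{inc},2}] - E[\tau_{\mathrm{inc},1}] = E[\tau_{\mathrm{gen}}],$$

i.e. the generation time and serial interval distributions have the same mean.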
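For the independent transmission and symptoms model, the means and standard deviations of the TOST and serial interval follow from these relations without evaluating the convolutions, because variances of independent quantities add. The sketch below is an illustration under stated assumptions: it uses the rounded lognormal incubation period parameters from table 1 of Appendix 1 and the rounded posterior mean generation time summaries from table 2 of Appendix 1.

```python
import math

MU_LOG, SD_LOG = 1.63, 0.50  # lognormal incubation period parameters (rounded)
GEN_MEAN, GEN_SD = 4.2, 4.9  # generation time mean and SD in days (rounded)

# Mean and variance of a lognormal distribution with these log-scale parameters.
inc_mean = math.exp(MU_LOG + SD_LOG ** 2 / 2)
inc_var = (math.exp(SD_LOG ** 2) - 1) * inc_mean ** 2

# x_tost = tau_gen - tau_inc1, with tau_gen and tau_inc1 independent.
tost_mean = GEN_MEAN - inc_mean
tost_sd = math.sqrt(GEN_SD ** 2 + inc_var)

# x_ser = x_tost + tau_inc2, with x_tost and tau_inc2 independent.
ser_mean = tost_mean + inc_mean
ser_sd = math.sqrt(GEN_SD ** 2 + 2 * inc_var)
```

The resulting values agree, to the precision shown there, with the entries for this model in table 4 of Appendix 1. The serial interval has the same mean as the generation time but a larger standard deviation, since it includes the variability of two incubation periods.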

For the mechanistic model, we still have $f_{\mathrm{inc},2}=f_{\mathrm{inc}}$, and the serial interval distribution can be calculated from the TOST distribution using Equation 2. On the other hand, $\tau_{\mathrm{gen}}$ and $\tau_{\mathrm{inc},1}$ are not independent, so Equation 1 connecting the TOST and generation time distributions for the independent transmission and symptoms model does not hold for the mechanistic model. As shown by Hart et al., 2021, the TOST distribution for the mechanistic model is, instead, given by

$$f_{\mathrm{tost}}(x_{\mathrm{tost}}) = \begin{cases} \alpha_P C \left(1 - F_P(-x_{\mathrm{tost}})\right), & x_{\mathrm{tost}} < 0, \\ C \left(1 - F_I(x_{\mathrm{tost}})\right), & x_{\mathrm{tost}} \ge 0. \end{cases}$$

Further, under the mechanistic model, the expected number of presymptomatic transmissions generated by an infected host is dependent on their incubation period. As a result, the infector’s incubation period does not follow the same distribution as that of the infectee. In particular, by Bayes’ theorem, we have

$$f_{\mathrm{inc},1}(\tau_{\mathrm{inc},1}) = p(\tau_{\mathrm{inc},1} \mid 1 \to 2) = \frac{p(1 \to 2 \mid \tau_{\mathrm{inc},1})\, f_{\mathrm{inc}}(\tau_{\mathrm{inc},1})}{p(1 \to 2)},$$

where we write $1 \to 2$ to denote the occurrence of the transmission from the infector to the infectee. Because we are here considering a large population, the probability of the transmission occurring is proportional to the overall infectiousness of the infector (integrated over the course of infection), $B(\infty \mid \tau_{\mathrm{inc},1})$, so we have

$$f_{\mathrm{inc},1}(\tau_{\mathrm{inc},1}) = \frac{B(\infty \mid \tau_{\mathrm{inc},1})\, f_{\mathrm{inc}}(\tau_{\mathrm{inc},1})}{B(\infty)} = \left(\frac{\alpha_P k_P \gamma \mu \tau_{\mathrm{inc},1} + k_{\mathrm{inc}}\gamma}{\alpha_P k_P \mu + k_{\mathrm{inc}}\gamma}\right) f_{\mathrm{inc}}(\tau_{\mathrm{inc},1}).$$

The expected incubation period of the infector is then

$$E[\tau_{\mathrm{inc},1}] = \frac{1}{\gamma} + \frac{\alpha_P k_P \mu}{k_{\mathrm{inc}}\gamma\left(\alpha_P k_P \mu + k_{\mathrm{inc}}\gamma\right)} = E[\tau_{\mathrm{inc},2}] + \frac{q_P}{k_{\mathrm{inc}}\gamma},$$

where $q_P$ is the proportion of transmissions occurring prior to symptom onset.

As a result of the above, the expected values of the generation time and serial interval in the mechanistic model are not equal. Instead, we have

$$E[x_{\mathrm{ser}}] = E[\tau_{\mathrm{gen}}] - \frac{q_P}{k_{\mathrm{inc}}\gamma}.$$

Under the values of $k_{\mathrm{inc}}$ and $\gamma$ that we assumed (Appendix 1—table 1), this gives a mean generation time that is approximately $(1.6 \times q_P)$ days longer than the mean serial interval.
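The size of this gap can be illustrated numerically. The sketch below evaluates $q_P$ and $q_P/(k_{\mathrm{inc}}\gamma)$ at the rounded fixed values and posterior mean parameter estimates from the supplementary tables, which is an assumption made for illustration rather than a reproduction of the posterior calculation. With the rounded table values the multiplier $1/(k_{\mathrm{inc}}\gamma)$ is about 1.66 rather than 1.6, presumably because the tabulated values are themselves rounded.

```python
K_INC, MEAN_INC = 3.5, 5.8  # incubation period shape and mean (days), rounded
KE_FRACTION = 0.2           # k_E / k_inc (posterior mean, used illustratively)
MEAN_I = 5.0                # mean symptomatic infectious period 1/mu (days)
ALPHA_P = 3.1               # ratio of transmission rates in the P and I stages

gamma = 1.0 / MEAN_INC
mu = 1.0 / MEAN_I
k_p = (1.0 - KE_FRACTION) * K_INC

# Proportion of transmissions occurring before symptom onset.
q_p = ALPHA_P * k_p * mu / (ALPHA_P * k_p * mu + K_INC * gamma)

# Amount by which the mean generation time exceeds the mean serial interval.
multiplier = 1.0 / (K_INC * gamma)
gap_days = q_p * multiplier

# Mean incubation period of infectors, which exceeds the population mean.
infector_inc_mean = MEAN_INC + gap_days
```

The gap obtained this way is comparable with the difference between the mean generation time and mean serial interval listed for the mechanistic model in table 4 of Appendix 1.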

Extension of framework to account for co-primary cases

In most of our analyses, we assumed that each household transmission chain was initiated by a single primary case, so that all other infected household members were infected from within the household. However, we also relaxed this assumption by extending our framework to account for the possibility of co-primary cases (Figure 1—figure supplement 5 and Figure 3—figure supplement 6). Rather than assuming that all co-primary cases were infected at exactly the same time, we instead assumed that each household member could be infected at any time during a primary infection event that was taken to last one day (the choice of one day was arbitrary but in principle any duration could be used). This enabled us to easily incorporate the possibility of co-primary cases into our data augmentation MCMC approach by adapting the likelihood function as described below.

As in Methods, we here consider a household (of size $n$) in which $n_I$ household members become infected (of whom $n_S$ develop symptoms and $n_A$ remain asymptomatic throughout infection) and $n_U$ remain uninfected. Under either the independent transmission and symptoms model or the mechanistic model, we now denote the total force of infection exerted on each susceptible member of the household by other household members at time $t$ by $\lambda_h(t)$, and the cumulative force of infection by $\Lambda_h(t)$ (i.e. these correspond to the quantities denoted by $\lambda(t)$ and $\Lambda(t)$, respectively, in Methods). Assuming each (susceptible) household member is also subject to a constant force of infection, $\beta_p$, during a primary event taking place between times $t_{p,\mathrm{start}}$ and $t_{p,\mathrm{end}}$, the total force of infection exerted on each susceptible household member at time $t$ is

$$\lambda(t) = \lambda_p(t) + \lambda_h(t),$$

where

$$\lambda_p(t) = \begin{cases} \beta_p, & t_{p,\mathrm{start}} \le t \le t_{p,\mathrm{end}}; \\ 0, & \text{otherwise}. \end{cases}$$

The cumulative force of infection is

$$\Lambda(t) = \Lambda_p(t) + \Lambda_h(t),$$

where

$$\Lambda_p(t) = \int_{-\infty}^{t} \lambda_p(s)\,\mathrm{d}s = \frac{\beta_p}{2}\left(t_{p,\mathrm{end}} - t_{p,\mathrm{start}} + |t - t_{p,\mathrm{start}}| - |t_{p,\mathrm{end}} - t|\right).$$
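The absolute-value expression for the cumulative primary force of infection is a compact way of writing a piecewise-linear function: zero before the primary event, increasing linearly during it, and constant afterwards. The following sketch checks the two forms against each other; the function names and the numerical values are hypothetical.

```python
def cumulative_primary_foi(t, beta_p, t_start, t_end):
    """Closed form: (beta_p / 2) * (t_end - t_start + |t - t_start| - |t_end - t|)."""
    return 0.5 * beta_p * (t_end - t_start + abs(t - t_start) - abs(t_end - t))


def cumulative_primary_foi_piecewise(t, beta_p, t_start, t_end):
    """Direct integral of the constant force of infection over the primary event."""
    if t < t_start:
        return 0.0
    if t > t_end:
        return beta_p * (t_end - t_start)
    return beta_p * (t - t_start)


beta_p, t_start, t_end = 0.4, 0.0, 1.0  # hypothetical rate and a one-day event
check_times = [-1.0, 0.0, 0.25, 1.0, 3.0]
closed_form = [cumulative_primary_foi(t, beta_p, t_start, t_end) for t in check_times]
piecewise = [
    cumulative_primary_foi_piecewise(t, beta_p, t_start, t_end) for t in check_times
]
```

The closed form avoids branching on the value of `t`, which can be convenient when the function is evaluated for many augmented infection times at once.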

We took $t_{p,\mathrm{start}}$ and $t_{p,\mathrm{end}}$ to be the start and end of the day of the first household member becoming infected, respectively.

The likelihood contribution from the household, $L(\theta)$, where $\theta$ is the vector of unknown model parameters, is then given by

$$L(\theta) = \frac{1}{1 - \exp\left(-n\beta_p \times (t_{p,\mathrm{end}} - t_{p,\mathrm{start}})\right)} \prod_{k=1}^{n} L_{k,1}(\theta)\, L_{k,2}(\theta).$$

Here,

$$L_{k,1}(\theta) = \begin{cases} \lambda(t_k)\exp\left(-\Lambda(t_k)\right), & \text{for } k = 1, \ldots, n_I; \\ \exp\left(-\Lambda(\infty)\right), & \text{for } k = n_I + 1, \ldots, n; \end{cases}$$

and for the independent transmission and symptoms model,

$$L_{k,2}(\theta) = \begin{cases} f_{\mathrm{inc}}(t_{s,k} - t_k), & \text{if host } k \text{ becomes infected and develops symptoms}; \\ 1, & \text{otherwise}; \end{cases}$$

where $f_{\mathrm{inc}}$ is the probability density function of the incubation period, while for the mechanistic model,

$$L_{k,2}(\theta) = \begin{cases} f_{\mathrm{inc}}(t_{s,k} - t_k), & \text{for } k = 1, \ldots, n_I; \\ 1, & \text{for } k = n_I + 1, \ldots, n. \end{cases}$$

The factor

$$\frac{1}{1 - \exp\left(-n\beta_p \times (t_{p,\mathrm{end}} - t_{p,\mathrm{start}})\right)},$$

is included to condition on at least one household member becoming infected during the primary transmission event.

Using this likelihood function, we fitted both models to the household data using the same data augmentation MCMC approach described for the independent transmission and symptoms model in Methods and for the mechanistic model earlier in the Appendix. Alongside other model parameters, we estimated the probability of each household member becoming infected during the primary transmission event,

$$1 - \exp\left(-\beta_p \times (t_{p,\mathrm{end}} - t_{p,\mathrm{start}})\right),$$

in the MCMC procedure (in the case we considered, $(t_{p,\mathrm{end}} - t_{p,\mathrm{start}})$ was always equal to one day, so $\beta_p$ could be calculated from this probability). A uniform prior was assumed for the probability of primary infection.
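The conversion between the probability of primary infection and $\beta_p$, and the conditioning factor in the likelihood, can be written out as below. This is a minimal sketch: the function names, the probability of 0.3 and the household size of four are hypothetical and are not taken from the fitted results.

```python
import math


def primary_rate_from_probability(prob, duration=1.0):
    """Invert prob = 1 - exp(-beta_p * duration) to recover beta_p."""
    return -math.log1p(-prob) / duration


def conditioning_factor(beta_p, household_size, duration=1.0):
    """1 / P(at least one household member is infected in the primary event)."""
    return 1.0 / -math.expm1(-household_size * beta_p * duration)


prob_primary = 0.3   # hypothetical per-person probability of primary infection
household_size = 4   # hypothetical household size n

beta_p = primary_rate_from_probability(prob_primary)
recovered_prob = -math.expm1(-beta_p)  # should return prob_primary
factor = conditioning_factor(beta_p, household_size)
```

Because the primary event lasts one day here, the factor reduces to one over $1-(1-p)^n$, where $p$ is the per-person probability of primary infection; it is larger for small households, where an event that infects nobody is more likely.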

Supplementary tables

Appendix 1—table 1
Assumed (not fitted) parameter values used for the two models that we considered.

| Parameter | Model | Interpretation | Value | Justification |
|---|---|---|---|---|
| $\alpha_A$ | Both | Relative infectiousness of entirely asymptomatic hosts | 0.35 | Taken from Buitrago-Garcia et al., 2020 (other values considered in sensitivity analyses) |
| Mean of natural logarithm of the incubation period | Independent transmission and symptoms | Parameter of lognormal incubation period distribution | 1.63 log(day) | Taken from McAloon et al., 2020 (uncertainty in this value considered in sensitivity analyses) |
| Standard deviation of natural logarithm of the incubation period | Independent transmission and symptoms | Parameter of lognormal incubation period distribution | 0.50 log(day) | Taken from McAloon et al., 2020 (uncertainty in this value considered in sensitivity analyses) |
| $k_{\mathrm{inc}}$ | Mechanistic | Shape parameter of gamma incubation period distribution | 3.5 | Consistent with mean and standard deviation from McAloon et al., 2020 |
| $1/\gamma$ | Mechanistic | Mean incubation period | 5.8 days | Consistent with mean and standard deviation from McAloon et al., 2020 |
| $k_I$ | Mechanistic | Shape parameter of (gamma) symptomatic infectious period distribution | 1 | Assumed |

Appendix 1—table 2
Fitted parameters in the independent transmission and symptoms model, the prior distributions used, and the posterior means and 95% credible intervals obtained.

| Parameter | Prior | Posterior mean (95% CrI) |
|---|---|---|
| Mean generation time | Lognormal(1.6,0.35) [prior median 5.0 days, 95% CrI 2.5–9.8 days] | 4.2 days (3.3–5.3 days) |
| Standard deviation of generation time distribution | Lognormal(0.7,0.65) [prior median 2.0 days, 95% CrI 0.6–7.2 days] | 4.9 days (3.0–8.3 days) |
| Overall infectiousness parameter, $\beta_0$ | Lognormal(0.7,0.8) [prior median 2.0, 95% CrI 0.4–9.7] | 1.7 (1.4–1.9) |

Appendix 1—table 3
Fitted parameters in the mechanistic model, the prior distributions used, and the posterior means and 95% credible intervals obtained.

| Parameter | Prior | Posterior mean (95% CrI) |
|---|---|---|
| Ratio of mean durations of the latent (E) and incubation (combined E and P) periods, $k_E/k_{\mathrm{inc}}$ | Beta(2.1,2.1) [prior median 0.5, 95% CrI 0.1–0.9] | 0.2 (0.03–0.5) |
| Mean symptomatic infectious (I) period, $1/\mu$ | Lognormal(1.6,0.8) [prior median 5.0 days, 95% CrI 1.0–23.8 days] | 5.0 days (3.2–7.5 days) |
| Ratio of transmission rates in the P and I stages, $\alpha_P$ | Lognormal(0,0.8) [prior median 1.0, 95% CrI 0.2–4.8] | 3.1 (1.2–6.9) |
| Overall infectiousness parameter, $\beta_0$ | Lognormal(0.7,0.8) [prior median 2.0, 95% CrI 0.4–9.7] | 1.8 (1.5–2.1) |

Appendix 1—table 4
The means and standard deviations of the generation time, TOST and serial interval distributions shown in Figure 2.

Other than the generation time distribution for the independent transmission and symptoms model (which is lognormal with the specified mean and standard deviation), none of the remaining distributions take a simple parametric form.

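| Model | Distribution | Mean | Standard deviation |
|---|---|---|---|
| Independent transmission and symptoms | Generation time | 4.2 days | 4.9 days |
| Independent transmission and symptoms | TOST | −1.6 days | 5.8 days |
| Independent transmission and symptoms | Serial interval | 4.2 days | 6.6 days |
| Mechanistic | Generation time | 5.9 days | 4.8 days |
| Mechanistic | TOST | −1.1 days | 4.9 days |
| Mechanistic | Serial interval | 4.7 days | 5.8 days |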

Data availability

All data generated or analysed during this study are included in the manuscript and its supporting files; a Source Data file has been provided for Figure 1. Code for reproducing our results is available at https://github.com/will-s-hart/UK-generation-times (copy archived at swh:1:rev:729266e972315ba3344da430d5de58123fce4e4e).

References

  1. Anderson RM, May RM (1992) Infectious Diseases of Humans: Dynamics and Control. OUP Oxford Press.
  2. Ashcroft P, Huisman JS, Lehtinen S, Bouman JA, Althaus CL, Regoes RR, Bonhoeffer S (2020) COVID-19 infectivity profile correction. Swiss Medical Weekly 150:w20336.
  3. Diekmann O, Heesterbeek JAP (2000) Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation. John Wiley & Sons.

Article and author information

Author details

  1. William S Hart

    Mathematical Institute, University of Oxford, Oxford, United Kingdom
    Contribution
    Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review and editing
    For correspondence
    william.hart@keble.ox.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2504-6860
  2. Sam Abbott

    Centre for the Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
    Contribution
    Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Akira Endo

    Centre for the Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
    Contribution
    Writing – review and editing
    Competing interests
    Received a research grant from Taisho Pharmaceutical Co., Ltd
    ORCID: 0000-0001-6377-7296
  4. Joel Hellewell

    Centre for the Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
    Contribution
    Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Elizabeth Miller

    1. Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom
    2. Immunisation and Countermeasures Division, UK Health Security Agency, London, United Kingdom
    Contribution
    Data curation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-1884-0097
  6. Nick Andrews

    Data and Analytical Sciences, UK Health Security Agency, London, United Kingdom
    Contribution
    Data curation, Writing – review and editing
    Competing interests
    No competing interests declared
  7. Philip K Maini

    Mathematical Institute, University of Oxford, Oxford, United Kingdom
    Contribution
    Methodology, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-0146-9164
  8. Sebastian Funk

    Centre for the Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
    Contribution
    Conceptualization, Methodology, Project administration, Supervision, Writing – review and editing
    Contributed equally with
    Robin N Thompson
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2842-3406
  9. Robin N Thompson

    1. Mathematics Institute, University of Warwick, Coventry, United Kingdom
    2. Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research, University of Warwick, Coventry, United Kingdom
    Contribution
    Conceptualization, Methodology, Supervision, Writing – review and editing
    Contributed equally with
    Sebastian Funk
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-8545-5212

Funding

Engineering and Physical Sciences Research Council (EP/R513295/1)

  • William S Hart

National Institute for Health Research (NIHR200929)

  • Elizabeth Miller

Taisho Pharmaceutical Co., Ltd (Research grant)

  • Akira Endo

UKRI (EP/V053507/1)

  • Robin N Thompson

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Thanks to Pauline Waight, who managed the data for the household study, and to the PHE staff who collected the data and tested the PCR and serum samples. Thanks also to Rob Challen, Julia Gog, Matt Keeling and other members of the Juniper Consortium (https://maths.org/juniper/) for helpful comments about this research.

Version history

  1. Received: May 29, 2021
  2. Preprint posted: May 30, 2021
  3. Accepted: February 7, 2022
  4. Accepted Manuscript published: February 9, 2022 (version 1)
  5. Version of Record published: March 30, 2022 (version 2)

Copyright

© 2022, Hart et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

William S Hart, Sam Abbott, Akira Endo, Joel Hellewell, Elizabeth Miller, Nick Andrews, Philip K Maini, Sebastian Funk, Robin N Thompson (2022) Inference of the SARS-CoV-2 generation time using UK household data. eLife 11:e70767. https://doi.org/10.7554/eLife.70767
