Inference of the SARS-CoV-2 generation time using UK household data
Abstract
The distribution of the generation time (the interval between individuals becoming infected and transmitting the virus) characterises changes in the transmission risk during SARS-CoV-2 infections. Inferring the generation time distribution is essential to plan and assess public health measures. We previously developed a mechanistic approach for estimating the generation time, which provided an improved fit to data from the early months of the COVID-19 pandemic (December 2019–March 2020) compared to existing models (Hart et al., 2021). However, few estimates of the generation time exist based on data from later in the pandemic. Here, using data from a household study conducted from March to November 2020 in the UK, we provide updated estimates of the generation time. We considered both a commonly used approach in which the transmission risk is assumed to be independent of when symptoms develop, and our mechanistic model in which transmission and symptoms are linked explicitly. Assuming independent transmission and symptoms, we estimated a mean generation time (4.2 days, 95% credible interval 3.3–5.3 days) similar to previous estimates from other countries, but with a higher standard deviation (4.9 days, 3.0–8.3 days). Using our mechanistic approach, we estimated a longer mean generation time (5.9 days, 5.2–7.0 days) and a similar standard deviation (4.8 days, 4.0–6.3 days). As well as estimating the generation time using data from the entire study period, we also considered whether the generation time varied temporally. Both models suggest a shorter mean generation time in September–November 2020 compared to earlier months. Since the SARS-CoV-2 generation time appears to be changing, further data collection and analysis are necessary to continue to monitor ongoing transmission and inform future public health policy decisions.
Editor's evaluation
This paper is a timely update to the authors' previous work and will be of interest to those working on public health responses and the mathematical modelling of infectious diseases. In this work, the authors infer the generation interval of SARS-CoV-2, which can inform the assessment of public health measures. The derivation of the likelihood function is also of interest to mathematical modellers, as it allows the generation interval to be inferred from data sets in which susceptible depletion may dominate infection dynamics.
https://doi.org/10.7554/eLife.70767.sa0
Introduction
The generation time (or generation interval) of a SARS-CoV-2 infector–infectee pair is defined as the period of time between the infector and infectee each becoming infected (Anderson and May, 1992; Diekmann and Heesterbeek, 2000; Griffin et al., 2020; Svensson, 2007; Wallinga and Lipsitch, 2007). The generation time distribution of many infector–infectee pairs characterises the temporal profile of the transmission risk of an infected host (averaged over all hosts and normalised so that it represents a valid probability distribution; Fraser, 2007). Inferring the generation time distribution of SARS-CoV-2 is important in order to predict the effects of non-pharmaceutical interventions such as contact tracing and quarantine (Ashcroft et al., 2021; Ferretti et al., 2020b; Hart et al., 2021). In addition, the generation time distribution is widely used in epidemiological models for estimating the time-dependent reproduction number from case notification data (Abbott et al., 2020; Fraser, 2007; Gostic et al., 2020; Thompson et al., 2020) and is crucial for understanding the relationship between the reproduction number and the epidemic growth rate (Fraser, 2007; Parag et al., 2021; Park et al., 2020a; Wallinga and Lipsitch, 2007).
The SARS-CoV-2 generation time distribution has previously been estimated using data from known infector–infectee transmission pairs (Ferretti et al., 2020a; Ferretti et al., 2020b; Hart et al., 2021) or entire clusters of cases (Ganyani et al., 2020; Hu et al., 2021; Sun et al., 2021). These studies involved data (Cheng et al., 2020; Ferretti et al., 2020b; Ganyani et al., 2020; He et al., 2020; Xia et al., 2020; Zhang et al., 2020) collected between December 2019 and April 2020, almost entirely from countries in East and Southeast Asia (with the exception of four transmission pairs from Germany and four from Italy in Ferretti et al., 2020b). Evidence from January and February 2020 in China suggested a temporal reduction in the mean generation time due to non-pharmaceutical interventions (Sun et al., 2021). Specifically, effective isolation of infected individuals is likely to have reduced the proportion of transmissions occurring when potential infectors were in the later stages of infection, thereby shortening the generation time (Sun et al., 2021). Similarly, two other studies found a decrease in the serial interval (the difference between symptom onset times of an infector and infectee; Ali et al., 2020) and an increase in the proportion of presymptomatic transmissions (Bushman et al., 2021) in China over the same time period, which can be attributed to symptomatic hosts being isolated increasingly quickly over time.
Despite estimation of the SARS-CoV-2 generation time in Asia early in the pandemic, relatively little is known about the generation time distribution outside Asia, and whether or not any changes have occurred in the generation time since the early months of the pandemic. At the time of writing, we are aware of only one previous study in which the generation time was estimated using data from the UK (Challen et al., 2021). In that study (Challen et al., 2021), data describing symptom onset dates for 50 infector–infectee pairs, collected by Public Health England (PHE; now the UK Health Security Agency) between January and March 2020 as part of the ‘First Few Hundred’ case protocol (Boddington et al., 2021; Public Health England, 2020), were used to infer the generation time distribution. However, since these transmission pairs mostly consisted of international travellers and their household contacts, the authors concluded that their estimates of the generation time may have been biased downwards due to enhanced surveillance and isolation of these cases (Challen et al., 2021).
Here, we use data from a household study (Miller et al., 2021), conducted between March and November 2020, to estimate the SARS-CoV-2 generation time distribution in the UK under two different underlying transmission models. In the first model (the ‘independent transmission and symptoms model’), a parsimonious assumption is made that the generation time and the incubation period of the infector are independent (i.e. there is no link between the times at which infectors transmit the virus and the times at which they develop symptoms), as has often been employed in studies in which the SARS-CoV-2 generation time has been estimated (Challen et al., 2021; Ferretti et al., 2020a; Ganyani et al., 2020; Hart et al., 2021; Lehtinen et al., 2021; Table 1). In the second model (the ‘mechanistic model’), we use a mechanistic approach in which potential infectors progress through different stages of infection, first becoming infectious before developing symptoms (Hart et al., 2021). Infectiousness is therefore explicitly linked to symptoms in the mechanistic model. A feature of the mechanistic model is that individuals with longer incubation periods will (on average) be infectious for longer before developing symptoms, and so generate more transmissions, compared to those with shorter incubation periods.
By fitting separately to data from three different time intervals within the study period, we explore whether or not there was a detectable temporal change in the generation time distribution.
Results
Inferring the generation time from UK household data
We fitted two models of infectiousness (the independent transmission and symptoms model and the mechanistic model) to data collected from 172 UK households in a study (Miller et al., 2021) conducted by PHE between March and November 2020 (Figure 1—source data 1). Each household was recruited to the study following a confirmed SARS-CoV-2 infection, and all household members were then followed to investigate whether or not they became infected (this was determined through PCR and antibody testing). If a household member was infected and developed symptoms, their symptom onset date was recorded (see Methods).
In our previous work (Hart et al., 2021), we fitted the same two models of infectiousness to data from infector–infectee transmission pairs collected in the early months of the COVID-19 pandemic. Here, we adapted the approach presented in that article (Hart et al., 2021) in order to estimate the generation time using household transmission data. Specifically, we used data augmentation MCMC, augmenting the observed data with both estimated times of infection and estimated precise times at which symptomatic infected hosts developed symptoms (within recorded symptom onset dates). This enabled us (in the likelihood function) to account for uncertainty about exactly who infected whom within a household by summing together likelihood contributions corresponding to infection by different possible infectors. In addition, we corrected for the regularity of household contacts to derive more widely applicable estimates of the generation time. We did this by including a factor in the likelihood to account for each infected individual avoiding infection from household contacts that occurred prior to their actual time of infection (see Methods for full details of our approach).
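The likelihood structure described above can be sketched for a single household member, given augmented infection times. This is a minimal illustration under simplifying assumptions, not the authors' implementation: it assumes a lognormal generation time, the frequency-dependent per-pair rate ${\beta}_{0}/n$ used in the main analyses, identical infectiousness for all infectors (no asymptomatic factor) and complete follow-up. The function `member_log_likelihood` and its arguments are hypothetical names. The infection hazard is summed over every member infected earlier, reflecting uncertainty about who infected whom, and the cumulative hazard term accounts for the member avoiding infection before their actual infection time.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Density of a lognormal(mu, sigma) distribution at t (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(t, mu, sigma):
    """Cumulative distribution function of a lognormal(mu, sigma) at t (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def member_log_likelihood(t_i, infector_times, beta0, n, mu, sigma):
    """Hypothetical log-likelihood contribution of one non-primary household member.

    t_i: augmented infection time of this member, or None if never infected
    infector_times: augmented infection times of the other infected household members
    beta0, n: overall infectiousness and household size (per-pair rate beta0 / n)
    mu, sigma: parameters of the assumed lognormal generation time
    """
    rate = beta0 / n
    if t_i is None:
        # Escaped infection over each infected member's entire course (CDF tends to 1).
        return -rate * len(infector_times)
    # Infection hazard at t_i, summed over all members infected earlier,
    # because who infected whom is unknown.
    hazard = sum(rate * lognormal_pdf(t_i - t_j, mu, sigma)
                 for t_j in infector_times if t_j < t_i)
    if hazard <= 0.0:
        return -math.inf  # nobody could have infected this member at t_i
    # Cumulative hazard from household contacts before t_i (the escape factor).
    cumulative = sum(rate * lognormal_cdf(t_i - t_j, mu, sigma) for t_j in infector_times)
    return math.log(hazard) - cumulative


# Illustrative, made-up augmented times: primary case at day 0, another member at day 3.
print(member_log_likelihood(5.0, [0.0, 3.0], beta0=0.6, n=3, mu=1.0, sigma=0.9))
```

A full data augmentation scheme would add such terms over all non-primary household members and propose updates to the augmented infection and symptom onset times within each MCMC iteration.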
For the two fitted models, we calculated posterior estimates of the mean (Figure 1A) and standard deviation (Figure 1B) of the generation time distribution, in addition to the proportion of transmissions occurring prior to symptom onset (among infectors who develop symptoms; Figure 1C) and the overall infectiousness parameter, ${\beta}_{0}$ (see Methods; Figure 1D). Under the commonly used independent transmission and symptoms model, we obtained a point estimate of 4.2 days (95% credible interval (CrI) 3.3–5.3 days) for the mean generation time (Figure 1A, blue violin; we calculated point estimates for each model using the posterior means of fitted model parameters because the mode of the joint posterior distribution could not easily be calculated from the output of the MCMC procedure). This value is similar to a previous estimate obtained using data from China by Ganyani et al., 2020. It is slightly lower than estimates for Singapore obtained by Ganyani et al., 2020 and for several countries (predominantly in Asia) obtained by Ferretti et al., 2020b (Table 1), although those estimates lie within our credible interval. On the other hand, our estimated standard deviation of 4.9 days (95% CrI 3.0–8.3 days; Figure 1B, blue violin) is substantially higher than previous estimates (Table 1). Using our mechanistic model, we obtained a higher estimate for the mean generation time of 5.9 days (95% CrI 5.2–7.0 days; Figure 1A, red violin), and a similar estimate for the standard deviation (4.8 days, 95% CrI 4.0–6.3 days; Figure 1B, red violin), compared to those predicted by the independent transmission and symptoms model.
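The generation time summaries above are the mean and standard deviation of the fitted distribution, which is lognormal under the independent transmission and symptoms model (see Sensitivity analyses). Reusing them in other software usually requires the log-scale parameters $\mu$ and $\sigma$. The sketch below applies the standard lognormal moment formulas; the function names are illustrative, not part of the study's code.

```python
import math


def lognormal_params(mean, sd):
    """Log-scale parameters (mu, sigma) of a lognormal with the given mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def lognormal_moments(mu, sigma):
    """Mean and SD of a lognormal(mu, sigma); the inverse of lognormal_params."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, sd


# Point estimates under the independent transmission and symptoms model.
mu_g, sigma_g = lognormal_params(4.2, 4.9)
median_g = math.exp(mu_g)  # the median of a lognormal is exp(mu)
print(f"mu = {mu_g:.3f}, sigma = {sigma_g:.3f}, median = {median_g:.2f} days")
```

Because the standard deviation exceeds the mean, the implied distribution is strongly right-skewed: its median, $\exp(\mu)$, is about 2.7 days, well below the mean of 4.2 days.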
The two models gave similar posterior distributions for the proportion of transmissions prior to symptom onset (Figure 1C). Specifically, point estimate values of model parameters led to an estimated proportion of transmissions prior to symptom onset of 0.72 (95% CrI 0.63–0.80) for the independent transmission and symptoms model, and 0.73 (95% CrI 0.61–0.83) for the mechanistic model. These estimates are higher than those obtained in some previous studies in which the infectiousness profile of SARS-CoV-2 infected hosts at each time since infection and/or time since symptom onset has been estimated (Ashcroft et al., 2020; Ferretti et al., 2020a; He et al., 2020). On the other hand, our point estimates for the two models both lie within the 95% credible interval obtained for the mechanistic model in our previous work (0.53–0.77, point estimate 0.65; Hart et al., 2021). Similar or higher estimates also exist in the wider literature (Casey-Bryars et al., 2021; Ganyani et al., 2020; Tindale et al., 2020).
Posterior distributions for fitted model parameters are shown in Figure 1—figure supplement 1 and Figure 1—figure supplement 2, and point estimates and 95% credible intervals are given in Appendix 1—table 2 and Appendix 1—table 3. Since only the likelihood with respect to augmented data was calculated in the MCMC procedure, direct comparisons of the goodness of fit between the models were not readily available. However, comparing model predictions of the distribution of the interval between successive symptom onset dates in households to the analogous distribution in the data indicated that both models provided a similar fit to the data (Figure 1—figure supplement 3).
In Figure 1 (and elsewhere, unless otherwise stated), we characterise the generation time distribution assuming that a constant supply of susceptible individuals is available to infect during the course of infection. This distribution corresponds to the normalised expected infectiousness profile of an infected host at each time since infection, and is widely applicable to transmission outside of, as well as within, households. However, realised household generation times are expected to be shorter than the estimates shown in Figure 1. This is because susceptible household members may be depleted before longer generation times can be realised, especially in small households (Cauchemez et al., 2009; Fraser, 2007; Park et al., 2020a). As a result, we also predicted the mean and standard deviation of realised generation times within the study households (Figure 1—figure supplement 4A,B), accounting for the precise distribution of household sizes in the study. For both the independent transmission and symptoms model and the mechanistic model, the mean (point estimates 3.6 days and 4.9 days for the two models, respectively) and standard deviation (3.8 days and 4.1 days) of realised household generation times were lower than our main generation time estimates shown in Figure 1. Since household transmission typically occurs earlier in the infector’s course of infection than indicated by the estimates shown in Figure 1, we predicted a higher proportion of presymptomatic transmissions within the study households (Figure 1—figure supplement 4C) compared to the estimates in Figure 1C.
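Why susceptible depletion shortens realised generation times can be seen in the simplest possible household: one infector and one susceptible. In this hypothetical sketch, the susceptible faces a hazard $\lambda g(t)$, where $g$ is the intrinsic generation time density and $\lambda$ the expected total infectious pressure, so that, conditional on infection, the realised generation time has density $f_R(t) = \lambda g(t)\exp(-\lambda G(t))/(1-\exp(-\lambda))$, with $G$ the cumulative distribution function. Later transmissions are down-weighted because the susceptible may already have been infected. The lognormal moments and the $\lambda$ values are illustrative assumptions; the study's own calculation accounted for the observed household size distribution.

```python
import math


def lognormal_params(mean, sd):
    """Log-scale parameters (mu, sigma) of a lognormal with the given mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def lognormal_pdf(t, mu, sigma):
    """Lognormal density for t > 0."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(t, mu, sigma):
    """Lognormal cumulative distribution function for t > 0."""
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def realised_mean(lam, mu, sigma, t_max=200.0, dt=0.005):
    """Mean realised generation time for one infector and one susceptible (toy model).

    The realised density is proportional to lam * g(t) * exp(-lam * G(t)); it is
    integrated with the midpoint rule on (0, t_max] and normalised numerically.
    """
    numerator = 0.0
    total = 0.0
    for k in range(int(t_max / dt)):
        t = (k + 0.5) * dt
        density = lam * lognormal_pdf(t, mu, sigma) * math.exp(-lam * lognormal_cdf(t, mu, sigma))
        numerator += t * density * dt
        total += density * dt
    return numerator / total


mu_g, sigma_g = lognormal_params(4.2, 4.9)
for lam in (1e-6, 0.5, 2.0):  # near-zero pressure recovers the intrinsic mean
    print(f"lambda = {lam}: mean realised generation time {realised_mean(lam, mu_g, sigma_g):.2f} days")
```

Increasing $\lambda$ shifts weight towards earlier times, so the realised mean falls monotonically below the intrinsic mean.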
For both models, we then used point estimates of fitted model parameters to infer the distributions of the generation time (Figure 2A), the time from onset of symptoms to transmission (TOST; Figure 2B) and the serial interval (Figure 2C). The TOST distribution (which characterises the relative expected infectiousness of a host (who develops symptoms) at each time from symptom onset, as opposed to from infection [Ashcroft et al., 2020; Ferretti et al., 2020a; He et al., 2020; Lehtinen et al., 2021; Wells et al., 2021]) obtained using the mechanistic model was more concentrated around the time of symptom onset compared to that predicted assuming independent transmission and symptoms (Figure 2B), as we found in our previous work (Hart et al., 2021). In contrast, the estimated serial interval distributions were similar for the two models (Figure 2C). The means and standard deviations of the distributions shown in Figure 2 are given in Appendix 1—table 4.
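Under the independent transmission and symptoms model, the TOST and serial interval follow directly from the generation time $G$ and the incubation periods $I$: TOST $= G - I_{\text{infector}}$ and serial interval $= G + I_{\text{infectee}} - I_{\text{infector}}$. The Monte Carlo sketch below assumes the generation time point estimates reported above and an incubation period with log-scale parameters 1.63 and 0.50 (our reading of McAloon et al., 2020; verify before reuse). It shows that the serial interval then has the same mean as the generation time but a larger variance, and that the presymptomatic proportion is the probability that TOST is negative. The printed values need not match the reported estimates exactly, and the sketch does not apply to the mechanistic model, in which transmission and symptoms are linked.

```python
import numpy as np

rng = np.random.default_rng(2021)
N = 500_000


def lognormal_params(mean, sd):
    """Log-scale parameters (mu, sigma) of a lognormal with the given mean and SD."""
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    return np.log(mean) - sigma2 / 2.0, np.sqrt(sigma2)


mu_g, sigma_g = lognormal_params(4.2, 4.9)  # generation time point estimates
mu_i, sigma_i = 1.63, 0.50                  # assumed incubation period (log scale)

gen = rng.lognormal(mu_g, sigma_g, N)            # generation times
inc_infector = rng.lognormal(mu_i, sigma_i, N)   # drawn independently of gen
inc_infectee = rng.lognormal(mu_i, sigma_i, N)

tost = gen - inc_infector                   # time from infector's onset to transmission
serial = gen + inc_infectee - inc_infector  # serial interval

print(f"generation time: mean {gen.mean():.2f}, SD {gen.std():.2f}")
print(f"serial interval: mean {serial.mean():.2f}, SD {serial.std():.2f}")
print(f"proportion presymptomatic: {np.mean(tost < 0):.2f}")
```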
Temporal variation in the generation time distribution
To explore whether or not the generation time distribution changed during the study period, we separately fitted the independent transmission and symptoms model to the data from households in which the index case was recruited in (i) March–April, (ii) May–August, or (iii) September–November 2020 (Figure 3). We chose these time periods to ensure the numbers of households recruited into the study during each interval were similar (Figure 3—figure supplement 1).
The results shown in Figure 3A suggest a shorter mean generation time in September–November 2020 (2.9 days, 95% CrI 1.8–4.3 days) compared to earlier months (4.9 days, 95% CrI 3.6–6.3 days, for March–April and 5.2 days, 95% CrI 3.4–7.2 days, for May–August). Comparing the posterior estimates for May–August and September–November (the red and orange violins in Figure 3A, respectively) indicated a 97% posterior probability of a shorter mean generation time in the later of these two time periods. A similar temporal reduction in the mean generation time was found when we instead fitted the mechanistic model to the data from the three time intervals (Figure 3—figure supplement 2). Estimates of the mean generation time using the mechanistic model were 6.5 days (95% CrI 5.6–8.1 days) for March–April, 7.1 days (95% CrI 5.7–9.6 days) for May–August, and 5.1 days (95% CrI 4.3–6.4 days) for September–November, with a 98% posterior probability of a shorter mean generation time in September–November than May–August. We also used point estimates of model parameters to compare the distributions of the generation time, TOST and serial interval between the time periods (Figure 3—figure supplement 3), with both models indicating that the transmission risk peaked earlier in infection for individuals infected in September–November compared to earlier months (Figure 3—figure supplement 3A,D).
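A posterior probability such as the 97% quoted above can be obtained from MCMC output by comparing draws from the two separately fitted posteriors. The sketch below assumes the two fits are independent and compares every pair of draws exactly using a sorted search; `prob_shorter` is a hypothetical helper, and the synthetic draws are placeholders rather than the study's posterior samples.

```python
import numpy as np


def prob_shorter(later, earlier):
    """Fraction of (later, earlier) draw pairs in which the later value is smaller.

    Assumes the two sets of posterior draws come from independent fits.
    """
    later = np.asarray(later, dtype=float)
    earlier = np.sort(np.asarray(earlier, dtype=float))
    # For each later draw, count the earlier draws that are strictly greater.
    n_greater = earlier.size - np.searchsorted(earlier, later, side="right")
    return float(n_greater.sum()) / (later.size * earlier.size)


# Placeholder draws of the mean generation time (days) for two time periods.
rng = np.random.default_rng(1)
may_aug = rng.normal(5.2, 1.0, 4000)
sep_nov = rng.normal(2.9, 0.6, 4000)
print(f"P(shorter in later period) = {prob_shorter(sep_nov, may_aug):.3f}")
```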
Figure 3C shows posterior estimates for the proportion of transmissions occurring prior to symptom onset (among symptomatic infectors) across the three time periods for the independent transmission and symptoms model, indicating a very high proportion of presymptomatic transmissions in September–November (0.83, 95% CrI 0.72–0.93) compared to lower estimates for March–April (0.64, 95% CrI 0.51–0.77) and May–August (0.62, 95% CrI 0.41–0.79). Our results for the mechanistic model indicate a similar temporal increase in the proportion of presymptomatic transmissions during the study period (Figure 3—figure supplement 2C).
To explore the lower estimated generation time for September–November further, we also fitted the independent transmission and symptoms model to the data from each of these months individually (Figure 3—figure supplement 4). The shorter estimated generation time compared to earlier in the pandemic was consistent across each of the three months (Figure 3—figure supplement 4A). We note that, while the Alpha (B.1.1.7) variant had begun to emerge in the UK by the end of the study period (Public Health England, 2021), genomic surveillance as part of the study showed that this variant caused infections in only two study households. This variant was therefore unlikely to have been responsible for the temporal reduction in the generation time that we observed.
In Figure 3—figure supplement 5, we show the posterior distributions of the fitted parameters for the mechanistic model (other than the overall infectiousness, ${\beta}_{0}$, which is shown in Figure 3D) over the different time periods. These parameters represent the mean duration of the latent period (expressed as a proportion of the mean incubation period; Figure 3—figure supplement 5A), the mean duration of the symptomatic infectious period (Figure 3—figure supplement 5B), and the relative infectiousness of presymptomatic infectious hosts compared to those with symptoms (Figure 3—figure supplement 5C). However, there was substantial overlap in the credible intervals of posterior estimates of each parameter between the three time periods. We were therefore unable to identify the precise parameter(s) responsible for the decrease in generation time and increase in the proportion of presymptomatic transmissions that we observed.
Sensitivity analyses
When we fitted the two models to the household transmission data, we assumed that each household transmission chain was initiated by a single primary case and all other infected household members were infected from within the household. However, we also extended our framework to account for the possibility of co-primary cases (Appendix 1, Figure 1—figure supplement 5 and Figure 3—figure supplement 6). This led to slightly higher estimates of the mean generation time (Figure 1—figure supplement 5A) under each model compared to the corresponding estimates shown in Figure 1A, with point estimates of 4.8 days (95% CrI 3.6–6.3 days) for the independent transmission and symptoms model and 6.8 days (95% CrI 5.7–8.6 days) for the mechanistic model. Estimates of the standard deviation of the generation time distribution were similar to those in Figure 1 (Figure 1—figure supplement 5B); point estimates were 4.8 days (95% CrI 2.9–7.9 days) for the independent transmission and symptoms model and 5.1 days (95% CrI 4.0–6.9 days) for the mechanistic model. As part of the fitting procedure, we estimated the probability that each household member was infected during the primary transmission event (Figure 1—figure supplement 5E), obtaining point estimates of 0.17 (95% CrI 0.02–0.33) under the independent transmission and symptoms model and 0.27 (95% CrI 0.10–0.41) under the mechanistic model. We also repeated the analyses in Figure 3 but accounting for the possibility of co-primary cases (Figure 3—figure supplement 6). Our main qualitative finding remained unchanged: the mean generation time was found to decrease during the study period (Figure 3—figure supplement 6A).
In the independent transmission and symptoms model, we assumed that both the generation time and incubation period follow lognormal distributions. The mean and standard deviation of the generation time distribution were estimated by fitting the model to the household transmission data. In the fitting procedure, we assumed that the incubation period followed a lognormal distribution that was obtained in a previous meta-analysis (McAloon et al., 2020). In contrast, we assumed in our mechanistic approach that each infection could be decomposed into three gamma-distributed stages (latent, presymptomatic infectious and symptomatic infectious), so that the incubation period was also gamma distributed (with the same mean and standard deviation as the lognormal distribution obtained by McAloon et al., 2020). An expression for the generation time distribution in the mechanistic model, which does not take a simple parametric form, is given in the Appendix. However, we conducted supplementary analyses in which we instead assumed that either the generation time (Figure 1—figure supplement 6) or incubation period (Figure 1—figure supplement 7) in the independent transmission and symptoms model was gamma distributed. In both cases, we obtained similar results to those shown for that model in Figure 1.
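Matching a gamma distribution to the mean and standard deviation of a lognormal incubation period, as described for the mechanistic model, needs only the standard moment formulas. The sketch below assumes log-scale incubation parameters of 1.63 and 0.50 (our reading of McAloon et al., 2020) and, purely for illustration, splits the gamma into latent and presymptomatic stages with a common scale: a sum of independent gamma variables sharing a scale is again gamma distributed, with the shapes added. The latent fraction is a hypothetical value, not a fitted estimate.

```python
import math


def lognormal_moments(mu, sigma):
    """Mean and SD of a lognormal(mu, sigma) distribution."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    return mean, mean * math.sqrt(math.exp(sigma ** 2) - 1.0)


def gamma_from_moments(mean, sd):
    """Shape and scale of the gamma distribution with the given mean and SD."""
    return (mean / sd) ** 2, sd ** 2 / mean


inc_mean, inc_sd = lognormal_moments(1.63, 0.50)  # assumed incubation parameters
shape, scale = gamma_from_moments(inc_mean, inc_sd)

# Hypothetical split of the incubation period into two gamma stages sharing `scale`.
latent_fraction = 0.5
latent_shape = latent_fraction * shape
presymptomatic_shape = (1.0 - latent_fraction) * shape
print(f"incubation ~ Gamma(shape={shape:.2f}, scale={scale:.2f}), mean {inc_mean:.2f} days")
print(f"latent ~ Gamma({latent_shape:.2f}, {scale:.2f}); "
      f"presymptomatic ~ Gamma({presymptomatic_shape:.2f}, {scale:.2f})")
```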
We also relaxed the assumption of a fixed incubation period distribution (Figure 1—figure supplement 8), using the confidence intervals obtained by McAloon et al., 2020 to account for uncertainty in the incubation period distribution (Figure 1—figure supplement 8A, B). For both the independent transmission and symptoms model and the mechanistic model, accounting for this uncertainty did not substantially affect posterior estimates of either the mean (Figure 1—figure supplement 8C) or the standard deviation (Figure 1—figure supplement 8D) of the generation time distribution.
In our main analyses, we assumed that household transmission was frequency-dependent, so that the force of infection exerted by an infected household member on each susceptible household member scales with $1/n$, where $n$ is the household size (Cauchemez et al., 2014; Cauchemez et al., 2004). However, since some studies of influenza virus transmission in households have found transmission to lie somewhere between frequency- and density-dependent (Endo et al., 2019; Ferguson et al., 2005), we also considered alternative possibilities where infectiousness scales with ${n}^{-\rho}$, for different values of $\rho$. In Figure 1—figure supplement 9A–C, we compared estimates under our baseline value of $\rho =1$ (frequency-dependent transmission) with those obtained assuming either $\rho =0$ (density-dependent transmission) or the intermediate possibility of $\rho =0.5$ considered by Endo et al., 2019. In addition, we conducted an analysis in which the dependency, $\rho$, was estimated alongside other model parameters (Figure 1—figure supplement 9D). We found that our estimates of the mean and standard deviation of the generation time distribution were robust to the assumed value of $\rho$ (Figure 1—figure supplement 9A, B). When the value of $\rho$ was fitted (Figure 1—figure supplement 9D), we estimated a value of 1.0 (95% CrI 0.6–1.5). This supported our assumption of frequency-dependent transmission, although the credible interval was relatively wide. Finally, we considered the possibility that infectiousness instead scales with $1/(n-1)$, so that the infector under consideration is not included in this scaling, and again obtained similar estimates of the mean and standard deviation of the generation time distribution compared to those shown in Figure 1 (Figure 1—figure supplement 10).
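The household-size scalings compared above can be written as one small function. This is a hypothetical helper, not the study's code: `rho = 1` gives frequency-dependent transmission (${\beta}_{0}/n$), `rho = 0` density-dependent transmission (${\beta}_{0}$), and `exclude_infector=True` the alternative $1/(n-1)$ scaling.

```python
def pair_infectiousness(beta0, n, rho=1.0, exclude_infector=False):
    """Infectiousness of one infected member towards each susceptible in a household of size n."""
    if exclude_infector:
        if n < 2:
            raise ValueError("the 1/(n - 1) scaling needs at least two household members")
        return beta0 / (n - 1)
    return beta0 * n ** (-rho)


# Total infectious pressure exerted by one infector on n - 1 susceptible contacts.
for n in (2, 4, 6):
    freq = (n - 1) * pair_infectiousness(1.0, n, rho=1.0)
    dens = (n - 1) * pair_infectiousness(1.0, n, rho=0.0)
    print(f"n = {n}: frequency-dependent {freq:.2f}, density-dependent {dens:.2f}")
```

Under frequency-dependent scaling the total pressure, $(n-1){\beta}_{0}/n$, stays below ${\beta}_{0}$ however large the household, whereas under density-dependent scaling it grows linearly with $n$.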
We also considered the sensitivity of our results to the assumed relative infectiousness of asymptomatic infected hosts (Figure 1—figure supplement 11). In most of our analyses, we assumed that the expected infectiousness of an infected host who remained asymptomatic throughout infection was a factor ${\alpha}_{A}=0.35$ times that of a host who develops symptoms, at each time since infection (Buitrago-Garcia et al., 2020). However, similar estimates of the mean (Figure 1—figure supplement 11A) and standard deviation (Figure 1—figure supplement 11B) of the generation time distribution were obtained when we instead assumed ${\alpha}_{A}=0.1$ or ${\alpha}_{A}=1.27$ (these values corresponded to the lower and upper confidence bounds obtained by Buitrago-Garcia et al., 2020). Lower values of ${\alpha}_{A}$ did lead to slightly higher estimates of the overall infectiousness of infectors who develop symptoms, ${\beta}_{0}$ (Figure 1—figure supplement 11D). However, this effect was minimal, likely because very few cases in the household study were asymptomatic (27 out of 357).
Finally, we explored the robustness of our results to the exclusion of household members of unknown infection status (see Methods), considering the extreme possibilities where these individuals were instead assumed to have either all remained uninfected, or all become infected (Figure 1—figure supplement 12). Although the estimates of ${\beta}_{0}$ were affected by this assumption (Figure 1—figure supplement 12D), the estimated generation time distribution was robust to the assumed infection status of these individuals (Figure 1—figure supplement 12A,B).
Discussion
In this study, we estimated the generation time distribution of SARS-CoV-2 in the UK by fitting two different models to data describing the infection status and symptom onset dates of individuals in 172 households. The first model was predicated on an assumption that transmission and symptoms are independent. While this assumption has often been made in previous studies in which the SARS-CoV-2 generation time has been estimated (Challen et al., 2021; Deng et al., 2021; Ferretti et al., 2020b; Ganyani et al., 2020; Knight and Mishra, 2020), it is not an accurate reflection of the underlying epidemiology (Bacallado et al., 2020; Lehtinen et al., 2021). Therefore, we also considered a mechanistic model based on compartmental modelling, which was shown in our earlier work (Hart et al., 2021) to provide an improved fit to data from 191 SARS-CoV-2 infector–infectee pairs compared to previous models that have been used to estimate the generation time. Here, infection times and the order of transmissions within households were unknown, whereas in Hart et al., 2021 the direction of transmission was assumed to be known for each infector–infectee pair. For that reason, we needed to extend the statistical inference methods underlying our previous work (Hart et al., 2021) to fit the two models to household data. To do this, we used a data augmentation MCMC approach similar to previous studies of household influenza virus transmission (Cauchemez et al., 2009; Cauchemez et al., 2004; Ferguson et al., 2005).
Under the model assuming independent transmission and symptoms, we estimated a mean generation time of 4.2 days (95% CrI 3.3–5.3 days) and a standard deviation of 4.9 days (95% CrI 3.0–8.3 days). The estimate of the mean generation time was comparable to previous estimates obtained under this assumption using data from elsewhere (Ferretti et al., 2020a; Ferretti et al., 2020b; Ganyani et al., 2020; Table 1). On the other hand, while our credible interval for the standard deviation was wide, the estimates obtained in those previous studies (Ferretti et al., 2020a; Ferretti et al., 2020b; Ganyani et al., 2020) all lay below our lower 95% credible limit of 3.0 days. One potential cause of this disparity is the difference in isolation policies for symptomatic hosts between countries. In particular, the UK’s policy of self-isolation may be expected to lead to a longer-tailed generation time distribution compared to countries with a policy of isolation outside the home, since under home isolation, some within-household transmission is likely to occur even following isolation. Isolation outside the home was commonplace in the East and Southeast Asian countries where the majority of the data underlying the estimates by Ferretti et al., 2020a; Ferretti et al., 2020b; Ganyani et al., 2020 were collected.
Using the mechanistic model, we predicted a higher mean generation time of 5.9 days (95% CrI 5.2–7.0 days) compared to the value estimated under the assumption of independent transmission and symptoms. On the other hand, the inferred serial intervals for the independent transmission and symptoms model and mechanistic model were more similar (Figure 2C), with means of 4.2 days and 4.7 days, respectively. Temporal information in our household transmission data consisted mostly of symptom onset dates, with very few individuals testing positive before developing symptoms. Therefore, the variation in estimates of the generation time between the models can be attributed to differences in the assumed relationships between the generation time and serial interval under those models. For the independent transmission and symptoms model, the generation time and serial interval distributions have the same mean, as is commonly assumed to be the case (Lehtinen et al., 2021). However, this was not true for the mechanistic model, in which infected hosts with longer presymptomatic infectious periods generate (on average) a higher number of transmissions. As a result, under the mechanistic model, a randomly chosen infection is more likely to arise from an infector with a longer incubation period than from a host with a shorter incubation period, thereby leading to a longer generation time than serial interval (an analytical expression for the exact difference between the mean generation time and serial interval for that model is derived in the Appendix).
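The size-biasing argument above can be illustrated with a deliberately simplified Monte Carlo toy model, which is not the fitted mechanistic model and does not reproduce the analytical expression in the Appendix. Here the latent period is a fixed fraction of a gamma-distributed incubation period, infectiousness is constant within the presymptomatic and symptomatic stages (with relative infectiousness `alpha_p` before symptoms), and the symptomatic infectious period is exponential. All parameter values are hypothetical. Weighting each infector by its expected number of transmissions shows that infectors have longer incubation periods on average than infectees, so that the mean serial interval, $E[G] + E[I] - E_w[I]$, falls below the mean generation time.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 200_000

# Hypothetical toy parameters, not fitted values from this study.
inc_shape, inc_scale = 5.8, 1.0   # incubation period ~ Gamma (mean 5.8 days)
latent_frac = 0.5                 # latent period as a fixed fraction of the incubation period
alpha_p = 1.5                     # infectiousness before symptoms relative to after onset
mean_symp = 5.0                   # mean symptomatic infectious period (exponential)

inc = rng.gamma(inc_shape, inc_scale, N)
latent = latent_frac * inc
presym = inc - latent
symp = rng.exponential(mean_symp, N)

# Expected transmissions from each infector in each stage (infectiousness x duration).
w_pre = alpha_p * presym
w_sym = symp
w = w_pre + w_sym

# Constant infectiousness within a stage makes transmission times uniform within it.
gen_pre = latent + presym / 2.0   # mean infection-to-transmission time, presymptomatic stage
gen_sym = inc + symp / 2.0        # mean infection-to-transmission time, symptomatic stage
mean_gt = np.sum(w_pre * gen_pre + w_sym * gen_sym) / np.sum(w)

# Infectors are represented in proportion to their transmissions (size-biased),
# whereas infectees' incubation periods follow the unweighted distribution.
mean_inc_infector = np.sum(w * inc) / np.sum(w)
mean_si = mean_gt + inc.mean() - mean_inc_infector

print(f"mean incubation: all hosts {inc.mean():.2f}, infectors {mean_inc_infector:.2f}")
print(f"mean generation time {mean_gt:.2f}, mean serial interval {mean_si:.2f}")
```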
Our results do not indicate any clear difference in goodness of fit to the data between the two models (Figure 1—figure supplement 3). A range of factors should therefore be considered when deciding which of our estimates of epidemiological parameters to use in subsequent analyses. Although any model requires simplifying assumptions to be made, our mechanistic approach allows the standard assumption of independent transmission and symptoms to be relaxed by providing a mechanistic underpinning to the relationship between the times at which individuals display symptoms and become infectious. Furthermore, as described above, this model was shown in our previous work (Hart et al., 2021) to provide a better fit to an earlier SARS-CoV-2 dataset than a model assuming independence between transmission and symptoms (in our earlier work [Hart et al., 2021], the simpler setting of transmission pairs rather than households facilitated direct model comparison). On the other hand, the independent transmission and symptoms model has the advantage of producing an estimated generation time distribution with a simple parametric form. The choice of estimates to use may also depend on precisely what the estimates are being used for. For example, the generation time distribution inferred under the assumption of independent transmission and symptoms may be better suited for use in some models for estimating the time-dependent reproduction number, since those models often also involve the assumption that transmission and symptoms are independent (Abbott et al., 2020). In contrast, the parameter estimates from our mechanistic approach correspond naturally to parameters in compartmental epidemic models.
By fitting separately to data from three different intervals within the study period (March–November 2020), we investigated whether or not the generation time distribution in the UK changed as the pandemic progressed. Our results indicate a shorter mean generation time in September–November compared to earlier months (Figure 3A). One possible explanation for this is a higher proportion of time spent indoors in colder months leading to an increased transmission risk, particularly in the early stages of infection before symptoms develop (since symptomatic infected hosts are still likely to self-isolate). This explanation is consistent with our finding in Figure 3C of a higher proportion of transmissions occurring prior to symptom onset in September–November compared to March–April and May–August.
While behavioural changes may have been responsible for our finding of a temporal decrease in the generation time, an alternative explanation could be that evolutionary changes in the SARS-CoV-2 virus that occurred during the study period affected the generation time. For example, the B.1.177 lineage emerged in Spain in early summer 2020, and became the dominant SARS-CoV-2 lineage in the UK around the beginning of October 2020 (Vöhringer et al., 2021). Subsequently, the Alpha (B.1.1.7) variant, which was first detected in September 2020, became dominant in the UK in December 2020 (Public Health England, 2021). The Alpha variant has been shown to possess different characteristics from earlier variants (Davies et al., 2021; Volz et al., 2021), causing an increased epidemic growth rate in the UK that has been attributed to an increase in transmissibility of 43%–90% (Davies et al., 2021). While in principle evolutionary changes could explain the variation in the generation time that we observed, sequencing data show that the Alpha variant was responsible for infections in only two households within our dataset. Consequently, the Alpha variant was not responsible for our main finding of a temporally decreasing generation time, and additional data are required to quantify the impact of the emergence of that variant (and subsequent variants, such as the Delta (B.1.617.2) and Omicron (B.1.1.529) variants) on the SARS-CoV-2 generation time.
In data collected from infector–infectee transmission pairs, shorter generation times are expected to be overrepresented at times when case numbers are rising (Britton and Scalia Tomba, 2019; Ferretti et al., 2020b; Lehtinen et al., 2021), and vice versa. While we used data from households (rather than transmission pairs) in our analyses, a similar effect may have contributed to our shorter estimated mean generation time for September–November 2020 (national case numbers were mostly increasing in September–October 2020) compared to earlier months of the study (during which case numbers were mostly decreasing; Knock et al., 2021; Pouwels et al., 2021). However, we estimated the mean generation time in November (when case numbers were mostly decreasing [Knock et al., 2021; Pouwels et al., 2021]) to be similar to that in September and October (Figure 3—figure supplement 4), suggesting that this effect of background epidemic dynamics alone did not drive the temporal changes in generation time that we observed. We note, however, that sample sizes for individual months were small (Figure 3—figure supplement 1). Extending our household inference framework to explicitly account for background epidemic dynamics in generation time estimates (similar to methods that have been developed for transmission pair data [Britton and Scalia Tomba, 2019; Ferretti et al., 2020b]) is an avenue for future work.
Our finding of a temporal decrease in the mean generation time during the study period highlights the importance of obtaining up-to-date generation time estimates specific to the location under study. Should variations in the generation time distribution occur and not be accounted for, estimates of the time-dependent reproduction number may be incorrect (Park et al., 2021; Wallinga and Lipsitch, 2007). Specifically, if the mean generation time is shorter than assumed, then the true value of the time-dependent reproduction number is likely to be closer to one than the inferred value (Wallinga and Lipsitch, 2007), and vice versa.
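The direction of this bias can be illustrated with the relationship between the epidemic growth rate $r$ and the reproduction number, $R = 1/M(-r)$, where $M$ is the moment generating function of the generation time distribution (Wallinga and Lipsitch, 2007). The sketch below assumes a gamma distributed generation time, for which $R = (1 + r\theta)^k$ with shape $k$ and scale $\theta$. The growth rates and the function name are illustrative assumptions rather than quantities from this study; the sketch compares a mean of 5.9 days with a mean of 4.2 days at a fixed standard deviation of 4.8 days.

```python
def reproduction_number(growth_rate, gen_mean, gen_sd):
    """R = 1 / M(-r) for a gamma distributed generation time (Wallinga and Lipsitch, 2007).

    For shape k = mean^2 / sd^2 and scale theta = sd^2 / mean, the moment generating
    function gives R = (1 + r * theta) ** k, which is valid while 1 + r * theta > 0.
    """
    shape = (gen_mean / gen_sd) ** 2
    scale = gen_sd ** 2 / gen_mean
    base = 1.0 + growth_rate * scale
    if base <= 0.0:
        raise ValueError("growth rate too negative for this generation time distribution")
    return base ** shape


for r in (0.05, -0.05):  # illustrative daily growth rates, not estimates from the study
    longer = reproduction_number(r, gen_mean=5.9, gen_sd=4.8)
    shorter = reproduction_number(r, gen_mean=4.2, gen_sd=4.8)
    print(f"r = {r:+.2f}/day: R = {longer:.2f} (mean 5.9 days) vs {shorter:.2f} (mean 4.2 days)")
```

For a growing epidemic the shorter mean gives the smaller $R$, and for a declining epidemic it gives the larger $R$, so in both cases the estimate lies closer to one.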
One advantage of our approach compared to previous studies in which the SARS-CoV-2 generation time has been estimated (Ferretti et al., 2020a; Ganyani et al., 2020; Hart et al., 2021) is that we were able to include the contribution of asymptomatic infected hosts to household transmission chains in our analyses. We showed that our estimated generation time distribution was robust to the assumed relative infectiousness of infected hosts who remain asymptomatic, $\alpha_A$ (Figure 1—figure supplement 11). Similarly, while we assumed frequency-dependent household transmission in most of our analyses, we found that the exact relationship between the household size and transmission had little effect on our estimates of the mean and standard deviation of the generation time distribution (Figure 1—figure supplement 9 and Figure 1—figure supplement 10). We also considered estimating the exponent governing the dependency of transmission on household size (Figure 1—figure supplement 9D). This supported our assumption of frequency-dependent transmission, and is consistent with the finding of an inverse relationship between household size and secondary attack rate in the household study underlying our analyses (Miller et al., 2021). In previous studies of influenza transmission within households, evidence has been found both in favour of (Cauchemez et al., 2004) and against (Endo et al., 2019) frequency-dependent transmission.
While our generation time estimates were robust to the assumed relative infectiousness of infected hosts who remain asymptomatic and whether transmission was assumed to be frequency- or density-dependent, extending our approach to account for the possibility that household transmission chains originate with multiple co-primary cases led to slightly higher estimates of the generation time (Figure 1—figure supplement 5) compared to our main estimates (Figure 1). Despite the overall higher estimated generation time, our main qualitative finding of a temporal decrease in the generation time held when co-primary cases were incorporated (Figure 3—figure supplement 6).
Like any mathematical modelling study, our approach has some limitations. We used household data in our analyses, whereas some characteristics of wider community transmission may differ from those of transmission within households. However, we corrected for the regularity of household contacts to estimate the (expected) infectiousness profile of an infected host at each time since infection (accounting for behavioural factors), which provides a widely applicable generation time estimate (Figure 1). Specifically, the infectiousness profile gives the generation time distribution under the assumption that a constant supply of susceptible individuals is available throughout the course of infection. This distribution can then be conditioned on specific population structures, as we demonstrated by estimating the realised generation time distribution within the study households (Figure 1—figure supplement 4). The household generation time estimates shown in Figure 1—figure supplement 4 are shorter than our main generation time estimates (Figure 1), due to the regularity of household contacts and the depletion of susceptible individuals within households before longer generation times can be realised.
We also note that, while our dataset involved a larger sample size than used in most other studies in which the SARS-CoV-2 generation time was estimated (Ferretti et al., 2020a; Ferretti et al., 2020b; Ganyani et al., 2020; Hart et al., 2021), the demographics of the study households may not have been completely representative of the wider population. Exploring heterogeneity in the generation time distribution between individuals and/or households with different characteristics is an important topic for future work. This could involve, but is not limited to, estimating the generation time distribution for individuals of different age, sex, ethnicity, and socioeconomic status. Nonetheless, as well as providing updated SARS-CoV-2 generation time estimates, our study demonstrates that changes in the generation time can be detected using data from household studies. Our finding that the generation time has become shorter highlights both the importance of continued monitoring of the generation time and the role of household studies in such monitoring efforts, particularly in light of the more recent emergence of novel SARS-CoV-2 variants.
In summary, we have inferred the SARS-CoV-2 generation time distribution in the UK using household data and two different transmission models. A key output of this research is one of the first estimates of the SARS-CoV-2 generation time outside Asia. Another crucial feature of our analysis is that it was based on data from beyond the first few months of the pandemic. Since this research suggests that the generation time may be changing, continued data collection and analysis is of clear importance.
Methods
Data
Data were obtained from a household study (Miller et al., 2021) conducted in 172 UK households (with 603 household members in total) by PHE between March and November 2020 (Figure 1—source data 1). In each household, an index case was recruited following a positive PCR test. The following were then recorded for each household member:
The timing and outcome of (up to) two subsequent PCR tests.
The outcome of an antibody test (carried out for 541 individuals – 90% of the study cohort).
Whether or not the household member developed symptoms.
The date of symptom onset (only for symptomatic individuals with a positive PCR or antibody test).
In the study, all household members who tested positive in either a PCR or antibody test were assumed to have been infected. Conversely, all individuals who tested negative for antibodies and did not return a positive PCR test (i.e. the two PCR tests were either negative or were not carried out) were assumed to have remained uninfected, irrespective of symptom status. For 34 individuals (6% of the study cohort), no antibody test was carried out and any PCR tests were negative. Since the available data were considered insufficient to determine whether or not these 34 individuals were infected, these individuals were excluded from our main analyses (but were counted in the household size), although we also considered the sensitivity of our results to this assumption.
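The classification rules in this paragraph can be written as a small decision function. This is a sketch with hypothetical argument names and encodings (booleans for test outcomes, `None` for a missing antibody test), not code from the study.

```python
from typing import Optional, Sequence


def infection_status(pcr_positive: Sequence[bool], antibody_positive: Optional[bool]) -> str:
    """Classify a household member as 'infected', 'uninfected' or 'unknown'.

    pcr_positive: outcomes of the PCR tests that were carried out (True = positive);
                  an empty sequence means that no PCR test was carried out.
    antibody_positive: True/False for a positive/negative antibody test, None if not tested.
    Symptom status is deliberately not used.
    """
    if any(pcr_positive) or antibody_positive is True:
        return "infected"
    if antibody_positive is False:
        return "uninfected"
    # No antibody test and no positive PCR test: infection status cannot be determined.
    return "unknown"


print(infection_status([False, False], None))
```

Individuals returned as `"unknown"` correspond to those excluded from the main analyses while still being counted in the household size.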
In two households, at least one household member developed symptoms 55–56 days prior to the symptom onset date of the index case, with no other household members developing symptoms (or returning a positive PCR or antibody test) between these dates. In contrast, the maximum gap between successive symptom onset dates in the remaining households was 25 days (Figure 1—figure supplement 3). Data from these two households were excluded from our analyses, on the basis that the virus was most likely introduced multiple times into these households. Three other households were also excluded from our analyses because, other than the index cases in each household, all other household members were of unknown infection status (i.e. they were among the individuals for whom no antibody test was carried out and any PCR tests were negative).
Overall, aside from the five excluded households, the 167 remaining households comprised 587 individuals, of whom 330 became infected and developed symptoms, 27 became infected but remained asymptomatic, 200 remained uninfected, and the remaining 30 were of unknown infection status. The number of households and individuals recruited into the study by month is shown in Figure 3—figure supplement 1.
Models
General modelling framework
Throughout, we denote the expected force of infection exerted by an infected host onto each susceptible member of their household, at time since infection $\tau$, by $\beta(\tau)$, where we assumed
$$\beta(\tau) = \frac{\beta_0}{n} f(\tau)$$
for a host who develops symptoms, and
$$\beta(\tau) = \alpha_A \frac{\beta_0}{n} f(\tau)$$
for a host who remains asymptomatic throughout infection. Here:
${\beta}_{0}$ is the overall infectiousness parameter, describing the expected number of household transmissions generated by a single infected host (who develops symptoms) in a large, otherwise entirely susceptible, household.
$n$ is the household size. The scaling of $\beta(\tau)$ with $1/n$ corresponds to frequency-dependent transmission, as assumed by Cauchemez et al., 2014; Cauchemez et al., 2004, although we carried out a sensitivity analysis in which we considered alternative possibilities where household transmission is density-dependent (without the scaling factor $1/n$), scales with $1/n^{0.5}$ (Endo et al., 2019), or scales with $1/(n-1)$.
$f(\tau)$ is the generation time distribution (which was assumed to be the same for entirely asymptomatic hosts as for those who develop symptoms).
$\alpha_A$ is the relative infectiousness of infected hosts who remain asymptomatic throughout infection. We assumed a value of 0.35 (Buitrago-Garcia et al., 2020) in most of our analyses, although we considered different values of $\alpha_A$ in a sensitivity analysis.
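The definitions above can be collected into a single function for the unconditional force of infection. This is a minimal sketch under stated assumptions: the function names and mode labels are hypothetical, `toy_f` is a placeholder exponential density rather than an estimated generation time distribution, and for the mechanistic model the infectiousness would additionally be conditioned on the incubation period (see the Appendix).

```python
import math
from typing import Callable


def household_scaling(n: int, mode: str = "frequency") -> float:
    """Factor multiplying beta_0 for a household of size n."""
    if mode == "frequency":   # 1/n, used in the main analyses
        return 1.0 / n
    if mode == "density":     # no dependence on household size
        return 1.0
    if mode == "sqrt":        # 1/n^0.5
        return 1.0 / math.sqrt(n)
    if mode == "n_minus_1":   # 1/(n - 1); requires n >= 2
        return 1.0 / (n - 1)
    raise ValueError(f"unknown scaling mode: {mode}")


def force_of_infection(tau: float, f: Callable[[float], float], beta0: float, n: int,
                       symptomatic: bool, alpha_a: float = 0.35,
                       mode: str = "frequency") -> float:
    """Expected force of infection beta(tau) exerted on each susceptible household member."""
    if tau <= 0.0:
        return 0.0
    relative = 1.0 if symptomatic else alpha_a
    return relative * beta0 * household_scaling(n, mode) * f(tau)


def toy_f(t: float) -> float:
    """Placeholder generation time density (exponential, mean 5 days), for illustration only."""
    return math.exp(-t / 5.0) / 5.0


print(force_of_infection(2.0, toy_f, beta0=2.0, n=4, symptomatic=False))
```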
Except where otherwise stated, we considered the generation time distribution assuming a constant supply of susceptibles during infection, $f(\tau)$, which corresponds to the normalised expected infectiousness profile and gives a widely applicable generation time estimate (see Discussion). However, realised generation times within a household may be shorter than predicted by this distribution due to the depletion of susceptible household members before longer generation times can be realised (Cauchemez et al., 2009; Fraser, 2007; Park et al., 2020b). For example, if infected hosts are (on average) equally infectious at two times since infection, $\tau_1 < \tau_2$, then $f(\tau_1) = f(\tau_2)$. However, because the number of susceptible household members may decrease between these two times (i.e. either the host under consideration, or another infected household member, may transmit the virus within the household in the intervening time), transmission is in fact more likely to occur in a household at the earlier time, $\tau_1$, when more susceptibles are available. Therefore, we also predicted the mean and standard deviation of realised generation times within the study households in Figure 1—figure supplement 4.
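The depletion effect can be made concrete in the simplest case: a two-person household in which one infected host can infect the single other member. If that member is infected at rate $(\beta_0/2) f(\tau)$, the realised generation time has density proportional to $f(\tau)\exp(-(\beta_0/2)F(\tau))$, where $F$ is the cumulative distribution function of $f$, which shifts weight towards earlier times. The sketch below assumes a lognormal $f(\tau)$ with mean 4.2 days and standard deviation 4.9 days and an illustrative $\beta_0 = 2$ that is not a fitted value; the function names are hypothetical.

```python
import math


def lognormal_pdf_cdf(mean, sd):
    """Return the pdf and cdf of a lognormal distribution with the given mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu, sigma = math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

    def pdf(t):
        if t <= 0.0:
            return 0.0
        z = (math.log(t) - mu) / sigma
        return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))

    def cdf(t):
        if t <= 0.0:
            return 0.0
        return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))

    return pdf, cdf


def mean_generation_times(beta0, gen_mean=4.2, gen_sd=4.9, t_max=200.0, dt=0.01):
    """Return (mean of f, mean realised generation time) for a two-person household.

    The susceptible member is infected at rate (beta0 / 2) f(tau), so the realised
    generation time has density proportional to f(tau) exp(-(beta0 / 2) F(tau)).
    """
    pdf, cdf = lognormal_pdf_cdf(gen_mean, gen_sd)
    c = beta0 / 2.0  # frequency-dependent scaling with n = 2
    num_f = den_f = num_r = den_r = 0.0
    for i in range(int(t_max / dt)):
        t = (i + 0.5) * dt  # midpoint rule; dt cancels in the ratios below
        f_t = pdf(t)
        w_t = f_t * math.exp(-c * cdf(t))
        num_f += t * f_t
        den_f += f_t
        num_r += t * w_t
        den_r += w_t
    return num_f / den_f, num_r / den_r


profile_mean, realised_mean = mean_generation_times(beta0=2.0)  # beta0 value is illustrative
print(f"mean of f: {profile_mean:.2f} days; mean realised in household: {realised_mean:.2f} days")
```

As $\beta_0 \to 0$ the weighting factor tends to one and the two means coincide; in larger households, infections by other members deplete susceptibles further.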
We considered two different models of infectiousness, which are outlined below. Under each model, expressions were derived in Hart et al., 2021 for the generation time, TOST (time from onset of symptoms to transmission) and serial interval distributions, in addition to the proportion of transmissions occurring before symptom onset. These expressions are given in the Appendix here (other than the generation time distribution and proportion of presymptomatic transmissions for the independent transmission and symptoms model, which are stated below).
Independent transmission and symptoms model
In this model, the infectiousness of an infected host (who does not remain asymptomatic throughout infection; asymptomatic infected hosts are considered separately) at a given time since infection, $\tau$, is assumed to be independent of exactly when the host develops symptoms; that is, the generation time and incubation period are independent. In our main analyses using this model, we assumed that the generation time distribution, $f(\tau)$, is the probability density function of a lognormal distribution (Ferguson et al., 2005; an alternative case of a gamma distributed generation time is considered in Figure 1—figure supplement 6). The mean and standard deviation of this distribution, in addition to $\beta_0$, were estimated when we fitted the model to the household transmission data.
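For reference, a lognormal distribution with mean $m$ and standard deviation $s$ corresponds to a normally distributed $\ln\tau$ whose parameters $\mu$ and $\sigma$ satisfy the standard moment relations below. As a purely illustrative plug-in of the point estimates reported for this model (4.2 and 4.9 days), these give $\mu \approx 1.01$ and $\sigma \approx 0.93$.

```latex
\sigma^2 = \ln\!\left(1 + \frac{s^2}{m^2}\right), \qquad
\mu = \ln m - \frac{\sigma^2}{2}, \qquad
f(\tau) = \frac{1}{\tau\,\sigma\sqrt{2\pi}}
  \exp\!\left(-\frac{(\ln\tau - \mu)^2}{2\sigma^2}\right), \quad \tau > 0.
```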
Under the assumption of independent transmission and symptoms, the proportion of transmissions occurring prior to symptom onset (among infectors who develop symptoms) is given by (Ferretti et al., 2020b; Fraser et al., 2004)
$$\int_0^{\infty} f(\tau)\left(1 - F_{inc}(\tau)\right)\mathrm{d}\tau,$$
where $F_{inc}$ is the cumulative distribution function of the incubation period (this was assumed to be known; the exact incubation period distribution we used is given under ‘Parameter estimation’ below).
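The integral can be evaluated numerically once $f$ and $F_{inc}$ are specified. The sketch below assumes a lognormal generation time and the lognormal incubation period described under ‘Parameter estimation’ (mean 5.8 days, standard deviation 3.1 days), and uses the midpoint rule on a truncated grid. Plugging in single point estimates is only an illustration; a posterior estimate would instead evaluate the integral for each MCMC sample. The function names are hypothetical.

```python
import math


def lognormal_pdf_cdf(mean, sd):
    """Return the pdf and cdf of a lognormal distribution with the given mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu, sigma = math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

    def pdf(t):
        if t <= 0.0:
            return 0.0
        z = (math.log(t) - mu) / sigma
        return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))

    def cdf(t):
        if t <= 0.0:
            return 0.0
        return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))

    return pdf, cdf


def presymptomatic_proportion(gen_mean, gen_sd, inc_mean=5.8, inc_sd=3.1,
                              t_max=200.0, dt=0.01):
    """Midpoint-rule approximation of the integral of f(tau) * (1 - F_inc(tau)) over tau > 0."""
    gen_pdf, _ = lognormal_pdf_cdf(gen_mean, gen_sd)
    _, inc_cdf = lognormal_pdf_cdf(inc_mean, inc_sd)
    total = 0.0
    for i in range(int(t_max / dt)):
        tau = (i + 0.5) * dt
        total += gen_pdf(tau) * (1.0 - inc_cdf(tau)) * dt
    return total


p = presymptomatic_proportion(gen_mean=4.2, gen_sd=4.9)
print(f"proportion of transmissions before symptom onset: {p:.2f}")
```

The integrand weights each generation time by the probability that the infector has not yet developed symptoms, so generation times concentrated well before the typical incubation period give a proportion close to one.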
Mechanistic model
Under the mechanistic model (Hart et al., 2021), infectors who develop symptoms progress through independent latent (E), presymptomatic infectious (P) and symptomatic infectious (I) stages of infection. We assumed the duration of each stage to be gamma distributed, and infectiousness was assumed to be constant during each stage. Under these assumptions, an expression can be derived for the expected infectiousness, $\beta (\tau \mid {\tau}_{inc})$, of a host (who develops symptoms) at each time since infection $\tau $, conditional on their incubation period ${\tau}_{inc}$. We assumed that entirely asymptomatic infected hosts follow the same stage progression as those who develop symptoms, although in this case the distinction between the P and I stages has no epidemiological meaning. Details of the mechanistic approach, including the formula for $\beta (\tau \mid {\tau}_{inc})$, are provided in the Appendix.
When we fitted this model to the household transmission data, three model parameters were estimated in addition to ${\beta}_{0}$. These parameters correspond to:
The ratio between the mean latent (E) period and the mean incubation (combined E and P) period (where the latter was assumed to be known).
The mean symptomatic infectious (I) period.
The ratio between the transmission rates when potential infectors are in the P and I stages.
Likelihood function
Here, we consider a household of size $n$, in which $n_I$ household members become infected (of whom $n_S$ develop symptoms and $n_A$ remain asymptomatic throughout infection) and $n_U = n - n_I$ remain uninfected. We derive an expression for the likelihood of the parameters of either model of infectiousness, given the entire sequence of infection times of individuals in the household ($t_1 < \dots < t_{n_I}$) as well as the precise symptom onset time ($t_{s,j}$) of each host, $j$, who develops symptoms. In the case of the mechanistic model, the likelihood also depends on the times at which entirely asymptomatic infected hosts enter the I stage of infection (these times are also denoted by $t_{s,j}$, although for asymptomatic infected individuals these times have no epidemiological meaning). Since exact infection times were not available within study households, and it was unknown exactly when each symptomatic infected host developed symptoms within their recorded symptom onset date, we used data augmentation MCMC to fit the two models to the UK household transmission data using this likelihood function (see further details below).
When deriving the likelihood, we made several simplifying assumptions:
The virus is introduced once into the household (i.e. no subsequent infections from the community occur following the infection of the primary case).
No co-primary cases (we relaxed this assumption in the Appendix, Figure 1—figure supplement 5 and Figure 3—figure supplement 6).
Potential bias towards more recent introduction of the virus into the household if community prevalence is increasing, or less recent if prevalence is decreasing (Britton and Scalia Tomba, 2019; Ferretti et al., 2020b; Lehtinen et al., 2021), was neglected.
We denote the expected infectiousness of household member $j$, at time $\tau$ since infection, by $\beta_j(\tau)$. For the mechanistic model in which transmission and symptoms are not independent, this infectiousness is conditional on the duration of the incubation period, $t_{s,j} - t_j$, for a host who develops symptoms (the infectiousness is also conditional on $(t_{s,j} - t_j)$ for an entirely asymptomatic infected host, although this interval has no epidemiological meaning for such individuals). The total (instantaneous) force of infection exerted at time $t$ on each susceptible household member is then
$$\lambda(t) = \sum_{j=1}^{n_I} \beta_j(t - t_j),$$
where $\beta_j(t - t_j) = 0$ for $t \le t_j$, and the cumulative force of infection is
$$\Lambda(t) = \int_{-\infty}^{t} \lambda(s)\,\mathrm{d}s.$$
For $k = 2, \dots, n_I$, conditional on the sequence of infection times up to time $t_k$, the probability density that host $k$ becomes infected at time $t_k$ is given by
$$\lambda(t_k)\exp\left(-\Lambda(t_k)\right),$$
where $\exp(-\Lambda(t_k))$ represents the probability of host $k$ avoiding infection from household contacts that occurred before their actual time of infection, $t_k$ (Cauchemez et al., 2004; Ferguson et al., 2005). This factor, which was not included in the likelihood when we previously estimated the generation time using data from infector–infectee transmission pairs (Hart et al., 2021), is required here because of the regularity of household contacts. Since household contacts occur frequently, it is necessary to account explicitly for contacts between infected and susceptible individuals that did not lead to transmission. The inclusion of this factor in the likelihood therefore corrects for the regularity of household contacts to ensure widely applicable generation time estimates (note that this factor tends to one in the limit of a very small overall household infectiousness parameter, $\beta_0$).
For $k = n_I + 1, \dots, n$, conditional on the entire sequence of infection times, $t_1, \dots, t_{n_I}$, the probability of host $k$ never being infected is given by $\exp(-\Lambda(\infty))$. In the case of independent transmission and symptoms, we have
$$\exp\left(-\Lambda(\infty)\right) = \exp\left(-\frac{\beta_0}{n}\left(n_S + \alpha_A n_A\right)\right),$$
whereas for the mechanistic model, $\exp(-\Lambda(\infty))$ instead depends on the incubation periods of those hosts who develop symptoms, as well as the corresponding time periods for entirely asymptomatic infected hosts (see the Appendix).
The likelihood contribution from the household, $L(\theta)$, where $\theta$ is the vector of unknown model parameters, can therefore be written as
$$L(\theta) = \prod_{k=2}^{n} L_{k,1}(\theta) \prod_{k=1}^{n_I} L_{k,2}(\theta).$$
Here, $L_{k,1}(\theta)$ is the contribution to the likelihood from the transmission, or absence of transmission, to host $k$, that is,
$$L_{k,1}(\theta) = \begin{cases} \lambda(t_k)\exp\left(-\Lambda(t_k)\right), & k = 2, \dots, n_I,\\ \exp\left(-\Lambda(\infty)\right), & k = n_I + 1, \dots, n, \end{cases}$$
and $L_{k,2}(\theta)$ is the contribution from the incubation period of host $k$ (where applicable), that is, for the independent transmission and symptoms model,
$$L_{k,2}(\theta) = \begin{cases} f_{inc}(t_{s,k} - t_k), & \text{if host } k \text{ develops symptoms},\\ 1, & \text{if host } k \text{ remains asymptomatic}, \end{cases}$$
where $f_{inc}$ is the probability density function of the incubation period (this was assumed to be known; the exact incubation period distribution we used is given below). For the mechanistic model, we also have a contribution to the likelihood from the (in this case not epidemiologically meaningful) times $(t_{s,k} - t_k)$ for entirely asymptomatic infected hosts, so that
$$L_{k,2}(\theta) = f_{inc}(t_{s,k} - t_k) \quad \text{for all } k = 1, \dots, n_I.$$
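The household likelihood can be sketched for the independent transmission and symptoms model, assuming lognormal generation time and incubation period distributions and frequency-dependent scaling. The function and argument names are hypothetical, and the data augmentation step that supplies exact infection and onset times is not shown; this illustrates the formulas rather than reproducing the authors' code.

```python
import math


def lognormal_pdf_cdf(mean, sd):
    """Return the pdf and cdf of a lognormal distribution with the given mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu, sigma = math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

    def pdf(t):
        if t <= 0.0:
            return 0.0
        z = (math.log(t) - mu) / sigma
        return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))

    def cdf(t):
        if t <= 0.0:
            return 0.0
        return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))

    return pdf, cdf


def household_log_likelihood(inf_times, onset_times, n, beta0, gen_mean, gen_sd,
                             alpha_a=0.35, inc_mean=5.8, inc_sd=3.1):
    """Log-likelihood of one household (independent transmission and symptoms model).

    inf_times: infection times of the n_I infected members in increasing order; the first
               entry is the primary case (no co-primary cases).
    onset_times: symptom onset time of each infected member, or None if asymptomatic.
    n: household size; members not listed in inf_times were never infected.
    """
    if any(b <= a for a, b in zip(inf_times, inf_times[1:])):
        raise ValueError("infection times must be strictly increasing")
    gen_pdf, gen_cdf = lognormal_pdf_cdf(gen_mean, gen_sd)
    inc_pdf, _ = lognormal_pdf_cdf(inc_mean, inc_sd)
    scale = beta0 / n  # frequency-dependent transmission
    rel = [1.0 if s is not None else alpha_a for s in onset_times]
    n_inf = len(inf_times)

    def force(t):  # lambda(t); beta_j(t - t_j) = 0 before host j is infected
        return sum(scale * rel[j] * gen_pdf(t - inf_times[j]) for j in range(n_inf))

    def cumulative_force(t):  # Lambda(t)
        return sum(scale * rel[j] * gen_cdf(t - inf_times[j]) for j in range(n_inf))

    log_l = 0.0
    # L_{k,1} for infected hosts k = 2, ..., n_I
    for k in range(1, n_inf):
        lam = force(inf_times[k])
        if lam <= 0.0:
            return -math.inf
        log_l += math.log(lam) - cumulative_force(inf_times[k])
    # L_{k,1} for uninfected hosts: exp(-Lambda(inf)) = exp(-(beta0/n)(n_S + alpha_A n_A))
    log_l -= (n - n_inf) * scale * sum(rel)
    # L_{k,2}: incubation period density for hosts who develop symptoms
    for t_inf, t_onset in zip(inf_times, onset_times):
        if t_onset is not None:
            density = inc_pdf(t_onset - t_inf)
            if density <= 0.0:
                return -math.inf
            log_l += math.log(density)
    return log_l


print(household_log_likelihood([0.0, 3.0], [4.0, None], n=3, beta0=1.0, gen_mean=4.2, gen_sd=4.9))
```

Working on the log scale avoids underflow when the per-household contributions are multiplied across many households.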
Parameter estimation
Incubation period
For the independent transmission and symptoms model, we assumed a lognormal incubation period distribution with mean 5.8 days and standard deviation 3.1 days (McAloon et al., 2020). For the mechanistic model, we assumed a gamma distributed incubation period with the same mean and standard deviation; this was for mathematical convenience, since the incubation period could then be decomposed into the sum of independent gamma distributed latent and presymptomatic infectious periods. Results for the independent transmission and symptoms model using a gamma distributed incubation period are shown in Figure 1—figure supplement 7, and uncertainty in the exact parameters of the incubation period distribution is accounted for in Figure 1—figure supplement 8.
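The moment matching used here can be sketched as follows; this is our own illustration with hypothetical function names, not the authors' code. A gamma distribution with mean $m$ and standard deviation $s$ has shape $(m/s)^2$ and scale $s^2/m$. Because the latent and presymptomatic periods share the incubation scale, the ratio of the mean latent period to the mean incubation period determines how the shape $k_{inc}$ is divided between the two stages.

```python
def gamma_shape_scale(mean, sd):
    """Shape and scale of the gamma distribution with the given mean and standard deviation."""
    return (mean / sd) ** 2, sd ** 2 / mean


def split_incubation(k_inc, scale, latent_fraction):
    """Split a Gamma(k_inc, scale) incubation period into latent (E) and presymptomatic (P) stages.

    Both stages share the incubation scale, so their sum is again Gamma(k_inc, scale) and
    mean(E) / mean(incubation) = k_E / k_inc = latent_fraction.
    """
    if not 0.0 < latent_fraction < 1.0:
        raise ValueError("latent_fraction must lie strictly between 0 and 1")
    k_e = latent_fraction * k_inc
    return (k_e, scale), (k_inc - k_e, scale)


k_inc, scale = gamma_shape_scale(5.8, 3.1)
print(f"k_inc = {k_inc:.2f}, scale = {scale:.2f} days")
latent, presymptomatic = split_incubation(k_inc, scale, latent_fraction=0.6)  # 0.6 is hypothetical
```

With a mean of 5.8 days and a standard deviation of 3.1 days, this gives a shape of about 3.50 and a scale of about 1.66 days.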
Parameter fitting procedure
Unknown model parameters were estimated using data augmentation MCMC. The observed data comprised information about whether or not individuals were ever infected and/or displayed symptoms, symptom onset dates, and for some individuals an upper bound on their infection time (corresponding to the date of a positive PCR test). These data were augmented with (estimated) precise times of infection and symptom onset (where applicable) for each infected host. No prior assumptions were made about the order of transmissions within each household.
Below, we outline the parameter fitting procedure that we used for the independent transmission and symptoms model. The procedure used for the mechanistic model was similar and is described in the Appendix.
Lognormal priors were assumed for fitted model parameters (these parameters were the mean and standard deviation of the generation time distribution, in addition to the overall infectiousness, $\beta_0$). The priors for the mean and standard deviation of the generation time distribution had medians of 5 days and 2 days, respectively (these choices were informed by previous estimates of the SARS-CoV-2 generation time distribution [Ferretti et al., 2020a; Ferretti et al., 2020b; Ganyani et al., 2020]), and were chosen to ensure a prior probability of only 0.025 that these parameters exceeded very high values of 10 days and 7 days, respectively. The exact priors we used are given in Appendix 1—table 2.
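One way to construct a lognormal prior meeting both stated constraints (a given median, and probability 0.025 of exceeding a given upper value) is to set $\mu$ to the log of the median and solve for $\sigma$ using the 97.5% quantile of the standard normal distribution. This is a sketch with a hypothetical function name; the exact priors used are those listed in the Appendix, which this code does not reproduce.

```python
import math
from statistics import NormalDist


def lognormal_prior(median, upper, tail_prob=0.025):
    """(mu, sigma) of log(X) for a lognormal X with the given median and P(X > upper) = tail_prob."""
    z = NormalDist().inv_cdf(1.0 - tail_prob)
    return math.log(median), math.log(upper / median) / z


for name, median, upper in [("mean", 5.0, 10.0), ("sd", 2.0, 7.0)]:
    mu, sigma = lognormal_prior(median, upper)
    tail = 1.0 - NormalDist(mu, sigma).cdf(math.log(upper))
    print(f"{name}: mu = {mu:.3f}, sigma = {sigma:.3f}, P(> {upper:g} days) = {tail:.3f}")
```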
Here, we denote the vector of model parameters by $\theta$, and the augmented data by
$$\mathit{t} = \left(\mathit{t}^{(1)}, \dots, \mathit{t}^{(M)}\right),$$
where $\mathit{t}^{(m)}$ represents the augmented data from household $m = 1, \dots, M$, and $M$ is the total number of households. We write the (overall) likelihood as
$$L(\theta; \mathit{t}) = \prod_{m=1}^{M} L^{(m)}\left(\theta; \mathit{t}^{(m)}\right),$$
where the likelihood contribution, $L^{(m)}(\theta; \mathit{t}^{(m)})$, from each household, $m$, was computed as described in the previous section (i.e. all households in the study were assumed to be independent), and we denote the prior density of $\theta$ by $\pi(\theta)$.
In each step of the chain, we carried out (in turn) one of the following:
Propose new values for each entry of the vector of model parameters, $\theta$, using independent normal proposal distributions for each parameter (around the corresponding parameter values in the previous step of the chain). Accept the proposed parameters, $\theta_{prop}$, with probability
$$\min\left(\frac{L(\theta_{prop}; \mathit{t})\,\pi(\theta_{prop})}{L(\theta_{old}; \mathit{t})\,\pi(\theta_{old})}, 1\right),$$ where $\theta_{old}$ denotes the vector of parameter values from the previous step of the chain, and where the augmented data, $\mathit{t}$, remain unchanged in this step.
Propose new values for the precise symptom onset times of each symptomatic infected host, using independent uniform proposal distributions (within the recorded day of symptom onset for each host). For each household, $m$, accept the proposed augmented data, $\mathit{t}_{prop}^{(m)}$, from that household with probability
$$\min\left(\frac{L^{(m)}(\theta; \mathit{t}_{prop}^{(m)})}{L^{(m)}(\theta; \mathit{t}_{old}^{(m)})}, 1\right),$$ where $\mathit{t}_{old}^{(m)}$ denotes the corresponding augmented data from the previous step of the chain, and where the model parameters, $\theta$, remain unchanged in this step (i.e. proposed times are accepted or rejected independently for each household, according to the likelihood contribution from that household).
Propose new values for the infection time of one randomly chosen symptomatic infected host in each household (in households where there was at least one), using independent normal proposal distributions (around the equivalent times in the previous step of the chain). For each household, $m$, accept the proposed augmented data, $\mathit{t}_{prop}^{(m)}$, from that household with probability
$$\min\left(\frac{L^{(m)}(\theta; \mathit{t}_{prop}^{(m)})}{L^{(m)}(\theta; \mathit{t}_{old}^{(m)})}, 1\right).$$
Propose new values for the infection time of one randomly chosen asymptomatic infected host in each household (in households where there was at least one), using independent normal proposal distributions (around the equivalent times in the previous step of the chain). For each household, $m$, accept the proposed augmented data, $\mathit{t}_{prop}^{(m)}$, from that household with probability
$$\min\left(\frac{L^{(m)}(\theta; \mathit{t}_{prop}^{(m)})}{L^{(m)}(\theta; \mathit{t}_{old}^{(m)})}, 1\right).$$
The chain was run for 10,000,000 iterations; the first 2,000,000 iterations were discarded as burn-in. Posterior samples were obtained by recording every 100th iteration of the chain.
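The parameter update described above is a random-walk Metropolis–Hastings move: because the normal proposals are symmetric, the acceptance probability reduces to the ratio of likelihood times prior. The sketch below works on the log scale to avoid underflow, assumes a user-supplied log-posterior function (a hypothetical interface), and applies burn-in and thinning as described in this paragraph. It omits the data augmentation moves, which alternate with this update, and it is checked here on a toy standard normal target rather than the household model.

```python
import math
import random


def mh_accept(log_ratio, rng):
    """Metropolis-Hastings acceptance with probability min(exp(log_ratio), 1)."""
    return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)


def run_parameter_chain(log_posterior, theta0, step_sizes, n_iter, burn_in, thin, seed=0):
    """Random-walk updates of all parameters with independent normal proposals.

    log_posterior(theta) should return log L(theta; t) + log pi(theta), or -math.inf where the
    prior density is zero (e.g. non-positive values of parameters with lognormal priors).
    theta0 must have a finite log-posterior.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_posterior(theta)
    samples = []
    for i in range(n_iter):
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(theta, step_sizes)]
        proposed = log_posterior(proposal)
        if mh_accept(proposed - current, rng):
            theta, current = proposal, proposed
        if i >= burn_in and (i - burn_in) % thin == 0:
            samples.append(list(theta))  # record every thin-th iteration after burn-in
    return samples


def n_retained(n_iter, burn_in, thin):
    """Number of recorded samples: ceil((n_iter - burn_in) / thin)."""
    return -(-(n_iter - burn_in) // thin)


# Toy check on a standard normal target (not the household model).
samples = run_parameter_chain(lambda th: -0.5 * th[0] ** 2, [0.0], [1.0],
                              n_iter=100_000, burn_in=10_000, thin=10)
print(len(samples), n_retained(10_000_000, 2_000_000, 100))
```

The per-household data augmentation moves can reuse `mh_accept` with the log-ratio of that household's likelihood contribution, so that each household's proposed times are accepted or rejected independently. With the run length, burn-in and thinning quoted above, 80,000 samples are retained.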
Governance statement
The household study was approved by the PHE Research Ethics and Governance Group as part of the portfolio of PHE’s enhanced surveillance activities in response to the pandemic.
Appendix 1
Details of mechanistic model
In this model, each infected host (who develops symptoms) progresses through independent latent (E), presymptomatic infectious (P) and symptomatic infectious (I) stages of infection. The infectiousness of the host during the P and I stages is denoted by $\beta_P$ and $\beta_I$, respectively, and we denote the ratio $\alpha_P = \beta_P/\beta_I$. We assumed the duration of each stage, denoted $y_{E/P/I}$, to be gamma distributed:
$$y_E \sim \text{Gamma}\left(k_E, \frac{1}{k_{inc}\gamma}\right), \qquad y_P \sim \text{Gamma}\left(k_P, \frac{1}{k_{inc}\gamma}\right), \qquad y_I \sim \text{Gamma}\left(k_I, \frac{1}{k_I\mu}\right),$$
where we write $X \sim \text{Gamma}(a, b)$ for a gamma distributed random variable with shape parameter $a$ and scale parameter $b$. We assumed that $k_E + k_P = k_{inc}$, so that the incubation period, $\tau_{inc} = y_E + y_P$, is gamma distributed, with
$$\tau_{inc} \sim \text{Gamma}\left(k_{inc}, \frac{1}{k_{inc}\gamma}\right).$$
We fixed the values of the parameters ${k}_{inc}$ and $1/\gamma $ (which represent the shape parameter of the incubation period distribution and the mean incubation period, respectively) in order to obtain the specified incubation period distribution (the exact values that we assumed are given in Appendix 1—table 1). For simplicity, we also assumed that ${k}_{I}=1$, so the symptomatic infectious period is exponentially distributed. The parameters ${k}_{E}$ (the shape parameter of the latent (E) period distribution), $1/\mu $ (the mean symptomatic infectious (I) period) and ${\alpha}_{P}$ (the ratio between the transmission rates of hosts in the P and I stages) were estimated when we fitted the model to the household transmission data.
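A quick simulation check of this stage structure, assuming the parametrisation in which the E and P stages share the scale $1/(k_{inc}\gamma)$ and $k_I = 1$: the sum of the E and P durations should reproduce the assumed incubation period distribution. The values of $k_E$ and of the mean I period below are hypothetical, not fitted estimates, and the function names are our own.

```python
import math
import random


def sample_stage_durations(rng, k_e, k_inc, mean_inc, mean_i):
    """Draw (y_E, y_P, y_I): E and P share the scale 1/(k_inc * gamma); k_I = 1."""
    scale = mean_inc / k_inc             # 1/(k_inc * gamma), since 1/gamma = mean_inc
    y_e = rng.gammavariate(k_e, scale)
    y_p = rng.gammavariate(k_inc - k_e, scale)
    y_i = rng.expovariate(1.0 / mean_i)  # exponential with mean 1/mu = mean_i
    return y_e, y_p, y_i


def mean_sd(values):
    """Sample mean and (population) standard deviation."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


rng = random.Random(2)
k_inc = (5.8 / 3.1) ** 2  # shape giving an incubation period with mean 5.8 days and sd 3.1 days
# k_E = 0.6 * k_inc and a mean I period of 5 days are hypothetical values, not fitted estimates.
draws = [sample_stage_durations(rng, 0.6 * k_inc, k_inc, mean_inc=5.8, mean_i=5.0)
         for _ in range(100_000)]
inc_mean, inc_sd = mean_sd([y_e + y_p for y_e, y_p, _ in draws])
print(f"incubation period: mean {inc_mean:.2f} days, sd {inc_sd:.2f} days")
```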
Hosts who remain asymptomatic throughout infection were assumed to follow the same E/P/I stages, although in this case the distinction between the P and I stages has no epidemiological meaning. Stage durations, as well as the value of ${\alpha}_{P}$, were assumed to be identical for entirely asymptomatic hosts and those who develop symptoms, so that the generation time distribution is the same for all infected hosts.
Conditional infectiousness
For a host who develops symptoms, conditional on incubation period ${\tau}_{inc}$, the expected infectiousness at time since infection $\tau $ is (Hart et al., 2021)
Here, ${\beta}_{0}$ is the overall infectiousness parameter (see Methods in the main text), $n$ is the household size, ${F}_{I}({y}_{I})$ is the cumulative distribution function of the duration of the I stage, ${F}_{Beta}(x;a,b)$ is the cumulative distribution function of a beta distributed random variable with shape parameters $a$ and $b$, and
The cumulative conditional infectiousness can therefore be calculated to be
where ${F}_{Gamma}(x;a,b)$ is the cumulative distribution function of a gamma distributed random variable with shape parameter $a$ and scale parameter $b$. The total force of infection exerted on each household member (over the course of infection) is then
The mean of this expression over the incubation period distribution is ${\beta}_{0}/n$.
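The normalisation can be checked numerically. The sketch below is our own restatement, not the authors' code: it assumes constant infectiousness within the P and I stages, uses the fact that ${y}_{P}/{\tau}_{inc}$ is $\text{Beta}({k}_{P},{k}_{E})$ and independent of ${\tau}_{inc}$ (so $E[{y}_{P}\mid{\tau}_{inc}]=({k}_{P}/{k}_{inc}){\tau}_{inc}$), and picks a constant `C` so that the average over the incubation period equals ${\beta}_{0}/n$. All parameter values and the helper name are hypothetical.

```python
import numpy as np

def total_foi_given_incubation(tau_inc, beta0, n, mean_inc, frac_E, mean_I, alpha_P):
    """Expected total force of infection exerted on each household member by a
    host with incubation period tau_inc (susceptible depletion neglected)."""
    # E[y_P | tau_inc] = (k_P / k_inc) * tau_inc = (1 - frac_E) * tau_inc.
    expected_P = (1.0 - frac_E) * tau_inc
    # Normalising constant: the unconditional mean of y_P is (1 - frac_E) * mean_inc,
    # so this choice makes the average over tau_inc equal to beta0 / n.
    C = 1.0 / (alpha_P * (1.0 - frac_E) * mean_inc + mean_I)
    return (beta0 / n) * C * (alpha_P * expected_P + mean_I)

rng = np.random.default_rng(4)
k_inc, mean_inc = 5.8, 5.5  # hypothetical values
tau = rng.gamma(k_inc, mean_inc / k_inc, 400_000)
foi = total_foi_given_incubation(tau, beta0=2.0, n=4, mean_inc=mean_inc,
                                 frac_E=0.5, mean_I=4.0, alpha_P=1.5)
print(round(foi.mean(), 3))  # close to beta0 / n = 0.5
```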
For a host who remains asymptomatic throughout infection, conditional on the combined duration of the E and P stages, ${\tau}_{inc}={y}_{E}+{y}_{P}$, the infectiousness, $\beta(\tau \mid {\tau}_{inc})$, is given by ${\alpha}_{A}$ times the corresponding expression for a host who develops symptoms. We note that in this case, ${\tau}_{inc}$ has no epidemiological interpretation, but this conditional infectiousness was useful when fitting the model to data (see ‘Parameter estimation’ below).
Generation time distribution
The generation time, ${\tau}_{gen}$, for an individual transmission can be written as
$${\tau}_{gen}={y}_{E}+{y}^{\ast},$$
where ${y}_{E}$ is the length of the latent (E) stage, and ${y}^{\ast}$ is the time from the start of the presymptomatic infectious (P) stage to the transmission occurring. As shown by Hart et al., 2021, if the effect of susceptible depletion during infection is neglected, ${y}^{\ast}$ has density,
Using this density, it can be shown that the moments of this distribution are
In particular,
and
Note that for a gamma distributed random variable, $X\sim \text{Gamma}\left(a,b\right)$, we have
Therefore, for gamma distributed stage durations, explicit expressions can be obtained for the mean and variance of the generation time distribution,
where the last equality holds because ${y}_{E}$ and ${y}^{\ast}$ are assumed to be independent.
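As a cross-check on moment expressions of this kind, the generation time can also be simulated. This Monte Carlo sketch is our own illustration, not the authors' implementation: it assumes constant infectiousness within each stage, so each host is weighted by ${\alpha}_{P}{y}_{P}+{y}_{I}$ and the transmission time is uniform within the chosen stage. The closed-form mean in `mean_generation_time` is our own derivation under the same assumptions, using $E[X]=ab$ and $E[X^{2}]=a(a+1)b^{2}$ for $X\sim\text{Gamma}(a,b)$. Parameter values and function names are hypothetical.

```python
import numpy as np

def sample_generation_times(k_inc, mean_inc, frac_E, mean_I, alpha_P, n, rng):
    """Monte Carlo draws of tau_gen = y_E + y* (susceptible depletion neglected)."""
    k_E = frac_E * k_inc
    k_P = k_inc - k_E
    scale = mean_inc / k_inc                      # 1 / (k_inc * gamma)
    y_E = rng.gamma(k_E, scale, n)
    y_P = rng.gamma(k_P, scale, n)
    y_I = rng.exponential(mean_I, n)              # k_I = 1
    # Expected transmissions per host are proportional to alpha_P * y_P + y_I
    # (beta_I factored out), so infectors are resampled with these weights.
    w = alpha_P * y_P + y_I
    idx = rng.choice(n, size=n, p=w / w.sum())
    y_E, y_P, y_I, w = y_E[idx], y_P[idx], y_I[idx], w[idx]
    # Constant infectiousness within a stage: pick the stage, then a uniform time in it.
    in_P = rng.uniform(size=n) < alpha_P * y_P / w
    u = rng.uniform(size=n)
    y_star = np.where(in_P, u * y_P, y_P + u * y_I)
    return y_E + y_star

def mean_generation_time(k_inc, mean_inc, frac_E, mean_I, alpha_P):
    """E[y_E] + E[y*] using gamma moments E[X] = a*b and E[X^2] = a*(a+1)*b^2."""
    b = mean_inc / k_inc
    k_E = frac_E * k_inc
    k_P = k_inc - k_E
    E_yP, E_yP2 = k_P * b, k_P * (k_P + 1) * b**2
    E_yI, E_yI2 = mean_I, 2 * mean_I**2           # exponential I stage
    E_ystar = (alpha_P * E_yP2 / 2 + E_yP * E_yI + E_yI2 / 2) / (alpha_P * E_yP + E_yI)
    return k_E * b + E_ystar

params = dict(k_inc=5.8, mean_inc=5.5, frac_E=0.5, mean_I=4.0, alpha_P=1.5)  # hypothetical
rng = np.random.default_rng(2)
g = sample_generation_times(**params, n=400_000, rng=rng)
print(round(g.mean(), 2), round(mean_generation_time(**params), 2), round(g.std(), 2))
```

The simulated and closed-form means should agree to within Monte Carlo error; the sample standard deviation gives a corresponding check on the variance expression.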
Proportion of presymptomatic transmissions
Among infectors who develop symptoms, the proportion of transmissions occurring prior to symptom onset (neglecting the effect of susceptible depletion during infection) is given by (Hart et al., 2021)
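Under the constant-within-stage infectiousness assumption, this proportion is the share of expected infectiousness that falls in the P stage, ${q}_{P}={\alpha}_{P}E[{y}_{P}]/({\alpha}_{P}E[{y}_{P}]+E[{y}_{I}])$. The short sketch below evaluates that ratio; it is our own restatement rather than a transcription of the cited expression, and the function name and input values are hypothetical.

```python
def presymptomatic_proportion(mean_inc, frac_E, mean_I, alpha_P):
    """Share of an average infector's expected infectiousness that falls in the P stage.

    E[y_P] = k_P / (k_inc * gamma) = (1 - frac_E) * mean_inc, and E[y_I] = 1 / mu = mean_I.
    """
    mean_P = (1.0 - frac_E) * mean_inc
    return alpha_P * mean_P / (alpha_P * mean_P + mean_I)

q_P = presymptomatic_proportion(mean_inc=5.5, frac_E=0.5, mean_I=4.0, alpha_P=1.5)
print(round(q_P, 3))  # 0.508
```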
Parameter estimation
The vector of model parameters,
was estimated by fitting the mechanistic model to the household transmission data.
We assumed independent prior distributions for each entry of $\theta $. Lognormal priors were assumed for $1/\mu $, ${\alpha}_{P}$ and ${\beta}_{0}$. Since ${\alpha}_{P}$ represents the ratio between the transmission rates of hosts in the P and I stages, a prior with median one was used to ensure equal prior probabilities of values above and below one. This prior was also chosen to limit the prior probability of extreme values, with a prior 95% credible interval of [0.2,5]. A beta prior was used for ${k}_{E}/{k}_{inc}$ (which was constrained to lie between 0 and 1), and was chosen to restrict the prior probability of values very close to either 0 or 1. The exact priors we used are given in Appendix 1—table 3.
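For the lognormal prior on ${\alpha}_{P}$, a median of one and a central 95% interval of [0.2,5] pin down the log-scale standard deviation, because $\log 0.2=-\log 5$. The sketch below computes it under the assumption that the interval is the central 95% interval of the lognormal; the exact hyperparameters used are those in the priors table cited above.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)   # about 1.96
mu_log = 0.0                      # median one on the natural scale
sigma_log = math.log(5.0) / z     # log-scale SD giving a 95% interval of [0.2, 5]
lower = math.exp(mu_log - z * sigma_log)
upper = math.exp(mu_log + z * sigma_log)
print(round(sigma_log, 3), round(lower, 3), round(upper, 3))  # 0.821 0.2 5.0
```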
A slightly amended version of the parameter fitting algorithm described in the main text for the independent transmission and symptoms model was used. In particular, we augmented the observed data with:
The infection time, ${t}_{j}$, of each infected host.
The time, ${t}_{s,j}$, at which each infected host transitioned from the P to I stage.
Note that for hosts who develop symptoms, the time of entry into the I stage corresponds to the symptom onset time. The data were also augmented with this transition time for entirely asymptomatic infected hosts because the conditional infectiousness, $\beta(\tau \mid {t}_{s,j}-{t}_{j})$, is more straightforward to calculate than $\beta(\tau)$.
In each step of the chain, we carried out (in turn) one of the following:
Propose new values for each entry of the vector of model parameters, $\theta $, using a multivariate normal proposal distribution (around the value of $\theta $ in the previous step of the chain; a correlation of 0.5 was used between the proposal distributions of ${k}_{E}/{k}_{inc}$ and ${\alpha}_{P}$, and between those of $1/\mu $ and ${\alpha}_{P}$). Accept the proposed parameters, ${\theta}_{prop}$, with probability
$\min\left(\frac{L({\theta}_{prop};\mathit{t})\,\pi({\theta}_{prop})}{L({\theta}_{old};\mathit{t})\,\pi({\theta}_{old})},1\right),$ where ${\theta}_{old}$ denotes the vector of parameter values from the previous step of the chain, $\pi$ denotes the prior density, and where the augmented data, $\mathit{t}$, remain unchanged in this step.
Propose new values for the precise symptom onset times of each symptomatic infected host, using independent uniform proposal distributions (within the day of symptom onset for each host). For each household, $m$, accept the proposed augmented data, ${\mathit{t}}_{prop}^{(m)}$, from that household with probability
$\min\left(\frac{{L}^{(m)}(\theta ;{\mathit{t}}_{prop}^{(m)})}{{L}^{(m)}(\theta ;{\mathit{t}}_{old}^{(m)})},1\right),$ where ${\mathit{t}}_{old}^{(m)}$ denotes the corresponding augmented data from the previous step of the chain, and where the model parameters, $\theta $, remain unchanged in this step (i.e. proposed times are accepted or rejected independently for each household, according to the likelihood contribution from that household).
Propose new values for the infection time of one randomly chosen infected host in each household (either symptomatic or asymptomatic), using independent normal proposal distributions (around the equivalent times in the previous step of the chain). For each household, $m$, accept the proposed augmented data, ${\mathit{t}}_{prop}^{(m)}$, from that household with probability
$\min\left(\frac{{L}^{(m)}(\theta ;{\mathit{t}}_{prop}^{(m)})}{{L}^{(m)}(\theta ;{\mathit{t}}_{old}^{(m)})},1\right).$
Propose new values for both the infection time, $t$, and the time of the start of the I stage, ${t}_{s}$, holding $({t}_{s}-t)$ constant, for one randomly chosen asymptomatic infected host in each household (in households where there was at least one), using independent normal proposal distributions (around the equivalent times in the previous step of the chain). For each household, $m$, accept the proposed augmented data, ${\mathit{t}}_{prop}^{(m)}$, from that household with probability
$\min\left(\frac{{L}^{(m)}(\theta ;{\mathit{t}}_{prop}^{(m)})}{{L}^{(m)}(\theta ;{\mathit{t}}_{old}^{(m)})},1\right).$
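The structure of these updates (a correlated multivariate normal proposal for $\theta$, and household-by-household accept/reject decisions for augmented times) can be sketched as follows. This is a hypothetical illustration, not the authors' code: the parameter ordering, proposal step sizes and `toy_log_lik` are placeholders for the real household likelihood ${L}^{(m)}$. Because the onset-time proposal is uniform within the reported day, the proposal densities cancel and only the likelihood ratio enters the acceptance probability.

```python
import numpy as np

# Hypothetical parameter order for theta: (k_E/k_inc, 1/mu, alpha_P, beta_0).
# Proposal correlation 0.5 between k_E/k_inc and alpha_P, and between 1/mu and alpha_P.
PROPOSAL_CORR = np.array([
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.5, 0.0],
    [0.5, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

def proposal_cov(step_sd):
    """Covariance matrix of the multivariate normal proposal for theta."""
    sd = np.asarray(step_sd, dtype=float)
    return PROPOSAL_CORR * np.outer(sd, sd)

def accept(log_ratio, rng):
    """Metropolis rule: accept with probability min(exp(log_ratio), 1)."""
    return np.log(rng.uniform()) < log_ratio

def update_onset_times(onset_day, times, household, log_lik, rng):
    """Propose precise onset times uniformly within each reported onset day,
    then accept or reject the proposal separately for each household."""
    proposal = onset_day + rng.uniform(size=onset_day.shape)
    updated = times.copy()
    for m in np.unique(household):
        members = household == m
        log_ratio = log_lik(m, proposal[members]) - log_lik(m, times[members])
        if accept(log_ratio, rng):
            updated[members] = proposal[members]
    return updated

def toy_log_lik(m, t):
    """Hypothetical stand-in for the household log-likelihood L^(m)."""
    return -np.sum((t - np.floor(t) - 0.5) ** 2)

rng = np.random.default_rng(0)
onset_day = np.array([3.0, 5.0, 2.0, 4.0])
household = np.array([0, 0, 1, 1])
times = onset_day + 0.5
for _ in range(100):
    times = update_onset_times(onset_day, times, household, toy_log_lik, rng)
theta_step = rng.multivariate_normal(np.zeros(4), proposal_cov([0.05, 0.5, 0.2, 0.1]))
```

Accepting or rejecting per household means a poor proposal in one household does not block good proposals elsewhere, which is why the likelihood is factorised by household.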
Relationship between generation time, TOST and serial interval
Here, we consider a randomly chosen infector-infectee pair (in which both the infector and the infectee develop symptoms) within a large, well-mixed population, of which only a small proportion is infected. In that setting, the observed generation time distribution is equal to the normalised infectiousness profile, which will not be true within a household (compare Figure 1 and Figure 1—figure supplement 4). We define:
where we use $\tau $ for time intervals relative to the time of infection and $x$ for those relative to the time of symptom onset. We denote the probability density functions of these time periods by ${f}_{inc,1}$, ${f}_{inc,2}$, ${f}_{gen}$, ${f}_{tost}$ and ${f}_{ser}$, respectively. Note that
and
so that
In the independent transmission and symptoms model, ${\tau}_{gen}$ and ${\tau}_{inc,1}$ are assumed to be independent, and the incubation periods of the infector and infectee are assumed to be drawn independently from the population incubation period distribution, ${f}_{inc}={f}_{inc,1}={f}_{inc,2}$. Therefore, the TOST distribution is given by the convolution
Assuming that ${x}_{tost}$ and ${\tau}_{inc,2}$ are independent, the serial interval distribution can be calculated from the TOST distribution as
Note that
i.e. the generation time and serial interval distributions have the same mean.
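The convolutions in Equations 1 and 2 can be illustrated by simulation. In this sketch the generation time and incubation period distributions are hypothetical gamma distributions chosen only for illustration; the generation time and both incubation periods are drawn independently, as in the independent transmission and symptoms model, so the serial interval should have the same mean as the generation time and a variance larger by twice the incubation period variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
# Hypothetical distributions for illustration only.
k_inc, mean_inc = 5.8, 5.5
gen = rng.gamma(shape=(5 / 3) ** 2, scale=9 / 5, size=n)  # mean 5, SD 3 days
inc_1 = rng.gamma(k_inc, mean_inc / k_inc, n)             # infector incubation period
inc_2 = rng.gamma(k_inc, mean_inc / k_inc, n)             # infectee, drawn independently

tost = gen - inc_1       # time from infector symptom onset to transmission
serial = tost + inc_2    # infectee symptom onset minus infector symptom onset

print(round(gen.mean(), 2), round(serial.mean(), 2))
print(round(serial.var(), 1))  # about var(gen) + 2 * var(inc) = 9 + 2 * 5.5**2 / 5.8
```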
For the mechanistic model, we still have ${f}_{inc,2}={f}_{inc}$, and the serial interval distribution can be calculated from the TOST distribution using Equation 2. On the other hand, ${\tau}_{gen}$ and ${\tau}_{inc,1}$ are not independent, so Equation 1 connecting the TOST and generation time distributions for the independent transmission and symptoms model does not hold for the mechanistic model. As shown by Hart et al., 2021, the TOST distribution for the mechanistic model is, instead, given by
Further, under the mechanistic model, the expected number of presymptomatic transmissions generated by an infected host is dependent on their incubation period. As a result, the infector’s incubation period does not follow the same distribution as that of the infectee. In particular, by Bayes’ theorem, we have
where we write 1 → 2 to denote the occurrence of the transmission from the infector to the infectee. Because we are here considering a large population, the probability of the transmission occurring is proportional to the overall infectiousness of the infector (integrated over the course of infection), $B(\infty)$, so we have
The expected incubation period of the infector is then
where ${q}_{P}$ is the proportion of transmissions occurring prior to symptom onset.
As a result of the above, the expected values of the generation time and serial interval in the mechanistic model are not equal. Instead, we have
Under the values of ${k}_{inc}$ and $\gamma $ that we assumed (Appendix 1—table 1), this gives a mean generation time that is approximately $\left(1.6\times {q}_{P}\right)$ days longer than the mean serial interval.
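The size of this difference can be checked with a short calculation. Since ${\tau}_{ser}={\tau}_{gen}+{\tau}_{inc,2}-{\tau}_{inc,1}$, the mean generation time exceeds the mean serial interval by $E[{\tau}_{inc,1}]-E[{\tau}_{inc,2}]$. Weighting a $\text{Gamma}({k}_{inc},1/({k}_{inc}\gamma))$ incubation period by ${\alpha}_{P}({k}_{P}/{k}_{inc}){\tau}_{inc}+1/\mu$ gives, by our own derivation, $E[{\tau}_{inc,1}]=1/\gamma+{q}_{P}/({k}_{inc}\gamma)$. If this is right, the factor 1.6 in the text corresponds to the scale parameter $1/({k}_{inc}\gamma)$ of the assumed incubation period distribution. The sketch compares this formula with a weighted Monte Carlo estimate; all values are hypothetical.

```python
import numpy as np

def infector_incubation_shift(k_inc, mean_inc, frac_E, mean_I, alpha_P):
    """E[tau_inc,1] - E[tau_inc,2] = q_P * mean_inc / k_inc, i.e. q_P / (k_inc * gamma)."""
    mean_P = (1.0 - frac_E) * mean_inc
    q_P = alpha_P * mean_P / (alpha_P * mean_P + mean_I)
    return q_P * mean_inc / k_inc

k_inc, mean_inc, frac_E, mean_I, alpha_P = 5.8, 5.5, 0.5, 4.0, 1.5  # hypothetical values
rng = np.random.default_rng(5)
tau = rng.gamma(k_inc, mean_inc / k_inc, 1_000_000)
# Bayes' theorem: weight each incubation period by the infector's expected total
# infectiousness, alpha_P * E[y_P | tau] + E[y_I] (common factors dropped).
w = alpha_P * (1.0 - frac_E) * tau + mean_I
mc_shift = np.average(tau, weights=w) - tau.mean()
print(mc_shift, infector_incubation_shift(k_inc, mean_inc, frac_E, mean_I, alpha_P))
```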
Extension of framework to account for co-primary cases
In most of our analyses, we assumed that each household transmission chain was initiated by a single primary case, so that all other infected household members were infected from within the household. However, we also relaxed this assumption by extending our framework to account for the possibility of co-primary cases (Figure 1—figure supplement 5 and Figure 3—figure supplement 6). Rather than assuming that all co-primary cases were infected at exactly the same time, we instead assumed that each household member could be infected at any time during a primary infection event that was taken to last one day (the choice of one day was arbitrary but in principle any duration could be used). This enabled us to easily incorporate the possibility of co-primary cases into our data augmentation MCMC approach by adapting the likelihood function as described below.
As in Methods, we here consider a household (of size $n$) in which ${n}_{I}$ household members become infected (of whom ${n}_{S}$ develop symptoms and ${n}_{A}$ remain asymptomatic throughout infection) and ${n}_{U}$ remain uninfected. Under either the independent transmission and symptoms model or the mechanistic model, we now denote the total force of infection exerted on each susceptible member of the household by other household members at time $t$ by ${\lambda}_{h}(t)$, and the cumulative force of infection by ${\Lambda}_{h}(t)$ (i.e. these correspond to the quantities denoted by $\lambda(t)$ and $\Lambda(t)$, respectively, in Methods). Assuming each (susceptible) household member is also subject to a constant force of infection, ${\beta}_{p}$, during a primary event taking place between times ${t}_{p,start}$ and ${t}_{p,end}$, the total force of infection exerted on each susceptible household member at time $t$ is
where
The cumulative force of infection is
where
We took ${t}_{p,start}$ and ${t}_{p,end}$ to be the start and end of the day on which the first household member became infected, respectively.
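A minimal sketch of how the primary event adds to the household force of infection, assuming (as stated in the text) a constant rate ${\beta}_{p}$ between ${t}_{p,start}$ and ${t}_{p,end}$ and zero otherwise. The function names are hypothetical, and the within-household cumulative force of infection ${\Lambda}_{h}(t)$ is passed in by the caller rather than computed here.

```python
def primary_foi(t, beta_p, t_start, t_end):
    """Constant force of infection beta_p during the primary event [t_start, t_end)."""
    return beta_p if t_start <= t < t_end else 0.0

def primary_cumulative_foi(t, beta_p, t_start, t_end):
    """Integral of primary_foi up to time t: zero before the event, beta_p * (t_end - t_start) after."""
    return beta_p * (min(max(t, t_start), t_end) - t_start)

def total_cumulative_foi(t, household_cumulative_foi, beta_p, t_start, t_end):
    """Lambda(t) = Lambda_h(t) + primary contribution; Lambda_h is supplied by the caller."""
    return household_cumulative_foi(t) + primary_cumulative_foi(t, beta_p, t_start, t_end)

# Example: a one-day primary event on day 2 with beta_p = 0.3 per day.
print(primary_cumulative_foi(2.5, 0.3, 2.0, 3.0))  # halfway through the event
print(total_cumulative_foi(10.0, lambda t: 0.1 * t, 0.3, 2.0, 3.0))
```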
The likelihood contribution from the household, $L(\theta)$, where $\theta $ is the vector of unknown model parameters, is then given by
Here,
and for the independent transmission and symptoms model,
where ${f}_{inc}$ is the probability density function of the incubation period, while for the mechanistic model,
The factor
is included to condition on at least one household member becoming infected during the primary transmission event.
Using this likelihood function, we fitted both models to the household data using the same data augmentation MCMC approach described for the independent transmission and symptoms model in Methods and for the mechanistic model earlier in the Appendix. Alongside other model parameters, we estimated the probability of each household member becoming infected during the primary transmission event,
in the MCMC procedure (in the case we considered, $({t}_{p,end}-{t}_{p,start})$ was always equal to one day, so ${\beta}_{p}$ could be calculated from this probability). A uniform prior was assumed for the probability of primary infection.
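With a constant force of infection over an event of fixed duration, the probability that a given susceptible is infected during the primary event is $1-\exp(-{\beta}_{p}({t}_{p,end}-{t}_{p,start}))$, so ${\beta}_{p}$ can be recovered from the sampled probability. The sketch below shows this conversion for a one-day event; the function names are hypothetical, and `math.expm1`/`math.log1p` are used only for numerical accuracy at small values.

```python
import math

def primary_probability(beta_p, duration=1.0):
    """P(a given susceptible is infected during the primary event) = 1 - exp(-beta_p * duration)."""
    return -math.expm1(-beta_p * duration)

def primary_rate(prob, duration=1.0):
    """Invert primary_probability: beta_p = -log(1 - prob) / duration."""
    return -math.log1p(-prob) / duration

p = 0.2                 # hypothetical sampled probability of primary infection
beta_p = primary_rate(p)
print(round(beta_p, 4))  # 0.2231
```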
Supplementary tables
Data availability
All data generated or analysed during this study are included in the manuscript and its supporting files; a Source Data file has been provided for Figure 1. Code for reproducing our results is available at https://github.com/willshart/UKgenerationtimes (copy archived at swh:1:rev:729266e972315ba3344da430d5de58123fce4e4e).
References

Epidemiological and clinical characteristics of early COVID-19 cases, United Kingdom of Great Britain and Northern Ireland. Bulletin of the World Health Organization 99:178–189. https://doi.org/10.2471/BLT.20.265603
Estimation in emerging epidemics: biases and remedies. Journal of the Royal Society, Interface 16:20180670. https://doi.org/10.1098/rsif.2018.0670
Transmission of SARS-CoV-2 before and after symptom onset: impact of nonpharmaceutical interventions in China. European Journal of Epidemiology 36:429–439. https://doi.org/10.1007/s10654-021-00746-4
A Bayesian MCMC approach to study transmission of influenza: application to household longitudinal data. Statistics in Medicine 23:3469–3487. https://doi.org/10.1002/sim.1912
Household transmission of 2009 pandemic influenza A (H1N1) virus in the United States. The New England Journal of Medicine 361:2619–2627. https://doi.org/10.1056/NEJMoa0905498
Meta-analysis of the severe acute respiratory syndrome coronavirus 2 serial intervals and the impact of parameter uncertainty on the coronavirus disease 2019 reproduction number. Statistical Methods in Medical Research 9622802211065159. https://doi.org/10.1177/09622802211065159
Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation (book). John Wiley & Sons.
Practical considerations for measuring the effective reproductive number, R_t. PLOS Computational Biology 16:e1008409. https://doi.org/10.1371/journal.pcbi.1008409
Key epidemiological drivers and impact of interventions in the 2020 SARS-CoV-2 epidemic in England. Science Translational Medicine 13:eabg4262. https://doi.org/10.1126/scitranslmed.abg4262
On the relationship between serial interval, infectiousness profile and generation time. Journal of the Royal Society, Interface 18:20200756. https://doi.org/10.1098/rsif.2020.0756
Reconciling early-outbreak estimates of the basic reproductive number and its uncertainty: framework and applications to the novel coronavirus (SARS-CoV-2) outbreak. Journal of the Royal Society, Interface 17:20200144. https://doi.org/10.1098/rsif.2020.0144
Inferring generation-interval distributions from contact-tracing data. Journal of the Royal Society, Interface 17:20190719. https://doi.org/10.1098/rsif.2019.0719
A note on generation times in epidemic models. Mathematical Biosciences 208:300–311. https://doi.org/10.1016/j.mbs.2006.10.010
Key questions for modelling COVID-19 exit strategies. Proceedings of the Royal Society B: Biological Sciences 287:20201405. https://doi.org/10.1098/rspb.2020.1405
How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences 274:599–604. https://doi.org/10.1098/rspb.2006.3754
Optimal COVID-19 quarantine and testing strategies. Nature Communications 12:356. https://doi.org/10.1038/s41467-020-20742-8
Decision letter

Jennifer Flegg, Reviewing Editor; The University of Melbourne, Australia
Eduardo Franco, Senior Editor; McGill University, Canada
Rowland Raymond Kao, Reviewer; University of Edinburgh, United Kingdom
Eamon Conway, Reviewer
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Decision letter after peer review:
[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]
Thank you for submitting the paper "Inference of SARS–CoV–2 generation times using UK household data" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Rowland Raymond Kao (Reviewer #1); Eamon Conway (Reviewer #2).
We are sorry to say that, after consultation with the reviewers, we have decided that this work will not be considered further for publication by eLife.
Specifically, all of the reviewers agreed that there wasn't enough novelty in the manuscript, given that the main methodology has been previously published, to be considered in eLife. There were also concerns over the generalisability of the work. The work is very well written and important but would be better suited in a more specialised journal. The authors should consider emphasising the changes to the likelihood function to deal with household data, since this is a novel contribution of the work.
Reviewer #1:
This paper extends a previous analytical method that the authors developed to evaluate the time to infectiousness of COVID–19, in order to evaluate differences in the generation interval across different time periods during the course of the pandemic in England in 2020. The time to infectiousness (i.e. how long it is until infected individuals start producing virus in a way that is a risk of infecting others) is a generalisable concept. That is, unless we expect there to be inherent differences in the way infected individuals progress to becoming infectious (when looking at distributions of outcomes, comparing between populations of interest), we can take a result from one population of individuals and assume that it gives us a reasonable idea of how long it takes to become infectious in another population. Differences in the way people come into contact with each other will have some influence on this, but generally speaking if a person is infectious after 4 days in China, you should be considering a person to be a risk of infecting others after 4 days in other countries as well.
In contrast, generation time (how long does it take an infected person, on average, to infect the persons they are going to infect?) depends strongly not just on the inherent characteristics of the virus and the progression of disease in individuals, but also (more strongly than time to infectiousness) on the circumstances of contact between individuals. Because generation time is tied to so many other factors, one of the most reliable ways to estimate generation times is to analyse data where there are groups of in–contact individuals in which it is highly likely that only one generation of transmission is involved (where contacts between individuals are clustered, possibly two but with three generations highly unlikely). In this case, the most important unknowns are the time from when individuals are infected to when they become infectious and the time to when they test positive – the requirement for time to infectiousness is why the methods used in the initial paper are appropriate for generating better generation time estimates.
As most published results relate to the very early stages of the pandemic in China where extensive contact tracing was done, there is some interest in understanding whether the generation times differ substantially in other locations and if they change over time (and therefore, why). In this analysis, Hart et al. estimate generation times across three, three month time periods using household contact data in England in 2020, and show differences in generation time estimates depending on the method used (in particular, when considering an approach which ties infectiousness to symptomatic development which they showed provided better results compared to other methods in their previous paper) and the period of 2020 over which the estimates are taken. While the result appears technically robust for the data analysed, its usefulness is limited by difficulty in extending the results – while a different dataset from ones used for the analyses in China they refer to, and from the result of Challen et al. that looked at contacts of international travellers in the UK, it is also in its own way quite specific and further breakdown of possible factors would be worthwhile. First, the limitations to household contacts means that it is not representative of general transmission in the population – household contacts are high risk, with many opportunities for transmission and may therefore be relatively short. Generalised contacts outside of households are likely to be less frequent and often of shorter duration and more strongly affected by diurnal and weekly rhythms. Second, it is also known that demographic factors such as ethnicity and income are strongly linked to infection and severe infection risk. While this does not tell us directly about any links to infectiousness and infectious contact, it is reasonable to consider a connection – and therefore a link to generation times. 
As such, in this relatively small sample (172 households, with much higher numbers in the first 3 months, compared to the middle or last three) differences in demographics may influence generation times as well. Finally, the α variant, first identified in Kent, was probably circulating for much of the final three months of this analysis – dominant by early 2021 in the UK, it would have had a variable proportion across much of those final three months, and also varied geographically in terms of proportion as well, with a much earlier rise in the SE and in London. Unless those proportions are known, it would be difficult to know how much differences in generation times are due to the variant, to demographics, or other, possibly behavioural factors. Thus, some caution should be applied before taking general lessons from it, at least in the absence of those additional considerations.
In my view, the bulk of the methodological innovation was in the original paper and therefore, as it stands, the principal interest is in the estimates of the generation times themselves. However, while I do think these results are of some interest, in my view they are specific and situational. The data are limited as they are to a relatively small number of households, involving only household contacts, where the uncertainties of variants of concern, and demographics including ethnicity, income, nature of housing, etc. make it difficult to interpret the results with real generality. I would also recommend that the authors include a discussion of the biases that may limit the generality of their work.
Reviewer #2:
In this work, Hart et al. infer the generation interval for SARS–CoV–2 using infector–infectee pairs from household data. The generation interval is obtained across three different time intervals (March–April, May–August and September–November) and using both an "independent transmission" model and the "mechanistic" model that was originally proposed in Hart et al. 2021. The main result is that the inferred generation interval in September–November has decreased compared to the earlier months of the pandemic, irrespective of the model considered. Overall, the conclusions drawn in the paper are well supported and have been shown to be robust through a thorough sensitivity analysis.
Strengths
– They use a mechanistic model to account for the change in infectivity at symptom onset.
– A major strength of this investigation is that they can observe the dynamics of the generation time over three different time periods of the pandemic. To my knowledge, this is a novel result that allows for a more up to date understanding of SARS–CoV–2 transmission.
– Whilst not highlighted in the text, it appears that there has been significant effort to extend the likelihood function to appropriately model household dynamics. This is non–trivial work in my opinion, and I believe the details of the derivation will be of use to mathematical modellers that deal with susceptible depletion in their data.
Weaknesses
– The main weakness of the paper in its current form is that the analysis appears superficial, with a large amount of curve fitting and very little explanation. It would be beneficial if the authors delved more deeply into their results, especially with the mechanistic model. It would be very interesting to relate the changes in generation time to mechanisms of transmission.
– The authors calculate the mean and standard deviation of the generation interval across three different time points; however, they only present one figure with the distribution of the generation time (Figure 2). It would be interesting to know how the generation time distribution changes in time, as opposed to just the mean and standard deviation. I believe that such an analysis would link nicely to their previous work, where they highlight the importance of ongoing public health measures such as contact tracing.
I would like to congratulate the authors on a timely update to their work. I thoroughly enjoyed seeing their updated results, especially as some of the questions addressed have been of interest to myself. I do however have some recommendations.
I understand that writing a rather mathematical paper for a general audience can be quite complicated, but I feel in this case that the authors have done themselves a disservice by not emphasising the technical concepts in the paper. At first read it appears that the authors have taken their model and fitted values, which is not particularly interesting. It was only once I reached the Materials and methods section that I found the significant extension of previous work. I believe the adaptation of the likelihood function to account for the household-level data was non–trivial and should be highlighted earlier (I believe this could be placed in the Results section), adding to the appeal of the paper. I note that susceptible depletion is mentioned in the main text, but I believe you should elaborate on how the likelihood function has been constructed to account for this.
Throughout the work the posterior mean has been used as a point estimate for parameter values. I believe a more natural point estimate would be to choose the maximum of the posterior distribution. I notice that when looking at the posterior distributions of the mechanistic model (Figure S2), the maximum value of the posterior and the posterior mean differ by a wide mark for α_p and k_E/k_inc. The impact of this choice might be minimal, but I believe it should be investigated.
It would be interesting to know how the generation time distribution changes in time, as opposed to just the mean and standard deviation. This would be a simple extension where they take the point estimates for multiple time points to show the temporal variation. I believe that such an analysis would link nicely to your previous work.
I am uncertain why the arguments of the paragraph at line 227 are required. It appears that the point is to justify the inclusion of a 1/n factor in the force of infection; however, I believe this is an obvious factor to include (I would use 1/(n–1) rather than 1/n though) that does not require parameter fitting to understand. If you were to consider a multigroup SIR model with varying population numbers, the factor 1/(n–1), where n is the number of individuals in the group, is included so that the force of infection acts on the proportion of individuals that are susceptible. If this was not the case, then a different β would be required in each group. As you argue that the β value is a constant and does not vary between households, it makes sense that the β value must be scaled by the number of individuals in the household; otherwise you would need a different β value for each house (which would be impossible to infer given the small household sizes).
For reproducibility and transparency, I would like the authors to provide all code used to generate results, in line with eLife's policies on availability of data, software and research materials. This will allow other researchers to implement the methods they have developed on other data sets, but also enable confirmation that there are no coding mistakes.
Reviewer #3:
The authors have previously published a mechanistic model for inferring the infectiousness profile that explicitly models the dependence of the risk of onward transmission on the onset of symptoms in an individual. In the present study, they apply this model, as well as another more commonly used model which assumes these two things (transmission risk and onset of symptoms) to be independent, to data from a household study conducted from March–Nov 2020 in the UK. Both models find that the mean generation time in Sept–Nov 2020 is shorter than in the earlier periods of the study.
This is a well–presented study with careful analysis and an extensive sensitivity analysis which shows that the modelled estimates are robust to a range of assumptions.
[Editors’ note: further revisions were suggested prior to acceptance, as described below.]
Thank you for resubmitting your article "Inference of the SARS-CoV-2 generation time using UK household data" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Rowland Raymond Kao (Reviewer #1); Eamon Conway (Reviewer #2).
This paper is a timely update to the authors' previous work and will be of interest to those working on public health responses and the mathematical modelling of infectious diseases. In this work the authors infer the generation interval of SARS-CoV-2, which can allow for the assessment of public health measures. The derivation of the likelihood function is also of interest to mathematical modellers, as it allows for the inference of the generation interval from data sets where susceptible depletion may dominate infection dynamics.
As is customary in eLife, the reviewers have discussed their critiques with one another. What follows below is the Reviewing Editor's edited compilation of the essential and ancillary points provided by reviewers in their critiques and in their interaction post–review. Please submit a revised version that addresses these concerns directly. Although we expect that you will address these comments in your response letter, we also need to see the corresponding revision clearly marked in the text of the manuscript. Some of the reviewers' comments may seem to be simple queries or challenges that do not prompt revisions to the text. Please keep in mind, however, that readers may have the same perspective as the reviewers. Therefore, it is essential that you attempt to amend or expand the text to clarify the narrative accordingly.
Essential revisions:
1) While the observation of reduced generation times is both useful, if true, and potentially plausible, it may not be robust. The overlap between the posterior estimates of generation times etc. is quite broad, and looking across the three periods, it doesn't seem like it would take much to change the trends, even in the mean values.
2) In particular, the size of the study is not that large, and in each household it seems, from the Miller paper, that only two PCR tests were taken. As the approach does not consider the impact of latent processes (i.e. missed infections), it would be important to know whether a slight bias in missed infections across periods would impact the conclusions.
3) The authors also state (line 573) that "Potential bias towards more recent infection of the primary host if community prevalence is increasing, or less recent if prevalence is decreasing (Britton and Scalia Tomba, 2019; Ferretti et al., 2020b; Lehtinen et al., 2021), was neglected." Could this also provide a possible explanation for the shift in generation times? This is especially relevant given that they justify their assumption in part on the analysis across individual months, and there are relatively few recruited households (on the order of 10, I think, based on Figure 3 in the supplement).
4) The authors also say (line 150) that "we corrected for the regularity of household contacts to derive more widely applicable estimates of the generation time. We did this by including a factor in the likelihood to account for each infected individual avoiding infection from household contacts that occurred prior to their actual time of infection (see Materials and Methods for full details of our approach)." This sounds really interesting and would greatly increase the generality of the outcome. Unfortunately, from the description in the Materials and Methods I was not able to figure out exactly why this was; that doesn't mean it's wrong, of course, but it would be helpful to me to have a more detailed description.
5) The authors state on line 163 that "point estimates for each model using the posterior means of fitted model parameters because the mode of the joint posterior distribution could not easily be calculated from the output of the MCMC procedure." It would be important to know whether there are any correlations in the parameter posteriors that might make this inappropriate.
6) I spent some time trying to understand whether any issue could be causing the higher fraction of pre-symptomatic transmission, which is the most unexpected result, and I could not find any obvious one. The same applies to the high variance of the generation time distribution (though this and the high pre-symptomatic transmission could be related). Hence, I think that these results can be published in their current form.
On the other hand, the temporal changes in generation time do not seem to account for the epidemic dynamics and would therefore be biased upward in spring 2020 and downward in autumn 2020, as observed. The authors are aware of this, as they explain in the Discussion, but I think that the authors should either correct for this effect in their approach or explain more clearly how this effect is accounted for and what its contribution may be.
Reviewer #1:
The additional work done by the authors has been considerable and has substantially increased the potential value of the work. In particular, the addition of data augmentation MCMC helps to provide greater depth to the outcomes, and the identification of declining generation times is useful (especially if it could be established in 'real time', i.e. to aid in understanding ongoing epidemics rather than retrospectively) and interesting.
I do have a few concerns which in my view need to be addressed before it would be suitable for publication in eLife.
First, while the observation of reduced generation times is both useful, if true, and potentially plausible, it may not be robust. The overlap between the posterior estimates of generation times etc. is quite broad, and looking across the three periods, it doesn't seem like it would take much to change the trends, even in the mean values.
In particular, the size of the study is not that large, and in each household it seems, from the Miller paper, that only two PCR tests were taken. As the approach does not consider the impact of latent processes (i.e. missed infections), it would be important to know whether a slight bias in missed infections across periods would impact the conclusions.
The authors also state (line 573) that "Potential bias towards more recent infection of the primary host if community prevalence is increasing, or less recent if prevalence is decreasing (Britton and Scalia Tomba, 2019; Ferretti et al., 2020b; Lehtinen et al., 2021), was neglected." Could this also provide a possible explanation for the shift in generation times? This is especially relevant given that they justify their assumption in part on the analysis across individual months, and there are relatively few recruited households (on the order of 10, I think, based on Figure 3 in the supplement).
The authors also say (line 150) that "we corrected for the regularity of household contacts to derive more widely applicable estimates of the generation time. We did this by including a factor in the likelihood to account for each infected individual avoiding infection from household contacts that occurred prior to their actual time of infection (see Materials and Methods for full details of our approach)." This sounds really interesting and would greatly increase the generality of the outcome. Unfortunately, from the description in the Materials and Methods I was not able to figure out exactly why this was; that doesn't mean it's wrong, of course, but it would be helpful to me to have a more detailed description.
The authors state on line 163 that "point estimates for each model using the posterior means of fitted model parameters because the mode of the joint posterior distribution could not easily be calculated from the output of the MCMC procedure." It would be important to know whether there are any correlations in the parameter posteriors that might make this inappropriate.
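One way to address this point is to inspect pairwise correlations in the joint posterior sample directly. The sketch below is a minimal illustration using NumPy; the function name, the 0.5 threshold and the synthetic draws are hypothetical, and in practice `samples` would be the post-burn-in MCMC output with one column per fitted parameter.

```python
import numpy as np


def posterior_correlations(samples: np.ndarray, names: list[str], threshold: float = 0.5):
    """Return the correlation matrix of posterior draws and the parameter pairs
    whose absolute correlation exceeds `threshold`.

    samples: array of shape (n_draws, n_parameters), e.g. post-burn-in MCMC output.
    """
    corr = np.corrcoef(samples, rowvar=False)  # columns are parameters
    flagged = []
    n_params = samples.shape[1]
    for i in range(n_params):
        for j in range(i + 1, n_params):
            if abs(corr[i, j]) > threshold:
                flagged.append((names[i], names[j], float(corr[i, j])))
    return corr, flagged


# Synthetic "posterior" draws for illustration only: two strongly correlated
# parameters and one independent parameter.
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
draws = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=20_000)
corr, flagged = posterior_correlations(draws, ["theta_a", "theta_b", "theta_c"])
print(np.round(corr, 2))
print(flagged)
```

A related check, which sidesteps the choice between posterior mean and mode, is to compute a derived quantity such as the mean generation time separately for each posterior draw and summarise that sample, since the posterior mean of a nonlinear function of the parameters generally differs from the function evaluated at the posterior means.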
Reviewer #2:
I'd like to thank the authors for updating the manuscript in a very thorough manner, I really enjoyed reading through the revisions. I believe that the authors have addressed all of my concerns.
Reviewer #4:
This excellent paper suggests that, despite extensive studies, we have not yet reached a full understanding of the generation time of SARS-CoV-2. The study is a robust examination of the generation time within households in the UK, which may not be representative of transmission in other contexts. It is unclear to the reviewers whether the temporal changes in generation time are real and attributable to, e.g., the appearance of B.1.177.
This work is sound. While surprising, the results are supported by multiple statistical/modelling approaches and robustness analyses, and believable.
The three most striking results are:
1) The width of the generation time distribution is much wider than in previous works. While this is undoubtedly surprising, the explanation by the authors is believable: home quarantine in the UK is probably less effective in stopping late transmissions within households and may even amplify them.
2) The fraction of presymptomatic transmissions is >70%, which is quite high compared to most previous estimates. Combined with the high number of fully asymptomatic individuals, it would imply that <20% of transmissions come from individuals showing symptoms. This result also seems hard to square with the previous one, which would suggest a wide distribution of TOST. Of course, this estimate may be affected by the setting, since the analysis is restricted to households, where the force of infection is higher.
3) According to this work, the generation time changed between spring 2020 and autumn 2020 in the UK. This corresponds to the arrival of the B.1.177 lineage, probably more infectious than previous variants, but also to a different epidemiological phase of the epidemic: lockdown followed by gradual reopening in spring/summer, with a corresponding decrease in incidence, then a new wave in autumn with an increase in the number of cases until November. The authors do not correct for this epidemiological dynamic, therefore leaving open the possibility that it would cause an apparent change in generation time similar to the observed one. Other explanations (e.g. behavioural or reporting ones) may be possible.
It is important to remark that many of the results of the mechanistic model may be affected by the assumption that longer incubation intervals correspond to higher infectiousness. The agreement with the results of the simpler model with independent incubation period and generation time implies that this assumption is not relevant for the main results (with the possible exception of the longer mean generation time).
Recommendations:
The results of the paper look really robust.
I spent some time trying to understand whether any issue could be causing the higher fraction of pre-symptomatic transmission, which is the most unexpected result, and I could not find any obvious one. The same applies to the high variance of the generation time distribution (though this and the high pre-symptomatic transmission could be related). Hence, I think that these results can be published in their current form.
On the other hand, the temporal changes in generation time do not seem to account for the epidemic dynamics and would therefore be biased upward in spring 2020 and downward in autumn 2020, as observed. The authors are aware of this, as they explain in the Discussion, but I think that the authors should either correct for this effect in their approach or explain more clearly how this effect is accounted for and what its contribution may be.
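The direction of the bias raised here can be illustrated with a standard result for exponentially changing incidence, in line with the references cited in the manuscript on this point: if incidence changes at rate r per day, the distribution of generation times measured backwards from infectees is proportional to g(τ)e^(−rτ), where g is the intrinsic generation time density. The sketch below is not the authors' correction and ignores household structure; it assumes a hypothetical gamma density for g (shape 2.5, scale 2 days). For a gamma density with shape k and scale θ, the tilted mean has the closed form k/(1/θ + r), which the numerical integration reproduces.

```python
import math


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Gamma density, computed on the log scale for numerical stability."""
    if x <= 0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def backward_mean_generation_time(r: float, shape: float = 2.5, scale: float = 2.0,
                                  t_max: float = 100.0, dt: float = 0.01) -> float:
    """Mean backward generation time when incidence grows (r > 0) or declines (r < 0)
    exponentially at rate r per day. The backward density is proportional to
    g(tau) * exp(-r * tau); the integrals converge only if r > -1/scale.
    """
    if r <= -1.0 / scale:
        raise ValueError("r must exceed -1/scale")
    num = den = 0.0
    for k in range(1, int(round(t_max / dt)) + 1):
        tau = k * dt
        w = gamma_pdf(tau, shape, scale) * math.exp(-r * tau)  # tilted weight
        num += tau * w
        den += w  # dt cancels in the ratio
    return num / den


for r in (-0.1, 0.0, 0.1):  # declining, flat and growing incidence
    print(f"r = {r:+.1f}/day: mean backward generation time "
          f"{backward_mean_generation_time(r):.2f} days")
```

Under these assumptions, a declining epidemic (r < 0, as in spring 2020) lengthens the apparent generation time and a growing one (r > 0, as in autumn 2020) shortens it, which is the direction the reviewer describes.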
https://doi.org/10.7554/eLife.70767.sa1
Author response
[Editors’ note: The authors appealed the original decision. What follows is the authors’ response to the first round of review.]
Specifically, all of the reviewers agreed that there wasn't enough novelty in the manuscript, given that the main methodology has been previously published, to be considered in eLife. There were also concerns over the generalisability of the work. The work is very well written and important but would be better suited to a more specialised journal. The authors should consider emphasising the changes to the likelihood function to deal with household data, since this is a novel contribution of the work.
Thank you for your helpful feedback and comments that have allowed us to improve our manuscript. As a Research Advance article, the main aim of our study is to update the results of our previous work with more recent estimates of the SARS-CoV-2 generation time. However, we agree with Reviewer 2 that the adaptation of the likelihood function to estimate the generation time using household data represents a significant methodological extension of our earlier work.
As recommended by Reviewer 2, we have therefore added a new paragraph to the start of the Results to improve the exposition of this methodological advance (lines 137–144 and 150–154). We describe how we used a data augmentation MCMC approach in which we augmented the observed data with both estimated times of infection and the precise times at which symptomatic infected hosts developed symptoms (compared to our earlier work, in which only symptom onset times were imputed; lines 140–142). This allowed us to account (in the likelihood function) for two important differences between the household transmission data considered here and the data from infector-infectee pairs used in our previous study: first, we accounted for uncertainty in exactly who infected whom within a household by summing together likelihood contributions corresponding to infection by different possible infectors (lines 142–144 and 150); second, we corrected for the regularity of household contacts by including a factor in the likelihood function that accounts for each infected individual avoiding infection from household contacts that occurred prior to their actual time of infection (lines 150–154).
In addition, a further novel component of our research compared to other previous studies in which the generation time has been estimated is the inclusion of the contribution of entirely asymptomatic infectors in the likelihood function. We also highlight this clearly in the revised manuscript (lines 613–616).
We hope our responses to the points raised by Reviewer 1 below alleviate the initial concerns about the generalisability of our results. In particular, we emphasise that our approach specifically corrects for the regularity of household contacts to give more widely applicable estimates of the generation time (see lines 150–154, 654–659 and 675–677 of the revised manuscript). Since household data are routinely collected during epidemics, our modelling framework can be used to estimate the generation time (an important measure describing the timescale of realised transmission) during future outbreaks of many different pathogens. Furthermore, our general finding that the generation time changes temporally is important, as it highlights the importance of monitoring the generation time throughout epidemics so that transmission can be characterised accurately. Finally, we emphasise that our results provide some of the most up-to-date estimates of the SARS-CoV-2 generation time. We therefore believe that this research is both generalisable and of widespread interest.
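The two likelihood components described in this paragraph can be illustrated in a generic survival-analysis form. This is a minimal sketch under stated assumptions, not the authors' implementation: infection times are treated as known (in the study they are imputed by data augmentation), infectiousness follows a hypothetical lognormal profile f with CDF F, and each infector i exerts a per-pair hazard (beta/(n−1))·f(t − t_i), using the 1/(n−1) scaling suggested by Reviewer #2 as an assumption. The log-likelihood for an infectee infected at t_j is then the logarithm of the summed hazards at t_j (uncertainty over who infected whom) minus the summed cumulative hazards up to t_j (the probability of having escaped infection from household contacts before t_j).

```python
import math


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Lognormal density, used here as a hypothetical infectiousness profile f."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """Lognormal CDF F, i.e. the cumulative infectiousness up to time x since infection."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def infectee_log_likelihood(t_j, infector_times, beta, n, mu, sigma):
    """Hypothetical log-likelihood that household member j was infected at time t_j.

    Infectors infected after t_j contribute zero hazard and zero cumulative hazard.
    """
    c = beta / (n - 1)  # per-pair scaling (assumption)
    # Summed hazard at t_j: any earlier-infected member could be the infector.
    hazard = sum(c * lognormal_pdf(t_j - t_i, mu, sigma) for t_i in infector_times)
    # Escape term: j avoided infection from every infector before t_j.
    cumulative = sum(c * lognormal_cdf(t_j - t_i, mu, sigma) for t_i in infector_times)
    if hazard <= 0:
        return -math.inf
    return math.log(hazard) - cumulative


print(infectee_log_likelihood(4.0, [0.0, 1.0], beta=1.5, n=3, mu=1.5, sigma=0.6))
```

Without the escape term, the likelihood would reward parameter values under which transmission should already have happened earlier, so frequent household contact would not be penalised.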
Reviewer #1:
This paper extends a previous analytical method that the authors developed to evaluate the time to infectiousness of COVID-19, in order to evaluate differences in the generation interval across different time periods during the course of the pandemic in England in 2020. The time to infectiousness (i.e. how long it is until infected individuals start producing virus in a way that poses a risk of infecting others) is a generalisable concept. That is, unless we expect there to be inherent differences in the way infected individuals progress to becoming infectious (when looking at distributions of outcomes, comparing between populations of interest), we can take a result from one population of individuals and assume that it gives us a reasonable idea of how long it takes to become infectious in another population. Differences in the way people come into contact with each other will have some influence on this, but generally speaking, if a person is infectious after 4 days in China, you should be considering a person to be a risk of infecting others after 4 days in other countries as well.
In contrast, the generation time (how long does it take an infected person, on average, to infect the persons they are going to infect?) depends strongly not just on the inherent characteristics of the virus and the progression of disease in individuals, but also (more strongly than the time to infectiousness) on the circumstances of contact between individuals. Because the generation time is tied to so many other factors, one of the most reliable ways to estimate it is to analyse data from groups of in-contact individuals where it is highly likely that only one generation of transmission is involved (where contacts between individuals are clustered, possibly two, but with three generations highly unlikely). In this case, the most important unknowns are the time from when individuals are infected to when they become infectious and the time to when they test positive; the requirement for the time to infectiousness is why the methods used in the initial paper are appropriate for generating better generation time estimates.
We thank the reviewer for their helpful comments and are pleased that they recognise that our mechanistic model is appropriate for estimating the generation time. The reviewer is correct that the distribution of the time to infectiousness is likely to be more consistent between settings than that of the generation time, which depends on both the infectiousness of infected hosts at different times since infection and on behavioural factors (for example, if infected individuals self-isolate after developing symptoms, this acts to reduce the generation time; adding this explicit link between symptoms and infectiousness was the main advance of our original eLife article). Unfortunately, however, in many scenarios it is most important to estimate the generation time (rather than inherent infectiousness), since the generation time describes realised transmission. For example, estimates of the time-dependent reproduction number depend on the generation time distribution, since it is a characteristic of realised transmission in the population. As a result, obtaining up-to-date and location-specific estimates of the SARS-CoV-2 generation time is crucial, particularly in light of our finding that the generation time changes.
As most published results relate to the very early stages of the pandemic in China, where extensive contact tracing was done, there is some interest in understanding whether generation times differ substantially in other locations and whether they change over time (and, if so, why). In this analysis, Hart et al. estimate generation times across three time periods using household contact data in England in 2020, and show differences in generation time estimates depending on the method used (in particular, when considering an approach that ties infectiousness to symptom development, which they showed provided better results than other methods in their previous paper) and on the period of 2020 over which the estimates are taken. While the result appears technically robust for the data analysed, its usefulness is limited by the difficulty in extending the results: while the dataset differs from those used in the analyses in China that they refer to, and from that of Challen et al., which looked at contacts of international travellers in the UK, it is also quite specific in its own way, and a further breakdown of possible factors would be worthwhile.
We agree with the reviewer that investigating whether the generation time varies by location and temporally is an interesting research question. Since, as we show, the generation time actually does vary temporally, it is crucial to monitor the generation time during epidemics and use the most up-to-date estimates when analysing population-level transmission.
While we used data from households in our analyses, our approach corrects for the regularity of household contacts to obtain widely applicable generation time estimates (see lines 150–154, 654–659 and 675–677 of the revised manuscript and our response to the reviewer’s next point below). Since household data are routinely collected, we contend that this manuscript provides a useful advance on our previous manuscript (which considered data from known transmission pairs) by providing a general framework for estimating the generation time, as well as some of the most up-to-date SARS-CoV-2 generation time estimates currently available.
We also agree with the reviewer that a further breakdown of possible factors would be a worthwhile extension of this research. Of course, doing this would require data on the characteristics of individuals and households (e.g. ages or socioeconomic statuses of different individuals) to be available. In the Discussion of the revised manuscript, we explain the need to conduct such analyses in future to understand how the generation time depends on specific characteristics more clearly (lines 682–686).
First, the limitation to household contacts means that the study is not representative of general transmission in the population: household contacts are high risk, with many opportunities for transmission, and generation times within households may therefore be relatively short. Generalised contacts outside of households are likely to be less frequent, often of shorter duration and more strongly affected by diurnal and weekly rhythms.
We agree that the high frequency of household contacts would be expected to lead to shorter generation times within households than in the wider population. However, we explicitly correct for this in our analysis. In the revised manuscript, we now highlight in both the Results (lines 150–154) and the Discussion (lines 654–657) that we include the regularity of household contacts and the availability of susceptible hosts in households in the likelihood function to derive widely applicable estimates of the generation time. These estimates, which correspond to the generation time assuming a constant supply of susceptibles during infection (lines 227–228, 238–240 and 654–657), can then be conditioned to specific population structures (lines 657–659). For example, we estimated the realised generation times within the study households in Figure 1—figure supplement 4. As expected, these household generation times are shorter than our main estimates in Figure 1 (lines 240–249, 657–659 and 675–677).
Moreover, our work demonstrates the important principle that changes in the generation time can be detected using data from household studies, highlighting both the importance of continued monitoring of the generation time and the role of household data in monitoring efforts (lines 686–692 of the revised manuscript). Finally, we note that household data have previously been used to estimate the generation time for other pathogens – see particularly the highly cited study of influenza by Ferguson et al. (https://doi.org/10.1038/nature04017) to which we refer in our manuscript.
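Why realised household generation times are expected to be shorter than estimates corresponding to a constant supply of susceptibles can be seen in the simplest case of one infector and one susceptible. This is a minimal sketch under hypothetical assumptions (a lognormal infectiousness profile f with CDF F, a per-pair hazard c·f(τ), no other infectors and no tertiary cases), not the authors' conditioning procedure. Because the susceptible can be infected only once, the density of the realised generation time is c·f(τ)·exp(−c·F(τ)), and conditioning on infection occurring tilts it towards earlier times as c grows; as c tends to zero, the intrinsic mean of f is recovered.

```python
import math


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Lognormal density, a hypothetical infectiousness profile f."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """Lognormal CDF F."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def realised_mean_generation_time(c: float, mu: float = 1.5, sigma: float = 0.6,
                                  t_max: float = 100.0, dt: float = 0.01) -> float:
    """Mean realised generation time for a single infector-susceptible pair with
    per-pair hazard c * f(tau), conditional on transmission occurring."""
    num = den = 0.0
    for k in range(1, int(round(t_max / dt)) + 1):
        tau = k * dt
        # Infection density at tau: hazard times probability of not yet being infected.
        w = c * lognormal_pdf(tau, mu, sigma) * math.exp(-c * lognormal_cdf(tau, mu, sigma))
        num += tau * w
        den += w  # dt cancels in the ratio
    return num / den


intrinsic = math.exp(1.5 + 0.6 ** 2 / 2)  # mean of f: constant supply of susceptibles
for c in (1e-6, 0.5, 2.0):
    print(f"c = {c:g}: realised mean {realised_mean_generation_time(c):.2f} days "
          f"(intrinsic {intrinsic:.2f} days)")
```

The stronger the per-pair transmission pressure, the more the susceptible is "used up" early in the infector's infectious period, which is one reason household estimates need this kind of conditioning before being compared with population-level generation times.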
Second, it is also known that demographic factors such as ethnicity and income are strongly linked to infection and severe infection risk. While this does not tell us directly about any links to infectiousness and infectious contact, it is reasonable to consider a connection, and therefore a link to generation times. As such, in this relatively small sample (172 households, with much higher numbers in the first 3 months compared to the middle or last three), differences in demographics may influence generation times as well.
While we agree with the reviewer that the accuracy of our estimates may have been impacted if the study households were not representative of the wider population, we do not believe this caveat to be any more specific to our study than to other studies in which the SARS-CoV-2 generation time has been estimated. In fact, our sample size is larger than those used in all other such studies of which we are aware. We discuss this point in our revised manuscript (lines 679–682) and note that comparing the generation time between individuals/households of different characteristics is an interesting and important area for future work (lines 682–686).
Finally, the Alpha variant, first identified in Kent, was probably circulating for much of the final three months of this analysis – dominant by early 2021 in the UK, it would have had a variable proportion across much of those final three months, and also varied geographically in terms of proportion as well, with a much earlier rise in the SE and in London. Unless those proportions are known, it would be difficult to know how much differences in generation times are due to the variant, to demographics, or other, possibly behavioural factors. Thus, some caution should be applied before taking general lessons from it, at least in the absence of those additional considerations.
Thank you for this interesting comment. In fact, the Public Health England household study underlying our results included genomic surveillance. The Alpha variant was only responsible for infections in two study households, so we can be confident that this variant was not responsible for our finding of a temporal decrease in the generation time. Since this is an important point, we have now stated it clearly in both the Results and Discussion of the revised manuscript (lines 338–342, 588–592 and 598–601). If more recent data become available, obtaining further updated generation time estimates in light of novel variants is an important area of future work (as noted in lines 601–603 of the revised submission).
In my view, the bulk of the methodological innovation was in the original paper, and therefore, as it stands, the principal interest is in the estimates of the generation times themselves.
As far as we understand, the key criterion for publication of a Research Advance manuscript in eLife is that it provides new results that build on the original published eLife article. We would therefore request that our submission – which builds on our previously published research by providing updated generation time estimates and demonstrates temporal variation in the generation time – is considered in this context.
We would also like to emphasise that the adaptation of our approach to estimate the generation time using household data (rather than data from transmission pairs) required a substantial advance in our methodology. We have improved the exposition of this methodological advance by adding a new paragraph to the Results of our revised manuscript (lines 137–144 and 150–154) as recommended by Reviewer 2.
In our revised submission, we have also furthered our methodological innovation by adding a new analysis in which we relax the assumption that each household infection chain was initiated by a single primary case, instead accounting for the possibility of co-primary infections (Figure 1—figure supplement 5, Figure 3—figure supplement 6, and lines 360–378 and 643–650). The novel way in which we incorporated co-primary cases is described in detail in lines 1684–1741 of the Appendix in our revised submission. Even with this extension to our approach, our main qualitative finding of a temporal decrease in the generation time was unchanged (Figure 3—figure supplement 6 and lines 377–378 and 648–650).
However, while I do think these results are of some interest, in my view they are specific and situational. The data are limited to a relatively small number of households, involving only household contacts, and the uncertainties around variants of concern and demographics (including ethnicity, income, nature of housing, etc.) make it difficult to interpret the results with real generality. I would also recommend that the authors include a discussion of the biases that may limit the generality of their work.
We hope our responses to the points above reassure the reviewer about the generalisability of our results. The household study analysed here involves a larger number of participants than previous studies, we explicitly account for the repetitiveness of household transmission when deriving widely applicable generation time estimates, and we provide information about variants of concern. We thank the reviewer for their helpful comments, allowing us to make these points more clearly in our revised submission, and – as recommended – we have now included a discussion of the limitations of our study in the revised manuscript (lines 652–659 and 675–692).
Reviewer #2:
In this work, Hart et al. infer the generation interval for SARS-CoV-2 using infector-infectee pairs from household data. The generation interval is obtained across three different time intervals (March–April, May–August and September–November) and using both an "independent transmission" model and the "mechanistic" model that was originally proposed in Hart et al. 2021. The main result is that the inferred generation interval in September–November has decreased compared to the earlier months of the pandemic, irrespective of the model considered. Overall, the conclusions drawn in the paper are well supported and have been shown to be robust through a thorough sensitivity analysis.
We thank the reviewer for their useful comments and suggestions and are pleased that the reviewer considers our conclusions to be well supported and robust.
Strengths
– They use a mechanistic model to account for the change in infectivity at symptom onset.
– A major strength of this investigation is that they can observe the dynamics of the generation time over three different time periods of the pandemic. To my knowledge, this is a novel result that allows for a more up-to-date understanding of SARS-CoV-2 transmission.
– Whilst not highlighted in the text, it appears that there has been significant effort to extend the likelihood function to appropriately model household dynamics. This is non-trivial work in my opinion, and I believe the details of the derivation will be of use to mathematical modellers who deal with susceptible depletion in their data.
We thank the reviewer for highlighting some of the key strengths of our study. We agree that the methodological advance in this study is important and useful for epidemiological modellers, and we thank the reviewer for encouraging us to highlight this more clearly. As described in our response to the editorial comments above, we have therefore followed the reviewer’s suggestion by adding a paragraph to the Results in which we summarise the methodological advance required to fit the models developed in our previous work to data from households rather than infector-infectee pairs (lines 137–144 and 150–154).
Weaknesses
– The main weakness of the paper in its current form is that the analysis appears superficial, with a large amount of curve fitting and very little explanation. It would be beneficial if the authors delved more deeply into their results, especially with the mechanistic model. It would be very interesting to relate the changes in generation time to mechanisms of transmission.
While the primary aim of this research was to obtain updated generation time estimates and demonstrate the key principle that this important quantity is changing, in our revised submission we have extended the analyses within and around Figure 3 to delve deeper into the finding of a temporal decrease in the generation time.
First, we have added a new panel to Figure 3 (panel C in the revised submission) in which we show that the predicted decrease in generation time was accompanied by an increase in the proportion of presymptomatic transmissions, with a very high 83% of transmissions predicted to occur before symptom onset (among infectors who developed symptoms) in September–November (lines 325–332). We note in the Discussion (lines 570–572) that this finding is consistent with our hypothesis that a shorter generation time in the autumn months may have resulted from increased indoor contacts as the weather became colder, particularly among individuals without COVID-19 symptoms (whereas symptomatic hosts were still expected to self-isolate; lines 559–562 and 568–570).
Second, as suggested by the reviewer below, we have added a new figure (Figure 3—figure supplement 3) in which we compare the generation time distribution itself between the three different time periods (compared to Figure 3, where we focus on the mean and standard deviation of this distribution), as well as the distributions of the time from symptom onset to transmission (TOST) and the serial interval. Both models indicate that the transmission risk peaked earlier in infection for individuals infected in September–November compared to earlier months (lines 321–325).
Third, we have added a figure (Figure 3—figure supplement 5) in which we compare estimates of individual model parameters for the mechanistic model between the different time periods. As described in lines 348–354 of the revised manuscript, this showed that our finding of a shorter generation time and higher proportion of presymptomatic transmissions in September–November compared to earlier months may have resulted from any of: (i) an increase in the relative infectiousness of presymptomatic infectious infectors compared to symptomatic infectors (which is consistent with the hypothesis of increased indoor mixing among non-symptomatic individuals described above); (ii) a decrease in the (mean) duration of the symptomatic infectious period (which could, for example, result from faster isolation of symptomatic individuals); or (iii) a decrease in the (mean) time to infectiousness. However, since there was substantial overlap in the credible intervals for each individual parameter between the time periods, it was not possible to definitively identify the parameter(s) responsible for the observed change in the generation time (lines 354–357).
– The authors calculate the mean and standard deviation of the generation interval across three different time points; however, they only present one figure with the distribution of the generation time (Figure 2). It would be interesting to know how the generation time distribution changes in time, as opposed to just the mean and standard deviation. I believe that such an analysis would link nicely to their previous work, where they highlight the importance of ongoing public health measures such as contact tracing.
As described in our response to the previous point above, we have implemented this excellent suggestion in our revised submission (Figure 3—figure supplement 3 and lines 321–325).
I would like to congratulate the authors on a timely update to their work. I thoroughly enjoyed seeing their updated results, especially as some of the questions addressed have been of interest to myself. I do however have some recommendations.
We thank the reviewer for recognising the interest of updated estimates of the generation time and for their useful recommendations.
I understand that writing a rather mathematical paper for a general audience can be quite complicated, but I feel in this case that the authors have done themselves a disservice by not emphasising the technical concepts in the paper. At first read it appears that the authors have simply taken their model and fitted parameter values, which is not particularly interesting. It was only once I reached the Materials and methods section that I found the significant extension of previous work. I believe the adaptation of the likelihood function to account for the household-level data was non-trivial and should be highlighted earlier (perhaps in the Results section), adding to the appeal of the paper. I note that susceptible depletion is mentioned in the main text, but I believe you should elaborate on how the likelihood function has been constructed to account for this.
We thank the reviewer for this helpful suggestion which has allowed us to improve the manuscript. As described above, we have followed the reviewer’s suggestion by describing earlier in the manuscript the methodological advance required to fit the models developed in our previous work to household data rather than data from infector-infectee pairs (lines 137–144 and 150–154). We agree that this adds to the appeal of this paper.
Throughout the work the posterior mean has been used as a point estimate for parameter values. I believe a more natural point estimate would be the maximum of the posterior distribution. I notice that, in the posterior distributions of the mechanistic model (Figure S2), the maximum of the posterior and the posterior mean differ by a wide margin for α_p and k_E/k_inc. The impact of this choice might be minimal, but I believe it should be investigated.
The mode of the joint posterior distribution of the fitted model parameters (i.e., the maximum a posteriori estimate) is not readily available as an output from the data augmentation MCMC approach that we used to fit the two models to the household data. Therefore, as in other studies using data augmentation MCMC (see, for example, the studies by Ferguson et al. (https://doi.org/10.1038/nature04017) and by Cauchemez et al. (https://doi.org/10.1002/sim.1912) to which we refer in our manuscript), we used the posterior mean as a point estimate. We state this justification for using the posterior mean in lines 162–166 of the revised manuscript.
A possible alternative is to obtain point estimates by estimating the mode of the marginal posterior distribution of each fitted parameter. As noted by the reviewer, this would have a substantial effect on point estimates of some fitted parameters for the mechanistic model. However, both methods of obtaining point parameter estimates lead to very similar inferred estimates of the generation time. For example, for the posterior parameter distributions for the mechanistic model shown in Figure 1—figure supplement 2 (the figure corresponding to Figure S2 in our initial submission), the inferred point estimate of the mean generation time is 5.9 days when using either posterior means or marginal posterior modes, and the point estimate of the standard deviation is 4.8 days in both cases.
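As a minimal illustration of the two kinds of point estimate discussed in this exchange, the Python sketch below computes the posterior mean and a histogram-based marginal posterior mode for each column of an array of MCMC draws. The function names and the synthetic gamma-distributed draws are hypothetical stand-ins for MCMC output, not the study's data or code. Neither quantity is the mode of the joint posterior; for a right-skewed marginal the two estimates differ noticeably, as the reviewer observed.

```python
import numpy as np


def posterior_mean(samples):
    """Posterior mean of each parameter from an (n_draws, n_params) array of MCMC draws."""
    return np.asarray(samples, dtype=float).mean(axis=0)


def marginal_mode(samples, bins=50):
    """Approximate mode of each parameter's marginal posterior via a histogram.

    Each column is treated separately, so this is not the joint (MAP) mode.
    """
    samples = np.asarray(samples, dtype=float)
    modes = []
    for column in samples.T:
        counts, edges = np.histogram(column, bins=bins)
        i = int(np.argmax(counts))
        modes.append(0.5 * (edges[i] + edges[i + 1]))  # centre of the fullest bin
    return np.array(modes)


# Synthetic right-skewed draws (gamma, shape 2, scale 1): true mean 2, true mode 1.
rng = np.random.default_rng(1)
draws = rng.gamma(2.0, 1.0, size=(100_000, 1))
print(posterior_mean(draws), marginal_mode(draws))
```

A histogram is the crudest way to locate a marginal mode; a kernel density estimate would be a smoother alternative, and the result depends on the number of bins.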
It would be interesting to know how the generation time distribution changes in time, as opposed to just the mean and standard deviation. This would be a simple extension where they take the point estimates for multiple time points to show the temporal variation. I believe that such an analysis would link nicely to your previous work.
As noted above, we have implemented this excellent suggestion in our revised submission (Figure 3—figure supplement 3 and lines 321–325).
I am uncertain why the arguments of the paragraph at line 227 are required. It appears that the point is to justify the inclusion of a 1/n factor in the force of infection; however, I believe this is an obvious factor to include (although I would use 1/(n−1) rather than 1/n) that does not require parameter fitting to understand. In a multigroup SIR model with varying group sizes, the factor 1/(n−1), where n is the number of individuals in the group, is included so that the force of infection acts on the proportion of individuals that are susceptible. If this were not the case, a different β would be required in each group. As you argue that β is constant and does not vary between households, it makes sense that β must be scaled by the number of individuals in the household; otherwise you would need a different β for each household (which would be impossible to infer given the small household sizes).
This is an interesting point. We agree with the reviewer that frequency-dependent transmission is a natural assumption, and that 1/(n−1) may be a more natural choice of scaling factor than 1/n. We used the factor 1/n in most of our analyses since this is a common choice in the literature (see for example two papers by Cauchemez et al. (https://doi.org/10.1002/sim.1912 and https://doi.org/10.1371/journal.ppat.1004310) to which we refer in our manuscript). However, we also show in Figure 1—figure supplement 10 that the exact choice of either 1/n or 1/(n−1) has a minimal effect on our estimates of the generation time (see also lines 423–424 and 439–441 of the revised manuscript).
We felt it was important to confirm the robustness of our results to the assumption of frequency-dependent transmission because some previous studies have predicted household influenza transmission to be somewhere between frequency- and density-dependent – for example, two studies by Ferguson et al. (https://doi.org/10.1038/nature04017) and Endo et al. (https://doi.org/10.1371/journal.pcbi.1007589) to which we refer in our manuscript predicted transmission to scale with 1/n^0.8 and 1/n^0.5, respectively. This motivation for including this sensitivity analysis in our work is now outlined in the Results of the revised manuscript (lines 408–414; this corresponds to the paragraph at line 227 in the original submission mentioned by the reviewer).
For reproducibility and transparency, I would like the authors to provide all code used to generate results, in line with eLife's policies on availability of data, software and research materials. This will allow other researchers to implement the methods they have developed on other data sets, but also enable confirmation that there are no coding mistakes.
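The scaling choices discussed above can be compared side by side. The Python sketch below is a hypothetical helper, not the authors' code: it returns the factor multiplying β_0 for transmission from one infected host to one housemate in a household of size n, under 1/n, 1/(n−1), or a general 1/n^γ (γ = 1 frequency-dependent, γ = 0 density-dependent, with 0.8 and 0.5 matching the exponents quoted above). It also reports the total relative pressure one host exerts on its n − 1 housemates, which is exactly 1 under 1/(n−1) and (n−1)/n under 1/n.

```python
def per_pair_scale(n, scheme="1/n", exponent=1.0):
    """Factor multiplying beta_0 for transmission from one host to one housemate.

    n is the household size; the scheme labels are illustrative, not the study's code.
    """
    if n < 2:
        raise ValueError("household must contain at least two people")
    if scheme == "1/n":
        return 1.0 / n
    if scheme == "1/(n-1)":
        return 1.0 / (n - 1)
    if scheme == "power":
        # 1/n**exponent: exponent=1 is frequency-dependent, exponent=0 density-dependent
        return 1.0 / n ** exponent
    raise ValueError(f"unknown scheme: {scheme}")


def total_pressure(n, scheme="1/n", exponent=1.0):
    """Total relative infectious pressure one host exerts on its n - 1 housemates."""
    return (n - 1) * per_pair_scale(n, scheme, exponent)


for n in (2, 3, 5):
    print(n,
          total_pressure(n, "1/n"),
          total_pressure(n, "1/(n-1)"),
          total_pressure(n, "power", 0.8),
          total_pressure(n, "power", 0.5))
```

Under any of these schemes a single β_0 is shared across households of different sizes, which is the reviewer's point; the schemes differ only in how strongly per-pair transmission falls as households grow.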
We completely agree with the need to ensure that all code is publicly available. The code underlying our analyses is publicly available at https://github.com/willshart/UKgenerationtimes. We include this link in the data availability section of our revised submission.
Reviewer #3:
The authors have previously published a mechanistic model for inferring the infectiousness profile that explicitly models the dependence of the risk of onward transmission on the onset of symptoms in an individual. In the present study, they apply this model, as well as a more commonly used model which assumes these two things (transmission risk and onset of symptoms) to be independent, to data from a household study conducted from March–November 2020 in the UK. Both models find that the mean generation time in September–November 2020 is shorter than in the earlier periods of the study.
This is a well-presented study with careful analysis and extensive sensitivity analysis, which shows that the modelled estimates are robust to a range of assumptions.
We are pleased that the reviewer found our study to be well-presented, and we thank them for recognising the extensive sensitivity analyses that we performed to ensure that our results are robust.
[Editors’ note: what follows is the authors’ response to the second round of review.]
Essential revisions:
1) While the observation of reduced generation times would be useful if true, and is potentially plausible, it may not be robust. The overlap between the posterior estimates of generation times etc. is quite broad – and looking across three periods it doesn't seem like it would take much to change the trends in even the mean values.
This is an interesting point, which has motivated us to undertake an explicit quantitative comparison of the posterior estimates of the mean generation time between the different time periods. We found that the independent transmission and symptoms model indicated a 97% posterior probability of a shorter mean generation time in September–November 2020 compared to May–August, and the mechanistic model a 98% posterior probability. These comparisons are included in the Results of our revised submission (lines 273–282). These results provide quantitative evidence of the robustness of our finding of a shorter generation time in the autumn of 2020 compared to earlier months.
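One simple way to compute a posterior probability of this kind, assuming the model was fitted separately to each period so that the two posteriors can be treated as independent, is to pair random MCMC draws of the mean generation time from each period and count how often the later-period value is the smaller. The Python sketch below does this with synthetic normal draws; the function name and inputs are hypothetical, and the study may have computed its 97% and 98% figures differently.

```python
import numpy as np


def prob_shorter(draws_a, draws_b, n_pairs=100_000, rng=None):
    """Monte Carlo estimate of P(quantity in period A < quantity in period B).

    Assumes the two sets of draws come from independent posteriors
    (e.g. separate fits to each time period).
    """
    rng = np.random.default_rng() if rng is None else rng
    a = rng.choice(np.asarray(draws_a, dtype=float), size=n_pairs, replace=True)
    b = rng.choice(np.asarray(draws_b, dtype=float), size=n_pairs, replace=True)
    return float(np.mean(a < b))


# Synthetic posterior draws of a mean generation time (days) for two periods.
rng = np.random.default_rng(0)
later = rng.normal(3.5, 0.5, 20_000)
earlier = rng.normal(5.0, 0.5, 20_000)
print(prob_shorter(later, earlier, rng=np.random.default_rng(1)))
```

If instead both periods were estimated within a single MCMC chain (for example, with period-specific parameters in one model), the draws are already paired and the probability is simply the fraction of iterations in which the later-period mean is smaller.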
2) In particular, the size of the study is not that large, and in each household, it seems from the Miller paper, that only two PCR tests were taken – as the approach does not consider the impact of latent processes (i.e. missed infections) it would be important to know whether a slight bias in missed infections across periods would impact on the conclusions.
The reviewer is correct that two PCR tests were taken by each household member as part of the household study, but in fact antibody testing was also carried out (see for example lines 683–684 of the revised manuscript). We expect the combination of PCR and antibody testing to have minimised any possibility of missed infections.
We also conducted a sensitivity analysis (shown in Figure 1—figure supplement 12 and described in lines 425–433) in which we considered different assumptions regarding the infection status of 34 individuals for whom infection status could not be determined (these individuals did not return a positive PCR test and did not undertake antibody testing), obtaining almost identical estimates of the generation time under each assumption considered (although estimates of the overall infectiousness parameter, β_0, were affected by the exact assumption). If a small number of infected individuals never returned a positive PCR test and tested negative for antibodies, then we similarly expect such potentially missed infections to have had a very small effect on our generation time estimates.
3) The authors also state (line 573) that "Potential bias towards more recent infection of the primary host if community prevalence is increasing, or less recent if prevalence is decreasing (Britton and Scalia Tomba, 2019; Ferretti et al., 2020b; Lehtinen et al., 2021), was neglected." Could this also provide a possible explanation for the shift in generation times, especially given that they justify their assumption in part on the analysis across individual months, and that there are relatively few recruited households (on the order of 10, I think, based on Figure 3 in the supplement)?
We have expanded our discussion of this important point in our revised submission (lines 541–556). In particular, we note the possibility that overrepresentation of shorter generation times in a growing epidemic may have contributed to our shorter estimated mean generation time for September–November 2020 (particularly in September and October, when national case numbers were mostly increasing) compared to earlier months of the study (when case numbers were mostly decreasing; lines 541–548). However, our mean generation time estimate for November 2020 (in which case numbers were mostly decreasing) is similar to the estimates for September and October 2020 (see Figure 3—figure supplement 4). This suggests that these background epidemic dynamics did not drive the temporal changes in the generation time that we inferred (lines 548–552). Finally, as mentioned by the reviewer, we note that an important caveat regarding this comparison between generation time estimates in individual months is the relatively small sample size per month (lines 552–553).
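The direction of this bias can be illustrated with a standard stylised result (see, e.g., Britton and Scalia Tomba, 2019): if incidence grows exponentially at a constant rate r, generation times observed looking backwards from infectees follow a density proportional to g(τ)e^{−rτ}, where g is the intrinsic generation time density. The Python sketch below assumes, purely for illustration, a gamma-distributed g with shape greater than 1, and compares a numerical estimate of the backward mean with the closed form shape·scale/(1 + r·scale). It is not the authors' analysis, and no such correction was applied in the study.

```python
import numpy as np


def weighted_mean_generation_time(shape, scale, r, tau_max=60.0, n_grid=200_001):
    """Mean of the density proportional to g(tau) * exp(-r * tau), with g gamma(shape, scale).

    Illustrative only; requires shape > 1 and r > -1/scale so the weighted density is proper.
    """
    if shape <= 1.0 or 1.0 / scale + r <= 0.0:
        raise ValueError("need shape > 1 and r > -1/scale")
    tau = np.linspace(0.0, tau_max, n_grid)
    g = tau ** (shape - 1.0) * np.exp(-tau / scale)  # unnormalised; the constant cancels
    w = g * np.exp(-r * tau)
    return float(np.sum(tau * w) / np.sum(w))


def analytic_mean(shape, scale, r):
    """Closed form: the weighted density is again gamma, with scale / (1 + r * scale)."""
    return shape * scale / (1.0 + r * scale)


# Intrinsic mean 5 days (shape 2, scale 2.5); growing (r > 0) vs declining (r < 0) incidence.
for r in (-0.05, 0.0, 0.1):
    print(r, weighted_mean_generation_time(2.0, 2.5, r), analytic_mean(2.0, 2.5, r))
```

With r > 0 the weighted mean falls below the intrinsic mean, and with r < 0 it rises above it, matching the direction of bias described in the quoted passage.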
4) The authors also say that (line 150) " we corrected for the regularity of household contacts to derive more widely applicable estimates of the generation time. We did this by including a factor in the likelihood to account for each infected individual avoiding infection from household contacts that occurred prior to their actual time of infection (see Materials and Methods for full details of our approach)." This sounds really interesting and would greatly increase the generality of the outcome. But unfortunately, from the description in the material and methods I was not able to figure out exactly why this was – which doesn't mean it’s wrong of course, but it would be helpful to me to have a more detailed description.
We have expanded the relevant description in the Materials and Methods as requested (lines 848–860).
In our modelling approach, the instantaneous force of infection exerted by an infected host onto each susceptible individual in their household at time since infection τ is given by
The function f(τ) represents the (normalised) relative infectiousness profile of an infected host at each time since infection and is independent of the household structure. The total within-household force of infection on any susceptible individual, λ(t), essentially involves a sum of β(t) terms for each infected individual in the household. For individual k to become infected at time t_{k}, two things are required: (i) the individual must avoid infection before time t_{k}; and (ii) the individual must then become infected at time t_{k}.
In our previous eLife article (upon which this Research Advance builds), we considered transmission between known infector-infectee transmission pairs. In that analysis, we estimated f(τ) using a likelihood function that included a term corresponding to point (ii) above. However, as is common in studies estimating the generation time using data from infector-infectee pairs, we did not include a term corresponding to point (i) in that study – the exclusion of such a term amounts to an implicit assumption that contacts between the infector and infectee in each transmission pair are sporadic and of short duration, so that the probability of the infector transmitting the pathogen to the infectee before time τ is negligible (and similarly for the probability of the infectee being infected before time τ by an individual other than their eventual infector).
In this Research Advance, to estimate f(τ) from the household data, we added a term to the likelihood corresponding to point (i) – specifically, the factor exp(−Λ(t_{k})), where Λ(t) is the integral of λ(t) between times −∞ and t. This term represents the probability of avoiding infection from household contacts occurring before time t_{k}. This probability may be non-negligible in the household context due to the high frequency of household contacts. Inclusion of this term therefore allowed us to correct for the regularity of household contacts to correctly derive estimates of f(τ) from the household data.
We also now clarify in the Discussion (lines 631–642) that the expected infectiousness profile, f(τ), provides a widely applicable estimate of the generation time distribution that is independent of the household size (specifically, f(τ) gives the generation time distribution under the assumption that a constant supply of susceptible individuals is available throughout the host’s course of infection). In principle, f(τ) can be used to calculate the generation time distribution of realised transmissions in different settings by combining this function with the contact network of those other settings – see for example the estimates of realised generation times in study households in Figure 1—figure supplement 4, which are shorter than our main generation time estimates in Figure 1 (which are derived from f(τ)) because of the regularity of household contacts and the depletion of susceptible individuals within households before longer generation times can be obtained.
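To make the role of the extra factor concrete, the Python sketch below writes out a log-likelihood for the infection times of non-index household members in which each contribution combines the escape probability exp(−Λ(t_k)) (point (i)) with, for infected individuals, the instantaneous force λ(t_k) (point (ii)). It assumes, purely for illustration, that each infected member j exerts a force (β_0/n) f(t − t_j) on each housemate, with f a gamma density of shape 2; it ignores infection from outside the household and any further model structure (for example, unobserved infection times, which the study handled with data augmentation MCMC). The function and argument names are hypothetical. Dropping the `cum_force` term recovers the pairs-style likelihood that contains only point (ii).

```python
import math


def gamma2_pdf(tau, scale):
    """Gamma density with shape 2 (hypothetical choice of f); zero for tau <= 0."""
    if tau <= 0.0:
        return 0.0
    return tau * math.exp(-tau / scale) / scale ** 2


def gamma2_cdf(tau, scale):
    """CDF of the same gamma distribution, i.e. the integral of f up to tau."""
    if tau <= 0.0:
        return 0.0
    x = tau / scale
    return 1.0 - math.exp(-x) * (1.0 + x)


def household_log_likelihood(infection_times, index, beta0, scale, t_end):
    """Log-likelihood of the infection times of non-index household members.

    infection_times[k] is a float, or None if k was still uninfected at t_end.
    """
    n = len(infection_times)
    log_lik = 0.0
    for k, t_k in enumerate(infection_times):
        if k == index:
            continue  # the index case's infection is not explained by the household
        horizon = t_end if t_k is None else t_k
        sources = [t_j for j, t_j in enumerate(infection_times)
                   if j != k and t_j is not None]
        # Lambda(horizon): cumulative household force of infection on k up to horizon
        cum_force = sum(beta0 / n * gamma2_cdf(horizon - t_j, scale) for t_j in sources)
        log_lik -= cum_force  # (i) log of exp(-Lambda): escaping infection until horizon
        if t_k is not None:
            # (ii) instantaneous force of infection at the moment k was infected
            force = sum(beta0 / n * gamma2_pdf(t_k - t_j, scale) for t_j in sources)
            if force <= 0.0:
                return -math.inf
            log_lik += math.log(force)
    return log_lik


# Two-person household: index infected at day 0, housemate at day 3, follow-up to day 30.
print(household_log_likelihood([0.0, 3.0], index=0, beta0=2.0, scale=2.0, t_end=30.0))
```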
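The shortening of realised generation times by susceptible depletion can be seen in the simplest case: a two-person household with one host infected at time 0 and one susceptible. Assuming, for illustration only, a per-pair force c·f(τ) with c a constant and f a gamma density of shape 2 (both hypothetical choices), the time at which the susceptible is infected, conditional on infection occurring, has density proportional to c f(τ) exp(−c F(τ)), where F is the CDF of f. Because exp(−c F(τ)) decreases in τ, shorter intervals are weighted more heavily, so the realised mean lies below the mean of f(τ), and increasingly so as c grows. The Python sketch below evaluates this numerically.

```python
import numpy as np


def realised_mean_two_person(c, scale, tau_max=80.0, n_grid=400_001):
    """Mean realised generation time in a two-person household.

    The per-pair force is c * f(tau); since f integrates to 1, c is the cumulative
    force over the whole infection. f is a gamma density with shape 2 and this scale.
    """
    tau = np.linspace(0.0, tau_max, n_grid)
    x = tau / scale
    f = x * np.exp(-x) / scale          # gamma(shape=2) density
    F = 1.0 - np.exp(-x) * (1.0 + x)    # its CDF
    h = c * f * np.exp(-c * F)          # infection-time density, unnormalised
    return float(np.sum(tau * h) / np.sum(h))


intrinsic_mean = 2 * 2.5  # mean of f for shape 2, scale 2.5
for c in (0.5, 2.0, 5.0):
    print(c, intrinsic_mean, realised_mean_two_person(c, scale=2.5))
```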
5) The authors state on line 163 that "point estimates for each model using the posterior means of fitted model parameters because the mode of the joint posterior distribution could not easily be calculated from the output of the MCMC procedure." It would be important to know whether there are any correlations in the parameter posteriors that might make this inappropriate.
As described in the sentence quoted by the reviewer (lines 161–165 of the revised manuscript), we used posterior means as point estimates of directly fitted model parameters. These point parameter values were then used to calculate point estimates of secondary quantities, such as the mean and standard deviation of the generation time distribution for the mechanistic model (please note that these two quantities were directly fitted for the independent transmission and symptoms model, but were secondary quantities for the mechanistic model), and the proportion of presymptomatic transmissions for both models.
We do not think that correlations between parameter posteriors would necessarily make this approach inappropriate. However, we do note here that an alternative method would be to first calculate the posterior distributions of secondary quantities using the output of the MCMC procedure (by calculating “current” estimates of these quantities at each step of the chain, as we did to obtain the violin plots shown in Figure 1), then calculate the means of these distributions. This method would account for correlations between the posterior distributions of fitted parameters, but we instead chose to use our approach as described above to ensure consistency of point estimates (for example, ensuring that if the generation time distribution in the independent transmission and symptoms model had the point estimate values of the mean and standard deviation, then the corresponding proportion of presymptomatic transmissions would also be given by the point estimate of that quantity). Nonetheless, we found the two approaches for calculating point estimates to give similar answers – for example, point estimates of the proportion of presymptomatic transmissions for the independent transmission and symptoms model using our method and the alternative approach were 0.72 and 0.72, respectively; point estimates of the proportion of presymptomatic transmissions for the mechanistic model were 0.74 and 0.73; point estimates of the mean generation time using the mechanistic model were 5.9 days and 6.0 days.
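The difference between the two ways of obtaining point estimates of secondary quantities shows up for any nonlinear function of fitted parameters. In the Python sketch below, the secondary quantity is the standard deviation of a gamma distribution, sqrt(shape)·scale, evaluated either at the posterior means of the parameters (the plug-in approach described above) or at every MCMC draw and then averaged (the alternative, which retains posterior correlations). The gamma parameterisation, function names and toy draws are hypothetical; in the study, the mean and standard deviation were fitted directly for the independent transmission and symptoms model.

```python
import numpy as np


def gamma_sd(shape, scale):
    """Standard deviation of a gamma distribution (works elementwise on arrays)."""
    return np.sqrt(shape) * scale


def plug_in_estimate(shape_draws, scale_draws):
    """Evaluate the secondary quantity at the posterior means of the fitted parameters."""
    return float(gamma_sd(np.mean(shape_draws), np.mean(scale_draws)))


def posterior_mean_of_derived(shape_draws, scale_draws):
    """Evaluate the secondary quantity at every draw, then average over draws."""
    return float(np.mean(gamma_sd(np.asarray(shape_draws), np.asarray(scale_draws))))


# Toy draws with negatively correlated shape and scale.
rng = np.random.default_rng(0)
shape_draws = rng.uniform(1.0, 4.0, 10_000)
scale_draws = 6.0 / shape_draws + rng.normal(0.0, 0.1, 10_000)
print(plug_in_estimate(shape_draws, scale_draws),
      posterior_mean_of_derived(shape_draws, scale_draws))
```

The plug-in route keeps all secondary quantities mutually consistent with a single parameter set, which is the reason given above for preferring it; the averaged route instead summarises the full posterior of each quantity separately.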
6) I spent some time trying to understand if there could be any issue causing the higher fraction of presymptomatic transmission, which is the most unexpected result, and I could not find any obvious one. The same applies to the high variance of the generation time distribution (though this and the high presymptomatic transmission could be related). Hence, I think that these results can be published in the current form.
We are pleased the reviewer is happy for these results to be published in their current form.
On the other hand, the temporal changes in generation time do not seem to account for the epidemic dynamics and would therefore be biased upward in spring 2020 and downward in autumn 2020, as observed. The authors are aware of this, as they explain in the Discussion, but I think that the authors should either correct for this effect in their approach or clarify how this effect is accounted for and what its contribution may be.
As described in our response to point 3 above, we have expanded our discussion of the possibility that our generation time estimates were affected by background epidemic dynamics (lines 541–553 of the revised manuscript).
While methods exist for explicitly accounting for background epidemic dynamics in generation time estimates obtained using data from infector-infectee transmission pairs, we are not aware of such methods having been developed for estimating the generation time using household data. Therefore, we leave this interesting extension of our approach to future work (see lines 553–556).
Reviewer #1:
The additional work done by the authors has been considerable and has substantially increased the potential value of the work. In particular, the addition of data augmentation MCMC helps to provide greater depth to the outcomes, and the identification of declining generation times is useful (especially if it could be established in 'real time', i.e. rather than retrospectively, to aid in understanding ongoing epidemics) and interesting.
We again thank the reviewer for their helpful comments on the earlier version of our manuscript, which helped us improve our work.
I do have a few concerns which in my view need to be addressed before it would be suitable for publication in eLife.
First, while the observation of reduced generation times would be useful if true, and is potentially plausible, it may not be robust. The overlap between the posterior estimates of generation times etc. is quite broad – and looking across three periods it doesn't seem like it would take much to change the trends in even the mean values.
In particular, the size of the study is not that large, and in each household, it seems from the Miller paper, that only two PCR tests were taken – as the approach does not consider the impact of latent processes (i.e. missed infections) it would be important to know whether a slight bias in missed infections across periods would impact on the conclusions.
The authors also state (line 573) that "Potential bias towards more recent infection of the primary host if community prevalence is increasing, or less recent if prevalence is decreasing (Britton and Scalia Tomba, 2019; Ferretti et al., 2020b; Lehtinen et al., 2021), was neglected." Could this also provide a possible explanation for the shift in generation times, especially given that they justify their assumption in part on the analysis across individual months, and that there are relatively few recruited households (on the order of 10, I think, based on Figure 3 in the supplement)?
The authors also say that (line 150) " we corrected for the regularity of household contacts to derive more widely applicable estimates of the generation time. We did this by including a factor in the likelihood to account for each infected individual avoiding infection from household contacts that occurred prior to their actual time of infection (see Materials and Methods for full details of our approach)." This sounds really interesting and would greatly increase the generality of the outcome. But unfortunately, from the description in the material and methods I was not able to figure out exactly why this was – which doesn't mean it’s wrong of course, but it would be helpful to me to have a more detailed description.
The authors state on line 163 that "point estimates for each model using the posterior means of fitted model parameters because the mode of the joint posterior distribution could not easily be calculated from the output of the MCMC procedure." It would be important to know whether there are any correlations in the parameter posteriors that might make this inappropriate.
We thank the reviewer for these additional comments, which we address under Essential Revisions above.
Reviewer #4:
This excellent paper suggests that despite extensive studies, we have not yet reached a full understanding of the generation time of SARS-CoV-2. The study is a robust examination of the subject of generation time within households in the UK, which may not be representative of transmission in other contexts. It is unclear to the reviewers if temporal changes in generation time are real and attributable to e.g. the appearance of B.1.177.
This work is sound. While surprising, the results are supported by multiple statistical/modelling approaches and robustness analyses, and believable.
We thank the reviewer for their helpful comments. The suggestion that the emergence of the B.1.177 lineage may have contributed to our finding of a temporal decrease in the generation time is interesting, and we mention this possibility in the Discussion of our revised manuscript (lines 522–527).
The three most striking results are:
1) The width of the generation time distribution is much wider than in previous works. While this is undoubtedly surprising, the explanation by the authors is believable: home quarantine in the UK is probably less effective in stopping late transmissions within households and may even amplify them.
We are pleased that the reviewer agrees with this hypothesis for the cause of the relatively high reported standard deviation of the generation time distribution.
2) The fraction of presymptomatic transmissions is >70%, quite high compared to most previous estimates. Combined with the high number of fully asymptomatic individuals, it would imply that <20% of transmissions come from individuals showing symptoms. This result also seems hard to square with the previous one, which would suggest a wide distribution of TOST. Of course, this estimate may be affected by the setting, since the analysis is restricted to households, where the force of infection is higher.
We agree with the reviewer that our estimates for the proportion of presymptomatic transmissions are high compared with some previous estimates, although similar or higher estimates do exist elsewhere in the literature (including in our previous paper in eLife, which this Research Advance builds on), as described in lines 191–198 of our manuscript.
In fact, the TOST distribution for the independent transmission and symptoms model shown in Figure 2B has a higher standard deviation (5.8 days) than the corresponding generation time distribution in Figure 2A (4.9 days). For the mechanistic model, the generation time and TOST distributions have similar standard deviations (4.8 days and 4.9 days, respectively). These standard deviations are reported in Appendix 1—table 4; to highlight this, we have added a reference to this table to the main text of our revised submission (lines 242–243). Therefore, the reviewer is correct in expecting our generation time distribution estimates to correspond to relatively wide TOST distributions, but the proportion of presymptomatic transmissions is nonetheless high in both models.
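Under the independent transmission and symptoms model, the generation time is independent of the incubation period and TOST equals the generation time minus the incubation period, so Var(TOST) = Var(generation time) + Var(incubation period). In that model the TOST standard deviation therefore always exceeds the generation time standard deviation, and the proportion of presymptomatic transmissions is P(TOST < 0); the identity does not hold for the mechanistic model, where transmission and symptoms are linked. The Python sketch below checks the identity by simulation with arbitrary gamma distributions (hypothetical, not the study's fitted distributions) and, as a rough back-of-envelope check from the rounded values quoted above, solves for the incubation-period standard deviation implied by SDs of 5.8 and 4.9 days (roughly 3.1 days).

```python
import math

import numpy as np


def implied_incubation_sd(sd_tost, sd_gt):
    """Incubation-period SD implied by Var(TOST) = Var(GT) + Var(incubation).

    Valid only when generation time (GT) and incubation period are independent.
    """
    if sd_tost < sd_gt:
        raise ValueError("independence requires SD(TOST) >= SD(GT)")
    return math.sqrt(sd_tost ** 2 - sd_gt ** 2)


def simulate_tost(gt_shape, gt_scale, inc_shape, inc_scale, size=1_000_000, seed=0):
    """Simulate TOST = GT - incubation with independent gamma-distributed components.

    Returns the SD of TOST and the proportion of transmissions before symptom onset.
    """
    rng = np.random.default_rng(seed)
    gt = rng.gamma(gt_shape, gt_scale, size)
    inc = rng.gamma(inc_shape, inc_scale, size)
    tost = gt - inc
    return float(np.std(tost)), float(np.mean(tost < 0.0))


print(round(implied_incubation_sd(5.8, 4.9), 2))
sd, presymptomatic = simulate_tost(2.0, 2.0, 3.0, 2.0)
# Gamma variance is shape * scale**2, so the theoretical TOST SD here is sqrt(8 + 12).
print(sd, math.sqrt(2.0 * 2.0 ** 2 + 3.0 * 2.0 ** 2), presymptomatic)
```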
3) According to this work, the generation time changed between spring 2020 and autumn 2020 in the UK. This corresponds to the arrival of the B.1.177 lineage, probably more infectious than previous variants, but also to a different epidemiological phase of the epidemic: lockdown followed by gradual reopening in spring/summer, with a corresponding decrease in incidence, then a new wave in autumn with an increase in the number of cases until November. The authors do not correct for this epidemiological dynamic, therefore leaving open the possibility that it would cause an apparent change in generation time similar to the observed one. Other explanations (e.g. behavioural or reporting ones) may be possible.
As described above, in our revised submission we discuss the possibility that the arrival of the B.1.177 lineage (lines 522–527) and/or background epidemiological dynamics (lines 541–553) may have contributed to our finding of a temporal change in the generation time.
It is important to remark that many of the results of the mechanistic model may be affected by the assumption that longer incubation intervals correspond to higher infectiousness. The agreement with the results of the simpler model with independent incubation period and generation time implies that this assumption is not relevant for the main results (with the possible exception of the longer mean generation time).
We contend that it is realistic to assume (as is the case in the mechanistic model) that individuals with longer incubation periods will (on average) have longer presymptomatic infectious periods, and therefore generate more transmissions, compared to those with shorter incubation periods. However, we agree that this assumption is likely to affect estimates of epidemiological quantities using that model (but does not affect our main finding of a temporal decrease in the generation time). We therefore now note this assumption when first describing the mechanistic model in the Introduction of our revised submission (lines 111–113).
Recommendations:
The results of the paper look really robust.
I spent some time trying to understand if there could be any issue causing the higher fraction of presymptomatic transmission, which is the most unexpected result, and I could not find any obvious one. The same applies to the high variance of the generation time distribution (though this and the high presymptomatic transmission could be related). Hence, I think that these results can be published in the current form.
We are pleased that our results look robust to the reviewer, and that they believe our results can be published in the current form.
On the other hand, the temporal changes in generation time do not seem to account for the epidemic dynamics and would therefore be biased upward in spring 2020 and downward in autumn 2020, as observed. The authors are aware of this, as they explain in the Discussion, but I think that the authors should either correct for this effect in their approach or clarify how this effect is accounted for and what its contribution may be.
Please see the Essential Revisions above for our response to this comment. We again thank the reviewer for their helpful comments and suggestions.
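The reviewer's concern about epidemic dynamics can be illustrated with the standard backward generation-time argument. Suppose incidence changes at a constant exponential rate r, and infector–infectee pairs are identified by the infectee's infection date. Their generation times then follow the intrinsic density g(τ) weighted by exp(−rτ) and renormalised, so intervals look shorter while the epidemic grows and longer while it declines. The minimal Python sketch below assumes a gamma intrinsic generation time with the mean (4.2 days) and standard deviation (4.9 days) estimated using the independent model. The gamma form and the constant growth rates are assumptions, and this is not the correction applied in the paper.

```python
import math


def gamma_params(mean, sd):
    """Convert a mean and standard deviation into gamma (shape, rate)."""
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return shape, rate


def backward_gt_moments(mean, sd, r):
    """Mean and SD of the backward generation time under constant growth rate r (per day).

    Assumes a gamma(shape, rate) intrinsic generation time. Weighting its
    density by exp(-r * tau) gives another gamma with the same shape and
    rate + r, provided rate + r > 0.
    """
    shape, rate = gamma_params(mean, sd)
    tilted_rate = rate + r
    if tilted_rate <= 0:
        raise ValueError("backward distribution undefined: need rate + r > 0")
    return shape / tilted_rate, math.sqrt(shape) / tilted_rate


if __name__ == "__main__":
    # Declining (r < 0), flat (r = 0) and growing (r > 0) incidence; illustrative rates.
    for r in (-0.05, 0.0, 0.1):
        print(r, backward_gt_moments(4.2, 4.9, r))
```

With r = 0 the intrinsic mean and SD are recovered. A declining epidemic (r < 0) inflates the apparent mean, and a growing one (r > 0) shrinks it. This matches the direction of bias the reviewer describes for Spring and Autumn 2020, under the stated simplifying assumptions.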
https://doi.org/10.7554/eLife.70767.sa2
Article and author information
Funding
Engineering and Physical Sciences Research Council (EP/R513295/1)
 William S Hart
National Institute for Health Research (NIHR200929)
 Elizabeth Miller
Taisho Pharmaceutical Co., Ltd (Research grant)
 Akira Endo
UKRI (EP/V053507/1)
 Robin N Thompson
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
Thanks to Pauline Waight, who managed the data for the household study, and to the PHE staff who collected the data and tested the PCR and serum samples. Thanks also to Rob Challen, Julia Gog, Matt Keeling and other members of the Juniper Consortium (https://maths.org/juniper/) for helpful comments about this research.
Senior Editor
 Eduardo Franco, McGill University, Canada
Reviewing Editor
 Jennifer Flegg, The University of Melbourne, Australia
Reviewers
 Rowland Raymond Kao, University of Edinburgh, United Kingdom
 Eamon Conway
Publication history
 Received: May 29, 2021
 Preprint posted: May 30, 2021
 Accepted: February 7, 2022
 Accepted Manuscript published: February 9, 2022 (version 1)
 Version of Record published: March 30, 2022 (version 2)
Copyright
© 2022, Hart et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
968 page views, 160 downloads, 18 citations
Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.