A modelling approach to estimate the transmissibility of SARS-CoV-2 during periods of high, low, and zero case incidence
Abstract
Against a backdrop of widespread global transmission, a number of countries have successfully brought large outbreaks of COVID-19 under control and maintained near-elimination status. A key element of epidemic response is the tracking of disease transmissibility in near real-time. During major outbreaks, the effective reproduction number can be estimated from a time-series of case, hospitalisation or death counts. In low or zero incidence settings, knowing the potential for the virus to spread is a response priority. Absence of case data means that this potential cannot be estimated directly. We present a semi-mechanistic modelling framework that draws on time-series of both behavioural data and case data (when disease activity is present) to estimate the transmissibility of SARS-CoV-2 from periods of high to low (or zero) case incidence, with a coherent transition in interpretation across the changing epidemiological situations. Of note, during periods of epidemic activity, our analysis recovers the effective reproduction number, while during periods of low (or zero) case incidence, it provides an estimate of transmission risk. This enables tracking and planning of progress towards the control of large outbreaks, maintenance of virus suppression, and monitoring the risk posed by reintroduction of the virus. We demonstrate the value of our methods by reporting on their use throughout 2020 in Australia, where they have become a central component of the national COVID-19 response.
Editor's evaluation
This paper is interesting, timely, and important because it presents a way to understand the transmission potential of a virus even when there are very few local cases. This has a high public health communication and preparedness value. The paper is clearly written, and the results fit with the known epidemiology of outbreaks that occurred in Australia in 2020. The results are convincing and likely to be of broad interest within and outside the field of epidemiological modelling.
https://doi.org/10.7554/eLife.78089.sa0
Introduction
The first 12 months of the COVID-19 pandemic led to overwhelmed health systems and enormous social disruption across the globe. Government strategy and public responses to COVID-19 were highly variable. Prior to the global circulation of the Delta and Omicron variants, a small number of jurisdictions had achieved extended periods of elimination through 2020 and into early 2021, including Taiwan, Thailand, New Zealand and Australia (Rajatanavin et al., 2021; Summers et al., 2020; Golding et al., 2021). Meanwhile, parts of Europe and the Americas were heavily impacted by COVID-19 (The Lancet, 2020; Remuzzi and Remuzzi, 2020), with health systems overwhelmed by multiple explosive outbreaks. The Delta and Omicron variants, with their increased transmissibility, have led to epidemic activity, now likely to be sustained, in a number of previously low prevalence settings (Australian Government Department of Health, 2021; Tiberghien, 2021; New Zealand Government Ministry of Health, 2022; Mallapaty, 2022).
A key element of epidemic response is the close monitoring of the speed of disease spread, via estimation of the effective reproduction number (${R}_{\mathrm{eff}}$): the average number of new infections caused by an infected individual over their entire infectious period, in the presence of public health interventions and where no assumption of 100% susceptibility is made. Methods are well-established for near real-time estimation of this critical value and estimates are routinely assessed by decision-makers through the course of an epidemic (Gostic et al., 2020; Cori et al., 2013; Thompson et al., 2019; Abbott et al., 2020b; White and Pagano, 2008). When ${R}_{\mathrm{eff}}$ is above 1, the epidemic is estimated to be growing. If control measures, population immunity, or other factors can bring ${R}_{\mathrm{eff}}$ below 1, then the epidemic is estimated to be in decline. Accurate and timely estimation of ${R}_{\mathrm{eff}}$, and the timely adjustment of interventions in response to it, are critical for the sustainable and successful management of COVID-19.
However, when incident cases are driven to very low levels (as occurred in Australia following the first wave of COVID-19 from February to April 2020), established methods for estimating ${R}_{\mathrm{eff}}$ are no longer informative. Yet the virus remained a threat, as evidenced by multiple instances of reintroduction and subsequent additional waves in Australia throughout 2020 and early 2021. Independent of whether local (and temporary) elimination was achieved, knowledge of SARS-CoV-2’s potential transmissibility and the risk of resurgence was a response priority.
Here, by making use of social and behavioural data, we demonstrate a novel method for estimating the ability of the virus to spread in a population, which is informative even when case incidence is very low or zero. In the absence of cases, our method estimates the ability of the virus, if it were present, to spread in a population, which we define as the ‘transmission potential’. We use the word ‘potential’ to distinguish this quantity from an estimate of actual transmission. When the virus is present, our method recovers the effective reproduction number and, additionally, the deviation between ${R}_{\mathrm{eff}}$ and the transmission potential. Applying this method in real-time provides an estimate of the transmissibility of SARS-CoV-2 in periods of high, low, and even zero case incidence, with a coherent and seamless transition in interpretation across the changing epidemiological situations.
Our innovative methods and workflows address a major challenge in epidemic situational awareness: assessing epidemic risk when case numbers are driven to low levels or (temporary) elimination is achieved, as frequently occurred in Australia through 2020–21 (Golding et al., 2021). We have routinely applied this method to all Australian states and territories and reported the outputs to peak national decision-making committees on a weekly basis since early May 2020. The concepts of transmission potential and ${R}_{\mathrm{eff}}$ have been incorporated into key instruments of government, including Australia’s national COVID-19 surveillance plan (Department of Health and Aged Care, 2021a). The transmission potential and ${R}_{\mathrm{eff}}$ are reported to the public through the Australian Government’s weekly Common Operating Picture (Department of Health and Aged Care, 2021b). While not addressed in this article, our methods have recently been updated to include consideration of variants of concern (Golding et al., 2021) and the effect of vaccination (Department of Health and Aged Care, 2021b) on reducing the ability of the virus to spread in the population.
Model
In this section, we describe a novel method for estimating temporal trends in the transmissibility of SARS-CoV-2.
The effective reproduction number is the product of the number of contacts an infectious person makes and the per-contact probability of infection (the latter of which depends on the nature and duration of contact) (Anderson and May, 1991). Both quantities are impacted by changes in behaviour, which are in turn driven by changes in policy, such as stay-at-home orders and handwashing advice, and the population’s perception and evaluation of risk, among other factors. The new techniques introduced here provide an estimate for how observed changes in rates of social contact and the per-contact probability of infection translate to changes in the ability of the virus to spread.
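The product form described above can be illustrated with a minimal sketch. This is not the model fitted in this work: the function name and all numbers are hypothetical, chosen only to show that a proportional change in either the contact rate or the per-contact probability scales the reproduction number by the same factor.

```python
def reproduction_number(contacts_per_case: float,
                        p_infection_per_contact: float) -> float:
    """Expected infections per case: contacts made while infectious
    multiplied by the per-contact probability of infection."""
    if contacts_per_case < 0:
        raise ValueError("contacts_per_case must be non-negative")
    if not 0.0 <= p_infection_per_contact <= 1.0:
        raise ValueError("p_infection_per_contact must be a probability")
    return contacts_per_case * p_infection_per_contact


# Hypothetical illustrative values (not estimates from this study).
baseline = reproduction_number(10.0, 0.25)        # no behaviour change
fewer_contacts = reproduction_number(4.0, 0.25)   # contacts reduced only
safer_contacts = reproduction_number(4.0, 0.20)   # contacts and per-contact risk reduced

print(baseline, fewer_contacts, safer_contacts)
```

In this toy example, cutting contacts alone brings the value to the threshold of 1, and a further reduction in per-contact risk is what takes it below 1.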
We estimate the time-varying ability of SARS-CoV-2 to spread in a population using a novel semi-mechanistic model informed by data on cases, population behaviours and health system effectiveness (see Materials and methods). We separately model transmission from locally acquired cases (local-to-local transmission) and from overseas acquired cases (import-to-local transmission). We model local-to-local transmission (${R}_{\mathrm{eff}}$) using two components (Figure 1): the average population-level trend in ${R}_{\mathrm{eff}}$ driven by interventions that primarily target transmission from local cases, specifically changes in physical distancing behaviour and case-targeted measures (Component 1, the ‘transmission potential’ or TP); and short-term fluctuations in ${R}_{\mathrm{eff}}$ to capture stochastic dynamics of transmission, such as clusters of cases and short periods of lower-than-expected transmission (Component 2, the ‘deviation’ between TP and ${R}_{\mathrm{eff}}$). During periods of low or zero transmission, TP provides an evaluation of the ability of the virus to spread, informing risk assessments and supporting public health planning and response (Doherty Institute, 2021).
To estimate Component 1, we use three submodels (Figure 1, labelled a, b and c). We distinguish between two types of physical distancing behaviour:
Macrodistancing, defined as the reduction in the average rate of non-household contacts, and assessed through weekly nationwide surveys of the daily number of non-household contacts; and
Precautionary microbehaviour, defined as the reduction in transmission probability per non-household contact, and assessed through weekly nationwide surveys from which we estimate the proportion of the population reporting always keeping 1.5 m physical distance from non-household contacts. Note that for Australian reporting purposes, we used the term ‘microdistancing’ behaviour.
The modelling framework uses adherence to the 1.5 m rule as a proxy for all behaviours (other than those reducing the number of contacts) that may influence transmission, and so is intended to capture the use of masks, preference for outdoor gatherings, and hand hygiene, among other factors. The 1.5 m rule was a suitable proxy because it was consistent public health advice throughout the analysis period and time-series data were available to track adherence to this metric over time.
By synthesising data from these surveys and numerous population mobility data streams made available by technology company Google, we infer temporal trends in macrodistancing and precautionary microbehaviour (submodels a and b). Furthermore, using data on the number of days from symptom onset to case notification for cases, we estimate the proportion of cases that are detected (and thus advised to isolate) by each day post-infection. By quantifying the temporal change in the probability density for the time-to-detection (submodel c), the model estimates how earlier isolation of cases (due to improvements in contact tracing, expanded access to testing, more inclusive case definitions, and other factors impacting detection rates) reduces the ability of SARS-CoV-2 to spread.
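The reasoning behind submodel c can be sketched as follows. This is a simplified illustration under stated assumptions, not the calculation used in this work: we assume a discrete generation-interval distribution by day post-infection, a cumulative probability of detection by each day, and that a detected case transmits no further. The function name and all numbers are hypothetical.

```python
from typing import Sequence


def surveillance_multiplier(generation_pmf: Sequence[float],
                            detection_cdf: Sequence[float]) -> float:
    """Fraction of expected onward transmission that remains when cases
    stop transmitting once detected (assumption: perfect isolation).

    generation_pmf[t]: share of a case's transmission occurring on day t post-infection.
    detection_cdf[t]:  probability the case has been detected by day t post-infection.
    """
    if len(generation_pmf) != len(detection_cdf):
        raise ValueError("inputs must cover the same days")
    total = sum(generation_pmf)
    if total <= 0:
        raise ValueError("generation_pmf must have positive mass")
    remaining = sum(g * (1.0 - f) for g, f in zip(generation_pmf, detection_cdf))
    return remaining / total


# Hypothetical inputs for days 0..5 post-infection.
generation_pmf = [0.1, 0.2, 0.3, 0.2, 0.1, 0.1]
slow_detection = [0.0, 0.0, 0.1, 0.2, 0.4, 0.6]
fast_detection = [0.0, 0.1, 0.3, 0.6, 0.8, 0.9]

slow = surveillance_multiplier(generation_pmf, slow_detection)
fast = surveillance_multiplier(generation_pmf, fast_detection)
print(slow, fast)
```

Shifting the detection distribution earlier lowers the multiplier, which is the sense in which faster case detection reduces the ability of the virus to spread.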
Transmission potential (Component 1) reflects the average potential for the virus to spread at the population level. During times of disease activity, Component 2 measures how transmission within the subpopulations that have the most active cases at a given point in time differs compared to that expected from the population-wide TP. The combination of Components 1 and 2 recovers the estimated ${R}_{\mathrm{eff}}$ (see Equation (10) in Materials and methods), as per established methods (Cori et al., 2013; Thompson et al., 2019; Abbott et al., 2020b). When Component 2, the deviation between TP and ${R}_{\mathrm{eff}}$, is positively biased (${R}_{\mathrm{eff}}$ > TP), it may indicate that transmission is concentrated in populations with higher-than-average levels of mixing, such as healthcare workers or meat processing workers. If negatively biased (${R}_{\mathrm{eff}}$ < TP), it reflects suppressed transmission compared to expectation. This may be due to an effective public health response actively suppressing transmission (e.g. through test, trace, isolation and quarantine), or other factors such as local depletion of susceptible individuals, and/or the virus circulating in a subpopulation with fewer-than-average social contacts.
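The relationship between the two components and ${R}_{\mathrm{eff}}$ can be sketched in code. We assume here, for illustration only, that the deviation acts additively on the log scale, so that a deviation of zero gives ${R}_{\mathrm{eff}}$ = TP; the form actually used is given by Equation (10) in Materials and methods and may differ. The function names, the tolerance and the input values are hypothetical.

```python
import math


def r_eff_from_components(tp: float, deviation: float) -> float:
    """Combine Component 1 (TP) with Component 2 (deviation).
    Assumption: the deviation is additive on the log scale."""
    if tp <= 0:
        raise ValueError("tp must be positive")
    return tp * math.exp(deviation)


def deviation_from_estimates(tp: float, r_eff: float) -> float:
    """Inverse of the above: log-scale gap between R_eff and TP."""
    if tp <= 0 or r_eff <= 0:
        raise ValueError("tp and r_eff must be positive")
    return math.log(r_eff / tp)


def describe(deviation: float, tolerance: float = 0.05) -> str:
    """Qualitative reading of Component 2 (tolerance is arbitrary)."""
    if deviation > tolerance:
        return "R_eff > TP: transmission may be concentrated in higher-contact groups"
    if deviation < -tolerance:
        return "R_eff < TP: transmission suppressed relative to expectation"
    return "R_eff ~ TP: active cases behave like the population average"


# Hypothetical values, not estimates from this study.
c2 = deviation_from_estimates(tp=0.9, r_eff=1.3)
print(round(c2, 3), describe(c2))
```

A sustained positive value corresponds to the first interpretation in the text, while a run of negative values following a cluster corresponds to the second.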
Results
To demonstrate the utility of our method for assessing epidemic activity and risk, we report on its application to Australian data on cases, population behaviour and health system effectiveness from the first 12 months of the COVID-19 pandemic. We focus on the period from early March 2020 to late January 2021, prior to the emergence of variants of concern in Australia (first Alpha, then Delta, and recently Omicron) and the vaccination roll-out (refer to our recent technical report for details on our approach to variants of concern; Golding et al., 2021). We describe our results in the context of the COVID-19 epidemiology and public health response in Australia during this period, noting that the methods were developed and applied during the pandemic and contributed to government response efforts. We report retrospective estimates (using data as of 24 January 2021 and our model as of September 2021). Where relevant, we also report estimates made at the time of analysis in 2020, which may differ as a result of updates to the case data and methodological improvements to our model over time, as well as minor statistical variation and smoothing.
Across its eight states and territories, Australia has managed a number of distinct phases of the pandemic, from an initial wave of importations (February–April 2020), to sustained periods of zero local case incidence (April–June 2020 and October–December 2020), to widespread community transmission (June–October 2020). Like elsewhere in the world, key interventions have included quarantine of overseas arrivals, restrictions on mobility and gathering sizes, advice on personal hygiene, and case-targeted interventions. The specific measures, and the level of control of SARS-CoV-2 transmission, have varied between states and over time, according to changing epidemiology and response objectives, among other factors. The model has proven informative across vastly different and rapidly changing phases of the pandemic.
To highlight these different epidemiological situations and the insights gained from our model-based analysis, we draw on exemplar events from the Australian epidemic when describing our results below. In Table 1, we summarise the key types of information provided by estimated quantities under different epidemiological situations. Further, in Figure 2—figure supplements 1–3, we provide time-series estimates of each metric and model subcomponent from early March 2020 to late January 2021 for each Australian state and territory.
Initial wave of importations
Australia took an early and precautionary approach to managing the risk of importation of SARS-CoV-2. On 1 February 2020, when China was the only country reporting uncontained transmission, Australia restricted all travel from mainland China to Australia. Only Australian citizens and residents were permitted entry from mainland China. These individuals were advised to self-quarantine for 14 days from their date of arrival. From 20 March 2020, Australia closed its borders to all foreign nationals, and from 27 March, shifted to mandatory state-managed quarantine for returned citizens and residents, with weekly quotas on the number of arrivals. These policies remained in place at the time of writing.
During the first half of March 2020, that is, prior to the border closure, daily case incidence increased sharply. Although more than two-thirds of these cases had acquired their infection overseas, pockets of local transmission were reported in Australia’s largest cities of Sydney (New South Wales) and Melbourne (Victoria) (Australian Government Department of Health, 2020a; Figure 2A and E). From 16 March 2020, state governments progressively implemented, in rapid succession, a range of physical distancing measures to reduce and prevent community transmission. These measures were part of a coordinated national response strategy. By 31 March, Australians were strongly advised to leave their homes only for limited essential activities and public gatherings were limited to two people (known as ‘stay-at-home’ restrictions). Health authorities also advised individuals to keep 1.5 m distance from non-household members from mid-March (Price et al., 2020).
Through the second half of March 2020, we estimate that transmission potential across states and territories decreased substantially and rapidly from well above 1 to just below 1 (Figure 2C and G). This reflected a marked increase in macrodistancing and precautionary microbehaviour (Figure 3B, C, F and G) and a decrease in time-to-case-detection (Figure 3D and H). Our method, with its ability to distinguish between import-to-local and local-to-local transmission, estimates that the local ${R}_{\mathrm{eff}}$ dropped below 1 on 22 March (upper confidence intervals) in both Victoria and New South Wales, prior to the activation of stay-at-home restrictions on 30 March (Figure 2B and F). Physical distancing measures were implemented proactively (prior to the establishment of widespread community transmission), suggesting that the effect of these measures, in combination with border measures and case-targeted interventions, led to the definitive control of a first epidemic wave.
Successful suppression, reopening of society
By early April 2020, local case incidence had been driven to very low levels in all Australian states and territories. Substantial numbers of infections continued to be detected in quarantined international arrivals. However, no breaches of quarantine of significant consequence were reported until late May in the state of Victoria (Lane et al., 2021).
Despite physical distancing measures remaining in place through April, levels of macrodistancing and precautionary microbehaviour steadily waned following peak levels of adherence in the first week of April (Figure 3B, C, F, and G). This resulted in a steady increase in estimated transmission potential, although it remained below 1 suggesting that the establishment of community transmission was unlikely throughout this period (Figure 2C and G).
From May through to December 2020, the epidemiology of COVID-19 across Australia was characterised by sustained periods of zero case incidence and intermittent, localised outbreaks (with the exception of the state of Victoria, see below). With the gradual easing of restrictions from May, levels of macrodistancing and precautionary microbehaviour continued to decrease. Accordingly, transmission potential steadily increased and by early June it had exceeded 1 in most states and territories (Figure 2), suggesting that conditions were suitable to sustain onward transmission if there were an undetected importation event or a breakdown in infection control for managed active cases/identified importations.
During the period from late June to mid-October 2020, Australia’s most populous state of New South Wales effectively controlled a series of localised outbreaks (the largest of which involved hundreds of cases). This was achieved during a period where society remained relatively open, though some restrictions on population movement and social gatherings were in place. For example, household and public gatherings were limited to 20 people. Throughout this period, as estimated at the time and now in this retrospective analysis, state-level transmission potential hovered just above 1 (Figure 2C), indicating that levels of population mixing were sufficient to allow escalation of epidemic activity in the general population in the absence of active public health measures to control outbreaks.
We estimate that ${R}_{\mathrm{eff}}$ oscillated around 1 throughout this period (Figure 2B). It increased to above 1 at the onset of each incursion and subsequently dropped below 1 as each cluster was contained, with no discernible change in state-level transmission potential (model Component 1) in response to each cluster. These oscillations (strong positive and then negative deviations from the transmission potential) are captured by model Component 2 and are clearly evident in the time-series (Figure 2D). Each of the positive deviations from the transmission potential is consistent with heightened transmission among clusters of cases. Each of the subsequent negative deviations from the transmission potential indicates that the number of offspring from each case of the cluster was fewer than expected given the transmission potential and estimated levels of population mixing. We interpret (and interpreted at the time) this as likely reflecting a strong public health response (i.e. early detection and isolation of cases associated with the cluster as a result of contact tracing and quarantine). This was consistent with weekly reporting on the performance of contact tracing systems in New South Wales, with 100% of cases interviewed within 24 hr of notification and 100% of close contacts, identified by the case, contacted by public health officials within 48 hr of case notification, from early July through to late October (Department of Health and Aged Care, 2021b).
In mid-November 2020, a sustained period of very low case incidence (i.e. zero local cases on all but 10 days in the previous 6 months) in the state of South Australia was disrupted by a breach of mandatory quarantine which led to a cluster of more than 20 cases. At the time, society was largely open with only minimal social restrictions in place. We estimate transmission potential to have been 1.71 [95% CrI: 1.47–2.01] as of 14 November in the retrospective analysis (cf. 1.27 [95% CrI: 1.14–1.41] at the time) (Figure 2), suggesting that the risk of establishing an epidemic was reasonably high (relative to the chance of stochastic extinction), and that once established, transmission would be rapid. Supported by our real-time analysis, authorities imposed a strict 3-day lockdown across the entire state to enable contact tracers to comprehensively identify and quarantine primary and secondary contacts of cases. Estimated transmission potential declined dramatically around the time of activation of restrictions, and quickly rebounded when restrictions were eased three days later (Figure 2). The incursion was rapidly contained, as a result of changes to transmission potential (driven by social restrictions), an effective public health response (i.e. active case finding and management) and plausibly some favourable stochastic fluctuations, with South Australia returning to zero local case incidence from mid-December 2020.
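The comparison between epidemic establishment and stochastic extinction can be made concrete with a simple branching-process sketch. This calculation is not part of the analysis reported here: it assumes a single introduced case, a Poisson-distributed number of offspring with mean equal to the transmission potential, and no public health response. The function name is hypothetical.

```python
import math


def extinction_probability(mean_offspring: float, iterations: int = 1000) -> float:
    """Probability that a transmission chain started by one case dies out,
    assuming Poisson offspring with the given mean.

    Solves q = exp(mean * (q - 1)) by fixed-point iteration from q = 0,
    which converges to the smallest non-negative root.
    """
    if mean_offspring < 0:
        raise ValueError("mean_offspring must be non-negative")
    if mean_offspring <= 1.0:
        return 1.0  # sub-critical or critical chains die out with certainty
    q = 0.0
    for _ in range(iterations):
        q = math.exp(mean_offspring * (q - 1.0))
    return q


# TP point estimates quoted above for South Australia on 14 November 2020.
q_retrospective = extinction_probability(1.71)
q_at_the_time = extinction_probability(1.27)
print(round(q_retrospective, 2), round(q_at_the_time, 2))
```

Under these assumptions, a single undetected introduction would fail to establish in roughly three cases out of ten at a TP of 1.71, and more often at 1.27. Offspring distributions with greater variance than the Poisson would give a higher extinction probability for the same mean, so these values should be read as illustrative only.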
Resurgence of epidemic activity in one large state
In late May 2020, a breach of mandatory quarantine seeded a second epidemic wave in Australia’s second most populous state of Victoria (approximately 6.7 million people). At the time that the epidemic was seeded, many first wave restrictions were still in place. For example, gatherings within households, outdoor spaces, and dining venues were capped at 20 people, and working from home was strongly advised. Transmission potential is estimated to have been 1.07 [95% CrI: 0.88–1.22] at 25 May 2020, suggesting that levels of physical distancing may have been insufficient to prevent escalation of epidemic activity in the general population (Figure 2G).
Furthermore, from the earliest stages of the epidemic, our model estimated a strong positive deviation from the transmission potential (Component 2 positively biased, Figure 2H), corresponding to an estimated ${R}_{\mathrm{eff}}$ > 1 (95% chance of ${R}_{\mathrm{eff}}$ exceeding 1 by 1 June 2020 in the retrospective analysis), reflecting heightened transmission. Demographic and socioeconomic assessments of the outbreak (Australian Institute of Health and Welfare, 2021; Australian Government Department of Health, 2020b; Wild et al., 2021) showed that early affected areas had higher than average household sizes and a large proportion of essential and casualised workers who were unable to work from home. Thus our model findings concurred with the observed epidemiological characteristics (that the virus was predominantly spreading in subsections of the population with higher-than-average rates of social contact) and supported public health decision making at the time.
By 1 July 2020, there were more than 600 active cases and 129 newly reported cases with an estimated ${R}_{\mathrm{eff}}$ of 1.33 [95% CrI: 1.25–1.41] (Figure 2F). From 9 July 2020, stay-at-home policies (denoted Stage 3 restrictions) were reinstated across metropolitan Melbourne. Despite these policies, the epidemic continued to grow through July, reaching a peak of 446 daily cases by date of symptom onset on 24 July 2020. More severe stay-at-home restrictions (denoted Stage 4) were enacted in metropolitan Melbourne on 2 August, including a night-time curfew, restrictions on movement more than 5 km from a person’s residence, and stricter definitions of essential workers and businesses including invigilation of a work permit requirement.
During the periods of Stage 3 and 4 restrictions, we observed strong increases in macrodistancing and precautionary microbehaviour, which were reflected by a decrease in state-level transmission potential from around 1 in early June to a minimum of 0.72 [95% CrI: 0.62–0.86] on 23 August 2020 (Figure 2G), two weeks after the implementation of Stage 4 restrictions.
Following an initial sharp rise in ${R}_{\mathrm{eff}}$ from well below 1 in mid-May to a peak of 1.61 [95% CrI: 1.46–1.79] at 14 June 2020, ${R}_{\mathrm{eff}}$ steadily decreased over the next eight weeks (Figure 2F). We estimate that ${R}_{\mathrm{eff}}$ fell below the critical threshold of 1 on 25 July, approximately one week prior to the implementation of Stage 4 restrictions. With Stage 4 restrictions in place, ${R}_{\mathrm{eff}}$ settled between 0.8 and 1 for another eight weeks.
While both transmission potential and ${R}_{\mathrm{eff}}$ declined over this period, we estimated ${R}_{\mathrm{eff}}$ to be consistently higher than transmission potential (i.e. there was a strong positive deviation in Component 2), reflecting persistent transmission in subsections of the population with higher-than-average rates of social contact. This was consistent with other epidemiological assessments of the outbreak which suggested that transmission was concentrated in populations that were less able to physically distance (e.g. healthcare workers, residents of aged care facilities, meat workers, public housing residents) (Australian Institute of Health and Welfare, 2021; Australian Government Department of Health, 2020b; Wild et al., 2021). A substantial proportion of cases were in healthcare workers and aged care facilities, particularly during the tail of the epidemic. Each of these settings required specifically targeted interventions to bring transmission under control, which were distinct from the impacts of population-level measures. This may partly explain why transmission persisted for many weeks when severe stay-at-home restrictions were active, since these measures primarily target transmission in the broader community and are logically less effective at controlling transmission in essential workplaces and institutional settings.
Definitive control of the epidemic was achieved by early November 2020, when zero local case incidence was reported in Victoria for the first time since April 2020.
The pattern in Component 2 for Victoria, where it deviated strongly above zero in the earliest stages of the epidemic, persisted above zero for many months, and returned to around zero once the epidemic was definitively contained, is in contrast to the oscillations seen in New South Wales from June to October.
Discussion
We have presented a novel semimechanistic modelling framework for assessing transmissibility of SARSCoV2 from periods of high to low — or zero — case incidence, with a seamless and coherent transition in interpretation across the changing epidemiological situations. Using timeseries data on cases and population behaviours, our model computes three metrics within a single framework: the effective reproduction number for active cases (${R}_{\mathrm{eff}}$), the populationwide transmission potential (TP), and the deviation between ${R}_{\mathrm{eff}}$ and TP (C2). Our model has been applied (in realtime) to Australian data throughout the pandemic and continues to support the public health response. Here, our analysis of the first 12 months of the pandemic has demonstrated how these quantities enable the tracking and planning of progress towards the control of large outbreaks (as seen in Victoria), maintenance of virus suppression (as seen in New South Wales), and monitoring the risk posed by reintroduction of the virus (as seen in South Australia).
Our approach addresses a major challenge in epidemic situational awareness by enabling assessment of epidemic risk — via the TP — when cases are driven to low levels or (temporary) elimination is achieved. During periods of viral transmission, the model also provides new insight into epidemic dynamics via the deviation between ${R}_{\mathrm{eff}}$ and TP (C2). Further, the TP provides nearrealtime assessment of trends in population macrodistancing and precautionary microbehaviours that fluctuate in response to changing social restrictions, risk perception, and other factors such as school holidays. In combination, knowledge gained from ${R}_{\mathrm{eff}}$, TP and C2 enables policymakers to monitor the relative impacts of communitywide social restrictions and consider the need for more targeted response measures (Department of Health and Aged Care, 2021a).
Social and behavioural data have been used extensively in other countries to support COVID19 situational assessment (Rajatanavin et al., 2021; Jarvis et al., 2020; Coletti et al., 2020; Atchison et al., 2021; Czeisler et al., 2020; Leung et al., 2021). In the UK, the CoMix study Jarvis et al., 2020 has been collecting contact data on a fortnightly basis since March 2020 and reporting “${R}_{c}$" (the basic reproduction number under control measures), to the UK government’s Scientific Pandemic Influenza Group on Modelling, Operational subgroup (SPIMO). Conceptually, CoMix’s ${R}_{c}$ is akin to our TP. However, by synthesising behavioural data from multiple sources, accounting for both micro and macrodistancing behaviours (thus estimating ‘effective’ contacts), and incorporating the effect of case surveillance, our approach is likely to capture a more complete picture of the populationwide potential for virus transmission. Further, by estimating TP and ${R}_{\mathrm{eff}}$ within the same modelling framework (and thus computing C2), our analysis provides a richer and more coherent epidemiological interpretation than that offered through independent measurement and reporting of each metric. Our case studies demonstrate how this richness has supported (and continues to support) the Australian COVID19 response.
Despite its demonstrated impact, there are limitations to our approach. Firstly, it relies on data from frequent, populationwide surveys. In Australia, these data are collected for government and made available to our analysis team by a market research company which has access to an established ‘panel’ of individuals who have agreed to take part in surveys of public opinion. Researchers and governments in many other countries have used such companies for rapid data collection to support pandemic response (Jarvis et al., 2020; Atchison et al., 2021). However, these survey platforms are not readily available in all settings. Further, the sampling strategy did not allow for surveying individuals without internet access, low literacy or limited English language skills, or communication or cognitive difficulties. Further, individuals under 18 years of age were not represented in our surveys. Nor were these survey results available for the prepandemic period, limiting our ability to estimate what a true behavioural baseline would be for the Australian population.
The requirement for specific data streams is a limitation of our approach as routinely applied in Australia in 2020, where it was developed to address situationspecific policy questions and synthesise available data relating to the transmission process. However, the framework is modular and could be adjusted to incorporate or remove timeseries of relevant quantities (e.g. nonhousehold contact rates, adherence to precautionary microbehaviour, effectiveness of surveillance), according to data availability, epidemiological relevance, and policy needs. For its use in Australia in 2020, nonhousehold contact rates (capturing the main effects of stayathome measures) and precautionary microbehaviour were considered the most important (and measurable) drivers of epidemic dynamics. In other times and places (or for other diseases), different factors may be more important for monitoring epidemic dynamics, and the variables that are quantified should be chosen accordingly.
While the patterns of TP, ${R}_{\mathrm{eff}}$ and C2 observed over time in Australia are consistent with “in field” epidemiological assessments, and while the methods have demonstrated impact in supporting decision making, a direct quantification of the validity of the TP is not straightforward. For example, whether selfreported adherence to the 1.5 m rule is a reliable covariate for change in the per contact probability of transmission over time is difficult to assess. If transmission were to become widespread in Australia, and cases therefore became more representative of the general population rather than specific subsets, ${R}_{\mathrm{eff}}$ and TP estimates would be expected to converge. However, in the absence of such a natural experiment, no ground truth for this unobserved parameter exists with which to quantitatively validate the model calibration. During the Victorian second wave, while ${R}_{\mathrm{eff}}$ > TP is consistent with virus spread in subpopulations with higherthanpopulationaverage rates of social contact, which was supported by other epidemiological assessments, we cannot rule out that the modelled TP was systematically underestimating the ‘true’ TP over this period.
In Australia, our methods are embedded not only in state and national situational assessment (Department of Health and Aged Care, 2021b) but also in national response planning. Since the model incorporates a mechanistic understanding of the impacts of physical distancing behaviour on both household and nonhousehold transmission, it can be used to predict the impact of interventions on actual and potential transmission (Doherty Institute, 2021).
Unlike other approaches that make assumptions about impacts of different interventions on behaviour, we directly measure and account for behavioural responses, providing a much more proximal way of assessing the effects of interventions (Flaxman et al., 2020). Further, while detailed data on the demographics and transmission settings for cases in Australia is unavailable, our method considers deviation (the C2) from the regional average (the TP). It is therefore less susceptible to conflation between an epidemic stochastically moving between settings of different transmissibility, and changes in populationwide transmission potential.
While not addressed in this article, our semimechanistic model structure enables us to perform independent estimates of the relative transmissibility of variants compared to ancestral strains. In doing so, we account for variability in the types of contacts made when low restrictions are applied (Golding et al., 2021). We are able to estimate differences between variants in the probability of transmission per unit of contacttime, for example from detailed attack rate data from overseas. These probabilities can then be combined with our estimates from Australian case data to adjust our estimates of TP under different levels of restrictions for current and emerging variants. We have also updated our modelling framework to account for the effects of vaccination on the TP (reported in the Australian Government’s Common Operating Picture from 27 August 2021; Department of Health and Aged Care, 2021b). This enables us to consider the effect of varying levels of population vaccination coverage, agebased vaccination prioritisation strategies, and levels of restrictions on the ability of the Delta variant (and future possible variants) to spread in the population. These analyses underpin the 2021 Australian national COVID19 reopening plan (Doherty Institute, 2021) and will be reported elsewhere. These various additions and the component models of our framework (Figure 1) provide a suite of interoperable modules that could be used to apply the TP modelling framework to future epidemic diseases and other settings. Enabling the broader application and uptake of these methods would be aided by the development of robust research software, with the ability to modify which modules are used, to match the data streams available to the analyst. The development of such software, and detailed description of data inputs and analysis of the value of each data stream will be the focus of future work.
Our novel methods provide new insight into epidemic dynamics in both low and high incidence settings. The analyses have become an indispensable tool supporting the Australian COVID19 response, through both situational assessment and strategic planning processes.
Methods
Model overview
We estimate the timevarying ability of SARSCoV2 to spread in a population using a novel semimechanistic model informed by data on cases, population behaviours and health system effectiveness. We separately model transmission from locally acquired cases (localtolocal transmission) and from overseas acquired cases (importtolocal transmission). We model localtolocal transmission (${R}_{\mathrm{eff}}$) using two components:
The average populationlevel trend in transmissibility driven by interventions that primarily target transmission from local cases, specifically changes in physical distancing behaviour and case targeted measures (Component 1); and
Shortterm fluctuations in ${R}_{\mathrm{eff}}$ to capture stochastic dynamics of transmission, such as clusters of cases and short periods of lowerthanexpected transmission, and other factors influencing ${R}_{\mathrm{eff}}$ that are otherwise unaccounted for by the model (Component 2).
During times of disease activity, Components 1 and 2 are combined to provide an estimate of the local ${R}_{\mathrm{eff}}$ as traditionally measured. In the absence of disease activity, Component 1 is interpreted as the potential for the virus, if it were present, to establish and maintain community transmission (values greater than 1) or otherwise (values less than 1).
Case data
We used linelists of reported cases for each Australian state and territory extracted from the Australian National Notifiable Diseases Surveillance System (NNDSS). The linelists contain the date when the individual first exhibited symptoms, date when the case notification was received by the jurisdictional health department and where the infection was acquired (i.e. overseas or locally).
Modelling the impact of physical distancing
Overview
To investigate the impact of distancing measures on SARSCoV2 transmission, we distinguish between two types of distancing behaviour: (1) macrodistancing, that is, reduction in the rate of nonhousehold contacts; and (2) precautionary microbehaviour, that is, reduction in transmission probability per nonhousehold contact.
We used data from nationwide surveys to estimate trends in specific macrodistancing (average daily number of nonhousehold contacts) and precautionary microbehaviour (proportion of the population always keeping 1.5 m physical distance from nonhousehold contacts) behaviours over time. We used these survey data to infer statelevel trends in macrodistancing and precautionary microbehaviour over time, with additional information drawn from trends in mobility data.
Estimating changes in macrodistancing behaviour
To estimate trends in macrodistancing behaviour, we used data from: two waves of a national survey conducted in early April and early May 2020 by the University of Melbourne; and weekly waves of a national survey conducted by the Australian government from late May 2020. Respondents were asked to report the number of individuals that they had contact with outside of their household in the previous 24 hr. Note that the first wave of the University of Melbourne survey was fielded four days after Australia’s most intensive physical distancing measures were recommended nationally on 29 March 2020.
Given these data, we used a statistical model to infer a continuous trend in macrodistancing behaviour over time. This model assumed that the daily number of nonhousehold contacts is proportional to a weighted average of time spent at different types of location, as measured by Google mobility data. The five types of places are: parks and public spaces; residential properties; retail and recreation; public transport stations; and workplaces. We fit a statistical model that infers the proportion of nonhousehold contacts occurring in each of these types of places from:
A survey of locationspecific contact rates preCOVID19 (Rolls et al., 2015); and
A separate statistical model fit to the national average numbers of nonhousehold contacts from a preCOVID19 contact survey and contact surveys fielded postimplementation of COVID19 restrictions.
Waning in macrodistancing behaviour is therefore driven by Google mobility data (calibrated to survey data on nonhousehold contact rates) on increasing time spent in each of the different types of locations since the peak of macrodistancing behaviour.
Estimating changes in precautionary microbehaviour
To estimate trends in precautionary microbehaviour, we used data from weekly national surveys (first wave from 27 to 30 March 2020) to assess changes in behaviour in response to COVID19 public health measures. Respondents were asked to respond to the question: ‘Are you staying 1.5 m away from people who are not members of your household’ on a five point scale with response options ‘No’, ‘Rarely’, ‘Sometimes’, ‘Often’ and ‘Always’.
These behavioural survey data were used in a statistical model to infer the trend in precautionary microbehaviour over time. Precautionary microbehaviour was assumed to be nonexistent prior to the first epidemic wave of COVID19, and the increase in precautionary microbehaviour to its peak was assumed to follow the same trend as macrodistancing behaviour, implying that the population simultaneously adopted both macrodistancing and precautionary microbehaviours around the times that restrictions were implemented.
Incorporating estimated changes in behaviour in the model of transmission potential
These statelevel macrodistancing and precautionary microbehaviour trends were then used in the model of transmission potential to inform the reduction in nonhousehold transmission rates. Since the macrodistancing trend is calibrated against the number of nonhousehold contacts, the rate of nonhousehold transmission scales directly with this inferred trend. The probability of transmission per nonhousehold contact is assumed to be proportional to the fraction of survey participants who report that they always maintain 1.5 m physical distance from nonhousehold contacts. The constant of proportionality is estimated in the model of transmission potential.
The estimated rate of waning of precautionary microbehaviour is sensitive to the metric used. If a different metric of precautionary microbehaviour (e.g. the fraction of respondents practicing good hand hygiene) were used, this might change the inferred rate of waning of precautionary microbehaviour, and therefore the estimated transmission potential.
Modelling the impact of quarantine of overseas arrivals
We model the impact of quarantine of overseas arrivals via a ‘step function’ reflecting three different quarantine policies: selfquarantine of overseas arrivals from specific countries prior to 15 March 2020; selfquarantine of all overseas arrivals from 15 March up to 27 March 2020; and mandatory quarantine of all overseas arrivals after 27 March 2020 (Figure 2). We make no prior assumptions about the effectiveness of quarantine at reducing ${R}_{\mathrm{eff}}$ import, except that each successive change in policy increased that effectiveness. Note that this part of the model is intended to capture broad changes in the contribution of importation to case numbers, and is not intended to provide reliable inferences about the relative contributions of different border quarantine policies to disease importation.
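The step function described above can be sketched as follows. This is a minimal illustration only: the function name, the treatment of the quarantine effect as a multiplier between 0 and 1 applied to the baseline import-to-local transmission rate, and the example values are assumptions made here, not estimates from the model. The only constraint taken from the text is that each successive policy is at least as effective as the one before it.

```python
from datetime import date

# Policy change dates taken from the text above.
SELF_QUARANTINE_ALL = date(2020, 3, 15)
LAST_DAY_SELF_QUARANTINE = date(2020, 3, 27)

def quarantine_effect(day, q1, q2, q3):
    """Multiplier on import-to-local transmission under the policy active on `day`.

    q1, q2, q3 are hypothetical multipliers for the three policy periods.
    Smaller values mean more effective quarantine, so they must not increase.
    """
    if not (q1 >= q2 >= q3 >= 0.0):
        raise ValueError("each successive policy must be at least as effective")
    if day < SELF_QUARANTINE_ALL:
        return q1  # self-quarantine for arrivals from specific countries
    if day <= LAST_DAY_SELF_QUARANTINE:
        return q2  # self-quarantine for all arrivals
    return q3      # mandatory quarantine for all arrivals

# Illustrative values only.
effects = [quarantine_effect(d, 1.0, 0.6, 0.05)
           for d in (date(2020, 3, 1), date(2020, 3, 20), date(2020, 4, 1))]
```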
Accounting for the impact of interstateacquired infections
Each of Australia’s eight states and territories was modelled as a separate epidemic, with no travel assumed between jurisdictions and interstateacquired cases handled as ‘imported cases’ within the modelling framework. We believe that these modelling decisions were reasonable for the Australian context given Australia’s unique geography (the majority of Australians live in a handful of major cities, with comparatively little movement between them), and the imposition of interstate travel restrictions during periods of COVID19 transmission over the analysis period. Furthermore, the number of interstate importations in Australia was small and well documented in the data. Unlike overseasacquired cases, interstateacquired cases are assumed to contribute to onward local transmission since they were not required to quarantine.
Model limitations
While we had access to data on whether cases are locally acquired or overseas acquired, no data were available on whether each of the locally acquired cases were infected by an imported case or by another locally acquired case. These data would allow us to disentangle the two transmission rates. Without these data, we can separate the denominators (number of infectious cases), but not the numerators (number of newly infected cases) in each group at each point in time. With access to such data, our method could provide more precise estimates of ${R}_{\mathrm{eff}}$.
Model description
We developed a semimechanistic Bayesian statistical model to estimate ${R}_{\mathrm{eff}}$, or $R(t)$ hereafter, the effective rate of transmission of SARSCoV2 over time, whilst simultaneously quantifying the impacts on $R(t)$ of a range of policy measures introduced at national and regional levels in Australia.
Observation model
A straightforward observation model to relate case counts to the rate of transmission is to assume that the number of new locally acquired cases ${N}_{i}^{L}(t)$ at time $t$ in region $i$ is (conditional on its expectation) Poissondistributed with mean ${\lambda}_{i}(t)$ given by the product of the total infectiousness of infected individuals ${I}_{i}(t)$ and the timevarying reproduction number ${R}_{i}(t)$:
where the total infectiousness, ${I}_{i}(t)$, is the sum of all active infections ${N}_{i}({t}^{\prime})$ (both locallyacquired ${N}_{i}^{L}({t}^{\prime})$ and overseasacquired ${N}_{i}^{O}({t}^{\prime})$) initiated at times ${t}^{\prime}$ prior to $t$, each weighted by an infectivity function $g({t}^{\prime})$ giving the proportion of new infections that occur ${t}^{\prime}$ days postinfection. The function $g({t}^{\prime})$ is the probability of an infectorinfectee pair occurring ${t}^{\prime}$ days after the infector’s exposure, that is, a discretisation of the probability distribution function corresponding to the generation interval.
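The relationship between case counts, total infectiousness and the expected number of new cases can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the generation interval is assumed to start at a lag of one day, and the case counts and generation interval weights are made up for the example.

```python
def total_infectiousness(new_cases, gen_interval):
    """Sum of past infections weighted by the generation interval.

    new_cases[t] is the number of infections initiated on day t and
    gen_interval[k] is the weight g(k + 1), i.e. lags start at one day
    (an assumption made for this sketch).
    """
    infectiousness = []
    for t in range(len(new_cases)):
        total = 0.0
        for lag, weight in enumerate(gen_interval, start=1):
            if t - lag >= 0:
                total += weight * new_cases[t - lag]
        infectiousness.append(total)
    return infectiousness

def expected_local_cases(local_cases, overseas_cases, gen_interval, r_t):
    """Poisson mean for each day: total infectiousness times R(t)."""
    all_cases = [l + o for l, o in zip(local_cases, overseas_cases)]
    infectiousness = total_infectiousness(all_cases, gen_interval)
    return [i * r for i, r in zip(infectiousness, r_t)]

# Hypothetical inputs, for illustration only.
local = [0, 2, 3, 5]
overseas = [4, 1, 0, 0]
g = [0.2, 0.5, 0.3]  # g(1), g(2), g(3); sums to one
lam = expected_local_cases(local, overseas, g, r_t=[1.5] * 4)
```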
This observation model forms the basis of the maximumlikelihood method proposed by White and Pagano, 2008 and the variations of that method by Cori et al., 2013, Thompson et al., 2019 and Abbott et al., 2020a, 2020b that have previously been used to estimate timevarying SARSCoV2 reproduction numbers in Australia (Price et al., 2020).
We extend this model to consider separate reproduction numbers for two groups of infectious cases, in order to model the effects of different interventions targeted at each group: those with locally acquired cases ${I}_{i}^{L}(t)$, and those with overseas acquired cases ${I}_{i}^{O}(t)$, with corresponding reproduction numbers ${R}_{i}^{L}(t)$ and ${R}_{i}^{O}(t)$. These respectively are the rates of transmission from locally acquired cases to locals, and from imported cases to locals. We also model daily case counts as arising from a Negative Binomial distribution rather than a Poisson distribution to account for potential clustering of new infections on the same day, and use a state and timevarying generation interval distribution ${g}_{i}({t}^{\prime},t)$ (detailed in Surveillance effect model):
where the negative binomial distribution is parameterised in terms of its mean ${\mu}_{i}(t)$ and dispersion parameter $r$. In the commonly used probability and dispersion parameterisation with probability $\psi $ the mean is given by $\mu =\psi r/(1-\psi )$.
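The conversion between the two parameterisations can be checked numerically. The sketch below inverts $\mu =\psi r/(1-\psi )$ to give $\psi =\mu /(\mu +r)$ and evaluates the corresponding probability mass function using only the standard library. The function names and example values are hypothetical.

```python
import math

def nb_prob_from_mean(mean, dispersion):
    """Return psi such that mean = psi * r / (1 - psi)."""
    return mean / (mean + dispersion)

def nb_log_pmf(count, mean, dispersion):
    """Log probability of `count` under a negative binomial with the given
    mean and dispersion r (requires mean > 0 and dispersion > 0)."""
    psi = nb_prob_from_mean(mean, dispersion)
    return (
        math.lgamma(count + dispersion)
        - math.lgamma(dispersion)
        - math.lgamma(count + 1)
        + dispersion * math.log(1.0 - psi)
        + count * math.log(psi)
    )

# Illustrative values: mean of 4 cases per day, dispersion r = 2.
pmf = [math.exp(nb_log_pmf(k, 4.0, 2.0)) for k in range(2000)]
numerical_mean = sum(k * p for k, p in enumerate(pmf))
```

Smaller values of the dispersion parameter give more variance for the same mean, which is what allows the model to accommodate clustering of new infections on the same day.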
Note that if data were available on whether the source of infection for each locally acquired case was another locallyacquired case or an overseasacquired case, we could split this into two separate analyses using the observation model above; one for each transmission source. In the absence of such data, the fractions of all transmission attributed to sources of each type is implicitly inferred by the model, with an associated increase in parameter uncertainty.
We provide the model with additional information on the rate of importtolocal transmission by adding a further likelihood term to the model for known events of importtolocal transmission since the implementation of mandatory hotel quarantine:
where $K$ is the total number of known events of transmission from overseasacquired cases occurring within Australia from ${\tau}_{2}$ = 2020-03-28 to ${\tau}_{3}$ = 2020-12-31. These events are largely transmission events within hotel quarantine facilities, some of which led to outbreaks of localtolocal transmission. Prior to this period, importtolocal transmission events cannot be reliably distinguished from localtolocal transmission events.
When estimating ${R}_{\mathrm{eff}}$ from recent case count data, care must be taken to account for underreporting of recent cases (those which have yet to be detected), because failing to account for this underreporting can lead to estimates of ${R}_{\mathrm{eff}}$ that are biased downwards. We correct for this righttruncation effect by first estimating the fraction of locallyacquired cases on each date that we would expect to have detected by the time the model is run (detection probability), and correcting both the infectiousness terms ${I}_{i}^{L}(t)$, and the observed number of new cases ${N}_{i}^{L}(t)$. We calculate the detection probability for each day in the past from the empirical cumulative distribution function of delays from assumed date of infection to date of detection over a recent period (see Surveillance effect model). We correct the infectiousness estimates ${I}_{i}^{L}(t)$ by dividing the number of newly infected cases on each day ${N}_{i}^{L}(t)$ by this detection probability — to obtain the expected number of new infections per day — before summing across infectiousness. We correct the observed number of new infections by a modification to the negative binomial likelihood; multiplying the expected number of cases by the detection probability to obtain the expected number of cases observed in the (uncorrected) time series of locallyacquired cases.
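The right-truncation correction described above can be sketched as follows. This is a simplified illustration, not the authors' code: the function names are hypothetical, the delay sample and case counts are invented, and the last element of the series is assumed to be the date on which the model is run.

```python
def detection_probability(delays, days_elapsed):
    """Empirical CDF of infection-to-detection delays at `days_elapsed`."""
    if not delays:
        raise ValueError("need at least one observed delay")
    return sum(d <= days_elapsed for d in delays) / len(delays)

def correct_for_truncation(observed_cases, delays):
    """Return per-day detection probabilities and corrected infection counts.

    Days with zero detection probability cannot be corrected and are
    returned as NaN.
    """
    last = len(observed_cases) - 1
    probs = [detection_probability(delays, last - t)
             for t in range(len(observed_cases))]
    corrected = [n / p if p > 0 else float("nan")
                 for n, p in zip(observed_cases, probs)]
    return probs, corrected

def expected_observed(expected_cases, detection_prob):
    """Likelihood-side correction: expected number of cases seen so far."""
    return expected_cases * detection_prob

# Hypothetical delays (days) and observed counts, for illustration only.
recent_delays = [1, 2, 2, 3, 4]
observed = [10, 10, 8, 6, 2, 0]
probs, corrected = correct_for_truncation(observed, recent_delays)
```

In this example a constant underlying rate of 10 infections per day appears to decline in the raw counts purely because recent infections have not yet been detected, and dividing by the detection probability recovers the constant rate.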
Reproduction rate models
We model the onward reproduction numbers for overseasacquired and locallyacquired cases in a semimechanistic way. Reproduction numbers for localtolocal transmission are modelled as a combination of a deterministic model of the populationwide transmission potential for that type of case, and a correlated time series of random effects to represent stochastic fluctuations in the reproduction number in each state over time. Importtolocal transmission is modelled in a mechanistic way:
For both locally acquired and overseasacquired infections, the effective reproduction number depends on the transmission potential ${R}_{i}^{\ast}(t)$, which is given by a deterministic epidemiological model of populationwide transmission potential that considers the effects of distancing behaviours. The correlated time series of random effects ${\epsilon}_{i}(t)$ represents stochastic fluctuations in these locallocal reproduction numbers in each state over time, for example due to clusters of transmission in subpopulations with higher or lower reproduction numbers than the general population. We consider that the transmission potential ${R}_{i}^{*}(t)$ is the average of individual reproduction numbers over the entire state population, whereas the effective reproduction number ${R}_{i}^{L}(t)$ is the average of individual reproduction numbers among a (nonrandom) sample of individuals: those that make up the active cases at that point in time. We therefore expect that the longterm average of ${R}_{i}^{L}(t)$ will equate to ${R}_{i}^{*}(t)$. The relationship between these two is therefore defined such that the hierarchical distribution over ${R}_{i}^{L}(t)$ is marginally (with respect to time) a lognormal distribution with mean ${R}_{i}^{*}(t)$. The parameter ${\sigma}^{2}$ is the marginal variance of the ${\epsilon}_{i}$, as defined in the kernel function of the Gaussian process.
Note that in this model the random effects term ${\epsilon}_{i}$ and its variance term ${\sigma}^{2}$ are intended to have a mechanistic interpretation as the stochasticity due to random sampling (of people currently infected from the total population). They are not incorporated to account for error in specification of the transmission potential in the way that temporal random effects are commonly used in statistical modelling. Consequently, small variance in the timeseries plots of ${\epsilon}_{i}$ is not indicative of good fit, but of a large number of infections; as the size of the sample increases, the variance of the mean decreases.
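One way to obtain a lognormal distribution with mean equal to the transmission potential is to subtract half the marginal variance on the log scale. The sketch below assumes this form, $\ln {R}^{L}=\ln {R}^{*}-{\sigma}^{2}/2+\epsilon $ with $\epsilon \sim N(0,{\sigma}^{2})$. This is an assumption made for illustration and may not match the authors' exact equation. For simplicity the draws are independent, whereas the model uses a temporally correlated Gaussian process. All names and values are hypothetical.

```python
import math
import random

def local_reproduction_number(transmission_potential, epsilon, sigma):
    """Assumed form: exp(log R* - sigma^2 / 2 + epsilon).

    The -sigma^2 / 2 term offsets the lognormal mean so that the
    expectation over epsilon ~ Normal(0, sigma^2) equals R*.
    """
    return math.exp(
        math.log(transmission_potential) - 0.5 * sigma ** 2 + epsilon
    )

random.seed(1)
sigma = 0.5
r_star = 1.2  # illustrative transmission potential
draws = [
    local_reproduction_number(r_star, random.gauss(0.0, sigma), sigma)
    for _ in range(100_000)
]
sample_mean = sum(draws) / len(draws)  # close to r_star
```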
For overseasacquired cases the populationwide transmission rate at time $t$, ${R}_{i}^{*}(0)Q(t)$, is the baseline rate of transmission (${R}_{i}^{*}(0)={R}_{0}$; localtolocal transmission potential in the absence of distancing behaviour or other mitigation) multiplied by a quarantine effect model, $Q(t)$, that encodes the efficacy of the three different overseas quarantine policies implemented in Australia (described below).
We model ${R}_{i}^{*}(t)$, the populationwide rate of localtolocal transmission at time $t$, as the sum of two components: the rate of transmission to members of the same household, and to members of other households. Each of these components is computed as the product of the number of contacts, and the probability of transmission per contact. The transmission probability is in turn modelled as a binomial process considering the duration of contact with each person and the probability of transmission per unit time of contact. This mechanistic consideration of the contact process enables us to separately quantify how macrodistancing and precautionary microbehaviours impact on transmission, and to make use of various ancillary measures of both forms of distancing:
where: $s(t)$ is the effect of surveillance on transmission, due to the detection and isolation of cases (detailed below); $H{C}_{0}$ and $N{C}_{0}$ are the baseline (i.e. before adoption of distancing behaviours) daily rates of contact with, respectively, people who are, and are not, members of the same household; $H{D}_{0}$ and $N{D}_{0}$ are the baseline average total daily duration of contacts with household and nonhousehold members (measured in hours); $d$ is the average duration of infectiousness in days; $p$ is the probability of transmitting the disease per hour of contact, and; ${h}_{i}(t)$, ${\delta}_{i}(t)$, ${\gamma}_{i}(t)$ are timevarying indices of change relative to baseline of the duration of household contacts, the number of nonhousehold contacts, and the transmission probability per nonhousehold contact, respectively (modifying both the duration and transmission probability per unit time for nonhousehold contacts).
The first component in Equation (12) is the rate of household transmission, and the second is the rate of nonhousehold transmission. Note that the duration of infectiousness $d$ is considered differently in each of these components. For household members, the daily number of household contacts is typically close to the total number of household members, hence the expected number of household transmissions asymptotically approaches the household size; so the number of days of infectiousness contributes to the probability of transmission to each of those household members. This is unlikely to be the case for nonhousehold members, where each day’s nonhousehold contacts may overlap, but are unlikely to be from a small finite pool. This assumption would be unnecessary if contact data were collected on a similar timescale to the duration of infectiousness, though issues with participant recall in contact surveys mean that such data are unavailable. Note that this model does not have a household network structure, nor account for depletion of susceptible individuals within a household.
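The structure described above (a household and a nonhousehold component, each the product of a number of contacts and a per-contact transmission probability from a binomial process over contact hours) can be sketched as follows. The exact functional form is an assumption based on the prose and is not a transcription of Equation (12). In particular, durations are treated here as hours per contact per day, and all parameter values are invented for illustration.

```python
def transmission_potential(p, d, hc0, hd0, nc0, nd0,
                           h=1.0, delta=1.0, gamma=1.0, s=1.0):
    """Sketch of population-wide local-to-local transmission potential.

    p     : probability of transmission per hour of contact
    d     : average duration of infectiousness (days)
    hc0   : baseline daily number of household contacts
    hd0   : baseline daily hours of contact per household contact (assumed)
    nc0   : baseline daily number of non-household contacts
    nd0   : baseline daily hours of contact per non-household contact (assumed)
    h, delta, gamma : time-varying indices relative to baseline
    s     : surveillance effect multiplier
    """
    # Household: the same contacts are exposed on every infectious day,
    # so d enters the exponent of the per-contact transmission probability.
    household = hc0 * (1.0 - (1.0 - p) ** (hd0 * h * d))
    # Non-household: a new set of contacts is assumed each day,
    # so d multiplies the number of contacts instead.
    non_household = nc0 * delta * d * gamma * (1.0 - (1.0 - p) ** nd0)
    return s * (household + non_household)

# Hypothetical baseline values, for illustration only.
params = dict(p=0.02, d=5.0, hc0=2.0, hd0=3.0, nc0=10.0, nd0=1.0)
baseline = transmission_potential(**params)
distanced = transmission_potential(**params, h=1.2, delta=0.4, gamma=0.6)
```

Under this form, macrodistancing (lower `delta`) and precautionary microbehaviour (lower `gamma`) reduce only the nonhousehold component, while more time at home (higher `h`) slightly increases the household component.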
The parameters $H{C}_{0}$, $H{D}_{0}$, and $N{D}_{0}$ are all estimated from a contact survey conducted in Melbourne in 2015 (Rolls et al., 2015). $N{C}_{0}$ is computed from an estimate of the total number of contacts per day for adults from Prem et al., 2017, minus the estimated rate of household contacts. Whilst Rolls et al., 2015 also provides an estimate of the rate of nonhousehold contacts, the method of data collection (a combination of ‘individual’ and ‘group’ contacts) makes it less comparable with contemporary survey data than the estimate of Prem et al., 2017.
The expected duration of infectiousness $d$ is computed as the mean of the nontimevarying discrete generation interval distribution:
and change in the duration of household contacts over time ${h}_{i}(t)$ is assumed to be equivalent to change in time spent in residential locations in region $i$, as estimated by the mobility model for the data stream Google: time at residential. In other words, the total duration of time in contact with household members is assumed to be directly proportional to the amount of time spent at home. Unlike the effect on nonhousehold transmission, an increase in macrodistancing is expected to slightly increase household transmission due to this increased contact duration.
The timevarying parameters ${\delta}_{i}(t)$ and ${\gamma}_{i}(t)$ respectively represent macrodistancing and precautionary microbehaviour; behavioural changes that reduce mixing with nonhousehold members, and the probability of transmission for each nonhousehold contact. We model each of these components, informed by population mobility estimates from the mobility model and calibrated against data from nationwide surveys of contact behaviour.
Surveillance effect model
Disease surveillance (both screening of people with COVIDlike symptoms and performing contact tracing) can improve COVID19 control by placing cases in isolation so that they are less likely to transmit the pathogen to other people. Improvements in disease surveillance can therefore lead to a reduction in transmission potential by isolating cases more quickly, and reducing the time they are infectious but not isolated. Such an improvement changes two quantities: the population average transmission potential ${R}^{*}(t)$ is reduced by a factor ${s}_{i}(t)$; and the generation interval distribution $g(t,{t}^{\prime})$ is shortened, as any transmission events are more likely to occur prior to isolation.
We model both of these functions using a region and timevarying estimate of the survival function (one minus the cumulative distribution function) ${f}_{i}(t,{t}^{\prime})$ of the discrete probability distribution over times from infection to detection:
where ${g}^{*}({t}^{\prime})$ is the baseline generation interval distribution, representing times to infection in the absence of detection and isolation of cases, ${s}_{i}(t)$ is a normalising factor (and also the effect of surveillance on transmission), and ${f}_{i}(t,{t}^{\prime})$ is the region and timevarying survival function of the distribution over periods from infection to isolation ${t}^{\prime}$. In states/territories and at times when cases are rapidly found and placed in isolation, the distribution encoded by ${f}_{i}(t,{t}^{\prime})$ has most of its mass on small delays, average generation intervals are shortened, and the surveillance effect ${s}_{i}(t)$ tends toward 0 (a reduction in transmission). At times when cases are not found and isolated until after most of their infectious period has passed, the distribution encoded by ${f}_{i}(t,{t}^{\prime})$ has most of its mass on large delays, generation intervals are longer on average, and ${s}_{i}(t)$ tends toward 1 (no effect of reduced transmission).
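The two limiting cases described above can be reproduced with a small numerical sketch. It assumes the surveillance effect is the sum over lags of the baseline generation interval weighted by the probability that a case is not yet isolated at that lag, and that the adjusted generation interval is the same product renormalised. The function names, the convention that a case isolated on day `k` no longer transmits at lag `k`, and the example numbers are all assumptions for illustration.

```python
def survival_from_delays(delays, max_lag):
    """Fraction of cases still not isolated at each lag 1..max_lag."""
    return [sum(d > lag for d in delays) / len(delays)
            for lag in range(1, max_lag + 1)]

def surveillance_adjustment(baseline_gi, survival):
    """Return (surveillance effect s, adjusted generation interval)."""
    weighted = [g * f for g, f in zip(baseline_gi, survival)]
    s = sum(weighted)
    if s == 0:
        raise ValueError("no transmission possible before isolation")
    return s, [w / s for w in weighted]

def mean_lag(gi):
    """Mean of a discrete generation interval with lags starting at 1."""
    return sum(lag * g for lag, g in enumerate(gi, start=1))

# Hypothetical baseline generation interval over lags 1..4 days.
g_star = [0.1, 0.3, 0.4, 0.2]
fast = survival_from_delays([1, 1, 2, 2], max_lag=4)  # rapid isolation
slow = survival_from_delays([6, 7, 8, 9], max_lag=4)  # late isolation
s_fast, g_fast = surveillance_adjustment(g_star, fast)
s_slow, g_slow = surveillance_adjustment(g_star, slow)
```

With rapid isolation the effect `s_fast` is close to 0 and the generation interval is concentrated on short lags; with late isolation `s_slow` is 1 and the baseline generation interval is unchanged.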
We model the region and timevarying distributions ${f}_{i}(t,{t}^{\prime})$ empirically via a timeseries of empirical distribution functions computed from all infectiontoisolation periods observed within an adaptive moving window around each time $t$. Since dates of infection and isolation are not routinely recorded in the dataset analysed, we use 5 days prior to the date of symptom onset to be the assumed date of infection, and the date of case notification to be the assumed date of isolation. This will overestimate the time to isolation and therefore underestimate the effect of surveillance when a significant proportion of cases are placed into isolation prior to testing positive, for example during the tail of an outbreak being successfully controlled by contact tracing.
For a given date and state/territory, the empirical distribution of delays from symptom onset to notification is computed from cases with symptom onset falling within a time window around that date, with the window selected to be the smallest that will yield at least 500 observations, but constrained to between one and eight weeks.
Where a state/territory does not have sufficient cases to reliably estimate this distribution in an eight week period, a national estimate is used instead. Specifically, if a state/territory has fewer than 100 cases, the national estimate is used; if it has more than 500, the state estimate is used; and if it has between 100 and 500, the distribution is a weighted average of state and national estimates.
The national estimate is obtained via the same method but with no upper limit on the window size and excluding data from Victoria since 14 June, since the situation during the Victorian outbreak after this time is not likely to be representative of surveillance in states with few cases.
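The window selection and pooling rules above can be sketched as follows. The text does not specify how the window is positioned, what step size is used, or how the weights between 100 and 500 cases are computed; this sketch assumes a window centred on the date and grown in whole weeks, and a weight that increases linearly with the number of state cases. Function names and example data are hypothetical.

```python
def smallest_window_weeks(onset_offsets, target=500, min_weeks=1, max_weeks=8):
    """Smallest centred window (whole weeks) holding at least `target` cases.

    onset_offsets are symptom onset dates in days relative to the date of
    interest. Returns (window width in weeks, number of cases in window).
    """
    for weeks in range(min_weeks, max_weeks + 1):
        half_width = weeks * 7 / 2
        n_cases = sum(abs(x) <= half_width for x in onset_offsets)
        if n_cases >= target:
            break
    return weeks, n_cases

def state_weight(n_state_cases, lower=100, upper=500):
    """Weight on the state estimate: 0 below `lower`, 1 above `upper`,
    linear in between (the linear form is an assumption)."""
    return min(1.0, max(0.0, (n_state_cases - lower) / (upper - lower)))

def pooled_cdf(state_cdf, national_cdf, n_state_cases):
    """Weighted average of state and national delay distributions."""
    w = state_weight(n_state_cases)
    return [w * s + (1.0 - w) * n for s, n in zip(state_cdf, national_cdf)]

# Illustrative use with a lowered target so the toy data can satisfy it.
weeks, n_in_window = smallest_window_weeks(list(range(-20, 21)), target=15)
blended = pooled_cdf([0.2, 0.6, 1.0], [0.4, 0.8, 1.0], n_state_cases=300)
```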
Macrodistancing model
The populationwide average daily number of nonhousehold contacts at a given time can be directly estimated using a contact survey. We therefore used data from a series of contact surveys commencing immediately after the introduction of distancing restrictions to estimate ${\delta}_{i}(t)$ independently of case data. To infer a continuous trend of ${\delta}_{i}(t)$, we model the numbers of nonhousehold contacts at a given time as a function of mobility metrics considered in the mobility model. We model the log of the average number of contacts on each day as a linear model of the log of the ratio on baseline of five Google metrics of time spent at different types of location: residential, transit stations, parks, workplaces, and retail and recreation:
where $\omega $ is the vector of 5 coefficients, $\mathbf{m}$ is a vector of length 5 containing ones, except for the element corresponding to time at residential locations, which has value $-1$, and ⊙ indicates the elementwise product. This constrains the direction of the effect of increasing time spent at each of these locations to be positive (more contacts), except for time at residential, which we constrain to be negative. The intercept of the linear model (average daily contacts at baseline) is given a prior formed from the daily number of nonhousehold contacts in a preCOVID19 contact survey (Rolls et al., 2015). Since our aim is to capture general trends in mobility rather than daily effects, we model the weekly average of the daily number of contacts, by using smoothed estimates of the Google mobility metrics.
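A log-linear model of this kind can be sketched as below. The sketch assumes nonnegative coefficients multiplied elementwise by a sign vector that is $-1$ for residential and $+1$ elsewhere, applied to the log of each mobility metric's ratio to baseline. The dictionary keys, coefficient values and mobility ratios are hypothetical.

```python
import math

LOCATIONS = ["residential", "transit_stations", "parks",
             "workplaces", "retail_and_recreation"]
# Sign vector m: -1 for residential, +1 for every other location type.
SIGN = {name: (-1.0 if name == "residential" else 1.0) for name in LOCATIONS}

def expected_contacts(baseline_contacts, omega, mobility_ratio):
    """exp(log(baseline) + sum_j omega_j * m_j * log(ratio_j)).

    omega holds nonnegative coefficients; mobility_ratio holds each
    metric's ratio to its baseline value (1.0 means no change).
    """
    log_mean = math.log(baseline_contacts)
    for name in LOCATIONS:
        if omega[name] < 0:
            raise ValueError("coefficients are constrained to be nonnegative")
        log_mean += omega[name] * SIGN[name] * math.log(mobility_ratio[name])
    return math.exp(log_mean)

# Illustrative values only.
omega = {name: 0.5 for name in LOCATIONS}
at_baseline = {name: 1.0 for name in LOCATIONS}
lockdown = {"residential": 1.2, "transit_stations": 0.4, "parks": 0.8,
            "workplaces": 0.5, "retail_and_recreation": 0.5}
contacts_baseline = expected_contacts(10.0, omega, at_baseline)
contacts_lockdown = expected_contacts(10.0, omega, lockdown)
```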
Whilst we aim to model weekly rather than daily variation in contact rates, when fitting the model to survey data we account for variation among responses by day of the week by modelling the fraction of the weekly number of contacts falling on each day of the week (the length-seven vector in each state and time ${\mathbf{D}}_{i}(t)$) and using this to adjust the expected number of contacts for each respondent based on the day of the week they completed the survey. To account for how the weekly distribution of contacts has changed over time as a function of mixing restrictions (e.g. a lower proportion of contacts on weekdays during periods when stay-at-home orders were in place), we model the weekly distribution of contacts itself as a function of deviation in the weekly average of the daily number of contacts, with length-seven vector parameters $\alpha $ and $\theta $. We use the softmax (normalised exponential) function to transform this distribution to sum to one, then multiply the resulting proportion by 7 to reweight the weekly average daily contact rate to the relevant day of the week.
Combining the baseline average daily contact rate $N{C}_{0}$, the mobility-driven modelled change in contact rates over time ${\delta}_{i}(t)$, and the time-varying day-of-the-week effects ${\mathbf{D}}_{i}(t)$, we obtain an expected number of daily contacts for each survey response $N{C}_{k}$:
where $i[k]$, $t[k]$, and $d[k]$ respectively indicate the state, time, and day of the week on which respondent $k$ filled in the survey.
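The day-of-the-week adjustment and its combination with the baseline rate and trend can be sketched as follows. The names are hypothetical, and the linear predictor `alpha + theta * contact_deviation` is an assumption about how the two length-seven parameter vectors enter the softmax; the text only states that the weekly distribution is a function of the deviation in contacts.

```python
import numpy as np


def softmax(x):
    """Normalised exponential; subtracting the maximum improves numerical stability."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def day_of_week_weights(alpha, theta, contact_deviation):
    """Length-seven multipliers that average to one across the week."""
    alpha = np.asarray(alpha, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Assumption: the linear predictor is alpha + theta * deviation.
    return 7.0 * softmax(alpha + theta * contact_deviation)


def expected_contacts_for_response(baseline_contacts, delta, weights, day_index):
    """Expected contacts for one respondent: baseline x trend x day-of-week effect."""
    return baseline_contacts * delta * weights[day_index]
```

Multiplying the softmax output by 7 means the multipliers sum to 7, so the weekly average of the adjusted daily rates equals the unadjusted weekly average.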
We model the number of contacts from each survey respondent as a draw from an interval-censored discrete lognormal distribution. This choice of distribution enables us to account for the ad hoc rounding of reported numbers of contacts (responses larger than 10 tend to be ‘heaped’ on multiples of 10 and 100), whilst also accounting for the heavy upper tail in numbers of reported contacts. The support of this distribution is the integers from 0 to 10 inclusive, and the intervals 11–20, 21–50, and 50–999. Reported daily contact rates ≥ 1000 are excluded as these are considered implausible for our definition of a contact. The probability mass function of this distribution is the integral across these ranges of a lognormal distribution with parameters ${\mu}_{k}$ and $\tau $, parameterised such that the mean of the distribution is $N{C}_{k}$:
We incorporate mobility data into transmission potential in a two-stage process. In the first stage, non-household contact rates are modelled using mobility and survey data. The posterior mean of the modelled non-household contact rate in each jurisdiction over time is then incorporated in the transmission potential model as a fixed (i.e. ‘data’) time series without propagation of posterior uncertainty. Uncertainty in the macrodistancing model could be propagated through to the TP model by estimating both parts in a single joint model. However, this would be computationally very burdensome, and long run times would reduce the utility of the transmission potential model for routine situational assessment. Moreover, because uncertainty in both the macrodistancing and transmission potential time series is homoscedastic (the posterior variance is more or less constant over time in each state), propagation of the uncertainty in the macrodistancing model is unlikely to have a material effect on estimation of the TP time series.
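A minimal sketch of such a probability mass function is given below. Several details are assumptions made only for illustration: each integer n is taken to cover the continuous range from n to n + 1, the final interval is taken to start at 51 so that bins do not overlap, and the mass is renormalised over values below 1000. The function names are hypothetical.

```python
import math

# Bins: the integers 0..10, then the censoring intervals (inclusive integer bounds).
# Assumption: the last interval starts at 51 so that the bins do not overlap.
BINS = [(n, n) for n in range(11)] + [(11, 20), (21, 50), (51, 999)]


def lognormal_cdf(x, mu, tau):
    """CDF of a lognormal with log-scale mean mu and log-scale standard deviation tau."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (tau * math.sqrt(2.0))))


def contact_count_pmf(mean_contacts, tau):
    """Probability of each bin for a lognormal whose mean equals mean_contacts."""
    mu = math.log(mean_contacts) - 0.5 * tau ** 2  # ensures exp(mu + tau^2 / 2) = mean
    probs = []
    for lower, upper in BINS:
        # Assumption: integer n covers the continuous range [n, n + 1).
        probs.append(lognormal_cdf(upper + 1, mu, tau) - lognormal_cdf(lower, mu, tau))
    total = sum(probs)  # mass below 1000; larger values are excluded, so renormalise
    return [p / total for p in probs]
```

Treating the heaped responses as interval-censored means a report of, say, 30 contributes the probability of the whole 21–50 bin rather than of the single value 30.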
Precautionary microbehaviour model
Unlike with macrodistancing behaviour and contact rates, there is no simple mathematical framework linking change in precautionary microbehaviours to changes in non-household transmission probabilities. We must therefore estimate the effect of precautionary microbehaviour on transmission via case data. We implicitly assume that any reduction in local-to-local transmission potential that is not explained by changes to the numbers of non-household contacts, the duration of household contacts, or improved disease surveillance is explained by the effect of precautionary microbehaviour on non-household transmission probabilities.
Whilst it is not necessary to use ancillary data to estimate the effect that precautionary microbehaviour has at its peak, we use behavioural survey data to estimate the temporal trend in precautionary microbehaviour, in order to estimate to what extent adoption of that behaviour has waned and how that has affected transmission potential.
We therefore model ${\gamma}_{t}$ (a time-varying index of change relative to baseline of transmission probability per non-household contact, see Equation (12)), as a function of the proportion of the population adhering to precautionary microbehaviours. We consider adherence to the ‘1.5 m rule’ as indicative of this broader suite of behaviours due to the availability of data on this behaviour in a series of weekly behavioural surveys beginning prior to the last distancing restriction being implemented (Department of the Prime Minister and Cabinet, 2020). We consider the number ${m}_{i,t}^{+}$ of respondents in region $i$ on the survey wave commencing at time $t$ replying that they ‘always’ keep 1.5 m distance from non-household members as a binomial sample with sample size ${m}_{i,t}$. We use a generalised additive model to estimate ${c}_{i}(t)$, the proportion of the population in region $i$ responding that they always comply, as a function of the intervention stage, smoothed over time. Intervention stages are defined as periods of a continuous state of stay-at-home order, and this state thus switches each time a stay-at-home order is started, ended, or significantly changed. This state switching allows the model to react to sudden changes in compliance behaviour when orders are made or rescinded. We assume that the temporal pattern in the initial rate of adoption of the behaviour is the same as for macrodistancing behaviours (the adoption curve estimated from the mobility model). In other words, we assume that all macrodistancing and precautionary microbehaviours were adopted simultaneously around the time the first population-wide restrictions were put in place in March and April 2020. However, we do not assume that these behaviours peaked at the same time or subsequently followed the same temporal trend. The model for the proportion complying with this behaviour is therefore:
where ${\zeta}_{i,j}$ is intervention state $j$ in region $i$, and $s$ is a smoothing function over time $t$.
Given ${c}_{i}(t)$, we model ${\gamma}_{i}(t)$ as a function of the degree of precautionary microbehaviour relative to the peak:
where ${\kappa}_{i}$ is the peak of compliance, or maximum of ${c}_{i}(t)$, and $\beta $ is inferred from case data in the main ${R}_{\mathrm{eff}}$ model.
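One simple form consistent with this description scales the reduction in per-contact transmission probability linearly with compliance relative to its peak. The linear form is an illustrative assumption, not a definitive statement of the fitted model:

$$
{\gamma}_{i}(t) = 1 - \beta \, \frac{{c}_{i}(t)}{{\kappa}_{i}}
$$

Under this form, ${\gamma}_{i}(t)=1$ when nobody complies (no reduction relative to baseline) and ${\gamma}_{i}(t)=1-\beta $ at peak compliance, so $\beta $ can be read as the maximum proportional reduction in transmission probability per non-household contact.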
Overseas quarantine model
We model the effect of overseas quarantine $Q(t)$ via a monotone decreasing step function with values constrained to the unit interval, and with steps at the known dates ${\tau}_{1}$ and ${\tau}_{2}$ of changes in quarantine policy:
where ${q}_{1}>{q}_{2}>{q}_{3}$ and all parameters are constrained to the unit interval.
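The step function can be sketched as below. The function name is hypothetical, time is represented as a number (for example, days since a reference date), and the choice that a new value applies from the change date itself onwards is an assumption.

```python
import bisect


def quarantine_effect(t, tau_1, tau_2, q_1, q_2, q_3):
    """Monotone decreasing step function Q(t) with steps at tau_1 and tau_2."""
    if not (1.0 >= q_1 > q_2 > q_3 >= 0.0):
        raise ValueError("require 1 >= q_1 > q_2 > q_3 >= 0")
    if not tau_1 < tau_2:
        raise ValueError("require tau_1 < tau_2")
    # Assumption: the new value applies from the change date itself onwards.
    return (q_1, q_2, q_3)[bisect.bisect_right([tau_1, tau_2], t)]
```

For example, `quarantine_effect(15, 10, 20, 0.9, 0.5, 0.1)` falls between the two policy changes and so returns the middle value.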
Error models
The correlated time series of deviance between transmission potential and the effective reproduction number for local-to-local transmission in each region, ${\epsilon}_{i}(t)$, is modelled as a zero-mean Gaussian process (GP) with a covariance structure reflecting temporal correlation in errors within each region, but independence between regions. We use a Matérn 5/2 covariance function $k$, enabling a mixture of relatively smooth trends and local ‘roughness’ to represent the sudden rapid growth of cases that can occur with a high-transmission cluster. Kernel parameters $\sigma $ and $l$ are the same across regions:
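The standard Matérn 5/2 covariance, and independent draws from the resulting zero-mean GP for each region, can be sketched with NumPy as below. The function names are hypothetical, and treating $\sigma $ as a standard deviation (so that the marginal variance is $\sigma^{2}$) is an assumption about the parameterisation.

```python
import numpy as np


def matern52(t1, t2, sigma, lengthscale):
    """Matern 5/2 covariance between two sets of time points."""
    r = np.abs(np.subtract.outer(np.asarray(t1, float), np.asarray(t2, float)))
    s = np.sqrt(5.0) * r / lengthscale
    # Assumption: sigma is a standard deviation, so the marginal variance is sigma^2.
    return sigma ** 2 * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)


def sample_errors(times, n_regions, sigma, lengthscale, seed=0):
    """Draw independent zero-mean GP error series, one row per region."""
    rng = np.random.default_rng(seed)
    cov = matern52(times, times, sigma, lengthscale)
    cov += 1e-8 * np.eye(len(times))  # small jitter for numerical stability
    return rng.multivariate_normal(np.zeros(len(times)), cov, size=n_regions)
```

Sharing `sigma` and `lengthscale` across regions while drawing each row independently mirrors the structure described above: correlation within a region over time, and none between regions.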
Components of local transmission potential
We model the rate of transmission from locally acquired cases as a combination of the time-varying mechanistic model of transmission rates ${R}_{i}^{*}(t)$, and a temporally-correlated error term ${e}^{{\epsilon}_{i}(t)}$. This structure enables inference of mechanistically interpretable parameters whilst also ensuring that statistical properties of the observed data are represented by the model. Moreover, these two parts of the model can also be interpreted in epidemiological terms as two different components of transmission rates:
Component 1 (TP) – transmission rates averaged over the whole state population, representing how macrodistancing, precautionary microbehaviours, and other factors affect the potential for widespread community transmission (${R}_{i}^{*}(t)$), and
Component 2 (C2) – the degree to which the transmission rates of the population of current active cases deviate from the average state-wide transmission rate (${e}^{{\epsilon}_{i}(t)}$).
Component 2 reflects the fact that the population of current active cases in each state at a given time will not be representative of the state-wide population, and its transmission rate may be either higher (e.g. when cases arise from a cluster in a high-transmission environment) or lower (e.g. when clusters are brought under control and cases placed in isolation).
Component 1 (TP) can therefore be interpreted as the expected rate of transmission if cases were widespread (population-representative) in the community. The product of Components 1 and 2 (${R}_{\mathrm{eff}}$) can be interpreted as the rate of transmission in the sub-population making up active cases at a given time.
Where a state has active cases in one or more clusters, the combination of these components gives the apparent rate of transmission in those clusters (${R}_{\mathrm{eff}}$), given by Equation 10. This reflects the interpretation that TP captures the population mean of a distribution over individual-level reproduction numbers, and ${R}_{\mathrm{eff}}$ is the mean of a (non-random) sample from that distribution, namely the population comprising cases at that point in time. While not used in the public health context in Australia, the epidemiological interpretation of the ${R}_{\mathrm{eff}}$ when a state has no active cases is the rate of spread expected if an index case were to occur in a random sub-population. Because the amplitude of this error term is learned from the data, this is informative as to the range of plausible rates of spread that might be expected from a case being introduced into a random sub-population. However, the mean of this distribution, TP, may play a similar role and has proven to be a more interpretable quantity for end users of this model.
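The multiplicative relationship between the two components can be illustrated with a short sketch. The function name is hypothetical, and the code shows only the product of Components 1 and 2 as described in the text; any additional terms in Equation 10 are not represented here.

```python
import numpy as np


def combine_components(tp, epsilon):
    """Return (C2, Reff) given transmission potential and the log-scale error term."""
    tp = np.asarray(tp, dtype=float)
    c2 = np.exp(np.asarray(epsilon, dtype=float))  # Component 2: deviation from the state-wide average
    return c2, tp * c2  # Reff is the product of Components 1 and 2
```

An error term of zero gives C2 = 1 and Reff equal to TP; a positive error term (for example, a cluster in a high-transmission setting) gives Reff above TP, and a negative one gives Reff below TP.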
Parameter values and prior distributions
The parameters of the generation interval distribution are the posterior mean parameter estimates corresponding to a lognormal distribution over the serial interval estimated by Nishiura et al., 2020. The shape of the generation interval distribution for SARSCoV2 in comparable populations is not well understood, and a number of alternative distributions have been suggested by other analyses. A sensitivity analysis performed by running the model with alternative generation interval distributions (not presented here) showed that parameter estimates were fairly consistent between these scenarios, and the main findings were unaffected. A full, formal analysis of sensitivity to this and other assumptions will be presented in a future publication.
No ancillary data are available to inform $p$, the probability of transmission per hour of contact in the absence of distancing behaviour. However, at $t=0$, holding $H{C}_{0}$, $N{C}_{0}$, $H{D}_{0}$, and $N{D}_{0}$ constant, there is a deterministic relationship between $p$ and ${R}_{i}^{*}(0)$ (the basic reproduction number, which is the same for all states). The parameter $p$ is therefore identifiable from transmission rates at the beginning of the first epidemic wave in Australia. We define a prior on $p$ that corresponds to a prior over ${R}_{i}^{*}(0)$ matching the averages of the posterior means and 95% credible intervals for 11 European countries as estimated by Flaxman et al., 2020 in a sensitivity analysis where the mean generation interval was 5 days, similar to the serial interval distribution assumed here. This corresponds to a prior mean of 2.79, and a standard deviation of 1.70 for ${R}_{i}^{*}(0)$. This prior distribution over $p$ was determined by a Monte Carlo moment-matching algorithm, integrating over the prior values for $H{C}_{0}$, $N{C}_{0}$, $H{D}_{0}$, and $N{D}_{0}$.
Model fitting
We fitted (separate) models of ${c}_{i}(t)$ and $N{C}_{0}{\delta}_{i}(t)$ to survey data alone in order to infer trends in those parameters as informed by survey data. These are shown in Figure 3. We used the posterior means of each of these model outputs as inputs into the ${R}_{\mathrm{eff}}$ model. The posterior variance of each of these quantities is largely consistent over time and between states, and the absolute effect of each is scaled by other parameters (e.g. $\beta $), meaning that uncertainty in these quantities is largely not identifiable from uncertainty in other scaling parameters. As a consequence, propagation of uncertainty in these parameters into the ${R}_{\mathrm{eff}}$ model (as was performed in a previous iteration of the model) has little impact on estimates of ${R}_{\mathrm{eff}}$ and transmission potential, so is avoided for computational efficiency.
Inference was performed by Hamiltonian Monte Carlo using the R packages greta and greta.gp (Golding, 2019; Golding, 2020). Posterior samples of model parameters were generated by 10 independent chains of a Hamiltonian Monte Carlo sampler, each run for 1000 iterations after an initial, discarded, ‘warm-up’ period (1000 iterations per chain) during which the sampler step size and diagonal mass matrix were tuned, and the regions of highest density located. Convergence was assessed by visual assessment of chains, ensuring that the potential scale reduction factor for all parameters had values less than 1.1, and that there were at least 1000 effective samples for each parameter.
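The potential scale reduction factor used as a convergence criterion can be computed for a single parameter as sketched below. This is the classic (non-split) Gelman-Rubin statistic written in NumPy for illustration; which variant the authors' software computes is not stated, and the function name is hypothetical.

```python
import numpy as np


def potential_scale_reduction(chains):
    """Gelman-Rubin R-hat for one parameter; chains has shape (n_chains, n_iterations)."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()     # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)  # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled estimate of posterior variance
    return float(np.sqrt(pooled / within))
```

Values close to 1 indicate that the chains agree with one another; chains that are stuck in different regions inflate the between-chain variance and push the statistic above the 1.1 threshold.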
Visual posterior predictive checks were performed to ensure that the observed data were consistent with the posterior predictive density over all cases (and survey results), and over timevarying case predictions within each state.
Code availability
Model code for performing the analyses and generating the figures is available at: https://github.com/goldingn/covid19_australia_interventions (copy archived at swh:1:rev:9fe78353a2ee6ab9c3b9ed35c1feea6935af769a; Golding, 2023).
Data availability
Datasets analysed and generated during this study are available at the following link: https://doi.org/10.26188/19517986.v1. For estimates of the time-varying effective reproduction number and transmission potential (Figure 2), the complete line-listed data within the Australian national COVID-19 database are not publicly available. However, we provide the cases per day by notification date and state (Data files 1 and 2). When these are supplemented with the estimated distribution of the delay from symptom onset to notification, as in Figure 3D and H (provided in Data files 3 and 4), and with Data files 5–10, analyses of the time-varying effective reproduction number and transmission potential can be performed. Data files 5–10 contain the numerical data, output from each of the model components, used to generate Figure 3. For access to the raw data, a request must be submitted via NNDSS.datarequests@health.gov.au, which will be assessed by a data committee.

figshare. Data files to support manuscript: A modelling approach to estimate the transmissibility of SARS-CoV-2 during periods of high, low and zero case incidence. https://doi.org/10.26188/19517986.v1
References

COVID-19, Australia: epidemiology report 12. Communicable Diseases Intelligence 24:44. https://doi.org/10.33321/cdi.2020.44.36

COVID-19, Australia: epidemiology report 17. Communicable Diseases Intelligence 24:44. https://doi.org/10.33321/cdi.2020.44.51

COVID-19, Australia: epidemiology report 47. Communicable Diseases Intelligence 45:47.

A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal of Epidemiology 178:1505–1512. https://doi.org/10.1093/aje/kwt133

Public attitudes, behaviors, and beliefs related to COVID-19, stay-at-home orders, nonessential business closures, and public health guidance - United States, New York City, and Los Angeles, May 5–12, 2020. MMWR. Morbidity and Mortality Weekly Report 69:751–758. https://doi.org/10.15585/mmwr.mm6924e1

Greta: simple and scalable statistical modelling in R. Journal of Open Source Software 4:1601. https://doi.org/10.21105/joss.01601

Software: Covid19_australia_interventions, version swh:1:rev:9fe78353a2ee6ab9c3b9ed35c1feea6935af769a. Software Heritage.

Practical considerations for measuring the effective reproductive number, Rt. PLOS Computational Biology 16:e1008409. https://doi.org/10.1371/journal.pcbi.1008409

Serial interval of novel coronavirus (COVID-19) infections. International Journal of Infectious Diseases 93:284–286. https://doi.org/10.1016/j.ijid.2020.02.060

Projecting social contact matrices in 152 countries using contact surveys and demographic data. PLOS Computational Biology 13:e1005697. https://doi.org/10.1371/journal.pcbi.1005697

Potential lessons from the Taiwan and New Zealand health responses to the COVID-19 pandemic. The Lancet Regional Health. Western Pacific 4:100044. https://doi.org/10.1016/j.lanwpc.2020.100044

Report: Commentary: The Delta variant has upended the East Asia COVID-19 model. Channel News Asia.
Decision letter

Caroline Colijn, Reviewing Editor; Simon Fraser University, Canada

Eduardo Franco, Senior Editor; McGill University, Canada

Michael Plank, Reviewer; University of Canterbury, New Zealand

Amy Hurford, Reviewer; Memorial University of Newfoundland, Canada
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "A modelling approach to estimate the transmissibility of SARSCoV2 during periods of high, low, and zero case incidence" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Michael Plank (Reviewer #1); Amy Hurford (Reviewer #2).
As is customary in eLife, the reviewers have discussed their critiques with one another. What follows below is the Reviewing Editor's edited compilation of the essential and ancillary points provided by reviewers in their critiques and in their interaction postreview. Please submit a revised version that addresses these concerns directly. Although we expect that you will address these comments in your response letter, we also need to see the corresponding revision clearly marked in the text of the manuscript. Some of the reviewers' comments may seem to be simple queries or challenges that do not prompt revisions to the text. Please keep in mind, however, that readers may have the same perspective as the reviewers. Therefore, it is essential that you attempt to amend or expand the text to clarify the narrative accordingly.
Essential revisions:
We all agree that this work is of interest and is likely to be suitable for publication in eLife.
1) Most of the reviewer questions are clarifications. Could you please address these more detailed comments?
2) We had some discussion about the terminology re: microdistancing, whether it's really about distancing, mask use, ventilation etc. In particular, one reviewer noted that the focus on 1.5 m distancing should be better justified given what we know about airborne transmission – presumably, this survey question should be interpreted as a proxy for precautionary microbehaviour (which may include mask use and a preference for outdoor or well-ventilated locations) rather than a mechanistic metric for transmission risk, which we know is not primarily determined by distance.
Would you consider potential alternative terminology to make it clear it's about more than just distance per se? E.g. some modelling groups use the term "precautionary behaviour" for a similar inferred quantity.
We are aware, though, that these terms may be established in Australia, and we don't wish to introduce more confusion.
(3) A clear statement of the minimal data requirements or implications of implementing the framework with reduced data availability would be a helpful addition.
Reviewer #1 (Recommendations for the authors):
For the VIC second wave, the observation that Reff>TP is explained as being due to the nature of the subpopulation the virus was predominantly spread in. That's certainly plausible epidemiologically. However another interpretation is presumably that the TP model is systematically underestimating TP. Can this be ruled out based on the result of the model?
The model description in the supplementary left me unclear about how mobility data and survey data were combined to estimate macrodistancing behaviour. This part of the model could do with some clarification. E.g. Were these used simultaneously or sequentially? From Equation 16 and line 630 it appears there is a deterministic relation between mobility data M(t) and number of nonhousehold contacts δ(t). Were the survey data used in the process of estimating the coefficients m? Presumably the mapping between survey data and mobility data is noisy – is there an implicit noise term or residual in equation 16? Line 433 says waning in macrodistancing is driven by mobility data – so how does survey data come into estimating this? In Figure S5, it appears that the posterior estimates for macrodistancing for NSW and QLD are systematically higher than the survey data – is this because the mobility data is pulling these estimates back up, i.e. mobility data in these states tend to be closer to baseline, for the same number of reported mean nonhousehold contacts?
Similarly regarding the microdistancing model – Line 448 – "infer the date of peak microdistancing behaviour" and the rate of waning. Does this mean microdistancing is assumed to follow some parametric functional form with respect to time? Or otherwise constrained to have a single peak followed by a waning phase? How does this relate to Equation 21, where microdistancing appears to be a function solely of the "intervention state" in that region. So wouldn't the timing of interventions in different regions determine when the peak microdistancing occurred? And how does/would this assumption work if the survey data showed that in fact actual behaviour was not highly correlated with interventions (which it may have been in March 2020 but could conceivably become less so over time). How were the xi_ij parameters in (21) estimated (there doesn't seem to be a prior specified for them)? The microdistancing behaviour is described as mainly relating to observation of the "1.5m distancing rule". Another significant behaviour factor affecting probability of transmission during a nonhousehold contact is mask use yet this is not mentioned. Is there a reason this wasn't considered in the model, e.g. the survey did not ask about masks? or is it that the effect of masks could not be separately estimated?
Other specific comments
– Is transmission/ travel between different states accounted for or ignored?
– The references to the panels of Figure 2 seem to have got mixed up as the text consistently refers to panels B and F as TP and C and G as Reff but the Figure has them the other way round.
– Line 174 "Reff dropped below 1… prior to activation of stayathome restrictions". The date at which Reff dropped below 1 presumably could be given a confidence interval based on the green bands in Figure 2. Do these CIs overlap the date of introduction of stay-at-home restrictions? I also wonder how sensitive these inferred dates are to the infection-to-reporting lag and the degree of smoothness that is a priori imposed on Reff(t).
– Figure 3 caption mentions the blue bar but this seems to be missing from the graphs. Also it is a bit confusing that both the macrodistancing graphs (B and F) and microdistancing (C and G) are described "reduction in…" when distancing behaviour appears to be associated with a decrease in B and F but an increase in C and G.
– There appears to be some notational inconsistency or at least ambiguity in the supplementary section. E.g. is TP synonymous with R*(t) in Equation 10? Is Ri^L(t) (or some combination of R^L and R^O) in Equation 6 the same as Reff(t)? Clarifying this would help understand how the method actually estimated these quantities from the data.
– In Equation 10 sigma2 appears to have the effect of always reducing RL (effective reproduction number?) relative to R* (TP). Or do the epsilons have strictly positive mean? In the absence of random effects (epsilon=0) wouldn't you expect R* and RL to have the same mean?
– In Equation (12) is surveillance assumed to have the same effect on household and nonhousehold transmission? If so is that realistic given it's hard to isolate from people at home?
– In Equation (14) f is described as a probability density but I think it must actually be a survival function (or 1 − CDF) i.e. f(t') = P(case not yet isolated at time t' after infection). Otherwise Equation (14) can't be correct, e.g. if all cases were isolated on day t'=5 that would say g(t')=0 for t'!=5 rather than g(t')=g*(t') for t'<5 and 0 for t'>5.
– Line 636 should the element corresponding to residential have value −1 not 1 so it is opposite sign to the non-residential locations? And are the 5 coefficients in w constrained to be non-negative?
– Paragraph following line 681 – is there a reason only the "always" response was used?
– Line 748 – is estimating p from transmission rates at the beginning of the 1st wave representative of subsequent times? Given the concentration in overseas arrivals one might expect this to be different? Or is it because it's per hour of contact so number of contacts are factored out?
– Figure S8 – it appears there is no change in effect at the second intervention why is this? And the black dotted line mentioned in the caption does not seem to be there.
Reviewer #2 (Recommendations for the authors):
This is valuable work that fills an important need – thank you for doing this!
Future work may consider how to make this approach more accessible by outlining minimum data needs, data collection priorities, or describing the implications of proceeding with the transmission potential calculation even if a data source is unavailable (i.e. for estimation of microdistancing). Your approach is rigorous, but also more similar to a complete model of infection spread, rather than a quick calculation of a summary statistic during a realtime pandemic response in a region with few modellers and limited resources (i.e. the Pacific Islands, or Atlantic Canada and Canada's northern territories, with much fewer resources than Australia, but that still had an important need for this approach during the pandemic).
In Table 1, regarding the heading "Local elimination", perhaps "Local transmission" is more appropriate since fundamentally this column discusses local transmission, and elimination is nonessential.
https://doi.org/10.7554/eLife.78089.sa1
Author response
Essential revisions:
We all agree that this work is of interest and is likely to be suitable for publication in eLife.
1) Most of the reviewer questions are clarifications. Could you please address these more detailed comments?
We respond to each of these clarifications separately below.
2) We had some discussion about the terminology re: microdistancing, whether it's really about distancing, mask use, ventilation etc. In particular, one reviewer noted that the focus on 1.5m distancing should be better justified given what we know about airborne transmission – presumably, this survey question should be interpreted as a proxy for precautionary microbehaviour (which may include mask use, preference for outdoor or well ventilated locations,) rather than a mechanistic metric for transmission risk which we know is not primarily determined by distance.
Would you consider potential alternative terminology to make it clear it's about more than just distance per se? E.g. some modelling groups use the term "precautionary behaviour" for a similar inferred quantity.
We are aware, though, that these terms may be established in Australia, and we don't wish to introduce more confusion.
We agree with these important points. The modelling framework uses adherence to the 1.5 m rule as a proxy for all behaviours (other than reducing the number of contacts, test-seeking etc.) that may influence transmission, and so is intended to capture the use of masks and preference for outdoor meetings. Adherence to the 1.5 m rule was a convenient metric of these risk-avoidance behaviours as this has been consistent public health advice since the beginning of the pandemic in Australia, enabling us to track this metric over time for the entire duration. Comparison between adherence to the 1.5 m rule and mask-wearing has indicated that they tend to follow the same pattern, i.e. increasing in response to lockdown-type restrictions and spikes in case counts. We have added the following wording to the manuscript to clarify this point (lines 100–108):
The modelling framework uses adherence to the 1.5 metre rule as a proxy for all behaviours (other than those reducing the number of contacts) that may influence transmission, and so is intended to capture the use of masks, preference for outdoor gatherings, and hand hygiene, among other factors. The 1.5 metre rule was a suitable proxy because it was consistent public health advice throughout the analysis period and timeseries data were available to track adherence to this metric over time.
Furthermore, we have adjusted our terminology to “precautionary microbehaviour” throughout the manuscript to improve clarity. However, we note on lines (XX) that the term “microdistancing” has been used for Australian reporting purposes.
(3) A clear statement of the minimal data requirements or implications of implementing the framework with reduced data availability would be a helpful addition.
The framework we describe here was iteratively developed throughout the pandemic in Australia, in order to synthesise available data relating to the transmission process and to address situation-specific questions. The framework is therefore inherently modular, and could be modified to incorporate or remove time series of relevant quantities (e.g. non-household contact rates, adherence to precautionary microbehaviour, effectiveness of surveillance), as available. For its use in Australia in 2020, non-household contact rates (capturing the main effects of lockdown-type measures) and precautionary microbehaviour were likely the most important aspects. However, in other times and places with different drivers of epidemic dynamics, other factors may be more important. The variables that are important to quantify and include should therefore be chosen accordingly. We have added wording to the discussion to make these points (lines 349–359):
“The requirement for specific data streams is a limitation of our approach routinely applied in Australia in 2020, where it was developed to address situation-specific policy questions and synthesise available data relating to the transmission process. However, the framework is modular and could be adjusted to incorporate or remove time series of relevant quantities (e.g., non-household contact rates, adherence to precautionary microbehaviour, effectiveness of surveillance), according to data availability, epidemiological relevance, and policy needs. For its use in Australia in 2020, non-household contact rates (capturing the main effects of stay-at-home measures) and precautionary microbehaviour were considered the most important (and measurable) drivers of epidemic dynamics. In other times and places (or for other diseases), different factors may be more important for monitoring epidemic dynamics, and the variables that are quantified should be chosen accordingly.”
Reviewer #1 (Recommendations for the authors):
For the VIC second wave, the observation that Reff>TP is explained as being due to the nature of the subpopulation the virus was predominantly spread in. That's certainly plausible epidemiologically. However another interpretation is presumably that the TP model is systematically underestimating TP. Can this be ruled out based on the result of the model?
This is an important point. The model is designed such that, on average over the long term, Reff is approximately equal to TP. This does not preclude the possibility that modelled TP systematically underestimates the ‘true’ TP. Because TP is a theoretical, rather than a directly measurable, quantity, it was not possible to quantitatively validate this part of the model in the epidemiological situation in Australia in 2020, since there was no widespread transmission in any jurisdiction. We have added to existing text in the Discussion to clarify this (lines 369–372).
Existing text: “While the patterns of TP, Reff and C2 observed over time in Australia are consistent with “in field” epidemiological assessments, and while the methods have demonstrated impact in supporting decision making, a direct quantification of the validity of the TP is not straightforward. For example, whether self-reported adherence to the 1.5 m rule is a reliable covariate for change in the per contact probability of transmission over time is difficult to assess. If transmission were to become widespread in Australia, and therefore cases become more representative of the general population rather than specific subsets, Reff and TP estimates would be expected to converge. However in the absence of such a natural experiment, no ground truth for this unobserved parameter exists with which to quantitatively validate the model calibration.”
New text: “During the Victorian second wave, while Reff > TP is consistent with virus spread in subpopulations with higher-than-population-average rates of social contact, which was supported by other epidemiological assessments, we cannot rule out that the modelled TP was systematically underestimating the ‘true’ TP over this period.”
The model description in the supplementary left me unclear about how mobility data and survey data were combined to estimate macrodistancing behaviour. This part of the model could do with some clarification. E.g. Were these used simultaneously or sequentially? From Equation 16 and line 630 it appears there is a deterministic relation between mobility data M(t) and number of nonhousehold contacts δ(t). Were the survey data used in the process of estimating the coefficients m? Presumably the mapping between survey data and mobility data is noisy – is there an implicit noise term or residual in equation 16?
Thank you for highlighting this unclear wording. We have edited the text to clarify that the models are used sequentially: non-household contact rates are modelled first and the prediction is then used in the TP/Reff model; and that uncertainty in the non-household contact rates is not explicitly propagated into the TP/Reff model, because this uncertainty is absorbed by other parameters in the TP/Reff model. We initially fitted these models simultaneously, enabling full propagation of uncertainty; however, this model fitting became computationally infeasible as the time-series grew and provided no material benefit. We believe this information would be of interest to other modellers, and so have described it in more detail on lines 725–735 as follows:
“We incorporate mobility data into transmission potential in a two-stage process. In the first stage, non-household contact rates are modelled using mobility and survey data. The posterior mean of the modelled non-household contact rate in each jurisdiction over time is then incorporated in the transmission potential model as a fixed (i.e. ‘data’) time-series without propagation of posterior uncertainty. Uncertainty in the macro-distancing model could be propagated through to the TP model by estimating both parts in a single joint model. However, this would be computationally very burdensome, and long run times would reduce the utility of the transmission potential model for routine situational assessment. Moreover, because uncertainty in both the macro-distancing and transmission potential time-series is homoscedastic (the posterior variance is more or less constant over time in each state), propagation of the uncertainty in the macro-distancing model is unlikely to have a material effect on estimation of TP time-series.”
Line 433 says waning in macrodistancing is driven by mobility data – so how does survey data come into estimating this? In Figure S5, it appears that the posterior estimates for macrodistancing for NSW and QLD are systematically higher than the survey data – is this because the mobility data is pulling these estimates back up, i.e. mobility data in these states tend to be closer to baseline, for the same number of reported mean nonhousehold contacts?
We have edited the explanation of the macrodistancing model to clarify how it is fitted to the survey data. We also note that the bars indicating pointwise estimates of nonhousehold contact rates are not in fact the raw data, but the output of a different model. The reason for this is that empirical averages (or other nonmodelbased summaries) of nonhousehold contact numbers in surveys are highly skewed and volatile: while most respondents have very few contacts, occasionally respondents report hundreds of contacts. Since these responses occur in some weeks and not others, weekbyweek estimates can fluctuate wildly. Model fitting employs a specific likelihood model to account for these data, but this makes visual comparison of data and model fit very difficult. The pointwise estimates are included to indicate data sparsity and uncertainty (being larger in jurisdictions with fewer samples), rather than fit to data. The new wording is as follows (lines 462–464):
“Waning in macrodistancing behaviour is therefore driven by Google mobility data (calibrated to survey data on nonhousehold contact rates) on increasing time spent in each of the different types of locations since the peak of macrodistancing behaviour.”
The following has been added to the Figure S5 legend:
“The pointwise estimates for each survey round represented as black lines and grey rectangles are the outputs of a separate statistical model that does not include the mobility data covariates, and is intended as a visual illustration of the level of data sparsity and variability, rather than a way of estimating fit to data, since the raw data for each week is subject to significant skew.”
Similarly regarding the microdistancing model – Line 448 – "infer the date of peak microdistancing behaviour" and the rate of waning. Does this mean microdistancing is assumed to follow some parametric functional form with respect to time? Or otherwise constrained to have a single peak followed by a waning phase? How does this relate to Equation 21, where microdistancing appears to be a function solely of the "intervention state" in that region. So wouldn't the timing of interventions in different regions determine when the peak microdistancing occurred? And how does/would this assumption work if the survey data showed that in fact actual behaviour was not highly correlated with interventions (which it may have been in March 2020 but could conceivably become less so over time). How were the xi_ij parameters in (21) estimated (there doesn't seem to be a prior specified for them)? The microdistancing behaviour is described as mainly relating to observation of the "1.5m distancing rule".
Apologies for this confusing text, this was accidentally copied over from a report regarding an earlier iteration of this model that used an alternative model with the aim of predicting the peak of microdistancing. For the version of the TP model framework presented here, we used Generalised Additive Models to model microdistancing behaviour, as detailed in the Methods section. We have deleted the confusing final sentence of this paragraph.
Another significant behaviour factor affecting probability of transmission during a nonhousehold contact is mask use yet this is not mentioned. Is there a reason this wasn't considered in the model, e.g. the survey did not ask about masks? or is it that the effect of masks could not be separately estimated?
As detailed and clarified above, we included the 1.5m rule as a metric of precautionary microbehaviour because data on adherence was available for the duration of the pandemic, and it is likely to be correlated with other similar behaviours like mask wearing. Because maskwearing was not initially a part of the official health advice in Australia, data on adherence was not collected in the early stages of the pandemic. This dataset is therefore not ideally suited to distinguishing the marginal effect on transmission of maskwearing, from the effect of the broader suite of precautionary microbehaviours.
Other specific comments
– Is transmission/ travel between different states accounted for or ignored?
Each jurisdiction is modelled as a separate epidemic. Travel is only considered between states when accounting for the place of acquisition of cases. Cases acquired interstate are considered as “imported cases” within the modelling framework, that is, they do not arise from locally acquired cases but can contribute to onward local transmission. A description of how imported versus locallyacquired cases are handled within the modelling framework is provided in equations 18 in the Methods section. Given Australia’s unique geography (the majority of Australians live in a handful of major cities, with comparatively little movement between them), and the implementation of interstate travel restrictions during periods of transmission, the number of interstate importations in Australia was small, and well documented in the data. This may not be the case in other settings.
We have added the following to the text to make this clearer, under a new subheading
“Accounting for the impact of interstate-acquired infections” (lines 501–510):
“Each of Australia's eight states and territories was modelled as a separate epidemic, with no travel assumed between jurisdictions and interstate-acquired cases handled as “imported cases” within the modelling framework (but contributing to the case counts in their jurisdiction of origin for the model likelihood). We believe that these modelling decisions were reasonable for the Australian context given Australia's unique geography (the majority of Australians live in a handful of major cities, with comparatively little movement between them), and the imposition of interstate travel restrictions during periods of COVID-19 transmission over the analysis period. Furthermore, the number of interstate importations in Australia was small and well documented in the data. Unlike overseas-acquired cases, interstate-acquired cases are assumed to contribute to onward local transmission since they were not required to quarantine.”
– The references to the panels of Figure 2 seem to have got mixed up as the text consistently refers to panels B and F as TP and C and G as Reff but the Figure has them the other way round.
Thank you. We have corrected the text referring to Figure 2.
– Line 174 "Reff dropped below 1… prior to activation of stay-at-home restrictions". The date at which Reff dropped below 1 presumably could be given a confidence interval based on the green bands in Figure 2. Do these CIs overlap the date of introduction of stay-at-home restrictions? I also wonder how sensitive these inferred dates are to the infection-to-reporting lag and the degree of smoothness that is a priori imposed on Reff(t).
This is an important point. The confidence intervals do not overlap with the date of introduction of stay-at-home restrictions. Our estimates of Reff account for the lag from infection to reporting, and any impact of statistical smoothing would have shifted this segment of the Reff time-series to the right, i.e. closer to the date of imposition of stay-at-home restrictions. We have edited the sentence to improve clarity, now reporting the date on which the upper confidence interval crossed 1 (instead of the median) and reporting the date that the stay-at-home restrictions were imposed (8 days later) (lines 181–183):
“Our method, with its ability to distinguish between import-to-local and local-to-local transmission, estimates that the local Reff dropped below 1 on 22 March (upper confidence intervals) in both Victoria and New South Wales – prior to the activation of stay-at-home restrictions on 30 March.”
– Figure 3 caption mentions the blue bar but this seems to be missing from the graphs. Also it is a bit confusing that both the macrodistancing graphs (B and F) and microdistancing (C and G) are described "reduction in…" when distancing behaviour appears to be associated with a decrease in B and F but an increase in C and G.
Thank you for noting that the blue bar is missing from Figure 3. We have now corrected this. To further aid in interpretation of Figures 2 and 3, we have also added a column to Table S1, “Label”, where we include the letter associated with each vertical line in Figures 2 and 3.
– There appears to be some notational inconsistency or at least ambiguity in the supplementary section. E.g. is TP synonymous with R*(t) in Equation 10? Is Ri^L(t) (or some combination of R^L and R^O) in Equation 6 the same as Reff(t)? Clarifying this would help understand how the method actually estimated these quantities from the data.
That is correct. R(t) denotes Reff, with R^L(t) and R^O(t) respectively the Reffs of locallyacquired and overseas acquired infections. R*(t) is transmission potential. We have edited the text immediately below Equation 10 to make this clearer:
“For both locallyacquired and overseasacquired infections, the effective reproduction number depends on the transmission potential R_i*(t). R_i*(t) is given by a deterministic epidemiological model of populationwide transmission potential that considers the effects of distancing behaviours.”
– In Equation 10 sigma2 appears to have the effect of always reducing RL (effective reproduction number?) relative to R* (TP). Or do the epsilons have strictly positive mean? In the absence of random effects (epsilon=0) wouldn't you expect R* and RL to have the same mean?
In reviewing this we spotted a typo in this equation. The correct equation should have the σ^2 term divided by 2, and this has been corrected. The correct version of the equation produces R^L that is drawn from a lognormal distribution with mean R*. We provide the working here to clarify our following response, but omit this working in the manuscript, for brevity:
R^L is lognormally distributed with parameters mu and σ:
log(R^L) ~ N(mu, σ^2)
which is equivalent in distribution (via affine transformation of a standard normal) to:
R^L = exp(mu + epsilon), where epsilon ~ N(0, σ^2) has variance σ^2 (as stated in the manuscript).
The mean of a lognormal distribution is given as:
mean(R^L) = exp(mu + σ^2 / 2)
So setting the mean to R* and solving for mu, we obtain:
mu = log(R*) - σ^2 / 2
And the full equation 10 is:
R^L = exp(log(R*) - σ^2 / 2 + epsilon)
This means that a priori we expect R^L to have marginal long-term mean R*. That is, if we were to simulate from the priors, conditional on R*, the time-series trends for R^L (Reff) would go up and down at random, but over time would follow a lognormal distribution with long-term mean R*. Therefore it does not follow that we would expect R^L = R* in the absence of the random variates (epsilon = 0), since the lognormal distribution is asymmetric. If epsilon = 0, then R^L would be at the median of the lognormal distribution we intend, which for the lognormal distribution is always lower than the mean.
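As an informal numerical check of this working, the following Python sketch simulates R^L = exp(log(R*) - σ^2 / 2 + epsilon) and compares the sample mean and median with R*. The values of R* and σ, the number of draws, and the function name simulate_r_local are illustrative assumptions; they are not taken from the manuscript or its code.

```python
import math
import random
import statistics


def simulate_r_local(r_star, sigma, n_draws, seed=0):
    """Draw R^L = exp(log(R*) - sigma^2 / 2 + epsilon), epsilon ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    mu = math.log(r_star) - sigma ** 2 / 2  # shift that makes the mean equal R*
    return [math.exp(mu + rng.gauss(0.0, sigma)) for _ in range(n_draws)]


# Illustrative values only (assumptions, not estimates from the paper).
r_star, sigma = 1.2, 0.5
draws = simulate_r_local(r_star, sigma, n_draws=200_000)

sample_mean = statistics.fmean(draws)
sample_median = statistics.median(draws)

# Setting epsilon = 0 gives the median of the lognormal, not its mean.
analytic_median = r_star * math.exp(-sigma ** 2 / 2)

print(f"sample mean   = {sample_mean:.3f} (target R* = {r_star})")
print(f"sample median = {sample_median:.3f} (analytic = {analytic_median:.3f})")
```

Under these assumptions the sample mean should sit close to R*, while the median (the value obtained with epsilon = 0) should sit below it, which is the asymmetry described above.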
We believe the text below this equation is accurate in detailing this relationship, but we recognise that the interpretation of this temporal random effect is different from most uses of temporal random effects in statistical modelling, where the effect mops up ‘error’ but does not have a mechanistic interpretation in terms of the distribution of a sample. We have therefore added the following text on lines 588–594 to make this point:
“Note that in this model the random effects term $\epsilon_i$ and its variance term $\sigma^2$ are intended to have a mechanistic interpretation as the stochasticity due to random sampling (of people currently infected from the total population). It is not incorporated to account for error in specification of the transmission potential in the way that temporal random effects are commonly used in statistical modelling. Consequently, small variance in the time-series plots of $\epsilon_i$ is not indicative of good fit, but of a large number of infections; as the size of the sample increases, the variance of the mean decreases.”
– In Equation (12) is surveillance assumed to have the same effect on household and nonhousehold transmission? If so is that realistic given it's hard to isolate from people at home?
Yes, that is correct. Throughout the pandemic, government isolation advice in Australia explicitly outlined the need to isolate from others within the same household. However it is possible that isolation reduced withinhousehold transmission less than betweenhousehold transmission. We are not aware of any detailed longitudinal household data for Australia that could be used to estimate the effectiveness of isolation within households in order to confirm or quantify this.
– In Equation (14) f is described as a probability density but I think it must actually be a survival function (or 1 - CDF) i.e. f(t') = P(case not yet isolated at time t' after infection). Otherwise Equation (14) can't be correct, e.g. if all cases were isolated on day t'=5 that would say g(t')=0 for t'!=5 rather than g(t')=g*(t') for t'<5 and 0 for t'>5.
You are correct, it should say that this is the survival function of the distribution. The wording has been updated to make this clear on line 648:
“We model both of these functions using a region- and time-varying estimate of the survival function (one minus the cumulative density function) f_i(t, t’) of the discrete probability distribution over times from infection to detection:”
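The distinction can be illustrated with a minimal Python sketch of the reviewer's example, in which every case is isolated exactly five days after infection. It assumes that Equation 14 scales an unmitigated infectiousness profile g*(t') by f(t'), as the reviewer's comment implies. The helper name survival_from_pmf, the values in g_star, and the convention that a case detected on day t' no longer transmits on that day are hypothetical choices made for illustration.

```python
def survival_from_pmf(pmf):
    """Return S with S[t] = 1 - CDF(t) = P(detection delay > t days)."""
    survival, cumulative = [], 0.0
    for p in pmf:
        cumulative += p
        survival.append(max(0.0, 1.0 - cumulative))  # clamp rounding error
    return survival


# Hypothetical unmitigated infectiousness profile g*(t') for t' = 0..7 days.
g_star = [0.05, 0.15, 0.25, 0.20, 0.15, 0.10, 0.06, 0.04]

# Reviewer's example: every case is isolated exactly 5 days after infection.
detection_pmf = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

survival = survival_from_pmf(detection_pmf)

# Scaling by the survival function keeps all transmission before isolation.
g_survival = [g * s for g, s in zip(g_star, survival)]

# Scaling by the probability mass instead would keep day 5 only.
g_density = [g * p for g, p in zip(g_star, detection_pmf)]

print("survival       :", survival)
print("g using 1 - CDF:", g_survival)
print("g using pmf    :", g_density)
```

With the survival function, infectiousness before day 5 is retained and later days are zeroed; with the probability mass, only day 5 survives, which is the inconsistency the reviewer identified.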
– Line 636 should the element corresponding to residential have value -1 not 1 so it is opposite sign to the non-residential locations? And are the 5 coefficients in w constrained to be non-negative?
Yes, that is correct. The minus sign was lost somewhere in formatting. This has now been corrected.
– Paragraph following line 681 – is there a reason only the "always" response was used?
This choice was largely arbitrary, since the aim was to capture temporal patterns in broader precautionary microbehaviour, rather than an absolute value. Temporal patterns for the proportion giving the ‘always’ response were very similar to those using other thresholds.
– Line 748 – is estimating p from transmission rates at the beginning of the 1st wave representative of subsequent times? Given the concentration in overseas arrivals one might expect this to be different? Or is it because it's per hour of contact so number of contacts are factored out?
By separately modelling the rates of transmission from overseas arrivals and local transmission, and by accounting for differences in contact rates and durations, and other behaviours, the parameter p can be interpreted as a virus (and variant) specific parameter.
– Figure S8 – it appears there is no change in effect at the second intervention why is this? And the black dotted line mentioned in the caption does not seem to be there.
The model has the ability to estimate the differences between these periods, with exponential priors that imply a priori that small changes are more likely than big ones. The fact that the posterior for the difference at the second intervention is essentially zero probably reflects the inference procedure selecting a more parsimonious parameter estimate. We cannot think of a particular epidemiological reason to support this finding. This part of the model is intended to account for changing importation effects in the early part of the pandemic in Australia, rather than to estimate local transmission rates. These parameters are of lesser interest to policymakers and are fitted to limited data. We would therefore advise not interpreting these particular parameters too closely. We have added the following wording to express this on lines 497–500:
“Note that this part of the model is intended to capture broad changes in the contribution of importation to case numbers, and is not intended to provide reliable inferences about the relative contributions of different border quarantine policies to disease importation.”
We have also deleted the sentence about the dotted line, which is not visible given the low y-axis.
Reviewer #2 (Recommendations for the authors):
This is valuable work that fills an important need – thank you for doing this!
Thank you for the positive comments.
Future work may consider how to make this approach more accessible by outlining minimum data needs, data collection priorities, or describing the implications of proceeding with the transmission potential calculation even if a data source is unavailable (i.e. for estimation of microdistancing). Your approach is rigorous, but also more similar to a complete model of infection spread, rather than a quick calculation of a summary statistic during a realtime pandemic response in a region with few modellers and limited resources (i.e. the Pacific Islands, or Atlantic Canada and Canada's northern territories, with much fewer resources than Australia, but that still had an important need for this approach during the pandemic).
We agree. A potential future extension would be modular software that would enable users to input different data sources (not all might be necessary) and lower the technical hurdles to implementation. We have added text noting this to the discussion (lines 398–404):
“These various additions and the component models of our framework (Figure 1) provide a suite of interoperable modules that could be used to apply the transmission potential modelling framework to future epidemic diseases and other settings. Enabling the broader application and uptake of these methods would be aided by the development of robust research software, with the ability to modify which modules are used, to match the data streams available to the analyst. The development of such software, and detailed description of data inputs and analysis of the value of each datastream will be the focus of future work.”
In Table 1, regarding the heading "Local elimination", perhaps "Local transmission" is more appropriate since fundamentally this column discusses local transmission, and elimination is nonessential.
This column is intended to highlight the interpretation of each metric in situations where there is community transmission (denoted “Community transmission”) versus no transmission (denoted “Local elimination”). Since the far right column is describing interpretation for situations where there is no local/community transmission, “local transmission” is not an appropriate substitute for “local elimination”. However to improve clarity, we have adjusted “local elimination” to “no transmission”.
https://doi.org/10.7554/eLife.78089.sa2
Article and author information
Author details
Funding
Australian Government
 Nick Golding
Australian Research Council (DE180100635)
 Nick Golding
National Health and Medical Research Council (GNT1170960)
 Jodie McVernon
National Health and Medical Research Council (GNT1117140)
 Jodie McVernon
National Health and Medical Research Council (2021/GNT2010051)
 Freya M Shearer
World Health Organization
 Nick Golding
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
Our analyses use surveillance data reported through the Communicable Diseases Network Australia (CDNA) as part of the nationally coordinated response to COVID19. We thank public health staff from incident emergency operations centres in state and territory health departments, and the Australian Government Department of Health, along with state and territory public health laboratories. We thank members of CDNA for their feedback and perspectives on the results of the analyses. This work was directly funded by the Australian Government Department of Health Office of Health Protection. Additional support was provided by: the Australian Research Council (NG DECRA fellowship DE180100635); the National Health and Medical Research Council of Australia through its Centres of Research Excellence (SPECTRUM, GNT1170960) and Investigator Grant Schemes (JMcV Principal Research Fellowship, GNT1117140; FMS Emerging Leader Fellowship, 2021/GNT2010051); and through a research agreement with the World Health Organisation (Health Emergency Information & Risk Assessment, Health Emergencies Programme).
Ethics
The study was undertaken as urgent public health action to support Australia’s COVID19 pandemic response. The study used data from the Australian National Notifiable Disease Surveillance System (NNDSS) provided to the Australian Government Department of Health under the National Health Security Agreement for the purposes of national communicable disease surveillance. Data from the NNDSS were supplied after deidentification to the investigator team for the purposes of provision of epidemiological advice to government. Contractual obligations established strict data protection protocols agreed between the University of Melbourne and subcontractors and the Australian Government Department of Health, with oversight and approval for use in supporting Australia’s pandemic response and for publication provided by the data custodians represented by the Communicable Diseases Network of Australia. The ethics of the use of these data for these purposes, including publication, was agreed by the Department of Health with the Communicable Diseases Network of Australia.
Senior Editor
 Eduardo Franco, McGill University, Canada
Reviewing Editor
 Caroline Colijn, Simon Fraser University, Canada
Reviewers
 Michael Plank, University of Canterbury, New Zealand
 Amy Hurford, Memorial University of Newfoundland, Canada
Publication history
 Preprint posted: November 29, 2021 (view preprint)
 Received: February 22, 2022
 Accepted: January 16, 2023
 Accepted Manuscript published: January 20, 2023 (version 1)
 Version of Record published: March 8, 2023 (version 2)
Copyright
© 2023, Golding et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Further reading

 Computational and Systems Biology
 Epidemiology and Global Health
Background: While biological age in adults is often understood as representing general health and resilience, the conceptual interpretation of accelerated biological age in children and its relationship to development remains unclear. We aimed to clarify the relationship of accelerated biological age, assessed through two established biological age indicators, telomere length and DNA methylation age, and two novel candidate biological age indicators , to child developmental outcomes, including growth and adiposity, cognition, behaviour, lung function and onset of puberty, among European schoolage children participating in the HELIX exposome cohort.
Methods: The study population included up to 1,173 children, aged between 5 and 12 years, from study centres in the UK, France, Spain, Norway, Lithuania, and Greece. Telomere length was measured through qPCR, blood DNA methylation and gene expression was measured using microarray, and proteins and metabolites were measured by a range of targeted assays. DNA methylation age was assessed using Horvath's skin and blood clock, while novel blood transcriptome and 'immunometabolic' (based on plasma protein and urinary and serum metabolite data) clocks were derived and tested in a subset of children assessed six months after the main followup visit. Associations between biological age indicators with child developmental measures as well as health risk factors were estimated using linear regression, adjusted for chronological age, sex, ethnicity and study centre. The clock derived markers were expressed as Δ age (i.e., predicted minus chronological age).
Results: Transcriptome and immunometabolic clocks predicted chronological age well in the test set (r= 0.93 and r= 0.84 respectively). Generally, weak correlations were observed, after adjustment for chronological age, between the biological age indicators. Among associations with health risk factors, higher birthweight was associated with greater immunometabolic Δ age, smoke exposure with greater DNA methylation Δ age and high family affluence with longer telomere length. Among associations with child developmental measures, all biological age markers were associated with greater BMI and fat mass, and all markers except telomere length were associated with greater height, at least at nominal significance (p<0.05). Immunometabolic Δ age was associated with better working memory (p = 4e 3) and reduced inattentiveness (p= 4e 4), while DNA methylation Δ age was associated with greater inattentiveness (p=0.03) and poorer externalizing behaviours (p= 0.01). Shorter telomere length was also associated with poorer externalizing behaviours (p=0.03).
Conclusions: In children, as in adults, biological ageing appears to be a multifaceted process and adiposity is an important correlate of accelerated biological ageing. Patterns of associations suggested that accelerated immunometabolic age may be beneficial for some aspects of child development while accelerated DNA methylation age and telomere attrition may reflect early detrimental aspects of biological ageing, apparent even in children.
Funding: UK Research and Innovation (MR/S03532X/1); European Commission (grant agreement numbers: 308333; 874583).

 Epidemiology and Global Health
Background:
Shortterm forecasts of infectious disease burden can contribute to situational awareness and aid capacity planning. Based on best practice in other fields and recent insights in infectious disease epidemiology, one can maximise the predictive performance of such forecasts if multiple models are combined into an ensemble. Here, we report on the performance of ensembles in predicting COVID19 cases and deaths across Europe between 08 March 2021 and 07 March 2022.
Methods:
We used opensource tools to develop a public European COVID19 Forecast Hub. We invited groups globally to contribute weekly forecasts for COVID19 cases and deaths reported by a standardised source for 32 countries over the next 1–4 weeks. Teams submitted forecasts from March 2021 using standardised quantiles of the predictive distribution. Each week we created an ensemble forecast, where each predictive quantile was calculated as the equallyweighted average (initially the mean and then from 26th July the median) of all individual models’ predictive quantiles. We measured the performance of each model using the relative Weighted Interval Score (WIS), comparing models’ forecast accuracy relative to all other models. We retrospectively explored alternative methods for ensemble forecasts, including weighted averages based on models’ past predictive performance.
Results:
Over 52 weeks, we collected forecasts from 48 unique models. We evaluated 29 models’ forecast scores in comparison to the ensemble model. We found that the weekly ensemble performed consistently well across countries and over time. Across all horizons and locations, the ensemble performed better on relative WIS than 83% of participating models’ forecasts of incident cases (with a total N=886 predictions from 23 unique models), and 91% of participating models’ forecasts of deaths (N=763 predictions from 20 models). Across a 1–4 week time horizon, ensemble performance declined with longer forecast periods when forecasting cases, but remained stable over 4 weeks for incident death forecasts. In every forecast across 32 countries, the ensemble outperformed most contributing models when forecasting either cases or deaths, frequently outperforming all of its individual component models. Among the ensemble methods we compared, the most influential choice, and the best-performing one, was to use the median of models rather than the mean, regardless of how component forecast models were weighted.
Conclusions:
Our results support combining forecasts from individual models into an ensemble to improve predictive performance across epidemiological targets and populations during infectious disease epidemics. Our findings further suggest that median ensemble methods yield better predictive performance than those based on means. Our findings also highlight that forecast consumers should place more weight on incident death forecasts than incident case forecasts at forecast horizons greater than 2 weeks.
Funding:
AA, BH, BL, LWa, MMa, PP, SV funded by National Institutes of Health (NIH) Grant 1R01GM109718, NSF BIG DATA Grant IIS1633028, NSF Grant No.: OAC1916805, NSF Expeditions in Computing Grant CCF1918656, CCF1917819, NSF RAPID CNS2028004, NSF RAPID OAC2027541, US Centers for Disease Control and Prevention 75D30119C05935, a grant from Google, University of Virginia Strategic Investment Fund award number SIF160, Defense Threat Reduction Agency (DTRA) under Contract No. HDTRA119D0007, and respectively Virginia Dept of Health Grant VDH215010141, VDH215010143, VDH215010147, VDH215010145, VDH215010146, VDH215010142, VDH215010148. AF, AMa, GL funded by SMIGE - Modelli statistici inferenziali per governare l'epidemia, FISR 2020Covid19 I Fase, FISR2020IP00156, Codice Progetto: PRJ0695. AM, BK, FD, FR, JK, JN, JZ, KN, MG, MR, MS, RB funded by Ministry of Science and Higher Education of Poland with grant 28/WFSN/2021 to the University of Warsaw. BRe, CPe, JLAz funded by Ministerio de Sanidad/ISCIII. BT, PG funded by PERISCOPE European H2020 project, contract number 101016233. CP, DL, EA, MC, SA funded by European Commission - Directorate-General for Communications Networks, Content and Technology through the contract LC01485746, and Ministerio de Ciencia, Innovacion y Universidades and FEDER, with the project PGC2018095456BI00. DE, MGu funded by Spanish Ministry of Health / REACTUE (FEDER). DO, GF, IMi, LC funded by Laboratory Directed Research and Development program of Los Alamos National Laboratory (LANL) under project number 20200700ER. DS, ELR, GG, NGR, NW, YW funded by National Institutes of General Medical Sciences (R35GM119582; the content is solely the responsibility of the authors and does not necessarily represent the official views of NIGMS or the National Institutes of Health). FB, FP funded by InPresa, Lombardy Region, Italy. HG, KS funded by European Centre for Disease Prevention and Control.
IV funded by Agencia de Qualitat i Avaluacio Sanitaries de Catalunya (AQuAS) through contract 2021021OE. JDe, SMo, VP funded by Netzwerk Universitatsmedizin (NUM) project egePan (01KX2021). JPB, SH, TH funded by Federal Ministry of Education and Research (BMBF; grant 05M18SIA). KH, MSc, YKh funded by Project SaxoCOV, funded by the German Free State of Saxony. Presentation of data, model results and simulations also funded by the NFDI4Health Task Force COVID-19 (https://www.nfdi4health.de/taskforcecovid192) within the framework of a DFG-project (LO342/171). LP, VE funded by Mathematical and Statistical modelling project (MUNI/A/1615/2020), Online platform for real-time monitoring, analysis and management of epidemic situations (MUNI/11/02202001/2020); VE also supported by RECETOX research infrastructure (Ministry of Education, Youth and Sports of the Czech Republic: LM2018121), the CETOCOEN EXCELLENCE (CZ.02.1.01/0.0/0.0/17043/0009632), RECETOX RI project (CZ.02.1.01/0.0/0.0/16013/0001761). NIB funded by Health Protection Research Unit (grant code NIHR200908). SAb, SF funded by Wellcome Trust (210758/Z/18/Z).