Estimating SARS-CoV-2 seroprevalence and epidemiological parameters with uncertainty from serological surveys
Abstract
Establishing how many people have been infected by SARS-CoV-2 remains an urgent priority for controlling the COVID-19 pandemic. Serological tests that identify past infection can be used to estimate cumulative incidence, but the relative accuracy and robustness of various sampling strategies have been unclear. We developed a flexible framework that integrates uncertainty from test characteristics, sample size, and heterogeneity in seroprevalence across subpopulations to compare estimates from sampling schemes. Using the same framework and making the assumption that seropositivity indicates immune protection, we propagated estimates and uncertainty through dynamical models to assess uncertainty in the epidemiological parameters needed to evaluate public health interventions and found that sampling schemes informed by demographics and contact networks outperform uniform sampling. The framework can be adapted to optimize serosurvey design given test characteristics and capacity, population demography, sampling strategy, and modeling approach, and can be tailored to support decision-making around introducing or removing interventions.
Introduction
Serological testing is a critical component of the response to COVID-19 as well as to future epidemics. Assessment of population seropositivity, a measure of the prevalence of individuals who have been infected in the past and developed antibodies to the virus, can address gaps in knowledge of the cumulative disease incidence. This is particularly important given inadequate viral diagnostic testing and incomplete understanding of the rates of mild and asymptomatic infections (Sutton et al., 2020). In this context, serological surveillance has the potential to provide information about the true number of infections, allowing for robust estimates of case and infection fatality rates (Fontanet et al., 2020) and for the parameterization of epidemiological models to evaluate the possible impacts of specific interventions and thus guide public health decision-making.
The proportion of the population that has been infected by, and recovered from, the coronavirus causing COVID-19 will be a critical measure to inform policies on a population level, including when and how social distancing interventions can be relaxed, and the prioritization of vaccines (Bubar et al., 2021). Individual serological testing may allow low-risk individuals to return to work, school, or university, contingent on the immune protection afforded by a measurable antibody response (Weitz et al., 2020; Larremore, 2020). At a population level, however, methods are urgently needed to design and interpret serological data based on testing of subpopulations, including convenience samples such as blood donors (Valenti et al., 2020; Erikstrup et al., 2021; Fontanet et al., 2020) and neonatal heel sticks, to reliably estimate population seroprevalence.
Three sources of uncertainty complicate efforts to learn population seroprevalence from subsampling. First, tests may have imperfect sensitivity and specificity, and studies that do not adjust for test imperfections will produce biased seroprevalence estimates. Complicating this issue is the fact that sensitivity and specificity are, themselves, estimated from data (Larremore and Fosdick, 2020; Gelman and Carpenter, 2020), which can lead to statistical confusion if uncertainty is not correctly propagated (Bendavid et al., 2020). Second, the population sampled will likely not be a representative random sample (Bendavid et al., 2020), especially in the first rounds of testing, when there is urgency to test using convenience samples and potentially limited serological testing capacity. Third, there is uncertainty inherent to any model-based forecast that uses the empirical estimation of seroprevalence, regardless of the quality of the test, in part because of the uncertain relationship between seropositivity and immunity (Tan et al., 2020; Ward et al., 2020).
A clear evidence-based guide to aid the design of serological studies is critical to policy makers and public health officials both for estimation of seroprevalence and forward-looking modeling efforts, particularly if serological positivity reflects immune protection. To address this need, we developed a framework that can be used to design and interpret cross-sectional serological studies, with applicability to SARS-CoV-2. Starting with results from a serological survey of a given size and age stratification, the framework incorporates the test’s sensitivity and specificity and enables estimates of population seroprevalence that include uncertainty. These estimates can then be used in models of disease spread to calculate the effective reproductive number ${R}_{\text{eff}}$, the transmission potential of SARS-CoV-2 under partial immunity, forecast disease dynamics, and assess the impact of candidate public health and clinical interventions. Similarly, starting with a prespecified tolerance for uncertainty in seroprevalence estimates, the framework can be used to optimize the sample size and allocation needed. This framework can be used in conjunction with any model, including ODE models (Kissler et al., 2020a; Weitz et al., 2020), agent-based simulations (Ferguson et al., 2020), or network simulations (St-Onge et al., 2019), and can be used to estimate ${R}_{\text{eff}}$ or to simulate transmission dynamics.
Materials and methods
Design and modeling framework
We developed a framework for the design and analysis of serosurveys in conjunction with epidemiological models (Figure 1), which can be used in two directions. In the forward direction, starting from serological data, one can estimate seroprevalence. While valuable on its own, seroprevalence can also be used as the input to an appropriate model to update forecasts or estimate the impacts of interventions. In the reverse direction, sample sizes can be calculated to yield seroprevalence estimates with a desired level of uncertainty and efficient sampling strategies can be developed based on prospective modeling tasks. The key methods include seroprevalence estimation, propagation of uncertainty through models, and model-informed sample size calculations.
Bayesian inference of seroprevalence
To integrate uncertainty arising from test sensitivity and specificity, we used a Bayesian model to produce a posterior distribution of seroprevalence that incorporates uncertainty associated with a finite sample size (Figure 1, green annotations). We denote the posterior probability that the true population seroprevalence is equal to $\theta $, given test outcome data X and test sensitivity and specificity characteristics, as $\text{Pr}(\theta \mid X,\text{se},\text{sp})$. Because sample size and outcomes are included in X, and because test sensitivity and specificity are included in the calculations, this posterior distribution over $\theta $ appropriately handles uncertainty due to limited sample sizes and an imperfect testing instrument, and can be used to produce a point estimate of seroprevalence or a posterior credible interval. The model and sampling algorithm are fully described in Appendix A1.
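As an illustration of this forward step, the posterior over $\theta $ can be approximated on a grid without MCMC. The sketch below is not the authors' implementation (which is fully described in Appendix A1); it assumes a uniform prior and uses the binomial likelihood with per-test positive probability ${\theta }^{*}=\theta \cdot \text{se}+(1-\theta )(1-\text{sp})$:

```python
import numpy as np

def seroprevalence_posterior(n_pos, n, se, sp, grid_size=2001):
    """Grid approximation to Pr(theta | data, se, sp) under a uniform prior.

    The probability that a single test is positive given true seroprevalence
    theta is theta_star = theta * se + (1 - theta) * (1 - sp).
    """
    theta = np.linspace(0.0, 1.0, grid_size)
    theta_star = theta * se + (1.0 - theta) * (1.0 - sp)
    # Binomial likelihood up to a constant; use logs for numerical stability.
    with np.errstate(divide="ignore"):
        log_like = n_pos * np.log(theta_star) + (n - n_pos) * np.log(1.0 - theta_star)
    log_like -= log_like.max()
    post = np.exp(log_like)
    post /= post.sum()  # normalize to a discrete posterior on the grid
    return theta, post

# Example: 110 positives out of 1000 tests with se = 0.90, sp = 0.999.
theta, post = seroprevalence_posterior(110, 1000, se=0.90, sp=0.999)
mean = float((theta * post).sum())  # posterior point estimate of seroprevalence
```

Because the raw positive rate (11%) includes false positives and misses false negatives, the posterior mean lands near $(0.11-0.001)/0.899\approx 0.12$ rather than at 0.11, which is the test-characteristic correction discussed above.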
Sampling frameworks for seropositivity estimates are likely to be nonrandom and constrained to subpopulations. For example, convenience sampling (testing blood samples that were obtained for another purpose and are readily available) will often be the easiest and quickest data collection method (Winter et al., 2018). Two examples of such convenience samples are newborn heel stick dried blood spots, which contain maternal antibodies and thus reflect maternal exposure, and serum from blood donors (Valenti et al., 2020; Erikstrup et al., 2021; Fontanet et al., 2020). As a result, another source of statistical uncertainty comes from uneven sampling from a population.
To estimate seropositivity for all subpopulations based on a given sample (stratified, convenience, or otherwise), we specified a Bayesian hierarchical model that included a common prior distribution on subpopulation-specific seropositivities ${\theta}_{i}$ (Appendix A1). In effect, this allowed seropositivity estimates from individual subpopulations to inform each other while still taking into account subpopulation-specific testing outcomes. The joint posterior distribution of all subpopulation prevalences was sampled using Markov chain Monte Carlo (MCMC) methods (Appendix A1). Those samples, representing posterior seroprevalence estimates for individual subpopulations, were then combined in a demographically weighted average to obtain estimates of overall seroprevalence, a process commonly known as poststratification (Little, 1993; Gelman and Carpenter, 2020). We focus the demonstrations and analyses of our methods on age-based subpopulations due to their integration into POLYMOD-type age-structured models (Mossong et al., 2008; Prem et al., 2017), but note that our mathematical framework generalizes naturally to other definitions of subpopulations, including those defined by geography (Fontanet et al., 2020; Nyc health testing data, 2020; Nisar et al., 2020; Malani et al., 2020).
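The poststratification step can be sketched in a few lines. Here the subpopulation posterior samples are hypothetical Beta draws standing in for the MCMC output of Appendix A1, and the census weights are assumed values, not real demography:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior samples of seroprevalence for three age-based
# subpopulations (in practice these come from the hierarchical MCMC).
posterior_samples = {
    "0-19":  rng.beta(12, 88, size=5000),   # ~12% seroprevalence
    "20-64": rng.beta(20, 80, size=5000),   # ~20%
    "65+":   rng.beta(8, 92, size=5000),    # ~8%
}

# Assumed census fractions of the population in each subpopulation.
weights = {"0-19": 0.25, "20-64": 0.60, "65+": 0.15}

# Poststratification: a demographically weighted average, taken sample by
# sample, yields posterior samples of *overall* seroprevalence.
overall = sum(weights[g] * posterior_samples[g] for g in weights)

point_estimate = float(np.mean(overall))
ci_low, ci_high = np.percentile(overall, [2.5, 97.5])
```

Averaging sample by sample, rather than averaging point estimates, is what carries subpopulation uncertainty forward into the overall estimate and its credible interval.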
Propagating serological uncertainty through models
In addition to estimating core epidemiological quantities (Farrington and Whitaker, 2003; Farrington et al., 2001; Hens et al., 2012) or mapping out patterns of outbreak risk (Abrams et al., 2014), the posterior distribution of seroprevalence can be used as an input to any epidemiological model. Such models include the standard SEIR model, where the proportion seropositive may correspond to the recovered/immune compartment, as well as more complex frameworks such as an age-structured SEIR model incorporating interventions like school closures and social distancing (Davies et al., 2020; Figure 1, blue annotations). We integrated and propagated uncertainty in the posterior estimates of seroprevalence and uncertainty in model dynamics or parameters using Monte Carlo sampling to produce a posterior distribution of epidemic trajectories or key epidemiological parameter estimates (Figure 1, black annotations).
Singlepopulation SEIR model with social distancing and serology
To integrate inferred seroprevalence with uncertainty into a single-population SEIR model, we created an ensemble of SEIR model trajectories by repeatedly running simulations whose initial conditions were drawn from the seroprevalence posterior distribution. In particular, the seroprevalence posterior distribution was sampled, and each sample $\theta $ was used to inform the fraction of the population initially placed into the ‘recovered’ compartment of the model. Thus, uncertainty in posterior seroprevalence was propagated through model outcomes, which were measured as epidemic peak timing and peak height. Social distancing was modeled by decreasing the contact rate between susceptible and infected model compartments. A full description of the model and its parameters can be found in Appendix A2 and Supplementary file 1.
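The ensemble procedure can be sketched as follows. The rate parameters, the Beta-distributed seroprevalence posterior, and the sample count are illustrative stand-ins, not the values of Supplementary file 1:

```python
import numpy as np

def seir_peak(r0=2.5, incubation=3.0, infectious=5.0, recovered0=0.1,
              distancing=1.0, days=365, dt=0.1):
    """Run a single-population SEIR model (forward Euler) and return
    (peak day, peak infectious fraction).

    `distancing` < 1 scales the contact rate; `recovered0` is the initial
    recovered fraction, drawn from the seroprevalence posterior.
    """
    sigma, gamma = 1.0 / incubation, 1.0 / infectious
    beta = distancing * r0 * gamma
    s, e, i, r = 1.0 - recovered0 - 1e-4, 0.0, 1e-4, recovered0
    peak_i, peak_t = i, 0.0
    for step in range(int(days / dt)):
        new_exp = beta * s * i * dt
        s, e, i, r = (s - new_exp,
                      e + new_exp - sigma * e * dt,
                      i + sigma * e * dt - gamma * i * dt,
                      r + gamma * i * dt)
        if i > peak_i:
            peak_i, peak_t = i, (step + 1) * dt
    return peak_t, peak_i

# Ensemble: each trajectory starts from a draw of the (here hypothetical)
# seroprevalence posterior, so seroprevalence uncertainty propagates to
# uncertainty in peak timing and height.
rng = np.random.default_rng(1)
posterior_draws = rng.beta(150, 850, size=200)  # posterior centered near 15%
peaks = [seir_peak(recovered0=theta) for theta in posterior_draws]
peak_times, peak_heights = zip(*peaks)
```

Summarizing `peak_times` and `peak_heights` with quantiles gives the credible bands on peak timing and height reported in the Results.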
Agestructured SEIR model with serology
To integrate inferred seroprevalence with uncertainty into an age-structured SEIR model, we considered a model with 16 age bins (0–4, 5–9, …, 75–79). This model was parameterized using country-specific age-contact patterns (Mossong et al., 2008; Prem et al., 2017) and COVID-19 parameter estimates (Davies et al., 2020). The model, due to Davies et al., 2020, includes age-specific clinical fractions and varying durations of preclinical, clinical, and subclinical infectiousness, as well as a decreased infectiousness for subclinical cases. A full description of the model and its parameters can be found in Appendix A2 and Supplementary file 1.
As in the single-population SEIR model, seroprevalence with uncertainty was integrated into the age-structured model by drawing samples from the seroprevalence posterior to specify the fraction of each subpopulation placed into ‘recovered’ compartments. Posterior samples were drawn from the age-stratified joint posterior distribution whose subpopulations matched the model’s subpopulations. For each set of posterior samples, the effective reproduction number ${R}_{\text{eff}}$ was computed from the model’s next-generation matrix. Thus, we quantified both the impact of age-stratified seroprevalence (assumed to be protective) on ${R}_{\text{eff}}$ as well as uncertainty in ${R}_{\text{eff}}$.
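A toy version of this computation, with a hypothetical two-group next-generation matrix in place of the paper's 16-bin model and Beta draws standing in for the age-stratified posterior:

```python
import numpy as np

# Hypothetical next-generation matrix at full susceptibility: entry (i, j)
# is the mean number of infections in group i caused by one case in group j.
ngm_full = np.array([[1.8, 0.9],
                     [0.7, 1.1]])

def r_eff(ngm, seroprevalence):
    """Scale row i by group i's remaining susceptible fraction and return
    the spectral radius (dominant eigenvalue magnitude) of the result."""
    susceptible = 1.0 - np.asarray(seroprevalence, dtype=float)
    scaled = susceptible[:, None] * ngm
    return max(abs(np.linalg.eigvals(scaled)))

r0 = r_eff(ngm_full, [0.0, 0.0])  # basic reproduction number (no immunity)

# Posterior samples of age-stratified seroprevalence propagate to R_eff.
rng = np.random.default_rng(2)
samples = rng.beta([30, 20], [70, 80], size=(1000, 2))  # ~30% and ~20%
r_eff_posterior = np.array([r_eff(ngm_full, s) for s in samples])
```

The spread of `r_eff_posterior` is the uncertainty in ${R}_{\text{eff}}$ induced by serological uncertainty, under the assumption that seropositivity is protective.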
Serosurvey sample size and allocation for inference and modeling
The flexible framework described in Figure 1 enables the calculation of sample sizes for different serological survey designs. To calculate the number of tests required to achieve a seroprevalence estimate with a specified tolerance for uncertainty, and to determine optimal test allocation across subpopulations in the context of studying a particular intervention, we treated the estimate uncertainty as a framework output and then sought to minimize it by improving the allocation of samples (Figure 1, dashed arrow).
Uniform allocation of samples to subpopulations is not always optimal. It can be improved by (i) increasing sampling in subpopulations with higher seroprevalence and (ii) sampling in subpopulations with higher relative influence on the quantity to be estimated. This approach, which we term model and demographics informed (MDI), allocates samples to subpopulations in proportion to how much sampling them would decrease the posterior variance of estimates, that is, ${n}_{i}\propto {x}_{i}\sqrt{{\theta}_{i}^{*}(1-{\theta}_{i}^{*})}$, where ${\theta}_{i}^{*}=1-\text{sp}+{\theta}_{i}(\text{se}+\text{sp}-1)$ is the probability of a positive test in subpopulation i given test sensitivity (se), test specificity (sp), and subpopulation seroprevalence ${\theta}_{i}$, and ${x}_{i}$ is the relative importance of subpopulation i to the quantity to be estimated.
The sample allocation recommended by MDI varies depending on the information available and the quantity of interest. When the key quantity is overall seroprevalence, ${x}_{i}$ is the fraction of the population in subpopulation i. When the key quantity is total infections, the effective reproductive number ${R}_{\text{eff}}$, or another quantity derived from compartmental models with subpopulations, ${x}_{i}$ is the ith entry of the principal eigenvector of the model’s next-generation matrix, after modification to include modeled interventions. In such scenarios, this approach balances the importance of sampling subpopulations due to their role in dynamics (${x}_{i}$) and higher variance in seroprevalence estimates themselves ($\sqrt{{\theta}_{i}^{*}(1-{\theta}_{i}^{*})}$). If subpopulation prevalence estimates ${\theta}_{i}$ are unknown, sample allocation based solely on ${x}_{i}$ is recommended. These methods are derived in Appendix A3.
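The MDI rule translates directly into code. In this sketch the importance weights `x` and prior seroprevalences `theta` are hypothetical inputs; in practice `x` would come from a model's next-generation matrix eigenvector or from population fractions:

```python
import numpy as np

def mdi_allocation(n_total, x, theta=None, se=0.90, sp=0.999):
    """Allocate n_total tests across subpopulations with n_i proportional to
    x_i * sqrt(theta_i* (1 - theta_i*)), where theta_i* is the probability of
    a positive test in subpopulation i. Falls back to n_i proportional to x_i
    when subpopulation seroprevalences are unknown."""
    x = np.asarray(x, dtype=float)
    if theta is None:
        weights = x
    else:
        theta_star = 1.0 - sp + np.asarray(theta, dtype=float) * (se + sp - 1.0)
        weights = x * np.sqrt(theta_star * (1.0 - theta_star))
    raw = n_total * weights / weights.sum()
    return np.floor(raw).astype(int)  # conservative integer allocation

# Example: three subpopulations with importance weights and rough prior
# seroprevalence guesses from earlier surveys (all values hypothetical).
alloc = mdi_allocation(1000, x=[0.5, 0.3, 0.2], theta=[0.05, 0.15, 0.10])
```

Note how the middle group, despite a smaller importance weight than the first, receives nearly as many tests because its seroprevalence is closer to the high-variance regime.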
Data sources
Age distribution of U.S. blood donors was drawn from a study of Atlanta donors (Shaz et al., 2011). Age distribution of U.S. mothers was drawn from the 2016 CDC Vital Statistics Report using Massachusetts as a reference state (Martin et al., 2018). Daily age-structured contact data were drawn from Prem et al., 2017. All data were represented using 5-year age bins, that is, (0–4, 5–9, …, 75–79). For datasets with bins wider than 5 years, counts were distributed evenly into the 5-year bins. Serological test characteristics were collected from registrations with the U.S. Food and Drug Administration, 2021 and summarized in Supplementary file 1. No attempt was made to test or validate manufacturer claims, and point estimates of sensitivity and specificity were used that did not incorporate test calibration sample sizes (Gelman and Carpenter, 2020; Larremore and Fosdick, 2020). Demographic data for the U.S., India, and Switzerland (analyzed in the article) as well as other countries (provided in open-source code) were downloaded from the 2019 United Nations World Populations Prospects report (United Nations, 2019). Hypothetical survey samples were drawn based on comprehensive seroprevalence estimates from Geneva, Switzerland (Stringhini et al., 2020).
Results
Test sensitivity/specificity, sampling bias, and true seroprevalence influence the accuracy and robustness of estimates
We simulated serological data from a single population with seroprevalence rates ranging from 1% to 50% using the reported sensitivity (90%) and specificity (>99.9%) of the Euroimmun SARS-CoV-2 IgG test (U.S. Food and Drug Administration, 2021; Supplementary file 1), and with the number of samples ranging from 100 to 5000. We constructed Bayesian posterior estimates of seroprevalence, finding that, when seroprevalence is 10% or lower, around 1000 samples are necessary to correctly estimate seroprevalence to within ±2% (Figure 2). Marketed tests with other characteristics also required around 1000 tests (Figure 2—figure supplement 1A, B) to achieve the same uncertainty levels, approaching the minimum sample size achieved by a theoretical test with perfect sensitivity and specificity (Figure 2—figure supplement 1C). Similar calculations for other test characteristics may be performed using the open-source tools that accompany this study (Open-source code repository and reproducible notebooks for this manuscript, 2020). In general, estimates were most uncertain when true seropositivity was near 50%, the number of samples was low, and/or test sensitivity/specificity were low.
Next, we tested the ability of the Bayesian hierarchical model to infer both population and subpopulation seroprevalence. We simulated serological data from subpopulations to which samples were allocated, with heterogeneous seroprevalence levels (Supplementary file 2) and average seroprevalence values between 5% and 50%. Test outcomes were randomly generated conditioning on the false positive and negative properties of the test being modeled (Supplementary file 1). Test allocations across subpopulations were specified in proportion to age demographics of blood donations, delivering mothers, uniformly across subpopulations, or according to an MDI allocation focused on minimizing posterior uncertainty in ${R}_{\text{eff}}$.
Credible intervals of the resulting overall seroprevalence estimates were influenced by the age demographics sampled, with the most uncertainty in the newborn dried blood spots sample set, due to the narrow age range for the mothers (Figure 3). For such sampling strategies, which draw from only a subset of the population, our approach assumes that seroprevalence in each subpopulation does not dramatically vary and thus infers that seroprevalence in the unsampled bins is similar to that in the sampled bins but with increased uncertainty. Uncertainty was also influenced by the overall seroprevalence, such that the width of the 95% credible interval increased with higher seroprevalence for a given sample size. While test sensitivity and specificity also impacted uncertainty, central estimates of overall seropositivity were robust for sampling strategies that spanned the entire population. Note that the MDI sample allocation shown in Figure 3 was optimized to estimate the effective reproductive number ${R}_{\text{eff}}$, and thus, while it performs well, it is slightly outperformed by uniform sampling when used to estimate overall seroprevalence.
Seroprevalence estimates inform uncertainty in epidemic peak, timing, and reproductive number
Figure 4 illustrates how the height and timing of peak infections varied in forward simulations under two serological sampling scenarios and two hypothetical social distancing policies for a basic SEIR framework parameterized using seroprevalence data. Uncertainty in seroprevalence estimates propagated through SEIR model outputs in stages: larger sample sizes at a given seroprevalence resulted in a smaller credible interval for the seroprevalence estimate, which improved the precision of estimates of both the height and timing of the epidemic peak. We note that seroprevalence estimates without correction for the sensitivity and specificity of the test resulted in biased estimates in spite of increasing precision with larger sample size (Figure 4C, D). Test characteristics also impacted model estimates, with more specific and sensitive tests leading to more precise estimates (Figure 4—figure supplement 1). Even estimates from a perfect test carried uncertainty corresponding to the size of the sample set (Figure 4—figure supplement 1).
Figure 5 illustrates how the Bayesian hierarchical model extrapolates seroprevalence values in sampled subpopulations, based on convenience samples from particular age groups or age-stratified serological surveys, to the overall population, with uncertainty propagated from these estimates to model-inferred epidemiological parameters of interest, such as the effective reproduction number ${R}_{\text{eff}}$. Estimates from 1000 neonatal heel sticks or blood donations achieved more uncertain, but still reasonable, estimates of overall seroprevalence and ${R}_{\text{eff}}$ as compared to uniform or demographically informed sample sets (Figure 5). Here, convenience samples produced higher confidence estimates in the heavily sampled subpopulations, but high uncertainty estimates in unsampled populations through our Bayesian modeling framework. In all scenarios, our framework propagated uncertainty appropriately from serological inputs to estimates of overall seroprevalence (Figure 5I) or ${R}_{\text{eff}}$ (Figure 5J). Importantly, we note that the inferred posterior estimates shown in Figure 5 are derived from stochastically generated data, meaning that repeating this numerical experiment would produce different simulated test outcomes and therefore different inferred seroprevalence and ${R}_{\text{eff}}$ estimates whose accuracy will stochastically vary, as expected. Improved test sensitivity and specificity correspondingly improved estimation, reducing the number of samples required (i) to achieve the same credible interval for a given seroprevalence and (ii) to achieve comparably precise estimates of ${R}_{\text{eff}}$ (Figure 5—figure supplements 1 and 2).
If the subpopulations in the convenience sample have systematically different seroprevalence rates from the general population, increasing the sample size may bias estimates (Figure 5—figure supplements 3 and 4) while simultaneously decreasing the widths of posterior credible intervals, producing higher confidence in estimates in spite of their bias. This may be avoided using data from other sources or by updating the prior distributions in the Bayesian model with known or hypothesized relationships between seroprevalence of the sampled and unsampled populations. In general, the magnitude of this type of bias is not possible to estimate without secondary sources of seroprevalence data, differentiating it from the avoidable biases that result from failing to poststratify based on population demographics or adjust for the sensitivity and specificity of the test instrument.
Strategic sample allocation improves estimates
We used the MDI strategy to design a study that optimizes estimation of ${R}_{\text{eff}}$ and then tested the performance of the sample allocations against those resulting from blood donation and neonatal heel stick convenience sampling, as well as uniform sampling. As designed, MDI produced higher-confidence posterior estimates (Figure 5J, Figure 5—figure supplement 2). Importantly, because the relative importance of subpopulations in a model varies based on the hypothetical interventions being modeled (e.g., the reopening of workplaces would place higher importance on the serological status of working-age adults), MDI sample allocation recommendations should be derived for multiple hypothetical interventions and then averaged to design a study from which the largest variety of high-confidence results can be derived. To illustrate how such recommendations would work in practice, we computed MDI recommendations to optimize three scenarios for the contact patterns and demography of the U.S. and India, deriving a balanced sampling recommendation (Figure 6).
Discussion
There is a critical need for serological surveillance of SARS-CoV-2 to estimate cumulative incidence. Here, we presented a formal framework for doing so to aid in the design and interpretation of serological studies, which avoids the biases associated with seroprevalence estimates that fail to account for sensitivity, specificity, and sampling schemes. We considered that sampling may be done in multiple ways, including efforts to approximate seroprevalence using convenience samples, as well as more complex and resource-intensive structured sampling schemes, and that these efforts may use one of any number of serological tests with distinct test characteristics. We incorporated into this framework an approach to propagating the estimates and associated uncertainty through mathematical models of disease transmission (focusing on scenarios where seroprevalence maps to immunity) to provide decision-makers with tools to evaluate the potential impact of interventions and thus guide policy development and implementation.
Our results suggest approaches to serological surveillance that can be adapted as needed based on preexisting knowledge of disease prevalence and trajectory, availability of convenience samples, and the extent of resources that can be put towards structured survey design and implementation. While this work focuses on the design and analysis of single cross-sectional surveys, stratified by age, extensions to the analysis of serial cross-sectional surveys (Stringhini et al., 2020; Nisar et al., 2020) or other stratifications are also possible. Our results suggest that such surveys could benefit from the rebalancing of limited test budgets between subpopulations from one cross-sectional wave to the next by basing each wave’s test allocation strategy on the MDI recommendations derived from the preceding wave. Although our numerical demonstrations here focused on heterogeneity and modeling by age, our work may be applied to any population stratification for which there is heterogeneity. Indeed, seroprevalence studies by neighborhood in New York City (Nyc health testing data, 2020), Karachi (Nisar et al., 2020), and Mumbai (Malani et al., 2020) have all found geographical variation in seroprevalence. In situations where sample sizes are low and heterogeneity is high, hyperprior parameters can be adjusted to accommodate larger variation between subpopulations.
In the absence of baseline estimates of cumulative incidence, an initial serosurvey can provide a preliminary estimate (Figure 2). Our framework updates the ‘rule of 3’ approach (Hanley and Lippman-Hand, 1983) by incorporating uncertainty in test characteristics and can further address uncertainty from biased sampling schemes (see Appendix A4). As a result, convenience samples, such as the maternal antibodies within newborn heel stick dried blood spots or samples from blood donors, can be used to estimate population seroprevalence. However, it is important to note that in the absence of reliable assessment of correlations in seroprevalence across age groups, extrapolations from these convenience samples to entire populations may be misleading as sample size increases (Figure 5—figure supplement 3). Indeed, as convenience sample size increases, credible intervals will shrink, which, if sampled groups are unrepresentative of unsampled groups, will constitute a ‘false precision’. Uniform or model- and demographic-informed samples, while more challenging logistically to implement, give the most reliable estimates. The results of a one-time study could be used to update the priors of our Bayesian hierarchical model and improve the inferences from convenience samples. In this context, we note that our framework naturally allows the integration of samples from multiple test kits and protocols, provided that their sensitivities and specificities can be estimated (Larremore and Fosdick, 2020; Gelman and Carpenter, 2020), which will become useful as serological assays improve in their specifications.
The results from serological surveys will be invaluable in projecting epidemic trajectories and understanding the impact of interventions such as age-prioritized vaccination (Bubar et al., 2021). We have shown how the estimates from these serological surveys can be propagated into transmission models, incorporating model uncertainty as well. Conversely, to aid in rigorous assessment of particular interventions that meet accuracy and precision specifications, this framework can be used to determine the needed number and distribution of population samples via model- and demographic-informed sampling. Extensions could conceivably address other study planning questions, including sampling frequency (Herzog et al., 2017).
There are a number of limitations to this approach that reflect uncertainties in the underlying assumptions of serological responses and the changes in mobility and interactions due to public health efforts (Kissler et al., 2020b). Serology reflects past infection, and the delay between infection and detectable immune response means that serological tests reflect a historical cumulative incidence (the date of sampling minus the delay between infection and detectable response). However, due to the waning of antibody concentrations over time (Ward et al., 2020), seroreversion may cause seroprevalence studies to underestimate cumulative incidence. As a consequence, modeling studies that incorporate seroprevalence estimates should acknowledge such potential delays and seroreversion when interpreting their findings. The possibility of heterogeneous immune responses to infection and unknown dynamics and duration of immune response means that interpretation of serological survey results may not accurately capture cumulative incidence. For COVID-19, we do not yet understand the serological correlates of protection from infection, and as such, models that assume seropositivity indicates immunity to reinfection may overestimate population protection; such models would need to be updated to include partial protection or return to susceptibility.
Our work also requires the specification of prior and hyperprior distributions, assumptions inherent to any Bayesian approach to statistical inference. Here, we used uninformative uniform prior distributions and a weakly informative hyperprior distribution in order to impose minimal assumptions when modeling the data. This is a conservative choice as assuming uninformative prior distributions results in higher posterior uncertainty. While informative priors can reduce uncertainty in seroprevalence studies (Gelman and Carpenter, 2020), specifying such priors appropriately relies on additional information and/or assumptions about the study population, which may be sparse, particularly during an unfolding pandemic.
Use of model- and demographic-informed sampling schemes is valuable for projections that evaluate interventions but is dependent on accurate parameterization. While in our examples we used POLYMOD and other contact matrices, these represent the status quo ante and should be updated to the extent possible using other data, such as those obtainable from surveys (Mossong et al., 2008; Prem et al., 2017) and mobility data from online platforms and mobile phones (Buckee et al., 2020; Ainslie et al., 2020; Open COVID-19 Data Working Group et al., 2020). Moreover, the framework could be extended to geographic heterogeneity as well as longitudinal sampling if, for example, one wanted to compare whether the estimated quantities of interest (e.g., seroprevalence, ${R}_{\text{eff}}$) differ across locations or time (Abrams et al., 2014; Stringhini et al., 2020; Kissler et al., 2020a).
Here, we explored only SEIR models, but extensions to alternatives that incorporate waning immunity (Ward et al., 2020) and a return to full or partial susceptibility are possible (Saad-Roy et al., 2020). A clearer understanding of SARS-CoV-2 antibody titers, protection, and durability will further inform whether it is appropriate to model seropositive individuals as no longer susceptible, as we did in the example calculations here. We note that, across model types, the derivation of model-focused MDI sample allocation strategies requires only the formulation of a next-generation matrix or a network of the subpopulations’ epidemiological impacts on each other, providing a general framework that spans model assumptions and classes.
Overall, the framework presented here can be adapted by communities of varying size and resources seeking to monitor and respond to SARS-CoV-2 and future pandemics. Further, while the analyses and discussion focused on addressing urgent needs, this is a generalizable framework that, with appropriate modifications, can be applied to other infectious disease epidemics.
Appendix 1
A1 Bayesian inference of seroprevalence
A1.1 Inference of seroprevalence in a sample using an imperfect test
If a serological test had perfect sensitivity and specificity, the probability of observing ${n}_{+}$ seropositive results from $n$ tests, given a true population seroprevalence $\theta$, is given by the binomial distribution:
$$\mathrm{Pr}({n}_{+}\mid \theta ,n)=\binom{n}{{n}_{+}}\,{\theta}^{{n}_{+}}{(1-\theta )}^{n-{n}_{+}}.\qquad \text{(A1)}$$
However, imperfect specificity and sensitivity require that we modify this formula. For convenience, in the remainder of this Appendix, we will use $u=1-\text{specificity}$ (the false positive rate) and $v=1-\text{sensitivity}$ (the false negative rate).
Using this notation, the probability that a single test returns a positive result, given $u$, $v$, and the true seroprevalence $\theta$, is
$$\mathrm{Pr}(+\mid \theta ,u,v)=\theta (1-v)+(1-\theta )u.\qquad \text{(A2)}$$
Substituting this per-sample probability into Equation (A1) yields
$$\mathrm{Pr}({n}_{+}\mid \theta ,u,v,n)=\binom{n}{{n}_{+}}{\left[\theta (1-v)+(1-\theta )u\right]}^{{n}_{+}}{\left[\theta v+(1-\theta )(1-u)\right]}^{n-{n}_{+}}.\qquad \text{(A3)}$$
Note that Equation (A1), and therefore Equation (A3), both assume that samples are drawn independently and can therefore be computed using a binomial likelihood. This assumption may be modified in scenarios in which factors contributing to non-independence of samples are known and measured, for example, when sampled individuals are known to belong to the same household (Nisar et al., 2020). Finally, using Bayes’ rule, we can write the posterior distribution over seropositivity $\theta$, given the data, the test’s parameters (Diggle, 2011), and an uninformative (uniform) prior on $\theta$, yielding
$$P(\theta \mid {n}_{+},n,u,v)=\frac{(1-u-v)\,{\left[\theta (1-v)+(1-\theta )u\right]}^{{n}_{+}}{\left[\theta v+(1-\theta )(1-u)\right]}^{n-{n}_{+}}}{B(1-v;\,{n}_{+}+1,\,n-{n}_{+}+1)-B(u;\,{n}_{+}+1,\,n-{n}_{+}+1)},\qquad \text{(A4)}$$
where B is an incomplete beta function without normalization. In practice, to sample from this distribution, one can use an accept–reject algorithm with, for example, a uniform proposal distribution and consider only the numerator of Equation (A4). Alternatively, one can generate samples from a truncated beta distribution using accept–reject sampling or an inverse cumulative distribution function method, and these samples can be transformed to represent draws from Equation (A4).
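The accept–reject scheme with a uniform proposal can be sketched as follows. This is a minimal illustration of the approach described above, not the authors’ released code; the grid-based bound on the unnormalized density and all numerical values are our own illustrative choices.

```python
import numpy as np

def posterior_unnorm(theta, n_pos, n, u, v):
    """Numerator of the posterior in Equation (A4): binomial likelihood of
    n_pos positives in n tests with false positive rate u and false
    negative rate v, under a uniform prior on theta."""
    p = theta * (1 - v) + (1 - theta) * u  # per-test positive probability
    return p ** n_pos * (1 - p) ** (n - n_pos)

def sample_posterior(n_pos, n, u, v, size=1000, seed=0):
    """Accept-reject sampling with a Uniform(0, 1) proposal distribution."""
    rng = np.random.default_rng(seed)
    # Upper-bound the unnormalized density via a fine grid (adequate in 1-D).
    grid = np.linspace(0.0, 1.0, 10_001)
    m = posterior_unnorm(grid, n_pos, n, u, v).max()
    out = []
    while len(out) < size:
        theta = rng.uniform()
        if rng.uniform(0.0, m) < posterior_unnorm(theta, n_pos, n, u, v):
            out.append(theta)
    return np.array(out)
```

For example, with 50 positives among 1000 tests and $u=0.01$, $v=0.05$, the draws concentrate near the corrected point estimate $(0.05-0.01)/0.94\approx 0.043$ rather than the raw rate 0.05.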
A1.2 Bayesian estimation of seroprevalence across subpopulations
For a test with sensitivity $1-v$ and specificity $1-u$, and given ${n}_{i+}$ seropositive results from ${n}_{i}$ tests in subpopulation $i$—set equal to zero for unsampled subpopulations—the posterior distribution over the vector of subpopulation seropositivities $\bm{\theta}=\{{\theta}_{i}\}$ given all results ${\bm{n}}_{+}=\{{n}_{i+}\}$ is given by
$$P(\bm{\theta}\mid {\bm{n}}_{+})\propto {\int}_{0}^{1}\!{\int}_{0}^{\infty }\left[\prod _{i}\mathrm{Pr}({n}_{i+}\mid {\theta}_{i},u,v,{n}_{i})\,P({\theta}_{i}\mid \overline{\theta},\gamma )\right]P(\gamma )\,d\gamma \,d\overline{\theta},\qquad \text{(A5)}$$
where we have included a hierarchy of priors. Specifically, the prior for each subpopulation seroprevalence was ${\theta}_{i}\sim \text{Beta}(\overline{\theta}\gamma ,(1-\overline{\theta})\gamma )$, which has expectation $\overline{\theta}$ and variance $\overline{\theta}(1-\overline{\theta})/(\gamma +1)$. The hyperprior for the overall mean $\overline{\theta}$ was uniform on the interval (0,1), allowing it to be dictated by the observed data. The hyperprior for the variance parameter was $\gamma \sim \text{Gamma}(\nu ,\text{scale}={\gamma}_{0}/\nu )$, which has expected value $E[\gamma ]={\gamma}_{0}$ and $\mathrm{Var}[\gamma ]={\gamma}_{0}^{2}/\nu $.
A1.3 Sampling from the Bayesian hierarchical model for subpopulation seroprevalences using MCMC
We sample from the joint posterior distribution inside the integral in Equation (A5) using an MCMC algorithm with univariate Metropolis–Hastings updates. We initialize the age-specific seroprevalence parameters at ${\theta}_{i}=({n}_{i+}+1)/({n}_{i}+2)$, set $\overline{\theta}$ equal to the sample mean of the $\{{\theta}_{i}\}$, and set $\gamma ={\gamma}_{0}$. For each simulation, the MCMC algorithm was run for a total of 50,100 iterations. The first 100 iterations were discarded and every 50th sample was saved to obtain 1000 samples from the joint posterior distribution. Code is open source and freely available (https://github.com/LarremoreLab/covid_serological_sampling). Trace plots and effective sample sizes of the posterior samples were used to evaluate convergence and mixing of the chains. Effective sample sizes were greater than 6000 for all parameters, and trace plots did not raise concerns about the MCMC algorithm settings.
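The univariate Metropolis–Hastings update for a single subpopulation can be sketched as below. For simplicity, the hyperparameters $\overline{\theta}$ and $\gamma$ are held fixed here—an assumption made only for illustration, since the full hierarchical model also updates them—and the random-walk step size is our own arbitrary choice.

```python
import numpy as np

def mh_theta(n_pos, n, u, v, tbar=0.5, gam=5.0, iters=50_100,
             burn=100, thin=50, step=0.02, seed=0):
    """Univariate Metropolis-Hastings for one subpopulation seroprevalence,
    with the burn-in and thinning settings described in the text.
    Sketch only: tbar and gam are held fixed rather than updated."""
    rng = np.random.default_rng(seed)

    def log_post(t):
        if not 0.0 < t < 1.0:
            return -np.inf
        p = t * (1 - v) + (1 - t) * u  # per-test positive probability
        return (n_pos * np.log(p) + (n - n_pos) * np.log1p(-p)
                + (tbar * gam - 1) * np.log(t)            # Beta(tbar*gam,
                + ((1 - tbar) * gam - 1) * np.log1p(-t))  # (1-tbar)*gam) prior

    t = (n_pos + 1) / (n + 2)  # initialization as described in the text
    lp = log_post(t)
    out = []
    for it in range(iters):
        prop = t + rng.normal(0.0, step)  # symmetric random-walk proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            t, lp = prop, lp_prop
        if it >= burn and (it - burn) % thin == 0:
            out.append(t)
    return np.array(out)
```

With the default settings, the chain yields exactly 1000 thinned posterior draws, mirroring the iteration counts given above.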
A2 Including protective seropositivity into models
A2.1 Canonical single-population SEIR model with social distancing and seropositivity
Let $S$, $E$, $I$, and $R$ be the number of susceptible, exposed, infected, and recovered people in a population of size $N$, $S+E+I+R=N$. We model dynamics by
$$\frac{dS}{dt}=-\rho \beta \frac{SI}{N},\qquad \frac{dE}{dt}=\rho \beta \frac{SI}{N}-\alpha E,\qquad \frac{dI}{dt}=\alpha E-\gamma I,\qquad \frac{dR}{dt}=\gamma I,\qquad \text{(A6)}$$
where $\beta $, $\alpha $, and $\gamma $ represent the rates of infection, symptom onset, and recovery, respectively, as in a standard SEIR model. To model social distancing, we include the contact parameter $\rho \in [0,1]$, which modulates the fraction of social contacts between S and I populations that remain. Thus, $\rho =1$ represents no social distancing while $\rho =0.5$ would represent a 50% reduction in contacts. In the simulations of this article, only $\rho =0.5,0.75$ were considered as examples of dynamics.
To parameterize this model using seroprevalence, we made the modeling assumption that seropositive individuals are immune. Noting that this is only an assumption, one that at present requires in-depth research, we therefore placed seropositive individuals into the recovered group. In other words, for a seropositive fraction $\theta$, with 10 individuals each in the $E$ and $I$ compartments, initial conditions would be
$$(S,E,I,R{)}_{t=0}=(N-20-\theta N,\,10,\,10,\,\theta N).$$
A2.2 Canonical single-population SEIR parameters and simulation details
Parameter values used in this study can be found in Supplementary file 2. In prose, the model used transmission rate $\beta =1.75$, exposure-to-infected rate $\alpha =0.2$, and recovery rate $\gamma =0.5$, with no births or deaths, in a finite population of size $N=10{,}000$. Social distancing was implemented as a coefficient $\rho =\{0.5,0.75\}$, corresponding to 50% and 25% social distancing, multiplying the contact rate between infected and susceptible populations. Integration was performed for 150 days with a timestep of 0.1 days. Initial conditions for $(S,E,I,R)$ were $(N-20-\theta N,10,10,\theta N)$, to simulate a fraction $\theta$ of recovered individuals, assumed to be immune. For each sampled value of $\theta$, peak infection height and timing were extracted from forward-integrated time series.
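The forward integration just described can be sketched as follows, using the stated parameter values with simple Euler steps; the peak-extraction details are our own minimal implementation, not the authors’ code.

```python
def seir_peak(theta, N=10_000, beta=1.75, alpha=0.2, gamma=0.5,
              rho=0.5, days=150, dt=0.1):
    """Integrate the distanced SEIR model from seroprevalence-informed
    initial conditions; return the peak infected count and its timing."""
    S, E, I, R = N - 20 - theta * N, 10.0, 10.0, theta * N
    peak_I, peak_t = I, 0.0
    for step in range(int(days / dt)):
        dS = -rho * beta * S * I / N  # distancing coefficient scales contacts
        dE = -dS - alpha * E
        dI = alpha * E - gamma * I
        dR = gamma * I
        S, E, I, R = S + dS * dt, E + dE * dt, I + dI * dt, R + dR * dt
        if I > peak_I:
            peak_I, peak_t = I, (step + 1) * dt
    return peak_I, peak_t
```

Higher assumed seroprevalence depletes susceptibles and lowers the projected peak; repeating this for posterior draws of $\theta$ propagates serosurvey uncertainty into the peak height and timing estimates.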
A2.3 Age-structured (POLYMOD) SEIR model with seropositivity
A model with 16 age bins ($0\text{–}4,5\text{–}9,\dots ,75\text{–}79$) was parameterized using country-specific age-contact patterns (Mossong et al., 2008; Prem et al., 2017) and COVID-19 parameter estimates (Davies et al., 2020). The model includes age-specific clinical fractions and varying durations of preclinical, clinical, and subclinical infectiousness, as well as a decreased infectiousness for subclinical cases (Davies et al., 2020).
Davies et al. define a next-generation matrix
$${N}_{ij}={u}_{i}{C}_{ij}\left({y}_{j}({\mu}_{P}+{\mu}_{C})+(1-{y}_{j})f{\mu}_{S}\right),\qquad \text{(A7)}$$
where ${u}_{i}$ is the susceptibility of age group $i$; ${C}_{ij}$ is the number of age-$j$ individuals contacted by an age-$i$ individual per day; ${y}_{j}$ is the probability that an infection is clinical for an age-$j$ individual; ${\mu}_{P}$, ${\mu}_{C}$, and ${\mu}_{S}$ are the mean durations of preclinical, clinical, and subclinical infectiousness, respectively; and $f$ is the relative infectiousness of subclinical cases (Davies et al., 2020). Values for all parameters are reported in Supplementary file 2.
Protective seropositivity can be included in the model by multiplying ${N}_{ij}$ as defined above by $1-{\theta}_{i}$, where ${\theta}_{i}$ is the seropositivity rate of age group $i$. With this included term, we can modify Equation (A7) as
$$N={D}_{\bm{1-\theta}}\,{D}_{\bm{u}}\,C\left(a{D}_{\bm{y}}+bI\right),\qquad \text{(A8)}$$
where ${D}_{\bm{x}}$ represents a diagonal matrix with entries ${D}_{ii}={x}_{i}$, $I$ is the identity matrix, and the constants are defined as $a={\mu}_{P}+{\mu}_{C}-f{\mu}_{S}$ and $b=f{\mu}_{S}$.
The effective reproductive number is then the spectral radius $\rho (N)$ (i.e., the largest eigenvalue $\lambda$) of the next-generation matrix:
$${R}_{\text{eff}}=\rho (N)={\lambda}_{\mathrm{max}}(N).\qquad \text{(A9)}$$
As written, Equation (A9) represents a model component shown in Figure 1 (blue annotations) as it maps parameters $\mathit{\bm{\theta}}$ to a point estimate of ${R}_{\text{eff}}$. As with the canonical SEIR model, uncertainty in the model parameters themselves can also be incorporated into overall uncertainty in ${R}_{\text{eff}}$ via Monte Carlo.
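A minimal numerical sketch of this mapping from seroprevalences $\bm{\theta}$ to ${R}_{\text{eff}}$ is below. The three-group contact matrix, susceptibilities, clinical fractions, and duration parameters are illustrative placeholders of our own choosing, not the values of Davies et al.

```python
import numpy as np

def r_eff(C, u, y, theta, mu_P=2.1, mu_C=2.9, mu_S=5.0, f=0.5):
    """Spectral radius of the seroprevalence-adjusted next-generation
    matrix with entries (1 - theta_i) u_i C_ij (y_j (mu_P + mu_C)
    + (1 - y_j) f mu_S). Default durations are placeholders."""
    u, y, theta = map(np.asarray, (u, y, theta))
    infectiousness = y * (mu_P + mu_C) + (1 - y) * f * mu_S
    N = (1 - theta)[:, None] * u[:, None] * np.asarray(C) * infectiousness[None, :]
    return float(np.max(np.abs(np.linalg.eigvals(N))))

# Illustrative three-group example (e.g., children, adults, elderly).
C = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 1.0],
     [0.5, 1.0, 2.0]]
u_grp = [0.4, 0.8, 0.8]  # susceptibility by group (placeholder values)
y_grp = [0.3, 0.5, 0.7]  # clinical fraction by group (placeholder values)
```

Note that a uniform seroprevalence $\theta$ simply scales the matrix, and hence ${R}_{\text{eff}}$, by $1-\theta$, whereas heterogeneous $\bm{\theta}$ reweights the age groups.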
A2.4 Age-structured SEIR model parameters and simulation details
Parameter values used in this study can be found in Supplementary file 2 and were generally drawn from the work of Davies et al. and the sources therein. Published estimated contact matrices were used for India and the U.S. in the article, with additional countries’ contact matrices shown in the accompanying opensource code.
A3 MDI sampling
The calculations that follow rely on standard results from optimization theory, which we briefly review before applying them below.
Let $\mathbf{n}=({n}_{1},\dots ,{n}_{K})$. Suppose we want to minimize a function of the form
$$f(\mathbf{n})=\sum _{i}\frac{{c}_{i}}{{n}_{i}},\qquad \text{(A10)}$$
subject to the constraint that ${\sum}_{i}{n}_{i}=n$. Using the method of Lagrange multipliers, it can be shown that $f(\mathit{\bm{n}})$ is minimized when ${n}_{i}\propto \sqrt{{c}_{i}}$. We apply this result below with various expressions for c_{i} to determine the optimal allocation of n tests across subpopulations in order to minimize the uncertainty of quantities of interest.
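A quick numerical illustration of this result (our own sketch, not from the original code):

```python
import numpy as np

def optimal_allocation(c, n):
    """Allocate n samples across groups to minimize f = sum_i c_i / n_i,
    using the Lagrange-multiplier solution n_i proportional to sqrt(c_i)."""
    w = np.sqrt(np.asarray(c, dtype=float))
    return n * w / w.sum()
```

With $c=(1,4,9)$, for example, the optimum allocates samples in the ratio $1:2:3$, and any other feasible allocation attains a larger value of $f$.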
A3.1 Minimizing posterior uncertainty for seroprevalence
Given age-specific seroprevalence estimates $\bm{\theta}$, the estimate for overall seroprevalence is defined as ${\theta}_{pop}={\sum}_{i}{d}_{i}{\theta}_{i}$, where ${d}_{i}$ is the proportion of the population in group $i$. The uncertainty of this estimator depends on the uncertainties of the age-specific seroprevalences, which inherently depend on the number of tests ${n}_{i}$ allotted to each subpopulation. Although the posterior uncertainties of the subpopulation seroprevalences are not available in closed form, we can nevertheless approximate them using the uncertainties in the corresponding maximum likelihood estimators. Here, we consider the maximum likelihood estimators based on a separate binomial model for each subpopulation, that is, models of the form Equation (A3), where $\theta$ is replaced by ${\theta}_{i}$. Note that this model assumes independence among the subpopulation seroprevalences.
The maximum likelihood estimate of ${\theta}_{i}$, given ${n}_{i+}$ positive tests out of ${n}_{i}$ tests administered, is
$${\widehat{\theta}}_{i}=\frac{{n}_{i+}/{n}_{i}-u}{1-u-v},$$
but this is only valid when both the numerator and denominator are positive, corresponding to a value of ${\widehat{\theta}}_{i}$ in the interval (0, 1). If the above estimator is computed and found to be negative, which happens when the fraction of tests that are positive is below the false positive rate, then the maximum likelihood lies at the end point, ${\widehat{\theta}}_{i}=0$. Similarly, if the estimator is found to be greater than one, ${\widehat{\theta}}_{i}=1$. These estimators are undefined if no tests are allocated to group i, that is, when ${n}_{i}=0$.
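In code, this clipped estimator is straightforward (a sketch):

```python
def seroprev_mle(n_pos, n, u, v):
    """Maximum likelihood estimate of seroprevalence from an imperfect test
    (false positive rate u, false negative rate v), clipped to [0, 1].
    Undefined when no tests were allocated (n = 0)."""
    if n == 0:
        raise ValueError("estimator undefined: no tests allocated")
    raw = (n_pos / n - u) / (1 - u - v)
    return min(max(raw, 0.0), 1.0)
```

For instance, with $u=0.01$, observing 5 positives in 1000 tests (a positive rate below the false positive rate) yields the boundary estimate ${\widehat{\theta}}_{i}=0$.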
Using the maximum likelihood estimators as proxies for the subpopulation posterior distributions, we can approximate the posterior variance of ${\theta}_{pop}$ as
$$\mathrm{Var}\left[{\theta}_{pop}\right]\approx \sum _{i}{d}_{i}^{2}\,\mathrm{Var}\left[{\widehat{\theta}}_{i}\right]=\sum _{i}{d}_{i}^{2}\,\frac{{p}_{i}(1-{p}_{i})}{{n}_{i}{(1-u-v)}^{2}},\qquad \text{(A11)}$$
where ${p}_{i}=(1-v){\theta}_{i}+u(1-{\theta}_{i})$ and ${\theta}_{i}$ is the true seroprevalence of group $i$. This variance equation has the form of Equation (A10), and thus the optimal allocation of samples is given by
$${n}_{i}\propto {d}_{i}\sqrt{{p}_{i}(1-{p}_{i})},\qquad \text{(A12)}$$
where multiplicative constants have been absorbed into the proportion. In the absence of knowledge about the true subpopulation seroprevalences $\bm{\theta}$, we recommend simply allocating samples with respect to the demographic information: ${n}_{i}\propto {d}_{i}$.
A3.2 Minimizing posterior uncertainty for modeling
When the primary quantity of interest is the output from a model, improved test allocation strategies can be developed by leveraging the model structure. For example, suppose the goal is accurate estimation of the total number of infected individuals at some future time point $t$. To avoid confusion with the identity matrix $I$ or the subpopulation index $i$, let ${\mathbf{h}}^{t}=({h}_{1}^{t},{h}_{2}^{t},\dots )$ denote the vector containing the number of infected individuals within each subpopulation and let the total number of infected individuals be ${H}^{t}={\sum}_{i}{h}_{i}^{t}$. Using the next-generation matrix $N$ defined in Equation (A7) and the modification for the depletion of susceptibles as in Equation (A8), the next-generation matrix updates the vector of infected individuals per subpopulation as
$${\mathbf{h}}^{t+1}={D}_{\bm{1-\theta}}N{\mathbf{h}}^{t}\approx k\lambda {D}_{\bm{1-\theta}}\mathbf{x},\qquad \text{(A13)}$$
where $\mathbf{x}$ represents the normalized eigenvector of $N$ corresponding to the largest eigenvalue $\lambda$, and $k$ is the scalar $k={\mathbf{x}}^{T}{\mathbf{h}}^{t}$. The next-generation matrix $N$ is nonnegative and satisfies the conditions of the Perron–Frobenius theorem, which means that it has a largest eigenvalue $\lambda$—for a next-generation matrix, ${R}_{0}=\lambda$—which is greater than or equal to all other eigenvalues, with a corresponding eigenvector $\mathbf{x}$ of nonnegative components. This means that repeated applications of $N$ to any initial vector that is not orthogonal to $\mathbf{x}$ will produce vectors increasingly parallel to $\mathbf{x}$, converging at a rate of $\lambda /{\lambda}_{2}$ per iteration, where ${\lambda}_{2}$ is the second largest eigenvalue of $N$. This is the basis of the so-called power method, which repeatedly applies the matrix to find the largest eigenvalue and its corresponding eigenvector. Rewriting Equation (A13) for each subpopulation $i$ leads to
$${h}_{i}^{t+1}\approx k\lambda (1-{\theta}_{i}){x}_{i}.\qquad \text{(A14)}$$
Note that as seroprevalence increases, $(1-{\theta}_{i})$ approaches zero, thereby accounting for the depletion of susceptibles in subpopulation $i$ by reducing the number of infected individuals therein in the next timestep.
There are two helpful interpretations of Equations (A13) and (A14). First, the vector $\mathit{\bm{x}}$ is the principal ‘direction’ of the nextgeneration matrix, and repeated iterations of the dynamics in a large population will result in infected fractions that are proportional to $\mathit{\bm{x}}$. In the above, we approximate the effect of N on $\mathit{\bm{h}}$ as $k\lambda \mathit{\bm{x}}$, an approximation that is better when $\lambda $ is well separated from the second eigenvalue ${\lambda}_{2}$. Measurements of $\lambda /{\lambda}_{2}$ for models considered in this article ranged from 2 to 4.
A second interpretation of this result appeals to the notion of the next-generation matrix $N$ as a network in which the nodes are infected subpopulations and the directed links ${N}_{ij}$ explain the effects of an infection at node $j$ on future infections at node $i$. In this network dynamical system, by calculating $\mathbf{x}$ we have computed the eigenvector centralities of the network’s nodes (Newman, 2018), which are a measure of the importance of each subpopulation in the network.
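The power method mentioned above can be sketched as follows (our own minimal implementation):

```python
import numpy as np

def principal_eigpair(N, tol=1e-12, max_iter=10_000):
    """Power method: repeatedly apply the matrix to find its largest
    eigenvalue and the corresponding (eigenvector-centrality)
    eigenvector, normalized to unit length."""
    x = np.ones(N.shape[0]) / np.sqrt(N.shape[0])
    lam = 0.0
    for _ in range(max_iter):
        y = N @ x
        lam_new = float(np.linalg.norm(y))
        y = y / lam_new
        if abs(lam_new - lam) < tol:
            return lam_new, y
        x, lam = y, lam_new
    return lam, x
```

For MDI allocation, the entries ${x}_{i}$ of the returned eigenvector give the recommended sampling proportions ${n}_{i}\propto {x}_{i}$.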
With these preliminary calculations in mind, we turn to the estimation of ${H}^{t}$. Because ${H}^{t}={\sum}_{i}{h}_{i}^{t}$, and because the values ${h}_{i}^{t}$ are all functions of a random variable $\mathit{\bm{\theta}}$, ${H}^{t}$ is also a random variable. Our goal is to minimize its variance by strategically allocating finite samples in order to minimize the important posterior variances among the elements of $\mathit{\bm{\theta}}$. In plain language, some of the subpopulations are more important in shaping future disease dynamics than others, so MDI will preferentially allocate more samples to those subpopulations in a principled manner, which we now derive.
As in Equation (A11), we approximate the posterior variance of $\bm{\theta}$ by the posterior variance of the corresponding maximum likelihood estimator $\widehat{\bm{\theta}}$. This results in the following approximation of the variance of the total number infected:
$$\mathrm{Var}\left[{H}^{t+1}\right]\approx \mathrm{Var}\left[\sum _{i}k\lambda (1-{\widehat{\theta}}_{i}){x}_{i}\right]={(k\lambda )}^{2}\sum _{i}{x}_{i}^{2}\,\frac{{p}_{i}(1-{p}_{i})}{{n}_{i}{(1-u-v)}^{2}},\qquad \text{(A15)}$$
where ${x}_{i}$ is the $i$th element of the principal eigenvector $\mathbf{x}$ and ${p}_{i}=(1-v){\theta}_{i}+u(1-{\theta}_{i})$. The first expression is obtained by using the approximation in Equations (A13) and (A14). The resulting variance expression has the form of Equation (A10), and thus, ignoring constants, the optimal allocation of samples is given by
$${n}_{i}\propto {x}_{i}\sqrt{{p}_{i}(1-{p}_{i})}.\qquad \text{(A16)}$$
In the absence of knowledge about the true subpopulation seroprevalences $\mathit{\bm{\theta}}$, we recommend simply allocating samples with respect to the entries of the principal eigenvector: ${n}_{i}\propto {x}_{i}$.
A4 Impact of sensitivity and specificity on the ‘rule of 3’
Suppose we have a perfect test ($u=v=0$) and when we perform $n$ tests, zero are positive. The maximum likelihood estimate of the seroprevalence would be 0. Hanley and Lippman-Hand (1983) proposed a simple upper 95% confidence bound on the true seroprevalence equal to $3/n$.
The derivation of this rule is motivated by the following question: ‘What is the maximum seroprevalence under which the probability of observing zero positives in $n$ tests is less than or equal to 5%?’. Briefly, the probability of a negative test is $1-\theta$, and thus the probability of observing $n$ negative tests is ${(1-\theta )}^{n}$. Setting this equal to 0.05 and solving for $\theta$, we find $\theta =1-{0.05}^{1/n}\approx 3/n$, where the approximation is based on the power series representation of the exponential function.
Now, let us consider what happens if sensitivity and specificity are not equal to one and again zero positive tests are observed. The probability of a negative test is then $1-u-\theta (1-u-v)$. An upper 95% confidence bound on the true seroprevalence is then
$${\theta}^{\ast}=\frac{1-{0.05}^{1/n}-u}{1-u-v}\approx \frac{3/n-u}{1-u-v},$$
where the approximation is derived in a similar manner. Notice if $u>3/n$, this upper bound is less than zero. This occurs when there is inconsistency between the specified false positive rate u and the observed data; namely, this occurs when n is large enough that we would have expected at least one false positive.
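The adjusted bound is simple to compute; in the sketch below, a negative return value signals exactly the inconsistency just described.

```python
def adjusted_rule_of_three(n, u=0.0, v=0.0):
    """Upper 95% confidence bound on seroprevalence after observing zero
    positives in n tests, generalizing the rule of three to a test with
    false positive rate u and false negative rate v. A negative result
    indicates inconsistency between u and the observed data."""
    return (1 - 0.05 ** (1 / n) - u) / (1 - u - v)
```

With a perfect test this reduces to $1-{0.05}^{1/n}\approx 3/n$; with $u>3/n$ it goes negative, flagging that at least one false positive would have been expected.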
Even if seroprevalence is zero, we expect to observe some number of positive tests simply due to imperfect test specificity. Suppose we observe ${n}_{+}$ positive tests from a sample of $n$. An approximate upper 95% confidence bound on the true seroprevalence can then be obtained by applying the same correction, $({\widehat{p}}_{U}-u)/(1-u-v)$, to an upper 95% confidence bound ${\widehat{p}}_{U}$ on the per-test positive probability implied by ${n}_{+}$ positives in $n$ tests.
Data availability
Reproduction code is open source and provided by the authors at http://github.com/LarremoreLab/covid_serological_sampling (copy archived at https://archive.softwareheritage.org/swh:1:rev:262fb34c19c4bb48bdc74dad1470e4bf8bbe5a69/).
References

Assessing mumps outbreak risk in highly vaccinated populations using spatial seroprevalence data. American Journal of Epidemiology 179:1006–1017. https://doi.org/10.1093/aje/kwu014

Estimating prevalence using an imperfect test. Epidemiology Research International 2011:1–5. https://doi.org/10.1155/2011/608719

Estimation of SARS-CoV-2 infection fatality rate by real-time antibody screening of blood donors. Clinical Infectious Diseases 72:249–253. https://doi.org/10.1093/cid/ciaa849

Estimation of the basic reproduction number for infectious diseases from age-stratified serological survey data. Journal of the Royal Statistical Society: Series C 50:251–292. https://doi.org/10.1111/1467-9876.00233

Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. Imperial College COVID-19 Response Team.

If nothing goes wrong, is everything all right? Interpreting zero numerators. JAMA 249:1743–1745.

Implications of test characteristics and population seroprevalence on immune passport strategies. Clinical Infectious Diseases 20:ciaa1019. https://doi.org/10.1093/cid/ciaa1019

Software: covid_serological_sampling, version swh:1:rev:262fb34c19c4bb48bdc74dad1470e4bf8bbe5a69. Software Heritage.

Poststratification: a modeler's perspective. Journal of the American Statistical Association 88:1001–1012. https://doi.org/10.1080/01621459.1993.10476368

Births: final data for 2016. National Vital Statistics Reports: From the Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital Statistics System 67:1–55.

Software: Open-source code repository and reproducible notebooks for this manuscript.

Projecting social contact matrices in 152 countries using contact surveys and demographic data. PLOS Computational Biology 13:e1005697. https://doi.org/10.1371/journal.pcbi.1005697

Demographic patterns of blood donors and donations in a large metropolitan area. Journal of the National Medical Association 103:351–357. https://doi.org/10.1016/S0027-9684(15)30316-3

Efficient sampling of spreading processes on complex networks using a composition and rejection algorithm. Computer Physics Communications 240:30–37. https://doi.org/10.1016/j.cpc.2019.02.008

Universal screening for SARS-CoV-2 in women admitted for delivery. New England Journal of Medicine 382:2163–2164. https://doi.org/10.1056/NEJMc2009316

Software: Department of Economic and Social Affairs, Population Division. World Population Prospects 2019.

Modeling shield immunity to reduce COVID-19 epidemic spread. Nature Medicine 26:849–854. https://doi.org/10.1038/s41591-020-0895-3

Revealing measles outbreak risk with a nested immunoglobulin G serosurvey in Madagascar. American Journal of Epidemiology 187:2219–2226. https://doi.org/10.1093/aje/kwy114
Decision letter

Miles P DavenportSenior Editor; University of New South Wales, Australia

Isabel RodriguezBarraquerReviewing Editor; University of California, San Francisco, United States

Andrew AzmanReviewer

Sereina HerzogReviewer
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
The paper by Larremore et al. presents a Bayesian framework to incorporate several sources of uncertainty (test characteristics, sample size, and heterogeneity across tested subpopulations) into estimates of SARS-CoV-2 seroprevalence and derived epidemiological/transmission parameters. They then use this framework to optimize study design and sampling schemes for serosurveys. While none of the methods presented are novel per se, the paper does present a much-needed formal framework to guide the analysis and design of SARS-CoV-2 serosurveys.
Decision letter after peer review:
Thank you for submitting your article "Estimating SARS-CoV-2 seroprevalence and epidemiological parameters with uncertainty from serological surveys" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Miles Davenport as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Andrew Azman (Reviewer #1); Sereina Herzog (Reviewer #2).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary
The paper by Larremore et al. presents a Bayesian framework to incorporate several sources of uncertainty (test characteristics, sample size, and heterogeneity across tested subpopulations) into estimates of SARS-CoV-2 seroprevalence and derived epidemiological/transmission parameters. They then use this framework to optimize study design and sampling schemes for serosurveys. While none of the methods presented are novel per se, the paper does present a much-needed formal framework to guide the analysis and design of SARS-CoV-2 serosurveys. The paper was previously reviewed in another journal and the authors have provided the comments and responses. We thank them for providing these as they were very useful when evaluating the manuscript. We find that critiques have been adequately addressed.
Essential revisions:
1) As it is clear some research groups conducting large serosurveys have decided not to correct for assay performance, I think the authors missed an important opportunity to clarify the expected magnitude of biases (and false precision) from this. It would be very useful to put this in comparison to the biases and uncertainty expected from sampling and sampling of nonrepresentative demographic samples.
2) It isn't clear why the authors chose to focus on 90% credible intervals as opposed to more commonly used 95% intervals. I don't think this is a major issue but seems like one of many decisions made throughout the paper that make these interesting results one (seemingly unnecessary) step removed from more practical application.
3) Figure 5: I don't see any attempt to explain why the modes for seroprev. posteriors are fairly off from the true values. Can you expand a bit and perhaps provide some solutions on how to do better?
4) Figure S4: Useful lessons here to illustrate how large samples from subpopulations alone (e.g., newborns and blood donors) can artificially increase apparent precision in seroprevalence estimates. Perhaps I missed this somewhere in the text but this seems useful to note.
5) The serological assays modelled in the text are not typical ones used in practice today. I appreciate that this manuscript was drafted early on the pandemic but the supplementary table should at least be updated to include the characteristics of some of the main assays in use today (e.g., EuroImmun, Roche, Abbot) for comparison purposes.
6) Some of the parameter assumptions cite other modelling papers as a reference. While some of these previous papers may have been based on empirical data, others are not. Please cite references for empirical data (e.g., incubation period) otherwise just be clear that these parameters were assumed.
https://doi.org/10.7554/eLife.64206.sa1

Author response
Essential revisions:
1) As it is clear some research groups conducting large serosurveys have decided not to correct for assay performance, I think the authors missed an important opportunity to clarify the expected magnitude of biases (and false precision) from this. It would be very useful to put this in comparison to the biases and uncertainty expected from sampling and sampling of nonrepresentative demographic samples.
We now address this suggestion in two ways. First, in the text of the Introduction, Results, and Discussion, we note that a failure to adjust for test sensitivity and specificity will introduce bias. In our numerical experiments, we found no systematic way in which the biases of unrepresentative samples can be estimated without secondary data sources. However, we also note that biases from failure to adjust for sensitivity, specificity, and population demographics (via poststratification) are avoidable, while nonrepresentativeness in general is not.
Second, we highlight in Figure 4 where estimates based on uncorrected seroprevalence data would lie. This visually illustrates both the bias and false precision of using uncorrected estimates.
2) It isn't clear why the authors chose to focus on 90% credible intervals as opposed to more commonly used 95% intervals. I don't think this is a major issue but seems like one of many decisions made throughout the paper that make these interesting results one (seemingly unnecessary) step removed from more practical application.
This is a great suggestion. All plots and analyses have been reproduced using only 95% intervals.
3) Figure 5: I don't see any attempt to explain why the modes for seroprev. posteriors are fairly off from the true values. Can you expand a bit and perhaps provide some solutions on how to do better?
Figure 5 demonstrates the analysis and forecasting pipeline described in Figure 1 for a single stochastic simulation of serological test outcomes under each of the four sampling schemes. As a result, the estimates themselves are based on n=1000 samples spread across the individual age bins. For instance, even under uniform sampling, there are only 62–63 samples per bin. Consequently, rerunning the generating code for Figure 5 using different random seeds may produce estimates that over- or underestimate the ground-truth seroprevalence (used to stochastically generate the data) and the true model R_{eff}. We would absolutely expect this for the uniform sampling scheme.
However, for the other three sampling schemes, age groups with higher true seroprevalence are more likely to be sampled than those with lower seroprevalence. As a result, when information is shared across age groups, such that data from well sampled age groups inform the estimates for less sampled age groups, we expect a bias in estimates for the less sampled groups, which have systematically lower prevalence. Unfortunately, estimator bias due to unsampled or undersampled groups is not, itself, estimable a priori.
In summary, through the Bayesian modeling, we suffer the possible introduction of bias and in return obtain large reductions in uncertainty for less sampled groups. We now call this out in both the text and caption.
4) Figure S4: Useful lessons here to illustrate how large samples from subpopulations alone (e.g., newborns and blood donors) can artificially increase apparent precision in seroprevalence estimates. Perhaps I missed this somewhere in the text but this seems useful to note.
We now elevate this point explicitly in both the main text and the Discussion.
5) The serological assays modelled in the text are not typical ones used in practice today. I appreciate that this manuscript was drafted early on the pandemic but the supplementary table should at least be updated to include the characteristics of some of the main assays in use today (e.g., EuroImmun, Roche, Abbot) for comparison purposes.
We have now shifted the main text to focus on the Roche Spike IgG test, while the supplemental materials show EuroImmun and Abbott Architect data. Per FDA filings, these tests have the following characteristics:

EuroImmun IgG: sensitivity 90% (27/30), specificity >99.9% (80/80)

Roche IgG: sensitivity 96.6% (225/233), specificity >99.9% (5990/5991)

Abbott Architect IgG: sensitivity >99.9% (88/88), specificity 99.6% (1066/1070)
Analyses were redone in both the main text and supplement to reflect commonly marketed tests, and references to tests were shifted to the FDA filings from EuroImmun, Roche, and Abbott.
6) Some of the parameter assumptions cite other modelling papers as a reference. While some of these previous papers may have been based on empirical data, others are not. Please cite references for empirical data (e.g., incubation period) otherwise just be clear that these parameters were assumed.
Citations updated in Supplementary file 1 and original data sources cited where possible (rather than the modeling papers that used the original sources).
[Editors' note: we include below the reviews that the authors received from another journal, along with the authors’ responses.]
Reviewer #1 (initial submission):
This manuscript discussed a framework to design serological surveys to estimate SARSCoV2 seroprevalence and epidemiological parameters by integrating serological data into epidemic models. The manuscript is well written, and the details of epidemic simulations and inferences are well documented. Please see below for my comments and suggestions:
We thank the reviewer for this appraisal of the manuscript.
Major comments:
1) It is not clear in the manuscript whether the design of the serological surveys is one-off cross-sectional or serial cross-sectional at different time points. I think the underlying assumption is that the seroprevalence before the COVID-19 pandemic is zero. But practically if any serological survey is conducted now, it would require at least sampling at two time points so that infection attack rates could be estimated between these two time points, e.g. the recent serological survey in Geneva (Stringhini et al., 2020). Could the authors clarify their assumptions used in the simulation models and discuss if serial sampling could reduce the uncertainty in seroprevalence estimation?
Thank you for raising this issue. These assumptions have now been clarified in the Introduction and in the Discussion, where we explicitly note that our analysis is for the design of individual cross-sectional serosurveys. However, it is also the case that, given the results of an existing age-stratified serological survey, our framework would help to design a second survey that incorporates the information from the first. We now describe exactly this procedure in the Discussion.
We also considered including the concept of a model in which the seroprevalence estimates are linked from one time point to the next, such that the successive waves of the survey could build on each other by creating a larger effective sample size. For instance, if one could justifiably assume that seroprevalence should always be increasing over time, then this information could be productively included in a statistical analysis. However, the evidence around the duration of positive antibody responses for SARS-CoV-2 remains mixed (for instance, see Herzog et al., Figures 2A–C) and for now, we prefer not to include such strong monotonicity assumptions.
Separately, we have updated the citation for the Stringhini et al. paper to its nowpublished version, and note that our remake of Figure 5 (see reviewer #2 suggestions) uses empirical estimates from that paper to better ground our work in realistic potential outcomes.
2) It seems the time between infection and seropositivity was not considered in the simulation model (Table S2 if the recovery rate is 0.5). It takes 7–21 days for antibodies to develop to detectable levels (depending on whether IgM or IgG are antibody markers of interest). Thus, the inference about peak time and peak incidence presented in Figure 4 are difficult to interpret without the timing of sampling after considering the time between infection and development of antibodies.
We now address this idea two ways. First, we have rewritten parts of the manuscript to clarify that we are not inferring peak timing and height, but rather, we are using the serosurvey results—with uncertainties—as the initial conditions (via the depletion of susceptibles) from which forward-looking simulations could be run.
Second, we now discuss the fact that, due to delays in the accumulation of IgM/IgG, modeling studies that incorporate seroprevalence estimates should acknowledge such potential underestimates in interpreting their findings. This issue is inherited by all serological studies.
3) The MDI strategy presented in Figure 5 is interesting. But as mentioned earlier, the timing of sampling is not clear in the model assumptions. Since the authors have included age-specific susceptibility and POLYMOD-type contact matrices in the model structure, it is expected that the susceptibles deplete at different rates for different population subgroups stratified by age and susceptibility. This will inevitably affect the next generation matrix at different time points as the pandemic unfolds. How sensitive are the four sampling strategies to the assumption of the timing of sampling?
When age-stratified seroprevalence estimates are available, differences in susceptible depletion rates are accounted for in the calculation of R_{eff}, as well as in the MDI recommendation, due to the term (I − D_{θ}) in equation S8. In essence, the fact that prior information about contact structure and depletion of susceptibles is included in both the sample recommendation (MDI) and the eventual calculation (here, R_{eff}) means that MDI is able to produce better estimates of R_{eff} and other model-based calculations.
The effect of the (I − D_{θ}) term is that the next-generation matrix affects fewer and fewer individuals, on a per-subpopulation basis, as the epidemic continues, because fewer and fewer are susceptible. Therefore, when subpopulation seroprevalence estimates are available mid-epidemic, the MDI recommendation is to oversample subpopulations that are critical for the ongoing dynamics and those subpopulations that are likely to have higher-variance estimates due to their increased seroprevalence. These points are now emphasized in the Materials and methods, and have been further expanded in Supplementary Text S3.
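To illustrate the mechanism, a minimal sketch of how depleting susceptibles shrinks R_{eff} via scaling of a next-generation-style matrix (this is not the paper's exact equation S8; the contact matrix, seroprevalence values, and function names below are invented for illustration):

```python
import numpy as np

def r_eff(contact_matrix, theta):
    """Spectral radius of a next-generation-style matrix after each
    subpopulation's row is scaled by its remaining susceptible
    fraction (1 - theta_i)."""
    M = np.diag(1.0 - np.asarray(theta, dtype=float)) @ np.asarray(contact_matrix, dtype=float)
    return float(max(abs(np.linalg.eigvals(M))))

# Toy two-group example: a fully susceptible population vs. one in which
# half of each subpopulation is already seropositive.
C = [[2.0, 1.0], [1.0, 2.0]]
before = r_eff(C, [0.0, 0.0])  # ≈ 3.0
after = r_eff(C, [0.5, 0.5])   # ≈ 1.5
```

Uneven depletion across subpopulations changes not just the magnitude but also the dominant eigenvector, which is why the MDI recommendation shifts as the epidemic unfolds.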
4) Following the comment above, would the estimation of the next-generation matrix from seroprevalence be delayed, considering the time between infection and seropositivity? How would this delay affect the R_{eff} estimation and the four sampling strategies in Figure 5?
We take this question to ask whether one could consider seroprevalence estimates taken at a particular time (or, more realistically, over a particular set of weeks), and then perform a sort of forward adjustment for the fact that (1) individuals deemed seronegative at the time of sampling may now show robust antibody responses, post-convalescence, and (2) the dynamics of disease spread have continued to deplete susceptibles. In other words, could one adjust the estimate of R_{eff} to account for the delay between sampling and the calculation?
We believe that the answer to this question is yes, but to do so would require a model of disease dynamics and the ability to specify initial conditions from serological data. In other words, rather than estimating R_{eff} at the time of sampling, one could run the model forward by a few weeks and then estimate R_{eff} at that point instead. The uncertainty in the sampling-time estimates should therefore produce a cone of possible future-time estimates—precisely the type of modeling possible through the present work!
Minor comments:
1) Could the colour gradients be replaced by lines or contour plots for Figure 2A and Figure 3B? The current version is a little difficult to read.
Great suggestion. We have made both into contour plots as suggested, as well as similar plots in Figures S1 and S2. We have also changed the shading in histograms for Figure 5 to be more legible.
Reviewer #1 (comments on revision):
The authors have addressed all my comments.
Reviewer #2 (initial submission):
"Estimating SARS-CoV-2 seroprevalence and epidemiological parameters with uncertainty from serological surveys" describes a mathematical framework that addresses uncertainty from multiple sources (test characteristics, sample size, heterogeneity). Such a framework allows the final modelled forecasts to admit uncertainty estimates in the forward direction, or alternatively allows test design and allocations to be tailored to fit desired uncertainty tolerances in the reverse direction (Figure 1). The availability of such frameworks is highly desirable, given the potential impact of national-level policy interventions based on such modelling (e.g. citation [12], which contributed to the U.K. lockdown response).
The presented framework considers three sources of uncertainty: imperfect sensitivity/specificity of test results, non-representative sampled populations, and uncertain relationship between seropositivity and immunity. Overall, the derivation of the modelling as presented in the supplementary material appears to be sound, and the various explanatory examples demonstrating simulated outcomes over different parameters (e.g. subpopulations, overall seroprevalence, sampling strategies, test kits, etc) are fairly comprehensive. Given the potential far-reaching impact of accurate modelling of seroprevalence on public health (including uncertainty estimates), this manuscript would be a timely addition to the literature.
We thank the reviewer for this positive assessment.
There may however remain some issues for the authors' consideration:
1) While it is noted at the end of the Introduction section that "…this framework can be used in conjunction with any model", it appears that the framework is broadly applicable to most types of epidemiological modelling tasks in general (not limited to SARS-CoV-2), since the integrated sources of uncertainty are usually present in such models. As such, the authors might consider emphasizing both the generalizable nature of the framework, as well as the chosen application (SARS-CoV-2), in the title.
The reviewer is correct, of course, that this work is a more general framework with applications beyond SARS-CoV-2. We now emphasize the generality of our work in the Introduction and Discussion, and, if approved by the Editors, suggest that perhaps a new title could simply omit SARS-CoV-2: “Estimating seroprevalence and epidemiological parameters with uncertainty from serological surveys”
2) The description of the method refers to subpopulations in general (e.g. Figure 1), although it seems that the subpopulations explored throughout the paper refer to age-based subpopulations (in particular, 5-year age bins). Moreover, while several real-life age distributions (binned age_{i} histograms) were explored (as described in the Data Sources section), the same seropositivity values (theta_{i}) were assumed for each age_{i} subpopulation (Table S2), for all age distributions.
While Bayesian hierarchical model sampling was deemed to produce sufficient estimates of the posterior distribution (S1.3/Figure 5), it might be noted that the various sampling strategies were evaluated solely on (age-based) subpopulations with relatively low variance in subpopulation seroprevalence (absolute difference capped at 2%, for an absolute average overall seroprevalence of about 10%). This might not be the case for other types of subpopulations; there may possibly be significant differences in seropositivity between subpopulations defined on other factors (e.g. geographical, race/ethnicity, etc.). The authors might consider discussing/evaluating their various sampling strategies under circumstances of higher variance in subpopulation seroprevalence.
This point is well taken. Indeed, there can be far larger variation, particularly if estimates are stratified by other characteristics. Studies from New York City, Karachi, and Mumbai (where positive test rates were 54% and 16% in slums and non-slums, respectively) have all found substantial heterogeneities by neighborhood which exceed the heterogeneities assumed in our simulations, as well as the heterogeneities by age found in Swiss (Stringhini et al.) and Belgian (Herzog et al.) serosurveys.
We now discuss this point in the Materials and methods and Discussion sections of the paper in two ways. First, when there are large sample sizes across heterogeneous subpopulations, the parameters of the hyperprior will be overwhelmed by the data, and thus heterogeneity will not be an issue. Second, when sample sizes are low and heterogeneity is high, the hyperprior parameters for the variance between subpopulations can be adjusted to better accommodate larger variation between subpopulations. Even in these situations, the use of a hyperprior to share information across sparsely sampled (or even unsampled) bins is generally preferred.
3) Further, the modelling of seropositivity values (theta_{i}) used throughout for simulation does not appear to have been justified with actual reported data (as for the contact matrices C_{ij}). In particular, its modelling as an absolute deviance from the average seroprevalence seems unlikely to properly reflect actual seroprevalence distributions (e.g. "Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva, Switzerland (SEROCoV-POP): a population-based study", Stringhini et al. (2020), suggests significantly lower seroprevalences for the very young and the elderly), particularly when average seropositivity is low (and would be problematic when average seropositivity is below 0.014, since this would produce negative values for some subpopulations). The authors might consider justifying the seropositivity value parameter modelling in greater detail, ideally with reference to published work.
To address this suggestion, we have now conducted an analysis identical to the previous Figure 5, but using age-stratified seroprevalence estimates as reported by Stringhini et al. (2020), as suggested. We used values reported therein for only seropositive and seronegative samples (Table 1), but discarded indeterminate results. We re-estimated seroprevalence values by age directly from test results as posterior means, using the sensitivity and specificity reported by the authors. This new analysis is shown in the remade version of Figure 5.
Because our submitted manuscript was developed prior to the publication of high-quality age-stratified analyses, we did not base our synthetic seroprevalences θ_{i} on real-data estimates.
However, since that time, multiple high-quality studies have been conducted, including the referenced Lancet study by Stringhini et al. All studies suggest more variation than we assumed in our synthetic seroprevalence estimates, though both values and trends by age vary, as summarized below:
19.7% (75y+) to 29.9% (0–17y)
0.9% (5–9y) to 10.8% (20–49y) [MLE from Table 1 values]
1.8% (5–9y) to 10.8% (20–49y) [Bayesian posterior mean with uniform prior from Table 1 values]
1.4% (20–30y) to 5.8% (0–10y)
3.8% (60–70y) to 15% (90y+)
3.7% (60–70y) to 11.1% (0–10y)
2.0% (60–70y) to 7.5% (10–30y)
2.1% (60–80y) to 9.0% (10–20y)
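The "[MLE from Table 1 values]" annotation above refers to the standard Rogan–Gladen correction of raw positivity for test characteristics; a minimal sketch (the numbers in the example are invented, not taken from any of the studies listed):

```python
def rogan_gladen(raw_positivity, sensitivity, specificity):
    """MLE of true seroprevalence from raw test positivity, corrected
    for an imperfect test and clipped to the valid range [0, 1]."""
    est = (raw_positivity + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(max(est, 0.0), 1.0)

# Hypothetical: 15% raw positivity, 93%-sensitive, 97.5%-specific test.
print(round(rogan_gladen(0.15, 0.93, 0.975), 3))  # 0.138
```

When raw positivity falls below the false-positive rate, the uncorrected MLE goes negative and is clipped to zero, which is one reason a Bayesian posterior mean (which stays strictly inside the interval) can differ from the MLE in low-prevalence strata.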
More broadly, the generation of synthetic and realistically varying age-stratified seroprevalence values for simulations is a challenge, due to the need to include variation that scales realistically with overall seroprevalence. Multiplicative scaling between age groups is a reasonable approach to generate synthetic values for lower seroprevalences (e.g. variation from 1% to 5% scales up to variation from 2% to 10%), but is unlikely to be reasonable at higher values. Because our numerical experiments scaled mean seroprevalence values from a minimum of 5% to a maximum of 50%, we therefore chose the variation to be additive.
As an aside, we note that the total range in the synthetic seroprevalence values ranged from mean − 0.014 to mean + 0.02, and the minimum value of the mean was chosen to be 0.05. We would therefore like to reassure the reviewer that no negative values were nonsensically drawn. However, we also feel that our explanation of these values and how they were used was unclear in our submission, and have therefore updated the manuscript (and Table S2).
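The additive construction can be sketched as follows (the offsets below are placeholders standing in for the Table S2 values, and the clip is a safeguard that the manuscript's parameter range never actually triggers):

```python
import numpy as np

def synthetic_seroprevalences(mean_theta, offsets):
    """Subpopulation seroprevalences built as additive offsets around a
    common mean, clipped to [0, 1] as a safeguard."""
    return np.clip(mean_theta + np.asarray(offsets, dtype=float), 0.0, 1.0)

# At the minimum mean of 0.05 and the largest negative offset of -0.014,
# all values remain strictly positive.
values = synthetic_seroprevalences(0.05, [-0.014, 0.0, 0.02])
```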
4) Still on the seroprevalence modelling, the formulation of theta_{i} as theta^{~} + [−0.014, −0.012, …, 0.012] would appear possibly inconsistent, depending on the actual subpopulation distributions used. For example, consider a (young) population that is evenly split between the first two age bins {0–4, 5–9} only. Then, the average seroprevalence derived from the definition of theta_{i} on these subpopulations would appear to be theta^{~} − 0.013, which contradicts the initial definition of theta^{~} being the average seroprevalence.
This is an excellent point, and one which can easily be clarified. The synthetic subpopulation seroprevalences were chosen solely for illustrative purposes to demonstrate the performance of the method at various overall seroprevalence values. We now clarify this in the structure of Table S2, which has been split into values used in dynamical models (e.g. POLYMOD contact matrices) and values used as synthetic test parameters (e.g. values of n and θ).
Further, we note that θ^{~} corresponds to the unweighted average of subpopulation seroprevalences. This reflects a modeling choice: each θ_{i} is drawn from a Beta distribution with the same mean, and during inference, that mean is estimated as the unweighted average across subpopulation means. In this way, all subpopulations have an equal influence on the floating mean of the Beta prior. However, to reassure a skeptical reviewer, we point out that the difference between the demography-weighted average and the unweighted average in the synthetic example of the manuscript is a seroprevalence difference of 0.001 (i.e. an unweighted average of 0.5 corresponds to a weighted average of 0.501).
A few minor suggestions follow:
5) While SEIR models are explored, the authors might consider briefly discussing the SEIRS extension, given recent indications that antibody immunity to COVID-19 possibly wanes over months.
In light of this suggestion, and the suggestions of reviewer 1, we now discuss this point in the manuscript, inclusive of various immunological scenarios including partial protection, waning protection, and complete and durable protection (e.g. as in Saad et al.). Critically, however, we emphasize that our framework should work with forward simulations from any model, including S[E]IRS models and stratified agentbased models.
As a point of interest, we also note that whether or not protection wanes may be separate from whether or not seropositivity wanes, further complicating modeling more broadly.
6) Backtesting the proposed model with actual real-world data (vs. simulated cases), and comparing the predictions to actual observed trajectories, would appear to best illustrate the capabilities of the model. However, it is recognized that this may not be feasible.
We agree that this would be interesting, but also agree about feasibility. Indeed, our goal with this paper was to demonstrate how models, generically defined, could be used in both forward simulation and for serosurvey design, rather than to validate or calibrate any particular model.
7) Citation [20] seems to have just been published in Nature Medicine, and might be updated in the References section.
We have updated this citation, along with citations to other nowpublished works.
Reviewer #2 (comments on revision):
We thank the authors for addressing our major concerns from the previous review round, in particular the inclusion of real-life survey data with higher age-group variances in the example analyses. There are no further comments.
Reviewer #3 (initial submission):
The article "Estimating SARS-CoV-2 seroprevalence and epidemiological parameters with uncertainty from serological surveys" by Larremore and colleagues presents a Bayesian hierarchical and modeling framework for estimating important epidemic quantities using seroprevalence studies. One of the key issues the authors advance is that Bayesian frameworks allow for propagation of uncertainty, which can then be immediately used as an input into an epidemic model if the Bayesian and modeling frameworks are linked. The concept is appealing, but the article, like many articles that use Bayesian frameworks, couches important assumptions in statistical considerations that make these assumptions appear unimportant. This is problematic in the entire field, and allows Bayesian statisticians to tinker with priors that yield many variants of posteriors without grappling with the assumptions that go into the choice of priors.
A few comments below, and some summary suggestions at the bottom:
"We denote the posterior probability that the true population seroprevalence is equal to θ, given test outcome data X and test sensitivity and specificity characteristics, as Pr(θ|X)". This notation implies that the posterior distribution of θ depends on X only. Specific notation for all other inputs is important, as θ depends on multiple factors simultaneously.
We agree and have modified the notation for the posterior distribution to be Pr(θ|X, se, sp). These changes also affect the notation in Figure 1, to match.
"Because sample size and outcomes are included in X, and because test sensitivity and specificity are included in the calculations, this posterior distribution over θ appropriately handles uncertainty" – just because some elements are in the conditional portion of the probability expression does not mean that uncertainty is handled appropriately. This is an important step that needs more careful handling.
We agree, and have rephrased this text to be more precise about which sources of uncertainty our calculations include. Specifically, we now write, “By explicitly conditioning on the data and test characteristics, the posterior distribution over θ captures the uncertainty in seroprevalence due to limited sample sizes and an imperfect testing instrument.”
The heart of the methods, including all the important but unstated assumptions, are in the Appendix S1. There are several steps that are glossed over and important assumptions are not explicit or tested.
Plugging in equation S2 into S1…that may be ok but there's something that requires more thinking about plugging in a single-test probability expression (S2) into a population binomial function (S1). I appreciate the authors being clear that S2 is a single-test probability (not Pr(n_{+}|theta, u, v)), but it calls into question whether or not it can be then substituted back into S1, since those are different quantities. More explicit derivation there would be important.
Equations (S2) and (S3) are standard in the testing literature. We have added a citation to Diggle (2011), where both (S2) and (S3) appear in slightly different notation. Given the marginal probability of a random individual testing positive in (S2), the only additional assumption in the binomial model in (S3) is that the test outcomes of the individuals in the sample are independent. We added a statement to emphasize that this independence assumption underlies (S1) and (S3), and point readers to an example in which samples were modeled as non-independent within households due to correlated exposure.
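This standard construction can be sketched with a grid approximation under a uniform prior (the survey numbers below are invented for illustration, and the function name is ours, not the paper's):

```python
import numpy as np
from math import comb

def seroprevalence_posterior(n_pos, n, se, sp, n_grid=1001):
    """Grid approximation to Pr(theta | X, se, sp) under a uniform prior.
    Each sampled individual tests positive with probability
    p(theta) = se * theta + (1 - sp) * (1 - theta)  (cf. eq. S2),
    and the n_pos positives among n independent tests are binomial (S3)."""
    theta = np.linspace(0.0, 1.0, n_grid)
    p = se * theta + (1.0 - sp) * (1.0 - theta)
    likelihood = comb(n, n_pos) * p**n_pos * (1.0 - p)**(n - n_pos)
    posterior = likelihood / (likelihood.sum() * (theta[1] - theta[0]))
    return theta, posterior

# Hypothetical survey: 15 positives in 100 samples, 93%-sensitive,
# 97.5%-specific test; the posterior peaks near the test-corrected
# estimate rather than the raw 15% positivity.
theta, post = seroprevalence_posterior(15, 100, 0.93, 0.975)
```

Because the likelihood depends on theta only through p(theta), the posterior mode under a uniform prior coincides with the test-corrected maximum-likelihood estimate.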
Then, all the assumptions that go into moving from S3 to S4 are glossed over. Why assume an uninformative prior on theta? Why not assume some function with density around Pr(n_{+})? I don't know what is the best prior here, but I don't think the authors do, either, and those assumptions are important to make explicit and assess as they propagate through the analysis. Similarly with the assumptions around the use of the Beta distribution. Why that distribution? What are the implications of using different heuristics to anchor the priors? What are the implicit assumptions in this algorithm? Statistical considerations and expedience are unsatisfactory explanations for issues with a lot of real-world biology that can be used for setting of priors.
We place an uninformative prior on θ because we do not wish to bias the prevalence estimates in any way. For example, in the current SARS-CoV-2 epidemic, limited testing early in the pandemic is believed to have missed numerous infections and therefore seroprevalence might be expected to be much larger than prevalence estimates based on virological tests. In this setting, placing a uniform prior on the seroprevalence can be viewed as a conservative prior, which will result in wider credible intervals for the seroprevalence estimate compared to what would be obtained from an informative, but potentially misspecified, prior.
We use a Beta distribution for the mean seroprevalence parameter in the age-structured model since it is a distribution on the interval [0,1] which can be parameterized in terms of its mean and variance. Although different priors would result in different posteriors, our diffuse prior and weakly informative hyperprior were selected to be conservative in the estimation, thereby producing wider credible intervals reflecting higher uncertainty. An exploration of the impacts of particular choices of prior and hyperprior for a particular seroprevalence study can be found in recent work by Gelman and Carpenter, to which we now refer readers in a new paragraph in the Discussion.
The process for subpopulations is not entirely clear. It seems the authors choose a single prior for all subpopulations, which is then updated with subpopulationspecific data. However, it is unclear how the common prior is selected. While this may seem trivial, it can have very large implications on the posterior. Assuming highvariance priors will result in highvariance posteriors, and vice versa. Without more careful attention to the choice of priors, the authors allow the statistical cloak of Bayesian analysis to disguise the central assumptions that drive the outcomes of such analyses.
The common prior was chosen for the age-specific seroprevalence estimates to construct a hierarchical model. Hierarchical models have the key advantage that they pool information across age groups in constructing the age-specific estimates. However, the common prior does not imply that we expect no heterogeneity across subpopulations. Indeed, we investigate scenarios with heterogeneous seroprevalence levels in the manuscript, as do others, including Stringhini et al. (2020) and Gelman and Carpenter (2020). As noted above, we now include discussion of prior distributions in the manuscript’s Discussion.
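The pooling behavior can be illustrated with a conjugate toy model in which every subpopulation shares one Beta prior (a perfect test is assumed here for brevity, and the prior parameters below are arbitrary placeholders, not the manuscript's hyperprior):

```python
def partially_pooled_means(positives, totals, a=2.0, b=18.0):
    """Per-subpopulation posterior means under a shared Beta(a, b) prior.
    Sparsely sampled groups shrink toward the prior mean a / (a + b);
    well-sampled groups are dominated by their own counts."""
    return [(a + x) / (a + b + n) for x, n in zip(positives, totals)]

# An unsampled group falls back to the shared prior mean (0.1 here),
# while a group with 480/1000 positives stays near its raw rate.
estimates = partially_pooled_means([0, 480], [0, 1000])
```

In the full hierarchical model the shared prior's mean and variance are themselves given hyperpriors and estimated from the data, rather than fixed as in this sketch, which is precisely why large, heterogeneous samples overwhelm the common prior.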
The integration of the posterior estimates into SEIR models leaves important pieces out. Specifically, it appears samples from the posterior distribution were "initially placed into the "recovered" compartment of the model." The prevalence estimates should not be used to parametrize the initial conditions. Indeed, parameterizing the model so as to achieve the estimated prevalence – use theta as a calibration target – is central to effective use of seroprevalence data. After all, that prevalence is achieved at a moment in time, and reflects (with some caveats) the cumulative incidence of regional infections and transmission from the beginning of the epidemic up to that point. That leaves an important question of t_{0} for the epidemic, but those are the issues that modelers should grapple with in incorporating seroprevalence data into SEIR models.
We emphasize that our effort to connect serological studies and modeling is designed for forecasting and forward simulation, not backward inference of past epidemiological parameters. To clarify this point, we now describe models as forward simulations, to avoid confusion.
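A minimal forward-simulation sketch in this spirit, in which one posterior seroprevalence draw seeds the recovered compartment (the rates, seed size, and horizon below are illustrative placeholders, not the manuscript's Table S2 parameters):

```python
def seir_forward(theta_draw, beta=0.5, sigma=1/3, gamma=0.2, i0=1e-4, days=150):
    """Euler-stepped SEIR model run forward in time, with the recovered
    compartment initialized from one posterior draw of seroprevalence."""
    s, e, i, r = 1.0 - theta_draw - i0, 0.0, i0, theta_draw
    for _ in range(days):
        new_e = beta * s * i  # new exposures
        new_i = sigma * e     # E -> I transitions
        new_r = gamma * i     # I -> R transitions
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
    return s, e, i, r

# Repeating this over many posterior draws of theta yields a distribution
# ("cone") of projected trajectories rather than a single forecast.
```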
In summary, a few suggestions:
The article would be better if it focused on just one of the two key steps – either the estimation of prevalence, or the integration of prevalence into models. I think the authors have more strength in the former, as their handling of the latter is relatively superficial.
Choosing priors is not merely a statistical exercise. Even choosing weakly informative priors is nevertheless a choice with important implications for the posterior distribution, let alone choosing Beta functions, despite their convenient properties.
The choice of priors – from the very origins of Bayesian statistics – should take into account knowledge about the real world that is then updated. If there is truly no prior information, then frequentist statistics can be just as useful and there is no need for Bayesian frameworks.
However, that is rarely the case, even in the case of the novel coronavirus. Priors about seroprevalence can be informed by factors such as case counts, population density, demographic structures, and more.
As noted above, we now discuss this point directly, and frame the tradeoff between uninformative and informative priors in terms of being conservative about uncertainty vs justifying additional assumptions. While we understand that this review of a complicated topic is far from all there is to say on the matter, we hope that those interested in the topic of using orthogonal data sources to create informative priors will inform themselves appropriately.
A truly interesting paper would examine the sensitivity of Bayesian models to the choice of priors in estimating covid seroprevalence. I would be interested in reading that paper!
Indeed! We suggest Gelman and Carpenter, 2020, which investigates the statistically controversial Santa Clara seroprevalence study from Bendavid et al.
Reviewer #3 (comments on revision):
I appreciate the opportunity to revisit the article "Estimating SARS-CoV-2 seroprevalence and epidemiological parameters with uncertainty from serological surveys" by Larremore and colleagues. The authors have revised the manuscript to include more age-structured data and a few minor text revisions (little of substance appears to have changed in the tracked manuscript). I appreciate the authors' clear exposition of their goals and the consistent thread of producing seroprevalence estimates through to their incorporation into forward-looking epidemiologic models.
However, the authors continue to gloss over the hard issues with Bayesian approaches, and continue to present this approach as an improvement over existing approaches. For example, there are several places where the authors imply that their approach affords the "appropriate" propagation of uncertainty. Along a similar vein, the response to the choice of uninformative priors is described as "conservative". While the authors' intent in using the word "conservative" is "wider confidence intervals", the choice of the word "conservative" points to the likely implicit biases of the authors, which deserve more careful consideration.
Choosing the priors – and the width of the posterior uncertainty – is a choice which has tradeoffs. The suggestion that this is a "conservative" choice suggests the authors may think estimating too-wide credible/confidence intervals is preferable to estimating too-narrow intervals. However, this is a bias. In genetics, using Bonferroni-type "conservative" thresholds for associations leads to discarding many true discoveries that may have important implications for science and medicine. Similarly, choosing priors that imply relatively large uncertainty around SARS-CoV-2 (or any antibody) seroprevalence could have important implications for things like epidemic projections, with large consequences. Both underestimating and overestimating projected infections or other outcomes have downsides, and being more "conservative" (in the authors' sense) is not "appropriate" or preferable; rather, it is a choice. A close look at the tradeoffs involved would lend much greater credibility to the paper.
An important related challenge is that Bayesian models – to date, given their relatively infrequent use compared to frequentist statistics – obscure important assumptions and allow investigators degrees of freedom that are hard to identify. The authors reference a Gelman–Carpenter analysis that had done just that. Gelman and Carpenter chose one prior distribution, examined the posterior distribution, didn't like the findings, and Gelman noted "So I'll replace the weakly informative half-normal(0, 1) prior on σ_{sens} with something stronger: σ_{sens} ~ normal(0, 0.2)." This kind of investigator-dependent statistical sleight-of-hand is common in Bayesian statistics but is hard to identify (it is rarely so explicitly noted) and rarely examined with balance.
Indeed, the authors correctly acknowledge that "specifying such priors appropriately relies on additional information and/or assumptions … which may be sparse, particularly during an unfolding pandemic." And this brings up the last point. When the prior information is sparse, frequentist approaches provide a more transparent and familiar framework for analysis. Both Bayesian and frequentist approaches have their place – and both have important strengths and limitations. And a novel phenomenon (such as COVID-19) with little prior information may well be the kind of situation where frequentist approaches are advantageous relative to Bayesian approaches. Indeed, nearly all COVID-19 seroprevalence studies have used frequentist statistical approaches exactly for that reason: so much is unknown about COVID-19, let us not add obfuscation to uncertainty. (As examples with COVID-19, see Anand, Lancet 2020; Pollán, Lancet 2020; and Sood, JAMA 2020. For a review of seroprevalence studies, see Ioannidis, medRxiv doi 10.1101/2020.05.13.20101253.)
In the end, I do not think Bayesian approaches are, to date, in a place where they provide an improvement over frequentist approaches for seroprevalence estimation (nor for providing inputs into SEIR models). On the other hand, a balanced comparison of their merits and limitations would go a long way to furthering the field.
First, the core problem solved in our work is not solvable using frequentist methods. Our manuscript is about the interplay between epidemiological modeling and the design and analysis of serosurveys. To fully connect the two requires inferences about seroprevalence and uncertainty to be done at the same level of age (or subpopulation) stratification. These inferences are not possible using frequentist methods, especially for the convenience samples like blood donations or neonatal dried blood spots which we develop and analyze in the paper, since the samples inherently do not contain individuals from all subpopulations. The Materials and methods in our paper, whose assumptions we have been entirely clear and transparent about, enable this connection. Indeed, the analysis methods of seroprevalence in the three research articles cited by Reviewer #3 are insufficient to accomplish this key task—and perplexingly, one does not even correct for the sensitivity and specificity of the tests (Pollán et al.).
Second, Bayesian methods are widely and increasingly used precisely because of their flexibility to provide transparent inference in contexts where traditional frequentist methods fall short. Indeed, many journals, including this one, have published high-impact papers based on Bayesian analyses on a variety of topics, including H1N1, malaria, depression, and infertility. In fact, the landmark seroprevalence study by Stringhini et al., which the other reviewers noted and on which we based our revised analysis of real-world data, is itself a clear, high-impact example of a Bayesian analysis in the context of SARS-CoV-2 seroprevalence. Reviewer #3 takes issue with Bayesian methods in general; however, the reviewer’s opinion is out of step with the broader literature, past publications in infectious disease epidemiology and modeling, and mathematical modeling and inference under uncertainty.
In sum, although Reviewers #1 and #2 approved of our manuscript—and indeed improved it with constructive comments that called for new and more detailed analyses of real data—Reviewer #3 is narrowly focused on the presence of Bayesian inference methods in the manuscript, full stop, and has provided no clear, constructive suggestions. It is our view that this has led to the reviewer’s very targeted dismissal of the paper without considering its goals.
We are not interested in relitigating an imagined dispute between Bayesians and frequentists, and instead appeal to “the recognition that each approach has a great deal to contribute to statistical practice and each is actually essential for full development of the other approach.” We are interested in the topic on which our manuscript is focused: bridging the divide between serological studies and epidemiological modeling, for the improvement of both.
https://doi.org/10.7554/eLife.64206.sa2

Article and author information
Author details
Funding
Morris-Singer Fund for the Center for Communicable Disease Dynamics
 Stephen M Kissler
 Caroline O Buckee
 Yonatan H Grad
National Cancer Institute (1U01CA261277-01)
 Daniel B Larremore
 Yonatan H Grad
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
The authors thank Nicholas Davies, Laurent Hébert-Dufresne, Johan Ugander, Arjun Seshadri, and the BioFrontiers Institute IT HPC group. The work was supported in part by the Morris-Singer Fund for the Center for Communicable Disease Dynamics at the Harvard T.H. Chan School of Public Health. DBL and YHG were supported in part by the SeroNet program of the National Cancer Institute (1U01CA261277-01). Reproduction code is open source and provided by the authors at github.com/LarremoreLab/covid_serological_sampling (Larremore, 2021; copy archived at swh:1:rev:262fb34c19c4bb48bdc74dad1470e4bf8bbe5a69).
Senior Editor
 Miles P Davenport, University of New South Wales, Australia
Reviewing Editor
 Isabel Rodriguez-Barraquer, University of California, San Francisco, United States
Reviewers
 Andrew Azman
 Sereina Herzog
Publication history
 Received: October 21, 2020
 Accepted: March 4, 2021
 Accepted Manuscript published: March 5, 2021 (version 1)
 Version of Record published: March 19, 2021 (version 2)
 Version of Record updated: April 8, 2022 (version 3)
Copyright
© 2021, Larremore et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
 2,394 Page views
 280 Downloads
 39 Citations
Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.