SARS-CoV-2 antibody dynamics in blood donors and COVID-19 epidemiology in eight Brazilian state capitals: A serial cross-sectional study
Abstract
Background:
The COVID-19 situation in Brazil is complex due to large differences in the shape and size of regional epidemics. Understanding these patterns is crucial to understand future outbreaks of SARS-CoV-2 or other respiratory pathogens in the country.
Methods:
We tested 97,950 blood donation samples for IgG antibodies from March 2020 to March 2021 in 8 of Brazil’s most populous cities. Residential postal codes were used to obtain representative samples. Weekly age- and sex-specific seroprevalence was estimated by correcting the crude seroprevalence for test sensitivity, specificity, and antibody waning.
Results:
The inferred attack rate of SARS-CoV-2 in December 2020, before the Gamma variant of concern (VOC) was dominant, ranged from 19.3% (95% credible interval [CrI] 17.5–21.2%) in Curitiba to 75.0% (95% CrI 70.8–80.3%) in Manaus. Seroprevalence was consistently smaller in women and donors older than 55 years. The age-specific infection fatality rate (IFR) differed between cities and consistently increased with age. The infection hospitalisation rate increased significantly during the Gamma-dominated second wave in Manaus, suggesting increased morbidity of the Gamma VOC compared to previous variants circulating in Manaus. The higher disease penetrance associated with the health system’s collapse increased the overall IFR by a minimum factor of 2.91 (95% CrI 2.43–3.53).
Conclusions:
These results highlight the utility of blood donor serosurveillance to track epidemic maturity and demonstrate demographic and spatial heterogeneity in SARS-CoV-2 spread.
Funding:
This work was supported by Itaú Unibanco ‘Todos pela Saude’ program; FAPESP (grants 18/143890, 2019/215850); Wellcome Trust and Royal Society Sir Henry Dale Fellowship 204311/Z/16/Z; the Gates Foundation (INV 034540 and INV034652); REDSIVP (grant HHSN268201100007I); the UK Medical Research Council (MR/S0195/1, MR/V038109/1); CAPES; CNPq (304714/20186); Fundação Faculdade de Medicina; Programa Inova FiocruzCE/Funcap  Edital 01/2020 Number: FIO016700065.01.00/20 SPU N°06531047/2020; JBS – Fazer o bem faz bem.
Editor's evaluation
This article describes a large and compelling COVID-19 serosurvey in Brazil that, when combined with death data, provides an estimate of the infection fatality ratio. This valuable study highlights both the strengths and challenges of blood donor serosurveillance in a pandemic environment where multiple waves of infection occur and immune responses wane relatively quickly.
https://doi.org/10.7554/eLife.78233.sa0
Introduction
Brazil has experienced one of the world’s most significant COVID-19 epidemics, with over 22 million cases and 621,000 deaths reported as of 14 January 2022. However, this national picture masks important subnational heterogeneity, with extensive variation in SARS-CoV-2 spread between population groups (Li et al., 2021) and locations (Castro et al., 2021; Hallal et al., 2020) as well as regional differences in the stringency of non-pharmaceutical interventions (de Souza Santos et al., 2021).
Understanding the drivers of these differences is crucial, both retrospectively as a means of evaluating past attempts at controlling spread, and as a guide to the potential impact of future transmission. Indeed, a significant fraction of the COVID-19 burden in Brazil was driven by the emergence of the Gamma (P.1) variant of concern (VOC) in November 2020, which drove extensive resurgence of transmission following its apparent emergence in the Amazonas State capital city of Manaus. This resurgence occurred despite evidence of high levels of population-level immunity that should have hindered further transmission (Buss et al., 2021), a phenomenon attributed to the Gamma VOC's likely increased transmissibility and ability to partially evade immune responses (Faria et al., 2021). Subsequent spread to the rest of Brazil led to similar resurgence, extensive transmission, and disease burden leading to substantial pressure on health systems (Brizzi et al., 2021; de Oliveira et al., 2021; Martins et al., 2021). As with the first epidemic wave, the degree and extent to which different locations were affected varied markedly. Understanding the drivers of this variation is crucial to shed light on how and why SARS-CoV-2 spreads across different populations, and how past epidemics shape subsequent transmission of the virus. More generally, because previous natural infection may enhance vaccine response (Crotty, 2021; Reynolds et al., 2021; Stamatatos et al., 2021), understanding the extent of previous exposure in the country may have important implications for the development of epidemic waves driven by new variants in the context of the ongoing large-scale, nationwide vaccination campaign.
Here, we analyse the divergent epidemic SARS-CoV-2 dynamics in eight of the biggest Brazilian cities (Belo Horizonte, Curitiba, Fortaleza, Manaus, Recife, Rio de Janeiro, Salvador, and São Paulo). We estimate the seroprevalence over time for these cities disaggregated by age and sex using repeated cross-sectional convenience samples of routine blood donors collected from March 2020 to March 2021. We also provide estimates for the age-specific infection fatality rates (IFR, defined as the number of deaths per infection) and infection hospitalisation rates (IHR, the number of hospitalisations per infection) for these cities. In Manaus, the Gamma VOC became dominant before March 2021 (see Appendix 1—figure 1), enabling us to provide estimates of Gamma’s IFR and IHR. Our results highlight important differences in the drivers of SARS-CoV-2 epidemic spread across Brazil’s major population centres and underscore the utility of blood donors for regular serosurveillance as a tool to track progression of epidemics of emerging infectious diseases.
Methods
Selection of blood donors for estimation of seroprevalence
Each of the eight cities had a monthly quota of 1,000 kits for testing selected donation samples in this study. To obtain more representative samples, we selected blood samples so that the spatial distribution of the residential location of selected donors matched the spatial distribution of population density in each municipality. More specifically, each city was divided into sub-municipal administrative zones, and the original quota (1,000 kits) was divided into sub-quotas following the population distribution of the city administrative zones. Starting from the second week of each month, we selected consecutive blood donors based on the geolocation of their residential postcode to fill the sub-quotas. Consequently, donations with a missing or invalid postal code were considered ineligible for selection. We chose the sample size (1,000) so that an increase in crude seroprevalence of 5% can be detected with power $1-\beta =80\%$ and confidence level $1-\alpha =95\%$ assuming a baseline seroprevalence of 15%.
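As a rough check of the sample size reasoning above, the sketch below approximates the power of a two-sided two-sample z-test for proportions. The text does not state the exact calculation used, so the test choice, the interpretation of the 5% increase as absolute (15% to 20%), and the function name `power_two_proportions` are assumptions made for illustration only.

```python
from statistics import NormalDist


def power_two_proportions(p0: float, p1: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test for proportions.

    Assumes two independent samples of n donors each (an illustrative setup,
    not necessarily the calculation used in the study).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p0 + p1) / 2
    # Standard error under the null (pooled) and under the alternative.
    se_null = (2 * p_bar * (1 - p_bar) / n) ** 0.5
    se_alt = (p0 * (1 - p0) / n + p1 * (1 - p1) / n) ** 0.5
    return z.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


# Baseline 15%, assumed absolute increase to 20%, 1,000 samples per month.
power = power_two_proportions(p0=0.15, p1=0.20, n=1000)
print(f"approximate power: {power:.2f}")
```

Under these assumptions the approximate power comes out above the 80% target, which is consistent with the monthly quota of 1,000 kits.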
In Manaus, however, donor postcodes were not reliably collected, and the number of missing and invalid values made this strategy unfeasible. Samples were therefore selected consecutively with no postal code restrictions. We also developed a study management system to operationalize this sampling strategy, whereby blood donor postcodes and epidemiological data were automatically extracted and selected. After that, the selected donation sample IDs were released for the research assistant to be separated for testing.
From 453,211 available blood samples collected in all cities except Manaus, 72,783 had a missing or invalid residential postal code, and 198,199 were from individuals living in regions not included in this study; thus, 182,229 samples were eligible for selection. An average of 1,010 samples were selected monthly for each city from March 2020 to March 2021, except for Recife, where tests occurred until February 2021. A total of 104,013 samples were selected, but 6,063 samples could not be retrieved or did not have enough volume to be tested, leading to 97,950 tested samples (951 samples per month on average for each city). Appendix 1—figure 2 contains a flowchart describing the selection procedure of blood donors.
In Brazil, blood donation samples are usually saved for 6 months, so when serological test kits were made available in July 2020, we could retrospectively select and test frozen samples from February to July. After this period, samples were selected and tested in real time. Antibody test results were not made available to the blood donors themselves.
Blood donors are a convenience sample, and thus may not be representative of the wider population in terms of their risk of SARS-CoV-2 exposure. Appendix 1—figures 3–6 show a comparison between recorded blood donor demographics and the last available Brazilian census conducted in 2010. Donors differ systematically in age, sex, and self-reported skin colour compared to the population, but the income per capita is similar. To account for the differences in the age-sex structure of blood donors, we divide donors into age-sex groups and estimate the prevalence of each age-sex group separately. Then, we calculate the seroprevalence of the population as a weighted sum of the seroprevalences of each age-sex group.
SARS-CoV-2 serology assays
We applied chemiluminescent microparticle immunoassays (CMIA, AdviseDx, Abbott) that detect IgG antibodies against the SARS-CoV-2 nucleocapsid (N) because it was the only automated commercially available kit in Brazil when the study started (July 2020). We used this kit throughout the study until March 2021 in all eight cities except Recife, where we used the kit until February 2021. This assay suffers from signal waning during convalescence, resulting in positive-to-negative transition, or ‘seroreversion’. This amounts to a fall in assay sensitivity through time. The Abbott anti-N IgG CMIA shows particularly rapid signal decay when compared with other assays (Di Germanio et al., 2021). These antibody dynamics mean that as an epidemic progresses, the crude proportion of individuals with a positive test result will increasingly underestimate the true attack rate (Buss et al., 2021; Takahashi et al., 2021; Takahashi et al., 2020).
A test is considered positive if the obtained signal-to-cutoff (S/C) is greater than or equal to a predefined threshold of 0.49. This is the lower threshold recommended by the manufacturer, which was used instead of the upper threshold of 1.4 to partially attenuate the effect of seroreversion. Appendix 1—figures 7–9 contain the number of tests disaggregated by month, age, sex and the monthly S/C distribution. We also decided to validate the results observed in Manaus, as this represents a unique sentinel population, by retesting all samples in November 2020 using the CMIA (AdviseDx, Abbott) that detects IgG antibodies against the SARS-CoV-2 spike (S) protein (see Appendix 1 for the validation analyses).
To determine the test sensitivity, we considered a cohort of 208 non-hospitalised symptomatic SARS-CoV-2 PCR-positive convalescent plasma donors tested within 60 days after symptom onset (Supplementary file 1). These donors had symptomatic COVID-19 with PCR-confirmed SARS-CoV-2 infection and were recruited to provide convalescent plasma. We found a sensitivity of 90.6% for the anti-N assay using a threshold of 0.49 S/C and 94.0% for the anti-S assay. Specificity for the anti-N assay was 97.5%, with 801 negative results in 821 pre-pandemic blood donation samples (Buss et al., 2021). Sensitivity and specificity for other assay thresholds are shown in Supplementary file 1. The anti-S assay has a specificity of >99% (Di Germanio et al., 2021; Stone et al., 2021), and we assume 100% in this study. Although the sensitivity of both assays declines through time due to waning of the detected antibodies below the positivity threshold, the anti-S IgG antibodies wane more slowly (Di Germanio et al., 2021; Stone et al., 2021). Sensitivity obtained from convalescent plasma donors is likely overestimated due to spectrum bias. This is because convalescent donors had moderate-to-severe SARS-CoV-2 infection, and thus differ from the whole blood donor population (used to estimate seroprevalence), who are more likely to have had asymptomatic or mild disease.
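To illustrate how sensitivity and specificity enter a prevalence correction, the sketch below applies the standard Rogan-Gladen estimator. This is a simpler textbook adjustment and not the Bayesian model used in this study; the crude prevalence of 20% is a hypothetical input, and only the specificity counts (801 of 821) and the 90.6% sensitivity come from the text.

```python
def rogan_gladen(crude: float, sensitivity: float, specificity: float) -> float:
    """Adjust a crude seroprevalence for imperfect sensitivity and specificity.

    Standard Rogan-Gladen estimator, clipped to [0, 1]. It ignores waning,
    which the study handles separately.
    """
    adjusted = (crude + specificity - 1) / (sensitivity + specificity - 1)
    return min(1.0, max(0.0, adjusted))


specificity = 801 / 821   # negatives among pre-pandemic samples (anti-N assay)
sensitivity = 0.906       # anti-N assay at the 0.49 S/C threshold
crude = 0.20              # hypothetical crude seroprevalence

adjusted = rogan_gladen(crude, sensitivity, specificity)
print(f"specificity = {specificity:.3f}, adjusted prevalence = {adjusted:.3f}")
```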
We subsequently estimated the distribution of time to seroreversion, and thus the sensitivity decreasing through time, for the anti-N assay. We first calculated this in the convalescent donors, in whom the date of symptom onset is known, and whose blood samples were collected longitudinally during convalescence. As such, the time-to-seroreversion distribution was computed after accounting for right censoring. However, due to spectrum bias, the extrapolation of antibody waning from convalescent donors to whole blood donors is unlikely to be valid. As such, we obtained a second cohort of repeat blood donors in Manaus that provided multiple donations during the 2020–2021 period. These donors are expected to have the same antibody dynamics as the seroprevalence cohort, as they are drawn from the same population and have predominantly mild or asymptomatic infections. However, in this group the time of infection is unknown, as infection is inferred by serostatus alone. The procedure to manage this problem is described below.
Methods used to estimate the time-dependent sensitivity
We developed an analytic method to correct raw seroprevalence data for seroreversion, improving on the method used in Buss et al., 2021. We first estimate the time-to-seroreversion distribution using serial donations from repeat blood donors, which determines how sensitivity for a given individual decreases with the time after seroconversion. We then corrected the raw seroprevalence estimates for the changing sensitivity within a Bayesian framework. We first calculated attack rates for each age and sex group in each city and summed these using the proportion of each group in the Brazilian reference population to obtain standardised estimates. In this section, we describe a procedure to estimate the time-dependent sensitivity used to obtain a seroprevalence estimate corrected for antibody waning.
Let ${\mathrm{s}\mathrm{e}}_{0}$ be the sensitivity measured shortly after symptomatic infection (i.e. the probability of an infected individual seroconverting to an S/C above the threshold), and ${p}^{+}\left[n\right]$ be the probability of a donor remaining positive $n$ weeks after seroconversion (given that the donor seroconverted). Then, the sensitivity of the test $n$ weeks after seroconversion is $\mathrm{s}{\mathrm{e}}_{0}\times {p}^{+}\left[n\right]$ for a given donor. In this section, we describe the procedure used to determine ${p}^{+}\left[n\right]$ from repeat blood donors data, for which time of infection and time of seroreversion are unknown. The seroreversion correction model described in the next section uses the estimate of ${p}^{+}\left[n\right]$ to calculate the seroprevalence accounting for seroreversion.
The criteria to select repeat blood donors were: (1) at least one positive test, indicating SARS-CoV-2 infection, (2) at least one subsequent blood sample, in order to interpolate the date of seroreversion, and (3) falling S/C between these two samples. The third criterion is needed because, if the S/C is rising, one of the samples used to define the interpolation curve may have been taken before the peak S/C, in which case the half-life and the date of seroreversion cannot be estimated. Therefore, all selected donors had at least one positive sample and at least one subsequent sample (positive or negative) with smaller S/C.
To calculate ${p}^{+}\left[n\right]$, we first estimate the date of seroreversion for each repeat blood donor using an exponential interpolation (a linear interpolation in the log scale). We choose an exponential interpolation because an exponential decay is frequently used to model antibody dynamics (Takahashi et al., 2021). When seroreversion is interval-censored, i.e., a donor that has a positive test subsequently becomes negative, we interpolate an exponential curve that passes through the last positive sample and the first negative sample. Otherwise, seroreversion is right-censored (a donor remains positive on their last sample), in which case we extrapolate an exponential curve through the last two positive samples and project this forward. As such, the estimated instant of seroreversion for blood donor $i$ (denoted as ${t}_{i}^{-}$) is the point where the interpolation curve crosses the threshold for a positive test. The interpolation procedure is illustrated in Appendix 1—figure 10. The proposed method may overestimate ${t}_{i}^{-}$ if an S/C used to define the interpolation curve was sampled shortly after seroconversion before the peak S/C was reached, since in this case the S/C curve does not behave as an exponential, leading to an overestimated half-life. To partially overcome this problem, we discard donors that do not serorevert within 106 weeks (2 years) after their first positive test.
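The exponential interpolation described above amounts to fitting a straight line through two log-transformed S/C readings and solving for the time at which it reaches the positivity threshold. The following is a minimal sketch under that reading; the function name `seroreversion_time` and the example readings are hypothetical.

```python
import math


def seroreversion_time(t1: float, sc1: float, t2: float, sc2: float,
                       threshold: float = 0.49) -> float:
    """Time at which a log-linear curve through two S/C readings hits the threshold.

    (t1, sc1) is the earlier sample and (t2, sc2) the later one. The S/C must
    be falling, mirroring selection criterion (3) for repeat donors.
    """
    if not (t2 > t1 and sc1 > sc2 > 0):
        raise ValueError("need two time-ordered samples with falling, positive S/C")
    slope = (math.log(sc2) - math.log(sc1)) / (t2 - t1)  # negative decay rate
    return t1 + (math.log(threshold) - math.log(sc1)) / slope


# Interval-censored: last positive (0.98) and first negative (0.245) sample.
t_interval = seroreversion_time(t1=0, sc1=0.98, t2=10, sc2=0.245)

# Right-censored: both samples still positive, so the curve is projected forward.
t_right = seroreversion_time(t1=0, sc1=1.96, t2=10, sc2=0.98)

print(t_interval, t_right)
```

In the interval-censored case the crossing falls between the two samples, while in the right-censored case it lies beyond the last donation.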
After estimating ${t}_{i}^{-}$ for each blood donor $i$, we compute the probability distribution of the date of seroconversion for that donor, ${p}_{i}$. For this, we identify the earliest and latest possible dates of seroconversion, ${t}_{\text{min}}$ (the date of the last negative result before seroconversion, or 1 March 2020 if the donor has no negative results before seroconversion) and ${t}_{\text{max}}$ (the date of the first positive result). The relative probability of seroconversion within this window depends on the incidence of seroconversions due to SARS-CoV-2 infection for the cohort of repeat donors, denoted ${u}_{\text{repeat}}\left[n\right]$. To estimate this quantity, we calculate the histogram of the date of first positive donation for repeat blood donors and then apply a 30-day moving average. As a sensitivity analysis, we also calculate ${u}_{\text{repeat}}\left[n\right]$ by computing the histogram of the date of onset of severe acute respiratory infection (SARI) deaths observed in Manaus, and applying to it a 7-day window moving average, yielding similar seroprevalence estimates (Appendix 1—figures 11 and 12).
The distribution of the date of seroconversion is obtained by truncating the incidence curve of repeat blood donors ${u}_{\text{repeat}}\left[n\right]$ in the interval $\left[{t}_{\text{min}},{t}_{\text{max}}\right]$ and renormalizing the distribution. We then generate 1,000 samples of the instant of seroconversion ${t}_{i}^{+}\sim {p}_{i}\left[n\right]$ and compute the 1,000 sample delays between seroconversion and seroreversion $\Delta {t}_{i}={t}_{i}^{-}-{t}_{i}^{+}$.
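The truncation and sampling step can be sketched as follows. The incidence curve, the window bounds, and the seroreversion day below are hypothetical values; days are indexed as integers from an arbitrary origin, and the fallback to a uniform distribution when the truncated incidence is zero is an assumption not stated in the text.

```python
import random


def sample_delays(incidence, t_min, t_max, t_revert, n_samples=1000, seed=0):
    """Sample seroconversion days from the incidence truncated to [t_min, t_max].

    Returns the delays (in days) between each sampled seroconversion day and
    the estimated seroreversion day t_revert.
    """
    rng = random.Random(seed)
    days = list(range(t_min, t_max + 1))
    weights = [incidence[d] for d in days]
    if sum(weights) <= 0:
        weights = [1.0] * len(days)  # assumption: uniform if no incidence in window
    # random.choices renormalises the weights, i.e. the truncated distribution.
    t_plus = rng.choices(days, weights=weights, k=n_samples)
    return [t_revert - t for t in t_plus]


# Hypothetical smoothed incidence of first positive donations, indexed by day.
incidence = [0, 1, 3, 5, 3, 1, 0, 0, 0, 0]
delays = sample_delays(incidence, t_min=1, t_max=4, t_revert=60)
print(min(delays), max(delays), len(delays))
```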
The probability of the delay between seroconversion and seroreversion being $n\ge 1$ days (denoted as ${p}_{\text{day}}^{-}\left[n\right]$) is calculated with the empirical histogram of the $1000\times {N}_{\text{donors}}$ samples of $\Delta {t}_{i}$, $i=1,\cdots ,{N}_{\text{donors}}$. The distribution ${p}_{\text{day}}^{-}\left[n\right]$ is then binned into weeks by taking the average of $\left({p}_{\text{day}}^{-}\left[7n\right],{p}_{\text{day}}^{-}\left[7n+1\right],\cdots ,{p}_{\text{day}}^{-}\left[7n+6\right]\right)$ for $n\ge 1$. The resulting distribution, denoted as ${p}_{\text{week}}^{-}\left[n\right]$, represents the probability of seroreversion exactly $n$ weeks after seroconversion.
Finally, the probability of a donor remaining positive $n$ weeks after seroconversion ${p}^{+}\left[n\right]$ (i.e. the probability of a donor seroreverting after week $n$) is obtained through
$${p}^{+}\left[n\right]=1-\sum _{k=1}^{n}{p}_{\text{week}}^{-}\left[k\right].$$
The presented method is summarised in Appendix 1.
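As an illustration of how delay samples translate into a time-dependent sensitivity, the sketch below computes the probability of remaining positive directly as the fraction of sampled delays longer than $n$ weeks, and multiplies it by the initial sensitivity. This is a simplification of the daily histogram and weekly binning described in the text, and the delay values are hypothetical.

```python
def remain_positive_curve(delays_days, n_weeks):
    """Fraction of delay samples exceeding n weeks, for n = 0..n_weeks."""
    total = len(delays_days)
    return [sum(1 for d in delays_days if d > 7 * n) / total
            for n in range(n_weeks + 1)]


def sensitivity_over_time(se0, p_plus):
    """Sensitivity n weeks after seroconversion: se0 times p_plus[n]."""
    return [se0 * p for p in p_plus]


# Hypothetical delays (days) between seroconversion and seroreversion.
example_delays = [10, 20, 30, 40]
p_plus = remain_positive_curve(example_delays, n_weeks=6)
sens = sensitivity_over_time(0.906, p_plus)
print(p_plus)
```

The curve starts at 1 and decreases monotonically, so the effective sensitivity starts at the initial value and falls as more donors serorevert.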
Estimating the seroreversion probability from convalescent plasma donors
Unlike repeat blood donors, convalescent plasma donors have a known date of symptom onset. To compute ${p}^{+}\left[n\right]$ for plasma donors, we estimate the instant of seroreversion for each plasma donor as described above and define the date of seroconversion as 8 days after the reported date of symptom onset. This interval of 8 days is the average lag between symptom onset and seroconversion reported in Orner et al., 2021 for a threshold of 1.4 S/C, but it can be shorter for the threshold of 0.49 employed in this work. The probability mass function of the time to seroreversion ${p}_{\text{day}}^{-}\left[n\right]$ is then the empirical histogram of $\Delta {t}_{i}={t}_{i}^{-}-{t}_{i}^{+}$, and ${p}^{+}\left[n\right]$ is obtained from ${p}_{\text{day}}^{-}\left[n\right]$ using the method presented above.
Our proposed seroreversion correction model
Here, we present a Bayesian model that draws posterior samples from the incidence over time corrected by sensitivity, specificity, and seroreversion using as input the estimated curve for ${p}^{+}\left[n\right]$, the number of weekly positive tests, and total number of tests. Even though the main output of the model is the incidence at week $n$ for age-sex group $a$ (denoted as $u[n,a]$), the seroprevalence at week $n$ for group $a$ can be calculated from $u[n,a]$ as $\rho \left[n,a\right]={\sum}_{k=1}^{n}u[k,a]$. For simplicity, the proposed model ignores the delay between infection and seroconversion, as it should have small impact on the estimate of $u[n,a]$. To define the age-sex groups, age was discretized in the intervals 16–24, 25–34, 35–44, 45–54, and 55–69.
Assuming that the sensitivity $\mathrm{s}{\mathrm{e}}_{0}$ and specificity $\mathrm{s}\mathrm{p}$ of the assay are independent of the age-sex group, the probability of a random person from age-sex group $a$ testing positive at week $n$, denoted as $\theta [n,a]$, is
$$\theta [n,a]=\mathrm{s}{\mathrm{e}}_{0}\sum _{k=1}^{n}{p}^{+}[n-k]\,u[k,a]+\left(1-\mathrm{s}\mathrm{p}\right)\left(1-\sum _{k=1}^{n}u[k,a]\right).$$
The derivation of the expression above is presented in Appendix 1. The left term $\mathrm{s}{\mathrm{e}}_{0}{\sum}_{k=1}^{n}{p}^{+}[n-k]u[k,a]$ represents true positives (previously infected donors that are still seropositive), while the right term $\left(1-\mathrm{s}\mathrm{p}\right)\left(1-{\sum}_{k=1}^{n}u[k,a]\right)$ represents false positives (uninfected donors that test positive).
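The two terms described here (true positives weighted by the probability of remaining positive, plus false positives among the uninfected) can be evaluated with a short function. This is a sketch for a single age-sex group; the incidence and waning values in the example are hypothetical, and the list `u` is zero-indexed so that `u[k - 1]` holds the incidence of week $k$.

```python
def prob_test_positive(n, u, p_plus, se0, sp):
    """Probability of a positive test at week n for one age-sex group.

    u[k - 1] is the incidence in week k (k = 1..n) and p_plus[j] is the
    probability of remaining positive j weeks after seroconversion.
    """
    true_pos = se0 * sum(p_plus[n - k] * u[k - 1] for k in range(1, n + 1))
    false_pos = (1 - sp) * (1 - sum(u[:n]))
    return true_pos + false_pos


# Hypothetical three-week example.
u = [0.1, 0.1, 0.0]          # weekly incidence
p_plus = [1.0, 0.9, 0.8]     # waning curve
theta_week3 = prob_test_positive(3, u, p_plus, se0=0.9, sp=0.975)
theta_no_infection = prob_test_positive(3, [0.0, 0.0, 0.0], p_plus, se0=0.9, sp=0.975)
print(theta_week3, theta_no_infection)
```

With no infections, the expression reduces to the false-positive rate $1-\mathrm{sp}$, as expected.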
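Let us denote as ${T}^{+}[n,a]$ and $T\left[n,a\right]$, respectively, the number of positive tests and the total number of tests for week $n$ and age-sex group $a$. Given $\theta [n,a]$, the probability distribution of ${T}^{+}[n,a]$ is
$${T}^{+}\left[n,a\right]\mid \theta \left[n,a\right]\sim \mathrm{Binomial}\left(T\left[n,a\right],\theta \left[n,a\right]\right).$$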
We use a Bayesian framework to draw posterior samples from $u$ assuming a non-informative prior, but limiting the final seroprevalence to the interval $[0,b]$, where $b$ is a fixed input of the algorithm that can be 1 or 2 depending on whether reinfections are allowed; we use $b=2$ in this work. Instead of defining a prior distribution for $u[n,a]$ directly, we decompose it into $u\left[n,a\right]={\rho}_{\text{max}}\left[a\right]{u}_{\text{norm}}\left[n,a\right],$ where ${\rho}_{\text{max}}\left[a\right]\sim \mathrm{U}\mathrm{n}\mathrm{i}\mathrm{f}\mathrm{o}\mathrm{r}\mathrm{m}\left(0,b\right)$ sets the upper bound of the final prevalence to $b$ and ${u}_{\text{norm}}\left[:,a\right]\sim \mathrm{D}\mathrm{i}\mathrm{r}\mathrm{i}\mathrm{c}\mathrm{h}\mathrm{l}\mathrm{e}\mathrm{t}\left(1,1,\dots ,1\right)$ is the normalised incidence which sums to 1. This decomposition is equivalent to assuming a uniformly distributed prior for $u\left[:,a\right]$ in the simplex $0\le {\sum}_{n=1}^{N}u\left[n,a\right]\le b$ with $u\left[n,a\right]\ge 0\forall n$.
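The prior decomposition and the binomial likelihood can be sketched with the standard library alone. The snippet draws one incidence curve from the prior (a uniform scale times a flat Dirichlet, generated here by normalising exponential draws) and evaluates a binomial log-likelihood. It is an illustrative fragment under these assumptions, not the sampler used in the study, and the function names are hypothetical.

```python
import math
import random


def sample_prior_incidence(n_weeks, b=2.0, seed=0):
    """Draw u[1..N] = rho_max * u_norm with rho_max ~ Uniform(0, b), u_norm ~ Dirichlet(1,...,1)."""
    rng = random.Random(seed)
    rho_max = rng.uniform(0.0, b)
    # Normalised Exp(1) draws follow a flat Dirichlet distribution.
    g = [rng.expovariate(1.0) for _ in range(n_weeks)]
    total = sum(g)
    return [rho_max * x / total for x in g]


def binomial_log_likelihood(positives, tests, theta):
    """Log of the Binomial(tests, theta) probability of observing `positives`."""
    log_choose = (math.lgamma(tests + 1) - math.lgamma(positives + 1)
                  - math.lgamma(tests - positives + 1))
    return log_choose + positives * math.log(theta) + (tests - positives) * math.log(1 - theta)


u_prior = sample_prior_incidence(n_weeks=52)
log_lik = binomial_log_likelihood(positives=30, tests=250, theta=0.12)
print(sum(u_prior), log_lik)
```

In a full sampler, the log-likelihood would be summed over weeks and age-sex groups, with the probability of a positive test computed from the sampled incidence.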
After drawing posterior samples from $u[n,a]$, we calculate the seroprevalence at week $n$ for age-sex group $a$ as $\rho \left[n,a\right]={\sum}_{k=1}^{n}u[k,a]$ and then compute the age-sex weighted seroprevalence $\rho \left[n\right]$, given by
$$\rho \left[n\right]=\frac{\sum _{a=1}^{M}\mathrm{p}\mathrm{o}\mathrm{p}\left[a\right]\,\rho \left[n,a\right]}{\sum _{a=1}^{M}\mathrm{p}\mathrm{o}\mathrm{p}\left[a\right]},$$
where $\mathrm{p}\mathrm{o}\mathrm{p}\left[a\right]$ is the population for the age-sex group $a$ in the corresponding city and $M$ is the number of age-sex groups. Of note, in this work we also refer to $\rho [n]$ as the estimated seroprevalence, cumulative seroprevalence or attack rate.
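The population weighting can be sketched as a weighted average over age-sex groups. The group labels, seroprevalences, and population sizes below are hypothetical.

```python
def weighted_seroprevalence(rho_by_group, pop_by_group):
    """Population-weighted average of group-level seroprevalence."""
    total_pop = sum(pop_by_group.values())
    return sum(pop_by_group[a] * rho_by_group[a] for a in pop_by_group) / total_pop


# Hypothetical values for two age-sex groups at a given week.
rho_by_group = {"F 16-24": 0.20, "M 16-24": 0.30}
pop_by_group = {"F 16-24": 100, "M 16-24": 300}

rho_week = weighted_seroprevalence(rho_by_group, pop_by_group)
print(rho_week)
```

Applying the same function to each posterior sample of the group-level seroprevalences propagates the uncertainty to the weighted estimate.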
The presented Bayesian model is summarised in Appendix 1. The posterior samples are drawn using a Markov chain Monte Carlo (MCMC) algorithm with 100,000 iterations.
The incidence returned by the model was validated through posterior predictive checks by randomly selecting 1,000 samples from $u\left[n,a\right]$ and drawing samples from ${T}^{+}\left[n,a\right]\mid \theta \left[n,a\right]\sim \mathrm{B}\mathrm{i}\mathrm{n}\mathrm{o}\mathrm{m}\mathrm{i}\mathrm{a}\mathrm{l}\left(T\left[n,a\right],\theta \left[n,a\right]\right)$ . The resulting crude seroprevalence is then compared with the measured crude seroprevalence (Appendix 1—figure 13).
It is worth noting that the age-specific crude seroprevalence can be larger than the seroprevalence corrected for seroreversion in some weeks, as the model may remove outlier samples. This is because seroprevalence curves that cannot be reconstructed by the model (e.g. due to bias or sampling noise) generate a small likelihood, hence, a smaller probability of being included in the set of posterior samples generated by the model. Therefore, the model excludes weeks where donors are significantly biased towards more seropositive or more seronegative individuals.
The proposed Bayesian seroreversion correction model can be seen as an improvement on that presented in Buss et al., 2021. The model in Buss et al. assumed a parametric form for time to seroreversion and derived the parameters by assuming an increasing cumulative seroprevalence in the repeated cross-sectional samples of blood donors in Manaus. Here, we derived the distribution directly from repeat blood donors without assuming any parametric form. Also, Buss et al. applied the seroreversion correction method to the measured seroprevalence corrected for sensitivity, specificity, and reweighted by age and sex, while here we estimate the seroprevalence in each age group separately, allowing the identification of non-homogeneous incidence in different age groups.
Despite these differences, the results presented here are compatible with the seroprevalence estimates of 28.8 and 76.0%, respectively, for São Paulo and Manaus in Buss et al. The proposed seroreversion method also differs from other methods in the literature (Shioda et al., 2021; Takahashi et al., 2021) in that we use the incidence curve to estimate the time-dependent sensitivity instead of the deaths or confirmed cases curve, producing a seroprevalence that does not depend on case reporting and that can be reliably inferred in epidemics where the IFR changes with time, as was the case in Manaus.
Estimating the IFR for December 2020
We estimate the IFR using total deaths due to Severe Acute Respiratory Infection (SARI), which includes PCR-confirmed and clinically confirmed SARS-CoV-2 infection as well as SARI deaths without a final diagnosis, and we exclude SARI deaths confirmed to be caused by other aetiologies. This approach reduces underreporting, particularly in 2020 when testing was not widely available, as discussed in de Souza et al., 2020. We further justify this approach in Appendix 1.
We retrieved the daily number of SARI deaths from SIVEP-Gripe (Sistema de Informação da Vigilância Epidemiológica da Gripe), a public database containing individual-level information of all SARI cases reported in Brazil. To estimate the IFR in 2020, we use the seroprevalence estimated by our model for 16 December 2020 and select only SARI deaths with symptom onset between 1 March and 15 December 2020. Selecting deaths based on the date of first symptoms instead of date of death was possible because SIVEP-Gripe contains the date of symptom onset for each individual. For the first wave of COVID-19 that occurred in the eight cities, we estimate the number of cases as the age-specific population size (https://demografiaufrn.net/laboratorios/lepp/) multiplied by the estimated seroprevalence in the corresponding age group. We propagate the uncertainty in the prevalence estimate through the calculation of IFR.
Let $\rho \left[a\right]$ and $\mathrm{p}\mathrm{o}\mathrm{p}\left[a\right]$ be the cumulative seroprevalence and the population estimated for age group $a$. We assume a uniform distribution in the interval [0, 1] as a non-informative prior for $\mathrm{I}\mathrm{F}\mathrm{R}\left[a\right]$, and the number of deaths $D\left[a\right]$ observed for each age group $a$ is Binomial-distributed with size $\lfloor \rho \left[a\right]\times \mathrm{p}\mathrm{o}\mathrm{p}\left[a\right]\rfloor $ (the number of infections) and probability $\mathrm{I}\mathrm{F}\mathrm{R}\left[a\right]$. For each sample of $\rho \left[a\right]$, we draw a sample of the posterior distribution of $\mathrm{I}\mathrm{F}\mathrm{R}\left[a\right]$, given by
$$\mathrm{I}\mathrm{F}\mathrm{R}\left[a\right]\mid D\left[a\right],\rho \left[a\right]\sim \mathrm{Beta}\left(D\left[a\right]+1,\ \lfloor \rho \left[a\right]\times \mathrm{p}\mathrm{o}\mathrm{p}\left[a\right]\rfloor -D\left[a\right]+1\right),$$
and compute the median, interquartile ranges (IQRs), and 95% credible intervals of the IFR by retrieving the quantiles of the posterior distribution.
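With a uniform prior and a binomial likelihood, the posterior of the IFR given the number of infections is a Beta distribution, so uncertainty in the seroprevalence can be propagated by drawing one Beta sample per seroprevalence sample. The sketch below illustrates this; the seroprevalence samples, population size, and death count are hypothetical, and skipping draws that imply fewer infections than deaths is an assumption.

```python
import math
import random
import statistics


def ifr_posterior_draws(rho_samples, population, deaths, seed=0):
    """One Beta(D + 1, N - D + 1) draw per seroprevalence sample, N = floor(rho * pop)."""
    rng = random.Random(seed)
    draws = []
    for rho in rho_samples:
        infections = math.floor(rho * population)
        if infections < deaths:
            continue  # assumption: discard samples implying fewer infections than deaths
        draws.append(rng.betavariate(deaths + 1, infections - deaths + 1))
    return draws


# Hypothetical posterior samples of seroprevalence for one age group.
rho_samples = [0.28 + 0.0001 * i for i in range(200)]
draws = ifr_posterior_draws(rho_samples, population=1_000_000, deaths=900)

cuts = statistics.quantiles(draws, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
ifr_median = statistics.median(draws)
ifr_low, ifr_high = cuts[0], cuts[-1]
print(f"IFR median {ifr_median:.4%} (95% interval {ifr_low:.4%} to {ifr_high:.4%})")
```

The same function applies to the IHR if the death count is replaced by the number of SARI hospitalisations.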
To infer the IFR, we considered the age groups 16–24, 25–34, 35–44, 45–54, and 55–64. We applied the same method to estimate the overall IFR but using a single age group containing all individuals aged between 16 and 64. Therefore, IFR of individuals older than 64 or younger than 16 is not included in the overall IFR estimates. The method used to infer the IFR was also applied to compute the infection hospitalisation rate (IHR), but we used the number of hospitalisations with SARI instead of the number of deaths.
We note that the proposed seroreversion correction model can be used to estimate the attack rate and IFR of epidemics driven by other lineages in other regions. However, the uncertainty of the seroprevalence estimate increases over time, as a larger amount of seroreversion needs to be corrected. Therefore, estimated attack rates and IFRs suffer from larger uncertainty when longer time periods are considered.
To validate the obtained IFRs, we also estimate the IFRs using the measured prevalence corrected only by the sensitivity and specificity of the assay, without explicitly accounting for seroreversion. In this validation analysis, we use a small threshold of 0.1 S/C to avoid underestimating the prevalence due to seroreversion (see Appendix 1).
Estimating the IFR for the Gamma VOC
We estimate the IFR and the attack rate separately for the second, Gamma-dominant, SARS-CoV-2 wave that occurred in Manaus. The Gamma variant was first detected in Manaus in November 2020, and its prevalence among PCR-positive patients grew rapidly to 87.0% on 4 January 2021 (Faria et al., 2021). For this reason, it is reasonable to assume that all infections in Manaus that occurred after 15 December 2020 are due to the Gamma VOC. The Gamma-dominated wave was characterised by a non-negligible proportion of reinfections (Coutinho et al., 2021; Faria et al., 2021; Prete et al., 2022). It is estimated that 13.6–39.3% of the infections in the second wave of the COVID-19 epidemic in Manaus were reinfections (Prete et al., 2022), which are explained by the higher in vitro reinfection potential of Gamma (Lucas et al., 2021) and partial immunity waning 8 months after the first surge. Thus, to calculate the attack rate and IFR of the Gamma-dominated wave, reinfections must be considered.
However, estimating the incidence of reinfections among positive donors is not straightforward, as a positive result may be either primary infection or reinfection, and these cannot be distinguished using a single test result. For this reason, it was not possible to obtain a point estimate for the number of infections that happened in the second wave in Manaus. To overcome this problem, we calculate upper bounds for the attack rate of the Gamma-dominated wave in Manaus (i.e. the incidence between December 2020 and March 2021) and conversely lower bounds for the IFR of the Gamma VOC.
We first estimate the attack rate of the second wave using a Bayesian model that does not take reinfections into account. This model also neglects seroreversion for individuals infected during the second wave due to the small interval of 3 months considered in this analysis (see Appendix 1 for a complete description of the model). Denoting as $\hat{\mathrm{A}\mathrm{R}}$ the attack rate estimated by this model, the true attack rate $\mathrm{A}\mathrm{R}$ is given by $\mathrm{A}\mathrm{R}=\hat{\mathrm{A}\mathrm{R}}+R+S$, where $R$ is the proportion of donors that were seropositive in December 2020 and subsequently had a reinfection, and $S$ is the proportion of donors that were seropositive in December 2020 and became seronegative in the following months. Since $R+S$ cannot be greater than the seroprevalence in December 2020 (denoted as ${\rho}_{\mathrm{D}\mathrm{e}\mathrm{c}\mathrm{e}\mathrm{m}\mathrm{b}\mathrm{e}\mathrm{r}}$), the upper bound for the attack rate is $\mathrm{A}{\mathrm{R}}_{\mathrm{m}\mathrm{a}\mathrm{x}}=\hat{\mathrm{A}\mathrm{R}}+{\rho}_{\mathrm{D}\mathrm{e}\mathrm{c}\mathrm{e}\mathrm{m}\mathrm{b}\mathrm{e}\mathrm{r}}$. Therefore, the upper bound is obtained assuming that all individuals that were seropositive in December were later reinfected or were seronegative in March 2021.
To estimate $\mathrm{A}{\mathrm{R}}_{\mathrm{m}\mathrm{a}\mathrm{x}}$, we compute the monthly number of positive tests ${T}^{+}\left[n\right]$ from December 2020 to March 2021 for each agesex group, as well as the number of true positives (TP) and false negatives (FN) from convalescent plasma donors and the number of false positives (FP) and true negatives (TN) from the prepandemic blood donors cohort in Manaus (Supplementary file 1). The Bayesian model generates posterior samples of the crude monthly incidence and the crude seroprevalence in December ${\rho}_{\mathrm{D}\mathrm{e}\mathrm{c}\mathrm{e}\mathrm{m}\mathrm{b}\mathrm{e}\mathrm{r}}$ . We then correct the crude incidence by the sensitivity of the assay to obtain posterior samples of $\hat{\mathrm{A}\mathrm{R}}$, which are then added to the posterior samples of ${\rho}_{\mathrm{D}\mathrm{e}\mathrm{c}\mathrm{e}\mathrm{m}\mathrm{b}\mathrm{e}\mathrm{r}}$, resulting in samples of $\mathrm{A}{\mathrm{R}}_{\mathrm{m}\mathrm{a}\mathrm{x}}$. As explained above, the lower bound for the IFR is then calculated using the upper bound of the attack rate and the number of deaths with symptom onset between 16 December and 15 March. This procedure is repeated for each agesex group independently and is summarised in Appendix 1.
Only small estimates of the upper bound for the attack rate are informative, as in scenarios where ${\rho}_{\mathrm{D}\mathrm{e}\mathrm{c}\mathrm{e}\mathrm{m}\mathrm{b}\mathrm{e}\mathrm{r}}$ is small. To limit ${\rho}_{\mathrm{D}\mathrm{e}\mathrm{c}\mathrm{e}\mathrm{m}\mathrm{b}\mathrm{e}\mathrm{r}}$, we estimate the incidence using a threshold of 1.4 S/C (the upper threshold recommended by the manufacturer) instead of 0.49 S/C (the lower threshold recommended by the manufacturer) and correct for sensitivity based on 163 true positives and 30 false negatives in the plasma donors cohort. Since the specificity of the test using a threshold of 1.4 is 99.9%, and since it is not straightforward to take the specificity into account when reinfections are allowed, we do not correct for specificity in this analysis.
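The bounding procedure described above can be sketched as follows. This is a minimal illustration rather than the model of Appendix 1: the function name `gamma_wave_bounds`, the input arrays, and the death and population counts are hypothetical, and the `Beta(1 + TP, 1 + FN)` draw for the sensitivity assumes a uniform prior. Only the 163 true positives and 30 false negatives are taken from the text.

```python
import numpy as np

def gamma_wave_bounds(crude_incidence, rho_december, deaths, population,
                      tp=163, fn=30, seed=0):
    """Return posterior samples of AR_max and of the IFR lower bound.

    crude_incidence, rho_december: posterior samples (same shape) of the crude
    incidence in the Gamma period and of the crude December seroprevalence.
    """
    rng = np.random.default_rng(seed)
    crude_incidence = np.asarray(crude_incidence, dtype=float)
    rho_december = np.asarray(rho_december, dtype=float)
    # Sensitivity at the 1.4 S/C threshold (assumption: uniform prior).
    sensitivity = rng.beta(1 + tp, 1 + fn, size=crude_incidence.shape)
    # AR_hat: crude incidence corrected for sensitivity only (no specificity correction).
    ar_hat = crude_incidence / sensitivity
    # Upper bound: every December seropositive is assumed reinfected or seroreverted.
    ar_max = ar_hat + rho_december
    # A larger attack rate gives a smaller IFR, so AR_max yields the IFR lower bound.
    ifr_min = deaths / (population * ar_max)
    return ar_max, ifr_min

# Illustrative inputs only; these are not estimates from the study.
crude_incidence = np.full(1000, 0.20)
rho_december = np.full(1000, 0.15)
ar_max, ifr_min = gamma_wave_bounds(crude_incidence, rho_december,
                                    deaths=500, population=1_000_000)
```

Because the December seroprevalence enters additively, the bound is only tight when that term is small, which is why the higher 1.4 S/C threshold is used.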
We additionally computed the IFR obtained using the seroprevalence estimated by the model. It is worth noting that our seroreversion correction model only estimates the incidence among seronegative individuals, thus an S/C boosting due to reinfection is not detected by our method. As such, our model estimates the seroprevalence assuming there are no reinfections among positive individuals, underestimating the size of the second wave in Manaus.
The IHR for the Gamma VOC was estimated using the same procedure, but with the number of SARI hospitalisations in place of the number of deaths.
This method can be applied to estimate upper bounds for the attack rate of epidemics driven by other lineages with high rates of reinfection such as Delta and Omicron VOCs, but as previously highlighted the upper bound is only informative if the initial crude seroprevalence is small. This may not be the case in regions where vaccines inducing antiN antibodies were applied, as it is not possible to distinguish vaccination from natural infection based only on antiN serological data.
Definition of the homestay index
The homestay index for the eight cities was extracted from https://bigdatacovid19.icict.fiocruz.br/. It was calculated using data from Google Mobility reports using the procedure described in Barreto et al., 2021. The homestay index is defined as
where ${X}_{H},{X}_{G},{X}_{P},{X}_{T},{X}_{R}$, and ${X}_{W}$ are, respectively, the variation of mobility (using prepandemic mobility levels as baseline) in the following place categories: residential areas, grocery and pharmacy, parks, transit stations, retail and recreation, and workplaces.
Calculation of agestandardised estimates
In this work, we calculated the agestandardised mortality, the agestandardised overall IFR, and the agestandardised overall IHR. The procedure used to perform age standardisation was the same for all these quantities. We define an agestandardised variable as the estimate that would be obtained if all cities had the same age structure. Denoting $\eta \left[a\right]$ as an agespecific IFR or IHR for a given city and $w\left[a\right]$ as the proportion of the combined population of all eight cities belonging to age group $a$, then the agestandardised overall IFR or IHR is $\sum _{a=1}^{M}w\left[a\right]\eta \left[a\right],$ where $M=5$ is the number of age groups. Similarly, denoting $\mu (t,a)$ as the mortality for age group $a$ and day $t$ for a given city, the agestandardised mortality is $\sum _{a=1}^{M}w\left[a\right]\mu (t,a).$
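As a minimal sketch of the age standardisation defined above, the weighted sum can be written as below. The function name `age_standardise` and all numerical values are hypothetical; the only elements taken from the text are the weights $w[a]$ (shares of the pooled population of the eight cities) and $M=5$ age groups.

```python
import numpy as np

def age_standardise(values, pooled_population):
    """Return sum_a w[a] * values[..., a], with w[a] from the pooled population.

    The age-group axis must be the last axis of `values`, so the same function
    handles an age-specific IFR/IHR vector or a (day, age group) mortality array.
    """
    values = np.asarray(values, dtype=float)
    pooled_population = np.asarray(pooled_population, dtype=float)
    weights = pooled_population / pooled_population.sum()
    return values @ weights

# Hypothetical age-specific IFRs for one city and pooled population counts (M = 5).
age_specific_ifr = [0.0003, 0.0008, 0.002, 0.005, 0.013]
pooled_population = [4.0e6, 5.0e6, 4.5e6, 3.5e6, 2.5e6]
standardised_ifr = age_standardise(age_specific_ifr, pooled_population)
```

Using the same weights for every city is what removes the effect of differing age structures from the comparison.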
Results
Serology assay validation and antibody waning
Antibody kinetics vary with disease severity (Buss et al., 2021; Lumley et al., 2021; Takahashi et al., 2021), and whole blood donors represent predominantly asymptomatic or mild SARS-CoV-2 infections due to donation eligibility criteria (Buss et al., 2021). As such, we sought to estimate a time-to-seroreversion distribution that accurately reflected the blood donor convenience sample used in this study. We identified and tested 7675 repeat whole blood donors in Manaus who had made multiple donations throughout 2020–2021 (Appendix 1—figure 14) and used these data to estimate the time-to-seroreversion probability distribution (see Materials and methods).
The results are shown in Figure 1, which compares the half-life, peak S/C values, and time-to-seroreversion of repeat whole blood donors to those of the cohort of symptomatic convalescent plasma donors used to determine sensitivity. Repeat blood donors had a shorter assay signal half-life than plasma donors (median [IQR] 69.3 [53.0–103.8] versus 105.9 [62.7–185.1] days) and a lower observed peak S/C ratio (median [IQR] 2.89 [1.49–4.83] versus 5.08 [3.22–6.99]), yielding a shorter median time between seroconversion and seroreversion (203 [147–294] days versus 280 [175–441] days). This highlights the importance of choosing a time-to-seroreversion distribution that is appropriate for the use case: applying the slower rate of waning seen in PCR-confirmed symptomatic disease would have resulted in underestimation of SARS-CoV-2 attack rates.
COVID19 mortality across Brazilian capitals
The location of the eight Brazilian state capitals that contributed serology data is shown in Figure 2A. They collectively represent approximately 14% of the total Brazilian population. The age distributions of the eight cities differ widely (Figure 2—figure supplement 1); as such, COVID-19 mortality is presented as age-standardised rates (see Appendix 1—figure 15 for the crude mortality curves). Between 1 March 2020 and 31 March 2021, the age-standardised mortality rate varied from 1.7 deaths per 1,000 inhabitants in Belo Horizonte to 5.3 deaths per 1,000 in Manaus, which had twice the mortality of Fortaleza, the city with the next highest mortality (Figure 2C).
Figure 2B shows the homestay index for the eight cities (see Materials and methods for the definition). Manaus, the city with the youngest population, returned to prepandemic levels of mobility by July 2020, having consistently lower homestay index (i.e. higher mobility) than other cities after June 2020, whereas the other seven cities showed a relatively homogenous mobility pattern. The shape of the mortality curves also varied markedly (Figure 2D). Manaus was also an outlier in having the lowest income per capita, health insurance coverage, and lowest proportion of the population with comorbidities, along with the highest number of residents per household (Figure 2—figure supplement 2).
Blood donor serosurveillance
Using an average of 951 monthly samples of routine whole blood donations (from March 2020 to March 2021, a total of 97,950 samples) in each of the eight cities, we measured the crude seroprevalence of antiN IgG antibodies detectable by the Abbott CIMA (Table 1). However, these raw estimates of seroprevalence are affected by seroreversion dynamics and provide a poor guide for assessing past levels of population exposure.
Using our Bayesian seroreversion correction model, we present in Figure 2D the agestandardised SARSCoV2 attack rates (i.e. the cumulative rate of the population that was infected or reinfected) as of March 2021 after accounting for test sensitivity, test specificity, and IgG seroreversion (coloured lines) along with the directly measured seroprevalence (light grey boxplots) and the estimated seroprevalence adjusting for test sensitivity and specificity (dark grey boxplots). Our results further underscore the significantly different scales of SARSCoV2 epidemic impact experienced across the eight cities, with the implied attack rates ranging from only 19.3% in Curitiba, to as high as 75.0% in Manaus by December 2020 (see Table 1). Alternative cumulative seroprevalence estimates produced using different timetoseroreversion distributions are similar to those in Figure 2D and shown in Appendix 1—figure 12. We note that even though the seroprevalence estimated by our model includes reinfection in seronegative individuals, the model does not capture reinfection in already positive individuals. Therefore, the model is likely to underestimate SARSCoV2 attack rates in scenarios where reinfection is not rare, and the obtained seroprevalence can surpass 100% due to reinfections among seronegative individuals.
The slope of the seroprevalence curves (Figure 2D) also differed significantly across cities, showing different dynamics of antibody acquisition at the population level according to the shape and dynamics of the epidemic experienced. Cities with only a minimal epidemic peak, such as Belo Horizonte and Curitiba, showed near-constant rates of increase in seropositivity after adjustment for antibody waning. By contrast, cities with a substantial epidemic peak, such as Fortaleza and Manaus, demonstrated significant variation in the rate at which estimated seropositivity increased in the population, with these rates highest during the epidemic peaks. These findings highlight the capacity of blood donor-based serological data to recapitulate important temporal trends in the intensity and dynamics of the epidemics across these eight cities.
The estimated seroprevalence in June and July in Fortaleza was significantly smaller than the measured seroprevalence without correction for seroreversion, even though the seroprevalence estimates disaggregated by age and sex (Appendix 1—figure 16) lie within or above the confidence intervals of the measured seroprevalence. This effect was most pronounced in women, who had a crude seroprevalence that was significantly larger than that of men in June and July 2020 but became similar in the following months. It is possible that the seroreversion rate in Fortaleza was faster than the rate estimated from repeat blood donors, in which case we undercorrected for seroreversion and underestimated the attack rate. However, a more likely explanation is that samples between March and July 2020 for Fortaleza are less representative of the population, since only 39.4% of 4970 selected samples could be retrieved and tested, compared to 97.0% for the other cities and months. As such, seropositive individuals from Fortaleza may have been more likely to donate in these months, leading to an overestimated crude seroprevalence.
Agesex patterns in blood donor seroprevalence
We next examine the patterns and dynamics of attack rates across different groups by disaggregating the seroprevalence data by age and sex. The seroprevalence estimates disaggregated by age and sex are shown in Figure 3 (see Appendix 1—figures 17–18 for seroprevalence disaggregated by only age or sex). Across the eight cities, our results consistently show differences between sexes: on average, men tended to have higher attack rates than women, although the degree and extent of this difference varied between cities. In São Paulo, the seroprevalence in December 2020 for men was 30.6% compared to the 23.0% estimated in women (i.e. 33.5% (95% CrI 17.7–51.9) higher, Figure 3B). By contrast, seroprevalence in Curitiba in December 2020 was similar for women and men, being only 4.65% (95% CrI –11.5 to 18.5) higher in women.
We also observed extensive variation in the dynamics of population-level seroprevalence between age groups, with seroprevalence in December 2020 typically highest in younger age groups. Compared with donors aged 55–69, seroprevalence in donors younger than 55 was higher in all cities except Recife, ranging from 34.1% higher (95% CrI –2.23 to 91.2) in Curitiba to 0.5% lower (95% CrI –24.8 to 19.1) in Recife. Furthermore, in cities with a large increase in seroprevalence during the first epidemic wave (i.e. Manaus, Recife, Fortaleza, and Salvador), this was primarily driven by younger men. In these locations, the differences between age-sex groups slowly narrowed during the long period of less intense transmission (Figure 3A). This highlights important differences between age groups in the extent to which they were exposed to the virus and/or contributed to transmission at different points during the regional epidemics, differences that are not evident, or certain, from case or death counts alone.
In addition to the differences in attack rates by age and sex, seroprevalence did not increase homogeneously among different age and sex groups. In Manaus, seroprevalence was significantly larger in men and younger individuals aged 16–44 in July 2020, but between July and December seroprevalence increased faster in women and donors older than 45 years, leading to smaller differences in attack rate by age and sex in December 2020 (Figure 3B). Similar patterns are also observed in Salvador, Recife, and Fortaleza, although with smaller age and sex inequalities.
Variation in the SARSCoV2 IFR across age groups and locations
Using estimates of the cumulative number of individuals infected alongside records of COVID-19 deaths available from Brazil’s SIVEP-Gripe platform, we next calculated the IFR for each city and age group. Figure 4A presents the estimated age-specific IFRs for each municipality as of December 2020, before the Gamma VOC epidemic in Brazil. Our results show that the IFR increases significantly with age, ranging from 0.03% in individuals aged 16–24 years to 1.31% in individuals aged 55–64 years. This is in keeping with previous work highlighting a strong age dependency in COVID-19 mortality (Brazeau et al., 2020; Buss et al., 2021; O’Driscoll et al., 2021). The age-standardised overall IFR differed between cities, being lowest in Manaus (0.24%) and highest in Curitiba (0.54%).
There was a strong correlation (Pearson’s correlation = 0.92) between the agestandardised mortality rate in each city and the attack rate inferred from blood donor serosurveillance data (Figure 4B). Both the overall IFR and the overall IFR adjusted for the age structure of the city differed significantly between cities (Figure 4C), showing that the IFR differences cannot be explained only by the different age structures. Despite the differences between cities, the obtained agespecific IFRs were similar to the estimates from Brazeau et al., 2020 but higher than the estimates from O’Driscoll et al., 2021 (Figure 4D). The agespecific and overall IHR were also estimated (Figure 4—figure supplement 1) and showed similar patterns, being larger in Belo Horizonte, Curitiba, and São Paulo.
The obtained IFRs and attack rates for December 2020 were validated using alternative approaches that do not correct directly for seroreversion, not depending on the proposed seroreversion correction model (see Appendix 1).
The dynamics and epidemiological impacts of the Gamma VOC in Manaus
As previously highlighted, we could not obtain a point estimate of the attack rate in the Gammadominated period in Manaus because we are unable to identify which of the seropositive blood donors are primary infections and which are reinfections. Instead, we calculated upper bounds assuming maximum proportions of reinfections. The inferred upper bound of the agespecific attack rate in the Gammadominated period in Manaus ranged from 30.6% (95% Bayesian CrI 22.8–41.1) to 46.0% (95% CrI 32.8–60.6) in individuals aged 45–54 and 55–64 (Figure 5—figure supplement 1), showing small variation among age groups. The estimated upper bound for the agestandardised cumulative attack rate in the second period dominated by the Gamma variant was 37.5% (95% CrI 35.3–42.6), significantly smaller than the cumulative attack rate of 75.0% (Figure 2) estimated for the first period dominated by nonGamma variants.
Comparing to the COVID19 attributable hospitalisations and deaths reported to the SIVEP database, we next used the estimated upper bounds of the agespecific attack rates in the Gamma period in Manaus to calculate lower bounds of the agespecific IHR and IFR for the Gamma period. We then compared the IFRs and IHRs obtained with the attack rate estimated for the period during which nonGamma variants dominated (from 1 March 2020 to 15 December 2020). The resulting agespecific IFRs and IHRs are shown in Figure 5A, B, respectively, and the relative risks obtained using the IFR or IHR in December 2020 as baseline in Figure 5C, D. The lower bound for the IHR increased in all age groups, from 34.4% (95% CrI 6.5–70.0) in individuals aged 16–24 to 163.4% (95% CrI 90.9–264.3) in individuals aged 45–54 when compared to the IHR estimated for the nonGamma period. The increased hospitalisation risk combined with an increased inhospital fatality rate (HFR, defined as the number of deaths per hospitalisation) during the second wave (Appendix 1—figure 19) resulted in an increased agespecific IFR, with a lower bound increasing 93.8% (95% CrI 36.4–186.4) in individuals aged 55–64 to 273.5% (95% CrI 167.8–423.4%) in individuals aged 45–54 when compared to the first wave (Figure 5C). As such, even though the IFR and IHR increased for all age groups during the Gammadominated period, this difference was more significant in younger age groups. The obtained lower bound for the overall IFR was 0.527% (95% CrI 0.447–0.630), 2.91 (95% CrI 2.43–3.53) times higher than the estimated IFR for the first wave in Manaus.
Discussion
Our results highlight the divergent epidemic dynamics across eight of Brazil’s biggest cities as reflected by mortality rates, and show that these differences are recapitulated in blood donorbased serial crosssectional serosurveillance. Despite the large IFR differences observed across cities, seroprevalence was strongly correlated with cumulative agestandardised mortality (Figure 4B). These results reinforce the validity of blood donors as a convenient population for serosurveillance. A previous study (Mina et al., 2020) has highlighted the need for a reliable, costeffective method of immunological surveillance to provide evidence of past infection and to understand the dynamics of emerging disease. Even though serology is less precise for identifying infections on an individual level, it is an effective tool for monitoring epidemics at a population level. As blood donation programs are an existing component of medical infrastructure globally and in which blood samples are readily available in many locations, this approach can be rapidly implemented and carried out in large populations.
We estimated larger attack rates in individuals aged 16–54 years. This is consistent with previous work examining age patterns of transmission from mobility data in the United States (Monod et al., 2021), but we have measured infection directly rather than making inferences indirectly on the basis of COVID19 deaths and movement. Possible reasons for the higher attack rate in people aged 16–54 years include, but are not limited to, different risk perception and shielding practises, and disease biology with more frequent asymptomatic infections in younger people, which increase infection risk in this age group due to greater mixing among working age adults. We also found overall higher levels of seroprevalence in men compared to women, and these patterns changed over time. For instance, in Manaus, a very high seroprevalence was reached rapidly among young men by July 2020, after which relatively little increase in overall seroprevalence occurred in men. By contrast, among older women, who reached less than half the attack rate seen in men by June, the seroprevalence continued to increase. This heterogeneity in transmission in a location with high overall antibody prevalence meant that some groups remained relatively susceptible and perpetuated transmission at a lower level (Buss and Sabino, 2021; Lalwani et al., 2021). Other works suggest that socioeconomic condition also contributed to heterogeneity of SARSCoV2 spread in Manaus (Lalwani et al., 2021), which is confirmed by the large seroprevalence observed in Black and lesseducated donors (Appendix 1—figure 20).
We also confirm a strong age dependency of COVID19 IFRs (Brazeau et al., 2020; O’Driscoll et al., 2021). Although agespecific IFRs were roughly similar across the cities (Figure 4A) and similar to estimates in the literature (Figure 4D), there were some noticeable differences. For example, the more affluent south and southeastern cities of Belo Horizonte, Curitiba, and São Paulo tended to have higher agespecific IFRs, whereas in the northern and northeastern cities of Manaus, Salvador, and Fortaleza, the agespecific IFRs tended to be lower. This may be due to underreporting of deaths but might also reflect lower prevalence of comorbidities in the latter populations (Figure 2—figure supplement 2). Cities with larger IFRs also had larger IHRs, suggesting that the differences in IFR reflect the different risks of developing a severe disease. The different lineages circulating in the eight cities may have also contributed to the observed IFR and IHR difference (Appendix 1—figure 1). While most of the cases in the first wave in Amazonas and Ceará were caused by earlier lineages, the lineages B.1.1.28, B.1.1.33, and later P.2 (Zeta) were more prevalent in other states. It is worth noting the IHR also depends on the probability of an individual with severe disease being hospitalised. This probability depends on access to health facilities and availability of healthcare resources, and therefore may vary across cities even if disease severity remains constant.
Our results also clearly demonstrate a higher IHR during the Gamma-dominated observation period compared to the non-Gamma observation period in Manaus for all age groups (Figure 5). This supports observations (Banho et al., 2021) that the Gamma VOC tends to cause more severe disease than the ancestral non-Gamma variants circulating locally, even among young adults in Manaus. The larger increase in IHR for younger adults aged 25–54 years is compatible with the younger profile of hospitalisations of the Gamma-dominated wave in Brazil (de Souza et al., 2021), observed before vaccination coverage reached significant levels in older age groups. In Manaus, the increased levels of hospitalisation caused parts of the healthcare system to collapse during the second wave, causing an increase in HFR as previously described (Brizzi et al., 2021) and further increasing the IFR. The higher IFR associated with Gamma VOC infection during the second wave is therefore due to a combination of two factors: increased disease severity resulting in a greater proportion of infections requiring hospital-based care (the IHR, arising primarily from intrinsic viral properties and pathogenicity), and the impact of this increased healthcare pressure on mortality within hospitals (the HFR, arising primarily from healthcare pressure).
There are some relevant limitations to our results that need to be pointed out. First, blood donors are a convenience sample, and extrapolation to the entire population should be done with caution. Due to eligibility criteria in Brazil, blood donors are limited to those aged 16–69 years, with a strong skew towards younger adults even within this eligibility range (Figure 2—figure supplement 1) in most Brazilian regions. However, our results do suggest that blood donor serosurveillance agrees with other metrics of epidemic size, such as mortality, both cumulatively (Figure 5B) and through time (Figure 3). Moreover, both sensitivity and seroreversion could be age-dependent, with age acting as a proxy for disease severity, i.e., older individuals are more likely to be symptomatic, to seroconvert, and to have a longer time to seroreversion. Indeed, we see this pattern of longer half-lives and larger peak S/Cs in convalescent plasma donors who had recovered from more severe disease (Appendix 1—figure 21). Therefore, correction of crude seroprevalence for antibody waning could possibly be confounded by demographic differences between the eight cities. However, since individuals who had severe disease are unlikely to donate blood, seropositive whole blood donors are likely fairly homogenous in having had milder or asymptomatic disease, and as such, the rate of waning may not vary significantly between locations. An additional important point to note is that the longer an epidemic lasts, the more frequent reinfections become due to the natural waning of immunity in the period following infection. Our data span over a year of transmission in areas with multiple waves, a high SARS-CoV-2 burden, and consequently non-negligible reinfection rates; as such, it is difficult to reliably infer the attack rates from seroprevalence data towards the end of the time series. For this reason, our model produces upper bounds for cumulative prevalence of >100% in Manaus by early 2021.
Despite these limitations, blood donors represent an accessible population to detect trends of the epidemic that otherwise could only be obtained through expensive populationbased studies, which are difficult to establish in Brazil during the course of a rapidly progressing epidemic. Studies to understand the main differences between blood donors and the general population would help the development of better sampling protocols to mitigate bias and should be part of preparedness for future epidemics of infectious diseases.
Appendix 1
Validation of the obtained attack rates and IFRs
The seroprevalence and IFRs obtained in December 2020 estimated with our seroreversion correction model were validated using a smaller threshold of 0.1 and correcting only for sensitivity and specificity, without explicitly correcting for seroreversion (Appendix 1—figure 22, Supplementary file 1). Even though this approach underestimates the seroprevalence (thus overestimates the IFR) because a fraction of previously seropositive donors had already seroreverted by December 2020 (leading to a significant number of false negative test results), the obtained attack rates and IFRs were similar to the estimates of our model. The inferred seroprevalence for Manaus and Curitiba in December 2020 was, respectively, 61.0% (95% CrI 56.5–65.4%) and 13.4% (95% CrI 10.0–17.2%), compatible with the estimates of our seroreversion correction model (Figure 2, Table 1) given that these quantities underestimate the seroprevalence due to waning. The IFR pattern across cities was also similar in this analysis, being higher in Curitiba and smaller in Manaus for almost all age groups.
An alternative approach to estimating the attack rate in December 2020, in the face of waning antibodies and falling assay sensitivity, is to calculate the IFR early in the epidemic, prior to significant waning, and extrapolate the number of future cases from the reported death time series. To further validate the attack rates, we calculated the agespecific IFRs in June 2020 (when less seroreversion is expected) for the age range eligible to donate blood and extrapolated using only the deaths within this age bracket. As such, the seroprevalence obtained for the other months is based solely on the number of deaths and the IFR inferred for June 2020 (Appendix 1—figure 23). This approach led to an estimated seroprevalence of 90.8% (95% CrI 78.1–107.7%) and 10.9% (95% CrI 1.2–28.0%) in Manaus and Curitiba, respectively, in December 2020, which are compatible with our estimates if confidence intervals are considered. Nevertheless, this approach has the limitation of assuming a constant IFR through time and only using a small amount of the total available serologic data.
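The extrapolation described above can be illustrated with the short sketch below, for a single age group. It assumes a constant IFR over time and ignores the lag between infection and death; the function name `extrapolate_attack_rate` and all numbers are hypothetical and are not the study data.

```python
import numpy as np

def extrapolate_attack_rate(cum_deaths, ref_index, ref_seroprevalence, population):
    """Extrapolate the attack rate from cumulative deaths using a reference-month IFR.

    cum_deaths: cumulative deaths per month in the donor-eligible age range.
    ref_index: index of the reference month (June 2020 in the text).
    ref_seroprevalence: seroprevalence measured in the reference month.
    """
    cum_deaths = np.asarray(cum_deaths, dtype=float)
    # IFR inferred early in the epidemic, before substantial seroreversion.
    ifr_ref = cum_deaths[ref_index] / (ref_seroprevalence * population)
    # Assumption: the same IFR applies to all other months.
    return cum_deaths / (ifr_ref * population)

# Illustrative values only.
cum_deaths = [100, 400, 500, 650]
attack_rate = extrapolate_attack_rate(cum_deaths, ref_index=1,
                                      ref_seroprevalence=0.40,
                                      population=1_000_000)
```

Under these assumptions the extrapolated attack rate is simply the reference seroprevalence scaled by the ratio of cumulative deaths, which makes explicit why a change in IFR over time would bias the result.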
To validate the inferred cumulative attack rate in November 2020 prior to the Gammadominated second wave and also in April 2021, following this second wave, we retested 996 samples from November 2020 in Manaus and tested 769 samples from April 2021 using the Abbott antiS SARSCoV2 IgG CIMA (Appendix 1—figure 24), which showed less waning than the Abbott antiN assay used in this work (Stone et al., 2021). As such, the usage of the antiS assay reduced the difference between the seroprevalence obtained without explicitly correcting for seroreversion and the true seroprevalence. A sensitivity of 94.0% was obtained by testing convalescent plasma donors with this assay (Supplementary file 1), and the specificity was assumed as 100%. The crude prevalence of antiS antibodies was 56.7% (95% CrI 53.6–59.8%) in November 2020 and 78.7% (95% CrI 75.6–81.4%) in April 2021. After correcting for sensitivity and reweighting by age and sex, the seroprevalence estimate was 60.0% (95% CrI 58.4–62.2%) in November 2020 and 83.3% (95% CrI 81.1–86.4%) in April 2021, compared to 68.0% (95% CrI 64.2–72.7%) and 99.5% (95% CrI 94.0–106.6%) estimated using our seroreversion correction model for November 2020 and March 2021. Note that the attack rate estimated by our model considers both infections and reinfections among seronegative individuals, hence, the confidence intervals higher than 100%. Of note, we measured the halflife of this assay using the serial repeat blood donors data available in Prete et al., 2022, obtaining a median halflife of 124.5 (interquartile range 74.7–258.0) days. In November, 6 months following the first wave in Manaus, some cases of seroreversion are expected to have occurred; as such, this remains an underestimate of the true cumulative attack rate by this point. Assuming no reinfections before November, the smaller seroprevalence measured with the antiS assay suggests that 8.0% of previously infected donors seroreverted before November 2020.
Derivation of the expression of the probability of a positive test
In this section, we derive the expression of the probability of a positive test $\theta [n,a]$ in terms of the age-specific weekly incidence $u[n,a]$. Let us denote the negation of an event $E$ as $\overline{E}$, and the probability of an event $E$ as $\mathbb{P}\left\{E\right\}$. To shorten the next equations, we also denote by ${\mathcal{T}}^{+}\left(n,a\right)$ the event of a test applied to an individual from age-sex group $a$ at week $n$ being positive, by $I(n,a)$ the event of an individual from age-sex group $a$ being infected at week $n$, such that the incidence at week $n$ in age-sex group $a$ is $\mathbb{P}\left\{I\left(n,a\right)\right\}$, by $I\left(1:n,a\right)={\bigcup}_{k=1}^{n}I(k,a)$ the event of an individual from group $a$ having been infected up to and including week $n$, and by $\mathcal{C}\left(a\right)$ the event of an infected individual from group $a$ seroconverting after infection. We consider that the initial sensitivity $\mathrm{s}{\mathrm{e}}_{0}$ (i.e. the sensitivity right after seroconversion) and specificity $\mathrm{s}\mathrm{p}$ of the assay do not depend on the age-sex group or on time.
The probability $\theta \left[n,a\right]$ of a test applied to a person from agesex group $a$ being positive at week $n$ is
The first term can be decomposed as
The term $\mathbb{P}\left\{{\mathcal{T}}^{+}\left(n,a\right)\mid I\left(k,a\right)\right\}$ can be further decomposed into
Assuming that an infected individual that did not seroconvert cannot have a positive test at any instant, we have $\mathbb{P}\left\{{\mathcal{T}}^{+}\left(n,a\right)\mid I\left(k,a\right),\overline{\mathcal{C}}\left(a\right)\right\}=0$. We approximate $\mathbb{P}\left\{{\mathcal{T}}^{+}\left(n,a\right)\mid I\left(k,a\right)\right\}$ to ${p}^{+}[nk]$ (i.e. the probability of a test being positive $nk$ weeks after seroconversion), neglecting the delay between infection and seroconversion. Since the mean delay between infection and seroreversion is smaller than 8 days as explained above, and since crude seroprevalence data are discretized using weeks as time unit, this delay has small influence on seroprevalence estimates. Because $\mathbb{P}\left\{\mathcal{C}\left(a\right)\mid I\left(k,a\right)\right\}$ is the sensitivity $\mathrm{s}{\mathrm{e}}_{0}$ of the assay and $\mathbb{P}\left\{I\left(k,a\right)\right\}$ is the incidence $u[k,a]$, we have $\mathbb{P}\left\{{\mathcal{T}}^{+}\left(n,a\right),I\left(1:n,a\right)\right\}$ $=\mathrm{s}{\mathrm{e}}_{0}\sum _{k=1}^{n}{p}^{+}\left[nk\right]u\left[k,a\right]$.
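The second term of $\theta [n,a]$ is
$\mathbb{P}\left\{{\mathcal{T}}^{+}\left(n,a\right),\overline{I}\left(1:n,a\right)\right\}=\mathbb{P}\left\{{\mathcal{T}}^{+}\left(n,a\right)\mid \overline{I}\left(1:n,a\right)\right\}\mathbb{P}\left\{\overline{I}\left(1:n,a\right)\right\}=\left(1-\mathrm{sp}\right)\left(1-\sum _{k=1}^{n}u\left[k,a\right]\right),$
where $\mathrm{sp}=1-\mathbb{P}\left\{{\mathcal{T}}^{+}\left(n,a\right)\mid \overline{I}\left(1:n,a\right)\right\}$ is the specificity of the assay, which does not change over time.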
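Therefore, a simpler expression for $\theta [n,a]$ is obtained:
$\theta \left[n,a\right]=\mathrm{se}_{0}\sum _{k=1}^{n}{p}^{+}\left[n-k\right]u\left[k,a\right]+\left(1-\mathrm{sp}\right)\left(1-\sum _{k=1}^{n}u\left[k,a\right]\right).$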
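As an illustration, the expression for $\theta[n,a]$ that the Bayesian model in this appendix uses (a waning-weighted sum of past incidence plus a false-positive term) can be evaluated numerically. The sketch below is not the authors' code: the function name, the incidence values, and the assay parameters are all made up for the example. It also shows that without waning (${p}^{+}\equiv 1$) the expression reduces to $\mathrm{se}_{0}\rho +(1-\mathrm{sp})(1-\rho)$, where $\rho$ is the cumulative incidence.

```python
def positive_test_probability(u, p_plus, se0, sp):
    """Return [theta[1], ..., theta[N]] for a single age-sex group.

    u[k-1]    : incidence in week k, for k = 1..N (hypothetical values)
    p_plus[m] : probability of still testing positive m weeks after seroconversion
    se0, sp   : initial sensitivity and specificity of the assay
    """
    theta = []
    cumulative_incidence = 0.0
    for n in range(1, len(u) + 1):
        cumulative_incidence += u[n - 1]
        # se0 * sum_{k=1}^{n} p+[n-k] * u[k]
        still_positive = sum(p_plus[n - k] * u[k - 1] for k in range(1, n + 1))
        # (1 - sp) * (1 - sum_{k=1}^{n} u[k]) is the false-positive contribution
        theta.append(se0 * still_positive
                     + (1.0 - sp) * (1.0 - cumulative_incidence))
    return theta


# Made-up inputs, for illustration only.
u = [0.01, 0.03, 0.05, 0.02]
theta_no_waning = positive_test_probability(u, [1.0, 1.0, 1.0, 1.0], se0=0.9, sp=0.99)
theta_waning = positive_test_probability(u, [1.0, 0.9, 0.8, 0.7], se0=0.9, sp=0.99)
```

With these inputs, waning lowers the expected proportion of positive tests in the last week relative to the no-waning case, which is the effect the seroreversion correction is meant to undo.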
Description of the method used to validate the seroprevalence and IFR for 2020
To validate the seroprevalence and IFRs estimated for 2020, we recalculate these quantities by measuring the prevalence in December 2020 with a smaller threshold equal to 0.1 to partially account for seroreversion and correct for sensitivity and specificity, without explicitly incorporating a method to correct for seroreversion (Appendix 1—figure 23).
Let $\mathrm{TP},\mathrm{FN},\mathrm{FP}$, and $\mathrm{TN}$ be, respectively, the number of true positives, false negatives, false positives and true negatives obtained from plasma donors and the pre-pandemic cohort in Manaus using a threshold of 0.1. We use a uniform distribution on the interval [0, 1] as the prior for the seroprevalence of age-sex group $a$, and also for the sensitivity $\mathrm{se}$ and specificity $\mathrm{sp}$. The posterior distributions of the sensitivity and specificity are, respectively, $\mathrm{Beta}\left(1+\mathrm{TP},1+\mathrm{FN}\right)$ and $\mathrm{Beta}\left(1+\mathrm{TN},1+\mathrm{FP}\right)$. Given the seroprevalence $\rho \left[a\right]$ of age-sex group $a$, the number of positive tests ${T}^{+}\left[a\right]$ follows a binomial distribution of size $T\left[a\right]$ (the number of tests for this age-sex group) and probability $\mathrm{se}\times \rho \left[a\right]+(1-\mathrm{sp})(1-\rho \left[a\right])$. To draw a posterior sample of $\rho \left[a\right]$, we draw a posterior sample of $\mathrm{se}$ and $\mathrm{sp}$ and a sample of the auxiliary variable $Z\left[a\right]\sim \mathrm{Beta}({T}^{+}\left[a\right]+1,T\left[a\right]-{T}^{+}\left[a\right]+1)$, which represents the raw measured prevalence. Then, we compute the prevalence adjusted by sensitivity and specificity through $\rho \left[a\right]=\max \left(0,\frac{Z\left[a\right]+\mathrm{sp}-1}{\mathrm{se}+\mathrm{sp}-1}\right).$ Finally, a sample of the IFR is drawn from $\mathrm{Beta}(1+D\left[a\right],1+\lfloor \mathrm{pop}\left[a\right]\times \rho \left[a\right]\rfloor -D\left[a\right])$, where $D\left[a\right]$ is the number of deaths and $\mathrm{pop}\left[a\right]$ the population size of age-sex group $a$.
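The sampling steps of this validation method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and all counts are hypothetical, the clipping at zero follows the adjusted-prevalence formula above, and draws for which the Beta distribution of the IFR is undefined are simply returned without an IFR value.

```python
import math
import random


def sample_prevalence_and_ifr(tp, fn, tn, fp, n_tests, n_pos, deaths, pop, rng):
    """Draw one posterior sample of (rho, IFR) for one age-sex group."""
    se = rng.betavariate(1 + tp, 1 + fn)   # posterior of sensitivity
    sp = rng.betavariate(1 + tn, 1 + fp)   # posterior of specificity
    # Raw measured prevalence: Beta(T+ + 1, T - T+ + 1)
    z = rng.betavariate(n_pos + 1, n_tests - n_pos + 1)
    if se + sp <= 1.0:
        return None, None                  # adjustment undefined for an uninformative assay
    # Prevalence adjusted by sensitivity and specificity, clipped at zero
    rho = max(0.0, (z + sp - 1.0) / (se + sp - 1.0))
    infected = math.floor(pop * rho)
    if infected < deaths:
        return rho, None                   # Beta parameters would not be positive
    ifr = rng.betavariate(1 + deaths, 1 + infected - deaths)
    return rho, ifr


# Hypothetical counts, for illustration only.
rng = random.Random(1)
samples = [
    sample_prevalence_and_ifr(tp=180, fn=20, tn=990, fp=10,
                              n_tests=1000, n_pos=300,
                              deaths=100, pop=100_000, rng=rng)
    for _ in range(1000)
]
rho_samples = sorted(rho for rho, _ in samples if rho is not None)
ifr_samples = sorted(ifr for _, ifr in samples if ifr is not None)
rho_median = rho_samples[len(rho_samples) // 2]
ifr_median = ifr_samples[len(ifr_samples) // 2]
```

Credible intervals would then be read from the quantiles of `rho_samples` and `ifr_samples`.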
Selection of SARI hospitalisations and deaths to estimate the IFR and IHR
In the first months of the SARS-CoV-2 epidemic in Brazil, only a small proportion of SARI cases were tested for SARS-CoV-2, leading to a large number of non-notified deaths. For this reason, instead of using only COVID-19 confirmed hospitalisations or deaths to estimate the IFR and IHR, we also included SARI hospitalisations or deaths with unknown aetiology. This approach was proposed in de Souza et al., 2020. In this section, we investigate the validity of this approach by comparing SARI hospitalisations and deaths in the eight cities recorded between 2013 and 2021.
Appendix 1—figure 25 shows the monthly number of SARI deaths from 2013 to 2021 disaggregated by case classification (confirmed SARS-CoV-2 infection, infection confirmed to be caused by other respiratory viruses, and cases with unknown or missing aetiology). The monthly number of SARI deaths increased abruptly in March 2020 due to the SARS-CoV-2 epidemic, reaching 3810, which is 14.5 times larger than the previous peak of SARI cases in April 2016. Nevertheless, only 58.6% of the SARI deaths in March 2020 were confirmed as COVID-19 cases, while 39.7% had unknown or missing aetiology, suggesting that most SARI deaths with unknown aetiology were non-notified COVID-19 deaths. Even if there had been an epidemic of another respiratory virus in March 2020 that caused a number of cases similar to April 2016, it would only explain 17.4% of the SARI cases with unknown or missing aetiology in March 2020.
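The 17.4% figure can be reproduced from the three numbers quoted in this paragraph. The short calculation below is only a consistency check, not part of the original analysis; it assumes that the April 2016 peak is the March 2020 count divided by 14.5 and that this whole peak is attributed to deaths of unknown aetiology. Variable names are illustrative.

```python
march_2020_sari_deaths = 3810    # monthly SARI deaths in March 2020
ratio_to_previous_peak = 14.5    # March 2020 count relative to the April 2016 peak
share_unknown_aetiology = 0.397  # share of March 2020 deaths with unknown/missing aetiology

# Implied size of the April 2016 peak (about 263 deaths).
previous_peak = march_2020_sari_deaths / ratio_to_previous_peak

# Deaths with unknown or missing aetiology in March 2020 (about 1513 deaths).
unknown_deaths = share_unknown_aetiology * march_2020_sari_deaths

# Share of those deaths that a 2016-sized epidemic of another virus could explain.
explained_share = previous_peak / unknown_deaths
print(f"{explained_share:.1%}")  # prints 17.4%
```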
The proportion of each case classification among monthly SARI deaths is shown in Appendix 1—figure 4b. The proportion of deaths with unknown or missing aetiology peaked in September 2020 and decreased in the following months, likely due to the increasing availability of tests. Therefore, the effect of taking SARI into account is more important in the first year of the epidemic in Brazil.
Similar patterns are observed for SARI hospitalisations, as shown in Appendix 1—figure 26. However, SARI hospitalisations in March 2020 are only 4.7 times larger than the previous peak of monthly SARI hospitalisations in April 2016, suggesting that our approach is more sensitive to SARI cases caused by other respiratory viruses when hospitalisations are used. Nevertheless, an epidemic of other respiratory viruses similar to the historical peak of SARI cases in April 2016 would only explain 42% of the SARI cases with unknown or missing aetiology in March 2020.
Method used to estimate ${\mathit{p}}^{+}\left[\mathit{n}\right]$ from repeat blood donors
Here we summarise the step-by-step procedure used to estimate ${p}^{+}\left[n\right]$, the probability of an individual remaining seropositive $n$ weeks after seroconversion, described in Methods. The algorithm receives as inputs: the set of serial donations from ${N}_{\mathrm{donors}}$ repeat blood donors who have at least one positive result and a second result with decaying S/C; the daily incidence over time for the repeat blood donor cohort ${u}_{\mathrm{repeat}}\left[n\right]$ (see Methods); and the number of samples ${N}_{\mathrm{samples}}$ used to estimate the probability distribution of the time to seroreversion. The output of the algorithm is an estimate of ${p}^{+}\left[n\right]$.
The algorithm is described below:
For $i\in \left\{1,2,\cdots ,{N}_{\mathrm{donors}}\right\}$:
Calculate the date of seroreversion for donor $i$ as the instant at which the exponential curve passing through the last positive donation and the first negative donation after seroconversion (if seroreversion occurred), or otherwise through the last two positive donations, crosses the threshold, as illustrated in Appendix 1—figure 10. Denote it by ${t}_{i}^{-}$.
Denote ${t}_{\mathrm{min}}$ as the date of the last negative result of the donor before seroconversion, or set ${t}_{\mathrm{min}}$ to 1 March 2020 if the donor had no negative results before seroconversion. Denote ${t}_{\mathrm{max}}$ as the date of the first positive result. The unobserved date of seroconversion belongs to the interval $[{t}_{\mathrm{min}},{t}_{\mathrm{max}}]$ and its probability mass function is given by
${p}_{i}\left[n\right]=\begin{cases}\frac{{u}_{\mathrm{repeat}}\left[n\right]}{\sum _{k={t}_{\mathrm{min}}}^{{t}_{\mathrm{max}}}{u}_{\mathrm{repeat}}\left[k\right]}, & {t}_{\mathrm{min}}\le n\le {t}_{\mathrm{max}}\\ 0, & \text{otherwise.}\end{cases}$
Generate ${N}_{\mathrm{samples}}$ samples $\mathrm{\Delta}{t}_{i}^{\left(1\right)},\mathrm{\Delta}{t}_{i}^{\left(2\right)},\cdots ,\mathrm{\Delta}{t}_{i}^{\left({N}_{\mathrm{samples}}\right)}$ from $\mathrm{\Delta}{t}_{i}={t}_{i}^{-}-{t}_{i}^{+}$, where ${t}_{i}^{+}\sim {p}_{i}\left[n\right]$.
Calculate the empirical probability mass function of $\mathrm{\Delta}{t}_{i}$ by computing the empirical histogram of the generated samples $\mathrm{\Delta}{t}_{i}^{\left(1\right)},\mathrm{\Delta}{t}_{i}^{\left(2\right)},\cdots ,\mathrm{\Delta}{t}_{i}^{\left({N}_{\mathrm{samples}}\right)}$ and denote it as ${p}_{\mathrm{day}}^{-}\left[n\right]$.
Convert ${p}_{\mathrm{day}}^{-}\left[n\right]$ from days to weeks, obtaining ${p}_{\mathrm{week}}^{-}\left[n\right]$:
${p}_{\mathrm{week}}^{-}\left[n\right]=\frac{1}{7}\sum _{i=1}^{7}\sum _{j=7n+1}^{7\left(n+1\right)}{p}_{\mathrm{day}}^{-}\left[j-i\right].$
Calculate the probability of an individual remaining seropositive $n$ weeks after seroconversion ${p}^{+}\left[n\right]$ as
${p}^{+}\left[n\right]=1-\sum _{k=1}^{n}{p}_{\mathrm{week}}^{-}\left[k\right].$
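The steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the dictionary layout of a donor record, the flat incidence curve, and the S/C threshold of 1.0 are all hypothetical, dates are represented as integer day numbers, and the delays of all donors are pooled into a single daily histogram, which is an assumption about how the per-donor samples are combined.

```python
import math
import random
from collections import Counter


def seroreversion_day(t1, sc1, t2, sc2, threshold):
    """Day on which the exponential through (t1, sc1) and (t2, sc2) crosses the threshold.

    Requires t2 > t1 and sc1 > sc2 > 0 (decaying S/C).
    """
    decay_rate = math.log(sc1 / sc2) / (t2 - t1)
    return t1 + math.log(sc1 / threshold) / decay_rate


def sample_seroconversion_day(u_repeat, t_min, t_max, rng):
    """Draw t+ from p_i[n], which is proportional to u_repeat[n] on [t_min, t_max]."""
    days = list(range(t_min, t_max + 1))
    weights = [u_repeat[day] for day in days]
    return rng.choices(days, weights=weights, k=1)[0]


def estimate_p_plus(donors, u_repeat, threshold, n_samples, n_weeks, rng):
    """Return [p+[0], ..., p+[n_weeks]] from sampled delays t- minus t+."""
    delay_counts = Counter()
    for donor in donors:
        t_minus = seroreversion_day(donor["t1"], donor["sc1"],
                                    donor["t2"], donor["sc2"], threshold)
        for _ in range(n_samples):
            t_plus = sample_seroconversion_day(
                u_repeat, donor["t_min"], donor["t_max"], rng)
            delay_counts[round(t_minus - t_plus)] += 1  # delay in whole days

    total = sum(delay_counts.values())

    def p_day(d):
        return delay_counts.get(d, 0) / total

    # Days to weeks: average over the seven possible day offsets within a week.
    p_week = [
        sum(p_day(j - i)
            for i in range(1, 8)
            for j in range(7 * n + 1, 7 * (n + 1) + 1)) / 7
        for n in range(n_weeks + 1)
    ]
    # p+[n] = 1 - sum_{k=1}^{n} p_week[k]
    return [1.0 - sum(p_week[1:n + 1]) for n in range(n_weeks + 1)]


rng = random.Random(0)
u_repeat = [1.0] * 400  # hypothetical flat daily incidence, indexed by day number
donors = [              # hypothetical donation histories (day numbers and S/C values)
    {"t_min": 10, "t_max": 40, "t1": 100, "sc1": 2.0, "t2": 160, "sc2": 0.3},
    {"t_min": 30, "t_max": 90, "t1": 150, "sc1": 3.0, "t2": 220, "sc2": 1.5},
]
p_plus = estimate_p_plus(donors, u_repeat, threshold=1.0,
                         n_samples=200, n_weeks=60, rng=rng)
```

The first donor seroreverts between the two donations used for the fit, whereas the second is still positive at the last donation, so the crossing time is an extrapolation of the fitted curve; both cases go through the same function.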
Description of the Bayesian model used to estimate the seroprevalence
We now present an objective description of the seroreversion correction model introduced in Methods. This is a Bayesian model that produces age- and sex-specific seroprevalence estimates corrected for seroreversion, sensitivity, and specificity. The model receives as inputs: the probability of seropositivity $n$ weeks after seroconversion ${p}^{+}\left[n\right]$; the weekly number of tests $T[n,a]$ and weekly number of positive tests ${T}^{+}\left[n,a\right]$ at week $n$ for age-sex group $a$; the number of true positives ($\mathrm{TP}$), true negatives ($\mathrm{TN}$), false positives ($\mathrm{FP}$) and false negatives ($\mathrm{FN}$) used to determine the sensitivity and specificity of the assay; and the maximum seroprevalence allowed $b$. In this work, we use $b=2$ to partially account for reinfections. The model generates as output posterior samples from the weekly incidence $u[n,a]$ for age-sex group $a=1,2,\cdots ,M$ and week $n=1,2,\cdots ,N$. To this end, the model generates posterior samples from the sensitivity $\mathrm{se}_{0}$, the specificity $\mathrm{sp}$, the normalised incidence ${u}_{\mathrm{norm}}[n,a]$, and the final seroprevalence ${\rho}_{\mathrm{max}}\left[a\right]$, obtaining $u[n,a]$ from these parameters.
The Bayesian model is described below:
Prior distributions:
$\mathrm{se}_{0}\sim \mathrm{Beta}(1+\mathrm{TP},1+\mathrm{FN})$
$\mathrm{sp}\sim \mathrm{Beta}(1+\mathrm{TN},1+\mathrm{FP})$
${u}_{\mathrm{norm}}\left[1:N,a\right]\sim \mathrm{Dirichlet}\left(1,1,\cdots ,1\right)\ \forall a$
${\rho}_{\mathrm{max}}\left[a\right]\sim \mathrm{Uniform}\left(0,b\right)\ \forall a$
Auxiliary variables:
$u\left[n,a\right]={u}_{\mathrm{norm}}\left[n,a\right]\times {\rho}_{\mathrm{max}}\left[a\right],\ \forall n,a$
$\theta \left[n,a\right]=\mathrm{se}_{0}\sum _{k=1}^{n}{p}^{+}\left[n-k\right]u\left[k,a\right]+\left(1-\mathrm{sp}\right)\left(1-\sum _{k=1}^{n}u\left[k,a\right]\right)\ \forall n,a$
Likelihood:
${T}^{+}\left[n,a\right]\sim \mathrm{B}\mathrm{i}\mathrm{n}\mathrm{o}\mathrm{m}\mathrm{i}\mathrm{a}\mathrm{l}\left(T\left[n,a\right],\theta \left[n,a\right]\right)\forall n,a$
We note the estimated seroprevalence is the cumulative sum of the obtained incidence $u[n,a]$ and can therefore be larger than 1 due to reinfections. Also, since this model assumes all infections occur in seronegative donors, $u[n,a]$ can be interpreted as the incidence in seronegative donors, and reinfections among seropositive individuals are not detected.
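To make the structure of the model concrete, the sketch below draws one set of parameters from the priors for a single age-sex group and evaluates the binomial log-likelihood of some weekly test counts. It is a hedged illustration only: the function names and all data are hypothetical, the Dirichlet(1, ..., 1) draw is obtained by normalising unit-rate exponential variates, and returning minus infinity when $\theta$ leaves the open interval (0, 1) is our own convention for handling draws that the binomial likelihood cannot accommodate. Posterior sampling itself (for example by MCMC) is not shown.

```python
import math
import random


def draw_from_priors(tp, fn, tn, fp, n_weeks, b, rng):
    """One prior draw of (se0, sp, rho_max, u) for one age-sex group."""
    se0 = rng.betavariate(1 + tp, 1 + fn)
    sp = rng.betavariate(1 + tn, 1 + fp)
    # Dirichlet(1, ..., 1): normalised Gamma(1, 1) variates
    gammas = [rng.gammavariate(1.0, 1.0) for _ in range(n_weeks)]
    total = sum(gammas)
    u_norm = [g / total for g in gammas]
    rho_max = rng.uniform(0.0, b)
    u = [x * rho_max for x in u_norm]  # u[n] = u_norm[n] * rho_max
    return se0, sp, rho_max, u


def theta_series(u, p_plus, se0, sp):
    """theta[n] = se0 * sum_k p+[n-k] u[k] + (1 - sp) * (1 - sum_k u[k])."""
    theta, cumulative = [], 0.0
    for n in range(1, len(u) + 1):
        cumulative += u[n - 1]
        detected = sum(p_plus[n - k] * u[k - 1] for k in range(1, n + 1))
        theta.append(se0 * detected + (1.0 - sp) * (1.0 - cumulative))
    return theta


def binomial_log_likelihood(t_pos, t_all, theta):
    """Sum over weeks of log Binomial(T+[n] | T[n], theta[n])."""
    log_lik = 0.0
    for k, n, th in zip(t_pos, t_all, theta):
        if not 0.0 < th < 1.0:
            return -math.inf  # assumed convention for inadmissible draws
        log_lik += (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(th) + (n - k) * math.log(1.0 - th))
    return log_lik


# Hypothetical inputs, for illustration only.
rng = random.Random(2)
p_plus = [1.0, 0.98, 0.95, 0.90]
t_all = [200, 200, 200, 200]
t_pos = [4, 10, 20, 25]
se0, sp, rho_max, u = draw_from_priors(tp=180, fn=20, tn=990, fp=10,
                                       n_weeks=4, b=2.0, rng=rng)
log_lik = binomial_log_likelihood(t_pos, t_all, theta_series(u, p_plus, se0, sp))
```

Because the weekly incidences sum to `rho_max`, which may be as large as $b=2$, the cumulative seroprevalence implied by a draw can exceed 1, in line with the remark above about reinfections.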
Summary of the method used to estimate the upper bound for the attack rate of the Gamma VOC in Manaus and the lower bound for the IFR
We now present a summary of the procedure used to infer bounds for the age-specific attack rate and IFR of the Gamma-dominated wave in Manaus, explained in Methods. This procedure was also used to estimate the IHR, but in this case, the number of hospitalisations was used instead of the number of deaths.
The algorithm is executed independently for each age group and receives as inputs: the number of deaths $D$ with symptom onset between 16 December 2020 and 15 March 2021; the monthly number of positive tests ${T}^{+}\left[n\right]$ and the monthly number of tests $T\left[n\right]$ for $n\in \left\{12,13,14,15\right\}$, i.e., from December 2020 ($n=12$) to March 2021 ($n=15$); the number of true positives ($\mathrm{TP}$) and false negatives ($\mathrm{FN}$) from the convalescent plasma donor cohort; and the population size $\mathrm{pop}$.
The algorithm produces as outputs posterior samples of the maximum attack rate for the Gamma wave in Manaus ($\mathrm{AR}_{\mathrm{max}}$) and the minimum IFR ($\mathrm{IFR}_{\mathrm{min}}$), but also generates posterior samples from the following auxiliary variables: the incidence between months $n$ and $n+1$, denoted as $u\left[n\right]$, and the seroprevalence in December 2020 (month $n=12$), denoted as ${\rho}_{\mathrm{December}}$.
First, we generate posterior samples from ${\rho}_{\mathrm{D}\mathrm{e}\mathrm{c}\mathrm{e}\mathrm{m}\mathrm{b}\mathrm{e}\mathrm{r}}$ and $u\left[n\right]$ using the Bayesian model described below:
Prior distributions:
${\rho}_{\mathrm{December}}\sim \mathrm{Uniform}\left(0,1\right)$
$\left({u}_{\mathrm{norm}}\left[12\right],{u}_{\mathrm{norm}}\left[13\right],{u}_{\mathrm{norm}}\left[14\right]\right)\sim \mathrm{Dirichlet}(1,1,1)$
${u}_{\mathrm{max}}\sim \mathrm{Uniform}\left(0,1\right)$
Auxiliary variables:
$u\left[n\right]={u}_{\mathrm{norm}}\left[n\right]\times {u}_{\mathrm{max}},\ n\in \left\{12,13,14\right\}$
Likelihood:
${T}^{+}\left[n\right]\sim \mathrm{Binomial}\left(T\left[n\right],{\rho}_{\mathrm{December}}+\sum _{k=12}^{n-1}u\left[k\right]\right),\ n\in \left\{12,13,14,15\right\}$
Then, for each posterior sample generated by the Bayesian model:
Draw a sample from $\mathrm{s}\mathrm{e}\sim \mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}(1+\mathrm{T}\mathrm{P},1+\mathrm{F}\mathrm{N})$.
Compute the incidence corrected by sensitivity ${u}^{\mathrm{c}\mathrm{o}\mathrm{r}\mathrm{r}}\left[n\right]=\frac{u\left[n\right]}{\mathrm{s}\mathrm{e}}$ .
Compute $\hat{\mathrm{A}\mathrm{R}}=\sum _{n=12}^{14}{u}^{\mathrm{c}\mathrm{o}\mathrm{r}\mathrm{r}}\left[n\right].$
Compute $\mathrm{A}{\mathrm{R}}_{\mathrm{m}\mathrm{a}\mathrm{x}}={\rho}_{\mathrm{D}\mathrm{e}\mathrm{c}\mathrm{e}\mathrm{m}\mathrm{b}\mathrm{e}\mathrm{r}}+\hat{\mathrm{A}\mathrm{R}}$ .
Draw a sample from the lower bound of the IFR as
$\mathrm{IFR}_{\mathrm{min}}\sim \mathrm{Beta}\left(1+D,1+\lfloor \mathrm{AR}_{\mathrm{max}}\times \mathrm{pop}\rfloor -D\right).$
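The post-processing applied to each posterior sample can be sketched as below. The function name and every input value are hypothetical; in practice the values of ${\rho}_{\mathrm{December}}$ and $u[n]$ would come from the posterior of the Bayesian model above rather than being typed in by hand, and the guard for an undefined Beta distribution is our own addition.

```python
import math
import random


def gamma_wave_bounds(rho_december, u, tp, fn, deaths, pop, rng):
    """Return one sample of (AR_max, IFR_min) from one posterior sample.

    rho_december : seroprevalence in December 2020 (month 12)
    u            : [u[12], u[13], u[14]], incidence between consecutive months
    """
    se = rng.betavariate(1 + tp, 1 + fn)      # sensitivity draw
    u_corr = [x / se for x in u]              # incidence corrected by sensitivity
    ar_hat = sum(u_corr)                      # sum over n = 12, 13, 14
    ar_max = rho_december + ar_hat
    infected = math.floor(ar_max * pop)
    if infected < deaths:
        return ar_max, None                   # Beta parameters would not be positive
    ifr_min = rng.betavariate(1 + deaths, 1 + infected - deaths)
    return ar_max, ifr_min


# Hypothetical posterior sample and inputs, for illustration only.
rng = random.Random(3)
rho_december = 0.30
u = [0.05, 0.10, 0.05]
ar_max, ifr_min = gamma_wave_bounds(rho_december, u, tp=180, fn=20,
                                    deaths=500, pop=200_000, rng=rng)
```

Since the attack rate appears in the denominator of the fatality rate, using its upper bound is what makes the resulting IFR a lower bound.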
Data availability
All serological data required to reproduce the analyses are available at Data Dryad (doi:https://doi.org/10.5061/dryad.dz08kps08) and can be downloaded at https://datadryad.org/stash/dataset/doi:10.5061/dryad.dz08kps08. The code used for the main analyses is available at https://github.com/CADDECENTRE/seroprevalence_eight_cities (copy archived at swh:1:rev:67518ad26368c1f4856fdfd4c08673abeded4901).

Dryad Digital Repository. Data from: SARS-CoV-2 antibody dynamics in blood donors and COVID-19 epidemiology in eight Brazilian state capitals. https://doi.org/10.5061/dryad.dz08kps08
References

Report 34: COVID-19 infection fatality ratio: estimates from seroprevalence. Report 34.

Epidemiological and clinical characteristics of the COVID-19 epidemic in Brazil. Nature Human Behaviour 4:856–865. https://doi.org/10.1038/s4156202009284

Second wave of COVID-19 in Brazil: younger at higher risk. European Journal of Epidemiology 36:441–443. https://doi.org/10.1007/s10654021007508

SARS-CoV-2 antibody prevalence in Brazil: results from two successive nationwide serological household surveys. The Lancet. Global Health 8:e1390–e1398. https://doi.org/10.1016/S2214109X(20)303879

The duration, dynamics and determinants of SARS-CoV-2 antibody responses in individual healthcare workers. Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America 73:e699–e709. https://doi.org/10.1093/cid/ciab004

Comparison of SARS-CoV-2 IgM and IgG seroconversion profiles among hospitalized patients in two US cities. Diagnostic Microbiology and Infectious Disease 99:115300. https://doi.org/10.1016/j.diagmicrobio.2020.115300

Reinfection by the SARS-CoV-2 Gamma variant in blood donors in Manaus, Brazil. BMC Infectious Diseases 22:127. https://doi.org/10.1186/s1287902207094y

Are seroprevalence estimates for severe acute respiratory syndrome coronavirus 2 biased? The Journal of Infectious Diseases 222:1772–1775. https://doi.org/10.1093/infdis/jiaa523
Article and author information
Funding
Itau Unibanco (Todos pela Saúde)
 Nuno R Faria
 Ester C Sabino
FAPESP (18/14389-0)
 Nuno R Faria
 Ester C Sabino
Medical Research Council (MR/S0195/1)
 Nuno R Faria
 Ester C Sabino
Wellcome Trust and Royal Society (Sir Henry Dale Fellowship 204311/Z/16/Z)
 Nuno R Faria
Gates Foundation (INV 034540 and INV034652)
 Nuno R Faria
 Ester C Sabino
National Heart, Lung, and Blood Institute (Recipient Epidemiology and Donor Evaluation Study HHSN268201100007I)
 Nuno R Faria
 Ester C Sabino
FAPESP (2019/21858-0)
 Carlos A Prete Jr
Fundacao Faculdade de Medicina
 Carlos A Prete Jr
CAPES (Finance Code 001)
 Carlos A Prete Jr
CNPq (304714/2018-6)
 Vítor H Nascimento
FAPESP
 Suzete C Ferreira
Programa Inova FIOCRUZCE/Funcap (Edital 01/2020 Number: FIO016700065.01.00/20 SPU Nº 06531047/2020)
 Fabio Miyajima
CNPq
 Manoel BarralNetto
JBS – Fazer o bem faz bem
 Rafael FO Franca
Medical Research Council (MR/V038109/1)
 Oliver Ratmann
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. For the purpose of Open Access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.
Acknowledgements
This work was supported by the Itaú Unibanco 'Todos pela Saúde' program and by CADDE/FAPESP (MR/S0195/1 and FAPESP 18/14389–0) (http://caddecentre.org/); Wellcome Trust and Royal Society Sir Henry Dale Fellowship 204311/Z/16/Z (NRF); the Gates Foundation (INV 034540 and INV034652); the National Heart, Lung, and Blood Institute Recipient Epidemiology and Donor Evaluation Study (REDS, now in its fourth phase, REDS-IV-P) for providing the blood donor demographic and zip code data for analysis (grant HHSN268201100007I); and the UK Medical Research Council under a concordat with the UK Department for International Development and Community Jameel and the NIHR Health Protection Research Unit in Modelling Methodology. CAPJ was supported by FAPESP (2019/21858-0) and Fundação Faculdade de Medicina. CAPJ and VHN were supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001. VHN was supported by CNPq (304714/2018–6). SCF is supported by FAPESP. FM is supported by PROGRAMA INOVA FIOCRUZ-CE/Funcap, Edital 01/2020 Number: FIO0167–00065.01.00/20 SPU N° 06531047/2020. MBN is supported by CNPq. RFOF is supported by JBS – Fazer o bem faz bem. OR is supported by Medical Research Council MR/V038109/1.
The Blood Center SARS-CoV-2 Prevalence group also comprises Cláudia M M Abrahim, Martirene A Silva, Fabíola S A Hanna, Adriana S N Ramos, Juqueline R Cristal and Samara Alves. We also thank Robert Verity for his critical review of the paper and suggestions.
Ethics
This project was approved by the Brazilian national research ethics committee (CONEP), CAAE 30178220.3.1001.0068. CONEP waived the requirement for informed consent. All methods were performed in accordance with relevant guidelines and regulations.
Version history
 Preprint posted: February 22, 2022
 Received: February 28, 2022
 Accepted: September 17, 2022
 Accepted Manuscript published: September 22, 2022 (version 1)
 Accepted Manuscript updated: September 23, 2022 (version 2)
 Version of Record published: October 7, 2022 (version 3)
Copyright
© 2022, Prete, Buss, Whittaker et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.