COVID-19 pandemic dynamics in South Africa and epidemiological characteristics of three variants of concern (Beta, Delta, and Omicron)
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of concern (VOCs) have been key drivers of new coronavirus disease 2019 (COVID-19) pandemic waves. To better understand variant epidemiologic characteristics, here we apply a model-inference system to reconstruct SARS-CoV-2 transmission dynamics in South Africa, a country that has experienced three VOC pandemic waves (i.e. Beta, Delta, and Omicron BA.1) by February 2022. We estimate key epidemiologic quantities in each of the nine South African provinces during March 2020 to February 2022, while accounting for changing detection rates, infection seasonality, non-pharmaceutical interventions, and vaccination. Model validation shows that estimated underlying infection rates and key parameters (e.g. infection-detection rate and infection-fatality risk) are in line with independent epidemiological data and investigations. In addition, retrospective predictions capture pandemic trajectories beyond the model training period. These detailed, validated model-inference estimates thus enable quantification of both the immune erosion potential and transmissibility of three major SARS-CoV-2 VOCs, that is, Beta, Delta, and Omicron BA.1. These findings help elucidate changing COVID-19 dynamics and inform future public health planning.
Editor's evaluation
This paper proposes a modeling framework that can be used to track the complex behavioral and immunological landscape of the COVID-19 pandemic over multiple surges and variants in South Africa, which has been validated previously for other regions and time periods. This work may be useful for infectious disease modelers, epidemiologists, and public health officials as they navigate the next phase of the pandemic or seek to understand the history of the epidemic in South Africa.
https://doi.org/10.7554/eLife.78933.sa0
Introduction
Since its emergence in late December 2019, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread globally, causing the coronavirus disease 2019 (COVID-19) pandemic (Koelle et al., 2022). In just 2 years, SARS-CoV-2 has caused several pandemic waves in quick succession in many places. Many of these repeated pandemic waves have been driven by new variants of concern (VOCs) or interest (VOIs) that erode prior immunity from either infection or vaccination, increase transmissibility, or a combination of both. However, while laboratory and field studies have provided insights into these epidemiological characteristics, quantifying the extent of immune erosion (or evasion) and changes to transmissibility for each VOC remains challenging.
Like many places, by February 2022 South Africa had experienced four distinct pandemic waves caused by the ancestral SARS-CoV-2 and three VOCs (Beta, Delta, and Omicron BA.1). However, South Africa is also unique in that the country had the earliest surge for two of the five VOCs identified to date – namely, Beta (Tegally et al., 2021) and Omicron (Viana et al., 2022). To better understand the COVID-19 dynamics in South Africa and variant epidemiological characteristics, here we utilize a model-inference system similar to one developed for study of SARS-CoV-2 VOCs, including the Beta variant in South Africa (Yang and Shaman, 2021c). We use this system to reconstruct SARS-CoV-2 transmission dynamics in each of the nine provinces of South Africa from the pandemic onset during March 2020 to the end of February 2022 while accounting for multiple factors modulating underlying transmission dynamics. We then rigorously validate the model-inference estimates using independent data and retrospective predictions. The validated estimates quantify the immune erosion potential and transmissibility of three major SARS-CoV-2 variants, that is, Beta, Delta, and Omicron (BA.1), in South Africa. Our findings highlight several common characteristics of SARS-CoV-2 VOCs and the need for more proactive planning and preparedness for future VOCs, including development of a universal vaccine that can effectively block SARS-CoV-2 infection as well as prevent severe disease.
Results
Model fit and validation
The model-inference system uses case and death data to reconstruct the transmission dynamics of SARS-CoV-2, while accounting for underdetection of infection, infection seasonality, implemented non-pharmaceutical interventions (NPIs), and vaccination (see Materials and methods). Overall, the model-inference system is able to fit weekly case and death data in each of the nine South African provinces (Figure 1A, Appendix 1—figure 1, and additional discussion in Appendix 1). Additional testing (in particular, for the infection-detection rate) and visual inspections indicate that posterior estimates for the model parameters are consistent with those reported in the literature, or changed over time and/or across provinces in directions as would be expected (see Appendix 1).
We then validated the model-inference estimates using three independent datasets. First, we used serology data. We note that early in the pandemic serology data may reflect underlying infection rates, but later, due to waning antibody titers and reinfection, likely underestimate infection. Compared to seroprevalence measures taken at multiple time points in each province, model-estimated cumulative infection rates roughly match corresponding serology measures and trends over time; as expected, model estimates were higher than serology measures taken during later months (Figure 1B). Second, compared to hospital admission data, across the nine provinces, model-estimated infection numbers were well correlated with numbers of hospitalizations for all four pandemic waves caused by the ancestral, Beta, Delta, and Omicron (BA.1) variants, respectively (r>0.75, Appendix 1—figure 2A–D). Third, model-estimated infection numbers were correlated with age-adjusted excess mortality for both the ancestral and Delta waves (r=0.86 and 0.61, respectively; Appendix 1—figure 2A and C). For the Beta wave, after excluding Western Cape, a province with a very high hospitalization rate but low excess mortality during this wave (Appendix 1—figure 2B), model-estimated infection numbers were also correlated with age-adjusted excess mortality for the remaining provinces (r=0.55; Appendix 1—figure 2B). For the Omicron (BA.1) wave, as in many other places, due to prior infection and/or vaccination (Nyberg et al., 2022; Wolter et al., 2022), mortality rates decoupled from infection rates (Appendix 1—figure 2D). Overall, comparisons with the three independent datasets indicate our model-inference estimates align with underlying transmission dynamics.
In addition, as a fourth model validation, we generated retrospective predictions of the Delta and Omicron (BA.1) waves at two key time points, that is, 2 weeks and 1 week before the observed peak of cases (approximately 3–5 weeks before the observed peak of deaths; Figure 2). To accurately predict a pandemic wave caused by a new variant, the model-inference system needs to accurately estimate the background population characteristics (e.g. population susceptibility) before the emergence of the new variant, as well as changes in population susceptibility and transmissibility due to the new variant. This is particularly challenging for South Africa, as the pandemic waves there tended to progress quickly, with cases surging and peaking within 3–7 weeks before declining. As a result, often only 1–6 weeks of new variant data were available for model-inference before generating the prediction. Despite these challenges, 1–2 weeks before the case peak and 3–5 weeks before the observed death peak, the model was able to accurately predict the remaining trajectories of cases and deaths in most of the nine provinces for both the Delta and Omicron (BA.1) waves (Figure 2 for the four most populous provinces and Appendix 1—figure 3 for the remainder). These accurate model predictions further validate the model-inference estimates.
Pandemic dynamics and key model-inference estimates, using Gauteng province as an example
Next, we use Gauteng, the province with the largest population, as an example to highlight pandemic dynamics in South Africa thus far and develop key model-inference estimates (Figure 3 for Gauteng and Appendix 1—figures 4–11 for each of the other eight provinces). Despite lower cases per capita than many other countries, infection numbers in South Africa were likely much higher due to underdetection. For Gauteng, the estimated infection-detection rate during the first pandemic wave was 4.59% (95% CI: 2.62–9.77%), and increased slightly to 6.18% (95% CI: 3.29–11.11%) and 6.27% (95% CI: 3.44–12.39%) during the Beta and Delta waves, respectively (Appendix 1—table 1). These estimates are in line with serology data. In particular, a population-level serosurvey in Gauteng found 68.4% seropositivity among those unvaccinated at the end of the Delta wave (Madhi et al., 2022). Combining the reported cases at that time (~6% of the population size) with undercounting of infections in serosurveys due to seroreversions and reinfections suggests that the overall detection rate would be less than 10%.
Using our inferred underdetection (Figure 3E), we estimate that 32.83% (95% CI: 15.42–57.59%, Appendix 1—table 2) of the population in Gauteng were infected during the first wave, predominantly during winter when more conducive climate conditions and relaxed public health restrictions existed (see the estimated seasonal and mobility trends, Figure 3A). This high infection rate, albeit with uncertainty, is in line with serology measures taken in Gauteng at the end of the first wave (ranging from 15% to 27% among six serosurveys during November 2020; Figure 1B) and a study showing 30% seropositivity among participants enrolled in the Novavax NVX-CoV2373 vaccine phase 2a-b trial in South Africa during August–November 2020 (Shinde et al., 2021).
With the emergence of Beta, another 21.87% (95% CI: 12.16–41.13%) of the population in Gauteng – including reinfections – is estimated to have been infected, even though the Beta wave occurred during summer under less conducive climate conditions for transmission (Figure 3A). The model-inference system estimates a large increase in population susceptibility with the surge of Beta (Figure 3D; note population susceptibility is computed as S/N×100%, where S is the estimated number of susceptible people and N is population size). This dramatic increase in population susceptibility (vs. a likely more gradual change due to waning immunity), to the then predominant Beta variant, suggests Beta likely substantially eroded prior immunity and is consistent with laboratory studies showing low neutralizing ability of convalescent sera against Beta (Garcia-Beltran et al., 2021; Wall et al., 2021). In addition, an increase in transmissibility is also evident for Beta, after accounting for concurrent NPIs and infection seasonality (Figure 3C; note transmissibility is computed as the product of the estimated variant-specific transmission rate and the infectious period; see Materials and methods for detail). Notably, in contrast to the large fluctuation of the time-varying effective reproduction number over time (R_{t}, Figure 3B), the transmissibility estimates are more stable and reflect changes in variant-specific properties. Further, consistent with in-depth epidemiological findings (Abu-Raddad et al., 2021a), the estimated overall infection-fatality risk for Beta was about twice as high as that of the ancestral SARS-CoV-2 (0.19% [95% CI: 0.10–0.33%] vs. 0.09% [95% CI: 0.05–0.20%], Figure 3F and Appendix 1—table 3). Nonetheless, these estimates are based on documented COVID-19 deaths and are likely underestimates.
With the introduction of Delta, a third pandemic wave occurred in Gauteng during the 2021 winter. The model-inference system estimates a 49.82% (95% CI: 25.22–90.79%) attack rate by Delta, despite the large number of infections during the previous two waves. This large attack rate was possible due to the high transmissibility of Delta, as reported in multiple studies (Public Health England, 2021; Allen et al., 2022; Challen et al., 2021; Earnest et al., 2021; Vöhringer et al., 2021), the more conducive winter transmission conditions (Figure 3A), and the immune-erosive properties of Delta relative to both the ancestral and Beta variants (Dhar et al., 2021; Liu et al., 2021; de Oliveira and Lessells, 2021).
Due to these large pandemic waves, prior to the detection of Omicron (BA.1) in Gauteng, estimated cumulative infection numbers surpassed the population size (Figure 4B), indicating that the large majority of the population had been infected and some more than once. With the rise of Omicron (BA.1), the model-inference system estimates a very large increase in population susceptibility (Figure 3D), as well as an increase in transmissibility (Figure 3C); however, unlike previous waves, the Omicron (BA.1) wave progressed much more quickly, peaking 2–3 weeks after initiating marked exponential growth. These estimates suggest that several additional factors may have also contributed to the observed dynamics, including changes to the infection-detection rate (Figure 3E and Appendix 1), a summer seasonality increasingly suppressing transmission as the wave progressed (Figure 3A), as well as a slight change in population mobility suggesting potential behavior changes (Figure 3A). By the end of February 2022, the model-inference system estimates a 44.49% (95% CI: 19.01–75.30%) attack rate, with only 4.26% (95% CI: 2.46–9.72%) of infections detected as cases, during the Omicron (BA.1) wave in Gauteng. In addition, consistent with the reported 0.3 odds of severe disease compared to Delta infections (Wolter et al., 2022), the estimated overall infection-fatality risk during the Omicron (BA.1) wave was about 30% of that during the Delta wave in Gauteng (0.03% [95% CI: 0.02–0.06%] vs. 0.11% [95% CI: 0.06–0.21%], based on documented COVID-19 deaths; Appendix 1—table 3).
Model-inferred epidemiological characteristics across the nine provinces in South Africa
Across all nine provinces in South Africa, the pandemic timing and intensity varied (Figure 4A–C). In addition to Gauteng, high cumulative infection rates during the first three pandemic waves are also estimated for Western Cape and Northern Cape (Figure 1C–E, Figure 4B and Appendix 1—table 2). Overall, all nine provinces likely experienced three large pandemic waves prior to the growth of Omicron (BA.1); estimated average cumulative infections ranged from 60% of the population in Limpopo to 122% in Northern Cape (Figure 4B). Corroboration for these cumulative infection estimates is derived from mortality data. Excess mortality before the Omicron (BA.1) wave was as high as 0.47% of the South African population by the end of November 2021 (The South African Medical Research Council (SAMRC), 2021), despite the relatively young population (median age: 27.6 years [Anonymous, 2020b] vs. 38.5 years in the US [United States Census Bureau, 2020]) and thus lower expected infection-fatality risk (Levin et al., 2020; O'Driscoll et al., 2021). Assuming an infection-fatality risk of 0.5% (similar to estimates in COVID-19 Forecasting Team, 2022 for South Africa), these excess deaths would convert to a 94% infection rate.
We then use these model-inference estimates to quantify the immune erosion potential and increase in transmissibility for each VOC. Specifically, the immune erosion (against infection) potential is computed as the ratio of two quantities – the numerator is the increase of population susceptibility due to a given VOC and the denominator is population immunity (i.e. the complement of population susceptibility) at wave onset. The relative increase in transmissibility is also computed as a ratio, that is, the average increase due to a given VOC relative to the ancestral SARS-CoV-2 (see Materials and methods). As population-specific factors contributing to transmissibility (e.g. population density and average contact rate) would be largely cancelled out in the latter ratio, we expect estimates of the VOC transmissibility increase to be generally applicable to different populations. However, prior exposures and vaccinations varied over time and across populations; thus, the level of immune erosion is necessarily estimated relative to the local population immune landscape at the time of the variant surge and should be interpreted accordingly. In addition, this assessment does not distinguish the sources of immunity or partial protection against severe disease; rather, it assesses the overall loss of immune protection against infection for a given VOC.
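As a minimal illustration of these two ratios, the computation can be sketched as follows; all numeric inputs are made-up illustrative values, not estimates from this study:

```python
# Schematic computation of the two variant-level quantities defined above.
# All inputs are hypothetical illustrative values, not estimates from the paper.

def immune_erosion(delta_susceptibility: float, susceptibility_at_onset: float) -> float:
    """Immune erosion = increase in population susceptibility caused by the VOC,
    divided by population immunity (1 - susceptibility) at wave onset."""
    immunity_at_onset = 1.0 - susceptibility_at_onset
    return delta_susceptibility / immunity_at_onset

def transmissibility_increase(rtx_variant: float, rtx_ancestral: float) -> float:
    """Relative increase in variant transmissibility (R_TX = beta * D)
    over the ancestral virus."""
    return rtx_variant / rtx_ancestral - 1.0

# Example: susceptibility rises from 30% to 65% when the variant emerges,
# i.e. 35 points of susceptibility regained out of 70 points of immunity.
erosion = immune_erosion(delta_susceptibility=0.35, susceptibility_at_onset=0.30)
increase = transmissibility_increase(rtx_variant=3.4, rtx_ancestral=2.5)
print(round(erosion, 2), round(increase, 2))  # → 0.5 0.36
```

Because both quantities are ratios, the population-specific scale factors (contact rates, density) cancel, which is the basis for the transferability argument made above.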
In the above context, we estimate that Beta eroded immunity among 63.4% (95% CI: 45.0–77.9%) of individuals with prior ancestral SARS-CoV-2 infection and was 34.3% (95% CI: 20.5–48.2%) more transmissible than the ancestral SARS-CoV-2. These estimates for Beta are consistent across the nine provinces (Figure 4D, 1st column and Table 1), as well as with our previous estimates using national data for South Africa (Yang and Shaman, 2021c). Additional support for the high immune erosion of Beta is evident from recoverees of ancestral SARS-CoV-2 infection who were enrolled in the Novavax NVX-CoV2373 vaccine phase 2a-b trial (Shinde et al., 2021) and found to have a similar likelihood of COVID-19, mostly due to Beta, compared to those seronegative at enrollment.
Estimates for Delta vary across the nine provinces (Figure 4D, 2nd column), given the more diverse population immune landscape among provinces after two pandemic waves. Overall, we estimate that Delta eroded 24.5% (95% CI: 0–53.2%) of prior immunity (gained from infection by ancestral SARS-CoV-2 and/or Beta, and/or vaccination) and was 47.5% (95% CI: 28.4–69.4%) more transmissible than the ancestral SARS-CoV-2. Consistent with this finding, and in particular the estimated immune erosion, studies have reported a 27.5% reinfection rate during the Delta pandemic wave in Delhi, India (Dhar et al., 2021) and reduced ability of sera from Beta-infection recoverees to neutralize Delta (Liu et al., 2021; de Oliveira and Lessells, 2021).
For Omicron (BA.1), estimates also vary by province but still consistently point to its higher transmissibility than all previous variants (Figure 4D, 3rd column). Overall, we estimate that Omicron (BA.1) is 94.0% (95% CI: 73.5–121.5%) more transmissible than the ancestral SARS-CoV-2. This estimated transmissibility is higher than that of Delta and consistent with in vitro and/or ex vivo studies showing Omicron (BA.1) replicates faster within host than Delta (Garcia-Beltran et al., 2022; Hui et al., 2022). In addition, we estimate that Omicron (BA.1) eroded 54.1% (95% CI: 35.8–70.1%) of immunity due to all prior infections and vaccination. Importantly, as noted above, the estimate for immune erosion is not directly comparable across variants, as it is relative to the combined population immunity accumulated until the rise of each variant. In the case of Beta, it is immunity accumulated from the first wave via infection by the ancestral SARS-CoV-2. In the case of Omicron (BA.1), it includes immunity from prior infection and reinfection by any of the previously circulating variants as well as vaccination. Thus, the estimate for Omicron (BA.1) may represent a far broader capacity for immune erosion than was evident for Beta. Supporting the suggestion of broad-spectrum immune erosion by Omicron (BA.1), studies have reported low neutralization ability of convalescent sera from infections by all previous variants (Rössler et al., 2022; Cele et al., 2022), as well as high attack rates among vaccinees in several Omicron (BA.1) outbreaks (Brandal et al., 2021; Helmsdal et al., 2022).
Discussion
Using a comprehensive model-inference system, we have reconstructed the pandemic dynamics in each of the nine provinces of South Africa. Uncertainties exist in our findings, due to incomplete and varying detection of SARS-CoV-2 infections and deaths, changing population behavior and public health interventions, and changing circulating variants. To address these uncertainties, we have validated our estimates using three datasets not used by our model-inference system (i.e. serology, hospitalization, and excess mortality data; Figure 1B and Appendix 1—figure 2) as well as retrospective prediction (Figure 2 and Appendix 1—figure 3). In addition, as detailed in the Results, we have shown that estimated underlying infection rates (Figure 1B and Appendix 1—figure 2) and key parameters (e.g. infection-detection rate and infection-fatality risk) are in line with other independent epidemiological data and investigations. The detailed, validated model-inference estimates thus allow quantification of both the immune erosion potential and transmissibility of three major SARS-CoV-2 VOCs, that is, Beta, Delta, and Omicron (BA.1).
The relevance of our model-inference estimates to previous studies has been presented in the Results section. Here, we make three additional general observations, drawn from global SARS-CoV-2 dynamics including but not limited to findings in South Africa. First, high prior immunity does not preclude new outbreaks, as neither infection nor current vaccination is sterilizing. As shown in South Africa, even with the high infection rate accumulated from preceding waves, new waves can occur with the emergence or introduction of new variants. Around half of South Africans are estimated to have been infected after the Beta wave (Appendix 1—table 2), yet the Delta variant caused a third large pandemic wave, followed by a fourth wave with comparable infection rates by Omicron BA.1 (Figure 4B, Appendix 1—table 2, and Appendix 1—table 4 for a preliminary assessment of reinfection rates).
Second, large numbers of hospitalizations and/or deaths can still occur in later waves with large infection surges, even though prior infection may provide partial protection and to some extent temper disease severity. This is evident from the large Delta wave in South Africa, which resulted in 0.2% excess mortality (vs. 0.08% during the first wave and 0.19% during the Beta wave [The South African Medical Research Council (SAMRC), 2021]). More recently, due to the Omicron BA.4/BA.5 subvariants that have been shown to evade prior immunity including that from BA.1 infection (Cao et al., 2022; Khan et al., 2022), a fifth wave began in South Africa during May 2022, leading to increases in both cases and hospitalizations (Sarah et al., 2022). Together, the continued transmission and potential severe outcomes highlight the importance of continued preparedness and prompt public health actions as societies learn to live with SARS-CoV-2.
Third, multiple SARS-CoV-2 VOCs/VOIs have emerged in the 2 years since pandemic inception. It is challenging to predict the frequency and direction of future viral mutation, in particular, the level of immune erosion, changes in transmissibility, and innate severity. Nonetheless, given high exposure and vaccination in many populations, variants capable of eroding a wide spectrum of prior immunity (i.e. from infection by multiple preexisting variants and vaccination) would have a greater chance of causing new major outbreaks. Indeed, except for the Alpha variant, the other four important VOCs (i.e. Beta, Gamma, Delta, and Omicron) all produced some level of immune erosion. In addition, later VOCs, like Delta and Omicron, appear to have been more genetically distinct from previous variants (van der Straten et al., 2022). As a result, they are likely more capable of causing reinfection despite diverse prior exposures and, in turn, new pandemic waves. Given this pattern, to prepare for future antigenic changes from new variants, development of a universal vaccine that can effectively block SARS-CoV-2 infection in addition to preventing severe disease (e.g. as shown in Mao et al., 2022) is urgently needed (Morens et al., 2022).
The COVID-19 pandemic has caused devastating public health and economic burdens worldwide. Yet SARS-CoV-2 will likely persist in the future. To mitigate its impact, proactive planning and preparedness are paramount.
Materials and methods
Data sources and processing
We used reported COVID-19 case and mortality data to capture transmission dynamics, weather data to estimate infection seasonality, mobility data to represent concurrent NPIs, and vaccination data to account for changes in population susceptibility due to vaccination in the model-inference system. Provincial-level COVID-19 case, mortality, and vaccination data were sourced from the Coronavirus COVID-19 (2019-nCoV) Data Repository for South Africa (COVID19ZA) (Data Science for Social Impact Research Group at University of Pretoria, 2021). Hourly surface station temperature and relative humidity came from the Integrated Surface Dataset (ISD) maintained by the National Oceanic and Atmospheric Administration (NOAA) and are accessible using the ‘stationaRy’ R package (Iannone, 2020a; Iannone, 2020b). We computed specific humidity using temperature and relative humidity per the Clausius-Clapeyron equation (Wallace and Hobbs, 2006). We then aggregated these data for all weather stations in each province with measurements since 2000 and calculated the average for each week of the year during 2000–2020.
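The specific-humidity calculation can be sketched as follows. This is a minimal sketch: the Magnus-type approximation of saturation vapor pressure used here is a common stand-in for the Clausius-Clapeyron relation, and the constants are standard meteorological values rather than ones taken from Wallace and Hobbs:

```python
import math

def specific_humidity(temp_c: float, rh_pct: float, pressure_pa: float = 101325.0) -> float:
    """Specific humidity (kg water vapor per kg moist air) from temperature and
    relative humidity, using a Magnus-type approximation (a common form of the
    Clausius-Clapeyron relation) for saturation vapor pressure over water."""
    e_s = 611.2 * math.exp(17.67 * temp_c / (temp_c + 243.5))  # saturation vapor pressure (Pa)
    e = (rh_pct / 100.0) * e_s                                 # actual vapor pressure (Pa)
    return 0.622 * e / (pressure_pa - 0.378 * e)

# Example: 25 °C and 60% relative humidity at standard surface pressure
q = specific_humidity(25.0, 60.0)
print(round(q * 1000, 1))  # grams of water vapor per kg of moist air
```

Weekly station values of q computed this way would then be averaged by province and by week of year, as described above.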
Mobility data were derived from Google Community Mobility Reports (Google Inc, 2020); we aggregated all business-related categories (i.e. retail and recreational, transit stations, and workplaces) in all locations in each province to weekly intervals. For vaccination, provincial vaccination data from the COVID19ZA data repository recorded the total number of vaccine doses administered over time; to obtain a breakdown for numbers of partial (one dose of mRNA vaccine) and full vaccinations (one dose of Janssen vaccine or two doses of mRNA vaccine), separately, we used national vaccination data for South Africa from Our World in Data (Anonymous, 2020a; Mathieu et al., 2021) to apportion the doses each day. In addition, the cumulative case data suggested 18,586 new cases on November 23, 2021, whereas the South Africa Department of Health reported 868 (Department of Health Republic of South Africa, 2021a). Thus, for November 23, 2021, we used linear interpolation to fill in estimates for each province on that day and then scaled the estimates such that they sum to 868.
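The rescaling step for the anomalous day can be sketched as below; the provincial counts are hypothetical placeholders — only the national total of 868 comes from the text:

```python
def rescale_to_total(interpolated: dict, reported_total: float) -> dict:
    """Scale linearly interpolated provincial counts so they sum to the
    nationally reported total for that day, preserving provincial proportions."""
    s = sum(interpolated.values())
    return {prov: v * reported_total / s for prov, v in interpolated.items()}

# Hypothetical provincial estimates for the anomalous day, e.g. obtained by
# interpolating between the adjacent days' counts (values are not real data):
interp = {"Gauteng": 300.0, "Western Cape": 120.0, "KwaZulu-Natal": 80.0}
scaled = rescale_to_total(interp, reported_total=868)
print(round(sum(scaled.values())))  # → 868
```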
Model-inference system
The model-inference system is based on our previous work estimating changes in transmissibility and immune erosion for SARS-CoV-2 VOCs including Alpha, Beta, Gamma, and Delta (Yang and Shaman, 2021c; Yang and Shaman, 2022). Below we describe each component.
Epidemic model
The epidemic model follows an SEIRSV (susceptible-exposed-infectious-recovered-susceptible-vaccination) construct per Equation 1:

$$\begin{aligned}\frac{dS}{dt}&=-\frac{b_{t}e_{t}m_{t}\beta_{t}IS}{N}+\frac{R}{L_{t}}-v_{1,t}-v_{2,t}\\ \frac{dE}{dt}&=\frac{b_{t}e_{t}m_{t}\beta_{t}IS}{N}-\frac{E}{Z_{t}}+\varepsilon\\ \frac{dI}{dt}&=\frac{E}{Z_{t}}-\frac{I}{D_{t}}\\ \frac{dR}{dt}&=\frac{I}{D_{t}}-\frac{R}{L_{t}}+v_{1,t}+v_{2,t}\end{aligned}\tag{1}$$

where S, E, I, and R are the numbers of susceptible, exposed (but not yet infectious), infectious, and recovered/immune/deceased individuals; N is the population size; and ε is the number of travel-imported infections. In addition, the model includes the following key components:
Virus-specific properties, including the time-varying variant-specific transmission rate ${\beta}_{t}$, latency period Z_{t}, infectious period D_{t}, and immunity period L_{t}. Of note, the immunity period L_{t} and the term R/L_{t} in Equation 1 are used to model the waning of immune protection against infection. Also note that all parameters are estimated for each week (t) as described below.
The impact of NPIs. Specifically, we use relative population mobility (see data above) to adjust the transmission rate via the term m_{t}, as the overall impact of NPIs (e.g. reduction in the time-varying effective reproduction number R_{t}) has been reported to be highly correlated with population mobility during the COVID-19 pandemic (Yang et al., 2021b; Lasry et al., 2020; Kraemer et al., 2020). To further account for potential changes in effectiveness, the model additionally includes a parameter, e_{t}, to scale NPI effectiveness.
The impact of vaccination, via the terms v_{1,t} and v_{2,t}. Specifically, v_{1,t} is the number of individuals successfully immunized after the first dose of vaccine and is computed using vaccination data and vaccine effectiveness (VE) for the 1st dose; and v_{2,t} is the additional number of individuals successfully immunized after the second vaccine dose (i.e. excluding those successfully immunized after the first dose). In South Africa, around two-thirds of vaccines administered during our study period were the mRNA BioNTech/Pfizer vaccine and one-third the Janssen vaccine (Department of Health Republic of South Africa, 2021b). We thus set VE to 20%/85% (partial/full vaccination) for Beta, 35%/75% for Delta, and 10%/35% for Omicron (BA.1) based on reported VE estimates (Abu-Raddad et al., 2021b; Lopez Bernal et al., 2021; Andrews et al., 2021).
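A sketch of how v_{1,t} and v_{2,t} might be computed from daily dose counts. The accounting used here — dose two adds protection only for the (VE2 − VE1) share of second-dose recipients not already protected by dose one — is our assumption for illustration, not a formula stated in the text:

```python
def newly_immunized(first_doses: float, second_doses: float,
                    ve1: float, ve2: float) -> tuple:
    """Numbers successfully immunized, per the description above:
    v1 counts people protected by dose one; v2 counts the *additional*
    people protected by dose two (excluding those already protected).
    The (ve2 - ve1) share for dose two is an illustrative assumption."""
    v1 = ve1 * first_doses
    v2 = (ve2 - ve1) * second_doses
    return v1, v2

# Example with the Delta-period VE values quoted above (35% partial / 75% full)
# and hypothetical dose counts:
v1, v2 = newly_immunized(first_doses=10000, second_doses=8000, ve1=0.35, ve2=0.75)
print(round(v1), round(v2))  # → 3500 3200
```

In the model, v_{1,t} and v_{2,t} computed this way would move individuals from S to R in Equation 1.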
Infection seasonality, computed using temperature and specific humidity data as described previously (see supplemental material of Yang and Shaman, 2021c). Briefly, we estimated the relative seasonal trend (b_{t}) using a model representing the dependency of the survival of respiratory viruses, including SARS-CoV-2, on temperature and humidity (Biryukov et al., 2020; Morris et al., 2021), per Equation 2.
In essence, the seasonality function in Equation 2 assumes that humidity has a bimodal effect on seasonal risk of infection, with both low and high humidity conditions favoring transmission [i.e. the parabola in the 1st set of brackets, where q(t) is weekly specific humidity measured by local weather stations]; this effect is further modulated by temperature, with low temperatures promoting transmission and temperatures above a certain threshold limiting transmission [i.e. the 2nd set of brackets, where T(t) is weekly temperature measured by local weather stations and T_{c} is the threshold]. As SARS-CoV-2-specific parameters (a, b, c, T_{c}, and T_{exp} in Equation 2) are not available, to estimate its seasonality using Equation 2, as done in Yang and Shaman, 2021c, we use parameters estimated for influenza (Yuan et al., 2021) and scale the weekly outputs [i.e., ${R}_{0}\left(t\right)$] by the annual mean (i.e. $\overline{{R}_{\text{0}}}$) per Equation 3. In doing so, the scaled outputs (b_{t}) are no longer specific to influenza; rather, they represent the relative, seasonality-related transmissibility by week, general to viruses sharing similar seasonal responses. As shown in Figure 3A, b_{t} estimates over the year average to 1, such that weeks with b_{t} >1 (e.g. during the winter) are more conducive to SARS-CoV-2 transmission, whereas weeks with b_{t} <1 (e.g. during the summer) have less favorable climate conditions for transmission. The estimated relative seasonal trend, b_{t}, is used to adjust the relative transmission rate at time t in Equation 1.
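The scaling by the annual mean (Equation 3) can be illustrated as below; the weekly R0 values are arbitrary placeholders standing in for the outputs of the humidity/temperature model in Equation 2:

```python
def relative_seasonal_trend(weekly_r0: list) -> list:
    """Equation 3 in spirit: scale the weekly seasonality outputs R0(t) by
    their annual mean so the resulting b_t average to 1 (b_t > 1 indicates
    weeks more conducive to transmission, b_t < 1 less conducive)."""
    mean_r0 = sum(weekly_r0) / len(weekly_r0)
    return [r / mean_r0 for r in weekly_r0]

# Hypothetical weekly outputs of the humidity/temperature model (Equation 2):
r0 = [2.0, 2.5, 3.0, 2.5]
b = relative_seasonal_trend(r0)
print(b)  # → [0.8, 1.0, 1.2, 1.0]
```

Dividing by the annual mean removes the influenza-specific magnitude while keeping the shape of the seasonal cycle, which is the rationale given above for reusing influenza parameters.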
Observation model to account for underdetection and delay
Using the model-simulated number of infections occurring each day, we further computed the number of cases and deaths each week to match with the observations, as done in Yang et al., 2021a. Briefly, we include (1) a time lag from infectiousness to detection (i.e. an infection being diagnosed as a case), drawn from a gamma distribution with a mean of T_{d,mean} days and a standard deviation of T_{d,sd} days, to account for delays in detection (Appendix 1—table 5); (2) an infection-detection rate (r_{t}), that is, the fraction of infections (including subclinical or asymptomatic infections) reported as cases, to account for underdetection; (3) a time lag from infectiousness to death, drawn from a gamma distribution with a mean of 13–15 days and a standard deviation of 10 days; and (4) an infection-fatality risk (IFR_{t}). To compute the model-simulated number of new cases each week, we multiplied the model-simulated number of new infections per day by the infection-detection rate, and further distributed these simulated cases in time per the distribution of time from infectiousness to detection. Similarly, to compute the model-simulated deaths per week and account for delays in time to death, we multiplied the simulated infections by the IFR and then distributed these simulated deaths in time per the distribution of time from infectiousness to death. We then aggregated these daily numbers to weekly totals to match with the weekly case and mortality data for model-inference. For each week, the infection-detection rate (r_{t}), the infection-fatality risk (IFR_{t}), and the two time-to-detection parameters (T_{d,mean} and T_{d,sd}) were estimated along with other parameters (see below).
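A schematic of the case-observation step (detection-rate thinning plus a gamma-distributed reporting delay); the delay parameters below are illustrative stand-ins, not the estimated T_{d,mean} and T_{d,sd}:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_cases(daily_infections, detection_rate, delay_mean=5.0, delay_sd=3.0):
    """Map daily new infections to daily reported cases: thin by the
    infection-detection rate, then shift each case by a gamma-distributed
    delay from infectiousness to detection (delay parameters illustrative)."""
    # Gamma parameterized by mean and sd: shape = (mean/sd)^2, scale = sd^2/mean
    shape = (delay_mean / delay_sd) ** 2
    scale = delay_sd ** 2 / delay_mean
    horizon = len(daily_infections) + 60
    cases = np.zeros(horizon)
    for day, n_inf in enumerate(daily_infections):
        n_detected = rng.binomial(int(n_inf), detection_rate)
        delays = rng.gamma(shape, scale, size=n_detected)
        for d in delays:
            cases[min(day + int(d), horizon - 1)] += 1  # clip to the horizon
    return cases

daily = [1000] * 14                            # two weeks of constant incidence
cases = simulated_cases(daily, detection_rate=0.06)
weekly = cases[:21].reshape(3, 7).sum(axis=1)  # aggregate to weekly totals
print(int(cases.sum()))                        # roughly 6% of the 14,000 infections
```

The weekly aggregation at the end mirrors how the simulated series is matched against the weekly case data; the same convolution logic applies to deaths with the IFR and the longer time-to-death distribution.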
Model inference and parameter estimation
The inference system uses the ensemble adjustment Kalman filter (EAKF; Anderson, 2001), a Bayesian statistical method, to estimate model state variables (i.e. S, E, I, R from Equation 1) and parameters (i.e. ${\beta}_{t}$, Z_{t}, D_{t}, L_{t}, e_{t} from Equation 1, as well as r_{t}, IFR_{t}, and other parameters from the observation model). Briefly, the EAKF uses an ensemble of model realizations (n=500 here), each with initial parameters and variables randomly drawn from a prior range (see Appendix 1—table 5). After model initialization, the system integrates the model ensemble forward in time for a week (per Equation 1) to compute the prior distribution for each model state variable and parameter, as well as the model-simulated number of cases and deaths for that week. The system then combines the prior estimates with the observed case and death data for the same week to compute the posterior per Bayes' theorem (Anderson, 2001). During this filtering process, the system updates the posterior distribution of all model variables and parameters for each week. For a further discussion of the filtering process and additional considerations, see Appendix 1; diagnoses of the model posterior estimates for all parameters are also included in Appendix 1 and Appendix 1—figures 15–23.
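For intuition, the core EAKF update for a single observed quantity (e.g. weekly cases) can be sketched as follows. This is a simplified illustration of the algorithm in Anderson, 2001, not the full model-inference system (no model integration, inflation, or space reprobing):

```python
import numpy as np

def eakf_update(x_prior, y_prior, z, obs_var):
    """One EAKF step for a scalar observation -- a minimal sketch.

    x_prior: (n_ens, n_vars) ensemble of state variables and parameters.
    y_prior: (n_ens,) ensemble of the model-simulated observation.
    z: observed value; obs_var: observational error variance.
    """
    y_mean, y_var = y_prior.mean(), y_prior.var()
    # Gaussian update of the observed variable's mean and variance
    post_var = 1.0 / (1.0 / y_var + 1.0 / obs_var)
    post_mean = post_var * (y_mean / y_var + z / obs_var)
    # deterministic adjustment: shift and contract the observed ensemble
    y_post = post_mean + np.sqrt(post_var / y_var) * (y_prior - y_mean)
    dy = y_post - y_prior
    # propagate the adjustment to each state variable via its covariance with y
    cov = (x_prior - x_prior.mean(0)).T @ (y_prior - y_mean) / len(y_prior)
    return x_prior + np.outer(dy, cov / y_var), y_post
```

Note how a larger `obs_var` (as used for weeks with outlying death counts, discussed in Appendix 1) shrinks the adjustment toward the prior.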
Estimating changes in transmissibility and immune erosion for each variant
As in Yang and Shaman, 2021c, we computed the variant-specific transmissibility ($R_{TX}$) as the product of the variant-specific transmission rate (${\beta}_{t}$) and the infectious period (D_{t}). Note that R_{t}, the time-varying effective reproduction number, is defined as $R_t = b_t e_t m_t \beta_t D_t S/N = b_t e_t m_t R_{TX} S/N$. To reduce uncertainty, we averaged transmissibility estimates over the period a particular variant of interest was predominant. To find these predominant periods, we first specified the approximate timing of each pandemic wave in each province based on: (1) where available, genomic surveillance data; specifically, the onsets of the Beta wave in Eastern Cape, Western Cape, KwaZulu-Natal, and Northern Cape were separately based on the initial detection of Beta in these provinces as reported in Tegally et al., 2021, and the onsets of the Delta wave in each of the nine provinces were separately based on genomic sequencing data from the Network for Genomic Surveillance South Africa (NGS-SA) (The National Institute for Communicable Diseases (NICD) of the National Health Laboratory Service (NHLS) on behalf of the Network for Genomics Surveillance in South Africa (NGS-SA), 2021); and (2) where genomic data were not available, the week with the lowest case number between two waves. The specified calendar periods are listed in Appendix 1—table 6. During later waves, multiple variants could initially co-circulate before one became predominant. As a result, the estimated transmissibility tended to increase before reaching a plateau (see, e.g. Figure 2C). In addition, in a previous study of the Delta pandemic wave in India (Yang and Shaman, 2022), we also observed that when many had been infected, transmissibility could decrease a couple of months after the peak, likely due to increased reinfections for which onward transmission may be reduced.
Thus, to obtain a more variant-specific estimate, we computed the average transmissibility ($\overline{R_{TX}}$) using the weekly R_{TX} estimates over the 8-week period starting the week prior to the maximal R_{TX} during each wave; if no maximum existed (e.g. when a new variant is less transmissible), we simply averaged over the entire wave. We then computed the change in transmissibility due to a given variant relative to the ancestral SARS-CoV-2 as $\frac{\overline{R_{TX,variant}}-\overline{R_{TX,ancestral}}}{\overline{R_{TX,ancestral}}}\times 100\%$.
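This averaging rule can be written down directly. A sketch, including the fallback to the whole-wave mean when no interior maximum exists:

```python
import numpy as np

def variant_transmissibility(rtx_weekly):
    """Average R_TX over the 8 weeks starting one week before its maximum.

    If the weekly series has no interior maximum (monotone over the wave),
    fall back to averaging the entire wave, per the text.
    """
    rtx = np.asarray(rtx_weekly, dtype=float)
    peak = int(rtx.argmax())
    if peak == 0 or peak == len(rtx) - 1:  # no interior maximum
        return rtx.mean()
    start = peak - 1
    return rtx[start:start + 8].mean()

def pct_change_vs_ancestral(rtx_variant, rtx_ancestral):
    """Percent change in transmissibility relative to the ancestral virus."""
    return (rtx_variant - rtx_ancestral) / rtx_ancestral * 100.0
```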
To quantify immune erosion, similar to Yang and Shaman, 2021c, we estimated changes in susceptibility over time and computed the change in immunity as ΔImm = S_{t+1} – S_{t} + i_{t}, where S_{t} is the susceptibility at time t and i_{t} is the number of new infections occurring during week t. We summed over all ΔImm estimates for a particular location during each wave to compute the total change in immunity due to a new variant, ${\mathrm{\Sigma}\mathrm{\Delta}Imm}_{v}$. Because filter adjustment could also slightly increase S, to avoid overestimation, we only included substantial increases (i.e. ΔImm per week >0.5% of the total population) when computing changes due to a new variant. As such, we did not further account for smaller susceptibility increases due to waning immunity [for reference, for a population that is 50% immune, with a 2-year mean immunity period, 0.5 / (52×2) × 100% = 0.48% of the population would lose immunity during a week due to waning]. We then computed the level of immune erosion as the ratio of ${\mathrm{\Sigma}\mathrm{\Delta}Imm}_{v}$ to the model-estimated population immunity prior to the first detection of immune erosion during each wave. That is, as opposed to having a common reference of prior immunity, immune erosion for each variant here depends on the state of the population immune landscape – that is, combining all prior exposures and vaccinations – immediately preceding the surge of that variant.
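The computation above can be sketched from weekly susceptibility and infection series; the inputs below are illustrative, and for simplicity the reference immunity is taken at the first week a substantial increase is detected:

```python
import numpy as np

def immune_erosion(S, new_inf, pop, threshold=0.005):
    """Sum substantial weekly susceptibility gains to quantify immune erosion.

    S: weekly susceptible counts; new_inf: weekly new infections; pop: population.
    delta_imm[t] = S[t+1] - S[t] + i[t]; only weekly gains exceeding 0.5% of the
    population count, and the total is expressed relative to population immunity
    (pop - S) at the week erosion is first detected.
    """
    S = np.asarray(S, dtype=float)
    i = np.asarray(new_inf, dtype=float)
    d_imm = S[1:] - S[:-1] + i[:-1]
    substantial = d_imm > threshold * pop
    if not substantial.any():
        return 0.0
    first = int(np.argmax(substantial))     # first week erosion is detected
    prior_immunity = pop - S[first]
    return d_imm[substantial].sum() / prior_immunity
```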
For all provinces, model-inference was initiated the week starting March 15, 2020 and run continuously until the week starting February 27, 2022. To account for model stochasticity, we repeated the model-inference process 100 times for each province, each time with 500 model realizations, and summarized the results from all 50,000 model estimates.
Model validation using independent data
To compare model estimates with independent observations not assimilated into the model-inference system, we utilized three relevant datasets:
Serological survey data measuring the prevalence of SARS-CoV-2 antibodies over time. Multiple serology surveys have been conducted in different provinces of South Africa. The South African COVID-19 Modelling Consortium summarizes the findings from several of these surveys (see Figure 1A of The South African COVID-19 Modelling Consortium, 2021). We digitized all data presented in that figure and compared them to the corresponding model-estimated cumulative infection rates (computed mid-month for each month with a seroprevalence measure). Due to unknown survey methodologies and the challenges of adjusting for sero-reversion and reinfection, we used these data directly (i.e. without adjustment) for qualitative comparison.

COVID-19-related hospitalization data, from COVID19ZA (Data Science for Social Impact Research Group at University of Pretoria, 2021). We aggregated the total number of COVID-19 hospital admissions during each wave and compared these aggregates to model-estimated cumulative infection rates during the same wave. Of note, these hospitalization data were available from June 6, 2020 onwards and are thus incomplete for the first wave.

Age-adjusted excess mortality data from the South African Medical Research Council (SAMRC) (The South African Medical Research Council (SAMRC), 2021). Deaths due to COVID-19 (used in the model-inference system) are undercounted. Thus, we also compared model-estimated cumulative infection rates to age-adjusted excess mortality data during each wave. Of note, excess mortality data were available from May 3, 2020 onwards and are thus incomplete for the first wave.
Model validation using retrospective prediction
As a fourth model validation, we generated model predictions at 2 or 1 weeks before the week of highest cases for the Delta and Omicron (BA.1) waves, separately, and compared the predicted cases and deaths to reported data unknown to the model. Predicting the peak timing, intensity, and epidemic turnaround requires accurate estimation of the model state variables and parameters that determine future epidemic trajectories. This is particularly challenging for South Africa, as the pandemic waves there tended to progress quickly, with cases surging to a peak in only 3–7 weeks. Thus, we chose to generate retrospective predictions 2 and 1 weeks before the peak of cases in order to leverage 1–6 weeks of new variant data for estimating epidemiological characteristics. Specifically, for each pandemic wave, we ran the model-inference system until 2 weeks (or 1 week) before the observed peak of cases, halted the inference, and used the population susceptibility and transmissibility of the circulating variant estimated at that time to predict cases and deaths for the remaining weeks (i.e. 10–14 weeks into the future). Because the infection-detection rate and fatality risk are linked to observations of cases and deaths, changes in these quantities during the prediction period could obscure the underlying infection rate and the accuracy of the prediction. Thus, for these two parameters specifically, we used model-inference estimates for the corresponding weeks, allowing comparison of model-predicted cases and deaths with the data while focusing on testing the accuracy of other key model estimates (e.g. transmissibility of the new variant). As for the model-inference, we repeated each prediction 100 times, each with 500 model realizations, and summarized the results from all 50,000 ensemble members.
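The halt-and-project step can be illustrated with a bare-bones forward integration. This sketch omits the seasonality, NPI, and vaccination terms, the detection delay, and ensemble uncertainty, and all input values are illustrative:

```python
import numpy as np

def project_forward(S, E, I, N, beta, Z, D, r_t, steps_per_week=7):
    """Project weekly cases after halting inference -- a minimal sketch.

    Integrates a simple SEIR model forward with daily Euler steps from the
    last posterior state (S, E, I) and parameters (beta, Z, D), then applies
    the week-specific infection-detection rates r_t to weekly new infections.
    """
    weekly_cases = []
    for r in r_t:
        new_inf_week = 0.0
        for _ in range(steps_per_week):
            new_inf = beta * S * I / N   # new exposures this day
            to_inf = E / Z               # E -> I
            to_rec = I / D               # I -> removed
            S -= new_inf
            E += new_inf - to_inf
            I += to_inf - to_rec
            new_inf_week += new_inf
        weekly_cases.append(r * new_inf_week)
    return np.array(weekly_cases)
```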
Data Availability
All data used in this study are publicly available as described in the “Data sources and processing” section.
Code availability
All source code and data necessary for the replication of our results and figures are publicly available at https://github.com/wanyang/covid_SouthAfrica (copy archived at swh:1:rev:40c0e5ac5ab65005b600a4ca646fec04b0870b81) (Yang, 2022).
Appendix 1
Supplemental results and discussion
A brief note on reported COVID19 mortality and modelinference strategy in this study
COVID-19 mortality data in some South African provinces appeared irregular, with very high weekly death counts in some weeks even though cases in preceding weeks were low (see, e.g., COVID-19-related deaths in Mpumalanga and Northern Cape in Appendix 1—figure 1). A likely explanation is the audit and release of mortality data including deaths that occurred in previous time periods, which were not redistributed according to the actual time of death. Such instances have occurred in multiple countries (see, e.g., some of the documentation by the Financial Times in ref (FT Visual & Data Journalism team, 2020), under the header “SOURCES”). Here, we could not adjust for this possibility due to a lack of information on these apparent data releases. Instead, to account for potential data errors, the ensemble adjustment Kalman filter (EAKF) algorithm (Anderson, 2001), used in the model-inference system, includes an estimate of observational error variance when computing the posterior estimates. In this study, the observational error variance was scaled to the corresponding observations (thus, weeks with higher mortality also had larger observational errors). In doing so, the EAKF down-weights observations with larger observational errors (e.g., weeks with very large death counts), which reduces their impact on the inference of model dynamics. As such, the posterior estimates for mortality tend to (intentionally) miss very high outlying data points (see Figure 1 and Appendix 1—figure 1). In addition, posterior estimates for the infection-fatality risk (IFR) are more stable over time, including for weeks with outlying death counts (see, e.g., Appendix 1—figure 23, IFR estimates for Mpumalanga).
In light of these COVID-19 mortality data patterns, we computed the overall IFR during each pandemic wave using two methods. The first method computes the wave-specific IFR as the ratio of the total reported COVID-19-related deaths to the model-estimated cumulative infection rate during each wave. Because reported COVID-19-related mortality is used as the numerator, this method is more heavily affected by the aforementioned data irregularities. The second method computes the wave-specific IFR as a weighted average of the weekly IFR estimates during each wave, a measure for which both the numerator and denominator are model-inference derived; the weights are the estimated fractions of infections during each week. As shown in Appendix 1—table 3, for provinces with consistent case and mortality trends (e.g., Gauteng), the two methods generated similar IFR estimates. In contrast, for provinces with mortality trends inconsistent with case trends (e.g., Mpumalanga), the second method generated IFR estimates more comparable to those of other provinces than the first method.
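The two wave-level IFR summaries can be sketched as follows (all input values illustrative); the test case shows how an outlying death report inflates only the first method:

```python
import numpy as np

def wave_ifr(weekly_deaths, weekly_infections, weekly_ifr):
    """Two wave-level IFR summaries -- a sketch of the methods in the text.

    Method 1: total reported deaths / model-estimated cumulative infections.
    Method 2: weekly IFR estimates averaged with weights equal to each week's
    share of infections (numerator and denominator both model-derived).
    """
    deaths = np.asarray(weekly_deaths, dtype=float)
    inf = np.asarray(weekly_infections, dtype=float)
    ifr = np.asarray(weekly_ifr, dtype=float)
    method1 = deaths.sum() / inf.sum()
    method2 = np.average(ifr, weights=inf / inf.sum())
    return method1, method2
```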
Considerations in parameter prior choice and the EAKF inference algorithm
The model-inference system included nine parameters, namely, the variant-specific transmission rate ${\beta}_{t}$, latency period Z_{t}, infectious period D_{t}, immunity period L_{t}, scaling factor of NPI effectiveness e_{t}, infection-detection rate r_{t}, IFR_{t}, and two parameters for the distribution of time from infectiousness to case detection (i.e., the mean and standard deviation of a gamma distribution). The initial prior distributions were randomly drawn from uniform distributions with ranges listed in Appendix 1—table 5. For parameters with previous estimates from the literature (e.g., transmission rate β, incubation period Z, infectious period D, and immunity period L; see Appendix 1—table 5, column “Source/rationale”), we set the prior range accordingly. For parameters with high uncertainty and spatial variation (e.g., the infection-detection rate), we preliminarily tested initial prior ranges by visualizing model prior and posterior estimates under different ranges. For instance, for the infection-detection rate, when using a higher prior range (e.g., 5–20% vs 1–10%), the model prior tended to overestimate observed cases and underestimate deaths. Based on this initial testing, we then used a wide range able to reproduce the observed cases and deaths relatively well, and derived estimates of the unobserved state variables and parameters.
Importantly, the EAKF used here is an iterative filtering algorithm. After initialization using the initial prior distributions, it iteratively incorporates additional observations at each time step (here, each week) to compute and update the model posterior (including all model state variables and parameters) using the model prior and the latest observations. For the model state variables, the prior is computed per the dynamic model (here, Equation 1); for the model parameters, the prior is the posterior from the last time step. As such, the influence of the initial prior range tends to be less pronounced than for methods such as Markov chain Monte Carlo (MCMC). In addition, to capture potential changes over time (e.g., likely increased detection for variants causing more severe disease), we applied space reprobing (SR) (Yang and Shaman, 2014), a technique that randomly replaces parameter values for a small fraction of the model ensemble, to explore a wider range of parameter possibilities (Appendix 1—table 5). Due to both the EAKF algorithm and space reprobing, the posterior parameter estimates can migrate outside the initial parameter ranges (e.g., for the transmission rate during the circulation of new variants).
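Space reprobing itself is a small operation; a sketch for one parameter (the 3% reprobed fraction is illustrative, not the study's value):

```python
import numpy as np

def space_reprobe(param_ens, lo, hi, frac=0.03, rng=None):
    """Space reprobing (SR; Yang and Shaman, 2014) -- a sketch.

    Replaces the values of a randomly chosen small fraction of the ensemble
    for one parameter with draws from U[lo, hi], letting the filter explore
    state space outside the current posterior while most ensemble members
    retain the accumulated prior information.
    """
    rng = rng or np.random.default_rng()
    out = np.array(param_ens, dtype=float, copy=True)
    n = len(out)
    k = max(1, int(frac * n))
    idx = rng.choice(n, size=k, replace=False)
    out[idx] = rng.uniform(lo, hi, size=k)
    return out
```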
Testing of the infectiondetection rate during the Omicron (BA.1) wave in Gauteng
A major challenge for this study is inferring the underlying transmission dynamics of the Omicron (BA.1) wave in Gauteng, where Omicron was initially detected and had the earliest case surge. In Gauteng, the number of cases during the first week of reported detection (i.e., the week starting 11/21/21) increased 4.4 times relative to the previous week; during the second week of reporting (i.e., the week starting 11/28/21), cases increased another 4.9 times. Yet after these two weeks of dramatic increases, cases peaked during the third week and started to decline afterwards. Initial testing suggested substantial changes in infection-detection rates during this time; in particular, detection could increase during the first two weeks due to awareness of and concern for the novel Omicron variant, and decline during later weeks due to constraints on testing capacity as well as subsequent reports of milder disease caused by Omicron. To more accurately estimate the infection-detection rate and underlying transmission dynamics, we ran and compared model-inference estimates using four settings for the infection-detection rate.
As noted above, with the model-EAKF filtering algorithm, the parameter posterior is iteratively updated and becomes the prior at the next time step, such that information from all previous time steps is sequentially incorporated. Given this sequential nature of the EAKF, rather than using a new prior distribution for the infection-detection rate to explore new state space (here, potential changes in detection rate), we applied SR (Yang and Shaman, 2014), which randomly assigns prior values to a small fraction of the model ensemble while preserving the majority that encodes prior information. In previous studies (Yang and Shaman, 2021c; Yang and Shaman, 2014), we have shown that the model ensemble posterior remains similar if there is no substantial change in the system, and migrates more efficiently towards new state space if there is a substantial change. Here, to explore potential changes in infection-detection rates during the Omicron (BA.1) wave, we tested four SR settings for the infection-detection rate: (1) use of the same baseline range as before (i.e., 1%–8%; uniform distribution, as for the other ranges) for all weeks during the Omicron (BA.1) wave; (2) use of a wider and higher range (i.e., 1%–12%) for all weeks; (3) use of a range of 1%–15% for the 1st week of Omicron reporting (i.e., the week starting 11/21/21), 5%–20% for the 2nd week of Omicron reporting (i.e., the week starting 11/28/21), and 1%–8% for the rest; and (4) use of a range of 5%–25% for the 2nd week of reporting and 1%–8% for all others.
Estimated infection-detection rates in Gauteng increased substantially during the first two weeks of the Omicron (BA.1) wave and decreased afterwards under all four SR settings (Appendix 1—figure 12, 1st row). This consistency suggests a general trend in infection-detection rates at the time, in accordance with the aforementioned potential changes in testing. Without a higher SR range (e.g., 1%–8% and 1%–12% in columns 1–2 of Appendix 1—figure 12 vs 5%–20% and 5%–25% for week 2 in columns 3–4), the estimated increases in the infection-detection rate were lower; instead, the model-inference system attributed the dramatic case increases in the first two weeks to larger increases in population susceptibility and transmissibility (Appendix 1—figure 12, 2nd and 3rd rows, compare columns 1–2 vs. 3–4). However, these higher estimates for population susceptibility and transmissibility contradicted the drastic decline in cases shortly afterwards, such that the model-inference system readjusted the transmissibility to a lower level during later weeks (see the uptick in estimated transmissibility in Appendix 1—figure 12, 3rd row, first 2 columns). In contrast, when higher infection-detection rates were estimated for the first two weeks using the last two SR settings, the transmissibility estimates were more stable during later weeks (Appendix 1—figure 12, 3rd row, last 2 columns). In addition, model-inference using the latter two SR settings also generated more accurate retrospective predictions for the Omicron (BA.1) wave in Gauteng (Appendix 1—figure 13).
Given the above results, we used the 4th SR setting in the model-inference for Gauteng (i.e., replacing a fraction of the infection-detection rate ensemble with values randomly drawn from U[5%, 25%] for the week starting 11/28/21 and U[1%, 8%] for all other weeks during the Omicron wave). Reported cases in other provinces did not change as dramatically as in Gauteng; therefore, for those provinces, we used the baseline setting, i.e., values drawn from U[1%, 8%], for reprobing the infection-detection rate. Nonetheless, we note that the overall estimates for changes in transmissibility and immune erosion of Omicron (BA.1) were slightly higher under the first two SR settings but still consistent with the results presented in the main text (Appendix 1—figure 14).
Examination of posterior estimates for all model parameters
To diagnose the posterior estimates for each parameter, we plotted the posterior median and the 50% and 95% credible intervals (CrIs) estimated for each week during the entire study period, for each of the nine provinces (Appendix 1—figures 15–23). As shown in Appendix 1—figure 15, the estimated transmission rate was relatively stable during the ancestral wave; it then increased along with the surge of the Beta variant around October 2020 and leveled off during the Beta wave. Similarly, following the initial surge of the Delta and Omicron variants, estimated transmission rates increased before leveling off when the new variant became predominant. Similar patterns are estimated for all provinces, indicating the model-inference system is able to capture the changes in transmission rate due to each new variant.
The estimated latent period (Appendix 1—figure 16), infectious period (Appendix 1—figure 17), immunity period (Appendix 1—figure 18), and scaling factor of NPI effectiveness (Appendix 1—figure 19) all varied somewhat over time, but to a much lesser extent than the transmission rate. The estimated time from infectiousness to case detection decreased slightly over time, albeit with larger variation in later time periods (see Appendix 1—figure 20 for the mean and Appendix 1—figure 21 for the standard deviation). It is possible that the model-inference system could not adequately estimate the nuanced changes in these parameters using aggregated population-level data.
Estimated infection-detection rates varied over time for all provinces (Appendix 1—figure 22). The infection-detection rate can be affected by (1) testing capacity, e.g., lower during the first weeks of the COVID-19 pandemic and sometimes lower near the peak of a pandemic wave when maximal capacity was reached; (2) awareness of the virus, e.g., higher when a new variant was first reported and lower near the end of a wave; and (3) disease severity, e.g., higher when variants causing more severe disease were circulating. Overall, the estimates were consistent with these expected patterns.
Lastly, estimated IFRs also varied over time and across provinces (Appendix 1—figure 23). The IFR can be affected by multiple factors, including infection demographics, the innate severity of the circulating variant, quality of and access to healthcare, and vaccination coverage. Regarding infection demographics, the IFR tended to be much lower in younger ages, as reported by many (e.g., Levin et al., 2020). In South Africa, similar differences in infection demographics occurred across provinces. For instance, Giandhari et al., 2021 noted a lower initial mortality in Gauteng, as earlier infections concentrated in younger and wealthier individuals. Regarding the innate severity of the circulating variant, as noted in the main text, estimated IFRs were in general higher during the Beta and Delta waves than during the Omicron wave. In addition, as shown in Appendix 1—figure 23, estimated IFRs were substantially higher in four provinces (i.e., KwaZulu-Natal, Western Cape, Eastern Cape, and Free State) than in other provinces during the Beta wave. Coincidentally, the earliest surges of the Beta variant occurred in three of those provinces (i.e., KwaZulu-Natal, Western Cape, and Eastern Cape) (Tegally et al., 2021). Nonetheless, as noted in the main text and the above subsection, the IFR estimates here should be interpreted with caution, due to the likely underreporting and irregularity of the COVID-19 mortality data used to generate these estimates.
A proposed approach to compute the reinfection rates using modelinference estimates
It is difficult to measure or estimate the reinfection rate directly. In this study, we have estimated the immune erosion potential of three major SARS-CoV-2 variants of concern (VOCs) and the infection rates during each pandemic wave in South Africa. These estimates can be used to support estimation of the reinfection rate for a given population. In-depth analysis is needed for such estimations. Here, as an example, we propose a simple approach to illustrate the possibility.
Consider the estimation in the context of the four waves in South Africa in this study (i.e. the ancestral, Beta, Delta, and Omicron BA.1 waves). Suppose the cumulative fraction of the population ever infected before the Beta wave is $c_{pre\_beta}$ (this is roughly the attack rate during the ancestral wave) and the estimated immune erosion potential for Beta is $\theta_{beta}$. To compute the reinfection rate during the Beta wave, we can assume that $c_{pre\_beta}\times(1-\theta_{beta})$ are protected by this prior immunity, and that the remaining $c_{pre\_beta}\,\theta_{beta}$ (i.e. those who lost their immunity due to immune erosion) have the same risk of infection as those never infected. With z_{beta} denoting the attack rate by Beta, the reinfection rate/fraction among all infections during the Beta wave would be:

$$\frac{c_{pre\_beta}\,\theta_{beta}}{c_{pre\_beta}\,\theta_{beta}+(1-c_{pre\_beta})}$$
The reinfection rate/fraction among the entire population would be:

$$z_{beta}\times\frac{c_{pre\_beta}\,\theta_{beta}}{c_{pre\_beta}\,\theta_{beta}+(1-c_{pre\_beta})}$$
Combining the above, the cumulative fraction of the population ever infected by the end of the Beta wave and before the Delta wave would be:

$$c_{pre\_delta}=c_{pre\_beta}+z_{beta}\times\frac{1-c_{pre\_beta}}{c_{pre\_beta}\,\theta_{beta}+(1-c_{pre\_beta})}$$
Note that the fraction of the population ever infected, c, is updated to compute the subsequent fraction of the population protected by prior immunity, because the immune erosion potential here is estimated relative to the combined immunity accumulated until the rise of a new variant. We can repeat the above process for the Delta wave and the Omicron wave. See an example calculation in Appendix 1—table 4.
Work to refine the reinfection estimates (e.g., sensitivity of these estimates to assumptions and uncertainty intervals) is needed. Nonetheless, these example estimates (Appendix 1—table 4) are consistent with reported serology measures [4^{th} column vs. e.g. ~90% seropositive in March 2022 after the Omicron BA.1 wave reported in Bingham et al., 2022] and reinfection rates reported elsewhere [5^{th} and 6^{th} columns vs. e.g., reported much higher reinfection rate during the Omicron wave in Pulliam et al., 2022]. Importantly, these estimates also show that, in addition to the innate immune erosive potential of a given new variant, the reinfection rate is also determined by the prior cumulative fraction of the population ever infected (4^{th} column in Appendix 1—table 4) and the attack rate by each variant (3^{rd} column in Appendix 1—table 4). That is, the higher the prior cumulative infection rate and/or the higher the attack rate by the new variant, the higher the reinfection rate would be for a new variant that can cause reinfection. For instance, despite the lower immune erosion potential of Delta than Beta, because of the high prior infection rate accumulated up to the Delta wave onset, the estimated reinfection rate by Delta among all Delta infections was higher compared to that during the Beta wave (6^{th} column in Appendix 1—table 4). With the higher attack rate during the Delta wave, the reinfection rate among the entire population was much higher for Delta than Beta (5^{th} column in Appendix 1—table 4). Thus, these preliminary results suggest that reinfection rates observed for each variant and differences across different variants should be interpreted in the context of the innate immune erosion potential of each variant, the prior cumulative infection rate of the study population, and the attack rate of each variant in the same population.
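The wave-by-wave accounting described above can be chained in code, carrying the updated ever-infected fraction into the next wave. A sketch with illustrative inputs (not the Appendix 1—table 4 values):

```python
def reinfection_by_wave(c_prev, theta, attack):
    """One wave of the proposed reinfection accounting -- a sketch.

    c_prev: fraction ever infected before the wave; theta: immune erosion of
    the incoming variant; attack: attack rate during the wave. Assumes those
    with eroded immunity (c_prev * theta) face the same infection risk as the
    never-infected (1 - c_prev).
    """
    at_risk = c_prev * theta + (1.0 - c_prev)
    frac_reinf = c_prev * theta / at_risk                # among all infections
    reinf_pop = attack * frac_reinf                      # among the entire population
    c_new = c_prev + attack * (1.0 - c_prev) / at_risk   # ever infected after the wave
    return frac_reinf, reinf_pop, c_new
```

Calling this once per wave (Beta, then Delta, then Omicron BA.1), feeding each wave's `c_new` into the next call as `c_prev`, reproduces the iterative scheme described in the text.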
Data availability
The current manuscript is a computational study, so no new data were generated for this manuscript. All source code and data necessary for the replication of our results and figures are publicly available at https://github.com/wanyang/covid_SouthAfrica (copy archived at swh:1:rev:40c0e5ac5ab65005b600a4ca646fec04b0870b81).

Data Science for Social Impact Research Group at University of Pretoria (2021) Coronavirus COVID-19 (2019-nCoV) Data Repository for South Africa. Zenodo. https://doi.org/10.5281/zenodo.3819126

Google Inc (2020) Community Mobility Reports. Google. ID: covid19/mobility/

Our World in Data. Data on COVID-19 (coronavirus) vaccinations. GitHub. ID: covid19data/tree/master/public/data/vaccinations

Department of Health Republic of South Africa (2021) Update on Covid-19 (Tuesday 23 November 2021). sacoronavirus. ID: 2021/11/23/updateoncovid19tuesday23november2021/

The South African COVID-19 Modelling Consortium (2021) COVID-19 modelling update: Considerations for a potential fourth wave (17 Nov 2021). NICD. ID: SACMCFourthwavereport17112021nicd

The South African Medical Research Council (SAMRC) (2021) Report on Weekly Deaths in South Africa. SAMRC. ID: reportweeklydeathssouthafrica
References

Severity, Criticality, and Fatality of the SARS-CoV-2 Beta Variant. Clinical Infectious Diseases. https://doi.org/10.1093/cid/ciab909

Effectiveness of the BNT162b2 Covid-19 vaccine against the B.1.1.7 and B.1.351 variants. The New England Journal of Medicine 385:187–189. https://doi.org/10.1056/NEJMc2104974

Household transmission of COVID-19 cases associated with SARS-CoV-2 Delta variant (B.1.617.2): national case-control study. The Lancet Regional Health. Europe 12:e100252. https://doi.org/10.1016/j.lanepe.2021.100252

An ensemble adjustment Kalman filter for data assimilation. Monthly Weather Review 129:2884–2903. https://doi.org/10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2

Early transmission of SARS-CoV-2 in South Africa: an epidemiological and phylogenetic report. International Journal of Infectious Diseases 103:234–241. https://doi.org/10.1016/j.ijid.2020.11.128

Omicron outbreak at a private gathering in the Faroe Islands, infecting 21 of 33 triple-vaccinated healthcare workers. Clinical Infectious Diseases 10:ciac089. https://doi.org/10.1093/cid/ciac089

The changing epidemiology of SARS-CoV-2. Science 375:1116–1121. https://doi.org/10.1126/science.abm4915

Timing of community mitigation and changes in reported COVID-19 and community mobility – four U.S. metropolitan areas, February 26–April 1, 2020. MMWR. Morbidity and Mortality Weekly Report 69:451–457. https://doi.org/10.15585/mmwr.mm6915e2

Assessing the age specificity of infection fatality rates for COVID-19: systematic review, meta-analysis, and public policy implications. European Journal of Epidemiology 35:1123–1138. https://doi.org/10.1007/s10654-020-00698-1

Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. The New England Journal of Medicine 382:1199–1207. https://doi.org/10.1056/NEJMoa2001316

Effectiveness of Covid-19 vaccines against the B.1.617.2 (Delta) variant. The New England Journal of Medicine 385:585–594. https://doi.org/10.1056/NEJMoa2108891

Population immunity and Covid-19 severity with Omicron variant in South Africa. The New England Journal of Medicine 386:1314–1326. https://doi.org/10.1056/NEJMoa2119658

A global database of COVID-19 vaccinations. Nature Human Behaviour 5:947–953. https://doi.org/10.1038/s41562-021-01122-8

Universal coronavirus vaccines – an urgent need. The New England Journal of Medicine 386:297–299. https://doi.org/10.1056/NEJMp2118468

SARS-CoV-2 Omicron variant neutralization in serum from vaccinated and convalescent persons. The New England Journal of Medicine 386:698–700. https://doi.org/10.1056/NEJMc2119236

Efficacy of NVX-CoV2373 Covid-19 vaccine against the B.1.351 variant. The New England Journal of Medicine 384:1899–1909. https://doi.org/10.1056/NEJMoa2103055

Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet. Infectious Diseases 20:669–677. https://doi.org/10.1016/S1473-3099(20)30243-7

Effectiveness of non-pharmaceutical interventions to contain COVID-19: a case study of the 2020 spring pandemic wave in New York City. Journal of the Royal Society, Interface 18:e20200822. https://doi.org/10.1098/rsif.2020.0822

COVID-19 pandemic dynamics in India, the SARS-CoV-2 Delta variant and implications for vaccination. Journal of the Royal Society, Interface 19:e20210900. https://doi.org/10.1098/rsif.2021.0900

Modeling influenza seasonality in the tropics and subtropics. PLOS Computational Biology 17:e1009050. https://doi.org/10.1371/journal.pcbi.1009050
Decision letter

Dobromir Dimitrov, Reviewing Editor; Fred Hutchinson Cancer Center, Seattle, United States

Bavesh D Kana, Senior Editor; University of the Witwatersrand, South Africa

Mia Moore, Reviewer; Fred Hutchinson Cancer Center, United States
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "COVID-19 pandemic dynamics in South Africa and epidemiological characteristics of three variants of concern (Beta, Delta, and Omicron)" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Bavesh Kana as the Senior Editor. The following individual involved in the review of your submission has agreed to reveal their identity: Mia Moore (Reviewer #2).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this letter to help you prepare a revised submission.
Essential revisions:
1) Authors need to clarify how their modeling analysis supports stated conclusions.
2) The paper will benefit from a more detailed explanation and sensitivity analyses that show how model assumptions influence presented results.
3) The authors should elaborate on the time-dependent results for all hidden parameters estimated as part of the model.
Reviewer #1 (Recommendations for the authors):
1) Figure 1A is difficult to read, but it looks like the model underestimates many of the mortality peaks. The authors should discuss this.
2) Figure 1B: it is unclear what data are shown. The y-label suggests it is the ratio of cumulative infections over seroprevalence, as a percentage. However, this ratio should be >1, yes? The key quantitative metrics used in the paper, including cumulative infection rate, seroprevalence, susceptibility, transmissibility, etc., should be clearly defined to avoid confusion.
3) Using increased susceptibility and immune erosion interchangeably (as in rows 123-126) is troubling if attributed to the properties of different variants only. It should be at least partially due to waning immunity, independent of which variants are prevalent. Several studies suggest that waning may not be the same for protection acquired from vaccines as for that acquired from prior infection. I understand that these effects are not easy to disentangle, but their feasibility needs to be considered.
4) The large attack rate in rows 137-138 is attributed to the high transmissibility of Delta. What about waning immunity? What is the estimated reinfection rate?
5) Immune erosion is a key metric reported in the results. What is the mechanism of this erosion? More susceptible, more likely to get a severe infection, or something else? Do you assume the same erosion after prior infection and after vaccination? What about waning over time? A more precise definition will help readers.
6) Many assumptions, including those in rows 311-315, are critical. It would be nice to show how sensitive the results are to them.
7) Having seasonality in the model is interesting and useful. The authors should elaborate on whether this is related to the proportion of contacts occurring indoors or to something else. That will help applicability to other settings.
8) Figure 3E: the sharp increase in detection rate during the Omicron wave is difficult to believe. Is there another explanation?
9) Figure 4D: the term "relative change" here is confusing and needs a more precise definition. For instance, does it mean that Beta is ~50% as transmissible as the ancestral strain, or that Beta is ~50% more transmissible than the ancestral strain?
10) I am not sure the IFR estimates by province on p. 38 make sense to me. In my view, they should be similar across regions if health systems are comparably effective. Are different rates assumed for the previously infected and/or vaccinated?
Reviewer #2 (Recommendations for the authors):
In general this is very good work, but I think you need to make the logic that leads from your analysis to your conclusions a little clearer, i.e., what is it about your findings that makes you view large future waves as possible. I would consider focusing on the long-term trends that you can see, specifically with regard to transmission and susceptibility.
I'm also concerned about the potential lack of identifiability in the fitting scheme with all of these different parameters. In particular, I'm concerned that the drastic changes in the infection-detection rate may be masking changes elsewhere, in particular during the Omicron wave. I would consider a sensitivity analysis that holds this variable constant.
Transmissibility and susceptibility need a more precise definition when introduced. Are these assumed to be time-varying parameters, or are they derived from other quantities?
Fit: cases and deaths. Validation: hospitalizations, excess deaths, seroprevalence, and retrospective predictions of the Delta and Omicron waves.
Reviewer #3 (Recommendations for the authors):
The authors have presented an extremely well-written and comprehensive analysis of the South African COVID-19 pandemic using an intricate epidemiological model. I am having some trouble fully evaluating the model, though, because of the numerous time-dependent variables estimated as part of the fitting process. I would suggest that the authors consider adding a figure (at least for one example region) showcasing the time-dependent results for all hidden parameters estimated as part of the model (i.e. Z_t, D_t, L_t, e_t from Equation 1, as well as r_t, IFR_t and other parameters from the observation model). There is likely to be a high correlation between many of these parameters over time, so such plots would allow for proper diagnosis of the fitting procedure and ensure that all parameters are within realistic ranges at all times. As a note, it was not clear whether the prior distributions in Table S4 were actual Bayesian prior distributions or merely the initial range of starting conditions for T0, and it would be helpful to clarify how they integrate with model fitting. Additionally, I suggest that the authors expand their Results section to include additional analyses of these hidden parameters, such as how the latent period has changed over time (e.g. there is some evidence that Omicron progressed quicker than previous variants) and/or the impact that mobility has had on transmission over time (incorporating the impact of e_t). Other analyses have found degradation of the relationship between mobility and transmission over time in limited contexts, so it would be useful to compare the current results (e.g. https://www.pnas.org/doi/10.1073/pnas.2111870119).
The authors use seasonal patterns based on historic climate data from South Africa as a means to modulate COVID19 transmission, but there doesn't appear to be any reference for the actual climate data for South Africa from the same time period. Are there any data from the modeling time period the authors could use as validation that their assumed seasonal curves match the actual climatic conditions?
Retrospective forecasts are used to validate that the model is accurately capturing the transmission, behavioral, and immunological status of the population. While I agree that, overall, the 1-2 week forecasts look satisfactory, most forecast hubs ask for forecasts up to 4 weeks ahead (e.g. the COVID-19 Forecast Hub), and accurate forecasts over a longer time horizon would dramatically strengthen the validation of the model estimates. I suggest that the authors attempt 3-4 week forecasts as part of a supplemental analysis to understand the limitations of the model and its forecast ability.
https://doi.org/10.7554/eLife.78933.sa1

Author response
Essential revisions:
1) Authors need to clarify how their modeling analysis supports stated conclusions.
2) The paper will benefit from a more detailed explanation and sensitivity analyses that show how model assumptions influence presented results.
3) The authors should elaborate on the time-dependent results for all hidden parameters estimated as part of the model.
We have revised the manuscript to address all these suggestions:
We have clarified how the modeling analysis supports the conclusions in both the Results and the Discussion. We note that the findings and conclusions in this study are model-inference estimates, for which we draw further support by comparing our estimates with independent data and reports from the literature in the Results section. In the Discussion section, we broaden the discussion to incorporate both the findings from this study in South Africa and those reported elsewhere on key epidemiological properties of new variants and their potential impacts on future SARS-CoV-2 dynamics. For the latter, we have now added more specifics to relate the findings to our general observations. We have also clarified that the three general observations we made apply to global SARS-CoV-2 dynamics, including but not limited to this study.
We have added more detailed explanation when introducing the terms used in the manuscript and when presenting the model-inference estimates (e.g. population susceptibility and variant transmissibility). See e.g., Lines 129 – 138 and 186 – 200 in the main text. We have also added further detail on the model-inference method (e.g. prior selection and diagnosis) in Appendix 1. In addition, we have added sensitivity analyses, in particular for the infection-detection rate in Gauteng during the Omicron wave, to show how model assumptions influence the results. These supporting results are presented in Appendix 1.
We have plotted and shown the weekly estimates for all parameters included in our model-inference system (Appendix 1-figures 15-23). We have also added a brief note in the main text on these results (see Lines 62-65). As the focus of this study is general COVID-19 dynamics and the epidemiological properties of the SARS-CoV-2 variants of concern, we present the main estimates (e.g. population susceptibility, transmissibility, infection-detection rate, infection-fatality risk) in the main text and results for the supporting parameters (e.g. latent period) in Appendix 1.
Reviewer #1 (Recommendations for the authors):
1) Figure 1A is difficult to read, but it looks like the model underestimates many of the mortality peaks. The authors should discuss this.
The detailed estimates are shown in Appendix 1-figure 1. For mortality, there are some data irregularities for some weeks, likely due to the audit and release of mortality data that included deaths occurring in previous time periods but not redistributed according to the actual time of death. Such instances have occurred in many countries. Per the reviewer's suggestion, we have added a discussion of reported COVID-19 mortality and the model-inference strategy for handling data irregularities in this study. Please see Appendix 1, section “1. A brief note on reported COVID-19 mortality and model-inference strategy in this study.”
2) Figure 1B: it is unclear what data are shown. The y-label suggests it is the ratio of cumulative infections over seroprevalence, as a percentage. However, this ratio should be >1, yes? The key quantitative metrics used in the paper, including cumulative infection rate, seroprevalence, susceptibility, transmissibility, etc., should be clearly defined to avoid confusion.
We thank the reviewer for the comment. To clarify, we have relabeled the y-axis text. It is the estimated cumulative infection rate from this study, or seroprevalence measured by serology surveys in other studies, both relative to population size. Per the reviewer's suggestion, we have also defined terms when first introducing them in the Results section (see e.g., Lines 129 – 138 and 186 – 200 in the main text). We have also briefly defined terms in the figure captions. See excerpts below:
Lines 129 – 138:
“The model-inference system estimates a large increase in population susceptibility with the surge of Beta (Figure 3D; note population susceptibility is computed as S / N × 100%, where S is the estimated number of susceptible people and N is population size)… In addition, an increase in transmissibility is also evident for Beta, after accounting for concurrent NPIs and infection seasonality (Figure 3C; note transmissibility is computed as the product of the estimated variant-specific transmission rate and the infectious period; see Methods for detail).”
Lines 186 – 200:
“We then use these model-inference estimates to quantify the immune erosion potential and increase in transmissibility for each VOC. Specifically, the immune erosion (against infection, same below) potential is computed as the ratio of two quantities – the numerator is the increase of population susceptibility due to a given VOC and the denominator is population immunity (i.e., the complement of population susceptibility) at wave onset. The relative increase in transmissibility is also computed as a ratio, i.e., the average increase due to a given VOC relative to ancestral SARS-CoV-2 (see Methods). As population-specific factors contributing to transmissibility (e.g., population density and average contact rate) would be largely cancelled out in the latter ratio, we expect estimates of the VOC transmissibility increase to be generally applicable to different populations. However, prior exposures and vaccinations varied over time and across populations; thus, the level of immune erosion is necessarily estimated relative to the local population immune landscape at the time of the variant surge, and should be interpreted accordingly. In addition, this assessment also does not distinguish the sources of immunity or partial protection against severe disease; rather, it assesses the overall loss of immune protection against infection for a given VOC.”
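To make the two quoted definitions concrete, here is a minimal numerical sketch in Python; all function names and numbers are hypothetical illustrations, not our model code or our estimates:

```python
# Minimal sketch of the two summary ratios defined above; values and
# function names are hypothetical, not the study's model code.

def immune_erosion(susc_before, susc_after):
    """Immune erosion potential: increase in population susceptibility due
    to a VOC, divided by population immunity (the complement of
    susceptibility) at wave onset. Inputs are fractions in [0, 1]."""
    immunity_at_onset = 1.0 - susc_before
    return (susc_after - susc_before) / immunity_at_onset

def transmissibility(trans_rate, infectious_period):
    """Transmissibility: variant-specific transmission rate times the
    infectious period (in days)."""
    return trans_rate * infectious_period

# Hypothetical illustration: susceptibility rises from 35% to 60% as a
# variant surges, and the transmission rate rises from 0.7/day to 0.9/day
# with a 3.5-day infectious period.
erosion = immune_erosion(susc_before=0.35, susc_after=0.60)
rel_increase = transmissibility(0.9, 3.5) / transmissibility(0.7, 3.5) - 1.0
```

Note that, because the infectious period cancels in the transmissibility ratio when it is unchanged, the relative increase reduces to the ratio of the transmission rates.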
3) Using increased susceptibility and immune erosion interchangeably (as in rows 123-126) is troubling if attributed to the properties of different variants only. It should be at least partially due to waning immunity, independent of which variants are prevalent. Several studies suggest that waning may not be the same for protection acquired from vaccines as for that acquired from prior infection. I understand that these effects are not easy to disentangle, but their feasibility needs to be considered.
We thank the reviewer for the comment. We agree that waning immunity could increase population susceptibility, aside from immune erosion due to a new variant. However, as individuals are infected and subsequently lose their prior immunity at different points in time, in aggregate this would lead to a more gradual change in population susceptibility. In contrast, if a new variant is immune erosive, individuals exposed to such a variant would be susceptible to it regardless of prior infection history; in aggregate, the population could rapidly become susceptible with the rapid spread of an immune erosive new variant. Note that mass vaccination was not rolled out during the Beta wave (see, e.g., Figure 3A, green lines for vaccination uptake over time). Therefore, we infer the estimated rapid increase in population susceptibility with the surge of Beta as an indication of the variant being immune erosive. To be more precise, we have revised the text as follows:
“The model-inference system estimates a large increase in population susceptibility with the surge of Beta (Figure 3D; note population susceptibility is computed as S / N × 100%, where S is the estimated number of susceptible people and N is population size). Such a dramatic increase in population susceptibility (vs. a likely more gradual change due to waning immunity), to the then predominant Beta variant, suggests Beta likely substantially eroded prior immunity and is consistent with laboratory studies showing low neutralizing ability of convalescent sera against Beta (9, 10). In addition, an increase in transmissibility is also evident for Beta, …” (Lines 129 – 138)
4) The large attack rate in rows 137-138 is attributed to the high transmissibility of Delta. What about waning immunity? What is the estimated reinfection rate?
We attribute the large attack rate during the Delta wave to multiple factors including higher transmissibility of Delta, the more conducive winter transmission conditions, and the immune erosive properties of Delta (i.e., it can cause reinfection). See the related text below.
“This large attack rate was possible due to the high transmissibility of Delta, as reported in multiple studies (12-16), the more conducive winter transmission conditions (Figure 3A), and the immune erosive properties of Delta relative to both the ancestral and Beta variants (17-19).” (Lines 148 – 151).
For waning immunity, the Delta wave occurred approximately from May – Aug 2021, less than 1 year after the first wave (ending ~Oct 2020) and less than 6 months after the Beta wave (~Nov 2020 – April 2021). Given the estimated variant-specific immunity period of ~2-3 years (e.g., data from the SIREN study reported in Hall et al. 2021 Lancet 397:1459; and data from Sweden reported in Nordström et al. 2022 Lancet Infect Dis 22:781) and the recency of the previous waves, waning immunity likely played a much lesser role in producing the high attack rate during the Delta wave.
It is difficult to estimate reinfection rate. However, we are able to estimate the immune erosion potential of Delta, an indication of the variant’s ability to cause reinfection. We report this result and compare it to estimates reported in India:
“Estimates for Delta vary across the nine provinces (Figure 4D, 2nd column), given the more diverse population immune landscape among provinces after two pandemic waves. Overall, we estimate that Delta eroded 24.5% (95% CI: 0 – 53.2%) of prior immunity (gained from infection by ancestral SARS-CoV-2 and/or Beta, and/or vaccination) and was 47.5% (95% CI: 28.4 – 69.4%) more transmissible than the ancestral SARS-CoV-2. Consistent with this finding, and in particular the estimated immune erosion, studies have reported a 27.5% reinfection rate during the Delta pandemic wave in Delhi, India (17) and reduced ability of sera from Beta-infection recoverees to neutralize Delta (18, 19).” (Lines 210 – 217).
In addition, the estimated immune erosion potential and cumulative infection rate over time can be used to estimate the reinfection rate for a given population. For example, suppose the cumulative fraction of the population ever infected before the Beta wave is $c_{pre\_beta}$ (this is roughly the attack rate during the ancestral wave) and the estimated immune erosion potential for Beta is $\theta_{beta}$. To compute the reinfection rate during the Beta wave, we can assume that $c_{pre\_beta}(1-\theta_{beta})$ are protected by this prior immunity, and that the remaining $c_{pre\_beta}\theta_{beta}$ (i.e. those who lost their immunity due to immune erosion) have the same risk of infection as those never infected, such that the reinfection rate/fraction among all infections, $\eta_{beta}$, during the Beta wave would be:

$\eta_{beta}=\dfrac{c_{pre\_beta}\theta_{beta}}{1-c_{pre\_beta}+c_{pre\_beta}\theta_{beta}}$

The reinfection rate/fraction among the entire population is then $\eta_{beta}^{\prime}=z_{beta}\eta_{beta}$, where $z_{beta}$ is the attack rate during the Beta wave. Combining the above, the cumulative fraction of the population ever infected by the end of the Beta wave and before the Delta wave would be:

$c_{pre\_delta}=c_{pre\_beta}+z_{beta}-\eta_{beta}^{\prime}$

Note that the fraction of the population ever infected, c, is updated to compute the subsequent fraction of people protected by prior immunity, because the immune erosion potential here is estimated relative to the combined immunity accumulated until the rise of a new variant. We can repeat the above process for the Delta wave and the Omicron wave. See Appendix 1-table 1 for an example.
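This iterative bookkeeping can be sketched in a few lines of Python; the numbers below are hypothetical illustrations, not the estimates reported in this study:

```python
# Illustrative sketch of the per-wave reinfection bookkeeping described
# above; all numbers are hypothetical, not the study's estimates.

def update_wave(c_pre, theta, z):
    """One pandemic wave.
    c_pre: fraction ever infected before the wave,
    theta: immune erosion potential of the variant,
    z:     attack rate (fraction of the population infected) during the wave.
    Returns (eta, eta_pop, c_post): reinfection fraction among all
    infections, reinfection fraction in the whole population, and the
    updated cumulative fraction ever infected."""
    eroded = c_pre * theta                   # those who lost prior protection
    susceptible_pool = 1.0 - c_pre + eroded  # everyone at equal infection risk
    eta = eroded / susceptible_pool          # reinfections / all infections
    eta_pop = z * eta                        # reinfections / population
    c_post = c_pre + z - eta_pop             # new ever-infected fraction
    return eta, eta_pop, c_post

# Hypothetical example: 30% ever infected pre-Beta, a variant eroding 60%
# of that immunity, and a wave attack rate of 25% of the population.
eta, eta_pop, c_post = update_wave(c_pre=0.30, theta=0.60, z=0.25)
```

Repeating `update_wave` with each successive variant's erosion potential and attack rate, feeding `c_post` back in as `c_pre`, reproduces the wave-by-wave updating described in the text.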
Work to refine these estimates (e.g., sensitivity of these estimates to assumptions, and uncertainty intervals) is needed. Nonetheless, the example estimates shown in Appendix 1-table 1 are consistent with reported serology measures (fourth column vs. e.g., ~90% seropositive in March 2022 after the Omicron BA.1 wave reported in Bingham et al. 2022; DOI: https://doi.org/10.21203/rs.3.rs-1687679/v2) and the reinfection rates reported elsewhere (fifth and sixth columns vs. the much higher reinfection rates during the Omicron wave reported in, e.g., Pulliam et al. Science 376:eabn4947). Importantly, these estimates also show that, in addition to the innate immune erosion potential of a given new variant, the reinfection rate is also determined by the prior cumulative fraction of the population ever infected (fourth column in Appendix 1-table 1) and the attack rate of each variant (third column in Appendix 1-table 1). That is, the higher the prior cumulative infection rate and/or the higher the attack rate of the new variant, the higher the reinfection rate would be for a new variant that can cause reinfection. For instance, despite the lower immune erosion potential of Delta relative to Beta, because of the high prior infection rate accumulated up to the Delta wave onset, the estimated reinfection rate by Delta among all Delta infections was higher than that during the Beta wave (sixth column in Appendix 1-table 1). With the higher attack rate during the Delta wave, the reinfection rate among the entire population was much higher for Delta than for Beta (fifth column in Appendix 1-table 1). Thus, these preliminary results suggest that the reinfection rates observed for each variant, and differences across variants, should be interpreted in the context of the innate immune erosion potential of each variant, the prior cumulative infection rate of the study population, and the attack rate of each variant in the same population.
We have added the above preliminary analysis on reinfection rate in Appendix 1.
5) Immune erosion is a key metric reported in the results. What is the mechanism of this erosion? More susceptible, more likely to get a severe infection, or something else? Do you assume the same erosion after prior infection and after vaccination? What about waning over time? A more precise definition will help readers.
Essentially, we estimate the erosion of immune protection against infection (i.e. not disease severity) due to a new variant, and we combine immune protection from both prior infections and vaccinations, as it is difficult to separate the sources of immunity. To help readers interpret the results, we have added the following paragraph to the Results section, before presenting the specific estimates, to explain how the immune erosion potential and change in transmissibility are calculated in this study:
Lines 186 – 203:
“We then use these model-inference estimates to quantify the immune erosion potential and change in transmissibility for each VOC. Specifically, the immune erosion (against infection) potential is computed as the ratio of two quantities – the numerator is the increase of population susceptibility due to a given VOC and the denominator is population immunity (i.e., the complement of susceptibility) at wave onset. The change in transmissibility is also computed as a ratio, i.e., the average change in transmissibility due to a given VOC relative to the ancestral SARS-CoV-2 (see Methods). As factors contributing to population-specific transmissibility (e.g., population density and average contact rate) are largely cancelled out in the latter ratio, we expect estimates of the VOC transmissibility change to be generally applicable to different populations. However, prior exposures and vaccinations varied over time and across populations; thus, the level of immune erosion is necessarily estimated relative to the local population immune landscape at the time of the variant surge and should be interpreted accordingly. In addition, this approximation does not distinguish the sources of immunity or partial protection against severe disease; rather, it estimates the overall loss of immune protection against infection for a given VOC.
In the above context, we estimate that Beta eroded immunity among 63.4% (95% CI: 45.0 – 77.9%) of individuals with prior ancestral SARS-CoV-2 infection and was 34.3% (95% CI: 20.5 – 48.2%) more transmissible than the ancestral SARS-CoV-2…”
We have also clarified in the revision that immunity waning over time is excluded, i.e., only erosion due to the study variant is included (see Lines 443-448).
6) Many assumptions, including those in rows 311-315, are critical. It would be nice to show how sensitive the results are to them.
Rows 311 – 315 specify the vaccine effectiveness (VE) values used in the model, based on the types of vaccines used in South Africa and the reported VE of each vaccine against each variant. These are based on reported data, i.e., not assumptions. Importantly, we also note that vaccination coverage in South Africa was relatively low throughout the study period (March 2020 – Feb 2022); see e.g. Figure 3A (<30% in Gauteng by Feb 2022, for both vaccine doses). Given these low vaccination rates, we do not expect the VE settings to make a large difference. In addition, we conducted similar sensitivity analyses on VE settings in a previous study (Yang and Shaman 2022 RSIF 19: 20210900) and found no substantial differences in model estimates of immune erosion and change in transmissibility for the Delta variant in India (note that, similar to South Africa, India had low vaccination coverage during the time period of those sensitivity analyses on VE).
For other major parameters, the posterior estimates are made each week based on the model prior and the data (weekly cases and deaths). The main assumptions for these parameters are encoded in the initial prior ranges (listed in Appendix 1-table 5). Specifically, for parameters with previous estimates from the literature (e.g., transmission rate β, incubation period Z, infectious period D, and immunity period L), we set the prior range accordingly (see Appendix 1-table 5). For parameters with high uncertainty and spatial variation (e.g. the infection-detection rate), we tested them preliminarily by visualizing the model prior and posterior estimates using different ranges. For instance, for the infection-detection rate, when using a higher prior range (e.g., Author response image 1 showing sample test runs using an initial range of 5-20% vs. 1-10%), the model prior tended to overestimate the observed cases and underestimate deaths. Based on this initial testing, we then used a wide range that is able to reproduce the observed cases and deaths relatively well and allows the filter to infer the infection-detection rate along with other parameters.
Importantly, we note that the EAKF continuously adjusts the parameters based on both the model prior (in this iterative filtering algorithm, the posterior from the previous time step becomes the prior in the current time step) and the most recent data (here, weekly cases and deaths). To capture potential changes over time (e.g., likely increased detection for variants causing more severe disease), we applied space reprobing, a technique that randomly replaces the parameter values of a small fraction of the model ensemble, to explore a wider range of parameter values. Both the EAKF and the reprobing allow posterior parameter estimates to migrate outside the initial parameter ranges. This is evident from the posterior estimates (e.g. Figure 3E for the infection-detection rate).
In the revision, we have added discussion of the above considerations of the model-inference method in Appendix 1. In addition, we have added sensitivity analyses, in particular for the infection-detection rate in Gauteng during the Omicron wave, to show how model assumptions influence the results. Please see these supporting results in Appendix 1. Also see the details below in replies to related comments.
7) Having seasonality in the model is interesting and useful. The authors should elaborate on whether this is related to the proportion of contacts occurring indoors or to something else. That will help applicability to other settings.
The seasonal trend in the model is computed from temperature and specific humidity data using a function designed to represent how the survival and transmission of the virus respond to different temperature and specific humidity conditions. This seasonality function is described in detail in a previous paper (Yang and Shaman 2021 Nature Communications) and cited in this manuscript. To provide more context, we have revised the text to include more details:
See Lines 353 – 375:
“(4) Infection seasonality, computed using temperature and specific humidity data as described previously (see supplemental material of Yang and Shaman (4)).
[…]
The estimated relative seasonal trend, b_{t}, is used to adjust the relative transmission rate at time t in Equation 1.”
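As a schematic only (the actual seasonality function is the one specified in Yang and Shaman 2021, not reproduced here), the following Python sketch shows how a precomputed relative seasonal trend b_t would rescale the transmission rate in Equation 1; the cosine shape and all numbers are hypothetical placeholders:

```python
import math

# Schematic only: the real seasonality function is derived from temperature
# and specific humidity data (Yang and Shaman 2021). Here we only illustrate
# how a relative seasonal trend b_t rescales the transmission rate.
weeks = range(52)
# Hypothetical Southern Hemisphere trend peaking mid-year (winter):
b_t = [1.0 + 0.2 * math.cos(2 * math.pi * (w - 26) / 52) for w in weeks]

beta_base = 0.8                        # hypothetical baseline transmission rate
beta_t = [beta_base * b for b in b_t]  # seasonally adjusted rate, as in Eq. 1
```

With this placeholder shape, transmission is 20% above baseline in mid-year winter weeks and 20% below baseline at the start and end of the year.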
8) Figure 3E: the sharp increase in detection rate during the Omicron wave is difficult to believe. Is there another explanation?
In Gauteng, the number of cases during the first week of detection (i.e., the week starting 11/21/21) increased 4.4 times relative to the previous week; during the second week of detection (i.e., the week starting 11/28/21), it increased another 4.9 times. Yet after these two weeks of dramatic increases, case numbers peaked during the third week and declined thereafter. Initial testing suggested substantial changes in infection-detection rates during this time; in particular, detection might have increased during the first two weeks due to awareness of and concern about the novel Omicron variant, and declined during later weeks due to constraints on testing capacity as well as subsequent reports of milder disease caused by Omicron. Anecdotal reports of changes in the detection rate during the Omicron wave include hospitals moving to testing patients admitted for reasons not necessarily related to COVID-19 when Omicron was initially reported, which increased incidental detection (see e.g., Abdullah, et al. 2022 Int J Infect Dis 116:38-42); the very high test-positive rates (~40%) around the peak of the Omicron wave also suggest reduced testing at that time (see e.g., https://www.nicd.ac.za/wpcontent/uploads/2022/01/COVID19TestingReport_Week52.pdf).
To more accurately estimate the infection-detection rate and underlying transmission dynamics, we ran and compared model-inference using four settings for the infection-detection rate. Estimated infection-detection rates in Gauteng increased substantially during the first two weeks of the Omicron (BA.1) wave and decreased afterwards, under all four settings (see Appendix 1-figure 12, first row). This consistency suggests a general trend in the infection-detection rate at the time, in accordance with the aforementioned potential changes in testing. Further, diagnosis of the posterior estimates (e.g., estimated transmissibility; see Appendix 1-figure 12) and retrospective predictions (see Appendix 1-figure 13) indicates that there was likely a substantial increase in the infection-detection rate during the first ~2 weeks of the Omicron wave. Importantly, we note that despite these increases, detection rates at the time were low (~10% during those weeks). In addition, the overall estimates of changes in transmissibility and immune erosion for Omicron (BA.1) were slightly higher under the first two settings with lower infection-detection rates, but still consistent with the results presented in the main text (see Appendix 1-figure 14).
We have now added the above additional sensitivity analyses and results to the revised Appendix 1. Appendix 1—figures 12–14 are also included below for ease of reference.
9) Figure 4D – The term "relative change" here is confusing. Need more precise definitions. For instance, does it mean that Beta is ~50% as transmissible as the ancestral strain, or that Beta is ~50% more transmissible than the ancestral strain?
A 50% change means 50% more transmissible than the ancestral strain. To clarify, we have revised the y-axis label to "Relative increase (%)".
10) Not sure the IFR estimates by province on p. 38 make sense to me. In my view, they should be similar across regions if health systems are comparably effective. Are different rates assumed for previously infected and/or vaccinated individuals?
We thank the reviewer for the comment. These estimates were computed as the ratio of total reported COVID-19 deaths to the estimated total number of infections during each wave for each province. For some provinces, COVID-19 mortality data appeared to be highly irregular, with very high weekly death counts for some weeks (e.g., see Appendix 1—figure 1 for Mpumalanga and Northern Cape). A likely explanation is the audit and release of mortality data including deaths that occurred in previous time periods, which has happened in other countries (e.g., see documentation of this phenomenon by the Financial Times at https://ig.ft.com/coronaviruschart/?areas=eur&areas=usa&areas=prt&areas=twn&areas=nzl&areas=e92000001&areasRegional=usny&areasRegional=usnm&areasRegional=uspr&areasRegional=ushi&areasRegional=usfl&areasRegional=usco&cumulative=0&logScale=0&per100K=1&startDate=20210601&values=deaths; under the header "SOURCES"). However, these "new" deaths sometimes were not properly redistributed according to the actual time of death but were instead lumped and counted as deaths on the date of data release. Here, we could not adjust for this possibility due to a lack of information. Instead, to account for such potential data errors, the ensemble adjustment Kalman filter (EAKF) algorithm used in the model-inference system incorporates an observational error when computing the posterior estimates. In this study, the observational errors were scaled to the corresponding observations (thus, weeks with higher mortality would also have larger observational errors). In doing so, the EAKF down-weights observations with larger observational errors (e.g., for weeks with very large death counts) to reduce their impact on the inference of model dynamics. As such, the posterior estimates for mortality tend to (intentionally) miss the very high outlying data points (see Figure 1 and Appendix 1—figure 1).
In addition, posterior estimates for the infection-fatality risk (IFR) are also more stable over time, including for weeks with outlying mortality data (see, e.g., Appendix 1—figure 23, IFR estimates for Mpumalanga).
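To illustrate how scaling the observational error to the observation itself down-weights outlying death counts, here is a minimal single-variable sketch of an EAKF-style update. This is a toy illustration, not the actual multi-variable model-inference system; the 20% error scale and all values are arbitrary assumptions:

```python
import numpy as np

def eakf_scalar_update(ensemble, obs, obs_error_scale=0.2):
    """One EAKF-style update for a single observed variable (obs > 0).

    The observational error variance is scaled to the observation itself,
    so weeks with very large (possibly erroneous) death counts get a large
    error variance and are down-weighted: the posterior moves less toward them.
    """
    prior_mean = ensemble.mean()
    prior_var = ensemble.var()
    obs_var = (obs_error_scale * obs) ** 2  # error scaled to the observation
    # Posterior mean/variance under a Gaussian observation model
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    # Deterministically shift and contract the ensemble (no random noise)
    return post_mean + np.sqrt(post_var / prior_var) * (ensemble - prior_mean)
```

Because the error variance grows with the square of the observation, an outlying count of 1000 deaths pulls a prior centered at 100 far less than a plausible count of 150 would, mirroring how the filter intentionally misses the very high outliers.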
We have now added the above discussion to the Supplemental Material (see "A brief note on reported COVID-19 mortality and model-inference strategy in this study"). In addition, we have added another supplemental table showing the model-estimated IFR, computed by weighting the weekly IFR estimates by the estimated weekly mortality rate (i.e., not computed directly from the reported mortality data). See the new Appendix 1—table 3.
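The mortality-weighted IFR described above can be sketched as a one-line weighted average (the numbers below are illustrative only, not the estimates reported in the paper):

```python
def weighted_ifr(weekly_ifr, weekly_mortality):
    """Wave-level IFR: weekly IFR estimates weighted by estimated weekly mortality."""
    total_mortality = sum(weekly_mortality)
    return sum(ifr * m for ifr, m in zip(weekly_ifr, weekly_mortality)) / total_mortality

# Illustrative: three weeks with IFR estimates of 0.5%, 1.0%, and 0.8%,
# weighted by estimated weekly deaths of 100, 400, and 200
print(weighted_ifr([0.005, 0.010, 0.008], [100, 400, 200]))
```

Weighting by estimated (rather than reported) mortality keeps weeks with irregular reported death counts from dominating the wave-level estimate.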
We agree with the reviewer that quality of and access to health systems is a key factor contributing to the IFR. However, there are also other contributing factors, including infection demographics, the intensity of the pandemic and strain on the health systems, the innate severity of the circulating variant, and vaccination coverage. For infection demographics, IFR tended to be much lower at younger ages, as reported by many (e.g., Levin et al. 2020 Eur J Epidemiol 35:1123-1138). In South Africa, similar differences in infection demographics occurred across provinces. For instance, in Giandhari et al. 2021 (Int J Infect Dis 103:234-241), the authors wrote about the lower initial mortality in Gauteng, as earlier infections concentrated in younger and wealthier individuals. Specifically, Giandhari et al. wrote: "GP, home of the largest metropolitan area of Johannesburg, had an unusual epidemic, as the majority of initial cases were in middle-age and wealthy individuals who travelled overseas for holidays. This translated in a very small number of deaths over time and infections were concentrated in the wealthy suburb of Sandton in Johannesburg." There were also substantial variations in pandemic intensity across the South African provinces during each wave, and in turn, varying strains on local health systems and IFRs. This is evident from the raw case fatality ratio (CFR) by province for each pandemic wave (see table below), where the CFR (based on reported data alone, without any modeling) varied widely by province (Author response table 1).
To provide more context for the IFR estimates, we have added the following paragraph to Appendix 1 (pages 4-5):
“Lastly, estimated IFRs also varied over time and across provinces (Appendix 1—figure 23). IFR can be affected by multiple factors, including infection demographics, the innate severity of the circulating variant, quality of and access to healthcare, and vaccination coverage. For infection demographics, IFR tended to be much lower at younger ages, as reported by many (e.g., Levin et al. 2020 (5)). In South Africa, similar differences in infection demographics occurred across provinces. For instance, Giandhari et al. (6) noted a lower initial mortality in Gauteng, as earlier infections concentrated in younger and wealthier individuals. For the innate severity of the circulating variant, as noted in the main text, estimated IFRs were in general higher during the Beta and Delta waves than during the Omicron wave. In addition, as shown in Appendix 1—figure 23, IFRs were substantially higher in four provinces (i.e., KwaZulu-Natal, Western Cape, Eastern Cape, and Free State) than in other provinces during the Beta wave. Coincidentally, the earliest surges of the Beta variant occurred in three of those provinces (i.e., KwaZulu-Natal, Western Cape, and Eastern Cape) (6). Nonetheless, as noted in the main text and the above subsection, the IFR estimates here should be interpreted with caution, due to the likely underreporting and irregularity of the COVID-19 mortality data used to generate these estimates.”
Reviewer #2 (Recommendations for the authors):
In general very good work, but I think you need to make the logic that leads from your analysis to your conclusions a little clearer, i.e., what is it about your findings that makes you view large future waves as possible. I would consider focusing on the long-term trends that you can see, specifically with regard to transmission and susceptibility.
We thank the reviewer for the suggestion. As noted above, the three general observations we made are related to the SARS-CoV-2 dynamics observed in South Africa, as well as in other places. Per this suggestion, we have now revised the text to provide more direct evidence drawn from specific findings here to support those observations. See Lines 252–271 and the summary below:
1) New waves of infection are still possible. As shown in this study, new variants can erode prior immunity, increase population susceptibility, and in turn cause new waves. This was the case for Beta, Delta, and Omicron. In addition, a new wave due to the BA.4/BA.5 Omicron subvariants began in April 2022, after the initial Omicron wave.
2) Large new waves of deaths can still occur. We have revised the text to "… large numbers of hospitalizations and/or deaths can still occur in later waves with large infection surges." As noted in the manuscript, for mortality, the large Delta wave in South Africa resulted in 0.2% excess mortality, despite the large first wave and Beta wave; for reference, excess mortality was 0.08% during the first wave and 0.19% during the Beta wave (20). More recently, the fifth wave, caused by the BA.4/BA.5 Omicron subvariants, has also led to increases in hospital admissions.
3) New variants likely require an ability to evade existing immunity if they are to spread. As noted in the manuscript, this is based on the observation that four of the five major VOCs identified thus far (Beta, Gamma, Delta, and Omicron), including three in South Africa, were able to erode preexisting immunity. In addition, the most recent BA.4/BA.5 Omicron subvariants have also been shown to erode preexisting immunity (including immunity gained from initial BA.1 Omicron infection; see, e.g., Cao et al. 2022 Nature and Khan et al. 2022 medRxiv 2022.04.29.22274477). We have added a brief note on the recent BA.4/BA.5 Omicron subvariants.
I'm also concerned about the potential lack of identifiability in the fitting scheme with all of these different parameters. In particular, I'm concerned that the drastic changes in the infection detection rate may be masking changes elsewhere, in particular during the Omicron wave. I would consider a sensitivity analysis that holds this variable constant.
Per the reviewer's suggestion, we have added a sensitivity test of model settings for the infection detection rate during the Omicron wave in Gauteng. Note that we cannot fix the infection detection rate at a constant level, as there is empirical evidence suggesting substantial changes in detection rates through time. For instance, there were reports of changes to the detection rate during the Omicron wave when hospitals moved to testing patients admitted for reasons other than COVID-19 after Omicron was initially reported, which increased incidental detection (see, e.g., Abdullah et al. 2022 Int J Infect Dis 116:38-42); and the very high test-positivity rates (~40%) around the peak of the Omicron wave suggest reduced testing at that time (see, e.g., https://www.nicd.ac.za/wpcontent/uploads/2022/01/COVID19TestingReport_Week52.pdf).
Under all four model settings in the sensitivity analysis, the estimated infection detection rate in Gauteng increased substantially during the first two weeks of the Omicron (BA.1) wave and decreased afterwards (see Appendix 1—figure 12, first row). This consistency suggests a general trend in the infection detection rate at the time, in accordance with the aforementioned potential changes in testing. Further diagnosis of the posterior estimates (e.g., estimated transmissibility; see Appendix 1—figure 12) and retrospective predictions (see Appendix 1—figure 13) indicates that there was likely a substantial increase in the infection detection rate during the first ~2 weeks of the Omicron wave. Importantly, we note that overall estimates of the changes in transmissibility and immune erosion of Omicron (BA.1) were slightly higher under the two model settings with lower infection detection rates, but still consistent with the results presented in the main text (see Appendix 1—figure 14).
We have added the above additional sensitivity analyses and results to the revised Appendix 1. Appendix 1—figures 12–14 are also included above in the reply to Reviewer 1 (see Pages 13–15 of this document).
Transmissibility and susceptibility need more precise definitions when introduced. Are these assumed to be time-varying parameters, or are they derived from other quantities?
Per the reviewer's suggestion, we have added brief definitions for both terms when introducing them in the Results. Transmissibility is computed as the product of the variant-specific transmission rate β_t and the infectious period D_t (also described in the Methods). Susceptibility is based on the model-estimated number of susceptible persons relative to the population size (i.e., S / N × 100%). Please see the excerpts below. We have also added similar notes in the figure captions to clarify.
Lines 129–138:
“The model-inference system estimates a large increase in population susceptibility with the surge of Beta (Figure 3D; note population susceptibility is computed as S / N × 100%, where S is the estimated number of susceptible people and N is population size)… In addition, an increase in transmissibility is also evident for Beta, after accounting for concurrent NPIs and infection seasonality (Figure 3C; note transmissibility is computed as the product of the estimated variant-specific transmission rate and the infectious period; see Methods for details).”
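The two definitions quoted above amount to two one-line computations; a minimal sketch with arbitrary illustrative values (not estimates from the paper):

```python
def transmissibility(beta_t, D_t):
    """Variant-specific transmissibility: transmission rate times infectious period."""
    return beta_t * D_t

def susceptibility_pct(S, N):
    """Population susceptibility as a percentage of the population size."""
    return S / N * 100.0

# Illustrative values only: transmission rate 0.6 per day, 5-day infectious
# period, and 40 million susceptible people in a population of 60 million
print(transmissibility(0.6, 5.0))
print(round(susceptibility_pct(4.0e7, 6.0e7), 1))  # → 66.7
```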
Reviewer #3 (Recommendations for the authors):
The authors have presented an extremely well-written and comprehensive analysis of the South African COVID-19 pandemic using an intricate epidemiological model. I am having some trouble fully evaluating the model, though, because of the numerous time-dependent variables estimated as part of the fitting process. I would suggest that the authors consider adding a figure (at least for one example region) showcasing the time-dependent results for all hidden parameters estimated as part of the model (i.e., Z_t, D_t, L_t, and e_t from Equation 1, as well as r_t, IFR_t, and other parameters from the observation model). There is likely to be high correlation between many of these parameters over time, so such plots would allow for proper diagnosis of the fitting procedure and ensure that all parameters are within realistic ranges at all times. As a note, it was not clear whether the prior distributions in Table S4 were actual Bayesian prior distributions or merely the initial range of starting conditions at t=0, and it would be helpful to clarify how they integrate with model fitting. Additionally, I suggest that the authors expand their Results section to include additional analyses of these hidden parameters, such as how the latent period has changed over time (e.g., there is some evidence that Omicron progressed more quickly than previous variants) and/or the impact that mobility has had on transmission over time (incorporating the impact of e_t). Other analyses have found degradation of the relationship between mobility and transmission over time in limited contexts, so it would be useful to compare the current results (e.g., https://www.pnas.org/doi/10.1073/pnas.2111870119).
Per this suggestion, we have plotted weekly estimates for all parameters (see Appendix 1—figures 15–23). We have also added further explanation of the model-EAKF inference process (model prior, etc.; see section 2 of Appendix 1), and diagnosis/discussion of the posterior estimates for all parameters in Appendix 1 (see section 4). In general, visual inspection indicates that posterior estimates for the model parameters are consistent with those reported in the literature, or change over time and/or across provinces in directions that would be expected.
The authors use seasonal patterns based on historic climate data from South Africa as a means to modulate COVID19 transmission, but there doesn't appear to be any reference for the actual climate data for South Africa from the same time period. Are there any data from the modeling time period the authors could use as validation that their assumed seasonal curves match the actual climatic conditions?
We used historical climate data (years 2000–2020) because measurements are more abundant (when stratified by province) during this extended time period. Comparison of data from 2000–2020 and 2020–2021 shows similar trends for locations with sufficient data for both periods (i.e., Eastern Cape, KwaZulu-Natal, and Western Cape). Please see the comparison in Author response image 2 below.
The use of retrospective forecasts is a means to validate that the model is accurately capturing transmission, behavioral, and immunological status. While I agree that, overall, the 1-2 week forecasts look satisfactory, most forecast hubs ask for forecasts up to 4 weeks (e.g., the COVID-19 Forecast Hub), and accurate forecasts over a longer time horizon would dramatically strengthen the validation of the model estimates. I suggest that the authors attempt 3-4 week forecasts as part of a supplemental analysis to understand the limitations of the model and its forecast ability.
The forecasts were generated for the entire wave, often more than 10 weeks into the future for both cases and deaths, not just 1-2 weeks. For instance, in Figure 2A for Gauteng, the red lines and surrounding areas show forecasts generated at two time points (i.e., 1 week and 2 weeks before the observed case peak) for the remaining weeks of the Delta wave (i.e., up to 11 and 10 weeks into the future for the two forecasts, respectively) and the Omicron wave (up to 14 and 13 weeks into the future, respectively).
We have added a note to clarify this when explaining the rationale for generating retrospective predictions 2 and 1 weeks before the peak of cases, in the Methods (see Lines 486–496):
“Predicting the peak timing, intensity, and epidemic turnaround requires accurate estimation of model state variables and parameters that determine future epidemic trajectories. This is particularly challenging for South Africa, as the pandemic waves there tended to progress quickly, with cases surging to a peak in only 3 to 7 weeks. Thus, we chose to generate retrospective predictions 2 and 1 weeks before the peak of cases in order to leverage 1 to 6 weeks of new variant data for estimating epidemiological characteristics. Specifically, for each pandemic wave, we ran the model-inference system until 2 weeks (or 1 week) before the observed peak of cases, halted the inference, and used the population susceptibility and transmissibility of the circulating variant estimated at that time to predict cases and deaths for the remaining weeks (i.e., 10-14 weeks into the future).”
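The prediction step described in this excerpt can be sketched as: halt the filter, then integrate the epidemic model forward using the susceptibility and transmissibility estimated at the halt point. The sketch below uses a deliberately simplified SIR structure with daily Euler steps; the actual system is an SEIRS-type model with seasonality, NPIs, and vaccination, and all parameter values here are illustrative assumptions:

```python
def project_cases(S0, I0, N, beta, D, weeks, detection_rate=0.1):
    """Project weekly reported cases forward from halted-filter estimates.

    S0: estimated susceptible population at the halt point
    I0: estimated infectious population at the halt point
    beta, D: estimated transmission rate (per day) and infectious period (days)
    """
    S, I = float(S0), float(I0)
    gamma = 1.0 / D  # recovery rate implied by the infectious period
    weekly_cases = []
    for _ in range(weeks):
        new_infections_week = 0.0
        for _ in range(7):  # daily Euler steps within the week
            new_infections = beta * S * I / N
            new_infections_week += new_infections
            S -= new_infections
            I += new_infections - gamma * I
        weekly_cases.append(detection_rate * new_infections_week)
    return weekly_cases

# Illustrative projection: ~67% of a 60M population susceptible,
# transmission rate 0.6/day, 5-day infectious period, 10 weeks ahead
projected = project_cases(4.0e7, 1.0e5, 6.0e7, beta=0.6, D=5.0, weeks=10)
```

With these illustrative values the projected weekly cases rise, peak, and turn around as susceptibles are depleted, which is the qualitative behavior the retrospective predictions are evaluated on.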
https://doi.org/10.7554/eLife.78933.sa2

Article and author information
Author details
Funding
National Institute of Allergy and Infectious Diseases (AI145883)
 Wan Yang
 Jeffrey L Shaman
National Institute of Allergy and Infectious Diseases (AI163023)
 Jeffrey L Shaman
Centers for Disease Control and Prevention (CK000592)
 Jeffrey L Shaman
Morris-Singer Foundation
 Jeffrey L Shaman
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This study was supported by the National Institute of Allergy and Infectious Diseases (AI145883 and AI163023), the Centers for Disease Control and Prevention (CK000592), and a gift from the Morris-Singer Foundation.
Senior Editor
 Bavesh D Kana, University of the Witwatersrand, South Africa
Reviewing Editor
 Dobromir Dimitrov, Fred Hutchinson Cancer Center, Seattle, United States
Reviewer
 Mia Moore, Fred Hutchinson Cancer Center, United States
Version history
 Preprint posted: December 21, 2021
 Received: March 24, 2022
 Accepted: July 21, 2022
 Version of Record published: August 9, 2022 (version 1)
Copyright
© 2022, Yang and Shaman
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
In biomedical science, it is a reality that many published results do not withstand deeper investigation, and there is growing concern over a replicability crisis in science. Recently, Ellipse of Insignificance (EOI) analysis was introduced as a tool to allow researchers to gauge the robustness of reported results in dichotomous outcome design trials, giving precise deterministic values for the degree of miscoding between events and nonevents tolerable simultaneously in both control and experimental arms (Grimes, 2022). While this is useful for situations where potential miscoding might transpire, it does not account for situations where apparently significant findings might result from accidental or deliberate data redaction in either the control or experimental arms of an experiment, or from missing data or systematic redaction. To address these scenarios, we introduce Region of Attainable Redaction (ROAR), a tool that extends EOI analysis to account for situations of potential data redaction. This produces a bounded cubic curve rather than an ellipse, and we outline how this can be used to identify potential redaction through an approach analogous to EOI. Applications are illustrated, and source code, including a webbased implementation that performs EOI and ROAR analysis in tandem for dichotomous outcome trials is provided.