1. Epidemiology and Global Health
Download icon

Modeling the impact of racial and ethnic disparities on COVID-19 epidemic dynamics

  1. Kevin C Ma  Is a corresponding author
  2. Tigist F Menkir
  3. Stephen Kissler
  4. Yonatan H Grad
  5. Marc Lipsitch
  1. Department of Immunology and Infectious Diseases, Harvard TH Chan School of Public Health, United States
  2. Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, United States
  3. Division of Infectious Diseases, Brigham and Women’s Hospital and Harvard Medical School, United States
Research Article
  • Cited 0
  • Views 883
  • Annotations
Cite this article as: eLife 2021;10:e66601 doi: 10.7554/eLife.66601

Abstract

Background:

The impact of variable infection risk by race and ethnicity on the dynamics of SARS-CoV-2 spread is largely unknown.

Methods:

Here, we fit structured compartmental models to seroprevalence data from New York State and analyze how herd immunity thresholds (HITs), final sizes, and epidemic risk change across groups.

Results:

A simple model where interactions occur proportionally to contact rates reduced the HIT, but more realistic models of preferential mixing within groups increased the threshold toward the value observed in homogeneous populations. Across all models, the burden of infection fell disproportionately on minority populations: in a model fit to Long Island serosurvey and census data, 81% of Hispanics or Latinos were infected when the HIT was reached compared to 34% of non-Hispanic whites.

Conclusions:

Our findings, which are meant to be illustrative and not best estimates, demonstrate how racial and ethnic disparities can impact epidemic trajectories and result in unequal distributions of SARS-CoV-2 infection.

Funding:

K.C.M. was supported by National Science Foundation GRFP grant DGE1745303. Y.H.G. and M.L. were funded by the Morris-Singer Foundation. M.L. was supported by SeroNet cooperative agreement U01 CA261277.

Introduction

The dynamics of SARS-CoV-2 spread are influenced by population heterogeneity. This is especially true for herd immunity, which occurs when susceptible individuals in a population are indirectly protected from infection due to immunity in others. The herd immunity threshold (HIT) is the fraction of the population that is non-susceptible when an unmitigated epidemic reaches its peak, and estimating the HIT for SARS-CoV-2 is important for forecasting the harm associated with letting the epidemic spread in the absence of interventions (Randolph and Barreiro, 2020). A population that has reached the HIT is protected from a new epidemic occurring, until births or waning immunity reduce the proportion nonsusceptible below the HIT, but existing cases will still lead to some onward transmission as the epidemic declines and can often result in a final epidemic size that exceeds the HIT. In a population with homogeneous mixing, the HIT is 1–1/R0, where R0 is the basic reproduction number; this translates to an HIT of 67% using an R0 of 3.

However, population homogeneity is an unrealistic assumption, and models incorporating heterogeneity in social exposure and infection susceptibility (defined as the probability of infection given exposure) generally result in lowered HITs (Hill and Longini, 2003; Britton et al., 2020; Gomes et al., 2020; Aguas et al., 2020; Tkachenko et al., 2020). The key idea behind these models is that subpopulations important for epidemic spread (i.e., those with substantially increased susceptibility or exposure) become infected – and thus develop immunity – early on in an epidemic’s course. Herd immunity for the population overall is then achieved earlier because once these individuals are no longer susceptible to infection, further epidemic spread is slowed.

Importantly, these models also imply that in locations where SARS-CoV-2 has spread there may be demographic subpopulations with particularly high cumulative incidences of infection due to increased exposure, susceptibility, or both. Seroprevalence studies – which characterize past exposure by identifying SARS-CoV-2 antibodies – can identify these subpopulations and are more reliable and unbiased than case data, which suffer from under-reporting and other biases (Metcalf et al., 2020). Identifying and building structured models with these groups in mind is important for understanding how variation in exposure or susceptibility and social disparities is interconnected. These models are also useful for designing interventions that can both reduce disparities and disrupt overall transmission by focusing efforts on groups most affected by high transmission rates (Wallinga et al., 2010; Bubar et al., 2021). Transmission models in this space have incorporated subpopulation structure by focusing primarily on accounting for variation in susceptibility and exposure by age (Britton et al., 2020; Davies et al., 2020; Miller et al., 2020). Supporting this approach, susceptibility to infection, contact rates, and cumulative incidence in some locations all appear to vary by age (Davies et al., 2020; Mossong et al., 2008). Nonetheless, serosurveys in Belgium, Spain, Iran, New York City (NYC), Brazil, and other places exhibit relatively low variation in seropositivity by age (Herzog et al., 2020; Pollán et al., 2020; Shakiba et al., 2020; Rosenberg et al., 2020; Hallal et al., 2020), indicating additional factors that govern transmission spread.

Substantial racial and ethnic disparities in infection rates, hospitalizations, and deaths have been characterized across the US (Chamie et al., 2020; Moore et al., 2020; Millett et al., 2020a; Pan et al., 2020; Chen and Krieger, 2020; Bassett et al., 2020; Hanage et al., 2020), but it is unclear how these heterogeneities in risk are expected to change over time and what implications – if any – they have on overall epidemic dynamics. Here, we aim to address these questions by fitting compartmental SEIR transmission models structured by race and ethnicity to seroprevalence data from NYC and Long Island (Rosenberg et al., 2020). We focus primarily on building and analyzing variable exposure models because observed disparities in infection rates in US cities are strongly attributable to differences in mobility and exposure (Chang et al., 2021; Zelner et al., 2020; Kissler et al., 2020). Because of the challenges in acquiring racial and ethnic COVID-19 data (Krieger et al., 2020b), including social contact data that can be used in transmission models, we analyze a range of model structures that are compatible with the data and assess how these assumptions affect estimates of HITs, final epidemic sizes, and longitudinal trends in risk across groups. These results highlight the importance of developing socially informed COVID-19 transmission models that incorporate patterns of epidemic spread across racial and ethnic groups.

Materials and methods

SEIR model

Request a detailed protocol

We initially modeled transmission dynamics in a homogeneous population using a SEIR compartmental SARS-CoV-2 infection model:

(1) dSdt=-βIS
(2) dEdt=βIS-rE
(3) dIdt=rE-γI
(4) dRdt=γI

where S,E,I,R refer to the number of people in susceptible, latently infected, infectious, and recovered compartments, respectively. Given a mean incubation period and mean serial interval of 5 days as suggested by empirical studies (Nishiura et al., 2020; Lauer et al., 2020), we set the mean latent period 1/r to be 3 days to allow for pre-symptomatic transmission and the mean infectious period 1/γ to be 4 days to coincide with the observed serial interval. The per capita transmission rate is given by β=R0γ/N, where N is the total number of people in the population.

We extended this model to incorporate multiple racial and ethnic groups by including SEIR compartmental variables for each group, which interact through a social contact matrix that governs the interactions between and within groups. In matrix form, the structured SEIR model is given by

(5) dSdt=(BI)S
(6) dEdt=(BI)SrE
(7) dIdt=rEγI
(8) dRdt=γI

where ° denotes element-wise multiplication and S,E,I,R are column vectors comprising the compartmental variables for each group (e.g., S=[S0,,Sp]T for p demographic groups). We let S0 denote non-Hispanic whites, S1 denote Hispanics or Latinos, S2 denote non-Hispanic African-Americans, S3 denote non-Hispanic Asians, and S4 denote multiracial or other demographic groups, with similar ordering for elements in vectors E through R.

We define contacts to be interactions between individuals that allow for transmission of SARS-CoV-2 with some non-zero probability. Following the convention for age-structured transmission models (Wallinga et al., 2019), we defined the p×p per capita social contact matrix C to consist of elements cij at row i and column j, representing the per capita rate that individuals from group i are contacted by individuals of group j. Letting Ni be the total number of individuals in group i, the social contact matrix M consists of elements mij=cij*Ni, which represents the average number of individuals in group i encountered by an individual in group j. The susceptibility to infection can vary between groups, which we modeled by allowing the probability of infection given contact with an infected individual to vary: q=[q0,...,qp]T. The transmission matrix B is then given by (q1T)C, where 1T is a one by p vector of 1 s:

(9) B=[q0...q0...q0q4...q4...q4][c00...c02...c04c40...c42...c44]

Given mean duration of infectiousness 1/γ, the next-generation matrix G, representing the average number of infections in group i caused by an infected individual in group j, is given by (q1T)M/γ=NB/γ. R0 for the overall population described by this structured model was calculated by computing the dominant eigenvalue of matrix G. The effective reproduction number Rt at time t was calculated by computing the dominant eigenvalue of Gt=(q1T)Mt/γ=StB/γ, where the elements in Mt are given by cij*Si,t and Si,t is the number of susceptible individuals in group i at time t. To hold R0 values across model types constant when calculating HITs and final epidemic sizes, we re-scaled transmission matrices to have the same dominant eigenvalue. We also calculated the instantaneous incidence rate at some time t for all groups by calculating the force of infection λt=(BIt)St. To account for the effects of social distancing and other non-pharmaceutical interventions (NPIs), we scaled the transmission rate by a factor α beginning when 5% cumulative incidence in the population was reached, representing an established and expanding epidemic, for a variable duration. We analyzed a range of α values to reflect the variation in NPIs implemented.

Structured model variants

Request a detailed protocol

Simplifying assumptions are needed to constrain the number of variables to estimate in B given limited data. Under the variable susceptibility model, we set the contact rates cij to all be 1, indicating no heterogeneity in exposure, but allowed the qj in the susceptibility vector to vary (i.e., B=q1T).

Under each of two variable exposure models, in contrast, we set the susceptibility factors qj to be equal. The simplest variable exposure model we analyzed was the proportionate mixing model, which assumes that the contact rate for each pair of groups is proportional to the total contact rate of the two groups (i.e., total number of contacts per unit time for an individual of group i) (Hethcote, 1996). Denoting ai as the total contact rate for a member of group  i and a as the 1×p vector of ais, the ij th entry in the transmission matrix is given by

(10) βij=qaiajkakNk

and the overall transmission matrix B can be written as

(11) B=qkakNkaaT

Finally, under the assortative mixing assumption, we extended this model by partitioning a fraction ϵ of contacts to be exclusively within-group and distributed the rest of the contacts according to proportionate mixing (with δij being an indicator variable that is 1 when i=j and 0 otherwise) (Hethcote, 1996):

(12) βij=(1-ϵ)qaiajkakNk+ϵδijqaiNi
(13) B=(1ϵ)qkakNkaaT+ϵqdiag(a1/N)

Calculating the HIT and final epidemic size

Request a detailed protocol

We defined the HIT for all models as the fraction of nonsusceptible people when the effective reproduction number Rt first crosses 1. In the homogenous model, where Rt=Stβ/γ, the analytical solution for the HIT occurs when the fraction of nonsusceptible individuals equals 1–1/R0. In the structured models of heterogeneous populations, the HIT was calculated via simulation: we took the dominant eigenvalue of Gt at each timestep to calculate Rt and identified the number of nonsusceptible individuals when Rt first decreased below 1. For the heterogeneous models with mitigation measures, Rt was calculated at each timestep with respect to the corresponding unmitigated epidemic; in other words, the mitigation scaling factor α was not included in the Rt calculation. This identifies the point in the epidemic trajectory at which the population reaches the HIT even if all mitigation measures were lifted (i.e., HIT due to population immunity), as opposed to the point in the trajectory when the population transiently reaches the HIT due to mitigation measures (see Figure 1—figure supplement 1 for further explanation).

Final epidemic sizes were calculated by simulation by running the epidemic out to 1 year for R0 above 2 and 4 years for R0 below 2 to allow additional time for the slower epidemics to fully resolve. The final time point was used as the estimate for the final epidemic size.

Model fitting and data sources

Request a detailed protocol

SEIR differential equations were solved using the lsoda function in the deSolve package (version 1.28) of R (version 3.6.3). We estimated the ai in the variable exposure models and the qi in the variable susceptibility models using maximum likelihood fits to a single cross-sectional serosurvey from New York, which was collected from over 15,000 adults in grocery stores from April 19 to 28 (Rosenberg et al., 2020). We assumed that the seroprevalence data (adjusted cumulative incidence estimates from Table 2 in Rosenberg et al., 2020) were collected via a binomial sampling process: at a given time point ts representing the time of the serosurvey, the number of seropositive cases Yi(ts) in group i is distributed Bin(mi,Ri(ts)/Ni(ts)), where mi is the number of people tested from group i in the serosurvey and Ri/Ni is the fraction of recovered people from the SEIR model. The likelihood was calculated jointly for all demographic groups, with ts set to 100 days and the initial number of infected individuals set to 1 in each demographic group.

We conducted sensitivity analyses to assess whether these assumptions on epidemic timing, and number and distribution of initial infected individuals, affected parameter and HIT estimates. Varying the timing of epidemic start did not substantially affect HIT estimates as long as the time between epidemic start and serosurvey ts was reasonably large (e.g., >20 days) and assortativity was low (ϵ§lt;0.8) (Figure 2—figure supplement 1). The distribution and number of initial infected individuals also did not substantially affect HIT estimates for low levels of assortativity (ϵ§lt;0.8) (Figure 2—figure supplements 2 and 3). We limited our analyses to models with ϵ less than 0.8.

We acquired total population numbers (i.e., Ni for i{0,,4}) from the 2018 ACS census 1-year estimates Table B03002 (Hispanic or Latino origin by race) subsetted to the following counties: Bronx, Kings, New York, Queens, and Richmond Counties for NYC and Nassau and Suffolk Counties for Long Island. We acquired population numbers at the level of ‘all block groups’ within the above counties from the 2018 ACS 5-year estimates Table B03002 (Hispanic or Latino origin by race). Copies of the census data we used are available at https://github.com/kevincma/covid19-race-ethnicity-model/tree/main/data.

Census-informed transmission model

Request a detailed protocol

The total number of contacts Cij between groups i and j can be calculated using the assortative mixing social contact matrix.

Cij=cijNiNj=(1-ϵ)aiajNiNjkakNk+ϵδijaiNj

To fit this assortative mixing model to both serosurvey and census data, we modeled interactions between racial and ethnic groups at the census block group level – which we interpreted to be roughly equivalent to neighborhoods – allowing the structure of the census data to inform the dynamics of transmission. Specifically, we assumed proportionate mixing between racial and ethnic groups in each census block group, with no interactions between block groups. Under this proportionate mixing within neighborhoods assumption, the total number of contacts Cij between groups i and j is proportional to:

CijlLaiajNi,lNj,l

where L is the number of census block groups, Nj,l is the number of people from demographic group j in census block l, and aj is the total contact rate per individual in group j as before. Within each neighborhood, proportionate mixing holds: the total number of contacts between two groups is proportional to the activity level and neighborhood population of the groups. Additionally, similarly to the previous models, the total number of contacts across all groups (ijCij) must equal kakNk; to satisfy this constraint, we set a proportionality constant:

Cij=kakNkijlaiajNi,lNj,llLaiajNi,lNj,l

To fit ϵ, we minimized the absolute difference between these two formulations (Cij and Cij) of the total number of contacts across all pairs of groups (Figure 2—figure supplement 4):

ϵ^=argminϵij|CijCij|=argminϵij|(1ϵ)aiajNiNjkakNk+ϵδijaiNjkakNkijlaiajNi,lNj,llLaiajNi,lNj,l|

Using the fitted value ϵ^, we then conducted maximum likelihood to fit the varying ai as described previously. We repeated this process iteratively – holding ai constant while ϵ was fit, and then vice versa – until convergence, which we defined as the difference between successive ϵ^ values being lower than a threshold of 0.001 (Supplementary file 3). For the first iteration, we fit ϵ^ holding ai constant at 1. We used this iterative fitting procedure to accommodate both the seroprevalence and census data because the single seroprevalence time point cannot fit both the activity levels and ϵ.

To empirically characterize average neighborhood composition from the census data, we also calculated the exposure index matrix P with elements Pij for demographic groups i and j, defined similarly to McCauley et al., 2001; Richardson et al., 2020

(14) Pij=lL(Nj,lNj)(Ni,lTl)

where Tl is the total number of people in census block group l and other variables are defined as before. The exposure indices were used for descriptive purposes only and not used in the model fitting approach.

Variable susceptibility versus variable exposure models

Request a detailed protocol

The serosurvey data were compatible with variable susceptibility models in which Hispanics or Latinos, non-Hispanic Black people, non-Hispanic Asians, and multiracial or other people had 2.25, 1.62, 0.86, and 1.28 times the susceptibility to infection relative to non-Hispanic whites in NYC, respectively, and 4.32, 1.96, 0.92, and 2.48 times the susceptibility to infection relative to non-Hispanic whites in Long Island, respectively. As with variable exposure models, these differences in susceptibility lowered herd immunity levels and final epidemic sizes relative to the homogeneous model (Figure 1—figure supplement 2), but to a lesser extent; for instance, variable susceptibility models resulted in HITs ~10% greater than HITs under proportionate mixing for Long Island.

The difference between these models is that incorporating heterogeneity in susceptibility only affects susceptible individuals, but heterogeneity in exposure impacts both susceptible and infectious individuals: individuals from racial and ethnic groups with higher contact rates are both more likely to be infected, and when infected, to infect a greater number of secondary cases. This contrast is clear when comparing the next-generation matrices for each model, which lists the average number of secondary infections caused by an infected individual from a given demographic group (Figure 2—figure supplements 5 and 6). The epidemic resolves at an earlier stage in variable exposure models once these key transmission groups become immune because of this additional compound effect on transmission.

Our results contrasting mechanistic variable exposure and susceptibility models are in line with theoretical studies, which also indicate that models incorporating heterogeneity in exposure have more pronounced effects on HITs than models incorporating heterogeneity in susceptibility, assuming comparable continuous distributions of exposure and susceptibility (Tkachenko et al., 2020; Gomes et al., 2020). Tkachenko et al. showed that HIT=1-(1/R0)(1/λ), where λ is either 1+CV2 for variable susceptibility models or 1+CV2(2+γsCV)/(1+CV2) for variable exposure models, and CV is the coefficient of variation and γs is the skewness for the exposure distribution (Tkachenko et al., 2020). We calculated CV and skewness using the susceptibility and total contact rate ratios and substituted those values into the HIT formula, which is an approximation because our exposure and susceptibility distributions are discrete; nonetheless, the approximations result in similar HIT curves to the simulation results (Figure 1—figure supplement 3).

Code availability

Request a detailed protocol

Code and data to reproduce all analyses and figures are available at https://github.com/kevincma/covid19-race-ethnicity-model Ma et al., 2021, copy archived at swh:1:rev:75574621317a599e9058236f62bb34de63120e99. An executable version of the Jupyter notebook is available at https://mybinder.org/v2/gh/kevincma/covid19-race-ethnicity-model/HEAD.

Results

We model the dynamics of COVID-19 infection allowing for social exposure to infection to vary across racial and ethnic groups. Models incorporating variable susceptibility to COVID-19 are commonly used when stratifying by age because children are thought to have decreased susceptibility to infection (Davies et al., 2020). Variable susceptibility to infection across racial and ethnic groups has been less well characterized, and observed disparities in infection rates can already be largely explained by differences in mobility and exposure (Chang et al., 2021; Zelner et al., 2020; Kissler et al., 2020), likely attributable to social factors such as structural racism that have put racial and ethnic minorities in disadvantaged positions (e.g., employment as frontline workers and residence in overcrowded, multigenerational homes) (Henry Akintobi et al., 2020; Thakur et al., 2020; Tai et al., 2021; Khazanchi et al., 2020). In line with the notion that variation in exposure could instead be the main driver of observed seroprevalence differences, our primary focus is on analyzing variable exposure models; we have also analyzed variable susceptibility models for comparison (see Materials and methods section).

The simplest variable exposure models assume proportionate mixing, where the contact rate between groups is set to be proportional to the total contact rates (i.e., total number of contacts per time period per individual) of the two groups (Hethcote, 1996). We fit proportionate mixing models allowing for variable contact rates across racial and ethnic demographic groups to serosurvey data collected in late April from NYC and Long Island, comprising 5946 and 2074 adults, respectively (Rosenberg et al., 2020). The serosurvey data were compatible with proportionate mixing models in which Hispanics or Latinos, non-Hispanic Black people, non-Hispanic Asians, and multiracial or other people had 2.25, 1.62, 0.86, and 1.28 times the total contact rates relative to non-Hispanic whites in NYC. Model fits to Long Island resulted in even more pronounced exposure differences because of greater between-group differences in seropositivity (e.g., the seropositivity in Hispanics or Latinos relative to non-Hispanic whites was 1.85 times higher in Long Island than in NYC). Under proportionate mixing, Hispanics or Latinos, non-Hispanic Black people, non-Hispanic Asians, and multiracial or other people had 4.31, 1.96, 0.92, and 2.48 times the fitted total contact rates relative to non-Hispanic whites in Long Island, respectively. These differences in exposure impacted herd immunity levels and final epidemic sizes relative to the homogeneous model across a range of R0 values (Figure 1 and Figure 1—figure supplement 4); for example, for an R0 of 3, the HIT decreases to 58% in NYC and 40% in Long Island compared to 67% under the homogeneous model. The observed contrast in HITs and final sizes between the proportionate mixing and the homogenous model is in line with theoretical derivations (Figure 1—figure supplement 3). The HIT overall is reached in this model after cumulative incidence has disproportionately increased in certain minority groups: at the HIT, 75% of Hispanics or Latinos and 63% of non-Hispanic Black people were infected compared to 46% of non-Hispanic whites in NYC, and 77% of Hispanics or Latinos and 48% of non-Hispanic Black people were infected compared to 29% of non-Hispanic whites in Long Island (Figure 2).

Figure 1 with 7 supplements see all
Incorporating assortativity in variable exposure models results in increased herd immunity thresholds across a range of R0 values.

Variable exposure models were fitted to New York City and Long Island serosurvey data.

Figure 2 with 8 supplements see all
Cumulative incidence is disproportionately higher in some racial and ethnic minorities when the overall herd immunity threshold (HIT) is reached across model types and locations.

Results are shown for an epidemic with R0 = 3. The HIT for the population is indicated with a black line, and the HIT for a homogeneous model with the same R0 is indicated with a gray line.

The estimated total contact rate ratios indicate increased contacts for minority groups such as Hispanics or Latinos and non-Hispanic Black people, which is in line with studies using cell phone mobility data (Chang et al., 2021); however, the magnitudes of the ratios are substantially higher than one would expect given the findings from those studies. This may reflect some of the limitations of the proportionate mixing assumption, which does not allow for preferential within-group contacts and hence must fit observed seropositivity differences solely by scaling total contact rates. To address this, we augment the model by partitioning a specified fraction ϵ of contacts to be exclusively within-group, with the remaining contacts distributed proportionately. This assortative mixing model captures more realistic patterns of interactions due to neighborhood structure. After fitting the models across a range of ϵ values, we observed that as ϵ increases, HITs and epidemic final sizes shifted higher back towards the homogeneous case (Figure 1, Figure 1—figure supplement 4); this effect was less pronounced for R0 values close to 1. This observation can be understood by comparing the epidemic cumulative incidence trajectories (Figure 3—figure supplement 1) and next-generation matrices (Figure 2—figure supplement 5 and 6): under proportionate mixing (ϵ=0), lower-risk demographic groups are protected from further infection due to built-up immunity in higher-risk demographic groups, but the magnitude of this indirect protection decreases as the proportion of exclusively within-group contacts increases and groups become more isolated.

We assessed a range of values for ϵ because the serosurvey data cannot be used to also fit the optimal ϵ value; given limited numbers of data points, any value of ϵ can fit exactly to the single seroprevalence time point we consider. To inform plausible assortativity levels, we instead used additional data on demographic population distributions from the American Community Survey US census stratified at the census block group level, which represents a small geographic area and population (Figure 2—figure supplement 7). We first calculated the exposure index, which represents the average neighborhood’s demographic composition from the perspective of an individual from a given racial or ethnic group; the proportion of contacts within group were elevated, suggesting assortativity in the census data (Supplementary file 1). To directly fit ϵ with these data, we assumed proportionate mixing within census block groups and ran an iterative fitting approach to jointly fit ϵ and the ai total contact rates using both census and serosurvey data (see Materials and methods and Supplementary file 3). This approach accounts for contacts based on geographic proximity using strong assumptions on mixing patterns, but may not capture contacts in other settings, such as work, beyond one’s immediate neighborhood of residence. The models jointly fitted to serosurvey and census data indicated that 46 and 39% of contacts were exclusively within-group in NYC and Long Island, respectively. The same groups had elevated total contact rates as under proportionate mixing, but the magnitudes of differences were now lower and more concordant with reported mobility differences (Chang et al., 2021): model estimates indicated Hispanics or Latinos, non-Hispanic Black people, non-Hispanic Asians, and multiracial or other people had 1.62, 1.35, 0.90, and 1.17 times the total contact rate relative to non-Hispanic whites in NYC, respectively, and 2.60, 1.63, 0.93, and 1.90 times the total contact rate relative to non-Hispanic whites in Long Island, respectively. For an epidemic with R0=3, the HITs for NYC and Long Island using these census-informed assortativity models were 61 and 45%, respectively. Similar to previous models, at the HIT, 76% of Hispanics or Latinos and 66% of non-Hispanic Black people were infected compared to 50% of non-Hispanic whites in NYC, and 81% of Hispanics or Latinos and 56% of non-Hispanic Black people were infected compared to 34% of non-Hispanic whites in Long Island (Figure 2).

Using these census-informed assortative mixing models, we then considered how the relative incidence rates of infection in demographic groups could change over the course of the epidemic. Early comparisons of infection and mortality rates have helped to identify racial and ethnic groups at high risk and the risk factors for infection (Chamie et al., 2020; Moore et al., 2020; Millett et al., 2020a; Pan et al., 2020; Chen and Krieger, 2020; Bassett et al., 2020; Hanage et al., 2020), but these studies often rely on cross-sectional snapshots of epidemiological patterns. The challenge is that these metrics can change over time: for instance, multiple studies indicate decreasing disparities in incidence rate over time across racial and ethnic groups relative to non-Hispanic whites (Krieger et al., 2020a; Van Dyke et al., 2021). The reasons for these changes are multifaceted, but even independent of the effect of interventions, behavioral changes, or differential access to testing, models of epidemic spread in structured populations imply that incidence rate ratios for high-risk groups can decrease substantially as the epidemic progresses because of depletion of susceptible individuals from these groups (Goldstein et al., 2017; Koopman et al., 1991). In line with this, we observe that instantaneous incidence rate ratios are elevated initially in high-contact groups relative to non-Hispanic whites, but this trend reverses after the epidemic has peaked and overall incidence is decreasing – a consequence of the fact that a majority of individuals have already become infected (Figure 3). Similarly, cumulative incidence ratios remain elevated in high-contact racial and ethnic groups throughout the epidemic, but the magnitude decreases as the epidemic progresses (Figure 3—figure supplement 2). Although these trajectories are not meant to be taken as predictive estimates, these results highlight the importance of controlling for dynamics-induced changes in epidemiological measures of disease burden when evaluating the impact of interventions for reducing inequities in SARS-CoV-2 infections (Kahn et al., 2020); otherwise, ineffective interventions – depending on their timing – might still be associated with declines in relative measures of disease burden.

Figure 3 with 3 supplements see all
Dynamics of incidence rate ratios relative to non-Hispanic whites in assortative mixing models fitted to census and serosurvey data.

Dashed line represents the peak overall incidence for the epidemic.

Finally, we assessed how robust these findings were to the impact of social distancing and other NPIs. We modeled these mitigation measures by scaling the transmission rate by a factor α beginning when 5% cumulative incidence in the population was reached. Setting the duration of distancing to be 50 days and allowing α to be either 0.3 or 0.6 (i.e., a 70% or 40% reduction in transmission rates, respectively), we assessed how the R0 versus HIT and final epidemic size relationships changed. We found that the R0 versus HIT relationships were similar comparing a mitigated to an unmitigated epidemic (Figure 1—figure supplement 5). In contrast, final epidemic sizes depended on the intensity of mitigation measures, though qualitative trends across models (e.g., increased assortativity leads to greater final sizes) remained true (Figure 1—figure supplement 6). To explore this further, we systematically varied α and the duration of NPIs while holding R0 constant at 3. We found again that the HIT was consistent, whereas final epidemic sizes were substantially affected by the choice of mitigation parameters (Figure 1—figure supplement 7); the distribution of cumulative incidence at the point of HIT was also comparable with and without mitigation measures (Figure 2—figure supplement 8). The most stringent NPI intensities did not necessarily lead to the smallest epidemic final sizes, an idea which has been explored in studies analyzing optimal control measures (Neuwirth et al., 2020; Handel et al., 2007). Longitudinal changes in incidence rate ratios also were affected by NPIs, but qualitative trends in the ordering of racial and ethnic groups over time remained consistent (Figure 3—figure supplement 3).

Discussion

Here, we explored how incorporating heterogeneity in SARS-CoV-2 spread across racial and ethnic groups could affect epidemic dynamics using deterministic transmission models. Models incorporating variable exposure generally decreased the HIT and final epidemic size, but incorporating preferential within-group contacts shifted HITs and final epidemic sizes higher, approaching the homogeneous case. Epidemiological measures of disease burden such as incidence rate ratios and cumulative incidence ratios also changed substantially over the course of the epidemic, highlighting the need to account for these trends when evaluating interventions (Kahn et al., 2020). These results illustrate the varied effects of different structured heterogeneity models, but are not meant to be best estimates given the limited seroprevalence data.

Across all model variants, the observed higher cumulative incidence among Hispanics or Latinos and non-Hispanic Black people compared to non-Hispanic whites led to estimates of higher estimated contacts relative to non-Hispanic whites, mirroring existing inequities in housing, education, healthcare, and beyond (Khazanchi et al., 2020; Millett et al., 2020a; Millett et al., 2020b; Boyd et al., 2020; Bailey et al., 2021). The estimated contact rate differences also concord with reports that frontline workers, who are unable to engage in physical or social distancing to the same degree as other types of workers, are disproportionately from minority backgrounds (Blau et al., 2020; Chang et al., 2021; Kissler et al., 2020). The assortativity we observed in the census data has root causes in many areas, including residential segregation arising from a long history of discriminatory practices (Yang et al., 2020; Millett et al., 2020b; Bailey et al., 2021; Benfer et al., 2021). Projecting the epidemic forward indicated that the overall HIT was reached after cumulative incidence had increased disproportionately in minority groups, highlighting the fundamentally inequitable outcome of achieving herd immunity through infection. All of these factors underscore the fact that incorporating heterogeneity in models in a mechanism-free manner can conceal the disparities that underlie changes in epidemic final sizes and HITs. In particular, overall lower HIT and final sizes occur because certain groups suffer not only more infection than average, but more infection than under a homogeneous mixing model; incorporating heterogeneity lowers the HIT but increases it for the highest-risk groups (Figure 2).

These results also suggest that public health interventions for reducing COVID-19 inequities can have synergistic effects for controlling the overall epidemic (Richardson et al., 2020). For instance, from a transmission-control perspective, age-structured models indicate that vaccination of high-contact age groups – such as young adults – is optimal for controlling the spread of SARS-CoV-2 (Bubar et al., 2021). Similar interventions are being explored for disadvantaged populations because of how increased exposure underlies much of the higher infection risk that racial and ethnic minorities experience. For instance, Mulberry et al. used a structured SEIR model to evaluate the impact of preferentially vaccinating essential workers, finding that such a strategy was more effective in reducing morbidity, mortality, and economic cost due to COVID-19 compared to age-only vaccination prioritization strategies (Mulberry et al., 2021). Because racial and ethnic minorities are overrepresented among essential workers (Blau et al., 2020; Chang et al., 2021; Kissler et al., 2020), further modeling studies could evaluate the impact of such strategies on also reducing inequities in COVID-19 infection rates. Wrigley-Field et al. took a different methodological approach, projecting demographically stratified death rates from 2020 into 2021 assuming various vaccination strategies (Wrigley-Field et al., 2021). In line with the prior study, they found that vaccination strategies that prioritized geographic areas based on socioeconomic criteria or prior COVID mortality rates outperformed age-based strategies in reducing overall mortality and inequities in mortality. Because populations here were considered independently, these results could be extended using a transmission modeling framework, such as the one we have described, that accounts for interactions between racial and ethnic groups and thus allows for synergistic benefits from vaccinating high-risk populations. All policy proposals in this space should, of course, carefully consider the legal and ethical dimensions of vaccinations or other interventions targeted by race or ethnicity (Schmidt et al., 2020).

We note several limitations with this study. First, biases in the serosurvey sampling process can substantially affect downstream results; any conclusions drawn depend heavily on the degree to which serosurvey design and post-survey adjustments yield representative samples (Clapham et al., 2020). For instance, because the serosurvey we relied on primarily sampled people at grocery stores, there is both survival bias (cumulative incidence estimates do not account for people who have died) and ascertainment bias (undersampling of at-risk populations that are more likely to self-isolate, such as the elderly) (Rosenberg et al., 2020; Accorsi et al., 2021). These biases could affect model estimates if, for instance, the capacity to self-isolate varies by race or ethnicity – as suggested by associations of neighborhood-level mobility versus demographics (Kishore et al., 2020a; Kissler et al., 2020) – leading to an overestimate of cumulative incidence and contact rates in whites. Other sources of uncertainty, such as antibody test sensitivity and specificity, could also be incorporated into transmission models in future work (Larremore et al., 2020; Accorsi et al., 2021). Second, we have assumed that seropositivity implies complete immunity and that immunity does not wane. These are strong assumptions that can be revisited as empirical studies on the length of natural immunity are conducted. Third, we have assumed the impact of NPIs such as stay-at-home policies, closures, and the like to equally affect racial and ethnic groups. Empirical evidence suggests that during periods of lockdown certain neighborhoods that are disproportionately wealthy and white tend to show greater declines in mobility than others (Kishore et al., 2020a; Kissler et al., 2020). These simplifying assumptions were made to aid in illustrating the key findings of this model, but for more detailed predictive models, the extent to which contact rate differences change could be evaluated using longitudinal contact survey data (Feehan and Mahmud, 2020) since granular mobility data are typically not stratified by race and ethnicity due to privacy concerns (Kishore et al., 2020b). Fourth, due to data availability, we have only considered variability in exposure due to one demographic characteristic; models should ideally strive to also account for the effects of age on susceptibility and exposure within strata of race and ethnicity and other relevant demographics, such as socioeconomic status and occupation (Mulberry et al., 2021). These models could be fit using representative serological studies with detailed cross-tabulated seropositivity estimates. Finally, we have estimated model parameters using a single cross-sectional serosurvey. To improve estimates and the ability to distinguish between model structures, future studies should use longitudinal serosurveys or case data stratified by race and ethnicity and corrected for underreporting; the challenge will be ensuring that such data are systematically collected and made publicly available, which has been a persistent barrier to research efforts (Krieger et al., 2020b). Addressing these data barriers will also be key for translating these and similar models into actionable policy proposals on vaccine distribution and NPIs.

In summary, we have explored how deterministic transmission models can be extended to study the dynamics of infection in racial and ethnic groups, and how the impact of heterogeneity on the HIT and final epidemic size depends strongly on the details of how heterogeneity is modeled. We have shown that due to early infections in individuals from the most at-risk group, relative measures of incidence may decline and even reverse, but inequities in the cumulative burden of infection persist throughout the epidemic as the HIT is reached. These results describe a framework that can be extended to other cities and countries in which racial and ethnic disparities in seropositivity have been observed (Hallal et al., 2020; Flannery et al., 2020; Chan et al., 2021) and are a step towards using transmission models to design policy interventions for reducing disparities in COVID-19 and other diseases.

Data availability

All data generated or analysed during this study are included in the manuscript and supporting files.

References

    1. Blau FD
    2. Koebe J
    3. Meyerhofer PA
    (2020)
    Who are the essential and frontline workers?
    National Bureau of Economic Research 10:w27791.
  1. Book
    1. Hethcote HW
    (1996) Modeling heterogeneous mixing in infectious disease dynamics
    In: Isham V, Medley G, editors. Models for Infectious Human Diseases: Their Structure and Relation to Data. Cambridge University Press. pp. 215–238.
    https://doi.org/10.1017/CBO9780511662935.030
  2. Book
    1. Krieger N
    2. Testa C
    3. Chen JT
    4. Waterman PD
    5. Hanage WP
    (2020a)
    A Warning against Using Static US County-Level Community Data to Guide Equity in COVID-19 Vaccine Distribution: Temporal and Spatial Correlations of Community Characteristics with COVID-19 Cases and Deaths Vary Enormously and Are Increasingly Uninformative
    HCPDS Working Paper.
  3. Book
    1. Wallinga J
    2. Kassteele J
    3. Hens N
    (2019)
    Contact patterns for contagious diseases
    In: Held L, Hens N, O’Neill P, Wallinga J, editors. Handbook of Infectious Disease Data Analysis (1 edn). Chapman and Hall/CRC. pp. 93–110.

Decision letter

  1. Joshua T Schiffer
    Reviewing Editor; Fred Hutchinson Cancer Research Center, United States
  2. Miles P Davenport
    Senior Editor; University of New South Wales, Australia
  3. Joshua T Schiffer
    Reviewer; Fred Hutchinson Cancer Research Center, United States
  4. Jon Zelner
    Reviewer; UMICH
  5. Mia Moore
    Reviewer; Fred Hutch, United States

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Acceptance summary:

Thank you for submitting and revising this paper which makes the important point that the herd immunity threshold, as well as the pattern of incidence over time, is likely to be dictated by the degree of assortative mixing between racial and ethnic groups during the SARS-CoV-2 pandemic. The paper's observations can hopefully be used to optimize non-pharmaceutical interventions and vaccine implementation in high-risk ethnic and racial groups.

Decision letter after peer review:

Thank you for submitting your article "Modeling the impact of racial and ethnic disparities on COVID-19 epidemic dynamics" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Joshua T Schiffer as the Reviewing Editor and Reviewer #1, by Miles Davenport as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Jon Zelner (Reviewer #2); Mia Moore (Reviewer #3).

The reviewers have discussed their reviews with one another in detail, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) Please consider broadening the range of R0 considered in the analysis, with inclusion of values closer to one. It would also be useful to allow R0 (or rather R_effective) to take on varying levels over time to reflect observed shifts in physical distancing and NPIs during the pandemic. All 3 reviewers agreed that it would be interesting to see whether these changes, which would more closely approximate reality, would alter the paper's conclusions in anyway.

We do not feel that it is required to precisely recapitulate the dynamics of the NYC and/or Long Island epidemics, but rather to attempt scenarios other than the unmitigated simulated epidemic in the original version.

2) Similarly, another way to strengthen the paper would be to show what happens when "herd immunity" is achieved via vaccination rather than infection. The interaction with the mixing models would be a fascinating and useful contribution.

Reviewer #1 (Recommendations for the authors):

Strengths:

1) The model structure is appropriate for the scientific question.

2) The paper addresses a critical feature of SARS-CoV-2 epidemiology which is its much higher prevalence in Hispanic or Latino and Black populations. In this sense, the paper has the potential to serve as a tool to enhance social justice.

3) Generally speaking, the analysis supports the conclusions.

Other considerations:

1) The clean distinction between susceptibility and exposure models described in the paper is conceptually useful but is unlikely to capture reality. Rather, susceptibility to infection is likely to vary more by age whereas exposure is more likely to vary by ethnic group / race. While age cohort are not explicitly distinguished in the model, the authors would do well to at least vary susceptibility across ethnic groups according to different age cohort structure within these groups. This would allow a more precise estimate of the true effect of variability in exposures. Alternatively, this could be mentioned as a limitation of the current model.

2) I appreciated that the authors maintained an agnostic stance on the actual value of HIT (across the population and within ethnic groups) based on the results of their model. If there was available data, then it might be possible to arrive at a slightly more precise estimate by fitting the model to serial incidence data (particularly sorted by ethnic group) over time in NYC and Long Island. First, this would give some sense of R_effective. Second, if successive waves were modeled, then the shift in relative incidence and CI among these groups that is predicted in Figure 3 and Sup Figure 8 may be observed in the actual data (this fits anecdotally with what I have seen in several states). Third, it may (or may not) be possible to estimate values of critical model parameters such as epsilon. It would be helpful to mention this as possible future work with the model.

Caveats about the impossibility of truly measuring HIT would still apply (due to new variants, shifting use and effective of NPIs, etc….). However, as is, the estimates of possible values for HIT are so wide as to make the underlying data used to train the model almost irrelevant. This makes the potential to leverage the model for policy decisions more limited.

3) I think the range of R0 in the figures should be extended to go as low as 1. Much of the pandemic in the US has been defined by local Re that varies between 0.8 and 1.2 (likely based on shifts in the degree of social distancing). I therefore think lower HIT thresholds should be considered and it would be nice to know how the extent of assortative mixing effects estimates at these lower R_e values.

4) line 274: I feel like this point needs to be considered in much more detail, either with a thoughtful discussion or with even with some simple additions to the model. How should these results make policy makers consider race and ethnicity when thinking about the key issues in the field right now such as vaccine allocation, masking, and new variants. I think to achieve the maximal impact, the authors should be very specific about how model results could impact policy making, and how we might lower the tragic discrepancies associated with COVID. If the model / data is insufficient for this purpose at this stage, then what type of data could be gathered that would allow more precise and targeted policy interventions?

Reviewer #2 (Recommendations for the authors):

Overall I think this is a solid and interesting piece that is an important contribution to the literature on COVID-19 disparities, even if it does have some limitations. To this point, most models of SARS-CoV-2 have not included the impact of residential and occupational segregation on differential group-specific covid outcomes. So, the authors are to commended on their rigorous and useful contribution on this valuable topic. I have a few specific questions and concerns, outlined below:

1. Does the reliance on serosurvey data collected in public places imply a potential issue with left-censoring, i.e. by not capturing individuals who had died? Can the authors address how survival bias might impact their results? I imagine this could bring the seroprevalence among older people down in a way that could bias their transmission rate estimates.

2. It might be helpful to think in terms of disparities in HITs as well as disparities in contact rates, since the HIT of whites is necessarily dependent on that of Blacks. I'm not really disagreeing with the thrust of what their analysis suggests or even the factual interpretation of it. But I do think it is important to phrase some of the conclusions of the model in ways that are more directly relevant to health equity, i.e. how much infection/vaccination coverage does each group need for members of that group to benefit from indirect protection?

3. The authors rely on a modified interaction index parameterized directly from their data. It would be helpful if they could explain why they did not rely on any sources of mobility data. Are these just not broken down along the type of race/ethnicity categories that would be necessary to complete this analysis? Integrating some sort of external information on mobility would definitely strengthen the analysis.

Reviewer #3 (Recommendations for the authors):

The authors show that in an uncontrolled epidemic the herd-immunity threshold may differ dramatically between racial groups. Although this question is undeniably important and the authors have shown that their results are robust to differing assumptions about population mixing, it seems to me that the situation considered is not completely relevant to current state of the covid epidemic. The herd-immunity threshold is assumed to be reached via a single, unmitigated wave within a few months. Unfortunately, the details of how this threshold is calculated are missing from the manuscript, but I can't help but imagine that it would be quite different if it was assumed to be possible to reach herd-immunity via vaccination. I think the authors need to address this.

Figure 2 could potentially include the cumulative incidence associated with R0=3 in a homogeneous model.

Multi-panel figures are unlabeled and split over more than one page, separated from their captions.

Issue with assortative mixing assumption: When you assume a = 1, you assume that everyone makes the same number of contacts. How, then, is one group defined to be at higher risk than another. Is a really 1 in this case?

Issue with assortative mixing derivation: I have a minor concern about the derivation, specifically how the contact matrix is fit. When I try to follow your steps and fill-in the gaps I get a slightly different result. So please either clarify your reasoning or fix the result.

Total i seen by a single j in one day = c_ij*Ni

Total i seen by all j in one day = c_ij*Ni*Nj=total j seen by all i in one day

Using the P_ij formulation (and please drop the P_ab notation, it is not consistently used and P_ij makes it easier to follow)

Total j seen by a single i in one day = P_ij % of people seen by group i = P_ij*ai=P_ij (as a=1)

Total j seen by all i in one day = P_ij*Ni

Therefore you want to minimize |c_ij*Ni*Nj-P_ij*Ni|=|(1-e)N_j*Ni/sum(N_k)+e*Ni*d_ij – Ni*Pij|

= |Ni((1-e)Nj/sum(N_k) + e*d_ij – P_ij)|

Which is only different from what you have in that Ni and Nj are switched in one term.

With a not equal to 1 (which you appear to use?) should be something like:

= |Ni((1-e)a_i a_j Nj/sum(N_k) + e*d_ij – a_i a_j sum_k(N_ik N_jk/sum(a_i N_lk)))|

Where the P_ij term is now weighted by a.

Calculation of HIT not described: I also feel like I'm missing something about how HIT is derived from the simulations. It sounds as though you used a computation rather than analytic approach, but I cannot find where it is spelled out. That seems important.

Fit to sero-prevalence data not shown or described: You initially list fitted activity levels, can you provide error bounds for these values? What is the nature of the data that is being fit? Is it only a single proportion collected at a single time? Can you show model fits? Unclear what was done here?

You also say "differences in exposure…are in line with theoretical derivations". I assume that what you meant is that differences in herd immunity levels and final epidemic sizes are in line with theoretical derivations, but the sentence doesn't read that way.

Please put activity levels in a table: These values (and uncertainty ranges) should be in a table where it is easy to compare between the models.

Fit of assortative mixing model not shown or described: In the results you say you first choose epsilon, then you fit a afterwards. However the epsilon fit is performed assuming that a is zero. Can you explain the reasoning here? I'm concerned that this methodology underestimates mixing (because if contact is heterogeneous by group, then you are more likely to contact individuals from the higher risk group than is strictly predicted by geographic distribution).

Incidence rate ratio plot is misleading: The incidence does not follow this pattern in proportionate mixing, and incidence is dropping in all groups. Furthermore this plot shows a time series which is completely unrealistic and wildly different from data. Most importantly, it seems to be wholly irrelevant to the main point of the paper.

https://doi.org/10.7554/eLife.66601.sa1

Author response

1) Please consider broadening the range of R0 considered in the analysis, with inclusion of values closer to one.

We agree this would be of interest and have extended the range of R0 values. Figure 1 has been updated accordingly; we also updated the text with new findings: “After fitting the models across a range of ε values, we observed that as ε increases, HITs and epidemic final sizes shifted higher back towards the homogeneous case (Figure 1 and Figure 1—figure supplement 4); this effect was less pronounced for R0 values close to 1.”

It would also be useful to allow R0 (or rather R_effective) to take on varying levels over time to reflect observed shifts in physical distancing and NPIs during the pandemic. All 3 reviewers agreed that it would be interesting to see whether these changes, which would more closely approximate reality, would alter the paper's conclusions in anyway.

We do not feel that it is required to precisely recapitulate the dynamics of the NYC and/or Long Island epidemics, but rather to attempt scenarios other than the unmitigated simulated epidemic in the original version.

We have conducted additional analyses exploring the important suggestion by the reviewers that social distancing could affect these conclusions. The text and figures have been updated accordingly:

“Finally, we assessed how robust these findings were to the impact of social distancing and other non-pharmaceutical interventions (NPIs). We modeled these mitigation measures by scaling the transmission rate by a factor α beginning when 5% cumulative incidence in the population was reached. Setting the duration of distancing to be 50 days and allowing α to be either 0.3 or 0.6 (i.e. a 70% or 40% reduction in transmission rates, respectively), we assessed how the R0 versus HIT and final epidemic size relationships changed. We found that the R0versus HIT relationship was similar to in the unmitigated epidemic (Figure 1—figure supplement 5). In contrast, final epidemic sizes depended on the intensity of mitigation measures, though qualitative trends across models (e.g. increased assortativity leads to greater final sizes) remained true (Figure 1—figure supplement 6). To explore this further, we systematically varied α and the duration of NPIs while holding R0 constant at 3. We found again that the HIT was consistent, whereas final epidemic sizes were substantially affected by the choice of mitigation parameters (Figure 1—figure supplement 7); the distribution of cumulative incidence at the point of HIT was also comparable with and without mitigation measures (Figure 2—figure supplement 8). The most stringent NPI intensities did not necessarily lead to the smallest epidemic final sizes, an idea which has been explored in studies analyzing optimal control measures (Neuwirth et al., 2020; Handel et al., 2007). Longitudinal changes in incidence rate ratios also were affected by NPIs, but qualitative trends in the ordering of racial and ethnic groups over time remained consistent (Figure 3—figure supplement 3).

2) Similarly, another way to strengthen the paper would be to show what happens when "herd immunity" is achieved via vaccination rather than infection. The interaction with the mixing models would be a fascinating and useful contribution.

We appreciate the reviewers interest in how the models we have presented could be extended to incorporate vaccination. Ultimately, we felt it would be irresponsible to provide preliminary estimates that could be misinterpreted as policy suggestions using a model that was not originally designed for such a nuanced question, especially because explicitly race-based approaches are legally unclear and likely controversial (Schmidt et al., 2020). Other studies have instead addressed this question extensively using proxies for race and ethnicity -- such as geography and essential worker status -- and we have expanded the Discussion with references to this emerging and important body of work.

“Similar interventions are being explored for disadvantaged populations because of how increased exposure underlies much of the higher infection risk that racial and ethnic minorities experience. For instance, Mulberry et al. used a structured SEIR model to evaluate the impact of preferentially vaccinating essential workers, finding that such a strategy was more effective in reducing morbidity, mortality, and economic cost due to COVID-19 compared to age-only vaccination prioritization strategies (Mulberry et al., 2021). Because racial and ethnic minorities are overrepresented among essential workers (Blau et al., 2020; Chang et al., 2020; Kissler et al., 2020), further modeling studies could evaluate the impact of such strategies on also reducing inequities in COVID-19 infection rates. Wrigley-Field et al. took a different methodological approach, projecting demographically-stratified death rates from 2020 into 2021 assuming various vaccination strategies (Wrigley-Field et al., 2021). In line with the prior study, they found that vaccination strategies that prioritized geographic areas based on socioeconomic criteria or prior COVID mortality rates outperformed age-based strategies in reducing overall mortality and inequities in mortality. Because populations here were considered independently, these results could be extended using a transmission modeling framework, such as the one we have described, that accounts for interactions between racial and ethnic groups and thus allows for synergistic benefits from vaccinating high-risk populations. All policy proposals in this space should, of course, carefully consider the legal and ethical dimensions of vaccinations or other interventions targeted by race or ethnicity (Schmidt et al., 2020).”

Reviewer #1 (Recommendations for the authors):

Strengths:

1) The model structure is appropriate for the scientific question.

2) The paper addresses a critical feature of SARS-CoV-2 epidemiology which is its much higher prevalence in Hispanic or Latino and Black populations. In this sense, the paper has the potential to serve as a tool to enhance social justice.

3) Generally speaking, the analysis supports the conclusions.

Other considerations:

1) The clean distinction between susceptibility and exposure models described in the paper is conceptually useful but is unlikely to capture reality. Rather, susceptibility to infection is likely to vary more by age whereas exposure is more likely to vary by ethnic group / race. While age cohort are not explicitly distinguished in the model, the authors would do well to at least vary susceptibility across ethnic groups according to different age cohort structure within these groups. This would allow a more precise estimate of the true effect of variability in exposures. Alternatively, this could be mentioned as a limitation of the current model.

We agree that this would be an important extension for future work and have indicated this in the Discussion, along with the types of data necessary to fit such models:

“Fourth, due to data availability, we have only considered variability in exposure due to one demographic characteristic; models should ideally strive to also account for the effects of age on susceptibility and exposure within strata of race and ethnicity and other relevant demographics, such as socioeconomic status and occupation (Mulberry et al., 2021). These models could be fit using representative serological studies with detailed cross-tabulated seropositivity estimates.”

2) I appreciated that the authors maintained an agnostic stance on the actual value of HIT (across the population and within ethnic groups) based on the results of their model. If there was available data, then it might be possible to arrive at a slightly more precise estimate by fitting the model to serial incidence data (particularly sorted by ethnic group) over time in NYC and Long Island. First, this would give some sense of R_effective. Second, if successive waves were modeled, then the shift in relative incidence and CI among these groups that is predicted in Figure 3 and Sup Figure 8 may be observed in the actual data (this fits anecdotally with what I have seen in several states). Third, it may (or may not) be possible to estimate values of critical model parameters such as epsilon. It would be helpful to mention this as possible future work with the model.

Caveats about the impossibility of truly measuring HIT would still apply (due to new variants, shifting use and effective of NPIs, etc….). However, as is, the estimates of possible values for HIT are so wide as to make the underlying data used to train the model almost irrelevant. This makes the potential to leverage the model for policy decisions more limited.

We have highlighted this important limitation in the Discussion:

“Finally, we have estimated model parameters using a single cross-sectional serosurvey. To improve estimates and the ability to distinguish between model structures, future studies should use longitudinal serosurveys or case data stratified by race and ethnicity and corrected for underreporting; the challenge will be ensuring that such data are systematically collected and made publicly available, which has been a persistent barrier to research efforts (Krieger et al., 2020). Addressing these data barriers will also be key for translating these and similar models into actionable policy proposals on vaccine distribution and non-pharmaceutical interventions.”

3) I think the range of R0 in the figures should be extended to go as as low as 1. Much of the pandemic in the US has been defined by local Re that varies between 0.8 and 1.2 (likely based on shifts in the degree of social distancing). I therefore think lower HIT thresholds should be considered and it would be nice to know how the extent of assortative mixing effects estimates at these lower R_e values.

Thanks for this suggestion; we agree -- see Essential Revisions point 1 for our response.

4) line 274: I feel like this point needs to be considered in much more detail, either with a thoughtful discussion or with even with some simple additions to the model. How should these results make policy makers consider race and ethnicity when thinking about the key issues in the field right now such as vaccine allocation, masking, and new variants. I think to achieve the maximal impact, the authors should be very specific about how model results could impact policy making, and how we might lower the tragic discrepancies associated with COVID. If the model / data is insufficient for this purpose at this stage, then what type of data could be gathered that would allow more precise and targeted policy interventions?

We agree with these points -- see Essential Revisions point 2 for our response, and points 1 and 2 above for examples of additional data required.

Reviewer #2 (Recommendations for the authors):

Overall I think this is a solid and interesting piece that is an important contribution to the literature on COVID-19 disparities, even if it does have some limitations. To this point, most models of SARS-CoV-2 have not included the impact of residential and occupational segregation on differential group-specific covid outcomes. So, the authors are to commended on their rigorous and useful contribution on this valuable topic. I have a few specific questions and concerns, outlined below:

We thank the reviewer for the supportive comments.

1. Does the reliance on serosurvey data collected in public places imply a potential issue with left-censoring, i.e. by not capturing individuals who had died? Can the authors address how survival bias might impact their results? I imagine this could bring the seroprevalence among older people down in a way that could bias their transmission rate estimates.

We have included this important point in the limitations section on potential serosurvey biases: “First, biases in the serosurvey sampling process can substantially affect downstream results; any conclusions drawn depend heavily on the degree to which serosurvey design and post-survey adjustments yield representative samples (Clapham et al., 2020). For instance, because the serosurvey we relied on primarily sampled people at grocery stores, there is both survival bias (cumulative incidence estimates do not account for people who have died) and ascertainment bias (undersampling of at-risk populations that are more likely to self-isolate, such as the elderly; Rosenberg et al., 2020; Accorsi et al., 2021). These biases could affect model estimates if, for instance, the capacity to self-isolate varies by race or ethnicity -- as suggested by associations of neighborhood-level mobility versus demographics (Kishore et al., 2020; Kissler et al., 2020) -- leading to an overestimate of cumulative incidence and contact rates in whites.”

2. It might be helpful to think in terms of disparities in HITs as well as disparities in contact rates, since the HIT of whites is necessarily dependent on that of Blacks. I'm not really disagreeing with the thrust of what their analysis suggests or even the factual interpretation of it. But I do think it is important to phrase some of the conclusions of the model in ways that are more directly relevant to health equity, i.e. how much infection/vaccination coverage does each group need for members of that group to benefit from indirect protection?

We agree with this important point and indeed this was the goal, in part, of the analyses in Figure 2. We have added additional text to the Discussion highlighting this: “Projecting the epidemic forward indicated that the overall HIT was reached after cumulative incidence had increased disproportionately in minority groups, highlighting the fundamentally inequitable outcome of achieving herd immunity through infection. All of these factors underscore the fact that incorporating heterogeneity in models in a mechanism-free manner can conceal the disparities that underlie changes in epidemic final sizes and HITs. In particular, overall lower HIT and final sizes occur because certain groups suffer not only more infection than average, but more infection than under a homogeneous mixing model; incorporating heterogeneity lowers the HIT but increases it for the highest-risk groups (Figure 2).”

For vaccination, see our response to Essential Revisions point 2.

3. The authors rely on a modified interaction index parameterized directly from their data. It would be helpful if they could explain why they did not rely on any sources of mobility data. Are these just not broken down along the type of race/ethnicity categories that would be necessary to complete this analysis? Integrating some sort of external information on mobility would definitely strengthen the analysis.

This is a great suggestion, but this type of data has generally not been available due to privacy concerns from disaggregating mobility data by race and ethnicity (Kishore et al., 2020). Instead, we modeled NPIs as mentioned in Essential Revisions point 2, with the caveat that reduction in mobility was assumed to be identical across groups. We added this into the text explicitly as a limitation: “Third, we have assumed the impact of non-pharmaceutical interventions such as stay-at-home policies, closures, and the like to equally affect racial and ethnic groups. Empirical evidence suggests that during periods of lockdown, certain neighborhoods that are disproportionately wealthy and white tend to show greater declines in mobility than others (Kishore et al., 2020; Kissler et al., 2020). These simplifying assumptions were made to aid in illustrating the key findings of this model, but for more detailed predictive models, the extent to which activity level differences change could be evaluated using longitudinal contact survey data (Feehan and Mahmud, 2020), since granular mobility data are typically not stratified by race and ethnicity due to privacy concerns (Kishore et al., 2020).”

Reviewer #3 (Recommendations for the authors):

The authors show that in an uncontrolled epidemic the herd-immunity threshold may differ dramatically between racial groups. Although this question is undeniably important and the authors have shown that their results are robust to differing assumptions about population mixing, it seems to me that the situation considered is not completely relevant to current state of the covid epidemic. The herd-immunity threshold is assumed to be reached via a single, unmitigated wave within a few months. Unfortunately, the details of how this threshold is calculated are missing from the manuscript, but I can't help but imagine that it would be quite different if it was assumed to be possible to reach herd-immunity via vaccination. I think the authors need to address this.

We have broadened the applicability of the findings by incorporating the effect of NPIs; see Essential Revisions points 1 and 2 for our responses.

Additionally, we have expanded the Methods section to explicitly state how we calculated the HIT and final epidemic size: “Calculating the HIT and final epidemic size”.

We defined the herd immunity threshold (HIT) for all models as the fraction of non-susceptible people when the effective reproduction number Rt first crosses 1. In the homogenous model, where Rt = St β / γ, the analytical solution for the HIT occurs when the fraction of non-susceptible individuals equals 1-1/R0. In the structured models of heterogeneous populations, the HIT was calculated via simulation: we took the dominant eigenvalue of Gt at each timestep to calculate Rt, and identified the number of non-susceptible individuals when Rt first decreased below 1. For the heterogeneous models with mitigation measures, Rt was calculated at each timestep with respect to the corresponding unmitigated epidemic; in other words, the mitigation scaling factor α was not included in the Rt calculation. This identifies the point in the epidemic trajectory at which the population reaches the HIT even if all mitigation measures were lifted (i.e. HIT due to population immunity), as opposed to the point in the trajectory when the population transiently reaches the HIT due to mitigation measures (see Figure 1—figure supplement 1 for further explanation).

Final epidemic sizes were calculated by simulation by running the epidemic out to 1 year for R0 above 2, and 4 years for R0 below 2 to allow additional time for the slower epidemics to fully resolve. The final time point was used as the estimate for the final epidemic size.”

Figure 2 could potentially include the cumulative incidence associated with R0=3 in a homogeneous model.

We agree this could be helpful for readers and have updated Figure 2 and the caption accordingly.

Multi-panel figures are unlabeled and split over more than one page, separated from their captions.

Supplementary Figures 2 and 3 (now labelled as Figure 2—figure supplements 5 and 6) have been updated to be on one page with appropriate labels.

Issue with assortative mixing assumption: When you assume a = 1, you assume that everyone makes the same number of contacts. How, then, is one group defined to be at higher risk than another. Is a really 1 in this case?

Issue with assortative mixing derivation: I have a minor concern about the derivation, specifically how the contact matrix is fit. When I try to follow your steps and fill-in the gaps I get a slightly different result. So please either clarify your reasoning or fix the result.

Total i seen by a single j in one day = c_ij*Ni

Total i seen by all j in one day = c_ij*Ni*Nj=total j seen by all i in one day

Using the P_ij formulation (and please drop the P_ab notation, it is not consistently used and P_ij makes it easier to follow)

Total j seen by a single i in one day = P_ij % of people seen by group i = P_ij*ai=P_ij (as a=1)

Total j seen by all i in one day = P_ij*Ni

Therefore you want to minimize |c_ij*Ni*Nj-P_ij*Ni|=|(1-e)N_j*Ni/sum(N_k)+e*Ni*d_ij – Ni*Pij|

= |Ni((1-e)Nj/sum(N_k) + e*d_ij – P_ij)|

Which is only different from what you have in that Ni and Nj are switched in one term.

With a not equal to 1 (which you appear to use?) should be something like:

= |Ni((1-e)a_i a_j Nj/sum(N_k) + e*d_ij – a_i a_j sum_k(N_ik N_jk/sum(a_i N_lk)))|

Where the P_ij term is now weighted by a.

Fit of assortative mixing model not shown or described: In the results you say you first choose epsilon, then you fit a afterwards. However the epsilon fit is performed assuming that a is zero. Can you explain the reasoning here? I'm concerned that this methodology underestimates mixing (because if contact is heterogeneous by group, then you are more likely to contact individuals from the higher risk group than is strictly predicted by geographic distribution).

We thank the reviewer for working through the derivation -- we have improved our approach to fitting the census-informed model (please see Methods section “Model fitting and data sources” beginning with “We acquired total population numbers” for typeset equations) and updated all figures and analyses accordingly; the new model fitting approach should address the reviewer’s concerns because we now jointly fit epsilon and the a’s to the serosurvey and census data, and no longer are making assumptions about all a’s being 1.

Calculation of HIT not described: I also feel like I'm missing something about how HIT is derived from the simulations. It sounds as though you used a computation rather than analytic approach, but I cannot find where it is spelled out. That seems important.

HIT methodological details have now been provided (see above).

You also say "differences in exposure…are in line with theoretical derivations". I assume that what you meant is that differences in herd immunity levels and final epidemic sizes are in line with theoretical derivations, but the sentence doesn't read that way.

Thank you for noting this; the sentence has been updated: “The observed contrast in HITs and final sizes between the proportionate mixing and the homogenous model is in line with theoretical derivations (Figure 1—figure supplement 3).”

Fit to sero-prevalence data not shown or described: You initially list fitted activity levels, can you provide error bounds for these values? What is the nature of the data that is being fit? Is it only a single proportion collected at a single time? Can you show model fits? Unclear what was done here?

We have clarified that we are fitting to a single cross-sectional data point: “We estimated the ai in the variable exposure models and the qi in the variable susceptibility models using maximum likelihood fits to a single cross-sectional serosurvey from New York, which was collected from over 15,000 adults in grocery stores from April 19-28th (Rosenberg et al., 2020).”

The model fitting procedure is described in the “Model fitting and data sources” section of the Methods. Because of the nature of the data, there are no error bounds (all models tested fit exactly to the single time point), and this is now highlighted as an opportunity for improvement in future studies: “Finally, we have estimated model parameters using a single cross-sectional serosurvey. To improve estimates and the ability to distinguish between model structures, future studies should use longitudinal serosurveys or case data stratified by race and ethnicity and corrected for underreporting; the challenge will be ensuring that such data are systematically collected and made publicly available, which has been a persistent barrier to research efforts (Krieger et al., 2020).”

Please put activity levels in a table: These values (and uncertainty ranges) should be in a table where it is easy to compare between the models.

Activity levels (i.e. total contact rates) have been formatted into Supplementary File 2.

Incidence rate ratio plot is misleading: The incidence does not follow this pattern in proportionate mixing, and incidence is dropping in all groups.

The trend also holds true for proportionate mixing; for example, in Author response image 1 is the trajectory for the Long Island proportionate mixing model.

Author response image 1

We have clarified that incidence is dropping in all groups: “In line with this, we observe that instantaneous incidence rate ratios are elevated initially in high-exposure contact groups relative to non-Hispanic whites, but this trend reverses after the epidemic has peaked and overall incidence is decreasing -- a consequence of the fact that a majority of individuals have already become infected.”

Furthermore this plot shows a time series which is completely unrealistic and wildly different from data. Most importantly, it seems to be wholly irrelevant to the main point of the paper.

We have clarified that our estimates are meant to be illustrative: “Although these trajectories are not meant to be taken as predictive estimates, these results highlight the importance of controlling for dynamics-induced changes in epidemiological measures of disease burden when evaluating the impact of interventions for reducing inequities in SARS-CoV-2 infections (Kahn et al., 2020)”

Taken qualitatively, we argue that these trends are in fact congruent with multiple reports of decreasing disparities in incidence rate over time across racial and ethnic groups. For instance, Van Dyke et al. showed that incidence in minority groups aged <25 years relative to non-Hispanic whites decreased substantially over the course of the year. These results are similar to county-level correlations reported by Krieger et al. that showed decreasing correlation over time of incidence with county percent of color.

The cause of these trends is an open question. Proposed explanations include differential access to testing, differences in adherence to distancing or mask wearing, and so on, but the possible contribution of intrinsic epidemic dynamics has not been addressed. By showing that our modeling framework can reproduce these trends, we have demonstrated another plausible explanation for the observed trends and highlighted the utility of using transmission models stratified by race and ethnicity. Additionally, these questions are not just theoretical: health equity metrics are being used to guide reopening policies, and it is important to understand how these epidemiological measures are expected to change over time as the epidemic progresses.

https://doi.org/10.7554/eLife.66601.sa2

Article and author information

Author details

  1. Kevin C Ma

    Department of Immunology and Infectious Diseases, Harvard TH Chan School of Public Health, Boston, United States
    Contribution
    Formal analysis, Investigation, Software, Visualization, Writing – original draft, Conceptualization
    For correspondence
    kevinchenma@g.harvard.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4326-2911
  2. Tigist F Menkir

    Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, United States
    Contribution
    Investigation, Software, Funding acquisition, Conceptualization
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6070-8017
  3. Stephen Kissler

    Department of Immunology and Infectious Diseases, Harvard TH Chan School of Public Health, Boston, United States
    Contribution
    Formal analysis, Resources, Conceptualization
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6000-8387
  4. Yonatan H Grad

    1. Department of Immunology and Infectious Diseases, Harvard TH Chan School of Public Health, Boston, United States
    2. Division of Infectious Diseases, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
    Contribution
    Supervision, Writing – review and editing, Validation, Methodology, Conceptualization
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5646-1314
  5. Marc Lipsitch

    1. Department of Immunology and Infectious Diseases, Harvard TH Chan School of Public Health, Boston, United States
    2. Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, United States
    Contribution
    Supervision, Writing – review and editing, Resources, Methodology, Conceptualization
    Competing interests
    Reviewing editor, eLife
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-1504-9213

Funding

National Science Foundation (DGE1745303)

  • Kevin C Ma

Morris-Singer Foundation

  • Yonatan H Grad
  • Marc Lipsitch

National Cancer Institute (U01 CA261277)

  • Marc Lipsitch

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Dr. Eli Rosenberg at the University at Albany School of Public Health and members of the Center for Communicable Disease Dynamics and Dr. Mary Bassett at the Harvard T.H. Chan School of Public Health for helpful comments. KCM was supported by the National Science Foundation GRFP grant DGE1745303. YHG and ML were funded by the Morris-Singer Foundation. M.L. was supported by SeroNet cooperative agreement U01 CA261277. Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number U01CA261277. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Senior Editor

  1. Miles P Davenport, University of New South Wales, Australia

Reviewing Editor

  1. Joshua T Schiffer, Fred Hutchinson Cancer Research Center, United States

Reviewers

  1. Joshua T Schiffer, Fred Hutchinson Cancer Research Center, United States
  2. Jon Zelner, UMICH
  3. Mia Moore, Fred Hutch, United States

Publication history

  1. Received: January 15, 2021
  2. Accepted: May 17, 2021
  3. Accepted Manuscript published: May 18, 2021 (version 1)
  4. Accepted Manuscript updated: May 24, 2021 (version 2)
  5. Version of Record published: June 23, 2021 (version 3)

Copyright

© 2021, Ma et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 883
    Page views
  • 109
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

  1. Further reading

Further reading

    1. Ecology
    2. Epidemiology and Global Health
    Morgan P Kain et al.
    Research Article Updated

    Identifying the key vector and host species that drive the transmission of zoonotic pathogens is notoriously difficult but critical for disease control. We present a nested approach for quantifying the importance of host and vectors that integrates species’ physiological competence with their ecological traits. We apply this framework to a medically important arbovirus, Ross River virus (RRV), in Brisbane, Australia. We find that vertebrate hosts with high physiological competence are not the most important for community transmission; interactions between hosts and vectors largely underpin the importance of host species. For vectors, physiological competence is highly important. Our results identify primary and secondary vectors of RRV and suggest two potential transmission cycles in Brisbane: an enzootic cycle involving birds and an urban cycle involving humans. The framework accounts for uncertainty from each fitted statistical model in estimates of species’ contributions to transmission and has has direct application to other zoonotic pathogens.

    1. Epidemiology and Global Health
    2. Genetics and Genomics
    Mohd Anisul et al.
    Research Article Updated

    Background:

    The virus SARS-CoV-2 can exploit biological vulnerabilities (e.g. host proteins) in susceptible hosts that predispose to the development of severe COVID-19.

    Methods:

    To identify host proteins that may contribute to the risk of severe COVID-19, we undertook proteome-wide genetic colocalisation tests, and polygenic (pan) and cis-Mendelian randomisation analyses leveraging publicly available protein and COVID-19 datasets.

    Results:

    Our analytic approach identified several known targets (e.g. ABO, OAS1), but also nominated new proteins such as soluble Fas (colocalisation probability >0.9, p=1 × 10-4), implicating Fas-mediated apoptosis as a potential target for COVID-19 risk. The polygenic (pan) and cis-Mendelian randomisation analyses showed consistent associations of genetically predicted ABO protein with several COVID-19 phenotypes. The ABO signal is highly pleiotropic, and a look-up of proteins associated with the ABO signal revealed that the strongest association was with soluble CD209. We demonstrated experimentally that CD209 directly interacts with the spike protein of SARS-CoV-2, suggesting a mechanism that could explain the ABO association with COVID-19.

    Conclusions:

    Our work provides a prioritised list of host targets potentially exploited by SARS-CoV-2 and is a precursor for further research on CD209 and FAS as therapeutically tractable targets for COVID-19.

    Funding:

    MAK, JSc, JH, AB, DO, MC, EMM, MG, ID were funded by Open Targets. J.Z. and T.R.G were funded by the UK Medical Research Council Integrative Epidemiology Unit (MC_UU_00011/4). JSh and GJW were funded by the Wellcome Trust Grant 206194. This research was funded in part by the Wellcome Trust [Grant 206194]. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.