Earliest infections predict the age distribution of seasonal influenza A cases
Abstract
Seasonal variation in the age distribution of influenza A cases suggests that factors other than age shape susceptibility to medically attended infection. We ask whether these differences can be partly explained by protection conferred by childhood influenza infection, which has lasting impacts on immune responses to influenza and protection against new influenza A subtypes (phenomena known as original antigenic sin and immune imprinting). Fitting a statistical model to data from studies of influenza vaccine effectiveness (VE), we find that primary infection appears to reduce the risk of medically attended infection with that subtype throughout life. This effect is stronger for H1N1 compared to H3N2. Additionally, we find evidence that VE varies with both age and birth year, suggesting that VE is sensitive to early exposures. Our findings may improve estimates of agespecific risk and VE in similarly vaccinated populations and thus improve forecasting and vaccination strategies to combat seasonal influenza.
Introduction
Seasonal influenza is a serious public health concern, resulting in approximately 100,000–600,000 hospitalizations and 5000–27,000 deaths per year in the United States despite extensive annual vaccination campaigns (Reed et al., 2015). The rapid evolution of the virus to escape preexisting immunity contributes to the relatively high incidence of influenza, including in previously infected older children and adults. How susceptibility arises and changes over time in the host population has been difficult to quantify.
A pathogen’s rate of antigenic evolution should affect the mean age of the hosts it infects, and differences in the rate of antigenic evolution have been proposed to explain differences in the age distributions of the two subtypes of influenza A. Compared to H3N2, H1N1 disproportionately infects children (Gagnon et al., 2018b; Caini et al., 2018; Khiabanian et al., 2009). It also evolves antigenically more slowly (Bedford et al., 2015). Thus, compared to H3N2, H1N1 is slower to escape immunity in individuals who have experienced prior infection (namely older children and adults), making them less susceptible to reinfection (Bedford et al., 2015; Beauté et al., 2015; Caini et al., 2018; Khiabanian et al., 2009). H3N2, in contrast, exhibits well known changes in antigenic phenotype that are expected to drive cases toward adults (Smith et al., 2004; Cobey and Hensley, 2017). Under this simple model, hosts previously infected with a subtype face equal risk of reinfection (on challenge) with an antigenic variant of that subtype.
The age distributions of influenza cases in exceptional circumstances—pandemics and spillovers of avian influenza—have shown unexpected variation that suggests important effects of prior infection. Excess mortality in some adult cohorts during the 1918 and 2009 H1N1 pandemics correlates with childhood infection with other subtypes (Gagnon et al., 2013; Worobey et al., 2014; Gagnon et al., 2018a). In the post2009 pandemic period, excess mortality and hospitalization were observed among cohorts first exposed to H2N2 or H3N2 during H1N1pdmdominated seasons (Budd et al., 2019). Similarly, the subtypes circulating in childhood predict individuals’ susceptibility to severe zoonotic infections with avian H5N1 and H7N9, regardless of later exposure to other seasonal subtypes (Gostic et al., 2016). These patterns suggest that early influenza infections, and not prior infection per se, strongly shape susceptibility.
Early infections might also affect the protection conferred by influenza vaccination. Foundational work on the theory of original antigenic sin demonstrated that an individual’s immune response to influenza vaccination is biased toward antigens similar to those encountered in childhood (Davenport and Hennessy, 1956). In some cases, this may result in a narrow antibody response focused on a single epitope (Davis et al., 2018). This phenomenon has been suggested to explain an unexpected decrease in vaccine effectiveness (VE) in the middleaged in the 2015–2016 influenza season (Skowronski et al., 2017b; Flannery et al., 2018). More generally, it has been hypothesized that biases in immune memory can arise from both past infections and vaccinations and lead to variation in VE that is sensitive to the precise history of exposures (Smith et al., 1999; Skowronski et al., 2017a).
To measure the effect of early exposures on medically attended infection risk and VE, we fitted statistical models to 3493 PCRconfirmed influenza cases identified through seasonal studies of influenza VE from the 2007–2008 to 2017–2018 seasons in the Marshfield Epidemiologic Study Area (MESA) in Marshfield, Wisconsin (Belongia et al., 2009; Belongia et al., 2011; Griffin et al., 2011; Treanor et al., 2012; Ohmit et al., 2016; McLean et al., 2014; Gaglani et al., 2016; Zimmerman et al., 2016; Jackson et al., 2017; Flannery et al., 2018, Figure 1—figure supplement 1). Each influenza season, individuals in a defined community cohort were recruited and tested for influenza when seeking outpatient care for acute respiratory infection. Eligibility was restricted to individuals >6 months of age living in MESA who received routine care from the Marshfield Clinic and who presented in an outpatient setting.
We sought to explain the variation in the age distribution of these cases by subtype and over time. Our model predicted the relative number of cases of influenza in each birth year each season as a function of the age structure of the population, agespecific differences in the risk of medically attended influenza A infection, early influenza infection, and vaccination. Despite the extensive antigenic evolution in both subtypes over the study period, we found strong evidence of protection from the subtype to which a birth cohort was likely first infected (the imprinting subtype) and variation in VE by birth cohort.
Materials and methods
Study cohort
Request a detailed protocolCases of PCRconfirmed, medically attended influenza were identified from annual community cohorts based on residency in MESA. MESA is a contiguous geographic area surrounding Marshfield, Wisconsin, where nearly all 61,000 residents receive outpatient and inpatient care from the Marshfield Clinic Health System (Kieke et al., 2015). For each influenza season from 2007 to 2008 through 2017–2018, we identified MESA residents >6 months of age who received routine care from the Marshfield Clinic. These individuals were eligible for recruitment into that season’s VE study if they sought care for acute respiratory infection. Trained research coordinators recruited patients during clinical encounters in primary care departments, including urgent care, pediatrics, combined internal medicine and pediatrics, internal medicine, and family practice. Patients were enrolled on weekdays, evenings, and weekends when clinical services were provided. Research staff used an electronic appointment system to screen the chief complaints for respiratory or febrile illness. Patients were then approached inperson to assess eligibility based on specific respiratory symptoms and duration of illness. The proportion of patients with medically attended acute respiratory infection (MAARI) who were screened for enrollment varied by season and was largely determined by the volume of patients each day and staffing capacity. Only symptoms and illness duration were used to determine eligibility among those patients who were in the predefined cohort. Patients were also assessed for the presence of medical conditions that put them at high risk for complications from influenza infection, as defined by the Advisory Committee on Immunization Practice (Smith et al., 2006). These conditions included cardiovascular disease, diabetes, pulmonary disease, cancer, kidney disease, liver disease, blood disorders, immunosuppressive disorders, metabolic disorders, and neurological/musculoskeletal disorders. We considered subjects vaccinated if they received that season’s influenza vaccine ≥14 days before enrollment. For the 2009–2010 season, we only considered receipt of the 2009 monovalent vaccine. The Marshfield Clinic generally does not capture MAARI in nursing facilities with dedicated medical staff, causing undersampling of the oldest age groups. We adjusted for this (Appendix 1: ‘Agespecific rates of approachment, enrollment, and nursing home residence’).
Each season, recruitment began when influenza activity was detected in the community and usually continued for 12–15 weeks. Symptom eligibility criteria varied by season but included fever/feverishness or cough during most seasons. We retroactively standardized symptom eligibility criteria to only require cough as a symptom. Individuals with illness duration >7 days or presenting in an inpatient (hospital) setting were excluded. After obtaining informed consent, a midturbinate swab was obtained for influenza detection. RTPCR was performed using CDC primers and probes to identify influenza cases, including type and subtype.
Calculating differences in the age distribution between seasons
Request a detailed protocolWe defined the age distribution of each season as the number of cases of the dominant (more common) subtype in each of nine age groups (0–4 yearolds, 5–9 yearolds, 10–14 yearolds, 15–19 yearolds, 20–29 yearolds, 30–39 yearolds, 40–49 yearolds, 50–64 yearolds, and ≥65 years old). We excluded the subdominant subtype in each season due to concerns that shortterm interference between the subtypes (Laurie et al., 2015; Goldstein et al., 2011) would affect the age distribution of the rarer subtype. The Gtest of independence was used to measure differences in seasons’ age distributions.
Calculating relative risk
Request a detailed protocolTo evaluate relative infection risk in different age groups, we measured their relative risk of infection in the first versus second half of each season. This risk is a combination of the chance of infection, conditional on infection (susceptibility), and the rate of contact with infected people. Attack rates should be higher in populations that experience more risk, and therefore these populations should be infected earlier in the epidemic (Worby et al., 2015). To calculate relative risk we used an approach similar to Worby et al., 2015. We defined the midpoint of each season as the week in which the cumulative number of cases of the dominant subtype among all people exceeded half the total for that season. Weeks before and after this point were assigned to the first and second half of the season, respectively. We assigned each case to one of the five age groups used by Worby et al., 2015 (04 yearolds, 5–17 yearolds, 18–49 yearolds, 50–64 year olds, and ≥65 years old). For each age group $g$, we defined relative risk as
where ${C}_{\text{first},t,g}$ and ${C}_{\text{second},t,g}$ are the fraction of cases of the dominant subtype during influenza season $t$ that occurred during the first or second half of the season, respectively. A relative risk >1 indicates that cases in an age group were more likely to occur during the first half of the season.
Calculating imprinting probabilities
Request a detailed protocolWe hypothesized that the subtype of a person’s first influenza A infection affects their future susceptibility to that subtype. Testing this hypothesis requires knowing the probability that a person’s primary influenza A infection was with a particular subtype. To calculate these probabilities, we emulated the approach of Gostic et al., 2016, which assumes these probabilities are determined by a person’s year of birth and subsequent exposure to each subtype.
First, we calculated the probability that an individual born in year $y$ received their first influenza A exposure in influenza season $t$. Assuming a constant perseason rate of infection ${i}_{0}$, the probability of infection in one season (i.e., the attack rate) is given by
By assuming that the average probability that a naive individual is infected in a single season is 0.28 (Bodewes et al., 2011; Gostic et al., 2016), we calculated the expected perseason infection rate (${i}_{0}$) as
However, because the intensity of epidemics varies between seasons (${I}_{t}$, Appendix 1: ‘Seasonal intensity’) and the fraction of the epidemic experienced by a person depends on their birth year (${\gamma}_{y,t}$, Appendix 1: ‘Fraction of season experienced’), we considered the timevarying perseason infection rate,
Therefore, the probability that a naive individual born in year $y$ is infected in season $t$ is
We used ${a}_{y,t}$ to calculate the fraction of a birth cohort $y$ that received their first influenza A infection in season $t$. Let ${U}_{y,t}$ be the fraction of people born in year $y$ who were unexposed at the beginning of season $t$ (Appendix 1: ‘Calculating the fraction unexposed’). The probability that a person born in year $y$ has their first infection in season $t$ is
We calculated ${m}_{s,t,y}$, the probability that a person born in year $y$ had their first influenza A infection with subtype $s$ in season $t$, by multiplying ${a}_{y,t}{U}_{y,t}$ by the frequency of subtype $s$ in season $t$, ${l}_{s,t}$ (Figure 3—figure supplement 1),
Modeling approach
Request a detailed protocolWe aimed to predict ${p}_{s,t,y,v}$, the fraction of cases of subtype $s$ in season $t$ among people born in year $y$ with vaccination status $v$. Our models assume that this is proportional to a combination of the following factors:
Demography. The age distribution of our study cohort is not static over the study period. All models adjusted for the changing fractions of the population in each birth cohort and season (Figure 1—figure supplement 2; Mathematical expressions for model components: ‘Demography’).
Agespecific effects. We considered that age itself may be associated with differences in medically attended influenza A infection risk stemming from differences in susceptibility and/or rates of contact with infectious people. Additionally, we expect that age groups may intrinsically differ in their healthcareseeking behaviors. These factors are inseparable in our data, and all models represent their combined effects with a static agespecific parameter shared by both subtypes that describes the risk of agespecific medically attended influenza A infection (Mathematical expressions for model components: ‘Agespecific factors’). We assumed no intrinsic differences in the agespecific virulence of the two subtypes. These agespecific parameters were fitted. We also adjusted for other potential sources of agespecific bias, including agespecific differences in study approachment and enrollment rates (Appendix 1: ‘Agespecific rates of approachment, enrollment, and nursing home residence’).
Imprinting. We tested several hypotheses of how primary exposures could affect the risk of medically attended infection with H1N1 and H3N2. In each version, we estimated fractional reductions in risk of medically attended H1N1 and H3N2 infection due to primary (i.e., imprinting) exposure to the same type:
Subtypespecific imprinting: Influenza has two main antigens, hemagglutinin (HA) and neuraminadase (NA). Imprinting could in theory derive from responses to either or both antigens. Because H1N1 is the only seasonal subtype of influenza with N1, we cannot separate the effects of initial N1 exposure from initial H1 exposure. However, since N2 appears in both H3N2 and H2N2 viruses, we can estimate protection against H3N2 infection from initial N2 exposure separately from protection from initial H3 exposure (Mathematical expressions for model components: ‘HA subtype imprinting’ and ‘N2 imprinting’).
Grouplevel imprinting: Influenza A viruses fall into two groups (I and II) corresponding to the two phylogenetic clades of HA. Gostic et al., 2016 found that primary infection by a virus belonging to one group protected against severe infection by another subtype in the same group. If grouplevel imprinting were influential, we would see primary infection with H2N2 conferring protection against H1N1, another group I virus, as well as H1N1 protecting against H1N1, and H3N2 against H3N2. We considered a separate class of models that assumes grouplevel protection instead of subtypespecific protection (Mathematical expressions for model components: ‘HA group imprinting’).
Vaccination. Approximately 45% of the MESA population was vaccinated against influenza each year (Figure 1—figure supplement 3; Appendix 1: ‘Vaccination coverage’). We estimated cases in vaccinated and unvaccinated individuals of each birth year separately. Naively, we expect that vaccinated individuals should seek medical attention for acute respiratory infection proportionally to the fraction of their cohort vaccinated that season. However, vaccinated individuals may seek medical attention for acute respiratory infection more frequently than nonvaccinees due to correlations between the decision to vaccinate, healthcareseeking behavior, and underlying medical conditions (Jackson et al., 2006a; Jackson et al., 2006b; Belongia et al., 2011). Indeed, we generally observed higher rates of highrisk medical conditions among vaccinated people compared to unvaccinated people (Figure 1—figure supplement 4). We attempted to adjust for this by calculating the fraction of vaccinated people among those who had MAARI and tested negative for influenza (i.e., the testnegative controls, ‘Mathematical expressions for model components: Vaccination’). We found that the vaccinated fraction exceeds vaccination coverage for most age groups, suggesting vaccinated individuals are overrepresented among cases for reasons unrelated to influenza (Figure 1—figure supplement 5). We also assumed that vaccination is not perfectly effective, and defined VE as the fractional reduction in cases expected in vaccinated compared to unvaccinated individuals after controlling for the effects described above. We estimated subtypespecific VE under five scenarios: (i) constant across age groups and seasons; (ii) constant across age groups but seasonspecific; (iii) agespecific but constant across seasons; (iv) imprintingspecific; and (v) birthcohortspecific. We assumed that vaccination affects risk only in the current season, i.e, vaccination in a prior season confers no residual protection (Mathematical expressions for model components: ‘Vaccination’; Ohmit et al., 2014; Ohmit et al., 2016; Jackson et al., 2017; Skowronski et al., 2016; Pebody et al., 2013; McLean et al., 2018).
We defined models as specific combinations of the above factors. We tested a set of 10 models by pairing each of the possible implementations of HA imprinting with each implementation of VE (Figure 1). Demography, agespecific effects, and N2 imprinting were included in all these models. To test whether more complex models truly improved model fit, we also tested a simple model with constant VE and no effect of imprinting. We evaluated these 11 models by maximum likelihood and compared their performance using the corrected Akaike information criterion (cAIC, ‘Model likelihood’) and leaveoneout crossvalidation.
Mathematical expressions for model components
Demography
Request a detailed protocolWe expect that the fraction of cases in each birth cohort should be proportional to the underlying demographic birth year distribution of the population. To calculate the demographic birth year distribution, we used MESAspecific data on the age distribution for each season (Kieke et al., 2015). Because people ≥90 years old were grouped into a single age class, we estimated the number of people in each age ≥90 years old by assuming a geometric decline in population with age. We converted the age distribution for each season into a distribution by birth year by assigning people of a specific age into the two possible birth years of that age (Appendix 1: ‘Birth year distribution of the study population’). Therefore,
where ${D}_{t,y}$ is the fraction of the population in season $t$ who were born in year $y$ .
Agespecific factors
Request a detailed protocolWe modeled intrinsically agespecific differences in medically attended influenza A infection risk and healthcareseeking behavior by using parameters that represent the relative risk of medically attended influenza A infection in each age group. These parameters combine the effects of underlying agespecific differences in influenza A medically attended infection risk as well as agespecific differences in healthcareseeking behavior. We considered the same age groups as before (0–4 yearolds, 5–9 yearolds, 10–14 yearolds, 15–19 yearolds, 20–29 yearolds, 30–39 yearolds, 40–49 yearolds, 50–64 yearolds, and ≥65 years old). We chose 20–29 yearolds as our reference age group. All age groups $g$ aside from 20 to 29 yearolds had an associated parameter (${A}_{g}$) that scaled their risk of medically attended influenza A infection relative to 20–29 yearolds. These parameters can take on any positive value.
Since our models describe the distribution of cases by birth year and not by age, we mapped the agegroupspecific parameters (${A}_{g}$) to birth cohorts in each season $t$ (${A}_{t,y}$). We considered that each birth cohort has two possible ages in each season ($a1$ and $a2$). Let $G(a)$ be a function that specifies the age group $g$ of a given age $a$. Then ${A}_{t,y}$, the agespecific relative risk in season $t$ of medically attended influenza A infection for a person born in year $y$, is
where ${f}_{a1,t,y}$ and ${f}_{a2,t,y}$ are the fractions of birth cohort $y$ who are age $a1$ or $a2$ in influenza season $t$ (Appendix 1: ‘Fraction of birth cohort with specific age’), and ${A}_{G(a1)}$ and ${A}_{G(a2)}$ are the agegroupspecific parameters for $a1$ and $a2$.
Our models also included agespecific approachment rates (${x}_{\text{approach},t,y}^{\prime}$), enrollment rates (${x}_{\text{enroll},t,y,v}^{\prime}$), and nursing home enrollment (${k}_{t,y}$) as covariates, all of which bias the age distribution of medically attended influenza infections (Appendix 1: ‘Agespecific rates of approachment, enrollment, and nursing home residence’). The combination of estimated agespecific effects and agespecific covariates was modeled as
HA subtype imprinting
Request a detailed protocolWe considered that imprinting to HA reduces a birth cohort’s risk of future infection from the same HA subtype. Therefore,
where ${h}_{s}$ is the strength of HA imprinting for subtype $s$ and ${m}_{s,t,y}$ is the imprinting probability in season $t$ of birth cohort $y$ to subtype $s$ (‘Calculating imprinting probabilities’).
HA group imprinting
Request a detailed protocolWe considered that imprinting to HA reduces a birth cohort’s risk of future infection with viruses from the same HA group. Therefore,
where ${g}_{1}$ is the strength of HA imprinting for group one viruses; ${g}_{2}$ is the strength of HA imprinting for group two viruses; and ${m}_{\text{H1N1},t,y}$, ${m}_{\text{H2N2},t,y}$, and ${m}_{\text{H3N2},t,y}$ are the imprinting probabilities in season $t$ of birth cohort $y$ to H1N1, H2N2, and H3N2.
N2 imprinting
Request a detailed protocolWe considered that imprinting to N2 reduces a birth cohort’s risk of H3N2 infection. Therefore,
where ${n}_{m}$ is the strength of N2 imprinting, and ${m}_{\text{H3N2},t,y}$ and ${m}_{\text{H2N2},t,y}$ are the imprinting probabilities of birth cohort $y$ in season $t$ to H3N2 and H2N2.
Vaccination
Request a detailed protocolWe assumed that vaccination decreases the risk of medically attended infection. However, vaccinated individuals may seek healthcare for symptomatic influenza at a different rate than unvaccinated individuals. Moreover, because vaccines are routinely recommended for individuals with underlying health conditions, preexisting susceptibility to MAARI among vaccinated individuals may also differ from unvaccinated individuals. Let ${R}_{t,g}$ represent the fraction of vaccinated individuals in age group $g$ in season $t$ that present with MAARI. We use testnegative controls to estimate this as
where ${v}_{t,g}^{}$ and ${u}_{t,g}^{}$ are the number of vaccinated or unvaccinated individuals born in year $g$ presenting with MAARI and testing negative for influenza in season $t$. We converted ${R}_{t,g}$ to ${R}_{t,y}$ (i.e., to a covariate indexed by birth cohort) using the same method described in ‘Agespecific factors.’ We tested five different VE schemes: subtypespecific VE that remained constant across seasons and cohorts (two parameters), subtypespecific VE that varied between the age groups described above (18 parameters), VE that varied between seasons (12 parameters), VE for each possible imprinting subtype (six parameters), and birthcohortspecific VE (18 parameters). These VE parameters ($V$) reduced the probability of medically attended influenza A infection among vaccinated individuals in a birth cohort, i.e,
where $V$ depends on the specific implementation of VE used.
Constant VE only varies with the infecting subtype, thus
Seasonspecific VE varies with subtype and season, thus
For agespecific VE, we used the same age classes described above for ‘Agespecific factors’ but did not consider a reference age class, so that each age group had an associated VE for each subtype. We used these agespecific VE parameters to calculate the VE against subtype $s$ in birth cohort $y$ during season $t$ using the same procedure described in ‘Agespecific factors’ (Equation 9). Therefore,
where ${v}_{G(a1),s}$ and ${v}_{G(a2),s}$ are agespecific VE parameters for $a1$ and $a2$.
For imprintingspecific VE, we used the imprinting probabilities for each birth cohort described in ‘Calculating imprinting probabilities’ to scale V such that
where ${v}_{s,z}$ is the VE among people imprinted to subtype $z$ against infection by dominant subtype $s$, and ${m}_{z,t,y}$ is the imprinting probability for subtype $z$ in season $t$ for birth cohort $y$.
For birthcohortspecific VE, we defined nine birth cohorts corresponding to the nine age groups we used for the 2017–2018 season: 1918–1952, 1953–1967, 1968–1977, 1978–1987, 1988–1997, 1998–2002, 2003–2007, 2008–2012, and 2013–2017. Let $Q(y)$ be the birth cohort of people born in year $y$. Then
where ${v}_{Q(y),s}$ is the VE among people in cohort $Q(y)$ against infection by dominant subtype $s$.
Model likelihood
Request a detailed protocolRecall that our aim is to predict ${p}_{s,t,y,v}$, the fraction of all PCRconfirmed influenza cases of dominant subtype $s$ in influenza season $t$ among people born in year $y$ with vaccination status $v$. These fractions can also be interpreted as multinomial parameters that describe the probability that in season $t$, a medically attended influenza infection of subtype $s$ occurs among people born in year $y$ with vaccination status $v$. Each model $M$ assumes that ${p}_{s,t,y,v}$ is proportional to a collection of model components $j$ described above (demography, age, imprinting, and vaccination). Thus,
where ${p}_{M,s,t,y,v}$ is a multinomial probability under model $M$, ${\varphi}_{M,j}$ indicates whether model $M$ contains component $j$, and ${\eta}_{j,s,t,y,v}$ is the mathematical expression for model component $j$ given $s$, $t$, $y$, and $v$ (e.g., for HA subtype imprinting, ${\eta}_{j,s,t,y,v}=1{h}_{s}{m}_{s,t,y}$).
To obtain proper multinomial probabilities, we calculated a normalizing constant for each season $t$ such that all probabilities in that season sum to 1. For convenience, let ${p}_{M,s,t,y,v}^{\prime}={\prod}_{j}{\varphi}_{M,j}{\eta}_{j,s,t,y,v}$ be the unnormalized multinomial probability for model $M$. Then for a specific season $t$, the normalized multinomial probability is
where ${y}_{\text{max},t}$ is the maximum birth year possible for a specific season $t$.
To calculate the likelihood of a given model, we used the multinomial probabilities and the observed birth year distribution of cases. Let ${n}_{s,t,y,v}$ be the number of PCRconfirmed cases of dominant subtype $s$ in influenza season $t$ among people born in year $y$ with vaccination status $v$. The total number of PCRconfirmed cases of dominant subtype $s$ in season $t$ is
For models fitted to a restricted set of ages, we limited the cases for each season to the birth cohorts that were guaranteed to meet the age requirements in that season.
Then, the likelihood of model $M$ in season $t$ is given by the multinomial likelihood,
Finally, the full model likelihood for model $M$ over all observed seasons is
We fitted the model to case data using the LBFGSB algorithm implemented in the R package optimx. We estimated 95% confidence intervals for parameters of the bestfitting model by evaluating likelihood profiles at 14 evenly spaced points and interpolating the entire profile using a smoothing spline.
Results
The age distribution of cases varies between seasons and subtypes
The age distribution of cases varies between subtypes. The relative burden of cases is consistently higher in people ≥65 years old during H3N2dominated seasons compared to H1N1dominated seasons (Figure 2). The age distribution tends to vary more between subtypes than within either over time (Figure 2—figure supplement 1, offdiagonal quadrants). This is consistent with recent work showing that the ratios of H3N2 to H1N1 cases differ between age groups (Gagnon et al., 2018b).
The age distribution also varies within subtypes over time (Figure 2—figure supplement 1, diagonal quadrants). The seven H3N2dominated seasons display three types of age distributions (Figure 2—figure supplement 1, clusters of lightercolored cells in the upper lefthand quadrant), and two correspond to major antigenic clusters (2007–2008, Fonville et al., 2016; 2010–2012, Ann et al., 2012). These differences sometimes coincide with significant shifts in the age distribution between seasons. For instance, the highest fraction of H3N2 cases occurs in 20–29 year olds in the 2007–2008 season, but this age group has the lowest fraction of cases in the next H3N2dominated season (2010–2011, Figure 2C). In H1N1, the shift from seasonal to pandemic strains is associated with large changes in the age distribution (Figure 2—figure supplement 1, lower righthand quadrant).
We found further evidence that age groups differed in their susceptibility across seasons by examining the relative risk of infection during the first versus second half of each epidemic period (Materials and methods: ‘Calculating relative risk’). Individuals at greater risk of infection should be infected disproportionately early rather than late in an epidemic (Worby et al., 2015). We confirmed that an age group’s relative risk correlates with the fraction of cases within that age group in the same season (Pearson’s r = 0.58, 95% CI 0.38–0.73; Figure 2—figure supplement 2A; Appendix 1: ‘Correlation of relative risk and fraction of cases’). This trend is evident for H1N1 (Pearson’s r = 0.73, 95% CI 0.45–0.88; Figure 2—figure supplement 2A) and H3N2 seasons separately (Pearson’s r = 0.52, 95% CI 0.30–0.69; Figure 2—figure supplement 2A). The positive correlation in all seasons is robust to undersampling of cases at the start and end of seasons (Appendix 1: ‘Sensitivity to sampling effort’, Figure 2—figure supplement 2B). This provides supporting evidence that the different numbers of cases in each age group reflect underlying differences in infection risk.
Just as the age distribution of cases varies over time, the age groups with high relative risks of infection change over time. If people contact one another similarly from one season to the next, these shifting relative risks imply that age groups’ relative susceptibilities change over time. For instance, 5–17 year olds had the highest relative risk of early infection in the 2008–2009 season, whereas 50–64 yearolds had the highest relative risk in the 2013–2014 season (Figure 2—figure supplement 3). Relative risks in MESA vary more than national estimates, which show that 5–17 yearolds had the highest relative risk in all but one season from the 2009 pandemic to 2013–2014 (Worby et al., 2015). These differences may partly be due to the fact that our measurements of relative risk use outpatient visits, whereas the national estimates use hospitalizations.
Taken together, these findings suggest that the risk of influenza infection is not a simple function of age alone. Other factors, such as past influenza infections and vaccination, might explain the changing age distributions of cases in time.
Imprinting probabilities of age groups change over time
We hypothesized that variation in the age distribution of cases could be explained by the aging of birth cohorts with similar early exposure histories. This would cause the early exposure history of an age group, and thus potentially its susceptibility, to change in time. To calculate the probability that people in a particular age group had their first influenza A infection with a particular subtype, we adapted the approach from Gostic et al., 2016. Briefly, we calculated the probability that an individual born in a specific year had a primary infection with H1N1, H2N2, or H3N2 using data on relative epidemic sizes and the frequencies of circulating subtypes (Figure 3—figure supplement 1; Materials and methods: ‘Calculating imprinting probabilities’).
As expected, age groups’ early exposures are not static and change over time (Figure 3). Older people nonetheless tend to be imprinted to H1N1 or H2N2, whereas younger people have higher probabilities of imprinting to H3N2. The effects of the 2009 H1N1 pandemic are evident in the three youngest age groups as a transient increase (from 2009 to approximately 2013) in their H1N1 imprinting probability. These imprinting probabilities are relatively wellconstrained even after for accounting for uncertainty in epidemic size (Figure 3—figure supplement 2; Appendix 1: ‘Sensitivity to uncertainty in ILI and the frequency of influenza A’).
Agespecific differences in medically attended influenza A infection risk affect epidemic patterns
We fitted models to estimate the underlying effects of age, early infections, and vaccination on the age distributions of cases. As expected, the cases reveal agespecific differences in the risk of medically attended influenza A infection (Figure 4; Figure 4—figure supplement 1; Appendix 2—table 1). This risk is roughly threefold higher among children <4 years old compared to adults 20–29 years old, after adjusting for other effects (Figure 4). The decline in risk through middle age is generally consistent with attack rates estimated from serology (Monto et al., 1985; Bodewes et al., 2011; Wu et al., 2010; Huang et al., 2019) and clinical infections (Wu et al., 2017). We recently observed smaller differences in the attack rates of schoolaged children and their parents when estimating infections serologically (Ranjeva et al., 2019). We hypothesize that the attack rates estimated from clinical infections might show larger differences by age due to agerelated changes in infection severity and healthcareseeking behavior. Indeed, rates of healthcareseeking behavior have been shown to decline with age before rising in adults ≥65 years old (Biggerstaff et al., 2014; BrooksPollock et al., 2011; Van Cauteren et al., 2012), consistent with our results. Finally, the increased risk of medically attended influenza A infection among people ≥65 years old compared to other adults may be related to the increasing prevalence of highrisk medical conditions with age (Figure 1—figure supplement 4).
Initial infection confers longlasting, subtypespecific protection against future infection
Our bestfitting model supports subtypespecific imprinting for H1N1 and H3N2 (Figure 5, top row; Appendix 2—table 1). This model also provides the best predictive power compared to other models in a leaveoneout crossvalidation analysis (Figure 5—figure supplement 1; Figure 5—figure supplement 2; Appendix 1: ‘Evaluation of predictive power’). The risk of future medically attended infection by H1N1 is reduced by 66% (95% CI 53–77%) in people imprinted to H1N1, whereas the risk of future medically attended infection by H3N2 is reduced by 33% (95% CI 17–46%) in people imprinted to H3N2. We found no evidence of a protective effect from imprinting to N2 (0%, 95% CI 0–7%). These estimates of imprinting protection are insensitive to:
uncertainty in imprinting probabilities due to uncertainty in past epidemic sizes (Figure 3—figure supplement 2; Appendix 1: ‘Sensitivity to uncertainty in ILI and the frequency of influenza A’; Appendix 2—table 3),
choice of age groups for medically attended influenza A infection risk and VE (Appendix 1: ‘Sensitivity to age groups’; Appendix 2—table 4), and
undersampling of influenza cases in some seasons (Figure 5—figure supplement 3).
In theory, the estimated protective effects of imprinting could be influenced by crossprotection rather than the impact of first infection per se. Because first infections are also recent infections in children, we reasoned that the observed imprinting effects might arise from confounding with recent infections in these ages. Based on an estimated 7 year halflife of homologous protection after H1N1pdm infection in children (Ranjeva et al., 2019) and the fact that most children experience primary influenza A infection by 5 years of age (Bodewes et al., 2011), we reasoned that excluding children <15 years old would diminish the impact of protection from recent infection on our results. When we excluded the youngest age groups, our estimates of H1N1 imprinting protection decreased while H3N2 imprinting protection increased (Figure 5, second row). However, initial infection by H1N1 was still more protective than initial infection by H3N2, both imprinting effects remained positive, and there was no significant change in the values of other estimated parameters (Appendix 2—table 1 and Appendix 2—table 2).
The effects of recent infection should also manifest in the difference between the observed and estimated numbers of cases (i.e., the excess cases, Appendix 1: ‘Calculating excess cases’), since unlike typical transmission models, our model does not take priorseason infections into account when estimating cases for the current season. More infections in a birth cohort in one season should reduce susceptibility in that birth cohort at the start of the next season. We thus expect that excess cases in one season will be followed by missing cases in the next season dominated by that subtype (i.e., a negative correlation in excess cases). Instead, we observed that excess cases for each birth cohort are weakly positively correlated from season to season, suggesting that immunity from recent infections is not a major driver of temporal variation in the age distribution of cases (Figure 5—figure supplement 5).
Since older adults have the highest probability of primary infection with H1N1, we also reasoned that older adults might disproportionately drive the strong protection from H1N1 imprinting we observe. People born before 1947 were likely exposed to H1N1 strains that are antigenically similar to the postpandemic H1N1 strains that comprise most of our H1N1 infection data (Manicassamy et al., 2010; O'Donnell et al., 2012), creating the possibility that strainspecific crossimmunity drives the pattern we attribute to subtypespecific imprinting. These people nearly all fall into the ≥65 yearold age group in the study period. The study also underenrolled medically attended infections among people in nursing facilities, which would artificially lower the case count in this age group and may affect estimates of imprinting protection. Therefore, we excluded adults ≥65 years old and refitted our models. Excluding the oldest adults does not significantly change estimated imprinting protection or other parameters (Appendix 2—table 1 and Appendix 2—table 2).
When we exclude both the youngest and oldest age groups, initial infections by H1N1 and H3N2 have similar protective effects (Figure 5, bottom row). This shows that the combined effects of crossprotection in both the youngest and oldest individuals contribute to the signal of imprinting protection we observe, but they are not its sole drivers.
VE varies by birth cohort in older children and adults
The bestfitting model includes agespecific VE (Figure 4—figure supplement 1; Appendix 2—table 2). While serological responses to influenza vaccination are weakest in the young (Englund et al., 2005; Neuzil et al., 2006) and old (Lee et al., 2018; DiazGranados et al., 2014), it is unclear what agerelated factors would drive variation in VE in other age groups. We hypothesized that VE in these ages varies with early exposure history, which correlates with birth year, rather than age.
To test this hypothesis, we fitted a model with birthcohortspecific VE to the cases, excluding either children <15 years old or adults ≥65 years old. We chose birth cohorts that corresponded to the age groups of the original model in 2017–2018 (Materials and methods: ‘Vaccination’), keeping the number of parameters the same (e.g., VE in the 20–29 age group became VE in the 1988–1997 birth year cohort). We find that agespecific VE still outperforms all other models after we exclude the oldest age group (≥65 years old). In contrast, birthcohortspecific VE performs better when we exclude children <15 years old (Figure 6—figure supplement 1). Estimates of imprinting protection and agespecific risk of medically attended influenza in the birthcohortspecific VE models are not significantly different from estimates from the bestfitting model fitted to all ages (Appendix 2—table 1). Taken together, these results suggest that birthcohortspecific VE best explains the case distribution in older children and adults, who have likely experienced their first influenza infection, whereas agespecific VE best explains cases in younger children, who have less influenza exposure.
VE differs between birth cohorts that have similar imprinting by subtype (Figure 6; Appendix 2—table 5). For example, the 1968–1977 and 1988–1997 cohorts have similar probabilities of primary exposure to H1N1 and H3N2, but they differ substantially in their VE to both subtypes (Figure 6). The 1988–1997 and 1998–2002 cohorts also have similar probabilities of primary exposure to each subtype and have similar H1N1 VEs, but have significantly different H3N2 VEs (Figure 6). Antigenic differences within each subtype might explain this variation.
Discrepancies partly explained by antigenic evolution
The bestfitting model accurately reproduces the age distributions of vaccinated and unvaccinated cases of each subtype, aggregated across seasons (Figure 7A). The only exception is that it underestimates aggregate H1N1 cases in unvaccinated 5–9 yearolds. By examining the differences between predicted and observed cases for each season, we see that this is largely driven by infection during the 2009 H1N1 pandemic (Figure 7B). Such a large antigenic change may have negated any protection from previous infection in 5–9 yearolds and made them particularly susceptible to pandemic infection.
The model underestimates cases in unvaccinated individuals who were 30–39 years old and over 50 years old in the 2013–2014 season (Figure 7B), as indicated by the many excess cases in these age groups in that season. This is further evidence that subtypespecific imprinting cannot explain all age variation. As mentioned before, this season provided one of the first examples that original antigenic sin could affect protection: middleaged adults had been targeting a familiar site on the pandemic strain that then mutated, rendering them susceptible. Other age groups were effectively blind to these changes, owing to their different exposure histories (Linderman et al., 2014; Huang et al., 2015; Arriola et al., 2014; Dávila et al., 2014; Petrie et al., 2016).
Discussion
The distribution of influenza cases by birth year is consistent with subtypelevel imprinting, whereby initial infection with a subtype protects against future medically attended infections by the same subtype. The stronger protective effect observed from primary H1N1 infection compared to primary H3N2 infection may be caused by stronger crossprotective responses to conserved epitopes in the more slowly evolving H1N1 (Bedford et al., 2015). This is in line with previous work showing that protection conferred by H1N1 infection lasts longer than protection conferred by H3N2 infection (Ranjeva et al., 2019). Another recent study found stronger imprinting protection from primary H1N1 compared to primary H3N2 infection (Gostic et al., 2019). Subtypespecific protection observed in seasonal influenza is narrower than the previously reported HAgrouplevel imprinting protection against avian influenza (Gostic et al., 2016), but in both cases, the protection correlates strongly with primary infection rather than any prior exposure.
Examining cases of seasonal influenza over a 20 year period in Arizona, Gostic et al., 2019 find evidence of imprinting protection not only from HA but also NA, which we do not. We speculate that this discrepancy may be due to increasing vaccination coverage over time in middleaged adults. During the period of the Arizona study (1993–1994 through 2014–2015), vaccination coverage in U.S. adults increased most rapidly in this age group (NHIS, 2009), which corresponds to the H2N2imprinted cohorts near the end of the study. Without adjustment for vaccination, the apparently increased protection in the middle aged might resemble N2 imprinting. Accounting for vaccination in the MESA population, including the relatively stable vaccination coverage in each age group over time (Figure 1—figure supplement 3), suggests imprinting protection is driven by HA.
In contrast to the clear role of the imprinting subtype in protection against medically attended infection, the model implicates the imprinting strain or other attributes of early exposure history in VE. We expect that people born around the same time were likely exposed to similar strains, not just subtypes, of influenza A early in life, and our results support the idea that biases in immune memory from these early exposures (i.e., original antigenic sin; Davenport and Hennessy, 1957; Francis, 1960; Fazekas de St Groth and Webster, 1966) influence VE. Specifically, we observe that our model is consistent with previous suggestions of birthcohortspecific VE. The model with birthcohortspecific VE better estimates cases in vaccinated 50–64 yearolds (born 1953–1967) in the 2015–2016 season than the model with agespecific VE, as indicated by the fewer excess cases predicted in that age group and an improved fit of 1.1 loglikelihood units (Figure 6—figure supplement 2; Appendix 1: ‘Calculating excess cases’). Reduced VE in this group during the 2015–2016 season has been attributed to the exacerbation of antigenic mismatch by the vaccine in adults whose antibody responses were focused on a nonprotective site (Skowronski et al., 2017b; Flannery et al., 2018). The improved performance of birthcohortspecific VE relative to agespecific VE suggests other seasons and age groups where original antigenic sin might have influenced VE, such as 20–29 yearolds in the 2007–2008 influenza season.
Although seasonal estimates of VE routinely stratify by age, shifts in VE from one season to the next might thus be easier to interpret in light of infection history (e.g., Skowronski et al., 2017b; Flannery et al., 2018). The results suggest this effect may be subtle, i.e, influenced by strains’ specific identities rather than merely their subtype. Our model cannot distinguish between the possibility that the precise identity of the imprinting strain primarily determines later VE, or if individuals’ responses to vaccination are shaped by a particular succession of exposures, which would be common to others in the same birth cohort. Regardless, variation in VE between birth cohorts appears substantial and presents a challenge for vaccination strategies (Erbelding et al., 2018).
The use of different influenza vaccines in MESA during this period is unlikely to affect the results. Most people enrolled in the study received the standarddose inactivated influenza vaccine (IIVSD) (Figure 1—figure supplement 7). However, between 9–26% of vaccinated children <18 years old received the live attenuated influenza vaccine (LAIV) between the 2008–2009 and 2015–2016 seasons (Figure 1—figure supplement 7B). A separate study of LAIV VE in the United States found that LAIV and IIVSD recipients who were repeat vaccinees (as most children were) had similar VE, and thus we do not expect that LAIV receipt should affect VE estimates (McLean et al., 2018). Similarly, 1–15% of adults ≥65 years old received the highdose inactivated influenza vaccine (IIVHD) between 2009–2010 and 2017–2018 (Figure 1—figure supplement 7C). This vaccine is 20% more effective than IIVSD (Lee et al., 2018). Therefore, the changing ratio of IIVHD to IIVSD recipients over time might bias results toward cohortspecific VE in models that include people ≥65 years old. However, when we fitted to cases between 15–64 years old, we found that cohortspecific VE still performed best. Thus, we conclude that changes in IIVHD coverage do not substantially influence results.
Potential methodological biases and the vaccination history of the study population nonetheless suggest caution in interpreting VE estimates. Selection and misclassification biases can arise when using influenza testnegative controls to control for differences in healthcareseeking behavior (Lewnard et al., 2018; Sullivan et al., 2016). Because we also use testnegative controls to set our null expectation for the distribution of cases among birth cohorts, our VE estimates are subject to these biases as well. Moreover, since 45% of the study population is vaccinated, and most participants are frequent vaccinees (Figure 1—figure supplement 6), we are limited in our ability to generalize the VE results to populations with much lower vaccination coverage and/or a shorter history of vaccination. Frequent vaccination has been associated with reduced VE (McLean et al., 2014; Saito et al., 2018; Skowronski et al., 2016). Therefore, the model may underestimate VE in less vaccinated populations. Underestimation of VE could also occur if unvaccinated people are protected by vaccination in the preceding season. Inference might also be distorted if vaccination has large indirect effects, which our model does not consider. Finally, our analysis is worth repeating in a larger population to reduce stochastic influences. We observed an unusually high H1N1 VE in the 1998–2002 birth cohort. Because we restricted cases in this analysis to people ≥15 years old, this VE estimate included data from only the 2013–2014 and 2015–2016 influenza seasons. No H1N1 cases among vaccinated or unvaccinated individuals were observed in this birth cohort in those seasons, which led to the high VE. This might have been due to particular epidemic dynamics in MESA.
Incorporating differences in susceptibility based on early exposures might improve methods to forecast influenza seasons. The analysis of the relative risk of infection during the first half of each season shows more variation in the susceptible age groups from season to season than previously estimated (Worby et al., 2015). While the smaller sample sizes in MESA introduce uncertainty, the correlation between the relative risk and total fraction of cases indicates that the age groups driving epidemics indeed change from season to season. Because the contact structure of the population is probably constant over influenza seasons, variation in the driving age group may be determined by fluctuating susceptibility, which is partly determined by early infections. Therefore, incorporating information on early exposure history into epidemic models may allow for more accurate identification of atrisk populations and finescale epidemic timing.
While the rate of antigenic evolution affects the rate at which different populations become susceptible to infection, we propose that the heterogeneity in susceptibility observed here may also drive antigenic evolution. Heterogeneity in susceptibility implies that influenza viruses face different selective pressures in groups with different exposure histories (Cobey and Koelle, 2008; Nakajima et al., 2000). Recent research consistent with this hypothesis has shown that sera isolated from different individuals can select for distinct escape mutants (Lee et al., 2019). More careful study of how immune memory to influenza evolves from infection and vaccination might improve understanding of influenza’s evolution.
Code and data availability
The code and data used to perform the analyses for this project are available at https://github.com/cobeylab/FluAImprinting (Arevalo et al., 2019; copy archived at https://github.com/elifesciencespublications/FluAImprinting).
Appendix 1
Vaccination coverage
Seasonal influenza vaccination coverage for MESA was collected by age in the 2007–2008 through 2017–2018 seasons using a regional immunization registry (Irving et al., 2009). Monovalent vaccination coverage for the 2009–2010 season was obtained by directly measuring monovalent vaccination coverage in enrolled individuals and fitting a smoothing spline to the data (Figure 1—figure supplement 3). We also calculated the fraction of people who received different vaccination formulations, and found that most people received IIVSD (Figure 1—figure supplement 7).
Correlation of relative risk and fraction of cases
To assess whether an age group’s relative risk correlates with the fraction of cases of that age group in the same season, we performed a rank correlation analysis. For each season, we ranked each age group based on its relative risk and the fraction of cases within that age group. If age groups were tied in either relative risk or fraction of cases, we assigned them the average rank they spanned. We then calculated the Pearson’s correlation coefficient for these two rankings. A positive correlation indicates that an age group with a large relative risk compared to other age groups will also make up a large proportion of cases compared to other age groups.
Seasonal intensity
We defined the intensity of an influenza season as the product of the mean fraction of patients with influenzalike illness (ILI) and the percentage of specimens testing positive for influenza A that season,
where ${\text{ILI}}_{t}$ is the mean fraction of all patients with ILI in season $t$ adjusted for differences in state population size (CDC, 2018), ${F}_{t}$ is the number of respiratory specimens testing positive for influenza A in season $t$, and ${N}_{t}$ is the total number of respiratory specimens tested in season $t$. For seasons 1997–1998 through 2017–2018, these data were obtained from the U.S. Outpatient Influenzalike Illness Surveillance Network (ILINet) and the World Health Organization/National Respiratory and Enteric Virus Surveillance System (WHO/NREVSS) Collaborating Labs (CDC, 2018). For seasons 1976–1977 through 1996–1997 when seasonal ILI data were not available, we assumed that the mean ILI was equal to the mean of mean ILI for seasons 1997–1998 through 2017–2018. We obtained data on ${F}_{t}$ and ${N}_{t}$ for these seasons from Thompson et al., 2003. We then normalized the intensity of each season by dividing ${I}_{t}$ by the mean of ${I}_{t}$ from the 1976–1977 through 2017–2018 seasons. For all seasons before 1976–1977, for which no seasonal intensity data were available, we assumed that the intensity of influenza A equalled the mean intensity of seasons 1976–1977 through 2017–2018.
Fraction of season experienced
We defined the fraction of a given influenza season ${f}_{w,t}$ occurring in week $w$ of season $t$ as
where ${\text{ILI}}_{w,t}$ is the weighted fraction of all patients with ILI in week $w$ of season $t$, ${F}_{w,t}$ is the number of respiratory specimens testing positive for influenza A in week $w$ of season $t$, and ${N}_{w,t}$ is the number of specimens tested in week $w$ of season $t$. $\sum _{{w}^{\mathrm{\prime}}={w}_{0}}^{{w}_{f}}\frac{{\text{ILI}}_{{w}^{\mathrm{\prime}},t}{F}_{{w}^{\mathrm{\prime}},t}}{{N}_{{w}^{\mathrm{\prime}},t}}$ is the product of ILI and the fraction of positive influenza A specimens summed over all weeks of the influenza season $t$, where ${w}_{0}$ is the first week of the season and ${w}_{f}$ is the final week of the season. We defined the start of the influenza season as week 40 of the calendar year, which usually falls at the beginning of October. For seasons before 1997–1998, where weekly data is unavailable, we assumed that the fraction of the influenza season experienced in week $w$ was
where ${\overline{f}}_{w,t}$ is the mean fraction of the influenza season experienced at week $w$ for all seasons after 1997–1998.
We used ${f}_{w,t}$ to calculate the fraction of an influenza season experienced by an individual born in year $y$. We assumed that people born in year $y$ were born uniformly throughout the year. We also assumed that due to maternal immunity, infants did not experience immunizing exposure to influenza until they were at least 180 days old. Let ${p}_{y,w,t}$ be the proportion of individuals born in year $y$ that are over 180 days old in week $w$ of season $t$ and ${\gamma}_{y,t}$ be the fraction of individuals born in year $y$ exposed to influenza season $t$. Then
Calculating the fraction unexposed
When calculating imprinting probabilities, we used an iterative approach to calculate ${U}_{y,t}$ the fraction of people in birth cohort $y$ who were unexposed at the start of season $t$. First, we assumed that in the first year of life (i.e., when $t=y$), the entire population was unexposed. For seasons where $t>y$, the fraction unexposed depends on the fraction unexposed at the start of the previous season (${U}_{y,t1}$) and the attack rate in the previous season (${a}_{y,t1}$). Thus,
Birth year distribution of the study population
In order to convert the demographic age distribution to a birth year distribution, we assumed that people were born uniformly throughout the year. We defined a breakpoint date prior to the start of the enrollment period based on when the the 6 monthold age limit cutoff was set (e.g., if the breakpoint date was Ocotober 1, then infants had to be 6 months old by that date to be eligible for enrollment). We used this date to calculate the fraction of people of age $a$ in season $t$ who were born in year $ty$ (${d}_{a,t,y}^{1}$) or year $ty1$ (${d}_{a,t,y}^{2}$). A fraction ${d}_{a,t,y}^{1}$ of the total population of age $a$ in season $t$ was assigned to birth year $ty$ and ${d}_{a,t,y}^{2}$ to $ty1$. Breakpoint dates ranged from September one through January one with the exception of the pandemic season which had a breakpoint date of May 1, 2009. The start of the enrollment period ranged from December to January with the exception of the 2009 pandemic season, when enrollment began in May 2009.
Fraction of birth cohort with specific age
When converting an agespecific parameter to a birthcohortspecific parameter as in Materials and Methods ‘Agespecific factors’, we considered that each birth cohort had two possible ages ($a1$ and $a2$) in a given season $t$. We assumed that people were born uniformly throughout the year and used the same breakpoint dates described above in ‘Birth year distribution of the study population.’ Then, $f(a1,t,y)$, the fraction of people born in year $y$ who were age $a1$ in season $t$, is the fraction of people born in year $y$ who were born on a date prior to the breakpoint date for season $t$. Finally, $f(a2,t,y)$, the fraction of people born in year $y$ who were age $a2$ in season $t$, is 1  $f(a1,t,y)$.
Agespecific rates of approachment, enrollment, and nursing home residence
The relative rates at which different age groups were approached for study enrollment (the approachment rate, ${x}_{\text{approach}}$) varied between seasons. Similarly, the relative rates at which different age groups enrolled in the study after being approached (the enrollment rate, ${x}_{\text{enroll}}$) also varied between seasons. Enrollment rates also varied between vaccinated and unvaccinated individuals.
We defined the approachment rate of an age group $g$ in season $t$ as
where ${N}_{\text{approached},t,g}$ is the number of people in age group $g$ during season $t$ who were approached for enrollment, and ${N}_{\text{MAARI},t,g}$ is the total number of people in the MESA cohort who presented with MAARI regardless of whether they were approached for enrollment.
We defined the enrollment rate of age group $g$ in season $t$ with vaccination status $v$ as
where ${N}_{\text{enrolled},t,g,v}$ is the number of people in age group $g$ with vaccination status $v$ who enrolled in the study in season $t$, and ${N}_{\text{approached},t,g,v}$ is the number of people in age group $g$ with vaccination status $v$ who were approached for enrollment in season $t$. Due to differences in data collection for the 2007–2008 and 2008–2009 seasons, complete vaccination records for eligible unenrolled individuals were not available, so we assumed that the enrollment rates by age group and vaccination status in those seasons were equal to the mean enrollment rate for each age group and vaccination status across all other seasons.
We normalized ${x}_{\text{approach},t,g}$ by the value of ${x}_{\text{approach},t,g}$ for the reference age group (i.e., 20–29 yearolds) in each season. Similarly, we normalized ${x}_{\text{enroll,},t,g,v}$ to the value of ${x}_{\text{enroll,},t,g,v}$ for unvaccinated members of the reference age group for each season. This yielded the relative approachment and enrollment rates ${x}_{\text{approach},t,g}^{\prime}$ and ${x}_{\text{enroll},t,g,v}^{\prime}$. We converted both ${x}_{\text{approach},t,g}^{\prime}$ and ${x}_{\text{enroll},t,g,v}^{\prime}$ to birthyear specific covariates (i.e. covariates by $y$ instead of $g$) using the same procedure described in Materials and Methods: ‘Agespecific factors’ (Equation 9).
Finally, the study did not enroll residents of skilled nursing facilities with dedicated medical staff. To account for this, we estimated the proportion of the population in nursing facilities within the study area. We obtained the total number of beds in nursing facilities within MESA in 2018 from the Wisconsin Department of Health Services (WDHS, 2018). We assumed that the total number of beds did not change between 2007–2008 and 2017–2018. We also used data from the Centers for Medicare and Medicaid Services (CMS, 2015) to calculate the percent of beds occupied in Wisconsin nursing facilities by age for 2011 through 2014 and the fraction of people in a nursing facility by age group. We used a smoothing spline to obtain the fraction of people of a given age in a nursing facility. For seasons before 2010–2011 and after 2013–2014, we assumed that the fraction of people of a given age in a nursing facility was the average value for 2011–2014. Given the total population of the study area by age and season, we calculated the fraction of people in a given age $a$ and season $t$ who are in nursing facilities (${k}_{t,a}$). We converted this to a covariate by birth year (${k}_{t,y})$ using the same procedure described in Materials and Methods: ‘Agespecific factors’ (Equation 9).
Evaluation of predictive power
To evaluate the predictive power of each model, we performed leaveoneout crossvalidation. We excluded data from each season and fitted our models to the remaining seasons. Because our goal was to evaluate how well our models predict seasonal epidemics, we excluded the 2009 pandemic season from all crossvalidation analyses. We also did not test seasonal VE models with crossvalidation since estimation of seasonal VE requires data from the excluded season.
Let ${n}_{s,t,y,v}$ be the number of observed cases of subtype $s$ in season $t$ among people born in year $y$ with vaccination status $v$. Let ${p}_{M,s,{t}^{},y,v}^{{t}^{}}$ be the multinomial probability of a case of subtype $s$ in season ${t}^{}$ among people born in year $y$ with vaccination status $v$ under model $M$ fitted to all seasons except ${t}^{}$. Let ${N}_{s,{t}^{}}$ be the total number of cases of subtype $s$ in season ${t}^{}$. Then, the predicted number of cases of subtype $s$ in season ${t}^{}$ among people born in year $y$ with vaccination status $v$ under model $M$ fitted to all seasons except ${t}^{}$ is
The sum squared prediction error for model $M$ in season ${t}^{}$ is given by
where ${y}_{\text{max},t}$ is the maximum possible birth year in season $t$.
We evaluated each model $M$ by its meansquared prediction error across all excluded seasons ${t}^{}$. Let ${T}^{}$ be the set of all seasons left out and $X$ be the size of ${T}^{}$. Then the meansquared prediction error for model $M$ is
Sensitivity to uncertainty in ILI and the frequency of influenza A
Because of the lack of ILI data prior to the 1997–1998 season and the lack of data on the frequency of influenza A prior to the 1976–1977 season, we used simulated datasets to test the robustness of our results. We randomly assigned ILI values from the 1997–1998 through 2017–2018 seasons to every season which did not have a measured ILI value. Similarly, we randomly assigned values of the frequency of influenza A from the 1976–1977 through 2017–2018 seasons to every season which did not have a measured value for the frequency of influenza A. We created 10000 simulated datasets using this procedure and recalculated imprinting probabilities for each dataset (Figure 3—figure supplement 2). In the period of H1N1 and H3N2 cocirculation, the maximum H1N1 imprinting probability for a particular birth cohort corresponds to the minimum H3N2 imprinting probability for that cohort and viceversa. Therefore, to generate datasets representing the upper and lower bounds of imprinting probabilities, we assigned imprinting probabilities from the simulation with either the lowest or highest H1N1 imprinting probability to each birth cohort in each season. We then fitted our models to these two datasets and evaluated model fit using cAIC.
Sensitivity to age groups
To test whether our models were sensitive to our choice of age groups, we fit revised versions of all our models with different age groups:
0–4 years, 5–17 years, 18–49 years, 50–64 years, and ≥65 years
0–4 years, 5–17 years, 18–64 years, and ≥65 years
These models with alternate age groupings were fitted to case data to determine whether our findings on the strength of protection from initial H1N1 and H3N2 infection were altered compared to fits using the higherresolution age grouping described above (Appendix 2—table 4).
Sensitivity to sampling effort
Sampling effort was not even across seasons, and analysis of the number of influenza cases per sampling day suggested that a significant number of cases may have been missed at the beginning or end of a specific seasons (Figure 5—figure supplement 3). As our analysis of relative risk indicates, different age groups are more susceptible during different points in the influenza season, and therefore missing data from the beginning or end of a season could introduce bias in the observed age distribution of cases.
To adjust for this, we simulated cases for seasons which did not have sufficient sampling of the start or end of the epidemic period. We considered a season sufficiently sampled if the sampling period spanned the start and end of the epidemic. We expect that the start and end of the epidemic have few cases per sampling day, and we therefore defined sufficiently sampled seasons as seasons where
the number of cases per sampling day in the first week of the enrollment period is < 1 and
the number of cases per sampling day in the last week of the enrollment period is < 1.
To extrapolate the start of a season, we linearly regressed the number of cases of the dominant subtype per sampling day for each week of the first half of the season and identified the week of the season where the number of cases per sampling day fell below 1 (${t}_{0}$). For each week from ${t}_{0}$ to the first week of the enrollment period, we used the regression of cases per sampling day to calculate the number of cases we expected to see in each week. Summing these yields the total number of unsampled cases at the beginning of the season. We used a similar approach to extrapolate the number of unsampled cases at the end of a season by instead regressing cases per sampling day for each week of the latter half of the season. We did not extrapolate cases for the 2010–2011 season for this analysis since the observed number of cases per sampling day did not follow a typical epidemic curve.
We stochastically assigned a birth year and vaccination status to these cases according to a multinomial distribution. The success probabilities of this distribution were set using the age distribution of cases of the dominant subtype from the first two weeks of the enrollment period (if extrapolating the beginning of a season) or the last two weeks of the enrollment period (if extrapolating the end of a season). Specifically, we calculated the distribution of observed cases in the first or last two weeks of the enrollment period among nine age groups (Materials and Methods: ‘Agespecific factors’) with their associated vaccination status. We then assumed that cases were uniformly distributed among all birth years contained in an age group. This yielded a set of probabilities describing the probability of infection given birth year and vaccination status in a specific season.
We sampled from these multinomial distributions 1000 times to obtain augmented datasets that combined observed and extrapolated cases. For each replicate simulation, we calculated the age distribution of cases for the entire season as well as the relative risk of each age group in the first versus the latter half of the season (Figure 2—figure supplement 2B). We also fitted the bestfitting model to 100 of these datasets (excluding the 2010–2011 season) and recorded the estimated imprinting strength for both H1N1 and H3N2 for each fit (Figure 5—figure supplement 4).
Calculating excess cases
We defined excess cases for a given birth cohort or age group as the number of observed cases for that birth cohort or age group minus the number of predicted cases for that age group. Predictions were obtained by multiplying the multinomial probabilities produced by the model by the total number of cases of the dominant subtype in each season. A 95% prediction interval was obtained by simulating 10000 datasets using the multinomial probabilities from a specific model (Figure 6—figure supplement 2, Figure 7).
To test whether recent infection might be confounding our estimates, we calculated the correlation between excess cases in each birth cohort in each season with excess cases of the same birth cohort in the next season with the same dominant subtype (Figure 5—figure supplement 5).
Appendix 2
Supplementary tables and figures
Data availability
Code and data for calculation of imprinting probabilities, vaccination coverage, and model fitting are available on GitHub at https://github.com/cobeylab/FluAImprinting (copy archived at https://github.com/elifesciencespublications/FluAImprinting).
References

Molecular and antigenic evolution of human influenza A/H3N2 viruses in Quebec, Canada, 20092011Journal of Clinical Virology 53:88–92.https://doi.org/10.1016/j.jcv.2011.09.016

Update: influenza activity  united states, September 29, 2013february 8, 2014MMWR. Morbidity and Mortality Weekly Report 63:148–154.

Effectiveness of inactivated influenza vaccines varied substantially with antigenic match from the 20042005 season to the 20062007 seasonThe Journal of Infectious Diseases 199:159–167.https://doi.org/10.1086/595861

Influenzalike illness, the time to seek healthcare, and influenza antiviral receipt during the 20102011 influenza seasonUnited statesThe Journal of Infectious Diseases 210:535–544.https://doi.org/10.1093/infdis/jiu224

Prevalence of antibodies against seasonal influenza A and B viruses in children in netherlandsClinical and Vaccine Immunology 18:469–476.https://doi.org/10.1128/CVI.0039610

Birth cohort effects in influenza surveillance data: evidence that first influenza infection affects later InfluenzaAssociated illnessThe Journal of Infectious Diseases 220:820–829.https://doi.org/10.1093/infdis/jiz201

WebsiteFluView national, regional, and state level outpatient illness and viral surveillanceAccessed October 23, 2018.

Immune history and influenza virus susceptibilityCurrent Opinion in Virology 22:105–111.https://doi.org/10.1016/j.coviro.2016.12.004

Capturing escape in infectious disease dynamicsTrends in Ecology & Evolution 23:572–577.https://doi.org/10.1016/j.tree.2008.06.008

A serologic recapitulation of past experiences with influenza A; antibody response to monovalent vaccineThe Journal of Experimental Medicine 104:85–97.https://doi.org/10.1084/jem.104.1.85

Predetermination by infection and by vaccination of antibody response to influenza virus vaccinesThe Journal of Experimental Medicine 106:835–850.https://doi.org/10.1084/jem.106.6.835

Substantial morbidity and mortality associated with pandemic A/H1N1 influenza in Mexico, winter 20132014: gradual age shift and severityPLOS Currents 6:ecurrents.outbreaks.a855a92f19db1d90ca955f5e908d6631.https://doi.org/10.1371/currents.outbreaks.a855a92f19db1d90ca955f5e908d6631

Efficacy of highdose versus standarddose influenza vaccine in older adultsNew England Journal of Medicine 371:635–645.https://doi.org/10.1056/NEJMoa1315727

A universal influenza vaccine: the strategic plan for the national institute of allergy and infectious diseasesThe Journal of Infectious Diseases 218:347–354.https://doi.org/10.1093/infdis/jiy103

Disquisitions of original antigenic sin. I. evidence in manThe Journal of Experimental Medicine 124:331–345.https://doi.org/10.1084/jem.124.3.331

Antigenic maps of influenza A(H3N2) Produced with human antisera obtained after primary infectionJournal of Infectious Diseases 213:31–38.https://doi.org/10.1093/infdis/jiv367

On the doctrine of original antigenic sinProceedings of the American Philosophical Society 104:572–578.

Focused antibody response to influenza linked to antigenic driftJournal of Clinical Investigation 125:2631–2645.https://doi.org/10.1172/JCI81104

Evidence of Bias in estimates of influenza vaccine effectiveness in seniorsInternational Journal of Epidemiology 35:337–344.https://doi.org/10.1093/ije/dyi274

Functional status is a confounder of the association of influenza vaccine and risk of all cause mortality in seniorsInternational Journal of Epidemiology 35:345–352.https://doi.org/10.1093/ije/dyi275

Influenza vaccine effectiveness in the united states during the 20152016 seasonNew England Journal of Medicine 377:534–543.https://doi.org/10.1056/NEJMoa1700153

Validation of health event capture in the Marshfield epidemiologic study areaClinical Medicine & Research 13:103–111.https://doi.org/10.3121/cmr.2014.1246

Measurement of vaccine direct effects under the TestNegative designAmerican Journal of Epidemiology 187:2686–2697.https://doi.org/10.1093/aje/kwy163

Impact of repeated vaccination on vaccine effectiveness against influenza A(H3N2) and B during 8 seasonsClinical Infectious Diseases 59:1375–1385.https://doi.org/10.1093/cid/ciu680

Tecumseh study of illness. XIII. Influenza infection and disease, 19761981American Journal of Epidemiology 121:811–822.https://doi.org/10.1093/oxfordjournals.aje.a114052

Immunogenicity and reactogenicity of 1 versus 2 doses of trivalent inactivated influenza vaccine in vaccinenaive 58yearold childrenThe Journal of Infectious Diseases 194:1032–1039.https://doi.org/10.1086/507309

Beyond antigenic match: possible AgentHost and Immunoepidemiological influences on influenza vaccine effectiveness during the 20152016 season in CanadaThe Journal of Infectious Diseases 216:1487–1500.https://doi.org/10.1093/infdis/jix526

Prevention and control of influenza: recommendations of the advisory committee on immunization practices (acip)MMWR. Recommendations and Reports : Morbidity and Mortality Weekly Report. Recommendations and Reports 55:1–42.

Theoretical basis of the TestNegative study design for assessment of influenza vaccine effectivenessAmerican Journal of Epidemiology 184:345–353.https://doi.org/10.1093/aje/kww064

The infection attack rate and severity of 2009 pandemic H1N1 influenza in Hong KongClinical Infectious Diseases : An Official Publication of the Infectious Diseases Society of America 51:1184–1191.https://doi.org/10.1086/656740

Estimated incidence and number of outpatient visits for seasonal influenza in 20152016 in Beijing, ChinaEpidemiology and Infection 145:3334–3344.https://doi.org/10.1017/S0950268817002369

20142015 influenza vaccine effectiveness in the united states by vaccine typeClinical Infectious Diseases 63:1564–1573.https://doi.org/10.1093/cid/ciw635
Article and author information
Author details
Funding
National Institutes of Health (DP2AI117921)
 Sarah Cobey
National Institutes of Health (F32AI14517701)
 Philip Arevalo
National Institutes of Health (HHSN272201400005C)
 Sarah Cobey
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Jennifer King and Carla Rottscheit for their assistance in providing the data for this study and Rohan Dandavati for compiling historical data on subtype frequencies and ILI. We thank Marcos Vieira and Kangchon Kim for their assistance in calculating imprinting probabilities. This work was completed with computational resources provided by the University of Chicago’s Research Computing Center. Funding for this project was provided by the National Institutes of Health (NIH), Department of Health and Human Services, under grant DP2AI117921 (to SC), CEIRS Contract No. HHSN272201400005C (to SC), and NRSA Fellowship F32AI14517701 (to PA). HQM receives research support from Seqirus unrelated to this work. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Ethics
Human subjects: Study procedures for the vaccine effectiveness study was approved by the IRB at the Marshfield Clinic Research Institute. Informed consent was obtained from all participants at the time of enrollment into the vaccine effectiveness study. This analysis was subsequently approved by the Marshfield Clinic Research Institute IRB with a waiver of informed consent. The analysis of data was approved by the University of Chicago IRB under protocol number IRB171134CR001.
Version history
 Received: July 10, 2019
 Accepted: June 29, 2020
 Accepted Manuscript published: July 7, 2020 (version 1)
 Version of Record published: July 17, 2020 (version 2)
Copyright
© 2020, Arevalo et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,998
 views

 306
 downloads

 45
 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Epidemiology and Global Health
Background:
Comorbidity with type 2 diabetes (T2D) results in worsening of cancerspecific and overall prognosis in colorectal cancer (CRC) patients. The treatment of CRC per se may be diabetogenic. We assessed the impact of different types of surgical cancer resections and oncological treatment on risk of T2D development in CRC patients.
Methods:
We developed a populationbased cohort study including all Danish CRC patients, who had undergone CRC surgery between 2001 and 2018. Using nationwide register data, we identified and followed patients from date of surgery and until new onset of T2D, death, or end of followup.
Results:
In total, 46,373 CRC patients were included and divided into six groups according to type of surgical resection: 10,566 RightNoChemo (23%), 4645 RightChemo (10%), 10,151 LeftNoChemo (22%), 5257 LeftChemo (11%), 9618 RectalNoChemo (21%), and 6136 RectalChemo (13%). During 245,466 personyears of followup, 2556 patients developed T2D. The incidence rate (IR) of T2D was highest in the LeftChemo group 11.3 (95% CI: 10.4–12.2) per 1000 personyears and lowest in the RectalNoChemo group 9.6 (95% CI: 8.8–10.4). Betweengroup unadjusted hazard ratio (HR) of developing T2D was similar and nonsignificant. In the adjusted analysis, RectalNoChemo was associated with lower T2D risk (HR 0.86 [95% CI 0.75–0.98]) compared to RightNoChemo.
For all six groups, an increased level of body mass index (BMI) resulted in a nearly twofold increased risk of developing T2D.
Conclusions:
This study suggests that postoperative T2D screening should be prioritised in CRC survivors with overweight/obesity regardless of type of CRC treatment applied.
Funding:
The Novo Nordisk Foundation (NNF17SA0031406); TrygFonden (101390; 20045; 125132).

 Epidemiology and Global Health
We discuss 12 misperceptions, misstatements, or mistakes concerning the use of covariates in observational or nonrandomized research. Additionally, we offer advice to help investigators, editors, reviewers, and readers make more informed decisions about conducting and interpreting research where the influence of covariates may be at issue. We primarily address misperceptions in the context of statistical management of the covariates through various forms of modeling, although we also emphasize design and model or variable selection. Other approaches to addressing the effects of covariates, including matching, have logical extensions from what we discuss here but are not dwelled upon heavily. The misperceptions, misstatements, or mistakes we discuss include accurate representation of covariates, effects of measurement error, overreliance on covariate categorization, underestimation of power loss when controlling for covariates, misinterpretation of significance in statistical models, and misconceptions about confounding variables, selecting on a collider, and p value interpretations in covariateinclusive analyses. This condensed overview serves to correct common errors and improve research quality in general and in nutrition research specifically.