Antigenic and genetic evolution of seasonal influenza A(H3N2) viruses, 1997 – 2019.

A-B. Temporal phylogenies of hemagglutinin (H3) and neuraminidase (N2) gene segments. Tip color denotes the Hamming distance from the root of the tree, based on the number of substitutions at epitope sites in H3 (N = 129 sites) and N2 (N = 223 sites). “X” marks indicate the phylogenetic positions of US recommended vaccine strains. C-D. Seasonal genetic and antigenic distances are the mean distance between A(H3N2) viruses circulating in the current season t versus the prior season (t – 1), measured by C. four sequence-based metrics (HA receptor binding site (RBS), HA stalk footprint, HA epitope, and NA epitope) and D. hemagglutination inhibition (HI) titer measurements. E. The Shannon entropy of H3 and N2 local branching index (LBI) values in each season. Vertical bars in C, D, and E are 95% confidence intervals of seasonal estimates from five bootstrapped phylogenies.
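The epitope-based distances described in this caption can be illustrated with a minimal sketch. It assumes aligned amino acid sequences of equal length and a list of 0-based epitope site indices; the function names, the toy sequences, and the unweighted mean over all cross-season pairs are illustrative assumptions, not the authors' pipeline.

```python
from itertools import product
from statistics import mean


def epitope_distance(seq_a, seq_b, epitope_sites):
    """Hamming distance restricted to the given 0-based site indices."""
    return sum(seq_a[i] != seq_b[i] for i in epitope_sites)


def seasonal_distance(current, prior, epitope_sites):
    """Unweighted mean epitope distance over all cross-season sequence pairs."""
    return mean(
        epitope_distance(a, b, epitope_sites)
        for a, b in product(current, prior)
    )


# Toy aligned sequences (hypothetical, far shorter than real H3 or N2).
sites = [1, 3, 4]
season_t = ["MKTIA", "MKTLA"]
season_t_minus_1 = ["MRTIA", "MKTIV"]
print(seasonal_distance(season_t, season_t_minus_1, sites))  # 1.5
```

The same `epitope_distance` helper, applied to a tip sequence and the root sequence, would give the kind of root-to-tip distance used for tip color in panels A-B.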

Evolutionary indicators of seasonal viral fitness.

Evolutionary indicators are labeled by the influenza gene for which data are available (hemagglutinin, HA or neuraminidase, NA), the type of data they are based on, and the component of influenza fitness they represent. Table format is adapted from Huddleston et al., 2020 [35].

Annual influenza A(H3N2) epidemics in the United States, 1997 – 2019.

A. Weekly incidence of influenza A(H3N2) (red), A(H1N1) (blue), and B (green) averaged across ten HHS regions (Region 1: Boston; Region 2: New York City; Region 3: Washington, DC; Region 4: Atlanta; Region 5: Chicago; Region 6: Dallas; Region 7: Kansas City; Region 8: Denver; Region 9: San Francisco; Region 10: Seattle). Time series are 95% confidence intervals of regional incidence estimates. Incidences are the proportion of influenza-like illness (ILI) visits among all outpatient visits, multiplied by the proportion of respiratory samples testing positive for each influenza type/subtype. Vertical dashed lines indicate January 1 of each year. B. Intensity of weekly influenza A(H3N2) incidence in ten HHS regions. White tiles indicate weeks when influenza-like-illness data or virological data were not reported. Weekly time series for A(H1N1) and B are in Figure S5.
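The incidence definition in this caption is a product of two proportions, sketched below for a single region and week. The function and argument names are hypothetical, and returning `None` for unreported weeks is an assumption that mirrors the white tiles in panel B.

```python
def subtype_incidence(ili_visits, total_visits, subtype_positive, samples_tested):
    """ILI proportion multiplied by the subtype-specific percent positive.

    Returns None when either surveillance stream was not reported.
    """
    if not total_visits or not samples_tested:
        return None
    ili_proportion = ili_visits / total_visits
    positive_proportion = subtype_positive / samples_tested
    return ili_proportion * positive_proportion


# Hypothetical week: 250 ILI visits out of 10,000 outpatient visits,
# and 120 A(H3N2)-positive samples out of 400 tested (about 0.0075).
print(subtype_incidence(250, 10_000, 120, 400))
```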

Seasonal metrics of A(H3N2) epidemic dynamics.

Epidemic metrics are defined and labeled by which outcome category they represent.

A(H3N2) antigenic drift correlates with larger, more intense annual epidemics.

A(H3N2) epidemic size, peak incidence, epidemic intensity, and transmissibility (effective reproduction number, Rt) increase with antigenic drift, measured by A. hemagglutinin (H3) epitope distance, B. neuraminidase (N2) epitope distance, and C. hemagglutination inhibition (HI) log2 titer distance. Seasonal antigenic drift is the mean titer distance or epitope distance between viruses circulating in the current season t versus the prior season (t – 1) or two prior seasons (t – 2). Distances are scaled to aid in direct comparison of evolutionary indicators. Point color indicates the dominant influenza A virus (IAV) subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant), and vertical bands are 95% confidence intervals of regional estimates. Seasonal mean A(H3N2) epidemic metric values were fit as a function of antigenic or genetic distance using LMs (epidemic size, peak incidence), Gaussian GLMs (effective Rt: inverse link), or Beta GLMs (epidemic intensity) with 1000 bootstrap resamples.

The proportion of influenza positive samples typed as A(H3N2) increases with antigenic drift.

A-B. Seasonal A(H3N2) subtype dominance increases with H3 and N2 epitope distance. Seasonal epitope distance is the mean epitope distance between viruses circulating in the current season t versus the prior season (t – 1) or two prior seasons (t – 2). Distances were scaled to aid in direct comparison of evolutionary indicators. Point color indicates the dominant influenza A virus (IAV) subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant), and vertical bands are 95% confidence intervals of regional estimates. Seasonal mean A(H3N2) dominance was fit as a function of H3 or N2 epitope distance using Beta GLMs with 1000 bootstrap resamples. C-D. Regional patterns of influenza type and subtype incidence during two seasons when A(H3N2) was nationally dominant. C. Widespread A(H3N2) dominance during 2003-2004 after the emergence of a novel antigenic cluster, FU02 (A/Fujian/411/2002-like strains). D. Spatial heterogeneity in subtype circulation during 2007-2008, a season with low A(H3N2) antigenic novelty relative to the prior season. Pie charts represent the proportion of influenza positive samples typed as A(H3N2) (red), A(H1N1) (blue), or B (green) in each HHS region. Data for Region 10 (purple) were not available for seasons prior to 2009. The sizes of regional pie charts are proportional to the total number of influenza positive samples.

The effects of influenza A(H1N1) and B epidemic size on A(H3N2) epidemic burden.

A. Influenza A(H1N1) epidemic size negatively correlates with A(H3N2) epidemic size, peak incidence, transmissibility (effective reproduction number, Rt), and epidemic intensity. B. Influenza B epidemic size does not significantly correlate with A(H3N2) epidemic metrics. Point color indicates the dominant influenza A virus (IAV) subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant), and vertical and horizontal bands are 95% confidence intervals of regional estimates. Seasonal mean A(H3N2) epidemic metrics were fit as a function of mean A(H1N1) or B epidemic size using Gaussian GLMs (inverse link: epidemic size, peak incidence; log link: effective Rt) or Beta GLMs (epidemic intensity) with 1000 bootstrap resamples.

Variable importance rankings from conditional inference random forest models predicting A(H3N2) epidemic dynamics.

Ranking of variables in predicting regional A(H3N2) A. epidemic size, B. peak incidence, C. effective reproduction number, Rt, D. epidemic intensity, and E. subtype dominance. Each forest was created by generating 3,000 regression trees from a repeated leave-one-season-out cross-validated sample of the data. Variables are ranked by their conditional permutation importance, with differences in prediction accuracy scaled by the total (null model) error. Black error bars are 95% confidence intervals of conditional permutation scores. Abbreviations: HI titer = hemagglutination inhibition log2 titer distance, t – 1 = one-season lag, t – 2 = two-season lag, LBI = local branching index, peak = peak incidence, distance to vaccine = epitope distance between currently circulating strains and the recommended vaccine strain, VE = vaccine effectiveness.

Observed versus predicted values of seasonal region-specific A(H3N2) A. epidemic size, B. peak incidence, C. effective reproduction number, Rt, D. epidemic intensity, and E. subtype dominance from conditional random forest models.

Results are facetted by HHS region and epidemic metric. Point color and size correspond to the degree of hemagglutinin (H3) epitope distance in viruses circulating in season t versus viruses circulating two seasons ago (t – 2). Large, yellow points indicate seasons with high antigenic novelty, and small blue points indicate seasons with low antigenic novelty. Regional Spearman’s correlation coefficients and associated P-values are in the top left section of each facet.

Predictors of seasonal A(H3N2) epidemic burden, transmissibility, intensity, and subtype dominance.

Variables retained in the best fit model for each epidemic outcome were determined by BIC.

Comparison of seasonal antigenic drift measured by substitutions at hemagglutinin (H3) epitope sites and HI titer measurements, from 1997-1998 to 2018-2019.

We used Spearman correlation tests to measure associations between H3 epitope distance and HI titer distance at A. one-season lags and B. two-season lags. Seasonal antigenic distance is the mean distance between strains circulating in season t and strains circulating in the prior season t – 1 year (one season lag) or two seasons ago t – 2 years (two season lag). Seasonal distances are scaled because epitope distance and HI titer distance use different units of measurement. Point labels indicate the current influenza season, and point color denotes the relative timing of influenza seasons, with earlier seasons shaded dark purple (e.g., 1997-1998) and later seasons shaded light yellow (e.g., 2018-2019). H3 epitope distance and HI titer (tree model) distance at two-season lags capture expected “jumps” in antigenic drift during key seasons previously associated with major antigenic transitions [32], such as the SY97 cluster seasons (1997-1998, 1998-1999, 1999-2000) and the FU02 cluster season (2003-2004).

Pairwise correlations between H3 and N2 evolutionary indicators (one season lags).

We measured Spearman’s correlations between seasonal measures of H3 and N2 evolution, including H3 RBS distance, H3 epitope distance, H3 non-epitope distance, H3 stalk footprint distance, HI titer distance, N2 epitope distance based on 223 or 53 epitope sites, N2 non-epitope distance, mean clade growth of H3 and N2 (local branching index, LBI), and the Shannon entropy of H3 and N2 LBI values. Seasonal distances were estimated as the mean distance between strains circulating in the current season t and those circulating in the prior season (t – 1). The Benjamini and Hochberg method was used to adjust P-values for multiple testing. The color of each circle indicates the strength and direction of the association, from dark red (strong positive correlation) to dark blue (strong negative correlation). Stars within circles indicate statistical significance (adjusted P < 0.05).
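The Benjamini and Hochberg adjustment mentioned in this caption can be sketched in a few lines. This is a plain standard-library implementation of the step-up procedure for illustration; the function name and the example P-values are hypothetical, and the authors' software is not specified here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values from four pairwise correlation tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

Each raw P-value is multiplied by the number of tests divided by its rank, and the running minimum keeps a smaller raw P-value from receiving a larger adjusted value than a bigger one.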

Pairwise correlations between H3 and N2 evolutionary indicators (two season lags).

We measured Spearman’s correlations between seasonal measures of H3 and N2 evolution, including H3 RBS distance, H3 epitope distance, H3 non-epitope distance, H3 stalk footprint distance, HI titer distance (tree model), N2 epitope distance based on 223 or 53 epitope sites, N2 non-epitope distance, mean clade growth of H3 and N2 (local branching index, LBI), and the Shannon entropy of H3 and N2 LBI values. Seasonal distances were estimated as the mean distance between strains circulating in the current season t and those circulating two seasons ago (t – 2). The Benjamini and Hochberg method was used to adjust P-values for multiple testing. The color of each circle indicates the strength and direction of the association, from dark red (strong positive correlation) to dark blue (strong negative correlation). Stars within circles indicate statistical significance (adjusted P < 0.05).

Comparison of seasonal antigenic drift measured by substitutions at hemagglutinin (H3) and neuraminidase (N2) epitope sites, from 1997-1998 to 2018-2019.

We used Spearman correlation tests to measure associations between H3 epitope distance and N2 epitope distance at A. one-season lags and B. two-season lags. Seasonal epitope distance is the mean distance between strains circulating in season t and strains circulating in the prior season t – 1 (one season lag) or two seasons ago t – 2 (two season lag). Point labels indicate the current influenza season, and point color denotes the relative timing of influenza seasons, with earlier seasons shaded dark purple (e.g., 1997-1998) and later seasons shaded light yellow (e.g., 2018-2019). N2 epitope distance at one-season lags captures expected “jumps” in antigenic drift during key seasons previously associated with major antigenic transitions [32], such as the SY97 cluster seasons (1997-1998, 1998-1999, 1999-2000), the FU02 cluster season (2003-2004), and the CA04 cluster season (2004-2005).

Intensity of weekly incidence of A. influenza A(H1N1) and B. influenza B in ten HHS regions, 1997 – 2019.

Seasonal and pandemic A(H1N1) were combined as A(H1N1), and the Victoria and Yamagata lineages of influenza B were combined as influenza B. White tiles indicate weeks when either influenza-like-illness cases or virological data were not reported. Data for Region 10 were not available in seasons prior to 2009.

Pairwise correlations between seasonal A(H3N2), A(H1N1), and B epidemic metrics.

We measured Spearman’s correlations among indicators of A(H3N2) epidemic timing, including onset week, peak week, regional variation (s.d.) in onset and peak timing, and the number of days from onset to peak, indicators of A(H3N2) epidemic magnitude, including epidemic intensity (i.e., the “sharpness” of the epidemic curve), transmissibility (maximum effective reproduction number, Rt), subtype dominance patterns, epidemic size, and peak incidence. We also considered relationships between the circulation of other types/subtypes and A(H3N2) epidemic burden and timing. The Benjamini and Hochberg method was used to adjust P-values for multiple testing. The color of each circle indicates the strength and direction of the association, from dark red (strong positive correlation) to dark blue (strong negative correlation). Stars within circles indicate statistical significance (adjusted P < 0.05).

Univariate correlations between A(H3N2) viral fitness and epidemic impact.

Mean Spearman correlation coefficients, 95% confidence intervals of correlation coefficients, and corresponding p-values of bootstrapped (N = 1000) viral fitness indicators (rows) and epidemic metrics (columns). Point color indicates the strength and direction of the association, from dark red (strong positive correlation) to dark blue (strong negative correlation), and stars indicate statistical significance (* P < 0.05, ** P < 0.01, *** P < 0.001). Abbreviations: HI = hemagglutination inhibition, RBS: receptor binding site, t – 1 = one-season lag, t – 2 = two-season lag, LBI = local branching index.
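A bootstrapped Spearman correlation of the kind summarized in this caption might be computed as sketched below. The resampling scheme (drawing seasons with replacement), the percentile interval, and all names and toy values are assumptions for illustration; the caption does not specify how the 1000 replicates were generated.

```python
import random
from statistics import mean


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (None if undefined)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return None if sx == 0 or sy == 0 else cov / (sx * sy)


def bootstrap_spearman(x, y, n_boot=1000, seed=0):
    """Mean rho and a 95% percentile interval from resampled (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rho is not None:  # skip degenerate resamples with no variation
            stats.append(rho)
    stats.sort()
    lower = stats[int(0.025 * len(stats))]
    upper = stats[int(0.975 * len(stats)) - 1]
    return mean(stats), (lower, upper)


# Hypothetical seasonal values: scaled epitope distance vs. epidemic size.
distance = [0.1, 0.4, 0.2, 0.9, 0.5, 0.7, 0.3, 0.8]
size = [12.0, 20.0, 18.0, 41.0, 22.0, 30.0, 15.0, 35.0]
print(bootstrap_spearman(distance, size))
```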

Low diversity in the growth rates of circulating A(H3N2) clades is associated with more intense epidemics and higher transmissibility.

A(H3N2) effective Rt and epidemic intensity negatively correlate with the diversity of LBI values among circulating A(H3N2) lineages in the current or prior season, measured by the Shannon entropy of A. H3 local branching index (LBI) values in the prior season (t – 1), and B. the Shannon entropy of N2 LBI values in the current season t. LBI values are scaled to aid in direct comparisons of H3 and N2 LBI diversity. Point color indicates the dominant influenza A subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant), and vertical bands are 95% confidence intervals of regional estimates. Mean A(H3N2) epidemic metric values were fit as a function of seasonal LBI diversity using Gaussian GLMs (effective Rt: inverse link) or Beta GLMs (epidemic intensity: logit link) with 1000 bootstrap resamples.
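The LBI diversity measure used here is a Shannon entropy over LBI values. Because LBI is continuous, the sketch below first groups values into equal-width bins; the binning step, the number of bins, and the function name are assumptions for illustration and may differ from how the authors discretized LBI.

```python
import math
from collections import Counter


def shannon_entropy(values, n_bins=10):
    """Shannon entropy (natural log) of values grouped into equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # every strain has the same LBI: no diversity
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    bins = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in bins.values())


# Hypothetical LBI values for strains sampled in two seasons.
one_dominant_clade = [0.90, 0.91, 0.92, 0.93, 0.94, 0.10]
many_clades = [0.05, 0.25, 0.45, 0.65, 0.85, 0.95]
print(shannon_entropy(one_dominant_clade), shannon_entropy(many_clades))
```

Under this definition, a season in which most strains share similar LBI values yields low entropy, which corresponds to the low-diversity seasons the caption associates with higher effective Rt and epidemic intensity.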

Excess influenza A(H3N2) mortality increases with H3 and N2 antigenic drift, but correlations are not statistically significant.

The number of excess influenza deaths attributable to A(H3N2) (per 100,000 people) was estimated from a seasonal regression model fit to weekly pneumonia and influenza-coded deaths [127]. Seasonal epitope distance is the mean distance between strains circulating in season t and those circulating in the prior season (t – 1) or two seasons ago (t – 2). Distances are scaled to aid in direct comparison of evolutionary indicators. Point color indicates the dominant influenza A subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant), and vertical bars are 95% confidence intervals of excess mortality estimates. National excess mortality estimates were fit as a function of seasonal H3 or N2 epitope distance using Gaussian GLMs (log link) with 1000 bootstrap resamples.

Regional patterns of influenza type and subtype incidence from seasons 1997-1998 to 2018-2019.

Pie charts represent the proportion of influenza positive samples that were typed as A(H3N2), A(H1N1) or A(H1N1)pdm09, and B in each HHS region. Data for Region 10 (purple) were not available in seasons prior to the 2009 A(H1N1) pandemic.

Univariate correlations between A(H3N2) viral fitness and epidemic timing.

Mean Spearman correlation coefficients, 95% confidence intervals of correlation coefficients, and corresponding p-values of bootstrapped (N = 1000) viral fitness indicators (columns) and epidemic timing metrics (rows). Epidemic timing metrics are the week of epidemic onset, regional variation (s.d.) in onset timing, the week of epidemic peak, regional variation (s.d.) in peak timing, the number of days between epidemic onset and peak, and seasonal duration. Color indicates the strength and direction of the association, from dark red (strong positive correlation) to dark blue (strong negative correlation), and stars indicate statistical significance (* P < 0.05, ** P < 0.01, *** P < 0.001). Abbreviations: HI = hemagglutination inhibition, RBS: receptor binding site, t – 1 = one-season lag, t – 2 = two-season lag, LBI = local branching index.

Seasonal duration increases with diversity in clade growth rates of circulating H3 and N2 lineages, measured as the Shannon entropy of local branching index (LBI) values.

A. H3 LBI diversity and B. N2 LBI diversity during the current season positively correlate with seasonal duration. LBI values are scaled to aid in direct comparisons of H3 and N2 LBI diversity. Point color indicates the dominant influenza A subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant). Mean values of regional season duration were fit as a function of H3 LBI diversity or N2 LBI diversity using Gaussian GLMs (inverse link) with 1000 bootstrap resamples.

Epidemic speed increases with N2 antigenic drift.

N2 epitope distance correlates with fewer days from epidemic onset to peak (A), while the relationship between H3 epitope distance and epidemic speed is less apparent (B). Seasonal epitope distance is the mean distance between strains circulating in season t and those circulating in the prior season (t – 1) or two seasons ago (t – 2). Distances are scaled to aid in direct comparison of evolutionary indicators. Point color indicates the dominant influenza A subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant). Mean values of regional days from onset to peak were fit as a function of H3 or N2 epitope distance using Gamma GLMs (inverse link) with 1000 bootstrap resamples.

The timing of epidemic onsets and peaks is weakly correlated with H3 and N2 antigenic change.

A. Epidemic onsets are earlier in seasons with increased H3 epitope distance (t – 2), but the correlation is not statistically significant. B. Epidemic peaks are earlier in seasons with increased H3 epitope distance (t – 2) or increased N2 epitope distance (t – 1), but correlations are not statistically significant. Seasonal epitope distance is the mean distance between strains circulating in season t and those circulating in the prior season (t – 1) or two seasons ago (t – 2). Distances are scaled to aid in direct comparison of evolutionary indicators. Point color indicates the dominant influenza A subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant). Mean values of regional epidemic onsets and peaks were fit as a function of H3 or N2 epitope distance using LMs with 1000 bootstrap resamples.

Univariate correlations between A(H3N2) antigenic change and the age distribution of outpatient influenza-like illness (ILI) cases.

Mean Spearman correlation coefficients, 95% confidence intervals of correlation coefficients, and corresponding p-values of bootstrapped (N = 1000) evolutionary indicators (rows) and the proportion of ILI cases in individuals aged < 5 years, 5-24 years, 25-64 years, and ≥ 65 years (columns). Color indicates the strength and direction of the association, from dark red (strong positive correlation) to dark blue (strong negative correlation), and stars indicate statistical significance (* P < 0.05, ** P < 0.01, *** P < 0.001). Abbreviations: HI = hemagglutination inhibition, RBS: receptor binding site, t – 1 = one-season lag, t – 2 = two-season lag.

N2 epitope distance correlates with the age distribution of outpatient influenza-like illness (ILI) cases.

Seasonal epitope distance is the mean distance between strains circulating in season t and those circulating in the prior season (t – 1) or two seasons ago (t – 2). Distances are scaled to aid in direct comparison of evolutionary indicators. Point color indicates the dominant influenza A subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant), and vertical bars are 95% confidence intervals of regional age distribution estimates. The fraction of cases in each age group was fit as a function of seasonal H3 or N2 epitope distance using Beta GLMs (logit link) with 1000 bootstrap resamples.

National excess influenza A(H3N2) mortality decreases with A(H1N1) epidemic size but not B epidemic size.

Excess influenza deaths attributable to A(H3N2) (per 100,000 people) were estimated from a seasonal regression model fit to weekly pneumonia and influenza-coded deaths. Point color indicates the dominant influenza A subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant), and vertical bands are 95% confidence intervals of model estimates. National excess mortality estimates were fit as a function of seasonal A(H1N1) or B epidemic size using Gaussian GLMs (log link) with 1000 bootstrap resamples.

The effect of influenza A(H1N1) epidemic size on A(H3N2) epidemic burden during the entire study period (1997-2019) (top), pre-2009 seasons (middle), and post-2009 seasons (bottom).

Influenza A(H1N1) epidemic size inversely correlates with A(H3N2) epidemic size, peak incidence, transmissibility (maximum effective reproduction number, Rt), and epidemic intensity. Point color indicates the dominant influenza A virus (IAV) subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant), and vertical and horizontal bands are 95% confidence intervals of regional estimates. Seasonal mean A(H3N2) epidemic metrics were fit as a function of mean A(H1N1) epidemic size using Gaussian GLMs (epidemic size, peak incidence: inverse link; effective Rt: log link) or Beta GLMs (epidemic intensity: logit link) with 1000 bootstrap resamples.

Wavelet analysis of influenza A and B epidemic timing.

A. A(H3N2) incidence precedes A(H1N1) incidence in most seasons. Although A(H1N1) incidence sometimes leads or is in phase with A(H3N2) incidence (negative or zero phase lag), the direction of seasonal phase lags is not clearly associated with A(H1N1) epidemic size. B. A(H3N2) incidence leads B incidence (positive phase lag) during each season, irrespective of B epidemic size. Point color indicates the dominant influenza A subtype based on CDC influenza season summary reports (red: A(H3N2), blue: A(H1N1), purple: A(H1N1)pdm09, orange: A(H3N2)/A(H1N1)pdm09 co-dominant), and vertical bars are 95% confidence intervals of regional estimates. To estimate the relative timing of influenza subtype incidences, phase angle differences were calculated as phase in A(H3N2) minus phase in A(H1N1) (or B), with a positive value indicating that A(H1N1) (or B) incidence lags A(H3N2) incidence. To calculate seasonal phase lags, we averaged pairwise phase angle differences from epidemic week 40 to epidemic week 20. Seasonal phase lags were fit as a function of seasonal A(H1N1) or B epidemic size using LMs with 1000 bootstrap resamples.
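The seasonal phase lag described above can be sketched as follows, assuming weekly phase angles (in radians) have already been extracted from a wavelet transform for each subtype and restricted to epidemic weeks 40 through 20. The function names and example phases are hypothetical, and the arithmetic mean of wrapped differences is one reasonable reading of "averaged"; a circular mean would be an alternative.

```python
import math


def wrap_angle(theta):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def seasonal_phase_lag(phase_h3n2, phase_other):
    """Mean of weekly phase differences, A(H3N2) minus the other (sub)type.

    A positive result means the other type's incidence lags A(H3N2).
    """
    diffs = [wrap_angle(a - b) for a, b in zip(phase_h3n2, phase_other)]
    return sum(diffs) / len(diffs)


# Hypothetical weekly phases for three weeks of one season.
h3n2 = [0.50, 0.80, 1.10]
h1n1 = [0.20, 0.45, 0.90]
lag = seasonal_phase_lag(h3n2, h1n1)
print("A(H3N2) leads" if lag > 0 else "A(H1N1) leads or in phase", lag)
```

Wrapping each difference before averaging keeps a phase that crosses the ±π boundary from producing a spurious lag of nearly a full cycle.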

Variable importance rankings from LASSO models predicting A(H3N2) epidemic dynamics.

Ranking of variables in predicting seasonal A(H3N2) A. epidemic size, B. peak incidence, C. transmissibility (effective reproduction number, Rt), D. epidemic intensity (inverse Shannon entropy), and E. subtype dominance. Models were tuned using a repeated leave-one-season-out cross-validated sample of the data. Variables are ranked by their coefficient estimates, with differences in prediction accuracy scaled by the total (null model) error. Abbreviations: HI titer = hemagglutination inhibition log2 titer distance, t – 1 = one-season lag, t – 2 = two-season lag, LBI = local branching index, peak = peak incidence, distance to vaccine = epitope distance between currently circulating strains and the recommended vaccine strain, VE = vaccine effectiveness.
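Panel D's epidemic intensity is described as an inverse Shannon entropy. A minimal sketch of that idea is shown below, assuming intensity is computed from the share of a season's incidence falling in each week; the function name is hypothetical, and any rescaling or normalization the authors applied is omitted.

```python
import math


def epidemic_intensity(weekly_incidence):
    """Inverse Shannon entropy of the weekly incidence distribution.

    Higher values mean incidence is concentrated in fewer weeks.
    """
    total = sum(weekly_incidence)
    if total <= 0:
        return None  # no reported incidence in this season
    shares = [w / total for w in weekly_incidence if w > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    # All incidence in a single week gives zero entropy.
    return math.inf if entropy == 0 else 1.0 / entropy


# Hypothetical seasons with the same total incidence but different shapes.
flat_season = [5, 5, 5, 5]
sharp_season = [1, 17, 1, 1]
print(epidemic_intensity(flat_season), epidemic_intensity(sharp_season))
```

With these toy inputs the sharply peaked season scores higher than the flat one, matching the "sharpness" interpretation of epidemic intensity.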

Relationships between the predictive accuracy of random forest models and H3 epitope distance.

Root mean squared errors between observed and model-predicted values were averaged across regions for each season, and results are facetted according to epidemic metric. Point color corresponds to the degree of H3 epitope distance in viruses circulating in season t relative to those circulating two seasons ago (t – 2), with bright yellow points indicating seasons with greater antigenic novelty. Spearman correlation coefficients and associated P-values are provided in the top left section of each facet.

Relationships between the predictive accuracy of random forest models and N2 epitope distance.

Root mean squared errors between observed and model-predicted values were averaged across regions for each season, and results are facetted according to epidemic metric. Point color corresponds to the degree of N2 epitope distance in viruses circulating in season t relative to those circulating in the prior season (t – 1), with bright yellow points indicating seasons with greater antigenic novelty. Spearman correlation coefficients and associated P-values are provided in the top left section of each facet.