Timely vaccine strain selection and genomic surveillance improve evolutionary forecast accuracy of seasonal influenza A/H3N2

  1. John Huddleston (corresponding author)
  2. Trevor Bedford
  1. Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, United States
  2. Howard Hughes Medical Institute, United States
7 figures, 3 tables and 2 additional files

Figures

Figure 1 with 4 supplements
Model of forecast horizons and submission lags.

(A) Long-term forecasting models historically predicted 12 months into the future from April and October because of the time required to develop and distribute a new vaccine (Luksza and Lässig, 2014). We tested three additional, shorter forecast horizons at 3-month intervals (9, 6, and 3 months) prior to the same time in the future season. For each forecast horizon, we calculated the accuracy of forecasts under each of the three submission lag types described below: no lag, realistic lag, and ideal lag. (B) Observed lags in days between collection of viral samples and submission of the corresponding hemagglutinin (HA) sequences to the Global Initiative on Sharing All Influenza Data (GISAID) (purple) for samples collected in 2019 have a mean of 98 days (approximately 3 months). A gamma distribution fit to the observed lag distribution with a similar mean and shape (green) represents a realistic submission lag, from which we sampled to assign “submission dates” to simulated and natural A/H3N2 populations. A gamma distribution with a mean one-third that of the realistic distribution (orange) represents an ideal submission lag, analogous to the 1-month average lags observed for SARS-CoV-2 genomes. Retrospective analyses, including the fitting of forecasting models, typically filter HA sequences by collection date instead of submission date, in which case there is no lag (blue).
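The lag scenarios in panel (B) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' workflow: the gamma distribution is parameterized by its mean, the shape parameter `GAMMA_SHAPE` is a hypothetical value (the caption reports the ~98-day mean but not the fitted shape), and the helpers `sample_lag_days`, `assign_submission_date`, and `available_at` are illustrative names.

```python
import random
from datetime import date, timedelta

# Mean lags from the caption: ~98 days (realistic) and one-third of that (ideal).
REALISTIC_MEAN_DAYS = 98.0
IDEAL_MEAN_DAYS = REALISTIC_MEAN_DAYS / 3.0
# Hypothetical shape parameter; the fitted value is not given in the caption.
GAMMA_SHAPE = 2.0


def sample_lag_days(mean_days, shape=GAMMA_SHAPE, rng=random):
    """Draw one submission lag (days) from a gamma distribution with the given mean."""
    # For a gamma distribution, mean = shape * scale, so scale = mean / shape.
    return rng.gammavariate(shape, mean_days / shape)


def assign_submission_date(collection_date, mean_days, rng=random):
    """Simulate a submission date by adding a sampled lag to the collection date."""
    return collection_date + timedelta(days=round(sample_lag_days(mean_days, rng=rng)))


def available_at(records, timepoint, use_submission_date=True):
    """Return records visible at `timepoint`.

    Filtering on collection date reproduces the "no lag" scenario; filtering on
    the simulated submission date reproduces the ideal or realistic scenarios.
    """
    key = "submission_date" if use_submission_date else "collection_date"
    return [record for record in records if record[key] <= timepoint]


if __name__ == "__main__":
    rng = random.Random(42)
    records = [
        {"strain": f"strain{i}", "collection_date": date(2019, 1, 1) + timedelta(days=i)}
        for i in range(120)
    ]
    for record in records:
        record["submission_date"] = assign_submission_date(
            record["collection_date"], REALISTIC_MEAN_DAYS, rng
        )
    timepoint = date(2019, 4, 1)
    print("no lag:", len(available_at(records, timepoint, use_submission_date=False)))
    print("realistic lag:", len(available_at(records, timepoint)))
```

Because every sampled lag is non-negative, the realistic and ideal scenarios can only see the same number of sequences as the no-lag scenario at a given timepoint, or fewer.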

Figure 1—source data 1

Distribution of lags between sample collection and sequence submission in the pre-pandemic and pandemic eras; see distribution_of_submission_lags.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig1-data1-v1.zip
Figure 1—figure supplement 1
Distribution of submission lags in days for the pre-pandemic era (2019–2020) and pandemic era (2022–2023 in orange).

Vertical dashed lines represent mean lags for each distribution.

Figure 1—figure supplement 2
Number and proportion of A/H3N2 sequences available per timepoint and lag type.

(A) Number of A/H3N2 sequences available per timepoint and lag type. (B) Proportion of all A/H3N2 sequences without lag per timepoint and lag type.

Figure 1—figure supplement 3
Number and proportion of simulated A/H3N2-like sequences available per timepoint and lag type.

(A) Number of simulated A/H3N2-like sequences available per timepoint and lag type. (B) Proportion of all simulated A/H3N2-like sequences without lag per timepoint and lag type.

Figure 1—figure supplement 4
Number of all available sequences per region and year and proportion of sequences sampled by two different subsampling methods.

(A) Number of all available sequences per region and year in the Global Initiative on Sharing All Influenza Data (GISAID) EpiFlu database for the study period between April 1, 2005, and October 1, 2019. (B) Proportion of sequences sampled per region and year with even subsampling across regions and year/month combinations at 90 viruses per month. (C) Proportion of sequences sampled per region and year with even subsampling across regions and year/month combinations at 270 viruses per month.
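Even subsampling across regions and year/month combinations, as in panels (B) and (C), might look like the sketch below. It assumes each month's budget (90 or 270 viruses) is split equally among the regions that have sequences that month, and that any share a region cannot fill is simply left unused; the original workflow may allocate leftover slots differently. The function name and record fields are hypothetical.

```python
import random
from collections import defaultdict


def subsample_evenly(records, viruses_per_month=90, seed=0):
    """Subsample sequences evenly across regions within each year/month.

    Assumption: each month's budget is divided equally among regions that have
    sequences that month; regions with too few sequences contribute all of them,
    and their unused share is not redistributed.
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: defaultdict(list))
    for record in records:
        groups[(record["year"], record["month"])][record["region"]].append(record)

    selected = []
    for year_month in sorted(groups):
        regions = groups[year_month]
        per_region = viruses_per_month // len(regions)
        for region in sorted(regions):
            candidates = regions[region]
            selected.extend(rng.sample(candidates, min(per_region, len(candidates))))
    return selected


if __name__ == "__main__":
    records = [
        {"strain": f"{region}/{i}", "region": region, "year": 2019, "month": 1}
        for region, count in [("europe", 200), ("north_america", 150), ("oceania", 12)]
        for i in range(count)
    ]
    sample = subsample_evenly(records, viruses_per_month=90)
    print(len(sample))
```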

Figure 2 with 3 supplements
Distance to the future per timepoint (AAs) for natural A/H3N2 populations by forecast horizon and submission lag type based on forecasts from the local branching index (LBI) and mutational load model.

Each point represents a future timepoint whose population was predicted from the number of months earlier corresponding to the forecast horizon. Points are colored by submission lag type including forecasts made with no lag (blue), an ideal lag (orange), and a realistic lag (green).
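To make the y-axis metric concrete, the sketch below assumes that distance to the future is the frequency-weighted mean amino acid Hamming distance between the predicted future population (current HA sequences weighted by their forecast frequencies) and the observed future population: the sum, over all pairs of current and future sequences, of predicted frequency times observed frequency times the number of amino acid differences. This formulation and the function names are assumptions for illustration; the paper's Methods define the exact metric.

```python
def hamming_distance(seq_a, seq_b):
    """Count amino acid differences between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def distance_to_future(predicted, observed):
    """Frequency-weighted mean Hamming distance (AAs) between two populations.

    predicted: {current HA sequence: forecast future frequency}
    observed:  {future HA sequence: observed future frequency}
    Both sets of frequencies are assumed to sum to 1.
    """
    return sum(
        predicted_freq * observed_freq * hamming_distance(current, future)
        for current, predicted_freq in predicted.items()
        for future, observed_freq in observed.items()
    )


if __name__ == "__main__":
    # Toy, made-up sequences and frequencies.
    predicted = {"KNSTA": 0.7, "KNSTG": 0.3}
    observed = {"KNSTG": 0.6, "RNSTG": 0.4}
    print(distance_to_future(predicted, observed))
```

Under this assumed definition, lower values mean the forecast places more weight on current sequences that resemble the population that actually circulated.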

Figure 2—source data 1

Distance to the future for natural A/H3N2 populations; see h3n2_distances_to_the_future.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig2-data1-v1.zip
Figure 2—source code 1

Jupyter notebook used to produce this figure and the figure supplement: workflow/notebooks/plot-distances-to-the-future-by-delay-type-and-horizon-for-population.py.ipynb.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig2-code1-v1.zip
Figure 2—figure supplement 1
Distance to the future for simulated A/H3N2-like populations by forecast horizon and submission lag type based on forecasts from the “true fitness” model.
Figure 2—figure supplement 1—source data 1

Distance to the future for simulated A/H3N2-like populations; see simulated_distances_to_the_future.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig2-figsupp1-data1-v1.zip
Figure 2—figure supplement 2
Optimal distance to the future for natural A/H3N2 populations by forecast horizon and submission lag type based on post hoc empirical fitness of the initial population.
Figure 2—figure supplement 3
Optimal distance to the future for simulated A/H3N2-like populations by forecast horizon and submission lag type based on post hoc empirical fitness of the initial population.
Figure 3 with 1 supplement
Clade frequency errors for natural A/H3N2 clades.

Clade frequency errors for natural A/H3N2 clades at the same timepoint, calculated as the difference between clade frequencies without submission lag and the corresponding frequencies with either (A) ideal or (B) realistic submission lags. Frequency errors appear normally distributed under both lag scenarios for both (C) small clades (>0% and <10% frequency) and (D) large clades (≥10%). Dashed lines indicate the median error of the distribution for the lag type with the same color.
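The error calculation in panels (A) and (B) and the small/large split in panels (C) and (D) can be sketched as follows. Clade names and frequencies in the example are made up; treating a clade absent from one population as having frequency 0 and classifying clade size by the no-lag frequencies are assumptions rather than details stated in the caption.

```python
from statistics import median


def clade_frequency_errors(freqs_no_lag, freqs_with_lag):
    """Per-clade error at one timepoint: frequency without lag minus frequency with lag.

    Clades missing from one population are treated as having frequency 0.
    """
    clades = set(freqs_no_lag) | set(freqs_with_lag)
    return {
        clade: freqs_no_lag.get(clade, 0.0) - freqs_with_lag.get(clade, 0.0)
        for clade in clades
    }


def split_by_clade_size(errors, reference_freqs, threshold=0.10):
    """Split errors into small (>0 and <threshold) and large (>=threshold) clades.

    Assumption: clade size is judged by `reference_freqs` (e.g., no-lag frequencies).
    """
    small, large = [], []
    for clade, error in errors.items():
        freq = reference_freqs.get(clade, 0.0)
        if freq >= threshold:
            large.append(error)
        elif freq > 0:
            small.append(error)
    return small, large


if __name__ == "__main__":
    no_lag = {"clade_a": 0.55, "clade_b": 0.40, "clade_c": 0.05}
    realistic = {"clade_a": 0.62, "clade_b": 0.38}
    errors = clade_frequency_errors(no_lag, realistic)
    small, large = split_by_clade_size(errors, no_lag)
    print("median error, large clades:", median(large))
```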

Figure 3—source data 1

Current and future clade frequencies for natural A/H3N2 populations by forecast horizon and submission lag type; see h3n2_clade_frequencies.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig3-data1-v1.zip
Figure 3—source code 1

Jupyter notebook used to produce this figure and the figure supplement: workflow/notebooks/plot-current-clade-frequency-errors-by-delay-type-for-populations.py.ipynb.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig3-code1-v1.zip
Figure 3—figure supplement 1
Clade frequency errors between simulated A/H3N2-like HA populations with ideal or realistic submission lags and populations without any submission lag.
Figure 3—figure supplement 1—source data 1

Current and future clade frequencies for simulated A/H3N2-like populations by forecast horizon and submission lag type; see simulated_clade_frequencies.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig3-figsupp1-data1-v1.zip
Figure 4 with 3 supplements
Absolute forecast clade frequency errors for natural A/H3N2 populations by forecast horizon in months and submission lag type (none, ideal, or realistic) for (A) small clades (<10% initial frequency) and (B) large clades (≥10% initial frequency).
Figure 4—source code 1

Jupyter notebook used to produce this figure and the figure supplements: workflow/notebooks/plot-forecast-clade-frequency-errors-by-delay-type-and-horizon-for-population.py.ipynb.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig4-code1-v1.zip
Figure 4—figure supplement 1
Absolute forecast clade frequency errors for simulated A/H3N2-like HA populations by forecast horizon in months and submission lag type (none, ideal, or realistic) for (A) small clades (<10% initial frequency) and (B) large clades (≥10% initial frequency).
Figure 4—figure supplement 2
Forecast clade frequency errors for natural A/H3N2 HA populations by forecast horizon in months and submission lag type (none, ideal, or realistic) for (A) small clades (<10% initial frequency) and (B) large clades (≥10% initial frequency).
Figure 4—figure supplement 3
Forecast clade frequency errors for simulated A/H3N2-like HA populations by forecast horizon in months and submission lag type (none, ideal, or realistic) for (A) small clades (<10% initial frequency) and (B) large clades (≥10% initial frequency).
Figure 5 with 5 supplements
Improvement of clade frequency errors for A/H3N2 populations between the status quo (12-month forecast horizon and realistic submission lags) and realistic interventions of improved vaccine development (reducing 12-month to 6-month forecast horizon), improved surveillance (reducing submission lags from 3 months on average to 1 month), or a combination of both interventions.

We measured improvements from the status quo as the difference in total absolute clade frequency error per future timepoint. Positive values indicate increased forecast accuracy, while negative values indicate decreased accuracy. Each point represents the improvement of forecasts for a specific future timepoint under the given intervention. Horizontal dashed lines indicate median improvements. Horizontal dotted lines indicate upper and lower quartiles of improvements.
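The improvement metric can be sketched as below, assuming clade frequencies in percent and one row per (future timepoint, clade) pair. The function names are hypothetical, and the quartiles use the default ("exclusive") method of Python's `statistics.quantiles`, which may differ from the interpolation used to draw the figure.

```python
from collections import defaultdict
from statistics import median, quantiles


def total_absolute_error(rows):
    """Sum |observed - predicted| clade frequencies per future timepoint.

    rows: iterable of (future_timepoint, clade, observed_freq, predicted_freq).
    """
    totals = defaultdict(float)
    for timepoint, _clade, observed, predicted in rows:
        totals[timepoint] += abs(observed - predicted)
    return dict(totals)


def improvement_from_status_quo(status_quo, intervention):
    """Status quo total error minus intervention total error, per shared timepoint.

    Positive values mean the intervention produced a more accurate forecast.
    """
    shared = sorted(set(status_quo) & set(intervention))
    return {t: status_quo[t] - intervention[t] for t in shared}


def median_and_quartiles(values):
    """Median plus lower and upper quartiles of the improvements."""
    lower, _middle, upper = quantiles(values, n=4)
    return median(values), lower, upper


if __name__ == "__main__":
    # Made-up clade frequencies (%) for two future timepoints.
    status_quo_rows = [
        ("2017-10-01", "clade_a", 60, 35), ("2017-10-01", "clade_b", 40, 65),
        ("2018-04-01", "clade_a", 20, 30), ("2018-04-01", "clade_b", 80, 70),
    ]
    intervention_rows = [
        ("2017-10-01", "clade_a", 60, 50), ("2017-10-01", "clade_b", 40, 50),
        ("2018-04-01", "clade_a", 20, 35), ("2018-04-01", "clade_b", 80, 65),
    ]
    improvements = improvement_from_status_quo(
        total_absolute_error(status_quo_rows), total_absolute_error(intervention_rows)
    )
    print(improvements)
    print(median_and_quartiles(list(improvements.values())))
```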

Figure 5—source data 1

Differences in total absolute clade frequency error per future timepoint and clade between the status quo and realistic interventions for A/H3N2 populations; see h3n2_effects_of_realistic_interventions.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig5-data1-v1.zip
Figure 5—source code 1

Jupyter notebook used to calculate the effects of interventions on total absolute clade frequency errors: workflow/notebooks/plot-forecast-clade-frequency-errors-by-delay-type-and-horizon-for-population.py.ipynb.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig5-code1-v1.zip
Figure 5—source code 2

Jupyter notebook used to calculate the effects of interventions on distances to the future: workflow/notebooks/plot-distances-to-the-future-by-delay-type-and-horizon-for-population.py.ipynb.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig5-code2-v1.zip
Figure 5—figure supplement 1
Distribution of total absolute clade frequency errors summed across clades per future timepoint for A/H3N2 populations.

We calculated the effects of interventions as the difference between these values per future timepoint under the status quo (12-month forecast horizon and realistic submission lag) and specific interventions.

Figure 5—figure supplement 2
Improvement of clade frequency errors for simulated A/H3N2-like populations between the status quo and realistic interventions.
Figure 5—figure supplement 2—source data 1

Differences in total absolute clade frequency error per future timepoint and clade between the status quo and realistic interventions for simulated A/H3N2-like populations; see simulated_effects_of_realistic_interventions.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig5-figsupp2-data1-v1.zip
Figure 5—figure supplement 3
Distribution of total absolute clade frequency errors summed across clades per future timepoint for simulated A/H3N2-like populations.
Figure 5—figure supplement 4
Improvement of distances to the future (AAs) for A/H3N2 populations between the status quo (12-month forecast horizon and realistic submission lags) and realistic interventions.

The effects of interventions are the differences between distances to the future per future timepoint under the status quo and specific interventions.

Figure 5—figure supplement 4—source data 1

Improvement of distances to the future per future timepoint for A/H3N2 populations; see h3n2_effects_of_realistic_interventions_on_distances_to_the_future.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig5-figsupp4-data1-v1.zip
Figure 5—figure supplement 5
Improvement of distances to the future (AAs) for simulated A/H3N2-like populations between the status quo (12-month forecast horizon and realistic submission lags) and realistic interventions.

The effects of interventions are the differences between distances to the future per future timepoint under the status quo and specific interventions.

Figure 5—figure supplement 5—source data 1

Improvement of distances to the future per future timepoint for simulated A/H3N2-like populations; see simulated_effects_of_realistic_interventions_on_distances_to_the_future.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig5-figsupp5-data1-v1.zip
Figure 6 with 4 supplements
Improvement of optimal distances to the future (AAs) for A/H3N2 populations between the status quo (12-month forecast horizon and realistic submission lags) and realistic interventions of improved vaccine development (reducing 12-month to 6-month forecast horizon), improved surveillance (reducing submission lags from 3 months on average to 1 month), or a combination of both interventions.

We measured improvements from the status quo as the difference in optimal distances to the future per future timepoint. Positive values indicate increased forecast accuracy, while negative values indicate decreased accuracy. Each point represents the improvement of forecasts for a specific future timepoint under the given intervention. Horizontal dashed lines indicate median improvements. Horizontal dotted lines indicate upper and lower quartiles of improvements.

Figure 6—source data 1

Differences in optimal distances to the future per future timepoint between the status quo and realistic interventions for A/H3N2 populations; see h3n2_optimal_effects_of_realistic_interventions_on_distances_to_the_future.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig6-data1-v1.zip
Figure 6—source code 1

Jupyter notebook used to calculate the optimal effects of interventions on distances to the future: workflow/notebooks/plot-distances-to-the-future-by-delay-type-and-horizon-for-population.py.ipynb.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig6-code1-v1.zip
Figure 6—source code 2

Python notebook used to plot optimal effects by future clade entropy: workflow/notebooks/plot-optimal-effects-of-interventions-by-clade-entropy.py.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig6-code2-v1.zip
Figure 6—source code 3

Python notebook used to plot optimal effects by hemisphere: workflow/notebooks/plot-optimal-effects-of-interventions-by-hemisphere.py.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig6-code3-v1.zip
Figure 6—figure supplement 1
Improvement of optimal distances to the future (AAs) for simulated A/H3N2-like populations between the status quo (12-month forecast horizon and realistic submission lags) and realistic interventions.
Figure 6—figure supplement 1—source data 1

Improvement of optimal distances to the future per future timepoint for simulated A/H3N2-like populations; see simulated_optimal_effects_of_realistic_interventions_on_distances_to_the_future.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig6-figsupp1-data1-v1.zip
Figure 6—figure supplement 2
Improvement of optimal distances to the future (AAs) for A/H3N2 populations between the status quo (12-month forecast horizon and realistic submission lags) and realistic interventions using forecasts based on sampling 270 viruses per month instead of the 90 viruses-per-month sampling used in the main results.
Figure 6—figure supplement 2—source data 1

Improvement of optimal distances to the future per future timepoint for A/H3N2 populations with higher density sampling; see h3n2_high_density_optimal_effects_of_realistic_interventions_on_distances_to_the_future.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig6-figsupp2-data1-v1.zip
Figure 6—figure supplement 3
Improvement of optimal distances to the future (AAs) for A/H3N2 populations compared to the Shannon entropy of clade frequencies (estimated without submission lags) at the future timepoint being forecast to.

Panel titles include the Pearson r value between the improvement in distance and the future clade entropy.

Figure 6—figure supplement 3—source data 1

Improvement of optimal distances to the future for A/H3N2 populations compared to the Shannon entropy of clade frequencies at the future timepoint; see h3n2_optimal_effects_of_realistic_interventions_on_distances_to_the_future_by_future_clade_entropy.csv at https://doi.org/10.5281/zenodo.17259448.

https://cdn.elifesciences.org/articles/104282/elife-104282-fig6-figsupp3-data1-v1.zip
Figure 6—figure supplement 4
Improvement of optimal distances to the future (AAs) for A/H3N2 populations by intervention and the hemisphere with an active season during the future timepoint being predicted.

We labeled future timepoints in October or January as “Northern” and those in April or July as “Southern”.
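The hemisphere labeling rule translates directly into a small function. The function name is hypothetical, and the sketch assumes future timepoints fall only in the four months named above, so any other month raises an error.

```python
from datetime import date


def active_hemisphere(future_timepoint: date) -> str:
    """Label a future timepoint by the hemisphere with an active season.

    October or January -> "Northern"; April or July -> "Southern".
    """
    if future_timepoint.month in (10, 1):
        return "Northern"
    if future_timepoint.month in (4, 7):
        return "Southern"
    raise ValueError(f"unexpected timepoint month: {future_timepoint.month}")


if __name__ == "__main__":
    for timepoint in [date(2018, 10, 1), date(2019, 1, 1), date(2019, 4, 1), date(2019, 7, 1)]:
        print(timepoint, active_hemisphere(timepoint))
```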

Author response image 1
Streamgraph of clade frequencies for A/H3N2 populations demonstrating variability of clade cocirculation through time.

Tables

Table 1
Distance to the future in amino acids (mean ± SD AAs) by forecast horizon (in months) and submission lag for A/H3N2 populations.
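Distance to the future (mean ± SD AAs)

| Horizon (months) | No lag | Ideal lag | Realistic lag |
| --- | --- | --- | --- |
| 3 | 2.91 ± 0.86 | 3.32 ± 0.96 | 3.85 ± 1.05 |
| 6 | 4.44 ± 1.39 | 4.74 ± 1.54 | 5.03 ± 1.66 |
| 9 | 5.48 ± 2.05 | 5.84 ± 2.14 | 6.04 ± 2.15 |
| 12 | 6.45 ± 2.72 | 6.77 ± 2.80 | 6.78 ± 2.61 |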
Table 2
Errors in clade frequencies between observed and predicted values by forecast horizon (in months) and submission lag for A/H3N2 clades with an initial frequency ≥10% under the given lag scenario.
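| Horizon (months) | Lag type | Mean error (%) | Median error (%) | SD error (%) | Min error (%) | Max error (%) | Mean absolute error (%) | Median absolute error (%) | SD absolute error (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | None | 1 | 0 | 9 | −28 | 28 | 7 | 6 | 6 |
| 3 | Ideal | 1 | 0 | 11 | −32 | 36 | 8 | 6 | 7 |
| 3 | Realistic | 1 | 0 | 13 | −31 | 50 | 10 | 7 | 9 |
| 6 | None | 1 | 0 | 17 | −48 | 45 | 12 | 9 | 11 |
| 6 | Ideal | 1 | 0 | 19 | −50 | 53 | 13 | 9 | 13 |
| 6 | Realistic | 1 | 0 | 20 | −52 | 75 | 15 | 12 | 14 |
| 9 | None | 0 | −1 | 23 | −66 | 59 | 16 | 10 | 17 |
| 9 | Ideal | 1 | −1 | 25 | −67 | 58 | 18 | 11 | 18 |
| 9 | Realistic | 1 | −1 | 26 | −67 | 79 | 19 | 12 | 19 |
| 12 | None | 0 | 0 | 30 | −82 | 76 | 20 | 10 | 22 |
| 12 | Ideal | 1 | 0 | 31 | −80 | 74 | 21 | 9 | 23 |
| 12 | Realistic | 0 | 0 | 31 | −78 | 78 | 20 | 12 | 23 |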
Table 3
Improvement in A/H3N2 clade frequency forecast accuracy under realistic interventions of improved vaccine development (reducing 12-month to 6-month forecast horizon), improved surveillance (reducing submission lags from 3 months on average to 1 month), or a combination of both interventions.

We measured improvements from the status quo (12-month forecast horizon and 3-month average submission lag) as the difference in total absolute clade frequency error per future timepoint and the number and proportion of future timepoints for which forecasts improved under the intervention.

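| Intervention | Mean improvement (%) | Median improvement (%) | SD of improvement (%) | Timepoints improved (total) | Timepoints improved (proportion) |
| --- | --- | --- | --- | --- | --- |
| Improved vaccine | 53 | 49 | 112 | 19 | 0.61 |
| Improved surveillance | −11 | −13 | 56 | 10 | 0.32 |
| Improved vaccine and surveillance | 54 | 29 | 124 | 18 | 0.58 |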
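The columns of Table 3 can be computed from a list of per-timepoint improvements with a sketch like the one below. It assumes the SD is the sample standard deviation and that only strictly positive improvements count as "improved"; the function name and example values are hypothetical.

```python
from statistics import mean, median, stdev


def summarize_improvements(improvements):
    """Summarize per-timepoint improvements in the layout of Table 3.

    Assumptions: SD is the sample standard deviation, and a timepoint counts as
    improved only when its improvement is strictly positive.
    """
    improved = sum(1 for value in improvements if value > 0)
    return {
        "mean": mean(improvements),
        "median": median(improvements),
        "sd": stdev(improvements),
        "timepoints_improved": improved,
        "proportion_improved": improved / len(improvements),
    }


if __name__ == "__main__":
    # Made-up improvements (%) for five future timepoints.
    print(summarize_improvements([12.0, -4.5, 30.0, 8.0, -1.0]))
```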

Additional files

MDAR checklist
https://cdn.elifesciences.org/articles/104282/elife-104282-mdarchecklist1-v1.pdf
Supplementary file 1

GISAID accessions and metadata including originating and submitting labs for natural strains used across all timepoints.

https://cdn.elifesciences.org/articles/104282/elife-104282-supp1-v1.zip

John Huddleston, Trevor Bedford (2025) Timely vaccine strain selection and genomic surveillance improve evolutionary forecast accuracy of seasonal influenza A/H3N2. eLife 14:RP104282. https://doi.org/10.7554/eLife.104282.3