1. Evolutionary Biology
  2. Microbiology and Infectious Disease
Download icon

Within-host evolutionary dynamics of seasonal and pandemic human influenza A viruses in young children

  1. Alvin X Han  Is a corresponding author
  2. Zandra C Felix Garza
  3. Matthijs RA Welkers
  4. René M Vigeveno
  5. Nhu Duong Tran
  6. Thi Quynh Mai Le
  7. Thai Pham Quang
  8. Dinh Thoang Dang
  9. Thi Ngoc Anh Tran
  10. Manh Tuan Ha
  11. Thanh Hung Nguyen
  12. Quoc Thinh Le
  13. Thanh Hai Le
  14. Thi Bich Ngoc Hoang
  15. Kulkanya Chokephaibulkit
  16. Pilaipan Puthavathana
  17. Van Vinh Chau Nguyen
  18. My Ngoc Nghiem
  19. Van Kinh Nguyen
  20. Tuyet Trinh Dao
  21. Tinh Hien Tran
  22. Heiman FL Wertheim
  23. Peter W Horby
  24. Annette Fox
  25. H Rogier van Doorn
  26. Dirk Eggink
  27. Menno D de Jong
  28. Colin A Russell  Is a corresponding author
  1. Department of Medical Microbiology & Infection Prevention, Amsterdam University Medical Center, Netherlands
  2. National Institute of Hygiene and Epidemiology, Viet Nam
  3. Ha Nam Centre for Disease Control, Viet Nam
  4. Children's Hospital 2, Viet Nam
  5. Children's Hospital 1, Viet Nam
  6. Vietnam National Children's Hospital, Viet Nam
  7. Siriraj Hospital, Mahidol University, Thailand
  8. Hospital for Tropical Diseases, Viet Nam
  9. National Hospital for Tropical Diseases, Viet Nam
  10. Oxford University Clinical Research Unit, Viet Nam
  11. Radboud Medical Centre, Radboud University, Netherlands
  12. Nuffield Department of Medicine, University of Oxford, United Kingdom
  13. Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia
  14. WHO Collaborating Centre for Reference and Research on Influenza, Australia
  15. Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Netherlands
Research Article
  • Cited 0
  • Views 535
  • Annotations
Cite this article as: eLife 2021;10:e68917 doi: 10.7554/eLife.68917

Abstract

The evolution of influenza viruses is fundamentally shaped by within-host processes. However, the within-host evolutionary dynamics of influenza viruses remain incompletely understood, in part because most studies have focused on infections in healthy adults based on single timepoint data. Here, we analyzed the within-host evolution of 82 longitudinally sampled individuals, mostly young children, infected with A/H1N1pdm09 or A/H3N2 viruses between 2007 and 2009. For A/H1N1pdm09 infections during the 2009 pandemic, nonsynonymous minority variants were more prevalent than synonymous ones. For A/H3N2 viruses in young children, early infection was dominated by purifying selection. As these infections progressed, nonsynonymous variants typically increased in frequency even when within-host virus titers decreased. Unlike the short-lived infections of adults where de novo within-host variants are rare, longer infections in young children allow for the maintenance of virus diversity via mutation-selection balance creating potentially important opportunities for within-host virus evolution.

Introduction

Influenza A viruses (IAV) are some of the most prevalent human respiratory pathogens, infecting hundreds of millions of people worldwide each year. Because of the high error rates of the viral RNA polymerase complex, de novo mutants are generated as the viruses replicate within infected hosts (Andino and Domingo, 2015). However, the emergence of these variants within host does not mean that they will become the majority variant within the infected host or be transmitted between hosts. The evolution of IAVs is the product of a complex mosaic of evolutionary processes that include genetic drift, positive selection (Smith et al., 2004), transmission bottleneck effects (Varble et al., 2014; McCrone et al., 2018), and global migration patterns (Russell et al., 2008; Rambaut et al., 2008). Importantly, the resulting evolutionary dynamics can differ at the individual and population levels (Nelson and Holmes, 2007).

For seasonal IAVs at the global population level, antibody-mediated immune selection pressure from natural infection or vaccination positively selects for novel antigenic variants that facilitate immune escape, resulting in antigenic drift (Smith et al., 2004). However, at the within-host level, the role of positive selection exerted by immunity is less obvious. Several next-generation sequencing studies of typical, short-lived seasonal IAV infections in adult humans showed that intra-host genetic diversity of influenza viruses is low and dominated by purifying selection (McCrone et al., 2018; Dinis et al., 2016; Debbink et al., 2017; Valesano et al., 2020; Sobel Leonard et al., 2016). Additionally, large-scale comparative analyses of IAV hemagglutinin (HA) consensus sequences found limited evidence of positive selection on HA at the individual level regardless of the person’s expected influenza virus infection history (Han et al., 2019). Importantly, these studies focused on virus samples from only one or two timepoints, mostly early in infection, limiting the opportunities to study how virus populations evolved over the course of infection.

Separate from seasonal IAVs, zoonotic IAVs constantly pose new pandemic threats. Prior to becoming human-adapted seasonal strains, IAVs are introduced into the human population from an animal reservoir through the acquisition of host-adaptive mutations, sometimes via reassortment, resulting in global pandemics such as the 2009 swine influenza pandemic (Smith et al., 2009). In the 2009 pandemic, global virus genetic diversity increased rapidly during the early phases of the pandemic as a result of rapid transmissions in the predominantly naïve human population (Su et al., 2015). Over subsequent waves of the pandemic, host-adapting mutations that incrementally improved viral fitness and transmissibility in humans of A/H1N1pdm09 viruses emerged (Elderfield et al., 2014), eventually reaching fixation in the global virus population (Nogales et al., 2018).

At the individual level, the within-host evolutionary dynamics of the pandemic A/H1N1pdm09 virus, particularly in the early stages of the 2009 pandemic, have been relatively underexplored. To date, the only within-host genetic diversity analysis of A/H1N1pdm09 viruses during the initial phase of the pandemic was based on mostly single-timepoint samples collected within ~7 days post-symptom onset (Poon et al., 2016). Despite initial findings of high within-host diversity and loose transmission bottlenecks (Poon et al., 2016), these results were later disputed due to technical anomalies and subsequent reanalyses of a smaller subset of the original data found that intra-host genetic diversity of the pandemic virus was low and comparable to levels observed in seasonal IAVs (Xue and Bloom, 2019; Poon et al., 2019). It remains unclear how frequently host-adaptive mutations appear within hosts infected by a pandemic IAV and if these mutants are readily transmitted between individuals.

Here, we deep-sequenced 275 longitudinal clinical specimens sampled from 82 individuals residing in Southeast Asia between 2007 and 2009 that were either infected with seasonal A/H3N2 or pandemic A/H1N1pdm09 viruses. By analyzing minority variants found across the whole IAV genome, we characterized the evolutionary dynamics of within-host virus populations in these samples collected up to 2 weeks post-symptom onset.

Results

Study participants

The A/H3N2 virus samples were collected from 51 unlinked individuals as part of an oseltamivir dosage trial (South East Asia Infectious Disease Clinical Research Network, 2013; Koel et al., 2020). 48 of the 51 A/H3N2 virus-infected individuals were young children (median age = 2 years; interquartile range [IQR] = 2–3 years) at the time of sampling and most had low or no detectable anti-influenza virus antibody titers on days 0 and 10 post-symptom onset (Koel et al., 2020). Given that young children are substantial contributors to influenza virus transmission (Worby et al., 2015; Viboud et al., 2004), the samples analyzed here offer a valuable opportunity to investigate the within-host IAV evolutionary dynamics in this key population. The A/H1N1pdm09 virus specimens were collected from 32 individuals up to 12 days post-symptom onset. These individuals include both children and adults (median age = 10 years; IQR = 4–20 years) infected during the first wave of the pandemic in Vietnam (July–December 2009). 15 of the 32 individuals (including six index patients) were sampled in a household-based influenza cohort study (Horby et al., 2012). The remaining 16 unlinked individuals were hospitalized patients that were involved in two different oseltamivir treatment studies (South East Asia Infectious Disease Clinical Research Network, 2013; Hien et al., 2010) Details of all study participants are described in the respective cited studies and Supplementary file 4.

Genetic diversity of within-host virus populations

We used the number of minority intra-host single-nucleotide variants (iSNVs; 2% in frequencies) to measure the levels of genetic diversity of within-host IAV populations. Similar to previous studies (McCrone et al., 2018; Dinis et al., 2016; Debbink et al., 2017; Sobel Leonard et al., 2016), within-host genetic diversity of human A/H3N2 virus populations was low (median = 11 iSNVs, IQR = 7–16; Figure 1A). Within-host genetic diversity of pandemic A/H1N1pdm09 virus populations was also low, with a median number of 21 iSNVs (IQR = 13.5–30.0; Figure 1B) identified. Cycle threshold (Ct) values, and thus likely virus shedding, correlated with the number of days post-symptom onset for both IAV subtypes (A/H3N2: Spearman’s ρ=0.468, p=1.38×10-10; A/H1N1pdm09: ρ=0.341, p=0.048; Figure 1C, D). The number of iSNVs observed in within-host A/H3N2 virus populations weakly correlated with days since onset of symptoms in patients (ρ=0.463,p=2.22×10-10) and Ct values (ρ=0.508,p=1.20×10-12), suggesting that as infection progresses genetic variants accumulate within-host even as virus population size decreases (Figure 1A). On the other hand, there was no significant correlation between the number of iSNVs observed in within-host A/H1N1pdm09 virus populations and Ct values (ρ=0.198,p=0.21) or days post-symptom onset (ρ=-0.021,p=0.91Figure 1B).

Figure 1 with 5 supplements see all
Genetic diversity of within-host influenza A virus populations.

Box plots summarizing the number of intra-host single-nucleotide variants (iSNVs; median, interquartile range [IQR], and whiskers extending within median ±1.5 × IQR) identified in samples with adequate breadth of coverage across the whole influenza virus genome in (A) seasonal A/H3N2 and (B) pandemic A/H1N1pdm09 virus samples, stratified by day(s) since symptom onset or qPCR cycle threshold (Ct) values. (C, D) Ct values as a function of day(s) since symptom onset for A/H3N2 viruses (C) and A/H1N1pdm09 viruses (D).

Within-host evolutionary rates of IAV

To investigate within-host evolutionary dynamics, empirical rates of synonymous, nonsynonymous, and premature stop codon (i.e., nonsense) iSNVs were calculated by normalizing the summation of observed iSNV frequencies with the number of available sites and time since symptom onset (see Materials and methods). The overall within-host evolutionary rates of A/H3N2 viruses observed here are in the same order of magnitude (<10-5 divergence per site per day) as those reported in previous within-host seasonal influenza virus evolution studies (Figure 2A; Xue and Bloom, 2020). Synonymous evolutionary rates were significantly higher than nonsynonymous rates during the initial phase of A/H3N2 virus infections (Figure 2A), primarily in the polymerase complex and HA genes (Figure 2A, Figure 2—figure supplement 1, and Figure 3—figure supplement 1). Importantly, nonsynonymous variants gradually accumulated, increasing in rates around 4 days post-symptom onset to similar levels relative to synonymous rates. To ensure that this temporal trend was not due to aggregated effects across multiple individuals, we performed linear regression on the computed evolutionary rates for each A/H3N2-infected individual with at least three sampling timepoints (n = 39). Nonsynonymous evolutionary rates were positively correlated against time for 25/39 individuals (64%; Figure 2—figure supplement 3). In contrast, synonymous evolutionary rates were negatively correlated against time for 27 (69%) individuals.

Figure 2 with 4 supplements see all
Box plots (median, interquartile range [IQR], and whiskers extending within median ±1.5 × IQR) summarizing the empirical within-host evolutionary rates of (A) seasonal A/H3N2 viruses and (B) pandemic A/H1N1pdm09 viruses.

Top panel shows the evolutionary rate of individual gene segments over all timepoints (rg) while the bottom panel depicts the genome-wide evolutionary rate (rt) for each day since symptom onset. All rates are stratified by substitution type (synonymous – blue; nonsynonymous – red; gray – stop codon). Wilcoxon signed-rank tests were performed to assess if the paired synonymous and nonsynonymous evolutionary rates are significantly distinct per individual gene segment or timepoint (annotated with * if p<0.05). This was done for all sets of nonsynonymous and synonymous rate pairs except for those computed per day since symptom onset for A/H1N1pdm09 viruses due to the low number of data points available (median number of A/H1N1pdm09 virus samples collected per day since symptom onset = 2). Note that the scales of the y axes differ between (A) and (B) to better show rate trends.

Consolidating over all samples, most nonsynonymous variants were found in the nucleoprotein (NP) and neuraminidase (NA) gene segments (nonsynonymous to synonymous variant [NS/S] ratios = 1.69 [NP] and 1.32 [NA], whereas NS/S ratios were ≤1 for all other gene segments; Figure 2—figure supplement 1 and Supplementary file 1). While nonsynonymous NA mutations associated with oseltamivir resistance were positively selected for a subset of individuals in response to the antiviral treatment (Koel et al., 2020), nonsynonymous changes to NP were likely mediated by protein stability, T-cell immune response, and/or host cellular factors (see next section).

For A/H1N1pdm09 viruses during the first wave of the pandemic, the overall within-host evolutionary rate was as high as ∼10−4 divergence per site per day in some samples on day 0 post-symptom onset (Figure 2B). We observed higher nonsynonymous evolutionary rates relative to synonymous ones initially after symptom onset but were unable to determine if they were significantly different due to the low number of samples (i.e., median = 2 samples per day post-symptom onset). In turn, we also could not meaningfully characterize the temporal trends of within-host evolution for the pandemic virus with this dataset. Nonetheless, consolidating over all samples across all timepoints, there were significantly higher rates of accumulation of nonsynonymous variants in the polymerase basic 2 (PB2), polymerase acidic (PA), HA, and matrix (M) gene segments (Figure 2B, Figure 2—figure supplement 2, and Figure 3—figure supplement 2). All gene segments also yielded NS/S ratios > 1 (Supplementary file 1).

Intra-host minority variants

Most of the iSNVs identified for both virus subtypes were observed at low frequencies (2–5%; Figure 3) and appear to be stochastically introduced across the virus genome (Figure 4). Purifying selection dominated within-host seasonal A/H3N2 virus populations as the ratio of nonsynonymous to synonymous variants was 0.72 across all samples and variant frequencies (Figure 3A and Figure 3—figure supplement 1). Of note, the canonical antigenic sites of the HA gene segment (Wiley et al., 1981) of the A/H3N2 virus populations experienced strong negative selection as evidenced by the occurrence of synonymous variants (median frequency = 0.14, IQR range = 0.09–0.27) at far greater frequencies relative to those at non-antigenic sites of HA (median frequency = 0.03, IQR range = 0.03–0.05; Mann–Whitney U test p=1.18×10-24; Figure 4C). There were no significant differences in the frequencies of nonsynonymous iSNVs between the antigenic sites of H3 (median frequency = 0.04, IQR range = 0.03–0.06) and the rest of the HA gene segment (median frequency = 0.03, IQR range = 0.02–0.06; Mann–Whitney U test p=0.29; Figure 4C). In contrast, there were 1.94 times as many nonsynonymous minority iSNVs relative to synonymous ones identified in the pandemic A/H1N1pdm09 virus samples (Figure 3B and Figure 3—figure supplement 2). Variant frequencies of nonsynonymous iSNVs found in the antigenic epitopes of H1 (Caton et al., 1982) (median frequency = 0.04, IQR range = 0.04–0.05) were, however, not significantly different from those of non-antigenic sites (median frequency = 0.05, IQR range = 0.03–0.16; Mann–Whitney U test p=0.34; Figure 4D).

Figure 3 with 2 supplements see all
Histogram of the mean number of minority intra-host single-nucleotide variants (iSNVs) identified per sample across all.

(A) A/H3N2 and (B) A/H1N1pdm09 virus specimens, sorted by frequency bins of 5% and substitution type (synonymous – blue; nonsynonymous – red; stop-codon – gray).

Figure 4 with 9 supplements see all
Intra-host single-nucleotide variants in within-host IAV populations.

(A) Breakdown of intra-host single-nucleotide variants (iSNVs) identified in seasonal A/H3N2 virus samples. The top panels plot the nucleotide positions where iSNVs were found in at least two subjects. The bottom panels show the frequencies at which iSNVs were identified. For sites with iSNVs that were found in two or more subjects, the interquartile ranges of variant frequencies are plotted as vertical lines and the median frequencies are marked with a dash. If the iSNV was only found in one subject, its corresponding frequency is plotted as a circle. All iSNVs are stratified to either synonymous (blue), nonsynonymous (red), or stop codon (gray) variants. Only the nonsynonymous variants are plotted if both types of variants are found in a site. Positions of antigenic sites of the hemagglutinin (HA)_ gene segment (Igarashi et al., 2010) are marked in green on the top panels. (B) Box plots of the frequencies of synonymous and nonsynonymous variants between antigenic and non-antigenic sites of seasonal A/H3N2 HA gene segment. (C) Similar plots to (A) for iSNVs found in pandemic A/H1N1pdm09 virus samples. (D) Similar plots to (B) for HA iSNVs identified in the pandemic A/H1N1pdm09 virus samples.

As observed in a previous study using different data (Xue and Bloom, 2020), premature stop codon (nonsense) mutations accumulated within-host, though only at low rates. Here, we observed similarly low median nonsense rates, ranging between 0 and 1.29 × 10−6 divergence per site per day across the entire A/H3N2 virus genome over the course of infection (IQR limits range between 0 and at most, 1.82 × 10−6 divergence per site per day; Figure 2A). Premature stop codons accumulated in the matrix (M) genes predominantly but also appeared in all other influenza gene segments within various individuals (Figures 2A and 4A). Nonsense mutations also accumulated within the A/H1N1pdm09 virus samples (Figure 2B). Similar to A/H3N2 viruses, nonsense mutation rates were much lower compared to the synonymous and nonsynonymous counterparts (median genome-wide rate across all samples between 0 and 1.43 × 10−6 divergence per site per day; IQR limits between 0 and 2.18 × 10−6 divergence per site per day).

The premature stop codon mutations were mostly found at low frequencies for both influenza subtypes (<10%; Figure 3). The exception lies with one of the A/H3N2 virus samples where a premature stop codon was found in position 77 of the M2 ion channel with variant frequency as high as 34.6% (patient 1843, day 6 since symptom onset; Figure 4A and Figure 4—figure supplement 4). The premature stop codon in M2-77 was also found in 27 other individuals across multiple timepoints, albeit at a much lower frequency that never amounted more than 10% (Figure 4A and Figure 4—figure supplement 4). This was unlikely to be a sequencing artifact resulting from a mistaken incorporation of the primer sequence as its carboxyl terminal falls outside the coding region of the M gene segment (Supplementary file 3) and the variant frequencies would have been much higher in all samples if this was the case.

Despite the dominance of purifying selection in seasonal A/H3N2 intra-host viral populations, we detected several nonsynonymous variants of interest. Amino acid variants emerging in the HA and NA proteins were discussed in a previous work (Koel et al., 2020; Appendix A1). In the NP, there were two notable nonsynonymous variants, D101N/G and G384R, that appeared in multiple individuals who were sampled independently between 2007 and 2009 (Figure 4A and Figure 4—figure supplement 3). D101N/G was found in seven different patients and at least for D101G the mutation was previously linked to facilitating escape from MxA, a key human antiviral protein (Mänz et al., 2013). However, the nonsynonymous mutation was only found in low frequencies and remained invariant during the respective courses of infection for all seven patients (median variant frequency across all samples = 0.03; IQR = 0.02–0.07).

NP-G384R emerged in 16 unlinked patients infected by A/H3N2 virus. Even though G384R did not become the majority variant in any of these individuals (median variant frequency across all samples = 0.14; IQR = 0.07–0.20), the variant emerged around days 4–5 post-symptom onset and mostly persisted within each individual for the rest of sampled timepoints. G384R is a stabilizing mutation in the A/Brisbane/10/2007 A/H3N2 virus NP background (Ashenberg et al., 2013) that is similar to the viruses investigated here. Interestingly, position 384 is an anchor residue for several NP-specific epitopes recognized by specific cytotoxic T lymphocytes (CTLs) that are under continual selective pressure for CTL escape (Berkhoff et al., 2005; Gog et al., 2003). The wild-type glycine residue is known to be highly deleterious even though it was shown to confer CTL escape among HLA-B*2705-positive individuals (Gong et al., 2013; Rimmelzwaan et al., 2004; Berkhoff et al., 2004).

Using a maximum-likelihood (ML) approach to reconstruct and estimate the frequencies of the most parsimonious haplotypes of each gene segment, we computed linkage disequilibrium and found evidence of potential epistatic co-variants to NP-G384R in the A/H3N2 virus populations of multiple individuals (Figure 5 and Supplementary file 2). When analyzing how these variants could alter protein stability using FoldX, the stabilizing effects of G384R (mean ΔΔG=-3.84 kcal/mol [SD = 0.06 kcal/mol]) were found to alleviate the likely destabilizing phenotype of a functionally relevant linked variant in two of the three co-mutation pairs identified in separate individuals (i.e., G384R/M426I and G384R/G102R; Supplementary file 2). In the first individual (subject 1224), M426I was inferred to have emerged among the viral haplotypes encoding NP-G384R on the 10th day post-symptom onset (D10). M426I may be compensating for T-cell escape that was previously conferred by 384G even though the two amino acid sites are anchor residues of different NP-specific CTL epitopes (Berkhoff et al., 2005). M426I was found to be highly destabilizing (mean ΔΔG=2.61 kcal/mol [standard deviation, SD = 0.05 kcal/mol]; Table 1) but when co-mutated with G384R, stability changes to NP were predicted to be neutral (mean ΔΔG=-0.42 kcal/mol [SD = 0.06 kcal/mol]). In the second individual (subject 1686), G102R was likely linked to G384R in the within-host virus populations found in the D10 sample. As a single mutant, G102R is also destabilizing to NP (mean ΔΔG=4.87 kcal/mol [SD = 0.00 kcal/mol]). However, when combined with G384R, NP stability was only weakly destabilizing (mean ΔΔG=0.76 kcal/mol [SD = 0.09 kcal/mol]). G102R was previously found to bypass the need for cellular factor importin-α7, which is crucial for viral replication and pathogenicity of IAVs in humans (Resa-Infante et al., 2015; Resa-Infante et al., 2019; Gabriel et al., 2011).

The trimeric and monomeric crystal structures of nucleoprotein (PDB: 3ZDP) (Chenavas et al., 2013) of influenza A viruses.

Amino acid sites with potentially linked epistatic amino acid variants as tabulated in Table 1 are separately colored, with their corresponding positions annotated on the monomeric structure.

Table 1
FoldX stability predictions of likely linked nonsynonymous minority variants found in A/H3N2 nucleoprotein.

The mean ΔΔG and standard deviation (SD) values reported are based on the results of five distinct simulations. Variants with mean ΔΔG<-0.46 kcal/mol are deemed to be stabilizing while destabilizing mutants were estimated to yield ΔΔG>0.46 kcal/mol.

ΔΔG(kcal/mol)
VariantsMeanSD
G384R−3.840.06
M426I2.610.05
G384R,M426I−0.420.06
G102R4.870.00
G384R,G102R0.760.09
A493T11.960.30
G384R,A493T5.560.19
V197I−3.110.02
S353Y−1.970.68
V197I,S353Y−4.480.14

For the pandemic A/H1N1pdm09 viruses, most of the nonsynonymous variants were found singularly in individual patients (Figure 4B). Putative HA antigenic minority variants were found in four individuals in distinct amino acid sites (G143E, N159K, N197K, and G225D; H3 numbering without signal peptide; Figure 4—figure supplement 5). All of these variants were found at frequencies 5% and the wild-type residues have been conserved in the corresponding positions globally to date, with the exception of position 225. Here, HA-225G was the majority variant (76%) in a hospitalized individual (subject 11-1022; Supplementary file 4) and D225G is linked to infections with severe disease outcomes (Mak et al., 2010). Furthermore, one of the few nonsynonymous iSNVs that co-emerged in multiple unlinked patients was found in the usually conserved stem of the HA protein, L455F/I (H3 numbering without signal peptide), appearing in 17 separate individuals (Figure 4B and Figure 4—figure supplement 5). The amino acid variant was found in patients from different time periods and geographical locations (Supplementary file 4), thus it is unlikely that this was a unique variant shared among individuals in the same transmission cluster. It was observed as early as day 0 post-symptom onset for some patients and seemed to persist during the infection but only as a minority variant at varying frequencies (median frequency across all samples with mutation = 0.20; IQR = 0.08–0.28). However, this position has also been conserved with the wild-type Leucine residue in the global virus population to date. Hence, it is unclear if HA-L455F/I actually confers any selective benefit even though it was independently found in multiple patients.

We also found oseltamivir resistance mutation H275Y (Mai et al., 2010) in the NA proteins in two unlinked individuals who were infected with the A/H1N1pdm09 virus and treated with oseltamivir (Figure 4—figure supplement 6 and Supplementary file 4). 275Y quickly became the majority variant in both patients within 3–4 days after the antiviral drug was first administered. Finally, there were two other amino acid variants in the M2 ion channel that appeared within multiple subjects in parallel across different geographical locations – L46P and F48S were identified in 8 and 16 patients, respectively, in a range of frequencies (L46P: median frequency = 0.04, IQR = 0.04–0.05; F48S: median frequency = 0.08, IQR = 0.03–0.13) but similarly, never becoming a majority variant in any of them (Figure 4 and Figure 4—figure supplement 7). Again, the wild-type residues were mostly conserved in the global virus population since the pandemic.

Within-host simulations

To investigate the evolutionary pressures that likely underpin the observed within-host dynamics of A/H3N2 viruses in young children (Figure 2), we performed forward-time Monte Carlo simulations. Given that the median age of the children infected by A/H3N2 virus at the time of sample collection was 2 years of age (IQR = 2–3 years), most of them were likely experiencing one of their first influenza virus infections. Furthermore, influenza vaccination for children is not part of the national vaccination program in Vietnam. As such, most of the children analyzed here lacked influenza virus-specific antibodies based on hemagglutination inhibition assays (Koel et al., 2020). Since seasonal A/H3N2 viruses have circulated within the human population since 1968, the virus is well adapted to human hosts at this point such that most nonsynonymous mutations are likely highly deleterious and would not reach detectable frequencies. We hypothesized that detected variants are mostly expected to be weakly deleterious, and thus not purged fast enough by selection such that mutation-selection balance was observed.

Our simulations used a simple within-host evolution model represented by a binary genome that distinguishes between synonymous and nonsynonymous loci. Given that the estimated transmission bottleneck sizes for seasonal A/H3N2 viruses (McCrone et al., 2018; Ghafari et al., 2020) are narrow at 1–2 genomes, we modeled an expanding virus population size during the initial timepoints of the infection that started with one virion. If within-host virus populations were to evolve neutrally, we would observe similar synonymous and nonsynonymous evolutionary rates throughout the infection (Figure 6A). On the other hand, if negative selection is sufficiently strong, accumulation of deleterious nonsynonymous variants will decrease substantially with time (Figure 6B). Clearly, these patterns were not observed for A/H3N2 viruses (Figure 2A). However, if most de novo nonsynonymous mutations are only weakly deleterious, we would observe larger synonymous evolutionary rates initially before nonsynonymous variants accumulate to similar levels (Figure 6C). By then, virion population size (N) would also be large enough relative to the virus mutation rate (µ) (i.e., Nμ1; Appendix A5) such that mutation-selection balance is expected and evolutionary rates remain fairly constant, similar to the patterns empirically observed for within-host A/H3N2 virus populations (Figure 2A).

Figure 6 with 1 supplement see all
Evolutionary rates computed from forward-time Monte Carlo within-host simulations for different mean deleterious effects (sd-) of nonsynonymous mutations.

We assumed that synonymous mutations are neutral for all simulations. (A) Neutral expectation where all nonsynonymous mutations are neutral (fneu,NS=100%). We tested our hypotheses where the majority of nonsynonymous mutations are non-neutral (fneu,NS=1%) and they are either (B) strongly (sd-=-0.1) or (C) weakly (sd-=-0.01) deleterious.

Discussion

Multiple next-generation sequencing studies have found little evidence of positive selection in seasonal influenza virus populations of acutely infected individuals (McCrone et al., 2018; Dinis et al., 2016; Debbink et al., 2017; Valesano et al., 2020; Sobel Leonard et al., 2016; McCrone et al., 2020). Recent modeling work showed that the time required to initiate new antibody production and asynchrony with virus exponential growth limits the selection of de novo antigenic variants within host in acute seasonal influenza virus infections (Morris et al., 2020). In contrast, phenotypically relevant variants that were positively selected in within-host virus populations of severely immunocompromised patients coincided with those selected by the global seasonal IAV population (Xue et al., 2017; Lumby et al., 2020). This implies that within-host evolutionary dynamics of seasonal IAVs in immunocompromised individuals are likely to be substantially different owing to the increased time for virus diversity to accumulate and for selection to act (Petrova and Russell, 2018). In other words, the duration of infection is likely to be critical for positive evolutionary selection to be effective within host.

Viral shedding duration is often longer in young children infected with seasonal influenza virus compared to otherwise healthy adults (Ng et al., 2016). Children also play a critical role in ‘driving’ influenza epidemics due to their higher contact and transmission rates (Worby et al., 2015; Viboud et al., 2004). As such, our seasonal A/H3N2 virus results fill an important gap in the current literature of within-host evolutionary studies of seasonal IAVs as most of the samples analyzed were collected from children under the age of 6 years up to 2 weeks post-symptom onset. Importantly, the absence of antibody-mediated immunity in young unvaccinated children, which would otherwise reduce the extended duration of infection, has the potential to facilitate other routes of virus evolution.

Similar to the aforementioned within-host studies, the A/H3N2 virus population within these children was characterized by low genetic diversity and dominated by purifying selection early in the infection. Due to a lack of antibody response against the antigenic regions of HA (Koel et al., 2020), it is unsurprising that we observed a lack of adaptive changes to the HA antigenic regions, similar to adults in previous studies (McCrone et al., 2018). We also found that the polymerase genes were subjected to purifying selection, indicating their critical role in virus replication as negative selection purges deleterious variation. However, while purifying selection is detectable, it is incomplete (Xue and Bloom, 2020). We observed that most nonsynonymous variants began to accumulate around 3–4 days post-symptom onset, with incrementally higher empirical rates as the infection progressed.

Through simulations of a within-host evolution model, we investigated the hypothesis that in the absence of any positive selection, the accumulation of nonsynonymous iSNVs was a result of their neutral or only weakly deleterious effects and the expanding within-host virion population size during later timepoints in longer infections of naïve young children such that mutation-selection balance was reached. In contrast, this balance was not detected in otherwise healthy older children or adults with short-lived influenza virus infections lasting no more than a week where de novo nonsynonymous iSNVs are rarely found (McCrone et al., 2018; Dinis et al., 2016; Debbink et al., 2017; Valesano et al., 2020; Sobel Leonard et al., 2016; McCrone et al., 2020). The maintenance of genetic diversity through mutation-selection balance within these young children may provide opportunities for the emergence of phenotypically relevant mutations, which deleterious effects could be alleviated by the accumulation of a secondary compensatory mutations. For example, in one individual NP-G384R was accompanied by NP-M426I, which is an anchor residue of a CTL epitope of NP, abrogating recognition by HLA-B*3501-positive CTLs (Berkhoff et al., 2005) but is likely to be deleterious based on our computational protein stability predictions. G384R, which is located in a CTL epitope distinct from M426I (Berkhoff et al., 2005), was previously shown to be a stabilizing substitution (Ashenberg et al., 2013).

Interestingly, we also observed G384R in the minority virus population of 15 other unlinked individuals. Besides improving NP stability, G384R restores recognition by HLA-B*2705-positive NP-specific CTLs (Berkhoff et al., 2004). The NP gene segment in the global A/H3N2 virus population has an evolutionary history of fixating destabilizing amino acid mutations that promote CTL immune escape alongside stabilizing substitutions that compensate for the deleterious effects of the former (Gong et al., 2013). The reversal R384G mutation confers CTL escape but is known to be highly deleterious. This substitution was fixed in the global A/H3N2 virus population during the early 1990s as other substitutions such as S259L and E375G epistatically alleviated its destabilizing effects (Gong et al., 2013). One possible explanation for the emergence of G384R as a minority variant within these unlinked individuals is that they are all HLA-B*2705 negative. However, we did not collect the necessary blood samples to investigate this possibility.

In contrast, we found a substantially higher fraction of nonsynonymous variants in the within-host virus populations of individuals infected with A/H1N1pdm09 virus during the pandemic. Owing to the low number of A/H1N1pdm09 virus samples and different next-generation sequencing platforms used to sequence samples of the two virus subtypes and consequently differences in base calling error rates and depth of coverage (Figure 1—figure supplement 1), we were unable to directly compare the observed levels of within-host genetic diversity and its temporal trends between the two influenza subtypes here. However, given that only iSNVs with frequencies 2% were called, low-frequency minority variants arising from technical-related errors should be minimized (Watson et al., 2013). Importantly, the relative number of nonsynonymous iSNVs identified was far greater than synonymous ones in the pandemic A/H1N1pdm09 virus infections, suggesting that there was room for further human host adaptation, particularly in the HA but also in the polymerase gene segments similar to those observed in other zoonotic influenza virus infections (Welkers et al., 2019).

Given the tight estimated transmission bottleneck size (Appendix A4), the relatively large number of iSNVs identified at the start of symptom onset and simulations of within-host evolution (Figures 1B, 2B, and 6D), it is unlikely that the initial within-host A/H1N1pdm09 virus populations sampled were the inoculating population that founded the infection. Instead, the inoculating viral population had already undergone substantial within-host replication during the incubation period before symptom onset. In fact, four of the individuals analyzed were asymptomatic (i.e., H058/S02, H089/S04, H186/S05, and H296/S04; Supplementary file 4). Additionally, pre-symptomatic virus shedding was observed in some of the secondary household cases (Thai et al., 2014) and pre-symptomatic transmission has been documented in other settings (Suess et al., 2012). Nonetheless, this would not meaningfully impact our conclusions as most of the within-host viral populations sampled at the start of symptom onset should still constitute those found early in infection and the contrasting feature where nonsynonymous iSNVs outnumbered synonymous ones were not observed in the seasonal A/H3N2 virus samples.

For both A/H3N2 and A/H1N1pdm09 virus samples, nonsense iSNVs resulting in premature stop codons were found to accumulate within host, even though only at low proportions. The accumulation of premature stop codon mutations further suggests that while purifying selection dominates within-host influenza virus populations, it may not be acting strongly enough to completely purge these lethal nonsense mutations (Xue and Bloom, 2020). Additionally, it has been recently found that incomplete influenza virus genomes frequently occur at the cellular level and that efficient infection depends on the complementation between different incomplete genomes (Jacobs et al., 2019). As such, nonsense mutations may not be as uncommon as previously thought. In particular, nonsense mutations in position 77 of the M2 ion channel were independently found in 27 unlinked individuals infected by A/H3N2 virus. While these nonsense mutations are generally considered to be lethal, ion channel activity is retained even if the M2 protein was prematurely truncated up to position 70 at its cytoplasmic tail (McCown and Pekosz, 2005).

Our study has several limitations. The number of iSNVs identified can potentially be biased by variations in sequencing coverage (Zhao and Illingworth, 2019). As such, the number of iSNVs observed in one intra-host virus populations may not be directly comparable to another with a distinct coverage profile (Figure 1—figure supplement 1). As an alternative, the nucleotide diversity π statistic (Nei and Li, 1979) may be a more robust measure of within-host diversity as it solely depends on the underlying variant frequencies (Zhao and Illingworth, 2019). Computing the corresponding π statistics for our data, we observed trends in genetic diversity that were similar to those inferred using iSNV counts (Appendix A2 and Appendix 1—figure 1).

To ensure accurate measurements of virus diversity in intra-host populations, we would also need to be certain that the estimated variant frequencies precisely reflect the distributions of variants that comprise the sampled virus populations. The inferred variant frequencies can be significantly distorted if virus load is low (Illingworth et al., 2017; Xue et al., 2018). As such, we limited our analyses for both virus subtypes to samples with Ct values ≤35, which likely afford sufficient virus material for sequencing (Xue et al., 2018). We were unable to estimate the amount of frequency estimation errors for the A/H1N1pdm09 virus samples as only one sequencing replicate was performed using the universal 8-segment PCR method (Hoffmann et al., 2001). However, for the A/H3N2 virus samples, independent PCR reactions were performed using three partly overlapping amplicons for all gene segments other than the nonstructural and matrix genes. We compared the variant frequencies estimated for any overlapping sites generated by reads derived from distinct amplicons with sufficient coverage (>100×). Variant frequencies computed from independent amplicons agreed well with each other across the range of Ct values of the samples from which variants were identified (Figure 1—figure supplement 2), affirming the precision of our iSNV frequency estimates for the A/H3N2 virus samples, including those with higher Ct values.

We also performed additional checks to ensure that our results were not driven by potential PCR and/or technical artifacts. First, we excluded all iSNVs found under the 75th percentile of frequency range of A/H3N2 variants that were found in only one of the overlapping amplicons. We then recomputed the daily within-host evolutionary rates with the remaining iSNVs (Figure 2—figure supplement 3) and found that the relative temporal trends in synonymous and nonsynonymous rates remain similar to those in Figure 2A. We also checked that the distributions of frequencies for iSNVs found in recurrent mutation sites (i.e., NP-384 and M2-77) that are below variant calling threshold are comparable to those found in their neighboring sites (±10 nucleotide positions; Figure 4—figure supplement 8). Furthermore, we remapped the sample reads to their respective consensus sequences to minimize mapping of technical artifacts. We were still able to detect the recurring NP-G384R and M2-R77* amino acid mutations in multiple individuals and timepoints at similar frequencies when mapped to the reference genome (Figure 4—figure supplements 4, 5, and 9). As such, these recurrent mutations are unlikely to have been resulted from erroneous variant calls of artifacts.

Finally, most study participants received oseltamivir during the course of their infections (Supplementary file 4). Although we were unable to identify any potential effects of enhanced viral clearance or any other evolutionary effects due to the treatment, besides oseltamivir resistance-associated mutations, it is unlikely that the antiviral treatment had a substantial impact on our results. First, the median timepoint in which the antiviral was initially administered was 4 days post-symptom onset (IQR = 3–6 days; Supplementary file 4). Previous studies showed that enhanced viral clearance of IAVs was mostly observed among patients who were treated with oseltamivir within 3 days of symptom onset (South East Asia Infectious Disease Clinical Research Network, 2013; Lee et al., 2009; Ling et al., 2010). Of note, late timepoint samples in this study (≥8 days since symptom onset) mostly came from individuals who started oseltamivir treatments ≥4 days post-symptom onset (Figure 1—figure supplement 5). Second, at least in vitro, there were no differences in the levels of genetic diversity observed in influenza virus populations after multiple serial passages whether they were treated with oseltamivir or not (Renzette et al., 2014).

To conclude, we presented how intra-host populations of seasonal and pandemic influenza viruses are subjected to contrasting evolutionary selection pressures. In particular, we showed that the evolutionary dynamics and ensuing genetic variation of these within-host virus populations changes during the course of infection, highlighting the importance for sequential sampling, particularly for longer-than-average infections such as those in the young children studied here.

Materials and methods

Sample collection and viral sequencing

The A/H3N2 virus samples were collected from 52 patients between August 2007 and September 2009 as part of an oseltamivir dosage trial conducted by the South East Asia Infectious Disease Clinical Research Network (SEAICRN), which is detailed in a previous work (South East Asia Infectious Disease Clinical Research Network, 2013). Briefly, patients with laboratory-confirmed influenza virus infection and duration of symptoms ≤10 days were swabbed for nose and throat samples daily between 0 and 10 days as well as day 14 upon enrolment for the study (Supplementary file 4). All PCR-confirmed A/H3N2 virus samples with cycle threshold (Ct) values ≤35 were included for sequencing.

Library preparation and viral sequencing protocols performed on these A/H3N2 virus samples are elaborated in detail in Koel et al., 2020. Here, we highlight the key aspects of our preparation and sequencing procedures. Using segment-specific primers (Supplementary file 3), we performed six independent PCR reactions, resulting in three partly overlapping amplicons for each influenza virus gene segment other than the matrix (M) and nonstructural (NS) genes where a single amplicon was produced to cover the entirety of the relatively shorter M and NS genes. The use of shorter but overlapping amplicons in the longer gene segments improves amplification efficiency, ensuring that these longer segments are sufficiently covered should there be any RNA degradation in the clinical specimen. These overlapping PCR products were pooled in equimolar concentrations for each sample and purified for subsequent library preparation. Sequencing libraries were prepared using the Nextera XT DNA Library Preparation kit (Illumina, FC-131-1096) as described in Koel et al., 2020. Library pools were sequenced using the Illumina MiSeq 600-cycle MiSeq Reagent Kit v3 (Illumina, MS-102-3003).

The A/H1N1pdm09 virus samples were obtained as part of a household-based influenza virus cohort study that was also performed by SEAICRN. The study was conducted between July and December 2009, involving a total of 270 households in Ha Nam province, Vietnam (Horby et al., 2012). Similarly, combined nose and throat swabs were collected daily for 10–15 days from individuals with influenza-like illness (i.e., presenting symptoms of fever >38°C and cough, or sore throat) and their household members, including asymptomatic individuals (Supplementary file 4). We also analyzed additional samples collected from unlinked hospitalized patients who were infected by the A/H1N1pdm09 virus from two major Vietnamese cities (Hanoi and Ho Chih Minh) during the first wave of the pandemic (South East Asia Infectious Disease Clinical Research Network, 2013; Hien et al., 2010). A total of 32 PCR-confirmed A/H1N1pdm09-infected individuals originating from both households and hospitalized cases were selected for sequencing based on availability and Ct values ≤33 (Supplementary file 4).

For the A/H1N1pdm09 virus samples, RNA extraction was performed manually using the High Pure RNA isolation kit (Roche) with an on-column DNase treatment according to the manufacturer’s protocol. Total RNA was eluted in a volume of 50 μl. Universal influenza virus full-genome amplification was performed using a universal 8-segment PCR method as described previously (Watson et al., 2013; Zhou et al., 2009; Jonges et al., 2014). In short, two separate RT-PCRs were performed for each sample, using primers common-uni12R (5′-GCCGGAGCTCTGCAGAT ATCAGCRAAAGCAGG-3′), common-uni12G (5′-GCCGGAGCTCTG CAGATATCAGCGAAAGCAGG-3′), and common-uni13 (5′-CAGGAA ACAGCTATGACAGTAGAAACAAGG-3′). The first RT-PCR mixture contained the primers common-uni12R and common-uni13. The second RT-PCR mixture contained the primers common-uni12G and common-uni13, which greatly improved the amplification of the PB2, PB1, and PA segments. Reactions were performed using the One-Step RT-PCR kit High Fidelity (Invitrogen) in a volume of 50 μl containing 5.0 μl eluted RNA with final concentrations of 1xSuperScript III One-Step RT-PCR buffer, 0.2 μM of each primer, and 1.0 μl SuperScript III RT/Platinum Taq High Fidelity Enzyme Mix (Invitrogen). Thermal cycling conditions were as follows: reverse transcription at 42°C for 15 min, 55°C for 15 min, and 60°C for 5 min; initial denaturation/enzyme activation of 94°C for 2 min; 5 cycles of 94°C for 30 s, 45°C for 30 s, slow ramp (0.5 °C/s) to 68°C, and 68°C for 3 min; 30 cycles of 94°C for 30 s, 57°C for 30 s, and 68°C for 3 min; and a final extension of 68°C for 5 min. After the PCR, equal volumes of the two reaction mixtures were combined to produce a well-distributed mixture of all eight influenza virus segments. All RT-PCRs were performed in duplicate. Samples were diluted to a DNA concentration of 50 ng/µl followed by ligation of 454 sequencing adaptors and molecular identifier (MID) tags using the SPRIworks Fragment Library System II for Roche GS FLX+ DNA Sequencer (Beckman Coulter), excluding fragments smaller than 350 base pairs, according to the manufacturer's protocol to allow for multiplex sequencing per region. The quantity of properly ligated fragments was determined based on the incorporation efficiency of the fluorescent primers using FLUOstar OPTIMA (BMG Labtech). Emulsion PCR, bead recovery and enrichment were performed manually according to the manufacturer's protocol (Roche), and samples were sequenced in Roche FLX+ 454. Sequencing was performed at the Sanger Institute, Hinxton, Cambridge, England, as part of the FP7 program EMPERIE. Standard flowgram format (sff) files containing the filter passed reads were demultiplexed based on the molecular identifier (MID) sequences using QUASR package version 7.0 (Watson et al., 2013).

Read mapping

Trimmomatic (v0.39; Bolger et al., 2014) was used to discard reads with length <30 bases while trimming the ends of reads where base quality scores fall below 20. The MAXINFO option was used to perform adaptive quality trimming, balancing the trade-off between longer read length and tolerance of base calling errors (target length = 40, strictness = 0.4). For the A/H3N2 virus samples, the trimmed paired reads were merged using FLASH (v1.2.11) (Magoč and Salzberg, 2011). All remaining reads were then locally aligned to A/Brisbane/10/2007 genome (GISAID accession: EPI_ISL_103644) for A/H3N2 virus samples and A/California/4/2009 genome (EPI_ISL_376192) for A/H1N1pdm09 virus samples using Bowtie2 (v2.3.5.1) (Langmead and Salzberg, 2012). Aligned reads with mapping scores falling below 20 alongside bases with quality score (Q-score) below 20 were discarded.

Variant calling and quality filters

Minority variants of each nucleotide site with a frequency of at least 2% were called if the nucleotide position was covered at least 50× (H1N1pdm09) or 100× (H3N2) and the probability that the variant was called as a result of base calling errors (pErr) was less than 1%. pErr was modeled by binomial trials (Illingworth, 2016):

pErr=i=nNNipei1-peN-i

where pe=-10-Q-score10, N is the coverage of the nucleotide site in question, and n is the absolute count of the variant base tallied.

While lower coverage at both ends of individual gene segments was expected, there were also variable coverage results across gene segments for some samples that were mapped to A/H3N2 virus (Figure 1—figure supplement 1). In order to retain as many samples deemed to have adequate coverage across whole genome, a list of polymorphic nucleotide sites found to have >2% minority variants in more than one sample was compiled. Each gene segment of a sample was determined to achieve satisfactory coverage if >70% of these polymorphic sites were covered at least 100×. For A/H1N1pdm09, the gene segment of a sample was deemed to be adequately covered if 80% of the gene was covered at least 50×.

The number of iSNVs observed in A/H3N2 virus samples collected from subject 1673 (39–94 iSNVs in three samples collected from 3 [D3] to 5 [D5] days post-symptom onset) and the D8 sample for subject 1878 (73 iSNVs) were substantially greater than the numbers in all other samples. The putative majority and minority segment-concatenated sequences of these samples did not cluster as a monophyletic clade among themselves phylogenetically (Figure 1—figure supplement 3), suggesting that these samples might be the product of mixed infections or cross-contamination. These samples were consequently excluded from further analyses.

Empirical within-host evolutionary rate

The empirical within-host evolutionary rate (rg,t) of each gene segment (g) in a sample collected on t day(s) since symptom onset was estimated by

rg,t=ing,tfg,t,ing,tt

where fg,t,i is the frequency of minority variants present in nucleotide site i for gene segment g and ng,t is the number of all available sites (Xue and Bloom, 2020). Distinct rates were calculated for synonymous and nonsynonymous iSNVs. If a variant was found in overlapping reading frames and a nonsynonymous change was observed in any of those frames, it would be accounted for as a nonsynonymous mutation. The corresponding whole-genome evolutionary rate (rt) on day t is computed by summing the rates across all gene segments:

rt=grg,t

Haplotype reconstruction

The most parsimonious viral haplotypes of each gene segment were reconstructed by fitting the observed nucleotide variant count data to a Dirichlet multinomial model using a previously developed ML approach to infer haplotype frequencies (Ghafari et al., 2020). Assuming that the viral population is made up of a set of K haplotypes with frequencies qk, the observed partial haplotype frequencies ql for a polymorphic site l can be computed by multiplying a projection matrix Tl. For instance, if the set of hypothetical full haplotypes is assumed to be {AA, GA, AG}, the observed partial haplotype frequencies for site l=1, qA- and qG- are computed as

ql=TlqkqA-qG-=101010×qAAqGAqAG

A list of potential full haplotypes was generated from all combinations of nucleotide variants observed in all polymorphic sites of the gene segment. Starting from K=1 full haplotype, the optimal full haplotype frequency qk is inferred by maximizing the likelihood function:

LL=llogxlTlqk,φ

where is Dirichlet multinomial likelihood, xl is the observed variant count data for read type l, and φ is the overdispersion parameter, assumed to be 1×10-3. Simulated annealing was used to optimize the haplotype frequencies by running two independent searches for at least 5000 states (iterations) until convergence was reached. In each state, the distribution of qk was drawn from a Gaussian distribution centered at the frequency distribution of the previous state with a standard deviation of 0.05. One additional haplotype was added to the set of K full haplotypes during each round of optimization.

The resulting K haplotypes reconstructed depend on the order in which the list of potential full haplotypes is considered. As mentioned above, paired-end reads were merged to produce longer reads (up to ~500–600 base pairs) for mapping in the case of the seasonal A/H3N2 virus samples. Additionally, the single-stranded A/H1N1pdm09 viral reads generated from 454 sequencing can be as long as ~500 base pairs. Consequently, there was a non-trivial number of reads where co-mutations were observed in multiple polymorphic sites. Since iSNV frequencies are generally low, haplotypes with co-mutating sites would inevitably be relegated to the end of the list order if ranked by their expected joint probabilities. As such, the list of full potential haplotypes was ordered in descending order based on the score of each full haplotype set k (sk):

sk=fss,k×fms,k

where fss,k and fms,k are both joint probabilities of the full haplotype k computed in different ways. fss,k is the expected joint probability frequency calculated from the observed independent frequencies of each variant for each polymorphic site found in the full haplotype k. fms,k is based on the observed frequencies of variants spanning across the sets of highest hierarchal combination of polymorphic sites (fms,k).

For example, given a segment where iSNVs were found in three sites, the following reads were mapped: (A, A, C), (T, A, C), (A, T, C), (A, C, –), (–, A, C), and (–, T, C). We can immediately see that the top hierarchal combination of polymorphic sites (i.e., possible haplotypes) are (A, A, C), (T, A, C), and (A, T, C) (i.e., we would compute fms,(A,A,C), fms,(T,A,C), and fms,(A,T,C), respectively). The observed number of reads with (–, A, C) will be counted towards the computation of both fms,(A,A,C) and fms,(T,A,C) since they could be attributed to either haplotype. Similarly, reads with (–, T, C) will be absorbed towards the counts to compute fms,(A,T,C). Finally, we see that reads with (A, C, –) are not a subset of any of the top hierarchal haplotypes considered. As such, they form the fourth possible top hierarchal haplotype on its own. As such, if we were to compute the ranking for haplotype (A, A, C):

s(A,A,C)=fss,(A,A,C)×fms,(A,A,C)={f(A,,)×f(,A,)×f(,,C)}×fms,(A,A,C)

If any nucleotide variants in the observed partial haplotypes were unaccounted for in the current round of full haplotypes considered, they were assumed to be generated from a cloud of 'noise' haplotypes that were present in no more than 1%. Bayesian information criterion (BIC) was computed for each set of full haplotypes considered, and the most parsimonious set of K haplotypes was determined by the lowest BIC value.

Linkage disequilibrium

Using the estimated frequencies of the most parsimonious reconstructed haplotypes, conventional Lewontin’s metrics of linkage disequilibrium were computed to detect for potential epistatic pairs of nonsynonymous variants:

LDij=q^ij-q^iq^j

where q^i and q^j are the estimated site-independent iSNV frequencies of sites i and j, respectively, while q^ij is the frequency estimate of variants encoding co-variants in both i and j. Dividing LD by its theoretical maximum normalizes the linkage disequilibrium measure:

LD'=LDLDmax
LDmax={max{q^iq^j,(1q^i)(1q^j)}ifLD>0min{q^i(1q^j),(1q^i)q^j}ifLD<0

FoldX analyses

FoldX (https://foldxsuite.crg.eu/) was used to estimate structural stability effects of likely linked nonsynonymous minority variants found in the NP of within-host A/H3N2 virus populations. At the time of writing of this paper, there was no A/H3N2-NP structure available. Although the eventual NP structure (PDB: 3ZDP) adopted for stability analyses was originally derived from H1N1 virus (A/WSN/33) (Chenavas et al., 2013), it was the most well-resolved (2.69 Å) crystal structure available, with 78.5% amino acid identity relative to the NP of A/Brisbane/10/2007. Previous work has shown that the mutational effects predicted by FoldX using an NP structure belonging to A/WSN/33 (H1N1) were similar to those experimentally determined on a A/Brisbane/10/2007 NP (Ashenberg et al., 2013). FoldX first removed any potential steric clashes to repair the NP structure. It then estimated differences in free energy changes as a result of the input amino acid mutation (i.e., ΔΔG=ΔGmutant-ΔGwild-type) under default settings (298 K, 0.05 M ionic strength, and pH 7.0). Five distinct simulations were made to estimate the mean and standard deviation ΔΔG values.

Within-host simulations

We implemented forward-time Monte Carlo simulations with varying population size using a simplified within-host evolution model to test if our hypotheses could explain the different evolutionary dynamics observed between A/H3N2 and A/H1N1 viral populations. We assumed that a single virion leads to a productive influenza virus infection within an individual and computed changes in the virus population size (N) using a target cell-limited model. New virions are produced upon infection by existing virions at a rate of βCN, where C is the existing number of target cells while β is the rate of per-cell per-virion infectious contact. Upon infection, a cell will produce r number of virions before it is rendered unproductive. We assume that infected individuals did not mount any antibody-mediated immune response, setting the virus’ natural per capita decay rate (d) such that virions continue to be present within host for 14 days (Figure 6—figure supplement 1 and Table 2). β is then computed by fixing the within-host basic reproduction number (R0):

R0=βC0rd

where C0 is the initial (maximum) target cell population size. We solve the following system of ordinary differential equations numerically to compute the number of virions per viral replicative generation (N(t)):

dCdt=-βCN
dNdt=βCN-dN
Table 2
Parameter values used in the within-host model.
ParameterMeaningValue (units)Source
-Number of hours per replicative generation6 hrAssumption
rAverage number of virions produced by an infected cell100 virionsFrensing et al., 2016
C0Initial target cell population size4 × 108 virionsHadjichrysanthou et al., 2016
dPer capita decay rateTwo per-generationAssumption
R0Within-host basic reproduction number5Hadjichrysanthou et al., 2016
µPer-site, per-generation mutation rate3 × 10−5 per-site, per-generationMcCrone et al., 2020

We assume a binary genome of length L, distinguishing between synonymous and nonsynonymous loci. For A/H3N2 viruses, we hypothesized that most de novo mutations are either weakly deleterious or neutral. To estimate the number of such sites, we aligned A/H3N2 virus sequences that were collected between 2007 and 2012 and identified all polymorphic sites with variants that did not fixate over time (i.e., <95% frequency over 1-month intervals). We estimated L=1050 with 838 and 212 synonymous and nonsynonymous loci, respectively.

We tracked the frequency distribution of genotypes present for every generation t. We assumed that mutations occur at per-locus, per-generation rate µ. During each generation t, the number of virions incurring a single-locus mutation followed a Poisson distribution with mean N(t)μL. For each virion, the mutant locus was randomly selected across all loci. We assumed that all synonymous and a fraction of nonsynonymous sites (fneu,NS) are neutral (i.e., (log) fitness effect s=0). The remaining nonsynonymous sites either had an additive deleterious (sd) or beneficial (sb) fitness effect when mutated. The magnitude of sd / sb follow an exponential distribution with mean effect s-. Epistasis was neglected throughout. The distribution of genotypes in the next generation t+1 was achieved by resampling individuals according to Poisson distribution with mean Nt+1Pfg,t, where Pf(g,t) is the relative fitness distribution of genotype g during generation t.

To decrease the computational costs of the simulations, specifically when Nt reaches orders of 1010-1011 virions (Figure 6—figure supplement 1), we implemented an upper population size limit of 107 virions. Given the mutation rate assumed (Table 2), N(t)μ1 for Nt107 virions, mutation-selection balance is theoretically expected for a single-locus (deleterious) mutant model (Appendix A5). We ran 500 simulations for each variable set of fneu,NS and sd / sb values. All parameter values used in the model are given in Table 2.

Phylogenetic inference

All ML phylogenetic trees were reconstructed with IQTREE (v. 1.6.10) (Nguyen et al., 2015) using the GTR+I+G4 nucleotide substitution model.

Data availability

All raw sequence data have been deposited at NCBI sequence read archive under BioProject Accession number PRJNA722099. All custom Python code and Jupyter notebooks to reproduce the analyses in this paper are available online: https://github.com/AMC-LAEB/Within_Host_H3vH1 (copy archived at swh:1:rev:44e44ddbfab4d157a3c5efd559972f51dec6454a), Han, 2021.

Appendix 1

A1: hemagglutinin and neuraminidase minority variants in A/H3N2 virus samples

The amino acid variants emerging in the HA and NA proteins of A/H3N2 virus samples were discussed in a previous work (Koel et al., 2020). Briefly, given the lack of antibody-mediated immune response in this cohort of mostly naïve children, HA amino acid variants emerging in putative antigenic sites were generally low in frequencies (median frequency = 0.04, IQR = 0.03–0.06) and all only became detectable 3–4 days post-symptom onset (Figure 4—figure supplement 1). Notably, two of these intra-host mutations were also found in the global A/H3N2 virus population in high frequencies: HA-S45N and -D53N (H3 numbering without signal peptide). Both mutations are part of the canonical antigenic site C of H3 and emerged within separate individuals, with D53N eventually becoming the majority variant (97%) as late as day 13 post-symptom onset in one of them. Oseltamivir-resistant amino acid mutations E119V, R292K, and N329K arose in 10 patients that were treated with the antiviral drug and mostly rose in within-host frequencies only 4–7 days after administration of oseltamivir (Figure 4—figure supplement 2).

A2: genetic diversity by π statistic

Given that the number of identified iSNVs can potentially be biased by variations in sequencing coverage, genetic diversity was also assessed using nucleotide diversity π statistic (Nei and Li, 1979). This approach constitutes a more robust measure of within-host diversity as it is solely dependent on the underlying variant frequencies. While the number of polymorphic sites provides an estimate of 'richness' in the viral population, it may be incompatible to compare iSNV counts between samples with different read coverage profiles (Zhao and Illingworth, 2019). In contrast, π is a more robust metric that is unbiased by sequencing depth. For each site l:

πl=NlNl-1-jnj,lnj,l-1NlNl-1

where Nl and nj,l are the coverage and number of reads encoding allele j in site l, respectively (Zhao and Illingworth, 2019). To compute the π statistic for the entire genome of length L:

π=l=1LπlL

Here, we observed similar trends in genetic diversity when using π statistics compared to iSNVs counts (Appendix 1—figure 1). π weakly increased with respect to time and Ct values for A/H3N2 viruses (days since illness onset: Spearman ρ=0.388, p=1.52×10-8; CT: ρ=0.455, p=3.66×10-10) while remaining relatively invariant for A/H1N1pdm09 viruses (days since symptom onset: ρ=0.017, p=0.92; CT: ρ=0.240, p=0.13).

Appendix 1—figure 1
Genetic diversity of within-host influenza A virus populations as estimated by nucleotide diversity π statistic.

Box plots summarizing the π statistic (intra-host single-nucleotide variants [iSNVs]; median, interquartile range [IQR], and whiskers extending within median ±1.5 × IQR) computed for samples with adequate breadth of coverage across the whole influenza genome. (A) Seasonal A/H3N2 and (B) pandemic A/H1N1pdm09 viruses. All box plots are either stratified by day(s) since symptom onset or qPCR cycle threshold (Ct) values.

A3: potential linked minority variants in within-host virus populations

For both within-host seasonal A/H3N2 and pandemic A/H1N1 virus populations, there were few instances of potentially linked nonsynonymous variants, and if such co-variants were to exist, they were mainly found in the internal gene segments (Supplementary file 2). There was only one pair of HA amino acid mutations (E261G/L455F) that was encoded by a minority haplotype of A/H1N1pdm09 viruses infecting one individual, but the normalized Lewontin’s linkage disequilibrium measure (LD') was less than 0.5, suggesting a low likelihood that the mutation pair was linked non-randomly. These potentially linked variants tend to emerge late in the infection (6–7 days post-illness onset) for both viral subtypes, in inferred haplotypes appearing at low frequencies within-host (median frequency = 0.08, IQR = 0.03–0.12) that were not shared between multiple individuals.

A4: transmission bottleneck size estimation of pandemic A/H1N1pdm09 viral infections

Index cases were previously identified for six of the seven households where the pandemic A/H1N1pdm09 viral samples were collected (Thai et al., 2014). Assuming that the non-index cases within the same household were secondarily infected by the index case, five transmission pairs were identified where samples with adequate breadth of coverage (>70% of genome covered with >50× coverage; Appendix 1—figure 2) were collected from the index patient on an earlier date relative to the secondary case.

Appendix 1—figure 2
A/H1N1pdm09 virus household transmission pairs.

(A) Schematic of A/H1N1pdm09 virus household transmission pairs identified by epidemiological linkage and plotted based on timing of sample collection (left panel) Intra-host single-nucleotide variant (iSNV) frequencies found in the donors and recipients of the five transmission pairs (right panel). (B) Violin plots of L1-norm pairwise genetic distance per site between different A/H1N1pdm09 virus sample pairs (each circle = 1 pair of virus samples). Transmission pairs are those represented in Figure 5A. Longitudinal pairs are made up of sample pairs collected from the same individual on the first and any other later timepoints on which the patient was sampled. These pairs are stratified by whether they were collected from households located in the same community (i.e., household) or combined with the rest of the analyzed A/H1N1pdm09 virus samples collected from hospitals (i.e., all). For the same aforementioned categories, we also plotted the distribution of L1-norm distances for pairs of viruses collected from different individuals. All circles are colored by the difference in days between which the sample pairs were collected (ΔT). All p-values reported are based on Mann–Whitney U tests, which were used to determine if the L1-norm genetic distance distributions of the two categories marked by the ends of the horizontal line above are statistically distinct.

Virus transmission bottleneck sizes were then estimated using two binomial sampling models that were elaborated in detail by Sobel Leonard et al., 2017 and McCrone et al., 2018. First, the presence/absence model computes transmission probability as the probability that a transmitted donor iSNV was found in at least one genome in the bottleneck population:

Pd,iANb=pd,ANb

where A refers to the transmitted iSNV in polymorphic site i, Nb is the bottleneck size, and pd,A is the frequency of allele A in the sampled virus population within donor d.

The presence/absence model does not incorporate recipient frequencies of transmitted iSNVs in its probability calculations. It assumes that all transmitted iSNVs are detected in the recipient, and thus any donor iSNVs that are not present in the recipient are considered to have not been transmitted. It also does not account for any changes to pd between the time of sampling and day of transmission. The ML estimate of Nb would thus yield the largest log likelihood value given by

LLNb=dilnPd,i

To incorporate information on recipient frequencies that can change between transmission and sampling, transmission bottleneck sizes were re-estimated using a second beta-binomial model formulated by Sobel Leonard et al., 2017. For each allele A observed in polymorphic site i that was transmitted from donor d to recipient r, the log-likelihood of Nb is given as

LLNbd,rtransmitted=Ailnk=1Nbpbetapr,Aik,Nb-kpbinkNb,pd,Ai

where pbetapr,Aik,Nb-k is the conditional probability density, as modelled by the beta distribution, that the transmitted iSNV, Ai, is found in the recipient at frequency pr,Ai given that the variant is found present in k genomes out of the total transmission bottleneck of Nb genomes. pbinkNb,pd,Ai is the binomial probability of drawing k genomes with allele Ai in a sample of Nb genomes and variant frequency of pd,Ai within the donor.

As some of the iSNVs in the donor may not be transmitted or were present below the 2% minimum variant frequency cutoff, the likelihood of these events was computed by

LLNbd,rlost=Ailnk=1Nbpbeta,cdfpr,Ai<0.02k,Nb-kpbinkNb,pd,Ai

where pbeta,cdf is the cumulative distribution function of the beta distribution.

The ML estimate of Nb as described by the beta-binomial was then computed by searching for the value of Nb that would give the largest value of

LLNb=d,rLLNbd,rtransmitted+LLNbd,rlost

Log-likelihood values were computed for a range Nb between 1 and 1000 genomes for both models.

Transmission bottleneck size was then estimated by aggregating over all transmission pairs and applying two previously developed sampling-based models (McCrone et al., 2018; Sobel Leonard et al., 2017). We estimated the transmission bottleneck of pandemic A/H1N1pdm09 viruses to be 1–2 genomes (ML presence-absence model estimate = 1 genome; ML beta-binomial model estimate = 2 genomes). The tight bottleneck was further reflected in the iSNV frequency plot where most of the donor variants were either present as the single majority allele or were not transmitted/undetected in the recipient, with few shared iSNVs between the two virus populations (Appendix 1—figure 2A).

Furthermore, we observed that the virus populations between patients (median L1-norm distance = 0.031 divergence per site, IQR = 0.024–0.046 divergence per site) were significantly more different than longitudinal samples collected from the same individual (median L1-norm distance = 2.35×10-3 divergence per site, IQR = 1.54×10-3-3.18×10-3 divergence per site), suggesting that the haplotypes found within an individual remain relatively invariant throughout the course of infection (Appendix 1—figure 2B). Combined with the fact that the per-site L1-norm genetic distance between samples attributed to the identified transmission pairs (median L1-norm distance = 6.49×10-3 divergence per site, IQR = 6.34×10-3-6.63×10-3 divergence per site) was greater than those computed for longitudinal sample pairs of each household individual (median L1-norm distance = 2.77×10-3 divergence per site, IQR = 2.18×10-3-3.32×10-3 divergence per site; Mann–Whitney U p-value=0.01; Appendix 1—figure 2B), it is likely that only a limited number of haplotypes were shared between individuals in a transmission pair. Based on the most parsimonious reconstructed haplotypes for the five transmission pairs encoding shared iSNVs, we estimated the median number of haplotypes transmitted from donor to recipient to be between 1 and 2 haplotypes.

A5: mutation-selection balance

Considering a single-locus mutant with deleterious fitness effect s (i.e., s<0), the frequency of the mutant allele (f) can be modeled by the following stochastic differential equation, otherwise known as the Langevin equation (Good and Desai, 2013):

ft=sf(1f)+f(1f)Nη(t)+μ(1f)selectiongenetic driftmutation

where N is the population size, µ is the mutation rate, and ηt is the stochastic noise term due to genetic drift. For any Langevin equation, we can find the time-dependent probability distribution of f (i.e., pf,tt) by its corresponding Fokker–Planck equation. At stationarity (i.e., pf,tt=0), its solution is known to be

pfe-2NΛff1-f

where -Λft=s+μf

Integrating -Λft, we will get

-Λf=sf+μlnf

which can then be substituted to obtain

pfe2Nsff2Nμ-1

In other words, pf strongly depends on Nμ. If Nμ1, we can see that pf will strongly peak at some characteristic value of f that minimizes Λf. If N is large, we can assume that drift effects are negligible and f is largely deterministic due to selection:

ft=sf1-f+μ1-f=-Λftf1-f
Λt=ftΛf=ft-ft1f1-f0

In other words, selection dynamics minimizes the Λf term, and as such, result in mutation-selection balance:

-Λft=s+μf=0
f=-μs

Data availability

All raw sequence data have been deposited at NCBI sequence read archive under BioProject Accession number PRJNA722099. All custom Python code and Jupyter notebooks to reproduce the analyses in this paper are available online: https://github.com/AMC-LAEB/Within_Host_H3vH1 (copy archived at https://archive.softwareheritage.org/swh:1:rev:44e44ddbfab4d157a3c5efd559972f51dec6454a).

The following data sets were generated
    1. South East Asia Infectious Disease Clinical Research Network
    (2021) NCBI BioProject
    ID PRJNA722099. Deep sequencing of virological samples from patients infected with influenza A viruses.

References

Decision letter

  1. Sarah E Cobey
    Reviewing Editor; University of Chicago, United States
  2. Miles P Davenport
    Senior Editor; University of New South Wales, Australia

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Acceptance summary:

This paper provides important observations and insights on the within-host evolution of seasonal influenza, particularly H3N2 in children. As in adults, H3N2 appears to evolve largely by purifying selection during the initial stages of infection when transmission is likely to occur, and nonsynonymous mutations accumulate later. The authors propose a model of mutation-selection balance that might explain diversity in young children with longer infections.

Decision letter after peer review:

Thank you for submitting your article "Within-host evolutionary dynamics of seasonal and pandemic human influenza A viruses in young children" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Miles Davenport as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

This study adds to our knowledge of the evolutionary dynamics of influenza within hosts. The longitudinal data – from likely primary H3N2 infections in children and infections with emerging H1N1pdm09 – are interesting, and the evolutionary analysis has laudably broad scope. That said, the reviewers agreed that limitations in both the data and analysis cast doubt on the conclusions. More analysis could strengthen the study. After consultation, our major concerns are that:

1. Not enough attention is given to potential errors affecting the sequences. Reviewers discussed both systematic sequencing and random PCR errors. Ideally plasmid controls would have been used, but under the circumstances, we suggest the authors instead perform sensitivity analysis and other checks recommended by Reviewers 2 and 3.

2. The temporal trends are not analyzed in a statistically careful way. Reviewers 1 and 3 raised concerns about the trends reported in Figure 2. (Are they really there?) This affects the general conclusions about H1N1pdm09 v. H3N2.

3. The simulations assume an odd distribution of fitness effects, which might skew the conclusions about the evolutionary regimes of the two subtypes. (It is not unclear why they should have contrasting evolutionary dynamics.) More thorough sensitivity analysis could help here.

The essential revisions would address these concerns. The reviewers were also in agreement on their more specific comments, which I hope you can use to strengthen the work.

Reviewer #1 (Recommendations for the authors):

My primary suggestion is effectively to clarify the statistics for the temporal trends. For instance, in Figure 2, do the p-values test for NS and S being significantly distinct from each other within a time point or gene or between time points/genes? I initially thought the latter given the claims in the text about temporal dynamics. I believe more tests are needed for these claims, not only for H1N1pdm09 (ll. 180-182). This will influence the comparison with simulation output. The Discussion might then also need updating (ll. 474-477).

Reviewer #2 (Recommendations for the authors):

Figure 1 figure supplement 2 appears to show variants present at reported frequencies that where not in all overlapping amplicons. These could be PCR artifacts or potentially real variants that were missed in one of the amplicons. Are the evolutionary rate dynamics driven by variants in this frequency range? Is there enough signal to filter out similar variants and validate the robustness of the findings?

Line 205: It is not clear how the frequency of synonymous mutations, by themselves indicates negative selection in the antigenic sites. What is the importance of the higher frequency of synonymous mutations found in antigenic sites?

Line 663: How were overlapping reading frames accounted for in the evolutionary rate calculation?

The qualitative similarity between the simulated and observed rates is nice addition to the manuscript. Is it possible to use the frequencies of mutations in longitudinal pairs to further support the hypothesis of mutational-selection balance?

Reviewer #3 (Recommendations for the authors):

1. The authors identify and analyze several recurrent mutations in the H3N2 M2 and NP genes, which were found in anywhere from 16 to 27 unlinked individuals. They argue that the recurrent NP mutation is a stabilizing mutation and is epistatically linked to several co-variants that may have destabilizing effects (Figure 5). I am concerned that these mutations may result from technical artifacts and may not represent genuine within-host variants, particularly since amino-acid variants that are known to be associated with oseltamivir resistance were only identified in two patients despite the administration of oseltamivir in many patients. In data from McCrone et al., 2018, for example, several sites appear to harbor low-frequency variants in unrelated individuals, much like the authors describe here, but those variants are also present in the plasmid controls, suggesting that they represent common, site-specific polymerase errors rather than recurrent mutations.

To try to determine whether these variants are technical errors, the authors would ideally sequence plasmid controls using the same protocols that were used to sequence the original samples. Since this is likely infeasible, the authors should also check their data to see if variation is present at M2-77, NP-G384R, and other apparent sites of recurrent mutation at frequencies below the variant-calling threshold. If variation is present at these sites in most samples at a frequency much higher than in neighboring sites, then this variation may reflect technical errors in sequencing. The authors called variants after mapping reads to a reference genome, but they might also try remapping to the sample consensus sequence to reduce the risk that mapping artifacts are causing these recurrent variant calls.

2. The authors write that non-synonymous mutations accumulate later in patient infections, but it's not clear to me how strongly the data in Figure 2A supports that argument, particularly given the limited number of samples collected at any given day following symptom onset. Does nonsynonymous diversity tend to increase over time in successive timepoints collected from the same individual? Does noise in variant frequencies account for the apparent fluctuations in synonymous diversity over time, and if so, how does noise affect the interpretation of nonsynonymous diversity?

3. In the simulations of A/H1N1pdm09 in Figure 6A and 6B, the authors simulate a distribution of fitness effects in which 1% of nonsynonymous mutations are neutral and the remaining mutations are weakly (s=0.01) or strongly (s=0.1) beneficial (lines 785-6). This distribution of fitness effects seems unrealistic to me – even for an emerging virus that may be adapting to a new host, most nonsynonymous mutations will still be deleterious because they affect basic protein functions. It's not clear to me that the large increase in nonsynonymous diversity that the authors observe from these simulations would be observed if deleterious mutations were adequately accounted for; the distribution would probably look much more similar to Figures 6D and 6E.

https://doi.org/10.7554/eLife.68917.sa1

Author response

Essential revisions:

This study adds to our knowledge of the evolutionary dynamics of influenza within hosts. The longitudinal data---from likely primary H3N2 infections in children and infections with emerging H1N1pdm09---are interesting, and the evolutionary analysis has laudably broad scope. That said, the reviewers agreed that limitations in both the data and analysis cast doubt on the conclusions. More analysis could strengthen the study. After consultation, our major concerns are that:

1. Not enough attention is given to potential errors affecting the sequences. Reviewers discussed both systematic sequencing and random PCR errors. Ideally plasmid controls would have been used, but under the circumstances, we suggest the authors instead perform sensitivity analysis and other checks recommended by Reviewers 2 and 3.

2. The temporal trends are not analyzed in a statistically careful way. Reviewers 1 and 3 raised concerns about the trends reported in Figure 2. (Are they really there?) This affects the general conclusions about H1N1pdm09 v. H3N2.

3. The simulations assume an odd distribution of fitness effects, which might skew the conclusions about the evolutionary regimes of the two subtypes. (It is not unclear why they should have contrasting evolutionary dynamics.) More thorough sensitivity analysis could help here.

The essential revisions would address these concerns. The reviewers were also in agreement on their more specific comments, which I hope you can use to strengthen the work.

We thank the editors for considering our manuscript for review. We have addressed all of the reviewers’ concerns and comments in detail below. The revised manuscript is submitted with tracked changes relative to the original submission.

Reviewer #1 (Recommendations for the authors):

My primary suggestion is effectively to clarify the statistics for the temporal trends. For instance, in Figure 2, do the p-values test for NS and S being significantly distinct from each other within a time point or gene or between time points/genes? I initially thought the latter given the claims in the text about temporal dynamics. I believe more tests are needed for these claims, not only for H1N1pdm09 (ll. 180-182). This will influence the comparison with simulation output. The Discussion might then also need updating (ll. 474-477).

We thank the reviewer for their careful consideration of our manuscript.

We agree that there is a lack in statistical power in the A/H1N1pdm09 virus dataset to claim meaningful differences in temporal trends to A/H3N2 within-host dynamics. The only reasonable conclusion that can be made here is that there was a greater accumulation in nonsynonymous iSNVs relative to synonymous ones in A/H1N1pdm09 within-host virus populations. As per the reviewer’s suggestion, we have now removed the boxplots for the A/H1N1pdm09 virus panel in Figure 2B, replacing it with a scatter plot. We have also updated the manuscript to reflect our inability to characterise the within-host temporal trends for A/H1N1pdm09 viruses using this dataset:

Line 210: “We observed higher nonsynonymous evolutionary rates relative to synonymous ones initially after symptom onset but were unable to determine if they were significantly different due to the low number of samples (i.e. median = 2 samples per day post-symptom onset). In turn, we also could not meaningfully characterise the temporal trends of within-host evolution for the pandemic virus with this dataset. Nonetheless, consolidating over all samples across all time points, there was significantly higher rates of accumulation of nonsynonymous variants in the polymerase basic 2 (PB2), polymerase acidic (PA), HA and matrix (M) gene segments (Figure 2B, Figure 2 —figure supplement 2 and Figure 3 —figure supplement 2). All gene segments also yielded NS/S ratios > 1 (Table S1).”

Line 565: “Owing to the low number of A/H1N1pdm09 virus samples and different next-generation sequencing platforms used to sequence samples of the two virus subtypes and consequently differences in base calling error rates and depth of coverage (Figure 1 —figure supplement 1), we were unable to directly compare the observed levels of within-host genetic diversity and evolutionary dynamics between the two influenza subtypes here.”

Linderman et al., (PNAS, 2014) and Huang et al., (JCI, 2015) found that individuals born prior to the early 1980s possessed antibodies that recognized HA-166K (H3 numbering) residing in the Sa antigenic site of A/H1N1pdm09 viruses. They attributed this to previous exposures to seasonal A/H1N1 viruses with the HA-166K Sa epitope. This adaptive immune response likely led to the fixation of HA-K166Q in A/H1N1pdm09 viruses, which abrogated antibody recognition of this epitope. However, this epitope was shielded by glycans in seasonal A/H1N1 viruses in 1986 due to the acquisition of a glycosylation site in HA-129. As such individuals born after the late 1980s did not possess the same antibodies and are therefore unlikely to exert the same adaptive immune pressure as their older counterparts.

Out of the 32 A/H1N1pdm09-infected individuals analysed in our study, only six of them were born before 1986. The median birth year of all individuals was 1999 (IQR = 1989, 2005). Hence, the same adaptive immune pressure on HA-166K was not present in these younger individuals during the first wave of the A/H1N1pdm09 pandemic then. We also did not detect the HA-166Q variant in any of the six older individuals born prior to 1986.

Besides HA-166K, Li et al., (JEM, 2013) also found that individuals born between 1983 and 1996 have narrowly focused antibodies against the HA-133K epitope as a result of previous exposures to seasonal A/H1N1 viruses. HA-133K has, however, remained conserved in the global A/H1N1pdm09 virus population to date. We also did not find any variants above the calling threshold in any of the individuals investigated.

The HA protein is the primary target of human adaptive immune response, which in turn drives its antigenic evolution (Petrova and Russell, Nat Rev Microbiol, 2018). In terms of cellular immunity, HA encodes few CTL epitopes (Woolthuis et al., Sci Rep, 2016). Most CTL epitopes are found in the nucleoprotein (NP), which we have considered here in our discussion observing recurrent NP-G384R variants independently found in multiple individuals.

As mentioned earlier, the A/H1N1pdm09 virus dataset lack statistical power. As such, we are unable to characterise temporal trends for the pandemic virus and have no longer discuss this in the updated manuscript (see response to reviewer #3 as well).

However, the reviewer was right to point out one of our key conclusions that mutation-selection balance is only observed in naïve young children with longer A/H3N2 virus infections and would be less likely to hold for the typically shorter-lived infections of older children and adults. We have now put more emphasis on this conclusion in the abstract and discussion:

Line 42: “For A/H3N2 viruses in young children, early infection was dominated by purifying selection. As these infections progressed, nonsynonymous variants typically increased in frequency even when within-host virus titres decreased. Unlike the short-lived infections of adults where de novo within-host variants are rare, longer infections in young children allow for the maintenance of virus diversity via mutation-selection balance creating potentially important opportunities for within-host virus evolution.”

Line 530: “Through simulations of a within-host evolution model, we investigated the hypothesis that in the absence of any positive selection, the accumulation of nonsynonymous iSNVs was a result of their neutral or only weakly deleterious effects and the expanding within-host virion population size during later timepoints in longer infections of naïve young children such that mutation-selection balance was reached. In contrast, this balance was not detected in otherwise healthy older children or adults with short-lived influenza virus infections lasting no more than a week where de novo nonsynonymous iSNVs are rarely found 4,8–11,44.”

Reviewer #2 (Recommendations for the authors):

Figure 1 figure supplement 2 appears to show variants present at reported frequencies that where not in all overlapping amplicons. These could be PCR artifacts or potentially real variants that were missed in one of the amplicons. Are the evolutionary rate dynamics driven by variants in this frequency range? Is there enough signal to filter out similar variants and validate the robustness of the findings?

As per the suggestion by the reviewer, we first excluded all iSNVs found under the 75th percentile of frequency range of variants detected in only one of the overlapping. We then recomputed the day-by-day evolutionary rates (Figure 2 —figure supplement 3) and found similar relative rate dynamics as those presented in the main results (Figure 2A). We added the following text to manuscript:

Line 632: “We also performed additional checks to ensure that our results were not driven by potential PCR and/or technical artefacts. First, we excluded all iSNVs found under the 75th percentile of frequency range of A/H3N2 variants that were found in only one of the overlapping amplicons. We then recomputed the daily within-host evolutionary rates with the remaining iSNVs (Figure 2 —figure supplement 3) and found that the relative temporal trends in synonymous and nonsynonymous rates remain similar to those in Figure 2A. We also checked that the distributions of frequencies for iSNVs found in recurrent mutation sites (i.e. NP-384 and M2-77) that are below variant calling threshold are comparable to those found in their neighbouring sites (±10 nucleotide positions; Figure 4 —figure supplement 2). Furthermore, we remapped the sample reads to their respective consensus sequences to minimize mapping of technical artefacts. We were still able to detect the recurring NP-G384R and M2-R77* amino acid mutations in multiple individuals and timepoints at similar frequencies when mapped to the reference genome (Figure 4 —figure supplement 1C-D and 3). As such, these recurrent mutations are unlikely to have been resulted from erroneous variant calls of artefacts.”

See response to reviewer #3 for further validation on the robustness of our variant calls.

Line 205: It is not clear how the frequency of synonymous mutations, by themselves indicates negative selection in the antigenic sites. What is the importance of the higher frequency of synonymous mutations found in antigenic sites?

Under neutral selection, the frequency of synonymous and nonsynonymous mutations should be similar to one another. However, if there was negative (purifying) selection to purge deleterious amino acid changes (nonsynonymous mutations) such as those that affect protein functions, this should result in an imbalance in mutation frequencies where there are relatively higher frequencies of synonymous mutations compared to nonsynonymous ones. As many antigenic sites of HA are close to the receptor-binding region, many nonsynonymous mutations in these sites could have negatives effects on host cell binding and thus be purged by negative selection.

Line 663: How were overlapping reading frames accounted for in the evolutionary rate calculation?

We considered a variant to be a nonsynonymous mutation if any of the overlapping reading frames engender nonsynonymous changes. This is now explicitly stated in the Methods (Line 783):

“If a variant was found in overlapping reading frames and a nonsynonymous change was observed in any of those frames, it would be accounted for as a nonsynonymous mutation.”

The qualitative similarity between the simulated and observed rates is nice addition to the manuscript. Is it possible to use the frequencies of mutations in longitudinal pairs to further support the hypothesis of mutational-selection balance?

Given that we are unable to quantify the range and level of uncertainty in variant frequencies of our dataset, it is difficult to use mutation frequencies to support the mutation-selection balance hypothesis. However, we observed that some nonsynonymous variants, including lethal stop-codon mutations such as M2-R77*, persisted in low frequencies within the same individual across multiple timepoints (Figure 4 —figure supplement 1). Furthermore, in response to reviewer #3, we performed linear regression on the computed evolutionary rates for each individual with samples collected at multiple timepoints to ascertain that the patterns observed were not because of aggregating the data from multiple individuals. See detailed response to reviewer #3 below.

Reviewer #3 (Recommendations for the authors):

1. The authors identify and analyze several recurrent mutations in the H3N2 M2 and NP genes, which were found in anywhere from 16 to 27 unlinked individuals. They argue that the recurrent NP mutation is a stabilizing mutation and is epistatically linked to several co-variants that may have destabilizing effects (Figure 5). I am concerned that these mutations may result from technical artifacts and may not represent genuine within-host variants, particularly since amino-acid variants that are known to be associated with oseltamivir resistance were only identified in two patients despite the administration of oseltamivir in many patients. In data from McCrone et al., 2018, for example, several sites appear to harbor low-frequency variants in unrelated individuals, much like the authors describe here, but those variants are also present in the plasmid controls, suggesting that they represent common, site-specific polymerase errors rather than recurrent mutations.

To try to determine whether these variants are technical errors, the authors would ideally sequence plasmid controls using the same protocols that were used to sequence the original samples. Since this is likely infeasible, the authors should also check their data to see if variation is present at M2-77, NP-G384R, and other apparent sites of recurrent mutation at frequencies below the variant-calling threshold. If variation is present at these sites in most samples at a frequency much higher than in neighboring sites, then this variation may reflect technical errors in sequencing. The authors called variants after mapping reads to a reference genome, but they might also try remapping to the sample consensus sequence to reduce the risk that mapping artifacts are causing these recurrent variant calls.

Following the reviewer’s suggestion, we did not find that there was any significant difference in frequencies of iSNVs below the variant-calling threshold between those found in the recurrent sites and neighbouring sites that are ±10 nucleotide positions away (Figure 4 —figure supplement 2). We also remapped the sample reads to their respective consensus sequence and still detected these recurrent calls across multiple individuals and timepoints at similar frequencies when mapped to a reference genome (Figure 4 —figure supplement 3 v. figure supplement 1C-D).

Line 638: “We also checked that the distributions of frequencies for iSNVs found in recurrent mutation sites (i.e. NP-384 and M2-77) that are below variant calling threshold are comparable to those found in their neighbouring sites (±10 nucleotide positions; Figure 4 —figure supplement 2). Furthermore, we remapped the sample reads to their respective consensus sequences to minimize mapping of technical artefacts. We were still able to detect the recurring NP-G384R and M2-R77* amino acid mutations in multiple individuals and timepoints at similar frequencies when mapped to the reference genome (Figure 4 —figure supplement 1C-D and 3). As such, these recurrent mutations are unlikely to have been resulted from erroneous variant calls of artefacts.”

2. The authors write that non-synonymous mutations accumulate later in patient infections, but it's not clear to me how strongly the data in Figure 2A supports that argument, particularly given the limited number of samples collected at any given day following symptom onset. Does nonsynonymous diversity tend to increase over time in successive timepoints collected from the same individual? Does noise in variant frequencies account for the apparent fluctuations in synonymous diversity over time, and if so, how does noise affect the interpretation of nonsynonymous diversity?

To minimize the effect noise, we performed linear regression on the computed evolutionary rates in Figure 2A for each A/H3N2 infected individual with at least three sampled timepoints (n=39). Among these, 25 of them (64%) had a positive correlation in nonsynonymous iSNV accumulation rates against time (Figure 2 —figure supplement 3). For individuals where samples were only collected at two timepoints (n=8, between D4 to D8 post symptom onset), 6 of them had higher accumulation rates of nonsynonymous variants in the successive sample.

In contrast, synonymous evolutionary rates were negatively correlated against time for 27 (69%) of the 39 A/H3N2 infected individuals with at least three sampling timepoints. If noise was the primary determinant for fluctuations in diversity over time, we should observe similar effects on the trends of nonsynonymous and synonymous variants over time. However, the contrasting patient-specific temporal trends between the two type of variants supports our conclusion that nonsynonymous mutations accumulated later in time when within-host virus population is large enough in size such that mutation-selection balance is reached.

Line 175: “To ensure that this temporal trend was not due to aggregated effects across multiple individuals, we performed linear regression on the computed evolutionary rates for each A/H3N2 infected individual with at least three sampling timepoints (n=39). Nonsynonymous evolutionary rates were positively correlated against time for 25/39 individuals (64%; Figure 2 —figure supplement 3). In contrast, synonymous evolutionary rates were negatively correlated against time for 27 (69%) individuals.”

3. In the simulations of A/H1N1pdm09 in Figure 6A and 6B, the authors simulate a distribution of fitness effects in which 1% of nonsynonymous mutations are neutral and the remaining mutations are weakly (s=0.01) or strongly (s=0.1) beneficial (lines 785-6). This distribution of fitness effects seems unrealistic to me – even for an emerging virus that may be adapting to a new host, most nonsynonymous mutations will still be deleterious because they affect basic protein functions. It's not clear to me that the large increase in nonsynonymous diversity that the authors observe from these simulations would be observed if deleterious mutations were adequately accounted for; the distribution would probably look much more similar to Figures 6D and 6E.

We agree that the distribution of fitness effects that was previously used to simulate A/H1N1pdm09 virus dynamics needed to be reworked. However, as in our response to reviewer 1 on the lack of statistical power in the A/H1N1pdm09 virus dataset. Since we cannot reliably characterise the temporal trends of nonsynonymous and synonymous iSNVs in A/H1N1pdm09 infected individuals, we have now removed simulation results for A/H1N1pdm09 viruses.

https://doi.org/10.7554/eLife.68917.sa2

Article and author information

Author details

  1. Alvin X Han

    Department of Medical Microbiology & Infection Prevention, Amsterdam University Medical Center, Amsterdam, Netherlands
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Zandra C Felix Garza and Matthijs RA Welkers
    For correspondence
    x.han@amsterdamumc.nl
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6281-8498
  2. Zandra C Felix Garza

    Department of Medical Microbiology & Infection Prevention, Amsterdam University Medical Center, Amsterdam, Netherlands
    Contribution
    Conceptualization, Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Alvin X Han and Matthijs RA Welkers
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-7262-2165
  3. Matthijs RA Welkers

    Department of Medical Microbiology & Infection Prevention, Amsterdam University Medical Center, Amsterdam, Netherlands
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Methodology, Writing - review and editing
    Contributed equally with
    Alvin X Han and Zandra C Felix Garza
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2982-0278
  4. René M Vigeveno

    Department of Medical Microbiology & Infection Prevention, Amsterdam University Medical Center, Amsterdam, Netherlands
    Contribution
    Resources, Data curation, Validation, Investigation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Nhu Duong Tran

    National Institute of Hygiene and Epidemiology, Hanoi, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Thi Quynh Mai Le

    National Institute of Hygiene and Epidemiology, Hanoi, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  7. Thai Pham Quang

    National Institute of Hygiene and Epidemiology, Hanoi, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3796-6162
  8. Dinh Thoang Dang

    Ha Nam Centre for Disease Control, Ha Nam, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  9. Thi Ngoc Anh Tran

    Children's Hospital 2, Ho Chi Minh city, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  10. Manh Tuan Ha

    Children's Hospital 2, Ho Chi Minh city, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  11. Thanh Hung Nguyen

    Children's Hospital 1, Ho Chi Minh city, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  12. Quoc Thinh Le

    Children's Hospital 1, Ho Chi Minh city, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  13. Thanh Hai Le

    Vietnam National Children's Hospital, Hanoi, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration
    Competing interests
    No competing interests declared
  14. Thi Bich Ngoc Hoang

    Vietnam National Children's Hospital, Hanoi, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  15. Kulkanya Chokephaibulkit

    Siriraj Hospital, Mahidol University, Bangkok, Thailand
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  16. Pilaipan Puthavathana

    Siriraj Hospital, Mahidol University, Bangkok, Thailand
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  17. Van Vinh Chau Nguyen

    Hospital for Tropical Diseases, Ho Chi Minh city, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  18. My Ngoc Nghiem

    Hospital for Tropical Diseases, Ho Chi Minh city, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  19. Van Kinh Nguyen

    National Hospital for Tropical Diseases, Hanoi, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  20. Tuyet Trinh Dao

    National Hospital for Tropical Diseases, Hanoi, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  21. Tinh Hien Tran

    1. Siriraj Hospital, Mahidol University, Bangkok, Thailand
    2. Oxford University Clinical Research Unit, Ho Chi Minh city, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  22. Heiman FL Wertheim

    1. Oxford University Clinical Research Unit, Ho Chi Minh city, Viet Nam
    2. Radboud Medical Centre, Radboud University, Nijmegen, Netherlands
    3. Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  23. Peter W Horby

    1. Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
    2. Oxford University Clinical Research Unit, Hanoi, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  24. Annette Fox

    1. Oxford University Clinical Research Unit, Hanoi, Viet Nam
    2. Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia
    3. WHO Collaborating Centre for Reference and Research on Influenza, Melbourne, Australia
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0565-7146
  25. H Rogier van Doorn

    1. Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
    2. Oxford University Clinical Research Unit, Hanoi, Viet Nam
    Contribution
    Resources, Data curation, Funding acquisition, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  26. Dirk Eggink

    1. Department of Medical Microbiology & Infection Prevention, Amsterdam University Medical Center, Amsterdam, Netherlands
    2. Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, Netherlands
    Contribution
    Conceptualization, Data curation, Formal analysis, Supervision, Investigation, Visualization, Methodology, Writing - review and editing
    Contributed equally with
    Menno D de Jong and Colin A Russell
    Competing interests
    No competing interests declared
  27. Menno D de Jong

    Department of Medical Microbiology & Infection Prevention, Amsterdam University Medical Center, Amsterdam, Netherlands
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Investigation, Methodology, Project administration, Writing - review and editing
    Contributed equally with
    Dirk Eggink and Colin A Russell
    Competing interests
    No competing interests declared
  28. Colin A Russell

    Department of Medical Microbiology & Infection Prevention, Amsterdam University Medical Center, Amsterdam, Netherlands
    Contribution
    Conceptualization, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Contributed equally with
    Dirk Eggink and Menno D de Jong
    For correspondence
    c.a.russell@amsterdamumc.nl
    Competing interests
    No competing interests declared

Funding

H2020 European Research Council (818353)

  • Alvin X Han
  • Zandra C Felix Garza
  • Colin A Russell

National Institute of Allergy and Infectious Diseases (N01-A0-50042)

  • Matthijs RA Welkers
  • René M Vigeveno
  • Nhu Duong Tran
  • Thi Quynh Mai Le
  • Thai Pham Quang
  • Dinh Thoang Dang
  • Thi Ngoc Anh Tran
  • Manh Tuan Ha
  • Thanh Hung Nguyen
  • Quoc Thinh Le
  • Thanh Hai Le
  • Thi Bich Ngoc Hoang
  • Kulkanya Chokephaibulkit
  • Pilaipan Puthavathana
  • Van Vinh Chau Nguyen
  • My Ngoc Nghiem
  • Van Kinh Nguyen
  • Tuyet Trinh Dao
  • Tinh Hien Tran
  • Heiman FL Wertheim
  • Peter W Horby
  • Annette Fox
  • H Rogier van Doorn
  • Dirk Eggink
  • Menno D de Jong

National Institutes of Health (HHSN272200500042C)

  • Matthijs RA Welkers
  • René M Vigeveno
  • Nhu Duong Tran
  • Thi Quynh Mai Le
  • Thai Pham Quang
  • Dinh Thoang Dang
  • Thi Ngoc Anh Tran
  • Manh Tuan Ha
  • Thanh Hung Nguyen
  • Quoc Thinh Le
  • Thanh Hai Le
  • Thi Bich Ngoc Hoang
  • Kulkanya Chokephaibulkit
  • Pilaipan Puthavathana
  • Van Vinh Chau Nguyen
  • My Ngoc Nghiem
  • Van Kinh Nguyen
  • Tuyet Trinh Dao
  • Tinh Hien Tran
  • Heiman FL Wertheim
  • Peter W Horby
  • Annette Fox
  • H Rogier van Doorn
  • Dirk Eggink
  • Menno D de Jong

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Carolien van de Sandt for helpful discussions. We gratefully acknowledge the authors and originating and submitting laboratories (Supplementary file 5) for the reference sequences retrieved from GISAID’s EpiFlu Database used in this study. AXH, ZCFG, and CAR were supported by ERC NaviFlu (no. 818353). The South East Asia Infectious Disease Clinical Research Network (SEAICRN) was funded by the National Institutes of Allergy and Infectious Diseases, National Institutes of Health (US), N01-A0-50042, HHSN272200500042C.

Ethics

Human subjects: The Institutional Review Board of all hospitals, the National Institute of Allergy and Infectious Diseases, and the Oxford Tropical Research Ethics Committee approved the study. Written informed consent was given by all patients (or proxies).

Senior Editor

  1. Miles P Davenport, University of New South Wales, Australia

Reviewing Editor

  1. Sarah E Cobey, University of Chicago, United States

Publication history

  1. Received: March 30, 2021
  2. Accepted: August 2, 2021
  3. Accepted Manuscript published: August 3, 2021 (version 1)
  4. Version of Record published: August 23, 2021 (version 2)

Copyright

© 2021, Han et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 535
    Page views
  • 76
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Evolutionary Biology
    2. Genetics and Genomics
    Milton Tan et al.
    Research Article Updated

    Chondrichthyes (cartilaginous fishes) are fundamental for understanding vertebrate evolution, yet their genomes are understudied. We report long-read sequencing of the whale shark genome to generate the best gapless chondrichthyan genome assembly yet with higher contig contiguity than all other cartilaginous fish genomes, and studied vertebrate genomic evolution of ancestral gene families, immunity, and gigantism. We found a major increase in gene families at the origin of gnathostomes (jawed vertebrates) independent of their genome duplication. We studied vertebrate pathogen recognition receptors (PRRs), which are key in initiating innate immune defense, and found diverse patterns of gene family evolution, demonstrating that adaptive immunity in gnathostomes did not fully displace germline-encoded PRR innovation. We also discovered a new toll-like receptor (TLR29) and three NOD1 copies in the whale shark. We found chondrichthyan and giant vertebrate genomes had decreased substitution rates compared to other vertebrates, but gene family expansion rates varied among vertebrate giants, suggesting substitution and expansion rates of gene families are decoupled in vertebrate genomes. Finally, we found gene families that shifted in expansion rate in vertebrate giants were enriched for human cancer-related genes, consistent with gigantism requiring adaptations to suppress cancer.

    1. Evolutionary Biology
    2. Immunology and Inflammation
    Srijan Seal et al.
    Review Article

    Researchers worldwide are repeatedly warning us against future zoonotic diseases resulting from humankind’s insurgence into natural ecosystems. The same zoonotic pathogens that cause severe infections in a human host frequently fail to produce any disease outcome in their natural hosts. What precise features of the immune system enable natural reservoirs to carry these pathogens so efficiently? To understand these effects, we highlight the importance of tracing the evolutionary basis of pathogen tolerance in reservoir hosts, while drawing implications from their diverse physiological and life-history traits, and ecological contexts of host-pathogen interactions. Long-term co-evolution might allow reservoir hosts to modulate immunity and evolve tolerance to zoonotic pathogens, increasing their circulation and infectious period. Such processes can also create a genetically diverse pathogen pool by allowing more mutations and genetic exchanges between circulating strains, thereby harboring rare alive-on-arrival variants with extended infectivity to new hosts (i.e., spillover). Finally, we end by underscoring the indispensability of a large multidisciplinary empirical framework to explore the proposed link between evolved tolerance, pathogen prevalence, and spillover in the wild.