Comprehensive mapping of adaptation of the avian influenza polymerase protein PB2 to humans
Figures

Deep mutational scanning of avian influenza PB2 in human and avian cells.
(A) We mutagenized all codons of PB2 from an avian influenza strain. We generated mutant virus libraries using a helper-virus approach, and passaged libraries at low MOI in human (A549) or duck (CCL-141) cells to select for functional PB2 variants. (B) We deep sequenced PB2 mutants from the initial mutant plasmid library and the mutant virus library after passage through each cell type. We computed the ‘preference’ for each amino acid in each cell type by comparing the frequency of each mutation before and after selection. In the logo plots, the height of each letter is proportional to the preference for that amino acid at that site. (C) To identify mutations that are adaptive in one cell type versus the other, we computed the differential selection by comparing the frequency of each amino-acid mutation in human versus avian cells. Letter heights are proportional to the log enrichment of the mutation in human versus avian cells. Figure 1—figure supplement 1 shows the phylogenetic relationship of the chosen avian influenza strain to other influenza strains. Figure 1—figure supplement 2 shows further details of the deep mutational scanning experiment. Figure 1—figure supplement 3 shows the relative amplification of full-length PB2 versus PB2-GFP and PB2-deletion gene segments.
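
The preference and differential selection calculations described in panels B and C can be illustrated with a minimal sketch. This is a simplified ratio-based version and not the dms_tools2 implementation used in the paper; the function names, the count dictionaries, and the pseudocount are assumptions made for illustration.

```python
import math


def preferences(pre_counts, post_counts, pseudocount=1.0):
    """Normalized enrichment of each amino acid at one site.

    pre_counts / post_counts: dicts mapping amino acid -> read count
    before and after selection (hypothetical inputs with the same keys).
    """
    n = len(pre_counts)
    pre_total = sum(pre_counts.values()) + pseudocount * n
    post_total = sum(post_counts.values()) + pseudocount * n
    ratios = {}
    for aa in pre_counts:
        pre_freq = (pre_counts[aa] + pseudocount) / pre_total
        post_freq = (post_counts[aa] + pseudocount) / post_total
        ratios[aa] = post_freq / pre_freq  # enrichment after selection
    norm = sum(ratios.values())
    # Preferences at a site are scaled to sum to one.
    return {aa: r / norm for aa, r in ratios.items()}


def differential_selection(human_counts, avian_counts, wildtype, pseudocount=1.0):
    """log2 enrichment of each mutation, relative to wild type, in human versus avian cells."""
    result = {}
    for aa in human_counts:
        if aa == wildtype:
            continue
        human_ratio = (human_counts[aa] + pseudocount) / (human_counts[wildtype] + pseudocount)
        avian_ratio = (avian_counts[aa] + pseudocount) / (avian_counts[wildtype] + pseudocount)
        result[aa] = math.log2(human_ratio / avian_ratio)
    return result
```

A positive value from `differential_selection` corresponds to a letter drawn above the line in the logo plots, that is, a mutation more enriched in human than in avian cells.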

Phylogenetic relationship of PB2 sequence of chosen avian influenza strain to other influenza strains.
(A) Phylogenetic tree of influenza PB2. We used PB2 sequences from the following influenza strains: A/Green-winged Teal/Ohio/175/1986 (indicated with a green dot), diverse strains sampled across years and hosts (Doud et al., 2015), and representatives of lineage-defining strains (human H3N2, human pandemic H1N1) and recent sporadic human cases of avian influenza strains (H5N1, H7N9). PB2 nucleotide sequences (of the coding sequence) were aligned using MAFFT and the phylogenetic tree was built using RAxML using the GTRCAT substitution model. Scale bar: mean nucleotide substitutions per site. (B) Pairwise amino-acid identity between all PB2 sequences shown in the tree, between just avian strains, between just human strains, and between human and avian strains.
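
The pairwise amino-acid identity summarized in panel B can be sketched as follows. This is a minimal illustration that assumes the sequences are already aligned to equal length and that gaps are written as "-"; the function name and the choice to skip gapped columns are assumptions, not the authors' code.

```python
def pairwise_identity(seq_a, seq_b):
    """Fraction of identical residues over aligned columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else float("nan")
```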

Details of deep mutational scanning experiment.
(A) Experiments were performed in biological triplicate, starting from plasmid mutagenesis. All experimental steps were also performed on a wild-type PB2 gene (blue) to estimate error rates during deep sequencing and other experimental steps. (B–F) We picked 48 clones across the three replicate mutant plasmid libraries for Sanger sequencing. (B) There was an average of 1.4 codon mutations per clone, with the number of mutations per clone roughly following a Poisson distribution. (C) Distribution of the number of nucleotide changes for each codon mutation. (D) Nucleotide frequencies in the mutant versus parent codons. (E) Mutations were distributed uniformly across the PB2 gene. (F) Cumulative distribution of pairwise distance between pairs of codon mutations, for clones with multiple mutations. The observed distribution is close to the distribution expected if pairs of mutations occurred independently. (G) Cumulative distribution of the fraction of mutations that are found less than or equal to the indicated number of times. ‘DNA’ refers to the mutant plasmid library. (H) Per-codon frequencies of nonsynonymous, stop, and synonymous mutations for each mutant library replicate and wild type, measured either in the DNA plasmid library or after passaging in human (A549) or avian (CCL-141) cells. The top plot shows all mutations; the bottom plot shows only mutations accessible by 2- or 3-nucleotide substitutions. (I) Correlations among experimental replicates of all amino-acid preferences. Correlations compare replicates passaged in human cells (orange), avian cells (green), and between the two cell types (black).
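
As a rough illustration of the Poisson expectation mentioned in panel B, the sketch below computes the fraction of clones expected to carry k codon mutations when the mean is 1.4, and scales it to 48 clones. The mean and clone number come from the caption; the function and variable names are assumptions.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean**k / math.factorial(k)


MEAN_MUTATIONS = 1.4  # average codon mutations per clone (from the caption)
N_CLONES = 48         # clones Sanger sequenced (from the caption)

for k in range(5):
    p = poisson_pmf(k, MEAN_MUTATIONS)
    print(f"{k} mutations: P = {p:.3f}, expected clones ~ {p * N_CLONES:.1f}")
```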

Relative amplification of full-length PB2 versus PB2-GFP and PB2-deletion gene segments.
Following passaging of PB2 mutant virus libraries and RNA extraction, PB2 vRNA was reverse transcribed and then amplified using primers annealing to the ends of the vRNA. The PCR products were separated by gel electrophoresis. Bands likely corresponding to full-length PB2, PB2-GFP, and PB2-deletion gene segments are labeled. The PB2-GFP comes from residual helper virus (which packages GFP in the PB2 segment) that was not purged by the low MOI passage, while internal deletions in PB2 are well known to arise spontaneously when virus is passaged in cell culture (Xue et al., 2016).

Functional constraints on PB2.
(A) The amino acid preferences measured in human and avian cells for key regions of PB2: the start codon, sites involved in cap-binding, and sites comprising the nuclear localization sequence (NLS). The height of each letter is proportional to the preference for that amino acid at that site. Known critical amino acids are generally strongly preferred in both cell types. (B) Correlation of the site entropy of the amino-acid preferences measured in each cell type. (C) Sites of high variability (as measured by entropy) in natural human influenza sequences occur at sites of high entropy as experimentally measured in human cells. (D) Sites with high variability in natural avian influenza sequences occur at sites of high entropy as experimentally measured in duck cells. Figure 2—figure supplement 1 shows the complete map of amino acid preferences as measured in human and avian cells. Preferences (as well as mutation effect and differential selection for all mutations as calculated for Figure 3) are in Figure 2—source data 1.
-
Figure 2—source data 1
Preference, mutation effect, and differential selection results for all mutations.
- https://doi.org/10.7554/eLife.45079.009
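
Panels B to D of the functional constraints figure compare site entropies. A minimal sketch of the Shannon entropy of the preferences at one site is shown below; the use of the natural logarithm and the function name are assumptions rather than details stated in the caption.

```python
import math


def site_entropy(prefs):
    """Shannon entropy (in nats) of the amino-acid preferences at one site.

    prefs: dict mapping amino acid -> preference, summing to one.
    Low entropy means a few amino acids are strongly preferred;
    high entropy means many amino acids are tolerated.
    """
    return -sum(p * math.log(p) for p in prefs.values() if p > 0)
```

Under this definition a site that tolerates all 20 amino acids equally has entropy ln(20), while a site that accepts only one amino acid has entropy zero.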

Complete map of amino acid preferences measured in human and avian cells.
(A) Measurements in human (A549) cells. (B) Measurements in avian (CCL-141) cells. The height of the letter at each site is proportional to the rescaled preference for that amino acid at that site. Domains of PB2 (Pflug et al., 2017) are indicated by the top color bar. The wild-type S009 PB2 sequence is indicated by the letters above each site’s logoplot.

Deep mutational scanning identifies known and novel host-adaptive mutations.
(A) Distribution of experimentally measured differential selection for previously characterized human adaptive mutations and all other possible mutations to PB2. Positive differential selection means a mutation is favored in human versus avian cells. (B) Scatterplot of each mutation’s effect in human versus avian cells, showing the top adaptive mutations identified in the deep mutational scanning. (C) Logoplots showing the differential selection at the sites of mutations that we chose for functional validation. The height of each letter above the line indicates how strongly it was selected in human versus avian cells. Top adaptive mutations are colored in orange (human-adaptive) or green (avian-adaptive). Mutations chosen for functional validation are indicated by an asterisk (*). Additional mutations chosen for validation are colored light orange (differentially selected in human) or light green (differentially selected in bird). Mutations observed in H7N9 avian-to-human transmission are indicated by #. Note that not all mutations with high differential selection in human versus avian cells are classified as top adaptive mutations because we also filtered for mutations that are substantially beneficial relative to wild type. (D) Logoplots showing amino acid preferences at sites we chose for functional validation. Top mutations beneficial in both human and avian cells are colored purple. Mutations chosen for validation are indicated by *. Figure 3—figure supplement 1 shows the complete map of differential selection in human versus avian cells. A catalog of previously described human/mammalian adaptive mutations is in Figure 3—source data 1.
-
Figure 3—source data 1
Catalog of previously described human/mammalian adaptive mutations.
- https://doi.org/10.7554/eLife.45079.012

Complete map of differential selection in human versus avian cells.
(A) Differential selection for PB2. The height of the letter at each site is proportional to the differential selection in human versus avian cells for that amino acid at that site. Letters above the center line are favored in human cells. The wild-type avian influenza (S009) PB2 sequence is indicated above each site’s logoplot. (B) Scatter plots of differential selection versus mutation effect as measured in human or avian cells. Top experimentally adaptive mutations identified in our deep mutational scanning (orange, green, purple dots) are indicated on the plots.

Validation of top experimentally adaptive mutations.
The polymerase activity of selected PB2 mutants as measured using minigenome assays in A549 (A) and HEK293T (B) human cells. The mutations chosen for characterization include previously known human adaptive mutations, top adaptive mutations identified by our deep mutational scanning (orange = human adaptive, green = avian adaptive), and additional mutations differentially selected in human (light orange) or avian (light green) cells. E627E is a synonymous mutation at site 627 used as a negative control. Minigenome activity is represented as the percent of transfected cells that expressed a viral GFP reporter. The gray horizontal line indicates the mean value measured for the wild-type avian PB2. Minigenome assays were performed in biological triplicate. Mutations that have significantly different minigenome activity from wild type are indicated by asterisks (unpaired t-test, p<0.05). (C) Competition of virus bearing the indicated mutant PB2 against virus with wild-type PB2. For each competition, human A549 and avian CCL-141 cells were infected with mutant and wild-type viruses mixed at a 1:1 ratio of transcriptionally active particles, and the frequency of each variant after viral replication was measured by deep sequencing viral RNA. For samples collected at 10 hr post infection, we infected cells at an MOI of 0.1, and sequenced vRNA from cellular extract. For samples collected at 48 hr post infection, we infected cells at an MOI of 0.01, and sequenced vRNA from the supernatant. The plots show the ratio of the mutant over wild-type variant in A549 cells, divided by the same ratio in CCL-141 cells. A ratio >1 indicates that a viral mutant grows better in human than avian cells. Competition assays were performed in biological duplicate; circle and cross represent replicate experiments. Flow data for minigenome activity and mutation counts for viral competition are provided in Figure 4—source data 1 and 2.
-
Figure 4—source data 1
Flow cytometry data for minigenome assays.
- https://doi.org/10.7554/eLife.45079.014
-
Figure 4—source data 2
Mutant frequency data for competition assay.
- https://doi.org/10.7554/eLife.45079.015
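
The quantity plotted in panel C of the validation figure, the mutant-to-wild-type ratio in A549 cells divided by the same ratio in CCL-141 cells, can be written as a short sketch. The function and argument names are hypothetical, and the inputs are assumed to be sequencing read counts for each variant.

```python
def host_specific_ratio(human_mut, human_wt, avian_mut, avian_wt):
    """(mutant / wild type in human cells) divided by (mutant / wild type in avian cells).

    A value above 1 indicates the mutant grows relatively better in
    human cells than in avian cells.
    """
    human_ratio = human_mut / human_wt
    avian_ratio = avian_mut / avian_wt
    return human_ratio / avian_ratio
```

For example, `host_specific_ratio(300, 100, 100, 100)` gives 3.0: the mutant outnumbers wild type threefold in human cells but is at parity in avian cells.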

Locations of top human-adaptive mutations on the structure of the influenza polymerase.
Overall structure of the influenza polymerase complex comprising PB2, PB1 and PA in (A, B) the transcription pre-initiation form (PDB: 4WSB) and (C, D) the apo form (PDB: 5D98). PB2 domains are defined as in Pflug et al. (2017). (B, D) Sites of top human-adaptive mutations identified by deep mutational scanning are shown in red on the PB2 subunit of the structure. Sites of previously experimentally verified human-adaptive mutations are in blue (25 sites as listed in Figure 3—source data 1). Sites that were identified by deep mutational scanning and were also previously known are in purple. A subset of sites is labeled and/or circled for reference in the main text, to indicate surfaces that might mediate host interactions. Similar results are obtained if we instead analyze the structures in terms of a continuous variable representing the extent of human-specific adaptation at each site (Figure 5—figure supplement 1B, C). (E) Structure of the PB2 C-terminal fragment co-crystallized with importin-α7 (PDB: 4UAD). Sites on PB2 interacting with the major and minor NLS binding surfaces of importin-α7 are in green and cyan, respectively. Importin-α7 is depicted in ribbon form in tan. We used the deep mutational scanning to define a continuous variable indicating the extent of host-specific adaptation at each site of PB2. Specifically, for each site, we computed the positive site differential selection by summing all positive mutation differential selection values at that site (i.e., the total height of the letter stack in the positive direction in logoplots such as in Figure 3D). We mapped this differential selection onto the PB2 C-terminal fragment in red; PB2 sites with high differential selection are numbered. Regions of importin-α7 that differ from importin-α3 are colored in orange; those near PB2 sites with high differential selection are shown as spheres.
For all structures, the avian influenza (S009) PB2 amino acid sequence was mapped onto the PB2 chain by one-2-one threading using Phyre2 (Kelley et al., 2015) (confidence in the models for 4WSB, 5D98, and 4UAD is 100%, 100%, and 99%, respectively). Sites are numbered according to the S009 PB2 sequence. Figure 5—figure supplement 1 shows the relative solvent accessibility of human-adaptive mutations, as well as positive site differential selection mapped onto structures of the influenza polymerase.
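
The positive site differential selection defined in this caption is the sum of all positive mutation-level values at a site. A minimal sketch, with a hypothetical function name and input dictionary:

```python
def positive_site_diffsel(mutation_diffsel):
    """Sum of the positive differential selection values at one site.

    mutation_diffsel: dict mapping amino acid -> differential selection
    for each mutation at the site. Negative values are ignored, so the
    result equals the height of the letter stack above the center line.
    """
    return sum(value for value in mutation_diffsel.values() if value > 0)
```

For a site with values `{"K": 2.5, "R": 1.0, "D": -3.0}`, the result is 3.5; the strongly negative value for D does not offset the positive ones.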

Solvent accessibility of sites of human-adaptive mutations, and positive site differential selection mapped onto structures of influenza polymerase.
(A) Scatterplot of relative solvent accessibility (RSA) for all sites in the transcription pre-initiation form (PDB: 4WSB), and the apo form (PDB: 5D98) of the influenza polymerase complex. Sites of top experimentally identified human-adaptive mutations are in orange with site position labeled; all other sites are in gray. Blue lines indicate the RSA cut-off of >0.2 for surface exposed sites. Positive site differential selection mapped onto the PB2 subunit of the influenza polymerase complex in (B) the transcription pre-initiation form (PDB: 4WSB), and (C) the apo form (PDB: 5D98). (D) Positive site differential selection mapped onto PB2 (PDB: 6F5O). Sites in PB2 involved in RNA Pol II CTD binding indicated in green. Sites with high differential selection are numbered. The avian influenza (S009) PB2 amino acid sequence was mapped onto the PB2 chain by one-2-one threading using Phyre2 (Kelley et al., 2015) (Confidence in model for 6F5O: 100%). Sites are numbered according to the S009 PB2 sequence.

Experimentally identified human-adaptive mutations are enriched in avian-human transmission of H7N9 influenza.
(A) Phylogeny of H7N9 influenza PB2 sequences. Branches in human and avian hosts are colored black and gray, respectively. Orange or red dots indicate where a mutation was inferred to have occurred. Branch lengths are scaled by annotated and inferred dates of origin of each sequence. (B) Distribution of experimentally measured differential selection values for all mutations occurring during H7N9 evolution in human and avian hosts. A positive differential selection value means that our experiments measured the mutation to be beneficial in human versus avian cells. A subset of top differentially selected mutations that occur frequently are labeled and plotted in orange. Enlarged phylogenetic trees are in Figure 6—figure supplements 1–5. Counts of mutations identified in phylogenetic analysis are in Figure 6—source data 1. Mutations plotted in each bin of the histogram are in Figure 6—source data 2.
-
Figure 6—source data 1
H7N9 human and avian mutation counts.
- https://doi.org/10.7554/eLife.45079.024
-
Figure 6—source data 2
H7N9 human and avian mutation differential selection values and counts in each histogram bin.
- https://doi.org/10.7554/eLife.45079.025

Phylogeny of H7N9 influenza PB2 sequences showing where mutations at site 627 were inferred to have occurred.
Branches in human and avian hosts are colored black and gray respectively. Orange or red dots indicate where a mutation was inferred to have occurred. Branch lengths are scaled by annotated and inferred dates of origin of each sequence.

Phylogeny of H7N9 influenza PB2 sequences showing where mutations at site 701 were inferred to have occurred.
Branches in human and avian hosts are colored black and gray respectively. Orange dots indicate where a mutation was inferred to have occurred. Branch lengths are scaled by annotated and inferred dates of origin of each sequence.

Phylogeny of H7N9 influenza PB2 sequences showing where mutations at site 534 were inferred to have occurred.
Branches in human and avian hosts are colored black and gray respectively. Orange dots indicate where a mutation was inferred to have occurred. Branch lengths are scaled by annotated and inferred dates of origin of each sequence.

Phylogeny of H7N9 influenza PB2 sequences showing where mutations at site 355 were inferred to have occurred.
Branches in human and avian hosts are colored black and gray respectively. Orange or red dots indicate where a mutation was inferred to have occurred. Branch lengths are scaled by annotated and inferred dates of origin of each sequence.

Phylogeny of H7N9 influenza PB2 sequences showing where mutations at site 521 were inferred to have occurred.
Branches in human and avian hosts are colored black and gray respectively. Orange dots indicate where a mutation was inferred to have occurred. Branch lengths are scaled by annotated and inferred dates of origin of each sequence.

Evolutionary accessibility of mutations from current avian influenza PB2 sequences.
Distribution of mean nucleotide substitutions required to access all amino-acid mutations, previously characterized human-adaptive mutations, and top human-adaptive mutations identified in our deep mutational scanning. The mean number of nucleotide substitutions is calculated by averaging over all avian influenza PB2 sequences collected from 2015 to 2018. Most previously characterized human-adaptive mutations are accessible by a single-nucleotide substitution, whereas many of the new adaptive mutations that we identified require multiple nucleotide substitutions. Mean nucleotide substitutions for each mutation are in Figure 7—source data 1.
-
Figure 7—source data 1
Mean nucleotide substitutions from avian sequences of all mutations.
- https://doi.org/10.7554/eLife.45079.027
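
The evolutionary accessibility measure in the figure above can be sketched as the minimum number of nucleotide changes needed to convert a codon into any codon encoding the target amino acid, averaged over a set of sequences. This is an illustrative sketch that assumes the standard genetic code and uppercase DNA codons; the function names are hypothetical and this is not the authors' analysis code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_substitutions(codon, target_aa):
    """Fewest nucleotide changes turning `codon` into a codon for `target_aa`."""
    return min(
        sum(a != b for a, b in zip(codon, candidate))
        for candidate, aa in CODON_TABLE.items()
        if aa == target_aa
    )


def mean_substitutions(codons, target_aa):
    """Average of min_substitutions over the codons found at one site
    across a set of sequences (for example, recent avian PB2 sequences)."""
    return sum(min_substitutions(c, target_aa) for c in codons) / len(codons)
```

For instance, a glutamate codon (GAA or GAG) is one substitution from lysine (AAA or AAG), so `mean_substitutions(["GAA", "GAG"], "K")` is 1.0, whereas an aspartate codon GAT needs two changes to reach lysine.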
Tables
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Cell line (Homo sapiens) | A549 | ATCC | CCL-185; RRID:CVCL_0023 | |
Cell line (Homo sapiens) | HEK293T | ATCC | CRL-3216; RRID:CVCL_0063 | |
Cell line (Canis familiaris) | MDCK-SIAT1 | Sigma-Aldrich | 5071502; RRID:CVCL_Z936 | |
Cell line (Anas platyrhynchos domesticus) | CCL-141 | ATCC | CCL-141; RRID:CVCL_T281 | |
Cell line (Canis familiaris) | MDCK-SIAT1-tet-S009-PB2-E627K | this paper | | MDCK-SIAT1 cells expressing S009 PB2-E627K under control of a doxycycline-inducible promoter |
Recombinant DNA reagent | pHW_noCMV_S009_PB2; pHW_noCMVnoTerm_BsmBI | this paper | | Plasmids for generating mutant plasmid library; see Supplementary file 1 |
Recombinant DNA reagent | pHW_S009_PB2; pHW_S009_PB1; pHW_S009_PA; pHW_S009_NP | this paper | | Plasmids for generating helper virus; see Supplementary file 1 |
Recombinant DNA reagent | HDM_S009_PB2; HDM_S009_PB1; HDM_S009_PA; HDM_S009_NP | this paper | | Plasmids for protein expression of S009 polymerase complex; see Supplementary file 1 |
Recombinant DNA reagent | pHH_PB2_S009_flank_99_eGFP_100 | this paper | | Plasmids for generating helper virus; see Supplementary file 1 |
Recombinant DNA reagent | pHW184_HA; pHW186_NA; pHW187_M; pHW188_NS | Hoffmann et al. (2000) | ||
Recombinant DNA reagent | pHH-PB1-flank-eGFP | Bloom et al. (2010) | | Reporter plasmid for minigenome assay; see Supplementary file 1 |
Recombinant DNA reagent | pcDNA3.1_mCherry | this paper | | Transfection control for minigenome assay; see Supplementary file 1 |
Recombinant DNA reagent | pSBtet_RP_S009_PB2_E627K | this paper | | Plasmid for generating PB2-expressing cell line; see Supplementary file 1 |
Sequence-based reagent | primers | this paper | | See Supplementary file 2 |
Commercial assay or kit | NEBuilder HiFi DNA Assembly Master Mix | New England Biolabs | E2621S | |
Commercial assay or kit | ElectroMAX DH10B competent cells | Invitrogen | 18290015 | |
Commercial assay or kit | RNeasy Mini Kit | Qiagen | 74104 | |
Commercial assay or kit | Accuscript Reverse Transcriptase | Agilent | 200820 | |
Commercial assay or kit | KOD Hot Start Master Mix | EMD Millipore | 71842 | |
Commercial assay or kit | QIAamp Viral RNA Mini Kit | Qiagen | 52904 | |
Commercial assay or kit | SuperScript III | ThermoFisher Scientific | 18080051 | |
Chemical compound, drug | BioT | Bioland Scientific | B01-01 | |
Chemical compound, drug | Lipofectamine 3000 | ThermoFisher Scientific | L3000015 | |
Antibody | H17-L19 | Gerhard et al. (1981) | ||
Software, algorithm | dms_tools2 | https://jbloomlab.github.io/dms_tools2, version 2.3.0 | ||
Software, algorithm | Jupyter notebooks that perform all steps of analyses | this paper | | See Supplementary file 3; https://github.com/jbloomlab/PB2-DMS |
Additional files
-
Supplementary file 1
Plasmid sequences.
- https://doi.org/10.7554/eLife.45079.028
-
Supplementary file 2
Primer sequences.
- https://doi.org/10.7554/eLife.45079.029
-
Supplementary file 3
Jupyter notebooks documenting computational analyses.
- https://doi.org/10.7554/eLife.45079.030
-
Supplementary file 4
Comparison of ExpCM to standard phylogenetic substitution models.
- https://doi.org/10.7554/eLife.45079.031
-
Transparent reporting form
- https://doi.org/10.7554/eLife.45079.032