Influenza virus transcription and progeny production are poorly correlated in single cells

David J. Bacsik; Bernadeta Dadonaite; Andrew Butler; Allison J. Greaney; Nicholas S. Heaton; Jesse D. Bloom

doi:10.7554/eLife.86852.1

Introduction

Many aspects of viral infection are heterogeneous when measured across single cells. Individual infected cells vary widely in transcription of viral genes [1–4], expression of viral proteins [4–6], presence of viral mutations [7], replication of viral genomes [8], and production of viral progeny [6,8–10]. However, it is unclear how variation in these different aspects of infection are related within the same infected cells. For instance, to what degree does the extent of viral transcription in an infected cell determine the number of progeny virions the cell produces? The answer to this question remains elusive because the most common single-cell techniques (flow cytometry and single-cell RNA sequencing) measure the levels of proteins and transcripts, rather than the number of viral progeny produced.

Here, we develop a novel approach to simultaneously measure viral transcription, viral mutations, and viral progeny production in single cells infected with influenza virus. We find that progeny production is even more heterogeneous than viral transcription in single cells. The cells that express the most viral transcripts often do not generate any detectable viral progeny. Instead, cells with extremely high viral transcription often fail to express the NS gene and represent non-productive infections. Our findings emphasize that different aspects of viral infection are not always correlated at the single cell level, and that many of the cells contributing large amounts of viral mRNA to bulk RNA sequencing studies do not appreciably contribute virions to the progeny population.

Results

Viral barcoding to measure transcription, progeny production, and viral genotype in single cells

To quantify the progeny virions released from single infected cells, we inserted random nucleotide barcodes [11–13] into the influenza virus genome so that they are positioned near the 3’ end of the viral mRNAs (Fig 1A). Standard 3’-end single-cell sequencing of the mRNA in infected cells [1,2,7,14–16] captures the viral barcode sequence along with host and viral transcripts, enabling determination of which barcoded virion(s) infected each cell (Fig 1A). We can sequence the viral barcodes on progeny virions released into the supernatant to quantify the relative number of physical progeny produced by each cell, and sequence the viral barcodes in cells secondarily infected with an aliquot of the supernatant to quantify the relative number of infectious progeny produced by each cell. Additionally, we can reconstruct the genome of the virion that infected each cell by selectively amplifying viral genes from the single-cell cDNA library and performing long-read sequencing as described previously [7]. This strategy enables simultaneous measurement of transcription, progeny production, and viral genotype in single cells.

Strategy to measure transcription, progeny production, and viral genotype in single cells.
**(A)** Insertion of barcodes in the viral genome makes it possible to quantify the progeny released from single cells, and relate progeny production to viral transcription and viral genotype. **(B)** Barcodes were inserted near the 3’ end of the mRNA sequence between the stop codon and the polyA site, using a duplicated packaging signal scheme to avoid disrupting viral genome packaging. **(C)** Viruses with one or two barcoded segments grew to similar titers as viruses with unmodified genomes. The titers shown were measured after generating the viruses by transfection.

Creation of dual-barcoded virus library

To insert barcodes into influenza virus genes, we used a previously described approach to duplicate the packaging signals of the HA and NA genes to create sites where exogenous sequence can be added without disrupting viral genome packaging [17, 18]. This approach allowed us to insert 16-nucleotide random barcodes near the 3’ end of the genes, downstream of the stop codon but upstream of the polyadenylation signal (Fig 1B). These barcodes were therefore present in both viral mRNAs and genomic RNAs (vRNAs), but did not modify the amino acid sequence of the viral protein. We refer to these viruses as “dual-barcoded” as they have barcodes on two different genes. Dual barcodes provided duplicate measurements of progeny production from the same infected cell, which were averaged and normalized (see Methods).

We engineered barcodes into the A/California/04/2009 (pdmH1N1) strain of influenza virus with the G155E cell-culture adaptation mutation [19]. Viruses with barcoded HA and NA segments could be generated by reverse genetics, and in cell culture grew to titers comparable to unmodified viruses (Fig 1C). To confirm that the sequence of individual barcodes did not affect viral growth in cell culture, we generated virus libraries carrying a small pool of barcodes and verified that the virus titers and barcode frequencies were stable across three passages (Fig S1).

For our single-cell experiments, we generated libraries of virions with a high diversity of barcodes on the HA and NA genes. Our experiments utilized between 1,500 and 4,000 infecting virions. It was important that nearly every virion have a unique barcode when randomly sampled from the library. We used deep sequencing to verify that for both HA and NA, in a sample of 1,500 barcodes from our virus library, >96% of barcodes were unique (Fig S2). In a sample of 4,000 barcodes, >92% were unique (Fig S2).

We recapitulate prior findings that viral transcription is extremely heterogeneous across single infected cells

We implemented the experiment in Fig 1A at two different MOIs. For the low MOI condition, we infected ∼10⁴ MDCK-SIAT1-TMPRSS2 cells [20] with the dual-barcoded virus library at a MOI of ∼0.15; under these conditions, most cells are uninfected, and most infected cells are infected by a single virion. For the high MOI conditions, we infected ∼7x10³ cells at a MOI of ∼0.6; under these conditions, fewer cells are uninfected and a substantial fraction of the infected cells are infected by multiple virions. To ensure a single round of relatively synchronized infection, we replaced the virus inoculum with fresh medium after one hour and added ammonium chloride, which prevents secondary infection by blocking the endosomal acidification necessary for viral fusion [21, 22]. We collected the cells for single-cell RNA sequencing at a single timepoint 12 hours after infection. Before loading onto a 10X Chromium device, we added a control sample that allowed us to quantify the rate of cell multiplets and PCR strand exchange. This control sample contained cells infected with a second virus carrying synonymous mutations that could be distinguished in sequencing data (see Methods).

For the low MOI sample, we obtained single-cell RNA sequencing data for 254 cells infected with our barcoded virus library, resulting in an empirical MOI of 0.17 (since we recovered data for ∼1,600 total cells from this sample). For the high MOI sample, we obtained single-cell RNA sequencing data for 357 cells infected with our barcoded virus library, resulting in an empirical MOI of ∼0.59 (since we recovered data for ∼800 total cells from this sample).

Note that many cells are lost during preparation for single-cell RNA sequencing, so an important caveat of our study is that some infected cells that produced viral progeny are absent in the transcriptome data [23, 24]. However, because the number of barcodes in our virus library greatly exceeds the number of infected cells, these undetected infections should not substantially affect our measurements of relative viral progeny among the cells that were captured.

Under both infection conditions, there was extremely wide variation in the amount of viral transcription among infected cells (Fig 2A), similar to that observed in prior single-cell studies of influenza infection [1,2,7,14,15]. In most infected cells, viral transcripts accounted for <10% of all transcripts, but in a small number of cells, over half of the transcripts were derived from virus (Fig 2A). On average, cells infected at high MOI expressed higher levels of viral transcripts than cells infected at low MOI, but there was extensive variation under both conditions (Fig 2A).

Viral transcription is extremely heterogeneous across single infected cells, and some cells fail to express some viral genes.
This plot shows single-cell RNA-sequencing data for the 254 cells that were infected at low MOI and 357 cells that were infected at high MOI. **(A)** Viral transcription in infected cells is extremely heterogeneous, with viral mRNA composing <1% of total mRNA in some cells, but >80% in others. **(B)** The number of viral genes detected in each infected cell. More than half of infected cells express mRNA from all 8 viral segments at both low MOI and high MOI. **(C)** The fraction of infected cells expressing each viral gene.

Under low MOI conditions, a substantial fraction (∼40%) of the infected cells also failed to express all eight viral genes (Fig 2B), a phenomenon that has been extensively described in prior studies [1,2,5]. Under high MOI conditions, fewer cells failed to express all eight viral genes (Fig 2B), likely because of complementation by co-infection [25–27]. At a per gene level, each viral gene was not expressed in some infected cells, with variation between the rates of absence for each specific gene (Fig 2C). Note that our ability to determine whether a viral gene is absent depends on the total level of viral transcription in a cell (Fig S3 and Methods), which could reduce our ability to detect the absence of the four viral genes involved in transcription (PB2, PB1, PA, NP).

Full genome sequences of the influenza virions infecting individual cells at low MOI

Virions can be defective in two ways: they can fail to express a viral gene, or they can encode mutated viral proteins. To identify cells infected by mutated virions, we used long-read sequencing to reconstruct the genome of virions infecting single cells under low MOI conditions [7]. Under these conditions, we expect most cells that are infected to be infected by only one virion. We amplified the viral transcripts from our single-cell RNA sequencing library and subjected them to PacBio sequencing. Because each transcript carries a cell barcode, we could link the sequence of each viral transcript to the cell that produced it.

We obtained complete sequences of all expressed viral genes for 131 of the 254 infected cells in our low MOI dataset (Fig S4A). About a third of the infected cells expressed all eight viral genes without any non-synonymous mutations (Fig 3A, Fig S5). The remainder of infected cells failed to express a viral gene, expressed a gene with a non-synonymous mutation, or both (Fig 3A). Mutated virions most commonly had just one non-synonymous mutation in their genome, but some virions had two or three mutations (Fig 3B, Fig S5). Note that some virions had large internal deletions in a gene (Fig S5) as has been previously described [28, 29]; here we have classified deletions as non-synonymous mutations.

Consensus viral genome sequences from single infected cells with long-read viral sequencing data.
Viral genomes were reconstructed for cells infected at low MOI. **(A)** The number of single infected cells expressing all eight viral genes without non-synonymous mutations, expressing all eight viral genes with one or more non-synonymous mutation(s), missing one or more viral gene(s), or with both mutated and missing genes. **(B)** The number of non-synonymous mutations in each viral genome. Deletions are classified as a non-synonymous mutation for these counts. This plot shows only the 131 of 254 single infected cells for which we could determine the sequence of all genes expressed by the infecting virion. See Fig S3A for details on properties of infected cells for which we could obtain full viral sequences, and Fig S4 for the full set of viral mutations in each infected cell.

Progeny production from single influenza-infected cells is more heterogeneous than viral transcription

We measured the amount of physical and infectious progeny virions produced by single infected cells. We quantified physical progeny virions by sequencing viral barcodes from vRNA molecules in the supernatant at 12 hours post-infection (Fig 1A). For the low MOI sample, we also quantified infectious progeny virions by infecting a second set of cells with some of the viral supernatant and sequencing viral barcodes from vRNA expressed in these newly infected cells (Fig 1A). We analyzed progeny production measurements for the 91 infected cells from the low MOI sample that met the following criteria: both barcoded genes were expressed (allowing us to identify both viral barcodes), and the sequences of all expressed viral genes were obtained with long-read sequencing (providing a complete viral genome) (Fig S4). For the high MOI sample, we analyzed progeny production measurements for the 290 infected cells that expressed both barcoded genes, since we did not have sequences of the viral genes.

The number of progeny virions produced per cell was extremely heterogeneous at both low MOI and high MOI (Fig 4A,B). Under low MOI conditions, nearly half of infected cells failed to produce any detectable physical or infectious progeny. At the extreme high end of viral transcription, a few cells each generated >10% of all the virions detected in the progeny population (Fig 4A,B). A similar trend was seen at high MOI, although under these conditions the most productive cells generated a smaller fraction of progeny (Fig 4A).

Viral progeny production is more heterogeneous than viral transcription across single infected cells.
Heterogeneity across single infected cells in **(A)** physical progeny production, **(B)** infectious progeny production, and **(C)** viral transcription. The Gini coefficient [30] quantifying the extent of cell-to-cell variability is indicated on each panel; a larger Gini coefficient indicates a more uneven distribution. For **(A)** and **(B)** the x-axis is the fraction of viral barcodes associated with each cell among all barcodes assignable to any infected cell; for **(C)** the x-axis is the fraction of mRNA in each cell that is derived from virus. The outset bar on the left shows the number of cells that produced no detectable progeny. This plot shows only single infected cells with complete measurements (see Fig S4). For cells infected at low MOI, this is cells that express both barcoded genes and for which we could determine the sequence of all genes expressed by the infecting virion. For cells infected at high MOI, this is cells that express both barcoded genes.

Progeny production was much more heterogeneous across single cells than viral transcription (Fig 4). While just 6 of 91 infected cells were responsible for generating half of the physical progeny at low MOI, 23 cells were required to account for half of the viral transcripts (Fig S6). We can quantify the heterogeneity in these distributions formally by calculating a Gini coefficient, which ranges from zero to one with larger values indicating more uneven distributions [30]. Under low MOI conditions, the Gini coefficients were 0.78 and 0.88 for physical and infectious progeny production (Fig 4A,B). At high MOI, heterogeneity was reduced slightly and the Gini coefficient for physical progeny production was 0.66 (Fig 4A). The Gini coefficients for viral transcription were 0.46 at low MOI and 0.40 at high MOI (Fig 4C), both of which are lower than the corresponding coefficients for progeny production.

Cells that transcribe more viral mRNA do not produce more progeny, and many high-transcribing cells represent aberrant infections that fail to express the NS gene

The correlation between viral transcription and progeny production in single cells is surprisingly poor at both low MOI and high MOI (Fig 5A,B). Under low MOI conditions, none of the cells with >25% of their mRNA transcripts derived from virus produce any detectable progeny (Fig 5A). Instead, most progeny come from cells with moderate viral transcription. At high MOI, viral progeny are produced by cells throughout the range of viral transcription (Fig 5B), but there is no trend for cells with high transcription to produce more progeny even at high MOI.

Relationship between viral transcription and progeny production in single infected cells.
Relationship between viral transcription and progeny virion production. Each point is a different cell. **(A)** For cells infected at low MOI, both physical and infectious progeny were quantified and cells are colored according to whether the cell expresses unmutated copies of all eight genes, all genes with one or more non-synonymous mutations, or fewer than all genes (with or without mutations). **(B)** For cells infected at high MOI, only physical progeny were quantified and cells are colored according to whether they express all eight viral genes or are missing one or more viral genes. Circular points indicate cells that express the NS gene and triangular points indicate cells that do not express the NS gene.. An interactive version of this figure that enables mouse-overs of points with details about individual cells is at https://jbloomlab.github.io/barcoded_flu_pdmH1N1. **(C)** Total viral transcription is plotted for each cell. The mean for each group is shown as a blue line. Cells that do not express the NS gene transcribe significantly more viral mRNA than cells expressing all viral genes (statistical significance determined by permutation test with 5000 random simulations; p=0.0002 for cells infected at low MOI and p=0.004 for cells infected at high MOI). **(D)** Like panel (C), but for physical progeny production. Cells failing to express any viral gene–including NS–produce little or no physical progeny.

At both low and high MOI, the viral gene expression information provided by single-cell RNA sequencing offers a straightforward explanation for why some cells with very high viral transcription fail to produce progeny. Cells that fail to express any viral gene produce little or no detectable progeny virions, regardless of their total viral transcription activity. The lack of physical progeny produced by cells that fail to express even a single viral gene presumably occurs because the absence of the encoded protein impairs virion formation (Fig 5D; note that our analysis is limited to cells that express HA and NA since those are the barcoded genes). But although cells that fail to express a viral gene produce little or no progeny, the converse is not true: cells that express the full complement of viral genes often still fail to produce detectable progeny (Fig 5D).

Strikingly, absence of the influenza NS gene not only precludes progeny production but is specifically associated with an aberrant state of high viral transcription (Fig 5C). At both low MOI and high MOI, many of the highest transcribing cells fail to express NS, and the mean level of viral transcription is significantly higher (p < 0.01) in cells that do not express NS compared to cells that express all viral genes (Fig 5C). These data suggest that NS acts as a negative regulator of viral transcription. This observation is consistent with the known functional roles of the NEP protein expressed from the NS gene, which is to export viral ribonucleoprotein complexes from the nucleus [31, 32] and possibly to switch the viral polymerase from transcription to genome replication through direct protein-protein interactions [33, 34]. Overall, this result suggests that the cells that contribute the most to the signal observed in transcriptomic studies often represent aberrant non-productive infections that do not contribute viral progeny.

For cells infected at low MOI, we can also study the effect of viral mutations on progeny production. At low MOI, physical viral progeny are produced both by cells that express viral genes with mutations and by cells that express unmutated viral genes; however, more of the infectious progeny virions come from cells with unmutated viral genomes (Fig 5A, Fig S7)—probably because some non-synonymous mutations interfere with protein functions that are required for infection of new cells. Nonetheless, physical and infectious progeny production are much more correlated among single cells than are transcription and progeny production (Fig 5A versus Fig S7; Pearson’s R values of -0.14 for the correlation of transcription with infectious progeny at low MOI versus 0.39 for the correlation of physical progeny with infectious progeny at low MOI). However, none of the viral factors we measure (viral transcription, expression of each viral gene, or mutated viral proteins) fully explain the extreme variation we observe in progeny production. Progeny production remains highly variable across cells without any viral deficit (e.g. cells that express unmutated copies of all viral genes), although it is less variable across this population than across all cells (Fig 6A,B). This unexplained variation suggests that cellular or uncharacterized viral factors must also contribute to cell-to-cell variation in progeny production.

Viral gene absence and viral mutations only explain a fraction of heterogeneity observed in progeny production.
**(A)** Distribution of physical progeny virions. For cells infected at low MOI, compare heterogeneity in all infected cells (left, dark grey) with cells that express all eight viral genes without non-synonymous mutations (right, blue). For cells infected at high MOI, compare heterogeneity in all infected cells (left, dark grey) with cells that express all eight viral genes (right, light grey). The outset bar on the left shows the number of cells that produced no detectable progeny. **(B)** Like panel (C), but for infectious progeny rather than physical progeny at low MOI. The plots showing all infected cells are duplicated from Fig 4 to facilitate direct comparison of all cells to those with complete unmutated genomes. For cells infected at low MOI, this figure shows the 91 infected cells for which we could identify the viral barcode on both barcoded genes and determine the sequence of all genes expressed by the infecting virion. For cells infected at high MOI, this figure shows the 290 infected cells for which we could identify the viral barcode on both barcoded genes.

Discussion

Most prior single-cell studies of infection have examined intracellular viral products, like mRNA transcripts or proteins [1–7]. However, the most important outcome of infection for multi-cycle viral growth is how many progeny virions are produced by an infected cell. Prior studies of progeny production from single cells have relied on isolating individual infected cells in small volumes [6,8–10]. These studies have shown progeny production is highly heterogeneous, but have not provided measurements of most other properties of the infected cells—and have therefore lacked explanatory power to understand the basis for the variation in progeny production. Here, we have overcome these limitations with a new method to simultaneously measure physical and infectious progeny production alongside transcription of all viral and host genes, as well as sequencing of the genome of the virion that infected each cell.

The most striking finding from our work is that while viral transcription and progeny production are both highly heterogeneous across influenza-infected cells, they are not well correlated at either low MOI or high MOI. In particular, the cells that transcribe the most viral mRNA often generate no detectable progeny virions. Part of the discrepancy between these two single-cell measurements is due to the fact that cells that fail to express the influenza NS gene tend to transcribe very high levels of mRNA but produce no progeny. This result makes biological sense: one of the proteins encoded by the NS gene is NEP, which exports viral ribonucleoproteins from the nucleus [31], terminating their transcriptional activity [32]. NEP may also mediate a switch from transcription to genome replication by the viral polymerase [33, 34].

However, absence of NS and other defects in the viral genome only explain part of the discordance between viral transcription and progeny production in single influenza-infected cells. These two properties are often discordant even in cells expressing unmutated copies of all influenza genes. We suggest that further study of both viral and cellular factors that promote transcription versus progeny production is an interesting area for future work.

Our study has several limitations. Our experiments used a cell line rather than the differentiated airway cells that are the actual target cells during human influenza infections. We performed experiments using a single strain of H1N1 influenza at a single timepoint. We were also able to profile all relevant single-cell properties for a relatively modest number of infected cells, since single-cell RNA sequencing captures only a fraction of the input cells. However, the method we have described should be extensible to other cell-culture systems and larger numbers of cells in future work.

Despite these limitations, our results suggest several implications for broader thinking about viral infections. First, recent studies have examined the distribution of influenza virus transcription or protein expression across differentiated airway cells ex vivo [24,35,36] or in vivo [15,16,37]. Our results suggest it is also important to measure progeny production across airway cells, as the cell types expressing the most viral transcripts or proteins may not be the ones producing the most viral progeny. Second, our results suggest failure of some cells to express specific viral genes contributes to the discordance between viral transcription and progeny production. Failure to express the influenza NS gene is strongly correlated with high viral transcription, even under high MOI conditions that promote genetic complementation [25–27]. Third, recent work on the transmission of influenza virus [38, 39] and SARS-CoV-2 [40–42] in humans has emphasized the narrow genetic bottleneck, with only a small fraction of viral diversity in the donor transmitted to the recipient. Our results suggest physical bottlenecks in how many virions reach the recipient may be further narrowed by the fact that only a small fraction of the initially infected cells will produce most of the progeny that continue the infection.

Acknowledgements

Thanks to Jason Underwood for providing valuable guidance related to performing long-read sequencing on single-cell cDNA libraries. We thank Will Hannon for assistance with interactive plots. BioRender was used to generate Figs 1A and 1B. This work was funded in part by the NIH/NIAID under grant R01AI165821 and contract No. 75N93021C00015, as well as using a Burroughs Wellcome Fund Young Investigator in the Pathogenesis of Infectious Diseases grant to JDB. JDB is an Investigator of the Howard Hughes Medical Institute.

Competing interests

JDB consults or has recently consulted with Apriori Bio, Merck, Moderna, or Oncorus on topics related to viruses and their evolution. JDB and AJG are inventors on Fred Hutch licensed patents related to viral deep mutational scanning. The other authors declare no competing interests.

Methods

Engineering barcodes in the influenza virus HA and NA genes

The HA segment of the A/California/04/2009 (pdmH1N1) strain of influenza virus with the G155E cell-culture adaptation mutation was engineered to carry exogenous sequence by duplicating the packaging signals at each end [17, 18], as schematized in Fig 1B. A complete plasmid map of the barcoded HA plasmid is at https://github.com/jbloomlab/barcoded_flu_pdmH1N1/blob/main/data/flu_sequences/plasmid_maps/pHH_bcHA_G155E_DropSeqR1.gb. We included the G155E mutation as it greatly enhances viral growth in cell culture [19]. Packaging signal length and location was informed by previous studies [43–45]. The terminal 105 nucleotides of the HA coding sequence were duplicated to provide an authentic packaging signal at the 5’ end of the vRNA. The corresponding 105 nucleotides of the HA protein coding sequence were synonymously recoded to remove competing RNA-RNA interactions. A second stop codon (TGA) was added at the end of the coding sequence to reduce the chance of translation read-through. The stop codons were followed by an exogenous sequence containing a priming site, a 16-nucleotide random barcode, a second priming site, and a HindIII restriction site. The 3’ end of the vRNA was treated similarly. The first 67 nt of the HA coding sequence were duplicated and the corresponding region of the coding sequence was synonymous recoded. All potential start codons were removed from the duplicated packaging signal using single nucleotide substitutions. A BamHI restriction site was added between the duplicated packaging signal and the start codon.

The NA segment of the A/California/04/2009 strain was engineered using the same strategy except we duplicated 99 nucleotides at the 5’ end of the vRNA and 93 nucleotides at the 3’ end of the vRNA. A complete map of the barcoded NA plasmid is at https://github.com/jbloomlab/barcoded_flu_pdmH1N1/blob/main/data/flu_sequences/plasmid_maps/pHH_bcNA_DropSeqR1.gb.

Cloning barcoded plasmid libraries

To facilitate cloning highly-diverse barcoded plasmid libraries, a recipient vector was created for each segment. The recipient vectors contained an eGFP insert flanked by the duplicated packaging signals described above. Recipient vector maps are at https://github.com/jbloomlab/barcoded_flu_pdmH1N1/blob/main/data/experiment_resources/plasmid_maps/2548_pHH_Haflankpdm-eGFP-DropSeqR1.txt and https://github.com/jbloomlab/barcoded_flu_pdmH1N1/blob/main/data/experiment_resources/plasmid_maps/2549_pHH_Naflankpdm-eGFP-DropSeqR1.txt.

Inserts were prepared by amplifying the HA and NA genes from templates with synonymously-recoded terminal regions. Random barcodes were added as a string of 16 nucleotides in the primer that binds near the 3’ end of the viral mRNA. PCR was performed using KOD Hot Start Master Mix with 1 ng of plasmid template for 17 cycles. Reactions were treated with DpnI for 1 hour to remove the template plasmid. Barcoded products were gel purified and cleaned with 1X AmpureXP beads. The recipient vectors were prepared by digestion with BamHI and XbaI for 1 hour to remove the eGFP insert and linearize the backbone. Linear backbones were gel purified and cleaned with 1X AmpureXP beads.

Plasmids were assembled from linear vector and barcoded insert using NEBuilder HiFi Assembly Master Mix. A 2:1 molar ratio of insert to vector was used. 25 µl of NEBuilder Master Mix was combined with 0.27 pmol of barcoded insert and 0.13 pmol of linearized vector in a total volume of 50 µl. Assembly was allowed to proceed for 1 hour. Reactions were cleaned with 0.6X AmpureXP beads and eluted in 26 µl of EB. A small portion of the assembled product (1 µl) was used to transform 20 µl of NEB 10-Beta electrocompetent E. coli cells. Transformation was performed at 1.8 kV for > 5 ms per sample. Cells were grown in SOC media for 1 hr at 37C with shaking.

After shaking, transformed E. coli were plated on large LB-ampicillin agar plates and grown at 37C overnight to produce a “lawn” of bacterial colonies. Liquid medium was pipetted onto the plate and a sterile plastic scraper was used to collect all of the bacterial colonies. Bacteria were grown in 200 ml of liquid medium in a 1 liter flask for 4 hours at 37C with shaking. Bacteria were pelleted by centrifugation and frozen at -20C. Plasmid libraries were collected using Qiagen HiSpeed Maxi Prep kit.

Generating a dual-barcoded virus library

We generated a dual-barcoded virus library with all non-HA/NA genes derived from the A/California/04/2009 (pdmH1N1) strain of influenza virus. Virus was generated by reverse genetics in 39 independent transfection reactions. For each transfection reaction, 4e5 293T cells (ATCC #CRL-3216) were seeded in a well of a 6-well dish. Cells were grown in D10 medium (DMEM supplemented with 10% heat-inactivated fetal bovine serum, 2 mM L-glutamine, 100 U per mL penicillin, and 100 µg per mL streptomycin). After ∼16 hours, we transfected each well with bidirectional reverse-genetics plasmids based on the pHW2000 vector [46] carrying the six unmodified segments: (PB2, PB1, PA, NP, M, and NS), unidirectional reverse-genetics plasmids based on the pHH21 vector [47] carrying the two barcoded segments (HA and NA), and a plasmid constitutively expressing the TMPRSS protease (which proteolytically activates HA) [20]. Maps of all plasmids are available at https://github.com/jbloomlab/barcoded_flu_pdmH1N1/tree/main/data/flu_sequences/plasmid_maps. We used 250 ng of each plasmid and 3.4 µl BioT transfection reagent per reaction.

Twenty-four hours after transfection, the medium was replaced with Influenza Growth Medium (Opti-MEM supplemented with 0.1% heat-inactivated FBS, 0.3% bovine serum albumin, 100 µg per mL of calcium chloride, 100 U per mL penicillin, and 100 µg per mL streptomycin) and 3e5 MDCK-SIAT1-TMPRSS2 cells [20] were added to each well. Viral supernatants were collected at 65 hours post-transfection and centrifuged at 500 RCF for 5 min to remove any cellular material. Aliquots were frozen at -80°C and titered by TCID50 assay.

To ensure a genotype-phenotype link between the viral genome and the proteins displayed on the surface of each virion, the virus library was passaged at low MOI. Infections were done at large scale to maintain library diversity. Four five-layer flasks (Falcon #353144) were seeded with 50 million MDCK-SIAT1-TMPRSS2 cells each [20] in D10 medium, for a total of approximately 200 million cells. After 4 hours, the medium was removed and two million TCID50 units of virus library in IGM were used to infect the cells. Viral supernatants were collected at ∼38 hours after infection and centrifuged at 500 RCF for 10 min to remove cellular material. Aliquots were frozen at -80°C and titered by TCID50 assay.

We obtained titers of ∼1e4 TCID50/µl (Fig 1C).

Estimating the rate of infected cell multiplets and chimeric PCR products using a second control virus library

Immediately prior to performing single-cell RNA sequencing on our sample of interest, we mixed the infected cells with a second control sample of cells. The control cells were infected with an otherwise isogenic influenza virus that carried identifying synonymous mutations on all eight viral genes. The synonymous mutations are detectable by sequencing. They mark each mRNA transcript and genome segment derived from the virus library with a distinct “genetic tag.” These synonymous genetic tags allow us to distinguish between viral transcripts from our sample of interest and the control sample, thereby enabling us to quantify two important sources of technical error.

First, in the single-cell RNA sequencing data, these tags provide a means to detect transcriptomes that are derived from droplets that encapsulated multiple infected cells (multiplets) [48]. Such transcriptomes are marked by high frequencies of both tags among the viral transcripts. The overall rate of multiplets among all cells was calculated, and multiplets bearing both tags (which will be about half of multiplets) were excluded to remove them from the dataset.

Second, the genetic tags are also detectable in the long-read viral sequencing data [7] we used to reconstruct the genotype of infecting virions. In the course of preparing long-read sequencing libraries, a polymerase can move from one template molecule to another in the midst of synthesizing its product–a phenomenon known as “strand exchange” [49]. This phenomenon can be detected in long-read viral sequences that contain discordant genetic tags (see Fig S10 of [7]). We estimated the rate at which this type of error occurs, and sequences bearing both tags were excluded from contributing to the results.

The second dual-barcoded virus library was prepared identically to the first viral library as described above. The second library contains synonymous variants near the 5’ and 3’ ends of each viral segment. Plasmid maps are available at https://github.com/jbloomlab/barcoded_flu_pdmH1N1/tree/main/data/flu_sequences/plasmid_maps.

Infecting cells with a dual-barcoded virus library at low MOI

To infect cells at low MOI, 1x10⁴ MDCK-SIAT1-TMPRSS2 [20] cells were suspended in D10 medium and plated in a well of 24-well plate. After 5 hours, cells were observed by microscopy and were confirmed to be well-attached. The medium was aspirated and 1500 transcriptionally active units (measured by single-cell RNA sequencing) of dual-barcoded virus library in 100 µl of Influenza Growth Medium was added to the well. The cells were incubated with virus for 1 hour, and the plate was rocked by hand every 15 minutes. After 1 hour, the inoculum was removed and the cells were washed once with 250 µl of phosphate-buffered saline. 500 µl of Influenza Growth Medium supplemented with 20 mM ammonium chloride (to prevent further entry of virions into cells [21, 22]) was added to the well.

At 12 hours post-infection, the supernatant was collected and cells and debris were removed by centrifugation at 300 RCF for 3 min. The supernatant was split into 2 aliquots of 220 µl each and frozen at -80°C. The cells were collected by addition of 100 µl trypsin and a single-cell suspension was generated. The trypsin digestion was stopped by addition of 400 µl of D10 medium. The cells were washed 3 times with phosphate-buffered saline supplemented with 0.8% by volume non-acetylated bovine serum albumin. The cells were counted to confirm that approximately 10,000 cells were present per well.

Infecting cells with a dual-barcoded virus library at high MOI

High MOI infections were performed similarly to the low MOI infections described above. The following changes were employed: 6.7x10³ MDCK-SIAT1-TMPRSS2 cells [20] were plated and the cells were infected with 4000 transcriptionally active units (measured by single-cell RNA sequencing) of dual-barcoded virus library.

Single-cell RNA sequencing

Infected cells were prepared and mixed with a second control sample of infected cells to control for technical sources of error (see “Estimating the rate of infected cell multiplets and chimeric PCR products using a second control virus library” above). Approximately 20,000 cells (low MOI) or 13,000 cells (high MOI) were loaded into the 10X Chromium device. Single-cell RNA sequencing was performed with the 10X Chromium Next GEM Single Cell 3’ GEM, Library & Gel Bead Kit v3.1. The manufacturer’s standard protocol [50] was used with the following modifications. The template-switching oligo was replaced with a modified single-stranded DNA oligo with the sequence 5’-AGAGTGTTTGGGTAGAGCAGCGTGTTGGCATGTrGrGrG-3’ at a final concentration of 45 µM in the reaction mix. This change was made to accommodate some of the barcoded influenza segments’ exogenous sequence which shares homology with the standard 10X template-switching oligo. The cDNA amplification primer mix was replaced with a pair of primers with the sequences 5’-AGAGTGTTTGGGTAGAGCAGCG-3’ (binding to the custom template-switch oligo mentioned above) and 5’-CTACACGACGCTCTTCCGATCT-3’ (binding to the standard 10X adapter sequence) at a final concentration of 1 µM in the reaction mix. The cDNA amplification PCR reaction extension time was increased to 20 seconds to encourage the formation of full-length cDNA products. The amplified cDNA product was split in half. One half was used for fragmentation and preparation of the transcriptome sequencing library while, for the low MOI sample, the other half was used as template for long-read sequencing of viral transcripts.

Viral long-read sequencing to reconstruct infecting viral genomes

We determined the sequence of the virion that infected each cell for the low MOI sample. Because the cells were infected at a low MOI, infection was initiated by one virion in the large majority of infected cells. To capture these sequences, we selectively enriched viral cDNA molecules using a method described previously [7]. In brief, cDNA derived from the 10x Genomics protocol was first amplified in a semi-specific PCR reaction. Each segment was amplified with a primer annealing to the universal TruSeq primer site that is added to all cDNA molecules during the reverse transcription step of 10x Genomics protocol and a segment-specific primer annealing to 5’ end of the viral mRNA, which also contains a flanking sequence that is complementary to the TrueSeq primer site (Table S1). Semi-specific PCR reaction conditions were as follows: 12 ng cDNA, 0.5 µM of forward and reverse primer, 10 µl of KOD (EMD Millipore, 71842), 0.1 mg/ml BSA, and final volume adjusted to 20 µl with water. PCR was incubated for 120 s at 95°C followed by 10 cycles of 120 s at 95°C, 20 s at 55°C, 90 s at 70°C, and the final extension step at 70°C for 120 s. Semi-specific PCR reactions were purified using AMPure XP beads at 1.8x beads to sample ratio and eluted in 12 µl of water. Following purification, PCR products were circularized via complementary TrueSeq sequence. For circularization, 10 µl of purified PCR product was used in a 20 µl HiFi assembly reaction (NEB, E2621S). HiFi assembly was performed at 50°C for 1 hour.

Next, HiFi products were used in segment-specific PCR reactions. To amplify viral products of all lengths, primers that anneal to the ends of viral mRNA were used; to preferentially amplify full-length viral segments, primers that anneal to the middle of each viral segment were used (Table S2).

Segment-specific PCR conditions were as follows: 9 µl of Hifi reaction, 0.5 µM of forward and reverse primer, 25 µl of KOD, and the final PCR reaction volume adjusted to 50 µl with water. PCR was incubated for 120 s at 95°C followed by cycling 120 s at 95°C, 20 s at 55°C and 90 s at 70°C with a final extension step of at 70°C for 120 s. Cycles were kept to a minimum to reduce strand exchange; since different segments required different yield, different numbers of cycles were employed each segment-specific reaction. For the polymerase segments, 14 cycles of segment-specific PCR were performed;for the HA, NA and NP segments, 10 cycles were performed;, for the M and NS segments, 7 cycles were performed. PCR reactions were purified using AMPure XP beads at 1.8x beads to sample ratio and eluted in 12 µl of water. All purified PCR products were pooled together and long-read sequencing was performed on a PacBio Sequel II.

We generated CCS sequences of each viral transcript using PacBio long-read sequencing. We measured the rate of strand exchange that occurred during sequencing library preparation (see Fig S10 of [7]), and found that fewer than 1% of sequences were affected, providing high confidence that the sequences we obtained could be assigned to their cell of origin. We generated a consensus sequence for each viral genome (see “Computational analysis of single-cell RNA sequencing, long-read virus sequencing, and progeny production viral barcode data” below) . We counted the number of non-synonymous mutations found in each consensus genome; deletions were considered non-synonymous mutations for this purpose.

Quantifying progeny production

The amount of progeny produced by single infected cells was determined by sequencing the viral barcodes on vRNA molecules. To quantify physical progeny virions, we sequenced the vRNA in the viral supernatant at 12 hours post infection. For the low MOI sample, to quantify infectious progeny virions, we infected a second set of cells to select for virions that could perform viral entry and genome replication (Fig 1A) and sequenced the intracellular vRNA molecules at 13 hours post infection.

In detail, we thawed frozen viral supernatants that were collected at 12 hours post infection and split them into four equal volumes. Two volumes were used to isolate supernatant RNA directly. For the low MOI sample, the other two volumes were used to infect MDCK-SIAT1-TMPRSS2 cells [20] at a moderate estimated MOI of ∼0.25 in two independent replicates. To infect the cells, 60,000 MDCK-SIAT1-TMPRSS2 cells [20] were suspended in D10 medium and plated in a well of 6-well plate. After 7 hours, cells were observed by microscopy and were confirmed to be well-attached. The medium was aspirated and an aliquot of supernatant with an estimated 15,000 TCID50 units was added to the well in 500 µl of Influenza Growth Medium. The cells were incubated with virus for 1 hour, and the plate was rocked by hand every 15 minutes. After 1 hour, the inoculum was removed and the cells were washed once with 500 µl of phosphate-buffered saline. 1600 µl of Influenza Growth Medium supplemented with 20 mM ammonium chloride (to prevent further entry of virions into cells [21, 22] was added to the well. At 13 hours post infection, the cells were collected by aspirating the growth medium and incubating with 300 µl trypsin to detach them from the plate. Trypsin digestion was stopped by the addition of 700 µl of D10 medium. The cells were pelleted by centrifugation at 400 RCF for 3 min. The cell pellet was washed by resuspending in 1 ml of phosphate-buffered saline and pelleting at 400 RCF for 3 min again. The phosphate-buffered saline was aspirated and the cell pellet was flash-frozen on dry ice.

RNA was isolated from the viral supernatant or infected cell pellets using the RNeasy Mini Kit (Qiagen, 74104). Lysis buffer was mixed with the viral supernatant sample and 70% ethanol was added . For the infected cell pellets, the sample was mixed with lysis buffer and homogenized by vortexing at high speed for 20 seconds. The homogenized sample was processed on a gDNA eliminator spin column to remove genomic DNA. The processed sample was combined with 70% ethanol. From this point, both the viral supernatant and infected cell pellets were treated identically and followed the standard RNA purification protocol specified by the manufacturer [51]. The RNA for each sample was eluted in 50 µl of RNase-free water.

Reverse transcription was performed with a segment-specific primer targeted to the HA or NA vRNA (Table S3). Two replicate reactions were performed using RNA from the viral supernatant sample, and two independent reactions were performed using RNA from the two infected cell pellets; these replicates provide technical duplicate measurements of both the physical progeny in the supernatant and the infectious progeny in the cell pellets.

Reverse transcription was performed using the SuperScript III First-Strand Synthesis SuperMix kit according to the manufacturer protocol [52]. For the viral supernatant samples, 12 µl of each RNA sample was used as template for each 40 µl reaction. For the infected cell pellet samples which contain much larger amounts of total RNA due to the host RNA present in the cell, 1000 ng of RNA was used as template for each 40 µl reaction. The low-concentration cDNA generated from the viral supernatant samples was purified and concentrated using 2X Ampure SPRI beads and eluted into 22 µl of elution buffer.

Viral barcodes were amplified in 50 µl PCR reactions using KOD Hot-Start Master Mix (Sigma-Aldrich, 71842). For the viral supernatant samples, 22 µl of concentrated cDNA was used as template. For the high-concentration infected cell pellet samples, 10 µl of unpurified cDNA was used as template. Segment-specific primers (Table S3) were used and reactions were run for 20 cycles. Amplicons were size-selected and purified using a double-sided AmpureXP bead cleanup. Samples were first combined with 0.8X AmpureXP beads and the supernatant was collected. The supernatant was then combined with 1.8X AmpureXP beads and the bound DNA was collected.

Sequencing indices and adapters were attached in a 50 µl PCR reaction using KOD Hot-Start Master Mix. For all samples, 2 ng of purified amplicon DNA was used as template. Sample-specific index primers (Table S3) were used and reactions were run for 20 cycles. The resulting amplicons were gel-purified and pooled for single-end sequencing on an Illumina MiSeq. The progeny contribution of each cell was calculated (see “Computational analysis of single-cell RNA sequencing, long-read virus sequencing, and progeny production viral barcode data” below).

Computational analysis of single-cell RNA sequencing, long-read virus sequencing, and progeny production viral barcode data

A reproducible pipeline that performs all analysis is at https://github.com/jbloomlab/barcoded_flu_pdmH1N1. The pipeline uses Snakemake [53]. The pipeline begins with raw sequencing data and ends by generating the figures shown in this manuscript. Most code in the pipeline is arranged in Jupyter notebooks (https://jupyter.org).

Briefly, the raw sequencing data from the single-cell RNA sequencing was aligned using STARsolo [54] against a composite reference made up of the canine genome CanFam3.1.98 concatenated to the A/California/04/2009 influenza virus genome. Alignment produced a cell-gene matrix containing the gene expression of every canine and virus gene for each single cell. Custom Python code was used to parse the “genetic tag” encoded on viral transcripts which differentiates our library of interest from a second control library. The multiplet rate was calculated and only transcriptomes from our library of interest were used for analysis. Transcriptomes from the second control library and transcriptomes composed of multiple infected cells were excluded.

The total viral gene expression was calculated for each infected cell. Because the amount of viral transcripts in the ambient environment during single-cell RNA sequencing [55] varied by MOI, different thresholds were used to call cells as infected at each MOI. For the low MOI sample, cells were called as infected if at least 1% of their transcripts came from virus. For the high MOI sample, cells were called as infected if at least 2.5% of their transcripts came from virus. These thresholds provided clear separation of an infected and uninfected population. Individual viral genes were called as expressed if their frequency was greater than the 99th percentile observed in uninfected cells (see Fig S3).

For the statistical test in Fig 5C, we classified each cell as expressing all viral genes, missing the NS gene, or missing another influenza gene. Because the data are not normally distributed, we performed a non-parametric permutation test, randomly labeling each observed value with a classification. We performed 5000 permutations and calculated the p-value as the frequency with which the randomly generated difference between classified groups matched or exceeded the observed difference between groups. The results indicated that cells missing the NS gene have a statistically significant difference in viral transcription compared to cells that express all viral genes (p < 0.01) under both low MOI and high MOI conditions.

For the low MOI sample, the raw PacBio sequencing data was processed using PacBio’s ccs program (https://github.com/PacificBiosciences/ccs). Consensus sequences were generated from the subread files, requiring a minimum accuracy (‘rq’) of 0.99 for the consensus sequence. The chimera rate was estimated using the “genetic tags”. The cell barcode and UMI were parsed from each CCS using custom Python code that utilized the alignparse package [56]. A consensus sequence was called for each cell barcode-viral gene-UMI combination. A mutation was included in the consensus sequence if it was found in >50% of the CCS for the cell barcode-viral gene-UMI combination. A consensus sequence was then called for each cell barcode-viral gene combination. A mutation was included in the consensus sequence if it was found in >50% of the UMIs for the cell barcode-viral gene and was found in at least two UMIs.

To parse the viral barcodes sequenced from the supernatant (representing physical progeny) and from the second infection (representing infectious progeny), we used custom Python code that utilized the dms_variants package (https://jbloomlab.github.io/dms_variants/). The viral barcodes were error-corrected using UMI-tools [57]. The technical replicates for each sample were plotted against each other and the limit of detection was set at 1e-5, where viral barcode frequencies fail to correlate (Fig S8), indicating bottlenecked subsampling of the molecules carrying the viral barcodes, and suggesting that frequency measurements below this threshold are not reliable; values below the limit of detection were set to the limit of detection. The mean frequency of both replicates was calculated. A subset of infected cells expressing both barcoded viral genes and with complete long-read sequencing data was used to calculate progeny contributions. To determine the fraction of progeny contributed by each infected cell in this set, we took the geometric mean of the HA and NA barcode frequencies associated with each cell. We normalized the progeny contributions by the total frequencies assignable to any cell in this set. The data were visualized in Jupyter notebook (https://github.com/jbloomlab/barcoded_flu_pdmH1N1/blob/main/final_analysis.py.ipynb). We used custom Python code utilizing a combination of plotnine (https://github.com/has2k1/plotnine) and altair (https://github.com/altair-viz/altair). An R script utilizing gggenes (https://github.com/wilkox/gggenes) was used to plot the complete viral genomes of infected cells. The figures generated by this notebook are displayed in this manuscript.

Data availability

All data and code are available in the GitHub repository at https://github.com/jbloomlab/barcoded_flu_pdmH1N1. The analysis can be reproduced by running the Snakemake pipeline and final analysis notebook according to the instructions at https://github.com/jbloomlab/barcoded_flu_pdmH1N1/blob/main/README.md.

Key output files are hosted at the following locations. All raw sequencing files are available on GEO under the accession number GSE214938. The single-cell RNA sequencing cell-gene matrix is also available on GEO under the accession number GSE214938.. An integrated CSV produced by the Snakemake pipeline with cell barcodes, viral gene expression, viral genome sequence, and viral barcode frequencies is available at https://github.com/jbloomlab/barcoded_flu_pdmH1N1/blob/main/results/viral_fastq10x/all_samples.csv. The final CSV file with progeny contribution measurements, viral gene expression, and viral mutations (when available) for the infected cells with complete measurements is available at https://github.com/jbloomlab/barcoded_flu_pdmH1N1/blob/revisions/results/viral_fastq10x/all_samples_complete_measurements_cells_data.csv.

Significance of findings

Strength of evidence

Abstract

Introduction

Results

Viral barcoding to measure transcription, progeny production, and viral genotype in single cells

Strategy to measure transcription, progeny production, and viral genotype in single cells.

Creation of dual-barcoded virus library

We recapitulate prior findings that viral transcription is extremely heterogeneous across single infected cells

Viral transcription is extremely heterogeneous across single infected cells, and some cells fail to express some viral genes.

Full genome sequences of the influenza virions infecting individual cells at low MOI

Consensus viral genome sequences from single infected cells with long-read viral sequencing data.

Progeny production from single influenza-infected cells is more heterogeneous than viral transcription

Viral progeny production is more heterogeneous than viral transcription across single infected cells.

Cells that transcribe more viral mRNA do not produce more progeny, and many high-transcribing cells represent aberrant infections that fail to express the NS gene

Relationship between viral transcription and progeny production in single infected cells.

Viral gene absence and viral mutations only explain a fraction of heterogeneity observed in progeny production.

Discussion

Acknowledgements

Competing interests

Methods

Engineering barcodes in the influenza virus HA and NA genes

Cloning barcoded plasmid libraries

Generating a dual-barcoded virus library

Estimating the rate of infected cell multiplets and chimeric PCR products using a second control virus library

Infecting cells with a dual-barcoded virus library at low MOI

Infecting cells with a dual-barcoded virus library at high MOI

Single-cell RNA sequencing

Viral long-read sequencing to reconstruct infecting viral genomes

Quantifying progeny production

Computational analysis of single-cell RNA sequencing, long-read virus sequencing, and progeny production viral barcode data

Data availability

Viral barcode sequences are selectively neutral.

Extremely diverse barcoded virus libraries.

Expression of viral genes in infected cells.

Number of cells with progeny measurements and viral genome sequencing.

Viral genotypes in cells infected at low MOI.

Cumulative fraction of viral products produced by single infected cells.

Frequency of physical progeny and infectious progeny from single infected cells infected at low MOI.

Technical replicates of progeny measurements.

Semi-specific primers for amplification of influenza transcripts from full-length cDNA library.

Segment-specific primers for amplification of influenza transcripts.

Primers for the reverse transcription, amplification, and indexing of viral barcode sequencing samples.

References

Article and author information

Author information

David J. Bacsik

Bernadeta Dadonaite

Andrew Butler

Allison J. Greaney

Nicholas S. Heaton

Jesse D. Bloom

Version history

Cite all versions

Copyright

Metrics