Strategy to measure transcription, progeny production, and viral genotype in single cells.

(A) Insertion of barcodes in the viral genome makes it possible to quantify the progeny released from single cells, and relate progeny production to viral transcription and viral genotype. (B) Barcodes were inserted near the 3’ end of the mRNA sequence between the stop codon and the polyA site, using a duplicated packaging signal scheme to avoid disrupting viral genome packaging. (C) Viruses with one or two barcoded segments grew to titers similar to those of viruses with unmodified genomes. The titers shown were measured after generating the viruses by transfection.

Viral transcription is extremely heterogeneous across single infected cells, and some cells fail to express some viral genes.

This plot shows single-cell RNA-sequencing data for the 254 cells that were infected at low MOI and 357 cells that were infected at high MOI. (A) Viral transcription in infected cells is extremely heterogeneous, with viral mRNA composing <1% of total mRNA in some cells, but >80% in others. (B) The number of viral genes detected in each infected cell. More than half of infected cells express mRNA from all 8 viral segments at both low MOI and high MOI. (C) The fraction of infected cells expressing each viral gene.

Consensus viral genome sequences from single infected cells with long-read viral sequencing data.

Viral genomes were reconstructed for cells infected at low MOI. (A) The number of single infected cells expressing all eight viral genes without non-synonymous mutations, expressing all eight viral genes with one or more non-synonymous mutation(s), missing one or more viral gene(s), or with both mutated and missing genes. (B) The number of non-synonymous mutations in each viral genome. Deletions are counted as non-synonymous mutations in these counts. This plot shows only the 131 of 254 single infected cells for which we could determine the sequence of all genes expressed by the infecting virion. See Fig S3A for details on properties of infected cells for which we could obtain full viral sequences, and Fig S4 for the full set of viral mutations in each infected cell.

Viral progeny production is more heterogeneous than viral transcription across single infected cells.

Heterogeneity across single infected cells in (A) physical progeny production, (B) infectious progeny production, and (C) viral transcription. The Gini coefficient [30] quantifying the extent of cell-to-cell variability is indicated on each panel; a larger Gini coefficient indicates a more uneven distribution. For (A) and (B) the x-axis is the fraction of viral barcodes associated with each cell among all barcodes assignable to any infected cell; for (C) the x-axis is the fraction of mRNA in each cell that is derived from virus. The outset bar on the left shows the number of cells that produced no detectable progeny. This plot shows only single infected cells with complete measurements (see Fig S4). For cells infected at low MOI, this is cells that express both barcoded genes and for which we could determine the sequence of all genes expressed by the infecting virion. For cells infected at high MOI, this is cells that express both barcoded genes.
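As an illustration of the statistic shown on each panel, the following is a minimal sketch of a Gini coefficient calculation. It uses a common rank-weighted formula; the exact estimator used for the figure is not stated in this legend, so the function name `gini` and the choice of formula are assumptions.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted formula on ascending-sorted values (ranks i = 1..n):
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical per-cell fractions: one cell dominates in the second list.
even = [0.25, 0.25, 0.25, 0.25]
uneven = [0.0, 0.0, 0.0, 1.0]
print(gini(even), gini(uneven))
```

With this formula the maximum value for n cells is (n - 1) / n, reached when a single cell accounts for everything.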

Relationship between viral transcription and progeny production in single infected cells.

Relationship between viral transcription and progeny virion production. Each point is a different cell. (A) For cells infected at low MOI, both physical and infectious progeny were quantified and cells are colored according to whether the cell expresses unmutated copies of all eight genes, all genes with one or more non-synonymous mutations, or fewer than all genes (with or without mutations). (B) For cells infected at high MOI, only physical progeny were quantified and cells are colored according to whether they express all eight viral genes or are missing one or more viral genes. Circular points indicate cells that express the NS gene and triangular points indicate cells that do not express the NS gene. An interactive version of this figure that enables mouse-overs of points with details about individual cells is at https://jbloomlab.github.io/barcoded_flu_pdmH1N1. (C) Total viral transcription is plotted for each cell. The mean for each group is shown as a blue line. Cells that do not express the NS gene transcribe significantly more viral mRNA than cells expressing all viral genes (statistical significance determined by permutation test with 5000 random simulations; p=0.0002 for cells infected at low MOI and p=0.004 for cells infected at high MOI). (D) Like panel (C), but for physical progeny production. Cells that fail to express any one of the viral genes, including NS, produce little or no physical progeny.
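The permutation test mentioned for panel (C) can be sketched as follows. This is an illustrative version only: the test statistic (difference in group means), the one-sided direction, and the add-one correction to the p-value are assumptions, and `permutation_test` is a hypothetical helper rather than the authors' code.

```python
import random


def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """One-sided p-value for mean(group_a) > mean(group_b)."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            hits += 1
    # Add-one correction (an assumption here) keeps the p-value above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical viral mRNA fractions for NS-absent and all-gene cells.
ns_absent = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74]
all_genes = [0.12, 0.30, 0.25, 0.08, 0.41, 0.19]
print(permutation_test(ns_absent, all_genes))
```

With 5000 simulations, the smallest p-value that an uncorrected estimate can resolve is 1/5000 = 0.0002.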

Viral gene absence and viral mutations only explain a fraction of heterogeneity observed in progeny production.

(A) Distribution of physical progeny virions. For cells infected at low MOI, compare heterogeneity in all infected cells (left, dark grey) with cells that express all eight viral genes without non-synonymous mutations (right, blue). For cells infected at high MOI, compare heterogeneity in all infected cells (left, dark grey) with cells that express all eight viral genes (right, light grey). The outset bar on the left shows the number of cells that produced no detectable progeny. (B) Like panel (A), but for infectious progeny rather than physical progeny at low MOI. The plots showing all infected cells are duplicated from Fig 4 to facilitate direct comparison of all cells to those with complete unmutated genomes. For cells infected at low MOI, this figure shows the 91 infected cells for which we could identify the viral barcode on both barcoded genes and determine the sequence of all genes expressed by the infecting virion. For cells infected at high MOI, this figure shows the 290 infected cells for which we could identify the viral barcode on both barcoded genes.

Viral barcode sequences are selectively neutral.

Influenza virus carrying a pool of HA and NA barcodes was generated by reverse genetics and passaged 3 times at low MOI. (A) The titers were measured after each growth step by TCID50. (B) The frequency of each barcode in the viral population was measured by deep sequencing after each passage. Each color represents a unique viral barcode. The frequencies of viral barcodes were fairly consistent across passages, indicating a lack of selection for any particular barcode sequence. The viral barcode frequencies were calculated using the code at https://github.com/dbacsik/barcode_neutrality.

Extremely diverse barcoded virus libraries.

Rarefaction curves show the diversity of the viral barcodes. The x-axis represents the number of barcodes sampled. The y-axis represents the number of sampled barcodes that are unique. A hypothetical perfect library where every barcode is unique appears as a straight line with formula x=y and is shown here with a blue dashed line. Our low MOI experiments used approximately 1500 virions per sample. The number of unique barcodes in a sample of 1500 is annotated in black in each facet. Our high MOI experiments used approximately 4000 virions per sample. The number of unique barcodes in a sample of 4000 is annotated in red in each facet. The rarefaction curves were calculated using https://jbloomlab.github.io/dms_variants/dms_variants.barcodes.html?highlight=rarefybarcodes#dms_variants.barcodes.rarefyBarcodes.
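To make the y-axis concrete, the expected number of unique barcodes in a sample drawn without replacement can be computed analytically. The sketch below is an independent illustration using only the standard library; it is not the dms_variants implementation linked above, and `expected_unique` and the example counts are hypothetical.

```python
from math import comb


def expected_unique(counts, n_sampled):
    """Expected number of distinct barcodes among n_sampled molecules.

    counts[i] is the number of molecules carrying barcode i; sampling is
    without replacement from the pooled molecules.
    """
    total = sum(counts)
    if not 0 <= n_sampled <= total:
        raise ValueError("n_sampled must be between 0 and the total count")
    denom = comb(total, n_sampled)
    # A barcode with c copies is missed with probability
    # C(total - c, n_sampled) / C(total, n_sampled).
    return sum(1 - comb(total - c, n_sampled) / denom for c in counts)


# Hypothetical library: 5000 singleton barcodes plus 50 barcodes seen 20 times.
library = [1] * 5000 + [20] * 50
for n in (1500, 4000):
    print(n, round(expected_unique(library, n), 1))
```

In a perfect library where every barcode occurs once, `expected_unique` returns exactly `n_sampled`, which corresponds to the dashed diagonal line in the plot.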

Expression of viral genes in infected cells.

This plot shows single-cell RNA-sequencing data for the 254 cells infected at low MOI and the 357 cells infected at high MOI. Total viral transcription and expression of each viral gene in single infected cells. Genes with low average transcript counts in the single-cell RNA sequencing data (PB2, PB1, and PA) are called as absent if there are zero transcripts detected in a cell. Genes with higher average transcript counts in these data (HA, NP, NA, M, and NS) are called as absent if their abundance falls at or below the 99th percentile observed in uninfected cells. Low, non-zero transcript counts for these genes most likely result from transcripts leaking from one oil droplet to another during single-cell RNA sequencing [55].
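The two calling rules described above can be sketched as a small function. The nearest-rank percentile definition, the abundance units, and the names `percentile` and `gene_is_absent` are assumptions made for illustration; the legend does not specify these details.

```python
LOW_COUNT_GENES = {"PB2", "PB1", "PA"}


def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100 (an assumed definition)."""
    xs = sorted(values)
    # Integer ceiling of q * n / 100 avoids floating-point rounding issues.
    rank = max(1, (q * len(xs) + 99) // 100)
    return xs[rank - 1]


def gene_is_absent(gene, abundance, uninfected_abundances):
    """Apply the absence rule for one gene in one infected cell."""
    if gene in LOW_COUNT_GENES:
        # Low-count genes: absent only when no transcripts are detected.
        return abundance == 0
    # Higher-count genes: absent at or below the 99th percentile of
    # the same gene's abundance in uninfected cells.
    return abundance <= percentile(uninfected_abundances, 99)


# Hypothetical background: abundances 0..99 observed in 100 uninfected cells.
background = list(range(100))
print(gene_is_absent("HA", 98, background), gene_is_absent("HA", 99, background))
```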

Number of cells with progeny measurements and viral genome sequencing.

Each point indicates a cell, and blue lines indicate the mean. (A) In the low MOI experiment, 254 infected cells were identified by single-cell RNA sequencing. Viral transcription was similar in cells that expressed both barcoded viral genes and cells that were missing expression of one or both. Viral genomes in the low MOI experiment were analyzed using long-read viral genome sequencing. 131 infected cells had complete PacBio long-read sequencing data for every expressed viral gene. On average, cells with complete sequencing coverage had higher viral transcription than cells without complete sequencing coverage. 91 infected cells had long-read sequencing of all expressed viral genes and expressed both barcoded viral genes. On average, cells with all measurements had slightly higher viral transcription than cells without all measurements. (B) In the high MOI experiment, 357 infected cells were identified. 290 cells expressed both barcoded genes, providing complete measurements for that sample. Cells expressing both barcoded genes had higher viral transcription, on average.

Viral genotypes in cells infected at low MOI.

This plot shows the sequence of the infecting virion for the 131 cells infected at low MOI for which we could determine the sequence of all expressed viral genes. Each infected cell is represented as a row and each viral transcript is represented as an arrow. Missing viral genes, insertions, deletions, and mutations are annotated on the arrows. Viral transcription (as a fraction of UMIs in the cell) and viral progeny production (as a fraction of the physical progeny virions in the supernatant) are shown for each infected cell. Cells with one or more missing barcoded viral genes have “NA” values listed for progeny production. A high-resolution version of this figure is available at https://github.com/jbloomlab/barcoded_flu_pdmH1N1/blob/main/results/figures/viral_genomes_plot.pdf.

Cumulative fraction of viral products produced by single infected cells.

For the viral mRNA values, the y-axis represents each cell’s contribution to the total viral mRNA transcripts across all cells. For the progeny values, the y-axis represents each cell’s contribution to the barcodes in the supernatant or second infection that are assignable to one of the infected cells. A horizontal line is drawn at y=0.5 to indicate the minimum number of cells that generated half of the total amount of each viral product. This plot shows the single infected cells for which we obtained complete measurements. For the low MOI experiment, this was 91 infected cells for which we could identify the viral barcode on both barcoded genes and determine the sequence of all genes expressed by the infecting virion. For the high MOI experiment, this was 290 infected cells expressing both barcoded viral genes.
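The quantity marked by the horizontal line, the minimum number of cells that together account for half of a viral product, can be computed by ranking cells from largest to smallest contribution. This is a minimal sketch; `cells_for_fraction` and the example values are hypothetical.

```python
def cells_for_fraction(contributions, target=0.5):
    """Smallest number of cells whose summed contribution reaches target."""
    total = sum(contributions)
    if total <= 0:
        raise ValueError("contributions must sum to a positive value")
    running = 0
    # Rank cells from the largest contributor to the smallest.
    for n_cells, value in enumerate(sorted(contributions, reverse=True), start=1):
        running += value
        if running >= target * total:
            return n_cells
    return len(contributions)


# Hypothetical progeny barcode counts for seven cells.
progeny = [400, 250, 120, 90, 80, 40, 20]
print(cells_for_fraction(progeny))
```

A smaller result relative to the number of cells indicates a more uneven distribution, consistent with a larger Gini coefficient.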

Frequency of physical progeny and infectious progeny from single infected cells infected at low MOI.

Each point represents a single infected cell infected at low MOI. The x-axis represents the fraction of physical progeny generated by each cell. The y-axis represents the fraction of infectious progeny generated by each cell. This plot shows the 91 cells infected at low MOI for which we could identify the viral barcode on both barcoded genes and determine the sequence of all genes expressed by the infecting virion.

Technical replicates of progeny measurements.

This plot shows the frequency of each viral barcode as measured in two technical replicates. Imperfect correlations indicate that bottlenecking in recovery of molecules contributes noise to the measurements. As described in Fig 1A, physical progeny were measured by sequencing viral barcodes in RNA extracted from viral supernatants, and infectious progeny were measured by sequencing viral barcodes in RNA extracted from cells infected with progeny virus supernatants. For physical progeny, technical replicates represent independent reverse-transcription reactions. For infectious progeny, independent replicate infections were performed before reverse transcription. Each point represents the fraction of viral barcodes that could be associated with that cell. The limit of detection (dashed blue line) is set as the point below which replicate frequency correlations become much worse, suggesting severe bottlenecking. To calculate the progeny contribution, we averaged the two replicates and then took the geometric mean of the values for HA and NA for each cell.
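The final calculation described above, averaging the two replicates and then taking the geometric mean of the HA and NA values, can be sketched as follows. The function name and example frequencies are hypothetical, and the sketch does not apply the limit of detection, whose handling is not specified in this legend.

```python
from math import sqrt


def progeny_contribution(ha_replicates, na_replicates):
    """Geometric mean of the replicate-averaged HA and NA barcode fractions."""
    ha_mean = sum(ha_replicates) / len(ha_replicates)
    na_mean = sum(na_replicates) / len(na_replicates)
    # Geometric mean of two values is the square root of their product.
    return sqrt(ha_mean * na_mean)


# Hypothetical barcode fractions for one cell in two technical replicates.
print(progeny_contribution(ha_replicates=(0.01, 0.03), na_replicates=(0.08, 0.08)))
```

One consequence of using a geometric mean is that a cell with a zero value for either segment receives a contribution of zero.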

Semi-specific primers for amplification of influenza transcripts from full-length cDNA library.

These primers were used to amplify each influenza gene from the full-length single-cell RNA sequencing library. For each reaction, a segment-specific primer was paired with the “Read1_TruSeq” primer, which binds to the Illumina sequencing primer found on all transcripts in the library.

Segment-specific primers for amplification of influenza transcripts.

These primers were used to amplify the influenza genes from circularized templates. For each gene, two reactions were performed. One set of reactions targeted templates without large deletions and bound near the middle of the open reading frame; these reactions utilized the primers with “mid” in their name. The other reactions targeted all templates (with and without deletions) and bound near the ends of the open reading frame; these reactions utilized the primers with “end” in their name.

Primers for the reverse transcription, amplification, and indexing of viral barcode sequencing samples.

Binding sequences are shown in uppercase and overhangs are shown in lowercase. Primers with “RT” in their name were used to reverse transcribe the HA or NA viral RNA in supernatants or infected cells. The viral barcodes were amplified from the cDNA using an HA-specific or NA-specific primer (primers with “PCR” in their name) paired with a primer that binds exogenous sequence embedded in the barcoded segments (“PCR_Universal_R”). The samples were prepared for pooling and Illumina sequencing by attaching a sample index (“SampleIndexXX_F”) and sequencing adapters (“Adapter_Universal_R”) in a final PCR.