Figures and data

Overview of the procedure for measuring the genomic diversity of an influenza virus population propagated from a single particle.
(a) MDCK cells were infected with influenza virus at 0.05 PFU/well. After 3–4 days, supernatants were collected from wells showing cytopathic effect, and viral RNA was extracted.
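Infecting at 0.05 PFU/well is a limiting dilution. The sketch below assumes, as an illustration not stated in the legend, that infectious units are Poisson-distributed across wells. Under that assumption only about 5% of wells receive any infectious particle, and about 97.5% of infected wells received exactly one, which is why a well showing cytopathic effect is expected to hold a population derived from a single particle. The function name `single_hit_fraction` is hypothetical.

```python
import math


def single_hit_fraction(moi: float) -> tuple[float, float]:
    """Return (P(well infected), P(exactly one particle | well infected)).

    Assumes the number of infectious units per well is Poisson(moi).
    """
    p_zero = math.exp(-moi)        # P(0 particles in a well)
    p_one = moi * math.exp(-moi)   # P(exactly 1 particle in a well)
    p_infected = 1.0 - p_zero      # P(at least 1 particle)
    return p_infected, p_one / p_infected


p_inf, p_single = single_hit_fraction(0.05)
print(f"{p_inf:.4f} {p_single:.4f}")  # 0.0488 0.9752
```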
(b) From the 5’ end, the RT primer consisted of a 25-nucleotide PCR priming site, a 6-nucleotide sample ID, a 21-nucleotide UMI containing 15 random nucleotides, and a 12-nucleotide sequence complementary to the template. cDNA was synthesized by reverse transcription and then PCR-amplified with primers containing the PCR priming site sequence and a gene-specific sequence.
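The fixed primer layout means each segment can be recovered by position. The sketch below assumes a read that begins at the primer’s 5’ end. The legend does not say which 15 of the 21 UMI positions are random, so the whole 21-nt segment is kept as the UMI key. `PRIMER_LAYOUT` and `split_primer_region` are hypothetical names used for illustration.

```python
# Segment lengths (nt) of the RT primer, listed 5' -> 3' as described in panel (b).
PRIMER_LAYOUT = [
    ("pcr_site", 25),
    ("sample_id", 6),
    ("umi", 21),
    ("template_complement", 12),
]


def split_primer_region(seq: str) -> dict[str, str]:
    """Slice the primer-derived prefix of a read into its named segments."""
    total = sum(length for _, length in PRIMER_LAYOUT)  # 64 nt
    if len(seq) < total:
        raise ValueError(f"read shorter than primer region ({total} nt)")
    parts, start = {}, 0
    for name, length in PRIMER_LAYOUT:
        parts[name] = seq[start:start + length]
        start += length
    return parts


# 15 random positions allow 4**15 distinct UMI sequences.
print(4 ** 15)  # 1073741824
```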
(c) Sequences were grouped by UMI. For each UMI, the original RNA sequence was reconstructed by majority vote at each position among the reads sharing that UMI. The reconstructed RNA molecules were mapped to the reference genome, and the distribution of mutations was quantified.
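Grouping by UMI followed by a per-position majority vote can be sketched as follows. This is an illustration, not the authors’ pipeline: it assumes the reads in a group are already aligned and of equal length. Ties are broken by whichever base was seen first. `consensus_by_umi` is a hypothetical name.

```python
from collections import Counter, defaultdict


def consensus_by_umi(reads: list[tuple[str, str]]) -> dict[str, str]:
    """Group (umi, sequence) pairs by UMI and build a majority-vote consensus per UMI."""
    groups: dict[str, list[str]] = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    consensus = {}
    for umi, seqs in groups.items():
        # zip(*seqs) yields, for each position, the tuple of bases observed across the group
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus
```

A single read error in a group of three (for example `ACGA` among two `ACGT` reads) is outvoted. This is why larger groups are expected to give lower error rates.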

Error rates of UMI-ligated plasmid DNA filtered by group size
The plasmid, amplified by cloning in E. coli, was linearized by restriction enzyme digestion. A unique molecular identifier (UMI) was then ligated to each molecule, and the products were PCR-amplified and sequenced. Consensus sequences were built from reads sharing the same UMI, reconstructing the sequence of each original molecule.
(a) Error rate of the linearized plasmid as a function of the group-size filter threshold, where group size is the number of reads detected with the same UMI.
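Because the plasmid template is clonal, every mismatch between a consensus sequence and the plasmid reference can be counted as an error. That assumption ignores any mutations arising during propagation in E. coli. The sketch below keeps only molecules whose group size meets each threshold and reports mismatched bases per sequenced base. It assumes consensus sequences are aligned to the reference without indels. `error_rate_by_group_size` is a hypothetical name.

```python
def error_rate_by_group_size(molecules, reference, thresholds):
    """Return {threshold: error rate} over molecules with group size >= threshold.

    molecules: list of (consensus_sequence, group_size) pairs aligned to `reference`.
    """
    rates = {}
    for t in thresholds:
        kept = [seq for seq, size in molecules if size >= t]
        total = sum(len(seq) for seq in kept)
        errors = sum(a != b for seq in kept for a, b in zip(seq, reference))
        # NaN signals that no molecule survived the filter
        rates[t] = errors / total if total else float("nan")
    return rates
```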
(b) Error rate in homopolymer and non-homopolymer regions as a function of the group-size filter threshold.
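The legend does not define the minimum run length that counts as a homopolymer. The sketch below therefore exposes it as an assumed parameter, `min_run`, defaulting to 3. It marks reference positions inside runs of identical bases and splits the error count between the two region types. It takes consensus sequences that have already passed the group-size filter. Both function names are hypothetical.

```python
def homopolymer_mask(reference: str, min_run: int = 3) -> list[bool]:
    """Mark positions lying in runs of >= min_run identical bases (min_run is assumed)."""
    mask = [False] * len(reference)
    start = 0
    for i in range(1, len(reference) + 1):
        # A run ends at the end of the sequence or where the base changes
        if i == len(reference) or reference[i] != reference[start]:
            if i - start >= min_run:
                for j in range(start, i):
                    mask[j] = True
            start = i
    return mask


def split_error_rates(consensus_seqs, reference, min_run=3):
    """Error rate inside vs. outside homopolymer runs of the reference."""
    mask = homopolymer_mask(reference, min_run)
    counts = {True: [0, 0], False: [0, 0]}  # in_homopolymer -> [errors, bases]
    for seq in consensus_seqs:
        for pos, (obs, ref) in enumerate(zip(seq, reference)):
            counts[mask[pos]][0] += obs != ref
            counts[mask[pos]][1] += 1
    return {
        ("homopolymer" if k else "non_homopolymer"): (e / n if n else float("nan"))
        for k, (e, n) in counts.items()
    }
```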

Error rates in virus-extracted RNA and in vitro transcribed RNA
(a) Sequencing results for RNA extracted from a virus population propagated from a single particle and for RNA synthesized by in vitro transcription from a plasmid. For viral RNA, values are the mean mutation rate across four virus populations. For in vitro transcribed RNA, values are the mean of two experiments. Error bars indicate the standard deviation.
(b) Mutation rate at each nucleotide position within the coding region of the extracted viral RNA. The horizontal axis shows the nucleotide position within the coding region.
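A per-position mutation rate can be computed as the fraction of reconstructed molecules carrying a non-reference base at that position. The sketch assumes the molecules are aligned to the coding-region reference with no indels. Positions are 0-based, so add 1 when plotting against a 1-based axis. `per_position_mutation_rate` is a hypothetical name.

```python
def per_position_mutation_rate(consensus_seqs, reference):
    """Fraction of molecules with a non-reference base at each reference position."""
    n = len(consensus_seqs)
    if n == 0:
        raise ValueError("no molecules")
    mismatches = [0] * len(reference)
    for seq in consensus_seqs:
        for pos, (obs, ref) in enumerate(zip(seq, reference)):
            if obs != ref:
                mismatches[pos] += 1
    return [m / n for m in mismatches]
```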

Distance from the Poisson Distribution and Sequence Diversity in Virus-Extracted RNA and In Vitro Transcribed RNA
(a) Distribution of per-position mutation detection counts for the PB2 gene extracted from the Population1 virus and for the PB2-like RNA synthesized by in vitro transcription. The horizontal axis shows the number of times a mutation was detected at a position. The vertical axis shows the number of positions with that detection count. The fitted line is the Poisson distribution with the same mean as each dataset.
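The histogram and its Poisson fit can be sketched as follows, taking as input the number of times a mutation was detected at each position. The Poisson mean is set to the mean detection count per position. The expected number of positions with count k is then the number of positions multiplied by the Poisson probability of k. `count_histogram_with_poisson` is a hypothetical name.

```python
import math
from collections import Counter


def count_histogram_with_poisson(detections_per_position: list[int]):
    """Return {k: (observed positions, expected positions under Poisson)} for k = 0..max."""
    n_pos = len(detections_per_position)
    lam = sum(detections_per_position) / n_pos  # Poisson mean = mean detections per position
    observed = Counter(detections_per_position)
    table = {}
    for k in range(max(detections_per_position) + 1):
        expected = n_pos * math.exp(-lam) * lam ** k / math.factorial(k)
        table[k] = (observed.get(k, 0), expected)
    return table
```

An excess of positions at high detection counts relative to the fitted curve is the signature of overdispersion that panel (b) quantifies.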
(b) Jensen-Shannon divergence between the Poisson distribution and the detection-count distribution of each gene extracted from the virus, or of the PB2-like RNA synthesized by in vitro transcription. To compare genes under equal conditions, distributions were normalized to a sequencing coverage of 1000. The NS gene is not shown because no population had sequencing coverage above 1000. For the viral populations, bar height is the mean across four populations, and error bars indicate the standard deviation.
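The legend does not state the log base or how the Poisson reference is truncated. The sketch below therefore makes three assumptions. It uses base 2, which bounds the divergence to [0, 1]. It compares the empirical distribution of detection counts with a Poisson of the same mean, restricted to the observed range of counts and renormalized. It takes counts that are already normalized to coverage 1000. Function names are hypothetical. SciPy’s `scipy.spatial.distance.jensenshannon` returns the square root of this quantity, not the divergence itself.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    # Computed in log space to avoid overflow for large k
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def js_divergence_from_poisson(detections_per_position: list[int]) -> float:
    """Jensen-Shannon divergence (base 2) between the empirical distribution of
    detection counts and a same-mean Poisson truncated to the observed support."""
    n = len(detections_per_position)
    lam = sum(detections_per_position) / n
    support = range(max(detections_per_position) + 1)
    p = [detections_per_position.count(k) / n for k in support]
    q = [poisson_pmf(k, lam) for k in support]
    total_q = sum(q)
    q = [x / total_q for x in q]  # renormalize the truncated Poisson
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0 because y is a mixture
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```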
(c) Shannon entropy of each gene extracted from the virus and of the PB2-like RNA synthesized by in vitro transcription. To compare genes under equal conditions, 1000 reconstructed molecules were randomly subsampled and the Shannon entropy of the subsample was calculated. This was repeated 10 times, and the mean was taken as the Shannon entropy of that gene in that population. For the viral populations, bar height is the mean across four populations, and error bars indicate the standard deviation.
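The subsample-and-average procedure can be sketched as below. The legend does not define how a single entropy value is obtained for a gene. This sketch assumes the Shannon entropy (in bits) of the base distribution at each position, averaged over positions, with equal-length aligned sequences. The sampling uses a fixed seed for reproducibility. Function names and the `seed` parameter are hypothetical.

```python
import math
import random
from collections import Counter


def mean_positional_entropy(seqs: list[str]) -> float:
    """Shannon entropy (bits) of the base distribution at each position, averaged over positions."""
    length = len(seqs[0])
    total = 0.0
    for column in zip(*seqs):
        n = len(column)
        total -= sum((c / n) * math.log2(c / n) for c in Counter(column).values())
    return total / length


def subsampled_entropy(seqs, sample_size=1000, repeats=10, seed=0):
    """Mean entropy over `repeats` random subsamples of `sample_size` molecules (without replacement)."""
    rng = random.Random(seed)
    values = [mean_positional_entropy(rng.sample(seqs, sample_size)) for _ in range(repeats)]
    return sum(values) / repeats
```

Fixing the subsample size removes the dependence of entropy on sequencing depth, which would otherwise favor genes with higher coverage.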

Detection Rates of Variants Shared with a Related Strain and of Variants Matching the Latest Strain
(a) Of the three mutations that distinguish the strain used in this study (PR8) from a related strain (Alaska/1935), two were detected in the cultured populations of this study; their detection rates in each population are shown. The orange bars show the average mutation rate in non-homopolymer regions across all populations for PB1 and HA.
(b) Mutations detected in the cultured populations of this study that are among the mutations distinguishing PR8 from the latest circulating strain (CA01). “Homology” indicates the amino acid sequence similarity between PR8 and CA01.
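Identifying detected mutations that match another strain, whether Alaska/1935 in panel (a) or CA01 in panel (b), amounts to intersecting two sets. One set is the mutations found in culture. The other is the positions where the two strains differ, together with the other strain’s base. The sketch assumes PR8 and the other strain are already aligned to equal length with 0-based positions. It works at the nucleotide level, though the same logic applies to aligned amino acid sequences. `mutations_matching_strain` is a hypothetical name.

```python
def mutations_matching_strain(detected, reference, other_strain):
    """Return detected (position, alt_base) pairs whose alt base equals other_strain's base.

    `reference` (e.g. PR8) and `other_strain` (e.g. CA01) must be pre-aligned, equal-length sequences.
    """
    # Positions where the two strains differ, mapped to the other strain's base
    differing = {
        pos: base
        for pos, (ref_base, base) in enumerate(zip(reference, other_strain))
        if ref_base != base
    }
    return sorted((pos, alt) for pos, alt in detected if differing.get(pos) == alt)
```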