Schematic of the workflow for the single-cell LoopSeq assay.

For each sample, 200-300 live cells were co-partitioned with Gel Beads and subsequently lysed. The captured mRNAs were reverse-transcribed and barcoded using the Chromium Next GEM 3’ reagent 3.1 kit (10x Genomics). The cell-barcoded cDNAs were ligated to a LoopSeq Adapter (Element Biosciences) and enriched by human core exome capture (Twist Biosciences). This was followed by amplification and intramolecular distribution of the LOOP UMI located on the LoopSeq Adapter. The LOOP UMI-barcoded cDNAs were then fragmented and ligated to an adapter to generate a short-read sequencing library before sequencing.

Mutation expression clustering of cells from HCC and its benign liver counterpart.

(A) UMAP clustering of cells from the HCC and benign liver based on mutational gene expression shares with standard deviations ≥0.4. Red: cells from HCC; blue: cells from benign liver. (B) UMAP clustering of cells from the HCC and benign liver based on mutational isoform expression shares with standard deviations ≥0.4. (C) Relabeling of clusters from (B) as ‘A’, ‘B’, and ‘C’. (D) Venn diagram of mutational isoform expressions in cells from clusters A, B, and C. (E) UMAP clustering of cells from the HCC and benign liver based on the mutational isoform expressions from clusters A, B, and C. (F) UMAP clustering of cells from the HCC and benign liver based on the mutational isoform expressions detected in at least 5 cells from clusters A, B, and C.

Mutations in HLA molecules dominated the landscape of HCC mutational isoform expressions.

(A) The UMAP clusters from Figure 2F were relabeled as groups A through H, as indicated. Cells from HCC and benign liver in each cluster are indicated. (B) Heat map of 104 mutational isoform expressions in the HCC and benign liver and in clusters A through H. (C) Relabeling of UMAP clusters from (A), with cells expressing mutational HLA isoforms shown as triangles. Cells expressing mutational HLA isoforms in each cluster are indicated.

Evolution of mutations in the HLA-DQB1 molecule.

(A) Somatic mutations in single molecules of HLA-DQB1 NM_002123. The position of each mutation is indicated at the bottom of the graph. Mutations are numbered sequentially from the C-terminus to the N-terminus. The numbers of cells expressing these mutational isoforms from each cluster (indicated at the top) or sample of origin (indicated at the bottom) are shown in the right panel. Closed circle: mutated codon; open circle: wild-type codon; open rectangle: two single-nucleotide mutations in the same codon. (B) Pathway flow chart of mutation accumulation in single molecules of HLA-DQB1. The area of each circle is proportional to the accumulated number of mutations in a molecule. The scale on the left indicates the number of mutations in a single molecule but is not drawn to scale. Arrows indicate the potential pathways of mutation accumulation in these molecules. Numbers in white text indicate the specific mutation(s) in a molecule. Numbers in red text indicate the number of cells expressing the mutation(s).

Mutation isoform expression of DOCK8 and STEAP4.

(A) Heatmap of wild-type and mutational isoform expression of DOCK8. The number of transcripts, the positions of the mutations, and the specific isoforms are indicated. Some transcripts have multiple assignments due to the detection of partial transcripts. *: prediction based on the sequence of NM_203447. (B) Heatmap of wild-type and mutational isoform expression of STEAP4. The number of transcripts, the positions of the mutations, and the specific isoforms are indicated.

Validation of fusion gene expression in the HCC sample.

Left panel: ACTR2-EML4 fusion. Top: Chromosome organization of EML4 and ACTR2 exons. The directions of transcription are indicated. 2nd from the top: Exon representations in the ACTR2-EML4 fusion transcript, EML4 NM_001145076.3, and ACTR2 NM_001005386.3. Middle: Chromatogram of Sanger sequencing. The segments for ACTR2 and EML4 are indicated. Bottom: Protein domain and motif organizations of EML4, ACTR2, and the ACTR2-EML4 fusion. Middle panel: PDCD6-CCDC127 fusion. Top: Chromosome organization of CCDC127 and PDCD6 exons. The directions of transcription are indicated. 2nd from the top: Exon representations in the PDCD6-CCDC127 fusion transcript, CCDC127 NM_145265.3, and PDCD6 NM_013232.4. Middle: Chromatogram of Sanger sequencing. The segments for PDCD6 and CCDC127 are indicated. Bottom: Protein domain and motif organizations of CCDC127, PDCD6, and the PDCD6-CCDC127 fusion. Right panel: PLG-FGG fusion. Top: Chromosome organization of PLG and FGG exons. The directions of transcription are indicated. 2nd from the top: Exon representations in the PLG-FGG fusion transcript, PLG NM_000301.5, and FGG NM_000509.6. Middle: Chromatogram of Sanger sequencing. The segments for PLG and FGG are indicated. Bottom: Protein domain and motif organizations of PLG, FGG, and the PLG-FGG fusion. The open reading frame of FGG was eliminated due to a frameshift in the PLG-FGG fusion. Four additional unrelated amino acids were added to the truncated N-terminus of PLG.

UMAP clustering of cells from the HCC and benign liver based on the combination of normal gene expression, mutational gene expression share, and fusion gene expression share.

(A) UMAP clustering of cells from HCC and benign liver samples based on the expression of 182 genes with standard deviations ≥1, 282 mutational gene expression shares with standard deviations ≥0.4, and 20 fusion gene expression shares of any standard deviation. (B) Relabeling of clusters from (A) as clusters ‘A’, ‘B’, and ‘C’. The number of cells from HCC and benign liver in each cluster is indicated. (C) Heat map of 182 gene expressions, 282 mutational gene expression shares, and 20 fusion gene expression shares for cells from clusters ‘A’, ‘B’, and ‘C’. Cells from the HCC and benign liver are indicated.
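
The feature selection described in (A) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the legend does not state how expression shares are computed or which standard deviation estimator was used, so the population standard deviation, the function names (`select_by_sd`, `combine_features`), and the toy values below are all assumptions made for illustration. The sketch only filters each feature block by its threshold and concatenates the surviving columns per cell; the resulting matrix would then be passed to a UMAP implementation, which is not shown.

```python
from statistics import pstdev


def select_by_sd(matrix, names, min_sd):
    """Keep the features (columns) whose SD across cells (rows) is >= min_sd."""
    kept = [
        j for j in range(len(names))
        if pstdev([row[j] for row in matrix]) >= min_sd  # population SD (assumed)
    ]
    return [names[j] for j in kept], [[row[j] for j in kept] for row in matrix]


def combine_features(blocks):
    """Filter each (matrix, names, min_sd) block, then join the columns per cell."""
    all_names, combined = [], None
    for matrix, names, min_sd in blocks:
        sel_names, sel = select_by_sd(matrix, names, min_sd)
        all_names.extend(sel_names)
        if combined is None:
            combined = [list(row) for row in sel]
        else:
            for dst, src in zip(combined, sel):
                dst.extend(src)
    return all_names, combined


# Toy data (hypothetical): 4 cells, rows are cells and columns are features.
expr = [[0.0, 5.0], [4.0, 5.1], [0.0, 5.0], [4.0, 4.9]]        # gene expression
mut_share = [[0.0, 0.5], [1.0, 0.5], [0.0, 0.5], [1.0, 0.5]]   # mutational shares
fusion_share = [[0.0], [0.0], [1.0], [0.0]]                    # fusion shares

names, combined = combine_features([
    (expr, ["geneA", "geneB"], 1.0),      # SD >= 1, as in the legend
    (mut_share, ["mutA", "mutB"], 0.4),   # SD >= 0.4, as in the legend
    (fusion_share, ["fusA"], 0.0),        # any SD
])
print(names)
```

In this toy input, `geneB` and `mutB` vary too little across cells to pass their thresholds, so only `geneA`, `mutA`, and `fusA` remain as input features.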