Schema of the workflow for the single-cell LoopSeq assay.

For each sample, 200-300 live cells were co-partitioned with Gel Beads and subsequently lysed. The captured mRNAs were reverse-transcribed and barcoded using the Chromium Next GEM 3’ reagent 3.1 kit (10x Genomics). The cell-barcoded cDNAs were ligated with a LoopSeq Adapter (Element Biosciences) and enriched by human core exome capture (Twist Bioscience). This was followed by amplification and intramolecular distribution of the LOOP UMI located on the LoopSeq Adapter. The LOOP UMI-barcoded cDNAs were then fragmented and ligated with an adapter to generate a short-read sequencing library before sequencing.

Mutation expression clustering of cells from HCC and its benign liver counterpart.

(A) UMAP clustering of cells from the HCC and benign liver based on mutational gene expression shares with standard deviations ≥0.4. Red: cells from HCC; blue: cells from benign liver. (B) UMAP clustering of cells from the HCC and benign liver based on mutational isoform expression shares with standard deviations ≥0.4. (C) Relabeling of clusters from (B) as ‘A’, ‘B’, and ‘C’. (D) Venn diagram of mutational isoform expressions in cells from clusters A, B, and C. (E) UMAP clustering of cells from the HCC and benign liver based on the mutational isoform expressions unique to clusters A, B, or C. (F) UMAP clustering of cells from the HCC and benign liver based on the unique mutational isoform expressions from clusters A, B, and C that were detected in at least 5 cells.
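The selection described in panels (D) through (F) can be illustrated with a minimal sketch. It assumes that "unique" means an isoform is expressed only by cells of a single cluster, which the legend does not state explicitly. The function name, cell identifiers, and isoform names are hypothetical.

```python
def unique_features(expressing_cells, cluster_of, min_cells=1):
    """Map each cluster to the features expressed only in that cluster.

    expressing_cells: dict feature -> set of cell ids expressing it.
    cluster_of: dict cell id -> cluster label.
    min_cells: minimum number of expressing cells needed to keep a feature.
    Assumption: a feature is "unique" when all of its expressing cells
    belong to one cluster.
    """
    unique = {}
    for feature, cells in expressing_cells.items():
        clusters = {cluster_of[cell] for cell in cells}
        if len(clusters) == 1 and len(cells) >= min_cells:
            unique.setdefault(clusters.pop(), []).append(feature)
    return unique


# Hypothetical toy data: nine cells in two clusters, three isoforms.
cluster_of = {"c1": "A", "c2": "A", "c3": "A", "c4": "A",
              "c5": "B", "c6": "B", "c7": "B", "c8": "B", "c9": "B"}
expressing_cells = {
    "iso_1": {"c1", "c2"},                    # cluster A only, 2 cells
    "iso_2": {"c1", "c5"},                    # shared by A and B
    "iso_3": {"c5", "c6", "c7", "c8", "c9"},  # cluster B only, 5 cells
}

any_count = unique_features(expressing_cells, cluster_of)
at_least_5 = unique_features(expressing_cells, cluster_of, min_cells=5)
print(any_count)
print(at_least_5)
```

With `min_cells=1` the sketch mirrors the feature set of panel (E); raising it to 5 mirrors the stricter set of panel (F).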

Mutations in HLA molecules dominated the landscape of HCC mutational isoform expressions.

(A) The UMAP clusters from Figure 2F were relabeled as groups A through H, as indicated. Cells from HCC and benign liver in each cluster are indicated. (B) Heat map of 113 mutational isoform expressions in the HCC and benign liver and in clusters A through H. (C) Relabeling of UMAP clusters from (A), with cells expressing mutated HLA isoforms shown as triangles. Cells expressing mutated HLA isoforms in each cluster are indicated.

Evolution of mutations in HLA-DQB1 molecule.

(A) Somatic mutations in single molecules of HLA-DQB1 NM_002123. The positions of the mutations are indicated at the bottom of the graph. Mutations are numbered from C-terminus to N-terminus. The numbers of cells expressing these mutated isoforms from each cluster or sample are indicated in the right panel. Closed circle: mutated codon; open circle: wild-type codon; open rectangle: two single-nucleotide mutations in the same codon. (B) Pathway flow chart of mutation accumulation in single molecules of HLA-DQB1. The area of the circle is proportional to the accumulated number of mutations in a molecule. The scale on the left indicates the number of mutations in a single molecule but is not mathematically scaled. The arrows indicate the potential pathways of mutation accumulation in these molecules. The number in white text indicates the specific mutation(s) in a molecule. The number in red text indicates the number of cells expressing the mutation(s).

Fusion gene expression validation in HCC sample.

Left panel: ACTR2-EML4 fusion. Top: Chromosome organization of EML4 and ACTR2 exons. The directions of transcription are indicated. 2nd from the top: Exon representations in the ACTR2-EML4 fusion transcript, EML4 NM_001145076.3, and ACTR2 NM_001005386.3. Middle: Chromatogram of Sanger sequencing. The segments for ACTR2 and EML4 are indicated. Bottom: Protein domain and motif organizations of EML4, ACTR2, and the ACTR2-EML4 fusion. Middle panel: PDCD6-CCDC127 fusion. Top: Chromosome organization of CCDC127 and PDCD6 exons. The directions of transcription are indicated. 2nd from the top: Exon representations in the PDCD6-CCDC127 fusion transcript, CCDC127 NM_145265.3, and PDCD6 NM_013232.4. Middle: Chromatogram of Sanger sequencing. The segments for PDCD6 and CCDC127 are indicated. Bottom: Protein domain and motif organizations of CCDC127, PDCD6, and the PDCD6-CCDC127 fusion. Right panel: PLG-FGG fusion. Top: Chromosome organization of PLG and FGG exons. The directions of transcription are indicated. 2nd from the top: Exon representations in the PLG-FGG fusion transcript, PLG NM_000301.5, and FGG NM_000509.6. Middle: Chromatogram of Sanger sequencing. The segments for PLG and FGG are indicated. Bottom: Protein domain and motif organizations of PLG, FGG, and the PLG-FGG fusion. The open reading frame of FGG was eliminated by a frameshift in the PLG-FGG fusion. Four additional unrelated amino acids were added to the truncated N-terminus of PLG.

UMAP clustering of cells from the HCC and benign liver based on the combination of normal gene expression, mutational gene expression share, and fusion gene expression share.

(A) UMAP clustering of cells from HCC and benign liver samples based on 182 gene expressions with standard deviations ≥1, 282 mutational gene expression shares with standard deviations ≥0.4, and 20 fusion gene expression shares of any standard deviation. (B) Relabeling of clusters from (A) as clusters ‘A’, ‘B’, and ‘C’. The number of cells from HCC and benign liver in each cluster is indicated. (C) Heat map of 182 gene expressions, 282 mutational gene expression shares, and 20 fusion gene expression shares for cells from clusters ‘A’, ‘B’, and ‘C’. Cells from the HCC and benign liver are indicated.

Mutation expression standard deviations.

(A) Standard deviations of mutational gene expression shares across all transcriptomes. (B) Standard deviations of mutational isoform expression shares across all transcriptomes.

Heat maps of mutational gene expression.

(A) Heat map of mutational gene expression shares with standard deviations ≥0.4. Cells from the HCC and benign liver are indicated. (B) Heat map of mutational isoform expression shares with standard deviations ≥0.4. Cells from the HCC and benign liver are indicated.
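The ≥0.4 cutoff used in these panels amounts to keeping only features whose values vary enough across cells. A minimal sketch follows, assuming a population standard deviation computed per feature across all cells; the legend does not specify the estimator, and the function name and toy values are hypothetical.

```python
from statistics import pstdev


def select_by_sd(shares, cutoff=0.4):
    """Return the features whose per-cell values have SD >= cutoff.

    shares: dict mapping feature name -> list of per-cell values.
    Assumption: population SD (pstdev); the sample SD would be an
    equally plausible reading of the legend.
    """
    return [name for name, values in shares.items()
            if pstdev(values) >= cutoff]


# Hypothetical toy data: three features measured in four cells.
toy_shares = {
    "feature_1": [0.0, 1.0, 0.0, 1.0],  # SD = 0.5, kept
    "feature_2": [0.5, 0.5, 0.5, 0.5],  # SD = 0.0, dropped
    "feature_3": [0.1, 0.2, 0.1, 0.2],  # SD about 0.05, dropped
}

selected = select_by_sd(toy_shares)
print(selected)
```

The retained feature names would then define the columns passed to the heat map or to a UMAP embedding.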

Evolution of mutations in HLA-B, HLA-C, and HLA-DRB1 molecules.

(A) Somatic mutations in single molecules of HLA-B NM_005514_2. The positions of the mutations are indicated at the bottom of the graph. Mutations are numbered from C-terminus to N-terminus. The numbers of cells expressing these mutated transcripts from each cluster or sample are indicated in the right panel. Closed circle: mutated codon; open circle: wild-type codon. (B) Pathway flow chart of mutation accumulation in single molecules of HLA-B. The area of the circle is proportional to the accumulated number of mutations in a molecule. The scale on the left indicates the number of mutations in a single molecule but is not mathematically scaled. The arrows indicate the potential pathways of mutation accumulation in these molecules. The number in white text indicates the specific mutation(s) in a molecule. The number in red text indicates the number of cells expressing the mutation(s). (C) Somatic mutations in single molecules of HLA-C NM_002117. The positions of the mutations are indicated at the bottom of the graph. Mutations are numbered from C-terminus to N-terminus. The numbers of cells expressing these mutated transcripts from each cluster or sample are indicated in the right panel. Closed circle: mutated codon; open circle: wild-type codon; open rectangle: two single-nucleotide mutations in the same codon. (D) Pathway flow chart of mutation accumulation in single molecules of HLA-C. The area of the circle is proportional to the accumulated number of mutations in a molecule. The scale on the left indicates the number of mutations in a single molecule but is not mathematically scaled. The arrows indicate the potential pathways of mutation accumulation in these molecules. The number in white text indicates the specific mutation(s) in a molecule. The number in red text indicates the number of cells expressing the mutation(s). (E) Somatic mutations in single molecules of HLA-DRB1 NM_002124. The positions of the mutations are indicated at the bottom of the graph. Mutations are numbered from C-terminus to N-terminus. The numbers of cells expressing these mutated transcripts from each cluster or sample are indicated in the right panel. Closed circle: mutated codon; open circle: wild-type codon. (F) Pathway flow chart of mutation accumulation in single molecules of HLA-DRB1. The area of the circle is proportional to the accumulated number of mutations in a molecule. The scale on the left indicates the number of mutations in a single molecule but is not mathematically scaled. The arrows indicate the potential pathways of mutation accumulation in these molecules. The number in white text indicates the specific mutation(s) in a molecule. The number in red text indicates the number of cells expressing the mutation(s).

TaqMan RT-PCR of fusion transcripts in HCC and benign liver samples.

Top panel: TaqMan RT-PCR results of PDCD6-CCDC127, ACTR2-EML4, PLG-FGG, and β-actin from the benign liver sample. Bottom panel: TaqMan RT-PCR results of PDCD6-CCDC127, ACTR2-EML4, PLG-FGG, and β-actin from the HCC sample.

The impact of fusion gene expressions on the cell clustering generated by mutational isoform expressions.

(A) UMAP cluster analysis of cells from HCC and benign liver based on 113 mutational isoform expressions and 20 fusion gene expressions. (B) Relabeling of clusters from (A) as clusters A through H. (C) Heat map of mutational isoform expressions and fusion gene expression shares for clusters A through H. The cells from HCC and benign liver are indicated.

UMAP cluster analyses of cells from the HCC and benign liver based on gene expressions with different standard deviation cutoffs.

(A) UMAP clustering of cells based on gene expressions with standard deviations of at least 0.5, 0.8, 1.0, or 1.4. The numbers of genes employed in the UMAP analysis are indicated. Blue dot: cell from the benign liver; red dot: cell from HCC. (B) UMAP clustering of cells based on isoform expressions with standard deviations of at least 0.5, 0.8, 1.0, or 1.4. The numbers of genes employed in the UMAP analysis are indicated. Blue dot: cell from the benign liver; red dot: cell from HCC.

Segregation of cells between HCC and benign liver samples.

(A) Relabeling of clusters as ‘A’ and ‘B’ based on gene expressions with a standard deviation ≥0.5. Left panel: UMAP clustering. The numbers of cells from HCC and benign liver in each cluster are indicated. Right panel: Heat map of clusters A and B. Cells from HCC and benign liver are indicated. (B) Relabeling of clusters as ‘A’ and ‘B’ based on gene expressions with a standard deviation ≥0.8. Left panel: UMAP clustering. The numbers of cells from HCC and benign liver in each cluster are indicated. Right panel: Heat map of clusters A and B. Cells from HCC and benign liver are indicated. (C) Relabeling of clusters as ‘A’ and ‘B’ based on gene expressions with a standard deviation ≥1.0. Left panel: UMAP clustering. The numbers of cells from HCC and benign liver in each cluster are indicated. Right panel: Heat map of clusters A and B. Cells from HCC and benign liver are indicated. (D) Relabeling of clusters as ‘A’, ‘B’, and ‘C’ based on gene expressions with a standard deviation ≥1.4. Left panel: UMAP clustering. The numbers of cells from HCC and benign liver in each cluster are indicated. Right panel: Heat map of clusters A, B, and C. Cells from HCC and benign liver are indicated. (E) Relabeling of clusters as ‘A’ and ‘B’ based on isoform expressions with a standard deviation ≥0.5. Left panel: UMAP clustering. The numbers of cells from HCC and benign liver in each cluster are indicated. Right panel: Heat map of clusters A and B. Cells from HCC and benign liver are indicated. (F) Relabeling of clusters as ‘A’ and ‘B’ based on isoform expressions with a standard deviation ≥0.8. Left panel: UMAP clustering. The numbers of cells from HCC and benign liver in each cluster are indicated. Right panel: Heat map of clusters A and B. Cells from HCC and benign liver are indicated. (G) Relabeling of clusters as ‘A’ and ‘B’ based on isoform expressions with a standard deviation ≥1.0. Left panel: UMAP clustering. The numbers of cells from HCC and benign liver in each cluster are indicated. Right panel: Heat map of clusters A and B. Cells from HCC and benign liver are indicated. (H) Relabeling of clusters as ‘A’, ‘B’, and ‘C’ based on isoform expressions with a standard deviation ≥1.4. Left panel: UMAP clustering. The numbers of cells from HCC and benign liver in each cluster are indicated. Right panel: Heat map of clusters A, B, and C. Cells from HCC and benign liver are indicated.

Genes with isoform expression alterations played key roles in segregating cells between the HCC and benign liver samples.

(A) The role of isoform expressions in segregating cells between the HCC and benign liver when the standard deviation was ≥0.5. Left panel: Venn diagram between gene expressions and isoform expressions with standard deviations ≥0.5. Middle panel: UMAP clustering with genes not overlapping with isoforms. Right panel: UMAP clustering with genes overlapping with isoforms. (B) The role of isoform expressions in segregating cells between the HCC and benign liver when the standard deviation was ≥0.8. Left panel: Venn diagram between gene expressions and isoform expressions with standard deviations ≥0.8. Middle panel: UMAP clustering with genes not overlapping with isoforms. Right panel: UMAP clustering with genes overlapping with isoforms. (C) The role of isoform expressions in segregating cells between the HCC and benign liver when the standard deviation was ≥1.0. Left panel: Venn diagram between gene expressions and isoform expressions with standard deviations ≥1.0. Middle panel: UMAP clustering with genes not overlapping with isoforms. Right panel: UMAP clustering with genes overlapping with isoforms. (D) The role of isoform expressions in segregating cells between the HCC and benign liver when the standard deviation was ≥1.4. Left panel: Venn diagram between gene expressions and isoform expressions with standard deviations ≥1.4. Middle panel: UMAP clustering with genes not overlapping with isoforms. Right panel: UMAP clustering with genes overlapping with isoforms.
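The split shown in each Venn diagram can be sketched with plain set operations. The sketch assumes that a gene "overlaps with isoforms" when at least one of its isoforms passes the same standard deviation cutoff; that reading, the function name, and the gene and isoform identifiers are hypothetical.

```python
def split_genes_by_isoform_overlap(variable_genes, variable_isoforms,
                                   gene_of_isoform):
    """Split variable genes into (overlapping, non_overlapping) sets.

    variable_genes: set of genes passing the SD cutoff at gene level.
    variable_isoforms: set of isoforms passing the cutoff at isoform level.
    gene_of_isoform: dict isoform id -> parent gene.
    Assumption: a gene overlaps when any of its isoforms is variable.
    """
    isoform_genes = {gene_of_isoform[iso] for iso in variable_isoforms}
    overlapping = variable_genes & isoform_genes
    non_overlapping = variable_genes - isoform_genes
    return overlapping, non_overlapping


# Hypothetical toy data.
variable_genes = {"GENE_A", "GENE_B", "GENE_C"}
variable_isoforms = {"GENE_A.1", "GENE_A.2", "GENE_D.1"}
gene_of_isoform = {"GENE_A.1": "GENE_A", "GENE_A.2": "GENE_A",
                   "GENE_D.1": "GENE_D"}

overlapping, non_overlapping = split_genes_by_isoform_overlap(
    variable_genes, variable_isoforms, gene_of_isoform)
print(sorted(overlapping), sorted(non_overlapping))
```

The two returned sets correspond to the feature lists behind the right and middle UMAP panels, respectively.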

UMAP clustering of cells from HCC and benign liver based on the combination of normal gene expressions and mutational gene expression shares.

(A) UMAP clustering of cells from the HCC and benign liver samples based on 182 gene expressions with standard deviations ≥1.0 and 282 mutational gene expression shares with standard deviations ≥0.4. (B) Relabeling of clusters from (A) as clusters A, B, and C. The number of cells from the HCC and benign liver in each cluster is indicated. (C) Heat map of 182 gene expressions and 282 mutational gene expression shares for cells from clusters A, B, and C. Cells from the HCC and benign liver are indicated.