Data acquisition and workflow to train DEGAS using human pancreatic islet transcriptomic data for prediction of T2D-associated cells.

A) Read count data for human islet bulk RNA-seq from GSE138748 was downloaded, processed, and matched with donor metadata. The dataset included 58 non-diabetic and 27 type 2 diabetic (T2D) samples. B) Human islet single-cell RNA-seq (scRNA-seq) count matrices were generated by realignment of reads and the datasets were integrated in Seurat. C) DEGAS transfers the bulk donor expression data and clinical trait information to individual cells in the single cell matrix for the purpose of prioritizing cells. This allows cells to be assigned scores which can then be thresholded for downstream analysis. D,E) Bulk RNA-seq data from Marselli et al. was analyzed by edgeR and displayed as both (D) volcano plot and (E) MA plot. Differentially expressed genes (DEGs) are those with p<0.05 and >1.5-fold changed and are highlighted in red. The MA plot in (E) provides a sense of relative transcript abundance among DEGs. F) Gene set enrichment analysis (GSEA) results for Hallmark_Inflammatory_Response show an enrichment for DEGs in T2D human islets. G) A simple comparison of up- and down-regulated genes in T2D human islets in RNA-seq data between Marselli et al. and Asplund et al. shows consistent findings. Significant DEGs from Marselli et al are outlined in black. H) DEGs from Marselli et al. that were also found on the Type 2 Diabetes Knowledge Portal database of known diabetes effector genes.

Validation of merged scRNA-seq datasets from five human islet studies.

A) After integrating the five scRNA-seq datasets from Fig 1B, gene expression plots were created for the major endocrine and exocrine pancreas genes to demonstrate successful clustering of specific cell types. Metadata was overlaid onto single cell UMAP plots for show cex (B), BMI (C), age (D), and diabetes status (E).

DEGAS analysis based on T2D status.

A) DEGAS T2D association scores (T2D-DEGAS) for each cell was overlaid onto the single cell UMAP plot. Higher scores in pink/red indicate a strong positive association and negative scores in darker black indicate a negative association of those cells with T2D. B) Violin plot displaying the aggregate T2D-DEGAS scores per cell type.

Identification of differentially-expressed genes in high- and low-scoring βT2D-DEGAS cell populations.

A) β-cells were subsetted from all other cells based on INS expression and reclustered. B) DEGAS scores for T2D association (βT2D-DEGAS) were overlaid onto the cell plot. C) β-cells were classified as low, medium and high-scoring βT2D-DEGAS subpopulations for downstream analysis. D) Differential expression analysis of high vs. low βT2D-DEGAS scores. Genes with p<0.05 and >1.5-fold change are highlighted in red. E) Genes from (D) were filtered to remove DEGs that could be identified by comparing β-cells of T2D vs non-diabetic donors in the single cell data (Fig S1D). Gene ontology (GO) analysis of genes enriched in the high (F) and low (G) scoring βT2D-DEGAS subpopulations. H) GSEA results for high-vs. low-scoring βT2D-DEGAS subpopulations. I) Bubble plot highlighting genes driving GO and GSEA categories.

DEGAS analysis based on obesity status.

A) DEGAS obesity association scores (obese-DEGAS) for each cell was overlaid onto the single cell UMAP plot. Higher scores in pink/red indicate a strong positive association and negative scores in darker black indicate a negative association of those cells with obesity. B) Violin plot displaying the aggregate obese -DEGAS scores per cell type.

Two subpopulations of high βobese-DEGAS cells defined by underlying single cell donor T2D status uncovers stress signature in non-diabetic high-scoring obese-DEGAS β-cells.

A) β-cells were subsetted from Fig 5A based on INS expression and obese-DEGAS scores overlaid onto the single cells. B) β-cells were further subsetted based on both their high obese-DEGAS scores and by the underlying T2D status of the single cell donors, leading to two major subpopulations, ND-βobese-DEGAS and T2D-βobese-DEGAS cells. C) Differential expression analysis comparing ND-βobese-DEGAS and T2D-βobese-DEGAS subpopulations. Genes with p<0.05 and >1.5-fold change are highlighted in red. Gene ontology (GO) analysis of genes enriched in the ND-βobese-DEGAS (D) and T2D-βobese-DEGAS (E) subpopulations. F) Gene set enrichment analysis (GSEA) of ND-vs. T2D-βobese-DEGAS subpopulations. G) Bubble plot highlighting genes driving GO and GSEA categories.

DLK1 is a β-cell gene heterogeneously expressed between cells and islets of non-diabetic and T2D humans.

A) Formalin-fixed paraffin-embedded (FFPE) sections of human pancreas from non-diabetic (ND) and T2D donors was stained for INS and DLK1. Representative images are shown from three different ND and T2D donors. Arrows indicate INS+/DLK1+ cells. B) UMAP plot overlaying DLK1 expression and βT2D-DEGAS (i) or βobese-DEGAS (ii) scores.

Islet Gene View results of DEGs found in both Marselli et al. and Asplund et al.

FFAR4 (A), UCN5D (B), SFRP4 (C), and PODN (D) were each significantly regulated in T2D vs. non-T2D human islet RNA-seq data in Marselli et al. and Asplund et al. Plots shown were generated using Islet Gene View browser (https://mae.crc.med.lu.se/IsletGeneView/). FFAR4 and UCN5D were both down-regulated in T2D and SFRP4 and PODN were both up-regulated.

Human islet single cell RNA-seq clusters and diabetes marker genes for all cells and β-cells.

A) Seurat clustering of merged human islet scRNA-seq data. B) Study identifiers overlaid on single cell plot. C) Differential gene expression analysis comparing all cells from T2D donors vs. non-diabetic (ND) donors. D) β-cells reclustered and labeled based on donor T2D status. E) Differential expression analysis of β-cells from T2D vs. ND donors.

Thresholding analysis for βT2D-DEGAS scores and differential gene expression analysis and plots.

A) Quantile thresholding and volcano plot comparing the upper 20% of βT2D-DEGAS scores to the lower 80%. B) Median thresholding and volcano plot comparing the upper 50% vs lower 50% of βT2D-DEGAS scoring cells. C) Gene expression overlays on β-cells of selected genes that drive enrichment of GO terms and GSEA categories. D) GSEA results for high βT2D-DEGAS scoring cells. E) GSEA results of low-scoring βT2D-DEGAS scoring cells.

Score overlay on all islet cells for βlean-DEGAS (A) and βoverweight-DEGAS (B). C,D) Similar score overlays are also shown for β-cells alone. E) Median thresholding of βobese-DEGAS score. F) Differential expression analysis for high vs low βobese-DEGAS scores based on median threshold. G) Subsetting highest-scoring cells based on βobese-DEGAS score. H) Gene expression overlays on β-cell UMAP. I) GSEA results for ND-βobese-DEGAS cells. J) GSEA results for T2D-βobese-DEGAS cells.

Overlap of RePACT trajectory genes with genes identified by DEGAS in human islet scRNA-seq data.

Genes from βT2D-DEGAS (A), or ND-βobese-DEGAS and T2D-βobese-DEGAS (B) that had adjusted p-values < 0.05 and |log2fold-changed| > 0.58 were selected for comparison to Fang et al. Table S4 T2D and obesity trajectory genes, respectively.