Identification of type 2 diabetes- and obesity-associated human β-cells using deep transfer learning

Gitanjali Roy; Rameesha Syed; Olivia Lazaro; Sylvia Robertson; Sean D. McCabe; Daniela Rodriguez; Alex M. Mawla; Travis S. Johnson; Michael A. Kalwat

doi:10.7554/eLife.96713.1

eLife assessment

This is a useful study that used DEGAS, a deep transfer learning tool, to identify distinct pancreatic beta cell subpopulations that could be associated with type 2 diabetes (T2D) and/or obesity status. The data supporting the authors' findings is solid and demonstrates that DEGAS will be a helpful tool for analyzing cell-specific transcriptomic phenotypes. This study will be of interest to researchers studying the genetics of T2D.

https://doi.org/10.7554/eLife.96713.1.sa2

Significance of findings

useful: Findings that have focused importance and scope

Strength of evidence

solid: Methods, data and analyses broadly support the claims with only minor weaknesses

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Diabetes affects >10% of adults worldwide and is caused by impaired production of, or response to, insulin, resulting in chronic hyperglycemia. Pancreatic islet β-cells are the sole source of endogenous insulin and our understanding of β-cell dysfunction and death in type 2 diabetes (T2D) is incomplete. Single-cell RNA-seq data support heterogeneity as an important factor in β-cell function and survival. However, it is difficult to identify which β-cell phenotypes are critical for T2D etiology and progression. Our goal was to prioritize specific disease-related β-cell subpopulations to better understand T2D pathogenesis and identify relevant genes for targeted therapeutics. To address this, we applied a deep transfer learning tool, DEGAS, which maps disease associations onto single-cell RNA-seq data from bulk expression data. Independent runs of DEGAS using T2D or obesity status identified distinct β-cell subpopulations. A single cluster of T2D-associated β-cells was identified; however, β-cells with high obese-DEGAS scores contained two subpopulations derived largely from either non-diabetic or T2D donors. The obesity-associated non-diabetic cells were enriched for translation and unfolded protein response genes compared to T2D cells. We selected DLK1 for validation by immunostaining in human pancreas sections from healthy and T2D donors. DLK1 was heterogeneously expressed among β-cells and appeared depleted from T2D islets. In conclusion, DEGAS has the potential to advance our holistic understanding of β-cell transcriptomic phenotypes, including features that distinguish β-cells in obese non-diabetic or lean T2D states. Future work will expand this approach to additional human islet omics datasets to reveal the complex multicellular interactions driving T2D.

Background

The advent of high-throughput single-cell RNA sequencing (scRNA-seq) has enabled the generation of an array of single-cell atlases from pancreatic islets. These findings have expanded our understanding of the major cell types of the pancreas along with how they are implicated in both type 1 diabetes (T1D) and type 2 diabetes (T2D). Notably, these studies have: 1) identified multiple reliable transcriptomic markers for endocrine and exocrine pancreatic cell types; 2) provided insight into novel subtypes of cells; and 3) generated large cellular atlases spurring innovation in the development of single cell methods and analysis. scRNA-seq analysis has enabled a more robust characterization of islet cell heterogeneity which may underlie diversity in diabetes risk and drug response ^1–5. In published comparisons between the islet single-cell transcriptomes of humans versus mice and pigs, differences were observed in the relative proportion of major cell types as well as in many cell type-specific genes ⁶. These findings further support a focus on integrating human islet transcriptomic data to delineate disease processes and identify therapeutic targets and biomarkers.

As bulk RNA-seq and scRNA-seq human islet datasets have become increasingly available, so too has the need for new computational tools to integrate the transcriptomics and donor metadata. Some sets of islet scRNA-seq data have been combined and made searchable through web portals to browse datasets and compare cell types and marker genes ^6–9. To further utilize these types of integrated transcriptomic data and donor metadata, we previously developed DEGAS as a flexible deep transfer learning framework that can be used to overlay disease status, survival hazard, drug response, and other clinical information directly onto single cells. Machine learning tools like DEGAS have been primarily used in cancer datasets, but not in human pancreatic islets until now.

The reasons why some obese individuals succumb to T2D while others do not likely involve both genetic and environmental factors, but the factors that underlie this transition are incompletely understood¹⁰. Analyses of human genomics and islet transcriptomics indicate many β-cell genes have causal roles in T2D ^11–15. Subsets of β-cells which are more resilient or susceptible to failure under the secretory pressure of insulin resistance may be uncovered by combining transcriptomics with machine learning approaches like DEGAS. Here we have implemented DEGAS to predict T2D- and obesity-associated subclusters of human pancreatic islet β-cells using a combination of publicly available scRNA-seq and bulk RNA-seq human islet data and associated donor metadata. Through this analysis we sought to identify novel and established genes implicated in T2D and obesity, which were up- or down-regulated in subpopulations of β-cells identified by DEGAS, and to validate our findings at the protein level using immunohistochemistry of pancreas tissue from non-diabetic and T2D organ donors. Our current findings applying DEGAS to islet data have implications for β-cell heterogeneity in T2D and obesity. The abundance of T2D-related factors and functional β-cell genes in our analysis validates applying DEGAS to islet data to identify disease-associated phenotypes and increases confidence in the novel candidates.

Data description

Human islet bulk transcriptomic dataset acquisition, processing, and analysis

In this study, human islet bulk RNA-seq raw count data (aligned to GRCh38) and donor metadata from Marselli, et al. GSE159984 were downloaded from the Gene Expression Omnibus ¹⁶ (Table S1). The rationale for using this dataset was its large sample size of both non-diabetic and T2D samples and the agreement of differentially-expressed genes between the GSE159984 dataset and a similarly large independent dataset¹⁷. All genes in the read count data were filtered based on the one-to-one identifier correspondences between gene symbol, Entrez gene identifier, and Ensembl identifier from the Matched Annotation from NCBI and EBI table (MANE GRCh38 v1.1) ¹⁸.

Sample data were labeled and grouped by their RRID and disease status (ND, non-diabetes vs. T2D, type 2 diabetes). After filtering, the GSE159984 read count table contained 19,058 genes from N = 27 T2D samples and N = 58 non-diabetes samples (Fig 1A). Differential gene expression analysis between ND and T2D groups was performed using the edgeR likelihood ratio test and cutoffs were FDR ≤ 0.05 and |log₂ fold-change| ≥ 0.58. The processed and filtered read count table and edgeR results are provided in Table S2 and the R script used for processing is available on GitHub (https://github.com/kalwatlab/Islet_DEGAS_v1).
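The cutoffs stated above (FDR ≤ 0.05 and |log₂ fold-change| ≥ 0.58, the latter corresponding to roughly a 1.5-fold change) can be illustrated with a minimal Python sketch. This is not the authors' R script; the function name `call_degs` and the example rows are hypothetical and serve only to show how the two thresholds combine.

```python
# Illustrative sketch only: applies the FDR and fold-change cutoffs described
# in the text to a table of (gene, log2 fold-change, FDR) rows.
# The gene names and values below are placeholders, not results from the study.

FDR_CUTOFF = 0.05
LOG2FC_CUTOFF = 0.58  # 2 ** 0.58 is approximately 1.5-fold


def call_degs(rows):
    """Split rows that pass both cutoffs into up- and down-regulated genes."""
    up, down = [], []
    for gene, log2fc, fdr in rows:
        if fdr <= FDR_CUTOFF and abs(log2fc) >= LOG2FC_CUTOFF:
            (up if log2fc > 0 else down).append(gene)
    return {"up": up, "down": down}


example_rows = [
    ("GENE_A", 1.20, 0.001),   # passes both cutoffs, up-regulated
    ("GENE_B", -0.90, 0.030),  # passes both cutoffs, down-regulated
    ("GENE_C", 0.30, 0.0001),  # significant but below the fold-change cutoff
    ("GENE_D", -2.00, 0.200),  # large change but not significant
]
result = call_degs(example_rows)
```

A gene must satisfy both conditions to be called differentially expressed, so a highly significant but small change (GENE_C) is excluded just as a large but non-significant change (GENE_D) is.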

Data acquisition and workflow to train DEGAS using human pancreatic islet transcriptomic data for prediction of T2D-associated cells.
A) Read count data for human islet bulk RNA-seq from GSE159984 were downloaded, processed, and matched with donor metadata. The dataset included 58 non-diabetic and 27 type 2 diabetic (T2D) samples. B) Human islet single-cell RNA-seq (scRNA-seq) count matrices were generated by realignment of reads and the datasets were integrated in Seurat. C) DEGAS transfers the bulk donor expression data and clinical trait information to individual cells in the single cell matrix for the purpose of prioritizing cells. This allows cells to be assigned scores which can then be thresholded for downstream analysis. D,E) Bulk RNA-seq data from Marselli et al. were analyzed by edgeR and displayed as both (D) volcano plot and (E) MA plot. Differentially expressed genes (DEGs) are those with p<0.05 and >1.5-fold change and are highlighted in red. The MA plot in (E) provides a sense of relative transcript abundance among DEGs. F) Gene set enrichment analysis (GSEA) results for Hallmark_Inflammatory_Response show an enrichment for DEGs in T2D human islets. G) A simple comparison of up- and down-regulated genes in T2D human islets in RNA-seq data between Marselli et al. and Asplund et al. shows consistent findings. Significant DEGs from Marselli et al. are outlined in black. H) DEGs from Marselli et al. that were also found on the Type 2 Diabetes Knowledge Portal database of known diabetes effector genes.

Human islet single-cell data acquisition, filtering, and integration

We obtained read count tables for five single-cell human islet datasets (GSE84133; GSE85241; E-MTAB-5061; GSE81608; GSE86469 (Table S1)) which were realigned to GRCh38.p5 (hg38) ⁸ (Fig 1B). Metadata were downloaded from GEO or obtained from the supplemental information of the respective publications. These five datasets were previously integrated and analyzed ⁸. To prepare the data for our machine learning analyses, we repeated the integration and analysis of these datasets in R using Seurat, excluding cells with few expressed genes and cells with over 20% mitochondrial gene expression (low viability). The upper and lower limits on the number of expressed genes varied between the datasets to account for differences in library preparation and sequencing platform, following guidelines in the Seurat documentation.
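The per-cell filtering logic described above can be sketched as follows. This is a simplified Python illustration rather than the Seurat code used in the study: the 20% mitochondrial ceiling comes from the text, whereas the function name `passes_qc`, the dataset labels, and the gene-count limits are hypothetical placeholders, since the actual limits varied per dataset and are not stated here.

```python
# Illustrative sketch only: a cell is kept if its number of detected genes lies
# within dataset-specific limits and its mitochondrial fraction is at most 20%.
# The limits in HYPOTHETICAL_LIMITS are placeholders, not the study's values.

MAX_MITO_FRACTION = 0.20  # cells above 20% mitochondrial expression are excluded

HYPOTHETICAL_LIMITS = {
    "dataset_droplet": {"min_genes": 500, "max_genes": 6000},
    "dataset_plate": {"min_genes": 1000, "max_genes": 10000},
}


def passes_qc(n_genes, mito_fraction, min_genes, max_genes,
              max_mito=MAX_MITO_FRACTION):
    """Return True if the cell passes the gene-count and mitochondrial filters."""
    return min_genes <= n_genes <= max_genes and mito_fraction <= max_mito


limits = HYPOTHETICAL_LIMITS["dataset_droplet"]
kept = passes_qc(2000, 0.05, **limits)          # typical cell, retained
too_few = passes_qc(100, 0.05, **limits)        # too few expressed genes
low_viability = passes_qc(2000, 0.25, **limits)  # high mitochondrial fraction
```

Keeping the gene-count limits in a per-dataset table mirrors the point made in the text that thresholds were adjusted for library preparation and sequencing platform.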

Datasets were integrated into a single dataset using Seurat version 4.1.3. The Seurat objects were normalized using regularized negative binomial regression (SCTransform) to correct for batch effects and clustered using default settings. The most variable features were used to identify integration anchors, which were passed to the IntegrateData function to return a Seurat object containing an integrated expression matrix of all cells.

The clusters were visualized using UMAP and expression of the pancreatic hormone genes INS, GCG, SST, PPY, and GHRL was used to annotate β, α, δ, γ and ε cell clusters, respectively. Clustering analysis used the Louvain algorithm ¹⁹ and labelling was performed to identify different cell types because our integrated dataset contained a mixture of cells from pancreatic islets. In total, 22 clusters were identified from 17,273 pancreatic cells and the clusters were classified into endocrine and non-endocrine cell types based on cell-specific marker expression (Fig 2A). R scripts are available on GitHub (https://github.com/kalwatlab/Islet_DEGAS_v1). The Seurat object containing all read count matrices and merged data and metadata is available on Mendeley²⁰.
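The hormone-based annotation can be illustrated with a short Python sketch. The marker-to-cell-type mapping follows the text; the decision rule (assigning each cluster to the hormone with the highest mean expression), the function name `annotate_cluster`, and the example values are assumptions made for illustration and are not taken from the study's scripts.

```python
# Illustrative sketch only: label a cluster by the hormone gene with the highest
# mean expression. The mapping follows the text; the "highest mean wins" rule
# and the "unassigned" fallback are simplifying assumptions.

HORMONE_TO_CELL_TYPE = {
    "INS": "beta",
    "GCG": "alpha",
    "SST": "delta",
    "PPY": "gamma",
    "GHRL": "epsilon",
}


def annotate_cluster(mean_expression):
    """mean_expression maps gene symbol -> mean expression within one cluster."""
    hormones = {g: v for g, v in mean_expression.items()
                if g in HORMONE_TO_CELL_TYPE}
    if not hormones or max(hormones.values()) <= 0:
        return "unassigned"  # e.g. a non-endocrine cluster needing other markers
    top_gene = max(hormones, key=hormones.get)
    return HORMONE_TO_CELL_TYPE[top_gene]


# Placeholder cluster averages, not values from the integrated dataset
label_1 = annotate_cluster({"INS": 9.1, "GCG": 0.4, "SST": 0.2})
label_2 = annotate_cluster({"GHRL": 3.0, "INS": 0.1})
label_3 = annotate_cluster({"KRT19": 5.0})
```

Clusters left unassigned by this rule would then be labelled with the non-endocrine markers (for example acinar, ductal, or stellate genes) in a separate step.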

Validation of merged scRNA-seq datasets from five human islet studies.
A) After integrating the five scRNA-seq datasets from Fig 1B, gene expression plots were created for the major endocrine and exocrine pancreas genes to demonstrate successful clustering of specific cell types. Metadata were overlaid onto single cell UMAP plots to show sex (B), BMI (C), age (D), and diabetes status (E).

Analyses

Differentially expressed genes from T2D bulk RNA-seq human islets show inflammatory signature and correlate with independent islet datasets

We selected bulk RNA-seq data from Marselli, et al. 2020 which contains 58 non-diabetic and 27 T2D samples with associated metadata for BMI, age, and sex ¹⁶ (Fig 1A-C). To determine the suitability of this data for DEGAS, we reanalyzed the read count matrix from GSE159984 using the edgeR likelihood ratio test to identify up- and down-regulated genes, visualized in both volcano (Fig 1D) and MD plot (Fig 1E, Fig S1A) formats to highlight fold-change versus significance and versus overall expression levels, respectively (Table S2). We observed altered expression of genes well-known to be dysregulated in T2D, including IAPP ^{21, 22}, PAX4 ^{23, 24}, SLC2A2 ²⁵, FFAR4²⁶, and ENTPD3 ^{27, 28} (Fig 1D-E).

Gene set enrichment analysis (GSEA) indicated that the gene expression profile of T2D human islets from this dataset was associated with an inflammatory response phenotype (Fig 1F). To compare our analysis of Marselli, et al. GSE159984 with an independent cohort, we selected a bulk RNA-seq human islet transcriptomic analysis from Asplund et al. ¹⁷. Because the raw read count data from Asplund et al. are not accessible, we used the published supplementary table of log₂-fold changes for differentially expressed genes and compared it with the Marselli dataset (Fig 1G) (Table S2). Significant differentially expressed genes from both Marselli and Asplund data included up-regulation of SFRP4 and PODN, and down-regulation of UNC5D and FFAR4 (Fig S1B). We also found substantial agreement in the directional changes of most other differentially expressed genes from both datasets, indicating that the Marselli islet data represent a suitable cohort for DEGAS analysis. Finally, we compared up- and down-regulated genes from T2D islets in the Marselli dataset with T2D effector genes from the T2D Knowledge Portal ²⁹. The down-regulated genes PAX4 and SLC2A2 and the up-regulated gene APOE each have evidence of a strong or causal role in T2D at the genetic level, while CYTIP was up-regulated in T2D islets and has intron variant SNPs associated with T2D (e.g. rs13384965) ³⁰ (Fig 1H).
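The comparison of directional changes between two cohorts can be expressed as a simple sign-concordance check on the log₂ fold-changes of shared genes. The sketch below is one possible way to quantify such agreement and is not the procedure used to generate Fig 1G; the function name `direction_agreement` and all gene labels and values are hypothetical.

```python
# Illustrative sketch only: fraction of shared genes whose log2 fold-change has
# the same sign in two cohorts. Gene labels and values are placeholders.

def direction_agreement(lfc_a, lfc_b):
    """Return (fraction of shared genes with concordant sign, discordant genes)."""
    shared = sorted(set(lfc_a) & set(lfc_b))
    if not shared:
        return None, []
    discordant = [g for g in shared if lfc_a[g] * lfc_b[g] <= 0]
    return 1 - len(discordant) / len(shared), discordant


cohort_a = {"GENE_1": 1.4, "GENE_2": 0.8, "GENE_3": -1.1,
            "GENE_4": -0.7, "GENE_5": 0.9}
cohort_b = {"GENE_1": 0.9, "GENE_2": 1.2, "GENE_3": -0.6,
            "GENE_4": 0.3, "GENE_6": 2.0}

fraction, discordant = direction_agreement(cohort_a, cohort_b)
```

Only genes present in both tables are compared, which matters here because one cohort is represented solely by its published list of differentially expressed genes.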

DEGAS revealed T2D-associated β-cells and marker genes within integrated human islet scRNA-seq data

We obtained five realigned scRNA-seq datasets ⁸ including Baron (GSE84133), Muraro (GSE85241), Segerstolpe (E-MTAB-5061), Lawlor (GSE86469), and Xin (GSE81608), and integrated them using Seurat 4.3.0 to subtype each of the major cell types for use with DEGAS. All five datasets were successfully integrated resulting in 17,273 cells (Fig S2A) and different cell types were clearly stratified from the scRNA-seq data (Fig 2A). Distinct clusters of cell types were enriched for their respective markers including β-cells (INS), α-cells (GCG), δ-cells (SST), ε-cells (GHRL), PP-cells (PPY), acinar cells (REG1A, PRSS1), ductal cells (KRT19, CFTR), stellate cells (COL6A1), endothelial cells (PLVAP), and mesenchymal cells (CD44) (Fig 2A).

Next, we implemented DEGAS ³¹, a unique tool that leverages the increased sequencing depth, larger number of clinical covariates and sample sizes of bulk sequencing data. We used the deep learning architecture within DEGAS to project bulk islet sequencing data into the same latent representation as single cell islet data via domain adaptation. From this common latent representation, clinical covariates from the bulk sequencing data (e.g. T2D status and BMI) were projected onto single cells in a process called transfer learning. This process generates unitless disease-association scores for each single cell. After mapping the T2D-association scores to the single cell map (Fig 3A), we observed that β-cells had the lowest scores compared to other cell types (Fig 3B). This may be interpreted as reflecting fewer β-cells or lower insulin expression after onset of T2D ³². We focused on differential T2D-association scores only among β-cells by subsetting them based on INS expression (Fig 4A) and superimposed the T2D-association scores onto each β-cell in the UMAP plot (Fig 4B). β-cells with higher scores (pink coloration) transcriptionally associate with human islet T2D expression profiles more than the lower scoring (black) β-cells. Next, we set quantile thresholds for the β-cell T2D-association scores and grouped the cells into high (upper 20%), medium and low (lower 20%) categories (Fig 4C). We compared the high versus low β^T2D-DEGAS groups to identify genes enriched in either subset of β-cells (Fig 4D, Table S3). High-scoring β^T2D-DEGAS cells had elevated expression of CDKN1C (p57^Kip2), which binds and inhibits G1 cyclin/CDK complexes and is also involved in cellular senescence ³³, ³⁴, and reduced expression of IAPP, consistent with results from bulk T2D islet expression data (Fig 1G). These findings show that DEGAS can successfully identify diabetes-relevant genes enriched in specific subclusters.
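The quantile-based grouping of β-cells into high (upper 20%), medium, and low (lower 20%) categories can be illustrated with the Python standard library. This is a sketch of the thresholding step only, not of DEGAS itself; the function name `classify_by_quantile`, the choice of the inclusive quantile method, the handling of ties at the cut points, and the example scores are assumptions.

```python
# Illustrative sketch only: assign cells to low / medium / high groups using the
# 20th and 80th percentiles of their association scores.
# The scores below are placeholders, not DEGAS output.
from statistics import quantiles


def classify_by_quantile(scores):
    """Return a list of 'low', 'medium', or 'high' labels, one per score."""
    cuts = quantiles(scores, n=5, method="inclusive")  # 20th, 40th, 60th, 80th
    low_cut, high_cut = cuts[0], cuts[-1]
    labels = []
    for s in scores:
        if s >= high_cut:
            labels.append("high")
        elif s <= low_cut:
            labels.append("low")
        else:
            labels.append("medium")
    return labels


example_scores = list(range(11))  # 0, 1, ..., 10 as stand-in scores
labels = classify_by_quantile(example_scores)
counts = {k: labels.count(k) for k in ("low", "medium", "high")}
```

Because the scores are continuous, the same function could be rerun with different cut points to test how sensitive the downstream differential expression results are to the chosen thresholds.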

DEGAS analysis based on T2D status.
A) DEGAS T2D association scores (T2D-DEGAS) for each cell were overlaid onto the single cell UMAP plot. Higher scores in pink/red indicate a strong positive association and negative scores in darker black indicate a negative association of those cells with T2D. B) Violin plot displaying the aggregate T2D-DEGAS scores per cell type.

Identification of differentially-expressed genes in high- and low-scoring β^T2D-DEGAS cell populations.
A) β-cells were subsetted from all other cells based on INS expression and reclustered. B) DEGAS scores for T2D association (β^T2D-DEGAS) were overlaid onto the cell plot. C) β-cells were classified as low, medium and high-scoring β^T2D-DEGAS subpopulations for downstream analysis. D) Differential expression analysis of high vs. low β^T2D-DEGAS scores. Genes with p<0.05 and >1.5-fold change are highlighted in red. E) Genes from (D) were filtered to remove DEGs that could be identified by comparing β-cells of T2D vs non-diabetic donors in the single cell data (Fig S1D). Gene ontology (GO) analysis of genes enriched in the high (F) and low (G) scoring β^T2D-DEGAS subpopulations. H) GSEA results for high- vs. low-scoring β^T2D-DEGAS subpopulations. I) Bubble plot highlighting genes driving GO and GSEA categories.

As an additional way to highlight genes that were identified by DEGAS, we excluded genes that were differentially expressed between all non-diabetic (ND) and T2D β-cells (based on the merged scRNA-seq donor metadata; Fig S4D, Table S3). After this filtering, over 90% of the significant differentially expressed genes remained; these were identified only through DEGAS analysis (Fig 4E). This emphasizes the advantage of using DEGAS to generate continuous variables (disease-association scores) which enable thresholding and prioritizing cellular subpopulations. Gene ontology (GO) analysis on high-scoring β^T2D-DEGAS cells showed enrichment in biological process categories including neutrophil degranulation, cell growth, and negative regulation of cell development (Fig 4F, Table S4). These categories were driven in part by immune-related genes (e.g. HLA-B, PSAP) and cell cycle regulator BTG1. Conversely, GO analysis on low-scoring β^T2D-DEGAS cells showed enrichment of categories primarily related to membrane protein targeting, translation, and mitochondrial electron transport (Fig 4G, Table S4). The top six enriched GO categories were driven largely by RPS/RPL genes. We also performed ranked-list GSEA on these same high- vs. low-scoring β^T2D-DEGAS subpopulations, which largely agreed with the GO analysis (Fig 4H). We observed significant positive enrichment for hypoxia, TNFα signaling, estrogen response, and glycolysis (Fig S3D), while significantly negatively enriched pathways included overlapping genes in oxidative phosphorylation and Myc-target pathways (Fig S3E). Examples of genes that drive the GO and GSEA categories are shown (Fig 4I) and heterogeneity in the expression of these genes can be seen in β-cell UMAP plots (Fig S3C).
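The exclusion step described above amounts to a set difference between two gene lists. A minimal Python sketch is given below under the assumption that both lists have already been thresholded for significance; the function name `degas_specific_genes` and the gene labels are hypothetical.

```python
# Illustrative sketch only: remove from the DEGAS-derived gene list any gene that
# is also differentially expressed in a direct ND vs. T2D donor comparison.
# Gene labels are placeholders, not results from the study.

def degas_specific_genes(degas_degs, donor_status_degs):
    """Return (genes unique to the DEGAS comparison, fraction retained)."""
    donor_set = set(donor_status_degs)
    kept = [g for g in degas_degs if g not in donor_set]
    fraction = len(kept) / len(degas_degs) if degas_degs else 0.0
    return kept, fraction


high_vs_low_degs = ["G1", "G2", "G3", "G4", "G5"]
nd_vs_t2d_degs = ["G2", "G9"]

kept, fraction_retained = degas_specific_genes(high_vs_low_degs, nd_vs_t2d_degs)
```

The retained fraction is the quantity reported in the text as exceeding 90% for the high versus low β^T2D-DEGAS comparison.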

β-cell clusters with high BMI-association scores (high β^obese-DEGAS) show distinct differences between non-diabetic and T2D donors

Obesity and T2D are related, but not mutually inclusive. We surmised that obesity-association scores in β-cells may highlight unique subpopulations compared to the β^T2D-DEGAS cells. Therefore, we categorized bulk RNA-seq donors by BMI (lean, <25; overweight, 25-30; obese, >30) and ran the DEGAS analysis based on these categories to calculate BMI-association scores, similar to our approach with T2D. Overlaying the BMI-DEGAS scores onto the islet single-cell data showed a distinct pattern of cell labeling for the obesity-association scores (Fig 5A) compared to the lean- or overweight-association scores (Fig S5A,B). It is interesting to note that the subpopulation of β^lean-DEGAS cells appears to overlap with the subpopulation of low-scoring β^T2D-DEGAS cells (Fig S4C), suggesting that single β-cells share features of low BMI and potentially lower T2D risk. The cell types with the highest obesity-association scores were β-cells, acinar cells, and stellate cells (Fig 5B).
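The BMI categories used to label the bulk RNA-seq donors can be written as a small function. The cut points (lean, <25; overweight, 25-30; obese, >30) follow the text; the function name `bmi_category` is hypothetical, and treating a BMI of exactly 25 or exactly 30 as overweight is an interpretation of the stated ranges.

```python
# Illustrative sketch only: map a donor BMI to the three categories used as
# labels for the DEGAS run. Boundary handling at 25 and 30 is an assumption
# based on the ranges given in the text.

def bmi_category(bmi):
    """Return 'lean', 'overweight', or 'obese' for a BMI value."""
    if bmi < 25:
        return "lean"
    if bmi <= 30:
        return "overweight"
    return "obese"


example_bmis = [21.3, 25.0, 28.7, 30.0, 34.2]  # placeholder donor values
example_labels = [bmi_category(b) for b in example_bmis]
```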

DEGAS analysis based on obesity status.
A) DEGAS obesity association scores (obese-DEGAS) for each cell were overlaid onto the single cell UMAP plot. Higher scores in pink/red indicate a strong positive association and negative scores in darker black indicate a negative association of those cells with obesity. B) Violin plot displaying the aggregate obese-DEGAS scores per cell type.

Reclustering β-cells with overlaid obesity-association scores enabled us to distinguish two major subpopulations of high-scoring β^obese-DEGAS cells (Fig 6A), a feature that did not occur with β^T2D-DEGAS cells (Fig 4B). We observed that a distinguishing difference between these two groups of obesity-associated β-cells was the donor diabetes status (ND vs T2D) in the single-cell metadata. We applied a median quantile threshold to extract these two subpopulations and classified the cell clusters based on their donor diabetes status to create ND-β^obese-DEGAS and T2D-β^obese-DEGAS subpopulations (Fig 6B). We next determined differentially expressed genes between the ND-β^obese-DEGAS and T2D-β^obese-DEGAS cells (Fig 6C). Gene ontology analysis of genes upregulated in the ND-β^obese-DEGAS cluster indicated an enrichment for unfolded protein response (UPR) processes like vesicle transport, translation, and protein folding and stability (Fig 6D). Specifically, the ND-β^obese-DEGAS cluster was enriched for expression of adaptive (e.g. MANF and HSPA5) and maladaptive (e.g. TRIB3, DDIT3) UPR genes, as well as genes involved in ER-associated degradation (e.g. SEL1L, DERL2, and SDF2L1). Gene ontology analysis of genes upregulated in the T2D-β^obese-DEGAS cluster showed enrichment for hormone transport and secretion pathways (e.g. IGFBP5, UCN3, G6PC2) and inflammatory/immune-related pathways (e.g. HLA-A/B/E, STAT3) (Fig 6E). In parallel, we performed ranked-list GSEA on the ND- and T2D-β^obese-DEGAS subpopulations, which was largely consistent with gene ontology results (Fig 6F). Significantly enriched pathways for ND-β^obese-DEGAS included the UPR and proliferation (E2F and MTORC signaling hallmarks) (Fig S4I), while T2D-β^obese-DEGAS enriched pathways were similar to those of high-scoring β^T2D-DEGAS cells and included hypoxia and glycolysis (Fig S4J). Examples of genes driving the GO and GSEA categories are shown, as well as CDKN1C and DLK1, given their differential enrichment in both β^T2D-DEGAS and T2D-β^obese-DEGAS cells (Fig 6G). The transcript heterogeneity for genes highlighted in Fig 6G can be observed in single β-cell expression plots (Fig S4H). Our results support the idea that these genes and pathways may underlie heterogeneity in β-cell resilience or susceptibility to obesity-related stress in the context of T2D development.
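The two-step selection of ND-β^obese-DEGAS and T2D-β^obese-DEGAS cells (thresholding on the obesity-association score, then splitting by donor diabetes status) can be sketched in Python as follows. The use of "strictly above the median" as the high-scoring criterion, the function name `split_high_obese_cells`, and the example records are assumptions for illustration only.

```python
# Illustrative sketch only: keep beta cells whose obesity-association score is
# above the median, then group them by the diabetes status of their donor.
# Cell identifiers, scores, and statuses are placeholders.
from statistics import median


def split_high_obese_cells(cells):
    """cells: list of dicts with 'id', 'score', and 'status' ('ND' or 'T2D')."""
    cutoff = median(c["score"] for c in cells)
    groups = {"ND": [], "T2D": []}
    for c in cells:
        if c["score"] > cutoff:
            groups[c["status"]].append(c["id"])
    return groups


example_cells = [
    {"id": "c1", "score": 0.1, "status": "ND"},
    {"id": "c2", "score": 0.2, "status": "T2D"},
    {"id": "c3", "score": 0.3, "status": "ND"},
    {"id": "c4", "score": 0.4, "status": "ND"},
    {"id": "c5", "score": 0.5, "status": "T2D"},
]
groups = split_high_obese_cells(example_cells)
```

Note that the score comes from the bulk-trained model while the ND/T2D label comes from the single-cell donor metadata, so the two criteria draw on independent sources of information.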

Two subpopulations of high β^obese-DEGAS cells defined by underlying single cell donor T2D status uncovers stress signature in non-diabetic high-scoring obese-DEGAS β-cells.
A) β-cells were subsetted from Fig 5A based on INS expression and obese-DEGAS scores overlaid onto the single cells. B) β-cells were further subsetted based on both their high obese-DEGAS scores and by the underlying T2D status of the single cell donors, leading to two major subpopulations, ND-β^obese-DEGAS and T2D-β^obese-DEGAS cells. C) Differential expression analysis comparing ND-β^obese-DEGAS and T2D-β^obese-DEGAS subpopulations. Genes with p<0.05 and >1.5-fold change are highlighted in red. Gene ontology (GO) analysis of genes enriched in the ND-β^obese-DEGAS (D) and T2D-β^obese-DEGAS (E) subpopulations. F) Gene set enrichment analysis (GSEA) of ND- vs. T2D-β^obese-DEGAS subpopulations. G) Bubble plot highlighting genes driving GO and GSEA categories.

DLK1 is a candidate identified in β^T2D-DEGAS and ND-β^obese-DEGAS cells and is heterogeneously expressed among diabetic and non-diabetic human pancreatic islets

To support the DEGAS approach for identifying true markers of β-cell subpopulations or heterogeneity associated with diabetes, we selected a candidate gene for immunostaining in FFPE human pancreas sections from non-diabetic and T2D donors. The subpopulations of cells for both β^T2D-DEGAS and T2D-β^obese-DEGAS exhibited a subset of overlapping enriched genes. One of the more interesting candidates we noted was DLK1 (Delta-Like 1 Homolog), which is known to have strong associations with T1D ³⁵ and T2D ³⁶. DLK1 expression highly overlapped with high-scoring β^T2D-DEGAS cells (Fig 7A) and with T2D-β^obese-DEGAS cells (Fig 7B). DLK1 immunostaining primarily colocalized with β-cells in non-diabetic human pancreas (Fig 7C). DLK1 showed heterogeneous expression within islets and between islets within the same pancreas section, wherein some islets had DLK1/INS co-staining in most β-cells and other islets had only a few DLK1⁺ β-cells. In T2D pancreas, DLK1 staining was much less intense and present in fewer β-cells, yet DLK1⁺/INS⁺ cells were still observed (Fig 7C). This contrasts with the relatively higher DLK1 gene expression seen in the β-cells from the β^T2D-DEGAS and T2D-β^obese-DEGAS subpopulations (Fig 4D & 6C) as highlighted in Fig 7A,B.

DLK1 is a β-cell gene heterogeneously expressed between cells and islets of non-diabetic and T2D humans.
A) Formalin-fixed paraffin-embedded (FFPE) sections of human pancreas from non-diabetic (ND) and T2D donors were stained for INS and DLK1. Representative images are shown from three different ND and T2D donors. Arrows indicate INS⁺/DLK1⁺ cells. B) UMAP plot overlaying DLK1 expression and β^T2D-DEGAS (i) or β^obese-DEGAS (ii) scores.

Discussion

Deep transfer learning is a useful approach to identify disease-associated genes in subsets of heterogeneous islet cells

The rapid advancement of deep learning has unveiled new opportunities in diabetes research, specifically the ability to merge and analyze the large amount of publicly available omics data. The DEGAS deep transfer learning framework provides a powerful and versatile tool allowing time-to-event outcomes and classification outcomes to be transferred to individual cells. Notably, this direct transfer to single cells avoids the cluster resolution problems that arise using deconvolution approaches, allowing subsets of cells within a cluster to be assigned different associations with clinical outcomes. In this way, DEGAS allows the definition of essentially new cell subpopulations that would not necessarily be identified by standard clustering algorithms, and these subpopulations are rationally selected based on assigned disease association scores. A key point from our study is that the disease-associated single cells identified by DEGAS are those cells whose transcriptomic signatures share complex patterns with patient transcriptomics that have known clinical and metabolic attributes. Because of this, the genes identified as up- or down-regulated in subsets of β-cells may be: 1) altered independent of disease due to genetic background of donors; 2) altered downstream of the obese or diabetic states; or 3) representative of resilient or dysfunctional β-cells in the face of metabolic syndrome. Through careful cohort inclusion criteria and batch effect removal, our analyses aim to limit these caveats when possible.

Various deconvolution methods enable estimating relative proportions of cell types in bulk samples using scRNA-seq data (e.g. BSEQ-sc ¹, MuSiC ³⁷, CIBERSORT ³⁸, and DWLS ³⁹). These methods have been instrumental in identifying changes to relative quantity of cell types within normal and disease tissues; however, they do not provide detailed information about subsets of cell types which may be related to disease and are limited to the resolution of the predefined cell types through clustering. Early attempts to address this include unsupervised learning algorithms (e.g. RaceID) ⁴⁰. Now, there are multiple advanced computational approaches to identify subsets of cells related to disease status, survival, drug response, and other disease metrics. These tools can be broadly categorized as ‘cell prioritization algorithms’ and include: DEGAS ³¹, scAB ⁴¹, Scissor ⁴², and scDEAL ⁴³. Scissor and scAB can assign survival or clinical information from bulk expression data to disparate scRNA-seq datasets using regression and matrix factorization, respectively. scDEAL is a deep learning framework specifically designed for the assignment of drug response information from bulk expression data to subsets of single cells. The major advantages of DEGAS are 1) the ability to select subsets of β-cells at an individual cell resolution based on their disease association; 2) unique qualities of the highly non-linear DEGAS technology vs. linear models of other tools to detect complex molecular programs; and 3) speed and scalability based on efficient implementation. DEGAS has the potential to be used on datasets containing well over 1 million cells, unlike other methods.

Nevertheless, certain linear model-based approaches, like RePACT (regressing principal components for the assembly of continuous trajectory), have successfully uncovered genes associated with β-cell heterogeneity in T2D and obesity ⁵. In that study, T2D and obesity ‘trajectory’ genes were identified. We compared the genes enriched in our high-scoring β^T2D-DEGAS and in our T2D-β^obese-DEGAS cells with the corresponding T2D and obesity trajectory genes from Fang et al., respectively (Fig S5, Table S5). We noted substantial agreement between DEGAS and RePACT; for example, the transcription factor SIX3 was enriched in T2D-β^obese-DEGAS cells and in Fang et al.’s obese trajectory β-cells ⁵. SIX3 has been implicated as a diabetes risk gene and is important for β-cell function ^{44, 45}. Additionally, both approaches highlighted the association of DLK1 with obesity, but only DEGAS identified DLK1 association with T2D. Overall, the directionality of T2D or obesity gene associations agreed between DEGAS and RePACT results. However, not all the same genes were identified by both approaches; for example, FXYD2 was identified via RePACT, but not in DEGAS, although FXYD2 is a down-regulated gene in T2D in Marselli et al. and Asplund et al. (Fig 1G). Interestingly, FXYD2 was recently shown to be a downstream effector gene of HNF1A in single human β-cells ⁴⁶. Down-regulation of HNF1A in T2D β-cells, and reduced FXYD2 expression as a result, may contribute to membrane hyperpolarization and reduced function ⁴⁶. Distinct from the application of RePACT in Fang, et al.⁵, we did not observe any differences in HNF1A expression in our DEGAS analyses and FXYD2 was slightly enriched, but not significantly altered, in T2D-β^obese-DEGAS cells. Taken together, it is necessary to apply multiple approaches to merged sets of publicly available data because each approach will likely uncover unique and important subsets of β-cells or specific genes.

Differences between high- and low-scoring β^T2D-DEGAS subpopulations

High-scoring β^T2D-DEGAS cells had enrichment of pathways like hypoxia and TNFα signaling, which contain some overlapping genes like BTG1, BTG2, and BHLHE40. BHLHE40 was recently shown to cause hypoxia-induced β-cell dysfunction by interfering with MAFA expression ⁴⁷. BTG1 and BTG2 are known as anti-proliferation factors. Given the expression of both BTG1 and CDKN1C in high-scoring β^T2D-DEGAS cells, it is possible these cells have very low proliferative potential, even in the context of β-cells, which are well known to have low proliferation. Although some genes like CDKN1C can be identified as differentially expressed in simple comparisons of all T2D vs ND donor β-cells in scRNA-seq data, DEGAS has the advantage of prioritization or ranking of candidates. For example, CDKN1C is within the top ten enriched candidates (by adjusted P value and fold-change) in the β^T2D-DEGAS cluster but is 286^th when simply comparing all T2D vs ND single β-cells (Table S3). Additionally, deletion of CDKN1C has been shown to improve human islet function ⁴⁸ and gain-of-function mutants are associated with diabetes development and hyperinsulinism ^{49, 50}.

The low-scoring β^T2D-DEGAS cells were enriched in pathways involving oxidative phosphorylation and Myc targets. We also noted enrichment of many small and large ribosomal genes (RPS/RPL) in this subpopulation. RPS/RPL ribosomal genes are known to be highly expressed in primary human β-cells ⁵¹, and glucose stimulation induced translation of over 50 different RPS/RPL genes in the human β-cell line EndoC-βH2 ⁵². This suggests that low-scoring β^T2D-DEGAS cells may have relatively increased translation and metabolic activity. Taken together, high-scoring β^T2D-DEGAS cells may be reflective of dysfunctional β-cells in either a pre-T2D or T2D-onset state, while low-scoring β^T2D-DEGAS cells could potentially represent more resilient β-cells.

Differences between ND-β^obese-DEGAS and T2D-β^obese-DEGAS subsets of β-cells

Our understanding of why some individuals with obesity develop T2D while others do not is incomplete. A 2021 NHANES report indicated a 41.9% prevalence for obesity in US adults aged 20 and over, with a 14.8% prevalence for diabetes ⁵³, in line with CDC reports ⁵⁴. This suggests that many individuals with obesity have either not progressed, or may never progress, to T2D. This phenomenon may be related to those individuals categorized as metabolically healthy obese as compared to metabolically unhealthy obese ⁵⁵. Expanded work with DEGAS may uncover specific genes that underlie this relationship. In our analysis we were able to stratify based on T2D status of the single cell donors and overlay the relative obesity-association scores from DEGAS. That analysis allowed us to set rational thresholds for grouping subsets of β-cells to compare ND- and T2D-β^obese-DEGAS cells. Perhaps counterintuitively, UPR genes were highly enriched in the ND-β^obese-DEGAS cells as opposed to the T2D-β^obese-DEGAS cells. It is possible these ND-β^obese-DEGAS cells are in a stressed pre-diabetic state on the path to T2D. Dominguez-Gutierrez et al. analyzed islet scRNA-seq data using pseudotime analysis and identified three major β-cell states which were defined by their insulin expression and UPR level ⁵⁶. The authors speculate that β-cells periodically pass through these states, which in scRNA-seq are visualized as subpopulations of cells. Previously, an FTH1 subpopulation of β-cells was identified with implications in the UPR ⁵⁷. In agreement with those results, we found FTH1 was upregulated in ND-β^obese-DEGAS cells. Therefore, another possibility is that the UPR induced in ND-β^obese-DEGAS cells marks a population with lower functionality but higher proliferative potential to combat the insulin resistant obese state, and the relatively lower UPR in T2D-β^obese-DEGAS cells possibly correlates with enhanced functionality in established T2D.
Concordantly, T2D-β^obese-DEGAS cells had enrichment of hormone secretion genes (e.g. SYT7, G6PC2, NEUROD1, UCN3, FFAR1) in our pathway analysis (Fig 6E,F). In a study of human, mouse, and pig islet scRNA-seq data, subpopulations of ‘stressed’ β-cells were identified that exhibited enriched hallmark pathways similarly to what we observe in ND-β^obese-DEGAS cells ⁶. Thus, multiple studies including our current analysis support the existence of subpopulations of stressed β-cells, whether transient or stable, that could in principle be targeted by therapeutics.

We also looked for enrichment of potential secreted biomarkers in high-scoring β^T2D-DEGAS and T2D-β^obese-DEGAS cells and identified LRPAP1 and C1QL1. LRPAP1 encodes the LDL receptor-related protein-associated protein 1 (LRPAP) and is enriched in this subpopulation. LRPAP is ubiquitously expressed and is predicted to be an ER resident protein ⁵⁸. LRPAP appears to be heterogeneously expressed among islet cells and was detected in human plasma in the Human Protein Atlas ⁵⁹. C1QL1 encodes a secreted peptide that is highly expressed in human islets ⁶⁰ and was enriched in β^T2D-DEGAS cells, although little else is known about its role in β-cells. Increased C1QL1 release could potentially signal through its GPCR BAI3 to suppress insulin secretion of surrounding β-cells, as has been shown for C1QL3 ⁶¹.

Potential for DEGAS in identifying β-cell heterogeneity markers

Subsets or subpopulations of β-cells are an emergent property of β-cell heterogeneity. Two of the most widely used scRNA-seq datasets have also identified significant heterogeneity within pancreatic cell types, notably β-cells ^{7, 57}. Various markers have been identified to define these β-cell subpopulations, including expression of ST8SIA1 and CD9 ⁶² or Flattop expression ⁶³. Additionally, UCN3 marks mature β cells ⁶⁴ and RBP4⁺ β-cells correlated with reduced function ⁶⁵. Functional heterogeneity has also been described as in the case of leader/first-responder β cells ^{66, 67} or hub β cells ⁶⁸, and has been linked to transcription factors including PDX1⁶⁹ and HNF1A ⁴⁶. Nevertheless, cellular heterogeneity within a single cell type is complicated and there is uncertainty in the single cell analytics field about what constitutes a stable cell type versus a transient cell state ⁷⁰. Tying transcriptionally distinct clusters to function or morphology may be key to making this distinction ⁷¹. By inferring disease outcomes in high resolution cellular subtypes, application of DEGAS can help to inform this debate.

In our analysis, DLK1 expression was enriched in high-scoring β^T2D-DEGAS and in T2D-β^obese-DEGAS cells. We also observed heterogeneous DLK1 staining of β-cells from human pancreas sections. DLK1 was previously noted to be enriched in subpopulations of β-cells ^{1, 2, 4, 8, 72}; however, its heterogeneity in transcript and protein abundance in non-diabetic versus T2D human islet β-cells had not been described until now. DLK1 is a maternally-imprinted gene ⁷³ with described roles in Notch signaling ^{74–76} and glucose homeostasis ⁷⁷. Supporting an active role in β-cell disease, a recent preprint reported DLK1 was required for proper maturation, function, and stress resilience of β-like cells differentiated from human embryonic stem cells ⁷⁸. Our results combined with the published data indicate DLK1 as a significant player in β-cell function in diabetes. DLK1 may serve as a surface marker of, or be released from, stressed or at-risk β-cells. However, increased sample sizes and in-depth analyses of non-diabetic, T1D, and T2D human islets will be required to better describe the differences seen in transcript and protein abundance of DLK1 between β-cells within the same islet, between islets of the same and different donors, and across different stages of diabetes development.

Applying tools like DEGAS to available data will increase our understanding of the underlying β-cell features associated with progression to T2D or with an obese non-T2D state. Expanded DEGAS analyses will be needed to include scRNA-seq data from islets subjected to various models of T1D/T2D (e.g. cytokines, glucolipotoxicity). Major questions include whether the gene candidates identified by DEGAS are protective, disease-causing, or simply markers in β-cell subpopulations. In the future, these questions may be addressed by functional validation studies or observations in genome-wide association studies.

Limitations of the study and future directions

We think the approach of joining islet omics data to deep learning and artificial intelligence is an area in need of increased attention; however, we appreciate that our current study has limitations. First, our analysis utilized a single bulk RNA-seq human islet dataset, although the Marselli study contains a relatively large number of islet donors compared to most available datasets. Second, we merged only five scRNA-seq human islet datasets for this application of DEGAS. The datasets were chosen because of the extensive meta-analyses to which they have already been subjected ⁸, and their use as benchmarking data in many scRNA-seq analysis tools ^{79–81}. scRNA-seq has a limitation on the number of detectable genes in each cell, as compared to bulk RNA-seq samples. While these limitations may eventually be mitigated by improved technologies, we can begin to overcome these by merging a larger number of human islet bulk and scRNA-seq studies which contain more donors. Increased sample sizes and study variables come with their own challenges; however, the future gene candidates identified by DEGAS will be of even higher confidence. Additionally, scRNA-seq is a snapshot in time and it may be tenuous to claim that a particular cell state is transient or persistent. Supporting a claim that a given subset/subtype of β-cells represents an actual stable population in scRNA-seq data requires, at a minimum, finding that the cells comprising the candidate population occur across multiple individual donors. In future studies, our use of the diabetes field’s considerable investment in islet transcriptomics data and state-of-the-art cell prioritization tools will enable simultaneous identification of disease-associated cellular subtypes and associated biomarkers of function and dysfunction. Additionally, increasing the number of human pancreas donor tissue samples for high-content image analysis will improve confidence in candidate gene validation.

Potential implications

DEGAS has the potential to be applied to even larger mergers of single cell data (> 1 million cells) using the newly developed DEGAS atlas implementations. A vast amount of human (and other species) islet single cell transcriptomics data is publicly available, but often requires substantial reformatting of metadata and realigning of reads to be harmonized. As our endeavors and those of others proceed, DEGAS will lead to even higher confidence predictions of β-cell subpopulations. It is also apparent from our DEGAS analyses that other non-β-cells are highlighted in the islet single cell map for associations with T2D and obesity. Although outside the scope of this work, further exploration of our current DEGAS analyses, or of analyses using larger single cell integrations, will have implications for the other major islet and pancreas cell types, including α-cells, δ-cells and PP-cells. For example, PP cells have been shown to have a role in pancreatogenic diabetes as opposed to T2D ⁸², and there appear to be subpopulations of high and low scoring PP^T2D-DEGAS cells within the DEGAS analysis (Fig 3A). Our successful application of deep transfer learning in human islet data opens the possibility of predicting subtypes of β-cells in other diseases, like T1D and congenital hyperinsulinism, to find potential biomarkers and therapeutic candidates. In some rare diseases, like congenital hyperinsulinism, single cell data is not available; however, spatial transcriptomics (ST) could be applied to FFPE sections. DEGAS has already been implemented to analyze ST data in prostate cancer ⁸³, and will be applicable to human pancreas as ST technology continues to advance. It is important to note that DEGAS is just one of many machine learning approaches to analyze transcriptomic data. It will be important to include DEGAS in combination with both linear and other non-linear models to capture as many relevant gene candidates as possible.

Methods

Transfer learning using DEGAS on human islet transcriptomic data

The integrated single cell and bulk datasets were processed with the DEGAS (v1.0) pipeline (https://github.com/tsteelejohnson91/DEGAS) ³¹ to calculate disease risk scores associated with T2D status or BMI status. The bulk expression data were scaled and normalized prior to DEGAS analysis using the preprocessCounts function. The merged scRNA-seq expression matrix, the bulk expression matrix, and donor sample labels (matched with the bulk samples) were used as input. The intersection of highly variable genes between the scRNA-seq and bulk expression data was used for further analysis. The DEGAS model was trained on the formatted data and then used for prediction. Updated scripts and instructions for running DEGAS are available on GitHub (version 1.0) (https://github.com/tsteelejohnson91/DEGAS). DEGAS for T2D and BMI were run independently.
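
As an illustration of the gene-matching step described above, the following minimal Python sketch intersects two gene lists and subsets both expression matrices to the shared genes. It is not the DEGAS implementation. The function names (`shared_variable_genes`, `subset_by_genes`), the dictionary-based matrix layout, and the placeholder gene names and values are assumptions made for illustration only.

```python
def shared_variable_genes(sc_hvgs, bulk_hvgs):
    """Return genes present in both lists, keeping the order of sc_hvgs."""
    bulk_set = set(bulk_hvgs)
    shared = []
    for gene in sc_hvgs:
        if gene in bulk_set and gene not in shared:
            shared.append(gene)
    return shared


def subset_by_genes(expression, genes):
    """Keep only the requested genes from a {gene: [values]} mapping."""
    return {gene: expression[gene] for gene in genes}


# Toy inputs (hypothetical gene names and values, not study data).
sc_expr = {"GENE_A": [0, 3, 1], "GENE_B": [2, 2, 5], "GENE_C": [1, 0, 0]}
bulk_expr = {"GENE_B": [10.0, 12.5], "GENE_C": [4.0, 3.5], "GENE_D": [7.0, 7.2]}

genes = shared_variable_genes(list(sc_expr), list(bulk_expr))
sc_matched = subset_by_genes(sc_expr, genes)
bulk_matched = subset_by_genes(bulk_expr, genes)
print(genes)
```

Restricting both matrices to the same ordered gene list is what allows a single model to accept single-cell and bulk samples as inputs with identical feature dimensions.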

In DEGAS for T2D, the donor labels were “normal” vs “T2D.” In DEGAS for BMI metadata, donors from bulk RNA-seq were categorized and labeled as “lean” (<25 BMI), “overweight” (25-30 BMI) and “obese” (>30 BMI) according to CDC guidelines. The models were trained to calculate T2D-association scores and BMI-association scores for each of the respective categories. We identified differentially-expressed genes using FindMarkers function in β-cells with high vs low T2D-association scores (β^T2D-DEGAS) or high obesity scores (β^obese-DEGAS) among healthy or T2D donor single-cells.
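
The BMI labeling rule above can be sketched in a few lines of Python. This is an illustrative helper, not code from the study: the function name `bmi_category` is hypothetical, and the handling of the exact boundary values (25 and 30) is an assumption read literally from the ranges stated in the text.

```python
def bmi_category(bmi):
    """Map a BMI value to the donor label used for the BMI-based DEGAS run.

    Boundaries follow the text literally: <25 lean, 25-30 overweight, >30 obese.
    Assigning exactly 25 and exactly 30 to "overweight" is an assumption.
    """
    if bmi < 25:
        return "lean"
    if bmi <= 30:
        return "overweight"
    return "obese"


# Hypothetical donor BMI values, for illustration only.
donor_bmis = [22.4, 25.0, 29.9, 30.0, 34.7]
labels = [bmi_category(b) for b in donor_bmis]
print(labels)
```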

Subclustering of β-cells and post-DEGAS scRNA-seq analysis

We isolated the β-cell cluster using the subset function and reclustered the cells separately from the other islet cell types. The β-cells were further classified by different thresholding parameters including median and quantiles. For quantile and median thresholds, we generated high, medium and low groups, as well as upper 50% and lower 50% groups relative to the median. We used the FindMarkers function to identify differentially-expressed genes in high vs. low (Fig 4D), high vs. medium+low, and median (upper 50% vs lower 50%) (Table S3).
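
To make the thresholding schemes concrete, the sketch below groups a vector of per-cell scores by a median split and by quantile cutoffs. It is a simplified stand-in for the analysis, which was performed on Seurat objects; the function names, the linear-interpolation quantile convention, and the 20%/80% cutoffs used in the example are assumptions for illustration, and the scores are toy values.

```python
import statistics


def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def median_split(scores):
    """Label each cell as upper or lower half relative to the median score."""
    cutoff = statistics.median(scores)
    return ["upper" if s > cutoff else "lower" for s in scores]


def three_way_split(scores, low_q, high_q):
    """Label each cell as low, medium, or high using two quantile cutoffs."""
    low_cut = quantile(scores, low_q)
    high_cut = quantile(scores, high_q)
    labels = []
    for s in scores:
        if s > high_cut:
            labels.append("high")
        elif s <= low_cut:
            labels.append("low")
        else:
            labels.append("medium")
    return labels


# Toy association scores for ten cells (not study data).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
halves = median_split(scores)
groups = three_way_split(scores, low_q=0.2, high_q=0.8)
print(halves.count("upper"), groups.count("high"), groups.count("low"))
```

The resulting labels can then serve as the two identity classes in a differential expression comparison such as high vs. low or high vs. medium+low.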

Pathway analysis

To analyze differentially-expressed genes that were either up- or down-regulated between clusters, enrichment analyses and plot generation were completed using the ClusterProfiler, EnhancedVolcano, and ggplot2 R packages in R version 4.2.2 and RStudio. Enrichment of gene sets within the Biological Process GO terms is shown and ClusterProfiler outputs are provided in Table S4. Venn diagrams for comparing DEGAS and RePACT hit genes (Table S5) were created using https://molbiotools.com/listcompare.php. Gene set enrichment analysis was run using GSEA software ^{84, 85} downloaded from www.gsea-msigdb.org/gsea and the MSigDB Hallmark Gene Set ⁸⁶ from the Broad Institute. For bulk RNA-seq analysis, the edgeR results containing all expressed genes were used as input to run standard GSEA. For GSEA analysis of comparisons between subpopulations of β-cells from scRNA-seq data, a ranked list based on log₂fold-change was generated without a cutoff for adjusted P value. GSEAPreranked mode used the ‘classic’ enrichment statistic with 1000 permutations.
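
The preranked analysis can be illustrated with a small Python sketch that orders genes by log₂fold-change and computes an unweighted running-sum enrichment score, which is the form the ‘classic’ statistic takes in the GSEA method. This is a didactic re-implementation under stated assumptions, not the GSEA software itself: the function names, gene names, and values are hypothetical, and permutation-based significance testing is omitted.

```python
def ranked_list(log2fc_by_gene):
    """Sort genes from most up-regulated to most down-regulated."""
    return sorted(log2fc_by_gene.items(), key=lambda item: item[1], reverse=True)


def classic_enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: step up at set members, down otherwise.

    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy differential expression results (hypothetical genes and values).
log2fc = {"GENE_A": 2.0, "GENE_B": 1.5, "GENE_C": 0.2, "GENE_D": -0.5, "GENE_E": -1.8}
ranked = ranked_list(log2fc)
es = classic_enrichment_score([gene for gene, _ in ranked], {"GENE_A", "GENE_B"})
print(ranked[0][0], round(es, 3))
```

In this toy case both members of the gene set sit at the top of the ranking, so the running sum reaches its maximum before any non-member is encountered.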

Human pancreas tissue staining and microscopy

Formalin-fixed paraffin-embedded (FFPE) de-identified human pancreas tissue samples were obtained through the NDRI (Table S6). Tissue was processed into 5 µm sections and mounted on glass slides by the Indiana University School of Medicine Histology Lab Service Core. Slides were deparaffinized by xylene and ethanol washes. Antigen retrieval was performed by heating for 40 min in an Epitope Retrieval Steamer with slides submerged in Epitope Retrieval Solution (IHC-Tek). Subsequently, slides were placed onto disposable immunostaining coverplates and inserted into the Sequenza slide rack (Ted Pella / EMS) for washing, blocking, and antibody incubations. After three 10 min washes in IHC wash buffer (0.1% Triton X-100 and 0.01% sodium azide in PBS pH 7.4), slides were blocked for 1 h at room temperature in normal donkey serum (NDS) block solution (5% donkey serum, 1% bovine serum albumin, 0.3% Triton X-100, 0.05% Tween-20, and 0.05% sodium azide in PBS pH 7.4). Slides were then incubated overnight at 4°C with primary antibodies diluted in NDS block solution (Table S7). After three washes in IHC wash buffer (200 µL each), slides were incubated in secondary antibodies in NDS block solution for 1 h at room temperature. The washed slides were mounted in polyvinyl alcohol (PVA) mounting medium (5% PVA, 10% glycerol, 50 mM Tris pH 9.0, 0.01% sodium azide) with DAPI (300 nM) added and imaged on a Zeiss LSM710 confocal microscope equipped with a Plan-Apochromat 20x/0.8 objective (#420650-9901). Images were processed in the Zeiss Zen software to add scale bars, set coloration for channels, and generate merged images. Scale bars indicate 50 µm.

Statistical Analysis

In edgeR bulk RNA-seq differential gene expression analysis, the likelihood ratio test was applied. In the FindMarkers function in Seurat, the default Wilcoxon Rank Sum test was used for differential gene expression analysis. Cut-offs for significant differentially expressed genes were set at |log₂fold-change| > 0.58 and adjusted p-value < 0.05.
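
The significance cut-offs can be expressed as a simple filter, sketched below in Python. The helper name `is_significant` and the example records are hypothetical and do not come from the study. As a point of orientation, a |log₂fold-change| of 0.58 corresponds to a change of approximately 1.5-fold in either direction.

```python
LOG2FC_CUTOFF = 0.58  # absolute log2 fold-change threshold stated in the text
PADJ_CUTOFF = 0.05    # adjusted p-value threshold stated in the text


def is_significant(log2fc, padj, lfc_cutoff=LOG2FC_CUTOFF, padj_cutoff=PADJ_CUTOFF):
    """Apply both cut-offs; a gene must pass each one to be retained."""
    return abs(log2fc) > lfc_cutoff and padj < padj_cutoff


# Toy records of (gene, log2 fold-change, adjusted p-value); not study data.
results = [
    ("GENE_A", 1.20, 0.001),   # passes both cut-offs
    ("GENE_B", -0.75, 0.030),  # passes both cut-offs (down-regulated)
    ("GENE_C", 0.40, 0.001),   # fails the fold-change cut-off
    ("GENE_D", 2.10, 0.200),   # fails the adjusted p-value cut-off
]
significant = [gene for gene, lfc, padj in results if is_significant(lfc, padj)]
fold_change_equivalent = 2 ** LOG2FC_CUTOFF
print(significant, round(fold_change_equivalent, 2))
```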

Supporting information

Availability of source code and requirements

Project name: Islet DEGAS
Project home page: https://github.com/kalwatlab/Islet_DEGAS_v1
Operating system(s): Platform independent
Programming language: R and Python
Other requirements: not applicable
License: ©

Data availability

The raw data sets supporting the results of this article are available in the NCBI GEO and the ArrayExpress repositories under the following persistent identifiers (also shown in Table S1): bulk RNA-seq: GSE159984; scRNA-seq: GSE84133, GSE85241, GSE86469, E-MTAB-5061, GSE81608.

Declarations

List of abbreviations

ND: non-diabetic
T1D: type 1 diabetes
T2D: type 2 diabetes
RNA-seq: RNA sequencing
scRNA-seq: single cell RNA sequencing
FFPE: formalin-fixed paraffin-embedded
DEGAS: Diagnostic Evidence GAuge of Single cells
DLK1: Delta Like Non-Canonical Notch Ligand 1
MANE: Matched Annotation from NCBI and EBI
RRID: Research Resource identifier
GSEA: gene set enrichment analysis
MSigDB: Molecular signatures database
BMI: body mass index
UMAP: uniform manifold approximation and projection
β^T2D-DEGAS: T2D-DEGAS disease association score for beta cells
β^obese-DEGAS: obese-DEGAS disease association score for beta cells
ND-β^obese-DEGAS: obese-DEGAS disease association score for beta cells from ND donors
T2D-β^obese-DEGAS: obese-DEGAS disease association score for beta cells from T2D donors
GO: gene ontology
RePACT: regressing principal components for the assembly of continuous trajectory
NDS: normal donkey serum
PBS: phosphate buffered saline

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by internal funds at the Indiana Biosciences Research Institute (M.A.K. and T.S.J.), AnalytixIN (T.S.J.), Indiana University Precision Health Initiative (T.S.J.), 1R01GM148970 (T.S.J.), 1R21CA264339 (T.S.J.).

Authors’ contributions

G.R., R.S., O.L., S.R., S.D.M., A.M.M., T.S.J., and M.A.K. performed data analyses. G.R. and D.R. performed experiments. M.A.K. and T.S.J. conceived and initiated this project. M.A.K. and T.S.J. supervised the project. A.M.M. provided processed data files. G.R., T.S.J. and M.A.K. wrote the manuscript. All authors read and approved the manuscript.

Acknowledgements

We thank Dr. Mark Huising for helpful conversations and providing access to processed data. Thank you to Dr. Andrew Templin for helpful conversations and critical review of the manuscript. Thank you to Dr. Anthony Piron at Université Libre de Bruxelles for assistance with metadata mapping for GSE159984. This work was supported by the Histology Core of the Indiana Center for Musculoskeletal Health at IU School of Medicine and the Bone and Body Composition Core of the Indiana Clinical Translational Sciences Institute (CTSI). A specific thanks to Drew Brown in the Histology Lab Service Core for assisting with pancreas sectioning.

Supplemental Information

Islet Gene View results of DEGs found in both Marselli et al. and Asplund et al.
*FFAR4* (A), *UNC5D* (B), *SFRP4* (C), and *PODN* (D) were each significantly regulated in T2D vs. non-T2D human islet RNA-seq data in Marselli et al. and Asplund et al. Plots shown were generated using Islet Gene View browser (https://mae.crc.med.lu.se/IsletGeneView/). *FFAR4* and *UNC5D* were both down-regulated in T2D and *SFRP4* and *PODN* were both up-regulated.

Human islet single cell RNA-seq clusters and diabetes marker genes for all cells and β-cells.
A) Seurat clustering of merged human islet scRNA-seq data. B) Study identifiers overlaid on single cell plot. C) Differential gene expression analysis comparing all cells from T2D donors vs. non-diabetic (ND) donors. D) β-cells reclustered and labeled based on donor T2D status. E) Differential expression analysis of β-cells from T2D vs. ND donors.

Thresholding analysis for β^T2D-DEGAS scores and differential gene expression analysis and plots.
A) Quantile thresholding and volcano plot comparing the upper 20% of β^T2D-DEGAS scores to the lower 80%. B) Median thresholding and volcano plot comparing the upper 50% vs lower 50% of β^T2D-DEGAS scoring cells. C) Gene expression overlays on β-cells of selected genes that drive enrichment of GO terms and GSEA categories. D) GSEA results for high-scoring β^T2D-DEGAS cells. E) GSEA results for low-scoring β^T2D-DEGAS cells.

Score overlay on all islet cells for β^lean-DEGAS (A) and β^overweight-DEGAS (B). C,D) Similar score overlays are also shown for β-cells alone. E) Median thresholding of β^obese-DEGAS score. F) Differential expression analysis for high vs low β^obese-DEGAS scores based on median threshold. G) Subsetting highest-scoring cells based on β^obese-DEGAS score. H) Gene expression overlays on β-cell UMAP. I) GSEA results for ND-β^obese-DEGAS cells. J) GSEA results for T2D-β^obese-DEGAS cells.

Overlap of RePACT trajectory genes with genes identified by DEGAS in human islet scRNA-seq data.
Genes from β^T2D-DEGAS (A), or ND-β^obese-DEGAS and T2D-β^obese-DEGAS (B) that had adjusted p-values < 0.05 and |log₂fold-change| > 0.58 were selected for comparison to Fang et al. Table S4 T2D and obesity trajectory genes, respectively.

Additional Files

Table S1. List of bulk and single cell transcriptomic datasets.

Table S2. Bulk RNA-seq metadata, analysis, and comparisons to publicly-available datasets.

Table S3. Differential gene expression in DEGAS-identified β-cell subpopulations.

Table S4. Pathway analysis results.

Table S5. FFPE human pancreas metadata.

Table S6. Antibodies.

Table S7. Comparison of DEGAS T2D and obesity genes to Fang et al. RePACT trajectory genes.

References

1. Baron M, Veres A, Wolock SL, Faust AL, Gaujoux R, Vetere A, et al. (2016) A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure. Cell Syst 3:346–60. https://doi.org/10.1016/j.cels.2016.08.011
2. Li J, Klughammer J, Farlik M, Penz T, Spittler A, Barbieux C, et al. (2016) Single-cell transcriptomes reveal characteristic features of human pancreatic islet cell types. EMBO Rep 17:178–87. https://doi.org/10.15252/embr.201540946
3. Wang YJ, Schug J, Won KJ, Liu C, Naji A, Avrahami D, et al. (2016) Single-Cell Transcriptomics of the Human Endocrine Pancreas. Diabetes 65:3028–38. https://doi.org/10.2337/db16-0405
4. Lawlor N, George J, Bolisetty M, Kursawe R, Sun L, Sivakamasundari V, et al. (2017) Single-cell transcriptomes identify human islet cell signatures and reveal cell-type-specific expression changes in type 2 diabetes. Genome Res 27:208–22. https://doi.org/10.1101/gr.212720.116
5. Fang Z, Weng C, Li H, Tao R, Mai W, Liu X, et al. (2019) Single-Cell Heterogeneity Analysis and CRISPR Screen Identify Key beta-Cell-Specific Disease Genes. Cell Rep 26:3132–44. https://doi.org/10.1016/j.celrep.2019.02.043
6. Tritschler S, Thomas M, Böttcher A, Ludwig B, Schmid J, Schubert U, et al. (2022) A transcriptional cross species map of pancreatic islet cells. Molecular Metabolism. https://doi.org/10.1016/j.molmet.2022.101595
7. Segerstolpe A, Palasantza A, Eliasson P, Andersson EM, Andreasson AC, Sun X, et al. (2016) Single-Cell Transcriptome Profiling of Human Pancreatic Islets in Health and Type 2 Diabetes. Cell Metab 24:593–607. https://doi.org/10.1016/j.cmet.2016.08.020
8. Mawla AM, Huising MO (2019) Navigating the Depths and Avoiding the Shallows of Pancreatic Islet Cell Transcriptomes. Diabetes 68:1380–93. https://doi.org/10.2337/dbi18-0019
9. Elgamal RM, Kudtarkar P, Melton RL, Mummey HM, Benaglio P, Okino ML, et al. An integrated map of cell type-specific gene expression in pancreatic islets. bioRxiv. https://doi.org/10.1101/2023.02.03.526994
10. Fabbrini E, Yoshino J, Yoshino M, Magkos F, Tiemann Luecking C, Samovski D, et al. (2015) Metabolically normal obese people are protected from adverse effects following weight gain. J Clin Invest 125:787–95. https://doi.org/10.1172/JCI78425
11. Voight BF, Scott LJ, Steinthorsdottir V, Morris AP, Dina C, Welch RP, et al. (2010) Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis. Nat Genet 42:579–89. https://doi.org/10.1038/ng.609
12. Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, Chen H, et al. (2007) Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316:1331. https://doi.org/10.1126/science.1142358
13. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316:1341–5. https://doi.org/10.1126/science.1142382
14. Rottner AK, Ye Y, Navarro-Guerrero E, Rajesh V, Pollner A, Bevacqua RJ, et al. (2023) A genome-wide CRISPR screen identifies CALCOCO2 as a regulator of beta cell function influencing type 2 diabetes risk. Nat Genet 55:54–65. https://doi.org/10.1038/s41588-022-01261-2
15. Kim H, Westerman KE, Smith K, Chiou J, Cole JB, Majarian T, et al. (2022) High-throughput genetic clustering of type 2 diabetes loci reveals heterogeneous mechanistic pathways of metabolic disease. Diabetologia. https://doi.org/10.1007/s00125-022-05848-6
16. Marselli L, Piron A, Suleiman M, Colli ML, Yi X, Khamis A, et al. (2020) Persistent or Transient Human beta Cell Dysfunction Induced by Metabolic Stress: Specific Signatures and Shared Gene Expression with Type 2 Diabetes. Cell Rep 33:108466. https://doi.org/10.1016/j.celrep.2020.108466
17. Asplund O, Storm P, Chandra V, Hatem G, Ottosson-Laakso E, Mansour-Aly D, et al. (2022) Islet Gene View-a tool to facilitate islet research. Life Sci Alliance 5. https://doi.org/10.26508/lsa.202201376
18. Morales J, Pujar S, Loveland JE, Astashyn A, Bennett R, Berry A, et al. (2022) A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature 604:310–5. https://doi.org/10.1038/s41586-022-04558-8
19. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment 2008:10008
20. Kalwat MA (2024) Islet DEGAS v1. Mendeley Data. https://doi.org/10.17632/3sdxv5tzbd.1
21. Chen YC, Taylor AJ, Verchere CB (2018) Islet prohormone processing in health and disease. Diabetes Obes Metab 20:64–76. https://doi.org/10.1111/dom.13401
22. Folli F, La Rosa S, Finzi G, Davalli AM, Galli A, Dick EJ, et al. (2018) Pancreatic islet of Langerhans’ cytoarchitecture and ultrastructure in normal glucose tolerance and in type 2 diabetes mellitus. Diabetes Obes Metab 20:137–44. https://doi.org/10.1111/dom.13380
23. Lorenzo PI, Fuente-Martin E, Brun T, Cobo-Vuilleumier N, Jimenez-Moreno CM I GHG, et al. (2015) PAX4 Defines an Expandable beta-Cell Subpopulation in the Adult Pancreatic Islet. Sci Rep 5. https://doi.org/10.1038/srep15672
24. Collombat P, Mansouri A, Hecksher-Sorensen J, Serup P, Krull J, Gradwohl G, et al. (2003) Opposing actions of Arx and Pax4 in endocrine pancreas development. Genes Dev 17:2591–603. https://doi.org/10.1101/gad.269003
25. Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, Jackson AU, et al. (2010) New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet 42:105–16. https://doi.org/10.1038/ng.520
26. Wu CT, Hilgendorf KI, Bevacqua RJ, Hang Y, Demeter J, Kim SK, et al. (2021) Discovery of ciliary G protein-coupled receptors regulating pancreatic islet insulin and glucagon secretion. Genes Dev. https://doi.org/10.1101/gad.348261.121
27. Docherty FM, Riemondy KA, Castro-Gutierrez R, Dwulet JM, Shilleh AH, Hansen MS, et al. (2021) ENTPD3 Marks Mature Stem Cell-Derived beta-Cells Formed by Self-Aggregation In Vitro. Diabetes 70:2554–67. https://doi.org/10.2337/db20-0873
28. Syed SK, Kauffman AL, Beavers LS, Alston JT, Farb TB, Ficorilli J, et al. (2013) Ectonucleotidase NTPDase3 is abundant in pancreatic beta-cells and regulates glucose-induced insulin secretion. Am J Physiol Endocrinol Metab 305:E1319–26. https://doi.org/10.1152/ajpendo.00328.2013
29. Costanzo MC, von Grotthuss M, Massung J, Jang D, Caulkins L, Koesterer R, et al. (2023) The Type 2 Diabetes Knowledge Portal: An open access genetic resource dedicated to type 2 diabetes and related traits. Cell Metab. https://doi.org/10.1016/j.cmet.2023.03.001
30. Forgetta V, Jiang L, Vulpescu NA, Hogan MS, Chen S, Morris JA, et al. (2022) An effector index to predict target genes at GWAS loci. Hum Genet 141:1431–47. https://doi.org/10.1007/s00439-022-02434-z
31. Johnson TS, Yu CY, Huang Z, Xu S, Wang T, Dong C, et al. (2022) Diagnostic Evidence GAuge of Single cells (DEGAS): a flexible deep transfer learning framework for prioritizing cells in relation to disease. Genome Med 14:11. https://doi.org/10.1186/s13073-022-01012-2
32. Weir GC, Gaglia J, Bonner-Weir S (2020) Inadequate β-cell mass is essential for the pathogenesis of type 2 diabetes. The Lancet Diabetes & Endocrinology 8:249–56. https://doi.org/10.1016/s2213-8587(20)30022-x
33. Stampone E, Caldarelli I, Zullo A, Bencivenga D, Mancini FP, Della Ragione F, et al. (2018) Genetic and Epigenetic Control of CDKN1C Expression: Importance in Cell Commitment and Differentiation, Tissue Homeostasis and Human Diseases. Int J Mol Sci 19. https://doi.org/10.3390/ijms19041055
34. Ou K, Yu M, Moss NG, Wang YJ, Wang AW, Nguyen SC, et al. (2019) Targeted demethylation at the CDKN1C/p57 locus induces human beta cell replication. J Clin Invest 129:209–14. https://doi.org/10.1172/JCI99170
35. Wallace C, Smyth DJ, Maisuria-Armer M, Walker NM, Todd JA, Clayton DG (2010) The imprinted DLK1-MEG3 gene region on chromosome 14q32.2 alters susceptibility to type 1 diabetes. Nat Genet 42:68–71. https://doi.org/10.1038/ng.493
36. Kameswaran V, Golson ML, Ramos-Rodriguez M, Ou K, Wang YJ, Zhang J, et al. (2018) The Dysregulation of the DLK1-MEG3 Locus in Islets From Patients With Type 2 Diabetes Is Mimicked by Targeted Epimutation of Its Promoter With TALE-DNMT Constructs. Diabetes 67:1807–15. https://doi.org/10.2337/db17-0682
37. Wang X, Park J, Susztak K, Zhang NR, Li M (2019) Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat Commun 10:380. https://doi.org/10.1038/s41467-018-08023-x
38. Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, et al. (2019) Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 37:773–82. https://doi.org/10.1038/s41587-019-0114-2
39. Tsoucas D, Dong R, Chen H, Zhu Q, Guo G, Yuan GC (2019) Accurate estimation of cell-type composition from gene expression data. Nat Commun 10:2975. https://doi.org/10.1038/s41467-019-10802-z
40. Herman JS, Sagar, Grun D (2018) FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data. Nat Methods 15:379–86. https://doi.org/10.1038/nmeth.4662
41. Zhang Q, Jin S, Zou X (2022) scAB detects multiresolution cell states with clinical significance by integrating single-cell genomics and bulk sequencing data. Nucleic Acids Res 50:12112–30. https://doi.org/10.1093/nar/gkac1109
42. Sun D, Guan X, Moran AE, Wu LY, Qian DZ, Schedin P, et al. (2022) Identifying phenotype-associated subpopulations by integrating bulk and single-cell sequencing data. Nat Biotechnol 40:527–38. https://doi.org/10.1038/s41587-021-01091-3
43.
1. Chen J
2. Wang X
3. Ma A
4. Wang QE
5. Liu B
6. Li L
7. et al.
2022Deep transfer learning of cancer drug responses by integrating bulk and single-cell RNA-seq dataNat Commun 13:6494https://doi.org/10.1038/s41467-022-34277-7 PubMed Google Scholar
44.
1. Broadaway KA
2. Yin X
3. Williamson A
4. Parsons VA
5. Wilson EP
6. Moxley AH
7. et al.
2023Loci for insulin processing and secretion provide insight into type 2 diabetes riskAm J Hum Genet https://doi.org/10.1016/j.ajhg.2023.01.002 PubMed Google Scholar
45.
1. Bevacqua RJ
2. Lam JY
3. Peiris H
4. Whitener RL
5. Kim S
6. Gu X
7. et al.
2021SIX2 and SIX3 coordinately regulate functional maturity and fate of human pancreatic beta cellsGenes Dev 35:234–49https://doi.org/10.1101/gad.342378.120 PubMed Google Scholar
46.
1. Weng C
2. Gu A
3. Zhang S
4. Lu L
5. Ke L
6. Gao P
7. et al.
2023Single cell multiomic analysis reveals diabetes-associated beta-cell heterogeneity driven by HNF1ANat Commun 14:5400https://doi.org/10.1038/s41467-023-41228-3 PubMed Google Scholar
47.
1. Tsuyama T
2. Sato Y
3. Yoshizawa T
4. Matsuoka T
5. Yamagata K
2023Hypoxia causes pancreatic beta-cell dysfunction and impairs insulin secretion by activating the transcriptional repressor BHLHE40EMBO Rep :e56227https://doi.org/10.15252/embr.202256227 PubMed Google Scholar
48.
1. Avrahami D
2. Li C
3. Yu M
4. Jiao Y
5. Zhang J
6. Naji A
7. et al.
2014Targeting the cell cycle inhibitor p57Kip2 promotes adult human beta cell replicationJ Clin Invest 124:670–4https://doi.org/10.1172/JCI69519 PubMed Google Scholar
49.
1. Kerns SL
2. Guevara-Aguirre J
3. Andrew S
4. Geng J
5. Guevara C
6. Guevara-Aguirre M
7. et al.
2014A novel variant in CDKN1C is associated with intrauterine growth restriction, short stature, and early-adulthood-onset diabetesJ Clin Endocrinol Metab 99:E2117–22https://doi.org/10.1210/jc.2014-1949 PubMed Google Scholar
50.
1. Brioude F
2. Netchine I
3. Praz F
4. Le Jule M
5. Calmel C
6. Lacombe D
7. et al.
2015Mutations of the Imprinted CDKN1C Gene as a Cause of the Overgrowth Beckwith-Wiedemann Syndrome: Clinical Spectrum and Functional CharacterizationHum Mutat 36:894–902https://doi.org/10.1002/humu.22824 PubMed Google Scholar
51.
1. Augsornworawat P
2. Maxwell KG
3. Velazco-Cruz L
4. Millman JR
2020Single-Cell Transcriptome Profiling Reveals beta Cell Maturation in Stem Cell-Derived Islets after TransplantationCell Rep 32:108067https://doi.org/10.1016/j.celrep.2020.108067 PubMed Google Scholar
52.
1. Bulfoni M
2. Bouyioukos C
3. Zakaria A
4. Nigon F
5. Rapone R
6. Del Maestro L
7. et al.
2022Glucose controls co-translation of structurally related mRNAs via the mTOR and eIF2 pathways in human pancreatic beta cellsFront Endocrinol (Lausanne 13https://doi.org/10.3389/fendo.2022.949097 PubMed Google Scholar
53.
1. Stierman B
2. Afful J
3. Carroll M
4. Chen T-C
5. Davy O
6. Steven F
7. et al.
2017–March 2020 prepandemic data files—Development of files and prevalence estimates for selected health outcomesNational Health Statistics Reports. National Center for Health Statistics 2021:158https://doi.org/10.15620/cdc:106273 Google Scholar
54.
Centers for Disease Control and Prevention. National Diabetes Statistics Report website. [cited 12/26/2022]; Available from: https://www.cdc.gov/diabetes/data/statistics-report/index.html.https://www.cdc.gov/diabetes/data/statistics-report/index.html Google Scholar
55.
1. Smith GI
2. Mittendorfer B
3. Klein S
2019Metabolically healthy obesity: facts and fantasiesJ Clin Invest 129:3978–89https://doi.org/10.1172/JCI129186 PubMed Google Scholar
56.
1. Dominguez-Gutierrez G
2. Xin Y
3. Gromada J
2019Heterogeneity of human pancreatic beta-cellsMol Metab 27S:S7–S14https://doi.org/10.1016/j.molmet.2019.06.015 PubMed Google Scholar
57.
1. Muraro MJ
2. Dharmadhikari G
3. Grun D
4. Groen N
5. Dielen T
6. Jansen E
7. et al.
2016A Single-Cell Transcriptome Atlas of the Human PancreasCell Syst 3:385–94https://doi.org/10.1016/j.cels.2016.09.002 PubMed Google Scholar
58.
1. Bu G
2. Geuze HJ
3. Strous GJ
4. Schwartz AL
199539 kDa receptor-associated protein is an ER resident protein and molecular chaperone for LDL receptor-related proteinEMBO J 14:2269–80https://doi.org/10.1002/j.1460-2075.1995.tb07221.x PubMed Google Scholar
59.
1. Uhlen M
2. Karlsson MJ
3. Hober A
4. Svensson AS
5. Scheffel J
6. Kotol D
7. et al.
2019The human secretomeSci Signal 12https://doi.org/10.1126/scisignal.aaz0274 PubMed Google Scholar
60.
1. Atanes P
2. Ruz-Maldonado I
3. Hawkes R
4. Liu B
5. Zhao M
6. Huang GC
7. et al.
2018Defining G protein-coupled receptor peptide ligand expressomes and signalomes in human and mouse isletsCell Mol Life Sci 75:3039–50https://doi.org/10.1007/s00018-018-2778-z PubMed Google Scholar
61.
1. Gupta R
2. Nguyen DC
3. Schaid MD
4. Lei X
5. Balamurugan AN
6. Wong GW
7. et al.
2018Complement 1q like-3 protein inhibits insulin secretion from pancreatic beta-cells via the cell adhesion G protein-coupled receptor BAI3J Biol Chem https://doi.org/10.1074/jbc.RA118.005403 PubMed Google Scholar
62.
1. Dorrell C
2. Schug J
3. Canaday PS
4. Russ HA
5. Tarlow BD
6. Grompe MT
7. et al.
2016Human islets contain four distinct subtypes of beta cellsNat Commun 7https://doi.org/10.1038/ncomms11756 PubMed Google Scholar
63.
1. Bader E
2. Migliorini A
3. Gegg M
4. Moruzzi N
5. Gerdes J
6. Roscioni SS
7. et al.
2016Identification of proliferative and mature beta-cells in the islets of LangerhansNature 535:430https://doi.org/10.1038/nature18624 PubMed Google Scholar
64.
1. van der Meulen T
2. Xie R
3. Kelly OG
4. Vale WW
5. Sander M
6. Huising MO
2012Urocortin 3 marks mature human primary and embryonic stem cell-derived pancreatic alpha and beta cellsPLoS One 7:e52181https://doi.org/10.1371/journal.pone.0052181 PubMed Google Scholar
65.
1. Camunas-Soler J
2. Dai XQ
3. Hang Y
4. Bautista A
5. Lyon J
6. Suzuki K
7. et al.
2020Patch-Seq Links Single-Cell Transcriptomes to Human Islet Dysfunction in DiabetesCell Metab 31:1017–31https://doi.org/10.1016/j.cmet.2020.04.005 PubMed Google Scholar
66.
1. Kravets V
2. Dwulet JM
3. Schleicher WE
4. Hodson DJ
5. Davis AM
6. Pyle L
7. et al.
2021Functional architecture of the pancreatic islets: first responder cells drive the first-phase [Ca2+] responsebioRxiv https://doi.org/10.1101/2020.12.22.424082 Google Scholar
67.
1. Salem V
2. Silva LD
3. Suba K
4. Georgiadou E
5. Neda Mousavy Gharavy S
6. Akhtar N
7. et al.
2019Leader beta-cells coordinate Ca(2+) dynamics across pancreatic islets in vivoNat Metab. 1:615–29https://doi.org/10.1038/s42255-019-0075-2 PubMed Google Scholar
68.
1. Johnston NR
2. Mitchell RK
3. Haythorne E
4. Pessoa MP
5. Semplici F
6. Ferrer J
7. et al.
2016Beta Cell Hubs Dictate Pancreatic Islet Responses to GlucoseCell Metab 24:389–401https://doi.org/10.1016/j.cmet.2016.06.020 PubMed Google Scholar
69.
1. Weidemann BJ
2. Marcheva B
3. Kobayashi M
4. Omura C
5. Newman MV
6. Kobayashi Y
7. et al.
2024Repression of latent NF-kappaB enhancers by PDX1 regulates beta cell functional heterogeneityCell Metab 36:90–102https://doi.org/10.1016/j.cmet.2023.11.018 PubMed Google Scholar
70.
1. Morris SA
2019The evolving concept of cell identity in the single cell eraDevelopment 146https://doi.org/10.1242/dev.169748 PubMed Google Scholar
71.
1. Systems V Cell
2017What Is Your Conceptual Definition of “Cell Type” in the Context of a Mature Organism?Cell Syst 4:255–9https://doi.org/10.1016/j.cels.2017.03.006 PubMed Google Scholar
72.
1. Xin Y
2. Kim J
3. Okamoto H
4. Ni M
5. Wei Y
6. Adler C
7. et al.
2016RNA Sequencing of Single Human Islet Cells Reveals Type 2 Diabetes GenesCell Metab https://doi.org/10.1016/j.cmet.2016.08.018 PubMed Google Scholar
73.
1. Kobayashi S
2. Wagatsuma H
3. Ono R
4. Ichikawa H
5. Yamazaki M
6. Tashiro H
7. et al.
2000Mouse Peg9/Dlk1 and human PEG9/DLK1 are paternally expressed imprinted genes closely located to the maternally expressed imprinted genes: mouse Meg3/Gtl2 and human MEG3Genes Cells 5:1029–37https://doi.org/10.1046/j.1365-2443.2000.00390.x PubMed Google Scholar
74.
1. Huang D
2. Han Y
3. Tang T
4. Yang L
5. Jiang P
6. Qian W
7. et al.
2023Dlk1 maintains adult mice long-term HSCs by activating Notch signaling to restrict mitochondrial metabolismExp Hematol Oncol 12:11https://doi.org/10.1186/s40164-022-00369-9 PubMed Google Scholar
75.
1. Grassi ES
2. Pietras A
2022Emerging Roles of DLK1 in the Stem Cell Niche and Cancer StemnessJ Histochem Cytochem 70:17–28https://doi.org/10.1369/00221554211048951 PubMed Google Scholar
76.
1. Nueda ML
2. Naranjo AI
3. Baladron V
4. Laborda J
2014The proteins DLK1 and DLK2 modulate NOTCH1-dependent proliferation and oncogenic potential of human SK-MEL-2 melanoma cellsBiochim Biophys Acta 1843:2674–84https://doi.org/10.1016/j.bbamcr.2014.07.015 PubMed Google Scholar
77.
1. Abdallah BM
2. Ditzel N
3. Laborda J
4. Karsenty G
5. Kassem M
2015DLK1 Regulates Whole-Body Glucose Metabolism: A Negative Feedback Regulation of the Osteocalcin-Insulin LoopDiabetes 64:3069–80https://doi.org/10.2337/db14-1642 PubMed Google Scholar
78.
1. Zhao Z
2. D’Oliveira Albanus R
3. Taylor H
4. Tang X
5. Han Y
6. Orchard P
7. et al.
2023An integrative single-cell multi-omics profiling of human pancreatic islets identifies T1D associated genes and regulatory signalsRes Sq https://doi.org/10.21203/rs.3.rs-3343318/v1 PubMed Google Scholar
79.
1. Korsunsky I
2. Millard N
3. Fan J
4. Slowikowski K
5. Zhang F
6. Wei K
7. et al.
2019Fast, sensitive and accurate integration of single-cell data with HarmonyNat Methods 16:1289–96https://doi.org/10.1038/s41592-019-0619-0 PubMed Google Scholar
80.
1. Wang T
2. Johnson TS
3. Shao W
4. Lu Z
5. Helm BR
6. Zhang J
7. et al.
2019BERMUDA: a novel deep transfer learning method for single-cell RNA sequencing batch correction reveals hidden high-resolution cellular subtypesGenome Biol 20:165https://doi.org/10.1186/s13059-019-1764-6 PubMed Google Scholar
81.
1. Hie B
2. Bryson B
3. Berger B
2019Efficient integration of heterogeneous single-cell transcriptomes using ScanoramaNat Biotechnol 37:685–91https://doi.org/10.1038/s41587-019-0113-3 PubMed Google Scholar
82.
1. Hart PA
2. Kudva YC
3. Yadav D
4. Andersen DK
5. Li Y
6. Toledo FGS
7. et al.
2023A Reduced Pancreatic Polypeptide Response is Associated With New-onset Pancreatogenic Diabetes Versus Type 2 DiabetesJ Clin Endocrinol Metab 108:e120–e8https://doi.org/10.1210/clinem/dgac670 PubMed Google Scholar
83.
1. Couetil JL
2. Liu Z
3. Alomari AK
4. Zhang J
5. Huang K
6. Johnson TS
2023Diagnostic Evidence Gauge of Spatial Transcriptomics (DEGAS): Using transfer learning to map clinical data to spatial transcriptomics in prostate cancerbioRxiv https://doi.org/10.1101/2023.04.21.537852 Google Scholar
84.
1. Mootha VK
2. Lindgren CM
3. Eriksson KF
4. Subramanian A
5. Sihag S
6. Lehar J
7. et al.
2003PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetesNat Genet 34:267–73https://doi.org/10.1038/ng1180 PubMed Google Scholar
85.
1. Subramanian A
2. Tamayo P
3. Mootha VK
4. Mukherjee S
5. Ebert BL
6. Gillette MA
7. et al.
2005Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profilesProc Natl Acad Sci U S A 102:15545–50https://doi.org/10.1073/pnas.0506580102 PubMed Google Scholar
86.
1. Liberzon A
2. Birger C
3. Thorvaldsdottir H
4. Ghandi M
5. Mesirov JP
6. Tamayo P
2015The Molecular Signatures Database (MSigDB) hallmark gene set collectionCell Syst 1:417–25https://doi.org/10.1016/j.cels.2015.12.004 PubMed Google Scholar

Article and author information

Author information

Gitanjali Roy
Indiana Biosciences Research Institute, Indianapolis, IN, USA
ORCID iD: 0000-0002-9622-0184
Rameesha Syed
Indiana Biosciences Research Institute, Indianapolis, IN, USA
- Equal contributors.
Olivia Lazaro
Indiana Biosciences Research Institute, Indianapolis, IN, USA
- Equal contributors.
Sylvia Robertson
Indiana Biosciences Research Institute, Indianapolis, IN, USA
Sean D. McCabe
Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
ORCID iD: 0000-0003-3401-5334
Daniela Rodriguez
Indiana Biosciences Research Institute, Indianapolis, IN, USA
Alex M. Mawla
Department of Neurobiology, Physiology and Behavior, College of Biological Sciences, University of California, Davis, Davis, CA
ORCID iD: 0000-0003-0907-464X
Travis S. Johnson
Indiana Biosciences Research Institute, Indianapolis, IN, USA, Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
ORCID iD: 0000-0002-4628-2256
- Correspondence: mkalwat@indianabiosciences.org and johnstrs@iu.edu
Michael A. Kalwat
Indiana Biosciences Research Institute, Indianapolis, IN, USA, Center for Diabetes and Metabolic Diseases, Indiana University School of Medicine, Indianapolis, IN, USA
ORCID iD: 0000-0002-8349-9470
- Correspondence: mkalwat@indianabiosciences.org and johnstrs@iu.edu

Version history

Preprint posted: January 23, 2024
Sent for peer review: February 28, 2024
Reviewed Preprint version 1: June 25, 2024

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.96713. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

