Figures and data

Large Scale Data Integration Creates an Atlas of AML
(A) Proportion of cells (left panel) and samples (right panel) belonging to each AML subtype as defined by the ELN clinical guideline. (B) Age group and gender distribution of AML scAtlas cohort samples. (C) scVI harmonized UMAP colored by annotated cell types. (D) The expression of key hematopoietic marker genes across annotated cell types shown on a dotplot. Color scale shows mean gene expression, dot size represents the fraction of cells expressing the given gene. (E) Selected hematopoietic stem and progenitor cell (HSPC) clusters from the AML scAtlas, annotated with a curated reference of leukemia stem and progenitor cells (LSPCs). (F) Proportions of HSPC/LSC populations in each AML subtype as defined by the ELN clinical guideline. (G) Proportions of HSPC/LSC populations in each AML risk group as defined by the ELN clinical guideline.

AML scAtlas Reveals Age-Associated Heterogeneity in t(8;21) AML
(A) Clinical characteristics of the AML scAtlas t(8;21) samples, which were divided into three age groups. Gender is indicated where available. *Sample sex inferred from XIST/ChrY expression. (B) Using the AML scAtlas t(8;21) sample cells, UMAP was re-computed and shows the different cell types. (C) Bar plots of the absolute cell type numbers (left panel) and the cell type proportions (right panel) stratified by age group. The CD34 enrichment performed on several adult samples is reflected. (D) Using HSPCs and CMPs only, the pySCENIC gene regulatory network (GRN) and regulon AUC scores were calculated. Z-score normalized scores underwent hierarchical clustering to create a clustered heatmap and identify age-associated regulons. Regulons were prioritized using their regulon specificity scores (RSS).

Validation of Age-Associated Regulons in Large Bulk RNA-Seq Cohorts
(A) Using previously defined age-associated regulons, pySCENIC AUC scores (Z-score normalized) were clustered to identify samples most enriched for inferred-prenatal and inferred-postnatal origin signatures. (B) Volcano plot of differentially expressed genes when comparing the inferred-prenatal origin and inferred-postnatal origin samples. Adjusted P value threshold 0.01; log2 fold change threshold 0.5. Regulon signature associated TFs are indicated. (C) Enrichment plot of significant gene sets enriched in the inferred-prenatal origin samples. GSEA was performed on the DEGs using MSigDB databases. FDR q-value threshold <0.05. (D) Enrichment plot of drug sensitivity gene sets enriched in the inferred-prenatal samples. GSEA was performed on the DEGs, using drug response signatures from published studies of 4 widely used AML drugs. FDR q-value threshold <0.05. (E) The predicted cell type proportions estimated using AutoGeneS deconvolution, of the inferred-prenatal and inferred-postnatal origin samples were compared using T-Tests. Significant P values <0.05 (*), <0.01 (**), <0.001 (***) and <0.0001 (****) are indicated.

Combining Multiomics Data Interrogates Age-Associated Regulons
(A) SCENIC+ eRegulon dotplot of showing correlation between scRNA-seq target gene activity (indicated by the color scale) and scATAC-seq target region accessibility (depicted by spot size). RSS identified the key activating eRegulons (+/+) between inferred-prenatal and inferred-postnatal origin disease and allows comparison of diagnosis (Dx) and relapse (Rel) time points. (B) Correlation plot of the eRegulon target genes. Clusters reflecting inferred-prenatal and inferred-postnatal origin disease at diagnosis (Dx) and relapse (Rel) reflect co-binding of eRegulon TFs. (C) Over-representation analysis of age-associated eRegulon target genes using GO Biological Processes curated gene sets. Adjusted P value threshold 0.05. (D) Principal components analysis (PCA) of the gene based eRegulon enrichment scores for the inferred-prenatal origin disease at diagnosis and relapse. PC1 axis explains variance occurring between diagnosis and relapse, where this patient underwent a lineage switch. PC2 captures variance related to hematopoietic differentiation. (E) SCENIC+ perturbation simulation shows the predicted effect of knockout of selected TFs on the previously computed PCA embedding. Arrows indicate the predicted shift in cell states relative to the initial PCA embedding.