Graphical description of the identification of cell-type specific marker peaks and reference ATAC-Seq profiles included in the EPIC-ATAC framework. 1) ATAC-Seq data from 564 samples of sorted pure cell populations were collected to build reference profiles for cancer-relevant cell types. 2) Cell-type specific marker peaks were identified using differential accessibility analysis. 3) Marker peaks with previously observed chromatin accessibility in healthy human tissues were then excluded. 4) For tumor bulk deconvolution, the remaining marker peaks were further refined by selecting those with correlated behavior across tumor bulk samples. 5) The cell-type specific marker peaks and reference profiles were finally integrated into the EPIC-ATAC framework to perform bulk ATAC-Seq deconvolution. Parts of this figure were created with BioRender.com.
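Step 4 only states that markers with correlated behavior in tumor bulk samples were retained. As an illustration of what such a filter could look like, the sketch below keeps a marker peak if its mean Pearson correlation with the other markers of the same cell type, across bulk samples, reaches a threshold. This criterion, the input format, the `refine_markers` helper and the 0.5 threshold are all hypothetical and are not taken from EPIC-ATAC.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def refine_markers(accessibility, threshold=0.5):
    """Hypothetical filter: keep marker peaks of ONE cell type whose accessibility
    across bulk samples correlates, on average, with the other markers of that type.

    accessibility: dict mapping peak ID -> list of accessibility values,
                   one value per bulk sample (assumed input format).
    threshold: assumed minimum mean correlation, chosen for illustration only.
    """
    if len(accessibility) < 2:
        # Nothing to correlate against; keep the markers unchanged.
        return list(accessibility)
    kept = []
    for peak, values in accessibility.items():
        others = [v for p, v in accessibility.items() if p != peak]
        score = mean(pearson(values, v) for v in others)
        if score >= threshold:
            kept.append(peak)
    return kept
```

With four toy peaks where three rise together across samples and one does not, only the three co-varying peaks are kept. A real implementation would apply this per cell type and would need to handle constant accessibility profiles, which make the correlation undefined.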

© 2024, BioRender Inc. Any parts of this image created with BioRender are not made available under the same license as the Reviewed Preprint, and are © 2024, BioRender Inc.

ATAC-Seq data from sorted cell populations reveal cell-type specific marker peaks and reference profiles.

A) Number of samples collected for each cell type. Colors correspond to the studies of origin. B) 2D UMAP representation of the collected samples based on the PBMC markers (left) and the TME markers (right). Colors correspond to cell types. C) Scaled average chromatin accessibility of the cell-type specific marker peaks (rows) in each cell type (columns) in the reference ATAC-Seq samples used to identify the marker peaks. D) Scaled average chromatin accessibility of the marker peaks in external ATAC-Seq data from pure cell-type samples that were not part of the reference samples (see Material and Methods). E) Scaled average chromatin accessibility of the marker peaks in an external scATAC-Seq dataset (Human Atlas (K. Zhang et al. 2021)). F) Distribution of the distances between the marker peaks and the nearest transcription start site (TSS) (left panel) and ChIPseeker annotations of the marker peaks (right panel). G) Significance (-log10(q-value)) of the pathway enrichment tests (columns) obtained with ChIP-Enrich on each set of cell-type specific marker peaks (rows). A subset of relevant enriched pathways is shown. Pathway names are colored according to the cell type in which the pathway was found to be enriched; pathways significantly enriched in more than one set of peaks are written in bold.
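Panel F summarizes how far each marker peak lies from the closest TSS. A minimal sketch of that distance computation, assuming the peak center is used as the reference point and that strand is ignored (ChIPseeker itself reports strand-aware, signed distances, so this is a simplification). The `distance_to_nearest_tss` helper is hypothetical.

```python
from bisect import bisect_left


def distance_to_nearest_tss(peak_start, peak_end, tss_positions):
    """Unsigned distance (bp) from a peak's center to the nearest TSS.

    Assumptions: all coordinates are on the same chromosome, tss_positions
    is sorted in ascending order, and strand orientation is ignored.
    """
    if not tss_positions:
        raise ValueError("at least one TSS position is required")
    center = (peak_start + peak_end) // 2
    # Index of the first TSS at or after the peak center.
    i = bisect_left(tss_positions, center)
    candidates = []
    if i < len(tss_positions):
        candidates.append(tss_positions[i] - center)      # nearest TSS downstream
    if i > 0:
        candidates.append(center - tss_positions[i - 1])  # nearest TSS upstream
    return min(candidates)
```

Binary search keeps each lookup logarithmic in the number of TSSs, which matters when annotating tens of thousands of peaks against a genome-wide TSS list.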

List of nearest genes and enriched CBPs reported in the PanglaoDB or CellMarker databases.

EPIC-ATAC accurately estimates immune cell fractions in PBMC ATAC-Seq samples.

A) Schematic description of the experiment designed to validate ATAC-Seq deconvolution on PBMC samples. B) Comparison between the cell-type proportions predicted by EPIC-ATAC and the true proportions in the PBMC bulk dataset. Symbols correspond to donors. C) Comparison between the cell-type proportions predicted by EPIC-ATAC and the true proportions in the PBMC pseudobulk dataset. Symbols correspond to pseudobulks. D) Pearson correlation (left) and RMSE (right) obtained by each deconvolution tool on the PBMC bulk dataset. EPIC-ATAC results are highlighted in red. E) Pearson correlation (left) and RMSE (right) obtained by each deconvolution tool on the PBMC pseudobulk dataset. Parts of this figure (panel 1) were created with BioRender.com.
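Panels D and E score each tool with the Pearson correlation and the root-mean-square error (RMSE) between predicted and true proportions. A minimal sketch of both metrics, assuming the predicted and true proportions have already been flattened into paired lists (for example over all samples and cell types; how values are pooled in the benchmark is not specified in this caption).

```python
from math import sqrt


def pearson(pred, true):
    """Pearson correlation between paired predicted and true proportions."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = sqrt(sum((p - mp) ** 2 for p in pred))
    st = sqrt(sum((t - mt) ** 2 for t in true))
    return cov / (sp * st)


def rmse(pred, true):
    """Root-mean-square error between paired predicted and true proportions."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))
```

The two metrics are complementary: correlation measures whether predictions track the true proportions up to a linear rescaling, while RMSE also penalizes systematic over- or underestimation.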

© 2024, BioRender Inc. Any parts of this image created with BioRender are not made available under the same license as the Reviewed Preprint, and are © 2024, BioRender Inc.

EPIC-ATAC accurately predicts fractions of cancer and non-malignant cells in tumor samples.

A) Comparison between the cell-type proportions estimated by EPIC-ATAC and the true proportions for the basal cell carcinoma (top) and gynecological (bottom) pseudobulk datasets. Symbols correspond to pseudobulks. B) Pearson's correlation and RMSE obtained by the deconvolution tools included in the benchmark. EPIC-ATAC is highlighted in red. C) Same analysis as in panel B, with the uncharacterized cell population excluded when evaluating prediction accuracy. The predicted and true proportions of the immune, stromal and vascular cell types were rescaled to sum to 1.
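The rescaling in panel C drops the uncharacterized population and renormalizes the remaining proportions so they sum to 1, applied to both predicted and true values. A minimal sketch, assuming proportions are stored as a dict keyed by cell type; the `rescale_without` helper and the cell-type labels below are hypothetical.

```python
def rescale_without(proportions, excluded):
    """Drop the excluded cell types and rescale the rest to sum to 1.

    proportions: dict mapping cell type -> proportion (assumed format).
    excluded: set of cell-type labels to remove before rescaling.
    """
    kept = {k: v for k, v in proportions.items() if k not in excluded}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no remaining proportion to rescale")
    return {k: v / total for k, v in kept.items()}


# Hypothetical sample: 60% uncharacterized cells, the rest immune/stromal/vascular.
sample = {"uncharacterized": 0.6, "CD4 T cells": 0.2, "fibroblasts": 0.1, "endothelial": 0.1}
rescaled = rescale_without(sample, {"uncharacterized"})
```

Here the remaining 40% is stretched to 100%, so CD4 T cells become 0.5 and fibroblasts and endothelial cells 0.25 each. Rescaling both predicted and true proportions this way evaluates the relative composition of the non-malignant compartment independently of how well the uncharacterized fraction is estimated.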

T cell subtype quantification reveals the limits of ATAC-Seq deconvolution for closely related cell types.

A) Comparison of the proportions estimated by EPIC-ATAC and the true proportions for the PBMC samples (PBMC bulk experiment and PBMC pseudobulk samples combined) (top) and the basal cell carcinoma pseudobulks (bottom). CD4+ and CD8+ T-cell proportions were predicted using the reference profiles of the major cell types, whereas T-cell subtype proportions were predicted using the reference profiles that include the T-cell subtypes. B) Pearson's correlation obtained by EPIC-ATAC for each cell type.

EPIC-ATAC accurately infers the immune contexture in a bulk ATAC-Seq breast cancer cohort.

A) Proportions of different cell types predicted by EPIC-ATAC in the samples stratified by two breast cancer subtypes. B) Proportions of different cell types predicted by EPIC-ATAC in the samples stratified into three ER+/HER2- subgroups. Wilcoxon test p-values are shown above the boxplots.
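Comparing predicted proportions between independent groups of samples calls for the two-sample (rank-sum) form of the Wilcoxon test; the caption does not say which variant or which implementation was used, so this is an assumption. A minimal sketch of a two-sided rank-sum test with the normal approximation, average ranks for ties, and no tie or continuity correction. In practice, `scipy.stats.mannwhitneyu` provides a complete implementation.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, p-value). Tied values receive average ranks;
    the variance is not corrected for ties and no continuity correction is used.
    """
    values = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(values):
        # Find the block of tied values starting at position i.
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        ranks[values[i]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return u1, erfc(abs(z) / sqrt(2))
```

For the three ER+/HER2- subgroups in panel B, such a test would have to be applied to pairs of subgroups (or replaced by a multi-group test); which comparisons were made is not stated in this caption.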

EPIC-ATAC performs similarly to EPIC RNA-seq based deconvolution and better than gene activity based deconvolution.

Pearson's correlation (left) and RMSE (right) between the true cell-type proportions and the proportions predicted by ATAC-Seq deconvolution, RNA-Seq deconvolution and GA-based RNA deconvolution in the 100 pseudobulks simulated from the 10x multiome PBMC dataset (10x Genomics 2021). Dots correspond to outlier pseudobulks.