easySHARE-seq enables high-quality and accurate simultaneous scATAC-seq and scRNA-seq profiling.

(A) Schematic workflow of easySHARE-seq. (B) Generation and structure of the cell-specific barcode within Index 1. Total length of the final barcode is only 17 nt, compared to 99 nt previously. (C) Fraction of sequenced DNA bases allocated to either barcodes (grey) or genetic information (red) in easySHARE-seq and SHARE-seq using different sequencing kits. (D) Left: Principle of a species-mixing experiment. Murine OP-9 and human HEK cells are mixed prior to easySHARE-seq. After sequencing, sequences associated with each cell barcode are assessed for genome of origin. Middle left: Unique ATAC-seq fragments per cell aligning to the mouse or human genome. Cells are coloured according to their assigned origin (red: human; blue: mouse; orange: doublet). Middle right: Unique RNA-seq transcripts per cell aligning to the mouse or human genome. Right: Percentage of ATAC-seq fragments or RNA-seq transcripts per cell relative to total sequencing reads mapping uniquely to the human genome. 3.17% of all observed cells were classified as doublets. Accounting for same-species doublets, this results in a doublet rate of 6.34%. (E) Comparison of UMIs/cell across different single-cell technologies. Red shading denotes all multiomic technologies. Datasets are this study, SHARE-seq 28 (murine skin cells), sci-CAR 30 (murine kidney nuclei), SNARE-seq 33 (adult & neonatal mouse cerebral cortex nuclei), 10x 3’ Expression 34 (murine liver nuclei) and sci-RNAseq3 35 (E16.5 mouse embryo nuclei). Cells have been downsampled to a common sequencing depth where possible (see Methods). (F) Comparison of fragments per cell across different single-cell technologies. Colouring as in (E). Datasets differing from (E) are 10x 3’scATAC 36 (murine liver nuclei) and sciATAC-seq 37 (murine liver nuclei). Cells have been downsampled to a common sequencing depth where possible (see Methods).
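The correction from 3.17% observed doublets to a 6.34% total doublet rate in (D) can be reproduced with a short calculation. The sketch below assumes that cells pair at random and that the two species are mixed in equal proportions, so that only half of all doublets are cross-species and therefore detectable. The function name and the `human_fraction` parameter are illustrative and not taken from the Methods.

```python
def total_doublet_rate(observed_mixed_rate, human_fraction=0.5):
    """Estimate the total doublet rate from observed cross-species doublets.

    Assumption (not from the paper): cells pair at random, so the share of
    doublets that contain both species is 2 * p * (1 - p), where p is the
    fraction of human cells in the mix.
    """
    p = human_fraction
    detectable_share = 2 * p * (1 - p)  # 0.5 for a 1:1 mix
    return observed_mixed_rate / detectable_share


# 3.17% observed mixed-species doublets in a 1:1 mix
print(f"{total_doublet_rate(0.0317):.2%}")
```

With an unequal mix the detectable share falls below one half, so the same observed rate would imply a higher total doublet rate.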

Comparison between multiomic single-cell technologies

Cell type classification in primary liver nuclei using joint expression and chromatin accessibility profiles.

(A) UMAP visualisation of WNN-integrated scRNA-seq and scATAC-seq modalities of 19,664 liver nuclei. Nuclei are coloured by their assigned cell type identity. (B) WNN-UMAPs of 19,664 liver nuclei with nuclei coloured according to the mean expression strength of marker genes for several cell types (Alb: Hepatocytes, Clec4g: Liver Sinusoidal Endothelial Cells (LSECs), Dcn: Hepatic Stellate Cells (HSCs), Vsig4: Kupffer Cells). Plots for further marker genes can be found in Suppl. Fig. 2F. (C) Violin plots depicting the distribution of the normalised expression level of marker genes in all cells assigned to each cell type. (D) Violin plots depicting the distribution of normalised chromatin accessibility in all cells of each cell type in open chromatin regions overlapping marker genes.

Assigning putative cis-regulatory elements to their target genes by correlating simultaneous measurements of gene expression and chromatin accessibility.

(A) Schematic depicting the concept for linking putative cis-regulatory elements (pCREs) to their target genes. For each gene, all open chromatin regions (OCRs) within ±500kbp of its transcription start site (TSS) are tested. A pCRE is linked to a gene if the Spearman correlation of its chromatin accessibility with the gene’s expression falls outside the expected distribution, which is estimated by correlating the chromatin accessibility of 100 unrelated peaks (on different chromosomes) with the gene’s expression. (B) Genes ranked by their number of significantly correlated pCREs (P < 0.05, FDR < 0.1, ±500kbp from TSS) in Liver Sinusoidal Endothelial Cells (LSECs). Marked are genes in the top 1% that are either transcription factors or cell-type specific regulators shown to fulfill a critical role in LSECs. (C) pCREs are enriched for TSS proximity. Normalised density of all open chromatin regions within ±50kbp of a TSS (red) and of all pCREs within ±50kbp of a TSS (blue). (D) Aggregate snATAC-seq pileup (red) of LSECs at the Gata4 locus and 500kbp upstream region. Grey bars indicate open chromatin regions. Loops denote pCREs significantly correlated with Gata4 and are coloured by Spearman correlation of the respective pCRE–Gata4 comparison.
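The linking test outlined in (A) can be illustrated as follows. This is a minimal sketch on simulated data, not the pipeline used in this study: it assumes that significance is assessed with a two-sided z-test of the observed Spearman correlation against the mean and standard deviation of the 100 background correlations, and it omits the FDR correction. The function and variable names are hypothetical.

```python
import numpy as np
from scipy.stats import norm, spearmanr


def link_peak_to_gene(peak_acc, gene_expr, background_acc):
    """Test one peak-gene pair against a background of unrelated peaks.

    peak_acc:       (n_cells,) accessibility of the candidate peak
    gene_expr:      (n_cells,) expression of the gene
    background_acc: (n_background, n_cells) accessibility of unrelated peaks
    Returns the Spearman correlation and a two-sided p-value
    (z-test against the background; an assumption of this sketch).
    """
    rho = spearmanr(peak_acc, gene_expr)[0]
    null = np.array([spearmanr(bg, gene_expr)[0] for bg in background_acc])
    z = (rho - null.mean()) / null.std(ddof=1)
    p_value = 2 * norm.sf(abs(z))
    return rho, p_value


# Simulated example: one peak that tracks expression, 100 unrelated peaks
rng = np.random.default_rng(0)
n_cells = 500
gene_expr = rng.poisson(3.0, n_cells).astype(float)
linked_peak = gene_expr + rng.normal(0.0, 1.0, n_cells)
background = rng.poisson(1.0, size=(100, n_cells)).astype(float)

rho, p_value = link_peak_to_gene(linked_peak, gene_expr, background)
print(f"rho = {rho:.2f}, p = {p_value:.2e}")
```

In practice the p-values of all tested peak–gene pairs would then be adjusted for multiple testing before applying the FDR threshold.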

Zonation profiles in LSECs across gene expression and chromatin accessibility.

(A) Schematic depiction of a liver lobule. A liver lobule has a ‘Central–Portal Axis’ defined by morphogen gradients starting from the central vein to the portal vein and portal artery. The sinusoidal capillary channels are lined with Liver Sinusoidal Endothelial Cells (LSECs). (B) UMAP of 1,561 LSECs coloured by pseudotime. (C) Changes along the Central–Portal Axis at the Wnt2 locus. Top: Aggregate snATAC-seq profile (red) of LSECs at the Wnt2 locus. Grey bars denote identified open chromatin regions (OCRs). Bottom: In blue, loess trend line of mean normalised Wnt2 gene expression along pseudotime / the Central–Portal Axis (CP-axis; central vein, CV; portal vein, PV). In red, loess trend line of mean normalised chromatin accessibility in OCRs at the Wnt2 locus along the CP-axis. (D) Loess trend line of mean normalised gene expression (blue) of marker genes and mean normalised chromatin accessibility (red) at OCRs overlapping the marker gene along the CP-axis for pericentral markers (top, increased toward the central vein; Kit, Dkk3 and Rspo3) and periportal markers (bottom, increased toward the portal vein; Efnb2, Meis1 and Ltbp4). (E) Left: Zonation profiles of 550 genes along the CP-axis. Right: Zonation profiles of 744 open chromatin regions along the CP-axis. All profiles are normalised by their maximum.
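The maximum-normalisation used for the zonation profiles in (E) can be sketched as follows. Note that this example replaces the loess trend lines of panels (C) and (D) with simple binned means along pseudotime to stay self-contained; the number of bins and all names are assumptions made for illustration.

```python
import numpy as np


def zonation_profile(values, pseudotime, n_bins=10):
    """Binned mean of a feature along pseudotime, scaled to a maximum of 1.

    Binned means stand in for the loess fit used in the figure
    (a simplification of this sketch). Assumes a positive maximum.
    """
    edges = np.linspace(pseudotime.min(), pseudotime.max(), n_bins + 1)
    # Interior edges map every cell to a bin index in 0..n_bins-1
    bin_idx = np.digitize(pseudotime, edges[1:-1])
    means = np.array([
        values[bin_idx == b].mean() if np.any(bin_idx == b) else np.nan
        for b in range(n_bins)
    ])
    return means / np.nanmax(means)


# Simulated feature that increases along the axis
pseudotime = np.linspace(0.0, 1.0, 200)
expression = 4.0 * pseudotime + 1.0
profile = zonation_profile(expression, pseudotime)
print(np.round(profile, 2))
```

Scaling each profile by its own maximum places genes and open chromatin regions with very different absolute signal on a common 0 to 1 scale, so that the shape of the gradient rather than its magnitude is compared.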

Barcode structure and summary of quality control measures in liver nuclei.

(A) Structure of scATAC-seq and scRNA-seq sequencing reads in easySHARE-seq and the original protocol (BC1/2: barcode segment 1/2, UMI: Unique Molecular Identifier). (B) Percentage of total scRNA-seq sequencing reads containing cDNA fragments in murine primary liver nuclei. (C) Percentage of de-duplicated scRNA-seq sequencing reads overlapping an exon, intron, 5’UTR or 3’UTR. (D) Distribution of fraction of reads in peaks (FRiP) per cell in scATAC-seq data in murine primary liver nuclei (mean FRiP: 0.55). (E) Boxplot depicting the distribution of expressed genes and accessible peaks per cell in murine primary liver nuclei (mean expressed genes: 1,798; mean accessible peaks: 1,983). (F) Number of mean UMIs per cell recovered in the snRNA-seq when subsampling to different raw sequencing depths. (G) Number of mean fragments in peaks per cell recovered in the snATAC-seq when subsampling to different raw sequencing depths. (H) Mean transcription start site (TSS) enrichment score per cell in relation to distance from nearest TSS in the snATAC-seq data. (I) Histogram of fragment length in snATAC-seq sequencing reads. (J) Reproducibility of easySHARE-seq between sub-libraries shown by comparing the number of UMIs recovered per gene or peak across them. Each dot depicts either a gene (left) or peak (right). (K) Reproducibility of easySHARE-seq between biological replicates shown by comparing the number of UMIs recovered per gene or peak across biological replicates. Each dot depicts either a gene (left) or peak (right). (L) Comparison of genes expressed per cell across different single-cell technologies. Red shading denotes all multiomic technologies. Datasets are the same as in Fig. 1E. Cells have been downsampled to a common sequencing depth where possible (see Methods). (M) Comparison of accessible peaks per cell across different single-cell technologies. Colouring as in (L). Datasets are the same as in Fig. 1F. Cells have been downsampled to a common sequencing depth where possible (see Methods).
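The per-cell FRiP metric in (D) can be computed as sketched below. The example assumes half-open genomic intervals and counts a fragment as "in peaks" if it overlaps any peak by at least one base; the barcodes, coordinates and function name are made up for illustration, and the exact overlap rule used in this study may differ.

```python
from collections import defaultdict


def frip_per_cell(fragments, peaks):
    """Fraction of fragments overlapping a peak, per cell barcode.

    fragments: iterable of (barcode, chrom, start, end), half-open
    peaks:     iterable of (chrom, start, end), half-open
    A linear scan over peaks is used for clarity; real data would
    call for an interval index.
    """
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))

    total = defaultdict(int)
    in_peaks = defaultdict(int)
    for barcode, chrom, start, end in fragments:
        total[barcode] += 1
        if any(start < p_end and end > p_start
               for p_start, p_end in peaks_by_chrom[chrom]):
            in_peaks[barcode] += 1
    return {bc: in_peaks[bc] / total[bc] for bc in total}


# Toy input with hypothetical barcodes
peaks = [("chr1", 100, 200), ("chr2", 500, 600)]
fragments = [
    ("AAA", "chr1", 150, 250),  # overlaps the chr1 peak
    ("AAA", "chr1", 300, 400),  # no overlap
    ("CCC", "chr2", 550, 580),  # inside the chr2 peak
    ("CCC", "chr2", 590, 700),  # overlaps the chr2 peak
]
frip = frip_per_cell(fragments, peaks)
print(frip)
```

Averaging the returned values over all barcodes gives a mean FRiP comparable in definition to the one reported in the caption.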

easySHARE-seq robustly separates cell types.

(A) UMAP visualisation of total merged and integrated liver nuclei snRNA-seq data. Nuclei are coloured according to their cell type identified in Fig. 2. (B) UMAP visualisation of total merged and integrated liver nuclei snRNA-seq data. Nuclei are coloured according to their cell type identified in Fig. 2. (C) Fraction of each recovered cell type relative to all 19,664 nuclei. (D) Violin plots depicting the distribution of Unique Molecular Identifiers (UMIs; transcripts) per cell split by cell type. (E) Violin plots depicting the distribution of unique fragments per cell split by cell type. (F) WNN-UMAPs of 19,664 liver nuclei with nuclei coloured according to the mean expression strength of marker genes for several cell types (Cyp3a25: Hepatocytes, Stab2: Liver Sinusoidal Endothelial Cells (LSECs), Reln: Hepatic Stellate Cells (HSCs), Clec4f: Kupffer Cells, Ptprc: B cells, Oasl1: Monocytes, Kcnip1: Neurons, Spp1: Cholangiocytes). Red circles indicate the position of the cell population showing elevated expression for this marker gene.

Summary of peak–gene correlations.

(A) Number of significantly correlated putative cis-regulatory elements (pCREs) per gene (P < 0.05, FDR <= 0.1), considering all peaks ±500kbp of the transcription start site (TSS). (B) Number of genes a given pCRE is significantly correlated with (P < 0.05, FDR <= 0.1), considering all peaks ±500kbp of the TSS. (C) Number of significantly correlated pCREs per gene (P < 0.05, FDR <= 0.1), considering all peaks ±50kbp of the TSS. (D) Number of genes a given pCRE is significantly correlated with (P < 0.05, FDR <= 0.1), considering all peaks ±50kbp of the TSS. (E) Histogram of Spearman correlations of all significant peak–gene correlations (P < 0.05). (F) Histogram of Spearman correlations of all non-significant peak–gene correlations (P > 0.05). (G) Aggregate snATAC-seq track in Liver Sinusoidal Endothelial Cells (LSECs) at the Igf1 locus and its upstream region. Grey bars indicate open chromatin regions. Loops denote pCREs significantly correlated with Igf1 and are coloured by their respective Spearman correlation. Shaded grey area denotes a potentially LSEC-specific cis-regulatory element regulating Igf1 expression. (H) Gene Ontology enrichment analysis of genes whose associated pCREs are associated with five or more genes.