Flexible and high-throughput simultaneous profiling of gene expression and chromatin accessibility in single cells

Volker Soltys; Moritz Peters; Dingwen Su; Marek Kučka; Yingguang Frank Chan

doi:10.7554/eLife.110034.2

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.

Reviewing Editor
Yamini Dalal
National Cancer Institute, Bethesda, United States of America
Senior Editor
Yamini Dalal
National Cancer Institute, Bethesda, United States of America

Reviewer #1 (Public review):

In the manuscript entitled "Flexible and high-throughput simultaneous profiling of gene expression and chromatin accessibility in single cells," Soltys and colleagues present easySHARE-seq, a method described as an improvement upon SHARE-seq for the simultaneous measurement of RNA transcripts and chromatin accessibility.

The authors demonstrate the utility of easySHARE-seq by profiling approximately 20,000 nuclei from the murine liver, successfully annotating cell types and linking cis-regulatory elements to target genes. The authors claim that easySHARE-seq supports longer read lengths potentially enabling better variant discovery or allele-specific signal assessment, though they do not provide direct evidence to support these specific claims.

A key strength of the protocol is enhanced sequencing efficiency, achieved by shortening the Index 1 read from 99 to 17 nucleotides. This reduction does not come at a significant cost to barcode diversity, retaining approximately 3.5 million combinations. Additionally, the approach allows for the sequencing of a sub-library to assess quality prior to final barcoding and sequencing which seems quite clever.

While the increase in RNA transcript recovery is substantial, it appears to come at a cost: there is a notable decrease in ATAC fragments per cell compared to the original SHARE-seq (and other platforms). Likely as a result, the dimensionality reduction (UMAP) shows good resolution for RNA profiles but relatively poor resolution for accessibility profiles. Furthermore, the presented data suggests potential ambient RNA contamination; specifically, the detection of Albumin in HSCs and B cells is likely an artifact of the protocol rather than a biological signal.

Overall, the study is well-presented and represents a promising advance. However, there are significant shortcomings that should be addressed, particularly regarding "leaky" transcript recovery and reduced ATAC performance.

https://doi.org/10.7554/eLife.110034.2.sa2

Reviewer #2 (Public review):

Aims:

The authors sought to optimize SHARE-seq, a multimodal single-cell method, to improve the simultaneous profiling of gene expression and chromatin accessibility. Their goal was to enhance barcode design for better sequencing efficiency and cost savings, while improving overall data quality. They then applied their optimized method, easySHARE-seq, to study liver sinusoidal endothelial cells (LSECs) to demonstrate its utility in examining gene regulation and spatial zonation.

Strengths:

The improved barcode design is an advance, increasing the proportion of sequencing reads dedicated to biological information rather than barcode identification. This modification offers practical benefits in terms of sequencing costs and read length, potentially reducing alignment errors. The method also demonstrates improved RNA detection compared to the original SHARE-seq protocol. The biological applications showcase how simultaneous measurement of both modalities enables analyses that would be practically impossible with single-modality approaches, particularly in examining how chromatin states change along developmental or spatial trajectories.

Weaknesses:

There is a notable reduction in chromatin accessibility detection compared to the original SHARE-seq method, likely limiting the use of the method in certain situations.

Overall:

The authors achieve their aim of creating an optimized protocol with improved barcode design and enhanced RNA detection. The method represents a useful advance for specific experimental contexts where the trade-offs are appropriate.

https://doi.org/10.7554/eLife.110034.2.sa1

Author response:

The following is the authors’ response to the original reviews.

Public Reviews:

Reviewer #1 (Public review):

In the manuscript entitled "Flexible and high-throughput simultaneous profiling of gene expression and chromatin accessibility in single cells," Soltys and colleagues present easySHARE-seq, a method described as an improvement upon SHARE-seq for the simultaneous measurement of RNA transcripts and chromatin accessibility.

The authors demonstrate the utility of easySHARE-seq by profiling approximately 20,000 nuclei from the murine liver, successfully annotating cell types and linking cis-regulatory elements to target genes. The authors claim that easySHARE-seq supports longer read lengths potentially enabling better variant discovery or allele-specific signal assessment, though they do not provide direct evidence to support these specific claims.

A key strength of the protocol is enhanced sequencing efficiency, achieved by shortening the Index 1 read from 99 to 17 nucleotides. This reduction does not come at a significant cost to barcode diversity, retaining approximately 3.5 million combinations. Additionally, the approach allows for the sequencing of a sub-library to assess quality prior to final barcoding and sequencing which seems quite clever.

While the increase in RNA transcript recovery is substantial, it appears to come at a cost: there is a notable decrease in ATAC fragments per cell compared to the original SHARE-seq (and other platforms). Likely as a result, the dimensionality reduction (UMAP) shows good resolution for RNA profiles but relatively poor resolution for accessibility profiles. Furthermore, the presented data suggests potential ambient RNA contamination; specifically, the detection of Albumin in HSCs and B cells is likely an artifact of the protocol rather than a biological signal.

Overall, the study is well-presented and represents a promising advance. However, there are significant shortcomings that should be addressed, particularly regarding "leaky" transcript recovery and reduced ATAC performance.

Recommendations:

(1) To provide a comprehensive view of the current field, the authors should include Scale Biosciences (Scale Bio) in their discussion of available commercial platforms.

We added Scale Biosciences to the relevant part in the introduction.

(2) A head-to-head comparison with the 10x Genomics Multiome platform would be of significant interest to the single-cell genomics community and would better contextualize the performance of easySHARE-seq.

We agree that a comparison to the 10x Multiome technology would be of interest to the community. Therefore, we included such a dataset profiling murine liver nuclei in the comparison in Figure 1 E&F as well as Suppl. Fig. 1 L&M. The resulting comparison remains consistent: easySHARE-seq compares favourably to other multiomic techniques in RNA-seq data quality (UMIs/cell) but not in ATAC-seq data quality (fragments/cell).

(3) Optimizing ATAC Performance: I strongly suggest exploring methods to improve ATAC sensitivity. As the authors note, the improvement in RNA recovery may result from fewer processing steps and stronger fixation. It would be valuable to test if decreasing fixation back to 2% (as in the original SHARE-seq) recovers ATAC data quality, and to determine if the fixation level or the number of steps is the key variable in preserving transcripts.

We thank the reviewer for this suggestion. We agree that knowing the specific step(s) impacting ATAC-seq data quality would be highly valuable. Unfortunately, we are not in a position to perform the additional wet-lab experiments. It remains an area for improvement as we develop the technique further. We can confirm, however, that our early trials showed that the extent of fixation is negatively correlated with ATAC-seq data recovery.

(4) The authors allude to the possibility of scaling this assay using a barcoded poly(T). Explicit inclusion or demonstration of this capability would dramatically increase interest in this protocol. Perhaps ATAC could be scaled using a barcoded Tn5?

We thank the reviewer for this suggestion. Since we cannot perform further experiments, we expanded and clarified the discussion of upscaling this assay in our Supplementary Notes and referred to it in the text.

We also added a paragraph specifically discussing the use of barcoded Tn5 in the Supplementary Notes.

(5) The number of HSCs and B cells expressing Albumin is problematic and suggests significant ambient RNA issues that need to be addressed or computationally corrected.

We thank the reviewer for pointing out this potential issue. We have used ‘decontX’ to estimate and ‘de-contaminate’ our UMI counts. We have added a histogram of the estimated fraction of contaminated counts per nucleus to Suppl. Fig. 1. We have used the decontaminated counts to re-generate the analysis in Fig. 2 B&C and Suppl. Fig. 2 F. This filtering step did not change the results of these analyses; in fact, it strengthened the results and improved clarity. We have added the relevant information to the Methods section and codebase and discussed the results and implications in the Supplementary Notes, which we briefly summarize here:

“As reported in Suppl. Fig. 10, decontX identifies mean contaminated counts of 9.6% and median contaminated counts of 1.4%, suggesting that a few heavily contaminated cells strongly inflate the overall estimation of contaminated counts. This could be due to 1) doublets or 2) wrongly assigned cell types. The authors of decontX report contamination values of 1-4% in commercial droplet-based protocols and 11-14% in plate-based protocols, suggesting that easySHARE-seq performs better than other plate-based assays.”
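The gap between the mean and the median in the quoted passage can be illustrated with a minimal sketch. The per-nucleus contamination fractions below are hypothetical values chosen only for illustration; they are not taken from the study and do not reproduce the decontX output.

```python
from statistics import mean, median

# Hypothetical per-nucleus contamination fractions (illustrative only, not
# data from the study): most nuclei are nearly clean, while a small subset
# is heavily contaminated, as could happen with doublets or nuclei assigned
# to the wrong cell type.
clean_nuclei = [0.01] * 90
heavily_contaminated = [0.80] * 10
fractions = clean_nuclei + heavily_contaminated

mean_fraction = mean(fractions)
median_fraction = median(fractions)

# The mean is pulled upward by the small heavily contaminated subset,
# whereas the median reflects the typical nucleus.
print(f"mean contaminated fraction:   {mean_fraction:.3f}")
print(f"median contaminated fraction: {median_fraction:.3f}")
```

In this toy distribution, ten percent of the nuclei account for almost all of the mean, which is the same qualitative pattern the authors describe.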

We again want to thank the reviewer for this suggestion. It has improved the manuscript.

Reviewer #2 (Public review):

Aims:

The authors sought to optimize SHARE-seq, a multimodal single-cell method, to improve the simultaneous profiling of gene expression and chromatin accessibility. Their goal was to enhance barcode design for better sequencing efficiency and cost savings, while improving overall data quality. They then applied their optimized method, easySHARE-seq, to study liver sinusoidal endothelial cells (LSECs) to demonstrate its utility in examining gene regulation and spatial zonation.

Strengths:

The improved barcode design is an advance, increasing the proportion of sequencing reads dedicated to biological information rather than barcode identification. This modification offers practical benefits in terms of sequencing costs and read length, potentially reducing alignment errors. The method also demonstrates improved RNA detection compared to the original SHARE-seq protocol. The biological applications showcase how simultaneous measurement of both modalities enables analyses that would be practically impossible with single-modality approaches, particularly in examining how chromatin states change along developmental or spatial trajectories.

Weaknesses:

There is a notable reduction in chromatin accessibility detection compared to the original SHARE-seq method, likely limiting the broad use of the method. While the authors are transparent about this tradeoff, additional discussion would be helpful regarding how this affects data interpretation. Comparisons showing consistency between easySHARE-seq and SHARE-seq chromatin accessibility patterns at the single-cell level would strengthen confidence in the method.

Overall:

The authors achieve their aim of creating an optimized protocol with improved barcode design and enhanced RNA detection. The method represents a useful advance for specific experimental contexts where the tradeoffs are appropriate.

Recommendations for the authors:

Reviewer #1 (Recommendations for the authors):

Figure 1F appears identical to Supplementary Figure 1M. This should be corrected if this is in error.

Fixed.

Reviewer #2 (Recommendations for the authors):

The following comments are intended to strengthen the work.

(1) scATAC-seq Performance and Data Consistency

While I appreciate the authors' transparency regarding scATAC-seq performance, the extent of underperformance warrants greater emphasis. Additionally, does the average ATAC-seq signal recapitulate previously published results? At the single-cell level, how consistent are easySHARE-seq and SHARE-seq data? I suspect that increased dropout in scATAC-seq may distort consistency between datasets. This should be explicitly discussed in terms of data interpretation.

We thank the reviewer for this suggestion. We have cross-referenced the open chromatin regions in this study and we summarise the result at the end of the ‘benchmarking’ paragraph. We have further expanded on the limitations of the ATAC-seq data in our study, given its lower quality, in the relevant part of the discussion. We should note that a direct comparison between SHARE-seq and this study is challenging due to the different sample tissues.

(2) LSEC Biological Investigations

The biological investigations could be strengthened (though this may reflect my limited expertise with LSECs).

(a) Enhancer analysis depth

While the authors quantify potential enhancers through RNA-ATAC correlations within individual cells and identify genes regulated by multiple enhancers, a deeper exploration of enhancer biology would strengthen the manuscript. Potential questions include: Do genes sharing correlated enhancer activity also show correlated expression? How do enhancer number and strength relate to gene expression levels? How do RNA-ATAC correlations scale with ATAC peak height? Are stronger enhancers more tightly linked to gene expression? Perhaps the authors explored these questions without finding significant patterns, but this should be clarified.

We thank the reviewer for these suggestions. We performed several analyses aimed at exploring enhancer biology with this dataset. We added a simple comparison of UMIs per gene between genes with at least one associated peak and those without in Suppl. Fig. 3I. We provide the corresponding plot for fragments per peak in Suppl. Fig. 3J. We also explored the relationship between gene expression and chromatin accessibility; here, we found that gene expression levels do not correlate with peak heights of chromatin accessibility (possibly because chromatin accessibility signals were somewhat binary). The corresponding plot has been added to Suppl. Fig. 3K. We added a short paragraph discussing these findings in the main text.

(b) Correlation magnitude interpretation

The reported correlation values are extremely small. Does this reflect weak biological linkages or primarily experimental noise? If experimental noise, how does variation in detection per gene influence the confidence in this type of analysis?

We thank the reviewer for raising this potential issue. We identify a total of 40,957 significant peak-gene associations with a mean Spearman correlation of 0.1 (± 0.056; Suppl. Fig. 3E). The analytical workflow used to identify these peak-gene associations was first described alongside SHARE-seq in Ma et al. For context, they reported significant peak-gene associations to have a mean Spearman correlation of 0.026 (± 0.015; Ma et al. Table S4).

Generally, we hypothesize that the low correlation values in this type of analysis result from the sparseness of single-cell data, especially in chromatin accessibility. Therefore, the power to detect gene–peak associations increases with cell number (Ma et al., Fig. 3B), and the limited cell numbers in the analysis in this study likely result in an enrichment of the most strongly correlated associations among those detected. We have added a comparison of UMIs per gene for genes with and without a significant gene-peak correlation, illustrating this dynamic (Suppl. Fig. 3I). Furthermore, we have described this relationship and limitation in the relevant part of the results section.
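The effect of sparsity on Spearman correlations can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the peak-gene workflow of Ma et al. or of this study: the latent activity model, the capture probability, the cell number, and all function names are hypothetical.

```python
import random

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def thin(count, capture_prob, rng):
    """Keep each molecule independently with probability capture_prob."""
    return sum(1 for _ in range(count) if rng.random() < capture_prob)

rng = random.Random(0)
n_cells = 2000        # hypothetical cell number
capture_prob = 0.1    # hypothetical per-molecule capture probability

# Hypothetical model: one latent activity per cell drives both the true
# transcript count of a gene and the true fragment count of a linked peak.
activity = [rng.random() for _ in range(n_cells)]
true_rna = [int(50 * a) for a in activity]
true_atac = [int(10 * a) for a in activity]

# Observed counts after sparse capture of each modality.
obs_rna = [thin(c, capture_prob, rng) for c in true_rna]
obs_atac = [thin(c, capture_prob, rng) for c in true_atac]

dense_rho = spearman(true_rna, true_atac)
sparse_rho = spearman(obs_rna, obs_atac)
print(f"Spearman rho, true counts:     {dense_rho:.3f}")
print(f"Spearman rho, observed counts: {sparse_rho:.3f}")
```

In this toy model the underlying linkage is nearly perfect, yet the correlation measured on the thinned counts is much lower, consistent with the hypothesis that low values can arise from data sparsity rather than from weak biological linkage.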

(c) Zonation analysis framing

The zonation analysis is compelling, but the authors should more explicitly emphasize that defining pseudotime and examining chromatin state dynamics is only possible because both modalities are measured simultaneously. And more detail on the Monocle3 pseudotime analysis is needed, as it is unclear how this was really done.

We expanded our description of the pseudotime analysis using Monocle in the relevant section in the Methods. Furthermore, we explicitly point out that this type of analysis relies on simultaneous measurements of both modalities at the end of the results section.

https://doi.org/10.7554/eLife.110034.2.sa0
