Avoiding false discoveries: Revisiting an Alzheimer’s disease snRNA-Seq dataset

  1. UK Dementia Research Institute at Imperial College London, London W12 0BZ, UK
  2. Department of Brain Sciences, Imperial College London, London W12 0BZ, UK

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a response from the authors (if available).


Editors

  • Reviewing Editor
    Joon-Yong An
    Korea University, Seoul, Republic of Korea
  • Senior Editor
    Murim Choi
    Seoul National University, Seoul, Republic of Korea

Reviewer #1 (Public Review):

Murphy, Fancy and Skene performed a reanalysis of snRNA-seq data from patients with Alzheimer's disease (AD) and healthy controls, previously published by Mathys et al. (2019), arriving at the conclusion that many of the transcriptional differences described in the original publication were false positives. This was achieved by revising the strategy for both quality control and differential expression analysis. I believe the authors' intention was to show the results of their reanalysis not as a criticism of the original paper (which can hardly be faulted: its strategy was state-of-the-art at the time, and its authors indeed took extra measures to ensure the reliability of their results), but primarily to raise awareness and provide recommendations for rigorous analysis of sc/snRNA-seq data in future studies.

STRENGTHS:

The authors demonstrate that the choice of data analysis strategy can have a vast impact on the results of a study, which in itself may not be obvious to many researchers.

The authors apply a pseudobulk-based differential expression analysis strategy (essentially, adding up counts from all cells per individual and comparing those sums with standard RNA-seq differential expression tests), which is (a) in line with the latest community recommendations, (b) different from the "default options" in most popular scRNA-seq analysis suites, and (c) explains the vastly different numbers of DEGs identified by the authors and by the original publication. The recommendation of this approach, together with a detailed assessment of the DEGs found by both methodologies, could be a useful finding for the research community. Unfortunately, it is currently not fully substantiated and is confounded with concurrent changes in QC measures (see weaknesses).
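
For readers less familiar with the approach, here is a minimal sketch of pseudobulk differential expression (the helper `pseudobulk_de` and its inputs are hypothetical, assuming a cells-by-genes raw count table, per-cell donor labels, and per-donor diagnosis labels; in practice the raw pseudobulk counts would go into edgeR, DESeq2, or limma-voom rather than a plain t-test):

```python
import numpy as np
import pandas as pd
from scipy import stats

def pseudobulk_de(counts, donor, diagnosis):
    """counts: cells x genes DataFrame of raw counts; donor: per-cell donor
    IDs aligned with counts.index; diagnosis: {donor_id: 'AD' or 'Ctrl'}."""
    # 1. sum raw counts over all cells of each donor -> one pseudobulk per donor
    pb = counts.groupby(pd.Series(donor, index=counts.index)).sum()
    # 2. library-size normalise (log2 counts per million)
    cpm = np.log2(pb.div(pb.sum(axis=1), axis=0) * 1e6 + 1)
    # 3. standard two-sample test per gene, with donors (not cells) as replicates
    labels = pd.Series(diagnosis).loc[cpm.index]
    ad, ctrl = cpm[(labels == "AD").values], cpm[(labels == "Ctrl").values]
    t, p = stats.ttest_ind(ad, ctrl, axis=0, equal_var=False)
    lfc = ad.mean(axis=0) - ctrl.mean(axis=0)
    return pd.DataFrame({"log2FC": lfc, "t": t, "p": p})
```

The key point is step 3: the unit of replication is the individual, so the test's degrees of freedom reflect the number of donors, not the number of cells.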

The authors show a correlation between the number of DEGs and the number of cells assessed, which indicates a methodological shortcoming of the original paper's approach (in fact, the authors of the original paper already acknowledged that the smaller number of DEGs for rare cell types was a technical artefact). To be educational for the reader, it would be important to provide more information about the DEGs that were "found" and those that were "lost". Given the vast inter-individual heterogeneity in humans, it is likely that the study was underpowered to detect weaker differences using the pseudobulks (Fig. 1B shows that only genes with more than a 4-fold change were found "significant").

All code and data used in this study are publicly available to the readers.

WEAKNESSES:

The authors interpret the fact that their method found fewer DEGs than the original paper as an improvement, on the assumption that all genes that were not recovered were false positives. However, they do not prove this, and it is likely that at least some genes were missed due to a lack of statistical power rather than because they were actually "incorrect". The original paper also performed independent validations of some genes that were not found here.

I am concerned that the only DEGs found by the authors are in the rare cell types, foremost the rare microglia (see Fig. 1f). It is unclear to me how many cells the pseudobulk counts were based on for these cell types, but it seems that (a) there were few cells and (b) there were rather few reads per cell. If both are the case, the pseudobulk counts for these cell populations might be rather noisy, and the DEG results are liable to outliers with extreme fold changes.
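
To make this concern concrete, one could tabulate the support behind each pseudobulk sample and flag those built from too few cells or reads. This is a hypothetical check (the column names 'donor', 'cell_type', and 'total_counts' are invented; neither analysis reports such a table):

```python
import pandas as pd

def pseudobulk_support(obs: pd.DataFrame, min_cells: int = 10, min_counts: int = 1000):
    """obs: per-cell metadata. Returns, per cell type and donor, the number of
    cells and total reads behind each pseudobulk sample, flagging weak ones."""
    support = (obs.groupby(["cell_type", "donor"])
                  .agg(n_cells=("donor", "size"),
                       n_counts=("total_counts", "sum")))
    support["reliable"] = ((support["n_cells"] >= min_cells)
                           & (support["n_counts"] >= min_counts))
    return support
```

Excluding, or at least reporting, pseudobulk samples below such thresholds would help rule out that the microglial DEGs are driven by a handful of noisy donors.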

The authors claim they improved the quality control of the dataset. While I do not think they did anything wrong per se, they offer no objective metric to assess this putative improvement. This is another major weakness of the paper, as it confounds the effect of the (putatively) improved differential analysis strategy and dilutes the results. I detail this weakness in the two following points:

Removing low-quality cells: The authors apply a new QC procedure resulting in the removal of some 20k more cells than in the original publication. They state "we believe the authors' quality control (QC) approach did not capture all of these low quality cells" (l. 26). While all the QC metrics used are very sensible, it is unclear whether they are indeed "better". For instance, removing cells with more than 5% mitochondrial reads seems harsh and might account for a large proportion of the additional cells filtered out in comparison to the original analysis. There is no blanket "correct cutoff" for this percentage: the "classic" Seurat tutorial (https://satijalab.org/seurat/articles/pbmc3k_tutorial.html) uses the same 5% threshold chosen by the authors, an MAD-based cutoff selection arrives at 8% in https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html, another "best practices" guide chooses 10% by default (https://bioconductor.org/books/3.17/OSCA.basic/quality-control.html#quality-control-discarded), etc. Generally, the percentage of mitochondrial reads varies a lot between datasets. As far as I can tell, the original paper did not use a fixed threshold but instead used a clustering approach to identify cells with an "abnormally high" mitochondrial read fraction, which also seems reasonable. Overall, I cannot assess whether the new QC is really more appropriate than the original analysis, and the authors do not provide any evidence in favor of their strategy.
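
For comparison, the MAD-based alternative mentioned above is easy to state precisely. A minimal sketch, assuming the per-cell mitochondrial percentage has already been computed (e.g. scanpy's `pct_counts_mt`):

```python
import numpy as np

def mad_upper_cutoff(pct_mito, n_mads: float = 3.0) -> float:
    """Data-driven upper threshold: median + n_mads * MAD, scaled so the
    MAD is comparable to a standard deviation under normality."""
    med = np.median(pct_mito)
    mad = np.median(np.abs(np.asarray(pct_mito) - med))
    return med + n_mads * 1.4826 * mad  # 1.4826 = Gaussian consistency factor

# e.g. keep = adata.obs["pct_counts_mt"] <= mad_upper_cutoff(adata.obs["pct_counts_mt"])
```

Because the cutoff is derived from the dataset itself, it sidesteps the question of which fixed percentage is "correct"; reporting the value it yields here would let readers judge how aggressive the 5% threshold is for this particular dataset.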

Batch correction: "Dataset integration has become a standard step in single-cell RNA-Seq protocols" (l. 29). While it is true that many authors now choose to perform an integration step as part of their analysis workflow, this is by no means uncontroversial, as there is a risk of "over-integration" and loss of true biological differences. Also, there are many different methods for dataset integration out there, all of which will produce different results. More importantly, the authors go on to state "we found different cell type proportions to the authors (Fig. 1a) which could be due to accounting for batch effects", but offer no support for the claim that the batch effects are indeed responsible for the observed differences. An alternative explanation would be a selective loss/gain of certain cell types during quality control. The original paper itself raised concerns about losing certain cell types (microglia, which do not seem to be differentially abundant between the original paper and the new analysis).
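
For context, a typical, though by no means canonical, integration step might look like the following scanpy/Harmony sketch (an illustration under assumed defaults, not the authors' actual pipeline); swapping in another method (ComBat, scVI, ...) can shift the downstream clusters and hence the cell type proportions:

```python
import scanpy as sc

# adata: QC-filtered AnnData with a per-cell batch label in adata.obs["batch"]
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch")
sc.pp.pca(adata, n_comps=50)
sc.external.pp.harmony_integrate(adata, key="batch")  # writes X_pca_harmony
sc.pp.neighbors(adata, use_rep="X_pca_harmony")       # graph on corrected embedding
sc.tl.leiden(adata)                                   # clusters -> cell type calls
```

Every choice in this chain (number of variable genes, number of PCs, the integration method itself) can move cells between clusters, which is exactly why the observed proportion differences cannot be attributed to batch correction without a direct comparison.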

Relevant literature is incompletely cited. Instead of referring to reviews of best practices and benchmarks comparing methods for batch correction and/or differential analysis, the authors only refer to their own previous work.

Due to a lack of comparison with other methods, and because the authors' methodology was applied to only a single dataset, the paper presents merely a case study, which could be useful but falls short of providing a general recommendation for a best-practice workflow.

APPRAISAL:

The manuscript could help to increase awareness of data analysis choices in the community, but only if the superiority of the methodology were clearly demonstrated. The recommended pseudobulk differential expression approach, along with the indication of the drastic differences it may make to the results, is the main output of the current manuscript, but it is difficult to assess unequivocally how it influenced the results, because the differential analysis comes after QC and cell type annotation, which have also been changed in comparison to the original publication. In my opinion, the purpose of the paper might be better served by focusing on the DE strategy without changing the QC, and instead detailing where/how DEGs were gained/lost and substantiating whether these were false positives.

Reviewer #2 (Public Review):

Summary: This paper takes on the important topic of preprocessing of single cell/nuclei RNA-seq prior to testing for differential gene expression. However, the manuscript has a number of critical weaknesses.

Strengths: This is an important topic and a key dataset for illustration.

Weaknesses: A major contribution is the use of the authors' own in-house pipeline for data preparation (scFLOW), but this software has remained unpublished since 2021 and has consequently not been refereed. It is not reasonable to take this pipeline as validated in the field.

The authors state that Mathys et al.'s analysis did not use batch correction prior to analysis and claim that such processing is routine in the field, but the only citation they give is to the above-mentioned scFLOW. Batch correction for DEG analysis is not the field standard; for example, Bryois et al. (2022, PMID: 35915177) does not perform batch correction. Whether or not to do such preprocessing is certainly arguable, but the authors need to argue it, not presuppose it.

The authors spend considerable effort discounting the pseudoreplication analysis of Mathys et al. It is well understood that this analysis yields a lot of false positives, but Mathys et al. only used this approach to filter genes, not as a valid test in and of itself. The authors also worry that the significant findings in Mathys et al.'s paper are influenced by the number of cells of each type. I am sure they are, since power is a function of sample size, but is this a bad thing? It seems odd that the authors' own approach is not influenced by sample size.
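
The pseudoreplication problem itself is easy to demonstrate with a null simulation (entirely hypothetical numbers): when expression is correlated within donors, treating cells as independent replicates inflates the false-positive rate far above the nominal level, whereas testing at the donor level does not:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_donors, cells_per_donor, n_sims = 24, 200, 500
grp = np.arange(n_donors) % 2  # 12 vs 12 donors, no true group effect

fp_cell = fp_donor = 0
for _ in range(n_sims):
    donor_eff = rng.normal(0, 1, n_donors)  # between-donor variation
    cells = donor_eff[:, None] + rng.normal(0, 1, (n_donors, cells_per_donor))
    # pseudoreplication: every cell treated as an independent replicate
    p_cell = stats.ttest_ind(cells[grp == 0].ravel(), cells[grp == 1].ravel()).pvalue
    # pseudobulk: one value per donor, donors are the replicates
    avg = cells.mean(axis=1)
    p_donor = stats.ttest_ind(avg[grp == 0], avg[grp == 1]).pvalue
    fp_cell += p_cell < 0.05
    fp_donor += p_donor < 0.05

print(f"false-positive rate at alpha=0.05: "
      f"cells {fp_cell / n_sims:.2f}, donors {fp_donor / n_sims:.2f}")
```

With between-donor variance equal to within-cell noise, the cell-level test should reject the true null far more often than 5% of the time, while the donor-level test stays near the nominal rate. This illustrates why a cell-level test is unreliable as a test in its own right, though not whether it is harmful when used only as a filter.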
