TopOMetry systematically learns and evaluates the latent dimensions of single-cell atlases

  1. Laboratory of Cell Signaling, Obesity and Comorbidities Research Center, School of Medical Sciences, University of Campinas, Brazil
  2. Department of Physiology, Anatomy and Genetics, University of Oxford, United Kingdom

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.


Editors

  • Reviewing Editor
    Jihwan Park
    Gwangju Institute of Science and Technology, Gwangju, Korea, the Republic of
  • Senior Editor
    Murim Choi
    Seoul National University, Seoul, Korea, the Republic of

Reviewer #1 (Public review):

Summary:

Sidarta-Oliveira et al. present TopOMetry, a novel dimensionality reduction method based on the eigendecomposition of an approximated Laplace-Beltrami operator. In short, TopOMetry is an iterative version of existing spectral methods (e.g., Laplacian Eigenmaps or Diffusion Maps). It approximates the Laplacian operator twice: once in a "phenotypic space" and once more in the space spanned by the resulting eigenbasis. As a result, the approximated operator contains more information about the manifold, which allows for more robust and accurate downstream analyses.
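The two-pass idea summarized above can be illustrated with a minimal sketch. This is not TopOMetry's implementation or API: it uses a plain Laplacian Eigenmaps step on a dense kNN graph, and the function names (`knn_gaussian_affinity`, `laplacian_eigenmap`, `two_stage_embedding`), the Gaussian kernel, and all parameter values are assumptions chosen only for illustration on a small synthetic dataset.

```python
import numpy as np

def knn_gaussian_affinity(X, k):
    """Symmetric kNN affinity with Gaussian weights (dense, small data only)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    order = d2.copy()
    np.fill_diagonal(order, -1.0)             # make each point sort first in its row
    idx = np.argsort(order, axis=1)[:, 1:k + 1]  # k nearest neighbors, self excluded
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    # Per-point bandwidth: squared distance to the k-th neighbor (an assumption).
    scale = d2[np.arange(n), idx[:, -1]] + 1e-12
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / scale[rows])
    return np.maximum(W, W.T)                 # symmetrize the graph

def laplacian_eigenmap(X, k, n_components):
    """Eigenvalues and coordinates from the symmetric normalized graph Laplacian."""
    W = knn_gaussian_affinity(X, k)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(X.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)            # ascending eigenvalues
    # Skip the first (trivial) eigenvector; rescale to random-walk coordinates.
    keep = slice(1, n_components + 1)
    return vals[keep], vecs[:, keep] * d_inv_sqrt[:, None]

def two_stage_embedding(X, k=10, n_eigs=8, n_out=2):
    """Pass 1: eigenbasis of the input space. Pass 2: a new graph on that eigenbasis."""
    _, basis = laplacian_eigenmap(X, k, n_eigs)
    _, emb = laplacian_eigenmap(basis, k, n_out)
    return basis, emb

# Synthetic example: a noisy circle embedded in 20 dimensions.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = 0.05 * rng.normal(size=(200, 20))
X[:, 0] += np.cos(theta)
X[:, 1] += np.sin(theta)

basis, emb = two_stage_embedding(X, k=10, n_eigs=8, n_out=2)
```

Here `basis` plays the role of the eigenbasis learned from the input space and `emb` is the layout obtained from the second graph built on that eigenbasis. The dense eigendecomposition is only practical for a few thousand points.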

Strengths:

(1) The approach was rigorously tested based on synthetic and real single-cell RNA-seq datasets.

(2) The package is well-made and easily scalable to millions of cells.

(3) The comprehensive documentation helps the end-users to run desired analyses.

Weaknesses:

(1) The method is an extension of the current state-of-the-art methods, not a fundamentally new one.

(2) Considering the target readers, the paper contains a lot of jargon.

Reviewer #2 (Public review):

Summary:

This work introduces a novel framework to systematically learn the latent dimensions of single-cell data, grounded in the theory of the Riemannian manifold. The authors demonstrate how this framework can be applied to various important tasks, such as estimating intrinsic dimensionalities, annotating cell types, etc. They did a great job of tackling an important but not yet established problem in the field and approaching it with a theoretically sound and novel approach. I think after a more rigorous and comprehensive validation, this work could be impactful.

Strengths:

(1) Dimensionality reduction is a routine step in analyzing many kinds of high-dimensional data, such as molecular data. While downstream analysis results depend heavily on this step, existing methods rely on strong assumptions and are sometimes heuristic. The authors present a novel, theoretically grounded approach to address this important problem.

(2) The authors demonstrated its usability in downstream analysis in a comprehensive manner. In particular, they show evidence suggesting novel T-cell subpopulations.

(3) I commend the authors for releasing and maintaining their software well with comprehensive documentation. This significantly increases the usability and accessibility of the method.

Weaknesses:

(1) To encourage the single-cell community to adopt this method, the authors should demonstrate its advantages over existing methods more clearly. Many single-cell analysis algorithms have been proposed for each task, and some of them are widely used by biologists, yet the comparison in this work is somewhat limited. For example, not all of the methods mentioned in the related-work paragraph (2nd paragraph) on page 2 are compared, and the reasons for excluding them are not discussed. Also, I am curious how the number of PC dimensions was determined; the choice of 300 PCs on page 11 seems arbitrary. Furthermore, the usefulness of dimension-reduced data also depends heavily on the preceding processing steps, such as highly variable gene selection. I understand it is hard to control all those factors, but I think there is room for improvement.

(2) The paper lacks experiments that validate the results. It would be beneficial to see additional evaluation settings with better-established ground truths to more strongly demonstrate the method's effectiveness.

(3) The effect of various parameters, such as those involved in k-nearest neighbors (KNN) or choosing the appropriate Laplacian operator, is not comprehensively explored. How can we ensure the analysis is not overly sensitive to these parameters?

(4) Batch effects are prevalent in single-cell data. The paper does not adequately address how the proposed method handles this issue.
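As an illustration of the kind of check raised in weakness (3), the sketch below sweeps the kNN parameter `k` and measures how much the local neighborhoods of the resulting embedding change relative to a reference setting. It is a hedged example, not an analysis from the paper: the embedding is a plain Laplacian Eigenmaps step on an unweighted kNN graph, and the helper names (`knn_indices`, `spectral_embedding`, `neighbor_overlap`), the Jaccard-based score, the reference `k`, and the synthetic data are all assumptions.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (self excluded)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, -np.inf)             # make each point sort first in its row
    return np.argsort(d2, axis=1)[:, 1:k + 1]

def spectral_embedding(X, k, n_components=2):
    """Laplacian Eigenmaps on an unweighted, symmetrized kNN graph."""
    n = X.shape[0]
    idx = knn_indices(X, k)
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:n_components + 1] * d_inv_sqrt[:, None]

def neighbor_overlap(A, B, k_eval=15):
    """Mean Jaccard overlap between the k_eval-NN sets of two embeddings."""
    ia, ib = knn_indices(A, k_eval), knn_indices(B, k_eval)
    scores = []
    for a, b in zip(ia, ib):
        inter = len(set(a.tolist()) & set(b.tolist()))
        scores.append(inter / (2 * k_eval - inter))
    return float(np.mean(scores))

# Synthetic data: a noisy circle embedded in 10 dimensions.
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, size=150)
X = 0.05 * rng.normal(size=(150, 10))
X[:, 0] += np.cos(theta)
X[:, 1] += np.sin(theta)

reference = spectral_embedding(X, k=10)
overlaps = {k: neighbor_overlap(reference, spectral_embedding(X, k))
            for k in (5, 10, 20, 40)}
for k, score in overlaps.items():
    print(f"k={k:>2}  neighborhood overlap with k=10 reference: {score:.2f}")
```

Scores close to 1 across a range of `k` would suggest that local structure is stable with respect to this parameter. The same loop could in principle be repeated over kernel or operator choices.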
