Abstract

Single-cell RNA sequencing has spurred the development of computational methods that enable researchers to classify cell types, delineate developmental trajectories, and measure molecular responses to external perturbations. Many of these technologies rely on their ability to detect genes whose cell-to-cell variations arise from the biological processes of interest rather than transcriptional or technical noise. However, for datasets in which the biologically relevant differences between cells are subtle, identifying these genes is challenging. We present the self-assembling manifold (SAM) algorithm, an iterative soft feature selection strategy to quantify gene relevance and improve dimensionality reduction. We demonstrate its advantages over other state-of-the-art methods with experimental validation in identifying novel stem cell populations of Schistosoma mansoni, a prevalent parasite that infects hundreds of millions of people. Extending our analysis to a total of 56 datasets, we show that SAM is generalizable and consistently outperforms other methods in a variety of biological and quantitative benchmarks.

Data availability

The schistosome stem cell scRNAseq data generated in this study is available through the Gene Expression Omnibus (GEO) under accession number GSE116920.

The following data sets were generated
The following previously published data sets were used

Article and author information

Author details

  1. Alexander J Tarashansky

    Department of Bioengineering, Stanford University, Stanford, United States
    Competing interests
    The authors declare that no competing interests exist.
  2. Yuan Xue

    Department of Bioengineering, Stanford University, Stanford, United States
    Competing interests
    The authors declare that no competing interests exist.
  3. Pengyang Li

    Department of Bioengineering, Stanford University, Stanford, United States
    Competing interests
    The authors declare that no competing interests exist.
  4. Stephen R Quake

    Department of Bioengineering, Stanford University, Stanford, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-1613-0809
  5. Bo Wang

    Department of Bioengineering, Stanford University, Stanford, United States
    For correspondence
    wangbo@stanford.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8880-1432

Funding

Burroughs Wellcome Fund

  • Bo Wang

Arnold and Mabel Beckman Foundation

  • Bo Wang

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Alex K Shalek, Broad Institute of MIT and Harvard, United States

Ethics

Animal experimentation: In adherence to the Animal Welfare Act and the Public Health Service Policy on Humane Care and Use of Laboratory Animals, all experiments with and care of mice were performed in accordance with protocols approved by the Institutional Animal Care and Use Committees (IACUC) of Stanford University (protocol approval number 30366).

Version history

  1. Received: June 3, 2019
  2. Accepted: September 16, 2019
  3. Accepted Manuscript published: September 16, 2019 (version 1)
  4. Version of Record published: October 16, 2019 (version 2)

Copyright

© 2019, Tarashansky et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 7,939
    Page views
  • 1,036
    Downloads
  • 33
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, Scopus, PubMed Central.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Alexander J Tarashansky
  2. Yuan Xue
  3. Pengyang Li
  4. Stephen R Quake
  5. Bo Wang
(2019)
Self-assembling manifolds in single-cell RNA sequencing data
eLife 8:e48994.
https://doi.org/10.7554/eLife.48994

Share this article

https://doi.org/10.7554/eLife.48994

Further reading

    1. Chromosomes and Gene Expression
    2. Computational and Systems Biology
    Arthur L Schneider, Rita Martins-Silva ... Nuno L Barbosa-Morais
    Tools and Resources

    We herein introduce voyAGEr, an online graphical interface to explore age-related gene expression alterations in 49 human tissues. voyAGEr offers a visualisation and statistical toolkit for the finding and functional exploration of sex- and tissue-specific transcriptomic changes with age. In its conception, we developed a novel bioinformatics pipeline leveraging RNA sequencing data, from the GTEx project, encompassing more than 900 individuals. voyAGEr reveals transcriptomic signatures of the known asynchronous ageing between tissues, allowing the observation of tissue-specific age periods of major transcriptional changes, associated with alterations in different biological pathways, cellular composition, and disease conditions. Notably, voyAGEr was created to assist researchers with no expertise in bioinformatics, providing a supportive framework for elaborating, testing and refining their hypotheses on the molecular nature of human ageing and its association with pathologies, thereby also aiding in the discovery of novel therapeutic targets. voyAGEr is freely available at https://compbio.imm.medicina.ulisboa.pt/app/voyAGEr.

    1. Cancer Biology
    2. Computational and Systems Biology
    Bingrui Li, Fernanda G Kugeratski, Raghu Kalluri
    Research Article

    Non-invasive early cancer diagnosis remains challenging due to the low sensitivity and specificity of current diagnostic approaches. Exosomes are membrane-bound nanovesicles secreted by all cells that contain DNA, RNA, and proteins that are representative of the parent cells. This property, along with the abundance of exosomes in biological fluids makes them compelling candidates as biomarkers. However, a rapid and flexible exosome-based diagnostic method to distinguish human cancers across cancer types in diverse biological fluids is yet to be defined. Here, we describe a novel machine learning-based computational method to distinguish cancers using a panel of proteins associated with exosomes. Employing datasets of exosome proteins from human cell lines, tissue, plasma, serum, and urine samples from a variety of cancers, we identify Clathrin Heavy Chain (CLTC), Ezrin, (EZR), Talin-1 (TLN1), Adenylyl cyclase-associated protein 1 (CAP1), and Moesin (MSN) as highly abundant universal biomarkers for exosomes and define three panels of pan-cancer exosome proteins that distinguish cancer exosomes from other exosomes and aid in classifying cancer subtypes employing random forest models. All the models using proteins from plasma, serum, or urine-derived exosomes yield AUROC scores higher than 0.91 and demonstrate superior performance compared to Support Vector Machine, K Nearest Neighbor Classifier and Gaussian Naive Bayes. This study provides a reliable protein biomarker signature associated with cancer exosomes with scalable machine learning capability for a sensitive and specific non-invasive method of cancer diagnosis.