Self-assembling manifolds in single-cell RNA sequencing data

  1. Alexander J Tarashansky
  2. Yuan Xue
  3. Pengyang Li
  4. Stephen R Quake
  5. Bo Wang (corresponding author)
  1. Stanford University, United States
  2. Chan Zuckerberg Biohub, United States
  3. Stanford University School of Medicine, United States
Tools and Resources
Cite this article as: eLife 2019;8:e48994 doi: 10.7554/eLife.48994
6 figures, 1 table, 52 data sets and 5 additional files

Figures

Figure 1 with 2 supplements
The SAM algorithm.

(a) SAM starts with a randomly initialized kNN adjacency matrix and iterates to refine the adjacency matrix and gene weight vector until convergence. (b) Root mean square error (RMSE) of the gene weights (top) and the fraction of different edges of the nearest-neighbor adjacency matrices (bottom) between adjacent iterations (blue) and between independent runs at the same iteration (orange) to show that SAM converges to the same solution regardless of initial conditions. The differences between the gene weights and nearest-neighbor graphs from independent runs are relatively small, indicating that SAM converges to the same solution through similar paths. (c) Graph structures and gene weights of the schistosome stem cell data converging to the final output over the course of 10 iterations (i denotes iteration number). Top: nodes are cells and edges connect neighbors. Nodes are color-coded according to the final clusters. Bottom: weights are sorted according to the final gene rankings. (d) Network properties iteratively improve for the graphs reconstructed from the original data (red) but not on the randomly shuffled data (blue). The network properties converge to the same values when initializing SAM with the Seurat-reconstructed graph instead of a random graph (yellow). Dashed lines: metrics measured from the Seurat-reconstructed graphs.

https://doi.org/10.7554/eLife.48994.003
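
The two convergence diagnostics named in panel (b) of Figure 1 can be illustrated with a small sketch. This is not the SAM implementation: the function names are hypothetical, and the exact definition of "fraction of different edges" is assumed here to be the share of edges in one graph that are missing from the other (the article's Materials and methods is the authoritative source).

```python
import math


def rmse(w_prev, w_next):
    """Root mean square error between two equal-length gene weight vectors."""
    if len(w_prev) != len(w_next):
        raise ValueError("weight vectors must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(w_prev, w_next))
    return math.sqrt(squared / len(w_prev))


def edge_set(adjacency):
    """Directed edges (i, j) of a 0/1 adjacency matrix given as nested lists."""
    return {
        (i, j)
        for i, row in enumerate(adjacency)
        for j, value in enumerate(row)
        if value
    }


def fraction_different_edges(adj_a, adj_b):
    """Assumed definition: share of edges in adj_a that are absent from adj_b."""
    edges_a, edges_b = edge_set(adj_a), edge_set(adj_b)
    if not edges_a:
        return 0.0
    return len(edges_a - edges_b) / len(edges_a)


# Toy data: gene weights and kNN graphs from two adjacent iterations.
w_iter1 = [0.10, 0.80, 0.30, 0.95]
w_iter2 = [0.10, 0.90, 0.20, 0.95]
adj_iter1 = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
adj_iter2 = [[0, 1, 0, 1], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]

weight_change = rmse(w_iter1, w_iter2)
edge_change = fraction_different_edges(adj_iter1, adj_iter2)
```

Tracking both quantities between adjacent iterations, and between independent runs at the same iteration, is one way to check that an iterative procedure of this kind settles on the same solution regardless of its starting graph.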
Figure 1—figure supplement 1
Quality control of library preparation and sequencing of the schistosome stem cells.

(a) Histograms of h2a qPCR measurements in 2.5- (left) and 3.5- (right) week post infection samples. (b) Scatter plot of gene count (>2 TPM) vs. mapped read count of individual sequenced cells. Cells with low gene count or low h2a expression are discarded from analysis (red) and the remaining cells are analyzed (blue). The number of cells kept for analysis is specified in the top left corner of each plot.

https://doi.org/10.7554/eLife.48994.004
Figure 1—figure supplement 2
A user interface for interactively exploring single-cell data using SAM.

An interactive Jupyter notebook interface provided by the SAM package facilitates convenient visualization of single cell data (upper left) and changing of SAM parameters using various control panels (upper right, and bottom). This interface allows clustering, subclustering, visualizing of gene expression, and many other applications.

https://doi.org/10.7554/eLife.48994.005
Figure 2 with 1 supplement
SAM identifies novel subpopulations within schistosome stem cells.

(a) UMAP projections of the manifolds reconstructed by SAM, PCA, and Seurat. SIMLR outputs its own 2D projection based on its constructed similarity matrix using a modified version of t-SNE. The schistosome cells are color-coded by the stem cell subpopulations μ, δ’, εɑ, and εβ determined by Louvain clustering. (b) UMAP projections with gene expressions of subpopulation-specific markers (eledh, nanos-2, cabp, astf, bhlh) and a ubiquitous stem cell marker, ago2-1, overlaid. Insets: magnified views of the expressing populations. (c) FISH of cabp and EdU labeling of dividing stem cells in juvenile parasites at 2.5 weeks post-infection show that μ-cells (cabp+EdU+, arrowheads) are close to the parasite surface and beneath a layer of post-mitotic cabp+ cells. Dashed outline: parasite surface. Right: magnified views of the boxed region. (d) FISH of cabp and a set of canonical muscle markers, troponin, myosin, tropomyosin, and collagen, shows colocalization in post-mitotic cabp+ cells. Images in (c–d) are single confocal slices. (e) FISH of astf and bhlh shows their orthogonal expression in adjacent EdU+ cells (arrowheads). Bottom: magnified views of the boxed region. Image is a maximum intensity projection of a confocal stack with a thickness of 12 µm. (f) UMAP projection of stem cells isolated from juveniles at 2.5 and 3.5 weeks post-infection. Cell subpopulation assignments based on marker gene expressions are specified. Right: a magnified view to show the mapping of εɑ- and εβ-cells. (g) Standardized dispersions as calculated by Seurat plotted vs. the SAM gene weights. (h) SC3 AUROC scores plotted vs. the SAM gene weights. Error bars indicate the standard deviation of SC3 AUROC scores between trials using different chosen numbers of clusters. In (g) and (h), the top 20 genes specific to each subpopulation are colored according to the color scheme used in (a).

https://doi.org/10.7554/eLife.48994.006
Figure 2—figure supplement 1
μ-cells express ubiquitous stem cell markers and population-specific genes.

UMAP projections with gene expressions of (a) stem cell markers and (b) μ-cell-specific genes overlaid.

https://doi.org/10.7554/eLife.48994.007
Figure 3 with 1 supplement
SAM improves clustering accuracy and runtime performance.

(a) Accuracy of cluster assignment quantified by the adjusted Rand index (ARI) on nine annotated datasets (left). Right: differences between the number of clusters found by each method (N) and the number of annotated clusters (N_TRUE). Smaller differences indicate more accurate clustering. Seurat* denotes Seurat analysis using parameters that maximize ARI. (b) RMSE of gene weights output by SAM averaged across ten replicate runs with random initial conditions for 56 datasets (blue) and simulated datasets with no intrinsic structure (green, Materials and methods). (c) Runtime of SAM, SC3, SIMLR, and Seurat as a function of the number of cells in each dataset. SC3 and SIMLR were not run on datasets with >3000 cells because their runtime exceeds 20 min.

https://doi.org/10.7554/eLife.48994.008
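
For readers unfamiliar with the clustering score used in Figure 3a, the following is a minimal sketch of the standard adjusted Rand index computed from a contingency table of pair counts. It uses only the Python standard library; the function name is a placeholder, and the benchmarking code used in the article may differ.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: nothing to disagree on

    # Pair counts within each contingency cell, each true cluster,
    # and each predicted cluster.
    cell_pairs = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    true_pairs = sum(comb(c, 2) for c in Counter(labels_true).values())
    pred_pairs = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = true_pairs * pred_pairs / total_pairs  # chance agreement
    maximum = (true_pairs + pred_pairs) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (cell_pairs - expected) / (maximum - expected)


# Cluster labels are arbitrary, so a relabeled but identical partition scores 1.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
split_cluster = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])
```

Because the score is corrected for chance, a value near 0 means agreement no better than random assignment, and negative values are possible when the two partitions disagree more than chance would predict.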
Figure 3—figure supplement 1
SAM converges to a stable solution independent of random initial conditions and is robust to the number of nearest neighbors and choice of distance metric.

(a) RMSE of gene weights between adjacent iterations within a run, averaged across ten replicate runs for all datasets. (b–c) Average ARI scores for the nine annotated benchmarking datasets when varying (b) the number of nearest neighbors, k, from 10 to 30 or (c) the choice of distance metric (Euclidean or Pearson correlation). Error bars indicate standard deviations of ARI scores across the different values of k and distance metrics. The errors for data with no error bars are too small to be seen.

https://doi.org/10.7554/eLife.48994.009
Figure 4
SAM improves the analysis of datasets with varying network sensitivities.

(a) Network sensitivity of all 56 datasets ranked in descending order. Blue: the nine benchmarking datasets used in Figure 3a. Sensitivity measures the robustness of a dataset to changes in which features are selected (Materials and methods). (b) The network sensitivity plotted against the fraction of genes with SAM weight greater than 0.5 (in log scale) with Spearman correlation coefficient specified in the upper-right corner. (c) Fold improvement of SAM over Seurat for NACC, modularity, and spatial dispersion with respect to sensitivity for all 56 datasets. These ratios are linearly correlated with network sensitivity with Pearson correlations (r²) specified in the upper-left corner of each plot.

https://doi.org/10.7554/eLife.48994.010
Figure 5
Robust feature selection improves cell clustering and manifold reconstruction.

(a) Network sensitivity, ARI, NACC, modularity, and spatial dispersion with respect to corruption of the Darmanis dataset, in which we randomly permute fractions of the data ranging from 0 to 100% of the total number of elements (Materials and methods). Performance is compared between SAM (blue), Seurat (red), Seurat with optimal parameters (black), and Seurat rescued with the top-ranked SAM genes (indigo). Error bars indicate the standard deviations across 10 replicate runs. The errors for points with no bars are too small to be seen. (b) Comparison of the area under curve (AUC) of the metrics in (a) with respect to data corruption for all nine datasets. Error bars indicate the standard deviations across 10 replicate runs. The errors for data with no error bars are too small to be seen.

https://doi.org/10.7554/eLife.48994.011
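
The corruption experiment and the AUC summary described in the caption above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: "randomly permute a fraction of the data" is interpreted here as shuffling the values held at a randomly chosen fraction of matrix positions, the AUC is taken with the trapezoidal rule, and both function names are hypothetical.

```python
import random


def corrupt_matrix(matrix, fraction, seed=0):
    """Return a copy of a non-empty matrix with `fraction` of its entries shuffled."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)
    positions = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    chosen = rng.sample(positions, round(fraction * len(positions)))

    # Shuffle only the values found at the chosen positions.
    values = [matrix[i][j] for i, j in chosen]
    rng.shuffle(values)

    corrupted = [list(row) for row in matrix]  # leave the input untouched
    for (i, j), value in zip(chosen, values):
        corrupted[i][j] = value
    return corrupted


def area_under_curve(x, y):
    """Trapezoidal area under the points (x[k], y[k]), with x sorted ascending."""
    return sum((x[k + 1] - x[k]) * (y[k] + y[k + 1]) / 2 for k in range(len(x) - 1))


# Toy usage: a metric that decays linearly as corruption goes from 0 to 100%.
fractions = [0.0, 0.5, 1.0]
metric = [1.0, 0.5, 0.0]
auc = area_under_curve(fractions, metric)
```

Summarizing each metric by its area over the full corruption range gives a single number per method and dataset, which is what makes a comparison across all nine datasets in panel (b) possible.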
Figure 6 with 2 supplements
SAM captures the cellular activation dynamics in a stimulated macrophage dataset.

(a) GSEA (left) and UMAP projections (right) of the activated macrophages before (top) and after (bottom) removing cell cycle effects. Teal: significantly enriched gene sets determined by the significance threshold of 0.25 for the False Discovery Rate (FDR, dashed lines). Bottom: the two clusters are denoted as MT and M with colors representing the time since LPS induction. Arrows: evolution of time. (b) TNFα is enriched in the MT cluster. (c) Diagram of NF-κB activation in response to LPS stimulation via the Myd88 and TRIF signaling pathways. (d) Log2 fold changes of the average expressions of selected inflammatory genes in the MT cluster vs. the M cluster. All genes are significantly differentially expressed between the two clusters according to Welch’s two-sample t-test (p < 5 × 10⁻³). (e) Representative traces for transient (left) and prolonged (right) NF-κB activation (Materials and methods). (f) Cells with prolonged NF-κB response (denoted as P) are primarily in the MT population. (g) Seurat and SIMLR projections show that they fail to order the cells by time since LPS induction and do not identify cell clusters representing the different modes of NF-κB activation.

https://doi.org/10.7554/eLife.48994.012
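
The two statistics reported in panel (d) of Figure 6, a log2 fold change of average expression and Welch's two-sample t-test, can be sketched with the standard library alone. The function names and toy expression values are illustrative assumptions. Converting the t statistic to a p-value requires the Student t distribution, which the standard library does not provide, so that step is left out here; `scipy.stats.ttest_ind` with `equal_var=False` performs the full test.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of mean expression in group_a to group_b."""
    return math.log2(mean(group_a) / mean(group_b))


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Per-group variance of the mean (sample variance divided by group size).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Toy expression values for one gene in two clusters (made-up numbers).
cluster_mt = [4.0, 6.0, 8.0, 6.0]
cluster_m = [1.0, 2.0, 3.0, 2.0]

lfc = log2_fold_change(cluster_mt, cluster_m)
t_stat, dof = welch_t(cluster_mt, cluster_m)
```

Welch's variant does not assume equal variances in the two groups, which is a reasonable default when comparing clusters of different sizes and heterogeneity.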
Figure 6—figure supplement 1
Cluster-specific marker genes before and after removing cell cycle effects.

UMAP projections with marker genes specific to the dividing cells (a) and the MT cluster (b) overlaid.

https://doi.org/10.7554/eLife.48994.013
Figure 6—figure supplement 2
SAM groups cells based on NF-κB activation dynamics while other methods cannot.

(a) UMAP projection of the macrophage cells after the removal of cell cycle effects. Cells with prolonged NF-κB dynamics are highlighted in red. (b) UMAP and t-SNE projections for Seurat and SIMLR, respectively, after the removal of cell cycle effects. Cells with prolonged NF-κB dynamics are highlighted in red. (c) UMAP projections for Seurat and SIMLR with three MT-specific marker gene expressions overlaid.

https://doi.org/10.7554/eLife.48994.014

Tables

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Commercial assay or kit | SsoAdvanced Universal SYBR Green Supermix | Biorad | 1725270 | qPCR
Commercial assay or kit | Quant-iT PicoGreen dsDNA Assay Kit | Thermo-Fisher | P7589 | cDNA quantification
Peptide, recombinant protein | RNase Inhibitor | Takara Bio | 2313B | RT mix
Chemical compound, drug | dNTP Set 100 mM solutions | Thermo-Fisher | R0181 | RT mix and cDNA pre-amplification
Sequence-based reagents | 100 µM oligo-dT | IDT | AAGCAGTGGTATCAACGCAGAGTACT(30)VN |
Sequence-based reagents | 100 µM TSO | Exiqon | AAGCAGTGGTATCAACGCAGAGTACATrGrG+G |
Commercial assay or kit | ERCC RNA Spike-In Mix | Thermo-Fisher | 4456740 | RT mix
Chemical compound, drug | 10% Triton X-100 | Thermo-Fisher | 28314 | RT mix
Peptide, recombinant protein | SMARTscribe reverse transcriptase | Takara Bio | 639538 | RT mix
Chemical compound, drug | 100 mM DTT | Promega | P1171 | RT mix
Chemical compound, drug | 5 M Betaine | Thermo-Fisher | B0300-1VL | RT mix
Commercial assay or kit | Kapa Hotstart Ready Mix | Roche | KK2602 | cDNA pre-amplification
Sequence-based reagents | 100 μM IS_PCR primer | IDT | AAGCAGTGGTATCAACGCAGAGT |
Peptide, recombinant protein | lambda exonuclease | NEB | M0262S | Depletion of primer dimers
Commercial assay or kit | Ampure purification beads | NEB | M0262S | DNA purification
Commercial assay or kit | TG Nextera XT DNA Sample Preparation Kit | Illumina | FC-131–1096 | Library preparation
Commercial assay or kit | TG Nextera XT Index Kit v2 Set A (96 Indices, 384 Samples) | Illumina | TG-131–2001 | Library preparation
Strain, strain background (S. mansoni) | NMRI | BEI Resources | NR-21963 |
Antibody | Anti-Digoxigenin-POD, Fab fragments from sheep | Roche | 11207733910 | (1:1,000); FISH experiments
Antibody | Anti-Fluorescein-POD, Fab fragments from sheep | Roche | 11426346910 | (1:1,500); FISH experiments
Peptide, recombinant DNA reagents | Plasmid-pJC53.2 | Addgene | 26536 | Cloning vector
Chemical compound, drug | Cy5-azide | Click Chemistry Tools | AZ118 | EdU detection
Chemical compound, drug | 5-ethynyl-2-deoxyuridine (EdU) | Invitrogen | A10044 |
Chemical compound, drug | Vybrant DyeCycle Violet (DCV) | Invitrogen | V35003 | FACS
Chemical compound, drug | TOTO-3 | Invitrogen | T3604 | FACS

Data availability

The schistosome stem cell scRNAseq data generated in this study is available through the Gene Expression Omnibus (GEO) under accession number GSE116920.

The following data sets were generated
  1. NCBI Gene Expression Omnibus
    1. Y Xue
    2. B Wang
    (2018)
    ID GSE116920. Single-cell RNA sequencing of proliferative stem cell population from juvenile Schistosoma mansoni.
The following previously published data sets were used
  1. NCBI Gene Expression Omnibus
    1. F Tang
    2. J Qiao
    3. R Li
    (2013)
    ID GSE36552. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells.
  2. ArrayExpress
    1. M Goolam
    2. A Scialdone
    3. SJL Graham
    4. IC Macaulay
    5. A Jedrusik
    6. A Hupalowska
    7. T Voet
    8. JC Marioni
    9. M Zernicka-Goetz
    (2016)
    ID E-MTAB-3321. Heterogeneity in Oct4 and Sox2 targets biases cell fate in 4-cell mouse embryos.
  3. NCBI Gene Expression Omnibus
    1. B Tasic
    2. V Menon
    3. TN Nguyen
    4. TK Kim
    5. Z Yao
    6. LT Gray
    7. M Hawrylycz
    8. C Koch
    9. H Zeng
    (2016)
    ID GSE71585-GPL17021. Adult mouse cortical cell taxonomy revealed by single cell transcriptomics.
  4. NCBI Gene Expression Omnibus
    1. F Guo
    2. H Guo
    3. L Li
    4. F Tang
    (2015)
    ID GSE63818. The transcriptome and DNA methylome landscapes of human primordial germ cells.
  5. ArrayExpress
    1. JK Kim
    2. AA Kolodziejczyk
    3. T Ilicic
    4. SA Teichmann
    5. JC Marioni
    (2015)
    ID E-MTAB-2600. Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation.
  6. NCBI Gene Expression Omnibus
    1. D Wollny
    2. S Zhao
    3. A Martin-Villalba
    (2016)
    ID GSE80032. Single-cell analysis uncovers clonal acinar cell heterogeneity in the adult pancreas.
  7. NCBI
    1. KM Loh
    2. A Chen
    3. PW Koh
    4. TZ Deng
    5. R Sinha
    6. JM Tsai
    7. AA Barkal
    8. KY Shen
    9. R Jain
    10. RM Morganti
    11. N Shyh-Chang
    12. NB Fernhoff
    13. BM George
    14. G Wernig
    15. REA Salomon
    16. Z Chen
    17. H Vogel
    18. JA Epstein
    19. A Kundaje
    20. WS Talbot
    21. PA Beachy
    22. LT Ang
    23. IL Weissman
    (2016)
    ID SRP073808. Mapping the pairwise choices leading from pluripotency to human bone, heart, and other mesoderm cell types.
  8. NCBI Gene Expression Omnibus
    1. Q Deng
    2. D Ramsköld
    3. B Reinius
    4. R Sandberg
    (2014)
    ID GSE45719. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells.
  9. NCBI Gene Expression Omnibus
    1. P Anoop
    2. T Itay
    (2014)
    ID GSE57872. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma.
  10. NCBI Gene Expression Omnibus
    1. AH Rizvi
    2. PG Camara
    3. EK Kandror
    4. TJ Roberts
    5. I Schieren
    6. T Maniatis
    7. R Rabadan
    (2017)
    ID GSE94883. Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development.
  11. NCBI Gene Expression Omnibus
    1. Q Tang
    2. D Langenau
    (2017)
    ID GSE100911. Dissecting hematopoietic and renal cell heterogeneity in adult zebrafish at single-cell resolution using RNA sequencing.
  12. NCBI Gene Expression Omnibus
    1. I Engel
    2. G Seumois
    3. L Chavez
    4. A Chawla
    5. B White
    6. D Mock
    7. P Vijayanand
    8. M Kronenberg
    (2016)
    ID GSE74596. Innate-like functions of natural killer T cell subsets result from highly divergent gene programs.
  13. ArrayExpress
    1. D Edsgard
    2. F Lanner
    3. R Sandberg
    4. S Petropoulos
    (2016)
    ID E-MTAB-3929. Single-cell RNA-seq reveals lineage and X chromosome dynamics in human preimplantation embryos.
  14. NCBI Gene Expression Omnibus
    1. JC Burns
    2. MC Kelly
    3. M Hoa
    4. RJ Morell
    5. MW Kelley
    (2015)
    ID GSE71982. Single-cell RNA-Seq resolves cellular complexity in sensory organs from the neonatal inner ear.
  15. NCBI Gene Expression Omnibus
    1. A Namani
    2. XJ Wang
    3. X Tang
    (2017)
    ID GSE94383. Measuring signaling and RNA-Seq in the same cell links gene expression to dynamic patterns of NF-κB activation.
  16. NCBI Gene Expression Omnibus
    1. FH Biase
    2. X Cao
    3. S Zhong
    (2014)
    ID GSE57249. Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing.
  17. NCBI Gene Expression Omnibus
    1. C Trapnell
    2. D Cacchiarelli
    3. J Grimsby
    4. P Pokharel
    5. S Li
    6. M Morse
    7. T Mikkelsen
    8. J Rinn
    (2014)
    ID GSE52529-GPL16791. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells.
  18. NCBI SRA
    1. AA Pollen
    2. TJ Nowakowski
    3. J Shuga
    4. X Wang
    5. AA Leyrat
    6. JH Lui
    7. N Li
    8. L Szpankowski
    9. B Fowler
    10. P Chen
    11. N Ramalingam
    12. G Sun
    13. M Thu
    14. M Norris
    15. R Lebofsky
    16. D Toppani
    17. DW Kemp
    18. M Wong
    19. B Clerkson
    20. BN Jones
    21. S Wu
    22. L Knutsson
    23. B Alvarado
    24. J Wang
    25. LS Weaver
    26. AP May
    27. RC Jones
    28. MA Unger
    29. AR Kriegstein
    30. JA West
    (2014)
    ID SRP041736. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex.
  19. ArrayExpress
    1. F Buettner
    2. KN Natarajan
    3. FP Casale
    4. V Proserpio
    5. A Scialdone
    6. FJ Theis
    7. SA Teichmann
    8. JC Marioni
    9. O Stegle
    (2015)
    ID E-MTAB-2805. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells.
  20. NCBI Gene Expression Omnibus
    1. R Satija
    (2014)
    ID GSE48968-GPL13112. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation.
  21. NCBI Gene Expression Omnibus
    1. L Ning
    2. C Li-Fang
    (2015)
    ID GSE64016. Oscope identifies oscillatory genes in unsynchronized single-cell RNAseq experiments.
  22. NCBI Gene Expression Omnibus
    1. SE Meyer
    2. T Qin
    3. DE Muench
    4. K Masuda
    5. M Venkatasubramanian
    6. E Orr
    7. E Paietta
    8. MS Tallman
    9. H Fernandez
    10. A Melnick
    11. MM Beau
    12. S Kogan
    13. N Salomonis
    14. ME Figueroa
    15. HL Grimes
    (2016)
    ID GSE77847. DNMT3A haploinsufficiency transforms FLT3ITD myeloproliferative disease into a rapid, spontaneous, and fully penetrant acute myeloid leukemia.
  23. NCBI Gene Expression Omnibus
    1. B Treutlein
    2. SR Quake
    (2014)
    ID GSE52583-GPL13112. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq.
  24. NCBI Gene Expression Omnibus
    1. A Olsson
    2. M Venkatasubramanian
    3. VK Chaudhri
    4. BJ Aronow
    (2016)
    ID GSE70245. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice.
  25. NCBI Gene Expression Omnibus
    1. J Shin
    2. H Song
    (2015)
    ID GSE71485. Single-cell RNA-seq with waterfall reveals molecular cascades underlying adult neurogenesis.
  26. ArrayExpress
    1. PC Schwalie
    2. H Dong
    3. M Zachara
    4. J Russeil
    5. D Alpern
    6. N Akchiche
    7. C Caprara
    8. W Sun
    9. KU Schlaudraff
    10. G Soldati
    11. C Wolfrum
    12. B Deplancke
    (2018)
    ID E-MTAB-6677. A stromal cell population that inhibits adipogenesis in mammalian fat depots.
  27. NCBI Gene Expression Omnibus
    1. S Darmanis
    2. S Quake
    (2017)
    ID GSE84465. Single-Cell RNA-Seq analysis of infiltrating neoplastic cells at the migrating front of human glioblastoma.
  28. ArrayExpress
    1. A Scialdone
    2. Y Tanaka
    3. W Jawaid
    4. V Moignard
    5. NK Wilson
    6. IC Macaulay
    7. JC Marioni
    8. B Göttgens
    (2016)
    ID E-MTAB-4079. Resolving early mesoderm diversification through single-cell expression profiling.
  29. NCBI Gene Expression Omnibus
    1. M Enge
    2. HE Arda
    (2017)
    ID GSE81547. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns.
  30. NCBI Gene Expression Omnibus
    1. I Stévant
    2. Y Neirijnck
    3. C Borel
    4. J Escoffier
    5. LB Smith
    6. SE Antonarakis
    7. ET Dermitzakis
    8. S Nef
    (2018)
    ID GSE97519. Deciphering cell lineage specification during male sex determination with single-cell RNA sequencing.
  31. NCBI Gene Expression Omnibus
    1. MJ Phillips
    2. P Jiang
    3. S Howden
    (2018)
    ID GSE98556. A novel approach to single cell RNA-sequence analysis facilitates in silico gene reporting of human pluripotent stem cell-derived retinal cell types.
  32. NCBI Gene Expression Omnibus
    1. M Vanlandewijck
    2. L He
    3. MA Mäe
    4. J Andrae
    5. C Betsholtz
    (2018)
    ID GSE99235. A molecular atlas of cell types and zonation in the brain vasculature.
  33. 33
  34. NCBI Gene Expression Omnibus
    1. A Ghahramani
    2. F Watt
    3. N Luscombe
    (2018)
    ID GSE99989. Epidermal Wnt signalling regulates transcriptome heterogeneity and proliferative fate in neighbouring cells.
  35. NCBI Gene Expression Omnibus
    1. F Lescroart
    2. X Wang
    3. X Li
    4. S Gargouri
    5. V Moignard
    6. C Dubois
    7. C Paulissen
    8. B Göttgens
    9. C Blanpain
    (2018)
    ID GSE100471. Defining the earliest step of cardiovascular lineage segregation by single-cell RNA-seq.
  36. NCBI Gene Expression Omnibus
    1. H Mohammed
    2. I Hernando-Herraez
    3. W Reik
    (2017)
    ID GSE100597. Single-cell landscape of transcriptional heterogeneity and cell fate decisions during mouse early gastrulation.
  37. NCBI Gene Expression Omnibus
    1. H Mathys
    2. F Gao
    3. L Tsai
    (2017)
    ID GSE103334. Temporal tracking of microglia activation in neurodegeneration at single-cell resolution.
  38. NCBI Gene Expression Omnibus
    1. M Chevée
    2. JD Robertson
    3. GH Cannon
    4. SP Brown
    5. LA Goff
    (2018)
    ID GSE107632. Variation in activity state, axonal projection, and position define the transcriptional identity of individual neocortical projection neurons.
  39. NCBI Gene Expression Omnibus
    1. PW Hook
    2. SA McClymont
    3. GH Cannon
    4. WD Law
    5. AJ Morton
    6. LA Goff
    7. AS McCallion
    (2018)
    ID GSE108020. Single-Cell RNA-Seq of mouse dopaminergic neurons informs candidate gene selection for sporadic parkinson disease.
  40. NCBI Gene Expression Omnibus
    1. D Mi
    2. Z Li
    3. M Li
    (2018)
    ID GSE109796. Early emergence of cortical interneuron diversity in the mouse embryo.
  41. NCBI Gene Expression Omnibus
    1. F Zanini
    2. S Pu
    3. E Bekerman
    4. S Einav
    5. SR Quake
    (2018)
    ID GSE110496. Single-cell transcriptional dynamics of flavivirus infection.
  42. NCBI Gene Expression Omnibus
    1. I Tirosh
    2. A Venteicher
    3. M Suva
    4. A Regev
    (2016)
    ID GSE70630. Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma.
  43. NCBI Gene Expression Omnibus
    1. I Amit
    2. A Tanay
    3. F Paul
    4. Y Arkin
    5. A Giladi
    (2015)
    ID GSE72857. Transcriptional heterogeneity and lineage commitment in myeloid progenitors.
  44. NCBI Gene Expression Omnibus
    1. GK Smyth
    2. Y Chen
    3. B Pal
    4. JE Visvader
    (2017)
    ID GSE95430. Construction of developmental lineage relationships in the mouse mammary gland by single-cell RNA profiling.
  45. NCBI Gene Expression Omnibus
    1. M Häring
    2. A Zeisel
    3. S Linnarsson
    4. P Ernfors
    (2018)
    ID GSE103840. Neuronal atlas of the dorsal horn defines its architecture and links sensory input to transcriptional cell types.
  46. NCBI Gene Expression Omnibus
    1. S Darmanis
    2. M Enge
    3. SR Quake
    4. SA Sloan
    5. BA Barres
    6. Y Zhang
    7. C Caneda
    8. MG Hayden Gephart
    9. LM Shuer
    (2015)
    ID GSE67835. A survey of human brain transcriptome diversity at the single cell level.
  47. NCBI Gene Expression Omnibus
    1. YJ Wang
    2. J Schug
    3. ML Golson
    4. K Won
    5. C Liu
    6. A Naji
    7. D Avrahami
    8. KH Kaestner
    (2016)
    ID GSE83139. Single-cell transcriptomics of the human endocrine pancreas.
  48. NCBI Gene Expression Omnibus
    1. A Veres
    2. M Baron
    (2016)
    ID GSE84133. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure.
  49. NCBI Gene Expression Omnibus
    1. CT Fincher
    2. O Wurtzel
    3. T de Hoog
    4. KM Kravarik
    5. PW Reddien
    (2018)
    ID GSE111764. Cell type transcriptome atlas for the planarian Schmidtea mediterranea.
  50. ArrayExpress
    1. Å Segerstolpe
    2. A Palasantza
    3. P Eliasson
    4. EM Andersson
    5. AC Andréasson
    6. X Sun
    7. S Picelli
    8. A Sabirsh
    9. M Clausen
    10. MK Bjursell
    11. DM Smith
    12. M Kasper
    13. C Ämmälä
    14. R Sandberg
    (2016)
    ID E-MTAB-5061. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes.
  51. NCBI Gene Expression Omnibus
    1. MJ Muraro
    2. G Dharmadhikari
    3. E de Koning
    4. A van Oudenaarden
    (2016)
    ID GSE85241. A Single-cell transcriptome atlas of the human pancreas.

Additional files

Supplementary file 1

Ranked gene list with high SAM weights in the schistosome stem cell data.

Gene IDs and annotations are given in the S. mansoni genome version 9 (WormBase, WS268). Genes are assigned to the cluster corresponding to the marker gene, nanos-2, cabp, astf, or bhlh, with which they have the highest correlation. Genes found in our prior work (Wang et al., 2018) to be enriched in subsets of stem cells are specified.

https://doi.org/10.7554/eLife.48994.015
Supplementary file 2

Datasets used in this study.

Accession numbers, library size normalization methods, data preprocessing methods, sensitivity scores, and corresponding references are provided for each dataset. Accession numbers with asterisks indicate datasets that are sourced from the conquer database (Soneson and Robinson, 2018). Accession numbers with crosses indicate the nine well-annotated datasets that were used for benchmarking.

https://doi.org/10.7554/eLife.48994.016
Supplementary file 3

ARI clustering accuracy of individual annotated cell types.

The ARI scores of SAM, Seurat, SC3, and SIMLR applied to the nine benchmarking datasets are provided for each annotated ground truth cluster.

https://doi.org/10.7554/eLife.48994.017
Supplementary file 4

Cloning primer sequences used for generating riboprobes for the FISH experiments and primer sequences for qPCR analysis.

Functional annotations of the genes were given in the S. mansoni genome version 9 (WormBase, WS268).

https://doi.org/10.7554/eLife.48994.018
Transparent reporting form
https://doi.org/10.7554/eLife.48994.019