Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq

Abstract

Identifying gene expression programs underlying both cell-type identity and cellular activities (e.g. life-cycle processes, responses to environmental cues) is crucial for understanding the organization of cells and tissues. Although single-cell RNA-Seq (scRNA-Seq) can quantify transcripts in individual cells, each cell's expression profile may be a mixture of both types of programs, making them difficult to disentangle. Here we benchmark and enhance the use of matrix factorization to solve this problem. We show with simulations that a method we call consensus non-negative matrix factorization (cNMF) accurately infers identity and activity programs, including their relative contributions in each cell. To illustrate the insights this approach enables, we apply it to published brain organoid and visual cortex scRNA-Seq datasets; cNMF refines cell types and identifies both expected (e.g. cell cycle and hypoxia) and novel activity programs, including programs that may underlie a neurosecretory phenotype and synaptogenesis.
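As a rough illustration of the idea in the abstract, each cell's expression profile can be treated as a non-negative mixture of gene expression programs, X ≈ W·H, where W holds each cell's program usages and H holds the programs themselves. The Python sketch below runs plain NMF from many random seeds, pools and clusters the resulting programs, takes the median of each cluster as a consensus program, and then refits per-cell usages with those programs held fixed. It is a simplified, numpy-only sketch under stated assumptions and not the authors' cNMF implementation. The multiplicative-update solver, the replicate and iteration counts, seeding k-means with the first replicate, and the helper names `nmf`, `kmeans`, and `consensus_nmf` are all illustrative choices, and the published method may differ in its details.

```python
# Simplified illustration of consensus NMF; NOT the authors' cNMF package.
import numpy as np

EPS = 1e-10


def nmf(X, k, n_iter=500, seed=0):
    """Plain NMF with Lee-Seung multiplicative updates (Frobenius loss).

    X is a non-negative cells x genes matrix; returns W (cells x k usages)
    and H (k x genes programs) with X approximately equal to W @ H.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + EPS
    H = rng.random((k, X.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + EPS)
        W *= (X @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def kmeans(P, init, n_iter=100):
    """Lloyd's k-means on the rows of P, starting from the rows of init."""
    centers = init.copy()
    for _ in range(n_iter):
        dist = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def consensus_nmf(X, k, n_replicates=20, n_iter=500):
    """Pool programs from many NMF seeds, cluster them, and take cluster medians."""
    runs = []
    for seed in range(n_replicates):
        _, H = nmf(X, k, n_iter=n_iter, seed=seed)
        # Unit-normalize each program so programs from different runs are comparable.
        runs.append(H / (np.linalg.norm(H, axis=1, keepdims=True) + EPS))
    pooled = np.vstack(runs)  # (n_replicates * k) x genes
    # Assumption: seed the clustering with the first replicate's k programs.
    labels, centers = kmeans(pooled, init=runs[0])
    spectra = np.vstack([np.median(pooled[labels == j], axis=0) if np.any(labels == j)
                         else centers[j] for j in range(k)])
    # Refit usages with the consensus programs held fixed (W-only updates).
    W = np.ones((X.shape[0], k))
    for _ in range(n_iter):
        W *= (X @ spectra.T) / (W @ spectra @ spectra.T + EPS)
    # Row-normalize so each cell's usages sum to 1 (relative contributions).
    usage = W / (W.sum(axis=1, keepdims=True) + EPS)
    return spectra, usage


if __name__ == "__main__":
    # Hypothetical toy data: 3 programs on disjoint gene sets, mixed per cell.
    rng = np.random.default_rng(1)
    G = np.zeros((3, 30))
    for j in range(3):
        G[j, 10 * j:10 * (j + 1)] = rng.random(10) + 0.5
    U = rng.dirichlet(np.ones(3), size=200)
    spectra, usage = consensus_nmf(U @ G, k=3)
    print(spectra.shape, usage.shape)
```

The point of aggregating over seeds in this sketch is to reduce dependence on any single run's random initialization, since NMF solutions can vary between starts; the number of programs k is left to the caller.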

Data availability

All of the analyzed real datasets are publicly available and the relevant GEO accession codes are included in the manuscript. All of the simulated and real data can be accessed through Code Ocean at the following URL: https://doi.org/10.24433/CO.9044782e-cb96-4733-8a4f-bf42c21399e6

The following previously published datasets were used:

Quadrato G, Nguyen T, Macosko EZ, Sherwood JL, Berger D, Maria N, Scholvin J, Goldman M, Kinney J, Boyden E, Lichtman J, Williams ZM, McCarroll SA, Arlotta P (2017) Cell diversity and network dynamics in photosensitive human brain organoids. Gene Expression Omnibus, GSE86153. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE86153

(2018) Single-cell analysis of experience-dependent transcriptomic states in the mouse visual cortex. Gene Expression Omnibus, GSE102827. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE102827

Tasic B, Menon V, Nguyen TN, Kim TK, Yao Z, Gray LT, Hawrylycz M, Koch C, Zeng H (2016) Adult mouse cortical cell taxonomy by single cell transcriptomics. Gene Expression Omnibus, GSE71585. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE71585

Baron M, Veres A, Wolock SL, Faust AL, Gaujoux R, Vetere A, Ryu JH, Wagner BK, Shen-Orr SS, Klein AM, Melton DA, Yanai I (2016) A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure. Gene Expression Omnibus, GSE84133. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84133

Article and author information

Author details

Dylan Kotliar
Department of Systems Biology, Harvard Medical School, Boston, United States
For correspondence: dylan_kotliar@hms.harvard.edu
Competing interests: The authors declare that no competing interests exist.
ORCID iD: 0000-0002-7968-645X

Adrian Veres
Department of Systems Biology, Harvard Medical School, Boston, United States
Competing interests: The authors declare that no competing interests exist.

M Aurel Nagy
Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, United States
Competing interests: The authors declare that no competing interests exist.

Shervin Tabrizi
Viral Computational Genomics, Broad Institute of MIT and Harvard, Cambridge, United States
Competing interests: The authors declare that no competing interests exist.

Eran Hodis
Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, United States
Competing interests: The authors declare that no competing interests exist.

Douglas A Melton
Department of Stem Cell and Regenerative Biology, Harvard Stem Cell Institute, Harvard University, Cambridge, United States
Competing interests: The authors declare that no competing interests exist.
ORCID iD: 0000-0002-1623-5504

Pardis C Sabeti
Department of Systems Biology, Harvard Medical School, Boston, United States
Competing interests: The authors declare that no competing interests exist.

Funding

National Institute of General Medical Sciences (T32GM007753)

Dylan Kotliar
Adrian Veres
M Aurel Nagy
Eran Hodis

National Institute of Allergy and Infectious Diseases (R01AI099210)

Pardis C Sabeti

U.S. Food and Drug Administration (HHSF223201810172C)

Dylan Kotliar
Pardis C Sabeti

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.