Figures and data in Common cell type nomenclature for the mammalian brain

5 figures, 4 tables and 3 additional files

Figures

Figure 1

Overview of common cell type nomenclature (CCN) and application to human middle temporal gyrus (MTG).

(A) Schematic of CCN components and process. (B–D) Example outputs from the CCN. (B) Annotated dendrogram of cell types in human MTG, along with associated cell type names, reproduced from Hodge et al., 2019. Internal nodes with a term (teal circles) represent cell sets with preferred alias tags. (C) CCN annotations for a putative cell type (outlined in blue) and an internal node (outlined in orange) of this dendrogram. (D) Snippet of an output file from the CCN showing cell to cell set mappings as applied to human MTG.

Figure 2

Workflow for assigning cell types to a given dataset using a taxonomy.

(1) Cell type classification will initially be performed separately on all datasets. (2) One, some, or all of these datasets will be combined into a high-confidence reference taxonomy, which can be used as a comparator for any related datasets by (3) mapping existing and new datasets to the reference taxonomy. (4) The reference will periodically be updated as new datasets and taxonomies are generated.

Figure 3

Series of multimodal, cross-species taxonomies in primary motor cortex (M1) demonstrates utility of nomenclature schema.

(A) Taxonomies based on transcriptomics (‘1’; top), open chromatin (‘2’; middle), and DNA methylation (‘3’; bottom) in human M1. Epigenomic clusters (‘2’, ‘3’; in rows) are aligned to RNA-seq clusters (‘1’), as indicated by horizontal black bars, and are also assigned matching cell sets in the relevant taxonomies. Adapted from Bakken et al., 2020a. (B) Flow chart showing all 11 taxonomies generated for this project and their connections. The integrated (reference) taxonomy included nuclei collected using snRNA-seq from three species (gray box), with nuclei collected from layer 5 in macaque mapped to this space post hoc (gray line). Separately, epigenetics taxonomies from human, marmoset, and mouse were aligned to their respective transcriptomics taxonomies (black lines). This entire taxonomic structure is captured by the CCN (see Supplementary file 1). (C) An example mapping of corticothalamic (L6 CT) provisional cell types across the human and transcriptomics taxonomies using the CCN (black box in A). Preferred aliases for each taxonomy are used for clarity.

Figure 4

Alignment of glutamatergic cell sets in human middle temporal gyrus (MTG) to a reference primary motor cortex (M1) taxonomy.

Cluster overlap heatmap showing the proportion of nuclei from MTG clusters and the reference (M1) clusters that coalesce with a given aligned cluster. Cell sets corresponding to aligned aliases in the MTG and M1 taxonomies are labeled and indicated by blue boxes. Adapted from Bakken et al., 2020a.

Figure 5

Application of common cell type nomenclature (CCN) to glutamatergic me-types in the mouse visual cortex.

Excitatory (glutamatergic) me-types from Gouwens et al., 2019 that have been incorporated into the nomenclature schema. Eleven of the original 20 excitatory me-types are shown as examples. Representative morphologies and electrophysiological responses are shown to illustrate the differences between types. The ‘inferred subclass’ calls map perfectly to cell set aligned aliases from the reference M1 taxonomy in Figure 3, except that L5 CF (corticofugal) is an additional alias for L5 ET, and cell sets corresponding to L4, L6 IT, and L6 CT (blue boxes) have been added to the taxonomy.

Tables

Table 1

Glossary of terms.

Terminology used with the common cell type nomenclature (CCN), definitions for use, and examples of how terms are applied. Terms are presented in bold upon first use in the text. This glossary is intended to clarify use for the purposes of the CCN since some terms are open to multiple interpretations, and effective classification requires disambiguation. Asterisks denote terms that represent specific components of the CCN.

Term	Definition	Example
Taxonomy	Set of quantitatively derived data clusters defined by a specific computational algorithm on a specific dataset(s). Taxonomies are given a unique label and can be annotated with metadata about the taxonomy, including details of the algorithms and relevant cell and cell set IDs.	Any clustering result in a cell type classification manuscript
Dataset	Feature information (e.g., gene expression) and associated metadata from a set of cells collected as part of a single project.	Gene expression from 6000 human MOp nuclei
Ontology	A structured controlled vocabulary for cell types.	Cell Ontology
Marker gene(s)	A gene (gene set) which, when expressed in a cell, can be used to accurately assign that cell to a specific cell set.	GAD2; PVALB; CHODL
Taxonomy ID*	An identifier uniquely tagging a taxonomy of the format CCN[YYYYMMDD][#].	CCN201910120
Cell	A single entry in a taxonomy representing data from a single cell (or cell compartment, such as the nucleus). Cells have metadata including a unique ID.	N/A
Cell set	Any tagged group of cells in a taxonomy. This includes cell types, groups of cell types, and potentially other informative groupings (e.g., all cells from one donor, organ, cortical layer, or transgenic line). Cell sets have several IDs and descriptors (as discussed below) and can also have other metadata.	A cell type; a group of cell types; all cells from layer 2 in MTG; all cells from donor X
Provisional cell type	Quantitatively derived data cluster defined within a taxonomy. This is a specific example of a cell set that is of high importance, as most other cell sets are groupings of one or more provisional cell types. Here, the term ‘cell type’ is synonymous with ‘provisional cell type’.	A cell type defined in a specific study
Dendrogram	A hierarchical organization of provisional cell types defined for a specific taxonomy. Dendrograms have a specific semantic and visualizable structure and include nodes (representing multiple provisional cell types) and leaves (representing exactly one). Not all taxonomies include a dendrogram (e.g., if the structure of cell sets is non-hierarchical).	N/A
Community structure	Non-hierarchical relationships between cell types defined as groups of cell types in a graph.	N/A
Cell set accession ID*	A unique ID across all tracked datasets and taxonomies. This tag labels the taxonomy and numbers each cell set. Format: CS[YYYYMMDD][#]_[unique # within taxonomy].	CS201910120_1
Cell set label*	An ID unique within a single taxonomy that is used for assigning cells to cell sets defined as a combination of multiple ‘provisional cell types’.	MTG 12; MTG 01–08
Cell set alias*	Any cell set descriptor. It can be defined computationally from the data, or manually based on new experiments, prior knowledge, or a combination of both. Cell set aliases beyond the ‘preferred’ or ‘aligned’ alias are defined as ‘cell set additional aliases’.	(Any ‘cell set aligned alias’); Interneuron 1; Rosehip
Cell set preferred alias*	The primary cell set alias (e.g., what cell types might be called in a publication). This can sometimes match the aligned alias, but not always, and can be left unassigned.	Inh L1-2 PAX6 CDH12; ADARB2 (CGE); Chandelier; [blank]
Cell set aligned alias*	Analogous to ‘gene symbol’. At most one biologically driven name for linking matching cell sets across taxonomies and with a reference taxonomy.	L2/3 IT 4; Pvalb 3; Microglia 2
Cell set structure*	The location in the brain (or body) from where cells in the associated set were primarily collected.	Neocortex
Cell set ontology tag*	A tag from a standard ontology (e.g., UBERON) corresponding to the listed cell set structure.	UBERON:0001950
Cell set alias assignee*	Person responsible for assigning a specific cell set alias in a specific taxonomy (e.g., the person who built the taxonomy or uploaded the data, or a field expert).	(First author of manuscript)
Cell set alias citation*	The citation or permanent data identifier corresponding to the taxonomy where the cell set was originally reported.	(Manuscript DOI); [blank]
Reference taxonomy	A taxonomy based on one or a combination of high-confidence datasets, to be used as a baseline of comparison for datasets collected from the same organ system.	Cross-species cortical cell type classification
Morpho-electric (ME) type	A provisional cell type defined using a combination of morphological and electrophysiological features.	ME_Exc_7
Governing body	A forum of subject-matter experts to guide policy and manage change of the CCN and associated ontologies and databasing efforts.	N/A
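
The taxonomy ID and cell set accession ID formats in the glossary can be checked mechanically. The following is a minimal Python sketch, not part of the CCN tooling: the helper names and regular expressions are hypothetical, and it assumes that `[#]` is one or more digits following an eight-digit `YYYYMMDD` date.

```python
import re
from datetime import date, datetime

# Assumed reading of the glossary formats: an 8-digit date followed by a
# numeric index, e.g. CCN201910120 -> date 2019-10-12, index 0.
TAXONOMY_ID = re.compile(r"^CCN(\d{8})(\d+)$")
ACCESSION_ID = re.compile(r"^CS(\d{8})(\d+)_(\d+)$")


def parse_taxonomy_id(taxonomy_id: str) -> tuple:
    """Split a taxonomy ID into (date, index). Hypothetical helper."""
    match = TAXONOMY_ID.match(taxonomy_id)
    if match is None:
        raise ValueError(f"not a taxonomy ID: {taxonomy_id!r}")
    # strptime raises ValueError if the digits are not a real calendar date.
    day = datetime.strptime(match.group(1), "%Y%m%d").date()
    return day, int(match.group(2))


def parse_accession_id(accession_id: str) -> tuple:
    """Split an accession ID into (taxonomy ID, number within taxonomy)."""
    match = ACCESSION_ID.match(accession_id)
    if match is None:
        raise ValueError(f"not a cell set accession ID: {accession_id!r}")
    taxonomy_id = f"CCN{match.group(1)}{match.group(2)}"
    parse_taxonomy_id(taxonomy_id)  # re-use the date validation above
    return taxonomy_id, int(match.group(3))


if __name__ == "__main__":
    print(parse_taxonomy_id("CCN201910120"))
    print(parse_accession_id("CS201910120_1"))
```

Under these assumptions, the glossary example `CS201910120_1` resolves to cell set 1 of taxonomy `CCN201910120`.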

Table 2

Proposed strategy for naming cortical cell types.

Class	Format	Example
Glutamatergic	[Layer] [Projection] #	L2/3 IT 4
GABAergic	[Canonical gene(s)] #	Pvalb 3
Non-neuronal	[Cell class] #	Microglia 2
Any class	[Historical name] #	Chandelier 1
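
The naming formats in Table 2 amount to simple string templates. The sketch below shows one way to generate them in Python; the function names are hypothetical, and joining multiple canonical genes with a space is an assumption rather than a rule stated in the table.

```python
def glutamatergic_alias(layer: str, projection: str, number: int) -> str:
    """Format: [Layer] [Projection] #"""
    return f"{layer} {projection} {number}"


def gabaergic_alias(genes: list, number: int) -> str:
    """Format: [Canonical gene(s)] #  (genes joined by spaces: an assumption)"""
    return f"{' '.join(genes)} {number}"


def non_neuronal_alias(cell_class: str, number: int) -> str:
    """Format: [Cell class] #"""
    return f"{cell_class} {number}"


def historical_alias(name: str, number: int) -> str:
    """Format: [Historical name] #  (applies to any class)"""
    return f"{name} {number}"


if __name__ == "__main__":
    # Reproduce the four examples from Table 2.
    print(glutamatergic_alias("L2/3", "IT", 4))
    print(gabaergic_alias(["Pvalb"], 3))
    print(non_neuronal_alias("Microglia", 2))
    print(historical_alias("Chandelier", 1))
```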

Table 3

Nomenclature for ‘Sst Chodl’ cell sets cited in Bakken et al., 2020a.

Relevant common cell type nomenclature (CCN) entities and taxonomy metadata, including the cell set additional alias that links to cell set labels from relevant transcriptomics taxonomies. All listed cell sets have a cell set structure of ‘primary motor cortex’ and a cell set ontology tag of ‘UBERON:0001384’.

#	Cell set preferred alias	Cell set label	Cell set accession	Cell set aligned alias	Cell set additional alias
1	Inh L1-6 SST NPY	RNA-seq 040	CS201912131_40	Sst Chodl
2	Inh L1-5 SST AHR	DNAm 12	CS202002272_12		RNA-seq 040, 046–047, 050–052, 068 in CCN201912131
3	Inh L1-6 SST NPY	ATAC-seq 08	CS202002273_8	Sst Chodl	RNA-seq 040 in CCN201912131
4	Inh SST NPY	RNA-seq 01	CS201912132_1	Sst Chodl
5	Sst Chodl	RNA-seq 028	CS202002013_28	Sst Chodl
6	Sst Chodl	DNAm 09	CS202002276_9	Sst Chodl	RNA-seq 028 in CCN202002013
7	Sst Chodl	ATAC-seq 10	CS202002277_10	Sst Chodl	RNA-seq 028 in CCN202002013
8	Sst Chodl	Integrated 14	CS202002270_14	Sst Chodl	Long-range projecting Sst

#	Cell set alias assignee	Cell set alias citation	Taxonomy id	Species	Modality
1	Nikolas Jorstad	10.1101/2020.03.31.016972	CCN201912131	Human	RNA-seq
2	Wei Tian	10.1101/2020.03.31.016972	CCN202002272	Human	DNAm
3	Blue Lake	10.1101/2020.03.31.016972	CCN202002273	Human	ATAC-seq
4	Fenna Krienen	10.1101/2020.03.31.016972	CCN201912132	Marmoset	RNA-seq
5	Zizhen Yao	10.1101/2020.02.29.970558	CCN202002013	Mouse	RNA-seq
6	Hanqing Liu	10.1101/2020.02.29.970558	CCN202002276	Mouse	DNAm
7	Yang Li	10.1101/2020.02.29.970558	CCN202002277	Mouse	ATAC-seq
8	Nikolas Jorstad	10.1101/2020.03.31.016972	CCN202002270	All	RNA-seq
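
To illustrate how the aligned alias links cell sets across taxonomies, the rows of Table 3 can be grouped by that field. This is a minimal Python sketch: the tuple layout and function names are hypothetical, and the data are copied from the table above.

```python
from collections import defaultdict

# (cell set accession, cell set aligned alias, species, modality), from Table 3.
# Row 2 has no aligned alias in the table, so it is stored as an empty string.
ROWS = [
    ("CS201912131_40", "Sst Chodl", "Human", "RNA-seq"),
    ("CS202002272_12", "", "Human", "DNAm"),
    ("CS202002273_8", "Sst Chodl", "Human", "ATAC-seq"),
    ("CS201912132_1", "Sst Chodl", "Marmoset", "RNA-seq"),
    ("CS202002013_28", "Sst Chodl", "Mouse", "RNA-seq"),
    ("CS202002276_9", "Sst Chodl", "Mouse", "DNAm"),
    ("CS202002277_10", "Sst Chodl", "Mouse", "ATAC-seq"),
    ("CS202002270_14", "Sst Chodl", "All", "RNA-seq"),
]


def taxonomy_of(accession: str) -> str:
    """Derive the taxonomy ID from an accession ID (CS<digits>_<n> -> CCN<digits>)."""
    return "CCN" + accession[2:].split("_")[0]


def group_by_aligned_alias(rows) -> dict:
    """Map each aligned alias to the taxonomies in which it is assigned."""
    groups = defaultdict(list)
    for accession, alias, species, modality in rows:
        if alias:  # cell sets without an aligned alias are not linked this way
            groups[alias].append((taxonomy_of(accession), species, modality))
    return dict(groups)


if __name__ == "__main__":
    for alias, members in group_by_aligned_alias(ROWS).items():
        print(alias, len(members))
        for member in members:
            print("  ", member)
```

In this sketch the human DNA methylation cell set (row 2) is not grouped under ‘Sst Chodl’ because it has no aligned alias; in the table it is connected to the transcriptomics taxonomy only through its additional alias.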

Table 4

Taxonomies with applied CCN.

Table showing the set of taxonomies included in Supplementary file 1. All taxonomies include the annotated nomenclature table. Asterisk (*) and caret (^) indicate that the updated dendrogram and cell to cell set mapping files are also included for that taxonomy, respectively. CCN202002270 is the reference taxonomy presented in Figure 3B.

Taxonomy id	Description	Reference
CCN201810310^*	Mouse VISp + ALM (from Tasic et al., 2018)	Tasic et al., 2018
CCN201908210^*	Human MTG (from Hodge et al., 2019)	Hodge et al., 2019
CCN201908211^*	Joint mouse/human analysis (slight modification from Hodge et al., 2019)	Hodge et al., 2019
CCN201912130	Human M1 taxonomy using 10× data	Bakken et al., 2020a
CCN201912131	Human M1 taxonomy using Smart-seq and 10× data	Bakken et al., 2020a
CCN201912132	Marmoset M1 taxonomy using 10× data	Bakken et al., 2020a
CCN202002013*	Mouse MOp BICCN taxonomy using multiple RNA-seq datasets	Yao et al., 2020a
CCN202002270	Cross species (integrated) transcriptomics taxonomy	Bakken et al., 2020a
CCN202002271	Macaque transcriptomics taxonomy, layer 5/6 only	Bakken et al., 2020a
CCN202002272	Human DNA methylation taxonomy	Bakken et al., 2020a
CCN202002273	Human ATAC-seq taxonomy	Bakken et al., 2020a
CCN202002274	Marmoset DNA methylation taxonomy	Bakken et al., 2020a
CCN202002275	Marmoset ATAC-seq taxonomy	Bakken et al., 2020a
CCN202002276	Mouse DNA methylation taxonomy	Yao et al., 2020a
CCN202002277	Mouse ATAC-seq taxonomy	Yao et al., 2020a
CCN202005150^	Mouse inhibitory neurons in VISp defined using electrophysiology, morphology, and transcriptomics	Gouwens et al., 2020
CCN201906170	Mouse neurons in VISp defined using electrophysiology and morphology	Gouwens et al., 2019
CCN201805250	Turtle pallium transcriptomics taxonomy	Tosches et al., 2018
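
The `*` and `^` markers attached to taxonomy IDs in Table 4 can be separated from the ID itself when the table is read programmatically. The following Python sketch assumes the markers always trail the ID, as they do in the rows above; the function and key names are hypothetical.

```python
def parse_table4_id(entry: str) -> dict:
    """Split a Table 4 'Taxonomy id' cell into the ID and its file markers."""
    taxonomy_id = entry.rstrip("^*")
    markers = entry[len(taxonomy_id):]
    return {
        "taxonomy_id": taxonomy_id,
        # '*' marks taxonomies that also include the updated dendrogram.
        "has_dendrogram": "*" in markers,
        # '^' marks taxonomies that also include the cell to cell set mapping.
        "has_cell_mapping": "^" in markers,
    }


if __name__ == "__main__":
    for cell in ["CCN201810310^*", "CCN202002013*", "CCN202005150^", "CCN202002270"]:
        print(parse_table4_id(cell))
```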

Additional files

Supplementary file 1. Output files from applying the CCN on 17 taxonomies. This file contains annotated cell sets from all 17 taxonomies shown in Table 4 along with annotated dendrograms and cell to cell set assignments for a subset of these taxonomies. This file is available on GitHub (https://github.com/AllenInstitute/nomenclature). Download: https://cdn.elifesciences.org/articles/59928/elife-59928-supp1-v2.zip
Supplementary file 2. A set of aligned aliases in mammalian M1, reproduced from Bakken et al., 2020a. These terms are also applicable to other cortical areas, representing a starting point for future cell type classification efforts and for ontology curation. InterLex identifiers are provided in parentheses when available (Adkins et al., 2020). Download: https://cdn.elifesciences.org/articles/59928/elife-59928-supp2-v2.xlsx
Transparent reporting form. Download: https://cdn.elifesciences.org/articles/59928/elife-59928-transrepform-v2.pdf