5 figures, 4 tables and 3 additional files

Figures

Overview of common cell type nomenclature (CCN) and application to human middle temporal gyrus (MTG).

(A) Schematic of CCN components and process. (B–D) Example outputs from the CCN. (B) Annotated dendrogram of cell types in human MTG, along with associated cell type names, reproduced from Hodge et al., 2019. Internal nodes with a term (teal circles) represent cell sets with preferred alias tags. (C) CCN annotations for a putative cell type (outlined in blue) and an internal node (outlined in orange) of this dendrogram. (D) Snippet of an output file from the CCN showing cell to cell set mappings as applied to human MTG.

Workflow for assigning types to a given dataset with taxonomy.

(1) Cell type classification will initially be performed separately on each dataset. (2) One, some, or all of these datasets will be combined into a high-confidence reference taxonomy which can be used as a comparator for any related datasets, by (3) mapping existing and new datasets to the reference taxonomy. (4) The reference will periodically be updated as new datasets and taxonomies are generated.

Series of multimodal, cross-species taxonomies in primary motor cortex (M1) demonstrates utility of nomenclature schema.

(A) Taxonomies based on transcriptomic (‘1’; top), open chromatin (‘2’; middle), and DNA methylation (‘3’; bottom) in human M1. Epigenomic clusters (‘2’, ‘3’; in rows) are aligned to RNA-seq clusters (‘1’) as indicated by horizontal black bars and are also assigned matching cell sets in the relevant taxonomies. Adapted from Bakken et al., 2020a. (B) Flow chart showing all 11 taxonomies generated for this project and their connections. The integrated (reference) taxonomy included nuclei collected using snRNA-seq from three species (gray box), with nuclei collected from layer 5 in macaque mapped to this space post hoc (gray line). Separately, epigenetics taxonomies from human, marmoset, and mouse were aligned to their respective transcriptomics taxonomies (black lines). This entire taxonomic structure is captured by the CCN (see Supplementary file 1). (C) An example mapping of corticothalamic (L6 CT) provisional cell types across the human and transcriptomics taxonomies using the CCN (black box in A). Preferred aliases for each taxonomy are used for clarity.

Alignment of glutamatergic cell sets in human middle temporal gyrus (MTG) to a reference primary motor cortex (M1) taxonomy.

Cluster overlap heatmap showing the proportion of nuclei from MTG clusters and the reference (M1) clusters that coalesce with a given aligned cluster. Cell sets corresponding to aligned aliases in the MTG and M1 taxonomies are labeled and indicated by blue boxes. Adapted from Bakken et al., 2020a.

Application of common cell type nomenclature (CCN) to glutamatergic me-types in the mouse visual cortex.

Excitatory (glutamatergic) me-types from Gouwens et al., 2019 that have been incorporated into the nomenclature schema. Eleven of the original 20 excitatory me-types are shown as examples. Representative morphologies and electrophysiological responses are shown to illustrate the differences between types. The ‘inferred subclass’ calls perfectly map to cell set aligned aliases from the reference M1 taxonomy in Figure 3, except that L5 CF (corticofugal) is an additional alias for L5 ET, and cell sets corresponding to L4, L6 IT, and L6 CT (blue boxes) have been added to the taxonomy.

Tables

Table 1
Glossary of terms.

Terminology used with the common cell type nomenclature (CCN), definitions for use, and examples of how terms are applied. Terms are presented in bold upon first use in the text. This glossary is intended to clarify use for the purposes of the CCN since some terms are open to multiple interpretations, and effective classification requires disambiguation. Asterisks denote terms that represent specific components of the CCN.

| Term | Definition | Example |
| --- | --- | --- |
| Taxonomy | Set of quantitatively derived data clusters defined by a specific computational algorithm on a specific dataset(s). Taxonomies are given a unique label and can be annotated with metadata about the taxonomy, including details of the algorithms and relevant cell and cell set IDs. | Any clustering result in a cell type classification manuscript |
| Dataset | Feature information (e.g., gene expression) and associated metadata from a set of cells collected as part of a single project. | Gene expression from 6000 human MOp nuclei |
| Ontology | A structured controlled vocabulary for cell types. | Cell Ontology |
| Marker gene(s) | A gene (gene set) which, when expressed in a cell, can be used to accurately assign that cell to a specific cell set. | GAD2; PVALB; CHODL |
| Taxonomy ID* | An identifier uniquely tagging a taxonomy of the format CCN[YYYYMMDD][#]. | CCN201910120 |
| Cell | A single entry in a taxonomy representing data from a single cell (or cell compartment, such as the nucleus). Cells have metadata including a unique ID. | N/A |
| Cell set | Any tagged group of cells in a taxonomy. This includes cell types, groups of cell types, and potentially other informative groupings (e.g., all cells from one donor, organ, cortical layer, or transgenic line). Cell sets have several IDs and descriptors (as discussed below) and can also have other metadata. | A cell type; a group of cell types; all cells from layer 2 in MTG; all cells from donor X |
| Provisional cell type | Quantitatively derived data cluster defined within a taxonomy. This is a specific example of a cell set that is of high importance, as most other cell sets are groupings of one or more provisional cell types. Here, the term ‘cell type’ is synonymous with ‘provisional cell type’. | A cell type defined in a specific study |
| Dendrogram | A hierarchical organization of provisional cell types defined for a specific taxonomy. Dendrograms have a specific semantic and visualizable structure and include nodes (representing multiple provisional cell types) and leaves (representing exactly one). Not all taxonomies include a dendrogram (e.g., if the structure of cell sets is non-hierarchical). | N/A |
| Community structure | Non-hierarchical relationships between cell types defined as groups of cell types in a graph. | N/A |
| Cell set accession ID* | A unique ID across all tracked datasets and taxonomies. This tag labels the taxonomy and numbers each cell type. Format: CS[taxonomy id]_[unique # within taxonomy]. | CS201910120_1 |
| Cell set label* | An ID unique within a single taxonomy that is used for assigning cells to cell sets defined as a combination of multiple ‘provisional cell types’. | MTG 12; MTG 01–08 |
| Cell set alias* | Any cell set descriptor. It can be defined computationally from the data, or manually based on new experiments, prior knowledge, or a combination of both. Cell set aliases beyond the ‘preferred’ or ‘aligned’ are defined as ‘cell set additional aliases’. | (Any ‘cell set aligned alias’); Interneuron 1; Rosehip |
| Cell set preferred alias* | The primary cell set alias (e.g., what cell types might be called in a publication). This can sometimes match the aligned alias, but not always, and can be left unassigned. | Inh L1-2 PAX6 CDH12; ADARB2 (CGE); Chandelier; [blank] |
| Cell set aligned alias* | Analogous to ‘gene symbol’. At most one biologically driven name for linking matching cell sets across taxonomies and with a reference taxonomy. | L2/3 IT 4; Pvalb 3; Microglia 2 |
| Cell set structure* | The location in the brain (or body) from where cells in the associated set were primarily collected. | Neocortex |
| Cell set ontology tag* | A tag from a standard ontology (e.g., UBERON) corresponding to the listed cell set structure. | UBERON:0001950 |
| Cell set alias assignee* | Person responsible for assigning a specific cell set alias in a specific taxonomy (e.g., the person who built the taxonomy or uploaded the data, or a field expert). | (First author of manuscript) |
| Cell set alias citation* | The citation or permanent data identifier corresponding to the taxonomy where the cell set was originally reported. | (Manuscript DOI); [blank] |
| Reference taxonomy | A taxonomy based on one or a combination of high-confidence datasets, to be used as a baseline of comparison for datasets collected from the same organ system. | Cross-species cortical cell type classification |
| Morpho-electric (ME) type | A provisional cell type defined using a combination of morphological and electrophysiological features. | ME_Exc_7 |
| Governing body | A forum of subject-matter experts to guide policy and manage change of the CCN and associated ontologies and databasing efforts. | N/A |

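The two identifier formats in the glossary lend themselves to a simple validity check. The Python sketch below is illustrative only and is not part of the CCN tooling: the function and class names are hypothetical, and it assumes that the trailing [#] counter is a single digit (as in every ID listed here) and that the CS prefix of an accession ID stands in for the CCN prefix of its taxonomy ID (as in the examples CCN201910120 and CS201910120_1).

```python
import re
from datetime import datetime
from typing import NamedTuple, Tuple

# Assumed patterns, inferred from the glossary examples:
# CCN[YYYYMMDD][#] and CS[YYYYMMDD][#]_[unique # within taxonomy].
TAXONOMY_ID = re.compile(r"CCN(\d{8})(\d)")
ACCESSION_ID = re.compile(r"CS(\d{8})(\d)_(\d+)")


class CellSetAccession(NamedTuple):
    taxonomy_id: str      # e.g. "CCN201912131"
    cell_set_number: int  # unique number within that taxonomy


def _require_valid_date(yyyymmdd: str) -> None:
    """Raise ValueError unless the eight digits form a real calendar date."""
    datetime.strptime(yyyymmdd, "%Y%m%d")


def parse_taxonomy_id(text: str) -> Tuple[str, int]:
    """Split a taxonomy ID into its date (YYYYMMDD) and its counter."""
    match = TAXONOMY_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a taxonomy ID: {text!r}")
    date, counter = match.groups()
    _require_valid_date(date)
    return date, int(counter)


def parse_accession_id(text: str) -> CellSetAccession:
    """Recover the taxonomy ID and cell set number from an accession ID."""
    match = ACCESSION_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a cell set accession ID: {text!r}")
    date, counter, number = match.groups()
    _require_valid_date(date)
    return CellSetAccession(f"CCN{date}{counter}", int(number))
```

Under these assumptions, `parse_accession_id("CS201912131_40")` recovers the taxonomy ID `CCN201912131` and the cell set number 40.
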
Table 2
Proposed strategy for naming cortical cell types.

| Class | Format | Example |
| --- | --- | --- |
| Glutamatergic | [Layer] [Projection] # | L2/3 IT 4 |
| GABAergic | [Canonical gene(s)] # | Pvalb 3 |
| Non-neuronal | [Cell class] # | Microglia 2 |
| Any class | [Historical name] # | Chandelier 1 |

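The naming formats in Table 2 can be expressed as small string helpers. This is a minimal Python sketch with hypothetical function names, not code from the CCN itself. It assumes the components are joined by single spaces, as in the table's examples, and that the trailing number may be absent (as in the aligned alias ‘Sst Chodl’ in Table 3).

```python
from typing import Optional, Sequence, Tuple


def glutamatergic_alias(layer: str, projection: str, number: int) -> str:
    """[Layer] [Projection] #"""
    return f"{layer} {projection} {number}"


def gabaergic_alias(genes: Sequence[str], number: int) -> str:
    """[Canonical gene(s)] #"""
    return f"{' '.join(genes)} {number}"


def named_alias(name: str, number: int) -> str:
    """[Cell class] # for non-neuronal types, or [Historical name] # for any class."""
    return f"{name} {number}"


def split_alias(alias: str) -> Tuple[str, Optional[int]]:
    """Separate an alias into its descriptive part and trailing number, if any."""
    head, _, tail = alias.rpartition(" ")
    if head and tail.isdigit():
        return head, int(tail)
    return alias, None
```

Splitting an alias this way makes it possible to compare the descriptive part (for example ‘L2/3 IT’) across taxonomies without depending on the numbering.
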
Table 3
Nomenclature for ‘Sst Chodl’ cell sets cited in Bakken et al., 2020a.

Relevant common cell type nomenclature (CCN) entities and taxonomy metadata, including the cell set additional alias that links to cell set labels from relevant transcriptomics taxonomies. All listed cell sets have a cell set structure of ‘primary motor cortex’ and a cell set ontology tag of ‘UBERON:0001384’.

| # | Cell set preferred alias | Cell set label | Cell set accession | Cell set aligned alias | Cell set additional alias |
| --- | --- | --- | --- | --- | --- |
| 1 | Inh L1-6 SST NPY | RNA-seq 040 | CS201912131_40 | Sst Chodl | |
| 2 | Inh L1-5 SST AHR | DNAm 12 | CS202002272_12 | | RNA-seq 040, 046–047, 050–052, 068 in CCN201912131 |
| 3 | Inh L1-6 SST NPY | ATAC-seq 08 | CS202002273_8 | Sst Chodl | RNA-seq 040 in CCN201912131 |
| 4 | Inh SST NPY | RNA-seq 01 | CS201912132_1 | Sst Chodl | |
| 5 | Sst Chodl | RNA-seq 028 | CS202002013_28 | Sst Chodl | |
| 6 | Sst Chodl | DNAm 09 | CS202002276_9 | Sst Chodl | RNA-seq 028 in CCN202002013 |
| 7 | Sst Chodl | ATAC-seq 10 | CS202002277_10 | Sst Chodl | RNA-seq 028 in CCN202002013 |
| 8 | Sst Chodl | Integrated 14 | CS202002270_14 | Sst Chodl | Long-range projecting Sst |

| # | Cell set alias assignee | Cell set alias citation | Taxonomy id | Species | Modality |
| --- | --- | --- | --- | --- | --- |
| 1 | Nikolas Jorstad | 10.1101/2020.03.31.016972 | CCN201912131 | Human | RNA-seq |
| 2 | Wei Tian | 10.1101/2020.03.31.016972 | CCN202002272 | Human | DNAm |
| 3 | Blue Lake | 10.1101/2020.03.31.016972 | CCN202002273 | Human | ATAC-seq |
| 4 | Fenna Krienen | 10.1101/2020.03.31.016972 | CCN201912132 | Marmoset | RNA-seq |
| 5 | Zizhen Yao | 10.1101/2020.02.29.970558 | CCN202002013 | Mouse | RNA-seq |
| 6 | Hanqing Liu | 10.1101/2020.02.29.970558 | CCN202002276 | Mouse | DNAm |
| 7 | Yang Li | 10.1101/2020.02.29.970558 | CCN202002277 | Mouse | ATAC-seq |
| 8 | Nikolas Jorstad | 10.1101/2020.03.31.016972 | CCN202002270 | All | RNA-seq |

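The aligned alias is what links matching cell sets across taxonomies, and Table 3 can be used to illustrate that lookup. The Python sketch below is an illustration under stated assumptions, not the CCN's own code: the record layout and function name are hypothetical, the rows are transcribed from Table 3, and row 2 is treated as having no aligned alias because none appears for it in the table.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class CellSet(NamedTuple):
    accession: str
    preferred_alias: str
    aligned_alias: Optional[str]  # None when no aligned alias is listed
    taxonomy_id: str
    species: str
    modality: str


# Rows transcribed from Table 3.
TABLE_3 = [
    CellSet("CS201912131_40", "Inh L1-6 SST NPY", "Sst Chodl", "CCN201912131", "Human", "RNA-seq"),
    CellSet("CS202002272_12", "Inh L1-5 SST AHR", None, "CCN202002272", "Human", "DNAm"),
    CellSet("CS202002273_8", "Inh L1-6 SST NPY", "Sst Chodl", "CCN202002273", "Human", "ATAC-seq"),
    CellSet("CS201912132_1", "Inh SST NPY", "Sst Chodl", "CCN201912132", "Marmoset", "RNA-seq"),
    CellSet("CS202002013_28", "Sst Chodl", "Sst Chodl", "CCN202002013", "Mouse", "RNA-seq"),
    CellSet("CS202002276_9", "Sst Chodl", "Sst Chodl", "CCN202002276", "Mouse", "DNAm"),
    CellSet("CS202002277_10", "Sst Chodl", "Sst Chodl", "CCN202002277", "Mouse", "ATAC-seq"),
    CellSet("CS202002270_14", "Sst Chodl", "Sst Chodl", "CCN202002270", "All", "RNA-seq"),
]


def group_by_aligned_alias(cell_sets: List[CellSet]) -> Dict[str, List[CellSet]]:
    """Collect cell sets that share an aligned alias, skipping unaligned ones."""
    groups: Dict[str, List[CellSet]] = defaultdict(list)
    for cell_set in cell_sets:
        if cell_set.aligned_alias is not None:
            groups[cell_set.aligned_alias].append(cell_set)
    return dict(groups)


matches = group_by_aligned_alias(TABLE_3)
for cell_set in matches["Sst Chodl"]:
    print(cell_set.taxonomy_id, cell_set.species, cell_set.modality, cell_set.preferred_alias)
```

Grouping on the aligned alias rather than the preferred alias is the point of the design: the preferred aliases differ between the primate and mouse taxonomies, while the aligned alias is shared.
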
Table 4
Taxonomies with applied CCN.

Table showing the set of taxonomies included in Supplementary file 1. All taxonomies include the annotated nomenclature table. Asterisk (*) and caret (^) indicate that the updated dendrogram and cell to cell set mapping files are also included for that taxonomy, respectively. CCN202002270 is the reference taxonomy presented in Figure 3B.

| Taxonomy id | Description | Reference |
| --- | --- | --- |
| CCN201810310^* | Mouse VISp + ALM (from Tasic et al., 2018) | Tasic et al., 2018 |
| CCN201908210^* | Human MTG (from Hodge et al., 2019) | Hodge et al., 2019 |
| CCN201908211^* | Joint mouse/human analysis (slight modification from Hodge et al., 2019) | Hodge et al., 2019 |
| CCN201912130 | Human M1 taxonomy using 10× data | Bakken et al., 2020a |
| CCN201912131 | Human M1 taxonomy using Smart-seq and 10× data | Bakken et al., 2020a |
| CCN201912132 | Marmoset M1 taxonomy using 10× data | Bakken et al., 2020a |
| CCN202002013* | Mouse MOp BICCN taxonomy using multiple RNA-seq datasets | Yao et al., 2020a |
| CCN202002270 | Cross species (integrated) transcriptomics taxonomy | Bakken et al., 2020a |
| CCN202002271 | Macaque transcriptomics taxonomy, layer 5/6 only | Bakken et al., 2020a |
| CCN202002272 | Human DNA methylation taxonomy | Bakken et al., 2020a |
| CCN202002273 | Human ATAC-seq taxonomy | Bakken et al., 2020a |
| CCN202002274 | Marmoset DNA methylation taxonomy | Bakken et al., 2020a |
| CCN202002275 | Marmoset ATAC-seq taxonomy | Bakken et al., 2020a |
| CCN202002276 | Mouse DNA methylation taxonomy | Yao et al., 2020a |
| CCN202002277 | Mouse ATAC-seq taxonomy | Yao et al., 2020a |
| CCN202005150^ | Mouse inhibitory neurons in VISp defined using electrophysiology, morphology, and transcriptomics | Gouwens et al., 2020 |
| CCN201906170 | Mouse neurons in VISp defined using electrophysiology and morphology | Gouwens et al., 2019 |
| CCN201805250 | Turtle pallium transcriptomics taxonomy | Tosches et al., 2018 |

Additional files

Supplementary file 1

Output files from applying the CCN on 17 taxonomies.

This file contains annotated cell sets from all 17 taxonomies shown in Table 4 along with annotated dendrograms and cell to cell set assignments for a subset of these taxonomies. This file is available on GitHub (https://github.com/AllenInstitute/nomenclature).

https://cdn.elifesciences.org/articles/59928/elife-59928-supp1-v2.zip

Supplementary file 2

A set of aligned aliases in mammalian M1, reproduced from Bakken et al., 2020a.

These terms are also applicable to other cortical areas, representing a starting point for future cell type classification efforts and for ontology curation. InterLex identifiers are provided in parentheses when available (Adkins et al., 2020).

https://cdn.elifesciences.org/articles/59928/elife-59928-supp2-v2.xlsx

Transparent reporting form

https://cdn.elifesciences.org/articles/59928/elife-59928-transrepform-v2.pdf

Cite this article

  1. Jeremy A Miller
  2. Nathan W Gouwens
  3. Bosiljka Tasic
  4. Forrest Collman
  5. Cindy TJ van Velthoven
  6. Trygve E Bakken
  7. Michael J Hawrylycz
  8. Hongkui Zeng
  9. Ed S Lein
  10. Amy Bernard
(2020)
Common cell type nomenclature for the mammalian brain
eLife 9:e59928.
https://doi.org/10.7554/eLife.59928