1. Neuroscience
Download icon

Single-cell transcriptomic evidence for dense intracortical neuropeptide networks

  1. Stephen J Smith  Is a corresponding author
  2. Uygar Sümbül
  3. Lucas T Graybuck
  4. Forrest Collman
  5. Sharmishtaa Seshamani
  6. Rohan Gala
  7. Olga Gliko
  8. Leila Elabbady
  9. Jeremy A Miller
  10. Trygve E Bakken
  11. Jean Rossier
  12. Zizhen Yao
  13. Ed Lein
  14. Hongkui Zeng
  15. Bosiljka Tasic
  16. Michael Hawrylycz  Is a corresponding author
  1. Allen Institute for Brain Science, United States
  2. Neuroscience Paris Seine, Sorbonne Université, France
Research Article
  • Cited 2
  • Annotations
Cite this article as: eLife 2019;8:e47889 doi: 10.7554/eLife.47889

Abstract

Seeking new insights into the homeostasis, modulation and plasticity of cortical synaptic networks, we have analyzed results from a single-cell RNA-seq study of 22,439 mouse neocortical neurons. Our analysis exposes transcriptomic evidence for dozens of molecularly distinct neuropeptidergic modulatory networks that directly interconnect all cortical neurons. This evidence begins with a discovery that transcripts of one or more neuropeptide precursor (NPP) and one or more neuropeptide-selective G-protein-coupled receptor (NP-GPCR) genes are highly abundant in all, or very nearly all, cortical neurons. Individual neurons express diverse subsets of NP signaling genes from palettes encoding 18 NPPs and 29 NP-GPCRs. These 47 genes comprise 37 cognate NPP/NP-GPCR pairs, implying the likelihood of local neuropeptide signaling. Here, we use neuron-type-specific patterns of NP gene expression to offer specific, testable predictions regarding 37 peptidergic neuromodulatory networks that may play prominent roles in cortical homeostasis and plasticity.

Introduction

Neuromodulation - the graded and relatively slow adjustment of fast synapse and ion channel function via diffusible cell-cell signaling molecules - is a fundamental requirement for adaptive nervous system function (Abbott and Regehr, 2004; Bargmann, 2012; Bucher and Marder, 2013; Katz and Lillvis, 2014; Marder, 2012; Marder et al., 2015; McCormick and Nusbaum, 2014; Nadim and Bucher, 2014; Nusbaum et al., 2017). Neuromodulator molecules take many different chemical forms, including diatomic gases such as nitric oxide, lipid metabolites such as the endocannabinoids, and amino acids and their metabolites such as glutamate, GABA, acetylcholine, serotonin and dopamine. By far the largest family of neuromodulator molecules known, however, comprises the evolutionarily ancient proteinaceous signaling molecules known as neuropeptides (Baraban and Tallent, 2004; Burbach, 2011; Gonzalez-Suarez and Nitabach, 2018; Hökfelt et al., 2013; van den Pol, 2012; Wang et al., 2015). The most widely studied neuropeptides are the endogenous ‘opioid’ peptides - enkephalins, endorphins and dynorphins - but there are nearly one hundred other NPP genes in the human genome and numerous homologs are present in almost all known animal genomes (Elphick et al., 2018; Jékely, 2013).

The broadest definition of ‘neuropeptide’ would embrace any soluble peptide that serves as a messenger by diffusing from one neuron to another. A narrower but more common definition (Burbach, 2011) requires (1) translation of a neuropeptide precursor protein (NPP) into the lumen of a source neuron’s rough endoplasmic reticulum (rER), (2) enzymatic cleavage of the NPP into one or more neuropeptide (NP) products during or after passage through the rER–Golgi complex and packaging into dense-core vesicles, (3) transport and storage of dense-core vesicles within the source neuron, (4) secretion of the NP product upon demand by activity- and calcium-dependent dense-core vesicle exocytosis, and then (5) interstitial diffusion to act upon a target neuron by binding to a ligand-specific receptor. This pathway enlarges the potential palette of distinct neuropeptides beyond that established simply by the large number of NPP genes, as a given NPP may be differentially cleaved into alternative NP products during its intracellular and interstitial passage.

Most neuropeptide receptors are encoded by members of the very large superfamily of G-protein-coupled receptor (GPCR) genes (Hoyer and Bartfai, 2012; Krishnan and Schiöth, 2015; Mains and Eipper, 2006; van den Pol, 2012). GPCRs are selective, high-affinity receptors distinguished by characteristic seven-transmembrane-segment atomic structures and signal transduction involving heterotrimeric G-proteins (hence their name). Genes encoding GPCRs selective for neuropeptides (NP-GPCR genes) and for most of the other chemical neuromodulators mentioned above are found in the genomes of almost all metazoans: phylogenomic evidence suggests very early evolutionary origins (Jékely, 2013; Katz and Lillvis, 2014), possibly even predating evolution of the synapse (Varoqueaux and Fasshauer, 2017). In modern metazoan nervous systems, synapses rely primarily upon recycling small molecule transmitters and ligand-gated ion channels (alternatively known as ‘ionotropic receptors’) for fast (millisecond timescale) transmission, but GPCRs selective for widely varied ligands including the fast recycling transmitters and many other secreted molecules (e.g., glutamate, GABA, acetylcholine, monoamines and neuropeptides) play critical roles in the slower modulation of fast synaptic transmission and electrical activity (Elphick et al., 2018; Grimmelikhuijzen and Hauser, 2012; Jékely, 2013; Krishnan and Schiöth, 2015; Varoqueaux and Fasshauer, 2017).

Because modulatory neuropeptides are not subject to the rapid transmitter re-uptake and/or degradation processes necessary for fast synaptic transmission, secreted neuropeptides are thought to persist long enough (e.g., minutes) in brain interstitial spaces for diffusion to very-high-affinity NP-GPCRs hundreds of micrometers distant from release sites (Ludwig and Leng, 2006; Nässel, 2009; Russo, 2017). Neuropeptide signaling can thus be presumed both ‘paracrine’, with secretion from individual neurons hitting receptor-positive cells over substantial diffusion distances and converging by diffusion from many local secreting neurons onto single receptor-positive neurons, and to be relatively slow (seconds-to-minutes timescale of action). Though present information is limited, eventual degradation by interstitial peptidases nonetheless probably restricts diffusion of most neuropeptides to sub-millimeter, local circuit distance scales.

The receptors encoded by NP-GPCR genes are highly diverse in ligand specificity but less diverse in downstream signaling impacts. Although GPCR signaling has long been recognized as complex and many faceted (Hamm, 1998), most known neuronal NP-GPCR actions reflect phosphorylation of ion channels, synaptic proteins or transcription factors mediated by protein kinases dependent on the second messengers cyclic AMP or calcium (Mains and Eipper, 2006; Nadim and Bucher, 2014; van den Pol, 2012). Primary effects of NP-GPCRs expressed in cortex, in turn, fall into just three major categories distinguished by G-protein alpha subunit (Gα) family. The Gi/o family (i/o) inhibits cAMP production, the Gs family (s) stimulates cAMP production, and the Gq/11 family (q/11) amplifies calcium signaling dynamics (Syrovatkina et al., 2016). For most NP-GPCR genes, the primary Gα family (e.g., i/o, s or q/11) is now known (Alexander et al., 2017) and offers a good first-order prediction of the encoded GPCR’s signal transducing action. The profound functional consequences of neuromodulation by GPCRs range from modification of neuronal firing properties and calcium signaling dynamics through regulation of synaptic weights and synaptic plasticity (Bargmann, 2012; Markram et al., 2013; McCormick and Nusbaum, 2014).

It is well established that certain neuropeptides, including vasoactive intestinal peptide (VIP), somatostatin (SST), neuropeptide Y (NPY), substance P, and cholecystokinin (CCK), are detectible at high levels in particular subsets of GABAergic cortical neurons (Tremblay et al., 2016). These neuropeptides have come into broad use as markers for particular GABAergic interneuron classes, while the corresponding NPP and NP-GPCR genetics have provided molecular access to these and other broad neuron type classes (Daigle et al., 2018; Maximiliano José et al., 2018). In situ hybridization and microarray data, for example the Allen Brain Atlases (Hawrylycz et al., 2012; Lein et al., 2007), have also established that mRNA transcripts encoding these five NPPs and that many other NPPs and NP-GPCR genes are expressed differentially in many brain regions. There has been a critical lack, however, of comprehensive expression data combining whole-genome depth with single-cell resolution. Absent such data, it has been difficult to generate specific and testable hypotheses regarding cortical neuropeptide function and to design robust experiments to test those hypotheses (Tremblay et al., 2016; van den Pol, 2012).

Here we describe new findings regarding NPP and NP-GPCR gene expression in mouse cortex. These findings have surfaced during a focused analysis of a previously published single-cell RNA-seq data acquired from large numbers of isolated mouse cortical neurons (Tasic et al., 2018). We begin by leveraging only the genomic depth and single-cell resolution of this dataset. Next, we introduce the transcriptomic neurotaxonomy developed in the same resource publication and explore the additional analytical power of a neurotaxonomic framework. Then, we distill these findings into specific and testable predictions concerning intracortical peptidergic modulation networks. Finally, we discuss the potential of a neurotaxonomically integrated view of neuromodulatory and synaptic networks to reveal previously obscure principles of cortical sensory, mnemonic and motor function.

Results

The present study is based on analysis of a resource single-cell RNA-seq dataset acquired at the Allen Institute (Tasic et al., 2018) and available for download at http://celltypes.brain-map.org/rnaseq/. These RNA-seq data were acquired from a total of 22,439 isolated neurons, with detection of transcripts from a median of 9462 genes per cell (min = 1445; max = 15,338) and an overall total of 21,931 protein-coding genes detected. Neurons were sampled from two distant and very different neocortical areas: 13,491 neurons from primary visual cortex (VISp), and 8948 neurons from anterior lateral motor cortex (ALM). Single neuron harvesting methods were designed to mildly enrich samples for GABAergic neurons, such that the sampled neuron population is roughly half GABAergic (47%) and half glutamatergic (53%). The resource publication (Tasic et al., 2018) should be consulted for full details of neuron harvesting, sample preparation, sequencing and data processing. Since we refer very frequently here to this resource publication and dataset, we’ll refer to both now simply as ‘Tasic 2018’, and all further references here to neuron ‘class’, ‘subclass’ or ‘type’ should be understood to refer specifically to the particular mouse neocortex neurotaxonomy described in the Tasic 2018 publication.

The Tasic 2018 single-cell RNA-seq data tables report the abundance of transcripts from individual neurons in both ‘counts per million reads’ (CPM) and ‘fragments per kilobase of exon per million reads mapped’ (FPKM) units. Our analysis of this data compares gene expression levels quantitatively, with two distinct use cases: (1) comparisons across large sets of different genes, and (2) comparisons of the same gene across different individual cells, cell types and brain areas. We have relied upon FPKM data (Mortazavi et al., 2008; Pimentel, 2014), for use case 1 (i.e., the Tables 1 comparisons across genes). For use case 2 (as in all figures below), we have preferred the CPM units, because these units were used to generate the Tasic 2018 neurotaxonomy. In any case, choice of CPM vs. FPKM units would have very little impact on the present outcomes.

Single-neuron expression profiles of 18 select neuropeptide precursor (NPP) genes

Table 1 lists results of analyzing the expression of 18 NPP genes in all 22,439 individual neurons represented in Tasic 2018. Here we have made use of the ‘peak FPKM’ (pFPKM) metric described in Materials and methods below to quantify the expression of specific genes in highly expressing subsets of single-cell populations that exhibit highly variant expression of that particular gene. Each of the 18 NPP genes on this list meets two conditions: (1) the included NPP gene is highly expressed (top quintile pFPKM over all protein-coding genes) across VISp and ALM cortical areas, and (2) at least one gene for an NP-GPCR selective for the predicted product of at least one of the 18 NPPs is highly expressed in neurons within the same local area of neocortex (see Table 2). The first requirement was imposed to increase the likelihood of active secretion of the NP product encoded by the candidate NPP gene. The second requirement, for ‘cognate’ pairing between each included NPP and a locally expressed NP-GPCR gene, was imposed to elevate the likelihood of paracrine NP signaling within a cortical local circuit volume, as envisioned in Introduction above. The process for selection of these 18 NPP genes is described in more detail in Materials and methods. Table 1 lists Peak FPKM values for each of the 18 NPP genes, percentile and absolute ranks of that Peak FPKM value across all protein-coding genes, the percentage of cells sampled in which expression of the listed gene is detectible, predicted neuropeptide product(s) encoded, and the NP-GPCR gene(s) fulfilling requirement (2) for that NPP gene. Gene ontology results for the 18 select NPP genes are provided by Supplementary file 1. The Peak FPKM ranking columns in Table 1 show that expression levels of most of the 18 NPP genes are extremely high in the range of values for all 21,931 protein-coding genes detected in all 22,439 neurons sampled. Of these genes, Npy, Sst, Vip and Tac2 rank as the top four overall in pFPKM values, while Cck, Penk and Crh also rank in the top ten. Eleven of these NPP genes rank in the top percentile and all 18 rank in the top quintile by pFPKM. To the simplest first approximation, very high abundance of a given protein-coding transcript implies the potential, at least, for a very high rate of synthesis of the encoded protein. The extremely high peak abundance of these NPP transcripts thus suggests that NP precursor proteins could be synthesized at very high rates in neurons exhibiting such peak abundance. In a steady state, a high rate of synthesis would then necessarily imply a correspondingly high overall rate of protein product elimination. For an NP precursor protein, processing and secretion of active neuropeptide would seem the obvious and most likely route of elimination. The high abundance of transcripts encoding these 18 NPPs might thus be construed as prima facie evidence for robust secretion of neuropeptide products.

Table 1
Single-cell RNA-seq expression statistics for 18 highly expressed neuropeptide precursor protein (NPP) genes cognate to locally expressed NP-GPCR genes (see Table 2).

NPP genes are tabulated here along with peak single-cell expression levels as pFPKM (Peak FPKM, see Materials and methods), percentile and absolute ranking of these pFPKM values across pFPKMs for all 21,931 protein-coding genes, and the percentage of cells sampled in which transcripts of the given NPP gene were detected at > 1 CPM. The table also lists predicted neuropeptide products, and genes encoding the locally expressed G-protein-coupled receptors (NP-GPCRs) cognate to the NPP (see Table 2). NPP genes are listed here in descending order of Peak FPKM. Pastel color fills in the ‘Cognate NP-GPCR Genes’ column correspond to i/o (pink), s (light green) and q/11 (light blue) transduction families of associated G-protein and will be used to highlight these families consistently in all following figures.

NPP
Gene
Peak FPKMpFPKM
Percentile
pFPKM
Rank
%
Cells
Predicted NeuropeptidesCognate NP-GPCR Genes
Npy108,865100.00142Neuropeptide YNpy1r, Npy2r, Npy5
Sst70,27499.99226SomatostatinsSstr1, Sstr2, Sstr3, Sstr4 
Vip48,74799.99333Vasoactive Intestinal PeptideVipr1, Vipr2
Tac218,28499.98415Neurokinin BTacr3
Cck16,39699.97669CholecystokininsCckbr
Penk11,16099.96826EnkephalinsOprd1, Oprm1
Crh9,11899.951017Corticotropin-Releasing HormoneCrhr1, Crhr2
Cort7,47799.931532CortistatinSstr1, Sstr2, Sstr3, Sstr4 
Tac15,72899.921811Substance P, Neurokinin ATacr1
Pdyn2,81399.69688DynorphinsOprd1, Oprk1, Oprm1
Pthlh1,65699.2915618Parathyroid-Hormone-Like HormonePth1r
Pnoc69897.6850923NociceptinsOprl1
Trh51096.517663Thyrotropin-Releasing HormoneTrhr, Trhr2
Grp43595.5996812Gastrin-Releasing PeptideGrpr
Rln125891.9917577Relaxin 1Rxfp1, Rxfp2Rxfp3 
Adcyap116587.29278826Adenylate Cyclase-Activating PolypeptidesAdcyap1r1, Vipr1, Vipr2 
Nts12182.1439171NeurotensinNtsr1, Ntsr2
Nmb11280.53427014Neuromedin BNmbr

Figure 1A quantifies differential expression of the 18 NPP genes listed in Table 1. Each of the 18 color-coded solid curves represents a distribution of single-neuron CPM values for one NPP gene. Curves were generated by plotting CPM for each individual neuron in descending rank order along a cell population percentile axis. Each curve exhibits a transition from high to very low (commonly zero) expression across the sampled neuron population, but these transitions occur over very different population percentile ranges, providing clear evidence for highly differential single-cell expression of these genes. Percentages of the sampled neuron population with detectible expression of a given NPP gene range from more than 65% for Cck down to ~1% for Nts. (Note, however, that the cell population sampled has been enriched for GABAergic cell types as described in Tasic 2018).

Single-cell NPP gene expression and co-expression statistics for distant neocortical areas VISp and ALM show that expression patterns for 18 NPP genes are highly differential within both neocortical areas but highly conserved between areas.

(A) Different NPP genes show very different expression level distributions across the 22,439 VISp+ALM neurons sampled. Color-coded solid curves plot single-cell CPM values for the specified individual NPP genes in descending order along a cell population percentile axis. The 18 curves are segregated for clarity into three panels (I, ii, iii) sorted by cell population percentiles at which CPM values fall below 1. Large differences in fractions of cells expressing different NPP genes are evident. The dashed curve labeled ‘Max NPP Gene’ in panel A.i was generated by plotting CPM values of for the most abundant NPP transcript in each individual cell in descending order. (B) Fractions of cells expressing each NPP genes represented separately for 13,491 VISp neurons and 8,948 ALM neurons, showing conservation between areas of the patterning of NPP expression fractions detailed in panel A. (C) Histograms illustrating frequencies of various multiples of NPP genes co-expressed in individual neurons, represented separately for VISp and ALM neurons. The paired vertical bars show strong conservation of co-expression patterns between the two areas.

The RNA-seq data suggest that all, or at least very nearly all, neocortical neurons express at least one NPP gene. The dashed curve in Figure 1A, labeled ‘Max NPP Gene’, was generated by plotting CPM values of the NPP gene with the highest CPM in each individual cell in descending order along a cell population percentile axis. This curve therefore shows that 97% of the sampled mouse cortical neurons express at least one NPP gene at >1 CPM and that 80% express at least one NPP gene at >1,000 CPM, a very high level. When one takes into account the pulsatile nature of transcription (Suter et al., 2011) and the stochastic nature of RNA-seq transcript sampling (Fu and Pachter, 2016; Kim et al., 2015; Tasic et al., 2016), these numbers might be best understood as lower limits. The results summarized in Figure 1A may therefore be consistent with the proposition that every cortical neuron is peptidergic.

Conservation of NPP gene expression statistics between VISp and ALM

The paired bars in Figure 1B represent fractions of cells expressing a given gene in each of the two cortical areas. It is obvious that the differential expression profiles in VISp and ALM are highly similar (ρ = 0.972, p<1.72E-11), in spite of stark differences in function and cytoarchitecture between these two areas. Conservation of expression fractions across so many genes in such divergent cortical areas suggests that these patterns have strong connections to conserved features of cortical function and argues against these patterns being secondary to more ephemeral variables such as neuronal activity patterns, which seem unlikely to be highly conserved between VISp and ALM areas. Figure 1C represents frequencies with which transcripts of various multiples drawn from the set of 18 NPP genes were detected in individual neurons. These data establish a remarkable degree of NPP gene co-expression in almost all individual cortical neurons. The modal number of co-expressed NPP genes detected is two in VISp and five in ALM, but both distributions are actually quite flat between 2 and 5, with broad plateaus out to seven co-expressed NPP genes per cell and a substantial tail out to 10. Figure 1C also reveals strong similarities of NPP co-expression distributions between VISp and ALM.

Single-neuron expression profiles of 29 select neuropeptide receptor (NP-GPCR) genes

Table 2 lists 29 NP-GPCR genes that are highly expressed in varied subsets of the 22,439 individual neurons sampled from cortical areas VISp and ALM. These 29 genes encode receptor proteins substantially selective for neuropeptide products encoded by the 18 NPP genes of Table 1 (cross-referenced from that table as ‘Cognate NP-GPCR Genes’). Table 2 provides quantitative information on expression levels of these 29 NP-GPCR genes, names the receptor proteins they encode, indicates the A-F GPCR class and expected primary Gα family and cross-references back to the cognate cortically-expressed NPP genes. As noted above, the 18 NPP genes and 29 NP-GPCR genes listed in Tables 1 and 2 were selected for focused analysis here due to their cognate pairing and the consequent implication of local intracortical signaling. Methods of NP-GPCR gene selection are described more fully in Materials and methods. A more complete listing of NP-GPCR genes with pFPKM values in provided by Table 2—source data 1. Gene ontology results for the 29 select NPP genes are provided by Supplementary file 2. The ‘pFPKM Percentile’ column in Table 2 shows that most of these 29 NP-GPCR genes are expressed in cortex with Peak FPKM values well above median (50th percentile) for all protein coding genes. The range of cortical neuron pFPKM values for NP-GPCR genes does not match the extreme heights noted for NPP genes, but this is as expected given that NP-GPCR gene products are thought to be durable cellular components, unlikely to be rapidly disposed by secretion as expected for NPP gene products. Peak FPKM values for NP-GPCR transcripts are nonetheless quite high in the range of transcripts of other durable cellular component genes, suggesting a strong likelihood that they are indeed translated into functionally important protein products.

Table 2
Single-cell RNA-seq expression statistics for 29 neuropeptide-selective, G-protein-coupled receptor (NP-GPCR) genes cognate to locally expressed NPP genes (see Table 1).

NP-GPCR gene peak FPPM values, percentile ranking, and percentage sampled as for NPP genes in Table 1. The table names encoded NP-GPCR proteins, A-F class of NP-GPCR, primary Gα signal transduction family (Alexander et al., 2017) and cognate NPP genes. Color fill in ‘primary Gα family’ column as in Table 1.

NP-GPCR GenePeak FPKMpFPKM
Percentile
%
Cells
Neuropeptide ReceptorGPCR
Class
Primary
Gα Family
Cognate NPP Genes
Sstr241395.342Somatostatin Receptor 2A4Gi/oSst, Cort
Npy2r29193.110Neuropeptide Y Receptor Y2A9Gi/oNpy
Npy1r27292.450Neuropeptide Y Receptor Y1A9Gi/oNpy
Grpr2319110GRP ReceptorA7Gq/11Grp
Cckbr2109052Cholecystokinin B ReceptorA6Gq/11Cck
Ntsr216186.917Neurotensin Receptor 2A7Gq/11Nts
Npy5r15286.128Neurpeptide Y Receptor Y5A9Gi/oNpy
Nmbr12382.48Neuromedin B ReceptorA7Gq/11Nmb
Rxfp11218222Relaxin Family Receptor 1A5GsRln1
Sstr410679.528Somatostatin Receptor 4A4Gi/oSst, Cort
Trhr10178.43TRH ReceptorA7Gq/11Trh
Sstr1907638Somatostatin Receptor 1A4Gi/oSst, Cort
Adcyap1r18975.871ADCYAP1 Receptor 1B1GsAdcyap1, Vip
Crhr18674.928CRH Receptor 1B1GsCrh
Rxfp38574.75Relaxin Family Receptor 3A5Gi/oRln1
Oprl18273.848Opioid Receptor-Like 1A4Gi/oPnoc
Crhr27270.73CRH Receptor 2B1GsCrh
Tacr365683Tachykinin Receptor 3A9Gq/11Tac2
Oprk16467.43Kappa-Opioid ReceptorA4Gi/oPdyn
Tacr15664.23Tachykinin Receptor 1A9Gq/11Tac1
Pth1r5161.615PTH 1 ReceptorB1Gq/11Pthlh
Vipr14156.128VIP Receptor 1B1GsVip, Adcyap1
Oprm13552.143Mu-Opioid ReceptorA4Gi/oPenk, Pdyn
Trhr23048.910TRH Receptor 2A7Gq/11Trh
Vipr23048.40.5VIP Receptor 2B1GsVip, Adcyap1
Rxfp22847.34Relaxin Family Receptor 2A5GsRln1
Oprd12645.813Delta-Opioid ReceptorA4Gi/oPenk, Pdyn
Ntsr12444.310Neurotensin Receptor 1A7Gq/11Nts
Sstr31739.521Somatostatin Receptor 3A4Gi/oSst, Cort

The single-cell RNA-seq data expose very highly differential expression of NP-GPCR genes in cortical neurons. Figure 2 represents expression patterns of the 29 NP-GPCR genes listed in Table 2 in the same manner as for the 18 NPP genes in Figure 1. Figure 2A establishes that each of the 29 NP-GPCR genes is expressed in highly differential fashion across the 22,439 mouse cortical neurons sampled, as was the case for the 18 NPP genes. As was noted for NPP genes in Figure 1, each of the curves in Figure 2A exhibits a transition from very high to very low (commonly zero) expression across the sampled neuron population. These transitions occur at very different population percentile points, again providing clear evidence for highly differential expression. Percentages of the sampled neuron population expressing a given NP-GPCR gene (at greater than 1 CPM) range from more than 72% for Adcyap1r1 down to 0.7% for Vipr2.

Figure 2 with 1 supplement see all
Single-cell NP-GPCR gene expression and co-expression statistics for distant neocortical areas VISp and ALM show that expression patterns for 29 NP-GPCR genes are highly differential within neocortical areas but conserved between areas.

(A) Different NP-GPCR genes show very different expression level distributions across the 22,439 VISp+ALM neurons sampled. Color-coded solid curves plot single-cell CPM values for the specified individual NP-GPCR genes in descending order along a cell population percentile axis. The 29 curves are segregated for clarity into five panels (i-v) sorted by cell population percentiles at which CPM values fall below 1. Large differences in fractions of cells expressing different NP-GPCR genes are evident. Dashed curve labeled ‘max NP-GPCR Gene’ in panel A. i was generated by plotting CPM values of the highest CPM NP-GPCR gene for each individual cell in descending order. (B) Fractions of cells expressing each NP-GPCR genes represented separately for 13,491 VISp neurons and 8,948 ALM neurons, showing strong conservation between areas of the patterning of NP-GPCR expression fractions documented in panel A. (C) Histograms illustrating frequencies of various multiples of NP-GPCR gene co-expression in individual neurons, represented separately for VISp and ALM neurons. The paired vertical bars illustrate strong conservation of co-expression patterns between the two cortical areas.

The RNA-seq data suggest that all, or at least very nearly all, neocortical neurons express at least one NP-GPCR gene. The dashed curve in the left panel of Figure 2A, generated similarly to the dashed curve for NPP genes in Figure 1A, shows that 98% of the sampled mouse cortical neurons express at least one NP-GPCR gene at >1 CPM and that 78% express at least one NP-GPCR gene at >100 CPM, lower than the comparable NPP curve in Figure 1, but still indicative of quite high expression. Again, these numbers must be understood as lower limits to percentages of cortical neurons actually expressing at least one of the 29 NP-GPCR genes, after taking into account the pulsatile transcription and stochastic sampling issues cited above. The results summarized in Figure 2A may thus be consistent with a conclusion that every cortical neuron expresses at least one NP-GPCR gene cognate to a cortically expressed NPP gene.

Conservation of NP-GPCR gene expression statistics between VISp and ALM

Figure 2B provides evidence for strong conservation of differential NP-GPCR expression profiles between distant cortical areas VISp and ALM. The paired bars represent fractions of cells expressing a given gene in each of the two areas, again revealing strong similarities of differential expression profiles in the two very different neocortical areas (ρ = 0.959, p<2.2E-16). Figure 2C represents frequencies of NP-GPCR gene co-expression multiples detected in individual neurons. These data establish that multiple NP-GPCR genes are co-expressed in almost all cortical neurons and that numbers of genes co-expressed are even higher than those noted above for co-expression of NPP genes. Modal numbers of co-expressed NP-GPCR genes detected is six in both VISp and ALM with broad plateaus extending out to 12 co-expressed NP-GPCR genes per cell. The striking similarities of NP-GPCR co-expression distributions between the two otherwise divergent neocortical areas once again suggests that the patterning of NP-GPCR co-expression may have consequences for cortical function that are conserved because they are functionally important. As illustrated by Figure 2—figure supplement 1, it is furthermore common for individual neurons to co-express cognate NPP/NP-GPCR pairs, raising the intriguing possibility of cell-autonomous feedback mediated by an autocrine action of a secreted NP product on the secreting cell itself. Figure 2—figure supplement 1 additionally shows that cognate pair co-expression patterning is also highly conserved between areas VISp and ALM.

Neurotaxonomic profiling of NPP and NP-GPCR gene expression

The analysis so far has relied solely upon the genomic depth and single-cell resolution characteristics of the Tasic 2018 transcriptomic data. We now proceed to explore the analytical power of the transcriptomic neurotaxonomy developed as part of the Tasic 2018 study. This neurotaxonomy makes it possible to predict a protein ‘parts list’ for any neuron that can be mapped to a given transcriptomic taxon. Combined with tools for genetic access to transcriptomic taxa, transcriptomic taxonomy thereby offers rich prospects for experimental test of such predictions (see also Discussion below), The present analysis will make extensive use of the Tasic 2018 neurotaxonomy’s representation of 115 glutamatergic and GABAergic transcriptomic neuron types (see Figure 3—figure supplement 1).

Figure 3 shows transcriptomic gene expression ‘heatmaps’, representing transcript abundance for each of 18 NPP (Figure 3A) and 29 NP-GPCR (Figure 3B) across all 115 glutamatergic and GABAergic neuron types by log10-scaled pseudocolor. These heatmaps show that expression of every one of these 47 genes is highly specific to particular neuron types, but that type specificity varies greatly from gene to gene. Note that CPM expression levels vary across neuron types by factors exceeding 10,000 for many NPP genes and 1000 for many NP-GPCR genes. These heat maps also show that every neuron type expresses multiple NPP and NP-GPCR genes and that each of the NPP and NP-GPCR genes is expressed in multiple neuron types (with Vipr2 in one Pvalb type as a near exception). These two heatmaps further show many cases where both an NPP gene and its cognate NP-GPCR receptor are expressed in the same neuron type, with the Cck/Cckbr and Adcyap1/Adcyap1 r1 pairs being particularly prominent examples. Quite intriguingly, these expression heat maps also suggest that each of the neuron types might be distinguished by a unique pattern of expression of these 47 NP genes. This possibility will be explored quantitatively in connection with Figure 4 below.

Figure 3 with 3 supplements see all
Neurotaxonomic heatmaps reveal highly neuron-type-specific expression of (A) 18 NPP and (B) 29 NP-GPCR genes in 22,439 individual neurons harvested from areas VISp and ALM.

Trimmed-mean (5% trim) CPM expression values for each of the 115 VISp+ALM glutamatergic and GABAergic neuron types (see Figure 3—figure supplement 1) are normalized per gene to maximum value indicated at right for each row and pseudocolored according to log10 scales at right. Note that these scales represent 5 (NPP) and 3 (NP-GPCR) orders of magnitude and that each gene spans the entire pseudocolor range across neuron types. Subclasses are called out here by labels (IT, PT, NP, CT, L6b for glutamatergic types; Lamp5, Sncg, Serpinf1, Vip, Chodl, Sst, Pvalb for GABAergic types) and demarcated on the heatmaps by thin gray lines. Gene rows are ordered here as in Tables 1 and 2. (C) Violin plots representing coefficients of CPM variation (CV) for 18 NPP genes across types pooled within each of the 11 subclasses indicated (Chodl not represented here as it is a singular neuron type) and globally across all cell types (‘All’). Callouts on each violin indicate genes of highest CVs within each subclass. Inset shows within-subclass CV/global CV demonstrating variation within subclasses is a significant fraction of global variability (dotted line mean = 0.239). See Figure 3—figure supplement 2 for individual gene statistics. (D) Similar for 29 NP-GPCR genes showing greater relative variability, mean CV = 0.427. See Figure 3—figure supplement 3 for individual gene statistics.

Figure 4 with 2 supplements see all
Neurons can be clustered effectively based on just 47 NP genes (18 NPP and 29 NP-GPCR).

(A) A two-dimensional latent space representation of 22,439 cells based on 6083 highly expressed (HE) genes obtained by an autoencoding neural network. Dots represent individual cells, colored according to the type-code assignments of Tasic et al. (2018) (see Figure 3—figure supplement 1). Cells of the same type appear as grouped into distinct islands, which suggests that classifiers trained to identify cell types would perform well with such low dimensional representations of gene expression. (B) Two-dimensional representation of neurons in such a latent space z2, based on the 47 NP genes. (C) Schematic of the network architecture used to train the second autoencoder that learns to represent neurons in a latent space z2 that is similar to z1. This second autoencoder represents cells in the latent space based on much smaller gene sets. (D) Inset illustrates resolution index (RI) associated with nodes on the hierarchical tree used in the per-cell RI calculation. RI distribution (see Materials and methods) for NP genes-based cell type classification shows that a vast majority of the cells can be correctly classified up to the type level (leaf nodes, RI = 1.0) of the Tasic 2018 hierarchy. Errors in classification (RI <1.0) at the type level are nevertheless resolved at the class level of the hierarchy, as indicated by the high values for RI for the remaining cells. High average RI for HE genes, and 4020 differentially expressed (DE) genes, and 47 DE genes indicates that the cell type classification procedure based on autoencoder representations is accurate. The average RI for cell type classification based on the 47 NP genes is significantly higher (p<0.01, bootstrap) than both, subsets of 47 genes selected randomly (Rand47, n = 100 subsets), and selected randomly but with expression levels matched to the NP genes (Rand47 ExpMatch, n = 100 subsets).

The dashed vertical line spanning Figure 3A and B heatmaps divides glutamatergic and GABAergic neuron types and provides for ready comparison of NP gene expression patterns in these two broad neurotaxonomic classes. Figure 3A shows clearly that more NPP genes are expressed more strongly in GABAergic than in glutamatergic types. This differential is consistent with a long history of neuroscientific use of neuropeptide products as protein markers of GABAergic neuron subsets (e.g., VIP, SST, NPY, Substance P), which has no parallel in the marking of glutamatergic neuron subsets. Figure 3A nonetheless also shows that every glutamatergic type expresses at least one NPP genes at a very substantial level. Figure 3B shows that the broader expression of NPP genes in GABAergic over glutamatergic types is leveled or even reversed for NP-GPCR genes. That is, while GABAergic neurons clearly show the more prolific and varied expression of NPP genes, glutamatergic neurons may be somewhat more prolific expressors of NP-GPCR genes.

Additional graphics on the Figure 3 heatmaps further delineate the Tasic 2018 neurotaxonomy. A cladogram reflects the hierarchical similarity progression from the broad GABAergic and glutamatergic classes to the 115 individual neuron types, as aggregated across VISp and ALM cortical areas. Tinted rectangles and labels call out the five glutamatergic and seven GABAergic subclasses (see also Figure 3—figure supplement 1). Thin gray vertical lines crossing both NPP and NP-GPCR heatmaps demarcate those same subclasses. This delineation of subclasses shows that expression of some genes tends to remain constant within some subclasses, but to change abruptly at subclass boundaries. This does not seem, however, to be a very general case. Many genes show expression that varies widely by type within subclass. Figure 3C quantifies such residual expression variation for all NPP and NP-GPCR genes within each subclass. These significant residuals justify the use of more narrowly defined taxa (e.g., the 115 neuron types) to adequately characterize cortical neuropeptide gene expression. Relationships between NP gene expression patterns and the Tasic 2018 neurotaxonomy will be examined more quantitatively in the following section.

Transcripts of 18 NPP and 29 NP-GPCR genes are exceptionally potent neuron-type markers

The strong marker patterning of the 47 NP gene expression profiles evident in Figure 3 suggests the possibility that each of the 115 glutamatergic and GABAergic neuron types might be distinguished by a unique combination of these 18 NPP and 29 NP-GPCR genes. To explore this possibility and compare NP transcriptomes to other transcriptome subsets quantitatively, we developed the analysis presented in Figure 4.

We began by asking whether there exists a low dimensional representation of gene expression that naturally separates neurons of different types into distinct parts of that low-dimensional space. The extent to which a neuron’s location in such a space can be inferred from the expression of a limited subset of genes (such as the 47 NP genes) would then provide a measure of sufficiency of that subset to determine the type of that neuron accurately. Hierarchical clustering methods to define neuron types based upon gene expression are well established (Hastie et al., 2001; Oyelade et al., 2016) but have difficulty when comparing and making inferences between datasets. We therefore devised a machine learning approach based on linking deep neural networks called autoencoders (Hinton and Salakhutdinov, 2006) to address this question explicitly and quantitatively.

We trained a single autoencoder network to represent cells in a low dimensional space based on CPM values of the 6083 most highly expressed genes (HE genes) in the Tasic 2018 dataset. Figure 4A shows the result of one such two dimensional encoding, where each of the 22,439 individual neurons appear as a distinct dot colored by its type assignment. The tight grouping of type-coding colors evident in Figure 4A implicitly conveys that position within this latent space corresponds well to neuron types, despite the fact that the autoencoder did not have prior information about the Tasic 2018 classification. With the first autoencoder held as fixed, we trained a second autoencoder, linked to the first, to obtain a low-dimensional representation based on a much smaller subset of genes. Figure 4B shows a two-dimensional representation of the same 22,439 neurons based on 47 NP genes. Again the tight color grouping suggests that the 47 NP genes alone suffice to assign types in close register to the Tasic 2018 neurotaxonomy. The autoencoder network architectures are schematized in Figure 4C. The cost function used to train the second autoencoder included a penalty term to minimize differences in the representation of cells compared to that obtained by the first autoencoder. This was done to ensure that the latent spaces of the two autoencoders are as similar as possible while faithfully representing the expression patterns of the respective gene sets they receive as input. This procedure allowed us to visualize the similarity between the gene sets in a latent space that captures type information, and to quantify the extent to which any small gene subset by itself could be used to identify neuron types.

To quantify the type classification ability of different gene sets, we used Quadratic Discriminant Analysis (QDA) (Hastie et al., 2001) to perform supervised classification using five-dimensional latent space representations of the different gene sets obtained by autoencoder networks. We obtained a measure on a per-cell basis, resolution index (RI) to evaluate the degree of correspondence of classification results with the Tasic 2018 neurotaxonomy. The resolution index averaged over all cells is used as a summary statistic to quantify the ability of different gene sets in resolving neuron types. Briefly, QDA was performed iteratively on a given latent space representation, starting with all the leaf node type labels of the neurotaxonomy. In each subsequent iteration the number of labels was reduced by successively merging leaf node labels into their parent node class label (inset, Figure 4D). RI = 1.0 for a neuron that is assigned the correct type (e.g., Pvalb Reln Tac1) and 0.0 < RI < 1.0 for neurons for which the iterative QDA based classification could determine the correct label only up to a subclass (e.g. Pvalb). A neuron is assigned RI = 0.0 if the QDA-based classification failed to determine the correct label even at the glutamatergic or GABAergic level.

Figure 4D shows neuron type classification results based on five dimensional latent space representations of different subsets of genes (k = 13 fold cross validation). For the 6,083 HE genes and a set of 4020 genes most differentially expressed (DE genes) across neuron types, the latent space is obtained with the first autoencoder, and the RI distributions shown in blue have average values of 0.986 and 0.987, respectively, close to the theoretical maximum of 1.0 that can only be achieved with perfect type classification for all neurons in the dataset. For subsets of 47 genes, the latent representations were obtained with the second linked autoencoder, and the corresponding RI distributions are colored red. A set of 47 DE genes achieves average RI = 0.964. These results confirm the idea that autoencoder-based low dimensional representations of gene expression can be used for accurate type classification. The 47 NP genes can be used to classify neuron types well, with average RI = 0.925 and a majority of the neurons (62%) classified correctly at the type level (with nearly uniform performance across all neuron types, see Figure 4—figure supplement 1). This RI performance is significantly higher (p<0.01, bootstrap) than the average RI for of subsets of genes chosen randomly (0.641 ± 0.047, n = 100), and chosen randomly but with expression levels matched with the NP genes (0.843 ± 0.027, n = 100), with none of the individual randomly selected subsets reaching the NP gene index of 0.925. Note that genes in the 47 DE set were chosen with prior knowledge of the Tasic 2018 taxonomy, while the 47 NP gene set was not. This distinction thus makes the near match of the 47 NP to the 47 DE gene sets in average RI all the more striking. This demonstration of the exceptional power of NP genes to mark transcriptomic neuron types reinforces earlier indications of an especially close and fundamental connection between neuropeptide gene expression and neuron type identity.

Conservation of NPP and NP-GPCR gene expression profiles between VISp and ALM

Figure 5 juxtaposes separate VISp and ALM expression profiles for NPP and NP-GPCR genes across 93 VISp neuron types (Figure 5A) and 84 ALM neuron types (Figure 5B). Similarities of expression profiles for the two areas are obvious in Figure 5, but there are also visible differences. The latter are rooted primarily in the substantial divergence of glutamatergic neuron taxonomies discussed at length in Tasic et al. (2018). Very strong similarities of both NPP and NP-GPCR expression profiles are most obvious for the GABAergic types, where the taxonomies are identical except for the absence of two GABAergic types in ALM (indicated by dark gray vertical placeholder bars in Figure 5B). The general conservation of neuron-type-specific expression patterns among common cell types between the two distant neocortical areas (NPP correlation: ρ = 0.974, p<2.2e-16, NP-GPCR: 0.877, p<2.2e-16) thus provides another indication of robust connection between NP gene expression and cortical neuron differentiation.

Neurotaxonomic heatmaps show strong conservation of NPP and NP-GPCR expression patterns between two distant neocortical areas.

(A) Expression heatmap for 18 NPP and 29 NP-GPCR genes in 13,491 single VISp neurons classified by type. (B) Similar heatmap for 8948 single neurons harvested from ALM. Heatmaps generated and displayed as described in Figure 3, except for segregation here of VISp and ALM harvest areas. Heat maps are aligned horizontally here to match GABAergic neuron types between VISp and ALM. Vertical dark gray bars in Figure 5B are spacers marking the two GABAergic cell types absent in ALM. Glutamatergic neurotaxonomies are seen to differ substantially, but differences appear mainly at the finest, ‘leaf’ levels of the neurotaxonomic hierarchy (see Tasic 2018) and Figure 3—figure supplement 1).

Prediction of local peptidergic signaling from expression of cognate NPP/NP-GPCR pairs

Expression of an NPP gene in one neuron and a cognate NP-GPCR gene in another neuron nearby implies a possibility of directed paracrine signaling, with diffusion of a secreted peptide coupling the first neuron to the second. The present set of 47 cortical NP genes (18 NPP and 29 NP-GPCR) comprises the 37 distinct cognate NPP/NP-GPCR pairs enumerated in Table 3 and predicts accordingly 37 distinct peptidergic neuromodulation networks. As noted in the Introduction, expected neuropeptide diffusion distances suggest that any neuron within a local cortical area (e.g., VISp or ALM) might signal by diffusion to any other neuron within that same local area, but almost surely not to more distant areas (e.g., from VISp to ALM). In the following, we therefore make predictions of 74 (37 × 2) peptidergic distinct signaling networks, keeping separate consideration of signaling within VISp and within ALM.

Table 3
The 18 NPP and 29 NP-GPCR genes of Tables 1 and 2 constitute 37 cognate NPP/NP-GPCR pairs and predict at least 37 potentially distinct peptidergic modulatory networks.

The 37 pairs are enumerated here along with indications of the expected primary GPCR signal transduction class for each NP-GPCR (Alexander et al., 2017) and a fraction denoting frequency with which the given cognate pair occurs as a fraction of all neuron pairs surveyed. Pastel table fill colors denote G-protein transduction class as in Tables 1 and 2.

#Cognate Pair
Symbol
NPP
Gene
NP-GPCR
Gene
Primary
Gα Family
Fraction of Type Pairs#Cognate Pair
Symbol
NPP
Gene
NP-GPCR
Gene
Primary
Gα Family
Fraction of Type Pairs
1Npy→Npy1rNpyNpy1rGi/o0.780519Vip→Vipr1VipVipr1Gs0.496
2Npy→Npy2rNpyNpy2rGi/o0.34120Vip→Vipr2VipVipr2Gs0.052
3Npy→Npy5rNpyNpy5rGi/o0.809521Crh→Crhr1CrhCrhr1Gs0.3925
4Sst→Sstr1SstSstr1Gi/o0.75122Crh→Crhr2CrhCrhr2Gs0.2035
5Sst→Sstr2SstSstr2Gi/o0.83623Rln1→Rxfp1Rln1Rxfp1Gs0.2465
6Sst→Sstr3SstSstr3Gi/o0.40524Rln1→Rxfp2Rln1Rxfp2Gs0.07
7Sst→Sstr4SstSstr4Gi/o0.80625Adcyap1→Adcyap1r1Adcyap1Adcyap1r1Gs0.284
8Penk→Oprd1PenkOprd1Gi/o0.495526Adcyap1→Vipr1Adcyap1Vipr1Gs0.1465
9Penk→Oprm1PenkOprm1Gi/o0.927Adcyap1→Vipr2Adcyap1Vipr2Gs0.0155
10Cort→Sstr1CortSstr1Gi/o0.626528Tac2→Tacr3Tac2Tacr3Gq/110.0955
11Cort→Sstr2CortSstr2Gi/o0.696529Cck→CckbrCckCckbrGq/110.6635
12Cort→Sstr3CortSstr3Gi/o0.33830Tac1→Tacr1Tac1Tacr1Gq/110.119
13Cort→Sstr4CortSstr4Gi/o0.67231Pthlh→Pth1rPthlhPth1rGq/110.392
14Pdyn→Oprd1PdynOprd1Gi/o0.211532Trh→TrhrTrhTrhrGq/110.016
15Pdyn→Oprk1PdynOprk1Gi/o0.074533Trh→Trhr2TrhTrhr2Gq/110.055
16Pdyn→Oprm1PdynOprm1Gi/o0.434Grp→GrprGrpGrprGq/110.113
17Pnoc→Oprl1PnocOprl1Gi/o0.65435Nts→Ntsr1NtsNtsr1Gq/110.0225
18Rln1→Rxfp3Rln1Rxfp3Gi/o0.10636Nts→Ntsr2NtsNtsr2Gq/110.054
37Nmb→NmbrNmbNmbrGq/110.5655

Prediction of peptidergic networks from neurotaxonomic NP gene expression profiles

Figure 6 displays weighted adjacency matrix plots representing predictions of neuron-type-specific and neuron-subclass-specific peptidergic coupling from selections drawn from VISp and ALM of the 37 cognate NP gene pairs. The prediction matrices A-E are outer products (CPM*CPM units) of vectors representing expression (CPM units) of an NPP gene (columns) and a cognate NP-GPCR gene (rows) across all VISp or ALM neuron types. The predicted coupling matrices in F matrices are similar except that factor vectors are down-sampled by averaging neuron-type-specific CPM values within each of the subclasses (see Materials and methods for more details).

Figure 6 with 10 supplements see all
Neurotaxonomic expression profiles of 37 cognate NPP/NP-GPCR pairs predict 37 peptidergic networks.

Weighted adjacency matrix plots predicting local peptidergic coupling amongst neuron types (A–E) and subclasses (F). Matrices were computed as outer products (CPM*CPM units) of row and column factor vectors representing abundance (CPM units) of NPP and cognate NP-GPCR genes. Pseudocolor scales representing both expression (CPM) and coupling (CPM*CPM) are logarithmic. (A) 93 × 93 matrix predicting coupling amongst 93 VISp neuron types based on type-specific expression of the Npy gene and the cognate Npy1r NP-GPCR gene, as indicated by row and column vector ‘heat’ strips called out by curved arrows. (B) An 84 × 84 square matrix similarly representing Npy-Npy1r coupling, depicted as in (A), except based only on the 84 ALM neuron types. Dashed crosses demarcate the four quadrants of directed NPP/NP-GPCR pairing between glutamatergic (‘E’) and GABAergic (‘I’) neuron types, called out as (E→ E), (E→ I), (I→ I) and (I→ E). Light gray lines, pastel color blocks and labels flanking both axes demarcate higher, subclass levels of the Tasic 2018 neurotaxonomy (as in Figure 3A above). (C–E) Exemplar matrix predictions for a further sampling of the 37 VISp cognate NPP/NP-GPCR pairs from each of the three primary G-protein transduction families: (C) Gi/o; (D) Gs; (E) Gq/11. (F) Adjacency matrices similar to A-E, except row and column factor vectors were calculated as means of CPM values across neuron types comprising indicated subclasses. (Cladograms and taxonomic color codes as delineated in Figure 6—figure supplements 18). Links below point to source data files and similar plots for all 37 VISp and ALM type and subclass adjacency matrices, and to additional quantitative analysis of coupling matrix hierarchies (Figure 6—figure supplement 9) and morphologies and correlations (Figure 6—figure supplement 10).

Figure 6C-E represents 8 more of the 37 cognate pair coupling matrices predicted for VISp. Along with Figure 6A and B, these exemplify the wide variety of neuron-type-specific coupling motifs resulting from transcriptomic prediction. Most coupling matrices (i.e., pairs 1, 9, 29), predict significant coupling over wide swaths of type-pairs, approaching 20% of the entire matrix. A few matrices at the other extreme, such as 6 and 34, predict very sparse coupling. Other predictions are intermediate in sparsity. As one might expect, similar patterns are evident in the downsampled, subclass level predictions of Figure 6F. Even from the small subset of the 37 coupling matrix plots shown in Figure 6, it is evident that both type-level and subclass-level matrices are densely tiled by predictions of connectivity. Inspection of Figure 6 and similar plots for the remainder of the 37 cognate pairs (Figure 6—figure supplements 18) also reveals that there is a great deal of cross-network redundancy, with multiple pairs covering a large majority of the coupled types and subclasses, sometimes within and sometimes crossing Gα family boundaries. These observations will be strengthened by the analysis of Figure 7 below.

Pooling NP-GPCRs by primary Gα family enables neurotaxonomic prediction of primary NP-GPCR signaling impacts across 37 cognate NP pairs (see text) for each of areas VISp and ALM.

(A) ISQ color maps representing coupling matrix predictions at the Tasic 2018 neurotaxonomy type level, merging Gi/o (red), Gs (green) and Gq/11 (blue) primary Gα family components. (B) Component primary Gα family (color) channels prior to merger displayed in (A). (C) Coupling matrix plots as in (A) and (B), except generated at the higher Tasic 2018 neurotaxonomy subclass level. Individual aggregate matrix components as in (B) are plotted at the right for both VISp and ALM. Dashed white crosses overlaying each matrix plot demarcate glutamatergic and GABAergic classes and the four corresponding matrix quadrants (E→ E, E→I, I→I and I→E) as in the individual matrix plots of Figure 6. (Cladograms and taxonomic color codes as delineated in Figure 3—figure supplement 1).

Finally, Figure 6 illustrates the tendency of coupling predictions from most cognate NP pairs to fall in contiguous ‘patches’ of the full coupling matrix. This is a natural reflection of the strong tendency of both NPP and NP-GPCR expression to align with early nodes in the Tasic 2018 hierarchical clustering which was also evident in Figures 3 and 5. The broadest example of coupling matrix patches reflecting hierarchical neurotaxonomy structure is provided by the observation that most sizable coupling patches fall strictly within single quadrants of glutamatergic-GABAergic neuron type pairing. Variations in coupling matrix structure across all 37 cognate NP pairs are represented in more quatitative terms by Figure 6—figure supplements 9 and 10. Additional details regarding the generation of the coupling matrices are provided in Materials and methods.

Prediction of second-messenger impacts from neurotaxonomic NP gene expression profiles

For compact visualization of predicted signaling impacts of multiple distinct peptidergic networks and to facilitate empirical tests of such predictions based on calcium and cyclic AMP sensors (see Discussion), we developed the ‘ISQ’ graphic exemplified in Figure 7. This treatment makes use of the trichotomous G-protein primary transduction family approximation described in Introduction and delineated in Table 3 above: the Gi/o family (‘I’) inhibits production of cyclic AMP, the Gs family (‘S’) stimulates production of cyclic AMP) and the Gq/11 family (‘Q’) augments intracellular calcium dynamics. Such trichotomy is certainly an oversimplification, as it is known that downstream GPCR signal transduction is richly multifaceted (Weis and Kobilka, 2018) and that some GPCRs may signal via members of multiple Gα families, but we postulate here that this simplified scheme may nonetheless offer a first approximation useful for the design of exploratory experimentation and theory.

Figure 7 displays three-channel ‘ISQ’ (red, green, blue) color maps predicting coupling in areas VISp and ALM based on aggregation across three primary Gα families (Gi/o, Gs and Gq/11). Individual cognate-pair coupling matrices were computed as in Figure 6, log10 scaled, individually normalized to maximum values, then summed into a red, green or blue color channel by primary Gα family as listed in Table 3. Figure 7A merges red, green and blue (i.e., Gi/o, Gs, Gq/11) color channels for the Tasic 2018 neuron-type-level. Figure 7B displays the three component channels individually. The dashed white crosses on these and following coupling matrix both plots divide these ISQ maps into four E-I quadrants as in Figure 6. Major features of the ISQ maps are clearly very similar for VISp and ALM. Figure 7C show aggregated matrix plots generated in similar fashion for the subclass-level neurotaxonomy.

The ISQ maps of Figure 7 exhibit a number of interesting features. (1) The aggregate matrices show that the 37 cognate pairs cumulatively predict coupling that densely tiles the entire neuron-type coupling matrix, with the largest area of relatively weak coupling being that from NP, CT and L6b subclasses to GABAergic neurons. (2) Aggregate predictions are highly conserved between VISp and ALM areas. (3) All four E-I quadrants show coupling representative of all three Gα families. (4) There is nonetheless some family predominance within each quadrant: Gi/o (blue) in the E→E quadrant, Gs (green) in the E→I quadrant, and Gi/o (red) in the I→I and I→E quadrants. (5) As is particularly notable in the component matrix plots, Gi/o (red) signaling is the most heavily concentrated, with quite little signaling expressed in the top two (E-E and E-I) quadrants but tiling the bottom two (I-E and I-I) quite thoroughly. Gq/11 signaling shows a weaker, but still noticeable tendency toward concentration in the two left quadrants (E-E and I-E). Gs signaling exhibits distinct zone of concentration, but these are not well captured by the quadrant structure. (6) The presence of cyan, yellow and purple (blended) colors in the merged matrix plots (A and C), particularly in the bottom quadrants is indicative of coincidence of signaling impacts of multiple Gα families at individual type (A) and subclass (C) intersections.

Discussion

Light from single-cell transcriptomics is now beginning to illuminate dark corners of cellular neuroscience that have long resisted mechanistic and functional analysis (Fan et al., 2018; Fishell and Kepecs, 2019; Földy et al., 2016; Gokce et al., 2016; Luo et al., 2018; Okaty et al., 2011; Paul et al., 2017; Shekhar et al., 2016; Tasic et al., 2018; Tasic et al., 2016; Telley et al., 2016; Zeng and Sanes, 2017). Cortical neuropeptide signaling may be one such corner. While profound impacts of neuropeptide signaling are well-established in a wide range of non-mammalian and sub-cortical neural structures (Borbély et al., 2013; Burbach, 2011; Elphick et al., 2018; Grimmelikhuijzen and Hauser, 2012; Katz and Lillvis, 2014; Jan et al., 1979) and there certainly is an excellent literature on cortical neuropeptide signaling (Crawley, 1985; Férézou et al., 2007; Gallopin et al., 2006; Gomtsian et al., 2018; Hamilton et al., 2013; Liu et al., 2018; Mena et al., 2013; Mitre et al., 2018; Owen et al., 2013; Rossier and Chapouthier, 1982; Williams and Zieglgänsberger, 1981), published physiological results are surprisingly rare given the breadth of neuroscientific interest in cortex. The new transcriptomic data analyzed here suggest a possible explanation for this relative rarity. Though many NPP and cognate NP-GPCR genes are expressed abundantly in all or very nearly all neocortical neurons, such expression is highly differential, highly cell-type specific, and often redundant. These previously uncharted differential expression factors may have hindered repeatable experimentation. Our analysis supports this unwelcome proposition but may also point the way to more productive new perspectives on intracortical peptidergic neuromodulation.

Summary of findings

The present single-cell analysis establishes that mRNA transcripts from one or more of 18 NPP genes are detectible in over 97% of mouse neocortical neurons (Figure 1A,B) and that transcripts of one or more of 29 cognate NP-GPCR genes are detectible in over 98% (Figure 2A,B). Transcripts of at least one of the 18 NPP genes are present in the vast majority of cortical neurons at extremely high copy number (Table 1), suggesting the likelihood of brisk translation into neuropeptide precursor proteins. Brisk synthesis of precursor proteins further suggests brisk processing to active neuropeptide products and secretion of these products. Likewise, NP-GPCR transcripts rank high in abundance compared to most other transcripts of protein-coding genes (Table 2), supporting the likelihood of functional receptor products. Our observations thus support the proposition that all, or very nearly all, neocortical neurons, both glutamatergic and GABAergic, are also both neuropeptidergic and modulated by neuropeptides. We are not aware of any previous empirical support for quite such a strong conclusion.

Leveraging the analytical power of the Tasic 2018 transcriptomic neurotaxonomy, we find that patterns of differential expression of the 18 NPP and 29 NP-GPCR genes are very highly specific to neuron types as discerned from genome-wide transcriptomic analysis (Figure 3). Though much additional work (e.g., see Cadwell et al., 2017; Daigle et al., 2018; Moffitt et al., 2016; Shah et al., 2016; Wang et al., 2018; Zeng and Sanes, 2017) will be needed to fully reconcile new transcriptomic neurotaxonomies such as the Tasic 2018 example with existing anatomical and physiological neurotaxonomies, it seems very likely that some such reconciliation will eventually take place, and that the dimensions of neurotaxonomy will be expanded to include emerging connectomic data (Jonas and Kording, 2015).

Our analysis shows that very intricate single-cell (Figures 1B,C and 2B,C) and neurotaxonomic (Figure 5) patterns of expression of 18 NPP and 29 cognate NP-GPCR genes are very rigorously conserved between VISp and ALM, two distant and quite different areas of neocortex. Such strong conservation would seem improbable if these intricate patterns resulted from ephemeral factors such as local electrical activity or modulation status. Rather, we suggest that this strong conservation is more likely to reflect a really fundamental evolutionary and developmental connection between neuropeptide network architectures and adaptive cortical circuit function.

Following earlier indications that neurons may express multiple NPP genes, for example (Mezey et al., 1999), our analysis establishes that expression of multiple NPP genes in individual neurons may be the rule in cortex (Figure 1C). Our analysis also establishes the generality of expression of multiple NP-GPCR genes in individual cortical neurons (Figure 2C). The significance of these observations remains to be explored but should be viewed in light of recent discoveries of large numbers and great diversity of transcriptomic neuron types in neocortex and many other brain regions. Combinatorial expression of neuropeptide precursor and receptor genes obviously expands the prospects for molecular multiplexing that may allow selective communication amongst a multiplicity of distinct neuron types even though the signaling molecules propagate in diffuse paracrine fashion. It is also good to keep in mind, however, that the selectivity of NP-GPCRs for particular peptide moieties is not perfect. Various kinds of concentration-dependent ‘crosstalk’ between nominally separate peptidergic networks are therefore possible. Here in the interests of simplicity we have confined explicit peptidergic signaling predictions to the highest affinity pairings of NPP and NP-GPCR gene products (e.g., see Alexander et al., 2017).

We also find that a modest set of 47 neuropeptide-signaling genes permits transcriptomic neuron type classification that is exceptionally precise in comparison to other similarly small gene sets (Figure 4). Connections between neuronal cell-type differentiation and differential expression of neuropeptides were first recognized by the widespread use of neuropeptide immunoreactivity to discriminate interneuron types (DeFelipe et al., 2013). The exceptional power of neuropeptide genes as cell type markers is also evident in the Tasic 2018 neuron-type nomenclature (see Tasic et al., 2018) and bold red type highlights in Figure 3—figure supplement 1) and is noteworthy in other recent single-cell transcriptomic analyses of mouse neuron differentiation (Huang and Paul, 2019; Paul et al., 2017; Sugino et al., 2019; Zeisel et al., 2018). The tight alignment of neuron type classifications based solely on neuropeptide-signaling gene expression with the classifications based on genome-wide expression patterns, as evident in Figure 4, offers an intriguing suggestion of a very deep and fundamental connection between the expression of evolutionarily ancient neuropeptide-signaling genes and the differentiation of neuron type identities during metazoan speciation.

The structures of predicted neuropeptidergic modulation networks

Our analysis delineates neuron-type-specific expression of 37 cognate pairs amongst the 18 NPP and 29 NP-GPCR genes analyzed (Table 3). Each of these pairs can be taken to predict a modulatory connection from cells expressing a particular NPP gene, via a secreted NP product, to cells expressing the particular NP-GPCR gene (Figure 6). Each pair thus establishes the prospect of a directed modulatory network with nodes defined by the neurotaxonomic identities of the transmitting NPP-expressing and the receiving NP-GPCR-expressing neurons. The analyses represented in Figures 1, 2, 3 and 5 and Table 3 establish that at least one of the 37 pairs directly involves every neuron sampled, and that the vast majority of neurons are directly involved in more than one of the 37 predicted networks. The nearly complete adjacency matrix tiling evident in Figures 6 and 7 remarkably suggests that at least one of the 18 peptides considered here may directly interconnect almost every cortical neuron type with almost every other neuron type. Because of this saturated, multiplexed coverage of all neurons and neuron types, we refer to these predicted neuropeptidergic networks as ‘dense’.

Transcriptomic prediction of paracrine local signaling from GABAergic neuron sources is particularly compelling. Because few cortical GABAergic neurons have axons that project beyond the confines of a single cortical area, considerations of diffusion physics and the limited lifetime of peptides after secretion strongly imply that secreted neuropeptides act locally. On the other hand, most of the glutamatergic neurons do emit long axons, so it is possible that neuropeptides secreted from such neurons may act in remote cortical or extracortical projection target areas. Even so, most cortical glutamatergic neurons also have locally ramifying axon branches and may also secrete neuropeptides from their local dendritic arbors (Vila-Porcile et al., 2009). The high cortical expression of NP-GPCRs cognate to NPP genes expressed by glutamatergic neurons in the same local area suggests a scenario supportive of local modulatory signaling from glutamatergic neuron sources, though this case may not be quite as strong as that for strictly local GABAergic neurons. That said, the much more profuse expression of NPP genes in GABAergic neuron types along with the somewhat more profuse NP-GPCR expression in glutamatergic types does suggest a ‘prevailing wind’ of peptidergic signaling, blowing mainly from GABAergic to glutamatergic neurons, as presaged in an earlier microarray analysis of developing mouse cortex (Batista-Brito et al., 2008).

Though our NP network predictions are entirely consistent with decades of pioneering work on peptidergic neuromodulation and cortical gene expression (Burbach, 2011; Hökfelt et al., 2013; van den Pol, 2012), it is only with the recent advent of single-cell and neurotaxonomics methods that such specific predictions have become possible and, most importantly, testable.

Testing peptidergic network predictions

The present predictions regrading cortical neuropeptidergic coupling are based on detection of cellular mRNA transcripts, but prediction from such data depends upon (1) extrapolation from cellular mRNA census to inference about rates of synthesis, processing, localization and functional status of cellular NPP and NP-GPCR proteins, (2) assumptions about neuropeptide diffusion and lifetime in cortical interstitial spaces, (3) assumptions about signaling consequences of neuropeptide binding to cortical NP-GPCR receptors. Though we have already discussed several factors that mitigate such concerns, we stipulate here that these uncertainties remain substantial. Nonetheless, we expact that these same uncertainties will define paths for very productive future research.

Physiological and anatomical experimentation will be essential to testing transcriptomic predictions of intracortical neuropeptide signaling. We have suggested that such work may have been frustrated in the past by irreproducibility due to the uncharted multiplicity, neuron-type-specificity, and redundancy of NPP and NP-GPCR expression. This conundrum may now be resolved with the emergence of transcriptomic neurotaxonomies and new tools for experimental access to specific cortical neuron types. Such access may be either prospective, using Cre and/or Flp driver lines (Daigle et al., 2018; He et al., 2016; Madisen et al., 2015) or viral vectors (Dimidschstein et al., 2016) of substantial neuron-type-specificity, or retrospective by multiplexed FISH (Lein et al., 2017; Zeng and Sanes, 2017), immunostaining (He et al., 2016; Xu et al., 2010), patch-seq (Cadwell et al., 2017; Lein et al., 2017) or morphological classification methods (DeFelipe et al., 2013; Zeng and Sanes, 2017). These and other new molecular tools like those discussed below now seem poised enable truly decisive and repeatable tests of neuron-type-specific transcriptomic predictions of peptidergic signaling. It will be critical, however, for the field to have continually updated access to rapidly growing bodies of genetic and transcriptomic data and to the requisite animal strains and labeling materials.

A vast pharmacopeia of well-characterized specific ligands and antagonists for most NP-GPCRs (Alexander et al., 2017) will be bedrock for the functional analysis of neuron-type-specific peptide signaling. For analysis of type-specific neuropeptide signaling in network context (i.e., ex vivo slices and in vivo), newer optophysiological methods of calcium imaging and optogenetic stimulation/inhibition will certainly join electrophysiology as foundations for measurement of neuropeptide impacts. In addition, many new tools more specific to neuropeptide signaling are emerging. Super-resolution 3D immunohistologies like array tomography (Smith, 2018) and 3D single-molecule methods (Jia et al., 2014; von Diezmann et al., 2017) will enable imaging of dense-core vesicle localization and neuropeptide contents in type-specific network anatomical context. Genetically encoded fluorescent dense-core vesicle cargos will allow real-time detection of neuropeptide secretion (Ding et al., 2019), while genetically encoded sensors of extracellular GPCR ligands (Patriarchi et al., 2018; Sun et al., 2018), GPCR activation (Haider et al., 2019; Hill and Watson, 2018; Livingston et al., 2018; Ratnayake et al., 2017; Stoeber et al., 2018), G-protein mobilization (Ratnayake et al., 2017), cAMP concentration (Hackley et al., 2018; Ma et al., 2018), protein kinase activation (Chen et al., 2014) and protein phosphorylation (Haider et al., 2019) will enable fine dissection of NP dynamics and NP-GPCR signal transduction events (Spangler and Bruchas, 2017). In addition, new caged NP-GPCR ligands (Banghart et al., 2018) and antagonists (Banghart et al., 2013) will provide for precise spatial and temporal control for NP receptor activation. All of these tools have already been proved at least in principle, and all should be readily applicable to testing specific hypotheses derived from the type-specific peptidergic signaling predictions set forth here (Figures 6 and 7 and their supplements).

Prospects for elucidating cortical homeostasis, modulation and plasticity

Our results suggest that densely multiplexed peptidergic networks could play very significant roles in the homeostasis, modulation and plasticity of cortical synaptic networks. Due to the clearly formidable complexity of cortical networks, however, a real grasp of the myriad network interactions implicated is certain to require theoretical and computational approaches, in addition to experimental biophysics tests as outlined in the preceding section. Work at the fertile intersection of the neuroscience and the computer science of learning (Dayan and Abbott, 2001; Huh and Sejnowski, 2017; Koch and Segev, 1998; Lillicrap et al., 2016; Marblestone et al., 2016; Guerguiev et al., 2017; Song et al., 2000) seems particularly relevant to fathoming the possible significance of the neuropeptidergic networks we predict here.

Neuroscience and computer science efforts to model or engineer adaptive neural networks (be they biological or artificial) share the hard problem of optimally individualized adjustment of very large numbers of what both fields know as ‘synaptic weights’. At the heart of this challenge is ‘credit assignment’, that is, the assignment of ‘credit’ (or ‘blame’) to guide the strengthening (or weakening) of the small subset of synapses that actually contribute differentially to success (or failure) in a given perceptual, mnemonic or motor task. Neuroscientists struggle with the credit assignment problem as they search for biological learning rules. Computer scientists are driven by a quest for greater computational efficiency in training artificial networks and the prospect that evolution may have developed superior strategies. One concept that has come into prominence as a candidate biologically plausible solution to the credit assignment problem is that of modulated ‘Hebbian’ or ‘spike-timing-dependent’ plasticity (STDP) (Bengio et al., 2016; Dan and Poo, 2006; Farries and Fairhall, 2007; Florian, 2007; Frémaux and Gerstner, 2015; Froemke, 2015; Izhikevich, 2007; Marblestone et al., 2016; Pawlak et al., 2010; Poo et al., 2016; Roelfsema and Holtmaat, 2018; Xie and Seung, 2003) While most biological studies of modulated STDP so far have focused on the monoamine neuromodulator dopamine (Brzosko et al., 2019; Izhikevich, 2007; Kuśmierz et al., 2017; Schultz, 2015) known commonalities of signal transduction downstream from widely varying GPCRs suggest that NP-GPCRs could play roles in credit assignment analogous to those postulated for dopamine-selective GPCRs (Hamilton et al., 2013; Roelfsema and Holtmaat, 2018; but see Edelmann and Lessmann, 2011).

Deeper understanding of neuromodulation roles in adaptive cortical function seems certain to require a framework for integrating consideration of the panoply of possible activity-dependent modulatory networks with modulated excitatory and inhibitory synaptic networks. Figure 8 conceptualizes one such framework schematically, using a common neurotaxonomy to integrate statistics of multiple neuromodulatory and multiple synaptic signaling networks. Panels A-C idealize a logic for prediction of neuropeptidergic connectivity statistics from transcriptomic data. Panel D cartoons the use of a common neurotaxonomy to integrate probabilistic NP network graphs (panels B,C; three in this case) and multiple synaptic networks (panels E,F; two in this case) into a single graph representing superimposed modulatory and synaptic network. The present analysis suggests that a more realistic materialization of the Figure 8 schematic would involve approximately 100 neuron types and dozens of NPP and NP-GPCR genes. It would also require information that is presently unavailable about excitatory and inhibitory synaptic connectivity statistics in such a neurotaxonomic framework. It is very encouraging, however, that vigorous ongoing efforts (e.g., see Daigle et al., 2018; Jonas and Kording, 2015; Swanson and Lichtman, 2016; Tasic, 2018; Zeng and Sanes, 2017) suggest that such information is on the way. A view of cortical circuitry as a superimposition of multiple modulatory and synaptic networks, linked by a common neurotaxonomy as idealized in Figure 8, may prove essential to fathoming the interplay of slow neuromodulation and fast synaptic signaling necessary for adaptive cortical function.

Neurotaxonomy offers a framework for integrating statistical descriptions of multiple modulatory and synaptic networks, as schematized here for purely fictitious neuron types, transcriptomic and connectomic data.

Multiple directed network graphs are predicted here for peptidergic networks from transcriptomic data (A–C) and for synaptic networks from connectomic data (D,E) to predict a modulated synaptic network (F). (A) Transcriptional heat maps representing expression of three fictitious NPP genes (Npp1,2,3) and three cognate NP-GPCR genes (Npr1,2,3) across a neurotaxonomy comprising seven fictitious neuron types (a-c excitatory; d-g inhibitory). (B) Adjacency matrices derived from expression data in (A) as outer products of column and row factor vectors representing NPP and NP-GPCR expression, respectively. (C) Directed network graphs representing the same three NP networks, diagramming paracrine coupling by three neuropeptides (NP1,2,3) with routing of broadcast diffusive signals determined by differential expression of peptide-receptor pairings. (D) Neurotaxonomic adjacency matrices expressing excitatory and inhibitory synaptic connection statistics. (E) Synaptic network graphs derived from (D). (F) Directed multigraph illustrating use neurotaxonomy to integrate the three modulatory graphs and two synaptic connectivity graphs.

Implications for psychopharmacology

Neuropeptidergic signaling molecules have long beguiled as potential neuropsychiatric drug targets (Hökfelt et al., 2003; Hoyer and Bartfai, 2012). There seems to be some disappointment, however, in the returns on what has been rumoured to be very large research investments. The present study raises the possibility that both NP-targeted drug discovery and the reproducibility of physiological experimentation have been hindered by the same uncharted multiplicity, cell-type-specificity and redundancy of NPP and NP-GPCR expression. By charting these waters, single-neuron transcriptomic analysis may improve the odds substantially for both reproducible research and psychiatric drug development.

Today’s psychiatric pharmaceuticals almost all target signaling by the monoamine neuromodulators dopamine, serotonin, noradrenaline and/or histamine and their selective GPCR receptors (Data-Franco et al., 2017; Hamon and Blier, 2013; Millan et al., 2015; Urs et al., 2014). Because they are so numerous, neuropeptide signaling systems may be much more neuron-type specific than monoamines. Greater neuron-type-specificity may translate to NP-targeting drugs being less troubled by side-effects and compensation (Hoyer and Bartfai, 2012). Moreover, while GPCRs have long been known as among the most ‘druggable’ of targets (Gurrath, 2001; Lundstrom, 2009), the ‘druggability’ of GPCRs is currently advancing very rapidly due to advances in GPCR structural biology and molecular dynamic simulations (Hilger et al., 2018; Koehl et al., 2018; Weis and Kobilka, 2018). It seems likely that new knowledge of the neuron-type-specificity of NP signaling gene expression will substantially advance the development of NP-targeting pharmaceuticals.

Conclusions

Because single-cell RNA-seq data enable prediction of complete protein parts lists of individual neurons, they open powerful new perspectives on neuronal differentiation, function and network architectures. The power of these new perspectives has been further enhanced by parallel development of transcriptomic neurotaxonomies. Here we have exploited both a pioneering large-scale RNA-seq dataset and its data-driven neurotaxonomy to pursue a new perspective on local neuropeptidergic modulatory signaling in mouse cortex. This work has revealed a surprisingly highly structured and abundant expression of cortical NPP and NP-GPCR genes: dozens of neuropeptide signaling genes are expressed at very high levels in very distinctive and highly conserved patterns. While entirely consistent with previous bulk transcriptomic and proteomic observations, it is only with the advent of the RNA-seq combination of single-cell resolution with genomic depth that this extreme structure and abundance has come into focus. We have endeavored here to shape these findings into specific and testable peptidergic signaling predictions in the hopes of guiding fruitful experimentation based on emerging transcriptomic neurotaxonomies, new means for genetic access to specific neuron types and powerful new tools for biophysical analysis of neuropeptide actions. The observations presented here suggest the intriguing possibility that the homeostasis, modulation and plasticity of cortical circuitry may involve local neuropeptidergic signaling networks of previously unrecognized abundance and density.

Materials and methods

Data and software resources

Request a detailed protocol

The present study is based on analysis of a resource single-cell mRNA-seq dataset acquired at the Allen Institute (Tasic et al., 2018) and available for download at http://celltypes.brain-map.org/rnaseq/ These RNA-seq data were acquired from a total of 22,439 isolated neurons, with detection of transcripts from a median of 9462 genes per cell (min = 1,445; max = 15,338) and an overall total of 21,931 protein-coding genes detected. Neurons were sampled from two distant and very different neocortical areas: 13,491 neurons from primary visual cortex (VISp), and 8948 neurons from anterior lateral motor cortex (ALM). Tasic et al., harvested tissue specimens from a variety of transgenic mice expressing fluorescent proteins to enable enrichment of samples for neurons and for relatively rare neuron types by FACS sorting after dissociation. This enrichment procedure resulted, by design, in a disproportionate representation of GABAergic neurons, canonically ~20% of neurons (Sahara et al., 2012), such that the sampled neuron population is roughly half GABAergic (47%) and half glutamatergic (53%). The resource publication (Tasic et al., 2018) should be consulted for full details of neuronal sample and library preparation, sequencing and data processing. Source data, spreadsheets, R scripts and other code used to generate all tables and figures presented here are available at https://github.com/AllenInstitute/PeptidergicNetworks (Gala, 2019; copy archived at https://github.com/elifesciences-publications/PeptidergicNetworks). A derived data set np_gpcr_cpm.csv was used for analyses summarizing CPM, region and metadata for the NPP and NP-GPCR genes. The primary Tasic 2018 data tables are available for download at http://celltypes.brain-map.org/rnaseq/.

Data metrics

Request a detailed protocol

The Tasic 2018 single-cell RNA-seq data tables report the abundance of transcripts from individual neurons in both ‘counts per million reads’ (CPM) and ‘fragments per kilobase of exon per million reads mapped’ (FPKM) units. Our analysis of this data compares gene expression levels quantitatively, with two distinct use cases: (1) comparisons across large sets of different genes, and (2) comparisons of the same gene across different individual cells, cell types and brain areas. We have relied upon FPKM data (Mortazavi et al., 2008; Pimentel, 2014), for use case 1 (i.e., the Tables 1 and 2 comparisons across genes). For use case 2 (as in all figures below), we have preferred the CPM units, because these units were used to generate the Tasic 2018 neurotaxonomy. While choices between CPM and FPKM units here should have little impact upon outcomes, it would seem inconsistent to use FPKM units to compare across cell types discerned on the basis of CPM units.

The NP signaling genes upon which the present analysis focuses are expressed very differentially across the sampled populations of individual mouse cortical neurons. That is, each gene is expressed at a high level in some subset of cells but at zero or very low levels in the remainder of the population. To compactly characterize such expression, we developed a ‘Peak FPKM’ (pFPKM) metric. This metric is generated by ranking single-cell FPKM values for a given gene across the entire population of 22,439 neurons sampled, then designating the FPKM value at the ascending 99.9th percentile point as pFPKM. This metric was designed to minimize effects of sporadic outliers and sample size while still closely approximating the actual peak expression value in even very small subsets of neurons expressing the gene in question. Figures 1A and 2A, and their Source data files provide very detailed additional information about the single-cell RNA-seq value distributions sampled by the pFPKM metrics.

Selection of the 18 NPP gene set

Request a detailed protocol

As noted in Introduction, usage and definitions of the term ‘neuropeptide’ vary widely across the current literature. It therefore seems unwise at present to claim that any attempted consensus list would accurately circumscribe all neuropeptides. For the purposes of the present work, we have relied therefore on the reasonably exhaustive list of 96 classical and candidate human and mouse NPP genes put forth in a widely cited publication (Burbach, 2010) and related website (http://neuropeptides.nl/, last accessed 10 October 2019). To reconcile this list to current mouse gene nomenclature, we used both the HGNC nomenclature ((https://www.genenames.org/, last accessed 10 October 2019) and the Mouse Genome Database (MGD) (http://www.informatics.jax.org, last accessed 10 October 2019). The result is the list of 94 putative mouse NPP genes presented in Supplementary file 3, which also tabulates the pFPKM values and percentile scores compiled for each NPP genes from the Tasic 2018 dataset. These 94 NPP genes were further segregated using a preliminary (early 2018) version of the Tasic 2018 neurotaxonomy to select NPP genes exhibiting median CPM expression levels > 10 in one or more neuron type in VISp and ALM cortex. This screening resulted in the list of 39 such NPP genes represented in Supplementary file 4, with most exceeding the 10 CPM threshold by a large margin (observed range was 24–4100 CPM). Supplementary file 4 also tabulates criteria that drove inclusion of only the 18 NPP genes represented in Table 1 while 21 other cortically expressed NPP genes were excluded. The 18 select NPP genes include all but two (Edn3 and Gal) genes for which transcripts ranked in the top quintile by pFPKM of the 94 putative NPP genes as tabulated in Supplementary file 3.

Selection of the 29 NP-GPCR gene set

Request a detailed protocol

The 18 select NPP genes listed in Table 1 were used to search manually for cognate NP-GPCRs expressed in mouse cortex, relying primarily on ligand/receptor pairing data retrieved from the IUPHAR/BPS Pharmacology website (http://www.guidetopharmacology.org, accessed in March, 2018) and the Tasic 2018 NP-GPCR expression data tabulated in Supplementary file 5. This process resulted in selection of the 29 mouse NP-GPCR genes listed in Table 2, which also lists for each the corresponding cognate NPP gene or genes used to root the search. The matching of NP-GPCR and NPP genes in Table 2 neglects a few receptor/ligand pairings rated on the IUPHAR/BPS website as very low in affinity compared to primary pairings.

Autoencoder-based classifier development and evaluation methods

Gene sets

Request a detailed protocol

Table of different sets of genes used for experiments shown in Figure 4:

Gene setDescription
NP47The combined set of 18 NPPs and 29 NP-GPCRs
HE6083 genes selected based on maximum value across all neurons in the dataset
DE4020 differentially expressed genes for Tasic 2018 neurotaxonomy
DE4747 most variable genes selected from the set of DE genes
Rand47Random subsets of 47 genes drawn from the set of HE genes
Rand47 ExpMatchedRandom subsets of 47 genes such that the maximum expression value approximately matches that of the NP genes

Autoencoder network architecture

Request a detailed protocol

Autoencoders are multi-layer, feedforward neural network models that consist of encoder/decoder subnetworks. In its basic realization, the encoder subnetwork learns to compress the high dimensional input into a low dimensional representation, from which the decoder subnetwork estimates the original input. We constructed a network with two autoencoders, with eight hidden layers each. The architecture of the first autoencoder (HE Genes autoencoder, Figure 4C) is Input(6083) → Dropout(0.8) → Dense(100) → Dense(100) → Dense(100) → Dense(100) → Dense(d) → Batch Normalization (latent representation z1) → Dense(100) → Dense(100) → Dense(100) → Dense(100) → Dense(6083), and the architecture of the second autoencoder (NP Genes autoencoder, Figure 4B) is Input(47) → Dropout(0) → Dense(x) → Dense(x) → Dense(x) → Dense(x) → Dense(d) → Batch Normalization (latent representation z2) → Dense(x) → Dense(x) → Dense(x) → Dense(x) → Dense(47). The numbers in parentheses of Dense denote the number of fully connected units in those layers. All Dense layer units use the rectified linear (ReLU) function as the nonlinear transformation except for those in the Dense(d) layers, which do not use a nonlinear transformation. For results using the NP genes autoencoder x = 50; tests with x = 25, led to qualitatively similar results (not shown) and did not change overall conclusions of the analyses. The Dropout layer (Srivastava, 2014) is used with dropout probability = 0.8 to regularize the HE Genes autoencoder and prevent over-fitting. The numbers of input/output units in each network match the number of input genes. The two dimensional representations (d = 2) shown in Figure 4A–B, and the five dimensional (d = 5) representations used in Figure 4D are the outputs of the Batch Normalization layer (Ioffe and Christian, 2015) for the respective networks. We determined the optimal latent space dimensionality d = 5 for the quantitative analysis by varying the latent space dimensionality of the HE Genes network between 2 and 20 dimensions and choosing the value that maximized the QDA analysis-based cell type classification accuracy for the HE genes (see Figure 4—figure supplement 2).

Autoencoder training

Request a detailed protocol

Both autoencoder networks were trained using the backpropagation algorithm with the Adam optimizer (Kingma and Ba, 2014) and a batch size of 956. The HE genes autoencoder was trained for 50,000 epochs using the mean squared error between the input and the output layers as the loss function. The NP genes autoencoder was trained for 10,000 epochs using L = R+λC as the loss function, where R denotes the mean squared reconstruction loss as in the HE genes network, C denotes the penalty for mismatch between the latent representations, and λ = 100 is the weighting scalar between the two terms. After training the HE genes network and obtaining the latent representation z1 for each cell, C calculates the mean squared error between the latent representation of the NP genes network z2 and z1, while simultaneously normalizing variance along the narrowest direction for z2. The two additive loss terms, R and C, together minimize the reconstruction error while attempting to match the representation learned using only the HE gene set. The same procedure was used for all small gene subsets including NP and random gene sets. Python implementations of the networks using the Tensorflow and Keras libraries are included in the code repository.

Quantifying abilities of gene sets to classify cell types

Request a detailed protocol

The neurotaxonomy of Tasic et al. (2018) defines hierarchical relationships of neuronal cell types. For each gene set, we used Quadratic Discriminant Analysis (QDA) to train multiple classifiers on the latent space representations to predict labels at different levels of the cell type hierarchy. The different levels (nodes) in the hierarchy were characterized in Tasic 2018 with a resolution index measure. Here we re-normalized that resolution index measure to have a value of 0.0 for the class of neurons (root node), and 1.0 for the 115 VISp+ALM cell types (leaf nodes, inset in Figure 4D). All intermediate nodes in the hierarchical classification tree have a positive resolution index that is less than 1.0. We used this property of nodes in the hierarchical classification tree to assign a resolution index (RI) value to each cell. The procedure starts with a classifier that was trained using all the leaf node labels, that is all the 115 VISp+ALM cell type labels. Test cells that are classified correctly at this level are assigned RI = 1.0, which corresponds to the resolution index measure of the leaf nodes. Test cells that are incorrectly classified at this level of detail are re-assigned labels by a classifier that was trained on successively merged labels along the hierarchical tree till they are correctly classified. These cells receive the resolution index value of the node for which they are assigned the correct label. This procedure was performed using 13 fold cross validation for all the different gene sets, and the results were pooled.

Peptidergic coupling matrices

Request a detailed protocol

For a given cortical area A{ALM,VISP} , we denote by NPPA(g,t) the mean CPM expression matrix having entries NPP gene g and cell type t. Similarly, NPGPCRA(h,t) has as entries the expression of NP-GPCR gene h in type t. The coupling matrix C(g,t)A of the pair (g,h) in area A is then defined C(g,t)a(t,s)=log10(NPPA(g,t)×NPGPCRA(h,s)) for the fixed pair (g,h) in (NPP, NP-GPCR) as t,s range over all cell types in A. Matrices C(g,t)A are formally the (square matrix) outer product NPPANPGPCRA then presented in log10 units. Pooled representations are computed by averaging values of coupling matrices C(g,t)A over 12 major cell types prior to rendering.

Transduction mode predictions

Request a detailed protocol

Peptidergic coupling matrices are summed, log10 scaled and maximum normalized independently according to Gi/o, Gs and Gq/11 family membership, then displayed in red, green and blue, respectively. Pooled representations are computed by averaging type-level data over subclasses before similar rendering.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
    A PKA activity sensor for quantitative analysis of endogenous GPCR signaling via 2-photon FRET-FLIM imaging
    1. Y Chen
    2. JL Saulnier
    3. G Yellen
    4. BL Sabatini
    (2014)
    Frontiers in Pharmacology, 5, 10.3389/fphar.2014.00056.
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
    Theoretical Neuroscience - Computational and Mathematical Modeling of Neural Systems
    1. P Dayan
    2. LF Abbott
    (2001)
    MIT Press.
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
    Handbook of Biologically Active Peptides
    1. T Hökfelt
    2. SO Ögren
    3. X Zqd
    (2013)
    Classical Neurotransmitters and Neuropeptides, Handbook of Biologically Active Peptides, Elsevier.
  56. 56
  57. 57
  58. 58
  59. 59
    Batch normalization: accelerating deep network training by reducing internal covariate shift
    1. S Ioffe
    2. S Christian
    (2015)
    Proceedings of the 32nd International Conference on International Conference on Machine Learning.
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
    Methods in Neuronal Modeling: From Ions to Networks, Computational Neuroscience
    1. C Koch
    2. I Segev
    (1998)
    MIT Press.
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
    An Overview on GPCRs and Drug Discovery: Structure-Based Drug Design and Structural Biology on GPCRs
    1. K Lundstrom
    (2009)
    In: W. R Leifert, editors. G Protein-Coupled Receptors in Drug Discovery. Humana Press. pp. 51–66.
    https://doi.org/10.1007/978-1-60327-317-6_4
  79. 79
  80. 80
  81. 81
  82. 82
    Chapter 18: Peptides
    1. RE Mains
    2. BA Eipper
    (2006)
    In: G. J Siegel, R. W Albers, S Brady, D. L Price, editors. Basic Neurochemistry. Elsevier Academic Press. pp. 317–332.
    https://doi.org/10.1016/B978-0-12-374947-5.00020-1
  83. 83
  84. 84
  85. 85
  86. 86
    Spike Timing-Dependent plasticity: a comprehensive overview
    1. H Markram
    2. W Gerstner
    3. PJ Sjöström
    (2013)
    Frontiers in Synaptic Neuroscience 4:2.
  87. 87
  88. 88
    Editorial overview: neuromodulation: tuning the properties of neurons, networks and behavior
    1. DA McCormick
    2. MP Nusbaum
    (2014)
    Current Opinion in Neurobiology, 29, 10.1016/j.conb.2014.10.010, 25457725.
  89. 89
  90. 90
  91. 91
  92. 92
  93. 93
  94. 94
  95. 95
  96. 96
  97. 97
  98. 98
  99. 99
  100. 100
  101. 101
  102. 102
  103. 103
  104. 104
  105. 105
  106. 106
  107. 107
  108. 108
  109. 109
  110. 110
  111. 111
  112. 112
  113. 113
  114. 114
  115. 115
  116. 116
  117. 117
    Dropout: a simple way to prevent neural networks from overfitting
    1. N Srivastava
    (2014)
    The Journal of Machine Learning Research 15.1:1929–1958.
  118. 118
  119. 119
  120. 120
  121. 121
  122. 122
  123. 123
  124. 124
  125. 125
  126. 126
  127. 127
  128. 128
  129. 129
  130. 130
  131. 131
  132. 132
  133. 133
  134. 134
  135. 135
  136. 136
  137. 137
  138. 138
  139. 139
  140. 140
  141. 141

Decision letter

  1. David D Ginty
    Reviewing Editor; Harvard Medical School, United States
  2. Eve Marder
    Senior Editor; Brandeis University, United States
  3. Gordon Fishell
    Reviewer; Harvard Medical School, United States
  4. Bernardo L Sabatini
    Reviewer; Howard Hughes Medical Institute, Harvard Medical School, United States
  5. Matthew Ryan Banghart
    Reviewer; Harvard Medical School, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The impressive study by Smith and colleagues tackles in unprecedented fashion the relationship between the expression of neuropeptides and their cognate GPCRs. Using the same two cortical regions that the Allen institute has previously used for comparison (the visual and ALM cortices), Smith and colleagues compare the cell type specificity of peptide and receptors across the cortex. A number of fundamental observations are made: 1) virtually every neuronal type expresses multiple discrete types of NPP and associated receptors; 2) GABAergic cells show more NPP diversity while; 3) Glutamatergic cells show more diversity in receptor expression; 4) the 47 pairs of peptides and receptors can uniquely define cell types with high precision; 5) the relationships between peptides and receptors in stereotyped in a region specific manner. These are all observations of first rate importance, and I'd like to congratulate the authors for taking on a complex problem and discussing the underlying logic so systematically.

Decision letter after peer review:

Thank you for submitting your article "Single-cell transcriptomic evidence for dense intracortical neuropeptide networks" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Eve Marder as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Gordon Fishell (Reviewer #1); Bernardo L Sabatini (Reviewer #2); Matthew Ryan Banghart (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Smith et al. performed analyses on a publicly available single-cell transcriptomic dataset from Tasic et al., 2018, to generate testable hypotheses regarding local neuropeptide signaling between neuronal cell types in the mouse cortex. Their main findings are: (1) 18 neuropeptides (NPs) and their receptors are highly expressed in most neurons across cortical areas, (2) most cells have multiple NPs and NP receptors, and (3) 18 NPs and their receptors are differentially expressed between neuronal cell types. These results highlight the importance of elucidating local neuropeptide signaling and their consequences on circuit and network activity. The study also provides a potentially useful framework for predicting intercellular signaling networks and generating experimental hypotheses from transcriptomic datasets. The study is timely, the manuscript is well written, and the results are interesting. For the revision, while no wet lab experiments are requested, the reviewers agree that additional data analysis, methodological explanations and discussion points, as described below, are warranted.

Essential revisions:

1) Table 1: Since the dataset is enriched for certain cell types including Vip, Sst and Ndnf (partly Npy) cells, it is not surprising that these peptides appear high up in the list. The FACS sorting process probably even selects cells with higher peptide expression within these subtypes based on FACS sorting threshold criteria. How would the analysis change if the dataset represented natural proportions of cell types? Presenting the data in this way can be confusing. Is there a way to present the data so that it becomes independent of how many cells per cell type are included? Would Npy, Sst, Vip for example have much lower peak FPKM and pFPKM percentile/rank? Similarly, Figure 1B and Table 3 (Fraction of pairs) do not represent biological distributions but are strongly influenced by enrichment of certain cell types. Table 3: would it be better to present fraction of cell type pairs rather than fraction of cell pairs for same reason?

2) Figure 4: Analysis of the transcriptomic data using the autoencoder was central to many of the main findings, but there was an overall lack of discussion of both the methodology and details of the features learned by the autoencoder. We have listed several areas for further discussion below:

How was the autoencoder architecture chosen, and why is there an increase rather than a decrease in the number of dimensions for the NP autoencoder (47 to 50 vs. 6,083 to 100)? Were there other architectures tested that did not perform as well as the one presented in this study? Please discuss this. Related, it was unclear why the authors chose to use the HE gene set instead of the 4,000 differentially expressed (DE) genes for WGCNA in Tasic et al., 2018. Did using the HE genes perform better or worse than taking the most variable or differentially expressed genes? Also, for the sets of 47 random genes, the authors could have matched these "random" sets to the NP gene set by measures of variability or differential expression instead of matching expression levels. This would make for a more interesting comparison than the random 47-gene sets drawn from all genes in the Tasic et al., 2018, dataset, since many of these randomly drawn genes might not be differentially expressed.

3) What do the features in the 5-d latent space of the autoencoder networks look like in terms of gene weights, and how do these dimensions/vectors compare to the principal components? This will also potentially help with clarifying/understanding the nature of the input used by the GMM classifier for classifying the cells that may subsequently affect the resolution index.

4) Figures 6, 7, and supplements to Figure 6: The coupling scores and matrices in these figures were important for inferring or predicting neuropeptide signaling between neuronal cell types. However, the authors provided little discussion of any significant trends or differences beyond their presentation of the individual coupling matrices for single cognate pairs. The study will be more impactful if the authors could provide more detailed discussion regarding the structure within these coupling matrices, with examples of specific differences or generalized trends observed from that structure to generate specific experimental hypotheses. Some specific points for further discussion are listed below:

Why was the threshold set to 50th percentile? Why not at certain CPM? The reasoning and method here is not completely clear. It seems that this strict threshold produces too many false negatives that may unnecessarily discard biologically plausible interactions to be evaluated.

The coupling matrices are all presented at the cell type level and not at the subclass or "family" level (branches/clades between subclass and type), although there is clearly some clustering or block structure at these higher levels along the hierarchical tree. Is NP signaling more type-specific or generalizable to a family/subclass? Perhaps a heatmap visualization at varying levels besides cell types/leaves more similar to Figure 5 will help.

5) The authors could provide more details on which cognate pairs are region-specific vs. "conserved" across regions based on the sparsity of the corresponding coupling matrix, and to compare this to the known properties of the corresponding NP-expressing cell type/family/subclass (e.g. the disinhibitory action of Vipinterneurons via inhibition of Sst/Pvalb interneurons). Do the 47 pairs distribute in a laminar specific pattern and does this perhaps suggest whether they are used to augment bottom up, top-down or recurrent cortical activity? Also, the authors could comment more on the density/sparsity for each NP. Are there NPs that seem to have very specific signals (almost one-to-one), and which ones are much broader? For example, Trh->Trhr appears sparse/type-specific, whereas Adcyap1 appears to have a broad range of targets via multiple receptors and Crh->Crhr1/2 appears to act at an intermediate scale. Please provide more discussion of the variation in the sparsity of coupling for the cognate pairs analyzed.

6) Related to point 5, could the authors also provide more discussion of autocrine vs. paracrine NP signaling since there seem to be low coupling scores along the diagonal. This may suggest that there are very specific NPs that may act specifically in an autocrine manner, perhaps for autoinhibition in the case of Gi/o coupled pathways.

7) The presentation of the coupling matrices for single cognate pairs makes it difficult to appreciate structure/trends by the predominant coupling of downstream pathways. Are there preferred/predominant couplings for each subclass/family/type, e.g. mostly Gi-coupled for Sstinterneurons?

8) In Figure 7, it would be helpful if the authors could provide an example of a network graph diagram depicting their inferred NP signaling for a particular cell type or subclass, such as the VISp Vipinterneurons.

https://doi.org/10.7554/eLife.47889.sa1

Author response

Essential revisions:

1) Table 1: Since the dataset is enriched for certain cell types including Vip, Sst and Ndnf (partly Npy) cells, it is not surprising that these peptides appear high up in the list. The FACS sorting process probably even selects cells with higher peptide expression within these subtypes based on FACS sorting threshold criteria. How would the analysis change if the dataset represented natural proportions of cell types? Presenting the data in this way can be confusing. Is there a way to present the data so that it becomes independent of how many cells per cell type are included? Would Npy, Sst, Vip for example have much lower peak FPKM and pFPKM percentile/rank? Similarly, Figure 1B and Table 3 (Fraction of pairs) do not represent biological distributions but are strongly influenced by enrichment of certain cell types. Table 3: would it be better to present fraction of cell type pairs rather than fraction of cell pairs for same reason?

This query raises (a) the possibility that our peak FPKM values might be artifactually elevated by the recombinase-driven, FACS-based cell sampling process used by Tasic et al., 2018 to enrich collection of rare cell types, and (b) the likelihood that our results would be more forceful if expressed in terms of natural cell-type proportions, rather than proportions skewed by FACS-based type enrichment.

a) Are peak FPKM estimates artifacts of cell collection biases? As we have now attempted to describe more fully in our new Materials and methods section, our peak FPKM metric was designed to be as insensitive as possible to the absolute number of cells surveyed. The reviewers are nonetheless correct to note that it is conceivable that numbers could be influenced somehow by differential recombinase expression and FACS sorting. Such skewing would seem most likely to arise where sorting was keyed by xFP expression driven by an NPP-gene-based recombinase line (9 out of the 55 used by Tasic et al., 2018, see their Extended Data Figure 8). To address this possibility, we have compared peak FPKM scores extracted across the entire Tasic dataset (as reported in our Table 1 and below as “Tasic 2018 Mixture”) with pFPKM values extracted from four of their most heavily represented recombinase-driven subsets (with actual sorted cell counts): Gad2-IRES-Cre (2131), Slc32a1-IRES-Cre (2306), Vip-IRES-Cre (569), Sst-IRES-Cre (551). Of these four subsets, two (Vip-IRES-Cre and Sst-IRES-Cre) are driven by NPP-based recombinase lines and two by non-NPP lines (GAD2 and Slc32a1). The results are tabulated below:

Peak FPKM across Mixed and Cre-Restricted Neuron Populations
RankGeneTasic 2018 MixtureGad2-
IRES-Cre
Slc32a1-IRES-CreVip-
IRES-Cre
Sst-
IRES-Cre
1Npy108,865113,90493,70356,104103,188
2Sst70,27474,99966,10013567,095
3Vip48,74756,20148,89933,21729
4Tac218,28417,75218,38021,2125,395
6Cck16,39617,58618,80428,7111,580
8Penk11,16012,22714,0989,7193,705
10Crh9,11810,7359,1188,6052,058
15Cort7,4777,7796,3164,1869,110
18Tac15,7285,4674,818748,653
68Pdyn2,8133,1383,34303,276
156Pthlh1,6561,5611,7991,969194
509Pnoc698702758759385
766Trh510895580
968Grp43515010127329
1757Rln125817916078303
2788Adcyap1165496400
3917Nts1216116217532
4270Nmb11246343916

This table shows the variations expected from design of the NPP-based recombinase lines and prevailing knowledge of interneuron cell types (e.g., low expression of Sst in the Vip line and low expression of Vip in the Sst line), but also shows clearly that the high peak FPKM values we report in Table 1 are observed in the large GABA neuron subsets sorted by non-NPP-based recombinase expression. Based on these observations, we believe it is unlikely that the high NPP gene expression levels we report in Table 1 are artifact of the Tasic et al. FACS-sorted single-cell harvesting regime.

b) Does cell collection bias constrain interpretation of results? We concur wholeheartedly that our results could be more forceful and useful if the Tasic data and our analysis accurately reflected natural proportions of cell types. Unfortunately, it was judged impractical at the time Tasic et al. collected their data to assure such quantification, for two reasons. First, it was anticipated based on previous literature that some interneuron cell types would be so rare as to require prohibitive effort and expense to reach rare types via unbiased cell collection. Second, it was apparent in pilot studies that substantial collection bias would be unavoidable due to differential retention of different cell types and their mRNA during the process of single cell isolation. We have added columns labels “#Cells” to Figure 3—figure supplement 1. While these numbers do not reflect natural cell-type proportions, they at least indicate the actual numbers of cells upon which the statistics of each Tasic et al. type are based. We anticipate that developments of highly multiplexed FISH and other spatial transcriptomics methods now under way at the Allen Institute and other institutions will soon enable the very desirable ability to link transcriptomic neuron types to natural type proportions and many other essentially anatomic characteristics.

We have added one new analysis as a check for possible artifacts of neuron-type-dependent collection bias. We tested for a possible dependence of autoencoder clustering results, as illustrated in Figure 4, on neuron type. The result, shown in the new Figure 4—figure supplement 1, is that clustering shows little or no type-dependence, adding some credence to our original interpretation of the main autoencoder results.

c) Would it be better to present fraction of cell type pairs rather than fraction of cell pairs (in Table 3) for same reason? Thank you for the suggestion! We agree that presenting “fraction of type pairs” will be more informative and have modified Table 3 accordingly.

2) Figure 4: Analysis of the transcriptomic data using the autoencoder was central to many of the main findings, but there was an overall lack of discussion of both the methodology and details of the features learned by the autoencoder. We have listed several areas for further discussion below:

How was the autoencoder architecture chosen, and why is there an increase rather than a decrease in the number of dimensions for the NP autoencoder (47 to 50 vs. 6,083 to 100)? Were there other architectures tested that did not perform as well as the one presented in this study? Please discuss this. Related, it was unclear why the authors chose to use the HE gene set instead of the 4,000 differentially expressed (DE) genes for WGCNA in Tasic et al., 2018. Did using the HE genes perform better or worse than taking the most variable or differentially expressed genes? Also, for the sets of 47 random genes, the authors could have matched these "random" sets to the NP gene set by measures of variability or differential expression instead of matching expression levels. This would make for a more interesting comparison than the random 47-gene sets drawn from all genes in the Tasic et al., 2018, dataset, since many of these randomly drawn genes might not be differentially expressed.

We have expanded the description of our autoencoder analysis methodology substantially, both in Results and in our new Materials and methods section and explained our architecture choice in more detail. We have also conducted additional experiments with architecture hyperparameters as requested. Briefly, very similar results were obtained for different choices of the autoencoder architecture. That is, the qualitative results depend only weakly on the hyperparameters of the network architecture.

Both before our original submission and after receiving this review, we have carried out much additional autoencoder analysis pertinent to these queries. We have now added the most germane of these results to our presentation, particularly in Figure 4D and related narrative. As requested, we now compare clustering performance based on the Tasic, et al., 2018, 4,020 DE gene set with that based on the 6,083 HE genes. As now illustrated by Figure 4D and described in text, these two large gene sets show very high and essentially indistinguishable performance close to the theoretical maximum possible. We also now report and discuss comparison of clustering performance based on the 47 NP genes and 47 DE genes that were selected as the most differentially expressed between Tasic 2018 neuron types. This comparison shows that the RI performance by the 47 DE gene set beats that of the 47 NP gene set by a small margin. We feel that our conclusion of exceptional clustering by 47 NP genes alone is still justified by the much better performance of the 47 NP set in comparison to every one of the 200 random 47-gene sets, and the fact that the 47 DE genes had the “unfair” advantage of being chosen with benefit of prior knowledge of the Tasic 2018 taxonomy.

3) What do the features in the 5-d latent space of the autoencoder networks look like in terms of gene weights, and how do these dimensions/vectors compare to the principal components? This will also potentially help with clarifying/understanding the nature of the input used by the GMM classifier for classifying the cells that may subsequently affect the resolution index.

The latent space representations are obtained by transforming the transcript expression data into a low-dimensional, abstract space. Individual dimensions of this space represent nonlinear combinations of gene expression and are not necessarily tied to a small subset of genes. Moreover, it would be incorrect to explain the latent space with simple linear weights. Indeed, the network is able to compress the data into so few dimensions by learning the (non-linear) inter-dependent nature of gene expression. While the underlying mechanics are quite different, the abstract nature of the representation of our autoencoder network is the same as that of the perhaps more familiar tSNE transform.

When one studies the expression of a particular gene in this space, the landscape would appear complex for many genes, essentially capturing the fact that the expression of single genes is rarely localized to single cell types. On the other hand, the markers of well-known cell classes such as Vip, Sst, and Pvalb, are indeed enriched in the “islands” corresponding to those cell classes, directly reflecting the ability to classify them correctly.

The reviewers draw a very good analogy with PCA. Indeed, the objective function used for the autoencoder network is the same as that of the well-known PCA, namely the mean-squared error. The difference is that the transformation from the high dimensional gene expression space to the low-dimensional representation space is restricted to be linear in the case of PCA. Applying PCA to single-cell RNA sequencing data often produces only three ‘blobs’ corresponding to the excitatory, inhibitory, and non-neuronal cell classes. The finer distinctions within those classes are, however, lost because PCA fails to identify further structure within those ‘blobs’.

4) Figures 6, 7 and supplements to Figure 6: The coupling scores and matrices in these figures were important for inferring or predicting neuropeptide signaling between neuronal cell types. However, the authors provided little discussion of any significant trends or differences beyond their presentation of the individual coupling matrices for single cognate pairs. The study will be more impactful if the authors could provide more detailed discussion regarding the structure within these coupling matrices, with examples of specific differences or generalized trends observed from that structure to generate specific experimental hypotheses. Some specific points for further discussion are listed below:

Why was the threshold set to 50th percentile? Why not at certain CPM? The reasoning and method here is not completely clear. It seems that this strict threshold produces too many false negatives that may unnecessarily discard biologically plausible interactions to be evaluated.

The coupling matrices are all presented at the cell type level and not at the subclass or "family" level (branches/clades between subclass and type), although there is clearly some clustering or block structure at these higher levels along the hierarchical tree. Is NP signaling more type-specific or generalizable to a family/subclass? Perhaps a heatmap visualization at varying levels besides cell types/leaves more similar to Figure 5 will help.

What trends or differences do the coupling matrices reveal? We see now that we may have been too limited in our qualitative and quantitative description of the widely varied structures of our peptidergic network coupling predictions. While we are indeed fascinated by these structures, we have been somewhat reticent in elaborating upon them, because they are still only predictions. Much neuroanatomy, cell biology, physiology and biophysics remains to be done to test these predictions and cast them empirically then as facts (or fancy). We feel that our text as it stands discusses these reservations and their implications reasonably well. Having said that, we have embraced the spirit of this query enthusiastically with the substantial increments to network matrix characterization we discuss below and in connection with the next query #5.

How is the coupling matrix scheme justified? This suggestion/query inspired us to redesign and simplify our schemes for predicting and presenting coupling adjacency matrices. We now predict coupling as a straight outer product with NPP and NP-GPCR gene expression by cell type in CPM as of row and column vectors, deleting the former thresholding step (and with it the troublesome need to set a threshold value; mitigating also the tendency toward excess false negatives). Logarithmic color-scale compression is used to accommodate display of a very wide dynamic range of CPM*CPM signals. This scheme is laid out in graphical detail in our revised Figure 6A and applied to all additional examples in Figure 6 and Figure 6—figure supplement 1. As before, we also provide the underlying data and code to regenerate any or all matrices. We are grateful for the reviewer comment that led to this simplifying improvement.

How does peptidergic coupling look at higher taxonomic levels (e.g., subclasses rather than leaves)? Here we have augmented our presentation here very substantially. In Figures 3, 5 and 6 (and the new Figure 7), we have overlaid graphic color block fills and guidelines which demarcate subclass levels of the Tasic 2018 neurotaxonomy on our type-level expression heatmaps and coupling matrix plots. These graphics are intended to make it much easier to visualize patterns at higher taxonomic levels while still revealing variations at finer levels. We have also added a new analysis (depicted in new panels Figures 3C and 3D) to quantify the residual type-dependent gene expression variance after pooling at the level of subclasses. Additionally, a new panel (6F) added to Figure 6 and new panels for every cognate pair added to Figure 6—figure supplement 1 show coupling matrix prediction pooled at the subclass level establish that differential structures of predicted peptidergic coupling persists strongly at higher levels of the Tasic 2018 hierarchy, while at the same time revealing that strong and conserved variations persist at the level of types/leaves.

5) The authors could provide more details on which cognate pairs are region-specific vs. "conserved" across regions based on the sparsity of the corresponding coupling matrix, and to compare this to the known properties of the corresponding NP-expressing cell type/family/subclass (e.g. the disinhibitory action of Vip interneurons via inhibition of Sst/Pvalb interneurons). Do the 47 pairs distribute in a laminar specific pattern and does this perhaps suggest whether they are used to augment bottom up, top-down or recurrent cortical activity? Also, the authors could comment more on the density/sparsity for each NP. Are there NPs that seem to have very specific signals (almost one-to-one), and which ones are much broader? For example, Trh->Trhr appears sparse/type-specific, whereas Adcyap1 appears to have a broad range of targets via multiple receptors and Crh->Crhr1/2 appears to act at an intermediate scale. Please provide more discussion of the variation in the sparsity of coupling for the cognate pairs analyzed.

Are cognate pair distributions laminar specific? A preliminary analysis reveals only minor laminar specificity of cognate pair distributions. The Tasic 2018 data (their Figure 5) shows that the most marked laminar type/subclass differential for interneurons is that distinguishing Sst vs. Vipsubclasses. In Author response image 1, we add yet another color code and a set of matrix overlay lines to indicate upper vs. lower layer cell body harvest origins predominating by neuron type for representative Sst and Vip cognate pairs.

Author response image 1

These plots do not reveal much layer specificity within either subclass. This result has discouraged us from digging much more deeply into the presently available data. We feel we may be at the limits of the microanatomical information we can extract within the technical confines of the present data source. Furthermore, we are aware of spatial transcriptomic data likely to emerge soon from various research sites that should allow for much more conclusive investigation of the important issues raised here by the reviewers. We’d prefer to avoid overloading this report with more analysis along these lines when greatly improved clarity from new experimental findings seems imminent.

How does the density/sparsity of predicted type-coupling vary with cognate pair identity? These queries have inspired us to develop quantitative measures of coupling matrix structure to allow for comparisons across all cognate pairs and both cortical regions. New results linked to Figure 6 (figure supplements 2 and 3) delineate strong variations in sparsity and specificity amongst the 37 cognate pairs and confirm strong conservation between VISp and ALM areas, while also establishing that redundancy of overall architectures of the predicted 37 networks is rather minimal. As requested, we have also included more discussion on these topics.

6) Related to point 5, could the authors also provide more discussion of autocrine vs. paracrine NP signaling since there seem to be low coupling scores along the diagonal. This may suggest that there are very specific NPs that may act specifically in an autocrine manner, perhaps for autoinhibition in the case of Gi/o coupled pathways.

Indeed, there are fascinating indications of possible autocrine signaling in co-expression of cognate pairs evident in the Tasic 2018 data. We have linked a new Supplementary figure to Figure 2 to provide a preliminary glimpse into the structure of possible autocrine peptidergic signaling in mouse neocortex. Interestingly, the more strongly co-expressed autocrine pairs represent all three of the primary Gα families we consider, although there is a notable preponderance of Gi/o.

We are very interested in further exploration of possible autocrine neuropeptide signaling, and plan to actively pursue a complete analysis including experimental work, but we have felt that inclusion of this tangent here would risk serious overloading an already very lengthy report without doing justice to an important topic for future transcriptomic and physiological research.

7) The presentation of the coupling matrices for single cognate pairs makes it difficult to appreciate structure/trends by the predominant coupling of downstream pathways. Are there preferred/predominant couplings for each subclass/family/type, e.g. mostly Gi-coupled for Sst interneurons?

Motivated by the very premise put forth in this query, we have explored many approaches to compact visualization of predominant downstream receptor coupling modes for 37 cognate pairs. We are pleased to offer a new figure (Figure 7) and accompanying narrative that we feel begins to address this issue fairly effectively and may help guide future efforts at empirical tests of our coupling predictions.

8) In Figure 7, it would be helpful if the authors could provide an example of a network graph diagram depicting their inferred NP signaling for a particular cell type or subclass, such as the VISp Vip interneurons.

The network integration “cartoon” schematic (formerly Figure 7, now Figure 8) has been redrawn and re-framed to convey more clearly our original intent, which was to summarize our basic methodology while highlighting the likely future importance of neurotaxonomy as means to integrate statistical descriptions of multiple neuronal signaling networks, be they modulatory, synaptic or both. We apologize if this message seems a bit orthogonal to the more concrete and specific direction the reviewers suggest here. We hope that we have now stated our intent more clearly and that the reviewers find this dose of speculation reasonably justifiable and worthwhile here.

https://doi.org/10.7554/eLife.47889.sa2

Article and author information

Author details

  1. Stephen J Smith

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Supervision, Investigation, Visualization, Writing—original draft, Writing—review and editing
    For correspondence
    StephenS@alleninstitute.org
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2290-8701
  2. Uygar Sümbül

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Investigation, Visualization, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-7134-8897
  3. Lucas T Graybuck

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Resources, Data curation, Formal analysis, Investigation, Visualization, Methodology
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8814-6818
  4. Forrest Collman

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Conceptualization, Software, Formal analysis, Investigation, Visualization, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0280-7022
  5. Sharmishtaa Seshamani

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Resources, Data curation, Software, Formal analysis, Investigation, Methodology
    Competing interests
    No competing interests declared
  6. Rohan Gala

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Data curation, Software, Formal analysis, Investigation, Visualization, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-1872-0957
  7. Olga Gliko

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Data curation, Software, Formal analysis, Investigation, Writing—original draft
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5014-7209
  8. Leila Elabbady

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Data curation, Software, Formal analysis, Investigation, Visualization, Writing—original draft
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-1452-1603
  9. Jeremy A Miller

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Data curation, Formal analysis, Investigation, Visualization, Methodology
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-4549-588X
  10. Trygve E Bakken

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Data curation, Formal analysis, Investigation, Visualization, Methodology
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3373-7386
  11. Jean Rossier

    Neuroscience Paris Seine, Sorbonne Université, Paris, France
    Contribution
    Data curation, Software, Investigation, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-1821-2135
  12. Zizhen Yao

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Resources, Data curation, Formal analysis, Investigation, Methodology
    Competing interests
    No competing interests declared
  13. Ed Lein

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Conceptualization, Resources, Supervision, Investigation, Methodology
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9012-6552
  14. Hongkui Zeng

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Conceptualization, Resources, Investigation
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0326-5878
  15. Bosiljka Tasic

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Resources, Supervision, Investigation, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-6861-4506
  16. Michael Hawrylycz

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Supervision, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    MikeH@alleninstitute.org
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5741-8024

Funding

National Institutes of Health (R01NS092474)

  • Stephen J Smith

National Institutes of Health (R01MH104227)

  • Stephen J Smith

National Institutes of Health (1U24NS109113)

  • Stephen J Smith

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We wish to thank the Allen Institute for Brain Science founder, Paul G Allen, for his vision, encouragement and support. This work was supported in part by award number R01NS092474 from the Office of the Director of National Institutes of Health and award number R01MH104227 from the National Institute of Mental Health. The content is solely the responsibility of the authors and does not necessarily represent official views of the National Institutes of Health.

Senior Editor

  1. Eve Marder, Brandeis University, United States

Reviewing Editor

  1. David D Ginty, Harvard Medical School, United States

Reviewers

  1. Gordon Fishell, Harvard Medical School, United States
  2. Bernardo L Sabatini, Howard Hughes Medical Institute, Harvard Medical School, United States
  3. Matthew Ryan Banghart, Harvard Medical School, United States

Publication history

  1. Received: April 23, 2019
  2. Accepted: November 10, 2019
  3. Accepted Manuscript published: November 11, 2019 (version 1)
  4. Version of Record published: November 27, 2019 (version 2)

Copyright

© 2019, Smith et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 2
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Neuroscience
    2. Physics of Living Systems
    Joanne C Gordon et al.
    Research Article Updated
    1. Neuroscience
    Julia Erb et al.
    Research Article