The connectivity of diverse types of neurons constrains the flow and processing of information in neural circuits. Neuroanatomical techniques based on microscopy rely on sparse labeling combined with optical tracing to achieve single-cell resolution (Winnubst et al., 2019; Peng et al., 2021; Gao et al., 2022).

However, these approaches are usually labor-intensive and difficult to scale up. Furthermore, because many of these approaches rely on specialized imaging platforms, they are difficult to combine with transcriptomic interrogation for defining cell types. Thus, developing high-throughput neuroanatomical techniques that can map neuronal connectivity and associate connectivity with gene expression at single-cell resolution could help generate new insights into the organization of neural circuits that are difficult to gain using conventional neuroanatomical techniques.

RNA barcoding-based neuroanatomical techniques are a new approach that dramatically improves the throughput of single-neuron projection mapping (Kebschull et al., 2016; Chen et al., 2019; Sun et al., 2021). In these techniques, Sindbis virus has been used to express random RNA sequences, or RNA “barcodes,” to uniquely label each neuron. The barcoded virus RNAs replicate and distribute throughout the soma and axons; therefore, matching barcodes in the axons to those in the somata reveal the axonal projections of individual neurons. Because many barcode molecules can be sequenced in parallel, barcoding-based neuroanatomical approaches can determine the long-range projections of tens of thousands of neurons in single animals. These barcoding-based techniques have been applied to various brain regions (Kebschull et al., 2016; Han et al., 2018; Chen et al., 2019; Gergues et al., 2020; Klingler et al., 2021; Mathis et al., 2021; Munoz-Castaneda et al., 2021; Sun et al., 2021; Chen et al., 2022b) and have validated and extended findings by previous studies using conventional neuroanatomical techniques.

Despite the transformative power of barcoding-based neuroanatomical techniques, the Sindbis virus-based approaches can only reveal information about the axons (i.e., projection mapping) of labeled neurons with cell bodies at the injection sites, because these vectors do not spread between cells and do not retrogradely infect cells via their axon terminals. These approaches thus cannot obtain information about the neurons’ synaptic partners, including their transcriptomic identities.

A potentially powerful extension of this approach that could allow mapping of connections, instead of just projections, would be to use rabies virus. Rabies virus naturally spreads between synaptically-connected neurons in the retrograde direction. In particular, the use of deletion-mutant rabies viruses (e.g. “ΔG” viruses, with their glycoprotein gene G deleted) to perform transsynaptic labeling has become common in neuroscience, as it allows “monosynaptic tracing” (Wickersham et al., 2007b; Wall et al., 2010), or the viral labeling of neurons directly presynaptic to a targeted starting population of neurons.

In the standard monosynaptic tracing paradigm, a targeted starting population of neurons is first transduced by “helper” adeno-associated viruses (AAVs), causing them to express the rabies virus gene G (to complement the G-deleted recombinant rabies) and an avian cell surface protein, TVA. TVA is the receptor for an avian retrovirus (avian sarcoma and leukosis virus subgroup A, or ASLV-A) that is unable to infect mammalian cells (Young et al., 1993). Subsequently, a ΔG rabies virus that is packaged with the ASLV-A envelope protein (EnvA) is injected at the same location to selectively infect the TVA-expressing cells. Due to the expression of the rabies viral gene G in trans, the ΔG rabies virus replicates and spreads from these “source cells” to neurons directly presynaptic to them. In a second approach, ΔG rabies virus that is packaged with its own glycoprotein can be used to retrogradely label neurons with projections to a target area (Wickersham et al., 2007a). This approach, in contrast to the monosynaptic tracing paradigm, reveals only the axonal projections of neurons, but not their synaptic connectivity.

RNA barcoding can be combined with both rabies virus-based tracing approaches to drastically improve the throughput at which projections or connectivity can be interrogated at cellular resolution. Retrograde labeling from multiple locations using several preparations of glycoprotein-deleted rabies virus (RVΔG), each carrying different barcodes, would allow multiplexed retrograde tracing, i.e., the association of neurons with their projections to many different injection sites within single brains (Fig. 1A). This approach is conceptually similar to that used in a recent study in which barcoded adeno-associated virus (AAV) was used to achieve multiplexed retrograde labeling (Zhao et al., 2022), but using rabies virus could potentially achieve broader tropism (Chatterjee et al., 2018). Multiplexed retrograde tracing approaches complement anterograde tracing techniques: Because only neurons that are labeled at the injection sites are mapped in anterograde tracing, it is difficult to precisely estimate how the density of projection neurons vary across large brain regions; in contrast, retrograde labeling approaches label all neurons with projections to the injection site. As anatomical borders between brain areas (e.g. cortical areas) usually correspond to distinct changes in cytoarchitecture (Brodmann, 1909; Vogt and Vogt, 1919; Von Bonin, 1947) and transcriptomically defined neuronal types (Chen et al., 2022a), retrograde labeling is particularly valuable for understanding how long-range connectivity is associated with cell types and anatomical boundaries.

Multiplexed retrograde labeling and monosynaptic tracing using barcoded rabies virus.

(A) In multiplexed retrograde labeling, rabies viruses carrying different barcodes are injected into different brain regions, and retrogradely labeled neurons can be distinguished based on the barcodes they carry. (B) In multiplexed monosynaptic tracing, source cells labeled by helper AAV viruses expressing TVA and rabies G can be infected by barcoded rabies virus, which then passes the barcode to presynaptic neurons. Both rabies barcodes and endogenous mRNAs can be read out to infer cell/type connectivity. However, if multiple source cells share the same barcode, they may obscure single-cell connectivity mapping and must be filtered out.

Alternatively, combining the transsynaptic labeling approach with RNA barcoding could allow sequencing-based readouts of synaptically-connected networks of neurons, providing information about the synaptic partners of neurons, rather than only their patterns of axonal projections. Specifically, simply using the standard monosynaptic tracing system but using a barcoded pool of ΔG rabies virus could allow mapping of synaptic connectivity between many cells at cellular resolution (multiplexed transsynaptic labeling; Fig. 1B). Although other techniques, such as electron microscopy (Bae et al., 2021; Androvic et al., 2022; Zheng et al., 2022; Schneider-Mizell et al., 2023) and multi-patch experiments (Campagnola et al., 2022), can also read out synaptic connectivity, these approaches are difficult to scale up to many neurons across large areas of the brain and/or difficult to relate connectivity to the gene expression of neurons. Thus, achieving barcoded transsynaptic tracing using rabies virus could potentially transform the scale at which synaptic connectivity of transcriptomic types of neurons can be interrogated at cellular resolution.

Using barcoded rabies virus-based approaches to associate connectivity or projections of neurons with their gene expression requires overcoming several technical and conceptual challenges. First, rabies virus alters the gene expression of infected cells (Patino et al., 2022), which may obscure the transcriptomic signatures of neurons. It is unclear how changes in gene expression affect the ability to distinguish fine-grained transcriptomic types and whether cell typing is still possible using spatial transcriptomic approaches, such as in situ sequencing, which interrogate a targeted panel of genes. Second, conventional single-cell sequencing is costly and labor intensive. Thus, sequencing throughput will likely limit the multiplexing advantages associated with barcoding strategies. Third, in a barcoded transsynaptic labeling experiment, synaptic connectivity between individual source cells and their presynaptic partners can only be inferred from networks of barcode-sharing neurons with a single source cell (Fig. 1B). A barcode, however, may be found in multiple or no source cells, which would prohibit connectivity mapping at the single-neuron level. Finally, accurately assessing the distribution of barcodes in a transsynaptic labeling experiment requires sampling most barcodes and/or source cells in an experiment. This is difficult to achieve using single-cell sequencing approaches that require tissue dissociation, because loss of cells during dissociation is inevitable. These challenges are fundamental barriers for using barcoded transsynaptic labeling to infer synaptic connectivity at cellular resolution regardless of the choice of virus. Solving these problems will not only allow barcoded rabies virus-based connectivity mapping, but also provide a foundation for potential future techniques based on a wide range of transsynaptic viruses, such as herpes simplex virus (Ugolini et al., 1989; Xiong et al., 2022; Fischer et al., 2023), pseudorabies virus (Martin and Dolivo, 1983), vesicular stomatitis virus (Beier et al., 2013), and yellow fever virus (Li et al., 2021). Because in situ sequencing (Ke et al., 2013; Chen et al., 2022a) can read out both endogenous mRNAs and random RNA barcodes (Chen et al., 2019; Sun et al., 2021) with high throughput and low cost, and does not rely on tissue dissociation (Chen et al., 2019; Sun et al., 2021), it is uniquely suited to overcome the challenges associated with barcoded rabies tracing.

Here we adapt single-cell RNAseq and an in situ sequencing approach based on Barcoded Anatomy Resolved by Sequencing (BARseq) (Chen et al., 2019; Sun et al., 2021) to map connectivity of transcriptomic types of neurons using barcoded rabies-based retrograde labeling and transsynaptic labeling. We examine the effect of rabies virus infection on the gene expression and clearly identify transcriptomic identities of rabies-labeled neurons. We then elaborate on how different types of barcode-sharing networks of neurons can affect connectivity mapping in a transsynaptic labeling experiment and how these networks can be distinguished theoretically. Finally, we perform scRNA-seq and in situ sequencing on neurons that were transsynaptically labeled by rabies virus to identify pairs of cell types that showed preferences to synapse on to the same post-synaptic neurons.


Identifying transcriptomic types of retrogradely traced neurons by barcoded rabies virus.

To assess whether we can robustly identify transcriptomically defined neuronal types in a multiplexed retrograde labeling experiment, we performed two-plex retrograde labeling using two libraries of barcoded rabies virus coated with the native rabies glycoprotein (Fig. 2A; Methods). In addition to encoding the red fluorescent protein mCherry (Shaner et al., 2004), the two viral libraries contained 20-nt barcode cassettes located in the 3’UTR of the rabies nucleoprotein mRNA, which allowed high-level expression of the barcodes (Conzelmann, 1998). In addition, one of the libraries contained a 22-nt exogenous sequence (the 10x Genomics “Chromium capture sequence 2”, referred to below as CCS) next to the barcode cassette. We sequenced the two barcoded libraries using Illumina next-generation sequencing and identified at least 8,552 and 13,211 barcodes, respectively (see Methods; Fig. 2B). We did not find barcodes that were present in both libraries. Thus, the two libraries could be distinguished both by their barcode sequences and the presence or absence of the CCS.

High-quality cell typing of rabies virus-barcoded neurons using single-cell RNA-seq.

(A) Illustration of the design of barcoded rabies virus libraries. (B) Barcode distribution in the CCS and non-CCS libraries. (C) Outline of the experiments. (D) Cluster mapping confidence of rabies-labeled cells from this study and AAV-labeled cells from (Graybuck et al., 2021). (E) Select marker gene expression in uninfected cells from (Tasic et al., 2018) and rabies-infected cells of matching cell types. (F) UMAP plot of gene expression of cells infected with rabies virus (black) overlaid on non-infected cells from (Tasic et al., 2018). The uninfected cells are color-coded by cluster identities and cluster names are indicated. (G) The expression of rabies-encoded genes in the sequenced cells. The count for each rabies-encoded gene was scaled by the sum of all reads that mapped to viral constructs multiplied by 10,000. Natural logarithms of resulting scaled counts, ln(scaled counts+1), were used to represent expression levels of virally encoded genes. The bar on top indicates donor animals. (H) Example barcode sequences in three sequenced cells. Letter height indicates probability at each position. Gray boxes indicate the barcode region. (I) Distribution of cell types of retrogradely labeled cells. Colors indicate cell types and match those in (F), and disc sizes indicate number of cells.

We injected the two libraries into the dorsal lateral geniculate nucleus (LGd) and anterolateral visual cortex (VISal), respectively, in two animals (Fig. 2C). After 7 days, we dissected out the primary visual cortex (VISp), dissociated the neurons, and FACS-isolated 48 mCherry-expressing neurons from each animal for scRNA-seq using SMART-seq v4 (see Methods). We then mapped each neuron to reference scRNA-seq data (Tasic et al., 2018) to determine the transcriptomic identity of each sequenced neuron.

scRNA-seq data from rabies virus-infected cells had comparable quality to non-infected cells (Supp Fig. S1A). Of 96 RFP-expressing cells sequenced, 75 neurons were of high quality (with > 100,000 total reads, > 1,000 detected genes, and odds ratios of GC dinucleotides <0.5; see Methods for details). We mapped each single cell transcriptome to the reference taxonomy tree of the visual cortex (Tasic et al., 2018) by comparing a cell’s marker gene expression with the marker gene expression of the reference cell types (Gouwens et al., 2020; Graybuck et al., 2021) (Methods). A total of 54 cells were mapped to the reference taxonomy tree with high quality [mapping confidence > 0.7 (5 cells removed) and mapping correlation > 0.6 (an additional 16 cells removed); Fig. 2D; Supp Fig. S1B; Table 1; Methods] were used for downstream analysis. Most cells that were removed by the mapping correlation thresholds were either “Microglia Siglech” or “PVM Mrc1,” which tend to have low gene counts compared to neurons and were sampled less abundantly in the reference taxonomy (Tasic et al., 2018). Consistent with previous studies (Prosniak et al., 2001; Zhao et al., 2011; Huang and Sabatini, 2020; Patino et al., 2022), rabies virus-infected cells had higher expression of immune response-related genes (Supp Fig. S2A, B). We also noticed that some inhibitory cell types showed higher expression of activity-related genes, such as Baz1, Fosl2, and Jun (Supp Fig. S3). Nonetheless, the expression patterns of cell type markers were comparable to the reference scRNA-seq dataset (Fig. 2E), which allowed mapping of neurons to reference cell types (Fig. 2F). Thus, gene expression changes induced by rabies virus infection did not prevent cell typing using scRNA-seq.

Number of cells in the scRNA-seq-based transsynaptic tracing and retrograde tracing experiments.

Rabies transcripts were detected in all FACS collected cells, including the 54 mapped cells (Fig. 2G). Consistent with their known expression levels (Conzelmann, 1998), transcripts for the rabies nucleoprotein, phosphoprotein, and matrix proteins were more abundant than the transcripts for mCherry (which replaced the rabies glycoprotein) and the large protein. Consistent with the robust detection of rabies transcripts, 50 out of 54 neurons had sequencing reads covering the barcode region of the nucleoprotein transcript (Fig. 2H; Table 1). Because some barcodes were likely missed when sequencing the rabies virus libraries (see Methods), we then associated the transcriptomic identities of labeled neurons with their projections based on whether the barcode flanking region contained CCS to maximize the number of cells that can be associated with projections. In the cortex, L6 corticothalamic (CT) neurons and L5 extra-telencephalic neurons (L5 ET, also known as L5 pyramidal tract/PT neurons) mainly project to the thalamus, but not the cortex, whereas intra-telencephalic (IT) neurons mainly project to the cortex and the striatum, but not the thalamus (Harris and Shepherd, 2015; Harris et al., 2019; Peng et al., 2021). Consistent with the known connectivity of cortical cell types (Harris and Shepherd, 2015; Tasic et al., 2018), 14 L6 CT neurons, 3 L6b neurons, and 1 L5 ET neurons projected to the LGd, and not VISal; in contrast, 24 IT neurons and 1 near-projecting (NP) neuron projected to VISal, but not LGd (Fig. 2I). Based on these results, we estimate that the false positive rate in distinguishing the transcriptomic type of retrogradely labeled neurons to be between 0 – 3.1 % (see Methods). Thus, multiplexed retrograde tracing using barcoded rabies virus recapitulated known projection patterns of cortical neuronal types.

In situ sequencing resolves transcriptomic types of rabies-barcoded neurons

scRNA-seq provides a detailed view of the transcriptomic landscape of rabies-labeled neurons, but this approach has several limitations. First, a large fraction of neurons is lost during single-cell dissociation prior to sequencing. Because different types of neurons have different survival rates during dissociation, this procedure introduces biases in the composition of neurons that are sequenced (Tasic et al., 2016; Tasic et al., 2018). Second, a consequence of tissue dissociation is that the precise locations of neurons are lost, which can obscure potential spatial organization of neuronal connectivity. Finally, the low throughput and high cost of scRNA-seq limit the scale at which the retrogradely labeled neurons can be interrogated. To overcome these limitations, we next used high-throughput in situ sequencing to resolve both gene expression and rabies barcodes in retrogradely labeled neurons.

Our in situ sequencing approach is based on BARseq, a high-throughput technique that can determine both long-range projections of neurons and their gene expression by in situ sequencing of both endogenous gene expression and an mRNA barcode encoded in the genome of Sindbis virus (Chen et al., 2019; Sun et al., 2021). We have previously shown that BARseq-style in situ sequencing can distinguish transcriptomic types of cortical neurons with high transcriptomic resolution in a uninfected mouse brain (Chen et al., 2022a). To adapt BARseq to rabies barcodes (Fig. 3A), we used primers that were complementary to the region that was 3’ to the barcodes on the genomic strand to reverse-transcribe the rabies barcodes. In addition to the primers for the rabies barcodes, we also used random 20-mers to reverse transcribe the endogenous mRNAs. After reverse transcription in situ, we used padlock probes targeting the flanking regions of the barcodes and 104 marker genes that distinguished cortical excitatory neuron types (Chen et al., 2022a). We then gap-filled the barcode-targeting padlock probes, ligated all padlock probes, and performed rolling circle amplification as in standard BARseq experiments (Sun et al., 2021). We performed seven rounds of Illumina sequencing in situ to read out 101 cell type marker genes, one round of hybridization to detect three high-level cell type markers (Slc17a7, Gad1, and Slc30a3), and 15 rounds of sequencing in situ to read out the rabies barcodes. We segmented the cells using Cellpose (Stringer et al., 2020) and decoded gene rolonies using Bardensr (Chen et al., 2018; Chen et al., 2019) to obtain single-cell gene expression, as described previously (Chen et al., 2022a). To basecall barcode rolonies, we picked peaks in the first round of barcode sequencing images using both an intensity threshold and a prominence threshold. We then determined the pixel values in all four sequencing channels across all cycles to readout barcode sequences (Chen et al., 2018; Chen et al., 2019), and assigned barcode rolonies to segmented cells. All slices were registered to the Allen Common Coordinate Framework v3 (Wang et al., 2020) using QuickNII and Visualign (Puchades et al., 2019)(Methods).

In situ sequencing resolves cell types of neurons infected with barcoded rabies virus.

(A) Illustration of probe designs and amplification approach for in situ sequencing on rabies barcodes. (B) Illustration of the experiments. (C) Left, example image of a coronal section (outlined by a dashed line) during the hybridization cycle. The final sequencing cycles for genes and barcodes in the boxed area are shown on the Right. Genes and nucleotides corresponding to each color are indicated. Scale bars = 50 µm. (D)(E) All sequenced cells shown on a UMAP plot (D) or on a representative coronal section (E). Colors indicate subclass-level cluster labels as shown in the legend. (F) Cluster matching between BARseq subclass-level clusters and subclasses from scRNA-seq (Tasic et al., 2018). (G) The number of cells with the indicated number of barcodes per cell. (H) The count of the primary barcode and the second most abundant barcode in each barcoded cell. Cells above the dotted line have more than one barcode per cell. (I) Counts of the most dominant barcode and the remaining barcodes in each cell. (J) Summary of the number of cells with more than one barcodes (blue) and/or in which the primary barcodes accounted for less than half of all barcode reads (red). Shapes are not drawn to scale. (K) UMAP plot as plotted in (D), color coded by whether the cells had barcodes. (L)(M) The distribution of endogenous mRNA reads per cell and unique gene counts per cell in all cells (L) or the QC-filtered cells (M). In (L), the dotted vertical lines indicate QC thresholds. BC: Barcoded cells.

We next applied this approach to examine the distribution of neurons with projections to two cortical areas within the dorsal and ventral stream of the visual pathways, respectively. We injected the two libraries of ΔG rabies virus, coated with the rabies virus glycoprotein, with the library that contained CCS injected in VISal, and the non-CCS-containing library injected in the retrosplenial cortex (RSP); the injection into RSP also infected a portion of the superior colliculus (SC) (Fig. 3B). After 7 days, we sacrificed the animals and cryo-sectioned the brains into 20 µm-coronal sections (Fig. 3C). We sequenced 791,966 segmented cells on eight coronal sections from two animals, and 524,952 cells had sufficient mRNA reads (at least 20 read counts per cell and at least 5 genes per cell) to pass quality control (QC).

We performed de novo clustering on QC-filtered cells (Methods)(Chen et al., 2022a) and identified 35 subclasses of neurons. We further clustered the 20 excitatory subclasses into 116 types (Fig. 3D, E). We then mapped the subclasses from our dataset to subclasses in reference scRNA-seq data (Tasic et al., 2018) using SingleR (Aran et al., 2019) (Fig. 3F). Consistent with previous BARseq results (Chen et al., 2022a), the BARseq subclasses had one-to-one correspondence to subclasses of cortical excitatory neurons found in scRNA-seq data (Economo et al., 2018; Tasic et al., 2018). Thus, consistent with our previous study (Chen et al., 2022a), in situ sequencing resolved cortical excitatory neurons at high transcriptomic resolution.

To assign barcodes to cells, we considered a barcode to be associated with a cell if we found at least six molecules of that barcode in a cell (Methods). Of 4,130 barcoded cells we identified, all but 204 (4.9%) cells contained a single barcode that was above this threshold (>6 counts per cell) (Fig. 3G, H). In most cells, the primary barcode, i.e. the barcode with the most abundant counts in each cell, accounted for 82% ± 19% (mean ± standard deviation) of barcodes within each cell, and in 3850 out of 4130 (93.2%) barcoded cells, the read counts of the primary barcode could account for at least half of all barcode reads in a cell before thresholding (i.e., out of all barcode reads without requiring a barcode to have at least six reads in a cell; Fig. 3I, J). Consistent with the scRNA-seq experiment described above (Fig. 2F), barcoded cells co-mingled with non-barcoded cells in a UMAP plot (Fig. 3K) and had similar read counts and gene counts per cell compared to non-barcoded cells (Fig. 3L, M), suggesting that labeling by rabies virus did not significantly alter the expression of the genes that we targeted. Thus, consistent with the results of the scRNA-seq-based approach (Fig. 2), in situ sequencing can resolve the transcriptomic identities of rabies infected cells and their rabies barcode identity.

In situ sequencing-based retrograde tracing resolves projections of transcriptomic types of neurons across cortical areas

From all barcoded cells, we identified 1,415 unique barcodes and matched them to the two barcoded libraries (containing at least 8,552 and 13,211 barcodes, respectively; see Methods). Because we sequenced 15 out of 20 bases in the rabies barcodes, we first assessed whether the shorter sequences were sufficient to unambiguously assign barcodes to the two libraries. We calculated the minimum Hamming distance of each barcode found in cells to the known barcodes in the two libraries and a third mock library with similar numbers of random barcodes (Fig. 4A). Many barcodes had zero to one mismatch to the two barcode libraries, but not a random barcode library. This peak was distinct from a second peak that was centered at four to five mismatches and included barcodes that were absent from the barcode libraries. No barcode (out of 1,415 unique barcodes in this dataset) was within one mismatch to any barcode in the random barcode library. We thus allowed one mismatch when matching barcodes to the two libraries and found 1,169 VISal-projecting cells and 2,483 SC/RSP-projecting cells, including 5 cells that projected to both areas (Fig. 4B). Of the barcodes that matched the two libraries, the number of cells that carried a barcode correlated with the frequency of that barcode in the library (Fig. 4C; Pearson correlation 0.41). An additional 496 barcodes from 483 cells did not match either barcode library, likely because the two virus libraries were not sequenced completely to fully represent diversity of barcodes they contained (see Methods for discussions on the unmatched barcodes).

Multiplexed retrograde labeling recapitulates known cortical projections.

(A) The minimum Hamming distance between each barcode and all other barcodes for barcodes in the VISal library (blue), the SC/RSP library (red), and random barcodes (dashed). (B) The distribution of mismatch between each sequenced barcode and the closest barcode in the two libraries. Colors indicate whether and which library each barcode was mapped to. (C) The frequencies of barcodes in the libraries (x-axis) are plotted against the number of sequenced cells carrying those barcodes (y-axis). Jitter is added on the y-axis to resolve overlapping dots. (D) Two representative slices, one from each brain showing all sequenced cells on each slice. Barcoded cells are color-coded by projections and non-barcoded cells are color-coded by brain regions. (E)(F) The number of neurons from each cortical area and each subclass that project to either VISal (E) or SC/RSP (F). Dot sizes and colors indicate number of cells. (G) The distribution of the number of projecting cells for each subclass. (H)-(J) Total number of cells (H) or the fraction of cells with projections to SC/RSP (I) or VISal (J) of the indicated L4/5 IT type in each cortical area. (K)(L) Fractions of variance in the probability of projections explained by combinations of the compositional profiles of cell types (S for subclass and T for type) and cortical areas (Area) for L4/5 IT neurons (K) or for all excitatory neurons (L). In (L), the first column indicates variance explained by compositional profiles of cell types at the subclass level. *p =6×10-70, **p =6×10-58, ***p =7×10-78, ****p =6×10- 37 comparing the means to 100 iterations of shuffled controls using two-tailed t-tests.

VISal-projecting cells and SC/RSP-projecting neurons were differentially distributed across brain areas and were enriched in different transcriptomic subclasses (Fig. 4D-G). VISal-projecting neurons were found mostly in VISp and cortical areas lateral to VISp, including VISl and TEa (Fig. 4E). All transcriptomic types of cortical excitatory neurons, except L6 CT and L6b neurons, were labeled, but the fraction of cells labeled were biased according to the areas. For example, VISal-projecting L2/3 IT neurons were found mostly in VISp and VISl, but not in TEa (Fig. 4E). Because superficial layer neurons are involved in feedforward projections from lower-hierarchy cortical areas to higher-hierarchy ones (Rockland and Pandya, 1979), this observation is consistent with the hierarchy of cortical areas [VISp and VISl have lower hierarchy than VISal, whereas TEa has higher hierarchy; (Harris et al., 2019)]. In contrast, SC/RSP-projecting neurons were found medial to VISp (including VISpm and RSPd), in hippocampal areas including the postsubiculum, presubiculum, and the ventral subiculum (Fig. 4F), and in the midbrain (Fig. 4D). Within the cortex, projections were dominated by L4/5 IT neurons in VISpm and RSPd, and L5 ET neurons in VISp. Because IT neurons project to the cortex but not the SC (Harris and Shepherd, 2015; Harris et al., 2019; Peng et al., 2021), the L4/5 IT neurons likely projected to RSP, not SC, whereas, the L5 ET neurons likely projected to the SC. Thus, IT neurons that projected to VISal and RSP were separated along the mediolateral axis. This separation is consistent with the ventral and dorsal streams of the visual pathways (Wang et al., 2012): whereas VISl, TEa, and ECT belong to the ventral stream, VISp and RSPd belong to the dorsal stream. These results recapitulated known patterns of projections across areas and transcriptomic subclasses and were consistent with the two pathways in the mouse visual circuit.

Because neurons of different transcriptomic types within a subclass are differentially enriched across cortical areas (Yao et al., 2021; Chen et al., 2022a), we wondered if the differences in projections of the same subclass of neurons across cortical areas can be explained solely by variations in the composition of cell types within subclasses. We first focused on the L4/5 IT neurons, which project to either VISal or RSP from the lateral or medial cortical areas, respectively. Consistent with our previous observations (Chen et al., 2022a), the transcriptomic types of L4/5 IT neurons were differentially enriched across cortical areas along the mediolateral axis (Fig. 4H). For example, L4/5_IT_1 and L4/5_IT_3 were enriched in lateral areas, such as VISl, TEa, and ECT, whereas L4/5_IT_4 to L4/5_IT_6 were enriched in VISp, VISm, and RSPd. The enrichment of cell types across cortical areas, however, cannot account for the differences in projections of L4/5 IT neurons from different cortical areas (Fig. 4I, J). For example, L4/5_IT_2 in VISpm and RSPd mainly projected to RSP, but not VISal, whereas those in TEa and VISl projected to VISal not RSP. To quantify how much source areas and cell types contribute to the diversity in projections, we discretized the cortex into “cubelets.” Each cubelet spanned all cortical layers, was 20 µm thick along the anteroposterior axis (i.e. the thickness of a section), and was about 110 µm in width along the mediolateral axis (Methods). In each cubelet, we calculated the fraction of neurons that projected to VISal for each cell type (we did not perform this analysis for RSP-projecting cells because of insufficient sample size). We then used one-way ANOVA to estimate the variance in the fraction of neurons with projections that can be explained by source area labels, cell type labels, or both (Methods). For example, if projections are determined solely by cell types and the differences in projection probabilities across cortical areas are only due to different compositions of cell types, then the variance in the fraction of neurons with projections explained by cell types and source areas together should be similar to the variance explained by cell types and shuffled source area labels. Within the L4/5 IT subclass, cell type and source area each explained a modest fraction of variance (cell type: 5.9 ± 3.7%; source area: 6.3 ± 6.5%, mean ± std; Fig. 4K). In contrast, combining cell type and source area explained more variance (31.8 ± 25.7%, mean ± std; Fig. 4K;) compared to cell types and shuffled source area labels (p = 6×10-58 using one-sample two-tailed t-test comparing to 100 iterations of controls with shuffled areas). Similar results were observed across all subclasses of cortical excitatory neurons (Fig. 4L; p = 6×10-37 using one-sample two-tailed t-test comparing types with areas to 100 iterations of types with shuffled areas). These results indicate that projections of cortical neurons are determined by both the transcriptomic identities of the neurons at the resolution we observed and their anatomical location.

In situ sequencing of barcoded rabies virus reveals cell type preferences in synaptic convergence

Because we can identify transcriptomic identities of neurons labeled with rabies barcode, similar approaches can also be used in a transsynaptic labeling experiment to interrogate synaptic connectivity. In a barcoded transsynaptic labeling experiment, networks of neurons can share the same barcode because of either connectivity between presynaptic cells and source cells or technical and biological artifacts. Distinguishing different types of barcode-sharing networks is thus crucial for resolving single-cell connectivity. In the following, we first lay out how different types of barcode-sharing networks affect connectivity mapping and how they can be distinguished from each other. We then use in situ sequencing to resolve these networks in a barcoded transsynaptic labeling experiment.

Ideally, each source cell should express a unique barcode, which is also shared with its presynaptic cells. By matching barcodes in the presynaptic cells to those in the source cells, we can then infer connectivity between the presynaptic cells to individual source cells (single-source networks; Fig. 5Aa). In practice, however, multiple source cells may express the same barcode. This barcode sharing could be caused by two viral particles with the same barcodes that independently infected two source cells (double-labeled networks; Fig. 5Ab). For a given barcode, the probability that the same barcode is found in a second source cell because of double labeling scales with the frequency of that barcode in the library and the total number of source cells that are directly labeled in an experiment. Thus, for a given experiment with known number of source cells, barcodes that are more abundant in a library are more likely to result in double-labeled networks. This type of barcode sharing can be minimized by employing sufficiently diverse and uniformly distributed barcodes (Kebschull et al., 2016). However, this is typically challenging to achieve with rabies virus, because the recombinant rabies virus production process usually leads to uneven amplification of barcodes, resulting in their highly skewed distribution (Clark et al., 2021; Saunders et al., 2022). Alternatively, two cells expressing the rabies glycoprotein can be interconnected. In this case, the source cell may pass the barcode to the other cell, thus obscuring the identity of the original source cell (connected-source networks; Fig. 5Ac). Because the probability of connected-source networks scales with both the number of source cells and their local connectivity probability, limiting the number of source cells or restricting source cells to sparsely connected subpopulations of neurons will reduce this type of network. Finally, it is possible to find barcodes that occur only in cells without glycoprotein expression, but not in corresponding source cells. This can happen when the rabies virus directly infects cells that do not express the glycoprotein (no-source networks; Fig. 5Ad). This may happen because the virus library contains trace amount of non-pseudotyped rabies virus, or because the infected cells express TVA but not the glycoprotein (these are expressed from separate helper AAVs in our system). In this scenario, each barcode should be found in only one or a few cells (depending on the frequency of that barcode in the library), because the barcoded cells are all produced by separate infection events and not from cell-cell transmission. Alternatively, source cells may be dead due to cytotoxicity or missed by sequencing (lost-source networks; Fig. 5Ae). If a source cell died, it may produce fewer presynaptic cells than single-source networks, because the death of the source cell may only allow limited cell-cell transmission, but possibly more presynaptic cells than in no-source networks. Furthermore, the presence of lost-source networks in an experiment could cause some double-labeled networks or connected-source networks to be misidentified as single-source networks because some source cells in these networks may have died. Because only true single-source networks can unambiguously resolve synaptic connectivity between individual source cells and presynaptic cells, mapping connectivity between source cells and presynaptic cells requires an experiment in which all source cells are sequenced, and which has few lost-source networks. In contrast, single-source networks, connected-source networks (see below for discussion), and lost-source networks can all be used to infer synaptic convergence, i.e. the degree to which two neuronal types synapse onto the same cells, regardless of the identity of the source cell. If the total number of barcodes that labeled cells in an experiment is known, both no-source networks and double-labeled networks can be minimized by filtering based on the number of neurons that share each barcode and the barcode frequencies in the virus library, respectively. Thus, inferring synaptic convergence neither requires sequencing all source cells nor is constrained by the presence of lost source cells.

We first tried using SMART-seq v4 scRNA-seq to resolve the connectivity and transcriptomic identity of transsynaptically labeled neurons. However, because scRNA-seq can only interrogate a fraction of all barcoded cells, we were unable to confidently infer connectivity from the rabies barcodes (see Supplementary Note for details; Supp Fig. S4). To sequence more barcoded cells, we then investigated whether in situ sequencing can achieve readout of multiplexed transsynaptic labeling with barcoded rabies virus (Fig. 5B). We injected AAV2-retro-synP-mCre (Koresawa et al., 2000) into the dorsal striatum, and two helper viruses, AAV1-synP-FLEX-splitTVA-EGFP-tTA (Kohara et al., 2014) and AAV1-TREtight-mTagBFP2-B19G, into the somatosensory cortex. This combination should allow expression of TVA and the rabies glycoprotein in corticostriatal neurons. After seven days, we injected a barcoded EnvA-pseudotyped ΔG rabies virus library into the somatosensory cortex. This library contained the same barcodes as the CCS-containing library used for retrograde labeling, with at least 13,211 barcodes. After another seven days, we collected 20-µm coronal sections and sequenced both endogenous genes and the rabies barcodes in 32 consecutive sections in situ (Fig. 5C). These sections, which spanned 640 µm along the AP axis, largely covered the whole injection site. Because in situ sequencing did not require dissociation and was able to capture even halves of the same barcoded cells (Fig. 5D), sequencing consecutive sections that spanned the whole injection site should allow us to interrogate most source cells. We interrogated the same gene panel as in the retrograde labeling experiment, except that we included probes for the rabies glycoprotein gene instead of probes for Slc30a3. Barcode rolonies were resolved individually within cells, which allowed us to read out multiple barcodes from the same cell (Fig. 5C).

Multiplexed transsynaptic labeling by sequencing rabies barcodes in situ.

(A) Five possible types of barcode-sharing networks in a barcoded transsynaptic tracing experiment using rabies virus. Whether each network is compatible with monosynaptic tracing and/or mapping synaptic convergence is indicated below. *see text for considerations regarding connected-source networks. (B) Outline of transsynaptic labeling experiment using barcoded rabies virus and in situ sequencing. (C) Example image of a representative coronal section during sequencing. Images of the first gene sequencing cycle, the hybridization cycle, and the first barcode sequencing cycle of the boxed area are shown on the right. Scale bars = 50 µm. (D) First nine barcodes sequencing images of example bisected neuron from two adjacent sections. Scale bars = 10 µm. (E) The fraction of manually curated barcoded cells with the indicated barcode (“BC”) complexity and barcode read counts per cell. Dashed lines indicate quality control thresholds for barcodes. (F) The distribution of endogenous mRNA reads per cell and unique gene counts per cell in non-barcoded cells (gray), barcoded cells (blue), and source cells (red). Dashed lines indicate quality control thresholds for gene expression. (G)(H) UMAP plots of all barcoded cells color-coded by the cluster label at the subclass level (G) or whether the cell has barcodes or is a source cell (H). (I) Locations and cell types of two source cells (red cross) and presynaptic cells (dots) that shared the same barcodes. Colors of dots indicate transcriptomic types of presynaptic neurons. Transcriptomic types of source cells are indicated below each plot. All other cells from the coronal sections that the source cells were on were plotted in gray. (J) Estimated numbers of barcode/source cell combinations that belonged to each of the three networks with source cells. (K) The posterior probability of the number of independent infection events to generate the same number of barcodes found in the experiment. (L) The distribution of the number of cells that shared each barcode that was not found in a source cell. (M) The distribution of the maximum barcode frequency in the virus library that ensures single infection for 95% of barcodes across 10,000 simulations. (N) The ratios between the observed number of converging outputs and the expected number from random connectivity between cortical subclasses of neurons. Colors correspond to log10 of ratios, and the ratios are indicated in the plot. Only values with false positive rate (FPR) < 0.05 are shown (see Methods). As seen from the blue squares associated with L6 IT cells, these neurons were less likely to synapse on to the same post-synaptic neurons with other neuronal types (in particular L6 CT neurons, with only 50% of converging connections compared to those expected from random connectivity).

We sequenced a total of 3,107,645 cells, including 2,914 barcoded cells. Barcoded cells were defined as cells with at least eight counts of the same barcode in a cell, and the barcode was sufficiently complex (linguistic sequence complexity > 10-0.9; see Methods for definition). These barcode quality-control thresholds were picked based on manual examination of a subset of barcoded cells (Fig. 5E) and allowed us to remove fluorescent background and barcodes from passing axons and/or dendrites. Of the 2,914 barcoded cells, 138 cells had at least two counts of the rabies glycoprotein (Methods; Supp. Fig. S5A; see Discussion about the low-level expression) and were considered potential source cells; 2,590 barcoded cells had no glycoprotein expression and were considered potential presynaptic cells; the remaining 186 barcoded cells had a single read of glycoprotein, and were removed from further analyses. Consistent with the retrograde labeling experiment, the barcoded cells had similar numbers of endogenous gene reads as non-barcoded cells (Fig. 5F) and clustered together with non-barcoded cells (Fig. 5G, H).

Because distinguishing different types of barcode-sharing networks requires sequencing most source cells, we first estimated what fraction of the source cells that were present in the tissue was sequenced. The number of cells that shared a barcode varied widely from 1 cell/barcode to 359 cells/barcode. Since previous studies showed that each source cell should label dozens to hundreds of presynaptic neurons (Liu et al., 2013; Wertz et al., 2015), we reasoned that barcodes that were found only in a small number of cells may indicate either no-source networks or lost-source networks due to dead source cells. We thus focused on 23 barcodes that were present in at least 12 cells per barcode, and calculated the fraction of barcodes that were also found in source cells with read counts above a certain threshold. We varied this threshold to generate a range of estimates that likely over-estimated and under-estimated the number of source cells at the extreme ends, respectively, and found that 80% – 93% barcodes were found in source cells (see Methods for details; Supp. Fig. S5B).

We further filtered the original 138 source cells to remove those that had wrong segmentations or included barcodes from nearby cells, resulting in 59 unique barcodes in 120 high quality source cells. Because six source cells contained two barcodes each and the remaining 114 source cells contained a single barcode, these 120 source cells corresponded to 126 unique combinations of source cell/barcodes. Because these source cell/barcode combinations, rather than barcode counts or source cell counts alone, represented potential independent infection events or transsynaptic labeling between source cells, we next used the number of source cell/barcode combinations to estimate the number of barcodes that belonged to the five types of networks (Fig. 5A). 43 barcodes in 42 source cells were found only in a single source cell each (one source cell contained two barcodes; Fig. 5Aa); of these 43 barcodes, 31 barcodes in 30 source cells were shared with 381 putative presynaptic cells (12 ± 14 cells per barcode after rounding, mean ± std; Fig. 5I; Supp. Fig. S6). The remaining 16 (59 – 43) barcodes labeled multiple source cells each (81 source cells in total, including 3 source cells that contained both shared barcodes and unique barcodes). Barcodes shared across source cells could be caused by either connected source cells (Fig. 5Ac) or double labeling (Fig. 5Ab). To estimate the contribution of these two sources, we calculated the posterior probability of N transduction events to generate 59 barcodes in the source cells, given the barcode frequency in the library (Methods). We found that the most likely number of transduction events was 76 (95% confidence interval 67 to 87; Fig. 5J, K), suggesting that about 33 (76 – 43) barcode/source cell combinations can be explained by multiple source cells being infected with the same barcodes through independent infection events. Since there were a total of 126 unique combinations of source cells and barcodes, the remaining 50 (126 - 76) source-barcode pairs were likely from source cells that were connected to other source cells (Fig. 5J). This high number of connected source cells is expected, because many cortical neuronal types are highly connected (Campagnola et al., 2022). Thus, barcode sharing among source cells was likely dominated by connected source-cell networks.

We next estimated numbers of barcode-sharing networks without source cells (no-source networks or lost-source networks; Fig. 5Ad, e). In total, 490 barcodes were found in presynaptic cells but not in source cells. We reasoned that since cells in no-source networks were all generated through independent infection events, these networks should have a small number of cells. In contrast, lost-source networks were likely larger, because source cells may die after presynaptic cells are labeled (Wertz et al., 2015). Indeed, the distribution of the number of cells containing each barcode that were absent in source cells revealed two peaks, one large peak around 1 cell per barcode and a smaller peak around 8 cells per barcode (Fig. 5L). 429 of these barcodes (88%) were found in fewer than 5 cells, suggesting that barcodes that were not found in source cells were dominated by direct infection of cells that did not express the glycoprotein (no-source networks). If all networks with at least 5 cells corresponded to lost-source networks, then 61 source cells died, compared to 120 observed source cells and 76 estimated independent infection events. These results are consistent with previous observations using single cell-initiated monosynaptic tracing that around half of source cells died by seven days after infection (Wertz et al., 2015). Because many source cells likely died in this experiment, some barcodes that were originally shared across multiple source cells may appear as single-source networks. This kind of ambiguity may obscure mapping synaptic connectivity between the source cells and presynaptic cells, but has limited effect on inferring synaptic convergence, i.e. which neuronal types are more likely to synapse onto the same post-synaptic cell, regardless of the identity of the post-synaptic cell.

We next filtered our data with the goal of estimating synaptic convergence. Unlike when estimating single-cell connectivity between source cells and presynaptic cells, both single-source networks (Fig. 5Aa) and lost-source networks (Fig. 5Ae) can be used to estimate synaptic convergence, because presynaptic neurons that shared the same barcodes all synapsed onto the same cells (although the identity of the post-synaptic cells are unknown in lost-source networks). Connected-source networks (Fig. 5Ac) may in principle obscure synaptic convergence if a source cell that is transsynaptically labeled further pass the barcode to its presynaptic cells. However, because these tertiary infections (presynaptic cells of the transsynaptically labeled source cell) occur after the secondary infections (presynaptic cells of the original source cell), the number of barcoded cells that were presynaptic to the transsynaptically labeled source cell would likely be smaller compared to those that were presynaptic to the original source cell after a short post-infection incubation period (7 days in our experiment). Thus, we reasoned that connected-source networks were likely dominated by presynaptic cells of the original source cells and included them in the analysis of synaptic convergence. Finally, no-source networks (Fig. 5Ad) and double-labeled networks (Fig. 5Ab) contained many neurons that were not related by connectivity and would obscure synaptic convergence. We thus minimized the impact of these two networks by limiting network size and barcode frequency. Because cells in no-source networks were generated by independent infection events and generally contained few cells, we first filtered out networks that were too small (< 5 cells; Methods). Because double-labeled networks likely occurred to barcodes that were abundant, we then applied a maximum barcode frequency threshold (1.3×10-3 based on the library sequencing). This threshold should ensure that about 95% of the barcodes could only have transduced a single source cell from the initial infection by EnvA-enveloped virus (see Methods; Fig. 5M).

For all potential presynaptic neurons that shared a barcode, we counted each pair of presynaptic neurons as a pair of “converging connections.” We aggregated all pairs of converging connections across all filtered barcodes for cortical subclasses of presynaptic neurons, and calculated the bias in synaptic convergence as the ratio between the observed number of converging connections compared to the expected numbers of connections based on random connectivity (Fig. 5N; see Methods). That is, if subclass A and subclass B have a convergence bias >1, then neurons of these two subclasses are more likely to synapse onto the same post-synaptic cell, and vice versa. We found that intra-telencephalic (IT) neurons – neurons that project mainly within the cortex and the striatum – generally synapse onto similar cells (the convergence bias is 2.75 between L2/3 IT and L5 IT, 2.81 between L4/5 IT and L5 IT, and 1.06 between L5 IT and L6 IT, false positive rate < 0.05). However, IT neurons in layer 6 (L6 IT) were less likely to synapse onto the same postsynaptic cells as other cortical neurons, including IT neurons in the superficial layers (L2/3 IT and L4/5 IT, convergence bias 0.83 and 0.80, respectively), the corticofugal projection neurons (L5 ET and L6 CT, convergence bias 0.78 and 0.53, respectively), NP neurons (convergence bias 0.72), and two subclasses of inhibitory neurons (Lamp5 and Pvalb, convergence bias 0.70 and 0.84, respectively). The differences in the connectivity of L6 IT neurons are reminiscent of their characteristic areal and laminar patterns of long-range projections (Tasic et al., 2018; Chen et al., 2019) compared to other IT neurons. We speculate that the distinct L6 IT connectivity may reflect specialized roles in cortical processing.


Here we combined scRNA-seq and in situ sequencing with barcoded rabies virus to relate connectivity and transcriptomic identities of neurons in both multiplexed retrograde labeling and transsynaptic labeling. As a proof of principle, we applied these approaches to interrogate projections and synaptic connectivity of cortical neuronal types. Our experiments recapitulated the diverging dorsal and ventral streams in the visual cortex and revealed differences in projections of neurons across cortical areas and transcriptomic types. Furthermore, we laid out the requirements for achieving and assessing barcoded transsynaptic labeling, and revealed converging and diverging synaptic connectivity among cortical neuronal types. Our study provides proof of principle for applying barcoded rabies virus-based neuroanatomy in the mouse brain in vivo and lays the foundation for achieving multiplexed monosynaptic tracing using barcoded rabies virus.

Achieving multiplexed retrograde labeling using barcoded rabies virus

In barcoding-based multiplexed retrograde tracing, different barcodes are used to distinguish projections to different injection sites. Thus, unlike conventional fluorescence-based multiplexed retrograde tracing techniques, a barcoding-based approach is not limited by the number of distinguishable colors of fluorescence. Because barcode diversity increases exponentially with the length of the barcode (4N in which N is the length of the barcode), possible barcode diversity is virtually unlimited. Thus, in a barcoded retrograde tracing experiment, the number of targets that can be investigated by retrograde labeling in a single experiment is likely limited by the number of injections. In our experiments, we only performed retrograde labeling from two sites as a proof of principle, but in future experiments, the approach can be applied to retrograde tracing from dozens of sites to reveal multiple projections from individual cells.

Barcoded retrograde and anterograde projection mapping approaches obtain complementary information. Barcoded anterograde labeling approaches, such as MAPseq and BARseq, can barcode many neuronal somata in a local region in a single injection, and thus can reveal the diversity of projections of densely labeled neurons within a region. In contrast, retrograde labeling approaches can reveal the distribution of neurons with the same projections and the distribution and diversity of projection neurons over large brain regions. For example, we interrogated the distribution of neurons across seven adjacent cortical areas and found that both source areas and the transcriptomic identities of cortical neurons contribute to their projection patterns. To perform the equivalent experiment using an anterograde tracing approach would require multiple injections at precise locations to achieve uniform labeling across a large region of the cortex. The sensitivities of the two approaches are also constrained by different factors: the sensitivity of MAPseq/BARseq relies on detecting individual barcode molecules in the axons; in contrast, because rabies genome is highly amplified in each infected cell, the sensitivity of barcoded rabies-based retrograde labeling is not limited by barcode detection but determined by the virus itself. Any future improvement in rabies virus-based retrograde tracing in general would also benefit barcoded approaches. Therefore, the combination of barcoding-based retrograde and anterograde projection mapping techniques provides a versatile set of tools for understanding the organization of long-range circuits.

Assessing synaptic connectivity of neuronal types using barcoded transsynaptic labeling

Mapping synaptic connectivity using barcoded transsynaptic labeling has the potential to achieve unparalleled throughput and transcriptomic type resolution compared to existing approaches, such as electron microscopy and multi-patch experiments. In a barcoded transsynaptic labeling experiment, accurately inferring synaptic relationships among neurons requires distinguishing different types of barcode-sharing networks (Fig. 5A) and filtering out networks that are not compatible with connectivity analysis. For example, to perform monosynaptic tracing (i.e. mapping connectivity between source cells and presynaptic neurons), source cell death need to be minimized by additional control experiments prior to a barcoded tracing experiment. Furthermore, barcodes shared across multiple source cells need to be excluded. Interrogating synaptic convergence, which does not require knowing the precise identities of the source cells, requires removing highly abundant barcodes that may have potentially infected multiple source cells independently. In both cases, sequencing all barcodes and/or source cells is crucial: for monosynaptic tracing, sequencing nearly all source cells is needed to identify barcodes that are either shared across multiple source cells or missing in source cells; when mapping synaptic convergence, knowing the distribution and the number of barcodes that labeled cells is needed to estimate the maximum frequency of barcode that can ensure single labeling. An in situ sequencing-based approach has a unique advantage over previous scRNA-seq-based approaches (Clark et al., 2021; Saunders et al., 2022) to achieve near-complete sampling of source cells and barcodes. Because conventional single-cell approaches require tissue dissociation, some source cells are inevitably lost. In situ sequencing, in contrast, bypasses the need for tissue dissociation. Although some sections of tissues can be lost during cryo-sectioning, this type of tissue loss can be minimized in future experiments by using a tape-transfer system (Pinskiy et al., 2015), which is also compatible with BARseq-style in situ sequencing (Chen et al., 2019).

In this study, we focused only on investigating the biases in synaptic convergence across cortical neuronal types. To achieve true single-cell resolution in mapping connectivity between source cells and presynaptic cells, we identified two crucial factors for future experiments. First, we estimated that a significant fraction of source cells died during the experiment, which may lead us to misidentify connected-source networks as single-source networks and consequently misidentify the source cells that the presynaptic cells are connected to. In future experiments, this problem can be ameliorated by titrating concentrations of AAV and RV (Lavin et al., 2020), using alternative strains of rabies virus with less cytotoxicity (Reardon et al., 2016), new generations of deletion mutant vectors (Chatterjee et al., 2018; Jin et al., 2022), and/or engineered glycoprotein (Kim et al., 2016). Second, the fact that many source cells shared barcodes complicated the efforts to infer synaptic connectivity between source cells and presynaptic cells at cellular resolution. Furthermore, because some source cells that shared barcodes may have died, the number of barcode-sharing source cells we observed was likely an underestimate of the true numbers.

Because the number of connected source cells increases with both the probability of connections and the total number of cells that express the glycoprotein, connected source cells can be minimized by titrating the number of glycoprotein-expressing neurons and/or by labeling specific subpopulations of neurons that were sparsely connected. In our experiment, there were many cells that had a small number of glycoprotein transcripts. Many of these potential source cells did not have corresponding presynaptic cells, suggesting that this low-level expression was not efficient for enabling transmission of the rabies virus. We speculate that this low-level expression could have been driven by the TRE-tight promoter in the absence of Cre-dependent tTA expression (Loew et al., 2010). Thus, placing the glycoprotein under the direct control of Cre could reduce its broad but low-level expression and constrain potential source cells to a sparser population. These modifications, combined with control experiments that assess the extent of connected source cells, double labeling, and dead source cells experimentally can help achieve multiplexed monosynaptic tracing at cellular resolution in future experiments.

Building a comprehensive barcoded connectomics toolbox based on in situ sequencing

In barcoding-based neuroanatomical techniques, in situ sequencing possesses unique advantages over conventional dissociation based single-cell or single-nuclear RNA-seq techniques. In situ sequencing not only has the ability to capture most source cells, but is also faster and cheaper in mapping cell types than most dissociation based techniques on a per-cell basis (Chen et al., 2022a). For example, in situ sequencing performed in the transsynaptic labeling experiment, which included dense sequencing of 3.1 million cells in total across 32 consecutive coronal sections, cost less than $4,000 in reagents.

Furthermore, in situ sequencing captures the mapped neurons at high spatial resolution, which can reveal spatial organization of connectivity. Such space-based rules in connectivity are common in vertebrate brains (e.g. retinotopy, cortical column) and many may exist independent of transcriptomic type-based differences in connectivity (Chen et al., 2022b); specific spatial patterns of connectivity loss are common in many neurodegenerative diseases. By combining in situ sequencing with barcoded rabies tracing, we expand in situ sequencing-based barcoded neuroanatomical approaches to include anterograde projection mapping (BARseq), multiplexed retrograde tracing, and multiplexed transsynaptic tracing. This diverse set of in situ sequencing-based tools enable flexible strategies for mapping neuronal connectivity across scales.


Animals and surgery

Animal handling and surgery were conducted according to protocols approved by the Institutional Animal Care and Use Committee (IACUC) of the Allen Institute for Brain Science and protocol approved by the MIT Committee on Animal Care. Animals were housed 3-5 per cage and were on a 12/12 light/dark cycle in an environmentally controlled room (humidity 40%, temperature 21 °C). A full list of animals used are provided in Table S1.

For retrograde tracing, barcoded G-deleted rabies viruses enveloped in native rabies glycoprotein were stereotaxically injected into target brain areas of mice using coordinates obtained from the Paxinos adult mouse brain atlas (Paxinos and Franklin, 2013). For retrograde transsynaptic tracing, AAV helper viruses were stereotaxically injected into VISp for the scRNA-seq experiments and SSp/dorsal striatum for the in situ sequencing experiments, followed by the injection of EnvA-pseudotyped barcoded G-deleted rabies virus into the same area two weeks later. Details for all injections are provided in Table S1. Brains were dissected 1 week (7+/1 days) after rabies virus injection.

Viruses and constructs

Viruses used are provided in Table S2. For plasmids and vectors used for making barcoded rabies virus, see below (Constructing barcoded plasmids for rabies virus).

In situ sequencing oligos

RT primers, sequencing primers, and padlock probes for both endogenous genes and rabies transcripts are designed as previously described (Sun et al., 2021; Chen et al., 2022a). A list of oligos used in in situ sequencing are provided in Table S3.

Constructing barcoded plasmids for rabies virus

In order to maximize the copy number of barcode transcripts in rabies virus infected neurons, we inserted a barcode sequence consisting of 20 random nucleotides in the 3’ untranslated region of the nucleoprotein gene in pRVΔG-4mCherry (Addgene #52488) (SAD B19 strain) and RabV CVS-N2c(deltaG)-mCherry (Addgene #73464) (CVS-N2c strain) using NEBuilder HiFi DNA Assembly for constructing the barcoded rabies plasmid libraries. The 10x Genomics “Chromium Capture Sequence 2” (GCTCACCTATTAGCGGCTAAGG) was included following the barcode in one version of the libraries of each strain.

Cloning steps were as follows:

Step 1: Construction of pRVΔG-4mCherry library template

In order to reduce contamination of the library by the original plasmids, a jGCaMP7s fragment flanked by PacI and PmeI was inserted in the 3’ UTR of the N gene in RabV CVS-N2c(deltaG)-mCherry and pRVΔG-4mCherry.

Here we used two approaches to minimize contamination: 1) 2) after long-range PCR with Q5® High-Fidelity 2X Master Mix, the PCR amplicons were run on a low melting agarose gel for two hours in order to separate PCR amplicons from pRVΔG-4mCherry template plasmid in case double digestion from 1) was not sufficient.

Step 2: Whole RV plasmid PCR

To generate large amounts of linearized rabies vector for barcoded plasmids in the next step, we performed long-range PCR using pRVΔG-4mCherry library template from step 1, linearized with PacI and PmeI, as a DNA template. We then ran the PCR mix on a 0.7% low melting point agarose gel (16520-100, Invitrogen) and the target bands at 14,438 bp were cut and purified with NucleoSpin® Gel and PCR Clean-Up kit (740609.250, Takara). The target amplicons were re-concentrated to > 300 ng/ µL with GlycoBlue™ Coprecipitant (AM9515, Invitrogen). The long-range PCR primers are listed below:


The long-range PCR was performed as described below:

Q5® Hot Start High-Fidelity 2X Master Mix (M0494S,NEB): 12.5 µL; N_Barcode_160bp_rp_57 (10uM): 1.25 µL; N_Barcode_NheI_fp_58 (10uM): 1.25 µL; pRVΔG-4mCherry template with PacI and PmeI double digest (without purification, 8 ng/µL): 1 µL; Nuclease-free water (B1500L, NEB): 9 µL.

Cycle conditions: 1) 98℃, 30 seconds; 2) 98℃, 5 seconds; 3°67 ℃, 10 seconds; 4) 72℃, 5 minutes; 5°go to step 2-4, 30 times; 6) 72 ℃, 2 minutes.

Step 3: Barcoding RV vector using NEBuilder HiFi DNA Assembly method

A 160 bp Ultramer™ DNA Oligonucleotides with 20 random nucleotide barcodes sequence (see below) was generated by IDT (Integrated DNA Technologies, Inc., USA). They were inserted into the linearized vector of pRVΔG-4mCherry from step 2 described above using HiFi DNA Assembly (NEB, USA). The HiFi reaction was mixed as described below:

Re-concentrated pRVΔG-4mCherry PCR amplicons (>300 ng/µL): 1 µL; 160bp Ultramer™ DNA Oligonucleotides (0.2uM): 0.9 µL; Nuclease-free water: 8.1 µL; NEBuilder HiFi DNA Assembly Master Mix (E2621L, NEB): 10 µL.

This reaction mix was incubated at 50℃ for 1 hour. After the incubation, we concentrated the mix to >130 ng/µL with GlycoBlue™ Coprecipitant. After electroporation transformation with Endura Electrocompetent Cells (60242-1, Biosearch Technologies, USA), the cells were plated into Nunc™ Square BioAssay Dishes (Catalog number: 240835, Thermo Fisher Scientific, USA). After growing for 14 hours at 37℃, the bacterial colonies were scraped using bacti cell spreaders (60828-688, VWR, USA) for plasmid library purification with NucleoBond® Xtra Midi EF kit (740420.10,Takara).


Production of the CVS-N2c strain library proceeded similarly but using the following PCR primers: N2c whole rabies vector PCR primers:



Rabies virus production

Barcoded rabies viruses were produced and titered largely as described previously [PMID 20203674 and 25834254], using the barcoded vector genome plasmid libraries described above. For SAD B19 viruses, HEK-293T cells were transfected with the respective genome library along with helper plasmids pCAG-B19N (Addgene cat. # 59924), pCAG-B19P (Addgene cat. # 59925), pCAG-B19G (Addgene cat. # 59921), pCAG-B19L (Addgene cat. # 59922), and pCAG-T7pol (Addgene cat. # 59926), were used for rescue transfections, using Lipofectamine 2000 (Thermo Fisher). For CVS-N2c virus, Neuro2a cells stably expressing the CVS-N2c glycoprotein (N2A-N2cG_02 cells) were transfected with the barcoded N2c library along with helper plasmids pCAG-N2cN (Addgene cat. # 100801), pCAG N2cP (Addgene cat. # 100808), pCAG-N2cG (Addgene cat. # 100811), pCAG-N2cL (Addgene cat. # 100812), and pCAG-T7pol. Details for individual preparations are as follows:

RVΔG-4mCherry_20-mer barcode(B19G) (used for retrograde labeling):

Supernatants were collected beginning 3 days after transfection and continuing for a total of five days, with supernatants replaced with 13 ml fresh medium until the last collection; each supernatant was filtered and refrigerated until all supernatants were titered together on HEK-293T cells as described [PMID 20203674]. BHK cells stably expressing SAD B19 glycoprotein were infected with rescue supernatant at a multiplicity of infection (MOI) of 0.1. 24 hours after infection, medium was replaced with 14 ml fresh medium. Supernatants were collected beginning 24 hours later and continuing every 24 hours for a total of three days, clarified by low-speed centrifugation and filtered as described previously [PMID 20203674] and replaced with 14 ml fresh medium until the final collection. Following titering, 25 ml of supernatant III was ultracentrifugated through 25% sucrose as described [PMID 20203674] and resuspended overnight in 20 µl DPBS, aliquoted and frozen, with a final titer of 4.21e11 iu/mL.

RVΔG-4mCherry_CCS2_20nt_HM(B19G) (used for retrograde labeling):

Production of this virus was similar to that of the above but using the CCS2-containing SAD B19 plasmid library. The final titer of the concentrated virus was 5.97e11 iu/mL.

RVΔG-4mCherry_CCS2_20nt_HM(EnvA) (used for monosynaptic tracing with BARseq):

Supernatant I of the passage of RVΔG-4mCherry_CCS2_20nt_HM(B19G) described above was used to infect BHK cell stably expressing a fusion protein of the EnvA envelope protein with the cytoplasmic domain of the SAD B19 glycoprotein at an MOI of 2, as described. 24 hours after infection, cells were washed twice with DPBS and medium was replaced with 13 ml fresh medium per plate. 24 hours later, cells were again washed once with DPBS and medium was again replaced. 24 hours later, supernatants were collected, replaced with fresh medium, clarified and filtered, incubated with 30U/mL Benzonase for 30 min at 37°C, then ultracentrifugated through sucrose. 24 hours later, a second supernatant was collected and processed the same way. Both concentrated supernatants were then pooled, mixed, and aliquoted as 5 µl aliquots, then stored at -80°C before being titered on TVA-expressing cells as described previously [PMID 20203674]. The final titer of the concentrated stock was 7.16e10 iu/mL.

N2cdG-4mCherry_CCS2_20nt_HM(EnvA) (used for monosynaptic tracing with scRNA-seq): Following transfection, medium was changed regularly with 25 ml fresh medium until 10 days post-transfection, when supernatant collection was begun. Supernatants were collected each Monday, Wednesday, and Friday, replaced with 25 ml fresh medium except for the final collection, clarified, filtered, and stored in 4°C, for a total of 9 collections. Supernatants were titered on HEK 293T cells as described previously.

N2A-EnvA_cytG cells [PMID 26804990] were infected with rescue supernatants IV and V at an MOI of 2. 24 hours after infection, supernatants were aspirated, cells were washed twice with DPBS and given 12 ml fresh medium per plate. 24 hours later, cells were again washed once with DPBS and given 12 ml fresh medium per plate. 24 hours later, supernatants were collected, replaced with fresh medium, clarified and filtered, incubated with 30U/mL Benzonase for 30 min at 37°C, then ultracentrifugated through sucrose. 24 hours later, a second supernatant was collected and processed the same way. The two concentrated supernatants were pooled, aliquoted, and stored at -80°C. The final titer of the concentrated stock was 2.82e10 iu/mL.

Primers for N2c virus_Miseq:



Primers for MiSeq sample preparation:



Sequencing primers for Miseq:



We found 13,212 and 8,553 barcodes, respectively, from the RVΔG-4mCherry_20-mer barcode(B19G) and RVΔG-4mCherry_CCS2_20nt_HM(EnvA) libraries. The RVΔG-4mCherry_CCS2_20nt_HM(B19G) library was used to prepare the RVΔG-4mCherry_CCS2_20nt_HM(EnvA) library, so we did not separately sequence the barcodes in the B19G coated library.

In the retrograde labeling experiment, the 12% of barcodes (496 out of 4,130 barcoded cells) did not match barcodes that were seen in the two libraries. These unmatched barcodes were likely caused by the shallow depth at which these two libraries were sequenced. For example, in the RVΔG-4mCherry_CCS2_20nt_HM(EnvA) library, 99.9% of reads corresponded to unique UMIs (200,021 reads corresponding to 199,815 UMIs). Thus, a large fraction of UMIs are likely missed during sequencing.

Because 37% (5,914 out of 13,212) and 46% (3,964 out of 8553) barcodes found in the two libraries had a single UMI recovered, many barcodes with similar frequencies as the recovered barcodes with only 1-2 UMIs were likely missed by chance. Thus, the numbers of barcodes we found represent lower bounds of the barcode diversity in these libraries, and the shallow depth of sequencing could potentially explain the number of barcodes that were not matched to the libraries.

The RVΔG-4mCherry_CCS2_20nt_HM(B19G) library was used to prepare the RVΔG-4mCherry_CCS2_20nt_HM(EnvA) library, so we did not separately sequence the barcodes in the B19G coated library.

Tissue processing and scRNA-seq

scRNA-seq was performed using the SMART-Seq v4 kit (Takara Cat#634894) as described previously (Tasic et al., 2018). Mice were anaesthetized with isoflurane and perfused with cold carbogen-bubbled artificial cerebrospinal fluid (ACSF). The brain was dissected, and sliced into 250-µm coronal sections on a compresstome (Precisionary). Regions of interest were micro-dissected, followed by enzymatic digestion, trituration into single cell suspension, and FACS analysis. For retrograde tracing, 48-52 mCherry-positive cells from each mouse were sorted into 8-well strips containing SMART-Seq lysis buffer with RNase inhibitor (0.17 U/µL; Takara Cat#ST0764). For retrograde transsynaptic tracing, 48 mCherry-positive cells from each mouse were first sorted into 8-well strips, followed by the sorting of mCherry-/mTagBFP2-double positive cells. Sorted cells were immediately frozen on dry ice for storage at −80°C. Reverse transcription, cDNA amplification, and library construction were conducted as described previously (Tasic et al., 2018). Full documentation for the scRNA-seq procedure is available in the ‘Documentation’ section of the Allen Institute data portal at

RNA-seq data processing

Reads were aligned to GRCm38 (mm10) using STAR v2.5.3 in twopassMode as described previously (Tasic et al., 2018), and all non-genome-mapped reads were aligned to sequences encoded by the rabies and AAV helper viruses. PCR duplicates were masked and removed using STAR option ‘bamRemoveDuplicates’. Only uniquely aligned reads were used for gene quantification. Exonic read counts were quantified using the GenomicRanges package for R. To determine the corresponding cell type for each scRNA-seq dataset, we utilized the scrattch.hicat package for R (Tasic et al., 2018). The mapping method was based on comparing a cell’s marker gene expression with the marker gene expression of the reference cell types. Selected marker genes that distinguished each cluster were used in a bootstrapped centroid classifier, which performed 100 rounds of correlation using 80% of the marker panel selected at random in each round. Cells meeting any of the following criteria were removed: < 100,000 total reads, < 1,000 detected genes (with CPM > 0), CG dinucleotide odds ratio > 0.5, mapping confidence <0.7, and mapping correlation <0.6. One animal in which 31 out of the 40 mapped cells were microglia was excluded from analysis.

To extract CCS sequences, non-genome-mapped reads were aligned to the CCS sequence encoded in the rabies genomic sequence, and to extract barcode sequences, non-genome-mapped reads were aligned to the region in the rabies genomic sequence with extra six nucleotides flanking the barcode sequence. genomic sequence. For each sample, nucleotide frequencies of the reads over the specified regions were calculated using the alphabetFrequencyFromBam function of the GenomicAlignments package for R. Position weight matrix (PWM) was constructed from the nucleotide frequency matrix using the makePWM function of the SeqLogo R package, and consensus sequences were derived from PWM objects to represent CCS and barcode sequences.

Sequencing of barcoded rabies virus libraries

Rabies genomes were exacted by NucleoSpin® RNA Virus (740956.50, Takara), RT-PCR was performed with AccuScript PfuUltra II RT-PCR Kit (600184, Agilent) and a customized RT-primer including unique molecular identifier (UMI) sequence, named Adaptor_UMI_N15. The UMI-based counting was used for analyzing rabies barcode diversity.


The redundant Adaptor_UMI_N15 primers in cDNA sample were removed using NucleoSpin® Gel and PCR Clean-Up kit with the ratio NTI solution 1:4. The 217 bp of MiSeq sample was prepared by PCR with a high-fidelity polymerase (Invitrogen™ Platinum™ SuperFi II Green PCR Master Mix, 12-369-010, USA). The PCR was performed as described below: Invitrogen™ Platinum™ SuperFi II Green PCR Master Mix: 12.5 µL; i5-anchor_CTAGCGCT_fp_56 (10 µM): 1.25 µL; i7_TTGGACTT_rp (10 µM):

1.25 µL; cDNA of rabies genome: 1 µL (>100 ng/µL); Nuclease-free water (NEB): 9 µL. The cycle conditions were: 1) 98℃, 30 seconds; 2) 98℃, 5 seconds; 3°60℃, 10 seconds; 4) 72℃, 3 seconds; 5) go to step 2-4, 16 times; 6) 72 ℃, 5 minutes.

The pair of primers used are listed below:



After running the product on a 3% low melting agarose gel and the target bands (217bp) were extracted, the samples were cleaned with Unclasping® Gel and PCR Clean-Up kit. The purified products were sequenced on MiSeq in MIT BioMicro Center with sequencing primers below:



In situ sequencing – library preparation

In situ sequencing was performed as previously described for BARseq(Sun et al., 2021; Chen et al., 2022a) and a detailed step-by-step protocol is provided at ( Briefly, brain sections were fixed in 4% paraformaldehyde in PBS for 1 hour, dehydrated through 70%, 85%, and 100% ethanol, and incubated in 100% ethanol for 1.5 hours at 4°C. After rehydration in PBST (PBS and 0.5% Tween-20), we incubate in the reverse transcription mix (RT primers, 20 U/μL RevertAid H Minus M-MuLV reverse transcriptase, 500 μM dNTP, 0.2 μg/μL BSA, 1 U/μL RiboLock RNase Inhibitor, 1× RevertAid RT buffer) at 37°C overnight. On the second day, we crosslink cDNA using BS(PEG)9 (40 μL in 160 μL PBST) for 1 hour, neutralize the remaining crosslinker with 1M Tris pH 8.0 for 30 mins, and wash with PBST. We then incubate in non-gap-filling ligation mix (1× Ampligase buffer, padlock probe mix for endogenous genes, 0.5 U/μL Ampligase, 0.4 U/μL RNase H, 1 U/μL RiboLock RNase Inhibitor, additional 50 mM KCl, 20% formamide) for 30 mins at 37°C and 45 mins at 45°C, followed by incubating in the gap-filling ligation mix [same as the non-gap-filling mix with the rabies barcode padlock probe (XCAI5) as the only padlock probe, and with 50 μM dNTP, 0.2 U/μL Phusion DNA polymerase, and 5% glycerol] for 5 mins at 37°C and 45 mins at 45°C. After incubation, we wash the sample with PBST, hybridize with 1 μM RCA primer (XC1417) for 10 mins in 2× SSC with10% formamide, wash twice in 2× SSC with10% formamide and twice in PBST, then incubate in the RCA mix (1 U/μL phi29 DNA polymerase, 1x phi29 polymerase buffer, 0.25 mM dNTP, 0.2 μg/μL BSA, 5% glycerol (extra of those from the enzymes), 125 μM aminoallyl dUTP) overnight at room temperature. On the third day we crosslink and neutralize additional crosslinkers as performed on the second day.

In the reverse transcription mix, the RT primers included 50 μM (final concentration) random primer (XC2757) and 2 μM each of primers against the rabies barcode flanking regions (XCAI6 and XCAI7). For the monosynaptic tracing experiments, the RT primers additionally included 2 μM of primers for the rabies glycoprotein (XCAI63 – XCAI76). In the retrograde tracing experiments, the non-gap-filling padlock probe mix included all padlock probes for endogenous genes. In the monosynaptic tracing experiments, the non-gap-filling padlock probe mix included all padlock probes for endogenous genes except Slc30a3, and additionally included padlocks for the rabies glycoprotein (XCAI77 – XCAI90).

In situ sequencing – sequencing and imaging

Sequencing was performed using Illumina MiSeq Reagent Nano Kit v2 (300-cycles) and imaged on a Nikon Ti2-E microscope with Crest xlight v3 spinning disk confocal, photometrics Kinetix camera, and Lumencor Celesta laser. All images were taken with a Nikon CFI S Plan Fluor LWD 20XC objective. For each sample, we first sequence seven sequencing cycles for genes, followed by one hybridization cycle, followed by 15 sequencing cycles for barcodes. In each imaging round, we took a z-stack of 21 images centered around the tissue with 1.5 µm step size at each field of view (FOV). Adjacent FOVs had an overlap of 24%. Detailed list of filters and lasers used for each imaging channel are listed in Table S4.

Detailed sequencing protocols are available at (, and a brief description is provided here.

For sequencing the first gene cycles, we hybridized sequencing primer (YS220) in 2× SSC with10% formamide for 10 mins at room temperature, wash twice in 2× SSC with10% formamide and twice in PBS2T (PBS with 2% Tween-20). We then incubate the sample in MiSeq Incorporation buffer at 60°C for 3 mins, wash in PBS2T, then incubate the sample in Iodoacetamide (9.3 mg vial in 2.5 mL PBS2T) at 60°C for 3 mins, wash once in PBS2T and twice in Incorporation buffer, incubate twice in IMS at 60°C for 3 mins, wash four times with PBS2t at 60°C for 3 mins, then wash and image in SRE.

For sequencing the first barcode cycle, we first strip the sequenced products with three 10 min incubations at 60°C in 2× SSC with 60% formamide. We then wash with 2× SSC with10% formamide, hybridize the barcode sequencing primer (XCAI5) and proceed with sequencing in the same way as for the first gene cycle.

For subsequent sequencing cycles for both genes and barcodes, we wash twice in Incorporation buffer, incubate twice in CMS at 60°C for 3 mins, wash twice in Incorporation buffer, then proceed with incubation with iodoacetamide as performed for the first sequencing cycle.

For hybridization cycles, we strip the sequenced products with three 10 min incubations at 60°C in 2× SSC with 60% formamide. We then wash with 2× SSC with10% formamide, hybridize the hybridization probes (XC2758, XC2759, XC2760, YS221) for 10 mins at room temperature, then wash twice with 2× SSC with10% formamide, and incubate in PBST with 0.002 mg/mL DAPI for 5 mins. We then change to SRE and image.

In situ sequencing – data processing

In situ sequencing data was processed in MATLAB using custom scripts (see Data and Code Availability). (Chen et al., 2022a)In general, we processed each FOV separately, and only combined data from FOV after extracting rolony-level and cell-level data. We denoised the max projection images from each imaging stack using noise2void (Krull et al., 2018), fixed x-y shifts between channels, and corrected channel bleed through. We then registered all gene sequencing cycles to the first gene sequencing cycle using the sum of the signals from all four sequencing channels; we registered the hybridization cycle to the first gene sequencing cycle using the TxRed channel, which visualized all the sequenced gene rolonies; we registered all barcode sequencing cycles to the first barcode sequencing cycle using the sum of sequencing signals; and we registered the first barcode sequencing cycle to the first gene sequencing cycle using the brightfield image. We then performed cell segmentation with Cellpose (Stringer et al., 2020), using the sum of all four hybridization channels, which visualize all gene rolonies, as cytoplasmic signals and the DAPI channel in hybridization cycle as nuclear signals. Gene rolonies are decoded using Bardensr (Chen et al., 2021). Barcode rolonies were first identified by picking peaks in the first barcode cycle using both prominence and intensity thresholds. We then called the nucleotide at each cycle as the channel with the strongest signal of the four sequencing channels. Within each cell, barcodes within 2 mismatches away from other barcodes are error-corrected to the most abundant barcodes within the same cell. All FOVs were stitched using ImageJ to find the transformation matrix from each FOV to each slice. These transformation matrices were then applied to the position of cells and rolonies to find their positions in each slice. Cells that were imaged multiple times in the overlapping regions between neighboring FOVS were removed using an approach based on AABB collision detection (Baraff and Witkin, 1992).

We filtered out neurons with fewer than 20 counts/cell and 5 genes/cell, then clustered the cells in an iterative approach using Louvain clustering as described previously (Chen et al., 2022a). Clusters were mapped to reference scRNA-seq data(Tasic et al., 2018) using SingleR(Aran et al., 2019). We registered slices to the Allen CCF v3 using a manual procedure described previously (Chen et al., 2022a) based on QuickNII and Visualign(Puchades et al., 2019).

Multiplexed retrograde tracing analysis

To estimate an upper bound of the false positive rate for the retrograde tracing experiment using scRNA-seq, we used the definition of false positive rate: FPR = FP/(FP + TN), where FP is the number of false positive detections and TN is the number of true negative detections. Because L5 ET and L6 CT neurons are the main neuronal types that project to the LGd (Harris and Shepherd, 2015; Tasic et al., 2018), we defined TN as all sequenced inhibitory neurons, IT neurons and L5 NP neurons that did not project to the LGd (24 IT neurons, 1 L5 NP neurons, and 7 inhibitory neurons; total 32 neurons), and FP as those that did project to the LGd (0 neurons). We did not include L6b neurons, because some L6b neurons form a continuum with L6 CT neurons in transcriptomics and morphology. It is thus unclear whether L6b neurons could project to the thalamus. Since no FP was identified in this dataset, the true FP likely lies between 0 and 1. We thus estimate that the upper bound of FPR as (1 + FP)/(FP + TN) = 1/32, or 3.1%.

The analysis of the in situ sequencing experiment was performed in MATLAB. Cells were considered barcoded if it contained at least six reads of the same barcode, and the barcodes were matched to the two sequencing libraries allowing one mismatch. To analyze the spatial patterns of the somas of projection neurons, we first divided the cortex into 267 “cubelets.” On each slice, the cubelets were drawn so that they spanned all layers of the cortex and had roughly similar width (∼100 µm) along the outer surface of the cortex along the mediolateral axis. These cubelets largely covered cortical areas VISp, VISl, ECT, and TEa. We did not draw over RSP because the curvature of the cortex at RSP makes it difficult to define cubelets in this area. In each cubelet c, we denoted the projection probability for each cell type t as Pct. Let M = meancEcubelets,tEtypes(Pct) be the mean projection probability across all cell types and cubelets, and TV = ∑cEcubelets,tEtypes(PcM)2 be the total variance. We then computed the variance explained by areas, subclasses, types, and types with areas as follows:

To calculate the variance explained by types and shuffled areas, we first shuffled the area labels of all 267 cubelets, then followed the equations about for VEtype,area.

Barcoded transsynaptic labeling analysis

For the scRNA-seq based transsynaptic labeling experiment, barcode-level analyses, including Hamming distance and matching to barcode library, were performed in MATLAB.

For the in situ sequencing experiments, we filtered barcodes based on linguistic sequence complexity(Trifonov, 1990) and barcode counts per cell. For a barcode with length M, the linguistic sequence complexity C is defined as the product of the vocabulary-usage measures of n-grams, Un, for all possible ns:

Where Un is defined as the ratio between the number of unique n-grams that was observed and the minimum between all possible n-grams and the total number of n-grams in that sequence. We first curated barcoded cells manually to determine appropriate thresholds. We generated 100 × 100 pixels crops of images of all barcode sequencing cycles, the first gene sequencing cycle, the hybridization cycle channels, and cell segmentation mask for 262 cells with varying barcode counts per cell and barcode complexity. The images were visually inspected to determine whether a barcode with the called barcode is present in the cell to determine the thresholds for complexity and barcode counts that removed wrong barcodes. These barcodes were dominated by incomplete cells at the borders of an FOV (low complexity), autofluorescence background (low complexity), and wrong basecalls due to overlapping barcode rolonies (low counts). To determine the source cells, we additionally required cells to have at least two read counts of the rabies glycoprotein. This threshold was chosen so that most glycoprotein positive cells were around the injection sites and were concentrated in layer 5, which are enriched in corticostriatal projection neurons.

To estimate the fraction of barcodes that were observed in source cells, we only focused on barcodes that had sufficient reads per cell (≥10 reads per cell) and were in networks that were large enough to exclude most no-source networks and lost-source networks (≥ 12 cells per barcode). We then asked what fraction of those barcodes were found in source cells, which were defined by having a barcode with at least X reads per cell. Because we knew based on manual inspection that all barcodes with at least ten reads per cell were real, but only some barcodes with between three to ten reads per cell were real, we reasoned that thresholding at X =10 would likely underestimate the number of barcodes that were found in source cells, whereas thresholding at X =3 would overestimate the number of barcodes that were found in source cells. Estimates using these two thresholds thus provided the upper and lower bound of the true rate at which barcodes were observed in source cells.

To estimate the number of independent infection events that achieved the number of barcodes in source cells observed in the experiment, we calculated the posterior probability, P(N|M), of using N independent infection events to achieve 𝑀 = 59 unique barcodes using Bayes’ theorem:

Where P(M|N) is the probability of drawing M unique barcodes after N infections based on the barcode frequency in the virus library, P(N) is the probability of having N infections (uniform distribution), and P(M) is the probability of drawing M unique barcodes overall:

Where 𝐶 = 126 is the number of unique barcode-source cell combinations that was observed.

To find biases in converging outputs between cell types, we first removed all barcodes with fewer than five cells, which resulted in 153 barcodes. This lower boundary allowed us to filter out most no-source networks. We then estimated the number of independent infection events and the maximum barcode frequency to achieve 95% single infection. We randomly drew barcodes from the barcode library based on their frequencies until we obtained the same number of barcodes as in the in situ dataset. Because the library was sequenced only to 82%, we considered the remaining 18% of barcodes to be all non-repetitive for the purpose of this analysis. We repeated this process 10,000 times, and used the median numbers as estimate for the total number of infection events and the barcode frequency threshold for the subsequent analysis. For each barcode, we considered all pair-wise combinations of cells that share that barcode to have converging outputs, and accumulated the number of converging outputs across all barcodes for each combination of cortical subclasses. To estimate the biases compared to the expected number of converging outputs, we calculated the expected number of converging outputs as the square of cell counts for each cortical subclass, normalized so that the total number of connections is the same as the data. To estimate p values, we shuffled the identity of connected cells and recalculated the number of converging outputs for each pairs of cortical subclasses. We repeated this process 10,000 times, and used two times the fraction of iterations with more extreme values of converging outputs as the p values.

Estimating the barcode frequency threshold required knowing the distribution of barcodes in the barcode library. Because the library sequencing was very shallow (most UMIs had a single read), for barcodes with very few UMIs, we could not distinguish whether those barcodes were real barcode/UMI pairs or whether they resulted from PCR errors (Kebschull and Zador, 2015). Furthermore, many barcodes that had low frequencies in the library (which were ideal for unique labeling) were likely missed from the barcode sequencing. Thus, we could not get an accurate estimate of the distribution for the rare barcodes. However, since double labeling was more likely to occur to barcodes that were more abundant, we only needed to know the distribution of abundant barcodes to find a frequency threshold that can ensure a small number of double labeling events. Thus, we thus used the most abundant 1,820 barcodes (out of 13,211 barcodes observed) in the RVΔG-4mCherry_CCS2_20nt_HM (EnvA) library when estimating the frequency threshold; these 1,820 barcodes each had at least 21 UMI counts and constituted 81.9% of all barcode molecules in this library.

Detailed analysis scripts are provided on Mendeley Data (see Data and code availability).

Data and code availability

Single-cell sequencing data were deposited at the Neuroscience Multi-omic Data Archive (NeMO). Sequences of the barcoded rabies library were deposited at Sequence Read Archive (SRA SRR23310758, SRR23310757, and SRR23310756). Processed in situ sequencing data, processing scripts, and analysis scripts (for both in situ experiments and single-cell experiments) are deposited at Mendeley Data (preview link: Max projection images of raw in situ sequencing data are deposited at the Brain Image Library (BIL).


The authors would like to thank Yoav Ben-Simon, Ian Peikon, Justus Kebschull, Anthony Zador, Mara Rue for discussion. This work was supported by the National Institute of Health (1DP2MH132940 to X.C.) and by seed grants from the James W. and Patricia T. Poitras Fund and the Charles S. Camplan Fund to I.W. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Competing interests

I.R.W. is a consultant for Monosynaptix, LLC, advising on design of neuroscientific experiments. The other authors declare no competing interests.

Author contributions

B.T., I.R.W., and X.C. conceived the study. L.J., M.M. H.S., I.R.W. constructed barcoded rabies virus libraries. A.Z., L.J., S.Y., H.S. collected sequencing data. A.Z., S.Y., C.V., N.S., M.K., B.T., I.R.W., X.C. analyzed the data. L.J., S.Y., B.T., I.R.W., X.C. wrote the paper.

Supplementary Note

We first tested whether scRNA-seq was sufficient to map single-cell connectivity using barcoded transsynaptic labeling. We first injected a Cre-dependent helper AAV virus combination (Liu et al., 2017; Lavin et al., 2019; Lavin et al., 2020) to express TVA (co-expressed with EGFP) and the rabies glycoprotein (co-expressed with the blue fluorophore mTagBFP2) in Cre-defined populations of cells in VISp (three Sst-IRES-Cre animals and one Vip-IRES-Cre animal). After 14 days, we injected a barcoded ΔG rabies virus library expressing mCherry and pseudotyped with EnvA for selective infection of TVA-expressing neurons. This library contained the same barcodes as the CCS-containing library used for retrograde labeling, with at least 13,211 barcodes. After another seven days, we dissected out the visual cortex, isolated both mTagBFP2+/mCherry+ neurons and mTagBFP2-/mCherry+ neurons using fluorescence-activated cell sorting (FACS), and performed scRNA-seq on both populations using SMART-Seq v4 (total 295 cells from four animals; Table 1; Supp Fig. S4A). After quality control, we identified 195 potential presynaptic neurons (defined as expressing rabies viral genes, but not transcripts from both helper AAV-viruses) and 37 potential source neurons (defined as expressing both helper AAV transcripts and the other rabies viral genes) (Supp Fig. S4B). Consistent with the Cre lines used, the source neurons were predominantly Sst neurons in the three Sst-IRES-Cre animals, and predominantly Vip neurons in the Vip-IRES-Cre animal (Supp Fig. S4C).

To assess the accuracy of barcode sequencing using scRNA-seq, we compared the distribution of the minimum Hamming distance between barcode pairs in our dataset or a random set of 20-nt barcodes of similar size (Supp Fig. S4D). Whereas no random barcodes had minimum Hamming distance of 5 or less, the distribution of minimum Hamming distance for barcodes had two peaks at 1 and 4, respectively. These two peaks likely correspond to sequencing errors and errors resulting from having multiple barcodes per cell. We thus corrected barcodes by allowing up to five mismatches (Methods).

We then identified and removed double-labeled networks from the barcoded cells. Double-labeled networks should be largely restricted to barcodes that were abundant in the virus library. We thus estimated the prevalence of these networks based on barcode distribution across different animals: if a barcode was found in multiple animals, then it would be unlikely that it labeled a single source cell in one of the animals. Of 20 unique barcodes in all cells, six barcodes were found in only one animal, whereas eight of the 14 remaining barcodes were found in all four animals (Supp Fig. S4E). Consistent with the hypothesis that these barcodes were abundant in the virus library, the frequencies of barcodes found in all four animals ranged from 0.27% to 0.03% (median 0.08%), whereas the frequencies of barcodes that were found in only one animal were all < 0.01% with median at 4×10-5 (p = 7×10-4 compared to the frequency of barcodes that were found in all animals using rank sum test; Supp Fig. S4F). Of the six barcodes that were found in only one animal, five were found in a single neuron each, and the remaining barcode was found only in one source neuron and one non-source (i.e., not expressing G) neuron. Thus, after removing barcodes likely to have resulted in double-labeled networks, the remaining barcoded cells were too few to infer connections in these four animals. We attributed this to cell loss from dissociation and FACS that is inherent to the scRNA-seq pipeline.

Supplementary Figures

Quality control of scRNA-seq in rabies virus-infected cells.

(A) Quality control plots of scRNA-seq for all animals used in both multiplexed retrograde labeling experiments and barcoded transsynaptic labeling experiments. (B) Cluster mapping confidence for rabies barcoded neurons. Boxes indicate quartiles and medians, and whiskers indicate range of data excluding outliers. Outliers are shown as individual dots.

The expression of immune response-related genes in rabies virus-infected cells

(A) Volcano plots showing differential expression of genes in clusters with more than 10 rabies infected cells across all six animals. (B) The expression of select immune response related genes in uninfected cells from (Tasic et al., 2018), AAV-infected cells from (Graybuck et al., 2021), and rabies infected cells of matching cell types from this study.

The expression of activity-related genes in rabies virus-infected cells and uninfected cells from (Tasic et al., 2018)

scRNA-seq is insufficient to resolve connectivity among transsynaptically labeled neurons using barcoded rabies virus.

(A) Outline of the experiment. (B) The expression of rabies encoded genes (top) and AAV-encoded genes (bottom) in sequenced cells. The count for each gene was scaled by the sum of all viral reads multiplied by 10,000. Natural logarithms of resulting scaled counts, ln(scaled counts+1), were used to represent expression levels of virally encoded genes. The bar on the top indicates whether a cells is considered to have AAV or not. The second bar on top indicates animals. (C) The number of source cells (left) and presynaptic cells (right) of each cell type (rows) from each animal (columns). Colors indicate cluster identity and dot size indicates number of cells. (D) The minimum Hamming distance between each barcode and all other barcodes in the pool of sequenced barcodes (solid line) or random barcodes (dashed line). (E) The number of barcodes that were found in the indicated number of animals. (F) Box plots showing the library frequency of barcodes found in one or four animals. All barcodes are indicated by dots. The boxes show median and quartiles and the whiskers indicate range. BC: Barcode.

Barcoded transsynaptic labeling resolved by in situ sequencing

(A) Locations of source cells (red dots) relative to cortical layers, which are color coded as indicated. (B) Fraction of presynaptic cell barcodes that were also found in source cells (y axis) when thresholding barcodes in source cells at the indicated reads per cell (x axis).

Presynaptic cells and source cells in all single-source networks

In each plot, a source cell (red cross) and presynaptic cells (dots) that shared the same barcodes were plotted. Colors of dots indicate transcriptomic types of presynaptic neurons. Transcriptomic types of source cells are indicated below each plot. All other cells from the coronal sections that the source cells were on were plotted in gray.

See SuppTableS1.xlsx

Table S1.Metadata of animals used in this study

See SuppTableS2.xlsx

Table S2.Viruses used in this study

see SuppTableS3.xlsx

Table S3. List of oligos used

List of filters and lasers used for in situ sequencing