Comparative analysis of in situ gene expression profiling technologies.

(a) Mouse brain sections from six publicly available datasets. Cells are colored by broad annotation classes derived from scRNA-seq label transfer (Supplementary Methods). (b) Overview of technical quality control metrics for each of the six datasets. (c) UpSet plot showing the overlap of targeted genes across methods. Seven genes were included in the target panel for all six methods. (d) UMAP visualization of each dataset. Cells are colored by broad annotation class as in (a). (e) Schematic overview of the comparative analysis. (Left) We obtained molecular spot calls, cell segmentations and gene expression quantifications from the original dataset generators. (Right) We used these data to assess sensitivity, specificity, cell type identification, and spatial differential expression analysis.

Variation in specificity across in situ technologies

(a) Heatmaps displaying marker gene expression (Supplementary Methods) for each of five major cell classes. Each heatmap shows 50 randomly sampled cells per class, with a threshold of five molecular counts for each gene. While most technologies exhibit robust marker gene expression in the correct class, we also observe evidence for non-specific expression. (b) Pseudobulk expression plots comparing average molecular counts (within astrocytes) for each in situ technology with an scRNA-seq reference. (c) ‘Barnyard’ plots showing the expression (counts) of two mutually exclusive genes, Slc17a7 (excitatory neuron marker) and Gfap (astrocyte marker), in each dataset. (d) Mutually exclusive co-expression rate (MECR) for all six in situ methods. Boxplots display the range of MECR values for all pairs of mutually exclusive genes in each dataset (Supplementary Methods). For reference, scRNA-seq datasets generated using 10x Genomics (Linnarsson) and SMART-Seq2 (Allen) are also shown (Supplementary Figure 4b).
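As an illustration of the kind of quantity summarized in panel (d), the following is a minimal sketch of a co-expression rate for one pair of mutually exclusive genes. The exact MECR definition is given in the Supplementary Methods and is not reproduced in this legend; the sketch assumes the rate is the fraction of cells expressing both genes among cells expressing at least one, with a detection threshold of one count. The function name `pair_coexpression_rate`, the `min_count` parameter, and the toy counts are hypothetical.

```python
def pair_coexpression_rate(counts_a, counts_b, min_count=1):
    """Fraction of cells expressing both genes among cells expressing at least one.

    ASSUMPTION: this normalization and the detection threshold (min_count)
    are illustrative; the paper's definition is in its Supplementary Methods.
    """
    both = 0
    either = 0
    for a, b in zip(counts_a, counts_b):
        has_a = a >= min_count
        has_b = b >= min_count
        if has_a or has_b:
            either += 1
            if has_a and has_b:
                both += 1
    # Undefined when neither gene is detected in any cell.
    return both / either if either else float("nan")


# Hypothetical per-cell counts for two mutually exclusive markers.
slc17a7 = [5, 0, 3, 0, 2]
gfap = [0, 4, 1, 0, 0]
rate = pair_coexpression_rate(slc17a7, gfap)  # 1 co-expressing cell out of 4 expressing cells
```

Under this assumed definition, a lower rate indicates higher specificity, since mutually exclusive markers should rarely be detected in the same cell.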

Segmentation size and quality affect molecular sensitivity and specificity

(a) Author-provided cell segmentations for the Xenium and MERSCOPE datasets, as representative examples. In these plots, cell borders are colored red, cell interiors are filled in blue, and all detected molecules are overlaid in white. Three representative regions are shown: the dentate gyrus (outlined in green), the lateral ventricle (outlined in orange), and a section of the cerebral cortex (outlined in blue). (b) Zoomed-in regions from the cerebral cortex. Cell boundaries (author-provided) are shown in white. Molecular assignments are provided by Baysor, and molecules are colored by the cell they are assigned to. Unassigned molecules are shown in white. (c) Relationship between dataset MECR (x-axis) and the fraction of detected molecules assigned to cells (y-axis). (d) ‘Barnyard’ plots showing the expression (counts) of two mutually exclusive genes, Slc17a7 (excitatory neuron marker) and Gad1 (inhibitory neuron marker), in each dataset, using either author-provided or Baysor segmentations from the same region of the cerebral cortex. (e) For each dataset, we assigned molecules using Baysor at ten different molecular assignment stringency thresholds. Varying this threshold enables us to calculate sensitivity (average total molecules per cell; y-axis) as a function of specificity (MECR; x-axis).

Non-specific molecular assignments confound spatial differential expression analysis

(a) We performed differential expression analysis between astrocytes located in the cortex (blue) and astrocytes located in the thalamus (red). The Xenium dataset is shown. (b) Volcano plot of differential expression results between thalamic and cortical annotated astrocytes in the Xenium dataset, based on author-provided segmentations. (c) Dot plot illustrating the expression levels of DE genes from (b) in an scRNA-seq dataset, across cell types identified from manually dissected cortical and thalamic samples. In many cases, the identified DE genes are neuronal markers, suggesting that the DE result (obtained from astrocytes) was a result of non-specific molecular expression. (d) Expression of Satb2 and Slc17a6 across neurons and astrocytes in the cortex and thalamus, quantified by Xenium and scRNA-seq. (e) Zoomed view of segmentations and marker gene molecules for cortical neurons (Satb2, Neurod6 and Lamp5), thalamic neurons (Slc17a6), and pan-region astrocytes (Aqp4 and Ntsr2) in representative cortical (blue) and thalamic (red) regions. Neuronal marker molecules are mis-assigned to astrocytes, confounding differential expression analysis. The segmentation background color denotes the annotated cell type: dark pink segmentations are cortical neurons, dark orange segmentations are thalamic neurons, dark green segmentations are astrocytes, and dark grey segmentations are assigned to one of the remaining cell types.

(a) Log-transformed pseudobulk expression for all genes. Plots are shown for technologies where replicates (i.e. two adjacent tissue slices) are available. (b) The average molecular count detected per cell (using author-provided segmentations) does not show a relationship with the total number of genes targeted. (c) Distribution of gene expression quantiles, derived from the Linnarsson scRNA-seq dataset, for each gene that is targeted in each panel.

(a) Heatmap displaying gene expression of marker genes for 250 total sampled cells from 5 major cell types. Same as Figure 2a but for the Linnarsson scRNA-seq dataset. Raw counts thresholded at 5 are displayed. (b) Slc17a7 and Gfap counts in cells from the Linnarsson Reference and Allen cortex dataset are shown. Both are scRNA-seq datasets, and in contrast to in situ technologies (Figure 2c), exhibit minimal evidence of mutual co-expression of these markers.

(a) UMAP visualization of each dataset. Same as in Figure 1d, but cells are colored by higher resolution annotations, derived from scRNA-seq label transfer. (b) A rank-sorted comparison of the expression levels of gene targeting probes (in black) and negative targeting control probes (in red). A truncated subplot is provided to emphasize probes at lower expression levels, with the relevant sections of the rank plot highlighted in grey. (c) The ratio of the mean counts of negative targeting probes to gene targeting probes. A ratio value of 0.1 signifies that for every detected gene probe, there are 0.1 negative probes detected.
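The ratio in panel (c) can be sketched as follows. This is a minimal illustration, assuming the means are taken over per-probe total counts; the function name `negative_probe_ratio` and the toy counts are hypothetical and are not taken from any of the datasets.

```python
from statistics import mean


def negative_probe_ratio(gene_probe_counts, negative_probe_counts):
    """Mean negative-control probe count divided by mean gene probe count.

    ASSUMPTION: each list holds one total count per probe.
    """
    gene_mean = mean(gene_probe_counts)
    if gene_mean == 0:
        raise ValueError("mean gene probe count is zero; ratio is undefined")
    return mean(negative_probe_counts) / gene_mean


# Hypothetical per-probe total counts (illustrative values only).
gene_counts = [120, 80, 100]   # mean of 100
negative_counts = [12, 8, 10]  # mean of 10
ratio = negative_probe_ratio(gene_counts, negative_counts)
```

With these toy values the ratio is 0.1, which matches the interpretation given in the legend: 0.1 negative probe detections for every gene probe detection.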

(a) Sensitivity versus specificity curves shown for multiple experimental replicates. Same as Figure 3e, but multiple replicates are shown when available for the same technology. (b) Same as panel (a), but the mean is computed after thresholding the maximum counts for each gene to 5. This reduces the effect of highly expressed outliers and limits the effect that a single gene can have on this metric. Technologies that use large panels, like MERFISH, exhibit a larger relative improvement compared with the absence of thresholding than technologies that use small panels, such as Molecular Cartography. (c) Mean counts for genes targeted by both MERSCOPE and Xenium versus MECR. There are 27 genes targeted by both panels. (d) Same as (c), but Molecular Cartography is shown instead of Xenium. There are 16 genes targeted by both panels. (e) Same as (c), but MERFISH is shown instead of Xenium. There are 142 genes targeted by both panels. (f) Same as (c), but EEL FISH is shown instead of Xenium. There are 53 genes targeted by both panels. For panels (c) through (e), Slc17a7 is excluded because it is an extremely highly detected outlier gene.
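The thresholding described in panel (b) can be illustrated with a short sketch. It assumes that sensitivity is the average total molecules per cell (as in Figure 3e) and that each gene's count within a cell is capped at 5 before summing. The function name `mean_counts_per_cell`, the `cap` parameter, and the toy matrix are hypothetical.

```python
def mean_counts_per_cell(cells, cap=None):
    """Average total molecules per cell, optionally capping each gene's count.

    `cells` is a list of per-cell lists of gene counts.
    ASSUMPTION: the cap is applied per gene within each cell before summing.
    """
    totals = []
    for gene_counts in cells:
        if cap is not None:
            gene_counts = [min(c, cap) for c in gene_counts]
        totals.append(sum(gene_counts))
    return sum(totals) / len(totals)


# Hypothetical cells-by-genes counts; the first cell has one outlier gene (40).
cells = [
    [0, 2, 40],
    [1, 7, 3],
]
uncapped = mean_counts_per_cell(cells)       # (42 + 11) / 2
capped = mean_counts_per_cell(cells, cap=5)  # (7 + 9) / 2
```

In this toy example the single outlier gene dominates the uncapped mean, while the capped mean reflects how many genes are detected at moderate levels in each cell.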

(a) Volcano plot displaying differentially expressed genes between cortical astrocytes and thalamic astrocytes in the MERSCOPE dataset using author-provided segmentations. (b) Volcano plot displaying differentially expressed genes between cortical astrocytes and thalamic astrocytes in the Xenium dataset using Baysor segmentations. (c) Volcano plot displaying differentially expressed genes between cortical astrocytes and thalamic astrocytes in the MERSCOPE dataset using Baysor segmentations. (d) Expression of Slc17a6 in neurons and astrocytes from the thalamus and cortex in the MERSCOPE dataset using author-provided segmentations. (e) Expression of Satb2 and Slc17a6 in neurons and astrocytes from the thalamus and cortex in the Xenium dataset using Baysor segmentations. (f) Expression of Slc17a6 in neurons and astrocytes from the thalamus and cortex in the MERSCOPE dataset using Baysor segmentations. Satb2 is not shown in (d) and (f) because it is not included in the MERSCOPE panel.