Figures and data

Gene specificity landscape framework.
a) Expression levels of three representative genes across samples. b) Expression level (L; x-axis) versus expression breadth (B; y-axis) lines (L-B lines) of the three representative genes. Expression levels are divided into 50 bins. c) AUC, dRate and lbSpec scores of the three genes. Scores are max-normalized over all genes. d) Construction of the 2D landscape and score visualization. Scores are mapped to a color gradient from dark (low scores) to bright (high scores). e-g) Transcriptomic reference sources considered: human tissue profiles from bulk RNAseq data (e); human cell type pseudobulk profiles from single-cell RNAseq (scRNAseq) data (f); primate brain cell type profiles derived from scRNAseq data (g).
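The L-B line and its AUC can be illustrated with a minimal Python sketch. It rests on assumptions the legend does not state: B(L) is taken as the fraction of samples whose expression exceeds the threshold L, the 50 thresholds are evenly spaced between zero and the maximum expression level, and AUC is the trapezoidal area under B(L) with the L axis scaled to [0, 1]. The function names are hypothetical, and dRate and lbSpec are not reproduced because the legend does not define them precisely.

```python
from typing import List, Optional, Sequence


def lb_line(expr: Sequence[float], n_bins: int = 50,
            max_level: Optional[float] = None) -> List[float]:
    """Expression breadth B(L) at n_bins evenly spaced levels L (assumed definition).

    B(L) is taken to be the fraction of samples with expression strictly above L;
    levels run from 0 to max_level (default: the gene's own maximum). n_bins >= 2.
    """
    top = max(expr) if max_level is None else max_level
    if top <= 0:
        return [0.0] * n_bins  # gene not expressed in any sample
    n_samples = len(expr)
    breadth = []
    for i in range(n_bins):
        level = top * i / (n_bins - 1)
        breadth.append(sum(1 for x in expr if x > level) / n_samples)
    return breadth


def lb_auc(breadth: Sequence[float]) -> float:
    """Trapezoidal area under B(L), with the L axis rescaled to [0, 1]."""
    step = 1.0 / (len(breadth) - 1)
    return sum((a + b) / 2.0 * step for a, b in zip(breadth, breadth[1:]))


def max_normalize(scores: Sequence[float]) -> List[float]:
    """Divide each gene's score by the maximum score over all genes."""
    top = max(scores)
    return [s / top for s in scores]


if __name__ == "__main__":
    ubiquitous = [10.0] * 10          # same level in all 10 samples
    specific = [10.0] + [0.0] * 9     # expressed in a single sample
    aucs = [lb_auc(lb_line(g)) for g in (ubiquitous, specific)]
    print(max_normalize(aucs))
```

Under these assumptions a ubiquitous gene scores close to the maximum, while a gene restricted to one of ten samples scores roughly a tenth of it, in line with high AUC marking broad expression.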

Gene tissue-specificity landscapes.
a) Human tissue transcriptomic reference used as input. b) 2D gene specificity landscape of 18,848 genes. Data points represent individual genes. Scores are visualized using a color gradient from dark (low score) to bright (high score). c) Normalized expression of selected genes across tissues. The top three genes were chosen for their high AUC and the bottom three for their high lbSpec. Expression of each gene was divided by its maximum value. The barplot on the right shows the highest absolute expression per gene. d) Corresponding L-B lines of the selected genes. e) Enrichment analysis of specificity scores for HGNC gene groups with at least 10 genes. The top 10 groups per score are shown. f) 2D gene specificity landscapes highlighting genes with low tissue specificity (left) or tissue-enriched genes (right), as defined by the Human Protein Atlas (HPA). The color gradient represents lbSpec scores. The boxplot inset shows lbSpec scores for the genes in the two groups. g) L-B lines of the tissue specificity groups defined by the HPA. The dashed grey line represents the random expectation. h) Enrichment analysis of L-B specificity metrics on HPA specificity groups.
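The legend does not state which test underlies the group enrichment in panel e. As one possible illustration, the sketch below scores each HGNC-style group by a rank-based AUROC (the Mann-Whitney U statistic divided by the number of group/background gene pairs), keeps groups with at least 10 scored genes and returns the top 10. The choice of test and all function names are assumptions, not the method used for the figure.

```python
from typing import Dict, List, Sequence, Tuple


def rank_auroc(in_group: Sequence[float], background: Sequence[float]) -> float:
    """P(group score > background score), ties counted as 1/2 (U / (n1 * n2))."""
    wins = 0.0
    for a in in_group:          # quadratic, fine for a sketch
        for b in background:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(in_group) * len(background))


def top_enriched_groups(scores: Dict[str, float], groups: Dict[str, List[str]],
                        min_size: int = 10,
                        top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank groups with >= min_size scored genes by AUROC against all other genes."""
    results = []
    for name, members in groups.items():
        member_set = {g for g in members if g in scores}
        if len(member_set) < min_size:
            continue  # mirrors the "at least 10 genes" filter
        inside = [scores[g] for g in member_set]
        outside = [s for g, s in scores.items() if g not in member_set]
        if not outside:
            continue
        results.append((name, rank_auroc(inside, outside)))
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:top_n]
```

An AUROC of 1 means every group gene outscores every other gene; 0.5 is the random expectation.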

Cell type specificity landscape of GO biological processes.
a) Human cell type transcriptomic reference used as input. b) L-B line visualization of gene groups, using the highly specific “cilium movement” group as an example. The black line represents the L-B scores computed on the mean expression of the genes in the group. The dashed black lines represent the mean plus or minus the standard deviation of the L-B scores computed over the individual genes in the group. The grey line represents the random expectation for gene groups of the same size as the analysed group, and the dashed grey lines represent the mean plus or minus the standard deviation of the genes in the random groups. The corresponding 2D gene group-cell type specificity landscape of 5,368 Gene Ontology biological processes (GO:BP) is shown on the right. Each data point represents a gene group. The color gradient represents AUC scores, and the “cilium movement” group is highlighted. c) Cell type specificity landscape of GO biological processes colored by L-B specificity scores (color gradient from dark, low score, to bright, high score). d) L-B lines of three highly specific and three constitutive gene groups. Lines are colored by AUC value (black: low AUC; yellow: high AUC). The inset on the right shows the corresponding locations on the 2D landscape. e) Distribution of L-B specificity scores (y-axis) across gene categories. Each data point represents a gene group. L-B variability is defined across the genes within a group. f) Normalized average expression of selected gene groups across cell types. Expression of each gene group was divided by its maximum value. The barplot on the right shows the highest absolute expression value per gene group.
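The group line and its random expectation in panel b can be sketched as follows, again assuming B(L) is the fraction of samples above each of 50 evenly spaced thresholds. The group line is computed on the mean expression profile of the group's genes, and the expectation is the per-bin mean and standard deviation over random groups of equal size. Taking the standard deviation across random group lines is one reading of the legend; the number of random groups, the seed and the function names are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def lb_line(expr: Sequence[float], n_bins: int = 50) -> List[float]:
    """Assumed B(L): fraction of samples with expression above each of n_bins levels."""
    top = max(expr)
    if top <= 0:
        return [0.0] * n_bins
    levels = [top * i / (n_bins - 1) for i in range(n_bins)]
    return [sum(1 for x in expr if x > lv) / len(expr) for lv in levels]


def group_line(expr: Dict[str, List[float]], genes: Sequence[str],
               n_bins: int = 50) -> List[float]:
    """L-B line of the group's mean expression profile across samples."""
    n_samples = len(expr[genes[0]])
    mean_profile = [statistics.fmean(expr[g][j] for g in genes)
                    for j in range(n_samples)]
    return lb_line(mean_profile, n_bins)


def band(lines: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-bin mean and population standard deviation over a set of lines."""
    columns = list(zip(*lines))
    return ([statistics.fmean(c) for c in columns],
            [statistics.pstdev(c) for c in columns])


def random_expectation(expr: Dict[str, List[float]], group_size: int,
                       n_random: int = 100, n_bins: int = 50,
                       seed: int = 0) -> Tuple[List[float], List[float]]:
    """Mean and SD of the L-B lines of random gene groups of the given size."""
    rng = random.Random(seed)
    genes = sorted(expr)
    lines = [group_line(expr, rng.sample(genes, group_size), n_bins)
             for _ in range(n_random)]
    return band(lines)
```

In this sketch, the intergenic variability (dashed black lines) would correspond to `band([lb_line(expr[g]) for g in group])` for the genes of the analysed group.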

Expression specificity changes at increasing biological resolution.
a) L-B lines of representative gene groups across human cell types. ISBC: Intermembrane Space Bridging Complex. GPCR: G-Protein Coupled Receptor. Groups were selected from HGNC groups; the GABA and glutamatergic groups were manually defined using marker genes. b) Cell type specificity landscape of 18,782 genes. The representative gene groups are mapped onto the 2D landscape based on their L-B behavior across human cell types. c) Normalized average expression of the representative groups across human cell types. Expression of each gene group was divided by its maximum value. d) L-B lines of representative gene groups across human dorsolateral prefrontal cortex (dlPFC) neuron subtypes. e) Representative gene groups mapped onto the 2D cell type specificity landscape based on their L-B behavior across dlPFC neuron subtypes. f) Normalized average expression of representative groups across dlPFC neuron subtypes. Expression of each gene group was divided by its maximum value; glutamatergic neuron subtypes are shown on the left and GABAergic neuron subtypes on the right. g) Ranked neuronal specificity scores. Red points highlight the top-100 genes. h) Enrichment of known marker genes among high-scoring neuronal specificity genes (p-value, rank-based GSEA). i) Expression pattern of the top-100 neuron-specific genes across reference cell type profiles (HPA). j) L-B behavior of the top-100 neuron-specific genes in reference cell types (left) and across neuron subtypes (right). k) Neuron-specific genes mapped onto the reference cell type landscape. Black-circled dots represent neuron-specific gene behavior across neuron subtypes; red-circled dots represent neuron-specific gene behavior across reference cell types. Dot color represents specificity values, with lighter colors indicating higher specificity. l) Neuron-specific genes in the neuron subtype landscape. Dot color represents specificity values, with lighter colors indicating higher specificity.
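Panel h reports a rank-based GSEA p-value. To illustrate the idea, the sketch below computes the classic unweighted, Kolmogorov-Smirnov-like running-sum enrichment score of a marker set along a ranked gene list, with a one-sided permutation p-value obtained by shuffling the ranking. It is a simplified stand-in (no score weighting, no normalized enrichment score) and not necessarily the GSEA variant used; the names and defaults are hypothetical.

```python
import random
from typing import Sequence, Set, Tuple


def enrichment_score(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Signed maximum deviation of the unweighted running sum (KS-like ES)."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering it")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # step up on a marker gene, step down on any other gene
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(ranked_genes: Sequence[str], gene_set: Set[str],
                       n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Observed ES and a one-sided p-value for enrichment at the top of the ranking."""
    observed = enrichment_score(ranked_genes, gene_set)
    rng = random.Random(seed)
    shuffled = list(ranked_genes)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if enrichment_score(shuffled, gene_set) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)
```

Markers concentrated at the top of the ranking give an enrichment score near +1; markers at the bottom give a score near −1.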

Deviations in expression specificity among primates.
a) Phylogenetic tree of the four primate species considered (top). MYA: million years ago. The tree and divergence times were obtained from TimeTree (https://timetree.org/) [43]. Darker colors indicate a greater evolutionary distance to humans (shown in grey). Description of the snRNAseq data sources (bottom). b) Mean absolute expression breadth (B) deviation between humans and each of the other primate species. The horizontal dotted line marks the highest B deviation of chimpanzees, observed in inhibitory neurons, as a reference point. c) Pairwise correlation of lbSpec scores between primates. Darker colors indicate higher correlations. Correlations were computed separately for inhibitory neurons (left), excitatory neurons (middle) and glial cells (right). d) L-B lines of representative genes whose specificity behavior changes across species. L-B analysis was performed across neuronal subtypes.
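Panels b and c summarize cross-species comparisons. A minimal sketch, assuming each species provides one B line per gene computed on the same expression level bins, averages the absolute B difference over shared genes and bins, and computes pairwise Pearson correlations of per-gene scores such as lbSpec. The legend specifies neither the averaging order nor the correlation type, so both are assumptions, as are the function names.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def mean_abs_b_deviation(human: Dict[str, Sequence[float]],
                         other: Dict[str, Sequence[float]]) -> float:
    """Mean |B_human - B_other| over all bins of all genes present in both species."""
    shared = sorted(set(human) & set(other))
    diffs = [abs(h - o) for g in shared for h, o in zip(human[g], other[g])]
    return sum(diffs) / len(diffs)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_score_correlation(
        scores: Dict[str, Dict[str, float]]) -> Dict[Tuple[str, str], float]:
    """Pearson correlation of per-gene scores (e.g. lbSpec) for every species pair."""
    result = {}
    for a, b in combinations(sorted(scores), 2):
        genes = sorted(set(scores[a]) & set(scores[b]))
        result[(a, b)] = pearson([scores[a][g] for g in genes],
                                 [scores[b][g] for g in genes])
    return result
```

Running both functions separately per cell class (inhibitory, excitatory, glial) would mirror the panel layout.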

Expression breadth distribution and L-B specificity metrics.
a) Percentage of expressed genes (y-axis) with a given expression breadth (x-axis). From left to right, increasing TPM thresholds are used to define whether a gene is expressed. b) Schematic of the AUC, dRate and lbSpec calculations. Purple lines represent a low-scoring gene and orange lines a high-scoring gene. AUC (left): colored areas represent the area under the curve (AUC). dRate (middle): grey areas represent portions of the line with a constant expression level; these portions are not considered when computing the dRate score. Dashed lines represent the variation in expression level or expression breadth. lbSpec (right): horizontal dashed lines represent the expression level segment considered for each expression breadth value. Vertical dashed lines represent the expression level at which the expression breadth reaches zero.

L-B behavior of gene groups and robustness analyses.
a) L-B lines for ribosomal proteins (high), general transcription factors (general TFs; middle) and gamma-aminobutyric acid (GABA) receptors (low). b) 2D gene tissue specificity landscapes highlighting genes belonging to the ribosomal protein (left), general TF (middle) or GABA receptor (right) groups. Genes are colored by AUC score. c) Enrichment analysis of L-B specificity metrics for HPA specificity and detection gene categories defined across tissues (top) or cell types (bottom). d) Correlation between baseline score values and score values after sample removal (x-axis). AUC is shown in purple, Tau specificity in black, lbSpec in red and dRate in yellow. The line plot inset shows the average correlation between random removal replicates per removal fraction (%). The boxplot inset shows the correlation between random removal replicates across all sample removal percentages. e) Distribution of correlation values between the lbSpec and Tau metrics across all replicates and sample removal experiments. f) Distribution of correlation values across sample removal percentages.
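Panels d to f compare scores before and after random sample removal. The sketch below uses the Tau index, commonly defined as the sum over samples of 1 − x_i/max(x) divided by n − 1, as an example score. It removes a given fraction of samples at random and reports, per replicate, the Spearman correlation across genes between baseline and reduced scores. Tau is often computed on log-transformed expression, which is omitted here; the correlation type, replicate count, seed and function names are assumptions.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def tau(expr: Sequence[float]) -> float:
    """Tau index: sum(1 - x_i / max(x)) / (n - 1); 0 = uniform, 1 = single sample."""
    top = max(expr)
    if top <= 0 or len(expr) < 2:
        return 0.0
    return sum(1.0 - x / top for x in expr) / (len(expr) - 1)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def removal_robustness(expr: Dict[str, List[float]],
                       score_fn: Callable[[Sequence[float]], float],
                       fraction: float, n_reps: int = 10,
                       seed: int = 0) -> List[float]:
    """Per replicate: Spearman correlation between baseline scores and scores
    recomputed after randomly dropping `fraction` of the samples."""
    rng = random.Random(seed)
    genes = sorted(expr)
    baseline = [score_fn(expr[g]) for g in genes]
    n_samples = len(expr[genes[0]])
    n_keep = n_samples - round(fraction * n_samples)
    result = []
    for _ in range(n_reps):
        kept = sorted(rng.sample(range(n_samples), n_keep))
        reduced = [score_fn([expr[g][j] for j in kept]) for g in genes]
        result.append(spearman(baseline, reduced))
    return result
```

The same loop accepts any per-gene scoring function, so AUC, dRate or lbSpec implementations could be passed in place of `tau`.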

L-B behavior of specific and constitutive processes.
a) Normalized expression profiles across human cell types for each gene of representative specific (top) and constitutive (bottom) gene groups. GO biological processes were selected based on extreme lbSpec (top three) or AUC (bottom three) values. Gene expression was divided by its maximum value. b) L-B lines of representative gene groups (black), along with the intergenic L-B variability (dashed black), the average random expectation (grey) and the variability of the random expectation (dashed grey). c) Deviation of L-B behavior from the random expectation. Z-scores were computed separately for each expression level bin. Data points represent −log(p-values) multiplied by 1 for positive deviations (enrichment, z-score > 0) or by −1 for negative deviations (depletion, z-score < 0). The red line marks zero. Lines with predominantly negative (positive) values indicate unexpectedly low (high) expression breadth and thus high (low) specificity. d) Enrichment analysis of L-B specificity metrics for representative gene sets.
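The signed values in panel c can be sketched as follows. The sketch assumes that each bin's z-score compares the observed breadth with the mean and standard deviation of the random groups, that p-values are two-sided under a normal approximation, and that the logarithm is base 10. The legend states none of these details, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def signed_log_p(observed: Sequence[float], random_mean: Sequence[float],
                 random_sd: Sequence[float], floor: float = 1e-300) -> List[float]:
    """Per bin: -log10(p), times +1 for enrichment (z > 0) or -1 for depletion (z < 0)."""
    values = []
    for b, mu, sd in zip(observed, random_mean, random_sd):
        z = (b - mu) / sd if sd > 0 else 0.0
        p = max(two_sided_p(z), floor)  # floor avoids log10(0) for huge |z|
        sign = 1.0 if z > 0 else (-1.0 if z < 0 else 0.0)
        values.append(sign * -math.log10(p))
    return values
```

A bin with lower breadth than expected by chance yields a negative value, matching the reading that mostly negative lines indicate high specificity.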

Reference landscape mapping procedure.
a) Procedure for mapping input L-B relationships onto a precomputed reference landscape. b) Evaluation of the mapping procedure. c) Global match scores between predicted and reference landscape positions (top). Global match measures the consistency of relative locations between predicted and reference positions, quantified as the rank correlation of distance vectors (Methods). Local match scores between predicted and reference neighborhoods (bottom), measured as the overlap of the top-100 neighbors and quantified by the Jaccard index (Methods).
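One reading of the evaluation in panel c is sketched below. It compares the predicted 2D position of a mapped item with its reference position, both measured against the points of the reference landscape. The global score is the Spearman correlation of the two distance vectors, and the local score is the Jaccard index of their k nearest landscape points (k = 100 in the legend). Which points enter the distance vectors is an assumption, the function names are hypothetical, and the Methods describe the actual procedure.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


def distance_vector(query: Point, landscape: Sequence[Point]) -> List[float]:
    """Euclidean distances from a 2D position to every reference landscape point."""
    return [math.dist(query, p) for p in landscape]


def top_k_neighbors(query: Point, landscape: Sequence[Point], k: int) -> Set[int]:
    d = distance_vector(query, landscape)
    return set(sorted(range(len(landscape)), key=lambda j: d[j])[:k])


def local_match(predicted: Point, reference: Point,
                landscape: Sequence[Point], k: int = 100) -> float:
    """Jaccard index of the k nearest landscape points of both positions."""
    a = top_k_neighbors(predicted, landscape, k)
    b = top_k_neighbors(reference, landscape, k)
    return len(a & b) / len(a | b)


def average_ranks(values: Sequence[float]) -> List[float]:
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # ties share the mean rank
        i = j + 1
    return ranks


def global_match(predicted: Point, reference: Point,
                 landscape: Sequence[Point]) -> float:
    """Spearman correlation between the two distance vectors."""
    rx = average_ranks(distance_vector(predicted, landscape))
    ry = average_ranks(distance_vector(reference, landscape))
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

A perfect mapping gives a global score of 1 and a local score of 1; a prediction placed in the opposite corner of the landscape gives a negative global score and no shared neighbors.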