Figures and data

Benchmarking performance of CellCover and competing methods on the CBMC dataset and in blood cell type mapping.
A) Balanced accuracy versus the size of the marker panel for all methods, with standard deviation across random seeds shown as shaded areas. Results for CellCover with weights calculated from log-normalized data and from binary data are both shown. B) Proportion of intersection between each method’s global marker panel and the CellCover (log-normalized) reference panel. C) Proportion of redundant marker genes selected for multiple cell classes versus panel size for CellCover and DE, with standard deviation across random seeds also shown. D) Sankey diagram of the mapping from source blood cell types [24] to target blood cell types [25]. Original authors’ cell type labels are used: HSC = hematopoietic stem cells, DC = dendritic cells, B = B-cells, T = T-cells, NK = natural killer cells, eryth + RBC = red blood cells, mono = monocytes, ilc = innate lymphoid cells, prof + prolif = proliferating.
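
For readers unfamiliar with the metric in panel A, balanced accuracy is the mean of the per-class recalls, so that rare cell classes count as much as abundant ones. The sketch below is a minimal illustration of that standard definition; the function name and the toy labels are hypothetical and are not taken from the benchmarking code behind the figure.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (standard definition of balanced accuracy)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Recall of class c: fraction of cells truly in c that are predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Toy labels (hypothetical): three B cells, one T cell.
y_true = ["B", "B", "B", "T"]
y_pred = ["B", "B", "T", "T"]
score = balanced_accuracy(y_true, y_pred)
```

The shaded areas in panel A would then correspond to the standard deviation of such scores over repeated runs with different random seeds.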

CellCover marker gene panels in the developing mammalian neocortex.
A. Dot plot of empirical conditional expression probabilities of CellCover marker panels of each cell age. The marker genes of each cell age are grouped along the horizontal axis and sorted by expression frequency in the cell class of interest. The color of the dots represents the expression probability of markers conditioned on time since a cell’s terminal division, i.e., the proportion of cells within a class expressing the marker gene. For this analysis, cells of each time point were pooled across embryonic ages (E12–15). The panel is obtained using CellCover with α = 0.02 and d = 5. RG = radial glia, NB = neuroblast, NR = neuron. B. Transfer of marker gene panels from the Telley data [34] to a second mouse neocortex dataset shows consistent identification of the primary cell types in mouse neurogenesis. Left: UMAP representations of cells from the dorsal forebrain excitatory lineage in the La Manno atlas of mouse brain development [35], colored by cell labels assigned by the original authors. This is followed by the same UMAPs, now showing the proportion of gene panels derived from the Telley dataset, labeled 1H, 24H and 96H, expressed at non-zero levels in each individual cell. These last three plots illustrate the transfer of marker gene panels derived from the Telley data to the La Manno data. C. Box plots of these same proportions broken down by cell-type labels provided by the original authors. D. Transfer of marker gene panels from the Telley data in mouse to data in the developing human neocortex from Polioudakis et al. [39] shows the identification of conserved cell types in neurogenesis across mammalian species. Left: tSNE map of cells from the dorsal forebrain excitatory lineage in the Polioudakis data. This is followed by the same tSNE maps, now showing the transfer of marker gene panels derived from progenitors labeled 1, 24, and 96 hours after terminal cell division in the Telley data. The map is colored by the proportion of each gene panel that is expressed at non-zero levels in each individual cell of the human Polioudakis dataset. Original author cell labels are used: RG = radial glia, v = ventricular, o = outer, Pg = cycling progenitor, S = in S phase, G2M = in G2M phase, IP = intermediate progenitor, Ex = excitatory neuron, N = new migrating, M = maturing, Dp = deep layer, U = upper layer. E. Box plots of these same proportions broken down by cell type labels and microdissection information provided by the original authors. The final two boxes indicate expression in neuronal subtypes segregated by physical location: germinal zone (GZ) or cortical plate (CP) microdissection.
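
The per-cell quantity plotted in panels B to E, the proportion of a gene panel expressed at non-zero levels in each cell, can be sketched as follows. This is an illustrative sketch only: the function name, the cells-by-genes array layout, and the choice to drop panel genes that are absent from the target dataset are assumptions, not details confirmed by the caption.

```python
import numpy as np

def panel_fraction_expressed(counts, gene_names, panel):
    """Per-cell fraction of panel genes with non-zero expression.

    counts:     (n_cells, n_genes) array of expression values
    gene_names: list of n_genes gene symbols (column order of counts)
    panel:      list of marker gene symbols to transfer
    """
    index = {g: i for i, g in enumerate(gene_names)}
    # Assumption: panel genes missing from the target dataset are ignored.
    cols = [index[g] for g in panel if g in index]
    if not cols:
        raise ValueError("no panel genes found in the dataset")
    return (np.asarray(counts)[:, cols] > 0).mean(axis=1)

# Toy data (hypothetical): 3 cells x 3 genes; "Z" is not measured.
counts = np.array([[0, 2, 1],
                   [0, 0, 0],
                   [5, 0, 3]])
fractions = panel_fraction_expressed(counts, ["A", "B", "C"], ["A", "C", "Z"])
```

Coloring an embedding by `fractions`, or grouping it by cell-type label for box plots, would give displays of the kind described above.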

Expression of 1H, 24H, and 96H CellCover marker gene panels across development in bulk human and microdissected macaque neocortical tissue.
A. Transfer of CellCover marker gene panels from the Telley data in mouse [34] into bulk RNAseq from human fetal cortical tissue [40]. The panel is obtained using CellCover with α = 0.02 and d = 5. The transferred values of gene panels were assessed as the sum of panel gene expression levels in each individual sample divided by the maximum sum observed in the samples. B. Transfer of marker gene panels (Table S4d: α = 0.02, d = 15 nested expanded from d = 7) from the Telley data in mouse into microarray data from laser microdissected regions of the developing macaque neocortex [37]. Nonlinear fits in 1H, 24H, and 96H panels are to VZ, iSVZ, and Ctx data, respectively. X-axis ages are expressed as embryonic (E) days after conception and months (mo) after birth. Transferred marker gene panel levels were calculated as in panel A. VZ = ventricular zone, iSVZ = inner subventricular zone, oSVZ = outer subventricular zone, subP = subplate, CP = cortical plate, Ctx = cortex.
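
The transferred panel level used for bulk and microarray samples (the per-sample sum of panel gene expression divided by the maximum sum across samples) can be written as a short sketch. The function name, the samples-by-genes layout, and the handling of unmeasured genes are assumptions made for illustration.

```python
import numpy as np

def transferred_panel_level(expr, gene_names, panel):
    """Per-sample sum of panel gene expression, scaled by the largest sum.

    expr:       (n_samples, n_genes) array of expression levels
    gene_names: list of n_genes gene symbols (column order of expr)
    panel:      list of marker gene symbols
    """
    index = {g: i for i, g in enumerate(gene_names)}
    # Assumption: panel genes not measured in the target data are skipped.
    cols = [index[g] for g in panel if g in index]
    sums = np.asarray(expr, dtype=float)[:, cols].sum(axis=1)
    if sums.max() == 0:
        raise ValueError("panel is not expressed in any sample")
    return sums / sums.max()

# Toy data (hypothetical): 3 samples x 3 genes, panel = genes A and B.
expr = np.array([[1, 2, 3],
                 [2, 2, 0],
                 [0, 0, 0]])
levels = transferred_panel_level(expr, ["A", "B", "C"], ["A", "B"])
```

By construction the sample with the highest summed panel expression receives a level of 1, so curves for different panels share a common 0 to 1 scale.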

Expression of the 12 CellCover gene panels from the Telley data across development in the fetal neocortex of the mouse and human.
A. Transfer of Telley gene panels (α = 0.02 and d = 6) into the radial glia (left panel), neuroblasts (center panel) and neurons (right panel) from the developing mouse brain atlas (La Manno et al. [35]). In all cases, covering rates of transferred panels from 1H cells are depicted in blue, 24H in green, and 96H in red. Transferred levels were calculated as the proportion of cells of each type expressing more than 3 marker genes in the gene panel. B. Transfer of the 12 gene panels (α = 0.02 and d = 15 nested expanded from d = 7) into bulk RNA-seq data from the human fetal cortex [40]. Transferred values of gene panels were assessed as the sum of gene panel expression levels in each individual sample divided by the maximum sum observed in the samples (as in Figure 3A).
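
The covering rate in panel A (the proportion of cells of each type expressing more than 3 marker genes of a panel) can be sketched as below. The function name, argument names, and data layout are hypothetical, and "expressing" is taken to mean a non-zero value, which is an assumption.

```python
import numpy as np

def covering_rate(counts, gene_names, panel, labels, min_genes=3):
    """Per cell type, fraction of cells expressing MORE than min_genes panel genes.

    counts:     (n_cells, n_genes) array of expression values
    gene_names: list of n_genes gene symbols (column order of counts)
    panel:      list of marker gene symbols
    labels:     length n_cells sequence of cell-type labels
    """
    index = {g: i for i, g in enumerate(gene_names)}
    cols = [index[g] for g in panel if g in index]
    labels = np.asarray(labels)
    # Number of panel genes with non-zero expression in each cell.
    n_expressed = (np.asarray(counts)[:, cols] > 0).sum(axis=1)
    covered = n_expressed > min_genes
    return {str(lab): float(covered[labels == lab].mean())
            for lab in np.unique(labels)}

# Toy data (hypothetical): 3 cells x 5 panel genes.
genes = ["g1", "g2", "g3", "g4", "g5"]
counts = np.array([[1, 1, 1, 1, 0],   # 4 genes expressed: covered
                   [1, 1, 1, 0, 0],   # 3 genes expressed: not covered
                   [1, 1, 1, 1, 1]])  # 5 genes expressed: covered
rates = covering_rate(counts, genes, genes, ["RG", "RG", "NR"])
```

Note the strict inequality: a cell expressing exactly 3 panel genes is not counted as covered under this reading of the caption.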

Expression of the 12 CellCover gene panels from the Telley data across development in the fetal and early postnatal neocortex of the macaque and human.
The CellCover marker panels are obtained at α = 0.02 and d = 15 nested expanded from d = 7. A. Transfer of the 12 CellCover panels into microarray data from microdissected regions of the developing macaque neocortex [37]. X-axis ages are expressed as embryonic (E) days after conception and months (mo) after birth. Transferred gene panel levels were calculated as in Figure 3. VZ = ventricular zone, iSVZ = inner subventricular zone, oSVZ = outer subventricular zone, subP = subplate, CP = cortical plate, Ctx = cortex. B. Repeated transfer of the 96H gene panels into the microdissected macaque data, using additional labeling of dissections by cortical layer. C. Repeated transfer of the 96H gene panels into the human cortex data [40], showing additional late fetal and early postnatal samples.

Expression of gliogenic and oRG marker gene panels across developmental time in human, macaque, and mouse neural progenitor cells.
CellCover (α = 0.02, d = 15) was used to define marker gene panels from scRNA-seq of sorted cell types of the developing human telencephalon [46]. Here, the expression of marker gene panels from gliogenic precursor cells and outer radial glial (oRG) cells is examined in additional scRNA-seq data from progenitor cells of the developing A. human [47], B. macaque [38], and C. mouse [36] neocortex. Each panel depicts the number of progenitor cells (color intensity) expressing differing proportions of oRG (X-axis) and gliogenic precursor (Y-axis) marker panel genes at one developmental time in each species. Changes in the distribution of progenitor cells expressing different proportions of the marker genes as development progresses can be seen across panels from left to right. Ages are shown in individual panel titles. pcw = post-conceptional weeks.
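
Each panel of this figure is, in effect, a two-dimensional histogram of cells over the two per-cell panel proportions. A minimal sketch using `numpy.histogram2d` is given below; the bin count of 10, the function name, and the toy values are assumptions and do not reflect the settings used to draw the figure.

```python
import numpy as np

def panel_density(org_fraction, glio_fraction, bins=10):
    """Count cells in a grid of (oRG proportion, gliogenic proportion) bins.

    Returns an array H of shape (bins, bins) where H[i, j] is the number of
    cells whose oRG proportion falls in bin i and gliogenic proportion in bin j.
    """
    H, _, _ = np.histogram2d(org_fraction, glio_fraction,
                             bins=bins, range=[[0, 1], [0, 1]])
    return H

# Toy per-cell proportions (hypothetical) for one developmental age.
org = np.array([0.05, 0.05, 0.95])
glio = np.array([0.05, 0.05, 0.95])
H = panel_density(org, glio)
```

Computing one such grid per developmental age and displaying the counts as color intensity gives a left-to-right series comparable to the panels described above.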