Transcriptional cartography integrates multiscale biology of the human cortex

  1. Konrad Wagstyl  Is a corresponding author
  2. Sophie Adler
  3. Jakob Seidlitz
  4. Simon Vandekar
  5. Travis T Mallard
  6. Richard Dear
  7. Alex R DeCasien
  8. Theodore D Satterthwaite
  9. Siyuan Liu
  10. Petra E Vértes
  11. Russell T Shinohara
  12. Aaron Alexander-Bloch
  13. Daniel H Geschwind
  14. Armin Raznahan
  1. Wellcome Centre for Human Neuroimaging, University College London, United Kingdom
  2. UCL Great Ormond Street Institute for Child Health, United Kingdom
  3. Department of Psychiatry, University of Pennsylvania, United States
  4. Department of Child and Adolescent Psychiatry and Behavioral Science, The Children's Hospital of Philadelphia, United States
  5. Department of Biostatistics, Vanderbilt University, United States
  6. Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, United States
  7. Department of Psychiatry, Harvard Medical School, United States
  8. Department of Psychiatry, University of Cambridge, United Kingdom
  9. Section on Developmental Neurogenomics, Human Genetics Branch, National Institute of Mental Health, United States
  10. Lifespan Informatics and Neuroimaging Center, University of Pennsylvania School of Medicine, United States
  11. Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, United States
  12. Center for Autism Research and Treatment, Semel Institute, Program in Neurogenetics, Department of Neurology and Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, United States

Abstract

The cerebral cortex underlies many of our unique strengths and vulnerabilities, but efforts to understand human cortical organization are challenged by reliance on incompatible measurement methods at different spatial scales. Macroscale features such as cortical folding and functional activation are accessed through spatially dense neuroimaging maps, whereas microscale cellular and molecular features are typically measured with sparse postmortem sampling. Here, we integrate these distinct windows on brain organization by building upon existing postmortem data to impute, validate, and analyze a library of spatially dense neuroimaging-like maps of human cortical gene expression. These maps allow spatially unbiased discovery of cortical zones with extreme transcriptional profiles or unusually rapid transcriptional change which index distinct microstructure and predict neuroimaging measures of cortical folding and functional activation. Modules of spatially coexpressed genes define a family of canonical expression maps that integrate diverse spatial scales and temporal epochs of human brain organization – ranging from protein–protein interactions to large-scale systems for cognitive processing. These module maps also parse neuropsychiatric risk genes into subsets which tag distinct cyto-laminar features and differentially predict the location of altered cortical anatomy and gene expression in patients. Taken together, the methods, resources, and findings described here advance our understanding of human cortical organization and offer flexible bridges to connect scientific fields operating at different spatial scales of human brain research.

eLife assessment

This study provides continuous maps of human brain gene expression and explores their relationship with a large variety of microscopic and macroscopic aspects of brain organisation. The authors provide convincing evidence for a relationship between gene expression maps with various aspects of the anatomy of adult brains, during development, and in the case of mental disorders. The data and methods introduced can be an important tool for neuroimaging research.

https://doi.org/10.7554/eLife.86933.3.sa0

Introduction

The human cerebral cortex is an astoundingly complex structure that underpins many of our distinctive facilities and vulnerabilities (Geschwind and Rakic, 2013). Achieving a mechanistic understanding of cortical organization in health and disease requires integrating information across its many spatial scales: from macroscale cortical folds and functional networks (Glasser et al., 2016) to the gene expression programs that reflect microscale cellular and laminar features (Hawrylycz et al., 2012; Kelley et al., 2018). However, a hard obstacle to this goal is that our measures of the human cortex at macro- and microscales are fundamentally mismatched in their spatial sampling. Macroscale measures from in vivo neuroimaging provide spatially dense estimates of structure and function, but microscale measures of gene expression are gathered from spatial discontinuous postmortem samples that have so far only been linked to macroscale features using methodologically imposed cortical parcellations (Hansen et al., 2021; Larivière et al., 2021; Seidlitz et al., 2020). Consequently, local transitions in human cortical gene expression remain uncharacterized and unintegrated with the spatially fine-grained topographies of human cortical structure and function that are revealed by in vivo neuroimaging (Gryglewski et al., 2018; Markello et al., 2021). Finding a way to bridge this gap would not only enrich both our micro- and macroscale models of human cortical organization, but also provide an essential framework for translation across traditionally siloed scales of neuroscientific research.

Here, we use spatially sparse postmortem data from the Allen Human Brain Atlas (AHBA; Hawrylycz et al., 2012) to generate spatially dense cortical expression maps (DEMs) for 20,781 genes in the adult brain, with accompanying DEM reproducibility scores to facilitate wider usage. These maps allow a fine-grained transcriptional cartography of the human cortex, which we integrate with diverse genomic, histological, and neuroimaging resources to shed new light on several fundamental aspects of human cortical organization in health and disease. First, we show that DEMs can recover canonical gene expression boundaries from in situ hybridization (ISH) data, predict previously unknown expression boundaries, and align with regional differences in cortical organization from several independent data modalities. Second, by focusing on the local transitions in gene expression which are captured by DEMs, we reveal a close spatial coordination between molecular and functional specializations of the cortex and establish that the spatial orientation of cortical folding and function at macroscale is aligned with local tangential transitions in cortical gene expression. Third, by defining and annotating gene co-expression modules across the cortex at multiple scales we systematically link macroscale measures of cortical structure and function in vivo to postmortem markers of cortical lamination, cellular composition, and development from early fetal to late adult life. Finally, as a proof of principle, we use this novel framework to secure a newly integrated multiscale understanding of atypical brain development in autism spectrum disorder (ASD).

The tools and results from this analysis of the human cortex, which we collectively call Multiscale Atlas of Gene expression for Integrative Cortical Cartography (MAGICC), open up an empirical bridge that can now be used to connect cortical models (and scientists) that have so far operated at segregated spatial scales. To this end, we share (i) all gene-level DEMs and derived transcriptional landscapes in neuroimaging-compatible files for easy integration with in vivo macroscale measures of human cortical structure and function; and (ii) all gene sets defining spatial subcomponents of cortical transcription for easy integration with any desired genomic annotation (https://github.com/kwagstyl/magicc).

Results

Creating and benchmarking spatially dense maps of human cortical gene expression

To create a dense transcriptomic atlas of the cortex, we used AHBA microarray measures of gene expression for 20,781 genes in each of 1304 cortical samples from six donor left cortical hemispheres (‘Materials and methods,’Table S1Supplementary file 1). We extracted a model of each donor’s cortical sheet by processing their brain MRI scan and identified the surface location (henceforth ‘vertex’) of each postmortem cortical sample in this sheet (‘Materials and methods,’ Figure 1a). For each gene, we then propagated measured expression values into neighboring vertices using nearest-neighbor interpolation followed by smoothing (‘Materials and methods,’ Figure 1b and c). Expression values were scaled across vertices and these vertex-level expression maps were averaged across donors to yield a single DEM for each gene, which provided estimates of expression at ~30,000 vertices across the cortical sheet (e.g., DEM for PVALB, upper panel of Figure 1d). These fine-grained vertex-level expression measures also enabled us to estimate the orientation and magnitude of expression change for each gene at every vertex (e.g., dense expression change map for PVALB, lower panel of Figure 1d).

Figure 1 with 2 supplements see all
Creating and benchmarking spatial dense gene expression maps in the human cortex.

(a) Spatially discontinuous Allen Human Brain Atlas (AHBA) microarray samples (red points) were aligned with MRI-derived cortical surface mesh reconstructions. (b) AHBA vertex expression values were propagated using nearest-neighbor interpolation and subsequently smoothed (c). (d) Subject-level maps were z-normalized and averaged to generate a single reference dense expression map (DEM) for each gene, as well as the associated expression gradient map (shown here for PVALB: top and bottom, respectively). (e) DEMs can recover known expression boundaries in in situ hybridization (ISH) data. Four canonical V1 area markers (Zeng et al., 2012) show a significantly sharp DEM expression gradient at the V1/V2 boundary (inset cortical map and Figure 1—figure supplement 2a, b), which is also evident in all four individual gene DEMs and DEM gradients (SYT6, PENK, and Figure 1—figure supplement 2c). (f) DEMs can discover previously unknown expression boundaries. Genes with high DEM gradients across the PeEc (parahippocampal) and TF (fusiform) gyri (inset cortical map) were validated in ISH data, showing sharp expression changes in both directions at this boundary (CHRNA3, NGB, and Figure 1—figure supplement 2d-f). (g) Illustrative comparisons of selected DEMs against regional variation in microscale measures of cellular composition: scatterplot showing the global correlation of regional cellular proportions from single nucleus RNAseq (snRNAseq) across 16 cells and 6 regions (Lake et al., 2016) with DEM values for corresponding cell-type marker genes (R = 0.48, pspin<0.001, excluding Ex3-V1 and In8-BA10 outlier samples). (h) DEMs for markers of six neuronal subtypes (three excitatory: FEZF2, RORB, THEMIS; three inhibitory: PVALB, SST, VIP) based on recently validated subtype marker genes (Bakken et al., 2021; Hodge et al., 2019). (i) Illustrative comparison of layer IV marker DEMs with corresponding mesoscale cortical measure of layer IV thickness from a 20 μm 3D histological atlas of cortical layers. (j) Illustrative comparisons of selected DEMs with corresponding macroscale cortical measures from independent neuroimaging markers.

We assessed the reproducibility of DEMs by repeating the above process (Figure 1) after repeatedly splitting the donors into non-overlapping groups of varying size and using learning curve analyses to estimate the DEM reproducibility achieved by our full set of six donors. For cortically expressed genes (‘Materials and methods,’ Supplementary file 2), the average reproducibility of gene expression maps was rgene = 0.58 (correlation of expression values for a gene across vertices), and the average reproducibility of ranked gene expression at each vertex was rvertex = 0.63 (correlation of expression values at a vertex across genes) (Figure 1—figure supplement 1c-d). These estimates were both substantially lower for genes not reported to be cortically expressed in the independent Human Protein Atlas (rgene = 0.34, t = 37.6, p<0.001 and rvertex = 0.39, t = 273.6, p<0.001, respectively, ‘Materials and methods,’ Supplementary file 2). Genes without recorded cortical expression were threefold enriched (p=0) among the 9647 genes with estimated DEM reproducibility values of r < 0.5. Regional differences in the density of postmortem sampling in the AHBA did not influence DEM reproducibility or the magnitude of local expression change captured by DEMs (‘Materials and methods,’ Figure 1—figure supplement 1h). Thus, remedying the current lack of any spatially dense gene expression maps in the human cortex, we provide DEMs (and accompanying dense expression change maps) for 20,781 genes and establish that >11k of these DEMs show a spatial reproducibility score of rgene > 0.5 between sets of unrelated individuals. Gene-level DEM reproducibility scores allow future users to filter on this feature as desired, and we establish that key analytic outputs from DEMs (see below) show good reproducibility between unrelated individuals and can be recovered at different DEM reproducibility filters.

Given that DEMs were generated by interpolating expression values between sampled regions, we assessed whether DEMs could recover sharp local microscale transitions in gene expression that could theoretically be obscured by interpolation. Of the very few such transitions that have been verified by ISH in humans, the best established occurs between occipital areas V1 and V2 (Zeng et al., 2012). All four genes known to show a sharp V1/V2 expression boundary across layers by ISH – SYT6, TLE4, PCP4, PENK – exhibited qualitatively and quantitatively sharp expression transitions at the V1/V2 boundary in their DEMs (Figure 1e, Figure 1—figure supplement 2a-d). Motivated by this validation, we next asked whether DEMs could identify previously unknown expression boundary markers in the human cortex. To achieve this, we took advantage of extensive existing ISH data between parahippocampal (area PeEc) and fusiform gyri (area TF). We ranked genes by the magnitude of their expression gradient between these cortical regions in DEMs (‘Materials and methods’) and identified four genes with sharp expression transitions predicted by DEMs – NGB,HTR2A (TF > PeEc) and NTS, CHRNA3 (PeEc > TF) – for which independent ISH data were available. Expression profiling in ISH slabs verified the existence of sharp expression transition for all four genes (Figure 1f, Figure 1—figure supplement 2e-g). As the V1/V2 and the PeEc/TF boundaries both involve transitions between classical laminar types in cortical regions with highly conserved anatomical patterning (von Economo and Koskinas, 1925), we also tested whether DEMs could recover expression boundaries in more variable and uniformly laminated association cortex (Ronan and Fletcher, 2015). No such expression boundaries have been described in humans by ISH, but there are reports of sharp expression boundaries between frontal areas 44 and 45b for several genes in non-human primates: SCN1B, KCNS1, TRIM55 (Chen et al., 2022). These genes also exhibited high DEM gradients at the boundary between human frontal areas 44 and 45 (Figure 1—figure supplement 2h-g). Taken together, these observations demonstrate the capacity of DEMs to resolve sharp expression transitions and indicate that DEMs can be used to help target prospective postmortem validation of new expression boundaries in humans.

To benchmark and illustrate the use of DEMs to capture cortical features across contrasting spatial scales, we drew on selected micro- and macroscale cortical measures that DEMs should align with based on known biological processes (Figure 1g–j, ‘Materials and methods’). To assess whether DEMs could recover microscale differences in cellular patterning across the cortical sheet, we considered the ground truth of neuronal cell-type proportions as measured by single-nucleus RNAseq (snRNAseq) across six different cortical regions (Lake et al., 2016). We observed a strong spatial correlation (r = 0.6, pspin<0.001) between regional marker gene expression in DEMs and regional proportions of their corresponding neuronal subtypes from snRNAseq (Figure 1g, ‘Materials and methods’). Figure 1h shows example marker gene DEMs for six canonical neuronal subtypes: three excitatory (FEZF2, RORB, THEMIS) and three inhibitory (PVAL, SST, VIP) (Bakken et al., 2021; Hodge et al., 2019). Next, to assess whether DEMs could recover regional variation in the mesoscale feature of cortical layering, we tested and verified that regional variation in the average DEM for layer IV marker genes (He et al., 2017; Maynard et al., 2021; Zeng et al., 2012) was highly correlated with regional variation in layer IV thickness as determined from a 3D histological atlas of cortical layers (Wagstyl et al., 2020; Figure 1i). Finally, we asked whether DEMs could recover spatially dense measures of regional variation across the cortical sheet as provided by neuroimaging data and found that maps from diverse measurement modalities showed strong and statistically significant spatial correlations with their corresponding DEM(s) relative to a null distribution based on random ‘spinning’ of maps (Alexander-Bloch et al., 2018; Figure 1j, ‘Materials and methods,’ all pspin<0.01): (i) areas of cortex activated during motor fMRI tasks in humans (Glasser et al., 2016) vs. the average DEM for canonical cell markers of large pyramidal neurons (Betz cells) found in layer V of the motor cortex that are the outflow for motor movements (Bakken et al., 2021), (ii) an in vivo neuroimaging marker of cortical myelination (T1/T2 ratio [Glasser and Van Essen, 2011]) vs. the Myelin Basic Protein DEM, which marks myelin, and (iii) the degree of in vivo regional cortical thinning by MRI in Alzheimer’s disease (AD) patients who have at least one APOE E4 variant (Gutiérrez-Galve et al., 2009; LaMontagne et al., 2019) vs. the APOE DEM (thinning map generated from 119 APOE E4 patients and 633 controls structural MRI [sMRI] scans as detailed in ‘Materials and methods’), testing the hypothesis that higher regional APOE expression will result in greater cortical atrophy in individuals with the APOE E4 risk allele. Collectively, the above tests of reproducibility (Figure 1—figure supplement 1) and convergent validity (Figure 1e–j) supported the use of DEMs for downstream analyses.

Defining and surveying the human cortex as a continuous transcriptional terrain

As an initial summary view of transcriptional patterning in the human cortex, we first averaged all 20,781 DEMs to represent the cortex as a single continuous transcriptional terrain, where altitude encodes the transcriptional distinctiveness (TD) of each cortical point (vertex) relative to all others (TD = mean(abs(zexp)), Figure 2a, Video 1). This terrain view revealed six statistically significant TD peaks (‘Materials and methods,’ Figure 2a and b) which recover all major archetypal classes of the mammalian cortex as defined by classical studies of laminar and myelo-architecture, connectivity, and functional specialization (Mesulam, 1998) encompassing primary visual (V1), somatosensory (Brodmann area [BA] [Brodmann, 1909] 2), and motor cortex (BA 4), as well as limbic (temporal pole centered on dorsal temporal area G [TGd]; von Economo and Koskinas, 1925, ventral frontal centered in orbitofrontal cortex [OFC]) and heteromodal association cortex (BA 9-46d). Of note, our agnostic parcellation of all TD peak vertices by their ranked gene lists (‘Materials and methods’) perfectly cleaved BA2 and BA4 along the central sulcus, despite there being no representation of this macroanatomical landmark in DEMs. The TD map observed from the full DEMs library was highly stable between all disjoint triplets of donors (‘Materials and methods,’ Figure 2—figure supplement 1a, median cross-vertex correlation in TD scores between triplets r = 0.77) and across library subsets at all deciles of DEM reproducibility (‘Materials and methods,’ Figure 2—figure supplement 1b, cross-vertex correlation in TD scores r > 0.8 for the 3rd to 10th deciles), but was not recapitulated in spun null datasets (Figure 2—figure supplement 1c).

Figure 2 with 1 supplement see all
Mapping transcriptional distinctiveness (TD) in the human cortex and its alignment with macroscale structure and function.

(a) Regional TD can be quantified as the mean absolute z-score of dense expression map (DEM) values at each vertex (top) and visualized as a continuous cortical map (middle, TD encoded by color) or in a relief map of the flattened cortical sheet (bottom, TD encoded by color and elevation, Video 1). Black lines on the inflated view identify cuts for the flattening procedure. The cortical relief map is annotated to show the central sulcus (CS), and peaks of TD overlying dorsal sensory and motor cortices (Brodmann areas, BA2, BA4), the primary visual cortex (V1), temporal pole (TGd), insula (Ins), and ventromedial prefrontal cortex (OFC). (b) Thresholding the TD map through spatial permutation of DEMs (tspin ; ‘Materials and methods’) and clustering significant vertices by their expression profile defined six TD peaks in the adult human cortex (depicted as colored regions on terrain and inflated cortical surfaces). (c) Cortical vertices projected into a 3D coordinate system defined by the first three principal components (PCs) of gene expression, colored by the continuous TD metric (left) and TD peaks (right). TD peaks are focal anchors of cortex-wide expression PCs. (d) TD peaks show statistically significant functional specializations in a meta-analysis of in vivo functional MRI data. (e) The average magnitude of local expression transitions across genes (color) and principal orientation of these transitions (white bars) varies across the cortex. (f) Cortical folds in Allen Human Brain Atlas (AHBA) donors (top surface maps and middle flat map) tend to be aligned with the principal orientation of TD change across cortical vertices (p<0.01, middle histogram, sulci running perpendicular to TD change), and the strength of this alignment varies between cortical regions. (g) Putative cortical areas defined by a multimodal in vivo MRI parcellation of the human cortex (Glasser et al., 2016) (top surface maps and middle flat map) also tend to be aligned with the principal direction of gene expression change across cortical vertices (p<0.01, middle histogram, sulci running perpendicular to long axis of area boundaries), and the strength of this alignment varies between cortical areas.

Video 1
Visualisation of Transcriptional Distinctiveness (TD) in the human cortex, encoded by both color and elevation.

Integration with principal component analysis (PCA) of DEMs across vertices (‘Materials and methods,’ Figure 2—figure supplement 1d and e) showed that TD peaks constitute sharp poles of more recently recognized cortical expression gradients (Burt et al., 2018; Figure 2c). The ‘area-like’ nature of these TD peaks is reflected by the steep slopes of transcriptional change surrounding them (Figure 2a and e) and could be quantified as TD peaks being transcriptomically more distinctive than their physical distance from other cortical regions would predict (Figure 2—figure supplement 1f and g). In contrast, transitions in gene expression are more gradual and lack such sharp transitions in the cortical regions between TD peaks (Figure 2a, c and e, Figure 2—figure supplement 1j). Thus, because DEMs provide spatially fine-grained estimates of cortical expression and expression change, they offer an objective framework for arbitrating between area-based and gradient-based views of cortical organization in a regionally specific manner.

The TD peaks defined above exist as both discrete patches of cortex and the distinctive profile of gene expression which defines each peak, and this duality offers an initial bridge between macro- and microscale views of cortical organization. Specifically, we found that each TD peak overlapped with a functionally specialized cortical region based on meta-analysis of in vivo functional neuroimaging data (Yarkoni et al., 2011; ‘Materials and methods,’ Figure 2d, Supplementary file 3), and featured a gene expression signature that was preferentially enriched for a distinct set of biological processes, cell-type signatures, and cellular compartments (‘Materials and methods,’ Supplementary file 2). For example, the peaks overlapping area TGd and OFC were enriched for synapse-related terms, while BA2 and BA4 TD peaks were predominantly enriched for metabolic and mitochondrial terms. At a cellular level, V1 closely overlapped with DEMs for marker genes of the Ex3 neuronal subtype known to be localized to V1 (Lake et al., 2016), while BA4 closely overlapped Betz cell markers (Bakken et al., 2021; Figure 2—figure supplement 1h).

The expression profile of each TD peak was achieved through surrounding zones of rapid transcriptional change (Figure 2a and e, Figure 2—figure supplement 1i and j). We noted that these transition zones tended to overlap with cortical folds, suggesting an alignment between spatial orientations of gene expression and folding. To formally test this idea, we defined the dominant orientation of gene expression change at each vertex (‘Materials and methods,’ Figure 2e) and computed the angle between this and the orientation of folding (‘Materials and methods’). The observed distribution of these angles across vertices was significantly skewed relative to a null based on random alignment between angles (pspin<0.01, Figure 2f, ‘Materials and methods’), indicating that there is indeed a tendency for cortical sulci and the direction of fastest transcriptional change to run perpendicular to each other (pspin<0.01, Figure 2f). A similar alignment was seen when comparing gradients of transcriptional change with the spatial orientation of putative cortical areas defined by multimodal functional and structural in vivo neuroimaging (Glasser et al., 2016) (expression change running perpendicular to area long axis, pspin<0.01, Figure 2g, ‘Materials and methods’). Visualizing these expression-folding and expression-areal alignments revealed greatest concordance over sensorimotor, medial occipital, cingulate, and posterior perisylvian cortices (with notable exceptions of transcription change running parallel to sulci and the long axis of putative cortical areas in lateral temporoparietal and temporopolar regions). As a preliminary probe for causality, we examined the developmental ordering of regional folding and regional transcriptional identity. Mapping the expression of high-ranking TD genes in fetal cortical laser dissection microarray data (Miller et al., 2014) from 21 PCW (post conception weeks) (‘Materials and methods’) showed that the localized transcriptional identity of V1 and TGd regions in adulthood is apparent during the fetal periods that folding topology begins to emerge (Chi et al., 1977; Xu et al., 2022; Figure 2—figure supplement 1k). Thus, the unique capacity of DEMs to resolve local orientations of expression change reveals a close spatial alignment between regional transitions of cortical gene expression at microscale and regional transitions of cortical folding, structure, and function at macroscale.

Cortical gene co-expression integrates diverse spatial scales of human brain organization

To complement the TD analyses above (Figure 2), we next used weighted gene co-expression network analysis (WGCNA; Langfelder and Horvath, 2008, ‘Materials and methods’, Figure 3a) to achieve a more systematic integration of macro- and macroscale cortical features. Briefly, WGCNA constructs a connectivity matrix by quantifying pairwise co-expression between genes, raising the correlations to a power (here 6) to emphasize strong correlations while penalizing weaker ones, and creating a topological overlap matrix (TOM) to capture both pairwise similarities expression and connectivity. Modules of highly interconnected genes are identified through hierarchical clustering. The resultant WGCNA modules enable topographic and genetic integration because they each exist as both (i) a single expression map (eigenmap) for spatial comparison with neuroimaging data (Figure 3a and b, ‘Materials and methods’) and (ii) a unique gene set for enrichment analysis against marker genes systematically capturing multiple scales of cortical organization, namely cortical layers, cell types, cell compartments, protein–protein interactions (PPI), and GO terms (‘Materials and methods,’ Supplementary files 2 and 4). Furthermore, whereas prior applications of WGCNA to AHBA data have revealed gene sets that covary in expression across many different compartments of the brain (Hartl et al., 2021; Hawrylycz et al., 2015; Kelley et al., 2018), using DEMs as input to WGCNA generates modules that are purely based on the fine-scale coordination of gene expression across the cortex. Using WGCNA, we identified 16 gene modules (M1–M16), which we then deeply annotated against independent measures of cortical organization at diverse spatial scales and developmental epochs (Figure 3c, ‘Materials and methods’). Module eigenmaps were primarily driven by highly reproducible genes (Figure 3—figure supplement 1a) as were enrichments for annotational gene sets (median reproducibility of enriching genes = 0.59, p<0.001 elevated vs. background).

Figure 3 with 1 supplement see all
Cortex-wide gene co-expression patterns reflect multiple spatial scales and developmental epochs of brain organization.

(a) Overview of weighted gene co-expression network analysis (WGCNA) pipeline applied to the full dense expression map (DEM) dataset. Starting top left: the pairwise DEM spatial correlation matrix is used to generate a topological overlap matrix between genes (middle top), which is then clustered. Of the 23 WGCNA-defined modules, 7 were significantly enriched for non-cortical genes and removed, leaving 16 modules. Each module is defined by a set of spatially co-expressed genes, for which the principal component of expression can be computed and mapped at each cortical point (eigenmap). M6 is shown as an example projected onto an inflated left hemisphere (M6 z-scored expression and M6 expression change), and the bulk transcriptional distinctiveness (TD) terrain view from Figure 2 (M6 expression). (b) The extremes of WGCNA eigenmaps highlight different peaks in the cortical terrain: the main TD terrain colored by TD value (center, from Figure 2), surrounded by TD terrain projections of selected WGCNA eigenmaps. (c) WGCNA modules (eigenmaps and gradient maps, rows) are enriched for multiscale aspects of cortical organization (columns). Cell color intensity indicates pairwise statistical significance (p<0.05), while black outlines show significance after correction for multiple comparisons across modules. Columns capture key levels of cortical organization at different spatial scales (arranged from macro- to microscale) and developmental epochs: spatial alignment between module eigenmaps and in vivo MRI maps of cortical folding orientation, cortical thickness and T1/T2 ratio, fMRI resting-state functional networks; enrichment for module gene sets for independent annotations (Supplementary file 2) marking: cortical layers (He et al., 2017; Maynard et al., 2021); cell types (Darmanis et al., 2015; Habib et al., 2017; Hodge et al., 2019; Lake et al., 2018; Lake et al., 2016; Li et al., 2018; Ruzicka et al., 2021; Velmeshev et al., 2019; Zhang et al., 2016); subcellular compartments (Binder et al., 2014); synapse-related genes (Koopmans et al., 2019); protein–protein interactions between gene products (Szklarczyk et al., 2019); temporal epochs of peak expression (Werling et al., 2020) (‘fetal’: 8–24 21 post conception weeks [PCW]/’‘perinatal’' 24 PCW–6 mo/‘postnatal’ > 6 mo); transient layers of the mid-fetal human cortex at 21 PCW (Miller et al., 2014) (subpial granular zone [SG], marginal zone [MZ], cortical plate [CP], subplate [SP], intermediate zone [IZ], subventricular zone [SZ], and ventricular zone [VZ]); and fetal cell types at 17–18 PCW (Polioudakis et al., 2019). (d) Independent validation of multiscale enrichments for selected modules M2 and M12. M2 significantly overlaps the Neurosynth topic associated with the terms motor, cortex, and hand. Two high-ranking M2 genes, MOG and TF, exhibit clear layer VI peaks on in situ hybridization (ISH) and GO enrichment analysis myelin-related annotations. M12, overlapping the limbic network most closely overlapped the Neurosynth topic associated with social reasoning. Two high-ranking M22 genes GABRA2 and GRIN2B showed layer II ISH peaks and GO enrichment analysis revealed synaptic annotations. (e) Network visualization of pairwise overlaps between annotational gene sets used in (c), including WGCNA module gene sets (inset expression eigenmaps).

Several WGCNA modules showed statistically significant alignments with structural and functional features of the adult cerebral cortex from in vivo imaging (‘Materials and methods,’ Figure 3c; Glasser and Van Essen, 2011; Yeo et al., 2011). For example, (i) the M6 eigenmap was significantly positively correlated with in vivo measures of cortical thickness from sMRI and enriched within a limbic functional connectivity network defined by resting-state functional connectivity MRI, and (ii) the M8, M9, and M14 eigenmaps showed gradients of expression change that were significantly aligned with the orientation of cortical folding (especially around the central sulcus, medial prefrontal, and temporo-parietal cortices, Figure 3—figure supplement 1b). At microscale, several WGCNA module gene sets showed statistically significant enrichments for genes marking specific cortical layers (He et al., 2017; Maynard et al., 2021) and cell types (Darmanis et al., 2015; Habib et al., 2017; Hodge et al., 2019; Lake et al., 2018; Lake et al., 2016; Li et al., 2018; Ruzicka et al., 2021; Velmeshev et al., 2019; Zhang et al., 2016; ‘Materials and methods,’ Figure 3c, Supplementary file 4). These microscale enrichments were often congruent between cortical layers and cell classes annotations, and in keeping with the linked eigenmap (Figure 3c, Supplementary file 4). For example, M4, which was uniquely co-enriched for markers of endothelial cells and middle cortical layers, showed peak expression over dorsal motor cortices which are known to show expanded middle layers (Bakken et al., 2021; Wagstyl et al., 2020) with rich vascularization (Pfeifer, 1940) relative to other cortical regions. Similarly, M6, which was enriched for markers of astrocytes, microglia, and excitatory neurons, as well as layers 1/2, showed peak expression over rostral frontal and temporal cortices which are known to possess relatively expanded supragranular layers (Wagstyl et al., 2020) that predominantly contain the apical dendrites of excitatory neurons and supporting glial cells (von Economo and Koskinas, 1925). We also observed that modules with similar eigenmaps (Figure 3—figure supplement 1c), (including overlaps of multiple modules with the same TD peak) could show contrasting gene set enrichments. For example, M2 and M4 both showed peak expression of dorsal sensorimotor cortex (i.e., TD areas BA2 and BA4), but M2 captures a distinct architectonic signature of sensorimotor cortex from the mid-layer vascular signal of M4: expanded and heavily myelinated layer 6 (Bakken et al., 2021; Palomero-Gallagher and Zilles, 2019; Wagstyl et al., 2020; Figure 3c). The spatially co-expressed gene modules detected by WGCNA were not only congruently co-enriched for cortical layer and cell markers, but also for nanoscale features such as subcellular compartments (Binder et al., 2014; Supplementary files 2 and 4) (often aligning with the cellular enrichments) and PPIs (Szklarczyk et al., 2019; ‘Materials and methods,’ Figure 3c, Supplementary file 4). This demonstrates the capacity of our resource to tease apart subtle subcomponents of neurobiology based on cortex-wide expression patterns.

To further assess the robustness of these multiscale relationships, we focused on two modules with contrasting multiscale signatures – M2 and M12 – and tested for reproducibility of our primary findings (Figure 3c) using orthogonal methods. Our primary analyses indicated that M2 has an expression eigenmap which overlaps with the canonical somatomotor network from resting-state functional neuroimaging (Yeo et al., 2011) and contains genes that are preferentially expressed in cortical layer 6 from layer-resolved transcriptomics (He et al., 2017; Maynard et al., 2021), and in oligodendrocytes from snRNAseq (Darmanis et al., 2015; Habib et al., 2017; Hodge et al., 2019; Lake et al., 2018; Lake et al., 2016; Li et al., 2018; Ruzicka et al., 2021; Velmeshev et al., 2019; Zhang et al., 2016; Figure 3c). We were able to verify each of these observations through independent validations including spatial overlap of M2 expression with meta-analytic functional activations relating to motor tasks (Yarkoni et al., 2011); immunohistochemistry localization of high-ranking M2 genes to deep cortical layers (Zeng et al., 2012; ‘Materials and methods’); and significant enrichment of M2 genes for myelin-related GO terms (Figure 3d, Supplementary file 4). By contrast, our primary analyses indicated that M12, which had peak expression over ventral frontal and temporal limbic cortices, was enriched for marker genes for layer 2, neurons and the synapse (Figure 3c). These multiscale enrichments were all supported by independent validation analyses, which showed that the M12 eigenmaps is enriched in a limbic network that is activated during social reasoning (Yarkoni et al., 2011) high-ranking M12 marker genes show elevated expression in upper cortical layers by immunohistochemistry (Zeng et al., 2012; ‘Materials and methods’); and there is a statistically significant over-representation of synapse compartment GO terms in the M12 gene set (Figure 3d, Supplementary file 4).

Linking spatial and developmental aspects of cortical organization

Given that adult cortical organization is a product of development, we next asked whether eigenmaps of adult cortical gene expression (Figure 3a and b) are related to the patterning of gene expression between fetal stages and adulthood. To achieve this, we tested WGCNA module gene sets for enrichment of developmental marker genes from three independent postmortem studies (rightmost columns, Figure 3c) capturing genes with differential expression between (i) three developmental epochs between 8 PCWs and adulthood (BrainVar dataset from prefrontal cortex [Werling et al., 2020]);(ii) seven histologically defined zones of mid-fetal (21 PCW) cortex (Miller et al., 2014; ‘Materials and methods,’ 2Supplementary files 1 and 2); and (iii) 16 mid-fetal (17–18 PCW) cell types (Polioudakis et al., 2019; ‘Materials and methods,’ Supplementary file 2).

Comparison with the BrainVar dataset revealed that most module eigenmaps (13 of all 16 cortical modules) were enriched for genes with dynamic, developmentally coordinated expression levels between early fetal and late adult stages (Figure 3c, Supplementary file 4). This finding was reinforced by supplementary analyses modeling developmental trajectories of eigenmap gene set expression between 12 PCW and 40 y in the BrainSpan dataset (Li et al., 2018; ‘Materials and methods,’ Figure 3—figure supplement 1d), and further qualified by the observation that several WGCNA modules were also differentially enriched for markers of mid-fetal cortical layers and cell types (Miller et al., 2014; Polioudakis et al., 2019; Figure 3c, Supplementary file 4). As observed for multiscale spatial enrichments (Figure 3c and d), the developmental enrichments of modules were often closely coordinated with one another, and eigenmaps with similar patterns of regional expression could possess different signatures of developmental enrichment. For example, the M6 and M12 eigenmaps shared a similar spatial expression pattern in the adult cortex (peak expression in medial prefrontal, anterior insula, and medioventral temporal pole), but captured different aspects of human brain development that aligned with the cyto-laminar enrichments of M6 and M12 in adulthood. The M6 gene set, which was enriched for predominantly glial elements of layers 1 and 2 in adult cortex, was also enriched for markers of mid-fetal microglia (Polioudakis et al., 2019), the transient fetal layers that are known to be particularly rich in mid-fetal microglia (subpial granular, subplate, and ventricular zone [Monier et al., 2007]), and the mid-late fetal epoch when most microglial colonization of the cortex is thought to be achieved (Menassa and Gomez-Nicola, 2018; Figure 3c). In contrast, the M12 gene set, which was enriched for predominantly neuronal elements of layer 2 in adult cortex, also showed enrichment for marker genes of developing fetal excitatory neurons, the fetal cortical subplate, and windows of mid-late fetal development when developing neurons are known to be migrating into a maximally expanded subplate (Molnár et al., 2019).

The striking co-enrichment of WGCNA modules for features of both the fetal and adult cortex (Figure 3c) implied a patterned sharing of marker genes between cyto-laminar features of the adult and fetal cortex. To more directly test this idea and characterize potential biological themes reflected by these shared marker genes, we carried out pairwise enrichment analyses between all annotational gene sets from Figure 3c. These gene sets collectively draw from a diverse array of study designs encompassing bulk, laminar, and single-cell transcriptomics of the human cortex between 10 PCW and 60 y of life (‘Materials and methods’; Darmanis et al., 2015; Habib et al., 2017; He et al., 2017; Li et al., 2018; Maynard et al., 2021; Miller et al., 2014; Polioudakis et al., 2019; Ruzicka et al., 2021; Velmeshev et al., 2019; Werling et al., 2020; Zhang et al., 2016). Network visualization and clustering of the resulting adjacency matrix (Figure 3—figure supplement 1e) revealed an integrated annotational space defined by five coherent clusters (Figure 3e). A mature neuron cluster encompassed markers of postmitotic neurons and the compartments that house them in both fetal and adult cortex (red, Figure 3e, Supplementary file 2, example core genes: NRXN1, SYT1, CACNG8). This cluster also included genes with peak expression between late fetal and early postnatal life, and those localizing to the plasma membrane and synapse. A small neighboring fetal ganglionic eminence cluster (fetal GE, yellow, Figure 3e, Supplementary file 2, example core genes: NPAS3, DSX, DCLK2) contained marker sets for migrating inhibitory neurons from the medial and caudal ganglionic eminence in mid-fetal life. These two neuronal clusters – mature neuron and fetal GE – were most strongly connected to the M12 gene set (‘Materials and methods’), which highlights medial prefrontal, and temporal cortices possessing a high ratio of neuropil:neuronal cell bodies (Collins et al., 2010; Spocter et al., 2012). A mitotic annotational cluster (blue, Figure 3e, Supplementary file 2, example core genes: CCND2, MEIS2, PHLDA1) was most distant from these two neuronal clusters and included genes showing highest expression in early development as well as markers of cycling progenitor cells, radial glia, oligodendrocyte precursors, germinal zones of the fetal cortex, and the nucleus. This cluster was most strongly connected to the M15 gene set, which shows high expression over occipito-parietal cortices distinguished by a high cellular density and notably low expression in lateral prefrontal cortices, which possess low cellular density (Collins et al., 2016). The mature neuron and mitotic clusters were separated by two remaining annotational clusters for non-neuronal cell types and associated cortical layers. A myelin cluster (orange, Figure 3e, Supplementary file 2, example core genes: MOBP, CNP, ACER3) – which contained gene sets marking adult layer 6, oligodendrocytes, and organelles supporting the distinctive biochemistry and morphology of oligodendrocytes (the golgi apparatus, endoplasmic reticulum, and cytoskeleton) – was most connected to the M2 gene set highlighting heavily myelinated motor cortex (Nieuwenhuys and Broere, 2017). A non-neuronal cluster (yellow, Figure 3e, Supplementary file 2, example core genes: TGFBR2, GMFG, A2M) – which encompassed marker sets for microglia, astrocytes, endothelial cells, pericytes, and markers of superficial adult and fetal cortical layers that are relatively depleted of neurons – was most connected to the M6 gene set highlighting medial temporal and anterior cingulate cortices with notably high non-neuronal content (Collins et al., 2010).

These analyses show that the regional patterning of bulk gene expression captures the organization of the human cortex across multiple spatial scales and developmental stages such that (i) the summary expression maps of spatially co-expressed gene sets align with independent in vivo maps of macroscale structure and function from neuroimaging, while (ii) the spatially co-expressed gene sets defining these maps show congruent enrichments for specific adult cortical layers and cell types as well as developmental precursors of these features spanning back to mid-fetal life.

ASD risk genes follow two different spatial patterns of cortical expression, which capture distinct aspects of cortical organization and differentially predict cortical changes in ASD

The findings above establish that gene co-expression modules in the human cortex capture multiple levels of biological organization ranging from subcellular organelles to cell types, cortical layers, and macroscale patterns of brain structure and function. Given that genetic risks for atypical brain development presumably play out through such levels of biological organization, we hypothesized that disease-associated risk genes would be enriched within WGCNA module gene sets. Testing this hypothesis simultaneously offers a means of further validating our analytic framework, while also potentially advancing understanding of disease biology. To test for disease gene enrichment in WGCNA modules, we compiled lists of genes enriched for deleterious rare variants in ASD (Ruzzo et al., 2019; Satterstrom et al., 2020), schizophrenia (Scz; Singh et al., 2020), severe developmental disorders (DDD; Deciphering Developmental Disorders Study, 2017), and epilepsy (Heyne et al., 2018; Supplementary file 2). We considered rare (as opposed to common) genetic variants to focus on high effect-size genetic associations and avoid ongoing uncertainties regarding the mapping of common variants to genes (Tam et al., 2019). We observed that disease-associated gene sets were significantly enriched in several WGCNA modules (Figure 4a), with two modules showing enrichments for more than one disease: M15 (ASD, Scz, and DDD) and M12 (ASD and epilepsy). ASD was the only disorder to show a statistically significant enrichment of risk genes within both M12 and M15 (Figure 4a), providing an ideal setting to ask if and how this partitioning of ASD risk genes maps onto (i) multiscale brain organization in health and (ii) altered brain organization in ASD.

Figure 4 with 1 supplement see all
Autism spectrum disorder (ASD) risk genes follow two different spatial patterns of cortical gene expression which differentially predict cortical changes in ASD.

(a) Enrichment of weighted gene co-expression network analysis (WGCNA module gene sets for risk genes associated with atypical brain development through enrichment of rare deleterious variants in studies of ASD, schizophrenia (Scz), severe developmental disorders (DDD, deciphering developmental disorders study), and epilepsy. Cell color intensity indicates pairwise statistical significance (p<0.05)), while outlined matrix cells survived correction for multiple comparisons across modules. (b) Summary of multiscale and developmental annotations from Figure 3c for M12 and M15: the only two WGCNA modules enriched for risk genes of more than one neurodevelopmental disorder. (c) M12 and M15 genes clustered by the strength of their membership to each module. Color encodes module membership. Shape encodes annotations for two GO Biological Process annotations that differ between the module gene sets: neuronal communication and regulation of gene expression. Text denotes specific ASD risk genes. (d) Contrasting GO enrichment of M12 and M15 for neuronal communication and regulation of gene expression GO Biological Process annotations. (e) M12 and M15 differ in the developmental trajectory of their average cortical expression between early fetal and mid-adult life (Li et al., 2018). (f) Regional differences in intrinsic expression of the M15 module (but not the M12 module) in adult cortex is correlated with regional variation in the severity of altered cortical gene expression (number of differentially expressed genes) in ASD (Haney et al., 2020). (g) Statistically significant regional alterations of cortical thickness (CT) in ASD compared to typically developing controls from in vivo neuroimaging (Di Martino et al., 2017) (top). Areas of cortical thickening show a statistically significant spatial overlap (Dice overlap = 0.68, pspin<0.01) with regions of peak intrinsic expression for M15 in adult cortex (bottom). (h) M15 eigenmap expression (but not M12 eigenmap) shows significant spatial correlation with relative CT change in ASD.

The eigenmaps and gene set enrichments of M12 vs. M15 implicated two contrasting multiscale motifs in the biology of ASD (Figure 4b). ASD risk genes, including SCN2A, SYNGAP1, and SHANK2, resided within the M12 module (Figure 4c), which is most highly expressed within a distributed cortical system that is activated during social reasoning tasks (pspin<0.01, Figure 3c and d, Figure 4b). The M12 gene set is also enriched for: genes with peak cortical expression in late-fetal and early postnatal life; marker genes for the fetal subplate and developing excitatory neurons; markers of layer 2 and mature neurons in adult cortex; and synaptic genes involved in neuronal communication (Figures 3c and d and 4b–e, Supplementary file 4). In contrast, ASD risk genes, including ADNP, KMT5B, and MED13L, resided within the M15 module (Figure 4c), which is most highly expressed in primary visual cortex and associated ventral temporal pathways for object recognition/interpretation (Kravitz et al., 2013) (pspin<0.05, Figures 3c and d and 4b, Supplementary file 4). The M15 module is also enriched for genes showing peak cortical expression in early fetal development, marker genes for cycling progenitor cells in the fetal cortex; markers of layer 2, inhibitory neurons and oligodendrocyte precursors in the adult cortex (Figures 3c and d and 4b–e, Supplementary file 4). The alignment of ASD risk genes with M12 and M15 was reinforced when considering all 135 ASD risk genes: spatial co-expression analyses split these genes into two clear subsets with mean expression maps that most closely resembled M12 and M15 (Figure 4—figure supplement 1a, b). Thus, using only spatial patterns of cortical gene expression in adulthood, our analytic framework was able to recover the previous PPI and GO-based partitioning of ASD risk genes into synaptic vs. nuclear chromatin remodeling pathways (Parikshak et al., 2013; Satterstrom et al., 2020), and then place these pathways into a richer biological context based on the known multiscale associations of M12 and M15 (Figures 3c and 4a).

We next sought to address whether regional differences in M12 and M15 expression were related to regional cortical changes observed in ASD. To test this idea, we used two orthogonal indices of cortical change in ASD that capture different levels of biological analysis – the number of differentially expressed genes (DEGs) postmortem (Haney et al., 2020), and the magnitude of changes in cortical thickness (CT) as measured by in vivo sMRI (Di Martino et al., 2017). Regional DEG counts were derived from a recent postmortem study of 725 cortical samples from 11 cortical regions in 112 ASD cases and controls (Haney et al., 2020), and compared with mean M12 and M15 expression within matching areas of a multimodal MRI cortical parcellation (Glasser et al., 2016). The magnitude of regional transcriptomic disruption in ASD was statistically significantly positively correlated with region expression of the M15 module (r = 0.6, pspin<0.05), but not the M12 module (r = −0.3, pspin>0.05) (Figure 4f). This dissociation is notable because M15 (but not M12) is enriched for genes involved in the regulation of gene expression (Figure 4d). Thus the enrichment of regulatory ASD risk genes within M15, and the intrinsically high expression of M15 in occipital cortex may explain why the occipital cortex is a hotspot of altered gene expression in ASD.

To compare M12 and M15 expression with regional variation in cortical anatomy changes in ASD, we harnessed the multicenter ABIDE datasets containing brain sMRI scans from 751 participants with idiopathic ASD and 773 controls (Di Martino et al., 2017; Di Martino et al., 2013). We preprocessed all scans using well-validated tools for harmonized estimation of cortical thickness (CT) (Fischl, 2012) from multicenter data (‘Materials and methods’), and then modeled CT differences between ASD and control cohorts at 150,000 points (vertices) across the cortex (‘Materials and methods’). This procedure revealed two clusters of statistically significant CT change in ASD (‘Materials and methods,’ Figure 4g, upper panel) encompassing visual and parietal cortices (relative cortical thickening vs. controls) as well as superior frontal vertices (relative cortical thinning). The occipital cluster of cortical thickening in ASD showed a statistically significant spatial overlap with the cluster of peak M15 expression (Figure 4g, upper panel, ‘Materials and methods,’ Dice coefficient = 0.7, pspin<0.01), and relative cortical thickness change correlated with the M15 eigenmap (Figure 4h). In contrast, M12 expression was not significantly aligned with CT change in ASD (Figure 4g and h). Testing these relationships in the opposite direction, that is, asking whether regions of peak M12 and M15 expression are enriched for directional CT change in ASD relative to other cortical regions, recovered the M15-specific association with regional cortical thickening in ASD (Figure 4—figure supplement 1c).

Taken together, the above findings reveal that an occipital hotspot of altered gene expression and cortical thickening in ASD overlaps with an occipital hotspot of high expression for a subset of ASD risk genes. These ASD risk genes are spatially co-expressed in a module enriched for several connected layers of biological organization (Figures 3c and 4b–d) spanning: nuclear pathways for chromatin modeling and regulation of gene expression; G2/M phase cycling progenitors and excitatory neurons in the mid-fetal cortex; oligodendrocytes and layer 2 cortical neurons in adult cortex; and occipital functional networks involved in visual processing. These multiscale aspects of cortical organization can now be prioritized as potential targets for a subset of genetic risk factors in ASD, and the logic of this analysis in ASD can now be generalized to any disease genes of interest.

Discussion

We build on the most anatomically comprehensive dataset of human cortex gene expression available to date (Hawrylycz et al., 2012), to generate, validate, characterize, apply, and share spatially dense measures of gene expression that capture the topographically continuous nature of the cortical mantle. By representing patterns of human cortical gene expression without the imposition of a priori boundaries (Burt et al., 2018; Hawrylycz et al., 2015), our library of DEMs allows anatomically unbiased analyses of local gene expression levels as well as the magnitudes and directions of local gene expression change. This core spatial property of DEMs unlocks several methodological and biological advances. First, the unparcellated nature of DEMs allows us to agnostically define cortical zones with extreme transcriptional profiles or unusually rapid transcriptional change, which we show to capture microstructural cortical properties and align with folding and functional specializations at the macroscale (Figure 2). By establishing that some of these cortical zones are evident at the time of cortical folding, we lend support to a ‘protomap’ (O’Leary, 1989; O’Leary et al., 2007; Rakic, 1988; Rakic et al., 2009)-like model where the placement of some cortical folds is setup by rapid tangential changes in cyto-laminar composition of the developing cortex (Ronan et al., 2014; Toro and Burnod, 2005; Van Essen, 2020). The DEMs are derived from fully folded adult donors, and therefore some of the measured genetic-folding alignment might also be induced by mechanical distortion of the tissue during folding (Heuer and Toro, 2019; Llinares-Benadero and Borrell, 2019). However, no data currently exist to conclusively assess the directionality of this gene-folding relationship.

We show that DEMs can recover sharp boundaries in gene expression despite being generated by interpolation algorithms that do not explicitly encode step changes in expression between cortical regions. This property of DEMs will help to target future studies of human cortical patterning (e.g., directing single-cell and spatial omics resources), and we illustrate this utility by applying DEMs to discover two new expression boundaries in the human cortex. Second, we use spatial correlations between DEMs to decompose the complex topography of cortical gene expression into a smaller set of cortex-wide transcriptional programs that capture distinct aspects of cortical biology – at multiple spatial scales and multiple developmental epochs (Figure 3). This effort provides an integrative model that links expression signatures of cell types and layers in prenatal life to the large-scale patterning of regional gene expression in the adult cortex, which can in turn, through DEMs, be compared to the full panoply of in vivo brain phenotypes provided by modern neuroimaging. Indeed, future work might find direct links between these module eigenvectors and similar low-frequency eigenvectors of cortical geometry have been used as basis functions to segment the cortex (Lefèvre et al., 2018) and explain complex functional activation patterns (Pang et al., 2023). Third, we find that some of these cortex-wide expression programs in adulthood are enriched for disease risk genes, which offers a new path to nominating candidate disease mechanisms across different levels of biological organization (Figure 4). For example, the M15 module defines a normative spatial pattern of cortical gene co-expression which not only captures a functionally enriched subset of ASD genes (Satterstrom et al., 2020), but also shows multiscale enrichments and regionally specific expression patterns that tie together several independently reported aspects of ASD neurobiology. Specifically, M15 newly integrates (i) the concentration of ASD risk genes and dysregulated gene expression in upper-layer excitatory neurons (Velmeshev et al., 2019), (ii) the accentuation of altered gene expression and thickness in occipital cortical regions, and (iii) the early emergence among children at heightened genetic risk for ASD of behaviorally relevant changes in cortical structure and function (Girault et al., 2022) within occipital systems important for the processing of visual information. Crucially, the strategy applied in our analysis of ASD risk genes can be generalized to risk genes for any brain disorder of interest to place known risk factors for disease into the rich context of multiscale cortical biology.

Finally, the collection of DEMs, annotational gene sets, and statistical tools used in this work is shared as a new resource to accelerate multiscale neuroscience by allowing flexible and spatially unbiased translation between genomic and neuroanatomical spaces. Of note, this resource can easily incorporate any future expansions of brain data in either neuroanatomical or genomic space. We anticipate that it will be particularly valuable to incorporate new data from the nascent, but rapidly expanding fields of high-throughput histology (Wagstyl et al., 2020), single-cell omics (Bakken et al., 2021), and large-scale imaging-genetics studies (Smith et al., 2021). Taken together, MAGICC enables a new integrative capacity in the way we study the brain, and hopefully serves to spark new connections between previously distant datasets, ideas, and researchers.

Materials and methods

Creating spatially dense maps of human cortical gene expression (Figure 1a–d)

Request a detailed protocol

Cortical surfaces were reconstructed for each AHBA donor MRI using FreeSurfer (Fischl, 2012), and coregistered between donors using surface matching of individuals’ folding morphology (MSMSulc) (Robinson et al., 2018). An average donor cortical mesh was also created for analyses of cortical morphology by averaging the vertex coordinates of volumetrically aligned meshes for the six donors.

Probe-level data measures of gene expression for all samples in the AHBA adult brain microarray dataset were downloaded from https://human.brain-map.org/static/download, providing log2-transformed measures of gene expression for 58,692 probes in each of 3702 brain tissue samples from six donors (Supplementary file 1). Within- and across-brain normalization of these probe-level gene expression values was implemented as detailed by the Allen Institute for Brain Science White Paper (Allen Human Brain Atlas, 2013). Probes were reannotated using the updated manifest from Arnatkeviciute et al., 2019, excluding genes lacking an Entrez, and probe-level expression values were averaged for each gene to yield a single gene * sample expression matrix for each donor. As only two donors had measurements from right hemispheres, samples were filtered by region to retain those originating from the cerebral cortex left hemisphere only. This decision was made given evidence for potential asymmetries in gene expression within the human cortex (de Kovel et al., 2018), and known differences in cortical shape between the hemispheres that complicate the reflection of sample locations from left to right cortical sheets (Jo et al., 2012). The above steps resulted in a final set of six donor-level gene * sample matrices from the left cerebral cortex for downstream analyses. These matrices collectively contained scaled expression values for 20,781 genes in each of the 1304 cortical samples.

Native subject MRI coordinates were extracted for every cortical sample in each donor (Figure 1a). Nearest mid-surface cortical vertices were identified for each sample, excluding samples further than 20 mm from a cortical coordinate. For cortical vertices with no directly sampled expression, expression values were interpolated from their nearest sampled neighbor vertex on the spherical surface (Moresi and Mather, 2019; Figure 1b). Sampling density ρ in each subject was calculated as the number of samples per mm2, from which average inter-sample distance, d, was estimated using the formula: d=1 , giving a mean intersample distance of 17.7 mm ± 1.2 mm. Surface expression maps were smoothed using the Connectome Workbench toolbox (Glasser et al., 2013) with a 20 mm full-width at half maximum (FWHM) Gaussian kernel, selected to be consistent with this sampling density (Figure 1c). To align subjects’ expression, expression values were z-scored by the mean and standard deviation across vertices (given the known criticality of within-subject scaling of AHBA expression values; Markello et al., 2021) and then averaged across the six subjects (Figure 1d), yielding spatially dense estimates of expression at 29,696 vertices across the left cerebral cortex per gene. Vertex-wise, rather than sample-level, estimation of mean and standard deviation mitigates potential biases introduced by intersubject variability in the regional distribution and density of cortical samples. For Y-linked genes, DEMs were calculated from male donors only. For each of the resulting 20,781 gene-level expression maps, the orientation and magnitude of gene expression change at each vertex (i.e., the gradient) were calculated for folded, inflated, spherical, and flattened mesh representations of the cortical sheet using Connectome Workbench’s metric gradient command (Glasser et al., 2013).

Benchmarking DEMs

Spin tests for comparing two spatial maps

Request a detailed protocol

Cortical maps exhibit spatial autocorrelation that can inflate the false-positive rate, for which a number of methods have been proposed (Alexander-Bloch et al., 2018; Burt et al., 2020; Vos de Wael et al., 2020). At higher degrees of spatial smoothness, this high false-positive rate is most effectively mitigated using the spin test (Alexander-Bloch et al., 2018; Markello and Misic, 2021; Vos de Wael et al., 2020). In the following analyses when generating a test statistic comparing two spatial maps, to generate a null distribution, we computed 1000 independent spins of the cortical surface using https://netneurotools.readthedocs.io and applied it to the first map whilst keeping the second map unchanged. The test statistic was then recomputed 1000 times to generate a null distribution for values one might observe by chance if the maps shared no common organizational features. This is referred to throughout as the ‘spin test’ and the derived p-values as pspin.

An additional null dataset was generated to test whether intrinsic geometry of the cortical mesh and its impact on interpolation for benchmarking analyses of DEMs and gradients (Figure 1—figure supplements 1d and 2d, Figure 2—figure supplement 1c). In these analyses, the original samples were rotated on the spherical surface prior to subsequent interpolation, smoothing, and gradient calculation. Due to computational constraints, the full dataset was recreated only for 10 independent spins. These are referred to as the ‘spun + interpolated null.

Replicability and independence from cortical sampling density (Figure 1—figure supplement 1)

Request a detailed protocol

We assessed the replicability of DEMs by applying the above steps for DEM generation to non-overlapping donor subsets and comparing DEMs between the resulting sub-atlases. We quantified DEM agreement between sub-atlases at both the gene level (correlation in expression across vertices for each gene, Figure 1—figure supplement 1c) and the vertex level (correlation in ranking of genes by their scaled expression values at each vertex, Figure 1—figure supplement 1d, e). These sub-atlas comparisons were done between all possible pairs of individuals, donor duos, and donor triplets to give distributions and point estimates of reproducibility for atlases formed of one, two, and three donors. Learning curves were fitted to these data to estimate the projected gene-level and vertex-level DEM reproducibility of our full six-subject sample atlas (Figueroa et al., 2012; Figure 1—figure supplement 1c).

To assess the effect of data interpolation in DEM generation, we compared gene-level and vertex-level reproducibility of DEMs against a ‘ground truth’ estimate of these reproducibility metrics based on uninterpolated expression data. To achieve a strict comparison of gene expression values between different individuals at identical spatial locations, we focused these analyses on the subset of AHBA samples where a sample from one subject was within 3 mm geodesic distance of another. This resulted in 1097 instances (spatial locations) with measures of raw gene expression of one donor and predicted values from the second donor’s uninterpolated AHBA expression data and interpolated DEM. We computed gene-level and vertex-level reproducibility of expression using the paired donor data at each of these sample points – for both DEM and uninterpolated AHBA expression values. By comparing DEM reproducibility estimates with those for uninterpolated AHBA expression data, we were able to quantify the combined effect of interpolation and smoothing steps in DEM generation. We used gene-level reproducibility values from DEMs and uninterpolated AHBA expression data to compute a gene-level difference in reproducibility, and we then visualized the distribution of these difference values across genes (Figure 1—figure supplement 1a). We used gene–rank correlation to compare vertex-level reproducibility values between DEMs and uninterpolated AHBA expression data (Figure 1—figure supplement 1b).

Theoretically, regional gradients of expression change in DEMs could be biased by regional variations in the density of AHBA cortical sampling. To test for this, in each individual subject, we calculated the spatial relationship between the sampling density and mean gene gradient magnitude (Figure 1—figure supplement 1g). We additionally tested whether the regional variability of gene rank predictability in the atlas (shown in Figure 1—figure supplement 1f) was linked to the sampling density within the atlas.

Alignment with reference measures of cortical organization (Figure 1e–g)

Request a detailed protocol

We first determined whether our DEM library was able to differentiate between genes that are known to show cortical expression (CExp) and those without any prior evidence of cortical expression (NCExp) – motivated by the strong expectation that NCExp genes should lack a consistent spatial gradient in expression. For this test, we defined a set of 16,573 CExp genes by concatenating the genes coding for proteins found in the ‘cortex’ tissue class of the human protein atlas (Sjöstedt et al., 2020) genes identified as markers for cortical layers or cortical cells (see below; Darmanis et al., 2015; Habib et al., 2017; He et al., 2017; Hodge et al., 2019; Lake et al., 2018; Lake et al., 2016; Li et al., 2018; Maynard et al., 2021; Ruzicka et al., 2021; Velmeshev et al., 2019; Zhang et al., 2016). The remaining 4208 genes in our DEM library were classified as NCExp. Fisher’s exact test was used to assess whether genes with lower gene reproducibility (r < 0.5) were enriched for NCExp genes. We projected vertex-level reproducibility values for CExp and NCExp genes onto the cortical surface for visual comparison and also computed the mean cross-vertex reproducibility for each of these maps (Figure 1—figure supplement 1f).

We next compiled data from independent studies for a range of macroscale and microscale cortical features that would be expected to align with specific DEM maps, and asked whether the spatial patterns of cortical gene expression from DEMs showed the expected alignment with these independent data. These independent comparison studies were selected to span diverse measurement methods and data modalities representing a range of spatial scales.

We first sought to establish whether local changes in DEMs, that is, the gradient maps of gene expression, could be used to validate existing areal border genes and identify novel candidates. Using a parcellation of the cortex based on multimodal structural and functional neuroimaging (Glasser et al., 2016), we identified the vertices along the boundary between a pair of regions (e.g., V1 and V2). The mean DEM gradient at these vertices was quantified for each gene, enabling us to rank genes by their exhibited border-like features at this cortical location. We then assessed the ranking of known lists of areal marker genes for a given border against a randomly sampled null distribution. To validate known areal marker genes derived from previous ISH studies, we took examples from the human visual cortex (Zeng et al., 2012), macaque visual cortex, and macaque frontal regions 44 and 45 (Chen et al., 2022). To test the capacity of our resource to identify novel putative areal border genes, we calculated average gradients of all genes across the boundary between mesial temporal parahippocampal gyrus (perirhinal ectorhinal cortex, PeEc) and the fusiform gyrus (area TF) for which there is openly available ISH data (https://human.brain-map.org/ish/search). Limiting analyses to those genes for which ISH was available, the two genes exhibiting the largest gradient in either direction (four in total) were selected. The ISH was visually inspected for the presence of area-like features in gene expression. For quantitative support, the cortex in each ISH image was manually segmented over the area of interest. The pixel-wise transverse distance along the cortical segmentation from left to right was calculated and subdivided into 200 equally spaced columns, spanning from pial to white matter surface. Staining intensity was averaged across each column. For each column, we computed the t-statistic between columns to the right and left, and identified the column with the largest t-statistic as the location of the putative interareal boundary.

We used the spun + interpolated null to test whether peaks in gene gradient could be driven primarily by local folding morphology impacting non-uniform interpolation. We quantified the average gradient for all genes along the V1-V2 border in the atlas, as well as for 10 iterations of the atlas where the samples were spun prior to interpolation. We computed the median gradient magnitude for the 20 top-ranked genes for each (Figure 1—figure supplement 2d).

We benchmarked DEMs against regional differences in cellular measures of cortical organization from single-nucleus RNA-sequencing studies (snRNAseq). Specifically, we correlated regional differences in the estimated proportion of 16 neuronal subtypes across six cortical regions (Lake et al., 2016) with regional DEM estimates for the mean expression of provided markers for these cell types (Lake et al., 2016). The test statistic was tested against a null distribution generated through spinning and resampling the cell marker DEM estimates (Table 1). Given the observed correspondence between regional cellular proportions and regional expression of cell marker sets, we used more recently generated reference cell markers from the Allen Institute for Brain Sciences (Bakken et al., 2021; Hodge et al., 2019; Tasic et al., 2016) to generate DEMs for 11 of 14 major cell subclasses in the mammalian cortex (6 neuronal types shown in Figure 1h, all 11 used for TD peak enrichment analysis; Figure 2—figure supplement 1h). Three markers were excluded due to absence in the original dataset or low gene predictability (r < 0.2, Figure 1—figure supplement 1c).

Table 1
Statistical tests used to compare spatial maps and gene sets derived from the Allen Human Brain Atlas with independent multiscale neuroscientific resources.
Input dataTest statisticSignificance test
Comparison of two cortical maps, e.g., Figure 1ePearson’s R (e.g., Figure 1e and f), Spearman Rrank (Figure 3), delta Z for binary and continuous comparison (Figures 1e and 3c and d), Dice score for two binary maps (Figures 2d and 4g), skew in distribution of angles (Figures 2f and c and 3cFigures 2f & g and 3c). Counts for peak expression locations overlapping ROIs (Figure 2—figure supplement 1h).Spin test: generate null distribution for test statistic by independently spinning spherical projections of spatial maps and recalculating test statistic on spun maps (Alexander-Bloch et al., 2018).
Intrasubject alignment of multimodal mapsPearson’s R (e.g., Figure 1—figure supplement 1g).Simple permutation-based intermodal correspondence (SPICE) test (Weinstein et al., 2021).
Comparison of gene–gene connectivity matrix, e.g., protein–protein interaction vs. spatial correlation, gene–gene spatial correlation vs. developmental trajectory correlationIf continuous, threshold matrix at 95%. Fisher’s exact test for significant edge-level overlap.Fisher’s exact test p-values corrected for multiple comparisons using the Holm–Sidak step down procedure (Holm, 1979).
Overlap of two gene lists, e.g., Figure 3eFisher’s exact test.Fisher’s exact p-value corrected for multiple comparisons.
Cortical thickness changes in pathology (in AD, Figure 1e, in ASD, Figure 4g)Linear model:
Vertex cortical thickness ~ Age + sex + group + mean cortical thickness.
Cluster-wise correction. Calculate maximum size of significant clusters on 1000 randomly permuted cohorts, using the 95th centile size as a threshold on the test cohort (Hagler et al., 2006).
Intramodular trajectory correlationPairwise intramodular median rank correlation.Randomly sampled gene sets of comparable size.
Protein–protein interactionIntramodular connectivity.Random resampling of gene sets with decile-matching for degree.
  1. AD, Alzheimer’s disease; ASD, autism spectrum disorder; ROI, region of interest.

We benchmarked DEMs against orthogonal spatially dense measures of cortical through the following comparisons: (i) layer IV thickness values from the 3D BigBrain atlas of cortical layers (Wagstyl et al., 2020) vs. the average DEM for later IV marker genes (He et al., 2017; Maynard et al., 2021; Supplementary file 2); (ii) motor-associated areas of the cortex from multimodal in vivo MRI (Glasser et al., 2016) vs. the average DEM for two marker genes (ASGR2, CSN1S1) of Betz cells, which are giant pyramidal neurons that output from layer V of the human motor cortex (Bakken et al., 2021); (iii) an in vivo neuroimaging map of the T1/T2 ratio measuring of intracortical myelination (Glasser and Van Essen, 2011) vs. the DEM for Myelin Basic Protein; and (iv) regional cortical thinning from in vivo sMRI data in AD patients with the APOE E4 (OASIS-3 dataset [LaMontagne et al., 2019], see ‘MRI data processing’) vs. the APOE4 DEM. For all four of these comparisons, alignment between maps was quantified and test for statistical significance using a strict spin-based spatial permutation method that controls for spatial autocorrelation in cortical data (Alexander-Bloch et al., 2018) methods on statistical testing of pairwise cortical maps can be found in Table 1.

Characterizing the topography of DEMs

TD and PCA (Figure 2a–c)

Request a detailed protocol

TD of each cortical vertex was calculated as the mean of the absolute DEM value for all genes (Figure 2a). Statistically significant peaks in TD, driven by convergence of extreme values across multiple genes, were identified as follows. The DEM for each gene was independently spun and TD was recalculated at each vertex over 1000 sets of gene-level DEM permutations (Alexander-Bloch et al., 2018). The maximum vertex TD value for each permuted TD map was recorded and the 95th percentile value across the 1000 permutations was taken as a threshold value. This threshold represents the maximum TD one would expect in the absence of concentrated colocalisations of extreme expression signatures, and areas above this threshold were annotated as TD peaks. To disambiguate TD peaks that are spatially coalescent but potentially driven by extreme values of heterogeneous gene sets within different regions, we concatenated all suprathreshold TD vertices into a single vertex * gene matrix and vertices in this matrix were clustered based on their expression signatures.

Intervertex correlation of gene rankings was calculated and the matrix was clustered using a Gaussian mixture model. Bayesian information criterion was used to identify the optimum number of clusters (k = 6) from a range of 2–18. Automated labels to localize TD peaks were generated based on their intersection with a reference multimodal neuroimaging parcellation of the human cortex (Glasser et al., 2016). Each TD was given the label of the multimodal parcel that showed the greatest overlap (Figure 2b).

The TD map was assessed for reproducibility through three approaches. First, the six-subject cohort was subdivided into pairs of triplets, for which there are 10 unique combinations. For each combination, independent TD maps were computed for each triplet and compared between triplets (Figure 2—figure supplement 1a). Second, for the full six-subject cohort genes were grouped into deciles according to the reproducibility of their spatial patterns in independent subcohorts (Figure 1—figure supplement 1c). For each decile of genes, a TD map was computed and compared to the TD map from the remaining 90% of genes (Figure 2—figure supplement 1b). Third, to assess whether the covariance in spatial patterning across genes could be a result of mesh-associated structure introduced through interpolation and smoothing, TD maps were recomputed for the spun + interpolated null datasets and compared to the original TD map (Figure 2—figure supplement 1c).

The cortical regions defined by TD peaks were annotated according to their spatial overlap with the 24 cortical cell marker expression DEMs used in Figure 1g and h (Bakken et al., 2021; Hodge et al., 2019; Lake et al., 2016). To establish that cell maps were aligned with TD peaks, we first tested whether the vertex with the highest DEM value for each cell map overlapped with a TD peak and compared the number of overlapping cells to a null distribution created through spinning the TD peaks independently 1000 times. We then identified the cell types whose expression most closely aligned with each TD peak, comparing mean TD expression with a null distribution generated through spinning the peaks 1000 times (Figure 2—figure supplement 1h). TD peaks were also annotated for their functional activations using the meta-analytic Neurosynth database (see ‘Map-based annotations’).

Gene sets characterizing TD peaks were identified as follows. At the vertex with the highest TD value within a peak region, the 95th centile TD value across genes was selected as a threshold. Genes with z-scored expression values above this threshold or below its inverse were selected, allowing TD peaks to have asymmetric length gene lists for high- and low-expressed genes (Supplementary file 3). These TD gene lists were submitted to a Gene Ontology (GO) enrichment analysis pipeline (see ‘Gene set-based annotations’).

To contextualize the newly described TD peaks using previously reported principal components (PCs) of human cortical gene expression, we computed the first five PC of gene expression in our full DEM library. The percentage of variance explained by each PC was calculated and compared to a null threshold derived through fitting PCs to a permuted null given by 1000 random spatial rotations of gene-level DEMs (Figure 2—figure supplement 1d). Taking the gene-level loadings from the first three PCs (Figure 2—figure supplement 1e), each vertex could be positioned in a 3D PC space based on its expression signature and also be colored based on its membership of a TD peak, thereby visualizing the position of TD peaks relative to the dominant spatial gradients of transcriptomic variation across the cortex (Figure 2c).

The assignment of TD regions as ‘peaks’ implies a rapid emergence of the TD signature surrounding the peak boundaries, which we formally assessed by cortex-wide analysis of local tangential changes in gene expression (see ‘Local gradient analysis), and a spatially fine-grained comparisons of the physical vs. transcriptional distance between cortical regions. In the latter of these two analytic approaches, a rapid ‘border-like’ onset of TD features would appear as (i) TD regions showing a greater transcriptional distance from other cortical regions than would be expected from their physical distance from other cortical regions, and (ii) this disparity emerging sharply surrounding the peak. To achieve this test, we first quantified the geodesic physical distance and Euclidean transcriptomic distance between pairs of vertices. For computational tractability, we limited this analysis to a subsample of vertices, choosing central vertices from ROIs in a parcellation with 500 approximately evenly sized parcels (Schaefer et al., 2018). We fit a linear generalized additive model to the data – predicting transcriptomic distance from geodesic distance – and calculated the residuals for each inter-vertex edge (Figure 2—figure supplement 1f). For each sampled vertex, we averaged these residuals and mapped them back to the surface to visualize cortical areas that were transcriptomically more distinctive than their physical distance to other areas would predict (Figure 2—figure supplement 1g).

Relating adult TD peaks to fetal gene expression (Figure 2—figure supplement 1k)

Request a detailed protocol

We sought to establish whether the regional expression signatures characterizing TD peaks were present early in fetal development. This goal required measures of gene expression from multiple regions across the fetal cortical sheet, which are provided by the Allen Institute from Brain Sciences fetal laser micro-dissection microarray dataset (Miller et al., 2014). In each sample’s fetal brain, this dataset represents approximately 25 cortical brain regions tangentially, and radially 7 transient fetal layers/compartments: subpial granular zone (SG), marginal zone (MZ), outer and inner cortical plate (grouped together as CP), subplate zone (SP), intermediate zone (IZ), outer and inner subventricular zone (grouped together as SZ), and ventricular zone (VZ).

Probe-level data measures of gene expression for the two PCW21 donors in the AHBA fetal LMD microarray dataset were downloaded from https://www.brainspan.org/static/download.html, providing log2-transformed measures of gene expression for 58,692 probes in each of the 536 tissue samples across both donors (Supplementary file 1). Preprocessing and normalization of these probe-level gene expression values were implemented as detailed by the Allen Institute for Brain Science White Paper (https://help.brain-map.org/download/attachments/3506181/Prenatal_LMD_Microarray.pdf). Probe-level expression values were averaged for each gene to yield a single gene * sample expression matrix for each donor, which was filtered to include only cortical samples. Gene expression values were scaled across samples within each donor, and scaled gene expression values were compiled for the set of 235 cortical regions that was common to both donor datasets. We averaged scaled regional gene expression values between donors per gene and filtered for genes in the fetal LDM dataset that were also represented in the adult DEM dataset, yielding a single final 20,476 * 235 gene-by-sample matrix of expression values for the human cortex at 21 PCW. Each TD peak region was then paired with the closest matching cortical label within the fetal regions. This matrix was then used to test whether each TD expression signature discovered in the adult DEM dataset (Figure 2, Supplementary file 3) was already present in similar cortical regions at 21 PCW.

The analysis of fetal regional patterning of TD peak gene sets was carried out as follows (Figure 2—figure supplement 1k). For a given TD peak, the significantly enriched genes for that peak (see above for definition of these gene sets) were identified in the fetal dataset and averaged at each fetal sample – capturing how highly expressed the TD signature was in each fetal sample. Next, we identified all samples in the fetal expression dataset that originated from regions underlying the TD peak and defined these as the ‘fetal target region set’ for that TD region (i.e., occipital samples in the fetal brain were the fetal target region set for analysis of gene enriched in the adult occipital TD region). We ranked all fetal samples by their mean expression of the TD marker set and normalized these ranks to between 0 (TD markers most highly expressed) and 1 (TD markers most lowly expressed). Normalization was done to adjust for varying numbers of areas recorded per compartment. This ranking enabled us to compute the median rank of the fetal target region set and test whether this was significantly lower compared to a null distribution of ranks from random reassignment of the fetal target region set labels across all fetal samples. Within this analytic framework, a statistically significant test means that the adult TD signature is significantly localized to homologous cortical regions at 21 PCW fetal life (Figure 2—figure supplement 1k ). We repeated this procedure for each adult TD.

Local gradient analysis (Figure 2e–g)

Request a detailed protocol

Spatially DEMs enabled the calculation of a vector describing the first spatial derivative, that is, the local gradient, of each gene’s expression at each vertex. These vectors describe both the orientation and the magnitude of gene expression change.

Averaging these gene-level magnitude estimates across genes provided a vertex-level summary map of the magnitude of local expression changes in our full DEM library (Figure 2e). Regions with a significantly high average expression gradient were identified using a similar spatial permutation procedure as described for the identification of TD peaks. Briefly, the DEM gradient map for each gene was independently spun and an average expression gradient magnitude was recalculated at each vertex over 1000 sets of these spatial permutations (Alexander-Bloch et al., 2018). For each permutation, we recorded the maximum vertex-level average expression gradient value, and the 95th percentile value of these maximums across the 1000 permutations was taken as a threshold value. Vertices with observed average expression gradient values above this threshold represented cortical regions of significantly rapid transcriptional change (Figure 2—figure supplement 1j).

The principal orientation of gene expression change at each vertex was calculated considering the vectors describing gene expression gradients, thereby providing a single summary of local gene expression gradients that considers both direction and magnitude. PCA of gene gradient vectors was used to calculate the primary orientation of gene expression change at each vertex (Figure 2e) and the percentage of orientation variance accounted for by this PC (Figure 2e, Figure 2—figure supplement 1i). Gene-level PC weights for each vertex were stored for subsequent analyses, including alignment with folds and functional ROIs (Figure 2f and g, see ‘Annotational analyses’).

The rich DEM expression gradient information described above was applied in three downstream analyses. First, we used these resources to detail the emergence of TD expression signatures within the cortical sheet, focusing on all vertices that had been identified to show a significantly elevated mean expression gradient. Specifically, we ranked genes at these vertices by their loadings onto the first PC of gene expression gradients at each vertex and correlated these rankings with the rankings of genes by the expression at each TD peak vertex. This vertex-level correlation score, which quantifies how closely the gene expression gradient at a given vertex resembles that expression signature of a given TD peak, was regenerated for each of the six TD peaks (colors, Figure 2—figure supplement 1j). In each of these six maps, we were also able to plot the principal orientations of expression change at the vertex level (red lines, Figure 2—figure supplement 1i) to ask whether gradients of expression change for a given TD signature were spatially oriented towards the TD in question.

Second, we used the principal orientation of expression change at each vertex to assess whether local transcriptomic gradients were aligned with the orientation of cortical folding patterns. Orientation of cortical folds was calculated using sulcal depth and cortical curvature (Xia et al., 2018). Gradient vectors for sulcal depth describe the primary orientation of cortical folds on the walls of sulci, while gradient vectors of cortical curvature better describe the orientation at sulcal fundi and gyral crowns. These two gradient vector fields were combined and smoothed with a 10 mm FWHM Gaussian kernel to propagate the vector field into plateaus, for example, at large gyral crowns where neither sulcal depth nor curvature exhibit reliable gradients. The folding orientation vectors were calculated with reference to a 2D flattened cortical representation for statistical comparison with the gradient vectors derived from gene expression maps (Figure 2f). At each vertex, the minimum angle was calculated between the folding orientation vector and gene expression gradient vector. Aligned vector maps exhibit positive skew, with angles tending toward zero. Therefore, the skewness of the distribution of angles across all vertices was calculated, and to test for significance, folding and expression vector maps were spun relative to one another 1000 times, generating a null distribution of skewness values against which the test statistic was compared (Table 1). A similar analysis was applied to test the association between module eigenmap gradient vectors and cortical folding (see ‘Weighted gene co-expression network analysis (WGCNA)’).

Third, we sought to quantify the alignment between cortical expression gradients and cortical areas as defined by multimodal imaging. Orientation of each MRI multimodal parcel ROI from Glasser et al., 2016 was calculated taking the coordinates for all vertices within a given ROI. PCA of coordinates was used to identify the short and long axis of the ROI object. The vector describing the short axis was taken for comparison with mean of expression gradient vectors for vertices in the same ROI. For each ROI, the minimum angle was calculated and the skewness of the angles across all ROIs was calculated and compared to a null distribution created through spinning maps independently 1000 times, recalculating angles and their skewness (Figure 2g).

Weighted gene co-expression network analysis (WGCNA) (Figure 3a–c)

Request a detailed protocol

Genes were clustered into modules for further analysis using WGCNA (Langfelder and Horvath, 2008). Briefly, gene–gene cortical spatial correlations were calculated across all vertices to generate a single square 20,781 * 20,781 signed co-expression matrix. This co-expression matrix underwent ‘soft-thresholding,’ raising the values to a soft power of 6, chosen as the smallest power where the resultant network satisfied the scale-free topology model fit of r2 > 0.8 (Zhang and Horvath, 2005). Next, a similarity matrix was created through calculating pairwise topological overlap, assessing the extent to which genes share neighbors in the network (Yip and Horvath, 2007). The inverse of the TOM was then clustered using average linkage hierarchical clustering, with a minimum cluster size of 30 genes. The eigengene for each module is the first PC of gene expression across vertices and provides a single measure of module expression at each vertex (hence, ‘eigenmap’). As per past implementation of WGCNA, pairs of modules with eigengene correlations above 0.9 were merged. These procedures defined a total of 23 gene co-expression modules ranging in size from 77 to 3725 genes, and a single set of unconnected genes (gray module 265 genes). We filtered the gray module from further analysis, as well as all six other modules that were also statistically significantly enriched for NCExp genes (Supplementary file 4, Fisher’s test, all p<0.0001), leaving a total of 16 modules for downstream analysis (Supplementary file 4). To assess the extent to which eigenmaps captured highly reproducible features of cortical organization, for each decile of genes, DEMs were correlated with their module eignmaps recomputed from the remaining 90% of genes (Figure 3—figure supplement 1a).

Each WGCNA module could be visualized as a cortical eigenmap, and eigenmap gradient – on the TD terrain, or inflated cortical (Figure 3a). The eigenmap gradient for each module provides a vertex-level measure for the magnitude of change in module expression at each vertex, as well as a vertex-level orientation of module expression change – calculated as described in ‘Local gradient analysis’. These anatomical representations of each WGCNA module are amenable to spatial comparison with any other cortical map through spatial permutations (Alexander-Bloch et al., 2018; see ‘Annotational analyses’). Each WGCNA module is also defined as a gene set, which is amenable to standard gene set-based enrichment analysis (see ‘Annotational analyses). WGCNA modules can each also be represented as a ranked list of all genes – based on gene-level kME scores for each module, which are the cross-vertex correlation between a gene’s DEM map and a module’s eigenmap.

Multiscale annotation of WGCNA modules (Figure 3c and d)

We used multiple open neuroimaging and genomic datasets to systematically sample diverse levels of cortical organization and achieve a multiscale annotation of WGCNA modules. All gene sets used in enrichment analysis are detailed in Supplementary file 2.

Map-based annotations

Request a detailed protocol
MRI-derived maps of cortical function
Request a detailed protocol

Functional annotations of the cortex were carried out using two independent functional MRI (fMRI) resources – one based on state fMRI (rs-FMRI) (Yeo et al., 2011) and one using task-based fMRI (Rubin et al., 2017; Yarkoni et al., 2011). Resting-state functional connectivity networks were taken from Yeo et al., 2011, which divides the cortex into seven coherent functional networks through surface-based clustering of rs-fMRI as visual, somatomotor, dorsal attention, ventral attention, frontoparietal control, limbic, and default networks. We used spin-based spatial permutation testing to test for networks in which WGCNA eigenmap expression was significantly elevated (Figure 3c, see Table 1).

For task fMRI-driven functional annotation of the cortex, we drew on meta-analytic maps of cortical activation from Neurosynth (Rubin et al., 2017; Yarkoni et al., 2011). Briefly, over 11,000 functional neuroimaging studies were text-mined for papers containing specific terms and associated activation coordinates (Yarkoni et al., 2011). Secondary analyses generated activation maps for 30 topics spanning a range of cognitive domains (Rubin et al., 2017). Topic activation maps were intersected with cortical surface meshes and thresholded to identify vertices with an activation value above 0. Example topics included ‘motor, cortex, hand’ and ‘social, reasoning, medial prefrontal cortex’ (Figure 3d). Topics were excluded if intersected cortical maps indicated activation in fewer than 1% of cortical vertices. Topic maps were used to annotate TD peaks (Figure 2d), identifying for each ROI the two topics with the highest Dice overlap. Topic maps also served as an independent validation of selected WGCNA eigenmaps (Figure 2d, Table 1). Topic maps from Neurosynth were also used to provide an orthogonal validation of observed resting-state network enrichments from Yeo et al (Figure 3c) for M2 and M12: mean eigenmap expression for module M2 and M12 was calculated for Neurosynth topic maps and assessed for statistical significance using spin-based permutations (Figure 3d, Table 1).

MRI-derived maps of cortical structure
Request a detailed protocol

Cortical thickness and T1/T2 ‘myelin’ maps were taken from the Human Connectome Project average (Glasser et al., 2016). Spatial correlations were calculated across all vertices with each WGCNA module eigenmap and assessed for statistical significance using spin-based permutations (Figure 3c, see Table 1).

Orientation of cortical folds
Request a detailed protocol

We used the orientation of expression change at each vertex to assess whether local eigenmap gradients were aligned with the orientation of cortical folding patterns, mirroring the analysis described above (Figure 3—figure supplement 1b, see ‘Local gradient analysis’).

Inter-eigenmap correlations
Request a detailed protocol

We tested the pairwise spatial correlation between pairs of module eigenmaps. Statistical significance was assessed using a null distribution of correlation matrices through independently spinning eigenmaps and recalculating correlations, and correcting for multiple comparisons (Figure 3—figure supplement 1c, see Table 1).

Gene set-based annotations

GO enrichment

Request a detailed protocol

GO enrichment analysis (see Table 1) were carried out on gene sets of interest, testing for enrichment of Biological Processes and Cellular Compartment, using the GOATOOLS Python package (Klopfenstein et al., 2018). Where multiple gene lists were assessed simultaneously (e.g., for TD peak gene lists or WGCNA gene sets), correction for multiple comparisons was carried out by dividing the p<0.05 threshold for statistical significance by the number of tests (i.e., for 16 module p<0.05/16). To facilitate summary descriptions of multiple significant GO terms, terms were hierarchically clustered based on semantic similarity (Resnik, 1995) and representative terms were selected based on biological specificity (i.e., depth within the Gene Ontology tree) and magnitude of the enrichment statistic (Figure 3d, Supplementary file 2).

Layer marker gene sets and ISH validation

Request a detailed protocol

We sought to assess the extent to which convergent spatial patterns of gene expression indicate convergent laminar and cellular features. Marker genes for each cortical layer were defined as the union of layer-specific marker genes from two comprehensive transcriptomic studies of layer-dependent gene expression sampling prefrontal cortical regions (He et al., 2017; Maynard et al., 2021). He et al. took human cortical samples from the prefrontal cortex, corresponding to areas BA 9, 10, and 46. Samples were sectioned into cortical depths and underwent RNAseq to identify 4131 genes exhibiting layer-dependent expression. Maynard et al. took samples from the dorsolateral prefrontal cortex and carried out spatial snRNAseq to identify 3785 genes enriched in specific cortical layers. These independent resources were combined for laminar enrichment analyses (i.e., we took each layer’s marker genes to be the union of layer genes defined in Maynard et al. and He et al.). WGCNA module genes were tested for laminar enrichment using Fisher’s exact test, correcting for multiple comparisons (Figure 3c, see Table 1). Independent validation of laminar associations of candidate genes identified through the above marker lists was carried out using ISH data from the Allen Institute (Zeng et al., 2012). For selected modules, we identified the highest kME genes represented within the ISH dataset. For each of these genes, the highest quality sections were downloaded, and the cortical ribbon was manually segmented. Equivolumetric estimates of cortical depth were generated and profiles of depth-dependent staining intensity were generated (Huber et al., 2021). Accompanying approximate cytoarchitectonic layer thickness estimations were derived from BigBrain and used to describe the laminar location of ISH peaks (Wagstyl et al., 2020; Figure 3d).

Adult cortical cell-type marker gene sets

Request a detailed protocol

Cell marker gene sets were compiled from multiple snRNAseq datasets, sampling a wide variety of cortical areas covering occipital, temporal, frontal, cingulate, and parietal lobes (Darmanis et al., 2015; Habib et al., 2017; Hodge et al., 2019; Lake et al., 2018; Lake et al., 2016; Li et al., 2018; Ruzicka et al., 2021; Velmeshev et al., 2019; Zhang et al., 2016). To integrate across differing subcategories, cell subtype marker lists were grouped into the following cell classes according to their designated names: excitatory neurons, inhibitory neurons, oligodendrocytes, astrocyte, oligodendrocyte precursor cells, microglia, and endothelial cells. Marker lists for each of these cell classes represented the union of all subtypes assigned to the category. Cells not fitting into these categorizations were excluded. WGCNA module genes were tested for cell class marker enrichment using Fisher’s exact test, correcting for multiple comparisons (Figure 3c, see Table 1).

Fetal cortical cell-type marker gene sets

Request a detailed protocol

Fetal cell marker gene lists were taken from Polioudakis et al., 2019. WGCNA module genes were tested for cell class marker enrichment using Fisher’s exact test, correcting for multiple comparisons (Figure 3c, see Table 1).

Compartments and SynGO

Request a detailed protocol

Cellular compartment gene lists were taken from the COMPARTMENTS database (Binder et al., 2014), which identifies subcellular localisation of marker genes based on integrated information from the Human Protein Atlas, literature mining, and GO annotations. Examples of cellular compartments include nucleus, plasma membrane, and cytosol. An additional compartment list for neuronal synapse was generated by collapsing all genes in the manually curated SynGO dataset (Koopmans et al., 2019). WGCNA module genes were tested for cell compartment gene set enrichment using Fisher’s exact test, correcting for multiple comparisons (Figure 3c, see Table 1).

PPI network

Request a detailed protocol

PPIs were derived from the STRING database (Szklarczyk et al., 2019). Physical direct and indirect PPIs were considered. We tested for enrichment of PPIs for proteins coded by genes within WGCNA modules. The median number of intramodular connections was compared to a null distribution of median modular connectivity derived from 10,000 randomly resampled modules with the same number of genes. Gene resampling was restricted within deciles defined by the degree of protein–protein connectivity.

Developmental peak epoch

Request a detailed protocol

Peak developmental epochs for genes were extracted from Werling et al., 2020. Briefly, bulk transcriptomic expression values were measured from DLPFC samples across development (6 PCW to 20 y), fitting developmental trajectories to each gene. Genes were categorized according to developmental epoch in which their expression peaked. For descriptive purposes, epochs were renamed as (1) ‘early fetal’ (‘fetal,’ 8–24 PCW), (2) late fetal transition (‘perinatal,’ 24 PCW – 6 mo postnatal), and (3) ‘postnatal’ (>6 mo). Genes associated with WGCNA modules were tested for enrichment correcting for multiple comparisons across 16 modules.

Developmental trajectories

Request a detailed protocol

Gene-specific developmental trajectories were generated for the cortical samples from Li et al., 2018. Briefly, in this study bulk transcriptomic expression values were measured from brain tissue samples taken from individuals aged between 5 PCW and 64 years old. In our analysis, samples were filtered for cortical ROIs and restricted to post 10 PCW due to lack of samples before this time point. Ages were log transformed and Generalized Additive Models were fit to each gene to generate an estimated developmental trajectory. To compute trajectory correlations between genes, we first resampled expression trajectories at 20 equally spaced time points (in log time), and then z-normalized these values per gene (using the mean and standard deviation of each trajectory). We then calculated expression trajectory Pearson correlations between each pair of genes in this dataset and used these to determine whether the spatially co-expressed genes defining each WGCNA module also showed significant temporal co-expression. To achieve this test, we calculated the median temporal co-expression (correlation in expression trajectories) for each WGCNA module gene set and compared this to null median co-expression values for 1000 randomly resampled gene sets matching module size. The mean trajectories of genes in each module were calculated to visualize the developmental expression pattern of each module (Figure 3—figure supplement 1d).

Fetal compartmental analysis

Request a detailed protocol

We used the 21 PCW fetal microarray data processed for analysis of TD peaks (see ‘Relating adult TD peaks to fetal gene expression,’ Figure 2—figure supplement 1k; Miller et al., 2014) to generated marker gene sets for each of the seven transient fetal cortical compartments: subpial granular zone (SG), marginal zone (MZ), outer and inner cortical plate (grouped together as CP), subplate zone (SP), intermediate zone (IZ), outer and inner subventricular zone (grouped together as SZ), and ventricular zone (VZ). We collapsed 21 PCW cortical expression data into compartments by averaging expression values across cortical regions for each compartment because compartment differences are known to explain the bulk of variation in cortical expression within this dataset (24% [Miller et al., 2014]). The top 5% expressed genes for each of the seven fetal compartments was taken as the compartment marker set and used for enrichment analysis of WGCNA modules with Fisher’s exact test, correcting for multiple comparisons (see Table 1, Figure 3c).

Reproducibility of genes driving enrichment analyses

Request a detailed protocol

We calculated gene-level spatial reproducibilities for the union of all genes contributing to significant neurobiological enrichments of WGCNA modules. This was compared to a null distribution, randomly resampling the same number of genes from all those considered in the enrichment analyses.

Combining gene set-based annotations of the cortical sheet (Figure 3e , Figure 2—figure supplement 1d)

Request a detailed protocol

Our observation that many WGCNA modules showed statistically significant enrichment for diverse gene sets that could span different spatial scales (e.g., layers and organelles) or temporal epochs (e.g., fetal and adult cortical features) (Figure 3c) suggested a potential sharing of marker gene across these diverse sets. To test this idea and characterize potential biological themes reflected by these shared marker genes, we carried out pairwise enrichment analyses between all annotational gene lists (Figure 3e). Gene lists used for enrichment analysis of WGCNA modules for cortical layers, adult cells, cellular compartments, fetal cells, developmental peak epochs, and fetal compartments were taken for further analysis. A genelist–genelist pairwise enrichment matrix was generated. p-Values above 0.1 were set to 1, to limit their contribution, and p-values were converted to -log10(p). To remove isolated gene lists, all lists were ranked by their degree (edges defined as p<0.05) and the bottom 10% were excluded from further analysis. The matrix, excluding WGCNA modules, underwent Louvain clustering (Blondel et al., 2008), grouping together gene lists with similar properties. Clusters were assigned descriptive names according to their salient common features (e.g., non-neuronal, mature neuron, mitotic, myelin, fetal GE) (Figure 3—figure supplement 1e). For visualization, the full matrix underwent UMAP embedding (McInnes et al., 2018), a nonlinear dimensionality reduction technique assigning 2D coordinates to each gene list (Figure 3e), coloring gene lists by their assigned cluster along with the top 20% of edges.

Disease enrichment and ASD-based analysis of WGCNA modules

Request a detailed protocol

The proposed analyses above link regionally patterned cortical gene expression with macroscale imaging maps of structure and function, and microscale gene sets exhibiting laminar, cellular, subcellular, and developmental transcriptomic specificity. We sought to assess whether WGCNA module gene lists capturing shared spatial and temporal features were also enriched for genes implicated in atypical brain development. We included genes identified in exome sequencing studies in neurodevelopmental disorders: ASD (Ruzzo et al., 2019; Satterstrom et al., 2020) , schizophrenia (SCZ; Singh et al., 2020), severe developmental disorders (Deciphering Developmental Disorders Study [DDS]; Deciphering Developmental Disorders Study, 2017), and epilepsy (Heyne et al., 2018). WGCNA module gene sets were tested for enrichment of these genes using Fisher’s test and corrected for multiple comparisons (Table 1, Figure 4a). Two modules – M12 and M15 – showed enrichment for multiple disease sets, with the ASD gene set being unique for showing enrichment in both modules. We therefore focused downstream analysis on further characterizing the enrichment of ASD genes in M12 and M15, and testing whether these enrichments could predict regional cortical changes in ASD.

Characterizing ASD gene enrichments in M12 and M15

kME analysis

Request a detailed protocol

To better characterize the spatially distinctive properties of genes within M12 and M15, we defined the union of genes in both modules and collated the WGCNA-defined kME scores for each gene to both M12 and M15. This provided a basis for plotting all genes by their relative membership to both modules to quantify the proximity of each gene to each module, assess the discreteness of gene assignment to modules, and provide a common space within which to project gene functions and associations with ASD (Figure 4c).

Enrichment of ASD-linked GO terms

Request a detailed protocol

Genes linked to two specific GO terms, ‘Neuronal communication’ and ‘Gene expression regulation,’ enriched among risk genes for ASD in Satterstrom et al., 2020, were separately tested for enrichment within M12 and M15 (Figure 4d) using Fisher’s exact test.

Developmental trajectories of disease-linked modules

Request a detailed protocol

To characterize the distinctive temporal trajectories of M12 and M15 (see Figure 3c), we took gene-level trajectories (see ‘Developmental trajectories’) and calculated the mean gene-expression trajectory of genes in each module (Figure 4e).

Independent characterization of ASD risk genes

Request a detailed protocol

To assess the extent to which modules M12 and M15 captured the underlying axes of spatial patterning across all 135 ASD risk genes, we took DEMs for all 135 risk genes and independently clustered them. Pairwise co-expression was calculated for all risk gene DEMs and the resultant matrix was clustered using Gaussian mixture modeling into two clusters, C1 and C2 (Figure 4—figure supplement 1a). kME values were calculated for each risk gene with all WGCNA modules and averaged within each cluster. For each cluster, we then identified the WGCNA module with the highest mean kME (Figure 4—figure supplement 1b).

Comparing M12 and M15 expression to regional changes of cortical gene expression in ASD (Figure 4f)

Request a detailed protocol

We mapped regional transcriptomic disruption in ASD measured from multiple cortical regions using RNAseq data (Haney et al., 2020). This study compared bulk transcriptomic expression in ASD and control samples across 11 cortical areas, quantifying the extent of transcriptomic disruption by identifying the number of significantly DEGs in each region. Cortical areas sampled in this study were mapped to their closest corresponding area in a multimodal MRI parcellation (Glasser et al., 2016). The mean expression of M12 and M15 eigenmaps was quantified in the same cortical areas (Figure 4f). The test statistic, correlating eigenmap expression with the number of DEGs, was tested against a null distribution generated through spinning and resampling the eigenmaps (see Table 1).

Comparing M12 and M15 expression to regional changes of cortical thickness in ASD (Figure 4g and h, Figure S5c)

Request a detailed protocol

To assess the extent to which WGCNA module eigenmaps pattern macroscale in vivo anatomical differences in ASD, we took the map of relative cortical thickness change in autism (see ‘Preprocessing and analysis of structural MRI data’) and compared this to eigenmap expression patterns. M12 and M15 eigenmaps were thresholded, identifying the 5% of vertices with the highest expression. Areas of high significant thickness change were tested for overlap with areas of significant cortical thickness change using the Dice overlap compared to a null distribution of Dice scores generated through spinning the thresholded eigenmaps (see Table 1).

Preprocessing and analysis of structural MRI data

AHBA donors

Request a detailed protocol

Pial and white matter cortical T1 MRI scans of the six AHBA donor brains were reconstructed using FreeSurfer (v5.3) (Romero-Garcia et al., 2018; see Supplementary file 1). Briefly, scans undergo tissue segmentation, cortical white and pial surface extraction. A mid-thickness surface between pial and white surfaces was also created. The locations of tissue samples taken for bulk transcriptomic profiling, provided in the coordinates of the subject’s MRI, were mapped to the mid-thickness surface as outlined above (see ‘Creating spatially dense maps of human cortical gene expression from the AHBA’). Individual subject cortical surfaces were co-registered to the fs_LR32k template surface brain using MSMSulc (Robinson et al., 2018) as part of the ciftify pipeline (Dickie et al., 2019), which warps subject meshes by nonlinear alignment their folding patterns to the MRI-derived template surface. A donor-specific template surface was created by averaging the coordinates of the aligned meshes and used for analysis of cortical folding patterns used in ‘Alignment with reference measures of cortical organization.’ Pial, inflated, and flattened representations of the fs_LR32k surface were used for the visualization of cortical maps throughout.

OASIS (Figure 1e)

Request a detailed protocol

To estimate relative cortical thickness change in AD patients with the APOE E4 variant, we utilized the openly available OASIS database (LaMontagne et al., 2019). T1w MRI data was collected using a Siemens Tim Trio 3T scanner and underwent cortical surface reconstruction using FreeSurfer v5.3 as above. Reconstructions underwent manual quality control and correction, with poor quality data being removed. Output cortical thickness maps, smoothed at 20 mm FWHM and aligned to the fsaverage template surface, were downloaded via https://www.oasis-brains.org/, along with age, sex, and APOE genotype and cognitive status. Subjects were included in the analysis if they had been diagnosed with AD and had at least one APOE E4 allele (n = 119) or were a healthy control (n = 633) (see Supplementary file 1). Per-vertex coefficients for disease-associated cortical thinning and significance were calculated, adjusting for age, sex, and mean cortical thickness. We controlled for mean CT to identify local anatomical changes given our finding of generalized cortical thickening in AD compared to controls in OASIS. The map of cortical thickness coefficients was then registered from fsaverage to fs_LR32k for comparison with the DEM of APOE (Figure 1e; Robinson et al., 2018).

ABIDE

Request a detailed protocol

To estimate relative cortical thickness change in ASD, MRI cortical thickness maps, generated through FreeSurfer processing of 3T T1 structural MRI scans, were downloaded from ABIDE, along with age, sex, and site information (Di Martino et al., 2017; Supplementary file 1). Multiple sites and scanners were used to acquire these data, which is known to introduce systematic biases in morphological measurements like cortical thickness. To mitigate this, we used neuroCombat, which estimates and removes unwanted scanner effects while retaining biological effects on variables such as age, sex, and diagnosis (Fortin et al., 2018). Subjects with poor quality FreeSurfer segmentations were excluded using a threshold Euler count of 100 (ref). Cortical thickness change in ASD relative to controls was calculated adjusting for age, sex, and mean cortical thickness. Neighbor-connected vertices exhibiting significant cortical thickness change (p<0.05) were grouped into clusters. A null distribution of cluster sizes was generated using 1000 random permutations of the cohort, storing the maximum significant cluster size for each permutation. The 95th percentile cluster size was used as a threshold for removing test clusters that could have arisen by chance (Hagler et al., 2006). Output coefficient and cluster maps were registered from fsaverage to fs_LR32k and compared with the M12 and M15 eigenmaps as described above.

Data availability

The cortical dense expression and gradient maps of 20,781 genes and ~30k vertices that support the findings of this study are available at https://rdr.ucl.ac.uk/articles/dataset/MAGICC_vertex-level_gene_expression_data/22183891/1. Scripts to download, visualize, and analyze MAGICC are available at https://github.com/kwagstyl/MAGICC (copy archived at Wagstyl, 2024).

The following data sets were generated

References

  1. Book
    1. Brodmann K
    (1909)
    Vergleichende Lokalisationslehre Der Grosshirnrinde in Ihren Prinzipien Dargestellt Auf Grund Des Zellenbaues
    Wellcome Collection.
    1. Holm S
    (1979)
    A simple sequentially rejective multiple test procedure
    Scandinavian Journal of Statistics, Theory and Applications 6:65–70.
  2. Book
    1. Pfeifer RA
    (1940)
    Die Angioarchitektonische Areale Gliederung Der Grosshirnrinde: Auf Grund Vollkommener Gefässinjektionspräparate Vom Gehirn Des Macacus Rhesus
    G. Thieme.
  3. Book
    1. von Economo CF
    2. Koskinas GN
    (1925)
    Die Cytoarchitektonik Der Hirnrinde Des Erwachsenen Menschen
    Springer.

Peer review

Reviewer #1 (Public Review):

The manuscript by Wagstyl et al. describes an extensive analysis of gene expression in the human cerebral cortex and the association with a large variety of maps capturing many of its microscopic and macroscopic properties. The core methodological contribution is the computation of continuous maps of gene expression for >20k genes, which are being shared with the community. The manuscript is a demonstration of several ways in which these maps can be used to relate gene expression with histological features of the human cortex, cytoarchitecture, folding, function, development and disease risk. The main scientific contribution is to provide data and tools to help substantiate the idea of the genetic regulation of multi-scale aspects of the organisation of the human brain. The manuscript is dense, but clearly written and beautifully illustrated.

https://doi.org/10.7554/eLife.86933.3.sa1

Reviewer #2 (Public Review):

This is a valuable contribution that will facilitate brain transcriptomic analyses and the joint analyses of gene expression and structural and functional imaging. The methods used are solid, and the authors conducted a wide range of analyses to demonstrate the value of the dense gene expression data.

https://doi.org/10.7554/eLife.86933.3.sa2

Author response

The following is the authors’ response to the original reviews.

Reviewer #1 (Public Review):

The manuscript by Wagstyl et al. describes an extensive analysis of gene expression in the human cerebral cortex and the association with a large variety of maps capturing many of its microscopic and macroscopic properties. The core methodological contribution is the computation of continuous maps of gene expression for >20k genes, which are being shared with the community. The manuscript is a demonstration of several ways in which these maps can be used to relate gene expression with histological features of the human cortex, cytoarchitecture, folding, function, development and disease risk. The main scientific contribution is to provide data and tools to help substantiate the idea of the genetic regulation of multi-scale aspects of the organisation of the human brain. The manuscript is dense, but clearly written and beautifully illustrated.

Main comments

The starting point for the manuscript is the construction of continuous maps of gene expression for most human genes. These maps are based on the microarray data from 6 left human brain hemispheres made available by the Allen Brain Institute. By technological necessity, the microarray data is very sparse: only 1304 samples to map all the cortex after all subjects were combined (a single individual's hemisphere has ~400 samples). Sampling is also inhomogeneous due to the coronal slicing of the tissue. To obtain continuous maps on a mesh, the authors filled the gaps using nearest-neighbour interpolation followed by strong smoothing. This may have two potentially important consequences that the authors may want to discuss further: (a) the intrinsic geometry of the mesh used for smoothing will introduce structure in the expression map, and (b) strong smoothing will produce substantial, spatially heterogeneous, autocorrelations in the signal, which are known to lead to a significant increase in the false positive rate (FPR) in the spin tests they used.

Many thanks to the reviewer for their considered feedback. We have addressed these primary concerns into point-by-point responses below. The key conclusions from our new analyses are: (i) while the intrinsic geometry of the mesh had not originally been accounted for in sufficient detail, the findings presented in this manuscript paper are not driven by mesh-induced structure, (ii) that the spin test null models used in this manuscript [including a modified version introduced in response to (i)] are currently the most appropriate way to mitigate against inflated false positive rates when making statistical inferences on smooth, surface-based data.

a. Structured smoothing

A brain surface has intrinsic curvature (Gaussian curvature, which cannot be flattened away without tearing). The size of the neighbourhood around each surface vertex will be determined by this curvature. During surface smoothing, this will make that the weight of each vertex will be also modulated by the local curvature, i.e., by large geometric structures such as poles, fissures and folds. The article by Ciantar et al (2022, https://doi.org/10.1007/s00429-022-02536-4) provides a clear illustration of this effect: even the mapping of a volume of pure noise into a brain mesh will produce a pattern over the surface strikingly similar to that obtained by mapping resting state functional data or functional data related to a motor task.

Comment 1

It may be important to make the readers aware of this possible limitation, which is in large part a consequence of the sparsity of the microarray sampling and the necessity to map that to a mesh. This may confound the assessments of reproducibility (results, p4). Reproducibility was assessed by comparing pairs of subgroups split from the total 6. But if the mesh is introducing structure into the data, and if the same mesh was used for both groups, then what's being reproduced could be a combination of signal from the expression data and signal induced by the mesh structure.

Response 1

The reviewer raises an important question regarding the potential for interpolation and smoothing on a cortical mesh to induce a common/correlated signal due to the intrinsic mesh structure. We have now generated a new null model to test this idea which indicates that intrinsic mesh structure is not inflating reproducibility in interpolated expression maps. This new null model spins the original samples prior to interpolation, smoothing and comparison between triplet splits of the six donors, with independent spins shared across the triplet. For computational tractability we took one pair of triplets and regenerated the dataset for each triplet using 10 independent spins. We used these to estimate gene-gene null reproducibility for 90 independent pairwise combinations of these 10 spins. Across these 90 permutations, the average median gene-gene correlation was R=0.03, whereas in the unspun triplet comparisons this was R=0.36. These results indicate that the primary source of the gene-level triplet reproducibility is the underlying shared gene expression pattern rather than interpolation-induced structure.

In Methods 2a: "An additional null dataset was generated to test whether intrinsic geometry of the cortical mesh and its impact on interpolation for benchmarking analyses of DEMs and gradients (Fig S1d, Fig S2d, Fig S3c). In these analyses, the original samples were rotated on the spherical surface prior to subsequent interpolation, smoothing and gradient calculation. Due to computational constraints the full dataset was recreated only for 10 independent spins. These are referred to as the “spun+interpolated null”.

Author response image 1
Figure S1d: Gene predictability was higher across all triplet-triplet pairs than when compared to spun+interpolated null.

Comment 2

It's also possible that mesh-induced structure is responsible in part for the "signal boost" observed when comparing raw expression data and interpolated data (fig S1a). How do you explain the signal boost of the smooth data compared with the raw data otherwise?

Response 2

We thank the reviewer for highlighting this issue of mesh-induced structure. We first sought to quantify the impact of mesh-induced structure through the new null model, in which the data are spun prior to interpolation. New figure S1d, S2d and S3c all show that the main findings are not driven by interpolation over a common mesh structure, but rather originate in the underlying expression data.

Specifically, for the original Figure S1a, the reviewer highlights a limitation that we compared intersubject predictability of raw-sample to raw-sample and interpolated-to-interpolated. In this original formulation improved prediction scores for interpolated-to-interpolated (the “signal boost”) could be driven by mesh-induced structure being applied to both the input and predicted maps. We have updated this so that we are now comparing raw-to-raw and interpolated-to-raw, i.e. whether interpolated values are better estimations of the measured expression values. The new Fig S1a&b (see below) shows a signal boost in gene-level and vertex level prediction scores (delta R = +0.05) and we attribute this to the minimisation of location and measurement noise in the raw data, improving the intersubject predictability of expression levels.

In Methods 2b: "To assess the effect of data interpolation in DEM generation we compared gene-level and vertex-level reproducibility of DEMs against a “ground truth” estimate of these reproducibility metrics based on uninterpolated expression data. To achieve a strict comparison of gene expression values between different individuals at identical spatial locations we focused these analyses on the subset of AHBA samples where a sample from one subject was within 3 mm geodesic distance of another. This resulted in 1097 instances (spatial locations) with measures of raw gene expression of one donor, and predicted values from the second donor’s un-interpolated AHBA expression data and interpolated DEM. We computed gene-level and vertex-level reproducibility of expression using the paired donor data at each of these sample points for both DEM and uninterpolated AHBA expression values. By comparing DEM reproducibility estimates with those for uninterpolated AHBA expression data, we were able to quantify the combined effect of interpolation and smoothing steps in DEM generation. We used gene-level reproducibility values from DEMs and uninterpolated AHBA expression data to compute a gene-level difference in reproducibility, and we then visualized the distribution of these difference values across genes (Fig S1a). We used gene-rank correlation to compare vertex-level reproducibility values between DEMs and uninterpolated AHBA expression data (Fig S1b)."

Author response image 2
Figure S1: Reproducibility of Dense Expression Maps (DEMs) interpolated from spatially sparse postmortem measures of cortical gene expression.

(a) Signal boost in the interpolated DEM dataset vs. spatially sparse expression data. Restricting to samples taken from approximately the same cortical location in pairs of individuals (within 3mm geodesic distance), there was an overall improvement in intersubject spatial predictability in the interpolated maps. Furthermore, genes with lower predictability in the interpolated maps were less predictable in the raw dataset, suggesting these regions exhibit higher underlying biological variability rather than methodologically introduced bias. (b) Similarly at the paired sample locations, gene-rank predictability was generally improved in DEMs vs. sparse expression data (median change in R from sparse samples to interpolated for each pair of subjects, +0.5).

1. How do you explain that despite the difference in absolute value the combined expression maps of genes with and without cortical expression look similar? (fig S1e: in both cases there's high values in the dorsal part of the central sulcus, in the occipital pole, in the temporal pole, and low values in the precuneus and close to the angular gyrus). Could this also reflect mesh-smoothing-induced structure?

Response 3

As with comment 1, this is an interesting perspective that we had not fully considered. We would first like to clarify that non-cortical expression is defined from the independent datasets including the “cortex” tissue class of the human protein atlas and genes identified as markers for cortical layers or cortical cells in previous studies. This is still likely an underestimate of true cortically expressed genes as some of these “non-cortical genes” had high intersubject reproducibility scores. Nevertheless we think it appropriate to use a measure of brain expression independent of anything included in other analyses for this paper. These considerations are part of the reason we provide all gene maps with accompanying uncertainty scores for user discretion rather than simply filtering them out.

In terms of the spatially consistent pattern of the gene ranks of Fig S1f, this consistent spatial pattern mirrors Transcriptomic Distinctiveness (r=0.52 for non-cortical genes, r=0.75 for cortical genes), so we think that as the differences in expression signatures become more extreme, the relative ranks of genes in that region are more reproducible/easier to predict.

To assess whether mesh-smoothing-induced structure is playing a role, we carried out an additional the new null model introduced in response to comment 1, and asked if the per-vertex gene rank reproducibility of independently spun subgroup triplets showed a similar structure to that in our original analyses. Across the 90 permutations, the median correlation between vertex reproducibility and TD was R=0.10. We also recalculated the TD maps for the 10 spun datasets and the mean correlation with the original TD did not significantly differ from zero (mean R = 0.01, p=0.2, nspins = 10). These results indicate that folding morphology is not the major driver of local or large scale patterning in the dataset. We have included this as a new Figure S3c.

We have updated the text as follows:

In Methods 3a: "Third, to assess whether the covariance in spatial patterning across genes could be a result of mesh-associated structure introduced through interpolation and smoothing, TD maps were recomputed for the spun+interpolated null datasets and compared to the original TD map (Fig S3c)."

In Results: "The TD map observed from the full DEMs library was highly stable between all disjoint triplets of donors (Methods, Fig S3a, median cross-vertex correlation in TD scores between triplets r=0.77) and across library subsets at all deciles of DEM reproducibility (Methods, Fig S3b, cross-vertex correlation in TD scores r>0.8 for the 3rd-10th deciles), but was not recapitulated in spun null datasets (Fig S3c)."

Author response image 3
Figure S3c: Correlations between TD and TD maps regenerated on datasets spun using two independent nulls, one where the rotation is applied prior to interpolation and smoothing (spun+interpolated) and one where it is applied to the already-created DEMs.

In each null, the same rotation matrix is applied to all genes.

Comment 4

Could you provide more information about the way in which the nearest-neighbours were identified (results p4). Were they nearest in Euclidean space? Geodesic? If geodesic, geodesic over the native brain surface? over the spherically deformed brain? (Methods cite Moresi & Mather's Stripy toolbox, which seems to be meant to be used on spheres). If the distance was geodesic over the sphere, could the distortions introduced by mapping (due to brain anatomy) influence the geometry of the expression maps?

Response 4

We have clarified in the Methods that the mapping is to nearest neighbors on the spherically-inflated surface.

The new null model we have introduced in response to comments 1 & 3 preserves any mesh-induced structure alongside any smoothing-induced spatial autocorrelations, and the additional analyses above indicate that main results are not induced by systematic mesh-related interpolation signal. In response to an additional suggestion from the reviewer (Comment 13), we also assessed whether local distortions due to the mesh could be creating apparent border effects in the data, for instance at the V1-V2 boundary. At the V1-V2 border, which coincides anatomically with the calcarine sulcus, we computed the 10 genes with the highest expression gradient along this boundary in the actual dataset and the spun-interpolated null. The median test expression gradients along this border was higher than in any of the spun datasets, indicating that these boundary effects are not explained by the interpolation and cortical geometry effects on the data (new Fig S2d).The text has been updated as follows:

In Methods 1: "For cortical vertices with no directly sampled expression, expression values were interpolated from their nearest sampled neighbor vertex on the spherical surface (Moresi and Mather, 2019) (Fig 1b)."

In Methods 2: "We used the spun+interpolated null to test whether high gene gradients could be driven by non-uniform interpolation across cortical folds. We quantified the average gradient for all genes along the V1-V2 border in the atlas, as well as for 10 iterations of the atlas where the samples were spun prior to interpolation. We computed the median gradient magnitude for the 20 top-ranked genes for each (Fig S2d)."

Author response image 4
Figure S2d: Mean of gradient magnitudes for 20 genes with largest gradients along V1-V2 border, compared to values along the same boundary on the spun+interpolated null atlas.

Gradients were higher in the actual dataset than in all spun version indicating this high gradient feature is not primarily due to the effects of calcarine sulcus morphology on interpolation

Comment 5

Could you provide more information about the smoothing algorithm? Volumetric, geodesic over the native mesh, geodesic over the sphere, averaging of values in neighbouring vertices, cotangent-weighted laplacian smoothing, something else?

Response 5

We are using surface-based geodesic over the white surface smoothing described in Glasser et al., 2013 and used in the HCP workbench toolbox (https://www.humanconnectome.org/software/connectome-workbench). We have updated the methods to clarify this.

In Methods 1: "Surface expression maps were smoothed using the Connectome Workbench toolbox (Glasser et al. 2013) with a 20mm full-width at half maximum Gaussian kernel , selected to be consistent with this sampling density (Fig 1c)."

Comment 6

Could you provide more information about the method used for computing the gradient of the expression maps (p6)? The gradient and the laplacian operator are related (the laplacian is the divergence of the gradient), which could also be responsible in part for the relationships observed between expression transitions and brain geometry.

Response 6

We are using Connectome Workbench’s metric gradient command for this Glasser et al., 2013 and used in the HCP workbench pipeline. The source code for gradient calculation can be found here:https://github.com/Washington-University/workbench/blob/131e84f7b885d82af76e be21adf2fa97795e2484/src/Algorithms/AlgorithmMetricGradient.cxx

In Methods 2: >For each of the resulting 20,781 gene-level expression maps, the orientation and magnitude of gene expression change at each vertex (i.e. the gradient) was calculated for folded, inflated, spherical and flattened mesh representations of the cortical sheet using Connectome Workbench’s metric gradient command (Glasser et al. 2013).

b. Potentially inflated FPR for spin tests on autocorrelated data."

Spin tests are extensively used in this work and it would be useful to make the readers aware of their limitations, which may confound some of the results presented. Spin tests aim at establishing if two brain maps are similar by comparing a measure of their similarity over a spherical deformation of the brains against a distribution of similarities obtained by randomly spinning one of the spheres. It is not clear which specific variety of spin test was used, but the original spin test has well known limitations, such as the violation of the assumption of spatial stationarity of the covariance structure (not all positions of the spinning sphere are equivalent, some are contracted, some are expanded), or the treatment of the medial wall (a big hole with no data is introduced when hemispheres are isolated).

Another important limitation results from the comparison of maps showing autocorrelation. This problem has been extensively described by Markello & Misic (2021). The strong smoothing used to make a continuous map out of just ~1300 samples introduces large, geometry dependent autocorrelations. Indeed, the expression maps presented in the manuscript look similar to those with the highest degree of autocorrelation studied by Markello & Misic (alpha=3). In this case, naive permutations should lead to a false positive rate ~46% when comparing pairs of random maps, and even most sophisticated methods have FPR>10%.

Comment 7There's currently several researchers working on testing spatial similarity, and the readers would benefit from being made aware of the problem of the spin test and potential solutions. There's also packages providing alternative implementations of spin tests, such as BrainSMASH and BrainSpace, which could be mentioned.

Response 7

We thank the reviewer for raising the issue of null models. First, with reference to the false positive rate of 46% when maps exhibit spatial autocorrelation, we absolutely agree that this is an issue that must be accounted for and we address this using the spin test. We acknowledge there has been other work on nulls such as BrainSMASH and BrainSpace. Nevertheless in the Markello and Misic paper to which the reviewer refers, the BrainSmash null models perform worse with smoother maps (with false positive rates approaching 30% in panel e below), whereas the spin test maintains false positives rates below 10%.

Author response image 5

We have added a brief description of the challenge and our use of the spin test.

In Methods 2a: "Cortical maps exhibit spatial autocorrelation that can inflate the False Positive Rate, for which a number of methods have been proposed(Alexander-Bloch et al. 2018; Burt et al. 2020; Vos de Wael et al. 2020). At higher degrees of spatial smoothness, this high False Positive Rate is most effectively mitigated using the spin test(Alexander-Bloch et al. 2018; Markello and Misic 2021; Vos de Wael et al. 2020). In the following analyses when generating a test statistic comparing two spatial maps, to generate a null distribution, we computed 1000 independent spins of the cortical surface using https://netneurotools.readthedocs.io, and applied it to the first map whilst keeping the second map unchanged. The test statistic was then recomputed 1000 times to generate a null distribution for values one might observe by chance if the maps shared no common organizational features. This is referred to throughout as the “spin test” and the derived p-values as pspin."

Comment 8

Could it be possible to measure the degree of spatial autocorrelation?

Response 8

We agree this could be a useful metric to generate for spatial cortical maps. However, there are multiple potential metrics to choose from and each of the DEMs would have their own value. To address this properly would require the creation of a set of validated tools and it is not clear how we could summarize this variety of potential metrics for 20k genes. Moreover, as discussed above the spin method is an adequate null across a range of spatial autocorrelation degrees, thus while we agree that in general estimation of spatial smoothness could be a useful imaging metric to report, we consider that it is beyond the scope of the current manuscript.

Comment 9

Could you clarify which version of the spin test was used? Does the implementation come from a package or was it coded from scratch?

Response 9

As Markello & Misic note, at the vertex level, the various implementations of the spin test become roughly equivalent to the ‘original’ Alexander-Bloch et al., implementation. We used took the code for the ‘original’ version implemented in python here:https://netneurotools.readthedocs.io/en/latest/_modules/netneurotools/stats.html# gen_spinsamples.

This has been updated in the methods (see Response 7).

Comment 10

Cortex and non-cortex vertex-level gene rank predictability maps (fig S1e) are strikingly similar. Would the spin test come up statistically significant? What would be the meaning of that, if the cortical map of genes not expressed in the cortex appeared to be statistically significantly similar to that of genes expressed in the cortex?

Response 10

Please see response to comment 3, which also addresses this observation.

Reviewer #2 (Public Review):

The authors convert the AHBA dataset into a dense cortical map and conduct an impressively large number of analyses demonstrating the value of having such data.

I only have comments on the methodology.

Comment 1

First, the authors create dense maps by simply using nearest neighbour interpolation followed by smoothing. Since one of the main points of the paper is the use of a dense map, I find it quite light in assessing the validity of this dense map. The reproducibility values they calculate by taking subsets of subjects are hugely under-powered, given that there are only 6 brains, and they don't inform on local, vertex-wise uncertainties. I wonder if the authors would consider using Gaussian process interpolation. It is really tailored to this kind of problem and can give local estimates of uncertainty in the interpolated values. For hyperparameter tuning, they could use leave-one-brain-out for that.

I know it is a lot to ask to change the base method, as that means re-doing all the analyses. But I think it would strengthen the paper if the authors put as much effort in the dense mapping as they did in their downstream analyses of the data.

Response 1

We thank the reviewer for the suggestion to explore Gaussian process interpolation. We have implemented this for our dataset and attempted to compare this with our original method with the 3 following tests: (i) intertriplet reproducibility of individual gene maps, (ii) microscale validations: area markers, (iii) macroscale validations: bio patterns.

Overall, compared to our original nearest-neighbor interpolation method, GP regression (i) did not substantially improve gene-level reproducibility of expression maps (median correlation increase of R=0.07 which was greater for genes without documented protein expression in cortex): (ii) substantially worsened performance in predicting areal marker genes and (iii) showed similar but slightly worse performance at predicting macroscale patterns from Figure 1.

Given the significantly poorer performance on one of our key tests (ii) we have opted not to replace our original database, but we do now include code for the alternative GP regression methodology in the github repository so others can reproduce/further develop these methods.

Author response image 6

ii) Genes ranked by mean expression gradient from current DEMs (left) and Gaussian process-derived interpolation maps (right). Established Human and macaque markers are consistently higher-ranked in DEM maps. iii) Figure 1 Interpolated vs GP regression

Author response table 1
InterpolatedGP regression
Fig 1g Cell typesR=0.62R=0.59
Fig 1i L4R=0.77R=0.63
Fig 1i BetzDelta z=1.76Delta z=1.31
Fig 1i MyelinR=0.60R=0.61
Fig 1i APOER=-0.52R=-0.48

Comment 2

It is nice that the authors share some code and a notebook, but I think it is rather light. It would be good if the code was better documented, and if the user could have access to the non-smoothed data, in case they was to produce their own dense maps. I was only wondering why the authors didn't share the code that reproduces the many analyses/results in the paper.

Response 2

We thank the reviewer for this suggestion. In response we have updated the shared github repository (https://github.com/kwagstyl/magicc). This now includes code and notebooks to reproduce the main analyses and figures.

Reviewer #1 (Recommendations For The Authors):

Minor comments

Comment 11

p4 mentions Fig S1h, but the supp figures only goes from S1a to S1g

Response 11

We thank the reviewer for capturing this error. It was in fact referring to what is now Fig S1h and has been updated.

Comment 12

It would be important that the authors share all the code used to produce the results in the paper in addition to the maps. The core methodological contribution of the work is a series of continuous maps of gene expression, which could become an important tool for annotation in neuroimaging research. Many arbitrary (reasonable) decisions were made, it would be important to enable users to evaluate their influence on the results.

Response 12

We thank both reviewers for this suggestion. We have updated the github to be able to reproduce the dense maps and key figures with our methods.

Comment 13

p5: Could the sharp border reflect the effect of the geometry of the calcarine sulcus on map smoothing? More generally, could there be an effect of folds on TD?

Response 13

Please see our response to Reviewer 1, Comment 1 above, where we introduce the new null models now analyzed to test for effects of mesh geometry on our findings. These new null models - where original source data were spun prior to interpolation suggest that neither the sharp V1/2 border or the TD map are effects of mesh geometry. Specifically: (i) , the magnitudes of gradients along the V1/2 boundary from null models were notably smaller than those in our original analyses (see new figure S2d), and (ii) TD maps computed from the new null models showed no correlation with TD maps from ur original analyses (new Figure S3c, mean R = 0.01, p=0.2, nspins = 10).

Comment 14

p5: Similar for the matching with the areas in Glasser's parcellation: the definition of these areas involves alignment through folds (based on freesurfer 'sulc' map, see Glasser et al 2016). If folds influence the geometry of TDs, could that influence the match?

Response 14

We note that Fig S3c provided evidence that folding was not the primary driver of the TD patterning. However, it is true that Glasser et al. use both neuroanatomy (folding, thickness and myelin) and fMRI-derived maps to delineate their cortical areas. As such Figure 2 f & g aren’t fully independent assessments. Nevertheless the reason that these features are used is that many of the sulci in question have been shown to reliably delineate cytoarchitectonic boundaries (Fischl et al., 2008).

In Results: "A similar alignment was seen when comparing gradients of transcriptional change with the spatial orientation of putative cortical areas defined by multimodal functional and structural in vivo neuroimaging(Glasser et al., 2016) (expression change running perpendicular to area long-axis, pspin<0.01, Fig 2g, Methods)."

Comment 15

p6: TD peaks are said to overlap with functionally-specialised regions. A comment on why audition is not there, nor language, but ba 9-46d is? Would that suggest a lesser genetic regulation of those functions?

Response 15

The reviewer raises a valid point and this was a result that we were also surprised by. The finding that the auditory cortex is not as microstructurally distinctive as, say V1, is consistent with other studies applying dimensionality-reduction techniques to multimodal microstructural receptor data (e.g. Zilles et al., 2017, Goulas et al., 2020). These studies found that the auditory microstructure is not as extreme as either visual and somatomotor areas. From a methodological view point, the primary auditory cortex is significantly smaller than both visual and somatomotor areas, and therefore is captured by fewer independent samples, which could reduce the detail in which its structure is being mapped in our dataset.

For the frontal areas, we would note that (i) the frontal peak is the smallest of all peaks found and was more strongly characterised by low z-score genes than high z-score. (ii) the anatomical areas in the frontal cortex are much more highly variable with respect to folding morphology (e.g. Rajkowska 1995). The anatomical label of ba9-46d (and indeed all other labels) were automatically generated as localisers rather than strict area labels. We have clarified this in the text as follows:

In Methods 3a: "Automated labels to localize TD peaks were generated based on their intersection with a reference multimodal neuroimaging parcellation of the human cortex(Glasser et al., 2016). Each TD was given the label of the multimodal parcel that showed greatest overlap (Fig 2b)."

Comment 16.

p7: The proposition that "there is a tendency for cortical sulci to run perpendicular to the direction of fastest transcriptional change", could also be "there is a tendency for the direction of fastest transcriptional change to run perpendicular to cortical sulci"? More pragmatically, this result from the geometry of transcriptional maps being influenced by sulcal geometry in their construction.

Response 16

Please see our response to Reviewer 1, Comment 1 above, where we introduce the new null models now analyzed to test for effects of mesh geometry on our findings. These models indicate that the topography of interpolated gene expression maps do not reflect influences of sulcal geometry on their construction.

Comment 17

p7: TD transitions are indicated to precede folding. This is based on a consideration of folding development based on the article by Chi et al 1977, which is quite an old reference. In that paper, the authors estimated the tempo of human folding development based on the inspection of photographs, which may not be sufficient for detecting the first changes in curvature leading to folds. The work of the Developing Human Connectome consortium may provide a more recent indication for timing. In their data, by PCW 21 there's already central sulcus, pre-central, post-central, intra-parietal, superior temporal, superior frontal which can be detected by computing the mean curvature of the pial surface (I can only provide a tweet for reference: https://twitter.com/R3RT0/status/1617119196617261056). Even by PCW 9-13 the callosal sulcus, sylvian fissure, parieto-occipital fissure, olfactory sulcus, cingulate sulcus and calcarine fissure have been reported to be present (Kostovic & Vasung 2009).

Response 17

Our field lacks the data necessary to provide a comprehensive empirical test for the temporal ordering of regional transcriptional profiles and emergence of folding. Our results show that transcriptional identities of V1 and TGd are - at least - present at the very earliest stages of sulcation in these regions. In response to the reviewers comment we have updated with a similar fetal mapping project which similarly shows evidence of the folds between weeks 17-21 and made the language around directionality more cautious.

In Results: "The observed distribution of these angles across vertices was significantly skewed relative to a null based on random alignment between angles (pspin<0.01, Fig 2f, Methods) - indicating that there is indeed a tendency for cortical sulci and the direction of fastest transcriptional change to run perpendicular to each other (pspin<0.01, Fig 2f).

As a preliminary probe for causality, we examined the developmental ordering of regional folding and regional transcriptional identity. Mapping the expression of high-ranking TD genes in fetal cortical laser dissection microarray data(Miller et al., 2014) from 21 PCW (Post Conception Weeks) (Methods) showed that the localized transcriptional identity of V1 and TGd regions in adulthood is apparent during the fetal periods when folding topology begins to emerge (Chi et al. 1977; Xu et al. 2022) (Fig "S2d).

In Discussion: "By establishing that some of these cortical zones are evident at the time of cortical folding, we lend support to a “protomap”(Rakic 1988; O'Leary 1989; O'Leary et al. 2007; Rakic et al. 2009) like model where the placement of some cortical folds is set-up by rapid tangential changes in cyto-laminar composition of the developing cortex(Ronan et al., 2014; Toro and Burnod, 2005; Van Essen, 2020). The DEMs are derived from fully folded adult donors, and therefore some of the measured genetic-folding alignment might also be induced by mechanical distortion of the tissue during folding(Llinares-Benadero and Borrell 2019; Heuer and Toro 2019). However, no data currently exist to conclusively assess the directionality of this gene-folding relationship."

Comment 18

p7: In my supplemental figures (obtained from biorxiv, because I didn't find them among the files submitted to eLife) there's no S2j (only S2a-S2i).

Response 18

We apologize, this figure refers to S3k (formerly S3j), rather than S2j. We have updated the main text.

Comment 19 p7: It is not clear from the methods (section 3b) how the adult and fetal brains were compared. Maybe using MSM (Robinson et al 2014)?

Response 19

We have now clarified this in Methods text as reproduced below.

In Methods 3b: "We averaged scaled regional gene expression values between donors per gene, and filtered for genes in the fetal LDM dataset that were also represented in the adult DEM dataset - yielding a single final 20,476*235 gene-by-sample matrix of expression values for the human cortex at 21 PCW. Each TD peak region was then paired with the closest matching cortical label within the fetal regions. This matrix was then used to test if each TD expression signature discovered in the adult DEM dataset (Fig 2, Table 3) was already present in similar cortical regions at 21 PCW."

Comment 20

p7: WGCNA is used prominently, could you provide a brief introduction to its objectives? The gene coexpression networks are produced after adjusting the weight of the network edges to follow a scale-free topology, which is meant to reflect the nature of protein-protein interactions. Soft thresholding increases contrast, but doesn't this decrease a potential role of infinitesimal regulatory signals?

Response 20

We agree with the reviewer that the introduction to WGCNA needed additional details and have amended the Results (see below). One limitation of WGCNA-derived associations is that it will downweigh the role of smaller relationships including potentially important regulatory signals. WGCNA methods have been titrated to capture strong relationships. This is an inherent limitation of all co-expression driven methods which lead to an incomplete characterisation of the molecular biology. Nevertheless we feel these stronger relationships are still worth capturing and interrogating. We have updated the text to introduce WGCNA and acknowledge this potential weakness in the approach.

In Results: "Briefly, WGCNA constructs a constructs a connectivity matrix by quantifying pairwise co-expression between genes, raising the correlations to a power (here 6) to emphasize strong correlations while penalizing weaker ones, and creating a Topological Overlap Matrix (TOM) to capture both pairwise similarities expression and connectivity. Modules of highly interconnected genes are identified through hierarchical clustering. The resultant WGCNA modules enable topographic and genetic integration because they each exist as both (i) a single expression map (eigenmap) for spatial comparison with neuroimaging data (Fig 3a,b, Methods) and, (ii) a unique gene set for enrichment analysis against marker genes systematically capturing multiple scales of cortical organization, namely: cortical layers, cell types, cell compartments, protein-protein interactions (PPI) and GO terms (Methods, Table S2 and S4)."

Comment 21

WGCNA modules look even more smooth than the gene expression maps. Are these maps comparable to low frequency eigenvectors? Autocorrelation in that case should be very strong?

Response 21

These modules are smooth as they are indeed eigenvectors which likely smooth out some of the more detailed but less common features seen in individual gene maps. These do exhibit high degrees of autocorrelation, nevertheless we are applying the spin test which is currently the appropriate null model for spatially autocorrelated cortical maps (Response 7).

Comment 22

If the WGCNA modules provide an orthogonal basis for surface data, is it completely unexpected that some of them will correlate with low-frequency patterns? What would happen if random low frequency patterns were generated? Would they also show correlations with some of the 16 WGCNA modules?

Response 22

We agree with the reviewer that if we used a generative model like BrainSMASH, we would likely see similar low frequency patterns. However, the inserted figure in Response 7 from Makello & Misic provide evidence that is not as conservative a null as the spin test when data exhibit high spatial autocorrelation. The spatial enrichment tests carried out on the WGCNA modules are all carried out using the spin test.

Comment 23

In part (a) I commented on the possibility that brain anatomy may introduce artifactual structure into the data that's being mapped. But what if the relationship between brain geometry and brain organisation were deeper than just the introduction of artefacts? The work of Lefebre et al (2014, https://doi.org/10.1109/ICPR.2014.107; 2018, https://doi.org/10.3389/fnins.2018.00354) shows that clustering based on the 3 lowest frequency eigenvectors of the Laplacian of a brain hemisphere mesh produce an almost perfect parcellation into lobes, with remarkable coincidences between parcel boundaries and primary folds and fissures. The work of Pang et al(https://doi.org/10.1101/2022.10.04.510897) suggests that the geometry of the brain plays a critical role in constraining its dynamics: they analyse >10k task-evoked brain maps and show that the eigenvectors of the brain laplacian parsimoniously explain the activity patterns. Could brain anatomy have a downward effect on brain organisation?

Response 23

The reviewer raises a fascinating extension of our work identifying spatial modes of gene expression. We agree that these are low frequency in nature, but would first like to note that the newly introduced null model indicates that the overlaps with salient neuroanatomical features are inherent in the expression data and not purely driven by anatomy in a methodological sense.

Nevertheless we absolutely agree there is likely to be a complex multidirectional interplay between genetic expression patterns through development, developing morphology and the “final” adult topography of expression, neuroanatomical and functional patterns.

We think that the current manuscript currently contains a lot of in depth analyses of these expression data, but agree that a more extensive modeling analysis of how expression might pattern or explain functional activation would be a fascinating follow on, especially in light of these studies from Pang and Lefebre. Nevertheless we think that this must be left for a future modeling paper integrating these modes of microscale, macroscale and functional anatomy.

In Discussion: "Indeed, future work might find direct links between these module eigenvectors and similar low-frequency eigenvectors of cortical geometry have been used as basis functions to segment the cortex (Lefèvre et al. 2018) and explain complex functional activation patterns(Pang et al. 2023)."

Comment 24

On p11: ASD related to rare, deleterious mutations of strong effect is often associated with intellectual disability (where the social interaction component of ASD is more challenging to assess). Was there some indication of a relationship with that type of cognitive phenotype?

Response 24

Across the two ABIDE cohorts, the total number of those with ASD and IQ <70, which is the clinical threshold for intellectual disability was n=10, which unfortunately did not allow us to conduct a meaningful test of whether ID impacts the relationship between imaging changes in ASD and the expression maps of genes implicated in ASD by rare variants.

Comment 25

Could you clarify if the 6 donors were aligned using the folding-based method in freesurfer?

Response 25

The 6 donors were aligned using MSMsulc (Robinson et al., 2014), which is a folding based method from the HCP group. This is now clarified in the methods.

In Methods 1: "Cortical surfaces were reconstructed for each AHBA donor MRI using FreeSurfer(Fischl, 2012), and coregistered between donors using surface matching of individuals’ folding morphology (MSMSulc) (Robinson et al., 2018)."

Comment 26

The authors make available a rich resource and a series of tools to facilitate their use. They have paid attention to encode their data in standard formats, and their code was made in Python using freely accessible packages instead of proprietary alternatives such as matlab. All this should greatly facilitate the adoption of the approach. I think it would be important to state more explicitly the conceptual assumptions that the methodology brings. In the same way that a GWAS approach relies on a Mendelian idea that individual alleles encode for phenotypes, what is the idea about the organisation of the brain implied by the orthogonal gene expression modules? Is it that phenotypes - micro and macro - are encoded by linear combinations of a reduced number of gene expression patterns? What would be the role of the environment? The role of non-genic regulatory regions? Some modalities of functional organisation do not seem to be encoded by the expression of any module. Is it just for lack of data or should this be seen as the sign for a different organisational principle? Likewise, what about the aspects of disorders that are not captured by expression modules? Would that hint, for example, to stronger environmental effects? What about linear combinations of modules? Nonlinear? Overall, the authors adopt implicitly, en passant, a gene-centric conceptual standpoint, which would benefit from being more clearly identified and articulated. There are citations to Rakic's protomap idea (I would also cite the original 1988 paper, and O'Leary's 1989 "protocortex" paper stressing the role of plasticity), which proposes that a basic version of brain cytoarchitecture is genetically determined and transposed from the proliferative ventricular zone regions to the cortical plate through radial migration. In p13 the authors indicate that their results support Rakic's protomap. Additionally, in p7 the authors suggest that their results support a causal arrow going from gene expression to sulcal anatomy. The reviews by O'leary et al (2007), Ronan & Fletcher (2014, already cited), Llinares-Benadero & Borrell (2019) could be considered, which also advocate for a similar perspective. For nuances on the idea that molecular signals provide positional information for brain development, the article by Sharpe (2019, DOI: 10.1242/dev.185967) is interesting. For nuances on the gene-centric approach of the paper the articles by Rockmann (2012, DOI: 10.1111/j.1558-5646.2011.01486.x) but also from the ENCODE consortium showing the importance of non-genic regions of the genome ("Perspectives on ENCODE" 2020 DOI: 10.1038/s41586-021-04213-8) could be considered. I wouldn't ask to cite ideas from the extended evolutionary synthesis about different inheritance systems (as reviewed by Jablonka & Lamb, DOI: 10.1017/9781108685412) or the idea of inherency (Newman 2017, DOI: 10.1007/978-3-319-33038-9_78-1), but the authors may find them interesting. Same goes for our own work on mechanical morphogenesis which expands on the idea of a downward causality (Heuer and Toro 2019, DOI: 10.1016/j.plrev.2019.01.012)

Response 26

We thank the reviewer for recommending these papers, which we enjoyed reading and have deepened our thinking on the topic. In addition to toning down some of the language with respect to causality that our data cannot directly address, we have included additional discussion and references as follows:

In Discussion: "By establishing that some of these cortical zones are evident at the time of cortical folding, we lend support to a “protomap”(Rakic 1988; O'Leary 1989; O'Leary et al. 2007; Rakic et al. 2009) like model where the placement of some cortical folds is set-up by rapid tangential changes in cyto-laminar composition of the developing cortex(Ronan et al., 2014; Toro and Burnod, 2005; Van Essen, 2020). The DEMs are derived from fully folded adult donors, and therefore some of the measured genetic-folding alignment might also be induced by mechanical distortion of the tissue during folding(Llinares-Benadero and Borrell 2019; Heuer and Toro 2019). However, no data currently exist to conclusively assess the directionality of this gene-folding relationship.

Overall, the manuscript is very interesting and a great contribution. The amount of work involved is impressive, and the presentation of the results very clear. My comments indicate some aspects that could be made more clear, for example, providing additional methodological information in the supplemental material. Also, making aware the readers and future users of MAGICC of the methodological and conceptual challenges that remain to be addressed in the future for this field of research.

Reviewer #2 (Recommendations For The Authors):

Comment 1

The supplementary figures seem to be missing from the eLife submission (although I was able to find them on europepmc)

Response 1

We apologize that these were not included in the documents sent to reviewers. The up-to-date supplementary figures are included in this resubmission and again on biorxiv.

https://doi.org/10.7554/eLife.86933.3.sa3

Article and author information

Author details

  1. Konrad Wagstyl

    Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Funding acquisition, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    For correspondence
    k.wagstyl@ucl.ac.uk
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3439-5808
  2. Sophie Adler

    UCL Great Ormond Street Institute for Child Health, Holborn, United Kingdom
    Contribution
    Formal analysis, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Jakob Seidlitz

    1. Department of Psychiatry, University of Pennsylvania, Philadelphia, United States
    2. Department of Child and Adolescent Psychiatry and Behavioral Science, The Children's Hospital of Philadelphia, Philadelphia, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Simon Vandekar

    Department of Biostatistics, Vanderbilt University, Nashville, United States
    Contribution
    Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Travis T Mallard

    1. Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, United States
    2. Department of Psychiatry, Harvard Medical School, Boston, United States
    Contribution
    Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Richard Dear

    Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  7. Alex R DeCasien

    Section on Developmental Neurogenomics, Human Genetics Branch, National Institute of Mental Health, Bethesda, United States
    Contribution
    Conceptualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  8. Theodore D Satterthwaite

    1. Department of Psychiatry, University of Pennsylvania, Philadelphia, United States
    2. Lifespan Informatics and Neuroimaging Center, University of Pennsylvania School of Medicine, Philadelphia, United States
    Contribution
    Conceptualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-7072-9399
  9. Siyuan Liu

    Section on Developmental Neurogenomics, Human Genetics Branch, National Institute of Mental Health, Bethesda, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3661-6248
  10. Petra E Vértes

    Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Conceptualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0992-3210
  11. Russell T Shinohara

    Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
    Contribution
    Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    receives consulting income from Octave Bioscience and compensation for reviewership duties from the American Medical Association
  12. Aaron Alexander-Bloch

    1. Department of Psychiatry, University of Pennsylvania, Philadelphia, United States
    2. Department of Child and Adolescent Psychiatry and Behavioral Science, The Children's Hospital of Philadelphia, Philadelphia, United States
    Contribution
    Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  13. Daniel H Geschwind

    Center for Autism Research and Treatment, Semel Institute, Program in Neurogenetics, Department of Neurology and Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Conceptualization, Supervision, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-2896-3450
  14. Armin Raznahan

    Section on Developmental Neurogenomics, Human Genetics Branch, National Institute of Mental Health, Bethesda, United States
    Contribution
    Conceptualization, Resources, Data curation, Formal analysis, Supervision, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5622-1190

Funding

Wellcome Trust (10.35802/215901)

  • Konrad Wagstyl

National Institute of Mental Health (1ZIAMH002949-04)

  • Armin Raznahan

National Institutes of Health (T32HG010464)

  • Jakob Seidlitz

MQ Mental Health Research (MQF17_24)

  • Petra E Vértes

Rosetrees Trust (A2665)

  • Sophie Adler

Gates Cambridge Trust

  • Richard Dear

National Institutes of Health (T32MH019112)

  • Jakob Seidlitz

National Institutes of Health (K08MH120564)

  • Jakob Seidlitz

National Institutes of Health (R01EB022573)

  • Jakob Seidlitz

National Institutes of Health (R01MH120482)

  • Jakob Seidlitz

National Institutes of Health (R01MH123563)

  • Jakob Seidlitz

National Institutes of Health (R37MH125829)

  • Jakob Seidlitz

National Institutes of Health (R01MH120482)

  • Jakob Seidlitz

National Institutes of Health (R01)

  • Jakob Seidlitz

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. For the purpose of Open Access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Acknowledgements

The authors thank all the participants and their families for their generous involvement in this study. The National Institute of Mental Health Intramural Research Program NIH Annual Report Number, 1ZIAMH002949-04 (AR); Wellcome Trust 215901/Z/19/Z (KSW); NIH grant R01MH112847 (RTS, TDS); NIH grant R01MH123550 (RTS); NIH grant R01MH120482 (RTS, TDS); NIH grant R37MH125829 (TDS); NIH grant R01MH123563 (SNV); NIH grant R01MH120482 (TDS); NIH grant R01EB022573 (TDS); Gates Cambridge Trust (RD); NIH grant T32HG010464 (TTM); MQ: Transforming Mental Health MQF17_24 (PEV); NIH grant T32MH019112 (JS); NIH grant K08MH120564 (AAB, JS); and Rosetrees Trust project grant A2665 (SA).

Senior Editor

  1. Floris P de Lange, Donders Institute for Brain, Cognition and Behaviour, Netherlands

Reviewing Editor

  1. Saad Jbabdi, University of Oxford, United Kingdom

Version history

  1. Preprint posted: February 11, 2023 (view preprint)
  2. Sent for peer review: March 7, 2023
  3. Preprint posted: May 24, 2023 (view preprint)
  4. Preprint posted: January 11, 2024 (view preprint)
  5. Version of Record published: February 7, 2024 (version 1)

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.86933. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Metrics

  • 792
    Page views
  • 92
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Konrad Wagstyl
  2. Sophie Adler
  3. Jakob Seidlitz
  4. Simon Vandekar
  5. Travis T Mallard
  6. Richard Dear
  7. Alex R DeCasien
  8. Theodore D Satterthwaite
  9. Siyuan Liu
  10. Petra E Vértes
  11. Russell T Shinohara
  12. Aaron Alexander-Bloch
  13. Daniel H Geschwind
  14. Armin Raznahan
(2024)
Transcriptional cartography integrates multiscale biology of the human cortex
eLife 12:RP86933.
https://doi.org/10.7554/eLife.86933.3

Share this article

https://doi.org/10.7554/eLife.86933

Further reading

    1. Evolutionary Biology
    2. Genetics and Genomics
    Thomas A Sasani, Aaron R Quinlan, Kelley Harris
    Research Article

    Maintaining germline genome integrity is essential and enormously complex. Although many proteins are involved in DNA replication, proofreading, and repair, mutator alleles have largely eluded detection in mammals. DNA replication and repair proteins often recognize sequence motifs or excise lesions at specific nucleotides. Thus, we might expect that the spectrum of de novo mutations – the frequencies of C>T, A>G, etc. – will differ between genomes that harbor either a mutator or wild-type allele. Previously, we used quantitative trait locus mapping to discover candidate mutator alleles in the DNA repair gene Mutyh that increased the C>A germline mutation rate in a family of inbred mice known as the BXDs (Sasani et al., 2022, Ashbrook et al., 2021). In this study we developed a new method to detect alleles associated with mutation spectrum variation and applied it to mutation data from the BXDs. We discovered an additional C>A mutator locus on chromosome 6 that overlaps Ogg1, a DNA glycosylase involved in the same base-excision repair network as Mutyh (David et al., 2007). Its effect depends on the presence of a mutator allele near Mutyh, and BXDs with mutator alleles at both loci have greater numbers of C>A mutations than those with mutator alleles at either locus alone. Our new methods for analyzing mutation spectra reveal evidence of epistasis between germline mutator alleles and may be applicable to mutation data from humans and other model organisms.

    1. Evolutionary Biology
    2. Genetics and Genomics
    Ban Wang, Alexander L Starr, Hunter B Fraser
    Research Article

    Although gene expression divergence has long been postulated to be the primary driver of human evolution, identifying the genes and genetic variants underlying uniquely human traits has proven to be quite challenging. Theory suggests that cell-type-specific cis-regulatory variants may fuel evolutionary adaptation due to the specificity of their effects. These variants can precisely tune the expression of a single gene in a single cell-type, avoiding the potentially deleterious consequences of trans-acting changes and non-cell type-specific changes that can impact many genes and cell types, respectively. It has recently become possible to quantify human-specific cis-acting regulatory divergence by measuring allele-specific expression in human-chimpanzee hybrid cells—the product of fusing induced pluripotent stem (iPS) cells of each species in vitro. However, these cis-regulatory changes have only been explored in a limited number of cell types. Here, we quantify human-chimpanzee cis-regulatory divergence in gene expression and chromatin accessibility across six cell types, enabling the identification of highly cell-type-specific cis-regulatory changes. We find that cell-type-specific genes and regulatory elements evolve faster than those shared across cell types, suggesting an important role for genes with cell-type-specific expression in human evolution. Furthermore, we identify several instances of lineage-specific natural selection that may have played key roles in specific cell types, such as coordinated changes in the cis-regulation of dozens of genes involved in neuronal firing in motor neurons. Finally, using novel metrics and a machine learning model, we identify genetic variants that likely alter chromatin accessibility and transcription factor binding, leading to neuron-specific changes in the expression of the neurodevelopmentally important genes FABP7 and GAD1. Overall, our results demonstrate that integrative analysis of cis-regulatory divergence in chromatin accessibility and gene expression across cell types is a promising approach to identify the specific genes and genetic variants that make us human.