Standardizing workflows in imaging transcriptomics with the abagen toolbox

Abstract

Gene expression fundamentally shapes the structural and functional architecture of the human brain. Open-access transcriptomic datasets like the Allen Human Brain Atlas provide an unprecedented ability to examine these mechanisms in silico; however, a lack of standardization across research groups has given rise to myriad processing pipelines for using these data. Here, we develop the abagen toolbox, an open-access software package for working with transcriptomic data, and use it to examine how methodological variability influences the outcomes of research using the Allen Human Brain Atlas. Applying three prototypical analyses to the outputs of 750,000 unique processing pipelines, we find that choice of pipeline has a large impact on research findings, with parameters commonly varied in the literature influencing correlations between derived gene expression and other imaging phenotypes by as much as ρ ≥ 1.0. Our results further reveal an ordering of parameter importance, with processing steps that influence gene normalization yielding the greatest impact on downstream statistical inferences and conclusions. The presented work and the development of the abagen toolbox lay the foundation for more standardized and systematic research in imaging transcriptomics, and will help to advance future understanding of the influence of gene expression in the human brain.

Editor's evaluation

This paper will be of interest to scientists studying the large-scale transcriptomic organization of the human brain, and in particular those who have used or plan to use the Allen Human Brain Atlas dataset. The study is well-motivated and novel. The most striking finding is the magnitude of variability that is introduced by different data processing decisions. The open-source software described in this study is comprehensive, well documented, and is an important contribution to the field.

https://doi.org/10.7554/eLife.72129.sa0

Introduction

Technologies like magnetic resonance imaging (MRI) provide unique insights into macroscopic brain structure and function in vivo. Modern research increasingly emphasizes how microscale attributes, such as gene expression, influence these imaging-derived phenotypes (Fornito et al., 2019; Arnatkeviciute et al., 2019; Arnatkevičiūtė et al., 2021). Gene expression is particularly useful as it is a fundamental molecular phenotype that can be plausibly linked to the function of biological pathways (Whitaker et al., 2016; Seidlitz et al., 2018), protein synthesis (Zheng et al., 2019), receptor distributions (Beliveau et al., 2017; Nørgaard et al., 2021; Shine et al., 2019; Deco et al., 2020; Preller et al., 2018), and cell types (Hansen et al., 2021; Anderson et al., 2020b; Anderson et al., 2018; Seidlitz et al., 2020; Gao et al., 2020). However, researchers looking to bridge these macro- and microscopic phenotypes must overcome multiple challenges. Although there are numerous technical and analytic considerations, one foundational issue is that acquiring high-quality transcriptomic data from the human brain is both costly and highly invasive, requiring budgets far greater than most typical neuroimaging studies and restrictive access to tissue from post-mortem donors or cranial surgical patients. As such, researchers must often rely on freely available repositories of gene expression data.

There exist multiple open-access repositories for gene expression in the human brain, including BrainSpan (Miller et al., 2014; Kang et al., 2011) and PsychENCODE (Gandal et al., 2018; Li et al., 2018; Wang et al., 2018; among others: Sousa et al., 2017; Darmanis et al., 2015; Lake et al., 2016); however, these datasets generally provide relatively sparse anatomical coverage, limiting the types of analyses that can be performed. Thus, researchers who aim to compare transcriptomic expression with whole-brain imaging-derived phenotypes have primarily relied on the Allen Human Brain Atlas (AHBA; Hawrylycz et al., 2012; Hawrylycz et al., 2015). Initially released in 2010, the AHBA remains the most spatially comprehensive dataset of its kind. Derived from bulk microarray analysis of tissue samples obtained from six donors, the AHBA provides expression data for more than 20,000 genes across 3702 brain areas in MRI-derived stereotactic space. With its superior resolution, the AHBA has significantly contributed to the emergence of the field of imaging transcriptomics (Fornito et al., 2019), enabling dozens of studies over the past decade examining relationships between gene expression and an array of macroscale imaging attributes, including cortical thickness (Shin et al., 2018), myelination (Burt et al., 2018), developmental brain maturation (Whitaker et al., 2016; Kirsch and Chechik, 2016), structural brain networks (Seidlitz et al., 2018; Romero-Garcia et al., 2018; Arnatkevičiūtė et al., 2020), functional brain networks (Richiardi et al., 2015; Krienen et al., 2016; Vértes et al., 2016), and human cognition (Fox et al., 2014; Hansen et al., 2021). The AHBA has also highlighted the importance of whole-brain gene expression in neurological and psychiatric diseases, where it has become increasingly clear that transcriptional pathways play a critical role in shaping the broader dynamics of disease progression and emergent symptomatology (Zheng et al., 2019; Shafiei et al., 2021; Henderson et al., 2019; Vogel et al., 2020; Rittman et al., 2016; Anderson et al., 2020a; Romme et al., 2017; McColgan et al., 2018; Morgan et al., 2019).

Since its release, several software toolboxes have been developed to help researchers use transcriptional data from the AHBA (French and Paus, 2015; Gorgolewski et al., 2015; Rittman et al., 2017; Rizzo et al., 2016); however, these tools often focus primarily on facilitating integration of the AHBA with neuroimaging data, offering limited if any functionality for modifying how the data are processed prior to analysis. Instead, a recent comprehensive review revealed that many research groups have opted to develop their own processing pipelines for the AHBA (Arnatkeviciute et al., 2019). Unfortunately, as there are no field-accepted standards for processing imaging transcriptomic data, the generated pipelines vary substantially across groups.

The extent to which such processing variability affects analytic outcomes from the AHBA remains unknown. Indeed, over the past decade neuroimaging research has shown that methodological variability can have broad influences on analyses using structural MRI (Bhagwat et al., 2021; Kharabian Masouleh et al., 2020), diffusion MRI (Oldham et al., 2020; Maier-Hein et al., 2017; Schilling et al., 2019), task fMRI (Carp, 2012; Botvinik-Nezer et al., 2020), and resting-state fMRI (Parkes et al., 2018; Ciric et al., 2017). Although researchers are beginning to grapple with the consequences of this variability, the lack of baseline gene expression datasets against which to compare new results impedes the development of standardized practices. In these situations, some researchers have proposed performing ‘multiverse’ analyses (Steegen et al., 2016; Dragicevic et al., 2019), wherein all possible permutations of data processing are analyzed and the full range of analytic results reported. Although such analyses can be computationally intensive, they offer a path to understand how processing choices impact statistical inferences and conclusions, and provide a mechanism by which to help researchers converge on an optimal pipeline.

Here, we comprehensively investigate how different processing choices influence the results of analyses using the AHBA. First, we develop an open-source Python toolbox, abagen, that collates all possible processing parameters into a set of turn-key workflows, optimized for flexibility and ease-of-use. We then use the toolbox to process the AHBA through approximately 750,000 unique pipelines. Across three prototypical imaging transcriptomic analyses, we examine whether and how these different processing options modify derived statistical estimates and quantify the relative importance of each option. Next, we replicate a curated set of processing pipelines from the literature to assess how previously reported findings compare to the full range of potential outcomes observed across all examined pipelines. Finally, we end with a set of recommendations, integrated directly into the developed abagen toolbox, to promote standardized use of the AHBA in future work.

Results

We introduce the abagen toolbox, an open-access software package designed to streamline processing and preparation of the AHBA for integration with neuroimaging data (Markello et al., 2021c, available at https://github.com/rmarkello/abagen; Markello, 2021b copy archived at swh:1:rev:2aeab5bd0f147fa76b488645e148a1c18095378d). Supporting several workflows, abagen offers functionality for an array of analyses and has already been used in several peer-reviewed publications and preprints (Shafiei et al., 2020; Hansen et al., 2021; Shafiei et al., 2021; Brown et al., 2021; Park et al., 2021; Valk et al., 2021; Zhao et al., 2020; Benkarim et al., 2020; Ding et al., 2021; Park et al., 2020; Lariviere et al., 2020; Martins et al., 2021). The primary workflow, used to generate regional gene expression matrices, integrates 17 distinct processing steps that have previously been employed by research groups throughout the published literature (Table 1). We refer to each unique set of processing choices and parameters as a ‘pipeline’. The following results use abagen to investigate how variable application of these processing steps can impact analyses of AHBA data.

Table 1

Abagen pipeline options.

Overview of 17 options to be considered when processing the AHBA data. The Choices column indicates the number of parameters explored in the current report (numerator) and the total number of parameters possible for the given option (denominator). A denominator of $n$ indicates a hypothetically near-infinite parameter space. The Description column gives a brief overview of the processing choice; for more detail refer to the relevant section in Materials and methods: Gene expression pipelines.

Option	Choices	Description
Volumetric or surface atlas	2/2	Whether to use a volumetric or surface representation of the atlas
Individualized or group atlas	1/2	Whether to use individualized donor-specific atlases or a group-level atlas
Use non-linear MNI coordinates	2/2	Whether to use updated MNI coordinates provided by the alleninf package
Mirror samples across L/R hemisphere	2/4	Whether to mirror (i.e., duplicate) samples across hemisphere boundary
Update probe-to-gene annotations	2/2	Whether to update probe annotations
Intensity-based filtering threshold	3/$n$	Threshold for intensity-based filtering of probes
Inter-areal similarity threshold	1/$n$	Threshold for removing samples with low inter-areal correspondence
Probe selection method	6/8	Method by which to select which probe(s) should represent a given gene
Donor-specific probe selection	3/3	How specified probe selection should integrate data from different donors
Missing data method	2/3	How to handle when brain regions are not assigned expression data
Sample-to-region matching tolerance	3/$n$	Distance tolerance for matching tissue samples to atlas brain regions
Sample normalization method	3/10	Method for normalizing tissue samples (across genes)
Gene normalization method	3/10	Method for normalizing genes (across tissue samples)
Normalize only matched samples	2/2	Whether to perform gene normalization for all versus matched samples
Normalizing discrete structures	2/2	Whether to perform gene normalization within structural classes
Sample-to-region combination method	2/2	Whether to aggregate tissue samples in regions within or across donors
Sample-to-region combination metric	2/2	Metric for aggregating tissue samples into atlas brain regions

Processing choices influence transcriptomic analyses

To understand how choices made during the processing of AHBA data impact downstream analyses, we enumerated 17 decision points (i.e. processing steps or options) that have been modified and used in the literature (Table 1). From these 17 steps we implemented 746,496 distinct processing pipelines, where each pipeline parcellated microarray expression from the AHBA with the Desikan-Killiany atlas (Desikan et al., 2006) to generate a unique brain region-by-gene expression matrix.
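A pipeline in this sense is one combination of choices across all decision points, so the full set is the Cartesian product of the per-option choices. The sketch below illustrates the idea on a deliberately small, hypothetical subset of options; the option names and values are placeholders for illustration and are not the toolbox's parameter names.

```python
from itertools import product

# Hypothetical subset of options; names and values are illustrative only.
options = {
    "gene_norm": ["none", "method_a", "method_b"],
    "sample_norm": ["none", "method_a", "method_b"],
    "norm_matched": [True, False],
}


def enumerate_pipelines(options):
    """Return one dict of parameter settings per unique combination."""
    names = list(options)
    return [dict(zip(names, combo)) for combo in product(*options.values())]


pipelines = enumerate_pipelines(options)
print(len(pipelines))  # 3 * 3 * 2 = 18
```

Because the number of pipelines is the product of the number of choices per option, adding even one two-way option doubles the total.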

Analyses of expression data from the AHBA can be grouped into one of three broad classes (Fornito et al., 2019): correlated gene expression analyses, gene co-expression analyses, and regional gene expression analyses. Correlated gene expression analyses examine the correlation between brain regions across genes, yielding a symmetric region × region matrix (similar to a functional connectivity matrix). Gene co-expression analyses, on the other hand, examine the correlation between genes across brain regions, yielding a symmetric gene × gene matrix. Finally, regional gene expression analyses examine the expression patterns of specific genes or gene sets in relation to other imaging-derived phenotypes.

To examine how differences in processing choices may impact both the expression matrices generated from the different pipelines and derived statistical estimates we ran one analysis from each of these classes on the matrices generated by each processing pipeline. Notably, these analyses are either direct reproductions or variations of analyses that have been previously published (Arnatkeviciute et al., 2019; Oldham et al., 2008; Hawrylycz et al., 2012; Burt et al., 2018). Although there is no ground truth for any of these analyses, findings from previous work offer some context for interpreting the observed results (i.e. data from other species and other modalities; Lau et al., 2021). Nonetheless, we primarily focus on highlighting the potential variability resulting from different processing pipelines.

Correlated gene expression (CGE)

First, we separately correlated the rows of each expression matrix to generate symmetric region × region ‘correlated gene expression’ matrices, indicating the similarity of gene expression profiles between different brain regions (Figure 1a). Previous work in other species has reliably observed that transcriptional similarity in the brain decays with increasing separation distance (Fulcher et al., 2019; Lau et al., 2021). This distance-dependent relationship is an expected feature due to the functional specialization of brain regions, and is consistent with other imaging-derived phenotypes in humans (Roberts et al., 2016; Goulas et al., 2019; Betzel and Bassett, 2018; Mišić et al., 2014; Shafiei et al., 2020; Horvát et al., 2016). We assessed this relationship by extracting the upper triangle of the correlated gene expression matrices and correlating them with the upper triangle of a regional distance matrix, derived by computing the average Euclidean distance between brain region centroids in the Desikan-Killiany atlas (Figure 1a, left panel). Although previous work has highlighted that this relationship is exponential (Arnatkeviciute et al., 2019), we computed the Spearman correlation as both statistics should exhibit similar variability across pipelines and the latter is less computationally expensive.
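The procedure described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the code used for the reported analyses: the expression matrix and centroids are random stand-ins, and the rank-based Spearman helper assumes there are no tied values.

```python
import numpy as np


def rank(values):
    """Rank 1-D data (0 = smallest); assumes no ties, as for continuous data."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return np.corrcoef(rank(x), rank(y))[0, 1]


def cge_distance_correlation(expression, centroids):
    """Correlate region x region CGE with inter-regional Euclidean distance."""
    cge = np.corrcoef(expression)                     # rows are regions
    diffs = centroids[:, None, :] - centroids[None, :, :]
    distance = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(expression), k=1)     # exclude the diagonal
    return spearman(cge[upper], distance[upper])


# Synthetic stand-in data: 20 regions, 100 genes, random 3-D centroids.
rng = np.random.default_rng(0)
expression = rng.normal(size=(20, 100))
centroids = rng.uniform(-70, 70, size=(20, 3))
rho = cge_distance_correlation(expression, centroids)
```

With real data, a negative value of `rho` would indicate that transcriptional similarity decreases as regions become farther apart.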

Figure 1

Processing choices influence transcriptomic analyses.

(a) Examples of the three analyses used to assess differences in gene expression matrices generated by transcriptomic pipelines. First row: a depiction of the region-by-gene expression matrix generated from one of the 746,496 tested processing pipelines. Second row, left: we compute the correlation between rows of each matrix to generate a symmetric region × region CGE matrix. We then compute the correlation between the upper triangle of this CGE matrix and the upper triangle of a regional distance matrix to examine the degree to which CGE decays with increasing distance between regions (Arnatkeviciute et al., 2019). Second row, middle: we compute the correlation between columns of each matrix to generate a symmetric gene × gene GCE matrix. We use previously defined functional gene communities (Oldham et al., 2008) to compute a silhouette score for this GCE matrix to investigate whether genes within a module have more similar patterns of spatial expression than genes between modules. Second row, right: the first principal component is extracted from the RGE matrix. We compute the correlation between this principal component and the whole-brain T1w/T2w ratio (Burt et al., 2018) to understand how closely these maps covary across the brain. (b) The full statistical distributions from each of the three analyses for all 746,496 pipelines. Left panel: Spearman correlation values, $ρ$, from the CGE analyses. Middle panel: silhouette scores from the GCE analyses. Right panel: Spearman correlation coefficients, $ρ$, from the RGE analyses. CGE: correlated gene expression; GCE: gene co-expression; RGE: regional gene expression.

Gene co-expression (GCE)

For the second type of analysis we separately correlated the columns of each expression matrix to generate gene × gene ‘co-expression’ (GCE) matrices, indicating the similarity in spatial expression patterns between all pairs of genes (Figure 1a). A significant body of research has shown that genes tend to form functional communities, exhibiting synchronized expression patterns across space and time (Oldham et al., 2008), such that gene co-expression patterns tend to be more similar within than between such communities. Here, we obtained a set of gene community assignments derived for the brain from a previously studied human transcriptomic dataset (Oldham et al., 2008). We used these community assignments to calculate a silhouette score (Rousseeuw, 1987) for the gene co-expression matrices generated by each pipeline, measuring how well these communities represented the derived co-expression patterns (Figure 1a, middle panel).
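As a rough illustration of this analysis, the sketch below computes a silhouette score for a gene × gene co-expression matrix given community labels. It rests on assumptions not stated in the text: co-expression is converted to a distance as 1 − correlation, the data are synthetic, and the two communities are invented for the example.

```python
import numpy as np


def silhouette_score(distance, labels):
    """Mean silhouette over all items, given a precomputed distance matrix."""
    labels = np.asarray(labels)
    communities = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton communities are scored 0 by convention
        # mean distance to the other members of the same community
        a = distance[i, own].sum() / (n_own - 1)
        # smallest mean distance to any other community
        b = min(distance[i, labels == c].mean()
                for c in communities if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


# Synthetic stand-in: 50 regions, 20 genes in two invented communities.
rng = np.random.default_rng(0)
n_regions = 50
patterns = rng.normal(size=(n_regions, 2))      # one spatial pattern per community
labels = np.repeat([0, 1], 10)
expression = patterns[:, labels] + 0.5 * rng.normal(size=(n_regions, 20))

gce = np.corrcoef(expression, rowvar=False)     # columns are genes
score = silhouette_score(1 - gce, labels)       # assumed distance: 1 - correlation
```

Scores near 1 indicate that genes are more similar within than between communities; scores near or below 0 indicate that the assignments fit the co-expression structure poorly.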

Regional gene expression (RGE)

For the third type of transcriptomic analysis, we focused on regional correlations between gene expression measures and an MRI-derived phenotype. Our regional expression measure was defined by computing the first principal component of the region-by-gene expression matrix, representing the axis of maximum spatial variation of gene expression in the brain observed under a given AHBA processing pipeline. As gene expression fundamentally shapes the structure and function of the human brain, it is likely that this principal component may exhibit similar spatial variability to other imaging-derived measures. Recent work has highlighted that the T1w/T2w ratio is a robust phenotype that exhibits patterns of regional variation consistent with other microstructural and functional properties (Gao et al., 2020; Burt et al., 2018; Demirtaş et al., 2019; Fulcher et al., 2019). We therefore correlated the first principal component of gene expression with the whole-brain T1w/T2w ratio (Figure 1a, right panel), measuring the extent to which these values covary across the cortex.
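The following sketch shows one way such a regional analysis could be set up. It is illustrative only: the expression matrix and the map standing in for the T1w/T2w ratio are synthetic, the region and gene counts are arbitrary, and the principal component is obtained from a singular value decomposition of the mean-centered matrix.

```python
import numpy as np


def first_principal_component(expression):
    """Region scores on PC1 of a region x gene matrix (genes are mean-centered)."""
    centered = expression - expression.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]


# Synthetic stand-ins: one regional gradient drives both expression and the map.
rng = np.random.default_rng(0)
gradient = np.linspace(-1, 1, 34)                   # 34 hypothetical regions
loadings = rng.normal(size=200)                     # 200 hypothetical genes
expression = np.outer(gradient, loadings) + 0.3 * rng.normal(size=(34, 200))
t1w_t2w = gradient + 0.2 * rng.normal(size=34)      # placeholder for a real map

pc1 = first_principal_component(expression)
rho = spearman(pc1, t1w_t2w)
```

The sign of a principal component is arbitrary, so the sign of `rho` from a decomposition like this one is only meaningful once the component has been oriented by some convention.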

Pipeline distributions

Results from these three analyses reveal that choice of processing pipeline dramatically influences derived statistical estimates (i.e. the CGE-distance correlation, the gene co-expression silhouette score, and the spatial correlations between gene PC1 and whole-brain T1w/T2w ratio; Figure 1b). We observe that all three of the generated distributions of statistical estimates across the 746,496 pipelines have wide ranges (correlated gene expression: [-0.51, -0.13]; gene co-expression: [-0.78, -0.18]; regional gene expression: [0.00, 0.90]) and are either bimodal (Figure 1b, left/middle panels) or heavily skewed (Figure 1b, right panel).

Since there is no ground truth for these analyses we cannot quantitatively assess whether some pipelines are more or less accurate than others. However, there is strong qualitative evidence to suggest that correlated gene expression should be lower between brain regions that are farther apart (Arnatkeviciute et al., 2019; Krienen et al., 2016; Richiardi et al., 2015; Fulcher et al., 2019; Lau et al., 2021). It is notable, then, that the distribution of distance-dependent estimates is so strongly bimodal (splitting at $ρ \approx -0.4$), suggesting two very different perspectives on the size of this effect (Figure 1a and b, left panels). As increasingly detailed single-cell transcriptional data become available (e.g. Yao et al., 2021) we may be able to use these estimates to determine accuracy; for now, we simply note that even for this estimate with strong biological priors we see considerable variability.

Similar variability can be observed for the other two analyses. While all the pipelines demonstrate relatively poor fit of gene communities to the derived gene co-expression matrices (refer to Materials and methods: Analytic approaches for information on why this is not unexpected), we observe that a portion of the pipelines yield far worse correspondence (Figure 1a and b, middle panels). Moreover, while the correlations between gene PC1 and whole-brain T1w/T2w ratio are largely consistent across pipelines, there is a small group of pipelines that yield correlations that deviate by $ρ \approx 1.0$. Notably, the parameter choices for these pipelines are not pathological (that is, their use could be justified) and, as we discuss later (see Results: Variability in parameter importance), modifying just one parameter setting can yield changes in effect sizes within this range.

Collectively, we find that for all three of these analyses there is substantial variability in the statistical estimates generated by different processing pipelines, and that this variability is large enough to make a meaningful difference to the inferences and conclusions that can be drawn.

Variability in parameter importance

Next, we quantified the relative importance of different processing steps and parameters on our three derived statistical estimates. While researchers must ultimately make choices for each of the steps individually when processing AHBA data, we wanted to investigate whether unique choices have distinct influences. Moreover, which parameters are most important may differ based on the type of analysis performed.

We investigated parameter importance by calculating a distribution of difference scores for each parameter, measuring the extent to which changing each parameter, holding all other parameters constant, influences the derived statistical metrics from each of the three analyses. For example, given a processing parameter with two choices this procedure yielded a distribution of $N / 2$ difference scores per analysis, where $N$ is the total number of pipelines (i.e. $746,496 / 2 = 373,248$). We averaged these distributions separately for each analysis to generate a single, summary ‘impact score’ for each processing step, which we then rank-ordered independently for each analysis.
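The difference-score procedure can be sketched as follows. This is a toy illustration under stated assumptions: the option names and the estimates are invented, the absolute difference is used as the difference score, and parameters with more than two choices are handled by taking all pairwise differences. None of these details is specified in the text above.

```python
from itertools import combinations, product


def impact_score(pipelines, estimates, parameter):
    """Mean absolute change in an estimate when only `parameter` is varied."""
    groups = {}
    for settings, value in zip(pipelines, estimates):
        # pipelines sharing this key differ only in `parameter`
        key = tuple(sorted((k, v) for k, v in settings.items() if k != parameter))
        groups.setdefault(key, []).append(value)
    diffs = [abs(a - b)
             for values in groups.values()
             for a, b in combinations(values, 2)]
    return sum(diffs) / len(diffs)


# Hypothetical options and made-up estimates: skipping normalization shifts
# the statistic by 0.3, while restricting to matched samples shifts it by 0.02.
options = {"gene_norm": ["none", "a", "b"], "norm_matched": [True, False]}
pipelines = [dict(zip(options, combo)) for combo in product(*options.values())]
estimates = [
    (-0.2 if p["gene_norm"] == "none" else -0.5)
    + (0.02 if p["norm_matched"] else 0.0)
    for p in pipelines
]

scores = {name: impact_score(pipelines, estimates, name) for name in options}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy grid of six pipelines, the two-choice option produces three difference scores, mirroring the $N / 2$ count described above.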

We find considerable agreement in which parameters are the most impactful across analyses (Figure 2a): the most influential processing steps often involve procedures that influence the gene normalization process in some way (e.g. gene normalization method, normalizing only matched samples; Figure 2b). On the other hand, among the least impactful parameters are choices concerning donor-specific probe selection and handling of missing data. It is worth noting that of the probe selection methods tested in the current manuscript (i.e. max intensity, correlation intensity, correlation variance, differential stability, RNAseq correlation, and averaging), three of the six render the choice of donor-specific probe selection redundant. In other words, these three methods are mutually exclusive with choice of donor-specific probe selection, potentially confounding our ability to measure the real influence of this parameter. We also highlight that choice of atlas may influence the impact of missing data handling: since the Desikan-Killiany atlas is a relatively low-resolution atlas (68 nodes), expression matrices generated from the tested pipelines are missing, at most, data for two brain regions. It is possible that handling of missing data may be more important when higher-resolution parcellations are employed. That is, while some parameters do not appear to affect our results in aggregate, there are potentially specific research questions where these parameters could play an important and impactful role.

Figure 2

Parameter choice differentially impacts statistical estimates.

(a) Rank of the relative importance for each parameter ($y$-axis) across all three analyses ($x$-axis). Warmer colors indicate parameters that have a greater influence on statistical estimates. (b) Statistical distributions from the three analyses, shown as kernel density plots, separated by choice of gene normalization method (the most impactful parameter as shown in panel a). (c) Density plots of the statistical estimates for all 746,496 pipelines shown along the first two principal components, derived from the 746,496 (pipeline) × 3 (statistical estimates) matrix, representing how different the statistical estimates from each of the three analyses are relative to other pipelines. Left panel: pipelines are colored based on choice of gene normalization method, where each color represents 1/3 of the pipelines. Here, the pipelines in which no normalization was applied (purple) are distinguished from those in which some form of normalization was applied (blue and brown). Right panel: pipelines are colored based on whether gene normalization was performed within (True, red) or across (False, purple) structural classes (i.e. cortex, subcortex/brainstem, cerebellum; see Materials and methods: Gene expression pipelines for more information).

To investigate those parameters that did play an influential role in the current analyses, we visualized their impact by examining the statistical distributions from each analysis separated by the different parameter choices (shown in Figure 2b for gene normalization method). Dividing the distributions in this way highlights how strongly parameter choice can influence the outcomes of the analyses: for example, when no gene normalization is employed the resulting estimates are dramatically shifted from those generated by pipelines that employed some form of normalization (Figure 2b; no normalization: purple distribution). Indeed, the bimodality and skew observed in the full statistical distributions for the analyses (Figure 1b) are almost entirely explained by this single parameter choice.

To investigate more qualitative differences in how parameter choice influences the processing pipelines we performed a principal component analysis (PCA) on the matrix of statistical estimates from the three analyses (i.e. the $746,496 \times 3$ pipeline-by-analysis matrix). We extracted the first two principal components from the statistical estimate matrix (variance explained: PC1 = 70%, PC2 = 26%) and examined how pipeline scores were distributed along these axes (Figure 2c). Delineating the distribution of pipelines based on parameter choice underscores how these options impact the separability of resulting statistical estimates. Reinforcing results presented above, we find that the choice of gene normalization method distinguishes the one-third of pipelines with no normalization (purple) from the remaining two-thirds that applied some form of normalization (blue and brown; Figure 2c, left). It is clear from the distribution of pipelines, however, that other processing choices interact with this parameter. For example, plotting the pipelines by whether the gene normalization was performed separately on samples within each structural class (i.e. cerebral cortex, subcortex, cerebellum) rather than across all tissue samples further delineates the pipelines that applied gene normalization into two distinct clusters (Figure 2c, right).

These results reveal how different processing steps are grouped in terms of their importance to analyses of the AHBA, with some groups demonstrating greater potential impact. Broadly, parameters modifying normalization are the most important, followed by parameters influencing how tissue samples are matched to brain regions, and finally parameters impacting probe selection. Moreover, we find that choices within each processing step do not all have an equivalent impact on derived estimates (i.e. performing no gene normalization has a much greater influence than choosing between the two other forms of normalization tested).

Reproducing published analyses

The previous subsections demonstrate variability across the complete range of reasonable processing pipelines; however, many of these pipelines have not yet been used in practice. To investigate whether the subset of pipelines that have already been implemented in the published literature display similar variability, we used abagen to reproduce the processing procedures from nine peer-reviewed articles that (1) are highly-cited within the field, (2) highlight a wide range of processing options, and (3) sufficiently describe their processing pipelines such that they could be reproduced. We explored how different the gene expression values and statistical outcomes generated by these published pipelines were (Hawrylycz et al., 2015; French and Paus, 2015; Whitaker et al., 2016; Krienen et al., 2016; Anderson et al., 2018; Burt et al., 2018; Romero-Garcia et al., 2018; Anderson et al., 2020b; Liu et al., 2020). To ensure comparability, we standardized the choice of brain parcellation across pipelines, using the Desikan-Killiany atlas in all instances. The pipelines were used to generate nine region-by-gene expression matrices, which were then subjected to the same three analyses described previously.

In reproducing the pipelines we note important differences in processing parameter selection (Figure 3a), and find that this variability results in slight discrepancies between gene expression values generated by the pipelines. For example, looking at the distribution of cortical somatostatin (SST), a gene discussed heavily in Anderson et al., 2020b where it is used as a proxy for somatostatin interneuron density (Fulcher, 2019), we observe some variation between pipelines (Figure 3b and c). Although we find moderate consistency in the statistical estimates generated by the pipelines, there are important differences (ranges: correlated gene expression [-0.49, -0.28], gene co-expression [-0.70, -0.24], regional gene expression [0.34, 0.88]; Figure 3d). One outlier is the single pipeline that did not appear to implement any form of gene normalization (French and Paus, 2015), supporting earlier results demonstrating the importance of this processing step on downstream expression estimates. This is potentially notable as the processed expression data from this pipeline were made openly available and have been used in analyses by other researchers (e.g. Sepulcre et al., 2018; Beliveau et al., 2017).

Figure 3

Reproducing published pipelines.

(a) Parameter choices used in the reproduction of published pipelines. Processing steps with categorical choices (e.g., gene normalization) were converted to numerical choices for display purposes only. These choices reflect the range of choices enumerated in Table 1. (b) Relative expression values of cortical somatostatin (SST) generated by each of the reproduced pipelines. Value ranges vary based on pipeline processing options. (c) The Pearson correlation between the cortical somatostatin (SST) maps generated by the nine pipelines shown in panel (b). (d) Statistical estimates from the three analyses described in Materials and methods: Analytic approaches applied to expression data from each of the published pipelines.

Given that imaging transcriptomics is still relatively new and there has been limited work addressing best practices in the field (Arnatkeviciute et al., 2019), these results stress the importance of standardization in use of the AHBA among research groups. Although variation in processing can ostensibly lead to similar inferences in specific analyses, even minor differences in processing choices consistently yield measurable discrepancies in derived expression data. Without proper standardization, these differences will compound and become more problematic as the field continues to grow.

Standardized processing and reporting with the abagen toolbox

Across all of our analyses we find that choice of processing steps and parameters can have a strong influence on the statistical outcomes of research with the AHBA. Here, we briefly highlight features that we have integrated into the abagen toolbox to facilitate standardization in future research.

The abagen toolbox supports two use-case-driven workflows: (1) a workflow that accepts an atlas and returns a parcellated, preprocessed regional gene expression matrix (Figure 4a); and (2) a workflow that accepts a mask and returns preprocessed expression data for all tissue samples within the mask (Figure 4b). Workflows can be called via a single line of code from either the command line or Python terminal, and take approximately one minute to run with default settings using the Desikan-Killiany atlas. The main output of abagen is a single brain region (or tissue sample) × gene expression matrix. Changing the parameters may modify the shape of the matrix (e.g. different atlases will yield different numbers of regions, and different processing choices may yield different numbers of genes) or its values, but not its structure. The outputs of these workflows can be used generally to examine the three prototypical research questions enabled by the AHBA: correlated gene expression, gene co-expression, and regional expression of genes of interest more broadly (Fornito et al., 2019). Beyond its primary workflows, abagen has additional functionality for post-processing the AHBA data (e.g. removing distance-dependent effects from expression data, calculating differential stability estimates; Hawrylycz et al., 2015), and for accessing data from the companion Allen Mouse Brain Atlas (e.g. providing interfaces for querying the Allen Mouse API; https://mouse.brain-map.org/; Lein et al., 2007).
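As an illustration of the two workflows, the sketch below wraps the corresponding calls as we understand them from the abagen documentation (`abagen.fetch_desikan_killiany`, `abagen.get_expression_data`, and `abagen.get_samples_in_mask`). Exact signatures and defaults may differ between releases, and the first call downloads the AHBA microarray data. The `check_expression_matrix` helper is hypothetical and only verifies the rows × genes structure described above.

```python
def parcellated_expression():
    """Workflow (1): atlas in, region x gene matrix out."""
    import abagen  # imported lazily; requires the abagen package

    atlas = abagen.fetch_desikan_killiany()
    return abagen.get_expression_data(atlas["image"], atlas["info"])


def samples_in_mask(mask=None):
    """Workflow (2): mask in, sample x gene matrix (plus coordinates) out."""
    import abagen

    expression, coords = abagen.get_samples_in_mask(mask=mask)
    return expression, coords


def check_expression_matrix(matrix):
    """Hypothetical helper: confirm a non-empty 2D rows-by-genes structure."""
    n_rows, n_genes = matrix.shape
    if n_rows == 0 or n_genes == 0:
        raise ValueError("expression matrix is empty")
    return n_rows, n_genes
```

Because both workflows return the same rows × genes layout, downstream analysis code can stay unchanged when the atlas or mask is swapped.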

Figure 4

Workflows and features in the abagen toolbox.

(a) The primary workflow of abagen, used in the reported analyses, accepts a brain atlas and returns a parcellated brain-region-by-gene expression matrix. (b) An alternative abagen workflow accepts a regional mask and returns a processed tissue-sample-by-gene expression matrix, for all tissue samples from the six AHBA donors that fall within boundaries of the mask. (c) Examples of selected features from the abagen workflows and additional toolbox functionality. Top left: examples of some commonly-used atlases that can be employed with the parcellation workflow shown in panel (a). Bottom left: abagen can accept either standard atlases (i.e. in MNI space) or atlases defined in the space of the six individual donors from the AHBA. Top right: an additional workflow available in abagen can be used to generate densely-interpolated expression maps from AHBA data using a k-nearest neighbors interpolation algorithm. Bottom right: using high-resolution atlases in the parcellation workflow (panel a) may result in some parcels being assigned no expression data; abagen supports two methods for assigning values to such regions.

Although these workflows support the entire range of processing options that we assessed in the current manuscript (Figure 4c), we have set the default options for all steps based on best practice recommendations developed in Arnatkeviciute et al., 2019 and further informed by the results presented above (see Supplementary file 1 for a full list). We believe the default settings in abagen will provide a reasonable starting point for researchers beginning to work with the AHBA; however, as we have continually noted, the appropriate choices for some parameters will vary based on research question. As such, to make it easier for researchers to report exactly what parameters they use, we have integrated an automated reporting mechanism into the abagen workflows (Figure 5). The generated reports provide manuscript-ready step-by-step documentation describing all the processing done to the AHBA data in the workflow, and are licensed CC0 (https://creativecommons.org/share-your-work/public-domain/cc0/) so that they can be freely used without restriction.

Figure 5

Annotated example abagen report.

Example of an automatically generated methods section report from the abagen toolbox. Processing steps are shown on the left and the relevant methods text—which is updated when these steps are modified—is shown in the same font color on the right. Reports also include a formatted reference section and relevant equations; these are not shown here for conciseness. Note that some processing steps (e.g. normalizing within structures, missing data handling) are omitted here because they are not run by default (see Supplementary file 1).

Creation of the toolbox has followed best practices in software development, including version control, continuous integration testing, and modular code design. To encourage further use by new research groups, we provide comprehensive online documentation on installing and working with the abagen toolbox (https://abagen.readthedocs.io/).

Discussion

In the present report, we introduced the abagen toolbox, an open-source Python library for processing transcriptomic data. Using abagen, we conducted a comprehensive analysis examining whether and how different processing options modify statistical estimates derived from analyses using the AHBA. We investigated how processing pipelines used in the literature compare to those we tested, and provide recommendations for improving standardization and reporting of analyses using the AHBA, highlighting how the abagen toolbox can facilitate future developments in this space.

Testing nearly 750,000 unique processing pipelines, we find that choice of processing parameters can strongly influence statistical estimates derived from analyses of the AHBA, and that these choices interact with the type of analysis performed (Figure 1). We observe significant variability with regard to which parameters are most influential, finding that procedures modifying gene expression normalization have a far greater impact on downstream analyses than other processing steps (Figure 2). Looking to the literature, we reproduce nine pipelines from published articles and find that, despite notable inconsistencies in their processing choices, there is moderate consistency in their produced statistical estimates (Figure 3). We demonstrate, however, that these summary estimates may obscure meaningful differences in gene expression values derived by the pipelines, cautioning researchers to be aware of how analytic choices may impact their findings.

Altogether, the present report provides a comprehensive assessment of how processing variability can impact analyses in the field of imaging transcriptomics. Our results demonstrate how researcher choices (or ‘researcher degrees of freedom’; Simmons et al., 2011) can play a meaningful role in analyses of the AHBA. However, these findings are not necessarily limited to the AHBA. Indeed, increasing reliance on open-access datasets has begun to reveal unique challenges associated with data reuse (Thompson et al., 2020). Improved standardization and reporting among research groups using (and re-using) openly available datasets may help to mitigate some of these challenges. We believe that functionality in the abagen toolbox can support future researchers in overcoming these pitfalls and improve reproducibility in processing and analyzing AHBA data.

Our results also show that not all processing choices are equal: that is, we find a hierarchy of processing parameters, wherein procedures modifying gene normalization have the greatest impact on analyses, followed by steps more broadly influencing the matching of tissue samples to brain regions and finally by parameters that determine probe selection. Furthermore, we find that within processing steps certain parameter choices may lead to more reasonable statistical estimates. In particular, applying some form of gene normalization tends to improve the behavior of processed expression data when compared to instances in which no normalization is applied (Figure 1), but there appear to be limited differences in the type of normalization used. Although we only considered cortical tissue samples in the current analyses, we expect that including non-cortical samples would further reinforce these results (Arnatkeviciute et al., 2019): known differences in microarray expression values between cortex and subcortical structures will likely emphasize the impact of different normalization procedures across pipelines. Critically, these findings largely agree with previous recommendations developed by Arnatkeviciute et al., 2019, and we have chosen default parameter choices for abagen workflows accordingly.
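To make the role of gene normalization concrete, the following sketch contrasts z-scoring with a scaled robust sigmoid, written here from the general form of that transform (a sigmoid centered on the median and scaled by the interquartile range, followed by rescaling to the unit interval). The data are synthetic and the function names are ours; this is an illustration under those assumptions, not the toolbox's implementation.

```python
import numpy as np

def zscore(expr):
    """Standardize each gene (column) across regions."""
    return (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)

def scaled_robust_sigmoid(expr):
    """Outlier-robust sigmoid per gene, rescaled to [0, 1]."""
    median = np.median(expr, axis=0)
    q75, q25 = np.percentile(expr, [75, 25], axis=0)
    iqr_scaled = (q75 - q25) / 1.35
    sig = 1.0 / (1.0 + np.exp(-(expr - median) / iqr_scaled))
    return (sig - sig.min(axis=0)) / (sig.max(axis=0) - sig.min(axis=0))

def variance_share(matrix):
    """Fraction of total variance carried by the first gene."""
    var = matrix.var(axis=0, ddof=1)
    return var[0] / var.sum()

rng = np.random.default_rng(42)
expr = rng.normal(loc=5.0, scale=1.0, size=(68, 200))   # regions x genes
expr[:, 0] = rng.normal(loc=50.0, scale=20.0, size=68)  # one dominant gene

raw_share = variance_share(expr)
z_share = variance_share(zscore(expr))
srs = scaled_robust_sigmoid(expr)
print(f"dominant gene variance share: raw={raw_share:.2f}, z-scored={z_share:.3f}")
```

Without normalization the single high-variance gene carries most of the variance entering any region-by-region comparison; after either transform every gene contributes on a comparable scale.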

Note that there are some processing steps that should be performed in a specific sequence, and others whose order could potentially be interchanged. For example, intensity-based filtering of probes should be performed before probe selection; reversing the order of these operations would, in the majority of cases, be problematic because noisy probes could be selected and carried through to analysis. However, the order of other steps (e.g. sample versus gene normalization) could arguably be reversed with no ostensible detriment. This procedural ambiguity is a salient example of the need to standardize workflows.
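The ordering constraint can be illustrated with a toy example. The sketch assumes a samples × probes intensity matrix, a boolean matrix flagging whether each measurement exceeds background noise, and a probe-to-gene mapping. Selecting the highest-variance probe per gene is a stand-in for whichever selection criterion a pipeline uses, and the 50% threshold is illustrative.

```python
import numpy as np

def filter_probes(above_noise, threshold=0.5):
    """Keep probes whose signal exceeds noise in at least `threshold` of samples."""
    return above_noise.mean(axis=0) >= threshold

def select_probe_per_gene(expr, genes, keep):
    """Among retained probes, pick the highest-variance probe for each gene."""
    variances = expr.var(axis=0, ddof=1)
    chosen = {}
    for idx in np.flatnonzero(keep):
        gene = str(genes[idx])
        if gene not in chosen or variances[idx] > variances[chosen[gene]]:
            chosen[gene] = int(idx)
    return chosen

rng = np.random.default_rng(1)
genes = np.array(["SST", "SST", "PVALB"])
expr = np.column_stack([
    rng.normal(0, 1.0, 100),  # probe 0: reliable SST probe
    rng.normal(0, 3.0, 100),  # probe 1: noisy SST probe with high variance
    rng.normal(0, 1.0, 100),  # probe 2: reliable PVALB probe
])
above_noise = np.column_stack([
    np.ones(100, dtype=bool),
    rng.random(100) < 0.2,    # probe 1 rarely exceeds background
    np.ones(100, dtype=bool),
])

# Filtering first removes the noisy probe before selection.
filtered_first = select_probe_per_gene(expr, genes, filter_probes(above_noise))
# Selecting first (no filter) lets the noisy probe represent SST.
selected_first = select_probe_per_gene(expr, genes, np.ones(3, dtype=bool))
print(filtered_first, selected_first)
```

If selection runs first, the noisy probe is chosen for SST; a later filter would then either be skipped or would drop the gene entirely, which is the failure mode described above.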

More broadly, this work builds on increasing efforts to examine the importance of methodological choices and analytical flexibility in human neuroimaging research (Bhagwat et al., 2021; Kharabian Masouleh et al., 2020; Oldham et al., 2020; Maier-Hein et al., 2017; Schilling et al., 2019; Carp, 2012; Botvinik-Nezer et al., 2020; Parkes et al., 2018; Ciric et al., 2017). Thankfully, emerging technical solutions have begun to tackle these issues via the development of tools that aim to abstract away sources of variation (e.g. fMRIPrep, Esteban et al., 2019; QSIPrep, Cieslak et al., 2020). While results from the present study reinforce the importance of methodological choices in research, abagen draws significant inspiration from these software packages in providing a set of tools designed to overcome such concerns when working with the AHBA.

While the AHBA dataset remains the only one of its kind, the abagen toolbox is designed to be used more broadly as similar datasets become available. That is, the preprocessing functions in abagen can be applied to other microarray expression datasets assuming, for example, availability of stereotactic coordinates. As new imaging transcriptomic datasets are developed and become more widely used, abagen functionality for creating standardized processing pipelines will only become more important. By developing the toolbox openly on GitHub (https://github.com/rmarkello/abagen), it is our hope that abagen can serve as a foundational, community tool for use in imaging transcriptomics research.

One consideration for future work on this topic is that the pipelines tested cover only a portion of the potential variability possible when processing AHBA data (Table 1). For example, a growing body of research has begun to examine how choice of brain parcellation may impact imaging analyses (e.g. Craddock et al., 2012; Thirion et al., 2014; Messé, 2020; Markello and Misic, 2021). While we only assessed processing pipelines using the Desikan-Killiany atlas, many other atlases have been used with the AHBA and it remains unclear how this variation may impact research findings. We also did not investigate whether donor-specific parcellations may impact analyses, a processing choice used in several published research findings (Anderson et al., 2020b; Romero-Garcia et al., 2018; Burt et al., 2018). Although there is significant evidence suggesting inter-individual variability in brain region definition (e.g. Gordon et al., 2017; Kong et al., 2019; Dickie et al., 2018), the process of generating individualized brain parcellations is fraught with methodological choices and requires careful data processing. Given the quality of the MRI data provided alongside the transcriptomic data in the AHBA—including important differences in scanning protocol and procedures between donors—creating donor-specific parcellations may be a large source of variability between pipelines.

Another limitation of the presented results is that we are unable to make categorical statements about which processing options are ‘best’ for the AHBA. First, there is no ground truth against which one can assess what the optimal set of processing parameters is. One potential solution to this could be to examine the robustness of pipelines based on a leave-one-donor-out strategy (e.g. Arnatkeviciute et al., 2019; Vogel et al., 2020), wherein analyses are repeated six times, omitting one donor each time, to ensure that none of the donors are unduly influencing analytic estimates. This approach is likely to become more useful as data from more individuals become available, but at present may be a worthwhile approach for assessing whether chosen processing parameters are appropriate. Moreover, the optimal set of processing parameters may vary based on research question. For instance, in most applications gene normalization is appropriate, as it ensures that downstream analyses are not driven by a small subset of highly expressed genes. However, in other applications it may be desirable to retain the variance contributed by genes to accurately reflect their relative expression levels. For example, many genes in AHBA are not brain-specific, so normalization will amplify their expression patterns, potentially obscuring more relevant expression information. This can be avoided by sub-selecting genes in a hypothesis-driven manner and skipping the normalization step altogether.
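A leave-one-donor-out check of this kind can be sketched as follows, assuming donor-specific region × gene matrices are available and that the analysis of interest is a rank correlation between one gene's regional expression and an imaging map. All data here are synthetic, and `spearman` is a small local helper that ignores ties. With the real AHBA, donors do not have samples in every region, so missing values would need explicit handling.

```python
import numpy as np

def spearman(x, y):
    """Spearman correlation via Pearson correlation of ranks (no tie handling)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

def leave_one_donor_out(donor_expr, imaging_map, gene_idx):
    """Recompute the statistic with each donor omitted in turn."""
    estimates = []
    for left_out in range(donor_expr.shape[0]):
        kept = np.delete(donor_expr, left_out, axis=0)
        mean_expr = kept.mean(axis=0)  # regions x genes
        estimates.append(spearman(mean_expr[:, gene_idx], imaging_map))
    return np.array(estimates)

rng = np.random.default_rng(7)
n_donors, n_regions, n_genes = 6, 68, 50
imaging_map = rng.normal(size=n_regions)
donor_expr = rng.normal(size=(n_donors, n_regions, n_genes))
donor_expr[:, :, 0] += imaging_map  # gene 0 tracks the imaging map

estimates = leave_one_donor_out(donor_expr, imaging_map, gene_idx=0)
print(estimates.round(2))
```

A narrow spread across the six estimates suggests that no single donor drives the result under the chosen processing parameters.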

Nonetheless, we offer two alternative solutions for researchers who want to continue using the AHBA data. First, similar to the current report, researchers can conduct a comprehensive analysis with the AHBA, running multiple processing pipelines and showing the entire distribution of generated statistical estimates; however, this process can be computationally prohibitive and may impair researchers’ ability to interpret their findings (Steegen et al., 2016). A less costly alternative, then, is for the imaging transcriptomic research community to converge on a set of data-driven processing pipelines for the AHBA that can be used across research groups. We believe the abagen toolbox, with its comprehensive workflows, well-informed default parameter choices, and detailed documentation, can facilitate this process. While we acknowledge that some research groups may have strong reasons for wanting to use specific (i.e. non-default) processing choices, in these instances we urge clear and detailed reporting of the methods used, such as via the automated reporting functionality from the abagen toolbox.

Altogether, the current report highlights the problem of processing variability in analyses using the AHBA, impacting many research studies in the burgeoning field of imaging transcriptomics. We demonstrate how different processing options can influence statistical estimates of analyses relating data from the AHBA to imaging-derived phenotypes, and present the abagen toolbox as a promising potential solution to this issue.

Author details

Ross D Markello (for correspondence)

Aurina Arnatkeviciute

Jean-Baptiste Poline

Ben D Fulcher

Alex Fornito

Bratislav Misic (for correspondence)
