Multimodal MRI marker of cognition explains the association between cognition and mental health in the UK Biobank

eLife Assessment

This valuable work advances our understanding of the relationship between multimodal magnetic resonance imaging (MRI) measures, cognition, and mental health. Compelling use of statistical learning techniques in UK Biobank data shows that 48% of the variance between an 11-task derived g-factor and imaging data can be explained. Overall, this paper contributes to the study of brain-behaviour relations and will be of interest for both its methods and its findings on how much variance in g can be explained.

https://doi.org/10.7554/eLife.108109.3.sa0

Significance of the findings:

Valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence:

Compelling: Evidence that features methods, data and analyses more rigorous than the current state-of-the-art

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Cognitive dysfunction often co-occurs with psychopathology. Advances in neuroimaging and machine learning have led to neural indicators that predict individual differences in cognition with reasonable performance. We examined whether these indicators explain the relationship between cognition and mental health in the UK Biobank (n>14,000). Using machine learning, we quantified the covariation between cognition and 133 mental health indices and derived neural indicators of cognition from 72 neuroimaging phenotypes across diffusion-weighted MRI (dwMRI), resting-state functional MRI (rsMRI), and structural MRI (sMRI). With commonality analyses, we investigated how much of the cognition–mental health covariation is captured by each indicator individually and by indicators combined within and across MRI modalities. The predictive association between mental health and cognition was r=0.3. Individual neuroimaging phenotypes captured 2.1 to 25.8% of the cognition–mental health covariation. Combining phenotypes within modalities improved the explanation to 25.5% for dwMRI, 29.8% for rsMRI, and 31.6% for sMRI, and combining them across modalities enhanced the explanation to 48%. We present an integrated approach to derive multimodal MRI markers of cognition that can be transdiagnostically linked to psychopathology, demonstrating that the predictive ability of neural indicators extends beyond the prediction of cognition itself, enabling us to capture cognition–mental health covariation.

Introduction

Cognition and mental health are closely intertwined (Iosifescu, 2012). Cognitive dysfunction is present in various mental illnesses, including anxiety (Gulpers et al., 2022; Nyberg et al., 2021), depression (Kriesche et al., 2023; Richardson and Adams, 2018; Wen et al., 2022), and psychotic disorders (Chavez-Baldini et al., 2023; Fusar-Poli et al., 2012; Guo et al., 2019; Lindgren et al., 2020; Mesholam-Gately et al., 2009; Semkovska et al., 2019). The National Institute of Mental Health’s Research Domain Criteria (RDoC) (Cuthbert and Insel, 2013; Insel et al., 2010) treat cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains, such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasises that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.

Recent efforts in brain Magnetic Resonance Imaging (MRI) and machine learning have led to predictive models that allow us to create MRI-based neural indicators of cognition with reasonable predictive performance (Krämer et al., 2024; Pat et al., 2022; Tetereva et al., 2022). These models are designed to predict cognition based on different cognitive tasks in unseen individuals who are not part of the modeling process (Marek et al., 2022; Zhi et al., 2024). Yet, the extent to which MRI-based neural indicators designed to predict cognition capture the same variance that mental health shares with cognition remains unknown. Demonstrating that MRI-based neural indicators of cognition capture the covariation between cognition and mental health will thereby support the utility of such indicators for understanding the etiology of mental health (Wang et al., 2025).

Different MRI modalities measure different aspects of the brain, and MRI quantification techniques capture different brain features, resulting in distinct neuroimaging phenotypes. This means there are numerous approaches to derive neural indicators of cognition from MRI data. For example, diffusion-weighted MRI (dwMRI) measures the shape and amount of water diffusion in various directions and tissue compartments (Alexander et al., 2007). Different dwMRI metrics, such as fractional anisotropy (FA), which quantifies the degree of water diffusion directionality, and the streamline count, which indirectly reflects structural connectedness between two regions (structural connectome), provide information about white matter orientation, density, and microstructural integrity (Basser et al., 1994; Soares et al., 2013; Zhang et al., 2012). Resting-state functional MRI (rsMRI) measures spontaneous low-frequency fluctuations in the Blood Oxygenation Level Dependent (BOLD) signal in the absence of a task, enabling the investigation of resting-state functional connectivity (RSFC) (Lee et al., 2013). RSFC from rsMRI can be estimated between pairs of parcellated grey matter regions (functional connectome) or between widespread networks derived from Independent Component Analysis (ICA). Structural MRI (sMRI) uses T1-weighted and T2-weighted imaging to quantify various aspects of brain anatomy and morphology. For example, the morphology of the cerebral cortex and white matter can be quantified by measuring grey or white matter thickness, volume, and area in regions defined by different atlases, whereas the characteristics of subcortical regions are conventionally quantified with volumes of subcortical nuclei and their subdivisions (Symms et al., 2004; Wattjes, 2011).
Previous studies using machine learning have shown that both (a) the choice of MRI modality and (b) the quantification method within each modality affect the performance of MRI-based models in capturing cognition (Dhamala et al., 2021; Pat et al., 2022; Tetereva et al., 2022). Dhamala and colleagues found that the predictive ability of structural and functional connectomes largely depends on the choice of atlases used to parcellate grey matter and how they were derived (Dhamala et al., 2021).

Given the heterogeneity of neuroimaging phenotypes from different MRI modalities, drawing information across them may boost the predictive ability of MRI-based neural indicators (Caunca et al., 2021). One way to integrate multiple neuroimaging phenotypes across MRI modalities is a stacking approach, which employs two levels of machine learning. First, researchers build a predictive model from each neuroimaging phenotype (e.g. cortical thickness from different grey matter parcellations) to predict a target variable (e.g. cognition). Next, in the stacking level, they use predicted values (i.e. cognition predicted from each neuroimaging phenotype) from the first level as features to predict the target variable (Pat et al., 2022). Previous studies show that integrating multimodal neuroimaging phenotypes into ‘stacked models’ enhances the prediction of cognition (Krämer et al., 2024; Pat et al., 2022; Rasero et al., 2021; Tetereva et al., 2022). Here, we aim to determine whether this improvement extends beyond the prediction of cognition itself, allowing us to capture more covariation between cognition and mental health.
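The two-level logic described above can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the simulated "phenotypes", the three-way split into level-1 training, level-2 training, and test participants, and the use of ordinary least squares at both levels (the study itself reports PLSR for individual phenotypes and compares several stacking algorithms). The helper names `solve`, `fit_ols`, `predict`, and `pearson_r` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares weights (intercept first) obtained from the normal equations."""
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(xs, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, rows):
    return [w[0] + sum(wi * v for wi, v in zip(w[1:], r)) for r in rows]


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Simulated data (assumption): three phenotypes, each with two noisy features of g.
rng = random.Random(0)
n, n_phenotypes, n_features = 600, 3, 2
g = [rng.gauss(0, 1) for _ in range(n)]
phenotypes = [
    [[0.5 * g[i] + rng.gauss(0, 1) for _ in range(n_features)] for i in range(n)]
    for _ in range(n_phenotypes)
]
level1, level2, test = range(0, 300), range(300, 450), range(450, 600)


def take(seq, idx):
    return [seq[i] for i in idx]


# Level 1: one model per phenotype, fitted on the level-1 training split only.
base = [fit_ols(take(x, level1), take(g, level1)) for x in phenotypes]


def base_predictions(idx):
    """One row per participant, one column per phenotype-specific prediction of g."""
    cols = [predict(w, take(x, idx)) for w, x in zip(base, phenotypes)]
    return [list(row) for row in zip(*cols)]


# Level 2: the stacker uses level-1 predictions as features, for participants
# that the level-1 models have not seen.
stacker = fit_ols(base_predictions(level2), take(g, level2))

# Evaluation on held-out participants.
g_test = take(g, test)
single_r = [pearson_r(col, g_test) for col in zip(*base_predictions(test))]
stacked_r = pearson_r(predict(stacker, base_predictions(test)), g_test)
print("single-phenotype r:", [round(r, 2) for r in single_r])
print("stacked r:", round(stacked_r, 2))
```

Fitting the second level on participants that were not used at the first level is one common way to keep the stacker from learning on overfitted first-level predictions; other schemes, such as out-of-fold predictions, would serve the same purpose.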

Using the largest population-level neuroimaging dataset, the UK Biobank, we investigated (a) which neuroimaging phenotypes yield a neural indicator of cognition that explains the relationship between cognition and mental health the most, and (b) whether combining neuroimaging phenotypes within and across MRI modalities enhances the explanation of this relationship. We started by deriving a general cognition factor, or the g-factor, from twelve cognitive scores from different tasks. The g-factor underlies variability across cognitive domains and reflects overall cognition (Jensen, 2000; Panizzon et al., 2014). Next, we applied machine learning to predict the g-factor from 133 mental health indices and 72 neuroimaging phenotypes in unseen participants. For neuroimaging, we created predictive models from both individual neuroimaging phenotypes and phenotypes combined within and across three MRI modalities via stacking. Finally, we conducted commonality analyses (Nimon et al., 2008) to quantify how much neural indicators of cognition based on different neuroimaging phenotypes contribute to explaining the relationship between cognition and mental health.

Results

g-factor modeling

To model the g-factor, we split the data into five outer folds, each comprising training (80% of the data) and test (20% of the data) sets (see Table 1 for sample characteristics and the Data analysis section).

Table 1

Characteristics of the train and test sets used to build the g-factor.

Fold	Number, Train	Number, Test	Age, Mean (SD), Train	Age, Mean (SD), Test	Females (%), Train	Females (%), Test
1	25290	6323	64.52 (7.64)	64.48 (7.74)	51.39	51.16
2	25291	6322	64.53 (7.66)	64.44 (7.62)	51.19	51.96
3	25290	6323	64.53 (7.66)	64.45 (7.65)	51.38	51.19
4	25290	6323	64.50 (7.66)	64.56 (7.65)	51.27	51.64
5	25291	6322	64.48 (7.67)	64.63 (7.61)	51.49	50.78

SD, standard deviation.
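The fold sizes in Table 1 follow from dividing 31,613 participants (train plus test in every row) into five disjoint test sets. The sketch below is only an illustration of such a split: the shuffling, the seed, and the way participants are dealt into folds are assumptions, so the order of fold sizes need not match the table.

```python
import random

N_PARTICIPANTS = 31613  # train + test size in every row of Table 1
N_FOLDS = 5

rng = random.Random(0)  # hypothetical seed, used only for reproducibility
indices = list(range(N_PARTICIPANTS))
rng.shuffle(indices)

# Deal participants into five disjoint test sets of near-equal size.
test_sets = [indices[k::N_FOLDS] for k in range(N_FOLDS)]

folds = []
for k, test_idx in enumerate(test_sets):
    held_out = set(test_idx)
    # The training set of a fold is everyone outside that fold's test set.
    train_idx = [i for i in indices if i not in held_out]
    folds.append((train_idx, test_idx))
    print(f"fold {k + 1}: train={len(train_idx)}, test={len(test_idx)}")
```

Each participant appears in exactly one test set, so every fold trains on roughly 80% of the sample and is evaluated on the remaining 20%.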

In each fold, data factorability, assessed using Kaiser-Meyer-Olkin statistics (KMO >0.87) and Bartlett’s test of sphericity (p<0.05), indicated good suitability for factor analysis. Parallel analysis suggested that four factors were sufficient to explain the latent structure of the dataset (Figure 1). Exploratory structural equation modeling (ESEM) within a confirmatory factor analysis (CFA) further supported the adequacy and construct validity of the resulting factor structure (Figure 2a, Figure 2—figure supplement 1, Table 2).

Figure 1

Experimental design.

(a) UK Biobank variables: cognitive tests, mental health, and neuroimaging phenotypes from three Magnetic Resonance Imaging (MRI) modalities. (b) Derivation of the g-factor from cognitive performance scores with Exploratory Structural Equation Modeling (ESEM) and prediction of the g-factor from mental health indices using Partial Least Squares Regression (PLSR). (c) Scheme of the machine learning model (PLSR) with nested cross-validation. (d) Scheme of the two-level predictive modeling and commonality analyses. First, individual neuroimaging phenotypes from diffusion-weighted MRI (dwMRI) (42 phenotypes), resting-state functional MRI (rsMRI) (10 phenotypes), and structural MRI (sMRI) (20 phenotypes) are used as features to predict the g-factor. Then, g-factor values predicted from distinct neuroimaging phenotypes are combined within each modality (‘dwMRI Stacked,’ ‘rsMRI Stacked,’ and ‘sMRI Stacked’) as well as across all modalities (‘All MRI Stacked’) and used as features, resulting in one predicted value per subject per stacked model (ĝ_dwMRI, ĝ_rsMRI, ĝ_sMRI, and ĝ_allMRI). Finally, values predicted from MRI data together with the values predicted from mental health indices (ĝ_MH) are used as independent explanatory variables in commonality analyses.
*X_D-scores*, *X_R-scores*, and *X_s-scores*, weighted linear combinations of the original features (dwMRI, rsMRI, and sMRI neuroimaging phenotypes, respectively) in PLSR; WM, white matter; *TBSS*, tract-based spatial statistics; *ACT*, anatomically-constrained tractography; *iFOD2*, second-order Integration over Fiber Orientation Distributions; FA, fractional anisotropy; MD, mean diffusivity; MO, diffusion tensor mode; L1, L2, L3, eigenvalues of the diffusion tensor; *ICVF*, intracellular volume fraction; OD, orientation dispersion index; *ISOVF*, isotropic volume fraction; *BOLD*, blood oxygenation level dependent signal; *ICA*, independent component analysis; GM, grey matter; *CSF*, cerebrospinal fluid; F1, F2, F3, and F4, latent factors from ESEM; *ESEM loadings*, loadings of the test scores onto the latent factors and loadings of the latent factors onto the g-factor; *X-loadings* and *Y-loadings*, loadings of the predictor (mental health measures; X) and target (g-factor; Y) variables, respectively, onto the PLSR components; *X_M-scores* and *Y-scores*, the weighted linear combinations of the original predictor (mental health measures) and target (g-factor) variables, respectively; ĝ_MH, values of the g-factor predicted from mental health features; ĝ, predicted values of the g-factor; r, Pearson r (between original and predicted values of the g-factor); R², coefficient of determination (between original and predicted values of the g-factor); *MAE*, mean absolute error; CV, cross-validation.

Figure 2 with 1 supplement

g-factor modeling and the relationship between the g-factor and mental health features.

In our analysis, we derived the g-factor and built machine learning models in each outer fold separately. For visualization purposes, we display the results of the (a) Exploratory Structural Equation Modeling (ESEM) performed on a sample of 31,614 participants and (b) Partial Least Squares Regression (PLSR) model for the g-factor and mental health built on a sample of 21,077 participants with a single train/test split (80%/20%) as representations of ESEM and PLSR model structures across five folds. (a) Loadings of the cognitive test scores onto four latent factors and loadings of the latent factors onto the g-factor based on the ESEM. (b) Scatterplot of the observed g-factor and g-factor predicted from mental health indices with PLSR. Out-of-sample predictive performance of the PLSR model is evaluated with Pearson r and R² averaged across five folds. (c) Pearson correlations between the g-factor and mental health indices. Mental health features are grouped into categories. Within each category, the feature with the highest absolute value of Pearson r is annotated. Pale and bright dots represent non-significant and significant correlations, respectively. (d) Loadings of mental health indices in the PLSR model showing the relationships between the features (mental health) and the target variable (g-factor). The loadings are averaged across all PLSR components and weighted by R² in the training set. Mental health features are grouped into categories. Within each category, the feature with the highest absolute value of the loading is annotated. SD, standard deviation (mean across five folds).

Figure 2—source data 1 Source data containing factor loadings, scatterplot data, PLSR results, and the cognitive test correlation matrix underlying all panels of Figure 2.: https://cdn.elifesciences.org/articles/108109/elife-108109-fig2-data1-v1.xlsx

Table 2

Goodness-of-fit indices for the hierarchical g-factor model across five folds.

Fold	χ²	p-value for χ²	df	CFI	TLI	BIC	RMSEA	SRMR
1	1812.236	<0.001	30	0.969	0.933	805169.905	0.048	0.026
2	892.577	<0.001	30	0.985	0.967	804456.322	0.034	0.016
3	805.374	<0.001	30	0.987	0.971	804114.165	0.032	0.017
4	1335.887	<0.001	30	0.978	0.951	804537.156	0.041	0.019
5	1926.098	<0.001	30	0.967	0.928	805461.947	0.050	0.027

χ², Chi-square test statistic; df, degrees of freedom; CFI, Comparative Fit Index; TLI, Tucker-Lewis Index; BIC, Bayesian Information Criteria; RMSEA, Root Mean Square Error of Approximation; SRMR, Standardised Root Mean Square Residual.

For each fold, the Comparative Fit Index (CFI) was > 0.96, the Tucker-Lewis Index (TLI) > 0.92, the Root Mean Square Error of Approximation (RMSEA) ≤ 0.05, and the Standardized Root Mean Square Residual (SRMR) < 0.03, indicating good model fit (Bentler, 1990; Hu and Bentler, 1999; Xia and Yang, 2019). Four latent factors captured approximately 27% of the covariance structure of cognitive tests, and the g-factor accounted for 39% of the variance in cognitive scores (Supplementary file 1).
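As a consistency check, the RMSEA values in Table 2 can be recomputed from the reported χ², the degrees of freedom, and the sample size. The sketch below assumes that the model in each fold was fitted to that fold's training set (N taken from Table 1) and uses the common point-estimate formula with N − 1 in the denominator; some software uses N instead, which does not change the rounded values here.

```python
import math

# (chi-square, training-set N) per fold, copied from Table 2 and Table 1.
fold_stats = [
    (1812.236, 25290),
    (892.577, 25291),
    (805.374, 25290),
    (1335.887, 25290),
    (1926.098, 25291),
]
DF = 30  # degrees of freedom reported in Table 2
reported_rmsea = [0.048, 0.034, 0.032, 0.041, 0.050]


def rmsea(chi2, df, n):
    """Point estimate of RMSEA, assuming the (N - 1) convention."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


computed = [rmsea(chi2, DF, n) for chi2, n in fold_stats]
for k, (value, reported) in enumerate(zip(computed, reported_rmsea), start=1):
    print(f"fold {k}: computed={value:.3f}, reported={reported:.3f}")
```

Under these assumptions, the recomputed values agree with the table to three decimals, which is consistent with the fit indices having been obtained on the training sets.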

Predictive modeling

Mental health

On average, information about mental health predicted the g-factor at R²_mean = 0.10 and r_mean = 0.31 (95% CI [0.291, 0.315]; Figure 2b; Table 3).

Table 3

Out-of-sample predictive performance of mental health features in the Partial Least Squares Regression (PLSR) model across five folds.

	MSE	MAE	R²	r	p-value
Fold 1	0.425	0.521	0.142	0.377	<0.01
Fold 2	0.621	0.624	0.061	0.25	<0.01
Fold 3	0.696	0.667	0.059	0.244	<0.01
Fold 4	0.448	0.531	0.12	0.347	<0.01
Fold 5	0.448	0.527	0.114	0.34	<0.01
Mean performance:	0.53	0.57	0.10	0.31

MSE, mean squared error; MAE, mean absolute error; R², coefficient of determination; r, Pearson r.
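The four metrics in Table 3 can be sketched as follows. The function name `regression_metrics` and the toy example are hypothetical; the definition of R² as one minus the ratio of residual to total sum of squares is the usual convention for out-of-sample evaluation and is assumed here. Under that definition, R² penalises biased predictions while Pearson r does not, so R² need not equal r². The last lines average the fold-level values copied from Table 3.

```python
def regression_metrics(y_true, y_pred):
    """MSE, MAE, R² (1 - SSE/SST), and Pearson r for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_true) ** 2 for t in y_true)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return {
        "MSE": sse / n,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "R2": 1 - sse / sst,
        "r": cov / (sst * var_pred) ** 0.5,
    }


# Toy case: predictions are perfectly correlated with the target but shifted by +1.
toy = regression_metrics([1, 2, 3, 4], [2, 3, 4, 5])
print(toy)

# Fold-level values copied from Table 3: (MSE, MAE, R², r).
table3 = [
    (0.425, 0.521, 0.142, 0.377),
    (0.621, 0.624, 0.061, 0.250),
    (0.696, 0.667, 0.059, 0.244),
    (0.448, 0.531, 0.120, 0.347),
    (0.448, 0.527, 0.114, 0.340),
]
means = [sum(col) / len(col) for col in zip(*table3)]
print([round(m, 2) for m in means])
```

In the toy case the correlation is perfect while R² is far lower, which illustrates why the table reports both. The averaged fold values reproduce the "Mean performance" row after rounding to two decimals.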

The magnitude and direction of factor loadings for mental health in the PLSR model allowed us to quantify the contribution of individual mental health indices to cognition. Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective wellbeing, and negative traumatic events relate negatively to cognition.

Brain MRI

The predictive performance of neuroimaging phenotypes varied from low to moderate. At the modality level, rsMRI and dwMRI showed the highest and lowest performance, respectively (Figure 3). On average, neuroimaging phenotypes stacked within dwMRI, rsMRI, and sMRI predicted the g-factor at R²_mean = 0.073, 0.105, and 0.095, and r_mean = 0.27 (95% CI [0.252, 0.273]), 0.33 (95% CI [0.308, 0.331]), and 0.3 (95% CI [0.284, 0.307]), respectively. Stacking all 72 neuroimaging phenotypes boosted the predictive performance of the MRI-based model for cognition, yielding R²_mean = 0.159 and r_mean = 0.398 (95% CI [0.379, 0.402]; Figure 4). The best algorithm for the stacked model was XGBoost. We outline results for each neuroimaging phenotype in Supplementary file 2 and for each MRI modality and each stacking algorithm in Table 4.
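The 95% confidence intervals reported for Pearson r are based on bootstrap distributions. A minimal sketch of one way to obtain such an interval is given below. The simulated observed and predicted values, the number of resamples, and the percentile method are all assumptions made for illustration and are not taken from the study.

```python
import random


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(0)  # hypothetical seed
n = 500

# Simulated observed and predicted g-factor values (assumption).
observed = [rng.gauss(0, 1) for _ in range(n)]
predicted = [0.4 * o + rng.gauss(0, 1) for o in observed]
r_obs = pearson_r(observed, predicted)

# Resample participants with replacement and recompute r each time.
n_boot = 1000
boot = []
for _ in range(n_boot):
    idx = [rng.randrange(n) for _ in range(n)]
    boot.append(pearson_r([observed[i] for i in idx], [predicted[i] for i in idx]))
boot.sort()

# Percentile interval: the 2.5th and 97.5th percentiles of the bootstrap distribution.
ci_low = boot[n_boot * 25 // 1000]
ci_high = boot[n_boot * 975 // 1000 - 1]
print(f"r = {r_obs:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

Resampling whole participants keeps each observed value paired with its prediction, so the interval reflects sampling variability in the test set rather than variability in model fitting.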

Figure 3

Predictive performance of machine learning models based on 72 individual neuroimaging phenotypes.

Bootstrap distribution of Pearson r between the g-factor derived from Exploratory structural equation modeling (ESEM) and the g-factor predicted from each neuroimaging phenotype, and corresponding 95% confidence intervals (95% CI). Values at the top and bottom of the plots indicate the lower and upper 95% CI for the bootstrap Pearson r.

Figure 3—source data 1 Source data containing the bootstrapped predictive performance metrics of machine learning models based on dwMRI, rsMRI, and sMRI neuroimaging phenotypes.: https://cdn.elifesciences.org/articles/108109/elife-108109-fig3-data1-v1.xlsx

Figure 4

Predictive performance of machine learning models based on neuroimaging phenotypes stacked within and across three Magnetic Resonance Imaging (MRI) modalities.

Bootstrap distribution of Pearson r between the g-factor derived from Exploratory structural equation modeling (ESEM) and the g-factor predicted from neuroimaging phenotypes stacked within diffusion-weighted MRI (dwMRI), resting-state functional MRI (rsMRI), structural MRI (sMRI), and across all MRI modalities. Values at the top of each plot mark the median Pearson r.

Figure 4—source data 1 Source data containing the bootstrapped predictive performance metrics of machine learning models based on neuroimaging phenotypes stacked within and across the three MRI modalities.: https://cdn.elifesciences.org/articles/108109/elife-108109-fig4-data1-v1.xlsx

Table 4

Mean (averaged across five folds) out-of-sample predictive performance of Magnetic Resonance Imaging (MRI) modalities stacked using four machine learning algorithms.

	Algorithm	R²	r	MSE	MAE
dwMRI	ElasticNet	0.027	0.227	0.97	0.782
	**Random Forest**	**0.073**	0.265	0.924	0.764
	Support Vector Regression	0.036	0.247	0.961	0.777
	XGBoost	0.061	0.26	0.936	0.768
rsMRI	ElasticNet	0.100	0.325	0.897	0.752
	**Random Forest**	**0.105**	0.325	0.891	0.75
	Support Vector Regression	0.101	0.327	0.896	0.751
	XGBoost	0.102	0.326	0.895	0.751
sMRI	ElasticNet	0.094	0.294	0.903	0.755
	Random Forest	0.093	0.293	0.904	0.755
	**Support Vector Regression**	**0.095**	0.298	0.902	0.753
	XGBoost	0.095	0.296	0.902	0.754
All MRI modalities	ElasticNet	0.131	0.374	0.866	0.738
	Random Forest	0.152	0.383	0.845	0.729
	Support Vector Regression	0.139	0.383	0.859	0.734
	**XGBoost**	**0.159**	0.398	0.838	0.726

R², coefficient of determination; r, Pearson r; MSE, mean squared error; MAE, mean absolute error; dwMRI, diffusion-weighted MRI; rsMRI, resting-state MRI; sMRI, T1-weighted and T2-weighted structural MRI. The algorithms that yielded the highest R² are highlighted in bold.

dwMRI

Overall, models based on structural connectivity metrics performed better than TBSS and probabilistic tractography (Figure 3). TBSS, in turn, performed better than probabilistic tractography (Figure 3 and Supplementary file 2). The number of streamlines connecting brain areas parcellated with aparc MSA-I had the best predictive performance among all dwMRI neuroimaging phenotypes (R²_mean = 0.052; r_mean = 0.227, 95% CI [0.212, 0.235]). To identify features driving predictions, we correlated streamline counts in the aparc MSA-I parcellation with the predicted g-factor values from the PLSR model. Positive associations with the predicted g-factor were strongest for left superior parietal-left caudal anterior cingulate, left caudate-right amygdala, and left putamen-left hippocampus connections. The most marked negative correlations involved left putamen-right posterior thalamus and right pars opercularis-right caudal anterior cingulate pathways (Figure 5, Figure 5—figure supplement 1).
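The feature-importance step described above, correlating each feature with the values predicted by the model, can be sketched on simulated data. The three features, their correlation structure, and the linear "model" with fixed weights are hypothetical and stand in for the streamline counts and the fitted PLSR model.

```python
import random


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(0)  # hypothetical seed
n = 1000
weights = [1.0, -0.8, 0.0]  # hypothetical model weights for three features

features = []
for _ in range(n):
    x0 = rng.gauss(0, 1)
    x1 = rng.gauss(0, 1)
    # The third feature gets zero weight but is correlated with the first one.
    x2 = 0.7 * x0 + rng.gauss(0, 0.51 ** 0.5)
    features.append([x0, x1, x2])

# Predicted target values from the (hypothetical) linear model.
predicted = [sum(w * x for w, x in zip(weights, row)) for row in features]

# Importance of a feature = Pearson correlation between it and the predictions.
importance = [pearson_r(col, predicted) for col in zip(*features)]
for j, (w, imp) in enumerate(zip(weights, importance)):
    print(f"feature {j}: weight={w:+.1f}, correlation with prediction={imp:+.2f}")
```

In this sketch the third feature has a clearly positive correlation with the predictions despite its zero weight, because it shares variance with an informative feature. This is the reason correlations with predicted values, rather than raw model weights, are used when interpreting which features relate to the predicted g-factor.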

Figure 5 with 3 supplements

Feature importance maps for neuroimaging features with the highest predictive performance for cognition derived via the Haufe transformation.

The color of the lines (resting-state functional MRI, rsMRI, and diffusion-weighted MRI, dwMRI) and subcortical structures (sMRI) indicates the magnitude and direction of Pearson correlations between the predicted g-factor and features from the top-performing neuroimaging phenotype. Correlations were computed in test sets pooled across five outer folds. **rsMRI**: A connectogram displays network-level feature importance for functional connectivity between 55 neuronally driven independent components (IC) grouped into seven networks (Thomas Yeo et al., 2011 parcellation). Full correlation matrices rather than tangent space parameterization were used for interpretability. The IC with the highest activation within each network is highlighted in color, and its corresponding functional connectivity map is overlaid. **dwMRI**: The importance of structural connections (streamline count) between brain regions parcellated using the aparc (Desikan-Killiany) MSA-I atlases for predicting cognition is shown as a glass brain plot, with cortical/subcortical nodes (circles) and their connecting edges (lines) colored by correlation direction and strength. **sMRI**: Regional volumes of subcortical structures derived from FreeSurfer subcortical volumetric subsegmentation are overlaid on a glass brain. Values of Pearson r for the top correlations are illustrated in Figure 5—figure supplements 1–3.

Figure 5—source data 1 Source data containing feature importance metrics for dwMRI structural connectivity (streamline counts), rsMRI functional connectivity (IC correlations), and sMRI subcortical volumes.: https://cdn.elifesciences.org/articles/108109/elife-108109-fig5-data1-v1.xlsx

The mean length of the streamlines connecting nodes from the Schaefer atlas for 500 cortical areas combined with MSA-IV had the lowest performance among all structural connectivity metrics (R²_mean = 0.018; r_mean = 0.145, 95% CI [0.132, 0.156]; Figure 3). Among dwMRI imaging-derived phenotypes (IDPs), eigenvalue L2 from TBSS had the best predictive performance (R²_mean = 0.045; r_mean = 0.207, 95% CI [0.194, 0.216]), and MO and OD derived with probabilistic tractography were least predictive of the g-factor (R²_mean = 0.006; r_mean = 0.076, 95% CI [0.065, 0.087] and R²_mean = 0.01; r_mean = 0.099, 95% CI [0.085, 0.109], respectively).

Stacking g-factor values predicted from all dwMRI neuroimaging phenotypes improved the predictive performance of dwMRI to R²_mean = 0.073 and r_mean = 0.265 (95% CI [0.252, 0.273]; Figure 4 and Table 4). The best algorithm for the stacked model was Random Forest.

rsMRI

Among RSFC metrics for 55 and 21 ICs, tangent parameterization matrices yielded the highest performance in the training set compared to full and partial correlation, as indicated by the cross-validation score. Functional connections between the limbic (IC10) and dorsal attention (IC18) networks, as well as between the ventral attention (IC15) and default mode (IC11) networks, displayed the highest positive association with cognition. In contrast, the highest negative associations with cognition were observed for functional connectivity between the limbic (IC43, the IC with the highest activation within the network) and default mode (IC11) networks, between the limbic (IC45) and frontoparietal (IC40) networks, between the dorsal attention (IC18) and frontoparietal (IC25) networks, and between the ventral attention (IC15) and frontoparietal (IC40) networks (Figure 5, Figure 5—figure supplement 2).

Among RSFC metrics for parcellated time series data, full correlation matrices performed best in the training set. Overall, RSFC between 55 ICs quantified with tangent space parameterization had the highest predictive performance (R²_mean = 0.088, r_mean = 0.3, 95% CI [0.284, 0.307]), followed by RSFC between 200 cortical and 16 subcortical regions (Schaefer cortical atlas + MSA-I) measured with full correlation (R²_mean = 0.07, r_mean = 0.27, 95% CI [0.255, 0.278]). The predictive performance of the amplitudes of 21 and 55 ICs was the lowest (R²_mean = 0.013, r_mean = 0.11, 95% CI [0.098, 0.122] and R²_mean = 0.019, r_mean = 0.14, 95% CI [0.125, 0.149], respectively; Supplementary file 2).

Stacking g-factor values predicted from rsMRI neuroimaging phenotypes considerably improved the predictive performance of rsMRI to R²_mean = 0.105 and r_mean = 0.325 (95% CI [0.308, 0.331]; Figure 4 and Table 4). Similar to dwMRI, the best algorithm in the stacked model was Random Forest.

sMRI

FreeSurfer subcortical volumetric subsegmentation and ASEG had the highest performance among all sMRI neuroimaging phenotypes (R²_mean = 0.068; r_mean = 0.244, 95% CI [0.237, 0.259] and R²_mean = 0.059; r_mean = 0.235, 95% CI [0.221, 0.243], respectively). In FreeSurfer subcortical volumetric subsegmentation, volumes of all subcortical structures, except for left and right hippocampal fissures, showed positive associations with cognition. The strongest relations were observed for the volumes of the bilateral whole hippocampal head and whole hippocampus (Figure 5, Figure 5—figure supplement 3). Gray matter morphological characteristics from ex-vivo Brodmann Area Maps showed the lowest predictive performance (R²_mean = 0.008, r_mean = 0.089, 95% CI [0.075, 0.098]; Figure 3 and Supplementary file 2).

Stacking g-factor values predicted from all sMRI neuroimaging phenotypes improved the model’s predictive performance to R²_mean = 0.095 and r_mean = 0.298 (95% CI [0.284, 0.307]). The best algorithm in the stacked model was SVR (Figure 4 and Table 4).

Commonality analysis

Different neuroimaging phenotypes captured the relationship between cognition and mental health to varying degrees, as indicated by the ratio, expressed as a percentage, of the common effect of mental health-g and neuroimaging-g to the total effect of mental health-g. Neuroimaging phenotypes from dwMRI, rsMRI, and sMRI accounted for 2.1–19.3%, 4–25.8%, and 4.8–21.8% of the cognition–mental health relationship, respectively (Figure 6). Among dwMRI neuroimaging phenotypes, the number of streamlines connecting gray matter regions from Destrieux (aparc.a2009s) cortical + MSA-I subcortical parcellations shared the highest proportion of variance with mental health-g (19.3%). For rsMRI, the largest proportion of the common effect of mental health-g was shared with RSFC between 55 ICs (25.8%). For sMRI, subcortical volumetric subsegmentation contributed most to the link between cognition and mental health (21.8%). The correlation between the performance of each neuroimaging phenotype in predicting cognition and the proportion of the relationship between cognition and mental health captured by the phenotype was r=0.97 (95% CI [0.958, 0.982]; Figure 7).
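For two explanatory variables, the commonality decomposition behind this percentage can be sketched as follows. The simulated variables `g`, `g_mh`, and `g_mri` (standing in for the observed g-factor and the values predicted from mental health and from neuroimaging) are hypothetical, and R² for the joint model is obtained from pairwise correlations, which is equivalent to an ordinary least-squares regression with an intercept on two predictors.

```python
import random


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def r2_two_predictors(r_y1, r_y2, r_12):
    """R² of an OLS regression of y on two predictors, from pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Simulated data (assumption): all three variables share one latent source.
rng = random.Random(0)
n = 2000
shared = [rng.gauss(0, 1) for _ in range(n)]
g = [s + rng.gauss(0, 1) for s in shared]      # observed g-factor
g_mh = [s + rng.gauss(0, 1) for s in shared]   # g predicted from mental health
g_mri = [s + rng.gauss(0, 1) for s in shared]  # g predicted from neuroimaging

r_mh, r_mri = pearson_r(g, g_mh), pearson_r(g, g_mri)
r2_mh, r2_mri = r_mh ** 2, r_mri ** 2
r2_both = r2_two_predictors(r_mh, r_mri, pearson_r(g_mh, g_mri))

# Commonality decomposition for two explanatory variables.
unique_mh = r2_both - r2_mri
unique_mri = r2_both - r2_mh
common = r2_mh + r2_mri - r2_both

# Share of the mental health effect that is also captured by neuroimaging.
pct_captured = 100 * common / r2_mh
print(f"unique MH={unique_mh:.3f}, unique MRI={unique_mri:.3f}, common={common:.3f}")
print(f"captured by MRI: {pct_captured:.1f}% of the cognition-mental health relationship")
```

The total effect of mental health-g equals its unique effect plus the common effect, so the percentage is the fraction of variance explained by mental health that neuroimaging explains as well.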

Figure 6

Results of commonality analyses: the contribution of neuroimaging phenotypes to the relationship between cognition and mental health.

Stacked bar plot diagrams of the results of commonality analyses for each neuroimaging phenotype. *Unique variance*, proportion (%) of variance in the g-factor explained uniquely by mental health and neuroimaging phenotypes; *Common variance*, proportion (%) of variance in the g-factor shared between mental health and neuroimaging phenotypes; *aparc.a2009s*, Destrieux cortical atlas; *Schaefer7n200p*, Schaefer Atlas for seven networks and 200 parcels; *Schaefer7n500p*, Schaefer Atlas for seven networks and 500 parcels; I, Melbourne Subcortical Atlas I; IV, Melbourne Subcortical Atlas IV; *SIFT2*, Spherical-Deconvolution Informed Filtering of Tractograms 2; FA, fractional anisotropy; MD, mean diffusivity; MO, diffusion tensor mode; *L1, L2, and L3*, eigenvalues of the diffusion tensor; OD, orientation dispersion index; *ICVF*, intracellular volume fraction; *ISOVF*, isotropic volume fraction; *TBSS*, Tract-Based Spatial Statistics; *Func. Connectivity*, functional connectivity; *Subseg.*, FreeSurfer subsegmentation; *ASEG*, FreeSurfer automated subcortical volumetric segmentation; *FSL FAST*, FSL FMRIB’s Automated Segmentation Tool; WM, white matter; GM, gray matter; *FSL FIRST*, FMRIB’s Integrated Registration and Segmentation Tool; *DKT*, Desikan-Killiany-Tourville; BA, FreeSurfer ex-vivo Brodmann Area Maps.

Figure 6—source data 1 Source data containing the results of commonality analyses quantifying the contribution of dwMRI, rsMRI, and sMRI neuroimaging phenotypes to the relationship between cognition and mental health.: https://cdn.elifesciences.org/articles/108109/elife-108109-fig6-data1-v1.xlsx

Figure 7

Scatterplot of the relationship between the Partial Least Squares Regression (PLSR) performance of individual neuroimaging phenotypes and the proportion of the cognition–mental health relationship they capture.

Figure 7—source data 1 Source data containing the PLSR performance of individual neuroimaging phenotypes and the proportion of the cognition–mental health relationship captured by these phenotypes.: https://cdn.elifesciences.org/articles/108109/elife-108109-fig7-data1-v1.xlsx

When we stacked neuroimaging phenotypes within dwMRI, rsMRI, and sMRI, we captured 25.5%, 29.8%, and 31.6% of the predictive relationship between cognition and mental health, respectively. By stacking all 72 neuroimaging phenotypes across three MRI modalities, we enhanced the explanation to 48% (Figure 8e–h).
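As a rough illustration of how a percentage of this kind can be obtained, the sketch below runs a two-predictor commonality analysis on synthetic data. It assumes that mental health and neuroimaging are each summarised by a single predicted g-factor score, and that the proportion of the cognition–mental health relationship captured by neuroimaging is the common variance divided by the total variance explained by mental health. All names and simulated values are hypothetical and are not taken from the study's code or data.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def commonality(y, pred_mh, pred_mri):
    """Two-predictor commonality analysis based on R^2 values.

    y        : observed outcome (here, a simulated g-factor)
    pred_mh  : score predicted from mental health (hypothetical)
    pred_mri : score predicted from neuroimaging (hypothetical)
    """
    r_mh = pearson(y, pred_mh)
    r_mri = pearson(y, pred_mri)
    r_12 = pearson(pred_mh, pred_mri)
    r2_mh, r2_mri = r_mh ** 2, r_mri ** 2
    # Closed-form R^2 of an ordinary least squares model with two predictors
    r2_both = (r2_mh + r2_mri - 2 * r_mh * r_mri * r_12) / (1 - r_12 ** 2)
    common = r2_mh + r2_mri - r2_both
    return {
        "r2_mh": r2_mh,
        "r2_both": r2_both,
        "unique_mh": r2_both - r2_mri,   # explained by mental health only
        "unique_mri": r2_both - r2_mh,   # explained by neuroimaging only
        "common": common,                # shared by both predictors
        # Assumed definition: share of the mental-health-explained variance
        # that is also captured by neuroimaging, in percent
        "pct_captured": 100 * common / r2_mh,
    }


# Synthetic, illustrative data only (not UK Biobank data)
random.seed(0)
n = 2000
shared = [random.gauss(0, 1) for _ in range(n)]
pred_mh = [s + random.gauss(0, 1) for s in shared]
pred_mri = [s + random.gauss(0, 1) for s in shared]
g = [0.3 * a + 0.4 * b + random.gauss(0, 1) for a, b in zip(pred_mh, pred_mri)]

result = commonality(g, pred_mh, pred_mri)
print({name: round(value, 3) for name, value in result.items()})
```

By construction, the unique mental health component and the common component sum to the variance explained by mental health alone, which is why their ratio can be read as a proportion of the cognition–mental health relationship.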

Figure 8


The contribution of neuroimaging phenotypes stacked within each and across all Magnetic Resonance Imaging (MRI) modalities to the relationship between cognition and mental health: Results of predictive modeling and commonality analyses.

(a–d) Distributions of the g-factor values derived from cognitive tests via Exploratory structural equation modeling (ESEM) and predicted from neuroimaging phenotypes stacked within diffusion-weighted MRI (dwMRI) (a), resting-state functional MRI (rsMRI) (b), structural MRI (sMRI) (c), and across all MRI modalities (d). (e–l) Venn diagrams of the results of commonality analyses. (e–h) The proportion (%) of variance in the g-factor explained uniquely by mental health and neuroimaging phenotypes stacked within dwMRI (e), rsMRI (f), sMRI (g), and across all MRI modalities (h), as well as the common effects between mental health and MRI modalities. (i–l) The proportion (%) of variance in the g-factor explained uniquely by mental health, neuroimaging phenotypes stacked within dwMRI (i), rsMRI (j), sMRI (k), and across all MRI modalities (l), and age and sex, as well as the common effects among all explanatory variables.

Figure 8—source data 1 Source data containing the distributions of the observed and predicted g-factors across MRI modalities and the estimates of unique and shared contributions of mental health, neuroimaging phenotypes, and age and sex to the variance in the g-factor.: https://cdn.elifesciences.org/articles/108109/elife-108109-fig8-data1-v1.xlsx

Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship. The multimodal neural marker of cognition based on three MRI modalities (‘All MRI Stacked’) explained 72% of this age- and sex-related variance (Figure 8i–l).

Discussion

Our study is the first to quantify the contribution of the neural indicators of cognition, as reflected by the g-factor, to its relationship with mental health in the largest population-level cohort. We show that the performance of each neural indicator in predicting cognition per se is strongly related to its ability to explain the link between cognition and mental health. In other words, the ‘robustness’ of the neural indicator of cognition is associated with its capacity to capture the covariation between cognition and mental health. This means that neural indicators that capture more of the individual differences in cognition also capture more of the cognition–mental health covariation. We further demonstrate that information from 72 neuroimaging phenotypes from three MRI modalities accounts for almost half of the variance in the relationship between cognition and mental health. Aspects of the brain that underpin this relationship are best reflected in neuroimaging phenotypes derived from rsMRI, followed by sMRI and dwMRI. For rsMRI, RSFC between 55 large-scale networks measured with tangent space parameterization was the best-performing neuroimaging phenotype. Among sMRI neuroimaging phenotypes, subcortical grey matter characteristics quantified using FreeSurfer’s subcortical volumetric subsegmentation and ASEG shared the highest proportion of variance that mental health captures. For dwMRI, microstructural properties of the brain’s white matter that underlie the link between cognition and mental health were best reflected in the number of streamlines connecting grey matter regions from Destrieux cortical + MSA I subcortical parcellations. Integrating information from neuroimaging phenotypes within and across MRI modalities enhanced both the prediction of cognition and the explanation of the relationship between cognition and mental health. We discuss the results in the context of current knowledge in the field, as follows.

The cognition and mental health relationship

Our analysis confirmed the validity of the g-factor as a quantitative measure of cognition (Jensen, 2000), demonstrating that it captures 39% of the variance across twelve cognitive performance scores, consistent with prior studies (Canivez and Watkins, 2010; Dombrowski et al., 2018; Dubois et al., 2018; Galsworthy et al., 2005; Gignac and Bates, 2017; Gignac and Szodorai, 2024). Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature (Abramovitch et al., 2021; Wang et al., 2025). Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children (Wang et al., 2025). Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r=0.32 (Brydges, 2019). While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.

The directions of PLSR loadings were broadly consistent with univariate correlations. PLSR extends beyond univariate approaches by modeling multivariate relationships across features and outcomes. Consistently, both univariate correlations and factor loadings derived from the PLSR model indicated that scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective wellbeing, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition (Karpinski et al., 2018). On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help (Ogueji and Okoloba, 2022). Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course (Ogueji and Okoloba, 2022; Quintana-Orts et al., 2020). Some studies have found a positive correlation between cognitive abilities and the risk of non-suicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status (Bittár et al., 2020; Mars et al., 2014). In our study, the magnitude of the association between self-harm behaviours and cognition was low (Figure 2), indicating a weak relationship.

Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life (Cerdá et al., 2011; Collins, 2016; Druffner et al., 2024). For example, education level and income correlate with cognitive ability and alcohol consumption (Druffner et al., 2024; Lui et al., 2018; Rogne et al., 2021; Zhou et al., 2021). Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionary novel stimuli (Kanazawa and Hellberg, 2010; Wilmoth, 2012). This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks (National Academies of Sciences, Engineering, and Medicine, Health and Medicine Division, Board on Population Health and Public Health Practice, & Committee on the Health Effects of Marijuana: An Evidence Review and Research Agenda, 2017). Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition (Krueger et al., 2009). Young adults often drink alcohol as a social ritual in university settings to build connections with peers (Brown and Murphy, 2020). In older adults, drinking may accompany friends or family visits (Beck et al., 2019; Kelly et al., 2018). Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities (Dossi et al., 2020; Nyberg et al., 2021; Yang et al., 2015). Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task (Angelidis et al., 2019; Hayes et al., 2012; Lukasik et al., 2019). Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control (Aupperle et al., 2012; Johnsen and Asbjørnsen, 2008; Khan et al., 2024; Moore, 2009). Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife (Bomyea et al., 2012; Lynch and Lachman, 2020). In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level (Petkus et al., 2018).

In agreement with our findings, cognitive deficits are often found in psychotic disorders (Künzi et al., 2022; McCutcheon et al., 2023; Sheffield et al., 2018). We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g. recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

Finally, both negative PLSR loadings and corresponding univariate correlations for features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research (Allerhand et al., 2014; Jokela, 2022; Shi et al., 2022). On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition (Dawson, 2025). The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life (Dewitte and Dezutter, 2021). The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 (Aftab et al., 2019). Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability (Zhu et al., 2024).

How well does brain MRI capture the predictive relationship between cognition and mental health?

Consistent with previous studies, we show that MRI data predict individual differences in cognition with a medium-sized performance (r ≈ 0.4) (Dhamala et al., 2021; Gignac and Szodorai, 2024; He et al., 2020; Krämer et al., 2024; Pat et al., 2022; Sripada et al., 2020; Tetereva et al., 2022). This gives us confidence in using MRI to derive quantitative neuromarkers of cognition. Neural indicators of cognition derived from rsMRI and sMRI neuroimaging phenotypes each explain approximately a third of the cognition–mental health link (29.8% and 31.6%, respectively), whereas multimodal neural indicators of cognition derived from all neuroimaging phenotypes account for almost half (48%) of the variance in the cognition-mental health relationship. Thus, combining all neuroimaging phenotypes from three MRI modalities allowed us to explain the highest proportion of the variance in cognition that mental health captures.

Among dwMRI-derived neuroimaging phenotypes, models based on structural connectivity between brain areas parcellated with aparc MSA-I (streamline count), particularly connections with bilateral caudal anterior cingulate (left superior parietal-left caudal anterior cingulate, right pars opercularis-right caudal anterior cingulate), left putamen (left putamen-left hippocampus, left putamen-right posterior thalamus), and amygdala (left caudate-right amygdala), result in a neural indicator that best reflects microstructural resources associated with cognition, as indicated by predictive modeling, and more importantly, shares the highest proportion of the variance with mental health-g, as indicated by commonality analysis. One of the mechanisms that can be reflected in the link between streamline count and individual variations in cognition is strengthening local connections within a hemisphere and enhancing local (i.e. the connectivity between neighbouring regions) and nodal efficiency (i.e. how well a given brain region is connected to other regions) (Neudorf et al., 2024). The somewhat limited utility of diffusion metrics derived specifically from probabilistic tractography in serving as robust quantitative neuromarkers of cognition and its shared variance with mental health may stem from their greater sensitivity and specificity to neuronal integrity and white matter microstructure rather than to dynamic cognitive processes. Critically, probabilistic tractography may be less effective at capturing relationships between white matter microstructure and behavioural scores cross-sectionally, as this method is more sensitive to pathological changes or dynamic microstructural alterations like those occurring during maturation. 
While these indices can capture abnormal white matter microstructure in clinical populations, such as Alzheimer’s disease, schizophrenia, or attention deficit hyperactivity disorder (ADHD) (Bergamino et al., 2021; Douaud et al., 2011; Silk et al., 2009), the empirical evidence on their associations with cognitive performance is controversial (Bozzali et al., 2002; Chen et al., 2023; O’Donnell and Westin, 2011; Patil and Ramakrishnan, 2014; Pierpaoli et al., 1996; Shim et al., 2017; Sripada et al., 2020; Stahl et al., 2007).

We extend findings on the superior performance of rsMRI in predicting cognition, which aligns with the literature (Dhamala et al., 2021; Pat et al., 2022), by showing that it also explains almost a third of the variance in cognition that mental health captures. At the rsMRI neuroimaging phenotype level, this performance is mostly driven by RSFC patterns among 55 ICA-derived networks quantified using tangent space parameterization. At a feature level, these associations are best captured by the strength of functional connections among limbic, dorsal attention and ventral attention, frontoparietal and default mode networks. These functional networks have been consistently linked to cognitive processes in prior research (Cole et al., 2013; Seeburger et al., 2024; Smallwood et al., 2021; Vossel et al., 2014).

ICA is a data-driven technique that does not rely on predefined anatomical boundaries. It captures intrinsic large-scale functional networks accounting for individual variability in RSFC. Thus, by providing more ‘personalized’ connectivity representations, ICA-derived networks may be more robust neuronal indicators of cognitive performance than node-to-node estimates (Marrelec and Fransson, 2011; Sohn et al., 2015). Furthermore, using tangent space parametrization to quantify RSFC not only improves the predictive performance of rsMRI for cognition, as shown previously (Abbas et al., 2023; Dadi et al., 2019; Ng et al., 2014; Simeon et al., 2022; Venkatesh et al., 2020), but also allows us to capture more variance that cognition shares with mental health. Although tangent parametrization matrices do not reflect individual functional brain connections and cannot be interpreted directly, the linearization and projection to Euclidean space make functional connectivity estimates more suitable for statistical analysis and machine learning (Pervaiz et al., 2020). Resting-state fluctuation amplitudes from rsMRI, which are the least predictive of cognition and explain the smallest proportion of the variance in cognition that mental health captures, are believed to reflect cardiovascular and cerebrovascular factors distinct from neural effects (Tsvetanov et al., 2021). Research indicates that network amplitudes are significantly influenced by age, cardiovascular and lung function, and other physical measures, and are, therefore, subject to high interindividual variability. Consequently, interindividual variability in vascular health and ageing dynamics may confront the capacity of network amplitudes to serve as a robust neural marker of cognition (Lee et al., 2023).
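The linearization mentioned above can be written compactly. The expression below is the commonly used formulation of tangent space parameterization, offered as a sketch; the symbols and the choice of reference matrix are assumptions for illustration and are not details confirmed in this section.

```latex
% \Sigma_i            : covariance matrix of the network time series for participant i
% \Sigma_{\mathrm{ref}} : group-level reference matrix (assumed, e.g. a mean across participants)
% \operatorname{logm} : matrix logarithm
\mathbf{T}_i = \operatorname{logm}\!\left(
    \boldsymbol{\Sigma}_{\mathrm{ref}}^{-1/2}\,
    \boldsymbol{\Sigma}_i\,
    \boldsymbol{\Sigma}_{\mathrm{ref}}^{-1/2}
\right)
```

Each symmetric matrix T_i lives in a Euclidean (tangent) space, so its upper-triangular entries can be vectorised and passed to standard linear models. Because every entry mixes information from several connections through the whitening by the reference matrix, individual entries no longer correspond to single functional connections.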

Integrating information about brain anatomy by stacking sMRI neuroimaging phenotypes allowed us to explain a third of the link between cognition and mental health. Among all sMRI neuroimaging phenotypes, those that quantified the morphology of subcortical structures, particularly volumes of bilateral hippocampus and hippocampal head, explain the highest portion of the variance in cognition captured by mental health. Our findings show that, at least in older adults, volumetric properties of subcortical structures are not only more predictive of individual variations in cognition but also explain a greater portion of cognitive variance shared with mental health than structural characteristics of more distributed cortical grey and white matter. This aligns with the Scaffolding Theory that proposes stronger compensatory engagement of subcortical structures in cognitive processing in older adults (Park and Reuter-Lorenz, 2009; Reuter-Lorenz and Park, 2014; Vieira et al., 2020).

Limitations

The study has several limitations. First, the UK Biobank MRI data include only one task for task functional MRI (tfMRI), the Hariri hammer task (Hariri et al., 2000), which is not designed to be cognitively demanding. Compared to other MRI modalities, tfMRI from certain tasks, such as the n-back working memory task (Kirchner, 1958), has been shown to provide the most robust neuromarker of cognition (Tetereva et al., 2022). Thus, by not including cognitively demanding tfMRI tasks in predictive models, we may have missed condition-specific variance that may account for a substantial portion of the variance specific to particular cognitive domains. Second, for generalizability purposes, we did not stratify the cohort or exclude individuals with neurological, cardiovascular, or any other clinically established disorders, assuming that, in older adults, these effects can be tightly intertwined with neuronal mechanisms that sustain cognitive processing. Finally, the UK Biobank’s mental health questionnaire and cognitive test battery may miss important information as they cover a limited and specific set of neuropsychiatric conditions and cognitive domains, respectively. For example, the mental health questionnaire does not include questions evaluating autism or ADHD symptoms. The UK Biobank’s cognitive test battery was designed specifically for the study and is different from commonly used cognitive test batteries, such as the NIH Toolbox for Assessment of Neurological and Behavioral Function (Gershon et al., 2013) used in the Human Connectome Project (Elam and Essen, 2022) and Adolescent Brain Cognitive Development Study (Casey et al., 2018) or Wechsler Adult Intelligence Scale (Hartman, 2009) used in the Dunedin Study (Poulton et al., 2015), which hinders the cross-study comparison of the cognitive performance score.

Conclusions

Overall, our findings suggest that RSFC is a more fine-tuned system that exhibits a degree of flexibility and variability that is not entirely constrained by anatomical pathways, as cortical regions can use alternative or parallel pathways to strengthen or weaken functional connections underlying cognitive processing (Greicius et al., 2009). Although RSFC is believed to reflect anatomical connectivity (Greicius et al., 2009; O’Reilly et al., 2013), alterations in structural connectivity do not always compromise RSFC (O’Reilly et al., 2013). From this point, a pattern of RSFC maintained for an extended period may eventually cause concurrent alterations in cognition and mental health, especially if it involves brain areas that have overlapping effects on both. Still, the physical integrity and morphology of the structures providing a ‘physical’ relay for functionally connected brain areas to communicate play a pivotal role in the brain-behaviour relationship. Although more rigid and less flexible in an adult brain, they determine the amount of neural resources available for cognitive processing. Nevertheless, none of the neuroimaging phenotypes or MRI modalities alone is sufficient to provide a complete picture of the complex relationship between cognition and mental health, and combining all sources of information about brain structure and function reflected in three MRI modalities allows us to derive robust quantitative neuromarkers of cognition and boost the explanation of neural correlates that underlie this link.

In line with the National Institute of Mental Health’s RDoC framework, we shed light on the relationship between cognition and mental health as one of the six transdiagnostic spectrums of neuropsychiatric symptoms at one of the seven units of analysis, i.e., the neural level (Morris and Cuthbert, 2012). This may serve as a methodological ‘bridge’ linking other units of analysis, such as genes and behaviour. By elucidating the role of different neuroimaging phenotypes as neural correlates of cognition that overlap with mental health, we provide potential targets for behavioural and physiological interventions that may affect cognition.

Although recent debates (Marek et al., 2022) have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition–mental health covariation, showing that multimodal MRI can capture both substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labelling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank (Kirchner, 1958; Pat et al., 2022; Rentz et al., 2011; Rentz et al., 2011; Sripada et al., 2020; Tetereva et al., 2022; Tetereva et al., 2025; Wang et al., 2025).

Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition (Insel et al., 2010). Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition–mental health relationship.

Neuroimaging offers a unique window into the biological mechanisms underlying cognition-mental health overlap – insights unattainable from behavioural data alone. Our findings validate brain-based neural markers as a core unit of analysis for cognitive functioning, advancing mental health research through the lens of cognition. Beyond this conceptual contribution, the study has clinical implications. First, by demonstrating a transdiagnostic link between cognition and mental health, we support interventions that enhance cognition as a pathway to improving mental health. Second, we show neuroimaging as an effective tool for assessing the neurobiological basis of this link. Quantifying neuroimaging’s capacity to capture this relationship is essential for future research integrating imaging with cognitive testing to monitor treatment-related neural changes. Such work could enable personalised interventions, using neuroimaging to track cognitive changes and treatment efficacy (e.g. stimulant medications for ADHD) aimed at boosting cognitive functioning.

Materials and methods

Data

We used cognition, mental health, and MRI data collected at the first imaging visit (2014–2019) and additional mental health data collected online from the UK Biobank (UK Biobank Resource Application 70132), a prospective epidemiological study involving individuals aged 40–69 years at recruitment. All UK Biobank participants provided informed consent directly to UK Biobank at recruitment. UK Biobank has ethical approval from the North West Multi-centre Research Ethics Committee (reference 16/NW/0274) as a Research Tissue Bank. The analyses included participants who had all brain MRI scans, performed all cognitive tests, and completed mental health questionnaires (Table 5).

Table 5

Demographics for each subsample analyzed: number, age, and sex of participants who completed all cognitive tests, mental health questionnaires, and Magnetic Resonance Imaging (MRI) scanning.

	N participants	Age: mean (SD) years	Age: Range	% Females
Cognitive Tests	31 614	64.51 (7.66)	46.0–83.0	51.35%
Mental Health Questionnaire	21 077	64.63 (7.63)	47.0–82.0	53.0%
Cognitive Tests, Mental Health Questionnaire, and dwMRI	17 250	64.25 (7.53)	47.0–82.0	54.68%
Cognitive Tests, Mental Health Questionnaire, and rsMRI	17 005	64.2 (7.52)	47.0–82.0	54.92%
Cognitive Tests, Mental Health Questionnaire, and sMRI	14 793	64.21 (7.56)	47.0–82.0	54.62%
Cognitive Tests, Mental Health Questionnaire, and all MRI	14 256	64.04 (7.49)	47.0–82.0	54.97%

SD, standard deviation.

Test	Cognitive domain	Core measures	Field ID
Reaction Time	Reaction time and processing speed	Mean time to correctly identify matches	20023
Numeric Memory	Working memory	Maximum digits remembered correctly	4282
Fluid Intelligence	Verbal and numerical reasoning	Fluid intelligence score	20016
Prospective Memory	Prospective memory	Initial answer	4292
Trail Making	Executive function	Duration to complete numeric path (trail 1); Duration to complete alphabetic path (trail 2)	6348; 6350
Matrix Pattern Completion	Non-verbal fluid reasoning	Number of puzzles correctly solved	6373
Symbol Digit Substitution	Processing speed	Number of symbol digit matches made correctly	23324
Picture Vocabulary	Vocabulary (crystallized cognitive ability)	Specific cognitive ability	26302
Tower Rearranging	Planning abilities (a component of executive function)	Number of puzzles correct	21004
Paired Associate Learning	Verbal declarative memory	Number of word pairs correctly associated	20197
Pairs Matching	Visual memory	Number of incorrect matches in round	399

No	Variable	Statistics/Values	Frequencies
1	Reaction time (log(x))	Mean (SD): 6.4 (0.2) min<med<max: 5.8<6.4<7.4 IQR (CV): 0.2 (0)	550 distinct values
2	Fluid intelligence score	Mean (SD): 6.6 (2) min<med<max: 1<7<13 IQR (CV): 3 (0.3)	13 distinct values
3	Numeric memory: Maximum digits remembered correctly	Mean (SD): 6.8 (1.3) min<med<max: 2<7<12 IQR (CV): 2 (0.2)	11 distinct values
4	Trail making: Duration to complete numeric path (log(x))	Mean (SD): 5.4 (0.3) min<med<max: 4.5<5.3<7.5 IQR (CV): 0.3 (0.1)	638 distinct values
5	Trail making: Duration to complete alphabetic path (log(x))	Mean (SD): 6.3 (0.4) min<med<max: 5.1<6.2<8.7 IQR (CV): 0.5 (0.1)	1542 distinct values
6	Symbol digit substitution: Number of correct matches	Mean (SD): 18.9 (5.2) min<med<max: 0<19<37 IQR (CV): 7 (0.3)	38 distinct values
7	Paired associate learning: Number of correct pairs	Mean (SD): 7 (2.5) min<med<max: 0<7<10 IQR (CV): 4 (0.4)	11 distinct values
8	Tower rearranging: Number of puzzles correct	Mean (SD): 9.9 (3.2) min<med<max: 0<10<18 IQR (CV): 4 (0.3)	19 distinct values
9	Matrix pattern completion: Number of puzzles correct	Mean (SD): 8 (2.1) min<med<max: 0<8<15 IQR (CV): 2 (0.3)	16 distinct values
10	Pairs matching: Number of incorrect matches (log(x+1))	Mean (SD): 1.4 (0.6) min<med<max: 0<1.4<3.8 IQR (CV): 0.7 (0.4)	35 distinct values
11	Picture vocabulary: Specific cognitive ability	Mean (SD): 0.4 (0.1) min<med<max: 0<0.4<0.6 IQR (CV): 0.1 (0.2)	3834 distinct values
12	Prospective memory: Initial answer	Min: 0 Mean: 0.8 Max: 1	0: 5502 (17.4%) 1: 26112 (82.6%)
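Several variables in the table above are reported on a log scale, marked as log(x) or log(x+1). The short sketch below illustrates the two transforms. It assumes the natural logarithm and reaction times in milliseconds; neither is stated in the table, and the input values are hypothetical.

```python
import math

# Hypothetical raw values for illustration only (not UK Biobank data)
reaction_time_ms = 600.0   # assumed unit: milliseconds
incorrect_matches = 0      # pairs matching: a round with no errors

# log(x): compresses the long right tail of timing variables
log_rt = math.log(reaction_time_ms)

# log(x + 1): same idea, but remains defined when the count is zero
log_pairs = math.log1p(incorrect_matches)

# Back-transform a log-scale value to the original scale
back_rt = math.exp(log_rt)

print(round(log_rt, 2), log_pairs, round(back_rt, 1))
```

Under these assumptions, a log-scale mean of about 6.4 for reaction time corresponds to roughly 600 on the original scale, and the log(x+1) form is needed for pairs matching because zero incorrect matches is a valid score.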

Disorder/Exposure	Definition	Fields	Resources
PHQ-9	The sum of the nine depressive symptoms scored 0–4: Little interest or pleasure in doing things; Feeling down, depressed, or hopeless; Trouble sleeping; Feeling tired; Poor appetite or overeating; Feeling bad about yourself; Trouble concentrating; Moving or speaking slowly or fidgety or restless; Thoughts that you would be better off dead. Answers “Prefer not to answer” were assigned the lowest score (0).	20507 20508 20510 20511 20513 20514 20517 20518 20519	Davis et al., 2020; Kroenke et al., 2010
Depression ever	At least one core symptom of depression (Persistent sadness or Loss of interest) that lasted most or all of the day on most or all days within two weeks with some or a lot of impact on normal activity, plus additional depressive symptoms that represent a change in the mental and/or physical state from usual state and occur over the same period with thoughts about death. A total of ≥5 symptoms, including core symptoms. The score is obtained based on the DSM definition of major depressive disorder.	20435 20436 20437 20439 20440 20441 20446 20449 20450 20532 20536	Davis et al., 2020 CIDI-SF (Composite International Diagnostic Interview – Short Form), depression module, lifetime version Kessler et al., 1998
Bipolar affective disorder type I	Ever had depression and ever manic/hyper or irritable, plus at least three manifestations of mania or irritability (more talkative, more restless, thoughts racing, needed less sleep, more creative or had more ideas, easily distracted, more confident, more active) or four manifestations if never manic/hyper, plus duration of symptoms for a week or more and symptoms caused significant problems.	20435 20436 20437 20439 20440 20441 20446 20449 20450 20492 20493 20501 20502 20532 20536 20548	Davis et al., 2020 Cerimele et al., 2014 Carvalho et al., 2015
Bipolar affective disorder type II	Ever had depression and ever manic/hyper or irritable, plus at least three manifestations of mania or irritability (more talkative, more restless, thoughts racing, needed less sleep, more creative or had more ideas, easily distracted, more confident, more active) or four manifestations if never manic/hyper, plus duration of symptoms for a week or more and symptoms did not cause significant problems.	20435 20436 20437 20439 20440 20441 20446 20449 20450 20492 20501 20502 20532 20536 20548	Davis et al., 2020
Subthreshold depressive symptoms ever	Does not meet Composite International Diagnostic Interview (CIDI) diagnostic criteria for depression, but has at least one of the following symptoms: Persistent depression or anhedonia based on CIDI; PHQ-9 score for current depressive symptoms exceeds the threshold for mild depression; The presence of a clinician diagnosis of depression.	20002 20435 20436 20437 20439 20440 20441 20446 20449 20450 20507 20508 20510 20511 20513 20514 20517 20518 20519 20532 20536 20544	Davis et al., 2020 National Institute for Health and Clinical Excellence. Depression in adults: recognition and management. NICE Clinical Guideline CG90
Depression single episode	A single episode of depression without bipolar disorder type I.	20435 20436 20437 20439 20440 20441 20442 20446 20449 20450 20492 20493 20501 20502 20532 20536	Davis et al., 2020
Recurrent depression	More than one episode of depression throughout a lifetime without bipolar disorder type I.	20435 20436 20437 20439 20440 20441 20442 20446 20449 20450 20492 20493 20501 20502 20532 20536	Davis et al., 2020
Depression single episode triggered by a loss	A single episode of depression that started within two months after a traumatic event.	20435 20436 20437 20439 20440 20441 20442 20446 20447 20449 20450 20492 20493 20501 20502 20532 20536	Davis et al., 2020
Current depression	At least a single episode of depression (‘Depression ever’) with a minimum of 5 depression symptoms from the PHQ-9 occurring on more than half the days, or on several days for suicidal thoughts.	20435 20436 20437 20439 20440 20441 20446 20449 20450 20507 20508 20510 20511 20513 20514 20517 20518 20519 20532 20536	Davis et al., 2020 Manea et al., 2012
Current severe depression	At least a single episode of depression (‘Depression ever’) that meets the criteria for current depression (above), with a PHQ-9 score >15.	20435 20436 20437 20439 20440 20441 20446 20449 20450 20507 20508 20510 20511 20513 20514 20517 20518 20519 20532 20536	Davis et al., 2020 Manea et al., 2012
GAD-7	The sum of the recent symptoms of anxiety scored 0–3: Feelings of nervousness or anxiety; Inability to stop or control worrying; Worrying too much about different things; Trouble relaxing; Restlessness; Easy annoyance or irritability; Feelings of foreboding.	20505 20506 20509 20512 20515 20516 20520	Davis et al., 2020 Kroenke et al., 2010
Lifetime anxiety disorder (GAD ever)	Ever felt worried, tense, or anxious for most of the day for at least six months. The symptoms were often difficult to control, interfered with daily activity, and were accompanied by at least three somatic symptoms (restless, keyed up or on edge, easily tired, difficulty keeping the mind on current activity, more irritable than usual, tense muscles, trouble falling or staying asleep).	20417 20418 20419 20420 20421 20422 20423 20425 20426 20427 20429 20537 20538 20539 20540 20541 20542 20543	Davis et al., 2020 CIDI-SF (Composite International Diagnostic Interview – Short Form), GAD module, lifetime version. Scored based on the DSM definition of GAD Gigantesco and Morosini, 2008 Kessler et al., 1998 National Institute for Health and Clinical Excellence. Generalised anxiety disorder and panic disorder in adults: management. NICE Clinical Guideline CG113
Current anxiety	Ever had GAD (‘GAD Ever’) and GAD-7 score ≥10. Subdivided into mild, moderate, and severe with cut-offs at 5, 10, and 15.	20417 20418 20419 20420 20421 20422 20423 20425 20426 20427 20429 20505 20506 20509 20512 20515 20516 20520 20537 20538 20539 20540 20541 20542 20543	Davis et al., 2020 Kroenke et al., 2010
N-12	The sum of the following scores: Mood swings; Miserableness; Irritability; Sensitivity/hurt feelings; Fed-up feelings; Nervous feeling; Worrier/anxious feelings; Tense/‘highly strung’; Worry too long after embarrassment; Suffer from ‘nerves’; Loneliness, isolation; Guilty feelings.	1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 2020 2030	Dutt et al., 2022 Smith et al., 2013a
PDS	Ever been depressed or unenthusiastic for at least one week and seen either a GP or psychiatrist for nerves, anxiety, tension, or depression.	2090 2100 4598 4609 4631 5375	Dutt et al., 2022
RDS-4	Frequency of depressed mood, disinterest, restlessness, and tiredness during the past two weeks scored 1–4.	2050 2060 2070 2080	Dutt et al., 2022
PCL-6	The sum of scores on the core symptoms of PTSD in the past month: Repeated disturbing thoughts of a stressful experience; Felt very upset when reminded of a stressful experience; Avoided activities or situations because of a previous stressful experience; Felt distant from other people; Felt irritable or had angry outbursts in the past month; Recent trouble concentrating on things. The symptoms are grouped into three clusters: (1) memories, thoughts, or images, and being upset when reminded; (2) avoiding activities or situations, and feeling distant or cut off; (3) being irritable or angry, and difficulty concentrating.	20494 20495 20496 20497 20498 20508	Davis et al., 2020 Lang and Stein, 2005
PTSD	PCL-6 score ≥14.	20494 20495 20496 20497 20498 20508	Davis et al., 2020
Unusual experience	Experience of hallucinations or delusions, such as: Unreal voice; Unreal vision; Believed in an unreal conspiracy against self; Believed in unreal communications or signs.	20463 20468 20471 20474	Davis et al., 2020 Nuevo et al., 2012
Recent unusual experience	Reports at least one or two hallucination or delusion episodes within the last year.	20467	Davis et al., 2020
Life not worth living	Ever felt that life was not worth living.	20479	Davis et al., 2020
Self-harm	Ever harmed self, whether or not meant to die.	20480	Davis et al., 2020
Non-suicidal self-harm	Ever self-harmed without intention to end life, i.e., never attempted suicide.	20480 20483	Davis et al., 2020
Suicide attempt	Ever harmed self with intent to end life.	20480 20483	Davis et al., 2020
AUDIT	The sum of scores (0–4) on questions about alcohol consumption comprising three domains. Consumption: frequency, amount of typical drinks, frequency of having six or more drinks. Dependence: unable to stop, failed to do what was expected due to drinking, needed to drink first thing. Harm: guilt due to drinking, unable to remember due to drinking, plus had injuries due to drinking or advice to cut down on drinking.	20403 20405 20407 20408 20409 20411 20412 20413 20414 20416	Davis et al., 2020 Saunders et al., 1993 Reinert and Allen, 2007
Alcohol consumption (AUDIT-C)	Sum of questions 1–3 of the Alcohol Consumption domain.	20403 20414 20416	Sanchez-Roige et al., 2019
Problems caused by alcohol (AUDIT-P)	Sum of questions 4–10 of the Alcohol Dependence and Alcohol Harm domains.	20405 20407 20408 20409 20411 20412 20413	Sanchez-Roige et al., 2019
Hazardous/Harmful alcohol use	AUDIT score ≥8.	20403 20405 20407 20408 20409 20411 20412 20413 20414 20416	Davis et al., 2020 Babor et al., 2001 Stansfeld et al., 2016
Current alcohol dependence	AUDIT score ≥15.	20403 20405 20407 20408 20409 20411 20412 20413 20414 20416	Davis et al., 2020 Babor et al., 2001 Drummond et al., 2016
Alcohol dependence ever	Ever physically dependent on alcohol.	20404	Davis et al., 2020
Addiction ever	Ever addicted to any substance or behaviour.	20401	Davis et al., 2020
Substance addiction	Ever been addicted to alcohol, illicit/recreational drugs, or medication.	20406 20456 20503	Davis et al., 2020
Current addiction	Ongoing addiction or dependence.	20415 20432 20457 20504	Davis et al., 2020
Cannabis ever	Taking cannabis at least once in life.	20453	Davis et al., 2020
Cannabis daily	Maximum frequency of taking cannabis was every day.	20454	Davis et al., 2020
Childhood adverse events	A positive score if any of the five questions of the Childhood Trauma Screen (CTS) reach the threshold: Felt loved as a child ≤3 (never, rarely, or sometimes); Physically abused by family as a child ≥2 (often or very often); Felt hated by a family member as a child ≥2 (often or very often); Sexually molested as a child ≥2 (often or very often); Someone to take to the doctor when needed as a child ≤4 (never, rarely, sometimes, or often).	20487 20488 20489 20490 20491	Davis et al., 2020 Walker et al., 1999
Adult adverse events	A positive score if any of the five questions of the Adult Trauma Screen reach the threshold: Been in a confiding relationship as an adult ≤3 (never, rarely, or sometimes); Physical violence by partner or ex-partner as an adult ≥2 (often or very often); Belittlement by partner or ex-partner as an adult ≥2 (often or very often); Sexual interference by partner or ex-partner without consent as an adult ≥2 (often or very often); Able to pay rent/mortgage ≤4 (never, rarely, sometimes, or often).	20521 20522 20523 20524 20525	Davis et al., 2020
Catastrophic trauma	At least one catastrophic event: Victim of sexual assault; Victim of physically violent crime; Been in a serious accident believed to be life-threatening; Witnessed sudden violent death; Diagnosed with a life-threatening illness; Been involved in combat or exposed to war zones.	20526 20527 20528 20529 20530 20531	Davis et al., 2020
Wellbeing	The sum of the following scores: General happiness; Happiness with own health; Belief that life is meaningful.	20458 20459 20460	Davis et al., 2020
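
Several rows above define scores as item sums with fixed cut-offs. The sketch below illustrates that logic for three of them: the PHQ-9 sum with “Prefer not to answer” set to the lowest score, the GAD-7 severity bands at 5, 10, and 15, and the AUDIT thresholds at ≥8 and ≥15. It is a minimal illustration, not the authors' pipeline. The function names, the use of `None` to stand for “Prefer not to answer”, the label `"none"` for scores below 5, and the example responses are all assumptions introduced here.

```python
def sum_score(responses, lowest=0):
    """Sum item responses; None (assumed marker for 'Prefer not to answer')
    is replaced by the lowest possible item score."""
    return sum(lowest if r is None else r for r in responses)


def gad7_severity(score):
    """Map a GAD-7 total to a severity band using cut-offs 5, 10 and 15."""
    if score >= 15:
        return "severe"
    if score >= 10:
        return "moderate"
    if score >= 5:
        return "mild"
    return "none"  # assumed label for totals below the mild cut-off


def audit_scores(items):
    """Derive AUDIT total, AUDIT-C, AUDIT-P and the two threshold flags
    from the ten item scores, given in questionnaire order."""
    total = sum(items)
    return {
        "audit": total,
        "audit_c": sum(items[:3]),          # questions 1-3 (consumption)
        "audit_p": sum(items[3:]),          # questions 4-10 (dependence, harm)
        "hazardous_use": total >= 8,        # AUDIT score >= 8
        "current_dependence": total >= 15,  # AUDIT score >= 15
    }


# Hypothetical responses for one participant (illustrative values only).
phq9_total = sum_score([1, 0, None, 2, 0, 0, 1, None, 0])
gad7_band = gad7_severity(11)
audit = audit_scores([2, 1, 1, 0, 1, 0, 2, 1, 0, 0])
```

The composite definitions in the table, such as “Current anxiety” requiring both ‘GAD Ever’ and a GAD-7 score ≥10, would combine flags like these with the lifetime criteria.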

Characteristics of the train and test sets used to build the g-factor.

Experimental design.

g-factor modeling and the relationship between the g-factor and mental health features.

Figure 2—source data 1

Goodness-of-fit indices for the hierarchical g-factor model across fivefolds.

Out-of-sample predictive performance of mental health features in the Partial Least Squares Regression (PLSR) model across fivefolds.

Predictive performance of machine learning models based on 72 individual neuroimaging phenotypes.

Figure 3—source data 1

Predictive performance of machine learning models based on neuroimaging phenotypes stacked within and across three Magnetic Resonance Imaging (MRI) modalities.

Figure 4—source data 1

Mean (averaged across fivefolds) out-of-sample predictive performance of Magnetic Resonance Imaging (MRI) modalities stacked using four machine learning algorithms.

Feature importance maps for neuroimaging features with the highest predictive performance for cognition derived via the Haufe transformation.

Figure 5—source data 1

Results of commonality analyses: the contribution of neuroimaging phenotypes to the relationship between cognition and mental health.

Figure 6—source data 1

Scatterplot of the relationship between the Partial Least Squares Regression (PLSR) performance of individual neuroimaging phenotypes and the proportion of the cognition–mental health relationship they capture.

Figure 7—source data 1

The contribution of neuroimaging phenotypes stacked within each and across all Magnetic Resonance Imaging (MRI) modalities to the relationship between cognition and mental health: Results of predictive modeling and commonality analyses.

Figure 8—source data 1

Demographics for each subsample analyzed: number, age, and sex of participants who completed all cognitive tests, mental health questionnaires, and Magnetic Resonance Imaging (MRI) scanning.

Cognitive tests and core measures of the UK Biobank cognitive test battery used in the study.

Whole-sample distributions of cognitive performance scores used to derive the g-factor (N=31,614).

Derivation of mental health scores.

Author details

Irina Buianova (for correspondence)

Mateus Silvestrin

Jeremiah D Deng

Narun Pat
