Multimodal brain age estimates relate to Alzheimer disease biomarkers and cognition in early stages: a cross-sectional observational study
Abstract
Background:
Estimates of ‘brain-predicted age’ quantify apparent brain age compared to normative trajectories of neuroimaging features. The brain age gap (BAG) between predicted and chronological age is elevated in symptomatic Alzheimer disease (AD) but has not been well explored in presymptomatic AD. Prior studies have typically modeled BAG with structural MRI, but more recently other modalities, including functional connectivity (FC) and multimodal MRI, have been explored.
Methods:
We trained three models to predict age from FC, structural (S), or multimodal MRI (S+FC) in 390 amyloid-negative cognitively normal (CN/A−) participants (18–89 years old). In independent samples of 144 CN/A−, 154 CN/A+, and 154 cognitively impaired (CI; CDR > 0) participants, we tested relationships between BAG and AD biomarkers of amyloid and tau, as well as a global cognitive composite.
Results:
All models predicted age in the control training set, with the multimodal model outperforming the unimodal models. All three BAG estimates were significantly elevated in CI compared to controls. FC-BAG was significantly reduced in CN/A+ participants compared to CN/A−. In CI participants only, elevated S-BAG and S+FC BAG were associated with more advanced AD pathology and lower cognitive performance.
Conclusions:
Both FC-BAG and S-BAG are elevated in CI participants. However, FC and structural MRI also capture complementary signals. Specifically, FC-BAG may capture a unique biphasic response to presymptomatic AD pathology, while S-BAG may capture pathological progression and cognitive decline in the symptomatic stage. A multimodal age-prediction model improves sensitivity to healthy age differences.
Funding:
This work was supported by the National Institutes of Health (P01-AG026276, P01-AG03991, P30-AG066444, 5-R01-AG052550, 5-R01-AG057680, 1-R01-AG067505, 1S10RR022984-01A1, and U19-AG032438), the BrightFocus Foundation (A2022014F), and the Alzheimer’s Association (SG-20-690363-DIAN).
Editor's evaluation
This is a useful study exploring multi-modality brain age (structural plus resting state MRI) in people in the early stages or at risk of Alzheimer's disease. They found solid evidence that people with cognitive impairment had older-appearing brains and that older-appearing brains were related to Alzheimer's risk factors such as amyloid and tau deposition. Their data suggest that the multi-modality brain age model is more accurate than a unimodal structural MRI model.
https://doi.org/10.7554/eLife.81869.sa0
eLife digest
The brains of people with advanced Alzheimer’s disease often look older than expected based on the patients’ actual age. This ‘brain age gap’ (how old a brain appears compared to the person’s chronological age) can be calculated thanks to machine learning algorithms which analyse images of the organ to detect changes related to aging. Traditionally, these models have relied on images of the brain structure, such as the size and thickness of various brain areas; more recent models have started to use activity data, such as how different brain regions work together to form functional networks.
While the brain age gap is a useful measure for researchers who investigate aging and disease, it is not yet helpful for clinicians. For example, it is unclear whether the machine learning algorithm could detect changes in the brains of individuals in the initial stages of Alzheimer’s disease, before they start to manifest cognitive symptoms.
Millar et al. explored this question by testing whether models which incorporate structural and activity data could be more sensitive to these early changes. Three machine learning algorithms (relying on either structural data, activity data, or a combination of both) were used to predict the brain ages of participants with no sign of disease; with biological markers of Alzheimer’s disease but preserved cognitive functions; and with marked cognitive symptoms of the condition.
Overall, the combined model was slightly better at predicting the brain age of healthy volunteers, and all three models indicated that patients with dementia had a brain which looked older than normal. For this group, the model based on structural data was also able to make predictions which reflected the severity of cognitive decline. Crucially, the algorithm which used activity data predicted that, in individuals with biological markers of Alzheimer’s disease but no cognitive impairment, the brain looked in fact younger than chronological age. Exactly why this is the case remains unclear, but this signal may be driven by neural processes which unfold in the early stages of the disease. While more research is needed, the work by Millar et al. helps to explore how various types of machine learning models could one day be used to assess and predict brain health.
Introduction
Alzheimer disease (AD) is marked by structural and functional disruptions in the brain, some of which can be observed through multimodal magnetic resonance imaging (MRI) in preclinical and symptomatic stages of the disease (Frisoni et al., 2010; Brier et al., 2014a). More recently, the ‘brain-predicted age’ framework has emerged as a promising tool for neuroimaging analyses, leveraging recent developments and accessibility of machine-learning techniques, as well as large-scale, publicly available neuroimaging datasets (Cole and Franke, 2017b; Franke and Gaser, 2019). These models are trained to quantify how ‘old’ a brain appears, as compared to a normative sample of training data - typically consisting of cognitively normal participants across the adult lifespan (e.g., Cole et al., 2015). Thus, the framework allows for a residual-based interpretation of the brain age gap (BAG), defined as the difference between model-predicted age and chronological age, as an index of vulnerability and/or resistance to underlying disease pathology. Indeed, several studies have demonstrated that BAG is elevated (i.e. the brain ‘appears older’ than expected) in a host of neurological and psychiatric disorders, including symptomatic AD (Franke et al., 2010; Franke and Gaser, 2012; Gaser et al., 2013), as well as schizophrenia (e.g., Koutsouleris et al., 2014), HIV (e.g., Cole et al., 2017c), and type-2 diabetes (e.g., Franke et al., 2013), and moreover, predicts mortality (Cole et al., 2018). Conversely, lower BAG is associated with lower risk of disease progression (Gaser et al., 2013; Wang et al., 2019; Bocancea et al., 2021). Critically, at least one comparison suggests that BAG exceeds other established MRI (hippocampal volume) and CSF (pTau and Aβ42) biomarkers in sensitivity to AD progression (Gaser et al., 2013). 
Thus, by summarizing complex, non-linear, highly multivariate patterns of neuroimaging features into a simple, interpretable summary metric, BAG may reflect a comprehensive biomarker of brain health.
Several studies have established that symptomatic AD and mild cognitive impairment (MCI) are associated with elevated BAG (Cole and Franke, 2017b; Franke and Gaser, 2019). However, the sensitivity of these model estimates to AD in the presymptomatic stage (i.e. present amyloid pathology in the absence of cognitive decline [Sperling et al., 2011]) is less clear. The development of sensitive, reliable, non-invasive biomarkers of preclinical AD pathology is critical for the assessment of individual AD risk, as well as the evaluation of AD clinical prevention trials. Recent studies have demonstrated that greater BAG is associated with greater amyloid PET burden in a Down syndrome cohort (Cole et al., 2017a) and with greater tau PET burden in sporadic MCI and symptomatic AD (Lee et al., 2022). One approach to maximize sensitivity of BAG to presymptomatic AD pathology may be to train brain age models exclusively on amyloid-negative participants. As undetected AD pathology might influence MRI measures, and thus confound effects otherwise attributed to ‘healthy aging’ (Brier et al., 2014b), including the patterns learned by a traditional brain age model, an alternative model trained on amyloid-negative participants only might be more sensitive to detect presymptomatic AD pathology as deviations in BAG. Indeed, one recent study demonstrated that an amyloid-negative trained brain age model (Ly et al., 2020) is more sensitive to progressive stages of AD than a typical amyloid-insensitive model (Cole et al., 2015). However, this comparison included amyloid-negative and amyloid-positive test samples from two separate cohorts and thus may be driven by cohort, scanner, and/or site differences. To validate the applicability of the brain-predicted age approach to presymptomatic AD, it is important to test a model’s sensitivity to amyloid status, as well as continuous relationships with AD biomarkers, within a single cohort.
Another recent comparison demonstrated that both traditional and amyloid-negative trained brain age models were similarly related to molecular AD biomarkers, but that further attempts to ‘disentangle’ AD from brain age by including more advanced AD continuum participants in the training sample significantly reduced relationships between brain age and AD markers (Hwang et al., 2022). Thus, in this study, we will apply the amyloid-negative training approach to a multimodal MRI dataset in order to maximize sensitivity to AD pathology in the presymptomatic stage.
Most of the brain-predicted age reports described above focused primarily on structural MRI. However, other studies have successfully modeled brain age using a variety of other modalities, including metabolic PET (Goyal et al., 2019; Lee et al., 2022), diffusion MRI (Cherubini et al., 2016; Petersen et al., 2022), and functional connectivity (FC) (Dosenbach et al., 2010; Liem et al., 2017; Eavani et al., 2018; Nielsen et al., 2019). Integration of multiple neuroimaging modalities may maximize sensitivity of BAG estimates to preclinical AD. Indeed, recent multimodal comparisons suggest that structural MRI and FC capture complementary age-related signals (Eavani et al., 2018; Dunås et al., 2021) and that age prediction may be improved by incorporating multiple modalities (Liem et al., 2017; Engemann et al., 2020). One recent study has shown that BAG estimates from an FC graph theory-based model are significantly elevated in autosomal dominant AD mutation carriers and are positively associated with amyloid PET (Gonneaud et al., 2021). Furthermore, we have recently demonstrated that FC correlation-based BAG estimates are surprisingly reduced in cognitively normal participants with evidence of amyloid pathology and elevated pTau, as well as in cognitively normal APOE ε4 carriers at genetic risk of AD (Millar et al., 2022). Thus, incorporating FC into BAG models may improve sensitivity to early AD.
This project aimed to develop multimodal models of brain-predicted age, incorporating both FC and structural MRI. Participants with presymptomatic AD pathology were excluded from the training set to maximize sensitivity. We hypothesized that BAG estimates would be sensitive to the presence of AD biomarkers and early cognitive impairment. We further considered whether estimates were continuously associated with AD biomarkers of amyloid and tau, as well as cognition. We hypothesized that FC and structural MRI would capture complementary signals related to age and AD. Thus, we systematically compared models trained on unimodal FC, structural MRI, and combined modalities to test the added utility of multimodal integration in accurately predicting age and whether each modality captures unique relationships with AD biomarkers and cognition.
Methods
Participants
We formed a training sample of healthy controls spanning the adult lifespan by combining structural and FC-MRI data from three sources, as described previously (Millar et al., 2022): the Charles F. and Joanne Knight AD Research Center (ADRC) at Washington University in St. Louis (WUSTL), healthy controls from studies in the Ances lab at WUSTL (Thomas et al., 2013; Petersen et al., 2021), and mutation-negative controls from the Dominantly Inherited Alzheimer Network (DIAN) study of autosomal dominant AD at multiple international sites including WUSTL (McKay et al., 2022). To minimize the likelihood of undetected AD pathology in our training set, participants over the age of 50 were only included in the training set if they were cognitively normal, as assessed by the Clinical Dementia Rating (CDR 0; Morris, 1993), and had at least one biomarker indicating the absence of amyloid pathology (CN/A−, see below). We excluded 59 participants who did not have available CDR or biomarker measures (see Figure 1—figure supplement 1). As CDR and amyloid biomarkers were not available in the Ances lab controls, we included only participants at or below age 50 from this cohort in the training set. These healthy control participants were randomly divided into a training set (~80%; N=390) and a held-out test set (~20%; N=97), which did not significantly differ in age, sex, education, or race, see Table 1.
Table 1. Demographic information of the combined samples. Training sets: total N=390; test sets (§): total N=97; analysis sets: total N=452.
Measure | Training: Ances Controls (CN/<50) | Training: DIAN Controls (CN/A−) | Training: Knight ADRC Controls (CN/A−) | Test §: Ances Controls (CN/<50) | Test §: DIAN Controls (CN/A−) | Test §: Knight ADRC Controls (CN/A−) | Analysis: CN/A− | Analysis: CN/A+ | Analysis: CI |
---|---|---|---|---|---|---|---|---|---|
N | 136 | 120 | 134 | 38 | 26 | 33 | 144 | 154 | 154 |
Age (mean, SD) | 29.92 (9.92) | 40.02 (10.26) | 64.97 (10.57) | 26.68 (7.11) | 41.46 (12.34) | 64.73 (10.57) | 66.93 (8.53) | 72.56 (7.15)‡ | 75.67 (6.86) ‡ |
CDR (N 0 / N 0.5 / N 1.0 / N 2.0) | NA | 120 / 0 / 0 / 0 | 134 / 0 / 0 / 0 | NA | 26 / 0 / 0 / 0 | 33 / 0 / 0 / 0 | 144 / 0 / 0 / 0 | 154 / 0 / 0 / 0 | 0 / 119 / 35 / 2 |
Amyloid status (N − / N +) | NA | 120 / 0 | 134 / 0 | NA | 26 / 0 | 33 / 0 | 144 / 0 | 0 / 154 | 0 / 57 |
Biomarkers available (N PET / CSF / both) | NA | 30 / 6 / 79 | 11 / 22 / 91 | NA | 3 / 1 / 21 | 5 / 0 / 28 | 24 / 0 / 120 | 17 / 0 / 137 | 14 / 0 / 43 |
APOE ε4 carrier status (N − / N +) | NA | 76 / 44 | 99 / 34 | NA | 19 / 7 | 28 / 5 | 115 / 29 | 71 / 83 ‡ | 55 / 98 ‡ |
MMSE (mean, SD) | NA | NA | 29.26 (1.05) | NA | NA | 29.45 (0.94) | 29.13 (1.17) | 28.97 (1.33) | 25.37 (3.55) ‡ |
Sex (N female / N male) | 70 / 64 | 85 / 35 | 84 / 50 | 19 / 18 | 16 / 10 | 22 / 11 | 89 / 55 | 91 / 63 | 68 / 86† |
Years of education (mean, SD) | 13.68 (2.16) | 14.78 (3.04) | 16.16 (2.43) | 13.95 (1.99) | 14.92 (2.83) | 16.48 (2.43) | 15.71 (2.65) | 15.90 (2.64) | 15.05 (2.97)* |
Race (N American Indian or Alaska Native) | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Race (N Asian) | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 1 | 0 |
Race (N Black) | 67 | 0 | 20 | 17 | 0 | 7 | 17 | 16 | 20 |
Race (N Native Hawaiian or Other Pacific Islander) | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
Race (N White) | 57 | 118 | 112 | 17 | 26 | 26 | 127 | 137 | 134 |
Site | WUSTL | Multiple sites | WUSTL | WUSTL | Multiple sites | WUSTL | WUSTL | WUSTL | WUSTL |
Scanner | Siemens Trio | Siemens Trio / Verio | Siemens Trio / Biograph | Siemens Trio | Siemens Trio / Verio | Siemens Trio / Biograph | Siemens Trio / Biograph | Siemens Trio / Biograph | Siemens Trio / Biograph |
Field strength | 3T | 3T | 3T | 3T | 3T | 3T | 3T | 3T | 3T |
CN = cognitively normal, <50 = less than age 50, A− = amyloid negative, A+ = amyloid positive, CI = cognitively impaired, DIAN = Dominantly Inherited Alzheimer Network, ADRC = Alzheimer Disease Research Center, AD = Alzheimer disease, CDR = Clinical Dementia Rating, MMSE = Mini Mental State Examination, WUSTL = Washington University in St. Louis, T = Tesla. Group differences from the CN/A− analysis set were tested with t tests for continuous variables and χ² tests for categorical variables.
* p < 0.05; ^ p < 0.10.
† p < 0.01.
‡ p < 0.001.
§ Test sets include randomly selected, non-overlapping subsets of participants drawn from the same studies as the training sets.
Finally, independent samples for hypothesis testing included three groups from the Knight ADRC: a randomly selected sample of 144 CN/A− controls who did not overlap with the training or testing sets, 154 CN/A+ participants, and 154 cognitively impaired (CI) participants (CDR > 0 with a biomarker measure consistent with amyloid pathology [see below] and/or a primary diagnosis of AD or uncertain dementia [McKhann et al., 2011]). See Table 1 for demographic details of each sample. All participants provided written informed consent in accordance with the Declaration of Helsinki and their local institutional review board. All procedures were approved by the Human Research Protection Office at WUSTL (IRB ID # 201204041).
PET and CSF biomarkers
Amyloid burden was imaged with PET using [11C]-Pittsburgh Compound B (PIB; Klunk et al., 2004) or [18F]-Florbetapir (AV45; Wong et al., 2010). Regional standardized uptake value ratios (SUVRs) were modeled from 30 to 60 min after injection for PIB and from 50 to 70 min for AV45, using cerebellar gray as the reference region (Su et al., 2013). Regions of interest were segmented automatically using FreeSurfer 5.3 (Fischl, 2012). Global amyloid burden was defined as the mean of partial-volume-corrected (PVC) SUVRs from bilateral precuneus, superior and rostral middle frontal, lateral and medial orbitofrontal, and superior and middle temporal regions (Su et al., 2013). Amyloid summary SUVRs were harmonized across tracers using a Centiloid conversion (Su et al., 2018).
Tau deposition was imaged with PET using [18F]-Flortaucipir (AV-1451; Chien et al., 2013). Regional SUVRs were modeled from 80 to 100 min after injection, using cerebellar gray as the reference region. A tau summary measure was defined as the mean of PVC SUVRs from bilateral amygdala, entorhinal, inferior temporal, and lateral occipital regions (Mishra et al., 2017).
CSF was collected via lumbar puncture using methods described previously (Fagan et al., 2006). After overnight fasting, 20–30 mL samples of CSF were collected, centrifuged, then aliquoted (500 µL) in polypropylene tubes, and stored at –80°C. CSF amyloid β peptide 42 (Aβ42), Aβ40, and phosphorylated tau-181 (pTau) were measured with automated Lumipulse immunoassays (Fujirebio, Malvern, PA, USA) using a single lot of assays for each analyte. Aβ42 and pTau estimates were each normalized for individual differences in CSF production rates by forming a ratio with Aβ40 as the denominator (Hansson et al., 2019; Guo et al., 2020). As pTau/Aβ40 was highly skewed, we applied a log transformation to these estimates before statistical analysis.
Amyloid positivity was defined using previously published cutoffs for PIB (SUVR > 1.42; Vlassenko et al., 2016) or AV45 (SUVR > 1.19; Su et al., 2019). Additionally, the CSF Aβ42/Aβ40 ratio has been shown to be highly concordant with amyloid PET (positivity cutoff < 0.0673; Schindler et al., 2018; Volluz et al., 2021). Thus, participants were defined as amyloid-positive (for CN/A+ and CI groups) if they had either a PIB, AV45, or CSF Aβ42/Aβ40 ratio measure in the positive range. Participants with discordant positivity between PET and CSF estimates were defined as amyloid-positive.
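The positivity rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `amyloid_status` and its argument names are hypothetical, and the inputs are assumed to be the global summary SUVRs and the CSF Aβ42/Aβ40 ratio described in this section. Only the cutoff values come from the text.

```python
from typing import Optional

# Cutoffs quoted in the text above.
PIB_SUVR_CUTOFF = 1.42       # PIB positive if SUVR > 1.42
AV45_SUVR_CUTOFF = 1.19      # AV45 positive if SUVR > 1.19
CSF_AB42_40_CUTOFF = 0.0673  # CSF positive if Abeta42/Abeta40 < 0.0673


def amyloid_status(pib_suvr: Optional[float] = None,
                   av45_suvr: Optional[float] = None,
                   csf_ab42_ab40: Optional[float] = None) -> Optional[bool]:
    """Return True (A+), False (A-), or None when no biomarker is available.

    Hypothetical helper: names and the None convention are assumptions.
    """
    calls = []
    if pib_suvr is not None:
        calls.append(pib_suvr > PIB_SUVR_CUTOFF)
    if av45_suvr is not None:
        calls.append(av45_suvr > AV45_SUVR_CUTOFF)
    if csf_ab42_ab40 is not None:
        calls.append(csf_ab42_ab40 < CSF_AB42_40_CUTOFF)
    if not calls:
        return None
    # Any positive measure wins, so discordant PET/CSF cases are called A+.
    return any(calls)
```

Under this rule, a participant with a negative PET scan but a positive CSF ratio is classified as amyloid-positive, matching the handling of discordant cases described above.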
Cognitive battery
Knight ADRC participants completed a 2 hr battery of cognitive tests. We examined global cognition by forming a composite of tasks across cognitive domains, including processing speed (Trail Making A; Armitage, 1946), executive function (Trail Making B; Armitage, 1946), semantic fluency (Animal Naming; Goodglass and Kaplan, 1983), and episodic memory (Free and Cued Selective Reminding Test free recall score; Grober et al., 1988). This composite has recently been used to study individual differences in cognition in relation to preclinical AD biomarkers and structural MRI (Aschenbrenner et al., 2018), as well as functional MRI measures (Millar et al., 2021).
MRI acquisition
All MRI data were obtained using Siemens 3T scanners, although specific scanner models varied within and across studies. As described previously (Millar et al., 2022), participants in the Knight ADRC and Ances lab studies completed one of two comparable structural MRI protocols, varying by scanner (sagittal T1-weighted magnetization-prepared rapid gradient echo sequence [MPRAGE] with repetition time [TR] = 2400 or 2300 ms, echo time [TE] = 3.16 or 2.95 ms, flip angle = 8 or 9°, frames = 176, field of view = sagittal 256×256 or 240×256 mm, 1 mm isotropic or 1×1×1.2 mm voxels; oblique T2-weighted fast spin echo sequence [FSE] with TR = 3200 ms, TE = 455 ms, 256×256 acquisition matrix, 1 mm isotropic voxels) and an identical resting-state fMRI protocol (interleaved whole-brain echo planar imaging sequence [EPI] with TR = 2200 ms, TE = 27 ms, flip angle = 90°, field of view = 256 mm, 4 mm isotropic voxels for two 6 min runs [164 volumes each] of eyes-open fixation). DIAN participants completed a similar MPRAGE protocol (TR = 2300 ms, TE = 2.95 ms, flip angle = 9°, field of view = 270 mm, 1.1×1.1×1.2 mm voxels; McKay et al., 2022). Resting-state EPI sequence parameters for the DIAN participants differed across sites and scanners with the most notable difference being shorter resting-state runs (one 5 min run of 120 volumes; see Supplementary file 1 for summary of structural and functional MRI parameters; McKay et al., 2022).
FC preprocessing and features
All MRI data were processed using common pipelines. Initial fMRI preprocessing followed conventional methods, as described previously (Shulman et al., 2010; Millar et al., 2022), including frame alignment, debanding, rigid body transformation, bias field correction, and normalization of within-run intensity values to a whole-brain mode of 1000 (Power et al., 2012). Transformation to an age-appropriate in-house atlas template (based on independent samples of either younger adults or CN older adults) was performed using a composition of affine transforms connecting the functional volumes with the T2-weighted and MPRAGE images. Frame alignment was included in a single resampling that generated a volumetric time series of the concatenated runs in isotropic 3 mm atlas space.
As described previously (Fox et al., 2009; Millar et al., 2022), additional processing was performed to allow for nuisance variable regression. Data underwent framewise censoring based on motion estimates (framewise displacement [FD] > 0.3 mm and/or derivative of variance [DVARS] > 2.5 above participant’s mean). To further minimize the confounding influence of head motion on FC estimates (Power et al., 2012) in all samples, we only included scans with low head motion (mean FD < 0.30 mm and > 50% frames retained after motion censoring). BOLD data underwent a temporal band-pass filter (0.005 Hz < f < 0.1 Hz) and nuisance variable regression, including motion parameters, timeseries from FreeSurfer 5.3-defined (Fischl, 2012) whole brain (global signal), CSF, ventricle, and white matter masks, as well as the derivatives of these signals. Finally, BOLD data were spatially blurred (6 mm full width at half maximum).
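To make the two motion criteria concrete, the following Python sketch applies framewise censoring and then the scan-level inclusion rule. It is an illustration under stated assumptions rather than the authors' pipeline: the function names are hypothetical, the DVARS criterion is read as "more than 2.5 units above the participant's mean DVARS", and mean FD is taken over all frames before censoring. Neither detail is specified in the text.

```python
from statistics import mean

FD_FRAME_MM = 0.3        # a frame is censored if FD exceeds this value
DVARS_ABOVE_MEAN = 2.5   # assumption: censored if DVARS > mean DVARS + 2.5
MAX_MEAN_FD_MM = 0.30    # scan-level limit on mean FD
MIN_RETAINED = 0.50      # scan-level limit on fraction of frames retained


def censor_frames(fd, dvars):
    """Return one boolean per frame, True where the frame is retained."""
    dvars_limit = mean(dvars) + DVARS_ABOVE_MEAN
    return [f <= FD_FRAME_MM and d <= dvars_limit for f, d in zip(fd, dvars)]


def scan_is_usable(fd, dvars):
    """Apply the scan-level rule: mean FD < 0.30 mm and > 50% frames kept."""
    keep = censor_frames(fd, dvars)
    retained = sum(keep) / len(keep)
    # Assumption: mean FD is computed over all frames, before censoring.
    return mean(fd) < MAX_MEAN_FD_MM and retained > MIN_RETAINED
```

A scan that retains exactly half of its frames is excluded under this reading, because the criterion is a strict "more than 50%".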
Final BOLD time series data were averaged across voxels within a set of 300 spherical regions of interest (ROIs) in cortical, subcortical, and cerebellar areas (Seitzman et al., 2020). For each scan, we calculated the 300×300 Fisher-transformed Pearson correlation matrix of the final averaged BOLD time series between all ROIs. We then used the vectorized upper triangle of each correlation matrix (excluding auto-correlations; 44,850 total correlations) as input features for predicting age. Since site and/or scanner differences between samples might confound neuroimaging estimates, we harmonized FC matrices using an empirical Bayes modeling approach (ComBat; Johnson et al., 2007; Fortin et al., 2017), which has previously been applied to FC data (Yu et al., 2018).
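The feature construction described above (pairwise Pearson correlation, Fisher transform, vectorized upper triangle) can be sketched as follows. This is a small pure-Python illustration with hypothetical function names, not the processing code used in the study; it omits the ComBat harmonization step and does not handle perfectly correlated series, for which the Fisher transform is undefined.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fc_features(roi_timeseries):
    """Fisher z-transformed correlations for every unique ROI pair.

    roi_timeseries: list of per-ROI time series (each a list of floats).
    Pairs are ordered as the row-wise upper triangle, diagonal excluded.
    """
    n_roi = len(roi_timeseries)
    features = []
    for i in range(n_roi):
        for j in range(i + 1, n_roi):
            r = pearson(roi_timeseries[i], roi_timeseries[j])
            features.append(math.atanh(r))  # Fisher r-to-z transform
    return features


def n_unique_edges(n_roi):
    """Number of off-diagonal upper-triangle entries: n(n-1)/2."""
    return n_roi * (n_roi - 1) // 2
```

For 300 ROIs, `n_unique_edges(300)` gives the 44,850 features reported above.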
Structural MRI processing and features
All T1-weighted images underwent cortical reconstruction and structural segmentation through a common pipeline with FreeSurfer 5.3 (Fischl et al., 2002; Fischl, 2012). Structural processing included segmentation of subcortical white matter and deep gray matter, intensity normalization, registration to a spherical atlas, and parcellation of the cerebral cortex based on the Desikan atlas (Desikan et al., 2006). Inclusion and exclusion errors of parcellation and segmentation were identified and edited by a centralized team of trained research technicians according to standardized criteria (Su et al., 2013). We then used the FreeSurfer-defined thickness estimates from 68 cortical regions (Desikan et al., 2006), along with volume estimates from 33 subcortical regions (Fischl et al., 2002) as input features for predicting age. We harmonized structural features across sites and scanners using the same ComBat approach (Johnson et al., 2007; Fortin et al., 2017), which has also been applied to structural MRI data (Fortin et al., 2018).
Gaussian process regression
As described previously (Millar et al., 2022), machine-learning analyses were conducted using the Regression Learner application in Matlab (MathWorks, 2021). We trained two unimodal Gaussian process regression (GPR; Rasmussen et al., 2004) models, each with a rational quadratic kernel function, to predict chronological age using fully-processed, harmonized MRI features (FC or structural) in the training set. The σ hyperparameter was tuned within each model by searching a range of values from 10⁻⁴ to 10 × SD_age using Bayesian optimization across 100 training evaluations. The optimal value of σ for each model was found (see Figure 1—figure supplement 2) and was applied for all subsequent applications of that model. All other hyperparameters were set to default values (basis function = constant and standardize = true).
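For readers unfamiliar with the kernel named above, the sketch below writes out the standard rational quadratic covariance function and the stated search range for σ. It is illustrative only: the study used Matlab's Regression Learner, the function and parameter names here are hypothetical, and two points are assumptions rather than statements from the text, namely that σ refers to the noise standard deviation of the GPR model and that SD_age is the sample standard deviation of age in the training set.

```python
import math


def rational_quadratic(x, y, sigma_f=1.0, length_scale=1.0, alpha=1.0):
    """Standard rational quadratic kernel:
    k(x, y) = sigma_f^2 * (1 + d^2 / (2 * alpha * l^2)) ** (-alpha),
    where d is the Euclidean distance between feature vectors x and y.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return sigma_f ** 2 * (1.0 + d2 / (2.0 * alpha * length_scale ** 2)) ** (-alpha)


def sigma_search_range(ages):
    """Bounds of the sigma search described in the text: 1e-4 to 10 * SD(age).

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    n = len(ages)
    m = sum(ages) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in ages) / (n - 1))
    return 1e-4, 10.0 * sd
```

The kernel equals `sigma_f ** 2` for identical inputs and decays smoothly with distance; `alpha` controls how heavy the tails are relative to a squared exponential kernel.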
Model performance in the training set was assessed using 10-fold cross validation via the Pearson correlation coefficient (r), the proportion of variance explained (R²), the mean absolute error (MAE), and root-mean-square error (RMSE) between true chronological age and the cross-validated age predictions merged across the 10 folds. We then evaluated generalizability of the models to predict age in unseen data by applying the trained models to the held-out test set of healthy controls. Finally, we applied the same fully-trained GPR models to separate analysis sets of 154 CI, 154 CN/A+, and 144 CN/A− controls to test our hypotheses regarding AD-related group effects and individual difference relationships. Unimodal models were each constructed with a single GPR model. The multimodal model was constructed by taking the ‘stacked’ predictions from each first-level unimodal model as features for training a second-level GPR model (Liem et al., 2017; Engemann et al., 2020; Dunås et al., 2021).
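The four performance metrics named above can be computed from paired true and predicted ages as in this sketch. The function name is hypothetical, and one choice is an assumption: the proportion of variance explained is computed here as the coefficient of determination (1 − SS_res / SS_tot), whereas the text does not say whether that definition or the squared Pearson correlation was used.

```python
import math


def performance(true_age, predicted_age):
    """Return Pearson r, R2, MAE, and RMSE for a set of age predictions."""
    n = len(true_age)
    mt = sum(true_age) / n
    mp = sum(predicted_age) / n
    errors = [p - t for t, p in zip(true_age, predicted_age)]
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mt) ** 2 for t in true_age)
    ss_pred = sum((p - mp) ** 2 for p in predicted_age)
    cov = sum((t - mt) * (p - mp) for t, p in zip(true_age, predicted_age))
    return {
        "r": cov / math.sqrt(ss_tot * ss_pred),
        "R2": 1.0 - ss_res / ss_tot,  # assumption: coefficient of determination
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
    }
```

For cross-validated performance, the predictions passed in would be the out-of-fold predictions merged across the 10 folds, as described above.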
For each participant, we calculated model-specific BAG estimates as the difference between predicted age and chronological age (predicted minus chronological, so that positive values indicate an older-appearing brain), using age predictions from the unimodal FC model (FC-BAG), structural model (S-BAG), and multimodal model (S+FC BAG). To correct for regression dilution commonly observed in similar models (Le et al., 2018; Smith et al., 2019; Liang et al., 2019), we included chronological age as a covariate in all statistical tests of BAG (Cole et al., 2017a; Le et al., 2018). However, to avoid inflating estimates of prediction accuracy (Butler et al., 2021), only uncorrected age prediction values were used for evaluating model performance in the training and test sets.
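As a small illustration of the BAG definition and of what controlling for age does, the sketch below computes BAG as predicted minus chronological age and then removes its linear dependence on age. The function names are hypothetical, and residualizing with a simple one-predictor regression is a simplification: in the regression models described in this paper, age enters as one covariate among several.

```python
def brain_age_gap(predicted_age, true_age):
    """BAG = predicted age minus chronological age (positive = older-appearing)."""
    return [p - t for p, t in zip(predicted_age, true_age)]


def residualize_on_age(bag, true_age):
    """Residuals from a simple linear regression of BAG on chronological age."""
    n = len(bag)
    mean_age = sum(true_age) / n
    mean_bag = sum(bag) / n
    slope = (sum((a - mean_age) * (b - mean_bag) for a, b in zip(true_age, bag))
             / sum((a - mean_age) ** 2 for a in true_age))
    intercept = mean_bag - slope * mean_age
    return [b - (intercept + slope * a) for a, b in zip(true_age, bag)]
```

Because regression dilution pulls predictions toward the sample mean, raw BAG tends to be positive in younger and negative in older participants; the residuals are uncorrelated with age by construction.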
Statistical analysis
All statistical analyses were conducted in R 4.0.2 (R Development Core Team, 2020). Demographic differences in the AD samples were tested with independent-samples t tests for continuous variables and χ² tests for categorical variables, using CN/A− controls as a reference group. Differences in brain age model performance were tested using Williams’s test of difference between dependent correlations sharing one variable, i.e., Pearson’s r between age and each model prediction of age. To correct for age-related bias in BAG (Le et al., 2018; as previously mentioned), we controlled for age as a covariate during all statistical tests. Group differences in each BAG estimate were tested using an omnibus ANOVA test with follow-up pairwise t tests on age-residualized BAG estimates, using a false discovery rate (FDR) correction for multiple comparisons. Assumptions of normality were tested by visual inspection of quantile-quantile plots. Assumptions of equality of variance were tested with Levene’s test. Linear regression models tested the effects of cognitive impairment (CDR > 0 vs. CDR 0) and amyloid positivity (A− vs. A+) on BAG estimates from each model, controlling for true age (as noted above), sex, and years of education. Given the potential confounding influence of head motion on FC-derived measures (Power et al., 2012; Van Dijk et al., 2012; Satterthwaite et al., 2012), we also included mean FD as an additional covariate of non-interest in the FC and S+FC models. We tested continuous relationships with AD biomarkers and cognitive estimates using linear regression models, including the same demographic and motion covariates. Since the range of amyloid biomarkers was drastically reduced in the CN/A− sample, we excluded these participants from models testing continuous amyloid relationships. Effect sizes were computed as partial η² (ηp²).
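Williams's test, used above to compare model accuracy, can be written compactly. The sketch below implements the standard formula for two dependent correlations that share one variable; here variable 1 would be chronological age and variables 2 and 3 the predictions of two models. The analyses in this paper were run in R, so this Python function and its name are illustrative only.

```python
import math


def williams_t(r12, r13, r23, n):
    """Williams's t for H0: r12 == r13, where variable 1 is shared.

    r12, r13: correlations of the shared variable with variables 2 and 3.
    r23: correlation between variables 2 and 3.
    n: sample size. Returns (t, degrees of freedom), with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix.
    det = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar ** 2 * (1 - r23) ** 3
    t = (r12 - r13) * math.sqrt((n - 1) * (1 + r23) / denom)
    return t, n - 3
```

The statistic is positive when the first model correlates more strongly with age than the second, and swapping the two models flips its sign.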
Results
Sample description and demographics
Demographic characteristics of the training sets, test sets, and analysis sets are reported in Table 1. CN/A+ participants were older (t = 6.15, p < 0.001) and more likely to be APOE ε4 carriers (χ² = 34.73, p < 0.001) than amyloid-negative controls. Furthermore, compared to amyloid-negative controls, CI participants were older (t = 9.71, p < 0.001), more likely to be male (χ² = 8.60, p = 0.003), and more likely to be APOE ε4 carriers (χ² = 56.67, p < 0.001), and they had fewer years of education (t = 2.03, p = 0.043) and lower MMSE scores (t = 12.46, p < 0.001).
Comparison of model performance
All models accurately predicted chronological age in the training sets, as assessed using 10-fold cross validation, as well as in the held-out test sets. Overall, prediction accuracy was lowest in the FC model (training: MAE = 8.67 years, R² = 0.68; test: MAE = 8.25 years, R² = 0.73; see Figure 1A & B). The structural MRI model (training: MAE = 5.97 years, R² = 0.81; test: MAE = 6.26 years, R² = 0.82; see Figure 1C & D) significantly outperformed the FC model in age prediction accuracy (Williams’s t for S vs. FC = 5.39, p < 0.001). There was a significant, but modest, positive correlation between FC-BAG and S-BAG in the adult lifespan CN/A− training and testing sets (r = 0.095, p = 0.036; see Figure 1—figure supplement 3A), as well as the AD analysis sets (r = 0.134, p = 0.004; see Figure 1—figure supplement 3B).

Figure 1. Performance of the brain age models in the training (left column) and test sets (right column) for each modality: functional connectivity (FC; A and B), structural MRI (S; C and D) and multimodal models (S+FC; E and F).
Age predicted by each model (y axis) is plotted against true age (x axis). Colored lines and shaded areas represent regression lines and 95% confidence regions. Dashed black lines represent perfect prediction. Model performance is evaluated by Pearson’s r, proportion of variance explained (R²), mean absolute error (MAE), and root-mean-square error (RMSE).
Finally, the multimodal model (training: MAE = 5.34 years, R² = 0.86; test: MAE = 5.25 years, R² = 0.87; see Figure 1E & F) significantly outperformed both the FC model (Williams’s t for S+FC vs. FC = 11.20, p < 0.001) and the structural MRI model (Williams’s t for S+FC vs. S = 5.67, p < 0.001). It is possible that the modest increase in performance of the multimodal model was due to capitalizing on noise, simply by adding more features to the structural model. Hence, we also compared the observed R² of the S+FC model to a bootstrapped distribution of R² performance estimates from 1000 resamples using a model in which the original structural MRI model was stacked with a model trained on randomly reshuffled FC features. Thus, this distribution represents the expected improvements in model performance from simply adding new features to the structural MRI model with the stacked approach. The observed R² of the S+FC model outperformed all R² estimates from this bootstrapped distribution (p < 0.001; see Figure 1—figure supplement 4), suggesting that the modest increase in model performance observed in the stacked multimodal (S+FC) model over the unimodal structural model is due to meaningful age-related FC signal, rather than capitalizing on noise in a larger feature set.
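The comparison of an observed R² against a distribution of R² values from reshuffled-feature models amounts to an empirical one-sided p-value. The sketch below shows one common way to compute it; the function name is hypothetical, and the "+1" convention is an assumption, since the text does not state how the reported p-value was calculated.

```python
def empirical_p(observed, null_values):
    """One-sided empirical p-value for an observed statistic.

    Counts the null values that are at least as large as the observed value.
    Assumption: the +1 terms treat the observed value as one more draw, so
    the smallest attainable p with 1000 null values is 1/1001.
    """
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)
```

With 1000 null estimates and none reaching the observed value, this convention yields p = 1/1001, which is consistent with the p < 0.001 reported above.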
BAG differences in cognitive impairment and amyloid positivity
Residual FC-BAG was normally distributed (see Figure 2—figure supplement 1), and variance in FC-BAG did not significantly differ between the analysis sets, Levene’s statistic = 0.01, p = 0.988. An omnibus ANOVA revealed significant differences in residual FC-BAG across the three groups, F(2,449) = 9.80, p < 0.001. FC-BAG was 2.17 years higher in CI participants compared to CN controls (β = 2.17, p = 0.030, ηp2 = 0.01; see Figure 2A&B, Table 2A). Follow-up t tests revealed that residual FC-BAG was significantly elevated in CI relative to CN/A+ participants (pFDR < 0.001). FC-BAG was also 1.64 years lower in A+ participants compared to A− (β = –1.64, p = 0.035, ηp2 = 0.01), controlling for global CDR and the other covariates. Follow-up t tests revealed that residual FC-BAG was significantly lower in CN/A+ participants compared to CN/A− controls (pFDR = 0.002).
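"Residual" BAG here refers to the brain age gap after controlling for true age. A minimal sketch of one way to obtain such a residual is shown below, assuming a simple linear regression of BAG on age; the function name and toy values are hypothetical, and the sample used to fit the correction is a design choice not shown here.

```python
def residual_bag(true_age, predicted_age):
    """Brain age gap (predicted - true) with the linear age trend removed."""
    n = len(true_age)
    bag = [p - a for a, p in zip(true_age, predicted_age)]

    mean_age = sum(true_age) / n
    mean_bag = sum(bag) / n
    # Ordinary least squares slope and intercept of BAG regressed on age
    sxy = sum((a - mean_age) * (b - mean_bag) for a, b in zip(true_age, bag))
    sxx = sum((a - mean_age) ** 2 for a in true_age)
    slope = sxy / sxx
    intercept = mean_bag - slope * mean_age

    # Residual = observed BAG minus the BAG expected at that age
    return [b - (intercept + slope * a) for a, b in zip(true_age, bag)]

# Toy values for illustration only (not study data)
ages = [20, 40, 60, 80]
predictions = [30, 45, 58, 70]
residuals = residual_bag(ages, predictions)
```

By construction, the residuals have zero mean and are uncorrelated with age in the sample used to fit the regression.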

Group differences in functional connectivity (FC; A and B), structural (S; C and D), and multimodal (S+FC; E and F) brain age in the analysis sets.
Comparisons are presented between cognitively normal (Clinical Dementia Rating [CDR] = 0) biomarker-negative controls (CN/A−; blue) vs. CN/A+ (green) vs. cognitively impaired participants (CI, red). Scatterplots (A, C, and E) show predicted vs. true age for each group. Colored lines and shaded areas represent group-specific regression lines and 95% confidence regions. Dashed black lines represent perfect prediction. Violin plots (B, D, and F) show residual FC-brain age gap (BAG; controlling for true age) in each group. p values are reported from pairwise independent-samples t tests.
Linear regression models predicting functional connectivity (FC)-brain age gap (BAG) (A), S-BAG (B), and S+FC BAG (C).
CDR = Clinical Dementia Rating. FD = framewise displacement.
Predictor | A. FC-BAG (df = 348): Estimate | A: SE | A: p value | A: ηp2 | B. S-BAG (df = 349): Estimate | B: SE | B: p value | B: ηp2 | C. S+FC BAG (df = 348): Estimate | C: SE | C: p value | C: ηp2
---|---|---|---|---|---|---|---|---|---|---|---|---
Intercept | 30.903 | 3.809 | < 0.001 |  | 5.830 | 4.899 | 0.235 |  | 11.755 | 4.197 | 0.005 | 
CDR > 0 | 2.169 | 0.997 | 0.030 | 0.013 | 5.105 | 1.287 | < 0.001 | 0.043 | 4.305 | 1.099 | < 0.001 | 0.042
Amyloid+ | –1.640 | 0.776 | 0.035 | 0.013 | 0.900 | 1.002 | 0.369 | 0.002 | 0.060 | 0.855 | 0.944 | 0.000
Age (y) | –0.586 | 0.044 | < 0.001 | 0.335 | –0.151 | 0.057 | 0.008 | 0.020 | –0.201 | 0.049 | < 0.001 | 0.047
Sex = female | –1.174 | 0.700 | 0.094 | 0.008 | 1.792 | 0.904 | 0.048 | 0.011 | 0.691 | 0.771 | 0.371 | 0.002
Education (y) | –0.006 | 0.127 | 0.964 | 0.000 | –0.155 | 0.164 | 0.345 | 0.003 | –0.152 | 0.140 | 0.276 | 0.003
Mean FD | 5.528 | 5.467 | 0.313 | 0.003 | NA | NA | NA | NA | 4.893 | 6.024 | 0.417 | 0.002
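As a reading aid for the table, the partial eta squared of a single-degree-of-freedom term in a linear model can be recovered from its t statistic and the residual degrees of freedom, since ηp2 = t² / (t² + df). The sketch below applies this identity to the printed CDR > 0 rows; the function name is hypothetical, and because the printed estimates and standard errors are rounded, other rows may differ slightly in the third decimal.

```python
def partial_eta_squared(estimate, se, df_resid):
    """Partial eta squared for a 1-df term: t^2 / (t^2 + residual df)."""
    t = estimate / se
    return t**2 / (t**2 + df_resid)

# CDR > 0 rows, using the printed (rounded) estimates and standard errors
fc = round(partial_eta_squared(2.169, 0.997, 348), 3)    # 0.013
s = round(partial_eta_squared(5.105, 1.287, 349), 3)     # 0.043
s_fc = round(partial_eta_squared(4.305, 1.099, 348), 3)  # 0.042
```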
Residual S-BAG was also normally distributed (see Figure 2—figure supplement 1), and variance in S-BAG did not significantly differ between the analysis sets, Levene’s statistic = 0.10, p = 0.902. An omnibus ANOVA revealed significant differences in residual S-BAG across the three groups, F(2,449) = 20.64, p < 0.001. S-BAG was 5.10 years older in CI participants compared to CN controls (β = 5.10, p < 0.001, ηp2 = 0.04; see Figure 2C&D, Table 2B). Follow-up t tests revealed that residual S-BAG was significantly elevated in CI participants relative to CN/A− and CN/A+ participants (pFDR’s < 0.001). S-BAG did not significantly differ as a function of amyloid positivity, controlling for CDR and the other covariates.
Residual S+FC-BAG was also normally distributed (see Figure 2—figure supplement 1), and variance in S+FC-BAG did not significantly differ between the analysis sets, Levene’s statistic = 0.89, p = 0.412. An omnibus ANOVA revealed significant differences in residual S+FC-BAG across the three groups, F(2,449) = 21.84, p < 0.001. S+FC-BAG was 4.31 years higher in CI participants compared to CN controls (β = 4.31, p < 0.001, ηp2 = 0.04; see Figure 2E&F, Table 2C). Follow-up t tests revealed that residual S+FC-BAG was significantly elevated in CI participants relative to CN/A− and CN/A+ participants (pFDR’s < 0.001). S+FC-BAG did not significantly differ as a function of amyloid positivity, controlling for CDR and the other covariates.
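The follow-up t tests above report FDR-corrected p values (pFDR). A minimal sketch of such a correction is given below, assuming the Benjamini-Hochberg procedure; the text does not state which FDR method was used, so this choice, the function name, and the example p values are all assumptions for illustration.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values from three pairwise comparisons
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```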
Relationships with amyloid markers
355 participants (144 CN/A−, 154 CN/A+, 57 CI) had an available amyloid PET scan, and 300 (120 CN/A−, 137 CN/A+, 43 CI) had an available CSF estimate of Aβ42/40. In the FC model, FC-BAG was not significantly related to amyloid PET, nor was there an interactive relationship with amyloid PET between groups (see Figure 3A). There were also no significant main effects or interactions between FC-BAG, S-BAG, or S+FC BAG and CSF Aβ42/40 (see Figure 3B, D, and F).

Continuous relationships between amyloid biomarkers and functional connectivity (FC-brain age gap [BAG]; A and B), structural (S-BAG; C and D), and multimodal (S+FC BAG; E and F) BAG in the analysis sets.
Scatterplots show amyloid PET (A, C, and E) and CSF Aβ42/40 (B, D, and F) as a function of residual BAG (controlling for true age) in each group. Colored lines and shaded areas represent group-specific regression lines and 95% confidence regions. Dashed black lines represent main effect regression lines across all groups.
In the structural and multimodal models, there were significant main effects, such that greater S-BAG (β = 0.79, p = 0.004, ηp2 = 0.041; see Figure 3C) and greater S+FC BAG (β = 0.81, p = 0.015, ηp2 = 0.029; see Figure 3E) were both associated with greater amyloid PET. In the multimodal model only, this relationship was further characterized by a non-significant interaction (β = 1.16, p = 0.087, ηp2 = 0.014), such that the association was significantly positive in CI participants (β = 1.53, p = 0.029, ηp2 = 0.092) but not in CN/A+ participants (β = –0.05, p = 0.881, ηp2 = 0.001).
Relationships with tau markers
99 participants (42 CN/A–, 40 CN/A+, 17 CI) had an available tau PET scan, and 300 (120 CN/A–, 137 CN/A+, 43 CI) had an available CSF estimate of pTau-181/Aβ40. In the FC model, FC-BAG was not significantly related with tau PET or CSF pTau-181/Aβ40 (see Figure 4A and B). However, there was a non-significant interaction, suggesting a more positive association between CSF pTau-181/Aβ40 and FC-BAG in CI participants but not in CN controls (β = 0.02, p = 0.059, ηp2 = 0.016).

Continuous relationships between tau biomarkers and functional connectivity (FC-brain age gap [BAG]; A and B), structural (S-BAG; C and D), and multimodal (S+FC BAG; E and F) BAG in the analysis sets.
Scatterplots show Tau PET summary (A, C, and E) and log-transformed CSF pTau/Aβ40 (B, D, and F) as a function of residual BAG (controlling for true age) in each group. Colored lines and shaded areas represent group-specific regression lines and 95% confidence regions. Dashed black lines represent main effect regression lines across all groups.
In the structural and multimodal models, there were significant main effects, such that greater S-BAG (β = 0.02, p < 0.001, ηp2 = 0.141; see Figure 4C) and greater S+FC BAG (β = 0.02, p = 0.001, ηp2 = 0.110; see Figure 4E) were both associated with greater tau PET. These main effects were further characterized by significant interactions (S-BAG: β = 0.04, p < 0.001, ηp2 = 0.176; S+FC-BAG: β = 0.07, p < 0.001, ηp2 = 0.250), such that the positive association was only observed in CI participants, but not in the other groups.
Consistent with tau PET, CSF pTau/Aβ40 demonstrated similar interactive relationships, such that greater S-BAG (β = 0.02, p < 0.001, ηp2 = 0.052; see Figure 4D) and greater S+FC BAG (β = 0.04, p < 0.001, ηp2 = 0.075; see Figure 4F) were both associated with greater CSF pTau/Aβ40 in the CI participants, but not in the other groups.
Relationships with cognition
445 participants (144 CN/A−, 153 CN/A+, 148 CI) had available performance measures from the cognitive composite tasks. In the FC model, there was a significant main effect, such that across all groups, greater FC-BAG was associated with lower cognitive composite score (β = –0.01, p = 0.006, ηp2 = 0.017; see Figure 5A). However, this effect was driven by group differences in both variables, as there were neither relationships between FC-BAG and cognition within any of the groups nor were there any significant interactions.

Continuous relationships between global cognition and functional connectivity (FC-brain age gap [BAG]; A), structural (S-BAG; B), and multimodal (S+FC BAG; C) in the analysis sets.
Scatterplots show global cognition as a function of residual BAG (controlling for true age) in each group. Colored lines and shaded areas represent group-specific regression lines and 95% confidence regions. Dashed black lines represent main effect regression lines across all groups.
In the structural and multimodal models, there were significant main effects, such that greater S-BAG (β = –0.03, p < 0.001, ηp2 = 0.104; see Figure 5B) and greater S+FC BAG (β = –0.03, p < 0.001, ηp2 = 0.096; see Figure 5C) were both associated with lower cognitive composite scores. Both effects were further characterized by significant interactions such that the negative associations were observed in the CI participants, but not in the other groups (S-BAG: β = –0.03, p < 0.001, ηp2 = 0.045; S+FC-BAG: β = –0.04, p < 0.001, ηp2 = 0.047).
Discussion
We first found that machine-learning models successfully predicted age when trained on FC, structural MRI, and multimodal datasets. As expected, the structural model predicted age with greater accuracy than the FC model, but the multimodal model outperformed both unimodal models. Second, BAG estimates from all models were significantly elevated in CI participants compared to CN controls. BAG estimates in the FC model were significantly reduced in cognitively normal participants with elevated amyloid, but no structural group differences were observed in presymptomatic stages. Third, interactive relationships were observed, such that greater BAG was associated with greater continuous AD biomarker load in CI, but not in CN, participants. Specifically, in the FC model, such a pattern only appeared in a non-significant interaction predicting CSF pTau/Aβ40. However, in the structural model, these interactions were significant for CSF pTau/Aβ40 and tau PET. In the multimodal model, these same interactions were also observed, in addition to a non-significant interaction with amyloid PET. Finally, regarding cognitive relationships, similar interactive patterns were observed, such that in CI participants, greater BAG estimates from structural and multimodal models were associated with lower cognitive performance; however, this relationship was not observed in the FC model.
Predicting brain age with multiple modalities
We found that a GPR model trained on structural MRI features predicted chronological age in a cognitively normal, amyloid-negative adult sample with an R2 of 0.81. This level of performance is comparable to other structural models, which have reported R2s ranging from 0.80 to 0.95 (Cole and Franke, 2017b; Liem et al., 2017; Eavani et al., 2018; Wang et al., 2019; Bashyam et al., 2020; Ly et al., 2020; Gong et al., 2021; Lee et al., 2022). As previously reported (Millar et al., 2022), the FC-trained model predicted age with an R2 of 0.68, again consistent with previous FC models, which have achieved R2s from 0.53 to 0.80 (Liem et al., 2017; Eavani et al., 2018; Gonneaud et al., 2021). Our observation that structural MRI outperformed FC in age prediction is also consistent with previous direct comparisons between modalities (Liem et al., 2017; Eavani et al., 2018; Dunås et al., 2021).
Importantly, however, there was only a modest positive correlation between FC and structural BAG estimates, after correcting for age-related biases, suggesting that functional and structural MRI capture distinct age-related signals. Indeed, the multimodal model outperformed both unimodal models by integrating these complementary signals. These observations, again, are consistent with other recent reports of multimodal age prediction models (Liem et al., 2017; Eavani et al., 2018; Engemann et al., 2020; Dunås et al., 2021). Future models may improve age prediction accuracy by combining data from structural, FC, and/or other neuroimaging modalities, several of which may be available in typical MRI sessions of multiple sequences.
BAG as a marker of cognitive impairment
Structural BAG was elevated by 5.10 years in CI participants compared to CN controls. This effect is comparable to previous structural age prediction models, which have demonstrated elevations of between 5 and 10 years in AD and MCI samples (Cole and Franke, 2017b; Franke and Gaser, 2019). As previously reported, FC BAG was also elevated in CI participants, but to a relatively smaller extent, i.e., 2.17 years (Millar et al., 2022). The multimodal BAG was similarly elevated in CI participants by 4.31 years. Thus, each model is clearly sensitive to group differences in AD status at the symptomatic stage.
Consistent with one previous report (Lee et al., 2022), we demonstrated that within the CI participants, BAG estimates were related to individual differences in AD biomarkers and cognitive function. These effects were most pronounced in the structural model, which showed relationships with tau biomarkers and cognition in the CI participants, and the multimodal model, which showed relationships with tau, cognition, and amyloid PET. Thus, age prediction models that include structural MRI (including unimodal and multimodal approaches) may be useful in tracking AD pathological progression and cognitive decline within the symptomatic stage of the disease.
BAG as a marker of presymptomatic AD
We found that structural and multimodal BAG did not differ between cognitively normal participants with and without amyloid pathology. In cognitively normal participants, structural BAG estimates did not significantly associate with individual differences in any AD biomarkers. Overall, although structural and multimodal BAG estimates track well with some biomarkers of AD pathophysiology, as previously reported (Lee et al., 2022), our novel results suggest that these relationships are not observed until the symptomatic stage of the disease, at which point structural changes become more apparent.
As we have previously reported (Millar et al., 2022), FC-BAG was lower in presymptomatic AD participants compared to amyloid-negative controls. Extending beyond this group difference, we now also note that FC-BAG was negatively associated with amyloid PET in CN/A+ participants. The combined reduction of FC-BAG in the presymptomatic stage and increase in the symptomatic stage suggest a biphasic functional response to AD progression, which is partially consistent with some prior suggestions (Jagust and Mormino, 2011; Jones et al., 2016; Jones et al., 2017; Schultz et al., 2017; Wales and Leung, 2021; see Millar et al., 2022 for a more detailed discussion).
Interpretation of this biphasic pattern is still unclear, although the present results provide at least one novel insight. Specifically, one potential interpretation is that the ‘younger’ appearing FC pattern in the presymptomatic stage may reflect a compensatory response to early AD pathology (Cabeza et al., 2018). This interpretation leads to the prediction that reduced FC-BAG should be associated with better cognitive performance in the preclinical stage. However, this interpretation is not supported by the current results, as FC-BAG did not correlate with cognition in any of the analysis samples.
Alternatively, pathological AD-related FC disruptions may be orthogonal to healthy age-related FC differences, as supported by our previous observation that age and AD are predicted by mostly non-overlapping FC networks (Millar et al., 2022). For instance, the ‘younger’ FC pattern in CN/A+ participants may be driven by hyper-excitability in the preclinical stage (Harris et al., 2020; Ranasinghe et al., 2022). It is also worth considering that patterns of younger FC-BAG in CN/A+ participants may somehow correspond to a recent observation that patterns of youthful-appearing aerobic glycolysis are relatively preserved in the presymptomatic stage of AD (Goyal et al., 2022). Finally, this effect may simply be spuriously driven by poor performance of the FC brain age model, sample-specific noise, and/or statistical artifacts related to regression dilution and its correction (Butler et al., 2021). Hence, future studies should attempt to replicate these results in independent samples and further test potential theoretical interpretations.
BAG as a marker of cognition
Although FC-BAG was not associated with individual differences in a global cognitive composite within any of our analysis samples, greater structural and multimodal BAG estimates were associated with lower cognitive performance within the CI participants. Hence, these estimates may be sensitive markers of cognitive decline in the symptomatic stage. This finding is consistent with previous reports that other structural brain age estimates are associated with cognitive performance in AD (Eavani et al., 2018), Down syndrome (Cole et al., 2017a), HIV (Petersen et al., 2021; Petersen et al., 2022), as well as cognitively normal controls (Richard et al., 2018).
Limitations and future directions
The training sets included MRI scans from a range of sites, scanners, and acquisition sequence parameters, which may introduce noise and/or confounding variance into MRI features. We attempted to mitigate this problem by: (1) including only data from Siemens 3T scanners with similar protocols; (2) processing all MRI data through common pipelines and quality assessments; and (3) harmonizing across sites and scanners with ComBat (Fortin et al., 2017).
Additionally, the training set (N = 390) was relatively small compared to prior models, which have included training samples over 1,000 (e.g., Cole et al., 2015; Bashyam et al., 2020). Future studies may further improve model performance by including larger samples of well-characterized participants in the training set.
Although we took appropriate steps to detect and control for AD-related pathology in the CN/A− training sets, we were unable to control for other non-AD pathologies, e.g., Lewy body disease, TDP-43, etc., which may be present.
Structural MRI was quantified using the Desikan atlas (Desikan et al., 2006), which, although widely used, provides a relatively coarse parcellation of structural anatomy and, moreover, does not align with the parcellation used to define FC regions (Seitzman et al., 2020). Although the structural MRI data still outperformed FC in predicting age, future brain age models may further improve performance by using more refined and harmonized anatomical parcellations to define brain regions.
The sample sizes of the continuous biomarker and cognitive analyses differed across measures, depending on availability, and were particularly low for analyses of tau PET. Future studies might improve upon this approach by using a larger and more complete biomarker sample.
Moreover, estimates of BAG likely capture variance in early-life factors, which may obscure associations with AD and cognition, especially in cross-sectional designs (Vidal-Piñeiro et al., 2021). Future studies may improve the sensitivity of BAG estimates to disease-related markers by testing associations with longitudinal change.
Finally, although the Ances lab controls were relatively diverse, participants in other samples were mostly white and highly educated. Hence, these models may not be generalizable to broader samples. Future models would benefit by using more representative training samples.
Conclusions
We compared three MRI-based machine-learning models in their ability to predict age, as well as their sensitivity to early-stage AD, AD biomarkers, and cognition. Although FC and structural MRI models were both successful in detecting differences related to healthy aging and cognitive impairment, we note clear evidence that these modalities capture complementary signals. Specifically, FC-BAG was uniquely reduced in cognitively normal participants with elevated amyloid, although the interpretation of this finding still warrants further investigation. In contrast, structural BAG was uniquely associated with biomarkers of AD pathology and cognitive function within the CI participants. Finally, the multimodal age prediction model, which combined FC and structural MRI, further improved the prediction of healthy age differences and also was related to biomarkers and cognition in CI participants. Thus, multimodal brain age models may be useful for maximizing sensitivity to AD across the spectrum of disease progression.
Data availability
This project utilized datasets obtained from the Knight ADRC and DIAN. The Knight ADRC and DIAN encourage and facilitate research by current and new investigators, and thus, the data and code are available to all qualified researchers after appropriate review. Requests for access to the data used in this study may be placed to the Knight ADRC Leadership Committee (https://knightadrc.wustl.edu/professionals-clinicians/request-center-resources/) and the DIAN Steering Committee (https://dian.wustl.edu/our-research/for-investigators/dian-observational-study-investigator-resources/data-request-form/). Requests for access to the Ances lab data may be placed to the corresponding author. Code used in this study is available at https://github.com/peterrmillar/MultimodalBrainAge (copy archived at swh:1:rev:de233b8fe813f5fcca317ce0a6353047f0dfbb92).
References
- An analysis of certain psychological tests used for the evaluation of brain injury. Psychological Monographs 60:i1–i48. https://doi.org/10.1037/h0093567
- Maintenance, reserve and compensation: the cognitive neuroscience of healthy ageing. Nature Reviews. Neuroscience 19:701–710. https://doi.org/10.1038/s41583-018-0068-2
- Importance of multimodal MRI in characterizing brain tissue and its potential application for individual age prediction. IEEE Journal of Biomedical and Health Informatics 20:1232–1239. https://doi.org/10.1109/JBHI.2016.2559938
- Early clinical PET imaging results with the novel PHF-tau radioligand [F-18]-T807. Journal of Alzheimer’s Disease 34:457–468. https://doi.org/10.3233/JAD-122059
- Prediction of brain age suggests accelerated atrophy after traumatic brain injury. Annals of Neurology 77:571–581. https://doi.org/10.1002/ana.24367
- Predicting age using neuroimaging: innovative brain ageing biomarkers. Trends in Neurosciences 40:681–690. https://doi.org/10.1016/j.tins.2017.10.001
- The global signal and observed anticorrelated resting state brain networks. J Neurophysiol 101:3270–3283. https://doi.org/10.1152/jn.90777.2008
- Advanced brainage in older adults with type 2 diabetes mellitus. Frontiers in Aging Neuroscience 5:90. https://doi.org/10.3389/fnagi.2013.00090
- The clinical use of structural MRI in Alzheimer disease. Nature Reviews. Neurology 6:67–77. https://doi.org/10.1038/nrneurol.2009.215
- Book: Boston Diagnostic Aphasia Examination Booklet, III: Oral Expression: Animal Naming Fluency in Controlled Association. Philadelphia: Lea & Febiger.
- Lifespan brain activity, β-amyloid, and Alzheimer’s disease. Trends in Cognitive Sciences 15:520–526. https://doi.org/10.1016/j.tics.2011.09.004
- Tau, amyloid, and cascading network failure across the Alzheimer’s disease spectrum. Cortex; a Journal Devoted to the Study of the Nervous System and Behavior 97:143–159. https://doi.org/10.1016/j.cortex.2017.09.018
- Imaging brain amyloid in Alzheimer’s disease with Pittsburgh compound-B. Annals of Neurology 55:306–319. https://doi.org/10.1002/ana.20009
- Accelerated brain aging in schizophrenia and beyond: a neuroanatomical marker of psychiatric disorders. Schizophrenia Bulletin 40:1140–1153. https://doi.org/10.1093/schbul/sbt142
- A nonlinear simulation framework supports adjusting for age when analyzing brainage. Frontiers in Aging Neuroscience 10:317. https://doi.org/10.3389/fnagi.2018.00317
- Evaluating cognitive relationships with resting-state and task-driven blood oxygen level-dependent variability. Journal of Cognitive Neuroscience 33:279–302. https://doi.org/10.1162/jocn_a_01645
- Accelerated brain aging and cerebral blood flow reduction in persons with human immunodeficiency virus. Clinical Infectious Diseases 73:1813–1821. https://doi.org/10.1093/cid/ciab169
- Machine learning quantifies accelerated white-matter aging in persons with HIV. The Journal of Infectious Diseases 226:49–58. https://doi.org/10.1093/infdis/jiac156
- Book: Advanced lectures on machine learning. In: Carbonell JG, Siekmann J, editors. Advanced Lectures on Machine Learning. Berlin, Heidelberg: Springer-Verlag. pp. 63–71. https://doi.org/10.1007/b100712
- Software: R: A language and environment for statistical computing, version 1.2.1. R Foundation for Statistical Computing, Vienna, Austria.
- Cerebrospinal fluid biomarkers measured by elecsys assays compared to amyloid imaging. Alzheimer’s & Dementia 14:1460–1469. https://doi.org/10.1016/j.jalz.2018.01.013
- Utilizing the centiloid scale in cross-sectional and longitudinal PIB PET studies. NeuroImage. Clinical 19:406–416. https://doi.org/10.1016/j.nicl.2018.04.022
- Imaging and cerebrospinal fluid biomarkers in early preclinical Alzheimer disease. Annals of Neurology 80:379–387. https://doi.org/10.1002/ana.24719
- Correspondence of CSF biomarkers measured by lumipulse assays with amyloid PET. Alzheimer’s & Dementia 17:S5. https://doi.org/10.1002/alz.051085
Decision letter
- Karla L Miller, Reviewing Editor; University of Oxford, United Kingdom
- Jeannie Chin, Senior Editor; Baylor College of Medicine, United States
- James Cole, Reviewer; Centre for Medical Image Computing, Department of Computer Science, University College London; Dementia Research Centre, Institute of Neurology, University College London, London, United Kingdom
- Didac Vidal-Pineiro, Reviewer; University of Oslo, Norway
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "Multimodal brain age estimates relate to Alzheimer disease biomarkers and cognition in early stages: a cross-sectional observational study" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Jeannie Chin as the Senior Editor. The following individuals involved in the review of your submission have agreed to reveal their identity: James Cole (Reviewer #1); Didac Vidal-Pineiro (Reviewer #3).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Essential revisions:
The overall consensus from the reviewers is that some of the hypotheses (and conclusions) are supported by the results (specifically, the Vol-BAG model) while others are much weaker (specifically, the FC-BAG models). In consultation between the reviewers, they agreed that the work is "useful" but found the evidence to be "incomplete" for the conclusions as presented.
1. The functional connectivity model provides a poor fit to the data. Given that the FC-based models have been recently published (Millar 2022), the reviewers feel that the authors should temper claims about the importance of the FC-based modelling.
2. The reviewers are skeptical about claims regarding the improvements afforded by multi-modal brain age models. In particular, the reviewers are not convinced that the bootstrapping analyses actually support the claims that FC data improve the quality of brain age modelling.
3. Overall, the reviewers feel that some of the conclusions, such as the biphasic relationships between functional brain-age models and pathological status, are not strongly supported and need to be tempered. In particular, the reviewers object to referring to results as "marginal".
4. Discussion about the potential implications of sample size would be welcome.
Reviewer #1 (Recommendations for the authors):
– In the Methods, they say they used a Gaussian mixture model to define pTau positivity. There are multiple ways to implement GMMs, so more details should be included here.
– The presentation of the MRI Acquisition section in the Methods is not very clear. I suggest the authors consider an alternative format, possibly a supplementary table, where the acquisition details can be more easily appraised. Currently, the acquisition details on the DIAN participants are scarce relative to the ADRC participants.
– Can the authors explain and justify why the fMRI processing included registration to an older-adult template? Could this have caused a bias in the registration accuracy for younger participants?
– It is unclear to me why they chose to perform 10-fold CV and hold-out validation with 1000 bootstraps. To my mind, the latter would have been sufficient. If the authors think including the initial 10-fold CV as well is important, this should be clearly justified.
– It is important that R2 is reported for each model performance, not just MAE. As R2 is a ratio the values can readily be compared across published studies, while the MAE cannot as it is heavily dependent on the age distribution of the test set. For completeness, they could also consider reporting the Pearson's r correlation between predicted age and age, and the root mean square error as well.
– It is unclear how the model performance comparisons were conducted (Results, pg. 12). While t-tests are mentioned in the text, the exact details should be included in the Methods. My concern here is that the n (sample size) for these comparisons is based on the number of bootstraps (arbitrarily determined by the authors to be 1000), rather than the actual sample size. If that is the case (and Figure 1D suggests it is), this procedure is incorrect, as the sensitivity that these tests have to detect differences would be purely a factor of the number of bootstraps, rather than the number of observations. This means that the experimenter can make smaller differences 'significant' simply by adding more bootstraps. This needs to be clarified and corrected if appropriate. One approach to achieve the goal of comparing model performances is to take Pearson's correlations with age from each model and use Z-transformations to test the alternative hypothesis that the correlations are different (e.g., the Steiger test). In that way, the n would be determined by the number of observations, so statistical power would appropriately reflect the data.
– I recommend avoiding saying things like 'marginally lower' when a p-value = 0.110. There's no real evidence that there's a difference here, so hard to say whether it's truly lower or not. Generally avoiding 'trends' at 0.1> p >0.05 is best practice. P-values are important, but effect sizes (with confidence intervals) are often more informative.
– In the Discussion, when comparing age prediction accuracy between studies, it's important not to rely on MAE alone as this can vary greatly as a function of the test set age distribution. They should use R2 instead. Where R2 is unavailable, it's essential that the age range of each study mentioned in comparison is reported to provide context to the MAE values.
– The evidence for a biphasic relationship between FC-BAG and pre-clinical/clinical status is somewhat over-interpreted, particularly given there was no difference between A+T- and A+T+ people (p=0.11) and the fit of FC brain age is quite poor (i.e., far from the line of identity in Figure 2A). I suggest more caution when discussing this.
– A key limitation that was not mentioned was the small sample size relative to other studies. Perhaps the model performance is similar but given that only MAE is used to compare studies it is hard to draw meaningful conclusions. My impression is that had larger datasets been available, then performance would have improved.
Reviewer #2 (Recommendations for the authors):
– As explained in the previous section, the FC-BAG model has very limited prediction power, and therefore the results from the FC-BAG model are not reliable while providing marginal benefit. The FC-BAG results should be moved to the supplementary materials.
– For the FC-BAG models and their relation to other clinical variables, please also include another version of the model with mean, median, and maximum head motion during the entire rsfMRI scan as covariates to further ensure the reliability of the results.
– It is not clear to me that the bootstrap-based t-test provides evidence in favor of the Vol+FC-BAG model. In other words, a stacked model combining FC-BAG and Vol-BAG will always perform as well as or better than each individual model. If the stacking approach takes this into account (not clear in the Methods section, needs further explanation), the marginal increase in performance can be explained by this unidirectional effect and needs further confirmation based on a model selection step (e.g., using new independent data not used in the training-validation of the FC-BAG and Vol-BAG models to compare the Vol+FC-BAG and Vol-BAG models).
– After the previous step authors can choose the best performing model (either Vol-BAG or Vol+FC-BAG model) and only present the data for the selected model since results between the two models are redundant and don't add extra information to the reader.
– The analysis of hippocampal volume (especially in relation to preclinical AD) needs to be confirmed. To do so, hippocampal volume as well as volumetric features from regions highly correlated with hippocampal volume should be removed from the feature set of the Vol-BAG and Vol+FC-BAG models. The models need to be retrained using the same procedure. The relationship between hippocampal volume and the newly calculated Vol-BAG and Vol+FC-BAG values should be reported alongside the current results.
Reviewer #3 (Recommendations for the authors):
Find below some recommendations on how (I think) the science in this manuscript might be improved in no particular order.
1. Training sample. It is unclear why one would like to minimize undetected AD pathology (amyloid positivity, that is) in the cognitively healthy training sample as many of these individuals (when Tau negative) have minimal changes in brain structure and function. Since you create a BA "norm" from these individuals, one may benefit from including a bigger, more representative sample using more lenient inclusion criteria. Decisions regarding the training sample can have a big impact on the subsequent interpretation of BA results (e.g. Hwang, 2022, Brain Comm).
2. Group descriptors. It is still a matter of ongoing debate, but I recommend using another descriptor for the amyloid-positive group rather than "preclinical AD". Even in the NIA-AA Research Framework from 2018 (Jack Jr.), this tag is only used for individuals who are both amyloid and tau positive.
3. Biomarker definition. I am not an expert on biomarkers, but the definition of pTau positivity is unfamiliar to me: "Gaussian mixture model approach to defining pTau positivity based on the CSF pTau/Aβ40 ratio." Could the authors justify this approach and/or cite the corresponding references?
4. Statistical analysis. If I have not misread, the methods section only mentions three test groups (A-, A+, and CDR>0) but the analysis is performed with four groups. This leads to confusion and should be corrected. Also, most higher-level analyses reported in the results are not described in this section. These analyses should be described in the methods section. It is difficult to evaluate whether the performed analyses are appropriate without this description. For example, (lines 323-7) the authors report three different regression models and then a fourth analysis combining the four groups, but only for FC-BAG. This procedure is unclear, not described (as far as I can see), and not justified. Another example is the analysis with NFL which is not mentioned until line 412 (p.20) in the Results section. Also, the authors use different samples for different tests, due to the lack of Biomarker information for some individuals. I suggest adding degrees of freedom/n when reporting the results, so the reader has some information regarding the sample used.
5. The authors are repeating the same analysis in three different modalities (also sometimes they repeat the analyses across several pairs of groups [e.g. lines 323-7]). Thus, I would strongly recommend using some type of multiple comparison corrections.
6. Table 2. The authors should mention what the units in the table represent. Also, I recommend adding df and exact significance values (at least if p >.001).
7. Atlas. The authors used the D-K atlas (not strictly the FS-defined) for BA computation. This is a suboptimal choice, and I would recommend in the future using more fine-grained parcellations. This is not a strong issue, but the choice surprised me since the authors used a 300-ROI parcellation for the rs-fMRI. Also, since the authors use cortical thickness for sampling the cortex, I would not use "Volumetric"-BA as a descriptor.
8. Movement and rs-fMRI. The rs-fMRI preprocessing used might still lead to a signal that is related to movement. Since movement is almost always related to age and disease [and thus can affect both the BA computation and the tests in the test sample], I would suggest taking additional steps in this regard. At the minimum, I would include total motion as an additional covariate in the higher-level analysis and discuss this issue in the limitations section.
9. The results in cognitively healthy samples are largely negative (i.e. do not differ with groups). One possible explanation is that the authors are using cross-sectional samples and thus – even when using BA metrics – have a signal that captures ongoing aging (accelerated aging, if you wish) and baseline (lifelong, preexisting) variability between individuals. The latter may obscure possible existing effects. I recommend the authors acknowledge the limitations of using cross-sectional data to study changes that ought to be longitudinal.
https://doi.org/10.7554/eLife.81869.sa1
Author response
Essential revisions:
The overall consensus from the reviewers is that some of the hypotheses (and conclusions) are supported by the results (specifically, the Vol-BAG model) while others are much weaker (specifically, the FC-BAG models). In consultation between the reviewers, they agreed that the work is "useful" but found the evidence to be "incomplete" for the conclusions as presented.
1. The functional connectivity model provides a poor fit to the data. Given that the FC-based models have been recently published (Millar 2022), the reviewers feel that the authors should temper claims about the importance of the FC-based modelling.
Although the reviewers are correct that the FC model indeed provided a relatively poor fit, compared to structural MRI data, a major goal of this project was to test whether each modality (structural MRI and FC) captures unique patterns related to AD progression. As we are primarily motivated to evaluate these models in their associations with AD, it is important to consider that the most accurate BAG models for age prediction are not necessarily the ones that are most sensitive to disease. Indeed, at least one study suggests that models with “moderate” age prediction accuracy might be the most useful in detecting deviation related to disease, as compared to overly “loose” or “tight” age prediction models (Bashyam et al., 2020). We now justify our motivations more clearly in the “Introduction”:
“This project aimed to develop multimodal models of brain-predicted age, incorporating both FC and structural MRI. Participants with presymptomatic AD pathology were excluded from the training set to maximize sensitivity. We hypothesized that BAG estimates would be sensitive to the presence of AD biomarkers and early cognitive impairment. We further considered whether estimates were continuously associated with AD biomarkers of amyloid and tau, as well as cognition. We hypothesized that FC and structural MRI would capture complementary signals related to age and AD. Thus, we systematically compared models trained on unimodal FC, structural MRI, and combined modalities, to test the added utility of multimodal integration in accurately predicting age and whether each modality captures unique relationships with AD biomarkers and cognition.”
Moreover, in the current revision, we aim to focus the discussion on novel associations with this biphasic FC pattern (including the tests of continuous associations with biomarkers and cognition), rather than recapitulating the previously published finding. We also discuss the potential relevance of this result to emerging results from MEG (Ranasinghe et al., 2022) and metabolic PET studies (Goyal et al., 2022). Finally, we now also acknowledge the poor prediction performance of the FC model as a potential spurious explanation of these findings. The discussion of this result now tempers prior claims about the importance of FC:
“As we have previously reported (31), FC-BAG was lower in presymptomatic AD participants compared to amyloid-negative controls. Extending beyond this group difference, we now also note that FC-BAG was negatively associated with amyloid PET in CN/A+ participants. The combined reduction of FC-BAG in the presymptomatic phase and increase in the symptomatic phase suggest a biphasic functional response to AD progression, which is partially consistent with some prior suggestions (77–81) (see ref 31 for a more detailed discussion).
Interpretation of this biphasic pattern is still unclear, although the present results provide at least one novel insight. Specifically, one potential interpretation is that the “younger” appearing FC pattern in the preclinical phase may reflect a compensatory response to early AD pathology (82). This interpretation leads to the prediction that reduced FC-BAG should be associated with better cognitive performance in the preclinical stage. However, this interpretation is not supported by the current results, as FC-BAG did not correlate with cognition in any of the analysis samples.
Alternatively, pathological AD-related FC disruptions may be orthogonal to healthy age-related FC differences, as supported by our previous observation that age and AD are predicted by mostly non-overlapping FC networks (31). For instance, the “younger” FC pattern in CN/A+ participants may be driven by hyper-excitability in the preclinical stage (83,84). It is also worth considering that patterns of younger FC-BAG in CN/A+ participants may somehow correspond to a recent observation that patterns of youthful-appearing aerobic glycolysis are relatively preserved in the preclinical stage of AD (85). Finally, this effect may simply be spuriously driven by poor performance of the FC brain age model, sample-specific noise, and/or statistical artifacts related to regression dilution and its correction (71). Hence, future studies should attempt to replicate these results in independent samples and further test potential theoretical interpretations.”
2. The reviewers are skeptical about claims regarding the improvements afforded by multi-modal brain age models. In particular, the bootstrapping analyses actually support the claims that FC data improve the quality of brain age modelling.
We thank the reviewers for pointing out this flaw in the comparison of model performance. We now test for significant differences between z-transformed Pearson correlations with age in each model using a Williams’s test (as these correlations are dependent in that they share a common variable, age, as opposed to the Steiger test of correlations between different variables). We now report these test results in the “Comparison of Model Performance” section:
“All models accurately predicted chronological age in the training sets, as assessed using 10-fold cross validation, as well as in the held-out test sets. Overall, prediction accuracy was lowest in the FC model (MAE_FC/Train = 8.67 years, R²_FC/Train = 0.68, MAE_FC/Test = 8.25 years, R²_FC/Test = 0.73; see Figure 1A and B). The structural MRI model (MAE_S/Train = 5.97 years, R²_S/Train = 0.81, MAE_S/Test = 6.26 years, R²_S/Test = 0.82; see Figure 1C and D) significantly outperformed the FC model in age prediction accuracy, Williams’s t (S vs. FC) = 5.39, p < .001. Finally, the multimodal model (MAE_S+FC/Train = 5.34 years, R²_S+FC/Train = 0.86, MAE_S+FC/Test = 5.25 years, R²_S+FC/Test = 0.87; see Figure 1E and F) significantly outperformed both the FC model (Williams’s t (S+FC vs. FC) = 11.20, p < .001) and the structural MRI model (Williams’s t (S+FC vs. S) = 5.67, p < .001).”
3. Overall, the reviewers feel that some of the conclusions, such as the biphasic relationships between functional brain-age models and pathological status, are not strongly supported and need to be tempered. In particular, the reviewers object to referring to results as "marginal".
We appreciate the reviewers’ concern over the weak support for some of the results, particularly in the biphasic relationship observed in the multimodal BAG model. In the revised analyses, which focus on a three group comparison (CN/A- vs. CN/A+ vs. CI), the biphasic pattern in the FC-BAG model is clearly reproduced and survives FDR correction for multiple comparisons. However, the previously noted “marginal” biphasic pattern in the S+FC-BAG model is no longer apparent. Thus, we limit our discussion of the biphasic pattern to the FC model, and not the multimodal model. Moreover, we no longer refer to results as “marginal” throughout the revised submission.
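The response does not state which FDR procedure was applied. As an illustration only, the widely used Benjamini-Hochberg step-up adjustment can be sketched as follows; the function name and the example p-values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values from a family of four tests.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A test is then declared significant when its adjusted value falls below the chosen FDR level (for example 0.05).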
4. Discussion about the potential implications of sample size would be welcome.
We agree with the reviewers that the sample size of the training set was relatively small compared to prior models. We now acknowledge this issue as a limitation and an avenue for future development:
“Additionally, the training set (N = 390) was relatively small compared to prior models, which have included training samples over 1000 (e.g., 5,76). Future studies may further improve model performance by including larger samples of well-characterized participants in the training set.”
Reviewer #1 (Recommendations for the authors):
– In the Methods, they say they used a Gaussian mixture model to define pTau positivity. There are multiple ways to implement GMMs, so more details should be included here.
We apologize for the lack of clarity in the GMM methods, as multiple reviewers also noted similar concerns. Specifically, we fit a two-component GMM to the continuous pTau data, and then used the model classification to define pTau- and pTau+ participants. However, in order to simplify the analyses and interpretation of results, we have removed the analyses stratifying by pTau positivity and instead focus only on A- vs. A+ participants (see responses below to reviewer #3, comments #3 and 4).
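For context, a two-component mixture classification of this kind can be sketched with a plain expectation-maximization loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the response does not specify the software, initialization, or covariance settings, so the function names, the initialization from the lower and upper halves of the data, and the example values are all hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_two_component_gmm(x, n_iter=100):
    """Fit a 1-D, two-component Gaussian mixture with plain EM."""
    xs = sorted(x)
    half = len(xs) // 2
    # Assumed initialization: means of the lower and upper halves.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    grand_mean = sum(xs) / len(xs)
    var = [sum((v - grand_mean) ** 2 for v in xs) / len(xs)] * 2
    w = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for v in x:
            dens = [w[k] * normal_pdf(v, mu[k], var[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and variances (variance floored).
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(x)
            mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var[k] = max(sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, x)) / nk, 1e-6)
    return mu, resp

def classify_positive(x):
    """Label a value positive if the higher-mean component is more probable."""
    mu, resp = fit_two_component_gmm(x)
    high = 0 if mu[0] > mu[1] else 1
    return [r[high] > 0.5 for r in resp]

# Hypothetical biomarker values forming two well-separated clusters.
labels = classify_positive([1.0, 1.2, 0.9, 1.1, 0.8, 3.0, 3.2, 2.9, 3.1, 2.8])
```

With real data the two components usually overlap, so the resulting cut-point depends on the fitted variances and mixing weights as well as the means.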
– The presentation of the MRI Acquisition section in the Methods is not very clear. I suggest the authors consider an alternative format, possibly a supplementary table, where the acquisition details can be more easily appraised. Currently, the acquisition details on the DIAN participants are scarce relative to the ADRC participants.
We apologize for the lack of clarity. In the revision, we now provide more specific details on the acquisition parameters for DIAN participants in the main text (“MRI Acquisition” section) and also provide a summary table of the parameters in the Supplementary Material.
“All MRI data were obtained using a Siemens 3T scanner, although there was a variety of specific models within and across studies. As described previously (Millar et al., 2022), participants in the Knight ADRC and Ances lab studies completed one of two comparable structural MRI protocols, varying by scanner (sagittal T1-weighted magnetization-prepared rapid gradient echo sequence [MPRAGE] with repetition time [TR] = 2400 or 2300 ms, echo time [TE] = 3.16 or 2.95 ms, flip angle = 8° or 9°, frames = 176, field of view = sagittal 256x256 or 240x256 mm, 1-mm isotropic or 1x1x1.2 mm voxels; oblique T2-weighted fast spin echo sequence [FSE] with TR = 3200 ms, TE = 455 ms, 256 x 256 acquisition matrix, 1-mm isotropic voxels) and an identical resting-state fMRI protocol (interleaved whole-brain echo planar imaging sequence [EPI] with TR = 2200 ms, TE = 27 ms, flip angle = 90°, field of view = 256 mm, 4-mm isotropic voxels for two 6-minute runs [164 volumes each] of eyes open fixation). DIAN participants completed a similar MPRAGE protocol (TR = 2300 ms, TE = 2.95 ms, flip angle = 9°, field of view = 270 mm, 1.1x1.1x1.2 mm voxels)(McKay et al., 2022). Resting-state EPI sequence parameters for the DIAN participants differed across sites and scanners with the most notable difference being shorter resting-state runs (one 5-minute run of 120 volumes; see Supplementary File 1 for summary of structural and functional MRI parameters) (McKay et al., 2022).”
– Can the authors explain and justify why the fMRI processing included registration to an older-adult template? Could this have caused a bias in the registration accuracy for younger participants?
We apologize for the lack of clarity. In fact, we used two separate templates (one for younger adults and one for older adults). All participants were registered to the age-appropriate template. We now specify this procedure more clearly in “FC Preprocessing and Features”:
“Transformation to an age-appropriate in-house atlas template (based on independent samples of either younger adults or CN older adults) was performed using a composition of affine transforms connecting the functional volumes with the T2-weighted and MPRAGE images.”
– It is unclear to me why they chose to perform 10-fold CV and hold-out validation with 1000 bootstraps. To my mind, the latter would have been sufficient. If the authors think including the initial 10-fold CV as well is important, this should be clearly justified.
We agree with the reviewer that the 10-fold CV and bootstrapped hold-out validation are somewhat redundant. The hold-out validation step was initially performed to facilitate comparison across models. However, several reviewers have critiqued this approach. We have now removed the bootstrapping approach and instead focus on cross validation in the training set, as well as a non-bootstrapped validation in the testing set. We now specify this approach in the “Gaussian Process Regression (GPR)” section:
“Model performance in the training set was assessed using 10-fold cross validation via the Pearson correlation coefficient (r), the proportion of variance explained (R2), the mean absolute error (MAE), and root-mean-square error (RMSE) between true chronological age and the cross-validated age predictions merged across the 10 folds. We then evaluated generalizability of the models to predict age in unseen data by applying the trained models to the held-out test set of healthy controls.”
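The four metrics named above can be computed from the pooled cross-validated predictions as in the following minimal sketch. It assumes that R² is the coefficient of determination, 1 − SS_res/SS_tot; the response describes it only as "the proportion of variance explained", so the authors may have computed it differently (for example as the squared correlation). The function name and the example ages are hypothetical.

```python
import math

def age_prediction_metrics(true_age, predicted_age):
    """Pearson r, R2, MAE, and RMSE between true and predicted ages."""
    n = len(true_age)
    mean_true = sum(true_age) / n
    mean_pred = sum(predicted_age) / n
    errors = [p - t for p, t in zip(predicted_age, true_age)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in true_age)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted_age)
    cov = sum((t - mean_true) * (p - mean_pred)
              for t, p in zip(true_age, predicted_age))
    return {
        "r": cov / math.sqrt(ss_tot * ss_pred),
        "R2": 1 - ss_res / ss_tot,  # assumed definition, see lead-in
        "MAE": mae,
        "RMSE": rmse,
    }

# Hypothetical ages and predictions, pooled across folds.
metrics = age_prediction_metrics([20, 40, 60, 80], [25, 35, 65, 75])
```

Because MAE and RMSE scale with the age range of the sample while r and R² do not, reporting all four makes comparisons across studies easier to interpret.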
– It is important that R2 is reported for each model performance, not just MAE. As R2 is a ratio the values can readily be compared across published studies, while the MAE cannot as it is heavily dependent on the age distribution of the test set. For completeness, they could also consider reporting the Pearson's r correlation between predicted age and age, and the root mean square error as well.
We agree with the reviewer that R², Pearson’s r, and RMSE are important metrics of model performance, especially for comparison with other studies. We now report these measures in Figure 1, as well as in the “Comparison of Model Performance” section:
“All models accurately predicted chronological age in the training sets, as assessed using 10-fold cross validation, as well as in the held-out test sets. Overall, prediction accuracy was lowest in the FC model (MAE_FC/Train = 8.67 years, R²_FC/Train = 0.68, MAE_FC/Test = 8.25 years, R²_FC/Test = 0.73; see Figure 1A and B). The structural MRI model (MAE_S/Train = 5.97 years, R²_S/Train = 0.81, MAE_S/Test = 6.26 years, R²_S/Test = 0.82; see Figure 1C and D) significantly outperformed the FC model in age prediction accuracy, Williams’s t (S vs. FC) = 5.39, p < .001. Finally, the multimodal model (MAE_S+FC/Train = 5.34 years, R²_S+FC/Train = 0.86, MAE_S+FC/Test = 5.25 years, R²_S+FC/Test = 0.87; see Figure 1E and F) significantly outperformed both the FC model (Williams’s t (S+FC vs. FC) = 11.20, p < .001) and the structural MRI model (Williams’s t (S+FC vs. S) = 5.67, p < .001).”
– It is unclear how the model performance comparisons were conducted (Results, pg. 12). While t-tests are mentioned in the text, the exact details should be included in the Methods. My concern here is that the n (sample size) for these comparisons is based on the number of bootstraps (arbitrarily determined by the authors to be 1000), rather than the actual sample size. If that is the case (and Figure 1D suggests it is), this procedure is incorrect, as the sensitivity of these tests to detect differences would be purely a function of the number of bootstraps, rather than the number of observations. This means that the experimenter can make smaller differences 'significant' simply by adding more bootstraps. This needs to be clarified and corrected if appropriate. One approach to achieve the goal of comparing model performances is to take Pearson's correlations with age from each model and use Z-transformations to test the alternative hypothesis that the correlations are different (e.g., the Steiger test). In that way, the n would be determined by the number of observations, so statistical power would appropriately reflect the data.
We thank the reviewer for pointing out this flaw in the comparison of model performance. We now test for significant differences between z-transformed Pearson correlations with age in each model using a Williams’s test (as these correlations are dependent in that they share a common variable, age, as opposed to the Steiger test of correlations between different variables). We now report these test results in the “Comparison of Model Performance” section:
“All models accurately predicted chronological age in the training sets, as assessed using 10-fold cross validation, as well as in the held-out test sets. Overall, prediction accuracy was lowest in the FC model (MAE_FC/Train = 8.67 years, R²_FC/Train = 0.68, MAE_FC/Test = 8.25 years, R²_FC/Test = 0.73; see Figure 1A and B). The structural MRI model (MAE_S/Train = 5.97 years, R²_S/Train = 0.81, MAE_S/Test = 6.26 years, R²_S/Test = 0.82; see Figure 1C and D) significantly outperformed the FC model in age prediction accuracy, Williams’s t (S vs. FC) = 5.39, p < .001. Finally, the multimodal model (MAE_S+FC/Train = 5.34 years, R²_S+FC/Train = 0.86, MAE_S+FC/Test = 5.25 years, R²_S+FC/Test = 0.87; see Figure 1E and F) significantly outperformed both the FC model (Williams’s t (S+FC vs. FC) = 11.20, p < .001) and the structural MRI model (Williams’s t (S+FC vs. S) = 5.67, p < .001).”
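For readers unfamiliar with the test, the following is a minimal sketch of Williams's t for two dependent correlations that share one variable (here, chronological age), in its commonly cited form with n − 3 degrees of freedom. It is an illustration rather than the authors' analysis code: the function name is hypothetical, and because the correlation between the two models' predictions is not reported in the response, the example inputs are invented.

```python
import math

def williams_t(r12, r13, r23, n):
    """Williams's t for H0: rho12 == rho13, where variable 1 is shared.

    r12: correlation of age with model A predictions
    r13: correlation of age with model B predictions
    r23: correlation between the two models' predictions
    n:   number of participants (observations, not bootstraps)
    Returns (t, df) with df = n - 3.
    """
    # Determinant of the 3x3 correlation matrix.
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r23) ** 3
    t = (r12 - r13) * math.sqrt((n - 1) * (1 + r23) / denom)
    return t, n - 3

# Hypothetical inputs, not values from the study.
t_stat, df = williams_t(r12=0.93, r13=0.90, r23=0.85, n=100)
```

The resulting statistic is referred to a t distribution with the returned degrees of freedom, so statistical power depends on the number of participants rather than on the number of resamples.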
– I recommend avoiding saying things like 'marginally lower' when a p-value = 0.110. There's no real evidence that there's a difference here, so hard to say whether it's truly lower or not. Generally avoiding 'trends' at 0.1> p >0.05 is best practice. P-values are important, but effect sizes (with confidence intervals) are often more informative.
We appreciate the reviewer’s concern with over-interpretation of non-significant relationships. We now avoid using the terms “marginal” and “trend” throughout the manuscript. We also report effect sizes (partial η²) for all regression-based analyses.
– In the Discussion, when comparing age prediction accuracy between studies, it's important not to rely on MAE alone as this can vary greatly as a function of the test set age distribution. They should use R2 instead. Where R2 is unavailable, it's essential that the age range of each study mentioned in comparison is reported to provide context to the MAE values.
We thank the reviewer for pointing out this flaw. We now discuss our model performance in comparison to prior models using R2, instead of MAE, in “Predicting Brain Age with Multiple Modalities”:
“We found that a GPR model trained on structural MRI features predicted chronological age in a cognitively normal, amyloid-negative adult sample with an R² of 0.81. This level of performance is comparable to other structural models, which have reported R² values ranging from 0.80 to 0.95 (Bashyam et al., 2020; Cole and Franke, 2017; Eavani et al., 2018; Gong et al., 2021; Lee et al., 2022; Liem et al., 2017; Ly et al., 2020; Wang et al., 2019). As previously reported (Millar et al., 2022), the FC-trained model predicted age with an R² of 0.68, again consistent with previous FC models, which have achieved R² values from 0.53 to 0.80 (Eavani et al., 2018; Gonneaud et al., 2021; Liem et al., 2017). Our observation that structural MRI outperformed FC in age prediction is also consistent with previous direct comparisons between modalities (Dunås et al., 2021; Eavani et al., 2018; Liem et al., 2017).”
– The evidence for a biphasic relationship between FC-BAG and pre-clinical/clinical status is somewhat over-interpreted, particularly given there was no difference between A+T- and A+T+ people (p=0.11) and the fit of FC brain age is quite poor (i.e., far from the line of identity in Figure 2A). I suggest more caution when discussing this.
In the revised analyses, which focus on a three group comparison (CN/A- vs. CN/A+ vs. CI), the biphasic pattern in the FC-BAG model is clearly reproduced and survives FDR correction for multiple comparisons. However, the previously noted “marginal” biphasic pattern in the S+FC-BAG model is no longer apparent. Thus, we limit our discussion of the biphasic pattern to the FC model, and not the multimodal model. Moreover, we aim to focus the discussion on novel associations with this biphasic FC pattern (including the tests of continuous associations with biomarkers and cognition), rather than recapitulating the previously published finding. We also discuss the potential relevance of this result to emerging results from MEG (Ranasinghe et al., 2022) and metabolic PET studies (Goyal et al., 2022). Finally, we now also acknowledge the poor prediction performance of the FC model as a potential spurious explanation of these findings. The discussion of this result now reads as follows:
“As we have previously reported (31), FC-BAG was lower in presymptomatic AD participants compared to amyloid-negative controls. Extending beyond this group difference, we now also note that FC-BAG was negatively associated with amyloid PET in CN/A+ participants. The combined reduction of FC-BAG in the presymptomatic phase and increase in the symptomatic phase suggest a biphasic functional response to AD progression, which is partially consistent with some prior suggestions (77–81) (see ref 31 for a more detailed discussion).
Interpretation of this biphasic pattern is still unclear, although the present results provide at least one novel insight. Specifically, one potential interpretation is that the “younger” appearing FC pattern in the preclinical phase may reflect a compensatory response to early AD pathology (82). This interpretation leads to the prediction that reduced FC-BAG should be associated with better cognitive performance in the preclinical stage. However, this interpretation is not supported by the current results, as FC-BAG did not correlate with cognition in any of the analysis samples.
Alternatively, pathological AD-related FC disruptions may be orthogonal to healthy age-related FC differences, as supported by our previous observation that age and AD are predicted by mostly non-overlapping FC networks (31). For instance, the “younger” FC pattern in CN/A+ participants may be driven by hyper-excitability in the preclinical stage (83,84). It is also worth considering that patterns of younger FC-BAG in CN/A+ participants may somehow correspond to a recent observation that patterns of youthful-appearing aerobic glycolysis are relatively preserved in the preclinical stage of AD (85). Finally, this effect may simply be spuriously driven by poor performance of the FC brain age model, sample-specific noise, and/or statistical artifacts related to regression dilution and its correction (71). Hence, future studies should attempt to replicate these results in independent samples and further test potential theoretical interpretations.”
– A key limitation that was not mentioned was the small sample size relative to other studies. Perhaps the model performance is similar but given that only MAE is used to compare studies it is hard to draw meaningful conclusions. My impression is that had larger datasets been available, then performance would have improved.
We agree with the reviewer that the sample size of the training set was relatively small compared to prior models. We now acknowledge this issue as a limitation and an avenue for future development:
“Additionally, the training set (N = 390) was relatively small compared to prior models, which have included training samples over 1000 (e.g., 5,76). Future studies may further improve model performance by including larger samples of well-characterized participants in the training set.”
Reviewer #2 (Recommendations for the authors):
– As explained in the previous section, the FC-BAG model has very limited prediction power, and therefore the results from the FC-BAG model are not reliable while providing marginal benefit. The FC-BAG results should be moved to the supplementary materials.
Although FC performed relatively poorly in predicting age, a major goal of this project was to test whether each modality (structural MRI and FC) captures unique patterns related to AD progression. Indeed, the FC model captures a unique pattern: FC-BAG is reduced in CN/A+ participants but increased in CI participants, in contrast to the patterns observed in the S-BAG model. We view this as an important observation that belongs in the main text rather than in a supplementary analysis. We now justify our motivations more clearly in the “Introduction”:
“This project aimed to develop multimodal models of brain-predicted age, incorporating both FC and structural MRI. Participants with presymptomatic AD pathology were excluded from the training set to maximize sensitivity. We hypothesized that BAG estimates would be sensitive to the presence of AD biomarkers and early cognitive impairment. We further considered whether estimates were continuously associated with AD biomarkers of amyloid and tau, as well as cognition. We hypothesized that FC and structural MRI would capture complementary signals related to age and AD. Thus, we systematically compared models trained on unimodal FC, structural MRI, and combined modalities, to test the added utility of multimodal integration in accurately predicting age and whether each modality captures unique relationships with AD biomarkers and cognition.”
– For the FC-BAG models and their relation to other clinical variables, please also include another version of the model with mean, median, and maximum head motion during the entire rsfMRI scan as covariates to further ensure the reliability of the results.
We agree with the reviewer (as well as Reviewer #3) that appropriate consideration and control for head motion artifact is a critical element in analysis of FC data. Hence, we now include mean framewise displacement (FD) as an additional covariate in all statistical analyses involving the FC and multimodal (S+FC) BAG estimates. We do not include median and maximum, as suggested by the reviewer, in order to minimize potential multi-collinearity in our regression models. As noted in “Statistical Analysis”:
“Given the potential confounding influence of head motion on FC-derived measures (60,76,77), we also included mean FD as an additional covariate of non-interest in the FC and S+FC models.”
– It is not clear to me that the bootstrapped based t-test provides evidence in favor of the Vol+FC-BAG model. In other words, a stacked model combining FC-BAG and Vol-BAG will always perform as well or worse than each model. If the stacking approach takes this into account (not clear in the method section, needs further explanation) the marginal increase in performance can be explained to this unidirectional effect and needs further confirmation based on a model selection step (e.g. using new independent data not used in the training-validation of FC-BAG and Vol-BAG model to compare Vol+FC-BAG and Vol-BAG model).
We appreciate the reviewer’s concern and agree that it is important to demonstrate that increases in model performance are meaningful, rather than driven by unidirectional effects of adding more features and/or capitalizing on chance. Thus, we performed a supplementary analysis, in which we combined the fully trained structural MRI brain age model with a model trained on “reshuffled” FC features using the same stacking approach in 1000 bootstrap samples. The distribution of R² in this analysis therefore reflects the expected range of model performance from adding unrelated FC features to the structural brain age model. In fact, most of the bootstrapped models performed similarly to or worse than the unimodal structural model (see Figure 1—figure supplement 4), suggesting that our stacking approach does not have a unidirectional effect of improvement from adding unrelated features. No simulation achieved performance as high as or higher than the fully trained S+FC model, suggesting that the modestly sized increase in the stacked multimodal model (compared to the unimodal structural MRI model) is driven by meaningful age-related FC signal, rather than by simply capitalizing on chance in a larger feature set. We now describe this analysis in “Comparison of Model Performance” and Figure 1—figure supplement 4:
“It is possible that the modest increase in the multimodal model was due to capitalizing on noise, simply by adding more features to the structural model. Hence, we also compared the observed R² (S+FC) to a bootstrapped distribution of R² performance estimates from 1000 resamples using a model in which the original structural MRI model was stacked with a model trained on randomly reshuffled FC features. Thus, this distribution represents the expected improvements in model performance from simply adding new features to the structural MRI model with the stacked approach. The observed R² (S+FC) outperformed all R² estimates from this bootstrapped distribution (p < 0.001, see Figure 1—figure supplement 4), suggesting that the modest increase in model performance observed in the stacked multimodal (S+FC) model over the unimodal structural model is due to meaningful age-related FC signal, rather than capitalizing on noise in a larger feature set.”
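As a minimal sketch of the comparison described above, the following Python shows one way to build a reshuffled-feature null and compare an observed R² against it. The function names, the row-wise shuffling of FC features across participants, and the "+1" empirical p value are illustrative assumptions; the response does not state exactly how the features were reshuffled or how the p value was computed.

```python
import random


def shuffle_rows(features, seed):
    """Return the feature rows in a random participant order.

    Assumption: "reshuffled" means permuting which participant each
    row of FC features belongs to, which breaks the link with age.
    """
    rng = random.Random(seed)
    shuffled = list(features)
    rng.shuffle(shuffled)
    return shuffled


def empirical_p_value(observed, null_values):
    """One-sided empirical p value of an observed R^2 against a null sample.

    Assumption: uses the common (count + 1) / (n + 1) estimator, so the
    smallest attainable value with 1000 resamples is 1/1001.
    """
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)
```

Under this estimator, an observed value that exceeds all 1000 null estimates gives p = 1/1001, which is consistent with the reported p < 0.001.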
– After the previous step authors can choose the best performing model (either Vol-BAG or Vol+FC-BAG model) and only present the data for the selected model since results between the two models are redundant and don't add extra information to the reader.
Although our revised and supplementary analyses support the selection of the S+FC BAG model for most accurate prediction of age, as noted above in the response to comment #1, a major goal of this project was to test whether each modality (structural MRI and FC) captures unique patterns related to AD progression. As we are primarily motivated to evaluate these models in their associations with AD, it is important to consider that the most accurate BAG models for age prediction are not necessarily the ones that are most sensitive to disease. In fact, at least one study suggests that models with “moderate” age prediction accuracy might be the most useful in detecting deviation related to disease, as compared to overly “loose” or “tight” age prediction models (Bashyam et al., 2020). We now justify our motivations more clearly in the “Introduction”:
“This project aimed to develop multimodal models of brain-predicted age, incorporating both FC and structural MRI. Participants with presymptomatic AD pathology were excluded from the training set to maximize sensitivity. We hypothesized that BAG estimates would be sensitive to the presence of AD biomarkers and early cognitive impairment. We further considered whether estimates were continuously associated with AD biomarkers of amyloid, tau, and neurodegeneration (Jack et al., 2016), as well as cognitive function. We hypothesized that FC and structural MRI would capture complementary signals related to age and AD. Thus, we systematically compared models trained on unimodal FC, structural MRI, and combined modalities, to test the added utility of multimodal integration in accurately predicting age and whether each modality captures unique relationships with AD biomarkers and cognition.”
– The analysis of hippocampal volume (specially related to the preclinical AD) needs to be confirmed. To do so, hippocampal volume as well as volumetric features from regions highly correlated with hippocampal volume should be removed from the feature set of Vol-BAG and Vol+FC-BAG models. The models need to be retrained using the same procedure. The relationship between hippocampal volume and the newly calculated Vol-BAG and Vol+FC-BAG values should be reported alongside the current results.
We agree with the reviewer that the associations between hippocampal volume and S-BAG and S+FC-BAG create problems for interpretation, as the S-BAG and S+FC-BAG models include hippocampal volume as input features and are thus circular. Moreover, it is more of interest to this study to test associations with the biomarkers associated with earlier Alzheimer disease stages, including amyloid and tau. Thus, in the interest of simplifying the focus of the study, as well as the interpretation of results, we have decided to remove the analyses of the neurodegeneration markers (including hippocampal volume).
Reviewer #3 (Recommendations for the authors):
Find below some recommendations on how (I think) the science in this manuscript might be improved in no particular order.
1. Training sample. It is unclear why one would like to minimize undetected AD pathology (amyloid positivity, that is) in the cognitively healthy training sample as many of these individuals (when Tau negative) have minimal changes in brain structure and function. Since you create a BA "norm" from these individuals, one may benefit from including a bigger, more representative sample using more lenient inclusion criteria. Decisions regarding the training sample can have a big impact on the subsequent interpretation of BA results (e.g. Hwang, 2022, Brain Comm).
We agree with the reviewer that the composition of the training sample is critical for interpreting outputs from a brain age model. Indeed, this consideration motivated us to train our model in amyloid-negative participants for both theoretical and empirical reasons. Specifically, although individuals in the earliest preclinical stages of AD (i.e. A+T-N-) likely have minimal detectable structural changes, it is possible that structural changes might be observable in later stage participants (i.e., T+, N+) even if they are cognitively normal. Thus, removing all A+ participants from the training set is a conservative approach to minimize the potential influence of presymptomatic AD pathology in any stage.
Further, although amyloid positivity may lead to minimal structural differences, prior work from our lab (and others) suggests that amyloid may be associated with differences in functional connectivity, and critically that presymptomatic amyloid pathology may confound effects that are otherwise interpreted to reflect “healthy aging” (Brier et al., 2014). Thus, if these participants are included in the training set, the FC model would learn to associate these disease-related FC patterns with normative aging. When applied to an analysis set of amyloid-positive participants, such a model would be less likely to identify deviation in the BAG, as those disease-related differences are incorporated into the model of healthy aging.
This argument was recently tested by Ly and colleagues (2020), who compared two brain age models: one trained on amyloid-negative participants and another trained on cognitively normal participants regardless of amyloid status. They found that the amyloid-negative trained model was able to detect differences in brain age between amyloid-positive and amyloid-negative test sets, but the model that did not exclude amyloid-positive participants was not sensitive to this difference. Although this study was limited in that the amyloid-positive and amyloid-negative samples were drawn from separate, unmatched cohorts, it represents an important proof of concept, upon which we aim to expand in this paper.
We have now revised the introduction to make the motivation for this design decision more clear:
“One approach to maximize sensitivity of BAG to presymptomatic AD pathology may be to train brain age models exclusively on amyloid-negative participants. As undetected AD pathology might influence MRI measures, and thus confound effects otherwise attributed to “healthy aging” (Brier, Thomas, Snyder, et al., 2014), including the patterns learned by a traditional brain age model, an alternative model trained on amyloid-negative participants only might be more sensitive to detect presymptomatic AD pathology as deviations in BAG. Indeed, one recent study demonstrated that an amyloid-negative trained brain age model (Ly et al., 2020) is more sensitive to progressive stages of AD than a typical amyloid-insensitive model (Cole et al., 2015). However, this comparison included amyloid-negative and amyloid-positive test samples from two separate cohorts, and thus may be driven by cohort, scanner, and/or site differences. To validate the applicability of the brain-predicted age approach to preclinical AD, it is important to test a model’s sensitivity to amyloid status, as well as continuous relationships with preclinical AD biomarkers, within a single cohort. Another recent comparison demonstrated that both traditional and amyloid-negative trained brain age models were similarly related to molecular AD biomarkers, but that further attempts to “disentangle” AD from brain age by including more advanced AD continuum participants in the training sample significantly reduced relationships between brain age and AD markers (Hwang et al., 2022). Thus, in this study we will apply the amyloid-negative training approach to a multimodal MRI dataset, in order to maximize sensitivity to AD pathology in the presymptomatic stage.”
2. Group descriptors. It is still a matter of ongoing debate, but I recommend using another descriptor for the amyloid positive group rather than "preclinical AD". Even in the NIA-AA Research framework from 2018 (Jack Jr.) they only use this tag for individuals that are amyloid and tau positive.
We have revised our terminology throughout the manuscript and figures to refer to our groups by clinical assessment and molecular categorization (e.g., CN/A-, CN/A+, CI), rather than staged progression terms (e.g., “preclinical AD”).
3. Biomarker definition. I am not an expert on biomarkers, but the definition of pTau positivity is uncommon to me "Gaussian mixture model approach to defining pTau positivity based on the CSF pTau/Aβ40 ratio.". Could the authors justify and or cite the correspondent references?
To clarify, we fit a two-component GMM to the continuous pTau data, and then used the model classification to define pTau- and pTau+ participants. However, in order to simplify the analyses and interpretation of results, we have removed the analyses stratifying by pTau positivity and instead focus only on A- vs. A+ participants (see response below to comment #4).
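For readers unfamiliar with this approach, the sketch below illustrates the general idea of fitting a two-component Gaussian mixture to a continuous measure and labeling each value by its more likely component. It is a plain expectation-maximization implementation written for illustration only; the function names, initialization, iteration count, and variance floor are assumptions and do not describe the software or settings used in the original analysis.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def classify_two_component_gmm(values, n_iter=200, var_floor=1e-6):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns one boolean per value: True if the value is assigned to the
    upper (higher-mean) component, i.e. "positive" in this sketch.
    """
    n = len(values)
    ordered = sorted(values)
    half = n // 2
    # Initialise the two means from the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    grand_mean = sum(values) / n
    grand_var = max(sum((v - grand_mean) ** 2 for v in values) / n, var_floor)
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    resp = [0.5] * n  # responsibility of the upper component for each value
    for _ in range(n_iter):
        # E step: posterior probability that each value came from the upper component.
        for i, v in enumerate(values):
            p_low = weights[0] * normal_pdf(v, means[0], variances[0])
            p_high = weights[1] * normal_pdf(v, means[1], variances[1])
            total = p_low + p_high
            resp[i] = p_high / total if total > 0 else 0.5
        # M step: update weights, means, and variances from the responsibilities.
        n_high = sum(resp)
        n_low = n - n_high
        if n_high < 1e-9 or n_low < 1e-9:
            break  # one component has collapsed; keep the last estimates
        means = [
            sum((1 - r) * v for r, v in zip(resp, values)) / n_low,
            sum(r * v for r, v in zip(resp, values)) / n_high,
        ]
        variances = [
            max(sum((1 - r) * (v - means[0]) ** 2 for r, v in zip(resp, values)) / n_low, var_floor),
            max(sum(r * (v - means[1]) ** 2 for r, v in zip(resp, values)) / n_high, var_floor),
        ]
        weights = [n_low / n, n_high / n]
    return [r > 0.5 for r in resp]
```

In practice an established implementation (for example, a mixture-model routine from a statistics library) would normally be preferred over hand-written EM; the version here only makes the classification step explicit.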
4. Statistical analysis. If I have not misread, the methods section only mentions three test groups (A-, A+, and CDR>0) but the analysis is performed with four groups. This leads to confusion and should be corrected. Also, most higher-level analyses reported in the results are not described in this section. These analyses should be described in the methods section. It is difficult to evaluate whether the performed analyses are appropriate without this description. For example, (lines 323-7) the authors report three different regression models and then a fourth analysis combining the four groups, but only for FC-BAG. This procedure is unclear, not described (as far as I can see), and not justified. Another example is the analysis with NFL which is not mentioned until line 412 (p.20) in the Results section. Also, the authors use different samples for different tests, due to the lack of Biomarker information for some individuals. I suggest adding degrees of freedom/n when reporting the results, so the reader has some information regarding the sample used.
We apologize for the lack of clarity in the statistical analysis. In the revision, we have improved the clarity of this section in the following ways:
A. We no longer analyze the data using a four group split (i.e., A-T- vs. A+T- vs. A+T+ vs. CI). Instead, we focus on analyses of three groups (CN/A- vs. CN/A+ vs. CI) consistently throughout the study.
B. We now provide more detail on the higher level analyses, in which we test for group differences between the three analysis sets and test continuous associations with biomarkers and cognitive measures:
“Group differences in each BAG estimate were tested using an omnibus analysis of variance (ANOVA) test with follow-up pairwise t tests on age-residualized BAG estimates, using a false discovery rate (FDR) correction for multiple comparisons. Assumptions of normality were tested by visual inspection of quantile-quantile plots (see Figure 2—figure supplement 1). Assumptions of equality of variance were tested with Levene’s test. Linear regression models tested the effects of cognitive impairment (CDR > 0 vs. CDR 0) and amyloid positivity (A- vs. A+) on BAG estimates from each model, controlling for true age (as noted above) and demographic covariates (sex, years of education, and race). Given the potential confounding influence of head motion on FC-derived measures (59,75,76), we also included mean FD as an additional covariate of non-interest in the FC and S+FC models. We tested continuous relationships with AD biomarkers and cognitive estimates using linear regression models, including the same demographic and motion covariates.”
C. As noted in the response to Reviewer #2, comment #3, analyses of neurodegeneration biomarkers are of less interest to this study, compared to earlier biomarkers of amyloid and tau. Thus, analyses of NfL have been removed from the study.
D. We now report degrees of freedom for our regression analyses of group differences (see Table 2). We also report the number of participants in each group with available measures of each biomarker throughout the results, for example:
“355 participants (144 CN/A-, 154 CN/A+, 57 CI) had an available amyloid PET scan and 300 (120 CN/A-, 137 CN/A+, 43 CI) had an available CSF estimate of Aβ42/40.”
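The age-residualized BAG estimates mentioned in item B can be illustrated with a short sketch. The code below removes the linear effect of chronological age from BAG using simple least squares; the function name and the choice of a single-predictor linear fit are assumptions made for illustration, since the response does not spell out the residualization procedure.

```python
def residualize_on_age(bag, age):
    """Return BAG residuals after a simple linear regression of BAG on age.

    Assumption: a single-predictor ordinary least squares fit; the
    residuals are uncorrelated with age by construction.
    """
    n = len(age)
    mean_age = sum(age) / n
    mean_bag = sum(bag) / n
    sxx = sum((a - mean_age) ** 2 for a in age)
    sxy = sum((a - mean_age) * (b - mean_bag) for a, b in zip(age, bag))
    slope = sxy / sxx
    intercept = mean_bag - slope * mean_age
    return [b - (intercept + slope * a) for a, b in zip(age, bag)]
```

Group comparisons would then be run on the returned residuals rather than on raw BAG, so that differences in age between groups cannot drive apparent differences in BAG.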
5. The authors are repeating the same analysis in three different modalities (also sometimes they repeat the analyses across several pairs of groups [e.g. lines 323-7]). Thus, I would strongly recommend using some type of multiple comparison corrections.
We agree with the reviewer that appropriate correction for multiple comparisons is necessary for these analyses. We now apply a false discovery rate (FDR) correction to the pairwise t tests, as described in “Statistical Analysis”:
“Group differences in each BAG estimate were tested using an omnibus analysis of variance (ANOVA) test with follow-up pairwise t tests on age-residualized BAG estimates, using a false discovery rate (FDR) correction for multiple comparisons.”
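To make the correction step concrete, here is a minimal sketch of a false discovery rate adjustment for a set of pairwise p values. It implements the Benjamini-Hochberg procedure, which is an assumption: the text specifies an FDR correction but does not name the method. The function name is likewise illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order.

    Assumption: BH is used as the FDR method; the source text only
    says "false discovery rate (FDR) correction".
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With three pairwise comparisons (CN/A- vs. CN/A+, CN/A- vs. CI, CN/A+ vs. CI), the function would be called once per BAG model on the three raw p values, and each adjusted value compared against the chosen threshold.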
6. Table 2. The authors should mention what the units in the table represent. Also, I recommend adding df and exact significance values (at least if p >.001).
Table 2 presents the β estimates and standard error for the terms in the linear regression models predicting each BAG estimate. We now label this information more explicitly with separated columns. Further, we now provide exact p values for all terms and df for each model.
7. Atlas. The authors used the D-K atlas (not strictly the FS-defined) for BA computation. This is a suboptimal choice, and I would recommend in the future using more fine-grained parcellations. This is not a strong issue, but the choice surprised me since the authors used a 300-ROI parcellation for the rs-fMRI. Also, since the authors use cortical thickness for sampling the cortex, I would not use "Volumetric"-BA as a descriptor.
We agree with the reviewer that the D-K atlas is a relatively coarse anatomical parcellation. However, as these analyses were based on large, existing datasets that had already been processed and QC’ed with a harmonized pipeline, it would require significant effort to re-parcellate and QC the full dataset. Moreover, despite this coarse parcellation, the structural MRI data still predict age quite well and outperform the FC data, which use the finer-grained set of ROIs. We now acknowledge the choice of the D-K parcellation as a potential limitation and area of future development in the “Limitations”:
“Structural MRI was quantified using the Desikan atlas (Desikan et al., 2006), which although widely used, provides a relatively coarse parcellation of structural anatomy, and moreover, does not align with the parcellation used to define FC regions (Seitzman et al., 2020). Although the structural MRI data still outperformed FC in predicting age, future brain age models may further improve performance by using more refined and harmonized anatomical parcellations to define brain regions.”
Additionally, we now refer to the “volumetric” brain age model as “structural”, e.g., S-BAG, throughout the manuscript and figures.
8. Movement and rs-fMRI. The rs-fMRI preprocessing used might still lead to a signal that is related to movement. Since movement is almost always related to age and disease [and thus can affect both the BA computation and the tests in the test sample], I would suggest taking additional steps in this regard. At the minimum, I would include total motion as an additional covariate in the higher-level analysis and discuss this issue in the limitations section.
We agree with the reviewer (as well as Reviewer #2) that appropriate consideration and control for head motion artifact is a critical element in analysis of FC data. Hence, we now include mean framewise displacement (FD) as an additional covariate in all statistical analyses involving the FC and multimodal (S+FC) BAG estimates. As noted in “Statistical Analysis”:
“Given the potential confounding influence of head motion on FC-derived measures (60,76,77), we also included mean FD as an additional covariate of non-interest in the FC and S+FC models.”
9. The results in cognitively healthy samples are largely negative (i.e. do not differ with groups). One possible explanation is that the authors are using cross-sectional samples and thus – even when using BA metrics – have a signal that captures ongoing aging (accelerated aging, if you wish) and baseline (lifelong, preexisting) variability between individuals. The latter may obscure possible existing effects. I recommend the authors acknowledge the limitations of using cross-sectional data to study changes that ought to be longitudinal.
We appreciate the reviewer’s suggestion and now discuss this issue as a limitation and area of future development:
“Moreover, estimates of BAG likely capture variance in early-life factors, which may obscure associations with Alzheimer disease and cognition, especially in cross-sectional designs (87). Future studies may improve the sensitivity of BAG estimates to disease-related markers by testing associations with longitudinal change.”
https://doi.org/10.7554/eLife.81869.sa2
Article and author information
Author details
Funding
National Institutes of Health (P01-AG026276)
- John C Morris
National Institutes of Health (P01-AG03991)
- John C Morris
National Institutes of Health (P30-AG066444)
- John C Morris
National Institutes of Health (5-R01-AG052550)
- Beau M Ances
National Institutes of Health (5-R01-AG057680)
- Beau M Ances
National Institutes of Health (U19-AG032438)
- Randall J Bateman
BrightFocus Foundation (A2022014F)
- Peter R Millar
Alzheimer's Association (SG-20-690363-DIAN)
- Randall J Bateman
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank the participants for their dedication to this project, and Haleem Azmy, Anna Boerwinkle, and Dimitre Tomov for technical and processing support. This manuscript has been reviewed by DIAN Study investigators for scientific content and consistency of data interpretation with previous DIAN Study publications. We acknowledge the altruism of the participants and their families and the contributions of the DIAN research and support staff at each of the participating sites. We thank the personnel of the Administration, Biomarker, Biostatistics, Clinical, Genetics, and Neuroimaging Cores of the Knight ADRC, as well as the Administration, Biomarker, Biostatistics, Clinical, Cognition, Genetics, and Imaging Cores of DIAN. This research was funded by grants from the National Institutes of Health (P01-AG026276, P01-AG03991, P30-AG066444, 5-R01-AG052550, 5-R01-AG057680, 1-R01-AG067505, 1S10RR022984-01A1) and the BrightFocus Foundation (A2022014F), with generous support from the Paula and Rodger O Riney Fund and the Daniel J Brennan MD Fund. Data collection and sharing for this project was supported by The Dominantly Inherited Alzheimer Network (DIAN, U19-AG032438) funded by the National Institute on Aging (NIA), the Alzheimer’s Association (SG-20-690363-DIAN), the German Center for Neurodegenerative Diseases (DZNE), Raul Carrea Institute for Neurological Research (FLENI), Partial support by the Research and Development Grants for Dementia from Japan Agency for Medical Research and Development, AMED, and the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), Spanish Institute of Health Carlos III (ISCIII), Canadian Institutes of Health Research (CIHR), Canadian Consortium of Neurodegeneration and Aging, Brain Canada Foundation, and Fonds de Recherche du Québec – Santé.
Ethics
Human subjects: All participants provided written informed consent in accordance with the Declaration of Helsinki and their local institutional review board. All procedures were approved by the Human Research Protection Office at WUSTL (IRB ID # 201204041).
Senior Editor
- Jeannie Chin, Baylor College of Medicine, United States
Reviewing Editor
- Karla L Miller, University of Oxford, United Kingdom
Reviewers
- James Cole, Centre for Medical Image Computing, Department of Computer Science, University College London; Dementia Research Centre, Institute of Neurology, University College London, London, United Kingdom
- Didac Vidal-Pineiro, University of Oslo, Norway
Version history
- Received: July 14, 2022
- Preprint posted: August 27, 2022
- Accepted: December 30, 2022
- Accepted Manuscript published: January 6, 2023 (version 1)
- Version of Record published: March 6, 2023 (version 2)
Copyright
© 2023, Millar et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.