Structural differences in adolescent brains can predict alcohol misuse

  1. Roshan Prakash Rane (corresponding author)
  2. Evert Ferdinand de Man
  3. JiHoon Kim
  4. Kai Görgen
  5. Mira Tschorn
  6. Michael A Rapp
  7. Tobias Banaschewski
  8. Arun LW Bokde
  9. Sylvane Desrivieres
  10. Herta Flor
  11. Antoine Grigis
  12. Hugh Garavan
  13. Penny A Gowland
  14. Rüdiger Brühl
  15. Jean-Luc Martinot
  16. Marie-Laure Paillere Martinot
  17. Eric Artiges
  18. Frauke Nees
  19. Dimitri Papadopoulos Orfanos
  20. Herve Lemaitre
  21. Tomas Paus
  22. Luise Poustka
  23. Juliane Fröhner
  24. Lauren Robinson
  25. Michael N Smolka
  26. Jeanne Winterer
  27. Robert Whelan
  28. Gunter Schumann
  29. Henrik Walter
  30. Andreas Heinz
  31. Kerstin Ritter
  32. IMAGEN consortium
  1. Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Germany
  2. Faculty IV – Electrical Engineering and Computer Science, Technische Universität Berlin, Germany
  3. Department of Education and Psychology, Freie Universität Berlin, Germany
  4. Science of Intelligence, Research Cluster of Excellence, Germany
  5. Social and Preventive Medicine, Department of Sports and Health Sciences, Intra-faculty unit “Cognitive Sciences”, Faculty of Human Science, and Faculty of Health Sciences Brandenburg, Research Area Services Research and e-Health, University of Potsdam, Germany
  6. Department of Child and Adolescent Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Germany
  7. Discipline of Psychiatry, School of Medicine and Trinity College Institute of Neuroscience, Trinity College Dublin, Ireland
  8. Centre for Population Neuroscience and Precision Medicine (PONS), Institute of Psychiatry, Psychology & Neuroscience, SGDP Centre, King’s College London, United Kingdom
  9. Institute of Cognitive and Clinical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Germany
  10. Department of Psychology, School of Social Sciences, University of Mannheim, Germany
  11. NeuroSpin, CEA, Université Paris-Saclay, France
  12. Departments of Psychiatry and Psychology, University of Vermont, United States
  13. Sir Peter Mansfield Imaging Centre School of Physics and Astronomy, University of Nottingham, United Kingdom
  14. Physikalisch-Technische Bundesanstalt, Germany
  15. Institut National de la Santé et de la Recherche Médicale, INSERM U A10 ”Trajectoires développementales en psychiatrie” Universite Paris-Saclay, Ecole Normale Supérieure Paris-Saclay, CNRS, Centre Borelli, France
  16. AP-HP Sorbonne Université, Department of Child and Adolescent Psychiatry, Pitié-Salpêtrière Hospital, France
  17. Psychiatry Department, EPS Barthélémy Durand, France
  18. PONS Research Group, Dept of Psychiatry and Psychotherapy, Campus Charite Mitte, Humboldt University, Germany
  19. Institut des Maladies Neurodégénératives, UMR 5293, CNRS, CEA, University of Bordeaux, France
  20. Department of Psychiatry, Faculty of Medicine and Centre Hospitalier Universitaire Sainte-Justine, University of Montreal, Canada
  21. Departments of Psychiatry and Psychology, University of Toronto, Canada
  22. Department of Child and Adolescent Psychiatry and Psychotherapy, University Medical Centre Göttingen, Germany
  23. Department of Psychiatry and Neuroimaging Center, Technische Universität Dresden, Germany
  24. Department of Psychological Medicine, Section for Eating Disorders, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, United Kingdom
  25. School of Psychology and Global Brain Health Institute, Trinity College Dublin, Ireland

Abstract

Alcohol misuse during adolescence (AAM) has been associated with disrupted development of adolescent brains. In this longitudinal machine learning (ML) study, we could predict AAM significantly from brain structure (T1-weighted imaging and DTI) with accuracies of 73–78% in the IMAGEN dataset (n∼1182). Our results not only show that structural differences in the brain can predict AAM, but also suggest that such differences might precede AAM behavior in the data. We predicted 10 phenotypes of AAM at age 22 using brain MRI features at ages 14, 19, and 22. Binge drinking was found to be the most predictable phenotype. The most informative brain features were located in the ventricular CSF, and in white matter tracts of the corpus callosum, internal capsule, and brain stem. In the cortex, they were spread across the occipital, frontal, and temporal lobes and in the cingulate cortex. We also experimented with four different ML models and several confound control techniques. Support Vector Machine (SVM) with rbf kernel and Gradient Boosting consistently performed better than the linear models, linear SVM and Logistic Regression. Our study also demonstrates how the choice of the predicted phenotype, ML model, and confound correction technique are all crucial decisions in an explorative ML study analyzing psychiatric disorders with small effect sizes such as AAM.

Editor's evaluation

This study uses a large dataset on alcohol misuse in adolescents that have been followed up for several years. MRI data are used to test whether the structure and connectivity of the brains of adolescents can predict their alcohol misuse later in their early twenties. The results show that binge drinking can be predicted out of multiple brain phenotypes with good accuracy, even after controlling for many confounding variables. This study can be impactful as it suggests a re-evaluation of studies of the effect of alcohol on the adolescent brain.

https://doi.org/10.7554/eLife.77545.sa0

Introduction

Many adolescents participate in risky and excessive alcohol consumption behaviors (Crews et al., 2007), especially in European and North American countries. Several studies have identified that such early and risky exposure to alcohol is a potential risk factor that can lead to the development of Alcohol Use Disorder (AUD) later in life (DeWit et al., 2000; Grant et al., 2006; Nixon and McClain, 2010). During adolescence and early adulthood (age 10–24), the human brain undergoes maturation characterized by an increase in white matter (WM) (Lebel and Beaulieu, 2011) and an initial thickening and later thinning of grey matter (GM) regions (Giedd, 2004). Researchers have suggested that excessive alcohol use during this period might disrupt normal brain maturation, causing lifelong effects (Crews et al., 2007; Monti et al., 2005; Chambers et al., 2003). Therefore, understanding how alcohol misuse during adolescence is related to the development of AUD later in life is crucial to understanding alcohol addiction. Furthermore, uncovering how adolescent alcohol misuse (AAM) is associated with the brain at different stages of adolescent brain development can help to implement a more informed public health policy surrounding alcohol use during this age.

Previous studies: Several studies in the last two decades have attempted to uncover how AAM and brain structure are related. These are summarised in Table 1. Earlier studies collected data with small sample sizes of 30–100 subjects and compared specific brain regions (such as the hippocampus or the pre-frontal cortex (pFC)) between adolescent alcohol misusers (AAMs) and mild users or non-users (controls). They used structural features such as regional volume (De Bellis et al., 2000; Nagel et al., 2005; De Bellis et al., 2005), cortical thickness (Squeglia et al., 2012), or white matter tract volumes (McQueeny et al., 2009; Jones and Nagel, 2019). These studies found differences between the groups in regions such as the hippocampus (De Bellis et al., 2000; Nagel et al., 2005), cerebellum (De Bellis et al., 2005), and the frontal cortex (De Bellis et al., 2005). However, these findings are not always consistent across studies (Jones et al., 2018). This inconsistency is also evident from the findings in the last column of Table 1.

Another group of studies investigated whether AAM disrupts the natural developmental trajectory of adolescent brains (Jacobus et al., 2013; Luciana et al., 2013; Pfefferbaum et al., 2018; Jones and Nagel, 2019; Sullivan et al., 2020; Robert et al., 2020). These studies reported that the brains of AAMs showed accelerated GM decline (Luciana et al., 2013; Pfefferbaum et al., 2018; Sullivan et al., 2020) and attenuated WM growth (Luciana et al., 2013; Sullivan et al., 2020) compared to controls. However, the brain regions reported were not consistent between these studies either and do not tell a coherent story (Jones et al., 2018) (see Table 1). These differences in findings could potentially be due to the following reasons:

  1. Heterogeneous disease with a weak effect size: Alcohol misuse has a heterogeneous expression in the brain (Zahr and Pfefferbaum, 2017). This heterogeneity might be driven by alcohol misuse affecting diverse brain regions in different sub-populations depending on demographic, environmental, or genetic differences (Grant et al., 2015). Furthermore, the effect of alcohol misuse on adolescent brain structure can be weak and hard to detect (especially with the mass-univariate methods used in previous studies). The possibility of several disease subtypes, exacerbated by the small signal-to-noise ratio, can generate incoherent findings regarding which brain regions are affected by alcohol.

  2. Higher risk of false-positives: Most previous studies have small sample sizes that are prone to generating inflated effect sizes (Button et al., 2013). Furthermore, these studies employ mass-univariate analysis techniques that are vulnerable to the multiple comparisons problem (Lindquist and Mejia, 2015) and can produce false-positives if it is ignored. These factors, coupled with the possibility of publication bias towards positive results (Ioannidis, 2005), create a high likelihood of generating false-positive findings (Scheel et al., 2021).

  3. Several metrics to measure alcohol misuse: There is no consensus on the best phenotype for measuring AAM. Many studies use binge drinking or heavy episodic drinking as a measure of AAM (Squeglia et al., 2012; Whelan et al., 2014; Jones and Nagel, 2019; Robert et al., 2020), while a few others use a combination of binge drinking, frequency of alcohol use, amount of alcohol consumed, and the age of onset of alcohol misuse (Squeglia et al., 2015; Pfefferbaum et al., 2018; Kühn et al., 2019; Seo et al., 2019; Sullivan et al., 2020). These differences in analyses could potentially produce different findings.
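The multiple-comparisons concern in point 2 can be made concrete with a short calculation. The sketch below assumes that all tests are independent and that every null hypothesis is true, which is an idealization (neighboring voxels are correlated). The function names are illustrative and are not taken from any of the cited studies.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests independent
    tests when every null hypothesis is true: 1 - (1 - alpha) ** n_tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # The chance of at least one spurious finding grows quickly with the
    # number of uncorrected tests (for example, one test per brain region).
    for m in (1, 10, 100, 1000):
        print(m, round(familywise_error_rate(0.05, m), 4), bonferroni_threshold(0.05, m))
```

Under these assumptions, a hundred uncorrected tests at the 0.05 level are almost certain to yield at least one false positive, which is one reason why small mass-univariate studies can disagree on the regions they report.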

Table 1
Literature review of studies that look into structural brain differences between adolescent alcohol misusers (AAMs) and control subjects.

The studies are sorted by the year of publication. For each study, the sample size ‘n’, the main analysis technique, and the main structural differences found in AAMs are listed.

Study (year) | n | Analysis / method | Structural differences in AAMs
De Bellis et al., 2000 | 36 | Statistically compare (univariate) regional brain volumes between groups | Lower hippocampal volume.
Nagel et al., 2005 | 31 | Statistically compare (univariate) regional brain volumes between groups | Lower volume only in left hippocampus after controlling for other psychiatric comorbidities.
De Bellis et al., 2005 | 42 | Statistically compare (univariate) regional brain volumes between groups | Lower pFC, cerebellum volumes in males but AAMs had comorbid mental disorders.
McQueeny et al., 2009 | 28 | Mass-univariate analysis of skeletonized FA voxels (DTI) | Binge drinkers had lower FA in 18 white matter areas.
Squeglia et al., 2012 | 59 | Statistically compare (univariate) regional brain volumes between groups | No effect of binge drinking on cortical thickness and sex-specific differences among AAMs in left frontal cortex.
Jacobus et al., 2013 | 54 | Mass-univariate analysis of skeletonized FA voxels (DTI) | No effect in AAM-only group, but lower FA in AAM and comorbid marijuana users.
Luciana et al., 2013 | 55 | Longitudinal mass-univariate analysis of cortical thickness, white matter extent, DTI-extracted FA and MD | Accelerated GM thinning in mid frontal gyrus, attenuated WM growth with lower FA in left caudate, thalamus.
Whelan et al., 2014 | 692 | Exploratory analysis using ML to find best predictors of AAM among demographic, psychosocial, genetic, cortical volumes, and fMRI variables | Current AAMs have lower GMVs in parts of frontal lobe and higher GMV in right putamen. Future AAMs have lower GMV in right parahippocampal gyrus and higher in left postcentral gyrus.
Squeglia et al., 2015 | 137 | Exploratory analysis using ML to find best predictors of AAM among demographic, neuropsychological, cortical thickness, and fMRI variables | Future AAM have thinner GM in precuneus, lateral occipital, ACC, PCC, and frontal and temporal cortex.
Pfefferbaum et al., 2018 | 483 | Longitudinal mass-univariate analysis of GMV development | Accelerated GMV reduction in frontal brain regions.
Jones and Nagel, 2019 | 113 | Modeling the WM microstructure development (DTI) for each voxel | Altered frontostriatal WM microstructure is predictive of future AAM.
Kühn et al., 2019 | ≈1500 | Growth curve modeling of GM volumes | Higher GMV in caudate nucleus and left cerebellum predicts future AAMs.
Seo et al., 2019 | ≈1000 | ML analysis of cue-related brain region followed by mass-univariate analysis for identifying region importance | Current AAMs show reduced GMV in medial-pFC, oFC, thalamus, bilateral ACC, left amygdala and anterior insular.
Sullivan et al., 2020 | 548 | Longitudinal mass-univariate (GLM) analysis of cerebellar region volumes | Cerebellum: accelerated GM decline in 2 sub-regions and accelerated expansion of WM in one sub-region and CSF.
Robert et al., 2020 | 726 | Mass-univariate analyses of voxels, followed by analysis of the direction of causality using causal bayesian networks | Accelerated GM atrophy in parts of the temporal cortex and left prefrontal cortex.
Filippi et al., 2021 | 671 | ML analysis for predictors of resilience towards polysubstance use | Adolescents resilient to PSU show larger GMV in the bilateral cingulate gyrus.

Acronyms: GM: grey matter; WM: white matter; CSF: cerebrospinal fluid; GMV: grey matter volume; pFC: prefrontal cortex; oFC: orbitofrontal cortex; ACC: anterior cingulate cortex; PCC: posterior cingulate cortex; GLM: generalized linear models; ML: machine learning; DTI: Diffusion Tensor Imaging; FA: Fractional Anisotropy; MD: mean diffusivity.

Multivariate exploratory analysis: In recent years, data collection drives such as IMAGEN (Mascarell Maričić et al., 2020), NCANDA (Brown et al., 2015), and UK Biobank (Sudlow et al., 2015) have made available large-sample multi-site data with n>1000 that are representative of the general population. This has enabled researchers to use multivariate, data-driven, and exploratory analysis tools such as machine learning (ML) to detect effects of alcohol misuse on multiple brain regions (Whelan et al., 2014; Squeglia et al., 2017; Seo et al., 2019; Filippi et al., 2021; Jia et al., 2021; Yip et al., 2022). Such whole-brain multivariate methods are preferable over the previous mass-univariate methods as they have a higher sensitivity to detect true positives (Hebart and Baker, 2018). Furthermore, ML can be easily used for clinical applications such as computer-aided diagnosis, predicting future development of AUD, and future relapse of patients into AUD (Shiraishi et al., 2011).

Due to these advantages, several exploratory studies using ML have been attempted in AUD research (Whelan et al., 2014; Seo et al., 2019; Squeglia et al., 2017). We further extend this line of work by analyzing the newly available longitudinal data from IMAGEN (n∼1182 at 4 time points of adolescence) (Mascarell Maričić et al., 2020) by designing a robust and reliable ML pipeline. The goal of this study is to explore the relationship between adolescent brain and AAM using ML and discover any brain features that can be associated with AAM. As shown in Figure 1, we predict AAM at age 22 using brain morphometrics derived from structural imaging captured at three stages of adolescence: ages 14, 19, and 22. The structural features of different brain regions are extracted from two modalities of structural MRI, that is, T1-weighted imaging (T1w) and Diffusion Tensor Imaging (DTI). The most informative structural features for the ML model prediction are discovered using SHAP (Lundberg and Lee, 2017; Lundberg et al., 2020) to reveal the most distinct structural brain differences between AAMs and controls. Furthermore, we use multiple phenotypes of alcohol misuse such as the frequency of alcohol consumption, amount of consumption, onset of misuse, binge drinking, the AUDIT score, and other combinations, and systematically compare them. We also compare four different ML models and multiple methods of controlling for confounds in ML, and derive important methodological insights that are beneficial for reliably applying ML to psychiatric disorders such as AUD. To promote reproducibility and open science, the entire codebase used in this study, including the initial data analysis performed on the IMAGEN dataset, is made available at https://github.com/RoshanRane/ML_for_IMAGEN (Rane and Kim, 2022; copy archived at swh:1:rev:6c493672ed700ded73c2b77e8976a5551921e634).

Figure 1
An overview of the analysis performed.

Morphometric features extracted from structural brain imaging are used to predict Adolescent Alcohol Misuse (AAM) developed by the age of 22 using machine learning. To understand the causal relationship between AAM and the brain, three separate analyses are performed by using imaging data collected at three stages of adolescence: age 14, age 19, and age 22.

Results

The results are reported in the following five subsections: In subsection 1, different confound-control techniques are compared and the most suitable technique for this study is determined. Subsection 2 shows the results of the ML exploration performed with ten AAM labels, four ML models, and using imaging data from three time points of adolescence. This stage helps to determine the best phenotype of AAM and the best ML model. Subsection 3 reports the final results on the independent data holdout for all three time point analyses and subsection 4 shows the most informative features found in each of the analyses. Subsection 5 reports the result from the additional leave-one-site-out experiment.

Confound correction techniques

The sex c_sex and recruitment site c_site of subjects confound this study (refer to subsection 5.1 in ‘Materials and methods’) and their influence on the study needs to be controlled. We test three confound correction techniques on data_explore: (a) confound regression, (b) counterbalancing with undersampling, and (c) counterbalancing with oversampling. To verify if these methods work as expected, the same analysis approach from Görgen et al., 2018 and the approach by Snoek et al., 2019 are employed. For the two confounds c_sex and c_site, this requires us to test five input-output combinations (X→y, X→c_sex, X→c_site, c_sex→y, and c_site→y) for a given X→y analysis.

Figure 2 shows the results of comparing different confound correction techniques for the ‘Binge’ phenotype. The following conclusions can be derived from this comparison:

Figure 2
Comparing confound correction techniques.

Five input-output settings are compared within each confound correction technique: X→y, X→c_sex, X→c_site, c_sex→y, and c_site→y. (a) shows the results before any correction is performed, (b) shows the results of performing confound regression, and (c) and (d) show the results from counterbalancing by undersampling the majority class and oversampling the minority class, respectively. Statistical significance is obtained from 1,000 permutation tests and is shown with ** if p<0.01, * if p<0.05, and ‘n.s’ if p≥0.05.

  1. Sex and site can confound the AAM analysis: As shown in subplot (a), all the input-output combinations involving the confounds (X→c_sex, X→c_site, c_sex→y, and c_site→y) produce significant prediction accuracies before any confound correction is performed. This further adds to the evidence that both the confounds c_sex and c_site can strongly influence the accuracy of the main analysis X→y and confound the analysis.

  2. Confound regression is not a good choice when followed by a non-linear ML method: Following confound regression, the results of X→c_sex and X→c_site should become non-significant as the signal s_c has been removed from X. However, it is seen that in some cases the non-linear models SVM-rbf and GB are capable of detecting the confounding signal s_c from the imaging data. The red arrow in the subplot (b) points out one such case in the example shown. This is not surprising as the standard confound regression removes linear components of the signal s_c but does not remove any non-linear components that might still be present in X (Görgen et al., 2018; Dinga et al., 2020). Furthermore, confound regression carries an additional risk of also regressing out the useful signal in X that does not confound the analysis X→y but is a co-variate of both c and y (Dinga et al., 2020).

  3. Counterbalancing with oversampling is the best choice for this study: As expected, counterbalancing forces the c_sex→y and c_site→y accuracies to chance-level by removing the correlation between c and y (subplots c and d). It can be seen that after the undersampled counterbalancing the results of the main analysis X→y also become non-significant, as indicated by the red arrow in (c). This drastic reduction in performance is likely due to the reduction in the sample size of the training data by n≈100-250 from undersampling. Therefore, counterbalancing with oversampling of the minority group is a better alternative compared to undersampling.
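The second conclusion, that linear confound regression can leave non-linear confound information behind, can be illustrated on synthetic data. The following is a minimal sketch and not the pipeline used in the study: the toy features, the function name `regress_out_confound`, and the choice of a variance effect as the non-linear component are all assumptions made for illustration.

```python
import numpy as np


def regress_out_confound(X_train, c_train, X_test, c_test):
    """Remove the linear effect of confound c from every column of X.

    The regression weights are estimated on the training split only and then
    applied to the test split, so no test information leaks into the fit.
    """
    A_train = np.column_stack([np.ones(len(c_train)), c_train])
    A_test = np.column_stack([np.ones(len(c_test)), c_test])
    beta, *_ = np.linalg.lstsq(A_train, X_train, rcond=None)
    return X_train - A_train @ beta, X_test - A_test @ beta


# Hypothetical toy data: a binary confound that shifts the mean of one feature
# (a linear effect) and inflates the spread of another (a non-linear effect).
rng = np.random.default_rng(0)
n = 2000
c = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([
    rng.normal(size=n) + 1.5 * c,    # mean depends on c
    rng.normal(size=n) * (1.0 + c),  # variance depends on c
])

half = n // 2
X_res, _ = regress_out_confound(X[:half], c[:half], X[half:], c[half:])
c_tr = c[:half]

# After regression the residuals are linearly uncorrelated with c ...
linear_leftover = abs(np.corrcoef(X_res[:, 0], c_tr)[0, 1])
# ... but the magnitude of the second feature still carries confound information.
nonlinear_leftover = abs(np.corrcoef(np.abs(X_res[:, 1]), c_tr)[0, 1])
print(f"linear leftover: {linear_leftover:.3f}, non-linear leftover: {nonlinear_leftover:.3f}")
```

In this toy setting a linear model has nothing left to exploit, whereas a model that can use feature magnitudes, as kernel and tree-based methods can, could still recover the confound.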

This comparison was also repeated for two other AAM phenotypes - ‘Combined-seo’ and ‘Binge-growth’ and the above findings were found to be consistent across all of them. Hence, counterbalancing with oversampling is used as the confound-control technique in the main analysis. When performing over-sampled counterbalancing, it is ensured that the oversampling is done only for the training data.
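One way to implement counterbalancing with oversampling is sketched below. It is an assumption-laden illustration and not the authors' code: within each confound group, the rarer label is resampled with replacement until both labels are equally frequent, so that the confound alone cannot predict the label. The function name and the toy arrays are hypothetical.

```python
import numpy as np


def counterbalance_by_oversampling(y, c, seed=0):
    """Return sample indices in which every confound group has equal label counts.

    All original samples are kept; the minority label inside each confound
    group is topped up by sampling with replacement.
    """
    y, c = np.asarray(y), np.asarray(c)
    rng = np.random.default_rng(seed)
    labels = np.unique(y)
    picked = []
    for group in np.unique(c):
        cells = [np.flatnonzero((c == group) & (y == label)) for label in labels]
        if any(len(cell) == 0 for cell in cells):
            raise ValueError(f"confound group {group!r} lacks one of the labels")
        target = max(len(cell) for cell in cells)
        for cell in cells:
            extra = rng.choice(cell, size=target - len(cell), replace=True)
            picked.append(np.concatenate([cell, extra]))
    return np.concatenate(picked)


if __name__ == "__main__":
    # Hypothetical training split: label y (1 = AAM) and a binary confound c.
    y_train = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
    c_train = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
    idx = counterbalance_by_oversampling(y_train, c_train)
    # Only the training split is resampled; the test split is left untouched
    # so that duplicated samples never appear on both sides of the split.
    print(len(y_train), "->", len(idx))
```

Restricting the resampling to the training data, as stated above, matters because duplicates of the same subject in both the training and test sets would inflate the measured accuracy.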

ML exploration

The results from the ML exploration experiments are summarised in Figure 3. For the different AAM phenotypes, the balanced accuracies range between 45 and 73%. It must be noted that the results across different phenotypes are not directly comparable as each AAM phenotype classification task has a different sample size, varying between 620 and 780 (refer to ‘Materials and methods’ Table 2 and Appendix 1—table 2 for the list of phenotypes and their respective sample size). These differences in the number of samples in the two classes, AAM and controls, could introduce additional variance in the accuracy. Nevertheless, some useful observations can be made from the consistencies found across the three time point analyses, depicted in subplots (a), (b), and (c) of Figure 3:

Figure 3 (with 1 supplement)
Results of the ML exploration experiments: The ten phenotypes of AAM tested are listed on the y-axis and the four ML models are represented with different color coding as shown in the legend of figure (a).

For a given AAM label and ML model, the point represents the mean balanced accuracy across the 7-fold CV and the bars represent its standard deviation. Figure (a) shows the results when the imaging data from age 22 (FU3) is used, figure (b) shows results for age 19 (FU2) and figure (c) for age 14. Figure (d) shows the results from all three time point analyses in a single plot along with the interval of balanced accuracies that were non-significant (p≥0.05) when tested with permutation tests.

  1. The most predictable phenotype from structural brain features for all three time point analyses is ‘Binge’ which measures the total lifetime experiences of being drunk from binge drinking.

  2. Other individual phenotypes such as the amount of alcohol consumption (Amount), frequency of alcohol use (Frequency) and the age of AAM onset (Onset) are harder to predict from brain features compared to the binge drinking phenotype. The results on ‘Combined-seo’ and ‘Combined-ours’ show that using phenotypes measuring amount and frequency of drinking in combination with binge drinking also seems to be detrimental to model performance.

  3. All models perform poorly at predicting AAM phenotypes derived from AUDIT. This is surprising as AUDIT is considered a de facto screening test for measuring alcohol misuse (Kranzler and Soyka, 2018).

  4. Among the four ML models, the SVM with a non-linear kernel (SVM-rbf) and the ensemble learning method GB perform better than the linear models LR and SVM-lin. This is further evident in the summary plot (d) in the figure.

Table 2
10 phenotypes of Adolescent Alcohol Misuse (AAM) are derived and compared in this analysis.

A description of each phenotype is provided here along with the IDs of the IMAGEN questionnaires used to generate the phenotype.

No. | Phenotype | Description | Questionnaire
1 | Frequency | Number of occasions drinking alcohol in last 12 months | ESPAD 8b.
2 | Amount | Number of alcohol drinks consumed on a typical drinking occasion | ESPAD prev31, AUDIT q2.
3 | Onset | Had one or more binge-drinking experiences by the age of 14 | ESPAD 29d
4 | Binge | Total drunk episodes from binge-drinking in lifetime (by age 22) | ESPAD 19a, AUDIT q3.
5 | Binge-growth | Longitudinal trajectory of binge-drinking experiences had per year | Growth curve of ESPAD 19b.
6 | AUDIT | AUDIT screening test performed at the year of scan | AUDIT-total (q1-10).
7 | AUDIT-quick | Only the first 3 questions of AUDIT screening test | AUDIT-freq (q1-3).
8 | AUDIT-growth | Longitudinal changes in the AUDIT score measured over the years | Growth curve of AUDIT-total.
9 | Combined-seo | A combined risky-drinking phenotype from Seo et al., 2019 generated using amount, frequency, and binge-drinking data | ESPAD 8b, 17b, 19b, and TLFB alcohol2
10 | Combined-ours | A combined risky-drinking phenotype developed by clustering amount, frequency, and binge-drinking trajectory | AUDIT q1, q2, ESPAD 19a, growth curve of ESPAD 19b.

In summary, the non-linear ML models SVM-rbf and GB coupled with the ‘Binge’ phenotype consistently perform the best in all three time point analyses. This is more clearly visible in the summary figure (d) where the results from all three analyses are combined in a single plot. Similar general observations can be made when the AUC-ROC metric is used to measure model performance (see Figure 3—figure supplement 1).
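For readers less familiar with the two performance metrics used here, the sketch below shows how balanced accuracy and ROC-AUC can be computed for a binary task with plain NumPy. It is a reference illustration under the assumption of labels coded 0 (control) and 1 (AAM); the study itself may rely on library implementations.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, which corrects for unequal class sizes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == k] == k) for k in np.unique(y_true)]
    return float(np.mean(recalls))


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs that the scores rank correctly.

    Ties count as half a correct pair. This pairwise definition is equivalent
    to the area under the ROC curve for binary labels coded 0 and 1.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


if __name__ == "__main__":
    y = [0, 0, 0, 1]
    print(balanced_accuracy(y, [0, 0, 1, 1]))    # recalls 2/3 and 1
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Balanced accuracy depends on a decision threshold, while ROC-AUC summarizes ranking quality over all thresholds, which is why the two can differ in magnitude for the same model.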

Generalization

The generalization test is performed with ‘Binge’ phenotype as the label and the two non-linear ML models, SVM-rbf and GB. The final results are shown in Figure 4. For the three analyses using imaging data from age 22, age 19, and age 14 as input, average balanced accuracies of 78%, 75.5%, and 73.1% are achieved, respectively. Their average ROC-AUC scores are 83.93%, 83.1%, and 81.5% for the respective analyses. The accuracies for all three time point analyses are significant with p<0.01. To get a better intuition, please refer to Figure 4—figure supplement 1 that shows the model accuracies against the accuracies obtained from permutation tests.

Figure 4 (with 1 supplement)
Final results for the three time point analyses on the ‘Binge’ drinking AAM phenotype obtained with the two non-linear ML models, kernel-based support vector machine (SVM-rbf) and gradient boosting (GB).

The figure shows the mean balanced accuracy achieved by each ML model within each analysis while the table lists the combined average scores for each analysis. The ML models are retrained seven times on data_explore with different random seeds and evaluated on data_holdout to obtain an estimate of the accuracy with a standard deviation. Statistical significance is obtained from 1000 permutation tests and is shown with ** if p<0.01, * if p<0.05, and ‘n.s’ if p≥0.05.
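The significance markers in this caption follow from a label-permutation test. The sketch below is a simplified version that shuffles the true labels against a fixed set of predictions; a full pipeline would typically retrain the model on each permuted label set, which is assumed but not shown here. The function names are illustrative.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean([np.mean(y_pred[y_true == k] == k) for k in np.unique(y_true)]))


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    observed = balanced_accuracy(y_true, y_pred)
    null_scores = np.array([
        balanced_accuracy(rng.permutation(y_true), y_pred)
        for _ in range(n_permutations)
    ])
    # The +1 terms count the observed score as one of the permutations,
    # so the p-value can never be exactly zero.
    p = (1 + np.sum(null_scores >= observed)) / (1 + n_permutations)
    return observed, float(p)


def significance_marker(p):
    """Map a p-value to the markers used in the figure captions."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s"


if __name__ == "__main__":
    # Hypothetical predictions that are correct for 75% of each class.
    y = np.array([0] * 100 + [1] * 100)
    y_hat = y.copy()
    y_hat[:25] = 1
    y_hat[100:125] = 0
    score, p = permutation_p_value(y, y_hat)
    print(score, p, significance_marker(p))
```

With 1000 permutations the smallest attainable p-value is 1/1001, so the ** marker simply indicates that fewer than about ten permuted runs matched the observed score.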

To further assess the causality in the MRI_age14→AAM_age22 analysis, we repeated it using only subjects who had no binge drinking experiences by age 14 (n=477) and also with subjects who had a maximum of one binge drinking experience (n=565) by age 14. The balanced accuracies obtained on the holdout set were 72.9±2% and 71.1±2.3%, respectively.

Important brain regions

Following the generalization test, the most informative structural brain features are determined for the SVM-rbf model, as it performs the better of the two non-linear models tested on data_holdout (see Figure 4). Figure 5 shows the list of the most important features for all three time point analyses and illustrates where they are located in the brain. It also shows whether these features have lower-than-average or higher-than-average values when the ML model predicts the subjects as AAMs.

Figure 5
Most informative structural features for SVM-rbf model’s predictions on data_holdout.

The most important features are listed and their locations are shown on a template brain, for better intuition, for each of the three time point analyses. The features are color coded to also display whether these features have lower-than-average or higher-than-average values when the model predicts alcohol misusers. This figure is only illustrative and an exhaustive list of all informative features with their corresponding SHAP values is given in Appendix 1—table 3. (Acronyms: AAM: adolescent alcohol misuse, area: surface area, volume: gray matter volume, thickness: average thickness, thicknessstd: standard deviation of thickness, intensity: mean intensity, meancurv: integrated rectified mean curvature, gauscurv: integrated rectified gaussian curvature, curvind: intrinsic curvature index).

Several clusters of regions and feature values can be identified. Most of the important subcortical features are located around the lateral ventricles and the third ventricle and include CSF-related features such as the CSF mean-intensity, volume of left choroid plexus, and left corticospinal tract in the brain stem. Several white matter tracts are found to be informative, such as parts of the corpus callosum, internal capsule, and posterior corona radiata. Furthermore, all of these white matter tracts, along with the brain stem, have lower-than-average intensities in AAM predictions. The prominent cortical features are spread across the occipital, temporal, and frontal lobes. In the MRI_age22→AAM_age22 analysis, important cortical features appear in the occipital lobe. In contrast, for the future prediction analyses MRI_age19→AAM_age22 and MRI_age14→AAM_age22, clusters appear in the limbic system (parts of the cingulate cortex and right parahippocampal gyrus), frontal lobe (left-pars orbitalis, left-frontal pole, right-precentral gyrus, and left-rostral middle frontal gyrus) as well as in the temporal lobe (left-inferior temporal gyrus, left-temporal pole, and right-bank of the superior temporal sulcus). In the occipital lobe, AAM predictions have lower grey matter thickness in the right-cuneus, lateral occipital, and pericalcarine cortices, and higher curvature index in left-cuneus and left-pericalcarine cortex. The list of all the informative features is provided in Appendix 1—table 3 along with their feature type, modality, and respective SHAP values in each CV fold.
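The feature attributions above are obtained with SHAP, which builds on Shapley values. The idea can be illustrated by computing exact Shapley values by brute force for a toy model with three features. This sketch does not use the SHAP library and is not the procedure applied in the study: the toy model, the single reference point used for "absent" features, and all names are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    A feature that is "absent" from a coalition takes its baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for a handful of features.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                present = np.array(subset, dtype=int)
                without_i = baseline.copy()
                without_i[present] = x[present]
                with_i = without_i.copy()
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(z):
    """Hypothetical non-linear score with an interaction between features 0 and 2."""
    return float(1.0 / (1.0 + np.exp(-(2.0 * z[0] - z[1] + 0.5 * z[0] * z[2]))))


if __name__ == "__main__":
    x = np.array([1.0, -0.5, 2.0])
    reference = np.zeros(3)
    phi = exact_shapley_values(toy_model, x, reference)
    # Attributions add up to the difference between the prediction and the reference.
    print(phi, phi.sum(), toy_model(x) - toy_model(reference))
```

A positive attribution means the feature value pushed the prediction towards the positive class relative to the reference, which is the sense in which a feature with a lower-than-average or higher-than-average value is said to drive an AAM prediction.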

Cross-site experiment

The results from the leave-one-site-out CV experiment are shown in Figure 6. The ML models perform close-to-chance for all AAM labels in the ML exploration experiments and fail to produce a significant performance for any of the three time points in the generalization test. For the ‘Binge’ label in the ML exploration stage, the model accuracy displays very high variance, as compared to the main experiment (compare Figure 6 with Figure 3 (d)). This suggests that the performance of the ML models varies greatly across sites in this study.
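The leave-one-site-out scheme differs from the regular cross-validation only in how the folds are formed: every fold holds out all subjects from one recruitment site. A minimal sketch of such a splitter is given below; the site labels are placeholders and the function name is an assumption.

```python
import numpy as np


def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices), one fold per site."""
    sites = np.asarray(sites)
    for held_out in np.unique(sites):
        test_idx = np.flatnonzero(sites == held_out)
        train_idx = np.flatnonzero(sites != held_out)
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    # Placeholder site labels for eight subjects from three sites.
    site_of_subject = ["site_A", "site_A", "site_B", "site_C",
                       "site_B", "site_C", "site_C", "site_A"]
    for site, train_idx, test_idx in leave_one_site_out(site_of_subject):
        print(site, "train:", train_idx.tolist(), "test:", test_idx.tolist())
```

Because the model never sees the held-out site during training, this setup measures how well the learned patterns transfer across sites, which is a stricter test than the within-sample CV of the main experiment.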

Analysis repeated with leave-one-site-out cross validations (CV).

Discussion

For over two decades, researchers have tried to uncover the relationship that exists between adolescent alcohol misuse (AAM) and brain development. Many previous studies found that such a relationship exists (see Table 1) but with low-to-medium effect size (Nagel et al., 2005; Whelan et al., 2014; Squeglia et al., 2017; Seo et al., 2019; De Bellis et al., 2005; McQueeny et al., 2009; Luciana et al., 2013). The brain regions linked with AAM varied greatly across studies (see highlighted text in Table 1). This inconsistency in findings and effect sizes could be due to methodological limitations, small sample sizes, unavailability of long-term longitudinal data like IMAGEN (Mascarell Maričić et al., 2020), or simply due to the heterogeneous expression of AAM in the brain. In our study, ML models predicted AAM with significantly above-chance accuracies in the range 73.1%-78% (ROC-AUC of 81.5%-83.9%) from adolescent brain structure captured at ages 14, 19, and 22. Thus, our results demonstrate that adolescent brain structure is indeed associated with alcohol misuse during this period.

The causality of the relationship between adolescent brain structure and AAM is not clear (Whelan et al., 2014; Robert et al., 2020). The relationship could arise from alcohol misuse inducing neurotoxicity (Zahr and Pfefferbaum, 2017), causing the observed changes in the brain. It could also be that these structural differences precede AAM and such adolescents are just more vulnerable towards alcohol misuse (Chambers et al., 2003; Sanchez-Roige et al., 2019). Such neuropsychological predisposition could stem from genetic predispositions or from influencing environmental factors such as early stress or childhood trauma (Baker et al., 2013; Ross et al., 2021), misuse of other drugs such as cannabis (French et al., 2015) and tobacco, and parental drug misuse (Jones and Nagel, 2019). There might also be an interaction effect between alcohol-induced neurotoxicity and environmental and genetic predispositions (Robert et al., 2020). While the direction of causality is still under active investigation (Robert et al., 2020; Bourque et al., 2016), the significantly high accuracies obtained in our study for MRIage19AAMage22 and especially MRIage14AAMage22 suggest that these structural differences might be preceding alcohol misuse behavior. Out of the 265 subjects that took the ESPAD survey at age 14 and belonged to the AAM category in MRIage14AAMage22 analysis, 83.3% of subjects reported having no or just one binge drinking experience until age 14. When we repeated the MRIage14AAMage22 analysis with only the subjects who had no binge drinking experiences (n=477) or a maximum of one binge drinking experience (n=565) by age 14, we obtained a balanced accuracy of 72.9±2% and 71.1±2.3% respectively, on the holdout data. This is comparable to the main result of 73.1±2%. This result provides further evidence for the findings of Robert et al., 2020 that certain cerebral predispositions might precede alcohol abuse in adolescents. Thus, like Robert et al., 2020, we also advocate caution when interpreting the results from previous cross-sectional studies suggesting alcohol-induced brain atrophy.

We identified the most informative brain features for the ML predictions using SHAP, which has been successfully applied to medical data (Lundberg and Lee, 2017; Lundberg et al., 2020; Molnar, 2022). The important features were found to be distributed across several subcortical and cortical regions of the brain, implying that the association between AAM and brain structure is widespread and heterogeneous. In accordance with previous studies, AAM was associated with lower DTI-FA intensities in several white matter tracts and the brain stem (McQueeny et al., 2009; Jacobus et al., 2013; Jones et al., 2018) and reduced GM thickness (Squeglia et al., 2017; Pfefferbaum et al., 2018), especially in the occipital lobe. Features of anterior cingulate cortex (Squeglia et al., 2017; Seo et al., 2019; Jones et al., 2018), middle frontal and precentral gyrus (Luciana et al., 2013), hippocampus (De Bellis et al., 2000; Nagel et al., 2005), and right parahippocampal gyrus (Whelan et al., 2014) were also found to be informative, although the type of feature and the average feature value in AAMs differed from previous studies. Features from the frontal lobe and cerebellum were informative only for future AAM (Jones and Nagel, 2019) but not for current AAM prediction, in contrast to findings of De Bellis et al., 2005; Whelan et al., 2014; Seo et al., 2019. This difference could be due to the meticulous confound control performed in this study for sex and site of the subjects. Additionally, our ML models also found CSF-related features in the third and lateral ventricles, and some regions of the temporal cortex as informative features for AAM prediction.

In the ML exploration stage, we found that the binge drinking phenotype, which is commonly used in previous studies (Nagel et al., 2005; Whelan et al., 2014; Robert et al., 2020), was the most predictable phenotype of AAM as compared to frequency, amount, or onset of alcohol misuse. Curiously, phenotypes derived from AUDIT, which is a gold standard of screening for alcohol misuse (Kranzler and Soyka, 2018), did not score significantly above chance in any of the three time point analyses. Other similar compound metrics that use measures of alcohol use frequency and amount along with binge drinking, such as ‘Combined-seo’ and ‘Combined-ours’, also perform worse than using just the binge drinking information. This suggests that using other phenotypes of alcohol misuse in combination with binge drinking was detrimental to the prediction task, as compared to using only binge drinking. Different phenotypes of AAM capture slightly different psychosocial characteristics of adolescents (Castellanos-Ryan et al., 2013). For instance, ‘Amount’ correlates significantly with agreeableness and a life history of relocation valence (r=-0.14, p<0.001), accident valence (r=-0.16, p<0.001) and sexuality frequency (r=-0.17, p<0.001), whereas the other phenotypes do not (p>0.01). ‘AUDIT’ and its derivatives significantly correlate with the impulsivity trait (r=0.23, p<0.001) on SURPS, whereas ‘Binge’ does not (r=0.09, p>0.01), but they both correlate with the sensation seeking trait (r>0.29, p<0.001), as also found in previous studies (Castellanos-Ryan et al., 2011). Castellanos-Ryan et al., 2013 have found that these two traits manifest differently in the brain. Therefore, one can hypothesize that the psychosocial differences and their associated neural correlates (Castellanos-Ryan et al., 2011) between ‘Binge’ and the other AAM phenotypes might explain the 2-10% higher accuracy obtained with ‘Binge’.

In contrast to the main results, the ML models failed to attain significantly high prediction accuracy in the leave-one-site-out experiment as the scores displayed high variance across the CV folds (refer to Figure 6). On further investigation, we found that the ML models performed especially poorly on test data from Dublin and Nottingham (60% balanced accuracy) across all time points and metrics. On the contrary, models always performed better-than-chance on subjects from Dresden, Mannheim, and Hamburg. When we compared this with the main experiment, a similar pattern was found. The models least generalized to test subjects from the sites Dublin and Nottingham, across all 7 CV folds. Notably, the accuracy across sites did not correlate with the sample size of the sites, the ratio of AAMs to controls in the site, or their sex distribution. The results are shown in Appendix 1—figure 2 and Appendix 1—figure 3. Altogether, these results suggest that the relationship discovered in this study performs diversely on subjects from different sites and does not generalize equally across all sites of the IMAGEN dataset.

Methodological insights: To the best of our knowledge, this is the first study to analyze and report results on the complete longitudinal data from IMAGEN, including the follow-up 3 data. Two previous studies (Whelan et al., 2014; Seo et al., 2019) performed similar ML analyses on the IMAGEN data and, unlike us, found only a weak association between structural imaging and AAM. The logistic regression model in Whelan et al., 2014 scored 58±8% ROC-AUC when predicting AAM at age 14 from structural imaging features collected at age 14 (BL) and 63±7% ROC-AUC when predicting AAM at age 16 (FU1). This lower accuracy with high variance obtained in their experiments can be attributed to (a) the relatively smaller sample size used in their study (n = 265-271), (b) unavailability of long-term AAM information from IMAGEN’s FU2 and FU3 data, (c) using only a linear ML model, and (d) only using GM volume and thickness as structural features. On the other hand, the models of Seo et al., 2019 achieved accuracies in the range 56-58% when predicting AAM at age 19 (FU2) using imaging features from age 19, and did not reach a significant accuracy when they used imaging features from age 14. This lower performance can be attributed to the following experimental design decisions: (a) Seo et al., 2019 used GM volume and thickness features from just 24 regions of the brain associated with cue-reactivity, (b) their AAM phenotype is not the best phenotype of AAM, as evident from the results of our ML exploration (see results for ‘Combined-seo’ in Figure 3), and (c) the confound-control technique used in their study, confound regression, can result in under-performance as demonstrated in Figure 2.

In contrast to these previous works, our study has the following advantages. First, we use 719 structural features extracted from 2 MRI modalities, T1w and DTI, that include not only GM volume and thickness but also surface area, curvature, and WM and GM intensities from all cortical and sub-cortical regions in the brain. Second, we empirically derive the best AAM label for the task by comparing different phenotypes previously used in the literature. For the different AAM phenotypes, the balanced accuracies range from chance to significant performance (45%-73%), emphasizing the importance of the choice of the label in such ML studies with low effect sizes. Finally, we test different confound correction techniques and use the one that effectively controls for the influence of confounds without also destroying the signal of interest. In summary, the higher accuracy in the current study can be attributed not just to the availability of long-term data on AAM but also to the rigorous comparison of different labels of AAM, different ML models, and confound control techniques.

Among the four different ML models tested, the two non-linear models, SVM-rbf and GB, consistently performed better than the two linear models. We also explicitly ensured that the confounding influences of sex and site were eliminated by combining suggestions from Görgen et al., 2018 and Snoek et al., 2019. We found evidence that the linear confound regression technique often used in previous ML-based neuroimaging studies (Seo et al., 2019; Robert et al., 2020; Snoek et al., 2019) might not be the best choice, as it cannot be used with non-linear models such as SVM-rbf or Naive Bayes used in Seo et al., 2019 and distorts the signal of interest from the neuroimaging data (Dinga et al., 2020), as seen in Figure 2. In contrast, counterbalancing using oversampling is recommended as it successfully removed the influence of the confounds without reducing the sample size in the study.

Future work: An important follow-up work would involve further investigating the association we found between AAM and adolescent brain structure and its clinical implications. For instance, one can analyze if certain environmental risk factors such as childhood abuse, parental drug use, or life event stressors mediate the relationship we found between brain structure and AAM behavior in the IMAGEN cohort. Another direction would be to further investigate the brain features associated with AAM and understand the relative contributions of specific brain networks (for example, similar to Seo et al., 2019) and certain specific feature types such as thickness, or volumes. Specifically, since ML feature attribution methods such as SHAP can be misled by the presence of correlated features (Molnar, 2022; Lundberg and Lee, 2017), it would be necessary to before-hand determine which features might be correlated and either exclude them, or permute correlated features together in groups when computing SHAP values (Molnar, 2022). Another important future work would be to reproduce our findings on another data set such as NCANDA (Brown et al., 2015) comprising adolescent subjects from a different geographic region. It would also be interesting to explore other modalities such as functional connectivity (fMRI) to predict AAM (Ruan et al., 2019).

Conclusion

This study analyzed alcohol misuse in adolescents and their brain structure in the large, longitudinal IMAGEN dataset consisting of n = 1182 healthy adolescents (Schumann et al., 2010; Mascarell Maričić et al., 2020). We found that alcohol misuse in adolescents can be predicted from their brain structure with a significant and high accuracy of 73%-78%. More importantly, alcohol misuse at age 22 could be predicted from the brains at age 14 and age 19 with significant accuracies of 73.1% and 75.55%, respectively. This suggests that the structural differences in the brain might at least partly be preceding alcohol misuse behavior (Robert et al., 2020). Results of a leave-one-site-out experiment also revealed that the relationship discovered by the ML models may not generalize to all the sites in the IMAGEN dataset equally, particularly to subjects from the sites Nottingham and Dublin. We extensively compared different phenotypes of alcohol misuse such as frequency of alcohol use, amount of use, the onset of alcohol misuse, and binge drinking occasions and found that binge drinking is the most predictable phenotype of alcohol misuse. We also compared four different ML models and found that the two non-linear models, SVM-rbf and GB, perform better than the two linear models, SVM-lin and LR. We also evaluated different confound-control techniques and found that counterbalancing with oversampling is most beneficial for the current task. To the best of our knowledge, this was the first study to analyze and report results on the follow-up 3 data from IMAGEN. The results of our exploratory study suggest that collecting long-term, large cohorts of data, representative of the population, followed by a systematic ML analysis can greatly benefit research on complex psychiatric disorders such as AUD.

Materials and methods

Data

The IMAGEN dataset (Mascarell Maričić et al., 2020; Schumann et al., 2010) is currently one of the best candidates for studying the effects of alcohol misuse on the adolescent brain. Most large-sample studies listed in Table 1 [27, 29 and 30] used the IMAGEN dataset for their analysis. It consists of data collected from over 2000 young people and includes information such as brain neuroimaging, genomics, cognitive and behavioral assessments, and self-report questionnaires related to alcohol use and other drug use. The data were collected from 8 recruitment centers across Europe, at 4 successive time points of adolescence and youth. Figure 7 (a) shows the number of subjects at each time point and the number of participants that were scanned. Subjects were not scanned in FU1. More details regarding recruitment of subjects, acquisition of psychosocial measures, and ethics can be found on the IMAGEN project website (https://imagen-europe.com/standard-operating-procedures). Written and informed consent was obtained from all participants by the IMAGEN group and the study was approved by the institutional ethics committees of King’s College London, University of Nottingham, Trinity College Dublin, University of Heidelberg, Technische Universität Dresden, Commissariat à l’Energie Atomique et aux Energies Alternatives, and University Medical Center at the University of Hamburg in accordance with the Declaration of Helsinki (Association, 2013).

The IMAGEN dataset: (a) Data is collected longitudinally at 4 stages of adolescence - age 14 or baseline (BL), age 16 or follow-up 1 (FU1), age 19 or follow-up 2 (FU2) and, finally age 22 or follow-up 3 (FU3).

The blue bar shows the number of subjects with brain imaging data. (b) The distribution of subjects across sex and the site of recruitment, for the 1182 subjects that were scanned at FU3 (c) The same distribution across sex and site also showing the proportion of subjects that meet the AUDIT ’risky drinkers’ category at FU3.

Structural neuroimaging data

To investigate the effects of alcohol on brain structure, two MRI modalities have been used predominantly in the literature: (a) T1-weighted imaging (T1w), and (b) Diffusion Tensor Imaging (DTI) (see Table 1). While T1w MRI can be used to derive general features of the brain structure such as cortical and sub-cortical volumes, areas, and gray-matter thicknesses, DTI captures white matter microstructures by probing water molecule motion. An axial slice (z=80) of both of these MRI modalities of a control subject from the IMAGEN data is shown in Figure 8. Both modalities were recorded using 3-Tesla scanners. The T1w images were collected using sequences based on the ADNI protocol (Wyman et al., 2013). The IMAGEN consortium used Freesurfer’s recon-all pipeline to process these images and extract structural features. This involves registering the T1w images to the Talairach template brain, automatic extraction of gray matter, white matter and cerebrospinal fluid (CSF) sections, and then segmenting them into 34 cortical regions per hemisphere and 45 sub-cortical regions. The grey matter volume (in mm³), surface area (in mm²), thickness (in mm), and surface curvature are extracted for each of the cortical regions using the Desikan-Killiany atlas, along with global features such as total intracranial, total grey matter, white matter and CSF volumes. For the subcortical regions, the mean intensity and volume are determined. This results in a total of 656 structural features per subject. DTI scans were acquired using the protocol described in Jones et al., 2002, and Fractional Anisotropy (FA) is derived from the DTI using FMRIB’s Diffusion Toolbox (FDT). The DTI-FA images are then non-linearly registered to the MNI152 space (1 mm³) and the average FA intensity at 63 regions with white matter tracts is calculated using the TBSS toolbox (Smith et al., 2006) by the IMAGEN consortium (https://github.com/imagen2/imagen_processing/tree/master/fsl_dti). Subjects with FA intensity greater than 3 standard deviations from the mean are excluded as outliers.
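
The outlier rule for the FA features can be illustrated with a minimal sketch. The text does not state whether the 3-standard-deviation criterion is applied per region or to a global FA summary, so the per-region reading below (a subject is flagged if any region exceeds the threshold) is an assumption, as are the function name `fa_outlier_mask` and the toy data.

```python
import numpy as np

def fa_outlier_mask(fa, n_sd=3.0):
    """Flag subjects whose FA deviates by more than n_sd SDs in any region.

    fa: array of shape (n_subjects, n_regions). Per-region z-scoring is an
    assumption of this sketch, not a detail given in the text.
    """
    fa = np.asarray(fa, dtype=float)
    z = (fa - fa.mean(axis=0)) / fa.std(axis=0)
    return (np.abs(z) > n_sd).any(axis=1)

# Toy data: 20 subjects, 2 regions, one implausible FA value for subject 7.
fa = np.full((20, 2), 0.5)
fa[::2] += 0.01          # small variation so that no region has zero variance
fa[7, 1] = 0.95
outliers = fa_outlier_mask(fa)
fa_clean = fa[~outliers]
```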

A schematic representation of the experimental procedure followed for all 3 time point analyses.

In the ML exploration stage, we experiment with four ML models and 10 phenotypes of AAM on 80% of the data (data explore) using a sevenfold cross-validation scheme. Once the best ML model, the best phenotype of AAM, and the most appropriate confound-control technique are determined, the generalization test is performed on data infer by using the data holdout subset as the test data. The results from the generalization test are reported as the final results and the informative brain features are determined at this stage using SHAP (Lundberg and Lee, 2017).

Alcohol misuse phenotypes

Information related to alcohol use and misuse can be found in the AUDIT screening test (Alcohol Use Disorders Identification Test), the ESPAD questionnaire (European School Survey Project on Alcohol and Other Drugs), and the TLFB logs (Timeline-Followback Interview). Previous studies used different metrics of alcohol misuse such as the number of binge drinking episodes (Squeglia et al., 2012; Whelan et al., 2014; Jones and Nagel, 2019; Robert et al., 2020), the frequency and amount of alcohol consumption (Squeglia et al., 2015; Pfefferbaum et al., 2018; Kühn et al., 2019; Seo et al., 2019; Sullivan et al., 2020), and even the age of onset of alcohol misuse (Ruan et al., 2019) to characterize AAM. There has not yet been a systematic comparison of these different phenotypes.

In this paper, we use four alcohol misuse metrics to derive ten phenotypes of AAM: (a) frequency of alcohol use, (b) amount of alcohol consumed per drinking occasion, (c) year of onset of alcohol misuse, and (d) the number of binge drinking episodes. These phenotypes are listed in Table 2 and include each of the individual metrics, their combinations, and their longitudinal trajectories from age 14–22. The longitudinal phenotypes, ‘Binge-growth’ and ‘AUDIT-growth’, are generated using latent growth curve models (Deeken et al., 2020) to capture the alcohol misuse trajectory over the four time points BL, FU1, FU2, and FU3. To derive the AAMs group and the controls from each alcohol misuse metric, a standard procedure is followed that is similar to Seo et al., 2019; Ruan et al., 2019. First, the phenotype is used to categorize the subjects into three stages of alcohol misuse severity: heavy AAMs, moderate misusers, and safe users. Moderate misusers are then excluded from the analysis (250-400 subjects) and ML classification is performed with heavy misusers as AAMs and safe users as controls. Appendix 1—figure 1 and Appendix 1—table 2 show how the subjects are divided into these three sub-groups for each of the 10 phenotypes. Appendix 1—table 2 also lists the final number of subjects in each sub-group in the FU3 analysis, as an example. The data analysis procedure can be found in the project code repository (https://github.com/RoshanRane/ML_for_IMAGEN; Rane and Kim, 2022) within the dataset-statistics notebook.
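
The three-stage categorization can be sketched as follows. The cut-off values used here are purely hypothetical placeholders (the actual thresholds for each phenotype are given in Appendix 1—table 2), and `derive_groups` is an illustrative helper name, not a function from the project repository.

```python
import numpy as np

def derive_groups(metric, safe_max, heavy_min):
    """Split subjects into safe users (0) and heavy AAMs (1), dropping the rest.

    safe_max and heavy_min are hypothetical cut-offs for illustration only.
    Returns a boolean mask of retained subjects and their binary labels.
    """
    metric = np.asarray(metric)
    keep = (metric <= safe_max) | (metric >= heavy_min)   # drop moderate misusers
    labels = (metric[keep] >= heavy_min).astype(int)      # 1 = AAM, 0 = control
    return keep, labels

# Toy example: number of binge drinking episodes for six subjects.
binge_episodes = np.array([0, 1, 2, 3, 5, 8])
keep, y = derive_groups(binge_episodes, safe_max=1, heavy_min=5)
```

Subjects with intermediate values are removed before classification, which is why the sample size available to the ML models differs between phenotypes.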

Confounds in the dataset

Diagram (c) in Figure 7 shows how the proportion of risky alcohol users varies across the 8 recruitment sites and among the male and female subsets at each site within the dataset. For example, a greater portion of subjects from sites like Dublin, London, and Nottingham indulge in risky alcohol use compared to the sites from mainland Europe. Similarly, at most sites, a greater portion of males are risky alcohol users compared to females. These systematic differences can confound ML analyses since ML models can use the sex and site information present in the neuroimaging data to indirectly predict AAM, instead of identifying alcohol-related effects in the brain structure. This problem of confounds in multivariate analysis (Rao et al., 2017; Görgen et al., 2018; Snoek et al., 2019) and the methods used to control for its effects are explained in further detail in the next section.

Methods

Three time point analyses are performed in this study. Each time point analysis is divided into two stages called the ML exploration stage and the generalization test stage. The ML exploration is performed with 80% of the data (randomly sampled). The remaining 20% (n=147) serve as independent test data, called the data holdout, which is only used once, at the end, to perform the final inference and report the results. This design allows us to first determine the best ML algorithm for the task and the best phenotype of AAM, and then test the results on an independent subset of the data. Pseudocode of this pipeline is provided at the end of the Appendix (Algorithm 1) and was implemented using Python’s scikit-learn software package (https://scikit-learn.org/stable/about.html). The two-stage cross-validation (CV) with an inner n-fold CV procedure is designed to prevent ‘double dipping’ (Vul et al., 2009; Kriegeskorte et al., 2009). All data preprocessing and analysis steps are fitted only on the training data in data explore, and only applied to the test data during validation. This avoids the data leakage issues that were found in several previous ML neuroimaging studies (Wen et al., 2020). Since multi-site data is used, an additional experiment is performed to test the ability of the ML models to generalize across recruitment sites. In this experiment, instead of randomly sampling 20% of the subjects for the data holdout, all subjects from the Nottingham site (n=176) are set aside as data holdout. Then, subjects from each of the remaining 7 recruitment sites are used as one fold in the sevenfold CV performed during the ML exploration phase. This method of CV is termed leave-one-site-out CV (Rozycki et al., 2018).
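
The leave-one-site-out scheme can be sketched with scikit-learn's `LeaveOneGroupOut` splitter. This is a minimal illustration on synthetic data rather than the project's actual pipeline; the site names other than Nottingham, the sample sizes, and the feature matrix are placeholders.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Toy stand-in data: 8 sites with 50 subjects each. Site names other than
# Nottingham are placeholders, not the IMAGEN site list.
site_names = ["Nottingham"] + [f"site_{i}" for i in range(1, 8)]
sites = np.repeat(site_names, 50)
rng = np.random.default_rng(0)
X = rng.normal(size=(sites.size, 5))
y = rng.integers(0, 2, size=sites.size)

# Set aside every subject from one site as the holdout set.
is_holdout = sites == "Nottingham"
X_explore, y_explore = X[~is_holdout], y[~is_holdout]
sites_explore = sites[~is_holdout]

# Each remaining site serves as the test fold exactly once (7 folds).
folds = list(LeaveOneGroupOut().split(X_explore, y_explore, groups=sites_explore))
for train_idx, test_idx in folds:
    print(np.unique(sites_explore[test_idx]), len(train_idx), len(test_idx))
```

Because the test fold never shares a site with the training folds, a model that relies on site-specific signal rather than alcohol-related signal is penalized in this setting.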

MRI features

The 656 morphometric features extracted from T1w sMRI modality and the 63 features extracted from the DTI-FA modality are used together as the input for the ML models at both stages. Each feature is standardized to have zero mean and unit variance across all subjects (mean and variance are estimated only on the training data, and then applied to the test data). Features with zero variance are dropped.
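
A minimal sketch of this preprocessing step is shown below, with the statistics estimated on the training data only and then reused for the test data. The helper names `fit_standardizer` and `apply_standardizer` and the toy arrays are illustrative assumptions; in practice the same effect can be obtained with a scikit-learn pipeline.

```python
import numpy as np

def fit_standardizer(X_train):
    """Estimate per-feature mean and SD on training data; mark zero-variance features."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    keep = std > 0                      # features with zero variance are dropped
    return mean[keep], std[keep], keep

def apply_standardizer(X, mean, std, keep):
    """Apply training-set statistics to any split (train or test)."""
    return (X[:, keep] - mean) / std

# Toy data with one constant feature (index 2).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(50, 4))
X_train[:, 2] = 3.0
X_test = rng.normal(loc=5.0, scale=2.0, size=(10, 4))

mean, std, keep = fit_standardizer(X_train)
Z_train = apply_standardizer(X_train, mean, std, keep)
Z_test = apply_standardizer(X_test, mean, std, keep)   # no statistics taken from test data
```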

ML models

Four ML models are tested in this study. These include logistic regression (LR), linear SVM (SVM-lin) (Boser et al., 1992), kernel SVM with a radial basis function (SVM-rbf) (Chapelle et al., 2002), and a gradient boosting (GB) classifier (Friedman, 2001). LR and SVM-lin are linear ML methods, whereas SVM-rbf and GB are capable of learning non-linear mappings. We use the liblinear (Fan et al., 2008) implementation of SVM-lin and the XGBoost (Chen and Guestrin, 2016) implementation of GB. GB is an ensemble learning method. The hyperparameters of the models are tuned using an inner CV and are listed in Appendix 1—table 1. Testing 4 different ML models helps to account for any modeling-related bias (Wolpert and Macready, 1997) in the final results. Combining the 4 ML models and the ten different phenotypes of AAM, we end up with a total of 40 ML classification runs in the ML exploration stage.
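
The structure of the exploration runs (every model crossed with every phenotype, with hyperparameters tuned in an inner CV) can be sketched as below. Several things here are assumptions made to keep the example small and self-contained: scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, the hyperparameter grids are placeholders rather than those of Appendix 1—table 1, the data are synthetic, and only three toy phenotypes are used instead of ten.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC, LinearSVC

# Placeholder grids; GradientBoostingClassifier is a stand-in for XGBoost.
models = {
    "LR": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}),
    "SVM-lin": (LinearSVC(), {"C": [0.1, 1.0]}),
    "SVM-rbf": (SVC(kernel="rbf"), {"C": [0.1, 1.0]}),
    "GB": (GradientBoostingClassifier(n_estimators=20), {"max_depth": [2, 3]}),
}

# Synthetic features and three toy phenotype labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(140, 10))
labels = {
    name: (X[:, i] + rng.normal(scale=0.5, size=140) > 0).astype(int)
    for i, name in enumerate(["Binge", "Frequency", "Amount"])
}

outer_cv = StratifiedKFold(n_splits=7, shuffle=True, random_state=0)
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

results = {}
for label_name, y in labels.items():
    for model_name, (estimator, grid) in models.items():
        # Inner CV tunes hyperparameters; outer CV estimates performance.
        search = GridSearchCV(estimator, grid, cv=inner_cv, scoring="balanced_accuracy")
        scores = cross_val_score(search, X, y, cv=outer_cv, scoring="balanced_accuracy")
        results[(model_name, label_name)] = scores
```

With four models and ten phenotypes, the same double loop yields the 40 classification runs described above, each summarized by seven outer-fold scores.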

Evaluation metrics

The model performance is evaluated using the balanced accuracy metric (Urbanowicz and Moore, 2015). It is formulated as the mean of the model’s accuracies for each class (AAM and controls) in the classification. Therefore, it is insensitive to class imbalances in the data. Along with this, the area under the curve of the receiver-operator characteristic (AUC-ROC) is also calculated. In the ML exploration stage, seven measures are obtained for each metric from the outer sevenfold CV, which helps to estimate the mean of the model performance and get a sense of the variance (Bengio and Grandvalet, 2004). During the generalization test, the ML models are retrained seven times on data explore with different random seeds and reevaluated on data holdout to gain an estimate of the model performance on data holdout. The statistical significance of the final generalization test accuracies is calculated using permutation testing (Ojala and Garriga, 2010). The permutation test is performed by running the entire ML pipeline with randomly shuffled labels in the training data, while keeping the labels in the test data fixed. This is repeated 1000 times to generate the null-hypothesis (H0) distribution and derive the p-value. Since three time point analyses are performed on the same subjects, Bonferroni correction is applied to the p-values to control the false-positive rate arising from this multiple comparison.
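
The evaluation steps can be made concrete with a small sketch: balanced accuracy as the mean of per-class accuracies, a permutation test that shuffles only the training labels, and Bonferroni correction. To keep the example self-contained, a nearest-centroid classifier stands in for the full ML pipeline; this classifier, the function names, and the synthetic data are assumptions of the sketch, not the study's implementation.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class accuracies (recall of each class)."""
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]))

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier: assign each test sample to the closest class centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def permutation_p_value(X_train, y_train, X_test, y_test, n_perm=1000, seed=0):
    """Shuffle training labels only; test labels stay fixed, as described in the text."""
    rng = np.random.default_rng(seed)
    observed = balanced_accuracy(y_test, nearest_centroid_predict(X_train, y_train, X_test))
    null = np.array([
        balanced_accuracy(
            y_test, nearest_centroid_predict(X_train, rng.permutation(y_train), X_test)
        )
        for _ in range(n_perm)
    ])
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p

def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)

# Synthetic example with a modest class difference.
rng = np.random.default_rng(1)
y_train, y_test = np.repeat([0, 1], 30), np.repeat([0, 1], 10)
X_train = rng.normal(size=(60, 4)) + 0.8 * y_train[:, None]
X_test = rng.normal(size=(20, 4)) + 0.8 * y_test[:, None]
observed, p_value = permutation_p_value(X_train, y_train, X_test, y_test)
```

For three time point analyses, `bonferroni` would be called on the three resulting p-values, which is equivalent to multiplying each by three and capping at 1.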

Model interpretation

The associations learned by the ML models between structural brain features and AAM are extracted using a post-hoc feature importance attribution technique called SHAP (Lundberg and Lee, 2017). SHAP (SHapley Additive exPlanations) uses the concept of Shapley values from cooperative game theory to fairly determine the marginal contribution of each input feature to the model prediction (Lundberg and Lee, 2017). Among the several SHAP estimator types (Molnar, 2022), we use the permutation-based estimator as it is compatible with all 4 ML models used in this analysis.

Following the generalization test, a SHAP value ($S_{s,f}$) is generated for each input feature $f$ of each subject $s$ in data holdout. The goal is to determine which of the 719 features were most informative for the model when classifying AAMs from controls. Feature importance can be determined by looking at the average absolute SHAP value of each feature across all subjects, $\bar{S}_f = \frac{1}{N}\sum_{s=1}^{N}|S_{s,f}|$, where $N$ denotes the total number of subjects in data holdout. The most significant features are chosen as those features that have an $\bar{S}_f$ value at least two times higher than the average SHAP value across all the features, $\bar{S} = \frac{1}{719}\sum_{f=1}^{719}|\bar{S}_f|$. Our feature importance estimation can be confounded by the presence of correlated features (Molnar, 2022). When several features are correlated, the ML models might use only some features for their prediction and ignore the rest, and this preferential bias can be reflected in the SHAP values. Since the generalization test is repeated seven times with different random seeds, we have seven instances of $\bar{S}_f$ available. Therefore, we repeat the SHAP estimation on each $\bar{S}_f$ with different random permutations and check for consistency of feature importance scores across these seven trials. Only the features that consistently have $\bar{S}_f \geq 2\bar{S}$ across at least six of the seven runs are listed as the most informative features. Following this, it is determined if these informative features have higher-than-average or lower-than-average values when predicted as AAM. This information is further relevant for deriving clinical insights about how AAM brain structure differs from controls.
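
The selection rule can be sketched on a precomputed array of SHAP values. The sketch assumes the SHAP values of the seven runs are stacked into one array of shape (runs, subjects, features); the function name `informative_features` and the toy values are illustrative and not taken from the project code.

```python
import numpy as np

def informative_features(shap_runs, factor=2.0, min_runs=6):
    """Select features whose mean |SHAP| is >= factor x the across-feature average
    in at least min_runs of the runs.

    shap_runs: array of shape (n_runs, n_subjects, n_features).
    """
    shap_runs = np.asarray(shap_runs, dtype=float)
    mean_abs = np.abs(shap_runs).mean(axis=1)          # S_f per run: (n_runs, n_features)
    overall = mean_abs.mean(axis=1, keepdims=True)     # S per run: (n_runs, 1)
    passes = mean_abs >= factor * overall
    return np.flatnonzero(passes.sum(axis=0) >= min_runs)

# Toy example: 7 runs, 5 subjects, 10 features.
shap_runs = np.ones((7, 5, 10))
shap_runs[:, :, 2] = -10.0      # consistently important in all 7 runs (sign is ignored)
shap_runs[:5, :, 4] = 10.0      # important in only 5 of the 7 runs
selected = informative_features(shap_runs)
```

In this toy example feature 2 is selected, whereas feature 4 is rejected because it passes the threshold in only five runs.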

Correcting for confounds

In ML, a confounding variable $c$ is defined as a variable that correlates with the target $y$ and is deducible from the input $X$, where this relationship $X \rightarrow c \rightarrow y$ is not of primary interest to the research question and hinders the analysis (Snoek et al., 2019). As demonstrated by the diagram on the right, a confounding variable $c$ can form an alternative explanation for the relationship between $X$ and $y$ and distract the ML models from detecting the signal of interest $s_y$ between $X \rightarrow y$. In this study, the sex of the subjects and their site of recruitment can confound the AAM analysis (Seo et al., 2019) since they correlate with the output AAM labels and are predictable from the input structural brain features. Instead of detecting the effects of alcohol misuse in the brain $s_y$, the ML models could potentially use the information about the confounds $s_c$ to predict AAM along the alternative pathway (shown with the red dotted lines) and produce significant but confounded results (Seo et al., 2019; Snoek et al., 2019; Dinga et al., 2020). In neuroimaging studies, two methods have been extensively employed for correcting the influence of confounds:

  1. Confound regression: In this method, the influence of the confounding signal $s_c$ on $X$ is controlled by regressing out the signal from the features in $X$ (Rao et al., 2017). This can remove the alternative confounding explanation pathway by eliminating the link $s_c$ between $X \rightarrow c$.

  2. Post hoc counterbalancing: The correlation between the confound and the output, $c \rightarrow y$, can be removed by resampling the data after the data collection. This method potentially removes the alternative confounding pathway by abolishing the relationship $c \rightarrow y$ (Rao et al., 2017). The resampling is performed such that the distribution of the values of the confounding variable $c$ is similar across all classes of $y$ (AAM and controls). For example, after counterbalancing for sex in this study, the ratio of male-to-female subjects should be the same in AAMs and controls. One common technique of counterbalancing for categorical confounds (e.g. sex, site) involves randomly dropping some samples from the larger classes in $y$ until they are equal. This is called counterbalancing with undersampling. However, this results in a reduction in the sample size and hence in the statistical power of the study. Another way to counterbalance without losing samples involves performing sampling-with-replacement on the smaller classes in $y$. This is called counterbalancing with oversampling. One should take care that the sampling-with-replacement is done only on the training data, after the train-test split is performed.
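
Counterbalancing with oversampling can be sketched for a single categorical confound as follows. This is one simple way to equalize the confound distribution across classes (equalizing the class counts within each confound value) and is an assumption of this sketch rather than a description of the project code; the function name and toy data are illustrative.

```python
import numpy as np

def counterbalance_oversample(y, c, seed=0):
    """Return row indices such that c is identically distributed across classes of y.

    Within each confound value, the smaller class is resampled with replacement
    until both classes have the same count. Apply to training data only.
    """
    rng = np.random.default_rng(seed)
    y, c = np.asarray(y), np.asarray(c)
    selected = []
    for c_val in np.unique(c):
        groups = [np.flatnonzero((c == c_val) & (y == y_val)) for y_val in np.unique(y)]
        target = max(len(g) for g in groups)
        for g in groups:
            if len(g) == 0:
                continue  # an empty cell cannot be resampled; left as-is in this sketch
            extra = rng.choice(g, size=target - len(g), replace=True)
            selected.append(np.concatenate([g, extra]))
    return np.concatenate(selected)

# Toy training set: males are mostly AAMs (1), females mostly controls (0).
y = np.array([1] * 6 + [0] * 2 + [1] * 2 + [0] * 6)
c = np.array(["M"] * 8 + ["F"] * 8)
idx = counterbalance_oversample(y, c)
y_balanced, c_balanced = y[idx], c[idx]
```

After resampling, the male-to-female ratio is the same in both classes, and no original training sample has been discarded.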

To assess whether confound regression worked and the confounding signal $s_c$ was removed successfully, a method recently proposed by Snoek et al., 2019 can be used. In this method, the ML algorithm used in the original analysis is reused to predict the confound $c$ from the neuroimaging data $X$. Following a successful confound regression, the confound should no longer be predictable from $X$, and predicting $c$ from $X$ should yield insignificant or chance-level accuracy. Similarly, to determine whether counterbalancing was successful and the correlation between $c$ and $y$ was removed, we used the Same Analysis Approach by Görgen et al., 2018. Here, the same ML algorithm is used to predict the confound $c$ from the labels $y$ (Görgen et al., 2018). A significantly above-chance accuracy when predicting $c$ from $y$ would indicate that the correlation between $c$ and $y$ still exists and that the counterbalancing was not successful. Since the confounds $c_{sex}$ and $c_{site}$ are categorical, they are first one-hot encoded to ensure no false ordinal relationship is implied. The confound correction methods are only performed on the training data, as recommended by Snoek et al., 2019. The balanced accuracy metric ensures that we account for any class imbalances in the test data. Before starting the ML exploration, we first compare these different confound correction methods and choose the most suitable method among them.
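Both checks described above reduce to asking how well the confound can be predicted from some input (the features $X$ or the labels $y$). The sketch below shows that idea with scikit-learn. Logistic regression stands in for "the ML algorithm used in the original analysis", and the function name `confound_predictability` is a hypothetical helper, not part of the study's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def confound_predictability(inputs, confound, n_folds=5):
    """Cross-validated balanced accuracy for predicting `confound` from `inputs`."""
    inputs = np.asarray(inputs, dtype=float)
    if inputs.ndim == 1:  # e.g. the label vector y
        inputs = inputs.reshape(-1, 1)
    model = LogisticRegression(max_iter=1000)
    scores = cross_val_score(model, inputs, confound,
                             cv=n_folds, scoring="balanced_accuracy")
    return scores.mean()


# Toy data: a binary confound that is either balanced across y or identical to y.
c = np.tile([0, 1, 0, 1], 100)
y_counterbalanced = np.tile([0, 0, 1, 1], 100)
y_confounded = c.copy()

score_balanced = confound_predictability(y_counterbalanced, c)
score_confounded = confound_predictability(y_confounded, c)
```

A score near chance (0.5 for a binary confound) suggests the correction worked; a score well above chance suggests the confound is still recoverable. A permutation test would be needed to call a score significant.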

Algorithm 1. Pipeline pseudocode: Procedure followed for each of the 3 analyses. The 'fit … on …' operation represents fitting or training the given ML model on the given data:

{data_explore, data_holdout} ← data_infer              ▷ Keep aside 20% as data_holdout
Start exploratory analysis
M ∈ {LR, SVM-lin, SVM-rbf, GB}
y ∈ {y_freq, y_amount, …, y_binge}                     ▷ select one of 10 AAM phenotypes
for i_outer ∈ {1, 2, …, 7} do                          ▷ Split data_explore into 7 equal outer folds
    train_outer ← {data_explore[i] | i ≠ i_outer}
    test_outer ← {data_explore[i] | i = i_outer}
    for P ∈ 𝒫 do                                       ▷ 𝒫 is the set of all hyperparameter combinations
        for i_inner ∈ {1, 2, …, 5} do                  ▷ Split train_outer into 5 equal inner folds
            train_inner ← {train_outer[i] | i ≠ i_inner}
            test_inner ← {train_outer[i] | i = i_inner}
            fit M(P) on train_inner
            acc_i = evaluate(M(P), test_inner)
        end for
        acc_P = mean(acc_i over all i_inner)           ▷ average accuracy for hyperparameter combination P
    end for
    P̂ ← the P with the highest acc_P over all P ∈ 𝒫
    fit M(P̂) on train_outer
    acc_j = evaluate(M(P̂), test_outer)
end for
acc_(M,y) = mean(acc_j over all i_outer)               ▷ average accuracy for model M and label y
M̂, ŷ ← the (M, y) with the highest acc_(M,y) over all (M, y)    ▷ select the best model M̂ and AAM phenotype ŷ
Start generalization test
fit M̂(P̂) on data_explore
acc = evaluate(M̂(P̂), data_holdout)
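For readers who prefer code, the nested cross-validation in Algorithm 1 can be approximated with scikit-learn as below. This is a sketch under assumptions, not the study's pipeline: it covers a single model (logistic regression) and a single label, uses synthetic data from `make_classification`, shuffles the folds with a fixed seed, and re-runs the inner hyperparameter search on the full exploration set before the holdout test. The function name `nested_cv_with_holdout` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split


def nested_cv_with_holdout(X, y, param_grid, seed=0):
    """Return (mean outer-fold accuracy, holdout accuracy) for one model and label."""
    # Keep aside 20% of the data as the holdout set.
    X_explore, X_holdout, y_explore, y_holdout = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    def tuned_model():
        # Inner 5-fold loop: pick the best hyperparameter combination.
        inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
        return GridSearchCV(LogisticRegression(max_iter=2000), param_grid,
                            cv=inner, scoring="balanced_accuracy")

    # Outer 7-fold loop: estimate accuracy of the tuned model.
    outer = StratifiedKFold(n_splits=7, shuffle=True, random_state=seed)
    outer_scores = []
    for train_idx, test_idx in outer.split(X_explore, y_explore):
        search = tuned_model().fit(X_explore[train_idx], y_explore[train_idx])
        pred = search.predict(X_explore[test_idx])
        outer_scores.append(balanced_accuracy_score(y_explore[test_idx], pred))

    # Generalization test: refit on all exploration data, score on the holdout.
    final = tuned_model().fit(X_explore, y_explore)
    holdout_score = balanced_accuracy_score(y_holdout, final.predict(X_holdout))
    return float(np.mean(outer_scores)), float(holdout_score)


X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           class_sep=2.0, random_state=0)
grid = {"C": [1000, 100, 1.0, 0.001]}
cv_acc, holdout_acc = nested_cv_with_holdout(X, y, grid)
```

Keeping the holdout split outside both loops is what allows the final score to serve as an unbiased check of the model and phenotype selected during exploration.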

Appendix 1

Appendix 1—table 1
Hyperparameters: Each machine learning (ML) model has a set of hyperparameters that are tuned using an inner 5-fold cross-validation during the ML exploration stage.

For both C and γ, higher values can lead to overfitting and lower values can lead to underfitting. For gradient boosting, the maximum depth of the trees is set at 5, the maximum number of estimators at 100, and the subsampling of input features is disabled as counterbalancing is used. The remaining parameters are set at the default values as defined in the scikit-learn Python package.

| Model | Hyperparameter | Values tested |
| --- | --- | --- |
| Logistic regression | C: Inverse of L2 regularization strength | 1000, 100, 1.0, 0.001 |
| Linear support vector machine | C: Inverse of L2 regularization strength | 1000, 100, 1.0, 0.001 |
| Kernel-based support vector machine | C: Inverse of L2 regularization strength | 1000, 100, 1.0, 0.001 |
| | γ: kernel coefficient of RBF kernel | 'auto', 'scale' |
| Gradient boosting | learning_rate | 0.05, 0.25 |
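The search space in this table could be written as scikit-learn parameter grids, as sketched below. The choice of estimator classes is an assumption made for illustration: in particular, `GradientBoostingClassifier` stands in for whichever gradient boosting implementation the study used, and the dictionary name `MODELS` is hypothetical.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC, LinearSVC

# Values of C (inverse of L2 regularization strength) from the table.
C_VALUES = [1000, 100, 1.0, 0.001]

# Model name -> (estimator with fixed settings, grid of tuned hyperparameters).
MODELS = {
    "LR": (LogisticRegression(max_iter=1000), {"C": C_VALUES}),
    "SVM-lin": (LinearSVC(), {"C": C_VALUES}),
    "SVM-rbf": (SVC(kernel="rbf"), {"C": C_VALUES, "gamma": ["auto", "scale"]}),
    # Tree depth and number of estimators are fixed; only learning_rate is tuned.
    "GB": (GradientBoostingClassifier(max_depth=5, n_estimators=100),
           {"learning_rate": [0.05, 0.25]}),
}

# Number of hyperparameter combinations evaluated per model in the inner loop.
n_combinations = {name: len(ParameterGrid(grid)) for name, (_, grid) in MODELS.items()}
```

Each inner 5-fold cross-validation then evaluates every combination in the corresponding grid once per fold.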
Appendix 1—table 2
AAM phenotype categorization: The table explains how the ten AAM phenotypes are derived from the respective IMAGEN questionnaire.

It lists the total range of values for each question and the ranges of values used to categorize the subjects into safe users, moderate users and heavy users, respectively. For reference, the sample sizes (n) obtained at FU3 by using these value ranges are shown in brackets.

| Phenotype | IMAGEN questionnaire | Total range | Safe users range (n) | Moderate misusers range (n) | Heavy misusers range (n) |
| --- | --- | --- | --- | --- | --- |
| Frequency | ESPAD 8b | 0-6 | 0-4 (397) | 5 (270) | 6 (372) |
| Amount | AUDIT q2 | 0-4 | 0 (413) | 1 (403) | 2-4 (219) |
| Onset | ESPAD 29d | 11-21 | 16-21 (531) | 14-15 (288) | 11-14 (216) |
| Binge | ESPAD 19a | 0-6 | 0-3 (299) | 4-5 (336) | 6 (400) |
| Binge-growth | Growth curve of ESPAD 19b | 0-9 | 0-2 (379) | 3-5 (420) | 6-9 (236) |
| AUDIT | AUDIT-total | 0-40 | 0-4 (443) | 5-7 (274) | 8-40 (318) |
| AUDIT-quick | AUDIT-freq | 0-12 | 0-3 (402) | 4-5 (359) | 6-12 (274) |
| AUDIT-growth | Growth curve of AUDIT-total | 0-6 | 0,3 (377) | 4 (404) | 2,5,6 (254) |
| Combined-seo | ESPAD 8b, 17b, 19b, and TLFB alcohol2 | 0-2 | 0 (345) | 1 (404) | 2 (286) |
| Combined-ours | AUDIT q1, q2, ESPAD 19a, growth curve of ESPAD 19b | 0-3 | 0 (429) | 1 (403) | 2 (203) |
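As an illustration of how such ranges translate into class labels, the sketch below encodes five of the phenotypes whose ranges are contiguous and non-overlapping in the table. The dictionary and function names are hypothetical, and the class names "safe", "moderate" and "heavy" are shorthand for the three column headings.

```python
# Inclusive (low, high) score ranges per class, copied from the table above.
PHENOTYPE_RANGES = {
    "Frequency":   {"safe": (0, 4), "moderate": (5, 5), "heavy": (6, 6)},
    "Amount":      {"safe": (0, 0), "moderate": (1, 1), "heavy": (2, 4)},
    "Binge":       {"safe": (0, 3), "moderate": (4, 5), "heavy": (6, 6)},
    "AUDIT":       {"safe": (0, 4), "moderate": (5, 7), "heavy": (8, 40)},
    "AUDIT-quick": {"safe": (0, 3), "moderate": (4, 5), "heavy": (6, 12)},
}


def categorize(phenotype, score):
    """Map a questionnaire score to 'safe', 'moderate' or 'heavy'."""
    for label, (low, high) in PHENOTYPE_RANGES[phenotype].items():
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the range of {phenotype}")


example_labels = [categorize("AUDIT", s) for s in (3, 6, 12)]
```

The same lookup pattern extends to the growth-curve and combined phenotypes once their class assignments are listed explicitly.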
Appendix 1—figure 1
Visualizing AAM phenotype categorization: A qualitative comparison showing how the ten AAM phenotypes categorize the same subjects into the three alcohol user classes – risky alcohol users, moderate users, and safe or non-users.

Each color-coded vertical line in the diagram represents one subject, out of the total 1182 subjects. It can be observed that the Frequency, Onset, and Amount phenotypes categorize subjects very differently from Binge, showing that they capture different factors of alcohol misuse. All AUDIT-derived phenotypes are similar to each other but differ from the Binge phenotype. Furthermore, sex- and site-specific variations can be detected. For instance, more males than females appear in the ‘risky’ groups. Similarly, most subjects from Dublin are clustered on the risky side.

Appendix 1—figure 2
ML exploration results per site: Accuracy of the non-linear models per site in the main experiments.

The sites are ordered from low to high accuracy.

Appendix 1—figure 3
ML exploration results per site in leave-one-site-out: Accuracy of the non-linear models per site in the leave-one-site-out experiment.

The sites are ordered from low to high accuracy.

Appendix 1—table 3
Most informative sMRI features: An exhaustive list of the ‘most informative’ features in all three time point analyses provided along with their obtained SHAP values across seven repetitions.

SHAP values that didn’t surpass the threshold are shown in italic. (Acronyms: area: surface area, volume: gray matter volume, thickness: average thickness, thicknessstd: standard deviation of thickness, intensity: mean intensity, meancurv: integrated rectified mean curvature, gauscurv: integrated rectified Gaussian curvature, curvind: intrinsic curvature index, foldind: folding index.)

FU3 (no. features = 21, threshold ≥ 0.008743)

| Modality | Region | Side | Name | Type | Avg. feature value | Avg. SHAP value | SHAP run1 | SHAP run2 | SHAP run3 | SHAP run4 | SHAP run5 | SHAP run6 | SHAP run7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DTI | | | Splenium of the corpus callosum | | -0.87353 | 0.014433 | 0.014127 | 0.013667 | 0.015137 | 0.015569 | 0.016892 | 0.014402 | 0.0112 |
| T1w | Cortical | Right | Lateral occipital cortex | Thickness | -0.677426 | 0.013378 | 0.012873 | 0.011353 | 0.012539 | 0.014382 | 0.014206 | 0.017304 | 0.011 |
| T1w | Subcortical | | Cerebrospinal fluid | Intensity | 0.7903 | 0.013244 | 0.015039 | 0.013902 | 0.014382 | 0.014225 | 0.014667 | 0.009284 | 0.0112 |
| T1w | Cortical | Left | Caudal anterior cingulate cortex | Foldind | -0.61388 | 0.012721 | 0.011392 | 0.012186 | 0.014931 | 0.014588 | 0.011412 | 0.017284 | 0.0073 |
| T1w | Subcortical | | Brain-Stem | Intensity | -0.59437 | 0.012637 | 0.011784 | 0.010304 | 0.013049 | 0.014657 | 0.017569 | 0.010069 | 0.011 |
| T1w | Subcortical | Right | Amygdala | Volume | 0.664147 | 0.012564 | 0.017804 | 0.01148 | 0.015137 | 0.012049 | 0.013353 | 0.008324 | 0.0098 |
| T1w | Cortical | Right | Parahippocampal gyrus | Area | 0.770722 | 0.012542 | 0.012373 | 0.010137 | 0.01449 | 0.015745 | 0.010049 | 0.015265 | 0.0097 |
| T1w | Cortical | Right | Cuneus cortex | Thickness | -0.634456 | 0.012373 | 0.014275 | 0.012196 | 0.013461 | 0.01198 | 0.010049 | 0.012686 | 0.012 |
| T1w | Subcortical | Right | Hippocampus | Intensity | 0.623355 | 0.0122 | 0.015461 | 0.007725 | 0.010941 | 0.012794 | 0.013255 | 0.009765 | 0.0155 |
| T1w | Subcortical | Left | Hippocampus | Intensity | 0.663395 | 0.011909 | 0.014422 | 0.010049 | 0.011314 | 0.011843 | 0.011098 | 0.012098 | 0.0125 |
| T1w | Subcortical | Left | Choroid-plexus | Volume | -0.761621 | 0.011884 | 0.010059 | 0.009392 | 0.015353 | 0.012922 | 0.013667 | 0.010049 | 0.0117 |
| T1w | Cortical | Right | Rostral anterior cingulate cortex | Thickness | 0.636769 | 0.011794 | 0.011853 | 0.009108 | 0.015412 | 0.014863 | 0.013775 | 0.009451 | 0.0081 |
| T1w | Subcortical | | Anterior corpus callosum | Intensity | -0.612797 | 0.01137 | 0.009333 | 0.010373 | 0.012843 | 0.006441 | 0.014314 | 0.017451 | 0.0088 |
| T1w | Cortical | Left | Pericalcarine cortex | Meancurv | -0.724256 | 0.011364 | 0.012657 | 0.011157 | 0.014539 | 0.010422 | 0.01301 | 0.015637 | 0.0021 |
| T1w | Cortical | Right | Superior parietal cortex | Thickness | -0.630512 | 0.011321 | 0.01051 | 0.010608 | 0.011216 | 0.012647 | 0.013245 | 0.013833 | 0.0072 |
| T1w | Cortical | Right | Parahippocampal gyrus | Meancurv | -0.791755 | 0.010913 | 0.011686 | 0.013382 | 0.007471 | 0.011735 | 0.010373 | 0.010843 | 0.0109 |
| DTI | | Right | Retrolenticular part of the internal capsule | | -0.712437 | 0.010793 | 0.009843 | 0.009667 | 0.012775 | 0.011961 | 0.010137 | 0.010343 | 0.0108 |
| T1w | Cortical | Left | Lateral orbitofrontal cortex | Meancurv | 0.696262 | 0.010761 | 0.011951 | 0.01098 | 0.011627 | 0.004167 | 0.011157 | 0.01548 | 0.01 |
| T1w | Cortical | Left | Rostral anterior cingulate cortex | Thickness | -0.746005 | 0.010644 | 0.009824 | 0.01249 | 0.01049 | 0.010588 | 0.012235 | 0.012471 | 0.0064 |
| T1w | Cortical | Right | Supramarginal gyrus | Thickness | -0.808923 | 0.010111 | 0.009627 | 0.005529 | 0.009333 | 0.009245 | 0.012451 | 0.013186 | 0.0114 |
| DTI | | Left | Hippocampal component of the cingulum | | -0.727728 | 0.009451 | 0.010951 | 0.00901 | 0.009392 | 0.00902 | 0.011167 | 0.007373 | 0.0092 |
FU2 (no. features = 32, threshold ≥ 0.009865)

| Modality | Region | Side | Name | Type | Avg. feature value | Avg. SHAP value | SHAP run1 | SHAP run2 | SHAP run3 | SHAP run4 | SHAP run5 | SHAP run6 | SHAP run7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| T1w | Cortical | Right | Caudal anterior cingulate cortex | Curvind | 1.591756 | 0.019167 | 0.018951 | 0.018882 | 0.018324 | 0.018314 | 0.019235 | 0.021304 | 0.0192 |
| T1w | Cortical | Left | Caudal anterior cingulate cortex | Thicknessstd | -0.746035 | 0.01701 | 0.01849 | 0.013353 | 0.022441 | 0.017098 | 0.015265 | 0.017804 | 0.0146 |
| T1w | Cortical | Left | Cuneus cortex | Curvind | 0.462589 | 0.016139 | 0.017657 | 0.015353 | 0.015284 | 0.014755 | 0.016118 | 0.018186 | 0.0156 |
| T1w | Cortical | Right | Pars triangularis | Thicknessstd | -0.81719 | 0.015997 | 0.011755 | 0.013353 | 0.018539 | 0.023647 | 0.022029 | 0.007843 | 0.0148 |
| T1w | Cortical | Left | Pericalcarine | Curvind | 0.01016 | 0.015952 | 0.013755 | 0.017059 | 0.012667 | 0.017382 | 0.018863 | 0.015912 | 0.016 |
| T1w | Cortical | Right | Inferior temporal gyrus | Thicknessstd | -0.854132 | 0.015746 | 0.014696 | 0.013696 | 0.023922 | 0.015373 | 0.019108 | 0.016951 | 0.0065 |
| T1w | Subcortical | | Anterior corpus callosum | Intensity | -0.642157 | 0.015396 | 0.018255 | 0.015137 | 0.015118 | 0.012314 | 0.016137 | 0.017559 | 0.0133 |
| T1w | Cortical | Right | Cuneus cortex | Thickness | -0.72579 | 0.014955 | 0.010167 | 0.015471 | 0.018951 | 0.017706 | 0.015343 | 0.013118 | 0.0139 |
| T1w | Cortical | Left | Pars opercularis | Volume | -0.628446 | 0.014697 | 0.016353 | 0.019078 | 0.018431 | 0.012941 | 0.01851 | 0.011588 | 0.006 |
| DTI | | Left | Corticospinal tract | | 0.775736 | 0.014336 | 0.013892 | 0.014314 | 0.015265 | 0.015912 | 0.014304 | 0.015549 | 0.0111 |
| T1w | Subcortical | | White matter | Intensity | -0.754712 | 0.01421 | 0.015765 | 0.015255 | 0.010118 | 0.015951 | 0.014853 | 0.015941 | 0.0116 |
| T1w | Cortical | Right | Frontal pole | Thickness | 0.619691 | 0.014148 | 0.017647 | 0.013255 | 0.013608 | 0.017275 | 0.012696 | 0.016333 | 0.0082 |
| T1w | Cortical | Left | Pars opercularis | Area | -0.582627 | 0.014141 | 0.018157 | 0.01598 | 0.014745 | 0.013235 | 0.017412 | 0.011471 | 0.008 |
| T1w | Cortical | Left | Frontal pole | Curvind | 0.732718 | 0.014078 | 0.015412 | 0.016422 | 0.017882 | 0.015725 | 0.010412 | 0.010196 | 0.0125 |
| T1w | Subcortical | Left | Cerebellum cortex | Intensity | 0.610253 | 0.01399 | 0.012304 | 0.01652 | 0.016275 | 0.014392 | 0.014235 | 0.01251 | 0.0117 |
| T1w | Cortical | Right | Precentral gyrus | Gauscurv | 0.299008 | 0.013556 | 0.014127 | 0.011441 | 0.010725 | 0.013304 | 0.016206 | 0.01451 | 0.0146 |
| T1w | Cortical | Left | Rostral anterior cingulate cortex | Thickness | -0.916001 | 0.013268 | 0.010324 | 0.013882 | 0.01348 | 0.01351 | 0.01249 | 0.015892 | 0.0133 |
| T1w | Cortical | Left | Caudal anterior cingulate cortex | Meancurv | -0.704352 | 0.013246 | 0.010931 | 0.008147 | 0.020755 | 0.015137 | 0.014196 | 0.011843 | 0.0117 |
| T1w | Subcortical | | Brain-Stem | Intensity | -0.736592 | 0.013246 | 0.0105 | 0.015314 | 0.014471 | 0.014127 | 0.015059 | 0.013 | 0.0103 |
| T1w | Cortical | Left | Fusiform gyrus | Thicknessstd | 0.758297 | 0.013178 | 0.012922 | 0.009735 | 0.016853 | 0.01 | 0.015471 | 0.013049 | 0.0142 |
| T1w | Cortical | Left | Lingual gyrus | Thicknessstd | 0.725356 | 0.013094 | 0.011804 | 0.017461 | 0.006804 | 0.014157 | 0.01602 | 0.01299 | 0.0124 |
| T1w | Cortical | Left | Pars opercularis | Meancurv | -0.7511 | 0.013024 | 0.010667 | 0.01552 | 0.014235 | 0.013314 | 0.018245 | 0.013098 | 0.0061 |
| T1w | Cortical | Left | Inferior temporal gyrus | Thicknessstd | 0.73171 | 0.013018 | 0.010529 | 0.010167 | 0.014961 | 0.017627 | 0.011363 | 0.014961 | 0.0115 |
| T1w | Cortical | Right | Banks of the superior temporal sulcus | Meancurv | -0.766809 | 0.012685 | 0.014314 | 0.011863 | 0.014951 | 0.012461 | 0.0145 | 0.013363 | 0.0073 |
| T1w | Subcortical | Right | Accumbens area | Intensity | 0.640973 | 0.01263 | 0.011108 | 0.012392 | 0.013255 | 0.014971 | 0.016147 | 0.011137 | 0.0094 |
| T1w | Cortical | Right | Inferior parietal cortex | Area | 0.844075 | 0.012417 | 0.015324 | 0.008647 | 0.011196 | 0.01401 | 0.011784 | 0.012951 | 0.013 |
| T1w | Cortical | Left | Pericalcarine cortex | Thickness | -0.738264 | 0.012266 | 0.010647 | 0.011167 | 0.014951 | 0.01648 | 0.011618 | 0.011922 | 0.0091 |
| T1w | Cortical | Right | Pars opercularis | Area | -0.562211 | 0.012139 | 0.010824 | 0.012941 | 0.013902 | 0.013304 | 0.015216 | 0.010029 | 0.0088 |
| T1w | Subcortical | Left | Cerebellum white matter | Intensity | 0.747148 | 0.012045 | 0.012363 | 0.01052 | 0.017598 | 0.015078 | 0.012284 | 0.011902 | 0.0046 |
| T1w | Cortical | Left | Superior parietal cortex | Thicknessstd | 0.725889 | 0.012003 | 0.011598 | 0.011794 | 0.011843 | 0.016029 | 0.010637 | 0.009363 | 0.0128 |
| T1w | Cortical | Right | Postcentral gyrus | Curvind | 0.84478 | 0.011399 | 0.008569 | 0.014353 | 0.012353 | 0.012971 | 0.010343 | 0.010392 | 0.0108 |
| T1w | Subcortical | Left | Inferior lateral ventricle | Volume | -0.602808 | 0.010917 | 0.014912 | 0.009882 | 0.01049 | 0.011059 | 0.01051 | 0.009441 | 0.0101 |
BL (no. features = 46, threshold ≥ 0.00993)

| Modality | Region | Side | Name | Type | Avg. feature value | Avg. SHAP value | SHAP run1 | SHAP run2 | SHAP run3 | SHAP run4 | SHAP run5 | SHAP run6 | SHAP run7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| T1w | Subcortical | Right | Pallidum | Volume | 0.775721 | 0.023244 | 0.020892 | 0.019804 | 0.018647 | 0.022333 | 0.025186 | 0.030343 | 0.0255 |
| T1w | Cortical | Left | Temporal pole | Volume | 0.777293 | 0.021441 | 0.019167 | 0.018696 | 0.019637 | 0.02052 | 0.02452 | 0.023716 | 0.0238 |
| T1w | Subcortical | Right | Cerebellum cortex | Volume | 0.830438 | 0.020328 | 0.023157 | 0.023118 | 0.018931 | 0.017265 | 0.019608 | 0.022049 | 0.0182 |
| T1w | Subcortical | | Anterior corpus callosum | Intensity | -0.711844 | 0.01865 | 0.020873 | 0.021353 | 0.012127 | 0.019637 | 0.023373 | 0.012922 | 0.0203 |
| T1w | Cortical | Left | Rostral middle frontal gyrus | Thicknessstd | -0.772379 | 0.018557 | 0.020049 | 0.016147 | 0.013402 | 0.02051 | 0.019088 | 0.019049 | 0.0217 |
| T1w | Cortical | Right | Parahippocampal gyrus | Area | 0.82387 | 0.018141 | 0.023725 | 0.012539 | 0.014431 | 0.021775 | 0.016961 | 0.017618 | 0.0199 |
| T1w | Cortical | Right | Inferior parietal cortex | Volume | 0.747201 | 0.018085 | 0.018373 | 0.018549 | 0.012686 | 0.020765 | 0.01851 | 0.019971 | 0.0177 |
| T1w | Cortical | Left | Lateral occipital cortex | Thickness | -0.733224 | 0.017517 | 0.017627 | 0.014441 | 0.014373 | 0.025794 | 0.018529 | 0.018667 | 0.0132 |
| T1w | Cortical | Right | Banks of the superior temporal sulcus | Meancurv | -0.7106 | 0.016707 | 0.019931 | 0.019559 | 0.015196 | 0.013127 | 0.01498 | 0.014706 | 0.0195 |
| T1w | Cortical | Right | Parahippocampal gyrus | Volume | 0.882348 | 0.016598 | 0.020588 | 0.013324 | 0.011706 | 0.020304 | 0.015363 | 0.017765 | 0.0171 |
| T1w | Cortical | Left | Pericalcarine cortex | Thickness | -0.693217 | 0.015763 | 0.015657 | 0.010235 | 0.013569 | 0.019176 | 0.016049 | 0.022069 | 0.0136 |
| DTI | | | Posterior corona radiata | | -0.7244 | 0.015693 | 0.020441 | 0.015324 | 0.011775 | 0.018667 | 0.015873 | 0.015784 | 0.012 |
| T1w | Cortical | Right | Superior parietal cortex | Thicknessstd | 0.751469 | 0.015426 | 0.018333 | 0.017088 | 0.013176 | 0.020196 | 0.009814 | 0.016098 | 0.0133 |
| DTI | | Right | Posterior corona radiata | | -0.697344 | 0.015406 | 0.020461 | 0.016431 | 0.011333 | 0.012902 | 0.019804 | 0.012245 | 0.0147 |
| T1w | Cortical | Left | Paracentral lobule | Area | -0.695895 | 0.014994 | 0.012765 | 0.013294 | 0.013824 | 0.012775 | 0.01199 | 0.023284 | 0.017 |
| T1w | Cortical | Left | Pars orbitalis | Area | 0.756453 | 0.014944 | 0.015892 | 0.013441 | 0.011363 | 0.014167 | 0.014245 | 0.016922 | 0.0186 |
| T1w | Cortical | Left | Superior parietal cortex | Thicknessstd | 0.681101 | 0.014908 | 0.016196 | 0.01598 | 0.009784 | 0.012431 | 0.012775 | 0.022716 | 0.0145 |
| T1w | Cortical | Right | Cuneus cortex | Volume | -0.753021 | 0.014805 | 0.015284 | 0.014608 | 0.010255 | 0.012039 | 0.016402 | 0.017392 | 0.0177 |
| T1w | Cortical | Right | Pericalcarine cortex | Thickness | -0.599115 | 0.014742 | 0.016441 | 0.014245 | 0.013108 | 0.013588 | 0.016255 | 0.016108 | 0.0135 |
| T1w | Cortical | Left | Rostral anterior cingulate cortex | Curvind | 0.861164 | 0.014357 | 0.019578 | 0.017206 | 0.0125 | 0.013127 | 0.006725 | 0.011539 | 0.0198 |
| T1w | Cortical | Right | Inferior parietal cortex | Area | 0.778327 | 0.014151 | 0.016353 | 0.016559 | 0.007843 | 0.015961 | 0.015176 | 0.013078 | 0.0141 |
| T1w | Cortical | Right | Cuneus cortex | Thickness | -0.650155 | 0.014062 | 0.01401 | 0.012755 | 0.014706 | 0.012755 | 0.013451 | 0.018059 | 0.0127 |
| DTI | | Right | Retrolenticular part of internal capsule | | -0.715854 | 0.013992 | 0.011765 | 0.017833 | 0.00751 | 0.014333 | 0.016422 | 0.013725 | 0.0164 |
| T1w | Cortical | Right | Inferior parietal cortex | Thicknessstd | 0.710387 | 0.013828 | 0.014637 | 0.013706 | 0.014118 | 0.012402 | 0.008814 | 0.017324 | 0.0158 |
| T1w | Subcortical | | Anterior corpus callosum | Volume | 0.828249 | 0.013824 | 0.014667 | 0.014951 | 0.014147 | 0.016588 | 0.012225 | 0.010637 | 0.0135 |
| T1w | Cortical | Left | Medial orbitofrontal cortex | Thicknessstd | 0.75355 | 0.013815 | 0.018461 | 0.013059 | 0.011608 | 0.009706 | 0.01999 | 0.013627 | 0.0103 |
| T1w | Subcortical | Left | Cerebellum cortex | Volume | 0.815826 | 0.013696 | 0.014559 | 0.012245 | 0.015069 | 0.008245 | 0.01551 | 0.017686 | 0.0126 |
| T1w | Cortical | Right | Pars opercularis | Thickness | -0.752197 | 0.01369 | 0.015 | 0.015784 | 0.005745 | 0.01399 | 0.018029 | 0.013127 | 0.0142 |
| DTI | | Right | Anterior limb of internal capsule | | 0.776713 | 0.013662 | 0.010971 | 0.013343 | 0.013647 | 0.011853 | 0.015196 | 0.013559 | 0.0171 |
| T1w | Cortical | Right | Transverse temporal cortex | Thickness | 0.693202 | 0.013654 | 0.012941 | 0.014343 | 0.008676 | 0.018784 | 0.014314 | 0.01152 | 0.015 |
| T1w | Cortical | Right | Isthmus cingulate cortex | Thicknessstd | 0.654339 | 0.013653 | 0.014667 | 0.012931 | 0.011961 | 0.010461 | 0.019618 | 0.013275 | 0.0127 |
| T1w | Cortical | Right | Medial orbitofrontal cortex | Meancurv | 0.867091 | 0.013538 | 0.012412 | 0.010245 | 0.008775 | 0.018255 | 0.012078 | 0.01652 | 0.0165 |
| DTI | | Left | Posterior corona radiata | | -0.69722 | 0.013132 | 0.015314 | 0.011696 | 0.010088 | 0.017137 | 0.01102 | 0.017343 | 0.0093 |
| DTI | | Left | Corticospinal tract | | 0.823927 | 0.013063 | 0.017108 | 0.014441 | 0.011216 | 0.013373 | 0.013186 | 0.012765 | 0.0094 |
| T1w | Cortical | Left | Lateral orbitofrontal cortex | Meancurv | 0.771492 | 0.012777 | 0.010775 | 0.015059 | 0.006745 | 0.018373 | 0.014176 | 0.01202 | 0.0123 |
| T1w | Subcortical | | Brain-Stem | Intensity | -0.791678 | 0.012619 | 0.015882 | 0.011539 | 0.010951 | 0.010461 | 0.011814 | 0.011667 | 0.016 |
| DTI | | | Splenium of corpus callosum | | -0.817257 | 0.012545 | 0.011627 | 0.010441 | 0.013176 | 0.014029 | 0.013196 | 0.015147 | 0.0102 |
| T1w | Cortical | Left | Medial orbitofrontal cortex | Area | -0.768057 | 0.012541 | 0.014549 | 0.012225 | 0.011353 | 0.011039 | 0.011059 | 0.011706 | 0.0159 |
| T1w | Cortical | Left | Paracentral lobule | Volume | -0.72805 | 0.012517 | 0.010078 | 0.010402 | 0.011431 | 0.012176 | 0.009196 | 0.020353 | 0.014 |
| T1w | Subcortical | Left | Inferior lateral ventricle | Volume | -0.541217 | 0.01238 | 0.013245 | 0.010324 | 0.013863 | 0.013676 | 0.010186 | 0.009676 | 0.0157 |
| T1w | Cortical | Right | Pericalcarine cortex | Volume | -0.62829 | 0.012329 | 0.011176 | 0.014784 | 0.010078 | 0.010451 | 0.011029 | 0.015794 | 0.013 |
| T1w | Cortical | Left | Isthmus cingulate cortex | Volume | 0.87903 | 0.012158 | 0.010794 | 0.011353 | 0.011186 | 0.011725 | 0.009412 | 0.013716 | 0.0169 |
| T1w | Cortical | Left | Temporal pole | Thicknessstd | -0.780085 | 0.011993 | 0.011049 | 0.011735 | 0.012118 | 0.015598 | 0.007608 | 0.012618 | 0.0132 |
| T1w | Cortical | Right | Isthmus cingulate cortex | Meancurv | 0.859662 | 0.011721 | 0.012039 | 0.015029 | 0.01152 | 0.012657 | 0.010284 | 0.010451 | 0.0101 |
| DTI | | | Retrolenticular part of internal capsule | | -0.745934 | 0.010933 | 0.010765 | 0.010637 | 0.00401 | 0.013343 | 0.011833 | 0.014029 | 0.0119 |
| DTI | | Left | Inferior fronto-occipital fasciculus | | 0.914131 | 0.010721 | 0.010794 | 0.013069 | 0.005735 | 0.012745 | 0.010363 | 0.011108 | 0.0112 |
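As a small worked check of how the columns of this table relate, the sketch below recomputes the average SHAP value from the seven per-run values and flags runs that fall below the time point's threshold. The two rows are taken from the FU3 block (splenium of the corpus callosum, and left caudal anterior cingulate cortex foldind). The variable names are hypothetical, and how the threshold itself was derived is not assumed here.

```python
import numpy as np

FU3_THRESHOLD = 0.008743

# Per-run SHAP values (run1..run7) for two FU3 rows of the table.
runs = np.array([
    [0.014127, 0.013667, 0.015137, 0.015569, 0.016892, 0.014402, 0.0112],
    [0.011392, 0.012186, 0.014931, 0.014588, 0.011412, 0.017284, 0.0073],
])
reported_avg = np.array([0.014433, 0.012721])

# Average across the seven repetitions; this should match the reported column
# up to the rounding of the printed per-run values.
mean_shap = runs.mean(axis=1)

# Count how many runs per feature did not surpass the threshold.
n_below = (runs < FU3_THRESHOLD).sum(axis=1)
```

For the second row, one run lies below the threshold even though the seven-run average is above it, which is the situation the italic marking in the caption refers to.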

Data availability

This is a computational study. All data analysis code, including the modelling pipeline, is openly available at https://github.com/RoshanRane/ML_for_IMAGEN (copy archived at swh:1:rev:6c493672ed700ded73c2b77e8976a5551921e634) for reuse and reproduction. Approval to use the IMAGEN dataset for this study was provided under the approval username / project code 'edeman'.

Article and author information

Author details

  1. Roshan Prakash Rane

    Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
    Contribution
    Conceptualization, Formal analysis, Methodology, Project administration, Validation, Visualization, Writing – original draft, Writing – review and editing
    For correspondence
    roshan.rane@bccn-berlin.de
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-3996-2034
  2. Evert Ferdinand de Man

    Faculty IV – Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft
    Competing interests
    No competing interests declared
  3. JiHoon Kim

    Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
    Contribution
    Investigation, Visualization, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-3157-3472
  4. Kai Görgen

    1. Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
    2. Science of Intelligence, Research Cluster of Excellence, Berlin, Germany
    Contribution
    Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-4711-9629
  5. Mira Tschorn

    Social and Preventive Medicine, Department of Sports and Health Sciences, Intra-faculty unit “Cognitive Sciences”, Faculty of Human Science, and Faculty of Health Sciences Brandenburg, Research Area Services Research and e-Health, University of Potsdam, Potsdam, Germany
    Contribution
    Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Michael A Rapp

    Social and Preventive Medicine, Department of Sports and Health Sciences, Intra-faculty unit “Cognitive Sciences”, Faculty of Human Science, and Faculty of Health Sciences Brandenburg, Research Area Services Research and e-Health, University of Potsdam, Potsdam, Germany
    Contribution
    Methodology
    Competing interests
    No competing interests declared
  7. Tobias Banaschewski

    Department of Child and Adolescent Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
    Contribution
    Resources
    Competing interests
    No competing interests declared
  8. Arun LW Bokde

    Discipline of Psychiatry, School of Medicine and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  9. Sylvane Desrivieres

    Centre for Population Neuroscience and Precision Medicine (PONS), Institute of Psychiatry, Psychology and Neuroscience, SGDP Centre, King’s College London, London, United Kingdom
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  10. Herta Flor

    1. Institute of Cognitive and Clinical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
    2. Department of Psychology, School of Social Sciences, University of Mannheim, Mannheim, Germany
    Contribution
    Data curation, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  11. Antoine Grigis

    NeuroSpin, CEA, Université Paris-Saclay, Paris, France
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  12. Hugh Garavan

    Departments of Psychiatry and Psychology, University of Vermont, Burlington, United States
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  13. Penny A Gowland

    Sir Peter Mansfield Imaging Centre School of Physics and Astronomy, University of Nottingham, Nottingham, United Kingdom
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  14. Rüdiger Brühl

    Physikalisch-Technische Bundesanstalt, Berlin, Germany
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-0111-5996
  15. Jean-Luc Martinot

    Institut National de la Santé et de la Recherche Médicale, INSERM U A10 “Trajectoires développementales en psychiatrie”, Université Paris-Saclay, Ecole Normale Supérieure Paris-Saclay, CNRS, Centre Borelli, Gif-sur-Yvette, France
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  16. Marie-Laure Paillere Martinot

    1. Institut National de la Santé et de la Recherche Médicale, INSERM U A10 “Trajectoires développementales en psychiatrie”, Université Paris-Saclay, Ecole Normale Supérieure Paris-Saclay, CNRS, Centre Borelli, Gif-sur-Yvette, France
    2. AP-HP Sorbonne Université, Department of Child and Adolescent Psychiatry, Pitié-Salpêtrière Hospital, Paris, France
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  17. Eric Artiges

    1. Institut National de la Santé et de la Recherche Médicale, INSERM U A10 “Trajectoires développementales en psychiatrie”, Université Paris-Saclay, Ecole Normale Supérieure Paris-Saclay, CNRS, Centre Borelli, Gif-sur-Yvette, France
    2. Psychiatry Department, EPS Barthélémy Durand, Etampes, France
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  18. Frauke Nees

    1. Department of Child and Adolescent Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
    2. Institute of Cognitive and Clinical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
    3. PONS Research Group, Dept of Psychiatry and Psychotherapy, Campus Charite Mitte, Humboldt University, Berlin, Germany
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  19. Dimitri Papadopoulos Orfanos

    NeuroSpin, CEA, Université Paris-Saclay, Paris, France
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-1242-8990
  20. Herve Lemaitre

    1. NeuroSpin, CEA, Université Paris-Saclay, Paris, France
    2. Institut des Maladies Neurodégénératives, UMR 5293, CNRS, CEA, University of Bordeaux, Bordeaux, France
    Contribution
    Data curation, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  21. Tomas Paus

    1. Department of Psychiatry, Faculty of Medicine and Centre Hospitalier Universitaire Sainte-Justine, University of Montreal, Montreal, Canada
    2. Departments of Psychiatry and Psychology, University of Toronto, Toronto, Canada
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  22. Luise Poustka

    Department of Child and Adolescent Psychiatry and Psychotherapy, University Medical Centre Göttingen, Göttingen, Germany
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  23. Juliane Fröhner

    Department of Psychiatry and Neuroimaging Center, Technische Universität Dresden, Dresden, Germany
    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared
  24. Lauren Robinson

    Department of Psychological Medicine, Section for Eating Disorders, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
    Contribution
    Resources
    Competing interests
    No competing interests declared
  25. Michael N Smolka

    Department of Psychiatry and Neuroimaging Center, Technische Universität Dresden, Dresden, Germany
    Contribution
    Resources
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5398-5569
  26. Jeanne Winterer

    1. Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
    2. Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
    Contribution
    Resources
    Competing interests
    No competing interests declared
  27. Robert Whelan

    School of Psychology and Global Brain Health Institute, Trinity College Dublin, Dublin, Ireland
    Contribution
    Resources
    Competing interests
    No competing interests declared
  28. Gunter Schumann

    PONS Research Group, Dept of Psychiatry and Psychotherapy, Campus Charite Mitte, Humboldt University, Berlin, Germany
    Contribution
    Data curation, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  29. Henrik Walter

    Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
    Contribution
    Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  30. Andreas Heinz

    Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
    Contribution
    Funding acquisition, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  31. Kerstin Ritter

    Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
    Contribution
    Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  32. IMAGEN consortium

    Contribution
    Data curation, Resources
    Competing interests
    No competing interests declared

Funding

German Research Foundation (402170461-TRR 265)

  • Roshan Prakash Rane
  • JiHoon Kim
  • Henrik Walter
  • Andreas Heinz
  • Kerstin Ritter

German Research Foundation (389563835)

  • Kerstin Ritter

German Research Foundation (414984028-CRC 1404)

  • Kerstin Ritter

German Research Foundation (390523135)

  • Kai Görgen

Research Foundation for International Scientists (82150710554)

  • Gunter Schumann

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Anne Beck and Sambu Seo for sharing the alcohol misuse label generated in their related study (Seo et al., 2019) for our ML exploration analysis.

Ethics

Written informed consent was obtained from all participants by the IMAGEN consortium, and the study was approved by the institutional ethics committees of King's College London, University of Nottingham, Trinity College Dublin, University of Heidelberg, Technische Universität Dresden, Commissariat à l'Énergie Atomique et aux Énergies Alternatives, and University Medical Center at the University of Hamburg in accordance with the Declaration of Helsinki (doi:10.1001/jama.2013.281053). For this specific data analysis project, approval was provided by the IMAGEN group under the approval username / project ID 'edeman'.

Version history

  1. Preprint posted: February 1, 2022 (view preprint)
  2. Received: February 2, 2022
  3. Accepted: May 25, 2022
  4. Accepted Manuscript published: May 26, 2022 (version 1)
  5. Version of Record published: July 5, 2022 (version 2)
  6. Version of Record updated: July 13, 2022 (version 3)

Copyright

© 2022, Rane et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Roshan Prakash Rane, Evert Ferdinand de Man, JiHoon Kim, Kai Görgen, Mira Tschorn, Michael A Rapp, Tobias Banaschewski, Arun LW Bokde, Sylvane Desrivieres, Herta Flor, Antoine Grigis, Hugh Garavan, Penny A Gowland, Rüdiger Brühl, Jean-Luc Martinot, Marie-Laure Paillere Martinot, Eric Artiges, Frauke Nees, Dimitri Papadopoulos Orfanos, Herve Lemaitre, Tomas Paus, Luise Poustka, Juliane Fröhner, Lauren Robinson, Michael N Smolka, Jeanne Winterer, Robert Whelan, Gunter Schumann, Henrik Walter, Andreas Heinz, Kerstin Ritter, IMAGEN consortium (2022)
Structural differences in adolescent brains can predict alcohol misuse
eLife 11:e77545.
https://doi.org/10.7554/eLife.77545
