Charting brain growth and aging at high spatial precision

  1. Saige Rutherford (corresponding author)
  2. Charlotte Fraza
  3. Richard Dinga
  4. Seyed Mostafa Kia
  5. Thomas Wolfers
  6. Mariam Zabihi
  7. Pierre Berthet
  8. Amanda Worker
  9. Serena Verdi
  10. Derek Andrews
  11. Laura KM Han
  12. Johanna MM Bayer
  13. Paola Dazzan
  14. Phillip McGuire
  15. Roel T Mocking
  16. Aart Schene
  17. Chandra Sripada
  18. Ivy F Tso
  19. Elizabeth R Duval
  20. Soo-Eun Chang
  21. Brenda WJH Penninx
  22. Mary M Heitzeg
  23. S Alexandra Burt
  24. Luke W Hyde
  25. David Amaral
  26. Christine Wu Nordahl
  27. Ole A Andreassen
  28. Lars T Westlye
  29. Roland Zahn
  30. Henricus G Ruhe
  31. Christian Beckmann
  32. Andre F Marquand
  1. Donders Institute for Brain, Cognition, and Behavior, Radboud University, Netherlands
  2. Department of Cognitive Neuroscience, Radboud University Medical Center, Netherlands
  3. Department of Psychiatry, University of Michigan, United States
  4. Department of Psychiatry, Utrecht University Medical Center, Netherlands
  5. Department of Psychology, University of Oslo, Norway
  6. Norwegian Center for Mental Disorders Research (NORMENT), University of Oslo, and Oslo University Hospital, Norway
  7. Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, United Kingdom
  8. Centre for Medical Image Computing, Medical Physics and Biomedical Engineering, University College London, United Kingdom
  9. Dementia Research Centre, UCL Queen Square Institute of Neurology, United Kingdom
  10. The Medical Investigation of Neurodevelopmental Disorders (MIND) Institute and Department of Psychiatry and Behavioral Sciences, UC Davis School of Medicine, University of California, Davis, United States
  11. Amsterdam UMC, Vrije Universiteit, Psychiatry, Amsterdam Public Health Research Institute, Netherlands
  12. GGZ inGeest, Amsterdam Neuroscience, Netherlands
  13. Centre for Youth Mental Health, University of Melbourne, Australia
  14. Orygen Youth Health, Australia
  15. National Institute for Health Research Mental Health Biomedical Research Centre, South London and Maudsley National Health Service Foundation Trust and King’s College London, United Kingdom
  16. Department of Psychosis Studies, Institute of Psychiatry, King’s College London, United Kingdom
  17. Department of Psychiatry, Amsterdam UMC, Location AMC, Netherlands
  18. Department of Psychiatry, Radboud University Medical Center, Netherlands
  19. Department of Psychology, Michigan State University, United States
  20. Department of Psychology, University of Michigan, United States
  21. KG Jebsen Centre for Neurodevelopmental Disorders Research, Institute of Clinical Medicine, University of Oslo, Norway
  22. Centre for Affective Disorders at the Institute of Psychiatry, King's College London, United Kingdom
  23. Centre for Functional MRI of the Brain (FMRIB), Nuffield Department of Clinical Neurosciences, Wellcome Centre for Integrative Neuroimaging, University of Oxford, United Kingdom

Peer review process

This article was accepted for publication as part of eLife's original publishing model.

History

  1. Version of Record published
  2. Accepted Manuscript published
  3. Accepted
  4. Received
  5. Preprint posted

Decision letter

  1. Chris I Baker
    Senior and Reviewing Editor; National Institute of Mental Health, National Institutes of Health, United States
  2. Bernd Taschler
    Reviewer; University of Oxford, United Kingdom
  3. Oscar Esteban
    Reviewer; Stanford University, United States
  4. Todd Constable
    Reviewer; Yale University, United States

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Charting Brain Growth and Aging at High Spatial Precision" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by Chris Baker as the Senior and Reviewing Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Bernd Taschler (Reviewer #1); Oscar Esteban (Reviewer #2); Todd Constable (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

As you will see, all three reviewers are very enthusiastic about this work and have some excellent suggestions for strengthening the manuscript that will require some additional analyses.

Essential revisions:

The comments from the three reviewers are highly consistent and identify three main areas where the manuscript can be improved to increase the strength of the results and the utility of the resource.

1) All reviewers noted concerns about the current evidence for generalization of the findings. The authors should include additional cross-validation tests across sites.

2) Related to point (1), the bulk of the data come from the UK Biobank. The implications and potential limitations caused by this should be more fully discussed, although the additional cross-validation analyses will help with this.

3) The manual QC is very impressive, but the whole process could be described in more detail to enable others to reproduce such a QC.

Reviewer #1 (Recommendations for the authors):

This is a highly valuable resource that will hopefully grow further in the future. The manuscript is well written and data and results are presented in a clear and detailed way. I especially commend the authors on making their code easy to run, understandable and truly accessible.

One aspect that, in my opinion, would strengthen the paper is the inclusion of a more comprehensive evaluation on unseen data across sites. With clinical applications in mind where a small, in-house data set is compared to the reference models, it would be useful to understand how much variation is to be expected from scanner/site differences alone. A comparison of the existing evaluation metrics with a scenario in which the models are trained on one set of sites (or even just UKB alone) and tested on a separate set of data that does not include any of the training sites would increase the interpretability of the current results.

Several recent studies have found recruitment and selection bias in the UKB with respect to the general population and even within the imaging cohort compared to the full 500k. Although briefly mentioned in the limitations, this could be expanded further by discussing recent findings.

Reviewer #3 (Recommendations for the authors):

While the numbers are probably sufficient that it doesn't matter – it seems that the train and test sets were only split once – and then the results presented. Proper form might be to randomly split the train/test set multiple times to obtain distributions. It would be much stronger statistically if this was repeated. If this was already repeated then it should be made clearer. The wording refers to train and test set(s) with sets being plural, but I could not find anything explicitly stating how many times this was repeated.

The data shown in Figure 1 might be better served by splitting this into multiple figures. In A it is impossible to read the y-axis. In C and D the caption states that the lines are centiles of variation but it doesn't say what centiles (for example do they match the centiles of pediatric growth charts 0.4th, 2, 9th, 25, 50th etc?) – this should be stated.

Figure 1C shows whole cortex results, while D shows subcortical. It would be nice to show data for some cortical brain regions – or even summarized for lobes instead of just whole brain.

For regions, it would be reassuring to see that the development curves for PFC for example, agree with the previous literature. Or even show that different regions have different temporal growth charts. Similarly, the work could be put in context with the work of Toga et al., Trends Neurosci, 2006 – mapping brain maturation. Or the work of Pigoni et al., Eur Neuropsychopharm, 2021 where they show (in a large sample) that cortical thickness changes in the temporal lobes can be used in classification of first episode psychosis. While the authors state that a thorough analysis of these curves is beyond the scope (and I agree) it would be helpful to have some text that confirms these curves (for healthy or diseased brains) agree with past literature.

Overall I am enthusiastic to see this work published.

https://doi.org/10.7554/eLife.72904.sa1

Author response

Essential revisions:

The comments from the three reviewers are highly consistent and identify three main areas where the manuscript can be improved to increase the strength of the results and the utility of the resource.

1) All reviewers noted concerns about the current evidence for generalization of the findings. The authors should include additional cross-validation tests across sites.

We thank the reviewers for this suggestion and have addressed the concern regarding generalizability in several ways. First, we ran an additional 10 randomized train/test splits of the data in the full sample. These new analyses show the stability of our models, as there is very little variation in the evaluation metrics across the 10 splits. These results are visualized in Figure 3—figure supplement 2. However, the static Figure 3—figure supplement 2 is challenging to read because many brain regions are fit into a single plot. Therefore, we also created an online interactive visualization tool (https://brainviz-app.herokuapp.com/) that shows an image of the brain region and its explained variance when you hover over a point (see Author response image 1 and Video 1). We created this interactive visualization for all supplemental tables to make exploration and interpretation easier, and we now recommend this tool as the primary way to interrogate our findings.
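The repeated-split stability check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, a one-covariate least-squares line stands in for the actual normative model, and the names `repeated_split_ev` and `fit_line`, the 50/50 split fraction, and the seeds are all hypothetical choices made for the example.

```python
import random
import statistics


def explained_variance(y_true, y_pred):
    """EV = 1 - Var(y_true - y_pred) / Var(y_true)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - statistics.pvariance(residuals) / statistics.pvariance(y_true)


def fit_line(x, y):
    """Ordinary least squares with one covariate (stand-in for the real model)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def repeated_split_ev(x, y, n_splits=10, test_fraction=0.5, seed=0):
    """Return one explained-variance score per randomized train/test split."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    n_test = int(len(indices) * test_fraction)
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # new random partition on every iteration
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        slope, intercept = fit_line(
            [x[i] for i in train_idx], [y[i] for i in train_idx]
        )
        y_test = [y[i] for i in test_idx]
        y_pred = [slope * x[i] + intercept for i in test_idx]
        scores.append(explained_variance(y_test, y_pred))
    return scores


# Synthetic data: a thickness-like measure that declines linearly with age.
data_rng = random.Random(42)
ages = [data_rng.uniform(5, 90) for _ in range(400)]
thickness = [3.0 - 0.01 * a + data_rng.gauss(0, 0.1) for a in ages]

ev_scores = repeated_split_ev(ages, thickness)
print(f"EV mean={statistics.fmean(ev_scores):.3f}, "
      f"sd={statistics.stdev(ev_scores):.3f}")
```

A small standard deviation across splits, relative to the mean, is the kind of evidence the response refers to when it says the evaluation metrics vary very little.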

Author response image 1
Example of the online interactive visualizations created to help interpret the evaluation metrics.

This interactive figure was created for each evaluation metric (EV, MSLL, skew, and kurtosis) and all test sets (full controls, mQC controls, clinical, and transfer).
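For readers unfamiliar with the metrics named in this caption, the sketch below uses textbook definitions of mean standardized log loss (MSLL) and of the skew and excess kurtosis of deviation (z) scores, assuming Gaussian predictive distributions. It is an assumption that these match the authors' exact implementation; the function names and the toy numbers are hypothetical.

```python
import math
import statistics


def gaussian_nll(y, mean, var):
    """Negative log likelihood of y under a Gaussian N(mean, var)."""
    return 0.5 * math.log(2 * math.pi * var) + (y - mean) ** 2 / (2 * var)


def msll(y_true, pred_mean, pred_var, train_mean, train_var):
    """Mean standardized log loss.

    Model NLL minus the NLL of a trivial Gaussian built from the training
    mean and variance, averaged over test points. More negative is better.
    """
    diffs = [
        gaussian_nll(y, m, v) - gaussian_nll(y, train_mean, train_var)
        for y, m, v in zip(y_true, pred_mean, pred_var)
    ]
    return statistics.fmean(diffs)


def z_scores(y_true, pred_mean, pred_var):
    """Deviation scores: residual divided by the predictive standard deviation."""
    return [(y - m) / math.sqrt(v) for y, m, v in zip(y_true, pred_mean, pred_var)]


def skew_and_excess_kurtosis(values):
    """Moment-based skew and excess kurtosis (both 0 for a Gaussian)."""
    mu = statistics.fmean(values)
    m2 = statistics.fmean([(v - mu) ** 2 for v in values])
    m3 = statistics.fmean([(v - mu) ** 3 for v in values])
    m4 = statistics.fmean([(v - mu) ** 4 for v in values])
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Toy test set (hypothetical numbers, for illustration only).
y_true = [1.0, 2.0, 3.0]
pred_mean = [1.1, 1.9, 3.2]
pred_var = [0.04, 0.04, 0.04]

toy_msll = msll(y_true, pred_mean, pred_var, train_mean=2.0, train_var=1.0)
toy_z = z_scores(y_true, pred_mean, pred_var)
toy_skew, toy_kurt = skew_and_excess_kurtosis(toy_z)
print(f"MSLL={toy_msll:.3f}, skew={toy_skew:.3f}, excess kurtosis={toy_kurt:.3f}")
```

Under these definitions, skew and excess kurtosis near zero indicate that the z-scores are close to Gaussian, and a negative MSLL indicates that the model outperforms the trivial baseline.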

Second, we updated and expanded the transfer data set to include 6 open datasets from OpenNeuro.org (N=546), and we provide this example dataset on our GitHub together with the transfer code (https://colab.research.google.com/github/predictive-clinical-neuroscience/braincharts/blob/master/scripts/apply_normative_models.ipynb). This provides both a more comprehensive evaluation of the performance of our models on unseen data and a more comprehensive walk-through for new users applying our models to new data (sites unseen in training).

Finally, we added per-site evaluation metrics (Figure 3—figure supplement 3) to demonstrate that performance is relatively stable across sites and not driven by a single large site (i.e., UKB). As site is strongly correlated with age, these visualizations can also be used to approximate model performance at different age ranges (i.e., 9–10-year-old performance can be assessed by looking at the evaluation metrics for the ABCD sites, and 50–80-year-old performance can be assessed by looking at the UKB evaluation metrics). Moreover, we would like to emphasize that not all sites should be expected to achieve the same performance, because the sampling of the different sites is highly heterogeneous: some sites cover a broad age range (e.g., OASIS, UKB), whereas other sites have an extremely narrow age range (e.g., ABCD).
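A per-site breakdown of this kind amounts to grouping test-set predictions by site label and computing the metric within each group. The sketch below is illustrative only: the site labels, the numbers, and the helper name `per_site_metric` are hypothetical, and explained variance is used as the example metric.

```python
from collections import defaultdict
import statistics


def explained_variance(y_true, y_pred):
    """EV = 1 - Var(y_true - y_pred) / Var(y_true)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - statistics.pvariance(residuals) / statistics.pvariance(y_true)


def per_site_metric(sites, y_true, y_pred, metric=explained_variance):
    """Group observations by site label and apply the metric within each site.

    Note: a site whose true values have zero variance would raise a
    ZeroDivisionError with explained variance; real data may need a guard.
    """
    grouped = defaultdict(lambda: ([], []))
    for site, true_value, predicted in zip(sites, y_true, y_pred):
        grouped[site][0].append(true_value)
        grouped[site][1].append(predicted)
    return {site: metric(t, p) for site, (t, p) in grouped.items()}


# Hypothetical test-set predictions from two sites.
sites = ["site_a"] * 4 + ["site_b"] * 4
y_true = [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 3.0, 3.0, 7.0, 7.0]

site_ev = per_site_metric(sites, y_true, y_pred)
for site, ev in sorted(site_ev.items()):
    print(f"{site}: EV={ev:.2f}")
```

Because within-site variance of the true values sits in the denominator, a site with a narrow age range can show lower explained variance even when absolute errors are similar, which is consistent with the caution above about comparing sites.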

2) Related to point (1), the bulk of the data come from the UK Biobank. The implications and potential limitations caused by this should be more fully discussed, although the additional cross-validation analyses will help with this.

The responses to revision 1 should help to address this, in that we show per-site evaluation metrics, cross-validation, and additional transfer examples. These additional analyses show that model performance is not driven solely by the UKB sample. However, we agree with this comment and have also updated the limitations section to discuss the overrepresentation of UKB, including a statement regarding the known sampling bias of UKB.

“We also identify limitations of this work. We view the word “normative” as problematic. […] By sampling both healthy population samples and case-control studies, we achieve a reasonable estimate of variation across individuals, however, downstream analyses should consider the nature of the reference cohort and whether it is appropriate for the target sample.”

3) The manual QC is very impressive, but the whole process could be described in more detail to enable others to reproduce such a QC.

We have added further details regarding the quality-checking procedure to the Methods section (following the suggestions in Reviewer 2's comments 3a-c). We have also improved the clarity of the directions for implementing the scripts on the QC GitHub page, including an interactive link to view an example of the manual QC environment, to enable others to reproduce our manual QC pipeline.

“Quality control (QC) is an important consideration for large samples and is an active research area (Alfaro-Almagro et al., 2018; Klapwijk et al., 2019; Rosen et al., 2018). […] We separated the evaluation metrics into full test set (relying on automated QC) and mQC test set in order to compare model performance between the two QC approaches and were pleased to notice that the evaluation metrics were nearly identical across the two methods.”

Reviewer #1 (Recommendations for the authors):

This is a highly valuable resource that will hopefully grow further in the future. The manuscript is well written and data and results are presented in a clear and detailed way. I especially commend the authors on making their code easy to run, understandable and truly accessible.

One aspect that, in my opinion, would strengthen the paper is the inclusion of a more comprehensive evaluation on unseen data across sites. With clinical applications in mind where a small, in-house data set is compared to the reference models, it would be useful to understand how much variation is to be expected from scanner/site differences alone. A comparison of the existing evaluation metrics with a scenario in which the models are trained on one set of sites (or even just UKB alone) and tested on a separate set of data that does not include any of the training sites would increase the interpretability of the current results.

As noted, we have addressed this concern in response to Essential revisions 1) and 2) above.

Several recent studies have found recruitment and selection bias in the UKB with respect to the general population and even within the imaging cohort compared to the full 500k. Although briefly mentioned in the limitations, this could be expanded further by discussing recent findings.

We have addressed this concern in response to Essential revision 2) above.

Reviewer #3 (Recommendations for the authors):

While the numbers are probably sufficient that it doesn't matter – it seems that the train and test sets were only split once – and then the results presented. Proper form might be to randomly split the train/test set multiple times to obtain distributions. It would be much stronger statistically if this was repeated. If this was already repeated then it should be made clearer. The wording refers to train and test set(s) with sets being plural, but I could not find anything explicitly stating how many times this was repeated.

We have addressed this comment under Essential revision 1) above. Regarding nomenclature, in our previous version the plural "test sets" referred to the full test set, mQC test set, transfer test set, and clinical test set (Table 1). However, we thank the reviewer for pointing out that this is open to misinterpretation, and we now use the specific test set names instead of just “test sets”. We have also included a resampling of the full controls test set to address the generalizability concern. Additional details on this analysis are in the response to Essential revision 1) above.

The data shown in Figure 1 might be better served by splitting this into multiple figures. In A it is impossible to read the y-axis. In C and D the caption states that the lines are centiles of variation but it doesn't say what centiles (for example do they match the centiles of pediatric growth charts 0.4th, 2, 9th, 25, 50th etc?) – this should be stated.

We have increased the resolution and font size in Figure 1 and added labels to the centiles.

Figure 1C shows whole cortex results, while D shows subcortical. It would be nice to show data for some cortical brain regions – or even summarized for lobes instead of just whole brain.

For regions, it would be reassuring to see that the development curves for PFC for example, agree with the previous literature. Or even show that different regions have different temporal growth charts. Similarly, the work could be put in context with the work of Toga et al., Trends Neurosci, 2006 – mapping brain maturation. Or the work of Pigoni et al., Eur Neuropsychopharm, 2021 where they show (in a large sample) that cortical thickness changes in the temporal lobes can be used in classification of first episode psychosis. While the authors state that a thorough analysis of these curves is beyond the scope (and I agree) it would be helpful to have some text that confirms these curves (for healthy or diseased brains) agree with past literature.

We have included a frontal cortex ROI trajectory in Figure 1D, and another cortical ROI (calcarine) is also shown. In addition, on our study's GitHub (https://github.com/predictive-clinical-neuroscience/braincharts/tree/master/metrics), we now include images of the trajectory for every ROI, so that users can explore ROIs that are not represented in Figure 1, where there is not enough space to show all brain regions. We have also included additional citations to prior work mapping trajectories in the Results section.

https://doi.org/10.7554/eLife.72904.sa2

Cite this article

  1. Saige Rutherford
  2. Charlotte Fraza
  3. Richard Dinga
  4. Seyed Mostafa Kia
  5. Thomas Wolfers
  6. Mariam Zabihi
  7. Pierre Berthet
  8. Amanda Worker
  9. Serena Verdi
  10. Derek Andrews
  11. Laura KM Han
  12. Johanna MM Bayer
  13. Paola Dazzan
  14. Phillip McGuire
  15. Roel T Mocking
  16. Aart Schene
  17. Chandra Sripada
  18. Ivy F Tso
  19. Elizabeth R Duval
  20. Soo-Eun Chang
  21. Brenda WJH Penninx
  22. Mary M Heitzeg
  23. S Alexandra Burt
  24. Luke W Hyde
  25. David Amaral
  26. Christine Wu Nordahl
  27. Ole A Andreassen
  28. Lars T Westlye
  29. Roland Zahn
  30. Henricus G Ruhe
  31. Christian Beckmann
  32. Andre F Marquand
(2022)
Charting brain growth and aging at high spatial precision
eLife 11:e72904.
https://doi.org/10.7554/eLife.72904
