Using normative models pre-trained on cross-sectional data to evaluate longitudinal changes in neuroimaging data

  1. Department of Complex Systems, Institute of Computer Science of the Czech Academy of Sciences, Prague, Czech Republic
  2. Department of Cybernetics, Czech Technical University in Prague, Prague, Czech Republic
  3. National Institute of Mental Health, Klecany, Czech Republic
  4. Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
  5. Max Planck Institute for Research on Collective Goods, Bonn, Germany
  6. University of Cologne, Germany

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Editors

  • Reviewing Editor
    Jason Lerch
    University of Oxford, Oxford, United Kingdom
  • Senior Editor
    Jonathan Roiser
    University College London, London, United Kingdom

Reviewer #1 (Public Review):
Summary:

This paper provides a methodology for normative trajectory modeling, using cross-sectional data to set the "norms" and then applying these norms to longitudinal brain observations. An example of schizophrenia trajectories (two time points) is provided. The method assumes a Bayesian mixed-effects model, which includes some hyperparameters that need to be tuned. The longitudinal assumption is essentially a random intercept model: it assumes that the age-based quantiles do not shift, and if they do, that is taken as a sign of disease-like trajectories.
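To make the constant-quantile assumption explicit, consider a simplified Gaussian random-intercept model without the warping function. This is an illustration, not the manuscript's exact specification: $\mu(x)$ is the normative mean, $\eta_n$ a subject-specific offset and $\xi_{nt}$ within-subject noise at visit $t$.

```latex
\begin{aligned}
y_{nt} &= \mu(x_{nt}) + \eta_n + \xi_{nt},
  \qquad \eta_n \sim \mathcal{N}(0, \sigma_\eta^2), \quad
  \xi_{nt} \sim \mathcal{N}(0, \sigma_\xi^2), \\
z_{nt} &= \frac{y_{nt} - \mu(x_{nt})}{\sqrt{\sigma_\eta^2 + \sigma_\xi^2}}
        = \frac{\eta_n + \xi_{nt}}{\sqrt{\sigma_\eta^2 + \sigma_\xi^2}}, \\
\mathbb{E}\left[z_{nt} \mid \eta_n\right] &= \frac{\eta_n}{\sqrt{\sigma_\eta^2 + \sigma_\xi^2}}
  \qquad \text{(independent of } t \text{)}.
\end{aligned}
```

Conditional on the subject, the expected z-score (and hence the centile) does not depend on $t$; only the noise term moves it between visits, so a systematic shift in $z$ is read as a departure from the normative trajectory. If the variances depended on $x$, the denominator would change between visits and this simple argument would no longer hold.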

Strengths:

Normative modeling of brain feature trajectories is an important topic. Bayesian models are a promising alternative to modeling these. Leveraging large-scale data to provide norms is also potentially useful.

Weaknesses:

The models described are not fundamentally novel: they amount to a random intercept model (with a warping function) plus some flexible covariate effects using splines (i.e., additive models). The assumption of constant quantiles is very strong and limits the utility of the model to very short-term data. The schizophrenia example leads to a counter-intuitive normalization of trajectories, which raises the suspicion that this is driven by some artifact of the data modeling or imaging pipelines. The method also assumes that the cross-sectional data come from a "healthy population" without describing what this population is (there is certainly every chance of ascertainment bias in large-scale studies as well as small-scale studies). This issue is completely elided in the manuscript.

Reviewer #2 (Public Review):

Summary:

In this manuscript, the authors propose a method that aims to accurately reflect an individual's deviation in longitudinal (temporal) change relative to the normative temporal change characterized by a pre-trained population normative model (i.e., a Bayesian linear regression normative model) built on cross-sectional data. The manuscript aims to solve a recently identified problem: using normative models based on cross-sectional data to make inferences about longitudinal change.

Although the proposed method was applied to real data and shown to be more sensitive than conventional methods in capturing differences confirmed by previous studies, there is still a lack of simulation studies formally evaluating how accurately the proposed method estimates, and supports inferences about, longitudinal change.

Strengths:

The efforts of this work make a good contribution to addressing an important question of normative modeling. With the greater availability of cross-sectional studies for normative modeling than longitudinal studies, and the inappropriateness of making inferences about longitudinal subject-specific changes using these cross-sectional data-based normative models, it's meaningful to try to address this gap from the perspective of methodological development.

Weaknesses:

• The organization and clarity of this manuscript need enhancement for better comprehension and flow. For example, in the first few paragraphs of the introduction, the wording is quite vague. A lot of information was scattered and repeated in the latter part of the introduction, and the actual challenges/motivation of this work were not introduced until the 5th paragraph.

• There are no simulation studies evaluating whether adjusting the cross-sectional normative model to longitudinal data yields accurate estimates and inferences regarding longitudinal change. Also, the modeling procedure involves some assumptions, for example, that the deviation of a healthy control from the population over time is caused purely by noise, and that the error/noise variance is constant across $x_n$; these seem to be quite strong assumptions. The presentation of the method would be strengthened if the authors conducted a formal simulation study evaluating its performance when such assumptions are violated and, ideally, proposed ways to check these assumptions before performing the analyses.

• The proposed "z-diff score" still takes the common form of a z-score describing individual deviation from the population/reference level, but is now used specifically to quantify the deviation of an individual's temporal change from the population level. The authors need to highlight the difference between the "z-score" and the "z-diff score" more clearly, ideally at its first mention, so that readers are not confused (I was confused at first until I reached the latter part of the manuscript). The z-score can also be called a "standardized difference", which somewhat collides with what "z-diff" implies by its name.

• Explaining that one component of the variance is related to the estimation of the model and the other is due to prediction would be helpful for non-statistical readers.

• It would be easier for the non-statistical reader if the authors consistently used precision or variance for all variance parameters. Probably variance would be more accessible.

• The functions $\psi$ are never explicitly described. It would be helpful to describe them in the supplement, with a reference to that description in the main text.

• What is the goal of equations (13) and (14)? The authors should state the purpose of these equations before showing the math. It seems to be obtaining an estimate of $\sigma_{\xi}^2$, which the reader only learns at the end.

• What is the definition of "adaption" as used to describe equation (15)? Also, I think the norm on the subsample is not defined in this equation.

• "(the sandwich part with A)": maybe call this an inner product so that it is not confused with a sandwich variance estimator. This passage is a bit unclear. Equation (8) does have the inner product involving $A$, and $\beta^{-1}$ does include the variability of $\eta$. It seems you mean that equation (8) incorrectly includes the variability of $\eta$ and does not use the right vector in the inner product involving $A$, but this needs clarifying.

• One challenge with the z-diff score is that it does not account for whether a person sits above or below zero at the first time point. It might make it difficult to interpret the results, as the results for a particular pathology could change depending on what stage of the lifespan a person is in. I am not sure how the authors would address those challenges.
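To make the distinction between the z-score and the z-diff score, and between the estimation and prediction components of the variance, concrete, here is a minimal NumPy sketch. It assumes standard Bayesian linear regression notation (posterior weight mean `w_mean`, posterior precision matrix `A`, noise precision `beta`) and a hypothetical within-subject noise variance `sigma_xi2`. The function names and the exact form of the z-diff denominator are illustrative assumptions, not the manuscript's equations.

```python
import numpy as np


def predictive_variance_parts(phi, A, beta):
    """Split the BLR predictive variance at basis vector phi into two parts.

    Assumes A is the posterior precision matrix of the regression weights and
    beta the noise precision (standard Bayesian linear regression notation).
    """
    estimation = float(phi @ np.linalg.solve(A, phi))  # phi^T A^{-1} phi: uncertainty in the fitted weights
    noise = 1.0 / beta  # beta^{-1}: observation noise, irreducible even with unlimited training data
    return estimation, noise


def z_score(y, phi, w_mean, A, beta):
    """Cross-sectional deviation of one observation from the normative mean."""
    estimation, noise = predictive_variance_parts(phi, A, beta)
    return (y - phi @ w_mean) / np.sqrt(estimation + noise)


def z_diff(y1, y2, phi1, phi2, w_mean, A, sigma_xi2):
    """Deviation of an observed change from the normatively expected change.

    Hypothetical form: both visits share the same weights, so the estimation
    term uses the difference of basis vectors; the subject-specific offset
    cancels, and the assumed within-subject noise variance sigma_xi2 enters
    once per visit.
    """
    d_phi = phi2 - phi1
    expected_change = d_phi @ w_mean
    estimation = float(d_phi @ np.linalg.solve(A, d_phi))
    return ((y2 - y1) - expected_change) / np.sqrt(estimation + 2.0 * sigma_xi2)


# Toy example: basis phi(x) = [1, x] with x = age; all numbers are made up.
A = 100.0 * np.eye(2)
w_mean = np.array([3.0, -0.05])
phi1, phi2 = np.array([1.0, 20.0]), np.array([1.0, 22.0])
y1, y2 = 2.0, 1.5

print(z_score(y1, phi1, w_mean, A, beta=4.0))  # subject sits exactly on the norm at visit 1
print(z_diff(y1, y2, phi1, phi2, w_mean, A, sigma_xi2=0.48))  # but declines faster than expected
```

In this toy case the visit-1 z-score is 0 while the z-diff is about -0.4, which illustrates why the two scores answer different questions. Because the random intercept is shared by both visits, it drops out of the difference, so only within-subject noise contributes to the denominator.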

Author response:

We thank the reviewers for the feedback on our manuscript; we are planning to address the raised concerns in the following manner:

We will be more explicit about the novelty of this method, framing it more concretely within the scope of current research. From some of the reviewers' comments, we understand that it is not clear that our method extends an already existing method and model that has been extensively validated, with pre-trained models made available online. Consequently, the details of the model and of the training cohort are covered only briefly, with references to the relevant published work. We will improve clarity in this respect in the full responses. Nevertheless, we agree that the work would benefit from a simulation study that formally evaluates the performance of our method against more traditional approaches, and we will add it in our full responses. We will specifically investigate the effect of assumptions such as centile stability in healthy controls, as suggested by Reviewer 2.
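A simulation study of this kind could take many forms. One minimal sketch, under assumptions that are ours rather than the manuscript's (a known linear normative mean `mu`, Gaussian noise, no uncertainty in the normative model itself, and a hypothetical `drift_sd` parameter that breaks centile stability), checks whether the z-diff scores of simulated healthy controls are calibrated when the assumption holds and when it is violated:

```python
import numpy as np


def mu(age):
    """Hypothetical normative mean trajectory (e.g. thickness declining with age)."""
    return 3.0 - 0.01 * age


def simulate_z_diff(n, sigma_eta, sigma_xi, drift_sd, rng):
    """Simulate z-diff scores for n healthy controls with two visits each.

    Data follow y_t = mu(age_t) + eta + xi_t (+ drift at visit 2). The z-diff
    assumes drift = 0, i.e. that a control's centile is stable apart from
    independent noise xi_t with known variance sigma_xi**2.
    """
    age1 = rng.uniform(20.0, 60.0, size=n)
    age2 = age1 + rng.uniform(1.0, 3.0, size=n)  # follow-up after 1 to 3 years
    eta = rng.normal(0.0, sigma_eta, size=n)  # subject-specific offset, shared by both visits
    drift = rng.normal(0.0, drift_sd, size=n)  # centile shift at visit 2; drift_sd = 0 means the assumption holds
    y1 = mu(age1) + eta + rng.normal(0.0, sigma_xi, size=n)
    y2 = mu(age2) + eta + drift + rng.normal(0.0, sigma_xi, size=n)
    expected_change = mu(age2) - mu(age1)
    return ((y2 - y1) - expected_change) / np.sqrt(2.0 * sigma_xi**2)


def summarize(z, threshold=1.96):
    """Calibration summary: a well-calibrated z-diff has mean 0, SD 1 and ~5% of |z| above 1.96."""
    return {
        "mean": float(np.mean(z)),
        "sd": float(np.std(z, ddof=1)),
        "false_positive_rate": float(np.mean(np.abs(z) > threshold)),
    }


rng = np.random.default_rng(0)
for drift_sd in (0.0, 0.05):
    z = simulate_z_diff(50_000, sigma_eta=0.15, sigma_xi=0.05, drift_sd=drift_sd, rng=rng)
    print(f"drift_sd={drift_sd}: {summarize(z)}")
```

With `drift_sd` equal to `sigma_xi`, the expected variance of the z-diff rises from 1 to 1.5, so roughly 11% of healthy controls rather than 5% would be flagged at |z| > 1.96. Note that `sigma_eta` has no effect on the z-diff, since the shared offset cancels in the difference; a fuller study would also propagate uncertainty in the normative model and vary the follow-up interval.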

The novelty of this work lies in introducing a mathematically transparent method for evaluating studies with a longitudinal design using normative models trained on cross-sectional data. We emphasise strongly that this is otherwise not possible with current methods. Furthermore, by building on a pre-trained model, the method enjoys the benefits of big (cross-sectional) data, since the pre-trained model is fitted on an extensive population sample, without requiring direct access to those data or a ‘big’ longitudinal dataset from the cohort at hand. This is crucial in neuroimaging, where longitudinal data are much scarcer than cross-sectional data.

We strongly disagree with the notion raised by Reviewer 1 that after the first episode cortical thickness alterations are expected to become more severe. There is now increasing evidence that: (i) trajectories of cortical thickness are highly variable across different individuals after the first psychotic episode and (ii) that individuals treated with second-generation antipsychotics and with careful clinical follow-up can show normalisation of cortical thickness atypicalities after the first episode. Indeed, we can provide evidence for this in an independent cohort, with different analytical methodologies, where precisely this occurs (https://www.medrxiv.org/content/10.1101/2024.04.19.24306008v1, https://pubmed.ncbi.nlm.nih.gov/36805840/). In the full revision, we would be happy to provide further discussion of evidence in support of this.

We would also like to re-emphasise that the data were processed with the utmost rigour using state of the art processing pipelines including quality control.

We will take care to improve the flow of the manuscript, with special attention to the theoretical part and the sections highlighted by Reviewer 2.

We agree with the challenge outlined by Reviewer 2 regarding the limited interpretability of overall trends when subjects differ in their position at the first visit. However, this is a much broader challenge and is not specific to this study. The non-random sampling of large cohort studies is problematic for nearly all studies using such cohorts, regardless of the statistical approach used. We will explicitly acknowledge these limitations in the full response.
