
Diffusion-MRI-based regional cortical microstructure at birth for predicting neurodevelopmental outcomes of 2-year-olds

  1. Minhui Ouyang
  2. Qinmu Peng
  3. Tina Jeon
  4. Roy Heyne
  5. Lina Chalak
  6. Hao Huang (corresponding author)
  1. Radiology Research, Children’s Hospital of Philadelphia, United States
  2. Department of Radiology, Perelman School of Medicine, University of Pennsylvania, United States
  3. Department of Pediatrics, University of Texas Southwestern Medical Center, United States
Research Article
Cite this article as: eLife 2020;9:e58116 doi: 10.7554/eLife.58116

Abstract

Cerebral cortical architecture at birth encodes regionally differential dendritic arborization and synaptic formation. It underlies the behavioral emergence of 2-year-olds. Brain changes from 0 to 2 years of age are the most dynamic across the lifespan. Effective prediction of future behavior from brain microstructure at birth will reveal the structural basis of behavioral emergence in typical development and identify biomarkers for early detection and tailored intervention in atypical development. Here we aimed to evaluate neonatal whole-brain cortical microstructure quantified by diffusion MRI for predicting future behavior. We found that individual cognitive and language functions assessed at the age of 2 years were robustly predicted by neonatal cortical microstructure using support vector regression. Remarkably, cortical regions contributing heavily to the prediction models exhibited distinctive functional selectivity for cognition and language. These findings highlight regional cortical microstructure at birth as a potentially sensitive biomarker for predicting future neurodevelopmental outcomes and identifying individual risks of brain disorders.

Introduction

Brain cerebral cortical microstructure underlies neuronal circuit formation and functional emergence during brain maturation. Regionally distinctive cortical microstructural profiles around birth result from immensely complicated and spatiotemporally heterogeneous cellular and molecular processes (Silbereis et al., 2016), including neurogenesis, synapse formation, dendritic arborization, axonal growth, pruning, and myelination. Disturbance of such precisely regulated maturational events is associated with mental disorders (Innocenti and Price, 2005). Diffusion magnetic resonance imaging (dMRI) has been widely used for quantifying microstructural changes in white matter (WM) maturation (e.g. Dubois et al., 2008; Mukherjee et al., 2001). Because it is sensitive to the organized cortical tissue (e.g. the radial glial scaffold; Rakic, 1995; Sidman and Rakic, 1973) unique to the fetal and infant brain, dMRI also offers insights into the maturation of cortical cytoarchitecture. Cortical fractional anisotropy (FA), a dMRI-derived measurement, can effectively quantify local cortical microstructural architecture related to dendritic arborization and synaptic formation in the infant and fetal brain. Cortical FA can therefore potentially be used to infer the formation of specific brain circuits. In early cortical development, most cortical neurons are generated in the ventricular and subventricular zones. These neurons migrate toward the cortical surface along a radially arranged scaffolding of glial cells, where relatively high FA values are usually observed (Huang et al., 2013; McKinstry et al., 2002). During the emergence of brain circuits, increasing dendritic arborization (Bystron et al., 2008; Sidman and Rakic, 1973), synaptic formation (Huttenlocher and Dabholkar, 1997), and myelination of intracortical axons (Yakovlev and Lecours, 1967) disrupt the highly organized radial glia in the immature cortex and decrease cortical FA. 
Such reproducible cortical FA change patterns were documented in many studies of perinatal human brain development (Ball et al., 2013; Huang et al., 2006; Huang et al., 2009; Huang et al., 2013; Kroenke et al., 2007; McKinstry et al., 2002; Neil et al., 1998; Ouyang et al., 2019a; Ouyang et al., 2019b; Yu et al., 2016), suggesting sensitivity of cortical FA measures to maturational processes of cortical microstructure. Diffusion-MRI-based regional cortical microstructure at birth, encoding rich ‘footage’ of regional cellular and molecular processes, may provide novel information regarding typical cortical development and biomarkers for neuropsychiatric disorders.

The first 2 years of life are a critical period for behavioral development, and brain development during this period is the most rapid across the lifespan. In parallel with the rapid maturation of cortical architecture and the establishment of complex neuronal connections (Hüppi et al., 1998; Ouyang et al., 2019a; Pfefferbaum et al., 1994), babies learn to walk and talk and build the core capacities for a lifetime. Infant behaviors, including cognition, language, and motor abilities, emerge during this time and become measurable at around 2 years of age. Reliable diagnosis of many neuropsychiatric disorders, such as autism spectrum disorder (ASD), can be made only at around 2 years of age or later (Marín, 2016), as diagnoses rely on observing behavioral problems that are difficult to recognize in early infancy (Arpi and Ferrari, 2013; Ozonoff et al., 2010). On the other hand, early intervention for ASD, especially before 2 years of age, has shown potential for improving outcomes (Rogers et al., 2014). Because infants cannot communicate through speech or writing in early infancy, there may be no better way to assess their brain development than neuroimaging. Predicting cognition and behavior at 2 years of age or later from brain features around birth opens an invaluable time window for individualized biomarker detection and early tailored intervention leading to better outcomes.

Individual differences in brain WM microstructural architecture (Scholz et al., 2009; Yu et al., 2020), behavior, and function (Braga and Buckner, 2017; Xu et al., 2019) have been well recognized. Individual variability in brain structure, and the associated individual variability in future behavior, can be harnessed for robust prediction at the single-subject level (Kanai and Rees, 2011; Rosenberg et al., 2018), a step beyond group classification. A few previous studies have investigated within-sample imaging-outcome correlations (Ball et al., 2015; Counsell et al., 2014; Deoni et al., 2016; Hintz et al., 2015; Keunen et al., 2017; Peyton et al., 2020; Wee et al., 2017; Woodward et al., 2006), but such correlation approaches cannot be applied to new and incoming subjects. Machine learning approaches that can accommodate new subjects and yield continuous prediction values have been explored only recently, based on WM structural networks (Girault et al., 2019; Kawahara et al., 2017). Compared with the association of WM microstructure with cortical regions through WM fiber end point connectivity, the association of cortical microstructure with cortical regions is more direct. Thus, FA of a specific cortical region can directly reflect certain functions of the same cortical region. Our previous study (Ouyang et al., 2019b) demonstrated that cortical FA predicted neonate age with high accuracy. Regionally distinctive cortical microstructure around birth encodes information that may predict distinctive functions manifested by future behavior and may identify the most sensitive regions as imaging markers to detect early behavioral abnormality. However, dMRI-based cortical microstructure has so far been neither evaluated for predicting discrete or continuous future behavioral measurements nor incorporated into a machine-learning-based model for predicting future behavior.

In this study, we leveraged individual variability of cortical microstructure profiles of neonate brains to predict future behavior. A novel machine-learning-based model using regional cortical microstructure markers from dMRI was developed to predict continuous outcome values. In contrast to within-sample imaging-outcome correlations, this model can also be applied to new subjects. We hypothesized that dMRI-based cortical microstructure at birth alone (without any WM microstructure information) could robustly predict future neurodevelopmental outcomes. Of 107 recruited neonates, high-resolution (0.656 × 0.656 × 1.6 mm³) dMRI data were acquired from 87, of whom 46 underwent a follow-up study at 2 years of age for neurobehavioral assessments of cognitive, language, and motor abilities. Cortical microstructural architecture at birth was quantified by cortical FA on the cortical skeleton to alleviate partial volume effects (Ouyang et al., 2019b; Yu et al., 2016). Regional cortical FA measures were then used to form feature vectors to predict neurodevelopmental outcomes at 2 years of age. We further quantified the contribution of each cortical region to predicting different outcomes, as distinctive behaviors are likely encoded in uniquely distributed patterns across the cerebral cortex.

Results

Cortical microstructure at birth and neurodevelopmental outcomes at 2 years of age

A cohort of 107 neonates was recruited for studying prenatal and perinatal human brain development (see more details in Materials and methods and Supplementary file 1). Neuroimaging data, including structural and diffusion MRI, were collected from 87 infants around birth during natural sleep. Forty-six infants returned for a follow-up visit at 2 years of age to complete cognitive, language, and motor assessments with the Bayley Scales of Infant and Toddler Development, Third Edition (Bayley-III; Bayley, 2006). Figure 1—figure supplement 1 and Figure 1—figure supplement 2 show the cortical FA maps across parcellated cortical gyri in the left and right hemispheres from dMRI of all 46 subjects scanned at birth, revealing individual variability of regional cortical microstructure. The Bayley-III composite scores of these 46 subjects at 2 years of age range from 65 to 110 (mean ± sd: 87.4 ± 8.5) for cognition, 56 to 112 (85.7 ± 10.1) for language, and 73 to 107 (91.2 ± 7.1) for motor abilities (Figure 1—figure supplement 3a). No significant differences between preterm and term born infants were found in any of the Bayley-III composite scores (all p>0.3; Figure 1—figure supplement 3b). No significant correlation between any specific age (i.e. birth age, MRI scan age, and Bayley-III exam age) and neurodevelopmental outcome score was found in either the preterm or the term born infant group (all p>0.1; Supplementary file 2).

Robust prediction of cognitive and language outcomes based on cortical dMRI measurement

Fifty-two cortical regions parcellated by transforming neonate atlas labels (Figure 1—figure supplement 4) were used to generate cortical FA feature vectors from each participant’s dMRI data at birth, representing the entire cortical microstructural architecture of an individual neonate (Figure 1; see Materials and methods). Heterogeneous cortical FA values across cortical regions can be appreciated in the cortical FA maps (left panels in Figure 1), indicating regionally differentiated maturation of cortical microstructure. An immature cerebral cortical region with a highly organized radial glial scaffold is associated with high FA values, whereas a more mature cortical region with extensive dendritic arborization and synapse formation is associated with low FA values. To determine whether cortical microstructural features represented by FA measurements at birth can predict the neurodevelopmental outcomes of an individual infant at a later age, we used support vector regression (SVR) with a fully cross-validated leave-one-out (LOOCV) approach (middle panels of Figure 1). With this approach, the neurodevelopmental outcome of each infant was predicted from an independent training sample. That is, for each of the 46 participants taken in turn as the testing subject, the cortical FA features and outcome scores of the remaining 45 subjects were used to train prediction models, which then predicted the testing subject’s cognitive or language outcome at 2 years of age (right panels in Figure 1) based only on that subject’s cortical FA at birth. An SVR model that best fits the training sample can be represented by a weighted contribution of all features, where the weight vector (w) indicates the relative contribution of each feature, namely the cortical FA of each parcellated cortical region, to the prediction model. 
The feature contribution weights in the model predicting cognition or language were averaged across all leave-one-out SVR models and then normalized to $|w_i| / \sum_j |w_j|$, with i indicating the ith cortical gyrus and the sum running over all cortical gyri. These normalized feature contribution weights were projected back onto the cortical surface to show each cortical region’s contribution (right panels in Figure 1).
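The leave-one-out procedure and the weight normalization described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: the SVR is a linear epsilon-SVR solved in the primal by subgradient descent (a swapped-in solver chosen so the example needs only NumPy), and the function names, hyperparameters (`C`, `epsilon`, `lr`, `n_iter`), and the within-fold z-scoring of features are assumptions. A linear kernel is assumed so that each region has a single weight w_i.

```python
import numpy as np


def fit_linear_svr(X, y, C=1.0, epsilon=0.1, lr=0.01, n_iter=2000):
    """Linear epsilon-SVR fitted in the primal by subgradient descent.

    Minimises 0.5 * ||w||^2 + C * sum_i max(0, |y_i - (x_i . w + b)| - epsilon).
    Hypothetical stand-in for a library SVR solver; returns (w, b).
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(n_iter):
        resid = y - (X @ w + b)
        # Subgradient of the epsilon-insensitive loss with respect to the prediction
        g = np.where(resid > epsilon, -1.0, np.where(resid < -epsilon, 1.0, 0.0))
        step = lr / np.sqrt(t + 1.0)  # diminishing step size
        w = w - step * (w + C * (X.T @ g))
        b = b - step * C * g.sum()
    return w, b


def loocv_svr(X, y, **svr_kwargs):
    """Leave-one-out predictions and a normalized contribution weight per feature.

    X: (n_subjects, n_regions) regional cortical FA; y: (n_subjects,) scores.
    """
    n, d = X.shape
    preds = np.empty(n)
    fold_weights = np.empty((n, d))
    for i in range(n):
        train = np.arange(n) != i
        # Standardize features and center scores with training-fold statistics only
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        sd[sd == 0] = 1.0
        y_mean = y[train].mean()
        w, b = fit_linear_svr((X[train] - mu) / sd, y[train] - y_mean, **svr_kwargs)
        preds[i] = ((X[i] - mu) / sd) @ w + b + y_mean
        fold_weights[i] = w
    # Average w over all leave-one-out models, then normalize |w_i| / sum_j |w_j|
    mean_abs = np.abs(fold_weights.mean(axis=0))
    return preds, mean_abs / mean_abs.sum()
```

Called as `preds, contrib = loocv_svr(X, y)` with X holding one row of 52 regional FA values per subject and y the Bayley-III scores, `preds` gives one held-out prediction per subject and `contrib` the normalized contribution of each region. Averaging w before taking absolute values follows the order stated in the text; averaging |w| across folds instead would be a different, also plausible, choice.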

Figure 1 (with 5 supplements).
Workflow of predicting neurodevelopmental outcomes at 2 years based on cortical microstructural architecture at birth.

Cortical microstructure at birth (0 year) quantified with cortical fractional anisotropy (FA) measures from diffusion magnetic resonance imaging (dMRI) was used to predict cognitive and language abilities assessed with the Bayley-III Scales at 2 years of age (2 year). The prediction workflow includes the following steps: (1) Cortical microstructure was measured at the ‘core’ of the cortical mantle, shown as a green skeleton overlaid on an FA map and projected on a neonate cortical surface, to alleviate partial volume effects. A schematic depiction of the dendritic arborization and synaptic formation underlying cortical FA decreases during cortical microstructural maturation is shown. (2) Feature vectors were obtained by measuring cortical skeleton FA at parcellated cortical gyri, with the gyral labeling transformed from a neonate atlas. Each parcellated cortical gyrus is a region of interest (ROI). (3) Prediction models were established and tested with support vector regression (SVR) and cross-validation. Feature vectors from all subjects were concatenated to obtain the input data of the prediction models. (4) Prediction model accuracy was evaluated by the correlation between predicted and actual scores. Feature contributions from different gyri in the model were quantified by normalized feature contribution weights, which were projected back onto a cortical surface for visualization.
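Step (2) of this workflow, averaging skeleton FA within each parcellated gyrus and stacking the per-subject vectors, can be sketched with NumPy boolean masks. This is an illustrative sketch under stated assumptions: the FA map, binary cortical skeleton, and atlas labels are assumed to be co-registered NumPy arrays of the same shape (image loading, registration, and skeletonization are not shown), and `regional_skeleton_fa` and `build_feature_matrix` are hypothetical names.

```python
import numpy as np


def regional_skeleton_fa(fa, skeleton, labels, region_ids):
    """Mean FA over skeleton voxels inside each labeled region (one feature per ROI).

    Regions with no skeleton voxels get NaN so that missing coverage is visible.
    """
    features = np.full(len(region_ids), np.nan)
    on_skeleton = skeleton.astype(bool)
    for k, rid in enumerate(region_ids):
        voxels = on_skeleton & (labels == rid)
        if voxels.any():
            features[k] = fa[voxels].mean()
    return features


def build_feature_matrix(subjects, region_ids):
    """Stack per-subject feature vectors into an (n_subjects, n_regions) matrix.

    subjects: iterable of (fa, skeleton, labels) arrays, one tuple per subject.
    """
    return np.vstack([regional_skeleton_fa(fa, sk, lab, region_ids)
                      for fa, sk, lab in subjects])
```

Restricting the average to skeleton voxels is what the text means by measuring FA at the "core" of the cortical mantle; averaging over all labeled voxels instead would reintroduce the partial volume effects the skeleton is meant to alleviate.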

Significant correlations between predicted and actual neurodevelopmental outcomes were found for both cognitive (r = 0.536, p=1.2 × 10⁻⁴) and language (r = 0.474, p=8.8 × 10⁻⁴) scores (left panels in Figure 2a and b), indicating robust prediction of cognitive and language outcomes at 2 years of age based on cortical FA measures at birth. Permutation tests showed that these correlations were significantly higher than those obtained by chance (p<0.005). The mean absolute errors (MAEs) between the predicted and actual scores were 5.49 and 7 for cognitive and language outcomes, respectively, and these MAEs were significantly lower than those obtained by chance (p<0.01, permutation tests). These highly predictive models suggest that cortical microstructural architecture at birth is informative for predicting future behavioral and cognitive abilities. However, motor scores could not be predicted from cortical FA measures (r = 0.1, p=0.52).
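The accuracy metrics and their permutation tests can be sketched as below. The text does not state exactly how the permutation tests were run; one common scheme, assumed here, shuffles the outcome scores, reruns the entire cross-validated prediction on the shuffled scores, and counts how often the null r is at least as large (or the null MAE at least as small) as observed. To keep the sketch short and fast, a leave-one-out ridge regression stands in for the SVR; `loocv_ridge`, `permutation_test`, and their parameters are hypothetical.

```python
import numpy as np


def pearson_r(a, b):
    return float(np.corrcoef(a, b)[0, 1])


def mae(a, b):
    return float(np.mean(np.abs(np.asarray(a) - np.asarray(b))))


def loocv_ridge(X, y, alpha=1.0):
    """Hypothetical stand-in predictor: leave-one-out ridge regression."""
    n, d = X.shape
    preds = np.empty(n)
    for i in range(n):
        tr = np.arange(n) != i
        mu, ym = X[tr].mean(axis=0), y[tr].mean()
        Xc = X[tr] - mu
        w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ (y[tr] - ym))
        preds[i] = (X[i] - mu) @ w + ym
    return preds


def permutation_test(X, y, predict=loocv_ridge, n_perm=1000, seed=0):
    """Observed r and MAE, with one-sided permutation p-values for each."""
    rng = np.random.default_rng(seed)
    pred = predict(X, y)
    r_obs, mae_obs = pearson_r(pred, y), mae(pred, y)
    r_null, mae_null = np.empty(n_perm), np.empty(n_perm)
    for k in range(n_perm):
        y_perm = rng.permutation(y)
        p = predict(X, y_perm)  # rerun the whole cross-validation on shuffled scores
        r_null[k], mae_null[k] = pearson_r(p, y_perm), mae(p, y_perm)
    # The +1 terms count the observed statistic as one member of the null
    p_r = (np.sum(r_null >= r_obs) + 1) / (n_perm + 1)
    p_mae = (np.sum(mae_null <= mae_obs) + 1) / (n_perm + 1)
    return r_obs, mae_obs, p_r, p_mae
```

Rerunning the full cross-validation inside each permutation, rather than only shuffling the final predictions, keeps the null distribution faithful to the whole modeling pipeline. With `n_perm` permutations the smallest attainable p-value is 1/(n_perm + 1).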

Figure 2 (with 2 supplements).
Cortical microstructural measures from neonate diffusion magnetic resonance imaging predict cognitive (a) and language (b) scores at 2 years of age with different feature contribution weights from various cortical gyri.

Left panels: The scatter plots show significant correlations between actual scores and cognitive (r = 0.536, p=1.2 × 10⁻⁴) or language (r = 0.474, p=8.8 × 10⁻⁴) scores predicted from cortical fractional anisotropy measures. Each dot represents one subject, and linear regression was used to assess the predictive accuracy of the model. The width of the line denotes the 95% confidence interval around the linear model fit between predicted and observed scores. Center panels: Normalized feature contribution weights of all cortical gyri in the prediction models are projected on a cortical surface. Right panels: Normalized feature contribution weights from all cortical gyri are shown in circular bar plots. These gyri were grouped into frontal, parietal, temporal, occipital, limbic, and insular cortex. Abbreviations: r: right hemisphere; l: left hemisphere. See Supplementary file 3 for abbreviations of cortical regions and values of normalized feature contribution weights from all cortical gyri.

Evaluation of robustness of prediction models

Evaluation with different cortical parcellation schemes and age effects around birth

The prediction models proved robust across different cortical parcellation schemes, and the prediction results remained significant after adjusting the cortical FA features for age (Figure 2—figure supplement 1). To investigate the effects of different cortical parcellation schemes on the prediction models, we measured regional cortical FA values with parcellation schemes containing higher numbers (128, 256, 512, and 1024) of random cortical parcels. For each parcellation scheme, we calculated the correlation coefficient and MAE between the actual and predicted neurodevelopmental scores (Figure 2—figure supplement 1). Across parcellation schemes, robust estimation of the cognitive and language scores was observed in all prediction models. We also investigated the effect of different scan ages on the prediction models, to demonstrate that high prediction performance remained intact after statistically controlling for the age effect in cortical FA measures. Prediction performance before and after adjustment for the age effect is shown in Figure 2—figure supplement 1b and 1c. After adjustment for the age effect, the correlation between the predicted and actual cognitive or language scores remained significant (p<0.05) with the original parcellation of 52 cortical regions. Furthermore, significant correlations after controlling for the age effect were also observed across the other tested parcellation schemes (128, 256, 512, and 1024 cortical parcels).
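One standard way to statistically control for scan age in the FA features, assumed here because the exact procedure is not given in this section, is to regress each regional FA value on age and keep the residuals. The sketch fits the regression on the training subjects only and applies the same coefficients to the held-out subject, which avoids leaking test information into the model; the same residualization would apply to other nuisance covariates such as motion. `regress_out` and `_design` are hypothetical names.

```python
import numpy as np


def _design(covariates):
    """Intercept column plus one column per covariate (1-D or 2-D input)."""
    c = np.asarray(covariates, dtype=float)
    if c.ndim == 1:
        c = c[:, None]
    return np.column_stack([np.ones(len(c)), c])


def regress_out(X_train, cov_train, X_test, cov_test):
    """Residualize features on covariates with coefficients fit on training data only.

    X_train: (n_train, n_regions); cov_train: (n_train,) or (n_train, n_cov).
    X_test: (n_test, n_regions); cov_test: array-like with n_test rows.
    """
    Z_tr, Z_te = _design(cov_train), _design(cov_test)
    # Least-squares fit of every regional FA column on [1, covariates] at once
    beta, *_ = np.linalg.lstsq(Z_tr, X_train, rcond=None)
    return X_train - Z_tr @ beta, np.atleast_2d(X_test) - Z_te @ beta
```

Inside a leave-one-out loop this would be called once per fold, before the features are passed to the regression model.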

Evaluation by categorizing subjects with normal and low scores

As Bayley-III is widely used to assess developmental delay with certain cut-off scores, we also evaluated how well cortical microstructural measures classified subjects with normal and low scores, using a receiver operating characteristic (ROC) curve analysis. Cognitive and language scores of all infants were categorized into normal (>85; n = 22 for cognitive scores and n = 24 for language scores) and low (≤85) score groups. Cortical FA features were used to build classifiers with a leave-one-out procedure to classify each infant into one of these two groups. The ROC curve analysis tested the ability of cortical FA measures at birth to distinguish infants with low 2-year outcomes from those with normal outcomes (Figure 2—figure supplement 2a). Classification accuracy was 76.1% for cognitive and 60.9% for language scores (Figure 2—figure supplement 2b). The area under the curve (AUC) was 0.809 and 0.737 for cognitive and language classification, respectively (Figure 2—figure supplement 2c), supporting cortical microstructural architecture at birth as a sensitive marker for prediction and potential detection of early behavioral abnormality at a population level.
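The ROC analysis can be summarized by the area under the curve, which equals the probability that a randomly chosen infant from the normal group receives a higher classifier score than one from the low group, with ties counted as one half. The sketch below assumes the leave-one-out classifier outputs one real-valued decision score per infant, larger meaning "normal"; the classifier type, the decision threshold of 0, and the function names are assumptions, since this section does not specify them.

```python
import numpy as np


def dichotomize(scores, cutoff=85):
    """Label 1 = normal (> cutoff), 0 = low (<= cutoff)."""
    return (np.asarray(scores) > cutoff).astype(int)


def roc_auc(labels, decision):
    """AUC via the rank (Mann-Whitney) formulation; ties between classes count 1/2."""
    labels = np.asarray(labels)
    decision = np.asarray(decision, dtype=float)
    pos, neg = decision[labels == 1], decision[labels == 0]
    diff = pos[:, None] - neg[None, :]  # every normal-vs-low pair
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))


def accuracy(labels, decision, threshold=0.0):
    """Fraction of infants whose thresholded decision matches the true label."""
    predicted = (np.asarray(decision) > threshold).astype(int)
    return float(np.mean(predicted == np.asarray(labels)))
```

Unlike accuracy, the AUC does not depend on the chosen threshold, which is why the two numbers reported above can rank the cognitive and language classifiers differently in magnitude.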

Regionally heterogeneous contribution to the cognitive and language prediction

Regional cortical FA measures across the entire cortex did not contribute equally to the prediction models. A heterogeneous feature contribution pattern can be clearly seen across the cortex for both cognitive (Figure 2a) and language (Figure 2b) prediction. For instance, in the distribution of normalized feature contribution weights in the cognitive prediction model (center panel in Figure 2a), high contributions from the right precuneus gyrus (PrCu) (black arrow) and the bilateral rectus gyri (REC) (green arrows) are prominent, with bright red gyri indicating high feature contribution. To quantitatively demonstrate the heterogeneous feature contribution of all cortical gyri, the normalized feature contribution weights of the 52 cortical gyri, grouped into six cortices, are shown in a circular bar plot (right panel in Figure 2a), where a higher bar indicates a higher feature contribution of a cortical region to the model. In cognitive prediction, the normalized feature contribution weights of the frontal, parietal, and limbic gyri (e.g. REC, postcentral, and entorhinal gyri) are relatively higher than those of the occipital, temporal, and insular cortex (e.g. superior temporal or occipital gyri).

Similar to cognitive prediction, regional variations in feature contribution can be observed in the language prediction model, as shown in the cortical surface map and circular bar plot in Figure 2b. For example, a higher feature contribution weight was found in the left postcentral gyrus (PoCG) (black arrow) than in its right-hemisphere counterpart. Feature contribution weights in the frontal and limbic gyri are also higher than those in the occipital and temporal gyri. The normalized feature contribution weights of all cortical gyri in the cognitive and language prediction models are listed in Supplementary file 3.

Distinguishable regional contribution to predicting cognitive or language outcomes

Besides regionally heterogeneous contributions, distinguishable feature contribution patterns were found in predicting cognitive and language outcomes. The top 10 cortical regions whose microstructural measures contributed most to the prediction of cognitive and language scores are listed in Figure 3a and mapped onto the cortical surface in Figure 3b. Among these top 10 regions, left REC, bilateral entorhinal gyrus (ENT), right middle and lateral fronto-orbital gyri (MFOG/LFOG), and left PoCG are common to the prediction of both cognitive and language outcomes (highlighted in yellow in Figure 3b). Right REC, right PrCu, right parahippocampal gyrus (PHG), and left fusiform gyrus (FuG) are unique to cognitive prediction (highlighted in red in Figure 3b), whereas left inferior frontal gyrus (IFG), left cingular gyrus (CingG), left insular cortex (INS), and right angular gyrus (ANG) are unique to language prediction (highlighted in green in Figure 3b). Strikingly, left IFG, commonly known as ‘Broca’s area’ for language production, was found among the top contributing regions only in the language prediction model. Notably, right PrCu, an important hub of the default mode network, was found among the top contributing regions only in the cognitive prediction model. Bootstrapping analysis indicated that these top 10 cortical regions (Figure 3) were highly reproducible across 1000 bootstrap resamples for predicting each behavioral outcome (Figure 3—figure supplement 1). As shown in Figure 3—figure supplement 1, the cortical regions with higher percentages (indicating higher reproducibility), in red or brown, overlap with the top 10 cortical regions whose microstructural measures contributed most to predicting cognition or language (from Figure 3; highlighted by dashed blue contours). 
Distinguishable regional contribution to predicting different outcomes (cognition or language) was quantified by a nonoverlapping index ranging from 0 to 1, with 1 indicating completely distinct regions and 0 indicating identical regions. The statistical significance of the observed nonoverlapping index of 0.57 was confirmed with permutation tests. Specifically, the permutation tests indicated that an index of 0.57 was unlikely to be obtained by chance when predicting the same outcome (p=0.001 when testing with leave-one-out resamples; p=0.05 when testing with resamples of randomly selected 90% of samples), supporting distinguishable regional contributions to predicting cognitive and language outcomes.
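The top-10 selection, the bootstrap reproducibility count, and the nonoverlapping index can be sketched together. The formula for the index is not given in this section; the definition assumed here, 1 minus the Jaccard similarity of the two top-10 sets, is one choice that reproduces the reported 0.57 from the regions listed in Figure 3 (6 shared regions out of 14 distinct ones gives 8/14). The bootstrap details (resampling subjects with replacement, refitting, and counting top-k membership) are also assumptions, and `weight_fn` is a hypothetical callable standing in for the SVR weight estimate.

```python
import numpy as np


def top_k(weights, names, k=10):
    """Names of the k regions with the largest normalized contribution weights."""
    order = np.argsort(weights)[::-1][:k]
    return [names[i] for i in order]


def bootstrap_topk_frequency(X, y, weight_fn, k=10, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples in which each region ranks in the top k.

    weight_fn(X, y) -> per-region contribution weights (hypothetical callable).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        w = weight_fn(X[idx], y[idx])
        counts[np.argsort(w)[::-1][:k]] += 1
    return counts / n_boot


def nonoverlap_index(regions_a, regions_b):
    """Assumed definition: 1 - |A ∩ B| / |A ∪ B| (0 = identical sets, 1 = disjoint)."""
    a, b = set(regions_a), set(regions_b)
    return 1.0 - len(a & b) / len(a | b)


# Top 10 regions as listed in Figure 3
common = ["rMFOG", "rLFOG", "lREC", "lPoCG", "rENT", "lENT"]
cognition = common + ["rREC", "rPrCu", "lFuG", "rPHG"]
language = common + ["lIFG", "lCingG", "rANG", "lINS"]
print(round(nonoverlap_index(cognition, language), 2))  # 0.57
```

Because both lists have 10 entries, this index depends only on the number of shared regions s, as 1 - s / (20 - s); other definitions (for example, one that also uses the weight magnitudes) would give different values.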

Figure 3 (with 1 supplement).
Distinguishable top 10 cortical regions where microstructural measures contributed most to the prediction of cognitive or language scores.

(a) List of the top 10 cortical regions with the highest feature contribution weights in predicting cognitive (left) or language (right) scores. (b) Maps of the cortical regions listed in (a). Cortical regions contributing most to predicting both cognition and language are painted in yellow (rMFOG, rLFOG, lREC, lPoCG, rENT, and lENT); cortical regions contributing most uniquely to predicting cognition or language are painted in red (rREC, rPrCu, lFuG, and rPHG) and green (lIFG, lCingG, rANG, and lINS), respectively. Abbreviations: r: right hemisphere; l: left hemisphere; ANG: angular gyrus; CingG: cingular gyrus; ENT: entorhinal gyrus; FuG: fusiform gyrus; IFG: inferior frontal gyrus; INS: insular cortex; LFOG: lateral fronto-orbital gyrus; MFOG: middle fronto-orbital gyrus; PHG: parahippocampal gyrus; PoCG: postcentral gyrus; PrCu: precuneus gyrus; REC: rectus gyrus.

Comparison among prediction based on cortical FA, WM FA, and combined cortical and WM FA

Since dMRI has conventionally been used mainly to measure WM microstructure, we also evaluated prediction performance using only regional WM FA measures as features (left panel in Figure 4a). Both cognitive (r = 0.516, p=2.4 × 10⁻⁴) and language (r = 0.517, p=2.3 × 10⁻⁴) scores could be reliably predicted with WM FA measures, as indicated by significant correlations between predicted and actual scores (right panel in Figure 4a). More importantly, cortical FA measures alone at birth are as robust as WM FA measures in predicting the cognitive and language scores at 2 years of age, as shown by similar correlation coefficients between the predicted and actual scores (Figure 4c; all correlations significant with p<0.05). Combined cortical and WM FA measures as features (left panel in Figure 4b) showed higher LOOCV performance than cortical or WM FA measures alone (Figure 4c) in predicting cognitive (r = 0.721, p=1.6 × 10⁻⁸) and language (r = 0.614, p=5.6 × 10⁻⁶) scores (right panel in Figure 4b). Prediction with combined features also passed threefold cross-validation, with significant correlations between predicted and actual cognitive (r = 0.635, p=0.01) and language (r = 0.592, p=0.02) scores across folds.
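Combining the two feature sets amounts to concatenating each subject's 52 cortical and 40 WM FA values, and threefold cross-validation replaces the single held-out subject with a held-out third of the sample. The sketch below pools the out-of-fold predictions before computing r; the text reports correlations "across folds", so pooling versus averaging per-fold r is an assumption here, as is the random fold assignment. Ridge regression (`ridge_fit_predict`) is a hypothetical stand-in for SVR, and all function names are illustrative.

```python
import numpy as np


def combine_features(cortical_fa, wm_fa):
    """Concatenate per-subject cortical (e.g. 52) and WM (e.g. 40) FA features."""
    return np.hstack([cortical_fa, wm_fa])


def kfold_indices(n, k=3, seed=0):
    """Randomly split subject indices into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def kfold_predict(X, y, fit_predict, k=3, seed=0):
    """Out-of-fold predictions: each fold is predicted by a model trained on the others."""
    preds = np.empty(len(y))
    for test in kfold_indices(len(y), k, seed):
        train = np.setdiff1d(np.arange(len(y)), test)
        preds[test] = fit_predict(X[train], y[train], X[test])
    return preds


def ridge_fit_predict(X_tr, y_tr, X_te, alpha=1.0):
    """Hypothetical stand-in for the SVR model used in the text."""
    mu, ym = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - mu
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X_tr.shape[1]), Xc.T @ (y_tr - ym))
    return (X_te - mu) @ w + ym
```

Threefold cross-validation trains on only about two thirds of the 46 subjects, so it is a stricter check than LOOCV against a model that happens to fit one particular split.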

Figure 4 (with 1 supplement).
Prediction of neurodevelopmental outcomes using (a) white matter (WM) fractional anisotropy (FA) features only and (b) combined cortical and WM FA features, compared with prediction using cortical FA features.

(a) On the left panel, WM microstructure was measured at the core WM regions, shown as a yellow skeleton overlaid on an FA map of a neonate brain, to alleviate partial volume effects. Feature vectors were obtained by measuring WM skeleton FA at 40 tracts transformed from the WM labeling of a neonate atlas. On the right panel, scatter plots show linear regressions between actual scores and the predicted cognitive or language scores based on WM FA measures with LOOCV. (b) On the left panel, feature vectors were obtained by combining cortical skeleton (green) FA at 52 parcellated cortical gyri and WM skeleton (yellow) FA at 40 tracts, with the labeling transformed from a neonate atlas. On the right panel, scatter plots show linear regressions between actual scores and the predicted cognitive or language scores based on both cortical and WM FA measures with LOOCV. (c) Significant correlations between the predicted and actual cognitive or language outcomes were found based on cortical FA only, WM FA only, and combined cortical and WM FA feature vectors. The dashed line indicates the critical r value corresponding to p=0.05. * in the panel indicates a significant (p<0.05) correlation.

We further examined the correlation between motion (quantified by mean framewise displacement) and regional FA values from the cerebral cortex and WM (Figure 4—figure supplement 1a). After false discovery rate (FDR) correction, no significant correlation between motion estimates and FA values was found in any of the 92 brain regions. We also investigated motion effects on the prediction results and found that the high performance of the prediction models was unaffected after statistically controlling for motion estimates in the cortical and WM FA measures (Figure 4—figure supplement 1b). Together, Figure 4—figure supplement 1a and b demonstrate that motion did not contaminate or confound the significant prediction found in this study.
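The FDR correction over the 92 motion-FA correlations (52 cortical plus 40 WM regions) can be illustrated with the Benjamini-Hochberg step-up procedure. The text says only "false discovery rate (FDR) correction", so Benjamini-Hochberg is an assumption, and `fdr_bh` is a hypothetical name; the per-region p-values (for example, from Pearson correlations between mean framewise displacement and regional FA) are assumed to be computed beforehand.

```python
import numpy as np


def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up: boolean mask of tests rejected at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # q * i / m for ranks i = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Largest rank whose sorted p-value passes; all smaller ranks are rejected too
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject
```

Calling `fdr_bh(pvals)` on the 92 p-values returns a boolean mask; an all-False mask corresponds to the reported finding of no significant motion-FA correlation after correction. The step-up rule means a p-value above its own threshold can still be rejected when a larger p-value further down the sorted list passes.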

Discussion

We leveraged individual variability in the cortical microstructural architecture at birth for robust prediction of future behavioral outcomes as continuous values. Cortical microstructure at birth, encoding rich ‘footage’ of regional cellular and molecular processes in early human brain development, was evaluated as the baseline measurement for prediction. Our previous studies (Huang et al., 2009; Ouyang et al., 2019b; Yu et al., 2016) found that the individual neonate cortical microstructure profile, characterized by different levels of dendritic arborization, could be reliably quantified with dMRI-based cortical FA, which reveals a cortical maturation signature. In this study, we further demonstrated that individual variability of the cortical microstructure profile around birth can be used to robustly predict the cognitive and language outcomes of individual infants at 2 years of age. By harnessing different encoding patterns of cognitive and language functions across the entire cortex, we quantified distinguishable contributions of each cortical region and highlighted the most sensitive regions for predicting different outcomes. Cortical regions contributing heavily to the prediction models exhibited distinctive functional selectivity for cognition and language. To our knowledge, this is the first study evaluating regional cortical microstructure for predicting future behavior, laying the foundation for future work using cortical microstructure profiles as ‘neuromarkers’ to predict an individual’s risk of developing health-related behavioral abnormalities (e.g. ASD). The prediction model can also incorporate new and incoming subjects, a step beyond previous within-sample imaging-outcome correlation studies (Ball et al., 2015; Counsell et al., 2014; Deoni et al., 2016; Hintz et al., 2015; Keunen et al., 2017; Peyton et al., 2020; Wee et al., 2017; Woodward et al., 2006). 
Before this prediction can be used to identify individual infants at risk of neurodevelopmental disorders (e.g. ASD) while they are still pre-symptomatic in behavioral assessments, the prediction model needs to be further refined with a larger sample size.

The cerebral cortex plays a central role in human cognition and behavior. The high performance achieved in predicting cognition and language from cortical FA measures (Figure 2) is probably due to the sensitivity of cortical microstructural changes to maturational processes, including synaptic formation, dendritic arborization, and axonal growth. Distinctive maturation processes, manifested by differentiated cortical FA changes across cortical regions in fetal and infant brains, have been reproducibly reported (Ball et al., 2013; Huang et al., 2006; Huang et al., 2009; Huang et al., 2013; Kroenke et al., 2007; McKinstry et al., 2002; Neil et al., 1998; Ouyang et al., 2019a; Ouyang et al., 2019b; Yu et al., 2016). Although cortical thickness, volume, or surface area from structural MRI scans (i.e. T1- or T2-weighted images) have conventionally been the primary structural measurements for characterizing infant cerebral cortex development (Hazlett et al., 2017; Hill et al., 2010; Lyall et al., 2015), they cannot characterize the complex microstructural processes that take place inside the cortical mantle. Compared with the macrostructural changes quantified by these conventional measurements, the underlying microstructural processes quantified by cortical FA may be more sensitive in infants with pathology, such as those at risk of ASD. Because WM microstructure (e.g. FA) measurement is more widely used in dMRI studies than cortical microstructure measurement, we also evaluated WM FA at birth for predicting cognitive and language outcomes at 2 years of age (Figure 4). Although microstructural measures from the cerebral cortex alone or WM alone at birth have similar sensitivity in predicting neurodevelopmental outcomes at 2 years of age (Figure 4), cortical microstructure is more directly associated with specific cortical regions, and thus with certain cortical functions, than WM is through end point connectivity. 
On the other hand, combined cortical and WM microstructural measures at birth offered more baseline information and resulted in improved prediction (Figure 4b and c).

Human brain development in the first 2 years is the most rapid across the lifespan. During the first 2 years after birth, the overall size of an infant brain increases dramatically, reaching close to 90% of adult brain volume by 2 years of age (Pfefferbaum et al., 1994). Despite this rapid development, the human infancy period (0–2 years) is probably the longest among all mammals, and cognitive and language functions unique to humans emerge during this critical period. For instance, infants progress from babbling to full sentences in their mother tongue between 6 months and 2–3 years of age (Kuhl, 2004). This lengthy yet extremely dynamic developmental process makes predicting 2-year neurodevelopmental outcomes from brain information at birth invaluable. dMRI-based cortical microstructure at birth, measured before any behavioral test could be performed, predicted both cognition and language outcomes at 2 years of age well (Figure 2). The feature contribution weights showed a regionally heterogeneous distribution across the cerebral cortex (Figure 2). Furthermore, consistent with their functions documented in the literature, the cortical regions contributing heavily to the prediction models exhibited distinguishable functional selectivity for cognition and language (Figure 3 and Figure 3—figure supplement 1). The cognitive scale in Bayley-III estimates cognitive functions including object relatedness, memory, problem solving, and manipulation on the basis of nonverbal activities (Bayley, 2006). Cortical regions with high weights in the cognition prediction model are tightly associated with cognitive functions. The right rectus, precuneus, parahippocampal, and left fusiform gyri were among the top 10 cortical regions (painted red in Figure 3b) where microstructural measures contributed uniquely to the cognition prediction but not to the language prediction. 
The parahippocampal gyrus provides polysensory input to the hippocampus (Witter et al., 2000) and holds an essential position in mediating memory function (Young et al., 1997). The precuneus is a pivotal hub of the brain’s default-mode network (Buckner et al., 2008) and is involved in various higher-order cognitive functions (Cavanna and Trimble, 2006). The rectus and fusiform gyri are associated with higher-level social cognition processes (Viskontas et al., 2007). Among the regions contributing heavily to both cognition and language prediction (painted yellow in Figure 3b), the bilateral entorhinal gyri likely serve as a pivotal junction mediating the processing of different types of sensory information during the cortex-hippocampus interplay (Witter et al., 2000). The middle and lateral fronto-orbital gyri are involved in cognitive processes including learning, memory, and decision-making (Wikenheiser and Schoenbaum, 2016). The left postcentral gyrus, with high feature contribution weights in both prediction models, is associated with the general motor demands of performing the tasks. Besides higher-order cognitive functions, language functions are also unique to human beings. The first 2 years of life are a critical and sensitive period for the development of speech perception and speech production (Kuhl, 2004; Werker and Hensch, 2015). The language scale of Bayley-III includes two subdomains: receptive communication and expressive communication. Here the term ‘communication’ refers to any means a child uses to interact with others, and includes communication in the prelinguistic stage (e.g. eye gaze, gesture, facial expression, vocalizations, and words), social-emotional skills, and communication in more advanced stages of language emergence (Bayley, 2006). The distinctive regions with high feature contribution weights only in the language prediction model included the left inferior frontal gyrus (IFG), cingulate gyrus, insular cortex, and right angular gyrus, painted green in Figure 3b. 
These regions are related to receptive and expressive communication. It is striking that the left IFG, known as ‘Broca’s area’, was identified by this data-driven prediction model, because Broca’s area is well known to play a pivotal role in producing language (Poeppel, 2014). The angular gyrus, another crucial language region in the parietal lobe, supports the integration of semantic information into context and transfers visually perceived words to Wernicke’s area. The left insula, part of the articulatory network in the dual-stream model of speech processing, is involved in translating acoustic speech signals into articulatory representations in the frontal lobe (Hickok and Poeppel, 2007). Because the Bayley language scale also includes a number of items reflecting social-emotional skills, such as how a child responds to his/her name or reacts when interrupted in play, the high feature contribution weight of the insular cortex may also reflect its important role in social emotions (Lamm and Singer, 2010). The high feature contribution weight of the cingulate cortex might be related to its key involvement in emotion and social behavior (Bush et al., 2000). Taken together, identifying the regions with the highest feature contribution weights sheds light on the local structural basis underlying the emergence of distinctive functions manifested in daily behavior, enhancing our knowledge of brain-behavior relationships.

Motor scores from Bayley-III were not predicted reliably in this study, possibly because of the low variability and low signal-to-noise ratio (SNR) of cortical FA measurements in the primary sensorimotor cortical regions associated with motor function. Cortical FA in the primary sensorimotor cortex is relatively low compared with that in other cortices (Ball et al., 2013; Ouyang et al., 2019b) and barely above the noise floor, because the primary sensorimotor cortex develops earlier than cortical regions associated with higher-order brain functions. At this relatively low SNR, individual variability of cortical microstructure in the primary sensorimotor cortex cannot be captured well, whereas high individual variability is what enables reliable prediction. Low functional variability in the primary sensorimotor cortex was found in a largely overlapping cohort in a separate study from our group (Xu et al., 2019) and was reproduced in another cohort (http://developingconnectome.org). In addition, with the strict exclusion criteria applied to the cohort around birth, the higher mean and lower variance of motor scores compared with cognition or language scores played an important role in the poor prediction of motor scores. Larger group variability in motor scores and a larger sample size could offset these limitations and improve the prediction of motor scores.

Technical considerations, limitations, and future directions are discussed below. The capacity of the cortical microstructural profile at birth to predict later behavior is substantial (Figure 2). We also trained models to classify low and normal outcomes; the ROC curves and accuracy measurements in Figure 2—figure supplement 2 demonstrated high accuracy in classifying the low- and normal-outcome groups. The robustness of the prediction model was further tested against various factors, including different cortical parcellation schemes for measuring feature vectors (random and finer parcellations versus atlas-based parcellation) and individual age adjustment (Figure 2—figure supplement 1). Importantly, the high performance of the prediction models was reproduced after these factors were taken into consideration. We used MAE to measure prediction errors because MAE is more robust to extreme or outlier values than other error metrics such as mean squared error or root mean squared error (Willmott and Matsuura, 2005). Despite the relatively high dMRI resolution used (0.656 × 0.656 × 1.6 mm³), partial volume effects (Jeon et al., 2012) cannot be ignored when measuring cortical FA. Partial volume effects differ across the brain, with thinner cortical regions more severely affected. To maximally alleviate partial volume effects and enhance measurement accuracy, we adopted a ‘cortical skeleton’ approach (Ouyang et al., 2019b; Yu et al., 2016), demonstrated in Figure 1—figure supplement 4, to measure cortical microstructure at the center or ‘core’ of the cortical plate. Although both preterm and term-born infants were included, none of them was clinically referred. All infants had been recruited solely for brain research and rigorously screened by a neonatologist and a pediatric neuroradiologist to exclude any infant with signs of brain injury (see Materials and methods for more details). 
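To illustrate why MAE is less sensitive to outliers than root mean squared error (RMSE), the short Python sketch below compares the two metrics on hypothetical scores (invented for illustration, not study data) in which four predictions miss by 5 points and one misses by 40. Because RMSE squares each residual before averaging, the single large miss dominates it far more than it dominates MAE.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical Bayley-like composite scores (not study data):
# four predictions are off by 5 points, one outlier is off by 40 points.
actual = [100, 95, 110, 90, 105]
predicted = [105, 100, 115, 95, 145]

print(mae(actual, predicted))   # 12.0
print(rmse(actual, predicted))  # sqrt(340), about 18.44
```

Without the outlier both metrics would equal 5; the single 40-point miss raises MAE to 12 but RMSE to about 18.4.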
To limit the effects of exposure to the extrauterine environment, this study was designed to make the interval between birth and scan as short as possible. As a result, we did not find any significant correlation between birth age or MRI scan age and neurodevelopmental outcomes (see Supplementary file 2). The literature (Bonifacio et al., 2010) also indicates that the effects of premature birth itself on brain development are relatively small compared with the effects of brain injury and comorbid conditions, none of which were present in any recruited infant because of the strict exclusion criteria. However, including preterm infants is still a limitation, as preterm-born children without any brain injury may show more pronounced neurodevelopmental deviation after 2 years of age. Nevertheless, such later deviation may not significantly affect the conclusions of this study, which focused on 0–2 years of age. The prediction established here, incorporating MRI of preterm and term-born neonates scanned across 32–42 PMW, could benefit future outcome prediction studies based on in utero MRI of fetuses in the third trimester, given recent advances in in utero MRI techniques (e.g. Khan et al., 2019; Thomason et al., 2013; Vasung et al., 2020). Given the potential long-term effects of preterm birth on neurodevelopment, in utero MRI of fetuses may be a better choice than MRI of preterm neonates for future studies of normal brain development in the third trimester. Although we have taken many precautions to extract cortical FA measures and tested the internal validity of our prediction analysis, several limitations will need to be addressed in future research. While relatively high performance of behavioral prediction at the group level was achieved with the current cohort of infants, the prediction model will benefit from further validation (e.g. k-fold cross-validation) and from replication in an independent infant cohort with a larger sample size to establish generalizability. 
Thus, a prediction model built from a much larger cohort whose individual variability represents the general population is warranted in future research before this approach can meaningfully predict the outcome of a single individual. As indicated in Figure 4b and c, more baseline information resulted in improved prediction. Future research will also benefit from incorporating distinctive and complementary measurements from multimodal neuroimaging (e.g. Kwon et al., 2014; Smyser et al., 2016), including structural (i.e. T1- or T2-weighted), functional, and diffusion MRI. Genetic factors could also be incorporated. With these multimodal measurements, more advanced machine learning algorithms, such as multi-kernel learning and deep learning, will need to be adopted and developed to further improve prediction. This study evaluated cortical microstructure for predicting future behavior in a cohort of healthy infants; such evaluation needs further validation in the setting of pathology. The observed neurodevelopmental outcomes were also influenced by unmeasured factors such as maternal age and tertiary educational level, as well as other home environment variables following discharge from the hospital, all of which should be taken into consideration in future prediction models.

In conclusion, whole-brain cortical FA at birth, encoding rich information about dendritic arborization and synaptic formation, could be reliably used to predict neurodevelopmental outcomes of 2-year-old infants by leveraging the individual variability of these measures. Feature contribution weights in cognition and language prediction are heterogeneous across brain regions. The cortical regions contributing heavily to the prediction models exhibited distinguishable functional selectivity for cognition and language. Identifying the regions with the highest feature contribution weights offers preliminary insight into the local brain microstructural basis underlying the emergence of future behavior, enhancing our knowledge of brain-behavior relationships. These findings also suggest that cortical microstructural information at birth could potentially be used to predict behavioral abnormality early in infants at high risk for brain disorders, at a time when they are still pre-symptomatic in behavioral assessments and intervention may be most effective.

Materials and methods

Participants

The study was approved by the Institutional Review Board (IRB) at the University of Texas Southwestern Medical Center. A total of 107 neonates were recruited from Parkland Hospital and scanned at Children’s Medical Center at Dallas. Evaluable MRI was obtained from 87 neonates (58 M/29 F; postmenstrual ages at scan: 31.9–41.7 postmenstrual weeks (PMW); postmenstrual ages at birth: 26–41.4 PMW). None of the recruited infants had a clinical indication for MRI. In other words, the infants in this study had no medical reason to undergo clinical MRI, as they were considered healthy in routine medical care. They were recruited solely for research on prenatal and perinatal human brain development. A possible benefit of the MR scan was that abnormalities were occasionally found in some of the scanned infants when a neuroradiologist read their MRI; data from these infants were excluded from the analysis in the current study. The neonates were selected through rigorous screening procedures by a board-certified neonatologist (LC) and an experienced pediatric radiologist, based on the subjects’ ultrasound, clinical MRI, and the medical records of the subjects and their mothers. Other exclusion criteria included evidence of bleeding or intracranial abnormality by serial sonography; the mother's excessive drug or alcohol abuse during pregnancy; periventricular leukomalacia; hypoxic–ischemic encephalopathy; Grade III–IV intraventricular hemorrhage; body or heart malformations; chromosomal abnormalities, lung disease, or bronchopulmonary dysplasia; necrotizing enterocolitis requiring intestinal resection or complex feeding/nutritional disorders; defects or anomalies of the brain; brain tissue dysplasia or hypoplasia; abnormal meninges; alterations in the pial or ventricular surface; or WM lesions. Informed consent was obtained from each subject’s parent. More demographic information on the participants can be found in Supplementary file 1.

Neonate brain MRI

All neonates were scanned on a 3T Philips Achieva system (ages at scan: 31.9–41.7 PMW). Neonates were fed before the MRI scan and wrapped in a vacuum immobilizer to minimize motion. During the scan, all neonates slept naturally without sedation. Earplugs, earphones, and extra foam padding were applied to reduce scanner noise. All 87 neonates underwent high-resolution dMRI and structural MRI scans. A single-shot echo-planar imaging (EPI) sequence with Sensitivity Encoding parallel imaging (SENSE factor = 2.5) was used for dMRI. Other dMRI imaging parameters were as follows: repetition time (TR) = 6850 ms, echo time (TE) = 78 ms, in-plane field of view = 168 × 168 mm², in-plane imaging matrix = 112 × 112 reconstructed to 256 × 256 with zero filling, in-plane resolution = 0.656 × 0.656 mm² (nominal or acquisition resolution 1.5 × 1.5 mm²), slice thickness = 1.6 mm without gap, 60 slices, and 30 independent diffusion encoding directions with b value = 1000 s/mm². Two repetitions of the dMRI acquisition were conducted to improve the SNR, resulting in a scan time of 11 min.
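The two in-plane resolutions quoted above follow directly from the field of view and the matrix sizes. The quick Python check below uses only values from the protocol; note that zero filling interpolates the reconstructed image and does not add spatial information, so the acquired resolution stays 1.5 × 1.5 mm².

```python
# Values taken from the dMRI protocol described above.
fov_mm = 168.0               # in-plane field of view (one side)
acquisition_matrix = 112     # acquired in-plane matrix
reconstructed_matrix = 256   # matrix after zero filling
slice_thickness_mm = 1.6

# In-plane voxel size = field of view / matrix size.
acquired_voxel_mm = fov_mm / acquisition_matrix          # 1.5 mm
reconstructed_voxel_mm = fov_mm / reconstructed_matrix   # 0.65625 mm, reported as 0.656 mm

# Voxel volumes (mm^3) of the acquired and the reconstructed grids.
acquired_volume = acquired_voxel_mm ** 2 * slice_thickness_mm
reconstructed_volume = reconstructed_voxel_mm ** 2 * slice_thickness_mm

print(f"acquired: {acquired_voxel_mm} mm in-plane, {acquired_volume:.2f} mm^3")
print(f"reconstructed: {reconstructed_voxel_mm} mm in-plane, {reconstructed_volume:.3f} mm^3")
```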

Quality control and quality assurance of MRI

General MRI slice and slice-time integral measures for quality control (QC) were determined daily using ADNI and BIRN phantoms. Any systematic anomaly, identified by significant deflections from normal variation, was addressed immediately with technical support and/or the in-house MR physicist team. As is standard laboratory practice, test-retest reliability of the MR imaging protocol was assessed for quality assurance (QA) with a four-subject × four-repeat estimation of intra- and inter-subject variation.

Measurement of cortical microstructure with brain MRI at birth

The diffusion tensor of each brain voxel was calculated with routine tensor fitting procedures. Diffusion MRI data sets from all neonates were preprocessed using DTIStudio (http://www.mristudio.org) (Jiang et al., 2006). To quantify head motion in each dMRI scan, all diffusion-weighted image (DWI) volumes were aligned to the first stable image volume in the scan using automatic image registration (AIR) in DTIStudio, and the volume-by-volume translation and rotation from the registration were calculated (Ouyang et al., 2016). Because all MRI scans were conducted during the neonates’ natural sleep, motion during dMRI scans was generally very small. To handle occasional abrupt movement during sleep, DWI volumes with translation larger than 5 mm or rotation larger than 5° were identified as corrupted. With 30 DWI volumes per scan and two repetitions, we accepted dMRI data sets with fewer than 5 DWI volumes affected by motion. The second dMRI scan was acquired immediately after the first, and during postprocessing the affected volumes were replaced by good volumes from the other dMRI scan (Huang et al., 2015). After volume replacement, the remaining small motions in the dMRI of all 46 infants who had a neurodevelopmental assessment at 2 years of age were measured; the distributions of these translations and rotations are shown in Figure 1—figure supplement 5. Volume-by-volume translations ranged from 0 to 1.6 mm (average 0.78 mm), with most below 1 mm, and volume-by-volume rotations ranged from 0 to 0.7° (average 0.18°), with most below 0.3°. Small motion and eddy-current distortion in the dMRI of each neonate were corrected by registering all DWIs to the non-diffusion-weighted b0 image with a 12-parameter (affine) linear registration using the AIR algorithm. The six elements of the diffusion tensor were fitted in each voxel. 
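The motion screening rule above can be sketched in Python as follows. The thresholds (5 mm, 5°, fewer than 5 affected volumes) come from the text; everything else is an assumption for illustration: the helper names are hypothetical, corrupted volumes are counted over both repetitions together, a volume is replaced by the volume with the same diffusion direction from the other repetition, and a direction corrupted in both repetitions rejects the data set. This is not the DTIStudio implementation.

```python
from typing import List, Optional, Sequence, Tuple

# Thresholds stated in the text.
MAX_TRANSLATION_MM = 5.0    # translation above this marks a volume as corrupted
MAX_ROTATION_DEG = 5.0      # rotation above this marks a volume as corrupted
MAX_CORRUPTED_VOLUMES = 5   # a data set needs FEWER than this many corrupted volumes


def flag_corrupted(motion: Sequence[Tuple[float, float]]) -> List[bool]:
    """motion[i] = (translation_mm, rotation_deg) of DWI volume i."""
    return [t > MAX_TRANSLATION_MM or r > MAX_ROTATION_DEG for t, r in motion]


def repair_repetitions(rep1, rep2, motion1, motion2) -> Optional[Tuple[list, list]]:
    """Hypothetical helper: replace each corrupted volume with the same-direction
    volume from the other repetition. Returns None if the data set is rejected."""
    bad1, bad2 = flag_corrupted(motion1), flag_corrupted(motion2)
    # Assumption: corrupted volumes are counted over both repetitions together.
    if sum(bad1) + sum(bad2) >= MAX_CORRUPTED_VOLUMES:
        return None
    fixed1, fixed2 = list(rep1), list(rep2)
    for i, (b1, b2) in enumerate(zip(bad1, bad2)):
        if b1 and b2:
            return None  # assumption: no clean copy of this direction exists
        if b1:
            fixed1[i] = rep2[i]
        elif b2:
            fixed2[i] = rep1[i]
    return fixed1, fixed2
```

In use, `rep1` and `rep2` would hold the 30 DWI volumes of each repetition in the same gradient order, and `motion1`/`motion2` the per-volume translation and rotation from the AIR alignment.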
Maps of FA derived from the diffusion tensor were obtained for all neonates (Figure 1). The DTI-derived FA maps were used to obtain cortical skeleton FA measurements in specific cortical gyral regions of interest (ROIs) identified by gyral labels from a neonate atlas (Feng et al., 2019). To alleviate partial volume effects, cortical FA values were measured on the cortical skeleton, i.e. the center of the cortical mantle, shown as green skeletons in the left panels of Figure 1. This procedure was elaborated in our previous studies (Ouyang et al., 2019b; Yu et al., 2016). Because of dramatic anatomical changes of the neonate brain from 31.9 to 41.7 PMW, the cortical skeleton was created from averaged FA maps in three age-specific templates at 33, 36, and 39 PMW. Based on scan age, each subject’s brain was assigned to one of three age groups (33, 36, and 39 PMW) and registered to the corresponding template using the registration protocol described in detail in the literature (Feng et al., 2019; Oishi et al., 2011). Using the skeletonization function in TBSS of FSL (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/TBSS), the cortical skeletons of the 33 and 36 PMW brains were extracted from the averaged cortical FA maps, whereas the cortical skeleton of the 39 PMW brain was obtained from the averaged cortical segmentation map because of low cortical FA in 39 PMW brains. The cortical skeleton in the 33, 36, or 39 PMW space was then inversely transformed to each subject’s native space, to which the 52 cortical gyral labels of a neonate atlas (Feng et al., 2019) were also mapped to parcellate the cortex (Figure 1). Figure 1—figure supplement 4 illustrates the workflow to parcellate the cortex and measure cortical FA on the neonate cortical skeleton for representative preterm and term-born infants. 
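For readers less familiar with DTI, the sketch below shows how an FA value follows from the six fitted tensor elements, using the standard eigenvalue definition of FA. It is a generic NumPy illustration with hypothetical tensors, not the DTIStudio code used in the study.

```python
import numpy as np


def tensor_from_elements(dxx, dyy, dzz, dxy, dxz, dyz) -> np.ndarray:
    """Assemble the symmetric 3x3 diffusion tensor from its six fitted elements."""
    return np.array([[dxx, dxy, dxz],
                     [dxy, dyy, dyz],
                     [dxz, dyz, dzz]], dtype=float)


def fractional_anisotropy(tensor: np.ndarray) -> float:
    """FA = sqrt(3/2) * ||lambda - mean(lambda)|| / ||lambda||."""
    evals = np.linalg.eigvalsh(tensor)  # eigenvalues of a symmetric matrix
    norm = np.sqrt(np.sum(evals ** 2))
    if norm == 0.0:
        return 0.0  # no diffusion at all: define FA as 0
    spread = np.sqrt(np.sum((evals - evals.mean()) ** 2))
    return float(np.sqrt(1.5) * spread / norm)


# Hypothetical tensors in mm^2/s, not study data.
isotropic = tensor_from_elements(1e-3, 1e-3, 1e-3, 0, 0, 0)
prolate = tensor_from_elements(1.7e-3, 0.3e-3, 0.3e-3, 0, 0, 0)
print(fractional_anisotropy(isotropic))          # 0 up to rounding
print(round(fractional_anisotropy(prolate), 2))  # 0.8
```

FA ranges from 0 (isotropic diffusion) to 1 (diffusion along a single axis), which is why the radially organized immature cortex shows higher FA than the cortex after dendritic arborization disrupts that organization.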
Small, irregular, yet significant offsets between the cortical skeleton and each subject’s cerebral cortex were widespread because of imperfect inter-subject registration when cortical skeletons were transformed to individual brains. Such offsets were addressed with a fast-marching technique (details in Jeon et al., 2012; Ouyang et al., 2019b). With this technique, the FA of the voxels with the highest gray matter tissue probability in an individual subject’s segmentation, namely the ‘core’ cortical voxels, was assigned to skeleton voxels that sometimes deviated from the ‘core’ (Figure 1—figure supplement 4-b5). As can be appreciated from Figure 1—figure supplement 4, which displays a preterm brain at 33 PMW and a term-born brain at 41 PMW at the same scale, there are almost no significant differences in partial volume effects on cortical FA measurements between preterm and term-born brains. By directly overlapping the cortical skeleton with the neonate atlas, the cortical skeleton was parcellated into 52 gyri. The FA of each cortical gyrus was calculated by averaging the measurements on the cortical skeleton voxels with that cortical label. In this way, feature vectors consisting of cortical FA values from the 52 parcellated cortical gyri, measured on the cortical skeleton, were obtained for the subsequent SVR procedures.
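The final averaging step can be sketched with NumPy as below. It assumes the FA map, skeleton mask, and gyral label map are already co-registered arrays in the subject’s native space, with labels coded 1..52 and 0 as background; that coding and the function name are hypothetical, and the fast-marching offset correction is not reproduced here.

```python
import numpy as np


def gyral_fa_features(fa_map, skeleton_mask, label_map, n_labels=52):
    """Mean skeleton FA per gyral label (labels assumed to be 1..n_labels, 0 = background).

    Returns a length-n_labels feature vector; a label with no skeleton voxels gets NaN.
    """
    fa_map = np.asarray(fa_map, dtype=float)
    skeleton_mask = np.asarray(skeleton_mask, dtype=bool)
    label_map = np.asarray(label_map)
    features = np.full(n_labels, np.nan)
    for label in range(1, n_labels + 1):
        voxels = skeleton_mask & (label_map == label)  # skeleton voxels of this gyrus
        if voxels.any():
            features[label - 1] = fa_map[voxels].mean()
    return features
```

Stacking one such vector per subject (for example with `np.vstack`) gives the subjects × 52 feature matrix used as SVR input.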

Neurodevelopmental assessments at 2 years of age

Of the 87 neonates with evaluable MRI scanned around birth, 46 (32 M/14 F, scan age of 36.7 ± 2.8 PMW) received a follow-up neurodevelopmental assessment at 2 years of age (20–29 months, 23.5 ± 2.3 months), corrected for prematurity by taking gestational age into account. Cognitive, language, and motor development were assessed using the Bayley Scales of Infant and Toddler Development, Third Edition (Bayley-III; Bayley, 2006). Specifically, the cognitive scale estimates general cognitive functioning on the basis of nonverbal activities (i.e. object relatedness, memory, problem solving, and manipulation); the language scale estimates receptive communication (i.e. verbal understanding and concept development) as well as expressive communication, including the ability to communicate through words and gestures; and the motor scale estimates both fine motor skills (i.e. grasping, perceptual-motor integration, motor planning, and speed) and gross motor skills (i.e. sitting, standing, locomotion, and balance) (Bayley, 2006). The Bayley-III is age standardized and widely used in both research and clinical settings. It has published norms with a mean (standard deviation) of 100 (15), with higher scores indicating better performance. The assessment was conducted by a certified neurodevelopmental psychologist who was blinded to the infants’ clinical details and neonatal MR findings. Only the cognitive, language, and motor scales, which are reliably obtained from items administered to the child by a certified neurodevelopmental psychologist, were included in this study; the other two Bayley-III scales (social-emotional and adaptive), which are obtained from heterogeneous questionnaire responses of primary caregivers, were not.

Prediction of neurodevelopmental outcome with cortical FA as features

To determine whether cortical FA at birth could serve as a biomarker for individualized prediction of neurodevelopmental outcomes at 2 years of age, we performed pattern analysis using the SVR algorithm implemented in LIBSVM (Chang and Lin, 2011). SVR is a supervised learning technique based on the concept of the support vector machine (SVM) that predicts continuous variables, such as the cognitive, language, or motor composite scores of Bayley-III. Leave-one-out cross-validation (LOOCV) was adopted to evaluate the performance of the SVR model for each score. In each LOOCV iteration, cortical FA at birth from one subject was used as testing data, and the data of the remaining 45 subjects, including their cortical FA at birth and Bayley scores at 2 years of age, were used as training data. In this way, the neurodevelopmental outcome of each infant was predicted from an independent training sample. Cortical FA measurements from the 52 parcellated cortical gyri formed each subject’s feature vector, which was used as the SVR predictor. Feature vectors of all subjects were concatenated (Feature vectors in Figure 1) to obtain the input data for SVR prediction models with a linear kernel function (Figure 1). Each feature, i.e. the FA of one cortical gyrus, was normalized independently across the training data; only training data were used to compute the normalization scaling parameters, which were then applied to the testing data. After continuous cognitive or language scores were predicted by the model, the Pearson correlation coefficient (r) and MAE between actual and predicted scores were computed to evaluate the cognition or language prediction models. Normalized feature contribution weights, |w_i|/Σ_j|w_j|, where w_i is the SVR weight of the ith cortical gyrus, were calculated to represent the contribution of each parcellated cortical gyrus to the cognition or language prediction model. 
These normalized feature contribution weights of all parcellated cortical gyri in the cognition or language prediction model were then mapped to the cortical surface to reveal the heterogeneous regional contributions across the entire cortex and the distinguishable regional contribution pattern of each prediction model.
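A minimal sketch of this LOOCV pipeline in Python is shown below. It is not the authors’ code (see Data availability). It assumes scikit-learn’s `SVR`, which is built on LIBSVM, rather than calling LIBSVM directly; min-max scaling as the normalization (the text does not name the scheme); and default hyperparameters unless `C` is passed, none of which are reported here. The function name `loocv_svr` is hypothetical, and averaging the normalized weights over all folds is also an assumption. Placing the scaler inside a pipeline guarantees that its parameters are fitted on the training fold only.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR


def loocv_svr(X: np.ndarray, y: np.ndarray, C: float = 1.0):
    """Leave-one-out prediction with a linear-kernel SVR.

    X: (n_subjects, n_gyri) cortical FA features; y: (n_subjects,) Bayley scores.
    Returns predictions, Pearson r, MAE, and fold-averaged normalized weights.
    """
    n_subjects, n_features = X.shape
    predicted = np.empty(n_subjects)
    weight_sum = np.zeros(n_features)
    for train_idx, test_idx in LeaveOneOut().split(X):
        # The scaler is fitted on the training fold only and then applied,
        # unchanged, to the held-out subject.
        model = make_pipeline(MinMaxScaler(), SVR(kernel="linear", C=C))
        model.fit(X[train_idx], y[train_idx])
        predicted[test_idx] = model.predict(X[test_idx])
        w = np.abs(model[-1].coef_.ravel())  # |w_i| for each gyrus
        weight_sum += w / w.sum()            # |w_i| / sum_j |w_j|
    r = np.corrcoef(y, predicted)[0, 1]
    mae = np.mean(np.abs(y - predicted))
    return predicted, r, mae, weight_sum / n_subjects
```

With the study data, `X` would be the 46 × 52 cortical FA matrix and `y` the cognitive or language composite scores; mapping the returned weights back onto the 52 gyral labels gives the cortical weight maps.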

Assessment of robustness of prediction

A permutation test was conducted to assess LOOCV prediction performance. Specifically, cognitive or language outcomes were randomly shuffled across subjects 1000 times, and the prediction procedure was carried out with each set of shuffled outcomes to generate null distributions. The Pearson correlation and the MAE between predicted and observed outcomes were calculated for each set of shuffled outcomes. The p-value of the observed correlation coefficient (r) in LOOCV prediction, calculated as the number of permutations with a correlation coefficient greater than the observed r divided by the total number of permutations, is the probability of observing the reported r by chance. Similarly, the p-value of the MAE in LOOCV prediction, calculated as the number of permutations with an MAE lower than the observed MAE divided by the total number of permutations, is the probability of observing the reported MAE by chance.
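The permutation procedure can be sketched as follows, under the same assumptions as a scikit-learn based LOOCV pipeline (linear `SVR` with default hyperparameters, min-max scaling fitted within each training fold); the function names are hypothetical. The p-values follow the definitions above: the fraction of permutations with a larger r, or with a smaller MAE, than observed.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR


def loocv_r_mae(X, y):
    """Pearson r and MAE of LOOCV predictions from a linear SVR."""
    model = make_pipeline(MinMaxScaler(), SVR(kernel="linear"))
    predicted = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return np.corrcoef(y, predicted)[0, 1], np.mean(np.abs(y - predicted))


def permutation_p_values(X, y, n_permutations=1000, seed=0):
    """Shuffle outcomes across subjects, rerun LOOCV, and compare with the observed fit."""
    rng = np.random.default_rng(seed)
    r_obs, mae_obs = loocv_r_mae(X, y)
    r_null = np.empty(n_permutations)
    mae_null = np.empty(n_permutations)
    for k in range(n_permutations):
        r_null[k], mae_null[k] = loocv_r_mae(X, rng.permutation(y))
    # Fraction of permutations with a larger r, or a smaller MAE, than observed.
    p_r = float(np.mean(r_null > r_obs))
    p_mae = float(np.mean(mae_null < mae_obs))
    return r_obs, mae_obs, p_r, p_mae
```

Each permutation reruns the full LOOCV loop, so 1000 permutations on 46 subjects means 46,000 small SVR fits.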

To investigate the effect of the cortical parcellation scheme on the cortical FA measures in the prediction model, several schemes were tested: the 52 cortical regions of the neonate atlas labeling (Feng et al., 2019) and 128, 256, 512, and 1024 randomly parcellated cortical regions of equal size (Zalesky et al., 2010). For each parcellation scheme, the average skeletonized FA in each cortical ROI was used as a feature in the SVR model to test prediction performance. To address the possible confounding effect of varying neonate gestational ages at scan, we evaluated whether the prediction performance of cortical FA measures remained high after controlling for age at scan, following the age correction methods described in the literature (Dukart et al., 2011). Specifically, the age effect on the cortical FA of each gyrus (Ouyang et al., 2019b) or each parcellated ROI was adjusted with a biphasic piecewise linear regression of cortical FA on age, with the inflection point at 36 PMW. The cortical FA residuals from this regression, considered age-adjusted cortical FA measures, were then used as features in SVR models for predicting the Bayley-III scores. To further validate the capability of cortical FA for behavioral prediction, each individual’s Bayley composite scores were categorized as normal (>85) or low (≤85). Cortical FA measures were used as features to classify each subject’s score into the normal- or low-score group using an SVM with LOOCV. Classification accuracy and the area under the receiver operating characteristic (ROC) curve were used to evaluate the performance of the classification models.
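The age adjustment and the outcome dichotomization can be sketched with NumPy as below. The 36 PMW inflection point and the 85-point cutoff come from the text; the rest is an assumption: the two line segments are taken to join continuously at the knot (a hinge term in an ordinary least-squares fit), which the text does not specify, and the function names are hypothetical.

```python
import numpy as np

KNOT_PMW = 36.0  # inflection point of the biphasic model, as stated in the text


def age_adjusted_fa(fa, age_pmw, knot=KNOT_PMW):
    """Residuals of cortical FA after a two-segment linear fit against age.

    fa: (n_subjects,) or (n_subjects, n_regions); each column is fitted separately.
    Assumption: the two line segments are joined (continuous) at the knot.
    """
    fa = np.asarray(fa, dtype=float)
    age = np.asarray(age_pmw, dtype=float)
    # Design matrix: intercept, age, and a hinge term that changes the slope after the knot.
    design = np.column_stack([np.ones_like(age), age, np.maximum(age - knot, 0.0)])
    coef, *_ = np.linalg.lstsq(design, fa, rcond=None)
    return fa - design @ coef


def outcome_group(composite_score):
    """Dichotomize a Bayley-III composite score: <=85 (one SD below the
    normative mean of 100) is 'low', >85 is 'normal'."""
    return "low" if composite_score <= 85 else "normal"
```

The residuals returned by `age_adjusted_fa` would then replace the raw FA features in the SVR models.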

Bootstrap analysis for assessing reproducibility of top 10 cortical regions identified by LOOCV analysis

We used a bootstrap sampling approach to assess the reproducibility of the top 10 cortical regions where microstructural measures contributed most to predicting each outcome in the LOOCV analysis. Specifically, we randomly selected 90% of the 46 subjects 1000 times. We then built a cognition or language prediction model with each set of selected subjects and identified the top 10 cortical regions with the highest contribution to predicting the cognitive or language outcome. In each of the 1000 bootstrap resamples, if a cortical region was among the top 10 regions contributing to predicting the cognition outcome, the count for that region was incremented by 1. After all 1000 resamples, the percentage for each cortical region was calculated as its total count divided by 1000. In this way, a percentage map of all cortical gyri for predicting cognitive outcome was created. The same procedure was repeated with 1000 bootstrap resamples for predicting language outcome. If the top 10 cortical regions where microstructural measures contributed most to predicting each outcome in the LOOCV analysis (Figure 3) overlap with the cortical regions with high percentages, the top 10 cortical regions identified by LOOCV analysis are highly reproducible.
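A sketch of this resampling count in Python follows, assuming a linear scikit-learn `SVR` with min-max scaling as the prediction model (an assumption, as are the default hyperparameters) and drawing 90% of subjects without replacement in each resample, which is how the text’s “randomly selected 90%” is read here. The function name is hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR


def top_k_frequency(X, y, n_resamples=1000, fraction=0.9, k=10, seed=0):
    """Fraction of resamples in which each region ranks in the top k by |weight|."""
    rng = np.random.default_rng(seed)
    n_subjects, n_features = X.shape
    n_keep = int(round(fraction * n_subjects))  # 41 of 46 subjects
    counts = np.zeros(n_features, dtype=int)
    for _ in range(n_resamples):
        # Assumption: 90% of subjects drawn without replacement in each resample.
        idx = rng.choice(n_subjects, size=n_keep, replace=False)
        model = make_pipeline(MinMaxScaler(), SVR(kernel="linear")).fit(X[idx], y[idx])
        w = np.abs(model[-1].coef_.ravel())
        counts[np.argsort(w)[::-1][:k]] += 1  # +1 for each region in this resample's top k
    return counts / n_resamples
```

Regions that appear in the LOOCV top 10 and also score close to 1 here would be the reproducible ones.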

Permutation tests to assess distinguishable regional contribution to predicting cognitive or language outcomes

To quantify the extent of distinction between the set of top 10 cortical regions in the cognition prediction model and the set of top 10 cortical regions in the language prediction model, we defined a nonoverlapping index as the number of nonoverlapping regions between these two sets divided by the number of regions in their union. This index ranges from 0 to 1, with 1 indicating completely distinct sets of regions and 0 indicating identical sets. A permutation test was used to evaluate the statistical significance of the observed nonoverlapping index. The null hypothesis is that the observed nonoverlapping index from predicting two different outcomes does not differ from the distribution of nonoverlapping indices calculated from predicting the same (cognitive or language) outcome. The null distribution was generated by calculating 2070 nonoverlapping indices, each corresponding to one of the 1035 pairs of cognition models or one of the 1035 pairs of language models built from leave-one-out resamples (46 leave-one-out resamples give 46 × 45/2 = 1035 pairs). The p-value of the reported nonoverlapping index, i.e. the probability of observing it by chance, was calculated as the number of permutations with a higher index than the reported index divided by the total number of permutations. We also conducted a stricter permutation test by increasing the variability of the resamples. Specifically, the bootstrap resamples described in ‘Bootstrap analysis for assessing reproducibility of top 10 cortical regions identified by LOOCV analysis’ above were adopted to generate another null distribution of nonoverlapping indices, and the p-value of the observed nonoverlapping index was calculated with the same procedure.
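The nonoverlapping index and its permutation p-value reduce to simple set arithmetic. The Python sketch below uses hypothetical region indices (not the study’s top 10 sets) and also checks the pair count behind the 2070-index null distribution.

```python
from math import comb
from typing import Iterable, Sequence


def nonoverlapping_index(regions_a: Iterable[int], regions_b: Iterable[int]) -> float:
    """|A symmetric-difference B| / |A union B|: 0 = identical sets, 1 = disjoint sets."""
    a, b = set(regions_a), set(regions_b)
    return len(a ^ b) / len(a | b)


def permutation_p_value(observed: float, null: Sequence[float]) -> float:
    """Fraction of null indices higher than the observed index."""
    return sum(value > observed for value in null) / len(null)


# 46 leave-one-out models give C(46, 2) = 1035 pairs per outcome,
# hence 2 x 1035 = 2070 indices in the null distribution.
n_pairs = comb(46, 2)
print(n_pairs, 2 * n_pairs)  # 1035 2070

# Hypothetical top-10 sets (region indices), not the study's regions.
cognition_top10 = range(1, 11)
language_top10 = range(6, 16)
print(nonoverlapping_index(cognition_top10, language_top10))  # 10 / 15 = 0.666...
```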

Prediction of neurodevelopmental outcome with WM FA only and combined cortical and WM FA as features


WM FA only, and combined cortical and WM FA, were also tested for predicting neurodevelopmental outcomes. WM FA values were measured at the core of the WM skeleton to alleviate partial volume effects (left panel in Figure 4a). The WM skeleton was further parcellated into 40 tracts using tract labels transformed from a neonate atlas (Feng et al., 2019). Details of tract-wise FA measurement on the WM skeleton were described in our previous publication (Huang et al., 2012). WM FA measurements of the 40 tracts were used to generate the feature vector of each subject. In addition, cortical FA measurements of the 52 cortical regions and WM FA measurements of the 40 tracts were combined to generate a combined feature vector for each subject. As with the cortical FA feature vectors, these WM-only or combined cortical and WM FA feature vectors were the input of the SVR predictor with LOOCV for predicting neurodevelopmental outcome. To evaluate the generalizability of the prediction model, a threefold cross-validation analysis was also applied. Unlike LOOCV, in which the model is trained on data from all but one participant, threefold cross-validation left out about one third of the participants (15 of 46) for testing and trained on the remaining two thirds.
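The three feature configurations and the two validation schemes can be sketched with scikit-learn. This is an illustrative sketch, not the authors' released code: the linear kernel, feature standardization and default SVR hyperparameters are assumptions, and the arrays are random placeholders rather than study data. With `KFold(n_splits=3)`, 46 subjects are split into test folds of 16, 15 and 15.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_features(cortical_fa, wm_fa, mode="combined"):
    """Per-subject feature vectors: 52 cortical, 40 WM, or all 92 concatenated."""
    if mode == "cortical":
        return cortical_fa
    if mode == "wm":
        return wm_fa
    return np.hstack([cortical_fa, wm_fa])


def cv_predict(X, y, cv):
    # Assumed model: standardized features + linear-kernel SVR with default C/epsilon.
    model = make_pipeline(StandardScaler(), SVR(kernel="linear"))
    return cross_val_predict(model, X, y, cv=cv)


# Synthetic placeholders for 46 neonates (not study data).
rng = np.random.default_rng(0)
cortical = rng.normal(0.20, 0.03, size=(46, 52))
wm = rng.normal(0.30, 0.05, size=(46, 40))
scores = rng.normal(88, 10, size=46)

X = build_features(cortical, wm, mode="combined")
loo_pred = cv_predict(X, scores, LeaveOneOut())
kfold = KFold(n_splits=3, shuffle=True, random_state=0)
kfold_pred = cv_predict(X, scores, kfold)
test_sizes = sorted(len(test) for _, test in kfold.split(X))  # [15, 15, 16]
r_loo = np.corrcoef(scores, loo_pred)[0, 1]
```

Because the scaler sits inside the pipeline, it is refit on each training split, so no information from the held-out subject(s) leaks into feature scaling.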

Data availability

Neonate MRI datasets are publicly available and can be freely downloaded from brainmrimap.org (a public website maintained by the Huang lab). Behavioral datasets are available in the supplemental information of this publication. The source code used for prediction is available from the first author’s GitHub repository (https://github.com/MHouyang/Prediction-of-neurodevelopmental-outcome; copy archived at https://archive.softwareheritage.org/swh:1:rev:e3506bfbfa03db27651b1803e5cb662b623f5360/).

References

  1. Bayley N (2006) BSID-III: Bayley Scales of Infant Development. San Antonio: Harcourt Assessment.
  2. Buckner RL, Andrews-Hanna JR, Schacter DL (2008) The brain's default network: anatomy, function, and relevance to disease. In: Kingstone A, Miller MB, editors. Annals of the New York Academy of Sciences: Vol. 1124. The Year in Cognitive Neuroscience. Blackwell Publishing. pp. 1–38. https://doi.org/10.1196/annals.1440.011
  3. Scholz J, Tomassini V, Johansen-Berg H (2009) Individual differences in white matter microstructure in the healthy brain. In: Johansen-Berg H, Behrens TEJ, editors. Diffusion MRI. London: Academic Press. pp. 237–250. https://doi.org/10.1016/B978-0-12-374709-9.00011-0
  4. Yakovlev PI, Lecours AR (1967) The myelogenetic cycles of regional maturation of the brain. In: Minkowski A, editor. Regional Development of the Brain in Early Life. Oxford: Blackwell Science. pp. 3–70.

Decision letter

  1. Alex Fornito
    Reviewing Editor; Monash University, Australia
  2. Richard B Ivry
    Senior Editor; University of California, Berkeley, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This paper uses brain imaging in neonates to show that brain structure at birth can predict behaviour up to 2 years later. This work provides an important link between brain health at birth and later cognitive development.

Decision letter after peer review:

Thank you for submitting your article "Diffusion-MRI-based regional cortical microstructure at birth for predicting neurodevelopmental outcomes of 2-year-olds" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Richard Ivry as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

As the editors have judged that your manuscript is of interest, but as described below that additional experiments are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)

Summary

This manuscript focuses on trying to predict cognitive assessment scores, collected at 18 months of age, from neonatal cortical structure. In 46 neonates, they use support vector regression with leave-one-out cross-validation to model the multivariate relationship between cortical microstructure and later scores. The authors are able to calculate predicted individual cognitive and language scores that correlate with actual scores.

All reviewers appreciate the challenge in collecting these kinds of data and the cross-validation procedure used for outcome prediction. However, we shared a series of concerns that must be addressed before we can consider publication.

Essential revisions

– We are concerned about the representativeness and size of the sample, and therefore how generalisable these results might be to the larger neonatal population. The infants are both born and scanned across a very wide age range, 32-42 postnatal weeks. This means that some of the infants were both very preterm at birth and at scan, with very different cortical microstructure (seen in the supplementary figures). There is also very different partial voluming in these two ages simply due to brain volume differences. Moreover, the study population is not described very well. We cannot discern what the relative proportions of preterm and term infants are. The authors only report that the gestational age ranged from 26 to 41 weeks and that images were obtained at postmenstrual ages ranging from 31 to 41 weeks. More detail is needed regarding the study population and who was imaged when. In addition, more than half of the recruited infants didn't finish the study, and it is important to know if there were differences between those subjects who were apparently lost to follow up and those that weren't.

– The mixing of term and preterm groups in the analysis raises an interpretation issue. These two populations aren't equivalent, and few would argue that preterm infants are "normal" despite having normal conventional neuroimaging studies. It is conceivable that the findings of the study are driven by abnormalities in the preterm infants, and are thereby an indication of areas which are most commonly injured in preterm infants. We suggest the authors consider i) comparing findings of preterm with term infants and ii) evaluating outcome correlations for these populations separately. An alternative would be to examine term-equivalent scans for all participants.

– Please clarify the extent to which the correlations between real and predicted scores are driven by extreme scorers.

– The authors have demonstrated the quite dramatic changes that occur over this period (Ouyang et al., 2019). It's difficult to see how you could combine datasets that have the transient anisotropic features in cortex (early) and more isotropic mature features (late) without testing on a hold out. Linear age adjustment doesn't really help in this context as the changes are nonlinear. As an example, the positive and negative scores at both ends of the tails of the Bayley-III cognitive scores are preterm and term born infants respectively which again makes me worry about the confound of age in the results of the regression.

– The average assessment score is nearly a full standard deviation below what you'd expect in a typical population at the same age, averaging 85-90 in the three subscales. Is there a reason the average is so low?

– The predicted scores are also in a very restricted range – although the raw scores range from 65-110 (cognitive) and 55-110 (language) the predictions seem to only range from 80-90. As you move away from the mean, the absolute error increases substantially.

– More generally, the analysis provides a useful proof-of-principle that early FA measures can predict later outcomes, but the predictions are not sufficiently accurate for clinical use. The MAE for the models is around 1 SD of the cognitive scores, and the accuracy using performance cut-offs was 61%/76%. Although sensitivity/specificity are not reported, FA does not seem to be a highly sensitive marker of these outcomes (as suggested in text). This approach may work at a population level but is not particularly effective for meaningfully predicting the outcome for a single individual. A more measured tone in describing the findings and their limitations would provide a more balanced account of the findings.

– With sample sizes this small there is substantial risk of overestimation of the accuracy of any machine learning techniques. The sample size is small, but 5-fold or 10-fold CV is possible here – see Poldrack et al. 10.1001/jamapsychiatry.2019.3671

– White matter FA showed a similar prediction accuracy to cortical FA. This raises questions about specificity. Could simpler measures (e.g., T1 or T2 signal) provide comparable prediction? If so, this may challenge the biological interpretation that the prediction derives from FA's capacity to measure cortical microstructure. Perhaps the authors could further examine this issue by comparing the relative prediction efficacy of simpler and FA-derived measures and looking at how strongly cortical and white matter measures correlate with each other. Is there anything to be gained by combining them?

– Please clarify how motion corruption of the DWI data was assessed and determined.

– Please explain reasons for participant dropout over the 2 year follow-up.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Diffusion-MRI-based regional cortical microstructure at birth for predicting neurodevelopmental outcomes of 2-year-olds" for further consideration by eLife. Your revised article has been evaluated by Richard Ivry (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below. We emphasise that these issues must be addressed to the satisfaction of the reviewers before the manuscript can be accepted.

The first major concern is that the motion threshold is very large (>3 times voxel dimensions) and the replacement scan seems to be taken from a different session. The eddy correction approach is also out of date and seems to just be an affine registration (no outlier rejection like in TORTOISE, eddy, or SHARD). The authors need to comprehensively demonstrate that the results are not driven by motion-related artefact.

Second, it may not be appropriate to classify prematurely-born infants as "normal." Although the MRI may appear normal at early neurodevelopmental follow up, many cognitive issues aren't detectable until these children reach school age. This should be identified as a weakness in the Discussion.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for sending your article entitled "Diffusion-MRI-based regional cortical microstructure at birth for predicting neurodevelopmental outcomes of 2-year-olds" for peer review at eLife. Your article is being evaluated by three peer reviewers, and the evaluation is being overseen by a Reviewing Editor and Richard Ivry as the Senior Editor.

The reviewers feel that the issue of head motion has not been fully addressed. The reviewers requested that you comprehensively show that motion cannot explain the findings. Showing the distribution of motion estimates in the sample is not sufficient. For publication, we would require a stronger demonstration that motion does not contaminate or confound the predictions by, for example, examining correlations between motion estimates (e.g., framewise displacement) and the outcome/connectivity measures used in the analysis.

https://doi.org/10.7554/eLife.58116.sa1

Author response

Essential revisions

– We are concerned about the representativeness and size of the sample, and therefore how generalisable these results might be to the larger neonatal population. The infants are both born and scanned across a very wide age range, 32-42 postnatal weeks. This means that some of the infants were both very preterm at birth and at scan, with very different cortical microstructure (seen in the supplementary figures). There is also very different partial voluming in these two ages simply due to brain volume differences. Moreover, the study population is not described very well. We cannot discern what the relative proportions of preterm and term infants are. The authors only report that the gestational age ranged from 26 to 41 weeks and that images were obtained at postmenstrual ages ranging from 31 to 41 weeks. More detail is needed regarding the study population and who was imaged when. In addition, more than half of the recruited infants didn't finish the study, and it is important to know if there were differences between those subjects who were apparently lost to follow up and those that weren't.

We thank the reviewer for this comment. We acknowledge that a study with a larger sample size may further improve the generalizability of the proposed model. On the other hand, as elaborated in the response to the general comment, we think a wide age range will benefit generalization, given that these neonates are free from perinatal brain injuries. The model could also be applied to future in utero MRI, where a wide age range is likely. In the present study, statistical analyses showed no significant difference in any of the outcome scores, namely cognitive, language and motor scores, between the preterm and term infants (all p>0.3 in panel (b) of the updated Figure 1—figure supplement 3).

We are well aware of partial volume effects in measuring cortical microstructure and have developed a robust method to measure cortical fractional anisotropy (FA) at the core of the cortical mantle, as demonstrated in the added Figure 1—figure supplement 4. This method has also been successfully and consistently implemented in our previous studies (e.g. Ouyang et al., 2019; Yu et al., 2016). Details of measuring cortical FA at the core (or skeleton) of the cortical ribbon can be found in the subsection “Measurement of cortical microstructure with brain MRI at birth” in the Materials and methods section. Briefly, skeleton voxels, or the “core” of the cerebral cortical mantle, were obtained. Small, irregular, yet significant offsets between the cortical skeleton and each subject’s cerebral cortex were widespread, because the inter-subject registration used to transform cortical skeletons to individual brains was imperfect. These offsets were addressed with a fast marching technique (details in our publications, Jeon et al., 2012; Ouyang et al., 2019). With fast marching, the FA at the voxels with the highest gray matter tissue probability from the segmentation of an individual subject, namely the “core” cortical voxels, is assigned to skeleton voxels that may deviate from the core (Figure 1—figure supplement 4-a5, b5). As Figure 1—figure supplement 4 shows, using the same scale to display a preterm brain at 33 postmenstrual weeks (PMW) and a term-born brain at 41 PMW, there are almost no differences in partial volume effects on cortical FA measurements between preterm and term-born brains.
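The idea of taking FA from the “core” rather than from a possibly offset skeleton voxel can be illustrated with a simplified NumPy sketch. It is explicitly a swapped-in technique, not the fast marching method used in the study: each skeleton voxel searches a small cubic window (the hypothetical `radius` parameter) for the voxel with the highest gray matter probability and takes that voxel's FA.

```python
import numpy as np


def project_to_core(fa, gm_prob, skeleton_mask, radius=1):
    """Assign each skeleton voxel the FA of the highest-GM-probability voxel nearby.

    Simplified stand-in for fast marching: a cubic window of half-width
    `radius` voxels replaces the fast marching search.
    """
    out = np.zeros(fa.shape, dtype=float)
    upper = np.array(fa.shape)
    for voxel in np.argwhere(skeleton_mask):
        lo = np.maximum(voxel - radius, 0)
        hi = np.minimum(voxel + radius + 1, upper)
        window = tuple(slice(a, b) for a, b in zip(lo, hi))
        local_gm = gm_prob[window]
        best = np.unravel_index(np.argmax(local_gm), local_gm.shape)
        out[tuple(voxel)] = fa[window][best]
    return out


# Toy 5x5x5 volume: the skeleton voxel is offset by one voxel from the cortical core.
fa = np.full((5, 5, 5), 0.10)
gm = np.zeros((5, 5, 5))
fa[2, 2, 3], gm[2, 2, 3] = 0.30, 0.95  # "core" voxel
skeleton = np.zeros((5, 5, 5), dtype=bool)
skeleton[2, 2, 2] = True
core_fa = project_to_core(fa, gm, skeleton)
```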

Supplementary file 1 has been updated to describe the preterm and term-born subpopulations. Supplementary file 2 has also been updated to show that there is no significant correlation between neurodevelopmental outcome and age (birth age or scan age) in either the preterm or the term-born subpopulation. We intended to follow up all subjects who underwent MRI around birth and to conduct a neurodevelopmental assessment at 2 years of age, and study staff worked diligently toward this goal. They made biannual phone calls and sent birthday cards to families who had expressed interest in being contacted about the follow-up study, to ensure that contact information remained current. All subjects who did not participate in the 2-year follow-up were lost to follow-up. The moderate follow-up rate may be related to the fact that none of the recruited preterm or term-born neonates had a perinatal brain injury or a clinical indication for scanning. Parents of these infants may therefore have been less motivated to participate in the 2-year follow-up than parents of infants scanned for clinical indications in other studies.

– The mixing of term and preterm groups in the analysis raises an interpretation issue. These two populations aren't equivalent, and few would argue that preterm infants are "normal" despite having normal conventional neuroimaging studies. It is conceivable that the findings of the study are driven by abnormalities in the preterm infants, and are thereby an indication of areas which are most commonly injured in preterm infants. We suggest the authors consider i) comparing findings of preterm with term infants and ii) evaluating outcome correlations for these populations separately. An alternative would be to examine term-equivalent scans for all participants.

Please see our responses to the general comments. No injury was found in any of the preterm infants. Unlike in most of the relevant literature, none of the preterm infants were recruited because of a clinical indication. Apart from prematurity, the preterm infants were considered healthy and were recruited solely for research purposes. All participating neonates were carefully screened by a neonatologist and a pediatric neuroradiologist to ensure “normality”. The study was also designed to keep the interval between birth and scan as short as possible to limit the effects of exposure to the extrauterine environment. Furthermore, following the reviewers’ suggestion, we compared neurodevelopmental outcomes between preterm and term infants and found no statistically significant difference between these two populations (all p>0.3), shown in Figure 1—figure supplement 3B. We also evaluated correlations between outcomes and different age indices (i.e. birth age, MRI scan age and Bayley-III exam age) separately for preterm and term-born infants and found no significant correlations (all p>0.1; updated Supplementary file 2). Considering all of the above, it does not seem necessary to divide the neonates into preterm and term-born populations for the prediction analysis. The limited total sample size also prevented us from further subdividing the sample into two groups for prediction analysis. As we do not have term-equivalent scans for the preterm neonates (precisely to limit the effects of exposure to the extrauterine environment), we cannot examine term-equivalent scans for all participants.

– Please clarify the extent to which the correlations between real and predicted scores are driven by extreme scorers.

We thank the reviewer for this comment. The correlations between real and predicted scores should not be driven by extreme scorers. In the present study we used mean absolute error (MAE) to measure the prediction error of our method. MAE is more robust to extreme or outlying values than other error metrics such as mean squared error and root mean squared error (Willmott and Matsuura, 2005). This clarification has been added to the second-to-last paragraph of the Discussion section.
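A small worked example with hypothetical scores shows why a single large error inflates RMSE more than MAE: squaring gives one 10-point miss a weight of 100 against 1 for each 1-point miss, whereas MAE weights errors linearly.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(err ** 2)))


# Hypothetical scores: four predictions miss by 1 point, one misses by 10.
actual = [90, 85, 100, 95, 80]
predicted = [89, 86, 99, 96, 90]
mae_value = mae(actual, predicted)    # 14 / 5 = 2.8
rmse_value = rmse(actual, predicted)  # sqrt(104 / 5), about 4.56
```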

To further confirm that our results were not substantially affected by extreme scorers, we removed the lowest scores in the cognitive and language domains and re-ran the correlation analysis between real and predicted scores. All correlations remained significant (MAE = 5.74, r = 0.496, p = 5 × 10⁻⁴ for cognitive; MAE = 6.95, r = 0.461, p = 1.5 × 10⁻³ for language).
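This sensitivity check can be sketched as below, assuming that the removed values are the lowest actual score(s) in each domain; the number removed (`n_drop`) is a hypothetical parameter, since the text does not state it. The sketch returns Pearson r, MAE and the remaining sample size only; a p-value could be added with `scipy.stats.pearsonr`.

```python
import numpy as np


def correlation_without_lowest(y_true, y_pred, n_drop=1):
    """Pearson r and MAE after removing the n_drop lowest actual scores."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    keep = np.ones(len(y_true), dtype=bool)
    keep[np.argsort(y_true)[:n_drop]] = False  # drop the lowest actual scores
    r = np.corrcoef(y_true[keep], y_pred[keep])[0, 1]
    mae = np.mean(np.abs(y_true[keep] - y_pred[keep]))
    return float(r), float(mae), int(keep.sum())


# Hypothetical scores: the 55 is the extreme low scorer that gets removed.
r, mae, n_kept = correlation_without_lowest(
    [55, 80, 85, 90, 100], [80, 82, 86, 88, 95], n_drop=1
)
```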

– The authors have demonstrated the quite dramatic changes that occur over this period (Ouyang et al., 2019). It's difficult to see how you could combine datasets that have the transient anisotropic features in cortex (early) and more isotropic mature features (late) without testing on a hold out. Linear age adjustment doesn't really help in this context as the changes are nonlinear. As an example, the positive and negative scores at both ends of the tails of the Bayley-III cognitive scores are preterm and term born infants respectively which again makes me worry about the confound of age in the results of the regression.

We thank the reviewer for this comment. We agree that changes in cortical FA are not linear across the entire studied period but rather biphasic and piecewise linear, with an inflection point at 36 PMW (Ouyang et al., 2019). To better correct for the age effect on cortical FA features in this revised manuscript, we adopted the same method used to quantify cortical FA time courses (more details in Ouyang et al., 2019). The biphasic piecewise linear model below was fitted to the cortical FA of each gyrus or each parcellated region of interest (ROI) against age during 31–36 PMW and 36–42 PMW:

Equation 1 (31–36 PMW): $FA_{i,j} = \beta_{1,i} + \beta_{2,i}\, t_j + \delta_{i,j}$

Equation 2 (36–42 PMW): $FA_{i,j} = \beta_{3,i} + \beta_{4,i}\, t_j + \gamma_{i,j}$

Where $t_j$ is the postmenstrual age of the $j$th subject; $\beta_{1,i}$ and $\beta_{3,i}$ are the intercepts and $\beta_{2,i}$ and $\beta_{4,i}$ are the slopes for FA of the $i$th cortical ROI; and $\delta_{i,j}$ and $\gamma_{i,j}$ are the error terms for cortical FA measurements from 31–36 PMW and from 36–42 PMW, respectively. We corrected the age effect on the cortical FA feature of each gyrus or each parcellated ROI, $FA^{correct}_{i,j}$, based on the biphasic piecewise linear model above, following the age correction method described in the literature (Dukart et al., 2011):

31–36 PMW: $FA^{correct}_{i,j} = FA_{i,j} + \beta_{2,i}\,(36 - t_j)$

36–42 PMW: $FA^{correct}_{i,j} = FA_{i,j} + \beta_{4,i}\,(36 - t_j)$
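A minimal NumPy sketch of this correction, assuming FA is arranged as a subjects × ROIs array. Each phase gets its own least-squares slope per ROI (the β2 or β4 of the model), and every value is projected to 36 PMW by adding slope × (36 - t_j). The text does not state how a scan at exactly 36 PMW is assigned; placing it in the late phase is an assumption. On noise-free piecewise linear data, each ROI's corrected FA collapses to its value at 36 PMW.

```python
import numpy as np


def biphasic_age_correct(fa, pma, knot=36.0):
    """Project each ROI's FA to the 36-PMW knot using phase-specific slopes.

    fa: (n_subjects, n_rois) cortical FA; pma: (n_subjects,) postmenstrual age in weeks.
    Needs at least two subjects per phase. Assumption: a scan exactly at the
    knot is placed in the late (36-42 PMW) phase.
    """
    fa = np.asarray(fa, dtype=float)
    pma = np.asarray(pma, dtype=float)
    corrected = np.empty_like(fa)
    early = pma < knot
    for phase in (early, ~early):
        t = pma[phase]
        slopes = np.polyfit(t, fa[phase], deg=1)[0]  # one slope per ROI column
        corrected[phase] = fa[phase] + slopes * (knot - t)[:, None]
    return corrected


# Synthetic noise-free example (not study data).
pma = np.array([31, 32, 33, 34, 35, 36, 37, 38, 40, 42], dtype=float)
roi0 = np.where(pma < 36, 0.20 - 0.010 * (pma - 36), 0.20 - 0.005 * (pma - 36))
roi1 = np.where(pma < 36, 0.15 - 0.020 * (pma - 36), 0.15)
fa_corr = biphasic_age_correct(np.column_stack([roi0, roi1]), pma)
```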

After above-mentioned adjustment for the age effect, correlation between the predicted and actual cognitive or language scores is still significant (p<0.05) with original parcellation of 52 cortical regions and across other tested cortical parcellation schemes (128, 256, 512 and 1024 cortical parcels). Prediction performances before and after adjustment for the age effect are demonstrated in the updated Figure 2—figure supplement 1B-1C.

– The average assessment score is nearly a full standard deviation below what you'd expect in a typical population at the same age, averaging 85-90 in the three subscales. Is there a reason the average is so low?

The slightly lower average score may reflect a sample that is not sufficiently diverse or large: all subjects were recruited from a single county’s public hospital within 3 years. We also note that the average composite scores in this study, 85.7–91.2 across the three subscales, lie within 1 standard deviation of the Bayley-III norms (mean = 100; standard deviation = 15). These subjects are therefore still considered to be within the normal range (Bayley, 2006).
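As a quick arithmetic check against the Bayley-III norms (mean 100, SD 15), the two ends of the reported range of average composite scores correspond to the following z-scores, both above -1:

```latex
z = \frac{x - 100}{15}, \qquad
z_{85.7} = \frac{85.7 - 100}{15} \approx -0.95, \qquad
z_{91.2} = \frac{91.2 - 100}{15} \approx -0.59
```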

– The predicted scores are also in a very restricted range – although the raw scores range from 65-110 (cognitive) and 55-110 (language) the predictions seem to only range from 80-90. As you move away from the mean, the absolute error increases substantially.

The restricted range of predicted scores is probably driven mainly by the restricted range of actual scores. Although the actual scores span a relatively wide range, most of them cluster between 80 and 100 (Figure 1—figure supplement 3). The range of predicted scores also depends on the cortical parcellation scheme (Figure 2—figure supplement 1A); in particular, the parcellation scheme with 128 random cortical parcels yields a wide range of predicted scores. In addition, combining regional cortical FA and white matter FA measures in prediction (suggested in Essential revision #9 below) also yields a wider range of predicted values. Together, these results suggest that greater individual variability of neurodevelopmental scores in a larger sample, and more comprehensive information from multimodal neuroimaging (see also the response to Essential revision #9), should improve prediction performance and reduce prediction errors. This point is already addressed in the Discussion section.

– More generally, the analysis provides a useful proof-of-principle that early FA measures can predict later outcomes, but the predictions are not sufficiently accurate for clinical use. The MAE for the models is around 1 SD of the cognitive scores, and the accuracy using performance cut-offs was 61%/76%. Although sensitivity/specificity are not reported, FA does not seem to be a highly sensitive marker of these outcomes (as suggested in text). This approach may work at a population level but is not particularly effective for meaningfully predicting the outcome for a single individual. A more measured tone in describing the findings and their limitations would provide a more balanced account of the findings.

We agree with the reviewer that testing with a larger sample is needed before the predictions can be sufficiently accurate for clinical use. Sensitivity and specificity are now provided in Figure 2—figure supplement 2. We also agree that the approach shows effective prediction at the level of certain subgroups but is not accurate enough to predict the outcome of a single individual. As suggested by the reviewer, to provide a more balanced account of the findings, we have substantially toned down the first and second-to-last paragraphs of the Discussion section.
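For reference, sensitivity and specificity at a performance cut-off can be computed as in this sketch. The cut-off of 85 and the convention that a score below the cut-off counts as "positive" (at risk) are assumptions for illustration and need not match the cut-offs used in the study.

```python
import numpy as np


def classification_metrics(actual, predicted, cutoff):
    """Treat a score below `cutoff` as 'at risk' (positive) for both actual and predicted."""
    actual_pos = np.asarray(actual) < cutoff
    pred_pos = np.asarray(predicted) < cutoff
    tp = int(np.sum(actual_pos & pred_pos))
    fn = int(np.sum(actual_pos & ~pred_pos))
    tn = int(np.sum(~actual_pos & ~pred_pos))
    fp = int(np.sum(~actual_pos & pred_pos))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(actual_pos)
    return sensitivity, specificity, accuracy


# Hypothetical cut-off of 85 (1 SD below the Bayley-III norm mean of 100).
sens, spec, acc = classification_metrics([70, 80, 90, 100], [82, 88, 86, 95], cutoff=85)
```

In this toy case one of the two at-risk children is missed (sensitivity 0.5) while neither typical child is flagged (specificity 1.0), which shows how a respectable overall accuracy can coexist with limited sensitivity.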

– With sample sizes this small there is substantial risk of overestimation of the accuracy of any machine learning techniques. The sample size is small, but 5-fold or 10-fold CV is possible here – see Poldrack et al. 10.1001/jamapsychiatry.2019.3671

We thank the reviewer for this comment. Although k-fold cross-validation (CV) is ideal and can be applied to this prediction model, the leave-one-out cross-validation approach has also been widely used in the studies reviewed by Poldrack et al., 2020. As Figure 3A in Poldrack et al., 2020 shows, the number of leave-one-out studies was similar to the number of k-fold studies.

As elaborated in the responses to Essential revisions #6, #7 and #9, this manuscript emphasizes the capability of cortical microstructure measures to predict future neurodevelopmental outcomes, not perfect prediction itself, as cortical microstructure measures have rarely been used for prediction. Following the reviewer’s suggestion, we conducted a 3-fold cross-validation using cortical FA only, white matter (WM) FA only, and combined cortical and WM FA. As shown in Author response table 1, with combined cortical and WM FA, the predicted cognitive and language scores remained significantly correlated with the actual scores (all p<0.05) for all 3 folds, consistent with the better prediction of combined cortical and WM FA in the revised Figure 4B and 4C. Although neither cortical FA alone nor WM FA alone passed the 3-fold CV, cortical FA features performed slightly better than WM FA features in terms of correlation coefficients. We have added these findings to the last paragraph of the Results section and the second paragraph of the Discussion section.

Author response table 1
Mean correlation coefficient and the corresponding p values between predicted and actual scores in the 3-fold cross validation.

* in the table indicates significant (p<0.05) correlation.

Features in prediction | Cognitive r | Cognitive p | Language r | Language p
Cortical FA | 0.508 | 0.053 | 0.479 | 0.071
White matter (WM) FA | 0.491 | 0.063 | 0.459 | 0.085
Combined cortical and WM FA | 0.635 | 0.01 * | 0.592 | 0.02 *
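The mean correlation coefficient in Author response table 1 can be read as the average of per-fold Pearson r values. A minimal sketch, assuming fold membership is available as an array of fold labels; it takes a plain arithmetic mean, as the table caption suggests, although averaging Fisher z-transformed values is a common alternative. With only 15 or 16 subjects per test fold, each per-fold r is itself noisy.

```python
import numpy as np


def fold_correlations(y_true, y_pred, fold_ids):
    """Pearson r between actual and predicted scores within each test fold, plus their mean."""
    y_true, y_pred, fold_ids = map(np.asarray, (y_true, y_pred, fold_ids))
    rs = np.array([
        np.corrcoef(y_true[fold_ids == f], y_pred[fold_ids == f])[0, 1]
        for f in np.unique(fold_ids)
    ])
    return rs, float(rs.mean())


# Toy check: predictions that are a perfect linear function of the actual scores.
y_true = np.arange(9, dtype=float)
per_fold_r, mean_r = fold_correlations(y_true, 2 * y_true + 1, [0, 0, 0, 1, 1, 1, 2, 2, 2])
```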

– White matter FA showed a similar prediction accuracy to cortical FA. This raises questions about specificity. Could simpler measures (e.g., T1 or T2 signal) provide comparable prediction? If so, this may challenge the biological interpretation that the prediction derives from FA's capacity to measure cortical microstructure. Perhaps the authors could further examine this issue by comparing the relative prediction efficacy of simpler and FA-derived measures and looking at how strongly cortical and white matter measures correlate with each other. Is there anything to be gained by combining them?

We thank the reviewer for raising this question. In our understanding, any measurement reflecting an individual’s level of brain maturation, including T1 or T2 signal and white matter microstructure, could potentially be used to predict neurodevelopmental outcome. The main purpose of this manuscript is not to claim that cortical FA is the “only” index for predicting neurodevelopmental outcome. Instead, we consider demonstrating the capability of cortical FA to predict neurodevelopmental outcome (probably for the first time) the main and novel contribution of this manuscript. These different brain measures reflect distinct properties of brain development. For example, simpler measures from structural T1-weighted (T1w) or T2-weighted (T2w) images usually provide macrostructural measurements such as cortical volume or thickness. White matter FA measures reflect white matter (not directly cerebral cortical) microstructure, although they might be related to cortical FA measures.

Following the reviewer’s comment, we combined cortical and white matter FA measures to predict outcomes, with results shown in Figure 4B and 4C. Combining the features improved prediction performance, as shown by higher correlation coefficients between the predicted and actual scores in Figure 4B and 4C. We have also added discussion of combining features to improve prediction performance to the Results and Discussion sections.

– Please clarify how motion corruption of the DWI data was assessed and determined.

To quantify head motion in each dMRI scan, all diffusion-weighted image (DWI) volumes were aligned to the first stable image volume in the scan using automatic image registration (AIR) in DTIStudio (Jiang et al., 2006), and the volume-by-volume translation and rotation were calculated from the registration. DWI volumes with a translation larger than 5 mm or a rotation larger than 5 degrees were classified as corrupted. With 30 DWI volumes per scan and 2 repetitions, we accepted diffusion MRI datasets with fewer than 5 DWI volumes affected by motion. The affected volumes were replaced during postprocessing by good volumes from the other DTI repetition. We have added these details to the Materials and methods section.
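The screening rule above can be sketched as follows. This is a hypothetical NumPy illustration, not the DTIStudio workflow: it assumes that per-volume translations and rotations relative to the reference volume are already available, that the 5 mm / 5 degree thresholds apply to the largest absolute value over the three axes, that volume k of one repetition is replaced by volume k of the other, and that a dataset is rejected if a corrupted volume has no clean counterpart. None of these details is stated in the text.

```python
import numpy as np


def flag_corrupted(translations_mm, rotations_deg, max_mm=5.0, max_deg=5.0):
    """Flag DWI volumes whose motion relative to the reference exceeds the thresholds.

    Assumption: each threshold is applied to the largest absolute value over the three axes.
    """
    t = np.abs(np.asarray(translations_mm, float)).max(axis=1)
    r = np.abs(np.asarray(rotations_deg, float)).max(axis=1)
    return (t > max_mm) | (r > max_deg)


def merge_repetitions(rep1, rep2, bad1, bad2, max_bad=5):
    """Replace corrupted volumes of repetition 1 with the matching volumes of repetition 2.

    Returns None (dataset rejected) when max_bad or more volumes are corrupted, or when a
    corrupted volume's counterpart in repetition 2 is also corrupted (assumed rule).
    """
    bad1, bad2 = np.asarray(bad1, bool), np.asarray(bad2, bool)
    if bad1.sum() >= max_bad or np.any(bad1 & bad2):
        return None
    merged = np.array(rep1, copy=True)
    merged[bad1] = np.asarray(rep2)[bad1]
    return merged
```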

– Please explain reasons for participant dropout over the 2 year follow-up.

As elaborated in the response to the general comment and Essential revision 2, we intended to follow up all subjects who underwent MRI around birth and to conduct a neurodevelopmental assessment at 2 years of age. All subjects who did not participate in the 2-year follow-up were lost to follow-up. The moderate follow-up rate may be related to the fact that none of the recruited preterm or term-born neonates had a perinatal brain injury or a clinical indication for scanning. Parents of these infants may therefore have been less motivated to participate in the 2-year follow-up than parents of infants scanned for clinical indications in other studies.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The first major concern is that the motion threshold is very large (>3 times the voxel dimensions) and the replacement scan seems to be taken from a different session. The eddy current correction approach is also outdated and seems to be just an affine registration (with no outlier rejection as in TORTOISE, eddy, or SHARD). The authors need to comprehensively demonstrate that the results are not driven by motion-related artefacts.

We thank the reviewer for this comment. The second diffusion MRI (dMRI) scan was acquired immediately after the first, and the affected dMRI volumes were replaced by good volumes from the other dMRI scan in the same session. Because all dMRI scans were conducted during the neonates’ natural sleep, motion was generally very small. The threshold of 5 mm or 5 degrees was used only to discard diffusion volumes with large motion caused by occasional abrupt movements during sleep. The distribution of motion measurements, including translation and rotation, of dMRI scans in this cohort of 46 neonates who had a neurodevelopmental assessment at 2 years of age is shown in Figure 1—figure supplement 5. Volume-by-volume translation ranged from 0 to 1.6 mm (mean 0.78 mm), with most translations below 1 mm. Volume-by-volume rotation ranged from 0 to 0.7 degrees (mean 0.18 degrees), with most rotations below 0.3 degrees. This information has been added to the Materials and methods section.
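Volume-by-volume translations and rotations like those summarized above are often condensed into one per-scan motion estimate, the mean framewise displacement (FD). The sketch below uses the widely used definition of Power and colleagues: the summed absolute change in the three translations plus the three rotations, with rotations converted to millimetres as arc length on a sphere. The exact FD formula and head radius used for this cohort are not stated here. The 50 mm radius is the usual adult default and is an assumption; a smaller radius may suit neonates. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Rigid-body parameters of one volume: dx, dy, dz in mm; rx, ry, rz in degrees.
RigidParams = Tuple[float, float, float, float, float, float]


def framewise_displacement(params: Sequence[RigidParams],
                           head_radius_mm: float = 50.0) -> List[float]:
    """FD between consecutive volumes (Power-style definition).

    Rotation changes are converted to millimetres as arc length on a sphere
    of radius head_radius_mm (50 mm adult default; an assumption here).
    """
    fd = []
    for prev, curr in zip(params, params[1:]):
        translation = sum(abs(c - p) for c, p in zip(curr[:3], prev[:3]))
        rotation = sum(abs(math.radians(c - p)) * head_radius_mm
                       for c, p in zip(curr[3:], prev[3:]))
        fd.append(translation + rotation)
    return fd


def mean_framewise_displacement(params: Sequence[RigidParams],
                                head_radius_mm: float = 50.0) -> float:
    """Average FD over a scan; 0.0 for scans with fewer than two volumes."""
    fd = framewise_displacement(params, head_radius_mm)
    return sum(fd) / len(fd) if fd else 0.0
```

Because FD uses differences between consecutive volumes, a scan with n volumes yields n - 1 FD values. Slow, steady drift therefore contributes less than abrupt jumps do.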

Author response image 1 shows axial b0 images from the raw dMRI of a representative neonate in this study and of a 3-year-old child scanned with the same imaging protocol. Yellow arrows highlight the relatively mild distortion in the neonate b0 image (left) and the severe distortion in the 3-year-old b0 image (right). Both b0 images are shown before eddy current correction and at the level of the eyeballs. To correct for the small head motion and eddy current distortions, we used only 12-parameter (affine) registration, registering all diffusion weighted images to the b0 image. Although we did not use automatic outlier rejection methods (e.g. TORTOISE or eddy, as mentioned by the reviewer), we manually inspected all image slices and volumes and found no outliers due to severe eddy current distortion. Because no dMRI with opposite phase encoding directions was acquired, we could not apply advanced eddy current distortion correction techniques that require dMRI acquired in both AP (anterior-posterior) and PA (posterior-anterior) directions.

Author response image 1
Axial b0 images of a representative neonate (left) and a 3-year-old (right) brain scanned with the same dMRI imaging protocol.
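The 12 parameters of the affine registration described above are commonly split into 3 translations, 3 rotations, 3 scalings and 3 shears. The sketch below composes them into a 4x4 homogeneous matrix and applies it to a point. It is illustrative only. The composition order `T @ Rz @ Ry @ Rx @ S @ H` is one common convention and an assumption; AIR's internal parameterization may differ. The step in which registration software estimates these parameters by optimizing an image similarity measure is not shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def affine_from_params(p: Sequence[float]) -> Matrix:
    """Build a 4x4 affine from 12 parameters (assumed ordering):

    tx, ty, tz      translations (mm)
    rx, ry, rz      rotations (radians)
    sx, sy, sz      scalings
    hxy, hxz, hyz   shears
    """
    tx, ty, tz, rx, ry, rz, sx, sy, sz, hxy, hxz, hyz = p
    cx, snx = math.cos(rx), math.sin(rx)
    cy, sny = math.cos(ry), math.sin(ry)
    cz, snz = math.cos(rz), math.sin(rz)
    t = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]
    rot_x = [[1, 0, 0, 0], [0, cx, -snx, 0], [0, snx, cx, 0], [0, 0, 0, 1]]
    rot_y = [[cy, 0, sny, 0], [0, 1, 0, 0], [-sny, 0, cy, 0], [0, 0, 0, 1]]
    rot_z = [[cz, -snz, 0, 0], [snz, cz, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    scale = [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]
    shear = [[1, hxy, hxz, 0], [0, 1, hyz, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    m = t
    for factor in (rot_z, rot_y, rot_x, scale, shear):
        m = matmul(m, factor)  # right-multiply in the assumed order
    return m


def apply_affine(m: Matrix, point: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Map a 3-D point through a 4x4 affine using homogeneous coordinates."""
    vec = (point[0], point[1], point[2], 1.0)
    out = [sum(m[i][k] * vec[k] for k in range(4)) for i in range(3)]
    return (out[0], out[1], out[2])
```

Fixing the three scalings to 1 and the three shears to 0 reduces this to the 6-parameter rigid-body model used for head-motion estimates. The extra six affine degrees of freedom are what absorb eddy-current-induced stretching and shearing.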

Second, it may not be appropriate to classify prematurely-born infants as "normal." Although the MRI may appear normal at early neurodevelopmental follow up, many cognitive issues aren't detectable until these children reach school age. This should be identified as a weakness in the Discussion.

Following the reviewer’s suggestion, we have removed “normal” from the first paragraph of the Results and from the Materials and methods section, and we have added a discussion identifying the inclusion of prematurely-born infants as a limitation in the Discussion section.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The reviewers feel that the issue of head motion has not been fully addressed. The reviewers requested that you comprehensively show that motion cannot explain the findings. Showing the distribution of motion estimates in the sample is not sufficient. For publication, we would require a stronger demonstration that motion does not contaminate or confound the predictions by, for example, examining correlations between motion estimates (e.g., framewise displacement) and the outcome/connectivity measures used in the analysis.

To comprehensively demonstrate that our prediction results were not driven by motion-related artefacts, we examined the correlation between motion (quantified by mean framewise displacement, as suggested in the Editor’s letter) and regional fractional anisotropy (FA) values from the cerebral cortex and white matter (Figure 4—figure supplement 1A). After false discovery rate (FDR) correction, no significant correlation was found between motion estimates and the FA values of any of the 92 brain regions. We also investigated the effect of motion on the prediction results and found that the high performance of the prediction models was unaffected after statistically controlling for motion estimates in the cortical and white matter FA measures used in the prediction models, as shown in Figure 4—figure supplement 1B. Together, Figure 4—figure supplement 1A and 1B demonstrate that motion did not contaminate or confound the significant prediction found in this study. The response above has been added at the end of the Results section.
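The motion check described above can be sketched as follows. Per-subject mean FD is correlated with each region's FA, the per-region p-values are corrected with the Benjamini-Hochberg FDR procedure, and mean FD is optionally regressed out of each feature. This is a minimal sketch under stated assumptions, not the authors' code. The p-value uses the Fisher z normal approximation rather than an exact t-test. Residualization is only one way to "statistically control" for motion. All function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value via the Fisher z normal approximation (not an exact t-test)."""
    r = max(min(r, 1.0 - 1e-12), -1.0 + 1e-12)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flags for tests that stay significant under Benjamini-Hochberg FDR control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0  # largest rank k with p_(k) <= (k / m) * alpha
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        significant[idx] = rank <= k_max
    return significant


def motion_fa_screen(mean_fd: Sequence[float],
                     fa_by_region: Sequence[Sequence[float]],
                     alpha: float = 0.05) -> List[bool]:
    """Correlate per-subject mean FD with each region's FA, FDR-corrected."""
    n = len(mean_fd)
    p = [correlation_p_value(pearson_r(mean_fd, fa), n) for fa in fa_by_region]
    return benjamini_hochberg(p, alpha)


def residualize(values: Sequence[float], covariate: Sequence[float]) -> List[float]:
    """Remove the least-squares linear effect of one covariate (e.g. mean FD)."""
    mc, mv = fmean(covariate), fmean(values)
    scc = sum((c - mc) ** 2 for c in covariate)
    slope = sum((c - mc) * (v - mv) for c, v in zip(covariate, values)) / scc
    intercept = mv - slope * mc
    return [v - (intercept + slope * c) for c, v in zip(covariate, values)]
```

In the setting described above, `fa_by_region` would hold 92 lists, one per region, each with one FA value per subject. If residualization is combined with cross-validated prediction, the regression on motion should be fitted within each training fold to avoid leaking information from test subjects.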

https://doi.org/10.7554/eLife.58116.sa2

Article and author information

Author details

  1. Minhui Ouyang

    Radiology Research, Children’s Hospital of Philadelphia, Philadelphia, United States
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
  2. Qinmu Peng

    1. Radiology Research, Children’s Hospital of Philadelphia, Philadelphia, United States
    2. Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
    Contribution
    Software, Validation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  3. Tina Jeon

    Radiology Research, Children’s Hospital of Philadelphia, Philadelphia, United States
    Contribution
    Data curation, Formal analysis, Investigation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  4. Roy Heyne

    Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, United States
    Contribution
    Resources, Data curation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Lina Chalak

    Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, United States
    Contribution
    Resources, Data curation, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Hao Huang

    1. Radiology Research, Children’s Hospital of Philadelphia, Philadelphia, United States
    2. Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
    Contribution
    Conceptualization, Resources, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    huangh6@email.chop.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9103-4382

Funding

National Institutes of Health (MH092535)

  • Hao Huang

National Institutes of Health (MH092535-S1)

  • Hao Huang

National Institutes of Health (HD086984)

  • Hao Huang

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This study was sponsored by NIH R01MH092535, R01MH092535-S1, and U54HD086984. We would like to thank Brittany C Bennett at the Children’s Hospital of Philadelphia for her contribution to the schematic depiction.

Ethics

Human subjects: Informed consent was obtained from each subject's parent. The Institutional Review Boards of both the University of Texas Southwestern Medical Center (CR00009778 / STU012012-132) and Children's Hospital of Philadelphia (IRB 15-011775) approved the study procedures.

Senior Editor

  1. Richard B Ivry, University of California, Berkeley, United States

Reviewing Editor

  1. Alex Fornito, Monash University, Australia

Publication history

  1. Received: April 21, 2020
  2. Accepted: December 6, 2020
  3. Version of Record published: December 22, 2020 (version 1)

Copyright

© 2020, Ouyang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 413
    Page views
  • 66
    Downloads
  • 1
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

