Introduction

Major depressive disorder (MDD) is a highly heterogeneous condition (American Psychiatric Association, 2013; Fried and Nesse, 2015) with poorly understood pathophysiology (Fried and Nesse, 2015), which sustains a pervasive trial-and-error approach to treatment (Fried, 2017; James et al., 2018). Hence, reliable pretreatment predictors are critically needed to enable personalized therapy, minimize unnecessary medication exposure, and improve resource allocation (Kraus et al., 2019; Szegediet al., 2009). Initial attempts to forecast antidepressant outcomes have primarily depended on symptom-based stratification—utilizing either self-report or clinician-rated scales—to categorize patients and predict treatment response (Arnow et al., 2015; Kato et al., 2020). Although these clinical measures offer practical utility and are widely implemented in routine practice, they provide minimal insight into MDD’s underlying neurobiological mechanisms. This fundamental limitation stems from the non-specific nature of symptoms: identical clinical presentations may arise from disparate biological origins, while shared pathophysiology can manifest as divergent symptom profiles. Consequently, symptom-driven prediction models typically prove inadequate for guiding treatment selection. Illustrating this point, Arnow et al. (Arnow et al., 2015) categorized MDD patients into melancholic, anxious, and atypical subtypes, yet observed significant subgroup overlap and no differential treatment response, ultimately concluding that such classifications possess minimal clinical value for personalized antidepressant selection.

In response to the limitations of symptom-based approaches, research has increasingly turned to neuroimaging as a promising alternative, as it allows for biologically-anchored stratification by mapping symptoms onto discrete neural substrates. The central challenge involves identifying MDD biotypes characterized by distinctive functional or structural brain patterns, thereby facilitating pathophysiology-specific treatments. Crucially, as (Feczko et al., 2019) emphasize, meaningful subtype definitions must correlate with specific clinical or mechanistic outcomes. While pioneer neuroimaging-based endeavors have demonstrated potential for prediction—notably, (Drysdale et al., 2017) delineated four fMRI-based biotypes predictive of transcranial magnetic stimulation response — many existing studies still fail to demonstrate such authentic predictive value, instead predominantly reporting post-hoc correlations (Kennis et al., 2020; Whelan and Garavan, 2014). Furthermore, these problems have been compounded by methodological limitations such as inadequate sample sizes and a lack of external validation (Cohen et al., 2021; Kraus et al., 2019).

Beyond these general methodological issues, this emerging field faces two distinct yet critical limitations that our study aims to address. On one hand, diffusion MRI (dMRI) remains markedly underutilized for MDD subtyping. Despite its direct relevance to neural circuitry, only one study has employed white matter (WM) microstructural abnormalities to characterize heterogeneity (Liang et al., 2019). On the other hand, conventional clustering methodologies dominate current stratification approaches, where derived subtypes’ brain patterns are subsequently obtained through group-level comparative analyses, potentially obscuring intrinsic feature-sample relationships crucial for subtype definition. A noteworthy exception is the fMRI biclustering investigation by (Tokuda et al., 2018), which identified three MDD subtypes with unique functional connectivity patterns. Unlike traditional clustering, biclustering reveals localized associations between specific sample and feature subsets, which may be attenuated in global clustering models using all features.

To overcome these limitations and to translate an improved neurobiological taxonomy into clinically actionable predictors, we examine three key questions: (1) How to stratify MDD subtypes while identifying their unique WM patterns during clustering; (2) How these subtypes differ in their WM profiles and clinical presentations; (3) Whether baseline WM characteristics can predict both overall symptom reduction and domain-specific treatment responses (e.g., anxiety/somatization, cognitive impairment) across diverse therapeutic interventions. The neurobiological rationale for cross-treatment outcome prediction rests on WM pathology serving as a common pathway through which various interventions (e.g., pharmacotherapy, neuromodulation) restore neural circuit function (Long et al., 2025; Mao et al., 2025). Baseline WM architecture reflects neuroplasticity reserve—the inherent capacity for circuit reorganization that underlies treatment response (Castrén and Hen, 2013; Gazerani, 2025; Marzola et al., 2023)—thereby enabling prediction across therapeutic modalities. Our analytical framework addresses these questions through: (1) non-negative matrix factorization (NMF) based biclustering to identify subtype-specific WM-patient relationships; (2) comprehensive characterization of subtypes in discovery and validation cohorts via neuroanatomical profiling and clinical phenotyping; and (3) systematic evaluation of baseline symptom-WM associations, development of baseline WM-based predictive treatment effect models, and assessment of their generalization ability in an external validation dataset.

Results

Data-driven MDD subgroup identification

Three clinically distinct and robust subgroups emerged from the discovery cohort analysis, corresponding to the optimal rank k=3 (cophenetic correlation value with 0.96; Supplementary Figure S1). Subgroup 1 (n=32) exhibited WM alterations across 5, 730 voxels, subgroup 2 (n=129) involved 5,711 voxels, and subgroup 3 (n=48) comprised 5,566 voxels. Demographic characteristics (e.g., age, sex) showed no significant between-subgroup differences (p>0.05, Table 1).

Clinical characteristics of each subgroup in the discovery dataset.

In the independent validation cohort, the subgroup stratification demonstrated high robustness, as evidenced by a cophenetic correlation of 0.96. This high value, with <5% variation in individual assignments across 100 NMF iterations, indicates that the derived taxonomy yields consistent and stable clusters. Subgroup distribution mirrored patterns from the discovery phase: Subgroup 1 (n=19), Subgroup 2 (n=64), and Subgroup 3 (n=19), with preserved subgroup-specific WM spatial signatures.

Their demographic characteristics are presented in Table 2. Notably, Subgroup 3 consistently exhibited the lowest mean 24-Hamilton Depression Rating Scale (HAMD) total scores across discovery and validation cohorts, potentially indicating attenuated symptom severity in this neuroanatomical subtype.

Clinical characteristics of each subgroup in the external independent dataset.

Multidimensional subgroup characterization

Neuroanatomical characterization of MDD subgroups

The excellent clustering stability facilitates precise identification of subgroup-specific WM signatures (Figure 1). Anatomical location assignment using Natbrain and LNAO_SWM79 atlas (Catani and Thiebaut De Schotten, 2008; Guevara et al., 2017) revealed distinct microstructural profiles (detailed anatomical distribution is provided in Supplementary Table S2):

Subgroup 1: Predominantly superficial fibers concentrated in frontopariental regions (pars opercularis, superior/precentral gyri) with long-association involvement in corticospinal tracts, corpus callosum, and right cingulum/ inferior longitudinal fasciculus (ILF).

Subgroup 2: Cerebellar-focused pathology (cortico-ponto-cerebellar tracts, cerebellar peduncles) combined with distributed long-association alterations (bilateral cingulum/ILF) and diffuse superficial fibers across multiple lobes.

Subgroup 3: Fornix-centered abnormalities with complementary long-association changes (corpus callosum, cingulum) and superficial fibers in parieto-frontotemporal regions.

The results of WM signature extraction of the three subgroups.

a), b) and c) parts presented the coronal brain plane of subgroup 1, 2 and 3, respectively.

The analyses of mean fractional anisotropy (FA) value within the subgroup-specific WM signatures demonstrated consistent cross-cohort differences (Figure 2h and 3h). Subgroups 1 and 3 showed elevated FA versus healthy controls (HC) (p≤0.001), while Subgroup 2 exhibited reduced FA (p<0.004).

Clinical phenotypic profiles

(1) Clinical symptomatology across MDD subgroups

Analyses of 24-HAMD profiles revealed distinct symptom trajectories across subgroups, and there was a certain degree of agreement in reproducibility between the discovery and validation cohorts (Figure 2a-g, 3a-g). In the discovery cohort, significant between-subgroup differences emerged for 24-HAMD scores (p=0.02) and three symptom dimensions: psychomotor retardation (p=0.022), sleep disturbance (p=0.002), and hopelessness (p=0.043). Radar plot visualization showed that Subgroup 1 was characterized by elevated anxiety/somatization, cognitive impairment, and hopelessness; Subgroup 2, by pronounced cognitive impairment, retardation, and hopelessness; and Subgroup 3, by selective anxiety/somatization elevation.

Validation analyses confirmed core phenotype stability: Retardation (p=0.046) and hopelessness (p=0.015) remained significantly differentiated. Subgroup-specific severity patterns persisted, with Subgroup 1 remaining characterized by anxiety/somatization and hopelessness; Subgroup 2 by cognitive impairment, retardation, and hopelessness; and Subgroup 3 by anxiety/somatization.

Notably, sleep disturbance differentiation did not replicate in validation, suggesting state-dependent variability. The cross-cohort correlation for symptom profiles is r = 0.958 (p < 1.951e-08), confirming subgroup consistency.

The results of clinical symptomatology and mean FA values across MDD subgroups in the discovery dataset.

a) Five symptom dimensions were visualized using radar chart. b-f) Between-group differences in scores for five symptom—anxiety/somatization, cognitive impairment, retardation, sleep disturbance and feeling of hopelessness—across the three subgroups were shown in panels b to f, respectively. h) Comparison of the mean FA values of the three WM signature patterns between each subgroup and healthy controls.

The results of clinical symptomatology and mean FA values across MDD subgroups in the external validation dataset.

a) Five symptom dimensions were visualized using radar chart. b-f) Between-group differences in scores for five symptom—anxiety/somatization, cognitive impairment, retardation, sleep disturbance and feeling of hopelessness—across the three subgroups were shown in panels b to f, respectively. h) Comparison of the mean FA values of the three WM signature patterns between each subgroup and healthy controls.

(2) Differential treatment responses across MDD subgroups

Analysis of treatment outcomes revealed distinct response patterns among the three MDD subgroups in both discovery and validation cohorts. In the discovery cohort (Table 1), response rates showed a trend-level difference (p=0.09), with Subgroup 2 demonstrating the highest proportion of improvers (115/121, 95.0%) compared to Subgroup 1 (28/32, 87.5%) and Subgroup 3 (41/48, 85.4%). Treatment-stratified analysis revealed significant variation in response patterns (p=0.04). Selective Serotonin Reuptake Inhibitor (SSRI) monotherapy responses were most frequent in Subgroup 2 (49 cases). Serotonin-Norepinephrine Reuptake Inhibitor (SNRI) monotherapy showed limited efficacy across all subgroups. Repetitive Transcranial Magnetic Stimulation (rTMS) augmentation demonstrated subgroup-specific effects, particularly when combined with SSRIs in Subgroup 2 (28 cases). Electroconvulsive Therapy (ECT) showed relatively uniform responses across subgroups. Baseline 24-HAMD scores differed significantly between subgroups (p=0.02), suggesting varying baseline severity levels. While recurrent episodes and illness duration measures showed no significant differences, effective illness duration was significantly different among subgroups (p=0.02).

In the validation cohort (Table 2), response rates were consistently high across all subgroups (89.5 - 98.4%) without significant differences (p=0.20). None of the baseline clinical characteristics showed significant inter-subgroup differences in this cohort. Treatment-specific responses were more uniform in validation, with no significant subgroup differences (p=0.53), though similar numerical patterns emerged: SSRI monotherapy remained most effective in Subgroup 2 (9 cases); rTMS combinations showed broader efficacy; ECT maintained relatively consistent response rates.

Treatment outcome prediction

Distinct Symptom-WM associative networks across MDD subgroups

The 10 sparse canonical correlation analysis (sCCA) results robustly identified distinct symptom correlates for each subgroup’s WM signature. Subgroup 1 specifically correlated with anxiety/somatization, Subgroup 2 with cognitive impairment and retardation, and Subgroup 3 with all the above three symptom domains (The detailed association results are presented in Supplementary Table S1). Based on these symptom-related WM signatures, affected WM networks were constructed for each subgroup. Figure 4 illustrates the affected WM connectivity patterns observed in each MDD subgroup. Notably, the three subgroups exhibited distinct distributions of altered WM networks: Subgroup 1 primarily demonstrated disruptions in default mode network B, subcortical network, dorsal attention A, limbic networks A/B, salience/ventral attention A, somato-motor A, and central visual network; Subgroup 2 showed predominant alterations in the cerebellar network, default mode networks B/C, limbic A, salience/ventral attention A, dorsal attention A/B, and both peripheral and central visual networks; Subgroup 3 was characterized by WM connectivity changes predominantly involving default mode network B, limbic A, salience/ventral attention A, somato-motor A/B, and visual networks (peripheral and central).

Brain WM networks associated with major clinical symptoms at the average group-level.

The thickness of each connection reflected the proportion of edge presence across subgroups. TempPar - temporal parietal, DefaultC - default C, DefaultB - default B, DefaultA - default A, ContC - control C, ContB - control B, ContA - control A, LimbicA - limbic A, LimbicB - limbic B, SalVentAttnB - salience/ventral attention B, SalVentAttnA - salience/ventral attention A, DorsAttnB - dorsal attention B, DorsAttnA - dorsal attention A, SomMotB - somatomotor B, SomMotA - somatomotor A, VisPeri - peripheral visual, VisCent - central visual, Subcor - subcortical network.

Predictive model performance

In the discovery dataset, degree centrality measures derived from affected WM network demonstrated significant predictive capacity for treatment outcomes across all the three subgroups (Figure 5). Specifically, these network-based features successfully forecasted percentage reductions in: 24-HAMD total scores (all subgroups, r = 0.28 - 0.46, p < 0.05); anxiety/somatization (all subgroups, r = 0.38 - 0.58, p < 0.007); cognitive impairment (all subgroups, r = 0.24 - 0.55, p < 0.03); retardation (all subgroups, r = 0.33 - 0.52, p < 0.009); and hopelessness (all subgroups, r = 0.23 - 0.43, p < 0.05); as well as sleep disturbance (subgroup 2 and 3, r = 0.27 - 0.43, p < 0.009).

Prediction of antidepressant treatment outcomes using degree centrality of affected WM network in the discovery dataset.

Columns represented six antidepressant treatment outcome measures: percentage reduction in 24-HAMD total score and five core symptom domains—anxiety/somatization, cognitive impairment, retardation, sleep disturbance, and feeling of hopelessness. Rows indicated subgroups (i.e., 1, 2 and 3). * p≤0.05, ** p≤0.01, *** p≤0.001

External validation in an independent cohort (Figure 6) affirmed the generalizability of these findings. Specifically, successful predictions were maintained for percentage reductions in: 24-HAMD total score (all subgroups, r = 0.28 - 0.60, p < 0.042), cognitive impairment (all subgroups, r = 0.29 - 0.59, p < 0.03), anxiety/somatization (Subgroup 1 and 2, r = 0.27 - 0.53, p < 0.04), retardation (Subgroup 2 and 3, r = 0.30 - 0.67, p < 0.03), and sleep disturbance (Subgroup 2, r = 0.27, p = 0.043), as well as feeling of hopelessness (Subgroup 1, r = 0.62, p = 0.021).

Prediction of antidepressant treatment outcomes using degree centrality of affected WM network in the external validation dataset.

Columns represented six antidepressant treatment outcome measures: percentage reduction in 24-HAMD total score and five core symptom domains—anxiety/somatization, cognitive impairment, retardation, sleep disturbance, and feeling of hopelessness. Rows indicated subgroups (i.e., 1, 2 and 3). N/A indicated the absence of a valid SVR model applicable to the external dataset. * p≤0.05, ** p≤0.01, *** p≤0.001

Discussion

This study advances the neurobiological parsing of MDD heterogeneity by identifying three distinct subgroups with differential WM and clinical profiles. By identifying three robust MDD biotypes through NMF biclustering (cophenetic value = 0.96), we demonstrate that pre-treatment WM topology contains predictive information that is generalizable across independent cohorts and, crucially, across mechanistically distinct treatments. The consistent predictive accuracy (discovery: r = 0.24–0.58; validation: r = 0.27–0.67; all p < 0.05) suggests that these baseline WM patterns reflect a stable neurobiological trait—most plausibly, an individual’s neuroplasticity reserve—which underlies and constrains the brain’s capacity for functional recovery regardless of the specific therapeutic intervention.

Neuroanatomical subgroups and pathophysiological implications

Our identification of three neuroanatomical MDD subgroups, characterized by unique WM disruption patterns, provides novel insights into the heterogeneous pathophysiology of depression. Subgroup 1 exhibited frontoparietal-corticospinal alterations, predominantly involving superficial WM fibers—a finding contrasting with traditional deep WM-focused MDD studies. These disruptions align with this subgroup’s prominent anxiety and hopelessness, implicating dysregulated default mode and salience networks in emotional dysregulation (Menon, 2011). Subgroup 2 demonstrated cerebellar-visual circuit abnormalities, including the ILF, suggesting disrupted cerebro-cerebellar loops may underlie its cognitive-psychomotor deficits (Schmahmann et al., 2019). Subgroup 3 showed fornix-centered alterations, a structure integral to memory and reward processing (Benear et al., 2020; Godsil et al., 2013), potentially reflecting impaired hippocampal-prefrontal connectivity associated with attenuated symptom severity.

Notably, all subgroups shared disruptions in major WM tracts—corpus callosum, cingulum, and ILF—with distinct spatial patterns. The corpus callosum, the largest inter-hemisphere association, may be related to the pathogenesis of depressive symptoms in psychiatric disorders including MDD (Cole et al., 2012). The cingulum forms the outer ring of the limbic WM tracts, which are involved in key brain functions such as emotion, executive function, and episodic memory (Bubb et al., 2018; Pascalau et al., 2018). The ILF connects the occipital cortex to the anterior temporal region. As the WM backbone of the ventral visual pathway, it is considered crucial for maintaining a variety of cognitive and emotional processes in the visual modality, particularly in object and face recognition, visual emotion recognition, language, and semantics (Herbet et al., 2018; Zemmoura et al., 2021). A study by (Haghshomar et al., 2018) summarized the associations between the ILF and the symptoms of Parkinson’s disease, and suggested that this bundle is involved in the recognition of negative facial emotions in its symptoms related to mood disorders. Prior studies have reported the disruptions to these three WM bundles in MDD (Liao et al., 2013; Luttenbacher et al., 2022; Pasternak et al., 2018). Our results are consistent with previous findings, but we further reveal that damage to these bundles varies in specific locations and that FA value changes differ across different subgroups. Specifically, the mean FA value is decreased in Subgroup 2, while it is increased in Subgroup 1 and 3. These findings suggest that distinct sub-bundles exist within these bundles and play different roles in the pathological mechanisms of MDD.

WM stratification in MDD predicts symptom-specific treatment outcomes

Our study demonstrates that WM-based stratification directly predicts overall antidepressant outcomes and domain-specific improvements (especially in cognitive symptom) across subgroups. This neuroanatomical approach moves beyond the limitations of conventional, symptom-based subtyping, which often fails to inform treatment selection due to its inability to capture underlying pathophysiology (Arnow et al., 2015; Kato et al., 2020; Kautzky et al., 2021). In contrast, our WM-derived subgroups exhibit distinct neurostructural homogeneity, enabling a direct link between neuroanatomical pathology, clinical symptomatology, and treatment outcomes. By establishing this brain-symptom-outcome relationship, our work provides a pathophysiology-grounded framework for predicting treatment effects, thereby delivering actionable biomarkers to guide personalized therapy selection.

Our findings demonstrate symptom-specific neurobiological substrates underlying antidepressant treatment prediction across three depression subgroups. In Subgroup 2, cerebellar-visual circuit impairment predicted outcomes in five dimensions (overall efficacy, anxiety-somatization, cognitive impairment, psychomotor retardation, sleep disturbance), excluding hopelessness. This finding provides further support for the concept that cerebellar-visual abnormalities disrupt emotional-cognitive integration via two pathways: (1) amplifying negative visual biases via dysregulated limbic modulation (Ciapponi et al., 2023), and (2) impairing attention/visuospatial processing through compromised cerebellar-prefrontal interactions (Yildiz et al., 2010). Subgroup 1 exhibited frontoparietal-corticospinal alterations uniquely predicting hopelessness alongside anxiety-somatization and cognitive deficits. These associations may stem from disrupted executive control (frontoparietal network dysfunction) and failed emotional-somatic integration, where aberrant prefrontal-limbic connectivity biases reward processing and interoception, sustaining negative future expectations (Kaiser et al., 2015). In contrast, Subgroup 3 showed multi-tract WM pathology involving fornix, corpus callosum, cingulum, and ILF, specifically linked to severe cognitive impairment and psychomotor retardation. These structural disruptions form a disconnection syndrome that manifests as cognitive deficits through impaired information integration and psychomotor symptoms via disrupted motivation-motor coordination.

The baseline imaging data of the three subgroups can predict the treatment outcomes at 4 weeks in a mixed cohort of MDD patients receiving at least one of five antidepressant therapies. Prior studies reported that pretreatment baseline neuroimaging features could predict treatment response across diverse interventions (e.g., pharmacotherapy, neuromodulation therapy, and psychotherapy) (Long et al., 2025; Mao et al., 2025). Nevertheless, whether the imaging features predict outcomes generally or differ depending on specific treatments remains unclear. Our results support the generalizability of baseline imaging biomarkers. This cross-treatment validity may stem from shared neural mechanisms in MDD, as predictive features consistently involved limbic and default mode network, suggesting different therapies modulate common brain circuits. Ultimately, our findings imply that the pretreatment degree metrics of symptom-related network reflect robust network-level disturbances in MDD, capturing core pathophysiological states that influence treatment response beyond specific therapeutic modalities. These pretreatment features could serve as an initial screening tool to identify patients with a high risk of treatment resistance, avoiding unnecessary trial-and-error in clinical practice.

Data-driven identification of MDD subgroups through NMF-biclustering

We employed an NMF-based biclustering approach to identify localized WM alterations in MDD, yielding three robust WM patterns. These data-driven patterns demonstrated: (1) significant regional FA differences compared to HC; (2) strong associations with core depressive symptoms (r = 0.52 ∼ 0.92; supplementary Table S1); (3) predictive value for treatment outcomes.

This biclustering approach offers key advantages over conventional monoclustering method in the task of subtyping MDD: (1) captures localized WM alterations and preserves spatial heterogeneity through simultaneous sample-feature clustering; (2) identifies subgroup-specific features without requiring HC comparisons, avoiding the “averaging problem” inherent in classical subgroup stratification by the monoclustering method. This methodological advance enables more precise mapping of the neurobiological heterogeneity in MDD by identifying localized co-variation patterns that would be obscured in conventional case-control analyses.

Limitations

The limitations of our study include its reliance solely on WM features, which overlooks potential interactions with other modalities like cortical thickness or functional connectivity that could refine subtype characterization (Drysdale et al., 2017). Additionally, the coarse resolution of current WM atlases limits precise localization of alterations within fiber bundles, necessitating finer parcellation schemes. Future studies should employ multimodal imaging, larger multi-site cohorts, and standardized protocols to dissect confounding factors (e.g., scanner variability, sociogeographic influences) and validate the predictive models’ robustness.

Conclusion

Our study demonstrates that data-driven analysis of WM architecture robustly parses the heterogeneity of MDD into neurobiologically distinct biotypes. These subtypes are characterized by specific neuroanatomical signatures and clinical symptom profiles. The predictive power of baseline WM topology for treatment outcomes underscores its potential as an actionable biomarker, and their WM patterns may serve as a proxy for an individual’s neuroplasticity reserve, offering a mechanistic explanation for treatment efficacy.

Materials and methods

Participants

From March 2012 to July 2018, 311 MDD patients were recruited from Nanjing Brain Hospital (n=209, discovery dataset) and Nanjing Drum Tower Hospital (n=102, validation dataset). All patients met DSM-IV criteria (Association, 1994) as confirmed by the Chinese version of MINI, and their depressive symptoms were assessed using 24-HAMD (Hamilton, 1960). Medication-treated patients completed a 2-week washout period prior to baseline. Two trained psychiatrists conducted the 24-HAMD assessments at baseline and after 4 weeks of treatment. We also enrolled 100 age-, gender-, and handedness-matched HC from local communities. The demographic and clinical characteristics are presented in Table 3.

Demographic characteristic and clinical information of depressions and healthy controls in the current study.

The treatment strategies, determined by two experienced clinicians based on patients’ history and symptoms, included: (1) SSRIs monotherapy, (2) SNRIs monotherapy, (3) dual antidepressant therapy (SSRI+SNRI), (4) antidepressant combined with ECT therapy, or (5) antidepressant complemented by rTMS.

Exclusion criteria included: (1) substance abuse/dependence or major medical disorders, (2) history of neurological conditions or brain injury, and (3) MRI contraindications. All participants provided written informed consent after receiving detailed study information. The study protocol was approved by the Ethics Review Boards of both Nanjing Brain Hospital and Nanjing Drum Tower Hospital in accordance with the Declaration of Helsinki.

Imaging acquisitions and preprocessing

All MRI data were acquired using 3.0 T Siemens Verio scanners equipped with 12-channel head coils at both study sites. Consistent imaging protocols were implemented across sites for both dMRI and T1-weighted acquisitions. DMRI parameters included: 30 diffusion directions (b=1,000 s/mm2) with one b0 image, TR/TE=6600/93 ms, flip angle = 90, FOV = 240 × 240 mm2, matrix = 128 × 128, 3 mm slice thickness with no gap, and voxel size= 1.9 × 1.9 × 3 mm3. T1-weigthed imaging parameters were: TR/TE = 1900/2.48 ms, 1 mm slice thickness, flip angle = 9, inversion time = 900 ms, matrix = 256 × 256, FOV = 250 × 250 mm2, and isotropic 1 mm3 voxels.

All images underwent visual inspection for artifacts before standardized preprocessing. For both T1w and dMRI data, we performed: (1) axial alignment; (2) Gibbs ringing correction using local Subvoxel-Shifts (Kellner et al., 2016); and (3) N4ITK intensity inhomogeneity correction (Tustison et al., 2010). DMRI-specific processing included: (1) MP-PCA denoising (Veraart et al., 2016); (2) eddy current correction (FSL’s eddy_correct tool) (Jenkinson et al., 2012); (3) EPI correction; (4) CNN-based brain masking (pnlNipype); (5) T1w-dMRI alignment used FSL (Jenkinson et al., 2012) rigid registration while preserving native diffusion space. (6) Diffusion tensors were computed using the Stejskal-Tanner equation. Eigenvalue decomposition yielded fractional anisotropy (FA) maps.

Whole-brain probabilistic tractography was performed in MRtrix3 using the iFOD2 algorithm, with fiber orientation distributions first estimated using constrained spherical deconvolution. A total of 3 million seeds were employed to initiate streamline propagation, with all other tracking parameters set to their default values (Tournier et al., 2019). Subsequently, the resulting tractograms were processed using the Spherical-deconvolution Informed Filtering of Tractograms (SIFT) method (Smith et al., 2015, 2013) to obtain a refined and biologically meaningful set of 1 million streamlines.

For Tract Based Spatial Statistics (TBSS) analysis in FSL 6.0 (Smith et al., 2006), individual FA maps were nonlinearly registered to the FMRIB58_FA template. A mean FA skeleton was generated, thresholded at FA>0.2 to exclude partial volume effects, representing core WM tracts. Each subject’s FA data were projected onto this skeleton by identifying maximal FA values along directions perpendicular to the skeleton. This standardized approach ensured comparable WM analysis across all participants.

Data-driven identification of MDD subgroups

Biclustering identification via NMF in discovery dataset

Each patient’s FA-TBSS map was flattened into a feature vector, retaining only skeleton voxels. These vectors were then combined to form matrix X (109, 848 voxels × 209 subjects). Prior to NMF-based biclustering, we regressed out age and sex covariates from the X matrix to control for potential confouding variables. We employed the sparse NMF algorithm from Li (Li and Ngom, 2013) to decompose the TBSS-derived FA matrix XRp×n (p= voxels, n=subjects). The X was factorized into basis matrix ARp×k (k latent components, and its value ranged from 2 to 8 here) and coefficient matrix YRk×n, such that XAY. Each column of A represents a latent WM microstructural pattern formed by voxel combinations, while Y quantifies subject-specific expression levels of these patterns. Subject clustering was determined by identifying the dominant component (maximum value) in each column of Y, with basis vectors serving as cluster centroids.

To enhance feature interpretability, we reduced feature redundancy through sparsity constraints during decomposition and performed entropy-based feature selection on matrix A, retaining the top 20% voxels with highest discriminative power for subsequent biclustering. This dual optimization process simultaneously identifies: (a) cohesive WM patterns through column-wise clustering of A, and (b) patient subgroups via row-wise clustering of Y (Detailed implementation parameters are provided in Supplementary Methods Section: identification of biclusters based on NMF method).

To determine the optimal rank k that yields stable and meaningful clusters, we adopted Brunet’s consensus clustering approach (Brunet et al., 2004). For each rank k, we performed 100 iterations of matrix factorization to account for random initialization effects. Each iteration generated a connectivity matrix C (n × n), where Cij =1 if samples i and j clustered together and 0 otherwise. The consensus matrix was computed by averaging these 100 connectivity matrices, with entries representing the probability of pairwise sample co-clustering. We then performed hierarchical clustering (average linkage) on the consensus matrix and evaluated clustering stability using the cophenetic correlation coefficient, which quantifies the agreement between the consensus matrix distances and hierarchical clustering linkage distances. The optimal k was selected as the value maximizing this coefficient (Figure 7).

The pipeline of stratifying subgroup in the discovery dataset.

First, the TBSS method was used to extract FA values within the WM skeleton. NMF bi-clustering was then performed across a range of predefined ranks k (e.g., 2 ∼ 8) to generate corresponding clustering results (e.g., 2 ∼ 8 subgroups). Finally, the cophenetic correlation coefficient was employed to determine the optimal number of subgroups.

Bicluster assignment in independent validation dataset

To classify new patients from the validation cohort, we projected each individual’s data onto the pre-established basis matrix A to compute coefficient matrix Y, where maximal values in each column determine cluster membership. This assignment process was repeated 100 times to generate a test consensus matrix, mirroring the protocol from the discovery phase. Hierarchical clustering with average linkage was then applied, followed by computation of the corresponding cophenetic correlation coefficient. This was done to assess the stability of the clustering using the predetermined optimal rank (k) when applied to the validation cohort.

Subgroup characterization analysis

WM signature extraction

To identify subgroup-specific WM microstructural signatures, we developed a frequency-based feature selection protocol following NMF biclustering. Given the stochastic nature of biclustering assignments across 100 iterations (performed at the predetermined optimal rank k), we quantified voxel-wise feature consistency through the following steps (Figure 8):

  1. Feature occurrence mapping: Created a frequency matrix F (sample pairs × features) by incrementing entries when features co-occurred in biclusters across iterations, normalized to 0 - 1 probability values. Specifically, each row represented a unique sample pair, and each column corresponded to a voxel feature. For each biclustering run, if a feature (column) appeared in the same cluster as a sample pair (row), its entry in F was incremented by 1.

  2. Consensus-driven partitioning: Segmented F into k submatrices by grouping rows (sample pairs) according to their subgroup labels derived from the hierarchical clustering.

  3. Signature thresholding: For each subgroup-specific submatrix, retained features (columns) exceeding probability threshold (p > 0.9) were defined as characteristic WM signatures, ensuring > 90% consistency across biclustering realizations.

This approach transformed each run of biclustering solutions into spatially stable WM profiles, mitigating initialization-dependent variability inherent to NMF decomposition.

The flowchart of WM signature extraction.

a) Feature occurrence mapping was illustrated how to generate the frequency matrix F. b) The split of matrix F involved two parts: consensus-driven partitioning and signature thresholding.

FA-based case-control comparisons

To statistically validate the identified WM patterns, we conducted comparative analysis of mean FA values between each MDD subgroup and HCs within the signature WM profiles. These comparisons were performed independently in both discovery and validation cohorts using Wilcoxon rank-sum tests. Prior to analysis, we accounted for potential confounding effects by regressing out age and sex covariates.

Clinical phenotype differentiation

We conducted comprehensive clinical characterization of the MDD subgroups across discovery and validation datasets. First, we evaluated differences in depression severity (total 24-HAMD score) and symptom domain profiles (anxiety/somatization, cognitive impairment, psychomotor retardation, sleep disturbance, and hopelessness factor scores) (See Supplementary Materials for detailed definitions of symptom factors) using Kruskal-Wallis tests. Second, we examined illness course characteristics including recurrence frequency, current episode duration, total illness duration, and effective illness duration. Significant findings underwent post-hoc analysis using Wilcoxon rank-sum tests in both datasets. Before conducting the above comparisons, we regressed out age and sex effects. Third, we assessed treatment response heterogeneity by comparing responder rates (≥ 20% 24-HAMD reduction after 4 weeks) (Kennedy, 2022) across subgroups for five treatment strategies, analyzed by chi-square tests.

Predicting antidepressant treatment outcomes across MDD subgroups

Symptom-WM associations in discovery cohort

We employed sCCA to investigate associations between the 7 clinical symptom profiles (the above-mentioned 5 symptom domain profiles, plus weight and diurnal variations) and WM microstructural patterns within each MDD subgroup. The analytical pipeline comprised four stages:

  1. Dimensionality reduction: Given the high feature-to-sample ratio (> 5,000 voxels vs. Subgroup sizes), we implemented region-of-interest (ROI) clustering to address dimensionality. WM patterns were parcellated into spatially contiguous clusters (10 - 20 voxels) (Zalesky et al., 2010), with FA values averaged within each cluster. This ROI-based feature extraction was repeated to 10 times per subgroup to account for clustering stochasticity.

  2. sCCA implementation: Reduced-dimension data were analyzed via sCCA with cross-validation (CV) to identify maximally correlated linear combinations between clinical factors and WM features, generating canonical variate pairs.

  3. Statistical validation: Permutation testing (1, 000 iterations) assessed the significance of canonical correlations against null distributions.

  4. Feature stability assessment: Bootstrapping (1, 000 resamples) identified consistently contributing clinical and neuroimaging features across sCCA solutions.

To assess the robustness of our findings, the established sCCA workflow (Xia et al., 2018) was iterated over 10 different ROI parcellations per subgroup. We examined the stability of clinical loadings and model performance in held-out data. The technical details are provided in Supplementary Methods.

Following sCCA analysis, we identified WM ROI (WMcca) associated with 7 clinical symptom factors in each subgroup, retaining ROI appearing in ≥ 2/10 iterations. Fiber tracts passing through these WMcca regions were reconstructed into subgroup-specific WM networks (288 nodes, nodal definitions in Supplementary Excel 1 and are based on the three atlas: Yeo 17 - network, SUIT cerebellum, Tian’s subcortex (Diedrichsen et al., 2009; Schaefer et al., 2018; Tian et al., 2020), representing symptom-related neurocircuitry.

Subgroup-specific treatment outcome prediction

The nodes in the symptom-related network were categorized into 19 functional subsystems (e.g., Yeo 17 functional networks, cerebellum network, and subcortical network). Degree centrality values for each subsystem (Calculation details in Supplementary Methods) served as predictive features in support vector regression (SVR) models to forecast six treatment outcomes (e.g., percentage reductions in 24-HAMD total score and five core symptom domain profiles). All features were separately adjusted for age/sex covariates in both datasets prior to the prediction.

Model development employed epsilon-insensitive SVR with sigmoid kernel (Chang and Lin, 2011). Hyperparameter optimization via grid search explored: γ(default=1), coef0 (0.1 - 2, step = 0.01), ε (0.01 - 2, step = 0.01). Prediction performance was evaluated through nested leave-one-out CV (LOOCV), with grid search external to CV folds to prevent data leakage. Final model performance was quantified via Pearson correlation between predicted and observed values across all LOOCV iterations.

Model validation in independent cohort

The 18 subgroup-specific SVR models (3 subgroups × 6 treatment outcomes) trained in the discovery cohort were rigorously validated using the independent dataset. Fixed hyperparameters, optimized in the discovery phase, were retained during training of the SVR models. The external independent dataset underwent an identical feature preprocessing pipeline (including age/sex regression and degree centrality calculation) as applied to the discovery cohort. Model performance was quantified using the correlation between predicted and observed values for each model. This validation framework ensured generalizability while preventing overfitting through a complete separation between training and validation datasets.

Data availability

The code for the analyses presented in this paper are openly accessible on github: [https://github.com/qinjiaolong/MDDsubtype]. The subgroup-related data are accessible at [https://osf.io/3gauj].

Acknowledgements

Sincere appreciation is extended to the patients and control subjects for their valuable participation. We sincerely thank to Professor Chen Gong for his valuable suggestions of NMF-clustering method.

Additional information

CRediT Author Statement

Jiaolong Qin: Conceptualization, Methodology, Software, Formal analysis, Validation, Writing - Original Draft, Writing - Review & Editing

Xinyi Wang, Huangjing Ni, Ye Wu: Formal analysis, Data Curation

Haiyan Liu, Lingling Hua, Rui Yan, Hao Tang, Peng Zhao: Conceptualization, Investigation, Resources, Data Curation

Zhijian Yao: Conceptualization, Resources, Supervision

Qing Lu: Conceptualization, Methodology, Supervision, Writing - Review & Editing

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Funding

National Natural Science Foundation of China (81701346)

  • Jiaolong Qin

  • Ye Wu

National Natural Science Fundation of China (81871066)

  • Qing Lu

  • Ye Wu

National Natural Science Fundation of China (82151315)

  • Qing Lu

  • Ye Wu

National Natural Science Fundation of China (82271568)

  • Qing Lu

  • Ye Wu

Natural Science Foundation of Jiangsu Province (BK20190736)

  • Huangjing Ni

  • Ye Wu

National Key R&D Program of China (2023YFF1204803)

  • Ye Wu

National Natural Science Foundation of China (62201265)

  • Ye Wu

Key Project of Jiangsu Provincial Natural Science Fund (BK20253028)

  • Ye Wu

Additional files

Supplementary file 1

Supplementary file 2