Overview of the MGPfact workflow.

The complete workflow comprises two major stages: MURP downsampling of the preprocessed data, and trajectory reconstruction. During trajectory reconstruction, the scRNA-seq data are first factorized into independent bifurcation processes based on mixtures of Gaussian processes, which are then merged into a consensus trajectory.

Trajectory inference (TI) performance of state-of-the-art methods on 239 test datasets.

a. Overall scores; b. F1branches; c. HIM; d. cordist; e. wcorfeatures. All results are color-coded by trajectory type, with the black line representing the mean value. The “Overall” assessment is calculated as the geometric mean of all four metrics.
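The "Overall" score described above can be sketched in a few lines. This is a minimal illustration, assuming each of the four metrics has already been scaled to the range [0, 1]; the function name `overall_score` and its argument names are hypothetical and not taken from the MGPfact code.

```python
from statistics import geometric_mean


def overall_score(f1_branches, him, cor_dist, wcor_features):
    """Geometric mean of four metric scores, each assumed to lie in [0, 1]."""
    scores = [f1_branches, him, cor_dist, wcor_features]
    # A zero in any single metric drives the geometric mean to zero,
    # so handle it explicitly instead of passing it to geometric_mean.
    if any(s == 0 for s in scores):
        return 0.0
    return geometric_mean(scores)
```

Unlike an arithmetic mean, the geometric mean penalizes a method that performs poorly on any one metric, which is presumably why it is used to summarize the four scores.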

MGPfact outperformed state-of-the-art methods in F1branches. P-values based on one-sided paired t-tests suggest that the F1branches scores of MGPfact were significantly higher than those of the other methods for different trajectory types in the test set.

Trajectory inference (TI) performance of state-of-the-art methods on 68 test datasets from real cell populations.

a. Overall scores; b. F1branches; c. HIM; d. cordist; e. wcorfeatures. All results are color-coded by trajectory type, with the black line representing the mean value used for ranking all methods. The “Overall” assessment is calculated as the geometric mean of all four metrics.

MGPfact reconstructed the developmental trajectory of microglia, recovering known determinants of microglia fate.

a-c. The inferred independent bifurcation processes with respect to the unique cell types (color-coded) of microglia development, where phase 0 corresponds to the state before bifurcation, and phases 1 and 2 correspond to the states after bifurcation. The most highly weighted regulons in each trajectory are labeled by the corresponding transcription factors (left panels). The top three HWG of each bifurcation are significantly differentially expressed in phases 1, 2 and 3 (right panels). d. The most highly weighted regulons influencing the three developmental trajectories of microglia. e. The expression levels of the transcription factors of highly weighted regulons in each trajectory differ significantly among phases. f. The consensus developmental trajectory obtained by merging the three bifurcation processes. Point 0 denotes the start of differentiation, whereas the notation “n-m” denotes the m-th branch from branching point n. Each colored circle represents a landmark (MURP) of the trajectory, showing the fraction of cell types. The transcription factors of highly weighted regulons in each bifurcation process were used to label each branching point. In particular, PAM-T1 and PAM-T2 are the two newly defined subtypes of PAM. g. Selected differentially expressed genes between PAM-T1 and PAM-T2 (|logFC| > 0.25, adjusted P-value < 0.1) are shown as colored dots corresponding to the mean expression levels in either cell type. The IDs of validated marker genes for PAM are labeled in green. In all box plots, the horizontal line represents the median value, and the whiskers extend to the furthest data point within 1.5 times the interquartile range. Significance is denoted as: ns, not significant, * P < 0.05, ** P < 0.01, *** P < 0.001, **** P < 0.0001, Wilcoxon rank-sum test.
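The two thresholding conventions in this legend (the differential expression cutoffs and the significance symbols) can be sketched as follows. This is an illustrative sketch only: the function names, the `(gene, logfc, adj_p)` tuple layout, and the example gene names are assumptions, and the P-values are taken as already computed by whatever test was used.

```python
def significance_label(p_value):
    """Map a P-value to the symbol convention stated in the legend."""
    if p_value < 0.0001:
        return "****"
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"


def select_de_genes(results, min_abs_logfc=0.25, max_adj_p=0.1):
    """Keep genes with |logFC| > 0.25 and adjusted P-value < 0.1.

    `results` is a hypothetical iterable of (gene, logfc, adj_p) tuples.
    """
    return [
        gene
        for gene, logfc, adj_p in results
        if abs(logfc) > min_abs_logfc and adj_p < max_adj_p
    ]


# Usage with made-up values; the gene names are placeholders.
example = [("geneA", 0.8, 0.01), ("geneB", -0.4, 0.05), ("geneC", 0.1, 0.001)]
selected = select_de_genes(example)  # geneC fails the |logFC| cutoff
```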

The consensus trajectory by MGPfact offered better explanatory power for cell fate. R-squared values and P-values based on F-tests suggest that the consensus trajectory of MGPfact fits the experimentally characterized and annotated subtypes of CD8+ T cells significantly better than the baseline model (Monocle 2).

Highly weighted genes (HWG) of the bifurcation processes of CD8+ T cells serve as reliable indicators for clinical outcome and ICI treatment response.

a. Gene expression signatures (GES) corresponding to HWG in CD8+ T cell trajectories 1 and 2 in NSCLC predict overall survival of the TCGA-LUAD cohort. b. Gene expression signatures (GES) corresponding to HWG in CD8+ T cell trajectory 1 in CRC predict overall survival of the TCGA-COAD cohort. c. ROC curve showing that the weighted mean of HWG in trajectories 1 and 2 in NSCLC is significantly associated with ICI response across 3 independent studies. d. ROC curve showing that the weighted mean of HWG in trajectories 1 and 2 in CRC is significantly associated with ICI response across 4 independent studies.
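As a rough illustration of panels c and d, a per-sample score can be formed as a weighted mean of HWG expression and then evaluated against response labels by the area under the ROC curve. The sketch below assumes non-negative gene weights and binary labels (1 = responder, 0 = non-responder); the function names and data layout are hypothetical, and the scoring actually used in the study may differ.

```python
def weighted_mean_score(expression, weights):
    """Weighted mean of gene expression for one sample.

    `expression` and `weights` are dicts keyed by gene; only genes present
    in both are used. Weights are assumed to be non-negative.
    """
    genes = [g for g in weights if g in expression]
    total_weight = sum(weights[g] for g in genes)
    if total_weight <= 0:
        raise ValueError("no overlapping genes with positive total weight")
    return sum(weights[g] * expression[g] for g in genes) / total_weight


def roc_auc(scores, labels):
    """AUC as the probability that a responder outscores a non-responder.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for cohort-sized data and avoids any external dependency.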

MGPfact serves as an effective approach for characterizing new cellular subtypes.

a. The consensus trajectory of tumor-associated CD8+ T cells in NSCLC identified CD8-ZNF683-T1 and CD8-ZNF683-T2 as two subtypes of CD8-ZNF683, which are influenced by TBX21. b. Selected differentially expressed genes between CD8-ZNF683-T1 and CD8-ZNF683-T2 (|logFC| > 0.25, adjusted P-value < 0.1). c. High expression of CD8-ZNF683-T1 signatures predicts good overall survival in the TCGA-LUAD cohort (Methods). P-values were calculated by multivariate Cox regression analysis, and HR represents the hazard ratio. d. The consensus trajectory of tumor-associated CD8+ T cells in CRC identified CD8-GZMK-T1 and CD8-GZMK-T2 as two subtypes of CD8-GZMK. e. Selected differentially expressed genes between CD8-GZMK-T1 and CD8-GZMK-T2 (|logFC| > 0.25, adjusted P-value < 0.1). f. ROC curve showing that high expression of the CD8-GZMK-T1 signature is associated with ICI treatment response in three independent studies. The consensus trajectory is formed by merging three bifurcation processes. Each colored circle represents a landmark (MURP), indicating the fraction of cell types.