Figures and data

Identification and validation of a brain metastasis (BrM) signature in epithelial cells.
(A) Workflow overview illustrating the strategy for identifying the BrM signature from single-cell RNA sequencing (scRNA-seq) data across multiple cancer types. (B) UMAP plot illustrating the full multi-cancer dataset prior to subsetting and model training. Cells are coloured by label, cancer type and cell type. (C) Representative UMAP plots demonstrating epithelial cell selection for downstream analysis. Cells are coloured according to their cancer type origin from primary tumour or BrM samples. (D) Detailed schematic of the ScaiVision network architecture used for model training. (E) Scatter plots summarising area under the curve (AUC) scores for the predictive models across training and validation datasets. The threshold (training AUC > 0.9 and validation AUC > 0.8) used for selecting high-performing models is indicated by a dashed line.
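
The model selection rule in (E) can be illustrated with a minimal sketch. It assumes the first threshold refers to the training AUC; the `ModelScore` type, the model names and the AUC values are hypothetical and serve only to show the filter.

```python
from typing import List, NamedTuple


class ModelScore(NamedTuple):
    """Hypothetical container for one trained model's AUC scores."""
    name: str
    train_auc: float
    val_auc: float


def select_models(scores: List[ModelScore],
                  train_min: float = 0.9,
                  val_min: float = 0.8) -> List[ModelScore]:
    """Keep models whose training and validation AUC both exceed the thresholds."""
    return [m for m in scores if m.train_auc > train_min and m.val_auc > val_min]


# Toy values, not results from the study.
scores = [
    ModelScore("model_1", 0.95, 0.85),  # passes both thresholds
    ModelScore("model_2", 0.99, 0.70),  # fails the validation threshold
    ModelScore("model_3", 0.88, 0.90),  # fails the training threshold
]
selected = select_models(scores)
print([m.name for m in selected])
```

Requiring both thresholds at once discards models that fit the training data well but generalise poorly, as well as models that underfit.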

Prediction of differential features of BrM using Machine Learning.
(A) Chart showing the top 20 genes ranked by Integrated Gradients attribution scores, highlighting their contribution to model predictions. These genes represent the core components of the identified BrM signature (complete list provided in Supplementary Table 2). Points show the mean scores and error bars indicate the standard deviation of scores across models (n = 223). (B) UMAP projection of epithelial cells, coloured according to BrM signature scores calculated using UCell. Higher scores indicate greater inferred metastatic potential, highlighting cell populations from confirmed BrM samples and subsets within primary tumours. (C) Violin plots comparing BrM signature scores (UCell) for epithelial cells between primary and BrM samples. (D) Violin plots comparing BrM signature scores (UCell) for epithelial cells among cancer types. (E) Gene Ontology (GO) enrichment analysis illustrating biological processes significantly enriched in high-scoring epithelial cells (top 20%) compared to low-scoring epithelial cells (bottom 20%), derived from differential pseudobulk gene expression analysis. Dot size indicates gene count, and colour gradient represents adjusted p-values. Notably enriched pathways include extracellular matrix organisation, cell-cell adhesion, and synapse-related processes linked to metastatic capability. (F) Monocle 2 trajectory plot showing inferred pseudotime progression for cells from paired primary lung and BrM samples. Cells are coloured by sample origin, BrM signature score (UCell) and inferred pseudotime.
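
The summary shown in (A), a mean attribution score per gene with a standard deviation across models, could be computed along the following lines. This is a sketch under assumptions: genes are ranked by their mean score, the sample standard deviation is used, and the gene names and values are placeholders.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def rank_genes(attributions: Dict[str, List[float]],
               top_k: int = 20) -> List[Tuple[str, float, float]]:
    """Return (gene, mean, sd) for the top_k genes by mean attribution.

    `attributions` maps each gene to its per-model attribution scores
    (at least two models are needed for the sample standard deviation).
    """
    summary = [(gene, mean(vals), stdev(vals)) for gene, vals in attributions.items()]
    summary.sort(key=lambda row: row[1], reverse=True)  # highest mean first
    return summary[:top_k]


# Placeholder genes and scores, not values from the study.
toy = {
    "GENE_A": [0.9, 1.1],
    "GENE_B": [0.2, 0.4],
    "GENE_C": [0.5, 0.7],
}
top = rank_genes(toy, top_k=2)
print(top)
```

Whether genes should be ranked by the signed mean or by its absolute value depends on how negative attributions are interpreted; the sketch uses the signed mean.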

BrM signature scoring and cell-cell communication network analysis across the multi-cancer dataset.
(A) CellChat pathway analysis comparing signalling pathway activity between high-scoring and low-scoring cells. Bar size represents relative pathway strength between the two conditions; colour indicates activity in high- or low-scoring cells (red = high, blue = low). Significant pathways are coloured on the y-axis. (B) CellChat network heatmaps comparing the architecture of VEGF signalling pathways specifically within high-scoring cells derived from primary tumour sites and BrM sites. (C) Plot showing significantly enhanced activity scores for VEGF ligand-receptor pairs, particularly VEGFA-VEGFR1, in high-scoring cells compared to low-scoring cells, based on LIANA analysis. (D) Box plot showing UCell scores for a literature-derived VEGF signalling target gene set across cells binned into deciles based on their BrM signature score.
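
The decile binning described in (D) can be sketched as a rank-based assignment. The function below is an illustration, not the authors' implementation; it assumes bins of roughly equal cell counts, with ties broken by input order.

```python
from typing import List


def decile_bins(scores: List[float], n_bins: int = 10) -> List[int]:
    """Assign each score to a bin from 1 (lowest) to n_bins (highest).

    Cells are ranked by score and the ranks are split into n_bins groups
    of roughly equal size. Tied scores keep their input order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(scores) + 1
    return bins


# Toy scores for 20 cells: two cells fall into each decile.
toy_scores = [0.05 * i for i in range(20)]
toy_bins = decile_bins(toy_scores)
print(toy_bins)
```

The same function with `n_bins=5` gives quintiles, so the top 20% and bottom 20% of cells correspond to bins 5 and 1.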

Gene regulatory network analysis implicates VEGF signalling drivers.
(A) Dot plot illustrating transcription factor (TF) degree centrality within the gene regulatory network (GRN) inferred by CellOracle for the highest BrM score bin from paired primary/BrM samples. (B) Line plots showing the change in network connectivity (degree centrality) for key TFs (MYC, STAT1, ETS1) across nine BrM score bins. (C) Violin plot showing ETS1 gene expression distribution across BrM score bins in the multi-cancer scRNA-seq dataset. (D) Box plots showing UCell scores for ETS1 target genes (derived from CellOracle GRN) across BrM score bins in the multi-cancer scRNA-seq dataset.
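
Degree centrality, as plotted in (A) and (B), measures how many regulatory links a node has relative to the maximum possible. The sketch below uses the standard graph-theoretic definition (in-degree plus out-degree, divided by n - 1); it is not necessarily how CellOracle computes the score internally, and the node names and edges are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def degree_centrality(edges: List[Tuple[str, str]]) -> Dict[str, float]:
    """Normalised degree centrality for a directed network.

    Each (source, target) edge adds one to the degree of both nodes.
    Assumes no duplicate edges and at least two nodes.
    """
    degree: Dict[str, int] = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}


# Hypothetical TF -> target edges, not edges from the inferred GRN.
toy_edges = [
    ("TF_A", "gene_x"),
    ("TF_A", "gene_y"),
    ("TF_B", "gene_x"),
    ("TF_C", "TF_A"),
]
centrality = degree_centrality(toy_edges)
print(centrality)
```

Computing this separately for the network inferred in each score bin yields one value per TF per bin, which is the kind of series a line plot such as (B) displays.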

Spatial transcriptomics analysis reveals spatial organisation of BrM potential and VEGF signalling.
(A) Spatial feature plots showing BrM signature scores (UCell) mapped onto a tissue section from a colorectal cancer BrM patient. Colour scale indicates BrM score. Key anatomical regions are annotated. (B) Gene Ontology (GO) enrichment analysis results for biological processes upregulated in high-scoring regions (top 20% BrM score) compared to other regions, based on spatially resolved differential gene expression analysis. Dot size represents gene count; colour indicates significance level. (C) CellChat VEGF network plots projecting inferred VEGF pathway activity scores onto tissue sections. Arrow direction indicates signalling flow; line width indicates interaction strength. (D) CellChat network heatmap plot summarising inferred VEGF signalling interactions between annotated tissue regions. (E) Spatial feature plots showing the enrichment score for the VEGFA-VEGFR1 ligand-receptor interaction pair mapped onto tissue sections. Colour scale indicates expression of each ligand.

Pazopanib as a repurposing candidate for targeting VEGF signalling in brain metastasis.
(A) Bar plots summarising ASGARD drug repurposing results. Y-axis indicates the predicted drug score (reversal potential). (B) Gene Ontology (GO) enrichment of predicted pazopanib target genes (identified by ASGARD) within high BrM-scoring cells across different cell types.

BrM signature aligns with metastatic progression trajectory and is detectable in tumour-educated platelets (TEPs).
(A) Box plots comparing the expression levels of VEGF pathway genes in bulk RNA-seq data from TEPs of metastatic patients, primary tumour-only patients, and healthy controls across seven cancer types. Significance determined by t-test (* p < 0.05, ** p < 0.01, ns = not significant). (B) Box plots comparing the expression levels of the 20 BrM signature genes in bulk RNA-seq data from TEPs across the same patient groups as in (A). Significance determined by t-test (* p < 0.05, ** p < 0.01, ns = not significant).
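
The significance annotation used in both panels maps a p-value to a label. A minimal sketch of that mapping follows; the function name is hypothetical, and the p-value itself is assumed to come from a separate t-test.

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to the annotation used in the legend.

    '**' for p < 0.01, '*' for p < 0.05, otherwise 'ns' (not significant).
    """
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"


for p in (0.003, 0.03, 0.2):
    print(p, significance_label(p))
```

The stricter threshold is checked first so that a p-value below 0.01 receives two asterisks rather than one.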

Model training and BrM signature performance in epithelial cells.
(C) Area under the ROC curve for training and validation samples across all folds, indicating the performance of models on training and validation data. Each point represents a separate model with its own set of hyperparameters. (D) Model accuracy for training and validation samples across all folds at a classification threshold of 0.5. Each point represents a separate model with its own set of hyperparameters. (E) Cross-entropy log-loss for validation samples across all folds. Each point represents a separate model with its own set of hyperparameters. (F) Violin plots comparing BrM signature scores (UCell) for epithelial cells between cancer types.
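
The three metrics reported in (C) to (E) have standard definitions for binary classifiers, sketched below in plain Python. The labels and predicted probabilities are toy values, and the functions illustrate the definitions only; they are not the code used to evaluate the models.

```python
import math
from typing import List


def roc_auc(labels: List[int], probs: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels: List[int], probs: List[float], threshold: float = 0.5) -> float:
    """Fraction of samples classified correctly at the given threshold."""
    correct = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return correct / len(labels)


def log_loss(labels: List[int], probs: List[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy, with probabilities clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


# Toy data: 1 = BrM, 0 = primary (labels chosen for illustration only).
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(toy_labels, toy_probs))
print(accuracy(toy_labels, toy_probs))
print(log_loss(toy_labels, toy_probs))
```

AUC depends only on the ranking of predictions, whereas accuracy depends on the chosen threshold and log-loss on the calibration of the probabilities, which is why the three panels can disagree about which models perform best.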

Extended BrM signature and pseudotime-based enrichment analysis.
(A) UMAP plots of scRNA-seq data from paired primary lung and BrM samples, coloured by endpoint label, original patient ID and BrM signature score (UCell). (B) Violin plot highlighting higher BrM scores in BrM samples. (C) GeneSwitches plot identifying genes significantly activated ('on', positive R²) or inactivated ('off', negative R²) along the pseudotime trajectory. Key genes are labelled. Colour indicates the annotated gene type. (D) Gene Ontology (GO) enrichment analysis results for genes dynamically regulated across pseudotime.

Extended cell-cell communication analysis.
(A) UMAP plots showing BrM signature scores (UCell) projected onto all cells across multiple cancer types. Colour scale indicates BrM score. (B) Violin plots comparing BrM signature scores (UCell) between cancer types and endpoint labels in the multi-cancer full scRNA-seq dataset. (C) Violin plots comparing BrM signature scores (UCell) between cell types in the multi-cancer full scRNA-seq dataset. (D) Gene Ontology (GO) enrichment analysis results for biological processes upregulated in high-scoring (top 20%) compared to low-scoring (bottom 20%) cells across all cell types, based on differential pseudobulk gene expression analysis. Dot size represents gene count; colour indicates significance level. (E) CellChat VEGF network visualisation for the consensus cell types shared between low-scoring and high-scoring cells, together with the ligand interactions among these consensus cell types. (F) Scatter plot illustrating the cell-wise correlation between BrM signature scores (UCell) and VEGF target gene set scores (UCell). Correlation coefficient (R) and p-value are indicated.
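
The cell-wise correlation in (F) can be illustrated with a short sketch. The legend does not state which coefficient was used; the example assumes Pearson's R, and the two score vectors are toy values.

```python
import math
from typing import List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy per-cell scores (one entry per cell), not data from the study.
brm_scores = [0.1, 0.2, 0.3, 0.4]
vegf_scores = [0.2, 0.4, 0.6, 0.8]
r = pearson_r(brm_scores, vegf_scores)
print(r)
```

If the relationship is monotonic but not linear, a rank-based coefficient such as Spearman's would be the more appropriate choice.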

Additional spatial transcriptomics results.
(A) Spatial feature plots showing BrM signature scores (UCell) mapped onto additional tissue sections or regions from the colorectal cancer BrM patients. (B) Additional CellChat network heatmap plots summarising inferred VEGF signalling interactions between annotated tissue regions. (C) Additional CellChat spatial interaction plots detailing VEGF signalling interactions between annotated tissue regions in specific samples.