Figures and data

Overview of dataset and analysis workflow.
A Biopsies were taken before and after NACT (left), and multiple regions of interest (ROIs) were extracted from each biopsy, stained with metal-tagged antibodies and imaged using Imaging Mass Cytometry (IMC). Number of ROIs in each response class (middle) and total number of cells identified in the dataset (right). B All patients received EC-T chemotherapy, and some patients also received carboplatin (EC-T-carbo) (left). Distribution of patients over residual cancer burden (right). C Analysis workflow for IMC data showing an illustrative example of preprocessing, segmentation, cell phenotyping, spatial analysis, and response prediction. The scale bar is 150µm. Image colours show the following: 1. Preprocessing: intensity for a representative IMC channel. 2. Segmentation: representative IMC channels in green and blue (top) and segmentation mask (bottom). 3. Cell phenotyping: various IMC channels used for cell phenotyping (top) and cell type labels after segmentation (bottom). 4. Spatial analysis: schematic heatmap illustrating high (red) or low (blue) co-localisation between pairs of cell types (represented by cartoon drawings) (top) and cells coloured by neighborhood cluster label (bottom). 5. Response prediction: cells are nodes on the graph, edges represent contact between nearest neighbors, and each cell also carries a feature vector (here represented by columns) of protein expression and/or cell type label.

Data preparation: denoising, batch effects, segmentation.
A Denoised and contrast-enhanced image of a representative ROI. B Approximate overlap of data from different patients reflects removal of batch effects. Each point is a single cell. After contrast-limited adaptive histogram equalisation (CLAHE) the batches are no longer separated. C Representative raw image (left) and the same image with overlaid segmentation boundaries for nuclei and whole cells (right). Blue: DNA antibody; Green: Combination of E-Cadherin, Beta-Catenin, and Pan-keratin staining.

Cell type assignment and abundance.
A IMC channels used for annotation of cell type (phenotypic markers) and functional state (functional markers). B Annotated cell clusters. C Representative raw images of selected channels (Collagen, PanCK, B7H4, GrzmB, CD45, αSMA) and corresponding cell type labels (see legend). The scale bars are 200µm on the left and 150µm on the right. D Relative abundance of cell clusters across all ROIs, without subdividing proliferating cells. E Relative abundance of cell clusters across all ROIs, with the Cancer cell and B7H4 Cancer cell clusters subdivided into proliferating and non-proliferating cells based on Ki67 expression.

Spatial organization of cell types.
A Distribution of neighborhood enrichment score over patients for selected pairs of cell types. The score quantifies proximity of a pair of cell types relative to random permutation of cells’ positions in an image, with positive values indicating co-localisation above random chance. Violins show the distribution across patients, and for each patient the neighborhood enrichment score was averaged across all ROIs of that patient. Mann-Whitney-U test statistics: CD4-CD8: 813 (p=0.005), Cancer-macrophage/HLA: 42 (p=0.06), B7H4-Cancer: 616 (p=0.67), Proliferative Cancer-CD8: 333 (p=0.003). B Representative raw and annotated images showing the co-localisation of cancer cells (PanCK) and activated CD8+ T cells (GrzmB) in responders and non-responders. The scale bars are 125µm.
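The permutation-based score described in panel A can be illustrated with a minimal sketch. It assumes a fixed contact radius, a z-score against label shuffles (equivalent to permuting which cell sits at which position), and hypothetical function names; the radius, number of permutations, and exact normalisation used in the study are not stated in this caption.

```python
import math
import random
from statistics import mean, pstdev


def count_contacts(coords, labels, type_a, type_b, radius):
    """Count ordered pairs (i, j) with cell i of type_a and cell j of type_b within radius."""
    n = 0
    for i, (xi, yi) in enumerate(coords):
        if labels[i] != type_a:
            continue
        for j, (xj, yj) in enumerate(coords):
            if i != j and labels[j] == type_b:
                if math.hypot(xi - xj, yi - yj) <= radius:
                    n += 1
    return n


def neighborhood_enrichment(coords, labels, type_a, type_b,
                            radius=15.0, n_perm=200, seed=0):
    """Z-score of observed contacts against a label-permutation null (assumed definition)."""
    rng = random.Random(seed)
    observed = count_contacts(coords, labels, type_a, type_b, radius)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign cell types to the fixed positions
        null.append(count_contacts(coords, shuffled, type_a, type_b, radius))
    sd = pstdev(null)
    if sd == 0:
        return 0.0  # no variation under the null, so the score is undefined; report 0
    return (observed - mean(null)) / sd


# Toy image (made-up data): A and B cells intermixed, C cells far away.
grid = [(5.0 * x, 5.0 * y) for x in range(5) for y in range(4)]
coords = grid + [(x + 1000.0, y) for x, y in grid]
labels = ["A" if i % 2 == 0 else "B" for i in range(20)] + ["C"] * 20
score_ab = neighborhood_enrichment(coords, labels, "A", "B")
score_ac = neighborhood_enrichment(coords, labels, "A", "C")
```

In this toy example the A-B score is positive (co-localisation above chance) and the A-C score is negative (avoidance). Per-patient values would then be obtained by averaging the per-ROI scores, as the caption describes.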

Tissue state before and after treatment.
A Representative raw and annotated images from non-responder patients showing spatial organization of cell types (see legend). The pre- and post-treatment images are matched so that each row of images is from one patient. The scale bars are 200µm. B Fraction of cancer cells depending on B7H4 and proliferation (Ki67) expression. Bars show median and error bars show the standard error of the median. Stars represent the p-value for Kruskal-Wallis test as follows: *: 1.00e-02 < p <= 5.00e-02, **: 1.00e-03 < p <= 1.00e-02, ***: 1.00e-04 < p <= 1.00e-03. C Neighborhood enrichment in selected pairs of cell types. Violins show the distribution across patients, and for each patient the neighborhood enrichment score was averaged across all ROIs of that patient. Mann-Whitney-U test statistics: Macrophage-CD4: 117 (p=0.0006); APC-Treg: 81 (p=3e-5); Proliferative Cancer-CD8: 380 (p=0.05); Macrophage-fibroblast: 205 (p=0.1).
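The star thresholds in panel B and the Mann-Whitney U statistics in panel C can be made concrete with a short sketch. The function names are hypothetical. The label returned for p <= 1e-4 and the "ns" label for p > 0.05 are assumptions following a common convention, since the caption does not define them. The U statistic uses the pairwise-count definition for the first sample; which group was treated as the first sample in the study is not stated here.

```python
def p_to_stars(p):
    """Map a p-value to the star annotation listed in the caption."""
    if p <= 1e-4:
        return "****"  # assumption: not defined in the caption
    if p <= 1e-3:
        return "***"
    if p <= 1e-2:
        return "**"
    if p <= 5e-2:
        return "*"
    return "ns"  # assumption: label for non-significant results


def mann_whitney_u(x, y):
    """U statistic of sample x: pairs with x_i > y_j, ties counted as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Made-up per-patient enrichment values for two groups.
pre = [0.2, 0.5, 0.9]
post = [1.1, 1.4, 2.0]
u_pre = mann_whitney_u(pre, post)   # every post value exceeds every pre value
u_post = mann_whitney_u(post, pre)
```

The two U values always sum to the product of the group sizes, so a reported statistic is only interpretable together with the group sizes and the direction of the comparison.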

Differences in neighborhood enrichment between responders and non-responders pre-treatment, and in non-responders between pre- and post-treatment samples.
Pairs of cell types (second column) were ranked by the absolute value of their combined difference in neighborhood enrichment (fifth column), obtained by adding the difference in responders vs non-responders (third column) and the difference in pre- vs post-treatment (fourth column). A positive number means greater co-localisation in responders vs non-responders or in pre- vs post-treatment samples, and a negative number means greater co-localisation in non-responders vs responders or in post- vs pre-treatment samples.
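The ranking rule in this table can be sketched as follows. The function name and all numbers are hypothetical and chosen only to illustrate the arithmetic; they are not values from the study.

```python
def rank_pairs(diff_response, diff_treatment):
    """Rank cell-type pairs by |responder difference + treatment difference|."""
    rows = []
    for pair, d_resp in diff_response.items():
        d_treat = diff_treatment[pair]
        rows.append((pair, d_resp, d_treat, d_resp + d_treat))
    # Largest absolute combined difference first.
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows


# Made-up differences in neighborhood enrichment.
diff_response = {"X-Y": 1.0, "X-Z": -2.5, "Y-Z": 0.5}   # responders minus non-responders
diff_treatment = {"X-Y": 0.5, "X-Z": -1.0, "Y-Z": -0.5}  # pre- minus post-treatment
ranking = rank_pairs(diff_response, diff_treatment)
top_pair = ranking[0][0]
```

Because the two differences are added before taking the absolute value, a pair whose differences have opposite signs (here "Y-Z") drops to the bottom of the ranking even if each difference is individually non-zero.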

Collagen structure.
A Representative example of segmentation of a collagen image: (left) original pre-processed collagen image, (middle) thresholded image, (right) application of the Canny filter. B Comparison of distributions of collagen features pre- and post-treatment for non-responder patients only: (left) collagen fibre curvature (t-stat = −2.03, p-value = 0.049), (right) anisotropy (t-stat = −2.82, p-value = 0.007). Boxplots display median, upper and lower quartiles.


Comparison of patient-level predictions of response to chemotherapy.
GNN prediction accuracy (mean and standard deviation of AUROC scores, higher is better) using different graph construction methods and node features.
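As a rough illustration of how such a summary could be computed, the sketch below averages ROI-level scores per patient, computes AUROC from pairwise comparisons, and summarises several runs by mean and standard deviation. The aggregation by mean, the use of the population standard deviation, and all names and numbers are assumptions; the caption does not specify how the study aggregated ROIs or runs.

```python
from collections import defaultdict
from statistics import mean, pstdev


def patient_scores(roi_scores, roi_patient):
    """Average ROI-level predicted scores per patient (assumed aggregation rule)."""
    grouped = defaultdict(list)
    for score, patient in zip(roi_scores, roi_patient):
        grouped[patient].append(score)
    return {patient: mean(scores) for patient, scores in grouped.items()}


def auroc(labels, scores):
    """Probability that a positive case scores above a negative one (ties count half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarise(aurocs):
    """Mean and population standard deviation over repeated runs or folds."""
    return mean(aurocs), pstdev(aurocs)


# Made-up example: three ROIs from two patients, then a perfect ranking.
per_patient = patient_scores([0.8, 0.6, 0.2], ["p1", "p1", "p2"])
perfect = auroc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
avg, spread = summarise([0.5, 0.7])
```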

Response prediction and interpretability.
A Graph neural network (GNN) model architecture integrating cell-cell contact graphs and protein abundance profiles to predict response to therapy. Each cell serves as a node with its 35-protein expression profile as features, connected by spatial proximity edges. Images on left show a representative IMC channel (scale bar is 200µm) and corresponding cell-cell contact graph, colored by cell type label (see legend). B Interpretability analysis for an example responder ROI using GNNExplainer yields a minimal predictive subgraph (right), and the ten protein markers contributing most to prediction performance (bottom). C-D Patient-level aggregation of protein marker importance scores (C) and cell type importance scores (D) associated with prediction of chemotherapy response, aggregated across all test samples from GNNExplainer analysis for each individual sample.
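The graph input described in panel A can be sketched in a few lines. This is an illustrative sketch only: it assumes k-nearest-neighbor edges, a single round of mean aggregation over each cell and its neighbors, and a mean readout over cells. It is not the model architecture used in the study, and all names, coordinates, and feature values are hypothetical.

```python
import math


def knn_edges(coords, k=3):
    """Undirected edges linking each cell to its k nearest neighbors."""
    edges = set()
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def mean_aggregate(features, edges):
    """One message-passing step: average each cell's features with its neighbors'."""
    neighbors = [[i] for i in range(len(features))]  # each cell includes itself
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbors
    ]


def graph_readout(features):
    """Pool cell-level features into one vector per ROI by averaging."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


# Toy ROI: four cells in two spatially separate pairs, two features per cell
# (the study uses a 35-protein profile per cell).
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
edges = knn_edges(coords, k=1)
hidden = mean_aggregate(features, edges)
roi_vector = graph_readout(hidden)
```

A trained GNN would replace the plain averaging with learned transformations and feed the pooled vector into a classifier; the sketch only shows how spatial proximity determines which cells exchange information.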
