Figures and data

A single cell atlas of healthy colon and dysplastic lesions in mouse. A. Study overview. B. Major cell subsets of healthy colon and dysplastic lesions. 2D embedding of 48,115 single cell profiles colored by cluster (top, legend) or annotated cell type (bottom, legend). C,D. Changes in cell composition in dysplastic tissues. C. 2D embedding of single cell profiles, showing only the cells in each condition state, subsampled to equal numbers of cells per condition state, colored by cluster (same legend as in B (top)). D. Proportion of cells (y axis) of each cell type in each sample (x axis). E. multiplex RNA in situ analysis. Representative images of Cartana analysis of normal colon (left) and AV lesions (right) colored by cell type assignment (same as Supplementary item 2A).

Composition and cell intrinsic expression program changes in dysplastic epithelial cells. A-C. Compositional changes in epithelial cells in dysplastic tissue. A. 2D embedding of epithelial cell profiles colored by clusters (legend). Cluster Epi06: doublets (not called by Scrublet88). B. Proportion of cells out of all epithelial cells (y axis) of each epithelial cell subset in each sample (x axis). C. Fraction of expressing cells (dot size) and mean expression in expressing cells (dot color) of marker genes (columns) for each cluster (rows). D-E. Use of epithelial cell programs changes in dysplastic tissue. D. Weights (x axis) of each of the 20 top ranked genes (y axis) for each program. E. Proportion of program weights summed over all epithelial cells (y axis) in each sample (x axis). F. Stem cell program 16 is induced in epithelial cells in dysplastic tissue. Scaled log-normalized expression (color bar) of the top 100 genes differentially expressed between cells from normal colon and from dysplastic (AV and AKPV) across the 10,812 cells that accounted for 90% of program 16’s expression across all epithelial cells (columns). Selected program genes are marked.

Altered cell type neighborship in CRC. A,B. Cell type neighborships in normal and dysplastic colon tissue. Short-range (up to 20µm) neighborship enrichment (Z score, color bar) vs. a background of spatially random annotation assignments for each pair of cell annotations (rows, columns) in normal (A) and dysplastic (B) tissue. C. Cell type distributions in situ. Slide-seq pucks of dysplastic (top) and normal (bottom) tissue colored by TACCO assignment of cell labels (legend, light grey: low quality beads) (x and y axis: spatial coordinates in μm).

Three cellular neighborhoods associated with tumor progression. A. Spatial regions. Slide-seq pucks of AV (top) and normal (bottom) mouse colon colored by TACCO regions (legend, light gray: low quality beads) (x and y axis are spatial coordinates in μm). B,C. Enrichment and depletion of cell subsets and epithelial programs across different regions. Significance (FDR, color bar, two-sided Welch’s t-test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of specific cell subsets (rows, B) or epithelial cell programs (rows, C) in the different regions defined by TACCO (columns) as well as all normal (“N (ref.)”, leftmost column) and AV (“AV (ref.)”, leftmost column) samples. D. TACCO defined regions preferentially relate to normal or AV tissue. Significance (FDR, color bar, two-sided Welch’s t-test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of each TACCO defined region (rows) in normal (“N vs. rest”) and AV (“AV vs. N”) samples (columns). E. TACCO reveals normal colon architecture. Left: Slide-seq puck of normal mouse colon colored by TACCO region annotations (legend) (x and y axis: spatial coordinates (μm)). Right: Main epithelial expression programs enriched in each region (FDR<6.3 10-4, two-sided Welch’s t test on CLR-transformed compositions) except region 2 (muscularis), which is characterized by non- epithelial (stromal) cell types. F. Expression signatures of cells in normal regions 3,5,10 and 12. Scaled log-normalized expression of the top 20 differentially expressed genes (rows) for each bead (columns) in the region. G,H. Malignant-like regions. G. Slide-seq pucks of two AV lesions colored by TACCO annotations of malignant-like regions 6, 8 and 11. H. Scaled log-normalized expression of the top 20 differentially expressed genes (rows) of each bead (left, columns) in the region; or epithelial (middle left), immune (middle right) or stromal (right) fractions of beads (columns) in regions 6, 8 and 11 in dysplastic lesions. I,J. Epithelial cell subsets and programs associated with “malignant-like”, “normal-like” and normal tissues. Significance (FDR, color bar, two-sided Welch’s t-test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of epithelial cell programs (I, rows) or epithelial, immune and stromal cell subsets (J, rows) in different tissue types (columns) based on Slide-seq or scRNA-seq (“sc”) samples. K. Inferred interaction pathways. Enrichment (FDR) in AV vs. normal tissue of corresponding “sender” (x axis) and “receiver” (y axis) (aggregated over ligand-receptor pairs) pathways (dots). Red/blue: pathways significantly enriched in AV/normal samples and (light red: only for either sender or receiver). The top 5 enriched pathways (in each direction) are labeled.

Mouse tumor regions associated with tumor progression in human colorectal tumors. A. Expression profiles characterizing mouse regions are recapitulated in human tumors. Significance (FDR, color bar, two-sided Welch’s t-test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of region-associated epithelial, immune or stromal profiles (rows) compared between normal, MMRp, or MMRd samples (columns). B,C. Mouse regions capture malignant features in human tumors. B. Top left: First (PC1, x axis) and second (PC2, y axis) principal components of mouse region scores of mouse and human epithelial pseudo-bulk samples. Top right: PC1 loadings (x axis) of each mouse region score (y axis). Bottom: PC1 values (box plots show mean, quartiles, and whiskers for the full data distribution except for outliers outside 1.5 times the interquartile range (IQR)) for each type of mouse or human sample (x axis). C. Significance (FDR, color bar, two-sided Welch’s t-test) of enrichment (red) or depletion (blue) of region-associated profile scores (rows) in normal and dysplastic samples (columns) in human or mouse. D,E. Expression of malignant like regions 6 and 11 in tumors are associated with PFI (D) and OS (E) in human patients. Kaplan-Meier PFI (e, n = 66295) or OS (f, n = 66295) analysis of human bulk RNA-seq cohort stratified by malignant-like region profile scores.

Marker genes, cell states and cell types in the healthy and dysplastic mouse colon atlas. A. Cell-type expression signatures. Scaled log-normalized expression (color bar) of the top 20 differentially expressed genes (rows) in cells (columns) from each cell type. B,C. Distinct condition states and compartments. 2D embedding of all single cell profiles (dots) colored by either condition state (B) or compartment (C). D. Changes in cell composition between healthy and dysplastic tissue. Proportion of cells (y axis) from each cell subset (x axis) out of all cells in each sample (x axis). E. Changes in cell composition in dysplastic lesions. Significance (FDR, color bar, two-sided Welch’s t-test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of each cell subset (rows) between samples from different conditions (columns). F-H. Multiplex in situ RNA profiles (Cartana) reproduce scRNA-seq and Slide-Seq findings. F. Proportion of RNA molecules (y axis) attributed to each cell type in each sample (x axis) in two Cartana experiments, with different gene panels. G. Change in cell proportions in AV vs. normal samples (difference of CLR-transformed cell type fractions, x and y axes) for each cell type (dots, color) based on scRNA-seq, Slide-Seq or Cartana (axis labels). Error bars: bootstrapped standard error of the mean (normal n=8, 16, 12, 16 and AV n=11, 24, 12, 12 for scRNA-seq, Slide-seq, Cartana V1, Cartana V2). H. Change in marker gene expression (dots) in AV vs. normal samples (difference of CLR-transformed gene fractions, x and y axes) based on scRNA-seq, Slide-Seq or Cartana (axis labels), for genes measured by all 3 methods. Genes with the 8 maximal and minimal expression ranks across methods are labeled. Error bars: bootstrapped standard error of the mean (normal n=8, 16, 12, 16 and AV n=11, 24, 12, 12 for scRNA-seq, Slide-seq, Cartana V1, Cartana V2).

Compositional and cell intrinsic changes in stromal and immune cells. A-C. Immune cell subsets and composition. A. 2D embedding of immune cell profiles colored by clusters (as in B). B. Proportion of cells out of all immune cells (y axis) of each immune cell subset in each sample (x axis). C. Scaled log-normalized expression (color bar) of the top 20 differentially expressed genes (rows) in cells (columns) from each immune cell subset (color bar on top). D-F. Monocyte and macrophage cell subsets and composition. D. 2D embedding of monocyte and macrophage cell profiles colored by clusters (as in E). E. Proportion of cells out of all monocytes and macrophages (y axis) of each monocyte and macrophage cell subset in each sample (x axis). F. Scaled log-normalized expression (color bar) of the top 20 differentially expressed genes (rows) in cells (columns) from each monocyte and macrophage cell subset (color bar on top). G-I. T/NK cell subsets and composition. G. 2D embedding of T/NK cell profiles colored by clusters (as in H). H. Proportion of cells out of all T/NK cells (y axis) of each T/NK cell subset in each sample (x axis). I. Scaled log-normalized expression (color bar) of the top 20 differentially expressed genes (rows) in cells (columns) from each T/NK cell subset (color bar on top). J. Enrichment of IL17+ gdT cells (TNK05) and depletion of CD8+ gdT cells (TNK01) in dysplastic lesions. Significance (FDR, color bar, two-sided Welch’s t-test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of each T/NK cell subset (rows) between samples from different conditions (columns). K. T/NK cell compositions is similar in human and mouse. Left: 2D embedding of T/NK cell composition profiles of human and mouse samples colored by sample type (legend) (STAR Methods). Right: Similarity of T/NK cell composition (enrichment z-scores) in the 2D embedding between each set of samples (rows, columns). L-N. Stromal cell subsets and composition. L. 2D embedding of stromal cell profiles colored by clusters (legend). M. Proportion of cells out of all stromal cells (y axis) of each stromal cell subset in each sample (x axis). N. Scaled log-normalized expression (color bar) of the top 20 differentially expressed genes (rows) in cells (columns) from each stromal cell subset (color bar on top). O. Enrichment of vascular endothelial cells (Endo01) and depletion of myofibroblasts (Fibro02) in dysplastic lesions. Significance (FDR, color bar, two-sided Welch’s t test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of each stromal cell subset (rows) between samples from different conditions (columns). P. Increased VEGFA expression in monocyte-macrophage populations with dysplasia. Distribution of expression (y axis, log1p(counts)) of VegfA in monocytes and macrophages from different conditions (x axis).

Changes in T/NK and epithelial cell differentiation in dysplastic lesions. A. T/NK cell differentiation. 2D embedding of T/NK cell profiles from normal tissues (left) and dysplastic lesions (right), colored by cell subset. Streamlines: averaged and projected RNA velocities. B,C. Proliferating stem-like cells give rise to an expanded stem-like compartment and differentiated-like tumor cells in dysplastic lesions. 2D embedding of epithelial cell profiles from normal tissues (B, left and C, top) and dysplastic lesions (B, right, and C, bottom), colored by cell subset (B), or by expression of epithelial programs (C). Streamlines: averaged and projected RNA velocities. Outlined dots (B): Proliferative cells (cells with more than 50% program weight in the proliferation programs #3 and #11).

Changes in cell composition and expression programs usage in dysplastic epithelium. A-D. Changes in epithelial cell composition in dysplastic tissues. A. 2D embedding of all single cell epithelial profiles showing only the profiles (dots) of cells from each condition state, subsampled to equal numbers of cells per condition state, colored by cluster. B. PCA captures differentiation status and lineage of epithelial clusters. First (PC1, x axis) and second (PC2, y axis) principal components of the expression of epithelial clusters. C. Mean expression (dot color) and fraction of expressing cells (dot size) of epithelial cell marker genes97 (columns) in each cluster (rows). D. Significance (FDR, color bar, two-sided Welch’s t test on ALR-transformed compositions with all non-tdTomato counts used as reference compartment) of enrichment (red) or depletion (blue) of tdTomato expression in cells from AKPVT samples between every pair of epithelial clusters. E. Dysplastic secretory like (Epi05) and immune cells express tumor-related genes. Distribution of expression (y axis, log1p(counts)) of different marker genes in cells from each epithelial/immune cell cluster (x axis). F-L. Epithelial gene programs. 2D embedding of all epithelial cells colored by the weight of each program (color bar) and the expression of selected program genes (color bar). M. Epithelial program characteristics of normal and dysplastic colon. Heatmap significance (FDR, color bar, two-sided Welch’s t test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of each epithelial program (rows) between normal vs. dysplastic tissues (columns).

Conservation of cellular composition and expression programs between mouse and human scRNA-seq data. A,B. Conservation of general and subtype-specific human and mouse epithelial expression programs. A. Pearson correlation coefficients (color) between program-specific expression profiles (Methods) of human (all samples, left, MMRd, middle and MMRp, right)) (rows) and mouse programs (columns). B. Mouse epithelial program enrichments in mouse and human tumor and normal samples. Significance (FDR, color bar, two-sided Welch t-test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of mouse epithelial program expression (rows) in sample classes from human or mouse (columns). C,D. Human-mouse conservation of cell type and program associations. Pearson correlation coefficients (color) of the CLR- transformed cell type (C) or epithelial program (D) compositions across samples in mouse (left) or human (right) single cell data. In (D), data are hierarchically clustered for the “not normal” mouse case (AV and AKPV) and this ordering is applied to all other panels.

Spatial distributions of cells and programs across regions. A-C. Slide-seq quality controls. Slide-seq pucks of AV (top) and normal (bottom) mouse colon colored by number of UMIs (A) or of genes (B) per bead. (x and y axis: spatial coordinates (μm)). C. H&E staining of sections adjacent to those used for Slide-seq. D. Selected marker gene expression. Slide-seq pucks from a normal (top) and AV (bottom) sample, colored by marker gene detected per bead. E,F. Spatial mapping of cell types and programs yields comparable composition to scRNA-seq. Distribution of the proportion (y axis) of contributions to each cell type (D) or program (E; based on fractional annotations) in cells (for scRNA-seq; “sc”) or beads (for Slide- seq; “spatial”; based on fractional annotations) in samples from normal (N) or AV (AV) tissue. G,H. Distinct cell types and programs associated with AV and normal colon. Significance (FDR, color bar, two-sided Welch’s t test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of cell types (F, rows) or epithelial programs (G, rows) in normal (N) or AV (AV) tissues based on Slide-seq (“spatial”) data or scRNA-seq (“sc”).

Distinct cellular layout in normal and AV tissues. A,B. Neighborship analysis is robust to the distance and number of randomizations. Neighborship enrichment (z-scores, color) vs. a background of spatially random annotation assignments for each pair of cell annotations (rows, columns) in normal (top) and AV (bottom) samples at varying distance (a, left: ≤40µm; right: ≤60 µm) or number of permutations (B, left: 5, right: 50). C,D. Reproducible spatial arrangement of epithelial cells in normal colon in Slide-seq and Cartana V2. Co-occurrence (y axis) of normal epithelial cell types (color) at different distances (x axis) from different central normal epithelial cell type (annotated on top) in Slide-Seq (C) and Cartana (D) data. E. Decreased spatial order of cell types in AV vs. normal tissue. Distribution of enrichment z-values (x axis) of cluster-cluster interactions (as shown in Figure 3A,B) in AV lesions (orange) and normal colon (blue). F. Epithelial program neighborships in AV tissue. Short-range (≤20µm) neighborship enrichment z-scores (color) vs. a background of spatially random annotation assignments for each pair of epithelial program annotations (rows, columns) in AV lesions. G,H. Reproducible spatial arrangement of dysplasia-associated immune cells relative to endothelial cells in Slide-seq and Cartana V2. Co-occurrence (y axis) of dysplasia-associated monocyte or granulocyte cells (color) with endothelial cells at different distances (x axis) in Slide-Seq (G) and Cartana (H) data.

Cellular neighborhoods. A. Distinct characteristic region distributions in healthy and AV tissue. Proportion of UMIs assigned to each region (y axis) in each Slide-seq puck (x axis). B. β-catenin expression in AV lesions. Representative images (i) of β-catenin IHC on sections of AV lesions showing either nuclear (*) or cytoplasmic (**) staining pattern (x10 magnification), along with high magnification (x20, ii) of the region highlighted in (i), and H&E staining (iii) of a serial section showing a region proximal to the region in (i). (M-muscularis, L-lumen, dashed line: border between areas of nuclear and cytoplasmic staining of β-catenin). C,D. Spatial relations between the regions. Proportion of beads (y axis) of each region category (color code) at different distances (x axis) from region 2 (C, muscularis), 6 (D, left), 8 (D, middle) or 11 (D, right). E,F. Fraction of expressing cells (dot size) and mean expression per celltype (E, dot color) or per region (F, dot color) of marker genes (columns) across cell types in region 6 (E, rows) or regions 6, 8 and 11 (F, rows). G. Cell type neighborships in different malignant regions. Short-range (≤20µm) neighborship enrichment z- scores (color) vs. a background of spatially random annotation assignment for each pair of cell type annotations (rows, columns) in malignant-like regions 6, 8 and 11 within AV lesions. H. Spatial organization of endothelial cells and pericytes based on Cartana multiplex in situ RNA analysis. Co-occurrence (y axis) of Cdh5 (top) or Pecam1 (middle) -expressing (endothelial) cells or Pdgfrb (bottom) -expressing cells (pericytes) at different distances (x axis) from each central cell type (color code), based on expression of a specific gene (color bar). I. Cell-cell interaction pathways enriched in different regions. Enrichment (FDR) in regions 2 (muscularis, left), 6 (malignant-like, inflammation, middle) or 11 (malignant-like, EMT, right) of corresponding “sender” (x axis) and “receiver” (y axis) (aggregated over ligand-receptor pairs) pathways (dots). Red/blue: pathways significantly enriched/depleted in region (light red/blue: only enriched/depleted for either sender or receiver). The top 5 enriched pathways (in each direction) are labeled.

Transfer of mouse spatial region expression profiles to human patient data. A. Expression profiles characterizing mouse regions in human and mouse samples. Significance (FDR, color bar, two-sided Welch’s t test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of mouse region-associated epithelial (left), immune (middle) or stromal profiles (right) in pucks from normal colon and dysplastic lesions, in the same pucks but after mapping the region annotation to itself (consistency check), in mouse single cell data after mapping region annotation from mouse pucks, and in human single cell data after mapping region annotation from mouse pucks. B. PC1 values (y axis; box plots show mean, quartiles, and whiskers for the full data distribution except for outliers outside 1.5 times the interquartile range (IQR)) in a PCA of region scores of mouse and human samples (x axis) sorted by malignant status and colored by status (top, legend) or study (bottom, legend). C. Spatial regions profiles associated with different CMS classes. Significance (Benjamini-Hochberg FDR, color bar, two-sided Welch’s t test) of enrichment (red) or depletion (blue) of each region profile (rows) in pseudo bulk profiles of human tumor samples classified in each CMS class (columns). D,E. Expression of malignant-like regions 6 and 11 in tumors is associated with PFI and OS in MMRd and MMRp patients. Kaplan-Meier PFI (f, MMRd n = 18995, MMRp n = 44795) or OS (g, MMRd n = 18995, MMRp n = 44795) analysis of human bulk RNA-seq cohort stratified by malignant-like region profile scores.

Correspondence between scRNA-seq, Slide-seq, and Cartana data A,B. Agreement in cell type proportions. Inferred cell type proportions (y and x axis; CLR- transformed cell type fractions) for each cell type (dot color) in normal (A) and AV (B) samples in scRNA-seq, Slide-Seq and Cartana data (axis labels). Pearson’s r: top left corners. Error bars: bootstrapped standard error of the mean ((A) n=8, 16, 12, 16 and (B) n=11, 24, 12, 12 for scRNA- seq, Slide-seq, Cartana V1, Cartana V2). C,D. Agreement in gene expression levels. Measured expression (y and x axis; CLR-transformed gene fractions) for each gene (dot) measured by all three methods in normal (A) and AV (B) samples in scRNA-seq, Slide-Seq and Cartana data (axis labels). Pearson’s r: top left corners. Error bars: bootstrapped standard error of the mean ((C) n=8, 16, 12, 16 and (D) n=11, 24, 12, 12 for scRNA-seq, Slide-seq, Cartana V1, Cartana V2). Genes with the 8 maximal and minimal expression ranks across methods are labeled.

Dysplasia-associated cells in the tumor microenvironment. A,B. Representative images of multiplex RNA analysis in normal colon (left) and AV lesions (right) colored by cell type assignment (A, the same is Figure 1E) and expression of marker genes (B) for Mono02 (Arg1, Hilpda, Nos2, Cd274, Vegfa, Trem1; cyan), Mono03 (Osm, Ifi204 and Thbs1; red), granulocytes (S100a8; yellow), as well as Sell (green), Ccr2 (magenta), Ptprc (blue) and Cd14 (gray).

TLS-like structures in AV lesions by Cartana multiplex in situ RNA profiles. A. Proportion of RNA molecules (y axis) attributed to each cell type (top) or gene (bottom) in TLS-like regions (x axis) in Cartana V2 multiplex in situ RNA experiment. Genes which constitute at least 3% of all measured transcripts in any of the identified follicular structures are noted (bottom). B. Representative images from TLS-like regions showing the distribution of cell type (top row) and of cell type markers (bottom four rows; Cd14, Cd68 (monocytes, magenta), Cd3d, Cd3e, Cd3g (T cells, yellow), and Cd79, Jchain, Igkc (B cells, dark cyan)).

Ligand-receptor analysis. A,B. Cell-cell interaction pathways enriched in normal and AV tissue and regions. Significance (FDR, color bar, two-sided Mann-Whitney U test) of enrichment (red) or depletion (blue) of expression of “sender” (left) and “receiver” (right) cell-cell communication pathways (rows) (aggregted from ligand-receptor pairs) in normal vs. AV samples (A, columns), or regions (B, columns). C,D. Split power analysis for cell-cell communication pathway analysis. Significance of enrichment (log10(FDR), y-axis) for receiver (solid lines) and sender (dashed line) for each of the top 5 enriched communication pathways (color) in each iteration for the spatial split of the Slide-Seq pucks into spatially disconnected and separated patches (x axis), for either normal vs. AV tissue enrichment (C) or for specific regions (D).