Study overview.

(A) The study utilised 2332 tumour samples representing six cancer types (bladder, uterine, stomach, ovarian, kidney and colon) and transformed the multiomics data into images using a chromosome interaction network. After the model was trained, we validated the identified genes in two independent cohorts representing early-stage BLCA (UROMOL) and late-stage BLCA (Mariathasan). (B) The validation examined the most important genes driving metastatic disease, similarities and differences in methylation patterns between cancer types, the latent representation of the genome data, and survival data. (C) Model architecture: the first part of the network encodes the genome data into a latent vector, L, which is then decoded to reconstruct the image. Subsequent layers extract information from the reconstructed image, concatenate it with L and make the final prediction.
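The encode, decode and extract flow in panel (C) can be sketched as a single forward pass. This is a minimal sketch under stated assumptions, not the GENIUS implementation: the sizes `IMG_SIDE`, `LATENT` and `N_EXTRACT` are hypothetical, randomly initialised dense layers stand in for whatever layers the trained network actually uses, and the single sigmoid output assumes a binary target such as metastatic disease.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the caption does not state layer widths.
IMG_SIDE = 16                 # reconstructed image is IMG_SIDE x IMG_SIDE
N_PIXELS = IMG_SIDE * IMG_SIDE
LATENT = 8                    # size of the latent vector L
N_EXTRACT = 4                 # features extracted from the reconstructed image


def dense(n_in, n_out):
    """Randomly initialised weights and zero bias for one fully connected layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


W_enc, b_enc = dense(N_PIXELS, LATENT)        # encoder: image -> L
W_dec, b_dec = dense(LATENT, N_PIXELS)        # decoder: L -> reconstructed image
W_ext, b_ext = dense(N_PIXELS, N_EXTRACT)     # extractor applied to the reconstruction
W_out, b_out = dense(N_EXTRACT + LATENT, 1)   # prediction head on [extracted, L]


def forward(images):
    """images: (batch, IMG_SIDE, IMG_SIDE) -> (reconstruction, L, probability)."""
    x = images.reshape(len(images), -1)
    latent = relu(x @ W_enc + b_enc)                      # encode into L
    recon = sigmoid(latent @ W_dec + b_dec)               # decode back to an image
    extracted = relu(recon @ W_ext + b_ext)               # information from the reconstruction
    joined = np.concatenate([extracted, latent], axis=1)  # concatenate with L
    prob = sigmoid(joined @ W_out + b_out).ravel()        # final prediction
    return recon.reshape(images.shape), latent, prob
```

Concatenating L with the features extracted from the reconstruction lets the prediction head see both the compressed representation and the information that survives decoding.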

Data transformation overview.

(A) The multiomics genome data were transformed into four image types: a square image organised by the chromosome interaction network, a chromosome image organised by the chromosome interaction network, a randomly organised image, and a flat vector containing all multiomics data. (B) All four image types were used in training for metastatic disease prediction, and the square image organised by the chromosome interaction network gave the best model performance (green). The red line marks where the model reached its best loss. All curves stop where the loss started increasing, indicating overfitting. The bar plot shows the proportion of correct metastatic disease predictions in each cancer type included in the study. (C) Two-dimensional UMAP representation of vector L for each predicted variable. Colours indicate the output variable used in the specific run.
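The square, randomly organised and flat layouts in panel (A) differ only in where each gene's values are placed. The sketch below assumes one channel per omics layer and row-major filling of a zero-padded square grid; `gene_order` is a hypothetical input standing in for the ordering derived from the chromosome interaction network, whose construction is not described in this caption. The chromosome image layout is not sketched.

```python
import math

import numpy as np


def square_image(features, gene_order):
    """Place genes on a square grid in the given order, one channel per omics layer.

    features: (n_genes, n_omics) array; gene_order: permutation of gene indices
    (hypothetical stand-in for the chromosome-interaction-network ordering).
    Grid cells beyond n_genes are zero-padded.
    """
    n_genes, n_omics = features.shape
    side = math.ceil(math.sqrt(n_genes))
    grid = np.zeros((side * side, n_omics))
    grid[:n_genes] = features[gene_order]
    return grid.reshape(side, side, n_omics)


def random_image(features, seed=0):
    """Same grid, but genes placed in a random order (the control layout)."""
    order = np.random.default_rng(seed).permutation(len(features))
    return square_image(features, order)


def flat_vector(features):
    """All omics values concatenated into one 1-D vector with no spatial layout."""
    return features.ravel()
```

For example, 10 genes with 3 omics layers give a 4 x 4 x 3 image whose last 6 cells are padding.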

The most important events in metastatic disease development.

(A) Pie charts showing the relative importance of each data source when predicting metastatic disease for each cancer type included in the study. (B) Top 50 genes for each cancer type, scaled by cancer type. A star below a gene name indicates that the gene is part of the COSMIC Cancer Gene Census. The colour of the gene name indicates the data source, and the colour of the bar indicates the cancer type.
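Panel (A) aggregates feature importance by data source and panel (B) ranks genes within each cancer type. A minimal sketch, assuming per-feature importance scores are already available (the attribution method is not specified here), that "scaled by cancer type" means dividing by the maximum score within a cancer type, and using hypothetical feature names:

```python
from collections import defaultdict


def source_shares(importance, source_of):
    """Fraction of total importance attributed to each data source.

    importance: dict feature -> non-negative importance score
    source_of: dict feature -> data source label (e.g. "expression")
    """
    totals = defaultdict(float)
    for feature, score in importance.items():
        totals[source_of[feature]] += score
    grand_total = sum(totals.values())
    return {src: total / grand_total for src, total in totals.items()}


def top_features(importance, n=50):
    """The n features with the highest importance, highest first."""
    return sorted(importance, key=importance.get, reverse=True)[:n]


def scale_within_cancer(importance):
    """Assumed scaling: divide by the maximum so one cancer type spans 0 to 1."""
    peak = max(importance.values())
    return {feature: score / peak for feature, score in importance.items()}


# Hypothetical features for one cancer type.
imp = {"gene_a_mut": 3.0, "gene_b_expr": 1.0, "gene_c_meth": 1.0}
src = {"gene_a_mut": "mutation", "gene_b_expr": "expression", "gene_c_meth": "methylation"}
shares = source_shares(imp, src)  # mutation 0.6, expression 0.2, methylation 0.2
```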

Validation in late-stage, immunotherapy-treated bladder cancer (Mariathasan).

(A) Comparison of randomly selected genes vs. genes picked by GENIUS in a Cox proportional hazards model. (B) Forest plot showing the top 10 expressed/methylated genes in a multivariate Cox proportional hazards model. (C) Volcano plot showing the top 10 expressed/methylated genes and their enrichment in two comparisons: stages I-III vs. stage IV, and binary immunotherapy response.
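Panel (A) contrasts genes picked by GENIUS with randomly selected genes. The sketch below shows one way such a random baseline can be built; it swaps in Harrell's concordance index for the Cox model fit, scores each gene set with a hypothetical risk score (the mean of z-scored expression, which ignores each gene's direction of effect), and reports an empirical p-value against random gene sets of the same size. It is illustrative, not the procedure used in the study.

```python
import numpy as np


def harrell_c(time, event, risk):
    """Harrell's concordance index: share of comparable pairs ordered correctly.

    A pair (i, j) is comparable when patient i had an event and a shorter time
    than patient j. Higher risk should mean shorter survival; risk ties count 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def random_baseline(expr, time, event, selected, n_draws=200, seed=0):
    """Compare a selected gene set with random gene sets of the same size.

    expr: (n_patients, n_genes). The risk score (mean z-scored expression over
    the set) is a hypothetical stand-in for a fitted Cox model.
    Returns the selected set's C-index and an empirical p-value.
    """
    rng = np.random.default_rng(seed)
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)

    def score(genes):
        return harrell_c(time, event, z[:, genes].mean(axis=1))

    observed = score(selected)
    draws = [score(rng.choice(expr.shape[1], size=len(selected), replace=False))
             for _ in range(n_draws)]
    # Fraction of random sets doing at least as well (+1 avoids a zero p-value).
    p_value = (1 + sum(d >= observed for d in draws)) / (1 + n_draws)
    return observed, p_value
```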

Validation in early-stage bladder cancer (UROMOL).

(A) Comparison of randomly selected genes vs. genes picked by GENIUS in a Cox proportional hazards model. (B) Forest plot showing the top 10 expressed/methylated genes picked by GENIUS for BLCA. (C) Volcano plot showing the top 10 expressed/methylated genes compared between the EORTC-Low and EORTC-High groups. (D) Volcano plot showing the top 10 expressed/methylated genes compared between low-grade and high-grade BLCA tumours.
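The volcano plots in panels (C) and (D) place each gene by effect size and significance. This sketch assumes non-negative values, a log2 fold change with a pseudocount, and a two-sided permutation test on the difference in group means; the statistical test used in the study is not stated in the caption, so the permutation test is a stand-in.

```python
import numpy as np


def volcano_points(values, group, n_perm=1000, seed=0, pseudo=1.0):
    """x = log2 fold change (group True vs False), y = -log10 permutation p-value.

    values: (n_samples, n_genes) non-negative data such as expression.
    group: boolean array, True for e.g. EORTC-High or high-grade samples.
    """
    rng = np.random.default_rng(seed)
    group = np.asarray(group, dtype=bool)

    def mean_diff(g):
        return values[g].mean(axis=0) - values[~g].mean(axis=0)

    observed = mean_diff(group)
    exceed = np.zeros(values.shape[1])
    for _ in range(n_perm):
        shuffled = rng.permutation(group)  # break the label-value link
        exceed += np.abs(mean_diff(shuffled)) >= np.abs(observed)
    p = (exceed + 1) / (n_perm + 1)  # two-sided empirical p-value per gene
    log2fc = (np.log2(values[group].mean(axis=0) + pseudo)
              - np.log2(values[~group].mean(axis=0) + pseudo))
    return log2fc, -np.log10(p)
```

Genes far from zero on the x-axis and high on the y-axis would be the candidates highlighted in such a plot.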

Summary of BLCA genes in two validation cohorts.