
Defining the biological basis of radiomic phenotypes in lung cancer

  1. Patrick Grossmann
  2. Olya Stringfield
  3. Nehme El-Hachem
  4. Marilyn M Bui
  5. Emmanuel Rios Velazquez
  6. Chintan Parmar
  7. Ralph TH Leijenaar
  8. Benjamin Haibe-Kains
  9. Philippe Lambin
  10. Robert J Gillies
  11. Hugo JWL Aerts (corresponding author)
  1. Dana-Farber Cancer Institute, Brigham and Women’s Hospital, Harvard Medical School, United States
  2. Dana-Farber Cancer Institute, United States
  3. H. Lee Moffitt Cancer Center and Research Institute, United States
  4. Institut de recherches cliniques de Montreal, Canada
  5. Maastricht University, Netherlands
  6. University Health Network, University of Toronto, Canada
  7. University of Toronto, Canada
  8. Brigham and Women’s Hospital, Harvard Medical School, United States
Research Article
Cite this article as: eLife 2017;6:e23421 doi: 10.7554/eLife.23421
5 figures, 3 tables, 1 data set and 4 additional files


Radiomics approach.

(A) Workflow of extracting radiomic features: (I) A lung tumor is scanned in multiple slices. (II) Next, the tumor is delineated in every slice and validated by an experienced physician. This allows creation of a 3D representation of the tumor outlining phenotypic differences of tumors. (III) Radiomic features are extracted from this 3D mask, and (IV) integrated with genomic and clinical data. (B) Representative examples of lung cancer tumors. Visual and nonvisual differences in tumor shape and texture between patients can be objectively defined by radiomic features, such as entropy of voxel intensity values (‘How heterogeneous is the tumor?’) or sphericity of the tumor (‘How round is the tumor?’).
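As a rough illustration of the two example features named in the caption, the following Python sketch computes a histogram-based intensity entropy and a sphericity value. It is not the authors' implementation: the function names, the bin width of 25, and the toy inputs are assumptions made for this example, and the exact feature definitions used in the study are those given in Supplementary file 1.

```python
import math
from collections import Counter


def intensity_entropy(voxels, bin_width=25):
    """Shannon entropy (in bits) of the binned voxel-intensity histogram.

    bin_width is an illustrative choice, not a value taken from the study.
    """
    bins = Counter(int(v // bin_width) for v in voxels)
    n = len(voxels)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


def sphericity(volume, surface_area):
    """Surface area of a sphere with the same volume divided by the
    actual surface area: 1.0 for a perfect sphere, lower otherwise."""
    return (math.pi ** (1 / 3)) * (6 * volume) ** (2 / 3) / surface_area


# Toy inputs (hypothetical, not data from the study)
homogeneous = [100] * 8                              # a single intensity value
heterogeneous = [0, 30, 60, 90, 120, 150, 180, 210]  # spread over eight bins

print("entropy, homogeneous:  ", intensity_entropy(homogeneous))
print("entropy, heterogeneous:", intensity_entropy(heterogeneous))

r = 2.0
print("sphericity, sphere:", sphericity(4 / 3 * math.pi * r ** 3, 4 * math.pi * r ** 2))
print("sphericity, cube:  ", sphericity(1.0, 6.0))
```

A tumor with uniform intensity yields zero entropy, while intensities spread evenly over many bins yield a high value. Any shape other than a sphere scores below 1 on sphericity.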

Schema of our strategy to define robust radiomic-pathway-clinical relationships.

Two independent lung cancer cohorts (D1 and D2) with radiomic (R), genomic (G), and clinical (C) data were analyzed. D1 (n = 262) was used as a discovery cohort and D2 (n = 89) was used to validate our findings. A gene set enrichment analysis (GSEA) approach assessed scores for radiomic-pathway associations. These scores were biclustered to modules that contain features and pathways with coherent expression patterns. These modules may overlap and vary in size. Clinical association to overall survival (red), pathologic histology (purple), and TNM stage (yellow) was statistically tested in both datasets, and results were combined in a meta-analysis to investigate relationships of modules.

Figure 2—source data 1


Spreadsheet containing radiomic data, normalized gene expression, and clinical data of the discovery cohort.

Figure 2—source data 2


Spreadsheet containing radiomic data, normalized gene expression, and clinical data of the validation cohort.

Figure 3 with 2 supplements
Radiomic-pathway-clinical modules.

(A) Clustering of significantly validated radiomic-pathway association modules (FDR < 0.05). Normalized enrichment scores (NESs) have been biclustered to coherently expressed modules. Every heatmap in this figure corresponds to a module (M1–M13) with radiomic features in columns and pathways in rows. Heatmap sizes are proportional to module sizes. Elements are NESs given in Z-scores across features, and are displayed in blue when positive and green when negative. Horizontal color bars above every module indicate radiomic feature groups (black = first order statistics, orange = texture, purple = shape, red = wavelet, and pink = Laplace of Gaussian). Representative molecular pathways are displayed. (B) Clinical module network. We investigated whether modules were associated with overall survival (red), stage (yellow), histology (purple), or no clinical factor (white). Relationships between modules, based on their number of shared radiomic features (thickness of blue lines), are displayed as a network. While we found that most modules yield clinical information, overlaps of modules did not indicate relationships to similar clinical factors.

Figure 3—source data 1

Enlarged heatmaps of every module depicting normalized enrichment scores (NESs) of every pair of radiomic feature and molecular pathway clustered in a module.

Figure 3—figure supplement 1
Predictive capabilities of representative radiomic features from every module for genetic mutations in KRAS, EGFR, and TP53 in a subset of the discovery cohort.
Figure 3—figure supplement 2
Association of representative features with smoking history in a subset of the discovery cohort.
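The Figure 3 caption states that heatmap elements are NESs expressed as Z-scores across features. A minimal Python sketch of that standardization follows, assuming each pathway row is centered and scaled over its radiomic-feature columns using the population standard deviation. The direction of scaling, the choice of standard deviation, and the function name `zscore_rows` are assumptions of this example and are not confirmed by the caption.

```python
import statistics


def zscore_rows(nes):
    """Standardize each row (pathway) across its columns (radiomic features).

    Rows with zero spread are mapped to all zeros to avoid division by zero.
    """
    out = []
    for row in nes:
        mu = statistics.mean(row)
        sd = statistics.pstdev(row)  # population SD; an assumption of this sketch
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out


# Hypothetical NES matrix: 2 pathways (rows) x 3 radiomic features (columns)
toy_nes = [
    [1.0, 2.0, 3.0],
    [-0.5, -0.5, -0.5],
]

for row in zscore_rows(toy_nes):
    print([round(z, 3) for z in row])
```

After this step, positive entries mark features whose enrichment for a pathway is above that pathway's average, which is what the blue and green coloring in the heatmaps encodes.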
Figure 4 with 1 supplement
Test for agreement between radiomic and pathological immune response assessment.

Two representative cases are shown in which radiomic predictions of immune response were confirmed by immunohistochemical staining for nuclear CD3, which highlights lymphocytes in brown. Each case is displayed at 0.6X and 2.0X magnification of the tumor slides; an axial slice of the corresponding diagnostic CT scan and the total tumor volume are given for comparison. Automated quantifications of lymphocytes are displayed together with the radiomic score used to classify cases into high and low responders.

Figure 4—figure supplement 1
Representative cases of immunohistochemical staining for RelA.
Figure 5 with 2 supplements
Combining prognostic signatures for overall survival.

We tested combinations of clinical, genomic, and radiomic signatures. Starting from a clinical Cox proportional-hazards regression model with stage and histology, we first added a published gene signature and then a published radiomic signature. These models were fitted on Dataset1 and evaluated with the C-index (CI) on Dataset2. An asterisk indicates significance (p<0.05). Combining different data types improved prognostic performance: adding radiomic and genomic information increased the performance of the clinical model from CI = 0.65 (Noether p=0.001) to CI = 0.73 (p=2×10⁻⁹).

Figure 5—figure supplement 1
Prognostic performance of two radiomic signatures (i.e., a previously published and a novel signature) combined with genetic and clinical information.
Figure 5—figure supplement 2
Prognostic performance of two radiomic signatures combined with different gene signatures and clinical information.
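The C-index used to evaluate the models in Figure 5 measures how well predicted risk orders patients by survival time. The Python sketch below shows one common estimator (Harrell's C) on toy data. The source does not state which estimator was used, so treating it as Harrell's C is an assumption, and the function name and all values are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: among comparable patient pairs, the fraction in which
    the patient with the higher predicted risk has the shorter survival.

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if censored
    risks  -- model output, higher means worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died before time j
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risks count as half
    return concordant / comparable


# Toy cohort (hypothetical values, not data from the study)
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.6, 0.7, 0.1]

print("C-index:", concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported increase from 0.65 to 0.73 should be read.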


Table 1

Proportions of clinical characteristics in Dataset1 and Dataset2 (see Figure 2).

Histology and TNM stage were based on pathology where available.



Characteristic | Dataset1 | Dataset2

Male | 100 (45%) | 59 (68%)
Female | 124 (55%) | 28 (32%)

Histology
Adenocarcinoma | 129 (58%) | 42 (48%)
Squamous | 61 (27%) | 33 (38%)
Other | 34 (15%) | 12 (14%)

TNM stage
I | 123 (55%) | 39 (45%)
II | 35 (15%) | 26 (30%)
III | 46 (21%) | 12 (14%)
Other | 20 (9%) | 10 (11%)

Smoking Status
Current | 66 (29%) | NA
Former | 141 (63%) | NA
None | 17 (8%) | NA

Tumor site
Primary | 224 (100%) | 87 (100%)

Overall survivals | 134 (60%) | 41 (47%)
Overall deaths | 90 (40%) | 46 (53%)

Follow up (median months)

Table 2

Summary of common themes in all of the identified radiomic-pathway association modules. Columns 1–3 display the module name, the number of radiomic features (nr), and pathways (np), respectively. Columns 4–5 hold the radiomic and pathway themes present in each module.

Module | nr np | Radiomic themes | Pathway themes
M1 | 67 | Wavelet texture gray-level runs | Lipid and lipoprotein metabolism, Notch signaling, circadian clock
M2 | 585 | Wavelet intensity entropy; Laplace of Gaussian intensity standard deviation | Immune system, p53
M3 | 417 | Wavelet minimum intensity | Neural system, axon guidance
M4 | 2514 | Intensity variance and mean; wavelet minimum intensity min | Biological oxidations, signaling by insulin receptor, signaling by GPCR, neuronal system
M5 | 588 | Wavelet texture gray-level runs; wavelet intensity range and median; (wavelet) texture information correlation and cluster tendency | Axon guidance and synaptic transmission, lipoprotein metabolism, cell type determination
M6 | 647 | Laplace of Gaussian standard deviation; wavelet texture gray-level runs; wavelet texture cluster tendency | Circadian clock, signaling by Notch
M7 | 398 | Laplace of Gaussian intensity entropy; wavelet intensity variance; Laplace of Gaussian texture information correlation | Mitochondria, Pol III transcription
M8 | 2017 | Laplace of Gaussian standard deviation | TCA cycle and electron transport, TGF-beta receptor signaling, response to stress, transcription regulation, protein synthesis
M9 | 830 | Intensity variance; wavelet intensity variance | Immune system, p53, cell cycle regulation checkpoints, cell-cell interaction, circadian clock
M10 | 583 | Shape surface (SH); wavelet texture gray-level runs | Axon guidance, neuronal system, (innate) immune system, hemostasis, FGFR signaling, TGF-beta receptor signaling, Notch signaling, circadian clock
M11 | 1766 | Wavelet intensity range; wavelet texture information correlation | Hemostasis, neural system
M12 | 3227 | Wavelet texture entropy; intensity variance; wavelet texture cluster tendency | P53, immune system
M13 | 3926 | Intensity entropy | Gene expression regulation, Pol II/III transcription
Table 3

Pathway prediction and clinical association. For every module, the independent validation performance of the strongest radiomic-based pathway predictors is given as the area under the curve (AUC) of the receiver operating characteristic. In addition, we highlight whether a module was significantly associated with overall survival (OS), TNM stage (ST), or pathologic histology (HI) (p<0.05).

Module | Strongest radiomic-based pathway prediction | AUC | OS | ST | HI
M1 | Wavelet (HHH) texture (GLCM) correlation → Cholesterol biosynthesis
M2 | Laplace of Gaussian intensity standard deviation → Autodegradation of the E3 Ubiquitin ligase COP1
M3 | Wavelet minimum intensity → Trafficking of GLUR2 containing AMPA receptors
M4 | Wavelet intensity minimum → Glutathione conjugation
M5 | Texture information correlation → Trafficking of GLUR2 containing AMPA receptors
M6 | Wavelet texture cluster prominence → Notch1 intracellular domain regulation of transcription
M7 | Laplace of Gaussian intensity entropy → RNA polymerase III transcription
M8 | Laplace of Gaussian intensity standard deviation → Pyruvate metabolism and citric acid TCA cycle
M9 | Wavelet intensity variance → Trafficking of GLUR2 containing AMPA receptors
M10 | Shape compactness and shape sphericity → TRAF6 mediated NFkB activation
M11 | Wavelet texture cluster tendency → Platelet aggregation plug formation
M12 | Wavelet texture entropy → G0 and early G1
M13 | Laplace of Gaussian intensity entropy → RNA polymerase II transcription initiation and promoter opening

Table 3—source data 1

Radiomic pathway predictors.
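The AUC reported in Table 3 can be read as the probability that a randomly chosen tumor with the pathway active receives a higher radiomic score than a randomly chosen tumor with the pathway inactive. The Python sketch below computes it by direct pairwise comparison, which is equivalent to the Mann-Whitney U statistic. How pathway activity was binarized in the study is not stated in this section, so the 0/1 labels, the scores, and the function name are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels -- 1 for the positive class, 0 for the negative class
    scores -- predictor values, higher means more likely positive
    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not data from the study)
pathway_active = [1, 1, 0, 0, 1, 0]
radiomic_score = [0.9, 0.8, 0.7, 0.3, 0.4, 0.4]

print("AUC:", round(roc_auc(pathway_active, radiomic_score), 3))
```

An AUC of 0.5 indicates no discrimination and 1.0 indicates perfect separation of the two groups.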


Data availability

The following previously published data sets were used
  1. Aerts HJ, Grossmann P. 89 NSCLC patients with gene expression profiles and matching CT imaging data available at TCIA. Publicly available at the NCBI Gene Expression Omnibus (accession no: GSE58661).

Additional files

Supplementary file 1

Radiomic feature definitions and further description of the meaning of feature groups.

Source code 1

Analysis code.

Source code used to analyse data and generate figures.

Supplementary file 2

Exact p-values of modules and list of R packages, with their versions, used for analysis.

Supplementary file 3

Methods for automated pathological call assessment.

