(A) Prediction accuracy estimated by bootstrapping analysis for increasing feature set sizes. Estimates are averaged over 20 bootstrapping repeats. The maximum accuracy is observed for 17 features (red dot). Sensitivity and specificity are shown in Figure 2—figure supplement 1. The averaged data used to generate the accuracy, sensitivity and specificity plots are available in Figure 2—source data 1. (B) Occurrence frequencies of features across 13-feature selections in 100 bootstrapping repeats. Fifty-five features were selected at least once. The 13 most frequently occurring features (red dots) were selected more than 30 times. (C) ROC curves for the three predictive models under comparison, i.e. the p53 mutation status, the 215-feature and the 13-gene signature models. (D) Precision-recall plot for the 13-gene signature. Five curves are shown, one for each of the 5 repetitions of cross-validation. (E) Performance estimates of the three compared predictive models: AUC (area under the curve, from the ROC curves shown in C), sensitivity (fraction of sensitive cell lines correctly predicted as sensitive), specificity (fraction of insensitive cell lines correctly predicted as insensitive), PPV (positive predictive value, fraction of cell lines predicted as sensitive that are truly sensitive) and NPV (negative predictive value, fraction of cell lines predicted as insensitive that are truly insensitive). Measures are averaged over the 5 iterations of 5-fold cross-validation.
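The four count-based measures in (E) can be illustrated with a minimal sketch, assuming "sensitive" is treated as the positive class. The function name `classification_metrics` and the counts below are hypothetical and are not taken from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    tp: sensitive cell lines predicted as sensitive
    fp: insensitive cell lines predicted as sensitive
    tn: insensitive cell lines predicted as insensitive
    fn: sensitive cell lines predicted as insensitive
    """
    def ratio(numerator, denominator):
        # Return NaN instead of raising when a class or prediction is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # recall on truly sensitive lines
        "specificity": ratio(tn, tn + fp),  # recall on truly insensitive lines
        "ppv": ratio(tp, tp + fp),          # precision of "sensitive" calls
        "npv": ratio(tn, tn + fn),          # precision of "insensitive" calls
    }


# Hypothetical counts, for illustration only (not data from this study).
metrics = classification_metrics(tp=8, fp=2, tn=15, fn=5)
```

In a repeated cross-validation setting such as the one described here, one would presumably compute these values per iteration and then average them.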