Figures and data
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_fig1a.tif/full/max/0/default.jpg)
Accuracy of models generated with various single and paired molecular representations using support vector machine (SVM) during cross-validation (purple heatmap) and testing (blue heatmap)
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_tbl1a.tif/full/max/0/default.jpg)
The top-performing standalone fingerprints for each of the five ML algorithms
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_tbl1b.tif/full/max/0/default.jpg)
The best- and worst-performing models using a merged fingerprint for all five ML algorithms
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_tbl2a.tif/full/max/0/default.jpg)
Accuracy (%) of models trained with an imbalanced training dataset where the number of BRAF actives is decreased but the number of BRAF inactives is maintained at a fixed number (3600)
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_tbl2b.tif/full/max/0/default.jpg)
Accuracy (%) of models trained with a balanced training dataset where the numbers of BRAF actives and BRAF inactives are both similarly decreased
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_tbl2c.tif/full/max/0/default.jpg)
Recall and precision (%) of models trained with an imbalanced training dataset where the number of BRAF actives is decreased but the number of BRAF inactives is maintained at a fixed number (3600)
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_tbl2d.tif/full/max/0/default.jpg)
Recall and precision (%) of models trained with a balanced training dataset where the numbers of BRAF actives and BRAF inactives are both similarly decreased
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_fig1b.tif/full/max/0/default.jpg)
Accuracy of models generated with various single and paired molecular representations using random forest (RF) during cross-validation (purple heatmap) and testing (blue heatmap)
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_fig1c.tif/full/max/0/default.jpg)
Accuracy of models generated with various single and paired molecular representations using naïve Bayes (NBayes) during cross-validation (purple heatmap) and testing (blue heatmap)
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_fig1d.tif/full/max/0/default.jpg)
Accuracy of models generated with various single and paired molecular representations using k-nearest neighbour (kNN) during cross-validation (purple heatmap) and testing (blue heatmap)
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_fig1e.tif/full/max/0/default.jpg)
Accuracy of models generated with various single and paired molecular representations using gradient-boosting decision tree (GBDT) during cross-validation (purple heatmap) and testing (blue heatmap)
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_tbl3.tif/full/max/0/default.jpg)
Average accuracy for the ‘spiked-in’ ‘less active’-trained models, based on testing with 10 balanced hold-out test sets of BRAF actives and inactives
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_tbl4a.tif/full/max/0/default.jpg)
Average accuracy for the ‘spiked-in’ decoy-trained models, based on testing with 10 balanced hold-out test sets of BRAF actives and inactives
![](https://prod--epp.elifesciences.org/iiif/2/97821%2Fv1%2Fcontent%2F587184v2_tbl4b.tif/full/max/0/default.jpg)