Abstract
Highthroughput testing of drugs across molecularcharacterised cell lines can identify candidate treatments and discover biomarkers. However, the cells’ response to a drug is typically quantified by a summary statistic from a bestfit doseresponse curve, whilst neglecting the uncertainty of the curve fit and the potential variability in the raw readouts. Here, we model the experimental variance using Gaussian Processes, and subsequently, leverage uncertainty estimates to identify associated biomarkers with a new Bayesian framework. Applied to in vitro screening data on 265 compounds across 1074 cancer cell lines, our models identified 24 clinically established drugresponse biomarkers, and provided evidence for six novel biomarkers by accounting for association with low uncertainty. We validated our uncertainty estimates with an additional drug screen of 26 drugs, 10 cell lines with 8 to 9 replicates. Our method is applicable to any doseresponse data without replicates, and improves biomarker discovery for precision medicine.
Introduction
The failure rate for new drugs entering clinical trials is in excess of 90%, with more than a quarter of drugs failing due to lack of efficacy (Arrowsmith and Miller, 2013; Cook et al., 2014). The rapid development of technologies for deep molecular characterisation of clinical samples holds the promise to uncover molecular biomarkers that stratify patients towards more efficacious drugs, a cornerstone of precision medicine. In oncology, we can identify potential biomarkers of drug response in highthroughput screens (HTS) of patientderived cell lines; these biomarkers need to be then validated in patients.
Assessment of cell line drug response typically involves treatment with multiple concentrations of the compound, followed by measurement of the amount of viable cells after a fixed period of time for each dose, and derivation of a doseresponse curve. The drug response is commonly then summarised by measurements taken from this curve, most often the concentration required to reduce cell viability by half that is IC_{50}, or the area under the curve that is AUC. Currently the two largest in vitro drug screening studies, the Genomics of Drug Sensitivity in Cancer (GDSC) (Garnett et al., 2012; Iorio et al., 2016) and the Cancer Therapeutics Response Portal (CTRP) Rees et al., 2016 have shown that some clinicallyactionable biomarkers of drug response can be concordantly discovered (Iorio et al., 2016; SeashoreLudlow et al., 2015), and that different properties and mechanisms of drug response are best captured by different metrics dependent on the doseresponse curve (FallahiSichani et al., 2013).
Most HTS efforts focus on increasing throughput (Iorio et al., 2016; SeashoreLudlow et al., 2015) and thereby often neglect experimental replicates, which renders it impossible to correct for experimental noise, resulting in uncertainty for the estimated drugresponse metrics (e.g. IC_{50} value). Extrapolating IC_{50} values beyond the tested drug concentration range is particularly challenging and often unaccounted for in quality control metrics (HaibeKains et al., 2013; Haverty et al., 2016). Most published studies using machine learning algorithms or mechanistic models for predicting drug response and biomarkers assume that the measured drug responses are precise (Costello et al., 2014; Keshava et al., 2019; Menden et al., 2019; Silverbush et al., 2017). If this assumption is not met and there is high uncertainty in the measured drugresponse values, the utility of these methods for enhancing drug development may be severely limited (Costello et al., 2014; Menden et al., 2019; Silverbush et al., 2017). Experimental noise can be reduced by adding experimental replicates, however, this either reduces the throughput of the screen or increases the cost. Most current models for curve fitting and describing doseresponse data have primarily assumed that cell viability has a sigmoidal relationship to the logarithm of the dose concentrations of the drug (Dawson et al., 2012; Wang et al., 2010). Whilst some models are more flexible by allowing many inflection points in the doseresponse curve (Di Veroli et al., 2016; Vis et al., 2016), their main output is a single drugresponse value that does not fully capture the uncertainty in the measurements (FallahiSichani et al., 2013).
Gaussian Processes (GP) are a flexible, probabilistic modelling technique that has been successfully used to measure uncertainty in noisy gene expression datasets (LopezLopera and Alvarez, 2019) and has been incorporated into machine learning prediction of cell fates (Boukouvalas et al., 2018). This technique has been shown to cope well with regression tasks on dependent data and high dimensional covariates (Rasmussen and Williams, 2005; Shi and Choi, 2011). Instead of fitting a single function to the data, GPs allow for a flexible range of beliefs about the function underlying the data (Tian et al., 2017). In the case of cell line drug responses, this can be conceptualised as fitting a range of curves that have equivalently strong fit to the data. We can sample from the inferred posterior distribution over functions, that is the variance between these curves, to generate uncertainty estimates of quantities of interest, in our case, properties of the doseresponse such as IC_{50}.
GPs have been recently utilised to identify and guide experimental validation of compounds, on top of being applied to protein engineering and imputing gene expression values (Hie et al., 2020). GPs have also been used in conjunction with neural networks to model doseresponse curves as a function of molecular markers (Tansey et al., 2018). The main objective in this work was to predict drug response using the molecular measurements, and the nonlinear nature of the prediction model makes interpretation for the purpose of biomarker detection challenging. By contrast, we aimed to develop a model that could provide interpretable summary statistics with uncertainty estimates that can be flexibly used to improve biomarker detection.
In this study, we therefore introduce a new GP regression approach for describing doseresponse relationships in cancer cell lines that quantifies the uncertainty of the model fitted to measured responses for each single experiment, and we show that estimates of IC_{50} values within the tested concentration range correlates with confidence intervals obtained experimentally from replicate experiments. Subsequently, we use our new doseresponse model to identify genetic sensitivity and resistance biomarkers in standard statistical tests (e.g. ANOVA). We demonstrate how the flexibility of the GP doseresponse modelling can be further exploited in a Bayesian framework to identify novel biomarkers. We also describe the variation in the level of drug response uncertainty across cancer types and drug classes. By accounting for the uncertainty in doseresponse experiments, detection of clinicallyactionable biomarkers can be enhanced.
Results
A probabilistic framework for measuring doseresponse and predicting biomarkers
We analysed in vitro screening data on 265 compounds across 1,074 cell lines (Iorio et al., 2016). In those experiments, we quantified the amount of cytotoxicity after four days of compound treatments at each dose compared to controls (Figure 1A). The relationship between the dose and response (decrease in cell viability) was first described using a doseresponse curve derived with a sigmoidal function (Figure 1B and C). This assumes that the number of viable cells decreases at an exponential rate, then slows down and eventually plateaus at a lower limit. Since it was costly to test all possible doses, the sigmoid function was used to extrapolate the response at concentrations that had not been tested and to estimate overall measures of response, such as IC_{50} or AUC values, for downstream analysis. However, considering that each experiment tested only between five and nine dosage concentrations per experiment in GDSC, and a maximum of 16 in CTRP, the tightness of fit of the doseresponse curve to the data points and therefore the level of uncertainty about the inferred response may vary. We utilised the probabilistic nature of GP models to quantify the uncertainty in the doseresponse experiments as an alternate approach (Figure 1D). We sampled from the fitted GP and used the posterior distribution to quantify the uncertainty in curve fits for each experiment. We again generated summary statistics, IC_{50} and AUC values, by taking the average of the GP samples and also quantified the level of uncertainty for these statistics (Figure 1E). The GP model has the advantage that it models outliers at higher doses as one component of a twocomponent Beta mixture in the model (see Materials and methods). Such outliers are typically the result of an experimental failure, and cannot be modelled using simple Gaussian noise without overestimating the noise parameter.
After fitting the doseresponse data using the sigmoid and GP models, we tested various biomarker hypotheses by examining the association between the overall response statistics from the models with genetic variants detected in the cell lines using a frequentist and a Bayesian approach (Figure 1F–H). For one biomarker hypothesis, as an example, we examined copy number alterations and point mutations in breast cancer cell lines in relation to the measured drug response of afatinib in those cells. The GP and sigmoid estimated IC_{50} from cell lines treated with afatinib were significantly different in cases with and without ERBB2 amplification (ANOVA qvalue = 4.12e9; Figure 1I). The GP models provided an added benefit of providing uncertainty estimates that were incorporated into a Bayesian hierarchical model to further verify the association between ERBB2 amplification and afatinib sensitivity (posterior probability = 0.001; Figure 1J).
Gaussian Processes provide estimates of doseresponse uncertainty for single experiments
Both GP and sigmoid curve fitting produced comparable IC_{50} and AUC estimates. Precursor sigmoid curve fitting methods based on Markov Chain Monte Carlo simulations enabled error estimates in IC_{50} values (Garnett et al., 2012), however, this was neglected in the stateoftheart sigmoid curve fitting (Vis et al., 2016) due to missing propagation to biomarker identification. Here, we introduce the added benefit of sampling from the GP posterior, which provides the models inbuild uncertainty obtained for these IC_{50} estimates. This is important for highthroughput drug screening experiments where there is often a high number of drugs and samples tested but very few replicate experiments. By applying the GP model to each experiment, we estimated the standard deviation for each IC_{50} or AUC value based only on data points from that single experiment. These single sample standard deviations were compared to the standard deviations measured from here provided replicate experiments, that is the same drug tested multiple times on the same cell line and at the same concentration. We applied our GP estimation method to data from replicate experiments of 26 drugs on 10 cell lines, which contained 260 test conditions and 8 to 9 replicates for each condition. We wanted to see if an estimate of the uncertainty of the summary statistic, such as the standard deviation of the IC_{50} posterior samples, would be correlated with the dispersion between replicates. Here, we refer to the variability between (mean) estimates for replicates as the observation uncertainty, and the variability in the estimate for a single replicate as the estimation uncertainty.
We compared observation and estimation uncertainty across replicate experiments of all 260 conditions (Figure 2A). When the estimation uncertainty is large, we will have less confidence in the estimated IC_{50} in an experiment. Measurement errors for individual points in a doseresponse curve will generally result in larger estimation uncertainty, whereas greater variation between biological replicates will result in larger observation uncertainty. We found two trends in the relationship between observation and estimation uncertainty. First, for experiments where the estimated IC_{50} lies within the concentration range tested, the estimation uncertainty is positively correlated (Pearson correlation = 0.84, 95% CI [0.76, 0.89]) with the observation uncertainty. Second, for experiments where the estimated IC_{50} lies beyond the maximum tested concentration, we observed a negative correlation (Pearson correlation = −0.39, 95% CI [−0.51,–0.25]). We note that the latter experiments require extrapolation to estimate the IC_{50} beyond the concentration range, which increases the estimation uncertainty, but does not generally affect the observational uncertainty. However, we observed that the estimation uncertainty from our GPs for dabrafenib (BRAF inhibitor) tested in two independent studies on the same cell lines were comparable both within and beyond the concentration range (Figure 2B).
Since the replicate experiments were conducted in batches over a period of several months, we verified that the observed trends held regardless of batches (Figure 2—figure supplement 1). Additionally, we examined the relationship between estimation uncertainty and observation uncertainty in a number of edge cases where IC_{50} was estimated within and beyond the maximum concentration tested (Figure 2C–E). In the case of olaparib tested on PC14, the uncertainty for the IC_{50} within each replicate experiment was high, and this level of uncertainty was consistent across all replicates even beyond the max concentration (Figure 2C and F). In other replicate experiments, both estimation and observation uncertainty were low (Figure 2D and G), or varied depending on whether the batch reported mostly IC_{50} values beyond the concentration range. Talazoparib tested in colorectal cancer line HCT15 is a case where observation uncertainty was high, even though estimation uncertainty was low, and experiments in different batches showed different estimated IC_{50}s from very different doseresponse curves (Figure 2E and H).
In order to examine the diversity of uncertainty estimates across experiments further, we described the relationship between AUC value of GP fits with their corresponding estimation uncertainty (Figure 3). We decided to use AUC here due to the greater uncertainty of estimating IC_{50}s beyond the maximum dose concentration. Since AUCs were computed within the tested concentration range, the estimation uncertainty for AUC was not substantially higher for cases where IC_{50}s were estimated within compared to beyond the maximum concentration (Figure 3—figure supplement 1A). The difference between the AUC estimates from the GP compared to the published GDSC sigmoid curve fits was greatest for experiments showing a partial response (AUC between 0.4 and 0.9), whilst at the same time these experiments also had the highest estimation uncertainty (Figure 3A). Our visual examination of the raw doseresponse data from those experiments revealed evidence of poor quality readouts, for instance, where cell viability increases with increasing drug dose (Figure 3—figure supplement 1B). We were able to quantify the quality of these readouts by estimating the Spearman correlation coefficient based on the raw cell viability counts and the dose concentrations (Figure 3B). A negative Spearman correlation indicates that cell viability decreases as dosage increases (as expected) whilst a positive Spearman correlation indicates the opposite. The experiments with high estimation uncertainty from our GPs were also the experiments with high Spearman correlation pointing to poor quality.
Next, we investigated whether there were any attributes of experiments that would correspond to high estimation uncertainty and poor quality results. Labelling of experiments based on cell culture conditions, dose and cancer type revealed no obvious associations with estimation uncertainty (Figure 3—figure supplement 2A–E). However, there was a large spread in the uncertainty estimates for AUC when we grouped the experiments into target pathways based on the primary targets of the tested drugs (Figure 3C; Figure 3—figure supplement 2F). Whilst most drugs had similar average AUC point estimates between 0.6 and 0.8, suggesting they all had a spread of experiments showing resistance and sensitivity, the average estimation uncertainties varied across target pathways. Interestingly, similar target pathways (e.g. chromatin histone methylation and chromatin histone acetylation) had very different levels of estimation uncertainty. Within each of these target pathways, we also see different distributions of estimation uncertainties (Figure 3D). Most target pathways have a bimodal distribution representing compounds that have low uncertainty in the cases of clear sensitivity or resistance, and high uncertainty in the cases of partial responders (Figure 3E). Both chromatin histone methylation drugs in particular had a much longer right tail towards higher estimation uncertainties that are associated with poor experimental readouts, or possibly offtargets.
Curve fits using Gaussian Processes can help identify clinically relevant biomarkers
The IC_{50} values are highly conconcordant for sigmoid and GPcurve fittings, showing an average weighted Pearson correlation of 0.88 (95% CI [0.85; 0.91]) across individual drugs, and cancer types (Figure 4A). Strong agreement is found when true responding cell lines were observed in the screen (Figure 4B). For example, if >10% of cell lines responded within the concentration range, that is IC_{50} <maximum tested concentration, then a weighted Pearson correlation >0.75 was consistently achieved for all drugs. We found positive correlations for all drugs, even when comparing exclusively nonresponding cell lines, where all the IC_{50} values are extrapolated beyond the maximum dosage range. Drugresponse values are concordantly fitted with both methods for sensitive cell lines (Figure 4C, mean log10(IC_{50}) in µM of 0.02 95% CI [−0.05; 0.09]), whilst extrapolated nonresponders tend to lead to more conservative and higher IC_{50} values fitted with GP (Figure 4C, mean log10(IC_{50}) in µM of 1.10, 95% CI [1.03; 1.18]). Whilst the average fits from the sigmoid and GP models identify known clinical biomarkers, there are clearly differences for individual cell lines, especially when the IC_{50} value has been extrapolated beyond the dosage range, that may help identify new biomarkers. Alternatively, AUC values can be used to compare both curve fitting methods (Figure 4—figure supplement 1). Whilst known clinical biomarkers are recovered with AUC as a drugresponse metric, IC_{50} measures were used in the subsequent analysis as they retain direct relationship with the drug concentration and are more interpretable.
To highlight the overall agreement of both curve fitting methods, we systematically tested 26 clinically established biomarkers of drug response (Figure 4D, Figure 4—figure supplement 2A–C, Supplementary file 1) using previously established association tests (Iorio et al., 2016), 24 of which were significantly reproduced regardless of sigmoid or GPcurve fitting (10% FDR). For example, both curve fittings captured the association of BRAF inhibitors (PLX4720, progenitor of vemurafenib; and dabrafenib) with BRAF mutations in melanoma (Figure 4—figure supplement 3A–C; Chapman et al., 2011). Dabrafenib is a potent BRAF inhibitor and in addition we detected BRAF mutations as a sensitivity marker in thyroid carcinoma (Figure 4D, Figure 4—figure supplement 3D). Another example are the EGFR inhibitors, afatinib and gefitinib, that are concordantly correlated with drug sensitivity in EGFR mutant cell lines in lung adenocarcinoma (Figure 4—figure supplement 3E–G; Tamura and Fukuoka, 2005; Yang et al., 2012). ERBB2(HER2) amplification in breast cancer was also recapitulated as a biomarker of sensitivity to the dual EGFR/ERBB2 inhibitor lapatinib (Figure 4—figure supplement 3H; Konecny et al., 2006). Among the 26 clinical biomarkers, we consistently found drug resistance of TP53 mutants to MDM2 inhibition with nutlin3a in five different cancer types (Figure 4E, Figure 4—figure supplement 3I–L). Overall, the majority of expected clinical and preclinical biomarkers are reproduced, regardless of the drugresponse curve fitting method.
We concordantly and significantly identified six novel (not yet clinically established) drug sensitivity biomarkers (0.1% FDR) regardless of the applied drugresponse curve fitting method. Investigating two different curve fitting algorithms, and retrieving the same biomarkers can be considered as a test of robustness, which in our case concordantly highlighted nongold standard associations for prioritising experimental validation. For example, daporinad (also known as FK866 and APO866) is a smallmolecule inhibitor of nicotinamide phosphoribosyltransferase leading to inhibition of NAD+ biosynthesis. It has been clinically tested in melanoma (ClinicalTrials.gov Identifier: NCT00432107), Refractory BCLL (NCT00435084) and Cutaneous Tcell Lymphoma (NCT00431912), whilst showing antiproliferative effect in glioblastoma cell lines (Zhang et al., 2012). Therapeutic potential when combining with other drugs used to treat gliomas (LucenaCacace et al., 2019; LucenaCacace et al., 2017) has been suggested, whilst we additionally and concordantly identify EGFR amplification as a biomarker (Figure 4F).
Another novel and concordant identified biomarker is doramapimod response (also known as BIRB796) in ARID2 mutant melanoma cell lines (Figure 4G). Doramapimod is a smallmolecule p38 MAPK inhibitor and has been reported in different cancer types (in combination with other drugs) including cervical cancer, paracrine tumours and myeloma (Jin et al., 2016; Yasui et al., 2007). ARID2 is part of chromatin remodelling complex and is involved in DNA repair in hepatocellular carcinoma cells (Oba et al., 2017) and enriched in melanomas (Ding et al., 2014; Hodis et al., 2012). In conclusion, different curve fitting approaches lead to concordantly and novel identified biomarkers, thereby increasing the robustness in those findings, and consequently enabling to prioritise hypotheses.
Improved biomarker detection by taking into account uncertainty in a Bayesian framework
Since both Bayesian and frequentist methods can be used to prioritise biomarkers for further testing, we compared association statistics (posterior probabilities and qvalues) from both statistical methods. We observed a number of cases where the Bayesian and ANOVA tests disagree (Figure 5A; Supplementary file 2). For instance, BRAF mutations in colorectal cancer were detected as a sensitivity biomarker for dabrafenib by the Bayesian test, but less significant by the ANOVA test. This association had been repeatedly reported in in vitro models (Iorio et al., 2016; Rees et al., 2016) and also found in melanoma cases (Chapman et al., 2011), whilst not in colorectal cancer patients due to feedback activation of ERKsignalling mediated via EGFR (Corcoran et al., 2018; Prahallad et al., 2012). We note in Figure 5B that the Bayesian test takes advantage of the additional information that sensitive mutant cell lines have low estimation uncertainty, whilst the small number of resistant mutant cell lines have high estimation uncertainty, causing them to have less influence on the biomarker detection. On the other hand, the ANOVA model detected the KRAS copy number alteration as a resistance biomarker for lenalidomide (immunomodulatory drug) partial sensitivity in skin cutaneous melanoma (SKCM), whilst not detected by our Bayesian approach. Whilst on the linear IC_{50} scale there is some difference between the small number of mutant cell lines and wildtypes, the Bayesian model considered that the estimated responses of the mutant cell lines had high uncertainty (Figure 5C). Additionally, a comparison of the uncertainty estimates for the GP and the Sigmoid curve fitting methods revealed that both display concordant results (Figure 5B and C; Figure 5—figure supplement 1); However, the Sigmoid curve fitting method (Materials and methods; Vis et al., 2016) underestimates variance of nonresponding cell lines rendering the GP approach superior. The dosages within Figure 5B and C were rescaled to prevent the need for adapting the lengthscale hyperparameter to the maximum dosage. IC_{50} values were backtransformed to the log10 drug dosage scale to make comparisons with (Iorio et al., 2016) (see Materials and methods). Whilst discrepancies between Bayesian and ANOVA tests have to be taken with caution, they may highlight novel biological insights which would be missed when applying only a single model.
Discussion
The GP approach developed in this research has several advantages compared to the traditional approach of fitting sigmoidal drugresponse curves. Firstly, these flexible, nonparametric models can be used to fit a wider variety of doseresponse curves than the parametric sigmoidal models, for example curves of unexpected shapes may reflect biological signals of offtarget effects. Secondly, the GP models provide straightforward uncertainty quantification of any summary statistic that can be calculated on a doseresponse curve, a fact that we take advantage of in developing our hierarchical Bayesian model for biomarker testing. Thirdly, the GP model can deal with outlying measurements better than a sigmoidal model, due to formulating it as a mixture model with one component representing the latent GP process of the drug response, and the second component accounting for outliers.
In contrast to other GPbased models in Tansey et al., 2018, our approach is highly interpretable, as we do not integrate the biomarkers into the model estimation in a nonlinear fashion, but instead proceed in a twostep approach that first fits our Gaussian process model to the doseresponse curves, and then uses the derived summary statistics and uncertainty measures to perform biomarker detection. Thus, we can take advantage of the flexibility of the Gaussian process without the complexity of fitting a nonlinear neural network to enable prediction from molecular measurements.
The increased flexibility of the GP model comes at a price. Most notably, because we do not impose a specific functional form, there are few constraints on the behaviour of the curve outside the range of observed dosages. This leads to the counterintuitive behaviour that the posterior mean estimate of drug response can go up when extrapolating beyond the maximum dosage. Note, however, that this goes along with a commensurate increase in the posterior variance (Figure 5D,E). In other words, the model is highlighting that extrapolation beyond the observed dosage range is highly uncertain, and the posterior mean estimate should not be relied on. It would be possible to constrain this behaviour by introducing artificial data points at a high concentration, or less crudely by imposing monotonicity constraints via virtual derivative observations (Riihimäki and Vehtari, 2010). However, these methods would limit the flexibility of our method and lead us to underestimate the uncertainty of the posterior mean. An alternative approach is to constrain the Gaussian process using generalised analytic slice sampling (Tansey et al., 2019), which integrates the constraints into the sampling process. Whilst theoretically appealing, this approach is not compatible with the variational inference method that we have chosen for our work, and would lead to an unacceptable increase in computational burden for fitting the doseresponse curves.
We have systematically compared the application of GP to sigmoid models across a pancancer drug screen. We demonstrated that our GP estimates of the IC_{50} values and their subsequently predicted biomarkers using ANOVA are reliable when compared to estimates from the sigmoid models. In addition, the GP models provide useful information about the uncertainty associated with the drugresponse quantification. However, there is a crucial difference between estimation uncertainty on a single experiment and observational uncertainty across multiple replicates of the same experiment, which incorporates measurement error, technical and biological variation. We are interested in the former to assess the quality of the fit, and therefore the reliability of the estimated IC_{50}. We hypothesized that estimation uncertainty characterises observational uncertainty within the dose concentration range tested. However, extrapolating beyond the concentration range would be challenging due to the uncertainty in the behaviour of the doseresponse curve in unobserved concentrations. Imposing monotonicity may not be the best path in this case, but we avoid making this assumption. Instead, our method defines a very large confidence interval for drugresponse statistics extrapolated beyond the maximum dose tested and we would additionally need to take the observation uncertainty between replicate experiments into account. We have verified this by applying our estimation method to a replication data set of 26 drugs tested on 10 different cell lines, with 8 to 9 replicates for each drugcell line experiment. We conclude that whilst estimation uncertainty is a useful indicator for withinconcentration IC_{50} values, it cannot be used as a proxy for observation uncertainty when the IC_{50} is extrapolated beyond the tested concentration range. Indeed, overall drug responses and biomarkers from independent drug screens were consistent when comparing similar dose ranges (Haverty et al., 2016). Any difference between replicate experiments may be due to batch effects or other unobserved factors that are not necessarily reflected in the estimation error. Whilst previous studies have attempted to capture uncertainty by measuring the spread of the residuals from the fitted curves, such as root mean square error, they were not able to capture these false positive biomarkers by setting strict cutoffs (Cokelaer et al., 2018).
Whilst Bayesian posterior probabilities and ANOVA qvalues are different statistical quantities for measuring biomarker associations that should not be compared in absolute terms, we compared these quantities in relative terms to prioritise biomarkers of response for further testing. Our Bayesian biomarker model extends the classical ANOVA testing, since it is able to leverage the estimation uncertainty of the IC_{50} values. We showed that taking estimation uncertainty into account in the Bayesian model can lead to both inclusion and exclusion of putative biomarkers. For example, the Bayesian model highlighted the association between BRAF mutation in colorectal cancer and BRAF inhibitor response. Targeting BRAF signalling has recently been confirmed as a viable option for metastatic colorectal cancer cases with BRAF mutations (Kopetz et al., 2019). In contrast, the Bayesian model excluded a suggestion from ANOVA of association between KRAS mutation with lenalidomide response in melanoma. Lenalidomide has thus far had no clinical success in KRAS mutant cases nor melanoma (Gandhi et al., 2013; Glaspy et al., 2009).
Although we systematically tested for drugbiomarker associations, we did observe common behaviour for certain cell types or classes of drugs. The high uncertainty in the response estimates of chromatin histone methylation targeting compounds for instance may be due to the large number of factors contributing to epigenetic regulation of cells (Luo, 2015). It would be straightforward to extend the GP model to allow for sharing information across drugs or cell lines of similar class, by using either shared hyperparameters or a hyperprior on the hyperparameters. We have not implemented this approach in our work here as our aim was to show the advantage of fitting individual drugresponse using GPs, and extending the method to fitting multiple curves jointly would increase the memory and computational requirements significantly. It is our hope to continue expanding the suite to multiple dimensions of doseresponse and biomarker prediction needed for drug combinations, which is predominantly based on synergy modelling with either Loewe Additivity or Bliss Independence (Di Veroli et al., 2016; Vlot et al., 2019). In cases where multiple statistical models converge to concordant biomarkers, this increases the reproducibility of the evidence, potential for clinical translatability and ultimately enables precision medicine.
The increasing utilisation of highthroughput drug screening for identifying effective new treatments will necessitate the use of more powerful statistical and machine learning methods (Toh et al., 2019). We have introduced an approach for quantifying the uncertainties of doseresponse using Gaussian Processes and further described how these uncertainties can be integrated into statistical testing of biomarkers. For cancer treatments, our approach can help estimate the uncertainty of doseresponses reported in the numerous drug screening studies by academic (Ghandi et al., 2019; Holbeck et al., 2017; Iorio et al., 2016) and pharmaceutical laboratories (Menden et al., 2019; O'Neil et al., 2016). This can provide more robust metrics for comparing drug responses to identify the most potent ones and highlight sensitivity biomarkers that are more likely to succeed clinically because they are associated with low uncertainty. The approach is also generalisable beyond cancer to any disease and any doseresponse measures. We hope that by considering response uncertainty and providing a probabilistic view of drug biomarkers, the risks associated with drug development can be better balanced and smarter decisions can be made.
Materials and methods
Drug screening
Request a detailed protocolWe analysed 1074 cancer cell lines tested with 265 compounds from a highthroughput screen resulting in 225,384 experiments that were previously published (Iorio et al., 2016). Cell line data was retrieved and is publically available via the GDSC website (Key Resources Table). All cell lines were authenticated. Details for each cell line can be found at: https://www.cancerrxgene.org/help.
Compounds were tested with 5 to 9 titration points, whilst either diluted with 4 or 2fold, respectively. Cells were seeded on day zero, left in the microtiter plate for 24 hr to retain linear growth, and consecutively treated for 3 days. After those 3 days of treatment, cellTiterGlo staining is used to quantify ATP levels within each well. In parallel, untreated cells and blank wells were also measured to estimate and normalise cell viability.
Compounds within the replicate study were screened across a seven point doseresponse curve with a halflog dilution and 1000 fold range. The duration of drug treatment was 72 hr and cell viability was measured using CellTiterGlo (Promega). Each cell line and compound pair was screened in technical triplicate, three assay plates generated simultaneously, and across three biological replicates with 46 and 44 days between the first to second and second to third replicates respectively. Cell viability measurements for these experiments can be found in Supplementary file 3.
Preprocessing
Request a detailed protocolPrior to analysis, we scaled the raw observed fluorescent intensities for each drug/cell line combination using the observations from the blank and negative control wells as follows. Let $R=\left\{{r}_{1},{r}_{2}\mathrm{...},{r}_{n}\right\}$ be the observed intensities for n dosages. Let B be the mean of the intensities for the blank wells on the same plate as the experiment, and C be the mean of the intensities of the negative control wells (no drug added). Then the relative cell viability $V$ can be calculated as:
Relative cell viability values below 0 (n = 2646, Figure 5—figure supplement 2) were set to 0.
For the purpose of fitting the Gaussian process models, we additionally rescale the dosages to avoid having to adapt the lengthscale hyperparameter to the maximum dosage. We rescale the log_{2}transformed dosages $d=\left\{{d}_{1},{d}_{2}\mathrm{...},{d}_{n}\right\}$ as follows:
Note that IC_{50} values have been backtransformed to the log10 drug dosage scale for comparability with those reported in Iorio et al., 2016.
Sigmoid drugresponse model
Request a detailed protocolThe GDSC estimates in Iorio et al., 2016 were obtained using a sigmoid fit to the drugresponse curve, using the same preprocessing of the fluorescent intensities as described above. The particular sigmoid model used is the one described in Vis et al., 2016. In brief, if we have shape parameter $s}_{i$ and position parameter $p}_{ij$ for cell line $i$ and drug $j$ , then cell viability can be represented as a function of dosage $d$:
Note that this allows for cell line/drug specific position parameters, but shape parameters that only vary by cell line and are shared across drugs. The position parameter $p}_{ij$ corresponds to the estimated IC_{50} for cell line $i$ and drug $j$. For full details, see Vis et al., 2016.
To estimate the uncertainty of the Sigmoid curve fitting, a random bootstrap sampling of 80% of all treated cell lines available for each drug over 100 iterations was performed. The Sigmoid curve fitting model from GDSC (Vis et al. 2013) estimates one scale parameter per drug across all treated cell lines, thus the sampling creates variance in the response data. The standard deviation of the log(IC_{50}) estimates was computed to assess the model’s variance.
Gaussian process drugresponse model
Request a detailed protocolFor simplicity, we drop the subscripts $ij$ and present the combination. We model the drug response $\mathbf{y}$ via a twocomponent Beta mixture such that:
where $Bet{a}^{\mu}$ is the reparameterization of the Beta distribution in terms of the mean µ and a scale parameter $s$, and $\mathrm{\Phi}}^{1$ is the probit function (the inverse of the standard normal cumulative distribution function). Component one represents the drug response, which is driven by a latent Gaussian process $\mathbf{f}$, whilst component two represents outliers that deviate from the overall dose response trend. We set the scale parameters ${s}_{1}=50$ and ${s}_{2}=11$ and specify ${\mu}_{2}=0.9$ to reflect our belief that outliers will mostly be erroneous measurements of resistance. We set $\pi =0.999$ as we believe that outliers are rare.
We place a standard Gaussian process prior on $\mathbf{f}$, such that:
where $\mathbf{m}$ is the mean drug response, and ${C}_{\mathrm{\Psi}}(d,\phantom{\rule{thinmathspace}{0ex}}{d}^{\prime})$ is a covariance function with hyperparameters $\mathrm{\Psi}$; in practice we choose a combined linearMatern3/2 as a flexible option, which avoids the excessive smoothness of restrictions of the commonly used RBF kernel. Stein (1999) argues that this is a more realistic representation for physical processes (Stein, 2012). Information sharing across drugs and cell lines can be achieved via shared hyperpriors in a hierarchical model. For the application in this paper joint inference with shared hyperpriors would be computationally difficult, and we choose to instead empirically set the variance and lengthscale parameters for the Matern to 0.2 and 0.3, respectively, and the variance parameter for the linear kernel to 0.1.
Inference is performed using variational learning (Hensman et al., 2013), via the GPFlow software (Matthews AG de and van der Wilk, 2017). We choose variational learning over alternatives such as Markov chain Monte Carlo due to its speed, which allows us to process large drugresponse panels in a realistic time frame. Hyperparameters for the GP model were determined by manual tuning; however, for other datasets, we could also envision a Bayesian model selection procedure which places the variational inference in a variationalwithinMCMC scheme where the MCMC moves update the hyperparameters. If fixed hyperparameters are desired, one could use the maximum a posteriori values. To avoid massive computational complexity, the MCMC scheme could be run on a representative subsample of cell lines.
Calculation of summary statistics
Request a detailed protocolSummary statistics of drug response can be calculated straightforwardly by sampling from the posterior of the Gaussian process (Supplementary file 4). Generally, let $g(\mathbf{d}\mathbf{,}\phantom{\rule{thinmathspace}{0ex}}\mathbf{y})$ be a function that calculates a summary statistic $\tau$ from a doseresponse curve with dosages $\mathbf{d}$ and responses $\mathbf{y}$, then we can obtain a posterior estimate of the mean of the summary statistic by first sampling $N$ doseresponse curves from the posterior of the GP model, and then calculating the average:
A similar procedure can be used to calculate the posterior estimate of the standard deviation.
Although we can extract other response statistics from our curve fits, the most common are the IC_{50} and the area under the drugresponse curve (AUC). On the log_{2} dosage scale the dosages are equally spaced, and hence AUC can be straightforwardly estimated by the mean function:
where $m$ indexes over the $n$ dosages. For the IC_{50}, estimation for a single curve is complicated by the fact that the curve may not cross the 50% viability threshold within the observed dosage range (noncrossing sample). We therefore extrapolate the GP samples to 10 times the maximum (log_{2}) experimental dosage and specify ${g}_{IC50}(\mathbf{d}\mathbf{,}\phantom{\rule{thinmathspace}{0ex}}\mathbf{y})$ as:
Note that this ignores samples where for all dosages, ${y}_{m}\le 0.5$; one could devise a multivariate sufficient statistic that takes this information into account, but we have found that in general there is a reasonable amount of correlation between ${g}_{IC50}(\mathbf{d}\mathbf{,}\phantom{\rule{thinmathspace}{0ex}}\mathbf{y})$ and the number of noncrossing samples for a given cell line/drug combination.
Comparison of GP and sigmoid IC_{50} values
Request a detailed protocolConcordance between IC_{50} values based on sigmoid and GPcurve fitting is quantified with Pearson correlation for each drug. To account for tissue specificity and the varying number of cell lines assessed per tissue type, we employed the average weighted Pearson correlation ($pw$) of the sigmoidcurve versus GPcurve fitted IC_{50} values for the individual cancer types ($i$).
The weight for a given cancer type $i$ was denoted as $\sqrt{{n}_{i}1}$, where $n}_{i$ is the total number of cell lines treated with the drug within this tissue type. The following metric was applied,
where $p}_{i$ is unweighted Pearson correlation within a cancer type ($i$) and a total number of tested cancer types is $N=30$. For a given drug and tissue type combination, at least 10 cell lines need to be treated (${n}_{i}\ge 10$).
Differences in IC_{50} values for each drugresponse value j were consistently defined as
with a total number of tested cell line and drug combinations equalling to ${N}_{j}=171,937$.
Bayesian biomarker testing
Request a detailed protocolStandard statistical approaches for testing the influence of biomarkers on drug response mostly rely on analysis of variance (ANOVA) testing. An ANOVA can be understood as a linear model of the dependent variable $i$ (in this case, a summary measure of drug response such as IC_{50}):
where $x}_{i$ is an indicator variable denoting the group membership of data point $i$. In our application, the data points are cell lines, $z}_{i$ indicates group membership, for example the mutation status of a given SNP, and $x}_{i$ indicates any other covariates that we wish to correct for, such as tissue type. The parameter $\alpha$ captures the global mean of the drug response, whilst $\beta$ captures the effect of mutation status on the drug response, $\gamma$ is the effect of covariates, and $\u03f5}_{i$ is independent Gaussian noise.
This model, whilst useful, fails to account for the fact that our Gaussian process model provides estimates $\sigma}_{i$ of the uncertainty (or standard error) associated with the mean IC_{50} estimates $g}_{i$. In order to make use of these uncertainty estimates, we take an idea from Bayesian metaanalysis, and integrate them via a hierarchical model:
where $\mu}_{i$ is the mean drugresponse estimate for cell line $i$, and $\sigma}^{\ast 2$ is the variance across cell lines (the variance of $\u03f5}_{i$ in the ANOVA example). Note that this model can be reduced to:
We further specify a Gaussian prior $\beta \sim \mathcal{\mathcal{N}}(0,\phantom{\rule{thinmathspace}{0ex}}0.1)$ on the effect size parameter to discourage false positives and reflect our prior belief that most mutations are not associated with drug response. We also place an exponential prior ${\sigma}^{\ast 2}\sim Exp(10)$ to regularize the variance parameter. Finally, $\alpha \sim \mathcal{\mathcal{N}}(0,{\tau}^{2})$ is a Gaussian prior on the global mean with standard error $\tau \sim Gamma(1,\phantom{\rule{thinmathspace}{0ex}}1)$. Early exploratory results showed that using the estimates of $\sigma}_{i$ directly placed too much weight on experiments with very low estimation uncertainty, leading to unrealistic posterior estimates of the effect size $\beta$. To attenuate this, we used a transformed estimate $\sigma}_{i}^{c$, where the effect of parameter $c$ was explored over the range [0,1], and empirically set to 0.25 for the results reported in this paper. The main tuneable hyperparameter is the scaling parameter c, as the model is robust to changes to the parameters for the sparse priors on $\beta$ and $\sigma}^{\ast 2$. Setting this hyperparameter is straightforward, as we can use a simple line search to find a value that optimally trades off between disregarding the uncertainty estimates (c = 0) and placing too much weights on estimates with low uncertainty (c >= 1). One way to determine the optimal value for c is to randomly permute the biomarker labels, and reduce c until the false positive rate is below some acceptable threshold.
Inference in this model is performed using Hamiltonian Monte Carlo via the Stan software package Carpenter, 2017. We report the posterior mode of $\beta$ as well as the posterior probability of observing $\beta >0$ (if the posterior mode is positive) or $\beta <0$ (if the posterior mode is negative).
Data availability
All data is available through the GDSC downloads portal (ftp://ftp.sanger.ac.uk/pub4/cancerrxgene/releases). Raw dose response data have been deposited in GDSC under v17a_public_raw_data.csv (ftp://ftp.sanger.ac.uk/pub4/cancerrxgene/releases/release6.0/v17a_public_raw_data.csv). Sigmoid fitted doseresponse curves have been deposited in GDSC under v17_fitted_dose_response.csv (ftp://ftp.sanger.ac.uk/pub4/cancerrxgene/releases/release6.0/v17_fitted_dose_response.xlsx). Cell line genomics data have been deposited in GDSC under GDSCtools_mobems.zip (ftp://ftp.sanger.ac.uk/pub4/cancerrxgene/releases/release8.0/GDSCtools_mobems.zip). Cell line identity details have been deposited in GDSC under Cell_Lines_Details.xlsx (ftp://ftp.sanger.ac.uk/pub4/cancerrxgene/releases/release7.0/Cell_Lines_Details.xlsx). Drug compound details have been deposited in GDSC under screened_compunds_rel_8.2.csv (ftp://ftp.sanger.ac.uk/pub4/cancerrxgene/releases/release6.0/Screened_Compounds.xlsx).
References

Phase II and phase III attrition rates 2011–2012Nature Reviews Drug Discovery 12:569.https://doi.org/10.1038/nrd4090

Stan: a probabilistic programming languageJournal of Statistical Software 76:i01.https://doi.org/10.18637/jss.v076.i01

Improved survival with vemurafenib in melanoma with BRAF V600E mutationNew England Journal of Medicine 364:2507–2516.https://doi.org/10.1056/NEJMoa1103782

GDSCTools for mining pharmacogenomic interactions in cancerBioinformatics 34:1226–1228.https://doi.org/10.1093/bioinformatics/btx744

Lessons learned from the fate of AstraZeneca's drug pipeline: a fivedimensional frameworkNature Reviews Drug Discovery 13:419–431.https://doi.org/10.1038/nrd4309

A community effort to assess and improve drug sensitivity prediction algorithmsNature Biotechnology 32:1202–1212.https://doi.org/10.1038/nbt.2877

Metrics other than potency reveal systematic variation in responses to cancer drugsNature Chemical Biology 9:708–714.https://doi.org/10.1038/nchembio.1337

BookProceedings of the TwentyNinth Conference on Uncertainty in Artificial Intelligence, UAI’13In: Hensman J, editors. Gaussian Processes for Big Data. AUAI Press. pp. 282–290.

The p38 MAPK inhibitor BIRB796 enhances the antitumor effects of VX680 in cervical CancerCancer Biology & Therapy 17:566–576.https://doi.org/10.1080/15384047.2016.1177676

Defining subpopulations of differential drug response to reveal novel target populationsNpj Systems Biology and Applications 5:36.https://doi.org/10.1038/s4154001901134

Encorafenib, Binimetinib, and cetuximab in BRAF V600EMutated colorectal CancerNew England Journal of Medicine 381:1632–1643.https://doi.org/10.1056/NEJMoa1908075

Switched latent force models for ReverseEngineering transcriptional regulation in gene expression dataIEEE/ACM Transactions on Computational Biology and Bioinformatics 16:322–335.https://doi.org/10.1109/TCBB.2017.2764908

Inhibitors of protein methyltransferases as chemical toolsEpigenomics 7:1327–1338.https://doi.org/10.2217/epi.15.87

GPflow: A Gaussian process library using TensorFlowJournal of Machine Learning Research 18:1–6.

An unbiased oncology compound screen to identify novel combination strategiesMolecular Cancer Therapeutics 15:1155–1162.https://doi.org/10.1158/15357163.MCT150843

ARID2 modulates DNA damage response in human hepatocellular carcinoma cellsJournal of Hepatology 66:942–951.https://doi.org/10.1016/j.jhep.2016.12.026

BookGaussian Processes for Machine LearningMassachusetts, United States: MIT Press.

Correlating chemical sensitivity and basal gene expression reveals mechanism of actionNature Chemical Biology 12:109–116.https://doi.org/10.1038/nchembio.1986

ConferenceGaussian processes with monotonicity informationProceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. pp. 645–652.

BookGaussian Process Regression Analysis for Functional DataNew York, United States: CRC Press.https://doi.org/10.1201/b11038

BookInterpolation of Spatial Data: Some Theory for KrigingBerlin, Germany: Springer Science & Business Media.

Gefitinib in nonsmall cell lung cancerExpert Opinion on Pharmacotherapy 6:985–993.https://doi.org/10.1517/14656566.6.6.985

Multilevel models improve precision and speed of IC50 estimatesPharmacogenomics 17:691–700.https://doi.org/10.2217/pgs.16.15

A grid algorithm for high throughput fitting of doseresponse curve dataCurrent Chemical Genomics 4:57–66.https://doi.org/10.2174/1875397301004010057

LUXLung 3: a randomized, openlabel, phase III study of afatinib versus pemetrexed and cisplatin as firstline treatment for patients with advanced adenocarcinoma of the lung harboring EGFRactivating mutationsJournal of Clinical Oncology 30:LBA7500.https://doi.org/10.1200/jco.2012.30.15_suppl.lba7500

BIRB 796 enhances cytotoxicity triggered by bortezomib, heat shock protein (Hsp) 90 inhibitor, and dexamethasone via inhibition of p38 mitogenactivated protein kinase/Hsp27 pathway in multiple myeloma cell lines and inhibits paracrine tumour growthBritish Journal of Haematology 136:414–423.https://doi.org/10.1111/j.13652141.2006.06443.x

Antiproliferation effect of APO866 on C6 glioblastoma cells by inhibiting nicotinamide phosphoribosyltransferaseEuropean Journal of Pharmacology 674:163–170.https://doi.org/10.1016/j.ejphar.2011.11.017
Decision letter

Joseph LehárReviewing Editor; Boyce Thompson Institute for Plant Research, United States

Detlef WeigelSenior Editor; Max Planck Institute for Developmental Biology, Germany

Alexander W BlockerReviewer
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This manuscript presents two statistical approaches to evaluating for drug effect measurements and associations between biomarkers, for dose curve data. Measurements of these kinds are made in many contexts, and frequently reported without accounting well for measurement uncertainties. A statistical framework of this kind will be widely useful and should be frequently applied.
Decision letter after peer review:
Thank you for submitting your article "A statistical framework for assessing pharmacological response and biomarkers using uncertainty estimates" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Detlef Weigel as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Alexander W Blocker (Reviewer #1).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
We would like to draw your attention to changes in our revision policy that we have made in response to COVID19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.
This manuscript presents two statistical approaches to evaluating for drug effect measurements and associations between biomarkers, for dose curve data. Measurements of these kinds are made in many contexts, and frequently reported without accounting well for measurement uncertainties. A statistical framework of this kind will be widely useful and should be frequently applied.
The reviewers noted some issues with in the manuscript, which should be addressed before final acceptance to eLife.
Essential revisions:
1) While this work presents a clear statistical framework, the Discussion and Abstract should explain more clearly how this framework improves drug potency or biomarker discovery efforts. That will help communicate the relevance of this work.
2) The authors should clarify of how measurement errors in the curves relate to fitted parameter uncertainties, to explain the superiority of particular fitting approaches.
3) A more thorough discussion of fitting performance is recommended. In particular, accounting for how fitting parameter selection and degrees of freedom. It would also be helpful to explore robust estimators to assess the reliability of fitted parameters. A more thorough analysis is also recommended for edge cases where the IC_{50} falls outside the measured range.
4) Additional model checking and comparisons would be useful. For example, Comparing the presented models to a more standard sigmoid, and additional comparisons highlighted in the comments. It would also help to clarify the relationship between Bayesian posterior probabilities, and ANOVA qvalues for the GP approach.
5) The authors should revise Figure 3C and D. Grouping response curves by mechanism can be misleading since different drugs or biomarkers may intrinsically have discordant potencies due to differing ontarget binding efficiency or offtarget effects.
https://doi.org/10.7554/eLife.60352.sa1Author response
Essential revisions:
1) While this work presents a clear statistical framework, the Discussion and Abstract should explain more clearly how this framework improves drug potency or biomarker discovery efforts. That will help communicate the relevance of this work.
We have highlighted the utility of our framework for identifying new biomarkers and highlighting drugs with more potency by putting more weight on cases with lower uncertainty in the statistical models as well as generalised to other diseases. We have edited the Abstract:
“Highthroughput testing of drugs across molecularcharacterised cell lines can identify candidate treatments and discover biomarkers. […] Our method is applicable to any doseresponse data without replicates, and improves biomarker discovery for precision medicine.”
and added the following section in the Discussion to communicate this:
“The increasing utilisation of highthroughput drug screening for identifying effective new treatments will necessitate the use of more powerful statistical and machine learning methods (Toh et al., 2019). […] We hope that by considering response uncertainty and providing a probabilistic view of drug biomarkers, the risks associated with drug development can be better balanced and smarter decisions can be made.”
2) The authors should clarify of how measurement errors in the curves relate to fitted parameter uncertainties, to explain the superiority of particular fitting approaches.
First of all, we would like to clarify the term ‘estimation uncertainty’ in the context of our method. This is not related to betweensample dispersion; due to the highthroughput nature of the GDSC assays, no technical or biological replicates were generated, with the exception of the replicate dataset described in Figure 2 for quality control. Hence the estimation uncertainty refers to the standard deviation of summary measures estimated from a single sample. Here, we capture the dispersion of the posterior distribution of these summary measures. If the dispersion is large, then we will have less confidence in the estimated mean value of the summary measure. Measurement errors for individual points on the doseresponse curve will generally result in larger estimation uncertainty. We have added a clarification in the Results section:
“When the estimation uncertainty is large, we will have less confidence in the estimated IC_{50} in an experiment. Measurement errors for individual points in a doseresponse curve will generally result in larger estimation uncertainty, whereas greater variation between biological replicates will result in larger observation uncertainty.”
Also in Figure 2, we have provided a detailed comparison between this estimation uncertainty and the observation uncertainty that can be measured by estimating summary values for multiple biological replicates. We observe that for the estimation of IC_{50} values, observation uncertainty is strongly positively correlated (r=0.84) with estimation uncertainty when the IC_{50} value is within the dosage range, but for values outside the dosage range, the estimation uncertainty increases, while the observation uncertainty decreases. This indicates that estimation uncertainty is reflective of the true uncertainty in the estimate when the relevant summary statistic (IC_{50}) can be determined from the measured dosages alone, but for estimating statistics that require extrapolation beyond the maximum dosage, we would additionally need to take the observation uncertainty into account. We addressed this in the Discussion section by adding this text:
“We are interested in the former to assess the quality of the fit, and therefore the reliability of the estimated IC_{50}. […] Instead, our method defines a very large confidence interval for drug response statistics extrapolated beyond the maximum dose tested and we would additionally need to take the observation uncertainty between replicate experiments into account.”
3) A more thorough discussion of fitting performance is recommended. In particular, accounting for how fitting parameter selection and degrees of freedom.
We have added a more detailed discussion and suggestions for selecting GP fitting parameters in the Materials and methods section:
“Hyperparameters for the GP model were determined by manual tuning; however, for other datasets, we could also envision a Bayesian model selection procedure which places the variational inference in a variationalwithinMCMC scheme where the MCMC moves update the hyperparameters. […] To avoid massive computational complexity, the MCMC scheme could be run on a representative subsample of cell lines.”
We also added suggestions for how the hyperparameter in the Bayesian biomarker model can be tuned:
“The main tuneable hyperparameter is the scaling parameter c, as the model is robust to changes to the parameters for the sparse priors on β and σ^{*2}. […] One way to determine the optimal value for c is to randomly permute the biomarker labels, and reduce c until the false positive rate is below some acceptable threshold.”
It would also be helpful to explore robust estimators to assess the reliability of fitted parameters.
We agree it would be useful to assess the reliability of the fitted parameters to understand whether the performance of our GP model stemmed from correcting misspecification in original ANOVA, or the use of uncertainty estimates. Like the reviewer, we wondered whether heteroskedasticity of the IC_{50} estimates impacted performance, so we performed our biomarker association testing using the AUCs of the fitted curves, hence AUC values display a homoscedastic behaviour. For this, we show that the uncertainty estimates from the AUCs did not vary across doses like IC_{50} uncertainties (new Figure 4—figure supplement 1AD). We also show the difference in biomarker associations between sigmoid and GP curve fits (Figure 4—figure supplement 1E, F), and show that we recovered all 26 gold standard biomarkers and additional sensitivity biomarkers using AUCs (Figure 4—figure supplement 1G). These new findings have been added to the Results section:
“Alternatively, AUC values can be used to compare both curve fitting methods (Figure 4—figure supplement 1). While known clinical biomarkers are recovered with AUC as a drug response metric, IC_{50} measures were used in the subsequent analysis as they retain direct relationship with the drug concentration and are more interpretable.”
A more thorough analysis is also recommended for edge cases where the IC_{50} falls outside the measured range.
We have highlighted the analysis we conducted in Figure 2CE in the Results section:
“Additionally, we examined the relationship between estimation uncertainty and observation uncertainty in a number of edge cases where IC_{50} was estimated within and beyond the maximum concentration tested (Figure 2CE).”
The reviewer noted that it may not be correct to impose monotonicity when IC_{50} might fall outside the measured range, so we showed that there is much greater IC_{50} uncertainty in those cases. For clarifying this and enhancing the visualisation, we display curve fits within concentration range in solid lines, whilst less confident fits exceeding max concentration are indicated in grey. Furthermore, we have added in red horizontal lines showing the IC_{50} confidence intervals, which are much larger outside of concentration range. We have also provided more rationale and guidance in the Discussion section for interpreting IC_{50}s at untested concentrations:
“However, extrapolating beyond the concentration range would be challenging due to the uncertainty in the behaviour of the doseresponse curve in unobserved concentrations. […] Instead, our method defines a very large confidence interval for drug response statistics extrapolated beyond the maximum dose tested and we would additionally need to take the observation uncertainty between replicate experiments into account.”
4) Additional model checking and comparisons would be useful. For example, Comparing the presented models to a more standard sigmoid, and additional comparisons highlighted in the comments.
We thank the reviewer for this insightful comment. In order to address this, we have performed a within drug random bootstrap sampling of 80% of all treated cell lines for the Sigmoid model. In particular, we estimated the uncertainty of the model for the drugs dabrafenib and lenalidomide for colorectal and melanoma cell lines, respectively (new Figure 5—figure supplement 1), to compare with GP’s variance estimation (Figure 5BC). As theorized, the incorporation of uncertainty in the GP model provides a more adequate variance estimation in the prediction of nonresponding cell lines IC_{50}’s.
An explanation of the methodology was added in the Materials and methods section:
“To estimate the uncertainty of the Sigmoid curve fitting, a random bootstrap sampling of 80% of all treated cell lines available for each drug over 100 iterations was performed. […] The standard deviation of the log(IC_{50}) estimates was computed to assess the model’s variance.”
Additionally, a comparison between the models was added in the Results section:
“Additionally, a comparison of the uncertainty estimates for the GP and the Sigmoid curve fitting methods revealed that both display concordant results (Figure 5B and C; Figure 5—figure supplement 1); However, the Sigmoid curve fitting method (Materials and methods; Vis et al., 2013) underestimates variance of nonresponding cell lines rendering the GP approach superior.”
It would also help to clarify the relationship between Bayesian posterior probabilities, and ANOVA qvalues for the GP approach.
We agree with the reviewer that Bayesian posterior probabilities and ANOVA qvalues are different statistical quantities that should not be compared in absolute terms. Posterior probabilities capture the properties of the distribution of a parameter treated as a random value; qvalues make a statement about the probability of observing an estimated (fixed) parameter value under the null hypothesis. Nevertheless, we believe that our comparison of these quantities, in relative terms, is valid, because we are interested in the value of using each of them for making a decision about the association between drug sensitivity and biomarkers. ANOVA qvalues represent the state of the art, which means that we would expect a broad correlation between the qvalues and the Bayesian posterior probabilities, with some deviations in cases where the inclusion of uncertainty estimates helps the Bayesian model detect or discard associations. This is what we see in Figure 5A, which displayed agreement between both statistical quantities and disagreements for a few biomarker associations. The key observation here is that the comparison is in terms of the relative ranking of the Bayesian posterior probabilities and ANOVA qvalues, respectively, which enabled us to draw concordant conclusions in terms of prioritising biomarkers of drug response for clinical testing.
We have now added clarifying text in the Results section:
“Since both Bayesian and frequentist methods can be used to prioritise biomarkers for further testing, we compared association statistics (posterior probabilities and qvalues) from both statistical methods.”
And the Discussion section:
“While Bayesian posterior probabilities and ANOVA qvalues are different statistical quantities for measuring biomarker associations that should not be compared in absolute terms, we compared these quantities in relative terms to prioritise biomarkers of response for further testing.”
5) The authors should revise Figure 3C and D. Grouping response curves by mechanism can be misleading since different drugs or biomarkers may intrinsically have discordant potencies due to differing ontarget binding efficiency or offtarget effects.
Thank you for touching upon an important point about possible offtargets. While we have information on the primary targets of the drugs, offtargets effects are difficult to measure and define, especially with the histone chromatin targeting drugs we highlighted. Therefore, we classified the target pathways into broad categories that can include many possible target genes (e.g. histone methylation, mitosis). Nevertheless, we have edited the Results and Discussion sections to acknowledge offtarget effects as a possible reason for the observed uncertainties. We have also added Figure 3E to show the distribution of uncertainties for individual drugs within the pathways we highlighted and not mislead by grouping drugs. It was reassuring to see that the distributions for individual drugs recapitulate what was seen when we aggregated the drugs into target pathways.
https://doi.org/10.7554/eLife.60352.sa2Article and author information
Author details
Funding
NIHR Sheffield Biomedical Research Centre (BRC  ISBRC121520017)
 Dennis Wang
Rosetrees Trust (A2501)
 Dennis Wang
 Tzen S Toh
Academy of Medical Sciences (SBF004/1052)
 Dennis Wang
Wellcome Trust (206194)
 Mathew J Garnett
Horizon 2020  Research and Innovation Framework Programme (Grant agreement No. 950293  COMBATRES)
 Michael P Menden
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
The MJG laboratory is supported by the Wellcome Trust (206194). DW is supported by the NIHR Sheffield Biomedical Research Centre, Rosetrees Trust (ref: A2501), and the Academy of Medical Sciences Springboard (ref: SBF004/1052). MPM is supported by the European Union's Horizon 2020 Research and Innovation Programme (Grant agreement No. 950293  COMBATRES). We thank Benjamin Sidders and Oliver Stegle for feedback on the methodology. We also thank the Sheffield Bioinformatics Core for help with data preprocessing.
Senior Editor
 Detlef Weigel, Max Planck Institute for Developmental Biology, Germany
Reviewing Editor
 Joseph Lehár, Boyce Thompson Institute for Plant Research, United States
Reviewer
 Alexander W Blocker
Publication history
 Received: June 24, 2020
 Accepted: December 4, 2020
 Accepted Manuscript published: December 4, 2020 (version 1)
 Accepted Manuscript updated: December 7, 2020 (version 2)
 Version of Record published: December 17, 2020 (version 3)
Copyright
© 2020, Wang et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,166
 Page views

 141
 Downloads

 1
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.