Robust and Efficient Assessment of Potency (REAP) as a quantitative tool for dose-response curve estimation

  1. Shouhao Zhou  Is a corresponding author
  2. Xinyi Liu
  3. Xinying Fang
  4. Vernon M Chinchilli
  5. Michael Wang
  6. Hong-Gang Wang
  7. Nikolay V Dokholyan
  8. Chan Shen
  9. J Jack Lee
  1. Department of Public Health Sciences, Pennsylvania State University, United States
  2. Department of Lymphoma and Myeloma, University of Texas MD Anderson Cancer Center, United States
  3. Department of Pharmacology, Pennsylvania State University, United States
  4. Department of Pediatrics, Pennsylvania State University, United States
  5. Department of Biochemistry and Molecular Biology, Pennsylvania State University, United States
  6. Department of Surgery, The Pennsylvania State University, United States
  7. Department of Biostatistics, University of Texas MD Anderson Cancer Center, United States

Abstract

The median-effect equation has been widely used to describe the dose-response relationship and identify compounds that activate or inhibit specific disease targets in contemporary drug discovery. However, the experimental data often contain extreme responses, which may significantly impair the estimation accuracy and impede valid quantitative assessment in the standard estimation procedure. To improve the quantitative estimation of the dose-response relationship, we introduce a novel approach based on robust beta regression. Substantive simulation studies under various scenarios demonstrate solid evidence that the proposed approach consistently provides robust estimation for the median-effect equation, particularly when there are extreme outcome observations. Moreover, simulation studies illustrate that the proposed approach also provides a narrower confidence interval, suggesting a higher power in statistical testing. Finally, to efficiently and conveniently perform common lab data analyses, we develop a freely accessible web-based analytic tool to facilitate the quantitative implementation of the proposed approach for the scientific community.

Editor's evaluation

This article proposes methodology and accompanying software for robustly fitting dose-response curves where response is a number between 0 and 1. When response is transformed using the common logistic transformation, values close to 0 or 1 become large in magnitude, unduly influencing the fitted curve after back-transformation and introducing bias in the estimate of certain parameters. As demonstrated through simulation and application to real data, the proposed approach, called Robust and Efficient Assessment of Potency, is less perturbed by these extreme measurements.

https://doi.org/10.7554/eLife.78634.sa0

eLife digest

Finding a new drug which is both safe and efficient is an expensive and time-consuming endeavour. In particular, establishing the ‘dose-effect relationship’ – how beneficial a drug is at different dosages – can be challenging. Predicting this curve requires gathering experimental data by exposing and recording how cells respond to various levels of the drug. However, extreme values are often observed at low and high dosages, potentially introducing errors that are hard to correct in the prediction process. Yet, these extreme observations are sometimes genuine so researchers cannot just ignore them.

To improve dose-effect estimation, Zhou, Liu, Fang et al. developed a new general-purpose approach. It uses advanced statistical modelling to account for extremes in lab data. This strategy outperformed other methods when dealing with these observations while also providing higher efficiency in data analysis with more uniform data in experiments.

To facilitate implementation, Zhou, Liu, Fang et al. set up a user-friendly tool baptized ‘REAP’; this free online resource allows scientists without advanced statistical experience to harness the new approach and to perform dose-effect analysis more easily and accurately. This could boost research across many different disciplines that examine the effects of chemicals on cells.

Introduction

The median-effect equation is a unified theory in medicine to describe the dose-response relationship and identify agents or their combinations that activate or inhibit specific disease targets (Chou, 2006). It is a fundamental method established based on the pharmacological principle of mass-action law (Chou, 1976). As the common link for many biomedical systems, it has been used extensively to analyze in vitro experimental data and evaluate the potency of related drugs (Chou and Talalay, 1984; Chou and Rideout, 1991; Greco et al., 1995; Lee and Kong, 2009).

In practice, the median-effect equation can be estimated for drug efficacy or pathway inhibition from normalized data generated from experimental studies. Without knowing the true dose-effect curve during the experimental design and data collection, it is common to observe extreme values of (un)affected cell fraction that is close to the response of either 0 or 100% in the analytic dataset. Quantitatively, it poses a special analytic challenge to estimate the median-effect question in practice. The standard estimation approach, often based on a linear regression model after a logit transformation (Roell et al., 2017; Gadagkar and Call, 2015), could suffer badly from poor estimation in such situations. Figure 1 illustrates a preliminary example in that the standard approach is deficient in describing the median effect curve with a perturbation in one extreme data point. The variation in real experimental data, mostly caused by unavoidable measurement error, often at a much larger degree, therefore challenges the reliability of result presentation and interpretation for many drug assessment studies.

Dose-response curve fitting with extreme observations.

The original data points are on the true curve. The leftmost data point is changed from 0.005 to 1e-6, referring to a small white noise that cannot be visually recognized. The change leads to the obvious departure between the estimated curve by linear regression model (dotted) and the true curve (solid), which demonstrates that standard regression is sensitive to extreme values. The response at the true IC50 (dotdashed, vertical, left) is only 22% from the estimated curve; the estimated IC50 (dotdashed, vertical, right) corresponds to the 70% fraction of cell affected, effecting a substantive 20% inflation (50% ->70%) in estimation error. In contrast, the estimated curve by beta regression model (dashed) is almost overlapped with the true curve (solid), which shows that BRM is much more robust to extreme values. LRM: linear regression model; BRM: robust beta regression model. Detailed model descriptions of LRM and BRM are provided in Materials and methods section.

Additionally, the modeling strategy of deleting extreme values may not be feasible in many situations (Solzin et al., 2020). For example, a meaningful drug concentration could consist of high inhibition (>90%) or low cell viability (<10%) in cancer research. It is not logical to ignore extreme observations when they are indeed biologically relevant for the target effect, not even to mention an associated loss of power and accuracy by leaving fewer data points for estimation. As illustrated in Figure 2, deleting the extreme values couldn’t eliminate the estimation bias, but only impaired the efficiency of interval estimation with wider nominal 95% confidence intervals (C.I.) and harmed the estimation accuracy with worse coverage probabilities.

Comparison of estimation efficiency and accuracy using linear regression model and beta regression model.

Deleting the extreme values could not eliminate the bias (panel A), but only harmed the accuracy with worse coverage probabilities (panel B) and impaired the efficiency of interval estimation with wider nominal 95% confidence intervals (panel C). A total of 1000 data sets were generated following the data simulating process described in Appendix 1, using the dose sets and true dose-response curve under 7 dose setting with a precision parameter of 100. Responses ≤5% or ≥95% were considered extreme responses. Dashed line in panel B denotes 95% nominal coverage probability. BRM: beta regression with extreme data points; LRM: linear regression model with extreme data points; LRM(t): linear regression model with truncated dataset after deleting extreme values. Detailed model descriptions of LRM and BRM are provided in Materials and methods section.

Furthermore, it is dubious to apply the constant error variance, a default assumption in standard linear regression modeling, in dose-response estimation. As an assumption can be examined with repeated measures, many dose-response data have indicated either a constant variance before logit transformation or a positive correlation with drug dose. It is incongruous to apply linear regression if the assumption is violated due to error heteroscedasticity (Schmidheiny, 2009; Williams et al., 2007). Therefore, it is essential to develop a robust quantitative approach to estimating the median-effect equation.

Here, we introduce a novel approach to improving the quantitative assessment of dose-response relationship and drug potency, together with a user-friendly web-based analytic tool to facilitate the implementation. The proposed method to estimate the median-effect equation is established in the robust beta regression framework, which not only takes the beta law to account for non-normality and heteroskedasticity (Ferrari and Cribari-Neto, 2004), but also minimizes the average density power divergence (DPD) using a tuning parameter (Ghosh, 2019). We apply a data-driven approach to optimizing the tuning parameter, which further compensates for the lack of robustness against outliers. In the simulation studies, we compare the robust beta regression framework with linear regression models either in the standard normal distribution error, or in the heavy-tailed t distribution error with 3 degrees of freedom hopefully to downweigh the influence of extreme observations. Results from simulation studies under various scenarios confirm that the proposed approach consistently gives robust estimation for the median-effect equation. Particularly, we examine two important measures for drug binding affinity: the Hill coefficient, which signifies the sigmoidicity of the curve, and the overall effect, indicated by dose concentration for a specified (e.g. 50%) response (Shen et al., 2008; Sampah et al., 2011). When there are extreme outcome observations, the improvement of robust beta regression in estimation accuracy could be substantial. Moreover, simulation studies further illustrate that the proposed approach provides a narrower confidence interval, which in turn suggests a higher efficiency to achieve better power in statistical testing even without acquiring additional experimental data. Illustrative examples using real-world data for cancer research and SARS-CoV-2 treatment are provided. The analyses are implemented using the freely accessible web-based application REAP, developed based on the Shiny package of R language, with which research scientists could conveniently upload their drug experiment dataset and perform the data analysis.

Results

REAP Shiny App

We developed a user-friendly analytic tool, coined ‘REAP’ (Robust and Efficient Assessment of Potency), for convenient application of the robust dose-response estimation to real-world data analysis. It is established in an agile modeling framework under the parameterization of the beta law to describe a continuous response variable with values in a standard unit interval (0.1). We further exploited a robust estimation method of the beta regression, named the minimum density power divergence estimators (MDPDE) (Ghosh, 2019), for dose-response estimation, with the tuning parameter optimized by a data-driven method (Ribeiro and Ferrari, 2020). The technical details are provided in the Materials and methods.

REAP presents a straightforward analytic environment for robust estimation of dose-response curve and assessment of key statistics, including implementation of statistical comparisons and delivery of customized output for graphic presentation (Figure 3). The dose-response curve is a time-honored tool to convey the pharmacological activity of a compound. Through dose-response curves, we can compare the relative activity of a compound on different assays or the sensitivity of different compounds on an assay. REAP aims to make this job simple, estimation efficient, and results robust.

REAP App interface, with a highlight of Output section.

Using the robust beta regression method, REAP produces a dose-response curve plot with effect and model estimations. The left panel allows users to specify model features and design plot specifics. REAP also provides hypothesis testing results to compare effect estimations, slopes and models.

There are three sections in REAP: Introduction, Dataset and Output. Users can have both overview and instruction of REAP in the Introduction. Dataset is uploaded in the Dataset section. The input dataset is mandated to be in a csv file format and contains three columns of data respectively for drug concentration, response effect and group name, in a specific order. It is recommended that users normalize the response variable to the range of (0,1) by themselves. Otherwise, REAP automatically will truncate the values exceeding the boundaries to (0,1) using a truncation algorithm (see Appendix 1 - Truncation Strategy). In the Output section, it generates a dose-response plot, along with tabulation for effect and model estimations. A special feature of REAP is that it conveniently allows the users to specify the target effect level, rather than fixed at the common median effect (i.e., 50%), in dose estimation. We also enable hypothesis testing for comparisons of effect estimations, slopes and models (i.e. comparing both intercepts and slopes; see Materials and methods). By default, the x-axis of the dose-response plot is log-scaled. In the plot, users can choose to add mean values and sample standard deviations for data points under the same agent and dose level. Both plots and estimation tables are downloadable on REAP to plug in presentations and manuscripts for result dissemination.

The open-sourced REAP is freely available and accessible at https://xinying-fang.shinyapps.io/REAP/. We demonstrated it in two real-world examples, after presenting the simulation results, to illustrate the functionality of REAP.

Simulations

We conducted simulation studies to investigate the robust beta regression model, in comparison to linear regression models with data transformation, either under a normal distribution error (implemented with R package ‘stats’) or a heavy-tailed t distribution error with 3 degrees of freedom (implemented with R package ‘heavy’), to characterize the median-effect equation under different scenarios. The model assessment is established based on both the point estimation and interval estimation derived from each method. Details on the simulation setting are described in the Appendix 1 - Data simulating process.

With data simulated using normal error terms, the robust beta regression provides sensible estimation of IC50, IC90, β1, and β0 from median-effect equation (Figure 4, Appendix 1—table 1). Particularly, when there are extreme outcome observations, the robust beta regression manages much lower bias and root-mean-square error (RMSE) for point estimates and better coverage probability for interval estimates than the linear regression model with normal distribution error. For data without extreme values, their performance is comparable in bias, RMSE and coverage probability, but the linear regression model has much wider 95% CIs (Figure 4). Indeed, the wider 95% CIs occur across all the scenarios, indicating higher estimation efficiency of the robust beta regression approach. In contrast, the heavy-tailed linear regression model demonstrates improved bias and RMSE in point estimation from the standard linear regression, but the nominal 95% CIs are significantly underestimated with coverage probability below 50% in most cases (Appendix 1—table 1). Therefore, the heavy-tailed linear regression model, although sometimes provides good point estimations, cannot maintain consistently robust and statistically efficient estimations. Overall, the robust beta regression model is the most robust and stable in estimating the median-effect equation with reliable performance in both point estimations and 95% CI coverage probabilities.

Comparison of the point estimates and 95% confidence intervals using linear regression model, heavy-tailed linear regression model and robust beta regression model, with data simulated from normal error term.

The vertical solid lines indicate the true values. The dots represent the averaged point estimates and the bars represent the averaged lower and upper bound of 95% CIs. The point estimation by robust beta regression is consistently closer to the true value with a narrower 95% CI compared to the linear regression model. The 95% CI of heavy-tailed linear regression underestimates the nominal coverage probability. LRM: linear regression model; LRM-7: LRM under 7-dose dataset with extreme data points; LRM-6noL: LRM under 6 dose dataset after removing the highest dose data point; LRM-6noS: LRM under 6-dose dataset after removing the lowest dose data point; LRM-7lessE: LRM under 7-dose dataset with less extreme data points; LRM-7NCP: LRM under 7-dose dataset with extreme data points and dose-dependent precision; HLRM: heavy-tailed linear regression model; HLRM-7: Heavy-tailed LRM under 7-dose dataset with extreme data points; HLRM-6noL: Heavy-tailed LRM under 6-dose dataset after removing the highest dose data point; HLRM-6noS: Heavy-tailed LRM under 6-dose dataset after removing the lowest dose data point; HLRM-7lessE: Heavy-tailed LRM under 7-dose dataset with less extreme data points; HLRM-7NCP: Heavy-tailed LRM under 7-dose dataset with extreme data points and dose-dependent precision; BRM: robust beta regression model; BRM-7: BRM under 7-dose dataset with extreme data points; BRM-6noL: BRM under 6-dose dataset after removing the highest dose data point; BRM-6noS: BRM under 6-dose dataset after removing the lowest dose data point; BRM-7lessE: BRM under 7-dose dataset with less extreme data points; BRM-7NCP: BRM under 7-dose dataset with extreme data points and dose-dependent precision. Detailed model descriptions of LRM, HLRM, and BRM are provided in Materials and methods section.

In parallel, similar results are obtained consistently with data simulated using beta error terms, which induces heteroscedasticity (smaller variation on the two ends and bigger in the middle) at different dose levels (Appendix 1—figure 1, Appendix 1—table 2). All the results above demonstrate the sensitivity of regression models in dealing with datasets including extreme values. In addition, the result comparisons between the seven-dose set and the six-dose set with the largest or smallest dose eliminated display the potential worse influence of deleting extreme values directly in modeling dose-response using linear regression, which further notarizes the robustness and efficiency of the proposed robust beta regression.

Overall, the simulation study suggests that the robust beta regression model produces well-calibrated dose-response curves while being more robust and powerful than the standard regression model and the heavy-tailed linear regression model in estimating the median effect equation.

B-cell lymphoma data

The first example of REAP application is dose-response curve estimation of the same agent under different cell lines. The data was originally from a study on using a drug called auranofin in treating B-cell lymphomas such as relapsed or refractory mantle cell lymphoma (MCL) (Wang et al., 2019). As an FDA-approved treatment of rheumatoid arthritis, auranofin targets thioredoxin reductase-1 (Txnrd1), and was repurposed as a potential antitumor drug to effectively induce DNA damage, reactive oxygen species (ROS) production, cell growth inhibition, and apoptosis in aggressive B-cell lymphomas, especially in TP53-mutated or PTEN-deleted lymphomas.

In the experiment, the effect of auranofin was evaluated in six MCL cell lines (Z-138, JVM-2, Mino, Maver-1, Jeko-1, and Jeko-R) with auranofin in concentrations ranging from 0 to 5 μM for 72 hr and tested cell viability using a luminescent assay. The interval bars of observed dose-response in Figure 5 show that the sample variance of error from repeated measurements decreased with the increase of auranofin concentrations. To account for the heteroscedasticity and asymmetry in the variance, we enable a dose-dependent precision (proportional to inverse variance) in REAP, adding logdose as an additional regressor for the precision parameter. Figure 5 shows the fitted dose-response curves with the dose-dependent precision. The test for homogeneity (p-value <0.0001) suggests distinct dose-response between cell lines. The estimation of intercepts, hill coefficients and pairwise comparisons of IC50 estimations are provided in Appendix 1—table 3.

Dose-response curve estimation of auranofin (μM) under different MCL cell lines.

The dose-response curve was fitted with a dose-dependent precision with log(dose) as an additional regressor for the precision estimator. Observed dose effects are displayed with interval bars, which end with arrows when estimated intervals exceed (0,1). Triangles at the bottom indicate IC50 values for each MCL cell line. MCL: mantle cell lymphoma.

SARS-CoV-2 data

The second example is on the dose-response curve estimation in antiviral drug development for coronavirus disease 2019 (COVID-19). At the beginning of 2020, COVID-19 broke out at an unprecedented pace internationally, but there were limited therapeutic options for treating this disease. Therefore, many compounds and their combinations were rapidly tested in vitro against the SARS-CoV-2 virus to identify potentially effective treatments and prioritize clinical investigation.

In the data (Bobrowski et al., 2021), the benchmark compound collection consists of five known antivirals, including remdesivir, E64d (aloxistatin), chloroquine, calpain Inhibitor IV and hydroxychloroquine. The in vitro experiment was performed using the same biological batch of SARS-CoV-2 virus and conducted in biosafety level-3. In the original publication (Bobrowski et al., 2021), the dose-response curves were fitted by linear regression, which could yield inconclusive estimation (e.g. hydroxychloroquine in Figure 1G of Bobrowski et al., 2021), while the estimated inhibition tends to exceed 1 when concentration is larger than 10 µM. REAP gives reasonable estimation for the dose-response curves (Figure 6). The hypothesis testing results show that at least one slope estimation is different from other antivirals (p-value = 0.0003) and at least one EC50 estimation is different from others (p-value = 0.003). Calpain Inhibitor IV shows a higher potency than other agents including hydroxychloroquine (p-value = 0.0038, Appendix 1—table 4).

Dose-response curve estimation of anti-viral drugs under the same biological batch with SARS-CoV-2 data.

The robust beta regression gives reasonable estimations to the dose-response curve of hydroxychloroquine, compared to the inconclusive dose-response curve fitted by linear regression in Bobrowski et al. (2020). The plot is generated without selecting the option of mean and confidence interval for observations. Triangles indicate the estimated EC50 values for each drug.

Discussion

Quantifying the potency of a compelling substance is always a central topic in life sciences (Schindler, 2017). It is a vital component of research in pharmacology, but also prevalent in the fields of toxicology, environmental science, agrochemistry, and medicine, among many others. For instance, the description of dose-response curves can provide the initial toxicological risk assessment (National Research Council, 2007), and guide in silico modeling of toxic doses to humans and the environment (Blaauboer et al., 2012). Based on proper identification of dose-response relationship from in vitro assays, studies can successfully predict systemic toxicological effects in vivo without additional in silico modelling (Groothuis et al., 2015). Nevertheless, it necessitates accurate and reliable description of the dose-response curve, which further demands robust and efficient modeling strategies to account for embedded variability in observed response and to derive solid inference with valid quantification of uncertainty.

The dose-response estimation could be substantially biased by the standard regression modeling. In the illustrative example (Figure 1), the estimated IC50 dose indeed effects the 70% fraction of cell affected, while the estimated response at the true IC50 dose is only 22%. Such a large discrepancy is sourced by a small (<0.5%) single measurement error, which is common and inevitable in any regular in vivo experiment, but could engender a profound impact on the assessment of drug potency and determination of synergy in drug combinations. In addition, the modeling strategy of deleting those extreme values (e.g. Figure 2, or 6noL and 6 noS datasets in Figure 4 and Appendix 1—figure 1) is futile to improve the poor performance of standard regression model, but may further impair the estimation efficiency and accuracy. In general, it fails to reduce bias but only introduces larger uncertainty in estimation of dose concentration, especially at extreme responses (e.g. IC90). On the other hand, a heavy-tailed error distribution may help to stabilize the point estimation, but the interval estimation could be largely under-estimated with poor coverage probabilities.

We develop REAP for assessment of drug potency to address concerns in this regard. It has substantial advantages over existing methods by reducing the impact of random errors due to implicit variations in the experimental data. To our best knowledge, it is also for the first time that beta regression is introduced to dose-response estimation. The underlying modified robust beta regression model estimated by the data-driven tuning parameter is resilient to estimation bias caused by extreme observations, which is a routinely encountered situation for deficient dose-response estimation using the standard estimation approach. The proposed approach is also efficient in quantitative characterization of dose-response curves with narrower confidence intervals for key estimators. Furthermore, REAP can simultaneously model the data heterogeneity with a dose-dependent precision component (Figure 5). It is simply different from other dose-response methods, in which a vector of weights have to be (possibly mis-)specified externally. REAP is an open-source and user-friendly platform, developed for diverse non-computational scientists for hands-on wet-laboratory data analysis in regular use, and can be hosted within R shiny environment under Windows, Linux, and Mac systems or deployed in Docker available as a web server.

Our work potentially can be useful in applications of drug screening. The proposed method and the developed REAP App allow for the robust and efficient estimation and accounting for outliers as well, making it fitted particularly in a high-throughput setting. As the result of a complex and dynamic cascade of events, exposure time is another important factor ultimately affecting the dose-response. For in vitro experiments measured at different time points in a choice of cell-lines and expressed by a variety of assays (Byrne and Maher, 2019), the proposed modeling framework can be naturally extended to model time-dependent cytotoxicity while controlling for fixed or random effects. Furthermore, the application of robust and efficient dose-response estimation can be integrated into methods to identify drug interaction effect (Lee and Kong, 2009; Lee et al., 2007). There is a venerable history that multi-agent combination therapies demonstrate great advantages in improving therapeutic efficacy and revolutionize patient outcomes in a wide range of diseases. Robust and efficient estimation of the dose-response curve would be crucial in investigation of adequate drug combinations.

The developed method has limitations. We presented a model of the median effect equation for dose-response curve estimation based on mass action law. While in specific scenarios other laws may be considered more suitable to describe the biomedical systems, the current modeling framework can be naturally adapted for other dose-response functions like probit (via cumulative normal distribution) and Weibull model (Christensen, 1984), or any other continuous distribution functions. In addition, the median-effect equation to characterize pharmacological activity assumes the compound can affect all the cells. From a quantitative perspective, a compound that cannot reach high binding affinity will yield an over-conservative estimation for median effective dose of a drug. However, in comparison to the sensitivity of different compounds in an assay, it is not harmful because the less effective compounds will be more easily identified. If it is a concern that the maximal effects of candidate compounds are different and the aim is to accurately model the dose-response curve, the Emax model could be a better choice (Lee et al., 2010). Furthermore, the robust beta regression approach in REAP cannot handle values equal or less than 0, or equal or greater than 1. Thus, we developed a sequential data truncation algorithm in REAP to overcome the limitation of the conventional transformation (y * (n−1)+0.5) / n, which could be too rough in dose-response curve estimation particularly when the sample size n for each group is relatively small. Although empirically we have validated it using simulated data, the algorithm could be improved by future work to retain information more efficiently.

In summary, a good modeling strategy must effectively characterize the nature of the observed dose-response pattern (Lyles et al., 2008). Rapid advances in novel drug development and considerable deficiency in modeling data with extreme values offer an appealing opportunity for next-generation quantitative approaches. While many aspects of the techniques discussed here fit in the statistical framework of robust beta regression, our aim is to clearly apply and rigorously customize the analytic considerations, to reduce bias and ameliorate efficiency in routinely used dose-effect estimation, and to facilitate the convenient analytic implementation and dissemination. Experimental conditions and candidate drug potency could inevitably vary in practice, but REAP provides a great tolerance for points with extreme values, solid support for accurate and efficient dose-response curve estimation, and useful reference to the future development of methodology in drug investigation. Overall, we anticipate that our work will contribute more to quantitative analysis in assessment of drug potency in preclinical research.

Materials and methods

Median-effect equation and dose-response curve

Request a detailed protocol

The median-effect equation describes a popular model of the dose-response relationship based on the median effect principle of the mass action law in various biological systems (Chou, 1976). Assume fa and fu are the fractions of the system affected and unaffected by a drug concentration d. The median-effect equation states that

(1) fafu=(dDm)m,

where m is the Hill coefficient signifying the sigmoidicity of the dose-effect curve and Dm is the dose of a drug required to produce the median effect, which is analogous to the more familiar IC50 (drug concentration that causes 50% of the maximum inhibitory effect), ED50 (half-maximum effective dose), or LD50 (median lethal dose) values (Ghosh, 2019). For example, if an inhibitory substance is of interest, the parameter m measures the cooperativity in the binding of multiple ligands to linked binding sites, and the parameter Dm=IC50 , defined by the concentration that causes 50% of the maximum inhibitory effect.

Given fa+fu=1, the median-effect Equation 1 is equivalent to

(2) logitfa=logfafu=-logitfu=-logfufa=mlogd-log Dm,

where logit(p) denotes the logit function logp1-p . The Equation 2 shows a log-linear relationship between the drug dose d and its effect fa (or fu , if it is, for example, the % survival of interest) after a logit transformation. Because from a modeling perspective the identical strategy can be applied to model both fa and fu , for the effect on cell fraction E, we can rewrite Equation 2 to be:

(3) logitE=logE1-E=β1logd+β0

where β0 is the intercept and β1 the slope of the response curve. A linear regression model (LRM) can be applied in the form of Equation 3 with a standard normal distribution error. In simulation studies, we also examine Equation 3 with a heavy-tailed t-distribution error, denoted by heavy-tailed linear regression model (HLRM).

In this presentation, the median effect dose

(4) Dm= exp-β0β1,

the Hill coefficient

(5) m={β1β1ifE=faE=fu

and the dose-response curve

(6) E=logit-1β1logd+β0,

where logit-1x=exp(x)1+exp(x) is the inverse-logit function.

Beta regression model for dose-response curve estimation

Request a detailed protocol

We will review the beta regression model which for the first time will be applied in dose-response estimation. The effect E and the parameters β=(β0,β1) in Equation 3 cannot be directly observed, but they can be estimated using experimental data, in which the observed sample cell fraction y produced by the drug dose d is a random variable with mean E. It is clear that effective estimation must properly account for random variation and be based upon a model that not only matches the nature of the response variable, but adequately characterizes the observed dose-response pattern (Lyles et al., 2008).

Among all the unknown quantities, the parameters β could be first estimated and play a fundamental role in supporting the inference for others. In the standard estimation procedure based on linear regression, logity=logy1-y is regressed on logd to get the inference on parameters β. Subsequently, the dose-response curve can be estimated by Equation 6, and (Dm,m) can be derived based on Equations (4) and (5) for median-effect Equation 2. Because the extreme values of y close to 0 or 1 could yield very large values of logity (approaching to - or +, respectively, if y0 or 1), and induce significant bias in estimation of β, the accuracy of the estimated dose-response curve and median-effect equation is in question when there exist extreme values in the dataset.

The beta regression model describes a response variable y with continuous values restricted to the open standard unit interval (Johnson et al., 1995; Simas et al., 2010). In a classic beta regression framework, the beta regression model uses a parameterization of the beta law that is indexed by the mean parameter μ, and the precision parameter ϕ that controls the overall variation (Ferrari and Cribari-Neto, 2004). To model the dose-response relationship for the cell fraction E, we assume that the response y is a beta-distributed random variable and its mean μ=E has the form of Equation 6, where d is the dose producing effect E, β1 and β0 are the regression parameters. Estimation of regression parameters β can be performed using maximum likelihood method to derive point estimate β^ and covariance matrix Σ.

Beta regression is resistant to extreme values and provides reliable estimations (Figure 1). Compared with the standard approach, which applies a non-linear transformation in the response for an approximation to the normal distribution, the beta density can take on a variety of shapes to account for non-normality and skewness (Smithson and Verkuilen, 2006). In the presence of heteroskedasticity and asymmetry, two common problems frequently observed in limited range continuous response data, an empirical study showed that the beta regression provided the best estimation among several alternatives (Kieschnick and McCullough, 2016).

Robust beta regression model with MDPDE

Request a detailed protocol

We will present a modified robust beta regression approach in REAP implementation, which is established based on density power divergence for robust estimation (Ghosh, 2019), but further improved after we introduce a data-driven method to identify the optimal tuning parameter. The standard beta regression potentially could still be sensitive against outliers because its inference is based on the maximum likelihood estimation. Ghosh, 2019 developed the robust minimum density power divergence estimators (MDPDE) that address the problem by minimizing the average density power divergence (DPD)

(7) dα(g^, g)=g1+α1+ααg^gα+1αg^1+α,d0(g^, g)=limα0dα(g^,g)g^logg^g,

between the empirical density g^ and the beta model density function gBetaμϕ, 1-μϕ with μ=logit-1β1logd+β0 . α is a non-negative tuning parameter, smoothly connecting the likelihood disparity (at α = 0) to the L2-Divergence (at α = 1). The parameter of interest β is estimated by minimizing the DPD measure between gi and the density, g^i ,

(8) n1i=1ndα(g^i(), gi(,θ))

where θ=(β,ϕ)T. After mathematically simplifying Equation 8, (Ghosh, 2019), θ can be equivalently estimated by minimizing the objective function using the estimation equations:

(9) Hn,αθ=n-1i=1n[Ki,αθ-1+ααgiyi,θα]

where Ki,αθ=B(1+αμiϕ, 1+α1-μiϕ-α)Bμiϕ, 1-μiϕα.

MDPDE improves the standard beta regression with the DPD measure and a fixed tuning parameter. The recommended α is around 0.3 to 0.4, but simply assigning a fixed α in [0.3, 0.4] is not applicable in many cases. Here we adopted a data-driven method (Ribeiro and Ferrari, 2020) to identify the optimal α. The search for the optimal α starts with a grid of α, a pre-defined αmax and grid size ρ, which generates a sequence of equally spaced {αk}k=0m (0=α0<α1<αmαmax). MDPDE calculates the corresponding θ and se(θ) with each α so that we get a vector of standardized estimates:

zαk=(θ^αk1 nse(θ^αk1), , θ^αkp nse(θ^αkp))T

The standardized quadratic variations (SQV) are defined by:

SQVαk=p-1||zαk-zαk+1||.

We compare each SQVαk with a pre-defined threshold L (L>0). If all αk satisfy the stability condition of SQVαk<L, then the optimal α equals the minimal α in αk . Otherwise, restart the search with a new grid of αk . The new grid of the same size p is picked from the sequence αkk=0m starting from the largest αk that fails the stability condition. Repeat searching until all αk in the current grid satisfy the stability condition or αmax is reached. If the stability condition is satisfied before αmax is reached then optimal α equals the minimal value in the grid of αk . If αmax is reached, then optimal α equals 0, which is equivalent to the maximum likelihood estimation. We denote this approach by robust beta regression model (BRM) in the simulation study.

Point estimate and its confidence interval for drug activity measurements

Request a detailed protocol

The objective of analysis is to characterize the dose-response curves in equation (2) and quantify in vitro drug potency. Popular drug activity measurements include Hill coefficient m and median effect dose Dm . In some circumstances, other measurements such as instantaneous inhibitory potential (IIP), which directly quantifies the log decrease in single-round infection events caused by a drug at a clinically relevant concentration, are of special interest (Shen et al., 2009).

The MDPDE for beta regression model provides a robust strategy to estimate β, from which the point estimates and confidence intervals of relevant drug activity measurements can be derived. Mathematically, those drug activity quantities can be written as functions of parameters β with an explicit form. Subsequently, their point estimates and confidence intervals can be derived based on the inference of β. For example, given a point estimate β^=(β^0,β^1), the point estimate for m^ , D^m as a single value, and E^ as a function of dose d can be computed using Equations 4–6.

It is important to construct the confidence interval around the point estimate to gauge the estimation uncertainty. With different levels of measurement error from either well-managed or lousy experiments, the levels of evidence vary for statistical inference, even if it derives the same point estimates for the intercept β0 , slope β1 and the corresponding dose-response curve. Given the point estimate β^ and its positive-definite covariance matrix Σ to account for variability in observed response, we apply the multivariate delta method and approximate the variance estimate after assuming asymptotic normality (Bickel and Doksum, 2015). As demonstrated in our simulation studies, the constructed 1-α×100% confidence interval consistently provides better results to quantify the 1-α×100% coverage probability. More importantly, the width of the constructed confidence interval was narrower than that from a linear regression model, suggesting that our approach is more efficient with a higher statistical power (Appendix 1—tables 1 and 2).

Comparison of the dose-response curves

Request a detailed protocol

When we estimate multiple dose-response curves with the data collection experiments conducted in a similar setting, it is often of interest to statistically compare the drug potency and/or Hill coefficients. A typical comparison may occur when we examine the similarity of response from different drugs, explore the additional effect of a drug combined with certain monotherapy, or assess the homogeneity of a drug to different patient samples or cell lines. In the beta regression framework, the statistical comparison can be conducted by first comparing independent fits for each curve with a global fit that shares the common parameters among different groups. Subsequently, the likelihood ratio test can be applied to examine whether the same Hill coefficient or one dose-response curve can adequately fit all the data. The only exception is to assess whether median effect doses are the same in different groups, while an F test is used for the single parameter testing. If the global test for potency shows a significant p-value, a pairwise comparison can be conducted using two-sided t-test for the ordered groups with Benjamini-Hochberg correction for multiplicity.

Appendix 1

Truncation strategy

Based on the median-effect equation method by Chou TC, the software “CompuSyn” was published. In the data entry illustration of this software, they pointed out the sensitivity limits of data points, for example too low (fa <0.02) and too high (fa >0.99) and suggested that such data points out of effect may be edited or deleted.

There are some data truncation algorithms in the literature. Two obvious remedies are proportionally “shrinking” the range to a sub-range nearly covering the unit interval (e.g., [.00001,.99999]) or simply adding a small amount to 0-valued observations and subtracting the same amount from 1-valued observations while leaving the other observations unchanged. Macmillan and Creelman, 2005 mentioned a method that is frequently used in practice in areas such as signal detection is to add 1/(2n) to a 0 observation and subtract 1/(2n) from a 1 observation, where n is the total number of observations. Besides, Smithson and Verkuilen, 2006 demonstrated that a useful transformation in practice is (y * (n−1)+0.5) / n, which is also mentioned by the documentation for R Betareg package for conditions when data assumes the extremes 0 s and 1 s. In dose-response curve estimation, this treatment could be too rough, especially when n is small.

To minimize the impact from truncation of data points, we apply the following algorithm. The first step is to shrink the data range to [1e-9, 1-1e-9]. If there still exist abnormal conditions, we will sequentially shrink the data range of abnormal ones to [1e-8, 1-1e-8], …, until [1e-3, 1-1e-3] or non-exist of abnormal conditions. Then, if it still exists, though rarely, the transformation of (y * (n−1)+0.5) / n, where n is the sample size, in the documentation of R Betareg package would be applied. We have conducted simulations to test this algorithm in various scenarios with different errors and it achieved reasonable performance in handling all conditions.

Data simulating process

In the simulation study, both robust beta regression and linear regression are applied to estimate dose-response curves under different scenarios. The point estimations and 95% confidence intervals of IC50, IC90, β0 and β1 under each method will be obtained and then, be compared to evaluate the model performance.

To generate data for simulation studies, we define the dose set for simulation as 0.1, 0.2, 0.4, 0.8, 1.6, 3.2 and 6.4 μM, which consists of 7 doses, and choose the appropriate true curve with β1=2.2098 and β0=0.4931 such that the corresponding effects of the smallest and largest dose are 0.01 and 0.99, respectively. Let’s call the true curve “E=f(log(dose))”. Then, the following equation is applied to generate data by inducing random error into effect:

E=true+error=flogdose+error

We simulated data with two types of errors, normal error term and beta error term, to examine the accuracy and sensitivity of model performance in general setting. The normal error term is implemented with different standard deviations (SDs), for example 0.005, 0.01 and 0.05, while the beta error term is under different precision parameter ϕ, for example 35, 15, 5. Note that the larger the ϕ, the smaller the variance. By implementing under different SD or ϕ, it allows for generation of not only well-controlled data which is assumed for experiments with almost no error, but also noised data which is more identical to real-world data. The generated data is 1 replicate given each dose level with the total simulation size equal 10,000 for each choice of SD or ϕ. Since the defined dose set is symmetric, we set up several scenarios under both error terms above: (1) full 7-dose set with extreme values; (2) 6-dose set after removing the largest dose; (3) 6-dose set after removing the smallest dose; (4) full 7-dose set with less extreme values by obtaining the smallest and largest dose levels with corresponding effect as 0.1 and 0.9 under the same true curve. The scenarios 1–4 assume constant precision parameter during data simulation and modeling process.

To mimic the real-world environment of data collections, the assumption of equal variance doesn’t always hold. Thus, we also set up the 5th scenario which uses full 7-dose set with extreme values with non-constant SD or precision parameter during data simulation and modeling process, but linearly dose-dependent. For normal error term, the modified SDs for data simulation have the form of SD*=(γ0+γ1*log.dose)*SD; for beta error term, the modified precisions ϕ for data simulation have the form of ϕ*=(γ0+γ1*log.dose)*ϕ. Assuming the same true dose-response curve as the previous simulation, we pre-defined γ0 and γ1 as 0.25 and 0.1378 such that the average of SD* is close to SD, and the average of ϕ* is close to ϕ, respectively.

Appendix 1—figure 1
Comparison of the point estimates and 95% confidence intervals using linear regression model, heavy-tailed linear regression model and robust beta regression model, with data simulated from beta error term.

The vertical solid lines indicate the true values. The dots represent the averaged point estimates and the bars represent the averaged lower and upper bound of 95% CIs. The point estimation by robust beta regression is consistently closer to the true value with a narrower 95% CI compared to the linear regression model. The 95% CI of heavy-tailed linear regression underestimates the nominal coverage probability. LRM: linear regression model; LRM-7: LRM under 7 dose dataset with extreme data points; LRM-6noL: LRM under 6 dose dataset after removing the highest dose data point; LRM-6noS: LRM under 6 dose dataset after removing the lowest dose data point; LRM-7lessE: LRM under 7 dose dataset with less extreme data points; LRM-7NCP: LRM under 7 dose dataset with extreme data points and dose-dependent precision; HLRM: heavy-tailed linear regression model; HLRM-7: Heavy-tailed LRM under 7 dose dataset with extreme data points; HLRM-6noL: Heavy-tailed LRM under 6 dose dataset after removing the highest dose data point; HLRM-6noS: Heavy-tailed LRM under 6 dose dataset after removing the lowest dose data point; HLRM-7lessE: Heavy-tailed LRM under 7 dose dataset with less extreme data points; HLRM-7NCP: Heavy-tailed LRM under 7 dose dataset with extreme data points and dose-dependent precision; BRM: robust beta regression model; BRM-7: BRM under 7 dose dataset with extreme data points; BRM-6noL: BRM under 6 dose dataset after removing the highest dose data point; BRM-6noS: BRM under 6 dose dataset after removing the lowest dose data point; BRM-7lessE: BRM under 7 dose dataset with less extreme data points; BRM-7NCP: BRM under 7 dose dataset with extreme data points and dose-dependent precision.

Appendix 1—table 1
Simulation result of bias, RMSE and 95% CI coverage probability corresponding to normal error terms.
ScenarioMethodBiasRMSE95% CI Coverage Probability
IC50IC90β0β1IC50IC90β0β1IC50IC90β0β1
(a)data simulated using normal error term with SD = 0.005
7 doses with extreme valuesLRM0.005–0.0470.0370.1520.0980.2980.5250.5570.9540.7730.9430.666
HLRM0.0000.0040.0020.0040.0180.1020.0630.1690.4680.3350.4630.225
BRM0.000–0.0040.0030.0110.0130.1300.0440.0880.9810.9270.9750.889
6 doses after removing largestLRM0.009–0.01–0.0240.0980.0450.1840.1290.5170.9670.9210.9710.691
HLRM–0.0010.0010.002–0.0010.0140.0630.0370.0650.4530.4080.4670.250
BRM0.0010.005–0.000.0050.0090.1320.0240.0890.9550.8920.9500.844
6 doses after removing smallestLRM–0.007–0.0360.0700.1020.0370.3060.3620.5330.9690.6230.9330.695
HLRM0.0010.007–0.002–0.0010.0140.0830.0460.0640.4560.2490.4000.259
BRM–0.0010.0000.0050.0050.0090.1510.0390.0890.9560.8720.9400.853
7 doses with less extreme valuesLRM0.000–0.0000.0000.0010.0050.0300.0160.0260.8910.8280.8830.799
HLRM0.0000.0000.0000.0000.0050.0310.0160.0280.6680.5990.6610.567
BRM0.0000.0000.0000.0000.0040.0260.0130.0240.8640.8230.8600.800
7 doses with extreme values and dose-dependent precisionLRM–0.016–0.0470.1370.1110.0690.4090.5830.5200.9650.5730.9210.685
HLRM0.0000.003–0.001–0.0010.0080.0490.0270.0360.3250.1880.2900.151
BRM0.0000.023–0.003–0.0110.0030.1960.0230.0950.9570.8490.9310.794
(b)data simulated using normal error term with SD = 0.01
7 doses with extreme valuesLRM0.030–0.2000.1660.8040.2240.5441.2341.3050.9580.6990.9500.716
HLRM0.0000.0060.0250.1050.0310.2380.1990.7950.4080.2860.4040.194
BRM0.001–0.0390.0130.0600.0300.1720.0940.1550.9810.9310.9760.892
6 doses after removing largestLRM0.040–0.088–0.1250.5490.1010.2800.3141.2450.9660.9240.9720.734
HLRM–0.0040.0070.008–0.0080.0270.1250.0750.1230.4240.3840.4450.241
BRM0.005–0.012–0.0070.0400.0190.1400.0440.1480.9540.8930.9490.850
6 doses after removing smallestLRM–0.027–0.1720.3570.5310.0800.5340.8361.2200.9660.5610.9370.734
HLRM0.0040.027–0.010–0.0050.0280.1600.0890.1250.4340.2470.3840.253
BRM–0.005–0.0350.0250.0400.0180.1780.0780.1430.9580.8770.9430.857
7 doses with less extreme valuesLRM0.000–0.0010.0010.0040.0110.0600.0320.0530.8920.8260.8830.799
HLRM0.0000.0000.0010.0020.0110.0610.0320.0560.6650.5970.6570.564
BRM0.0000.0010.0000.0010.0090.0520.0260.0480.8650.8220.8600.801
7 doses with extreme values and dose-dependent precisionLRM–0.052–0.2150.5240.4620.1300.6221.1230.9840.9650.5210.9210.726
HLRM0.0020.011–0.004–0.0030.0160.0960.0530.0710.3030.1800.2730.148
BRM0.0000.012–0.001–0.0050.0090.1420.0540.0800.9570.8510.9320.798
(c)data simulated using normal error term with SD = 0.05
7 doses with extreme valuesLRM0.079–0.4630.5602.3990.3930.9242.1281.8530.9480.6120.9420.591
HLRM0.0240.0470.3451.5240.2231.0501.6002.6380.4770.2580.4470.083
BRM0.0130.0290.0100.0950.1290.5240.3710.3770.8570.8610.8690.855
6 doses after removing largestLRM0.079–0.3380.3372.1940.2760.7492.0542.2390.9680.7790.9750.675
HLRM–0.0050.0800.2600.9260.1700.8201.3922.5470.4070.2930.4170.146
BRM0.0170.0010.0070.1600.1240.5120.3870.4300.8790.8350.9120.812
6 doses after removing smallestLRM0.022–0.4030.6662.2320.3350.9651.9362.2200.9720.6100.8980.681
HLRM0.0350.2440.1620.9190.1580.9981.3322.5260.4100.2090.3200.146
BRM0.009–0.0210.0300.1600.1190.5110.3580.4200.8740.7540.8620.819
7 doses with less extreme valuesLRM0.005–0.0770.0860.3620.0840.3830.6931.2150.9050.7950.8950.813
HLRM0.0010.0290.0090.0390.0530.3270.1850.4460.6190.5510.6100.520
BRM0.002–0.0010.0120.0620.0540.2960.1870.3110.8610.8100.8580.815
7 doses with extreme values and dose-dependent precisionLRM–0.029–0.4410.9941.6790.2970.9551.8401.7150.9450.4440.9040.678
HLRM0.0060.0370.2780.6170.1060.7101.1521.7810.3350.1450.2950.105
BRM–0.0050.0370.023–0.0020.0400.3230.1720.2710.9540.8190.9300.758
Appendix 1—table 2
Simulation result of bias, RMSE and 95% CI coverage probability corresponding to beta error terms.
ScenarioMethodBiasRMSE95% CI Coverage Probability
IC50IC90β0β1IC50IC90β0β1IC50IC90β0β1
(a)data simulated using beta error term with ϕ=35
7 doses with extreme valuesLRM0.017–0.3040.1420.6530.1630.4810.6580.6860.9240.6970.9090.566
HLRM0.008–0.1610.0850.3660.1170.4780.4020.5760.5810.4290.5710.313
BRM0.003–0.0140.0180.0730.0740.3390.2190.2750.8350.8180.8320.829
6 doses after removing largestLRM0.039–0.174–0.0270.5040.1150.4020.3540.6660.9330.8640.9360.666
HLRM0.015–0.0920.0260.2730.1090.4630.3370.4860.5810.5180.5940.368
BRM0.0080.0050.0070.0770.0770.3750.2290.2980.8120.8080.8110.803
6 doses after removing smallestLRM–0.022–0.2880.2480.5020.1080.5010.4990.6700.9300.6200.9010.664
HLRM0.000–0.1200.0940.2740.1060.5090.3660.4870.5960.3830.5470.368
BRM–0.002–0.0200.0310.0740.0750.3530.2230.2980.8150.7820.8050.800
7 doses with less extreme valuesLRM0.004–0.0450.0230.1170.0630.3320.1950.3000.9000.8280.8910.821
HLRM0.003–0.0170.0210.0960.0670.3820.2050.3230.6860.6300.6800.619
BRM0.0030.0310.0040.0280.0580.3310.1700.2720.8450.8370.8470.838
7 doses with extreme values and dose-dependent precisionLRM0.080–0.093–0.1790.4440.1470.2810.4350.5480.9040.9080.9270.639
HLRM0.061–0.200–0.0630.7530.1990.5530.6760.9670.4840.4470.5200.231
BRM0.0030.031–0.0030.0120.0670.2770.1820.2360.7550.7800.7680.772
(b)data simulated using beta error term with ϕ=15
7 doses with extreme valuesLRM0.029–0.5710.3731.6330.2330.5551.1771.1970.9410.5880.9100.406
HLRM0.016–0.3340.2250.9830.1720.6570.7251.1860.5610.3530.5440.213
BRM0.008–0.0500.0360.1740.1090.4640.3240.4030.8170.7900.8200.805
6 doses after removing largestLRM0.074–0.360–0.0431.2690.1640.5160.6351.1920.9420.8230.9390.556
HLRM0.029–0.2060.0860.7020.1640.7150.5940.9250.5520.4450.5510.272
BRM0.018–0.0280.0190.2030.1170.5270.3550.4460.7780.7830.7940.776
6 doses after removing smallestLRM–0.042–0.5640.6201.2800.1570.5960.8901.1800.9390.5030.9050.555
HLRM–0.001–0.2620.2370.7000.1600.7120.6630.9170.5460.3000.4860.268
BRM–0.001–0.0710.0660.1920.1150.4990.3450.4490.7870.7330.7820.777
7 doses with less extreme valuesLRM0.008–0.1040.0590.2860.0960.5130.3200.4890.9060.7950.8840.807
HLRM0.009–0.0300.0480.2400.1030.7260.3250.5170.6940.6020.6780.598
BRM0.0060.0740.0120.0620.0880.5530.2600.4090.8380.8360.8410.847
7 doses with extreme values and dose-dependent precisionLRM0.165–0.202–0.4251.1140.2180.3740.7710.9590.9160.8860.9410.534
HLRM0.141–0.347–0.2502.0550.2911.0271.2701.7980.5150.4400.5760.136
BRM0.0100.053–0.0080.0500.1040.4230.2810.3680.7630.7900.7640.769
(c)data simulated using beta error term with ϕ=5
7 doses with extreme valuesLRM0.042–0.8510.7483.4140.2700.5271.6111.4210.9480.4230.8980.198
HLRM0.040–0.6290.6893.1470.2781.5181.4802.1120.6810.2410.6440.094
BRM0.008–0.0110.0470.1910.1610.6060.5890.7670.8110.8000.8150.811
6 doses after removing largestLRM0.106–0.6360.1812.9620.2340.8651.4011.6040.9490.6840.9380.356
HRLM0.075–0.0980.2762.4100.2748.5991.3512.0820.6070.3620.6000.142
BRM0.007–0.1150.1120.3550.1660.6430.6210.7610.7620.7520.7730.769
6 doses after removing smallestLRM–0.039–0.8711.1262.9350.2310.6491.4281.6200.9530.3800.8830.362
HLRM0.004–0.3710.7732.3790.2615.2381.4642.0680.5990.2060.4920.149
BRM0.008–0.0870.0700.3280.1660.5880.5090.7240.7660.7550.7720.773
7 doses with less extreme valuesLRM0.008–0.4560.2751.1260.1520.5150.6791.0410.9150.6390.8740.707
HLRM0.007–0.3150.2360.9230.1640.7670.6511.0470.6810.4750.6440.491
BRM0.002–0.0880.0930.3180.1320.5880.4230.6600.8390.8120.8340.835
7 doses with extreme values and dose-dependent precisionLRM0.245–0.378–0.6462.3000.2740.4701.2071.2620.9080.8330.9470.343
HLRM0.209–0.578–0.2334.9820.3326.8522.0162.0170.6830.3810.7280.050
BRM0.0250.171–0.0320.0740.1610.7620.4260.5490.7570.7960.7610.770
Appendix 1—table 3
The first example of REAP application with B-cell lymphoma data, corresponding with Figure 5.

IC50 estimations are ranked from low to high. Hypothesis testings on equal potency (i.e., concentration for IC50) were conducted pairwise with the group right above (one rank lower). Jeko-1 has the highest potency and the difference of IC50 estimations between Jeko-1 and Jeko-R is significant with a P-value <0.0001. The B-cell lymphoma dataset is available on Github (Fang et al., 2022).

ModelInterceptSlope (m)Std. Err for mP-value for m>1IC50 estimationStd. Err for IC50 estimationPairwise comparison
Jeko-1–4.807–1.2520.1550.05190.0210.008-
Jeko-R–4.305–1.8220.112<.00010.0940.006<0.0001
Rec-1–5.304–2.630.091<.00010.1330.004<0.0001
Mino–2.684–1.4740.1410.00040.1620.0150.0656
Jeko-NO #1–2.012–1.1920.1350.07690.1850.0210.3755
MAVER-1–2.21–1.370.1250.00150.1990.0210.6312
Jeko-NO #11–1.459–1.2670.1520.03980.3160.0380.0114
JVM2–1.056–1.2710.1350.02230.4360.0550.0818
Appendix 1—table 4
The output for the estimated dose-response curve of anti-viral drugs under the same biological batch with SARS-CoV-2 data.

Calpain inhibitor IV has the highest potency (P-value = 0.0038). The reconstructed SARS-CoV-2 dataset is available on Github (Fang et al., 2022).

ModelInterceptSlope (m)Std. Err for mP-value for m>1EC50 estimationStd. Err for EC50 estimationPairwise comparison
CalpainInhibitorIV0.6780.7250.1140.99180.3930.103-
Chloroquine–1.0130.840.1350.88133.3370.880.0038
Remdesivir–1.7910.7970.120.95539.4692.6380.0282
Hydroxychloroquine–1.4850.5620.075114.0744.9940.4445
E64d (Aloxistatin)–3.2110.8610.1290.858741.6115.4730.1242

Data availability

All datasets in illustrative examples are available at https://github.com/vivid225/REAP/blob/main/REAP/ to generate the figures and tables using REAP.

The following data sets were generated

References

  1. Book
    1. Bickel P
    2. Doksum K
    (2015)
    Mathematical Statistics: Basic Ideas and Selected Topics
    Chapman and Hall/CRC.
  2. Book
    1. Chou T
    2. Rideout DC
    (1991)
    The Median-Effect Principle and the Combination Index for Quantitation of Synergism and Antagonism
    Synergism and Antagonism in Chemotherapy.
    1. Greco WR
    2. Bravo G
    3. Parsons JC
    (1995)
    The search for synergy: a critical review from a response surface perspective
    Pharmacological Reviews 47:331–385.
  3. Book
    1. Johnson N
    2. Kotz S
    3. Balakrishnan N
    (1995)
    Continuous Univariate Distributions
    Wiley-Interscience.
  4. Book
    1. Macmillan NA
    2. Creelman CD
    (2005)
    Detection Theory: A User’s Guide (Second Edition)
    Lawrence Erlbaum Associates Publishers.
  5. Book
    1. Schmidheiny K
    (2009)
    Heteroskedasticity in the Linear Model
    Econometrica.

Decision letter

  1. Philip Boonstra
    Reviewing Editor; University of Michigan, United States
  2. Aleksandra M Walczak
    Senior Editor; CNRS LPENS, France
  3. Philip Boonstra
    Reviewer; University of Michigan, United States

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Robust and Efficient Assessment of Potency (REAP): A Quantitative Tool for Dose-response Curve Estimation" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, including Philip Boonstra as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by and Aleksandra Walczak as the Senior Editor.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) Following Reviewer 2's first comment, please clearly state what LRM is.

2) Related to this and as suggested above, please consider also comparing the REAP approach to a version of LRM with a heavy tailed error distribution such as t-distribution with 3 degrees of freedom, which would also seem to possess the same robustness properties as REAP.

3) Please conduct additional stress testing and debugging of the shiny app, if it is intended to be included and advertised in the manuscript. See Reviewer 1's comments for specific suggestions.

Reviewer #1 (Recommendations for the authors):

I would suggest running additional stress tests on the web app to address some of the bugs above. You might also add some additional features. For example, there is a little help key next to 'Add effect estimation' that, when you hover over it, explains what that does. Could more such keys be added to other user inputs?

As to the methodology, it seems to me like it would be worthwhile to consider an approach such as what I describe in my public review – essentially the linear model but with a heavy tailed error distribution that will be less sensitive to extreme values.

Reviewer #2 (Recommendations for the authors):

The topic of this manuscript is very interesting to many researchers who work on drug screening tests and development. The manuscript is well developed and well written. How I do have some comments for the authors to address.

(1) The authors should clearly state what the LRM is, I assume it is model (3) based on normal error assumption. However, the authors should clearly spell it out.

(2) The parameter β is estimated by minimizing (8) or (9), are the two equations (8) and (9) equivalent? If so, the authors should make the connection clear.

(3) In the method session, the authors mentioned the stability condition for the selection of the tuning parameter. What is the stability condition? The authors stated that "If the stability condition is satisfied before αmax is reached, then optimal α equals the minimal value in the grid of αk.". Do the authors imply that the optimal α equals the minimal value in the grids which met the stability condition?

Overall, this is a nice manuscript and will make a nice contribution to the field involving in-vitro drug screening and testing.

https://doi.org/10.7554/eLife.78634.sa1

Author response

Essential revisions:

Reviewer #1 (Recommendations for the authors):

I would suggest running additional stress tests on the web app to address some of the bugs above. You might also add some additional features. For example, there is a little help key next to 'Add effect estimation' that, when you hover over it, explains what that does. Could more such keys be added to other user inputs?

We thank the reviewer for the instructive suggestion and conducted multiple experiments in different data patterns to stress test the web app. We have fixed the following bugs and added instructions for some features:

1. Solve the triplicate model assessments and comparison when there is only one agent in the dataset

2. Add a hover box next to “Model Comparisons” to indicate how to select the two check boxes for effect, slope and model comparisons

3. Add a hover box next to “Width of error bar” indicating the range of the error bar width should be within (0, 0.1). The input width is restricted within the range. If the input is outside of (0, 0.1), an error message will appear below the input box and the input value will be capped at 0.1.

4. Solve the problem when there are negative dose levels in data.

5. Solve the problem of missing sample mean and error bar in dose-response curve plotting given either out of range or missing dose responses.

As to the methodology, it seems to me like it would be worthwhile to consider an approach such as what I describe in my public review – essentially the linear model but with a heavy tailed error distribution that will be less sensitive to extreme values.

This is a great suggestion and we have added the simulation under both normal error term and β error term to compare the robust β regression framework with standard linear regression model with normal error distribution and the heavy-tailed linear model with t (df = 3) error distribution. In general, the heavy-tailed linear regression model demonstrates great potential to improve point estimation, but the coverage probabilities of its nominal 95% confidence interval are consistently underestimated below the level of 50% if with extreme values. The comparison results have been updated in the simulation Results section, Figure 4, Supplementary Figure 1 and Supplementary Tables 1 and 2.

Reviewer #2 (Recommendations for the authors):

The topic of this manuscript is very interesting to many researchers who work on drug screening tests and development. The manuscript is well developed and well written. How I do have some comments for the authors to address.

(1) The authors should clearly state what the LRM is, I assume it is model (3) based on normal error assumption. However, the authors should clearly spell it out.

We thank the reviewer for the instructive suggestion. LRM has been spelled out in the Methods section “A linear regression model (LRM) can be applied in the form of equation (3) with a standard normal distribution error.” Additionally, we also include the specifications on the heavy-tailed linear regression model (HLRM) and on the robust β-regression model (BRM) in the Methods section and provide the linkage in figure captions.

(2) The parameter β is estimated by minimizing (8) or (9), are the two equations (8) and (9) equivalent? If so, the authors should make the connection clear.

We apologize for the confusion. Equation (9) is simplified from equation (8), with technical details in Section 2.2 (Pg. 875) of Ghosh (2019). In the Methods section of the manuscript, additional descriptions are added:

- In equation (8),θ=(β,ϕ)T

- After mathematically simplifying equation (8), θ can be equivalently estimated by minimizing the objective function using the estimation equations (9).

(3) In the method session, the authors mentioned the stability condition for the selection of the tuning parameter. What is the stability condition? The authors stated that "If the stability condition is satisfied before αmax is reached, then optimal α equals the minimal value in the grid of αk.". Do the authors imply that the optimal α equals the minimal value in the grids which met the stability condition?

We thank the reviewer for the question for us to clarify the estimation methods. The stability condition is SQVαk<L. The default choice of L is 0.02. The optimal α is chosen in different situations:

1. If the stability condition is satisfied before αmax is reached, then optimal α equals the minimal value in the grid of αk.

2. If αmax is reached, then optimal α equals 0, which is equivalent to the maximum likelihood estimation.

The above descriptions are also illustrated in the paragraph after the definition of SQV in the Methods section.

https://doi.org/10.7554/eLife.78634.sa2

Article and author information

Author details

  1. Shouhao Zhou

    Department of Public Health Sciences, Pennsylvania State University, Hershey, United States
    Contribution
    Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Software, Supervision, Writing – original draft, Writing – review and editing
    Contributed equally with
    Xinyi Liu and Xinying Fang
    For correspondence
    shouhao.zhou@psu.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8124-5047
  2. Xinyi Liu

    Department of Public Health Sciences, Pennsylvania State University, Hershey, United States
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft
    Contributed equally with
    Shouhao Zhou and Xinying Fang
    Competing interests
    No competing interests declared
  3. Xinying Fang

    Department of Public Health Sciences, Pennsylvania State University, Hershey, United States
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft
    Contributed equally with
    Shouhao Zhou and Xinyi Liu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9121-5717
  4. Vernon M Chinchilli

    Department of Public Health Sciences, Pennsylvania State University, Hershey, United States
    Contribution
    Investigation, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Michael Wang

    Department of Lymphoma and Myeloma, University of Texas MD Anderson Cancer Center, Houston, United States
    Contribution
    Funding acquisition, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Hong-Gang Wang

    1. Department of Pharmacology, Pennsylvania State University, Hershey, United States
    2. Department of Pediatrics, Pennsylvania State University, Hershey, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  7. Nikolay V Dokholyan

    1. Department of Pharmacology, Pennsylvania State University, Hershey, United States
    2. Department of Biochemistry and Molecular Biology, Pennsylvania State University, Hershey, United States
    Contribution
    Funding acquisition, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8225-4025
  8. Chan Shen

    1. Department of Public Health Sciences, Pennsylvania State University, Hershey, United States
    2. Department of Surgery, The Pennsylvania State University, Hershey, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  9. J Jack Lee

    Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, United States
    Contribution
    Conceptualization, Funding acquisition, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared

Funding

Penn State College of Medicine (Junior Faculty Development Award)

  • Shouhao Zhou
  • Xinyi Liu

University of Texas MD Anderson Cancer Center (B-cell Lymphoma Moonshot Program)

  • Shouhao Zhou

National Cancer Institute (Cancer Center Support Grant P30 CA016672)

  • Shouhao Zhou
  • J Jack Lee

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This study was supported in part by the Penn State College of Medicine Junior Faculty Development Award, MD Anderson B-cell Lymphoma Moon Shot Program, and NIH Cancer Center Support Grant P30 CA016672.

Senior Editor

  1. Aleksandra M Walczak, CNRS LPENS, France

Reviewing Editor

  1. Philip Boonstra, University of Michigan, United States

Reviewer

  1. Philip Boonstra, University of Michigan, United States

Publication history

  1. Preprint posted: November 22, 2021 (view preprint)
  2. Received: March 14, 2022
  3. Accepted: June 21, 2022
  4. Version of Record published: August 3, 2022 (version 1)

Copyright

© 2022, Zhou, Liu, Fang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 408
    Page views
  • 102
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Shouhao Zhou
  2. Xinyi Liu
  3. Xinying Fang
  4. Vernon M Chinchilli
  5. Michael Wang
  6. Hong-Gang Wang
  7. Nikolay V Dokholyan
  8. Chan Shen
  9. J Jack Lee
(2022)
Robust and Efficient Assessment of Potency (REAP) as a quantitative tool for dose-response curve estimation
eLife 11:e78634.
https://doi.org/10.7554/eLife.78634

Further reading

    1. Cancer Biology
    2. Genetics and Genomics
    Minsu Kang, Hee Young Na ... Jong Seok Lee
    Research Article

    We aimed to elucidate the evolutionary trajectories of gallbladder adenocarcinoma (GBAC) using multi-regional and longitudinal tumor samples. Using whole-exome sequencing data, we constructed phylogenetic trees in each patient and analyzed mutational signatures. A total of 11 patients including 2 rapid autopsy cases were enrolled. The most frequently altered gene in primary tumors was ERBB2 and TP53 (54.5%), followed by FBXW7 (27.3%). Most mutations in frequently altered genes in primary tumors were detectable in concurrent precancerous lesions (biliary intraepithelial neoplasia, BilIN), but a substantial proportion was subclonal. Subclonal diversity was common in BilIN (n=4). However, among subclones in BilIN, a certain subclone commonly shrank in concurrent primary tumors. In addition, selected subclones underwent linear and branching evolution, maintaining subclonal diversity. Combined analysis with metastatic tumors (n=11) identified branching evolution in 9 patients (81.8%). Of these, 8 patients (88.9%) had a total of 11 subclones expanded at least 7-fold during metastasis. These subclones harbored putative metastasis-driving mutations in cancer-related genes such as SMAD4, ROBO1, and DICER1. In mutational signature analysis, 6 mutational signatures were identified: 1, 3, 7, 13, 22, and 24 (cosine similarity >0.9). Signatures 1 (age) and 13 (APOBEC) decreased during metastasis while signatures 22 (aristolochic acid) and 24 (aflatoxin) were relatively highlighted. Subclonal diversity arose early in precancerous lesions and clonal selection was a common event during malignant transformation in GBAC. However, selected cancer clones continued to evolve and thus maintained subclonal diversity in metastatic tumors.

    1. Cancer Biology
    2. Evolutionary Biology
    Juan Manuel Vazquez, Maria T Pena ... Vincent J Lynch
    Research Advance

    The risk of developing cancer is correlated with body size and lifespan within species, but there is no correlation between cancer and either body size or lifespan between species indicating that large, long-lived species have evolved enhanced cancer protection mechanisms. Previously we showed that several large bodied Afrotherian lineages evolved reduced intrinsic cancer risk, particularly elephants and their extinct relatives (Proboscideans), coincident with pervasive duplication of tumor suppressor genes (Vazquez and Lynch 2021). Unexpectedly, we also found that Xenarthrans (sloths, armadillos, and anteaters) evolved very low intrinsic cancer risk. Here, we show that: 1) several Xenarthran lineages independently evolved large bodies, long lifespans, and reduced intrinsic cancer risk; 2) the reduced cancer risk in the stem lineages of Xenarthra and Pilosa coincided with bursts of tumor suppressor gene duplications; 3) cells from sloths proliferate extremely slowly while Xenarthran cells induce apoptosis at very low doses of DNA damaging agents; and 4) the prevalence of cancer is extremely low Xenarthrans, and cancer is nearly absent from armadillos. These data implicate the duplication of tumor suppressor genes in the evolution of remarkably large body sizes and decreased cancer risk in Xenarthrans and suggest they are a remarkably cancer resistant group of mammals.