Region of Attainable Redaction, an extension of Ellipse of Insignificance analysis for gauging impacts of data redaction in dichotomous outcome trials

  1. David Robert Grimes  Is a corresponding author
  1. School of Medicine, Trinity College Dublin, Ireland
  2. School of Physical Sciences, Dublin City University, Ireland

Abstract

In biomedical science, it is a reality that many published results do not withstand deeper investigation, and there is growing concern over a replicability crisis in science. Recently, Ellipse of Insignificance (EOI) analysis was introduced as a tool to allow researchers to gauge the robustness of reported results in dichotomous outcome design trials, giving precise deterministic values for the degree of miscoding between events and non-events tolerable simultaneously in both control and experimental arms (Grimes, 2022). While this is useful for situations where potential miscoding might transpire, it does not account for situations where apparently significant findings might result from accidental or deliberate data redaction in either the control or experimental arms of an experiment, or from missing data or systematic redaction. To address these scenarios, we introduce Region of Attainable Redaction (ROAR), a tool that extends EOI analysis to account for situations of potential data redaction. This produces a bounded cubic curve rather than an ellipse, and we outline how this can be used to identify potential redaction through an approach analogous to EOI. Applications are illustrated, and source code, including a web-based implementation that performs EOI and ROAR analysis in tandem for dichotomous outcome trials, is provided.

Editor's evaluation

This valuable study develops the Region of Attainable Redaction (ROAR), which quantifies the potential sensitivity of conclusions due to omitted data in two-arm clinical trials and studies of associations between dichotomous outcomes and exposures. The idea is supported by solid numerical examples and an application to a large meta-analysis. The concept of ROAR is a useful reminder of the fragility of some clinical findings.

https://doi.org/10.7554/eLife.93050.sa0

Introduction

Despite the crucial importance of biomedical science for human well-being, the uncomfortable reality is that swathes of published results in fields from psychology to cancer research are less robust than optimum (Ioannidis, 2005; Krawczyk, 2015; Loken and Gelman, 2017; Grimes et al., 2018; Errington et al., 2021). In cases when findings are spurious, inappropriate or errant statistical methods are often the primary cause of untrustworthy research, from incorrect interpretations of p-values to unsuitable tests to data redaction and under-reporting of overtesting, leading to research waste and unsound conclusions (Hoffmann et al., 2013; Altman and Krzywinski, 2017; Colquhoun, 2014; Glasziou et al., 2014; Grimes and Heathers, 2021a; Itaya et al., 2022; Baer et al., 2021a; Baer et al., 2021b). Across biomedical sciences, dichotomous outcome trials and studies are of paramount importance, forming the basis of everything from preclinical observational studies to randomized controlled trials. Such investigations contrast experimental and control groups for a given intervention, comparing the numbers experiencing a particular event in both arms to infer whether differences between the intervention and control arm might exist.

Such investigations are vital, but concern has been raised over the fragility of many published works, where small amounts of recoding from event to non-event in experimental arms or vice versa in control arms can create an illusion of a relationship where none truly exists. In previous work by the author (Grimes, 2022), Ellipse of Insignificance (EOI) analysis was introduced as a refined fragility index capable of handling even huge data sets analytically with ease, considering both control and experimental arms simultaneously, which traditional fragility analysis cannot. EOI analysis is robust and analytical, suitable not only for Randomized Controlled Trial (RCT) analysis but for observational trials, cohort studies, and general preclinical work. Additionally, it links the concept of fragility to test sensitivity and specificity when these are known for the detection of events, enabling investigators to probe not only whether a result is arbitrarily fragile, but whether certain results are even possible. Accordingly, it yields objective metrics for fragility and can be employed to detect inappropriate manipulation of results if the statistical properties of the tests used are known. A web implementation of this is available at https://drg85.shinyapps.io/EOIanalysis/, replete with code in R and other popular languages for general application.

While EOI analysis is a powerful method for ascertaining trial robustness, it does not explicitly consider the scenario where data is redacted. Data redaction in biomedical science creates spurious results and untrustworthy findings (Grimes and Heathers, 2021b), and can be difficult to detect. Data redaction can be accidental due to some systematic error in analysis, due to missing data, or arise through deliberate cherry-picking, and there are currently few tools for gauging its likely impact outside of direct simulation. In this technical note, we unveil a novel and powerful method for quantifying how much redaction would be required to explain a seemingly significant finding in dichotomous outcome trials, automatically finding the degree of redaction required to yield spurious results, and objective metrics for defining this. While EOI analysis produced a conic section in the form of an inclined ellipse where significance disappeared, this new tool instead produces a bounded region, attainable by redaction, in which significance disappears, and calculates the minimal vector to this region. This technical note outlines the methodology of Region of Attainable Redaction (ROAR) analysis, including examples, R and MATLAB code for user implementations, and a web implementation for ease of deployability.

Methods

The underlying geometrical and statistical basis for EOI analysis has been previously derived and described. In brief, EOI arises from chi-squared analysis, ascertaining how many participants in experimental and control groups could be recoded from events to non-events and vice versa before apparent significance was lost. This is a powerful approach for determining robustness of outcomes, and a web implementation and code are available at https://drg85.shinyapps.io/EOIanalysis/. EOI in its current form, however, does not consider the situation where a significant result might be obtained by data redaction, where an experimenter censors or neglects observations in the final analysis.

Defining a as the reported number of endpoint positive cases in the experimental or exposure arm, b as the reported number of endpoint negative cases in the experimental arm, c as the reported number of endpoint positive cases in the control arm, and d as the reported number of endpoint negative cases in the control arm, we may define x and y as hypothetical redacted data in the experimental and control arm, respectively. We further define the total reported sample as n = a + b + c + d. To account for the impact of redaction, consider that an experimenter may obtain a significant result in favour of the experimental group in several ways. When relative risk, given by RR_E = a(c + d)/[c(a + b)], shows no significant difference, the experimental arm could still yield a greater relative risk (RR_E > 1) than the control arm if either x endpoint negative events had been jettisoned from the experimental arm, y endpoint positive events had been jettisoned from the control or comparison arm, or a combination of both. Equally, if there is no significant difference but a lower relative risk in the experimental group is sought (RR_E < 1), such a finding can be manipulated by either jettisoning x endpoint positive cases from the experimental arm, y endpoint negative cases from the control arm, or a combination of both. These situations are given in Table 1. Risk ratio is used in this work for simplicity in gauging the relative impact of an ostensibly significant effect in the experimental arm and can be readily converted to an odds ratio if preferred.

Table 1
Reported groups and related variables.

Redaction for RR_E > 1
                      Endpoint positive    Endpoint negative
Experimental group    a                    b + x
Control group         c + y                d

Redaction for RR_E < 1
                      Endpoint positive    Endpoint negative
Experimental group    a + x                b
Control group         c                    d + y
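As a concrete illustration of the quantities above, the following sketch (Python, with hypothetical data; the function name `relative_risk` is ours) computes the relative risk for a reported 2×2 table and shows how redaction of endpoint negative subjects from the experimental arm inflates it:

```python
def relative_risk(a, b, c, d):
    """Relative risk of the event, experimental arm vs control arm."""
    return (a / (a + b)) / (c / (c + d))  # equivalently a(c+d) / [c(a+b)]

# Hypothetical reported data: 70/100 events vs 50/100 events
rr_reported = relative_risk(70, 30, 50, 50)  # 1.4

# Had x = 10 endpoint negative subjects been redacted from the
# experimental arm, the full data would have had b + x = 40 negatives,
# and the true relative risk would be smaller than the reported one:
rr_full = relative_risk(70, 40, 50, 50)
```

This is the RR_E > 1 redaction pattern of Table 1: the reported table understates b, so the reported relative risk overstates the true one.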

Applying the chi-squared statistic outlined previously with a threshold critical value for significance of ν_c, the resulting identity when RR_E > 1 is

(1) (n + x + y)(ad − (b + x)(c + y))² / [(a + b + x)(c + d + y)(a + c + y)(b + d + x)] − ν_c = 0

and when RR_E < 1, the identity is

(2) (n + x + y)((a + x)(d + y) − bc)² / [(a + b + x)(c + d + y)(a + c + x)(b + d + y)] − ν_c = 0

Similar to EOI analysis, these forms can be expanded. However, the resultant equation in either case is not the conic section of an inclined ellipse as with EOI analysis, but a more complicated cubic curve, also in two variables. The resultant identity is g(x, y), polynomial in x and y, with the list of 15 coefficients in either case given in the mathematical appendix (Supplementary file 1, Table S1). The region bound by this equation is the ROAR, and any point (x, y) with g(x, y) ≤ 0 changes an ostensibly significant finding to a null one, with x and y respectively yielding the redaction from the experimental and control group required.
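In practice, g(x, y) need not be expanded into its 15 coefficients: it can be evaluated directly from the chi-squared identity. A minimal Python sketch for the RR_E > 1 case (Equation 1), using hypothetical data a = 70, b = 30, c = 50, d = 50 and assuming ν_c = 3.841 (α = 0.05, 1 degree of freedom):

```python
def g(x, y, a, b, c, d, crit=3.841):
    """Equation 1 (RR_E > 1 case): chi-squared under hypothetical redaction
    of x experimental and y control subjects, minus the critical value.
    g > 0: still significant; g <= 0: the null is no longer rejected."""
    n = a + b + c + d
    num = (n + x + y) * (a * d - (b + x) * (c + y)) ** 2
    den = (a + b + x) * (c + d + y) * (a + c + y) * (b + d + x)
    return num / den - crit

a, b, c, d = 70, 30, 50, 50
assert g(0, 0, a, b, c, d) > 0   # reported result is significant
assert g(8, 8, a, b, c, d) < 0   # redacting 8 + 8 subjects loses significance
```

Any point in the positive quadrant where the function dips to zero or below lies inside the ROAR.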

ROAR derivation and FOCK vector

In EOI analysis, we derived an analytical method for finding the minimum distance from the origin to the EOI. This point and vector, the Fewest Experimental/Control Knowingly Uncoded Participants (FECKUP) vector, allowed us to ascertain the minimal error which would render results insignificant. The resultant curve and bound region are inherently more complex in ROAR analysis, but the general principle remains. We seek the minimum distance from the origin to the region bound by g(x, y) = 0, defining the vector to this point (x_e, y_e) as the Fewest Observations/Censored Knowledge (FOCK) vector. Unlike EOI analysis, we cannot exploit geometric arguments to solve this analytically, and instead we proceed by the method of Lagrange multipliers. The distance from the origin to a point (x, y) is given by

(3) D(x, y) = √(x² + y²)

Defining the polynomial defined in Supplementary file 1, Table S1 as g(x,y), we can exploit the properties of Lagrange multipliers to write

(4) ∂D/∂x = λ ∂g/∂x
(5) ∂D/∂y = λ ∂g/∂y.

As we know λ ≠ 0, we can rearrange these equations for the constant scalar λ and equate them, subject to the constraint g(x, y) = 0. After rearrangement, we deduce that

(6) y ∂g/∂x − x ∂g/∂y = 0.

This yields another unwieldy polynomial in two variables with 18 coefficients, listed in the mathematical appendix (Supplementary file 1, Table S2) for both cases. If we define the resultant function as h(x,y), we seek to solve the simultaneous equations

(7) g(x_e, y_e) = 0
(8) h(x_e, y_e) = 0.

While analytical solutions are likely intractable, this can be readily solved numerically subject to the constraints that x_e > 0 and y_e > 0. By Bézout’s theorem, there are potentially up to 25 solutions to this system of equations, so we restrict potential solution pairs to the real domain and select the pair yielding the minimum length FOCK vector, corresponding to (x_e, y_e) as illustrated in Figure 1. Additionally, we solve g(x_c, 0) = 0 and g(0, y_c) = 0 as simple cubic equations to find the minimum number of observations redacted in exclusively the experimental or control group to lose significance. The resolution of the FOCK vector yields the minimum redacted combination of experimental and control groups, given by

(9) r_min = ⌈x_e + y_e⌉.
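The Lagrange system above can be solved with a polynomial solver, but the FOCK point can equally be approximated by direct numerical search. The sketch below (plain Python; all function names are ours, ν_c = 3.841 assumed, data from the hypothetical example in ‘Results’) marches outward along rays in the positive quadrant until each first crosses into the region g ≤ 0, then refines the crossing by bisection:

```python
import math

def g(x, y, a, b, c, d, crit=3.841):
    """Equation 1 (RR_E > 1 case): chi-squared under redaction minus nu_c."""
    n = a + b + c + d
    num = (n + x + y) * (a * d - (b + x) * (c + y)) ** 2
    den = (a + b + x) * (c + d + y) * (a + c + y) * (b + d + x)
    return num / den - crit

def fock_point(a, b, c, d, crit=3.841, n_rays=720, r_max=200.0):
    """Approximate the FOCK point: the point on g = 0 nearest the origin."""
    if g(0, 0, a, b, c, d, crit) <= 0:
        return 0.0, 0.0, 0.0  # reported result is not significant to begin with
    best = (math.inf, 0.0, 0.0)
    for k in range(n_rays + 1):
        th = (math.pi / 2) * k / n_rays
        cx, cy = math.cos(th), math.sin(th)
        r, dr = 0.0, 0.1
        # march outward until the ray first enters the region g <= 0
        while r < r_max and g(r * cx, r * cy, a, b, c, d, crit) > 0:
            r += dr
        if r >= r_max:
            continue  # this ray never enters the region
        lo, hi = r - dr, r  # refine the boundary crossing by bisection
        for _ in range(40):
            mid = 0.5 * (lo + hi)
            if g(mid * cx, mid * cy, a, b, c, d, crit) > 0:
                lo = mid
            else:
                hi = mid
        if hi < best[0]:
            best = (hi, hi * cx, hi * cy)
    return best  # (distance, x_e, y_e)

dist, xe, ye = fock_point(70, 30, 50, 50)
r_min = math.ceil(xe + ye)
```

For the hypothetical data a = 70, b = 30, c = 50, d = 50 this recovers a FOCK point near (6.9, 4.8) and r_min = 12 redacted subjects, matching the worked example in ‘Results’.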
Figure 1. ROAR analysis implementation example.

(a) A simulated example of the Region of Attainable Redaction (ROAR) for a = 70, b = 30, c = 50, d = 50 (RR_E > 1); all points within the shaded region depict a degree of redaction which would not lead to the null being rejected. (b) Relevant vectors for ascertaining possible redaction thresholds in the RR_E > 1 case. (c) ROAR analysis of similar data but with RR_E < 1 (a = 50, b = 50, c = 70, d = 30). Note that the RR_E > 1 case is a transform of the RR_E < 1 situation. (d) Relevant vectors for ascertaining possible redaction thresholds in the RR_E < 1 case.

Metrics for degree of potential redaction

In EOI analysis, we established objective metrics to characterize the degree of potential miscoding required to sustain the null hypothesis. In this technical note, we establish analogous parameters. Considering only potential redaction in the experimental group, we define the degree of potential redaction that can be sustained while the null hypothesis remains rejected, given by

(10) ρ_E = 1 − (a + b)/(a + b + x_c).

For example, a ROAR analysis with ρ_E = 0.1 would inform us that at least 10% of experimental participants would have had to be redacted for the result to lose significance. By similar reasoning, the tolerance threshold for redaction allowable in the control group is then

(11) ρ_C = 1 − (c + d)/(c + d + y_c).

Finally, redaction in both the experimental and control groups can be combined using knowledge of the FOCK point. While the FOCK vector gives the minimum distance to the boundary of the region, we instead take the sum of its components, r_min, to yield an absolute redaction threshold of

(12) ρ_A = 1 − n/(n + r_min).
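These three metrics follow directly from Equations 10–12. A short sketch (Python; the function name is ours, and the values x_c ≈ 10.5, y_c ≈ 16.6, r_min = 12 are approximate roots for the hypothetical a = 70, b = 30, c = 50, d = 50 example in ‘Results’):

```python
def redaction_metrics(a, b, c, d, x_c, y_c, r_min):
    """Equations 10-12: redaction tolerances for the experimental arm,
    the control arm, and the combined (FOCK-derived) total."""
    n = a + b + c + d
    rho_E = 1 - (a + b) / (a + b + x_c)   # Eq. 10
    rho_C = 1 - (c + d) / (c + d + y_c)   # Eq. 11
    rho_A = 1 - n / (n + r_min)           # Eq. 12
    return rho_E, rho_C, rho_A

# Approximate roots of g(x,0) = 0 and g(0,y) = 0 for the worked example
rho_E, rho_C, rho_A = redaction_metrics(70, 30, 50, 50, 10.5, 16.6, 12)
```

This yields tolerances of roughly 9.5%, 14.2%, and 5.66% respectively, in line with the worked example in Table 2.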

Unlike the EOI case, there is no direct relationship between test sensitivity/specificity and potential redaction.

Application to large data sets and meta-analyses

ROAR is also highly effective with large data sets, and with certain caveats can be applied even to meta-analysis results. For a meta-analysis of i dichotomous outcome trials, the crude pooled risk ratio is given by

(13) RR_C = [Σ_i a_i · Σ_i (c_i + d_i)] / [Σ_i c_i · Σ_i (a_i + b_i)]

whereas the Cochran–Mantel–Haenszel adjusted risk ratio, which accounts for potential confounding between studies and adjusts for sample size, is given by

(14) RR_CMH = [Σ_i a_i(c_i + d_i)/n_i] / [Σ_i c_i(a_i + b_i)/n_i].

The magnitude of confounding between studies is given by |1 − RR_CMH/RR_C|. If this is small (typically <10%), confounding can be assumed minimal and the crude ratio used, allowing ROAR to be deployed directly on pooled meta-analysis results if these conditions are met. When confounding between studies is significant, direct ROAR is not applicable; these caveats are expanded upon in the discussion.
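Equations 13 and 14 and the confounding check can be computed with a small sketch (Python; the two study tables are hypothetical, and the function name is ours):

```python
def pooled_risk_ratios(studies):
    """Crude pooled RR (Eq. 13) and Cochran-Mantel-Haenszel RR (Eq. 14)
    for a list of (a, b, c, d) study tables, plus |1 - RR_CMH / RR_C|."""
    sum_a = sum(s[0] for s in studies)
    sum_c = sum(s[2] for s in studies)
    sum_cd = sum(s[2] + s[3] for s in studies)
    sum_ab = sum(s[0] + s[1] for s in studies)
    rr_c = (sum_a * sum_cd) / (sum_c * sum_ab)
    num = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in studies)
    den = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in studies)
    rr_cmh = num / den
    confounding = abs(1 - rr_cmh / rr_c)
    return rr_c, rr_cmh, confounding

# Two hypothetical studies as (a, b, c, d)
studies = [(10, 90, 20, 80), (30, 170, 40, 160)]
rr_c, rr_cmh, conf = pooled_risk_ratios(studies)
# conf is below the 10% threshold here, so the crude pooled
# ratio could be used and ROAR applied to the pooled table
```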

Results

Example deployment and ROAR behaviour

To demonstrate the usage of ROAR, we consider a simple twin example with the following arrangement of data.

  1. RR_E > 1: We generate a data set of N = 200, with a = 70, b = 30 in the experimental arm, and c = d = 50 in the control arm. This yields p = 0.004, and a hypothetical risk ratio of 1.4 (95% confidence interval: 1.11–1.77). The ROAR for this data set is illustrated in Figure 1a and b, with the degree of redaction required given in Table 2.

  2. RR_E < 1: We generate a similar data set of N = 200, but invert the experimental and control arms so that a = b = 50 with c = 70 and d = 30 in the control arm. This also yields p = 0.004, and a hypothetical risk ratio of 0.71 (95% confidence interval: 0.57–0.90). The ROAR for this data set is illustrated in Figure 1c and d, with the degree of redaction required given in Table 2.
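The statistics for both arrangements can be checked directly; a quick sketch (Python, standard library only; the function name is ours) computes the chi-squared statistic for the reported 2×2 table and its p-value (1 degree of freedom, via the complementary error function):

```python
import math

def chi2_p(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 df) for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-squared, 1 df
    return chi2, p

chi2_1, p_1 = chi2_p(70, 30, 50, 50)  # RR_E > 1 arrangement
chi2_2, p_2 = chi2_p(50, 50, 70, 30)  # RR_E < 1 arrangement
# both arrangements give the identical statistic, with p ~ 0.004
```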

Table 2
ROAR-derived metrics for published data (see ‘Results’ for details).

Statistics for simulated example case
ROAR statistic (α = 0.05)                Simulated RR_E > 1 data        Simulated RR_E < 1 data
Total subjects reported, N               200                            200
Relative risk (95% CI), RR_E             1.40 (1.11–1.77)               0.60 (0.46–0.76)
FOCK point                               (x_e, y_e) = (6.89, 4.79)      (x_e, y_e) = (4.79, 6.89)
r_min                                    12 subjects                    12 subjects
Experimental redaction tolerance, ρ_E    9.51% (x_c = 10 subjects)      14.32% (x_c = 14 subjects)
Control redaction tolerance, ρ_C         14.32% (y_c = 16 subjects)     9.51% (y_c = 10 subjects)
Total redaction tolerance, ρ_A           5.66% (12 subjects)            5.66% (12 subjects)

Statistics for large meta-analysis
ROAR statistic (α = 0.05)                Calculated ROAR value
Total subjects reported, N               39,197
Relative risk (95% CI)                   0.85 (0.74–0.97)
FOCK point                               (x_e, y_e) = (14.17, 0.32)
r_min                                    14 subjects
Experimental redaction tolerance, ρ_E    0.07% (13 subjects)
Control redaction tolerance, ρ_C         3.21% (649 subjects)
Total redaction tolerance, ρ_A           0.04% (14 subjects)

FOCK, Fewest Observations/Censored Knowledge; ROAR, Region of Attainable Redaction.

As can be seen from Figure 1, the cases RR_E > 1 and RR_E < 1 are essentially geometrical rotations and reflections of one another, with r_min the same in both and the values of ρ_E and ρ_C transposed on reflection, as is evident from Table 2. This showcases the general behaviour of ROAR analysis; in this example, it would require a redaction of between 10 and 16 subjects to lose apparent significance in either case, requiring at least 5.66% of the total subjects to have been redacted. Note that the real values of x_e and y_e are employed in calculating ρ_E and ρ_C, whereas the integer value r_min is used in ρ_A. For x_c and y_c, integer ceiling values yield the greatest possible redaction.

Application to large data sets and meta-analyses

We consider a published meta-analysis of vitamin D supplementation on cancer mortality (Zhang et al., 2019), comprising N = 39,197 patients from five RCTs, with a = 397 deaths in the experimental group (supplementation) versus b = 19,204 non-deaths, and c = 468 deaths in the control group versus d = 19,128 non-deaths. Although the authors did not observe a reduction in all-cause mortality, subanalysis for the cancer population yielded an odds ratio of 0.85 (95% confidence interval: 0.74–0.97 for supplementation), reporting an ≈15% reduction in cancer mortality risk. With RR_C = 0.8481 and RR_CMH = 0.8474, the magnitude of confounding is <0.09%, and thus we can apply ROAR to the pooled data. In this case, the ROAR is illustrated in Figure 2, and the degree of redaction required is given in Table 2. Despite the ostensible strength of the result and the large sample size, redaction of a mere 14 subjects (0.036%), or a small fraction of missing data, would be sufficient to nullify the apparent finding.
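For this pooled data (an RR_E < 1 case, so Equation 2 applies), the number of hypothetical endpoint positive subjects redacted from the supplementation arm alone, x_c, can be found by a simple bisection on g(x, 0) = 0. A sketch in Python (function name ours; ν_c = 3.841 assumed for α = 0.05):

```python
def g2(x, y, a, b, c, d, crit=3.841):
    """Equation 2 (RR_E < 1 case): chi-squared under redaction minus nu_c."""
    n = a + b + c + d
    num = (n + x + y) * ((a + x) * (d + y) - b * c) ** 2
    den = (a + b + x) * (c + d + y) * (a + c + x) * (b + d + y)
    return num / den - crit

# Pooled vitamin D data: a, b = deaths / non-deaths on supplementation,
# c, d = deaths / non-deaths in the control group
a, b, c, d = 397, 19204, 468, 19128

lo, hi = 0.0, 60.0  # g(lo, 0) > 0 (significant), g(hi, 0) < 0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if g2(mid, 0, a, b, c, d) > 0:
        lo = mid
    else:
        hi = mid
x_c = hi  # redaction of roughly 14 subjects loses significance
```

The bisection gives x_c ≈ 14, on the order of the FOCK-derived fragility reported in Table 2, despite the sample exceeding 39,000 subjects.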

Figure 2. ROAR example for a large data set.

(a) A simulated example of the Region of Attainable Redaction (ROAR) for a meta-analysis of N=39,197 as described in the text. (b) Relevant vectors for ascertaining possible redaction thresholds.

Browser-based implementation and source code

A browser-based implementation combining both EOI and ROAR is hosted at https://drg85.shinyapps.io/EOIROAR/, and relevant source code is hosted online for languages including R, MATLAB/OCTAVE, and Mathematica at https://github.com/drg85/EOIROAR_code, copy archived at Grimes, 2023.

Discussion

The ROAR analysis outlined in this technical note extends the functionality and usefulness of EOI analysis, allowing users to estimate the likely impacts of missing data. EOI analysis and related fragility methods have the limitation that while they handle potential miscoding, they are unsuitable for inference or quantification of the impacts of redacted data or subjects lost to follow-up. Accordingly, ROAR is a powerful method of gauging the potential impacts of missing data. While more mathematically complex than EOI analysis, ROAR remains deterministic and rapid, but shares the limitation that it is currently only applicable to dichotomous outcome trials and studies, and should be applied very cautiously to time-to-event data, where it may not be suitable. Like EOI analysis, ROAR differs from typical fragility metrics by avoiding Fisher’s exact test, as this is not suitable for the large data sets which EOI and ROAR can readily handle. This is typically not a problem, as the chi-squared test employed approximates Fisher’s test in most circumstances. However, as with EOI analysis, p-values for small trials can differ slightly from the chi-squared result. ROAR analysis is built upon chi-squared statistics, and it is thus possible for edge cases with small numbers to yield results discordant with Fisher’s exact test. This can be shown from a theoretical standpoint to make no appreciable difference except for rare events in very small trials (Grimes, 2022; Baer et al., 2021a).

Application of ROAR analysis is inherently context specific. For clinical trials, preregistration in principle reduces the potential for experimenter choices like redaction bias changing the outcome. But the implementation of preregistered protocols does not exclude the possibility of data dredging in the form of p-hacking or HARKing (hypothesizing after the results are known). Researchers rarely follow the precise methods, plans, and analyses that they preregistered. A recent analysis found that pre-registered studies, despite having power analyses and higher sample sizes than non-registered studies, do not a priori seem to prevent p-hacking and HARKing (Bakker et al., 2020; Singh et al., 2021; El‐Boghdadly et al., 2018; Sun et al., 2019), with similar proportions of positive results and effect sizes between preregistered and non-preregistered studies (van den Akker et al., 2023). A survey of 27 preregistered studies found researchers deviating from preregistered plans in all cases, most frequently in relation to planned sample size and exclusion criteria (Claesen et al., 2021). This latter aspect lends itself to potential redaction bias (Grimes and Heathers, 2021b), which can be systematic rather than deliberate, and thus a means to quantify its impact is important. More importantly, EOI analysis has applications for dichotomous outcome studies beyond clinical trials. In preclinical work, cohort, and observational studies, the scope for redaction bias greatly increases as reporting and selection of data falls entirely on the experimenter, and thus methods like ROAR to probe potential redaction bias are important. ROAR also has potential application in case–control studies, where selection of an inappropriately fragile control group could give spurious conclusions. This again comes with caveats, as studies adjusted for potential confounders and predictors might make ROAR inappropriate in such cases.

As demonstrated in this work, ROAR is under certain circumstances applicable even to large meta-analyses. In this instance, a potential redaction of just 14 subjects from over 39,000 was sufficient to overturn the rejection of the null hypothesis, despite a relative risk reduction of 15% being widely reported on the basis of this meta-analysis. This of course is not to say that any redaction occurred, only to quantify the vulnerability of such a study to missing data. While beyond the scope of this technical note, it is worth noting that a subsequent 2022 meta-analysis (Zhang et al., 2022) of 11 RCTs (including those in the 2019 meta-analysis; Zhang et al., 2019) found no significant reduction in cancer mortality with vitamin D supplementation, a potential testament to the need to consider the fragility of results to missing or miscoded data, even with ostensibly large samples. There are, however, important caveats to applying ROAR to meta-analyses. In its naive form, it is only suitable when there is minimal confounding between studies, so that the crude relative risk differs minimally from the adjusted risk (RR_C ≈ RR_CMH), such as in the illustrative work considered herein. When this is not the case, results from individual studies cannot be crudely pooled, and ROAR is not valid in these instances. As ROAR applied to meta-analyses pools studies into a simple crude measure, it does not identify the particular study or studies where hypothetical redaction might have occurred, only the global fragility metric. A full theoretical extension of ROAR and EOI specifically for meta-analyses is beyond the scope of this work, and accordingly, ROAR must be cautiously implemented and carefully interpreted in investigations of meta-analyses.

As discussed in the EOI paper (Grimes, 2022), poor research conduct, including inappropriate statistical manipulation and data redaction, is not uncommon, affecting up to three quarters of all biomedical science (Fanelli, 2009). Like EOI analysis, the ROAR extension has a potential role in detecting manipulations that nudge results towards significance, identifying inconsistencies in data, and adding invaluable context. It is a demonstrable reality that even seemingly strong results can falter under inspection, and tools like ROAR and EOI analysis have a potentially important role in identifying weak results and statistical inconsistencies, with wide potential application across meta-research with the goal of furthering sustainable, reproducible research.

Data availability

Sample R/MATLAB/OCTAVE code and functions for rapid implementation of the analysis methods outlined are hosted online at https://github.com/drg85/EOIROAR_code (copy archived at Grimes, 2023). A web implementation is available at https://drg85.shinyapps.io/EOIROAR/. All data are available in the paper and on GitHub.

References

Decision letter

  1. Philip Boonstra
    Reviewing Editor; University of Michigan, United States
  2. Detlef Weigel
    Senior Editor; Max Planck Institute for Biology Tübingen, Germany
  3. Philip Boonstra
    Reviewer; University of Michigan, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "Region of Attainable Redaction, an extension of Ellipse of Insignificance analysis for gauging impacts of data redaction in dichotomous outcome trials" for consideration by eLife. I apologize for the long delay in reviewing your article. I had a great deal of trouble identifying reviewers who were willing and available to review. I (Phil) have reviewed your article as the Reviewing Editor, and the evaluation has been overseen by Detlef Weigel as the Senior Editor.

Please see my recommendations below to help you prepare a revised submission.

Reviewer #1 (Recommendations for the authors):

In this article, Dr. Grimes develops the Region of Attainable Redaction (ROAR), which is a simple metric for the potential sensitivity of reported findings in two-arm randomized clinical trials due to omitted data. The idea of ROAR is to quantify the minimum number of hypothetical observations that – had they existed but not been reported – would have changed a statistically significant finding to a non-statistically significant finding. The author helpfully includes an easy-to-use online app and, for those interested, computer code. The strengths of this work include its ease of use and its intuitive meaning. Weaknesses include its limited scope of application, as discussed below.

The method is nominally applicable to any analysis of a binary outcome with two exposures in which a chi-squared test is appropriate. The intended use is for randomized clinical trials, but the utility of ROAR seems limited here given the expected rigor and reporting requirements for such studies, including preregistration, multiple levels of scientific and ethical review, and reporting requirements. Given this, it seems unlikely that the published analysis of a clinical trial could successfully "hide" observations. Although the Discussion notes that it is applicable to cohort and ecological studies, presumably any such observational study would adjust for other predictors and potential confounders in addition to the exposure of interest. Thus, it is not clear how applicable the concept of ROAR, which is an unadjusted analysis, is for such studies.

Dr. Grimes also applies ROAR to a large meta-analysis of vitamin D supplementation (exposure) and cancer mortality (outcome). This application seems more appropriate than those mentioned above. The key finding is that, in a meta-analysis of more than 39,000 patients with 865 deaths, the statistical significance of the estimated risk ratio of 0.85 would be lost had there been just 14 additional (unreported) subjects in the exposure arm who had the outcome. Driving this surprising result is the low baseline risk of mortality in either group. For more prevalent events, the number would be much greater than 14.

A final weakness is in the author's claim in the Discussion that "[ROAR] can be employed…to detect the fingerprints of questionable research practices and even research fraud." This claim is surprising and not substantiated in the article, as ROAR is presented as a sensitivity analysis and not a diagnostic tool.

– Is there utility for single trials which require rigorous definitions and reporting of enrolled, eligible, evaluable patients? It seems unlikely that subjects could simply be redacted from a particular trial.

– Why do the regions in Figures 1 and 2 have upper limits? Does this correspond to a switching of significance in the other direction?

– Tables 2 and 3 are probably not of interest to most eLife readers and could probably be moved to a supplement.

– The notation should be made clearer: a, b, c, and d are the reported data, whereas x and y are the hypothetical redacted data. I don't think this is explicitly stated anywhere. Also, I don't think n is defined anywhere, presumably, it is a+b+c+d.

– In the second sentence of the second paragraph of the methods, I think the grammar needs to be clarified to reflect the hypothetical nature of the statement. For example, I think instead of '…are jettisoned…' the past perfect form '…had been jettisoned…' would be more appropriate. But more than just this needs to be changed.

– In section "ROAR derivation and FOCK vector", fifth line down: the e in 'ye' should be subscripted.

– In section "Results", in points 1 and 2: it is written "p < 0039" presumably there is missing a decimal somewhere. And this should be an equality not an inequality.

– In section Introduction, the sentence "EOI analysis in robust and analytical, suitable not only for RCT analysis but for observational trials, cohort studies, and general preclinical work". Should be changed to "…is robust and analytical".

– In Table 4, the percentages are not clear to me. E.g. Experimental redaction tolerance 9.51%(10 subjects). I thought the tolerance would be 10/110 or 9.1%.

– I think I had a similar comment on the EOI paper, but the FOCK coordinates seem like they should be integer-valued in order to be clinically applicable since you cannot redact a fraction of an observation. Put differently, I think that the identification of x_e and y_e and r_min should comprise three steps: (i) first identify x_e* and y_e*, the real-valued solutions to g(x,y)=0 and then (ii) identify (x_e, y_e) as the integer-valued coordinates that are closest to x_e* and y_e* and still in the ROAR. Then (iii) r_min = x_e + y_e. For example, in the bottom half of table 4, you cannot redact a fraction of an observation,

– Can ROAR be used in case-control studies – if so what special considerations would there need to be? And what about other estimands besides a risk ratio? Can you add some discussion about this?

https://doi.org/10.7554/eLife.93050.sa1

Author response

Reviewer #1 (Recommendations for the authors):

In this article, Dr. Grimes develops the Region of Attainable Redaction (ROAR), which is a simple metric for the potential sensitivity of reported findings in two-arm randomized clinical trials due to omitted data. The idea of ROAR is to quantify the minimum number of hypothetical observations that – had they existed but not been reported – would have changed a statistically significant finding to a non-statistically significant finding. The author helpfully includes an easy-to-use online app and, for those interested, computer code. The strengths of this work include its ease of use and its intuitive meaning. Weaknesses include its limited scope of application, as discussed below.

This is a concise and accurate summary, I thank the reviewer for it and will try to address their queries in order in this document.

The method is nominally applicable to any analysis of a binary outcome with two exposures in which a chi-squared test is appropriate. The intended use is for randomized clinical trials, but the utility of ROAR seems limited here given the expected rigor and reporting requirements for such studies, including preregistration, multiple levels of scientific and ethical review, and reporting requirements. Given this, it seems unlikely that the published analysis of a clinical trial could successfully "hide" observations. Although the Discussion notes that it is applicable to cohort and ecological studies, presumably any such observational study would adjust for other predictors and potential confounders in addition to the exposure of interest. Thus, it is not clear how applicable the concept of ROAR, which is an unadjusted analysis, is for such studies.

It is true that preregistration of clinical trials should, in principle, stem the influence of experimenter choices on the outcome. But there are caveats to this. Implementation of preregistered protocols does not exclude the possibility of p-hacking. Researchers rarely follow the precise methods, plans, and analyses that they preregistered. A recent analysis found that preregistered studies, despite having power analyses and higher sample sizes than non-preregistered studies, do not a priori seem to prevent p-hacking and HARKing, with similar proportions of positive results and effect sizes between preregistered and non-preregistered studies. A 2019 survey of 27 preregistered studies found researchers deviating from preregistered plans in all cases, most frequently in relation to planned sample size and exclusion criteria. This latter aspect lends itself to potential redaction bias, which can be systematic rather than deliberate, and thus a means to quantify its impact is important. In preclinical work and cohort studies, the scope for redaction bias greatly increases, as reporting and selection of data falls entirely on the experimenter. The point about observational and ecological studies is absolutely valid, and whether ROAR could be applied would depend on the design. Accordingly, the discussion has been revised with the following additional text to address this:

“Application of ROAR analysis is inherently context specific. For clinical trials, preregistration in principle reduces the potential for experimenter choices such as redaction bias to change the outcome. But the implementation of preregistered protocols does not exclude the possibility of data dredging in the form of p-hacking or HARKing (hypothesizing after the results are known). Researchers rarely follow the precise methods, plans, and analyses that they preregistered. A recent analysis found that preregistered studies, despite having power analyses and higher sample sizes than non-preregistered studies, do not a priori seem to prevent p-hacking and HARKing17-20, with similar proportions of positive results and effect sizes between preregistered and non-preregistered studies21. A survey of 27 preregistered studies found researchers deviating from preregistered plans in all cases, most frequently in relation to planned sample size and exclusion criteria22. This latter aspect lends itself to potential redaction bias15, which can be systematic rather than deliberate, and thus a means to quantify its impact is important. More importantly, EOI analysis has application for dichotomous outcome designs beyond clinical trials. In preclinical work, cohort, and observational studies, the scope for redaction bias greatly increases, as reporting and selection of data falls entirely on the experimenter, and thus methods like ROAR to probe potential redaction bias are important. ROAR also has potential application in case-control studies, where selection of an inappropriately fragile control group could give spurious conclusions. This again comes with caveats, as studies adjusted for potential confounders and predictors might make ROAR inappropriate in such cases.”

Dr. Grimes also applies ROAR to a large meta-analysis of vitamin D supplementation (exposure) and cancer mortality (outcome). This application seems more appropriate than those mentioned above. The key finding is that, in a meta-analysis of more than 39,000 patients with 865 deaths, the statistical significance of the estimated risk ratio of 0.85 would be lost had there been just 14 additional (unreported) subjects in the exposure arm who had the outcome. Driving this surprising result is the low baseline risk of mortality in either group. For more prevalent events, the number would be much greater than 14.

This is true, but the converse is that were events less rare, such an analysis would suggest the results to be robust. Please note that this discussion has changed substantially in this revision because, upon reflection, it pivots on the assumption that there is little confounding between the studies that make up the meta-analyses. To reflect this, the text has changed in the methodology, Results, and Discussion sections.

A final weakness is in the author's claim in the Discussion that "[ROAR] can be employed…to detect the fingerprints of questionable research practices and even research fraud." This claim is surprising and not substantiated in the article, as ROAR is presented as a sensitivity analysis and not a diagnostic tool.

This is a justified criticism; it was badly phrased, apologies. The idea was that ROAR might be added to the arsenal of teams like INSPECT-SR in their efforts to detect dubious trials, but without that context what I had written overstated things. I have deleted the offending sentence, as I don't think it adds anything and it is ripe for causing confusion.

– Is there utility for single trials which require rigorous definitions and reporting of enrolled, eligible, evaluable patients? It seems unlikely that subjects could simply be redacted from a particular trial.

As mentioned in the new references 17-22 inclusive, even with preregistration it is possible to p-hack results, and the redaction of prespecified data is one of the most common routes to this. Even if not intentional, as discussed in reference 15, systematic experimenter choices in excluding data can lend themselves to this. To better address this, I have modified the text as outlined in my response to point 2.

– Why do the regions in Figures 1 and 2 have upper limits? Does this correspond to a switching of significance in the other direction?

Yes, this is right, if I'm understanding the question correctly. In an analogous way to EOI, if one redacts enough, significance is not merely lost but "flipped" to highly significant in the other direction: the equivalent of going from a fractional relative risk not encompassing unity to a relative risk greater than one, also not encompassing unity. For example, in Figure 1a, redacting 16 people from the experimental arm would lose significance, whereas redacting 150 would be significant but in the opposite direction from that originally reported. The FOCK vector finds the minimal distance to the region of redaction, and the rmin resolves this, as explored further in point 14.
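This flipping behaviour can be sketched numerically. Below is a minimal illustration, not the paper's Figure 1a data: the table (a=10, b=90, c=25, d=75) and the choice to add hypothetical endpoint-positive subjects back into the experimental arm are assumptions for demonstration, using the standard uncorrected 2x2 chi-squared statistic.

```python
# Illustrative sketch of redaction flipping significance. The table below
# is hypothetical, NOT Figure 1a of the paper.

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def risk_ratio(a, b, c, d):
    """Risk ratio of the experimental arm relative to control."""
    return (a / (a + b)) / (c / (c + d))

CRIT = 3.841  # chi-squared critical value at p = 0.05, 1 degree of freedom

a, b, c, d = 10, 90, 25, 75  # illustrative reported table, RR = 0.4
lost = flipped = None
for x in range(200):
    chi2 = chi2_2x2(a + x, b, c, d)
    if lost is None and chi2 < CRIT:
        lost = x  # significance first lost here
    if lost is not None and flipped is None and chi2 >= CRIT \
            and risk_ratio(a + x, b, c, d) > 1:
        flipped = x  # significant again, but in the opposite direction
print(lost, flipped)  # → 5 43
```

Between the two counts the result is non-significant; beyond the second, the apparent effect direction reverses, which is why the regions in Figures 1 and 2 are bounded above.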

– Tables 2 and 3 are probably not of interest to most eLife readers and could probably be moved to a supplement.

They have now been moved to a mathematical appendix, as they were rather space-consuming.

– The notation should be made clearer: a, b, c, and d are the reported data, whereas x and y are the hypothetical redacted data. I don't think this is explicitly stated anywhere. Also, I don't think n is defined anywhere, presumably, it is a+b+c+d.

Thank you for noticing this, a total oversight on my behalf. The text has now been changed to read:

“Defining a as the reported number of endpoint-positive cases in the experimental or exposure arm, b as the reported endpoint-negative cases in the experimental arm, c as the reported endpoint-positive cases in the control arm, and d as the reported endpoint-negative cases in the control arm, we may define x and y as hypothetical redacted data in the experimental and control arms respectively. We further define the total reported sample as n = a + b + c + d.”
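Under this notation, the minimal total redaction can be approximated by a brute-force sketch. The assumption that redacted experimental subjects are endpoint-positive (added to a) and redacted control subjects are endpoint-negative (added to d), appropriate for an apparently protective effect, is mine for illustration; the paper's analytic ROAR boundary g(x, y) = 0 and FOCK vector are not reproduced here.

```python
# Brute-force sketch of the minimal total redaction r_min. The direction
# of redaction in each arm is an assumption for an apparently protective
# effect; this searches integer (x, y) directly rather than solving the
# paper's cubic boundary g(x, y) = 0.
from itertools import product

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def r_min(a, b, c, d, crit=3.841, limit=100):
    """Smallest integer x + y of hypothetically redacted subjects that
    drops the chi-squared statistic below the 5% critical value."""
    best = None
    for x, y in product(range(limit), repeat=2):
        if chi2_2x2(a + x, b, c, d + y) < crit:
            if best is None or x + y < best:
                best = x + y
    return best

print(r_min(10, 90, 25, 75))  # illustrative table: 5 redacted subjects suffice
```

For a table that is already non-significant the search returns zero, since no redaction is needed to lose significance.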

– In the second sentence of the second paragraph of the methods, I think the grammar needs to be clarified to reflect the hypothetical nature of the statement. For example, I think instead of '…are jettisoned…' the past perfect form '…had been jettisoned…' would be more appropriate. But more than just this needs to be changed.

Yes, this sounds much better, thank you. It has been changed throughout the text for consistency.

– In section "ROAR derivation and FOCK vector", fifth line down: the e in 'ye' should be subscripted.

Well spotted! Completely missed this, corrected now.

– In section "Results", in points 1 and 2: it is written "p < 0039" presumably there is missing a decimal somewhere. And this should be an equality not an inequality.

Thank you, it is actually something in the region of p = 0.00389, so I have rewritten this as p < 0.004 for clarity in this iteration.

– In section Introduction, the sentence "EOI analysis in robust and analytical, suitable not only for RCT analysis but for observational trials, cohort studies, and general preclinical work". Should be changed to "…is robust and analytical".

Corrected, thank you for spotting this.

– In Table 4, the percentages are not clear to me. E.g. Experimental redaction tolerance 9.51%(10 subjects). I thought the tolerance would be 10/110 or 9.1%.

The slight discrepancy arises because ρE and ρC are calculated from the real-valued intersection point (xe, ye), whereas ρA arises from the integer-valued rmin. This has now been clarified in the text and is expanded upon in point 14 below.

– I think I had a similar comment on the EOI paper, but the FOCK coordinates seem like they should be integer-valued in order to be clinically applicable since you cannot redact a fraction of an observation. Put differently, I think that the identification of x_e and y_e and r_min should comprise three steps: (i) first identify x_e* and y_e*, the real-valued solutions to g(x,y)=0 and then (ii) identify (x_e, y_e) as the integer-valued coordinates that are closest to x_e* and y_e* and still in the ROAR. Then (iii) r_min = x_e + y_e. For example, in the bottom half of table 4, you cannot redact a fraction of an observation.

This is a fair point, but the current terminology is used to keep it consistent with EOI. While (xe, ye) is real-valued, its resolved vector is the integer-valued rmin, in analogy with the resolved FECKUP vector in EOI. The reason it is only resolved after this step is that there are edge cases where taking the floor or ceiling values of (xe, ye) before the vector resolution would offset the resolved vector by one or two subjects. But to clarify that the resolved redaction must be integer-valued, this has been added to the table and text to avoid confusion.

– Can ROAR be used in case-control studies – if so what special considerations would there need to be? And what about other estimands besides a risk ratio? Can you add some discussion about this?

Please see the response to point 2 in relation to case-control studies. Risk ratio was employed as it gives an intuitive metric for experimental interventions that have an apparently significant effect, but one could readily reformulate the entire analysis as an odds ratio or similar. To account for this, I added the following line to the methodology:

“Risk ratio is used in this work for simplicity in gauging the relative impact of an ostensibly significant effect in the experimental arm, and can readily be converted to an odds ratio if preferred.”
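The conversion mentioned in the added line follows the standard identity linking risk ratio and odds ratio given the control-arm risk; the function name and example values below are illustrative, not from the paper.

```python
# Sketch of the standard risk-ratio to odds-ratio identity; names and
# example numbers are illustrative.

def rr_to_or(rr, p0):
    """Convert a risk ratio to an odds ratio, given control-arm risk p0.
    Identity: OR = RR * (1 - p0) / (1 - RR * p0)."""
    return rr * (1 - p0) / (1 - rr * p0)

# For rare outcomes the two measures are close:
# e.g. RR = 0.85 at a 2% control-arm risk
print(rr_to_or(0.85, 0.02))  # ≈ 0.847, near the RR itself
```

As a sanity check, applying the identity to the risk ratio of any 2x2 table (a, b, c, d), with p0 = c/(c+d), reproduces the table's odds ratio ad/(bc) exactly.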

https://doi.org/10.7554/eLife.93050.sa2

Article and author information

Author details

  1. David Robert Grimes

    1. School of Medicine, Trinity College Dublin, Dublin, Ireland
    2. School of Physical Sciences, Dublin City University, Dublin, Ireland
    Contribution
    Conceptualization, Resources, Software, Formal analysis, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration
    For correspondence
    davidrobert.grimes@tcd.ie
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-3140-3278

Funding

Wellcome Trust (214461/A/18/Z)

  • David Robert Grimes

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. For the purpose of Open Access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Acknowledgements

DRG thanks the Wellcome Trust for their support, and Profs Philip Boonstra and Detlef Weigel for their helpful comments.

Senior Editor

  1. Detlef Weigel, Max Planck Institute for Biology Tübingen, Germany

Reviewing Editor

  1. Philip Boonstra, University of Michigan, United States

Reviewer

  1. Philip Boonstra, University of Michigan, United States

Version history

  1. Preprint posted: September 25, 2023 (view preprint)
  2. Received: October 4, 2023
  3. Accepted: January 23, 2024
  4. Accepted Manuscript published: January 29, 2024 (version 1)
  5. Version of Record published: February 16, 2024 (version 2)

Copyright

© 2024, Grimes

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.




    The establishment and spread of antimalarial drug resistance vary drastically across different biogeographic regions. Though most infections occur in sub-Saharan Africa, resistant strains often emerge in low-transmission regions. Existing models on resistance evolution lack consensus on the relationship between transmission intensity and drug resistance, possibly due to overlooking the feedback between antigenic diversity, host immunity, and selection for resistance. To address this, we developed a novel compartmental model that tracks sensitive and resistant parasite strains, as well as the host dynamics of generalized and antigen-specific immunity. Our results show a negative correlation between parasite prevalence and resistance frequency, regardless of resistance cost or efficacy. Validation using chloroquine-resistant marker data supports this trend. Post discontinuation of drugs, resistance remains high in low-diversity, low-transmission regions, while it steadily decreases in high-diversity, high-transmission regions. Our study underscores the critical role of malaria strain diversity in the biogeographic patterns of resistance evolution.