Abstract
Meta-analysis is a vital component of clinical decision making, but previous work has found that binary-event meta-analytic results can be fragile, affected by only a small number of patients in specific trials. Meta-analyses can also miss relevant literature, so a method for estimating how much additional unseen data would flip a result would be a useful tool. This work establishes a complementary and generalisable definition of meta-analytic fragility, based on the Ellipse of Insignificance (EOI) and Region of Attainable Redaction (ROAR) methods originally developed for dichotomous outcome trials. The method does not require trial-specific alterations to estimate fragility, and yields a general means of estimating the robustness of a meta-analysis to data redaction or to the addition of hypothetical trial outcomes. It is applied here to 3 meta-analyses with conflicting findings on the association between vitamin D supplementation and cancer mortality. A full meta-analysis of all trials cited across the 3 meta-analyses yielded no association between vitamin D supplementation and cancer mortality. Using the method outlined here, meta-analytic fragility was found to be high in all cases: recoding of just 5 patients in the full cohort of 133,262 patients was enough to cross the significance threshold. Small amounts of redacted or non-included data also had substantial impact on each meta-analysis, with the addition of just 3 hypothetical patients to an ostensibly significant meta-analysis (N = 38,538) enough to yield a null result. This method for analytical fragility is complementary to previous investigations suggesting meta-analyses are frequently fragile, and it further shows that merely increasing the sample size is no assurance against fragility. Caution is advised when interpreting the results of meta-analyses, as conflicting results may stem from inherent fragility.
Introduction
Meta-analyses are critical for synthesising medical evidence and providing robust estimates of treatment effects (Murad et al., 2014; Dechartres et al., 2013; Nordmann et al., 2012). However, there are concerns that meta-analyses are frequently improperly conducted, and in recent years there have been methodological concerns over the rigour of an increasing number of these publications (Lawrence et al., 2021; Siemens et al., 2021; Hameed et al., 2020), particularly when constituent Randomized Controlled Trials (RCTs) have data integrity issues which meta-analysis cannot solve (Wilkinson et al., 2024, 2025). There is an additional consideration that deserves mention in dichotomous or binary outcome trials, which is the issue of fragility. In RCTs, many trial results are fragile, with recoding of a small number of events to non-events or vice versa in either the experimental or control arm often sufficient to convert a seemingly statistically significant result to a null, and vice versa (Grimes, 2022; Tignanelli and Napolitano, 2019; Baer et al., 2021a,b).
While this issue has generated much recent discussion for RCTs, it has received little consideration for meta-analyses, which are often thought of as more robust to such manipulations. This, however, may not be the case. Previously, Atal et al (Atal et al., 2019) suggested meta-analyses may be more fragile than might be presumed in binary outcome cases. In that work, they established an algorithm to find the minimum re-coding of patients between studies that would engineer a seemingly significant result from a null one, or vice versa. Applying this to a large sample of meta-analyses, Atal et al found that many were inherently fragile. This is a powerful method, but one possible objection is its non-generalisability: it finds very specific combinations of trials and patients that would have to be re-coded. For example, an Atal meta-analytic fragility of 4 pertains to a specific and often unique circumstance in which 4 patients could be recoded in a specific study or combination of studies to change the output, but this does not generalise to any 4 patients in that meta-analysis. This makes this definition of meta-analytic fragility useful but not generalisable, and less intuitive to interpret than a typical RCT fragility metric.
In this work, we establish a generalisable meta-analytic fragility metric, based upon Ellipse of Insignificance (EOI) analysis for dichotomous outcome trials (Grimes, 2022). EOI is a refined, analytical fragility metric that simultaneously finds the degree of fragility in both the control and experimental arms of dichotomous outcome trials. It derives its name from a geometrical property of the chi-squared test: when results are mapped on a plane where one axis corresponds to recodings in the experimental arm and the perpendicular axis to recodings in the control arm, any seemingly significant result has an associated ellipse inside which all results are null. EOI analysis analytically finds and resolves the minimum displacement (the Fewest Experimental/Control Knowingly Uncoded Participants (FECKUP) vector) from this region, reporting fragility as a proportion of group sizes in context. As it is analytical and based upon chi-squared testing, it handles samples of all sizes, reporting a readily interpreted, contextualised measure.
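The ellipse idea can be illustrated with a small brute-force sketch. The Python below is not the analytical EOI method, which solves for the minimum displacement directly; it simply searches outward over integer recodings in both arms of an invented 2×2 trial until significance (here, the uncorrected chi-squared test at α = 0.05) is lost. All counts are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    # Pearson chi-square statistic for a 2x2 table (no continuity correction)
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / den if den else 0.0

CRIT = 3.8415  # chi-square critical value for df = 1, alpha = 0.05

def fragility(a, b, c, d, max_moves=100):
    """Smallest total number of event/non-event recodings, split between the
    experimental arm (x) and control arm (y), that renders a significant
    2x2 result non-significant: a discrete brute-force analogue of the
    continuous FECKUP displacement."""
    if chi2_2x2(a, b, c, d) < CRIT:
        return 0  # already non-significant
    for total in range(1, max_moves + 1):
        for x in range(-total, total + 1):
            for y in {abs(x) - total, total - abs(x)}:
                a2, b2 = a + x, b - x  # recode events <-> non-events, arm 1
                c2, d2 = c + y, d - y  # recode events <-> non-events, arm 2
                if min(a2, b2, c2, d2) >= 0 and chi2_2x2(a2, b2, c2, d2) < CRIT:
                    return total
    return None

# Hypothetical trial: 30/1000 events vs 60/1000 events
print(fragility(30, 970, 60, 940))  # → 11
```

Any point with total recodings below the returned value lies inside the null ellipse's complement in this discrete picture; the analytical EOI method recovers the same boundary without enumeration.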
Intimately related to this is Region of Attainable Redaction (ROAR) analysis (Grimes, 2024b), a methodology for ascertaining the potential influence of redacted data on reported results. There are many reasons why relevant data might be missing from an analysis, ranging from simple absent data, missed studies and inappropriate cut-offs to deliberate cherry-picking (Grimes and Heathers, 2021). This is highly relevant to meta-analysis, and it would be useful to have a method to quantify the potential impact of redaction on reported results and to give an additional estimate of how robust reported results may be as new data accrue. Accordingly, in this work we extend these methodologies to handle meta-analysis, contrast them with Atal et al's metric, and apply them to conflicting vitamin D meta-analyses to illustrate their epidemiological utility. This package of meta-analytic fragility tools we name EOIMETA for brevity.
Methods
Derivation of Meta-analytic fragility and redaction analysis (EOIMETA)
For both EOI and ROAR, full mathematical details have been previously published (Grimes, 2022, 2024b) and an online implementation is available at https://drg85.shinyapps.io/EOIROAR/ for any dichotomous outcome trial. Extending this to the meta-analytic situation, consider $i$ studies, each with an experimental group with $a_i$ events and $b_i$ non-events, and a control group with $c_i$ events and $d_i$ non-events. For any given meta-analysis, consider the crude unadjusted pooled risk ratio ($RR_p$) and the Cochran–Mantel–Haenszel risk ratio ($RR_{CMH}$), which accounts for confounding effects between studies:

\[ RR_p = \frac{\sum_i a_i \,/\, \sum_i (a_i + b_i)}{\sum_i c_i \,/\, \sum_i (c_i + d_i)} \]

\[ RR_{CMH} = \frac{\sum_i a_i (c_i + d_i)/n_i}{\sum_i c_i (a_i + b_i)/n_i}, \qquad n_i = a_i + b_i + c_i + d_i \]
where a continuity correction of 0.5 may be applied for any zero values. The weight of each trial is $w_i = SE(\ln RR_i)^{-1}$ and the total pooled risk ratio is given by

\[ \ln RR = \frac{\sum_i w_i \ln RR_i}{\sum_i w_i} \]
Once the pooled risk ratio is known, we can scale the event and non-event rates in the experimental group to match the adjusted relative risk while keeping the total number in the experimental arm constant. Letting $a = \sum_i a_i$, $b = \sum_i b_i$, $c = \sum_i c_i$ and $d = \sum_i d_i$, we allow $a^* = a + x$ and $b^* = b - x$, where $x$ may be positive or negative, to adjust the experimental group to the derived risk ratio. We can solve for this value to obtain

\[ x = RR \, (a + b) \, \frac{c}{c + d} \, - \, a \]

where $RR$ is the adjusted pooled risk ratio.
For consistency, the values of a* and b* are rounded to integers. This modification accounts for study-level heterogeneity, and a standard EOI analysis can then be applied to the vector (a*, b*, c, d). In addition, we can also apply ROAR analysis to the same vector, yielding the raw number of patients in either or both arms who could be added in a given direction to change the result, and the exact combination of control and experimental group redactions required to change the result from a significant finding to a null one. Caveats for implementation and interpretation are outlined in the discussion section.
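The pooling step above can be sketched in a few lines. The following Python fragment is an illustrative reimplementation, not the published R package; it uses the CMH estimate as the adjusted risk ratio for simplicity, and the 0.5 continuity correction for zero cells is noted but not implemented. The two input studies are invented.

```python
def eoimeta_table(studies):
    """Collapse per-study 2x2 counts (a_i, b_i, c_i, d_i) into a single
    synthetic table (a*, b*, c, d) whose risk ratio matches the adjusted
    pooled estimate (here the CMH risk ratio, for simplicity).
    Zero cells would require a 0.5 continuity correction (not shown)."""
    a = sum(s[0] for s in studies)
    b = sum(s[1] for s in studies)
    c = sum(s[2] for s in studies)
    d = sum(s[3] for s in studies)
    # Cochran-Mantel-Haenszel pooled risk ratio
    num = sum(ai * (ci + di) / (ai + bi + ci + di) for ai, bi, ci, di in studies)
    den = sum(ci * (ai + bi) / (ai + bi + ci + di) for ai, bi, ci, di in studies)
    rr_cmh = num / den
    # Crude unadjusted pooled risk ratio, used to gauge confounding
    rr_p = (a / (a + b)) / (c / (c + d))
    # Shift x experimental events so the collapsed table reproduces rr_cmh
    # while keeping the arm total a + b fixed: x = RR*(a+b)*c/(c+d) - a
    x = rr_cmh * (a + b) * c / (c + d) - a
    return round(a + x), round(b - x), c, d, rr_p, rr_cmh

# Two hypothetical studies: (events, non-events) per arm
studies = [(10, 90, 20, 80), (5, 95, 10, 90)]
print(eoimeta_table(studies))  # → (15, 185, 30, 170, 0.5, 0.5)
```

In this example the event rates are identical across strata, so the crude and CMH estimates agree and x is zero; when they differ, the experimental cell counts shift accordingly before EOI/ROAR analysis is applied.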
Application to vitamin D studies
Results on vitamin D supplementation and cancer mortality have been markedly inconsistent. In recent years, several meta-analyses of published trials yielded markedly conflicting results. A review and meta-analysis by Zhang et al (Zhang et al., 2019) and its correction (Zhang et al., 2020) of 5 RCTs involving 39,168 patients found a significant association between vitamin D supplementation and reduced cancer mortality, reporting a 15% decrease in the risk of cancer death with vitamin D supplementation. Similarly, a meta-analysis by Guo et al (Guo et al., 2023) of 11 RCTs involving 112,165 patients found a significant reduction in the relative risk of cancer death with supplementation of 12%. By contrast, a subsequent study of 6 RCTs involving 61,853 patients (Zhang et al., 2022) found no significant reduction in cancer death risk with vitamin D supplementation. All 3 meta-analyses involved combinations drawn from just 12 trials.
To investigate this apparent case of meta-analytic fragility, we extracted data from all 12 studies included in the 3 meta-analyses. A small inconsistency between the 3 meta-analyses was found in relation to one RCT (Scragg et al., 2018): in Zhang et al, the rates of cancer deaths in the supplementation and control groups were reported as 44/2558 and 45/2550 respectively, versus 28/2558 and 30/2558 in Guo et al, and 30/2550 and 30/2550 in Zheng et al. The disagreement between these figures appears to stem from cancers detected in both groups prior to the randomisation process. Accordingly, the adjusted ITT figures used in this work are 30/2544 and 30/2535 respectively for Scragg et al's work. A further nuance is that for another small RCT (Ammann et al., 2017), direct data on the number of cases were not given; these can, however, be estimated with minimal assumptions from the hazard ratio. For all included studies, relative risks and 95% confidence intervals were calculated explicitly, except for Lehouck et al (Lehouck et al., 2012), where zero recorded deaths in the supplementation sample required confidence intervals to be estimated by bootstrap methods. Initially, we ran fragility analysis on all of these meta-analyses, with the new metric outlined here as well as Atal et al's algorithm. We then combined the data and performed a meta-analysis of all 12 studies to ascertain the potential effect of vitamin D on cancer mortality. A full fragility and robustness analysis was then performed on this, and the results discussed to elucidate the findings.
Meta-analytic fragility package
We developed a custom R package (Grimes, 2025) (eoirroar, available at https://github.com/drg85/EOIMETA and on Zenodo) to perform the fragility and redaction analyses described in the methods, as well as Atal et al's study-specific algorithmic method for comparison.
Results
Meta-analytic fragility estimation of Vitamin D Cancer Mortality studies
For all conflicting meta-analyses, the unadjusted crude pooled risk ratio and the CMH risk ratio were calculated prior to the analysis being conducted. For the three meta-analyses, confounding ranged from 0.08% to 1.1%, negligible differences which suggest the meta-analytic approach outlined in this work is applicable. Table 1 gives the results for the EOI and ROAR meta-analyses as well as the Atal et al method. Figure 1 depicts the EOI analysis results for the meta-analyses by Zhang et al (including 5 RCTs, 38,538 patients) and Guo et al (11 RCTs, 111,952 patients).

Application of meta-analytic fragility / redaction sensitivity to conflicting meta-analyses
Figure 1 gives the EOI meta-analytic fragility vectors for both positive meta-analyses, showing the degree of recoding in both experimental and control groups required to flip results. The negative values on both axes arise because the relative risk in the experimental group (vitamin D supplementation) is ostensibly lower than in the placebo group, so in the convention of EOI analysis, negative values imply addition of events to the experimental group and/or subtraction of events from the control. In the case of Zhang et al, it would take recoding of just 4 events / non-events in either the experimental or control arm, or 4 recodings in total in combination (< 0.01% of the total sample), to lose significance, even lower than the fragility estimated by Atal et al's algorithm. In the case of Guo et al, recoding of 38 events / non-events (< 0.038% of the sample) achieves the same result, whereas an even more extreme specific fragility is detected by Atal et al's algorithm.

Meta-analytic EOI fragility analysis results for (a) Zhang et al and (b) Guo et al.
See text for details.
The redaction analysis is also informative, because even if there were no coding or data errors in any study or their synthesis, the addition of small numbers of patients can profoundly change the result. In the case of Zhang et al, adding just 3 patients to the cohort would be enough to lose significance, while for Guo et al it is 37. Despite Zheng et al having substantially fewer patients than Guo et al, it appears more robust by all fragility metrics. This in turn leads to an important observation: the fact that a meta-analysis has more patients and studies does not inherently make it more robust or less fragile, as discussed in the next section.
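To make the addition scenario concrete, the sketch below (invented counts, not the published ROAR implementation, which is analytical and considers both arms) asks how many hypothetical extra patients, allocated in the least favourable way as experimental-arm events for an ostensibly protective effect, would push a significant pooled table below the chi-squared threshold.

```python
def chi2_2x2(a, b, c, d):
    # Pearson chi-square statistic for a 2x2 table (no continuity correction)
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / den if den else 0.0

def additions_to_null(a, b, c, d, crit=3.8415, max_add=10_000):
    """Smallest number of hypothetical added patients, all allocated as
    experimental-arm events (worst case for a protective effect), needed
    to push the chi-square statistic below the significance threshold."""
    for k in range(max_add + 1):
        if chi2_2x2(a + k, b, c, d) < crit:
            return k
    return None

# Hypothetical pooled table: 30/1000 vs 60/1000 events (significant at p < 0.05)
print(additions_to_null(30, 970, 60, 940))  # → 12
```

The one-directional allocation here is a simplification; splitting additions across arms, as ROAR does, generally finds a smaller flipping set.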
Meta-analysis of all studies
Figure 2 depicts the forest plot of the meta-analysis of all 12 RCTs (133,475 patients) using a random effects model, while Table 2 gives the summary meta-analytic fragility statistics when all 12 studies are combined. Both the EOI fragility and Atal et al's fragility actually increase despite the increase in the number of subjects, yet even with this large sample, the recoding of just 5 non-events to events, in this case in the placebo arm (additional deaths in the non-vitamin-D group), would alter the results to cross the threshold of significance (p = 0.048) and likely change inferences.

Meta-analysis of all 12 RCTS including study details and relative risk of vitamin D supplementation on cancer mortality

Meta-analytic fragility of all 12 studies
Discussion
The meta-analytic fragility tool presented here, EOIMETA, is generalisable and flexible, complementing Atal et al's study- and patient-specific method. The nuance between the two approaches is that Atal et al's algorithm identifies a specific combination of patients from specific studies that may be moved to alter results, whereas the current work establishes a method for pooling meta-analyses to ascertain their group fragility and the potential impact of missing data. In this respect, they answer complementary and subtly different questions specific to meta-analyses.
In the RCT literature, a frequent objection to fragility analysis methods is that a small fragility index might be an artefact of trial design. For clinical trials especially, a well-designed trial should be structured so as to minimise patient exposure to unknown harms, striving to ensure that just enough participants are enrolled to allow the detection of clinically relevant effects. The argument that RCTs might be fragile 'by design' is, however, convincingly countered by other authors, who find no evidence of p-value distributions clustering around significance thresholds after a sample size calculation, and who additionally find that fragility in well-designed studies is not always low. More recent robust methods for fragility analysis like EOI, and for redaction like ROAR, also have application far beyond ostensibly fragile-by-design RCTs, extending to cohort studies, preclinical work, and even ecological studies. The fragile-by-design argument is also a weaker one in the context of meta-analysis, where the marshalling of many studies to increase the precision of effect estimates should strengthen the evidence; yet results here and previously suggest that even meta-analyses of large groups of patients can be profoundly fragile.
There are a few subtle issues and limitations to consider. Atal et al's meta-analytic fragility metric in principle identifies the precise combination of inter-study and patient recoding required to flip a result, a "worst-case" scenario identifying the most fragile point in a meta-analysis. This usually results in the Atal et al fragility metric being smaller than the complementary EOI fragility metric outlined here, but this is not always the case: because Atal et al's method is a greedy algorithm, it is time-consuming to run on large collections of studies and can sometimes miss optimal, smaller solutions. Secondly, the adjustment of the events and non-events in the experimental group by weighted inverse-variance methods in the EOI fragility method subtly alters group composition, which can result in a smaller EOI fragility than a crude pooled estimate would give. What is evident from the examples shown in this work is that even in large cohorts with relatively low heterogeneity, recoding only minuscule numbers of patients can drastically alter interpretation; in the 12-study meta-analysis, recoding a mere 5 patients in 133,262 was sufficient to alter the result from a null to a positive finding.
While the method outlined here is analytic and rapid, it has some limitations that must be considered. Like Atal's method, it applies only to meta-analyses of trials with dichotomous outcomes and is not suitable for continuous outcomes. It is also appropriate only in contexts where internal confounding is low, and it may not be reliable in the presence of high between-study heterogeneity. Accordingly, careful application is required. At its most essential, this implementation reflects the meta-analytic relative risk by generating a synthetic pooled 2×2 table that preserves the total sample size in each arm. This can be interpreted as the best single-study approximation of all studies under fixed-effect assumptions. The resulting meta-analytic fragility metric is useful for modelling perturbations in results, such as the addition of patients (who were previously redacted) or the recoding of events and non-events. The advantage of this approach lies in its tractability, offering a simpler alternative to working with multiple studies individually and a deterministic, analytic fragility metric. However, it is important to emphasise that it does not correspond to true patient-level pooling or standardisation.
The use of vitamin D meta-analyses in this work was illustrative rather than specific, but it is worth noting that there are methodological concerns with much vitamin D research (Grimes et al., 2024). This goes beyond the scope of the current work, but it serves as an example of the reality that a meta-analysis is only as strong as its underlying data, and the conclusions drawn from it must always be seen in context. While meta-analysis is a powerful method for refining effect size estimates, it cannot overcome poorly conducted research or bad data (Jané et al., 2025), nor is it intrinsically robust. These limitations should be kept in mind when considering meta-analytic results, and scientists should consider them doubly when opting to perform a meta-analysis. Mass-produced and unreliable meta-analyses are a recognised and growing problem (Ioannidis, 2016), and we should be mindful not to add to the increasing issue of research waste and unreliable research (Glasziou and Chalmers, 2018; Grimes, 2024a).
Data Availability
All relevant code for this undertaking, including the study data from the included trials, is available at the linked GitHub repository and via the Zenodo DOI.
Additional information
Funding statement
No external funding received.
References
- 1. Incidence of hematologic malignancy and cause-specific mortality in the Women's Health Initiative randomized controlled trial of calcium and vitamin D supplementation. Cancer 123:4168–4177.
- 2. The statistical significance of meta-analyses is frequently fragile: definition of a fragility index for meta-analyses. Journal of Clinical Epidemiology 111:32–40.
- 3. Fragility indices for only sufficiently likely modifications. Proceedings of the National Academy of Sciences 118:e2105254118.
- 4. The fragility index can be used for sample size calculations in clinical trials. Journal of Clinical Epidemiology 139:199–209.
- 5. Influence of trial sample size on treatment effect estimates: meta-epidemiological study. BMJ 346.
- 6. Research waste is still a scandal—an essay by Paul Glasziou and Iain Chalmers. BMJ 363.
- 7. The ellipse of insignificance, a refined fragility index for ascertaining robustness of results in dichotomous outcome trials. eLife 11:e79573. https://doi.org/10.7554/eLife.79573
- 8. Is biomedical research self-correcting? Modelling insights on the persistence of spurious science. Royal Society Open Science 11:231056.
- 9. Region of Attainable Redaction, an extension of Ellipse of Insignificance analysis for gauging impacts of data redaction in dichotomous outcome trials. eLife 13:e93050. https://doi.org/10.7554/eLife.93050
- 10. EOIMETA: EOIMETA V1 R code (eoimeta). Zenodo. https://doi.org/10.5281/zenodo.15878923
- 11. The new normal? Redaction bias in biomedical science. Royal Society Open Science 8:211308.
- 12. Arbitrary vitamin D deficiency thresholds yield unreliable and potentially spurious results – a review and investigation. Open Science Framework. https://doi.org/10.31219/osf.io/su6cx
- 13. Association between vitamin D supplementation and cancer incidence and mortality: A trial sequential meta-analysis of randomized controlled trials. Critical Reviews in Food Science and Nutrition 63:8428–8442.
- 14. An assessment of the quality of current clinical meta-analyses. BMC Medical Research Methodology 20:105.
- 15. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. The Milbank Quarterly 94:485–514.
- 16. Major Flaws in Taylor et al.'s (2025) Meta-analysis on Fluoride Exposure and Children's IQ Scores. Open Science Framework. https://doi.org/10.31219/osf.io/zhm54_v3
- 17. The lesson of ivermectin: meta-analyses based on summary data alone are inherently unreliable. Nature Medicine 27:1853–1854.
- 18. High doses of vitamin D to reduce exacerbations in chronic obstructive pulmonary disease: a randomized trial. Annals of Internal Medicine 156:105–114.
- 19. How to read a systematic review and meta-analysis and apply the results to patient care: users' guides to the medical literature. JAMA 312:171–179.
- 20. Meta-analyses: what they can and cannot do. Swiss Medical Weekly 142:w13518.
- 21. Monthly high-dose vitamin D supplementation and cancer risk: a post hoc analysis of the vitamin D assessment randomized clinical trial. JAMA Oncology 4:e182178.
- 22. Methodological quality was critically low in 9/10 systematic reviews in advanced cancer patients—A methodological study. Journal of Clinical Epidemiology 136:84–95.
- 23. The fragility index in randomized clinical trials as a means of optimizing patient care. JAMA Surgery 154:74–79.
- 24. Protocol for the development of a tool (INSPECT-SR) to identify problematic randomised controlled trials in systematic reviews of health interventions. BMJ Open 14:e084164.
- 25. Assessing the feasibility and impact of clinical trial trustworthiness checks via an application to Cochrane Reviews: Stage 2 of the INSPECT-SR project. Journal of Clinical Epidemiology:111824.
- 26. Association between vitamin D supplementation and cancer mortality: a systematic review and meta-analysis. Cancers 14:3717.
- 27. Association between vitamin D supplementation and mortality: systematic review and meta-analysis. BMJ 366.
- 28. Correction: Association between vitamin D supplementation and mortality: systematic review and meta-analysis. BMJ 370.
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.108693. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, David Robert Grimes
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.