Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Read more about eLife’s peer review process.

Editors
- Reviewing Editor: Justin Yeakel, University of California, Merced, Merced, United States of America
- Senior Editor: Meredith Schuman, University of Zurich, Zürich, Switzerland
Joint Public Review:
Summary:
This study combined a simulation approach with a large-scale compilation of published meta-analytic data sets to address the generalizability of meta-analyses. The authors used the prediction interval/distribution as a central tool to evaluate whether a future study is likely to generate a non-zero effect.
Strengths:
Although the concept of prediction intervals is commonly taught in statistics courses, its application in meta-analysis remains relatively rare. The authors' creative use of this concept, combined with the decomposition of heterogeneity, provides a new perspective for meta-analysts to evaluate the generalizability of their findings. As such, I consider this to be a timely and practically valuable development.
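For reference, the prediction interval at issue takes the standard Higgins–Thompson–Spiegelhalter form; extended to a hierarchical model, the heterogeneity term decomposes into study- and paper-level components (the variance-component notation here is illustrative, not taken from the paper):

```latex
\widehat{\mu} \pm t_{k-2}\,\sqrt{\widehat{SE}(\widehat{\mu})^{2} + \widehat{\tau}^{2}_{\text{paper}} + \widehat{\tau}^{2}_{\text{study}}}
```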
Weaknesses:
First, in their re-analysis of the compiled meta-analytic data to assess generalizability, the authors used a hierarchical model with only the intercept as a fixed effect. In practice, many meta-analyses include moderators in their models. Ignoring these moderators could result in attributing heterogeneity to unexplained variation at the study or paper level, depending on whether the moderators vary across studies or papers. As a consequence, the prediction interval may be inaccurately wide or narrow, leading to an erroneous assessment of the generalizability of results derived from large meta-analytic data sets. A more accurate approach would be to include the same moderators as in the original meta-analyses and generate prediction intervals that reflect the effects of these moderators.
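This concern can be illustrated with a small simulation (all parameter values are hypothetical; this is a sketch of the variance bookkeeping, not the authors' model). When a moderator that varies across studies is omitted, its variance is absorbed into the estimated heterogeneity, widening the prediction interval relative to a meta-regression that conditions on the moderator:

```python
import numpy as np
from scipy import stats

# Illustrative simulation: a study-level moderator inflates apparent
# heterogeneity when omitted from the model, widening the prediction interval.
rng = np.random.default_rng(1)
k = 200                                   # hypothetical number of studies
beta0, beta1 = 0.3, 1.0                   # hypothetical intercept and moderator slope
x = rng.uniform(0, 1, k)                  # study-level moderator
u = rng.normal(0, np.sqrt(0.02), k)       # residual between-study effects
theta = beta0 + beta1 * x + u             # true study effects

z = stats.norm.ppf(0.975)                 # normal approximation, known variances
# Intercept-only model: moderator variance is counted as "heterogeneity"
pi_half_ignore = z * np.sqrt(np.var(theta))
# Meta-regression view: only residual heterogeneity enters the conditional PI
pi_half_with_mod = z * np.sqrt(np.var(theta - beta1 * x))
print(pi_half_ignore, pi_half_with_mod)   # the first half-width is larger
```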
Second, the authors used a t-distribution to generate the prediction intervals and distributions for the hierarchical meta-analysis model. While the t-distribution is exact for prediction intervals in linear models, it is not strictly appropriate for models with random effects. This discrepancy arises because the variances of the random effects must be estimated from the data, and a t-distribution does not account for the uncertainty in estimating these variance components. Unless the data are perfectly balanced (i.e., all random effects are nested and sample sizes within each level of the random factor are equal), it is well established that t-distribution-based (or, equivalently, F-distribution-based) hypothesis tests and confidence/prediction intervals are typically anti-conservative. As recommended in the linear mixed models literature, bootstrapping methods or some form of degrees-of-freedom correction would be more appropriate for generating prediction intervals in this context.
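The contrast between the two approaches can be sketched as follows, comparing a t-based prediction interval with a simple nonparametric bootstrap over studies that propagates the uncertainty in the heterogeneity estimate. The DerSimonian–Laird estimator stands in here for the authors' hierarchical model, and all data are simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def dl_fit(y, v):
    """DerSimonian-Laird random-effects fit: returns (mu_hat, se_mu, tau2_hat)."""
    w = 1.0 / v
    mu_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fe) ** 2)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    w_re = 1.0 / (v + tau2)
    mu = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return mu, se, tau2

# Simulated meta-analytic data set (hypothetical values)
k = 20
v = rng.uniform(0.01, 0.05, k)                       # within-study variances
y = 0.3 + rng.normal(0, np.sqrt(0.04), k) + rng.normal(0, np.sqrt(v))

mu, se, tau2 = dl_fit(y, v)

# t-based PI: treats the estimated variance components as known
t_crit = stats.t.ppf(0.975, df=k - 2)
half = t_crit * np.sqrt(se**2 + tau2)
pi_t = (mu - half, mu + half)

# Bootstrap PI: resample studies, refit, then draw a new study's true effect,
# so uncertainty in tau2 is carried into the interval
draws = []
for _ in range(2000):
    idx = rng.integers(0, k, k)                      # resample studies with replacement
    mu_b, _, tau2_b = dl_fit(y[idx], v[idx])
    draws.append(rng.normal(mu_b, np.sqrt(tau2_b)))  # predicted new true effect
pi_boot = (np.percentile(draws, 2.5), np.percentile(draws, 97.5))
print(pi_t, pi_boot)
```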
Finally, the authors define generalizability as the likelihood that a future study will yield a significantly non-zero effect. While this is certainly useful information, it is not necessarily the primary concern of many meta-analyses or individual studies. Many studies aim to understand the mean response or effect within a specific context; for such questions, the concern is not whether a future study will generate a significant finding, but whether the true mean response differs from zero. In this regard, the authors may have overstated the importance of knowing the outcome of a single future study, and framing this as the sole goal of research seems somewhat misguided.