Example of how prediction distributions (PDs; Panel A) combined with prediction intervals (PIs; Panel B) can be used to understand the generality of a population effect size.

We simulated hierarchical data with three sources of variance (between-study variance, within-study variance, and sampling error). In total, k = 100 effect size estimates were nested within 50 studies, with a , and an overall mean population effect µθ of 0.5. The top plot (Panel A) shows the probability of observing an effect from a new study above the meaningful threshold (i.e., the lower bound of the 95% CI around the mean). Red dashed lines mark the upper and lower limits of the 95% PI, and black dashed lines mark the ‘null’ effect. The bottom plot (Panel B) displays the estimated overall mean population effect, its 95% confidence interval (CI), and the 95% PI. The PDs were constructed using a t-distribution with k – 1 degrees of freedom, µθ as the location parameter, and either the total variance or the study-specific variance as the scale parameter.
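The PD construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function names, the variance of 0.09, and the threshold of 0.3 are hypothetical values chosen for the example; only µθ = 0.5 and k = 100 come from the caption. The sketch also assumes that the scale of the t-distribution is the square root of the supplied variance, and it evaluates the t-distribution by simple numerical integration with the standard library rather than a statistics package.

```python
import math


def t_pdf(x, df):
    """Density of the standard Student t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    if x == 0:
        return 0.5
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_ppf(p, df):
    """Quantile function obtained by bisection on the CDF."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prediction_distribution(mu, variance, k, threshold, level=0.95):
    """Return P(new effect > threshold) and the prediction interval.

    Assumption: scale = sqrt(variance), where variance is either the total
    variance or the study-specific variance, as in the caption.
    """
    df = k - 1
    scale = math.sqrt(variance)
    prob_above = 1 - t_cdf((threshold - mu) / scale, df)
    t_crit = t_ppf(1 - (1 - level) / 2, df)
    return prob_above, (mu - t_crit * scale, mu + t_crit * scale)


# Hypothetical inputs: variance and threshold are illustrative only.
prob, (pi_low, pi_high) = prediction_distribution(
    mu=0.5, variance=0.09, k=100, threshold=0.3
)
print(f"P(effect > threshold) = {prob:.3f}")
print(f"95% PI = [{pi_low:.3f}, {pi_high:.3f}]")
```

Passing the study-specific variance instead of the total variance gives the study-level version of the same quantities. With a statistics library available, the two helper functions could be replaced by that library's t-distribution CDF and quantile functions.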

The overall and study-specific generality of 247 meta-analyses with statistically significant overall mean effects.

Generality is measured as 95% prediction intervals (PIs; Panel A) and as the probability of observing an effect from a new study above a practically meaningful threshold (Panel B) at the study level. If a 95% PI excludes the null effect, 95% of effect sizes from future studies in similar contexts are expected to yield statistically significant effects. Prediction distributions (PDs) estimate the probability that an effect exceeds a biologically or practically meaningful threshold (in this case, the lower confidence limit; see the main text for the sensitivity analysis). Each dot in Panel A represents the average population effect size of one meta-analysis, and the whiskers denote its 95% PI. Each dot in Panel B represents the probability that future studies from similar contexts exceed the lower bound of the 95% confidence interval. Note that practically meaningful thresholds can be adjusted according to practitioners’ needs (e.g., the smallest effect size that can trigger important differences 17,18).