Meta-Research: Dataset decay and the problem of sequential analyses on open datasets

  1. William Hedley Thompson (corresponding author)
  2. Jessey Wright
  3. Patrick G Bissett
  4. Russell A Poldrack
  1. Department of Psychology, Stanford University, United States
  2. Department of Clinical Neuroscience, Karolinska Institutet, Sweden
  3. Department of Philosophy, Stanford University, United States
3 figures, 1 table and 2 additional files

Figures

Correction procedures can reduce the probability of false positives.

(A) The probability of there being at least one false positive (y-axis) increases as the number of statistical tests increases (x-axis). The use of a correction procedure reduces the probability of there being at least one false positive (B: α-debt; C: α-spending; D: α-investing). Plots are based on simulations: see main text for details. Dotted line in each panel indicates a probability of 0.05.
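The growth described for panel A can be illustrated with a closed-form calculation. This is a minimal sketch, not the simulation behind the figure: it assumes every test is an independent test of a true null hypothesis at the same threshold, and the function names are hypothetical. The Bonferroni variant is included only as a familiar simultaneous correction for comparison.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) over n independent tests of true nulls.

    Assumption: tests are independent and each uses the same threshold alpha.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Same quantity when each test uses the Bonferroni threshold alpha / n."""
    return 1.0 - (1.0 - alpha / n_tests) ** n_tests


if __name__ == "__main__":
    for n in (1, 10, 100):
        uncorrected = familywise_error_rate(n)
        corrected = bonferroni_familywise_error_rate(n)
        print(f"{n:>3} tests: uncorrected={uncorrected:.3f}, Bonferroni={corrected:.3f}")
```

Under these assumptions the uncorrected probability rises quickly towards 1 as tests accumulate, while the corrected probability stays at or below 0.05.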

The order of sequential tests can impact true positive sensitivity.

(A) The true positive rate in the uncorrected case (left-most panel), in two cases of simultaneous correction (second and third panels), and in three cases of sequential correction (fourth, fifth and sixth panels). In each panel the true positive rate after 100 tests is plotted as a function of two simulation parameters: λ (x-axis) and the simulated covariance of the true positives (y-axis). When λ is positive (negative), it increases the probability of the true positives being an earlier (later) test. Plots are based on simulations in which there are ten true positives in the data: see main text for details. (B) Same as A for the false positive rate. (C) Same as A for the false discovery rate. (D) Same as C for the average false discovery rate in four quadrants. Q1 has λ < 0; covariance > 0.25. Q2 has λ > 0; covariance > 0.25. Q3 has λ < 0; covariance < 0.25. Q4 has λ > 0; covariance < 0.25. The probability of true positives being an earlier test is highest in Q2 and Q4 as λ > 0 in these quadrants. (E) Same as D with the false discovery rate (y-axis) plotted against the percentage of true positives (x-axis) for the four quadrants. The dotted lines in D and E indicate a false discovery rate of 0.05. Code is available at https://github.com/wiheto/datasetdecay (Thompson, 2020; copy archived at https://github.com/elifesciences-publications/datasetdecay).
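The three quantities plotted in panels A to C can be computed from the outcome of a batch of tests as sketched below. The definitions used here are the conventional ones and are an assumption: the exact definitions used for the figure are given in the main text, not in this caption. The function name `error_rates` and the example counts are hypothetical.

```python
from typing import Dict, Sequence


def error_rates(is_true_effect: Sequence[bool], is_rejected: Sequence[bool]) -> Dict[str, float]:
    """Conventional true positive, false positive and false discovery rates.

    is_true_effect[i] is True when test i has a real effect;
    is_rejected[i] is True when the null hypothesis of test i was rejected.
    """
    pairs = list(zip(is_true_effect, is_rejected))
    true_pos = sum(1 for effect, rejected in pairs if effect and rejected)
    false_pos = sum(1 for effect, rejected in pairs if not effect and rejected)
    n_effects = sum(1 for effect, _ in pairs if effect)
    n_nulls = len(pairs) - n_effects
    n_rejected = true_pos + false_pos
    return {
        # Share of real effects that were detected.
        "true_positive_rate": true_pos / n_effects if n_effects else 0.0,
        # Share of true nulls that were wrongly rejected.
        "false_positive_rate": false_pos / n_nulls if n_nulls else 0.0,
        # Share of rejections that were wrong.
        "false_discovery_rate": false_pos / n_rejected if n_rejected else 0.0,
    }


if __name__ == "__main__":
    # Illustrative counts only: 100 tests with ten real effects, as in the caption,
    # of which eight are detected, plus two false rejections among the 90 nulls.
    truth = [True] * 10 + [False] * 90
    rejected = [True] * 8 + [False] * 2 + [True] * 2 + [False] * 88
    print(error_rates(truth, rejected))
```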

Demonstrating the impact of different correction procedures with a real dataset.

(A) The number of significant statistical tests (x-axis) that are possible for various correction procedures in a real dataset from the Human Connectome Project: see the main text for more details, Supplementary file 1 for a list of the variables used in the analysis, and https://github.com/wiheto/datasetdecay (copy archived at https://github.com/elifesciences-publications/datasetdecay) for the code. (B) The potential number of publications (x-axis) that could result from the tests shown in panel A. This assumes that a publication requires a null hypothesis to be rejected in order to yield a positive finding. The dotted line shows the baseline from the two simultaneous correction procedures. Error bars show the standard deviation, and circles mark the minimum and maximum number of findings or studies for the sequential correction procedures with a randomly permuted test order.

Tables

Table 1
Summary of the different sequential correction methods and the open-data desiderata.

Yes indicates that the method is compatible with the desideratum.

               Sharing incentive    Open access    Stable false positive rate
α-spending     No                   No             Yes
α-investing    Yes                  No             Yes
α-debt         Yes                  Yes            No
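The "Stable false positive rate" column can be made concrete by comparing per-test thresholds. The sketch below rests on assumptions that this page does not confirm: α-debt is taken to use a threshold of α/t at the t-th test, and α-spending is taken to spend a fixed fraction of the remaining α budget at each test. Both schedules and both function names are hypothetical illustrations, and α-investing is omitted because its rule depends on the outcomes of earlier tests.

```python
from typing import List


def alpha_debt_thresholds(n_tests: int, alpha: float = 0.05) -> List[float]:
    """Assumed alpha-debt schedule: the t-th test uses threshold alpha / t."""
    return [alpha / t for t in range(1, n_tests + 1)]


def alpha_spending_thresholds(
    n_tests: int, alpha: float = 0.05, spend_fraction: float = 0.5
) -> List[float]:
    """Assumed alpha-spending schedule: spend a fixed fraction of what is left."""
    wealth = alpha
    thresholds = []
    for _ in range(n_tests):
        spent = wealth * spend_fraction
        thresholds.append(spent)
        wealth -= spent  # the remaining budget shrinks and is never replenished
    return thresholds


if __name__ == "__main__":
    n = 10
    # The sum of thresholds is a union bound on P(at least one false positive).
    print("alpha-debt bound:    ", round(sum(alpha_debt_thresholds(n)), 4))
    print("alpha-spending bound:", round(sum(alpha_spending_thresholds(n)), 4))
```

Under these assumed schedules the summed α-spending thresholds never exceed the initial α, whereas the summed α-debt thresholds keep growing with each new test. The α-spending thresholds also shrink rapidly, which leaves later analysts very little budget.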

Additional files

Cite this article

William Hedley Thompson, Jessey Wright, Patrick G Bissett, Russell A Poldrack (2020)
Meta-Research: Dataset decay and the problem of sequential analyses on open datasets
eLife 9:e53498.
https://doi.org/10.7554/eLife.53498