Advantages and disadvantages of different approaches to data collection for literature mining. The first two approaches are not scalable, while the third is not accessible to researchers with less technical expertise. We have aimed to make our approach as scalable, reproducible, and accessible as possible.

Our suggested workflow and the litmining ecosystem of tools for efficient, reproducible meta-research. Our tool pubget performs the tasks of collecting documents and extracting their content; the resulting corpus can be stored in a dedicated OSF project. Our tools labelbuddy, pubget, and pubextract can then be used to extract information manually (labelbuddy) or automatically (pubget and pubextract). We maintain an open repository of labelbuddy annotations, where researchers can re-use, update, and add new annotations. For the data-analysis step, each project has its own code, which we hope is tracked and shared in its own repository on GitHub or elsewhere.
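To make the collection step concrete, the sketch below shows one way a corpus could be gathered with pubget and prepared for annotation. It is a minimal illustration, assuming the `pubget run` subcommand with a `-q` query option and a `--labelbuddy` flag for producing labelbuddy-ready documents, as we recall them from the pubget documentation; check `pubget run --help` for the exact interface of your version.

```python
# Minimal sketch of the collection step: download and extract a corpus with
# pubget and (optionally) prepare the documents for annotation in labelbuddy.
# The "--labelbuddy" flag and the query below are illustrative assumptions;
# check `pubget run --help` for the options available in your version.
import subprocess

query = "fMRI[Abstract] AND aphasia[Title]"  # example PubMed Central query
subprocess.run(
    [
        "pubget", "run", "./pubget_data",   # output directory for the corpus
        "-q", query,                        # query sent to PubMed Central
        "--labelbuddy",                     # also write labelbuddy-format documents
    ],
    check=True,  # raise if any step of the pipeline fails
)
```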

Meta-analytic maps produced by pubget and by the original NeuroSynth and NeuroQuery platforms for several example terms. We note that pubget-NeuroSynth (top row) has higher statistical power and better face validity for rare terms than the original NeuroSynth (second row). On the other hand, the original NeuroQuery (fourth row) was trained on 13,000 full-text articles and therefore performs better than pubget-NeuroQuery (third row). From these and other examples we suggest the following rules of thumb: (i) for frequent, well-defined terms such as “auditory” or “parietal”, all methods produce adequate results; (ii) for a formal meta-analysis of a single term, pubget-NeuroSynth produces the best results; (iii) for multi-term queries, neuroquery.org or Text2Brain [Ngo et al., 2022] produce the best results. There is no neurosynth.org map for “prosopagnosia” because this term is too rare to be included in the NeuroSynth vocabulary.
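For readers who want to reproduce such maps, the sketch below shows one way a pubget-NeuroQuery model could be queried from Python. The `--fit_neuroquery`/`--fit_neurosynth` run options and the output directory layout are assumptions based on our reading of the pubget and neuroquery documentation, so treat this as a sketch rather than a verbatim recipe.

```python
# Sketch: query a NeuroQuery-style model fitted by pubget (e.g. a run launched
# with the assumed "--fit_neuroquery" option). The model directory below is a
# placeholder; pubget writes it inside the output directory of the run.
from neuroquery import NeuroQueryModel

encoder = NeuroQueryModel.from_data_dir(
    "./pubget_data/<query_dir>/subset_allArticles_neuroqueryModel/neuroquery_model"
)
result = encoder("prosopagnosia")                         # free-text query
result["brain_map"].to_filename("prosopagnosia.nii.gz")   # save the predicted map
print(result["similar_words"].head())                     # terms driving the prediction
```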

A) Performance of the sample-size extraction plugin in pubextract and of GPT-3.5. The x-axis shows the sample size extracted automatically from the text, and the y-axis shows the sample size actually reported in the article. The dashed line is the identity line. The stars mark the median value obtained with each extraction method, compared to the ground-truth median. B) Distribution of ages for different participant groups, extracted with pubextract. “Unknown” is chosen when the tool fails to detect whether a participant group corresponds to patients or healthy controls; in most cases where this is not specified explicitly, the participants are healthy. We note that the distribution of healthy participants’ ages has a large peak around the age of university students, who are often recruited for studies at their own university. Patients tend to be older on average, with a long tail likely due to studies on aging or neurodegenerative diseases. C) Median sample size over time. Error bars show 95% bootstrap confidence intervals. Following Poldrack et al. [2017], for sample sizes extracted from pubget-downloaded articles, we only consider single-group studies.
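As an illustration of how panel C could be derived from the extracted data, the following sketch computes the median sample size per publication year with a percentile-bootstrap 95% confidence interval. It is not the authors' analysis code; the input file and column names (`publication_year`, `sample_size`) are hypothetical placeholders.

```python
# Illustrative sketch (not the actual analysis code): median sample size per
# publication year with a 95% percentile-bootstrap confidence interval.
# The CSV file and its column names are hypothetical placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def bootstrap_median_ci(values, n_boot=1000, alpha=0.05):
    """Percentile bootstrap confidence interval for the median."""
    values = np.asarray(values)
    medians = [
        np.median(rng.choice(values, size=len(values), replace=True))
        for _ in range(n_boot)
    ]
    return np.quantile(medians, [alpha / 2, 1 - alpha / 2])

df = pd.read_csv("extracted_sample_sizes.csv")  # one row per (single-group) study
rows = []
for year, group in df.groupby("publication_year"):
    low, high = bootstrap_median_ci(group["sample_size"])
    rows.append({"year": year, "median_n": group["sample_size"].median(),
                 "ci_low": low, "ci_high": high})
print(pd.DataFrame(rows).sort_values("year"))
```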

Screen capture of annotating participant demographics in one document. The window on the right is labelbuddy, displaying an article, the available labels with their keyboard shortcuts, and the existing annotations for the current document. The window on the left is the dedicated tool for participant demographics, showing the inferred group structure and information about each group. This tool is not part of labelbuddy itself; it is distributed as part of the labelbuddy-annotations repository.
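Because labelbuddy can export documents and annotations as JSONL, downstream tools such as this demographics viewer can work directly from that export. The sketch below tallies annotations per label from such an export; the field names (`annotations`, `label_name`) are assumptions about the export schema and should be checked against the labelbuddy documentation.

```python
# Minimal sketch: count annotations per label in a labelbuddy JSONL export.
# The field names "annotations" and "label_name" are assumed from one version
# of the export format; verify them against the labelbuddy documentation.
import json
from collections import Counter
from pathlib import Path

label_counts = Counter()
with Path("exported_docs.jsonl").open(encoding="utf-8") as export:
    for line in export:
        document = json.loads(line)
        for annotation in document.get("annotations", []):
            label_counts[annotation["label_name"]] += 1

for label, count in label_counts.most_common():
    print(f"{count:6d}  {label}")
```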

The scatter plot displays the number of times each dFC assessment method was applied across the annotated papers versus the number of times it was cited. The year of each method's first publication is shown beside its name. Clustering was both highly cited and highly applied, while Time-Frequency was highly cited but not highly applied. Sliding Window was highly applied, although not as frequently cited as Clustering. Co-Activation Patterns and Window-less were not highly applied, although Co-Activation Patterns was cited as frequently as Sliding Window.

Number of documents, labels, annotators, and annotations for each project in the labelbuddy-annotations repository.