Meta-Research: Releasing a preprint is associated with more attention and citations for the peer-reviewed article

  1. Darwin Y Fu
  2. Jacob J Hughey (corresponding author)
  1. Vanderbilt University Medical Center, United States
2 figures, 2 tables and 15 additional files

Figures

Figure 1 with 9 supplements
Absolute effect size of having a preprint, by metric (Attention Score and number of citations) and journal.

Each point indicates the predicted mean of the Attention Score (middle column) and number of citations (right column) for a hypothetical article with (green) or without (orange) a preprint, assuming …

Figure 1—source data 1

Absolute effect size of having a preprint, by metric and journal.

https://cdn.elifesciences.org/articles/52646/elife-52646-fig1-data1-v2.csv
Figure 1—figure supplement 1
Accuracy of automatically inferring last-author publications from names and affiliations in PubMed.

Each point represents one of the 50 randomly selected articles. The gray line represents y = x. For details, see Supplementary file 14.

Figure 1—figure supplement 2
Histogram of the number of days by which release of the preprint preceded publication of the peer-reviewed article, including articles from all journals.
Figure 1—figure supplement 3
Scatterplots of Attention Score (with a pseudocount of 1) for articles in each journal.
Figure 1—figure supplement 4
Scatterplots of number of citations (with a pseudocount of 1) for articles in each journal.

For ease of visualization, 23 articles with more than 1,024 citations were set to have exactly 1,024 citations.

Figure 1—figure supplement 5
Scatterplots of number of citations vs. Attention Score for articles in each journal.

For ease of visualization, 23 articles with more than 1,024 citations were set to have exactly 1,024 citations.

Figure 1—figure supplement 6
Percentage of variance in MeSH term assignment explained by the top 15 principal components for each journal.
Figure 1—figure supplement 7
Scores for the top two principal components of MeSH term assignments for each journal.

Each point represents an article.

Figure 1—figure supplement 8
Comparing mean absolute error (MAE) and mean absolute percentage error (MAPE) of Gamma and log-linear regression models for each metric.

Each point represents a journal. The gray line indicates y = x.
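The two error measures named in the caption above can be sketched as follows. This is a minimal illustration using their standard definitions, not the authors' code; the function names and the example values are hypothetical.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_percentage_error(observed, predicted):
    """Average absolute difference as a percentage of the observed value.

    Assumes every observed value is nonzero.
    """
    total = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical observed and model-predicted values for three articles.
observed = [4.0, 10.0, 20.0]
predicted = [5.0, 8.0, 25.0]

mae = mean_absolute_error(observed, predicted)
mape = mean_absolute_percentage_error(observed, predicted)
print(f"MAE = {mae:.3f}, MAPE = {mape:.2f}%")
```

MAE is dominated by the articles with the largest values, whereas MAPE weights every article by its relative error, which is why comparing models on both can be informative.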

Figure 1—figure supplement 9
Absolute effect size of having a preprint, by metric and journal.

The plots were generated identically to Figure 1, except they show 95% prediction intervals instead of 95% confidence intervals. Confidence intervals represent uncertainty in the population mean, …

Figure 2 with 2 supplements
Relative effect size of having a preprint, by metric (Attention Score and number of citations) and journal.

Fold-change corresponds to the exponentiated coefficient from log-linear regression, where fold-change >1 indicates higher Attention Score or number of citations for articles that had a preprint. A …

Figure 2—figure supplement 1
Associations of MeSH term PCs with Attention Score and citations in each journal, based on model coefficients from log-linear regression.

P-values are not adjusted for testing multiple journals.

Figure 2—figure supplement 2
Comparing model fits with and without MeSH term PCs.

Comparison in terms of (A) fold-change (i.e., exponentiated coefficient) for preprint status and (B) t-statistic for each of five variables. Each point represents a journal-metric pair.
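As a rough sketch of how a fold-change follows from a log-linear regression coefficient, the snippet below exponentiates the "Had a preprint" coefficients reported in Table 1. The logarithm base is an assumption: this page does not state the base used for the response variable, so base 2 is chosen here only because the covariates in Table 1 are log2-transformed. The function names are hypothetical.

```python
def fold_change(coef, base=2.0):
    """Convert a log-scale regression coefficient to a fold-change.

    base=2.0 is an assumption, not something stated on this page.
    """
    return base ** coef


def fold_change_interval(lower, upper, base=2.0):
    """Exponentiate both ends of a confidence interval on the log scale."""
    return base ** lower, base ** upper


# "Had a preprint" coefficients and 95% CI bounds from Table 1.
fc_attention = fold_change(0.575)
fc_citations = fold_change(0.442)
ci_attention = fold_change_interval(0.502, 0.647)
ci_citations = fold_change_interval(0.380, 0.505)

print(f"Attention Score fold-change: {fc_attention:.2f}")
print(f"Citations fold-change: {fc_citations:.2f}")
```

Because exponentiation is monotonic, a coefficient whose interval excludes 0 yields a fold-change whose interval excludes 1, whichever base is used.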

Tables

Table 1
Random-effects meta-analysis across journals of model coefficients from log-linear regression.

A positive coefficient (column 3) means that Attention Score or number of citations increases as that variable increases (or if the article had a preprint or had an author with a U.S. affiliation or …

Metric | Article-level variable | Coef. | Std. error | 95% CI (lower) | 95% CI (upper) | p-value | Adj. p-value
Attention Score | Had a preprint | 0.575 | 0.036 | 0.502 | 0.647 | 1.91e-18 | 3.82e-18
Attention Score | log2(number of authors) | 0.129 | 0.015 | 0.099 | 0.158 | 1.04e-10 | 1.04e-10
Attention Score | log2(number of references + 1) | 0.070 | 0.021 | 0.027 | 0.113 | 2.10e-03 | 2.10e-03
Attention Score | Had an author with U.S. affiliation | 0.143 | 0.021 | 0.100 | 0.187 | 6.08e-08 | 6.08e-08
Attention Score | Had an author with Nature Index affiliation | 0.147 | 0.020 | 0.106 | 0.188 | 1.20e-08 | 2.41e-08
Attention Score | Last author publication age (yrs) | −0.009 | 0.001 | −0.011 | −0.007 | 5.86e-10 | 1.17e-09
Citations | Had a preprint | 0.442 | 0.031 | 0.380 | 0.505 | 7.38e-17 | 7.38e-17
Citations | log2(number of authors) | 0.181 | 0.009 | 0.163 | 0.200 | 9.76e-22 | 1.95e-21
Citations | log2(number of references + 1) | 0.217 | 0.020 | 0.176 | 0.258 | 4.87e-13 | 9.73e-13
Citations | Had an author with U.S. affiliation | 0.079 | 0.011 | 0.057 | 0.102 | 1.49e-08 | 2.98e-08
Citations | Had an author with Nature Index affiliation | 0.100 | 0.015 | 0.071 | 0.130 | 3.46e-08 | 3.46e-08
Citations | Last author publication age (yrs) | −0.003 | 0.001 | −0.004 | −0.001 | 8.61e-05 | 8.61e-05
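To illustrate what a random-effects meta-analysis of per-journal coefficients involves, here is a minimal sketch using the DerSimonian-Laird estimator of between-journal variance. This is a swapped-in, textbook estimator chosen for brevity; this page does not say which estimator or software produced Table 1. The per-journal coefficients and standard errors below are hypothetical, not taken from the study.

```python
import math


def random_effects_meta(coefs, std_errors):
    """Pool per-journal coefficients with a DerSimonian-Laird random-effects model.

    Returns (pooled coefficient, its standard error, between-journal variance tau^2).
    """
    k = len(coefs)
    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (inverse-variance) estimate, needed to compute Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * ci for wi, ci in zip(w, coefs)) / sum_w
    q = sum(wi * (ci - fixed) ** 2 for wi, ci in zip(w, coefs))

    # Method-of-moments estimate of between-journal variance, truncated at 0.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each journal's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * ci for wi, ci in zip(w_star, coefs)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2


# Hypothetical coefficients for "had a preprint" from three journals.
pooled, pooled_se, tau2 = random_effects_meta([0.4, 0.6, 0.7], [0.1, 0.1, 0.2])
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.3f}, tau^2 = {tau2:.4f}")
```

When the journals agree closely, tau^2 is estimated as 0 and the result reduces to the ordinary inverse-variance weighted mean.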
Table 2
Meta-regression across journals of log fold-changes for having a preprint.

A positive coefficient means the log fold-change for having a preprint increases as that variable increases (or if articles in that journal are immediately open access). However, coefficients for …

Metric | Journal-level variable | Coef. | Std. error | 95% CI (lower) | 95% CI (upper) | t-statistic | p-value | Adj. p-value
Attention Score | Immediately open access | 0.118 | 0.076 | −0.037 | 0.273 | 1.551 | 0.130 | 0.260
Attention Score | log2(Impact Factor) | −0.025 | 0.040 | −0.107 | 0.057 | −0.616 | 0.542 | 0.542
Attention Score | log2(% of articles with preprints) | −0.064 | 0.032 | −0.129 | 0.001 | −1.991 | 0.054 | 0.109
Citations | Immediately open access | −0.013 | 0.069 | −0.152 | 0.126 | −0.187 | 0.853 | 0.853
Citations | log2(Impact Factor) | 0.044 | 0.036 | −0.030 | 0.117 | 1.211 | 0.234 | 0.468
Citations | log2(% of articles with preprints) | 0.037 | 0.029 | −0.022 | 0.095 | 1.283 | 0.208 | 0.208
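The adjusted p-values in Table 2 appear consistent with a Holm-type step-down adjustment applied to each variable across the two metrics. That is an inference from the printed numbers, not something this page states. Under that assumption, a minimal sketch of the adjustment looks like this (the function name is hypothetical):

```python
def holm_adjust(p_values):
    """Holm step-down adjustment for a family of p-values.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    adjusted values are capped at 1 and forced to be non-decreasing in rank.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# p-values for "Immediately open access" from Table 2 (Attention Score, Citations).
adj_open_access = holm_adjust([0.130, 0.853])

# p-values for "log2(Impact Factor)" from Table 2 (Attention Score, Citations).
adj_impact_factor = holm_adjust([0.542, 0.234])

print(adj_open_access, adj_impact_factor)
```

With only two tests per family, the smaller p-value is doubled and the larger one is left unchanged unless the doubled value exceeds it.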

Additional files

Supplementary file 1

Characteristics of journals included in this study.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp1-v2.csv
Supplementary file 2

Spearman correlation between Attention Score and number of citations for articles in each journal.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp2-v2.csv
Supplementary file 3

Journal-specific cutoffs of minimum abstract length for including articles in our dataset.

Articles from journals not specified in this table were included regardless of abstract length.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp3-v2.csv
Supplementary file 4

Descriptive statistics for each variable included in the regression models for each journal.

For logical variables, an entry “logical | nt | nf” corresponds to nt articles for which the variable was true and nf articles for which the variable was false. For numeric variables and dates, each entry corresponds to “minimum | first quartile | median | mean | third quartile | maximum”.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp4-v2.csv
Supplementary file 5

MeSH terms with the highest positive and negative loadings for each PC and each journal.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp5-v2.csv
Supplementary file 6

Mean absolute error (MAE) and mean absolute percentage error (MAPE) of log-linear regression, Gamma regression, and negative binomial regression for each metric and each journal.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp6-v2.csv
Supplementary file 7

Regression statistics from log-linear regression for each metric, journal, and variable.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp7-v2.csv
Supplementary file 8

Random-effects meta-analysis across journals for each metric and variable, based on the regression statistics in Supplementary file 7.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp8-v2.csv
Supplementary file 9

Regression statistics from log-linear regression for each metric, journal, and variable, including as a variable the amount of time in years between release of the preprint and publication of the peer-reviewed article (“preprint_age”).

https://cdn.elifesciences.org/articles/52646/elife-52646-supp9-v2.csv
Supplementary file 10

Random-effects meta-analysis across journals for each metric and variable, based on the regression statistics in Supplementary file 9.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp10-v2.csv
Supplementary file 11

Regression statistics from log-linear regression for each metric, journal, and variable, excluding as variables the MeSH term PCs.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp11-v2.csv
Supplementary file 12

Meta-regression statistics of log fold-changes for having a preprint.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp12-v2.csv
Supplementary file 13

Frequencies of country of affiliation inferred using free-text affiliations from PubMed.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp13-v2.csv
Supplementary file 14

Comparison of automatically identified and manually identified earliest last-author publications for 50 randomly selected articles.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp14-v2.csv
Transparent reporting form
https://cdn.elifesciences.org/articles/52646/elife-52646-transrepform-v2.docx
