Computational and Systems Biology

Meta-Research: Releasing a preprint is associated with more attention and citations for the peer-reviewed article

Darwin Y Fu, Jacob J Hughey (corresponding author)
Vanderbilt University Medical Center, United States
Feature Article
Cite this article as: eLife 2019;8:e52646 doi: 10.7554/eLife.52646
2 figures, 2 tables and 15 additional files

Figures

Figure 1 with 9 supplements
Absolute effect size of having a preprint, by metric (Attention Score and number of citations) and journal.

Each point indicates the predicted mean of the Attention Score (middle column) and number of citations (right column) for a hypothetical article with (green) or without (orange) a preprint. The hypothetical article is assumed to have been published three years ago and to have the mean value (i.e., zero) of each of the top 15 MeSH term PCs and the median value (for articles in that journal) of number of authors, number of references, U.S. affiliation status, Nature Index affiliation status, and last author publication age. Error bars indicate 95% confidence intervals. Journal names correspond to PubMed abbreviations. The numbers of articles with (green) and without (orange) a preprint are shown in the left column. Journals are ordered by the mean of the predicted mean Attention Score and the predicted mean number of citations.
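A minimal sketch of how such a hypothetical reference article could be built and scored. It is not the authors' code: the covariate names (`log2_authors`, `us_affiliation`, `years_since_publication`, `had_preprint`, `pc1` to `pc15`) and all coefficient values are hypothetical, and it assumes the outcome was modeled as log2(y + 1), so predictions are back-transformed with 2^x − 1. A back-transformed log-scale prediction is on the original scale but is not necessarily the exact quantity the authors report as the predicted mean.

```python
from statistics import median

def reference_covariates(articles, median_keys, n_pcs=15, years_since_publication=3.0):
    """Covariates for a hypothetical article in one journal: the journal median
    of each listed variable, zero (the mean) for each MeSH term PC, and a fixed
    time since publication."""
    ref = {key: median(a[key] for a in articles) for key in median_keys}
    ref.update({f"pc{i}": 0.0 for i in range(1, n_pcs + 1)})
    ref["years_since_publication"] = years_since_publication
    return ref

def predict(intercept, coefs, covariates, base=2.0):
    """Linear predictor on the log scale, back-transformed as base**eta - 1.
    Assumes the outcome was modeled as log_base(y + 1); covariates without a
    coefficient contribute nothing."""
    eta = intercept + sum(coefs.get(k, 0.0) * v for k, v in covariates.items())
    return base ** eta - 1

# Hypothetical journal with three articles and hypothetical coefficients.
articles = [
    {"log2_authors": 2.0, "us_affiliation": 1},
    {"log2_authors": 3.0, "us_affiliation": 0},
    {"log2_authors": 4.0, "us_affiliation": 1},
]
coefs = {"had_preprint": 0.5, "log2_authors": 0.1, "years_since_publication": 0.2}
ref = reference_covariates(articles, ["log2_authors", "us_affiliation"])
with_preprint = predict(1.0, coefs, {**ref, "had_preprint": 1})
without_preprint = predict(1.0, coefs, {**ref, "had_preprint": 0})
```

On the log scale the two predictions differ by exactly the preprint coefficient; on the original scale the absolute gap also depends on the baseline set by the other covariates.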

Figure 1—source data 1

Absolute effect size of having a preprint, by metric and journal.

https://cdn.elifesciences.org/articles/52646/elife-52646-fig1-data1-v2.csv
Figure 1—figure supplement 1
Accuracy of automatically inferring last-author publications from names and affiliations in PubMed.

Each point represents one of the 50 randomly selected articles. The gray line represents y = x. For details, see Supplementary file 14.

Figure 1—figure supplement 2
Histogram of the number of days by which release of the preprint preceded publication of the peer-reviewed article, including articles from all journals.
Figure 1—figure supplement 3
Scatterplots of Attention Score (with a pseudocount of 1) for articles in each journal.
Figure 1—figure supplement 4
Scatterplots of number of citations (with a pseudocount of 1) for articles in each journal.

For ease of visualization, 23 articles with more than 1,024 citations were set to have exactly 1,024 citations.

Figure 1—figure supplement 5
Scatterplots of number of citations vs. Attention Score for articles in each journal.

For ease of visualization, 23 articles with more than 1,024 citations were set to have exactly 1,024 citations.
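The display adjustments described in these captions can be sketched as follows; they affect only the plots. Assumptions: the axes are base-2 logarithmic (suggested by the cap of 1,024 = 2^10 but not stated in the captions), and `display_value` is a hypothetical helper name.

```python
import math

def display_value(count, cap=None, pseudocount=1):
    """Map a raw Attention Score or citation count to a plotting coordinate.
    The pseudocount keeps zero counts finite on a log axis."""
    if cap is not None:
        count = min(count, cap)  # e.g. citations above 1,024 are drawn at 1,024
    return math.log2(count + pseudocount)

# Citations are capped for display; Attention Scores would be passed with cap=None.
citation_coords = [display_value(c, cap=1024) for c in (0, 1, 1023, 5000)]
```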

Figure 1—figure supplement 6
Percentage of variance in MeSH term assignment explained by the top 15 principal components for each journal.
Figure 1—figure supplement 7
Scores for the top two principal components of MeSH term assignments for each journal.

Each point represents an article.

Figure 1—figure supplement 8
Comparing mean absolute error (MAE) and mean absolute percentage error (MAPE) of Gamma and log-linear regression models for each metric.

Each point represents a journal. The gray line indicates y = x.
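For reference, the two error metrics compared in this supplement can be computed as below; for both, lower is better. How articles with an observed value of zero (for which MAPE is undefined) were handled is not stated in this caption, so this sketch simply rejects such inputs.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [10, 20, 40]
predicted = [12, 18, 40]
```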

Figure 1—figure supplement 9
Absolute effect size of having a preprint, by metric and journal.

The plots were generated identically to Figure 1, except they show 95% prediction intervals instead of 95% confidence intervals. Confidence intervals represent uncertainty in the population mean, whereas prediction intervals represent uncertainty in an individual observation. Thus, prediction intervals show the article-to-article variation in Attention Score and citations, even when all variables in the model are fixed.
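To make the distinction concrete, here is a sketch for ordinary least squares with a single predictor, a simplification of the multi-predictor models used in this study. The two intervals share a center and differ only by the extra `1 +` term, which adds the residual variance of a single new observation. Because the Python standard library has no t quantile function, the critical value `t_crit` is supplied by the caller; the data are made up.

```python
import math

def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(rss / (n - 2))  # residual standard error
    return intercept, slope, sigma, n, x_bar, sxx

def intervals(fit, x0, t_crit):
    """Confidence interval for the mean and prediction interval for one new
    observation at x0."""
    intercept, slope, sigma, n, x_bar, sxx = fit
    y0 = intercept + slope * x0
    se_mean = sigma * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    se_pred = sigma * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    ci = (y0 - t_crit * se_mean, y0 + t_crit * se_mean)
    pi = (y0 - t_crit * se_pred, y0 + t_crit * se_pred)
    return ci, pi

x = [0, 1, 2, 3, 4, 5]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
fit = fit_simple_ols(x, y)
ci, pi = intervals(fit, x0=2.5, t_crit=2.776)  # 0.975 t quantile, n - 2 = 4 df
```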

Figure 2 with 2 supplements
Relative effect size of having a preprint, by metric (Attention Score and number of citations) and journal.

Fold-change corresponds to the exponentiated coefficient from log-linear regression, where fold-change >1 indicates higher Attention Score or number of citations for articles that had a preprint. A fold-change of 1 corresponds to no association. Error bars indicate 95% confidence intervals. Journals are ordered by mean log fold-change. Bottom row shows estimates from random-effects meta-analysis (also shown in Table 1). The source data for this figure is in Supplementary file 7.
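Converting a coefficient and its confidence bounds into fold-changes only requires exponentiating each value. The base must match the logarithm applied to the outcome, which this caption does not state, so the sketch takes it as an explicit argument and shows two candidate bases for the meta-analytic preprint coefficient for Attention Score in Table 1. Because exponentiation is monotonic, a coefficient interval that excludes 0 yields a fold-change interval that excludes 1.

```python
import math

def to_fold_change(coef, ci_lower, ci_upper, base):
    """Exponentiate a log-scale coefficient and its CI bounds."""
    return tuple(base ** v for v in (coef, ci_lower, ci_upper))

# Table 1, Attention Score, "Had a preprint": coef 0.575, 95% CI 0.502 to 0.647
fc_base2 = to_fold_change(0.575, 0.502, 0.647, base=2)
fc_base_e = to_fold_change(0.575, 0.502, 0.647, base=math.e)
```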

Figure 2—figure supplement 1
Associations of MeSH term PCs with Attention Score and citations in each journal, based on model coefficients from log-linear regression.

P-values are not adjusted for testing multiple journals.

Figure 2—figure supplement 2
Comparing model fits with and without MeSH term PCs.

Comparison in terms of (A) fold-change (i.e., exponentiated coefficient) for preprint status and (B) t-statistic for each of five variables. Each point represents a journal-metric pair.

Tables

Table 1
Random-effects meta-analysis across journals of model coefficients from log-linear regression.

A positive coefficient (column 3) means that Attention Score or number of citations increases as that variable increases (or if the article had a preprint or had an author with a U.S. affiliation or a Nature Index affiliation). However, coefficients for some variables have different units and are not directly comparable. P-values were adjusted using the Bonferroni-Holm procedure, based on having fit two models for each journal. Effectively, for each variable, the procedure multiplied the lesser p-value by two and left the other unchanged. Meta-analysis statistics for the intercept and publication date are shown in Supplementary file 8.

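| Metric | Article-level variable | Coef. | Std. error | 95% CI (lower) | 95% CI (upper) | p-value | Adj. p-value |
|---|---|---|---|---|---|---|---|
| Attention Score | Had a preprint | 0.575 | 0.036 | 0.502 | 0.647 | 1.91e-18 | 3.82e-18 |
| Attention Score | log2(number of authors) | 0.129 | 0.015 | 0.099 | 0.158 | 1.04e-10 | 1.04e-10 |
| Attention Score | log2(number of references + 1) | 0.070 | 0.021 | 0.027 | 0.113 | 2.10e-03 | 2.10e-03 |
| Attention Score | Had an author with U.S. affiliation | 0.143 | 0.021 | 0.100 | 0.187 | 6.08e-08 | 6.08e-08 |
| Attention Score | Had an author with Nature Index affiliation | 0.147 | 0.020 | 0.106 | 0.188 | 1.20e-08 | 2.41e-08 |
| Attention Score | Last author publication age (yrs) | −0.009 | 0.001 | −0.011 | −0.007 | 5.86e-10 | 1.17e-09 |
| Citations | Had a preprint | 0.442 | 0.031 | 0.380 | 0.505 | 7.38e-17 | 7.38e-17 |
| Citations | log2(number of authors) | 0.181 | 0.009 | 0.163 | 0.200 | 9.76e-22 | 1.95e-21 |
| Citations | log2(number of references + 1) | 0.217 | 0.020 | 0.176 | 0.258 | 4.87e-13 | 9.73e-13 |
| Citations | Had an author with U.S. affiliation | 0.079 | 0.011 | 0.057 | 0.102 | 1.49e-08 | 2.98e-08 |
| Citations | Had an author with Nature Index affiliation | 0.100 | 0.015 | 0.071 | 0.130 | 3.46e-08 | 3.46e-08 |
| Citations | Last author publication age (yrs) | −0.003 | 0.001 | −0.004 | −0.001 | 8.61e-05 | 8.61e-05 |
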
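The Bonferroni-Holm adjustment described in the caption can be reproduced with a short function. This is the standard step-down procedure for m tests; with m = 2 it doubles the smaller p-value and keeps the larger one unless it falls below the doubled value. Applied to the "Had a preprint" p-values, it returns the adjusted values shown in the table.

```python
def holm(p_values):
    """Holm-Bonferroni step-down adjustment; returns adjusted p-values in the
    original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # adjusted values never decrease
        adjusted[i] = running_max
    return adjusted

# "Had a preprint": Attention Score and Citations p-values from the table
adjusted = holm([1.91e-18, 7.38e-17])
```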
Table 2
Meta-regression across journals of log fold-changes for having a preprint.

A positive coefficient means the log fold-change for having a preprint increases as that variable increases (or if articles in that journal are immediately open access). However, coefficients for different variables have different units and are not directly comparable. P-values were adjusted using the Bonferroni-Holm procedure, based on having fit two models. For each variable, the procedure multiplied the lesser p-value by two and kept the greater p-value unless it was smaller than the doubled lesser p-value, in which case it was raised to that value; for every variable in this table, the greater p-value was left unchanged. Regression statistics for the intercept are shown in Supplementary file 12.

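| Metric | Journal-level variable | Coef. | Std. error | 95% CI (lower) | 95% CI (upper) | t-statistic | p-value | Adj. p-value |
|---|---|---|---|---|---|---|---|---|
| Attention Score | Immediately open access | 0.118 | 0.076 | −0.037 | 0.273 | 1.551 | 0.130 | 0.260 |
| Attention Score | log2(Impact Factor) | −0.025 | 0.040 | −0.107 | 0.057 | −0.616 | 0.542 | 0.542 |
| Attention Score | log2(% of articles with preprints) | −0.064 | 0.032 | −0.129 | 0.001 | −1.991 | 0.054 | 0.109 |
| Citations | Immediately open access | −0.013 | 0.069 | −0.152 | 0.126 | −0.187 | 0.853 | 0.853 |
| Citations | log2(Impact Factor) | 0.044 | 0.036 | −0.030 | 0.117 | 1.211 | 0.234 | 0.468 |
| Citations | log2(% of articles with preprints) | 0.037 | 0.029 | −0.022 | 0.095 | 1.283 | 0.208 | 0.208 |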
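The columns of this table are linked by simple arithmetic: the t-statistic is the coefficient divided by its standard error, and each confidence bound is the coefficient plus or minus a critical value times the standard error. The degrees of freedom are not given here, so the sketch takes the critical value as an input; about 2.04 is what the tabulated interval for "Immediately open access" implies. Small discrepancies (e.g., 1.553 vs. 1.551) come from rounding of the displayed coefficient and standard error.

```python
def t_and_ci(coef, std_error, t_crit):
    """t-statistic and two-sided confidence interval for a coefficient;
    t_crit is the critical value for the model's residual degrees of freedom."""
    t_stat = coef / std_error
    return t_stat, (coef - t_crit * std_error, coef + t_crit * std_error)

# Attention Score, "Immediately open access" row
t_stat, ci = t_and_ci(0.118, 0.076, t_crit=2.04)
```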

Additional files

Supplementary file 1

Characteristics of journals included in this study.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp1-v2.csv
Supplementary file 2

Spearman correlation between Attention Score and number of citations for articles in each journal.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp2-v2.csv
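Spearman correlation is the Pearson correlation computed on ranks, which makes it insensitive to the heavy right skew of both metrics. A minimal standard-library sketch follows; giving tied values their average rank is a common convention and an assumption here, since this description does not say how ties were handled.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```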
Supplementary file 3

Journal-specific cutoffs of minimum abstract length for including articles in our dataset.

Articles from journals not specified in this table were included regardless of abstract length.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp3-v2.csv
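The inclusion rule can be expressed as a one-line filter. This is a sketch under stated assumptions: length is measured in characters (the unit is not given here), the cutoff is inclusive, and the journal name and cutoff in the example are hypothetical.

```python
def passes_abstract_cutoff(journal, abstract, cutoffs):
    """True if the abstract meets the journal's minimum length. Journals
    missing from `cutoffs` are included regardless of abstract length."""
    minimum = cutoffs.get(journal)
    return minimum is None or len(abstract) >= minimum

cutoffs = {"Hypothetical J": 200}  # hypothetical journal and cutoff
```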
Supplementary file 4

Descriptive statistics for each variable included in the regression models for each journal.

For logical variables, an entry “logical | nt | nf” corresponds to nt articles for which the variable was true and nf articles for which the variable was false. For numeric variables and dates, each entry corresponds to “minimum | first quartile | median | mean | third quartile | maximum”.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp4-v2.csv
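The two entry formats can be produced as follows. Sketch assumptions: logical values are Python booleans, dates would first need converting to numbers (e.g., ordinal days), and quartiles use the "inclusive" interpolation method of `statistics.quantiles`; the quartile convention actually used is not stated.

```python
from statistics import mean, median, quantiles

def describe_logical(values):
    """'logical | nt | nf' for a list of booleans."""
    n_true = sum(1 for v in values if v)
    return f"logical | {n_true} | {len(values) - n_true}"

def describe_numeric(values):
    """'minimum | first quartile | median | mean | third quartile | maximum'."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    parts = [min(values), q1, median(values), mean(values), q3, max(values)]
    return " | ".join(f"{p:g}" for p in parts)
```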
Supplementary file 5

MeSH terms with the highest positive and negative loadings for each PC and each journal.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp5-v2.csv
Supplementary file 6

Mean absolute error (MAE) and mean absolute percentage error (MAPE) of log-linear regression, gamma regression, and negative binomial regression for each metric and each journal.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp6-v2.csv
Supplementary file 7

Regression statistics from log-linear regression for each metric, journal, and variable.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp7-v2.csv
Supplementary file 8

Random-effects meta-analysis across journals for each metric and variable, based on the regression statistics in Supplementary file 7.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp8-v2.csv
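A random-effects meta-analysis pools the per-journal coefficients while allowing the true effect to differ between journals. This section does not name the between-journal variance estimator or the confidence-interval method, so the sketch below uses the DerSimonian-Laird estimator purely as a common, compact example; other estimators (e.g., REML) can give somewhat different results.

```python
import math

def random_effects_dl(estimates, std_errors):
    """Pool per-journal estimates with the DerSimonian-Laird random-effects
    model. Returns (pooled estimate, its standard error, tau^2)."""
    w = [1 / se ** 2 for se in std_errors]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-journal variance, truncated at 0
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2
```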
Supplementary file 9

Regression statistics from log-linear regression for each metric, journal, and variable, including as a variable the amount of time in years between release of the preprint and publication of the peer-reviewed article (“preprint_age”).

https://cdn.elifesciences.org/articles/52646/elife-52646-supp9-v2.csv
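As an illustration, `preprint_age` can be derived from the two dates. Converting days to years with 365.25-day years is an assumption, as are the example dates.

```python
from datetime import date

def preprint_age_years(preprint_date, publication_date):
    """Years between preprint release and peer-reviewed publication,
    assuming 365.25-day years."""
    return (publication_date - preprint_date).days / 365.25

age = preprint_age_years(date(2018, 1, 1), date(2019, 1, 1))  # hypothetical dates
```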
Supplementary file 10

Random-effects meta-analysis across journals for each metric and variable, based on the regression statistics in Supplementary file 9.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp10-v2.csv
Supplementary file 11

Regression statistics from log-linear regression for each metric, journal, and variable, excluding as variables the MeSH term PCs.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp11-v2.csv
Supplementary file 12

Meta-regression statistics of log fold-changes for having a preprint.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp12-v2.csv
Supplementary file 13

Frequencies of country of affiliation inferred using free-text affiliations from PubMed.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp13-v2.csv
Supplementary file 14

Comparison of automatically identified and manually identified earliest last-author publications for 50 randomly selected articles.

https://cdn.elifesciences.org/articles/52646/elife-52646-supp14-v2.csv
Transparent reporting form
https://cdn.elifesciences.org/articles/52646/elife-52646-transrepform-v2.docx
