Each point indicates the predicted mean of the Attention Score (middle column) and number of citations (right column) for a hypothetical article with (green) or without (orange) a preprint, assuming …
Absolute effect size of having a preprint, by metric and journal.
Each point represents one of the 100 randomly selected articles. The gray line represents y = x. For details, see Supplementary file 14.
For ease of visualization, 23 articles with more than 1,024 citations were set to have exactly 1,024 citations.
For ease of visualization, 23 articles with more than 1,024 citations were set to have exactly 1,024 citations.
Each point represents an article.
Each point represents a journal. The gray line indicates y = x.
The plots were generated identically to Figure 1, except they show 95% prediction intervals instead of 95% confidence intervals. Confidence intervals represent uncertainty in the population mean, …
Fold-change corresponds to the exponentiated coefficient from log-linear regression, where fold-change >1 indicates higher Attention Score or number of citations for articles that had a preprint. A …
P-values are not adjusted for testing multiple journals.
A positive coefficient (column 3) means that Attention Score or number of citations increases as that variable increases (or if the article had a preprint or had an author with a U.S. affiliation or …
Metric | Article-level variable | Coef. | Std. error | 95% CI (lower) | 95% CI (upper) | p-value | Adj. p-value |
---|---|---|---|---|---|---|---|
Attention Score | Had a preprint | 0.575 | 0.036 | 0.502 | 0.647 | 1.91e-18 | 3.82e-18 |
Attention Score | log2(number of authors) | 0.129 | 0.015 | 0.099 | 0.158 | 1.04e-10 | 1.04e-10 |
Attention Score | log2(number of references + 1) | 0.070 | 0.021 | 0.027 | 0.113 | 2.10e-03 | 2.10e-03 |
Attention Score | Had an author with U.S. affiliation | 0.143 | 0.021 | 0.100 | 0.187 | 6.08e-08 | 6.08e-08 |
Attention Score | Had an author with Nature Index affiliation | 0.147 | 0.020 | 0.106 | 0.188 | 1.20e-08 | 2.41e-08 |
Attention Score | Last author publication age (yrs) | −0.009 | 0.001 | −0.011 | −0.007 | 5.86e-10 | 1.17e-09 |
Citations | Had a preprint | 0.442 | 0.031 | 0.380 | 0.505 | 7.38e-17 | 7.38e-17 |
Citations | log2(number of authors) | 0.181 | 0.009 | 0.163 | 0.200 | 9.76e-22 | 1.95e-21 |
Citations | log2(number of references + 1) | 0.217 | 0.020 | 0.176 | 0.258 | 4.87e-13 | 9.73e-13 |
Citations | Had an author with U.S. affiliation | 0.079 | 0.011 | 0.057 | 0.102 | 1.49e-08 | 2.98e-08 |
Citations | Had an author with Nature Index affiliation | 0.100 | 0.015 | 0.071 | 0.130 | 3.46e-08 | 3.46e-08 |
Citations | Last author publication age (yrs) | −0.003 | 0.001 | −0.004 | −0.001 | 8.61e-05 | 8.61e-05 |
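
As a rough illustration of how a coefficient in the table above maps to a fold-change, the following sketch exponentiates the "Had a preprint" coefficients and their confidence limits. The base of the logarithm applied to the response is not stated in this section; base 2 is assumed here only because the predictors are log2-transformed, and the function names are hypothetical.

```python
def fold_change(coef: float, base: float = 2.0) -> float:
    """Convert a log-scale regression coefficient to a fold-change.

    ASSUMPTION: the response was log-transformed with the given base
    (base 2 by default); the source text does not state the base.
    """
    return base ** coef


def percent_change(coef: float, base: float = 2.0) -> float:
    """Express the fold-change as a percentage increase or decrease."""
    return (fold_change(coef, base) - 1.0) * 100.0


# "Had a preprint" rows from the table: (metric, coef, CI lower, CI upper).
rows = [
    ("Attention Score", 0.575, 0.502, 0.647),
    ("Citations", 0.442, 0.380, 0.505),
]

for metric, coef, lower, upper in rows:
    print(
        f"{metric}: fold-change {fold_change(coef):.2f} "
        f"(95% CI {fold_change(lower):.2f} to {fold_change(upper):.2f}), "
        f"{percent_change(coef):.0f}% change"
    )
```

A fold-change above 1 corresponds to a positive coefficient, and a coefficient of 0 corresponds to no difference. Passing a different `base` (for example `math.e`) shows how strongly the interpretation depends on that assumption.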
A positive coefficient means the log fold-change for having a preprint increases as that variable increases (or if articles in that journal are immediately open access). However, coefficients for …
Metric | Journal-level variable | Coef. | Std. error | 95% CI (lower) | 95% CI (upper) | t-statistic | p-value | Adj. p-value |
---|---|---|---|---|---|---|---|---|
Attention Score | Immediately open access | 0.118 | 0.076 | −0.037 | 0.273 | 1.551 | 0.130 | 0.260 |
Attention Score | log2(Impact Factor) | −0.025 | 0.040 | −0.107 | 0.057 | −0.616 | 0.542 | 0.542 |
Attention Score | log2(% of articles with preprints) | −0.064 | 0.032 | −0.129 | 0.001 | −1.991 | 0.054 | 0.109 |
Citations | Immediately open access | −0.013 | 0.069 | −0.152 | 0.126 | −0.187 | 0.853 | 0.853 |
Citations | log2(Impact Factor) | 0.044 | 0.036 | −0.030 | 0.117 | 1.211 | 0.234 | 0.468 |
Citations | log2(% of articles with preprints) | 0.037 | 0.029 | −0.022 | 0.095 | 1.283 | 0.208 | 0.208 |
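
The adjusted p-values reported in the tables above are numerically consistent with a Bonferroni-Holm step-down correction applied across the two metrics for each variable. This is an inference from the reported numbers, not a method stated in this section, and the function name below is hypothetical. A minimal sketch of that procedure:

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-Holm step-down adjustment.

    ASSUMPTION: Holm is used here only because it reproduces the
    adjusted values in the tables; the source does not name the method.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# "Immediately open access": p-values for Attention Score and citations.
print(holm_adjust([0.130, 0.853]))
```

With two tests, the smaller p-value is doubled and the larger one is left unchanged unless the doubled value exceeds it.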
Characteristics of journals included in this study.
Spearman correlation between Attention Score and number of citations for articles in each journal.
Journal-specific cutoffs of minimum abstract length for including articles in our dataset.
Articles from journals not specified in this table were included regardless of abstract length.
Descriptive statistics for each variable included in the regression models for each journal.
For logical variables, an entry “logical | nt | nf” corresponds to nt articles for which the variable was true and nf articles for which the variable was false. For numeric variables and dates, each entry corresponds to “minimum | first quartile | median | mean | third quartile | maximum”.
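
For readers who want to process these entries programmatically, the sketch below parses the two entry formats described above. The function name, field names, and example values are hypothetical and are not taken from the supplementary file. Summary values are kept as strings because an entry may hold either numbers or dates.

```python
from typing import Dict, Union

# Order of the six fields in a numeric or date entry, as described in the legend.
SUMMARY_FIELDS = [
    "minimum",
    "first_quartile",
    "median",
    "mean",
    "third_quartile",
    "maximum",
]


def parse_entry(entry: str) -> Dict[str, Union[str, int]]:
    """Parse one descriptive-statistics entry into a dictionary."""
    parts = [part.strip() for part in entry.split("|")]
    if parts[0] == "logical":
        if len(parts) != 3:
            raise ValueError(f"expected 'logical | nt | nf', got: {entry!r}")
        return {
            "type": "logical",
            "n_true": int(parts[1]),
            "n_false": int(parts[2]),
        }
    if len(parts) != len(SUMMARY_FIELDS):
        raise ValueError(f"expected six summary values, got: {entry!r}")
    # Values stay as strings since they may be numbers or dates.
    return {"type": "summary", **dict(zip(SUMMARY_FIELDS, parts))}


# Made-up example entries, for illustration only.
print(parse_entry("logical | 120 | 880"))
print(parse_entry("1 | 3 | 5 | 6.2 | 8 | 40"))
```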
MeSH terms with the highest positive and negative loadings for each PC and each journal.
Mean absolute error (MAE) and mean absolute percentage error (MAPE) of log-linear regression, gamma regression, and negative binomial regression for each metric and each journal.
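
The two error measures named above can be sketched using their standard textbook definitions. The scale on which errors were computed and the handling of zero-valued observations are not specified in this section, so both are assumptions here, and the function names are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """MAE: the mean of |observed - predicted|."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_percentage_error(
    observed: Sequence[float], predicted: Sequence[float]
) -> float:
    """MAPE: 100 times the mean of |observed - predicted| / |observed|.

    ASSUMPTION: zero observations are rejected, because the source does
    not say how they were handled.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Made-up values, for illustration only.
observed = [2.0, 4.0]
predicted = [1.0, 6.0]
print(mean_absolute_error(observed, predicted))
print(mean_absolute_percentage_error(observed, predicted))
```

MAE is expressed in the units of the metric, whereas MAPE is scale-free, which makes it easier to compare across journals with very different typical Attention Scores or citation counts.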
Regression statistics from log-linear regression for each metric, journal, and variable.
Random-effects meta-analysis across journals for each metric and variable, based on the regression statistics in Supplementary file 7.
Regression statistics from log-linear regression for each metric, journal, and variable, including as a variable the amount of time in years between release of the preprint and publication of the peer-reviewed article (“preprint_age”).
Random-effects meta-analysis across journals for each metric and variable, based on the regression statistics in Supplementary file 9.
Regression statistics from log-linear regression for each metric, journal, and variable, excluding as variables the MeSH term PCs.
Meta-regression statistics of log fold-changes for having a preprint.
Frequencies of country of affiliation inferred using free-text affiliations from PubMed.
Comparison of automatically identified and manually identified earliest last-author publications for 50 randomly selected articles.