Meta-Research: Tracking the popularity and outcomes of all bioRxiv preprints
Figures

Total preprints posted to bioRxiv over a 61-month period from November 2013 through November 2018.
(a) The number of preprints (y-axis) at each month (x-axis), with each category depicted as a line in a different color. Inset: The overall number of preprints on bioRxiv in each month. (b) The number of preprints posted (y-axis) in each month (x-axis) by category. The category color key is provided below the figure.
-
Figure 1—source data 1
The number of submissions per month to each bioRxiv category, plus running totals.
- https://doi.org/10.7554/eLife.45133.003
-
Figure 1—source data 2
An Excel workbook demonstrating the formulas used to calculate the running totals in Figure 1—source data 1.
- https://doi.org/10.7554/eLife.45133.004
-
Figure 1—source data 3
The number of submissions per month overall, plus running totals.
- https://doi.org/10.7554/eLife.45133.005
-
Figure 1—source data 4
The number of full-length articles published by an arbitrary selection of well-known journals in September 2018.
- https://doi.org/10.7554/eLife.45133.006
-
Figure 1—source data 5
A table of the top 15 authors with the most preprints on bioRxiv.
- https://doi.org/10.7554/eLife.45133.007
-
Figure 1—source data 6
A list of every author, the number of preprints for which they are listed as an author, and the number of email addresses they are associated with.
- https://doi.org/10.7554/eLife.45133.008
-
Figure 1—source data 7
A table of the top 25 institutions with the most authors listing them as their affiliation, and how many papers have been published by those authors.
- https://doi.org/10.7554/eLife.45133.009
-
Figure 1—source data 8
A list of every indexed institution, the number of authors associated with that institution, and the number of papers authored by those researchers.
- https://doi.org/10.7554/eLife.45133.010
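The running totals in the Figure 1 source data files amount to a cumulative sum over the monthly counts. The following is a minimal sketch of that calculation, not the authors' Excel formulas; the function name `running_totals` and the sample counts are hypothetical and are not taken from the source data.

```python
from itertools import accumulate


def running_totals(monthly_counts):
    """Return the cumulative sum of a sequence of monthly counts."""
    return list(accumulate(monthly_counts))


# Hypothetical monthly submission counts, not taken from the source data.
example_counts = [5, 12, 7, 20]
example_totals = running_totals(example_counts)
print(example_totals)
```

Each entry in the result is the number of preprints posted up to and including that month, so the last entry equals the overall total.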

The distribution of all recorded downloads of bioRxiv preprints.
(a) The downloads recorded in each month, with each line representing a different year. The lines reflect the same totals as the height of the bars in Figure 2b. (b) A stacked bar plot of the downloads in each month. The height of each bar indicates the total downloads in that month. Each stacked bar shows the number of downloads in that month attributable to each category; the colors of the bars are described in the legend in Figure 1. Inset: A histogram showing the site-wide distribution of downloads per preprint, as of the end of November 2018. The median download count for a single preprint is 279, marked by the yellow dashed line. (c) The distribution of downloads per preprint, broken down by category. Each box illustrates that category’s first quartile, median, and third quartile (similar to a boxplot, but whiskers are omitted due to a long right tail in the distribution). The vertical dashed yellow line indicates the overall median downloads for all preprints. (d) Cumulative downloads over time of all preprints in each category. The top seven categories at the end of the plot (November 2018) are labeled using the same category color-coding as above.
-
Figure 2—source data 1
A list of every preprint, its bioRxiv category, and its total downloads.
- https://doi.org/10.7554/eLife.45133.021
-
Figure 2—source data 2
The number of downloads per month in each bioRxiv category, plus running totals.
- https://doi.org/10.7554/eLife.45133.022
-
Figure 2—source data 3
An Excel workbook demonstrating the formulas used to calculate the running totals in Figure 2—source data 2.
- https://doi.org/10.7554/eLife.45133.023
-
Figure 2—source data 4
The number of downloads per month overall, plus running totals.
- https://doi.org/10.7554/eLife.45133.024

The distribution of downloads that preprints accrue in their first months on bioRxiv.
https://doi.org/10.7554/eLife.45133.012
-
Figure 2—figure supplement 1—source data 1
Monthly download counts for each bioRxiv preprint for each of its first 12 months.
- https://doi.org/10.7554/eLife.45133.013

The proportion of downloads that preprints accrue in their first months on bioRxiv.
https://doi.org/10.7554/eLife.45133.014

Multiple perspectives on per-preprint download statistics.
https://doi.org/10.7554/eLife.45133.015
-
Figure 2—figure supplement 3—source data 1
The download counts for each bioRxiv preprint in its first month online.
- https://doi.org/10.7554/eLife.45133.016
-
Figure 2—figure supplement 3—source data 2
Maximum monthly download count for each bioRxiv preprint.
- https://doi.org/10.7554/eLife.45133.017
-
Figure 2—figure supplement 3—source data 3
A list of each bioRxiv preprint and how many downloads it received in 2018.
- https://doi.org/10.7554/eLife.45133.018

Total downloads per preprint, segmented by the year in which each preprint was posted.
https://doi.org/10.7554/eLife.45133.019
-
Figure 2—figure supplement 4—source data 1
A list of each bioRxiv preprint and how many downloads it received in each year it was online.
- https://doi.org/10.7554/eLife.45133.020

Characteristics of the bioRxiv preprints published in journals, across the 27 subject collections.
(a) The proportion of preprints that have been published (y-axis), broken down by the month in which the preprint was first posted (x-axis). (b) The proportion of preprints in each category that have been published elsewhere. The dashed line marks the overall proportion of bioRxiv preprints that have been published and is at the same position as the dashed line in panel 3a. (c) The number of preprints in each category that have been published in a journal.
-
Figure 3—source data 1
The number of preprints posted in each month, plus the count and proportion of those later published.
- https://doi.org/10.7554/eLife.45133.029
-
Figure 3—source data 2
The number of preprints posted in each category, plus the count and proportion of those published.
- https://doi.org/10.7554/eLife.45133.030

Observed annual publication rates and estimated range for actual publication rates.
https://doi.org/10.7554/eLife.45133.027
-
Figure 3—figure supplement 1—source data 1
Results of manual publication verification for a sample of bioRxiv preprints.
- https://doi.org/10.7554/eLife.45133.028

A stacked bar graph showing the 30 journals that have published the most bioRxiv preprints.
The bars indicate the number of preprints published by each journal, broken down by the bioRxiv categories to which the preprints were originally posted.
-
Figure 4—source data 1
The number of preprints published in each category by the 30 most prolific publishers of preprints.
- https://doi.org/10.7554/eLife.45133.033
-
Figure 4—source data 2
A table showing the proportion of published papers that were previously bioRxiv preprints, for the 30 journals that published the most bioRxiv preprints.
- https://doi.org/10.7554/eLife.45133.034
-
Figure 4—source data 3
Year-level data of the proportion of published papers that were previously bioRxiv preprints, for the 30 journals that published the most bioRxiv preprints.
- https://doi.org/10.7554/eLife.45133.035

A modified box plot (without whiskers) illustrating the median downloads of all bioRxiv preprints published in a journal.
Each box illustrates the journal’s first quartile, median, and third quartile, as in Figure 2c. Colors correspond to journal access policy as described in the legend. Inset: A scatterplot in which each point represents an academic journal, showing the relationship between the median downloads of the bioRxiv preprints published in the journal (x-axis) and its 2017 journal impact factor (y-axis). The size of each point is scaled to reflect the total number of bioRxiv preprints published by that journal. The regression line in this plot was calculated using the ‘lm’ function in the R ‘stats’ package, but all reported statistics use the Kendall rank correlation coefficient, which makes fewer assumptions about normality or homoscedasticity.
-
Figure 5—source data 1
A list of every preprint with its total download count and the journal in which it was published, if any.
- https://doi.org/10.7554/eLife.45133.037
-
Figure 5—source data 2
Journal impact factor and access status of the 30 journals that have published the most preprints.
- https://doi.org/10.7554/eLife.45133.038
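The Figure 5 legend reports statistics based on the Kendall rank correlation coefficient rather than on the fitted regression line. As an illustration of what that coefficient measures, here is a minimal pure-Python sketch of the tau-b variant, which adjusts for ties. It is an assumption that tau-b matches the variant used in the paper; the authors' analysis was done in R, and the function name and sample values below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall rank correlation (tau-b) between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    # Compare every pair of observations and classify it.
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx = x1 - x2
        dy = y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every term
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denominator


# Hypothetical values for four journals, not taken from the paper:
# median preprint downloads and journal impact factor.
median_downloads = [300, 450, 380, 900]
impact_factors = [3.1, 5.2, 4.0, 12.5]
print(kendall_tau_b(median_downloads, impact_factors))
```

Because the coefficient depends only on the ordering of the values, a single journal with an extreme impact factor influences it far less than it would influence a least-squares fit.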

The interval between the date a preprint is posted to bioRxiv and the date it is first published elsewhere.
(a) A histogram showing the distribution of publication intervals. The x-axis indicates the time between preprint posting and journal publication; the y-axis indicates how many preprints fall within the limits of each bin. The yellow line indicates the median; the same data is also visualized using a boxplot above the histogram. (b) The publication intervals of preprints, broken down by the journal in which each appeared. The journals in this list are the 30 journals that have published the most total bioRxiv preprints; the plot for each journal indicates the density distribution of the preprints published by that journal, excluding any papers that were posted to bioRxiv after publication. Portions of the distributions beyond 1,000 days are not displayed.
-
Figure 6—source data 1
A list of every published preprint, the year it was first posted, the date it was published, and the interval between posting and publication, in days.
- https://doi.org/10.7554/eLife.45133.042
-
Figure 6—source data 2
A list of every preprint published in the 30 journals displayed in the figure, the journal in which it was published, and the interval between posting and publication, in days.
- https://doi.org/10.7554/eLife.45133.043
-
Figure 6—source data 3
The results of Dunn’s test, a pairwise comparison of the median publication interval of each journal in the figure.
- https://doi.org/10.7554/eLife.45133.044
Tables
Unique authors posting preprints in each year.
‘New authors’ counts authors posting preprints in that year who had never posted before; ‘Total authors’ includes researchers who may already have been counted in a previous year but are also listed as an author on a preprint posted in that year. Data for this table were pulled directly from the database. An SQL query to generate these numbers is provided in the Methods section.
Year | New authors | Total authors |
---|---|---|
2013 | 608 | 608 |
2014 | 3,873 | 4,012 |
2015 | 7,584 | 8,411 |
2016 | 21,832 | 24,699 |
2017 | 52,051 | 61,239 |
2018 | 84,339 | 106,231 |
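The distinction between ‘New authors’ and ‘Total authors’ in Table 1 can be illustrated with a short sketch. This is not the SQL query from the Methods section; it assumes a hypothetical list of `(author_id, year)` pairs, one per author per preprint, and the function name `authors_per_year` is invented for illustration.

```python
from collections import defaultdict


def authors_per_year(records):
    """Map each year to (new_authors, total_authors).

    records: iterable of (author_id, year) pairs, one per authorship.
    """
    first_year = {}              # earliest year each author posted
    by_year = defaultdict(set)   # distinct authors posting in each year
    for author, year in records:
        by_year[year].add(author)
        if author not in first_year or year < first_year[author]:
            first_year[author] = year
    result = {}
    for year in sorted(by_year):
        authors = by_year[year]
        new = sum(1 for a in authors if first_year[a] == year)
        result[year] = (new, len(authors))
    return result


# Hypothetical authorship records, not taken from the bioRxiv database.
sample = [
    ("a", 2013), ("b", 2013),
    ("a", 2014), ("c", 2014),
    ("a", 2015), ("c", 2015), ("d", 2015),
]
print(authors_per_year(sample))
```

In the first year every author is new, so the two columns match, as they do in the 2013 row of the table. In later years the gap between the columns is the number of returning authors.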
A comparison of the median downloads per preprint for bioRxiv preprints that have been published elsewhere to those that have not.
See Methods section for description of tests used.
Posted | Published | Unpublished |
---|---|---|
2017 and earlier | 465 | 414 |
Through 2018 | 394 | 208 |
-
Table 2—source data 1
A list of every preprint with its total download count, the year in which it was first posted, and whether it has been published.
- https://doi.org/10.7554/eLife.45133.040
Additional files
-
Source code 1
SQL queries and R code required to pull the data and visualize each figure.
- https://doi.org/10.7554/eLife.45133.045
-
Supplementary file 1
Detailed description of all database fields and tables.
- https://doi.org/10.7554/eLife.45133.046
-
Transparent reporting form
- https://doi.org/10.7554/eLife.45133.047