
Meta-Research: Tracking the popularity and outcomes of all bioRxiv preprints

  1. Richard J Abdill
  2. Ran Blekhman (corresponding author)
  1. University of Minnesota, United States
Feature Article
Cite this article as: eLife 2019;8:e45133 doi: 10.7554/eLife.45133
6 figures, 2 tables, 2 data sets and 3 additional files

Figures

Figure 1
Total preprints posted to bioRxiv over a 61-month period from November 2013 through November 2018.

(a) The number of preprints (y-axis) at each month (x-axis), with each category depicted as a line in a different color. Inset: The overall number of preprints on bioRxiv in each month. (b) The number of preprints posted (y-axis) in each month (x-axis) by category. The category color key is provided below the figure.

https://doi.org/10.7554/eLife.45133.002
Figure 1—source data 1

The number of submissions per month to each bioRxiv category, plus running totals.

https://doi.org/10.7554/eLife.45133.003
Figure 1—source data 2

An Excel workbook demonstrating the formulas used to calculate the running totals in Figure 1—source data 1.

https://doi.org/10.7554/eLife.45133.004
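
The running totals in the Figure 1 source data amount to a cumulative sum over the monthly submission counts. A minimal Python sketch of that arithmetic follows; the function name and the example counts are hypothetical, not taken from the authors' workbook or data.

```python
from itertools import accumulate


def running_totals(monthly_counts):
    """Return the cumulative total after each month, in order."""
    return list(accumulate(monthly_counts))


# Illustrative counts for four consecutive months (made up, not source data).
example_counts = [10, 25, 40, 5]
example_totals = running_totals(example_counts)
```

Each element of the result is the sum of all monthly counts up to and including that month, which is what a running-total column in a spreadsheet would hold.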
Figure 1—source data 3

The number of submissions per month overall, plus running totals.

https://doi.org/10.7554/eLife.45133.005
Figure 1—source data 4

The number of full-length articles published by an arbitrary selection of well-known journals in September 2018.

https://doi.org/10.7554/eLife.45133.006
Figure 1—source data 5

A table of the top 15 authors with the most preprints on bioRxiv.

https://doi.org/10.7554/eLife.45133.007
Figure 1—source data 6

A list of every author, the number of preprints for which they are listed as an author, and the number of email addresses they are associated with.

https://doi.org/10.7554/eLife.45133.008
Figure 1—source data 7

A table of the top 25 institutions with the most authors listing them as their affiliation, and how many papers have been published by those authors.

https://doi.org/10.7554/eLife.45133.009
Figure 1—source data 8

A list of every indexed institution, the number of authors associated with that institution, and the number of papers authored by those researchers.

https://doi.org/10.7554/eLife.45133.010
Figure 2 with 4 supplements
The distribution of all recorded downloads of bioRxiv preprints.

(a) The downloads recorded in each month, with each line representing a different year. The lines reflect the same totals as the height of the bars in Figure 2b. (b) A stacked bar plot of the downloads in each month. The height of each bar indicates the total downloads in that month. Each stacked bar shows the number of downloads in that month attributable to each category; the colors of the bars are described in the legend in Figure 1. Inset: A histogram showing the site-wide distribution of downloads per preprint, as of the end of November 2018. The median download count for a single preprint is 279, marked by the yellow dashed line. (c) The distribution of downloads per preprint, broken down by category. Each box illustrates that category’s first quartile, median, and third quartile (similar to a boxplot, but whiskers are omitted due to a long right tail in the distribution). The vertical dashed yellow line indicates the overall median downloads for all preprints. (d) Cumulative downloads over time of all preprints in each category. The top seven categories at the end of the plot (November 2018) are labeled using the same category color-coding as above.

https://doi.org/10.7554/eLife.45133.011
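
The boxes in panel (c) summarize each category by its first quartile, median, and third quartile. The sketch below shows one way to compute those three values in Python. It is not the authors' code: the function name is hypothetical, and the "inclusive" interpolation method is an assumption, since the caption does not state which quartile definition was used.

```python
from statistics import quantiles


def box_summary(downloads):
    """Return (first quartile, median, third quartile) for a list of counts.

    Assumes the 'inclusive' quantile method; other definitions can give
    slightly different quartiles on small samples.
    """
    q1, q2, q3 = quantiles(downloads, n=4, method="inclusive")
    return q1, q2, q3
```

Omitting whiskers, as the figure does, means only these three values are drawn, so a long right tail of heavily downloaded preprints does not stretch the plot.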
Figure 2—source data 1

A list of every preprint, its bioRxiv category, and its total downloads.

https://doi.org/10.7554/eLife.45133.021
Figure 2—source data 2

The number of downloads per month in each bioRxiv category, plus running totals.

https://doi.org/10.7554/eLife.45133.022
Figure 2—source data 3

An Excel workbook demonstrating the formulas used to calculate the running totals in Figure 2—source data 2.

https://doi.org/10.7554/eLife.45133.023
Figure 2—source data 4

The number of downloads per month overall, plus running totals.

https://doi.org/10.7554/eLife.45133.024
Figure 2—figure supplement 1
The distribution of downloads that preprints accrue in their first months on bioRxiv.
https://doi.org/10.7554/eLife.45133.012
Figure 2—figure supplement 1—source data 1

Monthly download counts for each bioRxiv preprint for each of its first 12 months.

https://doi.org/10.7554/eLife.45133.013
Figure 2—figure supplement 2
The proportion of downloads that preprints accrue in their first months on bioRxiv.
https://doi.org/10.7554/eLife.45133.014
Figure 2—figure supplement 3
Multiple perspectives on per-preprint download statistics.
https://doi.org/10.7554/eLife.45133.015
Figure 2—figure supplement 3—source data 1

The download counts for each bioRxiv preprint in its first month online.

https://doi.org/10.7554/eLife.45133.016
Figure 2—figure supplement 3—source data 2

Maximum monthly download count for each bioRxiv preprint.

https://doi.org/10.7554/eLife.45133.017
Figure 2—figure supplement 3—source data 3

A list of each bioRxiv preprint and how many downloads it received in 2018.

https://doi.org/10.7554/eLife.45133.018
Figure 2—figure supplement 4
Total downloads per preprint, segmented by the year in which each preprint was posted.
https://doi.org/10.7554/eLife.45133.019
Figure 2—figure supplement 4—source data 1

A list of each bioRxiv preprint and how many downloads it received in each year it was online.

https://doi.org/10.7554/eLife.45133.020
Figure 3 with 1 supplement
Characteristics of the bioRxiv preprints published in journals, across the 27 subject collections.

(a) The proportion of preprints that have been published (y-axis), broken down by the month in which the preprint was first posted (x-axis). (b) The proportion of preprints in each category that have been published elsewhere. The dashed line marks the overall proportion of bioRxiv preprints that have been published and is at the same position as the dashed line in panel 3a. (c) The number of preprints in each category that have been published in a journal.

https://doi.org/10.7554/eLife.45133.026
Figure 3—source data 1

The number of preprints posted in each month, plus the count and proportion of those later published.

https://doi.org/10.7554/eLife.45133.029
Figure 3—source data 2

The number of preprints posted in each category, plus the count and proportion of those published.

https://doi.org/10.7554/eLife.45133.030
Figure 3—figure supplement 1
Observed annual publication rates and estimated range for actual publication rates.
https://doi.org/10.7554/eLife.45133.027
Figure 3—figure supplement 1—source data 1

Results of manual publication verification for a sample of bioRxiv preprints.

https://doi.org/10.7554/eLife.45133.028
Figure 4
A stacked bar graph showing the 30 journals that have published the most bioRxiv preprints.

The bars indicate the number of preprints published by each journal, broken down by the bioRxiv categories to which the preprints were originally posted.

https://doi.org/10.7554/eLife.45133.031
Figure 4—source data 1

The number of preprints published in each category by the 30 most prolific publishers of preprints.

https://doi.org/10.7554/eLife.45133.033
Figure 4—source data 2

A table showing the proportion of published papers that were previously bioRxiv preprints, for the 30 journals that published the most bioRxiv preprints.

https://doi.org/10.7554/eLife.45133.034
Figure 4—source data 3

Year-level data of the proportion of published papers that were previously bioRxiv preprints, for the 30 journals that published the most bioRxiv preprints.

https://doi.org/10.7554/eLife.45133.035
Figure 5
A modified box plot (without whiskers) illustrating the median downloads of all bioRxiv preprints published in a journal.

Each box illustrates the journal’s first quartile, median, and third quartile, as in Figure 2c. Colors correspond to journal access policy as described in the legend. Inset: A scatterplot in which each point represents an academic journal, showing the relationship between the median downloads of the bioRxiv preprints published in the journal (x-axis) and its 2017 journal impact factor (y-axis). The size of each point is scaled to reflect the total number of bioRxiv preprints published by that journal. The regression line in this plot was calculated using the ‘lm’ function in the R ‘stats’ package, but all reported statistics use the Kendall rank correlation coefficient, which makes fewer assumptions about normality and homoscedasticity.

https://doi.org/10.7554/eLife.45133.036
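
The Kendall rank correlation mentioned in the caption depends only on the ordering of the values, which is why it avoids assumptions about normality. The following pure-Python sketch illustrates the idea. It assumes the tau-b variant (which corrects for ties); the caption does not say which variant was used, and the function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    # Compare every pair of observations once.
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both terms
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    # Pairs not tied in x, times pairs not tied in y.
    untied_x = concordant + discordant + ties_y_only
    untied_y = concordant + discordant + ties_x_only
    denominator = sqrt(untied_x * untied_y)
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator
```

This version compares all pairs, so it runs in quadratic time. That is adequate for a few dozen journals, but a library routine would be a better choice for large inputs.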
Figure 5—source data 1

A list of every preprint with its total download count and the journal in which it was published, if any.

https://doi.org/10.7554/eLife.45133.037
Figure 5—source data 2

Journal impact factor and access status of the 30 journals that have published the most preprints.

https://doi.org/10.7554/eLife.45133.038
Figure 6
The interval between the date a preprint is posted to bioRxiv and the date it is first published elsewhere.

(a) A histogram showing the distribution of publication intervals. The x-axis indicates the time between preprint posting and journal publication; the y-axis indicates how many preprints fall within the limits of each bin. The yellow line indicates the median; the same data is also visualized using a boxplot above the histogram. (b) The publication intervals of preprints, broken down by the journal in which each appeared. The journals in this list are the 30 journals that have published the most total bioRxiv preprints; the plot for each journal indicates the density distribution of the preprints published by that journal, excluding any papers that were posted to bioRxiv after publication. Portions of the distributions beyond 1,000 days are not displayed.

https://doi.org/10.7554/eLife.45133.041
Figure 6—source data 1

A list of every published preprint, the year it was first posted, the date it was published, and the interval between posting and publication, in days.

https://doi.org/10.7554/eLife.45133.042
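
The publication interval described above is a difference between two dates, expressed in days. A minimal Python sketch of that calculation and of the median across preprints is shown below. The function names and the pair-of-dates input format are assumptions for illustration, not the layout of the source data file.

```python
from datetime import date
from statistics import median


def publication_interval_days(posted, published):
    """Days from preprint posting to journal publication.

    The result is negative when a preprint was posted after publication.
    """
    return (published - posted).days


def median_interval(date_pairs):
    """Median interval, in days, over (posted, published) date pairs."""
    return median(publication_interval_days(p, q) for p, q in date_pairs)
```

Because negative intervals are possible, an analysis may want to filter them out first, as panel (b) of the figure does for preprints posted after publication.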
Figure 6—source data 2

A list of every preprint published in the 30 journals displayed in the figure, the journal in which it was published, and the interval between posting and publication, in days.

https://doi.org/10.7554/eLife.45133.043
Figure 6—source data 3

The results of Dunn’s test, a pairwise comparison of the median publication interval of each journal in the figure.

https://doi.org/10.7554/eLife.45133.044

Tables

Table 1
Unique authors posting preprints in each year.

‘New authors’ counts authors posting preprints in that year who had never posted before; ‘Total authors’ also includes researchers who may have been counted in a previous year but are listed as an author on a preprint posted in that year. The data for this table were pulled directly from the database. An SQL query to generate these numbers is provided in the Methods section.

https://doi.org/10.7554/eLife.45133.025
Year    New authors    Total authors
2013            608              608
2014          3,873            4,012
2015          7,584            8,411
2016         21,832           24,699
2017         52,051           61,239
2018         84,339          106,231
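
The distinction between ‘New authors’ and ‘Total authors’ can be made concrete with a short Python sketch. This is not the SQL query from the Methods section: the function name and the (author, year) input format are assumptions chosen for illustration.

```python
from collections import defaultdict


def authors_per_year(records):
    """Count new and total unique authors for each year.

    `records` is an iterable of (author_id, year) pairs, one per
    author-preprint listing; repeated pairs are counted once.
    """
    authors_by_year = defaultdict(set)
    for author, year in records:
        authors_by_year[year].add(author)

    seen_before = set()
    summary = {}
    for year in sorted(authors_by_year):
        current = authors_by_year[year]
        summary[year] = {
            "new": len(current - seen_before),  # never posted in an earlier year
            "total": len(current),              # everyone posting this year
        }
        seen_before |= current
    return summary
```

In the earliest year no author can have posted before, so the two counts are equal, which matches the 2013 row of the table.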
Table 2
A comparison of the median downloads per preprint for bioRxiv preprints that have been published elsewhere to those that have not.

See Methods section for description of tests used.

https://doi.org/10.7554/eLife.45133.039
Posted              Published    Unpublished
2017 and earlier          465            414
Through 2018              394            208
Table 2—source data 1

A list of every preprint with its total download count, the year in which it was first posted, and whether it has been published.

https://doi.org/10.7554/eLife.45133.040

Data availability

Source data for all figures have been provided in supporting files. A database snapshot containing all data collected for this study has been deposited in a Zenodo repository with DOI 10.5281/zenodo.2465688.

The following data sets were generated
  1. RJ Abdill, R Blekhman (2019)
     Zenodo
     Data from: Tracking the popularity and outcomes of all bioRxiv preprints.
     https://doi.org/10.5281/zenodo.2465688
  2. RJ Abdill, R Blekhman (2019)
     Zenodo
     Complete Rxivist dataset of scraped bioRxiv data.
     https://doi.org/10.5281/zenodo.2529922

Additional files

Source code 1

SQL queries and R code required to pull the data and visualize each figure.

https://doi.org/10.7554/eLife.45133.045
Supplementary file 1

Detailed description of all database fields and tables.

https://doi.org/10.7554/eLife.45133.046
Transparent reporting form
https://doi.org/10.7554/eLife.45133.047
