
Meta-Research: The growth of acronyms in the scientific literature

  1. Adrian Barnett (corresponding author)
  2. Zoe Doubleday
  1. School of Public Health and Social Work, Queensland University of Technology, Australia
  2. Future Industries Institute, University of South Australia, Australia
Feature Article
Cite this article as: eLife 2020;9:e60080 doi: 10.7554/eLife.60080
3 figures, 2 videos, 3 tables and 1 additional file

Figures

Figure 1 with 3 supplements
Mean proportions of acronyms in titles and abstracts over time.

The proportion of acronyms (purple line) has risen steadily over time in abstracts, both for acronyms made up of letters and/or numbers (top left) and for acronyms made up of letters only (top right). Acronyms are generally less common in titles than in abstracts; the proportion in titles increased from 1960 to 2000 and has been relatively stable since then (bottom left and right). Three-character acronyms (blue lines) are more common than two-character acronyms (brown-orange lines) and four-character acronyms (olive green lines) in both titles and abstracts. A sufficient number of abstracts only became available from 1956. The spikes in titles for acronyms of length 2+ in 1952 and 1964 are due to the relatively small number of papers in those years, with over 78,000 papers being excluded in 1964 because the title was in capitals.
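
The proportions plotted in Figure 1 are, in essence, the number of acronyms divided by the number of words in each title or abstract, averaged by year. A minimal Python sketch of that calculation follows. The acronym rule used here (at least two upper-case letters, making up at least half of the word's characters) is an assumption made for illustration and is not stated in this section; the function names `is_acronym`, `acronym_proportion` and `mean_proportion_by_year` are likewise hypothetical and are not taken from the authors' repository.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def is_acronym(word: str, letters_only: bool = False) -> bool:
    """Assumed rule: at least two upper-case letters that make up
    at least half of the word's characters."""
    if letters_only and not word.isalpha():
        return False  # e.g. "CD4" is dropped in the letters-only variant
    upper = sum(1 for ch in word if ch.isupper())
    return upper >= 2 and upper >= len(word) / 2


def acronym_proportion(text: str, letters_only: bool = False) -> float:
    """Share of words in one title or abstract that look like acronyms."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not words:
        return 0.0
    hits = sum(is_acronym(w, letters_only) for w in words)
    return hits / len(words)


def mean_proportion_by_year(
    records: Iterable[Tuple[int, str]], letters_only: bool = False
) -> Dict[int, float]:
    """Average the per-text proportion within each publication year."""
    by_year = defaultdict(list)
    for year, text in records:
        by_year[year].append(acronym_proportion(text, letters_only))
    return {year: mean(values) for year, values in sorted(by_year.items())}


if __name__ == "__main__":
    sample = [
        (2000, "DNA repair"),
        (2000, "cell growth"),
        (2001, "HIV and CD4 counts"),
    ]
    print(mean_proportion_by_year(sample))
    print(mean_proportion_by_year(sample, letters_only=True))
```

Passing `letters_only=True` mirrors the distinction between the left-hand panels (letters and/or numbers) and the right-hand panels (letters only). Under the assumed rule a capitalised ordinary word such as "In" is not counted, because it contains only one upper-case letter.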

Figure 1—figure supplement 1
Mean proportions of acronyms in titles and abstracts over time, with the 100 most popular acronyms excluded.

Each line shows the trend after excluding up to the n most popular acronyms (n = 1, ..., 100). The darkest line is for n = 1, and the lightest line is for n = 100. The number of titles and journals in the early 1950s is much smaller, hence the more erratic trend for titles in that decade.

Figure 1—figure supplement 2
Mean proportions of acronyms in titles and abstracts over time by article type.

Data for six article types (journal article, clinical trial, case report, comment, editorial, and other). The high proportion of acronyms in the 1950s and 1960s for ‘other’ is driven by a relatively large number of obituaries that include qualifications, such as FRCP (Fellow of the Royal College of Physicians) or DSO (Distinguished Service Order). The drop in the proportion of acronyms in 2019 for ‘clinical trials’ and ‘other’ may be due to a delay in papers from some journals appearing in PubMed.

Figure 1—figure supplement 3
Mean proportions of acronyms in titles over time by article type with a truncated y-axis.

Using a truncated y-axis more clearly shows the upward trend in the use of acronyms in titles for all article types over time (by reducing the influence of ‘other’ in the 1950s and 1960s; see Figure 1—figure supplement 2).

Figure 2
Estimated time to re-use of acronyms over time.

The solid line is the estimated time in years for 10% of newly coined acronyms to be re-used in the same journal. The 10% threshold was chosen based on the overall percentage of acronyms being re-used within a year. Newly coined acronyms are grouped by year. The dotted lines show the 95% confidence interval for the time to re-use, which narrows over time as the sample size increases. The general trend is of an increasing time to re-use from 1965 onwards, which indicates that acronyms are being re-used less often. The relatively slow times to re-use in the 1950s and early 1960s are likely due to the very different mix of journals at that time.

Figure 3
Average number of words in abstracts and titles over time.

The average title length increased linearly between 1950 and 2019 (left). The average length of abstracts has also increased since 1960, except for a brief reduction in the late 1970s and a short period of no change after 2000 (right). A sufficient number of abstracts only became available from 1956. Note that the y-axes in the two panels are different, and that neither starts at zero, because we are interested in the relative trend.

Videos

Video 1
The top ten acronyms in titles for every year from 1950 to 2019.
Video 2
The top ten acronyms in abstracts for every year from 1950 to 2019.

Tables

Table 1
Top 20 acronyms found in over 24 million titles and over 18 million abstracts.

How many do you recognise?

Rank | Acronym | Common meaning(s) | Count
1 | DNA | Deoxyribonucleic acid | 2,443,760
2 | CI | Confidence interval | 1,807,878
3 | IL | Interleukin/Independent living | 1,418,402
4 | HIV | Human immunodeficiency virus | 1,172,516
5 | mRNA | Messenger ribonucleic acid | 1,107,547
6 | RNA | Ribonucleic acid | 1,060,355
7 | OR | Odds ratio/Operating room | 788,522
8 | PCR | Polymerase chain reaction | 745,522
9 | CT | Computed tomography | 743,794
10 | ATP | Adenosine triphosphate | 582,838
11 | MS | Multiple sclerosis/Mass spectrometry | 567,523
12 | MRI | Magnetic resonance imaging | 504,823
13 | TNF | Tumour necrosis factor | 454,486
14 | US | United States/Ultrasound/Urinary system | 436,328
15 | SD | Standard deviation | 411,997
16 | NO | Nitric oxide | 394,777
17 | PD | Parkinson's disease/Peritoneal dialysis | 389,566
18 | HR | Heart rate/Hazard ratio | 383,027
19 | IFN | Interferon | 383,011
20 | CD4 | Cluster of differentiation antigen 4 | 363,502

Table 2
Errors made by the algorithm in random samples of titles and abstracts, the number of times that error was made, the average error percentage, and the estimated upper limit.

Error | Count | Average error (%) | Upper limit on error (%)
Wrongly excluded whole title | 1 | 0.3 | 1.6
Missed valid acronym from title | 7 | 1.2 | 2.2
Wrongly included acronym from title | 5 | 0.8 | 1.7
Missed valid acronym from abstract | 19 | 6.3 | 9.1
Wrongly included acronym from abstract | 2 | 0.7 | 2.1

Table 3
Reasons for excluding titles and abstracts, along with the numbers excluded for each reason.

Reason | Titles | Abstracts
No abstract | n/a | 7,253,053
Non-English | 4,783,569 | 4,783,569
Pre-1950 | 384,436 | 7,973
Title/abstract largely in capitals | 298,284 | 112,369
One word title/abstract | 76,303 | 201
Empty title/abstract | 149 | 9,887
Missing PubMed date | 1,510 | 1,510
Duplicate PubMed ID | 1,344 | 1,328
No article type | 109 | 0
Total excluded | 5,545,704 | 12,169,890
Total included | 24,873,372 | 18,249,091
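
The per-reason counts in Table 3 can be checked against the reported totals with a few lines of arithmetic. The Python sketch below only re-adds the numbers printed in the table; the variable and function names are hypothetical, and the excluded share it prints is a derived quantity that the table itself does not report.

```python
# Counts copied from Table 3; "No abstract" is n/a for titles and so omitted there.
title_exclusions = {
    "Non-English": 4_783_569,
    "Pre-1950": 384_436,
    "Largely in capitals": 298_284,
    "One word": 76_303,
    "Empty": 149,
    "Missing PubMed date": 1_510,
    "Duplicate PubMed ID": 1_344,
    "No article type": 109,
}
abstract_exclusions = {
    "No abstract": 7_253_053,
    "Non-English": 4_783_569,
    "Pre-1950": 7_973,
    "Largely in capitals": 112_369,
    "One word": 201,
    "Empty": 9_887,
    "Missing PubMed date": 1_510,
    "Duplicate PubMed ID": 1_328,
    "No article type": 0,
}

# Totals as reported in the last two rows of the table.
TITLES_EXCLUDED, TITLES_INCLUDED = 5_545_704, 24_873_372
ABSTRACTS_EXCLUDED, ABSTRACTS_INCLUDED = 12_169_890, 18_249_091


def excluded_share(excluded: int, included: int) -> float:
    """Fraction of all records (excluded plus included) that were excluded."""
    return excluded / (excluded + included)


title_sum = sum(title_exclusions.values())
abstract_sum = sum(abstract_exclusions.values())

print("Titles: rows sum to reported total:", title_sum == TITLES_EXCLUDED)
print("Abstracts: rows sum to reported total:", abstract_sum == ABSTRACTS_EXCLUDED)
print(f"Share of titles excluded: {excluded_share(TITLES_EXCLUDED, TITLES_INCLUDED):.1%}")
print(f"Share of abstracts excluded: {excluded_share(ABSTRACTS_EXCLUDED, ABSTRACTS_INCLUDED):.1%}")
```

Both column sums match the reported "Total excluded" rows, which also confirms how the run-together digits in each row split into a title count and an abstract count.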

Data availability

The analysis code and data needed to replicate all parts of the analyses and to generate the figures and tables are available from GitHub: https://github.com/agbarnett/acronyms (copy archived at https://github.com/elifesciences-publications/acronyms). We welcome re-use, and the repository is licensed under the terms of the MIT license. The data were originally downloaded from the PubMed Baseline Repository (March 23, 2020; https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/).
