By Nicola Adamson, Journal Development Editor
When an article is first published as a Reviewed Preprint in eLife, it includes an eLife Assessment written by the editors and reviewers. This assessment uses a common vocabulary to describe the significance of the findings and the strength of the evidence reported in the article. This approach promotes the assessment of research at the article level, rather than relying on journal names or journal-level metrics.
Following publication of the first version of a Reviewed Preprint, authors can choose to publish their article as a regular journal article (Version of Record; VOR) and mark the end of the process. However, many authors revise their work before requesting a VOR. When authors revise, the editors and reviewers reassess the work and consider updates to the eLife Assessment and the terms included in it.
In this article we report how the terms used to describe significance of findings and strength of evidence change between the first and final versions of articles, and how often authors revise their work.
Our initial dataset included all 2,918 original Reviewed Preprints published by the end of 2024. From these we excluded 516 where a VOR had not been declared by the end of 2025. We excluded a further 351 where editors selected more than one term to describe the strength of evidence, leaving a final dataset of 2,051 articles (see graphic).
The analysis reported here is an updated version of that presented at the 10th International Congress on Peer Review and Scientific Publication last September.
eLife Assessment terms
The terms for the strength of evidence can range from inadequate to exceptional. Figure 1 shows the distribution of these terms in Reviewed Preprints (orange) and VORs (blue).
The most frequently used terms for strength of evidence were solid for Reviewed Preprints (34.8%, n=714) and convincing for VORs (41.4%, n=850). Over a fifth of VORs (21.7%, n=446) were deemed to be compelling in terms of evidence, with 0.6% (n=12) being described as exceptional.
For 43.8% of articles (n=899), the strength of evidence in the VOR had improved from the initial Reviewed Preprint; for 53.7% of articles (n=1,102) it had remained the same; and for 2.4% of articles (n=50) it had decreased (usually because new information had emerged during the revision process).
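The improved/same/decreased breakdown above rests on treating the strength-of-evidence vocabulary as an ordered scale and comparing the term pair for each article across versions. A minimal sketch of that comparison is below; the scale order follows the eLife vocabulary, but the function name and the sample records are hypothetical, for illustration only.

```python
from collections import Counter

# Strength-of-evidence terms in ascending order (per the eLife vocabulary).
EVIDENCE_SCALE = ["inadequate", "incomplete", "solid",
                  "convincing", "compelling", "exceptional"]
RANK = {term: i for i, term in enumerate(EVIDENCE_SCALE)}

def classify_change(preprint_term: str, vor_term: str) -> str:
    """Compare the strength-of-evidence term between first version and VOR."""
    diff = RANK[vor_term] - RANK[preprint_term]
    if diff > 0:
        return "improved"
    if diff < 0:
        return "decreased"
    return "same"

# Hypothetical example records: (first-version term, VOR term).
articles = [("incomplete", "solid"),
            ("convincing", "convincing"),
            ("solid", "incomplete")]
counts = Counter(classify_change(p, v) for p, v in articles)
print(counts)  # one 'improved', one 'same', one 'decreased'
```

Dividing each count by the dataset size then yields percentages of the kind reported above.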
Although the term used for the strength of evidence improved in fewer than half of the articles, more than three-quarters of articles initially described as inadequate or incomplete improved (80.5%, n=33, and 76.6%, n=364, respectively). Of the small group of articles initially described as inadequate, around half improved to solid or better (53.7%, n=22). Unsurprisingly, most of the articles initially described as convincing or compelling kept the same strength of evidence term (75.6%, n=419, and 87.2%, n=232, respectively).
The data show that authors are addressing reviewer concerns and improving the weaker areas of articles during the revision process, especially in cases where the strength of evidence was initially limited.
Figure 1: Strength of evidence terms selected in first version Reviewed Preprints published by the end of 2024 (orange) and respective VORs published by the end of 2025 (blue) (n=2,051).
For the significance of findings the terms can range from useful to landmark – although editors also have the option not to include a significance term.
Figure 2 shows that in both first (orange) and final (blue) versions the most frequent terms were important (Reviewed Preprint: 41.9%, n=860; VOR: 48.8%, n=1,001), followed by valuable (Reviewed Preprint: 37.4%, n=767; VOR: 32.0%, n=657).
Figure 2: Significance of findings terms selected in first version Reviewed Preprints published by the end of 2024 (orange) and respective VORs published by the end of 2025 (blue) (n=2,051).
In most cases – and more often than the strength of evidence – these terms remained unchanged (76.1%, n=1,561). Improvements were seen in 20.3% of cases (n=417), mostly in articles initially described as useful (44.6% improved, n=125) or valuable (29.1% improved, n=223).
It is not surprising that terms describing the strength of evidence change more frequently during the review process than those for the significance of the findings. Data from new experiments are more likely to reinforce the claims of the paper, whereas the implications of the work for a given field of research are closely tied to the original research question and therefore less likely to change.
Rounds of revision
Most VORs (79.6%, n=1,633) were declared after a single round of revision, and 15.7% (n=321) after two rounds. A further 3.6% of Reviewed Preprints (n=74) had VORs declared without revision, but none of these articles were described as inadequate. The vast majority of authors prefer to revise their article in response to reviewer feedback and, in cases where authors haven’t revised, the final version will typically include author responses.
The flexibility over when to declare a final version, and which revisions to carry out, also has benefits. For example, authors can finalise their article when reviewer suggestions fall beyond the scope of its current aims, or when circumstances, such as funding constraints or a move between labs, limit the revisions they are able to conduct.
Figure 3 shows the number of revisions authors undertook, based on the initial term selected for strength of evidence. Irrespective of the initial term for strength of evidence, most articles went through one round of revision before a VOR was declared. However, articles initially described as inadequate or incomplete went through two rounds of revision more frequently (26.8% and 25.5%, respectively) than articles initially described as solid (15.1%), convincing (10.5%) or compelling (8.6%).
Figure 3: Rounds of revision carried out for articles initially described as inadequate (orange, n=41), incomplete (blue, n=475), solid (green, n=714), convincing (pink, n=554) or compelling (grey, n=266). Data excludes exceptional as this only applied to a single Reviewed Preprint, declared as a VOR without revision.
Alongside the improvements seen to the strength of evidence terms, this suggests that authors are making efforts to revise and take comments from the review process into consideration, particularly so for articles with lower initial strength of evidence terms.
We hope these data will be of interest to the community and potentially to other journals that may want to adopt assessments and/or terms to describe the significance and evidence alongside the articles they publish. We also hope this evidence will encourage further conversations on practical methods to move away from the use of journal-level metrics in research assessment processes.
Further information on the progress of the eLife Model can be read in this three-year update and in this Editorial.