Abstract
Peer reviewers judge the validity and quality of new research. These judgements would ideally be impartial, but some reviewers may give a more favourable review if they are cited in the article because the authors have recognised their work and because citations are a valuable academic currency. Reviewers sometimes request self-citations to their own work, which may be justified as reviewers should be relevant experts. However, some self-citation requests may be unethical, with reviewers exploiting the authors’ need to publish. We examined whether citations to a reviewer and self-citations influenced their peer review. We used a matched design at four journals that use open peer review and make all article versions available. Our sample included more than 37,000 peer reviews, with 13% where the reviewer was cited in the article and 6% where the reviewer included a self-citation to their work in their review. Reviewers who were cited were more likely to approve the article, with an odds ratio of 1.61 compared with reviewers who were not cited (adjusted 99.4% CI: 1.16 to 2.23). Reviewers who suggested a self-citation were much less likely to approve the article, with an odds ratio of 0.15 compared with reviewers with no self-citations (adjusted 99.4% CI: 0.08 to 0.30). Reviewers who requested and received a citation were much more likely to approve the article compared to reviewers whose request was disregarded (odds ratio of 3.5, 95% CI: 2.0 to 6.1). Some reviewers’ recommendations are dependent on whether they are cited or want to be cited. Self-citation requests can turn peer review into a transaction rather than an objective critique of the article.
Introduction
In 2024, a published peer reviewed article included this remarkable sentence: “As strongly requested by the reviewers, here we cite some references [35–47] although they are completely irrelevant to the present work” [1]. This was a rare public example of coerced citations, where a reviewer exploits the peer review process to increase their citation counts and hence further their own career [2–4]. Reviewers should be relevant experts, so some suggestions to cite their work will be appropriate. However, excessive requests for self-citations or requests to cite unrelated work are unethical [5–9]. Coerced citations can also come from editors trying to boost their journal’s ranking [10–12].
Coerced citations are reported as a common problem in peer review. In author surveys, two-thirds reported pressure from peer reviewers to cite unrelated articles [13] and 23% had experienced a reviewer who “required them to include unnecessary references to their publication(s)” [14]. Publishers have investigated whether “hundreds of researchers” have manipulated the peer review process to increase their own citations [15]. Some reviewers may be exploiting their power over authors who “have a strong incentive to […] accept all ‘suggestions’ by the referees even if one knows that they are misleading or even incorrect” [16].
As reviewers are often in the same field as the authors, they may already be cited in the article without the need for coerced citations. Reviewers who are cited may give a more favourable peer review and be more willing to overlook flaws [17, 18]. Some authors may try to exploit this using “referee baiting” [3] or “flattery citations” [19] by favourably citing a reviewer’s work.
The interactions during peer review between authors and reviewers can determine whether an article is accepted [20] and what results are included in the published version [21]. Given the importance of peer review for science, studies that examine how peer review works in practice are needed [22–26]. Here, we examine interactions between peer reviewers and authors using four journals that publish all article versions and all peer reviews. We had two research questions:
Do peer reviewers give a more or less favourable recommendation when they are cited in the article?
Do peer reviewers give a more or less favourable recommendation when their review includes a self-citation?
Methods
Journal selection
We studied journals from the publisher F1000 as their journals use open peer review with signed reviewers. F1000 journals use a publish–review–curate model [27], meaning all versions of the article are publicly available, including versions updated after peer review. This allowed us to examine the interactions between authors and reviewers throughout the peer review process. We selected four F1000 journals that each had over 100 articles. Some characteristics of the four journals are given in Table 1. Three journals were created to support funders.

Brief information about the four included journals from the publisher F1000.
The peer review process used by F1000 journals differs from most standard journals. The journals do not use academic editors, but do have in-house editors who manage articles without making editorial decisions. This means that most interactions during peer review are between authors and reviewers directly. In-house editors perform checks before the first version of the article is published, and at F1000Research this results in 40 to 50% of submissions being rejected (personal communication, F1000 staff). Up to mid-2024, the authors were asked to identify potential reviewers who were qualified experts with no competing interests [28]. Since mid-2024, reviewers have been identified in-house, although authors can still suggest reviewers.
Reviewers are asked to recommend one of three categories: Approved, Approved with reservations, and Not approved. For brevity, we refer to ‘Approved with Reservations’ as ‘Reservations’. An article is indexed once it receives two ‘Approved’ or two ‘Reservations’ and one ‘Approved’. The guidelines for recommending Approved are: “the aims and research methods are adequate; results are presented accurately, and the conclusions are justified and supported by the presented data” [29]. Peer reviewers are asked to assess the validity of an article’s content, rather than novelty or interest levels, an approach designed to combat publication bias [30].
All four journals have a peer reviewer code of conduct and state that reviewers should familiarise themselves with the ethical guidelines for peer reviewers by the Committee On Publication Ethics [31]. The journals’ guidelines for reviewers include the following: “reviewers should explicitly state their reasoning when asking authors to cite their own work”.
Data extraction
We extracted data on authors and articles from the OpenAlex database (https://openalex.org/) and directly from the four journals. OpenAlex combines scholarly data from multiple sources, including ORCID (a unique identifier for researchers), Microsoft Academic, Crossref and PubMed. A recent study compared OpenAlex with the two commonly used proprietary bibliometric databases, Web of Science and Scopus, for the years 2015 to 2022 [32]. The results were mixed, but OpenAlex had better ORCID coverage and covered more Digital Object Identifiers (DOIs), the unique identifier for publications. We accessed OpenAlex using the openalexR package [33, 34]. We used each journal’s application programming interface (API) to extract data on the articles and peer reviews. The data were extracted in four stages:
Searches were made using the APIs of the four journals to find all articles published between 1 Jan 2012 and 28 May 2025, with the start date to capture all potential articles.
For each article, the following data were downloaded in XML format:
The article’s publication date and version number
The reviewers’ names and ORCIDs (if available)
The text of all reviews and the reviewers’ recommendations
The DOIs and PMIDs (PubMed IDs) from the article’s reference list
The DOIs and PMIDs of any articles cited by the reviewers. The online peer review system at F1000 journals includes the DOI of any article cited in the review, which facilitates the identification of self-citations.
Articles were excluded if:
They were not peer reviewed or had yet to receive any reviews
The reference list was empty
The reviewers’ publication histories were collected from OpenAlex using their name, institution and ORCID (if available). Reviews were excluded if there was no record for the reviewer in OpenAlex, or if the reviewer had no published articles as there was no potential for them to be cited or request a self-citation.
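To make stage 4 concrete, the sketch below shows one possible way to collect a reviewer’s publications with the openalexR package [33, 34]. The ORCID and the article reference DOI are placeholders, and the column handling is an assumption for illustration; it is not the study’s extraction code, which is available with the data [46].

```r
# A minimal sketch of stage 4 using openalexR; the ORCID and the reference DOI
# below are placeholders, not real study data.
library(openalexR)

reviewer_orcid <- "0000-0002-1825-0097"   # hypothetical reviewer ORCID

# Fetch the reviewer's published works from OpenAlex
reviewer_works <- oa_fetch(entity = "works", author.orcid = reviewer_orcid)

# DOIs of the reviewer's publications, later matched against each article's
# reference list to flag citations to the reviewer
reviewer_dois <- tolower(reviewer_works$doi)

# Example check against a (placeholder) article reference list
article_refs <- c("https://doi.org/10.1016/j.matt.2020.04.025")
any(tolower(article_refs) %in% reviewer_dois)
```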
Study design
We used two predictor variables about the reviewer:
The number of times they were cited in the article (0, 1, 2, …).
The number of times they included self-citations to their own work in their review (0, 1, 2, …).
We fitted both predictors as linear, but reviewers may behave differently with any citation rather than a linear change, and hence we also fitted both predictors as a binary “none versus any” (0 vs 1, 2, …). We compared the linear and binary alternatives using the Akaike Information Criterion (AIC) to find the parameterisation that best fitted the data [35].
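As a minimal illustration (not the study code), the sketch below compares the linear and binary parameterisations with conditional logistic regression on simulated data; the column names and simulated values are invented.

```r
# A sketch comparing the linear and binary ("none versus any") parameterisations
# of reviewer citations, matched on article; all data are simulated.
library(survival)

set.seed(1)
reviews <- data.frame(
  article_id  = rep(1:500, each = 2),          # two reviewers per article (matched)
  n_citations = rpois(1000, lambda = 0.3),     # times the reviewer is cited
  approved    = rbinom(1000, 1, prob = 0.6)    # 1 = Approved, 0 = otherwise
)

fit_linear <- clogit(approved ~ n_citations + strata(article_id), data = reviews)
fit_binary <- clogit(approved ~ I(n_citations > 0) + strata(article_id), data = reviews)

AIC(fit_linear, fit_binary)   # smaller AIC = better trade-off of fit and complexity
```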
We matched on article and version to control for confounding by any characteristics of the article [36]; for example, the article’s topic or writing style. Hence, we compared two or more independent reviewers who considered the same article.
All analyses were stratified by article version, using the first version only or the second and subsequent versions. This is because the reviewers are unknown to the authors for the first version, but from the second version onwards, the authors will know the reviewers as the journals use signed peer reviews. This knowledge could alter the behaviour of authors and reviewers.
Any confounding by the characteristics of the articles was controlled by the matched design, but confounding by the characteristics of the reviewers remains possible [18]. We considered the potential confounders of the reviewer’s experience and reviewer’s country. More experienced reviewers will likely be cited more often (on average) and could be more or less strict in their recommendations. The reviewer’s country is a potential confounder due to large differences in citation counts by country [37] and potential differences in recommendations by country [38].
Some reviews were performed by reviewers together with co-reviewers, who were usually less experienced. Our primary analysis excluded the co-reviewers, but we included them in a sensitivity analysis where we created combined versions of the two independent variables using the sum of citations to reviewers and co-reviewers, and the sum of self-citations from the reviewers and co-reviewers.
The study design is summarised in Figure 1.

Graphical summary of the study design for research question 1 showing a dummy article and two reviews.
In the first version of the article, the reviewer Smith (blue) is cited whilst Jones (purple) is not. For the second version of the article, the authors are now aware that Jones is a reviewer and Jones has been cited. The reviewers’ recommendations are the outcome and are colour-coded as Not approved (red), Reservations (orange) and Approved (green). We tested whether citations to the reviewer in the article influenced their recommendation. The matched design means that only reviewers of the same article are compared (here, Smith and Jones) and the overall effect is estimated by aggregating over multiple matched comparisons. Research question 2 used the same design but examined self-citations in the reviews.
Statistical methods
We used conditional logistic regression to examine the associations between citations to the reviewer and their ordinal recommendation (Approved → Reservations → Not approved) while matching on the article and version [39]. Conditional logistic regression requires a binary dependent variable; hence, we fitted two related models that examined the odds of:
“Approved” compared with “Reservations” or “Not approved”.
“Approved” or “Reservations” compared with “Not approved”.
These two models tested the same hypothesis, hence we adjusted for multiple testing. Further multiplicity came from the stratification by article version and the two formulations of the predictors (linear or none versus any). With 8 (2 × 2 × 2) tests in total, we displayed all results using 99.4% confidence intervals instead of 95.0% intervals, corresponding to a 5% type I error divided by eight tests (1 − 0.05/8 = 0.99375).
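A sketch of this matched analysis on simulated data is below; the column names, the simulated recommendation probabilities and the use of the survival package’s clogit function are assumptions for illustration, not the study code.

```r
# A sketch of the matched analysis and the adjusted intervals on simulated data.
library(survival)

set.seed(2)
reviews <- data.frame(
  article_id = rep(1:500, each = 2),   # matched pairs of reviewers per article
  version    = 1,
  cited      = rbinom(1000, 1, 0.15),
  recommendation = sample(c("Approved", "Reservations", "Not approved"),
                          1000, replace = TRUE, prob = c(0.70, 0.24, 0.06))
)

# The two binary outcomes derived from the ordinal recommendation
reviews$approved     <- as.numeric(reviews$recommendation == "Approved")
reviews$not_rejected <- as.numeric(reviews$recommendation != "Not approved")

# Conditional logistic regression, matching on article and version
fit_a_vs_rn <- clogit(approved ~ cited + strata(article_id, version), data = reviews)
fit_ar_vs_n <- clogit(not_rejected ~ cited + strata(article_id, version), data = reviews)

# Multiplicity adjustment: 8 tests, so 1 - 0.05/8 = 0.99375, i.e. 99.4% intervals
ci_level <- 1 - 0.05 / 8
exp(cbind(OR = coef(fit_a_vs_rn), confint(fit_a_vs_rn, level = ci_level)))
```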
In an unplanned analysis, we examined the association between the reviewer’s recommendation and whether they included citations to work other than self-citations. This was added to examine whether the citations were used to highlight important errors and/or missing context in the article.
In sensitivity analyses, we controlled for potential confounding due to the reviewer. We used the reviewer’s number of published articles as a proxy for their experience. This association could be non-linear (for example, a diminishing effect for more experienced reviewers), so we examined six fractional polynomials of the reviewers’ number of articles and used the AIC to select the best association [40].
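Continuing the simulated ‘reviews’ data from the sketch above, the fragment below illustrates one way to choose among first-degree fractional polynomial transformations of the reviewer’s publication count using the AIC; the added n_papers column and the set of powers are assumptions.

```r
# Select a fractional polynomial for the reviewer's publication count (assumed
# column n_papers, added here with invented values) using the AIC.
reviews$n_papers <- pmax(1, rpois(nrow(reviews), lambda = 50))

powers <- c(-2, -1, -0.5, 0, 0.5, 1)                 # 0 denotes the log transform
fp <- function(x, p) if (p == 0) log(x) else x^p

aics <- sapply(powers, function(p) {
  AIC(clogit(approved ~ cited + fp(n_papers, p) + strata(article_id, version),
             data = reviews))
})
data.frame(power = powers, AIC = aics)               # smallest AIC = best fit
```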
In a second sensitivity analysis, we planned to use a frailty model for the reviewers’ countries [41]. However, this model often failed to converge, potentially because there were many countries and some countries had relatively small numbers of reviewers. Hence, we instead used a leave-one-out analysis for each of the top ten most common countries and determined if the results were noticeably different.
Outliers were not excluded. No data were missing in the analysis data set. We tested for potential bias from reviewers who were excluded because they had no data in OpenAlex. We used an elastic net with a binary dependent variable of whether the reviewer was excluded (yes/no) and potential predictors of article version, article date, reviewer’s country, and role (reviewer or co-reviewer) [42].
Text analysis
We examined how self-citations were justified in the reviews and whether the text differed according to the reviewer’s recommendation. For an initial view of self-citations, we randomly selected 20 reviews that included a self-citation and extracted the most relevant sentence.
To analyse the review text, we first extracted the 100 most commonly used words across all reviews. To standardise the text, all words were transformed into tokens, with stop-words removed and the remaining words stemmed. We then tested which of the 100 words were associated with recommending Approved versus Reservations or Not approved amongst those reviewers who included a self-citation. We chose the set of words using an elastic net with 10-fold cross-validation and selected a parsimonious model by using the lambda within one standard error of the minimum cross-validated error [43]. To get uncertainty intervals for the estimates, we fitted a Bayesian model with the set of words selected by the elastic net, using a sceptical Normal prior centred on zero to create shrinkage.
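The sketch below illustrates this text pipeline (tokenise, remove stop-words, stem, then an elastic net with 10-fold cross-validation and the lambda one-standard-error rule) on invented reviews; the packages, column names and toy vocabulary are assumptions, and the final Bayesian refit is omitted.

```r
# A sketch of the text analysis on invented reviews; not the study code.
library(dplyr)
library(tidytext)
library(SnowballC)
library(glmnet)

set.seed(3)
vocab <- c("genome", "cite", "reference", "method", "data", "results", "analysis", "please")
self_cite_reviews <- data.frame(
  review_id = 1:200,
  text      = replicate(200, paste(sample(vocab, 8, replace = TRUE), collapse = " ")),
  approved  = rbinom(200, 1, 0.5)
)

# Tokenise, remove stop-words and stem
tokens <- self_cite_reviews |>
  unnest_tokens(word, text) |>
  anti_join(stop_words, by = "word") |>
  mutate(stem = wordStem(word))

# Keep the most common stems (100 in the study; the toy vocabulary is smaller)
top_stems <- tokens |> count(stem, sort = TRUE) |> slice_head(n = 100) |> pull(stem)

# Sparse document-term matrix of counts
dtm <- tokens |>
  filter(stem %in% top_stems) |>
  count(review_id, stem) |>
  cast_sparse(review_id, stem, n)

# Elastic net with 10-fold cross-validation; lambda.1se gives a parsimonious model
y <- self_cite_reviews$approved[as.integer(rownames(dtm))]
cv_fit <- cv.glmnet(dtm, y, family = "binomial", alpha = 0.5, nfolds = 10)
coef(cv_fit, s = "lambda.1se")    # words associated with Approved vs not
```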
Sample size
We aimed for a sample size of approximately 5,000 articles and assumed that half would be the first version, giving a sample size of 2,500 articles for the analysis using the first version only [44]. In 1,000 simulations, this gave an 89.1% power to detect an odds ratio of 1.5 using conditional logistic regression for a reviewer who recommended a higher category (Approved → Reservations → Not approved) when they were cited. We assumed that 15% of articles would include a citation to the reviewer. Eighty percent of the simulated articles had two reviews, and the remaining 20% had three reviews. Based on preliminary data from two journals, we assumed that the reviewers’ recommendations would have a ratio for Approved:Reservations:Not approved of 70:24:6.
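A simplified version of this power simulation is sketched below, using a binary Approved versus not outcome rather than the full ordinal recommendation; the settings follow the text (2,500 first-version articles, 15% cited, odds ratio 1.5, 70% baseline approval), but the code is an illustrative assumption, not the pre-registered calculation.

```r
# A simplified power simulation with a binary Approved outcome.
library(survival)

set.seed(4)
simulate_once <- function(n_articles = 2500, or = 1.5) {
  n_reviews  <- ifelse(runif(n_articles) < 0.8, 2, 3)   # 80% two reviews, 20% three
  article_id <- rep(seq_len(n_articles), times = n_reviews)
  cited      <- rbinom(length(article_id), 1, 0.15)
  p_approve  <- plogis(qlogis(0.70) + log(or) * cited)  # cited reviewers: 1.5x the odds
  approved   <- rbinom(length(article_id), 1, p_approve)
  fit <- clogit(approved ~ cited + strata(article_id))
  summary(fit)$coefficients["cited", "Pr(>|z|)"] < 0.05
}

# Power = proportion of simulations detecting the effect (1,000 used in the study)
mean(replicate(100, simulate_once()))
```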
Reproducibility
Research question 1 was pre-registered using As Predicted on 20 May 2024 [44]. Research question 2 was formulated during data collection but before any data analysis and used the same study design and statistical methods as question 1.
All data extraction and analyses were conducted using R version 4.4.1 [45]. The data and R code are available on GitHub [46].
Results
A flow chart of the included reviews is shown in Supplement S.1. The final sample size was over 37,000 reviews. There were more than 2,000 articles that were not included because they had not yet been peer reviewed, especially recent articles. More than 900 reviewers did not have a record in OpenAlex and so could not be included. These missing reviewers were more likely to be from older articles and more likely to be co-reviewers.
Descriptive statistics on the included reviews are in Table 2. The reviewers were cited at least once in 13% of the articles and 6% of the reviews included a self-citation. Most reviews recommended “Approved” (54%), with only 8% recommending “Not approved”, which is low compared with many journals; however, 40 to 50% of submissions are rejected before articles are sent for peer review (personal communication, F1000 staff). The reviewers were relatively experienced, with a median of 55 published papers.

Descriptive statistics for the articles and peer reviews.
Q1 = first quartile, Q3 = third quartile.
The binary predictor for citations of “any versus none” generally fitted the data better than the linear predictor (Supplement S.2). This indicates that what matters for most reviewers is receiving any citation, with no additional linear effect for two or more citations. The following results are for the binary predictor “any versus none”, with the results using a linear predictor in Supplement S.3.
Reviewers who were cited were more likely to approve the article, but only after version 1 (Fig 2 and Table 3). If a reviewer was cited in any versions after version 1, the odds ratio for recommending Approved versus Reservations or Not approved was 1.61 (adjusted 99.4% CI 1.16 to 2.23).

Odds ratios and probabilities for reviewers giving a more or less favourable recommendation depending on whether they were cited in the article.
The top row examines Approved vs Reservations or Not approved, and the bottom row examines Approved or Reservations vs Not approved. The figures show the mean (dot) and adjusted 99.4% confidence intervals (horizontal lines). All models were split by article version. The odds ratios and probabilities show the same results but on different scales.

Odds ratios for reviewers giving a more (OR > 1) or less (OR < 1) favourable recommendation depending on whether they were cited in the article (question 1) or included self-citations to their own research (question 2). All models were split by article version.
Reviewers who requested a self-citation were much less likely to approve the article for all versions (Fig 3 and Table 3). The odds ratio for recommending Approved versus Reservations or Not approved was 0.57 (99.4% CI 0.44 to 0.73) for version 1 and strengthened to 0.15 (99.4% CI 0.08 to 0.30) for versions 2+. The less favourable recommendation was only for the approval of the article and the odds ratios for Approved or Reservations versus Not approved were much closer to 1.

Odds ratios and probabilities for reviewers giving a more or less favourable recommendation if they included citations to their own research in their review.
The top row examines Approved vs Reservations or Not approved, and the bottom row examines Approved or Reservations vs Not approved. The figures show the mean (dot) and adjusted 99.4% confidence intervals (horizontal lines). All models were split by article version. The odds ratios and probabilities show the same results but on different scales.
In an unplanned analysis, we examined the behaviour of reviewers in the first two versions of the article. We examined the 441 reviews where the reviewer was not cited in version 1 of the article and requested a self-citation in their first review. The reviewers who were then cited in version 2 recommended approval in 92% of cases, compared with 76% for reviewers who were not cited (odds ratio = 3.5, 95% CI: 2.0 to 6.1). This analysis did not use matching.
In an unplanned analysis, we examined whether the reviewers’ recommendations depended on citations to research other than their own by excluding self-citations. Reviewers who included citations in their review were much more likely not to approve the article (Figure 4), which was similar to the association with self-citations (Figure 3). However, reviewers who included citations other than self-citations were also much more likely to recommend “Not approved”, as shown by the lower odds of “Approved” or “Reservations” versus “Not approved”. This association was not seen for self-citations (Figure 3).

Odds ratios and probabilities for reviewers giving a more or less favourable recommendation depending on whether they included citations to other research in their review.
The top row examines Approved vs Reservations or Not approved, and the bottom row examines Approved or Reservations vs Not approved. The figures show the mean (dot) and adjusted 99.4% confidence intervals (horizontal lines). All models were split by article version. The odds ratios and probabilities show the same results but on different scales.
Sensitivity analyses
The odds ratios when including co-reviewers with reviewers were similar to the odds ratios when using reviewers only (Supplement S.4).
We found no evidence that the reviewers’ publication numbers or country confounded the associations between citations and recommendations (Supplement S.5).
Text analyses of reviewers’ comments
A random sample of how reviewers requested self-citations found some vague justifications (Supplement S.8); for example, “Here are some additional publications you might consider referencing”. Other sentences adhered to the publisher’s guidelines for reviewers, as specific reasoning was provided for self-citation [7]. One reviewer thanked the authors for a previous citation. Three reviews did not have a relevant sentence. One reviewer almost certainly used AI to write their review as it included the phrase “Certainly! Here are some potential review questions for the manuscript” [47]; this review included six self-citations with no justifications.
Reviewers who included a self-citation were more likely to use the words “need” and “please” when not approving the article (Figure 5). In contrast, the words “genome” and “might” were the most strongly associated with the reviewers’ approval.

Words in the reviewers’ comments that were associated with approving the article or not for reviewers who included a self-citation (n = 2,710).
The words were selected using an elastic net that started with the 100 most commonly used review words, of which 28 were retained. The estimates from the elastic net are shown as empty circles, and the mean estimates and 95% credible intervals from a Bayesian model are shown as a solid circle and horizontal line. The axis label shows the stemmed word with the most common whole word in brackets.
To examine how often open peer reviews were viewed, we took a random sample of 200 reviews from the four journals and found that, on average, they were viewed just 1.2 times per year (Supplement S.7).
Discussion
Our results show strong evidence that some reviewers have a transactional view of peer review, with their final approval dependent on citations to their work. These reviewers are exploiting the pressure on authors to “publish or perish”. Under this pressure, many authors may oblige and add the suggested citations, especially since adding another citation may only require a minor edit to their article [48]. Both sides gain from this transaction, as the authors get an indexed publication and the reviewer gets a citation.
Some reviewers who withhold their approval and request a self-citation may be justified, as they may be highlighting important errors or missing context in the article. Self-citations can be justified when the authors have made a “large scholarly oversight” [8]. However, our matched design shows that other reviewers who evaluated the same article and who did not self-cite were often willing to approve the article, suggesting that the concerns of the self-citing reviewer were specific to them and not serious enough to be noticed by other reviewers.
Reviewers including citations to other research (excluding self-citations) were also less likely to approve the article, supporting the idea that citations (self or otherwise) are highlighting important errors or missing context. However, reviewers citing other research were also much more likely to recommend “Not approved” whereas this association was not observed for self-citations. This indicates that missing citations that were not self-citations were considered more serious than missing self-citations.
Examining the context of the self-citation, we found vague or non-existent justifications (Supplement S.8), showing that some reviewers ignored the journals’ guidelines to state their reasoning when including self-citations. Reviewers who included self-citations and withheld their approval were more likely to use the words “need” and “please”, indicating use of coercive language. We also found a large increase in reviewers’ approval after they suggested and then received a citation. Some reviewers are not adhering to good scientific practice and instead are treating peer review as a chance to boost their own career.
For both research questions, the effects were stronger for the second and later versions of the article than for the first version. Reviewers may understand that authors are more willing to compromise on later versions, when they are closer to obtaining an indexed publication. Most researchers understand that the peer review system is imperfect and that they must sometimes make compromises to be successful [20, 49].
Potential improvements to peer review
Journals could give stronger guidance to reviewers and authors on coercive citations [4]. However, given the limited time for peer review and the many differences in guidelines between journals [50], most authors may not read peer review instructions. Hence, guidance alone may have limited impact.
One suggestion is that reviewers declare to editors when they have recommended citations to their own work [51]. A useful innovation would be for all reviews that contain self-citations to be automatically flagged to the editors who could check if the self-citations are justified. We are aware of one journal where this is already happening (personal communication, Benno Torgler). F1000 have recently introduced checks to prevent reviewers from publishing a review with three or more self-citations. If the reviewers continue to request more than three, then the review is examined, and if the self-citations are deemed inappropriate and the reviewer declines to remove them, then the review is declined.
Open peer review has been suggested as a way to reduce coercive citations [7, 51]. However, our results from four journals that use open review show that it is not a perfect antidote, although the problem could be worse in journals using blinded peer review. The transparency of open peer review should prevent reviewers from leaving self-serving comments; however, we found some dubious justifications for self-citations and blatant use of AI (Supplement S.6). These reviewers may have rationalised that although their words are public, they are rarely scrutinised (Supplement S.7); hence, it was worth the risk. The assumed additional quality assurance from open peer review [52] may often be absent.
A more radical change to peer review would be for reviewers to initially see a version of the article with all references blinded and no reference list; for example, “A strong association between increased cleaning and reduced hospital infection is well established [x]”. Reviewers would give an initial recommendation and comments, and would then be shown the version with the full references and asked whether they wanted to update their recommendation or provide additional comments. However, this involves more administrative work and demands more from peer reviewers. This approach could be reserved for particularly consequential or controversial articles. Some journals already require authors to partially blind their articles to maintain anonymous peer review; for example, the instructions from Taylor & Francis include blinding the authors’ names in the reference list [53].
An argument could be made for using AI to provide peer review that is unmoved by citation flattery. However, peer review is an inherently human task, and we should improve it rather than abdicating this often difficult and time-consuming task to machines [54].
Related research
Previous cross-sectional studies of self-citations in reviews found at least one self-citation in 3% of reviews at a journal that used blinded peer review [17], 12% at another journal with blinded peer review [55], and 12% at a journal that used open peer review [56]. A related study found that 15% of reviews included a self-citation and that self-citations were most frequent when the reviewer recommended “major revisions” [57]. These figures are comparable with the 6% found here and indicate that most reviews do not include self-citations.
Previous surveys estimated that 14% of authors had experienced a coercive citation request from an editor [58], and 7% and 23% had experienced coercive citation pressure from a reviewer [14, 59]. The frequency with which researchers interact with peer review means that many will encounter coercive citations at some point in their career.
A study of conference submissions estimated that reviewers who were cited gave submissions much higher scores [18]. A study of journal peer review estimated that cited reviewers scored the article higher, but with potential confounding by the quality of the article [17].
A survey of authors concluded that accepting an editor’s request for citations improved the chances of being accepted [12]. Requests in later versions were more strongly associated with acquiescence, and we found a related pattern in our analysis, with reviewers who requested self-citations being much less likely to recommend approval for later versions (Figure 3). A study examining open peer review found that self-citation requests were more likely to be included than other suggested citations, indicating that many authors wanted to please the reviewer or felt pressure to do so [56].
A survey of journal editors found that only 5% objected to reviewers self-citing, and that this should be expected as reviewers are likely to have done related work [8].
A cross-sectional study found that self-citations were more likely to have no rationale compared to other citations, suggesting that they are more likely to be unwarranted [55].
Strengths and limitations
To the best of our knowledge, this is the first analysis to use a matched design when examining reviewer citations, and hence strongly control for any confounding by the characteristics of the authors or articles. We compared reviewers who examined an identical article; hence, the differences we found should be due to the reviewers.
Our models include measurement error, as some citations to the reviewers’ work will be missed by our data collection, and some captured citations will be inaccurate [60]. We performed random data checks that showed good accuracy (Supplement S.6); however, we also found valid citations that were not captured by our data extraction for conference proceedings and technical reports, which are less likely to have a DOI. This measurement error would most likely underestimate a true association, as it reduces the variance in citation counts and creates a regression dilution [61]. Our estimates will be biased if the associations between citations and reviewers’ recommendations are different for publications that do not have a DOI. Reviewers should be equally happy with any citation to their work; however, some reviewers may prefer citations to indexed articles, as these are more likely to count toward their h-indices [62].
We examined whether citing a reviewer altered their recommendation, but did not examine the sentiment of the citation [63]. Some citations would likely have been critical of the reviewer’s work and we would expect these to reduce the chances of a favourable recommendation. An analysis that included the sentiment of the citation would be useful, although previous research found that most citations are neutral or positive [63].
Our results may not be generalisable to journals that use blinded peer review or journals that use the traditional peer review model rather than the publish–review–curate model studied here. A previous study found that asking reviewers to consent to an open review had no important effect on the quality of the review or the reviewers’ recommendation [64]. Another potential difference is that the journals in our sample often asked the authors to suggest peer reviewers; however, this is relatively common in other journals [8].
We found a bias in our sample, as co-reviewers and reviewers from older articles were more likely to be excluded due to not having an OpenAlex record (Supplement S.1). We therefore lost more junior reviewers who were less likely to be cited. The percentage of reviews lost was 5% (2,026 out of 39,113), which is hopefully small enough to avoid a large bias.
Supplementary material
S.1 Included and excluded reviews

Flow chart of included reviews. ‘N’ is the number of articles and ‘n’ is the number of reviews.
The flow chart shows the loss of articles and reviews during the data collection process. More than 3,500 articles did not have reviewers as they had yet to be peer reviewed or were Faculty Reviews that are commissioned and use a different peer review model.
More than 2,000 reviews were excluded from the analyses because the reviewer did not have an OpenAlex record. We examined the potential bias in the lost reviews by comparing their characteristics with those of the retained reviews. We used a multiple logistic regression model with reviewer lost (yes/no) as the binary dependent variable and predictors of article version, article date, referee or co-referee, and reviewer’s country. We expected many of these predictors to have little effect; therefore, we used an elastic net to reduce the number of predictors [42]. We used the ‘glmnet’ package in R [43]. For the binary dependent variable, 39,455 reviews were retained and 2,082 (5%) were lost.
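A sketch of this exclusion-bias model is below, using the glmnet package [43] on invented data; the column names and factor levels are assumptions, not the study data.

```r
# A sketch of the exclusion-bias model on invented data.
library(glmnet)

set.seed(5)
all_reviews <- data.frame(
  lost         = rbinom(2000, 1, 0.05),   # 1 = reviewer had no OpenAlex record
  version      = sample(1:3, 2000, replace = TRUE),
  article_date = sample(2013:2025, 2000, replace = TRUE),
  role         = factor(sample(c("referee", "co-referee"), 2000, replace = TRUE)),
  country      = factor(sample(c("USA", "UK", "India", "Australia"), 2000, replace = TRUE))
)

# Design matrix without the intercept, then an elastic net logistic regression
x <- model.matrix(lost ~ version + article_date + role + country, data = all_reviews)[, -1]
cv_fit <- cv.glmnet(x, all_reviews$lost, family = "binomial", alpha = 0.5)
exp(coef(cv_fit, s = "lambda.1se"))   # odds ratios for the retained predictors
```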
The elastic net retained two predictors. The date of the article had an odds ratio of 1.09 per year increase, which means that more recent articles were more likely to be retained, likely because the reviewer’s information was more current. Referees were more likely to be retained compared to co-referees with an odds ratio of 1.79, likely because co-referees were often relatively junior and some may not have any publications.
S.2 Model fit

Comparing the two alternatives for the citation predictor variables using either a linear variable or a binary “any versus none” variable.
A vs R/N = Approved vs Reservations/Not approved, A/R vs N = Approve/Reservations vs Not approved.
The AIC (Akaike Information Criterion) is a trade-off of model fit and complexity. The smaller the AIC, the better the fit. Differences of 10 are considered large [35].
In most cases, the difference between the linear and binary variables was small (under 5). There were four comparisons out of 16 in which the linear variable had a smaller AIC than the binary variable and all differences were small (under 2). There were four comparisons where the AIC for the binary variable was over 10 units smaller than the linear variable, indicating a large difference in model fit. In summary, using a binary predictor variable is a generally better fit to the data than using a linear variable.
S.3 Linear results
The figure shows the estimates for the two research questions using a linear dose-response for citation counts instead of the binary predictor of any citation versus none. The strongest effect was a greatly reduced odds of “Approved” for increasing self-citations. However, these estimates should be viewed with caution, as the binary predictor generally fitted the data better (Supplement S.2).

Estimated odds ratios for using linear citations as the predictor.
S.4 Including co-reviewers
In a sensitivity analysis we included co-reviewers with reviewers. The results examining whether the reviewers gave a more favourable recommendation when cited (research question 1) were very similar.

Results with or without co-reviewers for research question 1.
Odds ratios and adjusted 99.4% confidence intervals for whether the reviewer gave a more or less favourable recommendation if they were cited. The results are shown for the combinations of predictor variables (linear or any vs none), outcome (Approved → Reservations → Not approved) and article version. The plot is designed to directly compare paired odds ratios with or without co-reviewers.
The results examining whether the reviewers gave a more favourable recommendation when they included self-citations (research question 2) were mostly very similar. The exception was two odds ratios where including co-reviewers somewhat reduced the strength of the association; this occurred for article versions 2+ when examining Approved vs Reservations or Not approved. Despite the noticeable change in the odds ratio, the interpretation remains similar in that there was a strong reduction in the odds of a favourable recommendation when the reviewers included self-citations.

Results with or without co-reviewers for research question 2.
Odds ratios and adjusted 99.4% confidence intervals for whether the reviewer gave a more or less favourable recommendation when they included a self-citation. The results are shown for the combinations of predictor variables (linear or any vs none), outcome (Approved → Reservations → Not approved) and article version. The plot is designed to directly compare paired odds ratios with or without co-reviewers.
S.5 Potential confounding by the reviewers’ characteristics
We examined two potential reviewer-level confounders: their country and their number of publications as a proxy for their seniority.
To examine confounding by the reviewers’ publication counts, we used fractional polynomials to test potentially non-linear associations [40]. For most models, the best fit (according to the AIC) was achieved using a log-transformation. There was little evidence of any confounding by the reviewers’ publication counts as the odds ratios were similar for both research questions (Figures S.5 and S.6). A fractional polynomial of −2 tended to show the largest difference compared to the odds ratios with no confounders; however, this transformation was not the best fit and the differences were relatively small.

Examining potential confounding by reviewers’ publication counts for research question 1.
Odds ratios and adjusted 99.4% confidence intervals for whether the reviewer gave a more or less favourable recommendation when they were cited. We used fractional polynomials to examine a potentially non-linear association between reviewers’ publication counts and recommendation. The results for “None” are the results without the potential confounder. The results are shown for the combinations of predictor variables (linear or any vs none), outcome (Approved → Reservations → Not approved) and article version. Results are missing when the model did not converge.
For the leave-one-country-out sensitivity analyses, the results were generally similar regardless of which country was left out. Leaving out the USA, which was the largest country, had a relatively large effect on the odds of recommending Approved or Reservations vs Not approved for versions 2+ when using the “none vs any citations” predictor, both for research question 1 (Figure S.7) and research question 2 (Figure S.8). However, neither change was substantively different from the results including all countries.

Examining potential confounding by reviewers’ publication counts for research question 2.
Odds ratios and adjusted 99.4% confidence intervals for whether the reviewer gave a more or less favourable recommendation when they included a self-citation. We used fractional polynomials to examine a potentially non-linear association between reviewers’ publication counts and recommendation. The results for “None” are the results without the potential confounder. The results are shown for the combinations of predictor variables (linear or any vs none), outcome (Approved → Reservations → Not approved) and article version. Results are missing when the model did not converge.

Leave-one-country-out sensitivity analyses for research question 1.
Odds ratios and adjusted 99.4% confidence intervals for whether the reviewer gave a more or less favourable recommendation when they were cited. The results are shown for the combinations of predictor variables (linear or any vs none), outcome (Approved → Reservations → Not approved) and article version.

Leave-one-country-out sensitivity analyses for research question 2.
Odds ratios and adjusted 99.4% confidence intervals for whether the reviewer gave a more or less favourable recommendation when they included a self-citation. The results are shown for the combinations of predictor variables (linear or any vs none), outcome (Approved → Reservations → Not approved) and article version.
S.6 Data validation
We randomly selected reviews from our analysis data and manually verified the accuracy of our automated data extraction. We checked the accuracy of:
Reviewers that were cited
Reviewers that were not cited
Reviewers that included self-citations in their review
We used a Bayesian calculation to estimate the error rates of our data extraction. We started with a vaguely informative Beta(1, 3.32) prior, which had a 90% probability that the error rate was under 0.5. This vague prior was used to exclude high error rates, which were unlikely given our testing of the code while building the data extraction. We created posterior estimates for the error rates using the observed counts of errors from the manual checks. We calculated the 90% upper limits of the posterior distributions as upper estimates of the error rates.
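This calculation can be reproduced with a conjugate Beta-Binomial update, as sketched below; the counts of manual checks and errors are placeholders rather than the study’s values.

```r
# A Beta-Binomial sketch of the error-rate calculation; counts are placeholders.
n_checked <- 50
n_errors  <- 2

# Prior Beta(1, 3.32): about a 90% probability that the error rate is under 0.5
pbeta(0.5, shape1 = 1, shape2 = 3.32)

# Conjugate update with the observed errors, then the 90% upper limit
posterior_a <- 1 + n_errors
posterior_b <- 3.32 + n_checked - n_errors
qbeta(0.90, posterior_a, posterior_b)   # upper estimate of the error rate
```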
The distributions are plotted in Figure S.9 and the error rates are shown in Table S.2. The errors are proportions, with 0 for no errors and 1 for all errors. The highest error rate was for self-citations.

Distributions of the error rates. Vaguely informative prior and posteriors for errors for not cited reviewers, cited reviewers and self-citations.
The dashed vertical lines are at Pr(error ≤ x) = 90%.

Number of errors found in our data extraction algorithm from manual checks and the estimated 90% limit for the error rate
The two errors for reviewers not being cited were for citations to a book and a conference paper that did not have a DOI. All four errors in capturing self-citations were where the number captured was fewer than the true number, for example, we extracted 1 self-citation when the true number was 3.
S.7 Views of reviews
We randomly sampled 200 reviews from our sample and collected the number of times each review had been viewed online. A histogram of view counts is shown in Figure S.10; the counts had a strong positive skew, with most reviews having 10 or fewer views. We used a Poisson model to estimate the number of views per year, accounting for the reviews’ publication dates. The mean number of views per year was 1.24 with a 95% credible interval of 1.20 to 1.28.
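A minimal sketch of this rate estimate is below, using a Poisson regression with an offset for years online on invented data; the published analysis reports a credible interval, so the exact (likely Bayesian) model is an assumption.

```r
# A sketch of the views-per-year rate using a Poisson model with an offset.
set.seed(6)
review_views <- data.frame(years_online = runif(200, min = 1, max = 10))
review_views$views <- rpois(200, lambda = 1.2 * review_views$years_online)

fit <- glm(views ~ 1, offset = log(years_online), family = poisson, data = review_views)
exp(coef(fit))      # estimated mean views per year
exp(confint(fit))   # 95% interval
```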

Histogram of online view counts of published reviews. The bins are in tens starting at [0, 10).
S.8 Self-citation examples

Example sentences that reviewers used when suggesting self-citations, using a random sample of 20 reviews.
The first column shows the number of self-citations suggested. We have removed any references to names using [xxxx]. The results are ordered by sentence length.
Acknowledgements
Thanks to all four journals for making all their data openly available and easily accessible. Thanks to Robin Blythe and staff from F1000 for providing feedback on a draft of this paper.
References
- [1] RETRACTION: Origin of the distinct site occupations of H atom in hcp Ti and Zr/Hf. International Journal of Hydrogen Energy 91:933–941. https://doi.org/10.1016/j.ijhydene.2024.10.197
- [2] Self-citations as strategic response to the use of metrics for career decisions. Research Policy 48:478–491. https://doi.org/10.1016/j.respol.2017.12.004
- [3] C.R.E.A.M: Citations Rule Everything Around Me. Matter 2:1343–1347. https://doi.org/10.1016/j.matt.2020.04.025
- [4] Cite me! Perspectives on coercive citation in reviewing. Journal of Services Marketing 38:809–815. https://doi.org/10.1108/jsm-08-2024-0387
- [5] The ethics of peer and editorial requests for self-citation of their work and journal. Medical Journal Armed Forces India 73:181–183. https://doi.org/10.1016/j.mjafi.2016.11.008
- [6] Citation manipulation. https://doi.org/10.24318/cope.2019.3.1
- [7] Reviewer-coerced citation: case report, update on journal policy and suggestions for future prevention. Bioinformatics 35:3217–3218. https://doi.org/10.1093/bioinformatics/btz071
- [8] Meta-Research: Journal policies and editors’ opinions on peer review. eLife 9:e62529. https://doi.org/10.7554/eLife.62529
- [9] The Unnoticed Issue of Coercive Citation Behavior for Authors. Publishing Research Quarterly 40:164–168. https://doi.org/10.1007/s12109-024-09994-0
- [10] Whither research integrity? Plagiarism, self-plagiarism and coercive citation in an age of research assessment. Research Policy 42:1005–1014. https://doi.org/10.1016/j.respol.2013.03.011
- [11] From Excessive Journal Self-Cites to Citation Stacking: Analysis of Journal Self-Citation Kinetics in Search for Journals, Which Boost Their Scientometric Indicators. PLOS One 11:e0153730. https://doi.org/10.1371/journal.pone.0153730
- [12] Accommodating coercion: Authors, editors, and citations. Research Policy 52:104754. https://doi.org/10.1016/j.respol.2023.104754
- [13] Two-thirds of researchers report ‘pressure to cite’ in Nature poll. Nature. https://doi.org/10.1038/d41586-019-02922-9
- [14] Perceptions of Ethical Problems with Scientific Journal Peer Review: An Exploratory Study. Science and Engineering Ethics 14:305–310. https://doi.org/10.1007/s11948-008-9059-4
- [15] Elsevier investigates hundreds of peer reviewers for manipulating citations. Nature 573:174. https://doi.org/10.1038/d41586-019-02639-9
- [16] Editorial Ruminations: Publishing Kyklos. Kyklos 62:151–160. https://doi.org/10.1111/j.1467-6435.2009.00428.x
- [17] Are Reviewers’ Scores Influenced by Citations to Their Own Work? An Analysis of Submitted Manuscripts and Peer Reviewer Reports. Annals of Emergency Medicine 67:401–406. https://doi.org/10.1016/j.annemergmed.2015.09.003
- [18] Cite-seeing and reviewing: A study on citation bias in peer review. PLOS One 18:1–16. https://doi.org/10.1371/journal.pone.0283980
- [19] Praise the bridge that carries you over: Testing the flattery citation hypothesis. Journal of the American Society for Information Science and Technology 62:807–818. https://doi.org/10.1002/asi.21503
- [20] Peer Review: A Flawed Process at the Heart of Science and Journals. Journal of the Royal Society of Medicine 99:178–182. https://doi.org/10.1177/014107680609900414
- [21] Health and medical researchers are willing to trade their results for journal prestige: results from a discrete choice experiment. Prometheus.
- [22] Bias in peer review. Journal of the American Society for Information Science and Technology 64:2–17. https://doi.org/10.1002/asi.22784
- [23] Ten considerations for open peer review. F1000Research 7:969. https://doi.org/10.12688/f1000research.15334.1
- [24] The limitations to our understanding of peer review. Research Integrity and Peer Review 5. https://doi.org/10.1186/s41073-020-00092-1
- [25] The present and future of peer review: Ideas, interventions, and evidence. Proceedings of the National Academy of Sciences 122. https://doi.org/10.1073/pnas.2401232121
- [26] From 2015 to 2023, eight years of empirical research on research integrity: a scoping review. Research Integrity and Peer Review 10. https://doi.org/10.1186/s41073-025-00163-1
- [27] Open Science: What is publish, review, curate? https://elifesciences.org/inside-elife/dc24a9cd/open-science-what-is-publish-review-curate
- [28] Finding Article Reviewers. https://f1000research.com/for-authors/tips-for-finding-referees
- [29] Guidelines For Article Reviewers. https://f1000research.com/for-referees/guidelines
- [30] Publication Bias: A Problem in Interpreting Medical Data. Journal of the Royal Statistical Society 151:419. https://doi.org/10.2307/2982993
- [31] Ethical Guidelines for Peer Reviewers. https://doi.org/10.24318/cope.2019.1.9
- [32] Reference coverage analysis of OpenAlex compared to Web of Science and Scopus. Scientometrics 130:2475–2492. https://doi.org/10.1007/s11192-025-05293-3
- [33] OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts.
- [34] openalexR: An R-Tool for Collecting Bibliometric Data from OpenAlex. The R Journal 15:167–180. https://doi.org/10.32614/RJ-2023-089
- [35] Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach.
- [36] Statistics notes: Matching. BMJ 309:1128. https://doi.org/10.1136/bmj.309.6962.1128
- [37] Leading countries in global science increasingly receive more citations than other countries doing similar research. Nature Human Behaviour 6:919–929. https://doi.org/10.1038/s41562-022-01351-5
- [38] Reviewer recommendations and editors’ decisions for a conservation journal: Is it just a crapshoot? And do Chinese authors get a fair shot? Biological Conservation 186:22–27. https://doi.org/10.1016/j.biocon.2015.02.025
- [39] Applied Logistic Regression. Wiley: Wiley Series in Probability and Statistics.
- [40] The use of fractional polynomials to model continuous risk variables in epidemiology. International Journal of Epidemiology 28:964–974. https://doi.org/10.1093/ije/28.5.964
- [41] Modeling Survival Data: Extending the Cox Model. Springer New York: Statistics for Biology and Health.
- [42] Regularization and Variable Selection Via the Elastic Net. Journal of the Royal Statistical Society Series B: Statistical Methodology 67:301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
- [43] Elastic Net Regularization Paths for All Generalized Linear Models. Journal of Statistical Software 106:1–31. https://doi.org/10.18637/jss.v106.i01
- [44] F1000Research – reviewer citation study. https://aspredicted.org/rn8vg.pdf
- [45] R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/
- [46] Code and data for the analysis of the association between citations and peer review recommendations. Zenodo. https://doi.org/10.5281/zenodo.16551814
- [47] For: COVID-19 Vaccine: Predicting Vaccine Types and Assessing Mortality Risk Through Ensemble Learning Algorithms [version 2; peer review: 2 approved, 2 approved with reservations]. https://doi.org/10.5256/f1000research.153740.r257039
- [48] The review mills, not just (self-)plagiarism in review reports, but a step further. Scientometrics 129:5805–5813. https://doi.org/10.1007/s11192-024-05125-w
- [49] The Perverse Effects of Competition on Scientists’ Work and Relationships. Science and Engineering Ethics 13:437–461. https://doi.org/10.1007/s11948-007-9042-5
- [50] How do journals of different rank instruct peer reviewers? Reviewer guidelines in the field of management. Scientometrics 122:1387–1405. https://doi.org/10.1007/s11192-019-03343-1
- [51] A solution to inappropriate self-citation via peer review. Canadian Medical Association Journal 184:1864. https://doi.org/10.1503/cmaj.120597
- [52] Survey on open peer review: Attitudes and experience amongst editors, authors and reviewers. PLOS One 12:1–28. https://doi.org/10.1371/journal.pone.0189311
- [53] Anonymous peer review: How to make your article ready for double-anonymous peer review. https://authorservices.taylorandfrancis.com/publishing-your-research/peer-review/anonymous-peer-review/
- [54] AI, peer review and the human activity of science. Nature. https://doi.org/10.1038/d41586-025-01839-w
- [55] Potentially coercive self-citation by peer reviewers: A cross-sectional study. Journal of Psychosomatic Research 78:1–6. https://doi.org/10.1016/j.jpsychores.2014.09.015
- [56] A retrospective study investigating requests for self-citation during open peer review in a general medicine journal. PLOS One 15:1–9. https://doi.org/10.1371/journal.pone.0237804
- [57] Citation gamesmanship: testing for evidence of ego bias in peer review. Scientometrics 95:851–862. https://doi.org/10.1007/s11192-012-0845-z
- [58] Authorship and citation manipulation in academic research. PLOS One 12:1–34. https://doi.org/10.1371/journal.pone.0187394
- [59] Views on the peer review system of biomedical journals: an online survey of academics from high-ranking universities. BMC Medical Research Methodology 13. https://doi.org/10.1186/1471-2288-13-74
- [60] How accurate are citations of frequently cited papers in biomedical literature? Clinical Science 135:671–681. https://doi.org/10.1042/cs20201573
- [61] Underestimation of Risk Associations Due to Regression Dilution in Long-term Follow-up of Prospective Studies. American Journal of Epidemiology 150:341–353. https://doi.org/10.1093/oxfordjournals.aje.a010013
- [62] Over-optimization of academic publishing metrics: observing Goodhart’s Law in action. GigaScience 8. https://doi.org/10.1093/gigascience/giz053
- [63] What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018. Scientometrics 121:1635–1684. https://doi.org/10.1007/s11192-019-03243-4
- [64] Effect of open peer review on quality of reviews and on reviewers’ recommendations: a randomised trial. BMJ 318:23–27. https://doi.org/10.1136/bmj.318.7175.23
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.108748. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, Adrian Barnett
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.