Assessing peer review with AI

An analysis of peer review at a major journal using OpenAI’s ChatGPT identified potential gender bias in the peer review process.

Colorful bookcase. Image credit: Maarten van den Heuvel (CC0)

Peer review is a vital step in ensuring the quality and accuracy of scientific research before publication. Experts assess research manuscripts, advise journal editors on publishing them, and provide authors with recommendations for improvement. But some scientists have raised concerns about potential biases and subjectivity in the peer review process. Author attributes, such as gender, reputation, or how prestigious their institution is, may subconsciously influence reviewers’ scores.

Studying peer review to identify potential biases is challenging. The language reviewers use is very technical, and some of their commentary may be subjective and vary from reviewer to reviewer. The emergence of OpenAI’s ChatGPT, which uses machine learning to process large amounts of information, may provide a new tool to analyze peer review for signs of bias.

Verharen demonstrated that ChatGPT can be used to analyze peer review reports and found potential indications of gender bias in scientific publishing. In the experiments, Verharen asked ChatGPT to analyze more than 500 reviews of 200 neuroscience studies published in the scientific journal Nature Communications over the past year. The experiments found no evidence that institutional reputation influenced reviews. Yet, female first authors were more likely to receive impolite comments from reviewers. Female senior authors were more likely to receive higher review scores, which may indicate they had to clear a higher bar for publication.

The experiments indicate that ChatGPT could be used to analyze peer review for fairness. Verharen suggests that reviewers might apply this tool to ensure their reviews are polite and accurate reflections of their opinions. Scientists or publishers might also use it for large-scale analyses of peer review in individual journals or in scientific publishing more widely. Journals might also use ChatGPT to assess the impact of bias-prevention interventions on review fairness.