Exploring disparities in peer review
(a) Effects of the subfield of neuroscience on sentiment (left) and politeness (right) scores. No effects were observed on sentiment (Kruskal-Wallis ANOVA, H = 2.380, P = 0.6663) or politeness (Kruskal-Wallis ANOVA, H = 8.211, P = 0.0842). n = 178, 149, 100, 20, 125 reviews per subfield.
(b) Effects of geographical location of the senior author on sentiment (left) and politeness (right) scores. No effects were observed on sentiment (Kruskal-Wallis ANOVA, H = 1.856, P = 0.3953) or politeness (Kruskal-Wallis ANOVA, H = 0.5890, P = 0.7449). n = 239, 208, 103 reviews per continent.
(c) Effects of the QS World Ranking score of the senior author’s institutional affiliation on sentiment (left) and politeness (right) scores. No effects were observed on sentiment (linear regression, R² = 0.0006, P = 0.6351) or politeness (linear regression, R² < 0.0001, P = 0.9804). n = 430 reviews.
(d) Effects of the first author’s name on sentiment (left) and politeness (right) scores. No effects were observed on sentiment (Mann-Whitney test, U = 19521, P = 0.2131), but first authors with a female name received significantly less polite reviews (Mann-Whitney test, U = 17862, P = 0.0080, Hodges-Lehmann difference of 2.5). Post-hoc tests on the data split per lowest/median/highest politeness score indicated significantly lower politeness scores for first authors with a female name for the lowest (Mann-Whitney test, U = 1987, P = 0.0103, Hodges-Lehmann difference of 5) and median (Mann-Whitney test, U = 1983, P = 0.0093, Hodges-Lehmann difference of 2.5) scores, but not for the highest score (Mann-Whitney test, U = 2279, P = 0.1607). n = 206 (F), 204 (M) reviews for top panels; n = 71 (F), 74 (M) papers for lower panel (but n = 54 (F), 53 (M) papers for median scores, because not all papers received 3 reviews).
(e) Effects of the senior author’s gender on sentiment (left) and politeness (right) scores. Women received more favorable reviews than men (Mann-Whitney test, U = 28007, P = 0.0481, Hodges-Lehmann difference of 5), but no effects were observed on politeness (Mann-Whitney test, U = 29722, P = 0.3265). Post-hoc tests on the data split per lowest/median/highest sentiment score indicated no effect of gender on the lowest (Mann-Whitney test, U = 3698, P = 0.7963) and median (Mann-Whitney test, U = 3310, P = 0.1739) sentiment scores, but the highest sentiment score was higher for women (Mann-Whitney test, U = 2852, P = 0.0072, Hodges-Lehmann difference of 5). n = 155 (F), 405 (M) reviews for top panels; n = 53 (F), 143 (M) papers for lower panel (but n = 39 (F), 102 (M) papers for median scores, because not all papers received 3 reviews). Asterisks indicate statistical significance in Mann-Whitney tests; * P < 0.05, ** P < 0.01.
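For readers unfamiliar with the two quantities reported in panels (d) and (e), the sketch below shows one common way to compute a Mann-Whitney U statistic and a two-sample Hodges-Lehmann difference (the median of all pairwise differences between the groups). The caption does not state which software was used, so this is only an illustration: the function names and the toy scores are hypothetical, and the convention for which of the two U values is reported (here, the one for the first sample) varies between packages.

```python
from itertools import product
from statistics import median


def mann_whitney_u(x, y):
    """U statistic for sample x: number of (x_i, y_j) pairs with x_i > y_j.

    Tied pairs count as 0.5. Some packages report min(U_x, U_y) instead;
    U_y can be recovered as len(x) * len(y) - U_x.
    """
    u = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            u += 1.0
        elif xi == yj:
            u += 0.5
    return u


def hodges_lehmann_difference(x, y):
    """Two-sample Hodges-Lehmann estimate: median of all differences x_i - y_j."""
    return median(xi - yj for xi, yj in product(x, y))


# Toy scores, made up purely for illustration (not data from the study).
group_a = [1, 2, 3]
group_b = [0, 1, 2]

u_a = mann_whitney_u(group_a, group_b)
shift = hodges_lehmann_difference(group_a, group_b)
```

With these toy scores, `u_a` is 7.0 and `shift` is 1, meaning a typical score in the first group sits about one point above a typical score in the second. A P value would additionally require the null distribution of U, which this sketch deliberately leaves out.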