Research: Gender bias in scholarly peer review

  1. Markus Helmer  Is a corresponding author
  2. Manuel Schottdorf
  3. Andreas Neef
  4. Demian Battaglia  Is a corresponding author
  1. Max Planck Institute for Dynamics and Self-Organization, Germany
  2. Bernstein Center for Computational Neuroscience, Germany
  3. Yale University, United States
  4. Aix-Marseille University, France

Abstract

Peer review is the cornerstone of scholarly publishing and it is essential that peer reviewers are appointed on the basis of their expertise alone. However, it is difficult to check for any bias in the peer-review process because the identity of peer reviewers generally remains confidential. Here, using public information about the identities of 9000 editors and 43000 reviewers from the Frontiers series of journals, we show that women are underrepresented in the peer-review process, that editors of both genders operate with substantial same-gender preference (homophily), and that the mechanisms of this homophily are gender-dependent. We also show that homophily will persist even if numerical parity between genders is reached, highlighting the need for increased efforts to combat subtler forms of gender bias in scholarly publishing.

https://doi.org/10.7554/eLife.21718.001

Introduction

Peer review has an important role in improving the quality of research papers. It is the “lifeblood of research in academia […] the social structure that subjects research to the critical assessment of other researchers” (Bourdieu, 1975). This structure relies on self-regulated interactions within the scientific community, in which a journal editor appoints peer reviewers with expertise in the subject of a particular manuscript to report on the quality of that manuscript and to provide recommendations for its improvement. Other attributes of the peer reviewer, such as their gender, should be irrelevant (Moss-Racusin et al., 2012; Nature, 2013). However the identities of peer reviewers and editors are usually confidential, so previous work on gender balance in the peer-review process has relied on small, monodisciplinary data sets and these studies have given partly contradictory reports (Lloyd, 1990; Gilbert et al., 1994; Budden et al., 2008; Borsuk et al., 2009; Knobloch-Westerwick et al., 2013; Larivière et al., 2013; Buckley et al., 2014; Demarest et al., 2014; Handley et al., 2015b; Fox et al., 2016).

Frontiers journals (www.frontiersin.org) differ from most journals in that they generally disclose the identities of peer reviewers and associate editors alongside each published article in an attempt to increase the transparency and quality of the publication process (Poynder, 2016). This allowed us to extract the names of associate editors, peer reviewers and authors for articles published in Frontiers journals between 2007 (when the first Frontiers journal was published) and the end of 2015. This data set included the names of more than 9000 editors, 43,000 reviewers, and 126,000 authors for about 41,000 articles published in 142 journals in Science, Health, Engineering and the Humanities and Social Sciences (see Materials and methods). This data set is one of the largest available to date, and contains at least an order of magnitude more information than most data sets used in previous studies of peer review (see Supplementary file 2 for comparison).

Analysis of this data set reveals that women are underrepresented in the peer-review process, and that editors of both genders operate with substantial same-gender preference (homophily) when appointing reviewers. Moreover, our analysis suggests that this homophilic tendency will persist even when men and women are fairly represented in the peer-review process. Our results confirm the need for increased efforts to fight against subtler forms of gender bias in scholarly publishing and not just focus on numerical under-representation alone.

Results

To assess whether our data set was representative of an active and mature research community, we created directed networks (Figure 1a1), in which individual scientists appeared as vertices, while arrows denoted interactions between them (“is appointing” in the editor-to-reviewer network, and “is editing (reviewing) a manuscript of” in the editor (reviewer)-to-author network). As a whole, the networks had an exponentially fast growth in time, with a large fraction of people participating in a connected component of the graph reaching 90% of the total network size. Furthermore, graph theoretical metrics such as shortest path length, small-world index as well as several other network properties have changed little in the 3-5 last considered years (Figure 1—figure supplement 1). Thus, peer-reviewing interactions in the Frontiers journal series gave rise to a mature, topologically stable and integrated community, even though its contributors constitute only a small subset of researchers worldwide.

Figure 1 with 2 supplements see all
Women review and author even less articles than expected from their numeric underrepresentation.

(a1) We represent peer-reviewing interactions as directed graphs, in which vertices denote scientists. In the editor-to-reviewer network every edge represents the act of an editor (source vertex) appointing a reviewer (target vertex) to review a manuscript (and the reviewer has accepted the invitation). Analogously, in the reviewer-to-author network edges represent a reviewer reviewing a manuscript of an author. (b) The development of the fraction of contributions for each gender are shown for editors, reviewers and authors. Since the start of the Frontiers journals in 2007 until 2015, women (circles) edit, review and author much less than 50% of manuscripts, as expected from their numeric underrepresentation. However, the actual numbers of reviewing and authoring contributions by women are even smaller than expected by chance, taking into account their numeric underrepresentation. This is revealed by comparison with a null hypothesis in which gender and number of contributions are assumed to be independent. To this end, we generated surrogate ensembles by shuffling the genders of scientists appearing in a given role in the network (a2). From the surrogate ensembles, we obtained 95% confidence intervals (CIs; shaded areas in b). *, **, *** over (under) the data symbols denote the data lying over (under) the 95%, 99%, 99.9% CIs. Note that for all three subnetworks, there is a noticeable, but extremely slow trend towards equity (dashed line) for the fraction of contributions. (c) The fraction of female contributors, ranked in increasing order of authoring contributions, for the 47 frontier journals, whose published articles were handled by at least 25 distinct editors. Women were underrepresented consistently across all fields and particularly severely in math-intensive disciplines.

https://doi.org/10.7554/eLife.21718.002

We then looked for signatures of gender bias and of its evolution across time in the structure of these large networks. We study first the fractions of assignments for reviewing or editing given to female or male scientists and, for comparison, we also show the fractions of author contributions. Figure 1b reveals that the fractions of authoring, reviewing and editing contributions by women — amounting to 37%, 28% and 26%, respectively, in the complete accumulated data until 2015 — are always significantly smaller than the corresponding fractions for men. The unbalance between male and female contributions thus worsens when gradually ascending through the peer-review hierarchy. Apart from a few outlier countries, this pattern was dominant worldwide (Figure 1—figure supplement 2). It was also largely present in all the considered journals when looking at them individually (Figure 1c). Overall, the number of contributions by female authors varies between about 15% (Frontiers in Neurorobotics) and 50% (Public Health), by female reviewers between about 15% (Surgery) and 50% (Public Health), and by female editors from ca. 5% (Robotics AI) to 35% (Aging Neuroscience). Globally, we observed a trend towards gender parity across time. The rates of change were, however, very slow. Linear extrapolation based on the fractions observed from 2012 to 2015 would predict that exact parity could be achieved as late as 2027 for authoring, 2034 for reviewing and 2042 for editing.

We wondered whether these lower fractions of contributions to the different roles were just due to the fact that overall there are numerically less female than male authors, reviewers and editors (39%, 30% and 28% out of all available authors, reviewers and editors, respectively, were women, closely mirroring the observed fractions of assignments). To test this hypothesis, we took the exact same network of peer-review interactions in the Frontiers journals for given, and randomly permuted gender labels among scientists of a given role (Figure 1a2). This procedure maintained the ratio of female and male scientists acting in the different roles, but destroyed all direct correlations between gender and numbers of contributions. Repeatedly drawing random genders for the scientists in the network generated a surrogate ensemble that we used to estimate the expected number of contributions in a gender-blind control network. Author and reviewing contributions by women lay significantly below the confidence intervals obtained through this permutation testing procedure since 2009 and 2011, respectively. For female editing contributions we found the same, though non-significant, trend. Thus, the mere overall smaller number of female actors cannot explain the observed unbalanced fractions of female contributions to the peer-review chain.

We then looked for possible differences over the entire distributions of the number of peer-review tasks and authoring contributions for men and women. These distributions are fat-tailed (Figure 2a-c), indicating that some individuals provided a large number of contributions to the publication chain, while a majority of scientists authored, reviewed or edited only a small number of manuscripts. Moreover, comparing the observed degree distributions to the expectations derived from the same null hypothesis used above, women had a significantly smaller than chance probability to review (and author) more than one article, while their probability to act as single-time reviewer or author exceeded the expected chance level (Figure 2e-f). In the editing role, women underrepresentation was significant only for a high number of contributions. Furthermore, we found significant deviations from chance-level expectations across the entire studied time-span (Figure 2—figure supplement 1).

Figure 2 with 1 supplement see all
Women are underrepresented in the fat tail of contributions.

A break-down of the number of individuals contributing a given number of times as editors, reviewers and authors (binned, x-axis is marking the bin edges) shows that the majority of scientists (a) edited, (b) reviewed or (c) authored (corresponding zooms for small contribution numbers are shown in e-f) only a small number of manuscripts. Chance levels (shaded) were derived from an ensemble of reference networks constructed as shown in Figure 1a. The underrepresentation of women in relation to these chance levels tends to increase towards the fat tail of the distribution, associated to the relatively few individuals that made many contributions. In the group of one-time authors or reviewers however, women are overrepresented. Time resolved distributions are shown in Figure 2—figure supplement 1.

https://doi.org/10.7554/eLife.21718.005

The differences in assignment numbers may reflect behavioral or psychological differences between the groups of male and female scientists — either intrinsic or due to sociocultural context (Moss-Racusin et al., 2012; Nature Neuroscience, 2006; Ceci et al., 2009; Ceci and Williams, 2010; Goulden et al., 2011; Ceci and Williams, 2011; Bloch, 2012; Raymond, 2013; Shen, 2013; Handley et al., 2015a). Nevertheless, assignment numbers are also ultimately influenced by the editors’ active choices. To reveal whether any bias exists in the reviewer assignment relation, we first analyzed gender correlations between directly connected pairs of nodes in the editor-to-reviewer appointment network (Figure 3a) and found a marked gender homophily bias for both male and female editor nodes. Specifically, 73% of reviewers appointed by men were also men, 33% of reviewers appointed by women were women, but, importantly, both these numbers laid above the expectations drawn from the assumption that genders were randomly distributed in the given editor-to-reviewer network topology. Similarly, in the reviewer-to-author network (Figure 3b), male (female) reviewers assessed articles authored by male (female) authors significantly more often than expected.

Editors have a same-gender preference for appointing reviewers.

(a) Female editors (orange) appoint significantly more female reviewers than expected under the gender-blind assumption (shaded area). At the same time, male editors (green) appoint less women than expected. The development of this trend over time is shown, including articles cumulatively until the indicated year. (b) Likewise, female/male reviewers review significantly more female/male-authored articles than expected. (c) Homophily is widespread across scientific fields, including those with relatively mild underrepresentation of women. We here report four example disciplinary groupings, with large numbers of contributions (from left to right, respectively, 13416, 4721, 4020, 5680) and the propensity of appointing a female reviewer depending on the editor's gender for each of these groupings. Only assignments by female neuroscience editors were not homophilic, otherwise the occurrence of same-gender preferences was general, arguing against heterogeneity between subfields as a cause for homophily in assignments. (d) Plotted here are distributions of a measure of inbreeding homophily. To control for baseline homophily at the level of a narrow local neighborhood, we measure, for each editor node, the actual number of reviewer assignments given to women and subtract the expected number, which would be observed if the considered editor appointed women with the same frequency as in his/her local vicinity. For male editors (green) the distribution is skewed towards an underrepresentation of female assignments (left-leaning), while for female editors the distribution is skewed towards an overrepresentation of female assignments. This highlights that homophily bias is detectable even at the level of the reachable narrow surrounding of each editor. (e) Histogram of the probability that an editor assigns as least as many reviews to people of the same gender as he/she actually does reveals that there's an excess of strongly inbreeding-homophilic editors (small Φhom -values) among both men and women compared to expectation (shaded area). Note that below Φhom < 0.1 there are only few strongly homophilic female editors. For male editors, significant homophily extends through many more editors until Φhom < 0.6. (f) Using all data until 2015, the probability that a women is appointed is above expectation (shaded areas) only for female editors and only when all or all but the most extremely inbreeding-homophilic editors are included in the analysis.

https://doi.org/10.7554/eLife.21718.007

While these findings seem to point at homophily created by choices, they might also stem from “baseline” homophily (McPherson et al., 2001), i.e. subtle but unavoidable bias caused by disproportions in the number of reachable male and female nodes due to heterogeneous network structure. We first checked for the influence of local subnetwork structure on apparent gender bias by looking at different scientific fields, including those with relatively mild underrepresentation of women, and found homophily widespread across disciplines (Figure 3c). Second, a more detailed analysis of inter-node gender correlations in the editor-to-reviewer appointment network detected a clear tendency to gender homophily already at the level of the narrow neighborhood of individual nodes (Figure 3d). Specifically, to control for baseline homophily at the level of a narrow local neighborhood, we measured, for each editor node, the actual number of reviewer assignments given to women. We then subtracted from this number its chance expectation, derived individually for every node from the frequency of locally reachable female reviewers, i.e. reviewers situated at most five links away (which is a short distance relative to the average shortest path length of 12 steps for the editor-to-reviewer network, cf Figure 1—figure supplement 1e). Even at this local neighborhood level, we continued to find that male (female) editors generally appointed female reviewers at a lower (higher) rate than expected. Both independent analyses – by topic or localized – validate the existence of a so-called “inbreeding” homophily, i.e. an active preference to connect with same-gender network nodes, on top of “baseline” homophily (McPherson et al., 2001).

Finally, we wondered whether the observed inbreeding homophily in the network was due to the presence of a few strongly homophilic editors or whether, alternatively, homophilic attachment was a feature shared by most editors. To that end, we defined an index of inbreeding homophily at the local level of each editor node. For each considered editor node, we first evaluated the number k of connected same-gender reviewers. We then evaluated the probability 0 ≤ Φhom ≤ 1 that k (or more) homophilic connections could arise by baseline homophily only, taking into account the editor-specific basin of locally reachable male and female reviewers (defined as for Figure 3d). Such Φhom can serve as an index tracking the strength of inbreeding homophily in shaping the actual reviewer appointments by an editor. Large values of Φhom approaching 1 indicate that the observed gender homophilic choices of a given editor are plausibly just due to “passive” baseline homophily. In contrast, small values of Φhom approaching 0 hint at a stronger tendency to “active” – consciously or unconsciously, see Discussion – inbreeding homophily. Figure 3e shows the histograms of the index Φhom for male and female editors, compared with expectations from gender-shuffled networks. For male editors, most histogram bins for Φhom < 0.6 displayed node counts significantly larger than gender-shuffled estimations. The histogram of Φhom for female editors showed much fewer significant overrepresentation and most of them at very low values of Φhom, however it remained compatible with gender-shuffled estimations for most of the Φhom range.

These different distributions of inbreeding homophilic tendencies resulted in a gender-dependent impact of the reviewer-appointment choices of male and female editors in determining the overall number of female reviewer appointments. To determine this impact we pruned links originating from editors with inbreeding homophily index Φhom below a growing threshold Φthr (retaining only editors whose Φhom satisfies 0 ≤ ΦthrΦhom ≤ 1) and we did so separately for male and female editors (Figure 3f). After pruning the most homophilic male or female editors, we evaluated the new resulting probabilities of appointing a female reviewer. On the one hand, we found that it was enough to remove the few most homophilic female editors with the lowest values of Φhom from the network, to bring the probability for a female editor to appoint a female reviewer back to chance-level. On the other hand, the probability for a male editor to appoint a female reviewer increased only very slowly by pruning more and more male editors. In particular, it remained significantly below chance expectations for all the considered thresholds for inclusion, 0 ≤ Φthr ≤ 0.5. This means that the overall smaller-than-chance probability of appointing female reviewers for male editors is due to inbreeding homophilic tendencies that are widespread among male editors, although at varying degrees of strength. In contrast, the overall larger-than-chance probability to appoint female reviewers for female editors is driven by the action of just a small number of strongly homophilic female editors, with most other female editors showing only “passive” baseline homophily.

Discussion

In this study, we found that apart from a few outliers depending on country and discipline, women are underrepresented in the scientific community with a very slow trend towards balance, which is consistent with earlier studies (Larivière et al., 2013; Fox et al., 2016; Topaz and Sen, 2016; Lerback and Hanson, 2017; Nature Neuroscience, 2006; Shen, 2013; Nature, 2012). In addition, we found that women contribute to the system-relevant peer-reviewing chain even less than expected by their numerical underrepresentation, revealing novel and subtler forms of bias than numeric disproportion alone. We reported clear evidence for homophily beyond the expected baseline levels in both genders (Figure 3) using a very large trans-disciplinary data set that allowed us to clarify a previously ambiguous picture (Lloyd, 1990; Gilbert et al., 1994; Borsuk et al., 2009; Buckley et al., 2014; Fox et al., 2016). This network-level inbreeding homophily is driven by a large fraction of male editors, together with only a few highly homophilic female editors.

Evolution of participation rates by gender and causes for remaining inequity

To start our discussion on a positive note, we found that the participation of women in science, at least in terms of their numerical representation, has increased during the last years, which is consistent with other studies. The number of female doctoral recipients at US institutions increased by, on average, 0.1% - 0.6% per year between 2005 and 2015, depending on broad field of study (National Science Foundation, 2016). Ley and Hamilton (2008) reported that the number of fraction of women in medical schools increased by, on average, 0.6% to 0.8% per year (depending on the scientists' rank) between 1996 and 2007. Percentages of women professors has increased at a rate of 0.5%-1% per year in the European Union (ETAN Expert Working Group on Women and Science,ETAN Expert Working Group on Women and Science, 2000). The fraction of publishing female scientists in Germany increased by, on average 0.7% from 2010 to 2014, it is now 30.9% (Pan and Kalinaki, 2015). Fox et al. (2016) found that the number of selected female reviewers in Functional Ecology increased by, on average, 0.8% per year between 2004 and 2014, while, notably, the number of female editors increased by, on average, 3.8% per year. Caplar et al. (2016) noted that the number of female first authors of astronomy articles increased by about 0.4% per year between 1960 and 2015. In the Frontiers series of journals, we found that the number of contributions by female authors, reviewers and editors increased by, on average, 1.1% / year, 1.2% / year and 0.9% / year between 2012 and 2015, respectively, similar to the numbers above.

What could be the reasons for the remaining inequity? It has been argued that underrepresentation of women in science may be due to conscious career choices by female researchers (Ceci et al., 2009; Ceci and Williams, 2010, 2011), even if it is not clear to which extent these choices are really free or rather constrained by society. Previous studies reported that, measured by their number of publications, women are generally less productive than men (Cole and Zuckerman, 1984; Zuckerman, 1991; Long and Fox, 1995; Xie and Shauman, 1998; Pan and Kalinaki, 2015; Caplar et al., 2016) and it has been suggested (Xie and Shauman, 1998) that this might be due to personal characteristics, structural positions, and marital status. Moreover, the fraction of female scientists decreases with rank or age (ETAN Expert Working Group on Women and Science, 2000; Ley and Hamilton, 2008; Goulden et al., 2011) and this shorter career length might contribute to the drop of female-to-male ratio for a high number of contributions. Nevertheless, women who persevere longer in their career despite obstacles are highly performing. While the productivity of young publishing female scientists in Germany was 10% lower than that of their male counterparts, the discrepancy reduced to just 3% for more senior scientists (Pan and Kalinaki, 2015). Also, it has been reported that women with children are not less productive than those without (Hamovitch and Morgenstern, 1977; Cole, 1979; Cole and Zuckerman, 1987), although young children might decrease productivity (Kyvik, 1990; Kyvik and Teigen, 1996). The low number of women among senior scientists might be particularly detrimental for a gender-neutral evaluation of scientific work, as the implicit association of “male” and “science” is strongest in the group of 40-65 year olds (Nosek et al., 2007). Moreover, declining an invitation to review is often due to a lack of time (Tite and Schroter, 2007) and it is possible that female scientists spend more time with duties beyond research (e.g. teaching, mentoring, service; ETAN Expert Working Group on Women and Science, 2000; Knapp, 2005; Misra et al., 2011). On the other hand, a compensating factor seems to be that female editors, in contrast to authors, have been reported to be more productive than male editors (Gilbert et al., 1994). Interestingly, men and women who are invited to review a manuscript have very similar propensities to accept the invitation (Fox et al., 2016; Lerback and Hanson, 2017), suggesting: (1) that simply increasing the number of invitations to female reviewers would have a direct and proportional effect; and, (2) that the low number of female reviews in our data is caused in part by a lower number of invitations.

The underrepresentation and discrimination of women in the scientific community is a problem that will not solve by itself, given the pervasive, generally unconscious nature of gender bias. Women have been reported to be less likely to be hired (Moss-Racusin et al., 2012), to receive a grant (Wennerås and Wold, 1997), and to receive higher salaries (Shen, 2013). Still today, most people implicitly associate science with men, and liberal arts with women more than the other way round (Nosek et al., 2007), and this tendency, for both men and women, is apparent from a very young age (del Río and Strasser, 2013; Bian et al., 2017) and possibly reinforced by social dynamics in school education (American Psychological association, 2007; Duru-Bellat, 2008). Beyond that, men are more reluctant than women to believe that such a bias exists (Handley et al., 2015a), manifesting lack of interest for the problem (“negligence”) or, even, consciously assuming that gender discrimination cannot be avoided (“philosophical acceptance”) more often than females (Parodi, 2011).

How representative are the Frontiers journals?

The data analyzed here comprises a wide spectrum of scientific topics and the findings should generalize. However, Frontiers articles are unusual insofar as they undergo open peer review, whereas the identity of reviewers is not revealed in most other journals. Ambiguous reports exist whether open-peer review (as opposed to single- or double-blind peer-review) affects potential reviewers' willingness to assess a paper (Nature Neuroscience, 1999, van Rooyen et al., 1999; Ware, 2008; Baggs et al., 2008). In particular, a primary concern in disclosing reviewer’s identity is the possibility that a rejected author may also become a prospective employer for the reviewer and hence a possible reluctance of peers in more vulnerable positions to accept an invitation to review. While it is conceivable that assignment rejection due to non-anonymity is more likely for early career scientists, we do not see any reason for a direct effect of gender and such an effect has not been reported to the best of our knowledge.

Then, how does the population of scientists contributing to the Frontiers series of journals compare to other scientometric populations? First, we compared our authorship data to that of Larivière et al. (2013) who analyzed gender bias in articles from a wide range of journals that were published between 2008 and 2012, comprising about 3 million authors. While no analysis of peer review is performed therein, this study comprises an order of magnitude more authors than in our study. It can therefore serve as a benchmark for gender-composition among authors. They reported that 42% of authors in their analyzed scientific articles were women, whereas we found that number to be 39% in the Frontiers journals. Given uncertainties in determining a person's gender these numbers are comparable. Broken down by country, we find overall similar fractions of female authors, although, for some countries, the relative deviations can rise up to 29% (Supplementary file 1). However, small sample sizes, together with, possibly, a varying popularity of the Frontiers journals in different countries, might contribute to such deviations.

Second, not much data was available concerning gender bias among reviewers and editors, until very recently. Many previous studies (cf. Supplementary file 2) were self-diagnoses performed by editorial boards of the corresponding journal and, as a consequence, tended to be based on mono-disciplinary data of relatively small sample size. Larger sample sizes, but limited to editors, were considered in an analysis of the composition of editorial boards of 435 mathematical journals (Topaz and Sen, 2016). Only 9% of editors were women. Other reported numbers for the fraction of women editors in journals of different disciplines range from 38% to 54% (cf. Supplementary file 2). These numbers lie at the lower and upper end of the female editor fractions across the Frontiers journals, ranging between 6% (Frontiers in Robotics and AI) and 37% (Frontiers in Aging Neuroscience), with an average of 28%. Concerning female reviewers or female reviewer appointments, fractions reported in the literature range between 16% and 48% (cf. Supplementary file 2), to be compared with the range between 11% and 48% for the Frontiers journals with an average of 30%. Concerning female authors, Pan and Kalinaki (2014) report fractions ranging from 15% in computer science to 57% in veterinary science. These numbers are once again comparable to female author fractions in Frontiers journals, ranging from 17% (Neurorobotics) to 48% (Public Health). Overall, our study provides thus a global account on the prevalence of women among editors and reviewers and ranks previous reports in a continuum of field-specific participation numbers. Importantly, our data is consistent with these diverse reports, highlighting that the Frontiers peer-review networks are well representative of widespread patterns.

Our work calls for a detailed comparison with another recently published report about peer reviewer assignments in 20 journals of the American Geophysical Union (AGU), based on a slightly smaller sample size compared to ours (Lerback and Hanson, 2017). This study reports information about aspects that our study could not have access to, breaking down women’s underrepresentation by age and showing that the decline rate for invited reviews is only slightly smaller for women than men. Overall, relative fractions of female participation reported by this study are compatible with numbers we found for the journal Frontiers in Earth Science, with e.g. a matching female reviewer appointments fraction close to 20%, suggesting that women play a larger role in other fields compared to that report (cf. Figure 1c). For the AGU journals the authors conclude that editors, especially male ones, appoint too few female reviewers. Male editors’ behavior in that study thus agrees with our findings for the entirety of Frontiers journals, while we find an opposite trend for female editors. We note here that Lerback and Hanson (2017) reached their conclusion of women’s underrepresentation by comparing actual reviewer appointment numbers to the fraction of female first authors. This comparison, however, might be questionable, because reviewers in low age groups are rarely invited by editors (3% of times) whereas first authors tend to be young. To account for such differences, we determined expectation levels by gender shuffling among the reviewers and editors in the fixed network of actual reviewer-editor interactions and find that the fraction of female authors (the expectation value that Lerback and Hanson used) is much higher than the expected number of female reviewer contributions (our expectation value; cf. Figures 1b and 3a). For that reason, Lerback and Hanson may have quantitatively overestimated the female editors’ bias against female reviewer appointments. Still, despite this overestimation, even Lerback and Hanson reported female editors’ preference for female reviewers for certain age classes (although not commented upon).

Homophily in society and science

The phenomenon of gender homophily in peer-reviewing networks have already been described, but these previous reports have reached ambiguous conclusions. Lloyd (1990) found that female reviewers accepted female-authored papers at a higher rate than those of male authors, whereas male reviewers did not show such a bias. In contrast, Borsuk et al. (2009) reported that male and female reviewers were equally likely to reject a female-authored paper. The probability that a female editor appoints a female reviewer was reported to be 31%-33%, whereas male editors appointed female reviewers in 22%-27% of cases (Gilbert et al., 1994; Buckley et al., 2014; Fox et al., 2016). Here, for the whole spectrum of Frontiers journals we found these numbers to be similar: 33% and 27%, respectively. However, our study concludes for the existence of significant inbreeding homophily in the reviewer appointing behavior for both male and female editors, and does so based on a pluri- rather than mono-disciplinary data set, substantially larger than all previous accounts of homophily in peer review.

Socrates, in Plato’s Phaedrus, already asserted that: “similarity begets friendship”. Homophily – or “attraction for the similar”, not only limited to the gender attribute – is ubiquitous in social networks. Since the classic studies of Park and Burgess (1921) and Lazarsfeld and Merton (1954), gender homophily has been found in groups of playing children (Bott, 1928; Shrum et al., 1988) and adult friends (Verbrugge, 1977) and is also present in work environments (Brass, 1985; Bielby and Baron, 1986; Ibarra, 1992) and voluntary organizations (Popielarz, 1999). Since focused interactions between co-workers favor the formation of relations, operation in already homophilic environments will lead to an amplification of homophily (Feld, 1981; Feld, 1984). In particular, homophilic styles of professional interaction with peers may persist since the time in which they were (un-)consciously learned in homophilic school environments (Vinsonneau, 1999).

Importantly, even a slight homophily can influence and alter the way in which information spreads (Yava and Yucel, 2014) and opinions form through the social network of interactions, leading to the emergence of “dead-end” cultural niches (Mark, 2003). Homophilic groups indeed tend to vote together when asked to decide for something (Caldeira and Patterson, 1987) and have similar prospective evaluations, a same mindset (Galaskiewicz, 1985). While homophily can in principle be put to good use, as for instance in the education about good health practices (Centola, 2011), the uncontrolled effects of homophily may constitute a threat to the universalism of the peer-review system, and thus to science.

Gender-specific mechanisms of homophily

We observed very different patterns of homophily for male and female editors, with a widespread homophily across men, while dominated by very few highly homophilic editors for women. After removal of their contribution, homophily became insignificant (cf. Figure 3e,f). This suggests that there is only baseline homophily for the majority of female editors and most assignments are gender-blind (for instance in the neuroscience community, cf. Figure 3c). Differences between men’s and women’s homophily patterns are classically known, finding their root in different styles of social network construction. For instance, in situations where a mutual friendship exists between A and B a friendship initiation with C tends to be reciprocated by boys, but not by girls (Eder and Hallinan, 1978). Such differences in attachment strategies tend to generate gender-segregated worlds for children to preadolescents in which girls evolve in small homogeneous groups and boys form larger but more heterogeneous cliques, with boundaries made looser only later by romantic ties (Shrum et al., 1988). Professional social networks of men are more homophilic than women’s, especially in work environments in which men are dominant (Brass, 1985; Ibarra, 1992). Another source of asymmetry may be that both men and women tend to form connection routes passing through a male node when reaching toward distant domains (Aldrich, 1989).

One could speculate that other factors might contribute, like friendship or (perceived) status, competency and reputation. These factors might, in turn, be partly depending on gender, e.g. through implicit biases (Nosek et al., 2007; Merton, 1968; Paludi and Bauer, 1983). Multiple categories of relationships were analyzed, for instance, by Ibarra (1992) who reported that, in a company setting, men named mostly men as points of contact for five different business-relationship categories, whereas for women the preferred gender was category-dependent. A similar situation could be at work here: one could speculate that a set of other, hidden, variables influence reviewer appointment decisions, and that these variables have a different importance for male and female editors. Determining which factors are most important for male and female editors in the choice of the reviewer and how these factors are or are not, in principle, related to gender, might thus aid in reducing homophily in the peer-review system.

Our finding of strongly homophilic “topology-organizer” female editors is reminiscent of the notion of “femocrat” introduced in political studies, referring to the role played by isolated feminists who, after having managed to integrate inside men-dominated decisional organisms, provide a bridge to the spheres of power for the requests of activists outside of them (Yeatman, 1990). Now, while the active engagement of these femocrats is very useful in pushing forward technocratic (i.e. top-down) solutions aiming at reducing gender discriminations, especially at an early stage, on the long-term, the effects of their action may be precarious. Indeed political experiences have shown that when an external event reduces the influence of these isolated driver women, the situation can quickly deteriorate again (Outshoorn, 2005), aggravated by the suspicious look toward femocrats held by formerly dominant men or, paradoxically, even women, finding them too prone to compromise or too aggressive (Outshoorn and Kantola, 2007). It is thus important to devise strategies ‘healing’ network topology in depth, and in a bottom-up fashion, via pervasive education campaigns targeted to the deciders (Sainsbury, 1994), in our case chiefly the editors. Such strategies are required to protect the acquirements of top-down actions against gender discriminations: increasing the number of women will not be enough to overcome gender bias (Isbell et al., 2012; Avin et al., 2015).

Conclusions

Ideally, all scientific interactions are gender-blind. A scientist's status and the provision of resources to scientists should not be influenced by gender but solely depend on the value of the scientific contributions. Access to the publication systems is a critical determinant of a scientist’s success. Accordingly, reviewers and editors, the gatekeepers of the scientific canon, should be particularly sensitive to base their judgment solely on the merit of scientific work. This merit, however, is difficult to determine and any assessment is necessarily influenced by the assessor's view of the field, including his or her personal position in the network of colleagues and the interactions with them (Mulkay, 1979; Cole, 1992).

Inbreeding homophily, an increased affinity between persons with similar attributes, appears to be a sociological, population-level trait of human societies. It is only natural, thus, that we find gender homophily in interactions between editors, reviewers and authors. Nonetheless, this inbreeding homophily is damaging to female scientists, whose work ends up being overlooked, due to unconscious negative bias. The phenomenon of inbreeding homophily is also likely not restricted to the peer review of manuscripts, so it needs to be taken into account for grant evaluation, hiring, or when designing mentoring programs. Importantly, it is likely to persist even when numerical balance between genders is achieved (Isbell et al., 2012). Altogether, inbreeding homophily negatively affects science as a whole because a stronger involvement of women would increase the quality of scientific output (Merton, 1973; Woolley et al., 2010; Nature, 2013; Campbell et al., 2013). Consequently, all scientists should wholeheartedly support the endeavor to remove gender bias from science - but how could that be achieved?

Initiatives to remove gender-based inequality can roughly be divided into two different categories. On the one hand, “gender mainstreaming” (Special Adviser on Gender Issues and Advancement of Women, 2002) promotes the consideration among actors at all levels of every action’s and policy’s implications on women and men and is geared towards creating long lasting “bottom-up” changes. On the other hand, fast progress could be attempted through “top-down” implementation of technocratic instruments such as quota. This politically issued ‘state feminism’ (Mazur and McBride Stetson, 1995), is suboptimal in that it might even “provide an alibi” for not modifying attitudes in depth (Squires, 2008). As inbreeding homophily is an expression of a state of mind it is likely little amenable to change by externally enforced measures. Raising awareness, in comparison, seems to be the most promising route. The goal should be to motivate all scientific actors to “integrate thinking about gender discrimination in every decisional process” (translated from Woodward, 2008). Educative actions should be conducted with tact, not based uniquely on inducing feeling of guilt and shame, in order not to be perceived as annoying (Woodward, 2003). At the same time, existing formal actions to reduce bias should be upheld.

In the field of peer review two more specific strategies are available to reduce bias: blind review and automated editorial management. However, both strategies are of limited acceptance and use. First of all, removing the authors’ names is often not sufficiently blinding. References to the authors’ previous publications or to the approving ethics committee all but spell out the authors. Second, while removal of the authors’ names does indeed blind the reviewers to all irrelevant attributes, it also blinds them to relevant meta-data, such as the scientific experience of the authors, which might be considered as relevant by many reviewers. In an attempt to assist editors of Frontiers journals, keyword-based reviewer suggestions are automatically provided to them but the editors remain free to make their own choices. While these gender-blind automated suggestions could already contribute to an assignment that is less influenced by homophily, an editorial management software is also the ideal platform to routinely direct the editor’s attention to the issue of homophily. It could display statistics similar to our Figure 3 and encourage non-homophilic choices of reviewers. Such a strategy maintains full editorial freedom and could easily be evaluated, either internally or, in the case of open review as in the Frontiers journals, through analysis of the publicly available data.

Given how engrained homophily is in our nature, the path towards a gender-blind science will be arduous. Yet, with the joined effort of the scientific community to overcome partisanship and discrimination, a merit-based system with equal opportunities for all scientists might just be within reach. After all, which social enterprise would be more apt to follow ratio over instinct than science?

Materials and methods

Collection and parsing of data

Request a detailed protocol

All article data were exclusively obtained from the publicly available articles web pages from the Frontiers Journal Series (RRID:SCR_007214), which was listed (at the date of last data download in March 2016) on: http://www.frontiersin.org/SearchIndexFiles/Index_Articles.aspx, as well as the associated XML file if the HTML code of the article web page contained a corresponding reference. Subsequently, articles' metadata (article id, authors, reviewers, editors, publication date, etc.) were extracted from the XML files and the web pages. All gathered personal identity information was deleted after inference of individual genders (see later), resulting thus in a fully anonymized data set. In total, we analyzed 41’100 articles published before January 1st, 2016, covering 142 Frontiers journals from Science, Health, Engineering, and Humanities and Social Sciences. Our parsing routine was able to find information about authors in 41’092 of these articles, about reviewers in 39’788 articles (note that some articles, like editorial articles, might not have been reviewed), and about editors in 40’405 articles. The anonymized network data is provided as Supplementary file 3.

To recognize and identify people re-occurring in more than one article, every person was assigned a unique identifier number (UID). When a contributor was found to be associated to an official profile identification number in the Frontiers database, then we relied on it, directly translating it into a UID (this happened for 71% of contributors). In the remaining cases, we decided whether a record matched another based on the names and affiliations of people. Specifically, for two names to be matched, we required that the surnames coincided and that each given name of the contributor with less given names needed to have a corresponding match in the other contributor's name (a match could also be an initial like “J” with a fully specified name like “John”). In case both contributors' given names consisted of only initials, we required, in addition, that their affiliations were sufficiently similar. Newman (2001) found that name-matching in the absence of UIDs, and even abbreviating all given names to initials, resulted in errors on the order of few percent in a data set comprising more than a million people. Correspondingly, as we expect the UIDs to be correctly associated with a contributor in the vast majority of cases, erroneously matching or not matching people is likely relatively uncommon.

Determination of gender

Request a detailed protocol

Each UID was assigned a gender based on their associated given names (note that after the steps described in the previous section, at least one first name was fully specified for 99.6% of the UIDs, while for the remaining 0.4% of UIDs all given names consisted of only initials so that no gender could be attributed). The extracted given names were compared with an extensive name list, assembled from public web-sources, such as:

Note that some given names (like Andrea) are in use for both men and women. Gender-ambiguous given names present in the US census database were categorized to the gender to which they were more frequently attributed. When a name appeared as both male and female in one of the other sources, or when different sources did not agree on the gender for a name, we decided not to associate that given name with a gender.

We validated the gender assignment procedure by performing a web search for 1053 randomly selected people from our data set, and determining their gender based on a picture or the use of gender-specific pronouns in a biographical text. We were able to find such information for 924 out of the 1053 people (88%). The gender automatically assigned by our algorithm to those identified was correct in 96 % of cases. For comparison, we note that the name-gender algorithm used in Larivière et al. (2013) misclassified male and female names in 8% of cases.

Our list thus comprised 66605 female and 43482 male names. In addition to the name list, we manually assigned the non-automatically-identified gender of 643 people with a high number of re-occurrences. In total, we were thereby able to assign gender to 131885, that is 87 % of UIDs. All further analyses were done ignoring the remaining 13% of scientists.

Network construction

Request a detailed protocol

We represented the available data in directed networks (Figure 1a), in which vertices were individual scientists and edges denoted peer-reviewing interactions: is appointing in the editor-to-reviewer network, and is editing (reviewing) a manuscript of in the editor (reviewer)-to-author network. Year-resolved graphs were constructed by deleting all links representing articles that were published later than the given year.

Graph analytics

Request a detailed protocol

All graph analyses (Figure 1—figure supplement 1) were performed with the freely-available Python igraph package.

In graph theory, a connected component is a subgraph in which any two vertices are connected to each other by at least one path, and which is connected to no additional vertices in the full graph. The largest of all the connected components of a graph is called its giant component. One can distinguish between the weak giant component (in which the direction of edges is ignored when building inter-node paths) and the strong giant component (in which the direction is taken into account). All the following graph analyses have been performed on the weak giant component of the networks observed at each time.

Transitivity undirected (clustering coefficient) is calculated as the ratio of triangles to connected triangles (triplets) in the graph, considering connections between nodes independent of their direction.

Average path length calculates the mean of the geodesic directed path lengths between all pairs of nodes in a connected component. The geodesic path length between a given pair of nodes is the minimum number of links needed to travel between the nodes along connected edges.

Small-worldness S is defined in Humphries and Gurney (2008), as S=γ/λ. γ is the undirected transitivity of the graph divided by k/n, which is an approximation for the undirected transitivity of an Erdös-Rényi random graph with n nodes and average degree (in+out) of k. λ is the ratio of the average shortest path length of the graph to ln(n)/ln(k), which is the average shortest path length of an Erdös-Rényi graph with n nodes and average degree k.

Statistical testing

Request a detailed protocol

Statistical significance was established by comparing a feature of the data to its confidence interval (CI). The graphic notations *, ** and *** denote that this feature lay outside the 95%, 99% and 99.9% CI, respectively. Confidence intervals were calculated by recalculating the given feature 10000 times, after permuting gender labels (with the exception of Figure 3e where, for computational reasons, only 100 recalculations were performed). Specifically, Figures 1 and 2 are derived from a table with a column given the number of contributions (up to a specified time point) in a given role for each person, and another column of each person's gender, and the latter column was permuted keeping the former constant. On the other hand, confidence intervals in Figure 3 were obtained by repeatedly permuting genders among all nodes in a given graph, independent of their associated roles. The underlying graph used for Figure 3a and Figure 3c-f was a suitably pruned editor-to-reviewer graph, out of which: we first removed all self-loops (i.e. editor and reviewer are identical); second, we deleted all leaf nodes, i.e. scientist who never edited or reviewed anything and had therefore a null out-degree; third, for Figure 3c, we removed cross-disciplinary assignments from journals not belonging to the indicated category. Similarly, Figure 3b was derived from a deleafed reviewer-to-author graph.

Inbreeding homophily at a local level

Request a detailed protocol

Figure 3d shows two histograms, one over all male editor nodes, the other over all female editor nodes. For each editor i who appointed at least 2 distinct reviewers we calculated a measure Hi of inbreeding homophily. To compute it, we first measured the actual number of reviewer assignments given to women nodes by the considered editor i, Wi. The next step was to subtract the expected number of reviewer assignments given to women, which would be observed if the given editor node appointed women with the average frequency pi they are appointed in its local vicinity. To evaluate pi we took the set of all editors (both males and females) at a distance of at most 5 directed edges from the considered editor node i. We counted the overall number Aall of reviewer assignments made by these editors (i.e. the total number of edges originating from editor nodes in the neighborhood shell), and neglected those editors for which Aall < 62 (i.e. we required that, on average, at each of the 5 steps away from the considered editor at least 2 novel reviewers are encountered that could not have been reached in a shorter step count). We then determined the number AfemaleAall of reviewer assignments made toward female nodes. We finally assumed pi = Afemale / Aall.

We could then compute the local inbreeding homophily measure Hi=Wi – Ai pi, where Ai was the total number of assignment made by each considered editor i.

We used a similar technique to assess the impact of the most homophilic editors on the overall network-homophily in the female editor-to-reviewer and male editor-to-reviewer networks. Let qi be the probability a person of the same gender is chosen by an editor i, where qi is calculated exactly as pi in the previous paragraph, i.e. by considering all people at most 5 directed edges away from editor i in the editor-to-reviewer network, counting the number of assignments these people gave to people of the same gender and dividing by the total number of assignments these people made. Next, let ki denote the number of assignments editor i gives to a person of the same gender and ni the total number of assignments editor i makes. Assuming editor i chooses the gender of a reviewer at random, the probability that i assigns ki out of the ni reviewers to have the same gender follows a binomial distribution binom (ki; ni, qi) and Φhom =ν=kinibinom(ν;ni,qi) measures how likely it is that editor i assigns at least ki reviews to a person of the same gender.

References

  1. Book
    1. Aldrich H
    (1989)
    Networking among women entrepreneurs
    In: Hagan O, Rivchun C, Sexton D, editors. Women-Owned Businesses. New York: Praeger. pp. 103–132.
  2. Book
    1. American Psychological Association
    (2007)
    Report of the APA Task Force on the Sexualisation of Girls
    Washington: APA.
  3. Conference
    1. Avin C
    2. Keller B
    3. Lotker Z
    4. Mathieu C
    5. Peleg D
    6. Pignolet Y-A
    (2015) Homophily and the glass ceiling effect in social networks
    Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science.
    https://doi.org/10.1145/2688073.2688097
    1. Bloch C
    (2012)
    Passion and Paranoia: Emotions and the Culture of Emotion in Academia
    55–69, The Janus face of the peer-review, Passion and Paranoia: Emotions and the Culture of Emotion in Academia, Surrey, UK, Ashgate.
    1. Bott H
    (1928)
    Observation of play activities in a nursery school
    Genetic Psychology Monographs 4:44–88.
    1. Buckley HL
    2. Sciligo AR
    3. Adair KL
    4. Case BS
    5. Monks JM
    (2014)
    Is there gender bias in reviewer selection and publication success rates for the ‘New Zealand Journal of Ecology’?
    New Zealand Journal of Ecology 38:335–339.
    1. Cole JR
    2. Zuckerman H
    (1984)
    The productivity puzzle: persistence and change in patterns of publication of men and women scientists
    Advances in Motivation and Achievement 2:1.
  4. Book
    1. Cole JR
    (1979)
    Fair Science: Women in the Scientific Community (1st Edition)
    New York: The Free Press.
  5. Book
    1. Cole S
    (1992)
    Making Science
    Harvard University Press.
  6. Report
    1. ETAN Expert Working Group on Women and Science
    (2000)
    Science Policies in the European Union: Promoting Excellence Through Mainstreaming Gender Equality
    Luxembourg: European Commission.
  7. Report
    1. Federal Glass Ceiling Commission
    (1995)
    Solid Investments: Making Full Use of the Nation’s Human Capital
    Washington, DC: US Department of Labor.
    1. Goulden M
    2. Mason MA
    3. Frasch K
    (2011) Keeping women in the science pipeline
    The Annals of the American Academy of Political and Social Science 638:141–162.
    https://doi.org/10.1177/0002716211416925
    1. Holt A
    2. Webb T
    (2007)
    Gender in ecology: where are the female professors
    Bull Br Ecol Soc 38:51–62.
    1. Knobloch-Westerwick S
    2. Glynn CJ
    3. Huge M
    (2013)
    The Matilda effect in science communication an experiment on gender bias in publication quality perceptions and collaboration interest
    Science Communication 35:603–625.
  8. Book
    1. Lazarsfeld PF
    2. Merton RK
    (1954)
    Friendship as a social process: a substantive and methodological analysis
    In: Berger M, editors. Freedom and Control in Modern Society. New York: Van Nostrand. pp. 18–66.
    1. Lee CJ
    2. Sugimoto CR
    3. Zhang G
    4. Cronin B
    (2013) Bias in peer review
    Journal of the American Society for Information Science and Technology 64:2–17.
    https://doi.org/10.1002/asi.22784
  9. Book
    1. Mazur A
    2. McBride Stetson D
    (1995)
    Comparative State Feminism
    London: Sage.
    1. Merton RK
    (1968)
    The Matthew effect in science
    Science 159:56–63.
    1. Merton RK
    (1973)
    The Sociology of Science: Theoretical and Empirical Investigations
    267–280, The Normative Structure of Science, The Sociology of Science: Theoretical and Empirical Investigations, Chicago, University of Chicago Press.
    1. Misra J
    2. Lundquist JH
    3. Holmes E
    4. Agiomavritis S
    (2011)
    The ivory ceiling of service work
    Academe 97:22–26.
  10. Book
    1. Mulkay MJ
    (1979)
    Science and the Sociology of Knowledge
    Allen & Unwin.
    1. National Science Foundation
    (2016) Doctorate Recipients From US
    National center for science and engineering statistics, Doctorate Recipients From US, www.nsf.gov/statistics/2017/nsf17306/.
    1. Nature
    (2013)
    Science for all
    Nature 495:5.
  11. Book
    1. Outshoorn J
    2. Kantola J
    (2007)
    Changing State Feminism
    Palgrave McMillan.
    1. Outshoorn J
    (2005)
    Paper for the ECPR Joint Sessions of Workshop
    14–19, New politics, new opportunities? The women’s policy agency in the Netherlands in the last decade, Paper for the ECPR Joint Sessions of Workshop, Granada.
  12. Report
    1. Pan L
    2. Kalinaki E
    (2015)
    Mapping Gender in the German Research Arena
    Elsevier Analytical Services.
  13. Book
    1. Park RE
    2. Burgess EW
    (1921)
    Introduction to the Science of Sociology
    Chicago: University Chicago Press.
    1. Parodi M
    (2011)
    Les Discriminations Entre Les Femmes Et Les Hommes
    Les discriminations entre les hommes et les femmes au prisme de l’opinion, Les Discriminations Entre Les Femmes Et Les Hommes, Sciences Po Les Presses.
  14. Book
    1. Sainsbury D
    (1994)
    Gendering Welfare States
    London: Sage Publications.
  15. Book
    1. Special Adviser on Gender Issues and Advancement of Women
    (2002)
    Gender Mainstreaming: An Overview
    New York: United Nations.
    1. Squires J
    (2008)
    Diversity mainstreaming: dépasser les approaches technocratiques et d’addiction des inégalités.
    Cahiers du Genre 44:73–94.
  16. Book
    1. Vinsonneau G
    (1999)
    Inégalités Sociaux Et Procédés Identitaires
    Paris: Armand Colin.
    1. Woodward A
    (2003)
    European gender mainstreaming: innovative policy or disappearing act?
    Review of Policy Research 20:65–88.
    1. Woodward A
    (2008)
    Le Point Sur La Situation À Bruxelles
    95–116, Est-il trop tard pour une approche de l’égalité?, Le Point Sur La Situation À Bruxelles, Cahiers Du Genre, 44.
  17. Book
    1. Yeatman A
    (1990)
    Bureaucrats, technocrats, femocrats: essays on the contemporary Australian state
    Sidney: Allen & Unwin.
    1. Zuckerman H
    (1991)
    The Outer Circle: Women in the Scientific Community
    27–56, The careers of men and women scientists: A review of current research, The Outer Circle: Women in the Scientific Community.

Decision letter

  1. Peter Rodgers
    Reviewing Editor; eLife, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Gender bias in scholarly peer review" to eLife for consideration as a Feature Article. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Senior/Reviewing Editor. The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This is an interesting and high quality manuscript on a topic that is important and timely. The manuscript (which is based on a novel data set) merits publication after revision to improve readability and replicability.

Essential revisions:

1) To maintain high standards of research reproducibility, and to promote the reuse of new findings, eLife requires all major data sets associated with an article to be made freely and widely available (unless there are strong reasons to restrict access, for example in the case of human subjects’ data), in the most useful formats, and according to the relevant reporting standards. (See below for more details).

2) How representative is the population studied by the authors? It would be interesting to compare the population studied by the authors with other large-scale studies to enhance the generalizeability of their findings. For example, it seems as if the authors could compare Figure 1—figure supplement 2 to Lariviere et al. (2013) in order to determine whether the population they study is significantly different to traditional scientometric populations. Also, there are several studies that show that women are less productive than men. This might be useful for explaining Figure 2.

3) The authors should more fully acknowledge the limitations of the data set. For example, the authors mention that the names are publicly disclosed (this is what allows this analysis). However, several studies have shown that open peer review significantly affects the willingness of certain populations to conduct reviews (e.g., by age, rank, gender, minority status). Could the fact that names are disclosed make this less generalizable? The limitations of this should be discussed.

4) The analyses only included reviewers who agreed to review (understandably, as there is no publicly available data for reviewers who declined to review). However, women might be more likely to say no to requests to reviews for various reasons (e.g. they might have less time to review, or they might feel less qualified to comment on their peers' work). Hence, might their underrepresentation stem from self-selection rather than discrimination?

5) It would be interesting to comment on the possible effect of age on gender discrepancies at different levels. Specifically, editors might (on average) be older than reviewers, who in turn might (on average) be older than authors (although with large variability). Is it possible that higher gender discrepancy among editors and reviewers is because women might be less represented in older age groups?

6) There is sparse information on author name disambiguation (for those that lacked a UID). More information should be provided on this. The authors state that they cannot guarantee a "completely error-free parsing" – could they run a validation test to give an estimate of the error? And was any validation exercise conducted on the name-gender algorithm? How does this compare with other name-gender algorithms?

7) Several seminal books on peer review and articles specifically focused on bias in peer review are omitted here. Furthermore, the article fails to engage with the previous literature in a substantial way. Given that this work is one of the largest peer-review studies to-date, it would be useful for the authors to establish how their work refutes or confirms previous work.

8) Regarding the third paragraph in the Results section: please include a figure which shows that the gender gap in various contributions cannot be fully explained by population.

9) There are many interesting results that warrant more explanation in the text. At present, some of the most interesting results are buried in the figure captions. It would be great to see the authors embed some of these results, in more detail, in the main text.

10) Overall, the differences between observed and chance-level participation of women (for instance in Figure 2) are not that large, even though they are statistically significant. How concerned should we be about them in the first place?

11) The authors advise that editors should be encouraged to make non-homophilic reviewer selection. But if the goal is to increase gender parity, shouldn't women editors be 'allowed' to be more homophilic, at least until the parity achieved? Might it be better to suggest that all editors should try to recruit more women reviewers?

https://doi.org/10.7554/eLife.21718.011

Author response

Essential revisions:

1) To maintain high standards of research reproducibility, and to promote the reuse of new findings, eLife requires all major data sets associated with an article to be made freely and widely available (unless there are strong reasons to restrict access, for example in the case of human subjects’ data), in the most useful formats, and according to the relevant reporting standards. (See below for more details).

We thank the reviewer for the reminder. We have included the anonymized raw data in the revised submission.

2) How representative is the population studied by the authors? It would be interesting to compare the population studied by the authors with other large-scale studies to enhance the generalizeability of their findings. For example, it seems as if the authors could compare Figure 1—figure supplement 2 to Lariviere et al. (2013) in order to determine whether the population they study is significantly different to traditional scientometric populations. Also, there are several studies that show that women are less productive than men. This might be useful for explaining Figure 2.

We thank the reviewer for these suggestions. We have now augmented the Discussion section with a comparison of the Frontiers population with other scientometric populations (subsection “How representative are the Frontiers journals?”) and included two tables (Supplementary files 1 and 2) summarizing previous studies. Specifically, we find that gender composition of Frontiers articles authors is similar to that of Lariviere et al. (2013), although when broken down by country outliers do exist. We also included a comparison with (1) a large data set for the gender composition of editors in mathematical journals (Topaz and Sen, 2016) and (2) a recent study (Lerback and Hansen, 2017) with data from 20 journals of the American Geophysical Union. We find that, overall, our study provides a global account on the prevalence of women among editors and reviewers and ranks these previous reports in a continuum of field-specific participation numbers.

While our data does not allow inferring the reasons for why women contribute less, some studies indeed suggested that women are less productive. These results are now mentioned and discussed as a potential reason in the revised Discussion (subsection “Evolution of participation rates by gender and causes for remaining inequity”, second paragraph). However, for completeness, we also refer to other partially conflicting studies indicating that women may have different, not necessarily lower productivity patterns than men (e.g. Gilbert et al., 1994; Pan & Kalinaki, 2015). Furthermore, we stress that the key new findings of our study concern homophily (Figure 3) and that the smaller number of women has been taken into account in the analysis leading to these findings.

3) The authors should more fully acknowledge the limitations of the data set. For example, the authors mention that the names are publicly disclosed (this is what allows this analysis). However, several studies have shown that open peer review significantly affects the willingness of certain populations to conduct reviews (e.g., by age, rank, gender, minority status). Could the fact that names are disclosed make this less generalizable? The limitations of this should be discussed.

We thank the reviewer for pointing this out. We agree that open peer review could potentially have effects other than the intended increase in transparency and quality. These effects are now explicitly discussed in the revised manuscript (subsection “How representative are the Frontiers journals?”, first paragraph). Despite extensive search we could not find any study that specifically assesses whether open peer review affects the gender specific acceptance rate of reviewer assignments. Should we have missed a crucial publication, we would be very grateful for a reference.

We also now systematically compare our findings to a large number of previous studies which considered smaller sample-sizes but had access to more meta-data, since they were initiated journal-side. Overall, as we comment in the extended Discussion, we find women’s participation rates among editors, reviewers and authors of Frontiers journals to be comparable to previous reports, indicating that the Frontiers community is well representative of a general situation.

4) The analyses only included reviewers who agreed to review (understandably, as there is no publicly available data for reviewers who declined to review). However, women might be more likely to say no to requests to reviews for various reasons (e.g. they might have less time to review, or they might feel less qualified to comment on their peers' work). Hence, might their underrepresentation stem from self-selection rather than discrimination?

The reviewer is correct in suggesting that it is critical to distinguish between self-selection and discrimination. Our data set does not allow inferring the reasons for women’s underrepresentation and therefore self-selection (due to various motivations) could potentially play a role. This possibility, along with others, is now acknowledged and discussed in the revised manuscript (subsection “Evolution of participation rates by gender and causes for remaining inequity”, second paragraph). Besides this discussion, we also performed a more detailed analysis of the different prevalence of homophily in the male and female subpopulation. Taken together we feel that the data clearly point to same-gender preference of mostly male editors as a contribution to the under-representation of female scientists across the Frontiers series of journals. This conclusion is particularly strengthened by our new analysis of the different patterns of gender homophily for male and female editors (Figure 3e-f).

5) It would be interesting to comment on the possible effect of age on gender discrepancies at different levels. Specifically, editors might (on average) be older than reviewers, who in turn might (on average) be older than authors (although with large variability). Is it possible that higher gender discrepancy among editors and reviewers is because women might be less represented in older age groups?

Our data set does not include scientists’ ages and we are, therefore, unable to analyze how age affects women’s participation in the peer-review system. However, as the number of female scientists decreases with age and rank, the differences seen in the gender fraction of editors, reviewers and authors could well be influenced by the age distribution in these populations. The revised manuscript now mentions this possibility (subsection “Evolution of participation rates by gender and causes for remaining inequity”, second paragraph) and we further discuss our results in light of a recent study with access to age data (Lerback and Hansen, 2017; subsection “How representative are the Frontiers journals?”, last paragraph).

6) There is sparse information on author name disambiguation (for those that lacked a UID). More information should be provided on this. The authors state that they cannot guarantee a "completely error-free parsing" – could they run a validation test to give an estimate of the error? And was any validation exercise conducted on the name-gender algorithm? How does this compare with other name-gender algorithms?

We thank the reviewer for pointing this out. The gender assignment is a critical step of the analysis and we have now included both a more detailed description of the name-gender algorithm and a validation test in the Materials and methods section (subsections “Collection and parsing of data” and “Determination of gender”). While errors are in principle unavoidable when identifying people based on their names, Newman (2001) estimated their rate to be only few percent. Combined with our reliance on user IDs for the vast majority of people, we expect that errors due to name disambiguation are uncommon.

To estimate the error rate in gender assignment, we have validated the genders of a random selection of 924 scientists through a manual web search (as described in subsection “Determination of gender”, second paragraph) and compared our assignment error with the supplemental material of Lariviere et al. (2013). Notably, our algorithm’s error rate is smaller than the reported error rate of the name-gender algorithm of Lariviere et al. (2013). Independent of the error rate, we would like to stress that mis-assigning gender can never increase the homophily that we see. Random mis-assignments have the same effect as a small rate of random permutation of gender labels. This would reduce evidence for inbreeding homophily rather than spuriously inducing it Thus, although we cannot reach a 100% correct gender assignment, the finding of inbreeding homophily is robust against this type of uncertainty.

7) Several seminal books on peer review and articles specifically focused on bias in peer review are omitted here. Furthermore, the article fails to engage with the previous literature in a substantial way. Given that this work is one of the largest peer-review studies to-date, it would be useful for the authors to establish how their work refutes or confirms previous work.

We thank the reviewer for pointing this out. It is true that our previous version had the style of a short letter, rather than that of an extended article. We now rewrote and substantially extended the Discussion section of the manuscript, including generous paragraphs of comparison with other empirical work, the current state of the art and detailed discussions about why our study can clarify previously ambiguous pictures.

Despite these comprehensive additions and expanded reference section (99 references cited, while in the previous version we had just 31) we have failed to include the crucial references the reviewer had in mind; we would be keen to receive more explicit guidance.

8) Regarding the third paragraph in the Results section: please include a figure which shows that the gender gap in various contributions cannot be fully explained by population.

Every figure in our article clearly indicates comparison with confidence intervals associated to a null hypothesis of no bias effects besides numerical differences between male and female populations. Therefore, our figures already showed the requested information.

Indeed, our statistical assessment – random shuffling of gender labels – leaves the network structure and the fraction of females at every level intact. In essence, we ask whether the observed contributions could have occurred randomly, if all actors of different gender behaved identically with respect to the contribution in question.

We apologize if this important procedural aspect was not clear enough. We have now better highlighted the concept of the shuffling method (cf. Figure 1A) for the generation of gender-blind control networks, and edited the third paragraph of the Results section (third paragraph) in an attempt to make it more understandable. Furthermore, we have also improved figure captions.

9) There are many interesting results that warrant more explanation in the text. At present, some of the most interesting results are buried in the figure captions. It would be great to see the authors embed some of these results, in more detail, in the main text.

We thank the reviewer for this suggestion. Former Figure S3, now Figure 1C, as well as former Figure S5, now Figure 3D are now presented in more detail in the Results section, (second and sixth paragraphs, respectively). We also better highlighted the homophily findings, which are a key novel contribution of our study. In particular, we have added a completely new section and figure panel about the observed different mechanisms of homophily for male and female editors (Figure 3E and 3F) in the main manuscript.

We further rewrote and substantially extended the Discussion section of the manuscript to now include a discussion of these findings.

10) Overall, the differences between observed and chance-level participation of women (for instance in Figure 2) are not that large, even though they are statistically significant. How concerned should we be about them in the first place?

We agree with the reviewer that it is important to discern statistical significance, effect size, and relevance. First, while the differences, for the most part, seem small, they can reach up to a factor of 2 between the observed fraction of females with 10 contributions and the chance range. Given the large range in the number of occurrences, visualized in Figure 2, we had to use a logarithmic scale. This requires care when attempting to judge the magnitude of the bias just from the visual inspection of those graphs, while statistics are precise. Second, to highlight why our findings are concerning, we now include several sections in the revised Discussion that:

A) Argue why we are confident, in the light of previous findings, to state that discrimination is a critical reason for the remaining inequity (subsection “Evolution of participation rates by gender and causes for remaining inequity”, second paragraph);

B) List direct results of subtler forms of gender bias. Notably, even a slight homophily can influence and alter the way in which information spreads and is damaging to female scientists, whose work is perceived with a negative bias by the majority (subsection “Homophily in society and science”, last paragraph) and;

C) Translate the findings to specific structural disadvantages for women (subsection “Conclusions”, second paragraph).

11) The authors advise that editors should be encouraged to make non-homophilic reviewer selection. But if the goal is to increase gender parity, shouldn't women editors be 'allowed' to be more homophilic, at least until the parity achieved? Might it be better to suggest that all editors should try to recruit more women reviewers?

The reviewer is asking an interesting question and we agree: on first glance, it might seem like a prudent policy to support homophily among female editors to reach equity.

In response to question 1, we would like to highlight, however, that the goal of scientific policy is not equity per se, but equal opportunity and scientific meritocracy. To this end homophily should be reduced, too, as homophily, even in a situation when parity is reached, prevents equal opportunity. This is now detailed in the revised Discussion in the first two paragraphs of the subsection “Conclusions”.

In response to question 2, encouraging editors of both genders to recruit more women might foster the advance of excellent female scientists and advertise that women do good science. We stress however that a broad literature corpus in sociology of work environments have shown that the efficacy of top-down measures aiming at enforcing equal representation is poor if not paired with policies aiming at educating deciders about the importance of fighting gender discrimination (i.e. healing “from within” network construction mechanisms, rather than just artificially “transplanting” more female nodes into the network).

Considering both sides of the medal, it is unclear whether the positive effects (faster route to equity) or negative effects (undesirable behavioral trait) prevail in the long run. We added a corresponding paragraph to the Discussion (last paragraph) and the subsection “Conclusions” (third paragraph).

https://doi.org/10.7554/eLife.21718.012

Article and author information

Author details

  1. Markus Helmer

    1. Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
    2. Bernstein Center for Computational Neuroscience, Göttingen, Germany
    3. Yale University, New Haven, United States
    Contribution
    MH, Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    markus.helmer@yale.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9680-0595
  2. Manuel Schottdorf

    1. Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
    2. Bernstein Center for Computational Neuroscience, Göttingen, Germany
    Contribution
    MS, Conceptualization, Formal analysis, Validation, Investigation, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5468-4255
  3. Andreas Neef

    1. Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
    2. Bernstein Center for Computational Neuroscience, Göttingen, Germany
    Contribution
    AN, Conceptualization, Formal analysis, Validation, Investigation, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
  4. Demian Battaglia

    1. Bernstein Center for Computational Neuroscience, Göttingen, Germany
    2. Institute for Systems Neuroscience, Aix-Marseille University, Marseille, France
    Contribution
    DB, Conceptualization, Formal analysis, Validation, Investigation, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    demian.battaglia@univ-amu.fr
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-2021-7920

Funding

Bundesministerium für Bildung und Forschung (01GQ1005B)

  • Andreas Neef
  • Demian Battaglia

Marie Curie Career Development Fellowship (FP7- IEF 330792 (DynViB))

  • Demian Battaglia

Boehringer Ingelheim Fonds

  • Manuel Schottdorf

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Publication history

  1. Received:
  2. Accepted:
  3. Version of Record published:

Copyright

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Metrics

  • 22,333
    views
  • 1,818
    downloads
  • 252
    citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Markus Helmer
  2. Manuel Schottdorf
  3. Andreas Neef
  4. Demian Battaglia
(2017)
Research: Gender bias in scholarly peer review
eLife 6:e21718.
https://doi.org/10.7554/eLife.21718
  1. Further reading

Further reading

  1. Edited by Peter A Rodgers
    Collection

    The study of science itself is a growing field of research.

  2. Edited by Julia Deathridge
    Collection

    Research culture needs to be improved for the benefit of science and scientists.