Research Assessment: Reducing bias in the evaluation of researchers

A workshop run by DORA identified a number of ways to reduce bias in hiring and funding decisions.

By Anna Hatch (DORA), Veronique Kiermer (PLOS), Bernd Pulverer (EMBO), Erika Shugart (American Society for Cell Biology), and Stephen Curry (Imperial College London)

Introduction

Hiring and funding decisions influence academic priorities directly by setting research agendas. They also shape priorities indirectly by affecting the diversity of the scientific workforce, which in turn influences the questions that researchers choose to ask. If we inadvertently give preferential treatment to certain labs, publication outlets or geographic locations, we are likely to miss out on the breadth and richness of ideas that the global scholarly community has to offer.

In 2013, just 18% of biomedical PhDs in the United States landed a tenure-track faculty position 6–10 years after graduate school (National Academies of Sciences, Engineering, and Medicine, 2018). With these odds, it is not surprising that considerable effort is put into the decision-making process for faculty searches. But measuring the impact and quality of research objectively and efficiently is difficult, especially when there is a large pool of applicants. Assessors may unconsciously use shortcuts like interpreting the journal name as a proxy for research quality. How can we evaluate researchers fairly, in a way that recognizes the qualities of their contributions and their potential for the position in question, without placing undue burdens on reviewers?

To identify points in the assessment process where we might improve the quality and objectivity of hiring and funding decisions, DORA hosted an interactive session at the ASCB|EMBO meeting held in San Diego in December 2018, titled ‘How to improve research assessment for hiring and funding decisions.’ The 30 participants came from diverse backgrounds and career stages, and included graduate students, postdocs, professors, university administrators, funders and publishers.

As the basis for our exercise, we collected two job announcements and portions of two grant applications from institutions in the US and Europe. Participants were asked to work together in small groups containing a variety of stakeholders and to provide constructive feedback on the materials for a particular application. We asked them to look specifically for unnecessary information that may promote conscious or unconscious shortcuts, and for any missing information that should be included to inform the decision-making process. While the pitfalls of using journal-based metrics as an evaluation tool have been discussed in detail (Seglen, 1997; The PLoS Medicine Editors, 2006; Adler et al., 2009; Curry, 2018), we were eager to identify any other types of misleading shortcuts that may occur. We also wanted the participants to explore what aspects and outputs of scholarship other than peer-reviewed articles might be worth recognizing in evaluation processes.

Here, we summarize our discussion from that day. Although not all participants agreed on every point, the suggestions exemplify how small changes to existing application materials could improve the ways we evaluate researchers.

Avoiding shortcuts

When applying for a faculty position or grant, one of the first things applicants are often asked to provide – and that the reviewer sees – is their educational history. Three of the four groups identified institution names and researcher pedigree as things that could prejudice reviewers and be used as shortcuts for assessing the quality of research, particularly during the triage phase of selection. One group suggested removing institution names from educational history and instead listing degrees accompanied by the countries where they were received. The inclusion of countries was meant to account for geographical differences in graduate training, although this too could introduce bias. The majority of participants believed the most practical way to minimize this bias was to move educational history to the end of the CV or application.

One group noted that in some countries individuals are asked to submit a picture alongside their application, and some applicants do this even if it is not requested. It is not surprising that this practice permits unconscious bias to sneak into the review process. Removing photos is an easy action for institutions to take, but it may not be enough – gender and ethnic identity can often be inferred from a person’s name. To further reduce unconscious bias, applications could be anonymized. This requires more effort, and is probably most effective and easiest to achieve at the triage stage, when judgments are made using the submitted paperwork alone.

Journal-based metrics such as the Journal Impact Factor (JIF) do not describe the quality of individual articles and can bias reviewers’ assessments of research quality. Despite this, many researchers apparently feel that these metrics can give them an advantage: each funding cycle, EMBO staff have to remove journal metrics from a significant fraction of the applications to their long-term fellowship program, even though applicants are explicitly instructed not to include such numbers. This highlights the fact that cultural change is unlikely to come from the efforts of a single institution; rather, widespread action is needed.

The names of the journals where an applicant has published are also often used as a shortcut for the quality and impact of their research. Asking applicants to upload PDFs of their key papers can have the same effect, because journal branding is prominently displayed on reprints. One group suggested replacing journal names in bibliographies with a 2–4 sentence summary that describes the advance the article makes in its field and the applicant’s contribution to the work. This would provide more context for reviewers and make it easier to gauge the role the researcher played in collaborative research projects, especially if they are listed as a second or middle author.

Another way to judge researcher contributions is to adopt the Contributor Roles Taxonomy (CRediT) system, which has been developed to provide more specific information on each co-author’s contribution to a published paper (Allen et al., 2019). A number of publishers have already adopted the system, and the National Academy of Sciences is tracking CRediT implementation on its webpage, Transparency in Author Contributions in Science (TACS; McNutt et al., 2018).

Missing in action

Although peer-reviewed articles are often the focus of current assessment practices, researchers move science forward in a number of other ways. Some of the scholarly outputs that participants felt reviewers should take into consideration include preprints, code, protocols, teaching, mentorship, and scholarly and community engagement activities (see the list below). Because every organization has a unique mission, different outputs may carry different weights at different institutions.

Research-related outputs
Preprints
Research articles
Review articles
Commentary or perspective pieces
Books
Monographs
Invited talks
Conference presentations
Conference papers
Conference abstracts
Patents
Data
Code
Software
Protocols
Reagents
Tools

Teaching and mentorship
Teaching classes
Advising students
Effective mentorship
Graduating students
Successful trainee job placement
Promotion of diversity, equity, and inclusion on campus, in the classroom, and in the lab
Service on qualifying exam, thesis defense and/or advisory committees
Leading career training and leadership workshops and/or lectures for trainees
Teaching summer courses and workshops outside of the home university
Education-focused publications

Academic service
Service on department committees
Service on grant review panels
Grant writing to support institutional initiatives
Peer review of research articles
Journal editor
Conference organizer
Service on committees for scholarly societies
Other positions of leadership in or outside the university

Collaboration and team science
Partnerships with industry or other stakeholders
Partnerships with other research groups
Contributions to open science, including data and educational resource repositories

Societal impacts
Creation of new policy
Science advocacy
Effects on the community

Public engagement
Public talks
Participation in citizen science projects
Outreach at K-12 schools
Judging science fair projects

Using more diverse assessment criteria that better reflect the contributions researchers make would encourage desirable changes in behavior, but research shows this is not yet common practice in hiring and promotion. For example, Alperin and colleagues found that the public dimensions of faculty work are often undervalued in review, promotion and tenure documents in the US and Canada (Alperin et al., 2019). Other groups are also monitoring the inclusion of open science practices in hiring and funding decisions: ASAPbio, a nonprofit organization, is tracking institutional and funder policies that include preprints, and a project on the Open Science Framework (OSF) is collecting faculty job advertisements that ask or suggest that applicants supply a statement about open science. Policies like these only encourage behavior change if assessors take them into account. The most successful individuals advance their careers by meeting the expectations set by what the field measures, so when we assess the contributions that we value, we are more likely to encourage desirable behaviors in the academic community.

Form follows function

As institutions reflect on the most useful things to include in research assessment, it is important to provide clear guidance and training to researchers early in their careers, before they come up for evaluation for faculty positions and grant funding. Professional development programs could help trainees better understand what to include in faculty and grant applications, although what should be included varies with the opportunity. No matter where someone applies or what they apply for, transparent evaluation policies set clear standards for both reviewers and applicants.

The Charité Hospital in Berlin asks researchers applying for professorship positions to answer a set of questions related to research contributions, open science, team science and interactions with stakeholders. While the development of such a ‘structured narrative’ approach is a welcome innovation, a universal format for these applications should be considered. Without standardization, assessors will have difficulty extracting the information that matters to them, and a proliferation of prescriptive templates will burden applicants who need to adapt their application for many institutions.

Participants appreciated it when institutions included statements expressing commitment to diversity, equity and inclusion in the application process. But some participants pointed out that these statements would be much more powerful if they were accompanied by a description of how an inclusive environment is supported on campus – and by effective action. One example comes from the University of Wisconsin, where a 2.5-hour workshop designed to reduce gender bias in STEMM departments led to an increase in the number of new female faculty hires (Carnes et al., 2015; Devine et al., 2017).

To conclude

We are not the first group to bring people together for an open and honest dialogue about researcher assessment. The University Medical Center Utrecht in the Netherlands used a series of meetings with staff at all career stages to develop new assessment policies for hiring and promotion (Benedictus et al., 2016). Conversations like these reveal practical intervention points for reform and build ownership and buy-in among the researchers who will have to implement the changes or be subject to them. We hope that the ideas that surfaced from our scrutiny of application materials can lead to small but meaningful recommendations that reduce sources of bias without significantly adding to the burden on evaluators.

Acknowledgements

We thank the participants at our career enhancement programming session at the 2018 ASCB|EMBO Meeting for a robust and thoughtful discussion about research assessment. We also thank David Drubin for his helpful comments on earlier versions of this piece and Needhi Bhalla, Omar Quintero, and Erik Snapp for providing feedback on our list of scholarly outputs.

Competing interests

Anna Hatch is the community manager of DORA, and the other authors are members of the DORA steering committee. DORA receives financial support from eLife, and eLife’s executive director Mark Patterson is also on the steering committee.

For correspondence: Anna Hatch and Stephen Curry

References

Adler, R; Ewing, J; Taylor, P. 2009. Citation Statistics. Statistical Science 24, 1–14. https://doi.org/10.1214/09-STS285

Allen, L; O’Connell, A; Kiermer, V. 2019. How can we ensure visibility and diversity in research contributions? How the Contributor Role Taxonomy (CRediT) is helping the shift from authorship to contributorship. Learned Publishing 32, 71–74. https://doi.org/10.1002/leap.1210

Alperin, JP; Muñoz Nieves, C; Schimanski, LA; Fischman, GE; Niles, MT; McKiernan, EC. 2019. How significant are the public dimensions of faculty work in review, promotion and tenure documents? eLife 8, e42254. https://doi.org/10.7554/eLife.42254

Benedictus, R; Miedema, F; Ferguson, MWJ. 2016. Fewer numbers, better science. Nature 538, 453. https://doi.org/10.1038/538453a

Carnes, M; Devine, P; Baier Manwell, L; Byars-Winston, A; Fine, E; Ford, C; Forscher, P; Isaac, C; Kaatz, A; Magua, W; Palta, M; Sheridan, J. 2015. The effect of an intervention to break the gender bias habit for faculty at one institution: a cluster randomized, controlled trial. Academic Medicine 90, 221–230. https://doi.org/10.1097/ACM.0000000000000552

Curry, S. 2018. Let’s move beyond the rhetoric: it’s time to change how we judge research. Nature 554, 147. https://doi.org/10.1038/d41586-018-01642-w

Devine, PG; Forscher, PS; Cox, WTL; Kaatz, A; Sheridan, J; Carnes, M. 2017. A gender bias habit-breaking intervention led to increased hiring of female faculty in STEMM departments. Journal of Experimental Social Psychology 73, 211–215. https://doi.org/10.1016/j.jesp.2017.07.002

McNutt, MK; Bradford, M; Drazen, JM; Hanson, B; Howard, B; Hall Jamieson, K; Kiermer, V; Marcus, E; Kline Pope, B; Schekman, R; Swaminathan, S; Stang, PJ; Verma, IM. 2018. Transparency in authors’ contributions and responsibilities to promote integrity in scientific publication. Proceedings of the National Academy of Sciences 115, 2557–2560. https://doi.org/10.1073/pnas.1715374115

National Academies of Sciences, Engineering, and Medicine. 2018. The Next Generation of Biomedical and Behavioral Sciences Researchers: Breaking Through. Washington, DC: The National Academies Press. https://doi.org/10.17226/25008

The PLoS Medicine Editors. 2006. The Impact Factor Game. PLoS Medicine 3, e291. https://doi.org/10.1371/journal.pmed.0030291

Seglen, PO. 1997. Why the impact factor of journals should not be used for evaluating research. BMJ 314, 497. https://doi.org/10.1136/bmj.314.7079.497
