Science Forum: A community-led initiative for training in reproducible research

  1. Susann Auer
  2. Nele A Haeltermann
  3. Tracey L Weissgerber
  4. Jeffrey C Erlich
  5. Damar Susilaradeya
  6. Magdalena Julkowska
  7. Małgorzata Anna Gazda
  8. Benjamin Schwessinger (corresponding author)
  9. Nafisa M Jadavji (corresponding author)
  10. Reproducibility for Everyone Team
  1. Department of Plant Physiology, Institute of Botany, Faculty of Biology, Technische Universität Dresden, Germany
  2. Department of Molecular and Human Genetics, Baylor College of Medicine, United States
  3. QUEST Center, Berlin Institute of Health, Charité Universitätsmedizin Berlin, Germany
  4. Shanghai Key Laboratory of Brain Functional Genomics, East China Normal University, China
  5. Medical Technology Cluster, Indonesian Medical Education and Research Institute, Faculty of Medicine, Universitas Indonesia, Indonesia
  6. Boyce Thompson Institute, United States
  7. CIBIO/InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Portugal
  8. Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Portugal
  9. Research School of Biology, Australian National University, Australia
  10. Department of Biomedical Science, Midwestern University, United States
  11. Department of Neuroscience, Carleton University, Canada
  12. Reproducibility for Everyone, United States

Abstract

Open and reproducible research practices increase the reusability and impact of scientific research. The reproducibility of research results is influenced by many factors, most of which can be addressed by improved education and training. Here we describe how workshops developed by the Reproducibility for Everyone (R4E) initiative can be customized to provide researchers at all career stages and across most disciplines with education and training in reproducible research practices. The R4E initiative, which is led by volunteers, has reached more than 3000 researchers worldwide to date, and all workshop materials, including accompanying resources, are available under a CC-BY 4.0 license at https://www.repro4everyone.org/.

Why is training in reproducibility needed?

Reproducibility and replicability are central to science. Reproducibility is the ability to regenerate a result using the dataset and data analysis workflow that was used in the original study, while replicability is the ability to obtain similar results in a different experimental system (Leek and Peng, 2015; Schloss, 2018). Despite their importance, studies have shown that it can be quite challenging to reproduce and replicate peer-reviewed results (Baker and Penny, 2016; Freedman et al., 2015). In the past few years, several multi-center projects have assessed the level of reproducibility and replicability in various scientific fields, and have identified major factors that are critical for repeating and confirming scientific results (Alsheikh-Ali et al., 2011; Amaral et al., 2019; Baker et al., 2014; Button et al., 2013; Cova et al., 2021; Errington et al., 2014; Friedl, 2019; Hardwicke et al., 2018; Lazic, 2010; Marqués et al., 2020; Open Science Collaboration, 2015; Shen et al., 2012; Stevens, 2017; Strasak et al., 2007; Weissgerber et al., 2019; Weissgerber et al., 2015). In the rest of this article we will use the term reproducibility as shorthand for reproducibility and replicability, as is often done in the life sciences (Barba, 2018).

The factors that affect the reproducibility of an experiment can be grouped into the four categories shown in Figure 1. The first comprises technical factors, such as variability in the reagents or materials used to perform the research. The second contains flaws in study design and/or statistical analysis, such as the use of inappropriate controls, sample sizes too small to adequately power the study, and inappropriate statistical analyses. The third contains human factors, including insufficient description of methods and the use of reagents or organisms that are not shared. In addition, questionable research practices, such as hypothesizing after the results are known (HARKing; Kerr, 1998) or P-hacking (Fraser et al., 2018; Head et al., 2015; Miyakawa, 2020), are hard to detect and contribute to confirmation and publication bias. Lastly, external factors beyond the researchers' control can negatively impact reproducibility; these include reward systems that prize high-impact publications and paywalls that restrict access to crucial information. Going forward, developing solutions that minimize these confounding factors will be vital to improving scientific integrity and further accelerating the scientific enterprise (Botvinik-Nezer et al., 2020; Fomel and Claerbout, 2009; Friedl, 2019; Gentleman and Temple Lang, 2007; Mangul et al., 2019; Mesirov, 2010; NIH, 2020; Peng, 2011).

Factors that affect reproducibility in research.

A rough approximation of a taxonomy of the factors that contribute to irreproducible scientific results, grouped into four categories: technical factors, human factors, flaws in study design and statistical analysis, and external factors. Specific examples are listed under each category.

While the problems with experimental reproducibility have been known for decades, they have only come to the fore over the past ten years (Begley and Ellis, 2012; Munafò et al., 2017; Prinz et al., 2011). Within the scientific community, systemic solutions and tools are being developed that allow scientists to efficiently share research materials, protocols, data, and computational analysis pipelines (some of these tools are covered in our training materials; see Box 1). Despite their transformative potential, these tools are underutilized, as most researchers are unaware of their existence or do not know how to incorporate them into their daily workflows.

Box 1.

Unit topics.

The units included in the standard introductory workshop cover a range of skills and tools needed to conduct reproducible research. Below are examples of content that has been used in previous workshops. The specific content of each workshop can vary and is adjusted to the audience and event.

1. The reproducibility framework: Reproducible research practices allow others to repeat analyses and corroborate the results of a previous study. This is only possible when the authors have provided all necessary data, methods and code (Figure 2). Our reproducibility toolbox includes reproducible practices for the organization, documentation, analysis, and dissemination of scientific research.

2. Organization, data management and file naming: An effective data management plan, including clear file naming conventions, prevents problems such as lost data, difficulties identifying the most recent version of a file, the inability to locate files after team members leave the laboratory, or difficulties in finding or interpreting files years after the project is completed. This section describes techniques to ensure that all project files are easy to identify and locate and that they are appropriately documented.

3. Electronic lab notebooks: Electronic lab notebooks (ELNs) overcome many of the limitations of paper lab notebooks – they are searchable, cannot be damaged or misplaced, and are easy to back-up and share with collaborators. This section discusses available electronic lab notebooks and strategies for selecting the electronic lab notebook that meets the needs of an individual research team.

4. Preregistrations and protocol sharing: Scientific publications often lack essential details needed to reproduce the methods described. Preregistrations of planned research include details of the methods and tools that will be used in the project and provide transparency of the intended analyses and outcome. Protocol repositories allow researchers to share detailed, step-by-step protocols, which can be cited in scientific papers. Repositories also make it easy to track changes in protocols over time by incorporating version control, allowing researchers to post updated versions of protocols from their own lab, or adapted versions of protocols that were originally shared by other research groups. This section describes strategies for creating effective ‘living protocols’ that other research teams can easily locate, follow, cite and adapt.

5. Biological material and reagent sharing: Laboratories regularly produce specialized materials and organisms, such as reagents, plasmids, seeds and organism strains. Access to these materials is essential to reproduce and expand upon published work. Repositories maintain reagents and biological materials deposited by scientists, and make these materials accessible to the scientific community for a small or symbolic donation. Nonetheless, many laboratories do not use repositories to share their materials, limiting the reach and impact of their work. This section introduces the concept of material repositories, which give investigators access to materials without having to invest time and resources in recreating, maintaining, verifying and distributing reagents themselves.

6. Data visualization and figure design: Figures show the key findings of a study and should allow readers to critically evaluate the underlying data. For example, scientists routinely use the default plots of spreadsheet software, such as bar graphs, to present continuous data (Weissgerber et al., 2015). This is a problem: many different data distributions can lead to the same bar or line graph, and the actual data may suggest conclusions that differ from those drawn from summary statistics alone. This section illustrates strategies for replacing bar graphs with more informative alternatives (e.g. dot, box, or violin plots), provides guidance on choosing the visualization best suited to various data structures and images, and gives a brief overview of tools for creating more effective, appealing and informative graphics and figures.

7. Bioinformatic tools: The sample sizes and the number of data points (in multidimensional data) in research studies have greatly increased in the last decade. Bioinformatic tools for analyzing large data sets are essential in many fields. Unfortunately, analyses performed using these tools can only be reproduced or adapted to other study designs if authors share their code, software versions and software settings. This section examines techniques and tools for reproducible data analysis, including notebooks, version control, managers for packages, dependencies and programming environments, and containers.

8. Data sharing: Depositing data in public repositories allows other scientists to review published results and reuse the data for additional analyses or studies. All data should adhere to the FAIR principles: be findable, accessible, interoperable and reusable (https://fairsharing.org/). This section describes the types of information that should be shared to allow the community to interpret and use archived data. We also discuss best practices, including criteria for selecting a repository and the importance of specifying a license for data and code reuse. There are instances where data cannot be shared, for example when there are privacy concerns regarding genetic data from living people.
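The file-naming advice in unit 2 can be made concrete with a few lines of code. The sketch below is a minimal illustration of one possible convention (the scheme and the helper function are our own invented example, not an R4E standard): ISO 8601 dates and zero-padded sample numbers make alphabetical and chronological order coincide.

```python
from datetime import date

def data_file_name(project, sample_number, measured_on, extension="csv"):
    """Build a name like '2021-03-15_qpcr_sample-007.csv'.

    ISO 8601 dates (YYYY-MM-DD) and zero-padded sample numbers
    mean that sorting names alphabetically also sorts them
    chronologically, and files stay identifiable years later.
    """
    return (f"{measured_on.isoformat()}_{project}"
            f"_sample-{sample_number:03d}.{extension}")

names = [
    data_file_name("qpcr", 12, date(2021, 3, 15)),
    data_file_name("qpcr", 7, date(2020, 11, 2)),
]
print(sorted(names))  # the 2020 file sorts first
```

Whatever convention a team chooses, documenting it in a short README in the project folder makes it usable by others.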
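The warning in unit 6 that many distributions produce the same bar graph can be demonstrated in a few lines. The two samples below are contrived for illustration; in practice the difference would be revealed by plotting the raw data as dot, box or violin plots.

```python
from statistics import mean, stdev

# Two hypothetical samples that would yield identical bars
# (same mean) in a default spreadsheet bar graph ...
group_a = [4.9, 5.0, 5.0, 5.1, 5.0]   # tight, unimodal
group_b = [1.0, 1.0, 5.0, 9.0, 9.0]   # bimodal, widely spread

# ... yet their spreads, and likely the conclusions drawn,
# differ dramatically once the raw data points are shown.
print(mean(group_a), mean(group_b))    # identical means
print(stdev(group_a), stdev(group_b))  # very different spreads
```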
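For unit 7, the simplest first step towards a reproducible computational analysis is recording the exact software versions used. The helper below is an illustrative sketch, not an R4E tool; it relies only on Python's standard library (`importlib.metadata`, available from Python 3.8).

```python
import sys
from importlib import metadata

def environment_report(packages):
    """Return 'name==version' lines for the interpreter and the
    given installed packages, suitable for pasting into a methods
    section or committing alongside the analysis code."""
    lines = [f"python=={sys.version.split()[0]}"]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}==(not installed)")
    return lines

print("\n".join(environment_report(["pip", "numpy"])))
```

Dedicated tools such as conda environment files or containers go further, but even this level of reporting lets others match the original analysis environment.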

Integrating these tools into the standard scientific workflow has the potential to shift the scientific community towards a more transparent and reproducible future. Educational initiatives with open-source materials can significantly increase the reach of teaching materials (Lawrence et al., 2015) and accelerate the uptake of best practices and existing tools for reproducible research. Several initiatives offer tutorials or seminars on some aspects of reproducibility (Box 2). While each has its strengths, none individually offers a scalable solution to the existing training gap in reproducibility. Here, we present Reproducibility for Everyone, a set of workshop materials and modules that can be used to train researchers in reproducible research practices. Our trainings are scalable, from a dozen attendees in an intensive workshop to a few hundred participants attending an introductory workshop at once, either virtually or in a large venue. Moreover, the reproducibility movement worldwide is growing, and as different initiatives cover different aspects of the training process, together they can help bridge the reproducibility training gap.

Box 2.

Resources for training in reproducible research.

Carpentries workshops (https://carpentries.org/): Workshops teaching reproducible data handling and coding skills. Intended for scientists at any career stage.

Frictionless Data Fellowship (https://fellows.frictionlessdata.io): Nine-month virtual training program on frictionless data tools and approaches. Target audience are mainly early-career researchers (ECRs). Eight fellows are selected each year and a stipend is provided.

Oxford Berlin Summer School (https://www.bihealth.org/en/notices/oxford-berlin-summer-school-on-open-research-2020): Five-day summer school covering open research and reproducibility in science.

ReproducibiliTea (https://reproducibilitea.org/): Locally run journal clubs focused on open science and reproducibility. Target audience are mainly early career researchers. Global reach with currently 114 local groups.

Research Transparency and Reproducibility Training (RT2; https://bitss.org): Three-day training providing an overview of reproducible tools and practices for the social sciences. Intended for social scientists at any career stage.

Project TIER (Teaching Integrity in Empirical Research) (https://www.projecttier.org/): Training in empirical research transparency and replicability for social scientists, students and faculty. Offers fellowships and workshops for faculty and graduate students.

Framework for Open and Reproducible Research Training (FORRT; https://forrt.org/): Connects educators and scholars in higher education to embed open and reproducible science tenets in education. Offers the e-learning platform Nexus, with curated resources that provide sufficient context for educators to use them.

Reproducibility for Everyone (R4E)

R4E was formed in 2018 to address the challenges of integrating reproducible research practices in life science laboratories across the globe. Our mission is to increase awareness of the factors that affect reproducibility, and to promote best practices for reproducible and transparent scientific research. We offer open access introductory materials and workshops to teach scientists at all career stages and across disciplines concrete steps they can take to improve the transparency and reproducibility of their research. All workshops are offered free of charge. We developed eight modules as independent, in-depth slide sets, each focusing on a different aspect of the day-to-day scientific workflow, allowing trainers to customize the workshop and adapt it to audiences in different disciplines (Box 1). R4E targets mainly biological and medical research practices (reagent and protocol sharing, data management) and in part computer science (bioinformatic tools), as evidenced by the range of trainings offered so far. The tools we discuss could also be useful for disciplines adjacent to biological research, such as bioengineering, biophysics and (bio)chemistry. Some training modules, especially those on data management, data visualization and figure design, may also be valuable for qualitative research that collects and analyzes text and other non-numerical data.

All materials, including recordings of previous R4E workshops and webinars, are available at https://www.repro4everyone.org/ (RRID:SCR_018958). The goal of R4E is to give scientists a clear overview of existing reproducibility-promoting tools, and to provide full access to all training materials so that scientists can revisit them whenever needed and learn at their own pace. In addition, we welcome each trainee to fine-tune the material for their own field of expertise and to train their peers. For trainees who want to help run one of our workshops, we offer a train-the-trainer approach: we meet with the trainee before the workshop and decide together which section of the material the trainee will present. We then go through the material together, share speaker notes, and, if needed, rehearse with the trainee to keep the presentation on time during the workshop.

We have developed materials for both introductory and intensive workshop formats that are described below:

  • Introductory workshops are organized as two-hour sessions, comprising a 60–90 min presentation and a 30 min interactive discussion of case studies, and can be held in person or virtually with a large number of participants (>100). These introductory workshops are designed for an interdisciplinary audience and, as they cover many different topics (Box 1), do not require prior knowledge of reproducible research practices. They are generally presented by a team of two to four instructors.

  • Intensive workshops provide in-depth training in the implementation of reproducible research practices for one or more topics. These workshops take at least four hours. Depending on the number of topics covered, intensive workshops may be spread over several days. R4E members typically design these sessions to provide intensive instruction within their areas of expertise. Outside experts may also be invited to teach sessions on additional topics. This type of workshop is best suited for a smaller (<50) group of participants.

Over the years, our community has grown and diversified substantially; it now consists of scientists who have each taught one or more R4E workshops. To date, we have reached more than 3000 researchers through over 30 workshops, predominantly held at international conferences and spanning numerous life science disciplines (e.g. ecology, biotechnology, plant sciences and neuroscience). In addition, we have hosted several webinars that allowed researchers from around the world to join, including webinars for early-career scientists participating in the eLife Community Ambassadors Program. Investigators and conference organizers can request a workshop led by our volunteers, or use our materials to learn more about responsible research practices and offer their own training.

The goal of our training is to introduce participants to a reproducible scientific workflow. Individual scientists or laboratories can make their research more reproducible by implementing as many of the steps introduced in our workshops as they are comfortable with (Figure 2). Feedback on our workshops indicates that 80% of participants learned important new aspects of reproducible research practices and are very likely to implement at least some of the presented tools in their own research workflows in the future.

Approaches that scientists can use to increase the reproducibility of their publications.

From top to bottom, approaches that can be used on their own or in combination to increase the reproducibility of experiments, ordered from least to most reproducible. The column on the right lists tools and resources that can help scientists implement each approach.

It is important to point out that this will likely work best as a stepwise, iterative process, so that scientists do not feel overwhelmed by implementing too many changes at once. When writing a research paper, the largest gains in reproducibility come from the following changes. First, add a detailed list of materials, including research resource identifiers (RRIDs; https://scicrunch.org/resources) and catalog numbers for all materials (kits, antibodies, seeds, cell lines, organisms, etc.) that were created or used during the study; ideally, newly generated reagents or organisms are deposited in appropriate repositories to give other scientists easy access. Second, include a detailed and specific methods section, which is crucial for reproducing the research; ideally, protocols are deposited in a repository and the DOI of each protocol is cited in the methods section. Third, deposit large data sets, including all metadata, in public data repositories to generate findable, accessible, interoperable, and reusable (FAIR) data (Sansone et al., 2019). Finally, bioinformatic pipelines and scripts can easily be shared via GitHub, Anaconda, or computational containers such as Singularity; at a minimum, authors should list and cite all programs used, including version numbers and parameters.
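To illustrate the kind of structured materials list this enables, the short sketch below turns reagent records into a consistent, tab-separated methods table. The reagent names, catalog numbers and RRIDs are invented placeholders, not real identifiers.

```python
# Hypothetical reagent records; all identifiers below are
# invented placeholders for illustration only.
reagents = [
    {"name": "anti-GFP antibody", "source": "ExampleCo",
     "catalog": "EC-1234", "rrid": "AB_0000001"},
    {"name": "pExample-GFP plasmid", "source": "Example Repository",
     "catalog": "ER-5678", "rrid": "Addgene_000000"},
]

def materials_table(entries):
    """Format reagent records as tab-separated lines listing
    name, source, catalog number and RRID."""
    header = "Reagent\tSource\tCat. no.\tRRID"
    rows = [f"{e['name']}\t{e['source']}\t{e['catalog']}\tRRID:{e['rrid']}"
            for e in entries]
    return "\n".join([header] + rows)

print(materials_table(reagents))
```

Keeping such records in one machine-readable place means the same file can feed both the manuscript's key resources table and the lab's internal inventory.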

We would also like to point out that a supportive environment is critical for these practices to be properly adopted. Being the first to speak up about irreproducible research practices in your lab or institute can be challenging, or in some cases even isolating. Getting involved with a local ReproducibiliTea journal club, or reaching out to the initiative to start a chapter of your own, can help you connect with like-minded individuals. Similarly, joining the R4E community and discussing these situations with our members can help you find ways to convince your peers and supervisors of the importance of incorporating reproducible research practices.

How can scientists use the R4E materials?

There are several ways for researchers to take advantage of the materials presented here to teach reproducible research practices. First, researchers can request a workshop run by the Reproducibility for Everyone team for a conference via email (hello@repro4everyone.org). Alternatively, researchers can use the slides and training materials available on our website to organize their own workshops. Reproducibility can be integrated into the research curriculum by asking trainees to organize and run a poster workshop at an institutional or departmental research day. Trainees can also discuss individual topics at journal clubs or as part of a methods course, after which they can develop plans to implement the identified solutions in their own research. Upcoming workshops and other opportunities to get involved and contribute will be shared through our Twitter account (@repro4everyone) and website (https://www.repro4everyone.org/).

Conclusions

Widespread adoption of new tools and practices is urgently needed to make scientific publications more transparent and reproducible. This transition will require scalable and adaptable approaches to reproducibility education that allow scientists to efficiently learn new skills and share them with others in their lab, department and field.

R4E demonstrates how a common, public set of materials, curated and maintained by a small group, can form the basis of a global initiative to improve transparency and reproducibility in the life sciences. Flexible materials allow instructors to adapt both the content and the workshop format to the needs of audiences in their discipline. Continued training in reproducibility could also be promoted within laboratories, for instance by devoting every nth journal club to an educational meeting discussing the latest developments in the reproducibility field.

Our workshops have reached over 3000 learners on six continents and continue to expand each year, offering a unique opportunity to train the next generation of scientists. Moving forward, R4E plans to broaden its reach by translating the existing materials into different languages, bringing reproducibility training to more non-native English-speaking scientists. However, increasing training in reproducible research practices alone will not suffice to make all scientific findings reproducible. To achieve this goal, higher-level changes are needed to reduce the hypercompetitive nature of scientific research. Large structural and cultural changes are needed to transition from rewarding only breakthrough scientific findings to also rewarding research performed using reproducible and transparent practices.

Data availability

No new data were generated in this study.

References

    1. Botvinik-Nezer R, Holzmeister F, Camerer CF, Dreber A, Huber J, Johannesson M, Kirchler M, Iwanir R, Mumford JA, Adcock RA, et al. (2020) Variability in the analysis of a single neuroimaging dataset by many teams. Nature 582:84–88. https://doi.org/10.1038/s41586-020-2314-9

Decision letter

  1. Helena Pérez Valle
    Reviewing Editor; eLife, United Kingdom
  2. Peter Rodgers
    Senior Editor; eLife, United Kingdom

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Thank you for submitting your article "A community-led initiative for training in reproducible research" to eLife for consideration as a Feature Article. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by Helena Pérez Valle, and the eLife Features Editor, Peter Rodgers.

The reviewers and editors have discussed the reviews and we have drafted this decision letter to help you prepare a revised submission.

Summary:

This manuscript provides an overview of the community-led project Reproducibility for Everyone (R4E, https://repro4everyone.org/). R4E is already a useful training resource and community that provides introductory training sessions across a range of openness and reproducibility practices. Here the project is described so that researchers can make use of it to develop and teach reproducibility skills. This is a good collection of resources, and an excellent base collection for expansion via community engagement.

Essential revisions:

1. Please edit the first sentence of the abstract to temper it, for example, by being more specific about what conducting reproducible research does for the transparency and usefulness of research, and please avoid references to the scientific method in the article.

2. The article clearly defines replicable and reproducible separately, but then uses the word reproducibility to encompass the issues discussed in the paragraph from lines 88–105 (line numbers refer to the PDF), which mainly concern replicability. Please correct this slippage of terminology throughout the article.

3. Please edit Figure 1 so that it is not presented as a Venn diagram (i.e. each factor should appear under just one heading, and the label "lack of training" should be removed) and edit the caption to describe the figure closely and explain that it is a rough approximation for the taxonomy of factors that contribute to research not being reproducible.

4. If possible, it would be informative for the article to include more statistics and comments about the reach and impact of the Reproducibility for All project. If any participant feedback or similar is available for sharing, please report it in the article. Information on whether the workshops are as successful when delivered virtually as they are in person would be particularly useful given the current context and the fact that scalability may rely on virtual formats.

5. Please consider adding a paragraph at an appropriate place in the paper covering the following comments made by the referees:

"Between the individual and systemic levels of science is a cultural level, including mentor and peer support, lab culture, and supervision. While the authors are rightly proud of their existing reach and have laudable ambitions for translating materials into different languages, they miss an opportunity to address cultural barriers. Including acknowledgement of and advice for researchers who are in unsupportive environments (or researchers who are pushing the reproducibility envelope for their institutions), with suggestions of where practical and social support can be found, would be a valuable inclusion. Addressing the isolation which can surround being the first in an institution to try to address reproducibility issues is, in my experience organising ReproducibiliTea Journal Clubs, all the more relevant for institutions outside the elite English-speaking universities."

"In my reading there is an implicit assumption throughout the manuscript that improving training and improving access to training should solve issues with reproducibility. I agree that training is part of the solution, but what this manuscript does not discuss in depth is grapple with the dynamics between for example improved training and pervasive incentives to not spend time on reproducibility. I do acknowledge that I may be reading too much into this aspect and that it is not the purpose of this manuscript to discuss the research culture issues around reproducibility other than to introduce R4E."

6. Please cite papers (or repositories of papers) that have enacted the practices summarised in Page 8.

7. Please include a discussion about the fields or kinds of research that the reproducibility pipeline refers to. It would be useful for the reader to know whether the approach presented by R4E was developed with different disciplines in mind, or a specific biological research focus that then needs to be ported elsewhere. A limitation of some of the 'open science' projects, for example, has been to narrow focus so much to a specific fields' needs or a specific research type that readers could presume that the approach was simply not valid for their research. R4E seems to cross this boundary nicely and this could be highlighted more in the manuscript. It may also be worth making a note of the applicability of these pipelines to very different research approaches, for example qualitative approaches.

https://doi.org/10.7554/eLife.64719.sa1

Author response

Essential revisions:

1. Please edit the first sentence of the abstract to temper it, for example, by being more specific about what conducting reproducible research does for the transparency and usefulness of research, and please avoid references to the scientific method in the article.

We thank the reviewer for this comment. As requested, we have revised the first sentence of the abstract to the following:

“Open and reproducible research practices increase the reusability and usefulness of scientific research.”

We have removed the term "scientific method" from the manuscript.

2. The article clearly defines replicable and reproducible separately but then uses the word reproducibility to encompass the things discussed in the paragraph from line 88-105 (lines are numbered lines from the PDF) which mainly describe issues with replicability. Please correct slippage of terminology throughout the article.

Thank you very much for this thoughtful comment. We discussed this issue and have now added the following sentence to the manuscript on page 3, lines 76 to 77.

“We use the term reproducibility to capture both these concepts at once as is done often in life sciences (Barba, 2018).”

The provided reference is a good analysis showing that different fields use these terms differently. We are of the view that, in the life sciences, reproducibility is often understood to cover both terms interchangeably. We prefer to acknowledge this shortcoming, as the article would otherwise become too complex and wordy, given that we often mean both reproducibility and replicability at the same time.

3. Please edit Figure 1 so that it is not presented as a Venn diagram (i.e. each factor should appear under just one heading, and the label "lack of training" should be removed) and edit the caption to describe the figure closely and explain that it is a rough approximation for the taxonomy of factors that contribute to research not being reproducible.

We have revised the figure so that the content is not presented as a Venn diagram. We have also revised the figure caption.

4. If possible, it would be informative for the article to include more statistics and comments about the reach and impact of the Reproducibility for All project. If any participant feedback or similar is available for sharing, please report it in the article. Information on whether the workshops are as successful when delivered virtually as they are in person would be particularly useful given the current context and the fact that scalability may rely on virtual formats.

We thank the reviewer for this comment; we have included the following statement within the manuscript (page 8, lines 199 to 201), as well as below.

“Feedback on our workshops indicates that 80% of participants learned important new aspects of reproducible research practices and are very likely to implement at least some of the presented tools in their own research workflows in the future.”

We would like to point out that this is based on non-representative post-participation surveys, interviews with participants and anecdotal feedback. It is not based on a scientific approach or a fully controlled study. We hope the careful phrasing of the sentence makes this clear.

Similarly, we have not seen a significant difference in uptake between in-person and online workshops. However, this has not been analyzed to a degree that would make us comfortable commenting on it here. Hence, we feel this aspect is beyond the scope of the current manuscript, which introduces R4E.

5. Please consider adding a paragraph at an appropriate place in the paper covering the following comments made by the referees:

"Between the individual and systemic levels of science is a cultural level, including mentor and peer support, lab culture, and supervision. While the authors are rightly proud of their existing reach and have laudable ambitions for translating materials into different languages, they miss an opportunity to address cultural barriers. Including acknowledgement of and advice for researchers who are in unsupportive environments (or researchers who are pushing the reproducibility envelope for their institutions), with suggestions of where practical and social support can be found, would be a valuable inclusion. Addressing the isolation which can surround being the first in an institution to try to address reproducibility issues is, in my experience organising ReproducibiliTea Journal Clubs, all the more relevant for institutions outside the elite English-speaking universities."

We thank the reviewer for pointing this out to us and agree that it can be isolating and frustrating when you find yourself in a non-supportive environment. We have added the following to the manuscript to address this situation (page 9, lines 217 to 225):

“We would like to point out that a supportive environment is critical for these efforts to be properly adopted in a research environment. […] Similarly, joining the R4E community and discussing these situations with our community members can help you find solutions to convince your peers and supervisors of the importance of incorporating reproducible research practices.”

"In my reading there is an implicit assumption throughout the manuscript that improving training and improving access to training should solve issues with reproducibility. I agree that training is part of the solution, but what this manuscript does not discuss in depth is grapple with the dynamics between for example improved training and pervasive incentives to not spend time on reproducibility. I do acknowledge that I may be reading too much into this aspect and that it is not the purpose of this manuscript to discuss the research culture issues around reproducibility other than to introduce R4E."

We wholeheartedly agree with the reviewer’s comment: higher-level issues are to blame for scientific findings being irreproducible. However, as these issues are deeply ingrained in our meritocracy-based science funding system, overcoming them will require a complete re-organization of our current way of funding and evaluating scientific success. R4E tries to focus on the positive and addresses changes that each scientist can implement themselves to make a difference in the reproducibility of their own research. If the reviewer would like to see this addressed in the manuscript, we could add the following section to the description of factors affecting reproducibility (page 10, lines 259 to 262):

“Increasing training in reproducible research practices alone will not suffice to make all scientific findings reproducible. […] Large structural and cultural changes are needed to transition from rewarding only breakthrough scientific findings, to promoting those that were performed using reproducible and transparent research practices.”

6. Please cite papers (or repositories of papers) that have enacted the practices summarised in Page 8.

We are unfortunately unable to act on this request, as we are unsure what the reviewer is referring to; as given, the request is too unspecific.

7. Please include a discussion about the fields or kinds of research that the reproducibility pipeline refers to. It would be useful for the reader to know whether the approach presented by R4E was developed with different disciplines in mind, or a specific biological research focus that then needs to be ported elsewhere. A limitation of some of the 'open science' projects, for example, has been to narrow focus so much to a specific fields' needs or a specific research type that readers could presume that the approach was simply not valid for their research. R4E seems to cross this boundary nicely and this could be highlighted more in the manuscript. It may also be worth making a note of the applicability of these pipelines to very different research approaches, for example qualitative approaches.

We thank the reviewers for this comment and have added a paragraph to bridge the gap and make it clearer that our training materials are not suited exclusively to biological research. We hope this improves the applicability of our materials. Feedback we received during workshops from researchers outside the life sciences indicates that our materials were useful for them as well (page 6, lines 144 to 147).

R4E mainly targets biological and medical research practices (reagent and protocol sharing, data management) and, in part, computer science (bioinformatic tools), as evidenced by the range of training offered so far. The tools we discuss could also be useful for disciplines close to biological research, such as bioengineering, biophysics and (bio)chemistry. Some training modules, especially data management, data visualization and figure design, might be valuable for qualitative research that collects and analyzes text and other non-numerical data.

https://doi.org/10.7554/eLife.64719.sa2

Article and author information

Author details

  1. Susann Auer

    Susann Auer is in the Department of Plant Physiology, Institute of Botany, Faculty of Biology, Technische Universität Dresden, Dresden, Germany and is an eLife ambassador

    Contribution
    Formal analysis, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Nele A Haeltermann
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6566-5060
  2. Nele A Haeltermann

    Nele A Haeltermann is in the Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, United States and is an eLife ambassador

    Contribution
    Visualization, Writing - original draft, Project administration, Writing - review and editing
    Contributed equally with
    Susann Auer
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-1431-7581
  3. Tracey L Weissgerber

    Tracey L Weissgerber is in the QUEST Center, Berlin Institute of Health, Charité Universitätsmedizin Berlin, Berlin, Germany and is a member of the eLife Early-Career Advisory Group

    Contribution
    Conceptualization, Funding acquisition, Visualization, Writing - original draft, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7490-2600
  4. Jeffrey C Erlich

    Jeffrey C Erlich is in the NYU-ECNU Institute of Brain and Cognitive Science, NYU Shanghai and the Shanghai Key Laboratory of Brain Functional Genomics, East China Normal University, Shanghai, China and is an eLife ambassador

    Contribution
    Visualization, Writing - original draft
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9073-7986
  5. Damar Susilaradeya

    Damar Susilaradeya is in the Medical Technology Cluster, Indonesian Medical Education and Research Institute, Faculty of Medicine, Universitas Indonesia, Jakarta, Indonesia and is an eLife ambassador

    Contribution
    Visualization, Writing - original draft, Project administration
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4548-5924
  6. Magdalena Julkowska

    Magdalena Julkowska is in the Boyce Thompson Institute, Ithaca, United States and is an eLife ambassador

    Contribution
    Visualization, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
  7. Małgorzata Anna Gazda

    Małgorzata Anna Gazda is in the CIBO/InBIOO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão and the Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal and is an eLife ambassador

    Contribution
    Visualization, Writing - original draft, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8369-1350
  8. Benjamin Schwessinger

    Benjamin Schwessinger is in the Research School of Biology, Australian National University, Canberra, Australia and is a member of the eLife Early-Career Advisory Group

    Contribution
    Conceptualization, Funding acquisition, Investigation, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    benjamin.schwessinger@anu.edu.au
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7194-2922
  9. Nafisa M Jadavji

    Nafisa M Jadavji is in the Department of Biomedical Science, Midwestern University, Glendale, United States and in the Department of Neuroscience, Carleton University, Ottawa, Canada and is an eLife ambassador

    Contribution
    Supervision, Visualization, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    njadav@midwestern.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3557-7307
  10. Reproducibility for Everyone Team

    Reproducibility for Everyone, New York, United States
    Contribution
    Conceptualization, Funding acquisition, Investigation, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    1. Angela Abitua, Addgene, Boston, United States
    2. Anzela Niraulu, Neuroscience Graduate Program, Ohio State University, Columbus, United States
    3. Aparna Shah, The Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, United States
    4. April Clyburne-Sherin, Reproducibility for Everyone, New York, United States
    5. Benoit Guiquel, Addgene, London, United Kingdom
    6. Bradly Alicea, Orthogonal Research and Education Laboratory, Champaign, United States
    7. Caroline LaManna, Addgene, Boston, United States
    8. Diep Ganguly, Research School of Biology, Australian National University, Canberra, Australia
    9. Eric Perkins, Addgene, Boston, United States
    10. Helena Jambor, Centre for Regenerative Therapies Dresden, Dresden, Germany
    11. Ian Man Ho Li, Massachusetts General Hospital, Harvard University, Boston, United States
    12. Jennifer Tsang, Addgene, Boston, United States
    13. Joanne Kamens, Addgene, Boston, United States
    14. Lenny Teytelman, Protocols.io, San Francisco, United States
    15. Mariella Paul, Psychology of Language Group, University of Göttingen, Göttingen, Germany
    16. Michelle Cronin, Addgene, Boston, United States
    17. Nicolas Schmelling, Institute for Synthetic Microbiology, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
    18. Peter Crisp, Research School of Biology, Australian National University, Canberra, Australia
    19. Rintu Kutum, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
    20. Santosh Phuyal, Institute of Basic Medical Science, University of Oslo, Oslo, Norway
    21. Sarvenaz Sarabipour, Institute for Computational Medicine and Department of Biomedical Engineering, Johns Hopkins University, Baltimore, United States
    22. Sonali Roy, Plant Biotechnology, Tennessee State University, Nashville, United States
    23. Susanna M Bachle, Addgene, Boston, United States
    24. Tuan Tran, Aerospace Engineering, Nanyang Technological University, Singapore, Singapore
    25. Tyler Ford, Picture as Portal, San Francisco, United States
    26. Vicky Steeves, Research Data Management and Reproducibility, New York University, New York, United States
    27. Vinodh Ilangovan, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
    28. Ana Baburamani, Centre for the Development of Brain, School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom

Funding

Mozilla Foundation (MF-1811-05938)

  • Benjamin Schwessinger

Chan Zuckerberg Initiative (223046)

  • Susann Auer
  • Nele A Haeltermann
  • Benjamin Schwessinger
  • Nafisa M Jadavji
  • Reproducibility for Everyone Team

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Members of the Reproducibility for Everyone initiative would like to thank all organizers, volunteers and staff who have helped over the years with running our workshops. We would like to thank the eLife Ambassador program, Addgene, Protocols.io, the American Society of Plant Biology, the American Society of Microbiology, New England Biolabs, the Chan Zuckerberg Initiative, Dorothy Bishop, and many others for supporting the Reproducibility for Everyone initiative.

Publication history

  1. Received: November 9, 2020
  2. Accepted: June 18, 2021
  3. Accepted Manuscript published: June 21, 2021 (version 1)
  4. Version of Record published: July 15, 2021 (version 2)

Copyright

© 2021, Auer et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,892 page views
  • 244 downloads
  • 9 citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Cite this article

  1. Susann Auer
  2. Nele A Haeltermann
  3. Tracey L Weissgerber
  4. Jeffrey C Erlich
  5. Damar Susilaradeya
  6. Magdalena Julkowska
  7. Małgorzata Anna Gazda
  8. Benjamin Schwessinger
  9. Nafisa M Jadavji
  10. Reproducibility for Everyone Team
(2021)
Science Forum: A community-led initiative for training in reproducible research
eLife 10:e64719.
https://doi.org/10.7554/eLife.64719