Biomedical supervisors’ role modeling of open science practices

  1. Tamarinde L Haven (corresponding author)
  2. Susan Abunijela
  3. Nicole Hildebrand
  1. Danish Centre for Studies in Research and Research Policy, Department of Political Science, Aarhus University, Denmark
  2. QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Germany

Abstract

Supervision is one important way to socialize Ph.D. candidates into open and responsible research. We hypothesized that open science practices (here publishing open access and sharing data) would be more likely in empirical publications that were part of a Ph.D. thesis when the Ph.D. candidates’ supervisors engaged in these practices compared to those whose supervisors did not, or did so less often. Starting from thesis repositories at four Dutch University Medical Centers, we included 211 pairs of supervisors and Ph.D. candidates, resulting in a sample of 2062 publications. We determined open access status using UnpaywallR and open data using Oddpub, and manually screened publications with potential open data statements. Eighty-three percent of our sample was published openly, and 9% had open data statements. Having a supervisor who published open access more often than the national average was associated with 1.99 (CI: 1.17–3.38) times the odds of publishing open access; however, this effect became nonsignificant when correcting for institutions. Having a supervisor who shared data was associated with 2.21 (CI: 1.19–4.12) times the odds of sharing data compared to having a supervisor who did not. This odds ratio increased to 4.60 (CI: 1.86–11.35) after removing false positives. The prevalence of open data in our sample was comparable to international studies; open access rates were higher. Whilst Ph.D. candidates spearhead many initiatives to promote open science, this study adds value by investigating the role supervisors play in promoting these practices.

Editor's evaluation

This paper will be of interest to scientists who are interested in open-access publishing and open data-sharing procedures. The authors examine associations between PhD candidates' use of open access and open data procedures and use by their supervisors. At present, the study provides solid, useful information suggesting that candidates whose supervisors engage in open-access publishing and open data sharing are more likely to do so, but it does not establish causality or directionality.

https://doi.org/10.7554/eLife.83484.sa0

Introduction

When conducted in a manner that emphasizes rigorous and transparent research, supervision can be an important means to socialize Ph.D. candidates into responsible research practices (Bird, 2001; Anderson et al., 2007; Davis et al., 2007; All European Academies, 2017; Universities of The Netherlands, 2018). Responsible research practices are practices researchers can engage in to enhance the transparency, validity, and trustworthiness of their work (Steneck, 2006; Bouter, 2020). Within biomedical science, examples include open science practices such as openly sharing data and publishing open access, as well as making the underlying methodology openly available, and explicitly acknowledging the limitations of the research findings (Iqbal et al., 2016; Moher et al., 2018; Wallach et al., 2018; Serghiou et al., 2021; Gopalakrishna et al., 2022b; Susanin et al., 2022; Roche et al., 2022a; Hughes et al., 2022).

A pilot study conducted by a research team including the first author distinguished three components of supervision that effectively socializes Ph.D. candidates into open and responsible research practices (Haven et al., 2022). First, the supervisor is supposed to be a role model, i.e., the supervisor engages in open and responsible research practices by, for example, making their own data and code consistently available. Second, the supervisor encourages the Ph.D. candidate to engage in responsible research practices. After all, the supervisor may have more of a coordinating role, or may be versed in a different sub-discipline than the one the Ph.D. candidate works in. Some have referred to this as the distinction between implicit (role modeling) and explicit (verbal instructions and encouragement) supervision (Fisher et al., 2009). Third, the supervisor is able to create a psychologically safe atmosphere where Ph.D. candidates feel the space to discuss dilemmas, admit mistakes, and question the status quo (Antes and DuBois, 2018; Antes et al., 2019a; Antes et al., 2019b). This psychological safety in turn contributes to maintaining quality by creating an environment where colleagues can safely scrutinize each other’s work (Roberto et al., 2006; Halbesleben and Rathert, 2008).

Research into responsible supervision is growing, but many knowledge gaps remain. A scoping review (Pizzolato and Dierickx, 2023) identified a total of 35 empirical studies on the topic, two-thirds of which used a survey design enquiring about the perceptions of either supervisors or supervisees (the exception being Buljan et al., 2018, who conducted a qualitative study). More direct evidence (beyond perceptions) on role modeling could not be identified, even though role modeling is presumed to be a crucial component of responsible supervision.

This study adds to the literature by proposing a new way to investigate the role modeling of open science practices in biomedicine. It starts from the assumption that if role modeling is important, then it should be possible to discern an association between the supervisor’s engagement in open science practices and the Ph.D. candidate’s engagement in those practices. We hypothesized that open science practices (here publishing open access and sharing data) would be more likely in empirical publications that were part of a Ph.D. thesis when the Ph.D. candidates’ supervisors engaged in these open science practices compared to those whose supervisors did not, or did so less often.

Results

Open science practices analyses

We included 211 pairs of Ph.D. candidates and supervisors: 50 from Leiden UMC, 54 from Amsterdam UMC, 52 from UMC Groningen, and 55 from Maastricht UMC. This resulted in 2062 DOIs, six of which did not resolve in Unpaywall (0.3%), and 14 PDFs could not be obtained for Oddpub (0.7%). The prevalence of each practice (expressed as unique DOIs) appears in Table 1, together with the correlations between the Ph.D. candidates’ and the supervisors’ engagement in each practice. GEE logistic regression analyses for both crude and adjusted models appear in Table 2.

Table 1
Prevalence of open access publishing and sharing data openly among unique DOIs.

| Practice | Ph.D. candidates | Supervisors | Total* | Spearman’s correlation |
| Open access | 548 | 1154 | 1702 (82.8%) | 0.24 |
| Open data (automated) | 67 | 112 | 179 (8.8%) | 0.20 |
| Open data (manually verified) | 34 | 66 | 100 (4.8%) | 0.22 |

* Numbers in parentheses indicate the proportion of the total sample.

Table 2
GEE logistic analyses for open access, open data (automated detection), and open data (manually verified).

| Practice | Crude analysis: N | OR* (95% CI) | p-value | Adjusted analysis (institution added): N | OR* (95% CI) | p-value |
| Open access (binary) | 651 | 1.99 (1.17–3.38) | 0.011 | 651 | 1.64 (0.94–2.85) | 0.079 |
| Open data automated (binary) | 644 | 2.09 (1.13–3.88) | 0.019 | 644 | 2.21 (1.19–4.12) | 0.012 |
| Open data manually verified (binary) | 653 | 3.74 (1.53–9.12) | 0.004 | 653 | 4.60 (1.86–11.35) | 0.001 |

Reference categories: for open access, up to or including the national average (76%) of the supervisor’s publications were open access; for open data (automated and manually verified), the supervisor never shared data.

N = the total number of included publications by Ph.D. candidates.

* Odds ratios are EXP transformed.

Retractions analysis

We were able to link a total of 81,091 publications to the supervisors and Ph.D. candidates. Three Ph.D. candidates could not be identified; all supervisors were identified. Of the 81,091 publications that could be matched to the Ph.D. candidates and supervisors, two were retracted. Both were publications on which the supervisor appeared as one of the co-authors, and both were retracted one year after publication. The following reasons were specified, which we interpret as honest errors:

‘The authors have become aware that some of the results presented in this paper are invalid, not reproducible, and/or misinterpreted. They consider that the main conclusion of the paper is not valid. They, therefore, retract this publication’.

‘The original version of this article was withdrawn by the authors. An error was discovered in the creation of the protein database file that was used for searching, which led to some incorrect associations between peptides and proteins. A corrected version of the manuscript has been supplied which contains very similar peptide identifications as the original, but the resulting number of proteins in various categories has now changed, and as a result, some of the figures and supplementary files have changed also. The underlying conclusions of this study, however, remain unaltered’.

None of the publications included in our own dataset were retracted. RetractionWatch, a blog and database tracking and reporting on retractions (https://retractionwatch.com/), indicated that retractions can take from 3 months to many years, hence some papers may be retracted in the future.

Discussion

We hypothesized that having a supervisor who shares data or publishes open access would be associated with a higher likelihood that the Ph.D. candidate engages in the same practice. Based on the automated detection of data-sharing statements, we found that having a supervisor who shared data was associated with 2.21 (p=0.012) times the odds of sharing data compared to having a supervisor who did not share data. This odds ratio increased to 4.60 (p=0.001) after manually checking the open data statements and removing false positives. The unadjusted open access odds ratio was 1.99 (p=0.011) and became 1.64 (p=0.079) when correcting for the role of the institution. When the institution variable was included in the adjusted analyses, the effects for open data remained significant; the odds ratio for the manually verified open data increased by 23%, which could be due to the institution initially masking the effect of the supervisor (see also Kahan et al., 2014).

Contextualisation

Since our sample consists of Ph.D. candidates and supervisors from Dutch UMCs and focuses only on recent years, it may be useful to compare it to international data and reflect on some assumptions that went into our design. Serghiou et al., 2021 screened all of PubMed Central using the same text-mining algorithm as the current study and found that 8.9% of publications returned open data statements. Our sample (8.8%) therefore does not seem substantially different from the rest of biomedicine.

We included publications across various subfields of biomedicine. However, the odds of finding a publication that shares its underlying data may not be the same for all subfields. Subfields working with genetics and OMICS data could be more likely to share data than subfields describing clinical research, given the ethical and privacy-related complications involved in sharing clinical data (Mansmann et al., 2023).

It should be noted that open data does not imply the data were gathered in a rigorous, ethical, and reproducible manner. It could even be that the data are FAIR but still not useful, because the methods used were not well-suited to answer the research question, because the data collection was sloppy, or because the data fail to capture crucial differences in the target population. We manually verified whether we could find the data and whether the data were actually open, that is, stored in a repository and downloadable. Any assessments of the quality of the data or the quality of archiving (see e.g. Roche et al., 2022b) are beyond the scope of this paper.

For open access (82.8% in the current study), we find our Dutch sample to be in line with national data, but above average when compared to international studies. The Rathenau Institute calculated that 76% of all Dutch publications were available open access in 2021 (Koens and Vennekens, 2022). Using Unpaywall, Robinson-Garcia et al., 2020 found the Dutch uptake of open access to be around 60%. Looking internationally, Piwowar et al., 2018 assessed the prevalence of open access in different databases. When they looked specifically at biomedicine, using data from the Web of Science and the same open/closed distinction as the current study, they found a little over 30% to be open. This difference could be due to various recent Dutch policies for open access: Dutch universities and UMC federations have struck various deals with publishers (https://www.openaccess.nl/en/in-the-netherlands/publisher-deals), and the Dutch Research Council requires all the research it funds to be published open access (https://www.nwo.nl/en/open-access-publishing).

Open access comes in different forms, and publishing in open-access journals is often not free (Ross-Hellauer, 2022). Some publishers make exceptions for scientists from low- and middle-income countries, but the Netherlands would not qualify. It could thus be that the amount of funding a supervisor or Ph.D. candidate had available affected the relationship we studied. In other words, funding availability may determine whether Ph.D. candidates (or supervisors) choose to publish in an open access journal. However, it should be noted that green open access, archiving a paper in an appropriate format in an (institutional) repository, can be done free of charge.

The role of early career researchers (ECRs)

A variety of grassroots initiatives that aim to promote open science practices are spearheaded by ECRs (many of them in the process of obtaining a Ph.D.). Popular examples in the Netherlands include ReproducibiliTea (https://reproducibilitea.org) as well as the Open Science Communities (https://www.osc-nl.com).

In addition, many education and training activities to promote open science and responsible research practices target master’s students and Ph.D. candidates. Assuming this group then has more opportunities to learn about open and responsible research, this raises the question of who teaches whom. On this note, Pizzolato and Dierickx propose it might be useful to have Ph.D. candidates mentor their supervisors when it comes to matters of research integrity (Pizzolato and Dierickx, 2022).

Our findings do not allow for causal inferences, yet we believe they need not conflict with ECRs’ and Ph.D. candidates’ own knowledge about and engagement in open science practices. Even when one has knowledge about open science practices when starting a Ph.D. trajectory, or engages in a ReproducibiliTea reading group during the Ph.D., it may still help to have a supervisor who role models these practices. Considering the associations that we identified, we speculate that working under supervisors who engage in open science themselves could empower Ph.D. candidates to engage in open science more readily. Or, at the very least, such a supervisor is less likely to hamper the Ph.D. candidate’s engagement in these practices. The other side of the coin, supervisors’ lack of engagement in open science practices, still seems to be more common, although a recent Dutch survey found Ph.D. candidates to score lower than senior researchers on sharing data (Gopalakrishna et al., 2022a). Finally, it could be that the relationship investigated here is bidirectional.

Limitations

This study included many, but not all, publications by the supervisors. The number of included first- or last-author publications for the open science practices varies between 3 and 11; we always included more publications by the supervisor than by the Ph.D. candidate. This meant that at times we had to exclude pairs because the supervisor did not have a sufficient number of publications, so we may have a small bias towards productive supervisors. In addition, we only included publications up until the year that the Ph.D. candidate defended their thesis, meaning that we at times had to exclude the most recent works.

We only sampled from four out of eight Dutch UMCs; hence our findings may not generalize to all Dutch UMCs, let alone to other countries. That said, we see no prima facie reason to believe that Leiden, Amsterdam (AMC), Groningen, and Maastricht differ substantially from Nijmegen, Utrecht, Rotterdam, and Amsterdam (VUmc), especially given the national data from the Rathenau Institute and the fact that a similar proportion of open data statements was returned in a much larger study of biomedical research (Serghiou et al., 2021).

Finally, our study does not allow for drawing causal inferences on who educated whom regarding open science practices. This is due to its design, but also because we only extracted publications by Ph.D. candidates that were part of their Ph.D. thesis. Hence, we might have missed publications outside of, or prior to, the Ph.D. trajectory that would have indicated greater engagement in open science practices. That said, this was beyond the scope of our study, which focused on the role of the supervisor’s engagement in open science practices.

Conclusion

We investigated whether having a supervisor who shared data openly and published open access was associated with greater odds of the Ph.D. candidate sharing their data and publishing open access. Based on our sample of 211 pairs of biomedical Ph.D. candidates and supervisors, we find the odds of a Ph.D. candidate sharing data to be greater when working under a supervisor who shared data themselves. The effect for open access was smaller and became nonsignificant when correcting for institution, which might be explained by a greater uptake of open access across the Dutch ecosystem. Our design highlights a new way of investigating role modeling in the context of open science and other responsible research practices.

Materials and methods

Materials availability statement

Request a detailed protocol

Data were collected using a pilot-tested protocol that is freely accessible on OSF; we provide a brief overview of our data collection procedures and materials below.

Ethical aspects

Request a detailed protocol

This study used publicly available information (publications) as its data and hence no ethical approval was required. The study was preregistered on the OSF, see: 10.17605/OSF.IO/2PBNS.

Population

Request a detailed protocol

Our population consisted of pairs of Ph.D. candidates and their main supervisors (in the Netherlands, the primary supervisor has to be a full professor, although recently associate professors can get these rights, too). They had to be affiliated with a Dutch University Medical Center (henceforth: UMC) and had to work in biomedicine (understood as their publications being indexed in PubMed).

The Netherlands has eight UMCs, four of which maintained Ph.D. thesis repositories that allowed for the reliable extraction of data (based on a pilot study). These were Leiden UMC, Amsterdam UMC (location AMC), Maastricht UMC, and UMC Groningen.

Sample size

Request a detailed protocol

Since, to our knowledge, no previous studies had used a similar method to examine supervisors’ role modeling, we conducted a pilot study (n=30). We used the correlation found in the pilot for open access (0.2) as input for the sample size calculation. With an alpha of 0.05 and a power of 0.80, we would need 194 pairs. However, we oversampled because some publications might not meet the eligibility criteria after screening the full text.
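As an illustration of this calculation (not the authors’ own script, which is described in their OSF protocol), the required number of pairs for detecting a correlation of 0.2 can be approximated with the standard Fisher z-transformation formula; the sketch below reproduces the figure of 194 pairs.

```python
# Approximate sample size for detecting a correlation of r = 0.2
# (two-sided alpha = 0.05, power = 0.80) via Fisher's z-transformation.
import numpy as np
from scipy.stats import norm

def pairs_needed(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    z_alpha = norm.ppf(1 - alpha / 2)    # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)             # 0.84 for power = 0.80
    c = 0.5 * np.log((1 + r) / (1 - r))  # Fisher z-transform of r
    return int(np.ceil(((z_alpha + z_beta) / c) ** 2 + 3))

print(pairs_needed(0.2))  # -> 194
```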

Sampling time

Request a detailed protocol

We identified pairs and extracted data between April 17 and June 30, 2022. We stopped sampling once we passed the required sample size, while including an equal share of pairs from each of the four university medical centers. This meant we focused on Ph.D. theses that were defended in 2022 or late 2021.

Eligibility criteria

Request a detailed protocol

Ph.D. candidates’ publications had to be in English, be part of their Ph.D. thesis (other works published during the same period were excluded), regard empirical work (excluding reviews, commentaries, and narratives), be published no earlier than 2018 (to make it reasonable that they worked with the supervisor we identified), and have the Ph.D. candidate as the sole first author. We only included Ph.D. candidates who had at least two publications that met these criteria.

Supervisors’ publications had to be in English, regard empirical work, be published no earlier than 2017, and have the supervisor as the sole first or last author. We only included supervisors who had at least three publications that met these criteria. Each supervisor appears only once in our dataset to prevent additional clustering, and the Ph.D. candidates could not be co-authors on the included publications.

Data extracted

Request a detailed protocol

If both the Ph.D. candidate and the supervisor met the eligibility criteria, we extracted the DOIs of the relevant publications, the names of the pairs, the institute where they worked at the time of the Ph.D. defense, and the year of the thesis defense. For the supervisor, we also extracted the authorship position.

Data preparation

Request a detailed protocol

To assess the open access status, we used the Unpaywall API through the UnpaywallR package (Riedel and Franzen, 2022). The UnpaywallR package takes a DOI and returns the different forms in which a publication is available. We applied the following hierarchy: Gold, Green, Hybrid, Bronze, and Paywalled, following the interpretations of the different forms as described by Priem, 2021. We recoded this into a binary variable where Gold, Green, and Hybrid were considered open, and Bronze and Paywalled were considered closed.
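The study itself used the UnpaywallR R package; purely as an illustration, the same binary recoding can be sketched against the public Unpaywall REST API. Note that this sketch relies on Unpaywall’s summary oa_status field, whose built-in precedence differs slightly from the Gold–Green–Hybrid–Bronze–Paywalled hierarchy applied in the paper, and that the helper function and email address below are hypothetical.

```python
# Illustrative sketch (not the authors' pipeline): query the public Unpaywall
# REST API for a DOI and collapse the result into the binary open/closed
# variable used in the study (gold, green, hybrid = open; bronze, closed = closed).
import requests

UNPAYWALL_URL = "https://api.unpaywall.org/v2/{doi}"
EMAIL = "you@example.org"  # Unpaywall asks for a contact email; placeholder here

def oa_binary(doi: str):
    resp = requests.get(UNPAYWALL_URL.format(doi=doi), params={"email": EMAIL}, timeout=30)
    if resp.status_code != 200:
        return None  # DOI did not resolve in Unpaywall
    status = resp.json().get("oa_status")  # 'gold', 'green', 'hybrid', 'bronze', or 'closed'
    return 1 if status in {"gold", "green", "hybrid"} else 0

print(oa_binary("10.7554/eLife.83484"))
```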

To identify papers with open data, we used Oddpub followed by a manual review of the extracted statements. First, we downloaded the PDFs for the extracted DOIs; we could not access 11 publications, but this did not result in excluding any pairs. Next, we transformed the PDFs into raw text and applied Oddpub (Riedel et al., 2020; RRID:SCR_018385; version 6), a text-mining algorithm designed to pick up data-sharing statements in biomedical research papers. Publications where Oddpub returned a statement were assigned a one, and publications where Oddpub did not return a statement were assigned a zero. We refer to this as open data automated.

To ensure that the publications where Oddpub returned a statement genuinely had open data, two extractors manually reviewed all statements using a piloted protocol (Iarkaeva, 2022). If there were any discrepancies between extractions, a third extractor or a research data management expert was consulted, and discrepancies were resolved through discussion. This resulted in another binary variable in which the flagged publications that did have open data were assigned a one, and all other publications (i.e., publications where Oddpub initially returned a statement but that on closer inspection were not instances of open data, plus all publications where Oddpub returned no statement) were assigned a zero. We refer to this as manually verified open data.
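Oddpub’s actual rule set is far more extensive than can be reproduced here; the following simplified sketch only illustrates the general idea of the automated step, flagging candidate data-sharing statements for subsequent manual review. The phrases and repository names are illustrative assumptions, not Oddpub’s patterns.

```python
# Simplified illustration of automated flagging of data-sharing statements.
# These regular expressions are illustrative stand-ins and do NOT reproduce
# Oddpub's real rule set.
import re

CANDIDATE_PATTERNS = [
    r"data (are|is) (openly |publicly )?available",
    r"deposited (in|at) (the )?(Zenodo|Dryad|OSF|figshare|GEO|ArrayExpress)",
    r"available (at|on|from) (https?://)?(osf\.io|zenodo\.org|doi\.org)",
]

def flag_data_statement(full_text: str) -> bool:
    """Return True if the text contains a candidate data-sharing statement
    (open data automated = 1); candidates are then verified manually."""
    return any(re.search(p, full_text, flags=re.IGNORECASE) for p in CANDIDATE_PATTERNS)

example = "All raw sequencing data have been deposited in Zenodo."
print(flag_data_statement(example))  # True -> flagged for manual review
```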

Data analysis

Request a detailed protocol

First, we calculated the prevalence of each practice and the correlations between a supervisor’s engagement in a practice and the Ph.D. candidate’s engagement in that practice. Next, we used Generalized Estimating Equations (GEE) logistic regression to analyze the data, because our dataset is clustered. We transformed the dataset to the level of the Ph.D. candidate, where publications (by the candidate) cluster within the candidate. We recoded supervisors’ engagement in open access publishing and data sharing into dichotomous covariates so they could be added to the GEE logistic regression model.

When the percentage of a supervisor’s publications that was openly available was above the national average (76%; see Koens and Vennekens, 2022), we assigned the supervisor a 1; if the percentage was 76% or lower, we assigned a 0.

We recoded the supervisors’ sharing of data into never (no included publications with open data) and ever (one or more included publications with open data) and applied the same categorization to automated statements and manually checked statements. We then exponentially transformed the model’s β coefficients and present odds ratios. We conducted crude and adjusted analyses, where the adjusted models include a dummy-coded institute variable to control for potential confounding. To determine whether the institute was a confounding factor, we compared the measure of association (odds ratios) before and after adjustment, applying the 10% rule for confounding (Beukelman and Brunner, 2016; Budtz-Jørgensen et al., 2007).
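The authors’ own analysis code is available via their GitHub repository; as an illustration of the modeling approach described above, a minimal GEE sketch using Python’s statsmodels might look as follows. The column names (candidate_id, open_data, supervisor_ever_shared, institution) and the input file are hypothetical.

```python
# Illustrative GEE logistic regression sketch clustered on the Ph.D. candidate;
# column names and the input file are hypothetical, not the authors' data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("publications.csv")  # one row per Ph.D.-candidate publication

def fit_gee(formula: str):
    return smf.gee(
        formula,
        groups="candidate_id",                    # publications cluster within the candidate
        data=df,
        family=sm.families.Binomial(),            # logistic link for the binary outcome
        cov_struct=sm.cov_struct.Exchangeable(),  # working correlation within clusters
    ).fit()

crude = fit_gee("open_data ~ supervisor_ever_shared")
adjusted = fit_gee("open_data ~ supervisor_ever_shared + C(institution)")

or_crude = np.exp(crude.params["supervisor_ever_shared"])
or_adjusted = np.exp(adjusted.params["supervisor_ever_shared"])
print(or_crude, or_adjusted)

# 10% rule: flag institution as a confounder if adjustment changes the OR by >10%.
print(abs(or_adjusted - or_crude) / or_crude > 0.10)
```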

Additional retraction analyses

Request a detailed protocol

A potential concern with our way of studying supervisors’ role modeling is that it may miss potentially irresponsible behaviors. To address this concern, we used the author-disambiguation algorithm developed by Caron and van Eck, 2014 to obtain metadata on all publications from supervisors and Ph.D. candidates that were available in the in-house version of the Web of Science database at CWTS, Leiden University, the Netherlands, and screened these for retractions. Note that a retraction need not indicate actual irresponsible behavior; it may concern honest mistakes. Where possible, we provide the reason for the retraction as specified by the respective journal.

Data availability

All data are available, alongside the code used to produce them, in the associated GitHub repository (copy archived at Haven et al., 2023).

References

  1. Caron E, van Eck NJ. 2014. Large scale author name disambiguation using rule-based scoring and clustering. Proceedings of the Science and Technology Indicators Conference 2014.
  2. Roberto MA, Bohmer RMJ, Edmondson AC. 2006. Facing ambiguous threats. Harvard Business Review 84:106–113.

Decision letter

  1. David B Allison
    Reviewing Editor; Indiana University, United States
  2. Mone Zaidi
    Senior Editor; Icahn School of Medicine at Mount Sinai, United States
  3. Lisa Schwiebert
    Reviewer; University of Alabama at Birmingham, United States
  4. Jon Agley
    Reviewer; Indiana University Bloomington, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "Meta-Research: Biomedical supervisors' role modeling of responsible research practices" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Mone Zaidi as the Senior Editor. The following individuals involved in the review of your submission have agreed to reveal their identity: Lisa Schwiebert (Reviewer #1); Jon Agley (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this decision letter to help you prepare a revised submission.

Essential revisions:

The investigators are encouraged to widen the years of their analyses so as to include the number and incidence of retracted papers by respective supervisor and PhD supervisee pairs.

Please also address Reviewer 2's recommendations for the authors.

Reviewer #1:

The purpose of the current study is to address a gap in knowledge regarding the assessment of responsible supervision of PhD supervisees in the field of biomedical research. Specifically, the investigators theorize that a supervisor's role modeling of responsible research practices in the context of data sharing will likely result in the PhD supervisees engaging in the same responsible practice as compared with supervisors who did not data share. Through careful analyses of Open Access publishing and Open Data sharing platforms, the investigators found that the odds of a PhD supervisee sharing data were greater working with a supervisor who shared data themselves versus those who did not.

The strengths of this study are several and they include an innovative approach toward a complex concern; strong statistical analyses; well-described limitations of the study. Overall, this is an interesting report, while not wholly surprising, it does add value with its evidenced-based approach toward the assessment of responsible research practices.

Of note, the conclusion assumes that shared data themselves are accurate, rigorous, and reproducible, which may not be wholly representative of the desired responsible practices.

The investigators are encouraged to widen the years of their analyses so as to include the number and incidence of retracted papers by respective supervisor and PhD supervisee pairs.

Reviewer #2:

The authors sought to undertake an examination of open-access publishing and open data sharing among PhD students and their supervisors as part of an expressed interest in responsible research practices (RRPs). The study was preregistered (including hypotheses, procedures, and outcomes), contained a step-by-step guide with screenshots for replicating their protocol, and shared all data and code.

The study results are fairly clearly explained, though I have some specific comments and questions that I provide to the authors privately. The research question itself is interesting and the procedures used to identify and match open access publication and data availability both use and advance novel procedures for doing so. I do have some questions about possible mediating and moderating factors that may be important to discuss, as well as the possible importance of separating open publication and data accessibility from the highly related, but not identical, concept of RRP. In most cases, the resolution of these questions would not change the core findings of the manuscript but would simply serve to advance clarity and discussion. As with all papers using associative analyses, readers are cautioned to avoid causal interpretations – a caveat that is also addressed by the paper itself.

Abstract

Although I have a few questions about some of the analyses, if indeed it is the case that the open access odds were lower, since your manuscript has 4 primary findings, consider sharing all 4 in the abstract.

Introduction

Paragraph 1: The first sentence appears to imply that supervision generally results in socialization into responsible research practices. This may benefit from being expanded somewhat for clarity. For example, the first paper (Bird, 2001) discusses how role modeling and mentorship differ in key ways, and the second paper (Anderson et al., 2007) explicitly notes that certain kinds of mentorship/supervision were associated with higher rates of problematic practices for early career researchers. Would it be more apt to state something like, "When conducted in a manner that clearly emphasizes rigorous, reproducible, and transparent research practices, it is plausible that supervision…"

Paragraph 1: The sentence focused on examples of RRPs cites articles that address RRPs to a degree, but some (e.g., Gopalakrishna et al., 2022) primarily focus on questionable research practices, not necessarily why RRPs might mitigate some of those concerns. I encourage the identification of additional references on RRP to be included here.

Paragraph 2: Although scientific writing conventions vary, my own preference would be for your description to be in the personal possessive here (e.g., "…a pilot study conducted by the first author identified three components."). I think it is important for the reader to understand the origin of these ideas and that you are building on your own prior work.

Paragraph 2: In the final sentence, consider also adding a point raised by the cited work around the importance of being able to safely scrutinize each others' work.

Materials and methods

Materials Availability: I appreciated the excellent use of screenshots to guide readers through the procedures used (as documented on OSF).

Population: I encourage including some of the population information from your preregistration here as well. For example, readers may not intuit that full professorship is the normative rank for candidate supervision in Dutch universities and might mistakenly assume some sort of selection bias (e.g., if associate professors were more commonly supervisors but you identified mostly full professors…).

Sampling Time: It is unclear why the data extraction timeframe resulted in the specific range of PhD thesis defense times. You explore some of this in your preregistration, but additional details would be useful here (even if it's just to indicate that normative delays reflected X – this also addresses why you planned to but did not include German universities, in part).

Eligibility Criteria: Can you clarify whether "latest 2018" means "no earlier than 2018"? I suspect this is a linguistic difference but I want to be sure. In addition, can you explain why you included PhD candidates if they only had at least 2 publications? Is it possible that this inclusion criterion systematically excluded candidates or supervisors with specific characteristics?

Data Preparation: I recommend referencing the Unpaywall FAQ here for data definitions (can be cited as a resource since the FAQ was prepared by Jason Priem; https://support.unpaywall.org/support/solutions/articles/44001777288-what-do-the-types-of-oa-status-green-gold-hybrid-and-bronze-mean-). The advantage is that, as the FAQ notes, there is no formal classification schema that exists. I would not have known that "Green" OA includes institutional repositories because some journals use that term similarly to how this FAQ uses "Hybrid."

Data Analysis: I am not a statistician, but I wonder (perhaps naively) whether 'university' should have been included in the model as a clustering term, especially since some of the earlier literature cited indicates that the environment (beyond the mentor and lab) can contribute to RRP.

Discussion

The first paragraph of the Discussion contains information that I would ordinarily associate with Results. That aside, can you clarify the interpretation of the open-access information? If these are presented as exp(b) values, then would a .18 indicate that PhD candidates published open access less often when their supervisors did than when they did not?

Contextualization: Since there is a substantial cost associated with publishing in many open-access venues, might an important contributor to variance (that might affect this model but is not included in the model) be the funding amount of the supervisor? For example, some supervisors may have limited ability to publish open access where a cost is incurred, some may be able to publish their own work (or at least some of their own work) open access, but fewer likely can afford to support their candidates in publishing at cost. If there is indeed a weaker relationship between supervisor publishing and candidate publishing than between data availability, then could this be a mediating or moderating factor?

The role of ECRs: I'm not sure that we should assume that ECRs have more opportunities to learn about responsible research, in general, though they may have more exposure to open research principles. This paper primarily focuses on open data and open publication. While these are important components of reproducibility and transparency, and while they may aid in the identification of problematic findings and work, they do not subsume the entirety of RRPs. Some components of RRP are longstanding or axiomatic ethical determinations or orientations, whereas the rise of open access has been relatively recent. So even highly ethical, responsible, and transparent senior researchers may be slow to uptake new publishing approaches.

The role of ECRs: Since your findings are nondirectional (e.g., correlations), it seems plausible (given the citations you provide) that you may be capturing a bidirectional relationship.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Biomedical supervisors' role modeling of open science practices" for further consideration by eLife. Your revised article has been evaluated by Mone Zaidi (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues to consider and respond to from Reviewer #2:

Reviewer #1 (Recommendations for the authors):

The authors have well responded to the initial review – no further comments or concerns.

Reviewer #2 (Recommendations for the authors):

I would like to thank the authors for the work put into this revision and the clarifications offered in the response to reviewers. I restate my perception that this work is interesting and provides a solid and useful contribution to the overall scientific enterprise in this area. I have a few questions and comments around the revised work that I hope you find useful.

Abstract

Here and elsewhere, you might want to use or provide the specific meaning of "actively published open access" since it's very specific to this article (e.g., published open access more often than the average Dutch professor did in 2021). There are also comments in the Discussion about the odds ratio that apply to the abstract (depending on whether you decide to make those revisions).

Introduction

I think that the modifications made to the Introduction section are helpful and address my comments in full.

Materials and methods

I am glad that my questions prompted further investigation into the analyses and variable structure. My only remaining comment relates to the new statement, "The 10% rule for confounding was applied." I think the application here is fine (again – I am not a professional statistician, so caveat emptor) but a citation may be useful. Some readers may be unfamiliar with that rule, and others may have questions about it (see, eg., Lee, 2014 – https://doi.org/10.2188%2Fjea.JE20130062).

Results

In the title for Table 1, you might want to specify the unit of analysis (e.g., a unique DOI). While it is clear to me what is meant by the table, a reader who is skimming might assume that there are, for example 548 PhD candidates publishing open access, rather than 548 DOIs from PhD candidates that were published open access.

If I understand your methods for identifying open data correctly, it may be worth removing the "Open data (automated)" information from the Results entirely. Here is my reasoning: Oddpub was used to extract possible instances of open data statements, but through manual verification, you triaged around half of those instances. Specifically, it seems that those that were excluded by manual verification did not have open data statements as affirmed by two to three human reviewers. Thus, I am not sure what value is added by analyzing the "Open data (automated)" variable, which seems to only describe a subset of papers for which a specific algorithm thought it found open data. We are not concerned with whether things are associated with whether a machine thinks there might be an open data statement, but rather whether things are associated with actual open data access.

Can you please consider verifying the total number of linked publications (81,091)? Given that the supervisor+candidate total n was around 2,000, that number would imply that the supervisors in this study published an additional 79,000 or so papers before 2017. While that is certainly possible, even if we assume that the 211 recent PhD candidates published an aggregate 10,000 papers (a very generous assumption), it would mean that the 211 supervisors published an average of 327 papers each, which is fairly remarkable. In some ways, this number is less relevant than the 2 retractions, but it does make me wonder.

Discussion

If you agree with my point about automated open data statement identification, you can also remove this sentence from the Discussion ("Based on the automated detection of data sharing statements, we found that having a supervisor that shared data is associated with 2.21 times the odds to share data when compared to having a supervisor that did not share data.")

I would suggest revising this sentence: "Effects for Open Access were smaller (1.99) and became nonsignificant when correcting for the role of the institution." Instead, you could indicate that the unadjusted odds ratio was 1.99 (p=.011) and the adjusted odds ratio was 1.64 (p=.079). I think that the shift in odds is interesting even though it falls above the conventional significance threshold.

You currently write, "This may indicate there being greater acceptance of and support for a particular practice in a research group. In other words: a responsible supervisory climate may empower PhD candidates to engage in open science more readily." You might want to contextualize this with something like, "In considering the associations that we identified, we speculate that…"

https://doi.org/10.7554/eLife.83484.sa1

Author response

Essential revisions:

The investigators are encouraged to widen the years of their analyses so as to include the number and incidence of retracted papers by respective supervisor and PhD supervisee pairs.

Please also address Reviewer 2's recommendations for the authors.

Thank you for these valuable suggestions. We have widened the years of analyses to assess retractions (described in more detail below) and incorporated the recommendations by Reviewer 2 to the best of our ability.

Reviewer #1:

The purpose of the current study is to address a gap in knowledge regarding the assessment of responsible supervision of PhD supervisees in the field of biomedical research. Specifically, the investigators theorize that a supervisor's role modeling of responsible research practices in the context of data sharing will likely result in the PhD supervisees engaging in the same responsible practice as compared with supervisors who did not data share. Through careful analyses of Open Access publishing and Open Data sharing platforms, the investigators found that the odds of a PhD supervisee sharing data were greater working with a supervisor who shared data themselves versus those who did not.

The strengths of this study are several and they include an innovative approach toward a complex concern; strong statistical analyses; well-described limitations of the study. Overall, this is an interesting report, while not wholly surprising, it does add value with its evidenced-based approach toward the assessment of responsible research practices.

Of note, the conclusion assumes that shared data themselves are accurate, rigorous, and reproducible, which may not be wholly representative of the desired responsible practices.

We agree and added this point to the Contextualisation section (part of the Discussion), noting that open data does not imply ethically sound and rigorously collected data; see below. We also adapted our terminology and now use open science practices throughout the paper.

“It should be noted that open data does not imply the data is gathered in a rigorous, ethical, and reproducible manner. It could even be that the data are FAIR but still not useful, because the methods used were not well-suited to answer the research question, or because the data collection was sloppy, or because the data fail to capture crucial differences in the target population. We manually verified whether we could find the data, whether data were actually open, stored on a repository and downloadable. Any assessments about the quality of the data or the quality of archiving (see e.g., Roche et al., 2022) are beyond the scope of this paper.”

The investigators are encouraged to widen the years of their analyses so as to include the number and incidence of retracted papers by respective supervisor and PhD supervisee pairs.

Thank you for this interesting suggestion. We used a modified approach to answer this request. First we collected additional data on the supervisor and supervisee pairs, namely their email addresses and their Open Researcher and Contributor IDs (ORCIDs).

Second we used our existing data plus the newly acquired information to identify the authors and their papers using the author-disambiguation algorithm developed by Caron & van Eck (2014) in the in-house version of Web of Science database at CWTS, Leiden University, the Netherlands.

Web of Science data includes a binary variable on retractions. Note that the time window for retractions tends to be rather long (see e.g., https://retractionwatch.com/2017/07/07/retraction-countdown-quickly-journals-pull-papers/). Hence it remains possible that papers we included may be retracted in the future.

Using this approach, we identified a total of 81091 publications where the researchers in our sample were identified as (co-)authors. We found 2 retracted papers. Both regarded publications where the supervisor appeared as co-author and both regarded honest mistakes where the retraction was issued one year after the initial publication. We include the specifications provided by the journal below:

Original paper: British Journal of Pharmacology (2002) 136, 1107–1116. Doi:10.1038/sj.bjp.0704814

Retraction: British Journal of Pharmacology (2003) 138, 531. Doi:10.1038/sj.bjp.0705183

Retraction notice: The authors have become aware that some of the results presented in this paper are invalid, not reproducible and/or misinterpreted. They consider that the main conclusion of the paper is not valid. They therefore retract this publication.

Original paper: Molecular & Cellular Proteomics (2018) 17, 2132-2145. Doi: 10.1074/mcp.RA118.000792

Retraction: Molecular & Cellular Proteomics (2019) 16, 1270. Doi:10.1074/mcp.W119.001571

Retraction notice: The original version of this article was withdrawn by the authors. An error was discovered in the creation of the protein database file that was used for searching, which led to some incorrect associations between peptides and proteins. A corrected version of the manuscript has been supplied which contains the very similar peptide identifications as the original, but the resulting number of proteins in various categories has now changed, and as a result some of the figures and supplementary files have changed also. The underlying conclusions of this study, however, remain unaltered.

Corrected version: Shraibman, B., Barnea, E., Kadosh, D. M., Haimovich, Y., Slobodin, G., Rosner, I., López-Larrea, C., Hilf, N., Kuttruff, S., Song, C., Britten, C., Castle, J., Kreiter, S., Frenzel, K., Tatagiba, M., Tabatabai, G., Dietrich, P.-Y., Dutoit, V., Wick, W., Platten, M., Winkler, F., von Deimling, A., Kroep, J., Sahuquillo, J., Martinez-Ricarte, F., Rodon, J., Lassen, U., Ottensmeier, C., van der Burg, S. H., Thor Straten, P., Poulsen, H. S., Ponsati, B., Okada, H., Rammensee, H. G., Sahin, U., Singh, H., and Admon, A. (2019) Identification of tumor antigens among the HLA peptidomes of glioblastoma tumors and plasma. Mol. Cell. Proteomics 18, 1255–1268.

In case the reviewer would like to see these findings in the paper, we have prepared the following text. That said, we feel that given the small number of retractions (two) and the provided reasons (honest mistakes), it may not add much to the paper. We leave it up to the reviewer and the editor to decide on the matter and have attached the underlying data in anonymised form to this re-submission.

“Additional analyses

A potential concern with our way of studying supervisors’ role modeling regards missing potential irresponsible behaviors. To accommodate this concern, we used the author-disambiguation algorithm developed by Caron & van Eck (2014) to obtain meta-data on all publications from supervisors and PhD candidates that were available in the in-house version of Web of Science database at CWTS, Leiden University, the Netherlands, and screened these for retractions. Note that a retraction need not indicate actual irresponsible behavior, it may regard honest mistakes. Where possible, we provide the reason for the retraction as specified by the respective journal.

Retractions analysis

We were able to link a total of 81091 publications to the supervisors and PhD candidates. Three PhD candidates could not be identified, all supervisors were identified. Of the 81091 publications that could be matched to the PhD candidates and supervisors, 2 were retracted. Both regarded publications where the supervisor appeared as one of the co-authors and were retracted one year after publication. The following reasons were specified, which we interpret as honest errors:

“The authors have become aware that some of the results presented in this paper are invalid, not reproducible and/or misinterpreted. They consider that the main conclusion of the paper is not valid. They therefore retract this publication.”

“The original version of this article was withdrawn by the authors. An error was discovered in the creation of the protein database file that was used for searching, which led to some incorrect associations between peptides and proteins. A corrected version of the manuscript has been supplied which contains the very similar peptide identifications as the original, but the resulting number of proteins in various categories has now changed, and as a result some of the figures and supplementary files have changed also. The underlying conclusions of this study, however, remain unaltered.”

None of the publications included in our own dataset were retracted. RetractionWatch, a blog and database tracking and reporting on retractions (https://retractionwatch.com/), indicated that retractions can take from 3 months to many years, hence some papers may

be retracted in the future.”

Reviewer #2:

The authors sought to undertake an examination of open-access publishing and open data sharing among PhD students and their supervisors as part of an expressed interest in responsible research practices (RRPs). The study was preregistered (including hypotheses, procedures, and outcomes), contained a step-by-step guide with screenshots for replicating their protocol, and shared all data and code.

The study results are fairly clearly explained, though I have some specific comments and questions that I provide to the authors privately. The research question itself is interesting and the procedures used to identify and match open access publication and data availability both use and advance novel procedures for doing so. I do have some questions about possible mediating and moderating factors that may be important to discuss, as well as the possible importance of separating open publication and data accessibility from the highly related, but not identical, concept of RRP. In most cases, the resolution of these questions would not change the core findings of the manuscript but would simply serve to advance clarity and discussion. As with all papers using associative analyses, readers are cautioned to avoid causal interpretations – a caveat that is also addressed by the paper itself.

Thank you for the kind words and helpful comments. We re-ran our analyses with institution as a potential confounding variable and present the crude and adjusted results side-by-side in Table 2 now. We have also adjusted the formulation of our concept, and the paper now consistently refers to open science practices. The remainder of the comments is answered in-depth below.

Abstract

Although I have a few questions about some of the analyses, if indeed it is the case that the open access odds were lower, since your manuscript has 4 primary findings, consider sharing all 4 in the abstract.

Thanks for flagging this, we added the open access odds to the abstract. We added the following sentence:

“Having a supervisor who actively published open access was associated with an odds of 1.99 to publish open access, but this effect became nonsignificant when correcting for institutions.”

Introduction

Paragraph 1: The first sentence appears to imply that supervision generally results in socialization into responsible research practices. This may benefit from being expanded somewhat for clarity. For example, the first paper (Bird, 2001) discusses how role modeling and mentorship differ in key ways, and the second paper (Anderson et al., 2007) explicitly notes that certain kinds of mentorship/supervision were associated with higher rates of problematic practices for early career researchers. Would it be more apt to state something like, "When conducted in a manner that clearly emphasizes rigorous, reproducible, and transparent research practices, it is plausible that supervision…"

That would indeed be more apt – we revised the sentence to emphasize that a particular style of supervision is necessary, namely responsible supervision. We now write:

“When conducted in a manner that emphasizes rigorous and transparent research, supervision can be an important manner to socialize PhD candidates into responsible research practices.”

Paragraph 1: The sentence focused on examples of RRPs cites articles that address RRPs to a degree, but some (e.g., Gopalakrishna et al., 2022) primarily focus on questionable research practices, not necessarily why RRPs might mitigate some of those concerns. I encourage the identification of additional references on RRP to be included here.

Thank you for flagging this, we added an additional 6 references that investigate outcome variables similar to ours, i.e., Iqbal et al. (2016); Wallach et al. (2018); Susanin et al. (2022); Roche et al. (2022), and Hughes et al. (2022).

Paragraph 2: Although scientific writing conventions vary, my own preference would be for your description to be in the personal possessive here (e.g., "…a pilot study conducted by the first author identified three components."). I think it is important for the reader to understand the origin of these ideas and that you are building on your own prior work.

It was indeed out of different conventions that we chose this description. We have now rephrased it to: “… a pilot study conducted by a research team including the first author identified three components.” to denote the connection with previous research.

Paragraph 2: In the final sentence, consider also adding a point raised by the cited work around the importance of being able to safely scrutinize each others' work.

Thank you for this suggestion, we added the following point with references:

“This psychological safety in turn contributes to maintaining quality by creating an environment where colleagues can safely scrutinize each others’ work (Roberto, Bohmer, and Edmondson, 2006; Halbesleben & Rathert, 2008).”

Materials and methods

Materials Availability: I appreciated the excellent use of screenshots to guide readers through the procedures used (as documented on OSF).

We are pleased to read you found it useful.

Population: I encourage including some of the population information from your preregistration here as well. For example, readers may not intuit that full professorship is the normative rank for candidate supervision in Dutch universities and might mistakenly assume some sort of selection bias (e.g., if associate professors were more commonly supervisors but you identified mostly full professors…).

This is very helpful, especially given the international readership. We added the following clarifier:

“(in The Netherlands, the primary supervisor has to be a full professor, although recently associate professors can get these rights, too).”

Sampling Time: It is unclear why the data extraction timeframe resulted in the specific range of PhD thesis defense times. You explore some of this in your preregistration, but additional details would be useful here (even if it's just to indicate that normative delays reflected X – this also addresses why you planned to but did not include German universities, in part).

The sampling time frame of late 2021 to 2022 was a result of the required sample size. We set out to include about 200 pairs and wanted to ensure equal representation among the four institutions. For some institutions, this meant that we had to go back to theses defended in late 2021. We now added the following clarification:

“We stopped sampling when we passed the required sample size, and wanted to include an equal share of pairs from each of the four university medical centers.”

Eligibility Criteria: Can you clarify whether "latest 2018" means "no earlier than 2018"? I suspect this is a linguistic difference but I want to be sure. In addition, can you explain why you included PhD candidates if they only had at least 2 publications? Is it possible that this inclusion criterion systematically excluded candidates or supervisors with specific characteristics?

Indeed, no earlier than 2018. This has been adjusted in the main text now. Initially, the criterion of at least 2 publications was intended to filter out German theses that were Dr. med. rather than PhD degrees – Dr. med. theses can be built around a single journal publication. In addition, Dutch university medical centers often apply the criterion of at least 2 journal publications; the regulations of Maastricht University and Amsterdam University Medical Center even state this more explicitly. Hence, we see no reason to believe this systematically excluded PhD candidates. That said, we encountered cases where we could not include a pair because the supervisor did not have enough works, either as primary or last author, or works without the PhD candidate as a co-author. We reflect on this in the Limitations section as follows:

“This meant that at times, we had to exclude some pairs because the supervisor did not have a sufficient number of publications, meaning we may have a small bias towards productive supervisors.”

Data Preparation: I recommend referencing the Unpaywall FAQ here for data definitions (can be cited as a resource since the FAQ was prepared by Jason Priem; https://support.unpaywall.org/support/solutions/articles/44001777288-what-do-the-types-of-oa-status-green-gold-hybrid-and-bronze-mean-). The advantage is that, as the FAQ notes, there is no formal classification schema that exists. I would not have known that "Green" OA includes institutional repositories because some journals use that term similarly to how this FAQ uses "Hybrid."

Excellent suggestion, we added the following sentence:

“We applied the following hierarchy: Gold, Green, Hybrid, Bronze, and Paywalled, following the interpretations of the different forms as described by Priem (2021).”
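
For readers less familiar with these categories, the hierarchy can be thought of as a simple ranking applied to Unpaywall-style status strings. The sketch below is illustrative only; the status labels and helper functions are assumptions for the example and do not reproduce the UnpaywallR code used in the study.

```python
# Minimal sketch: collapsing Unpaywall-style OA status strings using the
# hierarchy described above (Gold > Green > Hybrid > Bronze > Paywalled).
# The input data are hypothetical; real statuses would come from an
# Unpaywall/UnpaywallR lookup keyed by DOI.

OA_HIERARCHY = ["gold", "green", "hybrid", "bronze", "paywalled"]

def best_status(statuses):
    """Return the highest-ranked OA status among duplicate records for one DOI."""
    return sorted(statuses, key=OA_HIERARCHY.index)[0]

def is_open(status):
    """Binary open-vs-paywalled indicator."""
    return status != "paywalled"

# Hypothetical example: one DOI indexed with two different statuses.
print(best_status(["green", "bronze"]))   # -> "green"
print(is_open("paywalled"))               # -> False
```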

Data Analysis: I am not a statistician, but I wonder (perhaps naively) whether 'university' should have been included in the model as a clustering term, especially since some of the earlier literature cited indicates that the environment (beyond the mentor and lab) can contribute to RRP.

Thanks for flagging this; we agree and reran our models to see whether university functioned as a potential confounding variable. This would mean that part of the association between the behavior of the supervisor and that of the PhD candidate was actually due to the university environment. We now present crude and adjusted models side by side in Table 2. The upshot is that the institutional environment is of greater relevance for open access, whereas the associations identified for open data stand and are even strengthened by adding the environment to the model.
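
As an illustration of the crude-versus-adjusted comparison described here, a minimal logistic-regression sketch might look as follows. The column names, the simulated data, and the use of statsmodels are assumptions for the example and do not reflect the authors' actual analysis code.

```python
# Minimal sketch of a crude and an institution-adjusted logistic regression.
# Columns (open_access, supervisor_oa, university) and data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "open_access": rng.integers(0, 2, 400),       # candidate publication OA (0/1)
    "supervisor_oa": rng.integers(0, 2, 400),     # supervisor above national average (0/1)
    "university": rng.choice(list("ABCD"), 400),  # one of four medical centers
})

crude = smf.logit("open_access ~ supervisor_oa", data=df).fit(disp=False)
adjusted = smf.logit("open_access ~ supervisor_oa + C(university)", data=df).fit(disp=False)

# Odds ratios for the supervisor term, before and after adjusting for university.
print(np.exp(crude.params["supervisor_oa"]))
print(np.exp(adjusted.params["supervisor_oa"]))
```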

Discussion

The first paragraph of the Discussion contains information that I would ordinarily associate with Results. That aside, can you clarify the interpretation of the open-access information? If these are presented as exp(b) values, then would a value of 0.18 indicate that PhD candidates published open access less often when their supervisors did than when they did not?

Apologies for the confusion; the reviewer is right. When looking into this, we found that the recoding of our variable (never open access/sometimes open access/often open access) was flawed, as recent data showed that 76% of publications from Dutch researchers are open access (https://www.rathenau.nl/en/science-figures/output/publications/open-access-research-publications). Hence, the groups we created when recoding were heavily skewed. We updated the preregistration, detailing the rationale for choosing a different coding scheme, and recoded Open Access into a binary variable: up to the national average (0) versus beyond the national average (1). We reasoned that supervisors who publish Open Access more often than the national average could be seen as actively practicing open access, and thus as role models. We now write:

“Effects for Open Access were smaller (1.99) and became nonsignificant when correcting for the role of the institution.”
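
The recoding itself is simple to illustrate. In the sketch below, the 76% national average and the threshold logic follow the description above, while the supervisor counts and the helper function are hypothetical.

```python
# Minimal sketch of the recoding described above: a supervisor is coded 1
# ("more often than the national average") if their share of open access
# publications exceeds the 2021 Dutch average of 76%, and 0 otherwise.
# The counts used below are hypothetical.
NATIONAL_AVERAGE = 0.76

def recode_supervisor(n_open, n_total):
    """Return 1 if the supervisor publishes OA more often than the national average."""
    return int(n_total > 0 and n_open / n_total > NATIONAL_AVERAGE)

print(recode_supervisor(18, 20))  # 90% OA -> 1
print(recode_supervisor(10, 20))  # 50% OA -> 0
```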

Contextualization: Since there is a substantial cost associated with publishing in many open-access venues, might an important contributor to variance (that might affect this model but is not included in the model) be the funding amount of the supervisor? For example, some supervisors may have limited ability to publish open access where a cost is incurred, some may be able to publish their own work (or at least some of their own work) open access, but fewer likely can afford to support their candidates in publishing at cost. If there is indeed a weaker relationship between supervisor publishing and candidate publishing than between data availability, then could this be a mediating or moderating factor?

This could indeed be true, and we added this to the Contextualisation section (Discussion), while noting that green OA is still open, but free of (financial) cost:

“Open access comes in different forms and publishing in open access journals is often not free. Some publishers make exceptions for scientists from low- and middle-income countries, but The Netherlands would rightly not qualify. It could thus be that the amount of funding that a supervisor or PhD candidate had available affected the relationship we studied. In other words: funding availability may determine whether PhD candidates (or supervisors) chose to publish in an open access journal. However, it should be noted that green open access, archiving a paper in an appropriate format in an (institutional) repository, can be done free of financial charge.”

The role of ECRs: I'm not sure that we should assume that ECRs have more opportunities to learn about responsible research, in general, though they may have more exposure to open research principles. This paper primarily focuses on open data and open publication. While these are important components of reproducibility and transparency, and while they may aid in the identification of problematic findings and work, they do not subsume the entirety of RRPs. Some components of RRP are longstanding or axiomatic ethical determinations or orientations, whereas the rise of open access has been relatively recent. So even highly ethical, responsible, and transparent senior researchers may be slow to uptake new publishing approaches.

We now refer to open science practices and have tried to make it clear from the outset that these are, as you rightly indicate, only a subset of RRPs. It seems true that many activities to educate researchers about open science practices do focus on ECRs; hence, we trust the paragraph to be in order, given the greater current emphasis on open science.

The role of ECRs: Since your findings are nondirectional (e.g., correlations), it seems plausible (given the citations you provide) that you may be capturing a bidirectional relationship.

True, we incorporated this so that the paragraph now describes the possibility of PhD candidates influencing supervisors, supervisors influencing PhD candidates, or a bidirectional relationship. We end the paragraph with the following sentence:

“Finally, it could be that the relationship investigated here is bidirectional.”

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Reviewer #2 (Recommendations for the authors):

I would like to thank the authors for the work put into this revision and the clarifications offered in the response to reviewers. I restate my perception that this work is interesting and provides a solid and useful contribution to the overall scientific enterprise in this area. I have a few questions and comments around the revised work that I hope you find useful.

Abstract

Here and elsewhere, you might want to use or provide the specific meaning of "actively published open access" since it's very specific to this article (e.g., published open access more often than the average Dutch professor did in 2021). There are also comments in the Discussion about the odds ratio that apply to the abstract (depending on whether you decide to make those revisions).

Thank you for flagging this; we now use the more descriptive “more often than the national average” to prevent confusion.

Introduction

I think that the modifications made to the Introduction section are helpful and address my comments in full.

Thank you.

Materials and methods

I am glad that my questions prompted further investigation into the analyses and variable structure. My only remaining comment relates to the new statement, "The 10% rule for confounding was applied." I think the application here is fine (again – I am not a professional statistician, so caveat emptor) but a citation may be useful. Some readers may be unfamiliar with that rule, and others may have questions about it (see, e.g., Lee, 2014 – https://doi.org/10.2188%2Fjea.JE20130062).

Many thanks for suggesting this; we have added references (Budtz-Jørgensen et al., 2007; Beukelman & Brunner, 2016) that provide an accessible introduction to the confounder rule applied.
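
For readers unfamiliar with the change-in-estimate (“10%”) rule, a worked illustration using the odds ratios reported later in this response (crude 1.99, adjusted 1.64) is given below. Note that the rule is sometimes applied on the log-odds scale instead, so this is only one possible reading of how it was applied.

```python
# Illustration of the 10% change-in-estimate rule using the open access
# odds ratios reported in this response (crude 1.99, adjusted 1.64).
or_crude, or_adjusted = 1.99, 1.64
relative_change = abs(or_crude - or_adjusted) / or_crude
print(f"{relative_change:.1%}")   # ~17.6%
print(relative_change > 0.10)     # True -> institution treated as a confounder
```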

Results

In the title for Table 1, you might want to specify the unit of analysis (e.g., a unique DOI). While it is clear to me what is meant by the table, a reader who is skimming might assume that there are, for example 548 PhD candidates publishing open access, rather than 548 DOIs from PhD candidates that were published open access.

We have adjusted the description of the table accordingly, thanks for the close reading. It now reads:

“Prevalence for each practice (expressed as unique DOIs) appears in Table 1, as well as correlations between the PhD candidates’ engagement in a practice and the supervisors’ engagement in a practice.”

The new title for Table 1 is:

“Prevalence of Open Access publishing and sharing data openly among unique DOIs.”

If I understand your methods for identifying open data correctly, it may be worth removing the "Open data (automated)" information from the Results entirely. Here is my reasoning: Oddpub was used to extract possible instances of open data statements, but through manual verification, you triaged around half of those instances. Specifically, it seems that those that were excluded by manual verification did not have open data statements as affirmed by two to three human reviewers. Thus, I am not sure what value is added by analyzing the "Open data (automated)" variable, which seems to only describe a subset of papers for which a specific algorithm thought it found open data. We are not concerned with whether things are associated with whether a machine thinks there might be an open data statement, but rather whether things are associated with actual open data access.

We understand your reasoning but respectfully disagree, for a number of reasons. First, the Oddpub tool by Riedel and colleagues (2020) has fairly good sensitivity (0.73) and excellent specificity (0.97) and has been used around the world. Hence, we would prefer to keep the passages in, as this allows for international comparison. In addition, keeping them in allows others to redo our analyses for integrity purposes. Also, automated screening is arguably a less resource-intensive way (compared to human deliberation) of assessing data sharing. Finally, the sample size and power were calculated based on automated Open Access screening, not on human deliberation, and hence we would prefer to present both results side by side, as we believe that together they provide the most accurate description.
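
One way to see why manual verification usefully complements the automated screening is to look at the positive predictive value implied by Oddpub's reported sensitivity and specificity: when the true prevalence of open data is low, a sizeable share of automated hits will be false positives. The prevalence values in the sketch below are illustrative, not the study's own figures.

```python
# Positive predictive value (share of automated hits that are true positives)
# implied by Oddpub's reported sensitivity (0.73) and specificity (0.97),
# at several illustrative prevalence levels of genuinely open data.
def ppv(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prev in (0.20, 0.10, 0.05):
    print(f"prevalence {prev:.0%}: PPV {ppv(0.73, 0.97, prev):.0%}")
```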

Can you please consider verifying the total number of linked publications (81,091)? Given that the supervisor+candidate total n was around 2,000, that number would imply that the supervisors in this study published an additional 79,000 or so papers before 2017. While that is certainly possible, even if we assume that the 211 recent PhD candidates published an aggregate 10,000 papers (a very generous assumption), it would mean that the 211 supervisors published an average of 327 papers each, which is fairly remarkable. In some ways, this number is less relevant than the 2 retractions, but it does make me wonder.

We verified this. A few contextual remarks may be helpful. Firstly, researchers may have published since we stopped sampling last year. Secondly, the number of unique papers is smaller, as many publications are multi-authored. Thirdly, it is not entirely helpful to speak of averages here, as we found that 20% of the authors accounted for 64% of the publications (see the Excel sheet attached to this revision and added to the GitHub repository). Note that for the first researcher, who started publishing in 1982, over 2000 publications were identified.
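
The concentration statistic mentioned here (the share of publications accounted for by the most productive 20% of authors) can be computed from per-author publication counts as sketched below; the counts used are hypothetical.

```python
# Minimal sketch: share of all linked publications accounted for by the most
# productive 20% of authors. The per-author counts below are hypothetical.
import math

def top_share(counts, top_fraction=0.20):
    counts = sorted(counts, reverse=True)
    k = max(1, math.ceil(top_fraction * len(counts)))
    return sum(counts[:k]) / sum(counts)

author_pub_counts = [2100, 900, 850, 400, 120, 95, 80, 60, 40, 30]
print(f"{top_share(author_pub_counts):.0%}")  # share held by the top 20% of authors
```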

Discussion

If you agree with my point about automated open data statement identification, you can also remove this sentence from the Discussion ("Based on the automated detection of data sharing statements, we found that having a supervisor that shared data is associated with 2.21 times the odds to share data when compared to having a supervisor that did not share data.")

As above, we would prefer to keep the passage in as the two ways of assessing data sharing together provide the most accurate description.

I would suggest revising this sentence: "Effects for Open Access were smaller (1.99) and became nonsignificant when correcting for the role of the institution." Instead, you could indicate that the unadjusted odds ratio was 1.99 (p=.011) and the adjusted odds ratio was 1.64 (p=.079). I think that the shift in odds is interesting even though it falls above the conventional significance threshold.

We agree this is a cleaner presentation of the main findings and now write:

“The unadjusted open access odds ratio was 1.99 (p=.011) and became 1.64 (p=.079) when correcting for the role of the institution.”

You currently write, "This may indicate there being greater acceptance of and support for a particular practice in a research group. In other words: a responsible supervisory climate may empower PhD candidates to engage in open science more readily." You might want to contextualize this with something like, "In considering the associations that we identified, we speculate that…"

Thank you for suggesting this; we have largely adopted your formulation and now write:

“Considering the associations that we identified, we speculate that working under supervisors who engage in open science themselves could empower PhD candidates to engage in open science more readily.”

References

Beukelman, T., & Brunner, H. I. (2016). Chapter 6 – Trial Design, Measurement, and Analysis of Clinical Investigations. In Textbook of Pediatric Rheumatology (7th ed., pp. 54–77). W.B. Saunders. https://doi.org/10.1016/b978-0-323-24145-8.00006-5

Budtz-Jørgensen, E., Keiding, N., Grandjean, P., & Weihe, P. (2007). Confounder selection in environmental epidemiology: assessment of health effects of prenatal mercury exposure. Annals of Epidemiology, 17(1), 27–35. https://doi.org/10.1016/j.annepidem.2006.05.007

Riedel, N., Kip, M., & Bobrov, E. (2020). ODDPub – a Text-Mining Algorithm to Detect Data Sharing in Biomedical Publications. Data Science Journal, 19(1), 1–14. http://doi.org/10.5334/dsj-2020-042

https://doi.org/10.7554/eLife.83484.sa2

Article and author information

Author details

  1. Tamarinde L Haven

    Danish Centre for Studies in Research and Research Policy, Department of Political Science, Aarhus University, Aarhus, Denmark
    Contribution
    Conceptualization, Formal analysis, Supervision, Investigation, Methodology, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    tlh@ps.au.dk
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-4702-2472
  2. Susan Abunijela

    QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
    Contribution
    Data curation, Formal analysis, Investigation, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Nicole Hildebrand

    QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
    Contribution
    Data curation, Formal analysis, Investigation, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared

Funding

Nederlandse Organisatie voor Wetenschappelijk Onderzoek (019.212SG.022.)

  • Tamarinde L Haven

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We would like to acknowledge Martin Holst for his support during the pilot study to assess the feasibility of the approach. Benjamin Gregory Carlisle’s support of data science-related issues was crucial. Delwen Franzen helpfully adapted the UnpaywallR script for our project, and Nico Riedel revised the Oddpub script to work on our dataset. Evgeny Bobrov and Anastasiia Iarkaeva instructed us on how to use their protocol and provided the much-needed guidance when assessing challenging cases. Thanks also to Evgeny for pointing out that data sharing may not be equal across all biomedical fields. We also extend our gratitude to Jesper Wiborg Schneider who kindly helped with the retraction analyses and obtaining relevant author publications using the in-house version of the Web of Science database at CWTS, Leiden University, the Netherlands.

Senior Editor

  1. Mone Zaidi, Icahn School of Medicine at Mount Sinai, United States

Reviewing Editor

  1. David B Allison, Indiana University, United States

Reviewers

  1. Lisa Schwiebert, University of Alabama at Birmingham, United States
  2. Jon Agley, Indiana University Bloomington, United States

Version history

  1. Preprint posted: September 15, 2022 (view preprint)
  2. Received: September 15, 2022
  3. Accepted: May 4, 2023
  4. Version of Record published: May 22, 2023 (version 1)

Copyright

© 2023, Haven et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


