Abstract
For over thirty years, research has highlighted a sex bias in early-stage research, risking the validity of biological knowledge. The first step towards change is to effectively challenge misconceptions, allowing researchers to perceive sex inclusive research as do-able. Utilising the theory of planned behaviour, we quantified researchers’ intention as a proxy measure for conducting sex inclusive research and explored attitude (value of the behaviour), subjective norm (perceived social pressure) and behavioural control (ability to conduct the behaviour). Additionally, we quantified the knowledge gap and the prevalence of misconceptions, and assessed perceived benefits and barriers. We tested a workshop intervention that directly challenges the culturally embedded barriers. The data show that researchers’ intentions were high, but that they had weak statistical knowledge and misunderstandings leading to a perception that inclusive research is prohibitive due to cost and animal use. We demonstrate that participation in the training intervention improved knowledge and altered perceived barriers and cultural expectations.
1. Introduction
Research culture encompasses the values, ideals and norms of a community, and thus its resultant behaviours. This culture in turn has a substantial impact on how research is conducted and communicated. There is a long-established, embedded practice of studying only a single (typically male) sex in preclinical research and then generalising the results to a wider population (1–3).
Since the 1990s, it has been recognised that this approach is highly limited (4), with an emerging body of published evidence highlighting that sex may profoundly influence biological responses (5, 6). Consequently, there has been growing awareness that unless sex is factored into experimental design, its influence remains unknown. Importantly, it is now appreciated that single sex studies result in a knowledge bias (7, 8). The need to drive cultural change has been highlighted, with the goal of enabling equitable research at the exploratory stage (9) that delivers a translational body of knowledge relevant to the broader population (10).
Despite numerous funding bodies establishing initiatives to encourage researchers to routinely integrate males and females into basic, preclinical, and clinical research, little progress has been observed (11, 12). In response to this status quo, numerous funding bodies have recently issued inclusion mandates requiring justification for single sex studies (9, 11, 13). The proportion of published papers including females and males has improved (14, 15). A study comparing inclusion between 2009 and 2019 across nine biological disciplines found that six of the nine disciplines saw improved representation, and overall inclusion increased from 26% to 48% of published studies (15). Whilst a positive step forward, this proportion still represents a minority of studies, and notably this increase in inclusion was not associated with an increase in the proportion of studies that analysed data by sex. Additionally, studies have identified methodological issues when males and females were included, such as failure to report the proportion of each sex in the cohort (16) and inappropriate data analysis, including disaggregation, pooling, or comparison of p values to assess a treatment-by-sex interaction (14, 17). There is a need not only to improve inclusion but also to improve the analysis of the resulting data, to ensure the integrity of the conclusions drawn.
Notionally, scientists support efforts to improve sex representation in research (15, 18, 19). However, sociological research suggests that many researchers believe inclusion is not “doable” (20, 21). The perceived barriers cited encompass ethical, financial, statistical, and practical constraints (18, 22, 23). An analysis of 30 single sex justifications identified six themes: concern around a known sex difference or sex effect (30%), fear of increased experimental variability (27%), experimental conditions limiting the use of female and male animals (13%), limited sample size (13%), inability to sex subjects (10%), and issues with animal husbandry (7%) (15). Many of these barriers have been highlighted as culturally ingrained misconceptions (23). For example, a commonly proffered argument for exclusion is that female animals are inherently more variable, and thus that inclusion will require a larger sample size (N). This is despite a wealth of studies that reject this thesis of increased variability across a range of biologically relevant endpoints (24–26). Alongside the common perception that inclusion requires doubling the sample size as standard (20, 27, 28), researchers have also questioned the value of studying both sexes in preclinical research, suggesting that single sex in vivo studies can be generalised to the wider population (20).
The scale of the misconceptions and statistical misunderstandings that create barriers to inclusion has prompted calls for training (9, 23). We developed a workshop to equip participants with knowledge of current best practice for sex inclusion, including material designed to explicitly address many of the known misconceptions and barriers to sex inclusive research. To evaluate the efficacy of this intervention, we ran two independent experiments using a survey to collect quantitative and qualitative data on self-efficacy (confidence to conduct inclusive research), knowledge, and the views of scientists on incorporating males and females into preclinical in vivo research. The first experiment was conducted at an international conference with a parallel group design comparing three groups: the general population attending the conference, those who expressed interest in the topic before attending a related seminar, and those who attended the intervention. The second experiment was conducted at a UK Russell Group university with a pre- and post-intervention design. The survey was based on the theory of planned behaviour, an established strategy for evaluating and understanding human behaviour (29). The theory of planned behaviour is a psychological theory that models behavioural intention, the proximal determinant of behaviour, through three core beliefs (attitude, subjective norms and perceived behavioural control: Box 1). We then assessed the impact of the intervention with three key objectives: (1) to capture and quantify the scale of misconceptions around sex inclusive studies within the research community, (2) to assess whether the intervention led to an improvement in knowledge, and (3) to explore statistically the impact of the intervention on the intention to conduct sex inclusive research, alongside identifying other predictors of intention.
Taken together, the studies demonstrate that the intervention is effective and gives insight into how the research community can expedite change in research practice.
Box 1: Glossary
Technical terms used within this manuscript and their definitions are detailed in the table below. For terms associated with the theory of planned behaviour, the definitions provided were originally published by Icek Ajzen (29). The name in brackets is the term used within the statistical model and associated output.


Workshop Intervention construct
2. Results
2.1 Demographic results
2.1.1 Study 1
Across the three test groups, the demographics and distributions of the potential predictor variables were balanced (Supplementary Table 1). Furthermore, no significant correlation (p > 0.05) was found between the continuous variables in a Pearson correlation analysis (Supplementary Analysis 1A). Missingness was observed in two demographic variables: two participants (Baseline: 1; Interested: 1) did not report the number of years worked in animal research (missingness = 1.9%), and seven participants (Baseline: 4; Interested: 3) did not report their age (missingness = 6.7%). The two participants who did not report years worked also did not report their age.
The majority of participants were female (61%), held a PhD (63%), worked at an academic institution (97%) in Europe (58%) and predominantly studied basic biological mechanisms (68%); the average age was 39 years. A large proportion of participants (71%) had received some type of formal statistical training, but 41% were not, or not very, familiar with factorial designs. Whilst 63% of participants were always or often involved in, or could influence, the planning of experiments, the majority (62%) had incorporated females and males in only 50% or fewer of their studies. On average, participants had been involved in animal research for 13.8 years.
2.1.2 Study 2
Analysis of the demographic data (Supplementary Table 2) found that the majority of participants identified as female (52%), held a PhD (68%) and predominantly studied basic biological mechanisms (52%); the average age was 35 years. A large proportion of participants had received some type of formal statistical training (68%), but 55% were not, or not very, familiar with factorial designs. Whilst 81% of participants were always or often involved in, or could influence, the planning of experimental designs, the majority (51%) had incorporated females and males in only 50% or fewer of their studies. On average, participants had been involved in animal research for 10.6 years. No statistically significant correlation (p > 0.05) was found between the continuous variables in a Pearson correlation analysis (Supplementary Analysis 1B).
2.2 Exploration of advantages and barriers
2.2.1 Study 1
Participants could articulate the barriers to and benefits of inclusive research by either selecting preset options or entering free text. For most preset options, there was no statistically significant difference between the treatment groups in the proportion selecting them (Fig 1A-B, Supplementary Analysis 2, Supplementary Analysis 3). We did not expect differences between the baseline and interested groups, but hypothesised that the intervention group would show a shift following the workshop, with fewer participants identifying misconceptions as barriers and more selecting the advantage statements. However, statistical power to detect a change in these exploratory proportion data was low, owing to the small number of eligible attendees at the workshop. There were two exceptions, both in the benefit question.

Exploration of the perceived barriers and benefits of inclusion of males and females in in vivo research from study 1.
Survey data collected in study 1 with 39 participants in the baseline group, 51 in the interested group and 15 in the intervention group. A: Barrier comparison between treatment groups. B: Benefit comparison between treatment groups. C: Proportion of participants selecting each barrier in the general population (baseline and interested groups combined). D: Proportion of participants selecting each benefit in the general population (baseline and interested groups combined). A Pearson’s chi-squared test was used to compare proportions between the treatment groups. Statistically significant comparisons are marked with a horizontal bar and flagged with one star (*) if p < 0.05, two stars (**) if p < 0.01 and three stars (***) if p < 0.001.
Firstly, for the preset option ‘efficient use of all animals from breeding’, there was a statistically significant increase in the proportion of the intervention group selecting this as a benefit of inclusive research (34.2% and 37.5% in the baseline and interested groups respectively, increasing to 80% in the intervention group). Secondly, there was a statistically significant increase in the proportion of the intervention group selecting the translation option (74% baseline, 68% interested, 100% intervention).
To represent the general population, data from the baseline and interested treatment groups were pooled (Fig. 1C-D). A higher proportion of participants selected benefits associated with generalisability (translatability and understanding sex differences) than those associated with robustness (reproducibility) and the ethical use of animals. A small proportion of the sample (1.2%) did not perceive any barriers. The predominant barrier was cost (48.9%), and 35% of participants raised male animal aggression. Barriers associated with misconceptions (female variability and sample size concerns) were raised by 25% of participants.
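To make the between-group comparison of proportions concrete, the following is a minimal Python sketch of a Pearson chi-squared test of homogeneity for a 2 × k table (selected versus not selected, by treatment group). The counts in the usage line are hypothetical, chosen only to roughly mirror the percentages reported above for ‘efficient use of all animals from breeding’; they are not the study data. The function name `chi_squared_2xk` is illustrative, and the sketch uses closed-form chi-squared tail probabilities that only apply to two or three groups.

```python
import math


def chi_squared_2xk(selected, not_selected):
    """Pearson chi-squared test of homogeneity for a 2 x k table.

    selected[i] and not_selected[i] are the counts for group i.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    col_totals = [s + n for s, n in zip(selected, not_selected)]
    row_totals = [sum(selected), sum(not_selected)]
    grand = sum(col_totals)
    stat = 0.0
    for row_total, observed in zip(row_totals, (selected, not_selected)):
        for col_total, obs in zip(col_totals, observed):
            expected = row_total * col_total / grand  # expected count under homogeneity
            stat += (obs - expected) ** 2 / expected
    df = len(selected) - 1
    # Closed-form chi-squared upper-tail probabilities, valid for df = 1 and df = 2 only.
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))
    elif df == 2:
        p = math.exp(-stat / 2)
    else:
        raise ValueError("this sketch only handles 2 or 3 groups")
    return stat, df, p


# Hypothetical counts (not the study data): baseline, interested, intervention.
stat, df, p = chi_squared_2xk(selected=[13, 18, 12], not_selected=[25, 30, 3])
```

With expected counts as small as those in a 15-person intervention group, an exact test may be preferable; `scipy.stats.chi2_contingency` covers tables of any size.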
2.2.2 Study 2
The study design allowed a comparison of the barriers and benefits selected pre- versus post-intervention. For some barriers, there were statistically significant differences in the proportion of participants selecting them (Supplementary Analysis 4, Fig. 2). The workshop was designed to challenge seven of the most frequently cited barriers by highlighting that they were based on misconceptions. The intervention produced a statistically significant reduction in five of these seven barriers, with an average reduction of 28% in the proportion citing each barrier post-workshop (Fig 2A). Participants also had the option of selecting alternative barriers or entering free text; for these, there was no statistically significant change in the proportions selected between pre- and post-intervention (Fig 2B). For some barriers, the proportions observed in the general population in study 1 (baseline and interested groups) were similar to those in the study 2 pre-intervention data (e.g. ‘Cost’, ‘Female variability’) (Fig 2E). Other barriers differed significantly between studies, which could reflect differences in local culture and practices (Fig 2E). For example, relative to study 1, the proportion selecting test material availability was 24% higher in study 2, whereas the proportion selecting male animal aggression concerns was 25% lower.

Exploration of the perceived barriers and benefits on the topic of inclusion of females and males in in vivo research for study 2.
Data collected from study 2, with 29 participants completing the barrier question in the pre-survey and 28 in the post-survey. A: Barriers associated with the misconceptions that the intervention aimed to address. B: Other barriers. C: Pre- versus post-intervention comparison of the benefits of inclusion. D: Comparison of the benefits selected in the study 1 general population (baseline plus interested groups) versus the study 2 pre-intervention group. E: Comparison of the barriers selected in the study 1 general population (baseline plus interested groups) versus the study 2 pre-intervention group. McNemar’s test was used to compare proportions between the pre- and post-intervention data in study 2, and a chi-squared test was used to compare proportions between the two studies. Statistically significant comparisons are marked with a horizontal bar and flagged with one star (*) if p < 0.05, two stars (**) if p < 0.01 and three stars (***) if p < 0.001.
A comparison of the pre- versus post-intervention data (Fig. 2C) found that the intervention had little impact on the proportion of participants selecting each benefit, except for “reproducibility”, where the intervention led to a statistically significant increase (26%). The proportions selecting each benefit were similar between study 1 and study 2 (Fig. 2D; Supplementary Analysis 5).
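Because study 2 surveyed the same participants before and after the workshop, each pre/post comparison involves paired yes/no responses, which is the situation McNemar’s test addresses: only participants who changed their answer carry information about change. Below is a minimal Python sketch of the exact (binomial) form of the test; this section does not state whether the exact or the chi-squared form was used, so it is an illustration only. The responses and the function name `mcnemar_exact` are hypothetical.

```python
import math


def mcnemar_exact(pre, post):
    """Exact (binomial) McNemar test for paired binary responses.

    pre[i] and post[i] are True if participant i selected the option
    before and after the intervention, respectively.
    Returns (b, c, two_sided_p) where b and c are the discordant counts.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired")
    b = sum(1 for x, y in zip(pre, post) if x and not y)  # selected pre only
    c = sum(1 for x, y in zip(pre, post) if y and not x)  # selected post only
    n = b + c
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    # Under H0 each discordant pair is equally likely to go either way (p = 0.5);
    # double the smaller binomial tail for a two-sided p-value, capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical data: 10 drop a barrier, 1 adds it, 15 keep the same answer.
pre = [True] * 10 + [False] * 1 + [True] * 5 + [False] * 10
post = [False] * 10 + [True] * 1 + [True] * 5 + [False] * 10
b, c, p = mcnemar_exact(pre, post)
```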
2.3 Exploration of intention
2.3.1 Study 1
The average intention score across all treatment groups in study 1 was high (5.67 ± 1.19; possible range 1 to 7). A statistical analysis assessed which variables could explain variation in intent to conduct sex inclusive research (Table 2, Fig 3). Two of the three theory of planned behaviour attributes, subjective norm and attitude, were positively correlated with intent (Fig 3A-C). The data indicate that as attitude increases (i.e. the more strongly participants feel that incorporating males and females is important), so does the intention to incorporate females and males; the mean attitude score was high (6.5 ± 0.87). Subjective norm had a similar positive correlation with intent, indicating that participants who perceived greater societal pressure had higher intention. Subjective norm differed from attitude in that, at the population level, its score was in the mid-range (4.7 ± 1.4). Intent was significantly higher in those who attended the workshop, with an approximately 20% increase in the intention score (Fig 3D), whereas no significant difference in intent was found between the baseline and interested groups. The only other variable with some evidence of predictive ability was how often a participant was involved in, or could influence, the planning of experiments, which was positively correlated with intent to incorporate females and males (Table 2, Fig 3E). This suggests that individuals more involved in planning experiments had a higher intention to run inclusive designs.
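As a rough illustration of what the full model estimates, here is a minimal Python sketch of ordinary least squares via the normal equations, regressing intent on three averaged theory of planned behaviour scores (named after the model terms attitude, Soc_norm and Beh_control). It is a deliberate simplification: it assumes continuous predictors only, omits the categorical covariates and treatment group, and returns coefficients without the F tests reported in Table 2. The data are hypothetical and constructed so that the true coefficients are known.

```python
def ols_fit(predictors, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y.

    predictors: list of columns (each a list of numbers); an intercept is added.
    Returns [intercept, coef_1, ..., coef_p].
    """
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in predictors] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (A[i][p] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta


# Hypothetical averaged scores (1 to 7 scale) for six participants.
attitude = [5, 6, 7, 6, 4, 7]
soc_norm = [3, 5, 4, 6, 5, 2]
beh_control = [4, 3, 5, 5, 6, 2]
# Intent built from known coefficients, with no dependence on beh_control.
intent = [1 + 0.5 * a + 0.3 * s for a, s in zip(attitude, soc_norm)]
beta = ols_fit([attitude, soc_norm, beh_control], intent)
```

In practice a statistics package would be used so that standard errors and F tests come with the estimates.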

Exploration of intent for significant and critical predictor variables for study 1 data.
A full model with all potential predictors and demographics was fitted to explore the variation in intent and assess evidence of predictive ability. The baseline group was set as the reference group. If main effects were significant, the variation was explored with Tukey post hoc testing. For graphs A, B and C, the grey area indicates the 95% confidence interval for the fitted linear relationship (blue). D: Data are presented as the model-estimated mean (least-squares means) with a standard error bar estimated from the model. The bar represents the planned comparison, with * indicating statistical significance (p < 0.05). E: Violin plot showing the distribution of intent as a function of the ability to influence the design. The red box indicates the mean.



Statistical model output for the full model for study 1 data, exploring the predictors’ ability to explain variation in average intent.
Where Beh_control represents the average behavioural control score, Soc_norm the average subjective norm score, Year_Work the number of years the participants had worked in animal research, Type_Work the type of research conducted by the participant, Education the highest level of education obtained, Stats_Training the level of statistical training received, Factorial_Fam how familiar the participants were with factorial experimental design, Factorial_Incor how often the participants incorporated males and females into their experiments while studying an intervention, attitude the average attitude score, and Ability_Influence how often the participants were involved in, or could influence, the planning of experiments involving animals. Nparm is the number of parameters, Df the degrees of freedom and Prob > F the p value associated with the F ratio. Statistical significance is shown as * for p < 0.05, ** for p < 0.01 and *** for p < 0.001.
2.3.2 Study 2
In the exploration of intention for study 2, only subjective norm was positively correlated with intent to conduct a sex inclusive study (Table 3, Fig 4A-C). Attitude did not predict intention but had a very high average score (6.7 ± 0.51). As in study 1, behavioural control was not predictive and had a mid-range average score (4.3 ± 1.14). The workshop intervention did not explain variation in the intention score (Table 3, Fig 4D). For comparison, the intention score in the pre-test group (5.71 ± 1.08) was only slightly higher than that of the general population in study 1 (baseline and interested groups; 5.51 ± 1.2). Age was negatively correlated with intent, indicating that older staff were less engaged with sex inclusive research in this dataset (Table 3, Fig 4E).

Exploration of intent for significant and critical predictor variables for study 2 data.
A full model with all potential predictors and demographics was fitted to explore the variation in intent and assess evidence of predictive ability. For graphs A, B, C and E, the grey area indicates the 95% confidence interval for the fitted linear relationship (blue). Graph D displays the density and individual participant pre/post Box-Cox transformed intent values.
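For reference, the standard one-parameter Box-Cox transformation of a positive response y (here the intent score, which lies between 1 and 7) is given below. The value of λ used for the study 2 intent data is not reported in this section; in general, λ is commonly chosen by maximum likelihood.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[6pt]
\ln y, & \lambda = 0,
\end{cases}
\qquad y > 0.
```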

Statistical model output for the full model for study 2 data, exploring the predictors’ ability to explain variation in average intent.
Where Beh_control represents the average behavioural control score, Soc_norm the average subjective norm score, Year_Work the number of years the participants had worked in animal research, Type_Work the type of research conducted by the participant, Education the highest level of education obtained, Stats_Training the level of statistical training received, Factorial_Fam how familiar the participants were with factorial experimental design, Factorial_Incor how often the participants incorporated males and females into their experiments while studying an intervention, attitude the average attitude score, and Ability_Influence how often the participants were involved in, or could influence, the planning of experiments involving animals. Nparm is the number of parameters, Df the degrees of freedom and Prob > F the p value associated with the F ratio. Statistical significance is flagged with one star (*) if p < 0.05, two stars (**) if p < 0.01 and three stars (***) if p < 0.001.
2.4 Exploration of knowledge
2.4.1 Study 1
To complement the evaluation of intention, barriers and advantages, five questions assessed participants’ knowledge of common misconceptions and errors in the planning and data analysis of sex inclusive studies (Supplementary Analysis 6, Fig 5). The cumulative knowledge score did not differ significantly between the baseline and interested groups (z = -0.513; p = 0.601) (Fig 5A). By contrast, the workshop intervention group scored significantly higher than the baseline (z = 3.515; p = 0.0044), with an average of 1.94 more questions answered correctly. The same pattern was seen in the proportion of correct answers to each individual question, with a statistically significantly higher proportion in the intervention group for 3 of the 5 questions (Fig. 5B).
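When the treatment group is the only predictor in a Poisson regression, the group coefficient has a simple closed form: the log of the ratio of group mean scores, with a Wald standard error of √(1/Y_g + 1/Y_ref), where Y is the summed count in each group. The Python sketch below illustrates this, assuming a model containing the group factor alone (additional covariates would change the estimates). The scores and the name `poisson_group_contrast` are hypothetical.

```python
import math


def poisson_group_contrast(reference_scores, group_scores):
    """Wald test for one group indicator in a Poisson log-linear model.

    Assumes the model contains only the categorical group factor, in which case
    the maximum-likelihood coefficient is log(mean_group / mean_reference) and
    its standard error is sqrt(1 / Y_group + 1 / Y_reference), Y being summed counts.
    """
    y_ref, y_grp = sum(reference_scores), sum(group_scores)
    mean_ref = y_ref / len(reference_scores)
    mean_grp = y_grp / len(group_scores)
    beta = math.log(mean_grp / mean_ref)       # log rate ratio
    se = math.sqrt(1 / y_grp + 1 / y_ref)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal p-value
    return beta, se, z, p


# Hypothetical knowledge scores (0 to 5 correct) for two small groups.
beta, se, z, p = poisson_group_contrast([1, 2, 1, 2], [3, 4, 3, 4])
```

Because each knowledge score is bounded at 5, the Poisson model is itself an approximation for these data.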

Exploration of intervention impact on the proportion of correctly answered knowledge questions.
A: Study 1: impact of treatment group on the cumulative knowledge score; statistical significance assessed with Poisson regression (Baseline: N=39, Interested: N=51, Intervention: N=15). B: Study 1: proportion of correct answers for each question (Baseline: N=39, Interested: N=51, Intervention: N=15); statistical significance assessed with Pearson’s chi-squared test. C: Study 2: impact of the intervention on the cumulative knowledge score; statistical significance assessed with a paired t-test (N=26 with both pre and post data available). D: Study 2: proportion of correct answers for each question (Pre: N=29, Post: N=28); statistical significance assessed with McNemar’s test. Statistical significance is shown as * for p < 0.05, ** for p < 0.01 and *** for p < 0.001.
2.4.2 Study 2
In study 2, the knowledge questions likewise probed common misconceptions around sex inclusive designs and common errors in the data analysis of sex inclusive studies (Supplementary Analysis 7, Fig 5). The workshop intervention significantly improved the score (t = 4.328, df = 25, p = 0.0002), with an average of 1.769 more questions answered correctly (Fig 5C), an effect size similar to that in study 1. The impact of the intervention was also seen at the level of individual questions, with a statistically significantly higher proportion of correct answers post-intervention for four of the five questions (Fig. 5D).
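The paired t-test used for the study 2 knowledge scores works on each participant’s post-minus-pre difference, giving n − 1 degrees of freedom (25 for the 26 participants with both scores). A minimal Python sketch of the statistic follows, with hypothetical scores and an illustrative function name `paired_t`; obtaining the p-value needs the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (mean difference, paired t statistic, degrees of freedom)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired")
    diffs = [b - a for a, b in zip(pre, post)]   # post minus pre for each participant
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # sample SD uses the n - 1 denominator
    return mean_diff, mean_diff / se, n - 1


# Hypothetical pre/post knowledge scores for four participants.
mean_diff, t, df = paired_t(pre=[2, 3, 2, 1], post=[4, 4, 3, 3])
```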
3. Discussion
General population conclusions – barriers/benefits/knowledge
The survey provided an opportunity to understand researchers’ positions on sex inclusive research. Across the two studies, the benefits of sex inclusive research were selected at a similar rate, with a higher proportion of participants valuing benefits associated with the generalisability of findings (the ability to understand sex differences, and translation) over those associated with robust and ethical research. This recognition of the benefits of inclusive research aligns with previous suggestions that scientists are generally supportive of efforts to improve sex representation in biomedical research (23, 28). However, the focus on the ability to understand sex differences aligns with a common misconception that the goal of inclusion is to identify sex differences. This misconception is at odds with the recommendation to split the sample size (N) for a treatment effect between the two sexes, because this approach does not power experiments to identify sex differences. The need to be explicit about the goals of an experiment and their impact on its design was raised by Rich-Edwards (9), who developed a “4 Cs” decision framework to guide researchers in sex inclusive research through four steps: Consider, Collect, Characterise and Communicate. Within this framework, two pathways are identified depending on whether the study is exploratory (males and females are included to improve generalisability) or confirmatory (sex-related differences are actively explored), and researchers are guided on the implications for design and analysis. A cultural shift is needed towards recognising that the primary goal of exploratory sex inclusion is to deliver generalisable results, in which only large sex-related differences will be identified. In our opinion, it is this understanding of the difference between exploratory and confirmatory sex inclusive research that is deficient.
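Why splitting N between the sexes does not power an experiment to detect sex differences can be made concrete with the standard error of a contrast of cell means. The Python sketch below assumes a balanced 2 × 2 (sex × treatment) design with n animals per cell and the same within-cell standard deviation σ; the constant and function names are illustrative. Under these assumptions, the treatment-by-sex interaction has twice the standard error of the treatment main effect, so estimating an interaction as large as the main effect with the same precision would need four times as many animals per cell.

```python
import math


def contrast_se(coefficients, sigma, n_per_cell):
    """Standard error of a linear contrast of cell means in a balanced design.

    Cell means are independent with variance sigma**2 / n_per_cell, so
    Var(sum c_i * mean_i) = sigma**2 * sum(c_i**2) / n_per_cell.
    """
    return sigma * math.sqrt(sum(c * c for c in coefficients) / n_per_cell)


# Cell order: (female, control), (female, treated), (male, control), (male, treated).
TREATMENT_MAIN = [-0.5, 0.5, -0.5, 0.5]  # treatment effect averaged over both sexes
TREATMENT_BY_SEX = [-1, 1, 1, -1]        # female treatment effect minus male treatment effect

se_main = contrast_se(TREATMENT_MAIN, sigma=1.0, n_per_cell=5)
se_interaction = contrast_se(TREATMENT_BY_SEX, sigma=1.0, n_per_cell=5)
```

This is the exploratory pathway in numbers: the shared N powers the overall treatment effect, while only large sex-related differences in response will reach significance.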
Research has found that scientists do not believe that inclusion is ‘do-able’ (21) owing to a range of perceived barriers (13, 22, 23, 25, 30). This study provides the first quantification of the scale of these perceived barriers, many of which are misconceptions, and finds them to be endemic. Seldom did the sampled community report the absence of barriers. Cost was the most cited barrier, with nearly half of the community raising it as a concern. This was followed by barriers related to misconceptions, such as experimental design complexity, sample size concerns and female variability, which were selected on average by a third of the community.
For some barriers (test material availability and male animal aggression), we observed a statistically significant difference between studies, which could reflect differences in local culture or research interests. For example, in study 1, male mouse aggression was raised by 30% of respondents, but by only 10% in study 2. This concern is at odds with the observation that research using male animals predominates (15), but could relate to the welfare management practices prioritised within that community. Sharing best practice on housing and husbandry conditions could minimise the impact of this concern (31–33).
The responses to the knowledge questions also provide an opportunity to directly assess whether participants hold a misconception. Focusing on the baseline and interested groups in study 1, or the pre-intervention data in study 2, gives an estimate for the general research population and shows that misconceptions dominate the research community’s beliefs. For example, 80% of the study 1 general population (N=90) and 79% of the study 2 pre-intervention group (N=28) held the misconception that including males and females would double the N. This probably arises from the perception that sex inclusive designs increase variability and therefore require a larger overall sample size (90% of study 1, N=90; 93% of study 2, N=28). This perception has been acknowledged in previously published articles (13, 20, 21, 25, 26); however, this is, to our knowledge, the first quantification of the extent of these viewpoints.
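The belief that inclusion doubles the N can be checked with a normal-approximation power calculation. The Python sketch below assumes that the treatment effect is the same in both sexes, that the within-sex standard deviation σ is equal, and it ignores the small loss of degrees of freedom from fitting sex; the function names are hypothetical. Under these assumptions, splitting a fixed N across four sex × treatment cells gives the treatment effect the same standard error, and hence the same power, as a single-sex study of the same N. Leaving sex out of the model, by contrast, lets any mean difference between the sexes inflate the estimated residual variance.

```python
from statistics import NormalDist

Z = NormalDist()


def power_two_sided(effect, se, alpha=0.05):
    """Normal-approximation power of a two-sided test for a given effect and SE."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(effect / se - z_crit) + Z.cdf(-effect / se - z_crit)


def se_single_sex(sigma, n_total):
    # Two treatment arms of n_total / 2 animals, all of one sex.
    return (sigma ** 2 * 2 / (n_total / 2)) ** 0.5


def se_mixed_sex(sigma, n_total, sex_shift=0.0, sex_in_model=True):
    # Four sex x treatment cells of n_total / 4 animals each.
    # If sex is omitted from the model, a mean shift between the sexes adds
    # roughly sex_shift**2 / 4 to the estimated residual variance.
    var = sigma ** 2 if sex_in_model else sigma ** 2 + sex_shift ** 2 / 4
    return (var / (n_total / 4)) ** 0.5


n_total, sigma, effect = 20, 1.0, 1.5
single = se_single_sex(sigma, n_total)
mixed = se_mixed_sex(sigma, n_total)
ignored = se_mixed_sex(sigma, n_total, sex_shift=2.0, sex_in_model=False)
```

This covers the exploratory case only; confirmatory designs that aim to estimate sex differences do need a larger N.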
Regarding data analysis, most researchers (75.6% of the study 1 general population, N=90, and 72% of participants in the study 2 pre-assessment, N=28) correctly knew that sex should be included in the statistical model. However, for questions more specific about analysis strategies, the proportion of incorrect answers was very high. For example, most thought the data could be pooled across the sexes (81.1% of the study 1 general population, 83% of the study 2 pre-assessment) and that the data should be disaggregated (80% of the study 1 general population, 72% of the study 2 pre-assessment). The prevalence of incorrect ideas around data analysis aligns with a published meta-analysis showing that analysis mistakes are common when females and males are included alongside the intervention of interest (17). The high proportion of correct answers to the initial question about including sex, combined with the high proportion agreeing with poor analytical strategies, suggests that participants sense they need to do something but either do not know the right analysis strategy or are unaware of the problems with suboptimal strategies.
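The problem with comparing p values between sexes can be shown with two hypothetical treatment effects. In the Python sketch below (numbers invented for illustration, normal approximation and independent sexes assumed), the effect is ‘significant’ in males and ‘not significant’ in females, yet a direct test of the difference between the two effects, which is what a treatment-by-sex interaction estimates, is far from significant.

```python
from statistics import NormalDist


def two_sided_p(estimate, se):
    """Two-sided p-value from a normal (Wald) approximation."""
    return 2 * (1 - NormalDist().cdf(abs(estimate / se)))


# Hypothetical treatment effects estimated separately in each sex.
effect_male, se_male = 5.0, 2.0
effect_female, se_female = 2.0, 2.0

p_male = two_sided_p(effect_male, se_male)        # below 0.05
p_female = two_sided_p(effect_female, se_female)  # above 0.05

# The question of interest: do the treatment effects differ between the sexes?
difference = effect_male - effect_female
se_difference = (se_male ** 2 + se_female ** 2) ** 0.5  # independent groups
p_interaction = two_sided_p(difference, se_difference)
```

With these inputs the three p values are roughly 0.01, 0.32 and 0.29. In a real analysis the interaction would be estimated within a single model that includes treatment, sex and their interaction.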
Notably, no participant in study 1, and few in study 2 (7% in the pre-test assessment), selected data analysis concerns as a barrier to inclusion, which suggests that researchers are unaware of this shortcoming. This aligns with research identifying that statistical errors in the analysis of sex inclusive datasets are common (17). The difficulty of recognising one’s own incompetence is a known cognitive bias, described as the Dunning-Kruger effect (34). Research has shown that training not only improves participants’ skills but also helps individuals recognise the limitations of their ability (34). This highlights the importance of investing in training researchers in experimental design and data analysis, a critical step in enabling sex inclusive research.
These findings highlight the scale of the misconceptions hindering engagement with sex inclusive research, and the need for research practice leaders to focus not only on these general misconceptions but also on identifying local barriers that need to be explored with their research communities.
Exploration of intention to conduct inclusive research
While training is crucially important, it alone is unlikely to solve the embedded cultural problem we currently face. Simply knowing that one should exercise to improve one’s health does not mean one will go to the gym every morning. Thus, identifying the drivers of the required behavioural change is as important as ensuring that people receive the training to enact it. The theory of planned behaviour provides key insights into how to influence intention and, ultimately, behaviour. This is important because the intention to conduct sex inclusive research in the future was generally positive (5.51 ± 1.25 in the study 1 general population, 5.71 ± 1.08 in the study 2 pre-assessment, out of a maximum score of 7) and attitude was very high (6.41 ± 0.92 in the study 1 general population, 6.66 ± 0.67 in the study 2 pre-assessment, out of 7). This indicates that scientists generally view sex inclusive research as beneficial, useful, and the right thing to do. Despite this, researchers reported conducting sex inclusive research in only ≤ 55% of their studies over the past 5 years (55% in the study 1 general population and 35% in the study 2 pre-assessment). This confirms the point above that simply knowing that sex inclusive research is important is not enough to drive active incorporation, and since attitude is already high, it leaves little room for further influence on behavioural intent.
Although attitude was high, subjective norm differed at the population level, with scores in the mid-range (4.6 ± 1.4 in the study 1 general population, 5.14 ± 1.11 in the study 2 pre-assessment). Subjective norm was also the only one of the three theory of planned behaviour beliefs found to significantly predict intent in both studies, with large effect sizes. The strong positive correlation between subjective norm and intent suggests that focused interventions to increase societal pressure on the scientific community should improve overall intention towards sex inclusive research.
While researchers may have the right motivation, performing the behaviour is dependent on other non-motivational factors such as the time, money, or skills (29). A behaviour can only be conducted if the researcher is the decision maker. Not only does actual control influence behaviour but so does the perception of behavioral control, such as self-efficacy. Self-efficacy encompasses both how well a person can do the behaviour but also how confident they feel about doing it. Self-efficacy opinions can influence choice of activities, preparation, as well as effort. In this case, we believe that the lack of significance in our model is due to the false perception of how to accurately design and analyse factorial designs, evidenced by the low knowledge score. According to Ajzen, perceived behavioural control may not be realistic if a participant has little knowledge about the behaviour in question (29). Again, based on the low knowledge scores of how to design and analyse sex inclusive data may indicate that this measure may not add much accuracy to behavioural prediction. However, it does indicate the overwhelming need to create a training that is accessible and practical to all in vivo scientists.
Impact of the intervention on intention
We had the opportunity to run a workshop at an international conference and, as part of this, decided to assess how effective the training was. The practical aspects and decisions involved in an in vivo experiment are rarely covered in traditional statistics courses, which often leaves scientists unsure how to apply what they have learned in the real world. Furthermore, 26-29% of the participants in this study had received no formal statistical training, yet 71-75% could always or often influence the experimental design. This illustrates the need for continuing education and practical instruction for established scientists. In the first study, trained attendees had the highest level of intent to conduct sex inclusive research in the future, potentially indicating that our workshop was effective. However, it is also plausible that those who signed up inherently had a higher level of intent. We therefore felt it was important to evaluate attendees’ intent before and after the training, which led to our second study. There, intent was not statistically altered by the workshop (Pre: 5.71 ± 1.08; Post: 6.00 ± 1.03), as it was already high, but participants’ knowledge increased and they selected fewer barriers after the training. This suggests that education alone may not be enough to change behaviour, and that, given the influence of subjective norm on intent, social pressure may be the best avenue for influencing actual behavioural change.
Impact of the intervention on perceived benefits and barriers
The primary goal of the intervention was to challenge misconceptions; however, the workshop material naturally began the learning journey by exploring the drivers for, and barriers preventing, change. Focusing on the benefits, we found that after the training intervention participants tended to select each benefit at a higher proportion than the other groups. The change was statistically significant for a few benefits: there was a large increase in the selection of efficient use of animals and translatability in study 1, and of reproducibility in study 2. The infrequent statistical significance at the individual benefit level is likely to reflect low statistical power, arising from high baseline selection rates and small sample sizes.
An exploration of the study 2 data found that the intervention did not reduce the selection of barriers in general, but did decrease the selection of barriers associated with misconceptions. This demonstrates that the targeted intervention was effective at improving the perception that generalisable sex inclusive research is feasible, with a meaningful effect size (an average reduction of 28%).
Impact of intervention on knowledge
It has been highlighted previously that early career researchers lack awareness of best practice recommendations across their areas of research (35). To address this, institutions need to put in place training initiatives directed at improving knowledge of best practice, with the goal of continuously improving practice and supporting cultural change. This means that training must extend beyond that provided as part of graduate and postgraduate studies, and that training opportunities should be implemented as part of the continuing professional development of research staff.
In the baseline and pre-intervention data, performance on the panel of knowledge questions was generally poor: only the question on whether sex should be included in the analysis was answered correctly by more than half of the participants. This is consistent with previous findings that statistical approaches, even in studies designed to be sex inclusive, are frequently erroneous (17). Encouragingly, post-intervention participants demonstrated significant improvements in knowledge. For example, far more post-intervention researchers correctly identified that disaggregating and/or pooling the sexes within a statistical analysis was a mistake, particularly in study 2.
Given that the majority of commonly cited barriers stem from misconceptions that arise from knowledge gaps, the evidence that targeted interventions can directly enhance knowledge offers promise for promoting cultural change through educational initiatives. Despite across-the-board improvements in performance, there is still room for further improvement. For example, nearly 40% of post-intervention participants still expressed the view that including both sexes requires doubling the sample size. This highlights that a single session is likely insufficient to equip researchers with all the relevant knowledge, and supports a future strategy of continuous improvement through multiple training and learning approaches.
Limitations of the study
Whilst the data indicate a clear benefit of the workshop intervention, there are several caveats and limitations that should be acknowledged. In both studies, university researchers made up the majority of the population. The workshop was hosted by a recognised senior leader (Professor Ahluwalia), both at the World Congress (role: WCP2023 Secretary-General) and within the university setting (role: Dean for Research at the Faculty of Medicine and Dentistry). Participants’ awareness of the senior leader’s views may have influenced the positive outcome of the intervention. The congress in study 1 was aimed specifically at pharmacologists, so the outcomes may reflect only this sector of biomedical research. In study 2, however, the workshop was advertised across a multi-faculty university and likely reached a more specialty-diverse population. Whether the workshop would effect a similar change in understanding among those working in the pharmaceutical industry requires further evaluation. The impact of the workshop could also depend on the skills and experience of the trainers. Furthermore, the results could be specific to a face-to-face training event; whether a virtual course (which has the benefit of scale) would have a similar benefit needs further exploration. The workshop focused on the barriers and misconceptions around design and analysis, and did not tackle practical barriers such as the management of welfare issues with male mice. Future work will be needed to explore how to support researchers in addressing these barriers. Finally, the questions posed to workshop attendees did not state whether they referred to a first exploratory study. Had this been made explicit, the responses might have differed.
4. Conclusions
An endemic and persistent sex bias in early research has been raised as a risk to the validity of biological knowledge. Sociological research finds that scientists believe sex matters, but change has been limited. Funders are actively trying to change behaviour by requiring a justification for the exclusion of either sex. Changing the status quo requires individuals to buy into a new direction that aligns with their values and beliefs. Here we demonstrate that a workshop intervention can challenge the core beliefs that hinder sex inclusive research. We also provide evidence that institutes, funders and professional bodies can help this journey by raising awareness of best practice, as this will change the perceived cultural norms. All resources used in this research, both the teaching material and the surveys, have been made freely available as supplementary material to support further research and activities.
5. Materials and Methods
All procedures and informed consent protocols were approved by the Queen Mary University of London Ethics Research Committee (ID: QMERC23.001). Informed consent was obtained after the nature and possible consequences of the study had been explained. In November 2023, an amendment was approved to run additional workshops for staff and students at Queen Mary University of London (ID: QMERC23.002).
5.1 Study 1 Experimental Design
In this study, data from three treatment groups (baseline, interested and intervention) were collected at the 19th World Congress of Basic & Clinical Pharmacology 2023 in Glasgow, Scotland, using an online questionnaire (Supplementary material 1) and convenience sampling within each treatment group. Participants attending the poster sessions on 3rd and 4th July were recruited by circulating staff to form the baseline treatment group, thus sampling the general population attending the conference. Participants for the interested group were recruited by approaching attendees at a related symposium advertised in the conference proceedings on 4th July 2023 (entitled: “The importance of interrogating sex differences in cardiovascular physiology and disease”). Participants for the intervention group were recruited by advertising, in the conference proceedings, a free educational workshop entitled “Best practice for sex inclusive research”, held on 5th July 2023; data were collected post intervention. Workshop participants were given the option to take part in the survey at the end of the workshop. To compensate participants in all treatment groups for their time, each received an entry into a draw for one of ten £50 Amazon e-gift cards. Participants were included if they conducted research on a disease or biological phenomenon that affects males and females and were able to influence the planning or conduct of in vivo research. At the point of data collection, the organisers were not masked (blinded), as we knew the hypothesis and the intervention group in the study. However, data were collected by sharing a QR code with the participants, who then completed the data entry independently; our ability to influence the results was therefore minimal.
A sample size estimate was conducted a priori using the highest standard deviation (1.4) observed in a similar study (36) that also used a survey based on the theory of planned behaviour. With this standard deviation, a power calculation based on Student’s t-test estimated that 31 participants per group (62 in total) would be needed to detect a one-point difference between groups with a power of 0.8. We aimed to collect data from 50 participants in each treatment group to ensure sufficient data after dropout (failure to attend or to meet the inclusion criteria).
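The per-group figure can be checked with the normal approximation to the two-sample t-test, n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². This is a minimal sketch, assuming a two-sided significance level of 0.05 (the level is not stated for this calculation; it is taken from section 5.5.3) and equal group sizes; an exact calculation based on the noncentral t distribution may return a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd: float, delta: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Normal approximation to Student's t-test (assumption, see lead-in):
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 * sd**2 / delta**2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
    z_beta = z.inv_cdf(power)  # quantile giving the desired power, about 0.84
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return ceil(n)  # round up to whole participants


# SD of 1.4 and a one-point difference, as described in the text
n = n_per_group(sd=1.4, delta=1.0)
print(n, 2 * n)  # 31 per group, 62 in total
```

With these inputs the unrounded value is about 30.8, which rounds up to the 31 per group reported above.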
5.2 Study 2 Experimental Design
In this study, a pre- and post-intervention design was implemented to assess the impact of the workshop, using an online questionnaire (Supplementary material 2 (pre-questions) and 3 (post-questions)) with convenience sampling of a pre-existing group. Participants from Queen Mary University of London were recruited by advertising an educational workshop entitled “Best Practice for sex inclusive research”, held on 14th February 2024. To compensate participants for their time, each received an entry into a draw for a single £50 Amazon e-gift card. Participants were included if they conducted research on a disease or biological phenomenon that affects females and males and were able to influence the planning or conduct of in vivo research.
We aimed to recruit 50 participants to the workshop, the maximum we felt would be feasible while maintaining its interactive nature, and collected data only during the workshop. The pre- and post-intervention design should have higher power to detect changes than the previous study (see the power calculation in section 5.1). At the point of data collection, the organisers were not masked (blinded), as we knew the hypothesis and the intervention point in the study. However, data were collected by sharing a QR code with the participants, who then completed the data entry independently; our ability to influence the results was therefore minimal.
5.3 Survey construct
The survey was developed in consultation with experts in survey methodology, behaviour theory, and sex inclusive research design and analysis. When possible, validated instruments were used (i.e., questions based on the theory of planned behaviour (TPB)). When validated instruments did not exist, new items were created, reviewed by statistical experts, piloted and revised as necessary. Two cycles of pilots were conducted with in vivo pharmacology researchers (N=9) from both academic and pharmaceutical settings. Survey 1 consisted of 39 questions and survey 2 a maximum of 34 questions in a testing phase (Supplementary Material 1). Demographic questions were omitted from the post-question format in Study 2, and data from the same individual were aligned between the pre- and post-questions using the initials (three or more) entered by the participant (pre-survey: Supplementary Material 2; post-survey: Supplementary Material 3).
The first four questions obtained consent and assessed whether participants met the inclusion criteria. If a participant did not grant consent or failed to meet the inclusion criteria on any question, the survey was terminated. Thereafter, all questions were presented to each participant. Participants were asked demographic questions covering age, gender, geographic region (study 1 only) and highest level of education. Several questions captured information about their work, including type of institution (e.g., academic, contract research organization; study 1 only), primary type of research (e.g., applied, basic, regulatory) and number of years working with laboratory animals. In addition, questions were included to capture data on potential explanatory variables: their ability to influence the planning of experiments, the amount of statistical education they had received, their familiarity with factorial designs, how often they incorporate males and females into their experimental designs, their knowledge of how to analyse data when both females and males are studied, and the impact of including males and females on statistical power. Finally, participants were given a list of pre-identified advantages and barriers from which to select the reasons they perceived incorporating females and males to be advantageous or challenging. An empty text box was also provided for participants to list any other advantages or barriers.
Motivations regarding sex inclusive research, as outlined by the theory of planned behaviour, were assessed using a series of thirteen questions. Participants answered four close-ended quantitative questions about their behavioural attitudes (their perception of the positive or negative valence of sex inclusive research), three questions on subjective norms (social and professional pressures to conduct sex inclusive research) and three questions on perceived behavioural control (general confidence in, and control over, their ability to run sex inclusive studies). The perceived behavioural control variable differs from self-efficacy in that it asks participants about external control factors, such as whether inclusive designs are under their control. Finally, three questions were used to assess participants’ future intent to implement sex inclusive research. One open-ended question at the end of the survey asked whether participants had any other thoughts or comments they wished to share on the topic.
The survey included a question to determine whether participants were answering as part of the baseline, interested or intervention group (study 1) or at the pre- or post-assessment (study 2). Repeat participation was not anticipated owing to the length of the survey and the time constraints for data collection.
The surveys were constructed so that the only identifiable information was an optional email address, used to support the draw for the Amazon vouchers. To maintain anonymity, once the data were downloaded, an ID code was assigned, email addresses were removed, and the email information was used only for the draw. The final data were stored in an encrypted, password-protected file accessible only to the research team.
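Aligning pre- and post-survey responses through self-reported initials can be sketched as below. This is a minimal illustration rather than the procedure the authors used: the record fields `initials` and `age` are hypothetical names, and age is included in the matching key only because the Supplementary Table 2 caption notes that age was collected at both time points to guard against duplicate initials. Keys that occur more than once are flagged for manual review instead of being guessed.

```python
from collections import defaultdict


def normalise_initials(raw: str) -> str:
    # Ignore case, spaces and full stops so that "a.b.c" and "ABC" match
    return "".join(ch for ch in raw.upper() if ch.isalpha())


def pair_responses(pre: list, post: list):
    """Match pre/post records on (normalised initials, age).

    Returns (pairs, ambiguous): unique matches, and keys that appear more than
    once in either survey and therefore cannot be matched automatically.
    """
    def index(records):
        groups = defaultdict(list)
        for rec in records:
            key = (normalise_initials(rec["initials"]), rec.get("age"))
            groups[key].append(rec)
        return groups

    pre_idx, post_idx = index(pre), index(post)
    pairs, ambiguous = [], []
    for key in sorted(set(pre_idx) & set(post_idx), key=str):
        if len(pre_idx[key]) == 1 and len(post_idx[key]) == 1:
            pairs.append((pre_idx[key][0], post_idx[key][0]))
        else:
            ambiguous.append(key)  # duplicate key: needs manual review
    return pairs, ambiguous


# Invented example records
pre = [{"initials": "a.b.c", "age": 34}, {"initials": "XYZ", "age": 29},
       {"initials": "XYZ", "age": 29}]
post = [{"initials": "ABC", "age": 34}, {"initials": "xyz", "age": 29}]
pairs, ambiguous = pair_responses(pre, post)
```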
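The anonymisation step (assigning ID codes, stripping email addresses and keeping them only for the prize draw) can be sketched as follows. The ID format `P001`, the field name `email` and the use of a seeded random generator for an auditable draw are all assumptions made for illustration; the text does not say how IDs were formatted or how winners were drawn.

```python
import random


def anonymise(records, email_field="email"):
    """Assign sequential ID codes and split email addresses off the survey data."""
    anonymised, draw_entries = [], []
    for i, rec in enumerate(records, start=1):
        row = {k: v for k, v in rec.items() if k != email_field}
        row["id"] = f"P{i:03d}"  # hypothetical ID format
        anonymised.append(row)
        email = rec.get(email_field)
        if email:  # providing an email address was optional
            draw_entries.append(email)
    return anonymised, draw_entries


def prize_draw(entries, n_prizes, seed=None):
    """Draw distinct winners; a fixed seed makes the draw reproducible."""
    unique = sorted(set(entries))  # one entry per address
    rng = random.Random(seed)
    return rng.sample(unique, k=min(n_prizes, len(unique)))


records = [{"email": "a@x.org", "q1": 5}, {"email": "", "q1": 3},
           {"email": "b@x.org", "q1": 7}]
anon, entries = anonymise(records)
winners = prize_draw(entries, n_prizes=10, seed=1)  # study 1 offered ten prizes
```

Keeping the draw list separate from the analysis table means the analysed data never contain the only identifiable field.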
5.4 Workshop construct
The workshop, or training intervention, was developed by statisticians who have published, presented internationally and taught on this topic (Supplementary Material 4). One of the statisticians has a formal teaching qualification and a background in education. The training consisted of a 1 hour 45 min workshop combining didactic teaching with interactive activities, and introduced a framework to evaluate research proposals from a sex inclusive perspective (Table 1, Supplementary material 4). Prior to study 1, it was run as a pilot with a small group of scientists to test the material. Following the results of study 1, the workshop material was refined by amending multiple-choice questions one and two to drive a more explicit discussion of whether data should be analysed by pooling, by disaggregation or with a factorial test. Furthermore, multiple-choice question six was added to reinforce the learning that a baseline sex difference can be separated from an intervention effect.
5.5 Statistics and Reproducibility
5.5.1 Participant inclusion
In study 1, a total of 194 participants started the survey. Of these, 105 met the inclusion criteria questions (N=39 baseline, N=51 interested and N=15 intervention). While all 105 met the inclusion criteria, seven participants left the question about age blank. Therefore, the final dataset to evaluate intent included 98 participants (N=35 baseline, N=48 interested and N=15 intervention).
In study 2, a total of 63 individuals registered for the workshop, 42 participants attended all or part of it, and survey data were collected from 31 individuals. However, because some did not meet the inclusion criteria or did not complete one of the surveys, we received data from 28 individuals for the pre-survey and 28 for the post-survey.
5.5.2 Variable coding
For the theory of planned behaviour questions, each question was scored on a scale from 1 to 7. An average score was calculated for each motivation (attitude, perceived behavioural control and social norm) and for intent. This strategy required each participant to answer at least 50% of the questions for each motivation; otherwise, their data were discarded. All participants answered at least 50% of the theory of planned behaviour questions, so no participant’s data were discarded on this criterion.
To aid analysis, demographic categories with fewer than ten responses were combined with similarly themed options into larger categories. For example, for the question on previous statistical training, the categories “No training” and “Primarily informal/practical” were collapsed into a category called “No courses”. Missing data for categorical variables (gender, geography, etc.) were coded as “unknown”.
Within the survey, two (survey 2) or three (survey 1) questions allowed participants to add a free-text answer in addition to the options provided. These questions concerned the perceived barriers to, or advantages of, using males and females in preclinical research. All free-text answers were reviewed by the research team and categorised by perceived theme. Very few free-text responses were provided (Survey 1: 12, for the barrier question only; Survey 2: 6, for the barrier question only).
Five knowledge questions, each with a single correct answer, were posed to the participants based on misconceptions (22) concerning the simultaneous use of females and males in an experiment and variability in the female sex. Participants received one point for each correct answer, and points were summed across the five questions. This knowledge summary score was not included in the statistical exploration of intent, as knowledge was anticipated to correlate with treatment group.
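The scoring rule above (average the 1 to 7 responses for a construct, but only if at least half of its items were answered) can be sketched as follows. This is an illustration of the rule as described, not the authors’ code; missing answers are assumed to be stored as `None`.

```python
from statistics import mean


def construct_score(responses, min_fraction=0.5):
    """Average 1-7 Likert responses for one construct (e.g. attitude).

    Returns None when fewer than `min_fraction` of the items were answered,
    in which case the participant's data for that construct are discarded.
    """
    answered = [r for r in responses if r is not None]
    if not responses or len(answered) / len(responses) < min_fraction:
        return None  # fewer than half the items answered
    if any(not 1 <= r <= 7 for r in answered):
        raise ValueError("responses must lie on the 1-7 scale")
    return mean(answered)


# Four attitude items, one skipped: the mean of the three answered items is used
attitude = construct_score([7, 6, None, 5])
```

Exactly half of the items answered (for example one of two) still counts as meeting the 50% threshold.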
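The recoding rule can be sketched as below. The themed groupings were chosen by the authors, so here they are passed in as a hypothetical `merge_groups` mapping; the sketch merges a whole themed group whenever any of its members has fewer than ten responses, which reproduces the “No courses” example but is only one reasonable reading of the rule.

```python
from collections import Counter


def collapse_sparse(values, merge_groups, min_count=10, missing="unknown"):
    """Code missing answers as 'unknown' and merge sparse themed categories.

    merge_groups maps a new label to the list of similarly themed original
    categories that should be combined when any of them is sparse.
    """
    coded = [missing if v in (None, "") else v for v in values]
    counts = Counter(coded)
    recode = {}
    for new_label, members in merge_groups.items():
        # Merge the whole group if any member has fewer than min_count responses
        if any(counts[m] < min_count for m in members):
            recode.update({m: new_label for m in members})
    return [recode.get(v, v) for v in coded]


training = (["No training"] * 12 + ["Primarily informal/practical"] * 5
            + ["1-2 courses"] * 20 + [None] * 2)
groups = {"No courses": ["No training", "Primarily informal/practical"]}
recoded = collapse_sparse(training, groups)
```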
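Scoring the knowledge panel amounts to one point per correct answer, summed over the five questions. In the sketch below the question identifiers and the answer key are invented placeholders, not the survey’s actual items; an unanswered question simply earns no point.

```python
def knowledge_score(answers, answer_key):
    """Count the questions whose answer matches the single correct option."""
    return sum(1 for q, correct in answer_key.items() if answers.get(q) == correct)


# Hypothetical answer key for five multiple-choice knowledge questions
ANSWER_KEY = {"Q1": "B", "Q2": "A", "Q3": "D", "Q4": "C", "Q5": "A"}

participant = {"Q1": "B", "Q2": "C", "Q3": "D", "Q5": None}  # Q4 left blank
score = knowledge_score(participant, ANSWER_KEY)  # 0 to 5
```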
5.5.3 Quantitative analysis
Throughout this research, a significance level of p < 0.05 was considered statistically significant. Multiple testing correction was not applied as this is an exploratory analysis.
5.5.3.1 Evaluation of intent
All data analyses were conducted in JMP 14.0.0 (SAS Institute Inc., Cary, NC USA). The data for each analysis are provided in Supplementary Analysis 8, including the specific SAS code used to run the statistical model regarding the theory of planned behaviour intent analyses.
The dependent variable for the quantitative analysis was the average intent (Avg_Intent), which in study 1 only was Box-Cox transformed to improve its distributional characteristics. As fixed effects of interest, the statistical model included ‘Treatment_Group’ for study 1 and ‘Intervention’ for study 2. In addition, three theory of planned behaviour attributes were included as fixed effects: average attitude (Attitude), average behavioural control score (Beh_Control) and average social norm (Soc_Norm). Potential explanatory variables (e.g., demographic variables) were also included in the model as fixed effects. A full model containing all possible terms (Eq. [1] study 1 and Eq. [2] study 2) was used, as we had no strong prior information with which to select predictive variables. For the data from study 2, a mixed model analysis was used, with a random term for participant to account for the repeated-measures nature of the data. For both studies, the baseline group was set as the reference group. Since a study of this kind has not been conducted before, and whether social factors (age, gender, previous statistical experience, etc.) explain variability in data such as these has not been reported, we left all potential explanatory variables in the model to provide data on demographic variables for future social science work of this kind. We evaluated the final model with and without these non-significant variables, and their inclusion or exclusion did not affect the results presented. Model diagnostics (e.g., independence of residuals, homogeneity of variance and normality of residuals) were inspected, and no concerns about model quality were identified.
Eta squared for study 2 was estimated using the F_to_eta2 function in the effectsize package in R 4.3.1.
In Eq. [1] and Eq. [2], Year_Work represents the number of years the participants have worked in animal research, Education the highest level of education obtained (‘Doctoral’, ‘Masters’, ‘Other’), Type_Work the type of research, Training the level of statistical training received (‘No courses’, ‘1-2 courses’, ‘>2 courses’), Factorial_Fam how familiar the participants were with factorial experimental design, Factorial_Incor how often the participants incorporated males and females into their experiments while studying an intervention, and Ability_Influence how often the participants were involved in, or could influence, the planning of experiments involving animals. Participant was included as a random effect in Eq. [2] and is represented in the model as (1|Participant). For study 2, few demographics could be included in the analysis owing to a lack of variability and because several were not asked at the post-intervention time point.
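The Box-Cox step can be sketched as a grid search for the λ that maximises the profile log-likelihood, −(n/2)·log σ̂²(λ) + (λ − 1)·Σ log y. This is a simplified illustration, not the JMP procedure used in the study: it profiles λ against a single overall mean, whereas a transformation chosen within a regression model would use the residuals of the full model. The intent values in the usage line are invented.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of one strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    return math.log(y) if abs(lam) < 1e-12 else (y ** lam - 1) / lam


def boxcox_loglik(data, lam):
    """Profile log-likelihood of lambda under a normal model (intercept only)."""
    n = len(data)
    z = [boxcox(y, lam) for y in data]
    m = sum(z) / n
    var = sum((v - m) ** 2 for v in z) / n  # maximum-likelihood variance
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(y) for y in data)


def estimate_lambda(data, grid=None):
    """Choose lambda by maximising the profile log-likelihood over a grid."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]  # -2.00 to 2.00
    return max(grid, key=lambda lam: boxcox_loglik(data, lam))


# Invented average-intent scores on the 1-7 scale
intent = [5.0, 6.0, 6.3, 6.7, 7.0, 7.0, 4.0, 6.0, 6.7, 5.7]
lam = estimate_lambda(intent)
transformed = [boxcox(y, lam) for y in intent]
```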
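Converting an F statistic to an effect size uses η²ₚ = F·df₁ / (F·df₁ + df₂). We believe this is the partial eta squared that effectsize’s F_to_eta2 reports, but readers should confirm against the package documentation. The F value and degrees of freedom below are hypothetical, and the 0.01/0.06/0.14 labels are the commonly cited benchmarks rather than thresholds stated in this paper.

```python
def f_to_partial_eta2(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def label_eta2(eta2: float) -> str:
    """Commonly cited benchmarks: 0.01 small, 0.06 medium, 0.14 large."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "medium"
    if eta2 >= 0.01:
        return "small"
    return "very small"


# Hypothetical F test for one fixed effect with 26 error degrees of freedom
eta2 = f_to_partial_eta2(4.0, 1, 26)  # 4 / 30
```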
5.5.3.2 Evaluation of knowledge
Using the statistical programming language R (version 4.3.1), a Poisson regression analysis assessed the role of treatment group in explaining variation in the cumulative knowledge score for study 1 (data: Supplementary Table 3; analysis: Supplementary Analysis 9) and study 2 (data: Supplementary Table 4). A variety of diagnostics (e.g., residual distribution, assessment of Cook’s distance and leverage, goodness-of-fit test) were generated to ensure the model was robust and appropriate for the data.
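When treatment group is the only predictor, the Poisson maximum-likelihood fit has a closed form: each group’s fitted mean is its sample mean, the coefficient for a group is the log rate ratio against the reference group, and its Wald standard error is √(1/Σy_group + 1/Σy_reference). The sketch below uses that shortcut and a Pearson dispersion ratio as one simple goodness-of-fit check; it is not the authors’ R analysis, and the scores are invented.

```python
import math


def poisson_group_rate_ratios(scores_by_group, reference):
    """Poisson regression (log link) of counts on a single categorical predictor."""
    ref = scores_by_group[reference]
    ref_total = sum(ref)
    ref_mean = ref_total / len(ref)
    results = {}
    for group, scores in scores_by_group.items():
        if group == reference:
            continue
        total = sum(scores)
        log_rr = math.log((total / len(scores)) / ref_mean)  # group coefficient
        se = math.sqrt(1 / total + 1 / ref_total)  # Wald SE of the log rate ratio
        z = log_rr / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald p-value
        results[group] = {"rate_ratio": math.exp(log_rr), "se_log_rr": se, "p": p}
    return results


def pearson_dispersion(scores_by_group):
    """Pearson chi-square / residual df; well above 1 suggests overdispersion."""
    chi2, n_obs = 0.0, 0
    for scores in scores_by_group.values():
        mu = sum(scores) / len(scores)  # fitted mean for this group
        chi2 += sum((y - mu) ** 2 / mu for y in scores)
        n_obs += len(scores)
    return chi2 / (n_obs - len(scores_by_group))


# Invented cumulative knowledge scores (0-5)
scores = {"baseline": [1, 2, 3], "intervention": [3, 4, 5]}
fit = poisson_group_rate_ratios(scores, reference="baseline")
```

Because the knowledge score is capped at five, a dispersion ratio below 1 (underdispersion) would not be surprising for data of this kind.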
Data availability statement
All data generated during this research have been made available within the manuscript or via the supplementary information.
Additional information
Funding
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. This work was supported by grants for AA from The British Journal of Pharmacology (https://bpspubs.onlinelibrary.wiley.com/journal/14765381) and AstraZeneca (https://www.astrazeneca.com/).
Contribution
NAK: Conceptualization, Funding acquisition, Investigation, Visualization, Project administration, Formal analysis, Supervision, Writing – original draft, Writing – review & editing
BP: Investigation, Visualization, Writing – original draft, Writing – review & editing
BG: Conceptualization, Investigation, Formal analysis, Writing – original draft, Writing – review & editing
HR: Investigation
OO: Investigation
AR: Investigation
JO: Investigation
AA: Funding acquisition, Investigation, Writing – original draft, Writing – review & editing
Abbreviations
SIRF: Sex Inclusive Research Framework
Supplementary Material



Summary of demographic and potential predictors between groups for all survey 1 contributors who met the inclusion criteria.
The demographic information included is for the full 105 participants who met the inclusion criteria. While all 105 met the inclusion criteria, 7 participants left the question about age blank. As age was included as a predictor in the analysis of intent, the missing values were managed with listwise deletion, assuming they were missing at random, reducing the dataset to 98 participants (N=35 baseline, N=48 interested and N=15 intervention). To test for a statistically significant difference (association) between the treatment groups, Pearson’s chi-square test was used for categorical variables, ordered logistic regression for ordinal variables and ANOVA for continuous variables. Institute type was collected as a demographic, but the resulting sample was predominantly academic, so this variable was removed from downstream analysis because it could not be used to assess the effect of institute type on the outcome of interest. The abbreviation in brackets, within the demographic column, indicates the term used within the statistical model and associated output.


Summary of demographic and potential predictors for all participants who met the inclusion criteria for Study 2.
The demographic information is for the full 31 participants who met the inclusion criteria in study 2. While 31 unique individuals met the criteria, not every individual responded to both surveys (N=29 pre-survey and N=28 post-survey). For instance, 2 participants did not meet the inclusion criteria for the pre-survey but met the criteria for the post-intervention survey. For the intention analysis, some demographic data were missing. To reduce survey burden, the post-survey included only one demographic question (the participant’s age), to support alignment of the data in case duplicate initials were used as an identifier. Two respondents did not provide their age and were managed with listwise deletion, assuming missing at random. The abbreviation in brackets, within the demographic column, indicates the term used within the statistical model and associated output. For McNemar’s paired analysis, we conducted listwise deletion, assuming missing at random, reducing the dataset to 26.
Supplementary Analysis 1: Pearson’s correlation coefficient analysis between continuous variables.
Year_Work represents the number of years the participants have worked in animal research, Education the highest level of education obtained, Type_Work the type of research conducted by the participant, Training the level of statistical training received, Factorial_Fam how familiar the participants were with factorial experimental design, Factorial_Incor how often the participants incorporated males and females into their experiments while studying an intervention, attitude the average attitude score, Beh_control the average behavioural control score, Soc_norm the average social norm score and Ability_Influence how often the participants were involved in, or could influence, the planning of experiments involving animals. Only variables used in the final statistical model were compared for correlations.


Supplementary Analysis 2: For the perceived barriers, the Pearson’s chi-square test of association between treatment groups in study 1
This survey question provided several pre-defined options and the ability to enter a free-text option. Participants were asked to choose all that applied. Exploration of the free text grouped the barriers into three additional categories: convention, logistic, and none or no barriers. To test for a statistically significant difference (association) between the treatment groups, Pearson’s chi-square test was applied to all options where the total N > 10.
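For a single option, the test compares how often it was selected across the three study 1 groups in a 3 × 2 table (groups by selected/not selected). The sketch below computes Pearson’s statistic and its p-value; because three groups give 2 degrees of freedom, the chi-square tail probability has the closed form exp(−χ²/2), and the helper also covers 1 and other even degrees of freedom. The counts are invented, and, as with any Pearson test, the approximation assumes reasonably large expected counts.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1 or any even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        half = x / 2
        term, total = 1.0, 1.0
        for i in range(1, df // 2):  # exp(-x/2) * sum_{i<df/2} (x/2)^i / i!
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("odd df > 1 is not handled in this sketch")


def pearson_chi_square(table):
    """Pearson's chi-square test of association for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Invented counts: rows = baseline, interested, intervention;
# columns = selected the option, did not select it
option = [[20, 10], [10, 20], [15, 15]]
if sum(row[0] for row in option) > 10:  # test only options selected by N > 10
    stat, df, p = pearson_chi_square(option)
```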

Supplementary Analysis 3: For the perceived benefits, the Pearson’s chi-square test of association between treatment groups in study 1
This question provided several pre-defined options and the ability to enter a free-text option. Participants were asked to choose all that applied. No free-text advantages were provided by survey takers for this question. To test for a statistically significant difference (association) between the treatment groups, Pearson’s chi-square test of association was applied to all options where the total N > 10.

Supplementary Analysis 4: For the perceived barriers, McNemar’s test of association between pre- and post-intervention responses in study 2
This question provided several pre-defined options and the ability to enter a free-text option. Participants were asked to choose all that applied. No free-text advantages were provided by survey takers for this question. To test for a statistically significant difference (association) between the pre and post measures, McNemar’s test was applied to all options where the total N > 10.


Supplementary Analysis 5: For the perceived benefits, McNemar’s test of association between pre- and post-intervention responses in study 2
This survey question provided several pre-defined options and the ability to enter a free-text option. Participants were asked to choose all that applied; a free-text option was available, but no responses were submitted. To test for a statistically significant difference (association) between the pre and post measures, McNemar’s test was applied to all options where the total N > 10. This test accounts for the paired nature of the data, which required the listwise deletion of individuals with missing data.
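McNemar’s test uses only the discordant pairs: b participants who selected an option before but not after the workshop, and c who selected it only afterwards. The statistic (|b − c| − 1)²/(b + c), or (b − c)²/(b + c) without the continuity correction, is referred to a chi-square distribution with 1 degree of freedom. The sketch below applies listwise deletion to incomplete pairs as described above; whether the authors used the continuity correction is not stated, so it is left as a parameter, and the responses are invented.

```python
import math


def mcnemar(pre, post, correction=True):
    """McNemar's test for paired yes/no answers (option selected pre vs post).

    Pairs with a missing value on either occasion are dropped (listwise deletion).
    Returns (statistic, p_value, number_of_complete_pairs).
    """
    pairs = [(x, y) for x, y in zip(pre, post) if x is not None and y is not None]
    b = sum(1 for x, y in pairs if x and not y)  # selected before only
    c = sum(1 for x, y in pairs if y and not x)  # selected after only
    if b + c == 0:
        return 0.0, 1.0, len(pairs)  # no discordant pairs: no evidence of change
    diff = abs(b - c) - (1 if correction else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p, len(pairs)


# Invented data: 1 = option selected, 0 = not selected, None = missing
pre = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 5 + [None]
post = [0] * 8 + [1] * 2 + [1] * 5 + [0] * 5 + [1]
stat, p, n_pairs = mcnemar(pre, post)
```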

Supplementary Analysis 6: For the knowledge questions, a Pearson’s chi-square test of the association between treatment group and the proportion of participants who answered each question correctly in study 1


Supplementary Analysis 7: For the knowledge questions, a McNemar’s test to assess association between pre and post answers for each knowledge question in study 2

Supplementary Analysis 8: Intention data and SAS analysis code
The following provides the data and SAS code used to analyze the study 1 and study 2 intention data. It is included for data and analysis transparency and can be copied and pasted directly into SAS.
Study 1 intent data and analysis code








Cumulative knowledge score for study 1.


Pre- and post-cumulative knowledge scores for study 2.
Supplementary Analysis 9: Poisson regression analysis of cumulative knowledge score for study 1
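The model specification for this analysis is not reproduced here. As an illustration only, the sketch assumes the simplest case: a Poisson regression of the cumulative knowledge score on a single binary treatment indicator. With one binary predictor, the maximum-likelihood fit has a closed form, because each group's fitted mean equals its observed mean. The coefficient on the indicator is then the log of the ratio of mean scores. In SAS, a model like this would typically be fitted with PROC GENMOD using a Poisson distribution and log link. Any additional covariates, or checks for overdispersion, would require a full GLM fit instead. The function name and scores are hypothetical.

```python
import math

def poisson_two_group(control, treated):
    """Poisson regression of a count score on one 0/1 group indicator.

    Model: log(E[score]) = beta0 + beta1 * treated_indicator.
    The MLE sets each group's fitted mean to its observed mean score.
    Returns (beta0, beta1, rate_ratio, se_beta1, p_value).
    """
    t0, t1 = sum(control), sum(treated)
    if t0 == 0 or t1 == 0:
        raise ValueError("each group needs a non-zero total score")
    mean0, mean1 = t0 / len(control), t1 / len(treated)
    beta0 = math.log(mean0)
    beta1 = math.log(mean1 / mean0)
    # Wald standard error from the inverse Fisher information: 1/T0 + 1/T1
    se_beta1 = math.sqrt(1 / t0 + 1 / t1)
    z = beta1 / se_beta1
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta0, beta1, math.exp(beta1), se_beta1, p_value

# Hypothetical knowledge scores for a control and a workshop group
print(poisson_two_group([3, 4, 5, 4], [6, 7, 5, 6]))
```

The exponentiated coefficient (`rate_ratio`) reads as "the workshop group's mean score is this many times the control group's".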






References
- 1. Sex bias in neuroscience and biomedical research. Neurosci Biobehav Rev 35:565–572.
- 2. Vive la difference. Trends Neurosci 15:331–332.
- 3. A 10-year follow-up study of sex inclusion in the biological sciences. eLife 9:e56344. https://doi.org/10.7554/eLife.56344
- 4. Unmasking the Adverse Impacts of Sex Bias on Science and Research Animal Welfare. Animals 13:2792.
- 5. Prevalence of sexual dimorphism in mammalian phenotypic traits. Nat Commun 8.
- 6. Drug Safety: Most Drugs Withdrawn in Recent Years Had Greater Health Risks for Women.
- 7. Qualitative sex differences in pain processing: emerging evidence of a biased literature. Nature Reviews Neuroscience 21:353–365.
- 8. Studying both sexes: a guiding principle for biomedicine. The FASEB Journal 30.
- 9. Best practices to promote rigor and reproducibility in the era of sex-inclusive research. eLife 12:e90623. https://doi.org/10.7554/eLife.90623
- 10. Considering sex as a biological variable will require a global shift in science culture. Nature Neuroscience 24:457–464.
- 11. Sex as an important biological variable in biomedical research. BMB Reports 51.
- 12. Pervasive neglect of sex differences in biomedical research. Cold Spring Harbor Perspectives in Biology 14.
- 13. Sex in experimental design. UK Research and Innovation. https://www.ukri.org/councils/mrc/guidance-for-applicants/policies-and-guidance-for-researchers/sex-in-experimental-design/
- 14. An analysis of neuroscience and psychiatry papers published from 2009 and 2019 outlines opportunities for increasing discovery of sex differences. Nature Communications 13:2137.
- 15. A 10-year follow-up study of sex inclusion in the biological sciences. eLife 9.
- 16. Considering and reporting sex as an experimental variable II: An update on progress in the British Journal of Pharmacology. British Journal of Pharmacology.
- 17. Reporting and misreporting of sex differences in the biological sciences. eLife 10.
- 18. Working Group on Sex in Experimental Design of Animal Research.
- 19. Implementation of the NIH sex-inclusion policy: attitudes and opinions of study section members. Journal of Women’s Health 28:9–16.
- 20. Evaluating the National Institutes of Health’s sex as a biological variable policy: conflicting accounts from the front lines of animal research. Journal of Women’s Health 30:348–354.
- 21. Three years in: “sex as a biological variable” policy in practice-and an invitation to collaborate. GenderSci Lab.
- 22. Statistical simulations show that scientists need not increase overall sample size by default when including both sexes in in vivo studies. PLOS Biology 21:e3002129.
- 23. Sex bias in preclinical research and an exploration of how to change the status quo. Br J Pharmacol 176:4107–4118.
- 24. Female rats are not more variable than male rats: a meta-analysis of neuroscience studies. Biology of Sex Differences 7:1–7.
- 25. Female mice liberated for inclusion in neuroscience and biomedical research. Neuroscience & Biobehavioral Reviews 40:1–5.
- 26. Inclusion of females does not increase variability in rodent research studies. Current Opinion in Behavioral Sciences 23:143–149.
- 27. NIH policy: mandate goes too far. Nature 510:340.
- 28. Sex in experimental design: summary report. UK Research and Innovation. https://www.ukri.org/publications/sex-in-experimental-design-summary-report/
- 29. The theory of planned behavior. Organizational Behavior and Human Decision Processes 50:179–211.
- 30. Inclusion of both sexes in research design - Call for input. https://engagementhub.ukri.org/mrc-regulatorysupportcentre/inclusion-sex-in-research-design/
- 31. The effect of group size, age and handling frequency on inter-male aggression in CD 1 mice. Scientific Reports 10:2253.
- 32. Breaking up is hard to do: Does splitting cages of mice reduce aggression? Applied Animal Behaviour Science 206:94–101.
- 33. Male management: coping with aggression problems in male laboratory mice. Laboratory Animals 37:300–313.
- 34. Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology 77:1121.
- 35. Experiences of early career researchers: Influences on the design and reporting of animal experiments, and the practical and emotional support needed to enhance best practice methods. Laboratory Animals 00236772241242850.
- 36. Changing human behavior to improve animal welfare: A longitudinal investigation of training laboratory animal personnel about Heterospecific play or “rat tickling”. Animals 10:1435.
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.106545. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, Gaskill et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.