Statistics: Sex difference analyses under scrutiny
Scientific research requires appropriate methods and statistical analyses; otherwise, results and interpretations can be flawed. How research outcomes differ by sex, for example, has historically been understudied, and only recently have policies been implemented to require that sex be considered in the design of a study (e.g., NIH, 2015).
Over two decades ago, the renowned biomedical statistician Doug Altman labeled methodological weaknesses a “scandal”, raising awareness of shortcomings related to the representativeness of research as well as inappropriate research designs and statistical analysis (Altman, 1994). These methodological weaknesses extend to research on sex differences: simply adding female cells, animals, or participants to experiments does not guarantee an improved understanding of this field of research. Rather, the experiments must also be correctly designed and appropriately analyzed to examine such differences. While guidance exists for the proper analysis of sex differences (e.g., Beltz et al., 2019), the frequency of errors in published research articles on this topic has not been well understood.
Now, in eLife, Yesenia Garcia-Sifuentes and Donna Maney of Emory University fill this gap by surveying the literature to examine whether the statistical analyses used in different research articles are appropriate to support conclusions of sex differences (Garcia-Sifuentes and Maney, 2021). Drawing from a previous study that surveyed articles studying mammals from nine biological disciplines (Woitowich et al., 2020), Garcia-Sifuentes and Maney sampled 147 articles that included both males and females and performed an analysis by sex.
Over half of the articles surveyed (83, or 56%) reported a sex difference. Garcia-Sifuentes and Maney examined the statistical methods used to analyze sex differences and found that over a quarter (24 out of 83) of these articles did not perform or report a statistical analysis supporting the claim of a sex difference. A factorial design with sex as a factor is an appropriate way to examine sex differences in response to treatment, by giving each sex each treatment option (such as a treatment or control diet; see Figure 1A). A slight majority of all articles (92, or 63%) used a factorial design. Within the articles using a factorial design, however, less than one third (27) applied and reported a method appropriate to test for sex differences (e.g., testing for an interaction between sex and the exposure, such as different diets; Figure 1B). Similarly, within articles that used a factorial design and concluded a sex-specific effect, less than one third (16 out of 53) used an appropriate analysis.
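To make the idea of testing for an interaction concrete, the following is a minimal sketch in Python, not the method used by Garcia-Sifuentes and Maney. It assumes a balanced 2×2 design (sex × diet) with invented outcome values, and it tests the interaction as a contrast: the treatment effect in males minus the treatment effect in females. The function name `interaction_contrast`, the data, and the use of a normal approximation for the p-value are all assumptions made for illustration; a real analysis would use a two-way ANOVA or linear model in a statistics package, with the t or F distribution.

```python
from statistics import NormalDist, fmean, variance

# Hypothetical outcome values for a 2x2 factorial design (sex x diet).
# These numbers are invented for illustration only.
data = {
    ("female", "control"): [1.0, 1.4, 0.8, 1.2, 1.1, 0.9],
    ("female", "treatment"): [1.9, 2.3, 1.6, 2.1, 2.0, 1.7],
    ("male", "control"): [1.1, 0.7, 1.3, 1.0, 0.9, 1.2],
    ("male", "treatment"): [1.5, 1.2, 1.9, 1.4, 1.6, 1.3],
}


def interaction_contrast(cells):
    """Estimate the sex-by-treatment interaction as a difference of differences."""
    means = {key: fmean(values) for key, values in cells.items()}

    # Treatment effect within each sex (treatment mean minus control mean).
    effect_f = means[("female", "treatment")] - means[("female", "control")]
    effect_m = means[("male", "treatment")] - means[("male", "control")]

    # The interaction asks whether these two effects differ from each other.
    contrast = effect_m - effect_f

    # Pooled residual variance across the four cells.
    df = sum(len(values) - 1 for values in cells.values())
    pooled_var = sum((len(v) - 1) * variance(v) for v in cells.values()) / df

    # Standard error of the contrast (each cell mean has weight +1 or -1).
    se = (pooled_var * sum(1 / len(v) for v in cells.values())) ** 0.5
    stat = contrast / se

    # Assumption: normal approximation to the p-value; with small samples
    # a t distribution with `df` degrees of freedom would be more accurate.
    p_approx = 2 * (1 - NormalDist().cdf(abs(stat)))
    return effect_f, effect_m, contrast, stat, df, p_approx


effect_f, effect_m, contrast, stat, df, p_approx = interaction_contrast(data)
print(f"effect in females: {effect_f:.3f}")
print(f"effect in males:   {effect_m:.3f}")
print(f"interaction contrast: {contrast:.3f} (statistic {stat:.2f}, df {df})")
print(f"approximate two-sided p-value: {p_approx:.3f}")
```

The key design choice is that a single test is applied to the difference between the two treatment effects, rather than one test per sex. A claim of a sex-specific effect rests on this contrast, not on the two within-sex p-values.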
Notably, nearly half of the articles (24 out of 53) that concluded a sex-specific effect statistically tested the effect of treatment within each sex and compared the resulting statistical significance (Figure 1C). In other words, when one sex had a statistically significant change and the other did not, the authors of the original studies concluded that a sex difference existed. This approach, which is sometimes called the ‘differences in nominal significance’, or ‘DINS’, error (George et al., 2016), is invalid and has been documented for decades across several disciplines, including neuroscience (Nieuwenhuis et al., 2011), obesity and nutrition (Bland and Altman, 2015; George et al., 2016; Vorland et al., 2021), and more general areas (Gelman and Stern, 2006; Makin, 2019; Matthews and Altman, 1996; Sainani, 2010).
This approach is invalid because a difference in statistical significance between the sexes is not itself a test of whether the treatment effects differ. Testing within each sex separately inflates the probability of falsely concluding that a sex-specific effect is present, compared with testing the difference between the sexes directly. Other inappropriate analyses identified in the survey included testing for a sex difference within the treatment group while ignoring control animals; not reporting results after claiming to have done an appropriate analysis; or claiming an effect when the appropriate analysis was not statistically significant, despite subscribing to ‘null hypothesis significance testing’. Finally, when articles pooled the data of males and females together in their analysis, about half of them did not first test for a sex difference, potentially masking important differences.
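The inflation of false positives under the DINS approach can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not an analysis from the survey: it generates data in which the treatment effect is identical in both sexes, so any claimed sex difference is a false positive. For simplicity it treats the standard deviation as known and uses z-tests; the sample size, effect size, and the function name `simulate` are hypothetical choices made so that each within-sex test has roughly 50% power.

```python
import random
from statistics import NormalDist, fmean


def z_pvalue(z):
    """Two-sided p-value for a z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_sims=2000, n=20, effect=0.62, sigma=1.0, alpha=0.05, seed=1):
    """Compare false-positive rates when the true effect is equal in both sexes."""
    rng = random.Random(seed)
    se_diff = sigma * (2 / n) ** 0.5   # SE of a treated-minus-control difference
    se_inter = sigma * (4 / n) ** 0.5  # SE of the difference between two such differences
    dins_hits = 0
    interaction_hits = 0

    for _ in range(n_sims):
        diffs = {}
        for sex in ("F", "M"):
            # Same true treatment effect in both sexes: no real sex difference.
            control = [rng.gauss(0.0, sigma) for _ in range(n)]
            treated = [rng.gauss(effect, sigma) for _ in range(n)]
            diffs[sex] = fmean(treated) - fmean(control)

        # DINS: declare a sex difference when exactly one sex is "significant".
        sig_f = z_pvalue(diffs["F"] / se_diff) < alpha
        sig_m = z_pvalue(diffs["M"] / se_diff) < alpha
        if sig_f != sig_m:
            dins_hits += 1

        # Direct test: is the difference between the two effects nonzero?
        if z_pvalue((diffs["M"] - diffs["F"]) / se_inter) < alpha:
            interaction_hits += 1

    return dins_hits / n_sims, interaction_hits / n_sims


dins_rate, interaction_rate = simulate()
print(f"false 'sex difference' rate, DINS approach:    {dins_rate:.3f}")
print(f"false 'sex difference' rate, interaction test: {interaction_rate:.3f}")
```

Under these assumptions the direct test should falsely report a sex difference at close to the nominal 5% rate, whereas the DINS approach does so far more often, because two tests with moderate power frequently disagree by chance alone.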
The results of Garcia-Sifuentes and Maney highlight the need for thoughtful planning of study design, analysis, and communication to maximize our understanding and use of biological sex differences in practice. The survey does not quantify what proportion of this research reaches incorrect conclusions because of inappropriate statistical methods; doing so would require estimation procedures or reanalysis of the data. Nevertheless, the conclusions of many of these studies might change if the data were analyzed correctly. Misleading results divert our attention and resources, contributing to the larger problem of ‘waste’ in biomedical research, that is, the avoidable costs of research that does not contribute to our understanding of what is true because it is flawed, methodologically weak, or not clearly communicated (Glasziou and Chalmers, 2018).
What can the scientific enterprise do about this problem? The survey suggests that discipline-specific practices for designing, reporting, and analyzing studies of sex differences may vary widely. Although larger surveys are needed to assess this variability more comprehensively, such differences would imply that education and support efforts could be targeted where they are most needed. Compelling scientists to publicly share their data can facilitate reanalysis when statistical errors are discovered, though the burden on researchers performing the reanalysis is not trivial. Partnering with statisticians in the design, analysis, and interpretation of research is perhaps the most effective means of prevention.
Scientific research often does not reflect the diversity of those who benefit from it. Even when it does, using methods that are inappropriate fails to support the progress toward equity. Surely this is nothing less than a scandal.
References
- Beltz et al. (2019) Analysis of sex differences in pre-clinical and clinical data sets. Neuropsychopharmacology 44:2155–2158. https://doi.org/10.1038/s41386-019-0524-3
- Bland and Altman (2015) Best (but oft forgotten) practices: testing for treatment effects in randomized trials by separate analyses of changes from baseline in each group is a misleading approach. The American Journal of Clinical Nutrition 102:991–994. https://doi.org/10.3945/ajcn.115.119768
- Gelman and Stern (2006) The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician 60:328–331. https://doi.org/10.1198/000313006X152649
- Nieuwenhuis et al. (2011) Erroneous analyses of interactions in neuroscience: a problem of significance. Nature Neuroscience 14:1105–1107. https://doi.org/10.1038/nn.2886
- NIH (2015) Consideration of Sex as a Biological Variable in NIH-funded Research. Website. Accessed October 13, 2021.
Copyright
© 2021, Vorland
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.