A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis-generation

  1. Daniel S Quintana (corresponding author)
  1. University of Oslo, Norway

Abstract

Open research data provides considerable scientific, societal, and economic benefits. However, disclosure risks can limit the sharing of open data, especially for datasets that include sensitive details or information from individuals with rare disorders. This article introduces synthetic datasets, an emerging method originally developed to permit the sharing of confidential census data. Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. Importantly, this method also reduces disclosure risk to essentially nil, as no record in the synthetic dataset represents a real individual. This practical guide, with an accompanying R script, enables biobehavioural researchers to create synthetic datasets and assess their utility via the synthpop R package. By sharing synthetic datasets that mimic original datasets that could not otherwise be made open, researchers can ensure the reproducibility of their results and facilitate data exploration while maintaining participant privacy.
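To make the core idea concrete, the following is a minimal sketch in Python rather than the article's R workflow, and it does not use or reproduce synthpop. It assumes a simplified parametric approach: the first variable is drawn from a normal distribution fitted to the observed values, and the second is drawn from a linear regression on the first plus residual noise. The "observed" data are simulated stand-ins, and the function names `fit_linear`, `synthesise`, and `pearson` are hypothetical helpers introduced only for this illustration.

```python
import random
import statistics


def fit_linear(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return a, b, statistics.pstdev(residuals)


def synthesise(xs, ys, n, rng):
    """Draw n synthetic (x, y) records from models fitted to the observed data."""
    mx, sx = statistics.fmean(xs), statistics.stdev(xs)
    a, b, resid_sd = fit_linear(xs, ys)
    synthetic = []
    for _ in range(n):
        x = rng.gauss(mx, sx)                   # first variable: fitted marginal
        y = a + b * x + rng.gauss(0, resid_sd)  # second variable: conditional on the first
        synthetic.append((x, y))
    return synthetic


def pearson(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2020)

# Stand-in "observed" data, simulated here purely for illustration.
obs_x = [rng.gauss(50, 10) for _ in range(500)]
obs_y = [0.6 * x + rng.gauss(0, 8) for x in obs_x]
observed = list(zip(obs_x, obs_y))

synthetic = synthesise(obs_x, obs_y, n=500, rng=rng)

print(f"observed r  = {pearson(observed):.2f}")
print(f"synthetic r = {pearson(synthetic):.2f}")
print("records shared with observed data:", len(set(observed) & set(synthetic)))
```

The two correlations should be similar while no synthetic row copies an observed row, which is the property the abstract describes. Real datasets with mixed variable types and non-linear relationships need more flexible models than this two-variable sketch.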

Data availability

Data and analysis scripts are available at the article's Open Science Framework webpage https://osf.io/z524n/

The following previously published datasets were used:

    Jones BC, DeBruine L (2019) Sociosexuality and self-rated attractiveness. Open Science Framework. DOI: 10.17605/OSF.IO/6BK3W

Article and author information

Author details

  1. Daniel S Quintana

    Institute of Clinical Medicine, University of Oslo, Oslo, Norway
    For correspondence
    daniel.quintana@medisin.uio.no
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0003-2876-0004

Funding

Novo Nordisk Foundation (Excellence grant NNF16OC0019856)

  • Daniel S Quintana

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2020, Quintana

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Daniel S Quintana (2020) A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis-generation. eLife 9:e53275. https://doi.org/10.7554/eLife.53275
