A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis-generation
Abstract
Open research data provides considerable scientific, societal, and economic benefits. However, disclosure risks can limit data sharing, especially for datasets that include sensitive details or information from individuals with rare disorders. This article introduces synthetic datasets, an emerging method originally developed to permit the sharing of confidential census data. Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. Importantly, this method also reduces disclosure risk to essentially nil, as no record in the synthetic dataset represents a real individual. This practical guide, with an accompanying R script, enables biobehavioural researchers to create synthetic datasets and assess their utility via the synthpop R package. By sharing synthetic datasets that mimic original datasets that could not otherwise be made open, researchers can ensure the reproducibility of their results and facilitate data exploration while maintaining participant privacy.
Data availability
Data and analysis scripts are available at the article's Open Science Framework webpage: https://osf.io/z524n/
- Effects of oxytocin administration on spirituality and emotional responses to meditation. Open Science Framework, https://osf.io/rk2x7/.
- Sociosexuality and self-rated attractiveness. Open Science Framework, DOI: 10.17605/OSF.IO/6BK3W.
Article and author information
Funding
Novo Nordisk Foundation (Excellence grant NNF16OC0019856)
- Daniel S Quintana
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
- Mone Zaidi, Icahn School of Medicine at Mount Sinai, United States
Version history
- Received: November 1, 2019
- Accepted: March 11, 2020
- Accepted Manuscript published: March 11, 2020 (version 1)
- Version of Record published: April 1, 2020 (version 2)
Copyright
© 2020, Quintana
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 4,165 views
- 356 downloads
- 59 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.