A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis-generation

Abstract

Open research data provides considerable scientific, societal, and economic benefits. However, disclosure risks can limit the sharing of open data, especially for datasets that include sensitive details or information from individuals with rare disorders. This article introduces synthetic datasets, an emerging method originally developed to permit the sharing of confidential census data. Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. Importantly, this method also reduces disclosure risk to essentially nil, as no record in the synthetic dataset represents a real individual. This practical guide, with its accompanying R script, enables biobehavioural researchers to create synthetic datasets and assess their utility via the synthpop R package. By sharing synthetic datasets that mimic original datasets that could not otherwise be made open, researchers can ensure the reproducibility of their results and facilitate data exploration while maintaining participant privacy.
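To make the core idea concrete, the following is a minimal conceptual sketch in Python. It is not the synthpop workflow described in the article and not the author's R script. It assumes a hypothetical two-variable "original" dataset (generated here purely for illustration), fits a simple bivariate normal model to it, and draws new records from that model. All function and variable names are illustrative. The point is only to show that synthetic records can reproduce means and a correlation without copying any original record.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    # Sample standard deviation (n - 1 denominator)
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def correlation(xs, ys):
    # Pearson correlation between two equal-length lists
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (sd(xs) * sd(ys))


def synthesize(xs, ys, n, rng):
    """Draw n synthetic (x, y) pairs from a bivariate normal fitted to the data.

    Assumption: a bivariate normal is an adequate model for the two variables.
    This is a deliberately simple stand-in, not the method used by synthpop.
    """
    mx, my = mean(xs), mean(ys)
    sx, sy = sd(xs), sd(ys)
    r = correlation(xs, ys)
    synth_x, synth_y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        synth_x.append(mx + sx * z1)
        # Mix z1 and z2 so the pair has correlation r in expectation
        synth_y.append(my + sy * (r * z1 + math.sqrt(1 - r * r) * z2))
    return synth_x, synth_y


rng = random.Random(2020)

# Hypothetical "original" data: two correlated continuous variables
orig_x = [rng.gauss(50, 10) for _ in range(2000)]
orig_y = [0.6 * x + rng.gauss(0, 8) for x in orig_x]

synth_x, synth_y = synthesize(orig_x, orig_y, 2000, rng)

# Compare summary statistics of the original and synthetic data
print("mean x:", round(mean(orig_x), 2), "vs", round(mean(synth_x), 2))
print("mean y:", round(mean(orig_y), 2), "vs", round(mean(synth_y), 2))
print("corr  :", round(correlation(orig_x, orig_y), 3),
      "vs", round(correlation(synth_x, synth_y), 3))

# Count synthetic records that exactly duplicate an original record
overlap = set(zip(orig_x, orig_y)) & set(zip(synth_x, synth_y))
print("records shared with original:", len(overlap))
```

In this sketch the synthetic records are drawn from a fitted model rather than copied, so the summary statistics are close to the original ones while no synthetic record matches a real one. Real datasets with categorical variables, skewed distributions, or small cells need more careful modelling and explicit utility and disclosure checks.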

Data availability

Data and analysis scripts are available at the article's Open Science Framework webpage: https://osf.io/z524n/

The following previously published datasets were used:

(2016) Effects of oxytocin administration on spirituality and emotional responses to meditation
Open Science Framework, https://osf.io/rk2x7/.

Jones BC, DeBruine L (2019) Sociosexuality and self-rated attractiveness
Open Science Framework, DOI: 10.17605/OSF.IO/6BK3W.
https://osf.io/6bk3w/

Article and author information

Author details

Daniel S Quintana

Institute of Clinical Medicine, University of Oslo, Oslo, Norway

For correspondence
daniel.quintana@medisin.uio.no

Competing interests
The author declares that no competing interests exist.

ORCID iD: 0000-0003-2876-0004

Funding

Novo Nordisk Foundation (Excellence grant NNF16OC0019856)

Daniel S Quintana

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.