A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis-generation

  Daniel S Quintana (corresponding author), University of Oslo, Norway

Abstract

Open research data provides considerable scientific, societal, and economic benefits. However, disclosure risks can sometimes limit the sharing of open data, especially in datasets that include sensitive details or information from individuals with rare disorders. This article introduces synthetic datasets, an emerging method originally developed to permit the sharing of confidential census data. Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. Importantly, this method also reduces disclosure risk to essentially nil as no record in the synthetic dataset represents a real individual. This practical guide with accompanying R script enables biobehavioural researchers to create synthetic datasets and assess their utility via the synthpop R package. By sharing synthetic datasets that mimic original datasets that could not otherwise be made open, researchers can ensure the reproducibility of their results and facilitate data exploration while maintaining participant privacy.

Data availability

Data and analysis scripts are available at the article's Open Science Framework webpage https://osf.io/z524n/

The following previously published dataset was used:

    Jones BC, DeBruine L (2019) Sociosexuality and self-rated attractiveness. Open Science Framework, DOI: 10.17605/OSF.IO/6BK3W.

Article and author information

Author details

  Daniel S Quintana

    Institute of Clinical Medicine, University of Oslo, Oslo, Norway
    For correspondence: daniel.quintana@medisin.uio.no
    Competing interests: The author declares that no competing interests exist.
    ORCID: 0000-0003-2876-0004

Funding

Novo Nordisk Foundation (Excellence grant NNF16OC0019856), recipient: Daniel S Quintana

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2020, Quintana

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

Daniel S Quintana (2020) A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis-generation. eLife 9:e53275. https://doi.org/10.7554/eLife.53275

