MDverse, shedding light on the dark matter of molecular dynamics simulations

  1. Johanna KS Tiemann (corresponding author)
  2. Magdalena Szczuka
  3. Lisa Bouarroudj
  4. Mohamed Oussaren
  5. Steven Garcia
  6. Rebecca J Howard
  7. Lucie Delemotte
  8. Erik Lindahl
  9. Marc Baaden
  10. Kresten Lindorff-Larsen
  11. Matthieu Chavent (corresponding author)
  12. Pierre Poulain (corresponding author)
  1. Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Denmark
  2. Institut de Pharmacologie et Biologie Structurale, CNRS, Université de Toulouse, France
  3. Université Paris Cité, CNRS, Institut Jacques Monod, France
  4. Independent researcher, Netherlands
  5. Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Sweden
  6. Department of Applied Physics, Science for Life Laboratory, KTH Royal Institute of Technology, Sweden
  7. Laboratoire de Biochimie Théorique, CNRS, Université Paris Cité, France
5 figures, 1 table and 1 additional file

Figures

Explore and Expand ($Ex^2$) strategy used to index MD-related files, and the number of deposited files in generalist data repositories identified by this strategy.

(A) Explore and Expand ($Ex^2$) strategy used to index and collect MD-related files. Within the explore phase, we search the respective data repositories for datasets that contain specific keywords …

Categorization of indexed files based on their file types and assigned MD engine.

(A) Distribution of files among MD simulation engines. (B) Expansion of the MD engine category ‘Unknown’ from (A) into the 10 most observed file types.

Content analysis of .xtc and .gro files.

(A) Number of Gromacs-related files available in the searched data repositories. In red, files used for further analyses. (B) Simple analysis of a subset of .xtc files with the cumulative distribution of …

Content analysis of .mdp files.

(A) Cumulative distribution of .mdp files versus the simulation time for all-atom and coarse-grained simulations. (B) Sankey graph of the distribution between different values for thermostat and …
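
As a minimal sketch of how a simulation time could be read from a single .mdp file, the snippet below assumes the time is derived as `nsteps × dt`, which the caption does not state explicitly. The helper names `parse_mdp` and `simulation_time_ns` and the sample file content are hypothetical and only serve as illustration.

```python
def parse_mdp(text):
    """Parse GROMACS .mdp text into a dict of parameter name -> string value."""
    params = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # ';' starts a comment in .mdp files
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumption: dashes and underscores in parameter names are interchangeable.
        params[key.strip().lower().replace("-", "_")] = value.strip()
    return params


def simulation_time_ns(params):
    """Return nsteps * dt in nanoseconds, or None for an open-ended run."""
    dt_ps = float(params.get("dt", 0.001))  # GROMACS default time step: 0.001 ps
    nsteps = int(params.get("nsteps", 0))   # GROMACS default: 0 steps
    if nsteps < 0:                          # nsteps = -1 means no maximum
        return None
    return dt_ps * nsteps / 1000.0          # ps -> ns


# Hypothetical .mdp content, not taken from the indexed datasets.
example = """
; production run
integrator = md
dt         = 0.002
nsteps     = 50000000
tcoupl     = v-rescale
"""

params = parse_mdp(example)
print(simulation_time_ns(params), params["tcoupl"])
```

Collecting this value over many .mdp files and sorting it would give the input for a cumulative distribution such as the one in panel (A).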

Snapshots of the MDverse data explorer, a prototype search engine to explore collected files and datasets.

(A) General view of the web application. (B) Focus on the .mdp and .gro file sets, exported as .tsv files. The web application also includes links to their original repository.

Tables

Table 1
Statistics of the MD-related datasets and files found in the data repositories Figshare, OSF, and Zenodo.
Data repository  datasets  first dataset  latest dataset  files   total size (GB)  zip files  files within zip  total files
Zenodo           1011      19/11/2014     05/03/2023      20,250  12,851           1780       141,304           161,554
Figshare         913       20/08/2012     03/03/2023      3336    736              590        74,720            78,056
OSF              55        24/05/2017     05/02/2023      6146    495              14         0                 6146
Total            1979      -              -               29,732  14,082           2384       216,024           245,756
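
The figures in Table 1 can be cross-checked with a few lines of Python. This is only an illustrative sketch: the values are transcribed from the table, and the variable names are hypothetical. It checks that the Total row is the column-wise sum of the three repositories and that, in each row, total files equals files plus files within zip.

```python
# Values transcribed from Table 1, in the order:
# (datasets, files, total size in GB, zip files, files within zip, total files)
rows = {
    "Zenodo":   (1011, 20250, 12851, 1780, 141304, 161554),
    "Figshare": (913, 3336, 736, 590, 74720, 78056),
    "OSF":      (55, 6146, 495, 14, 0, 6146),
}
reported_total = (1979, 29732, 14082, 2384, 216024, 245756)

# Column-wise sums over the three repositories.
computed_total = tuple(sum(column) for column in zip(*rows.values()))

# Per repository: files + files within zip should equal total files.
row_consistent = {
    name: files + in_zip == total
    for name, (_, files, _, _, in_zip, total) in rows.items()
}

print("Total row matches column sums:", computed_total == reported_total)
print("Per-repository file counts consistent:", row_consistent)
```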
