MDverse, shedding light on the dark matter of molecular dynamics simulations

  1. Johanna KS Tiemann (corresponding author)
  2. Magdalena Szczuka
  3. Lisa Bouarroudj
  4. Mohamed Oussaren
  5. Steven Garcia
  6. Rebecca J Howard
  7. Lucie Delemotte
  8. Erik Lindahl
  9. Marc Baaden
  10. Kresten Lindorff-Larsen
  11. Matthieu Chavent (corresponding author)
  12. Pierre Poulain (corresponding author)
  1. Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Denmark
  2. Institut de Pharmacologie et Biologie Structurale, CNRS, Université de Toulouse, France
  3. Université Paris Cité, CNRS, Institut Jacques Monod, France
  4. Independent researcher, Netherlands
  5. Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Sweden
  6. Department of Applied Physics, Science for Life Laboratory, KTH Royal Institute of Technology, Sweden
  7. Laboratoire de Biochimie Théorique, CNRS, Université Paris Cité, France
5 figures, 1 table and 1 additional file

Figures

Explore and Expand (Ex²) strategy used to index MD-related files, and number of deposited files identified by this strategy in generalist data repositories.

(A) Explore and Expand (Ex²) strategy used to index and collect MD-related files. In the explore phase, we search the respective data repositories for datasets that contain specific keywords (e.g. ‘molecular dynamics’, ‘md simulation’, ‘namd’, ‘martini’...) in conjunction with specific file extensions (e.g. ‘mdp’, ‘psf’, ‘parm7’...). These extensions are chosen for their uniqueness and for how reliably they avoid false positives (i.e. files not related to MD). In the expand phase, the content of the identified datasets is fully cataloged, including files that on their own could yield false positives (e.g. ‘.log’ files). (B) Number of deposited files identified by our Ex² strategy in generalist data repositories.
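The two phases can be illustrated with a minimal Python sketch. It assumes that dataset metadata has already been retrieved as plain dictionaries with hypothetical `title`, `description` and `files` keys, and it uses only the example keywords and extensions quoted in the caption; the actual workflow searches the repositories themselves with longer, curated lists.

```python
import os

# Illustrative subsets only (assumption): the real keyword and extension lists are longer.
KEYWORDS = ["molecular dynamics", "md simulation", "namd", "martini"]
TRUSTED_EXTENSIONS = {"mdp", "psf", "parm7"}


def file_extension(filename):
    """Return the lowercase extension without its leading dot ('' if none)."""
    return os.path.splitext(filename)[1].lstrip(".").lower()


def explore(datasets):
    """Explore phase: keep datasets that mention a keyword AND hold a trusted file type."""
    selected = []
    for dataset in datasets:
        text = f"{dataset['title']} {dataset['description']}".lower()
        has_keyword = any(keyword in text for keyword in KEYWORDS)
        has_trusted_file = any(
            file_extension(name) in TRUSTED_EXTENSIONS for name in dataset["files"]
        )
        if has_keyword and has_trusted_file:
            selected.append(dataset)
    return selected


def expand(selected):
    """Expand phase: catalog every file of the selected datasets, .log files included."""
    return [(dataset["title"], name) for dataset in selected for name in dataset["files"]]


# Usage on two toy datasets (hypothetical records, not real repository data).
datasets = [
    {"title": "Martini simulation of a membrane", "description": "",
     "files": ["run.mdp", "md.log", "traj.xtc"]},
    {"title": "Lab notes", "description": "molecular dynamics tips",
     "files": ["notes.log"]},
]
catalog = expand(explore(datasets))
```

The explore filter requires both a keyword and a trusted extension, so a dataset containing only ‘.log’ files is never selected on its own; once a dataset passes, the expand step records all of its files.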

Categorization of indexed files based on their file types and assigned MD engine.

(A) Distribution of files among MD simulation engines. (B) Breakdown of the ‘Unknown’ MD engine category from (A) into its 10 most frequent file types.
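Assigning an MD engine from file types and breaking down the ‘Unknown’ category can be sketched as below. The extension-to-engine table is a hypothetical, deliberately incomplete mapping chosen for illustration (ambiguous extensions such as .psf or .pdb are left unassigned); it is not the rule set used to produce the figure.

```python
import os
from collections import Counter

# Hypothetical extension-to-engine table for illustration only.
ENGINE_BY_EXTENSION = {
    "mdp": "Gromacs", "gro": "Gromacs", "xtc": "Gromacs",
    "trr": "Gromacs", "tpr": "Gromacs", "edr": "Gromacs",
    "prmtop": "AMBER", "parm7": "AMBER", "rst7": "AMBER", "mdcrd": "AMBER",
}


def extension(filename):
    """Lowercase extension without the leading dot ('' if none)."""
    return os.path.splitext(filename)[1].lstrip(".").lower()


def engine_of(filename):
    """Engine guessed from the extension, or 'Unknown' when no rule matches."""
    return ENGINE_BY_EXTENSION.get(extension(filename), "Unknown")


def summarize(filenames, top_n=10):
    """Count files per engine, then list the top_n file types among 'Unknown' files."""
    engines = Counter(engine_of(name) for name in filenames)
    unknown_types = Counter(
        extension(name) for name in filenames if engine_of(name) == "Unknown"
    )
    return engines, unknown_types.most_common(top_n)


# Usage on a toy list of file names.
files = ["a.xtc", "b.gro", "c.parm7", "d.pdb", "e.pdb", "f.log"]
engines, top_unknown = summarize(files)
```

Here `top_unknown` plays the role of panel (B): the most frequent extensions among files that no engine rule claims.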

Content analysis of .xtc and .gro files.

(A) Number of Gromacs-related files available in the searched data repositories. Files used for further analyses are shown in red. (B) Simple analysis of a subset of .xtc files, showing the cumulative distributions of the number of frames (green) and of the system size (orange). (C) Cumulative distribution of the system sizes extracted from .gro files. (D) UpSet plot of systems grouped by molecular composition, inferred from the analysis of .gro files. 3D structures of representative systems are displayed, including soluble proteins such as TonB and T4 lysozyme, membrane proteins such as Kir channels and the gasdermin prepore, protein/RNA systems, G-quadruplexes, and other non-protein molecules.
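System size and molecular composition can both be read from a .gro file: the second line holds the number of atoms, and each atom record stores the residue name in columns 6 to 10 of a fixed-width layout. The sketch below relies on that layout; the residue-name vocabularies and category labels are hypothetical simplifications, not the classification used for the UpSet plot.

```python
# Hypothetical, deliberately small residue-name vocabularies (assumption).
CATEGORIES = {
    "protein": {"ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
                "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"},
    "nucleic acid": {"DA", "DC", "DG", "DT", "A", "C", "G", "U"},
    "lipid": {"POPC", "POPE", "DPPC", "DMPC", "CHOL"},
    "water": {"SOL", "HOH", "WAT", "TIP3"},
    "ion": {"NA", "CL", "K", "MG", "CA"},
}


def parse_gro(text):
    """Return (number of atoms, set of residue names) from .gro file content."""
    lines = text.splitlines()
    n_atoms = int(lines[1])  # line 2 holds the atom count
    # Atom records use fixed columns; the residue name sits in columns 6-10.
    residue_names = {line[5:10].strip() for line in lines[2:2 + n_atoms]}
    return n_atoms, residue_names


def composition(residue_names):
    """Map residue names to coarse categories (unmatched names become 'other')."""
    found = set()
    for name in residue_names:
        matches = [cat for cat, names in CATEGORIES.items() if name in names]
        found.update(matches or ["other"])
    return frozenset(found)


def gro_line(resid, resname, atomname, atomid, x, y, z):
    """Format one atom record following the .gro fixed-width layout."""
    return f"{resid:5d}{resname:<5s}{atomname:>5s}{atomid:5d}{x:8.3f}{y:8.3f}{z:8.3f}"


# Usage on a toy four-atom system.
toy_gro = "\n".join([
    "Toy system",
    "    4",
    gro_line(1, "ALA", "CA", 1, 0.1, 0.2, 0.3),
    gro_line(2, "GLY", "CA", 2, 0.4, 0.5, 0.6),
    gro_line(3, "SOL", "OW", 3, 1.0, 1.0, 1.0),
    gro_line(4, "NA", "NA", 4, 2.0, 2.0, 2.0),
    "   3.00000   3.00000   3.00000",
])
n_atoms, residues = parse_gro(toy_gro)
system_type = composition(residues)
```

Counting many systems by the frozenset returned from `composition()` (for example with `collections.Counter`) gives the kind of intersection counts an UpSet plot displays.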

Content analysis of .mdp files.

(A) Cumulative distribution of .mdp files as a function of simulation time, for all-atom and coarse-grained simulations. (B) Sankey diagram of the distribution of thermostat and barostat settings. (C) Temperature distribution: full scale (upper panel) and zoomed-in view (lower panel).
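The quantities in this figure map onto standard .mdp options: simulation time is nsteps × dt (dt in ps), the thermostat and barostat are set by tcoupl and pcoupl, and reference temperatures by ref_t (one value per coupling group). A minimal parser under these assumptions might look like this; it is a sketch, not the MDverse analysis code, and it does not validate option names or values.

```python
def parse_mdp(text):
    """Parse 'key = value' lines of a .mdp file, ignoring ';' comments."""
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.split(";", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # GROMACS treats '-' and '_' in option names as equivalent (ref-t / ref_t).
        params[key.strip().lower().replace("-", "_")] = value.strip()
    return params


def summarize_mdp(params):
    """Extract simulation time, thermostat, barostat and reference temperatures."""
    nsteps = int(params.get("nsteps", "0"))
    dt_ps = float(params.get("dt", "0.001"))  # GROMACS default time step: 0.001 ps
    return {
        # nsteps = -1 means "no maximum", so no finite duration exists.
        "simulation_time_ns": nsteps * dt_ps / 1000.0 if nsteps >= 0 else None,
        # Lowercase values so spelling variants such as V-rescale/v-rescale group together.
        "thermostat": params.get("tcoupl", "no").lower(),
        "barostat": params.get("pcoupl", "no").lower(),
        "temperatures_K": [float(t) for t in params.get("ref_t", "").split()],
    }


# Usage on a toy production-run input.
example_mdp = """
; production run (toy example)
integrator = md
dt         = 0.002
nsteps     = 50000000   ; 50,000,000 x 0.002 ps = 100 ns
tcoupl     = V-rescale
tc-grps    = Protein Non-Protein
ref-t      = 310 310
pcoupl     = Parrinello-Rahman
"""
summary = summarize_mdp(parse_mdp(example_mdp))
```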

Snapshots of the MDverse data explorer, a prototype search engine to explore collected files and datasets.

(A) General view of the web application. (B) Focus on the .mdp and .gro file datasets, exported as .tsv files. The web application also links each file to its original repository.

Tables

Table 1
Statistics of the MD-related datasets and files found in the data repositories Figshare, OSF, and Zenodo.
Data repository  Datasets  First dataset  Latest dataset  Files   Total size (GB)  Zip files  Files within zip  Total files
Zenodo           1011      19/11/2014     05/03/2023      20,250  12,851           1780       141,304           161,554
Figshare         913       20/08/2012     03/03/2023      3336    736              590        74,720            78,056
OSF              55        24/05/2017     05/02/2023      6146    495              14         0                 6146
Total            1979      -              -               29,732  14,082           2384       216,024           245,756

Dates are given as DD/MM/YYYY.
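The columns of Table 1 appear to be related by simple arithmetic: for each repository, total files equal files plus files within zip archives, and the Total row sums each column over the three repositories. The short check below is a sketch using values transcribed from the table; the dictionary keys are hypothetical names chosen for readability.

```python
# Figures transcribed from Table 1 (dates omitted).
TABLE_1 = {
    "Zenodo": {"datasets": 1011, "files": 20250, "size_gb": 12851,
               "zip_files": 1780, "files_within_zip": 141304, "total_files": 161554},
    "Figshare": {"datasets": 913, "files": 3336, "size_gb": 736,
                 "zip_files": 590, "files_within_zip": 74720, "total_files": 78056},
    "OSF": {"datasets": 55, "files": 6146, "size_gb": 495,
            "zip_files": 14, "files_within_zip": 0, "total_files": 6146},
}
TOTAL_ROW = {"datasets": 1979, "files": 29732, "size_gb": 14082,
             "zip_files": 2384, "files_within_zip": 216024, "total_files": 245756}


def rows_are_consistent(table):
    """Check that total files = files + files within zip for every repository."""
    return all(row["files"] + row["files_within_zip"] == row["total_files"]
               for row in table.values())


def totals_match(table, total_row):
    """Check that each column of the Total row is the sum over repositories."""
    return all(sum(row[column] for row in table.values()) == expected
               for column, expected in total_row.items())


print(rows_are_consistent(TABLE_1), totals_match(TABLE_1, TOTAL_ROW))  # True True
```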

Additional files

Cite this article

Johanna KS Tiemann, Magdalena Szczuka, Lisa Bouarroudj, Mohamed Oussaren, Steven Garcia, Rebecca J Howard, Lucie Delemotte, Erik Lindahl, Marc Baaden, Kresten Lindorff-Larsen, Matthieu Chavent, Pierre Poulain (2024) MDverse, shedding light on the dark matter of molecular dynamics simulations. eLife 12:RP90061. https://doi.org/10.7554/eLife.90061.3