The Neurodata Without Borders ecosystem for neurophysiological data science

  1. Oliver Rübel (corresponding author)
  2. Andrew Tritt
  3. Ryan Ly
  4. Benjamin K Dichter
  5. Satrajit Ghosh
  6. Lawrence Niu
  7. Pamela Baker
  8. Ivan Soltesz
  9. Lydia Ng
  10. Karel Svoboda
  11. Loren Frank
  12. Kristofer E Bouchard (corresponding author)
  1. Scientific Data Division, Lawrence Berkeley National Laboratory, United States
  2. Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, United States
  3. CatalystNeuro, United States
  4. McGovern Institute for Brain Research, Massachusetts Institute of Technology, United States
  5. Department of Otolaryngology - Head and Neck Surgery, Harvard Medical School, United States
  6. MBF Bioscience, United States
  7. Allen Institute for Brain Science, United States
  8. Department of Neurosurgery, Stanford University, United States
  9. Janelia Research Campus, Howard Hughes Medical Institute, United States
  10. Kavli Institute for Fundamental Neuroscience, United States
  11. Departments of Physiology and Psychiatry, University of California, San Francisco, United States
  12. Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, United States
  13. Helen Wills Neuroscience Institute and Redwood Center for Theoretical Neuroscience, University of California, Berkeley, United States
  14. Weill Neurohub, United States
15 figures, 4 tables and 1 additional file

Figures

NWB addresses the massive diversity of neurophysiology data and metadata.

(a) Diversity of experimental systems: species and tasks: (i) mice performing a visual discrimination task; (ii) rats performing a memory-guided navigation task; (iii) humans speaking …

NWB enables unified description and storage of multimodal raw and processed data.

(a) Example pipelines for extracellular electrophysiology and optical physiology demonstrate how NWB facilitates data processing. For extracellular electrophysiology (top), raw acquired data is …

The NWB software architecture modularizes and integrates all components of a data language.

(a) Illustration of the main components of the NWB software stack consisting of: (i) the specification language (light blue) to describe data standards, (ii) the data standard schema (lilac), which …

NWB enables creation and sharing of extensions to incorporate new use cases.

(a) Schematic of the process of creating a new neurodata extension (NDX), sharing it, and integrating it with the core NWB data standard. Users first identify the need for a new data type, such as …

NWB is foundational for the DANDI data repository to enable collaborative data sharing.

The DANDI project makes data and software for cellular neurophysiology FAIR. DANDI stores electrical and optical cellular neurophysiology recordings and associated MRI and/or optical imaging data. …

NWB is integrated with state-of-the-art analysis tools throughout the data life cycle.

NWB technologies are at the heart of the neurodata lifecycle and applications. Data standards are a critical conduit that facilitate the flow of data throughout the data lifecycle and integration of …

NWB together with DANDI provides an accessible approach for FAIR sharing of neurophysiology data.

The table above assesses various approaches for sharing neurophysiology data with regard to their compliance with FAIR data principles. Here, cells shown in gray/green indicate non-compliance and …

Coordinated community engagement, governance, and development of NWB.

(a) NWB is open source with all software and documents available online via GitHub and the nwb.org website. NWB provides a broad range of online community resources, e.g., Slack, online Help Desk, …

Appendix 1—figure 1
Visualization of the stimulus (bottom) and response (top) signals recorded via intracellular electrophysiology and stored in NWB.
Appendix 1—figure 2
Visualization of the intracellular electrophysiology file using NWBWidgets.
Appendix 2—figure 1
Files and folders generated by the cookiecutter ndx-template.

The main folder contains the license and readme file for the extension along with files required for installing the extension (e.g., setup.py, setup.cfg, MANIFEST.in, and requirements.txt) as well a …

Appendix 3—figure 1
Illustration of the process for creating, publishing, and updating extensions (and their catalog records) via the Neurodata Extension Catalog (NDX Catalog).

Boxes shown in gray indicate Git repositories; boxes in orange describe user actions; and boxes in blue indicate actions by administrators of the NDX catalog.

Appendix 3—figure 2
Example ndx-meta.yaml metadata record for the ndx-simulation-output extension.
Appendix 4—figure 1
Overview of select data analysis, visualization, and management tools that support NWB.

Visualization showing select data analysis, visualization, and management tools that support NWB organized by their main application (x-axis) and programming environment (y-axis).

Appendix 7—figure 1
Software Release Process and History.

Overview of the release history of the PyNWB, HDMF, and MatNWB APIs and the NWB and hdmf-common data standard schema.

Tables

Appendix 5—table 1
Compliance of NWB+DANDI with FAIR principles: Findability.
Findable
F1. (Meta)data are assigned a globally unique and persistent identifier
F2. Data are described with rich metadata (defined by R1 below)
F3. Metadata clearly and explicitly include the identifier of the data they describe
F4. (Meta)data are registered or indexed in a searchable resource
Custom No No No
  • N/A. This is a key function of data archives and management systems

Zarr No
  • Self-describing, structural metadata (e.g., data type, array shape etc.) only

  • Scientific (meta)data is fully user defined

HDF5 No
NIX
  • UUIDs are assigned to all objects

  • Self-describing, structural metadata (uses HDF5)

  • Generic data model (i.e., scientific (meta)data is user-defined)

NWB 1.0 No
  • Yes, but the schema language was not formally defined

  • Similar to NWB 2.x, but the much more flexible schema (including inclusion of arbitrary data) often led to non-compliance

NWB 2.x
  • UUIDs are assigned to all objects

  • External file identifier can be stored in the identifier field

  • Rich schema for neurophysiology (meta)data

  • Self-describing, structural metadata (uses HDF5) constrained by the standard schema

  • Metadata is either directly associated with or explicitly linked to by the corresponding objects

DANDI
  • All dandisets and assets carry unique and persistent identifiers

  • Uses NWB and other modern data standards

  • Provides its own Dandiset schema for metadata about whole data collections

  • Yes, persistent identifiers used by the archive are included with the metadata

  • DANDI is a public archive that features rich search features over publicly shared data
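
The NWB 2.x entries above note that UUIDs are assigned to all objects and that an external file identifier can be stored in the identifier field. The following is a minimal sketch of that idea using only the Python standard library. The names `DataObject`, `DataFile`, `add`, and `get_by_id`, and the identifier value, are hypothetical and chosen for illustration; this is a toy model, not the NWB API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Toy container: every object receives its own UUID on creation."""
    name: str
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class DataFile:
    """Toy file: carries an externally supplied identifier plus child objects."""
    identifier: str  # hypothetical external ID, e.g. issued by a lab database
    objects: dict = field(default_factory=dict)

    def add(self, obj: DataObject) -> DataObject:
        # Index by UUID so the object can be found independently of its name.
        self.objects[obj.object_id] = obj
        return obj

    def get_by_id(self, object_id: str) -> DataObject:
        return self.objects[object_id]


f = DataFile(identifier="lab-session-0001")  # hypothetical external identifier
ts = f.add(DataObject(name="membrane_potential"))
same = f.get_by_id(ts.object_id)
```

Because the UUID is generated once and stored with the object, a reference to it stays valid even if the object is renamed, which is the property the table contrasts with non-persistent file/object paths.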

Appendix 5—table 2
Compliance of NWB+DANDI with FAIR principles: Accessibility.
Accessible
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1 The protocol is open, free, and universally implementable
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
Custom No No
  • N/A. This is a key function of data archives and management systems

Zarr
  • Non-persistent file/object paths only

  • Yes, but Python-only API

  • Long-term support is not clear

  • N/A. This is a key function of data archives and management systems

  • Encryption of files is possible via external tools

  • HDF5/Zarr could support encryption of data elements via I/O filters

HDF5
  • Portable format with broad support across programming languages and compute systems

  • Intended for long-term support

NIX
  • Yes

  • Uses HDF5

  • NIX API for C++, Matlab, Python, and Java

  • Open source

NWB 1.0
  • Non-persistent file/object paths only (same as HDF5)

  • Yes, but schema language was not formally defined and available APIs were limited

NWB 2.x
  • Yes. Objects retrievable based on UUID and path.

  • Uses HDF5

  • NWB API in Python and Matlab

  • Open source

DANDI
  • Uses NWB

  • Metadata is exported as JSON/JSON-LD alongside the data

  • REST API, Python, CLI, DataLad, ROS3 HDF5

  • Uses standard protocols (e.g., REST API)

  • Supports integration with external services

  • Supports user authentication and authorized access to all Dandisets, assets and other DANDI resources

  • Searchable on the archive and exposed as LinkedData

Appendix 5—table 3
Compliance of NWB+DANDI with FAIR principles: Interoperability.
Interoperable
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (Meta)data use vocabularies that follow FAIR principles
I3. (Meta)data include qualified references to other (meta)data
Custom No No No
Zarr No No No
HDF5 No No No
NIX
  • Uses odML

  • Uses HDF5

  • User defined

  • User defined

NWB 1.0
  • Uses custom schema definition in Python

  • Data follows the NWB 1.0 schema

  • Partially. NWB 2.x significantly enhanced support for linking of metadata with data.

NWB 2.x
  • Schema defined in JSON/YAML using json-schema

  • NWB and extension schema are available with NWB files and online

  • Uses HDF5

  • Data follows the NWB schema

  • NWB supports use of ontologies via linking to external resources*

  • The NWB schema explicitly models links between (meta)data

  • NWB supports linking to external resources*

DANDI
  • Uses NWB, JSON + json-schema, JSON-LD

  • Uses NWB and other FAIR ontologies

  • schema.org, spdx.org (licenses), PROV

  * Support for external resources has been released in HDMF >2.3 and is currently undergoing community review for integration with the NWB core data standard.
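
The NWB 2.x entries above state that the schema is defined in JSON/YAML and that data follow that schema. As a rough illustration of what a declarative, machine-readable type description buys, here is a minimal sketch using only the Python standard library. The specification keys (`type_name`, `attributes`, `required`), the `ToySeries` type, and the `check` helper are hypothetical simplifications invented for this example; they are not the NWB specification language or its validator.

```python
import json

# Hypothetical, heavily simplified type specification serialized as JSON.
SPEC_JSON = """
{
  "type_name": "ToySeries",
  "doc": "A series of samples with a unit.",
  "attributes": [
    {"name": "unit", "dtype": "text", "required": true},
    {"name": "rate", "dtype": "float", "required": false}
  ]
}
"""

# Map the toy dtype names onto Python types for checking.
PY_TYPES = {"text": str, "float": float}


def check(obj: dict, spec: dict) -> list:
    """Return a list of violations; an empty list means the object conforms."""
    errors = []
    for attr in spec["attributes"]:
        name = attr["name"]
        if name not in obj:
            if attr["required"]:
                errors.append(f"missing required attribute '{name}'")
            continue
        if not isinstance(obj[name], PY_TYPES[attr["dtype"]]):
            errors.append(f"attribute '{name}' should be {attr['dtype']}")
    return errors


spec = json.loads(SPEC_JSON)
ok = check({"unit": "volts", "rate": 30000.0}, spec)
bad = check({"rate": "fast"}, spec)
```

Keeping the specification as data rather than code means the same description can travel with a file and be read by tools written in any language, which is the interoperability point the table makes.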

Appendix 5—table 4
Compliance of NWB+DANDI with FAIR principles: Reusability.
Reusable
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (Meta)data are released with a clear and accessible data usage license
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards
Custom No
  • N/A. Usage licences are typically managed by data archives

No No
Zarr No No No
HDF5 No No No
NIX
  • User defined

No
  • User defined

NWB 1.0
  • Yes

  • Yes. NWB 2.x further refined this significantly

  • Yes

NWB 2.x
  • Yes

  • Includes detailed metadata about publications, experimenters, devices, subjects etc.

  • Derived data (e.g., ROIs) link to the source data

  • Yes, NWB provides detailed, neurophysiology-specific data schema

DANDI
  • Uses NWB and defined dandiset schema

  • All data in DANDI is published with a clear data usage licence

  • Dandisets support detailed metadata about the data generation

  • Dandisets are versioned

  • Uses NWB
