Combining hypothesis- and data-driven neuroscience modeling in FAIR workflows

  1. Olivia Eriksson (corresponding author)
  2. Upinder Singh Bhalla
  3. Kim T Blackwell
  4. Sharon M Crook
  5. Daniel Keller
  6. Andrei Kramer
  7. Marja-Leena Linne
  8. Ausra Saudargienė
  9. Rebecca C Wade
  10. Jeanette Hellgren Kotaleski (corresponding author)
  1. Science for Life Laboratory, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Sweden
  2. National Center for Biological Sciences, Tata Institute of Fundamental Research, India
  3. Department of Bioengineering, Volgenau School of Engineering, George Mason University, United States
  4. School of Mathematical and Statistical Sciences, Arizona State University, United States
  5. Blue Brain Project, École Polytechnique Fédérale de Lausanne, Switzerland
  6. Department of Neuroscience, Karolinska Institute, Sweden
  7. Faculty of Medicine and Health Technology, Tampere University, Finland
  8. Neuroscience Institute, Lithuanian University of Health Sciences, Lithuania
  9. Department of Informatics, Vytautas Magnus University, Lithuania
  10. Molecular and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies (HITS), Germany
  11. Center for Molecular Biology (ZMBH), ZMBH-DKFZ Alliance, University of Heidelberg, Germany
  12. Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Germany
5 figures and 4 tables

Figures

Schematic illustration of different types of models and their relative relationships.

A subset of neuroscience models is visualized based on four criteria: (A) biological scale, (B) level of abstraction, and the degree to which (C) a hypothesis-driven or (D) a data-driven approach has been used for the model construction. The illustration includes examples from the text as well as several classical models: Albus, 1971 (hypothesis-driven phenomenological model of the cerebellar circuitry as a pattern recognition system); Bienenstock et al., 1982 (algebraic bidirectional synaptic plasticity rule); Bruce et al., 2019b (mechanistic molecular model of the regulation of adenylyl cyclase 5 by G-proteins in corticostriatal synaptic plasticity induction); Dreyer et al., 2010 (data-driven mechanistic model of tonic and phasic dopamine release and activation of receptors in the dorsal striatum); Frémaux et al., 2013 (hypothesis-driven phenomenological model of a reward-modulated spike-timing-dependent learning rule for an actor-critic network); Gurney et al., 2015 (basal ganglia model with corticostriatal reinforcement learning using data-driven synaptic plasticity rules); Hayer and Bhalla, 2005 (data-driven biochemical bidirectional synaptic plasticity model involving calcium/calmodulin-dependent protein kinase II [CaMKII]); Hodgkin and Huxley, 1952 (classic biophysical model of action potentials); Knight, 1972 (hypothesis-driven phenomenological model of stimulus encoding into a neuronal population); Markram et al., 2015 (large-scale digital reconstruction of somatosensory cortex microcircuitry); Marr, 1969 (cerebellar algorithm model); Traub et al., 1994 (biophysical hippocampal CA3 neuron model showing the importance of dendritic ion channels).

The modeling process.

Model development starts with assembling the information that is the foundation of the modeling study, such as the relevant experimental literature, published models, and additional experimental results (box 1). This is specified in a structured model and data format (box 2a), and the model is next refined and updated (box 2b). The model refinement step is an iterative process, where the model is simulated a large number of times to update the model parameters so that the model captures specific experimental data (quantitatively) or phenomena (qualitatively). Once developed, the model can be exploited for a variety of applications (box 3).
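The iterative refinement step (box 2b) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy one-parameter model, the synthetic target value, and the simple accept-if-better random search are hypothetical stand-ins for the simulators and optimizers a real study would use.

```python
import random


def simulate(rate: float, steps: int = 50, dt: float = 0.1) -> float:
    """Toy model: forward-Euler integration of dx/dt = rate * (1 - x), x(0) = 0."""
    x = 0.0
    for _ in range(steps):
        x += dt * rate * (1.0 - x)
    return x


def refine(target: float, rate: float, iterations: int = 2000, seed: int = 0) -> float:
    """Simulate repeatedly and keep parameter perturbations that reduce the misfit."""
    rng = random.Random(seed)
    best_error = abs(simulate(rate) - target)
    for _ in range(iterations):
        candidate = max(0.0, rate + rng.gauss(0.0, 0.05))
        error = abs(simulate(candidate) - target)
        if error < best_error:
            rate, best_error = candidate, error
    return rate


target = simulate(0.3)             # synthetic "data" generated with rate = 0.3
fitted = refine(target, rate=1.0)  # start from a deliberately poor initial guess
print(f"fitted rate = {fitted:.3f}")
```

The loop captures only the quantitative case, in which the model is fitted to a numerical target; matching a qualitative phenomenon would require a different score function.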

Abstract and mechanistic versions of the Bienenstock–Cooper–Munro (BCM) rule model.

(A) Original version where the rate of plasticity change, ϕ, is a function of the stimulus strength c. At the threshold ΘM the sign of synaptic change flips from negative to positive. (B) Simplified mechanistic (chemical) model based on known pathways that could implement the BCM rule. The calcium stimulus activates both a kinase, CaMKII, and a phosphatase, CaN. These act in an antagonistic manner on the AMPA receptor, leading to its dephosphorylation and removal from the synapse when the calcium concentration, [Ca2+], is moderate, but insertion when [Ca2+] is high. The output from the model is p_AMPAR, the phospho-form of AMPAR, which is inserted into the membrane. (C) Simulated response of the model in Panel B, measured as phosphorylated receptor p_AMPAR. This curve has the same shape as the abstract BCM curve model in Panel A, including a threshold level of [Ca2+] = ΘM at which the synaptic change as measured by AMPA phosphorylation changes sign from negative to positive. The basal fraction of p_AMPAR is 0.4. Model from accession 96 on DOQCS (see Table 1, Row 15).
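The sign flip at the threshold in Panel A can be illustrated with a minimal sketch. The quadratic form ϕ(c) = c(c − ΘM) is a commonly used simplification that is assumed here for illustration; it is not the specific curve plotted in the figure, and the function name and threshold value are hypothetical.

```python
def bcm_phi(c: float, theta_m: float) -> float:
    """Illustrative BCM-style plasticity rate.

    Assumption: phi(c) = c * (c - theta_m), which is negative below the
    threshold theta_m, zero at the threshold, and positive above it.
    """
    return c * (c - theta_m)


theta_m = 1.0  # hypothetical threshold in arbitrary units
for c in (0.0, 0.5, 1.0, 1.5):
    print(f"c = {c:.1f}  phi = {bcm_phi(c, theta_m):+.2f}")
```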

Workflow for uncertainty quantification.

An important aspect of data-driven, mechanistic modeling is to describe the uncertainty in the parameter estimates (inverse uncertainty quantification), that is, to find and describe the parameter space that provides a good fit with the selected data. This is often done through Bayesian methodology, starting from a model structure, some quantitative data that can be mapped to the output from the model, and prior information on the parameters (such as assumed ranges or distributions). The parameter space retrieved from this process is referred to as the posterior parameter distribution. This uncertainty is propagated (forward uncertainty propagation) to the predictions that we make from the model by performing simulations from a sample of parameters representing the possible parameter distribution (corresponding to an ‘ensemble model’). Finally, a global sensitivity analysis can be performed based on the posterior distribution (Eriksson et al., 2019). Global sensitivity analyses are also often performed in other settings (not shown here) directly on a preassumed parameter distribution. Figure modified from Eriksson et al., 2019.
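The first two stages of this workflow can be sketched on a toy problem. The example below is an assumption-laden illustration, not the method of Eriksson et al., 2019: it uses a one-parameter exponential decay model, synthetic data, a uniform prior, and rejection-based approximate Bayesian computation as a simple stand-in for the Bayesian step. The global sensitivity analysis is omitted.

```python
import math
import random


def simulate(k: float, times: list[float]) -> list[float]:
    """Toy model: exponential decay y(t) = exp(-k * t)."""
    return [math.exp(-k * t) for t in times]


def distance(a: list[float], b: list[float]) -> float:
    """Root-mean-square difference between two traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


rng = random.Random(0)
times = [0.0, 0.5, 1.0, 2.0, 4.0]
data = simulate(0.8, times)  # synthetic "experimental" data, hypothetical true k = 0.8

# Inverse uncertainty quantification: draw from the prior and keep the
# parameter values whose simulated output lies close to the data.
prior_draws = [rng.uniform(0.0, 3.0) for _ in range(20000)]
posterior = [k for k in prior_draws if distance(simulate(k, times), data) < 0.02]

# Forward uncertainty propagation: predict an unmeasured time point with
# every accepted parameter value (the "ensemble model").
predictions = sorted(math.exp(-k * 3.0) for k in posterior)
low = predictions[int(0.025 * len(predictions))]
high = predictions[int(0.975 * len(predictions))]
print(f"accepted {len(posterior)} of {len(prior_draws)} prior draws")
print(f"95% interval for y(3): [{low:.3f}, {high:.3f}]")
```

The accepted sample plays the role of the posterior parameter distribution, and the interval on the prediction shows how parameter uncertainty carries over to model output.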

Toward FAIR (Findable, Accessible, Interoperable, Reusable) workflows in neuroscience.

Three examples of workflows at different biological scales. Several of the workflow components can be found in the tables of the article as indicated by T (Table) and R (Row) in the figure, for example T1,R4. (a) To build a single neuron multicompartmental electrical model, morphologies from databases such as NeuroMorpho or Allen Brain Atlas (http://celltypes.brain-map.org/) can be used (Ascoli et al., 2007; Gouwens et al., 2020). Electrical recordings from Allen Brain Atlas are also available. Features from experimental traces can be extracted using eFEL (https://github.com/BlueBrain/eFEL), and kinetic parameters for ion channel models obtained from Channelpedia. During the neuron model reconstruction step, the model is represented in NeuroML and model parameters such as conductance density are optimized with BluePyOpt. In silico experiments can be performed using NEURON. (b) Example workflow for building a chemical kinetic model to implement the Bienenstock–Cooper–Munro (BCM) curve. The expected shape of the curve is obtained from the classic Bienenstock et al., 1982 study, and a first pass of likely chemical pathways from Lisman, 1989. Detailed chemistry and parameters are from databases that cover models (DOQCS, BioModels) and from BRENDA, which hosts enzyme kinetics. Both model databases support SBML, which can be used to define the model and parameters. Several simulators including MOOSE and COPASI can run the SBML model during the optimization step and subsequently for model predictions. For optimization, the FindSim framework (Viswan et al., 2018) compares model outcome to experiments. The score from these comparisons is used by HOSS (Hierarchical Optimization of Systems Simulations, https://github.com/upibhalla/HOSS) to carry out parameter fitting. (c) An example of a workflow used in subcellular model building with Bayesian parameter estimation. A prototype of the workflow was used in Church et al., 2021, where the experimental data is described. The model structure was adopted from Buxbaum and Dudai, 1989. A smaller demo version can be found at https://github.com/icpm-kth/uqsa (copy archived at https://doi.org/10.5281/zenodo.6625529). Experimental data and model structure are taken from literature and saved in the SBtab format. Scripts written in R convert the SBtab file to R code and the Bayesian parameter estimation is performed with the UQSA software written in R (https://github.com/icpm-kth/uqsa, copy archived at https://doi.org/10.5281/zenodo.6625529). The SBtab files can subsequently be updated with the refined model with new parameter estimates including uncertainties.
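The feature extraction step in workflow (a) can be illustrated with a minimal sketch. This is not eFEL's API: the function below is a hypothetical stand-in that counts upward threshold crossings in a synthetic voltage trace, and the threshold, baseline, and spike shape are all assumed values chosen for illustration.

```python
import math


def count_spikes(voltage_mv: list[float], threshold_mv: float = -20.0) -> int:
    """Count upward crossings of the threshold in a voltage trace."""
    crossings = 0
    for previous, current in zip(voltage_mv, voltage_mv[1:]):
        if previous < threshold_mv <= current:
            crossings += 1
    return crossings


# Synthetic trace: a -65 mV baseline with three brief depolarizations
# (hypothetical data, sampled every 0.1 ms for 100 ms).
dt_ms = 0.1
spike_times_ms = (20.0, 50.0, 80.0)
trace = []
for i in range(1000):
    t = i * dt_ms
    bumps = sum(100.0 * math.exp(-((t - s) ** 2) / 0.5) for s in spike_times_ms)
    trace.append(-65.0 + bumps)

print(count_spikes(trace))
```

A feature such as this spike count, computed for both the recorded and the simulated trace, gives the optimizer a compact target to match instead of the full waveform.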

Tables

Table 1
Databases in cellular neuroscience and systems biology.

These are some of the commonly used databases for creating and constraining models at the intracellular and cellular scale.

| # | Database, alphabetically | Purpose/focus | Reference | Homepage |
|---|---|---|---|---|
| | Computational models | | | |
| 1 | BioModels Database | Physiologically and pharmaceutically relevant mechanistic models in standard formats | Li et al., 2010 | http://www.ebi.ac.uk/biomodels/ |
| 2 | MoDEL Central Nervous System | Atomistic-MD trajectories for relevant signal transduction proteins | | http://mmb.irbbarcelona.org/MoDEL-CNS/ |
| 3 | NeuroML-DB | Models of channels, cells, circuits, and their properties and behavior | Birgiolas et al., 2015 | https://neuroml-db.org/ |
| 4 | NeuroElectro | Extract and compile from literature electrophysiological properties of diverse neuron types | Tripathy et al., 2014 | https://neuroelectro.org |
| 5 | ModelDB | Computational neuroscience model | McDougal et al., 2017 | https://senselab.med.yale.edu/modeldb/ |
| 6 | Ion Channel Genealogy | Ion channel models | Podlaski et al., 2017 | https://icg.neurotheory.ox.ac.uk/ |
| | Experimental data | | | |
| 7 | Allen Brain Atlas | Human and mouse brain data | Lein et al., 2007 | http://www.brain-map.org/ |
| 8 | BRENDA | Enzyme kinetic data | Chang et al., 2021 | https://www.brenda-enzymes.org/ |
| 9 | CRCNS - Collaborative Research in Computational Neuroscience | Forum for sharing tools and data for testing computational models and new analysis methods | Teeters et al., 2008 | https://CRCNS.org |
| 10 | NeuroMorpho | Neuronal cell 3D reconstructions | Ascoli et al., 2007 | http://neuromorpho.org/ |
| 11 | Protein Data Bank (PDB) | 3D structures of proteins, nucleic acids, and complex assemblies | wwPDB consortium, 2019 | http://www.wwpdb.org/ |
| 12 | Sabio-RK | Curated database on biochemical reactions, kinetic rate equations with parameters and experimental conditions | Wittig et al., 2012 | http://sabio.h-its.org/ |
| 13 | Yale Protein Expression Database (YPED) | Proteomic and small molecules | Colangelo et al., 2019 | https://medicine.yale.edu/keck/nida/yped/ |
| | Experimental data and models | | | |
| 14 | Channelpedia | Ion channel data and channel models | Ranjan et al., 2011 | https://channelpedia.epfl.ch/ |
| 15 | DOQCS (The Database of Quantitative Cellular Signaling) | Kinetic data for signaling molecules and interactions | Sivakumaran et al., 2003 | http://doqcs.ncbs.res.in/ |
| 16 | EBRAINS (including EBRAINS Knowledge Graph) | Digital research infrastructure that gathers data, models and tools for brain-related research | | https://ebrains.eu (https://search.kg.ebrains.eu) |
| 17 | FAIRDOMHub | The FAIRDOMHub is a repository for publishing FAIR Data, Operating procedures and Models for the Systems Biology community | Wolstencroft et al., 2017 | https://fairdomhub.org/ |
| 18 | Open Source Brain | A resource for sharing and collaboratively developing computational models of neural systems | Gleeson et al., 2019 | https://www.opensourcebrain.org/ |

Table 2
Model standards and file formats in cellular neuroscience and systems biology.

The formats described in this table allow standardized representation of models and their porting across simulation platforms.

| # | Name | Purpose | Webpage | Reference |
|---|---|---|---|---|
| | Formats for intracellular models in systems biology | | | |
| 1 | SBML | Systems biology markup language, for storing and sharing models | https://sbml.org | Hucka et al., 2003 |
| 2 | SBtab | Systems Biology tables, for storing models (and data for parameter estimation) in spreadsheet form | https://sbtab.net/ | Lubitz et al., 2016 |
| 3 | CellML | Store and exchange mathematical models, primarily in Biology | https://www.cellml.org/ | Kohl et al., 2001 |
| | Formats for cellular and network-level models in Neuroscience | | | |
| 4 | NeuroML | An XML-based description language that provides a common data format for defining and exchanging descriptions of neuronal cell and network models | http://www.neuroml.org/ | Gleeson et al., 2010 |
| 5 | NineML | Unambiguous description of neuronal network models | https://github.com/INCF/nineml-spec | Raikov et al., 2011 |
| 6 | NestML | Domain-specific language for the specification of neuron models (python) | https://github.com/nest/nestml | Plotnikov et al., 2016 |
| | Custom formats for specific simulators | | | |
| 7 | sbproj | Simbiology Project file | https://se.mathworks.com/products/simbiology.html | Schmidt and Jirstrand, 2006 |
| 8 | COPASI project file | COPASI native format for models and simulations | http://copasi.org/ | Hoops et al., 2006 |
| 9 | SONATA | Efficient descriptions of large-scale neural networks | https://github.com/AllenInstitute/sonata | Dai et al., 2020 |
| 10 | JSON (HillTau) | JSON files for FindSim and HillTau model reduction method | https://github.com/BhallaLab/HillTau | Bhalla, 2020 |
| 11 | MOD | Expanding NEURON’s repertoire of mechanisms with NMODL | https://www.neuron.yale.edu/neuron/ | Hines and Carnevale, 2000 |
| | Formats for specification of parameter estimation problems | | | |
| 12 | SBtab | Systems Biology tables, for storing both models and data for parameter estimation in spreadsheet form | https://sbtab.net/ | Lubitz et al., 2016 |
| 13 | PEtab | Interoperable specification of parameter estimation problems in systems biology | https://github.com/PEtab-dev/PEtab | Schmiester et al., 2021 |
| | Formats for specification of experiments and data | | | |
| 14 | SED-ML | Simulation Experiment Description Markup Language | https://sed-ml.org | Waltemath et al., 2011 |

Table 3
Software for model simulation.

These tools span a wide range of scales and levels of abstraction.

| # | Name | Purpose | Interchange file formats supported | Homepage | RRID | Reference |
|---|---|---|---|---|---|---|
| | Molecular level | | | | | |
| 1 | BioNetGen | Rule-based modeling framework (NFsim) | BNGL, SBML | http://bionetgen.org/ | | Harris et al., 2016 |
| 2 | COPASI | Biochemical system simulator | SBML | http://copasi.org/ | SCR_014260 | Hoops et al., 2006 |
| 3 | IQM Tools | Systems Biology modeling toolbox in MATLAB; successor to SBPOP | SBML | https://iqmtools.intiquan.com/ | | |
| 4 | MCell | Simulation tool for modeling the movements and reactions of molecules within and between cells by using spatially realistic 3D cellular models and specialized Monte Carlo algorithms | SBML | https://mcell.org/ | SCR_007307 | Stiles et al., 1996; Stiles and Bartol, 2001; Kerr et al., 2008 |
| 5 | NeuroRD | Stochastic diffusion simulator to model intracellular signaling pathways | XML | http://krasnow1.gmu.edu/CENlab/software.html | SCR_014769 | Oliveira et al., 2010 |
| 6 | Simbiology | MATLAB’s systems biology toolbox (Mathworks) | sbproj | https://www.mathworks.com/products/simbiology.html | | Schmidt and Jirstrand, 2006 |
| 7 | STEPS | Simulation tool for cellular signaling and biochemical pathways to build systems that describe reaction–diffusion of molecules and membrane potential | SBML | http://steps.sourceforge.net/STEPS/default.php | SCR_008742 | Hepburn et al., 2012 |
| 8 | VCell | Simulation tool for deterministic, stochastic, and hybrid deterministic–stochastic models of molecular reactions, diffusion and electrophysiology | SBML, CellML | https://vcell.org/ | SCR_007421 | Schaff et al., 1997 |
| | Cellular level | | | | | |
| 9 | NEURON | Simulation environment to build and use computational models of neurons and networks of neurons; also subcellular simulations with the reaction–diffusion module | SONATA (after conversion) for networks, but can also use NeuroML and SBML | https://neuron.yale.edu/neuron/ | SCR_005393 | Carnevale and Hines, 2009; Hines and Carnevale, 1997 |
| | Network level | | | | | |
| 10 | BRIAN | Simulation tool for spiking neural networks | SONATA | https://briansimulator.org/ | SCR_002998 | Goodman and Brette, 2008; Stimberg et al., 2019 |
| 11 | NEST | Simulation tools for large-scale biologically realistic neuronal networks | SONATA (after conversion) | https://www.nest-initiative.org/ | SCR_002963 | Diesmann et al., 1999 |
| 12 | PyNN | A Common Interface for Neuronal Network Simulators | SONATA | http://neuralensemble.org/PyNN/ | SCR_005393 | Davison et al., 2008 |
| | Multiscale | | | | | |
| 13 | MOOSE | Multiscale object-oriented simulation environment to simulate subcellular components, neurons, circuits, and large networks. | SBML, NeuroML | https://moose.ncbs.res.in/ | SCR_002715 | Ray and Bhalla, 2008 |
| 14 | NetPyNe | Multiscale models for subcellular to large network levels | NeuroML/SONATA | netpyne.org | SCR_014758 | Dura-Bernal et al., 2019 |
| 15 | PottersWheel | Comprehensive modeling framework in MATLAB | | https://potterswheel.de/ | SCR_021118 | Maiwald and Timmer, 2008 |
| 16 | SYCAMORE | Building, simulation, and analysis of models of biochemical systems | SBML | http://sycamore.h-its.org/sycamore/ | SCR_021117 | Weidemann et al., 2008 |
| 17 | The Virtual Brain | Create personalized brain models and simulate multiscale networks | hdf5, Nifti, GIFTI | https://thevirtualbrain.org/ | SCR_002249 | Sanz-Leon et al., 2015 |

Table 4
Tools for model refinement and analysis.

This table exemplifies some common and new tools for parameter estimation and different types of model analyses.

| # | Name | Purpose | Interchange file formats supported | Homepage | RRID | Reference |
|---|---|---|---|---|---|---|
| 1 | Ajustador | Data-driven parameter estimation for Moose and NeuroRD models. Provides parameter distributions. | CSV, MOOSE, NeuroRD | https://neurord.github.io/ajustador/ | | Jedrzejewski-Szmek and Blackwell, 2016 |
| 2 | AMICI | High-level language bindings to CVODE and SBML support. | SBML | https://amici.readthedocs.io/en/latest/index.html | | Fröhlich et al., 2021 |
| 3 | BluePyOpt | Data-driven model parameter optimization. | | https://github.com/BlueBrain/BluePyOpt | SCR_014753 | Van Geit et al., 2016 |
| 4 | PottersWheel | Parameter estimation, profile likelihood: determination of identifiability and confidence intervals for parameters. | SBML | https://potterswheel.de/ | SCR_021118 | Maiwald and Timmer, 2008; Raue et al., 2009 |
| 5 | pyABC | Parameter estimation through Approximate Bayesian Computation (likelihood-free Bayesian approach). | PEtab, SBML via AMICI | https://pyabc.readthedocs.io/en/latest/ | | Klinger et al., 2018 |
| 6 | Simbiology | MATLAB’s systems biology toolbox (Mathworks), performs, for example, parameter estimation, local and global sensitivity analysis, and more. | SBML | https://se.mathworks.com/products/simbiology.html | | Schmidt and Jirstrand, 2006 |
| 7 | Uncertainpy | Global Sensitivity Analysis | NEURON and NEST models | https://github.com/simetenn/uncertainpy | | Tennøe et al., 2018 |
| 8 | XPPAUT/AUTO | Model analysis including phase plane analyses, stability analysis, vector fields, null clines, and more. XPPAUT contains a frontend to AUTO for bifurcation analysis. | | http://www.math.pitt.edu/~bard/xpp/xpp.html | SCR_001996 | Ermentrout, 2002 (XPPAUT) |
| 9 | pyPESTO | Toolbox for parameter estimation | SBML, PEtab | https://github.com/ICB-DCM/pyPESTO/ | SCR_016891 | Stapor et al., 2018 |
| 10 | COPASI | Simulation and analysis of biochemical network models | SBML | https://copasi.org | SCR_014260 | Hoops et al., 2006 |
| 11 | PyBioNetFit | Parameterizing biological models | BNGL, SBML, BPSL | https://bionetfit.nau.edu/ | | Mitra et al., 2019 |
| 12 | Data2Dynamics | Establishing ODE models based on experimental data | SBML | https://github.com/Data2Dynamics/d2d | | Raue et al., 2015 |
| 13 | HippoUnit | Testing scientific models | HOC language | https://github.com/KaliLab/hippounit | | Sáray et al., 2021 |

Cite this article

Olivia Eriksson, Upinder Singh Bhalla, Kim T Blackwell, Sharon M Crook, Daniel Keller, Andrei Kramer, Marja-Leena Linne, Ausra Saudargienė, Rebecca C Wade, Jeanette Hellgren Kotaleski (2022) Combining hypothesis- and data-driven neuroscience modeling in FAIR workflows. eLife 11:e69013. https://doi.org/10.7554/eLife.69013