Reward-based training of recurrent neural networks for cognitive and value-based tasks

  1. H Francis Song
  2. Guangyu R Yang
  3. Xiao-Jing Wang (corresponding author)
  1. New York University, United States

Abstract

Trained neural network models, which exhibit features of neural activity recorded from behaving animals, may provide insights into the circuit mechanisms of cognitive functions through systematic analysis of network activity and connectivity. However, in contrast to the graded error signals commonly used to train networks through supervised learning, animals learn from reward feedback on definite actions through reinforcement learning. Reward maximization is particularly relevant when optimal behavior depends on an animal's internal judgment of confidence or subjective preferences. Here, we implement reward-based training of recurrent neural networks in which a value network guides learning by using the activity of the decision network to predict future reward. We show that such models capture behavioral and electrophysiological findings from well-known experimental paradigms. Our work provides a unified framework for investigating diverse cognitive and value-based computations, and predicts a role for value representation that is essential for learning, but not executing, a task.
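The abstract describes a value network that reads the decision network's activity to predict future reward, and that prediction guides learning. Below is a minimal sketch of one common way to realize this idea: REINFORCE-style policy gradient with a learned value baseline. It is not the authors' implementation. It assumes a single-step two-choice task, a fixed random hidden layer standing in for the recurrent decision network, and linear policy and value readouts. All names (`train`, `W_in`, `W_pi`, `w_v`) and parameter values are hypothetical.

```python
import numpy as np


def train(n_trials=5000, n_hidden=20, lr_pi=0.05, lr_v=0.05, seed=0):
    """Toy REINFORCE with a value baseline on a hypothetical two-choice task."""
    rng = np.random.default_rng(seed)
    W_in = rng.normal(0.0, 1.0, size=(n_hidden, 2))  # fixed random input weights (assumption)
    W_pi = np.zeros((2, n_hidden))  # policy readout: logits for 2 actions
    w_v = np.zeros(n_hidden)        # value readout of the decision-network activity
    rewards = np.zeros(n_trials)

    for t in range(n_trials):
        # Signed stimulus strength; the correct choice is its sign.
        coh = rng.choice([-1.0, 1.0]) * rng.uniform(0.1, 1.0)
        x = np.array([coh, 1.0])        # stimulus plus constant bias input
        h = np.tanh(W_in @ x)           # decision-network activity

        # Sample a definite action from the softmax policy.
        logits = W_pi @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a = rng.choice(2, p=p)
        r = 1.0 if (a == 1) == (coh > 0) else 0.0  # reward only for correct choices

        # The value readout predicts reward from the same activity h.
        v = w_v @ h
        advantage = r - v

        # Policy gradient: grad of log pi(a | h) w.r.t. W_pi is (onehot(a) - p) h^T.
        onehot = np.zeros(2)
        onehot[a] = 1.0
        W_pi += lr_pi * advantage * np.outer(onehot - p, h)

        # Value update: gradient descent on 0.5 * (r - v)^2.
        w_v += lr_v * advantage * h

        rewards[t] = r
    return rewards


rewards = train()
print(rewards[:50].mean(), rewards[-500:].mean())
```

Subtracting the predicted reward `v` from the obtained reward `r` reduces the variance of the policy-gradient update without changing its expected value. In this sketch the value readout is used only to compute updates: choices are made from `W_pi` alone, so a trained policy could run without it. That mirrors, in miniature, the abstract's point that value representation is needed for learning but not for executing a task.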

Article and author information

Author details

  1. H Francis Song
     Center for Neural Science, New York University, New York, United States
  2. Guangyu R Yang
     Center for Neural Science, New York University, New York, United States
  3. Xiao-Jing Wang
     Center for Neural Science, New York University, New York, United States
     For correspondence: xjwang@nyu.edu
     ORCID: 0000-0003-3124-8474

Competing interests

The authors declare that no competing interests exist.

Funding

Office of Naval Research (N00014-13-1-0297)

  • H Francis Song
  • Guangyu R Yang
  • Xiao-Jing Wang

Google

  • H Francis Song
  • Guangyu R Yang
  • Xiao-Jing Wang

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2017, Song et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

H Francis Song, Guangyu R Yang, Xiao-Jing Wang (2017) Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife 6:e21492. https://doi.org/10.7554/eLife.21492
