Reward-based training of recurrent neural networks for cognitive and value-based tasks

  1. H Francis Song
  2. Guangyu R Yang
  3. Xiao-Jing Wang  Is a corresponding author
  1. New York University, United States

Abstract

Trained neural network models, which exhibit features of neural activity recorded from behaving animals, may provide insights into the circuit mechanisms of cognitive functions through systematic analysis of network activity and connectivity. However, in contrast to the graded error signals commonly used to train networks through supervised learning, animals learn from reward feedback on definite actions through reinforcement learning. Reward maximization is particularly relevant when optimal behavior depends on an animal's internal judgment of confidence or subjective preferences. Here, we implement reward-based training of recurrent neural networks in which a value network guides learning by using the activity of the decision network to predict future reward. We show that such models capture behavioral and electrophysiological findings from well-known experimental paradigms. Our work provides a unified framework for investigating diverse cognitive and value-based computations, and predicts a role for value representation that is essential for learning, but not executing, a task.

Article and author information

Author details

  1. H Francis Song

    Center for Neural Science, New York University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  2. Guangyu R Yang

    Center for Neural Science, New York University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  3. Xiao-Jing Wang

    Center for Neural Science, New York University, New York, United States
    For correspondence
    xjwang@nyu.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3124-8474

Funding

Office of Naval Research (N00014-13-1-0297)

  • H Francis Song
  • Guangyu R Yang
  • Xiao-Jing Wang

Google

  • H Francis Song
  • Guangyu R Yang
  • Xiao-Jing Wang

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Timothy EJ Behrens, University College London, United Kingdom

Version history

  1. Received: September 13, 2016
  2. Accepted: January 12, 2017
  3. Accepted Manuscript published: January 13, 2017 (version 1)
  4. Version of Record published: February 6, 2017 (version 2)

Copyright

© 2017, Song et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 10,247
    Page views
  • 1,825
    Downloads
  • 83
    Citations

Article citation count generated by polling the highest count across the following sources: Scopus, Crossref, PubMed Central.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. H Francis Song
  2. Guangyu R Yang
  3. Xiao-Jing Wang
(2017)
Reward-based training of recurrent neural networks for cognitive and value-based tasks
eLife 6:e21492.
https://doi.org/10.7554/eLife.21492

Share this article

https://doi.org/10.7554/eLife.21492

Further reading

    1. Neuroscience
    Harry Clark, Matthew F Nolan
    Research Article

    Grid firing fields have been proposed as a neural substrate for spatial localisation in general or for path integration in particular. To distinguish these possibilities, we investigate firing of grid and non-grid cells in the mouse medial entorhinal cortex during a location memory task. We find that grid firing can either be anchored to the task environment, or can encode distance travelled independently of the task reference frame. Anchoring varied between and within sessions, while spatial firing of non-grid cells was either coherent with the grid population, or was stably anchored to the task environment. We took advantage of the variability in task-anchoring to evaluate whether and when encoding of location by grid cells might contribute to behaviour. We find that when reward location is indicated by a visual cue, performance is similar regardless of whether grid cells are task-anchored or not, arguing against a role for grid representations when location cues are available. By contrast, in the absence of the visual cue, performance was enhanced when grid cells were anchored to the task environment. Our results suggest that anchoring of grid cells to task reference frames selectively enhances performance when path integration is required.

    1. Neuroscience
    Kiwamu Kudo, Kamalini G Ranasinghe ... Srikantan S Nagarajan
    Research Article

    Alzheimer’s disease (AD) is characterized by the accumulation of amyloid-β and misfolded tau proteins causing synaptic dysfunction, and progressive neurodegeneration and cognitive decline. Altered neural oscillations have been consistently demonstrated in AD. However, the trajectories of abnormal neural oscillations in AD progression and their relationship to neurodegeneration and cognitive decline are unknown. Here, we deployed robust event-based sequencing models (EBMs) to investigate the trajectories of long-range and local neural synchrony across AD stages, estimated from resting-state magnetoencephalography. The increases in neural synchrony in the delta-theta band and the decreases in the alpha and beta bands showed progressive changes throughout the stages of the EBM. Decreases in alpha and beta band synchrony preceded both neurodegeneration and cognitive decline, indicating that frequency-specific neuronal synchrony abnormalities are early manifestations of AD pathophysiology. The long-range synchrony effects were greater than the local synchrony, indicating a greater sensitivity of connectivity metrics involving multiple regions of the brain. These results demonstrate the evolution of functional neuronal deficits along the sequence of AD progression.