Reward-based training of recurrent neural networks for cognitive and value-based tasks
Abstract
Trained neural network models, which exhibit features of neural activity recorded from behaving animals, may provide insights into the circuit mechanisms of cognitive functions through systematic analysis of network activity and connectivity. However, in contrast to the graded error signals commonly used to train networks through supervised learning, animals learn from reward feedback on definite actions through reinforcement learning. Reward maximization is particularly relevant when optimal behavior depends on an animal's internal judgment of confidence or subjective preferences. Here, we implement reward-based training of recurrent neural networks in which a value network guides learning by using the activity of the decision network to predict future reward. We show that such models capture behavioral and electrophysiological findings from well-known experimental paradigms. Our work provides a unified framework for investigating diverse cognitive and value-based computations, and predicts a role for value representation that is essential for learning, but not executing, a task.
Article and author information
Author details
Funding
Office of Naval Research (N00014-13-1-0297)
- H Francis Song
- Guangyu R Yang
- Xiao-Jing Wang
- H Francis Song
- Guangyu R Yang
- Xiao-Jing Wang
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2017, Song et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 10,799
- views
-
- 1,919
- downloads
-
- 120
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Neuroscience
The concept that dimeric protein complexes in synapses can sequentially replace their subunits has been a cornerstone of Francis Crick’s 1984 hypothesis, explaining how long-term memories could be maintained in the face of short protein lifetimes. However, it is unknown whether the subunits of protein complexes that mediate memory are sequentially replaced in the brain and if this process is linked to protein lifetime. We address these issues by focusing on supercomplexes assembled by the abundant postsynaptic scaffolding protein PSD95, which plays a crucial role in memory. We used single-molecule detection, super-resolution microscopy and MINFLUX to probe the molecular composition of PSD95 supercomplexes in mice carrying genetically encoded HaloTags, eGFP, and mEoS2. We found a population of PSD95-containing supercomplexes comprised of two copies of PSD95, with a dominant 12.7 nm separation. Time-stamping of PSD95 subunits in vivo revealed that each PSD95 subunit was sequentially replaced over days and weeks. Comparison of brain regions showed subunit replacement was slowest in the cortex, where PSD95 protein lifetime is longest. Our findings reveal that protein supercomplexes within the postsynaptic density can be maintained by gradual replacement of individual subunits providing a mechanism for stable maintenance of their organization. Moreover, we extend Crick’s model by suggesting that synapses with slow subunit replacement of protein supercomplexes and long-protein lifetimes are specialized for long-term memory storage and that these synapses are highly enriched in superficial layers of the cortex where long-term memories are stored.
-
- Neuroscience
Complex macro-scale patterns of brain activity that emerge during periods of wakeful rest provide insight into the organisation of neural function, how these differentiate individuals based on their traits, and the neural basis of different types of self-generated thoughts. Although brain activity during wakeful rest is valuable for understanding important features of human cognition, its unconstrained nature makes it difficult to disentangle neural features related to personality traits from those related to the thoughts occurring at rest. Our study builds on recent perspectives from work on ongoing conscious thought that highlight the interactions between three brain networks – ventral and dorsal attention networks, as well as the default mode network. We combined measures of personality with state-of-the-art indices of ongoing thoughts at rest and brain imaging analysis and explored whether this ‘tri-partite’ view can provide a framework within which to understand the contribution of states and traits to observed patterns of neural activity at rest. To capture macro-scale relationships between different brain systems, we calculated cortical gradients to describe brain organisation in a low-dimensional space. Our analysis established that for more introverted individuals, regions of the ventral attention network were functionally more aligned to regions of the somatomotor system and the default mode network. At the same time, a pattern of detailed self-generated thought was associated with a decoupling of regions of dorsal attention from regions in the default mode network. Our study, therefore, establishes that interactions between attention systems and the default mode network are important influences on ongoing thought at rest and highlights the value of integrating contemporary perspectives on conscious experience when understanding patterns of brain activity at rest.