Reward-based training of recurrent neural networks for cognitive and value-based tasks

  1. H Francis Song
  2. Guangyu R Yang
  3. Xiao-Jing Wang (corresponding author)
  1. New York University, United States
  2. NYU Shanghai, China
4 figures and 2 tables

Figures

Figure 1 with 4 supplements
Recurrent neural networks for reinforcement learning.

(A) Task structure for a simple perceptual decision-making task with variable stimulus duration. The agent must maintain fixation (a_t = F) until the go cue, which indicates the start of a decision …

https://doi.org/10.7554/eLife.21492.003
Figure 1—figure supplement 1
Learning curves for the simple perceptual decision-making task.

(A) Average reward per trial. Black indicates the network realization shown in the main text, gray additional realizations, i.e., trained with different random number generator seeds. (B) Percent …

https://doi.org/10.7554/eLife.21492.004
Figure 1—figure supplement 2
Reaction-time version of the perceptual decision-making task, in which the go cue coincides with the onset of stimulus, allowing the agent to choose when to respond.

(A) Task structure for the reaction-time version of the simple perceptual decision-making task, in which the agent can choose to respond any time after the onset of stimulus. (B) Reaction time as a …

https://doi.org/10.7554/eLife.21492.005
Figure 1—figure supplement 3
Learning curves for the reaction-time version of the simple perceptual decision-making task.

(A) Average reward per trial. Black indicates the network realization shown in the main text, gray additional realizations, i.e., trained with different random number generator seeds. (B) Percent …

https://doi.org/10.7554/eLife.21492.006
Figure 1—figure supplement 4
Learning curves for the simple perceptual decision-making task with a linear readout of the decision network as the baseline.

(A) Average reward per trial. Black indicates the network realization shown in the main text, gray additional realizations, i.e., trained with different random number generator seeds. (B) Percent …

https://doi.org/10.7554/eLife.21492.007
Figure 2 with 3 supplements
Performance and neural activity of RNNs trained for 'simple' cognitive tasks in which the correct response depends only on the task condition.

Left column shows behavioral performance, right column shows mixed selectivity for task parameters of example units in the decision network. (A) Context-dependent integration task (Mante et al., 2013

https://doi.org/10.7554/eLife.21492.009
Figure 2—figure supplement 1
Learning curves for the context-dependent integration task.

(A) Average reward per trial. Black is for the network realization in the main text, gray for additional realizations, i.e., trained with different random number generator seeds. (B) Percent …

https://doi.org/10.7554/eLife.21492.010
Figure 2—figure supplement 2
Learning curves for the multisensory integration task.

(A) Average reward per trial. Black indicates the network realization shown in the main text, gray additional realizations, i.e., trained with different random number generator seeds. (B) Percent …

https://doi.org/10.7554/eLife.21492.011
Figure 2—figure supplement 3
Learning curves for the parametric working memory task.

(A) Average reward per trial. Black indicates the network realization shown in the main text, gray additional realizations, i.e., trained with different random number generator seeds. (B) Percent …

https://doi.org/10.7554/eLife.21492.012
Figure 3 with 1 supplement
Perceptual decision-making task with postdecision wagering, based on Kiani and Shadlen (2009).

(A) Task structure. On a random half of the trials, a sure option is presented during the delay period, and on these trials the network has the option of receiving a smaller (compared to correctly …

https://doi.org/10.7554/eLife.21492.014
Figure 3—figure supplement 1
Learning curves for the postdecision wager task.

(A) Average reward per trial. Black indicates the network realization shown in the main text, gray additional realizations, i.e., trained with different random number generator seeds. (B) Percent …

https://doi.org/10.7554/eLife.21492.015
Figure 4 with 3 supplements
Value-based economic choice task (Padoa-Schioppa and Assad, 2006).

(A) Choice pattern when the reward contingencies are indifferent for roughly 1 'juice' of A and 2 'juices' of B (upper) or 1 juice of A and 4 juices of B (lower). (B) Mean activity of example value …

https://doi.org/10.7554/eLife.21492.016
Figure 4—figure supplement 1
Fit of cumulative Gaussian with parameters μ, σ to the choice pattern in Figure 4A (upper), and the deduced indifference point n_B*/n_A* = (1 + μ)/(1 − μ).
https://doi.org/10.7554/eLife.21492.017
Figure 4—figure supplement 2
Fit of cumulative Gaussian with parameters μ, σ to the choice pattern in Figure 4A (lower), and the deduced indifference point n_B*/n_A* = (1 + μ)/(1 − μ).
https://doi.org/10.7554/eLife.21492.018
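
The indifference-point formula quoted in the two supplement captions above can be illustrated with a minimal sketch. It assumes that the cumulative Gaussian is fit against the normalized offer difference x = (n_B − n_A)/(n_B + n_A), in which case setting x = μ and solving for n_B/n_A gives (1 + μ)/(1 − μ). The function names and the μ values are hypothetical and chosen only to reproduce the approximate 1A = 2B and 1A = 4B contingencies; they are not the fitted values from the paper.

```python
import math


def choice_probability(x, mu, sigma):
    """Cumulative Gaussian with mean mu and width sigma, evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def indifference_ratio(mu):
    """Indifference point n_B*/n_A* = (1 + mu) / (1 - mu)."""
    if not -1.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between -1 and 1")
    return (1.0 + mu) / (1.0 - mu)


def mu_from_ratio(ratio):
    """Inverse of indifference_ratio: the mu that yields a given n_B*/n_A*."""
    return (ratio - 1.0) / (ratio + 1.0)


# Hypothetical values of mu corresponding to 1A = 2B and 1A = 4B.
mu_upper = mu_from_ratio(2.0)   # 1/3
mu_lower = mu_from_ratio(4.0)   # 0.6
ratio_upper = indifference_ratio(mu_upper)
ratio_lower = indifference_ratio(mu_lower)

# By construction the fitted curve crosses 50% at x = mu (sigma is arbitrary here).
p_at_mu = choice_probability(mu_upper, mu_upper, sigma=0.2)
```

Under this assumption, a larger μ shifts the 50% crossing toward offers containing more B, which corresponds to a higher relative value of A.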
Figure 4—figure supplement 3
Learning curves for the value-based economic choice task.

(A) Average reward per trial. Black indicates the network realization shown in the main text, gray additional realizations, i.e., trained with different random number generator seeds. (B) Percentage …

https://doi.org/10.7554/eLife.21492.019

Tables

Table 1

Parameters for reward-based recurrent neural network training. Unless noted otherwise in the text, networks were trained and run with the parameters listed here.

https://doi.org/10.7554/eLife.21492.008
Parameter                                       Symbol     Default value
Learning rate                                   η          0.004
Maximum gradient norm                           Γ          1
Size of decision/value network                  N          100
Connection probability (decision network)       p_c^π      0.1
Connection probability (value network)          p_c^v      1
Time step                                       Δt         10 ms
Unit time constant                              τ          100 ms
Recurrent noise                                 σ_rec²     0.01
Initial spectral radius for recurrent weights   ρ_0        2
Number of trials per gradient update            N_trials   # of task conditions
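
As a rough illustration of how the defaults in Table 1 might be carried in code, the sketch below collects them in a small configuration object. The class, field, and function names are hypothetical. Two things are assumptions rather than statements from the table: that the maximum gradient norm Γ is applied as ordinary norm-based gradient clipping, and that the ratio Δt/τ is the step factor of a simple Euler discretization.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingDefaults:
    """Default values copied from Table 1; the field names are illustrative."""
    learning_rate: float = 0.004      # eta
    max_grad_norm: float = 1.0        # Gamma
    n_units: int = 100                # N, size of decision/value network
    p_connect_decision: float = 0.1   # p_c^pi
    p_connect_value: float = 1.0      # p_c^v
    dt_ms: float = 10.0               # Delta t
    tau_ms: float = 100.0             # tau
    var_rec: float = 0.01             # sigma_rec^2
    rho0: float = 2.0                 # initial spectral radius

    @property
    def alpha(self) -> float:
        """Step factor dt / tau (assumed Euler discretization)."""
        return self.dt_ms / self.tau_ms


def clip_by_norm(grad, max_norm):
    """Rescale a gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0 or norm <= max_norm:
        return list(grad)
    return [g * (max_norm / norm) for g in grad]


cfg = TrainingDefaults()
# A toy gradient with norm 5 is rescaled to norm Gamma = 1.
clipped = clip_by_norm([3.0, 4.0], cfg.max_grad_norm)
clipped_norm = math.sqrt(sum(g * g for g in clipped))
```

The number of trials per gradient update is left out of the object because Table 1 ties it to the number of task conditions, which varies by task.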
Table 2

Psychophysical thresholds σ_visual, σ_auditory, and σ_multisensory obtained from fits of cumulative Gaussian functions to the psychometric curves in visual only, auditory only, and multisensory trials in the multisensory …

https://doi.org/10.7554/eLife.21492.013
σ_visual   σ_auditory   σ_multisensory   1/σ_visual² + 1/σ_auditory²   1/σ_multisensory²
2.124      2.099        1.451            0.449                         0.475
2.107      2.086        1.448            0.455                         0.477
2.276      2.128        1.552            0.414                         0.415
2.118      2.155        1.508            0.438                         0.440
2.077      2.171        1.582            0.444                         0.400
2.088      2.149        1.480            0.446                         0.457
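
The last two columns of Table 2 are functions of the first three, so they can be recomputed as a consistency check. The sketch below does this for the six rows as listed; the variable and function names are illustrative. Close agreement between the two derived columns is what inverse-variance-weighted cue combination would predict.

```python
# Each row: (sigma_visual, sigma_auditory, sigma_multisensory,
#            reported 1/sv^2 + 1/sa^2, reported 1/sm^2), copied from Table 2.
TABLE_2 = [
    (2.124, 2.099, 1.451, 0.449, 0.475),
    (2.107, 2.086, 1.448, 0.455, 0.477),
    (2.276, 2.128, 1.552, 0.414, 0.415),
    (2.118, 2.155, 1.508, 0.438, 0.440),
    (2.077, 2.171, 1.582, 0.444, 0.400),
    (2.088, 2.149, 1.480, 0.446, 0.457),
]


def inverse_variance(sigma):
    """Return 1 / sigma^2."""
    return 1.0 / sigma ** 2


def check_row(row, tol=1e-3):
    """Recompute both derived columns and compare them with the reported values."""
    sv, sa, sm, reported_sum, reported_multi = row
    unisensory_sum = inverse_variance(sv) + inverse_variance(sa)
    multisensory = inverse_variance(sm)
    consistent = (abs(unisensory_sum - reported_sum) < tol
                  and abs(multisensory - reported_multi) < tol)
    return consistent, unisensory_sum, multisensory


results = [check_row(row) for row in TABLE_2]

for (consistent, unisensory_sum, multisensory), row in zip(results, TABLE_2):
    print(f"{row[0]:.3f} {row[1]:.3f} {row[2]:.3f} -> "
          f"{unisensory_sum:.3f} vs {multisensory:.3f} (matches table: {consistent})")
```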
