Figures and data
![](https://prod--epp.elifesciences.org/iiif/2/103660%2Fv1%2Fcontent%2F613033v1_fig1.tif/full/max/0/default.jpg)
Comparison of the behavior of trained RNNs and monkeys. (A) Schematic of the RNN training setup. In a trial, the network makes a choice in response to a cue. Then, a feedback input, determined by the choice and reward outcome, is injected into the network. This procedure is repeated across trials. (B) Example of a trained RNN’s choice outcomes. Vertical bars show the RNN’s choice in each trial and the reward outcome (magenta: choice A, blue: choice B, light: rewarded, dark: not rewarded). Horizontal bars on the top show reward schedules (magenta: choice A is rewarded with probability 70% and choice B with probability 30%; blue: the reward schedule is reversed). Black curve shows the RNN output. Green horizontal bars show the posterior of reversal probability at each trial, inferred using a Bayesian model. (C) Probability of choosing the initial best option. Relative trial indicates the trial number relative to the behavioral reversal trial inferred from the Bayesian model. Relative trial number 0 is the trial at which the choice was reversed. (D) Fraction of no-reward blocks as a function of relative trial. Dotted lines show 0.3 and 0.7. (E) Distribution of the RNN’s and monkey’s reversal trials, relative to the experimentally scheduled reversal trial.
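The 70%/30% reward schedule with a scheduled reversal, as described in panel B, can be sketched as follows. This is a minimal illustration and not the authors' task code: the function name `simulate_block`, the coding of choices as 0 (A) and 1 (B), and the example reversal trial are all assumptions made for the example.

```python
import numpy as np

def simulate_block(choices, reversal_trial, p_high=0.7, p_low=0.3, seed=0):
    """Draw reward outcomes for one block with a single scheduled reversal.

    choices: sequence of 0 (choice A) or 1 (choice B), one entry per trial.
    Before `reversal_trial`, A is rewarded with probability p_high and B with
    p_low; from `reversal_trial` onward the two probabilities are swapped.
    (Hypothetical helper; names and coding are illustrative.)
    """
    rng = np.random.default_rng(seed)
    choices = np.asarray(choices)
    trials = np.arange(len(choices))
    # Option with the high reward probability on each trial: A (0), then B (1).
    best = (trials >= reversal_trial).astype(int)
    # Reward probability of the option that was actually chosen.
    p_reward = np.where(choices == best, p_high, p_low)
    # Bernoulli draw per trial.
    rewards = rng.random(len(choices)) < p_reward
    return rewards, p_reward

# An agent that switches exactly at the scheduled reversal always picks the best option.
rewards, p_reward = simulate_block([0, 0, 1, 1], reversal_trial=2)
```

In the figure, the behavioral reversal generally does not coincide with the scheduled one, which is what panel E quantifies.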
![](https://prod--epp.elifesciences.org/iiif/2/103660%2Fv1%2Fcontent%2F613033v1_fig2.tif/full/max/0/default.jpg)
Neural trajectories encoding choice and reversal probability variables. (A) Neural trajectories of PFC (top) and RNN (bottom) obtained by projecting population activity onto task vectors encoding choice and reversal probability. Trial numbers indicate their position relative to the behavioral reversal trial. Neural trajectories in each trial were averaged over 8 experimental sessions and 23 blocks for the PFC, and over 40 networks and 20 blocks for the RNNs. Black square indicates the time of cue onset. (B-C) Neural activity encoding reversal probability and choice in PFC (top) and RNN (bottom) at the time of cue onset (black squares in panel A) around the behavioral reversal trial. Shaded blue shows the standard error of the mean over sessions (or networks) and blocks.
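The projection in panel A can be illustrated with a short sketch. It assumes the task vectors are already available as one weight per neuron; the caption does not say how they were estimated, so `v_choice`, `v_rev`, the unit normalization, and the function name are hypothetical.

```python
import numpy as np

def project_onto_task_vectors(activity, v_choice, v_rev):
    """Project population activity onto two task vectors.

    activity: array of shape (n_time, n_neurons) for a single trial.
    v_choice, v_rev: arrays of shape (n_neurons,), assumed task vectors.
    Returns an array of shape (n_time, 2) whose columns are the choice and
    reversal-probability coordinates of the trajectory.
    """
    activity = np.asarray(activity, dtype=float)
    V = np.stack([v_choice, v_rev], axis=1).astype(float)  # (n_neurons, 2)
    # Normalize each task vector to unit length (an assumption of this sketch).
    V = V / np.linalg.norm(V, axis=0)
    return activity @ V

# Toy example: 2 time points, 3 neurons.
activity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 2.0, 0.0]])
trajectory = project_onto_task_vectors(activity, [2.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Averaging such per-trial trajectories over sessions (or networks) and blocks would give the mean trajectories plotted in the panel.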
![](https://prod--epp.elifesciences.org/iiif/2/103660%2Fv1%2Fcontent%2F613033v1_fig3.tif/full/max/0/default.jpg)
Integration of reward outcomes drives reversal probability activity. (A) The reversal probability activity of PFC (orange) and the prediction by the reward integration equation (blue) at the time of cue onset across trials around the behavioral reversal trial. Three example blocks are shown. The Pearson correlation between the actual and predicted PFC activity is shown in each panel. (B)
![](https://prod--epp.elifesciences.org/iiif/2/103660%2Fv1%2Fcontent%2F613033v1_fig4.tif/full/max/0/default.jpg)
Dynamic neural trajectories encoding reversal probability are separated in response to reward outcomes. (A) Two neural models for the reversal probability dynamics. Left: Line attractor model where xrev(t) remains constant during a trial. Right: Dynamic trajectory model where xrev(t) is non-stationary. In both models, the trajectories of adjacent trials are separable if the shift due to reward
![](https://prod--epp.elifesciences.org/iiif/2/103660%2Fv1%2Fcontent%2F613033v1_fig5.tif/full/max/0/default.jpg)
Mean trajectories encoding reversal probability shift monotonically across trials. (A) Traces of
![](https://prod--epp.elifesciences.org/iiif/2/103660%2Fv1%2Fcontent%2F613033v1_fig6.tif/full/max/0/default.jpg)
Perturbing the RNN’s neural activity encoding reversal probability biases choice outcomes. (A) RNN perturbation scheme. Three perturbation stimuli were used: v+, the population vector encoding the reversal probability; v−, the negative of v+; and vrnd, a control stimulus in a random direction. Perturbation stimuli were applied at the reversal trial (0) and the two preceding trials (−2, −1). (B) Deviation of reversal probability activity Δxrev and choice activity Δxchoice from the unperturbed activity. Perturbation was applied at the reversal trial during the time interval in which the cue was presented (shaded red). The choice was made after a short delay (shaded gray). Perturbation responses along the reversal probability vector v+ (solid) and the random vector vrnd (dotted) are shown. (C) Perturbation of reversal probability activity (left) and choice activity (right) in response to the three types of stimulus. Δxrev shows the activity averaged over the duration of perturbation, and Δxchoice shows the activity averaged over the duration of choice. (D-E) Fraction of blocks in all 40 trained RNNs that exhibited delayed or accelerated reversal trials in response to perturbations of the reversal probability activity. Perturbations at trial number −1 by the three stimulus types are shown in the left panels, and perturbations at all three trials by the stimulus of interest (v− in D and v+ in E) are shown in the right panels. (F) Left: The slope of the linear regression model fitted to the residual activity of reversal probability and choice. The residual activity at each trial over the time interval [0, 500] ms was used to fit the linear model. Red dot indicates the slope at trial number −1. Right: Each dot is the residual activity of a block at trial number −1. Red line shows the fitted linear model.
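The regression in panel F can be sketched roughly as below. The caption does not define the residual or state which variable is the regressor, so this example assumes the residual is the deviation from the mean across blocks and that choice residuals are regressed on reversal-probability residuals. The function name and the toy data are hypothetical.

```python
import numpy as np

def residual_slope(x_rev, x_choice):
    """Slope of a least-squares line relating two residual activities.

    x_rev, x_choice: arrays of shape (n_blocks,), each the activity at one
    trial averaged over the analysis window (e.g. [0, 500] ms).
    Residuals are taken as deviations from the across-block mean (assumption).
    """
    x_rev = np.asarray(x_rev, dtype=float)
    x_choice = np.asarray(x_choice, dtype=float)
    r_rev = x_rev - x_rev.mean()
    r_choice = x_choice - x_choice.mean()
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, _intercept = np.polyfit(r_rev, r_choice, deg=1)
    return slope

# Toy data in which choice activity is an exact linear function of x_rev.
x_rev = np.array([0.0, 1.0, 2.0, 3.0])
slope = residual_slope(x_rev, 2.0 * x_rev + 1.0)
```

Repeating the fit at each trial number would give a slope-versus-trial curve of the kind shown in the left panel.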
![](https://prod--epp.elifesciences.org/iiif/2/103660%2Fv1%2Fcontent%2F613033v1_tbl1.tif/full/max/0/default.jpg)
Four types of feedback inputs
![](https://prod--epp.elifesciences.org/iiif/2/103660%2Fv1%2Fcontent%2F613033v1_figS1.tif/full/max/0/default.jpg)
Breakdown of R+ and R− by the reward outcomes of two consecutive trials. (A) R+ was decomposed into two components, R+ = R++ + R+−, where R++ indicates two consecutive rewarded trials and R+− indicates a reward followed by no reward. Left: R++ across trials and time (top). Traces of R++ at individual trials and the fraction of trials whose traces are negative (bottom). Middle: Same as the left panel but for R+−. Right: Same as the other panels but for R+. (B) R− was decomposed into two components, R− = R−+ + R−−, where R−+ indicates no reward followed by a reward and R−− indicates two consecutive trials with no reward. The same analysis as in panel (A) was performed.
![](https://prod--epp.elifesciences.org/iiif/2/103660%2Fv1%2Fcontent%2F613033v1_figS2.tif/full/max/0/default.jpg)
Decoding reward outcome and the behavioral reversal trial using neural trajectories encoding reversal probability. (A) Left: Decoding the reward outcome (i.e., reward or no reward) of every trial at each time point, given the difference between the neural trajectories of two adjacent trials. At each time point, a 300 ms segment of the trajectories was used for decoding. Right: Decoding accuracy averaged over all trials shown in the left panel. Red dotted line shows the approximate time of the next trial’s reward. Gray dotted line shows chance-level performance. (B) Left: Decoding the behavioral reversal trial using the neural trajectories of 20 trials around the reversal trial. Decoding error shows the position of the predicted reversal trial relative to the actual reversal trial. At each time point, a 300 ms segment of each trajectory was used for decoding. Black shows the decoding error when single-trial trajectories were used, and green shows the result when the trajectories of 5 randomly chosen blocks were averaged before decoding. Gray dotted line shows chance-level performance. Right: Distance between trajectories was measured by averaging the normalized mean-squared error of adjacent trajectories over all trials. Each dot corresponds to a time point shown in the left panel.