(A) Behavior of the model in an example session, composed of four blocks (separated by dashed vertical lines). The probabilities of reward for choosing actions 1 and 2 are denoted by the pair of …
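The block design and Q-learning dynamics described in this legend can be sketched as follows (a minimal numpy-only sketch; the function name, the learning rate, the softmax temperature, and the block reward probabilities are illustrative assumptions, not the paper's fitted values):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_session(reward_probs_blocks, trials_per_block=100,
                     alpha=0.1, beta=2.5):
    """Simulate one session of a Q-learning agent: delta-rule update of
    the chosen action-value, softmax action selection. The parameter
    values are illustrative."""
    Q = np.zeros(2)
    q1, q2, choices, rewards = [], [], [], []
    for p1, p2 in reward_probs_blocks:
        for _ in range(trials_per_block):
            # softmax (logistic) over the difference of the two action-values
            p_choose_1 = 1.0 / (1.0 + np.exp(-beta * (Q[0] - Q[1])))
            a = 0 if rng.random() < p_choose_1 else 1
            r = float(rng.random() < (p1 if a == 0 else p2))
            Q[a] += alpha * (r - Q[a])  # update only the chosen action-value
            q1.append(Q[0]); q2.append(Q[1])
            choices.append(a); rewards.append(r)
    return np.array(q1), np.array(q2), np.array(choices), np.array(rewards)

# four blocks of 100 trials each (probability pairs are illustrative)
blocks = [(0.1, 0.5), (0.9, 0.5), (0.5, 0.9), (0.5, 0.1)]
q1, q2, choices, rewards = simulate_session(blocks)
```

The estimated action-values plotted in such figures are obtained by running this kind of update on the recorded choices and rewards, with the learning parameters fitted to behavior.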
(A) Two example random-walk neurons that appear as if they represent action-values. The red (top) and blue (bottom) lines denote the estimated action-values 1 and 2, respectively, that were depicted …
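Random-walk neurons of this kind can be generated as follows (a sketch under stated assumptions: the function name, the baseline rate, the drift size, and the Poisson spiking are all illustrative choices, not the paper's exact parameters):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_walk_neurons(n_neurons=100, n_trials=400, sigma=0.1):
    """Simulate 'random-walk neurons': each neuron's firing rate drifts
    as a Gaussian random walk with no relation to the task, and the
    trial-by-trial spike count is a Poisson draw around the rectified
    rate. All parameter values are illustrative."""
    steps = rng.normal(0.0, sigma, size=(n_neurons, n_trials))
    rates = np.maximum(2.5 + np.cumsum(steps, axis=1), 0.0)  # assumed baseline
    return rng.poisson(rates)

spikes = random_walk_neurons()
```

Because both the random walk and the estimated action-values drift slowly over trials, regressing such spike counts on action-values yields spurious correlations.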
(A) An example of operant learning by the covariance model (see Materials and methods). Legend is the same as in Figure 1A. (B) Two example covariance neurons that appear as if they represent …
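A covariance-based learner can be sketched as follows (a minimal sketch, not the paper's exact model: the decision rule, the running-average scheme, and all parameter values are illustrative assumptions; the key point is that weights change with the covariance of reward and activity, with no explicit action-values):

```python
import numpy as np

rng = np.random.default_rng(4)

def covariance_session(reward_probs_blocks, trials_per_block=100, eta=0.1):
    """Operant learning by a covariance rule (assumed form:
    dw_i = eta * (R - <R>) * (N_i - <N_i>)). Two noisy input 'neurons'
    drive the choice; their weights change with the covariance of
    reward and activity."""
    w = np.array([0.5, 0.5])
    r_bar, n_bar = 0.5, np.array([1.0, 1.0])
    choices = []
    for p1, p2 in reward_probs_blocks:
        for _ in range(trials_per_block):
            n = rng.normal(1.0, 0.3, size=2)            # noisy activities
            drive = w[0] * n[0] - w[1] * n[1]
            a = 0 if rng.random() < 1 / (1 + np.exp(-5 * drive)) else 1
            r = float(rng.random() < (p1 if a == 0 else p2))
            w += eta * (r - r_bar) * (n - n_bar)        # covariance update
            r_bar += 0.1 * (r - r_bar)                  # running averages
            n_bar += 0.1 * (n - n_bar)
            choices.append(a)
    return np.array(choices), w

choices, w = covariance_session([(0.1, 0.5), (0.9, 0.5), (0.5, 0.9), (0.5, 0.1)])
```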
(A) and (B) Motor cortex neurons. (A) An example motor cortex neuron recorded in a BMI task, presented as if the sequence of spike counts of this neuron corresponds to the sequence of trials in a …
Population analysis on basal ganglia neurons. Spike counts were regressed on estimated action-values that were created in the same experimental setting as in Figure 1. To compare with the number of …
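The regression analysis described here can be sketched as follows (a numpy-only sketch of the standard classification procedure; the function name, the demo data, and the significance cutoff are illustrative assumptions):

```python
import numpy as np

def action_value_t_values(spikes, q1, q2):
    """OLS regression of trial-by-trial spike counts on the two
    estimated action-values plus an intercept; returns the t-values
    of the two action-value coefficients."""
    X = np.column_stack([np.ones_like(q1), q1, q2])
    beta, *_ = np.linalg.lstsq(X, spikes, rcond=None)
    resid = spikes - X @ beta
    dof = len(spikes) - X.shape[1]
    se = np.sqrt(resid @ resid / dof * np.diag(np.linalg.inv(X.T @ X)))
    t = beta / se
    return t[1], t[2]

# demo with synthetic data: a neuron whose rate truly follows Q1
rng = np.random.default_rng(0)
q1, q2 = rng.random(400), rng.random(400)
spikes = 5 + 4 * q1 + rng.normal(0, 0.5, 400)
t_q1, t_q2 = action_value_t_values(spikes, q1, q2)
```

A neuron is then typically classified as an 'action-value neuron' when exactly one of the two t-values exceeds the significance threshold (e.g. |t| > 2).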
Conducting this analysis using the four data sets and unrelated, simulated action-values, we erroneously detect action-value representation in all data sets. (A), (B), (C) and (D) denote the …
Following Kim et al. (2013), we simulated their experimental settings to create simulated action-values (see also Materials and methods on Figure 2—figure supplement 4, with the …
Following (Ito and Doya, 2009; Samejima et al., 2005), for each of the four data sets, (A) random-walk, (B) motor cortex, (C) auditory cortex and (D) basal ganglia neurons, spike counts in the last 20 …
Following (Ito and Doya, 2015a), we conducted a multiple linear regression analysis using unrelated action-values (the same action-values as in Figure 2—figure supplement 6) and the following …
Following (Wang et al., 2013) (whose main focus was state-value representation), we considered an unbiased classification of action-value neurons. (A), (B), (C) and (D) denote the random-walk neurons, …
In the spirit of the experiment conducted by FitzGerald et al. (2012), we simulated an experiment in which two different trial types are marked by cues, randomly selected every trial, and …
We repeated the standard analysis with eight blocks, so that the four blocks from the experiment in Figure 1A were repeated twice, each time in random permutation. The mean length of the sessions …
(A) Red and blue correspond to the red- and blue-labeled neurons in Figure 1B. Arrowheads denote the t-values from the regressions on the estimated action-values from the session in which the neuron …
(A) The standard analysis on the neuronal recordings taken from (Ito and Doya, 2009), using action-values estimated from the behavior in the sessions in which the neurons were recorded. The top and …
(A) A Q-learning model was simulated in 1,000 sessions of 400 trials, where the original reward probabilities (same as in Figure 1A) were associated with different cues and appeared randomly. …
(A) Behavior of the model in an example session, same as in Figure 1A, but for the direct-policy model. (B) Red and blue lines denote ‘action-values’ 1 and 2, respectively, calculated from the choices …
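A direct-policy learner can be sketched as follows (a REINFORCE-like sketch under stated assumptions: the single policy parameter, the logistic policy, and the learning rate are illustrative; the point is that the model learns from reward and choice without ever computing action-values):

```python
import numpy as np

rng = np.random.default_rng(3)

def direct_policy_session(reward_probs_blocks, trials_per_block=100, eta=0.3):
    """Direct-policy learning: a single policy parameter theta sets the
    probability of choosing action 1 via a logistic function, and is
    updated by a REINFORCE-style rule, theta += eta * r * d(log pi)/d(theta).
    No action-values appear anywhere in the model."""
    theta = 0.0
    choices, rewards = [], []
    for p1, p2 in reward_probs_blocks:
        for _ in range(trials_per_block):
            p_choose_1 = 1.0 / (1.0 + np.exp(-theta))
            a = 0 if rng.random() < p_choose_1 else 1
            r = float(rng.random() < (p1 if a == 0 else p2))
            # gradient of log-probability of the taken action
            grad = (1.0 - p_choose_1) if a == 0 else -p_choose_1
            theta += eta * r * grad
            choices.append(a); rewards.append(r)
    return np.array(choices), np.array(rewards)

choices, rewards = direct_policy_session(
    [(0.1, 0.5), (0.9, 0.5), (0.5, 0.9), (0.5, 0.1)])
```

The 'action-values' in panel (B) are then obtained by fitting a Q-learning model to the choices and rewards that this value-free model produces.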
(A) Regression analysis on policy neurons with choice as an added regressor. Following common experimental practice, we used the following regression model: S(t) = β0 + β1·Q1(t) + β2·Q2(t) + β3·c(t) + ε(t), where S(t) is the spike count in trial …
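Adding choice as a regressor amounts to one extra column in the design matrix (a numpy-only sketch; the function name, the assumed model form S = b0 + b1·Q1 + b2·Q2 + b3·choice + noise, and the demo data are illustrative):

```python
import numpy as np

def t_values_with_choice(spikes, q1, q2, choice):
    """OLS of spike counts on the two action-values plus the trial-by-trial
    choice (and an intercept); returns the four t-values in the order
    [intercept, Q1, Q2, choice]."""
    X = np.column_stack([np.ones_like(q1), q1, q2, choice])
    beta, *_ = np.linalg.lstsq(X, spikes, rcond=None)
    resid = spikes - X @ beta
    dof = len(spikes) - X.shape[1]
    se = np.sqrt(resid @ resid / dof * np.diag(np.linalg.inv(X.T @ X)))
    return beta / se

# demo: a neuron driven by choice alone still shows a clear choice t-value
rng = np.random.default_rng(5)
q1, q2 = rng.random(400), rng.random(400)
choice = rng.integers(0, 2, 400)
spikes = 2 + 3 * choice + rng.normal(0, 0.5, 400)
t = t_values_with_choice(spikes, q1, q2, choice)
```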
(A) The Q-learning model (Equations 1 and 2) was simulated in 1,000 sessions of 400 trials each, where the reward probabilities were associated with different cues and were randomly chosen in each …