Striatal action-value neurons reconsidered

  1. Lotem Elber-Dorozko (corresponding author)
  2. Yonatan Loewenstein
  1. The Hebrew University of Jerusalem, Israel
10 figures and 1 additional file

Figures

Figure 1
Model of action-value neurons.

(A) Behavior of the model in an example session, composed of four blocks (separated by dashed vertical lines). The probabilities of reward for choosing actions 1 and 2 are denoted by the pair of …

https://doi.org/10.7554/eLife.34248.002
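
The block-structured session described in this caption can be illustrated with a minimal simulation sketch. The captions on this page name a Q-learning model but do not reproduce its equations, so the update rule and logistic choice rule below are the textbook forms rather than the authors' exact implementation. The function name `simulate_session`, the learning rate `alpha`, the inverse temperature `beta`, the initial values, the block length, and the reward probabilities are all hypothetical choices made for illustration.

```python
import math
import random


def simulate_session(block_probs, trials_per_block=100, alpha=0.1, beta=2.5, seed=0):
    """Simulate a two-action Q-learning agent over consecutive blocks.

    block_probs: list of (p1, p2) reward probabilities, one pair per block.
    Returns per-trial choices (1 or 2), rewards (0 or 1), and the pair of
    action-values held at the start of each trial.
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]  # assumed initial action-values
    choices, rewards, q_history = [], [], []
    for p1, p2 in block_probs:
        for _ in range(trials_per_block):
            q_history.append((q[0], q[1]))
            # Logistic (two-action softmax) choice rule on the value difference.
            p_choose_1 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
            action = 0 if rng.random() < p_choose_1 else 1
            reward = 1 if rng.random() < (p1, p2)[action] else 0
            # Move only the chosen action's value toward the obtained reward.
            q[action] += alpha * (reward - q[action])
            choices.append(action + 1)
            rewards.append(reward)
    return choices, rewards, q_history


# Four blocks with hypothetical reward probabilities for actions 1 and 2.
blocks = [(0.8, 0.2), (0.2, 0.8), (0.6, 0.4), (0.4, 0.6)]
choices, rewards, q_history = simulate_session(blocks)
```

Because each update is a weighted average of the old value and a reward of 0 or 1, the simulated action-values stay between 0 and 1 and change slowly from trial to trial, which is the property the later figures turn on.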
Figure 2 with 10 supplements
Erroneous detection of action-value representation in random-walk neurons.

(A) Two example random-walk neurons that appear as if they represent action-values. The red (top) and blue (bottom) lines denote the estimated action-values 1 and 2, respectively, that were depicted …

https://doi.org/10.7554/eLife.34248.003
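
The general statistical point behind this figure, that regressing one slowly drifting series on another unrelated one yields "significant" t-values far more often than the nominal rate, can be sketched as follows. This is a simplified illustration and not the authors' analysis: both series are Gaussian random walks (a stand-in for spike counts and for slowly varying action-values), the regression has a single regressor, and the names `random_walk`, `white_noise`, `slope_t_value`, and `false_positive_rate`, the threshold of 2, and the series length are all assumptions.

```python
import math
import random


def random_walk(n, rng):
    """Cumulative sum of independent Gaussian steps (temporally correlated)."""
    x, walk = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        walk.append(x)
    return walk


def white_noise(n, rng):
    """Independent Gaussian samples (no temporal correlation)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def slope_t_value(x, y):
    """t-statistic of the slope in the simple regression y = b0 + b1*x + noise."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return b1 / standard_error


def false_positive_rate(make_series, n_trials=200, n_pairs=500, threshold=2.0, seed=1):
    """Fraction of independent series pairs whose slope has |t| above threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        x = make_series(n_trials, rng)
        y = make_series(n_trials, rng)  # generated independently of x
        if abs(slope_t_value(x, y)) > threshold:
            hits += 1
    return hits / n_pairs


rate_walk = false_positive_rate(random_walk)
rate_noise = false_positive_rate(white_noise)
print(f"random walks: {rate_walk:.2f}, white noise: {rate_noise:.2f}")
```

The two series in each pair are independent by construction, so every detection is a false positive. The white-noise case serves as a control in which the usual t-test assumptions hold.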
Figure 2—figure supplement 1
Erroneous detection of action-value representation in a model with covariance-based synaptic plasticity.

(A) An example of operant learning by the covariance model (see Materials and methods). Legend is the same as in Figure 1A. (B) Two example covariance neurons that appear as if they represent …

https://doi.org/10.7554/eLife.34248.004
Figure 2—figure supplement 2
Erroneous detection of action-value neurons in unrelated experiments.

(A) and (B) Motor cortex neurons. (A) An example motor cortex neuron recorded in a BMI task, presented as if the sequence of spike counts of this neuron corresponds to the sequence of trials in a …

https://doi.org/10.7554/eLife.34248.005
Figure 2—figure supplement 3
Erroneous detection of unrelated action-value representations in basal ganglia neurons.

Population analysis on basal ganglia neurons. Spike counts were regressed on estimated action-values that were created in the same experimental setting as in Figure 1. To compare with the number of …

https://doi.org/10.7554/eLife.34248.006
Figure 2—figure supplement 4
Spike count permutation (as in [Kim et al., 2009]) does not resolve the temporal correlations confound.

Conducting this analysis using the four data sets and unrelated, simulated action-values, we erroneously detect action-value representation in all data sets. (A), (B), (C), and (D) denote the …

https://doi.org/10.7554/eLife.34248.007
Figure 2—figure supplement 5
Autoregressive coefficients do not resolve the temporal correlations confound.

Following (Kim et al., 2013), we simulated the experimental settings in (Kim et al., 2013) to create simulated action-values (see also Materials and methods on Figure 2—figure supplement 4, with the …

https://doi.org/10.7554/eLife.34248.008
Figure 2—figure supplement 6
Regression on reward probabilities does not resolve the temporal correlations confound.

Following (Ito and Doya, 2009; Samejima et al., 2005), for each of the four data sets, (A) random-walk, (B) motor cortex, (C) auditory cortex, and (D) basal ganglia neurons, spike counts in the last 20 …

https://doi.org/10.7554/eLife.34248.009
Figure 2—figure supplement 7
Detrending analysis does not resolve the temporal correlations confound.

Following (Ito and Doya, 2015a), we conducted a multiple linear regression analysis using unrelated action-values (the same action-values as in Figure 2—figure supplement 6) and the following …

https://doi.org/10.7554/eLife.34248.010
Figure 2—figure supplement 8
Unbiased classification of action-value neurons does not resolve the temporal correlations confound.

Following (Wang et al., 2013) (whose main focus was state-value representation), we considered an unbiased classification of action-value neurons. (A), (B), (C), and (D) denote the random-walk neurons, …

https://doi.org/10.7554/eLife.34248.011
Figure 2—figure supplement 9
Random intermingling of estimated action-values does not resolve the temporal correlations confound.

In the spirit of the experiment conducted by FitzGerald et al. (2012), we simulated an experiment in which two different trial types are marked by cues, randomly selected every trial, and …

https://doi.org/10.7554/eLife.34248.012
Figure 2—figure supplement 10
Increasing the number of blocks does not resolve the temporal correlations confound.

We repeated the standard analysis with eight blocks, so that the four blocks from the experiment in Figure 1A were repeated twice, each time in random permutation. The mean length of the sessions …

https://doi.org/10.7554/eLife.34248.013
Figure 3 with 1 supplement
Permutation analysis.

(A) Red and blue correspond to the red- and blue-labeled neurons in Figure 1B. Arrowheads denote the t-values from the regressions on the estimated action-values from the session in which the neuron …

https://doi.org/10.7554/eLife.34248.014
Figure 3—figure supplement 1
Analyses of basal ganglia data using estimated action-values from the neurons' sessions.

(A) The standard analysis on the neuronal recordings taken from (Ito and Doya, 2009), using action-values estimated from the behavior in the sessions in which the neurons were recorded. The top and …

https://doi.org/10.7554/eLife.34248.015
Figure 4
A possible solution for the temporal correlations confound that is based on a trial design.

(A) A Q-learning model was simulated in 1,000 sessions of 400 trials, where the original reward probabilities (same as in Figure 1A) were associated with different cues and appeared randomly. …

https://doi.org/10.7554/eLife.34248.016
Figure 5 with 1 supplement
Erroneous detection of action-value representation in policy neurons.

(A) Behavior of the model in an example session, same as in Figure 1A for the direct-policy model. (B) Red and blue lines denote ‘action-values’ 1 and 2, respectively, calculated from the choices …

https://doi.org/10.7554/eLife.34248.017
Figure 5—figure supplement 1
Alternative analyses do not resolve the correlated decision variables confound.

(A) Regression analysis on policy neurons with choice as an added regressor. Following common experimental practice, we used the following regression model: $s(t) = \beta_0 + \beta_1 Q_1(t) + \beta_2 Q_2(t) + \beta_3 (a(t) = 1) + \epsilon(t)$, where $s(t)$ is the spike count in trial $t$

https://doi.org/10.7554/eLife.34248.018
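
The regression model in this caption, spike count as a linear function of the two action-values plus a choice term, can be fitted by ordinary least squares. The sketch below is one way to do that with the standard library only. It assumes that the choice term is an indicator equal to 1 when action 1 is chosen and 0 otherwise, which is an interpretation of the caption's notation. The function names and the synthetic data (uniform action-values, coefficients `true_betas`, noise level) are hypothetical and serve only to check that known coefficients are recovered.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_spike_count_model(spikes, q1, q2, choices):
    """Least-squares [b0, b1, b2, b3] for s = b0 + b1*Q1 + b2*Q2 + b3*(a == 1)."""
    # Design matrix rows: intercept, Q1, Q2, and an assumed 0/1 choice indicator.
    rows = [[1.0, v1, v2, 1.0 if c == 1 else 0.0]
            for v1, v2, c in zip(q1, q2, choices)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * s for r, s in zip(rows, spikes)) for i in range(k)]
    return solve_linear_system(xtx, xty)  # normal equations


# Synthetic check with hypothetical coefficients and independent regressors.
rng = random.Random(0)
n_trials = 300
q1 = [rng.random() for _ in range(n_trials)]
q2 = [rng.random() for _ in range(n_trials)]
choices = [rng.choice([1, 2]) for _ in range(n_trials)]
true_betas = [5.0, 2.0, -1.0, 0.5]
spikes = [true_betas[0] + true_betas[1] * v1 + true_betas[2] * v2
          + true_betas[3] * (1.0 if c == 1 else 0.0) + rng.gauss(0.0, 0.1)
          for v1, v2, c in zip(q1, q2, choices)]
betas = fit_spike_count_model(spikes, q1, q2, choices)
```

In this synthetic check the regressors are independent and the noise is uncorrelated across trials, so the fit behaves well. The confounds discussed in the figures arise when those conditions fail, which adding the choice regressor does not by itself address.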
Figure 6
A possible solution for the policy and state confounds.

(A) The Q-learning model (Equations 1 and 2) was simulated in 1,000 sessions of 400 trials each, where the reward probabilities were associated with different cues and were randomly chosen in each …

https://doi.org/10.7554/eLife.34248.019
