Reward prediction error does not explain movement selectivity in DMS-projecting dopamine neurons

7 figures, 1 table and 5 additional files

Figures

Figure 1 with 1 supplement
Mice performed a probabilistic reversal learning task during GCaMP6f recordings from VTA/SN::DMS terminals or cell bodies.

(a) Schematic of a mouse performing the task. The illumination of the central nosepoke signaled the start of the trial, allowing the mouse to enter the nose port. After a 0–1 second jitter delay, two levers were presented to the mouse, one of which resulted in a reward with high probability (70%) and the other with low probability (10%). The levers swapped probabilities on a pseudorandom schedule, unsignaled to the mouse. (b) The average probability of choosing the lever that had the high value before the switch, for the 10 trials before and after the block switch, when the identity of the high-value lever reversed. Error bars indicate ±1 standard error (n = 19 recording sites). (c) We fit behavior with a trial-by-trial Q-learning mixed effect model. Example trace of 150 trials of a mouse's behavior compared to the model’s results. Black bars above and below the plot indicate which lever had the high probability of reward; orange dots indicate the mouse’s actual choice; blue dots indicate whether or not the mouse was rewarded; the grey line indicates the difference in the model’s Q values for contralateral and ipsilateral choices. (d) Surgical schematic for recording with optical fibers from the GCaMP6f terminals originating from VTA/SN. (e) Example recording from VTA/SN::DMS terminals in a mouse expressing GCaMP6f (top) or GFP (bottom). (f, g) Previous work has reported contralateral choice selectivity in VTA/SN::DMS terminals (Parker et al., 2016) when the signals are time-locked to nose poke (f) and lever presentation (g). ‘Contra’ and ‘Ipsi’ refer to the location of the lever relative to the side of the recording. Colored fringes represent ±1 standard error (n = 12 recording sites).

https://doi.org/10.7554/eLife.42992.002
Figure 1—figure supplement 1
Recording from VTA/SN::DMS cell bodies (n = 7 recording sites).

(a) Surgical schematic for recording with optical fibers from the GCaMP6f VTA/SN::DMS cell-bodies. (b) Sample GCaMP6f traces from VTA/SN::DMS cell bodies. (c) Contralateral choice selectivity was also observed in DMS DA cell bodies when the signals were time-locked to nose poke (top) and lever presentation (bottom). Colored fringes represent ±1 standard error from activity averaged across recording sites (n = 7).

https://doi.org/10.7554/eLife.42992.003
Figure 2
Schematics of three possible types of value modulation at lever presentation.

Trials here are divided based on the difference in Q values for the chosen and unchosen actions. (a) Contralateral value modulation postulates that the signals are selective for the value of the contralateral action (relative to the ipsilateral value) rather than the value of the chosen action. This means that the direction of value modulation should be flipped for contralateral versus ipsilateral choices. Since mice more often choose an option when its value is higher, the average GCaMP6f signals would be higher for contralateral than ipsilateral choices. (b) Alternatively, the signals may be modulated by the value of the chosen action, resulting in similar value modulation for contralateral and ipsilateral choices. This type of value modulation will not by itself produce the contralateral selectivity seen in previous results. (c) However, if the signals were modulated by both the chosen value and the contralateral choice, the averaged GCaMP6f signals would exhibit the previously seen contralateral selectivity.

https://doi.org/10.7554/eLife.42992.006
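
The relationship between the two value differences contrasted in these schematics can be made concrete with a small worked example. The sketch below is illustrative only: the function name `value_regressors` and the example Q values are hypothetical and do not come from the article. It shows that the chosen-minus-unchosen difference equals the contralateral-minus-ipsilateral difference on contralateral choices and its negation on ipsilateral choices, which is why contralateral value modulation predicts a flipped direction of modulation across the two choice types.

```python
def value_regressors(q_contra, q_ipsi, chose_contra):
    """Return (contra_minus_ipsi, chosen_minus_unchosen) for one trial."""
    contra_minus_ipsi = q_contra - q_ipsi
    # On contralateral choices the two differences coincide; on
    # ipsilateral choices they have opposite signs.
    if chose_contra:
        chosen_minus_unchosen = contra_minus_ipsi
    else:
        chosen_minus_unchosen = -contra_minus_ipsi
    return contra_minus_ipsi, chosen_minus_unchosen


# Hypothetical trials: (Q_contra, Q_ipsi, chose_contra)
trials = [
    (0.75, 0.25, True),    # chose the higher-valued contralateral lever
    (0.75, 0.25, False),   # chose the lower-valued ipsilateral lever
    (0.125, 0.625, False), # chose the higher-valued ipsilateral lever
]
rows = [value_regressors(*t) for t in trials]
for trial, row in zip(trials, rows):
    print(trial, "->", row)
```

A signal that tracks contralateral value would follow the first element of each pair, whereas a signal that tracks chosen value would follow the second.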
Figure 3 with 7 supplements
DA neurons that project to DMS were modulated by both chosen value and movement direction.

(a) GCaMP6f signal time-locked to lever presentation for contralateral trials (blue) and ipsilateral trials (orange), as well as rewarded (solid) and non-rewarded (dotted) previous trials, from VTA/SN::DMS terminals. Colored fringes represent ±1 standard error from activity averaged across recording sites (n = 12). (b) GCaMP6f signal for contralateral trials (blue) and ipsilateral trials (orange), further binned by the difference in Q values for the chosen and unchosen actions. Colored fringes represent ±1 standard error from activity averaged across recording sites (n = 12). (c) Mixed effect model regression on each datapoint from 3 seconds of GCaMP6f traces. Explanatory variables include the action of the mice (blue), the difference in Q values for chosen and unchosen actions (orange), their interaction (green), and an intercept. Colored fringes represent ±1 standard error from estimates (n = 12 recording sites). Black diamond represents the average latency for mice pressing the lever, with the error bars showing the spread of 80% of the latency values. Dots at bottom mark timepoints when the corresponding effect is significantly different from zero at p<0.05 (small dot), p<0.01 (medium dot), p<0.001 (large dot). P values were corrected with the Benjamini-Hochberg procedure. (d-f) Same as (a-c), except with signals from VTA/SN::DMS cell bodies averaged across recording sites (n = 7) instead of terminals.

https://doi.org/10.7554/eLife.42992.007
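
The Benjamini-Hochberg correction named in the caption above can be sketched as follows. This is a generic Python illustration of the standard step-up adjustment, not the authors' analysis code; the function names `benjamini_hochberg` and `dot_size` and the example p-values are made up for illustration. Only the three significance thresholds are taken from the caption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def dot_size(p_adjusted):
    """Map an adjusted p-value to the dot sizes described in the caption."""
    if p_adjusted < 0.001:
        return "large"
    if p_adjusted < 0.01:
        return "medium"
    if p_adjusted < 0.05:
        return "small"
    return None  # not significant: no dot


# Hypothetical raw p-values, one per timepoint.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([(round(p, 4), dot_size(p)) for p in adj])
```

In a per-timepoint analysis, `raw` would hold one p-value per timepoint for a given effect, and the adjusted values would determine which dot, if any, is drawn.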
Figure 3—figure supplement 1
Four Examples of non-Z-scored Individual Sessions of Photometry Data from VTA/SN::DMS Terminals.

Sample non-Z-scored ∆F/F recordings from VTA/SN::DMS terminals. Each row is an example session from a different mouse. Traces are time-locked to the lever presentation for contralateral trials (left column) and ipsilateral trials (right column). White dotted vertical lines indicate lever presentation. A colorbar is provided for each row (example session).

https://doi.org/10.7554/eLife.42992.008
Figure 3—figure supplement 2
Four Examples of non-Z-scored Individual Sessions of Photometry Data from VTA/SN::DMS Cell-Bodies.

Sample non-Z-scored ∆F/F recordings from VTA/SN::DMS cell bodies. Each row is an example session from a different mouse. Traces are time-locked to the lever presentation for contralateral trials (left column) and ipsilateral trials (right column). White dotted vertical lines indicate lever presentation. A colorbar is provided for each row (example session).

https://doi.org/10.7554/eLife.42992.009
Figure 3—figure supplement 3
Mixed effect model regression on GCaMP6f traces of VTA/SN::DMS terminals (n = 12 recording sites) using the difference in Q values for contralateral and ipsilateral choices.

Same analysis as Figure 3c, except that the explanatory variables include the action of the mice (blue), the difference in Q values for contralateral and ipsilateral choices (orange), their interaction (green), and an intercept. Colored fringes represent ±1 standard error from estimates (n = 12 recording sites). Dots at bottom mark timepoints where the corresponding effect is significantly different from zero at p<0.05 (small dot), p<0.01 (medium dot), p<0.001 (large dot). P values were corrected with the Benjamini-Hochberg procedure.

https://doi.org/10.7554/eLife.42992.010
Figure 3—figure supplement 4
Analysis of DA signals time-locked to nose poke.

(a) GCaMP6f signal time-locked to nose poke for contralateral trials (blue) and ipsilateral trials (orange), as well as rewarded (solid) and non-rewarded (dotted) previous trials, from VTA/SN::DMS terminals. Colored fringes represent ±1 standard error from activity averaged across recording sites (n = 12). (b) GCaMP6f signal for contralateral trials (blue) and ipsilateral trials (orange), further binned by the difference in Q values for the chosen and unchosen actions. Colored fringes represent ±1 standard error from activity averaged across recording sites (n = 12). (c) Mixed effect model regression on each datapoint from 3 seconds of GCaMP6f traces. Explanatory variables include the action of the mice (blue), the difference in Q values for chosen vs unchosen actions (orange), their interaction (green), and an intercept. Colored fringes represent ±1 standard error from estimates (n = 12 recording sites). Black diamond represents the average latency for lever presentation from nose poke, with the error bars showing the spread of 80% of the latency values. Dots at bottom mark timepoints when the corresponding effect is significantly different from zero at p<0.05 (small dot), p<0.01 (medium dot), p<0.001 (large dot). P values were corrected with the Benjamini-Hochberg procedure. (d-f) Same as (a-c), except with signals from VTA/SN::DMS cell bodies averaged across recording sites (n = 7) instead of terminals.

https://doi.org/10.7554/eLife.42992.011
Figure 3—figure supplement 5
Kernels for each significant behavioral event from the multiple event kernel analysis.

(a) Nose poke kernels from the linear regression model fit to GCaMP6f signals from VTA/SN::DMS terminals. Each line is the kernel for one combination of contralateral (blue) or ipsilateral (orange) trials and rewarded (solid) or non-rewarded (dotted) trials. Colored fringes represent ±1 standard error from activity averaged across recording sites (n = 12). Black diamond represents the average latency for lever presentation from nose poke, with the error bars showing the spread of 80% of the latency values. (b) Lever presentation kernels, with the black diamond representing the average latency from lever press to lever presentation. (c) Lever press kernels, with the black diamond representing the average latency from CS+ or CS- to lever press. (d-f) Same as (a-c), except with signals from VTA/SN::DMS cell bodies averaged across recording sites (n = 7) instead of terminals.

https://doi.org/10.7554/eLife.42992.012
Figure 3—figure supplement 6
Averaged GCaMP6f signals of left and right hemispheres recordings from VTA/SN::DMS cell-bodies data (n = 4 mice, 7 recording sites).

GCaMP6f signal relative to the lever presentation time for contralateral trials (blue) and ipsilateral trials (orange), as well as rewarded (solid) and non-rewarded (dotted) previous trials, from VTA/SN::DMS cell bodies. Colored fringes represent ±1 standard error from activity averaged across trials. Each row represents averaged data from a distinct mouse (n = 4 total), with the left and right columns representing the left and right hemisphere recordings.

https://doi.org/10.7554/eLife.42992.013
Figure 3—figure supplement 7
Mixed effect model regression with latency as nuisance covariate.

(a) Mixed effect model regression with log latency of lever press (red) as an additional nuisance covariate for VTA/SN::DMS terminal data (n = 12 recording sites). As in Figure 3c,f, the mixed effect model’s other explanatory variables include the action of the mice (blue), the difference in Q values for chosen vs unchosen actions (orange), their interaction (green), and an intercept. Colored fringes represent ±1 standard error from estimates. Dots at bottom mark timepoints when the corresponding effect is significantly different from zero at p<0.05 (small dot), p<0.01 (medium dot), p<0.001 (large dot). P values were corrected with the Benjamini-Hochberg procedure. (b) Same as (a), except with signals from VTA/SN::DMS cell bodies averaged across recording sites (n = 7) instead of terminals.

https://doi.org/10.7554/eLife.42992.014
Figure 4
DA neurons that project to DMS reversed their choice selectivity after the lever press, around the time the mice reversed their movement direction.

(a) GCaMP6f signal from VTA/SN::DMS terminals time-locked to the lever press, for contralateral choice trials (blue) and ipsilateral choice trials (orange), as well as rewarded (solid) and non-rewarded (dotted) previous trials. The GCaMP6f traces for the two choices cross shortly after the lever press, corresponding to the change in the mice's head direction around the time of the lever press (shown schematically above the plot). Colored fringes represent ±1 standard error from activity averaged across recording sites (n = 12). (b) Same as (a), except with signals from VTA/SN::DMS cell bodies averaged across recording sites (n = 7) instead of terminals.

https://doi.org/10.7554/eLife.42992.015
Author response image 1
Early and late trials in block are both modulated by chosen value and contralateral action (VTA/SN::DMS Terminals, n = 12 sites)

(A) GCaMP6f signal from VTA/SN::DMS terminals (n = 12 sites) from the first 6 trials of each block. Traces are time-locked to the lever presentation for contralateral trials (blue) and ipsilateral trials (orange), as well as rewarded (solid) and non-rewarded (dotted) previous trials. Colored fringes represent 1 standard error from activity averaged across recording sites (n = 12). (B) GCaMP6f signal for contralateral trials (blue) and ipsilateral trials (orange), further binned by the difference in Q values for the chosen and unchosen actions. Colored fringes represent 1 standard error from activity averaged across recording sites (n = 12). (C) Mixed effect model regression on each datapoint from 3 seconds of GCaMP6f traces. Explanatory variables include the action of the mice (blue), the difference in Q values for chosen vs. unchosen actions (orange), their interaction (green), and an intercept. Colored fringes represent 1 standard error from estimates. Dots at bottom mark timepoints when the corresponding effect is significantly different from zero at p<.05 (small dot), p<.01 (medium dot), p<.001 (large dot). P values were corrected with the Benjamini-Hochberg procedure. (D-F) Same as (A-C), except using the last 6 trials of each block.

https://doi.org/10.7554/eLife.42992.030
Author response image 2
Early and late trials in block are both modulated by chosen value and contralateral action (VTA/SN::DMS Cell-bodies, n = 7 sites)

(A) GCaMP6f signal from VTA/SN::DMS cell bodies (n = 7 sites) from the first 6 trials of each block. Traces are time-locked to the lever presentation for contralateral trials (blue) and ipsilateral trials (orange), as well as rewarded (solid) and non-rewarded (dotted) previous trials. Colored fringes represent 1 standard error from activity averaged across recording sites (n = 7). (B) GCaMP6f signal for contralateral trials (blue) and ipsilateral trials (orange), further binned by the difference in Q values for the chosen and unchosen actions. Colored fringes represent 1 standard error from activity averaged across recording sites (n = 7). (C) Mixed effect model regression on each datapoint from 3 seconds of GCaMP6f traces. Explanatory variables include the action of the mice (blue), the difference in Q values for chosen vs. unchosen actions (orange), their interaction (green), and an intercept. Colored fringes represent 1 standard error from estimates. Dots at bottom mark timepoints when the corresponding effect is significantly different from zero at p<.05 (small dot), p<.01 (medium dot), p<.001 (large dot). P values were corrected with the Benjamini-Hochberg procedure. (D-F) Same as (A-C), except using the last 6 trials of each block.

https://doi.org/10.7554/eLife.42992.031
Author response image 3
Kernels for each significant behavioral event for mixed effect model regression

(A) Nose poke kernels from the linear regression model fit to GCaMP6f signals from VTA/SN::DMS terminals. Each line represents a normalized regression variable: action (blue; 0 for ipsilateral, 1 for contralateral), difference in Q values for chosen direction and unchosen direction (orange), and the interaction between the two (green). Colored fringes represent 1 standard error from activity averaged across recording sites (n = 12). Black diamond represents the average latency for lever presentation from nose poke, with the error bars showing the spread of 80% of the latency values. (B) Lever presentation kernels, with the black diamond representing the average latency from lever press to lever presentation. (C) Lever press kernels, with the black diamond representing the average latency from CS+ or CS- to lever press. (D-F) Same as (A-C), except with signals from VTA/SN::DMS cell bodies averaged across recording sites (n = 7) instead of terminals.

https://doi.org/10.7554/eLife.42992.032

Tables

Table 1
Fitted Parameters for Q-learning model from PyStan.

25th, 50th, and 75th percentiles of the alpha, beta, and stay parameters of the Q-learning mixed effect model. These are the group-level parameters that reflect the distribution of the subject-level parameters.

https://doi.org/10.7554/eLife.42992.004
Parameter                     25th percentile   50th percentile (median)   75th percentile
Alpha (learning rate)         0.581607          0.611693                   0.639946
Beta (inverse temperature)    0.926501          0.990275                   1.058405
Stay                          0.883670          0.945385                   1.008465
Table 1—source data 1

Mixed effect Q-learning model parameters.

Parameters from the mixed effect Q-learning model, including group-level and individual-level parameters, and the mean and range of data across samples from the model. See 'Q Learning Mixed Effect Model' in the Materials and methods section for more details. 

https://doi.org/10.7554/eLife.42992.005
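
To give a feel for what the alpha, beta, and stay parameters in Table 1 do, here is a minimal Python sketch of a generic Q-learning agent with a stay term. It is an assumption-laden illustration, not the authors' Stan model: the delta-rule update, the logistic choice rule, the coding of the stay term as plus or minus `STAY`, and the names `p_choose_contra` and `update` are all hypothetical choices made here. The exact model is specified in the 'Q Learning Mixed Effect Model' section of the Materials and methods. The parameter values are the group-level medians from Table 1, and the three-trial history is invented.

```python
import math

# Group-level medians from Table 1.
ALPHA = 0.611693  # learning rate
BETA = 0.990275   # inverse temperature
STAY = 0.945385   # bias toward repeating the previous choice


def p_choose_contra(q, prev_choice, beta=BETA, stay=STAY):
    """Assumed logistic choice rule.

    prev_choice is 'contra', 'ipsi', or None (no previous trial).
    """
    stay_term = {"contra": stay, "ipsi": -stay, None: 0.0}[prev_choice]
    logit = beta * (q["contra"] - q["ipsi"]) + stay_term
    return 1.0 / (1.0 + math.exp(-logit))


def update(q, choice, reward, alpha=ALPHA):
    """Assumed delta-rule update of the chosen action's Q value only."""
    new_q = dict(q)
    new_q[choice] += alpha * (reward - q[choice])
    return new_q


# Invented three-trial history: (choice, reward).
history = [("contra", 1), ("contra", 0), ("ipsi", 1)]
q = {"contra": 0.0, "ipsi": 0.0}
prev = None
for choice, reward in history:
    p = p_choose_contra(q, prev)  # model's P(contra) before the outcome
    print(f"P(contra)={p:.3f} chose={choice} reward={reward}")
    q = update(q, choice, reward)
    prev = choice
print(q)
```

Under these assumptions, a larger alpha makes Q values track recent outcomes more closely, a larger beta makes choices more deterministic with respect to the Q-value difference, and a positive stay value favors repeating the previous choice.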

Additional files

Source code 1

Mixed Effect Q-learning Model Code.

Stan code for the trial-by-trial Mixed Effect Q-learning Model (more details in Materials and methods section).

https://doi.org/10.7554/eLife.42992.016
Source code 2

Regression Model for Figure 3c,f.

Julia code for running the regression model of GCaMP6f against Q values, mice’s action, the interaction between those two variables, and an intercept. Regression was performed using Julia’s MixedModels package. See 'Regression Model' in Materials and methods section for more details. 

https://doi.org/10.7554/eLife.42992.017
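
The structure of this regression (intercept, action, Q-value difference, and their interaction) can be illustrated at a single timepoint with the simplified Python sketch below. It is explicitly not the authors' Julia code: it uses ordinary least squares with no random effects, and the helper names (`design_row`, `solve`, `ols`), the trial values, and the coefficients are all invented for illustration. The 0/1 coding of action (0 for ipsilateral, 1 for contralateral) follows the description under Author response image 3.

```python
def design_row(chose_contra, q_diff):
    """One row of the design matrix: intercept, action, Q difference, interaction."""
    action = 1.0 if chose_contra else 0.0
    return [1.0, action, q_diff, action * q_diff]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented trials: (chose_contra, chosen-minus-unchosen Q difference).
trials = [(True, 0.5), (True, -0.25), (True, 0.0),
          (False, 0.5), (False, -0.5), (False, 0.25)]
rows = [design_row(c, q) for c, q in trials]

# Invented noise-free signal at one timepoint, built from known coefficients.
true_beta = [0.1, 0.4, 0.3, -0.2]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in rows]

estimated = ols(rows, y)
print([round(b, 6) for b in estimated])
```

Repeating such a fit at every timepoint of the trace would yield one coefficient time course per explanatory variable, which is the kind of output plotted in Figure 3c,f. The published analysis additionally models variation across recording sites with mixed effects.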
Source data 1

GCaMP6f data.

Each row is one timepoint, with columns that denote the GCaMP signal for that timepoint, binary indicator variables for behavioral events at that timepoint (0 represents no event at this timepoint, 1 represents event occurred at this timepoint), the recording site, session, and high value lever at the timepoint. Behavioral events include trial start, nose poke enter and exit, lever presentations, and contralateral or ipsilateral lever press.

https://doi.org/10.7554/eLife.42992.018
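
A table laid out this way, with one row per timepoint and binary event indicators, lends itself to event-locked averaging. The Python sketch below shows the idea on toy data; the function names `event_locked_windows` and `mean_trace`, the window sizes, and the signal values are hypothetical, and the actual column names in the file are not reproduced here.

```python
def event_locked_windows(signal, event_flags, pre, post):
    """Collect signal[t - pre : t + post + 1] around each timepoint t
    where event_flags[t] == 1, skipping events too close to either edge."""
    windows = []
    for t, flag in enumerate(event_flags):
        if flag == 1 and t - pre >= 0 and t + post < len(signal):
            windows.append(signal[t - pre : t + post + 1])
    return windows


def mean_trace(windows):
    """Average the windows timepoint by timepoint; empty input gives []."""
    if not windows:
        return []
    n = len(windows)
    return [sum(column) / n for column in zip(*windows)]


# Toy data: ten timepoints with events at t = 2, t = 6, and t = 9.
signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
lever_presentation = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]

windows = event_locked_windows(signal, lever_presentation, pre=1, post=2)
print(windows)              # the event at t = 9 is dropped (window runs off the end)
print(mean_trace(windows))
```

With real data, `signal` would be the GCaMP column for one recording site and session, and `event_flags` one of the event indicator columns.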
Source data 2

Trial-by-trial data with Q values and GCaMP6f.

CSV with relevant trial information for each trial across terminals and cell-bodies data. Trial information includes the recording location, recording site ID, session ID, the mouse’s choice, and whether or not the mouse was rewarded. Additional columns include the Q values for each trial (including Q value of contralateral minus ipsilateral choice and Q values of chosen minus unchosen choice) and the z-scored GCaMP signal time-locked at four important behavioral events (nose poke, lever presentation, lever press/choice, and reward).

https://doi.org/10.7554/eLife.42992.019
Transparent reporting form
https://doi.org/10.7554/eLife.42992.020

  1. Rachel S Lee
  2. Marcelo G Mattar
  3. Nathan F Parker
  4. Ilana B Witten
  5. Nathaniel D Daw
(2019)
Reward prediction error does not explain movement selectivity in DMS-projecting dopamine neurons
eLife 8:e42992.
https://doi.org/10.7554/eLife.42992