Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference

  1. Lorenz Deserno (corresponding author)
  2. Rani Moran (corresponding author)
  3. Jochen Michely
  4. Ying Lee
  5. Peter Dayan
  6. Raymond J Dolan
  1. Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, United Kingdom
  2. The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, United Kingdom
  3. Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University of Würzburg, Germany
  4. Department of Psychiatry and Psychotherapy, Technische Universität Dresden, Germany
  5. Department of Psychiatry and Psychotherapy, Charité Universitätsmedizin Berlin, Germany
  6. Max Planck Institute for Biological Cybernetics, Germany
  7. University of Tübingen, Germany
17 figures, 4 tables and 1 additional file

Figures

Study and task design.

(A) Illustration of within-subjects design. On each of 2 testing days, approximately 7 days apart, participants started with either a medical screening and brief physical exam (day 1) or a working memory test (day 2). Subsequently they drank an orange squash containing either levodopa ("D") or placebo ("P"). (B) Task structure of the Magic Castle Game. Following a choice of vehicle, participants ‘travelled’ to two associated destinations. Each vehicle shared a destination with another vehicle. At each destination, participants could win a reward (10 pence) with a probability that drifted slowly as a Gaussian random walk, illustrated in (C). (D) Depiction of trial types and sequences. (1) On standard trials (2/3 of the trials), participants chose between two options in trial n (max. choice time 2 s). The choice was then highlighted (0.25 s) and participants subsequently visited each destination (0.5 s displayed alone). Reward, if obtained, was overlaid on each of the destinations for 1 s. (2) On uncertainty trials, participants chose between two pairs of vehicles. Subsequently, the ghost nominated, unbeknown to the participant, one vehicle out of the chosen pair. First, the participant was shown the destination shared by the chosen pair of vehicles (here the forest); this destination is therefore non-informative about the ghost’s nominee. Second, the destination unique to the ghost-nominated vehicle was shown (the highway). This second destination is informative because it enables inference of the ghost’s nominee with perfect certainty, based on a model-based (MB) inference that relies on the task transition structure. Trial timing was identical for standard and uncertainty trials.
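The drifting reward probabilities in (C) can be sketched as a bounded Gaussian random walk. This is a minimal illustration rather than the task code: the step size `sd`, the starting value, the bounds, the number of trials and the reflecting-boundary rule are all assumptions chosen for the example.

```python
import random

def reflect(p, lo, hi):
    """Fold a value back into [lo, hi] by reflecting it at the boundaries."""
    while p < lo or p > hi:
        if p < lo:
            p = 2 * lo - p
        if p > hi:
            p = 2 * hi - p
    return p

def drifting_reward_probs(n_trials, start=0.5, sd=0.025, lo=0.0, hi=1.0, seed=0):
    """Return one destination's reward probability on each trial.

    All default values are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    probs = [start]
    for _ in range(n_trials - 1):
        # Each trial adds zero-mean Gaussian noise to the previous probability.
        probs.append(reflect(probs[-1] + rng.gauss(0.0, sd), lo, hi))
    return probs

def sample_reward(p, rng):
    """Bernoulli draw: 1 if the destination pays out on this trial, else 0."""
    return 1 if rng.random() < p else 0

probs = drifting_reward_probs(n_trials=200)
```

Running one such walk per destination gives four independent, slowly changing reward probabilities of the kind the caption describes.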

Model-free (MF) and model-based (MB) contributions.

(A) Left panel: Illustration of MF and MB values at choice in standard trials. MB values are computed prospectively based on the sum of values of the two destinations associated with each of the two vehicles offered for choice (here highway and forest for green antique car; desert and forest for yellow racing car). Right panel: MF vs. MB credit assignment (MFCA vs. MBCA) in standard trials. MFCA only updates the chosen vehicle (here green antique car) based on the sum of rewards at each destination (here forest and highway). MBCA updates separately the values for each of the two destinations (forest and highway). Each of these updates will prospectively affect equally the values of the vehicle pair associated with that destination (updated MB value for forest influences MB value of the yellow racing car and the green antique car while the updated MB value for highway influences the MB value of the green antique car and the blue crane). Forgetting of Q values was left out for simplicity (see Materials and methods and see Appendix 1—figure 1 for validating simulations). (B1) Illustration of MF choice repetition. We consider only standard trials n + 1 that offer for choice the standard trial n chosen vehicle (e.g., green antique car) alongside another vehicle (e.g., yellow racing car), sharing a common destination (forest). Following choice of a vehicle in trial n (framed in red, here the green antique car), participants visited two destinations of which one can be labelled on trial n + 1 as common to both offered vehicles (C, e.g., forest, which was also rewarded in the example) and the other labelled as unique (U, e.g., city highway, unrewarded in this example) to the vehicle chosen on trial n (the green antique car). The trial n common destination reward effect on the probability to repeat the previously chosen vehicle (dashed frame in red, e.g., the green antique car) constitutes an MF choice repetition. 
(B2) The empirical reward effect for the common destination (i.e., the difference between rewarded and unrewarded on trial n, see Appendix 1—figure 2 for a more detailed plot of this effect) on repetition probability in trial n + 1 is plotted for placebo and levodopa (L-Dopa) conditions. There was a positive common reward main effect and this reward effect did not differ significantly between placebo and levodopa conditions. (C1) Illustration of the MB contribution. We considered only standard trials n + 1 that excluded from the choice set the standard trial n chosen vehicle (e.g., green antique car, framed in red). One of the vehicles offered on trial n + 1 shared one destination in common with the trial n chosen vehicle (e.g., yellow racing car, sharing the forest, and we term its choice a generalization). A reward (on trial n) effect for the common destination on the probability to generalize on trial n + 1 (e.g., by choice of the yellow racing car, dashed frame in red) constitutes a signature of MB choice generalization. (C2) The empirical reward effect at the common destination (i.e., the difference between rewarded and unrewarded, see Appendix 1—figure 2 for a more detailed plot of this effect) on generalization probability is plotted for placebo and levodopa conditions. (C3) In the regression analysis described in the text, we also included the current (subject- and trial-specific) state of the drifting reward probabilities (at the common destination) because we previously found this was necessary to control for temporal autocorrelations in rewards (Moran et al., 2019). For completeness, we plot beta regression weights of reward vs. no reward at the common destination (indicated as MB) and for the common reward probability (RewProbC) each for placebo and levodopa conditions. No significant interaction with drug session was observed. Error bars correspond to SEM reflecting variability between participants.
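The difference between MFCA and MBCA on a standard trial, as described in panel (A), can be sketched as follows. This is an illustration under stated assumptions, not the fitted model: the additive update rule and the weights `ca_mf` and `ca_mb` are hypothetical, forgetting is omitted as in the figure, the tractor's destinations are inferred from the pairing described in the captions, and `destination_4` is a placeholder because the fourth destination is not named in this section.

```python
# Vehicle -> destinations, inferred from the figure captions (partly assumed).
TRANSITIONS = {
    "green_antique_car": ("forest", "highway"),
    "yellow_racing_car": ("forest", "desert"),
    "blue_crane": ("highway", "destination_4"),
    "brown_tractor": ("desert", "destination_4"),
}

def mb_value(vehicle, q_dest):
    """Prospective MB value: sum of the values of the vehicle's two destinations."""
    return sum(q_dest[d] for d in TRANSITIONS[vehicle])

def update_standard_trial(chosen, rewards, q_mf, q_dest, ca_mf=0.3, ca_mb=0.3):
    """Apply MFCA and MBCA after a standard trial (forgetting omitted).

    rewards maps each visited destination to 0 or 1.
    ca_mf and ca_mb are hypothetical credit-assignment weights.
    """
    # MFCA: only the chosen vehicle is credited, with the summed reward.
    q_mf[chosen] += ca_mf * sum(rewards.values())
    # MBCA: each visited destination is credited with its own reward.
    for dest, r in rewards.items():
        q_dest[dest] += ca_mb * r

q_mf = {v: 0.0 for v in TRANSITIONS}
q_dest = {d: 0.0 for pair in TRANSITIONS.values() for d in pair}
update_standard_trial("green_antique_car", {"forest": 1, "highway": 0}, q_mf, q_dest)
```

In this example the forest reward raises the MB value of the unchosen yellow racing car through the shared destination, which corresponds to the generalization signature in (C1), while its MF value stays at zero.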

Illustration of model-free (MF) credit assignment (MFCA) guided by model-based (MB) inference and MB credit assignment (MBCA) in uncertainty trials.

The ghost, unbeknown to the participants, nominates a vehicle (e.g., green antique car). The ghost’s nomination does not matter for MBCA because it updates values for each of the destinations (here forest and highway) separately, which will prospectively affect all associated vehicle values (here green antique car, yellow racing car, and blue crane). With respect to MFCA guided by MB inference, participants are in state uncertainty and have a chance belief about the ghost-nominated vehicle. The destination presented first (the forest) holds no information about the ghost-nominated vehicle (the green antique car) and is therefore the non-informative (‘N’) destination. Thus, participants remain in state uncertainty. The destination presented second (here the highway) enables retrospective MB inference about the ghost’s nomination (the green antique car) and is therefore informative (‘I’). This retrospective MB inference enables preferential MFCA for the ghost-nominated vehicle (here green antique car) based on the sum of rewards at each destination (without such inference, MFCA can only occur equally for the ghost-nominated and ghost-rejected vehicles). Forgetting of Q values was left out for simplicity (see Materials and methods and see Appendix 1—figure 3 for validating simulations).
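The retrospective inference described above can be sketched as a lookup in the transition structure followed by an asymmetric MF update. This is a hedged illustration only: the weights in `CA_WEIGHTS` are hypothetical, the additive update is an assumption, forgetting is omitted, the tractor's destinations are inferred from the pairing in the captions, and `destination_4` is a placeholder for the destination not named in this section. Setting the nominated and rejected weights equal corresponds to the case without inference.

```python
# Vehicle -> destinations, inferred from the figure captions (partly assumed).
TRANSITIONS = {
    "green_antique_car": {"forest", "highway"},
    "yellow_racing_car": {"forest", "desert"},
    "blue_crane": {"highway", "destination_4"},
    "brown_tractor": {"desert", "destination_4"},
}

# Hypothetical MFCA weights, one per outcome type and nomination status.
CA_WEIGHTS = {
    ("informative", "nominated"): 0.4,
    ("informative", "rejected"): 0.1,
    ("non_informative", "nominated"): 0.3,
    ("non_informative", "rejected"): 0.1,
}

def infer_nominee(chosen_pair, informative_dest):
    """Retrospective MB inference: which vehicle of the pair leads to this destination?"""
    matches = [v for v in chosen_pair if informative_dest in TRANSITIONS[v]]
    if len(matches) != 1:
        raise ValueError("destination does not single out one vehicle of the pair")
    nominated = matches[0]
    rejected = next(v for v in chosen_pair if v != nominated)
    return nominated, rejected

def mfca_uncertainty_trial(chosen_pair, informative_dest, rewards, q_mf):
    """Credit both vehicles of the chosen pair, weighting the inferred nominee more.

    rewards maps "informative" and "non_informative" to 0 or 1.
    """
    nominated, rejected = infer_nominee(chosen_pair, informative_dest)
    for outcome, r in rewards.items():
        q_mf[nominated] += CA_WEIGHTS[(outcome, "nominated")] * r
        q_mf[rejected] += CA_WEIGHTS[(outcome, "rejected")] * r
    return nominated

q_mf = {v: 0.0 for v in TRANSITIONS}
pair = ("green_antique_car", "yellow_racing_car")
nominee = mfca_uncertainty_trial(
    pair, "highway", {"non_informative": 1, "informative": 1}, q_mf
)
```

The shared destination (forest) cannot identify the nominee, so `infer_nominee` raises an error for it; only the unique destination resolves the uncertainty.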

Guidance of model-free (MF) credit-assignment (CA) by retrospective model-based (MB) inference.

(A1) Illustration of the repeat condition. The ghost-nominated vehicle (e.g., green antique car) is offered for choice in standard trial n + 1 alongside a vehicle from the non-chosen pair (e.g., blue building crane). A higher probability to repeat the ghost-nominated vehicle in standard trial n + 1 after a reward as compared to no reward at the informative destination, the highway, constitutes model-free credit assignment (MFCA) for the ghost’s nomination (GN, the green antique car). (A2) Illustration of the switch condition. The ghost-rejected vehicle (e.g., the yellow racing car) is offered for choice in standard trial n + 1 alongside a vehicle from the non-chosen pair (e.g., brown farming tractor). A higher probability to choose the ghost-rejected vehicle in standard trial n + 1 after a reward as compared to no reward at the informative destination constitutes MFCA for the ghost’s rejection (GR). Both ghost-based assignments depend on retrospective model-based (MB) inference. (A3) Preferential effect of retrospective MB inference on MFCA (effects of GN > GR) based on the informative destination is enhanced under levodopa (L-Dopa; "D") as compared to placebo ("P"). This is indicated by a significant trial type (GN/GR) × drug (placebo/levodopa) interaction (also see Appendix 1—figure 4 and Appendix 1—figure 5 for more detailed plots). (B1) Illustration of the clash condition. The previously chosen pair (green antique and yellow racing car) is offered for choice in standard trial n + 1. A higher probability to repeat the ghost-nominated vehicle (the green antique car) in standard trial n + 1 following reward (relative to non-reward) at the non-informative destination (the forest) constitutes a signature of preferential MFCA for GN (the green antique car) over GR (the yellow racing car). (B2) Choice repetition in clash trials is plotted as a function of L-Dopa ("D") vs. placebo ("P") and reward (R+: reward; R−: no-reward, see Appendix 1—figure 6 for a more detailed plot). While there was a main effect of drug, there was no interaction of non-informative reward × drug, providing no evidence that the drug modulated MFCA based on the non-informative outcome. Error bars correspond to SEM reflecting variability between participants.
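The model-agnostic signatures in (A1) to (A3) are differences in choice probability after reward versus no reward. The sketch below computes them from raw proportions on made-up trial records; the data are purely illustrative, the function names are hypothetical, and the reported analyses use mixed effects models rather than this simple contrast.

```python
def choice_rate(trials, rewarded):
    """Proportion of trials with the given reward status on which the target vehicle was chosen."""
    picks = [chose for was_rewarded, chose in trials if was_rewarded == rewarded]
    return sum(picks) / len(picks)

def reward_effect(trials):
    """P(choose target | reward) minus P(choose target | no reward)."""
    return choice_rate(trials, True) - choice_rate(trials, False)

def preferential_effect(repeat_trials, switch_trials):
    """Reward effect for the ghost-nominated vehicle minus that for the ghost-rejected one."""
    return reward_effect(repeat_trials) - reward_effect(switch_trials)

# Each record: (reward at informative destination on trial n, target chosen on trial n + 1).
# Illustrative values only, not data from the study.
repeat_trials = [(True, True), (True, True), (True, False),
                 (False, True), (False, False), (False, False)]
switch_trials = [(True, True), (True, False), (False, True), (False, False)]

gn_effect = reward_effect(repeat_trials)   # ghost-nominated (repeat condition)
gr_effect = reward_effect(switch_trials)   # ghost-rejected (switch condition)
gn_minus_gr = preferential_effect(repeat_trials, switch_trials)
```

Computing `gn_minus_gr` separately for each drug session and subtracting placebo from levodopa gives a raw analogue of the trial type × drug interaction.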

Analyses based on estimated credit assignment (CA) parameters from computational modelling (for model comparisons based on the current data and the data of Moran et al., 2019, see Appendix 1—figure 7 and Appendix 1—figure 8; for parameter recoverability, see Appendix 1—figure 9).

(A) Model-free and model-based credit assignment parameters (MFCA; MBCA) did not differ significantly between placebo (P) and levodopa (D) conditions. (B) MFCA parameters based on the informative destination for the ghost-nominated (GN) and the ghost-rejected (GR) vehicles as a function of drug condition. (C) Same as B but for the non-informative destination. (D) The extent to which MFCA prefers the ghost-nominated over the ghost-rejected vehicle for each destination and drug condition. We name this preferential MFCA (PMFCA). Error bars correspond to SEM reflecting variability between participants.

Inter-individual differences in drug effects in model-based credit assignment (MBCA) and in preferential model-free credit assignment (PMFCA), averaged across informative and non-informative destinations (aPMFCA).

(A) Scatter plot of the drug effects (levodopa minus placebo; ∆aPMFCA, ∆MBCA), with dashed regression line and Pearson r correlation coefficient (see Appendix 1—figure 10 for an analysis that controls for parameter trade-off). (B) Drug effects in credit assignment (∆CA) based on a median split on ∆MBCA. Error bars correspond to SEM reflecting variability between participants. See Appendix 1—figure 11 for a report on inter-individual differences in drug effects related to working memory.
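The derived quantities plotted here follow from the fitted MFCA parameters by simple arithmetic: preferential MFCA is the nominated-minus-rejected difference for one destination, aPMFCA averages it over the two destinations, and the drug effect is levodopa minus placebo. The sketch below works through one hypothetical participant; the parameter values and the dictionary layout are assumptions for illustration, not estimates from the study.

```python
from statistics import mean

OUTCOMES = ("informative", "non_informative")

def pmfca(params, outcome):
    """Preferential MFCA: ghost-nominated minus ghost-rejected weight for one destination."""
    return params[(outcome, "nominated")] - params[(outcome, "rejected")]

def apmfca(params):
    """PMFCA averaged across the informative and non-informative destinations."""
    return mean(pmfca(params, outcome) for outcome in OUTCOMES)

def drug_effect(levodopa, placebo, measure):
    """Levodopa-minus-placebo difference in a derived measure."""
    return measure(levodopa) - measure(placebo)

# Hypothetical parameter values for a single participant.
placebo = {
    ("informative", "nominated"): 0.20, ("informative", "rejected"): 0.15,
    ("non_informative", "nominated"): 0.10, ("non_informative", "rejected"): 0.05,
}
levodopa = {
    ("informative", "nominated"): 0.30, ("informative", "rejected"): 0.10,
    ("non_informative", "nominated"): 0.20, ("non_informative", "rejected"): 0.10,
}

delta_apmfca = drug_effect(levodopa, placebo, apmfca)
```

For these made-up values aPMFCA is 0.05 under placebo and 0.15 under levodopa, so ∆aPMFCA is 0.10.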

Appendix 1—figure 1
Simulations for standard trials based on the full model and sub-models.

NR = no reward, R = reward. Rew = reward at the common destination, RewProbC = reward probability at the common destination.

Appendix 1—figure 2
Empirical probabilities of model-agnostic model-free (MF) (A and B) and model-based (MB) (C and D) choice contribution under placebo and levodopa (L-Dopa).

U-Non = no reward at unique destination, U-Rew = reward at unique destination, C-Non = no reward at common destination, C-Rew = reward at common destination.

Appendix 1—figure 3
Simulations for uncertainty trials based on the full model and sub-models.

GS = ghost-selected, GR = ghost-rejected.

Appendix 1—figure 4
Same data as plotted in Figure 4 in the main manuscript but individual variability reflects differences in task conditions.

GN = ghost-nominated, GR = ghost-rejected, R− = no reward, R+ = reward.

Appendix 1—figure 5
Retrospective model-based (MB) inference using the informative destination based on repeat and switch signatures after uncertainty trials.

I-Non = no reward at informative destination, I-Rew = reward at informative destination, N-Non = no reward at non-informative destination, N-Rew = reward at non-informative destination.

Appendix 1—figure 6
Retrospective model-based (MB) inference using the non-informative destination based on choice repetition in ‘clash’ trials n + 1 following an uncertainty trial n.

I-Non = no reward at informative destination, I-Rew = reward at informative destination, N-Non = no reward at non-informative destination, N-Rew = reward at non-informative destination.

Appendix 1—figure 7
Model comparison results.

(A) Results of the bootstrap-GLRT model comparison for the pure model-based (MB) sub-model. The blue bars show the histogram of the group twice log-likelihood improvement (model vs. sub-model) for synthetic data simulated using the sub-model (10,000 simulations). The blue line displays the smoothed null distribution (using Matlab’s ‘ksdensity’). The red line shows the empirical group twice log-likelihood improvement. The p-value reflects the proportion of 10,000 simulations that yielded an improvement in likelihood that was at least as large as the empirical improvement. (B–F) Same as (A), but for the pure model-free (MF) choice, the no informativeness effects on MF credit assignment (MFCA), the no MB guidance for MFCA, the no MB guidance for the informative destination, and the no MB guidance for the non-informative destination sub-models.
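The p-value defined in (A) is the fraction of simulated improvements that reach or exceed the empirical one. A minimal sketch of that calculation is below; the function names are hypothetical, and summing log-likelihoods over participants to obtain the group statistic is an assumption about how the group value is formed.

```python
def twice_ll_improvement(ll_full, ll_sub):
    """Group statistic: twice the summed log-likelihood gain of the full model over the sub-model.

    ll_full and ll_sub hold one maximised log-likelihood per participant.
    """
    return 2.0 * (sum(ll_full) - sum(ll_sub))

def bootstrap_p_value(null_improvements, empirical_improvement):
    """Proportion of null simulations with an improvement at least as large as the empirical one."""
    n_extreme = sum(1 for x in null_improvements if x >= empirical_improvement)
    return n_extreme / len(null_improvements)

# Toy numbers for illustration only.
empirical = twice_ll_improvement([-10.0, -20.0], [-12.0, -23.0])
p = bootstrap_p_value([1.0, 2.0, 12.0, 4.0], empirical)
```

Each entry of `null_improvements` would come from simulating data under the sub-model, fitting both models to the simulated data, and recomputing the statistic.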

Appendix 1—figure 8
Reanalysis of Moran et al., 2019, based on the current models.

(A–F) As Appendix 1—figure 7 but for the data of Moran et al., 2019. Each of the sub-models was rejected at the group level in favour of the full model. (G) Full-model uncertainty trials model-free credit assignment (MFCA) parameters as a function of outcome informativeness (blue/red) and nomination. Using a mixed effects model, we found a significant interaction effect between informativeness and nomination (b = 0.10, t = 2.05, p = 0.042) implying the nomination effect on MFCA was stronger for the informative than the non-informative outcome. Simple effect analysis showed significant positive nomination effects for both the informative outcome (blue; b = 0.2, F(1,156) = 28.16, p = 4e-7) and the non-informative outcome (red; b = 0.31, F(1,156) = 12.36, p = 6e-4). Thus, this analysis supports the conclusions from Moran et al., 2019, that retrospective model-based (MB) inference guides MFCA on uncertainty trials for both outcomes. Note that Moran et al., 2019, did not separate MFCA for the informative and non-informative outcomes.

Appendix 1—figure 9
Parameter recoverability.

For each of the 2*62 full-model parameter combinations, 1000 synthetic (simulated) datasets were created by simulating the full model on experimental sessions as in the true experiment. Then the full model was fit to each of these generated datasets. For each credit assignment (CA) parameter we plot the recovered against the generating parameters, report the Spearman correlation and impose black diagonals where ‘recovered = generating’. (A) Model-free CA (MFCA) on standard trials, (B–E) MFCA on uncertainty trials; (B) informative outcome, ghost-nominated, (C) informative outcome, ghost-rejected, (D) non-informative outcome, ghost-nominated, (E) non-informative outcome, ghost-rejected, (F) model-based CA (MBCA).
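Recoverability is summarised by the Spearman correlation between generating and recovered parameters, which is the Pearson correlation of their ranks. The sketch below implements it with the standard library on toy values; it illustrates the statistic only and does not reproduce the simulation or fitting pipeline.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy generating and recovered values (illustrative only).
generating = [0.10, 0.20, 0.30, 0.40]
recovered = [0.15, 0.18, 0.50, 0.45]
rho = spearman(generating, recovered)
```

Points lying on the ‘recovered = generating’ diagonal would give a correlation of 1; here one swapped pair of ranks lowers it to 0.8.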

Appendix 1—figure 10
Trade-off between parametric drug effects on averaged preferential model-free credit assignment (aPMFCA) and model-based credit assignment (MBCA).

Based on our parameter recovery simulations (see Appendix 1—figure 9), we also calculated, for each participant and each simulation, estimation errors (est errors) for drug effects on aPMFCA and MBCA (as differences between fitted and generating drug effects). Next, for each simulation index (i = 1,2,…,1000), we calculated the group-level Spearman correlation between these two estimation errors. The histogram of these correlations is plotted. There is a weak negative trade-off between estimation errors of drug effects on aPMFCA and MBCA. Importantly, the negative empirical correlation (vertical black line) was still significant even after controlling for this trade-off (p = 0.03; calculated as the proportion of simulations with correlation ≤ empirical correlation).

Appendix 1—figure 11
Scatter plots of the drug effect (levodopa minus placebo) on preferential model-free credit assignment (∆PMFCA) based on the informative destination reward and for the non-informative destination reward against working memory (WM).

Tables

Appendix 1—table 1
Mixed effects models on model-agnostic choice data from standard trials.
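MF choice (standard trials): REPEAT ~ 1 + C*U*DRUG*ORDER + (C + U + DRUG + ORDER | PART)

| Name | Estimate | SE | t-Stat | DF | p-Value | Lower CI | Upper CI |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.34 | 0.06 | 5.55 | 480 | 0.000 | 0.22 | 0.46 |
| C (common) | 0.67 | 0.07 | 9.14 | 480 | 0.000 | 0.53 | 0.82 |
| U (unique) | 1.54 | 0.09 | 17.40 | 480 | 0.000 | 1.36 | 1.71 |
| DRUG | 0.03 | 0.07 | 0.46 | 480 | 0.643 | –0.11 | 0.18 |
| ORDER | 0.07 | 0.07 | 0.91 | 480 | 0.365 | –0.08 | 0.21 |
| C*U | 0.19 | 0.11 | 1.72 | 480 | 0.085 | –0.03 | 0.40 |
| C*DRUG | 0.07 | 0.11 | 0.67 | 480 | 0.500 | –0.14 | 0.29 |
| U*DRUG | 0.06 | 0.11 | 0.56 | 480 | 0.577 | –0.15 | 0.27 |
| C*ORDER | 0.12 | 0.11 | 1.09 | 480 | 0.277 | –0.10 | 0.33 |
| U*ORDER | –0.11 | 0.11 | –0.99 | 480 | 0.321 | –0.32 | 0.11 |
| DRUG*ORDER | –0.25 | 0.25 | –1.02 | 480 | 0.310 | –0.73 | 0.23 |
| C*U*DRUG | 0.14 | 0.22 | 0.64 | 480 | 0.524 | –0.29 | 0.57 |
| C*U*ORDER | 0.13 | 0.22 | 0.59 | 480 | 0.554 | –0.30 | 0.56 |
| C*DRUG*ORDER | –0.02 | 0.29 | –0.06 | 480 | 0.952 | –0.60 | 0.56 |
| U*DRUG*ORDER | –0.18 | 0.35 | –0.51 | 480 | 0.610 | –0.87 | 0.51 |
| C*U*DRUG*ORDER | –0.22 | 0.44 | –0.50 | 480 | 0.618 | –1.08 | 0.64 |

MB choice (standard trials): GENERALIZE ~ C*P*DRUG*ORDER + (C + P + DRUG + ORDER | PART)

| Name | Estimate | SE | t-Stat | DF | p-Value | Lower CI | Upper CI |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.30 | 0.04 | 6.96 | 7177 | 0.000 | 0.22 | 0.38 |
| C (common) | 0.40 | 0.06 | 6.22 | 7177 | 0.000 | 0.27 | 0.52 |
| P (common reward probability) | 1.33 | 0.21 | 6.39 | 7177 | 0.000 | 0.92 | 1.74 |
| DRUG | –0.13 | 0.08 | –1.65 | 7177 | 0.099 | –0.29 | 0.03 |
| ORDER | –0.13 | 0.08 | –1.57 | 7177 | 0.116 | –0.29 | 0.03 |
| C*P | –0.23 | 0.23 | –1.01 | 7177 | 0.311 | –0.67 | 0.21 |
| C*DRUG | 0.05 | 0.12 | 0.39 | 7177 | 0.695 | –0.19 | 0.28 |
| P*DRUG | –0.34 | 0.23 | –1.48 | 7177 | 0.140 | –0.79 | 0.11 |
| C*ORDER | –0.06 | 0.12 | –0.52 | 7177 | 0.606 | –0.30 | 0.17 |
| P*ORDER | 0.16 | 0.23 | 0.70 | 7177 | 0.482 | –0.29 | 0.61 |
| DRUG*ORDER | –0.24 | 0.17 | –1.41 | 7177 | 0.158 | –0.58 | 0.09 |
| C*P*DRUG | –0.08 | 0.45 | –0.18 | 7177 | 0.856 | –0.97 | 0.80 |
| C*P*ORDER | 0.57 | 0.45 | 1.26 | 7177 | 0.207 | –0.31 | 1.45 |
| C*DRUG*ORDER | –0.38 | 0.25 | –1.48 | 7177 | 0.140 | –0.87 | 0.12 |
| P*DRUG*ORDER | 0.46 | 0.83 | 0.55 | 7177 | 0.583 | –1.18 | 2.09 |
| C*P*DRUG*ORDER | 1.40 | 0.91 | 1.54 | 7177 | 0.123 | –0.38 | 3.17 |

Appendix 1—table 2
Mixed effects models on model-agnostic choice data from uncertainty trials.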
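Preferential MFCA for the informative destination (ghost-nominated, ‘repeat trials’ > ghost-rejected, ‘switch trials’): MFCA ~ NOM*DRUG*ORDER + (NOM + DRUG + ORDER | PART)

| Name | Estimate | SE | t-Stat | DF | p-Value | Lower CI | Upper CI |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.10 | 0.01 | 8.27 | 239 | 0.000 | 0.07 | 0.12 |
| NOM (nomination) | 0.03 | 0.02 | 1.57 | 239 | 0.117 | –0.01 | 0.08 |
| DRUG | 0.01 | 0.02 | 0.40 | 239 | 0.687 | –0.04 | 0.06 |
| ORDER | 0.00 | 0.02 | 0.13 | 239 | 0.895 | –0.04 | 0.05 |
| NOM*DRUG | 0.11 | 0.04 | 2.73 | 239 | 0.007 | 0.03 | 0.19 |
| NOM*ORDER | 0.02 | 0.04 | 0.51 | 239 | 0.613 | –0.06 | 0.10 |
| DRUG*ORDER | –0.03 | 0.05 | –0.73 | 239 | 0.467 | –0.12 | 0.06 |
| NOM*DRUG*ORDER | –0.01 | 0.09 | –0.04 | 239 | 0.966 | –0.18 | 0.17 |

MFCA for non-informative destination (ghost-nominated > ghost-rejected, ‘clash trials’): REPEAT ~ N*I*DRUG*ORDER + (N*I*DRUG + ORDER | PART)

| Name | Estimate | SE | t-Stat | DF | p-Value | Lower CI | Upper CI |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.05 | 0.04 | 1.27 | 479 | 0.203 | –0.03 | 0.12 |
| N (non-informative) | 0.13 | 0.07 | 1.96 | 479 | 0.051 | 0.00 | 0.26 |
| I (informative) | 1.01 | 0.10 | 9.95 | 479 | 0.000 | 0.81 | 1.21 |
| DRUG | 0.16 | 0.07 | 2.31 | 479 | 0.021 | 0.02 | 0.29 |
| ORDER | 0.03 | 0.07 | 0.41 | 479 | 0.684 | –0.10 | 0.16 |
| N*I | 0.08 | 0.14 | 0.57 | 479 | 0.568 | –0.19 | 0.35 |
| N*DRUG | 0.05 | 0.13 | 0.39 | 479 | 0.696 | –0.21 | 0.31 |
| I*DRUG | 0.03 | 0.15 | 0.24 | 479 | 0.810 | –0.25 | 0.32 |
| N*ORDER | –0.05 | 0.13 | –0.34 | 479 | 0.733 | –0.30 | 0.21 |
| I*ORDER | 0.06 | 0.14 | 0.43 | 479 | 0.664 | –0.22 | 0.35 |
| DRUG*ORDER | –0.20 | 0.15 | –1.37 | 479 | 0.171 | –0.49 | 0.09 |
| N*I*DRUG | 0.07 | 0.29 | 0.26 | 479 | 0.798 | –0.49 | 0.64 |
| N*I*ORDER | 0.25 | 0.29 | 0.86 | 479 | 0.388 | –0.32 | 0.81 |
| N*DRUG*ORDER | –0.47 | 0.26 | –1.80 | 479 | 0.072 | –0.99 | 0.04 |
| I*DRUG*ORDER | –0.12 | 0.41 | –0.31 | 479 | 0.759 | –0.92 | 0.67 |
| N*I*DRUG*ORDER | 0.86 | 0.55 | 1.56 | 479 | 0.119 | –0.22 | 1.94 |

Appendix 1—table 3
Mixed effects models on parameters of the computational model.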
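MFCA for ghost-nominated vs. ghost-rejected and informative vs. non-informative: MFCA ~ NOM*INFO*DRUG*ORDER + (NOM*INFO*DRUG + ORDER | PART)

| Name | Estimate | SE | t-Stat | DF | p-Value | Lower CI | Upper CI |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.18 | 0.02 | 7.60 | 480 | 0.000 | 0.14 | 0.23 |
| NOM (nomination) | 0.10 | 0.03 | 3.72 | 480 | 0.000 | 0.05 | 0.15 |
| INFO (informativeness) | 0.08 | 0.04 | 2.19 | 480 | 0.029 | 0.01 | 0.15 |
| DRUG | 0.05 | 0.05 | 0.94 | 480 | 0.347 | –0.05 | 0.16 |
| ORDER | 0.04 | 0.05 | 0.73 | 480 | 0.463 | –0.06 | 0.15 |
| NOM*INFO | –0.03 | 0.05 | –0.57 | 480 | 0.567 | –0.12 | 0.06 |
| NOM*DRUG | 0.10 | 0.04 | 2.43 | 480 | 0.015 | 0.02 | 0.18 |
| INFO*DRUG | –0.08 | 0.07 | –1.16 | 480 | 0.247 | –0.22 | 0.06 |
| NOM*ORDER | 0.02 | 0.04 | 0.37 | 480 | 0.715 | –0.07 | 0.10 |
| INFO*ORDER | 0.10 | 0.07 | 1.42 | 480 | 0.157 | –0.04 | 0.23 |
| DRUG*ORDER | –0.09 | 0.10 | –0.98 | 480 | 0.328 | –0.28 | 0.10 |
| NOM*INFO*DRUG | 0.02 | 0.07 | 0.33 | 480 | 0.738 | –0.12 | 0.17 |
| NOM*INFO*ORDER | –0.01 | 0.07 | –0.08 | 480 | 0.934 | –0.15 | 0.14 |
| NOM*DRUG*ORDER | –0.06 | 0.11 | –0.60 | 480 | 0.551 | –0.27 | 0.15 |
| INFO*DRUG*ORDER | 0.16 | 0.14 | 1.10 | 480 | 0.272 | –0.12 | 0.44 |
| NOM*INFO*DRUG*ORDER | 0.10 | 0.19 | 0.55 | 480 | 0.585 | –0.26 | 0.47 |

Preferential MFCA for informative vs. non-informative: PMFCA ~ INFO*DRUG*ORDER + (INFO + DRUG + ORDER | PART)

| Name | Estimate | SE | t-Stat | DF | p-Value | Lower CI | Upper CI |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.10 | 0.03 | 3.71 | 240 | 0.000 | 0.05 | 0.15 |
| INFO (informativeness) | –0.03 | 0.05 | –0.57 | 240 | 0.568 | –0.12 | 0.07 |
| DRUG | 0.10 | 0.04 | 2.39 | 240 | 0.017 | 0.02 | 0.18 |
| ORDER | 0.02 | 0.04 | 0.36 | 240 | 0.720 | –0.07 | 0.10 |
| INFO*DRUG | 0.02 | 0.07 | 0.34 | 240 | 0.734 | –0.12 | 0.17 |
| INFO*ORDER | –0.01 | 0.07 | –0.08 | 240 | 0.933 | –0.15 | 0.14 |
| DRUG*ORDER | –0.06 | 0.11 | –0.60 | 240 | 0.552 | –0.27 | 0.15 |
| INFO*DRUG*ORDER | 0.10 | 0.19 | 0.55 | 240 | 0.585 | –0.27 | 0.47 |

Appendix 1—table 4
Distribution of parameters from the full computational model.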
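| Cond. | Percentile | MFCA standard | MFCA info-nom | MFCA info-rej | MFCA non-info-nom | MFCA non-info-rej | MBCA | Perseveration-standard | Perseveration-nominated | forget_MF | forget_MB | forget_Pers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Placebo | 25 | 0.053 | –0.056 | –0.026 | –0.070 | –0.074 | 0.059 | –0.197 | –0.093 | 0.002 | 0.038 | 0.010 |
| Placebo | 50 | 0.147 | 0.168 | 0.149 | 0.048 | 0.030 | 0.273 | 0.042 | 0.071 | 0.058 | 0.148 | 0.123 |
| Placebo | 75 | 0.364 | 0.479 | 0.391 | 0.333 | 0.204 | 0.454 | 0.383 | 0.353 | 0.519 | 0.521 | 0.428 |
| Levodopa | 25 | 0.060 | –0.025 | –0.073 | –0.011 | –0.098 | 0.026 | –0.086 | –0.047 | 0.019 | 0.022 | 0.008 |
| Levodopa | 50 | 0.272 | 0.165 | 0.130 | 0.178 | 0.070 | 0.278 | 0.098 | 0.084 | 0.190 | 0.127 | 0.089 |
| Levodopa | 75 | 0.574 | 0.517 | 0.383 | 0.390 | 0.291 | 0.367 | 0.346 | 0.374 | 0.598 | 0.508 | 0.492 |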


Cite this article

Lorenz Deserno, Rani Moran, Jochen Michely, Ying Lee, Peter Dayan, Raymond J Dolan (2021) Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference. eLife 10:e67778. https://doi.org/10.7554/eLife.67778