Figures and data


Demographics and drug use characteristics of study participants (n = 94)

Subjective drug effects post-capsule administration.
MA increased ‘feel drug effect’ ratings compared to placebo. The scale for ratings of feeling a drug effect ranges from 0 to 100. The vertical black line indicates the time at which the task was started. Ratings of ‘feeling’ a drug effect did not differ significantly between low and high baseline performers (all p > .05). DEQ = Drug Effects Questionnaire (Morean et al., 2013).

Methamphetamine improved performance in a modified probabilistic reversal learning task only in participants who performed the task poorly at baseline.
(A) Schematic of the learning task. Each trial began with a random jitter of 300–500 ms. Next, a fixation cross was presented together with two response options (choose – green tick mark; avoid – red no-parking sign). After the fixation cross, the stimulus was shown centrally until the participant responded or for a maximum duration of 2000 ms. Thereafter, participants’ choices were confirmed by a white rectangle surrounding the chosen option for 500 ms. Finally, the outcome was presented for 750 ms. If subjects chose to gamble on the presented stimulus, they received either a green smiling face and a reward of 10 points or a red frowning face and a loss of 10 points. When subjects avoided a symbol, they received the same feedback but in a slightly paler color, and the points that could have been received were crossed out to indicate that the feedback was fictive and had no effect on the total score. A novel feature of this modified version of the task is that we introduced different levels of noise (probability) into the reward contingencies: reward probabilities could be less predictable (30% or 70%), more certain (20% or 80%), or random (50%). (B) Total points earned in the task, split by session (baseline, drug sessions 1 and 2) and drug condition (PL vs. MA). Results show practice effects but no differences between the two drug sessions (baseline vs. drug session 1: 595.85 (39.81) vs. 708.62 (36.93); t(93) = −4.21, p = 5.95e-05, d = 0.30; baseline vs. drug session 2: 595.85 (39.81) vs. 730.00 (38.53); t(93) = −4.77, p = 6.66e-06, d = 0.35; session 1 vs. session 2: t(93) = −0.85, p = 0.399, d = 0.05). The dashed gray line indicates no significant difference on/off drug (Δ ∼35 points). (C) Interestingly, when we stratified drug effects by baseline performance (using a median split on total points at baseline), we found a trend towards better performance under MA in the low baseline performance group (n = 47, p = .07).
(D) Overall performance in drug sessions 1 and 2 stratified by baseline performance. Baseline performance appears not to affect performance in drug session 1 or 2. Note. IQR = interquartile range; PL = Placebo; MA = methamphetamine.
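To make the reward structure described in (A) concrete, the schedule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' task code; function and variable names are my own. Each stimulus pays off with a fixed probability that flips at a reversal, with less predictable contingencies at 30%/70%, more certain ones at 20%/80%, and a random stimulus at 50%.

```python
import random

def make_schedule(p_good, n_trials, reversal_at):
    """Per-trial reward probability for one stimulus with a single reversal."""
    return [p_good if t < reversal_at else 1.0 - p_good
            for t in range(n_trials)]

def sample_outcome(p_reward, rng):
    """Gamble outcome: +10 points with probability p_reward, else -10."""
    return 10 if rng.random() < p_reward else -10

rng = random.Random(0)
noisy = make_schedule(0.7, 40, 20)    # less predictable contingency (30/70%)
certain = make_schedule(0.8, 40, 20)  # more certain contingency (20/80%)
```

Avoided stimuli would yield the same sampled feedback, but fictively, without changing the point total.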

Learning curves after reversals suggest that methamphetamine improves learning performance in phases of less predictable reward contingencies in low baseline performers.
The top panel shows learning curves after all reversals (A), reversals to stimuli with less predictable reward contingencies (B), and reversals to stimuli with high reward probability certainty (C). The bottom panel displays the learning curves stratified by baseline performance for all reversals (D), reversals to stimuli with less predictable reward probabilities (E), and reversals to stimuli with high reward probability certainty (F). Vertical black lines divide learning into early and late stages as suggested by the Bai-Perron multiple break point test. Results suggest no clear differences in initial learning between MA and PL. However, learning curves diverged later in learning, particularly for stimuli with less predictable rewards (B) and in subjects with low baseline performance (E). Note. PL = Placebo; MA = methamphetamine; Mean/SEM = line/shading. Data plots are smoothed with a running average (+/−2 trials), leading to slightly overestimated values after reversals. Additional analyses confirm that the probability of choosing the correct response after reversals is not above chance level (t-test against chance: all reversals: t(93) = 1.64, p = 0.10, d = 0.17, 99% CI [0.49, 0.55]; reversals to low outcome noise: t(93) = 1.67, p = 0.10, d = 0.17, 99% CI [0.49, 0.56]; reversals to high outcome noise: t(93) = 0.87, p = 0.38, d = 0.09, 99% CI [0.47, 0.56]).
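The two analysis steps mentioned in this caption can be sketched as below. This is my own implementation under stated assumptions (a centered smoothing window; a standard one-sample t statistic), not the authors' code: the plotted curves are smoothed with a +/−2-trial running average, and post-reversal accuracy is tested against the 50% chance level.

```python
import math

def running_average(xs, half_window=2):
    """Centered running average; edges use only the trials available."""
    out = []
    for i in range(len(xs)):
        lo, hi = max(0, i - half_window), min(len(xs), i + half_window + 1)
        out.append(sum(xs[lo:hi]) / (hi - lo))
    return out

def t_against_chance(accuracies, chance=0.5):
    """One-sample t statistic of per-subject accuracies against chance."""
    n = len(accuracies)
    mean = sum(accuracies) / n
    var = sum((a - mean) ** 2 for a in accuracies) / (n - 1)
    return (mean - chance) / math.sqrt(var / n)
```

The smoothing is also why values right after a reversal are slightly overestimated: the window mixes in post-learning trials.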

Computational modeling results reveal that methamphetamine affects the model parameter controlling dynamic adjustments of learning rate.
(A) Model comparison. Bayesian model selection was performed using −0.5*BIC as a proxy for model evidence (Stephan et al., 2009). The best fitting mixture model assigned proportions to each model based on the frequency with which they provided the “best” fit to the observed participant data (mixture proportion; blue bars) and estimated the probability with which the true population mixture proportion for a given model exceeded that of all others (exceedance probability; black bars). The hybrid model with additional modulation of the learning rate by feedback confirmation (model 3) provided the best fit for the majority of participants and had an exceedance probability near one in our model set. (B-C) Comparison of parameter estimates from the winning model on/off drug. Stars indicate a significant difference for the respective parameter. Results suggest that only eta, the parameter controlling dynamic adjustments of learning rate according to recent prediction errors, was affected by our pharmacological manipulation. (D-F) Modelled and observed choice behavior of the participants in the task, plotted separately for each stimulus. Note that in the task the different animal stimuli were presented in an intermixed and randomized fashion, but this visualization shows that participants’ choices followed the reward probabilities of the stimuli. Data plots are smoothed with a running average (+/−2 trials). Ground truth corresponds to the reward probability of the respective stimuli (good: 70/80%; neutral: 50%; bad: 20/30%). Dashed black lines represent 95% confidence intervals derived from 1000 simulated agents with parameters that were best fit to participants in each group. Model predictions appear to capture the transitions in choice behavior well. Mean/SEM = line/shading. Note. IQR = interquartile range; PL = Placebo; MA = methamphetamine.
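The BIC-based selection step can be sketched as follows, with hypothetical inputs (per-participant maximized log-likelihoods and parameter counts; the names are mine). BIC = k*ln(n) − 2*logL, so −0.5*BIC approximates the log model evidence. Note that the exceedance probabilities in the figure come from the hierarchical Bayesian procedure of Stephan et al. (2009), which this simple frequency count does not reproduce.

```python
import math
from collections import Counter

def bic(log_lik, k, n_trials):
    """Bayesian information criterion; lower is better."""
    return k * math.log(n_trials) - 2.0 * log_lik

def mixture_proportions(fits, n_trials):
    """Fraction of participants for which each model has the lowest BIC.

    fits: list of per-participant dicts, model name -> (logL, k).
    """
    best = Counter()
    for participant in fits:
        scores = {m: bic(ll, k, n_trials) for m, (ll, k) in participant.items()}
        best[min(scores, key=scores.get)] += 1
    return {m: count / len(fits) for m, count in best.items()}
```

A richer model is only preferred when its likelihood gain outweighs the k*ln(n) complexity penalty.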

Simulated task performance based on individual maximum likelihood parameter estimates reflects drug-induced behavioral differences.
Simulated performance (y-axis) is plotted against trials after reversals (x-axis) for low (blue) and high (yellow) baseline performers. Using each participant’s estimated parameters, 100 artificial agents were simulated playing the task, and their choices were averaged to represent each participant’s behavior. The simulation shows that MA increases performance later in learning for stimuli with high outcome noise, particularly in subjects with low baseline performance (A). In contrast, no drug effect was observed for stimuli with low outcome noise (B).

Methamphetamine boosts signal-to-noise ratio between real reversals and misleading feedback in late learning stages.
Learning rate trajectories after reversals derived from the computational model. The first column depicts learning rates across all subjects for all reversals (A), reversals to stimuli with high reward probability certainty (D), and reversals to stimuli with noisy outcomes (G). The middle and right columns show learning rate trajectories for subjects stratified by baseline performance (B, E, H – low baseline performance; C, F, I – high baseline performance). Results suggest that people with high baseline performance show a large difference between learning rates after true reversals and learning rates during the rest of the task, which includes misleading feedback. Specifically, they show a peak in learning rate after reversals and reduced learning rates in later periods of a learning block, when choice preferences should ideally be stabilized (C). This results in a better signal-to-noise ratio (SNR) between real reversals and misleading feedback (i.e., surprising outcomes in the late learning stage). In low baseline performers the SNR is improved after the administration of MA. This effect was particularly visible in stages of the task where rewards were less predictable (H). Note. IQR = interquartile range; PL = Placebo; MA = methamphetamine.
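A signal-to-noise ratio of the kind described here could be computed as below. This is my reading of the caption, not necessarily the paper's exact formula: contrast the model-derived learning rate right after true reversals ("signal") with the learning rate during the late, stable stage, where surprising outcomes are mostly misleading feedback ("noise").

```python
def learning_rate_snr(lr_after_reversals, lr_late_stage):
    """Mean post-reversal learning rate over mean late-stage learning rate.

    A high ratio means learning spikes when contingencies truly change and
    stays low when surprising outcomes are merely misleading feedback.
    """
    signal = sum(lr_after_reversals) / len(lr_after_reversals)
    noise = sum(lr_late_stage) / len(lr_late_stage)
    return signal / noise
```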

Misleading feedback effects on choice accuracy are modulated by eta and methamphetamine in low baseline performers.
This figure shows the association between receiving misleading feedback later in learning (i.e., rewards or losses that do not align with a stimulus’ underlying reward probability) and the probability of making the correct choice at the next encounter of the same stimulus. Results indicate a negative correlation between the probability of a correct choice after double-misleading feedback and eta (A): the probability of a correct choice after double-misleading feedback decreases with increasing eta. There was a trend (p = .06) for subjects under MA to be more likely than under PL to make the correct choice after two instances of misleading feedback (B). (C) This effect appeared to depend on baseline performance, whereby only subjects with low baseline performance seem to benefit from MA (p = 0.02). Note. IQR = interquartile range; PL = Placebo; MA = methamphetamine; MFB = misleading feedback.

Changes in learning rate adjustment explain drug induced performance benefits in low baseline performers.
(A) Regression coefficients and 95% confidence intervals (points and lines; sorted by value) quantifying the contribution of each model parameter estimate to participants’ overall task performance (i.e., points scored in the task). Play bias and eta (the parameter governing the influence of surprise on learning rate) both made a significant negative contribution to overall task performance, whereas inverse temperature and learning rates were positively related to performance. (B) Differences in parameter values between on- and off-drug sessions, quantified by regression coefficients and 95% confidence intervals, are plotted separately for high (red) and low (yellow) baseline performers. Note that the drug predominately affected the eta parameter and did so to a greater extent in low baseline performers. (C) Eta estimates on-drug (y-axis) are plotted against eta estimates off-drug (x-axis) for high baseline performers (yellow points) and low baseline performers (red points). Note that a majority of subjects showed a reduction in eta on-drug vs. off-drug (67.02%). This effect was more pronounced in low baseline performers (low baseline performers: 74.47%; high baseline performers: 59.57%). (D) To better understand how changes in eta might have affected overall performance, we conducted a set of simulations using the parameters best fit to human subjects, except that we equipped the model with a range of randomly chosen eta values to examine how altering that parameter might affect performance (n = 1000 agents). The results revealed that simulated agents with low to intermediate levels of eta achieved the best task performance, with models equipped with the highest etas performing particularly poorly.
To illustrate how this relationship between eta and performance could have driven improved performance for some participants under the methamphetamine condition, we highlight four participants with low-moderate eta values under methamphetamine, but who differ dramatically in their eta values in the placebo condition (D, inset). PL = Placebo; MA = methamphetamine.
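One common formalization of a learning rate that is dynamically adjusted by recent prediction errors is a Pearce-Hall-style associability update. The paper's hybrid model may differ in detail, so treat this as an illustrative sketch of why high eta hurts late-stage performance: the learning rate chases every surprising, and often misleading, outcome.

```python
import random

def simulate_value_learning(eta, alpha0, p_reward, n_trials, rng):
    """Track the value of one stimulus; return final value and learning rates.

    eta = 0 keeps the learning rate fixed at alpha0; eta near 1 sets it to
    the magnitude of the most recent reward prediction error.
    """
    value, alpha = 0.5, alpha0
    alphas = []
    for _ in range(n_trials):
        outcome = 1.0 if rng.random() < p_reward else 0.0
        delta = outcome - value                  # reward prediction error
        value += alpha * delta
        alpha = (1.0 - eta) * alpha + eta * abs(delta)
        alphas.append(alpha)
    return value, alphas
```

Under a noisy 30/70% contingency, a large eta keeps the learning rate elevated by the frequent surprising outcomes, so choice preferences never stabilize; a small-to-intermediate eta lets the rate settle once the contingency is learned.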

The stochasticity lesion model shows a pattern of learning deficits associated with low performers in our task.
Behaviour of the lesioned model, in which stochasticity is assumed to be small and constant, is shown alongside the control model that jointly estimates stochasticity and volatility. (A-B) The inability to make inferences about stochasticity leads to misestimation of volatility, particularly during high outcome noise phases (grey patches mark trials with high outcome noise, i.e., 30/70% reward probability). (C) This leads to reduced sensitivity of the learning rate to volatility (i.e., during the first ten trials after reversals). (D) The lesioned model shows behaviour similar to our low performer group, with reduced accuracy in later learning stages for stimuli with high outcome noise. (E) When we fit simulated data from the two models with our task model, we see increased eta parameter estimates for the lesioned model.

Summary of key findings.
Mean (SEM) scores on three measures of task performance after PL and MA, in participants stratified into low and high baseline performance. (A) There was a trend toward a drug effect, with boosted task performance (total points scored in the task) in low baseline performers (subjects were stratified via a median split on baseline performance) after methamphetamine (20 mg) administration. (B) Follow-up analyses revealed that on-drug performance benefits were mainly driven by significantly better choices (i.e., choosing the advantageous stimuli and avoiding disadvantageous stimuli) at later stages after reversals for less predictable reward contingencies (30/70% reward probability). (C) To understand the computational mechanism through which methamphetamine improved performance in low baseline performers, we investigated how task performance related to the model parameters from our fits. Our results suggest that methamphetamine alters performance by changing the degree to which learning rates are adjusted according to recent prediction errors (eta), in particular by reducing the strength of such adjustments in low baseline performers to push them closer to task-specific optimal values.

Learning curves
The top part shows learning curves quantified as the probability of selecting the correct choice (choosing the advantageous stimuli and avoiding disadvantageous stimuli), stratified by baseline performance. Two-way ANOVAs with the factors Drug (two levels) and Baseline Performance (two levels) on the averaged probability of a correct choice during the early and late stages of learning were used to investigate drug effects. (A) No differences in the learning curves between MA and PL became evident when considering all reversals (all p > .1). (B) There was no drug-related difference in the acquisition phase of the task (all p > .05) or (C) in the first reversal (all p > .1). In the bottom part of the figure, learning curves are defined as the probability of selecting a stimulus. (D) No drug effect emerged for reversal learning from a bad stimulus to a good stimulus (all p > .09) or (E) from good to bad stimuli (all p > .09). Moreover, there was no difference in reversal learning to neutral stimuli (F and G). Note. PL = Placebo; MA = methamphetamine.

Validation of model selection and parameter recovery.
After model fitting, each model was used to simulate data for each participant using the best-fitting parameters for that participant. Each model was then fit to each synthetic dataset, and BIC was used to determine which model provided the best fit to the synthetic data. (A) Inverse confusion matrix. The frequency with which a recovered model (abscissa, determined by lowest BIC) corresponded to a given simulation model (ordinate) is depicted in color. Recovered models correspond to the same models labeled on the ordinate, with recovered model 1 corresponding to the base model, and so on. The results of the model recovery analyses suggest that the recovered model typically corresponded to the synthetic dataset produced by that model. (B) Parameter values that were used to simulate data from the hybrid model with additional modulation of the learning rate by feedback confirmation (ordinate) tended to correlate (color) with the parameter values best fit to those synthetic datasets (abscissa). Recovered parameter values correspond to the labels on the ordinate, with parameter 1 reflecting the temperature parameter of the softmax function, and so on.
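The parameter-recovery check in (B) boils down to correlating generating ("true") parameter values with the values re-estimated from the simulated data; a plain Pearson correlation suffices for a sketch. The example values below are hypothetical, not taken from the paper.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

true_eta = [0.1, 0.2, 0.3, 0.4]           # generating values (hypothetical)
recovered_eta = [0.12, 0.18, 0.33, 0.41]  # refit values (hypothetical)
```

High on-diagonal correlations (and low off-diagonal ones) indicate that each parameter is identifiable from the data the task produces.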

Analysis of parameter differences between methamphetamine (MA) and placebo (PL).

Time-on-task effects on choice and reaction time.

Speeding over the course of the task is enhanced by methamphetamine.
This figure shows raw data splits for the effect of time on task (i.e., trial number) on accuracy (panels A & B; a binary variable indicating stimulus-appropriate behavior on each trial) and log-scaled reaction times (RT; panels C & D) across different drug sessions. Results indicate that overall choice accuracy was not affected by time on task. However, participants exhibited faster reaction times over the course of the task, with this speeding effect being more pronounced in the methamphetamine condition.

The effect of methamphetamine depends on baseline performance.
To examine the relationship between methamphetamine effects and baseline performance, we plotted difference scores (i.e., the outcome variable under the drug condition minus the placebo condition) against baseline performance. (A-F) show nonlinear relationships between baseline performance and outcome variables ((A) total points, (B) probability of a correct choice (P(correct choice)) in the late learning phase with high outcome noise, (C) parameter estimate for eta, (D) learning rate variance, (E) probability of a correct choice (P(correct choice)), and (F) signal-to-noise ratio under high outcome noise). Specifically, we show that drug effects were maximal in moderately low baseline performers. It is noteworthy that these subjects had particularly high etas on placebo (inset in C), which may have allowed the drug to have a larger impact on their performance. This is in line with our key finding that methamphetamine brings eta (the parameter controlling dynamic adjustments of learning rate according to recent prediction errors) closer to optimal levels. The vertical dashed line indicates median baseline performance. MFB = misleading feedback.

The effect of methamphetamine is strongest in moderately low baseline performers.
This figure shows the drug effect from the sliding-window linear mixed-effects model analysis plotted against baseline performance for a set of dependent variables: (A) total points, (B) probability of a correct choice (P(correct choice)) in the late learning phase with high outcome noise, (C) parameter estimate for eta, (D) learning rate variance, (E) probability of a correct choice (P(correct choice)), and (F) signal-to-noise ratio under high outcome noise. Grey areas indicate clusters with significant drug effects after correcting for multiple comparisons using a permutation test for cluster mass (Maris & Oostenveld, 2007).
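The cluster-mass correction can be sketched as below, simplified to paired drug-minus-placebo difference scores (variable names are mine, not the paper's). Windows whose |t| exceeds a threshold form contiguous clusters, each cluster's mass is the sum of its |t| values, and the largest observed mass is compared against a null distribution obtained by randomly sign-flipping participants' difference scores (the sign-flipping loop itself is omitted here).

```python
def t_stat(diffs):
    """Paired-sample t statistic of difference scores against zero."""
    n = len(diffs)
    m = sum(diffs) / n
    var = sum((d - m) ** 2 for d in diffs) / (n - 1)
    return m / (var / n) ** 0.5

def max_cluster_mass(t_values, threshold):
    """Largest sum of contiguous |t| values that exceed the threshold."""
    best = current = 0.0
    for t in t_values:
        if abs(t) > threshold:
            current += abs(t)
            best = max(best, current)
        else:
            current = 0.0
    return best
```

A cluster-level p-value then follows by counting how often the sign-flipped null yields a mass at least as large as the observed one.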

High baseline performers reach near-optimal levels in both test sessions, limiting potential drug-induced performance enhancement.
(A) Overall task performance and (B) the probability of correct choices in the high outcome noise condition—the condition with the strongest observed drug effect—are plotted against normalized baseline performance. In both testing sessions, high baseline performers cluster around optimal performance. Furthermore, performance simulations using (a) optimal eta values and (b) observed eta values from the high baseline performance group reveal only a small, non-significant performance difference (points optimal eta: 701.91 (21.66) vs. points high performers: 694.47 (21.71); t(46) = 2.84, p = 0.07, d = 0.059). These results suggest that high baseline performers are already at or near their performance ceiling, limiting the potential for further drug-induced improvements.

The stochasticity lesion model shows a pattern of learning deficits associated with low performers in our task.
Behaviour of the lesioned model, in which stochasticity is assumed to be small and constant, is shown alongside the control model that jointly estimates stochasticity and volatility. An example of reward estimation by the two models shows that the stochasticity lesion model is more sensitive to noisy outcomes (A). This reduces the sensitivity of the learning rate to volatility (i.e., during the first ten trials after a reversal) (B). This, however, is primarily related to an inability to make inferences about stochasticity, which leads to misestimation of volatility (C–D). Simulations reveal that this is accompanied by reduced performance, particularly for high outcome noise stimuli (grey patches) later in learning (E).

Simulated data from the stochasticity lesion model show increased eta parameters compared to the control model.
Here, we fit simulated data from the control and stochasticity lesion models (100 simulations per model) with our task model. Data from the control model reveal increased inverse temperature (A; 2.57 (0.04) vs. 3.54 (0.07); t(198) = −11.09, p < .001, d = 1.71), increased play bias (B; 0.24 (0.01) vs. 0.28 (0.01); t(198) = −2.56, p = 0.01, d = 0.39), and decreased etas (C; 0.18 (0.01) vs. 0.14 (0.01); t(198) = 2.21, p = 0.02, d = 0.34). There was no difference in the intercept term of the learning rate (D) or the feedback confirmation term of the learning rate (E).

Overall points full sample.
When comparing overall points in the whole sample (n = 109), we do not see a difference between MA and PL (705.68 (36.27) vs. 685.77 (35.78); t(108) = 0.81, p = 0.42, d = 0.05). Mixed ANOVAs suggested that drug effects did not depend on session order (MA first vs. PL first) or on whether subjects performed the orientation session. Yet participants who completed the orientation tended to perform better during the drug sessions (F(1,107) = 3.09, p = 0.08; 719.31 (26.63) vs. 548.00 (75.09)). Note. PL = Placebo; MA = methamphetamine.

Learning curves after reversals full sample.
The figure shows learning curves after all reversals (A), reversals to stimuli with high reward probability uncertainty (B), and reversals to stimuli with low reward probability uncertainty (C) for the whole sample. Vertical black lines divide learning into early and late stages as suggested by the Bai-Perron multiple break point test. Paired-sample t-tests revealed no drug-related difference for all reversals during early learning (0.72 (0.01) vs. 0.72 (0.01); t(108) = −0.02, p = 0.98, d < 0.01) or late learning (0.83 (0.01) vs. 0.84 (0.01); t(108) = −0.80, p = 0.42, d = 0.04). Similarly, there were no significant differences in either learning stage for reversals to low reward probability certainty stimuli (early learning PL vs. MA: 0.68 (0.01) vs. 0.69 (0.01); t(108) = −0.92, p = 0.35, d = 0.08; late learning PL vs. MA: 0.80 (0.01) vs. 0.81 (0.01); t(108) = −1.48, p = 0.14, d = 0.10) or for reversals to high reward probability certainty stimuli (early learning PL vs. MA: 0.74 (0.01) vs. 0.73 (0.01); t(108) = 0.87, p = 0.38, d = 0.06; late learning PL vs. MA: 0.85 (0.01) vs. 0.85 (0.01); t(108) = −0.02, p = 0.97, d < 0.01). Mixed-effects ANOVAs that controlled for session order effects and for whether participants performed the orientation session revealed no significant effects (all p > .06). Note. PL = Placebo; MA = methamphetamine.

Model parameters and learning rate trajectories in the full sample.
(A) Here we compare MA's effect on the best-fitting parameters of the winning model in the full sample (n = 109). We found that eta (i.e., the weighting of the effect of the absolute reward prediction error on learning rate) was reduced under MA (eta MA: 0.23 (0.01) vs. PL: 0.29 (0.01); t(108) = −3.05, p = 0.002, d = 0.40). Mixed-effects ANOVAs that controlled for session order effects and for whether participants performed the orientation session revealed that this effect did not depend on these confounds. No other condition differences emerged. (B) Learning rate trajectories after reversals derived from the computational model. As in the reduced sample, MA appears to be associated with reduced learning rate dynamics in the full sample too. Specifically, variability in learning rate (average individual SD of learning rate) tended to be reduced in the MA condition both during early and late stages of learning across all reversals (early PL: 0.19 (0.01) vs. MA: 0.18 (0.01); t(108) = 1.89, p = 0.06, d = 0.24; late PL: 0.17 (0.01) vs. MA: 0.16 (0.01); t(108) = 1.77, p = 0.08, d = 0.23) and for reversals to high reward probability uncertainty (early PL: 0.18 (0.01) vs. MA: 0.16 (0.01); t(108) = 1.74, p = 0.08, d = 0.22; late PL: 0.18 (0.01) vs. MA: 0.16 (0.01); t(108) = 1.82, p = 0.07, d = 0.24). Condition differences became most evident in reversals to low reward probability uncertainty (early PL: 0.19 (0.01) vs. MA: 0.16 (0.01); t(108) = 2.18, p = 0.03, d = 0.28; late PL: 0.18 (0.01) vs. MA: 0.16 (0.01); t(108) = 1.93, p = 0.05, d = 0.24). Control analyses revealed that these effects were independent of session order and the orientation session. Note. PL = Placebo; MA = methamphetamine.