Demographics and drug use characteristics of study participants (n = 94)

Subjective drug effects post-capsule administration.

MA administration significantly increased ‘feel drug effect’ ratings compared to placebo. Ratings of feeling a drug effect range from 0 to 100. The vertical black line indicates the time at which the task was completed. Asterisks indicate a significant on-/off-drug difference.

Methamphetamine improved performance in a modified probabilistic reversal learning task only in participants who performed the task poorly at baseline.

(A) Schematic of the learning task. Each trial began with a random temporal jitter of 300–500 ms. Thereafter, a fixation cross was presented together with two response options (choose: green tick mark; avoid: red no-parking sign). After the fixation cross, the stimulus was shown centrally until the participant responded, or for a maximum of 2000 ms. Participants’ choices were then confirmed by a white rectangle surrounding the chosen option for 500 ms. Finally, the outcome was presented for 750 ms. If subjects chose to gamble on the presented stimulus, they received either a green smiling face and a reward of 10 points or a red frowning face and a loss of 10 points. When subjects avoided a symbol, they received the same feedback but in a slightly paler color, with the points that could have been received crossed out to indicate that the feedback was fictive and had no effect on the total score. A novel feature of this modified version of the task is that we introduced different levels of noise (probability) into the reward contingencies: reward probabilities could be less predictable (30% or 70%), more certain (20% or 80%), or random (50%). (B) Total points earned in the task, split by session (baseline, drug sessions 1 and 2) and drug condition (PL vs. MA). Results show practice effects but no differences between the two drug sessions (baseline vs. drug session 1: 595.85 (39.81) vs. 708.62 (36.93); t(93) = –4.21, p = 5.95e-05, d = 0.30; baseline vs. drug session 2: 595.85 (39.81) vs. 730.00 (38.53); t(93) = –4.77, p = 6.66e-06, d = 0.35; session 1 vs. session 2: t(93) = –0.85, p = 0.399, d = 0.05). Dashed gray indicates no significant on-/off-drug difference (Δ ∼ 35 points). (C) When we stratified drug effects by baseline performance (using a median split on total points at baseline), we found a trend toward better performance under MA in the low baseline performance group (n = 47, p = .07).
(D) Overall performance in drug sessions 1 and 2 stratified by baseline performance. Baseline performance did not appear to affect performance in either drug session. Note. IQR = interquartile range; PL = Placebo; MA = methamphetamine.
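The reward structure described in (A) can be sketched in a small simulation (illustrative only; the helper names, trial counts, and seed are ours, not the authors'):

```python
import random

# Illustrative sketch (not the authors' code) of the task's reward structure:
# every stimulus pays +10 or -10 points, with reward probabilities drawn from
# {20%, 30%, 50%, 70%, 80%}; contingencies reverse between blocks.

def simulate_stimulus(p_reward, n_trials, rng):
    """Outcomes (+10 win / -10 loss) for a stimulus with reward probability p_reward."""
    return [10 if rng.random() < p_reward else -10 for _ in range(n_trials)]

def expected_value(p_reward):
    """Expected points per trial when the stimulus is played."""
    return p_reward * 10 + (1 - p_reward) * (-10)

rng = random.Random(0)
certain_good = simulate_stimulus(0.8, 40, rng)  # "more certain" stimulus (80%)
noisy_good = simulate_stimulus(0.7, 40, rng)    # "less predictable" stimulus (70%)
random_stim = simulate_stimulus(0.5, 40, rng)   # uninformative stimulus (50%)
# Playing pays off on average only when p_reward > 0.5.
```

The expected values (6, 4, and 0 points per played trial for the 80%, 70%, and 50% stimuli) show why the noisy contingencies make the play/avoid decision harder despite an identical optimal policy.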

Learning curves after reversals suggest that methamphetamine improves learning performance in phases of less predictable reward contingencies in low baseline performers.

The top panel shows learning curves after all reversals (A), reversals to stimuli with less predictable reward contingencies (B), and reversals to stimuli with high reward probability certainty (C). The bottom panel displays the learning curves stratified by baseline performance for all reversals (D), reversals to stimuli with less predictable reward probabilities (E), and reversals to stimuli with high reward probability certainty (F). Vertical black lines divide learning into early and late stages, as suggested by the Bai–Perron multiple break point test. Results suggest no clear differences in initial learning between MA and PL. However, learning curves diverged later in learning, particularly for stimuli with less predictable rewards (B) and in subjects with low baseline performance (E). Note. PL = Placebo; MA = methamphetamine; Mean/SEM = line/shading.

Computational modeling results reveal that methamphetamine affects the model parameter controlling dynamic adjustments of learning rate.

(A) Model comparison. Bayesian model selection was performed using –0.5*BIC as a proxy for model evidence (Stephan et al., 2009). The best-fitting mixture model assigned proportions to each model based on the frequency with which they provided the best fit to the observed participant data (mixture proportion; blue bars) and estimated the probability that the true population mixture proportion for a given model exceeded that of all others (exceedance probability; black bars). The hybrid model with additional learning rate modulation by feedback confirmation (model 3) provided the best fit for the majority of participants and had an exceedance probability near one in our model set. (B-C) Comparison of parameter estimates from the winning model on/off drug. Stars indicate a significant difference for the respective parameter. Results suggest that only eta, the parameter controlling dynamic adjustments of the learning rate according to recent prediction errors, was affected by our pharmacological manipulation. (D-F) Modeled and observed choice behavior in the task, plotted separately for each stimulus. Note that in the task the different animal stimuli were presented in an intermixed and randomized fashion, but this visualization shows that participants’ choices followed the reward probabilities of the stimuli. Data plots are smoothed with a running average (+/− 2 trials). Ground truth corresponds to the reward probability of the respective stimuli (good: 70/80%; neutral: 50%; bad: 20/30%). Dashed black lines represent 95% confidence intervals derived from 1000 simulated agents with parameters best fit to participants in each group. Model predictions appear to capture the transitions in choice behavior well. Mean/SEM = line/shading. Note. IQR = interquartile range; PL = Placebo; MA = methamphetamine.
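The dynamic learning rate mechanism can be sketched as follows (a minimal Pearce-Hall-style update with rewards coded 1/0; this is an assumed form for illustration, and the paper's exact parameterization of model 3 may differ):

```python
import random

# Hedged sketch of a hybrid learner's dynamic learning rate. eta weights the
# most recent absolute reward prediction error: eta = 0 fixes the learning
# rate, while large eta makes it chase every surprising outcome.

def learning_rate_trajectory(outcomes, eta, alpha0):
    """Track the value of one stimulus (rewards coded 1/0); return learning rates."""
    v, alpha, alphas = 0.5, alpha0, []
    for reward in outcomes:
        delta = reward - v                            # reward prediction error
        v += alpha * delta                            # value update
        alpha = eta * abs(delta) + (1 - eta) * alpha  # dynamic learning rate
        alphas.append(alpha)
    return alphas

rng = random.Random(1)
outcomes = [1 if rng.random() < 0.7 else 0 for _ in range(60)]
volatile = learning_rate_trajectory(outcomes, eta=0.6, alpha0=0.3)
stable = learning_rate_trajectory(outcomes, eta=0.1, alpha0=0.3)
# Larger eta yields larger trial-to-trial swings in the learning rate.
```

Comparing the two trajectories makes the drug effect interpretable: reducing eta damps trial-by-trial learning-rate fluctuations driven by surprising outcomes.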

Methamphetamine boosts signal-to-noise ratio between real reversals and misleading feedback in late learning stages.

Learning rate trajectories after reversals, derived from the computational model. The first column depicts learning rates across all subjects for all reversals (A), reversals to stimuli with high reward probability certainty (D), and reversals to stimuli with noisy outcomes (G). The middle and right columns show learning rate trajectories for subjects stratified by baseline performance (B, E, H: low baseline performance; C, F, I: high baseline performance). Results suggest that people with high baseline performance show a large difference between learning rates after true reversals and learning rates during the rest of the task, including after misleading feedback. Specifically, they show a peak in learning after reversals and reduced learning rates in later periods of a learning block, when choice preferences should ideally have stabilized (C). This yields a better signal-to-noise ratio (SNR) between real reversals and misleading feedback (i.e., surprising outcomes in the late learning stage). In low baseline performers, the SNR improved after administration of MA. This effect was particularly visible in stages of the task where rewards were less predictable (H). The bottom row (J) shows the association between receiving misleading feedback later in learning (i.e., rewards or losses that do not align with a stimulus’ underlying reward probability) and the probability of making the correct choice at the next encounter of the same stimulus. Results indicate a negative correlation between eta and the probability of a correct choice after double-misleading feedback (scatter plot on the right): the probability of a correct choice after double-misleading feedback decreases with increasing eta. There was a trend (p = .06) for subjects under MA to be more likely than under PL to make the correct choice after two misleading outcomes (plot in the middle). This effect appeared to depend on baseline performance, whereby only subjects with low baseline performance seemed to benefit from MA (p = 0.02; plot on the right). Note. IQR = interquartile range; PL = Placebo; MA = methamphetamine; MFB = misleading feedback.
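One way to make the SNR idea concrete is to compare learning rates immediately after true reversals against learning rates late in a block (this is an assumed operationalization on our part; the paper may define the ratio differently, and the window sizes here are arbitrary):

```python
# Reversal signal-to-noise ratio: mean learning rate just after a true
# reversal divided by the mean learning rate late in the block, where
# surprising outcomes are merely misleading feedback.

def reversal_snr(alphas, reversal_trials, window=3, late_offset=10):
    """alphas: per-trial learning rates; reversal_trials: indices of true reversals."""
    signal = [a for t in reversal_trials for a in alphas[t:t + window]]
    noise = [a for t in reversal_trials
             for a in alphas[t + late_offset:t + late_offset + window]]
    return (sum(signal) / len(signal)) / (sum(noise) / len(noise))

# Toy check: learning rates that spike at reversals (trials 10 and 25), then settle
alphas = [0.2] * 40
for t in (10, 25):
    alphas[t:t + 3] = [0.8, 0.6, 0.4]
snr = reversal_snr(alphas, [10, 25])  # spikes stand out against the late baseline
```

An agent whose learning rate peaks only at true reversals gets an SNR well above 1; one that also spikes after late misleading feedback drives the ratio toward 1.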

Changes in learning rate adjustment explain drug-induced performance benefits in low baseline performers.

(A) Regression coefficients and 95% confidence intervals (points and lines; sorted by value) quantifying the contribution of each model parameter estimate to participants’ overall task performance (i.e., points scored in the task). Play bias and eta (the parameter governing the influence of surprise on the learning rate) both made significant negative contributions to overall task performance, whereas inverse temperature and learning rates were positively related to performance. (B) Differences in parameter values between on- and off-drug sessions, quantified by regression coefficients and 95% confidence intervals, plotted separately for high (red) and low (yellow) baseline performers. Note that the drug predominantly affected the eta parameter, and did so to a greater extent in low baseline performers. (C) Eta estimates on-drug (y-axis) plotted against eta estimates off-drug (x-axis) for high baseline performers (yellow points) and low baseline performers (red points). Note that a majority of subjects showed a reduction in eta on-drug vs. off-drug (67.02%). This effect was more pronounced in low baseline performers (low baseline performers: 74.47%; high baseline performers: 59.57%). (D) To better understand how changes in eta might have affected overall performance, we conducted a set of simulations using the parameters best fit to human subjects, except that we equipped the model with a range of randomly chosen eta values to examine how altering that parameter might affect performance (n = 1000 agents). The results revealed that simulated agents with low to intermediate levels of eta achieved the best task performance, with models equipped with the highest etas performing particularly poorly.
To illustrate how this relationship between eta and performance could have driven improved performance for some participants under the methamphetamine condition, we highlight four participants with low-to-moderate eta values under methamphetamine but dramatically different eta values in the placebo condition (D, inset). (E) To test whether the simulations correspond to actual performance differences across conditions, we calculated the predicted improvement for each participant based on their eta in each condition, using the polynomial function that best described the relationship between simulated eta values and scored points (red line in D; fitted with MATLAB’s polyfit.m function; f(x) = –2.35e+03*x^4 + 5.64e+03*x^3 – 4.71e+03*x^2 + 1.29e+03*x + 692.08). We found that actual performance differences were positively correlated with the predicted ones (high baseline performers: Pearson’s r(47) = .31, p = .03; low baseline performers: Spearman’s rho(47) = .34, p = .02). These results indicate that the individuals who showed the greatest task benefit from methamphetamine were those who underwent the most advantageous adjustments of eta in response to it. Note that we used rank-order statistics for low baseline performers because the distribution is skewed by an outlier (upper left corner). PL = Placebo; MA = methamphetamine.
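The fitted quartic can be evaluated directly to obtain each participant's predicted improvement (coefficients as printed in the caption; the sign pattern is reconstructed from garbled typography, so treat it as approximate):

```python
# Quartic mapping eta to expected points, using the caption's coefficients
# (signs reconstructed; approximate). Predicted improvement is the difference
# in predicted points between the on-drug and off-drug eta estimates.

def predicted_points(eta):
    return (-2.35e3 * eta**4 + 5.64e3 * eta**3
            - 4.71e3 * eta**2 + 1.29e3 * eta + 692.08)

def predicted_improvement(eta_on, eta_off):
    return predicted_points(eta_on) - predicted_points(eta_off)

# A drop from a high eta (0.6) to a moderate eta (0.2) predicts a gain in points,
# consistent with low-to-intermediate eta being best in the simulations.
gain = predicted_improvement(0.2, 0.6)
```

Note that the curve's intercept (692.08 points at eta = 0) and its decline at high eta reproduce the qualitative pattern in panel D.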

Summary of key findings.

Mean (SEM) scores on three measures of task performance after PL and MA, in participants stratified into low or high baseline performance. (A) There was a trend toward a drug effect, with boosted task performance (total points scored in the task) in low baseline performers (stratified via median split on baseline performance) after methamphetamine (20 mg) administration. (B) Follow-up analyses revealed that on-drug performance benefits were mainly driven by significantly better choices (i.e., choosing the advantageous stimuli and avoiding disadvantageous stimuli) at later stages after reversals for less predictable reward contingencies (30/70% reward probability). (C) To understand the computational mechanism through which methamphetamine improved performance in low baseline performers, we investigated how task performance related to the model parameters from our fits. Our results suggest that methamphetamine alters performance by changing the degree to which learning rates are adjusted according to recent prediction errors (eta), in particular by reducing the strength of such adjustments in low baseline performers to push them closer to task-specific optimal values.

Learning curves

The top part shows learning curves, quantified as the probability of making the correct choice (choosing the advantageous stimuli and avoiding disadvantageous stimuli), stratified by baseline (orientation session) performance. Two-way ANOVAs with the factors Drug (two levels) and Baseline Performance (two levels) on the averaged probability of a correct choice during the early and late stages of learning were used to investigate drug effects. (A) No differences in the learning curves between MA and PL became evident when considering all reversals (all p > .1). (B) There was no drug-related difference in the acquisition phase of the task (all p > .05) or (C) in the first reversal (all p > .1). In the bottom part of the figure, learning curves are defined as the probability of selecting a stimulus. (D) No drug effect emerged for reversal learning from a bad stimulus to a good stimulus (all p > .09) or (E) from good to bad stimuli (all p > .09). Moreover, there was no difference in reversal learning to neutral stimuli (F and G). Note. PL = Placebo; MA = methamphetamine.

Validation of model selection and parameter recovery.

After model fitting, each model was used to simulate data for each participant using the best-fitting parameters for that participant. Each model was then fit to each synthetic dataset, and BIC was used to determine which model provided the best fit to the synthetic data. (A) Inverse confusion matrix. The frequency with which a recovered model (abscissa, determined by lowest BIC) corresponded to a given simulation model (ordinate) is depicted in color. Recovered models correspond to the same models labeled on the ordinate, with recovered model 1 corresponding to the base model, and so on. The results of the model recovery analyses suggest that the recovered model typically corresponded to the synthetic dataset produced by that model. (B) Parameter values used to simulate data from the hybrid model with additional modulation of the learning rate by feedback confirmation (ordinate) tended to correlate (color) with the parameter values best fit to those synthetic datasets (abscissa). Recovered parameter values correspond to the labels on the ordinate, with parameter 1 reflecting the temperature parameter of the softmax function, and so on.
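The simulate-and-refit procedure can be sketched roughly as follows (a toy two-model version with grid-search fitting and a fixed softmax temperature; the function names and grids are ours, and fictive feedback on avoid trials is ignored for brevity):

```python
import math
import random

def simulate(p_reward, n_trials, alpha, eta, beta, rng):
    """Generate (choice, reward) pairs from a hybrid learner (rewards coded 1/0)."""
    v, data = 0.5, []
    for _ in range(n_trials):
        p_play = 1 / (1 + math.exp(-beta * (2 * v - 1)))  # softmax over play/avoid
        choice = 1 if rng.random() < p_play else 0
        reward = 1 if rng.random() < p_reward else 0
        data.append((choice, reward))
        if choice:  # fictive feedback on avoid trials is ignored in this sketch
            delta = reward - v
            v += alpha * delta
            alpha = eta * abs(delta) + (1 - eta) * alpha
    return data

def neg_loglik(data, alpha, eta, beta):
    v, nll = 0.5, 0.0
    for choice, reward in data:
        p_play = 1 / (1 + math.exp(-beta * (2 * v - 1)))
        nll -= math.log(p_play if choice else 1 - p_play)
        if choice:
            delta = reward - v
            v += alpha * delta
            alpha = eta * abs(delta) + (1 - eta) * alpha
    return nll

def fit_bic(data, eta_grid, beta=3.0):
    """Grid-search maximum likelihood; BIC = 2*NLL + k*ln(n)."""
    alpha_grid = [i / 10 for i in range(1, 10)]
    best_nll = min(neg_loglik(data, a, e, beta)
                   for a in alpha_grid for e in eta_grid)
    k = 1 if eta_grid == [0.0] else 2  # the base model fixes eta at 0
    return 2 * best_nll + k * math.log(len(data))

rng = random.Random(2)
synthetic = simulate(0.7, 120, alpha=0.4, eta=0.5, beta=3.0, rng=rng)
bic_base = fit_bic(synthetic, [0.0])
bic_hybrid = fit_bic(synthetic, [i / 10 for i in range(10)])
recovered = "hybrid" if bic_hybrid < bic_base else "base"
```

Because the hybrid grid contains eta = 0, its maximum likelihood can never be worse than the base model's; BIC then penalizes the extra parameter, which is what makes recovery informative rather than trivially favoring the larger model.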

Relationships between model parameters not affected by the drug and task performance (measured by total scored points in the task).

To better understand how changes in the model parameters not affected by methamphetamine might have affected overall performance, we conducted a set of simulations using the parameters best fit to human subjects, except that we equipped the model with a range of randomly chosen values for the temperature parameter of the softmax function (A), the play bias term (B), the intercept term of the learning rate (C), or the feedback confirmation term of the learning rate (D), to examine how altering these parameters might affect performance. For each model we drew 1000 values of the respective parameter from a uniform distribution spanning the fitted parameter space. The results revealed that simulated agents with higher temperature parameters achieved the best task performance (A). Moreover, agents with a play bias around zero (B), and with intercept (C) and feedback confirmation (D) terms of the learning rate centered around zero, achieved the best task performance. To test whether the simulations correspond to actual performance differences across conditions, we calculated the predicted performance difference for each participant based on their on-/off-drug parameter difference, using a polynomial function that best described the relationship between simulated parameter values and scored points (red lines; fitted with MATLAB’s polyfit.m function). Results are shown next to the simulations and suggest that predicted performance differences were unrelated to actual performance differences for changes in the temperature parameter of the softmax function (A; r(188) = 0.16, p = 0.10), play bias term (B; r(188) = 0.12, p = 0.22), intercept term of the learning rate (C; r(188) = 0.09, p = 0.34), and feedback confirmation term of the learning rate (D; r(188) = 0.08, p = 0.39).
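The parameter-perturbation logic can be sketched like this (a toy agent with a single perturbed learning rate; the authors perturbed four different parameters of the full model, used 1000 draws per parameter, and then fit a polynomial to the resulting parameter-performance curve):

```python
import random

# Illustrative sketch (ours, not the authors' pipeline): draw a parameter
# uniformly over a plausible fitted range, simulate task points for each
# draw, and inspect the parameter-performance relationship.

def simulate_points(alpha, p_reward, n_trials, rng):
    """Toy agent: plays while its learned value is non-negative; outcomes are +/-10."""
    v, points = 0.0, 0
    for _ in range(n_trials):
        reward = 10 if rng.random() < p_reward else -10
        if v >= 0:                     # play only while the stimulus looks good
            points += reward
            v += alpha * (reward - v)  # fixed-learning-rate value update
    return points

rng = random.Random(3)
draws = [rng.uniform(0.05, 0.95) for _ in range(200)]  # uniform over the range
scores = [simulate_points(a, 0.7, 80, rng) for a in draws]
# A quartic fit of scores against draws (as with MATLAB's polyfit) would then
# give the predicted-performance curve used in the panels described above.
```

Pairing each draw with its simulated score is exactly the data needed for the polynomial fits; the same loop, run once per perturbed parameter, reproduces the structure of panels A-D.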

Overall points full sample.

When comparing overall points in the whole sample (n = 109), we did not see a difference between MA and PL (705.68 (36.27) vs. 685.77 (35.78); t(108) = 0.81, p = 0.42, d = 0.05). Mixed ANOVAs suggested that drug effects did not depend on session order (MA first vs. PL first) or on whether subjects performed the orientation session. Yet participants who completed the orientation tended to perform better during the drug sessions (F(1,107) = 3.09, p = 0.08; 719.31 (26.63) vs. 548.00 (75.09)). Note. PL = Placebo; MA = methamphetamine.

Learning curves after reversals full sample.

The figure shows learning curves after all reversals (A), reversals to stimuli with high reward probability uncertainty (B), and reversals to stimuli with low reward probability uncertainty (C) for the whole sample. Vertical black lines divide learning into early and late stages, as suggested by the Bai–Perron multiple break point test. Paired-sample t-tests revealed no drug-related difference for all reversals during early learning (0.72 (0.01) vs. 0.72 (0.01); t(108) = –0.02, p = 0.98, d < 0.01) or late learning (0.83 (0.01) vs. 0.84 (0.01); t(108) = –0.80, p = 0.42, d = 0.04). Similarly, there were no significant differences in either learning stage for reversals to stimuli with low reward probability certainty (early learning PL vs. MA: 0.68 (0.01) vs. 0.69 (0.01); t(108) = –0.92, p = 0.35, d = 0.08; late learning PL vs. MA: 0.80 (0.01) vs. 0.81 (0.01); t(108) = –1.48, p = 0.14, d = 0.10) or to stimuli with high reward probability certainty (early learning PL vs. MA: 0.74 (0.01) vs. 0.73 (0.01); t(108) = 0.87, p = 0.38, d = 0.06; late learning PL vs. MA: 0.85 (0.01) vs. 0.85 (0.01); t(108) = –0.02, p = 0.97, d < 0.01). Mixed-effects ANOVAs that controlled for session order and whether participants performed the orientation session revealed no significant effects (all p > .06). Note. PL = Placebo; MA = methamphetamine.
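The paired comparisons reported above follow the standard paired t statistic on difference scores; a minimal sketch (the effect size here is d computed from the difference scores, which may differ from the paper's exact d formula; the input numbers are hypothetical):

```python
import math

def paired_t(x, y):
    """Paired t statistic and a difference-score Cohen's d for two conditions."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in diffs) / (n - 1))
    t_stat = mean / (sd / math.sqrt(n))  # t = mean difference / standard error
    cohens_d = mean / sd                 # standardized mean difference
    return t_stat, cohens_d

# Hypothetical per-subject accuracies in two drug conditions (illustrative only)
t_stat, d = paired_t([0.72, 0.80, 0.68], [0.70, 0.79, 0.69])
```

With n subjects, the statistic is referenced against a t distribution with n − 1 degrees of freedom, matching the t(108) values reported for the 109-participant sample.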

Computational modeling results in the full sample.

(A) Here we compare MA’s effect on the best-fitting parameters of the winning model in the full sample (n = 109). We found that eta (i.e., the weighting of the effect of the absolute reward prediction error on learning) was reduced under MA (MA: 0.23 (0.01) vs. PL: 0.29 (0.01); t(108) = –3.05, p = 0.002, d = 0.40). Mixed-effects ANOVAs that controlled for session order and whether participants performed the orientation session revealed that this effect did not depend on these confounds. No other condition differences emerged. (B) Learning rate trajectories after reversals, derived from the computational model. As in the reduced sample, MA appears to be associated with reduced learning rate dynamics in the full sample. Specifically, variability in the learning rate (average individual SD of the learning rate) tended to be reduced in the MA condition during both early and late stages of learning across all reversals (early PL: 0.19 (0.01) vs. MA: 0.18 (0.01); t(108) = 1.89, p = 0.06, d = 0.24; late PL: 0.17 (0.01) vs. MA: 0.16 (0.01); t(108) = 1.77, p = 0.08, d = 0.23) and for reversals to stimuli with high reward probability uncertainty (early PL: 0.18 (0.01) vs. MA: 0.16 (0.01); t(108) = 1.74, p = 0.08, d = 0.22; late PL: 0.18 (0.01) vs. MA: 0.16 (0.01); t(108) = 1.82, p = 0.07, d = 0.24). Condition differences became most evident for reversals to stimuli with low reward probability uncertainty (early PL: 0.19 (0.01) vs. MA: 0.16 (0.01); t(108) = 2.18, p = 0.03, d = 0.28; late PL: 0.18 (0.01) vs. MA: 0.16 (0.01); t(108) = 1.93, p = 0.05, d = 0.24). Control analyses revealed that these effects were independent of session order and the orientation session. Note. PL = Placebo; MA = methamphetamine.