Abstract
The ability to calibrate learning according to new information is a fundamental component of an organism’s ability to adapt to changing conditions. Yet, the exact neural mechanisms guiding dynamic learning rate adjustments remain unclear. Catecholamines appear to play a critical role in adjusting the degree to which we use new information over time, but individuals vary widely in how they adjust to changes. Here, we studied the effects of a low dose of methamphetamine (MA), and individual differences in these effects, on probabilistic reversal learning dynamics in a within-subject, double-blind, randomized design. Participants first completed a reversal learning task during a drug-free baseline session to provide a measure of baseline performance. They then completed the task during two further sessions, one with MA (20 mg oral) and one with placebo (PL). First, we showed that, relative to PL, MA modulates the ability to dynamically adjust learning from prediction errors. Second, this effect was more pronounced in participants who performed poorly at baseline. These results present novel evidence for the involvement of catecholaminergic transmission in learning flexibility and highlight that baseline performance modulates the effect of the drug.
Introduction
Goal-directed behavior requires organisms to continually update predictions about the world to select actions in the light of new information. In environments that include discontinuities (changepoints) and noise (probabilistic errors), optimal learning requires increased weighting of surprising information during periods of change and ignoring surprising events during periods of stability. A burgeoning literature suggests that humans are able to calibrate learning rates according to the statistical content of new information (Behrens et al., 2007; Cook et al., 2019; Diederen et al., 2016; Nassar et al., 2019; Nassar et al., 2010; Razmi & Nassar, 2022), albeit to varying degrees (Kirschner et al., 2022; Kirschner et al., 2023; Nassar et al., 2016; Nassar et al., 2012; Nassar et al., 2021).
Although the exact neural mechanisms guiding dynamic learning adjustments are unclear, several neuro-computational models have been put forward to characterize adaptive learning. While these models differ in their precise computational mechanisms, they share the hypothesis that catecholamines play a critical role in adjusting the degree to which we use new information over time. For example, one class of models assumes that striatal dopaminergic prediction errors act as a teaching signal in cortico–striatal circuits to learn task structure and rules (Badre & Frank, 2012; Collins & Frank, 2013; Collins & Frank, 2016; Lieder et al., 2018; Pasupathy & Miller, 2005; Schultz et al., 1997). Another line of research highlights the role of dopamine in tracking reward history with multiple learning rates (Doya, 2002; Kolling et al., 2016; Meder et al., 2017; Schweighofer & Doya, 2003). This integration of reward history over multiple time scales enables people to estimate trends in the environment through past and recent experiences and to adjust actions accordingly (Wilson et al., 2013). Within the broader literature on cognitive control, it has been suggested that dopamine in the prefrontal cortex and basal ganglia is involved in modulating computational tradeoffs such as the cognitive stability–flexibility balance (Cools, 2008; Dreisbach et al., 2005; Floresco, 2013; Goschke, 2013; Goschke & Bolte, 2014; Goschke & Bolte, 2018). In particular, it has been proposed that dopamine plays a crucial role in the regulation of meta-control parameters that facilitate dynamic switching between complementary control modes (i.e., shielding goals from distracting information vs. switching goals in response to significant changes in the environment) (Goschke, 2013; Goschke & Bolte, 2014; Goschke & Bolte, 2018). Finally, other theories highlight the importance of the locus coeruleus/norepinephrine system in facilitating adaptive learning and structure learning (Razmi & Nassar, 2022; Silvetti et al., 2018; Yu et al., 2021). Consistent with these neuro-computational models, catecholaminergic drugs are known to affect cognitive performance, including probabilistic reversal learning (Cook et al., 2019; Dodds et al., 2008; Repantis et al., 2010; Rostami Kandroodi et al., 2021; van den Bosch et al., 2022; Westbrook et al., 2020). Indeed, psychostimulants such as methamphetamine, which increase extracellular catecholamine availability, can enhance cognition (Arria et al., 2017; Husain & Mehta, 2011; Smith & Farah, 2011) and are used to remediate cognitive deficits in attention deficit hyperactivity disorder (ADHD) (Arnsten & Pliszka, 2011; Prince, 2008). However, these cognitive enhancements vary across tasks and across individuals (Bowman et al., 2023; Cook et al., 2019; Cools & D’Esposito, 2011; Garrett et al., 2015; Rostami Kandroodi et al., 2021; van den Bosch et al., 2022; van der Schaaf et al., 2013), and the mechanisms underlying this variability remain poorly understood.
There is evidence that the effects of catecholaminergic drugs depend on an individual’s baseline dopamine levels in the prefrontal cortex (PFC) and striatum (Cohen & Servan-Schreiber, 1992; Cools & D’Esposito, 2011; Dodds et al., 2008; Durstewitz & Seamans, 2008; Rostami Kandroodi et al., 2021; van den Bosch et al., 2022). Depending on individual baseline dopamine levels, the administration of catecholaminergic drugs can promote states of cognitive flexibility or stability. For example, pushing dopamine from low to optimal (medium) levels may increase update thresholds in the light of new information (i.e., facilitating shielding/stability), whereas pushing dopamine either too high or too low may decrease update thresholds (i.e., facilitating shifting/flexibility) (Durstewitz & Seamans, 2008; Goschke & Bolte, 2018).
Here, we argue that baseline performance should be considered when studying the behavioral effects of catecholaminergic drugs. To investigate the role of baseline performance in drug challenge studies, it is important to control for several factors. First, the order of drug and placebo sessions must be balanced to control for practice effects (Bartels et al., 2010; Garrett et al., 2015; MacRae et al., 1988; Servan-Schreiber et al., 1998). Second, it is desirable to obtain an independent measure of baseline performance that is not confounded with the drug vs. placebo comparison. Thus, participants may be stratified based on their performance in an independent session.
In the present study, we examined the effects of methamphetamine, a stimulant that increases monoaminergic transmission, on probabilistic reversal learning dynamics in a within-subject, double-blind, randomized design. The effects of the drug on a reversal learning task were examined in relation to participants’ baseline level of performance. Baseline performance was determined during an initial drug-free session. Then, participants completed the task during two sessions after receiving placebo (PL) and 20 mg of methamphetamine (MA; order counterbalanced).
The task used to study adaptive learning dynamics was a reversal variant of an established probabilistic learning task (Fischer & Ullsperger, 2013; Jocham et al., 2014; Kirschner et al., 2022; Kirschner et al., 2023). On each trial, subjects chose either to gamble or to avoid gambling on a probabilistic outcome, in response to a stimulus presented in the middle of the screen (see Figure 2A). A gamble could result in a gain or loss of 10 points, depending on the reward contingency associated with that stimulus. In choosing not to gamble, subjects avoided losing or winning points, but they were informed what would have happened had they chosen to gamble. The reward contingency changed every 30-35 trials. By learning which symbols to choose and which to avoid, participants could maximize total points. A novel feature of this modified version of the task is that we introduced different levels of noise (probability) to the reward contingencies. Here, reward probabilities could be less predictable (30% or 70%) or more certain (20% or 80%). This manipulation allowed us to study the effect of MA on the dynamic balancing of updating and shielding beliefs about reward contingencies within different levels of noise in the task environment. To estimate learning rate adjustments, we fit a nested set of reinforcement learning models that allowed for trial-by-trial learning rate adjustments.
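To make the task structure concrete, the following minimal Python sketch generates a reward schedule and resolves single trials as described above. The probability set, block lengths, and point values follow the task description; the function names and the random redrawing of contingencies at each reversal are illustrative assumptions, not the actual sequence used in the experiment (which was fixed across participants).

```python
import numpy as np

rng = np.random.default_rng(0)

def stimulus_schedule(n_blocks=7, probs=(0.2, 0.3, 0.5, 0.7, 0.8)):
    # Win probability is redrawn at every reversal (every 30-35 trials),
    # mixing more certain (20/80%) and less predictable (30/70%) blocks.
    schedule = []
    for _ in range(n_blocks):
        p_win = rng.choice(probs)
        schedule.extend([p_win] * int(rng.integers(30, 36)))
    return np.array(schedule)

def play_trial(p_win, gamble):
    # Gambling wins or loses 10 points; passing always yields 0 points,
    # but the counterfactual outcome is still shown as feedback.
    outcome = 10 if rng.random() < p_win else -10
    return (outcome if gamble else 0), outcome
```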
We found that MA improved participants’ performance in the task, but this effect was driven mainly by a greater improvement in those participants who performed poorly during the baseline session. Modeling results suggested that MA helped performance by adaptively shifting the relative weighting of surprising outcomes based on their statistical context. Specifically, MA facilitated down-weighting of probabilistic errors in stages of less predictable reward contingencies. Together, these results reveal novel insights into the role of catecholamines in adaptive learning behavior and highlight the importance of considering individual differences at baseline.
Results
Ninety-seven healthy young adults completed the probabilistic learning task (Figure 2) (Fischer & Ullsperger, 2013; Jocham et al., 2014; Kirschner et al., 2022; Kirschner et al., 2023) in three separate sessions: an initial drug-free baseline session, and one session each after PL and MA. The study followed a double-blind cross-over design, whereby 50% of participants received MA first and 50% received PL first. Table 1 shows the demographic characteristics of the participants grouped by their task performance during the baseline session. The groups did not differ significantly on any of the variables measured. In a first analysis, we checked for general practice effects across the three task completions based on the total points earned in the task. We found a strong practice effect (F(2,186) = 14.53, p < .001), with better performance in sessions two and three compared to session one (baseline). There was no difference in total scores between sessions two and three (see Figure 2B). These results suggest that the baseline session may have minimized order effects between the MA and PL sessions (see also the results and discussion below). The key findings detailed below are summarized in a schematic figure presented in the discussion section (Figure 7).
Subjective drug effects
MA administration significantly increased ‘feel drug effect’ ratings compared to PL, at 30, 50, 135, 180, and 210 min post-capsule administration (see Figure 1; Drug x Time interaction F(5,555) = 38.46, p < 0.001).
Drug effects on overall performance and RT
In general, participants learned the task well, as their choice behavior largely followed the underlying reward probabilities of the stimuli across the sessions (see Figure 4D-F). When all subjects were considered together, we did not find a performance benefit under MA, quantified by the total points scored in the task (MA: 736.59 (37.11) vs. PL: 702.02 (38.31); t(93) = 1.38, p = 0.17, d = 0.10). When participants were stratified by their baseline performance (median split on total points at baseline), we found a marginally significant Drug x Baseline Performance Group interaction (F(1,92) = 3.20, p = 0.07; see Figure 2C and Figure 7A). Post hoc t tests revealed that, compared to PL, MA improved performance marginally in participants with poor baseline performance (total points MA: 522.55 (53.79) vs. PL: 443.61 (47.81); t(46) = 1.85, p = 0.071, d = 0.23). MA did not, however, improve performance in the high baseline performance group (total points MA: 950.63 (26.15) vs. PL: 960.42 (27.26); t(46) = –0.38, p = 0.698, d = 0.05). In control analyses we ensured that these effects were not driven by session-order effects (see also the section on session control analyses below). Results showed no effect of Session (F(1,92) = 0.71, p = 0.40) and no Session x Baseline Performance Group interaction (F(1,92) = 0.59, p = 0.44; see Figure 1C). There was a trend for slightly faster RTs under MA (PL: 544.67 ms (9.87) vs. MA: 533.84 ms (11.51); t(93) = 1.75, p = 0.08, d = 0.10). This speed effect appeared to be independent of baseline performance (Drug x Baseline Performance Group interaction: F(1,92) = 0.45, p = 0.50). Moreover, MA was associated with reduced RT variability (average individual SD of RTs: PL: 193.74 (6.44) vs. MA: 178.98 (5.47); t(93) = 2.54, p = 0.012, d = 0.25). Reduced RT variability has previously been associated with increased attention and performance (Esterman et al., 2012; Karamacoska et al., 2018). A two-way ANOVA on RT variability revealed an effect of Baseline Performance (F(1,92) = 4.52, p = 0.03), with increased RT variability in low baseline performers across the drug sessions (low baseline performance: 197.27 (6.48) vs. high baseline performance: 175.45 (5.29)). Moreover, there was an effect of Drug (F(1,92) = 6.87, p = 0.01) and a Drug x Baseline Performance Group interaction (F(1,92) = 6.97, p = 0.009). Post hoc t tests indicated that the MA-related reduction in RT variability was specific to low baseline performers (PL: 212.07 (9.84) vs. MA: 182.46 (7.98); t(46) = 3.04, p = 0.003, d = 0.48), whereas MA did not affect high baseline performers’ RT variability (PL: 175.40 (7.51) vs. MA: 175.50 (7.55); t(46) = –0.02, p = 0.98, d < 0.01).
Methamphetamine improves learning performance when reward contingencies are less predictable
Next, to better understand how MA affects learning dynamics, we investigated the probability of correct choice (i.e., choosing the advantageous stimuli and avoiding disadvantageous stimuli) across successive reversals. As shown in Figure 3, the drug did not affect initial learning. However, the drug improved performance later in learning, particularly for stimuli with less predictable reward probabilities (see Figure 3B) and in subjects with low baseline performance. To quantify this observation, we first applied the Bai-Perron multiple break point test (see Methods) to find systematic breaks in the learning curves, allowing us to divide learning into early and late stages. We applied the test to the reversal learning data across subjects. One break point was identified at 10 trials after a reversal (indexed by the vertical lines in Figure 3). We did not find drug differences when considering all reversals (PL: 0.84 (0.01) vs. MA: 0.85 (0.01); t(93) = –1.14, p = 0.25, d = 0.07) or reversals to stimuli with high reward probability certainty (PL: 0.86 (0.01) vs. MA: 0.87 (0.01); t(93) = –0.25, p = 0.80, d = 0.02). Interestingly, we found a trend for increased learning under MA for stimuli with less predictable rewards (PL: 0.80 (0.01) vs. MA: 0.82 (0.01); t(93) = –1.80, p = 0.07, d = 0.14). A two-way ANOVA on the averaged probability of correct choice during the late stage of learning revealed a Drug x Baseline Performance Group interaction (F(1,92) = 4.85, p = 0.03; see Figure 7B). Post hoc t tests revealed that subjects performing poorly at baseline benefited from MA (average accuracy in late learning PL: 0.69 (0.02) vs. MA: 0.74 (0.02); t(46) = –2.59, p = 0.01, d = 0.32), whereas there was no difference between MA and PL in the high baseline performance group (PL: 0.91 (0.01) vs. MA: 0.91 (0.01); t(46) = 0.29, p = 0.77, d = 0.04). We did not find other differences in reversal learning (all p > 0.1). In control analyses we split the learning curves into other possible learning situations in the task (i.e., acquisition, first reversal learning, etc.). No drug-related effects emerged here (see Supplementary Figure 1).
Computational modeling results
To gain a better mechanistic understanding of the trial-to-trial learning dynamics, we constructed a nested model set built from RL models (see Methods) that included the following features: (1) a temperature parameter of the softmax function used to convert trial expected values to action probabilities (β), (2) a play bias term that indicates a tendency to attribute higher value to gambling behavior, and (3) an intercept term for the effect of learning rate on choice behavior. Additional parameters controlled trial-by-trial modulations of the learning rate, including feedback confirmation (confirmatory feedback was defined as factual wins and counterfactual losses; disconfirmatory feedback was defined as factual losses and counterfactual wins), feedback modality (factual vs. counterfactual), and weighting of the learning rate as a function of the absolute value of the previous prediction error (parameter eta, determining the influence of surprise about the outcome on learning; Li et al., 2011). The winning model (as measured by the lowest BIC, achieving a protected exceedance probability of 100%) was one that allowed the learning rate to vary based on whether the feedback was confirmatory or not and on the level of surprise of the outcome (see Figure 4A). Sufficiency of the model was evaluated through posterior predictive checks that matched behavioral choice data (see Figure 4D-F) and model validation analyses (see Supplementary Figure 2). We did not find evidence for differences in model fit between the drug conditions (avg. BIC PL: 596.77 (21.63) vs. MA: 599.66 (19.85); t(93) = –0.25, p = 0.80, d = 0.01).
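For readers unfamiliar with BIC-based model comparison, the sketch below illustrates the selection logic. The log-likelihoods and parameter counts are invented placeholders rather than the fitted values from this study, and the protected exceedance probability analysis (Rigoux et al., 2014) is a separate group-level Bayesian procedure not shown here.

```python
import numpy as np

def bic(log_lik, n_params, n_obs):
    # Schwarz (1978): BIC = k*ln(n) - 2*ln(L); lower values indicate a
    # better fit after penalizing model complexity.
    return n_params * np.log(n_obs) - 2.0 * log_lik

# Hypothetical maximized log-likelihoods and parameter counts for M1-M4
# (all numbers illustrative, not the values reported in the paper).
fits = {"M1": (-450.0, 3), "M2": (-410.0, 4), "M3": (-395.0, 5), "M4": (-394.0, 6)}
n_trials = 714
scores = {name: bic(ll, k, n_trials) for name, (ll, k) in fits.items()}
best_model = min(scores, key=scores.get)  # model with the lowest BIC wins
```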
Next, we compared MA’s effects on the best-fitting parameters of the winning model (see Figure 4B-C). We found that eta (the parameter controlling dynamic adjustments of learning rate according to recent absolute prediction errors) was reduced under MA (eta MA: 0.24 (0.01) vs. PL: 0.30 (0.01); t(93) = –3.005, p = 0.003, d = 0.43). When we stratified drug effects by baseline performance, we found a marginally significant Drug x Baseline Performance Group interaction (F(1,92) = 3.09, p = 0.08; see Figure 7C). Post hoc t tests revealed that MA’s effect on eta depended on baseline performance in the task. Subjects performing less well at baseline showed smaller etas under MA (MA: 0.24 (0.01) vs. PL: 0.33 (0.02); t(46) = –3.06, p = 0.003, d = 0.67), whereas there was no difference between MA and PL in the high baseline performance group (MA: 0.23 (0.01) vs. PL: 0.26 (0.01); t(46) = –1.03, p = 0.31, d = 0.18). We did not find drug-related differences in any other model parameters (all p > 0.1).
Methamphetamine affects learning rate dynamics
Next, we investigated how the fitted model parameters translated into trial-by-trial modulations of the learning rate. Learning rates in our best-fitting model were dynamic and affected by both model parameters and their interaction with feedback. Learning rate trajectories after reversals are depicted in Figure 5. As suggested by lower eta scores, MA appears to be associated with reduced learning rate dynamics in low baseline performers. In contrast, low baseline performers in the PL condition exhibited greater variability in learning rate (and a higher average learning rate throughout), rendering their choices more erratic. Consistent with this, on many trials their choices were driven by the most recent feedback, as their learning rates on a large subset of trials in later learning stages (on average 9 out of 11; Figure 5H) were greater than 0.5. Specifically, variability in learning rate (average individual SD of learning rate) under MA was reduced in both early and late stages of learning across all reversals (early PL: 0.20 (0.01) vs. MA: 0.17 (0.01); t(93) = 2.72, p = 0.007, d = 0.36; late PL: 0.18 (0.01) vs. MA: 0.15 (0.01); t(93) = 2.51, p = 0.01, d = 0.33), as well as for reversals to stimuli with less predictable rewards (early PL: 0.19 (0.01) vs. MA: 0.16 (0.01); t(93) = 2.98, p = 0.003, d = 0.39; late PL: 0.18 (0.01) vs. MA: 0.16 (0.01); t(93) = 2.66, p = 0.009, d = 0.35). Reversals to stimuli with high outcome certainty were also associated with decreased learning rate variability after MA administration (early PL: 0.18 (0.01) vs. MA: 0.15 (0.01); t(93) = 2.57, p = 0.01, d = 0.34; late PL: 0.18 (0.01) vs. MA: 0.15 (0.01); t(93) = 2.63, p = 0.009, d = 0.35). Two-way ANOVAs revealed that this effect depended on baseline performance across all reversals (Drug x Baseline Performance: F(1,92) = 3.47, p = 0.06), for reversals to stimuli with less predictable rewards (Drug x Baseline Performance: F(1,92) = 4.97, p = 0.02), and for stimuli with high outcome certainty (Drug x Baseline Performance: F(1,92) = 5.26, p = 0.03). Reduced variability under MA was observed in low baseline performers (all p < .006, all d > .51) but not in high baseline performers (all p > .1). Together, these patterns of results suggest that people with high baseline performance show a large difference between learning rates after true reversals and learning rates during the rest of the task, including after misleading feedback. Specifically, they show a peak in learning after reversals and reduced learning rates in later periods of a learning block, when choice preferences should ideally be stabilized (see Figure 5C). This results in a higher signal-to-noise ratio (SNR) between real reversals and misleading feedback (i.e., surprising outcomes in the late learning stage). In low baseline performers, the SNR improved after the administration of MA. This effect was particularly visible in stages of the task where rewards were less predictable. To quantify the SNR for less predictable reward contingencies in low baseline performers, we computed the difference between learning rate peaks on true reversals (signal) and learning rate peaks after probabilistic feedback later in learning (noise; SNR = signal – noise). This analysis revealed that MA significantly increased the SNR for low baseline performers (PL: 0.01 (0.01) vs. MA: 0.04 (0.01); t(46) = –2.81, p = 0.007, d = 0.49). Moreover, in low baseline performers, learning rates under PL remained higher than under MA in later stages of learning, when choice preferences should ideally have stabilized (avg. learning rate during late learning for less predictable rewards: PL: 0.48 (0.01) vs. MA: 0.42 (0.01); t(46) = 3.36, p = 0.001, d = 0.56).
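The SNR measure defined above reduces to a difference of two averages over the model-derived learning rate trajectory. A minimal sketch, assuming the caller supplies the trial indices of true reversals and of late misleading feedback:

```python
import numpy as np

def learning_rate_snr(lr, reversal_trials, misleading_trials):
    # SNR as defined in the text: mean learning-rate peak on true
    # reversals (signal) minus mean learning-rate peak after misleading
    # probabilistic feedback late in learning (noise).
    signal = np.mean(lr[np.asarray(reversal_trials)])
    noise = np.mean(lr[np.asarray(misleading_trials)])
    return signal - noise
```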
Thus far, our results suggest that (1) MA improved performance in subjects who performed poorly at baseline, and (2) MA reduced learning rate variability in subjects with low baseline performance (driven by significantly lower eta parameter estimates, which improved the SNR between true reversals and misleading feedback, particularly for less predictable rewards). Next, we aimed to test how these differences relate to each other. Given that eta increases learning after surprising feedback, and that the biggest drug differences emerged in later stages of learning for stimuli with less predictable rewards, we tested the association between eta and the probability of making the correct choice after two consecutive probabilistic errors (wins for bad stimuli and losses for good stimuli; in total this happened 8 times in the late learning stage for stimuli with 30/70% reward probability). We found a significant correlation across participants (see Figure 5J), whereby higher eta scores were associated with fewer correct choices (r = .29, p < .001). There was a trend toward a drug effect, with subjects in the MA condition being more likely to make the correct choice after two instances of misleading feedback (PL: 0.82 (0.02) vs. MA: 0.84 (0.01); t(93) = –1.92, p = 0.06, d = 0.13). A two-way ANOVA revealed that this effect depended on baseline performance (Drug x Baseline Performance: F(1,92) = 4.27, p = 0.04). Post hoc t tests indicated higher correct choice probabilities under MA in low baseline performers (PL: 0.70 (0.02) vs. MA: 0.75 (0.02); t(46) = –2.41, p = 0.02, d = 0.30) but not in high baseline performers (PL: 0.92 (0.01) vs. MA: 0.92 (0.01); t(46) = 0.11, p = 0.91, d = 0.01).
Methamphetamine shifts learning rate dynamics closer to the optimum for low baseline performers
To better understand the computational mechanism through which MA improved performance in low baseline performers, we first examined how performance in the task related to the model parameters from our fits. To do so, we regressed task performance onto an explanatory matrix containing model parameter estimates across all conditions (see Figure 6A). This analysis revealed that variability in several of the parameters was related to overall task performance, with the overall learning rate, feedback confirmation LR adjustments, and inverse temperature all positively predicting performance, and eta and the play bias term negatively predicting it.
While each of these parameters explained unique variance in overall performance levels, only the parameter controlling dynamic adjustments of learning rate according to recent prediction errors, eta, was affected by our pharmacological manipulation (Figure 6B). In particular, eta was reduced in the MA condition, specifically in the low baseline group, albeit to an extent that differed across individuals (Figure 6C). To better understand how changes in eta might have affected overall performance, we conducted a set of simulations using the parameters best fit to human subjects, except that we equipped the model with a range of randomly chosen eta values, to examine how altering that parameter might affect performance. The results revealed that simulated agents with low to intermediate levels of eta achieved the best task performance, with models equipped with the highest etas performing particularly poorly (Figure 6D). To illustrate how this relationship between eta and performance could have driven improved performance for some participants under the methamphetamine condition, we highlight four participants with low-to-moderate eta values under methamphetamine who differ dramatically in their eta values in the placebo condition (Figure 6D, inset). Note that the participants who show the largest decreases in eta under methamphetamine, resulting from the highest placebo levels of eta, would be expected to show the largest improvements in performance. To test whether these simulations correspond to actual performance differences across conditions, we calculated the predicted improvement for each participant based on their eta in each condition using the function in Figure 6D. We found that actual performance differences were positively correlated with the predicted ones (Figure 6E), indicating that the individuals who showed the greatest task benefit from methamphetamine were those who underwent the most advantageous adjustments of eta in response to it. This result was specific to eta: taking a similar approach to explain conditional performance differences in terms of the other model parameters, including those that were quite strongly related to performance (Figure 6A), yielded null results (all p > .1; see Supplementary Figure S3). It is noteworthy that low baseline performers tended to have particularly high values of eta under the baseline condition (low baseline performers: 0.33 (0.02) vs. high baseline performers: 0.25 (0.01); t(46) = 2.59, p = 0.01, d = 0.53), explaining why these individuals saw the largest improvements under the methamphetamine condition. Taken together, these results suggest that MA alters performance by changing the degree to which learning rates are adjusted according to recent prediction errors (eta), in particular by reducing the strength of such adjustments in low baseline performers, pushing them closer to task-specific optimal values.
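The logic of these simulations can be sketched as follows. This toy version fixes kappa and beta and uses a simplified reward schedule, whereas the reported simulations used each participant's full set of fitted parameters on the actual task; all numeric settings here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_agent(eta, kappa=0.6, beta=2.5, n_blocks=21):
    # Toy agent on a simplified gamble/pass task with reversals. The
    # Pearce-Hall-style learning rate (alpha = kappa * A) is driven by
    # eta; kappa and beta are fixed here for illustration only.
    Q, A, points = 0.0, 1.0, 0
    for _ in range(n_blocks):
        p_win = rng.choice([0.2, 0.3, 0.7, 0.8])       # reversal: new contingency
        for _ in range(int(rng.integers(30, 36))):
            p_play = 1.0 / (1.0 + np.exp(-beta * Q))   # softmax over play/pass
            play = rng.random() < p_play
            outcome = 1.0 if rng.random() < p_win else -1.0
            if play:
                points += 10 * outcome
            delta = outcome - Q                        # prediction error
            Q += kappa * A * delta                     # value update
            A = eta * abs(delta) + (1 - eta) * A       # associability update
    return points

# Average performance over a grid of eta values (illustrative).
etas = np.linspace(0.05, 0.95, 10)
perf = [np.mean([simulate_agent(e) for _ in range(50)]) for e in etas]
```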
While eta seemed to account for the differences in the effects of MA on performance in our low and high performance groups, it did not fully account for performance differences across the two groups (see Figure 1C and Figure 7A/B). When comparing other model parameters between low and high baseline performers across drug sessions, we found that high baseline performers displayed higher overall inverse temperatures (2.97 (0.05) vs. 2.11 (0.08); t(93) = 7.94, p < .001, d = 1.33). This suggests that high baseline performers translated stimulus values into actions more reliably, leading to better performance (as also indicated by the positive contribution of this parameter to overall performance in the GLM). Moreover, they tended to show a reduced play bias (–0.01 (0.01) vs. 0.04 (0.03); t(93) = –1.77, p = 0.08, d = 0.26) and increased intercepts in their learning rate term (–2.38 (0.36) vs. –6.48 (0.70); t(93) = 5.03, p < .001, d = 0.76). Both of these parameters have been associated with overall performance (see Figure 6A). Thus, the overall performance difference between high and low baseline performers can be attributed to differences in model parameters other than eta. However, as described in the previous paragraph, the differential effects of MA on performance in the two groups were driven by eta.
Control analyses
To control for the potentially confounding factor of session order (i.e., PL first vs. MA first), we repeated the two-way mixed ANOVAs that showed significant Drug x Baseline Performance Group interactions with session order as a between-subjects factor. Including session order did not alter the significance of the observed effects and did not interact with the effects of interest (all p > .24).
Discussion
To study learning dynamics, participants completed a reversal variant of an established probabilistic learning task (Fischer & Ullsperger, 2013; Jocham et al., 2014; Kirschner et al., 2022; Kirschner et al., 2023). Participants completed the task three times: in a drug-free baseline session, and once each after PL and oral MA (20 mg) administration. We observed a trend toward a drug effect on overall performance, with improved task performance (total points scored in the task) selectively in low baseline performers. Follow-up analyses revealed that MA performance benefits were mainly driven by significantly better choices (i.e., choosing the advantageous stimuli and avoiding disadvantageous stimuli) at later stages after reversals for less predictable reward contingencies. Modeling results suggest that MA helps performance by adaptively shifting the relative weighting of surprising outcomes based on their statistical context. Specifically, MA facilitated down-weighting of probabilistic errors in phases of less predictable reward contingencies. In other words, in low baseline performers the SNR between true reversals and misleading feedback was improved after the administration of MA. Our results advance the existing literature, which has to date largely overlooked baseline performance effects. Moreover, although existing literature has linked catecholamines to volatility-based learning rate adjustments (Cook et al., 2019), we show that these adjustments also extend to other context-dependent adjustments, such as the level of probabilistic noise. The key findings of this study are summarized in Figure 7.
Methamphetamine affects the relative weighting of reward prediction errors
A key finding of the current study is that MA affected the relative weighting of reward prediction errors. In our model, adjustments in learning rate are afforded by weighting the learning rate as a function of the absolute value of the previous prediction error (Li et al., 2011). This associability-gated learning mechanism is empirically well supported (Le Pelley, 2004) and facilitates decreasing learning rates in periods of stability and increasing learning rates in periods of change. MA was associated with lower weighting of prediction errors (quantified by lower eta parameters under MA). Our results represent an important next step in understanding the neurochemical underpinnings of learning rate adjustments.
Neuro-computational models suggest that catecholamines play a critical role in adjusting the degree to which we use new information. One class of models highlights the role of striatal dopaminergic prediction errors as a teaching signal in cortico–striatal circuits to learn task structure and rules (Badre & Frank, 2012; Collins & Frank, 2013; Collins & Frank, 2016; Lieder et al., 2018; Pasupathy & Miller, 2005; Schultz et al., 1997). The implication of such models is that learning the structure of a task results in appropriate adjustments in learning rates. Optimal learning in our task, with its high level of noise in reward probabilities combined with changing reward contingencies, required increased learning from surprising events during periods of change (reversals) and reduced learning from probabilistic errors. Thus, neither too little learning adjustment after surprising outcomes (low eta) nor too much (high eta) is beneficial in our task structure. Interestingly, MA appears to shift eta closer to the optimum. In terms of the neurobiological implementation of this effect, MA may prolong the impact of phasic dopamine signals, which in turn facilitates better learning of the task structure and learning rate adjustments (Cook et al., 2019; Marshall et al., 2016; Volkow et al., 2002). Our data, in broad strokes, are consistent with the idea that dopamine in the prefrontal cortex and basal ganglia is involved in modulating meta-control parameters that facilitate dynamic switching between complementary control modes (i.e., shielding goals from distracting information vs. shifting goals in response to significant changes in the environment) (Cools, 2008; Dreisbach et al., 2005; Floresco, 2013; Goschke, 2013; Goschke & Bolte, 2014; Goschke & Bolte, 2018). A key challenge in our task is differentiating real reward reversals from probabilistic misleading feedback, a clear instance of the shielding/shifting dilemma described in the meta-control literature. Our data suggest that MA might improve meta-control of when to shield and when to shift beliefs in low baseline performers.
Moreover, it is possible that MA’s effect on learning rate adjustments is driven by its influence on the noradrenaline system. Indeed, a line of research highlights the importance of the locus coeruleus/norepinephrine system in facilitating adaptive learning and structure learning (Razmi & Nassar, 2022; Silvetti et al., 2018; Yu et al., 2021). In particular, evidence from experimental studies, pharmacological manipulations, and lesion studies of the noradrenergic system suggests that noradrenaline is important for change detection (Muller et al., 2019; Nassar et al., 2012; Preuschoff et al., 2011; Set et al., 2014). Thus, the administration of MA may have increased participants’ synaptic noradrenaline levels and thereby increased their sensitivity to salient events indicating true change points in the task.
It should be noted that other neuromodulators, such as acetylcholine (Marshall et al., 2016; Yu & Dayan, 2005) and serotonin (Grossman et al., 2022; Iigaya et al., 2018), have also been associated with dynamic learning rate adjustment. Future studies should compare the effects of neuromodulator-specific drugs (for example, a dopaminergic, a noradrenergic, a cholinergic, and a serotonergic modulator) to make neuromodulator-specific claims (see, for example, Marshall et al., 2016). Taken together, it is likely that the MA effects on learning rate adjustments in our study are driven by multiple processes that perhaps also work in concert. Moreover, because we administered only a single pharmacological agent, our results could reflect general effects of neuromodulation.
Our results are in line with recent studies showing improved performance under methylphenidate (MPH) through learning that is more robust against misleading information. For example, Fallon et al. (2017) showed that MPH helped participants to ignore irrelevant information but impaired the ability to flexibly update items held in working memory. Another study showed that MPH improved performance by adaptively reducing the effective learning rate in participants with higher working memory capacity (Rostami Kandroodi et al., 2021). These studies highlight the complex effects of MPH on working memory and the role of working memory in reinforcement learning (Collins & Frank, 2012; Collins & Frank, 2018). It could be that the effect of MA on learning rate dynamics reflects a modulation of interactions between working memory and reinforcement learning strategies. However, it should be acknowledged that our task was not designed to parse out the specific contributions of the reinforcement learning system and working memory to performance.
Methamphetamine selectively boosts performance in participants with poor initial task performance
Another key finding of the current study is that the benefits of MA on performance depend on baseline task performance. Specifically, we found that MA selectively improved performance in participants who performed poorly in the baseline session. It is important to note that MA did not bring the performance of low baseline performers up to the level of high baseline performers. We speculate that high performers gained a good representation of the task structure during the orientation session, taking specific features of the task into account (change point probabilities, noise in the reward probabilities). This is reflected in a large signal-to-noise ratio between real reversals and misleading feedback. Because the high performers already performed the task at a near-optimal level, MA may not have been able to further enhance performance.
These results have several interesting implications. First, a novel aspect of our design is that, in contrast to most pharmacological studies, participants completed the task during a baseline session before they took part in the two drug sessions. Drug order and practice effects are typical nuisance regressors in pharmacological imaging research. Yet, although practice effects are well acknowledged in the broader neuromodulator and cognitive literature (Bartels et al., 2010; MacRae et al., 1988; Servan-Schreiber et al., 1998), our understanding of these effects is limited. One of the few studies that reported drug administration order effects showed that d-amphetamine (AMPH) driven increases in functional-MRI-based blood oxygen level-dependent (BOLD) signal variability (SDBOLD) and performance depended greatly on drug administration order (Garrett et al., 2015). In that study, only older subjects who received AMPH first improved in performance and SDBOLD. Based on research in rats demonstrating that dopamine release increases linearly with reward-based lever press practice (Owesson-White et al., 2008), the authors speculated that practice may have shifted participants along an inverted-U-shaped dopamine performance curve (Cools & D’Esposito, 2011) by increasing baseline dopamine release (Garrett et al., 2015). Interestingly, we did not see a modulation of the MA effects by drug session order (PL first vs. MA first). Thus, the inclusion of an orientation session might be a good strategy to control for practice and drug order effects.
Our results also illustrate the large interindividual variability in MA effects. Recently, a large pharmacological fMRI/PET study (n=100) presented strong evidence that interindividual differences in striatal dopamine synthesis capacity explain variability in the effects of methylphenidate on reversal learning (van den Bosch et al., 2022). The authors demonstrated that methylphenidate improved reversal learning performance to a greater degree in participants with higher dopamine synthesis capacity, thus establishing the baseline-dependency principle for methylphenidate. These results are in line with previous research showing that methylphenidate improved reversal learning to a greater degree in participants with higher baseline working memory capacity, an index that is commonly used as an indirect proxy of dopamine synthesis capacity (Rostami Kandroodi et al., 2021; van der Schaaf et al., 2013; van der Schaaf et al., 2014). In the current study, we did not collect working memory capacity data. However, our finding that initial task performance strongly modulated the effect of MA fits the pattern of results showing that individual baseline differences strongly influence drug effects and thus should be considered in pharmacological studies (Cools & D’Esposito, 2011; Durstewitz & Seamans, 2008; van den Bosch et al., 2022). Indeed, there is evidence from the broader literature on the effects of psychostimulants on cognitive performance suggesting that stimulants improve performance only in low performers (Ilieva et al., 2013). Consistent with this, there is evidence in rats that poor baseline performance is associated with a greater response to amphetamine and improved performance in a signal detection task (Turner et al., 2017).
Conclusion
The current data provide evidence that, relative to placebo, methamphetamine facilitates the ability to dynamically adjust learning from prediction errors. This effect was more pronounced in participants who performed poorly at baseline. These results advance the existing literature by presenting evidence for a causal link between catecholaminergic modulation and learning flexibility and further highlight a baseline-dependency principle for catecholaminergic modulation.
Materials and methods
Design
The results presented here were obtained from the first two sessions of a larger four-session study (clinicaltrials.gov ID number NCT04642820). During the two 4-h laboratory sessions, healthy adults ingested capsules containing methamphetamine (20 mg; MA) or placebo (PL), in counterbalanced order under double-blind conditions. One hour after ingesting the capsule, they completed the 30-min reinforcement reversal learning task. The primary comparisons concerned acquisition and reversal learning parameters of reinforcement learning after MA vs. PL. Secondary measures included subjective and cardiovascular responses to the drug.
Subjects
Healthy men and women aged 18-35 years were recruited with flyers and online advertisements. Initial eligibility was ascertained in a telephone interview (age, current drug use, medical conditions), and appropriate candidates attended an in-person interview with a physical examination, EKG, and a structured clinical psychiatric interview (First et al., 2015). Inclusion criteria were a high school education, fluency in English, a body mass index between 19 and 26, and good physical and mental health. Exclusion criteria were serious psychiatric disorders (e.g., psychosis, severe PTSD, depression, history of Substance Use Disorder), any regular prescription medication, history of cardiac disease, high blood pressure, consuming >4 alcoholic or caffeinated beverages a day, or working night shifts. A total of 113 healthy young adults took part in the study. We excluded four subjects because of excessive missed trials in at least one session; Grubbs’ test for outlier detection with a one-sided alpha of 0.001 identified a cut-off of > 40 missed trials.
Orientation session
Participants attended an initial orientation session to provide informed consent and to complete personality questionnaires. They were told that the purpose of the study was to investigate the effects of psychoactive drugs on mood, brain, and behavior. To reduce expectancies, they were told that they might receive a placebo, stimulant, or sedative/tranquilizer. They agreed not to use any drugs except for their normal amounts of caffeine for 24 hours before and 6 hours following each session. Women who were not on oral contraceptives were tested only during the follicular phase (1-12 days from menstruation) because responses to stimulant drugs are dampened during the luteal phase of the cycle (White et al., 2002). Most participants (N=97 out of 113) completed the reinforcement learning task during the orientation session as a baseline measurement; this measure was added after the study began. Participants who did not complete the baseline measurement were omitted from the analyses presented in the main text. We also ran the key analyses on the full sample (n=109), which included participants who completed the task only on the drug sessions. When controlling for session order and session number (two vs. three sessions), we found no drug effect on overall performance and learning. Yet, eta was also reduced under MA in the full sample, which likewise resulted in reduced variability in the learning rate (see supplementary results for more details).
Drug sessions
The two drug sessions were conducted in a comfortable laboratory environment, from 9 am to 1 pm, at least 72 hours apart. Upon arrival, participants provided breath and urine samples to test for recent alcohol or drug use and pregnancy (Alcosensor III, Intoximeters; CLIAwaived Inc., Carlsbad, CA; AimStickPBD, hCG professional, Craig Medical Distribution). Positive tests led to rescheduling or dismissal from the study. After drug testing, subjects completed baseline mood measures, and heart rate and blood pressure were measured. At 9:30 am they ingested capsules (PL or MA 20 mg, in color-coded capsules) under double-blind conditions. Oral MA (Desoxyn, 5 mg per tablet) was placed in opaque size 00 capsules with dextrose filler. PL capsules contained only dextrose. Subjects completed the reinforcement learning task 60 minutes after capsule ingestion. Drug effects questionnaires were obtained at multiple intervals during the session. Participants also completed four other cognitive tasks not reported here. They were tested individually and were permitted to relax, read, or watch neutral movies when they were not completing study measures.
Dependent measures
Reinforcement Learning Task
Participants performed a reversal variant of an established probabilistic learning task (Fischer & Ullsperger, 2013; Jocham et al., 2014; Kirschner et al., 2022; Kirschner et al., 2023). On each trial, participants were presented with one of three different stimuli and decided either to gamble or to avoid gambling on that stimulus, with the goal of maximizing the final reward (see Figure 1A). A gamble resulted in winning or losing points, depending on the reward contingencies associated with the particular stimulus. If participants decided not to gamble, they avoided any consequences but could still observe what would have happened had they gambled by receiving counterfactual feedback. The three stimuli – white line drawings of animals on a black background – were presented in a pseudo-random series that was the same for all participants. Reward contingencies for each stimulus could be 20%, 30%, 50%, 70%, or 80% and stayed constant within one block of 30-35 trials. After every block, the reward contingency changed without notice. The experiment consisted of 7 blocks per stimulus, leading to 18 reversals and 714 trials in total. Presentation 22.0 (Neurobehavioral Systems) was used for task presentation. Every trial began with a central fixation cross, presented for a variable time between 300 and 500 ms. After fixation, the stimulus was presented together with the two choice alternatives (a green checkmark for choosing and a red no-go sign for avoiding; sides counterbalanced across subjects) for a maximum of 2000 ms or until a response was given. If participants failed to respond in time, a question mark was shown and the trial was repeated at the end of the block. When a response was made, the stimulus stayed on screen and feedback was given after 500 ms. The outcome was then presented for 750 ms, depending on the subject’s choice. Choosing to gamble led to either a green smiley face and a reward of 10 points or a red frowning face and a loss of 10 points, according to the reward probability of the stimulus. An avoided gamble had no monetary consequences: the outcome was always 0. Counterfactual/fictive outcomes, indicating what would have happened had the participant chosen to gamble, were shown on screen using the same smileys in a paler color, but the reward or punishment was crossed out to indicate that the outcome was fictive.
Drug Effects Questionnaire (DEQ)
The DEQ (Morean et al., 2013) consists of five questions in total. Here we report only ratings on the “Do you feel any drug effect?” question, which was rated on a 100 mm visual analog scale. Participants completed the DEQ at regular intervals throughout the session.
Reinforcement learning model fitting
We fit variants of reinforcement learning models to participants’ choice behavior using a constrained search algorithm (fmincon in MATLAB 2021b), which computed a set of parameters that maximized the total log posterior probability of choice behavior. The base model (M1) was a standard Q-learning model with three parameters: (1) a temperature parameter of the softmax function used to convert trial expected values to action probabilities, (2) a play bias term that indicates a tendency to attribute higher value to gambling behavior, and (3) an intercept term for the effect of learning rate on choice behavior. On each trial, the expected value (Qt) of a stimulus (Xt) was updated according to the following formulas:

δt = Rt − Qt(Xt)

Qt+1(Xt) = Qt(Xt) + α · δt

Q values represent the expected value of an action at trial t, α reflects the learning rate, and δt represents the prediction error, with Rt being the reward magnitude of that trial. On each trial, this value term was transformed into a “biased” value term (VB(Xt) = Bplay + Qt(Xt), where Bplay is the play bias term) and converted into action probabilities (P(play|VB(Xt)) and P(pass|VB(Xt))) using a softmax function. This was our base model (M1).
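A minimal sketch of the base model's choice and update rules, assuming the pass option carries a value of zero (our reading of the task description, since passing never wins or loses points):

```python
import numpy as np

def play_probability(Q, play_bias, beta):
    # Biased value VB = B_play + Q is converted to P(play) via a softmax
    # over {play, pass}; with the pass option fixed at value 0, a
    # two-option softmax reduces to a logistic function of VB.
    return 1.0 / (1.0 + np.exp(-beta * (play_bias + Q)))

def q_update(Q, reward, alpha):
    # Delta-rule update: delta = R - Q; Q <- Q + alpha * delta.
    delta = reward - Q
    return Q + alpha * delta, delta
```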
Next, we fit further reinforcement learning models by complementing the base model with additional parameters controlling trial-by-trial modulations of the learning rate. Note that our base model treats the learning rate for value updates as a constant. However, previous studies have shown that people are able to adjust their learning rate according to the volatility of the environment (Behrens et al., 2007; Nassar et al., 2010). In the Pearce-Hall hybrid model, adjustments in learning rate are afforded by weighting the learning rate as a function of the absolute value of the previous prediction error (Li et al., 2011). This associability-gated learning mechanism is empirically well supported (Le Pelley, 2004) and facilitates decreasing learning rates in periods of stability and increasing learning rates in periods of change. Previous work has shown that the hybrid model can approximate normative learning rate adjustments (Li et al., 2011; Piray et al., 2019). In this hybrid model, the learning rate is updated as follows:

At+1 = η · |δt| + (1 − η) · At

αt+1 = κ · At+1

Here, κ scales the learning rate (αt) and η determines the step size for updating associability (At) as a function of the absolute RPE (|δt|). On each trial, the learning rate (αt) thus depends on the absolute RPE from the past trial. Note that the initial learning rate is defined by κ, whereby κ is determined by a logistic function of a weighted predictor matrix that could include an intercept term (Pearce-Hall hybrid model, M2) and task variables that might additionally affect trial-by-trial learning rate adjustments. In the Pearce-Hall hybrid feedback confirmatory model (M3), the predictor matrix included an intercept term and feedback confirmation information (i.e., whether the feedback on a given trial was confirmatory (factual wins and counterfactual losses) or disconfirmatory (factual losses and counterfactual wins)). Finally, in the Pearce-Hall hybrid feedback confirmatory and modality model (M4), the predictor matrix included an intercept term, feedback confirmation information, and feedback modality (factual vs. counterfactual feedback) information. The best-fitting model was determined by computing the Bayesian Information Criterion (BIC) for each model (Schwarz, 1978). Moreover, we computed protected exceedance probabilities, which give the probability that one model is more likely than any other model in the model space (Rigoux et al., 2014). To compare participant behavior to model-predicted behavior, we simulated choice behavior using the best-fitting model (Pearce-Hall hybrid feedback confirmatory model; see Figure 3A). For each trial, we used the expected trial value (Qt(Xt)) computed above and the parameter estimates of the temperature variable as inputs to a softmax function to generate choices. Validation of model selection and parameter recovery is reported in the supplementary materials (Figure S1).
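A minimal sketch of the hybrid learning rate computation; the predictor coding for κ is an assumption for illustration:

```python
import numpy as np

def kappa_from_predictors(weights, predictors):
    # kappa is a logistic function of a weighted predictor matrix; a
    # predictor vector such as [1, is_confirmatory] would correspond to
    # model M3 (the exact coding of predictors is an assumption here).
    return 1.0 / (1.0 + np.exp(-np.dot(weights, predictors)))

def hybrid_learning_rate(A, abs_delta, eta, kappa):
    # Associability update A <- eta*|delta| + (1 - eta)*A, with the
    # effective learning rate alpha = kappa * A (Li et al., 2011).
    A_new = eta * abs_delta + (1.0 - eta) * A
    return kappa * A_new, A_new
```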
Data analysis
We analyzed drug effects on behavioral performance and model parameters using paired t tests. Given the effects of initial performance and practice in pharmacological imaging research (Garrett et al., 2015), we additionally stratified MA effects by task performance in the orientation session using a median split. These data were analyzed using a two-way repeated-measures ANOVA with the factors Drug (two levels) and Baseline Performance (two levels). Paired t tests were used as post hoc tests. Moreover, we investigated reversal learning by calculating learning curves. Post hoc, we observed that drug effects on learning became apparent only in the second phase of learning. We therefore used the Bai-Perron multiple break point test (Bai & Perron, 2003) to identify the number and location of structural breaks in the learning curves. In broad strokes, the test detects whether breaks in a curve exist, and if so, how many there are, based on the regression slope in predefined segments (here, we set the segment length to 5 trials). In our case, the test could reveal between 0 and 5 breaks (number of trials / segment length − 1). We ran this test using data from all subjects and all sessions. The test detected one break that cut the learning curves into two segments (see Results). We then calculated an index of learning performance after reversals by averaging the number of correct choices over the second learning phase. This index was then subjected to a two-way repeated-measures ANOVA with the factors Drug (two levels) and Baseline Performance (two levels).
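Because the full Bai & Perron (2003) procedure is an econometric test with its own machinery, the sketch below is only a minimal single-break analogue that illustrates the logic of comparing segment-wise regression fits on a 5-trial grid; it is not the implementation used for the reported analysis.

```python
import numpy as np

def find_single_break(curve, seg_len=5):
    # Candidate breaks lie on a grid of seg_len trials; a two-segment
    # regression is accepted over a single straight-line fit only if it
    # wins under a BIC-style complexity penalty.
    t, n = np.arange(len(curve)), len(curve)

    def sse(x, y):
        coef = np.polyfit(x, y, 1)  # straight-line fit to one segment
        return np.sum((y - np.polyval(coef, x)) ** 2)

    def bic(s, k):
        return n * np.log(s / n) + k * np.log(n)

    candidates = range(seg_len, n - seg_len + 1, seg_len)
    best = min(candidates, key=lambda b: sse(t[:b], curve[:b]) + sse(t[b:], curve[b:]))
    split = sse(t[:best], curve[:best]) + sse(t[best:], curve[best:])
    # Return the break location only if the penalized split fit wins.
    return best if bic(split, 5) < bic(sse(t, curve), 2) else None
```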
Data Availability Statement
All raw data and analysis scripts can be accessed at the Open Science Framework data repository: [insert after acceptance].
Acknowledgements
We thank all our participants who took part in this research for the generosity of their time and commitment. This research was supported by the National Institute on Drug Abuse DA02812. HM was supported by the National Institutes of Health T32 GM07019. MU was supported by the Deutsche Forschungsgemeinschaft, Grant/Award Number: SFB 1436; and the European Research Council, Grant/Award Number: 101018805.
Competing interests
HdW is on the Board of Directors of PharmAla Biotech, and on scientific advisory committees of Gilgamesh Pharmaceuticals and MIND Foundation. These activities are unrelated to the present study. The other authors report no competing interests.
Supplementary Information
References
- 1. Catecholamine influences on prefrontal cortical function: relevance to treatment of attention deficit/hyperactivity disorder and related disorders. Pharmacol Biochem Behav 99:211–216. https://doi.org/10.1016/j.pbb.2011.01.020
- 2. Do college students improve their grades by using prescription stimulants nonmedically? Addict Behav 65:245–249. https://doi.org/10.1016/j.addbeh.2016.07.016
- 3. Mechanisms of hierarchical reinforcement learning in cortico-striatal circuits 2: evidence from fMRI. Cereb Cortex 22:527–536. https://doi.org/10.1093/cercor/bhr117
- 4. Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18:1–22. https://doi.org/10.1002/jae.659
- 5. Practice effects in healthy adults: a longitudinal study on frequent repetitive cognitive testing. BMC Neurosci 11. https://doi.org/10.1186/1471-2202-11-118
- 6. Learning the value of information in an uncertain world. Nat Neurosci 10:1214–1221. https://doi.org/10.1038/nn1954
- 7. Not so smart? “Smart” drugs increase the level but decrease the quality of cognitive effort. Sci Adv 9. https://doi.org/10.1126/sciadv.add4165
- 8. Context, cortex, and dopamine: a connectionist approach to behavior and biology in schizophrenia. Psychol Rev 99:45–77. https://doi.org/10.1037/0033-295x.99.1.45
- 9. Cognitive control over learning: creating, clustering, and generalizing task-set structure. Psychol Rev 120:190–229. https://doi.org/10.1037/a0030852
- 10. How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. European Journal of Neuroscience 35:1024–1035. https://doi.org/10.1111/j.1460-9568.2011.07980.x
- 11. Neural signature of hierarchically structured expectations predicts clustering and transfer of rule sets in reinforcement learning. Cognition 152:160–169. https://doi.org/10.1016/j.cognition.2016.04.002
- 12. Within- and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory. Proc Natl Acad Sci U S A 115:2502–2507. https://doi.org/10.1073/pnas.1720963115
- 13. Catecholaminergic modulation of meta-learning. eLife 8. https://doi.org/10.7554/eLife.51439
- 14. Role of Dopamine in the Motivational and Cognitive Control of Behavior. The Neuroscientist 14:381–395. https://doi.org/10.1177/1073858408317009
- 15. Inverted-U-shaped dopamine actions on human working memory and cognitive control. Biol Psychiatry 69:e113–125. https://doi.org/10.1016/j.biopsych.2011.03.028
- 16. Adaptive Prediction Error Coding in the Human Midbrain and Striatum Facilitates Behavioral Adaptation and Learning Efficiency. Neuron 90:1127–1138. https://doi.org/10.1016/j.neuron.2016.04.019
- 17. Methylphenidate has differential effects on blood oxygenation level-dependent signal related to cognitive subprocesses of reversal learning. J Neurosci 28:5976–5982. https://doi.org/10.1523/jneurosci.1153-08.2008
- 18. Metalearning and neuromodulation. Neural Netw 15:495–506. https://doi.org/10.1016/s0893-6080(02)00044-8
- 19. Dopamine and Cognitive Control: The Influence of Spontaneous Eyeblink Rate and Dopamine Gene Polymorphisms on Perseveration and Distractibility. Behavioral Neuroscience 119:483–490. https://doi.org/10.1037/0735-7044.119.2.483
- 20. The dual-state theory of prefrontal cortex dopamine function with relevance to catechol-o-methyltransferase genotypes and schizophrenia. Biol Psychiatry 64:739–749. https://doi.org/10.1016/j.biopsych.2008.05.015
- 21. In the Zone or Zoning Out? Tracking Behavioral and Neural Fluctuations During Sustained Attention. Cerebral Cortex 23:2712–2723. https://doi.org/10.1093/cercor/bhs261
- 22. The Neurocognitive Cost of Enhancing Cognition with Methylphenidate: Improved Distractor Resistance but Impaired Updating. Journal of Cognitive Neuroscience 29:652–663. https://doi.org/10.1162/jocn_a_01065
- 23. Structured clinical interview for DSM-5, Research version (SCID-5 for DSM-5, research version; SCID-5-RV). Arlington, VA: American Psychiatric Association.
- 24. Real and fictive outcomes are processed differently but converge on a common adaptive mechanism. Neuron 79:1243–1255. https://doi.org/10.1016/j.neuron.2013.07.006
- 25. Prefrontal dopamine and behavioral flexibility: shifting from an “inverted-U” toward a family of functions [Review]. Frontiers in Neuroscience 7. https://doi.org/10.3389/fnins.2013.00062
- 26. Amphetamine modulates brain signal variability and working memory in younger and older adults. Proc Natl Acad Sci U S A 112:7593–7598. https://doi.org/10.1073/pnas.1504090112
- 27. Volition in Action: Intentions, Control Dilemmas, and the Dynamic Regulation of Cognitive Control. In: Action Science: Foundations of an Emerging Discipline. The MIT Press. https://doi.org/10.7551/mitpress/9780262018555.003.0024
- 28. Emotional modulation of control dilemmas: the role of positive affect, reward, and dopamine in cognitive stability and flexibility. Neuropsychologia 62:403–423. https://doi.org/10.1016/j.neuropsychologia.2014.07.015
- 29. A dynamic perspective on intention, conflict, and volition: Adaptive regulation and emotional modulation of cognitive control dilemmas. In: Why people do the things they do: Building on Julius Kuhl’s contributions to the psychology of motivation and volition, 111–129. https://doi.org/10.1027/00540-000
- 30. Serotonin neurons modulate learning rate through uncertainty. Current Biology 32:586–599. https://doi.org/10.1016/j.cub.2021.12.006
- 31. Cognitive enhancement by drugs in health and disease. Trends Cogn Sci 15:28–36. https://doi.org/10.1016/j.tics.2010.11.002
- 32. An effect of serotonergic stimulation on learning rates for rewards apparent after long intertrial intervals. Nature Communications 9. https://doi.org/10.1038/s41467-018-04840-2
- 33. Objective and subjective cognitive enhancing effects of mixed amphetamine salts in healthy people. Neuropharmacology 64:496–505. https://doi.org/10.1016/j.neuropharm.2012.07.021
- 34. Differential modulation of reinforcement learning by D2 dopamine and NMDA glutamate receptor antagonism. J Neurosci 34:13151–13162. https://doi.org/10.1523/jneurosci.0757-14.2014
- 35. Electrophysiological underpinnings of response variability in the Go/NoGo task. International Journal of Psychophysiology 134:159–167. https://doi.org/10.1016/j.ijpsycho.2018.09.008
- 36. Feedback-related EEG dynamics separately reflect decision parameters, biases, and future choices. Neuroimage 259. https://doi.org/10.1016/j.neuroimage.2022.119437
- 37. Transdiagnostic inflexible learning dynamics explain deficits in depression and schizophrenia. Brain 147:201–214. https://doi.org/10.1093/brain/awad362
- 38. Value, search, persistence and model updating in anterior cingulate cortex. Nat Neurosci 19:1280–1285. https://doi.org/10.1038/nn.4382
- 39. The Role of Associative History in Models of Associative Learning: A Selective Review and a Hybrid Model. The Quarterly Journal of Experimental Psychology Section B 57:193–243. https://doi.org/10.1080/02724990344000141
- 40. Differential roles of human striatum and amygdala in associative learning. Nat Neurosci 14:1250–1252. https://doi.org/10.1038/nn.2904
- 41. Rational metareasoning and the plasticity of cognitive control. PLoS Comput Biol 14. https://doi.org/10.1371/journal.pcbi.1006043
- 42. Reaction time and nigrostriatal dopamine function: the effects of age and practice. Brain Res 451:139–146. https://doi.org/10.1016/0006-8993(88)90758-5
- 43. Pharmacological Fingerprints of Contextual Uncertainty. PLoS Biol 14. https://doi.org/10.1371/journal.pbio.1002575
- 44. Simultaneous representation of a spectrum of dynamically changing value estimates during decision making. Nature Communications 8. https://doi.org/10.1038/s41467-017-02169-w
- 45. The drug effects questionnaire: psychometric support across three drug types. Psychopharmacology (Berl) 227:177–192. https://doi.org/10.1007/s00213-012-2954-z
- 46. Control of entropy in neural models of environmental state. eLife 8. https://doi.org/10.7554/eLife.39404
- 47. Statistical context dictates the relationship between feedback-related EEG signals and learning. eLife 8. https://doi.org/10.7554/eLife.46975
- 48. Age differences in learning emerge from an insufficient representation of uncertainty in older adults. Nat Commun 7. https://doi.org/10.1038/ncomms11609
- 49. Rational regulation of learning dynamics by pupil-linked arousal systems. Nat Neurosci 15:1040–1046. https://doi.org/10.1038/nn.3130
- 50. All or nothing belief updating in patients with schizophrenia reduces precision and flexibility of beliefs. Brain 144:1013–1029. https://doi.org/10.1093/brain/awaa453
- 51. An approximately Bayesian delta-rule model explains the dynamics of belief updating in a changing environment. J Neurosci 30:12366–12378. https://doi.org/10.1523/JNEUROSCI.0822-10.2010
- 52. Dynamic changes in accumbens dopamine correlate with learning during intracranial self-stimulation. Proc Natl Acad Sci U S A 105:11957–11962. https://doi.org/10.1073/pnas.0803896105
- 53. Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 433:873–876. https://doi.org/10.1038/nature03287
- 54. Emotionally Aversive Cues Suppress Neural Systems Underlying Optimal Learning in Socially Anxious Individuals. J Neurosci 39:1445–1456. https://doi.org/10.1523/jneurosci.1394-18.2018
- 55. Pupil Dilation Signals Surprise: Evidence for Noradrenaline’s Role in Decision Making. Front Neurosci 5. https://doi.org/10.3389/fnins.2011.00115
- 56. Catecholamine dysfunction in attention-deficit/hyperactivity disorder: an update. J Clin Psychopharmacol 28:S39–45. https://doi.org/10.1097/JCP.0b013e318174f92a
- 57. Adaptive Learning through Temporal Dynamics of State Representation. J Neurosci 42:2524–2538. https://doi.org/10.1523/jneurosci.0387-21.2022
- 58. Modafinil and methylphenidate for neuroenhancement in healthy individuals: A systematic review. Pharmacol Res 62:187–206. https://doi.org/10.1016/j.phrs.2010.04.002
- 59. Bayesian model selection for group studies – revisited. Neuroimage 84:971–985. https://doi.org/10.1016/j.neuroimage.2013.08.065
- 60. Effects of methylphenidate on reinforcement learning depend on working memory capacity. Psychopharmacology (Berl) 238:3569–3584. https://doi.org/10.1007/s00213-021-05974-w
- 61. A neural substrate of prediction and reward. Science 275:1593–1599. https://doi.org/10.1126/science.275.5306.1593
- 62. Estimating the Dimension of a Model. The Annals of Statistics 6:461–464.
- 63. Meta-learning in reinforcement learning. Neural Netw 16:5–9. https://doi.org/10.1016/s0893-6080(02)00228-9
- 64. Dopamine and the mechanisms of cognition: Part II. D-amphetamine effects in human subjects performing a selective attention task. Biol Psychiatry 43:723–729. https://doi.org/10.1016/s0006-3223(97)00449-6
- 65. Dissociable contribution of prefrontal and striatal dopaminergic genes to learning in economic games. Proc Natl Acad Sci U S A 111:9615–9620. https://doi.org/10.1073/pnas.1316259111
- 66. Dorsal anterior cingulate-brainstem ensemble as a reinforcement meta-learner. PLoS Comput Biol 14. https://doi.org/10.1371/journal.pcbi.1006370
- 67. Are prescription stimulants “smart pills”? The epidemiology and cognitive neuroscience of prescription stimulant use by normal healthy individuals. Psychol Bull 137:717–741. https://doi.org/10.1037/a0023825
- 68. Bayesian model selection for group studies. Neuroimage 46:1004–1017. https://doi.org/10.1016/j.neuroimage.2009.03.025
- 69. Baseline-dependent effects of amphetamine on attention are associated with striatal dopamine metabolism. Sci Rep 7. https://doi.org/10.1038/s41598-017-00437-9
- 70. Striatal dopamine dissociates methylphenidate effects on value-based versus surprise-based reversal learning. Nat Commun 13. https://doi.org/10.1038/s41467-022-32679-1
- 71. Working memory capacity predicts effects of methylphenidate on reversal learning. Neuropsychopharmacology 38:2011–2018. https://doi.org/10.1038/npp.2013.100
- 72. Establishing the dopamine dependency of human striatal signals during reward and punishment reversal learning. Cereb Cortex 24:633–642. https://doi.org/10.1093/cercor/bhs344
- 73. Relationship between blockade of dopamine transporters by oral methylphenidate and the increases in extracellular dopamine: therapeutic implications. Synapse 43:181–187. https://doi.org/10.1002/syn.10038
- 74. Dopamine promotes cognitive effort by biasing the benefits versus costs of cognitive work. Science 367:1362–1366. https://doi.org/10.1126/science.aaz5891
- 75. Differential subjective effects of D-amphetamine by gender, hormone levels and menstrual cycle phase. Pharmacol Biochem Behav 73:729–741.
- 76. A mixture of delta-rules approximation to bayesian inference in change-point problems. PLoS Comput Biol 9. https://doi.org/10.1371/journal.pcbi.1003150
- 77. Uncertainty, neuromodulation, and attention. Neuron 46:681–692. https://doi.org/10.1016/j.neuron.2005.04.026
- 78. Adaptive learning is structure learning in time. Neurosci Biobehav Rev 128:270–281. https://doi.org/10.1016/j.neubiorev.2021.06.024
Copyright
© 2024, Kirschner et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.