It has been difficult to link synaptic modification to overt behavioral changes. Rodent models of DYT1 dystonia, a motor disorder caused by a single gene mutation, demonstrate increased long-term potentiation and decreased long-term depression in corticostriatal synapses. Computationally, such asymmetric learning predicts risk taking in probabilistic tasks. Here we demonstrate abnormal risk taking in DYT1 dystonia patients, which is correlated with disease severity, thereby supporting a role for striatal plasticity in shaping choice behavior in humans.
We learn to choose better options and avoid worse ones through trial and error, but exactly how this happens is still unclear. One idea is that we learn 'values' for options: whenever we choose an option and get more reward than originally expected (for example, if an unappetizing-looking food turns out to be very tasty), the value of that option increases. Likewise, if we get less reward than expected, the chosen option’s value decreases.
This learning process is hypothesized to work via the strengthening and weakening of connections between neurons in two parts of the brain: the cortex and the striatum. In this model, the activity of the neurons in the cortex represents the options, and the value of these options is represented by the activity of neurons in the striatum. Strengthening the connections is thought to increase the value of the stimulus, but this theory has been difficult to test.
In humans, a single genetic mutation causes a movement disorder called DYT1 dystonia, in which muscles contract involuntarily. In rodents, the same mutation causes the connections between the neurons in the cortex and the striatum to become too strong. If the theory about value learning is true, this strengthening should affect the decisions of patients that have DYT1 dystonia.
Arkadir et al. got healthy people and people with DYT1 dystonia to play a game where they had to choose between a 'sure' option and a 'risky' option. Picking the sure option guaranteed the player would receive a small amount of money, whereas the risky option gave either double this amount or nothing. The theory predicts that the double rewards should cause the patients to learn abnormally high values, which would lure them into making risky choices. Indeed, Arkadir et al. found that players with DYT1 dystonia were more likely to choose the risky option, with the people who had more severe symptoms of dystonia having a greater tendency towards taking risks.
Arkadir et al. showed that these results correspond with a model that suggests that people with DYT1 dystonia learn excessively from unexpected wins but show weakened learning after losses, causing them to over-estimate the value of risky choices. This imbalance mirrors the previous results that showed an inappropriate strengthening of the connections between neurons in rodents, and so suggests that similar changes occur in the brains of humans. Thus it appears that the changes in the strength of the connections between neurons translate into changes in behavior.
This pattern of results might also mean that the movement problems seen in people with DYT1 dystonia arise because they over-learn movements that previously led to a desired outcome and cannot sufficiently suppress movements that are no longer useful. Testing this idea will require further experiments.
DYT1 dystonia is a rare, dominantly inherited form of dystonia, caused almost exclusively by a specific deletion of three base pairs in the TOR1A gene (Ozelius et al., 1997). Clinically, DYT1 dystonia is characterized by variable severity of sustained or intermittent muscle contractions that produce abnormal movements. DYT1 dystonia patients have normal intelligence, and post-mortem examination of their brains does not reveal obvious abnormalities or evidence of neurodegeneration (Paudel et al., 2012). Nevertheless, research in two different rodent models of DYT1 dystonia points to the existence of a fundamental deficit in synaptic plasticity. Specifically, brain slices of transgenic rodents expressing the human mutant TOR1A gene show abnormally strong long-term potentiation (LTP; Martella et al., 2009) and weak, or even absent, long-term depression (LTD; Grundmann et al., 2012; Martella et al., 2009) in corticostriatal synapses, as compared to wild-type controls.
Reinforcement learning theory (Sutton and Barto, 1998) hypothesizes that dopamine-dependent synaptic plasticity in corticostriatal networks is the neuronal substrate for learning through trial and error (Barnes et al., 2005; Barto, 1995; Schultz et al., 1997). The core assumptions of this theory are that (1) dopamine release in the striatum signals errors in the prediction of reward, with dopamine levels increasing following successful actions (to signal a positive prediction error) and decreasing when actions fail to achieve the expected outcome (a negative prediction error), (2) fluctuations in dopamine modulate downstream plasticity in recently active corticostriatal synapses such that synapses responsible for positive prediction errors are strengthened through long-term potentiation (LTP), and those that led to disappointment are weakened through long-term depression (LTD) (Reynolds et al., 2001), and (3) the efficacy of corticostriatal transmission affects voluntary action selection.
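The error-correcting update at the core of these assumptions can be sketched in a few lines of Python (an illustration only; the learning-rate value is an assumption, not taken from the studies cited above):

```python
# Minimal sketch of prediction-error-driven value learning.
# The learning rate of 0.1 is an illustrative assumption.
def update_value(v_old, reward, learning_rate=0.1):
    """Move the value estimate toward the obtained reward in
    proportion to the prediction error (reward - expectation)."""
    prediction_error = reward - v_old  # the dopamine-like teaching signal
    return v_old + learning_rate * prediction_error

v = 0.0
for _ in range(200):  # repeated 5-cent payoffs
    v = update_value(v, 5.0)
print(round(v, 2))  # converges to the expected payoff: 5.0
```

Positive prediction errors (reward above expectation) raise the value, negative ones lower it, mirroring the hypothesized LTP/LTD modulation by dopamine.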
Dopamine’s role as a reinforcing signal for trial-and-error learning is supported by numerous findings (Pessiglione et al., 2006; Schultz et al., 1997; Steinberg et al., 2013), including in humans, where Parkinson’s disease serves as a human model for altered dopaminergic transmission (Frank et al., 2004). However, the contribution of (dopamine modulated) corticostriatal plasticity to shaping action has remained unconfirmed in the behaving organism, as it is not clear that the behavioral effects of altered dopamine signaling in Parkinson’s disease (and other conditions in which dopamine transmission is compromised) indeed stem from the role of dopamine in modulating plasticity. Towards this end, here we test whether DYT1 dystonia, where corticostriatal plasticity is suggested to be altered despite preserved dopaminergic signaling, leads to the behavioral effects predicted by reinforcement learning with imbalanced plasticity. In particular, our predictions stem from considering the effects of intact prediction errors on an altered plasticity mechanism that amplifies the effect of positive prediction errors (i.e., responds to positive prediction errors with more LTP than would otherwise occur in controls) and mutes the effects of negative prediction errors (that is, responds with weakened LTD as compared to controls).
We compared the behavior of DYT1 dystonia patients and healthy controls on an operant-learning paradigm with probabilistic rewards (Niv et al., 2012). Participants learned from trial and error to associate four different visual cues with monetary rewards (Figure 1a), optimizing their gain by selecting one of two cues in choice trials, and choosing the single available cue in forced trials. Three visual cues were associated with a payoff of 0¢, 5¢ and 10¢, respectively, while the fourth cue was associated with an unpredictable payoff of either 0¢ or 10¢ with equal probabilities (henceforth the ‘risky 0/10¢’ cue). Based on the findings in rodents with the DYT1 mutation, we predicted that dystonia patients would learn preferentially from positive prediction errors (putatively due to abnormally strong LTP) and to a much lesser extent from negative prediction errors (due to weak LTD) (Figure 1b). As a result, they should show a stronger tendency to choose the risky cue as compared to healthy controls.
We tested 13 patients with DYT1 dystonia (8 women, 5 men, age 20–47, mean 28.6 years, henceforth DYT) and 13 healthy controls (CTL; 8 women, 5 men, age 19–46, mean 28.8 years), matched on an individual basis for sex and age (Mann-Whitney U test for age differences, z = −0.59, df = 24, P = 0.55), all with at least 13 years of education. Patients had no previous neurosurgical interventions for dystonia (including deep brain stimulation) and were tested before their scheduled dose of medication when possible (see Materials and methods). The number of aborted trials was similarly low in both groups (DYT 2.3 ± 2.5, CTL 1.1 ± 1.2, Mann-Whitney z = −1.61, df = 24, P = 0.11) and reaction times were well below the 1.5 s response deadline (DYT 0.78 s ± 0.11, CTL 0.71 s ± 0.10, Mann-Whitney z = −1.49, df = 24, P = 0.14), confirming that motor symptoms of dystonia did not interfere with the minimal motor demands of the task.
Both groups quickly learned the task, and showed similarly high probabilities of choosing the best cue in trials in which a pair of sure cues (sure 0¢ vs. sure 5¢ or sure 5¢ vs. sure 10¢) appeared together (mean probability of correct choice: DYT 0.92 ± 0.08, CTL 0.93 ± 0.05, Mann-Whitney z = 0.08, df = 24, P = 0.94; Figure 2a), as well as in trials in which the risky cue appeared together with either the sure 0¢ or sure 10¢ cue (mean probability correct: DYT 0.84 ± 0.09, CTL 0.89 ± 0.04, Mann-Whitney z = −1.39, df = 24, P = 0.17; Figure 2b).
On trials in which the risky 0/10¢ cue appeared together with the equal-mean sure 5¢ cue, control participants showed risk-averse behavior, as is typically observed in such tasks (Kahneman and Tversky, 1979; Niv et al., 2012). In contrast, patients with DYT1 dystonia displayed significantly less risk aversion, choosing the risky stimulus more often than controls throughout the experiment (Figure 3a, Mann-Whitney one-sided test for each block separately, all z > 1.68, df = 24, P < 0.05; Friedman’s test for effect of group after correcting for the effect of time χ2 = 16.2, df = 1, P < 0.0001). Overall, the probability of choosing the risky cue was significantly higher among patients with dystonia than among healthy controls (Figure 3b, probability of choosing the risky cue over the sure cue DYT 0.44 ± 0.18, CTL 0.25 ± 0.20, Mann-Whitney z = 2.33, df = 24, P < 0.05).
To rule out the possibility that DYT1 patients were simply making choices randomly, causing their behavior to seem indifferent to risk, we divided all 0/10¢ versus 5¢ choice trials according to the outcome of the previous trial in which the risky 0/10¢ cue was chosen. As shown in Figure 3c (see also Figure 3—figure supplement 1), both groups chose the risky 0/10¢ cue significantly more often after a 10¢ ‘win’ than after a 0¢ ‘loss’ outcome (DYT P < 0.005, CTL P < 0.05, Wilcoxon signed-rank test), attesting to intact reinforcement learning in the DYT group (see Figure 3—figure supplement 2 for a reinforcement-learning simulation of the same result). If anything, DYT1 dystonia patients showed a greater difference between trials following a win and trials following a loss. We next tested for a correlation between risk-taking behavior and the clinical severity of dystonia, as rated on the day of the experiment (see Materials and methods). Patients with more severe dystonia were more risk taking in our task (Figure 3d, Pearson’s r = 0.62, df = 11, P < 0.05). Risky behavior was not significantly affected by sex (Figure 3—figure supplement 3) or by the patients’ regular medication regimen (Figure 3—figure supplement 4), and the relationship between risk taking and symptom severity held even when controlling for these factors (P < 0.05 for symptom severity when regressing risk taking on symptom severity, age and either of the two medications; with both medications included, symptom severity lost significance, likely because of the number of regressors relative to the small sample; age and medication did not reach significance in any of the regressions).
To test whether increased risk taking in DYT1 dystonia could be explained by asymmetry in the effects of positive and negative prediction errors on corticostriatal plasticity, we modeled participants’ choice data using an asymmetric reinforcement-learning model (see Materials and methods) in which the learning rate (η) is modulated by (1 + α) when learning from positive prediction errors and by (1 − α) when the prediction error is negative (also called a 'risk-sensitive' reinforcement-learning model; Mihatsch and Neuneier, 2002; Niv et al., 2012). Our model also included an inverse-temperature parameter (β) controlling the randomness of choices. This approach exploits fluctuations in each individual’s propensity for risk taking (see Figure 3a) as they update their policy based on the outcomes they experience, to recover the learning rate and learning asymmetry that provide the best fit to each participant’s observed behavior.
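To make the prediction concrete, a small simulation (a sketch under assumed parameter values, not the model fit reported here) shows how this asymmetry inflates the learned value of the risky 0/10¢ cue above its true mean of 5¢:

```python
import random

def learned_risky_value(alpha, eta=0.1, n_trials=20000, seed=0):
    """Learn the value of a cue paying 10 or 0 cents with equal
    probability, with learning rate eta*(1+alpha) after positive
    prediction errors and eta*(1-alpha) after negative ones.
    Returns the average learned value over the second half of trials."""
    rng = random.Random(seed)
    v, tail = 0.0, []
    for t in range(n_trials):
        r = 10.0 if rng.random() < 0.5 else 0.0
        delta = r - v  # prediction error
        rate = eta * (1 + alpha) if delta > 0 else eta * (1 - alpha)
        v += rate * delta
        if t >= n_trials // 2:
            tail.append(v)
    return sum(tail) / len(tail)

# A symmetric learner (alpha = 0) settles near the true mean of 5 cents,
# whereas an asymmetric learner settles near 5*(1+alpha): with alpha = 0.3
# the risky cue is worth about 6.5 cents, more than the sure 5-cent cue.
print(learned_risky_value(0.0), learned_risky_value(0.3))
```

The fixed point follows from setting the expected update to zero: (1 + α)(10 − V) = (1 − α)V, giving V = 5(1 + α), so any α > 0 makes the risky cue look better than the equal-mean sure cue.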
First, we tested whether the asymmetric-learning model is justified, that is, whether it explains participants’ data significantly better than the classical reinforcement-learning model with only learning-rate and inverse-temperature parameters. The results showed that the more complex model was justified for the majority of participants (16 out of 26 participants; DYT 6, CTL 19), and in particular, for participants who were risk averse or risk seeking (but not risk-neutral; Figure 4a).
We then compared the individually fit parameters of the asymmetric model across the two groups. We found significant differences between the groups in the learning asymmetry parameter (DYT −0.05 ± 0.27, CTL −0.34 ± 0.27, Mann-Whitney z = −2.51, df = 24, P < 0.05), but no differences in the other two parameters (learning rate DYT1 0.25 ± 0.19, CTL 0.14 ± 0.11, Mann-Whitney z = 1.33, df = 24, P = 0.18; inverse temperature DYT 0.68 ± 0.37, CTL 0.93 ± 0.47, Mann-Whitney z = −1.18, df = 24, P = 0.23). Thus patients’ behavior was consistent with enhanced learning from positive prediction errors and reduced learning from negative prediction errors as compared to healthy controls, despite the overall rate of learning and the degree of noise in choices (modeled by the inverse temperature parameter) being similar across groups. A significant correlation was also observed between the learning asymmetry parameter and the severity of dystonia (Pearson’s r = 0.64, df = 11, P < 0.05).
One alternative explanation for our results is that the nonlinearity of subjective utility functions (Kahneman and Tversky, 1979) for small amounts of money differs between DYT1 dystonia patients and controls. However, replicating previous results from a healthy cohort (Niv et al., 2012), formal model comparison suggested that choice behavior in our task is significantly better explained by the asymmetric-learning model above (Figure 4b). Moreover, the impetus for our experiment was an a priori hypothesis regarding risk sensitivity as a consequence of asymmetric learning, based on findings from the mouse model of DYT1 dystonia, which has no straightforward equivalent interpretation in terms of nonlinear utilities. We note also that strongly nonlinear utilities in the domain of small payoffs such as those we used here are generally unlikely (Rabin and Thaler, 2001), again suggesting that risk sensitivity in our experiment is more likely to arise from asymmetric learning. Another alternative explanation for behavior in our task is a win-stay lose-shift strategy, perhaps utilized to a different extent by DYT1 patients and controls. However, this model, equivalent to the classical reinforcement-learning model with a learning rate of 1 and only an inverse-temperature parameter, fit 25 out of 26 participants’ data considerably worse than the asymmetric learning model, and was therefore not investigated further.
We demonstrated that DYT1 dystonia patients and healthy controls have different profiles of risk sensitivity in a trial-and-error learning task. Our results support the dominant model of reinforcement learning in the basal ganglia, according to which prediction-error modulated LTP and LTD in corticostriatal synapses are responsible for changing the propensity to repeat actions that previously led to positive or negative prediction errors, respectively. Similar to Parkinson’s disease, at first considered a motor disorder but now recognized to also cause cognitive and learning abnormalities, it appears that DYT1 dystonia is not limited to motor symptoms (Fiorio et al., 2007; Heiman et al., 2004; Molloy et al., 2003; Stamelou et al., 2012), and specifically, that the suspected altered balance between LTP and LTD in this disorder has overt, readily measurable effects on behavior.
DYT1 dystonia and Parkinson's disease can be viewed as complementary models for understanding the mechanisms of reinforcement learning in the human brain. In unmedicated Parkinson’s disease patients, learning from positive prediction errors is impaired due to reduced levels of striatal dopamine that presumably signal the prediction errors themselves, whereas learning from negative prediction errors is intact (Frank et al., 2004; Rutledge et al., 2009). This impairment, and the resulting asymmetry that favors learning from negative prediction errors, can be alleviated using dopaminergic medication (Frank et al., 2004; Shohamy et al., 2004). DYT1 dystonia patients, on the other hand, seem to have intact striatal dopamine signaling (Balcioglu et al., 2007; Dang et al., 2006; Grundmann et al., 2007; Zhao et al., 2008), but altered corticostriatal LTP/LTD that favors learning from positive prediction errors.
Our a priori predictions were based on a simplified model of the role of corticostriatal LTP and LTD in reinforcement learning, and the entire picture is undoubtedly more complex. Controversies regarding the functional relationship between the direct and indirect pathways of the basal ganglia (Calabresi et al., 2014; Cui et al., 2013; Kravitz et al., 2012) and the large number of players taking part in shaping synaptic plasticity (Calabresi et al., 2014; Shen et al., 2008) make it hard to pin down the precise mechanism behind reinforcement learning. Indeed, the DYT1 mouse model has also been linked to impaired plasticity in the indirect pathway due to D2 receptor dysfunction (Beeler et al., 2012; Napolitano et al., 2010; Wiecki et al., 2009), which can lead to abnormal reinforcement (Kravitz et al., 2012).
In any case, our findings are compatible with the prominent 'Go'/'NoGo' model of learning and action selection in the basal ganglia (Frank et al., 2004) that incorporates opposing directions of plasticity in the direct and indirect pathways (Collins and Frank, 2014). In particular, current evidence suggests that corticostriatal LTP following positive prediction errors and LTD following negative prediction errors occur in D1 striatal neurons (direct pathway), whereas plasticity in D2-expressing neurons (indirect pathway) is in the opposite direction (Kravitz et al., 2012; Shen et al., 2008). As the direct pathway supports choice (‘Go’) while the indirect pathway supports avoidance (‘NoGo’), under this implementation of reinforcement learning both types of learning eventually lead to the same behavioral outcome: a positive prediction error increases the probability that the action/choice that led to the prediction error will be repeated in the future, and vice versa for negative prediction errors. As such, at the algorithmic level at which our asymmetric learning model was cast, the differences we have shown between dystonia patients and controls would still be expected to manifest behaviorally as diminished risk aversion in dystonia patients.
In particular, our results are compatible with several alternative abnormalities in corticostriatal plasticity in DYT1 dystonia: (a) Abnormally strong LTP/weak LTD in D1-expressing striatal neurons only, with plasticity in the indirect pathway being intact; in this case, learning in the direct pathway would exhibit the abnormal asymmetries we argue for, whereas the indirect pathway would learn as normal. (b) Abnormally strong LTP/weak LTD in D1-expressing striatal neurons and the opposite pattern, abnormally strong LTD and/or weak LTP in D2-expressing striatal neurons of the indirect pathway in DYT1 dystonia. As a result, a positive prediction error would generate extra strong positive learning in the Go pathway, and a similarly large decrease in the propensity to avoid this stimulus due to activity in the 'NoGo' pathway. Conversely, learning from negative prediction errors would generate relatively little decrease in the propensity to 'Go' to the stimulus and little increase in the propensity to 'NoGo'. In both cases, the effect on both pathways would be in the same direction as is seen in the behavioral asymmetry. (c) Finally, abnormalities may exist in both pathways in the same direction (stronger LTP and weaker LTD), but with a larger effect on LTP as compared to LTD. In this case, a positive prediction error would increase 'Go' activity considerably, but not decrease 'NoGo' activity to the same extent. Negative prediction errors, on the other hand, would increase 'NoGo' propensities while decreasing 'Go' propensities to a lesser extent. This type of asymmetry can explain why the rodent studies suggested almost absent (not only weaker) LTD, but nevertheless, patients did not behave as if they did not learn at all from negative prediction errors. Unfortunately, our model and behavioral results cannot differentiate between these three options. We hope that future data, especially from transgenic DYT1 rodents, will clarify this issue.
Relative weighting of positive and negative outcomes shapes risk sensitivity in tasks that involve learning from experience. Humans with preserved basal ganglia function have been shown to be risk averse in such tasks. We showed that patients with DYT1 dystonia are closer to risk neutrality, a pattern of behavior that is in fact rational given our reward statistics, and in such tasks in general. While this type of behavior may offer advantages under certain conditions, it may also contribute to impaired reinforcement learning of motor repertoires and to fixation on actions that were once rewarded. In any case, these reinforcement-learning manifestations of what has been considered a predominantly motor disease provide support for linking corticostriatal synaptic plasticity to overt trial-and-error learning behavior in humans.
Fourteen participants with genetically proven (c.907_909delGAG) (Ozelius et al., 1997) symptomatic DYT1 dystonia were recruited through the movement-disorders clinics at Columbia University and Beth Israel Medical Centers in New York and through a posting on the website of the Dystonia Medical Research Foundation. Exclusion criteria were age younger than 18 or older than 50 years and deep brain stimulation or other prior brain surgery for dystonia. A single patient was excluded from further analysis for choosing the left cue in 100% of trials. Thirteen age- and sex-matched healthy participants were recruited among acquaintances of the DYT1 patients and from the Princeton University community. Healthy control participants were not blood relatives of patients with dystonia and did not have clinical dystonia. All patients and healthy controls had at least 13 years of education.
Nine DYT1 dystonia patients took baclofen (n = 6, daily dose 66.7 ± 28.0 mg, range 30–100 mg) and/or trihexyphenidyl (n = 7, daily dose 30.9 ± 25.8 mg, range 12–80 mg) for their motor symptoms. To reduce possible effects of medication, patients were tested before taking their scheduled dose. The median time interval between the last dose of medication and testing was 7.5 hr for baclofen (range 1–20 hr) and 13 hr for trihexyphenidyl (range 1–15 hr). Given that the reported plasma half-life of baclofen is 6.8 hr (Wuis et al., 1989) and that of trihexyphenidyl is 3.7 hr (Burke and Fahn, 1985), three patients were tested within the plasma half-life of the last dose of their medication. Finally, we found no correlation between the sex of participants (Figure 3—figure supplement 3) or medication doses (Figure 3—figure supplement 4) and the relevant behavioral outcomes.
All participants gave informed consent, and the study was approved by the Institutional Review Boards of Columbia University, Beth Israel Medical Center, and Princeton University. Clinical severity of dystonia was scored immediately after consenting by a movement-disorders specialist (DA), using the Fahn-Marsden dystonia rating scale (Burke et al., 1985). This scale integrates the number of involved body parts, the range of actions that induce dystonia, and the severity of the observed dystonia. One patient was scored 0 because dystonia was not clinically observed on the day of her testing.
Before and after completing the reported task, all participants performed a short (8–9 min) unrelated auditory discrimination task (Baron, 1973; results not reported here) that was not associated with any monetary reward. Participants were informed that the two tasks were not related.
Four different pseudo-letters served as cues (‘slot machines’) and were randomly allocated to four payoff schedules: sure 0¢, sure 5¢, sure 10¢, and one variable-payoff ‘risky’ stimulus associated with equal probabilities of 0¢ or 10¢ payoffs. Participants were not informed about the payoffs associated with the different cues and had to learn them from trial and error.
Two types of trials were pseudo-randomly intermixed: In ‘choice trials’, two cues were displayed (left and right location randomized), and the participant was instructed to select one of the two cues by pressing either the left or right buttons on a keyboard. The cue that was not selected then disappeared and the payoff associated with the chosen cue was displayed for 1 s. After a variable (uniformly distributed) inter-trial interval of 1–2 s, the next trial began. In ‘forced trials’, only one cue was displayed on either the left or right side of the screen, and the participant had to indicate its location using the keyboard to obtain its associated outcome. All button presses were timed out after 1.5 s, at which time the trial was aborted with a message indicating that the response was 'too slow' and the inter-trial interval commenced. Participants were instructed to try to maximize their winnings and were paid according to their actual payoffs in the task. On-screen instructions for the task also informed participants that payoffs depended only on the ‘slot machine’ chosen, not on its location or on their history of choices. Throughout the experiment, to minimize motor precision requirements, any of keys E, W, A, Z, X, D, and S (on the left side of the keyboard) served as allowable response buttons for choosing the left cue and any of keys I, O, L, <, M, J and K (on the right side of the keyboard) served as allowable response buttons for choosing the right cue. Each set of response keys was marked with stickers of different colors (blue for left keys and red for right keys) to aid in their visual identification.
Participants were first familiarized with the task and provided with several observations of the cue–reward mapping in a training phase that included two subparts. The first part involved 16 pseudo-randomly ordered forced trials (four per cue). The second part comprised 10 pseudo-randomly ordered choice trials (two of each of five types of choice trials: 0¢ versus 5¢, 5¢ versus 10¢, 0¢ versus 0/10¢, 5¢ versus 0/10¢ and 10¢ versus 0/10¢).
Before the experimental task began, on-screen instructions informed subjects that they would encounter the same cues as in the training phase. They were briefly reminded of the rules and encouraged to choose those ‘slot machines’ that yielded the highest payoffs, as they would be paid their earnings in this part. The task then consisted of 300 trials (two blocks of 150 trials each, with short breaks after every 100 trials of the experiment), with choice and forced trials randomly intermixed. Each block comprised 30 'risk' choice trials involving a choice between the 5¢ cue and the 0/10¢ cue, 20 choice trials involving each of the pairs 0¢ versus 0/10¢ and 10¢ versus 0/10¢, 15 choice trials involving each of the pairs 0¢ versus 5¢ and 5¢ versus 10¢, 14 forced trials involving the 0/10¢ cue, and 12 forced trials involving each of the 0¢, 5¢ and 10¢ cues. Trial order was pseudo-randomized in advance and was similar across participants and between blocks. Payoffs for the 0/10¢ cue were counterbalanced such that groups of eight consecutive choices of the risky cue included exactly four 0¢ payoffs and four 10¢ payoffs. All task events were controlled using MATLAB (MathWorks, Natick, MA) with Psychtoolbox (Brainard, 1997).
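The payoff counterbalancing for the risky cue can be sketched as follows (a hypothetical generator, not the authors' task code): each consecutive block of eight risky-cue outcomes is a shuffled set of four 0¢ and four 10¢ payoffs.

```python
import random

def risky_payoff_schedule(n_choices, seed=0):
    """Counterbalanced outcomes for the risky cue: every consecutive
    group of eight risky-cue choices contains exactly four 0-cent and
    four 10-cent payoffs, in shuffled order."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_choices:
        block = [0, 0, 0, 0, 10, 10, 10, 10]
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_choices]
```

This keeps the empirical payoff probability of the risky cue at exactly 0.5 for every participant while preserving local unpredictability.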
Our modeling and quantification of the effects of abnormal learning from prediction errors rest solely on the risky cue, for which learning presumably continued throughout the experiment. However, one potential worry is that participants did not use trial-and-error learning to evaluate this cue, but rather ‘guessed’ its value using a cognitive system (as in rule-based learning). To evaluate this possibility, we tested for a difference in the propensity to choose the risky cue after a previous win or a loss, throughout the task (see Figure 3c and Figure 3—figure supplement 1).
To test the hypothesis that increased risk-taking in DYT1 dystonia was due to an enhanced effect of positive prediction errors and a weak effect of negative prediction errors, we modeled participants’ choice data using an asymmetric reinforcement learning model (also called a risk-sensitive temporal difference reinforcement learning (RSTD) model) (Mihatsch and Neuneier, 2002; Niv et al., 2012). The learning rule in this model is
V_new(cue) = V_old(cue) + η(1 ± α)δ,
where V(cue) is the value of the chosen cue, δ = r − V_old(cue) is the prediction error, that is, the difference between the obtained reward r and the predicted reward V_old(cue), η is a learning-rate parameter, and α is an asymmetry parameter applied such that the effective learning rate is η(1 + α) if the prediction error is positive (δ > 0) and η(1 − α) if the prediction error is negative (δ < 0). This model is fully equivalent to a model with two learning-rate parameters, one for learning when prediction errors are positive and another for learning when prediction errors are negative. Following common practice, we also assumed a softmax (or sigmoid) action-selection function:
p(A) = 1 / (1 + exp(−β[V(A) − V(B)])),
where p(A) is the probability of choosing cue A over cue B, and β is an inverse-temperature parameter controlling the randomness of choices (Niv et al., 2012).
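For concreteness, the two equations above can be transcribed directly into code (a sketch; parameter names follow the text, and any example values are assumptions):

```python
import math

def update(v_old, r, eta, alpha):
    """Risk-sensitive learning rule: the learning rate eta is scaled by
    (1 + alpha) after positive prediction errors and by (1 - alpha)
    after negative ones."""
    delta = r - v_old  # prediction error
    scale = (1 + alpha) if delta > 0 else (1 - alpha)
    return v_old + eta * scale * delta

def p_choose_a(v_a, v_b, beta):
    """Softmax probability of choosing cue A over cue B; beta is the
    inverse temperature (higher beta = less random choice)."""
    return 1.0 / (1.0 + math.exp(-beta * (v_a - v_b)))
```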
We fit the free parameters of the model (η, α, and β) to the behavioral data of individual participants, using data from both training and test trials (a total of 326 trials), as participants learned to associate cues with their outcomes from the first training trial. Cue values were initialized to 0. We optimized model parameters by minimizing the negative log likelihood of the data given different settings of the model parameters, using the MATLAB function "fminunc". The explored ranges of model parameters were [0, 1] for the learning-rate parameter, [−1, 1] for the learning-asymmetry parameter (so that the effective learning rate η(1 ± α) remains nonnegative), and [0, 30] for the inverse-temperature parameter. To avoid local minima, for each participant we repeated the optimization 5 times from randomly chosen starting points, keeping the best (maximum likelihood) result. This method is commonly used for temporal-difference learning models and is known to be well-behaved (Niv et al., 2012).
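The fitting procedure can be sketched as follows. This is not the authors' MATLAB code; it is a stdlib-only Python stand-in that replaces the numerical optimizer with a coarse grid search over plausible parameter ranges (learning rate in [0, 1], asymmetry in [−1, 1], inverse temperature in [0, 30]), with trials assumed to be (cue A, cue B, chosen cue, reward) tuples:

```python
import math
from itertools import product

def neg_log_likelihood(params, trials):
    """Replay the trial sequence under (eta, alpha, beta) and sum
    -log p of each observed choice. Each trial is a tuple
    (cue_a, cue_b, chosen_cue, reward); all cue values start at 0."""
    eta, alpha, beta = params
    values = {}
    nll = 0.0
    for cue_a, cue_b, chosen, reward in trials:
        v_a, v_b = values.get(cue_a, 0.0), values.get(cue_b, 0.0)
        x = beta * (v_a - v_b)
        p_a = 0.0 if x < -700 else 1.0 / (1.0 + math.exp(-x))  # softmax
        p_choice = p_a if chosen == cue_a else 1.0 - p_a
        nll -= math.log(max(p_choice, 1e-12))                  # guard log(0)
        delta = reward - values.get(chosen, 0.0)               # prediction error
        scale = (1 + alpha) if delta > 0 else (1 - alpha)      # asymmetry
        values[chosen] = values.get(chosen, 0.0) + eta * scale * delta
    return nll

def fit(trials, n_grid=11):
    """Best (eta, alpha, beta) on a coarse grid over the ranges above."""
    step = n_grid - 1
    etas = [i / step for i in range(n_grid)]
    alphas = [-1 + 2 * i / step for i in range(n_grid)]
    betas = [30 * i / step for i in range(n_grid)]
    return min(product(etas, alphas, betas),
               key=lambda p: neg_log_likelihood(p, trials))
```

A proper optimizer with random restarts, as in the text, explores the same likelihood surface more efficiently than this grid.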
Previous work has shown that the asymmetric learning model best explains participants’ behavior in our task (Niv et al., 2012). To replicate those results in our sample population, we compared the asymmetric learning model to three alternative models. The first was a classical reinforcement-learning model with no learning asymmetry (α = 0). The second alternative model was based on the classical nonlinear (diminishing) subjective utility of monetary rewards. The idea is that the 10¢ reward may not be subjectively equal to twice the 5¢ reward, therefore engendering risk-sensitive choices in our task. We thus defined learning in the nonlinear-utility model as
V_new(cue) = V_old(cue) + η(U(r) − V_old(cue)),
where U(r) is the subjective utility of reward r. Without loss of generality, we parameterized the utility function over the three possible outcomes (0¢, 5¢ or 10¢) by setting U(0) = 0, U(5) = 5 and U(10) = a×10, where the parameter a could be larger than, equal to, or smaller than 1, and was fit to the data of each participant separately. If the effect of a loss is larger than that of a gain of the same magnitude, a should be smaller than 1. Finally, we tested a win-stay-lose-shift strategy model that is equivalent to the classical reinforcement-learning model with a learning rate of 1. All models used the softmax choice function with an inverse-temperature parameter β. The parameters of each of the models were fit to each participant’s data as was done for the asymmetric learning model.
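The nonlinear-utility alternative can be sketched similarly (an illustration; the parameter values below are assumptions):

```python
def utility(r_cents, a):
    """Parameterized subjective utility: U(0)=0, U(5)=5, U(10)=a*10."""
    return a * 10.0 if r_cents == 10 else float(r_cents)

def utility_update(v_old, r_cents, a, eta=0.1):
    """Symmetric learning on utilities rather than on raw payoffs."""
    return v_old + eta * (utility(r_cents, a) - v_old)

# With a < 1 (diminishing utility) the risky cue's learned value
# converges toward a*5 < 5, producing risk aversion with no
# learning asymmetry at all.
```

This is why the two accounts must be separated by formal model comparison: both can produce risk-sensitive choice, but they do so through different mechanisms.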
Because the relevant sets of data were not normally distributed (Kolmogorov-Smirnov test, P < 0.05), we analyzed the data using nonparametric tests: the Mann-Whitney U test to compare two populations, the Wilcoxon signed-rank test for repeated measures, and Friedman’s test for nonparametric one-way repeated-measures analysis of variance by ranks. All statistical tests were two-sided unless otherwise specified.
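These test choices map onto standard `scipy.stats` functions (Python here, not the study's analysis software). The data below are randomly generated placeholders sized like the patient group (n = 13), not the study's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
patients = rng.normal(0.5, 0.1, 13)   # placeholder scores, one per subject
controls = rng.normal(0.3, 0.1, 13)

# Two-sided Mann-Whitney U test: compare two independent populations
u = stats.mannwhitneyu(patients, controls, alternative='two-sided')

# Wilcoxon signed-rank test: paired (repeated-measures) comparison
w = stats.wilcoxon(patients - controls)

# Friedman's test: nonparametric one-way repeated-measures ANOVA by
# ranks (needs three or more conditions; a third placeholder here)
third = rng.normal(0.4, 0.1, 13)
f = stats.friedmanchisquare(patients, controls, third)
```

A Kolmogorov-Smirnov normality check (`stats.kstest`) on each data set would precede these, as described above.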
Dopamine release is impaired in a mouse model of DYT1 dystonia. Journal of Neurochemistry 102:783–788. https://doi.org/10.1111/j.1471-4159.2007.04590.x
Postdiscrimination gradients of human subjects on a tone continuum. Journal of Experimental Psychology 101:337–342. https://doi.org/10.1037/h0035206
Adaptive Critics and the Basal Ganglia. In: J. C Houk, J. L Davis, D. G Beiser, editors. Models of Information Processing in the Basal Ganglia. Cambridge, MA: MIT Press. pp. 215–232.
Direct and indirect pathways of basal ganglia: a critical reappraisal. Nature Neuroscience 17:1022–1030. https://doi.org/10.1038/nn.3743
Motor deficits and hyperactivity in Dyt1 knockdown mice. Neuroscience Research 56:470–474. https://doi.org/10.1016/j.neures.2006.09.005
Generation of a novel rodent model for DYT1 dystonia. Neurobiology of Disease 47:61–74. https://doi.org/10.1016/j.nbd.2012.03.024
Distinct roles for direct and indirect pathway striatal neurons in reinforcement. Nature Neuroscience 15:816–818. https://doi.org/10.1038/nn.3100
Review: genetics and neuropathology of primary pure dystonia. Neuropathology and Applied Neurobiology 38:520–534. https://doi.org/10.1111/j.1365-2990.2012.01298.x
A causal link between prediction errors, dopamine neurons and learning. Nature Neuroscience 16:966–973. https://doi.org/10.1038/nn.3413
Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
Plasma and urinary excretion kinetics of oral baclofen in healthy subjects. European Journal of Clinical Pharmacology 37:181–184. https://doi.org/10.1007/BF00558228
Abnormal motor function and dopamine neurotransmission in DYT1 DeltaGAG transgenic mice. Experimental Neurology 210:719–730. https://doi.org/10.1016/j.expneurol.2007.12.027
Rui M Costa, Reviewing Editor; Fundação Champalimaud, Portugal
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "DYT1 dystonia increases risk taking in humans" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Timothy Behrens as the Senior Editor.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. The reviewers found the behavioral findings interesting. However, they indicate that the model presented needs to be better validated, and that alternative models should be considered to explain the data. For example, given the current claims it is important to verify that the risk-taking effect is demonstrably a learning effect rather than a choice-like effect or a WSLS type effect that would not implicate striatal plasticity. To be clear, this was the key concern raised in the reviewer discussion and the reviewers were clear that publication of the study would be contingent on the outcome of these analyses.
Furthermore, in the current version of the study, there is no evidence that the changes in corticostriatal plasticity mentioned are responsible for the risk taking behavior. So that interpretation should be toned down, unless new evidence is presented. Below are the detailed comments, which should help in preparing a revised version.
I enjoyed reading this relatively brief and focused test of the role of dopamine in learning in human subjects. Several previous reports have exploited the fact that the impact of diminished dopamine on learning can be investigated in Parkinson's disease patients (Frank et al., Science, 2004, Shohamy et al., Brain, 2004; Rutledge et al., J. Neurosci., 2009) but the novelty of this report is that it examines the impact of a dopaminergic/striatal change occurring in a type of dystonia, DYT1 dystonia, that is, in a sense, the opposite way round (in the sense that there is an increase in LTP/decrease in LTD in transgenic animals). Perhaps the only other way to conduct a test of this sort is by examining the impact of dopamine increases by L-DOPA treatment (e.g. Pessiglione et al., Nature, 2006).
The task used is one that has been previously used by one of the authors (Niv et al., J. Neuroscience, 2012). The previous report also justifies the modeling approach taken. I therefore had very few questions. I wondered, however, about the following minor points.
1) In the last paragraph of the Introduction. Although I agree that the results are interesting I was not always quite sure precisely how the authors intended to frame them. Perhaps changing or adding a few words in the Introduction or Discussion might help provide the final clarification needed. The current version emphasizes that the dystonic condition is associated with a change in LTP/LTD balance and emphasizes that this might lead to a change in learning from positive prediction errors. By contrast when discussing studies in Parkinson's patients they seem to emphasize dopaminergic changes. Is it not the case that changes in dopamine (or its balance with other neurotransmitters) occur both in Parkinson's patients and dystonic patients and that as a consequence both will be expected to lead to changes in LTP/LTD ratios? Or is there no evidence of a change in LTP/LTD in Parkinson's disease?
2) While I understand that most patients with this particular condition have normal general intelligence I wondered if there was any information about intelligence/education matching in the control and patient groups reported here?
3) In the fourth paragraph of the Discussion. It is stated that "Current methodologies cannot yet bridge the gap between cellular level processes (LTP/LTD) and behavior at the level of an animal, let alone a human." Arguably, however, there are techniques that can be used to induce increments and decrements in the strengths of specific connexions even in awake, behaving human subjects (Buch et al., 2011, J. Neuroscience, Johnen et al., eLife, 2015) although their use is limited to the cortex.
This study investigates risk-taking in a reinforcement learning experiment in 13 patients with dystonia and matched controls. Subjects learned to select between pairs of cues (or were forced to pick one). Three cues led deterministically to 0¢, 5¢ or 10¢, and the last to 0¢ or 10¢ with equal probability. Risk was assessed by the choice between the 5¢ and the probabilistic cues, which have the same expected value. The authors show that controls are significantly more risk-averse in this task than patients, who appear to be risk neutral. They show evidence that this is not due to patients choosing randomly, but that they are sensitive to previous rewards and thus presumably learning the values over time. They argue that the behavioral difference can be accounted for by assuming different learning rates for positive vs. negative prediction errors, with a greater asymmetry for patients than controls. The authors hypothesize, based on a rodent model showing abnormally high LTP and low LTD, that the observed behavioral effects thus reflect differences in synaptic plasticity.
Overall, this paper is well written and interesting. There are some limitations to the study.
1) The authors overstate the implications of this study in the Abstract, main text and Discussion. For example, the Discussion states that this "may provide the first link between synaptic plasticity and overt learning behavior in humans". This is a stretch: for one, based on the literature they present, there is no direct evidence that dyt1 patients' impairment is indeed a plasticity impairment. Furthermore, the behavioral evidence tying risk taking to learning here is also weak (see next point). These results may provide more evidence, of a different nature, but of a similar degree of "directness" as existing evidence, for example linking dopamine dependent plasticity individual differences to learning behavior in humans.
2) Model validation. Despite the authors' careful effort to validate it, the RSTD is still not very convincing in this data set.
A) While it fits better over the whole group, it does not seem to be the case within the patient group (6/13). This is a problem, as it may indicate that the fit parameters don't reflect important aspects of the behavior within this group, rendering interpretation difficult.
B) This may be a limitation of the task: DYT1 patients appear to be on average risk neutral, such that a task in which the only risk comparison relies on equal average value will render the RSTD not identifiable beyond TD.
C) An important validation of the model would be to confirm with simulations that it can reproduce the behavioral pattern in Figure 3C. This is an important, model-independent data point supporting the learning interpretation, and the model should at a minimum be able to capture this qualitative phenomenon.
D) As the authors point out, there is nearly no learning needed in this task. A natural model to consider would thus be a non-learning utility model, parameterized with the same a parameter as the learning utility model presented here, but assuming known probabilities. While Figure 3C hints at a potential learning effect, it could also reflect a non-learning win-stay lose-shift strategy. Investigating whether this combination accounts for behavior better or worse than RSTD would be an important point in establishing whether the learning interpretation is valid.
3) If the RSTD model is validated, it would be useful to also present results relating to the learning rates directly, not just the difference index, as authors attempt an interpretation in terms of LTP/and or LTD.
4) It would also be important to know whether previous rewards have differential effects depending on whether the trial was a free or forced choice trial (e.g. Figure 3C). Recent work (Cockburn et al) has shown that value learning, especially from positive prediction errors, differed between these conditions, and if patients and controls had baseline differences in risk taking for non-learning reasons, this would lead patients to experience more free-choice risky trials than controls, potentially biasing learning. This might be important to rule out.
5) It would be interesting to report patients' behavior in absolute terms, not only relative to controls. It appears that they are on average risk neutral, which is not a suboptimal or irrational thing to do in this task. It might be interesting to discuss why their impairment appears to "correct an imbalance" that is seen in healthy controls, contrary to what is usually observed in patient studies.
Building on animal work showing that DYT1 dystonia animal models are associated with exaggerated LTP and diminished LTD, the authors outline an experiment testing for learning abnormalities in human DYT1 dystonia patients. The authors hypothesize that DYT1 patients will show a positive learning bias owing to increased LTP and/or reduced LTD. Consistent with their hypothesis, DYT1 patients do indeed show atypical responsivity to outcomes in that they do not exhibit risk aversion as did matched controls.
The authors correctly emphasize the chasm between synaptic modification and behavior. The application of genetically linked animal models and human disease states in concert with an algorithmic description of a learning mechanism offers an exciting and constructive path forward. However, this depends critically on a shared mechanism between the animal model and the human brain. I cannot speak to the quality of the animal models or the nature of DYT1 dystonia; however, the text leaves some question as to the shared commonality. The authors point to atypical LTP/LTD in animal models, but abnormal dopamine/acetylcholine transmission in humans. Given the manuscript's emphasis on lessening the gap between synapse and behavior, I feel that the manuscript could benefit from more support linking animal and human disorders (perhaps via behavioral patterns in the animal models, mechanisms through which medications operate, etc.).
Of greater concern is whether these results truly implicate a reinforcement learning mechanism. As outlined by the authors, patients exhibited a strong propensity to pick the risky option again following a win but avoided it following a loss, which is argued to demonstrate outcome sensitivity. But this response pattern does not necessitate a reinforcement learning strategy. These data also appear to be consistent with a win-stay/lose-shift strategy (WS/LS). Given the 50% chance of reward on the risky stimulus, a WS/LS strategy is consistent with the reported lack of risk aversion. Furthermore, the model-generated response pattern illustrated in Figure 1B (bottom panel) suggests that a reinforcement learning strategy becomes increasingly risk averse. There does appear to be some trend of increasing risk aversion in controls, but not so for the patient group. Given that there are no clear response patterns that demonstrate a reinforcement learning strategy (i.e. there are no learning curves etc.), I feel that the results would be more interpretable if a WS/LS model was also included in the analysis and Discussion.
Could observed effects in the patient group be driven by medication withdrawal? The authors offer helpful discussion of medication half-life, but this only serves to demonstrate that effects are probably not driven by the medication itself.
What were the model parameter ranges explored while fitting the data? Within-subject variance (Figure 3A) seems to indicate considerably more variance in the control group (though I understand this to be a subset of the dataset), so I am a bit surprised by the apparent lack of effect in the inverse temperature parameter. https://doi.org/10.7554/eLife.14155.014
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
This research was supported in part by the Parkinson’s Disease Foundation (DA), the NIH Office of Rare Diseases Research through the Dystonia Coalition (DA) and the National Institute for Psychobiology in Israel (DA), a Sloan Research Fellowship to YN, NIH grant R01MH098861 (AR and YN) and Army Research Office grant W911NF-14-1-0101 (YN & AR). We are grateful to Hagai Bergman, Reka Daniel, Nathaniel Daw, Stanley Fahn, Ann Graybiel, Elliot Ludvig, Rony Paz, Daphna Shohamy, Nicholas Turk-Browne and Jeff Wickens for very helpful comments on previous versions of the manuscript.
Human subjects: All participants gave informed consent and the study was approved by the Institutional Review Boards of Columbia University, Beth Israel Medical Center, and Princeton University.
© 2016, Arkadir et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.