(a) Our experimental design was explicitly aimed at focusing learning on the risky cue, so that we could analyze learning from positive and negative prediction errors decoupled from initial learning about deterministic cues. As shown in Figure 3c, participants' tendency to choose the risky 0/10 cue over the same-mean 5¢ cue was dynamically adjusted according to experience: if the previous choice of the risky cue had been rewarded with 10¢, participants were significantly more likely to choose the risky cue again the next time it was available, compared to when the previous choice of the risky cue had resulted in 0¢. To verify that the value of the risky cue was continuously updated, we calculated the proportion of choices of the risky cue over the sure 5¢ cue following wins versus losses on the previous instance in which the risky cue was selected, separately for time bins spanning the task (15 risky trials in each). A three-way ANOVA (group × outcome × time-bin) revealed significant main effects of group (P < 0.001) and outcome (win or loss; P < 0.001), with no effect of time-bin and no interactions. Post-hoc comparisons revealed that the difference between win and loss conditions was significant in all bins for the DYT group only (all Ps < 0.05, two-tailed); for the CTL group, only the first two bins approached significance (P = 0.054, two-tailed). This analysis showed that DYT patients adjusted their behavior based on outcomes of the risky cue throughout training. Control participants, on the other hand, showed somewhat less learning as the task continued, with their behavior in the last quarter of training settling on a risk-averse policy that was not sensitive to local outcomes. In reinforcement learning, this could result from a gradual decrease of learning rates, which is optimal in a stationary environment. Indeed, the final risk-averse policy was predicted by our model, based on the ratio of positive to negative learning rates. In any case, these results suggest that participants learned to evaluate the risky cue based on experienced rewards, and that the locally fluctuating value of the risky cue affected choice behavior, at least in the first half of the experiment and, for the DYT group, throughout the experiment.

(b) Recent work on similar reinforcement learning tasks has shown that choice trials and forced trials may exert different effects on learning (Cockburn et al., 2014). To test for this effect in our data, we examined separately the probability of choosing the risky cue over the sure cue following wins or losses experienced on either forced or choice trials. Our analysis revealed that choices depended significantly on the previous outcome of the risky cue (P < 0.01, F = 7.45, df = 1 for the main effect of win versus loss; three-way ANOVA with factors outcome, choice and group) but not on its context (P = 0.38, F = 0.93, df = 1 for the main effect of forced versus choice trials). Similar to Cockburn et al. (2014), we did observe a numerically smaller effect of the outcomes of forced trials (as compared to choice trials) on future choices; however, this effect was not significant (outcome × choice interaction: P = 0.46, F = 0.56, df = 1). P values in the figure reflect paired t-tests.
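For concreteness, the binned win/loss analysis described in (a) could be implemented along the lines of the sketch below. The trial-log column names (subject, group, trial, pair, chose_risky, risky_outcome) are hypothetical placeholders rather than the study's actual data format, and counting risky-cue selections to form the bins is one plausible reading of "15 risky trials in each".

```python
import pandas as pd

BIN_SIZE = 15  # risky trials per time bin, as in the analysis described above

def binned_choice_proportions(trials: pd.DataFrame) -> pd.DataFrame:
    """Proportion of risky-over-sure choices after a win vs. a loss on the
    previous risky-cue selection, per participant and per time bin."""
    records = []
    for (subject, group), df in trials.groupby(["subject", "group"]):
        df = df.sort_values("trial")
        prev_outcome = None   # reward (0 or 10) from the last risky-cue selection
        n_risky = 0           # running count of risky-cue selections, used for binning
        for row in df.itertuples():
            # a trial pairing the risky 0/10 cue with the sure 5-cent cue
            if row.pair == "risky_vs_sure5" and prev_outcome is not None:
                records.append({
                    "subject": subject,
                    "group": group,
                    "bin": n_risky // BIN_SIZE,
                    "prev": "win" if prev_outcome > 0 else "loss",
                    "chose_risky": float(row.chose_risky),
                })
            if row.chose_risky:   # risky cue selected on this trial (forced or free)
                prev_outcome = row.risky_outcome
                n_risky += 1
    tbl = pd.DataFrame(records)
    # per-subject cell means, ready for the group x outcome x time-bin ANOVA
    return (tbl.groupby(["group", "subject", "bin", "prev"])["chose_risky"]
               .mean().reset_index(name="p_choose_risky"))
```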
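The asymmetric update invoked in (a), with separate learning rates for positive and negative prediction errors, can be illustrated with the following minimal sketch. This is a generic risk-sensitive Q-learning rule rather than the exact model fit in the paper, and the parameter values and softmax-free simulation are illustrative only; its fixed point, 10 × lr_pos / (lr_pos + lr_neg), shows why a smaller positive than negative learning rate values the 0/10 cue below the sure 5¢ cue and thereby predicts a risk-averse policy.

```python
import numpy as np

def simulate_risky_value(lr_pos=0.05, lr_neg=0.15, n_trials=1000, seed=0):
    """Q-learning for a 50/50 0-or-10-cent cue with separate learning rates
    for positive and negative prediction errors (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    q = 5.0                              # start at the objective mean
    trace = []
    for _ in range(n_trials):
        reward = 10.0 if rng.random() < 0.5 else 0.0
        delta = reward - q               # prediction error
        q += (lr_pos if delta > 0 else lr_neg) * delta
        trace.append(q)
    return np.array(trace)

trace = simulate_risky_value()
# with lr_pos < lr_neg the estimate settles below the sure 5-cent value,
# close to the analytic fixed point 10 * lr_pos / (lr_pos + lr_neg) = 2.5,
# which is one way the ratio of learning rates can predict a risk-averse policy
print(trace[200:].mean(), 10 * 0.05 / (0.05 + 0.15))
```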
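Finally, the paired comparisons behind the P values in the figure can be sketched as follows: within each group and time bin, every participant contributes one choice proportion after wins and one after losses, and the two are compared with a two-tailed paired t-test. The function assumes the per-subject table produced by a helper like binned_choice_proportions above and is illustrative, not the actual analysis script.

```python
from scipy.stats import ttest_rel

def paired_win_loss_test(props, group, time_bin):
    """Two-tailed paired t-test of risky-choice proportions after wins vs. losses
    for one group in one time bin; `props` is the output of binned_choice_proportions."""
    cell = props[(props["group"] == group) & (props["bin"] == time_bin)]
    wide = cell.pivot(index="subject", columns="prev", values="p_choose_risky").dropna()
    result = ttest_rel(wide["win"], wide["loss"])   # two-tailed by default
    return result.statistic, result.pvalue
```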