Uncertainty alters the balance between incremental learning and episodic memory

Jonathan Nicholas (corresponding author), Nathaniel D Daw, Daphna Shohamy

Affiliations:

  1. Department of Psychology, Columbia University, United States
  2. Mortimer B. Zuckerman Mind, Brain, Behavior Institute, Columbia University, United States
  3. Department of Psychology, Princeton University, United States
  4. Princeton Neuroscience Institute, Princeton University, United States
  5. The Kavli Institute for Brain Science, Columbia University, United States

Abstract

A key question in decision-making is how humans arbitrate between competing learning and memory systems to maximize reward. We address this question by probing the balance between the effects, on choice, of incremental trial-and-error learning versus episodic memories of individual events. Although a rich literature has studied incremental learning in isolation, the role of episodic memory in decision-making has only recently drawn focus, and little research disentangles their separate contributions. We hypothesized that the brain arbitrates rationally between these two systems, relying on each in circumstances to which it is most suited, as indicated by uncertainty. We tested this hypothesis by directly contrasting contributions of episodic and incremental influence to decisions, while manipulating the relative uncertainty of incremental learning using a well-established manipulation of reward volatility. Across two large, independent samples of young adults, participants traded these influences off rationally, depending more on episodic information when incremental summaries were more uncertain. These results support the proposal that the brain optimizes the balance between different forms of learning and memory according to their relative uncertainties and elucidate the circumstances under which episodic memory informs decisions.

Editor's evaluation

This paper posits that higher uncertainty environments should lead to more reliance on episodic memory, finding compelling evidence for this idea across several analysis approaches and across two independent samples. This is an important paper that will be of interest to a broad group of learning, memory, and decision-making researchers.

https://doi.org/10.7554/eLife.81679.sa0

Introduction

Effective decision-making depends on using memories of past experiences to inform choices in the present. This process has been extensively studied using models of learning from trial-and-error, many of which rely on error-driven learning rules that in effect summarize experiences using a running average (Sutton and Barto, 1998; Rescorla and Wagner, 1972; Houk et al., 1995). This sort of incremental learning provides a simple mechanism for evaluating actions without maintaining memory traces of each individual experience along the way and has rich links to conditioning behavior and putative neural mechanisms for error-driven learning (Schultz et al., 1997). However, recent findings indicate that decisions may also be guided by the retrieval of individual events, a process often assumed to be supported by episodic memory (Bakkour et al., 2019; Plonsky et al., 2015; Mason et al., 2020; Bornstein et al., 2017; Collins and Frank, 2012; Bornstein and Norman, 2017; Duncan et al., 2019; Duncan and Shohamy, 2016; Lee et al., 2015; Wimmer and Büchel, 2020). Although theoretical work has suggested a role for episodic memory in initial task acquisition, when experience is sparse (Gershman and Daw, 2017; Lengyel and Dayan, 2007), the use of episodes may be much more pervasive as its influence has been detected empirically even in decision tasks that are well-trained and can be solved normatively using incremental learning alone (Plonsky et al., 2015; Bornstein et al., 2017; Bornstein and Norman, 2017). The apparent ubiquity of episodic memory as a substrate for decision-making raises questions about the circumstances under which it is recruited and the implications for behavior.

How and when episodic memory is used for decisions relates to a more general challenge in cognitive control: understanding how the brain balances competing systems for decision-making. An overarching hypothesis is that the brain judiciously adopts different decision strategies in circumstances for which they are most suited; for example, by determining which system is likely to produce the most rewarding choices at the least cost. This general idea has been invoked to explain how the brain arbitrates between deliberative versus habitual decisions and previous work has suggested a key role for uncertainty in achieving a balance that maximizes reward (Daw et al., 2005; Lee et al., 2014). Moreover, imbalances in arbitration have been implicated in dysfunction such as compulsion (Gillan et al., 2011; Voon et al., 2015), addiction (Ersche et al., 2016; Everitt and Robbins, 2005), and rumination (Hunter et al., 2022; Dayan and Huys, 2008; Huys et al., 2012).

Here, we hypothesized that uncertainty is used for effective arbitration between decision systems and tested this hypothesis by investigating the tradeoff between incremental learning and episodic memory. This is a particularly favorable setting in which to examine this hypothesis due to a rich prior literature theoretically analyzing, and experimentally manipulating, the efficacy of incremental learning in isolation. Studies of this sort typically manipulate the volatility, or frequency of change, of the environment, as a way of affecting uncertainty about incrementally learned quantities. In line with predictions made by statistical learning models, these experiments demonstrate that when the reward associated with an action is more volatile, people adapt by increasing their incremental learning rates (Behrens et al., 2007; Mathys et al., 2011; O’Reilly, 2013; Nassar et al., 2012; Nassar et al., 2010; Browning et al., 2015; Piray and Daw, 2020; Kakade and Dayan, 2002; Yu and Dayan, 2005). In this case, incrementally constructed estimates reflect a running average over fewer experiences, yielding both less accurate and more uncertain estimates of expected reward. We, therefore, reasoned that the benefits of incremental learning are most pronounced when incremental estimation can leverage many experiences or, in other words, when volatility is low. By contrast, when the environment is either changing frequently or has recently changed, estimating reward episodically by retrieving a single, well-matched experience should be relatively more favorable.
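
This tradeoff can be sketched with a minimal delta-rule learner (an illustration, not the paper's fitted model): a higher learning rate tracks reversals faster, but its estimate is a running average over fewer effective experiences and is therefore noisier.

```python
# Illustrative sketch: a delta-rule (Rescorla-Wagner style) update is an
# exponentially weighted running average of past rewards. The learning rate
# alpha sets the effective window: high alpha suits volatile settings but
# averages over fewer outcomes.

def rw_update(value, reward, alpha):
    """One delta-rule update: V <- V + alpha * (R - V)."""
    return value + alpha * (reward - value)

def run_rw(rewards, alpha, v0=0.5):
    """Apply the update over a reward sequence; return the value trajectory."""
    values = [v0]
    for r in rewards:
        values.append(rw_update(values[-1], r, alpha))
    return values

rewards = [1.0] * 10 + [0.0] * 10       # a single reversal at trial 10
slow = run_rw(rewards, alpha=0.1)       # smooth but slow to adapt
fast = run_rw(rewards, alpha=0.5)       # adapts quickly, tracks noise
# After the reversal, the high-alpha learner converges toward the new value
# much faster, at the cost of basing its estimate on fewer experiences.
```
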

We tested this hypothesis using a choice task that directly pits these decision systems against one another (Duncan et al., 2019), while manipulating volatility. In particular, we (i) independently measured the contributions of episodic memory vs. incremental learning to choice and (ii) altered the uncertainty about incremental estimates using different levels of volatility. Two large online samples of healthy young adults completed three tasks. Results from the primary sample (n = 254) are reported in the main text; results from a replication sample (n = 223) are reported in the appendices (Appendix 1).

The main task of interest combined incremental learning and episodic memory, referred to throughout as the deck learning and card memory task (Figure 1A, middle panel). On each trial of this task, participants chose between two cards of a different color and received feedback following their choice. The cards appeared on each trial throughout the task, but their relative value changed over time (Figure 1B). In addition to the color of the card, each card also displayed an object. Critically, objects appeared on a card at most twice throughout the task, such that a chosen object could reappear between 9 and 30 trials after it was chosen the first time, and would deliver the same reward. Thus, participants could make decisions based on incremental learning of the average value of the decks or based on episodic memory for the specific value of an object which they only saw once before. Additionally, participants made choices across two environments: a high-volatility and a low-volatility environment. The environments differed in how often reversals in deck value occurred.

Study design and sample events.

(A) Participants completed three tasks in succession. The first was the deck learning task that consisted of choosing between two colored cards and receiving an outcome following each choice. One color was worth more on average at any given timepoint, and this mapping changed periodically. Second was the main task of interest, the deck learning and card memory task, which followed the same structure as the deck learning task but each card also displayed a trial-unique object. Cards that were chosen could appear a second time in the task after 9–30 trials and, if they reappeared, were worth the same amount, thereby allowing participants to use episodic memory for individual cards in addition to learning deck value from feedback. Outcomes ranged from $0 to $1 in increments of 20¢ in both of these tasks. Lastly, participants completed a subsequent memory task for objects that may have been seen in the deck learning and card memory task. Participants had to indicate whether they recognized an object and, if they did, whether they chose that object. If they responded that they had chosen the object, they were then asked if they remembered the value of that object. (B) Uncertainty manipulation within and across environments. Uncertainty was manipulated by varying the volatility of the relationship between cue and reward over time. Participants completed the task in two counterbalanced environments that differed in their relative volatility. The low-volatility environment featured half as many reversals in deck luckiness as the high-volatility environment. Top: the true value of the purple deck is drawn in gray for an example trial sequence. In purple and orange are estimated deck values from the reduced Bayesian model (Nassar et al., 2010). Trials featuring objects appeared only in the deck learning and card memory task. Bottom: uncertainty about deck value as estimated by the model is shown in gray. This plot shows relative uncertainty, which is the model’s imprecision in its estimate of deck value.

In addition to the main task, participants also completed two other tasks in the experiment. First, participants completed a simple deck learning task (Figure 1A, left panel) to acclimate them to each environment and quantify the effects of uncertainty. This task included choices between two diamonds of a different color on each trial, without any trial-unique objects. Second, after the main task, participants completed a standard subsequent memory task (Figure 1A, right panel) designed to assess later episodic memory for objects encountered in the main task.

We predicted that greater uncertainty about incremental values would be related to increased use of episodic memory. The experimental design provided two opportunities to measure the impact of uncertainty: across conditions, by comparing between the high- and the low-volatility environments, and within condition, by examining how learning and choices were impacted by each reversal.

Results

Episodic memory is used more under conditions of greater volatility

As noted above, participants completed two decision-making tasks. The deck learning task familiarized them with the underlying incremental learning task and established an independent measure of sensitivity to the volatility manipulation. The separate deck learning and card memory task measured the additional influence of episodic memory on decisions (Figure 1). In the deck learning task, participants chose between two decks with expected value (V) that reversed periodically across two environments, with one more volatile (reversals every 10 trials on average) and the other less volatile (reversals every 20 trials on average).
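
The two environments' reversal structure can be sketched as a per-trial Bernoulli hazard process (an illustration; the task's actual schedules may have been generated differently, for example with constrained spacing between reversals):

```python
import random

# Illustrative sketch of the two environments: which deck is "lucky" flips
# with a fixed per-trial probability (the hazard), so a hazard of 1/10 yields
# reversals every ~10 trials on average and 1/20 every ~20 trials.

def reversal_schedule(n_trials, hazard, rng):
    """Return which deck (0 or 1) is lucky on each trial."""
    lucky, schedule = 0, []
    for _ in range(n_trials):
        if rng.random() < hazard:
            lucky = 1 - lucky
        schedule.append(lucky)
    return schedule

def n_reversals(schedule):
    return sum(a != b for a, b in zip(schedule, schedule[1:]))

rng = random.Random(1)
low_sched = reversal_schedule(10_000, hazard=1 / 20, rng=rng)   # low volatility
high_sched = reversal_schedule(10_000, hazard=1 / 10, rng=rng)  # high volatility
# Over many trials, the high-volatility schedule reverses about twice as often.
```
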

Participants were told that at any point in the experiment one of the two decks was ‘lucky,’ meaning that its expected value (Vlucky = 63¢) was higher than the other ‘unlucky’ deck (Vunlucky = 37¢). They were also told that which deck was currently lucky could reverse at any time, and that they would be completing the task in two environments that differed in how often these reversals occurred. We reasoned that, following each reversal, participants should be more uncertain about deck value and that this uncertainty should reduce with experience. Because the more volatile environment featured more reversals, in this condition subjects should have greater uncertainty about the deck value overall.

In the second deck learning and card memory task, each deck featured cards with trial-unique objects that could reappear once after being chosen and were worth an identical amount at each appearance. Here, participants were told that they could use their memory for the value of objects they recognized to guide their choices. They were also told that the relative level of volatility in each environment during the deck learning task would be identical in this task. We predicted that decisions would be based more on object value when deck value was more volatile. Our logic was that episodic memory should be relied upon more strongly when incremental learning is less accurate and reliable due to frequent change. This, in turn, is because episodic memory is itself imperfect in practice, so participants face a nontrivial tradeoff between attempting episodic recall vs. relying on incremental learning when an object recurs. We, therefore, expected choices to be more reliant on episodic memory in the high- compared to the low-volatility environment.

We first examined whether participants were separately sensitive to each source of value in the deck learning and card memory task: the value of the objects (episodic) and of the decks (incremental). Controlling for average deck value, we found that participants used episodic memory for object value, evidenced by a greater tendency to choose high-valued old objects than low-valued old objects (βOldValue=0.621,95%CI=[0.527,0.713]; Figure 2A). Likewise, controlling for object value, we also found that participants used incrementally learned value for the decks, evidenced by the fact that the higher-valued (lucky) deck was chosen more frequently on trials immediately preceding a reversal (βt-4=0.038,95%CI=[0.038,0.113]; βt-3=0.056,95%CI=[0.02,0.134]; βt-2=0.088,95%CI=[0.009,0.166]; βt-1=0.136,95%CI=[0.052,0.219]; Figure 2B), that this tendency was disrupted by the reversals (βt=0=-0.382,95%CI=[-0.465,-0.296]), and by the quick recovery of performance on the trials following a reversal (βt+1=-0.175,95%CI=[-0.258,-0.095]; βt+2=-0.106,95%CI=[-0.18,-0.029]; βt+3=0.084,95%CI=[0.006,0.158]; βt+4=0.129,95%CI=[0.071,0.184]).

Figure 2 with 1 supplement
Evaluating the proportion of incremental and episodic choices.

(A) Participants’ (n = 254) choices demonstrate sensitivity to the value of old objects. Group-level averages are shown as points and lines represent 95% confidence intervals. (B) Reversals in deck luckiness altered choice such that the currently lucky deck was chosen less following a reversal. The line represents the group-level average, and the band represents the 95% confidence interval. (C) On incongruent trials, choices were more likely to be based on episodic memory (e.g., high-valued objects chosen and low-valued objects avoided) in the high- compared to the low-volatility environment. Averages for individual subjects are shown as points, and lines represent the group-level average with a 95% confidence interval. (D) Median reaction time was longer for incongruent choices based on episodic memory compared to those based on incremental learning.

Having established that both episodic memory and incremental learning guided choices, we next sought to determine the impact of volatility on episodic memory for object value by isolating trials on which episodic memory was most likely to be used. To identify reliance on object value, we first focused on trials where the two sources of value information were incongruent: that is, trials for which the high-value deck featured an old object that was of low value (<50¢) or the low-value deck featured an old object that was of high value (>50¢). We then defined an episodic-based choice index (EBCI) by considering a choice as episodic if the old object was, in the first case, avoided or, in the second case, chosen. Consistent with our hypothesis, we found greater evidence for episodic choices in the high-volatility environment compared to the low-volatility environment (βEnv=0.092,95%CI=[0.018, 0.164]; Figure 2C). Finally, this analysis also gave us the opportunity to test differences in reaction time between incremental and episodic decisions. Decisions based on episodic value took longer (βEBCI=37.629,95%CI=[28.488,46.585]; Figure 2D), perhaps reflecting that episodic retrieval may take more time than retrieval of cached incremental value.

Uncertainty about incremental values increases sensitivity to episodic value

The effects of environment described above provide a coarse index of overall differences in learning across conditions. To capture uncertainty about deck value on a trial-by-trial basis, we adopted a computational model that tracks uncertainty during learning. We then used this model to test our central hypothesis: that episodic memory is used more when posterior uncertainty about deck value is high. Our reasoning was that episodic memory should not only be deployed more when incremental learning is overall inaccurate due to frequent change, but also within either condition following recent change. We, therefore, predicted that, across both environments, participants would be more likely to recruit episodic memory following reversals in deck value, when uncertainty is at its highest.

We began by hierarchically fitting two classes of incremental learning models to the behavior on the deck learning task: a baseline model with a Rescorla–Wagner (Rescorla and Wagner, 1972) style update (RW) and a reduced Bayesian model (Nassar et al., 2010) (RB) that augments the RW learner with a variable learning rate, which it modulates by tracking ongoing uncertainty about deck value. This approach – which builds on a line of work applying Bayesian learning models to capture trial-by-trial modulation in uncertainty and learning rates in volatile environments (Behrens et al., 2007; Mathys et al., 2011; Nassar et al., 2010; Piray and Daw, 2020; Kakade and Dayan, 2002; Yu and Dayan, 2005) – allowed us to first assess incremental learning free of any contamination due to competition with episodic memory. We then used the parameters fit to this task for each participant to generate estimates of subjective deck value and uncertainty around deck value, out of sample, in the deck learning and card memory task. These estimates were then used alongside episodic value to predict choices on incongruent trials in the deck learning and card memory task.

We first tested whether participants adjusted their rates of learning in response to uncertainty, both between environments and due to trial-wise fluctuations in uncertainty about deck value. We did this by comparing the ability of each combined choice model to predict participants’ decisions out of sample. To test for effects between environments, we compared models that controlled learning with either a single free parameter (for RW, a learning rate α; for RB, a hazard rate H capturing the expected frequency of reversals) shared across both environments or models with a separate free parameter for each environment. To test for trial-wise effects within environments, we compared between RB and RW models: while RW updates deck value with a constant learning rate, RB tracks ongoing posterior uncertainty about deck value (called relative uncertainty, RU) and increases its learning rate when this quantity is high.
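
A sketch of the RB update in the spirit of Nassar et al. (2010) is shown below. The learning rate combines change-point probability (CPP) with relative uncertainty (RU), so both surprising outcomes and recent reversals speed learning. The likelihood terms and parameter values here are illustrative, not the paper's fitted specification.

```python
import math

# Sketch of a reduced Bayesian (RB) learner: a delta rule whose learning rate
# rises with change-point probability (CPP) and relative uncertainty (RU).

def rb_update(belief, ru, outcome, hazard, sigma2, outcome_range=1.0):
    """One RB update; returns (new_belief, new_ru, learning_rate)."""
    delta = outcome - belief                           # prediction error
    pred_var = sigma2 / (1.0 - ru)                     # predictive variance grows with RU
    p_same = math.exp(-delta ** 2 / (2 * pred_var)) / math.sqrt(2 * math.pi * pred_var)
    p_change = 1.0 / outcome_range                     # outcomes ~uniform after a reversal
    cpp = hazard * p_change / (hazard * p_change + (1 - hazard) * p_same)
    lr = cpp + (1 - cpp) * ru                          # higher CPP or RU -> faster learning
    new_belief = belief + lr * delta
    # Posterior relative uncertainty: belief variance relative to total variance
    num = cpp * sigma2 + (1 - cpp) * ru * sigma2 + cpp * (1 - cpp) * (delta * (1 - ru)) ** 2
    new_ru = num / (num + sigma2)
    return new_belief, new_ru, lr

# An expected outcome barely moves the belief; a surprising one (as after a
# reversal) raises CPP and RU, so the model updates faster on the next trial.
_, ru_expected, lr_expected = rb_update(0.63, 0.1, 0.60, hazard=0.08, sigma2=0.04)
_, ru_surprise, lr_surprise = rb_update(0.63, 0.1, 0.05, hazard=0.08, sigma2=0.04)
```
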

We also included two other models in our comparison to control for alternative learning strategies. The first was a contextual inference model (CI), which modeled deck value as arising from two switching contexts (either that one deck was lucky and the other unlucky or vice versa) rather than from incremental learning. The second was a Rescorla–Wagner model that, like the RB model but unlike the RW models described above, learned only a single-value estimate (RW1Q). The details for all models can be found in Appendix 3.

Participants were sensitive to the volatility manipulation and also incorporated uncertainty into updating their beliefs about deck value. This is indicated by the fact that the RB combined choice model that included a separate hazard rate for each environment (RB2H) outperformed both RW models, the RB model with a single hazard rate, as well as other alternative learning models (Figure 3A). Further, across the entire sample, participants detected higher levels of volatility in the high-volatility environment, as indicated by the generally larger hazard rates recovered from this model in the high- compared to the low-volatility environment (HLow=0.04,95%CI=[0.033,0.048]; HHigh=0.081,95%CI=[0.067,0.097]; Figure 3B). Next, we examined the model’s ability to estimate uncertainty as a function of reversals in deck luckiness. Compared to an average of the four trials prior to a reversal, RU increased immediately following a reversal and stabilized over time (βt=0=0.014,95%CI=[0.019,0.048]; βt+1=0.242,95%CI=[0.209,0.276]; βt+2=0.145,95%CI=[0.112,0.178]; βt+3=0.1,95%CI=[0.07,0.131]; βt+4=0.079,95%CI=[0.048,0.108]; Figure 3C). As expected, RU was also, on average, greater in the high- compared to the low-volatility environment (βEnv=0.015,95%CI=[0.012,0.018]). Lastly, we were interested in assessing the relationship between reaction time and RU as we expected that higher uncertainty may be reflected in more time needed to resolve decisions. In line with this idea, RU was strongly related to reaction time such that choices made under more uncertain conditions took longer (βRU=1.685,95%CI=[0.823,2.528]).

Figure 3 with 1 supplement
Evaluating model fit and sensitivity to volatility.

(A) Expected log pointwise predictive density (ELPD) from each model was calculated from a 20-fold leave-N-subjects-out cross-validation procedure and is shown here subtracted from the best-fitting model. The best-fitting model was the reduced Bayesian (RB) model with two hazard rates (2H) and sensitivity to the interaction between old object value and relative uncertainty (RU) in the choice function. Error bars represent standard error around ELPD estimates. (B) Participants (n = 254) were sensitive to the relative level of volatility in each environment as measured by the hazard rate. Group-level parameters are superimposed on individual subject parameters. Error bars represent 95% posterior intervals. The true hazard rate for each environment is shown on the interior of the plot. (C) RU peaks on the trial following a reversal and is greater in the high- compared to the low-volatility environment. Lines represent group means, and bands represent 95% confidence intervals.

Having established that participants were affected by uncertainty around beliefs about deck value, we turned to examine our primary question: whether this uncertainty alters the use of episodic memory in choices. We first examined effects of RU on the episodic choice index, which measures choices consistent with episodic value on trials when it disagrees with incremental learning. This analysis verified that episodic memory was used more on incongruent trial decisions made under conditions of high RU (βRU=2.133,95%CI=[0.7,3.535]; Figure 4A). To more directly test the prediction that participants would use episodic memory when uncertainty is high, we included trial-by-trial estimates of RU in the RB2H combined choice model, which was augmented with an additional free parameter to capture any change with RU in the effect of episodic value on choice. Formally, this parameter measured an effect of the interaction between these two factors, and the more positive this term the greater the impact of increased uncertainty on the use of episodic memory. This new combined choice model further improved out-of-sample predictions (RB2H+RU, Figure 3A). As predicted, while both incremental and episodic value were used overall (βDeckValue=0.502,95%CI=[0.428,0.583]; βOldValue=0.150,95%CI=[0.101, 0.20]), episodic value impacted choices more when RU was high (βOldValue:RU=0.067,95%CI=[0.026,0.11]; Figure 4B) and more generally in the high- compared to the low-volatility environment (βOldValue:Env=0.06,95%CI=[0.02,0.1]). This is consistent with the hypothesis that episodic value was relied on more when beliefs about incremental value were uncertain.
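
The shape of this interaction can be sketched as a logistic choice rule in which the weight on episodic (old-object) value grows with RU. The coefficients below are illustrative only; the paper's full model also includes bias terms and environment effects.

```python
import math

# Sketch of the combined choice model's decision rule: logistic choice over
# the difference in incremental deck values plus an episodic term whose
# weight scales with relative uncertainty (RU).

def p_choose_left(deck_left, deck_right, old_value_signed, ru,
                  b_deck=0.5, b_old=0.15, b_old_ru=0.07):
    """old_value_signed: centered old-object value, positive if the old object
    appears on the left deck and negative if it appears on the right."""
    episodic_weight = b_old + b_old_ru * ru        # episodic influence grows with RU
    logit = b_deck * (deck_left - deck_right) + episodic_weight * old_value_signed
    return 1.0 / (1.0 + math.exp(-logit))

# An incongruent trial: a high-valued old object sits on the lower-valued
# left deck. Its episodic pull on choice is stronger when RU is high.
p_low_ru = p_choose_left(0.4, 0.6, old_value_signed=1.0, ru=0.1)
p_high_ru = p_choose_left(0.4, 0.6, old_value_signed=1.0, ru=0.9)
```
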

Figure 4 with 2 supplements
Evaluating effects of sensitivity to uncertainty on episodic choices.

(A) Participants’ (n = 254) degree of episodic-based choice increased with greater relative uncertainty (RU) as predicted by the combined choice model. Points are group means, and error bars are 95% confidence intervals. (B) Estimates from the combined choice model. Participants were biased to choose previously seen objects regardless of their value and were additionally sensitive to their value. As hypothesized, this sensitivity was increased when RU was higher, as well as in the high- compared to the low-volatility environment. There was no bias to choose one deck color over the other, and participants were highly sensitive to estimated deck value. Group-level parameters are superimposed as bars on individual subject parameters represented as points. Error bars represent 95% posterior intervals around group-level parameters. Estimates are shown in standard units.

The analyses above focus on uncertainty present at the time of retrieving episodic value because this is what we hypothesized would drive competition in the reliance on either system at choice time. However, in principle, reward uncertainty at the time an object is first encountered might also affect its encoding, and hence its subsequent use in episodic choice when later retrieved (Rouhani et al., 2018). To address this possibility, we looked at the impact of RU at the time an old object’s value was first revealed on whether that object was later retrieved for a decision. Using our EBCI, there was no relationship between the use of episodic memory on incongruent trial decisions and RU at encoding (βRU=0.622,95%CI=[-0.832,2.044]; Figure 4—figure supplement 2A). Similarly, we also examined effects of trial-by-trial estimates of RU at encoding time in the combined choice model by adding another free parameter that captured change with RU at encoding time in the effect of episodic value on choice. This parameter was added alongside the effect of RU at retrieval time (from the previous analysis). There was no effect on choice in either sample (main: βOldValue:RU=0.028,95%CI=[-0.011,0.067]; replication: βOldValue:RU=-0.003,95%CI=[-0.046,0.037]; Figure 4—figure supplement 2B) and the inclusion of this parameter did not provide a better fit to subjects’ choices than the combined choice model with only increased sensitivity due to RU at retrieval time (Figure 4—figure supplement 2C).

Episodic and incremental value sensitivity predicts subsequent memory performance

Having determined that decisions depended on episodic memory more when uncertainty about incremental value was higher, we next sought evidence for similar effects on the quality of episodic memory. Episodic memory is, of course, imperfect, and value estimates derived from episodic memory are therefore also uncertain. More uncertain episodic memory should then be disfavored while the influence of incremental value on choice is promoted instead. Although in this study we did not experimentally manipulate the strength of episodic memory, as our volatility manipulation was designed to affect the uncertainty of incremental estimates, we did measure memory strength in a subsequent memory test. Thus, we predicted that participants who base fewer decisions on object value and more decisions on deck value should have poorer subsequent memory for objects from the deck learning and card memory task.

We first assessed subsequent memory performance. Participants’ recognition memory was well above chance (β0=1.887,95%CI=[1.782,1.989]), indicating a general ability to discriminate objects seen in the main task from those that were new. Recall for the value of previously seen objects was also well predicted by their true value (βTrueValue=0.174,95%CI=[0.160,0.188]), providing further support that episodic memory was used to encode object value. To underscore this point, we sorted subsequent memory trials according to whether an object was seen on an episodic- or incremental-based choice, as estimated according to our EBCI, during the deck learning and card memory task. Not only were objects from episodic-based choices better remembered than those from incremental-based choices (βEBCI=0.192,95%CI=[0.072,0.322]; Figure 5A), but value recall was also improved for these objects (βEBCI:TrueValue=0.047,95%CI=[0.030,0.065]; Figure 5B).

Figure 5 with 2 supplements
Relationship between choice type and subsequent memory.

(A) Objects originally seen during episodic-based choices were better remembered than objects seen during incremental-based choices. Average hit rates for individual subjects (n = 254) are shown as points, bars represent the group-level average, and lines represent 95% confidence intervals. (B) The value of objects originally seen during episodic-based choices was better recalled than objects seen during incremental-based choices. Points represent average value memory for each possible object value, and error bars represent 95% confidence intervals. Lines are linear fits, and bands are 95% confidence intervals. (C) Participants with greater sensitivity to episodic value as measured by random effects in the combined choice model tended to better remember objects seen originally in the deck learning and card memory task. (D) Participants with greater sensitivity to incremental value tended to have worse memory for objects from the deck learning and card memory task. Points represent individual participants, lines are linear fits, and bands are 95% confidence intervals.

We next leveraged the finer-grained estimates of sensitivity to episodic value from the learning model to ask whether, across participants, individuals who were estimated to deploy episodic value more during the deck learning and card memory task also performed better on the subsequent memory test. In line with the idea that episodic memory quality also impacts the relationship between incremental learning and episodic memory, participants with better subsequent recognition memory were more sensitive to episodic value (βEpSensitivity=0.373,95%CI=[0.273,0.478]; Figure 5C), and these same participants were less sensitive to incremental value (βIncSensitivity=-0.276,95%CI=[-0.383,-0.17]; Figure 5D). This result provides further evidence for a tradeoff between episodic memory and incremental learning. It also provides preliminary support for a broader version of our hypothesis, which is that uncertainty about value provided by either memory system arbitrates the balance between them.

Lastly, the subsequent memory task also provided us with the opportunity to replicate other studies that have found that prediction error and its related quantities enhance episodic memory across a variety of tasks and paradigms (Rouhani et al., 2018; Rouhani and Niv, 2021; Antony et al., 2021; Ben-Yakov et al., 2022). We predicted that participants should have better subsequent memory for objects encoded under conditions of greater uncertainty. While not our primary focus, we found support for this prediction across both samples (see Appendix 2, Figure 5—figure supplement 2).

Replication of the main results in a separate sample

We repeated the tasks described above in an independent online sample of healthy young adults (n = 223) to test the replicability and robustness of our findings. We replicated all effects of environment and relative uncertainty (RU) on episodic-based choice and subsequent memory (see Appendix 1 and figure supplements for details).

Discussion

Research on learning and value-based decision-making has focused on how the brain summarizes experiences by error-driven incremental learning rules that, in effect, maintain the running average of many experiences. While recent work has demonstrated that episodic memory also contributes to value-based decisions (Bakkour et al., 2019; Plonsky et al., 2015; Mason et al., 2020; Bornstein et al., 2017; Collins and Frank, 2012; Bornstein and Norman, 2017; Duncan et al., 2019; Duncan and Shohamy, 2016; Lee et al., 2015; Wimmer and Büchel, 2020), many open questions remain about the circumstances under which episodic memory is used. We used a task that directly contrasts episodic and incremental influences on decisions and found that participants traded these influences off rationally, relying more on episodic information when incremental summaries were less reliable, that is, more uncertain and based on fewer experiences. We also found evidence for a complementary modulation of this episodic-incremental balance by episodic memory quality, suggesting that more uncertain episodic-derived estimates may reduce reliance on episodic value. Together, these results indicate that reward uncertainty modulates the use of episodic memory in decisions, suggesting that the brain optimizes the balance between different forms of learning according to volatility in the environment.

Our findings add empirical data to previous theoretical and computational work, which has suggested that decision-making can greatly benefit from episodic memory for individual estimates when available data are sparse. This situation most obviously arises early in learning a new task, but also in task transfer, in high-dimensional or non-Markovian environments, and (as demonstrated in this work) during conditions of rapid change (Lengyel and Dayan, 2007; Blundell, 2016; Santoro et al., 2016). We investigated these theoretical predictions in the context of human decision-making, testing whether humans rely more heavily on episodic memory when incremental summaries comprising multiple experiences are relatively poor. We operationalized this tradeoff in terms of uncertainty, exemplifying a more general statistical scheme for arbitrating between different decision systems by treating them as estimators of action value.

There is precedent for this type of uncertainty-based arbitration in the brain, with the most well-known being the tradeoff between model-free learning and model-based learning (Daw et al., 2005; Keramati et al., 2011). Control over decision-making by model-free and model-based systems has been found to shift in accordance with the accuracy of their respective predictions (Lee et al., 2014), and humans adjust their reliance on either system in response to external conditions that provide a relative advantage to one over the other (Simon and Daw, 2011; Kool et al., 2016; Otto et al., 2013). Tracking uncertainty provides useful information about when inaccuracy is expected and helps to maximize utility by deploying whichever system is best at a given time. Our results add to these findings and expand their principles to include episodic memory in this tradeoff. This may be especially important given that human memory is resource limited and prone to distortion (Schacter et al., 2011) and forgetting (Ebbinghaus, 2013). Notably, in our task, an observer equipped with perfect episodic memory would always benefit from using it to make decisions. Yet, as our findings show, participants vary in their episodic memory abilities, and this memory capacity is related to the extent to which episodic memory is used to guide decisions.

One intriguing possibility is that there is more than just an analogy between the incremental-episodic balance studied here and previous work on model-free versus model-based competition. Incremental error-driven learning coincides closely with model-free learning in other settings (Schultz et al., 1997; Daw et al., 2005) and, although it has been proposed that episodic control constitutes a ‘third way’ (Lengyel and Dayan, 2007), it is possible that behavioral signatures of model-based learning might instead arise from episodic control via covert retrieval of individual episodes (Gershman and Daw, 2017; Hassabis and Maguire, 2009; Schacter et al., 2012; Vikbladh et al., 2017), which contain much of the same information as a cognitive map or world model. While this study assessed single-event episodic retrieval more overtly, an open question for future work is the extent to which these same processes, and ultimately the same episodic-incremental tradeoff, might also explain model-based choice as it has been operationalized in other decision tasks. A related line of work has emphasized a similar role for working memory in maintaining representations of individual trials for choice (Collins and Frank, 2012; Yoo and Collins, 2022; Collins, 2018; Collins and Frank, 2018). Given the capacity constraints of working memory, we think it unlikely that working memory can account for the effects shown here, which involve memory for dozens of trial-unique stimuli maintained over tens of trials.

Our findings also help clarify the impacts of uncertainty, novelty, and prediction error on episodic memory. Recent studies found that new episodes are more likely to be encoded under novel circumstances while prior experiences are more likely to be retrieved when conditions are familiar (Duncan et al., 2019; Duncan and Shohamy, 2016; Duncan et al., 2012; Hasselmo, 2006). Shifts between these states of memory are thought to be modulated by one’s focus on internal or external sources of information (Decker and Duncan, 2020; Tarder-Stoll et al., 2020) and signaled by prediction errors based in episodic memory (Bein et al., 2020; Chen et al., 2015; Sinclair and Barense, 2018; Greve et al., 2017). Relatedly, unsigned prediction errors, which are a marker of surprise, improve later episodic memory (Rouhani et al., 2018; Rouhani and Niv, 2021; Antony et al., 2021; Ben-Yakov et al., 2022). Findings have even suggested that states of familiarity and novelty can bias decisions toward the use of single past experiences or not (Duncan et al., 2019; Duncan and Shohamy, 2016).

One alternative hypothesis that emerges from this work is that change-induced uncertainty and novelty could exert similar effects on memory, such that novelty signaled by expectancy violations increases encoding in a protracted manner that dwindles as uncertainty is resolved, or the state of the environment becomes familiar. Our results provide mixed support for this interpretation. While subsequent memory was improved by the presence of uncertainty at encoding, as would be predicted by this work, there was little effect of uncertainty at encoding time on the extent to which decisions were guided by individual memories. It, therefore, seems likely that uncertainty and novelty operate in concert but exert different effects over decision-making, an interpretation supported by recent evidence (Xu et al., 2021).

This work raises further questions about the neurobiological basis of memory-based decisions and the role of neuromodulation in signaling uncertainty and aiding memory. In particular, studies have revealed distinct roles for norepinephrine (NE) and acetylcholine (ACh) in uncertainty and learning. These findings suggest that volatility, as defined here, is likely to impact the noradrenergic modulatory system, which has been found to signal unexpected changes throughout learning (Nassar et al., 2012; Yu and Dayan, 2005; Yu and Dayan, 2002; Zhao et al., 2019). Noradrenergic terminals densely innervate the hippocampus (Schroeter et al., 2000), and a role for NE in both explicit memory formation (Grella et al., 2019) and retrieval (Murchison et al., 2004) has been posited. Future studies involving a direct investigation of NE or an indirect investigation using pupillometry (Nassar et al., 2012) may help to isolate its contributions to the interaction between incremental learning and episodic memory in decision-making. ACh is also important for learning and memory: it facilitates memory formation in the hippocampus, which may contribute to its role in separating and storing new experiences (Hasselmo, 2006; Decker and Duncan, 2020). In addition to this role, ACh is heavily involved in incremental learning and has been widely implicated in signaling expected uncertainty (Yu and Dayan, 2002; Bland and Schaefer, 2012). ACh may therefore play an important part in managing the tradeoff between incremental learning and episodic memory.

Indeed, while in this work we investigated the impact of uncertainty on learning using a well-established manipulation of environmental volatility, in general (and even in this task) uncertainty also arises from many other parameters of the environment, such as stochasticity (trial-wise outcome variance) (Piray and Daw, 2021). It remains to be seen whether similar results would be observed using other types of manipulations targeting uncertainty. In our task, the outcome variance was held constant, making it difficult to isolate the effects of stochasticity on participants’ subjective experience of uncertainty. The decision to focus on volatility was based on a rich prior literature demonstrating that volatility manipulations are a reliable means to modulate uncertainty in incremental learning (Behrens et al., 2007; Mathys et al., 2011; O’Reilly, 2013; Nassar et al., 2012; Nassar et al., 2010; Browning et al., 2015; Piray and Daw, 2020). Nonetheless, altering outcome variance to capture effects of stochasticity on episodic memory remains a critical avenue for further study. Still other attributes of the learning environment, like valence, have been shown to impact both uncertainty estimation (Aylward et al., 2019; Pulcu and Browning, 2019) and subsequent memory (Rosenbaum et al., 2022; Kensinger, 2004). It remains an open question how the valence of outcomes may impact the effects we observed here.

Further, another interpretation of this work is that, rather than capturing a tradeoff between multiple memory systems, our task could be accomplished by a single system learning about, and dynamically weighting, independent features. Specifically, here we operationalized incremental learning as learning about a feature shared across multiple events (deck color) and episodic memory as learning about a trial-unique feature (an object that could be repeated once). Shifting attention between these independent features whenever one is less reliable could then yield similar behavior to arbitrating between incremental learning and episodic memory as we have posited here. While a scheme like this is possible, much prior work (Duncan et al., 2019; Lee et al., 2015; Poldrack et al., 2001; Packard and McGaugh, 1996; McDonald and White, 1994; Wimmer et al., 2014) indicates that multiple memory systems (differentiated by numerous other behavioral and neural signatures) are involved in the types of repeated vs. one-shot learning measured here. Moreover, our subsequent memory findings that individual objects and their associated value were better remembered from putatively episodic choices lend additional support to the idea that episodic memory is used throughout the task. Nevertheless, more work is needed to distinguish between these alternatives and verify the connection between our task and other signatures of incremental vs. episodic memory.

For example, while in this study we disadvantaged incremental learning relative to episodic memory, similar predictions about their balance could be made by instead preferentially manipulating episodic memory through effects such as interference, recency, or primacy. Another direction would be to look to the computational literature for additional task circumstances in which there are theoretical benefits to deploying episodic memory, and where incremental learning is generally ill suited, such as in environments that are high dimensional or require planning far into the future (Gershman and Daw, 2017). In principle, the brain can use episodic memory to precisely target individual past experiences in these situations depending on the relevance of their features to decisions in the present. Recent advances in computational neuroscience have, for example, demonstrated that artificial agents endowed with episodic memory are able to exploit its rich representation of past experience to make faster, more effective decisions (Lengyel and Dayan, 2007; Blundell, 2016; Santoro et al., 2016). While here we provided episodic memory as an alternative source of value to be used in the presence of uncertainty about incremental estimates, future studies making use of paradigms tailored more directly toward episodic memory’s assets will help to further elucidate how and when the human brain recruits episodic memory for decisions.

Finally, it is worth noting that many individuals, in both the main and replication samples, failed to meet our baseline performance criterion of altering the incremental learning rate between the low- and high-volatility environments (see ‘Materials and methods’). It is unclear whether this insensitivity to volatility was due to the limitations of online data collection, such as inattentiveness, or whether it is a more general feature of human behavior. While the low-volatility environment used here had half as many reversals as the high-volatility environment, it was still much more volatile than some environments used previously to study the effects of volatility on incremental learning (e.g., entirely stable environments; Behrens et al., 2007). Thus, the relatively subtle difference between environments may also have contributed to some participants’ volatility insensitivity.

In conclusion, we have demonstrated that uncertainty induced by volatile environments impacts whether incremental learning or episodic memory is recruited for decisions. Greater uncertainty increased the likelihood that single experiences were retrieved for decision-making. This effect suggests that episodic memory aids decision-making when simpler sources of value are less accurate. By focusing on uncertainty, our results shed light on the exact circumstances under which episodic memory is used for decision-making.

Materials and methods

Experimental tasks

Request a detailed protocol

The primary experimental task used here builds upon a paradigm previously developed by our lab (Duncan et al., 2019) to measure the relative contribution of incremental and episodic memory to decisions (Figure 1A). Participants were told that they would be playing a card game where their goal was to win as much money as possible. Each trial consisted of a choice between two decks of cards that differed based on their color (shown in Figure 1 as purple and orange). Participants had 2 s to decide between the decks; once they made a choice, a green box was displayed around the chosen deck until the full 2 s had passed. The outcome of each decision was then immediately displayed for 1 s. Following each decision, participants were shown a fixation cross during the intertrial interval period that varied in length (mean = 1.5 s, min = 1 s, max = 2 s). Decks were equally likely to appear on either side of the screen (left or right) on each trial, and screen side was not predictive of outcomes. Participants completed a total of 320 trials and were given a 30 s break every 80 trials.

Participants were made aware that there were two ways they could earn bonus money throughout the task, which allowed for the use of incremental and episodic memory, respectively. First, at any point in the experiment one of the two decks was ‘lucky,’ meaning that the expected value (V) of one deck color was higher than the other (Vlucky = 63¢, Vunlucky = 37¢). Outcomes ranged from $0 to $1 in increments of 20¢. Critically, the mapping from V to deck color underwent an unsignaled reversal periodically throughout the experiment (Figure 1B), which incentivized participants to utilize each deck’s recent reward history in order to determine the identity of the currently lucky deck. Each participant completed the task over two environments (with 160 trials in each) that differed in their relative volatility: a low-volatility environment with 8 V reversals, occurring every 20 trials on average, and a high-volatility environment with 16 V reversals, occurring every 10 trials on average. Reversal trials in each environment were determined by generating a list of bout lengths (high volatility: 16 bouts between 6 trials minimum and 14 trials maximum; low volatility: 8 bouts between 15 trials minimum and 24 trials maximum) at the beginning of the task and then randomizing this list for each participant. Participants were told that they would be playing in two different casinos and that in one casino deck luckiness changed less frequently while in the other deck luckiness changed more frequently. Participants were also made aware of which casino they were currently in by a border on the screen, with a solid black line indicating the low-volatility casino and a dashed black line indicating the high-volatility casino. The order in which the environments were seen was counterbalanced across participants.
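The reversal schedule just described can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name and the rejection-sampling approach to satisfying the sum constraint are our own assumptions, while bout counts and length bounds are taken from the text.

```python
import random

def make_reversal_schedule(n_bouts, min_len, max_len, n_trials, seed=0):
    """Draw a list of bout lengths summing to n_trials, then shuffle it.

    A 'bout' is a stretch of trials between unsignaled reversals of deck
    luckiness. Per the text: high volatility = 16 bouts of 6-14 trials,
    low volatility = 8 bouts of 15-24 trials, 160 trials per environment.
    """
    rng = random.Random(seed)
    # Rejection-sample until the bouts exactly fill the environment
    # (one simple way to satisfy the sum constraint; assumed, not stated).
    while True:
        bouts = [rng.randint(min_len, max_len) for _ in range(n_bouts)]
        if sum(bouts) == n_trials:
            break
    rng.shuffle(bouts)  # randomize bout order per participant
    return bouts

high = make_reversal_schedule(n_bouts=16, min_len=6, max_len=14, n_trials=160)
low = make_reversal_schedule(n_bouts=8, min_len=15, max_len=24, n_trials=160, seed=1)
```

Shuffling a fixed list of bout lengths, rather than drawing reversals independently per trial, guarantees every participant experiences exactly 16 (or 8) reversals per environment.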

Second, in order to allow us to assess the use of episodic memory throughout the task, each card within a deck featured an image of a trial-unique object that could reappear once throughout the experiment after initially being chosen. Participants were told that if they encountered a card a second time it would be worth the same amount as when it was first chosen, regardless of whether its deck color was currently lucky or not. On a given trial t, cards chosen once from trials t-9 through t-30 had a 60% chance of reappearing, following a sampling procedure designed to prevent each deck’s expected value from becoming skewed by choice, minimize the correlation between the expected value of previously seen cards and deck expected value, and ensure that the average value of reappearing cards remained close to 50¢. Specifically, outcomes for each deck were drawn from a pseudorandom list of deck values that was generated at the start of the task, sampled without replacement, and repopulated after each reversal. Previously seen cards were then sampled using the following procedure: (i) a list was generated of objects from the past 9–30 trials whose values matched an outcome remaining in the current list of potential deck outcomes; (ii) the list was narrowed down to objects whose value was incongruent with the current expected value of their associated deck, if such objects were available; and (iii) if the average value of objects shown to a participant so far was greater than 50¢, the object with the lowest value was shown; otherwise, an object was randomly sampled without replacement. This sampling procedure is identical to that used previously in Duncan et al., 2019.
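The three-step sampling rule above can be made concrete with a short sketch. The names and data structures here are hypothetical (the text does not specify an implementation), and `shown_mean` stands in for the running average value of old cards already shown to the participant.

```python
import random

def sample_old_card(candidates, deck_outcomes_left, shown_mean, rng=None):
    """Pick which previously chosen card reappears, following the text.

    candidates: (object_id, value, congruent) tuples for objects chosen
    once on trials t-9 through t-30, where `congruent` means the object's
    value agrees with its deck's current expected value.
    """
    rng = rng or random.Random(0)
    # (i) keep objects whose value matches an outcome still available
    pool = [c for c in candidates if c[1] in deck_outcomes_left]
    if not pool:
        return None
    # (ii) prefer value-incongruent objects when any exist
    incongruent = [c for c in pool if not c[2]]
    if incongruent:
        pool = incongruent
    # (iii) pull the average shown value back toward 50 cents
    if shown_mean > 0.50:
        return min(pool, key=lambda c: c[1])
    return rng.choice(pool)
```

Step (ii) is what creates the incongruent trials used later to dissociate episodic from incremental value, while step (iii) keeps the old-card stream from becoming systematically lucrative.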

Participants also completed a separate decision-making task prior to the combined deck learning and card memory task that was identical in design but lacked trial-unique objects on each card. This task, the deck learning task, was designed to isolate the sole contribution of incremental learning to decisions and to allow participants to gain prior experience with each environment’s volatility level. In this task, all participants first saw the low-volatility environment followed by the high-volatility environment in order to emphasize the relative increase in the high-volatility environment. Participants completed the combined deck learning and card memory task immediately following completion of the deck learning task and were told that the likelihood of deck luckiness reversals in each environment would be identical for both the deck learning task and the deck learning and card memory task. Instructions were presented immediately prior to each task, and participants completed five practice trials and a comprehension quiz prior to starting each.

Following completion of the combined deck learning and card memory task, we tested participants’ memory for the trial-unique objects. Participants completed 80 memory trials, each consisting of up to three parts. An object was first displayed on the screen, and participants were asked whether or not they had previously seen the object, with five response options: Definitely New, Probably New, Don’t Know, Probably Old, and Definitely Old. If the participant indicated that they had not seen the object before or did not know, they moved on to the next trial. If, however, they indicated that they had seen the object before, they were then asked if they had chosen the object or not. Lastly, if they responded that they had chosen the object, they were asked what the value of that object was (with options spanning each of the six possible object values between $0 and $1). Of the 80 trials, 48 were previously seen objects and 32 were new objects that had not been seen before. Of the 48 previously seen objects, half were sampled from each environment (24 each) and, of these, an equal number were taken from each possible object value (with 4 from each value in each environment). As with the decision-making tasks, participants were required to pass a comprehension quiz prior to starting the memory task.

All tasks were programmed using the jsPsych JavaScript library (de Leeuw, 2015) and hosted on a Google Cloud server running Apache and the Ubuntu operating system. Object images were selected from publicly available stimulus sets (Konkle and Oliva, 2012; Brady et al., 2008) for a total of 665 unique objects that could appear in each run of the experiment.

Participants

A total of 418 participants between the ages of 18 and 35 were recruited for our main sample through Amazon Mechanical Turk using the Cloud Research Approved Participants feature (Litman et al., 2017). Recruitment was restricted to the United States, and $9 compensation was provided following completion of the 50 min experiment. Participants were also paid a bonus in proportion to their final combined earnings on both the training task and the combined deck learning and card memory task (total earnings/100). Before starting each task, all participants were required to score 100% on a quiz that tested their comprehension of the instructions and were made to repeat the instructions until this score was achieved. Informed consent was obtained with approval from the Columbia University Institutional Review Board.

From the initial pool, participants were excluded from analysis on the deck learning and card memory task if they (i) responded to fewer trials than the group average minus 1 standard deviation on the deck learning and card memory task, (ii) responded faster than the group average minus 1 standard deviation on this task, or (iii) did not demonstrate faster learning in the high- compared to the low-volatility environment on the independent deck learning task. Our reasoning for this latter decision was that it is only possible to test for effects of volatility on episodic memory recruitment in participants who were sensitive to the difference in volatility between the environments, and it is well-established that a higher learning rate should be used in more volatile conditions (Behrens et al., 2007). Further, our independent assessment of deck learning was designed to avoid issues of selection bias in this procedure. We measured the effect of environment on learning by fitting a mixed-effects logistic regression model to predict if subjects chose the lucky deck up to five trials after a reversal event in the deck learning task. For each subject s and trial t, this model predicts the probability that the lucky deck was chosen:

p(ChooseLucky)=σ(β0+b0,s[t]+TSinceRevt×Envt(β1+b1,s[t]))
σ(x) = 1/(1 + e^(−x))

where βs are fixed effects, bs are random effects, TSinceRev is the trial number coded as distance from a reversal event (1–5), and Env is the environment a choice was made in, coded as –0.5 and 0.5 for the low- and high-volatility environments, respectively. Participants with positive values of b1 can be said to have chosen the lucky deck more quickly following a reversal in the high- compared to the low-volatility environment, and we included only these participants in the rest of our analyses. A total of 254 participants survived after applying these criteria, with 120 participants failing to respond to the volatility manipulation (criterion iii) and 44 participants responding to too few trials (criterion i) or too quickly (criterion ii).
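The model's coding scheme can be illustrated by writing out its linear predictor and logistic link directly. The coefficient values used below are arbitrary placeholders for illustration, not estimates from the paper.

```python
import math

def p_choose_lucky(t_since_rev, env_high, beta0, beta1, b0=0.0, b1=0.0):
    """Predicted probability of choosing the lucky deck after a reversal.

    Coding follows the text: t_since_rev runs 1-5, environment is coded
    -0.5 (low volatility) / +0.5 (high volatility), and beta/b are the
    fixed and per-subject random effects.
    """
    env = 0.5 if env_high else -0.5
    eta = (beta0 + b0) + t_since_rev * env * (beta1 + b1)
    return 1.0 / (1.0 + math.exp(-eta))  # sigma(x) = 1/(1 + e^(-x))
```

A subject with b1 > 0 shows a steeper post-reversal recovery in the high- than in the low-volatility environment, which is exactly the inclusion rule applied above.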

Deck learning and card memory task behavioral analysis

Request a detailed protocol

For regression models described here as well as those in the following sections, fixed effects are reported in the text as the median of each parameter’s marginal posterior distribution alongside 95% credible intervals, which indicate where 95% of the posterior density falls. Parameter values outside of this range are unlikely given the model, data, and priors. Thus, if the range of likely values does not include zero, we conclude that a meaningful effect was observed.
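As an illustration of this reporting convention, a central 95% credible interval can be computed from posterior draws by percentiles. The draws below are simulated for illustration only (numpy is assumed available); they are not real model output.

```python
import numpy as np

def summarize_posterior(samples, level=0.95):
    """Posterior median and central credible interval from MCMC draws."""
    tail = 100 * (1 - level) / 2
    lo, hi = np.percentile(samples, [tail, 100 - tail])
    return float(np.median(samples)), (float(lo), float(hi))

# Illustrative draws for one fixed effect (simulated, not real output)
rng = np.random.default_rng(0)
draws = rng.normal(0.37, 0.05, size=10_000)
median, (lo, hi) = summarize_posterior(draws)
# The effect is deemed meaningful when the interval excludes zero
meaningful = not (lo <= 0.0 <= hi)
```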

We first analyzed the extent to which previously seen (old) objects were used in the combined deck learning and card memory task by fitting the following mixed-effects regression model to predict whether an old object was chosen:

p(ChooseOld)=σ(β0+b0,s[t]+OldValt(β1+b1,s[t])+TrueDeckValt(β2+b2,s[t]))

where OldVal is the centered value (between –0.5 and 0.5) of an old object. We additionally controlled for the influence of deck value on this analysis by adding a regressor, TrueDeckVal, which is the centered true average value of the deck on which each object was shown. Trials not featuring old objects were dropped from this analysis.

We then similarly assessed the extent to which participants engaged in incremental learning overall by looking at the impact of reversals on incremental accuracy directly. To do this, we grouped trials according to their distance from a reversal, up to four trials prior to (t = −4:−1), during (t = 0), and after (t = 1:4) a reversal occurred. We then dummy coded them to measure their effects on incremental accuracy separately. We also controlled for the influence of old object value in this analysis by including in this regression the coded value of a previously seen object (ranging from 0.5 if the value was $1 on the lucky deck or $0 on the unlucky deck to –0.5 if the value was $0 on the lucky deck or $1 on the unlucky deck), for a total of 18 estimated effects:

p(ChooseLucky)=σ(T-4:4(β1:9+b1:9,s[t])+T-4:4×OldValt(β10:18+b10:18,s[t]))
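The dummy coding of trial position relative to a reversal can be sketched as an indicator matrix, where the nine position columns correspond to t = −4…4 (a hedged sketch; names are our own).

```python
import numpy as np

def code_reversal_distance(reversal_trials, n_trials, window=4):
    """Indicator matrix of signed distance from the nearest reversal.

    Column d + window marks trials at signed distance d in [-window, window];
    trials farther from every reversal get an all-zero row.
    """
    X = np.zeros((n_trials, 2 * window + 1), dtype=int)
    for t in range(n_trials):
        d = min((t - r for r in reversal_trials), key=abs)
        if -window <= d <= window:
            X[t, d + window] = 1
    return X
```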

To next focus on whether there was an effect of environment on the extent to which the value of old objects was used for decisions, we restricted all further analyses involving old objects to ‘incongruent’ trials, defined as trials on which the old object was either high valued (>50¢) and on the unlucky deck or low valued (<50¢) and on the lucky deck. To better capture participants’ beliefs, deck luckiness was determined by the best-fitting incremental learning model (see next section) rather than by the experimenter-controlled ground truth: whichever deck had the higher model-derived value estimate on a given trial was labeled the lucky deck. Our logic in using only incongruent trials was that choices straying from the more valuable deck should reflect choices based on the episodic value of an object. Lastly, we defined our outcome measure, the episodic-based choice index (EBCI), to equal 1 on trials where the ‘correct’ episodic response was given (i.e., high-valued objects were chosen and low-valued objects were avoided) and 0 on trials where the ‘correct’ incremental response was given (i.e., the opposite was true). A single mixed-effects logistic regression was then used to assess the possible effects of environment (Env) on EBCI:
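The trial classification just described can be written out explicitly. This sketch uses our own names, and `old_on_lucky_deck` refers to model-derived deck luckiness, per the text.

```python
def classify_incongruent_trial(old_value, old_on_lucky_deck, chose_old):
    """EBCI coding for a trial featuring a previously seen (old) object.

    Incongruent trials: a high-valued (> .50) object on the unlucky deck,
    or a low-valued (< .50) object on the lucky deck. EBCI = 1 when the
    choice follows the object's episodic value (take high, avoid low) and
    0 when it follows the deck's value. Congruent trials return None
    (they are excluded from the analysis).
    """
    high = old_value > 0.50
    incongruent = (high and not old_on_lucky_deck) or (not high and old_on_lucky_deck)
    if not incongruent:
        return None
    episodic_response = chose_old if high else not chose_old
    return 1 if episodic_response else 0
```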

p(EBCI)=σ(β0+b0,s[t]+EnvNoiset(β1)+Envt(β2+b2,s[t]))

where Env was coded identically to the above analyses. We included a covariate EnvNoise in this analysis to account for the possibility that participants are likely to make noisier incremental value-based decisions in the high-volatility compared to the low-volatility environment, which may contribute to the effects of environment on EBCI. To calculate this index, we fit the following mixed-effects logistic regression model to capture an interaction effect of environment and RB model-estimated deck value (see ‘Deck learning computational models’ section below) on whether the orange deck was chosen:

p(ChooseOrange)=σ(β0+b0,s[t]+DeckValt(β1+b1,s[t])+Envt(β2+b2,s[t])+DeckValt×Envt(β3+b3,s[t]))

We fit this model only to trials without the presence of a previously seen object in order to achieve a measure of noise specific to incremental learning. Each participant’s random effect of the interaction between deck value and environment, b3 , was then used as the EnvNoise covariate in the logistic regression testing for an effect of environment on EBCI.

To assess the effect of episodic-based choices on reaction time (RT), we used the following mixed-effects linear regression model:

RTt=β0+b0,s[t]+EBCIt(β1+b1,s[t])+Switcht(β2+b2,s[t])+ChosenValt(β3+b3,s[t])+RUt(β4+b4,s[t])

where EBCI was coded as –0.5 for incremental-based trials and 0.5 for episodic-based trials. We also included covariates to control for three other possible effects on RT. The first, Switch, captured possible RT slowing due to exploratory decisions, which in the present task required participants to switch from choosing one deck to the other. This variable was coded as –0.5 if a stay occurred and 0.5 if a switch occurred. The second, ChosenVal, captured any effects due to the value of the option that may have guided choice, and was set to be the value of the previously seen object on episodic-based trials and the running average true value on incremental-based trials. Finally, the third, RU, captured effects due to possible slowing when choices occurred under conditions of greater uncertainty as estimated by the reduced Bayesian model (see below).
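The predictor coding for this RT model can be sketched as its fixed-effects linear predictor. The coefficient values in the usage below are placeholders, not estimates.

```python
def rt_linear_predictor(episodic, switched, chosen_val, ru, betas):
    """Fixed-effects part of the RT regression, with coding from the text.

    episodic and switched are booleans mapped to -0.5 / +0.5; chosen_val
    is the old object's value on episodic-based trials or the running
    average deck value on incremental-based trials; ru is relative
    uncertainty from the reduced Bayesian model.
    """
    b0, b_ebci, b_switch, b_val, b_ru = betas
    code = lambda flag: 0.5 if flag else -0.5
    return (b0 + b_ebci * code(episodic) + b_switch * code(switched)
            + b_val * chosen_val + b_ru * ru)
```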

Deck learning computational models

Request a detailed protocol

We next assessed the performance of several computational learning models on our task in order to best capture incremental learning. A detailed description of each model can be found in the ‘Supplementary methods.’ In brief, these included two models that performed Rescorla–Wagner-style updating (Rescorla and Wagner, 1972), with either a single fixed learning rate (RW1α) or a separate fixed learning rate for each environment (RW2α); two reduced Bayesian (RB) models (Nassar et al., 2010), with either a single hazard rate (RB1H) or a separate hazard rate for each environment (RB2H); a contextual inference model (CI); and a Rescorla–Wagner model that learned only a single value estimate (RW1Q). Models were fit to the deck learning task (see ‘Posterior inference’ and Appendix 3) and used to generate subject-wise estimates of deck value and, where applicable, uncertainty in the combined deck learning and card memory task.
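As a minimal sketch of the simplest of these learners, a Rescorla–Wagner delta rule moves a deck's value estimate toward each observed outcome; the two-learning-rate variant (RW2α) simply selects α by environment, while the reduced Bayesian models instead adjust the learning rate trial by trial (Nassar et al., 2010). Function names here are our own.

```python
def rw_update(value, outcome, alpha):
    """One delta-rule step: V <- V + alpha * (r - V)."""
    return value + alpha * (outcome - value)

def simulate_deck_values(outcomes, alpha, v0=0.50):
    """Track a single deck's incremental value estimate across outcomes."""
    v, trace = v0, []
    for r in outcomes:
        v = rw_update(v, r, alpha)
        trace.append(v)
    return trace
```

A higher α weights recent outcomes more heavily, which is why volatile environments favor larger learning rates and why the per-environment RW2α and hazard-rate RB2H variants were candidates here.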

Combined choice models

Request a detailed protocol

After fitting the above models to the deck learning task, parameter estimates for each subject were then used to generate trial-by-trial time series for deck value and uncertainty (where applicable) throughout performance on the combined deck learning and card memory task. Mixed-effects Bayesian logistic regressions for each deck learning model were then used to capture the effects of multiple memory-based sources of value on incongruent trial choices in this task. For each subject s and trial t, these models can be written as

p(ChooseOrange)=σ(β0+b0,s[t]+DeckValt(β1+b1,s[t])+Oldt(β2+b2,s[t])+OldValt(β3+b3,s[t]) +OldValt×Envt(β4+b4,s[t]))

where the intercept captures a bias toward choosing either of the decks regardless of outcome, DeckVal is the deck value estimated from each model, the effect of Old captures a bias toward choosing a previously seen card regardless of its value, and OldVal is the coded value of a previously seen object (ranging from 0.5 if the value was $1 on the orange deck or $0 on the purple deck to –0.5 if the value was $0 on the orange deck or $1 on the purple deck). To capture variations in sensitivity to old object value due to volatility (represented here by a categorical environment variable, Env, coded as –0.5 for the low- and 0.5 for the high-volatility environment), we also included an interaction term between old object value and environment in each model. A seventh regression, which additionally incorporated our hypothesized effect of increased sensitivity to old object value when uncertainty about deck value is higher, was also fit. This regression was identical to the others but included an additional interaction effect of uncertainty and old object value, OldValt×Unct(β5+b5,s[t]), and used the RB2H model’s DeckVal estimate alongside its estimate of RU to estimate the effect of OldVal×Unc. RU was chosen over CPP because it captures the reducible uncertainty about deck value, which is the quantity of interest in this study. Prior to fitting, all predictors were z-scored in order to report effects in standard units.
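
For concreteness, the signed OldVal coding described above can be sketched as follows; `code_old_val` and its arguments are illustrative names for this description, not the authors' analysis code:

```python
def code_old_val(old_value_dollars: float, old_deck: str) -> float:
    """Signed coding of episodic value: +0.5 when the old card is
    maximal evidence for choosing orange ($1 seen on orange, or $0
    seen on purple), -0.5 when it is maximal evidence for purple.
    Illustrative helper based on the description in the text."""
    sign = 1.0 if old_deck == "orange" else -1.0
    return sign * (old_value_dollars - 0.5)
```

With this coding, a positive OldVal always points toward the orange deck, matching the sign convention of the regression's outcome variable, p(ChooseOrange).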

Relative uncertainty analyses

We conducted several other analyses that tested effects on or of RU throughout the combined deck learning and card memory task. RU was mean-centered in each of these analyses. First, we assessed separately the effect of RU at retrieval time on EBCI using a mixed-effects logistic regression:

p(EBCI)=σ(β0+b0,s[t]+RUt(β1+b1,s[t])+RUt²(β2+b2,s[t]))

An additional quadratic term (RU²) was included in this model to allow for the possibility that the effect of RU is nonlinear, although this term was found to have no effect. The effect of RU at encoding time was assessed using an identical model but with RU at encoding included instead of RU at retrieval.

Next, to ensure that the RB model captured uncertainty related to changes in deck luckiness, we tested for an effect of environment on RU using a mixed-effects linear regression:

RUt=β0+b0,s[t]+Envt(β1+b1,s[t])

We then also looked at the impact of reversals on RU. To do this, we calculated the difference between RU on reversal trials (and up to four trials following a reversal) and the average RU on the four trials immediately preceding a reversal. Then, using a dummy-coded approach similar to that used for the model testing effects of reversals on incremental accuracy, we fit the following mixed-effects linear regression with five effects:

RUDifferencet=T0:4(β1:5+b1:5,s[t])

We also assessed the effect of RU on reaction time using another mixed-effects linear regression:

RTt=β0+b0,s[t]+RUt(β1+b1,s[t])

Subsequent memory task behavioral analysis

Performance on the subsequent memory task was analyzed in several ways across recognition memory and value memory trials. We first assessed participants’ overall recognition memory accuracy by computing the signal detection metric d prime for each participant, adjusted for extreme proportions using a log-linear rule (Hautus, 1995). The relationship between d prime and sensitivity to both episodic and incremental value was then determined using simple linear regressions of the form dprimes=β0+Sensitivitys(β1), where Sensitivity was either the random effect of episodic value or the random effect of incremental value from the combined choice model for each participant. We additionally assessed the difference in recognition memory performance between environments by computing d prime for each environment separately, with the false alarm rate shared across environments and the hit rate differing between environments, using the following mixed-effects linear regression:

dprime=β0+b0,s+Env(β1+b1,s)
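
The log-linear-corrected d prime described above can be sketched as follows; `d_prime_loglinear` is an illustrative helper built from the Hautus (1995) rule, not the authors' analysis code:

```python
from statistics import NormalDist

def d_prime_loglinear(hits: int, misses: int, fas: int, crs: int) -> float:
    """d' with the log-linear correction (Hautus, 1995): add 0.5 to each
    cell (1.0 to each denominator) so hit and false-alarm rates never
    reach 0 or 1, keeping d' finite even for perfect performance."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (fas + 0.5) / (fas + crs + 1.0)
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

The correction matters precisely at the extremes: a participant with no misses and no false alarms would otherwise produce an infinite d prime.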

We next determined the extent to which participants’ memory for previously seen objects was impacted by whether an object was seen initially on either an episodic- or incremental-based choice using the following mixed-effects logistic regression model:

p(Hitt)=σ(β0+b0,s[t]+EBCIt(β1+b1,s[t]))

where Hit was 0 if an object was incorrectly labeled as new and 1 if it was accurately identified as old. The final recognition memory analysis assessed the impact of variables extracted from the RB model at encoding time (RU, changepoint probability [CPP], and the absolute value of prediction error [APE]) on subsequent memory. Because these variables are, by definition, highly correlated with one another (see ‘Supplementary methods’), we fit a simple mixed-effects logistic regression predicting recognition memory from each variable separately and then compared the predictive performance of these models (see below) to determine which best accounted for subsequent memory performance. Each model additionally controlled for potential recognition memory enhancements due to the absolute magnitude of an object’s true value by including this quantity as a covariate.

In addition to the analyses of recognition memory, analogous effects were assessed for performance on memory for value. General value memory accuracy and a potential effect of environment on remembered value were assessed using the following mixed-effects linear regression:

Valuet=β0+b0,s[t]+TrueValt(β1+b1,s[t])+Envt(β2+b2,s[t])+Envt×TrueValt(β3+b3,s[t])

where Value is the remembered value of an object on each memory trial (between $0 and $1), and TrueVal is an object’s true value. We next assessed whether value memory was similarly impacted by whether an object was seen initially on either an episodic- or incremental-based choice using a similar model fit to objects from incongruent trials only, with EBCI as a predictor rather than Env. Lastly, as with the recognition memory analyses, we determined the extent to which trial-wise variables from the RB model (RU, CPP, and APE) at encoding impacted subsequent value memory by using each of these as a predictor in similar models and then comparing the predictive performance of each in an identical manner to the recognition memory models.

Posterior inference and model comparison

Parameters for all incremental learning models were estimated using hierarchical Bayesian inference such that group-level priors were used to regularize subject-level estimates. This approach to fitting reinforcement learning models improves parameter identifiability and predictive accuracy (van Geen and Gerraty, 2021). The joint posterior was approximated using No-U-Turn Sampling (Hoffman and Gelman, 2011) as implemented in Stan (Team SD, 2020). Four chains with 2000 samples (1000 discarded as burn-in) were run for a total of 4000 posterior samples per model. Chain convergence was determined by ensuring that the Gelman–Rubin statistic R̂ was close to 1. A full description of the parameterization and choice of priors for each model can be found in Appendix 3. All regression models were fit using No-U-Turn Sampling in Stan with the same number of chains and samples. Default weakly informative priors implemented in the rstanarm package (Rstanarm, 2022) were used for each regression model. Model fit for the combined choice models and the models measuring trial-wise effects of encoding on subsequent memory was assessed by separating each dataset into 20 folds and performing a cross-validation procedure, leaving out N/20 subjects per fold, where N is the number of subjects in each sample. The expected log pointwise predictive density (ELPD) was then computed and used as a measure of out-of-sample predictive fit for each model.
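
The subject-wise fold construction behind this cross-validation might look like the following sketch; the function and variable names are assumptions for illustration, not the paper's actual pipeline:

```python
def subject_folds(subject_ids, k: int = 20):
    """Assign unique subjects (not individual trials) to k roughly equal
    folds, so that each held-out fold contains all trials from ~N/k
    subjects. Illustrative sketch of subject-level cross-validation."""
    unique = sorted(set(subject_ids))
    # round-robin assignment keeps fold sizes within one subject of each other
    return [unique[i::k] for i in range(k)]
```

Holding out whole subjects, rather than random trials, is what makes the resulting ELPD an estimate of predictive fit for unseen participants.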

Replication

We identically repeated all procedures and analyses applied to the main sample on an independently collected replication sample. A total of 401 participants were again recruited through Amazon Mechanical Turk, and 223 survived exclusion procedures carried out identically to those used for the main sample, with 124 participants failing to respond to the volatility manipulation (criterion iii) and 54 participants responding to too few trials (criterion i) or too quickly (criterion ii).

Citation race and gender diversity statement

The gender balance of papers cited within this work was quantified using databases that store the probability of a first name being carried by a woman. Excluding self-citations to the first and last authors of this article, the gender breakdown of our references is 12.16% woman (first)/woman (last), 6.76% man/woman, 23.44% woman/man, and 57.64% man/man. This method is limited in that (i) names, pronouns, and social media profiles used to construct the databases may not, in every case, be indicative of gender identity and (ii) it cannot account for intersex, nonbinary, or transgender people. We also obtained the predicted racial/ethnic category of the first and last authors of each reference using databases that store the probability of a first and last name being carried by an author of color. By this measure (and excluding self-citations), our references contain 9.55% author of color (first)/author of color (last), 19.97% white author/author of color, 22.7% author of color/white author, and 47.78% white author/white author. This method is limited in that (i) using names and Florida Voter Data to make the predictions may not be indicative of racial/ethnic identity, and (ii) it cannot account for indigenous and mixed-race authors, or those who may face differential biases due to the ambiguous racialization or ethnicization of their names.

Appendix 1

Replication results

Here, we repeat and describe all analyses reported in the main text with the replication sample. All results are reported in the same order as in the main text.

Episodic memory is used more under conditions of greater volatility

Participants in the replication sample were substantially more likely to choose high-valued old objects compared to low-valued old objects (βOldValue=0.723, 95% CI=[0.624, 0.827]; Figure 2—figure supplement 1A). Participants also altered their behavior in response to reversals in deck value. The higher-valued (lucky) deck was chosen more frequently on trials immediately preceding a reversal (βt−4=0.095, 95% CI=[0.016, 0.176]; βt−3=0.128, 95% CI=[0.047, 0.213]; βt−2=0.168, 95% CI=[0.085, 0.251]; βt−1=0.161, 95% CI=[0.075, 0.25]; Figure 2—figure supplement 1B). This tendency was then disrupted by trials on which a reversal occurred (βt=0=−0.373, 95% CI=[−0.464, −0.286]), with performance quickly recovering as the newly lucky deck became chosen more frequently on the trials following a reversal (βt+1=−0.256, 95% CI=[−0.337, −0.175]; βt+2=−0.144, 95% CI=[−0.22, −0.064]; βt+3=−0.024, 95% CI=[−0.102, 0.053]; βt+4=0.113, 95% CI=[0.055, 0.174]). Thus, participants in the replication sample were also sensitive to reversals in deck value, thereby indicating that they engaged in incremental learning throughout the task.

Participants in the replication sample also based more decisions on episodic value in the high-volatility environment compared to the low-volatility environment (βEnv=0.146,95%CI=[0.06, 0.228]; Figure 2—figure supplement 1C). Furthermore, decisions based on episodic value again took longer (βEBCI=39.445,95%CI=[29.660,49.328]; Figure 2—figure supplement 1D).

Uncertainty increases sensitivity to episodic value

In the replication sample, the reduced Bayesian model with two hazard rates was again the best-fitting model (Figure 3—figure supplement 1A). Participants detected higher levels of volatility in the high- compared to the low-volatility environment, as indicated by the generally larger hazard rates recovered from the high- compared to the low-volatility environment (βLow=0.048, 95% CI=[0.038, 0.06]; βHigh=0.071, 95% CI=[0.058, 0.088]; Figure 3—figure supplement 1B). Compared to an average of the four trials prior to a reversal, RU also increased immediately following a reversal and stabilized over time (βt=0=0.021, 95% CI=[0.014, 0.056]; βt+1=0.22, 95% CI=[0.185, 0.253]; βt+2=0.144, 95% CI=[0.11, 0.178]; βt+3=0.098, 95% CI=[0.064, 0.129]; βt+4=0.05, 95% CI=[0.019, 0.083]; Figure 3—figure supplement 1C). RU was again also, on average, greater in the high- compared to the low-volatility environment (βEnv=0.01, 95% CI=[0.007, 0.013]) and related to reaction time such that choices made under more uncertain conditions took longer (βRU=1.364, 95% CI=[0.407, 2.338]).

Episodic memory was also used more on incongruent trial decisions made under conditions of high RU (βRU=2.718,95%CI=[1.096,4.436]; Figure 4—figure supplement 1A). We again fit the combined choice model to the replication sample and found the following. Participants again used both sources of value throughout the task: both deck value as estimated by the model (βDeckValue=0.431,95%CI=[0.335,0.516]; Figure 4—figure supplement 1B) and the episodic value from old objects (βOldValue=0.191,95%CI=[0.137,0.245]) strongly impacted choice. Lastly, episodic value again impacted choices more when RU was high (βOldValue:RU=0.043,95%CI=[0.00003,0.088]) and in the high- compared to the low-volatility environment (βOldValue:Env=0.092,95%CI=[0.047,0.136]).

Finally, there was again no relationship between the use of episodic memory on incongruent trial decisions and RU at encoding (βRU=0.99, 95% CI=[−0.642, 2.576]; Figure 4—figure supplement 2). Including a sixth parameter to assess increased sensitivity to old object value due to RU at encoding time did not have an effect in the combined choice model (βOldValue:RU=−0.003, 95% CI=[−0.046, 0.037]; Figure 4—figure supplement 2), which is also reported in the main text. As with the main sample, including this parameter did not provide a better fit to subjects’ choices than the combined choice model with only increased sensitivity due to RU at retrieval time.

Episodic and incremental value sensitivity predicts subsequent memory performance

Participants in the replication sample again performed well above chance on the test of recognition memory (β0=1.874,95%CI=[1.772,1.977]), and objects from episodic choice trials were better remembered than those from incremental choice trials (βEBCI=0.157,95%CI=[0.033,0.278]; Figure 5—figure supplement 1A). Recall for the value of previously seen objects was also well predicted by their true value (βTrueValue=0.181,95%CI=[0.162,0.120]) and value recall was improved for objects from episodic choice trials (βEBCI:TrueValue=0.049,95%CI=[0.030,0.067]; Figure 5—figure supplement 1B). Participants with better subsequent recognition memory were again more sensitive to episodic value (βEpSensitivity=0.334,95%CI=[0.229,0.44]; Figure 5—figure supplement 1C), and these same participants were again less sensitive to incremental value (βIncSensitivity=-0.124,95%CI=[-0.238,-0.009]; Figure 5—figure supplement 1D).

Appendix 2

Uncertainty during encoding improves subsequent memory in both samples

The subsequent memory task provided us with the opportunity to test whether participants have better subsequent memory for objects encoded under conditions of greater uncertainty. Supporting the notion that uncertainty improves subsequent memory, recognition memory for objects encoded in the high-volatility environment was better than for those encoded in the low-volatility environment (main: βEnv=0.053,95%CI=[0.009,0.098]; replication: βEnv=0.078,95%CI=[0.031,0.126]). This coarse effect was limited to recognition memory, however, as memory for object value was less impacted by the environment in which it was seen (main: βEnv=-0.002,95%CI=[-0.012,0.009]; replication: βEnv=0.008,95%CI=[-0.002,0.019]).

We next examined the impact of RU at encoding on subsequent memory. Both recognition memory (main: βRU=0.129,95%CI=[0.022,0.241]; replication: βRU=0.179,95%CI=[0.041,0.329]) and value memory (main: βTrueValue:RU=0.012,95%CI=[0.001,0.023] ; replication: βTrueValue:RU=0.012,95%CI=[0.001,0.023] ; Figure 5—figure supplement 2) were associated with greater RU at encoding time. Lastly, we assessed how these effects of uncertainty at encoding compared to the effects of surprise, which is thought to also improve subsequent memory and is separately estimated by the RB model (see ‘Supplementary methods’). We found that surprise at encoding (quantified here as both the probability of a reversal in deck value and the absolute value of reward prediction error) led to modest improvement in subsequent memory, but these effects were less consistent across samples and types of memory (Figure 5—figure supplement 2). Models of subsequent memory performance featuring surprise were also outperformed by those that instead predicted memory from RU. Together, these results indicate that the presence of uncertainty at encoding improves subsequent memory.

Appendix 3

Supplementary methods

Description of incremental learning models

Rescorla–Wagner (RW)

The first model we considered was a standard model-free reinforcement learner that assumes a stored value (Q) for each deck is updated over time. Q is then referenced on each decision in order to guide choices. After each outcome ot , the value for the orange deck QO is updated according to the following rule (Rescorla and Wagner, 1972) if the orange deck is chosen:

QO,t+1=QO,t+α(ot-QO,t)

And is not updated if the purple deck is chosen:

QO,t+1=QO,t

Likewise, the value for the purple deck QB is updated equivalently. Large differences between estimated value and outcomes therefore have a larger impact on updates, but the overall degree of updating is controlled by the learning rate, α. Two versions of this model were fit, one with a single learning rate (RW1α), and one with two learning rates (RW2α), αlow or αhigh, depending on which environment the current trial was completed in. These parameters are constrained to lie between 0 and 1. A separate learning rate was used for each environment in the RW2α version to capture the well-established idea that a higher learning rate should be used in more volatile conditions (Behrens et al., 2007). A third RW model (RW1Q), also with two learning rates, was additionally fit to better match the property of the reduced Bayesian model (described below) in which anticorrelation between each deck’s value is assumed due to learning only a single value. This was accomplished by forcing the model to learn only one Q, where outcomes were coded in terms of the orange deck. For example, this means that an outcome worth $1 on the orange deck is treated the same as an outcome worth $0 on the purple deck by this model.
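
The update rules above amount to a one-line computation per trial. The following sketch (with illustrative names, not the authors' code) shows a single RW step and the environment-dependent learning-rate choice of the RW2α variant:

```python
def rw_update(q: float, outcome: float, alpha: float) -> float:
    """One Rescorla-Wagner step: move the stored value Q toward the
    observed outcome by a fraction alpha of the prediction error."""
    return q + alpha * (outcome - q)

def rw2a_update(q: float, outcome: float, env: str,
                alpha_low: float, alpha_high: float) -> float:
    """RW2a variant: pick the fixed learning rate according to the
    current environment's volatility (names here are illustrative)."""
    return rw_update(q, outcome, alpha_high if env == "high" else alpha_low)
```

With a large prediction error and a high learning rate, the value estimate moves much of the way toward the new outcome in a single trial, which is why a higher α is adaptive in the more volatile environment.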

Reduced Bayesian (RB)

The second model we considered was the reduced Bayesian (RB) model developed by Nassar and colleagues (Nassar et al., 2010). This model tracks and updates its belief that the orange deck is lucky based on trial-wise outcomes, ot , using the following prediction error-based update:

Bt+1=Bt+αt(ot-Bt)

This update is identical to that used in the RW model; however, the learning rate αt is itself updated following each outcome according to the following rule:

αt=Ωt+(1-Ωt)τt

where Ωt is the probability that a change in deck luckiness has occurred on the most recent trial (the CPP) and τt is the imprecision in the model’s belief about deck value (the RU). The learning rate therefore increases whenever CPP or RU increases. CPP can be written as

Ωt = U(ot|0,1)H / [U(ot|0,1)H + N(ot|Bt,σ²)(1−H)]

where H is the hazard rate, or probability of a change in deck luckiness. Two versions of this model were fit, one with a single hazard rate (RB1H) and one with two hazard rates (RB2H), Hlow and Hhigh, depending on the environment the current trial was completed in. In this equation, the numerator represents the probability that an outcome was sampled from a new average deck value, whereas the denominator combines this probability with the probability that the outcome was generated by a Gaussian distribution centered on the most recent belief about deck luckiness with variance σ². Because CPP is a probability, it is constrained to lie between 0 and 1. In our implementation, H was a free parameter (see ‘Posterior inference’ section below) and Ω1 was initialized to 1.

RU, which is the uncertainty about deck value relative to the amount of noise in the environment, is quite similar to the Kalman gain used in Kalman filtering:

kt = Ωtσ² + (1−Ωt)τtσ² + Ωt(1−Ωt)((ot−Bt)(1−τt))²
τt+1 = kt/(kt+σ²)

where σ2 is the observation noise and was here fixed to the true observation noise (0.33). kt consists of three terms: the first is the variance of the deck value distribution conditional on a change point, the second is the variance of the deck value distribution conditional on no change, and the third is the variance due to the difference in means between these two distributions. These terms are then used in the equation for τt+1 to provide the uncertainty about whether an outcome was due to a change in deck value or the noise in observations that is expected when a change point has not occurred. Because this model does not follow the two-armed bandit assumption of our task (i.e., that outcomes come from two separate decks), all outcomes were coded in terms of the orange deck, as in the RW1Q model described above. While this description represents a brief overview of the critical equations of the reduced Bayesian model, a full explanation can be found in Nassar et al., 2010.
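
Putting the CPP, learning-rate, and RU equations together, a single trial's update can be sketched as below. This is an illustrative reimplementation under the stated assumptions (outcomes uniform on [0, 1] after a change point, so U(ot|0,1)=1; σ² fixed at 0.33), not the authors' code:

```python
import math

def normal_pdf(x: float, mu: float, var: float) -> float:
    """Gaussian density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def rb_step(belief, tau, outcome, hazard, var=0.33):
    """One reduced Bayesian update (after Nassar et al., 2010).
    Returns the new belief, new relative uncertainty (RU), and the
    change-point probability (CPP) for this trial."""
    # CPP: uniform-under-change vs Gaussian-under-no-change (U(o|0,1) = 1)
    cpp = hazard / (hazard + normal_pdf(outcome, belief, var) * (1 - hazard))
    # dynamic learning rate combines CPP and RU
    alpha = cpp + (1 - cpp) * tau
    new_belief = belief + alpha * (outcome - belief)
    # RU update: the three variance terms of k_t, normalized by k_t + var
    k = (cpp * var + (1 - cpp) * tau * var
         + cpp * (1 - cpp) * ((outcome - belief) * (1 - tau)) ** 2)
    new_tau = k / (k + var)
    return new_belief, new_tau, cpp
```

Running this step on a surprising outcome versus an expected one shows the key behavior: CPP, and with it the effective learning rate, rises when an outcome is poorly explained by the current belief.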

Softmax choice

All incremental learning models were paired with a softmax choice function in order to predict participants’ decisions on each trial:

θt = 1 / (1 + e^(−(β0 + β1Vt)))

where θt is the probability that the orange deck was chosen on trial t. This function has two free parameters: β0, modeling an intercept, and β1, an inverse temperature modeling the slope of the decision function related to deck value. The primary difference for each model was how Vt is computed: RW (Vt=QO,t-QB,t); RB (Vt=Bt); RW1Q (Vt=Qt). In each of these cases, a positive Vt indicates evidence that the orange deck is more valuable while a negative Vt indicates evidence that the purple deck is more valuable.
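
The choice rule itself can be sketched directly (illustrative names, assuming the value signal V_t has already been computed by one of the learning models):

```python
import math

def p_choose_orange(v: float, beta0: float, beta1: float) -> float:
    """Softmax (logistic) choice rule: probability of choosing the
    orange deck given a model's value signal V_t, an intercept beta0,
    and an inverse temperature beta1."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * v)))
```

A larger β1 makes choices more deterministic with respect to value; a nonzero β0 shifts the whole function, capturing a deck bias that is independent of outcomes.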

Posterior inference

For all incremental learning models, the likelihood function can be written as

cs,t ∼ Bernoulli(θs,t)

where cs,t is 1 if subject s chose the orange deck on trial t and 0 if purple was chosen. Following the recommendations of Gelman and Hill, 2006 and van Geen and Gerraty, 2021, βs is drawn from a multivariate normal distribution with mean vector μβ and covariance matrix Σβ :

βs ∼ MultivariateNormal(μβ, Σβ)

where Σβ is decomposed into a vector of coefficient scales τβ and a correlation matrix Ωβ via

Σβ=diag(τβ)×Ωβ×diag(τβ)

Weakly informative hyperpriors were then set on the hyperparameters μβ,Ωβ, and τβ :

μβ ∼ N(0, 5)
τβ ∼ Cauchy+(0, 2.5)
Ωβ ∼ LKJCorr(2)

These hyperpriors were chosen for their respective desirable properties: the half Cauchy is bounded at zero and has a relatively heavy tail that is useful for scale parameters, the LKJ prior with shape = 2 concentrates some mass around the unit matrix, thereby favoring less correlation (Lewandowski et al., 2009), and the normal is a standard choice for regression coefficients.

Because sampling from heavy-tailed distributions like the Cauchy is difficult for Hamiltonian Monte Carlo (Team SD, 2020), a reparameterization of the Cauchy distribution was used here. τβ was thereby defined as the transform of a uniformly distributed variable τβ_u using the Cauchy inverse cumulative distribution function such that

F⁻¹(τβ_u) = τβ = tan(π(τβ_u − 1/2))
τβ_u ∼ U(0, 1)

In addition, a multivariate noncentered parameterization specifying the model in terms of the Cholesky factorized correlation matrix was used in order to shift the data’s correlation with the parameters to the hyperparameters, which increases the efficiency of sampling the parameters of hierarchical models (Team SD, 2020). The full correlation matrix Ωβ was replaced with a Cholesky factorized parameter LΩβ such that

Ωβ = LΩβ × LΩβᵀ
βs = μβ + (diag(τβ) × LΩβ × z)ᵀ
LΩβ ∼ LKJCholesky(2)
z ∼ N(0, 1)

where multiplying the Cholesky factor of the correlation matrix by the standard normally distributed additional parameter z and adding the group mean μβ creates a βs vector distributed identically to the original model.
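
The non-centered construction of βs can be checked with a few lines of linear algebra; this is a sketch in which numpy stands in for Stan's sampler, with illustrative names:

```python
import numpy as np

def non_centered_betas(mu, tau, L_omega, z):
    """Non-centered draw: beta_s = mu + diag(tau) @ L_omega @ z, where
    L_omega is the Cholesky factor of the correlation matrix and z is
    standard normal, so beta_s ~ MVN(mu, diag(tau) Omega diag(tau))."""
    return mu + np.diag(tau) @ L_omega @ z
```

Because z carries all the randomness and the hyperparameters enter only through a deterministic transform, the sampler explores a geometry that is much easier than sampling βs directly from the hierarchical prior.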

While the choice function is identical for each model, the parameters used in generating deck value differ for each. All were fit hierarchically and were modeled with the following priors and hyperpriors:

Rescorla–Wagner with a single learning rate (RW1α):

α ∼ Beta(a1, a2)
a1 ∼ N(0, 5)
a2 ∼ N(0, 5)

Rescorla–Wagner with two learning rates (RW2α) and with one Q-value (RW1Q):

αlow ∼ Beta(a1low, a2low)
αhigh ∼ Beta(a1high, a2high)
a1low ∼ N(0, 5)
a2low ∼ N(0, 5)
a1high ∼ N(0, 5)
a2high ∼ N(0, 5)

Reduced Bayes with a single hazard rate (RB1H):

H ∼ Beta(h1, h2)
h1 ∼ N(0, 5)
h2 ∼ N(0, 5)

Reduced Bayes with two hazard rates (RB2H):

Hlow ∼ Beta(h1low, h2low)
Hhigh ∼ Beta(h1high, h2high)
h1low ∼ N(0, 5)
h2low ∼ N(0, 5)
h1high ∼ N(0, 5)
h2high ∼ N(0, 5)

Description of contextual inference model

Because of the structure of our task, one possibility is that participants did not engage in incremental learning, but instead inferred which one of two switching contexts they were in (either that the orange deck was lucky and the purple deck was unlucky or vice versa). To address this, we developed a contextual inference (CI) model based on a standard hidden Markov model (HMM) with two latent states. While HMMs are covered extensively elsewhere (Rabiner and Juang, 1986), we provide the following brief overview. The model assumes that each outcome, ot, was generated by a hidden state, st, which may take one of two values on each trial, st ∈ {1, 2}. The goal of the model is then to infer which of the two states gave rise to each outcome on each trial using the following generative model:

ot ∼ N(μst, 1)
st ∼ Categorical(θst−1)

where μ = (μ1, μ2), and θ is a 2 × 2 transition matrix. Here, we assume that each outcome is normally distributed with a known scale parameter and unknown location parameters, (μ1,μ2). The state variable follows a categorical distribution parameterized by θ, which determines the likelihood that, on a given trial, each state will transition to either the other state or itself. Here, θ was modeled separately for each environment to mirror the difference in volatility between environments. μ and θ were then fit as free parameters for each participant using Hamiltonian Monte Carlo, following recommendations for fitting HMMs in Stan (Team SD, 2020). The following priors were used for each parameter:

θlow ∼ Dirichlet(1, 1)
θhigh ∼ Dirichlet(1, 1)
μ1 ∼ N(Vlucky, σ)
μ2 ∼ N(Vunlucky, σ)

where σ is the true standard deviation of outcomes, and Vlucky and Vunlucky are the true expected values of the lucky and unlucky decks, respectively.

We then calculated the likelihood of each participant’s sequence of outcomes using the forward algorithm to compute the following marginalization:

p(o|θ,μ) = Σs p(o, s|θ, μ)

Upon estimating the parameters, the most probable sequence of states to have generated the observed outcomes was computed using the Viterbi algorithm. Assigning a state to each timepoint allowed us to make use of the assigned state’s μ as the expected state value for the timepoint. This was then treated as the deck value for further analyses, as for the incremental learning models. Lastly, outcomes were coded similarly to the RB and RW1Q models.
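
The forward-algorithm marginalization above can be sketched for the two-state Gaussian case as follows; this is an illustrative reimplementation (with a scaled recursion to avoid underflow), not the fitted Stan model:

```python
import math

def normal_pdf(x: float, mu: float, sd: float = 1.0) -> float:
    """Gaussian density; the CI model assumes unit-scale emissions."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def forward_loglik(outcomes, mus, trans, init=(0.5, 0.5)):
    """Scaled forward algorithm for a 2-state Gaussian HMM: computes
    log p(o | theta, mu) by summing over all state sequences.
    trans[i][j] = p(s_t = j | s_{t-1} = i)."""
    alpha = [init[k] * normal_pdf(outcomes[0], mus[k]) for k in range(2)]
    s = sum(alpha)
    loglik = math.log(s)
    alpha = [a / s for a in alpha]
    for o in outcomes[1:]:
        alpha = [normal_pdf(o, mus[j]) *
                 (alpha[0] * trans[0][j] + alpha[1] * trans[1][j])
                 for j in range(2)]
        s = sum(alpha)
        loglik += math.log(s)  # accumulate log normalizers instead of raw products
        alpha = [a / s for a in alpha]
    return loglik
```

For short sequences the recursion can be verified against brute-force enumeration of all 2^T state sequences, which is exactly the marginalization the forward algorithm computes in linear time.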

Data availability

All code, data, and software needed to reproduce the manuscript can be found here: https://codeocean.com/capsule/2024716/tree/v1; DOI: https://doi.org/10.24433/CO.1266819.v1.

The following data sets were generated:
    Nicholas J, Daw ND, Shohamy D (2022) Uncertainty alters the balance between incremental learning and episodic memory. Code Ocean. https://doi.org/10.24433/CO.1266819.v1

References

  1. Hassabis D, Maguire EA (2009) The construction system of the brain. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 364:1263–1271. https://doi.org/10.1098/rstb.2008.0296
  2. Houk JC, Adams JL, Barto AG (1995) A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, editors. Models of Information Processing in the Basal Ganglia. The MIT Press. pp. 249–270.
  3. Lengyel M, Dayan P (2007) Hippocampal contributions to control: the third way. In: Singer Y, Koller D, Roweis ST, Platt JC, editors. Advances in Neural Information Processing Systems. Curran Associates, Inc. pp. 889–896.
  4. Rescorla R, Wagner A (1972) A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Prokasy WF, Black AH, editors. Classical Conditioning II: Current Research and Theory. New York: Appleton-Century-Crofts. pp. 64–99.
  5. Rouhani N, Norman KA, Niv Y (2018) Dissociable effects of surprising rewards on learning and memory. Journal of Experimental Psychology: Learning, Memory, and Cognition 44:1430–1443. https://doi.org/10.1037/xlm0000518
  6. Simon DA, Daw ND (2011) Environmental statistics and the trade-off between model-based and TD learning in humans. In: Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, editors. Advances in Neural Information Processing Systems. Curran Associates. pp. 127–135.
  7. Vikbladh O, Shohamy D, Daw N (2017) Episodic contributions to model-based reinforcement learning. Annual Conference on Cognitive Computational Neuroscience.
  8. Yu A, Dayan P (2002) Expected and unexpected uncertainty: ACh and NE in the neocortex. NIPS’02: Proceedings of the 15th International Conference on Neural Information Processing Systems. pp. 173–180.

Decision letter

  1. Michael J Frank
    Senior and Reviewing Editor; Brown University, United States

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Uncertainty alters the balance between incremental learning and episodic memory" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Michael Frank as the Senior Editor. The reviewers have opted to remain anonymous.

All of the reviewers felt that this was a promising paper with compelling results, but they also raised a number of questions about the methodology and interpretation of the results that should be addressed in a revision, as detailed below.

Reviewer #1 (Recommendations for the authors):

The wide vs. skinny error bars are very difficult to visually differentiate. I recommend a more obvious difference.

Reviewer #2 (Recommendations for the authors):

1. 40-45% of the participants are excluded from the analysis in the main and replication samples. The authors should clarify how many were excluded for each criterion. Was the main culprit whether people responded to the volatility manipulation in the original deck learning task? If so, what does this mean about the generality of the effects of uncertainty in incremental learning? I suspect this may be due to the relative attentiveness of online participants, but the authors should address the caveat in the text. Some aspects of these excluded data may still be relevant. For example, if these participants do not register any differences in uncertainty between the two environments, wouldn't the prediction be that their use of episodic memory also does not differ?

2. Some aspects of the methods were not clear.

a. Was the order of the high and low volatility blocks counterbalanced across participants? Was the order the same in the first deck learning and second deck learning + card memory tasks? Were participants told explicitly that the environments would carry over between the two tasks? (The last point would further support using estimates fit to the first task out of the sample in the second.)

b. How individual objects were re-sampled was described in the overview (lines 435-439), but I imagine the details will matter a lot to people interested in replicating and extending this work. It would help to point the interested reader to where these could be found (eg, is the code provided, or is it based on a previous study described in detail, etc.).

3. Given that the results precede the methods, there are some aspects of the task that would be helpful to explain at the outset (around Figure 1). I was initially confused because the outcome in Figure 1 is $1 but the value memory was on a scale of 0-100, but this could be cleared up with a sentence about the possible outcomes. It would also be helpful to mention the mean outcome on the lucky versus unlucky deck, how frequently the lucky deck changes, and what participants are told explicitly about the volatility manipulation. This could be done in the text or with a revision to Figure 1. I was also confused about how the two samples were used until the end of the Results section (are they being analyzed together or separately? What am I looking at in Figures 2-5?). Again, a well-placed sentence at the top of the Results section would clear this up.

4. In Figure 3, I think a slightly different comparison would be useful, in addition to or instead of the two Rescorla-Wagner models. One difference between the reduced Bayesian and RW models is that the learning rate is dynamic in the RB model but not the RW model. But another difference between the way the two models are implemented is that the RB model assumes the value of the two decks are perfectly anti-correlated (ie, it is learning only one value estimate), while the RW model does not (ie, it is learning about the two decks independently). Thus, the RB model assumes a critical aspect of the structure of the task that the RW model does not. I doubt this difference completely accounts for its better performance, but this should be tested. A δ-rule model with a fixed learning rate that learns a single value estimate (like the RB model) would be the needed comparison. This comparison would also isolate the effect of including the dynamic learning rate (according to RU and CPP) in the model.

5. The discussion goes into the different effects of novelty, surprise, and uncertainty on subsequent memory (lines 349-364), in the context of the lack of effect of uncertainty (RU from the reduced Bayesian model) at encoding. But have the authors looked at the effect of surprise (changepoint probability in the reduced Bayesian model) at encoding? The previous studies discussed here would predict that surprise at encoding should enhance subsequent memory (and perhaps the use of episodic memory in choice). This point is not central to the manuscript, of course, but the authors have additional data relevant to the distinctions they are raising here.

Reviewer #3 (Recommendations for the authors):

As mentioned in the public review, I thought this was a very interesting study and the results were clearly communicated. However, I have a number of questions/recommendations for the authors to strengthen the results and interpretation.

Regarding the points made in the public review about uncertainty v volatility and context, I'd make the following recommendations:

(1) Uncertainty vs. volatility (described in public review): it would make sense to reframe results around volatility rather than uncertainty, or strengthen the trial-wise RU analyses to look at trial-wise RU only at trials far away from reversals. At least there should be some discussion of where uncertainty arises besides volatility.

(2) Context: I think the analyses would be strengthened with an additional model using context inference rather than incremental learning. A natural choice might be Gershman and Niv 2012, although you could possibly get away with something simpler if you assume 2 contexts.

(3) The focus on incongruent trials seems potentially thorny. It is intuitive why the authors do this: trials in which episodic and incremental values disagree are informative about which normative strategy the subjects are using. However, in the high volatility condition, if subjects are using incremental value, they may be more likely to have an outdated incremental value which would look consistent with the episodic choice. I would propose the authors look at congruent trials as well to confirm that they are indeed less likely to make errors on these trials in the high volatility condition than they are in the low volatility condition.

(4) Another question relates to the interpretation of competing for episodic v. incremental strategies, as opposed to just learning about independent features. One could argue that the subjects are doing instrumental learning over objects and colors separately, and when the reliability of one feature (color) is decremented, the other feature is relatively up-weighted. This also seems consistent with the fact that episodic and incremental learning tradeoff -- attending more to the object feature would perhaps compete with color.

(5) The authors show that uncertainty at encoding time does not have a discernible effect on the episodic index. This is evidence that volatility is modulating episodic contribution to decision-making, rather than encoding strength, which is a pretty fundamental part of the results (and in contrast to eg Sun et al. 2021, where unpredictability modulates episodic consolidation). One key thing to look at is if subjects show any difference in their ability to recall familiarity/value of objects from the different conditions. This would also speak to the question of if volatility is affecting encoding rather than just recall during decision-making. It would also make sense to look at the role of other variables at encoding time (eg prediction error) to see if these predict future use. It would also be interesting to see if subjects are storing the object value or the incremental value at the time the object was first shown -- this would be easy to check (when subjects rate the value in the last block, are they more likely to err in the direction of the incremental value at the time of encoding (eg like Figure 2A but with x-axis = incremental value at the time estimated with RW)). This would shed insight into exactly what kind of episodic strategy the subjects are deploying.

(6) The analysis showing that subjects that were better at recall in block 3 also had higher episodic index was a useful sanity check. It seems it would also be possible to perform this analysis within-subject (eg does episodic choice correlate with accurate value memory) and that would bear more on the question of whether it was uncertainty or simply a subjective preference for one strategy or another.

https://doi.org/10.7554/eLife.81679.sa1

Author response

Reviewer #1 (Recommendations for the authors):

The wide vs. skinny error bars are very difficult to visually differentiate. I recommend a more obvious difference.

Thank you. We agree that it was difficult to differentiate between the 80% and 95% posterior intervals that were plotted around group-level estimates. This is largely because there was little difference between these intervals on the scale that we are plotting in order to visualize individual subject estimates in addition to group-level estimates. In the revision, we have altered figures that previously had both intervals (e.g. Figure 3B and Figure 4B) to have only 95% posterior intervals, as these are more informative and in line with what is reported throughout the rest of the paper. We have additionally changed the visualization of group-level estimates in Figure 4B from lines to bars in order to more explicitly differentiate between how error and estimates are visualized.

Reviewer #2 (Recommendations for the authors):

1. 40-45% of the participants are excluded from the analysis in the main and replication samples. The authors should clarify how many were excluded for each criterion. Was the main culprit whether people responded to the volatility manipulation in the original deck learning task? If so, what does this mean about the generality of the effects of uncertainty in incremental learning? I suspect this may be due to the relative attentiveness of online participants, but the authors should address the caveat in the text. Some aspects of these excluded data may still be relevant. For example, if these participants do not register any differences in uncertainty between the two environments, wouldn't the prediction be that their use of episodic memory also does not differ?

Thank you for giving us the opportunity to clarify. We have now added how many participants were excluded due to insensitivity to the volatility manipulation or for their general performance during the second task in the text (lines 603-605 and 786-788) and have included in the Discussion a new paragraph focused on the nature of our online sample and some participants’ insensitivity to the volatility manipulation (lines 477-485). As suggested, we have also verified that the participants excluded due to their insensitivity to the volatility manipulation were indeed less affected by environment when making episodic-based choices. We repeated the same analysis as in the paper of the effect of environment on these participants’ episodic-based choice index. In both the main (β=0.087, 95% CI = [−0.112, 0.182]) and replication (β = 0.087, 95% CI = [0.000, 0.168]) samples, there was a reduced and less reliable effect of environment on choice type in these excluded participants compared to the included participants.

2. Some aspects of the methods were not clear.

Thank you for these suggestions of places where the methods could be made more clear. We have made several changes to address these points, as noted below:

a. Was the order of the high and low volatility blocks counterbalanced across participants? Was the order the same in the first deck learning and second deck learning + card memory tasks? Were participants told explicitly that the environments would carry over between the two tasks? (The last point would further support using estimates fit to the first task out of the sample in the second.)

The order in which participants saw the two environments was counterbalanced across participants for the deck learning and card memory task. We have clarified this in the paper (lines 139 and 523-524). In the deck learning task, all participants first saw the low volatility environment and then saw the high volatility environment. This decision was made in order to emphasize the increased volatility of the high volatility environment relative to the low volatility environment, and this information has been added to the methods (lines 547-549). Lastly, participants were indeed told explicitly that the volatility levels of each environment would carry over from one task to the other, and we have also added this to the methods (lines 550-552). Thank you for catching that this information was not included in the methods.

b. How individual objects were re-sampled was described in the overview (lines 435-439), but I imagine the details will matter a lot to people interested in replicating and extending this work. It would help to point the interested reader to where these could be found (eg, is the code provided, or is it based on a previous study described in detail, etc.).

The sampling procedure has now been described in detail in the methods (lines 533-542).

3. Given that the results precede the methods, there are some aspects of the task that would be helpful to explain at the outset (around Figure 1). I was initially confused because the outcome in Figure 1 is $1 but the value memory was on a scale of 0-100, but this could be cleared up with a sentence about the possible outcomes. It would also be helpful to mention the mean outcome on the lucky versus unlucky deck, how frequently the lucky deck changes, and what participants are told explicitly about the volatility manipulation. This could be done in the text or with a revision to Figure 1. I was also confused about how the two samples were used until the end of the Results section (are they being analyzed together or separately? What am I looking at in Figures 2-5?). Again, a well-placed sentence at the top of the Results section would clear this up.

Thank you. We have clarified this information in multiple places throughout the text prior to the methods. In the first two paragraphs of the Results, we have added information on how often reversals occurred (lines 103-104), what participants were told (lines 105-109) and the expected value of each deck (lines 105-106). We have further added information about the types and range of outcomes to the caption of Figure 1 (lines 132-133) and information about where the main and replication sample results are reported in the introduction (lines 70-72).

4. In Figure 3, I think a slightly different comparison would be useful, in addition to or instead of the two Rescorla-Wagner models. One difference between the reduced Bayesian and RW models is that the learning rate is dynamic in the RB model but not the RW model. But another difference between the way the two models are implemented is that the RB model assumes the value of the two decks are perfectly anti-correlated (ie, it is learning only one value estimate), while the RW model does not (ie, it is learning about the two decks independently). Thus, the RB model assumes a critical aspect of the structure of the task that the RW model does not. I doubt this difference completely accounts for its better performance, but this should be tested. A δ-rule model with a fixed learning rate that learns a single value estimate (like the RB model) would be the needed comparison. This comparison would also isolate the effect of including the dynamic learning rate (according to RU and CPP) in the model.

Thank you for suggesting this model, as we agree that it helps to isolate exactly how the RB model improves over the RW models. We added a model that implements a δ rule identical to the RW models, but with a single Q-value (labeled as RW1Q in the text). Like the RB model, this model assumes that the values of the two decks are perfectly anti-correlated, and it learns over outcomes that have been re-coded in terms of the orange deck (e.g. $1 on orange is treated equivalently to $0 on blue). This model is now described in the Results of the main text (lines 228-230), listed in the Methods (line 675), and explained in detail in Appendix 3. Using a procedure identical to how the other models were fit and compared, we found that this model performed worse than both the RB and RW models we had previously presented in both samples, suggesting that the dynamic learning rate used in the RB model does indeed account for its performance improvements. The results of this comparison are reflected in an updated Figure 3A.
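For concreteness, the single-Q-value δ rule can be sketched as follows. This is a minimal illustration of the update rule rather than the fitted implementation; the learning rate and starting value shown here are placeholders, not fitted parameters:

```python
import numpy as np

def rw1q_values(outcomes_orange, alpha=0.3, q0=0.5):
    """Sketch of a single-Q-value delta-rule learner (RW1Q).

    outcomes_orange: rewards re-coded in terms of the orange deck,
    so that $1 on the blue deck is treated as $0 on orange.
    The blue deck's value is implicitly 1 - Q, which builds in the
    assumption that the two deck values are perfectly anti-correlated.
    alpha and q0 are illustrative placeholders, not fitted parameters.
    Returns the trial-by-trial Q-value held before each outcome.
    """
    q = q0
    qs = np.empty(len(outcomes_orange))
    for t, r in enumerate(outcomes_orange):
        qs[t] = q
        q += alpha * (r - q)  # fixed-learning-rate delta-rule update
    return qs
```

Unlike the RB model, alpha here is constant across trials; the RB model instead scales its effective learning rate trial by trial according to CPP and RU, which is precisely the difference this comparison isolates.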

5. The discussion goes into the different effects of novelty, surprise, and uncertainty on subsequent memory (lines 349-364), in the context of the lack of effect of uncertainty (RU from the reduced Bayesian model) at encoding. But have the authors looked at the effect of surprise (changepoint probability in the reduced Bayesian model) at encoding? The previous studies discussed here would predict that surprise at encoding should enhance subsequent memory (and perhaps the use of episodic memory in choice). This point is not central to the manuscript, of course, but the authors have additional data relevant to the distinctions they are raising here.

This is a great suggestion, thank you. We agree that our data can provide additional insights into the effects of novelty, surprise, and uncertainty at the time of encoding on subsequent memory and episodic-memory based choice. While previously we had only investigated effects of relative uncertainty (RU) at encoding on the use of episodic memory for decisions, we additionally looked at the effects of changepoint probability (CPP) and absolute prediction error (the absolute value of prediction error; APE), which are both potential markers of surprise at the time of encoding on episodic choices. Similar to the effects of RU at encoding time reported in the Results of the main text, there was no effect of CPP (Main: β = 0.044, 95% CI = [−0.004, 0.092]; Replication: β = 0.004, 95% CI = [−0.04, 0.048]) in either sample. There was an effect of APE at encoding in the main sample (β = 0.1, 95% CI = [0.039, 0.165]), but this effect did not replicate (β = 0.056, 95% CI = [−0.013, 0.123]). Based on this, our original conclusion about the effects of these variables at encoding time on episodic-based choice remains unchanged.

In addition, based on your suggestion here along with Reviewer Three’s fifth recommendation below, we also looked at the effects of RU, CPP, and APE on participants’ performance on the subsequent memory test. Because these variables are, by definition, highly correlated with one another (e.g. at encoding, RU and CPP are correlated with r=0.827), we fit multiple mixed effects regression models predicting either recognition memory (hits or misses) or value memory (response between $0-$1) for objects from each variable separately. We then performed a 20-fold leave-N-subjects-out cross-validation procedure to compare these models in order to determine which provided the best prediction of subsequent memory. This information is now provided in the Methods (lines 742-751 and 761-765) and the results are now mentioned in the main text (lines 342-347) and reported in Appendix 2, and in a new supplementary figure (Figure 5—Figure supplement 2). In brief, only RU at encoding time had an effect on both recognition and value memory in both samples. Specifically, higher RU at encoding predicted greater subsequent memory. Further, in both the main and replication samples, both recognition and value memory were best predicted by RU. We have now amended our discussion of the effects of surprise and uncertainty on subsequent memory to incorporate these results (lines 414-419).

Reviewer #3 (Recommendations for the authors):

As mentioned in the public review, I thought this was a very interesting study and the results were clearly communicated. However, I have a number of questions/recommendations for the authors to strengthen the results and interpretation.

Regarding the points made in the public review about uncertainty v volatility and context, I'd make the following recommendations:

(1) Uncertainty vs. volatility (described in public review): it would make sense to reframe results around volatility rather than uncertainty, or strengthen the trial-wise RU analyses to look at trial-wise RU only at trials far away from reversals. At least there should be some discussion of where uncertainty arises besides volatility.

Thank you for this suggestion—we would like to expand on our response in the Public Review to elaborate on the variant we pursued of the specific analysis suggested here. Our understanding is that the main issue is whether the results really are mediated by uncertainty, rather than reflecting some effect of blockwise volatility other than through its dynamic effects on uncertainty. We agree with the reviewer that the trial-wise RU analyses, if correctly done, could provide additional supporting evidence on this point, because the timeseries of trial-wise posterior uncertainty reflects many finer details of uncertainty, even within a block, than the cruder high-vs-low blockwise condition variable. But we think the main issue here is distinguishing blockwise effects from trial-wise dynamics rather than, within trial-wise effects, the dynamic effect on uncertainty of each individual ground-truth reversal event vs. other (i.i.d.) outcome variability. We are not clear what would be revealed by parsing out responses to reversal events vs. other noisy outcomes, since the inferential issue in this type of design (from the subjects’ perspective) is precisely that they cannot be reliably distinguished. Thus, even far from a reversal, an ideal observer will have higher posterior uncertainty in a high-volatility block due to a higher expected hazard of reversal, so even there these effects are intertwined.

Accordingly, to address one take on this, we added an additional effect of the interaction between the environment and episodic value to our combined choice model. This allowed us to look at, separately and in the same model, participants’ tendency to modulate their reliance on episodic memory in response to volatility (as captured by our categorical environment variable) and in response to trial-wise fluctuations in posterior uncertainty (as captured by relative uncertainty). After doing this, we found that both the environment and relative uncertainty increased sensitivity to episodic value. These changes and results are described in the Methods (lines 686-694) and Results (lines 276-277; Figure 4C).
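Schematically, the structure of this expanded model can be illustrated with a toy logistic choice rule. The coefficient names and values below are illustrative placeholders, not our fitted parameters:

```python
import math

def p_choose_object(epis_val, incr_val, env_high, ru,
                    b_incr=1.0, b_epis=1.0, b_env_epis=0.5, b_ru_epis=0.5):
    """Toy combined choice model (illustrative coefficients only).

    The weight placed on episodic value is modulated both by the
    categorical environment variable (env_high: 1 = high volatility,
    0 = low) and by trial-wise relative uncertainty (ru), so both
    interaction terms can be estimated in the same model.
    """
    epis_weight = b_epis + b_env_epis * env_high + b_ru_epis * ru
    logit = b_incr * incr_val + epis_weight * epis_val
    return 1.0 / (1.0 + math.exp(-logit))
```

Under this sketch, positive values of b_env_epis and b_ru_epis correspond to the pattern we report: both the environment and relative uncertainty increase sensitivity to episodic value.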

(2) Context: I think the analyses would be strengthened with an additional model using context inference rather than incremental learning. A natural choice might be Gershman and Niv 2012, although you could possibly get away with something simpler if you assume 2 contexts.

Thank you for this suggestion. We agree that including another model to capture context inference would substantially strengthen the paper. As mentioned in our response to the related question raised in the Public Review, we have addressed this point using a hidden Markov model with two states. While the model used in Gershman and Niv, 2012 solves a similar problem, it uses a Chinese Restaurant Process to infer the total number of hidden contexts. We think it is unlikely that the participants in our task engaged in inference over the total number of contexts as they were explicitly informed that each deck could be either lucky or unlucky at a given time (essentially informing them that there were only two contexts in this task). Further, this model was developed for binary outcomes, whereas the outcomes used in our task range between $0 and $1.

(3) The focus on incongruent trials seems potentially thorny. It is intuitive why the authors do this: trials in which episodic and incremental values disagree are informative about which normative strategy the subjects are using. However, in the high volatility condition, if subjects are using incremental value, they may be more likely to have an outdated incremental value which would look consistent with the episodic choice. I would propose the authors look at congruent trials as well to confirm that they are indeed less likely to make errors on these trials in the high volatility condition than they are in the low volatility condition.

Thank you for raising this issue; we agree that it is important to disambiguate episodic-based choices from noisy choices. This point is related to Reviewer One’s first Public Review suggestion, and our solution is described in detail in our response there. In brief, we first assessed the extent to which each subject made noisier choices in the high volatility compared to the low volatility environment and then controlled for this in our analysis of episodic-based choice between environments. The effect of environment was similar to that originally reported in the manuscript following this adjustment. The reported effects (lines 178 and Appendix 1) and methods (lines 643-655) have been updated to reflect these changes.

(4) Another question relates to the interpretation of competing for episodic v. incremental strategies, as opposed to just learning about independent features. One could argue that the subjects are doing instrumental learning over objects and colors separately, and when the reliability of one feature (color) is decremented, the other feature is relatively up-weighted. This also seems consistent with the fact that episodic and incremental learning tradeoff -- attending more to the object feature would perhaps compete with color.

We agree that this is a possible interpretation of our task—for the purposes of this study, we operationalized incremental learning as a repeated feature and episodic memory as a trial-unique feature, but future work can more directly implicate each of these memory systems in a task that allows them to trade off. We have added a paragraph to the paper discussing and responding to this point in more detail (lines 447-461).

(5) The authors show that uncertainty at encoding time does not have a discernible effect on the episodic index. This is evidence that volatility is modulating episodic contribution to decision-making, rather than encoding strength, which is a pretty fundamental part of the results (and in contrast to eg Sun et al. 2021, where unpredictability modulates episodic consolidation). One key thing to look at is if subjects show any difference in their ability to recall familiarity/value of objects from the different conditions. This would also speak to the question of if volatility is affecting encoding rather than just recall during decision-making. It would also make sense to look at the role of other variables at encoding time (eg prediction error) to see if these predict future use. It would also be interesting to see if subjects are storing the object value or the incremental value at the time the object was first shown -- this would be easy to check (when subjects rate the value in the last block, are they more likely to err in the direction of the incremental value at the time of encoding (eg like Figure 2A but with x-axis = incremental value at the time estimated with RW)). This would shed insight into exactly what kind of episodic strategy the subjects are deploying.

Thank you for these suggestions, as we agree that there are many other opportunities for analysis of the subsequent memory data. We now expand these analyses, as detailed below as well as in response to Reviewer Two (point five, above). First, we looked at the effects of RU, change-point probability (CPP), and absolute prediction error (APE) at encoding time on both subsequent recognition and value memory (lines 342-347 and Appendix 2).

In addition, based on your other points here, we also performed several other analyses of the subsequent memory data. We first looked at whether subsequent memory differed depending on whether an object was seen in either the low or high volatility environment. For recognition memory, this analysis consisted of calculating the signal detection metric d-prime for objects seen in each environment and testing for a difference in performance. For value memory, we tested for the presence of an interaction between an object’s true value and the environment in which it appeared on the value that was remembered by each participant. While environment did not impact value memory, recognition memory performance was better for objects seen in the high compared to the low volatility environment, suggesting that greater volatility at encoding time improved subsequent recall. These analyses are now included in the updated Methods (lines 732-751) and Results (lines 308-318) and Appendix 2. Lastly, based on your final point, we additionally looked at whether participants were more sensitive to episodic or incremental value at the time of encoding when reporting their remembered value for objects and found that object value (Main: β = 0.173, 95% CI = [0.159, 0.187]; Replication: β = 0.183, 95% CI = [0.168, 0.197]) was a substantially stronger predictor than incremental value (Main: β = 0.012, 95% CI = [0.001, 0.024]; Replication: β = 0.014, 95% CI = [0.002, 0.026]) in both samples, thereby suggesting that episodic value was more likely to drive these memory responses.

(6) The analysis showing that subjects that were better at recall in block 3 also had higher episodic index was a useful sanity check. It seems it would also be possible to perform this analysis within-subject (eg does episodic choice correlate with accurate value memory) and that would bear more on the question of whether it was uncertainty or simply a subjective preference for one strategy or another.

Thank you for this idea, which complements Reviewer One’s Public Review suggestion to sort recognition memory trials by whether the object was from episodic- or incremental-choice trials, where we found that participants have greater recognition memory for objects from episodic-based choices. We have additionally performed the within-subject analysis you suggested here by looking at whether participants better remember the value of objects from episodic-based choice trials. To do this, we fit a mixed effects linear regression predicting each participant’s subsequent memory value response from the interaction between choice type and an object’s true value (lines 752-758). We found that, in both samples, participants better remembered the value of objects from episodic-based choices. This effect is now reported in the Results (lines 308-318) and appears as a new panel in Figure 5 (Figure 5B).

https://doi.org/10.7554/eLife.81679.sa2

Article and author information

Author details

  1. Jonathan Nicholas

    1. Department of Psychology, Columbia University, New York, United States
    2. Mortimer B. Zuckerman Mind, Brain, Behavior Institute, Columbia University, New York, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing – review and editing
    For correspondence
    jonathan.nicholas@columbia.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2314-0765
  2. Nathaniel D Daw

    1. Department of Psychology, Princeton University, Princeton, United States
    2. Princeton Neuroscience Institute, Princeton University, Princeton, United States
    Contribution
    Conceptualization, Supervision, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5029-1430
  3. Daphna Shohamy

    1. Department of Psychology, Columbia University, New York, United States
    2. Mortimer B. Zuckerman Mind, Brain, Behavior Institute, Columbia University, New York, United States
    3. The Kavli Institute for Brain Science, Columbia University, New York, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared

Funding

National Science Foundation (1644869)

  • Jonathan Nicholas

National Science Foundation (1822619)

  • Nathaniel D Daw
  • Daphna Shohamy

National Institutes of Health (MH121093)

  • Nathaniel D Daw
  • Daphna Shohamy

John Templeton Foundation (60844)

  • Daphna Shohamy

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors thank Sam Gershman, Raphael Gerraty, Camilla van Geen, Mariam Aly, and members of the Shohamy Lab for insightful discussion and conversations. Support was provided by the NSF Graduate Research Fellowship (JN; award # 1644869), the NSF (DS, ND; award # 1822619), the NIMH/NIH (DS, ND; award # MH121093), and the Templeton Foundation (DS grant #60844).

Ethics

Informed consent was obtained online with approval from the Columbia University Institutional Review Board (IRB #1488).

Senior and Reviewing Editor

  1. Michael J Frank, Brown University, United States

Publication history

  1. Preprint posted: July 6, 2022
  2. Received: July 7, 2022
  3. Accepted: December 1, 2022
  4. Accepted Manuscript published: December 2, 2022 (version 1)
  5. Version of Record published: January 3, 2023 (version 2)

Copyright

© 2022, Nicholas et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

Jonathan Nicholas, Nathaniel D Daw, Daphna Shohamy (2022) Uncertainty alters the balance between incremental learning and episodic memory. eLife 11:e81679. https://doi.org/10.7554/eLife.81679
