LeaveOneTrialOut, LOTO, a general approach to link singletrial parameters of cognitive models to neural data
Abstract
A key goal of modelbased cognitive neuroscience is to estimate the trialbytrial fluctuations of cognitive model parameters in order to link these fluctuations to brain signals. However, previously developed methods are limited by being difficult to implement, timeconsuming, or modelspecific. Here, we propose an easy, efficient and general approach to estimating trialwise changes in parameters: LeaveOneTrialOut (LOTO). The rationale behind LOTO is that the difference between parameter estimates for the complete dataset and for the dataset with one omitted trial reflects the parameter value in the omitted trial. We show that LOTO is superior to estimating parameter values from single trials and compare it to previously proposed approaches. Furthermore, the method makes it possible to distinguish true variability in a parameter from noise and from other sources of variability. In our view, the practicability and generality of LOTO will advance research on tracking fluctuations in latent cognitive variables and linking them to neural data.
https://doi.org/10.7554/eLife.42607.001Introduction
Modelbased cognitive neuroscience attempts to link mathematical models of cognitive processes to neural data in order to advance our knowledge of the mind and brain (Forstmann et al., 2011). A particularly promising but challenging approach in this regard is to derive trialspecific values for parameters of cognitive models and to relate these values to trialspecific brain data, which offers insights into cognitive and neural principles at a highly detailed level of analysis (Gluth and Rieskamp, 2017; van Maanen et al., 2011; Turner et al., 2017; Wiecki et al., 2013). In the present work, we introduce a novel technique for capturing trialspecific values of model parameters that is efficient, widely applicable and easy to apply. We briefly summarize related work and methods before turning to our current proposal.
Primarily, the difficulty in linking momentary fluctuations in cognitive states to neural measures lies in specifying the variability in the cognitive model. Often, cognitive models specify the distribution from which this variability is assumed to originate (e.g., a normal distribution), but remain silent about its direction and extent in single trials. Previous attempts to capture parameter variability have often been specific to a single model. For example, van Maanen et al. (2011) derived maximum likelihood estimates for two singletrial parameters of the Linear Ballistic Accumulator (LBA) model (Brown and Heathcote, 2008). Because the LBA model is of special interest for the present paper, we describe it in some detail here. The LBA belongs to the class of sequential sampling models of decision making, which assume that decisions arise from an evidenceaccumulation process that continues until a threshold has been reached, indicating that the required amount of evidence for a decision has been gathered. Three critical parameters of the LBA are the rate of evidence accumulation (drift rate), the amount of evidence required to reach a decision (decision threshold), and the point from which the accumulation begins (start point). Importantly, sequential sampling models such as LBA predict accuracy rates and the shape of response times (RT) distributions conjointly. Wiecki et al. (2013) introduced a hierarchical Bayesian modeling tool for another sequential sampling model, the Diffusion Decision Model (DDM) (Ratcliff, 1978). Their tool allows the regresssion of singletrial parameters of the DDM onto trialbytrial neural measures such as electroencephalography (EEG) or eventrelated functional magnetic resonance imaging (fMRI) recordings (similar regressionbased approaches were proposed by Hawkins et al. (2017) and by Nunez et al. (2017)). Turner and colleagues proposed a related method, whose rationale is to model behavioral and neuroimaging data jointly under a common (hierarchical Bayesian) umbrella (Palestro et al., 2018; Turner et al., 2013; Turner et al., 2015). Although the approach was again tested within the DDM framework, it may be generalized to any other cognitive model. As we have argued before, however, the complexity of this approach might discourage many cognitive neuroscientists from applying it (Gluth and Rieskamp, 2017). Another, but less general, approach of modeling behavioral and brain data jointly was proposed by van Ravenzwaaij et al. (2017).
We recently proposed a comparatively simple and very general approach to capture trialbytrial variability in cognitive model parameters (Gluth and Rieskamp, 2017). Again, the approach is based on Bayesian principles: it specifies the posterior probability of a parameter value in a specific trial, using the ‘average’ parameter value across all trials and the behavior in the specific trial as the prior and likelihood, respectively. Thus, the method basically answers the question of how the difference between a person’s behavior in specific trials vs. that in all trials can be mapped (in a Bayesian optimal way) onto changes in a specific parameter of interest. For example, a surprisingly fast decision made by an otherwise very cautious decision maker might be attributed to a reduced decision threshold (in the context of sequential sampling models such as LBA or DDM), which then might be linked to altered neural activation in a specific region of the brain such as the presupplementary motor area (Gluth et al., 2012; van Maanen et al., 2011). While this approach is both general and comparatively simple, it can require high amounts of computation time, in particular when one seeks to estimate variability in more than one parameter simultaneously.
In the current work, we propose a novel approach to capturing trialbytrial variability in cognitive model parameters that tries to overcome the shortcomings of specificity, complexity, and inefficiency: LeaveOneTrialOut (LOTO). Briefly, the idea of LOTO is that the difference in the parameter estimates that are based on all trials vs. those that are based on all trials except a single trial provides a reflection of the ‘true’ parameter value in the leftout trial. The rest of the article is structured as follows: first, we introduce the rationale behind LOTO and exemplify this rationale with a simple toy model. Second, we show the circumstances under which LOTO can be expected to capture trialbytrial variability appropriately, and explain how LOTO’s performance can be improved by adapting its application, the underlying cognitive model or the experimental design. Third, we illustrate how one can test the assumption of the presence of systematic parameter variability. Fourth, we compare LOTO to previously proposed methods. Fifth, we present a simulationbased power analysis that aims to test whether LOTO can be expected to identify the neural correlates of trialbytrial fluctuations of model parameters with sufficient power and specificity. Finally, we provide an example analysis, in which we apply LOTO to link variability in a cognitive process (encoding of episodic memory) to brain signals (fMRI activation in the hippocampus). Although we believe that it is essential for the potential user to read the entire article (in order to understand when and why LOTO can be expected to provide acceptable results), we note that some parts of the article are rather technical and may be difficult to understand. Therefore, our suggestion for readers with limited statistical and mathematical knowledge is to read 'Results: II. The LOTO principle', and 'Results: XI.: An example of using LOTO for modelbased fMRI', which exemplifies the application of LOTO, and then to proceed directly to the discussion, in which we provide a ‘LOTO recipe’ that provides a stepbystep summary of how LOTO should be applied. The recipe then refers the reader back to relevant specific sections of the article.
Results
I. The LOTO principle
The rationale behind LOTO is simple and intuitive and can be summarized as follows:
When fitting a (cognitive) model to data, all n trials contribute to the model’s goodnessoffit statistic (e.g., loglikelihood) and thus to the estimation of parameter values.
A single trial, t, shifts the parameter estimation to the direction that accords with the behavior in trial t.
Therefore, if trial t is taken out and the model is fitted again to the n–1 dataset, the new parameter estimate will shift (slightly) to the opposite direction.
Therefore, the (true and unknown) parameter value in t is reflected in the difference between the parameter estimate of all trials and the parameter estimate of all trials except t.
Accordingly, the application of LOTO works as follows:
Estimate all parameters of your model by fitting the model to the entire dataset (e.g., all n trials of a subject).
Reestimate the parameter(s) of interest by fitting the model to the dataset without trial t; keep all other parameters fixed to their estimates derived in step 1.
Take the difference between the parameter estimate(s) of step 1 and step 2, which will provide a reflection of the parameter value in t (i.e., the difference should be correlated positively with the true parameter value).
Repeat the Steps 2 and 3 for all n trials.
You may then link the obtained vector of trialbytrial parameter values to any external measure that was recorded at the same time and that has an appropriate temporal resolution (e.g., eventrelated fMRI data, EEG data, singleunit recordings, eyetracking data, skin conductance responses; see also Gluth and Rieskamp, 2017).
Note that instead of taking the difference between the estimates for n and n–1 trials in step 3, one could simply multiply the n1 trials estimates by −1 to obtain LOTO estimates that are positively correlated with the true parameter values. However, we prefer taking the difference, as this nicely separates aboveaverage from belowaverage parameter values, which will be positive and negative, respectively. In general, it should be obvious that LOTO does not provide singletrial parameter estimates in an absolute sense. Rather, the method provides information about the direction and (relative) amount of deviation of singletrial parameter values from the average. Note that, in most cases, cognitive neuroscientists will be interested in the correlation between trialbytrial parameter values and neural data, for which absolute values do not matter (Gluth and Rieskamp, 2017). In the rest of the article, we sometimes refer to ‘LOTO estimates of a parameter’, but it should be clear that this refers to estimates in a relative sense only.
II. A ‘toy’ model example
In the following, we illustrate the LOTO principle using the binomial distribution as a ‘toy’ model example. In general, the binomial distributions specifies the probability of observing k successes out of n trials given probability θ, which specifies the probability of a success in each trial. Let us assume that we seek to find the estimate of θ that provides the best account of the observation of k successes in n trials. The Maximum Likelihood Estimate (MLE) of this is simply (e.g., Farrell and Lewandowsky, 2018):
Let us now assume that we believe that θ is not stable but may vary from trial to trial, and that our goal is to estimate this trialbytrial variability using LOTO. (Note that for this and the following sections, we simply assume that such trialbytrial variability exists; we address the issue of testing for the presence of trialbytrial variability in 'Results: Section V'. Also note that for this ‘toy’ model example, LOTO cannot provide more information about parameter θ than what is already given by the observations k; the goal of this section is to exemplify the workflow of LOTO rather than its capabilities.) As explained above, LOTO’s estimate for trial t is the difference between the n and the n–1 estimates of θ. For the binomial model, we thus obtain:
where k_{t} indicates whether a success was observed in trial t (k_{t} = 1) or not (k_{t} = 0). The critical question is whether we can expect the LOTO approach to capture changes in observations meaningfully. To answer this question, we take the derivative of Equation 2 with respect to k_{t} to see how LOTO’s estimate changes as a function of k_{t}:
Equation 3 implies that changes in singletrial observations map linearly and positively onto LOTO’s estimates, which is what we desired. To give an example, let us assume k = 5, n = 10, k_{t=1} = 1, and k_{t=2} = 0. Then LOTO’s estimates for t = 1 and t = 2 are 1/18 and −1/18, respectively, with the difference being 1/(n–1) = 1/9. Equation 3 also reveals a general property of LOTO, namely that the change in LOTO is inversely related to the number of trials: naturally, the influence of a single trial on the estimation of a parameter decreases as the number of trials increases. We will further elaborate on this issue in 'Results: Section VII'. Also, we see that LOTO_{t} is not an estimate of θ for trial t in an absolute sense (e.g., a θ of −1/18 is impossible), but that positive and negative LOTO_{t} values indicate above and belowaverage singletrial values, respectively.
So far, we have only established a sensible relationship between LOTO’s estimate and the observation k_{t}, but ultimately we are interested in whether LOTO is also related to the underlying trialwise (datagenerating) parameter θ_{t}. This depends on how a change in θ_{t} is mapped onto a change in k_{t}, which is given by the Fisher information, the variance of the first derivative of the loglikelihood function with respect to θ_{t}. For the binomial distribution with n = 1 (i.e., the Bernoulli distribution), the Fisher information is (e.g., Ly et al., 2017):
As can be seen in the left panel of Figure 1A, the Fisher information of the Bernoulli distribution indicates that observations are more informative about the underlying parameter at the extremes. Accordingly, LOTO’s performance (i.e., the correlation between the true θ_{t} and LOTO’s estimates of it) will also depend on the range of θ_{t}. To illustrate this, we ran simulations by drawing 300 values of θ_{t} from a uniform distribution with a range of .2 (e.g., between .4 and .6) and repeated this for every range from [0, .2] to [.8, 1] in steps of .01 (detailed specifications of all simulations are provided in 'Materials and methods'). LOTO was then used to recover θ_{t}. The middle panel of Figure 1A depicts the average correlation between θ_{t} and its LOTO estimate as a function of the average θ. As predicted by the Fisher information, the correlation is higher when θ is close to 0 or close to 1. Even though we do not further discuss the Fisher information in the context of the more complex models in the following sections, the principles of Fisher information also apply there (including the fact that more extreme parameter values are most informative).
Notably, the (average) correlations between θ_{t} and LOTO’s estimate are quite low (i.e., < .2). The reason for this can be seen in the right panel of Figure 1A, which shows the correlation for an example of 300 trials in the range [.4, .6]: the estimate of LOTO can only take two values, depending in whether a success was observed in t or not. Fortunately, the performance can be improved by assuming that θ does not change from trial to trial but remains constant over a certain number of trials, which we denote m. The larger m, the more different values LOTO can take (and at the same time, our confidence in the estimate of θ increases). LOTO’s estimate, its derivative with respect to m, and the Fisher information for the Bernoulli distribution generalized to m ≥ 1 (i.e., the binomial distribution) are:
Figure 1B illustrates the increases in Fisher information and LOTO’s performance as m increases. This is true, even though we kept the number of draws to 300 (implying that when m = 20, there are only n/m = 300/20 = 15 values estimated by LOTO; see Figure 1B, right panel). Thus, it might be more promising to track fluctuations over multiple trials (e.g., short blocks or sessions) than to try to capture parameter values in each and every trial. For example, Gluth et al. (2015) investigated memorybased decisions and inferred whether an item had been remembered on the basis of three trials that included the item as a choice option (see also 'Results: Section XI', in which we use LOTO to reproduce this analysis).
We see that LOTO can be applied to the Bernoulli and binomial distributions, but it should be (again) noted that LOTO is not particularly helpful in these cases. This is because LOTO will not provide us with any novel information over and above what is already known from the observations. In other words, LOTO’s estimates are perfectly correlated with k_{t} or k_{m}, which is mirrored in the derivatives (Equations 3 and 5b) which depend only on constants n or m. In principle, there are two ways to make LOTO (and any other method for tracking trialbytrial variability in parameters) more useful. First, one could introduce variations in the task, with these variations being directly linked to the predictions of the cognitive model. Second, one could apply cognitive models that predict a ‘richer’ set of data (e.g., sequential sampling models that predict accuracy rates and RT distributions conjointly; see above). In the next section, we first turn to introducing variations in the task.
III. Task variations and comparison with ‘singletrial fitting’
We demonstrate the importance of using variations in the experimental design that are systematically linked to the prediction of the cognitive model within the context of studying intertemporal choices. Intertemporal choices are decisions between a smaller reward available now or in near future vs. a larger reward available only in the more distant future (an example trial of a typical intertemporal choice task could be: ‘Do you want to receive $20 now or $50 in 20 days?'). Traditionally, these decisions have been modeled using temporal discounting models that describe the decrease in utility u_{i} of an option i as a function of the delay d_{i} of its payout (Frederick et al., 2002). We apply the hyperbolic discounting (HD) model (Mazur, 1987), according to which:
where x_{i} refers to the amount of reward and κ is the discount rate (i.e., a free parameter modeling the degree to which utilities are lowered by delays; 0 ≤ κ ≤ 1). To derive choice probabilities (that can then be subjected to MLE for the estimation of κ), the HD model is usually combined with the logistic (or softmax) choice function (e.g., Gluth et al., 2017; Peters et al., 2012), according to which the probability p_{i} of choosing option i from a set of options including i and j is:
where β is a second free parameter modeling the sensitivity of the decision maker to utility differences (β ≥ 0).
Our goal here is to show that by applying LOTO, we can extract information about parameter κ (which we assume to vary from trial to trial) that goes beyond the information already provided by the observations. We do this by looking at the first derivatives of the loglikelihood of the HD model with respect to κ. Let us assume that option i is an ‘immediate’ option, offering amount x_{i} at delay d_{i} = 0, so that u_{i} = x_{i}, and that option j is a ‘delayed’ option, offering amount x_{j} (x_{j} > x_{i}) at delay d_{j} (d_{j} > 0). It turns out that the first derivative of the loglikelihood for choosing the immediate option is:
Detailed derivations are provided in the 'Materials and methods'. Similarly, the first derivative of the loglikelihood for choosing the delayed option is:
Note that the MLE can be obtained by setting these derivatives to 0. Importantly, we see that in contrast to the derivatives of the binomial distribution, the derivatives of the HD model depend on features of the task, namely on the amount and delay of the delayed option (i.e., x_{j} and d_{j}) and on the amount of the immediate option (i.e., x_{i}), which is hidden in p_{i}. At first glance, this already looks as if the estimation of singletrial values of κ_{t} will vary with the task’s features, which is what we desired. Note, however, that both of the terms on the righthand side of the derivatives are ≥ 0. Thus, by setting the derivatives in Equation 8a and Equation 8b to 0 to obtain the MLEs, parameter κ_{t} will take the highest possible value (i.e., κ_{t} = 1) whenever the immediate option is chosen and the lowest possible value (i.e., κ_{t} = 0) whenever the delayed option is chosen. (Intuitively speaking: the best account for choosing the immediate option is to assume maximal impatience, the best account for choosing the delayed option is to assume is maximal patience.) This is illustrated in Figure 2A and B, which depict the loglikelihoods of choosing i (left panels) or j (middle panels) and their derivatives as a function of κ for two different trials with x_{i},_{1} = 20, x_{j},_{1} = 50, and d_{j},_{1} = 20 in trial 1 (bright colors) and x_{i},_{2} = 18, x_{j},_{2} = 100, and d_{j},_{2} = 25 in trial 2 (dark colors). Independent of the features of the task (i.e., the trialspecific amount and delays), the loglikelihoods of choosing i are monotonically increasing, and their derivatives are always positive, whereas the loglikelihoods of choosing j are monotonically decreasing, and their derivatives are always negative.
Critically, however, this caveat disappears as soon as each option (i and j) is chosen at least once. For instance, when keeping x_{i}, x_{j}, and d_{j} constant over two trials in which both i and j are chosen once, the derivative of the loglikelihood becomes:
which is positive if p_{i} < .5 but negative if p_{i} > .5. The right panels of Figure 2A and B show the loglikelihoods and their derivatives when assuming that trial 1 or trial 2 were presented twice, and i was chosen in one occasion and j was chosen in the other occasion. Here, the derivatives cross the 0line, and even more importantly, they do so at different values of κ for the different trials (the MLE of κ for trial 2 is higher because the immediate option is comparatively less attractive than in trial 1).
Why does this allow LOTO to extract more information about κ_{t} than that given by the observations? Recall that the estimate of LOTO is the difference between the MLE of κ for all trials and the MLE for all trials except t. As explicated above, these two MLEs will depend on the task features in the n/n–1 trials as long as both choice options (e.g., immediate, delayed) have been chosen at least once in the n/n–1 trials (so that we get nonextreme estimates of κ from the n/n–1 trials). In principle, LOTO can therefore be applied as soon as both options have been chosen at least twice over the course of an experiment (if one option has been chosen in only one trial, then the estimate of κ when excluding this particular trial will be extreme as in the singletrial case). We illustrate this using the two example trials from Figure 2 at the end of the 'Materials and methods'.
It follows that LOTO should outperform the arguably most elementary method of capturing trialbytrial parameter variability, which we denote ‘singletrial fitting’. With singletrial fitting, we mean that after estimating all parameters of a model once for all trials, a specific parameter of interest (e.g., κ of the HD model) is fitted again to each trial individually (while all other parameters are fixed to their ‘alltrial’ estimates). Here, we show that LOTO but not singletrial fitting of parameter κ of the HD model is able to extract additional information about the true κ_{t}, over and above what is already provided by the observations. Thereto, we generated 160 trials with a constant immediate choice option and variable delayed choice options, and simulated decisions of a participant with κ_{t} varying from trial to trial. Then, we applied LOTO and the singletrial fitting method to recover κ_{t}. As can be seen in Figure 3, the estimates of LOTO (left panel) but not of singletrial fitting (right panel) are correlated with the true κ_{t} after accounting for the observations (i.e., whether the immediate or the delayed option was chosen). In other words, the positive correlation between κ_{t} and the singletrial estimates (dashed magenta line) can be attributed entirely to the fact that higher values of κ_{t} lead to more choices of the immediate option, but the positive correlation between κ_{t} and the LOTO estimates is further driven by how expectable or surprising a decision is (given the trial features x_{i,t}, x_{j,t}, and d_{j,t}).
To summarize, one approach to extract information about trialbytrial changes in a parameter (over and above what is already provided by the observations) is to introduce variations in the experiment that are linked in a systematic manner to the cognitive model. Because its estimates are not solely dependent on the behavior in a single trial but also take all remaining trials into account, LOTO allows to capture this additional information. Note, however, that for specific models, such as sequential sampling models of decision making, LOTO can be useful even without introducing task variations. In the next section, we will look at further improvements in LOTO’s ability to recover trialspecific parameter values by extending the HD model to allow the prediction of choice and RT conjointly.
IV. LOTO benefits from modeling an enriched set of data
Using LOTO to recover trialspecific parameter values of models that predict only binary outcomes, such as choices or accuracy rates, is possible but somewhat limited. This can be seen when looking at the left panel of Figure 3, which suggests that LOTO’s estimates are always higher when the immediate option was chosen. The reason for this is that taking one trial out in which the immediate/delayed option was chosen will always de/increase the estimation of the discount factor κ, because (as stated above) the singletrial loglikelihood is monotonically in/decreasing given the choice. (The magnitude of this change varies as a function of the trialspecific choice options, but the sign of this change is determined by the choice.) However, this strict choicedependent separation might not be plausible; for example, it could very well be that the delayed option is chosen in a trial in which this option offered a very high reward x_{j} with a very low delay d_{j}, even though the actual discount factor in that trial (i.e., κ_{t}) was higher than on average. To overcome this limitation, we suggest that the model is extended so that it can take RT data into account. Going back to the example, the intuition is that even if the highly attractive delayed option is chosen, a high discount factor κ_{t} will make this decision more difficult, which will result in a comparatively long RT. Applying LOTO to the extended model then allows to capture this RTspecific information.
We extended the HD model by adding the LBA model to it (using an approach that is similar but not equivalent to that used by Rodriguez et al., 2014 and Rodriguez et al., 2015). Specifically, we assume that the difference in utility between the immediate and delayed options is mapped onto their LBA’s expected drift rates as follows:
where λ is a free parameter that scales the relationship between utility and expected drift rate. Note that these definitions ensure that the two expected drift rates sum up to 1, following the original specification of the LBA (Brown and Heathcote, 2008). More specifically, ν_{i}, and ν_{j} represent the means of two independent normal distributions (with standard deviation s), from which the actual drift rates in a trial are assumed to be drawn (see 'Materials and methods'). Choices and RT were simulated for 160 trials of 30 participants and LOTO was applied to the HD and the HDLBA models. Singletrial fitting was also applied to the HDLBA model. Figure 4A depicts correlations of an example participant between the true κ_{t} and LOTO’s estimate for HD (left panel) and for HDLBA (right panel). For HDLBA, the correlation is much improved, and LOTO’s estimates do not strictly separate choices of the immediate and delayed options anymore. The improved performance of LOTO is confirmed when comparing the correlations of all 30 simulated participants (Figure 4C; immediate chosen: t(29) = 10.79, p < .001, Cohen’s d = 1.97, Bayes Factor in favor of the alternative hypothesis BF_{10} > 1,000; delayed chosen: t(29) = 8.79, p < .001, d = 1.61, BF_{10} > 1,000).
Notably, the singletrial fitting method applied to the HDLBA model is also able to capture some variability over and above the information that is already contained in the decisions (Figure 4). This is because RT (and the interplay between RT and decisions) add further information about κ_{t} that can be picked up by singletrial fitting. Nonetheless, applying LOTO to HDLBA yields significantly better recovery of the trialwise parameter values (Figure 4C; immediate chosen: t(29) = 7.24, p < .001, d = 1.32, BF_{10} > 1,000; delayed chosen: t(29) = 7.71, p < .001, d = 1.41, BF_{10} > 1,000). Taken together, recovering trialbytrial changes in a parameter of a cognitive model with LOTO (but also with other techniques) is much more efficient when the model is able to predict a rich set of singletrial observations, such as RT (and possibly even continuous ratings of choice confidence as in Pleskac and Busemeyer, 2010) in addition to choices or accuracy rates.
V. A nonparametric test for the presence of systematic trialbytrial variability
So far, our derivations and simulations have rested on the assumption that systematic variability in the parameter of interest exists. As we have argued before, however, this assumption does not have to be correct — not even when the overt behavior appears to be stochastic (Gluth and Rieskamp, 2017). It could be that stochasticity is induced by random fluctuations that are not accounted for by the cognitive model (e.g., randomness in motor execution). When assuming variability in a specific parameter, the risk is that such unsystematic variability is taken up by LOTO even in the absence of any systematic variability in the parameter. Hence, the question is whether one can separate situations in which LOTO captures systematic vs. unsystematic trialbytrial variability.
To test for the presence of systematic trialbytrial variability, we propose a nonparametric and simulationbased approach that exploits the improvement in model fit by LOTO: if a model is fitted to n–1 trials, the fit statistic (e.g., the deviance) will improve for two reasons: i) the statistic for the leftout trial is excluded; and ii) the parameter estimate is adjusted so that it optimally fits the n–1 dataset. The rationale of our proposed approach is that the improvement in fit that is attributable to the second reason (i.e., the adjustment) will be lower when there is no systematic trialbytrial variability in the parameter of interest compared to when there is systematic variability. Practically, we suggest conducting this test in five steps:
For each participant, estimate parameters of the cognitive model for all trials and compute the goodnessoffit per trial.
For each participant, generate many (e.g., 1,000) simulations of the model for all trials and all participants using the participants’ estimated parameters and assuming no variability in the parameter of interest (i.e., the null hypothesis).
Estimate parameters for these simulated data again, apply LOTO, and compute the LOTObased improvement in goodnessoffit for the n–1 trials.
Do the same as in step 3, but for the empirical data.
Test whether the LOTObased improvement in fit for the empirical data is greater than 95% of the improvements in the simulations that were generated under the null hypothesis.
Note that one advantage of this nonparametric approach is that it does not rely on any assumptions about how the improvement of model fit is distributed under the null hypothesis.
Figure 5A shows the results of this test for the HD model introduced in 'Results: Section III'.
We generated 1,000 simulations per participant under the null hypothesis and 10 additional simulations with increasing levels of variability in parameter κ (for details, see 'Materials and methods'). Importantly, the figure shows the average difference in deviance between the ‘alltrial’ fit and the LOTO fits only for the n–1 trials that were included in each LOTO step. This means that the trivial improvement in fit resulting from the fact that LOTO has to explain one trial less is taken out. Thus, the histogram in Figure 5A, which shows the results for the 1,000 simulations under the null hypothesis, confirms our suspicion that LOTO will improve the fit even in the absence of true trialbytrial variability in parameter κ. However, the continuous vertical lines, which show the results for the simulations with true variability of increasing amount (from blue to red), indicate significantly higher improvements in fit when variability is actually present — except for the lowest level of variability (the dashed black line indicates the 95% threshold).
What happens if there is no systematic variability in the parameter of interest (here: κ), but there is systematic variability in another parameter (here: β)? We can repeat the previous analysis but now simulate variability (of increasing amount) in β instead of κ. (Note that variability in β would mean that the utility difference between choice options is mapped onto the decision in a more/less deterministic way, depending on whether β_{t} is high or low in a trial; for instance, choosing the immediate option in a trial in which the delayed option is very attractive could be attributed to a low β_{t} value instead of a high κ_{t} value.) As can be seen in Figure 5B, the improvement in goodnessoffit when trying to explain variability in β by modeling variability in κ using LOTO does not exceed the improvement under the null hypothesis, and there is no systematic relationship with the amount of variability in β. Thus, we can conclude that finding a significantly higher improvement in fit in our data compared to simulated data under the null hypothesis cannot be due to a misattribution of variability in β to variability in κ. (Of course, this conclusion does not have to hold for other models, parameter sets, and tasks; the test has to be conducted anew for every different situation.)
Variability in behavior (and neural activity) may derive not only from variability in model parameters but also from the processes used to infer the input and output signals of a computation (Drugowitsch et al., 2016; Findling et al., 2018; Loomes et al., 2002; Polanía et al., 2019). For example, choosing the immediate or delayed option in the intertemporal choice task might depend not only on the current impulsiveness (i.e., κ) but also on whether the utility of the delayed option (i.e., u_{j}) is computed correctly or whether the amount x_{j} and the delay d_{j} are perceived accurately. Thus, it might be desirable to examine whether LOTO can dissociate potential variability in these features from variability in the parameter of interest.
Accordingly, we simulated data of the HD model without trialbytrial variability in κ but with variability in either the amounts x_{j}, the delays d_{j} or the computation of utilities u_{j} of the delayed option. (Note that we ensured that the degree of variability in these variables was similar to that of κ in the previous simulations; see 'Materials and methods'.) Then, we applied LOTO to these simulated data to test whether LOTO misattributes the variability to κ. As can be seen in Figure 5—figure supplement 1, LOTO does not misattribute variability in amounts, but for 3 out of 10 simulations of delaybased variability and for 4 of 10 simulations of utilitybased variability, a misattribution could not be ruled out. This appears reasonable given the tight connection of the parameter to both delays and utilities (see Equation 6).
How can the separability of the discount factor κ from the input and output variables x, d, and u be improved? We reasoned that changing the intertemporal choice task by increasing the number of options should help: whereas the discount factor can be assumed to be stable across options (and to vary only from trial to trial), the variability in the input and output variables can be assumed to differ between options (i.e., they are drawn from independent distributions for each option). Accordingly, we extended the simulated task to three options, which can still be considered a reasonable experimental design (e.g., Gluth et al., 2017). We then performed the test for systematic variability using simulated data for this extended task that exhibited true variability in either κ, x, d, or u. As predicted, a significant improvement in model fit over and above the improvement under the null hypothesis of no variability was only found for κ but not for any of the input or output variables (Figure 5—figure supplement 2). In conclusion, LOTO might be an efficient tool to dissociate parameter variability from variability in the input and output signals of computational processes, assuming an appropriate experimental design and careful simulations.
VI. Inferring trialbytrial variability in more than one parameter
An advantageous feature of LOTO is that the duration of applying the method is almost independent of the number of parameters for which trialbytrial variability is inferred. The full model is still estimated once for all n trials, and LOTO is still applied n times (but the estimation might take a bit longer for multiple parameters). In this section, we illustrate the application of LOTO to inferring trialbytrial variability in two parameters of the HDLBA model: the discount factor κ and the decision threshold b (for details, see 'Materials and methods').
First, we used the nonparametric test from the previous section to check for the presence of systematic variability. In doing this, we simulated 1,000 sets of 30 participants under the null hypothesis and tested whether the simulation with true variability in κ and b provides a significantly improved deviance than those simulations. This was confirmed as only 19 of the 1,000 simulations under the null hypothesis provided a better fit. It should be noted that the explanatory power of this test is somewhat limited, because we can only conclude that there is systematic variability in at least one of the two parameters but not necessarily in both of them.
Next, we looked at the correlations between the true κ_{t} and b_{t} parameter values and their LOTO estimates for an example participant. As shown in the upper left panel of Figure 6A, LOTO is still capable of recovering the variability in parameter κ from trial to trial for both types of decisions (i.e., immediate and delayed). Similarly, LOTO captures the variability in parameter b (lower right panel). The lower left and upper right panels in Figure 6A show the correlations between the true κ_{t} and b_{t} with the ‘wrong’ LOTO estimates of b_{t}and κ_{t}, respectively. Ideally, one would desire correlations close to zero here, that is, LOTO should not attribute variability in one parameter to variability in another parameter. However, we see that there are modest correlations whose signs depend on the choice. When the immediate option was chosen, the correlations are negative; when the delayed option was chosen, the correlations are positive. These results are confirmed when looking at the average correlations across all 30 simulated participants (Figure 6B): LOTO attributes variability in the ‘correct’ parameters independent of choice and variability in the ‘wrong’ parameters depending on which option was chosen. (Intuitively, a high discount factor κ_{t} competes with the decision threshold b_{t} as an alternative explanation for fast choices of the immediate option or slow choices of the delayed option. Similarly, a high decision threshold b_{t} competes with the discount factor κ_{t} as an alternative explanation for choices of the option with highest utility.)
Importantly, these results indicate that the observations (i.e., choices and RT) could explain away the undesired correlations while not affecting the desired correlations. To test this, we conducted randomeffects linear regression analyses with the true parameter (κ_{t}, b_{t}) as the dependent variable and with the LOTO estimates of these parameters together with choices and RT as independent variables (Figure 6C). The standardized regression coefficient (averaged across simulated participants) for LOTO’s estimate of κ_{t} when explaining variance in true κ_{t} was indeed significantly positive with a very high effect size (t(29) = 28.48, p < .001, d = 5.20, BF_{10} > 1,000). The coefficient for the ‘wrong’ estimate of b_{t} when explaining variance in κ_{t} also reached significance, but the effect was much weaker (t(29) = 3.01, p = .005, d = 0.55, BF_{10} = 7.74). With respect to explaining variance in true b_{t}, only LOTO’s estimate of the ‘correct’ estimate of b_{t} was significant and strong (t(29) = 30.40, p < .001, d = 5.55, BF_{10} > 1,000), whereas the effect for the ‘wrong’ estimate of κ_{t} did not reach significance (t(29) = −1.93, p = .063, d = −0.45, BF_{10} = 0.99).
Taken together, LOTO makes it possible to infer trialbytrial variability in more than one parameter within the same model, revealing information over and above that provided by the observations. At the same time, LOTO’s misattributions of variability to other parameters are largely explained away by the observations. (As before, these conclusions are restricted to the current setting of task, model, and parameters, and have to be tested anew for different settings.) Notably, the randomeffects linear regression analysis conducted above is very similar to the conventional approach of analyzing fMRI data on the basis of the General Linear Model (Friston et al., 1994). The ultimate question is not whether LOTO induces some significant misattribution or not, but whether this misattribution can in turn be expected to lead to wrong inferences about the relationship between brain activity and variability in cognitive parameters. We further elaborate on this issue in 'Results: Section X'.
VII. The performance of LOTO as a function of the number of trials
Trial numbers in neuroimaging studies are often low. fMRI studies are particularly limited because scanning hours are expensive, and because the low temporal resolution of fMRI requires long intertrial intervals in the case of eventrelated fMRI (which is most suitable for trialbytrial analyses). Therefore, it is important to understand the extent to which LOTO’s accuracy depends on the number of trials. To test this, we repeated the simulations of the HD and the HDLBA models for 20, 40, 80, 160, 320, and 640 trials. As shown in Figure 7A, the average extent of the correlations between the true parameter values and the respective estimates of LOTO are largely unaffected by the number of trials. The recovery performance is slightly reduced for the threshold parameter b of the HDLBA model in the case of 20 trials. Note, however, that the variance of the correlations decreases as the number of trials increases. Therefore, LOTO still benefits from increasing the number of trials in terms of achieving higher statistical power, as we show in 'Results: Section X'.
In 'Results: Section II', we mentioned that the impact of a single trial on the overall model fit, as well as on the LOTO estimate, decreases as the number of trials increases. If the number of trials is very high, then the improvement in model fit that results from a change in the parameter value might become smaller than the default tolerance criterion set by the minimization algorithm. In other words, the algorithm assumes convergence because the change in model fit caused by taking one trial out is insignificantly small. Figure 7B illustrates this phenomenon. The upper panels show the results for parameter κ of the HD model for 160 trials, the lower panels for 1,600 trials. In the left panels, a strict tolerance criterion was used; in the right panels, the default tolerance criterion in MATLAB was used (for details, see 'Materials and methods'). In the case of 1,600 trials, the tolerance criterion matters: if the default criterion is used, the LOTO estimates exhibit discrete jumps, that is, the same parameter value is estimated for trials with relatively similar observations, because a more finegraded estimation does not improve the fit beyond the default tolerance criterion. This is not the case for the strict tolerance criterion.
According to the example in Figure 7B, the default tolerance criterion appears to lower the correlation between the true values and the LOTO estimates in the case of 1,600 trials, but only to a minor extent. This observation is confirmed when comparing the correlations across all 30 simulated participants. In the case of 160 trials, the correlations obtained using the strict and the default tolerance criterion are not significantly different (t(29) = 1.27, p = .213, d = 0.23, BF_{10} = 0.40). In the case of 1,600 trials, there is a significant difference (t(29) = 4.59, p < .001, d = 0.84, BF_{10} = 323.03), but the difference is small in absolute terms (i.e., on average the correlation is reduced by only .003). In general, we recommend checking the potential impact of the tolerance criterion on LOTO’s performance by running simulations, in particular when LOTO is applied to a task with many trials (i.e., ≥ 1,000).
VIII. Invariance of LOTO with respect to the underlying parameter distribution
In contrast to some other methods for inferring trialbytrial variability in model parameters (van Maanen et al., 2011; Turner et al., 2015), LOTO does not require the assumption that parameters are drawn from specific distributions. Even though it is not the purpose of LOTO, one might wonder whether the distributions of parameters and of LOTO’s estimates can be expected to be similar. To answer this question, we reverted to the initial ‘toy’ model example of the binomial distribution ('Results: Section II'). For this purpose, we sampled 10,000 Bernoulli events, which we split into 500 ‘blocks’ of 20 trials each (see 'Materials and methods'). For each block, parameter θ_{m} was drawn from either a normal or a uniform distribution and observations k_{m} were then drawn from binomial distributions given θ_{m}. Finally, LOTO was applied to infer variability in θ.
The left and right panels of Figure 8A depict the normally and uniformly distributed parameter values of θ_{m}, respectively. Similarly, the left and right panels of Figure 8B show the distributions of the corresponding LOTO estimates. The distributions of the true parameter values are markedly different, whereas the distributions of LOTO estimates are very similar. The reason for this can be seen in Figure 8C, which depicts the distribution of observations k_{m}, which perfectly match the LOTO distributions (because the observations and LOTO estimates are perfectly correlated for this ‘toy’ model; see 'Results: Section II'). By definition, the observations are distributed binomially and so are the LOTO estimates. Generally speaking, the distribution of LOTO estimates is mostly invariant to the distribution of the underlying parameters but depends much more on how the parameter values map onto the observations via the model and its likelihood function.
To generalize these conclusions to the more interesting case of a cognitive model, we sampled singletrial parameter values of κ (i.e., the discount factor) and b (i.e., the decision threshold) of the HDLBA model either from a normal distribution or from a uniform distribution (details are provided in 'Materials and methods'). The results for parameter κ and for parameter b are presented in Figure 8—figure supplement 1 and Figure 8—figure supplement 2, respectively. Again, the distributions of LOTO estimates appear to be invariant to the shape of the underlying parameter distributions. In this case of the HDLBA model, the parameter values and LOTO’s estimates are linked via the LBA likelihood function. Most of all, this function depends on the drift rates that are drawn from normal distributions. Expectedly, the LOTO estimates appear to be roughly normally distributed. Importantly, our simulations of the HDLBA model also suggest that the performance of LOTO (in terms of the correlation between true parameter values and LOTO estimates) does not seem to depend much on the distribution of the underlying parameter values. Hence, we conclude that LOTO cannot be used to infer the shape of the underlying parameter distribution. Despite this limitation, LOTO is highly efficient in detecting significant relationships between cognitive model parameters and neural data, regardless of the underlying parameter distribution.
IX. Comparison of LOTO with related approaches
In this section, we compare LOTO’s ability to recover the underlying trialbytrial variability of cognitive model parameters with alternative techniques. More specifically, we generated synthetic data (i.e., choices and RT) from the LBA model (Brown and Heathcote, 2008), which assumes that the trialwise drift rates ν_{t} and start points A_{t} are drawn from normal and uniform distributions, respectively (details are provided in the 'Materials and methods'). We compare LOTO with three other methods: i) with our previously proposed technique (Gluth and Rieskamp, 2017; henceforth ‘GR17’), ii) with ‘singletrial fitting’, and iii) with the ‘SingleTrial LBA’ approach (van Maanen et al., 2011; henceforth ‘STLBA’), which was specifically developed for retrieving maximumlikelihood estimates of trialwise drift rates and start points of the LBA.
Figure 9 shows correlations between the true drift rates and start points and the values recovered by each method for an example simulation (upper panels), together with average correlations for multiple simulated experiments (lowest panels). From the example correlations, we see that LOTO and GR17 yield very similar results. This is to be expected because the two approaches employ similar principles. In both cases, parameters are estimated for all trials, and those parameters for which trialbytrial variability is not assumed are fixed to their alltrial estimates; then, both approaches use the discrepancy between the observation in each trial and the observations over all trials to infer plausible trialspecific values. However, the comparison across multiple simulations reveals that LOTO performs significantly better than GR17 for both parameters (drift rate: t(19) = 3.76, p = .001, d = 0.84, BF_{10} = 28.75; start point: t(19) = 4.86, p < .001, d = 1.09, BF_{10} = 259.17). The reason for this is that the performance of GR17 depends on the step size with which the range of possible parameter values is scanned, and we were forced to choose an immediate step size to avoid excessive computation times. As a consequence, the GR17 estimates exhibit discrete jumps (similar to those mentioned in the previous section when applying LOTO with a too lenient tolerance criterion), which lower the correlation with the true parameters. In other words, the advantage of LOTO over GR17 is its higher processing speed, which is particularly relevant when modeling variability in multiple parameters.
The ‘singletrial fitting’ approach yields acceptable results for the drift rate, but is still outperformed by LOTO (t(19) = 19.72, p < .001, d = 4.41, BF_{10} > 1,000). More dramatically, the method fails completely with respect to recovering the start point values. In our view, this is because adjusting the drift rate is sufficient to account for the observations in a single trial (i.e., whether the choice was correct or not and how fast the choice was), so that there is no information left to be captured by the start point. By contrast, LOTO (and GR17) estimates depend on how the likelihood space of the LBA model changes when one trial is taken out, and this likelihood space depends on both parameters.
Finally, the STLBA method performs worse than LOTO with respect to the drift rate (t(19) = 52.60, p < .001, d = 11.76, BF_{10} > 1,000) but better with respect to the start point (t(19) = −30.08, p < .001, d = −6.73, BF_{10} > 1,000). Looking at the example simulations in Figure 9, we see that this is because many singletrial drift rates (but not start points) of the STLBA are equal to the ‘alltrial estimate’. This is a consequence of LBA’s assumption that drift rates are drawn from a normal distribution whereas start points are drawn from a uniform distribution, which induces a ‘ridge’ in the joint density distribution on which one drift rate value (namely the mean of the normal distribution) together with many start points are the maximumlikelihood estimates (L. van Maanen; personal communication, February 2018). Overall, LOTO appears to provide comparatively good, and in fact mostly superior, results when compared to alternative approaches for capturing trialbytrial variability in cognitive model parameters.
The previous comparison does not include arguably the most powerful approach for linking cognitive models with neural data, which is the joint modeling approach (Turner et al., 2015). This method employs hierarchical Bayesian modeling (Farrell and Lewandowsky, 2018; Lee and Wagenmakers, 2013) to draw both behavioral and neural parameters from a joint, higherorder probability distribution and to fit models to both behavioral and neural data. To compare LOTO to the joint modeling approach, we first implemented Bayesian modeling to obtain singletrial estimates of parameter κ of the HD model (details are provided in the 'Materials and methods'). Then, we extended this to the joint modeling approach by simulating neural data F(κ)_{t}, which we assumed to correlate with κ_{t}, and included these data when estimating the model. Thus, the following comparison comprises three methods: LOTO, Bayesian modeling (without joint modeling), and the joint modeling approach.
We compared the three methods with respect to the correlation between the true values of κ_{t} and the methods’ estimates (Figure 9—figure supplement 1A) and with respect to the correlation between the neural data F(κ)_{t} and the methods’ estimates (Figure 9—figure supplement 1B). First of all, all three methods provided significant relationships with the true model parameters and with the neural data (all p < .001, all BF_{10} > 1,000). Expectedly, the correlations illustrate that the Bayesian modeling and the joint modeling approach yielded similar estimates. Averaged over the simulated participants, however, the joint modeling approach outperformed (nonjoint) Bayesian modeling with respect to both κ_{t} (t(29) = 3.61, p = .001, d = 0.66, BF_{10} = 21.96) and F(κ)_{t} (t(29) = 4.32, p < .001, d = 0.79, BF_{10} = 131.84). Remarkably, LOTO outperformed Bayesian modeling (t(29) = 2.82, p = .009, d = 0.51, BF_{10} = 6.07) as well as the joint modeling approach (t(29) = 2.51, p = .018, d = 0.46, BF_{10} = 3.33) with respect to κ_{t}. With respect to the neural signal F(κ)_{t}, however, LOTO yielded similar correlations compared to Bayesian modeling (t(29) = −0.62, p = .540, d = −0.11, BF_{10} = 0.26) and was outperformed by the joint modeling approach (t(29) = −3.03, p = .005, d = −0.55, BF_{10} = 13.78). Overall, the joint modeling approach did indeed provide the best resultsin terms of linking the singletrial neural data to the cognitive model, but the differences between the methods appear to be comparatively small. Taking flexibility and difficulty of implementation into account, LOTO appears to offer an expedient alternative to joint modeling (see also 'Discussion').
X. A power analysis for linking LOTO estimates to neural data
The ultimate goal of LOTO (and related approaches) is to establish a link between the inferred trialbytrial variability in parameters of cognitive models and the brain activity measured with tools such as fMRI or EEG. Therefore, the question of whether LOTO is an appropriate tool for a given experiment with a specific design (including task, number of trials, number of participants etc.), cognitive model, and parameter(s) of interest should be answered by testing whether LOTO can be expected to recover the (truly existing) correlation between a neural signal and a parameter. In the case of modeling variability in more than one parameter, it is also important to test whether the potential misattribution of variability by LOTO may lead to incorrect inferences regarding the brain–parameter relationship (see 'Results: Sections V and VI'). We emphasize here that these requirements are not unique to LOTO. Any approach for investigating the neural correlates of trialbytrial parameter variability should be accompanied by such analyses of statistical power (i.e., sensitivity) and falsepositive rates (i.e., specificity).
To link LOTO estimates to neural data, we reverted to the HDLBA model for which we have inferred variability in two parameters (i.e., discount factor κ and decision threshold b) in 'Results: Section VI'. When simulating trialspecific parameter values κ_{t} and b_{t}, as well as choices and RT, we also generated two hypothetical neural signals F(κ)_{t} and F(b)_{t} that were positively correlated with κ_{t} and b_{t}, respectively (for details, see 'Materials and methods'). Then, we generated 1,000 datasets of 30 participants with 160 trials each and tested how often the LOTO estimate for a parameter (e.g., κ) correctly explains a significant amount of variability in the associated neural signal (e.g., F[κ]), and how often it incorrectly explains a significant amount of variability in the neural signal associated with the other parameter (e.g., F[b]) over and above that explained by the observations. This was realized by a randomeffects linear regression analysis – similar to the General Linear Model approach in fMRI analysis (Friston et al., 1994). Stated differently, we tested i) the sensitivity of LOTO by means of a power analysis, asking how likely it is that LOTO identifies the correct neural signal, and ii) the specificity of LOTO by means of a falsepositive analysis, asking how likely it is that LOTO misattributes a neural signal to a different parameter.
Figure 10A depicts the histograms of pvalues for tests of whether LOTO’s estimates for κ and b are significantly related to the brain signals F(κ)_{t} (left panel) and F(b)_{t} (right panel), using the setting with 30 participants. According to these results, LOTO achieves a statistical power (or sensitivity) of 67% for parameter κ and of 76% for parameter b, values that are slightly below the threshold of 80% that is usually deemed acceptable. On the other hand, the falsepositive rate (or specificity) of LOTO is good, with values lying just slightly above the 5% error rate. This means that the small misattribution of parameter variability seen in 'Results: Section VI' does not carry over much to a misattribution of the neural signal. In Figure 10B, we repeated the analysis for a sample size that is increased by twothirds (i.e., 50 instead of 30 simulated participants). Expectedly, the statistical power of LOTO improves and lies within acceptable ranges. An alternative solution is to increase the number of trials per participant. Figure 10C depicts the results when the number of participants is kept to 30 but the number of trials is increased by twothirds (i.e., 267 instead of 160 trials). The improvement in statistical power is comparable to that resulting from the increased sample size. Accordingly, one would conclude from these analyses that investigating the neural basis of trialbytrial variability in parameters κ and b of the HDLBA model might require testing more than 30 participants or testing more than 160 trials per participant, whatever is more suitable in a given context (e.g., adding trials might be difficult for eventrelated fMRI designs that require long intertrial intervals but less problematic for EEG designs).
XI. An example of using LOTO for modelbased fMRI
In this section, we apply LOTO to a real neuroimaging dataset to further illustrate the method’s use and usability in modelbased cognitive neuroscience. The dataset is taken from a previously published fMRI study (Gluth et al., 2015), in which participants were asked to make preferential choices from memory. More specifically, participants first encoded associations between food snacks and screen positions and then decided between pairs of food snacks while seeing only the snacks’ screen positions (Figure 11A). Participants had either one or five seconds in which to retrieve the item–location associations and to deliberate upon the preferred snack option.
Choice behavior was modeled using an unbounded sequential sampling model that predicted more accurate decisions (i.e., decisions that are more consistent with evaluations of the food snacks in a separate task) with more deliberation time (details on the model and the fMRI analyses are provided in 'Materials and methods'; see also Figure 3 in Gluth et al. (2015)). Importantly, the model estimates the average memory performance of a participant, that is, the probability of accurately remembering a snack when making decisions. In the original study (Gluth et al., 2015), we employed our previously proposed GR2017 method for estimating trialbytrial variability in order to infer the likelihood of successful memory retrieval for each snack given the choice behavior (intuitively, choosing unattractive snacks is attributed to poor memory performance). Here, we repeat this analysis but use LOTO to make the inference. Notably, LOTO takes advantage of the fact that each snack is a choice option in multiple (i.e., three) choice trials (see 'Results: Section II'). Ultimately, LOTO’s itemspecific estimate of memory performance is applied to the fMRI analysis of the encoding phase. In other words, a modelbased analysis of the wellestablished difference in subsequent memory effect (Paller and Wagner, 2002) is conducted with the prediction that variations in memory performance should be positively correlated with activation of the hippocampus (as has been found in our original publication).
Before using LOTO to inform the fMRI analysis, we checked whether the variability estimates could be linked to other behavioral indicators of memory performance and whether the assumption of systematic variability in memory could be justified. To address the first question, we conducted a logistic regression testing whether the LOTObased estimate predicted correct responses in the cued recall phase (i.e., a phase following the decisionmaking phase in which participants were asked to recall each snack; see 'Materials and methods' and Gluth et al., 2015). As additional independent variables, this regression analysis also included whether the snack was encoded once or twice, how often a snack was chosen, and the subjective value of the snack. As can be seen in Figure 12A, the LOTObased estimate of memory explained variance in cued recall performance over and above the other predictor variables (t(29) = 7.44, p < .001, d = 1.36, BF_{10} > 1,000). To address the question of systematic variability, we followed the procedures described in 'Results: Section V' and generated 1,000 datasets under the null hypothesis of no variability in memory performance. We then compared the LOTObased improvement in model fit in these datasets with the improvement in the real data. Indeed, the improvement in the real data was much larger than in any of the datasets generated under the null hypothesis (Figure 12B). Furthermore, we checked whether potential variability in parameter σ, which represents the standard deviation of the drift rate (i.e., noise in the decision process), could have been misattributed to the memoryrelated parameters. (It seems less plausible to assume that variability in the decision process, in contrast to variability in memory, is itemspecific. Nevertheless, we included this control analysis for instructional purposes.) Thereto, we repeated the test for the presence of systematic variability with simulated data that included different levels of true variability in σ (for details, see 'Materials and methods'). As can be seen in Figure 12—figure supplement 1, LOTO does not appear to misattribute variability in σ to the memoryrelated parameters.
Having established that LOTO has the potential to identify meaningful neural correlates of itemspecific memory performance, we turned to the fMRI analysis. Here, we employed a frequent approach to modelbased fMRI (O'Doherty et al., 2007), which is to add the predictions of a cognitive model as a socalled parametric modulator (PM) of the hemodynamic response during a time window of interest. In our case, this time window is the period during which snack food items are presented during encoding, with the LOTObased variability estimates serving as the PM (in order to test for a modelbased subsequent memory effect). As can be seen in Figure 12C, this analysis revealed fMRI signals in bilateral hippocampus (MNI coordinates of peak voxel in left hippocampus: x = −18, y = −10, z = −12; right hippocampus: 38,–12, −26) that surpassed the statistical threshold corrected for multiple comparisons (left: Z = 3.73; p = .033; right: Z = 4.46, p = .002). These effects are in line with those reported in the original publication, which were based on the GR2017 method and also showed a significant relationship between itemspecific memory strength and hippocampal activation. (Note that, as in the original publication, we ruled out alternative explanations for these results by including cued recall performance and subject values of snacks as additional PMs in the analysis; see 'Materials and methods'.)
To illustrate a second analysis approach, we also conducted a regionofinterest (ROI) analysis by extracting the average fMRI signal from anatomically defined masks of the left and right hippocampi (Figure 12D). The average signals in these ROIs were significantly related to the LOTObased PM (left: t(29) = 2.93, p = .007, d = 0.53, BF_{10} = 6.42; right: t(29) = 4.36, p < .001, d = 0.80, BF_{10} = 182.52). Taken together, these findings demonstrate that we were able to reproduce the analysis of a modelbased subsequent memory effect reported by Gluth et al. (2015) using LOTO.
Discussion
In this work, we introduce LOTO, a simple yet efficient technique to capture trialbytrial measures of parameters of cognitive models so that these measures can be related to neural or physiological data. The core principle is to infer singletrial values from the difference in the parameter estimates when fitting a model with all trials vs. a model with one trial being left out. The application of LOTO is simple: first, the full set of parameters of a model is estimated for all n trials; then, the parameters of interest are estimated again n times with each trial being left out once, while the remaining parameters are fixed to their alltrial estimates. In addition to its simplicity, LOTO is also comparatively fast: if fitting a model once to all trials takes u time units, then LOTO’s runtime will be u × n at most (in fact, it can be expected to be even faster because not all parameters are estimated again). And in contrast to the GR2017 approach, the LOTO runtime will not increase (much) if variability in more than one parameter is estimated. Compared to related methods, LOTO appears to provide mostly comparable or even superior results. Finally, it is a very general approach for capturing variability in parameters. In principle, it can be applied to any cognitive model, although we stress the importance of testing the feasibility of LOTO for each model and study design anew.
The LOTO recipe
In this section, we provide a ‘LOTO recipe’ that summarizes the critical steps, tests and checks that one should follow when seeking to apply the method. We link each point to the relevant section(s) in the 'Results'. Thus, despite our recommendation that readers with limited statistical and mathematical knowledge should first consult this recipe, it is indispensable for interested cognitive neuroscientists to read the relevant sections before trying to apply LOTO to their own data. Notably, Steps 1 and 2 of this recipe are to be performed before conducting the experiment. Therefore, LOTO is not suitable for posthoc analyses of data that have already been acquired with different study goals in mind. Also note that all of the computer codes associated with this article and with LOTO are freely available on the Open Science Framework (https://osf.io/du85f/).
Specify the cognitive model and the parameters for which trialbytrial variability should be estimated; similarly, specify the design of the study (number of trials and participants, experimental manipulations, neuroscientific method).
What type of observations does the model explain? If it explains only choices, try to extend the model to allow the prediction of RT ('Results: Section IV').
How are the experimental manipulations or conditions related to the model (i.e., which manipulation is predicted to selectively influence a given parameter)? If the model has to be fit to each condition separately, try to adjust the model so that its predictions change directly as a function of the condition ('Results: Section III').
Should variability be captured in each trial or across multiple trials (e.g., small blocks of trials)? If simulations of LOTO (see Step 2 below) do not yield acceptable results, consider pooling multiple trials into small blocks and capturing parameter variability from block to block ('Results: Section II').
Does the neuroscientific method allow the measurement of signals that can be related to trialbytrial variability? Blockdesigns for fMRI and ERPs for EEG are less suitable than eventrelated designs for fMRI and timefrequency analyses for EEG.
Test the feasibility of the LOTO approach for the chosen model and study design.
Test whether LOTO can (in principle) provide sensible estimates of the true trialspecific values of the targeted model parameter(s). Specification of the first derivative of the loglikelihood and the Fisher information (as described in 'Results: Sections II and III') is desirable but often cumbersome or even impossible. Instead, we suggest running simulations ('Results: Sections II–IV').
Check whether LOTO might misattribute random noise by conducting the nonparametric test for the presence of systematic variability ('Results: Section V'). Similarly, check whether LOTO might misattribute variability in other parameters to the parameters of interest by conducting the regression analyses outlined in 'Results: Section VI'.
Estimate the statistical power and if applicable the specificity of LOTO given the model and study design ('Results: Section X'). This might require running a pilot study to obtain a rough estimate of the amount of trialbytrial variability in model parameters that can be expected (alternatively, existing datasets could be inspected for this purpose).
For experiments with many trials per participant (i.e.,≥1,000), check whether the default tolerance criterion of the parameter estimation method is sufficient or whether there are discrete ‘jumps’ in the LOTO estimates. If necessary, choose a stricter criterion ('Results: Section VII').
Always use the actually planned number of trials, number of participants and manipulations when doing simulations. Always include the observations that are used to estimate parameters (e.g., choices and RT) when testing the potential of LOTO to recover parameter variability. Applying LOTO is meaningless when the method does not yield any additional information over and above that provided by the observations themselves ('Results: Section II').
If the simulations (see Steps 2a,b) do not yield acceptable results, consider increasing the number of participants or the number of trials per participant ('Results: Section X'), or consider adapting the model or experimental paradigm ('Results: Sections II–IV').
After conducting the study, apply LOTO ('Results: Section I'), conduct the nonparametric test for the presence of systematic variability ('Results: Section V'), and use LOTO’s estimate for analyzing the recorded neural data (e.g., 'Results: Section XI').
Of note, the test for the presence of variability ('Results: Section V'; Steps 2b and 3) and the power analyses ('Results: Section X'; Step 2c) are very timeconsuming because they require the simulation of a large number (e.g., 1,000) of entire experiments and the application of LOTO to the resulting data. For example, for the first power analysis reported in 'Results: Section X', we ran 1,000 simulations of 30 participants with 160 trials each, meaning that the HDLBA model had to be estimated ~5 million times. Therefore, it may not be possible to conduct an extensive search for optimal sample size and trial number to meet a desired statistical power such as 80%. Instead, trying out a limited set of realistic scenarios appears most appropriate. Note, however, that other approaches such as GR17 or hierarchical Bayesian modeling can be much more timeconsuming than LOTO, so that even the simplest simulationbased power analyses would become infeasible using these approaches.
The principle of leaving one out
The idea of leaving one data point out, also termed the ‘jackknife’ principle, is not new. In fact, the approach was proposed by statisticians in the 1950s (Quenouille, 1956; Tukey, 1958) as a way to estimate the variance and bias of estimators when dealing with small samples. From a technical point of view, LOTO bears obvious similarities to statistical methods such as (Bayesian) leaveoneout crossvalidation, which is employed in classificationbased analyses (Bishop, 2006), Bayesian model evaluation (Vehtari et al., 2017), or the leaveonesubjectout approach, which is used to circumvent the nonindependence error in statistical analyses of fMRI data (Esterman et al., 2010). Notably, the jackknife principle has also been adopted for inferring trialspecific onset latencies of ERPs (Miller et al., 1998; Stahl and Gibbons, 2004) and for quantifying trialspecific functional connectivity between different brain regions (Richter et al., 2015). Thus, a combination of LOTO for inferring variability in cognitive model parameters with the same principle used for identifying trialspecific brain activation patterns could be a promising approach for future research in modelbased cognitive neuroscience (e.g., Benwell et al., 2018).
LOTO as a twostage approach to linking behavioral and neural data
As mentioned in the introduction, LOTO is one of several approaches that have been put forward to link computational models of cognition and behavior with neural data. Turner et al. (2017) proposed a taxonomy of these approaches based on whether a method: i) uses neural data to constrain the (analysis of the) cognitive model, ii) uses the cognitive model to constrain the (analysis of the) neural data, or iii) uses both types of data to constrain each other in a reciprocal manner. Within this taxonomy, LOTO is best classified as belonging to the second class, that is, it infers trialwise values of the cognitive model to then inform the analysis of neural data. In other words, LOTO is a ‘twostage approach’ to modelbased cognitive neuroscience, in which the modeling of behavior precedes the analysis of brain data (as exemplified in 'Results: Section XI'). This approach misses out on the advantage of reciprocity exhibited by the joint modeling approach (Turner et al., 2015). However, LOTO’s strength lies in its ease, speed and generality of application. Moreover, it retains the flexibility of the twostage approach with respect to identifying the neural correlates of trialbytrial parameters: it does not require an a priori hypothesis of which brain region or EEG component might be related to variability in a model parameter. Instead, the variability estimate captured by LOTO can simply be plugged into, for instance, a GLMbased fMRI wholebrain analysis (Gluth and Rieskamp, 2017; O'Doherty et al., 2007; see also Figure 12C). Notably, LOTO also retains the flexibility of testing different functional relationships between brain and behavior (e.g., linear, quadratic, exponential) and thus lends itself to exploring different linking functions for subsequent, confirmatory studies.
Concluding remarks
Today, neuroimaging and psychophysiological techniques such as fMRI, EEG and eyetracking allow us to shed light on the process of single actions. However, the development of cognitive models has only partially kept up with these advances because the central tenet of cognitive modeling is still the identification of invariant features of behavior by averaging over repeated observations. LOTO has the potential to fill this gap, in particular because it will not be restricted to a handful of specific models or to a handful of experts in the field who are highly proficient in computational modeling.
Materials and methods
Simulations of Bernoulli and binomial distributions
Request a detailed protocolWe drew n = 300 values of θ_{t} from a uniform distribution over a range of .2 (e.g., [.4, .6]), and then generated one Bernoulli event per trial t using θ_{t} as the underlying probability. LOTO was then used to recover θ_{t} by estimating and subtracting ${\mathrm{\theta}}_{t}{\mathrm{\theta}}_{\neg t}$, for every t according to Equation 2. This was done for every range from [0, .2] to [.8, 1] in steps of .01 (i.e., 80 times) and repeated 1,000 times. Pearson productmoment correlations between θ_{t} and LOTO_{t} were calculated and averaged over the 1,000 repetitions.
In a second step, we generalized to the binomial distribution by assuming that θ remains stable over a limited number of trials m (m < n). The goal was then to recover θ_{m} with LOTO. We chose four different levels of m (i.e., L(m) = {1, 5, 10, 20}), but kept n constant (i.e., n = 300). Thus, the number of different values of θ_{m} decreased by n/m as m increased. Again, simulations were run for every range from [0, .2] to [.8, 1] in steps of .01 and repeated 1,000 times, before LOTO was applied (Equation 5a) and correlations between θ_{m} and LOTO_{m} were calculated and averaged over the 1,000 repetitions (for each level of m separately).
For 'Results: Section IX', which discusses the shape of the distribution of LOTO estimates, we conducted two simulations of n = 10,000 trials each, sampling 500 values of θ_{m} for blocks of 20 trials. The 500 values were drawn from a normal distribution (with mean = 0.5 and SD = 0.05) in the first simulation and from a uniform distribution (in the range [.4, .6]) in the second simulation. We then generated the corresponding Bernoulli events and applied LOTO.
Simulations of intertemporal choice sets and the HD model
Request a detailed protocolFor the sake of plausibility, choice sets for the intertemporal choice task were created in rough correspondence to typical settings in fMRI studies (Peters et al., 2012). We chose to simulate 160 trials per ‘participant’, which would result in a realistic total duration of 40 min of an fMRI experiment, in which one trial lasts ~15 s (e.g., Peters and Büchel, 2009). As is often the case in fMRI studies on intertemporal choice, the immediate choice option was identical in all trials (i.e., d_{i} = 0; x_{i} = 20), but the delay and the number of the delayed choice options varied from trial to trial. In a first step, delays d_{j} were drawn uniformly from the set S(d_{j}) = {1, 7, 14, 30, 90, 180, 270, 360}, and amounts x_{j} were drawn from a uniform distribution with limits [20.5; 80], discretized to steps of 0.5. In a second step, this procedure was repeated until a choice set was found for which the utility of the immediate option u_{i} was higher than the utility of the delayed option u_{j} in ~50% of the trials, given the simulated participant’s baseline parameter κ (see below). Specifically, a choice set was accepted when u_{i} – u_{j} > 0 in more than 45% but less than 55% of trials. Otherwise, a new choice set was created. This was repeated up to 1,000 times. If no choice set could be found, the criterion was relaxed by 10% (i.e., a choice set was now accepted when u_{i} – u_{j} >0 in more than 40% but less than 60% of trials). This procedure was repeated until an appropriate choice set was found. Note that using such adaptive designs is very common and recommended for intertemporal choice tasks in order to avoid having participants who choose only immediate or only delayed options (Gluth et al., 2017; Koffarnus et al., 2017; Peters et al., 2012). In practice, adaptive designs require a pretest to infer the individual discount rate κ.
Simulations of the HD model were also based on empirical data (Peters et al., 2012). For each simulated participant, we first determined ‘baseline’ parameters κ and β (for the sake of simplicity, we omit the indicator variable for participants here and for all following notations) by drawing from lognormal distributions truncated to ranges [0.001, 0.07] and [0.01, 10], respectively. Means (µ) and standard deviations (σ) of these lognormal distributions were {µ_{κ} = –4.8, σ_{κ} = 1} and {µ_{κ} = –0.77, σ_{κ} = 0.71}, respectively, and were chosen so that the medians and interquartile ranges of the generated values would match those reported by Peters et al. (2012). Lognormal distributions were used to account for the fact that distributions of empirical discount rates and sensitivity parameters are often rightskewed. If trialbytrial variability in κ and β was assumed, then this was realized by drawing trialspecific values of κ_{t} and β_{t} from a normal distribution (truncated to values ≥ 0) with means κ and β, and standard deviations 0.01 and 1, respectively. Otherwise, we assumed that κ_{t} = κ, β_{t} = β.
After specifying choice sets and trialspecific parameter values, choices were simulated and the parameters of the HD model were estimated for all trials. This was done in a twostep procedure starting with a grid search (with 25 different values per parameter) to find suitable starting values for the parameters, which were then used in a constrained simplex minimization. Then, LOTO was applied by leaving each trial out once and estimating the parameter(s) of interest (i.e., either κ_{t} or β_{t} or both), again using the ‘alltrials’ parameter estimates as starting values. Similarly, singletrial fitting was conducted whenever necessary by fitting each trial individually, also using the ‘alltrials’ estimates as starting values.
Unless stated otherwise, this procedure was conducted for 30 participants and repeated multiple times whenever necessary. The example participant shown in Figure 3 was taken from such a set of simulations. This participant had baseline parameters κ = 0.0076 and β = 2.11, but only κ_{t} varied from trial to trial. When testing for the presence of systematic trialbytrial variability in 'Results: Section V', we generated 1,000 simulations of 30 participants under the null hypothesis and 10 simulations of 30 participants assuming 10, 20, 30, 40, 50, 60, 70, 80, 90, 100% variability in κ or β compared to the amount of variability used in our initial simulations (see above). When testing for potential misattribution of variability in amounts x_{j}, delays d_{j}, or utilities u_{j} of the delayed option, we simulated 10 different levels of variability (i.e., Gaussian noise) in these input or output variables. Importantly, we ensured comparability with effects of κ by adjusting the largest amount of Gaussian noise in x_{j} (i.e., SD_{x} = 9), d_{j} (i.e., SD_{d} = 10), or u_{j} (i.e., SD_{u} = 6), so that the variance in the output variable u_{j} would be similar to that under the largest amount of Gaussian noise in κ (i.e., SD_{κ}=0.01; see above). In the context of these simulations (see 'Results: Section V'), we also extended the intertemporal choice task to three options. For every trial of this task, three amounts and three delays were randomly drawn from the range of values used in the standard version of the task (see above) and were always combined so that one option had the smallest amount and delay, one option had the largest amount and delay, and one option had medium amount and delay. As for the standard version, trial generation was adapted to the sampled discount factor, so that choice probabilities would not be too extreme. Thereto, the choice probability of each option was calculated using the (generalized) logistic or softmax choice function:
and amount and delays were sampled until choice probabilities were as similar as possible. (More specifically, the first set of options was accepted only if the highest average choice probability for one option was ≤ .34; for each subsequent set of options, this criterion was relaxed by an increase of .001 until a set of options was accepted.) Equation 10 was also used to generate simulated choices, to estimate parameters, and to apply LOTO. Note that a threeoption design could induce specific choice phenomena, socalled ‘context effects’, which are incompatible with the logistic or softmax choice rule (Busemeyer et al., 2019; Gluth et al., 2017). For the purposes of the present work, however, these effects can be neglected.
Simulations of the HDLBA model
Request a detailed protocolProcedures for simulating the HDLBA model were similar to those for the HD model. Like the HD model, the HDLBA model includes parameter κ, and values for κ and κ_{t} were generated in the same way as before. However, HDLBA does not use the softmax choice function (see Equation 7) to specify choice probabilities but instead uses the LBA model (Brown and Heathcote, 2008) to specify the joint choice and response time (RT) distributions. Thus, instead of including the softmax choice sensitivity parameter β, the HDLBA model includes parameter λ, which scales the relationship between utility and drift rate (see Equations 9a and 9b). HDLBA also includes four additional parameters from the LBA model: parameter s, which models (unsystematic) trialbytrial variability in the drift rates (i.e., drift rates for options i, j are drawn from independent normal distributions with means ν_{i}, ν_{j} and standard deviation s); parameter b, which represents the decision threshold; parameter A, which represents the height of the uninform distribution of the start point; and parameter t_{0}, which accounts for the nondecision time of the RT. Most of these parameters were kept identical for all simulated participants (i.e., λ = 0.1, s = 0.3, A = 1, t_{0} = 0.5). Besides κ, only the decision threshold b varied across participants (because it was used to test how well LOTO captures trialbytrial variability in two parameters simultaneously) by drawing individual values from a uniform distribution with limits [2, 3]. When testing LOTO on two parameters simultaneously, trialspecific values of κ_{t} were drawn as stated above, and trialspecific values of b_{t} were drawn from a normal distribution with mean b and standard deviation 0.04 (with the constraint that b_{t} ≥ A).
Parameter estimation again started with an initial grid search. To reduce computation time, grid search was extensive (i.e., 15 grid search values) for parameters κ and b but limited (i.e., three grid search values) for the remaining parameters that were identical across simulated participants. As for the HD model, the best parameter estimates from the grid search were used as starting values for the constrained simplex minimization, followed by applying LOTO and singletrial fitting.
For 'Results: Section IX', which discusses the shape of the distribution of LOTO estimates, the HDLBA model was simulated with a few modifications. First of all, the number of trials was increased to 480 to improve the interpretability of the shape of distributions. Second, the parameters of the lognormal distribution from which individual values of κ were drawn were changed to {µ_{κ} = 3.5, σ_{κ} = 0.2}, so that singletrial parameter values were unlikely to be extreme (i.e., κ_{t} = 0)–which would have affected the shape of the distribution. Normally distributed singletrial values of κ_{t} and b_{t} were drawn as specified above, whereas uniformly distributed values of κ_{t} were drawn from a range [κ – K/2; κ + K/2] with K = 0.35, and uniformly distributed values of b_{t} were drawn from a range [b – B/2; b + B/2] with B = 1.4.
Simulations of the LBA model
Request a detailed protocolWe compared four different methods for capturing trialbytrial variability with respect to how well these methods recover the trialwise drift rate and the start point values of the LBA model (Brown and Heathcote, 2008). We chose these two LBA parameters because one of the tested methods, the singletrial LBA (STLBA) approach (van Maanen et al., 2011), has been developed specifically for this set of parameters. The other three methods were LOTO, the ‘singletrial fitting’ method (see 'Results: Section III'), and an approach that we proposed in a previous publication (Gluth and Rieskamp, 2017), ‘GR17’.
In total, we generated synthetic data from 20 experiments with 20 simulated participants each, and with 200 trials per participant. The decision threshold and nondecision time parameters were fixed across participants (i.e., b = 1.5; t_{0} = 0.2). The remaining three parameters of the LBA model were sampled individually for each participant: i) a value representing the average drift rate parameter ν was drawn from a normal distribution with mean μ_{ν} = 0.6 and standard deviation σ_{ν} = 0.02; ii) a value representing the standard deviation parameter s (for trialbytrial variability of the drift rate) was drawn from a normal distribution with μ_{s} = 0.2 and σ_{s} = 0.02; and iii) a value representing the height of the start point distribution A was drawn from a normal distribution with μ_{A} = 1 and σ_{A} = 0.1 (with the constraint that A ≤ b). Trialspecific drift rates ν_{t} were drawn from a normal distribution with μ(ν_{t}) = ν and σ(ν_{t}) = s. Trialspecific start points A_{t} were drawn from a uniform distribution with limits [0, A]. Note that these values refer to the (trialspecific) drift rates and start points for correct responses (as, for instance, in a twoalternative forcedchoice perceptual or inferential choice task). Similarly, trialspecific drift rates and start points for incorrect responses were drawn from distributions with the same specifications, except that the mean of the drift rate was 1–ν. We chose the abovementioned values so that accuracy rates and RTs for simulated participants were kept in realistic ranges (i.e., mean accuracy rates and RTs per simulated participant ranged from 53 to 89% and from 1.5 to 2.1 s, respectively).
Parameter estimation procedures were very similar to those of the HDLBA model. An initial grid search with an extensive search (i.e., 15 grid search values) for parameters ν and A and a limited search (i.e., three grid search values) for the remaining parameters was used to find good starting values for the constrained simplex minimization, followed by the application of the four methods for capturing trialbytrial variability. The STLBA method was applied according to the equations provided by van Maanen et al. (2011). The GR17 method required the specification of a range of possible parameter values for ν and A as well as a step size with which this range of values was scanned (i.e., prior, likelihood and posterior values were calculated for each step within the range of possible values; for more details, see Gluth and Rieskamp, 2017). Possible values for both parameters were restricted to be within three standard deviations below and above the parameters’ group means, with 1,000 steps per parameter. This implied 1 million calculations for both prior and likelihood distributions because the joint distributions of both parameters needed to be specified. (With these specifications, the GR17 method required about three times the computation time of the three other methods combined; this extensive computation time was the reason for reducing the number of participants in these simulations from 30 to 20.) To test the methods’ abilities to recover trialspecific drift rates and start points, we restricted the analysis to correct trials only (as is often the case in perceptual or inferential choice tasks).
Bayesian modeling and joint modeling
Request a detailed protocolFor the second comparison of LOTO with related approaches in 'Results: Section IX', Bayesian modeling (Farrell and Lewandowsky, 2018; Lee and Wagenmakers, 2013) and the joint modeling approach (Palestro et al., 2018; Turner et al., 2015) were implemented for the HD model using the Just Another Gibbs Sampler (JAGS) software. Our specifications of the joint modeling approach closely followed the tutorial by Palestro et al. (2018): the ‘behavioral’ parameter δ and the ‘neural’ parameter θ were drawn from a joint, multivariate normal distribution with means μ_{δ} and μ_{θ} and precision matrix Ω. The means μ_{δ} and μ_{θ} were drawn from a multivariate normal distribution with means [0, 0] and precision matrix [1, 0; 0, 1], and Ω was drawn from a Wishart distribution with dispersion matrix [1, 0; 0, 1] and df = 2. The neural data were modeled as being sampled from a normal distribution with mean θ and standard deviation τ = 2 (the simulation of neural data is explained in the next paragraph). The behavioral data were sampled from the HD model. Within this behavioral model, the discount factor κ was the probittransformed value of δ (because κ is restricted to values between 0 and 1, but δ is drawn from a normal distribution). To reduce the joint modeling approach to the (nonjoint) Bayesian modeling, the sampling of neural data was simply omitted, so that parameter estimates were solely dependent on the behavioral data. For the sake of simplicity, we conducted parameter estimation within each simulated participant and did not draw individual parameters from higherlevel group distributions. Sampling settings were as follows: five chains, 20,000 burnin samples per chain, 2,000 recorded samples, thinning of 20 samples. Model convergence was checked by computing the GelmanRubin statistic $\hat{R}$ (Gelman and Rubin, 1992), which was ensured to be below 1.01.
Simulations of neural signals
Request a detailed protocolTo apply the joint modeling approach (Turner et al., 2015) and to test whether LOTO can capture neural signals that systematically vary with variability in a cognitive model parameter, we simulated neural signals in a very basic form. We assumed that the neural signal F(π) in trial t (which could, for instance, represent the amplitude of the fMRI bloodoxygenleveldependent signal in a circumscribed region of the brain or the average EEG spectral power in a specific frequency band during a circumscribed time period of the trial) that is systematically linked to variability in a parameter π is drawn from a normal distribution with mean π_{t} and standard deviation σ_{F}_{(π)} (see Palestro et al., 2018 for a similar assumption of normally distributed neural data in related work). The value for the standard deviation σ_{F}_{(π)} was specified as follows: we simulated F(π)_{t} together with π_{t} and the observations (choices and RT) for the desired number of trials (160) and participants (30), and then conducted linear regressions with F(π)_{t} as the dependent variable and with π_{t} and the observations as independent variables for each simulated participant (independent variables were standardized). Then, we estimated the effect size of the regression coefficient of π_{t} at the group level. The goal was to obtain an effect size of d = 1 by trying out different values for σ_{F}_{(π)} (the choice of the effect size is justified in the next paragraph). This iterative procedure was stopped as soon as a value of σ_{F}_{(π)} was found, for which the 95% confidence interval of the effect size overlapped with 1 when generating 1,000 simulations of the desired number of participants and trials (i.e., 20/30 participants with 200/160 trials in 'Results: Sections IX and X'). For parameter κ of the HD model, we obtained σ_{F}_{(κ) }= 7.5. For parameter κ of the HDLBA model, we obtained σ_{F}_{(κ) }= 0.93; for parameter b of the HDLBA model, we obtained σ_{F}_{(b) }= 4.8. With respect to the HDLBA model, this implies lower variability in the simulated neural signal for parameter κ than in the signal for parameter b, because the trialbytrial variability in the former parameter is much lower than that in the latter (see above). This might be implausible from a neurophysiological point of view. However, this implausibility does not matter in the context of the current work because we are only interested in the correlation between F(π)_{t} and π_{t} (and the LOTO estimate associated with π_{t}), which is unaffected by a linear transformation of F(π)_{t}.
The rationale of this approach is that in an idealized world, in which we would know the true trialspecific values of π_{t}, we could expect a very high effect size (such as d = 1) for detecting a relationship between π_{t} and F(π)_{t}. This effect size is slightly higher than, for instance, current estimates of realistic effect sizes for fMRI studies, which are based on largescale empirical studies (Geuter et al., 2018; Poldrack et al., 2017). But again, the chosen effect size refers to a hypothetical world, in which the true trialbytrial value of a cognitive function is known and then related to brain activity. For such a fortunate scenario, we can expect to obtain a slightly higher effect size than in reality.
We generated 1,000 datasets of 30 simulated participants with 160 trials for the HDLBA model. For each of these datasets, we conducted randomeffects linear regression analyses, that is, we ran two linear regression analyses with F(κ)_{t}/F(b)_{t} as the dependent variable and the LOTO estimates for π_{t} and b_{t} together with the observations (i.e., choices and RT) as independent variables in each simulated participant, and then conducted twosided onesample ttests on the regression coefficients for π_{t} and b_{t} at the group level. The proportion of significant effects for ‘correct’/‘wrong’ pairings of F and π determine the statistical power/false positive rate of LOTO.
A note on the statistical analyses for simulations
Request a detailed protocolThe employment of frequentist statistical analyses for simulations is controversial because the results depend on the degrees of freedom of the tests, which can easily be adjusted to change the significance of a result. Our motivation in providing these statistical tests is that they offer an intuition of whether specific settings or features can be expected to be relevant when applying LOTO ‘in the field’. Therefore, we chose to conduct statistical tests with their degrees of freedom usually being in the range of common and realistic sample sizes in cognitive neuroscience (e.g., studies of 20 or 30 participants). To complement the frequentist statistics, we also report effect sizes (i.e., Cohen’s d, correlation coefficients) and the results of Bayesian hypothesis tests (i.e., the Bayes Factor BF_{10} refers to evidence in favor of the alternative hypothesis). Bayesian hypothesis tests were conducted in R using the BayesFactor package.
fMRI example: essential study information
Request a detailed protocolThree groups with a total of 84 participants (Gluth et al., 2015) took part in the study (one group of participants that underwent fMRI and two groups that conducted behavioral tasks only). The final sample of the fMRI group included 30 participants, and the analyses in the current article are restricted to this sample. All participants gave written informed consent and the study was approved by the Ärztekammer Hamburg, a local ethics committee in Hamburg, Germany (case # PV4290), where the study was conducted.
Participants were asked to not to eat for 3 hr prior to the experiment. They were familiarized with a set of 48 common food snacks from German supermarkets. To infer the subjective values of food snacks, participants conducted a BeckerDeGrootMarschak (BDM) auction (Becker et al., 1964), in which they stated their willingness to pay money for each snack. To incentivize accurate bids, one randomly selected bid trial was realized at the end of the experiment. Inside the MR scanner, participants conducted the rememberanddecide task, which consisted of multiple runs of encoding, distraction, decisionmaking and recall phases. During encoding, participants learned associations between six different food snacks and six different screen positions. To manipulate memory performance, half of the snacks were encoded once, and the other half twice. During distraction, participants worked on a 2back working memory task for 30 s. Afterwards, participants made nine decisions between pairs of snacks. Snacks were not shown directly, but the respective screen positions were highlighted and participants had to use their memory to make accurate decisions. To manipulate deliberation time, participants were given either a 1 s or a 5 s time window before they had to make a choice. Finally, the ability of the participant to remember snacks correctly was assessed via cued recall (during this phase, MR scanning was paused to enable communication between the participant and the experimenter). Participants were incentivized to make accurate choices by realizing one choice from a randomly selected decisionmaking trial. Further information can be found in Gluth et al. (2015).
fMRI example: cognitive model
Request a detailed protocolTo predict preferential choices and to inform the LOTObased fMRI analysis, we used a slightly modified version of the cognitive model proposed in the original study (each modification is explained below). This model assumes that decisions are based on a noisy evidence accumulation (diffusion) process that is informed by the subjective values of the available choice options. The model also accounts for the possibility that options are not recalled and for participants’ tendency to prefer remembered over forgotten options (the memory bias; see also Gluth et al., 2015 and MecheraOstrovsky and Gluth, 2018). Because participants were forced to wait for a specified amount of time (i.e., either 1 s or 5 s) and then had to indicate their decision promptly, the model does not assume a decision threshold but an unbounded diffusion process that is terminated at the specified amount of time. The probability that an option is chosen is thus given by the probability that the diffusion process ends with a positive or negative value. More specifically, the probability of choosing the left option (l) over the right option (r) is:
where Φ(x) refers to the value of the cumulative distribution function of the standard normal distribution at x, t refers to the deliberation time (i.e., t = 1 or t = 5), σ accounts for the amount of noise in the diffusion process (σ ≥ 0), and Δ_{V} refers to the value difference between l and r, which is given by:
where M_{l} is the probability that option l is remembered, V_{r} is the standardized subjective value of option r, and g represents the memory bias. The memory probability M is either α or β, depending on whether an item has been shown once or twice during encoding, and α and β are two free parameters (0 ≤ α, β ≤ 1). This definition differs from the model used in Gluth et al. (2015) in which β represented the improvement of memory performance for items encoded twice compared to items encoded once. We changed this definition because here we are not interested in testing whether there is a significant improvement in memory for items encoded twice. The second adaption of the model is that the memory bias g is not a free parameter. Instead, we took each participant’s intercept coefficient from the logistic regression that tested for the presence of a memory bias (see Equation 1 in Gluth et al., 2015). This adaption was made because, contrary to the original study, we did not estimate parameters with hierarchical Bayesian modeling here, and because treating the memory bias as a free parameter led to severe misspecifications of the other free parameters in some participants when using maximum likelihood estimation.
The three free parameters of the model (i.e., σ, α and β) were estimated via maximum likelihood estimation using the same minimization techniques as were used for the simulations. A grid search was not performed, but the minimization was repeated ten times with randomly generated starting values. After estimating the full model, LOTO was applied to extract trialbytrial (or more exactly, itembyitem) variability in memory probability M by fixing σ to the fullmodel estimate and reestimating either α or β with each item taken out once (e.g., for an item i that was encoded twice, α was also fixed to the fullmodel estimate and β was reestimated by taking out the three decisionmaking trials that included i as a choice option). The difference between the n item and the n–1 item parameter estimates were then computed and used for the fMRI analysis. To test for the presence of systematic variability in memory probability (see 'Results: Section V'), we generated 1,000 datasets of each participant using the fullmodel parameter estimates and assuming no variability in α or β. LOTO was then applied to these datasets and the improvement in model fit was compared to that in the real data. To check whether potential trialbytrial variability in parameter σ could have been misattributed to the memoryrelated parameters, we simulated data with true trialwise Gaussian noise in σ (but not in α or β) and compared the LOTObased improvement in model fit with that of the distribution under the null hypothesis of no variability. We tried 10 different levels of noise in σ; for the highest level, the trialbytrial variability within a simulated participant was as high as the acrosssubject variability in the real data (i.e., SD_{σ }= 2.2). The lower nine levels exhibited 10% to 90% of this variability.
fMRI example: analysis of fMRI data
Request a detailed protocolBecause the fMRI analysis closely followed the procedures described in Gluth et al. (2015), we provide only a brief overview here. Details on fMRI data acquisition, preprocessing and statistical analyses can be found in the original publication. Statistical analysis was based on the General Linear Model (GLM) approach as implemented in SPM12. As in the original study, twelve onset vectors were defined, including two vectors for the encoding phase: one for items shown once and one for items shown twice. Both of these two onset vectors were accompanied by three parametric modulators (PM) that modeled the predicted change of the hemodynamic response as a function of the following variables of interest: i) the subjective value of the encoded snack; ii) whether the snack was correctly remembered or not in the subsequent cued recall phase; and iii) the LOTObased estimate of itembyitem variability in memory probability (see above). Importantly, the LOTObased PM was entered into the GLM last, so that correlated fMRI signals could be uniquely attributed to this PM. For each participant, one contrast image was generated that combined the effect of the LOTObased PM for both onset vectors of the encoding phase.
At the group level, individual contrast images were subjected to a onesample ttest. For illustration purposes, we performed two different statistical analyses. In the first analysis, we tested for significant voxels within the bilateral hippocampus, using a familywise error correction for small volume at p < .05. The small volume was defined by a bilateral anatomical mask of the hippocampus taken from the AAL brain atlas (TzourioMazoyer et al., 2002). The results were displayed on the mean anatomical (T1) image of all participants using an uncorrected threshold of p < .001 with at least 10 contiguous voxels. The second analysis was a regionofinterest (ROI) analysis for which we averaged the fMRI signal for left and right hippocampus separately (using the same anatomical mask) and tested whether the average signal in the ROI was significant (at p < .05, twosided).
General settings
Request a detailed protocolAll simulations and analyses were conducted in MATLAB. Function minimization via a (constrained) simplex algorithm was conducted using the function fminsearchcon. For determining the convergence of parameter estimation and the termination of search, a very strict tolerance criterion of 10^{−13} was chosen, because LOTO needs to work with very small differences in deviance when trying to adjust a parameter value from the ‘alltrial’ estimate to the n–1 estimate, in particular when n is high. To illustrate the effect of the tolerance criterion, in 'Results: Section VII' the performance of LOTO for the HD model was compared between the strict tolerance criterion and the default criterion in MATLAB, which is 10^{−4}. Bayesian and joint modeling were conducted with JAGS (see above), which was called from within MATLAB via the function matjags.m.
Code availability
Request a detailed protocolThe data and computer codes associated with this article and with the LOTO method are freely available on the Open Science Framework (https://osf.io/du85f/). Although running these codes requires access to the proprietary MATLAB software, we stress that LOTO is a technique rather than a software package and can therefore be realized in various programing languages (including open source software such as R or Python).
Derivatives of the hyperbolic discounting (HD) model
Request a detailed protocolThe utilities u_{i} and u_{j} of the immediate and delayed choice options, as well as the corresponding choice probabilities p_{i} and p_{j}, according to the HD model are provided in Equations 6 and 7, respectively. The loglikelihood for a single choice of the immediate option is given by the natural logarithm of the choice probability p_{i} and thus:
The derivative of LL_{i} with respect to κ (which is hidden in u_{j} in Equation 13) is therefore:
Similarly, the loglikelihood for a single choice of the delayed option is:
And the derivative of LL_{j} with respect to κ is therefore:
Example derivatives of LOTO for the HD model
Request a detailed protocolHere, we show that LOTO will produce estimates of a parameter that are related to both the choices and task features and that are not extreme, as soon as each option has been chosen at least two times (in the case of a twoalternative choice task). We will use the examples of the HD model mentioned in 'Results: Section III'. Assume that there are two trials with identical features (i.e., identical amount x_{i} of the immediate option, identical amount x_{j} of the delayed option, identical delay d_{j} of the delayed option) and that the immediate option is chosen in one trial but the delayed option is chosen in the other trial. (We assume identical features only to simplify the derivatives. As stated above, it is important to vary the features across trials.) In this case, the derivative of the loglikelihood with respect to the discount factor κ is given by Equation 8c. Setting this equation to 0 yields the MLE of the parameter:
We can easily generalize this to the case of n trials in which the immediate option was chosen n_{i} times and the delayed option was chosen n_{j} times (i.e., n_{i} + n_{j} = n):
This equation has to be solved for κ, which requires some steps. By putting the second part of the righthand side to the left hand side, we get:
This simplifies to:
We rearrange this to:
Note that p_{i} is specified in Equation 7. Therefore:
Taking the logarithm on both sides yields:
We rearrange this to:
Note that u_{i} and u_{j} are given by Equation 6. Therefore:
Finally, we rearrange this to isolate κ:
For instance, in the case of n_{i} = n_{j}, this equation simplifies to:
which yields a κ of 0.075 and 0.182 for the two example trials mentioned in 'Results: Section III' (compare with the right panels of Figure 2).
Importantly, Equation 18b can be expected to yield reasonable results as long as n_{i} and n_{j} are ≥ 1. If either n_{i} or n_{j} is 0, then the first term in the parentheses on the righthand side of Equation 18b will be 0, because we have $\mathrm{l}\mathrm{n}\left[0\right]=\pm \mathrm{\infty}$ in the denominator, and the estimation of κ will simply be –1/d_{j}, which is negative and thus lies outside of the range of values allowed for κ. When applying LOTO, we therefore require n_{i} and n_{j} to be ≥ 2, because the parameter is estimated once for n and once for n–1 trials, and in the n–1 case, LOTO will yield an unreasonable estimate in one trial (i.e., if n_{i} = 1, then the error will be in that one trial in which the immediate option was chosen; if n_{j} = 1, then the error will be in that one trial in which the delayed option was chosen).
Data availability
The relevant data and computer codes are uploaded on the Open Science Framework (https://osf.io/du85f/) and are freely available.
References

Measuring utility by a singleresponse sequential methodBehavioral Science 9:226–232.https://doi.org/10.1002/bs.3830090304

Trialbytrial covariation of prestimulus EEG alpha power and visuospatial bias reflects a mixture of stochastic and deterministic effectsEuropean Journal of Neuroscience 48:2566–2584.https://doi.org/10.1111/ejn.13688

Cognitive and neural bases of MultiAttribute, MultiAlternative, Valuebased decisionsTrends in Cognitive Sciences 23:251–263.https://doi.org/10.1016/j.tics.2018.12.003

BookComputational Modeling of Cognition and BehaviorNew York: Cambridge University Press.https://doi.org/10.1017/CBO9781316272503

Reciprocal relations between cognitive neuroscience and formal cognitive models: opposites attract?Trends in Cognitive Sciences 15:272–279.https://doi.org/10.1016/j.tics.2011.04.002

Time discounting and time preference: a critical reviewJournal of Economic Literature 40:351–401.https://doi.org/10.1257/jel.40.2.351

Statistical parametric maps in functional imaging: a general linear approachHuman Brain Mapping 2:189–210.https://doi.org/10.1002/hbm.460020402

Inference from iterative simulation using multiple sequencesStatistical Science 7:457–472.https://doi.org/10.1214/ss/1177011136

The attraction effect modulates reward prediction errors and intertemporal choicesJournal of Neuroscience 37:371–382.https://doi.org/10.1523/JNEUROSCI.253216.2016

Variability in behavior that cognitive models do not explain can be linked to neuroimaging dataJournal of Mathematical Psychology 76:104–116.https://doi.org/10.1016/j.jmp.2016.04.012

On the efficiency of neurallyinformed cognitive models to identify latent cognitive statesJournal of Mathematical Psychology 76:142–155.https://doi.org/10.1016/j.jmp.2016.06.007

BookBayesian Cognitive Modeling: A Practical CourseCambridge : Cambridge University Press.https://doi.org/10.1017/CBO9781139087759

A microeconometric test of alternative stochastic theories of risky choiceJournal of Risk and Uncertainty 24:103–130.https://doi.org/10.1023/A:1014094209265

The Effect of Delay and of Intervening Events on Reinforcement Value55–73, An adjusting procedure for studying delayed reinforcement. In Quantitative Analyses of Behavior, The Effect of Delay and of Intervening Events on Reinforcement Value, 5, Hillsdale, Erlbaum.

Memory beliefs drive the memory bias on Valuebased decisionsScientific Reports 8:10592.https://doi.org/10.1038/s41598018287289

How attention influences perceptual decision making: Singletrial EEG correlates of driftdiffusion model parametersJournal of Mathematical Psychology 76:117–130.https://doi.org/10.1016/j.jmp.2016.03.003

Modelbased fMRI and its application to reward learning and decision makingAnnals of the New York Academy of Sciences 1104:35–53.https://doi.org/10.1196/annals.1390.022

A tutorial on joint models of neural and behavioral measures of cognitionJournal of Mathematical Psychology 84:20–48.https://doi.org/10.1016/j.jmp.2018.03.003

Observing the transformation of experience into memoryTrends in Cognitive Sciences 6:93–102.https://doi.org/10.1016/S13646613(00)018453

Twostage dynamic signal detection: a theory of choice, decision time, and confidencePsychological Review 117:864–901.https://doi.org/10.1037/a0019737

Efficient coding of subjective valueNature Neuroscience 22:134–142.https://doi.org/10.1038/s4159301802920

Scanning the horizon: towards transparent and reproducible neuroimaging researchNature Reviews Neuroscience 18:115–126.https://doi.org/10.1038/nrn.2016.167

A theory of memory retrievalPsychological Review 85:59–108.https://doi.org/10.1037/0033295X.85.2.59

The neural basis of value accumulation in intertemporal choiceEuropean Journal of Neuroscience 42:2179–2189.https://doi.org/10.1111/ejn.12997

Informing cognitive abstractions through neuroimaging: the neural drift diffusion modelPsychological Review 122:312–336.https://doi.org/10.1037/a0038894

Approaches to analysis in modelbased cognitive neuroscienceJournal of Mathematical Psychology 76:65–79.https://doi.org/10.1016/j.jmp.2016.01.001

Neural correlates of trialtotrial fluctuations in response cautionJournal of Neuroscience 31:17488–17495.https://doi.org/10.1523/JNEUROSCI.292411.2011

A confirmatory approach for integrating neural and behavioral data into a single modelJournal of Mathematical Psychology 76:131–141.https://doi.org/10.1016/j.jmp.2016.04.005

Practical bayesian model evaluation using leaveoneout crossvalidation and WAICStatistics and Computing 27:1413–1432.https://doi.org/10.1007/s1122201696964

HDDM: hierarchical bayesian estimation of the DriftDiffusion model in pythonFrontiers in Neuroinformatics 7:14.https://doi.org/10.3389/fninf.2013.00014
Article and author information
Author details
Funding
Swiss National Science Foundation (100014_172761)
 Sebastian Gluth
Israel Science Foundation (38115)
 Nachshon Meiran
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank the members of the Decision Neuroscience and Economic Psychology groups at the University of Basel for helpful discussions, and Gilad Zlotkin, Yoav Kessler and Gideon Rosenthal for suggesting related work in other domains. This work was supported by a grant from the Swiss National Science Foundation (#100014_172761) to SG and by a grant from the Israel Science Foundation (#381–15) to NM.
Ethics
Human subjects: All participants gave written informed consent, and the study was approved by the Aerztekammer Hamburg, Germany (case # PV4290). All experiments were performed in accordance with the relevant guidelines and regulations.
Version history
 Received: October 5, 2018
 Accepted: February 7, 2019
 Accepted Manuscript published: February 8, 2019 (version 1)
 Version of Record published: February 27, 2019 (version 2)
 Version of Record updated: February 28, 2019 (version 3)
Copyright
© 2019, Gluth and Meiran
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 2,647
 views

 372
 downloads

 14
 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Neuroscience
The flow of neural activity across the neocortex during active sensory discrimination is constrained by taskspecific cognitive demands, movements, and internal states. During behavior, the brain appears to sample from a broad repertoire of activation motifs. Understanding how these patterns of local and global activity are selected in relation to both spontaneous and taskdependent behavior requires indepth study of densely sampled activity at single neuron resolution across large regions of cortex. In a significant advance toward this goal, we developed procedures to record mesoscale 2photon Ca^{2+} imaging data from two novel in vivo preparations that, between them, allow for simultaneous access to nearly all 0f the mouse dorsal and lateral neocortex. As a proof of principle, we aligned neural activity with both behavioral primitives and highlevel motifs to reveal the existence of large populations of neurons that coordinated their activity across cortical areas with spontaneous changes in movement and/or arousal. The methods we detail here facilitate the identification and exploration of widespread, spatially heterogeneous neural ensembles whose activity is related to diverse aspects of behavior.

 Neuroscience
Declarative memory retrieval is thought to involve reinstatement of neuronal activity patterns elicited and encoded during a prior learning episode. Furthermore, it is suggested that two mechanisms operate during reinstatement, dependent on task demands: individual memory items can be reactivated simultaneously as a clustered occurrence or, alternatively, replayed sequentially as temporally separate instances. In the current study, participants learned associations between images that were embedded in a directed graph network and retained this information over a brief 8 min consolidation period. During a subsequent cued recall session, participants retrieved the learned information while undergoing magnetoencephalographic recording. Using a trained stimulus decoder, we found evidence for clustered reactivation of learned material. Reactivation strength of individual items during clustered reactivation decreased as a function of increasing graph distance, an ordering present solely for successful retrieval but not for retrieval failure. In line with previous research, we found evidence that sequential replay was dependent on retrieval performance and was most evident in low performers. The results provide evidence for distinct performancedependent retrieval mechanisms, with graded clustered reactivation emerging as a plausible mechanism to search within abstract cognitive maps.