Normative evidence accumulation in unpredictable environments
Abstract
In our dynamic world, decisions about noisy stimuli can require temporal accumulation of evidence to identify steady signals, differentiation to detect unpredictable changes in those signals, or both. Normative models can account for learning in these environments but have not yet been applied to faster decision processes. We present a novel, normative formulation of adaptive learning models that forms decisions by acting as a leaky accumulator with nonabsorbing bounds. These dynamics, derived for both discrete and continuous cases, depend on the expected rate of change of the statistics of the evidence and balance signal identification and change detection. We found that, for two different tasks, human subjects learned these expectations, albeit imperfectly, then used them to make decisions in accordance with the normative model. The results represent a unified, empirically supported account of decision-making in unpredictable environments that provides new insights into the expectation-driven dynamics of the underlying neural signals.
https://doi.org/10.7554/eLife.08825.001
eLife digest
Organisms gather information from their surroundings to make decisions. Traditionally, neuroscientists have investigated decision-making by first asking what would be optimal for the animal, and then seeing whether and how the brain implements the optimal process. This approach has assumed that the environment consists of noisy, but stable, signals that the brain must decipher by accumulating information over time and ‘averaging out’ the noise.
Previous research had suggested that most animals can accumulate information. However, these studies also showed that animals, including humans, often fall short of the optimal solution by being overly sensitive to noise and failing to completely average it out. Of course, in real life, the signals themselves can change abruptly and unpredictably, challenging us to distinguish noise from changes in the underlying signals. If a moving target suddenly jolts to the right, is that change part of the normal jitter that should be ignored, or does it predict where the target will be next? How do we know when to keep old information that is still relevant to the decision, and when to discard the old information because a change might have occurred that renders it irrelevant?
Glaze et al. have addressed this question by building optimal change detection into the traditional ‘information-accumulation’ framework. The model suggests that what researchers previously thought was an oversensitivity to noise might actually be optimal for the real-life challenge of detecting change. In two different tasks, Glaze et al. tested human volunteers to see if they could make decisions in ways predicted by the model. One task involved the volunteers making decisions about which one of two possible sources of noisy signals generated a given piece of information, with the correct answer changing unpredictably every 1–20 trials. The other task involved looking at a crowd of moving dots, which jolted and wobbled as they changed direction, and the volunteers had to decide which direction the dots were moving at the end of each trial.
Both experiments showed that the volunteers were remarkably good at making decisions in the ways predicted by the new model, and incorporated learned expectations about the rate of change in underlying signals. The results suggest that humans, and potentially other organisms, are capable of detecting changes in the optimal ways suggested by the decision-making model. The study also makes predictions about what kinds of neural patterns neuroscientists might find when measuring brain activity while organisms do similar tasks.
https://doi.org/10.7554/eLife.08825.002
Introduction
Even the simplest perceptual judgments, like detecting the presence of a dim light, take time for the brain to process (Luce, 1986). Some of this time reflects sensory and motor processing, but a considerable fraction is dedicated to the decision process that converts the incoming sensory information into a categorical judgment that guides behavior (Sternberg, 2001). Under certain conditions, this temporally unfolding process serves a normative purpose: improving the accuracy of the decision by reducing uncertainty about the source or identity of noisy inputs. The sequential probability ratio test (SPRT), drift-diffusion model, and related sequential-sampling models are forms of ‘belief-updating’ rules for this normative process, based on perfect integration over time of the logarithm of the likelihood ratio (LLR) associated with each data point (Barnard, 1946; Wald, 1947; Good, 1979; Link, 1992; Gold and Shadlen, 2001; Smith and Ratcliff, 2004; Bogacz et al., 2006). These models have been useful for studying neural mechanisms of decision-making (Gold and Shadlen, 2007) but are normative for only a restricted set of conditions in which: (1) the ideal starting timepoint for accumulation is known (e.g., given by the onset of an experimental trial); and (2) the statistics of the incoming information are perfectly stable throughout the entire sequence, with no change in the underlying signal and all noise coming from the same probability distribution.
Perfect integration can be particularly problematic for tasks that require the detection of signal changes (Clifford and Ibbotson, 2002). When there is certainty about when the change might occur, integrated signals from before vs after that time can be compared to detect the change (Green and Swets, 1966; Macmillan and Creelman, 2004). However, when there is temporal uncertainty about the change, integrating evidence at the wrong time might miss the signal or add unnecessary noise, resulting in a loss of sensitivity to the change (Lasley and Cohn, 1981). Several possible solutions to this problem have been proposed, including using a leaky integrator, taking a time derivative of the evidence to identify changes, or using knowledge of the spatial and temporal structure of the stimulus to guide a more directed search for the evidence (Henning et al., 1975; Nachmias and Rogowitz, 1983; Smith, 1995, 1998; Verghese et al., 1999; Schrater et al., 2000). However, none of these solutions provide more general insights into how to balance the operations used to identify both steady, noisy, signals and unpredictable changes in those signals.
Here we present a normative model of decisions between two alternatives that provides such an account. In a variety of learning and other tasks, the tradeoff between signal identification and change detection has been related to inference algorithms in hidden Markov models and other Bayesian algorithms. These algorithms estimate statistical parameters in the presence of abrupt and unpredictable changepoints in the otherwise stable statistics of a data-generating process (Zakai, 1965; Liptser and Shiryaev, 1977; Rabiner, 1989; Yu and Dayan, 2005; Adams and MacKay, 2007; Behrens et al., 2007; Fearnhead and Liu, 2007; Wilson et al., 2010; McGuire et al., 2014; Sato and Kording, 2014). Here we express these algorithms in a novel form that, unlike previous changepoint models, is based on the LLR and thus can be compared directly to standard decision models based on evidence accumulation (Gold and Shadlen, 2001; Usher and McClelland, 2001; Smith and Ratcliff, 2004; Bogacz et al., 2006). The form thus yields quantitative predictions of both choice behavior and the underlying neural signals for decisions about unstable, noisy stimuli (Gold and Shadlen, 2007). A key feature of the model is that the expected amount of instability in the environment governs the temporal dynamics of the decision process. When perfect stability is expected, evidence is accumulated perfectly. Otherwise, evidence is accumulated with a leak (Usher and McClelland, 2001) to a nonabsorbing boundary that expedites the identification of unexpected changes that should restart the accumulation process, where both the leak and the boundary depend on the level of expected instability in the environment. These expectation-dependent dynamics represent a novel view of leaky, saturating, or otherwise imperfect evidence accumulation, which here may be understood as facilitating, rather than hindering, statistical inference.
We show that human decision-makers can use these dynamics to solve two different tasks on different timescales (tens of seconds vs hundreds of milliseconds) that each require information accumulation in the presence of unpredictable changepoints occurring at different rates.
Results
Model
Consider a decision about which of two alternatives is the present source of a sequence of noisy data arriving over time. We derived a belief-update rule for these kinds of decisions based on Bayesian principles that have typically been used to understand learning processes in dynamic environments on relatively slow timescales (Figure 1A) (Yu and Dayan, 2005; Adams and MacKay, 2007; Behrens et al., 2007; Fearnhead and Liu, 2007; Wilson et al., 2010; McGuire et al., 2014; Sato and Kording, 2014). This rule both accounts for environmental instability and relates directly to models of perfect, leaky, and bounded accumulation that have been used to understand decision processes in stable environments (Link, 1992; Gold and Shadlen, 2001; Usher and McClelland, 2001; Smith and Ratcliff, 2004; Bogacz et al., 2006). We define belief as the logarithm of the posterior odds of the alternative sources of information (L) given all information collected until a given time point. The sign of L indicates which source is currently believed to be generating the information, and the magnitude of L indicates how certain that belief is. The update rule is optimal when there is a fixed probability that the source could switch to the alternative at any time (i.e., according to a Bernoulli process). Specifically,
L_{n} = ψ(L_{n − 1}, H) + LLR_{n},    (1)
where L_{n} is the belief at time step n, LLR_{n} is the sensory evidence (the log likelihood ratio) at step n, H (the ‘hazard rate’) is the expected probability at each time step that the source will switch from one alternative to the other, and ψ is the time-varying prior expectation (the logarithm of the prior odds) about the source before observing the new evidence:
ψ(L_{n − 1}, H) = L_{n − 1} + log[(1 − H)/H + exp(−L_{n − 1})] − log[(1 − H)/H + exp(L_{n − 1})].    (2)
The prior expectation ψ is the key feature of the model, balancing integration to identify steady signals and differentiation to detect changes by dynamically filtering sensory information in a way that depends on both L and H (Figures 1, 2). For the special case of H = 0 (perfect stability), the two rightmost terms in Equation 2 cancel. In this case, the update Equation 1 reduces to perfect accumulation as in random-walk and related decision models used to identify steady, but noisy, signals (Figure 1D) (Smith and Ratcliff, 2004; Bogacz et al., 2006). In contrast, when H is high and changes are expected, accumulation over time is severely limited to facilitate change detection (Figures 1F, 2G). For intermediate values of H, these operations trade off to emphasize change detection at the expense of steady signal identification (for higher H) or vice versa (for lower H; Figures 1E, 2G). Finally, in the special case of H = 0.5, the history of evidence is irrelevant at all times and all three terms in Equation 2 cancel, so ψ = 0 and L_{n} = LLR_{n}.
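The update rule and its special cases can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: it assumes the prior takes the form ψ(L, H) = L + log[(1 − H)/H + exp(−L)] − log[(1 − H)/H + exp(L)], which reproduces the H = 0 and H = 0.5 cases described here, and the function names `prior_log_odds`, `update_belief`, and `_log_add` are hypothetical.

```python
import math


def _log_add(a: float, x: float) -> float:
    """Return log(a + exp(x)) for a > 0 without overflowing for large |x|."""
    log_a = math.log(a)
    m = max(log_a, x)
    return m + math.log(math.exp(log_a - m) + math.exp(x - m))


def prior_log_odds(L_prev: float, H: float) -> float:
    """Assumed form of psi: log prior odds before the next sample arrives."""
    if H <= 0.0:
        return L_prev      # perfect stability: the belief carries over unchanged
    if H >= 1.0:
        return -L_prev     # certain switch: the belief flips sign
    a = (1.0 - H) / H
    return L_prev + _log_add(a, -L_prev) - _log_add(a, L_prev)


def update_belief(L_prev: float, llr: float, H: float) -> float:
    """One step of the update: new belief = prior expectation + new evidence."""
    return prior_log_odds(L_prev, H) + llr
```

Calling `update_belief` repeatedly over a sequence of LLR values yields the belief trajectory; with `H = 0.0` the loop is a plain running sum of the evidence, and with `H = 0.5` each belief equals the most recent LLR.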
To gain further insight into the dynamics of the model and how it controls this tradeoff, we made approximations of the nonlinearity in Equation 2:
Here K_{n} governs the leakiness of the accumulation process, and θ_{n} governs a bias. Both parameters are adaptive, depending on both H and L_{n}, with dynamics that jointly establish a boundary on the prior and thus limit subsequent belief strength. The dynamics include two regimes, as follows.
First, when beliefs are uncertain (i.e., regimes around L_{n − 1} = 0 in Figures 1B, 2A,C; Equation 3a, in which K_{n} predominates over θ_{n}), the model acts like a leaky accumulator, in which the prior expectation is a fraction of the previous belief (Busemeyer and Townsend, 1993; Usher and McClelland, 2001; Bogacz et al., 2006; Tsetsos et al., 2012). Thus, the dynamics of a leaky accumulator can, in principle, act like the normative model, but only in the low-certainty regime (Figure 3). In this regime, the normative leak is adaptive, which has been demonstrated previously (Ossmy et al., 2013), and is directly dependent on H, which has not been described previously. For low H and thus relative stability, a small leak provides long integration times. For H ≈ 0.5 (the correct answer is equally likely to stay or switch after each sample), the model discards all historical information and L depends only on LLR. For H > 0.5 (the correct answer is more likely to switch after each sample), the prior expectation undergoes damped oscillations (Figure 2G), even when the source of evidence is transiently stable. These oscillations repeatedly switch the direction of existing beliefs because of the high expected probability of change on each discrete time step.
Second, as the magnitude of L_{n − 1} increases and belief certainty becomes high (i.e., regimes around L_{n − 1} far from zero in Figures 1B, 2A,C; Equation 3b,c, in which θ_{n} predominates over K_{n}), such as when the incoming evidence is strong or during periods of stability in the source, the prior expectation approaches a ‘stabilizing boundary’ whose height directly depends on H. Thus, the dynamics of a model that stabilizes the decision process at a hazard-dependent value can, in principle, act like the normative model, but only in the high-certainty regime (Figure 3). This boundary represents a suspension of the accumulation process but, unlike the decision bound in the SPRT and related models (Barnard, 1946; Wald, 1947; Good, 1979; Link, 1992; Smith and Ratcliff, 2004; Bogacz et al., 2006; Gold and Shadlen, 2007), does not terminate the decision process. Instead, it stabilizes L_{n} when no changes occur (i.e., temporarily ending further evidence accumulation) while still allowing for the sampling of new evidence that might lead to changes in belief and a restart of the accumulation process (Resulaj et al., 2009). The stabilizing boundary is also in contrast to the asymptote in leaky accumulation, which increases linearly with the strength of evidence (Busemeyer and Townsend, 1993; Usher and McClelland, 2001; Bogacz et al., 2006; Tsetsos et al., 2012).
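The limiting behavior in these two regimes can be checked directly against the nonlinearity. The following is a sketch of that calculation, assuming the prior has the form ψ(L, H) = L + log[(1 − H)/H + exp(−L)] − log[(1 − H)/H + exp(L)], which is consistent with the special cases described for H = 0 and H = 0.5. Reading the slope as the leak term and the limits as the bias term is an interpretation, not a quotation of Equation 3.

```latex
\begin{align*}
\psi(L,H) &= L + \log\!\left[\tfrac{1-H}{H} + e^{-L}\right]
               - \log\!\left[\tfrac{1-H}{H} + e^{L}\right],\\[4pt]
\left.\frac{\partial \psi}{\partial L}\right|_{L=0}
  &= 1 - \frac{2}{\tfrac{1-H}{H} + 1} = 1 - 2H
  \quad\Longrightarrow\quad \psi \approx (1-2H)\,L \quad (|L| \ll 1),\\[4pt]
\lim_{L\to+\infty}\psi(L,H) &= \log\frac{1-H}{H},
\qquad
\lim_{L\to-\infty}\psi(L,H) = -\log\frac{1-H}{H}.
\end{align*}
```

Under this reading, the slope 1 − 2H near L = 0 is small and positive for low H (long integration), zero at H = 0.5 (no memory), and negative for H > 0.5 (sign-flipping oscillations), while ±log[(1 − H)/H] sets the height of the stabilizing boundary.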
Together these properties navigate an inherent tradeoff between identification of steady signals and change detection. This tradeoff depends on both evidence strength and expected H (Figure 2E,G). For weak evidence, the tradeoff is most severe, as the model uses expected H to err on the side of either detecting changes quickly when H is high or identifying stable signals when H is low. As the strength of evidence increases, performance improves steadily for both conditions and the tradeoff diminishes.
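One side of this tradeoff, the latency of change detection, can be illustrated with a deterministic sketch. It assumes the prior ψ(L, H) = L + log[(1 − H)/H + exp(−L)] − log[(1 − H)/H + exp(L)] and feeds the model noiseless evidence of fixed magnitude that reverses sign once; `steps_to_flip`, its default values, and the restriction to 0 < H < 0.5 are illustrative choices, not part of the original analysis.

```python
import math


def psi(L: float, H: float) -> float:
    """Assumed prior log odds for a hazard rate 0 < H < 1."""
    a = (1.0 - H) / H
    return L + math.log(a + math.exp(-L)) - math.log(a + math.exp(L))


def steps_to_flip(H: float, llr: float = 0.5, n_stable: int = 50) -> int:
    """Count samples needed for the belief to change sign after a reversal.

    The source first emits `n_stable` samples of +llr, so the belief settles
    near its stabilizing boundary; the evidence then switches to -llr.
    Intended for 0 < H < 0.5, where the belief does not oscillate.
    """
    L = 0.0
    for _ in range(n_stable):
        L = psi(L, H) + llr          # stable period
    steps = 0
    while L > 0.0:
        L = psi(L, H) - llr          # evidence now favors the other source
        steps += 1
    return steps


if __name__ == "__main__":
    for H in (0.05, 0.1, 0.3):
        print(f"H = {H}: belief flips after {steps_to_flip(H)} sample(s)")
```

In this sketch a lower expected hazard rate lets the belief climb higher during the stable period, so more contrary samples are needed before it reverses; with noisy evidence the same mechanism is what protects low-H beliefs from being overturned by chance fluctuations.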
Continuous-time version
We also developed a continuous-time version of the model (Figure 1C) that allowed for a more direct comparison to drift-diffusion and other continuous-time models of decision-making (Smith and Ratcliff, 2004; Bogacz et al., 2006; Gold and Shadlen, 2007). The model is based on the optimal filter for a Markov jump process with two states and stationary white-noise emissions (Zakai, 1965; Liptser and Shiryaev, 1977; Crisan and Rozovskii, 2011). Here we write the incoming evidence as a continuous-time sequence of noisy observations: dx(t) = h(t)dt + σdW, where h(t) = ±μ, with the sign depending on which source is generating data at time t, and σ is the standard deviation of the noise in a standard Wiener process dW. The source h(t) jumps between states at an average rate λ, with jumps occurring as a Poisson process. Letting A = 2μ/σ^{2}:
dL(t) = −2λ sinh(L(t))dt + Adx(t).    (4)
The result can be viewed as a nonlinear filter for the incoming evidence that is more general than the perfect or leaky integration central to previous models of decision-making between two alternatives (Busemeyer and Townsend, 1993; Usher and McClelland, 2001; Bogacz et al., 2006). In the special case that λ = 0, dL(t) = Adx(t), which is perfect integration of the noisy observations dx(t). Approximations of this model are similar to those for the discrete-time model (Figure 2B,D). When beliefs are uncertain (L ≈ 0), dL ≈ −2λLdt + Adx(t), which results in an Ornstein-Uhlenbeck process over periods in which the source is perfectly stable (Busemeyer and Townsend, 1993; Bogacz et al., 2006). As certainty increases (L > 0), a simultaneous increase in leak rate and bias drives the decision variable to a stabilizing boundary (Figure 1C) with a probability distribution that has a heavy tail, reflecting dynamics that facilitate the detection of subsequent changes (Figure 1—figure supplement 1). As with the discrete-time model, these dynamics navigate the tradeoff between identification of steady signals and change detection in a way that depends on both evidence strength and expected λ (Figure 2F).
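A simple way to explore these continuous-time dynamics numerically is a forward-Euler discretization. The sketch below assumes the filter dL = −2λ sinh(L)dt + Adx, which reduces to the two limiting cases stated here (perfect integration for λ = 0 and −2λLdt + Adx near L = 0). The function names, the fixed step size, and the per-step switch probability λ·dt are illustrative assumptions, not the authors' simulation code.

```python
import math
import random


def make_evidence(n_steps, dt, mu, sigma, lam, seed=0):
    """Sample dx = h*dt + sigma*dW, with h = +/-mu switching at rate lam.

    Switches are approximated by a per-step probability lam*dt (small dt).
    """
    rng = random.Random(seed)
    h = mu
    dx = []
    for _ in range(n_steps):
        if rng.random() < lam * dt:
            h = -h
        dx.append(h * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return dx


def simulate_belief(dx, dt, lam, A):
    """Forward-Euler integration of dL = -2*lam*sinh(L)*dt + A*dx."""
    L = 0.0
    trace = []
    for dxi in dx:
        L += -2.0 * lam * math.sinh(L) * dt + A * dxi
        trace.append(L)
    return trace


if __name__ == "__main__":
    mu, sigma, dt = 1.0, 1.0, 0.001
    A = 2.0 * mu / sigma ** 2
    dx = make_evidence(5000, dt, mu, sigma, lam=2.0, seed=1)
    for lam in (0.0, 0.1, 2.0):
        final = simulate_belief(dx, dt, lam, A)[-1]
        print(f"expected lambda = {lam}: final L = {final:.3f}")
```

Because sinh grows exponentially, the restoring term quickly dominates as |L| increases, which is what produces the stabilizing boundary; the step size must stay small for the Euler scheme to remain stable.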
Psychophysics
We used two separate tasks to investigate if and how human subjects could use these dynamics to adapt to different rates of change and find the optimal tradeoff between stable signal identification and change detection. For both tasks, we found that: (1) subjects adapted, albeit imperfectly, to different hazard rates (via comparisons to a suboptimal model, which ignored blockwise changes in H) and used their subjective estimates of hazard rate in a manner consistent with the normative model; and (2) their choice dynamics were better described by the normative model than two other adaptive, but suboptimal, alternatives inspired by the approximations to the normative model (one was an accumulator with a leak that could vary as a free parameter for each hazard-specific block of trials but no stabilizing boundary; the other was a perfect accumulator with a stabilizing boundary that could vary as a free parameter for each hazard-specific block of trials; see Figure 3).
‘Triangles’ task
This task required subjects to make trial-by-trial choices about which of two spatially separated triangles on a computer screen was the source of a single data point presented on that trial, represented as the position of a star on the screen (Figure 4A,B). Subjects could thus make choices based on accumulated evidence after each new sample of data, with the LLR for each star corresponding to its position relative to each triangle. The correct source changed at a hazard rate that was constant within a block of trials but varied across blocks (0.05–0.95). In a subset of sessions (65 of 111), learning was facilitated by beginning and ending each block of trials with stretches of trial-by-trial feedback about the correct answer. However, subjects were never instructed on what the hazard rates were or when they would change.
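The structure of a block can be sketched as a simple generative process plus an LLR computation. The sketch below is an assumption-laden illustration: it treats star positions as one-dimensional Gaussian samples centered on the generating triangle with a shared standard deviation, and the values of `mu` and `sigma` are arbitrary placeholders rather than the task's actual parameters.

```python
import random


def gaussian_llr(x, mu_right, mu_left, sigma):
    """Log likelihood ratio (right vs left source) for a star at position x.

    Assumes equal-variance Gaussian sources, so the LLR is linear in x.
    """
    return ((x - mu_left) ** 2 - (x - mu_right) ** 2) / (2.0 * sigma ** 2)


def simulate_block(n_trials, H, mu=1.0, sigma=2.0, seed=0):
    """Generate sources (+1 = right, -1 = left) and star positions.

    On each trial the source switches with probability H (Bernoulli process).
    """
    rng = random.Random(seed)
    source = rng.choice([-1, 1])
    sources, stars = [], []
    for _ in range(n_trials):
        if rng.random() < H:
            source = -source
        sources.append(source)
        stars.append(rng.gauss(source * mu, sigma))
    return sources, stars


if __name__ == "__main__":
    sources, stars = simulate_block(10, H=0.3, seed=3)
    for s, x in zip(sources, stars):
        print(s, round(x, 2), round(gaussian_llr(x, 1.0, -1.0, 2.0), 2))
```

Under these assumptions a star at the midline carries no evidence (LLR = 0), and stars far to one side carry strong evidence, which corresponds to the weak-evidence and strong-evidence trials distinguished in the analyses.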
The subjects were able to adapt their decision-making to these different hazard rates, as assessed by direct fits of their choice data by the normative model. Specifically, models that allowed subjective H to freely vary by block provided better fits to the data than a suboptimal model that ignored the blockwise changes in objective H (median ± bootstrapped SEM difference of Bayesian Information Criterion, or BIC, from normative vs block-independent H fits was −23.721 ± 9.671, Wilcoxon signed-rank test, p < 0.0001; per subject, the normative fits were better in 43 of 48 subjects using a signed-rank test, Bonferroni-corrected p < 0.05). Overall, the normative model performed well, with choice residuals centered around zero (mean ± std deviance residual = 0.003 ± 0.458) and a reasonably close match to the choice data (median ± bootstrapped SEM McFadden's r^{2} of 0.895 ± 0.016 across subjects). Moreover, the estimated values of subjective H from these fits were strongly correlated with objective H across all subjects (Pearson's r = 0.721, p < 0.0001; Figure 4C), with 47 of 48 individual subjects showing a regression slope of subjective on objective H that was >0 and in 46 of those cases was also <1 (Figure 4D, median ± bootstrapped SEM regression slope = 0.402 ± 0.042). However, although the subjects adapted their decision-making behavior appropriately for different values of H, their subjective estimates of H tended towards H ∼ 0.5, for which the history of evidence is irrelevant. This tendency to misestimate extreme hazard rates did not appear to reflect insufficient learning opportunities, because these trends persisted even when restricting fits to the last 200 trials of each block (across subjects and blocks, bootstrapped regression slope of subjective on objective H = 0.373 ± 0.044; see also Figure 4—figure supplement 1) or to blocks beginning with explicit, trial-to-trial feedback (regression slope = 0.562 ± 0.038).
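For readers less familiar with the fit statistics quoted here, the following sketch shows the standard textbook definitions of BIC and McFadden's pseudo-r^{2} applied to binary choice data. It is not the authors' fitting pipeline, and the exact conventions they used (for example, the null model for McFadden's r^{2}) are not specified in this section; the function names are hypothetical.

```python
import math


def choice_log_lik(p_right, chose_right):
    """Bernoulli log likelihood of observed choices given model probabilities."""
    eps = 1e-12  # guard against log(0) for extreme predictions
    total = 0.0
    for p, c in zip(p_right, chose_right):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p if c else 1.0 - p)
    return total


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


def mcfadden_r2(log_lik_model, log_lik_null):
    """McFadden's pseudo-r^2: 1 - ln(L_model) / ln(L_null)."""
    return 1.0 - log_lik_model / log_lik_null
```

A negative BIC difference (model A minus model B) favors model A after penalizing its extra parameters, which is the sense in which the normative fits are reported as better than the block-independent fits.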
The subjects appeared to be using these learned, subjective estimates of hazard rate in a manner consistent with the normative model, for several reasons. First, their choice dynamics directly reflected biases predicted by the two regimes of the normative model (Equation 3, Figure 5A–F). When certainty was high (i.e., choices just following a trial in which the star position was far to the left or right), the subjects consistently showed hazard-dependent biases that were predicted by the stabilizing boundary of the model: weak evidence interpreted as stability for low H and change for high H (Spearman's r = 0.890, p < 0.0001, comparing predicted and actual biases from individual blocks; Figure 5A,B). When weak evidence persisted without changepoints for a run of trials following a changepoint, choice dynamics consistently reflected the hazard-dependent leak predicted by the model (Figure 2G, Figure 5D,E): gradual updates for low subjective H, immediate updates for subjective H ≈ 0.5, and damped oscillations for high subjective H (including lower accuracy on two vs one trial following the changepoint). Fits to the block-independent model did not show these H-dependent choice dynamics, confirming that the choice dynamics did not result from any differences in how the randomly generated stars were sampled under the different conditions used for these analyses (Figure 5C,F).
Second, the subjects' decision processes reflected a strong, hazard- and evidence-strength-dependent tradeoff between detecting changes and identifying steady signals, as predicted by the normative model (Figure 5G–L). When the evidence was weak (star positions close to the midline), accuracy following a changepoint was highest for high H (mean ± bootstrapped SEM across blocks = 73.7 ± 2.4% correct) then declined steadily for intermediate (66.4 ± 1.2%) and low H (50.0 ± 7.3%; Spearman's correlation between changepoint accuracy and H was 0.559, p < 0.0001, vs a predicted correlation from the normative model of 0.651). In contrast, accuracy on non-changepoint trials with the same strength of evidence was lowest for high H (57.1 ± 4.1%) then improved steadily for intermediate (71.4 ± 1.2%) and low H (81.1 ± 2.2%; Spearman's correlation between non-changepoint accuracy and H was −0.502, p < 0.0001, vs a normative prediction of −0.625; Figure 5G,H). When the evidence was strong, the tradeoff was much smaller, as predicted by the normative model (Spearman's correlation between changepoint accuracy and H was −0.040, p = 0.557 and between non-changepoint accuracy and H was −0.005, p = 0.945; normative predictions were 0.027 and 0.010, respectively; Figure 5J,K). Fits to the block-independent model did not show these H-dependent tradeoffs, for either weak or strong evidence (Figure 5I,L).
Third, we used choice data to directly estimate the mapping of subjective beliefs to priors (L_{n − 1} to ψ_{n}, Figure 6; compare to Figure 1B). As for the normative model, the estimated mappings depended on subjective H (one-way MANOVA for the groups shown in Figure 6A, p < 0.0001). Moreover, for each H-group, these mappings matched predictions of the normative model (Figure 6B,E; Hotelling's t-test comparing data and model, p = 0.189, 0.321, and 0.086 for low, medium, and high values of objective H, respectively).
In contrast, the choice data from the triangles task were not as well matched by either of the two adaptive, suboptimal models we considered (Figures 3, 6). The leaky-accumulator model had worse overall fits to the choice data than the normative model for 34 of 48 subjects (median ± SEM difference in BIC = −5.179 ± 1.967, Wilcoxon signed-rank test, p < 0.0001) and predicted mappings of subjective beliefs to priors that matched the pooled data only for medium and high values of H but not for low values of H, which lacked the asymptotic regime prescribed by the normative model when beliefs were more certain (Figure 6C,F; Hotelling's t-test, p < 0.0001 for low objective H, and p = 0.312 and 0.545 for medium and high objective H, respectively). Likewise, the model with perfect accumulation to a hazard-specific stabilizing boundary had worse overall fits to the choice data than the normative model for 34 of 48 subjects (−0.942 ± 0.487, p = 0.007), reflecting the lack of a leaky-accumulation regime prescribed by the normative model when beliefs were uncertain and H was high (Figure 6D,G; Hotelling's t-test, p = 0.228, 0.463, and 0.017, respectively). This relatively modest, but reliable, difference in BIC reflected an inherent difficulty in distinguishing these models with the particular task conditions we used (fitting simulated data from either model yielded similarly small BIC differences: −1.448 ± 0.900 for simulations based on the normative fits and 1.393 ± 1.054 for simulations using the stabilizing-boundary fits).
Both suboptimal models had best-fitting, subjective hazard rates that, even more than for the normative fits, tended to overestimate small objective values and underestimate large objective values, further supporting the idea that the subjects were using misestimated hazard rates to make their decisions (regression slope of subjective vs objective H = 0.115 ± 0.023 for the leaky accumulator and 0.290 ± 0.044 for the perfect accumulator to a stabilizing boundary, p < 0.0001 when compared to slopes from the normative fits in both cases).
Dotsreversal task
This task was a novel version of a commonly used randomdot motion task (Britten et al., 1992). For this ‘dotsreversal’ task, the direction of coherent motion underwent sudden changes within trials. Each subject participated in two separate sessions, one in which changes occurred at a relatively slow rate (0.1 Hz), and one in which changes occurred at a fast rate (2.0 Hz) rate (Figure 7, Videos 1–4). Motion strength (coherence) was fixed to either a high or low value within each trial, and subjects were instructed to pay attention to the stimulus throughout the trial and then indicate its final direction, after which they received feedback on the correct answer.
As with the triangles task, the subjects were able to adapt their decisionmaking to these different hazard rates. Models that allowed subjective λ (changepoint rate, here treated as a continuoustime variable) to vary with objective λ provided better fits to the data than a model that ignored the sessionspecific changes in λ (median ± SEM difference in BIC from the normative vs blockindependent model fits was −4.790 ± 1.216, p < 0.0005; per subject, normative BIC values were significantly lower in 9 of 13 subjects with a Bonferroni corrected p < 0.05, Wilcoxon signedrank test). The normative model performed well overall, with choice residuals centered around zero (mean ± std deviance residual = 0.053 ± 0.897) and a reasonably close match to the choice data (median ± bootstrapped SEM McFadden r^{2} of 0.385 ± 0.055 across subjects). Of the 13 subjects, 12 had bestfitting values of adaptive, subjective λ that showed appropriate sensitivity to objective λ, with estimated subjective λ lower on 0.1 vs 2.0 Hz trials (Figure 7B; Wilcoxon signedrank p < 0.05). However, like for the triangles task, the subjects tended to overestimate low values and underestimate high values of λ (median ± bootstrapped SEM estimated subjective λ = 0.365 ± 0.109 and 1.129 ± 0.168 Hz for the 0.1Hz and 2Hz conditions respectively), with a similar tendency even when restricting model fits to the last 50 trials of each session to account for learning (subjective λ = 0.291 ± 0.122 and 0.968 ± 0.207 Hz for the 0.1Hz and 2Hz conditions, respectively).
Also consistent with our results from the triangles task, the subjects appeared to be using their adaptive estimates of λ in a manner consistent with the normative model, based on several lines of evidence. First, the choice data exhibited dynamics predicted by the normative model (Figure 8). For highcoherence trials, the strong sensory evidence dominated the decision process, yielding >90% accuracy within 500 ms following the final change in direction irrespective of the rate of preceding direction changes (Figure 8D,E). In contrast, for lowcoherence trials, integration times were strongly dependent on hazard rate (i.e., greater effects of ψ in Equation 1) Specifically, accuracy improved more steeply as a function of viewing duration for the low vs highhazard condition: performance was worse for the 0.1Hz condition for durations <500 ms, reflecting persistence of the perceived direction of motion just prior to the final changepoint (i.e., direction reversal), but rose as viewing duration increased and exceeded performance for the 2Hz condition at long durations (Figure 8A,B). These dynamics, particularly at low coherences, were not predicted by the blockindependent model and thus did not reflect uneven sampling of the data under the different coherence and hazard conditions (Figure 8C,F).
Second, as with the triangles task, this decision process reflected a strong hazard and evidencestrengthdependent tradeoff between detecting changes and identifying steady signals, as predicted by the normative model (Figure 8G–K). For lowcoherence trials, choice accuracy was lower for the lowhazard condition just following a changepoint (median ± bootstrapped SEM 7.1 ± 11.1% and 40.3 ± 2.8%, for low and highhazard sessions respectively, when the stimulus was shown for <300 ms following the final changepoint, Wilcoxon signed rank, p < 0.01, vs predicted accuracies of 34.6 ± 4.7% and 44.9 ± 3.1%, respectively) but was greater for the lowhazard condition thereafter (83.6 ± 3.6% and 69.9 ± 8.3%, respectively, p < 0.0005, vs predicted 86.2 ± 2.3% and 74.2 ± 2.9%, respectively). For highcoherence trials, choice accuracy was much higher overall and, consistent with the normative model, showed a weaker tradeoff, with no difference by hazardrate condition for short viewing durations following the final change point (66.7 ± 14.6% and 66.1 ± 7.3% for 0.1Hz and 2Hz sessions respectively, p = 0.831, vs predicted 46.9 ± 9.4% and 59.7 ± 3.7%, respectively) and only a slight difference for longer postchange durations (100 ± 0% and 93.5 ± 1.4%, respectively, p = 0.004, vs predicted 97.8 ± 0.4% and 92.1 ± 2.0%, respectively). In contrast, the blockindependent model did not predict these hazarddependent tradeoffs (Figure 8I,L).
Third, we directly measured the dependence of choice dynamics on both the hazard rate and the strength of the sensory evidence by fitting choice data to integrating models with separate leaks for each hazardspecific session and coherence level (Figure 9). As predicted by the normative model (Figures 3A, 8B,E), the bestfitting leak depended on both (Friedman test, p < 0.0005 for the effect of hazard rate and p < 0.0001 for the effect of motion coherence; Figure 9A). The persubject, pairwise differences between bestfitting leak, computed with respect to either coherence or hazard rate, did not differ from predictions of the normative model (median ± bootstrapped SEM normalized data–model difference by hazard rate = −0.003 ± 0.021, by coherence = −0.001 ± 0.050, Wilcoxon signed rank p = 0.501 and 0.292, respectively).
In contrast, the choice data from the dots-reversal task were not as well matched by either of the two adaptive, suboptimal models we considered (Figure 3). The leaky-integrator model, in which the leak depended on hazard rate but not coherence level, had worse overall fits to the choice data than the normative model for 10 of 13 subjects (median ± bootstrapped SEM difference in BIC = −4.066 ± 3.078, Wilcoxon signed rank, p < 0.05), failing in particular to capture the strongly coherence-dependent leak of the data and the normative model (Figure 9C,F,H,I; normalized data–model median ± bootstrapped SEM difference between change in leak by coherence = 0.241 ± 0.067, Wilcoxon signed rank p < 0.0005). The normative model also outperformed the model with perfect integration to a stabilizing boundary that freely varied by objective hazard rate (BIC was lower for 9 of 13 subjects, −4.305 ± 2.662, p < 0.05). This suboptimal model had a more subtle deviation from the data, consisting primarily of an exaggerated dependence of leak on coherence (Figure 9D,G,J,K; per-subject, normalized data–model difference between change in leak by coherence = −0.189 ± 0.059, Wilcoxon signed rank p < 0.0001). As for the normative fits, both suboptimal models also had best-fitting subjective hazard rates that were imperfectly adapted to the objective values, further supporting the idea that the subjects were using imperfect estimates to make their decisions (best-fitting λ = 0.789 ± 0.084 and 1.697 ± 0.227 Hz for the 0.1-Hz and 2-Hz conditions, respectively, for the leaky integrator and 4.378 ± 0.998 and 8.803 ± 1.159 Hz, respectively, for the perfect integrator to a stabilizing boundary).
Discussion
We derived a normative model of evidence accumulation for decision tasks that is based on Bayesian principles for inferring changes in the statistics of a generative process (Rabiner, 1989; Adams and MacKay, 2007; Behrens et al., 2007; Fearnhead and Liu, 2007; Brown and Steyvers, 2009; Wilson and Finkel, 2009; Nassar et al., 2010, 2012; Wilson et al., 2010; Boerlin et al., 2013; Wilson et al., 2013; Gonzalez Castro et al., 2014; McGuire et al., 2014; Sato and Kording, 2014). Our model incorporates change detection into sequential-sampling decision models and is related to other, modified versions of these models that have been used to combine multiple sensory cues of different but known reliabilities or infer unknown sensory reliability assumed to be stable during the course of decision-making (Hanks et al., 2011; Deneve, 2012; Drugowitsch et al., 2014). However, unlike those models, which invoked a separate learning-rate term or had other, more complex forms, our model casts adaptation directly in the context of the evidence-accumulation process that is a key focus of studies of decision-making (Usher and McClelland, 2001; Roitman and Shadlen, 2002; Huk and Shadlen, 2005; Uchida et al., 2006; Brunton et al., 2013; Hanks et al., 2015). This formulation allowed us to identify, for the first time, features of evidence accumulation that can underlie normative, adaptive decision-making, including expectation-dependent changes in leaky accumulation when beliefs are weak and saturating accumulation when beliefs are stronger. We showed that human subjects made decisions on two separate tasks, requiring evidence accumulation either across or within trials, that were consistent with the adaptive, hazard-dependent accumulation process prescribed by the model.
Our findings substantially extend previous studies that similarly suggested that human decision-making behavior can reflect adaptations to the rate of environmental changes (Behrens et al., 2007; Brown and Steyvers, 2009; Gonzalez Castro et al., 2014). Specifically, we showed that subjects could both learn a range of hazard rates and then use those learned rates in a normative manner to interpret sequences of evidence to make decisions. However, they tended to learn imperfectly, overestimating low hazard rates and underestimating high hazard rates. Thus, although their use of these imperfectly learned hazard rates was consistent with the normative model, their overall decisions in some cases fell short of the ideal observer. Our framework provides a new way to interpret these deviations from optimality: not simply as poor performance, but rather as different, hazard-dependent set points of an inherent tradeoff. This tradeoff balances sensitivity to change during periods of expected instability, and sensitivity to steady-state signals during periods of expected stability. These different set points may have reflected certain prior expectations about the improbability of either perfect stability or excessive instability that could constrain performance when those conditions occur.
Such prior expectations about a lack of perfect environmental stability interpreted in the context of our framework might also provide new insights into previous studies of the temporal dynamics of evidence accumulation. In some cases, decisions about perfectly stable stimuli appear to involve perfect accumulation, as described by drift-diffusion and related models (Gold and Shadlen, 2000; Roitman and Shadlen, 2002; Brunton et al., 2013; Hanks et al., 2015). Under those conditions, deviations from perfect accumulation in the brain may be considered as inefficient, operating under other constraints (e.g., computational costs), or of uncertain relevance to decision-making (Usher and McClelland, 2001; Drugowitsch et al., 2012). In contrast, our results imply that at least some deviations from perfect accumulation might reflect normative adjustments to expected instabilities, even under the nominally stable conditions used for many tasks. For example, leaky accumulation that places more emphasis on recent vs past information or rates of accumulation that vary as a function of time, which can account for the temporal dynamics of certain decisions about stimuli that are presented with stable statistics for hundreds of ms or more, might reflect prior expectations that instabilities are likely to occur within that time frame (Usher and McClelland, 2001; Eckhoff et al., 2008). Likewise, reports of an ‘urgency’ signal that limits temporal integration based on a drive to respond quickly might reflect similar expectations of impending instabilities (Reddi and Carpenter, 2000; Ditterich, 2006; Cisek et al., 2009; Drugowitsch et al., 2012; Thura et al., 2012). More extreme expectations of instabilities might relate to other tasks that appear not to require temporal integration at all and instead show little dependence of performance on stimulus duration beyond what is needed to activate the sensory detectors (Ludwig et al., 2005; Uchida et al., 2006).
These interpretations are consistent with the idea that the temporal integration window for many kinds of decisions might be highly flexible and adapt to the temporal dynamics of the environment (Ossmy et al., 2013; Gonzalez Castro et al., 2014). Insofar as the accumulated evidence that serves as the decision variable governing choice behavior can also be thought of as a confidence signal, such adaptive dynamics might also pertain to confidence judgments associated with certain decision tasks (Kepecs et al., 2008; Kiani and Shadlen, 2009; Ma and Jazayeri, 2014). Further work is needed to understand if and how these findings can be understood in the context of a common set of normative principles that balance the identification of steady signals with change detection.
Our results might also have implications for understanding the tradeoff between speed and accuracy inherent to many tasks (Gold and Shadlen, 2007; Bogacz et al., 2010). Sequential-sampling models like drift-diffusion typically account for this tradeoff in terms of an absorbing decision boundary. This boundary can be set to a predefined value to terminate the decision process while emphasizing either speed or accuracy at the expense of the other, or possibly balancing the two in the service of maximizing related quantities like reward rate (Gold and Shadlen, 2002; Palmer et al., 2005; Bogacz et al., 2006, 2010; Simen et al., 2009). Alternatively, in our model the adaptive accumulation process can be suspended, at least temporarily, not by an extrinsically imposed decision rule like an absorbing decision boundary but rather by the nonlinear dynamics of the accumulation process itself. In principle, certain decisions might be made by committing to an alternative once this asymptotic regime is reached. This regime represents an upper limit on the expected level of confidence and thus precludes the need for either additional data for that alternative or for an additional boundary. In this case, the resulting speed-accuracy tradeoff would not necessarily reflect a predefined attempt to control those factors explicitly but rather expectations about the rate at which the evidence-generating process is changing.
Future work is needed to investigate how key features of our model might be implemented in the nervous system for different tasks and different timescales. Previous studies using tasks that required information accumulation on the timescale of the triangles task (e.g., over many seconds to minutes) have similarly suggested that humans can approximate optimal change detection, which in some cases includes a sensitivity to different hazard rates (Behrens et al., 2007; Brown and Steyvers, 2009; Nassar et al., 2010). The neural mechanisms of these abilities are not yet known, but fMRI and pupillometry data suggest possible roles for the arousal system including the anterior cingulate cortex and the noradrenergic system, and genotype data imply possible contributions of the dopamine system (Yu and Dayan, 2005; Nassar et al., 2012; Behrens et al., 2007; Krugel et al., 2009; O'Reilly et al., 2013; McGuire et al., 2014). Conversely, studies of evidence-accumulation processes that operate over shorter timescales, like for various versions of the random-dot motion task, have focused on dynamic neural signals in other parts of cortex, the basal ganglia, and the superior colliculus that can reflect the rapid buildup of evidence to select a particular motor response (in these cases involving eye movements) (Gold and Shadlen, 2007; Ding and Gold, 2013). There are some suggestions that these systems may interact under certain conditions (O'Reilly et al., 2013), but much more work is needed to understand the brain mechanisms responsible for the kinds of normative, scale-invariant dynamics of evidence accumulation we characterized in this study. Extending our framework to more than two alternatives and to conditions in which the statistics of the evidence change gradually, as opposed to abruptly, would also be an important step towards better understanding how the brain accumulates and interprets dynamic evidence to solve complex, real-world problems.
Materials and methods
Discrete-time model
The normative model is based on the posterior probability of each option (z_{1} or z_{2}) given all of the evidence collected so far (x_{1:n}), $q(z_{i,n}) \equiv p(z_{i,n} \mid x_{1:n})$. We assume that at each time step, there is a probability (H, for ‘hazard rate’) that there will be a switch in the correct option. Beginning with Bayes' Rule, and using the sum and product rules of probability, it can be shown that:
where p(x_{n} | z_{i}) is the likelihood of observing the evidence from source i. This relationship is the forward recursion for the Baum-Welch algorithm in Hidden Markov Models and has been proven elsewhere (Bishop, 2006). We derived the model (Equations 1, 2) by taking the logarithm of the ratio of the two equations; that is, defining $L_{n} \equiv \log\left(q(z_{1,n})/q(z_{2,n})\right)$ and expanding the logarithm, giving:
where the first term of the RHS is the LLR in Equation 1 by definition. The second term of the RHS can be manipulated to yield ψ (Equation 2) first by dividing both the numerator and denominator by Hq(z_{2,n − 1}), then expanding the expression while using $\frac{q(z_{1,n-1})}{q(z_{2,n-1})} = \exp(L_{n-1})$ by definition, giving $\psi(L_{n-1},H) = \log\left[\frac{1-H}{H}\exp(L_{n-1}) + 1\right] - \log\left[\exp(L_{n-1}) + \frac{1-H}{H}\right]$. Factoring out exp(L_{n − 1}) from the first term of the RHS yields Equation 2.
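The derivation above can be checked numerically. The following is a minimal sketch, not the authors' code: `forward_step` assumes the standard forward recursion for a symmetric two-state hidden Markov model with switch probability H, and `psi` implements the factored form of the log prior odds described in the text. The function names and test values are illustrative assumptions.

```python
import math

def psi(L_prev, H):
    """Log prior odds given the previous belief L_prev and hazard rate H (0 < H < 1)."""
    r = (1.0 - H) / H
    return L_prev + math.log(r + math.exp(-L_prev)) - math.log(r + math.exp(L_prev))

def update(L_prev, llr, H):
    """One belief update: log prior odds plus the new log likelihood ratio."""
    return psi(L_prev, H) + llr

def forward_step(q1, q2, lik1, lik2, H):
    """Assumed two-state forward recursion on normalized posteriors (illustrative)."""
    n1 = lik1 * ((1.0 - H) * q1 + H * q2)
    n2 = lik2 * ((1.0 - H) * q2 + H * q1)
    total = n1 + n2
    return n1 / total, n2 / total

# Compare the log-odds update with the log ratio of the forward recursion.
q1, q2 = forward_step(0.7, 0.3, 0.6, 0.2, 0.1)
L_direct = math.log(q1 / q2)
L_update = update(math.log(0.7 / 0.3), math.log(0.6 / 0.2), 0.1)
print(L_direct, L_update)
```

Under these assumptions the two routes agree, and the limiting cases discussed next can be probed by setting H close to 0 (near-perfect integration) or H = 0.5 (prior odds of zero).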
The special cases of H = 0 and H = 0.5 are most straightforward to see from Equation 5. When H = 0:
and
which is perfect integration of the log likelihood ratios. When H = 0.5:
and
Continuous-time model
Akin to the discrete-time model, the continuous-time version is based on the posterior probabilities of each option given all evidence collected until a given time point t. It has been shown previously that the non-normalized posterior probabilities of each of two states in a Markov jump process dx(t), with average values ±μ and noise magnitude σ, can be written as a system of stochastic differential equations (Zakai, 1965; Liptser and Shiryaev, 1977):
We used this result to write the log-odds ratio signal as L(t), seeking the derivative $dL(t) \equiv d\log\left(q_{1}(t)/q_{2}(t)\right)$, by beginning with Equation 6, separating out the deterministic and stochastic components of the incoming evidence, and rewriting Equation 6 in vector form:
Applying Itō's Lemma:
We now expand each component of Equation 8, beginning with those in the deterministic expression:
and
Turning to the stochastic component:
Substituting Equations 9–11 into Equation 8 yields:
Letting $A = 2\mu/\sigma^{2}$, and using the hyperbolic sine function, we have $\text{d}L(t) = \left[-2\lambda\sinh L(t) + Ah(t)\right]\text{d}t + A\sigma\,\text{d}W$, which can be rewritten as Equation 4 using $\text{d}x(t) = h(t)\,\text{d}t + \sigma\,\text{d}W$. Simulations in Figure 1—figure supplement 1 show examples of time-evolution of the belief variable by approximating Equation 4 with the Euler-Maruyama method.
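As an illustration of the Euler-Maruyama approximation mentioned above, the sketch below steps the belief variable forward using the drift $-2\lambda\sinh L + Ah(t)$ and noise scale $A\sigma$ from the expression for dL(t). It is a minimal sketch under assumed parameter values (λ = 1 Hz, μ = σ = 1, a 1-ms step); the function name, arguments, and seed are hypothetical and are not taken from the original simulations.

```python
import math
import random

def simulate_belief(lam, mu, sigma, h_of_t, T, dt, seed=0):
    """Euler-Maruyama integration of dL = [-2*lam*sinh(L) + A*h(t)] dt + A*sigma dW."""
    rng = random.Random(seed)
    A = 2.0 * mu / sigma ** 2
    n_steps = int(round(T / dt))
    L = 0.0
    trace = [L]
    for i in range(n_steps):
        t = i * dt
        dW = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment with variance dt
        L += (-2.0 * lam * math.sinh(L) + A * h_of_t(t)) * dt + A * sigma * dW
        trace.append(L)
    return trace

# Stable evidence favoring the first alternative (h(t) = +mu) for 50 s.
trace = simulate_belief(lam=1.0, mu=1.0, sigma=1.0,
                        h_of_t=lambda t: 1.0, T=50.0, dt=0.001, seed=0)
late = trace[len(trace) // 2:]
print(sum(late) / len(late))
```

The sinh term pulls the belief back increasingly strongly as |L| grows, so the simulated trace stays bounded without any explicit boundary. Explicit Euler steps can become unstable if dt is large relative to 1/(2λ cosh L), so a small step is assumed here.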
First-order approximations
We made first-order Taylor approximations of the deterministic terms in each model (Figure 2A–D, Equation 3).
Discrete-time model
For the discrete-time model, we based the approximations on the log prior odds in Equation 2: $\psi(L_{n-1},H) \approx \psi(L'_{n-1},H) + \frac{\partial}{\partial L_{n-1}}\psi(L'_{n-1},H)\left(L_{n-1} - L'_{n-1}\right)$, where L_{n − 1}′ is the value of the previous belief around which the approximation was made, and
Writing this equation with leak rate K and bias θ as in Equation 3, $K \equiv 1 - \frac{\partial}{\partial L}\psi(L',H)$ and $\theta \equiv \psi(L',H) - \frac{\partial}{\partial L}\psi(L',H)\,L'$.
When previous beliefs are weak; that is, L′ = 0,
and
Expressing the approximation in terms of Equation 3, leak rate K = 2H and bias θ = 0, as in Equation 3a.
When previous beliefs are strongly in favor of the first alternative; that is, L_{n − 1} → ∞,
so the leak rate K = 1, and the bias is determined entirely by the value of the log-prior odds evaluated as:
as in Equation 3b.
Similarly, when previous beliefs are strongly in favor of the second alternative; i.e., L_{n − 1} → −∞,
so the leak rate K = 1 here as well, and the bias is determined entirely by the value of the log-prior odds evaluated as:
as in Equation 3c.
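The leak and bias in the three regimes above can be reproduced numerically. In the sketch below, `dpsi_dL` is the derivative of the factored log prior odds worked out by hand for this example (an assumption of the sketch, not a formula quoted from the paper), and `linearize` applies the definitions of K and θ given above. A belief of ±30 stands in for the limits L → ±∞.

```python
import math

def psi(L, H):
    """Log prior odds as a function of previous belief L and hazard rate H."""
    r = (1.0 - H) / H
    return L + math.log(r + math.exp(-L)) - math.log(r + math.exp(L))

def dpsi_dL(L, H):
    """Derivative of psi with respect to L (derived for this sketch)."""
    r = (1.0 - H) / H
    return 1.0 - math.exp(-L) / (r + math.exp(-L)) - math.exp(L) / (r + math.exp(L))

def linearize(L_ref, H):
    """Leak K and bias theta of the first-order approximation around L_ref."""
    slope = dpsi_dL(L_ref, H)
    K = 1.0 - slope
    theta = psi(L_ref, H) - slope * L_ref
    return K, theta

H = 0.1
K_weak, theta_weak = linearize(0.0, H)     # weak beliefs
K_pos, theta_pos = linearize(30.0, H)      # strong belief in alternative 1
K_neg, theta_neg = linearize(-30.0, H)     # strong belief in alternative 2
print(K_weak, theta_weak, K_pos, theta_pos, K_neg, theta_neg)
```

With H = 0.1 this gives a leak of 2H and zero bias for weak beliefs, and a leak of one with bias ±log((1 − H)/H) for strong beliefs, which is the stabilizing-boundary behavior described in the text.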
Continuous-time model
We approximated the continuous-time model in Equation 4 by taking the first-order Taylor approximation of the deterministic term, which we write here as $g(L) \equiv -2\lambda\sinh(L)$. Specifically, $g(L) \approx -kL + b$, where k represents the time-varying leak rate as in the discrete-time model (Figure 2B), and b represents a bias in the derivative of the belief variable (Equation 4; Figure 2D) that, along with the changing leak rate, effects a stabilizing boundary like in the discrete-time version. We calculated k as the magnitude of the slope of g(L), which is given as $2\lambda \times \frac{\text{d}}{\text{d}L}\sinh(L'(t)) = 2\lambda\cosh(L'(t))$, where L′ is the belief state around which the approximation was made. We computed bias as: $g(L',\lambda) - \frac{\partial}{\partial L}g(L',\lambda) \times L' = -2\lambda\sinh(L') + 2\lambda L'\cosh(L')$.
Analogous to the discrete-time case, when L′ = 0 (certainty is low and beliefs are weak), $k = 2\lambda\cosh(0) = 2\lambda$, $b = -2\lambda\sinh(0) + 2\lambda \times 0 \times \cosh(0) = 0$, and the approximation is linear, resulting in an Ornstein-Uhlenbeck process during periods of stability in the data. However, as L′ → ±∞, k → ∞, analogous to the leak rate approaching one in the discrete-time case, and
Whereas discrete-time approximations of the model give log-priors that are qualitatively similar to Equation 2 for strong beliefs (Figure 1C), this regime in general is not as well approximated as a linear-Gaussian process, with steady-state solutions over stable periods that are shifted, extreme-value distributions (Figure 1—figure supplement 1, panel D). These distributions can be approximated by first solving for the general steady-state probability distribution of L, as follows. Beginning with the corresponding Fokker-Planck equation and letting p(L, t) denote the time-dependent probability distribution of L and γ = Ah(t), the average of the sensory evidence during the stable period, we want the solution to p(L, t) such that: $\frac{\partial}{\partial t}p(L,t) = -\frac{\partial}{\partial L}\left[\left(-2\lambda\sinh(L) + \gamma\right)p(L,t)\right] + \sqrt{\gamma^{2}}\,\frac{\partial^{2}}{\partial L^{2}}p(L,t) = 0$.
Therefore, $\frac{\partial}{\partial L}\left[\left(-2\lambda\sinh(L) + \gamma\right)p(L,t)\right] = \sqrt{\gamma^{2}}\,\frac{\partial^{2}}{\partial L^{2}}p(L,t)$, which we solved as $p(L) = C_{0}\exp\left(\int_{L_{a}}^{L}\frac{-2\lambda\sinh(L) + \gamma}{\sqrt{\gamma^{2}}}\,\text{d}L\right)$, where C_{0} is a normalizing constant and L_{a} is a reflecting boundary condition. So $p(L) = C_{0}\exp\left(\left(-2\lambda\cosh(L) + \gamma L + 2\lambda\cosh(L_{a}) - \gamma L_{a}\right)/\sqrt{\gamma^{2}}\right)$ and letting C be another normalizing constant that absorbs the constant terms inside the exponential:
In the high-certainty regime (the expected value of the belief variable is very positive or negative, either because of strong sensory evidence or a very low hazard rate, or both), this expression can be well approximated as
Figure 1—figure supplement 1, panel B shows an example of this approximation along with simulations. Equation 14 can be rewritten as extreme value distributions with location parameters ±log(γ/λ) with the sign depending on the sign of the sensory evidence, and scale parameter = 1. For example, taking Equation 14a,
where C′ = Cγ/λ.
Tasks
48 subjects (29 female, 19 male; age range = 19–45 years) participated in the triangles task, and 13 subjects (7 female, 6 male; age range = 19–38 years) participated in the dots-reversal task after providing informed consent. Human subject protocols were approved by the University of Pennsylvania Internal Review Board. Both tasks were performed on an iMac with a 27′′ (68.5 cm) screen.
Triangles task
Triangles were separated by 16 cm and represented the centers of a pair of two-dimensional Gaussian distributions. On each trial, one triangle was chosen as the true source of the generated star, and that source's associated two-dimensional distribution was sampled to determine the position of the star. Distributions were directly represented on the screen by scaling the color axis by screen position between blue and green according to the probability that a star would be generated in the given position (Figure 4A,B). Because the triangles were separated along the horizontal axis, only this dimension was relevant for determining which triangle represented the true source on that trial. For each trial, the star blinked on and off for ∼1.5 s before the subject could choose the inferred source of that star, to minimize fast guesses. For each session, one of three different variances of the pair of two-dimensional distributions was randomly chosen without replacement (the ratio of the standard deviation of the generative process to the distance between the triangles was 0.24, 0.33, or 0.41). These three conditions corresponded to mean values of log-likelihood ratios of 9, 4.5, or 3.33, respectively, of generated stars.
Each subject performed two 1000-trial blocks per session in 1–4 total sessions. Each block used a hazard rate that governed the rate of switching between the two sources (triangles) and was chosen randomly from a set of seven possible values (0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95). Hazard rates were chosen without replacement within sessions to ensure a change across blocks. In a subset of sessions, each block began with 400 trials in which subjects received feedback on the correct choice, followed by another 400 trials without feedback and ending with 200 feedback-based trials so that changes in hazard rate did not coincide with the onset of feedback.
Before each session, subjects were instructed that the triangles generated stars into overlapping neighborhoods and shown representations of the spatial distributions. They were then instructed that triangles would ‘take turns’ generating stars, with switches in the turn-taking that would occasionally be fast or slow. After receiving these instructions, subjects were shown an animation illustrating the generative process (i.e., a sample sequence of trials).
Subjects were paid a minimum $8 per session and an additional amount based on performance: at the beginning of each session, the subject had $5, and over the trials was penalized by 20 cents for each incorrect choice and rewarded with either 1 or 2 cents for correct choices. Total cash reward was continuously updated on feedback trials but not on nonfeedback trials so subjects could not infer the previous correct choice. On average, subjects received a total additional cash reward of $8 (range $0–27).
Dots-reversal task
This task was based on decisions about the coherent direction of motion of a set of stochastic dots (density = 70 dots/deg^{2}/s) presented in a 10°-diameter circular aperture at the center of the computer screen with three interleaved frames of motion. Each trial involved a stimulus 5–10 s in duration, determined as min(10, 5 + τ), where τ was an exponentially distributed random variable, making trial terminations unpredictable within the given time frame. Within each trial, the direction of movement alternated between leftward and rightward at an average hazard rate of either 0.1 Hz or 2 Hz. Subjects participated in two sessions each, with 200 trials per session and a hazard rate that was constant throughout the session. The order of the two hazard rate sessions was chosen at random.
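The trial structure described above can be sketched as follows. The text does not state the mean of τ, so `tau_mean` below is a placeholder assumption, and modeling direction changes as a Poisson process with exponentially distributed intervals is likewise an assumption of this sketch, chosen because it gives a constant hazard rate. Function names are hypothetical.

```python
import random

def trial_duration(rng, tau_mean=1.0):
    """Stimulus duration in seconds: min(10, 5 + tau), tau ~ exponential.

    tau_mean is an assumed value; the source does not report it.
    """
    return min(10.0, 5.0 + rng.expovariate(1.0 / tau_mean))

def change_times(rng, duration, hazard_hz):
    """Times of direction reversals within one trial (assumed Poisson process)."""
    times = []
    t = rng.expovariate(hazard_hz)
    while t < duration:
        times.append(t)
        t += rng.expovariate(hazard_hz)
    return times

rng = random.Random(1)
durations = [trial_duration(rng) for _ in range(1000)]
reversals = change_times(rng, 8.0, 2.0)  # one 8-s trial in the 2-Hz condition
print(min(durations), max(durations), len(reversals))
```

Because of the exponential tail, a subject cannot predict exactly when a trial will end beyond knowing it lasts at least 5 s and at most 10 s.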
Each session began with 20 practice trials in which coherence was set to 60–85%, relatively high values that were considered easy for all subjects. For the remaining 180 trials, coherence was randomly chosen as either this ‘high coherence’ value (25% of trials) or a ‘low-coherence’ value (75% of trials). The value used for ‘low coherence’ was determined separately for each subject (mean ± SEM = 14.85 ± 1.53%, range 6–38%), corresponding to the coherence for which the participant could correctly decide the direction of a 500-ms-long stable stimulus (i.e., no direction changes) 65% of the time. We assessed this threshold using a modified version of the adaptive QUEST procedure in which coherence was set to the mean of the threshold probability distribution (Watson and Pelli, 1983; King-Smith et al., 1994). We ran the QUEST procedure before each session and did not end the procedure until stable performance was achieved, defined as an estimated threshold log-likelihood ≥ −2.5 under the assumption that unstable performance would shift the corresponding threshold and keep log-likelihoods lower (direct estimates of threshold across consecutive 20-trial blocks verified this assumption). We also ensured that the measured threshold from this procedure was consistent across sessions for individual subjects. We based ‘high’ coherence on the low-coherence threshold: subjects with thresholds >15% received 85% high coherence, those with thresholds between 7–15% received 80% high coherence, and one subject with a remarkably low (∼4–5%) threshold received 60% high coherence. All stimuli in this range are fairly easy to judge.
Before the session, subjects were instructed to do their best to follow motion direction throughout a given trial and indicate the direction they believed dots were moving in at the ‘very end’ of the trial. Subjects received feedback after each trial on the correct answer but were not given additional monetary reward for performance as in the triangles task.
Model fitting
All models were fit to choice data using Matlab's optimization toolbox by minimizing the cross-entropy error function (Bishop, 2006):
where ρ_{n} is a binary variable indicating which alternative was chosen on trial n (arbitrarily defined as 0 for the right and 1 for the left) and $\widehat{\rho}_{n}$ is the choice probability predicted by the given model. All models assumed the choices were based on the sign of the subjective log-odds.
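For reference, the standard cross-entropy error for binary choices (the form given in Bishop, 2006) can be sketched as below. This is an illustrative Python version, not the authors' Matlab code; the clipping constant `eps` is an addition of this sketch to avoid taking the logarithm of zero.

```python
import math

def cross_entropy(choices, predicted, eps=1e-12):
    """Sum over trials of -[rho*log(p) + (1 - rho)*log(1 - p)].

    choices: 0 (right) or 1 (left) on each trial.
    predicted: model probability of choosing left on each trial.
    eps: clipping bound added in this sketch for numerical safety.
    """
    total = 0.0
    for rho, p in zip(choices, predicted):
        p = min(max(p, eps), 1.0 - eps)
        total -= rho * math.log(p) + (1 - rho) * math.log(1.0 - p)
    return total

chance = cross_entropy([1, 0], [0.5, 0.5])
good = cross_entropy([1, 0], [0.9, 0.1])
print(chance, good)
```

Lower values indicate predicted choice probabilities that are closer to the observed choices, so a fitting routine would minimize this quantity over the model parameters.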
Triangles task
We fit choices from the triangles task for each block of trials and each session by computing estimated L_{n} on each iteration of the fitting algorithm using Equation 1; the current best estimate of subjective hazard rate for that iteration and block of trials ($\widehat{H}_{b}$); and a gain term on star position (β) that was applied across blocks but was specific to each session, which accounted for a subjective estimate of the generative variance (Gold and Shadlen, 2001): $L_{n} = \psi(L_{n-1},\widehat{H}_{b}) + \beta x_{n}$. Unless otherwise indicated, all fits were made to trials without any feedback on the correct answer to confirm that learned hazard rates were used during this period. We omitted from further analyses the first 200 trials of each block for sessions in which no trial-by-trial feedback was given at all, allowing for stabilization of learned, subjective estimates of the current generative H and resulting in a total of 800–6400 trials per subject for analysis. We assumed choices were based on the subjective log-odds and Gaussian noise: $\rho_{n} = \text{sign}(L_{n} + \zeta_{n})$ where ζ_{n} is the Gaussian noise variable. We also assumed a fixed magnitude of noise υ across sessions but specific to each subject. Model fits thus computed the cumulative probability of choosing the left triangle for subject i, in session s, block b, and trial n as:
Thus, this model had 4–16 free parameters for each subject, depending on the number of sessions. Because of the large number of parameters for some subjects, we used a maximum a posteriori (MAP) fitting procedure that placed conjugate priors on each parameter that were fixed across subjects and conditions, subtracting off log(p[parameter estimate | conjugate prior distribution of parameters]) from the error term in Equation 15.
Dots-reversal task
We fit the continuous-time model to data from the dots-reversal task using Equation 1, which is the most efficient discrete-time approximation of Equation 4 and faster than the Euler-Maruyama approximation with small timesteps. However, unlike the triangles task, here all choices were made after observing an entire sequence of stimuli over the trial. We thus fit the model using Equation 1 with n indexing trial rather than time and m indexing timestep within a trial, i indexing subject and s indexing session, as follows:
where k_{i} is a gain term for that subject on signed coherence C_{ismn} (negative for rightward motion, positive for leftward motion; because coherence was determined probabilistically, its magnitude could vary step-by-step within a trial); $\langle |C_{isn}| \rangle$ indicates the expected coherence magnitude for that trial, determined experimentally as described above; η_{ismn} is zero-mean, unit variance Gaussian noise; and $\widehat{H}_{is}$ is subjective hazard rate specific to each session. To convert to a continuous-time subjective hazard rate λ, we multiplied $\widehat{H}_{is}$ by the monitor refresh rate (60 Hz), which determines the timesteps between dot drawings.
The last two terms of Equation 17 represent the subjective LLR, which we assume reflects a coherence-dependent signal (the second-to-last term) plus internal (neural), signal-dependent noise (the last term). Because we could not directly observe this noisy quantity, we fit the model by numerically deriving the time-evolution of the log-odds probability distribution over each trial and each step of the fitting iteration. Specifically, we determined the probability of each log-odds (L_{isomn}) at each time step by marginalizing over the probability of each log-odds from the previous timestep (L_{isj, m − 1, n}):
where $\mathrm{N}(x \mid \mu, \sigma)$ denotes a Gaussian probability distribution function of variable x with mean μ and standard deviation σ and is the conditional probability distribution of the log-odds on each timestep given the coherence, model parameters, and log-odds on the previous timestep. Probability distributions were initialized on each trial as a Dirac delta function. For the model fits, we discretized the log-odds space between −10 and 10 over steps of 0.4 log-odds, which we determined via simulations to be a sufficient range and resolution for accurate parameter estimates. Estimated choice probabilities were then computed as the cumulative probability of choosing the leftward direction: $\widehat{p}_{isn} = \int_{0}^{\infty} p(L_{isMn})\,\text{d}L_{i}$. Model fits were thus based on three parameters per subject, a single gain term k_{i} and two hazard rate terms $\widehat{H}_{is}$, one for each session.
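The marginalization described above can be sketched on the stated grid (−10 to 10 in steps of 0.4 log-odds). This is a minimal illustration under several assumptions that are not in the source: the transition mean is taken as ψ(L_prev, H) + kC and the standard deviation as sqrt(2k⟨|C|⟩), following the verbal description of the signal and noise terms; each Gaussian is renormalized over the grid; the mean is clamped to the grid range; the delta function is placed at L = 0; and the mass at exactly L = 0 is split evenly between the two choices. Coherence is expressed as a fraction, and all function names and parameter values are hypothetical.

```python
import math

def psi(L_prev, H):
    """Log prior odds for previous belief L_prev and per-timestep hazard rate H."""
    r = (1.0 - H) / H
    return L_prev + math.log(r + math.exp(-L_prev)) - math.log(r + math.exp(L_prev))

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def make_grid(half_width=10.0, step=0.4):
    """Symmetric log-odds grid with an exact zero at its center."""
    n_half = int(round(half_width / step))
    return [i * step for i in range(-n_half, n_half + 1)]

def propagate(coherences, H, k, mean_abs_coh, grid):
    """Evolve the log-odds distribution over the timesteps of one trial."""
    p = [0.0] * len(grid)
    p[len(grid) // 2] = 1.0                      # delta function at L = 0 (assumed)
    sd = math.sqrt(2.0 * k * mean_abs_coh)       # signal-dependent noise
    for c in coherences:
        new_p = [0.0] * len(grid)
        for j, L_prev in enumerate(grid):
            if p[j] == 0.0:
                continue
            mean = psi(L_prev, H) + k * c
            mean = min(max(mean, grid[0]), grid[-1])   # keep mass on the grid
            weights = [normal_pdf(L, mean, sd) for L in grid]
            total = sum(weights)
            for o, w in enumerate(weights):
                new_p[o] += p[j] * w / total           # marginalize over L_prev
        p = new_p
    return p

def prob_left(p, grid):
    """Probability mass at positive log-odds, plus half the mass at zero."""
    center = len(grid) // 2
    return sum(p[center + 1:]) + 0.5 * p[center]

grid = make_grid()
p_left_motion = propagate([0.2] * 30, 0.02, 1.0, 0.2, grid)
p_right_motion = propagate([-0.2] * 30, 0.02, 1.0, 0.2, grid)
print(prob_left(p_left_motion, grid), prob_left(p_right_motion, grid))
```

In a fit, `prob_left` for each trial would be passed to the error function together with the observed choice, and H and k would be adjusted to minimize the error.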
Suboptimal model #1: block-independent model
For both tasks, we fit to choice data a version of the normative model that ignored the blockwise changes in objective hazard rate. To ensure a fair model comparison with the block-dependent normative model, we constructed the null model using the same number of hazard-rate parameters but randomly shuffled across blocks. Specifically, for the triangles task the main model equation (Equation 16) was written as $L_{isbn} = \psi\left(L_{isb,n-1},\widehat{H}_{is,b(n)}\right) + \beta_{is}x_{isbn}$, where b(n) denotes the hazard-specific block to which this trial was randomly assigned. Randomization was performed before each fit, and this shuffle-and-fit procedure was repeated 50 times to generate a distribution of model log-likelihoods and BIC values against which the normative model could be compared across subjects. For the dots-reversal task, the main model equations (Equations 17, 18) were written as $L_{mn} = \psi\left(L_{m-1,n},\widehat{H}(s_{n})\right) + kC_{mn} + \sqrt{2k\langle |C_{n}| \rangle}\,\eta_{mn}$, where s was a vector indicating the randomized session, with the shuffle-and-fit procedure repeated 20 times per subject.
Suboptimal model #2: block-dependent leaky accumulator
For both tasks, we also fit an alternative model based on the linear approximation in Equation 3a. For the triangles task this model was written as $L_{isbn} = (1 - K_{isb})L_{isb,n-1} + \beta_{is}x_{isbn}$, where K was the leak specific to subject i in session s and block b, and the other parameters were as described above. For the dots-reversal task we fit the same model, only here treating noise as a latent variable like we did for fitting the normative model and deriving the time-evolution of the log-odds probability distribution over each trial and each step of the fitting iteration: $L_{ismn} = (1 - K_{is})L_{is,m-1,n} + k_{i}C_{ismn} + \sqrt{2k_{i}\langle |C_{isn}| \rangle}\,\eta_{ismn}$. Unlike the normative model, these fits were greatly simplified by an analytic solution for the stationary standard deviation of the log-odds, given the current coherence and model parameters: $\sigma_{isn} = \sqrt{k_{i}\langle |C_{isn}| \rangle / K_{is}}$. This quantity is the standard deviation for the stationary probability distribution of the discrete-time analogue of an Ornstein-Uhlenbeck process in which coherence is perfectly stable over a trial. Thus, for each step of the fitting procedure, we solved for the average log-odds at the end of the trial by: (1) running the deterministic portion of the leaky accumulation over the time-dependent coherence for the trial, and (2) writing the probability distribution of the log-odds at the end of the trial as a Gaussian with the final mean determined in step 1 and a standard deviation as described above, giving $\widehat{p}_{i,s,n} = \frac{1}{2} + \frac{1}{2}\mathrm{erf}\left(L_{i,s,M,n} / \sqrt{2k_{i}\langle |C_{isn}| \rangle / K_{is}}\right)$, where M indicates the final sample of trial n.
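The two-step computation for the leaky accumulator can be sketched directly from the expressions above: the deterministic recursion gives the final mean, and the stationary standard deviation gives the spread. The function name and the example parameter values (leak 0.1, gain 1, coherence 0.2 expressed as a fraction, 50 timesteps) are illustrative assumptions.

```python
import math

def leaky_choice_prob(coherences, K, k, mean_abs_coh):
    """Probability of a leftward choice under the leaky-accumulator model.

    coherences: signed coherence on each timestep (positive = leftward).
    K: leak; k: gain on coherence; mean_abs_coh: expected coherence magnitude.
    """
    # Step 1: deterministic part of the leaky accumulation.
    L = 0.0
    for c in coherences:
        L = (1.0 - K) * L + k * c
    # Step 2: Gaussian around the final mean with the stationary standard deviation.
    sd = math.sqrt(k * mean_abs_coh / K)
    return 0.5 + 0.5 * math.erf(L / (math.sqrt(2.0) * sd))

p_left = leaky_choice_prob([0.2] * 50, K=0.1, k=1.0, mean_abs_coh=0.2)
p_right = leaky_choice_prob([-0.2] * 50, K=0.1, k=1.0, mean_abs_coh=0.2)
print(p_left, p_right)
```

Because no distribution has to be propagated across a grid, each evaluation costs only one pass over the trial, which is the simplification the text refers to.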
Suboptimal model ^{#}3: perfect accumulation with blockdependent stabilizing boundaries
For both tasks, we also fit an alternative model assuming perfect integration to a stabilizing boundary as in Equation 2 with the boundary in Equation 3b,c as a free parameter; that is, rewriting Equation 2 as:
For the triangles task, these fits were made as for the normative model, only using the rewritten expression in Equation 19 for log-prior odds above; that is, ${L}_{isbn}=\psi \left({H}_{isb},{L}_{isb,n-1}\right)+{\beta}_{is}{x}_{isbn}$. For the dots-reversal task, we similarly made fits as for the normative model, substituting the expression for ψ in Equation 19 into Equations 17, 18.
Analysis details
Choice data shown in Figure 5A were fit by a two-parameter logistic function: ${\widehat{\rho}}_{n}=1/\left[1+\mathrm{exp}\left(-\left(LL{R}_{n}-\varphi \right)/\beta \right)\right]$, where ϕ represents the LLR for which subjects have a 50% chance of choosing either side, and β is the slope of the function around that point. Average differences in parameter ϕ by hazard-rate condition are reflected in the horizontal shift of the psychometric function, representing a bias towards (leftward shift) or away from (rightward shift) repeating the same choice. As for the fits of the main normative model, here the first 200–400 trials of each block were excluded to allow for a period of learning. We report the correlation between ϕ and the prediction from the asymptotic approximation of the fit normative model: $\mathrm{log}\left(\left(1-{H}_{subj}\right)/{H}_{subj}\right)$.
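As a small illustration of the two quantities compared here, the sketch below evaluates a two-parameter logistic function and the asymptotic log-odds term for a given subjective hazard rate. The function names are hypothetical, and the sketch assumes the logistic increases with LLR; it is not the analysis code used for Figure 5A.

```python
import math


def psychometric(llr, phi, beta):
    """Two-parameter logistic: choice probability as a function of LLR.

    phi:  LLR at which either choice is equally likely (horizontal shift)
    beta: scale parameter governing steepness around phi (assumed positive)
    """
    return 1.0 / (1.0 + math.exp(-(llr - phi) / beta))


def asymptotic_log_odds(h_subj):
    """Asymptotic term log((1 - H) / H) for a subjective hazard rate H in (0, 1)."""
    return math.log((1.0 - h_subj) / h_subj)


if __name__ == "__main__":
    for h in (0.05, 0.5, 0.95):
        print(h, asymptotic_log_odds(h))
```

A subjective hazard rate of 0.5 gives a term of zero, and the term changes sign as the hazard rate moves to either side of 0.5.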
The subjective mappings of ψ_{n} to L_{n − 1} in Figure 6 were estimated as a nonparametric function, fit to choice data based on ${L}_{n}=\widehat{\psi}\left({L}_{n-1}\right)+LL{R}_{n}$, where the LLR was based on the generative variance used for the task. We expressed ψ_{n} as an interpolated function of L_{n − 1}, with values spread evenly between −10 and 10 in steps of one log-odds unit and interpolation performed with cubic splines. We fit the mapping with the same objective function as for the parametric models (Equation 10), using the entire data set within a given block to estimate the ψ that best fit choice data. We used Tikhonov regularization of the derivative (for smoothness) using the added penalty term: $\gamma {\displaystyle {\sum}_{i}{\left(\Delta {\psi}_{i}\right)}^{2}}$, where i indexes the value of L for which ψ was estimated and γ = 1/20 was determined through ad hoc methods.
Standard errors and statistical tests on performance measured as a function of viewing duration following the final change-point on the dots-reversal task were based on bootstrapped samples of the behavioral data or model predictions (Figure 8A–F, Figure 9H–K). Viewing duration bins were 0–200, 200–500, 500–1000, 1000–1500 and 1500–3000 ms. These analyses only included trials for which there was at least one change-point and the duration of the second-to-last direction was at least 300 ms to avoid immediately sequential change-points. The mean ± SEM durations of the second-to-last direction for trials used in this analysis were 2945 ± 55 ms for 0.1 Hz and 789 ± 13 ms for 2 Hz. At these durations, discrimination accuracy was likely to be at nearly asymptotic levels at the time of the final change-point. For a given duration bin specific to coherence and condition (indexed jointly as k), a single bootstrapped sample m of performance was calculated as ${\overline{\rho}}_{km}=\left({\displaystyle {\sum}_{n}{\overline{\rho}}_{kmn}}\right)/{N}_{km}$, where m indexes subject, generated as a random integer between zero and 14; n indexes trial; and N_{km} is the total number of trials within the duration bin for that subject and trial type (e.g., 0.1 Hz, low coherence). Means and standard errors were calculated as means and standard deviations of all bootstrapped samples (1000 samples per comparison). Statistical tests between condition-by-coherence trials i and j were based on paired differences between the same bootstrapped samples. The probability that ${\overline{\rho}}_{i}>{\overline{\rho}}_{j}$ was calculated as $\frac{1}{M}{\displaystyle {\sum}_{m}\left({\overline{\rho}}_{im}-{\overline{\rho}}_{jm}>0\right)}$ (where M was the total number of bootstrapped samples for that comparison), likewise for ${\overline{\rho}}_{i}<{\overline{\rho}}_{j}$, and statistical significance was indicated when one of these was less than the desired significance level (0.05).
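The paired bootstrap comparison can be sketched as follows. This is an illustrative simplification rather than the procedure used in the study: it assumes per-subject accuracies for two trial types as input and resamples subjects with replacement, and the function name, input format, and seed handling are all hypothetical.

```python
import random
import statistics


def bootstrap_compare(acc_i, acc_j, n_boot=1000, seed=0):
    """Paired bootstrap of mean accuracy for two trial types.

    acc_i, acc_j: per-subject accuracies, paired by position (assumed format).
    Returns bootstrap means, standard errors, and the fraction of samples
    in which trial type i outperformed trial type j.
    """
    rng = random.Random(seed)
    n = len(acc_i)
    means_i, means_j = [], []
    for _ in range(n_boot):
        # Use the same resampled subjects for both trial types so that
        # the differences are paired within each bootstrap sample.
        idx = [rng.randrange(n) for _ in range(n)]
        means_i.append(sum(acc_i[s] for s in idx) / n)
        means_j.append(sum(acc_j[s] for s in idx) / n)
    p_i_gt_j = sum(a - b > 0 for a, b in zip(means_i, means_j)) / n_boot
    return {
        "mean_i": statistics.mean(means_i),
        "sem_i": statistics.pstdev(means_i),  # SD across bootstrap samples
        "mean_j": statistics.mean(means_j),
        "sem_j": statistics.pstdev(means_j),
        "p_i_gt_j": p_i_gt_j,
    }


if __name__ == "__main__":
    result = bootstrap_compare([0.9, 0.85, 0.95], [0.6, 0.7, 0.65])
    print(result)
```

Reusing the same resampled indices for both trial types is what makes the differences paired; drawing the two sets independently would inflate the variance of the difference.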
Fit leaks for the dots-reversal task (Figure 9) used the same algorithm as in the leaky-accumulator model described above but with separate leak terms for both hazard rate and low vs high coherence. Specifically, for subject i, session s, time step m, trial n, and coherence level c, we defined the leak ${K}_{isc}$ through the update ${L}_{ismn}=\left(1-{K}_{isc}\right){L}_{is,m-1,n}+{k}_{i}{C}_{ismn}+\sqrt{2{k}_{i}\langle \left|{C}_{isn}\right|\rangle}{\eta}_{ismn}$. The noise was based on the stationary standard deviation of the log-odds, given the current coherence and model parameters: ${\sigma}_{isn}=\sqrt{{k}_{i}\langle \left|{C}_{isn}\right|\rangle /{K}_{isc}}$. We then fit the choice data to the predicted probability of a given choice for each trial, ${\widehat{p}}_{i,s,n}=\frac{1}{2}+\frac{1}{2}\mathrm{erf}\left({L}_{i,s,M,n}/\sqrt{2{k}_{i}\langle \left|{C}_{isn}\right|\rangle /{K}_{isc}}\right)$, where M indicates the final sample of trial n. We used a beta distribution prior on leak rate, letting the maximized model log probability for each subject be based on the sum of log likelihoods and this log prior probability, similar to what we did for the triangles task. We were interested in the dependence of leak on coherence level and in comparing these dependencies between the choice data and model predictions. To control for overall level of leak by session (and subject), we computed the dependence as a normalized quantity: ${d}_{is}=\left({K}_{is,high}-{K}_{is,low}\right)/\left({K}_{is,high}+{K}_{is,low}\right)$, where the ‘high’ and ‘low’ subscripts denote leaks for high and low coherences. The comparison between the data and normative model prediction of the dependence on hazard rate (session) used an analogous normalized measure: ${d}_{is}=\left({K}_{is,fast}-{K}_{is,slow}\right)/\left({K}_{is,fast}+{K}_{is,slow}\right)$, where ‘fast’ and ‘slow’ indicate the 2 Hz and 0.1 Hz sessions, respectively. Statistics reported were based on these difference measures.
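The normalized dependence measure is a contrast index bounded between −1 and 1 for positive leaks. A minimal sketch, with a hypothetical function name and made-up leak values used purely for illustration:

```python
def normalized_leak_contrast(k_a, k_b):
    """Return (k_a - k_b) / (k_a + k_b) for two positive fitted leaks.

    Use k_a = high-coherence leak and k_b = low-coherence leak for the
    coherence dependence, or k_a = fast (2 Hz) and k_b = slow (0.1 Hz)
    for the hazard-rate dependence.
    """
    total = k_a + k_b
    if total <= 0.0:
        raise ValueError("leaks must be positive for the contrast to be defined")
    return (k_a - k_b) / total


if __name__ == "__main__":
    # Illustrative values only, not fitted estimates from the study.
    print(normalized_leak_contrast(0.3, 0.1))
```

Dividing by the sum removes the overall leak level, so sessions or subjects with uniformly larger leaks are not over-weighted when the contrasts are pooled.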
Decision letter

Timothy Behrens, Reviewing Editor; Oxford University, United Kingdom
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
Thank you for submitting your work entitled “Normative evidence accumulation in unpredictable environments” for peer review at eLife. Your submission has been favorably evaluated by Timothy Behrens (Senior and Reviewing Editor) and three peer reviewers.
The reviewers have discussed the reviews with one another and the editor has drafted this decision to help you prepare a revised submission.
The study of how the brain integrates a history of evidence into a single expectation has been a remarkably profitable avenue of research in forming quantitative understandings of neural mechanisms over recent years and decades. However, two largely distinct literatures have tackled this problem from very different perspectives. One set of researchers have considered the mathematics and neurophysiology of integrating continuous streams of evidence using approaches such as the drift diffusion model and the sequential probability ratio test. Another set of researchers have considered the integration of discrete events or trials using Bayesian models that can, for example, optimally determine when to integrate evidence together, and when to separate new evidence because the world has changed.
The current paper presents an important step towards a unification of these two literatures, by presenting an elegant mathematical analysis of the Bayesian changepoint models to demonstrate that they can be viewed as modified sequential probability ratio tests, and are therefore directly applicable to the large set of researchers who are interested in continuous evidence integration. It makes new and interesting predictions about the stream integration case in situations when the evidence is nonstationary and it tests these predictions in behaviour. All three reviewers considered this to be a very interesting and potentially important step.
As you will see below, the reviewers did not have major criticisms of the model that is presented or data, which they broadly found to be convincing. Most important in terms of solidifying the conclusions, however, are Reviewer 1's and 3's similar points about the demonstration of the qualitative reason for the improvement of the current model (Reviewer 1) and the absence of a reasonable analysis of other (suboptimal) models (Reviewer 3).
In the discussion, the reviewers and editor were also clear that the manuscript is really written for a technical audience. eLife readership is broad and the reviewers and editor would appreciate a reframing of the paper that makes the central points clearer to this broad audience.
One suggestion that emerged during the discussion was a clearer framing of the manuscript as follows:
a) The same Bayesian learning framework that has been used in other contexts (work by Behrens, Adams, Kording and also Nassar and Gold) is here derived for evidence accumulation in perceptual decision-making tasks like RDM;
b) This normative framework can be described, in both discrete and continuous cases, as a leaky accumulator with nonabsorbing bounds, with the leak rate and the height of the bounds being explicit functions of the environmental change rate;
c) Show that this is the case in their data (their paradigm with variable hazard rate is a good test even if atypical of RDM experiments);
d) Provide some justification why this model helps our understanding of perceptual decision-making (can the relationship with change point detection explain or offer insight into previously unexplained data?);
e) Provide some clear commentary on the relationship to previous models of change point detection (can the relationship with RDMs explain or offer insight into previously unexplained data?).
We urge you to consider this, or other possible reframings in which the strong message can be understood clearly without an understanding of the technical details.
Reviewer #1:
In this study, Glaze, Kable and Gold present a model of evidence integration over time in which the relative weighting given to new and past observations can be adjusted to reflect the hazard rate for change in the environment. They show that this model can be generalized to both discrete and continuous time cases and that it fits human performance better than a model without adjustable hazard rate, in two very different decision-making tasks – a random dot motion task with within-trial reversals, and a task in which evidence is accumulated over many discrete trials.
This is a good quality paper which I think will be of interest to people across the field of perceptual decision-making, since it presents a clear framework that is applicable in diverse tasks. It could be an influential and highly cited paper in the field.
If I were to make a case why the manuscript should be published in eLife, I would point out that the adaptation of the Bayesian framework to the random dot motion case does represent a major conceptual advance over the more typical approaches in that field (SPRT models with fixed leak rates or the drift diffusion model), and the model does better at explaining participants' performance than those more typical models. However, I would then suggest that the paper could be strengthened by exploring in more detail why the current model outperforms others (especially, the leaky accumulator with hazard rate as a free parameter by block). Is this because the leaky accumulator down weights past beliefs about the correct response without taking into account evidence strength? If so the reason for the current model's superiority is not really to do with estimating the hazard rate, although the text implies that it is. Furthermore, to what extent does the current model address burning questions in that field, such as how confidence judgements are made or when the evidence accumulation process should terminate?
Reviewer #2:
In this manuscript by Glaze et al., the authors present a normative model for evidence accumulation in a nonstationary environment in which the successive samples are drawn from one of two alternative distributions for an unknown and variable duration. The main part of the manuscript is to compare the performance of human subjects in two different behavioral tasks with the predictions of this normative model and a simpler alternative model based on leaky integration. The results show that human subjects differ significantly from the predictions of the normative model, in that the subjective estimate of the rate of change (hazard rate) is close to 0.5, suggesting that they tend to give insufficient weight to the history of evidence. Although the manuscript is quite technical in nature, it is written clearly, and the findings would be of high value to many researchers in the field. There are only a few, relatively minor, comments:
1) The overall conclusion of the authors is that the subjective estimates of hazard rate are biased, but this quantity is used in a normative way. However, this might be misleading. Namely, is it fair to refer to an algorithm as “normative” if the quantity used in this algorithm is biased? Does such an algorithm behave differently, for example, from a model that uses the accurate estimate of hazard rate in a non-normative way? Unless the authors can clarify how these two different scenarios can be distinguished, how the word “normative” is used in this manuscript might need to be improved.
2) In the subsection “Psychophysics”, the authors should indicate for how many sessions trial-by-trial feedback was provided in the beginning and end of each block.
3) What do green and blue colors in Figure 7A indicate?
Reviewer #3:
This is an interesting paper that provides a significant contribution through the clear derivation of a normative approach for accumulation of evidence under conditions where the evidence is not stationary. The derivation and the description of how changes in the rate of change of environments correspond to leakiness in accumulation was very appealing.
I found the experimental section convincing in terms of showing that humans do indeed try to estimate the rate of change of the environment, and that this estimate affects how they make decisions. But the further claim made in the paper that humans behave according to the normative model was harder to be convinced by – it seemed that other (suboptimal) models that use an estimate of the hazard rate might also be consistent with the data, but this wasn't much explored in the paper. Instead, the straw man such as a model where the estimated hazard rate is a single value fixed for all time was used, but it seemed rather a weak straw man.
In addition, much greater clarity in the exposition would be desirable.
Despite these concerns, the nice derivation of the normative results under changes in environments makes me see the paper positively.
https://doi.org/10.7554/eLife.08825.018
Author response
[…] As you will see below, the reviewers did not have major criticisms of the model that is presented or data, which they broadly found to be convincing. Most important in terms of solidifying the conclusions, however, are Reviewer 1's and 3's similar points about the demonstration of the qualitative reason for the improvement of the current model (Reviewer 1) and the absence of a reasonable analysis of other (suboptimal) models (Reviewer 3).
Our original submission included direct comparisons of the normative model with two suboptimal models: 1) subjective hazard rate was randomly shuffled across conditions, meant to mimic choices under the assumption of a single subjective hazard rate, but with the same parameter structure; and 2) a leaky accumulator, which was inspired by both the approximation to the normative model in the weak-belief regime (Equation 3a in the manuscript) as well as the many neural models that assume (linear) leaky integration. We now include a comparison to a third, suboptimal model, in this case assuming perfect integration to a stabilizing boundary (“bounded accumulation”), inspired by the approximation to the normative model in the strong-belief regime (Equation 3b, c). We also now provide for each experimental task a consolidated discussion of how the fits compare between the normative model versus the leaky and bounded accumulator approximations and highlight key differences between models in Figures 3, 6, and 9.
In the discussion, the reviewers and editor were also clear that the manuscript is really written for a technical audience. eLife readership is broad and the reviewers and editor would appreciate a reframing of the paper that makes the central points clearer to this broad audience.
One suggestion that emerged during the discussion was a clearer framing of the manuscript as follows:
a) The same Bayesian learning framework that has been used in other contexts (work by Behrens, Adams, Kording and also Nassar and Gold) is here derived for evidence accumulation in perceptual decision-making tasks like RDM;
b) This normative framework can be described, in both discrete and continuous cases, as a leaky accumulator with nonabsorbing bounds, with the leak rate and the height of the bounds being explicit functions of the environmental change rate;
c) Show that this is the case in their data (their paradigm with variable hazard rate is a good test even if atypical of RDM experiments);
d) Provide some justification why this model helps our understanding of perceptual decision-making (can the relationship with change point detection explain or offer insight into previously unexplained data?);
e) Provide some clear commentary on the relationship to previous models of change point detection (can the relationship with RDMs explain or offer insight into previously unexplained data?).
We urge you to consider this, or other possible reframings in which the strong message can be understood clearly without an understanding of the technical details.
We very much appreciate the thoughtful suggestions, which we have incorporated throughout the manuscript, including the Abstract, Introduction, Results, and Discussion sections.
Reviewer #1:
[…] However, I would then suggest that the paper could be strengthened by exploring in more detail why the current model outperforms others (especially, the leaky accumulator with hazard rate as a free parameter by block). Is this because the leaky accumulator down weights past beliefs about the correct response without taking into account evidence strength?
Yes, and this failure to account for evidence strength will lead to, for example, confidence that is allowed to grow to a much higher asymptote that hinders change detection when the leaky accumulator has been optimized to process weak evidence, as in Equation 3a. In contrast, the normative model limits confidence at any given moment to the logarithmic terms in Equation 3b, c. We now provide these intuitions, along with complementary limitations of the suboptimal model with perfect accumulation to a stabilizing boundary, in the new Figure 3 and associated text.
If so the reason for the current model's superiority is not really to do with estimating the hazard rate, although the text implies that it is.
We respectfully disagree: the value of the limit in confidence mentioned above depends entirely on estimated hazard rate as in Equations 3b, c. In other words, the normative model accounts for evidence strength in a way that depends on hazard rate.
Furthermore, to what extent does the current model address burning questions in that field, such as how confidence judgements are made or when the evidence accumulation process should terminate?
We agree that these are very interesting issues that are closely related to evidence accumulation and therefore are pertinent to our model. We have an entire paragraph in the Discussion about how our model provides new insights into the speed-accuracy tradeoff and decision termination. We thank the reviewer for pointing out the link to confidence judgments, which we now also include in that paragraph.
Reviewer #2:
[…] There are only a few, relatively minor, comments.
1) The overall conclusion of the authors is that the subjective estimates of hazard rate are biased, but this quantity is used in a normative way. However, this might be misleading. Namely, is it fair to refer to an algorithm as “normative” if the quantity used in this algorithm is biased?
We would argue that the algorithm is normative while the subjective parameter estimates are not. We now explain our perspective on this important point more clearly in the Discussion, as follows:
“[W]e showed that subjects could both learn a range of hazard rates and then use those learned rates in a normative manner to interpret sequences of evidence to make decisions. However, they tended to learn imperfectly, overestimating low hazard rates and underestimating high hazard rates. […] These different set points may have reflected certain prior expectations about the improbability of either perfect stability or excessive instability that could constrain performance when those conditions occur.”
Does such an algorithm behave differently, for example, from a model that uses the accurate estimate of hazard rate in a non-normative way?
We agree that this is a very interesting question. We have provided several lines of evidence that support the idea that the subjects are using their subjective estimates of hazard rate in a normative fashion. First, we used choice data to directly estimate the mapping of subjective beliefs to priors (new Figure 6) and showed that subjects exhibited leaky/asymptotic integration in ways that closely matched predictions of the normative model using imperfect estimates of hazard rate. Second, their behavior exhibited the tradeoff between change detection and steady-state signal identification predicted by the model, based on their subjective estimates of hazard (Figures 5 and 8). Third, their behavior also reflected the signal-strength-dependent biases predicted by the model, in a manner also dependent on their subjective estimates of hazard.
In addition, we now include in the manuscript estimated subjective hazard rates from both the leaky and bounded accumulator models for both tasks. These parameters also deviate from objective values, “further supporting the idea that the subjects were using imperfect estimates [of hazard rate] to make their decisions.”
We of course cannot fully rule out the possibility that there are some other models that can better explain our choice data using objective hazard rates. However, given the relatively few parameters per block in our model, plus its ability to capture many aspects of the behavioral data, we believe that we have presented a compelling case that, to a very large degree, our subjects’ behavior is consistent with a normative process using imperfect estimates of hazard rate.
Unless the authors can clarify how these two different scenarios can be distinguished, how the word “normative” is used in this manuscript might need to be improved.
We hope that we have addressed this important point sufficiently thoroughly and clearly in the revised manuscript.
2) In the subsection “Psychophysics”, the authors should indicate for how many sessions trial-by-trial feedback was provided in the beginning and end of each block.
We apologize for the lack of detail, which we now provide.
3) What do green and blue colors in Figure 7A indicate?
We apologize for the confusion and have now ensured that all figures have appropriate legends.
Reviewer #3:
[…] I found the experimental section convincing in terms of showing that humans do indeed try to estimate the rate of change of the environment, and that this estimate affects how they make decisions. But the further claim made in the paper that humans behave according to the normative model was harder to be convinced by – it seemed that other (suboptimal) models that use an estimate of the hazard rate might also be consistent with the data, but this wasn't much explored in the paper. Instead, the straw man such as a model where the estimated hazard rate is a single value fixed for all time was used, but it seemed rather a weak straw man.
We strongly agree with the need to compare normative model fits with other likely candidate models. We had originally focused on comparison with a leaky integrator that had the leak vary freely by blocks of trials, because this alternative model seemed to be the most viable form of information accumulation examined in prior studies that could solve these tasks. We apologize for any confusion about what forms of this model we used (it was not the case that we used the true straw man of a single fixed hazard rate for all conditions). We have clarified and expanded upon these issues substantially and include for each task model comparisons to both leaky integration (allowing the leak to vary by hazard-specific block) and bounded, perfect accumulation (allowing the bound to vary by hazard-specific block). The models are described in detail in separate subsections of Methods.
In addition, much greater clarity in the exposition would be desirable.
We have implemented the suggested changes described in the summary comments, above, and hope that we have substantially clarified the exposition.
https://doi.org/10.7554/eLife.08825.019
Article and author information
Author details
Funding
National Institutes of Health (NIH) (OppNet Grant R01 MH098899)
 Christopher M Glaze
 Joseph W Kable
 Joshua I Gold
The funder had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Gabriel Kroch and Timothy Kim for help with data collection and Joe McGuire, Yin Li, Matt Nassar, and Bob Wilson for comments.
Ethics
Human subjects: Informed consent, and consent to publish, was obtained from each subject prior to each experiment. Human subject protocols were approved by the University of Pennsylvania Internal Review Board.
Reviewing Editor
 Timothy Behrens, Oxford University, United Kingdom
Publication history
 Received: May 19, 2015
 Accepted: August 30, 2015
 Accepted Manuscript published: August 31, 2015 (version 1)
 Version of Record published: September 28, 2015 (version 2)
Copyright
© 2015, Glaze et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.