The influence of evidence volatility on choice, reaction time and confidence in a perceptual decision
Abstract
Many decisions are thought to arise via the accumulation of noisy evidence to a threshold or bound. In perception, the mechanism explains the effect of stimulus strength, characterized by signal-to-noise ratio, on decision speed, accuracy and confidence. It also makes intriguing predictions about the noise itself. An increase in noise should lead to faster decisions, reduced accuracy and, paradoxically, higher confidence. To test these predictions, we introduce a novel sensory manipulation that mimics the addition of unbiased noise to motion-selective regions of visual cortex, which we verified with neuronal recordings from macaque areas MT/MST. For both humans and monkeys, increasing the noise induced faster decisions and greater confidence over a range of stimuli for which accuracy was minimally impaired. The magnitude of the effects was in agreement with predictions of a bounded evidence accumulation model.
https://doi.org/10.7554/eLife.17688.001
eLife digest
Many of our decisions are made on the basis of imperfect or ‘noisy’ information. A longstanding goal in neuroscience is to work out how such noise affects three aspects of decision-making: the accuracy (or appropriateness) of a choice, the speed at which the choice is made, and the decision-maker’s confidence that they have chosen correctly.
One theory of decision-making is that the brain simultaneously accumulates evidence for each of the options it is considering, until one option exceeds a threshold and is declared the ‘winner’. This theory is known as bounded evidence accumulation. It predicts that increasing the noisiness of the available information decreases the accuracy of decisions made in response. Counterintuitively, it also predicts that such an increase in noise speeds up decision-making and increases confidence levels.
Zylberberg et al. have now tested these predictions experimentally by getting human volunteers and monkeys to perform a series of trials where they had to decide whether a set of randomly moving dots moved to the left or to the right overall. Using a newly developed method, the noisiness of the dot motion could be changed between trials. The effectiveness of this technique was confirmed by recording the activity of neurons in the region of the monkey brain that processes visual motion information.
After each trial, the humans rated their confidence in their decision. By comparison, the monkeys could indicate that they were not confident in a decision by opting for a guaranteed small reward on certain trials (instead of the larger reward they received when they correctly indicated the direction of motion of the dots).
In both humans and monkeys, increasing the noisiness associated with the movement of the dots led to faster and more confident decision-making, just as the bounded evidence accumulation framework predicts. Furthermore, the results presented by Zylberberg et al. suggest that the brain does not always gauge how reliable evidence is in order to fine-tune decisions.
Now that the role of noise in decision-making is better understood, future experiments could attempt to reveal how artificial manipulations of the brain contribute both information and noise to a decision. Other experiments might ascertain when the brain can learn that noisy information should invite slower, more cautious decisions.
https://doi.org/10.7554/eLife.17688.002
Introduction
Decisions that combine information from different sources or across time are of special interest to neuroscience because they serve as a model of cognitive function. These decisions are not hardwired or reflexive, yet they are experimentally tractable. Psychologists have long sought to understand how the process of decision formation gives rise to three key observables (Cartwright and Festinger, 1943; Audley, 1960; Vickers, 1979). First, there is the choice itself (left or right, coffee or tea), which determines accuracy in cases where a correct alternative can be defined. Second, there is the time it takes to reach a decision, which determines reaction time (RT). RT furnishes a powerful constraint on models of decision-making, and is a defining element of the trade-off between speed and accuracy that characterizes most decisions. Third, decisions are often accompanied by a graded degree of belief in the accuracy or appropriateness of the choice. This belief, referred to as decision confidence, influences many aspects of behavior: how we learn from our mistakes, plan subsequent decisions, and communicate our decisions to others. A model of the decision process ought to explain not just choices but all three of these observables in a quantitative fashion.
The family of bounded evidence accumulation models, including drift diffusion, race and attractor models, offers one such framework for linking choice, reaction time and confidence [for reviews, see Gold and Shadlen (2007); Shadlen and Kiani (2013)]. These models depict the decision process as a race between competing accumulators, each of which integrates momentary evidence for one alternative and against the others. The decision terminates when the accumulated evidence for one alternative, termed a decision variable (DV), reaches a threshold or bound, thereby determining both the choice and the decision time. Confidence in the decision derives from a mapping between the DV and the probability that a decision based on this DV will be correct. The mapping is thought to incorporate the decision time or the state of the competing (losing) accumulator(s), or both (Vickers, 1979; Kiani and Shadlen, 2009; Zylberberg et al., 2012; Kiani et al., 2014; Van den Berg et al., 2016). The noisiness of the momentary evidence causes the DV to wander from its starting point, as in Brownian motion or diffusion, whereas the expectation (i.e., mean) of the momentary evidence increments or decrements the DV deterministically. Noise is the main determinant of both RT and confidence when signal-to-noise is low, that is, when choices are more stochastic (less accurate). Recent evidence from neurophysiology (Kiani and Shadlen, 2009), brain stimulation (Fetsch et al., 2014), and psychophysics (Kiani et al., 2014) supports such a mechanism.
If the bounded accumulation of noisy evidence underlies choice accuracy, RT and confidence, then a selective manipulation of the noise should produce quantitatively consistent effects on all three measures. Specifically, were it possible to leave unchanged the expectation of each sample of momentary evidence while boosting the noise associated with it, then the bounded accumulation of the noisier samples should lead to (i) lower accuracy when the expectation of the momentary evidence is strong, (ii) faster reaction times when the momentary evidence is weak, and (iii) increased confidence when the momentary evidence is weak. The basic insight behind the latter two predictions is that with greater volatility, the DV tends to diffuse more quickly away from the starting point to achieve levels nearer the termination bound which are ordinarily associated with stronger evidence and thus greater confidence (Figure 1).
These predictions have not been tested thoroughly, because a controlled method for selectively increasing noise is not known. A dissociation between accuracy and confidence led Rahnev et al. (2012) to conclude that transcranial magnetic stimulation (TMS) increased the neural noise associated with the representation of a visual pattern, and a similar dissociation led Fetsch et al. (2014) to conclude that cortical microstimulation (µStim) might affect both the mean and the variance of the representation of motion by neurons in the extrastriate visual cortex (areas MT/MST). However, characterization of these effects of TMS and µStim was inferred from behavior. Similarly, psychophysical studies that attempted to increase the noise through changes in the visual stimulus (de Gardelle and Summerfield, 2011; Zylberberg et al., 2014; de Gardelle and Mamassian, 2015) or attentional state (Rahnev et al., 2011; Morales et al., 2015) did not characterize the influence of these manipulations on the neural signals that the brain accumulates to form a decision.
We therefore sought a method to manipulate the variance associated with the neural representation of momentary evidence without affecting its mean. We achieved this with a manipulation of the motion information in a random dot motion (RDM) display, by adding a second level of randomness which increased its volatility but was unbiased with respect to the strength and direction of motion evidence. We verified that the manipulation has the desired properties by recording from direction selective neurons in the middle temporal (MT) and medial superior temporal (MST) areas of the macaque visual cortex. Neurons in these areas are known to represent the momentary evidence in tasks identical to those in our study (Salzman et al., 1990; Celebrini and Newsome, 1995; Ditterich et al., 2003; Fetsch et al., 2014). We then used the volatility manipulation to test the influence of noise on the three observables of choice behavior—accuracy, RT and confidence—in monkeys and humans.
Results
A manipulation that mimics the addition of noise to the visual cortex
The standard RDM stimulus is itself stochastic, meaning that a particular movie (e.g., shown on a trial) is an instantiation of a random process that conforms to an expected motion strength and direction. On each video frame, a dot that had appeared ∆t ms ago is either displaced (i.e., moved) or replaced by a new dot at a random location within the stimulus aperture. The determination of displacement versus replacement is in accordance with a flip of a biased coin, and the magnitude of this bias confers the motion strength, which we refer to as a motion coherence (c). The sign of c indicates the direction of the displacement along an axis (e.g., up/down). Thus the probability of displacement (or unfairness of the coin) is |c|. The randomly replaced dots fall in the neighborhood of other dots (recently displayed) and thus contribute random motion in both directions. In the standard RDM, the coherence, c, is fixed for the duration of an experimental trial (e.g., c = 0.13; Figure 2A, left, blue line). Here we introduce a second layer of variability, wherein the mean of c is fixed for the duration of a trial but the value of c varies randomly from video frame to video frame (Figure 2A, left, red line). We will refer to trials that employ this doubly stochastic RDM as the 'high volatility' condition and those that use the standard RDM as 'low volatility'.
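The two stimulus classes can be sketched in a few lines of Python. This is only an illustration, not the authors' stimulus code: the function names, the Gaussian form of the frame-to-frame variation, the clipping to [-1, 1] and the value `volatility_sd=0.2` used below are all assumptions introduced here.

```python
import random
import statistics

def coherence_trace(mean_c, n_frames, volatility_sd=0.0, seed=None):
    """Return one coherence value per video frame.

    volatility_sd = 0 reproduces the standard (low volatility) stimulus,
    in which c is constant. A positive value redraws c on every frame
    from a Gaussian centred on mean_c (an assumed distribution), clipped
    to the valid range [-1, 1].
    """
    rng = random.Random(seed)
    trace = []
    for _ in range(n_frames):
        c = mean_c if volatility_sd == 0 else rng.gauss(mean_c, volatility_sd)
        trace.append(max(-1.0, min(1.0, c)))
    return trace

def displace_dot(c, rng):
    """Biased coin: displace with probability |c| in the direction sign(c)."""
    if rng.random() < abs(c):
        return 1 if c > 0 else -1   # coherent displacement
    return 0                        # dot is replaced at a random location

low = coherence_trace(0.13, 5000)
high = coherence_trace(0.13, 5000, volatility_sd=0.2, seed=1)
print(statistics.mean(low), statistics.pstdev(low))
print(statistics.mean(high), statistics.pstdev(high))
```

Under these assumptions the two traces share nearly the same mean, while only the high volatility trace has a nonzero spread across frames.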
This description explains how the stimulus is generated, but it does not explain what effect it should have on perception or on the neural processing of motion. The construction of the RDM we use is in video frames displayed every 1/75 of a second. The visual system blurs these images over time, leading for example to the illusion that many more dots are present simultaneously than are actually displayed. The right panel of Figure 2A applies an established motion filter (Adelson and Bergen, 1985) to the example movies parameterized by the low and high volatility traces shown in the left panel (see also Video 1). The filter extracts a time-blurred motion signal that provides a reasonable approximation to the firing rates of direction selective neurons in the primate visual cortex (Britten et al., 1993; Rust et al., 2006; Hedges et al., 2011). The example highlights the subtlety of the volatility manipulation by reminding us that the standard RDM is itself volatile (blue curve) such that the overall contour of both traces is similar. Nonetheless, the extra bumps and wiggles in the red trace result from the random variation in c.
A more systematic analysis of the motion energy, displayed in Figure 2B, reveals that the mean is identical for low and high volatility stimuli, for all motion strengths (upper panel), whereas the variance is larger for the high volatility stimuli (lower panel). The linear relationship between the mean motion energy and c is known (Britten et al., 1993), but the dependency of variance of the motion energy on c is less well characterized. For the low volatility condition (Figure 2B bottom, blue trace), the motion energy variance is dominated by the variance in the number of coherently displaced dots, which obeys a binomial distribution, hence the monotonic increase over the range of c = 0 to 0.5. For the high volatility condition (Figure 2B bottom, red trace), the overall increase in variance is not surprising, because we have added a second layer of variability. Note that the effect is strongest at the low coherences, where the distribution of c in the high volatility condition spans both positive and negative values.
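The law of total variance gives one way to see why the mean is preserved while the variance grows. The sketch below introduces its own symbols for illustration: $M$ is the motion energy on a frame, $a$ is the slope of the linear relation between mean motion energy and coherence, and $c_t$ is the per-frame coherence with mean $\bar{c}$ and variance $s^{2}$. It treats frames as independent and ignores temporal filtering, so it is a simplification rather than the authors' analysis.

```latex
\begin{aligned}
\mathrm{E}[M] &= \mathrm{E}\big[\mathrm{E}[M \mid c_t]\big] = a\,\mathrm{E}[c_t] = a\,\bar{c},\\
\mathrm{Var}[M] &= \mathrm{E}\big[\mathrm{Var}[M \mid c_t]\big] + \mathrm{Var}\big[\mathrm{E}[M \mid c_t]\big]
                = \mathrm{E}\big[\mathrm{Var}[M \mid c_t]\big] + a^{2}s^{2}.
\end{aligned}
```

With $s^{2}=0$ (standard RDM) the second term vanishes. With $s^{2}>0$ the mean is unchanged, and the non-negative term $a^{2}s^{2}$ is added to the variance.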
These observations characterize the volatility present in the visual stimulus, but we are mainly interested in the noisy signals that the brain accumulates to form a decision. We therefore measured the impact of volatility on the response of direction selective neurons in cortical areas MT/MST (Figure 2C). These neurons represent the momentary evidence used by monkeys to guide their choice, reaction time and confidence (Salzman et al., 1990; Celebrini and Newsome, 1995; Ditterich et al., 2003; Fetsch et al., 2014) in motion discrimination tasks. As previously shown (Britten et al., 1993), the firing rate of MT neurons increases linearly, on average, as a function of motion strength in the neuron’s preferred direction (c > 0, Figure 2C, upper panel, blue trace). The firing rate decreases linearly, but less steeply, as a function of motion strength in the anti-preferred direction (c < 0), giving rise to a bilinear function. We refer to the shallower slope for c < 0 as rectification (Britten et al., 1993). These features are preserved under high volatility (red trace), but there is a subtle increase in firing rate at the low coherences, which is explained by the rectification of neural responses when the distribution of c spans positive and negative values (Figure 2C, inset). The variance of the neural response is known to scale approximately linearly with firing rate (Tolhurst et al., 1983; Vogels et al., 1989; Geisler and Albrecht, 1997; Shadlen and Newsome, 1998). Thus the variance curves in Figure 2C (lower panel) parallel the means. The high volatility condition adds to the variance in a manner that is exaggerated at the low motion strengths, consistent with the motion energy analysis above.
We are now ready to consider the mean and variance of the quantity that is integrated toward a decision. We assume that the momentary evidence is the difference between the average firing rates from two pools of neurons with direction preferences for the two opposite directions (e.g., right-preferring minus left-preferring) (Shadlen et al., 1996; Ditterich et al., 2003; Hanks et al., 2006). The expectation of this signal can be estimated empirically by subtracting the mean firing rates of single neurons to motion in their preferred versus anti-preferred directions (Figure 2D). Notice that the rectification is now canceled by the subtraction.
The variance of this difference is more nuanced, drawing on two related considerations. First, because we did not record multiple single units simultaneously, we are not directly measuring the variance of the pools. Assuming a population of correlated neurons, the variance of the population mean differs from that of a single neuron by a multiplicative constant. For large pools, the variance is reduced to roughly $r{\sigma}^{2}$, where $r$ is the average pairwise spike-count correlation for neurons within the pool and ${\sigma}^{2}$ is the variance of the spike counts from a single neuron (see Materials and methods). In MT, $r$ is on the order of 0.2 for neurons with similar directional preferences (Zohary et al., 1994; Bair et al., 2001). An important implication of such correlation is that the beneficial effects of pooling saturate with a modest number of neurons (e.g., 50–100; [Zohary et al., 1994; Shadlen et al., 1996]).
Second, the variance of the MT population comprises contributions from the variance in motion energy, described above, as well as a component that is independent of stimulus fluctuations. The opposing pool is assumed to share the component of variance originating in the stimulus, albeit of opposite sign, so the variances add rather than cancel in the difference. In contrast, the stimulus-independent component of shared variance (e.g., driven by fluctuations of arousal) should have the same sign in the two pools and thus cancel in the difference.
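The saturation of pooling can be illustrated with the standard expression for the variance of the mean of $n$ equally correlated variables, ${\sigma}^{2}\left(1+(n-1)r\right)/n$, which tends to $r{\sigma}^{2}$ for large $n$. The function name below is hypothetical, and ${\sigma}^{2}=1$ is an arbitrary illustrative value.

```python
def pooled_variance(sigma2, r, n):
    """Variance of the mean of n neurons, each with variance sigma2 and
    average pairwise correlation r (standard result for equicorrelated
    variables)."""
    return sigma2 * (1.0 + (n - 1) * r) / n

# With r = 0.2, the pooled variance approaches r * sigma2 = 0.2
# and changes little beyond a few dozen neurons.
for n in (1, 10, 50, 100, 1000):
    print(n, round(pooled_variance(1.0, 0.2, n), 3))
```

With $r=0.2$, going from 50 to 100 neurons changes the pooled variance by less than 0.01, which is the sense in which the benefit of pooling saturates.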
For a given coherence c and volatility v, the variance of the difference in neuronal response between a pair of populations selective to the preferred and anti-preferred directions is given by:
$${\sigma}_{\Delta}^{2}=r{\sigma}_{c,v}^{2}+r{\sigma}_{-c,v}^{2}-2\rho r{\sigma}_{c,v}{\sigma}_{-c,v}\qquad (1)$$
where ${\sigma}_{c,v}^{2}$ and ${\sigma}_{-c,v}^{2}$ are the variance of the spike counts for motion in the preferred and anti-preferred directions, $r$ is the average pairwise correlation for neurons within the same pool, and $\rho $ is the correlation between the two pools with opposite direction preferences. The variances on the right-hand side of Equation 1 can be obtained from Figure 2C. However, without simultaneous recordings from neurons in the two pools, we cannot know how much of the variability is shared across neurons.
In Figure 2D we explored three different values of $\rho $: 0, −0.5 and −1 (with $r=0.2$). Note that positive values of $\rho $ are unlikely because a large portion of the shared variability comes from stimulus fluctuations, which as stated above induce changes in firing rate of opposite sign in the two pools. Under the low volatility condition, the variance of the difference variable increases slightly as a function of motion strength. This is a consequence of rectification and the tendency for variance to parallel the mean firing rate. More importantly, the doubly stochastic stimuli lead to a marked increase in ${\sigma}_{\Delta}^{2}$, especially in the low coherence range where the impact on motion energy is greatest. This effect did not depend on the value of $\rho $ (Figure 2D).
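The role of $\rho $ can be illustrated with the general identity $\mathrm{Var}(A-B)=\mathrm{Var}(A)+\mathrm{Var}(B)-2\,\mathrm{Cov}(A,B)$, taking each pool mean to have variance $r{\sigma}^{2}$. The sketch below is not the authors' analysis code. The function name is hypothetical and the single-neuron variances (30 and 20) are made-up numbers chosen only to show the trend.

```python
import math

def difference_variance(var_pref, var_anti, r=0.2, rho=0.0):
    """Variance of (pool A - pool B) when each pool mean has variance
    r * sigma^2 and the two pool means have correlation rho."""
    va, vb = r * var_pref, r * var_anti
    return va + vb - 2.0 * rho * math.sqrt(va * vb)

# Hypothetical single-neuron spike-count variances for the two directions.
for rho in (0.0, -0.5, -1.0):
    print(rho, difference_variance(30.0, 20.0, rho=rho))
```

More negative values of $\rho $ increase the variance of the difference, in line with the statement that stimulus-driven fluctuations of opposite sign add rather than cancel.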
From these complementary analyses of stimulus and neural response, we conclude that the volatility manipulation has negligible effects on the expectation of momentary evidence and more substantial effects on the variance, especially at weak motion strengths. This enables us to proceed with a critical test of the bounded accumulation framework. In what follows we attempt to ascertain whether a change in the variance of the momentary evidence, introduced by our volatility manipulation, affects decision speed, accuracy, and confidence in accordance with the predictions of bounded evidence accumulation.
Effect of volatility on choice and reaction time
One monkey (monkey W) and three humans were required to decide between two possible directions of motion and, when ready, to indicate their decision by looking to one of two targets (Figure 3A). For both high and low volatility conditions, stronger motion led to faster and more accurate choices. The main effect of high volatility was to decrease RTs, particularly at the weakest motion strengths (Figure 3B, bottom row, red). This effect was robust for all three human subjects and the monkey (Equation 16, all p<0.03, t-test, ${H}_{0}$: ${\beta}_{2}=0$). The manipulation affected the accuracy only subtly, and this was not statistically reliable for individual subjects in the RT task (Figure 3B, top row; for the four subjects: p=[0.35, 0.65, 0.2, 0.26], Equation 17, likelihood-ratio test). However, there was a significant effect of volatility on accuracy when pooling data across subjects and including data from the confidence tasks described below (Equation 18, p<0.0005, likelihood-ratio test, ${H}_{0}$: ${\beta}_{2}=0$; see also Figure 3—figure supplement 1).
The pattern of results in Figure 3B is consistent with the hypothesis that decisions are made when an accumulation of noisy evidence reaches a bound. Indeed, the smooth curves are fits of this model to the data, where the variance of the momentary evidence is the only parameter that we allowed to change between conditions of high and low volatility (see below).
The effect of increased volatility on RT is most apparent at motion strengths near zero, for two reasons: (i) the volatility manipulation has a larger impact on variance of the motion energy at the weak motion strengths (Figure 2B), and (ii) the time to reach a bound is dominated by the variance of the momentary evidence, ${\sigma}_{\Delta}^{2}$, when the motion strength is weak. For instance, when c = 0, the average time required by a diffusion process to reach a bound is inversely proportional to ${\sigma}_{\Delta}^{2}$ (Shadlen et al., 2006). These considerations also help to reconcile the contrast between the striking effects of volatility on RT versus subtle effects on choice accuracy: the volatility manipulation mainly affects the weakest motion strengths where accuracy is already poor (but see Figure 3—figure supplement 1). The important point is that by increasing noise, the volatility manipulation accelerates the dispersion of the decision variable away from its expected value and nearer the termination bounds, hence faster RT. A similar idea guides intuitions about the effect of volatility on confidence in a decision.
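The dependence of decision time on noise can be checked with the textbook closed-form results for a diffusion that starts midway between two symmetric, stationary bounds. This is a simplified sketch, not the fitted model: the function and parameter names are assumptions, and the full model in the paper may differ (for example, in the shape of the bounds).

```python
import math

def mean_decision_time(drift, bound, sigma):
    """Mean time for a diffusion starting at 0 to hit +bound or -bound."""
    if drift == 0:
        return bound ** 2 / sigma ** 2
    return (bound / drift) * math.tanh(drift * bound / sigma ** 2)

def prob_correct(drift, bound, sigma):
    """Probability of terminating at the bound that matches the drift sign."""
    return 1.0 / (1.0 + math.exp(-2.0 * abs(drift) * bound / sigma ** 2))

# Doubling sigma (quadrupling the variance) at zero drift cuts the mean
# decision time to a quarter, while accuracy stays at chance.
print(mean_decision_time(0.0, 1.0, 1.0), mean_decision_time(0.0, 1.0, 2.0))
print(prob_correct(0.0, 1.0, 1.0), prob_correct(0.0, 1.0, 2.0))
```

At nonzero drift the same expressions show the trade-off described in the text: more noise shortens the mean decision time and lowers accuracy.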
Effects of volatility on confidence
Confidence refers to the belief that a decision one is about to make (or has just made) is likely to be correct. In the framework of bounded evidence accumulation, it can be formalized as the conditional probability of a correct choice given the state of the DV, which comprises the accumulated evidence and elapsed decision time (Equation 5). For the motion discrimination task, this can be calculated by considering, for each possible state of the DV, the likelihood that it was the result of motion strength of the appropriate sign. We refer to this as a mapping between DV and probability correct (Figure 1C). It depends on the set of possible motion strengths (the prior distribution of c), the two possible volatility conditions, and the amount of time that has elapsed in the trial. We assume the subject has implicit knowledge of this mapping, and does not adjust it when a low or high volatility stimulus is shown. The latter seems justified because volatility levels were randomly interleaved and not cued or even mentioned to the subjects (we evaluate this assumption, below, in several alternative models).
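A minimal sketch of such a mapping is given below. It is not the authors' implementation. It ignores the termination bounds, uses a single noise level instead of a mixture of the two volatility conditions, and excludes 0% coherence, for which no choice is correct. The coherence set, the scaling `k`, and all names are hypothetical.

```python
import math

COHERENCES = (0.032, 0.064, 0.128, 0.256, 0.512)  # hypothetical set of |c|

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def prob_correct_given_state(dv, t, k=15.0, sigma=1.0, coherences=COHERENCES):
    """P(choice sign(dv) is correct | dv, t), marginalising over motion strength.

    Assumes unbounded accumulation: dv ~ Normal(k*c*t, sigma^2 * t), with
    equal prior probability for every signed coherence.
    """
    var = sigma ** 2 * t
    like_pos = sum(gaussian_pdf(dv, k * c * t, var) for c in coherences)
    like_neg = sum(gaussian_pdf(dv, -k * c * t, var) for c in coherences)
    posterior_pos = like_pos / (like_pos + like_neg)
    return posterior_pos if dv >= 0 else 1.0 - posterior_pos

# The same accumulated evidence supports less confidence after more time.
print(prob_correct_given_state(1.0, 0.2), prob_correct_given_state(1.0, 1.0))
```

Because the map is fixed, anything that pushes the DV further from zero in a given time, such as added noise, yields a higher probability correct under this mapping.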
Increased volatility should affect confidence because it mimics an increase in the diffusion rate. At low coherences in particular, its main effect on the DV is to accelerate its exodus away from neutral (probability correct = 0.5) to more extreme values. Therefore, we predicted that volatility would increase confidence at low coherences, for the same reason that it speeds the RT. To test this prediction, we used two variants of the motion task, tailored to the abilities of monkeys and humans.
Monkey D was trained on a motion discrimination task with post-decision wagering (Kiani and Shadlen, 2009) (PDW; Figure 4A). The monkey had to decide between two opposite directions of motion and report its decision after a memory delay. The monkey was rewarded for correct decisions and randomly on the 0% coherence trials. On half of the trials, the monkey had the opportunity to opt out of reporting the direction choice and to select instead a smaller but certain reward. The 'sure bet' option was not revealed until at least one-half second after motion offset (i.e., during the delay). The task design thus encouraged the monkey to perform the direction discrimination on every trial. After extensive experience with the standard RDM (>100,000 trials; low volatility condition), we introduced the high volatility RDM on a random half of the trials. Single- and multi-unit recordings during performance of this task furnished the data for Figure 2C–D, as well as additional neurophysiological analyses described later.
In both low and high volatility conditions, the monkey made rational use of the sure bet, opting out more often for weaker motion (Equation 19, p<$10^{-6}$, logistic regression, likelihood-ratio test; Figure 4B) and for briefer stimuli (Equation 19, p<$10^{-6}$, logistic regression; Figure 4—figure supplement 1). When the sure bet was offered but waived, choice accuracy was higher than when the sure bet was not offered (Equation 20, p<$10^{-6}$, logistic regression; Figure 4—figure supplement 1). This indicates that the monkey was more likely to opt out of rendering its decision when the answer was more likely to be wrong. It implies that the decision to accept or waive the sure bet is based on the state of the evidence on the trial and not a general propensity associated with each motion strength (Kiani and Shadlen, 2009).
The main question we wished to address is whether the high volatility condition would elicit fewer sure-bet choices, consistent with greater confidence. As shown in Figure 4B (lower panel), the proportion of trials on which the monkey decided to waive the sure-bet option (deciding instead for a riskier direction choice) was greater on the high volatility trials (Equation 19, p<$10^{-6}$, likelihood-ratio test). Thus, high volatility increased the monkey’s confidence, and did so despite a negligible effect on accuracy (Figure 4B, upper). Further, like its effect on RT, volatility affected PDW mainly when the motion was weak (Figure 4B, lower).
We confirmed the relationship between volatility and confidence in human participants. Instead of using PDW, we asked subjects to report their confidence on a scale from “feels like I’m guessing” to “certain I’m correct.” The same three observers that performed the reaction time task participated in this second experiment. The RDM (low or high volatility, randomly interleaved) was displayed for a fixed 200 ms on each trial, after which they reported the perceived direction of motion (left or right) and the confidence in their decision. Participants reported the choice and the confidence rating by looking at a particular position on one of two elongated targets (Figure 5A), where the left or right target specified the motion choice and the vertical position was used to indicate confidence. They were allowed to adjust their gaze to the desired level before finalizing their combined choice and confidence report (Figure 5A). We thus encouraged subjects to use all available information in the 200 ms stimulus for both reports (Van den Berg et al., 2016). The results from the human observers were similar to those from the monkey. Naturally, subjects were more confident for high coherence stimuli (Equation 21, p<$10^{-6}$, t-test; Figure 5). They also reported higher confidence for the high volatility stimuli, and the effect was most apparent for the low coherence stimuli (Equation 21, p<0.0004, t-test).
A common mechanism for the effects of volatility on choice, RT and confidence
So far, the effect of volatility has been described qualitatively. Now we show how a single bounded accumulation model can account for the combined effect of motion strength and volatility on choice, RT and confidence. In the model, choice, RT and confidence result from the accumulation of noisy momentary evidence as a function of time, until the integral of the evidence (the decision variable, DV) reaches one of two bounds, or for the PDW and confidence tasks, until the stimulus is curtailed. In the latter case, the sign of the DV determines the choice.
The DV is updated at each time step by the addition of a constant, proportional to motion strength, plus a draw from a zero-mean Gaussian distribution. In the language of drift-diffusion, the former gives rise to deterministic drift and the latter to a Wiener process scaled by a diffusion coefficient. The noise itself comprises stochastic contributions from the stimulus and its neural representation. Many studies make the simplifying assumption that the variance of the momentary evidence is fixed and independent of motion strength (Ditterich et al., 2003; Palmer et al., 2005; Shadlen et al., 2006). This would be the case if the momentary evidence obeyed the idealization in Figure 2B and if the neural responses of rightward and leftward preferring neurons exhibited variance that scaled linearly with mean. Then the difference between population responses would have the same variance for all motion strengths. However, the partial rectification (Figure 2C) implies that the variance of the difference should increase as a function of motion strength.
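The update rule described above can be simulated directly. The sketch below is a discrete-time approximation with hypothetical names and parameter values, not the fitting procedure used in the paper. If no bound is reached before `max_t`, the sign of the DV determines the choice, mirroring the fixed-duration tasks.

```python
import math
import random

def simulate_trial(drift, sigma, bound, dt=0.005, max_t=5.0, rng=random):
    """Accumulate noisy evidence until the DV crosses +bound or -bound.

    Returns (choice, decision_time); choice is +1 or -1.
    """
    dv, t = 0.0, 0.0
    while abs(dv) < bound and t < max_t:
        # deterministic drift plus a zero-mean Gaussian increment
        dv += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    return (1 if dv >= 0 else -1), t

def mean_rt(drift, sigma, bound, n_trials=300, seed=0):
    """Average decision time over n_trials simulated trials."""
    rng = random.Random(seed)
    times = [simulate_trial(drift, sigma, bound, rng=rng)[1] for _ in range(n_trials)]
    return sum(times) / n_trials

# At zero drift, a larger diffusion coefficient gives shorter decision times.
print(mean_rt(0.0, 1.0, 1.0), mean_rt(0.0, 1.4, 1.0))
```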
We characterize the dependency of the diffusion coefficient on motion strength and volatility based on the empirical observations of Figure 2. These analyses showed that (i) the variance of the momentary evidence increases with motion strength, and (ii) the difference in noise between volatility conditions is larger at 0% coherence and decays gradually for stronger motion. We capture these observations with a simple parameterization of the diffusion coefficient (Equations 2 and 3). First, we assumed that in the low volatility condition, the variance of the momentary evidence increases linearly with motion strength (Figure 2—figure supplement 1, blue trace; note the log scale of the abscissa). Second, we modeled the additional variability introduced by the doubly stochastic manipulation as a variance offset at 0% coherence that decays exponentially as motion strength increases (Figure 2—figure supplement 1, red trace).
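One functional form consistent with this verbal description is sketched below. It should not be read as a transcription of Equations 2 and 3, which are not reproduced in this section. The parameter names (`base`, `slope`, `offset`, `decay`) and their default values are hypothetical.

```python
import math

def evidence_variance(c, high_volatility, base=1.0, slope=1.0,
                      offset=0.5, decay=0.1):
    """Variance of the momentary evidence as a function of coherence c.

    Low volatility: linear in |c|. High volatility: adds an offset that is
    largest at c = 0 and decays exponentially with |c|.
    All parameter names and default values are hypothetical.
    """
    variance = base + slope * abs(c)
    if high_volatility:
        variance += offset * math.exp(-abs(c) / decay)
    return variance

# The extra variance is largest at 0% coherence and fades for strong motion.
for c in (0.0, 0.064, 0.256, 0.512):
    print(c, evidence_variance(c, False), evidence_variance(c, True))
```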
The framework can explain confidence if we assume that the brain has implicit knowledge of (i) the state of the accumulated evidence, (ii) the elapsed deliberation time, and (iii) the mapping of time and evidence to the probability of making a correct choice (Figure 1C). Time matters because the same level of accumulated evidence is associated with lower levels of accuracy if the evidence was accrued over longer periods of time (Figure 1C). In PDW, a sure-bet choice supersedes a direction decision if the probability correct (estimated from the state of accumulated evidence and the decision time) is lower than a criterion $\Phi $. In the human confidence task, probability correct is transformed into a confidence rating through a monotonic transformation (Materials and methods).
The solid curves in Figures 3–5 are model fits. The model was fit to maximize the likelihood of the observables (choice and RT in the reaction time task; choice and sure bet in PDW). Best-fitting parameters are shown in Table 1. In the confidence task, we fit one parameter per subject ($\kappa $; see Materials and methods). This parameter was fit to maximize the likelihood of the direction choices. All other parameters were taken from the RT task, performed by the same participants. Therefore, the confidence curves in Figure 5B can be considered predictions of the model. These predictions capture the trend well, supporting the notion that time and accumulated evidence are the main determinants of confidence in a perceptual choice, even when noise is under experimental control. The overall quality of the fits, across all tasks and both species, indicates that the influence of motion strength and volatility on choice, reaction time and confidence can be explained by a common mechanism of bounded evidence accumulation.
Alternative models
Up to now, we have attempted to explain the data on the assumption that subjects apply the same mapping between the accumulated evidence (the DV) and the probability that a decision rendered upon that evidence will be correct (i.e., confidence), regardless of the volatility condition. As stated earlier, the mapping is derived from all possible motion strengths, directions, and volatility conditions. Thus, we assume that subjects do not infer the noisiness of incoming evidence, or that if they do, they do not revise the mapping accordingly. An alternative is that the brain infers an estimate of the noisiness of the stimulus, in real time, to adjust the parameters of the decision process (Deneve, 2012; Qamar et al., 2013) or the evaluation of confidence (Yeung and Summerfield, 2012). This is a reasonable proposition, at least in principle, because the sample mean and variance of the motion energy can be used to classify volatility conditions with 90% accuracy (see Materials and methods).
We evaluated several 'two-map' models, which apply a different mapping between the DV and probability correct for each volatility condition. The first two-map model implements the assumption that subjects have full and immediate knowledge of the volatility condition on each trial. Although the maps are qualitatively similar (compare the iso-confidence contours of Figure 6A), the consequence of having separate maps is to reduce the effect of volatility on confidence. When fit to data, this two-map model produces visibly worse fits than the model that relies on a common map, despite having the same number of parameters (Figure 6B; ∆BIC = 252.4 favoring the common-map model; see Table 2 for parameter fits).
For the second two-map model, the assessment of volatility is not instantaneous but evolves over the course of a trial. For simplicity, we assumed that the probability of correctly identifying the volatility condition increases monotonically at a rate determined by a free parameter (see Materials and methods). Interestingly, the rate estimated from the best fit is exceedingly slow. For example, after 1 s of viewing, the weight assigned to the appropriate volatility map is just 1%. In other words, the confidence is dominated by the common mapping, consistent with our assumption. The fit is indistinguishable from the common-map model depicted in Figure 4 (see Table 2), and the BIC statistic revealed that the addition of the extra parameter was not justified (∆BIC = 7.24).
We also considered the possibility that subjects used different termination criteria (bound heights) on low and high volatility trials. For the PDW task, this amounts to the addition of an extra free parameter in the first two-map model above. This model was also inferior to the simpler common-map model (∆BIC = 127; see Table 2 for parameter fits). This is not surprising because in the PDW task, stimulus duration is controlled by the experimenter, and bounds merely curtail the expected improvement in accuracy on longer duration stimuli. We also fit a model for the RT task that allowed the bounds to be different for the two volatility conditions. This led to a marginal increase in the likelihoods, but not enough to justify the addition of the extra parameter (∆BIC = [29.4, 7.7, 27.3, 12.1] for the four subjects; Table 2).
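The comparisons above rest on the Bayesian information criterion. As a minimal sketch of how such a comparison can be computed (the function names, the sign convention for ∆BIC and the numbers in the example are illustrative assumptions, not values from the fits reported here):

```python
import math


def bic(log_likelihood: float, n_params: int, n_trials: int) -> float:
    """Bayesian information criterion; lower values indicate the preferred model."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood


def delta_bic(ll_alt: float, k_alt: int, ll_ref: float, k_ref: int, n_trials: int) -> float:
    """BIC(alternative) - BIC(reference); positive values favor the reference model."""
    return bic(ll_alt, k_alt, n_trials) - bic(ll_ref, k_ref, n_trials)


# Hypothetical case: one extra parameter improves the log-likelihood by 2 units
# on 10,000 trials. The penalty ln(10000) exceeds the gain of 4, so the result is positive.
example = delta_bic(ll_alt=-4998.0, k_alt=6, ll_ref=-5000.0, k_ref=5, n_trials=10_000)
```

Under this convention, a positive value means that the improvement in likelihood does not offset the penalty for the additional parameter.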
These analyses of alternative models support our assumption that subjects applied a common mapping and decision strategy on trials of low and high volatility. We do not believe this holds generally; it is likely a consequence of the particular volatility manipulation and task designs we employed. Indeed, the normative strategy for several model tasks, which approximate those in our study, would apply different bounds and mappings to the two volatility conditions (see Appendix). The full normative solution for the tasks we used is not known. Hence, we do not know if our subjects performed suboptimally or if they were simply unable to identify the volatility conditions without incurring additional costs (e.g., effort and/or time).
Choice- and confidence-predictive fluctuations in MT/MST activity
The role of the neural data in this study was to validate and characterize the volatility manipulation in a population of neurons known to represent the momentary evidence used to inform decisions and confidence (Salzman et al., 1990; Ditterich et al., 2003; Hanks et al., 2006; Fetsch et al., 2014). Nevertheless, there are features of this limited data set which are germane to findings associated with the confidence task in particular. We share them in Figure 7, accompanied by the proviso that the data set is limited.
Consistent with earlier reports (Britten et al., 1992), trial-to-trial variation in the activity of neurons in MT/MST was indicative of the choice that the monkey was about to make. Figure 7A shows averaged residual responses, formed by subtracting the mean response for each motion strength as a function of time and multiplying by ±1 if the monkey chose the preferred or anti-preferred direction, respectively. Positive residuals therefore indicate an excess of activity in the chosen direction. For both low and high volatility conditions, trial-to-trial variation in the neural response was reflected in the monkey’s choices. The fluctuations were more informative in the high volatility condition, presumably because they were induced by exaggerated variance in the motion display itself (e.g., Figure 2A). Notably, the time course of choice-related signals evolved with similar latencies in the low and high volatility conditions. The latencies were comparable to that of the direction-selective signal itself (Figure 7B), suggesting that the choice was informed by the earliest motion information available in the stimulus (Kiani et al., 2008). The influence of neural variation declines over 200 ms, consistent with the idea that the brain terminates some decisions before the end of the stimulus presentation (Kiani et al., 2008).
The trial-by-trial variation in neural activity was also correlated with the decision to accept or waive the sure-bet option, when it was offered. Monkeys should opt out of the direction decision when the evidence is weak, and waive the sure bet when the evidence is strong. For positive coherences (i.e., net motion in the preferred direction), the residuals of firing rate were on average negative (Figure 7C, magenta trace). This implies that the monkey tended to opt out of the direction decision when the neural representation of the evidence was weaker than average. For negative coherences (net motion in the non-preferred direction), the residuals were positive on average (Figure 7C, blue trace), for an analogous reason. The difference between the two traces furnishes an estimate of the time course over which MT/MST neurons inform the decision to opt out. Notice the similarity in the time course of the choice and confidence signals (compare Figure 7A and C). The latency estimate derived from Figure 7C was unreliable (arrow and horizontal error bar, Figure 7C), but it was corroborated by a complementary analysis of the trials in which the monkey waived the sure bet (Figure 7D). Here we compared the average firing rate residuals on trials when the monkey waived the sure-bet option (green trace) with those on trials when the sure bet was not available (orange trace). We expect these traces to differ if the monkey waives the sure bet on trials when the neural responses are stronger. The point of divergence of the two traces in Figure 7D furnishes a more reliable estimate of the latency with which confidence signals are represented in the neuronal response (arrow). These results indicate that early motion evidence simultaneously informs both choice and confidence (Zylberberg et al., 2012). They are inconsistent with the proposal that choice and confidence are resolved in strict succession, as this predicts that confidence selectivity ought to emerge later than choice-related signals (Pleskac and Busemeyer, 2010; Navajas et al., 2016).
Discussion
We have shown that a stimulus manipulation that increases the variance of the momentary evidence bearing on a decision (what we term volatility) increases both the speed of the decision and the confidence associated with it. Testing the influence of volatility on the decision process is difficult, because it requires independent control over the signal and the noise in the evidence. We mimicked a manipulation of noise by changing the statistical properties of a dynamic stimulus. Our approach differs from recent studies that have attempted to vary evidence reliability through stimulus manipulations (de Gardelle and Summerfield, 2011; Zylberberg et al., 2014; de Gardelle and Mamassian, 2015) in that we (i) applied the manipulation to a well-studied motion task for which much is known about the underlying physiology; (ii) verified the effect of the manipulation by recording from neurons in the visual cortex of the macaque; and (iii) showed how a framework based on the bounded accumulation of evidence can account for the joint effect of volatility on choice, reaction time and confidence.
The modeling framework pursued here was able to explain the observed pattern of choices, RTs and confidence in a quantitatively coherent way (Figures 3–5), even predicting subjects’ confidence ratings (Figure 5B) based on a fit to their RT data from a separate experiment (Figure 3B). The intuition is that increased volatility disperses the decision variable away from its expectation. For low coherences, it accelerates departure from the starting point (i.e., neutral evidence) and closer to one of the decision bounds. This tendency to arrive at larger absolute values of accumulated evidence—in support of either choice—leads to faster and more confident decisions (Zylberberg et al., 2012; Maniscalco et al., 2016). The intuition would apply to any theoretical framework that would associate confidence with the absolute deviation of a DV from neutral. This includes models based on signal detection theory (Clarke et al., 1959; Ferrell and McGoey, 1980; Macmillan and Creelman, 2004; Kepecs and Mainen, 2012; Fleming and Lau, 2014); however, these models ignore the temporal domain and are thus unable to account for RT or the strong correlation between deliberation time and confidence (Figure 4—figure supplement 1) (Henmon, 1911; Pierrel and Murray, 1963; Vickers et al., 1985; Link, 1992; Kiani et al., 2014).
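The intuition can be illustrated with the standard closed-form expressions for a diffusion process between two stationary, symmetric bounds. The sketch below is not the model fitted here (which allows time-dependent bounds in the RT task and a coherence-dependent variance), and all parameter values are hypothetical. It only shows that, with drift and bound held fixed, a larger variance shortens the mean decision time and lowers accuracy.

```python
import math


def p_correct(drift: float, variance: float, bound: float) -> float:
    """Probability of reaching the bound that matches the sign of the drift."""
    if drift == 0.0:
        return 0.5  # no signal: either bound is equally likely
    return 1.0 / (1.0 + math.exp(-2.0 * abs(drift) * bound / variance))


def mean_decision_time(drift: float, variance: float, bound: float) -> float:
    """Mean first-passage time to either bound at +/- bound, starting from 0."""
    if drift == 0.0:
        return bound ** 2 / variance
    return (bound / abs(drift)) * math.tanh(abs(drift) * bound / variance)


# Hypothetical values: same drift and bound, variance doubled for "high volatility".
low = (p_correct(0.5, 1.0, 1.0), mean_decision_time(0.5, 1.0, 1.0))
high = (p_correct(0.5, 2.0, 1.0), mean_decision_time(0.5, 2.0, 1.0))
```

With these values, both entries of `high` are smaller than the corresponding entries of `low`. The sketch says nothing about confidence, which additionally requires the mapping from accumulated evidence and time to probability correct.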
These intuitions and our fits to the data rest on the assumption that subjects do not change their decision strategy based on the volatility of the evidence on a particular trial. On all trials, we assumed subjects applied the same termination policy (i.e., decision bound) and the same mapping between the state of the evidence and confidence, for both volatility conditions as well as for all motion strengths (Gorea and Sagi, 2000; Kiani and Shadlen, 2009). We considered and rejected alternative models in which the brain uses volatility to adjust the mapping and/or the decision bound. In particular, if different mappings between DV and confidence were used for the low and high volatility conditions, a larger excursion of the DV would be required in the high volatility condition to reach the same level of confidence, predicting a pattern of post-decision wagering behavior that was not supported by our data (Figure 6). In the RT task, volatility could be used to adjust the height of the decision bound in the face of lower reliability in order to maximize reward rate (Deneve, 2012; Drugowitsch et al., 2014). Indeed, the normative solution for a simplified version of the RT task is to increase the bound height on high volatility trials, which nevertheless leads to slightly faster responses than for low volatility trials when the motion is weak (Appendix 1—figure 1). However, this idea presupposes knowledge of reliability on the trials, which ought to predict lower confidence in the high volatility condition. Thus, models that posit an online estimation of reliability (cf. Deneve, 2012; Yeung and Summerfield, 2012; Qamar et al., 2013) make predictions that run counter to one or more of the trends we observed.
This does not mean humans and monkeys are incapable of using information about stimulus reliability or difficulty to adjust their decision policy, and perhaps they would have in other circumstances (Qamar et al., 2013; Shen and Ma, 2016). For instance, had we used only a very difficult and a very easy condition, there would be a stronger incentive to ascertain the difficulty of the decision online and use different termination criteria for each condition. However, our experiment (in particular, the mixture of interleaved motion strengths and the volatility manipulation) is representative of a broad class of decisions for which the reliability of the evidence is unknown to the decision-maker before beginning deliberation and not readily apparent from a small number of samples. In such circumstances, an estimate of reliability might be viewed as another decision, which would entail (i) specification of alternative hypotheses about reliability, (ii) defining which stimulus features constitute evidence bearing on these hypotheses, (iii) accumulating the relevant evidence, and (iv) specifying a termination criterion for this decision. Such an evaluation must balance the benefits derived from the use of reliability to adjust the parameters of the decision process trial by trial, with the associated cost in time and effort.
Even if subjects were cued explicitly about reliability, it is not clear that they would adjust the decision criteria on a trial-by-trial basis. In a detection task where the stimulus categories were signaled by an external cue, human subjects did not adjust the decision criterion to the levels used when each stimulus category was presented on its own (Gorea and Sagi, 2000). Instead, subjects behaved as if they assumed a common distribution of signals encompassing all stimulus conditions and applied a single decision criterion. Our volatility manipulation was more subtle than an explicit cue, but we do not doubt that our subjects could perform above chance in a 2AFC experiment if they were trained to identify the higher volatility stimulus among a pair sharing the same motion strength. If nothing else, they could monitor their own decision times and confidence. However, when a mixture of different levels of volatility is presented in a sequence of otherwise similar events (trials), subjects appear to combine trials of low and high volatility to form a single internal distribution with signed coherence as the only relevant dimension.
Our results highlight limitations to the brain’s capacity to extract and exploit knowledge of volatility. Our study may therefore be of interest to psychologists and behavioral economists (d'Acremont and Bossaerts, 2016). Systems with multiple interacting units, like financial markets, sometimes give rise to 'leptokurtic' distributions, that is, distributions in which the probability of extreme events is larger than expected from a normal distribution (Mandelbrot, 1997). A simple way of constructing leptokurtic distributions is by mixing Gaussian distributions that have the same mean but different variances, similar to our doubly stochastic (high volatility) stimulus. When interpreting 'leptokurtic' noise, people appear to overreact to outliers. For instance, when making stock investment decisions, people often misinterpret large fluctuations as evidence for a fundamental change in expected value (De Bondt and Thaler, 1990). Similarly, our subjects interpreted the 'outliers' introduced by our doubly stochastic procedure (motion bursts of unlikely strength given the average motion strength of the trial) as if they were caused by a higher coherence stimulus. In this sense, they behaved as if the noisy samples they acquired were generated by a mesokurtic distribution (e.g., Gaussian). It is intriguing to think that the inferences and biases that people display in simple decisions about stochastic motion may bear on how they interpret and act upon stochastic signals operating over longer time scales.
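The claim that mixing equal-mean Gaussians yields a leptokurtic distribution can be checked directly from the moments of the mixture. The following is a minimal sketch with hypothetical variances and weights; it is not a description of the stimulus-generation procedure used in the experiments.

```python
def mixture_excess_kurtosis(variances, weights):
    """Excess kurtosis of a mixture of zero-mean Gaussians.

    For each component the fourth central moment is 3 * variance**2, so the
    mixture has second moment sum(w * v) and fourth moment 3 * sum(w * v**2).
    """
    total = sum(weights)
    w = [x / total for x in weights]          # normalize the mixing weights
    second = sum(wi * vi for wi, vi in zip(w, variances))
    fourth = 3.0 * sum(wi * vi ** 2 for wi, vi in zip(w, variances))
    return fourth / second ** 2 - 3.0


# Hypothetical example: equal mixture of variances 1 and 4.
mixed = mixture_excess_kurtosis([1.0, 4.0], [0.5, 0.5])
# A single Gaussian is mesokurtic (excess kurtosis of zero).
single = mixture_excess_kurtosis([2.5], [1.0])
```

Whenever the component variances differ, the excess kurtosis is positive, which is the formal sense in which extreme samples are more likely than under a Gaussian of the same overall variance.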
Materials and methods
Random dot stimuli
Three humans and two monkeys performed one or more tasks where they had to make binary choices about the direction of motion of a set of randomly moving dots drawn in a circular aperture. Dots could move in one of two opposite directions, and were generated as described in previous studies (e.g., Roitman and Shadlen, 2002). Briefly, three interleaved sets of dots were drawn in successive frames (monitor refresh rate: 75 Hz). When a dot disappeared, it was redrawn 40 ms later (i.e., 3 video frames) either at a random location in the stimulus aperture or displaced in the direction of motion.
We refer to trials where the probability of coherent motion is fixed within the trial as ‘low volatility’, and trials where it varies within the trial as ‘high volatility’. Trials of low and high volatility were uncued and randomly interleaved. Example stimuli can be seen in Video 1.
RT task
We studied the relationship between volatility and decision speed with a reaction-time version of the random-dot motion discrimination task (Roitman and Shadlen, 2002). Three human participants completed 6631 trials (subject S1: 2490 trials; S2: 2070; S3: 2071), and one macaque (monkey W) completed 14,137 trials.
Each trial started with subjects fixating on a central spot (0.33° diameter) for 0.5 s. Then two targets (1.3° diameter) appeared on the horizontal meridian at an eccentricity of 9° to indicate the two possible directions of motion. Observers had to maintain fixation for an additional 0.3–0.7 s (sampled from a truncated exponential with $\tau$ = 0.1 s) and were then presented with the motion stimulus, centered at fixation and subtending 5° of visual angle. Dot density was 16.7 dots/deg$^2$/s, and the displacement of the coherent dots was consistent with apparent motion of 5 deg/s.
Feedback was provided after each trial. Correct decisions were rewarded with a drop of juice (monkey) or a pleasant-sounding chime (humans). Errors were followed by a timeout of 1 (human) or 5 (monkey) seconds, and, in humans, also accompanied by a low-frequency tone. For the monkey, a minimum time of 950 ms was imposed from dot onset to reward delivery (e.g., Hanks et al., 2011) in order to discourage fast guessing. Trials employing 0% coherence motion were deemed correct with probability ½.
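For readers who wish to reproduce the timing of the pre-stimulus delay, a truncated exponential can be sampled by inverting its cumulative distribution. The sketch below assumes that the exponential starts at the lower limit of the range and that $\tau$ is the scale of the untruncated distribution; the text does not specify either detail, and the function name is hypothetical.

```python
import math
import random


def sample_truncated_exponential(lo: float, hi: float, tau: float, rng: random.Random) -> float:
    """Draw from an exponential (scale tau) shifted to start at lo and truncated at hi."""
    u = rng.random()                               # uniform on [0, 1)
    mass = 1.0 - math.exp(-(hi - lo) / tau)        # probability mass kept after truncation
    return lo - tau * math.log(1.0 - u * mass)     # inverse CDF of the truncated law


# Delay before motion onset in the RT task: 0.3-0.7 s with tau = 0.1 s.
rng = random.Random(0)
delays = [sample_truncated_exponential(0.3, 0.7, 0.1, rng) for _ in range(20000)]
```

Under these assumptions, most delays fall close to the lower limit, so the hazard rate of stimulus onset is approximately flat over much of the range.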
Confidence task (Monkey)
A second monkey (monkey D) was trained to perform a direction discrimination task with post-decision wagering (Kiani and Shadlen, 2009). After acquiring fixation, two targets appeared (6.5–9° eccentricity) to indicate the alternative directions of motion, followed by the motion stimulus after a variable time (truncated exponential; range 0.3–0.75 s, $\tau$ = 0.25 s). Motion viewing duration was sampled from a truncated exponential distribution (range 0.1–0.93 s, $\tau$ = 0.3 s). After motion offset, the monkey had to maintain fixation for another 1.2 to 1.7 s. During this delay, a third target (sure-bet target; $T_s$) appeared on half of the trials, no earlier than 0.5 s from motion offset, positioned perpendicular to the axis of motion. After this delay, the fixation point disappeared, cueing the monkey to report its choice. Correct decisions led to a juice reward, and incorrect decisions led to a timeout (5 s). Selecting the sure bet led to a small but certain reward, roughly equivalent to 55% of the juice volume received in correct trials.
The monkey performed a total of 65,751 behavioral trials, a subset of which (44,334 trials) were accompanied by neurophysiological recordings. By convention, positive motion coherences correspond to the preferred direction of motion of the recorded neurons. When paired with neural recordings, the speed and direction of motion, and the size of the circular aperture, were adjusted to match the properties of the neuron or multiunit site under study (see below).
Confidence task (Human)
The relationship between volatility and confidence was also studied in a task that required explicit confidence reports. After the subject fixated a central spot, two crescent-shaped targets appeared, one on each side of the fixation point (Figure 5). The targets were the left and right arcs of a circle (radius 10° visual angle) centered on the fixation point. These arcs were visible for $2\pi/3$ radians (i.e., extending ±60° above and below the horizontal meridian). The left (right) target was to be selected to indicate that the perceived direction of motion was to the left (right, respectively). Subjects were instructed to select the upper extreme of the targets if they were completely certain of their decision, and the lowermost extreme if they thought they were guessing. Intermediate values represent intermediate levels of confidence. Visual aid was provided by coloring the targets in green at the top, red at the bottom, with a gradual transition between the two. After a variable delay during which participants had to maintain fixation, the random dot motion stimulus was shown for a fixed duration of 200 ms. Dot speed, density and aperture size were identical to the RT experiment. After motion offset, the subjects were required to indicate their response by directing the gaze to one target. Decisions were reported without time pressure and subjects were allowed to make multiple eye movements until they pressed the spacebar to accept the confidence and the choice. The same participants that completed the RT task performed the confidence task (subject S1: 1536 trials; S2: 2103; S3: 2107).
Neurophysiological methods
All animal procedures complied with guidelines from the National Institutes of Health and were approved by the Institutional Animal Care and Use Committee at Columbia University. A head post and recording chamber were implanted using aseptic surgical procedures. Multi-unit (MU) and single-unit (SU) recordings were made with tungsten electrodes (1–2 MΩ, FHC). Areas MT (n = 13 SU and 9 MU sites) and MST (n = 13 SU, 12 MU) were identified using structural MRI scans and standard physiological criteria. We did not observe substantial differences between the two areas in the main results (Figure 2) and therefore pooled the data for all analyses. However, the sample size is too small to rule out subtle differences between areas.
The electrode was advanced while the monkey viewed brief, high-coherence random-dot motion stimuli of different directions while fixating a central target. When we encountered an area with robust spiking activity and clear direction selectivity, we attempted to isolate a single neuron (SortClient software, Plexon Inc., Dallas, TX, USA) but otherwise proceeded with mapping of receptive field position, size, preferred speed and direction based on multi-unit activity, as described previously (Fetsch et al., 2014). When direction tuning was sufficiently strong (>2 S.D. separating firing rates for preferred vs. anti-preferred direction motion), we proceeded with the PDW task, tailoring the stimulus to the neurons’ RF and tuning properties and aligning the choice targets with the axis of motion.
Bounded accumulation model
Solid lines in Figures 3–5 represent fits (or predictions) of a bounded accumulation model. In the model, noisy momentary evidence is accumulated until the integral of the evidence (termed the decision variable, DV) reaches one of two bounds at $\pm B(t)$, or until the motion stimulus is terminated by the experimenter. The momentary evidence comprises samples from a Gaussian distribution with mean $\kappa c$ and variance $\sigma_v^2(c)$, where $\kappa$ is a constant, $c$ is the motion coherence, and $v$ indicates whether the volatility is high or low. In most applications of diffusion models, the variance is assumed to be fixed and independent of motion strength, but our analyses of the motion energy and the neuronal recordings (Figure 2) motivate a more complex dependence of variance on $c$ and $v$. To capture these trends parsimoniously, we modeled the variance as a linear function of motion strength
plus an offset for the high volatility, which was maximal at $c = 0$ and diminishing at higher coherences:
The three degrees of freedom ($\beta, \alpha, \gamma$) control the slope of the coherence dependence, the effect of volatility at $c=0$, and its diminishing effect at higher coherence (Figure 2—figure supplement 1). We constrained the variance in the high volatility condition to be monotonically increasing. Note that the unity constant in Equation 2 is necessary because a model in which the offset is a free parameter in addition to $\kappa$ and $B(t)$ is equivalent to one in which the offset is set to 1 and $\kappa$ and $B(t)$ are scaled appropriately (Palmer et al., 2005; Shadlen et al., 2006).
For a given motion coherence and volatility ($v$), the probability density function for the state of the decision variable ($x$) as a function of time ($t$) is given by a one-dimensional Fokker-Planck equation:
where $p$ is the probability density of decision variable $x$ at time $t$. Boundary conditions were such that the probability mass is 1 for $x=0$ at $t=0$, and the probability density vanishes at the upper and lower bounds $\pm B(t)$.
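To make the boundary conditions concrete, the evolution of the decision variable can be approximated on a grid. The sketch below is a simple discrete-time scheme under several assumptions: stationary bounds, a single variance value rather than $\sigma_v^2(c)$, and hypothetical step sizes and parameter values. In each time step the probability mass is spread by a Gaussian kernel with mean `drift * dt` and variance `variance * dt`, and any mass that lands on or beyond a bound is absorbed. It is an illustration, not the numerical method used for the fits.

```python
import math


def propagate_density(drift, variance, bound, t_max, dt=0.005, dx=0.02):
    """Propagate probability mass on a grid between absorbing bounds at +/- bound.

    Returns (xs, p, up, lo): grid positions, unabsorbed mass at t_max, and the
    mass absorbed at the upper and lower bound in each time step.
    """
    m = int(round(bound / dx))            # grid points on each side of zero
    n = 2 * m + 1
    xs = [(i - m) * dx for i in range(n)]
    p = [0.0] * n
    p[m] = 1.0                            # all mass at x = 0 when t = 0

    # Gaussian transition kernel for one time step, in units of grid steps.
    sd = math.sqrt(variance * dt)
    half = int(math.ceil((abs(drift) * dt + 5.0 * sd) / dx))
    steps = range(-half, half + 1)
    kernel = [math.exp(-0.5 * ((s * dx - drift * dt) / sd) ** 2) for s in steps]
    norm = sum(kernel)
    kernel = [k / norm for k in kernel]

    up, lo = [], []
    for _ in range(int(round(t_max / dt))):
        new = [0.0] * n
        hit_up = hit_lo = 0.0
        for i, mass in enumerate(p):
            if mass == 0.0:
                continue
            for s, k in zip(steps, kernel):
                j = i + s
                if j >= n - 1:            # at or beyond the upper bound
                    hit_up += mass * k
                elif j <= 0:              # at or beyond the lower bound
                    hit_lo += mass * k
                else:
                    new[j] += mass * k
        p = new
        up.append(hit_up)
        lo.append(hit_lo)
    return xs, p, up, lo
```

The lists `up` and `lo` approximate the distribution of decision times at each bound, and `p` is the distribution of the decision variable on trials in which no bound has been reached by `t_max`. Because absorption is checked only once per time step, the scheme slightly underestimates early bound crossings unless `dt` is small.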
Confidence is given by the probability of being correct given the state of the evidence ($x$) and elapsed time, which could either correspond to the time of bound crossing or the stimulus duration if no bound was reached. Because the direction decision depends on the sign of $x$, the sign of the decision variable must equal the sign of the coherence for the choice to be correct, except for 0% coherence trials that are rewarded at random. Therefore,
where $t$ is either the time at which the bound was hit or the time at which the stimulus was curtailed. The distribution over coherences $p(c|x,t,v)$ can be obtained by Bayes' rule, such that $p(c|x,t,v)\propto p(x,t|c,v)\,p(c|v)$, where the constant of proportionality ensures that $\sum_{c}p(c|x,t,v)=1$. This constitutes a mapping between the DV and probability correct, which is the basis for assignment of confidence to a decision (Figure 1C). In general we assume that the same mapping $p(\text{corr}|x,t)$ supports confidence ratings (and PDW) on all trials irrespective of volatility, but evaluate this assumption using the alternative models described below.
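The mapping can be sketched for the simplest case in which no bound has been reached by time $t$, so that the decision variable is Gaussian with mean $\kappa c t$. The code below makes several simplifying assumptions that are labeled in the comments: a uniform prior over a hypothetical set of signed coherences, equal prior probability for the volatility conditions, and a variance that depends on volatility but not on coherence. It implements a common map by marginalizing over volatility.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def p_correct_given_dv(x, t, kappa, coherences, variances):
    """Common-map p(corr | x, t), assuming no bound was reached before t.

    coherences: signed motion strengths (uniform prior assumed).
    variances: variance per unit time for each volatility condition
               (equal prior assumed; coherence dependence ignored).
    """
    # Unnormalized posterior over coherence, marginalized over volatility.
    post = [
        sum(normal_pdf(x, kappa * c * t, var * t) for var in variances) / len(variances)
        for c in coherences
    ]
    z = sum(post)
    p = 0.0
    for c, w in zip(coherences, post):
        if c == 0:
            p += 0.5 * w / z              # 0% coherence is rewarded at random
        elif (c > 0) == (x > 0):
            p += w / z                    # sign of the DV matches sign of coherence
    return p


# Hypothetical coherence set and parameters, for illustration only.
cohs = [-0.512, -0.256, -0.128, -0.064, -0.032, 0.0, 0.032, 0.064, 0.128, 0.256, 0.512]
conf_small = p_correct_given_dv(0.1, 0.2, 10.0, cohs, [1.0, 2.0])
conf_large = p_correct_given_dv(0.5, 0.2, 10.0, cohs, [1.0, 2.0])
```

Passing a single-element `variances` list yields the map that would apply if the volatility condition were known. With bounds, the Gaussian likelihood would be replaced by the unabsorbed density and the bound-crossing densities.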
The data were fit to maximize the likelihood of the parameters given the choice, confidence and RTs observed on each trial. In the RT task, the model parameters were maximum likelihood fits to choice and RT:
where $\xi^{RT}$ represents the model parameters for the RT task, $i$ is the trial number and $N$ is the total number of trials. The probability density function for the time of bound crossing (decision times) is obtained by numerical solutions to the Fokker-Planck equation. The difference between the reaction time and the decision time is the non-decision latency, assumed to reflect sensory and motor delays unrelated to motion strength or volatility. This latency is assumed Gaussian with mean $\mu_{tnd}$ and standard deviation $\sigma_{tnd}$. The RT probability density function is obtained by convolving the p.d.f. of the decision times with the distribution of non-decision latencies.
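The convolution step can be sketched on a discrete time grid. The function names, the grid resolution and the truncation of the Gaussian at four standard deviations (and at zero latency) are assumptions made for illustration.

```python
import math


def gaussian_kernel(mu: float, sigma: float, dt: float, n_sd: float = 4.0):
    """Discretized Gaussian on the grid k * dt (k >= 0), normalized to sum to 1."""
    k_lo = max(0, int(math.floor((mu - n_sd * sigma) / dt)))
    k_hi = int(math.ceil((mu + n_sd * sigma) / dt))
    ks = list(range(k_lo, k_hi + 1))
    w = [math.exp(-0.5 * ((k * dt - mu) / sigma) ** 2) for k in ks]
    total = sum(w)
    return ks, [x / total for x in w]


def rt_density(decision_pdf, mu_tnd: float, sigma_tnd: float, dt: float):
    """Convolve decision-time mass (bins at 0, dt, 2*dt, ...) with the
    Gaussian distribution of non-decision latencies."""
    ks, w = gaussian_kernel(mu_tnd, sigma_tnd, dt)
    out = [0.0] * (len(decision_pdf) + ks[-1])
    for i, p in enumerate(decision_pdf):
        if p == 0.0:
            continue
        for k, wk in zip(ks, w):
            out[i + k] += p * wk
    return out


# Hypothetical check: all decisions at 0.1 s, latency with mean 0.3 s and SD 0.05 s.
dt = 0.001
decision_pdf = [0.0] * 200
decision_pdf[100] = 1.0
rt_pdf = rt_density(decision_pdf, mu_tnd=0.3, sigma_tnd=0.05, dt=dt)
```

The output preserves the total probability mass of the decision-time distribution and shifts it by the non-decision latency; the likelihood of an observed RT is then read from the corresponding bin.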
For the PDW task, the log likelihood is a sum of two terms,
where $L^{S^{+}}$ ($L^{S^{-}}$) is the log-likelihood computed over trials with (without) the sure-bet target, and $\xi^{PDW}$ are the model parameters. For trials without the sure target, the log-likelihood of the parameters is
where the summation runs over trials without the sure target, and $T_i$ is the duration of the stimulus on trial $i$. The argument of the summations is computed as follows. If $p_{up}(t)$ is the probability of crossing the upper bound at time $t$, then the probability of crossing the bound anytime before time $T$ is
and
where choice '1' is associated with a positive DV (i.e., $x>0$). In the equation, $p(x>0,t=T|c,v,\xi^{PDW})$ is the probability that the decision variable ($x$) is positive at time $T$ and that no bound has been reached before $T$.
For trials where the sure-bet target was offered, we compute the likelihood of the parameters given the three possible responses in a trial: the two directional choices and the sure-bet choice. We assumed that subjects opt out of reporting the direction choice and select the sure bet if the confidence in the decision is lower than a criterion, $\Phi$, which was the same for conditions of low and high volatility. The value identifies a probability contour like those depicted in Figure 1C. It demarcates a zone in the middle of the graph depicted in Figure 1C in which the state of the evidence would lead the subject to opt out. Therefore, the probability of opting out of the direction choice $p(o)$ is
where $\mathscr{H}(x)$ is a step function that evaluates to one if $x>0$, and zero otherwise. The first term on the right-hand side of the equation integrates the probability density that has not been absorbed at a bound before time $T$ and for which probability correct is lower than $\Phi$. The second and third terms allow for the possibility that even when a bound was reached, the probability correct at the bound is lower than the criterion $\Phi$. In practice, this only occurs (e.g., during fitting) when the bound is too low or the criterion is too high. $B_{up}(t)$ and $B_{lo}(t)$ correspond to the height of the upper and lower bounds at time $t$, respectively. For readability, we have omitted the dependence of $p(\text{corr})$ on some parameters (e.g., $\xi^{PDW}$).
The probability of waiving the sure bet and making a direction choice follows the complementary logic:
where the first term of the right-hand side corresponds to the probability of selecting choice '1' when the bound is reached, and the second term computes the probability of selecting this choice when no bound is reached before $T$.
In the human confidence task, we performed a maximum likelihood fit to the choice reported on each trial:
where $\hat{\xi}^{HCONF}$ is the maximum likelihood estimate of the parameters and the likelihood is computed as described by Equation 10. We fit only one parameter per subject ($\kappa$). The rest of the parameters were taken from the RT task (i.e., from $\hat{\xi}^{RT}$; see Table 1). Note that confidence was not used for the fits, and therefore the solid curves in Figure 5 can be considered predictions of the model.
For the RT task, we allowed the bound height to change as a function of time, as suggested by previous work (Churchland et al., 2008; Hanks et al., 2011; Drugowitsch et al., 2012). The upper and lower bounds were symmetric around zero, and were parameterized by a logistic function of time:
where a and d are the scale and location parameters of the logistic. The bound parameters were constrained to be the same for the two volatility conditions, except in the alternative model for the RT task where we fit separate ${B}_{0}$ for the two volatility conditions (Table 2).
In the human confidence task, the presence of bounds did not improve the quality of the fits. This implies that subjects used all the stimulus information to inform their choices, presumably because the stimulus duration was only 0.2 s. In the PDW, a stationary bound (i.e., $B\left(t\right)=\text{}{B}_{0}$) improved the quality of the fits.
In the human confidence experiment, we do not know how each subject maps a position on the rating scale (position along the crescent target) to probability correct. Therefore, we assumed a monotonic transformation between the expected probability correct $p(corrc,v)$ and saccadic end point. Probability correct $p(corrc,v)$ was obtained by marginalizing $p(corrx,t)$ over the state of the evidence ($x)$ at the time of decision termination ($t$). Because we did not include a bound in the human confidence task, $t$ is the stimulus duration (i.e., $T$ = 0.2 s). The distribution of the DV at decision time depends on coherence $c$and volatility $v$, therefore
The monotonic transformation $\mathcal{F}$ that maps probability correct to the average position in the rating scale $\u27e8sac(c,v)\u27e9}_{tr$ was constructed as a linear combination of three error functions plus a constant offset: $\mathcal{\mathcal{F}}\left(x\right)={\sum}_{i=1}^{3}{\text{w}}_{i}\text{}{\text{erf}}_{i}\left(\frac{x{o}_{i}}{s}\right)+k$, where ${o}_{i}$ is an offset term, and $s$ is a scaling parameter. The three linear weights and the offset $k$ were fit to minimize the sum of squared differences between $\mathcal{\mathcal{F}}\left[p(corrc,v)\right]$ and $\u27e8sac(c,v)\u27e9}_{tr$. Similar results were obtained using different parameterizations of $\mathcal{F}$.
For the PDW task, we explored three alternative 'two-map' models. In the first, we used a different mapping between DV and confidence for each volatility condition. Each map is the one that should be used if the volatility condition of each trial were known (i.e., the one specified by the bottom row of Equation 5). For the second two-map model, the assessment of volatility develops gradually during the trial. We assume that for a trial $i$ with stimulus duration $T_i$, the probability that the decision maker can identify the trial's volatility is given by $w(T_i) = 1 - e^{-T_i/\tau}$. For trials where the sure bet was offered, we compute the probability of the action that was chosen by the monkey as a weighted average of two probabilities: the probability that results from using a common map for both volatility conditions, weighted by $(1 - w(T_i))$, and the probability obtained from using the mapping that corresponds to the appropriate volatility of the trial, weighted by $w(T_i)$. The time constant $\tau$ was fitted to data. If $\tau$ is small, information about volatility builds up rapidly and the decision maker can use the appropriate map for each condition. Fitting the model to data showed that the volatility information develops very gradually, with $w(t)$ being ~0.01 for a 1-s stimulus. For the third model, besides using different mappings between DV and confidence for the two volatility conditions, we also fit independent bounds, such that $B_0^{high} = B_0 + \Delta B_0$, where $B$ denotes bound height (see Table 2). Best fitting parameters for the three alternative models and the BIC comparisons to the model of Figure 4 are shown in Table 2.
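The weighting scheme of the second two-map model can be sketched in a few lines. This is a hedged illustration: the function names are hypothetical, and the only quantity taken from the text is the exponential form of $w(T)$. Inverting that form shows that a weight of about 0.01 for a 1-s stimulus corresponds to a time constant on the order of 100 s.

```python
import math

def volatility_weight(duration_s, tau_s):
    """Probability that the volatility is identified: w(T) = 1 - exp(-T / tau)."""
    return 1.0 - math.exp(-duration_s / tau_s)

def mixed_choice_probability(p_common_map, p_matched_map, duration_s, tau_s):
    """Weighted average of the probabilities under the common and the matched map."""
    w = volatility_weight(duration_s, tau_s)
    return (1.0 - w) * p_common_map + w * p_matched_map

def tau_from_weight(weight, duration_s):
    """Invert w(T) = 1 - exp(-T / tau) to recover tau from a given weight."""
    return -duration_s / math.log(1.0 - weight)

# Time constant implied by w = 0.01 at T = 1 s (illustrative arithmetic only).
tau = tau_from_weight(0.01, 1.0)
```

With such a long time constant, `mixed_choice_probability` stays close to the common-map probability for all stimulus durations used in the task.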
Statistical analysis
To examine whether high volatility leads to faster responses in the reaction time task, we fit a linear regression model for each subject where the reaction time is given by
where $I_v$ is an indicator variable for volatility (1: high, 0: low), and the $\beta$'s are fitted coefficients. Unless otherwise indicated, the null hypothesis is that the $\beta$ term associated with $I_v$ equals zero, evaluated with a t-test (t-statistics were derived using the parameter estimates and their associated standard errors [i.e., the square root of the elements in the diagonal of the covariance matrix of the parameter estimates]).
To evaluate the influence of volatility on accuracy, we used logistic regression, excluding trials of 0% coherence:
The influence of volatility was evaluated with a likelihood-ratio test comparing models with and without the $\beta_2$ term.
We also used logistic regression to evaluate the effect of volatility on accuracy when pooling data across subjects and experiments:
where ${I}_{s,x}$ are indicator variables for every combination of task and subject (n = 8). This equation parallels the structure of the previous one. The first term in the argument of the exponential allows fitting a different intercept for each combination of task and subject, and the third term allows for different intercepts on high and low volatility trials. The significance of the influence of volatility on accuracy was evaluated with a likelihood ratio test comparing nested models with and without the ${\beta}_{2}$ terms, with the test statistic evaluated against a ${\chi}^{2}$ distribution with n = 8 degrees of freedom. Only nonzero coherences were included in this analysis.
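A likelihood ratio test of this kind compares twice the difference in maximized log-likelihoods against a $\chi^2$ distribution. The sketch below is illustrative only: it assumes the two log-likelihoods have already been obtained from the nested logistic fits, the function names are hypothetical, and the closed-form tail probability it uses is valid only for an even number of degrees of freedom (such as the 8 used here).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Return the test statistic and its p-value for two nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_even_df(max(stat, 0.0), df)
```

For example, `likelihood_ratio_test(-1200.0, -1210.0, 8)` uses made-up log-likelihoods to show the calling convention; the values are not results from the study.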
Similarly, to evaluate the influence of volatility on the monkey’s PDW behavior on trials where the sure bet was offered, we fit
where ${p}_{waived}$ is the probability that the sure bet was declined, and ${T}_{d}$ is stimulus duration. We also examined whether availability of the sure bet influenced accuracy:
where $I_s$ is 1 if the sure bet was offered, and 0 otherwise. A positive $\beta_4$ indicates that the accuracy increases if the sure bet is offered but waived.
In the human confidence task, we mapped subjects’ confidence reports to a 0–1 scale, such that ‘0’ stands for ‘guessing’ and ‘1’ for ‘full certainty’. To evaluate the significance of the effect of volatility on confidence we fit for each subject the following linear regression model:
Motion energy
While the motion coherence specifies the nominal strength of motion in the stimulus, the effective motion strength varies from trial to trial and even within trials, due to the random fluctuations in the stimulus. To extract the effective motion strength, we computed the motion energy in the stimulus (Adelson and Bergen, 1985; Kiani et al., 2008), following published procedures which we briefly review here. We convolved the sequence of random dots presented on each trial with two pairs of spatiotemporal filters. Each pair of filters is selective for one of the two alternative directions of motion ($\pm x$). Directional selectivity is achieved through the addition or subtraction of two space-time separable filters. As in previous work (Kiani et al., 2008), the temporal impulse responses are:
The spatial filters are even (mirror-symmetric) and odd (non-symmetric) fourth order Cauchy functions:
where $\alpha = \tan^{-1}(x/\sigma_c)$. The constants in Equations 22 and 23 were adjusted to match the apparent speed of the coherently moving dots.
The two pairs of directionally selective filters were obtained through appropriate addition and subtraction of the product of a spatial and a temporal filter. Specifically, the two filters selective to the $+x$ direction are given by 'slow $\times$ even – fast $\times$ odd', and 'slow $\times$ odd + fast $\times$ even'. Filters selective to the $-x$ direction are given by 'fast $\times$ odd + slow $\times$ even', and 'fast $\times$ even – slow $\times$ odd'. The four directional filters were convolved with the 3-dimensional (x, y, time) stimulus. After squaring the output and adding the two filters that prefer the same direction, we compute opponent motion energy by subtracting $-x$ from $+x$ preferring responses. Finally, we average across space to obtain a temporal signal, $e_{tr}(t)$, which quantifies how motion strength varies within each trial. Because the motion energy has arbitrary units, which vary, for instance, with the size of the stimulus, we normalized $e_{tr}(t)$ by multiplying it by a constant $\lambda$. The normalization constant was the same for all trials in a session, and was set such that the motion energy is, on average, equal to the motion coherence. This normalization is possible because the motion energy is a linear function of the motion coherence. The motion energy profile for $e_{tr}(t)$ is shown in Figure 2A for an example trial.
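The last two steps, opponent energy and normalization, can be sketched as follows. The sketch assumes the four filter outputs have already been computed at each sample, and it reads "on average equal to the motion coherence" as a least-squares scale through the origin; that reading, and all function names, are assumptions rather than the published procedure.

```python
def opponent_energy(plus_a, plus_b, minus_a, minus_b):
    """Square each filter output, sum within direction, subtract -x from +x energy."""
    return [(pa ** 2 + pb ** 2) - (ma ** 2 + mb ** 2)
            for pa, pb, ma, mb in zip(plus_a, plus_b, minus_a, minus_b)]

def normalization_constant(trial_energies, trial_coherences):
    """Scale factor (assumed least squares through the origin) mapping energy to coherence."""
    num = sum(e * c for e, c in zip(trial_energies, trial_coherences))
    den = sum(e * e for e in trial_energies)
    return num / den

def normalize(energy_profile, lam):
    """Apply the session-wide constant to one trial's energy time course."""
    return [lam * e for e in energy_profile]
```

In practice the squaring is applied at each spatial location before averaging across space; the lists above stand in for those per-sample outputs.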
To characterize the mean and variance of the motion energy for high and low volatility (Figure 2B), we first computed the average motion energy for each trial, i.e. $e_{tr} = \langle e_{tr}(t) \rangle_t$, ignoring the rise and decay times of the motion filters, that is, from 50 ms after motion onset to 50 ms after offset. The mean and variance of $e_{tr}$ were computed over subsets of trials grouped by motion coherence and volatility condition.
We used logistic regression to determine if the motion energy profile of each trial of the PDW task contains enough information to identify the trial’s volatility. We calculated the mean (${e}_{tr}$) and an index of the dispersion (${e}_{tr}^{v}$) of the motion energy time course for each trial. The dispersion index was estimated as the variance of the distribution of motion energy values estimated at the frame rate, ignoring the autocorrelation in motion energy profile. Thus, ${e}_{tr}^{v}$ is more accurately described as a measure of dispersion of the motion energy profile on single trials rather than as an estimate of the variance. The mean and the dispersion of the motion energy were used together with the stimulus duration (${T}_{d}$) to train a logistic regression model to classify the volatility condition of each trial:
where ${p}_{v}^{tr}$ is the probability that trial $tr$ is of high volatility. After fitting the logistic model, we estimated the degree of overlap in the distributions of ${p}_{v}^{tr}$ between trials of low and high volatility. The area under the ROC curve was 0.895, indicating that there is information in the stimulus to reliably estimate the volatility condition of each trial, even for the brief stimulus presentations used in the PDW task. If we remove the interaction term (${\beta}_{4}$) the area under the ROC curve is 0.85. To be clear, we do not put forward this calculation as a plausible model for inferring volatility. It merely serves to document that information is present in the stimuli to render a categorization possible.
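The area under the ROC curve equals the probability that a randomly chosen high-volatility trial receives a larger classifier output than a randomly chosen low-volatility trial. A minimal sketch of that computation, assuming the fitted probabilities are available as two plain lists (the function name is hypothetical):

```python
def roc_auc(scores_low, scores_high):
    """Area under the ROC curve by pairwise comparison; ties count one half."""
    wins = 0.0
    for high in scores_high:
        for low in scores_low:
            if high > low:
                wins += 1.0
            elif high == low:
                wins += 0.5
    return wins / (len(scores_high) * len(scores_low))
```

A value of 0.5 indicates complete overlap of the two distributions and 1.0 indicates perfect separation.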
Analysis of neural data
For simplicity, in what follows we refer to both single units and multiunit sites as 'neurons'. To investigate how the volatility manipulation affected the mean and variance of the neuronal response, we first counted spikes occurring between 100 ms and 200 ms from stimulus onset. To avoid artifacts produced by the response to the offset of the RDM stimulus, we restricted this analysis to trials where the motion stimulus was presented for at least 150 ms. The counts were standardized (z-scored) independently for each neuron and subsequently grouped across neurons to obtain a large array of normalized counts, $s_{tr}$, where $tr$ indexes the trial number across sessions. Figure 2C shows the mean ($\mu_{c,v}$) and the variance ($\sigma_{c,v}^{2}$) of $s_{tr}$ computed over the subset of trials given by every combination of motion coherence and volatility condition.
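The standardize-then-pool step can be sketched as below. The data layout (one list of `(count, coherence, volatility)` tuples per neuron), the function names, and the use of the population rather than the sample standard deviation are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev, pvariance

def zscore(counts):
    """Standardize one neuron's spike counts (fails if all counts are identical)."""
    mu, sd = mean(counts), pstdev(counts)
    return [(c - mu) / sd for c in counts]

def pooled_moments(neurons):
    """Mean and variance of z-scored counts for each (coherence, volatility) pair.

    neurons: list with one entry per neuron, each a list of
             (spike_count, signed_coherence, volatility) tuples.
    """
    groups = defaultdict(list)
    for trials in neurons:
        z = zscore([count for count, _, _ in trials])
        for s_tr, (_, coh, vol) in zip(z, trials):
            groups[(coh, vol)].append(s_tr)
    return {key: (mean(vals), pvariance(vals)) for key, vals in groups.items()}
```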
These analyses furnished empirical estimates of the mean and variance of the spike count as a function of motion strength and direction. Findings from neurophysiology (Ditterich et al., 2003) and computational modeling (Mazurek et al., 2003) suggest that the momentary evidence is proportional to the difference of firing rates between pools of neurons with opposite direction preferences (e.g., right-preferring minus left-preferring). The expectation of this difference variable ($\Delta$) can be estimated empirically:
where $c$ and $-c$ indicate motion in the preferred and anti-preferred direction of the neuron, for motion strength $c$. The mean of the difference variable is shown in Figure 2D, with mean counts $\mu_{c,v}$ and $\mu_{-c,v}$ obtained from Figure 2C.
The variance of the difference variable ($\sigma_{\Delta}^{2}$) was approximated as follows. Because the variance of a sum equals the sum of the covariances, if the average pairwise correlation for a pool of $n$ neurons is given by $r$, then the variance of the average response of the pool is $\left(\frac{\sigma^{2}}{n} + \frac{n-1}{n} r \sigma^{2}\right)$, where $\sigma^{2}$ is the variance in the spike counts from a single neuron. As $n$ becomes large (in practice, above 50 to 100 neurons is sufficient), the variance of the pool converges to $r\sigma^{2}$. Further, there is a portion of the variance that is shared between neurons tuned to the preferred and anti-preferred directions. If the correlation between the average responses of populations of neurons with opposite directional preferences is given by $\rho$, the variance of the difference variable is given by Equation 1 of the main text.
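The convergence of the pool variance to $r\sigma^{2}$ is easy to check numerically. The snippet below evaluates the expression for the variance of the pool average; the function name and the example values of $r$ and $\sigma^{2}$ are arbitrary choices for illustration.

```python
def pool_variance(sigma2, n, r):
    """Variance of the mean of n equally correlated responses, each with variance sigma2."""
    return sigma2 / n + (n - 1) / n * r * sigma2

# With sigma2 = 1 and r = 0.2 (illustrative values), the pool variance
# approaches r * sigma2 = 0.2 as the pool grows.
example = [pool_variance(1.0, n, 0.2) for n in (1, 10, 100, 1000)]
```

For a single neuron the expression reduces to $\sigma^{2}$, and the residual term $\sigma^{2}(1 - r)/n$ shrinks in proportion to $1/n$.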
For the analyses depicted in Figure 7, we extracted the spike times from each trial up to 50 ms after motion offset and then smoothed the spike counts with a centered boxcar filter with a 30 ms width. For the analysis of Figure 7B we computed, for each neuron, the difference in firing rate between the response to the preferred and the non-preferred directions, for trials of the highest coherence (c = 0.512). This difference was used to estimate the latency with which motion information is represented in these neurons, regardless of the choice. For the analyses of Figure 7A,C,D, we obtained the residuals of firing rate by subtracting, from each trial and time step, the average firing rate of the same neuron on trials having the same motion direction, coherence and volatility. To group trials across neurons, we divided the activity of each neuron by a normalization constant, given by the maximum average firing rate at the highest coherence (i.e., c = 0.512). The latencies in Figure 7B were estimated with a curve fitting procedure based on the CUSUM method (Ellaway, 1978). In the CUSUM method, the latency of the difference between two conditions is estimated based on the cumulative sum of the differences, thereby achieving robustness against the noisiness of individual data points. The cumulative sum of differences was fit to a curve composed of two lines, the first of which was constrained to have a zero slope [similar to Lorteije et al. (2015); Van den Berg et al. (2016)]. The latency is then estimated as the time point when the two lines intersect. Standard errors of the latency estimates were derived with a bootstrapping procedure (N = 1000).
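One way to implement the two-line fit is a grid search over candidate breakpoints, fitting a flat segment followed by a sloped segment at each candidate. The sketch below follows that approach under stated assumptions: the two segments are joined at the breakpoint, candidate breakpoints are restricted to the sampled time points, and the bootstrap for standard errors is omitted. It is not the authors' fitting code.

```python
def cusum(differences):
    """Running sum of the preferred minus non-preferred rate differences."""
    total, out = 0.0, []
    for d in differences:
        total += d
        out.append(total)
    return out

def fit_latency(times, differences):
    """Breakpoint of the best flat-then-linear fit to the cumulative sum."""
    y = cusum(differences)
    n = len(times)
    best_sse, best_t0 = float("inf"), None
    for t0 in times[1:-1]:
        # Hinge regressor: zero before the breakpoint, linear after it.
        h = [max(0.0, t - t0) for t in times]
        h_mean, y_mean = sum(h) / n, sum(y) / n
        sxx = sum((hi - h_mean) ** 2 for hi in h)
        if sxx == 0.0:
            continue
        slope = sum((hi - h_mean) * (yi - y_mean) for hi, yi in zip(h, y)) / sxx
        level = y_mean - slope * h_mean
        sse = sum((yi - (level + slope * hi)) ** 2 for hi, yi in zip(h, y))
        if sse < best_sse:
            best_sse, best_t0 = sse, t0
    return best_t0
```

Because the fit operates on the cumulative sum, a sustained difference produces a clear change in slope even when individual samples are noisy.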
Appendix 1
Derivation of the normative model
We used dynamic programming to determine how a rational decision-maker ought to adjust the height of the decision termination bounds when trials of different volatilities are randomly interleaved. For simplicity, we assume that the variance of momentary evidence is known to the decision maker, or that it can be estimated very rapidly (e.g., Drugowitsch et al. [2014]). As in previous studies (Rao, 2010; Drugowitsch et al., 2012; Huang et al., 2012) we derive the optimal strategy by representing the random-dot motion discrimination task as a partially observable Markov Decision Process (POMDP). The solution to the POMDP is then derived by recasting it as an MDP (i.e., assuming full observability over the belief states) and using dynamic programming to derive the policy that maximizes average reward.
An MDP can be described as a tuple given by (Bertsekas et al., 1995; Geffner and Bonet, 2013):
a nonempty state space $S$,
an initial state ${S}_{0}$,
a goal state ${S}_{G}$,
a set of actions $A(s)$ applicable in state $s$,
positive and negative rewards $r(a,s)$ for doing action $a$ in state $s$,
transition probabilities $P_a(s' \mid s)$ indexing the probability of transitioning to state $s'$ after doing action $a$ in state $s$.
The state $s$ was defined as a tuple $(x,t,v)$, where $x$ is the accumulated motion evidence for one direction and against the other (with its sign indicating the direction of motion), $t$ is elapsed time from the onset of motion, and $v$ is the volatility condition (low or high).
In the initial state, $x=0$ (no net evidence favoring either of the alternatives), $t=0$ and there is an equal probability of being in a high or low volatility regime.
Three actions are applicable in each state: two directional choices (e.g., left and right) and a third action ('fix'), which is to maintain fixation for an extra time step to gather additional motion information. The outcome of the MDP is a deterministic policy, which assigns an action to each state.
Transition probabilities $P_a(s' \mid s)$ indicate the probability of transitioning to $s'$ after performing action $a$ in state $s$. As for the bounded accumulation model, the momentary motion evidence is assumed to be normally distributed with a mean that depends linearly on motion coherence ($c$), and variance $\sigma_v^{2}\,\delta t$. After $t$ sec, the accumulated evidence would, in the absence of bounds, also be normally distributed with mean $t\kappa c$ and variance $t\sigma_v^{2}$. Here we assume that $\sigma_v^{2}$ is independent of coherence to avoid additional complexities in the numerical solution of Bellman's equation. Note that this simplification departs from the volatility manipulation introduced in the experiment.
For a given motion coherence, the probability that the evidence gathered in a time step δt leads to a transition from state $s=\left(x,t,v\right)$ to state $s\text{'}=\left(x\text{'},t+\delta t,v\right)$ is given by:
where $N(\cdot \mid \mu, \sigma)$ is the normal p.d.f. with mean $\mu$ and standard deviation $\sigma$.
We then need to marginalize over coherences to obtain the transition probability $p_{fix}(s' \mid s)$:
Marginalizing over coherences requires knowledge of $p(c \mid s)$, the probability that the motion coherence is $c$, given that state $s$ was reached, which can be computed as:
where the coherences $c$ are the discrete set of signed coherences used in the experiment, and the proportionality constant is such that the sum of $p(c \mid x,t,v)$ over all motion coherences adds to one (Moreno-Bote, 2010). As in the experiment, $p(c)$ is distributed uniformly over the discrete set of unsigned motion coherences.
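Under the stated assumptions (unbounded accumulated evidence that is normal with mean $t\kappa c$ and variance $t\sigma_v^{2}$), the posterior over coherences can be sketched as a normalized product of likelihood and prior. The function names are hypothetical, and splitting each nonzero unsigned coherence equally between the two directions is an assumption added for the example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def prior_from_unsigned(unsigned_coherences):
    """Uniform prior over unsigned coherences, split equally across directions."""
    n = len(unsigned_coherences)
    prior = {}
    for c in unsigned_coherences:
        if c == 0:
            prior[0.0] = 1.0 / n
        else:
            prior[c] = 0.5 / n
            prior[-c] = 0.5 / n
    return prior

def coherence_posterior(x, t, sigma_v, kappa, prior):
    """p(c | x, t, v), assuming x ~ Normal(kappa * c * t, sigma_v * sqrt(t)); requires t > 0."""
    unnorm = {c: normal_pdf(x, kappa * c * t, sigma_v * math.sqrt(t)) * p
              for c, p in prior.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}
```

The volatility condition enters only through `sigma_v`, so the same accumulated evidence is less diagnostic of coherence when the noise is larger.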
The policy that maximizes average reward was found using value iteration to numerically solve Bellman’s equation. The process works by assigning to every state, $s$, a value $V(s)$, which is the largest associated with the three possible actions: choose right ($r$), choose left ($l$), or continue gathering evidence ($fix$):
where $b(s,a)$ is the probability of being correct after doing action $a$ in state $s$; $R_c$ and $R_{nc}$ are the rewards following correct and incorrect decisions (here 1 and 0, respectively); $t_p$ is the time penalty after an error; $t_{nd}$ is the average non-decision time; $t_w$ is the average time spent between decisions, including the time spent acquiring fixation and observing feedback; and $\rho$ is the amount of reward obtained per unit of time (explained further below).
The probability of being correct after doing action $a$ in state $s$, $b(s,a)$, can be obtained summing over the coherences for which the action $a$ is the appropriate action. For instance, the action ‘right’ is the appropriate action for all positive coherences and for half of the 0% coherence trials. Therefore,
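The rule described above, summing the posterior over coherences for which the action is appropriate and crediting half of the 0% coherence mass, can be sketched as follows. The function name is hypothetical, and the posterior is assumed to be given as a dictionary keyed by signed coherence.

```python
def prob_correct(posterior, action):
    """b(s, a): posterior mass on coherences for which `action` is correct.

    posterior: dict mapping signed coherence to p(c | s);
    action: "right" (positive coherences) or "left" (negative coherences).
    """
    sign = 1.0 if action == "right" else -1.0
    mass = sum(p for c, p in posterior.items() if sign * c > 0)
    return mass + 0.5 * posterior.get(0.0, 0.0)
```

By construction the values for the two directional choices sum to one in every state.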
Because choosing right is a terminating event, there is no need to consider future states, and the same applies to the left choice. The value of gathering additional evidence before committing to a choice is captured by $Q(s,fix)$, computed as an expectation over all future states $s\text{'}$ that result from being in $s$ and gathering evidence for an additional time step $\delta t$.
Because time flows in a single direction, if the reward rate were known, Bellman's equation could be solved by backwards induction in a single pass. Since the reward rate depends on the policy itself, we perform multiple backward passes, bracketing $\rho$ within a sequence of diminishing intervals until the value of the initial state $V(S_0)$ becomes vanishingly small (Bertsekas et al., 1995; Drugowitsch et al., 2012). The procedure yields a formulation of the stopping criteria as a function of time. These are the optimal bounds shown in Appendix 1—figure 1 (top row) for different scenarios (Appendix 1—table 1).
With the optimal bounds for each volatility condition, we compute the probability that the decision was correct given that a bound was reached at time $t$. For a single motion coherence (Appendix 1—figure 1A–B), this probability is independent of time (Wald and Wolfowitz, 1948). For different sets of parameters (Appendix 1—table 1) we derive the choice and decision time by solving numerically the Fokker-Planck equations using the optimal bounds, for a fine grid of coherence values and for both volatility conditions (second and third rows of the figure). The confidence for correct choices (Appendix 1—figure 1, bottom row) in this model is determined solely by the time [see Kiani et al. (2014)]. The confidence associated with each coherence was obtained by marginalizing the probability correct at the bound over the distribution of decision times obtained for each motion coherence and volatility condition.
While none of the normative models depicted in Appendix 1—figure 1 correspond in detail to the experiment we conducted, the analysis carries three implications which are likely to apply. First, if the volatility conditions (low or high) were known, the decision maker should adjust the termination criteria and confidence mapping. In other words, it would be desirable to know the volatility conditions and to adjust the decision process accordingly. Second, if subjects approximated the optimal behavior they would have been less confident on high volatility trials. The observation that they were more confident on these trials implies that they were not optimal, or they could not identify the volatility without adding additional costs (e.g., effort and/or time). Third, the faster RT on high volatility trials would have been expected even if the subjects had applied different decision criteria.
References

1. Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A 2:284–299. https://doi.org/10.1364/JOSAA.2.000284
2. A stochastic model for individual choice behavior. Psychological Review 67:1–15. https://doi.org/10.1037/h0046438
3. Correlated firing in macaque visual area MT: time scales and relationship to behavior. Journal of Neuroscience 21:1676–1697.
5. The analysis of visual motion: a comparison of neuronal and psychophysical performance. Journal of Neuroscience 12:4745–4765.
6. Responses of neurons in macaque MT to stochastic motion signals. Visual Neuroscience 10:1157–1169. https://doi.org/10.1017/S0952523800010269
8. Microstimulation of extrastriate area MST influences performance on a direction discrimination task. Journal of Neurophysiology 73:437–448.
9. Decision-making with multiple alternatives. Nature Neuroscience 11:693–702. https://doi.org/10.1038/nn.2123
10. Two types of ROC curves and definitions of parameters. The Journal of the Acoustical Society of America 31:629–630. https://doi.org/10.1121/1.1907764
14. Robust averaging during perceptual judgment. PNAS 108:13341–13346. https://doi.org/10.1073/pnas.1104517108
15. Making decisions with unknown sensory reliability. Frontiers in Neuroscience 6:75. https://doi.org/10.3389/fnins.2012.00075
16. Microstimulation of visual cortex affects the speed of perceptual decisions. Nature Neuroscience 6:891–898. https://doi.org/10.1038/nn1094
17. The cost of accumulating evidence in perceptual decision making. Journal of Neuroscience 32:3612–3628. https://doi.org/10.1523/JNEUROSCI.401011.2012
18. Optimal decision-making with time-varying evidence reliability. Advances in Neural Information Processing Systems.
19. Cumulative sum technique and its application to the analysis of peristimulus time histograms. Electroencephalography and Clinical Neurophysiology 45:302–304. https://doi.org/10.1016/00134694(78)900172
20. A model of calibration for subjective probabilities. Organizational Behavior and Human Performance 26:32–53. https://doi.org/10.1016/00305073(80)900458
22. How to measure metacognition. Frontiers in Human Neuroscience 8:1–9. https://doi.org/10.3389/fnhum.2014.00443
23. A concise introduction to models and methods for automated planning. Synthesis Lectures on Artificial Intelligence and Machine Learning 7:1–141. https://doi.org/10.2200/S00513ED1V01Y201306AIM022
25. The neural basis of decision making. Annual Review of Neuroscience 30:535–574. https://doi.org/10.1146/annurev.neuro.29.051605.113038
28. Elapsed decision time affects the weighting of prior probability in a perceptual decision task. Journal of Neuroscience 31:6339–6352. https://doi.org/10.1523/JNEUROSCI.561310.2011
30. The relation of the time of a judgment to its accuracy. Psychological Review 18:186–201. https://doi.org/10.1037/h0074579
31. How prior probability influences decision making: A unifying probabilistic model. Advances in Neural Information Processing Systems.
32. A computational framework for the study of confidence in humans and animals. Philosophical Transactions of the Royal Society B 367:1322–1337. https://doi.org/10.1098/rstb.2012.0037
40. Heuristic use of perceptual evidence leads to dissociation between performance and metacognitive sensitivity. Attention, Perception, & Psychophysics 78:923–937. https://doi.org/10.3758/s134140161059x
41. A role for neural integrators in perceptual decision making. Cerebral Cortex 13:1257–1269. https://doi.org/10.1093/cercor/bhg097
42. Low attention impairs optimal incorporation of prior knowledge in perceptual decisions. Attention, Perception, & Psychophysics 77:2021–2036. https://doi.org/10.3758/s1341401508972
44. Post-decisional accounts of biases in confidence. Current Opinion in Behavioral Sciences 11:55–60. https://doi.org/10.1016/j.cobeha.2016.05.005
46. Some relationships between comparative judgment, confidence, and decision-time in weight-lifting. The American Journal of Psychology 76:28–38. https://doi.org/10.2307/1419996
49. Attention induces conservative subjective biases in visual perception. Nature Neuroscience 14:1513–1515. https://doi.org/10.1038/nn.2948
50. Direct injection of noise to the visual cortex decreases accuracy but increases decision confidence. Journal of Neurophysiology 107:1556–1563. https://doi.org/10.1152/jn.00985.2011
51. Decision making under uncertainty: a neural model based on partially observable Markov decision processes. Frontiers in Computational Neuroscience 4:146. https://doi.org/10.3389/fncom.2010.00146
52. Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. Journal of Neuroscience 22:9475–9489.
53. How MT cells analyze the motion of visual patterns. Nature Neuroscience 9:1421–1431. https://doi.org/10.1038/nn1786
55. A computational analysis of the relationship between neuronal and behavioral responses to visual motion. Journal of Neuroscience 16:1486–1510.
56. The speed and accuracy of a simple perceptual decision: a mathematical primer. In: Bayesian Brain: Probabilistic Approaches to Neural Coding, pp. 209–237.
58. The variable discharge of cortical neurons: implications for connectivity, computation, and information coding. Journal of Neuroscience 18:3870–3896.
59. A detailed comparison of optimality and simplicity in perceptual decision making. Psychological Review 123:452–480. https://doi.org/10.1037/rev0000028
64. The response variability of striate cortical neurons in the behaving monkey. Experimental Brain Research 77:432–436. https://doi.org/10.1007/BF00275002
65. Optimum character of the sequential probability ratio test. The Annals of Mathematical Statistics 19:326–339. https://doi.org/10.1214/aoms/1177730197
66. Metacognition in human decision-making: confidence and error monitoring. Philosophical Transactions of the Royal Society B 367:1310–1321. https://doi.org/10.1098/rstb.2011.0416
68. The construction of confidence in a perceptual decision. Frontiers in Integrative Neuroscience 6:79. https://doi.org/10.3389/fnint.2012.00079
69. Variance misperception explains illusions of confidence in simple perceptual decisions. Consciousness and Cognition 27:246–253. https://doi.org/10.1016/j.concog.2014.05.012
Decision letter

Michael J Frank, Reviewing Editor; Brown University, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "The influence of evidence volatility on choice, reaction time and confidence in a perceptual decision" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and David Van Essen as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Hakwan Lau (Reviewer #1); Jeff Beck (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
This article reports an experiment using a random dot motion stimulus in which the coherence is a random variable from frame to frame with a given mean and standard deviation. The experiment manipulated the mean coherence and the volatility of the coherence with either no frame-to-frame variance (low volatility) or with frame-to-frame variance (high volatility). Choice, response time, and measures of retrospective confidence were collected from three human participants and one macaque monkey. For the monkey, confidence was measured with post-decision wagering. For the humans, confidence was measured via a scale positioned on the left and right of the stimulus allowing a choice to be recorded at the same time a confidence level was recorded. Mean coherence had the typical effects on all 3 measures. Volatility, particularly at the lower levels of coherence, decreased response time and increased the average level of confidence. The results from detailed analyses using a motion energy analysis of the stimulus, firing rate of direction selective neurons in the MT, and behavioral data, are reported as consistent with a bounded evidence accumulation model where sensory information is accumulated as evidence until it reaches a threshold or stimulus offset. This determines the observed choice and response time. Confidence is assumed to reflect the probability the decision is correct given the level of evidence reached, though an important conditional here is that this mapping is not impacted by the so-called volatility of the stimulus.
Essential revisions:
While all reviewers were enthusiastic about the potential contribution, noting that it was timely and methodologically rigorous, they also raised some substantive issues that need to be addressed with further analysis. Individual reviewer comments are amalgamated below. You will note that some of the comments are redundant. We have kept the redundancy to reflect both the overlapping views, but also the different means that might be taken to address the concerns.
1) A major concern with the paper echoed by multiple reviewers (see related points 2-4 below) and reinforced in discussion is that for deriving confidence judgments there is a strong assumption that the decision maker does not adjust the mapping between the location of the DV and probability of a correct choice. Is this an assumption that the authors had before seeing the data, which seems hard to defend? If so it might be worth a few sentences defending why they assumed this. If not this is also fine, but this should also be made explicit.
2) Strong assumption of variability. There is a pretty strong assumption in the model that to account for the confidence data that "the subject has implicit knowledge of the mapping between the DV and probability correct, and does not adjust the mapping when a high volatility stimulus is shown" (Results). It seems, at first glance, consistent with the data, but inconsistent with the motivation behind the model itself. That is, how can confidence reflect the probability the decision is correct given the decision variable, but then that mapping is invariant to manipulations that in principle impact the probability of being correct? Choice accuracy was not significantly impacted, but the trends were certainly in the correct direction with volatility leading to a small drop in accuracy (see Figure 3). More justification for this assumption is needed. Theoretically this just doesn't seem to be consistent with the a priori principles of the model because in principle more variance should impact accuracy. Isn't it also the case that this assumption implies there should be no change between low and high volatility conditions in the average difference in the confidence given to correct vs incorrect choices? I suspect the difference in confidence between correct and incorrect choices (at least for the human data) is smaller in the high volatility condition.
One way to address this is with fitting an ideal observer model. Another way (which might lead to the same model) is to fit the same model but allowing the mapping to change and compare goodness of fits of the two models. It might just be that this data lead to this conclusion. I would also be curious to what degree confidence in corrects and confidence in incorrect is different, at least in humans. I think their model with the invariance assumption predicts that there should be no change between variability conditions in this difference score. I believe it would be informative to see this examined.
3) Methods: "We derived the mapping using the variance from the low volatility condition. We assume, however, that the same mapping supports confidence ratings (and PDW) in both volatility conditions." This is a problematic assumption at best and really needs a more thorough justification. It's tantamount to the assumption that the system is not estimating volatility on a trial by trial basis as an ideal observer would when high and low volatility trials are interleaved. As a result, it is unclear whether or not these results are a product of this assumption or a product of the task itself. We know from human psychophysics that we do estimate volatility (or stimulus variance) on a trial by trial basis and more or less rationally incorporate that information into our decision variables. In this context an ideal observer model should be associated with an accumulator that rises sublinearly as it accumulates evidence that this is a high volatility trial, while the opposite is true in low volatility trials. If there is not enough information to accurately estimate stimulus variability then the authors' assumption is valid. But it's not clear that this is the case. Moreover, even an ideal observer model may exhibit the same qualitative behavior as the authors' suboptimal model even when there is enough information to estimate variability accurately. It would be nice to know if this is the case, but regardless, some discussion of these issues would be a welcome addition to this work.
4) Moreover, if I am right about how you set your prior on c then it shouldn't be too hard to adapt the code you have already written to just make the sum in Equation 5 include the two volatility conditions and then just take the log and call this your decision variable. This decision variable will not be a linear sum of the momentary evidence. It will be sublinear when volatility is high and superlinear when volatility is low. This actually fits very nicely with your neural data which shows that in the high volatility condition the anti-preferred neurons slightly increase in activity.
5) On this point about fixed boundary, how important is it that the high and low volatility conditions are mixed? Would it have worked if they were blocked in different days? Weeks? I.e. how inflexible are the subjects in shifting from one set of bounds to another? Finding it in monkeys may be too much work, but can we do this in human subjects? It seems most plausible that in normal everyday circumstances, adding noise should make people slower and less confident, not the other way round. So these effects may be limited to the experimentally contrived conditions where subjects are overtrained on trials with different levels of noise mixed randomly.
6) A related but perhaps broader point, regarding confidence. If subjects adopt a fixed set of bounds/criteria, their decision mechanism seems decidedly suboptimal/heuristic. However, many prominent researchers now advocate the approach of first writing down the optimal mathematical definition of confidence, and then proceeding to find its correlates in the brain (cf Kepecs, Pouget, etc.). Given that confidence can be assessed in animals including rodents, an alternative approach is to empirically assess confidence which may not be optimal. In this context, don't the present findings give an important and stern warning to the "optimality" approach? This is a key issue that may influence the agenda of the field for years to come, and should be emphasized.
7) The neuronal recordings also add to the novelty beyond previous work, but much more detail is needed – right now it's almost like an afterthought. As noted by the authors, the basic conclusion that adding noise leads to increased confidence has already been shown. Can we plot the Fano factor of these neurons under different conditions? The impact of stimulus noise seems modest, but does it change pairwise noise correlation between neurons too? Even if it doesn't, what is the correlation to begin with, for these neurons? Since multiunit recording was performed in at least some of these neurons, we should have some idea? If pairwise noise correlation is high, it limits the efficiency of a readout mechanism (if such mechanism is to do anything resembling averaging of individual neuronal responses), and may thus mean that the readout is noisy too (because individual noise can't be efficiently averaged out). Again, I understand there are relatively few neurons here, but this is an important part of the results, going beyond previous studies, and should perhaps be mentioned in the Abstract, so people will know this is not just a psychophysics paper.
8) I had a hard time appreciating whether the more extreme confidence judgments were diagnostic of this particular model of choice, response time, and confidence, or if other models would also predict this result. For instance, Pleskac and Busemeyer's 2DSD model (assuming say a serial process of choice then confidence) would also predict higher average confidence at lower levels, but for a slightly different reason: the variability combined with a bounded scale would produce regressive-like effects. It may be the case the monkey data with post-decision wagering would speak against this, but it seems like a relevant discussion item.
Pleskac, T. J., & Busemeyer, J. R. (2010). Two-Stage Dynamic Signal Detection: A Theory of Choice, Decision Time, and Confidence. Psychological Review, 117, 864-901. doi:10.1037/A0019737
https://doi.org/10.7554/eLife.17688.025
Author response
Essential revisions:
While all reviewers were enthusiastic about the potential contribution, noting that it was timely and methodologically rigorous, they also raised some substantive issues that need to be addressed with further analysis. Individual reviewer comments are amalgamated below. You will note that some of the comments are redundant. We have kept the redundancy to reflect both the overlapping views, but also the different means that might be taken to address the concerns.
We thank the reviewers for their comments and the reviewing editor for compiling them. We have addressed all of the concerns, both in this response letter and in the revised manuscript which contains several new analyses and discussion points. We first describe our response to the major concern described in section 1 below, then touch on any additional aspects of this concern as they come up in subsequent comments.
1) A major concern with the paper echoed by multiple reviewers (see related points 2-4 below) and reinforced in discussion is that for deriving confidence judgments there is a strong assumption that the decision maker does not adjust the mapping between the location of the DV and probability of a correct choice. Is this an assumption that the authors had before seeing the data, which seems hard to defend? If so it might be worth a few sentences defending why they assumed this. If not this is also fine, but this should also be made explicit.
We assumed a fixed mapping between the DV and confidence because the volatility manipulation is fairly subtle (see Video 1) and the trial types were interleaved and uncued. Indeed, subjects were not told and had no reason to suspect that anything other than motion strength was being varied across trials, let alone that there were exactly two volatility conditions. Keep in mind that motion strength itself is a type of reliability, and we know from many lines of evidence that subjects do not immediately identify the motion strength in this task. For example, if they did, RT would be shortest for trials of 0% coherence because reward probability is 0.5 by definition, thus the sensible thing to do is guess quickly.
We also note that the primary goal of the study was not to evaluate the assumption of a fixed mapping, but to test a more fundamental prediction of the bounded evidence accumulation framework, namely the effect of manipulating noise independent of signal on choice, RT and confidence.
Nevertheless, we agree that the issue is important and deserves additional attention in the paper. To this effect, we have added the following analyses/figures:
A) We tested whether there is sufficient information in the stimulus to discriminate between volatility conditions. We did this by training a logistic regression model which had access to the mean and dispersion of motion energy (and their interaction) on each trial of the PDW task (subsection “Alternative models”, first paragraph; subsection “Motion Energy”, last paragraph). The logistic model was able to discriminate volatility conditions with 90% accuracy, suggesting that it is reasonable to ask whether subjects did in fact use this information to adjust their decision policy.
B) To determine whether they did, we fit three alternative models in which the volatility condition affects the mapping between DV and probability correct. These are explained in a new subsection of Results (“Alternative Models”, first paragraph). The first version simply assigns separate mappings to the two volatility conditions (derived from the variance of the momentary evidence in each condition). This is tantamount to assuming that subjects identify the volatility instantaneously. The second version permits volatility to be estimated gradually, and the third allows the termination criterion (bound height) to differ as well. A Bayesian model comparison showed that none of the alternative models was as well supported by the data as the common-mapping model (see Table 2 and new Figure 6).
C) We then went a step further and asked what an ideal observer would do given knowledge of the volatility of each trial. For instance, if errors are inexpensive it is conceivable that the best strategy in the high volatility condition is to respond rapidly, to hasten onset of the next trial. However, if errors are costly, for instance if they lead to a large timeout penalty, then the rational strategy for high volatility stimuli could be to increase the bound height to maintain high levels of accuracy even under high volatility. We used dynamic programming to derive the optimal policy (in the sense of maximizing reward rate) when the volatility of the evidence can differ between trials (see Appendix).
As described in detail in the new Appendix, we found that the optimal policy includes an increase in the height of the termination bound (Appendix 1—figure 1, top row), as well as the aforementioned change in the mapping between DV and probability correct. Interestingly, the added bound height is not enough to overcome the effect of noise on RT, such that an optimal decision maker would still show slightly faster RTs at low coherence in the high volatility condition (Appendix 1—figure 1, third row). However, the normative solution predicts lower confidence on high-volatility trials (bottom row), which we did not observe in our subjects. Again, as our main focus was not on the optimality (or lack thereof) of our subjects in this task, we situated the analysis of normative models in an appendix.
D) Lastly, given the assumption of a common mapping, there is the question of which mapping to use. In our original submission, we assumed that the mapping used for both volatility conditions was constructed based on the variance of the low-volatility condition. The motivation for this assumption was that subjects, especially the monkeys, had extensive experience with the low volatility condition before we introduced the high volatility condition. Given the subtlety of the manipulation, and the lack of a strong effect on accuracy, subjects may have had little impetus to conclude that task contingencies had changed when we introduced the volatility manipulation, and would therefore continue to rely on the mapping that they learned for the low volatility condition. Based on the reviewer’s comments, we now realize that this reasoning was unclear, as we did not specify the conditions under which the mapping is or is not expected to change, and how much experience is required for the mapping to change. Therefore, we now redefine the mapping to include both volatility conditions. The map was derived rationally, meaning that it was obtained by marginalizing over the two volatility conditions, as explained in Results (subsection “Effects of volatility on confidence”, first paragraph) and Methods (new equation 5). We believe this to be more parsimonious and more consistent with the extensive experience that our subjects eventually developed under the mixture of volatilities. We note, however, that the two models produce almost identical fits. The reason for this is better understood with Author response image 1, which is equivalent to the new Figure 6A but for three maps (instead of two): low volatility (dashed lines), high volatility (dotted lines) and mixture of high and low volatilities (solid lines). As can be observed in the maps, the iso-probability contours are approximately scaled versions of one another.
Because the criterion Φ that separates high from low confidence is a parameter that we fit to data, we cannot distinguish a low volatility map with a Φ of (say) 0.6 from a mixture map with a Φ of ~0.56.
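The discriminability check described in (A) can be illustrated with a minimal sketch. It is not the authors' code: the simulated motion-energy values, the coherence levels, the per-frame standard deviations (1.0 and 1.5) and all function names are assumptions chosen for illustration, and the accuracy it yields should not be compared with the 90% reported above. The sketch fits a logistic regression on the per-trial mean and dispersion of motion energy, plus their interaction, and scores held-out trials.

```python
import math
import random
import statistics


def simulate_trial(rng, high_vol, n_frames=40):
    """Return raw predictors for one synthetic trial (hypothetical generative model)."""
    coh = rng.choice([-0.256, -0.064, 0.0, 0.064, 0.256])  # assumed signed coherences
    sd = 1.5 if high_vol else 1.0                           # assumed per-frame SD
    energy = [rng.gauss(4.0 * coh, sd) for _ in range(n_frames)]
    mean_e = statistics.fmean(energy)
    disp_e = statistics.stdev(energy)
    return [mean_e, disp_e, mean_e * disp_e]                # mean, dispersion, interaction


def make_standardizer(rows):
    """Z-score each predictor with training-set statistics and prepend an intercept."""
    cols = list(zip(*rows))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) for c in cols]
    return lambda r: [1.0] + [(v - m) / s for v, m, s in zip(r, mu, sd)]


def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def fit_logistic(X, y, lr=0.5, n_iter=300):
    """Plain batch gradient descent on the logistic log-loss."""
    w = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * b for a, b in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / len(y)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def volatility_decoding_accuracy(seed=0, n_train=800, n_test=400):
    """Train on simulated trials and return held-out classification accuracy."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_train + n_test)]  # 1 = high volatility
    raw = [simulate_trial(rng, lab) for lab in labels]
    z = make_standardizer(raw[:n_train])
    X = [z(r) for r in raw]
    w = fit_logistic(X[:n_train], labels[:n_train])
    hits = sum(
        (sum(a * b for a, b in zip(w, xi)) > 0) == bool(yi)
        for xi, yi in zip(X[n_train:], labels[n_train:])
    )
    return hits / n_test
```

Calling `volatility_decoding_accuracy()` returns the fraction of held-out simulated trials whose volatility condition is classified correctly. In this toy setting almost all of the information is carried by the dispersion predictor, because the simulated mean does not depend on volatility.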
2) Strong assumption of variability. There is a pretty strong assumption in the model, needed to account for the confidence data, that "the subject has implicit knowledge of the mapping between the DV and probability correct, and does not adjust the mapping when a high volatility stimulus is shown" (Results). It seems, at first glance, consistent with the data, but inconsistent with the motivation behind the model itself. That is, how can confidence reflect the probability the decision is correct given the decision variable, but then that mapping is invariant to manipulations that in principle impact the probability of being correct? Choice accuracy was not significantly impacted, but the trends were certainly in the correct direction with volatility leading to a small drop in accuracy (see Figure 3). More justification for this assumption is needed. Theoretically this just doesn't seem to be consistent with the a priori principles of the model because in principle more variance should impact accuracy.
We appreciate the conundrum raised by the reviewer, and thank him/her for the opportunity to make an important point. First, any mechanism or criterion for establishing confidence in a perceptual decision must be invariant to at least one manipulation that affects probability correct, namely the difficulty or stimulus strength itself. If the brain were able to identify perfectly the difficulty level (motion coherence in our tasks), it would have no need to estimate probability correct from a DV, but would simply assign a level of confidence based on the actual probability correct experienced at that difficulty. In the extreme case, decisions on 0% coherence trials should all be rendered with the lowest possible confidence level (regardless of the DV), but this is not what happens. Volatility is simply another parameter of the visual stimulus that, as we have now shown, does not appear to affect the mapping of a DV to confidence (leaving aside the empirical observation that our volatility manipulation had very weak effects on accuracy, providing little incentive for the brain to adjust to it).
In short, the idea that confidence reflects an estimate of probability correct is not at all inconsistent – logically or empirically – with the mechanism being invariant to a stimulus manipulation that affects accuracy, especially one that is subtle and randomly interleaved. Nevertheless, the reviewer’s general concern about the assumption of an invariant mapping is valid, and we hope we have addressed it in the response to item #1 above.
Isn't it also the case that this assumption implies there should be no change between low and high volatility conditions in the average difference in the confidence given to correct vs incorrect choices? I suspect the difference in confidence between correct and incorrect choices (at least for the human data) is smaller in the high volatility condition.
We do not completely follow the reviewer’s intuition, but we performed the suggested analysis and found no clear indication of a greater difference in confidence – between correct and incorrect choices – for the low volatility condition. For each subject, coherence, and volatility condition, we computed the difference in confidence (saccadic endpoint) between correct and error trials. Author response image 2 shows the result of this analysis. Colors indicate different coherences, and shapes denote different subjects. We found no strong evidence that the difference in confidence between correct and error trials differs between the two volatility conditions. We note, however, that there are many caveats to this analysis that the current dataset cannot resolve. First, the analysis would require a larger number of trials to obtain a more reliable estimate of confidence on error trials, especially for the higher coherences. Second, the subjects reported confidence on an uncalibrated confidence scale, therefore it is difficult to map differences in this arbitrary scale to differences in subjective probabilities or degrees of belief. Given these caveats, we would rather not include this figure in the manuscript.
One way to address this is with fitting an ideal observer model. Another way (which might lead to the same model) is to fit the same model but allowing the mapping to change and compare goodness of fits of the two models. It might just be that this data lead to this conclusion. I would also be curious to what degree confidence in corrects and confidence in incorrect is different, at least in humans. I think their model with the invariance assumption predicts that there should be no change between variability conditions in this difference score. I believe it would be informative to see this examined.
We thank the reviewer for this very useful suggestion. We have now performed both analyses suggested by the reviewer: (1) fit several alternative models where the mapping is allowed to change (new Figure 6), and (2) derive an ideal observer model for our task (see new Appendix). See response to item #1 above for details. Regarding the issue of comparing the differences in confidence between correct and errors for both volatilities, please see the response to the previous point.
3) Methods: "We derived the mapping using the variance from the low volatility condition. We assume, however, that the same mapping supports confidence ratings (and PDW) in both volatility conditions." This is a problematic assumption at best and really needs a more thorough justification. It's tantamount to the assumption that the system is not estimating volatility on a trial by trial basis as an ideal observer would when high and low volatility trials are interleaved. As a result, it is unclear whether or not these results are a product of this assumption or a product of the task itself. We know from human psychophysics that we do estimate volatility (or stimulus variance) on a trial by trial basis and more or less rationally incorporate that information into our decision variables. In this context an ideal observer model should be associated with an accumulator that rises sublinearly as it accumulates evidence that this is a high volatility trial, while the opposite is true in low volatility trials. If there is not enough information to accurately estimate stimulus variability then the authors' assumption is valid. But it's not clear that this is the case. Moreover, even an ideal observer model may exhibit the same qualitative behavior as the authors' suboptimal model even when there is enough information to estimate variability accurately. It would be nice to know if this is the case, but regardless, some discussion of these issues would be a welcome addition to this work.
The concern raised by the reviewer’s quote of our text is that we used the variance from the low volatility condition to derive the mapping used in both conditions. Above, under item 1D, we explain our reasoning for doing this in the original submission, and that we have now changed the model to reflect a mixture of the two volatility conditions. However, we suspect this issue is secondary to the broader questions surrounding whether a common mapping is used at all: (i) whether there is enough information to estimate stimulus variability on a trial by trial basis (subsection “Alternative models”, first paragraph; subsection “Motion Energy”, last paragraph), (ii) whether our subjects exploited such information (subsection “Alternative Models”), and (iii) what an ideal observer ought to do (Appendix). We hope these aspects of the reviewer’s concern are fully addressed in our responses to item #1 and in the revised manuscript.
Regarding the point that “we know from human psychophysics that we do estimate volatility (or stimulus variance) on a trial by trial basis[…]”, we agree with the reviewer that there are situations where subjects do take into account the reliability of the evidence when making a decision. When reliability is easily discernible—as when the contrast of the stimulus is markedly reduced—observers may adjust their decision criteria in a rational manner.
However, our task is representative of a class of problems in which reliability is neither explicitly cued nor easily discernible. Under such circumstances, whether subjects would still try to infer the reliability of the evidence is an open question. Our data are consistent with the interpretation that extracting information about volatility from a stream of momentary evidence and using it to adjust the parameters of the decision process (e.g., the bound height or the mapping of DV to probability correct) is not an automatic process. This, however, does not imply that this volatility information will be ignored if more easily available or if ignoring volatility has more severe consequences, as we address in Discussion (fourth paragraph).
Regarding the point about sub- and supralinear accumulation, we are aware that this is required to obtain Bayes-optimal solutions for decisions when the reliability of the evidence changes dynamically within a trial. Our situation is different because the noisiness in the momentary evidence is constant within a trial, and therefore scaling the evidence by the noise would be equivalent to a change in bound height. It is conceivable that subjects could have estimated the volatility of the evidence online in order to adjust the weight of the evidence, but we found no support for this hypothesis, as discussed in the new section on Alternative Models.
The new analysis of the neural data – which we added to address another suggestion by the reviewers – also argues against this interpretation. If there is a different adjustment of the weights of the evidence as a function of time for trials of low and high volatility, then the covariation between the neural responses and behavior should have different time courses for trials of low and high volatility. For instance, in the low volatility condition, late evidence should be weighted more because the monkey would have more certainty about the trial’s volatility condition after observing many samples of evidence. This prediction is not supported by the data. We found that the temporal profiles of the choice-conditioned firing rate residuals were similar for low and high volatilities, as can be seen in the new Figure 7A.
4) Moreover, if I am right about how you set your prior on c then it shouldn't be too hard to adapt the code you have already written to just make the sum in Equation 5 include the two volatility conditions and then just take the log and call this your decision variable. This decision variable will not be a linear sum of the momentary evidence. It will be sublinear when volatility is high and superlinear when volatility is low. This actually fits very nicely with your neural data which shows that in the high volatility condition the anti-preferred neurons slightly increase in activity.
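The equivalence between a constant evidence weight and a change in bound height can be written out in a few lines. The notation is introduced here for illustration and is not taken from the manuscript: $e_i$ is the momentary evidence, $V_n$ the accumulated DV, $\pm B$ the termination bounds, and $w > 0$ a weight that is constant within a trial (for example $w = 1/\sigma$, with $\sigma$ the standard deviation of the noise).

```latex
V_n = \sum_{i=1}^{n} e_i, \qquad
V'_n = \sum_{i=1}^{n} w\, e_i = w\, V_n, \qquad
\lvert V'_n \rvert \ge B \iff \lvert V_n \rvert \ge \frac{B}{w}.
```

With $w = 1/\sigma$, terminating the weighted accumulation at $\pm B$ therefore yields the same choices and decision times as terminating the unweighted accumulation at $\pm \sigma B$. The identity holds only when $w$ is fixed for the whole trial; a weight that changes while volatility is being estimated within the trial does not reduce to a single change of bound.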
We hope we have clarified this issue above. Equation 5 has now been changed. Before, it conveyed the dependency of confidence (probability correct) on DV and time for the low volatility condition. In the revised manuscript the dependency reflects both volatility conditions – that is, a mixture with equal weighting from both, as the reviewer suggests.
Regarding the sublinear/superlinear weighting of the momentary evidence to construct the DV, we believe the reviewer is suggesting a model in which the decision maker (or brain) estimates the volatility of the momentary evidence on the fly. As indicated in the previous response, we now consider a related model in which the information about the volatility condition develops gradually during the trial, and found no support for the proposal. The exploration of normative models confirms the reviewers’ intuition that it would be ideal to ascertain volatility on the fly and to adjust policy accordingly, but the same normative models predict that confidence would be reported as lower under high volatility, which is not the case in our data.
Regarding the slight increase in activity for the anti-preferred direction in the high volatility condition, this is explained by the rectification of the input-output response of MT/MST neurons. In the high volatility condition, for the lowest coherences the motion can reverse direction in some frames. Therefore, even if the manipulation is symmetric in terms of the dot displacement, it is not symmetric when pushed through the nonlinear input-output response of real neurons (as shown in the inset of Figure 2C).
5) On this point about fixed boundary, how important is it that the high and low volatility conditions are mixed? Would it have worked if they were blocked in different days? Weeks? I.e. how inflexible are the subjects in shifting from one set of bounds to another? Finding it in monkeys may be too much work, but can we do this in human subjects? It seems most plausible that in normal everyday circumstances, adding noise should make people slower and less confident, not the other way round. So these effects may be limited to the experimentally contrived conditions where subjects are overtrained on trials with different levels of noise mixed randomly.
We do not know how subjects would have responded if high and low volatility conditions were blocked over days or weeks. However, it is interesting to note that even the alternative model that presumes knowledge of volatility (and the appropriate mapping) predicts higher confidence at the weak motion strengths (notice that the red curve in Figure 6B is above the blue; ignore the misses in the fits). This is because the increased noise in the high-volatility condition makes it more probable that the DV will reach the space normally occupied by the higher coherences. The same intuition explains our observation that, despite increasing the bound height on high volatility trials, an ideal observer still makes faster decisions on those trials, at least when motion strength is weak (see Appendix 1—figure 1, 3rd row). There certainly may be circumstances where a noise manipulation leads to slower and less confident decisions, in accordance with the reviewer’s intuition. But it is not obvious why such circumstances should be deemed more natural or general than the conditions of our experiment, in which reliability can vary unpredictably.
6) A related but perhaps broader point, regarding confidence. If subjects adopt a fixed set of bounds/criteria, their decision mechanism seems decidedly suboptimal/heuristical. However, many prominent researchers now advocate the approach of first writing down the optimal mathematical definition of confidence, and then proceed to find its correlates in the brain (cf Kepecs, Pouget, etc.). Given that confidence can be assessed in animals including rodents, an alternative approach is to empirically assess confidence which may not be optimal. In this context, don't the present findings give important and stern warning to the "optimality" approach? This is a key issue that may influence the agenda of the field for years to come, and should be emphasized.
The reviewer raises an important point, bearing on a larger set of ideas, but we are not keen on editorializing in this paper. While our results suggest that participants did not use information about volatility to adjust the DVconfidence mapping or the termination bounds, we cannot claim that this is necessarily suboptimal. To make this claim, we would need to know the cost in time and effort of estimating the volatility online to use it to adjust the parameters of the decision making process. So we would prefer to remain uncommitted on the optimality issue. We now mention this point explicitly in the Results section: "[…] we do not know if our subjects performed suboptimally or if they were simply unable to identify the volatility conditions without adding additional costs (e.g., effort and/or time). "
7) The neuronal recordings also add to the novelty beyond previous work, but much more details are needed – right now it's almost like an afterthought. As noted by the authors, the basic conclusion that adding noise leads to increased confidence has already been shown. Can we plot the fano factor of these neurons at different conditions? The impact of stimulus noise seems modest, but does it change pairwise noise correlation between neurons too? Even if it doesn't, what is the correlation to begin with, for these neurons? Since multiunit recording was performed in at least some of these neurons, we should have some idea? If pairwise noise correlation is high, it limits the efficiency of a readout mechanism (if such mechanism is to do anything resembling averaging of individual neuronal responses), and may thus mean that the readout is noisy too (because individual noise can't be efficiently averaged out). Again, I understand there are relatively few neurons here, but this is an important part of the results, going beyond previous studies, and should perhaps be mentioned in the Abstract, so people will know this is not just a psychophysics paper.
We thank the reviewer for this suggestion. As we now discuss in the text (subsection “Choice and confidencepredictive fluctuations in MT/MST activity”, first paragraph), the main goal of the physiological recordings was to verify that the manipulation had the desired effect on the mean and the variance of the firing rate of neurons which presumably represent the momentary evidence in this task. We now mention this finding in the Abstract. The result of these analyses were summarized in Figure 2, an important prerequisite for the psychophysics and modeling that form the bulk of the paper. It was not an afterthought, but the data were acquired with this purpose and powered appropriately. That said, we agree with the reviewer that the data deserve further analysis. Accordingly, we added a short section in Results and a new figure that displays the time course over which the neural response informs confidence as assayed in the PDW task. Previous studies have shown correlations between the fluctuations of neural activity in MT/MST and directional choices in tasks similar to the one studied here. We now show that fluctuations in neural activity are also informative of whether the monkey will waive the sure target if available (new Figure 7). The spikes are informative about PDW at the same time that they are informative about motion direction and choice. The observation is consistent with the idea that choice and confidence are influenced by the earliest samples of motion information.
As suggested by the reviewer, we computed the Fano factor for the different conditions of motion coherence and volatility. For each neuron, we computed the mean and variance of the spike counts in a 100 ms counting window (100–200 ms after motion onset). For this analysis, we only included single neurons and trials in which the motion stimulus was presented for at least 150 ms. We then computed the variance-to-mean ratio for each neuron. The means (± SEM) are displayed in Author response image 3. The figure shows that (i) the average Fano factor was above 1 for both volatility conditions and for all coherences, (ii) the Fano factor tends to increase with the strength of motion in the preferred direction, and (iii) there is an influence of volatility on the Fano factor, which is on average greater when the volatility is high.
We believe, however, that this analysis does not add much to the conclusions that can be derived from the analysis of main Figure 2, and therefore we prefer not to include it in the main text. Further, while the Fano factor is usually considered to be a characterization of the internal noise, here it includes contributions from both the internal noise and the fluctuations in the stimulus. Indeed, the measured variance-to-mean ratio is probably never a proper characterization of purely internal or external noise, or spike generation given a latent rate (see Shadlen & Newsome, 1998; Churchland et al., 2011; Nawrot et al., 2008), which is why Fano factor is a misnomer as it is commonly applied. Therefore, the increase in “Fano factor” with the motion coherence and with the volatility of the stimulus may mainly (but not entirely) reflect that the motion stimulus had higher variance in these conditions.
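To make the variance-to-mean computation described above concrete, the following is a minimal sketch, not the analysis code used for Author response image 3. The function names (`fano_factor`, `simulate_trials`), the simulated firing rates and the trial counts are all hypothetical. The simulation assumes Poisson spiking whose rate is constant within a trial but varies across trials, which is only a crude stand-in for stimulus volatility. It illustrates the point made in the preceding paragraph: fluctuations in the driving rate inflate the measured ratio even when spike generation itself is unchanged.

```python
import numpy as np


def fano_factor(spike_times_per_trial, t_start=0.100, t_stop=0.200):
    """Variance-to-mean ratio of spike counts in [t_start, t_stop).

    spike_times_per_trial: list of 1-D arrays of spike times in seconds,
    measured from motion onset (one array per trial).
    """
    counts = np.array([
        np.count_nonzero((np.asarray(t) >= t_start) & (np.asarray(t) < t_stop))
        for t in spike_times_per_trial
    ])
    mean = counts.mean()
    if mean == 0:
        return np.nan  # ratio is undefined when no spikes fall in the window
    return counts.var(ddof=1) / mean


def simulate_trials(rng, n_trials, mean_rate_hz, rate_sd_hz, duration=0.3):
    """Hypothetical generator: Poisson spikes with a trial-to-trial varying rate.

    Assumption: the rate is fixed within a trial and drawn from a normal
    distribution (clipped at zero) across trials.
    """
    rates = np.clip(rng.normal(mean_rate_hz, rate_sd_hz, n_trials), 0.0, None)
    trials = []
    for rate in rates:
        n_spikes = rng.poisson(rate * duration)
        trials.append(np.sort(rng.uniform(0.0, duration, n_spikes)))
    return trials


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low = fano_factor(simulate_trials(rng, 5000, 40.0, 0.0))    # no rate fluctuation
    high = fano_factor(simulate_trials(rng, 5000, 40.0, 20.0))  # fluctuating rate
    print(f"constant rate: {low:.2f}, fluctuating rate: {high:.2f}")
```

In this toy setting the constant-rate condition gives a ratio near 1, as expected for a Poisson process, whereas the fluctuating-rate condition gives a clearly larger value. The recorded neurons had ratios above 1 in all conditions, so the simulation should be read only as an illustration of how external variance enters the measure.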
8) I had a hard time appreciating whether the more extreme confidence judgments were diagnostic of this particular model of choice, response time, and confidence, or if other models would also predict this result. For instance, Pleskac and Busemeyer's 2DSD model (assuming, say, a serial process of choice and then confidence) would also predict higher average confidence at lower levels, but for a slightly different reason: the variability, combined with a bounded scale, would produce regression-like effects. It may be that the monkey data with post-decision wagering speak against this, but it seems like a relevant discussion item.
Pleskac, T. J., & Busemeyer, J. R. (2010). Two-Stage Dynamic Signal Detection: A Theory of Choice, Decision Time, and Confidence. Psychological Review, 117, 864–901. doi:10.1037/A0019737
The model of Pleskac and Busemeyer was developed for the specific situation in which choice and confidence are reported sequentially. Under this circumstance, we (and others) have shown that late information can inform confidence but not choice (e.g., van den Berg et al., eLife 2016). This situation, however, does not apply to our tasks. In the human confidence task, the choice and the confidence are reported simultaneously. Both were based on short stimulus durations, and subjects used all information to guide their choices (i.e., adding termination bounds did not improve the quality of the fits). This means that there is no “late evidence” that can be used to inform confidence but not choice.
Further, the new analysis of the neurophysiological data from the PDW task (Figure 7) suggests that the information bearing on choice and the information bearing on confidence are contemporaneous, if not one and the same, as suggested by a recent microstimulation study from our group (Fetsch et al., 2014).
https://doi.org/10.7554/eLife.17688.026
Article and author information
Author details
Funding
Howard Hughes Medical Institute
 Ariel Zylberberg
 Christopher R Fetsch
 Michael N Shadlen
Human Frontier Science Program
 Michael N Shadlen
National Eye Institute (R01 EY11378)
 Ariel Zylberberg
 Christopher R Fetsch
 Michael N Shadlen
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This research was supported by the Howard Hughes Medical Institute, the Human Frontier Science Program and the National Eye Institute (R01 EY11378). We thank Mariano Sigman, Luke Woloszyn and Daniel Wolpert for helpful discussions, and NaYoung So for comments on the manuscript.
Ethics
Human subjects: The institutional review board of Columbia University (protocol #IRBAAAL0658) approved the experimental protocol, and subjects gave written informed consent.
Animal experimentation: This study was performed in accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. All of the animals were handled according to approved institutional animal care and use committee (IACUC) protocols (ACAAAE9004) of Columbia University.
Reviewing Editor
 Michael J Frank, Brown University, United States
Publication history
 Received: May 11, 2016
 Accepted: September 29, 2016
 Version of Record published: October 27, 2016 (version 1)
Copyright
© 2016, Zylberberg et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
 Page views: 3,389
 Downloads: 661
 Citations: 22
Article citation count generated by polling the highest count across the following sources: Scopus, PubMed Central, Crossref.