Abstract
Decisions are accompanied by a degree of confidence that a selected option is correct. A sequential sampling framework explains the speed and accuracy of decisions and extends naturally to the confidence that the decision rendered is likely to be correct. However, discrepancies between confidence and accuracy suggest that confidence might be supported by mechanisms dissociated from the decision process. Here we show that this discrepancy can arise naturally because of simple processing delays. When participants were asked to report choice and confidence simultaneously, their confidence, reaction time and a perceptual decision about motion were explained by bounded evidence accumulation. However, we also observed revisions of the initial choice and/or confidence. These changes of mind were explained by a continuation of the mechanism that led to the initial choice. Our findings extend the sequential sampling framework to vacillation about confidence and invite caution in interpreting dissociations between confidence and accuracy.
https://doi.org/10.7554/eLife.12192.001
eLife digest
To understand how the brain makes decisions is to understand how we think – how we deal with information, interpret it and agree with a particular interpretation of the information. Neuroscience has begun to uncover the mechanisms that underlie these processes by linking the activity of nerve cells in the brain to different aspects of making decisions. These include how long it takes to reach a decision, why we make errors and how confident we feel about a decision.
Sometimes when we make a decision and have committed to an answer, we then change our minds. Now, van den Berg et al. have asked whether the brain mechanisms that support a change of mind also support a change in confidence. To investigate this problem, human volunteers were asked to perform a difficult task where they had to decide whether a field of randomly moving dots had a tendency to drift to the left or to the right.
During the experiment, van den Berg et al. recorded how long the volunteers took to make their decision, how confident the volunteers felt about their choice, and whether they were correct. Analyzing this data revealed that all of these measures could be explained by a mechanism where the brain accumulates evidence only until there appears to be enough evidence to favor one choice over the other. This process specifies how confident an individual should be based on the quality of the sensory evidence and how long it takes to make a decision.
In addition, van den Berg et al. found that occasionally a volunteer changed their mind about how confident they were about a decision after they’d made it, as if they had continued to think about it. This was despite the volunteers receiving no more information about the task or how well they had done once they had made their decision. Therefore, it appears that the brain processed additional information that had already been detected but did not have time to affect the initial choice.
The activity of the nerve cells in the brain was not recorded as the volunteers made their decisions. Future experiments that incorporate these measurements could help reveal how the brain performs the necessary computations and account for the time delay seen in processing some of the data. Where is this delayed information processed in the brain, and how does it lead to a change of mind?
https://doi.org/10.7554/eLife.12192.002
Introduction
Many decisions benefit from the acquisition of multiple samples of evidence acquired sequentially in time. In that case, a decision maker must decide not only about the proposition in question but also about when to terminate deliberation. The ensuing tradeoff between speed and accuracy is explained by sequential sampling with optional stopping models in which evidence is accumulated to some stopping criterion or bound (Link, 1975; Ratcliff and Rouder, 1998). The mechanism receives experimental support from human psychophysics and neural recordings in monkeys and rats (Gold and Shadlen, 2007; Brunton et al., 2013; Shadlen and Kiani, 2013; Hanks et al., 2015). The same framework also explains the confidence that a decision is correct (Kiani and Shadlen, 2009; Kiani et al., 2014a). This is because the quantity that is accumulated, termed a decision variable (DV), when combined with elapsed decision time, maps to the probability that a decision rendered on its value will be correct (Kiani and Shadlen, 2009; Drugowitsch et al., 2014). The attribution of confidence is important for guiding subsequent decisions, learning from mistakes and exploring alternatives. Thus, when the decision maker terminates deliberation, the choice is accompanied by a degree of certainty (i.e., confidence), based on the same stream of evidence that supported that decision (Fetsch et al., 2014).
This last point remains controversial, however, for there are many instances when the confidence in a decision and the decision itself are dissociable. For example, human decision makers tend to overestimate their certainty about choices based on truly ambiguous evidence (Fischoff et al., 1982; Baranski and Petrusic, 1994; Erev et al., 1994; Drugowitsch et al., 2014; Kiani et al., 2014a), and they can perform above chance level yet report they are guessing (Kunimoto et al., 2001). These and other observations have led psychologists to suggest that confidence and choice may be guided by different sources of evidence (Pleskac and Busemeyer, 2010; Zylberberg et al., 2012; Moran et al., 2015), or that the evaluation of the same evidence differs fundamentally in the way that it affects choice and confidence (Fleming and Dolan, 2012; Maniscalco and Lau, 2012; De Martino et al., 2013; Ratcliff and Starns, 2013). The latter distinction is captured by the notion of a 1st-order confidence that is based rationally on the evidence in support of the decision and a 2nd-order confidence that can depart from this evidence. As this distinction rests on a proper understanding of the mechanism that supports choice and confidence, it is possible that some of the observations taken as support for higher-order explanations of confidence simply reflect deficiencies in the theory of 1st-order choices.
Naturally, if a decision maker acquires additional information after committing to a choice, she might wish to revise a decision, or the confidence in that decision, or both. Such changes of mind occur occasionally, even when there appears to be no additional information available after the initial decision has been rendered. The sequential sampling framework (e.g., bounded evidence accumulation) offers a natural account of this phenomenon, because the mechanism incorporates processing delays, which leave open the possibility that the brain might have access to additional evidence that did not influence the initial decision and which might instead influence a revision. Evidence for such a process was adduced to explain reversals of perceptual decisions in humans and monkeys (Rabbitt, 1966; Rabbitt and Vyas, 1981; McPeek et al., 2000; Caspi and Beutter, 2004; Van Zandt and Maldonado-Molina, 2004; Resulaj et al., 2009; Burk et al., 2014; Kiani et al., 2014b; Moher and Song, 2014). Here, we address the possibility that this same mechanism can account for a revision in the confidence a decision maker assigns to his or her choice. We hypothesized that such revisions might account for an apparent dissociation between degree of confidence and choice.
We asked humans to decide about the net direction of motion in a dynamic random dot display, using a variety of difficulty levels. They simultaneously indicated both their choice and the confidence in that choice by moving a handle with their arm. We show that choice, confidence, and reaction time are explained by a sequential sampling mechanism operating on a common evidence stream. On a small fraction of trials, subjects changed their initial decision about the direction of motion and, more frequently, about their confidence. We show that confidence and choice are informed by the same evidence, both at the initial choice and on any subsequent revision. However, changes of confidence arising through post-decision processing can contribute to an apparent dissociation between confidence and decision.
Results
Four subjects performed the motion direction discrimination task illustrated in Figure 1a. The stimulus duration was controlled by the subjects (Figure 1b), who were trained to make fast but accurate decisions (see Materials and methods). Whenever ready, the subject moved a handle from the home position to one of four choice targets, thereby indicating simultaneously the direction decision and confidence level. On most trials subjects made an approximately straight-line movement to one of the choice targets. However, although the random dot stimulus was extinguished on movement onset, on a proportion of trials subjects changed their choice or confidence rating or both, initially reaching towards one target but deviating to reach another (Figure 1c). Our central hypothesis is that a common mechanism explains the rate of these occurrences, as well as the speed, accuracy and confidence in the subjects’ initial decisions.
Initial decisions
We wish to account for the effect of stimulus difficulty on the initial direction choice, the confidence level associated with the choice and the time taken to make the decision. Not surprisingly, stronger motion supported more accurate (Figure 2a) and faster (Figure 2b) decisions, which also tended to be assigned high confidence (Figure 2c). The interplay between confidence, accuracy and reaction time can be appreciated from the breakdown of the data within the panels of Figure 2 (indicated by color). When subjects were more confident, they tended to be more accurate (Figure 2a; P<0.0001 for all subjects) and faster (Figure 2b; P<0.0001 for all subjects). In addition, subjects reported high confidence more often on correct choices than on errors (Figure 2c), and they tended to be more confident on those errors associated with stronger motion (P<1e-5 for 3 subjects and P = 0.50 for S3).
These regularities are consistent with a common mechanism of bounded evidence accumulation to support choice, reaction time and confidence (Kiani et al., 2014a). The process involves a race between a mechanism that accumulates evidence for right, and against left, and a process that integrates evidence for left, and against right (Usher and McClelland, 2001; Mazurek et al., 2003). Since these processes obtain evidence originating in the same stimulus, they will tend to accumulate noisy evidence of opposite sign (i.e., anticorrelated). However, the neural noise associated with the accumulations need not be perfectly anticorrelated, so they are depicted separately (Figure 3a). The first process to reach an upper bound determines the choice and decision time.
The state of both the winning and losing processes, as well as decision time (Figure 3b shows the races on the same plot), confers an expectation that a decision rendered on the evidence is likely to be correct, what we term confidence or belief. Since the winning process is at a fixed bound, the confidence is adequately summarized by the height of this bound, the decision time, and the state of the losing accumulator. Thus the degree of belief is based on the balance of evidence (i.e., the difference) between winning and losing DVs, as well as on the decision time. This model, which has been tested in neurophysiology and in a previous study in humans (Kiani et al., 2014a), extends ideas from signal detection theory, race, and Bayesian models (Vickers, 1979; Kepecs et al., 2008; Maniscalco and Lau, 2012; Drugowitsch et al., 2014). Its main distinction is the recognition that the time to decision influences confidence (Fetsch et al., 2014; Kiani et al., 2014a). For example, this explains why confidence associated with errors increases with stronger coherences: stronger motion induces faster decisions on correct and error trials alike, and, for a given balance of evidence, shorter decision times map to higher confidence. For simplicity, we assumed that subjects adopt a consistent criterion on “degree of belief” (log odds, Figure 3b) to decide in favor of high or low confidence.
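A minimal simulation sketch may help make the mapping from (decision time, balance of evidence) to confidence concrete. It assumes unit-variance noise per second, a drift of kappa × coherence, a bound of 1, a noise correlation of −1/√2 between the two accumulators, and a Monte Carlo estimate of the log-odds map built only from rightward trials at the nonzero coherences with equal weight. These values and the function names simulate_race and log_odds_map are illustrative assumptions, not the fitted model described in Materials and methods. Under this sketch, a trial would be rated high confidence when the map value in its (time, balance) bin exceeds a criterion θ.

```python
import numpy as np

def simulate_race(coh, n_trials, kappa=15.0, bound=1.0, rho=-1 / np.sqrt(2),
                  dt=0.002, t_max=4.0, seed=0):
    """Race between a 'right' and a 'left' accumulator with correlated noise.

    Returns choice (+1 right, -1 left, 0 if no bound was reached by t_max),
    decision time (s) and the state of the losing accumulator at commitment.
    """
    rng = np.random.default_rng(seed)
    mu = kappa * coh                        # assumed: drift proportional to coherence
    x = np.zeros((n_trials, 2))             # column 0: right DV, column 1: left DV
    choice = np.zeros(n_trials, dtype=int)
    t_dec = np.full(n_trials, np.nan)
    losing = np.full(n_trials, np.nan)
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, int(round(t_max / dt)) + 1):
        n = int(active.sum())
        if n == 0:
            break
        z = rng.standard_normal((n, 2))
        noise_r = z[:, 0]
        noise_l = rho * z[:, 0] + np.sqrt(1 - rho**2) * z[:, 1]  # corr(noise_r, noise_l) = rho
        x[active, 0] += mu * dt + np.sqrt(dt) * noise_r
        x[active, 1] += -mu * dt + np.sqrt(dt) * noise_l
        hit_r = active & (x[:, 0] >= bound) & (x[:, 0] >= x[:, 1])
        hit_l = active & (x[:, 1] >= bound) & (x[:, 1] > x[:, 0])
        choice[hit_r], choice[hit_l] = 1, -1
        losing[hit_r], losing[hit_l] = x[hit_r, 1], x[hit_l, 0]
        t_dec[hit_r | hit_l] = step * dt
        active &= ~(hit_r | hit_l)
    return choice, t_dec, losing

def log_odds_map(cohs=(0.032, 0.064, 0.128, 0.256, 0.512), n_per_coh=3000,
                 kappa=15.0, bound=1.0, t_edges=np.linspace(0, 2, 11),
                 b_edges=np.linspace(0, 3, 7)):
    """Monte Carlo estimate of log odds(correct | decision time, balance of evidence)."""
    t_all, bal_all, cor_all = [], [], []
    for i, c in enumerate(cohs):            # rightward motion, so choice == +1 is correct
        ch, t, lose = simulate_race(c, n_per_coh, kappa, bound, seed=i)
        done = ch != 0
        t_all.append(t[done])
        bal_all.append(bound - lose[done])  # the winner sits (approximately) at the bound
        cor_all.append((ch[done] == 1).astype(float))
    t = np.clip(np.concatenate(t_all), t_edges[0], t_edges[-1])
    bal = np.clip(np.concatenate(bal_all), b_edges[0], b_edges[-1])
    cor = np.concatenate(cor_all)
    n_cor, _, _ = np.histogram2d(t, bal, bins=[t_edges, b_edges], weights=cor)
    n_all, _, _ = np.histogram2d(t, bal, bins=[t_edges, b_edges])
    p = (n_cor + 1) / (n_all + 2)           # Laplace smoothing keeps empty bins finite
    return np.log(p / (1 - p))              # rows: time bins, columns: balance bins
```

The map is indexed by time bin and balance-of-evidence bin, so the same balance can yield different confidence at different decision times, which is the property the text emphasizes.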
The smooth curves in Figure 2 are fits of the model (4 parameters; see Materials and methods), which capture the main features of the subjects’ choices. Naturally, subjects differed in their sensitivity, speed-accuracy tradeoff and confidence, which were accounted for by the fitted parameters (Table 1). They also differed in their criteria for categorizing degree of belief into high and low, which can be appreciated by the separation of choice functions by confidence report. To maximize the number of points, given the reward structure (see Materials and methods), subjects should choose high confidence if they believe the probability of a correct choice is greater than two-thirds, which corresponds to a log-odds of ln 2 ≈ 0.69. Interestingly, the inferred criteria for all the subjects were close to this optimal criterion, although all subjects were somewhat risk-averse (probability thresholds of 0.71, 0.78, 0.71 and 0.75; thresholds in log-odds units shown in Table 1). There are some noteworthy discrepancies between model and data (e.g., the proportion of high-confidence choices at 0% coherence for 3 of 4 subjects), but the model captures the trend in the confidence ratings on error trials mentioned above (3 of 4 subjects; Figure 2c), even though the fits themselves are dominated by the more numerous correct choices. The model is vastly superior to two alternative formulations, which base confidence on either the balance of evidence or the deliberation time, but not both (compared to the original model, the log likelihood is reduced by 63–1180 and 2301–3749 across participants for these models, respectively; see Materials and methods).
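The two-thirds criterion follows from comparing the expected points of the two confidence options. The sketch below uses a generic payoff table; the example values (+2/−3 for high confidence, +1/−1 for low confidence) are hypothetical numbers chosen only because they produce a break-even point of 2/3, and the actual reward structure is the one given in Materials and methods. It also converts the reported probability thresholds to log-odds, which all lie above ln 2, consistent with mild risk aversion.

```python
import math

def break_even_probability(win_high, lose_high, win_low, lose_low):
    """P(correct) at which 'high' and 'low' confidence earn equal expected points.

    Solves p*win_high + (1-p)*lose_high = p*win_low + (1-p)*lose_low for p.
    """
    return (lose_low - lose_high) / ((win_high - win_low) + (lose_low - lose_high))

def log_odds(p):
    """Convert a probability of being correct into log-odds units."""
    return math.log(p / (1.0 - p))

# Hypothetical payoffs (NOT the study's table): high +2/-3, low +1/-1.
p_star = break_even_probability(2, -3, 1, -1)   # 2/3
theta_opt = log_odds(p_star)                   # ln 2, about 0.693

# Probability thresholds inferred for the four subjects (from the text above).
reported = [0.71, 0.78, 0.71, 0.75]
reported_log_odds = [log_odds(p) for p in reported]
```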
Importantly, the fits of the initial choices allow us to characterize the state of the DV—including decision time—leading to the subjects’ choice and the mapping of this DV to confidence. This is the starting point to address the possibility of revising the initial choice and/or confidence rating afterwards. The critical assumptions are (i) the same information was used to reach an initial choice about direction and confidence, and (ii) there is additional information available to the decision maker after committing to a choice and confidence category, despite the disappearance of the random dot motion upon response initiation.
Both assumptions are supported by the model-free analyses depicted in Figure 4. We calculated psychophysical kernels using the information from the stochastic motion displays to determine the time frame over which information in the stimulus affected both components of the choice. To do this we used only weak motion strengths and examined the residual motion energy after removing the means associated with motion direction and strength (see Materials and methods). The traces in Figure 4a show the averages of these residuals grouped by choice (Figure 4a, top) and grouped by confidence rating (Figure 4a, bottom). The similar time course of the confidence and choice kernels implies that the initial direction and confidence choices were supported by a common evidence stream. Moreover, by discounting the lag and smoothing introduced by the motion filter (inset), it is apparent that subjects relied on information from the beginning of the display until ~400 ms before movement onset (arrows; see Materials and methods and Figure 4—figure supplement 1) to guide both their direction choices and their confidence. These values are consistent with the estimates of non-decision time (t_{nd}) obtained from the model fits to the initial choices and RTs (Table 1). We next evaluate our hypothesis that some of the additional ~400 ms of information, which did not inform the initial direction and confidence choices, affects changes of mind about direction, confidence or both.
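A minimal sketch of the residual motion-energy analysis, assuming motion energy has already been computed per trial and aligned to a common event. The helper name psychophysical_kernel and its optional flip argument (for example, −1 on leftward-choice trials so that a confidence kernel measures evidence for the chosen direction) are assumptions for illustration. In the synthetic example, only the first 20 of 40 samples influence the choice, mimicking stimulus information that arrives too late to affect the report.

```python
import numpy as np

def psychophysical_kernel(energy, coh, report, flip=None):
    """Mean residual motion energy on trials where report is True vs. False.

    energy : (n_trials, n_samples) motion-energy traces, aligned to a common event
    coh    : (n_trials,) signed motion strength (direction and strength)
    report : (n_trials,) boolean, e.g. choice == right or confidence == high
    flip   : optional (n_trials,) +1/-1 sign applied to each trial's residual
    """
    energy = np.asarray(energy, dtype=float)
    coh = np.asarray(coh)
    report = np.asarray(report, dtype=bool)
    resid = np.empty_like(energy)
    for c in np.unique(coh):               # remove the mean for each direction/strength
        idx = coh == c
        resid[idx] = energy[idx] - energy[idx].mean(axis=0)
    if flip is not None:
        resid *= np.asarray(flip, dtype=float)[:, None]
    return resid[report].mean(axis=0), resid[~report].mean(axis=0)

# Synthetic example: choices depend only on the first 20 of 40 samples.
rng = np.random.default_rng(0)
n_trials, n_samples = 4000, 40
coh = rng.choice([-0.032, 0.0, 0.032], size=n_trials)
energy = 5 * coh[:, None] + rng.standard_normal((n_trials, n_samples))
chose_right = energy[:, :20].sum(axis=1) > 0
k_right, k_left = psychophysical_kernel(energy, coh, chose_right)
```

In this toy case the rightward kernel is positive over the first 20 samples and flat afterwards, which is the signature used in the text to locate the end of the epoch that informed the initial report.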
Changes of mind
After indicating their initial choices, subjects occasionally changed the direction of their hand movement to indicate a different direction or level of confidence or both (Table 2 and Figure 5). The frequency of these changes of mind varied between subjects, from 2.0 to 8.8% of trials. Changes of decision were more likely if the initial decision was an error than if it was correct (Figure 5a). These corrections were more likely if the motion information was stronger. Of course, errors were less frequent with stronger motion and less frequent than correct responses (Figure 2a). For three of the four subjects, changes of decision corrected an initial error more often than they spoiled an initially correct choice (P<0.001, P<0.001, P = 0.084, P<0.001), consistent with previous reports (Resulaj et al., 2009; Burk et al., 2014).
The novel insights from the present study derive from the changes in the confidence ratings (Figure 5b). For all subjects, changes of confidence were more frequent than changes of decision (Table 2 and Figure 5c; P<1e-4 for all participants). Further, changes of confidence were more likely if the initial decision was low confidence than if it was high confidence (Figure 5b). These changes to high confidence were more likely when the initial low-confidence decision accompanied stronger motion. Note that Figure 5a shows the conditional probability of changing the decision about direction, given that the initial choice was correct or incorrect (the actual proportions are shown in Figure 5—figure supplement 1a, and those for changes of confidence in Figure 5—figure supplement 1b).
Changes of mind were beneficial to the participants: on change-of-mind trials (decision, confidence or both), the gain in points relative to not changing one’s mind ranged from 0.72 to 1.12 points across subjects (Table 2). Changes of both decision and confidence were less common (0.4–1.6%), which may be partly due to an energy cost associated with crossing the workspace (Burk et al., 2014; Moher and Song, 2014). Those double changes that did occur tended to move from high to low confidence (1.03, 0.35, 0.12, 1.36%) rather than from low to high (0.26, 0.04, 0.04, 0.26%; P<0.0001 for three subjects and P<0.05 for S3).
We hypothesized that a change of confidence, like a change of decision, can be explained by continuation of the processing of visual information that arrived after the subject had committed to her initial choice (Figure 3b). We first evaluated this hypothesis by elaborating the model-free analysis of motion energy (Figure 4), described above. This analysis implies that ~400 ms of stimulus information, which did not affect the initial decision, might be available to revise the initial decision about confidence and direction. Indeed, when subjects changed their confidence rating from low to high, the motion information in the post-decision period supported the initial choice (Figure 4b, bottom), whereas changes from high to low confidence were associated with motion information in support of the direction opposite to the one chosen (Figure 4b, middle), and this trend was amplified for changes of mind about direction (Figure 4b, top).
A simple extension of the bounded accumulation model explains the frequency of these changes as well as their dependency on features of the stimulus and the subject’s initial report. The model scheme is illustrated in Figure 3b. We assumed that evidence was accumulated past the point of the initial decision for a fixed amount of time (a free parameter, t_{pip}) that is less than the non-decision time. The state of the belief at t_{pip} determines the final confidence and choice (see Materials and methods). As shown in Figure 3b, we allowed for the possibility that subjects might not apply identical criteria for confidence and choice in the pre- and post-initiation epochs (the δθ parameters; see Materials and methods). The settings of t_{pip} and the δθ parameters capture aspects of the energetic costs, by suppressing changes that would occur with small fluctuations in the evidence or very late in the movement. The trace in Figure 3b illustrates an initial high-confidence, rightward choice. Evidence that arrived in the post-initiation period (i.e., too late to affect the initial choice) tended to favor leftward, detracting enough from the belief to support a change of confidence but not enough to support a change of decision in favor of leftward. The example shows a resistance to change: the total evidence actually favors leftward, but the model places the change-of-decision bound somewhat below the neutral evidence level.
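The hysteresis described here can be written as a small decision rule applied to the belief evaluated t_{pip} after commitment. The sketch below is a hypothetical reading of the three δθ parameters (d_conf loosely standing in for δθ_{1}, applied symmetrically to upgrades and downgrades; d_dir for δθ_{2}; d_new for δθ_{3}); the exact parameterization used in the fits is given in Materials and methods, and the function name final_report is illustrative.

```python
def final_report(belief_right, initial_choice, initial_high, theta,
                 d_conf, d_dir, d_new):
    """Hypothetical final (choice, high_confidence) after post-initiation processing.

    belief_right   : log odds that 'right' is correct, read out t_pip after commitment
    initial_choice : +1 (right) or -1 (left)
    initial_high   : True if the initial report was high confidence
    theta          : log-odds criterion for high confidence in the initial report
    d_conf         : assumed hysteresis on confidence changes (delta-theta_1-like)
    d_dir          : extra evidence beyond a sign reversal to switch direction (delta-theta_2-like)
    d_new          : criterion shift for high confidence in a new direction (delta-theta_3-like)
    """
    b = belief_right * initial_choice            # belief in favour of the initial direction
    if b < -d_dir:                               # change of decision about direction
        return -initial_choice, (-b) > theta + d_new
    if initial_high:                             # stay high unless belief drops well below theta
        return initial_choice, b > theta - d_conf
    return initial_choice, b > theta + d_conf    # upgrade only with extra belief

# With d_conf > 0 the same belief can end in different reports depending on the start:
theta, d_conf, d_dir, d_new = 0.7, 0.3, 0.4, -0.2
stay_high = final_report(0.8, +1, True, theta, d_conf, d_dir, d_new)    # (1, True)
stay_low = final_report(0.8, +1, False, theta, d_conf, d_dir, d_new)    # (1, False)
```

Positive d_conf and d_dir make small post-initiation fluctuations insufficient to change the report, which is one way to express the energetic costs mentioned in the text.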
The curves in Figure 5 are fits of the model, which capture the main features of the subjects’ revisions of both choice and confidence. Note that all parameters of these fits are fixed from the fits to the initial choice and confidence, except for the three δθ parameters and t_{pip} (fit values in Table 1). The variations in the participants’ behavior are accounted for by the different settings of the model parameters, which are illustrated graphically in Figure 6 for an initial low- and initial high-confidence rightward decision. Although the relation between the precise values of the δθ parameters and behavior is hard to intuit, the trends are consistent. After an initial low-confidence decision (Figure 6, top row), all subjects required more belief to change to a high-confidence decision than would have been required for an initial high-confidence decision (δθ_{1}>0). Similar hysteresis is seen for initial high-confidence decisions (Figure 6, bottom row). Moreover, all subjects required more evidence to change their initial decision about direction than a simple reversal in sign of the evidence (δθ_{2}>0). Finally, all participants relaxed their belief criterion when they changed their direction decision while either maintaining or changing to high confidence (δθ_{3}<0). Because most initial choices were high confidence, this strategy would reduce motor effort by avoiding changes of both direction and confidence, as such a double change requires crossing the workspace with a complete reversal of the initial movement.
We wish to emphasize that confidence and direction reports remain coupled via a common mechanism at the time of both the initial and final decision. The observation supports a parsimonious account of decision confidence, which is somewhat at odds with recent literature on metacognition (see Discussion). Evidence for a metacognitive process is adduced from a dissociation between the signal-to-noise properties of the evidence that would support a level of choice accuracy versus the level associated with a degree of confidence (e.g., meta-d′; see Fleming and Lau, 2014; Maniscalco and Lau, 2012). The meta-d′ statistic is not well behaved in the sequential sampling framework, but the basic logic remains applicable (see Materials and methods). In Figure 7, we compare two odds ratios (OR). Both express the relative probability of a high-confidence rating, given a correct or error choice. Along the abscissa, we base the OR on the initial confidence ratings, whereas on the ordinate, we use the final confidence ratings, thus accommodating the change of confidence. In essence, we are pretending that the subject indicated her choice and subsequently told us her confidence. Clearly, there is a systematic discrepancy between the OR pairs (P<1e-4, sign test). All but one of the points are above the main diagonal, and the difference in OR is statistically reliable for 13 of the 19 individual points (4 subjects × 5 nonzero motion strengths with at least 1 error; P<0.05, bootstrap; see Materials and methods). This discrepancy might be regarded as a sign of 2nd-order confidence, whereas it is simply the result of a continuation of the 1st-order process that couples choice and confidence via bounded evidence accumulation.
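A sketch of the odds-ratio comparison, assuming the conventional 2 × 2 odds ratio (odds of a high-confidence rating on correct trials divided by the odds on error trials) and a paired bootstrap over trials with a 0.5 pseudo-count to avoid empty cells. Both choices are assumptions; the bootstrap used in the study is described in Materials and methods. The toy counts are invented purely to show how a few post-decision confidence changes raise the OR.

```python
import numpy as np

def odds_ratio(correct, high, pseudo=0.0):
    """Odds of a high-confidence report on correct trials / odds on error trials."""
    correct = np.asarray(correct, dtype=bool)
    high = np.asarray(high, dtype=bool)
    a = np.sum(correct & high) + pseudo      # correct, high confidence
    b = np.sum(correct & ~high) + pseudo     # correct, low confidence
    c = np.sum(~correct & high) + pseudo     # error, high confidence
    d = np.sum(~correct & ~high) + pseudo    # error, low confidence
    return (a * d) / (b * c)

def bootstrap_p(correct, high_initial, high_final, n_boot=2000, seed=0):
    """Fraction of paired bootstrap resamples in which OR(final) <= OR(initial)."""
    rng = np.random.default_rng(seed)
    correct, hi0, hi1 = map(np.asarray, (correct, high_initial, high_final))
    n = correct.size
    count = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample whole trials, keeping pairs intact
        diff = (odds_ratio(correct[idx], hi1[idx], pseudo=0.5)
                - odds_ratio(correct[idx], hi0[idx], pseudo=0.5))
        count += int(diff <= 0)
    return count / n_boot

# Toy data (invented): 100 correct and 50 error trials.
correct = np.concatenate([np.ones(100, bool), np.zeros(50, bool)])
high_initial = np.concatenate([np.ones(60, bool), np.zeros(40, bool),
                               np.ones(20, bool), np.zeros(30, bool)])
high_final = high_initial.copy()
high_final[60:65] = True      # five correct trials: low -> high confidence
high_final[100:105] = False   # five error trials: high -> low confidence

or_initial = odds_ratio(correct, high_initial)   # (60*30)/(40*20) = 2.25
or_final = odds_ratio(correct, high_final)       # (65*35)/(35*15), about 4.33
```

Because the final ratings only add information to the same choices, the point (or_initial, or_final) lands above the diagonal, mirroring the pattern described for Figure 7.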
Discussion
Amongst the three manifestations of choice behavior, confidence is perhaps the most important and the least understood (compared to speed and accuracy), because in the world outside the laboratory, we do not always receive immediate feedback about our choices. Often, all we know about the accuracy of a choice is a degree of confidence. The attribution of confidence is important for guiding subsequent decisions, learning from mistakes and exploring alternatives. The present study establishes that this assessment evolves in time and, like the choice itself, can undergo revision with additional evidence. The study of perceptual decision-making offers insight into the process because the stream of evidence can be controlled experimentally. This is especially so in the present study using random dot kinematograms, because the temporal stream of evidence is effectively a sequence of independent, statistically stationary samples. In other types of decisions (and in most perceptual ones as well), the samples of evidence are derived from internal evaluations and memory processes, which are less well characterized, but which also evolve as a function of time.
When the timing of a choice is under the control of the decision maker, a common strategy is to terminate deliberation upon a sufficient level of evidence. The class of bounded sequential sampling models, including many variants (Wald, 1947; Stone, 1960; Laming, 1968; Link, 1975; Good, 1979; Luce, 1986; Ratcliff and Rouder, 1998), explains the tradeoff between speed and accuracy, and many of the essential steps have correlates in neurophysiology (Romo et al., 2002; Heekeren et al., 2004; Ploran et al., 2007; Heitz and Schall, 2012; White et al., 2012; Kelly and O'Connell, 2013; Hanks et al., 2014; Hanks et al., 2015). The mechanism (and models) also expose a discrepancy between the information supplied to the decision maker and the information that is actually used to make the decision. Specifically, there are processing delays between the arrival of information from the environment and updating its representation in working memory as a decision variable bearing on a proposition, and there are delays between commitment to a decision and communicating this decision through the motor system. Even in relatively fast perceptual decisions, this so-called non-decision time typically exceeds 300 ms.
In previous studies, we showed that the motor system receives a continuous updating of the state of the evidence leading to a choice (Gold and Shadlen, 2000; Selen et al., 2012) and can utilize the late-arriving information to revise an initial choice by continuing the process of deliberation (Resulaj et al., 2009; Burk et al., 2014). The present study extends these observations to the metacognitive operation of assigning a degree of belief that the decision is correct. The finding is important for several reasons. First, it establishes that deliberation in the post-initiation epoch is as rich a process as the deliberation preceding initiation. This implies that termination has a kind of specificity; it marks the end of deliberation for purposes of response initiation, but it does not actually terminate all deliberation. We suspect that this is just one aspect of a more general property. For example, the same information can bear on a variety of potential actions and cognitive operations, which need not share identical speed-accuracy tradeoffs and thus deliberate for different amounts of time.
Second, the finding explains a potential dissociation between the information that decision makers might use to guide confidence and choice. We explain the initial choice and confidence using the identical stream of information, as in a previous study (Kiani et al., 2014a). The model is by no means perfect (e.g., it overestimates confidence at the lowest motion strength), but it demonstrates, nonetheless, that on many trials, the final confidence judgments are clearly based on additional information that did not affect the initial choice. This assertion is supported by the analysis of motion energy leading to the confidence choices (Figure 4). Just a few hundred milliseconds of additional evidence led to changes in the confidence associated with the initial choice, ranging from 1.5–6.3% of trials across our subjects, and this was enough to induce a clear departure from the level of confidence that ought to be associated with the evidence available at the time of the initial choice. In other words, had we asked our subjects only for their confidence rating after they indicated their initial choice—without an opportunity to revise those choices—we would have detected a different relationship between confidence and accuracy (Figure 7). When confidence is assessed after the initial choice, as it is in most experiments, then it ought to come as no surprise that choice and confidence are explained by only partially overlapping sources of information. Importantly, however, this does not imply a dissociation between confidence and the decision (Pleskac and Busemeyer, 2010), as the latter can also be revised based on the same additional information.
Of course, in many settings, it is possible that a decision maker might acquire new information after an initial choice, either from the environment or from memory (e.g., reconsidering the evidence, weights and costs), and this could lead to a divergence of choice and confidence. However, our result invites caution when interpreting such divergence as an indicator of metacognitive processes. Does the divergence necessarily implicate a process of a different nature, as implied by the distinction between 1st- and 2nd-order processes (Fleming and Dolan, 2012)? Or is it possible that the mechanism responsible for assigning confidence based on the additional evidence is compatible with the one that supports a choice, perhaps a different choice were the decision maker given an opportunity to revise (consistent with the original notion of type 1 and type 2 decisions; Clarke et al., 1959; Galvin et al., 2003)? The evaluation of additional evidence may explain why confidence ratings are more strongly correlated with accuracy when they are reported with less time pressure (Yu et al., 2015), and why some individuals appear to be better than others at discriminating correct from incorrect decisions (e.g., Ais et al., 2016). The central question is whether this additional information affects confidence in a manner that is fundamentally different from the process that would tie these attributes together. In our study, the answer seems to be negative, and we speculate that the interesting aspects of more real-world distinctions, involving re-evaluation of evidence using memory, for example, will require a better understanding of the way that memory and decision processes interact, but not a fundamentally different mechanism for associating confidence with the evidence arising from that process.
We conclude that confidence, choice and reaction time can be understood in a common framework of bounded evidence accumulation. By definition, confidence may be regarded as metacognitive, simply because it is a report about the decision process itself. Yet the operations leading to confidence seem neither mysterious nor dissociable from the decision process (cf. De Martino et al., 2013). That said, there are many unknown features of the underlying mechanism. We know little about the establishment of the mapping between belief and the representation of evidence and time. Nor do we know how a criterion is applied to this mapping to render the categorical choices in our study, or in post-decision wagering (Kiani and Shadlen, 2009), or in confidence ratings (Maniscalco and Lau, 2012; Fleming and Lau, 2014; Kiani et al., 2014a). An appealing idea is that this also looks like a threshold crossing in the brain with dynamic costs associated with time (e.g., urgency; Thura et al., 2012; Drugowitsch et al., 2012) and dynamic biases (Hanks et al., 2011). Presumably, downstream structures that represent confidence-related values, such as reward prediction error, must approximate the mapping between the decision variable (represented in LIP and other brain areas) and elapsed time (or number of samples) to achieve this. Of course, downstream structures that control the arm must have access to the decision about both direction and confidence to control the initial reach and possibly revise the movement. To explain our findings, we assumed that choice and confidence are related but processed as if they were separate attributes, via a correspondence between decision variable and belief. It is intriguing to think that the two dimensions, which are bound together into a single action by the motor reach system in our task, could be dissociated in cognition and memory.
Materials and methods
Subjects
Six naïve right-handed subjects, aged between 21 and 34, participated in this study. The Cambridge Psychology Research Ethics Committee approved the experimental protocol, and subjects gave informed consent. Two of the subjects were excluded from the analyses based on poor task performance (see below).
Apparatus
Subjects were seated and used their right hand to hold the handle of a vBOT manipulandum, which was free to move in the horizontal plane and recorded the position of the handle at 1000 Hz (Howard et al., 2009). Subjects could not see their arm because of a horizontal mirror, which was used to overlay virtual images from a downward-facing CRT display, mounted above the mirror, into the plane of the movement. A headrest ensured a viewing distance of around 40 cm.
Stimulus
Subjects discriminated the direction of motion in a dynamic random-dot motion stimulus (Roitman and Shadlen, 2002) presented within an aperture subtending 5 degrees of visual angle. Each dot was displayed for one frame (13.3 ms, 75 Hz refresh). Three frames later, a subset of these dots was displaced in the direction of motion while the rest of the dots were randomly replaced. Thus the positions of the dots in frame four, say, could only be correlated with dots in frames one and/or seven but not with dots in frames two, three, five and six. The dot density was 12.5 dots deg⁻² s⁻¹, and displacements were consistent with a motion speed of 5 deg/s. The difficulty of the task was manipulated through the coherence of the stimulus, defined as the probability that each dot would be displaced as opposed to randomly replaced.
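The interleaved update can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the stimulus code used in the study. The function name `random_dot_frames`, the square region in which dots are drawn, the horizontal wrap-around of coherently moving dots and the circular visibility mask are all our choices. Two quantities follow directly from the parameters above: the per-frame dot count is density × area / refresh rate, and the coherent jump is 5 deg/s × 3 frames / 75 Hz = 0.2 deg.

```python
import numpy as np

def random_dot_frames(coherence, n_frames, radius=2.5, density=12.5,
                      speed=5.0, refresh=75.0, n_sets=3, rng=None):
    """Sketch of a three-set interleaved random-dot stimulus.

    coherence: signed coherence in [-1, 1]. The sign gives the direction
        (negative = leftward). The magnitude is the probability that a dot
        is displaced coherently rather than randomly replaced.
    Returns per-frame dot positions (deg) and per-frame masks of dots that
    fall inside the circular aperture.
    """
    rng = np.random.default_rng() if rng is None else rng
    side = 2.0 * radius
    # Dots drawn per frame in the square region: density x area / frame rate.
    n_dots = int(round(density * side ** 2 / refresh))
    # Each set reappears every n_sets frames, so coherent dots jump this far.
    step = np.sign(coherence) * speed * n_sets / refresh
    sets = [rng.uniform(-radius, radius, size=(n_dots, 2)) for _ in range(n_sets)]
    frames, visible = [], []
    for f in range(n_frames):
        k = f % n_sets
        if f >= n_sets:
            old = sets[k]
            new = rng.uniform(-radius, radius, size=(n_dots, 2))  # random replacement
            coherent = rng.random(n_dots) < abs(coherence)
            new[coherent] = old[coherent]
            new[coherent, 0] += step
            # Wrap horizontally so coherent dots re-enter on the opposite side.
            new[:, 0] = (new[:, 0] + radius) % side - radius
            sets[k] = new
        frames.append(sets[k].copy())
        visible.append(np.hypot(sets[k][:, 0], sets[k][:, 1]) <= radius)
    return frames, visible

frames, visible = random_dot_frames(1.0, 9, rng=np.random.default_rng(1))
```

At 100% coherence, every dot in frame four is the corresponding dot from frame one shifted 0.2 deg to the right (modulo the wrap). Frames two and three come from independent sets.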
Procedure
Figure 1a shows a schematic of the experimental setup. A trial began when the subject’s hand (displayed to the subject as a red circle of 0.5 cm radius) was inside the home region, i.e., within 1 cm of a grey cross approximately 30 cm in front of their body. After a random delay, sampled from a truncated exponential distribution (range, 0.5–2.0 s; mean, 0.82 s), a dynamic random-dot stimulus appeared at the home position. On each trial, stimulus coherence was selected randomly from the set ±0, ±3.2, ±6.4, ±12.8, ±25.6 and ±51.2%, where negative coherences correspond to leftward motion and positive coherences to rightward motion. The sign on the 0% coherence is arbitrary but determined which direction would be rewarded (see below).
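The foreperiod is specified only by its range and mean. The Python sketch below shows one way to realize such a distribution. It assumes an exponential density that starts at 0.5 s and is truncated at 2.0 s (the shifted parameterization is our assumption). It finds the rate that yields a mean of 0.82 s by bisection and draws samples by inverse-CDF transformation. The function names are hypothetical.

```python
import numpy as np

def truncated_exp_mean(rate, lo, hi):
    """Mean of lo + Y, with Y ~ Exponential(rate) truncated to [0, hi - lo]."""
    span = hi - lo
    return lo + 1.0 / rate - span * np.exp(-rate * span) / (1.0 - np.exp(-rate * span))

def rate_for_mean(target, lo, hi, a=1e-3, b=100.0, n_iter=200):
    """Bisection: the truncated mean decreases monotonically as the rate grows."""
    for _ in range(n_iter):
        mid = 0.5 * (a + b)
        if truncated_exp_mean(mid, lo, hi) > target:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

def sample_delay(n, rate, lo, hi, rng):
    """Inverse-CDF sampling from the shifted, truncated exponential."""
    span = hi - lo
    u = rng.random(n)
    return lo - np.log1p(-u * (1.0 - np.exp(-rate * span))) / rate

rate = rate_for_mean(0.82, 0.5, 2.0)
delays = sample_delay(200_000, rate, 0.5, 2.0, np.random.default_rng(0))
```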
Four circular choice targets with a radius of 1.5 cm were displayed at the corners of a 17 × 17 cm square centered on the home position. The two choice targets on the left corresponded to a leftward motion decision and the two on the right to a rightward motion decision. To encourage participants to also report their confidence in the decision, the two choice targets for each motion direction had different payoffs for correct and incorrect choices. One target was low-risk, with a reward of 1 point for a correct choice and a loss of 1 point for an incorrect choice. The other target was high-risk, with a reward of 2 points for a correct choice and a loss of 3 points for an incorrect choice. On 0% coherence trials, the designation correct/incorrect was assigned randomly.
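The payoff matrix implies a simple normative reference point. Make three assumptions, none of which is a claim about the subjects' actual strategy: the subject maximizes expected points per trial, ignores time costs, and knows p, the probability that the chosen direction is correct. Under these assumptions, the expected values of the two targets on the chosen side compare as follows:

```latex
\begin{aligned}
\mathbb{E}[\text{high-risk}] &= 2p - 3(1-p) = 5p - 3,\\
\mathbb{E}[\text{low-risk}]  &= p - (1-p) = 2p - 1,\\
\mathbb{E}[\text{high-risk}] > \mathbb{E}[\text{low-risk}]
  &\iff 3p > 2 \iff p > \tfrac{2}{3}
  \iff \log\frac{p}{1-p} > \log 2 \approx 0.69 .
\end{aligned}
```

On this idealized account, the high-risk target is preferable once the log-odds of a correct choice exceed log 2. In the model described under Model, by contrast, the log-odds threshold θ is a free parameter fitted to each subject.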
Subjects judged the direction of the moving random dots and reached to a choice target when ready. We encouraged them to make quick decisions without sacrificing accuracy. They were free to interpret this instruction as they wished; they received no verbal instruction to aim for any particular speed/accuracy regime. Critically, the random-dot stimulus was extinguished as soon as the movement was initiated, that is, when the hand was more than 1 cm from the central cross. The trial ended when the subject reached one of the four choice targets. The time course of a trial is shown in Figure 1b. If the movement had not been initiated within 3 s after stimulus onset, an error message (“Too Slow”) appeared and the trial was repeated later in the session. After each trial, auditory feedback was given (a pleasant chime for a correct choice or a low-pitched tone for an incorrect choice), and the number of points earned or lost was displayed on the screen. Subjects were instructed to maximize the number of points per trial. To encourage this, a running sum of the points was displayed as a bar graph at the top of the display.
Each experimental session consisted of four blocks of 180 or 192 trials each (15 or 16 trials for each of the 12 signed coherences). We generated 48 stimuli with rightward motion (8 for each of the 6 coherence values). The leftward stimuli were generated from the same dot locations, horizontally mirrored about the center of the aperture. This ensured that across the stimuli there was no left-right bias due to the motion energy of the stimuli. In most experiments, we used this “double-pass” procedure, in which each of these 96 stimuli was displayed twice, in random order. Within a session, the vertical arrangement of the targets changed from block to block. In half of the blocks (A), the two high-risk targets were at the top of the display, while in the other half (B), the high-risk targets were at the bottom (Figure 1a). A session consisted of four blocks (768 trials) ordered ABBA or BAAB, and this order alternated from session to session. Each subject took part in 12 experimental sessions (9024–9216 trials). All subjects received extensive training on the motion task, beginning with variable-duration viewing controlled by the computer and then choice-reaction-time testing without confidence categories. Subjects passed to the main experiment when their choice and reaction time functions were stable.
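The double-pass construction can be sketched in Python as follows. Stimuli are represented by placeholder dot-position arrays, and the helper names (`mirror`, `make_block`) are hypothetical. The sketch shows how 48 rightward stimuli, their mirror images and a second pass yield one block of 192 trials, with 16 trials per signed coherence. The direction label on the 0% stimuli plays the role of the arbitrary, reward-defining sign.

```python
import numpy as np

COHERENCES = [0.0, 3.2, 6.4, 12.8, 25.6, 51.2]  # % coherence of rightward stimuli

def mirror(dots, center_x=0.0):
    """Reflect dot positions (..., 2) about the vertical midline of the aperture."""
    out = dots.copy()
    out[..., 0] = 2.0 * center_x - out[..., 0]
    return out

def make_block(rng, n_per_coh=8, n_frames=10, n_dots=4):
    """Return a shuffled list of (direction, coherence, dots) trials."""
    rightward = []
    for coh in COHERENCES:
        for _ in range(n_per_coh):
            # Placeholder stimulus; a real one would come from the dot generator.
            dots = rng.uniform(-2.5, 2.5, size=(n_frames, n_dots, 2))
            rightward.append((+1, coh, dots))
    # Leftward stimuli reuse the same dots, mirrored, so motion energy is balanced.
    leftward = [(-1, coh, mirror(dots)) for (_, coh, dots) in rightward]
    stimuli = rightward + leftward      # 96 stimuli
    trials = stimuli + stimuli          # double pass: each stimulus shown twice
    order = rng.permutation(len(trials))
    return [trials[i] for i in order]

block = make_block(np.random.default_rng(0))
```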
We required subjects to have sufficient perceptual skill and motivation to perform the task. One subject was excluded based on poor discrimination performance: at the end of the first training session, this subject still performed at chance level on all coherences. A second subject was excluded because s/he responded with high confidence on 95% of trials (and on 90% of trials even at 0% coherence). After 5 sessions (3330 trials), we decided to replace this subject. We found that despite the idiosyncrasy in this subject’s data, our model can fit their initial choices (Figure 2—figure supplement 1). Subject 4 showed a qualitative change in behavior in the second half of their data (after ~5000 trials; Figure 2—figure supplement 2a), reporting high confidence on nearly all trials. As no stationary model can account for such non-stationary behavior, we included only the first 5000 trials in our analysis. However, our model could still fit the initial choices for the omitted second half of this participant's data (Figure 2—figure supplement 2b).
Analysis
We excluded from analysis any trials with a reaction time of less than 150 ms (9 trials). For each trial, the final decision was determined by the choice target reached. To determine whether a change of decision had taken place, we calculated the area between the hand’s path over the first 1 cm of movement and the vertical line through the hand’s initial starting location (i.e., the line separating the left and right choice targets). A change of decision was registered when the area on the side opposite to the final choice was greater than 0.1 cm² (Resulaj et al., 2009). The same procedure was applied to determine changes of confidence, with the area now calculated relative to the horizontal line separating the high and low confidence choice targets. On each trial, we thus obtained the initial and final decision (choice: left/right; confidence: low/high) and the reaction time (time between stimulus onset and movement initiation). We show combined data for the two target configurations (high/low confidence targets at top/bottom), having reassured ourselves that the arrangement had no detectable effects on choice accuracy (p ≥ 0.4; Fisher’s exact test) and only small effects on reaction time (RT) for 2 subjects (magnitudes < 3%, p < 0.01; t-test). There was a subtle bias for the bottom targets (50.6, 51.9, 52.5 and 51.4% for the four subjects, respectively), possibly due to kinematic factors, but recall that the orientation was balanced across the experiment.
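The change-of-mind criterion can be sketched in Python as follows. The discretization is our assumption. The path is cut at the first sample that leaves a 1 cm disc around the start position. The area is then integrated with the trapezoidal rule over the distance travelled parallel to the separating line. Function names are hypothetical. The same function handles changes of confidence by switching to the horizontal line (`axis=1`).

```python
import numpy as np

def opposite_side_area(path, final_sign, axis=0, max_dist=1.0):
    """Area (cm^2) between the early hand path and the line through the start
    point that separates two target pairs, counting only the side opposite
    to the final choice.

    path: (n, 2) hand positions in cm (x rightward, y forward); row 0 = start.
    final_sign: +1 if the final target lies on the positive side of the line
        (right for axis=0, top for axis=1), -1 otherwise.
    axis: 0 -> vertical line (left/right choice); 1 -> horizontal line
        (top/bottom targets, i.e., the confidence dimension).
    """
    path = np.asarray(path, dtype=float)
    rel = path - path[0]
    dist = np.hypot(rel[:, 0], rel[:, 1])
    # Keep samples up to and including the first one beyond max_dist (assumption).
    beyond = np.flatnonzero(dist > max_dist)
    stop = beyond[0] + 1 if beyond.size else len(rel)
    rel = rel[:stop]
    dev = rel[:, axis]         # signed distance from the separating line
    along = rel[:, 1 - axis]   # position parallel to the separating line
    wrong = np.clip(-final_sign * dev, 0.0, None)  # excursion toward the other side
    # Trapezoidal rule over the distance travelled parallel to the line.
    return float(np.sum(0.5 * (wrong[1:] + wrong[:-1]) * np.abs(np.diff(along))))

def changed_mind(path, final_sign, axis=0, threshold=0.1):
    """True if the opposite-side area exceeds the 0.1 cm^2 criterion."""
    return opposite_side_area(path, final_sign, axis) > threshold

# A reach that bows toward the left before ending on the right-hand side.
y = np.linspace(0.0, 1.0, 201)
curved = np.column_stack([-0.4 * np.sin(np.pi * y), y])
straight = np.column_stack([np.linspace(0, 2, 201), np.linspace(0, 2, 201)])
```

For the curved example, the area is ∫₀¹ 0.4 sin(πy) dy = 0.8/π ≈ 0.25 cm², so it counts as a change of mind. The straight diagonal reach does not.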
To examine effects of motion strength on confidence, we fit a logistic model to the probability of reporting high confidence as a function of the absolute value of motion coherence (C):
$$P_{\mathrm{high}} = \frac{1}{1 + e^{-(b_0 + b_1 |C|)}}$$
where $b_0$ and $b_1$ are fitted coefficients. To examine whether confidence judgments were associated with more accurate choices, we fit a logistic model to the direction choice data for each subject, where the probability of choosing right is given by
$$P_{\mathrm{right}} = \frac{1}{1 + e^{-(b_0 + b_1 C + b_2 I + b_3 C I)}}$$
where C is the signed motion strength and I is an indicator variable (zero for a low confidence choice and one for a high confidence choice). To test for improved sensitivity (accuracy) with high confidence, we evaluated the null hypothesis $H_0: b_3 \le 0$. To examine whether confidence judgments were associated with different reaction times, we analyzed each subject's reaction times with an ANOVA with categorical factors of unsigned coherence and confidence. All comparisons of event frequency (e.g., changes of mind) were performed with Fisher’s exact test.
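Below is a minimal Python sketch of fitting the choice model, assuming the form logit P(right) = b0 + b1·C + b2·I + b3·C·I. The fit uses maximum likelihood via Newton-Raphson (equivalent to iteratively reweighted least squares). It then tests $H_0: b_3 \le 0$ with a one-sided Wald test. The choice of the Wald test, the function names and the simulated subject are assumptions for illustration; the text does not specify which test statistic was used.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).
    Returns coefficients and their asymptotic standard errors."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(hess, grad)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ b)))
    hess = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(hess)))
    return b, se

def one_sided_p(b, se):
    """p-value for H0: b <= 0 against H1: b > 0 (Wald z-test)."""
    return 0.5 * math.erfc((b / se) / math.sqrt(2.0))

# Simulated subject (hypothetical parameters): high confidence steepens the slope.
rng = np.random.default_rng(0)
n = 40_000
C = rng.choice([-0.512, -0.256, -0.128, -0.064, -0.032, 0.0,
                0.032, 0.064, 0.128, 0.256, 0.512], size=n)
I = rng.integers(0, 2, size=n)                    # 1 = high confidence
true_b = np.array([0.0, 8.0, 0.0, 6.0])
X = np.column_stack([np.ones(n), C, I, C * I])    # columns for b0, b1, b2, b3
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_b)))).astype(float)

b_hat, se_hat = fit_logistic(X, y)
p_b3 = one_sided_p(b_hat[3], se_hat[3])
```

The same `fit_logistic` call with a design matrix of `[1, |C|]` and the high-confidence indicator as the response fits the confidence model.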
For the model-free analyses of the time course of motion information on choice and confidence (Figure 4), we derived choice- and/or confidence-conditioned averages of stimulus motion energy (psychophysical kernels). Due to the stochastic nature of the motion stimuli, the strength of motion varies from trial to trial, and even within a trial. To quantify the fluctuations of motion along the horizontal axis, we convolved the sequence of random dots shown on each trial with a pair of spatiotemporally oriented filters, selective for rightward and leftward motion. The filters were matched to the speed and displacement of the coherently moving dots (see details in Adelson and Bergen, 1985; Kiani et al., 2008). The results of the convolution were summed across space to yield the motion energy for each direction as a function of time. The net motion energy was obtained by subtracting leftward from rightward motion energy. To average data across trials, we removed the average motion energy associated with each trial’s coherence and direction of motion. Because fluctuations have a stronger impact when motion is weak, only the lowest motion strengths (≤6.4% coherence) were included in these analyses.
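A minimal Python sketch of the residual-averaging step follows. It assumes the net motion energy has already been computed; the filtering itself is not reproduced. The function name and the synthetic data are hypothetical. In the example, choices depend only on the first 20 samples, so the kernel should be positive early and near zero afterwards.

```python
import numpy as np

def psychophysical_kernels(energy, coh, choice, max_abs_coh=6.4):
    """Choice-conditioned averages of motion-energy residuals.

    energy: (n_trials, n_time) net motion energy (rightward minus leftward).
    coh: (n_trials,) signed coherence in %; choice: (n_trials,) +1 right, -1 left.
    Returns (mean residual for right choices, mean residual for left choices).
    """
    keep = np.abs(coh) <= max_abs_coh             # weak motion only
    energy, coh, choice = energy[keep], coh[keep], choice[keep]
    resid = energy.astype(float)
    for c in np.unique(coh):                      # remove each signed coherence's mean
        m = coh == c
        resid[m] -= energy[m].mean(axis=0)
    return resid[choice == 1].mean(axis=0), resid[choice == -1].mean(axis=0)

# Synthetic example: choice integrates only the first 20 samples.
rng = np.random.default_rng(0)
n, t = 4000, 60
coh = rng.choice([-6.4, -3.2, 0.0, 3.2, 6.4, 25.6], size=n)
energy = rng.normal(size=(n, t)) + 0.05 * coh[:, None]
choice = np.where(energy[:, :20].sum(axis=1) >= 0, 1, -1)
k_right, k_left = psychophysical_kernels(energy, coh, choice)
kernel = k_right - k_left
```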
The influence of motion fluctuations on choice and confidence becomes negligible a few hundred milliseconds before movement onset (Figure 4a). To obtain empirical estimates of non-decision time, we fit a function to the psychophysical kernel. The shape of the function f(t) was derived assuming that the psychophysical kernels decay gradually when aligned on movement onset (Figure 4a, right). Two factors produce this decay: (i) the trial-to-trial variability in the non-decision time (assumed Gaussian), which gradually reduces the number of trials that contribute to the psychophysical kernel, and (ii) the additional smoothing introduced by the impulse response of the motion-energy filter (inset of Figure 4a). With these assumptions, f(t) becomes:
where α is a scaling parameter, Φ represents a cumulative Gaussian distribution with parameters $\mu_{tnd}$ and $\sigma_{tnd}$, IR(t) is the impulse response of the motion filter, and $*$ indicates convolution. The curve-fitting procedure entails fitting $\mu_{tnd}$, $\sigma_{tnd}$ and α to match f(t) to the psychophysical kernels (least-squares fit). Figure 4—figure supplement 1 illustrates the fitting procedure and shows the best-fitting parameters for each subject.
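The Python sketch below writes out one form consistent with the description of f(t). Three elements are our assumptions, not the study's definitions: the time convention (t in seconds relative to movement onset, negative before it), the causal convolution, and the gamma-shaped stand-in for IR(t). Under these assumptions, a motion sample at time t can influence the choice only if $t_{nd} \le -t$. The fraction of contributing trials is therefore Φ(−t; μ_tnd, σ_tnd). Because f(t) is linear in α, α is solved in closed form for each (μ_tnd, σ_tnd) on a grid, which gives a least-squares fit without a nonlinear optimizer.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(x, mu, sigma):
    """Cumulative Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + _erf((x - mu) / (sigma * math.sqrt(2.0))))

def kernel_shape(t, mu, sigma, ir, dt):
    """Unit-amplitude f(t): Phi(-t; mu, sigma) causally convolved with ir.
    t is uniformly spaced (step dt), in s relative to movement onset."""
    g = norm_cdf(-t, mu, sigma)            # fraction of trials still integrating
    pad = np.full(len(ir) - 1, g[0])       # assume a flat kernel before the window
    return np.convolve(np.concatenate([pad, g]), ir, mode="valid") * dt

def fit_kernel(t, kernel, ir, dt, mus, sigmas):
    """Grid search over (mu, sigma) with alpha from linear least squares.
    Returns (alpha, mu, sigma) minimizing the summed squared error."""
    best = None
    for mu in mus:
        for sigma in sigmas:
            h = kernel_shape(t, mu, sigma, ir, dt)
            alpha = float(h @ kernel) / float(h @ h)
            sse = float(np.sum((kernel - alpha * h) ** 2))
            if best is None or sse < best[0]:
                best = (sse, alpha, mu, sigma)
    return best[1:]

# Synthetic example with a hypothetical gamma-shaped impulse response (30 ms).
dt = 0.005
t = np.arange(-0.8, dt / 2, dt)
tau = np.arange(0.0, 0.2, dt)
ir = tau / 0.03 ** 2 * np.exp(-tau / 0.03)
rng = np.random.default_rng(0)
kernel = 0.5 * kernel_shape(t, 0.35, 0.06, ir, dt) + rng.normal(0.0, 0.005, t.size)
alpha_hat, mu_hat, sigma_hat = fit_kernel(
    t, kernel, ir, dt, np.arange(0.20, 0.50, 0.01), np.arange(0.02, 0.12, 0.01))
```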
Although we do not make use of meta-d′ in this paper, we refer to the concept and therefore provide a brief definition. In signal detection theory (SDT), d′ refers to the difference between the means of two unit-variance Normal distributions, $X_1 \sim \mathcal{N}(\mu_1, 1)$ and $X_2 \sim \mathcal{N}(\mu_2, 1)$, where $X_n$ is a random variable and $\mathcal{N}(\mu, \sigma)$ is a Normal distribution with mean μ and standard deviation σ. The two distributions might represent signal-plus-noise and noise-alone, or the distributions of firing rates of neurons tuned to opposite directions of motion (e.g., Britten et al., 1992). For binary classification, we can conceive of a single distribution of the difference between samples of $X_1$ and $X_2$, such that $X_\Delta \sim \mathcal{N}(d', \sqrt{2})$, assuming independence. $X_\Delta$ thus represents a DV whose sign identifies the more likely alternative. There is a one-to-one correspondence between proportion correct and d′. Applied to binary classification tasks, meta-d′ is the value of d′ that would support the observed proportions of high-confidence ratings on error and correct choices, under the assumption that the confidence designation is based on a comparison of $X_\Delta$ to some arbitrary but fixed criterion. Thus, within the SDT framework, if meta-d′ ≠ d′, then it is not possible to account for the confidence ratings using the same signal:noise relationship that supports the choice accuracy. This discrepancy has been interpreted as a sign of metacognitive confidence (Maniscalco and Lau, 2012; Fleming and Lau, 2014). However, the SDT framework must be extended to account for RT and choice. Under sequential sampling (e.g., bounded drift diffusion), meta-d′ ≠ d′, even when the choice and confidence are explained by the same decision variable. Hence we pursued a more empirical approach.
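In this SDT setting, proportion correct and d′ are linked by PC = Φ(d′/√2), the probability that $X_\Delta$ has the correct sign. Suppose confidence is derived from the same $X_\Delta$ through a fixed criterion on its magnitude. Then the type 2 hit rate P(high | correct) and false-alarm rate P(high | error) follow from the same distribution; this is the situation that meta-d′ = d′ describes. The minimal Python sketch below uses only the standard library. The symmetric criterion k on |X_∆| and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)

def pc_from_dprime(d):
    """P(correct) = P(X_delta > 0) for X_delta ~ N(d', sqrt(2))."""
    return NormalDist().cdf(d / SQRT2)

def dprime_from_pc(pc):
    """Inverse of pc_from_dprime (the one-to-one correspondence)."""
    return SQRT2 * NormalDist().inv_cdf(pc)

def type2_rates(d, k):
    """P(high | correct) and P(high | error) when confidence is high iff
    |X_delta| > k; a correct choice here means X_delta > 0."""
    X = NormalDist(mu=d, sigma=SQRT2)
    p_correct = 1.0 - X.cdf(0.0)
    hit = (1.0 - X.cdf(k)) / p_correct
    false_alarm = X.cdf(-k) / (1.0 - p_correct)
    return hit, false_alarm
```

For example, with d′ = 1 and k = 1, high confidence is given on about 66% of correct choices but only about 33% of errors.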
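To examine the extent to which initial and final confidence are related to the correctness of the initial choice (Figure 7), we first calculated the odds of a high confidence rating for correct and incorrect initial decisions. For example, for correct initial decisions,
$$O_{\mathrm{correct}} = \frac{N_{\mathrm{high,\,correct}}}{N_{\mathrm{low,\,correct}}},$$
where N denotes the number of trials in each category, and $O_{\mathrm{error}}$ is defined analogously for incorrect initial decisions. From this we calculated the odds ratio
$$\mathrm{OR} = \frac{O_{\mathrm{correct}}}{O_{\mathrm{error}}},$$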
which indicates whether an event is more likely to occur in the first condition (i.e., more often for correct choices, hence OR > 1) or in the second condition (i.e., more often for errors, OR < 1). We compared the ORs for the initial and final confidence, both computed with respect to the initial choice. The ORs calculated from the initial confidence report establish a baseline: the breakdown of confidence associated with the information that explains the accuracy at each motion strength, analogous to meta-d′ equal to d′ in the SDT framework described above. A larger OR for the final vs. initial confidence would indicate that a final high-confidence response became more probable for correct initial choices relative to incorrect initial choices. We used a bootstrap (N = 1000) to evaluate the reliability of the inequalities in ORs, depicted in Figure 7.
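A minimal Python sketch of the odds-ratio comparison and its bootstrap follows. It assumes trial-level boolean arrays for initial correctness and for initial and final confidence at one motion strength. Several elements are our assumptions, not details given in the text: the 0.5 pseudo-count that guards against empty cells, the summary statistic (the fraction of resamples in which OR_final > OR_initial) and the synthetic data.

```python
import numpy as np

def odds_ratio(high, correct, pseudo=0.5):
    """OR = odds(high | correct) / odds(high | error).
    pseudo: count added to each cell to avoid division by zero (assumption)."""
    high = np.asarray(high, bool)
    correct = np.asarray(correct, bool)
    a = np.sum(high & correct) + pseudo      # high confidence, correct
    b = np.sum(~high & correct) + pseudo     # low confidence, correct
    c = np.sum(high & ~correct) + pseudo     # high confidence, error
    d = np.sum(~high & ~correct) + pseudo    # low confidence, error
    return (a / b) / (c / d)

def bootstrap_or_difference(init_high, final_high, init_correct, n_boot=1000, rng=None):
    """Fraction of bootstrap resamples in which OR(final) > OR(initial),
    with both ORs computed relative to the correctness of the initial choice."""
    rng = np.random.default_rng() if rng is None else rng
    init_high = np.asarray(init_high, bool)
    final_high = np.asarray(final_high, bool)
    init_correct = np.asarray(init_correct, bool)
    n = init_correct.size
    wins = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample trials with replacement
        or_final = odds_ratio(final_high[idx], init_correct[idx])
        or_init = odds_ratio(init_high[idx], init_correct[idx])
        wins += int(or_final > or_init)
    return wins / n_boot

# Synthetic data: final confidence tracks initial correctness more closely.
rng = np.random.default_rng(0)
n = 2000
init_correct = rng.random(n) < 0.75
init_high = rng.random(n) < np.where(init_correct, 0.6, 0.4)
final_high = rng.random(n) < np.where(init_correct, 0.7, 0.3)
frac = bootstrap_or_difference(init_high, final_high, init_correct, rng=rng)
```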
Model
We fit the initial decision data with a model in which the decision process is a race between two accumulators (Vickers, 1979; Usher and McClelland, 2001; Gold and Shadlen, 2007; Churchland et al., 2008). One accrues momentary evidence for right (and against left; R−L), and the other accrues evidence for left (and against right; L−R). Momentary evidence was modeled as draws from a bivariate Gaussian with a mean that depends on the coherence (C): the drift rate (s⁻¹) is κC for the rightward and −κC for the leftward accumulator, giving a mean of (κC, −κC). The bivariate Gaussian has a negative covariance ρ, which determines the extent to which the two accumulators share noise (for example, noise arising from fluctuations in the stimulus). A decision is made when one of the two accumulators crosses a decision bound B. The reaction time is the sum of the time to reach the decision bound and a non-decision time (e.g., due to sensory and motor latencies), which was modeled as a normal distribution with mean $\mu_{tnd}$ and standard deviation $\sigma_{tnd}$. The states of both the winning and losing accumulators, together with decision time, map directly onto the log-odds of being correct (see Kiani et al., 2014a). To model confidence, we included a log-odds threshold θ, which separated high from low confidence judgments. For simplicity, we chose a time-invariant threshold. We have also fit the data using time-dependent confidence bounds; these improved the fits for all subjects but did not visibly change the figures or affect our conclusions.
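A minimal Euler simulation of this race, in Python, may help build intuition. It is a Monte Carlo sketch, not the analytic method-of-images solution used for fitting. The parameter values in the example (κ, B, μ_tnd) are hypothetical, and the log-odds mapping used for confidence is not reproduced.

```python
import numpy as np

def simulate_race(coh, kappa, bound, rho, mu_tnd, sigma_tnd, n_trials,
                  dt=1e-3, t_max=10.0, rng=None):
    """Race between a rightward (index 0) and a leftward (index 1) accumulator.

    Each step adds N(dt*(kappa*coh, -kappa*coh), dt*[[1, rho], [rho, 1]]).
    Returns choice (+1 right, -1 left, 0 if no bound was reached by t_max),
    reaction time (decision time + Gaussian non-decision time; NaN if
    unterminated) and the final accumulator states.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.array([kappa * coh, -kappa * coh]) * dt
    chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]) * dt)
    x = np.zeros((n_trials, 2))
    t_dec = np.full(n_trials, np.nan)
    choice = np.zeros(n_trials, dtype=int)
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, int(round(t_max / dt)) + 1):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        # Correlated Gaussian increments via the Cholesky factor of the covariance.
        x[idx] += mean + rng.standard_normal((idx.size, 2)) @ chol.T
        done = idx[x[idx].max(axis=1) >= bound]
        t_dec[done] = step * dt
        choice[done] = np.where(x[done, 0] >= x[done, 1], 1, -1)
        active[done] = False
    rt = t_dec + rng.normal(mu_tnd, sigma_tnd, n_trials)
    return choice, rt, x

rng = np.random.default_rng(0)
params = dict(kappa=15.0, bound=0.6, rho=-np.sqrt(0.5), mu_tnd=0.35, sigma_tnd=0.06)
ch_strong, rt_strong, _ = simulate_race(0.512, n_trials=2000, rng=rng, **params)
ch_zero, rt_zero, _ = simulate_race(0.0, n_trials=2000, rng=rng, **params)
```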
To fit the accuracy, confidence and reaction time of the initial choices, we determined the number of trials for the 4 possible initial choices (correct/high-confidence, correct/low-confidence, error/high-confidence and error/low-confidence) for each unsigned coherence, as well as the corresponding mean reaction times. For any setting of the parameters, the model predicts the probability of each of the 4 possible initial choices and the mean reaction times for each coherence level. To fit the model parameters, we minimized the negative log-likelihood (i.e., cost), using a multinomial distribution for the 4 choice types and a Gaussian distribution for the sample mean RTs. For analytic simplicity (see below), we used a flat bound (i.e., stationary rather than collapsing), which does not capture the shape of the RT distributions or the mean RT on error trials (Drugowitsch et al., 2012). Therefore, we used only the sample mean RT and its associated s.e.m. for correct trials (and all 0% coherence trials) in the cost function.
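A minimal Python sketch of the cost for one coherence level follows. It assumes that the model's predicted probabilities and mean RT are supplied by some function of the parameters, which is not shown. The function name and the example numbers are hypothetical. The multinomial term includes its combinatorial constant. The Gaussian term treats the observed mean RT as normally distributed around the prediction, with standard deviation equal to its s.e.m. The total cost would be the sum over coherence levels.

```python
import math
import numpy as np

def initial_choice_cost(counts, probs, rt_obs, rt_sem, rt_pred):
    """Negative log-likelihood for one coherence level.

    counts: observed trials for (correct/high, correct/low, error/high, error/low).
    probs: model probabilities for the same four outcomes (summing to 1).
    rt_obs, rt_sem: observed mean RT (correct trials, or all trials at 0%)
        and its s.e.m.; rt_pred: model-predicted mean RT.
    """
    counts = np.asarray(counts, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    n = counts.sum()
    log_multinomial = (math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
                       + float(np.sum(counts * np.log(probs))))
    z = (rt_obs - rt_pred) / rt_sem
    log_gauss = -0.5 * z ** 2 - math.log(rt_sem * math.sqrt(2.0 * math.pi))
    return -(log_multinomial + log_gauss)

counts = [120, 40, 15, 25]
cost_true = initial_choice_cost(counts, np.array(counts) / sum(counts), 0.62, 0.01, 0.62)
cost_off = initial_choice_cost(counts, [0.5, 0.3, 0.1, 0.1], 0.62, 0.01, 0.66)
```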
To further reduce the number of free parameters of the model to four (κ, B, θ and $\mu_{tnd}$), we fixed the covariance ρ to $-\sqrt{0.5} \approx -0.71$ and the non-decision time standard deviation $\sigma_{tnd}$ to 60 ms. The latter value is within the normal range from a previous study (Burk et al., 2014) and is supported by an analysis of motion energy (see Figure 1—figure supplement 1). This choice of covariance assumes that about half the noise is shared between the two accumulators (i.e., arises from the stimulus) and is within the normal range of fitted covariances (Kiani et al., 2014a). Importantly, this choice of covariance also allowed us to use an analytic solution to the race model when fitting. That is, we generalized the method of images used by Moreno-Bote (2010), which provides analytic solutions for covariances of 0 and −0.5 (requiring 3 and 5 images, respectively). Increasing the number of images yields further admissible covariances (but not arbitrary covariances). We chose a covariance of $-\sqrt{0.5}$, which requires only 7 images, thereby allowing efficient fitting of the subjects’ data. Using this analytic method precluded the use of collapsing decision bounds (and of lower reflecting bounds; cf. Kiani et al., 2014a). The fitting was performed for each participant using multiple (30) runs of Matlab’s fminsearchbnd with a wide range of initial parameter settings. The variability across runs was minimal, suggesting that the optimization procedure converged to the global minimum of the cost.
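The admissible covariances can be checked with a short calculation. The Python sketch below assumes unit-variance accumulators with correlation ρ, so that covariance equals correlation. It encodes the geometric condition behind the method of images as we understand it. After whitening the noise, the region below both bounds becomes a wedge of opening angle α with cos α = −ρ. Reflections tile the plane only when α = π/k for an integer k, which requires 2k − 1 images. This gives 3, 5 and 7 images for ρ = 0, −0.5 and −√0.5. The function name is hypothetical.

```python
import math

def images_for_correlation(rho, max_k=50, tol=1e-9):
    """Number of images for the method-of-images solution of two racing
    accumulators with noise correlation rho, or None if rho is not admissible.

    Whitening maps the quadrant below both bounds onto a wedge with opening
    angle alpha = arccos(-rho); images close up only when alpha = pi / k.
    """
    alpha = math.acos(-rho)
    for k in range(2, max_k + 1):
        if abs(alpha - math.pi / k) < tol:
            return 2 * k - 1
    return None
```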
To model changes of mind, we assumed that once the initial decision had been made, the accumulators continued to integrate information that was not accessible at the time of the initial decision (due to latencies in the sensory and motor systems). This integration continued for a further post-initiation period $t_{pip}$ (constrained to be less than $t_{nd}$). A change in confidence or choice would occur if, at the end of this period, the log-odds crossed a confidence or choice threshold, respectively (Figure 3b). Since there are motor costs involved in changes of mind, we included additional parameters δθ_1, δθ_2 and δθ_3 that could shift the confidence and choice thresholds away from their initial values in the post-initiation stage (Figure 3b). The three thresholds split the final decision into four zones, and their specification depended on whether the initial choice was low or high confidence. For an initial high-confidence choice, the three thresholds were θ − δθ_1, δθ_2 and −θ − δθ_3, respectively (see Figure 3b). For an initial low-confidence choice, the first threshold was instead θ + δθ_1. To fit the model to each subject’s data, we calculated, for each unsigned coherence, the number of trials corresponding to each of the 16 possible events that could occur over the trial (4 possible initial choices × 4 possible final choices). We again used maximum likelihood to fit the four free parameters ($t_{pip}$, δθ_1, δθ_2 and δθ_3), carrying over κ and θ from the initial decision fits.
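The threshold scheme can be written as a small lookup, sketched here in Python. We assume that the log-odds refer to the initial choice being correct, so positive values favor keeping it. Ties at a threshold are resolved toward the lower zone, which is our choice. The function name is hypothetical.

```python
def final_decision(log_odds, initial_high, theta, d1, d2, d3):
    """Map post-initiation log-odds to one of four final outcomes.

    log_odds: log-odds that the initial choice is correct at the end of t_pip
        (positive favors keeping the initial choice; sign convention assumed).
    initial_high: True if the initial choice was high confidence.
    theta, d1, d2, d3: confidence threshold and the shifts delta-theta_1..3.
    Returns (choice, confidence) with choice in {"same", "switch"}.
    """
    upper = theta - d1 if initial_high else theta + d1   # high/low boundary, same side
    if log_odds > upper:
        return "same", "high"
    if log_odds > d2:                                     # shifted choice threshold
        return "same", "low"
    if log_odds > -theta - d3:                            # high/low boundary, other side
        return "switch", "low"
    return "switch", "high"
```

With d1 > 0, a given post-initiation log-odds value can keep an initially high-confidence choice at high confidence yet leave an initially low-confidence choice at low confidence. This is the hysteresis that the shifted thresholds are meant to capture.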
Note that basing the change of decision about direction on log-odds represents a simplification of a process similar to the one for the initial decision, for example, one that would operate on the evolving DVs represented by the competing accumulators. In principle, such a mechanism could terminate post-initiation processing more flexibly, but changes of mind about choice and confidence are too infrequent to evaluate this possibility.
We also evaluated two alternative models for initial choices only, which differ in how they assign high vs. low confidence. Model 1 exploits a classic idea from signal detection theory and assigns confidence based on a threshold on the balance of evidence, ignoring deliberation time (Vickers, 1979; Kepecs et al., 2008; Kepecs and Mainen, 2012; Wei and Wang, 2015). Model 2 assigns confidence based on a threshold on decision time only, thus ignoring the balance of evidence. We fit both models to our participants’ data. They have the same number of parameters (4) as the original model. A 4-choice model for the initial choice-confidence decision (based on dynamic programming) is worthy of consideration, but we have yet to find a satisfactory account of our data with this approach.
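The two alternative confidence rules can be contrasted in a few lines of Python. In this sketch, the balance of evidence at decision time is taken as the bound minus the losing accumulator's state, which is our reading of "balance of evidence" for a race. The criterion values and function names are hypothetical. The example shows trials on which the rules disagree. A slow decision with a large balance of evidence is high confidence under Model 1 but low under Model 2. A fast decision with a narrow balance gets the opposite labels.

```python
import numpy as np

def confidence_balance(loser_state, bound, criterion):
    """Model 1 (sketch): high confidence if bound minus the losing
    accumulator's state at decision time exceeds a criterion."""
    return (bound - np.asarray(loser_state)) > criterion

def confidence_time(decision_time, criterion):
    """Model 2 (sketch): high confidence if the decision time is below a criterion."""
    return np.asarray(decision_time) < criterion

loser = np.array([-0.5, 0.2, 0.8, -1.0])    # losing accumulator at decision time
t_dec = np.array([0.25, 0.40, 0.90, 1.20])  # decision times (s)
high_m1 = confidence_balance(loser, bound=1.0, criterion=1.0)
high_m2 = confidence_time(t_dec, criterion=0.5)
```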
References

Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A 2:284–299. https://doi.org/10.1364/JOSAA.2.000284

The calibration and resolution of confidence in perceptual judgments. Perception & Psychophysics 55:412–428. https://doi.org/10.3758/BF03205299

The analysis of visual motion: a comparison of neuronal and psychophysical performance. The Journal of Neuroscience 12:4745–4765.

The time course of visual information accrual guiding eye movement decisions. Proceedings of the National Academy of Sciences of the United States of America 101:13086–13090. https://doi.org/10.1073/pnas.0305329101

Decision-making with multiple alternatives. Nature Neuroscience 11:693–702. https://doi.org/10.1038/nn.2123

Two types of ROC curves and definitions of parameters. The Journal of the Acoustical Society of America 31:629–630. https://doi.org/10.1121/1.1907764

The cost of accumulating evidence in perceptual decision making. Journal of Neuroscience 32:3612–3628. https://doi.org/10.1523/JNEUROSCI.4010-11.2012

Simultaneous over- and underconfidence: the role of error in judgment processes. Psychological Review 101:519–527. https://doi.org/10.1037/0033-295X.101.3.519

Calibration of Probabilities: The State of the Art to 1980. In: Judgement under Uncertainty: Heuristics and Biases.

The neural basis of metacognitive ability. Philosophical Transactions of the Royal Society B: Biological Sciences 367:1338–1349. https://doi.org/10.1098/rstb.2011.0417

How to measure metacognition. Frontiers in Human Neuroscience 8:443. https://doi.org/10.3389/fnhum.2014.00443

Type 2 tasks in the theory of signal detectability: discrimination between correct and incorrect decisions. Psychonomic Bulletin & Review 10:843–876. https://doi.org/10.3758/BF03196546

The neural basis of decision making. Annual Review of Neuroscience 30:535–574. https://doi.org/10.1146/annurev.neuro.29.051605.113038

Studies in the history of probability and statistics. XXXVII. A. M. Turing’s statistical work in World War II. Biometrika pp. 393–396.

Elapsed decision time affects the weighting of prior probability in a perceptual decision task. Journal of Neuroscience 31:6339–6352. https://doi.org/10.1523/JNEUROSCI.5613-10.2011

A modular planar robotic manipulandum with end-point torque control. Journal of Neuroscience Methods 181:199–211. https://doi.org/10.1016/j.jneumeth.2009.05.005

Internal and external influences on the rate of sensory evidence accumulation in the human brain. Journal of Neuroscience 33:19434–19441. https://doi.org/10.1523/JNEUROSCI.3355-13.2013

A computational framework for the study of confidence in humans and animals. Philosophical Transactions of the Royal Society B: Biological Sciences 367:1322–1337. https://doi.org/10.1098/rstb.2012.0037

Confidence and accuracy of near-threshold discrimination responses. Consciousness and Cognition 10:294–340. https://doi.org/10.1006/ccog.2000.0494

The relative judgment theory of two choice response time. Journal of Mathematical Psychology 12:114–135. https://doi.org/10.1016/0022-2496(75)90053-X

Response Times: Their Role in Inferring Elementary Mental Organization. Oxford University Press.

A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings. Consciousness and Cognition 21:422–430. https://doi.org/10.1016/j.concog.2011.09.021

A role for neural integrators in perceptual decision making. Cerebral Cortex 13:1257–1269. https://doi.org/10.1093/cercor/bhg097

Concurrent processing of saccades in visual search. Vision Research 40:2499–2516. https://doi.org/10.1016/S0042-6989(00)00102-4

Two-stage dynamic signal detection: a theory of choice, decision time, and confidence. Psychological Review 117:864–901. https://doi.org/10.1037/a0019737

Processing a display even after you make a response to it. How perceptual errors can be corrected. The Quarterly Journal of Experimental Psychology Section A 33:223–239. https://doi.org/10.1080/14640748108400790

Modeling response times for two-choice decisions. Psychological Science 9:347–356. https://doi.org/10.1111/1467-9280.00067

Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. The Journal of Neuroscience 22:9475–9489.

Exploring the cortical evidence of a sensory-discrimination process. Philosophical Transactions of the Royal Society B: Biological Sciences 357:1039–1051. https://doi.org/10.1098/rstb.2002.1100

Deliberation in the motor system: reflex gains track evolving evidence leading to a decision. Journal of Neuroscience 32:2276–2286. https://doi.org/10.1523/JNEUROSCI.5273-11.2012

Decision making by urgency gating: theory and experimental support. Journal of Neurophysiology 108:2912–2930. https://doi.org/10.1152/jn.01071.2011

The time course of perceptual choice: the leaky, competing accumulator model. Psychological Review 108:550–592. https://doi.org/10.1037/0033-295X.108.3.550

Response reversals in recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition 30:1147–1166. https://doi.org/10.1037/0278-7393.30.6.1147

Confidence estimation as a stochastic process in a neurodynamical system of decision making. Journal of Neurophysiology 114:99–113. https://doi.org/10.1152/jn.00793.2014

Perceptual criteria in the human brain. Journal of Neuroscience 32:16716–16724. https://doi.org/10.1523/JNEUROSCI.1744-12.2012

Dynamics of postdecisional processing of confidence. Journal of Experimental Psychology 144:489–510. https://doi.org/10.1037/xge0000062

The construction of confidence in a perceptual decision. Frontiers in Integrative Neuroscience 6:79. https://doi.org/10.3389/fnint.2012.00079
Decision letter

Timothy EJ Behrens, Reviewing Editor; University College London, United Kingdom
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your work entitled "A common mechanism underlies changes of mind about decisions and confidence" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor (Timothy Behrens) and David Van Essen as the Senior Editor. One of the three reviewers has agreed to reveal his identity: Mark Goldman.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
The paper follows previous papers by the senior authors on decision-making tasks involving an accumulation of evidence, and corresponding models for accuracy, reaction time, and confidence judgments. This paper makes major new contributions to these problems, particularly with regard to the issue of confidence, and whether confidence judgements can be naturally explained in the same framework used to explain accuracy and reaction time. The authors use a very clever experimental paradigm for human psychophysics, in which the direction a lever is moved lets subjects not only indicate their choice (left or right) but also whether this choice has high or low confidence. As a result, both changes of choice and changes of confidence can be seen by changes in hand movement.
All of the reviewers had praise for the manuscript, describing the paradigm as elegant and clever, the paper as insightful and the contribution to the state of the art as important because of the unification into a single framework of decisions and meta-decisions, which have previously been treated as separate entities. They note that there is clear structure in the data in support of a common driver of these variables, such as the model-free analyses of motion energy.
The reviewers, however, suggested that the paper should not be published without some essential revisions (that do not require additional data).
Essential revisions:
First, there were concerns about the extent to which the common mechanism argument was a complete description of the data.
Two reviewers raised a related point on these lines.
1) The model fits in Figure 2 show a good fit to accuracy and RT data (panels A and B) but a poorer fit to the proportion of high confidence responses (C), particularly for error trials. E.g. subject 1 has only two (out of 6) error-trial data points situated near the fitted line, and only one data point for subject 2 (part of the issue in realising this is that the purple error trial points are often hidden behind the green correct trial points, it would be useful to jitter these sideways for visualisation). This is the case even though there are several hundred error trials particularly at lower coherences with correspondingly tight error bars on confidence proportions. This discrepancy is particularly apparent at 0% coherence, where subjects 1, 2 and 4 show substantially lower confidence than predicted by the model. I'm not suggesting therefore that there can't be a common mechanism that does a good job of explaining all these data features (e.g. perhaps subjects are inferring on coherence, leading to lower confidence at 0% than predicted by a model that marginalizes coherence?). But it seems reasonable that a putative common mechanism driving choice, RT and confidence should do an equally good job of explaining each feature separately.
2) First, it is not clear to me what data could not be fitted by their model, given multiple fitted parameters per subject. It would be nice to see how a model with a separate "confidence-generating" and "decision-generating" circuit would produce data that would not be fit by their model.
3) I do agree though that using Occam's razor, fitting by one model circuit is preferable to two. However, Figure 2C shows a horrible fit of the model to the 0% coherence data points in a systematic direction across subjects. This suggests a systematic problem with the model, but the authors do not mention this discrepancy.
4) There was agreement in the reviewer discussion that simulation of competing models to demonstrate how the data disconfirm a model that does not rely on a common mechanism would be important if the argument is to shift broader opinion. The reviewers regarded such a demonstration as essential. One suggestion during the discussion was as follows: "Perhaps a simplification of their model and then use of Model comparison techniques to adjust for different numbers of parameters would be a good test to show that a separate computation is needed for confidence as they claim."
Related to these concerns was a concern about data exclusion.
5) This worry is compounded by the exclusion of data that shows aberrant confidence behaviour. One subject was excluded for a strong high-confidence bias and absence of confidence-accuracy correlation, which is indeed at odds with the proposed mechanism linking choice and confidence, but perhaps not with other models. Similarly the second half of subject 4's data is excluded because they changed the way they reported confidence half way through the session. These exclusions seem to unfairly skew the playing field in favour of the authors' single-mechanism account, particularly as the remaining subjects' data constitute only a little over half of all data that was originally collected.
During the discussion it was made clear that although it is understandable that subjects may have effortrelated or other biases that make the data uninteresting, it is a concern when this is 2 of 6 subjects, and it is a concern that the particular exclusions seem to act in your favour. Because of this, it was thought to be essential that the excluded data were presented in the supplement, and that the reason for the odd behaviour was investigated and described.
6) There was also an important question about perceived asymmetries in model that the reviewers required clarification about. It was unclear why you switch between working in accumulator space and logodds space. In the discussion, it was agreed that it might be possible for you to clarify in the text if there was a strong and compelling reason, but if not then it would be interesting to see a symmetric model.
My largest comment is that, while confidence and decisions are clearly put together in a single framework, I found the framework to be a little awkward and/or unclear in what information/type of model informed decisions versus what informed confidence. The authors use a classic "race to bound" model to inform the decision. However, then they appear to use a separate probabilistic judgment (log odds, or maybe log odds ratiothis was a bit unclear) for confidence and for changes of decision and confidence. This seems awkward: why not just use the log odds metric for all stages of the task instead of artificially making a race to threshold for two completely independent integrators for the decision and then a separate probabilistic framework for confidence and changes of decision?
Or, maybe I am misinterpreting the log odds graph and this could simply be interpreted as a one-to-one mapping between the decision variable and the log odds (so that one could indeed reformulate the race to bound in the decision variable as a race to bound in log odds). If this is the case, then I find the log odds measure for change of choice to be somewhat strange: in the race to threshold model, there are two completely independent accumulators that are racing against each other. In the log odds analysis, it would seem very awkward if only the odds for the initially winning accumulator were used (i.e. I'm assuming either the authors are using odds or odds ratios based upon using both accumulators, or they are just using the winning accumulator, since the figures only indicate a single trace in log odds). If only the winning accumulator is used, then it seems strange that the losing accumulator was never consulted on a change of choice, i.e. that this change of choice was solely driven by a change of log odds in the winning accumulator.
So the bottom line is that, at least as written, it appears to me that one of two awkward modeling choices is occurring:
A) The decision variable is determined by 2 independent integrators of a decision variable but confidence and changes of decision are determined by an odds ratio that combines the 2 integrators, in which case, why wasn't the initial decision made based upon a measure that combines the two integrators, rather than having this sudden switch between what determines the decision?
B) The log odds are determined only by the winning accumulator's log odds, in which case, it seems odd to have a change of choice that occurs without consideration of the new winner's accumulator.
Finally, there were several important points that do not require more analysis but that the reviewers thought merited serious consideration and discussion:
7) The evidence for separate mechanisms includes a correlation of confidence parameters across subjects when completing distinct tasks that is not seen in the decision-making parameters. The model in this paper seems to be distinct from that – it seems that one would expect the low-high confidence boundaries to remain intact but the decision boundary to change when a new task is performed. Hard to imagine why this would occur in one model, so this issue should at least be discussed.
Perhaps the authors' data do explain this, but it is unclear to me – there is a well-known "high-low effect" whereby subjects are more confident than they should be at hard decisions (low coherence) and less confident than they should be at easy decisions (high coherence). My intuition is that if a decision is made yet more information affecting confidence is received post-decision, then that extra information would more likely favor the decision (so boost confidence) in easy decisions but as often as not favor the non-chosen target (so reduce confidence) in difficult decisions. Is this effect accounted for in the authors' framework? The universal increase in log-odds ratio shown in Figure 7 suggests not (but I have not gone through the calculated effect here so may be wrong).
8) In trying to think through the arguments about the amount of stimulus time not affecting the initial decision, I started wondering about a related time scale that I'm not sure was considered: how long does it take to reverse an already begun motor action? If this takes a long time, then it seems that there may be a very narrow window to actually accomplish a change of mind reflected in a hand movement direction change. Is this accounted for implicitly in the time scale arguments and model for the deadline for change of mind, i.e. can the state of belief at t_{pip} actually turn around the hand this late into the process?
9) I think the authors should mention and compare their model with a recent paper by Wei and Wang (J Neurophysiol) which purports a biophysical mechanism that produces similar results. I think one discrepancy may be the confidence in errors (Figure 2C) being of incorrect trend in their model.
With respect to this final point, note that in the reviewer discussion it was made clear that we do not intend you to code up the Wang model, but instead to discuss the likely similarities and differences.
https://doi.org/10.7554/eLife.12192.019
Author response
Essential revisions:
First, there were concerns about the extent to which the common-mechanism argument was a complete description of the data. Two reviewers raised a related point along these lines.
1) The model fits in Figure 2 show a good fit to accuracy and RT data (panels A and B) but a poorer fit to the proportion of high-confidence responses (C), particularly for error trials. E.g. subject 1 has only two (out of 6) error-trial data points situated near the fitted line, and subject 2 only one (part of the issue in realising this is that the purple error-trial points are often hidden behind the green correct-trial points; it would be useful to jitter these sideways for visualisation). This is the case even though there are several hundred error trials, particularly at lower coherences, with correspondingly tight error bars on confidence proportions. This discrepancy is particularly apparent at 0% coherence, where subjects 1, 2 and 4 show substantially lower confidence than predicted by the model. I'm not suggesting therefore that there can't be a common mechanism that does a good job of explaining all these data features (e.g. perhaps subjects are inferring on coherence, leading to lower confidence at 0% than predicted by a model that marginalizes coherence?). But it seems reasonable that a putative common mechanism driving choice, RT and confidence should do an equally good job of explaining each feature separately.
The number of parameters in the model is small, by design, and we would like to remind the reviewers that the main focus of the paper is on changes of mind about confidence and direction. We have applied more complex models to initial-choice data (e.g., Kiani et al., Neuron, 2014) but elected to go with the simpler model here so as to place emphasis on the paper’s novel contribution. We are aware that the fits are not perfect, and we now acknowledge this in the revised manuscript. In our view the pathology arises from several factors: the fit is driven by the more frequent observations (i.e., correct choices); we do not fit the correlation between the races, which is fixed to a sensible value (only a few values of the correlation lead to an analytically tractable fitting procedure); and the bounds are stationary (i.e., flat). We suspect that a combination of these factors contributes to the poorer fit of the errors. We also suspect that the model may be missing a feature required to explain the confidence at 0% coherence, and we are now forthcoming about this observation.
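To make concrete what "fixing the correlation between races" means, the sketch below draws one pair of momentary-evidence increments for two anticorrelated accumulators and checks the empirical correlation. The value −1/√2, the function names and the drift and step parameters are assumptions chosen for illustration; the response states only that the correlation is fixed to a sensible value, and this is not the authors' fitting code.

```python
import math
import random

# Illustrative assumption: the text says only that the correlation is fixed to a
# "sensible value"; -1/sqrt(2) is used here purely as an example.
RHO = -1.0 / math.sqrt(2.0)


def correlated_increments(mu, dt, rho, rng):
    """Draw one pair of momentary-evidence increments for the two racing accumulators.

    The rightward accumulator gets mean +mu*dt and the leftward one -mu*dt;
    each has variance dt, and the two are correlated with coefficient rho.
    """
    sd = math.sqrt(dt)
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    d_right = mu * dt + sd * z1
    d_left = -mu * dt + sd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return d_right, d_left


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
draws = [correlated_increments(mu=0.5, dt=0.01, rho=RHO, rng=rng) for _ in range(20000)]
print(sample_correlation(draws))  # approximately RHO
```

Mixing a shared noise term into both increments is the standard way to impose a chosen correlation on two Gaussian streams; each stream keeps unit variance per unit time, so only the dependence between the races changes.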
As requested we have jittered the less visible points in Figure 2C.
2) First, it is not clear to me what data could not be fitted by their model, given multiple fitted parameters per subject. It would be nice to see how a model with separate "confidence-generating" and "decision-generating" circuits would produce data that would not be fit by their model.
Examining separate "confidence-generating" and "decision-generating" processes is a useful suggestion, albeit somewhat open-ended, since a “confidence-generating” mechanism that does not exploit the information in the decision variable (DV), as ours does, could conceivably exploit a wide class of non-normative sources. We considered several possibilities and now mention two in the paper, in response to the reviewers’ summary point (#4) below.
3) I do agree though that using Occam's razor, fitting by one model circuit is preferable to two. However, Figure 2C shows a horrible fit of the model to the 0% coherence data points in a systematic direction across subjects. This suggests a systematic problem with the model, but the authors do not (I think) mention this discrepancy.
We addressed this above in point #1.
4) There was agreement in the reviewer discussion that simulation of competing models to demonstrate how the data disconfirm a model that does not rely on a common mechanism would be important if the argument is to shift broader opinion. The reviewers regarded such a demonstration as essential. One suggestion during the discussion was as follows: "Perhaps a simplification of their model and then use of Model comparison techniques to adjust for different numbers of parameters would be a good test to show that a separate computation is needed for confidence as they claim."
As suggested, we have examined two alternative models that do not use a separate computation for confidence (i.e., a criterion in log-odds space). Model 1 exploits a classic idea from signal-detection theory, which would base confidence solely on the balance of evidence. This information is also used in our model, but the simpler alternative that we now consider ignores elapsed deliberation time. In Model 2, confidence is based on decision time only, thus ignoring the balance of evidence. We fit both models to our participants’ data. They have the same number of parameters as the original model (df=4), but the fits are substantially poorer (compared to the original model, log likelihood is reduced by 631180 and 23013749 across participants for models 1 and 2, respectively). We include two figures (Author response image 1 and Author response image 2) so that the reviewers can appreciate the inadequacy of these models, but we feel it would distract readers from the focus of the paper – the changes in confidence and choice – were we to include them. The exposition is already too heavy on the initial choices, in our view. We report these new analyses and model comparisons in terms of log likelihoods in the revised paper. Also, please see our response to point #6 as to why the initial decision is not terminated on a log-odds bound.
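Because both alternative models have the same number of free parameters as the original (df = 4), the complexity penalties of standard criteria such as AIC and BIC cancel, so ranking models by log likelihood is equivalent to ranking them by these criteria. The hypothetical helper below makes that arithmetic explicit; any log-likelihood values passed to it in an example are placeholders, not results from this study.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion; lower is better."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n_obs):
    """Bayesian information criterion for n_obs trials; lower is better."""
    return k * math.log(n_obs) - 2 * log_lik


def penalized_gaps(ll_ref, k_ref, ll_alt, k_alt, n_obs):
    """Return (delta AIC, delta BIC) of an alternative model relative to a reference.

    Positive values favour the reference model. When k_ref == k_alt the
    complexity penalties cancel and both gaps equal 2 * (ll_ref - ll_alt).
    """
    d_aic = aic(ll_alt, k_alt) - aic(ll_ref, k_ref)
    d_bic = bic(ll_alt, k_alt, n_obs) - bic(ll_ref, k_ref, n_obs)
    return d_aic, d_bic
```

With equal parameter counts, a reduction in log likelihood of any size translates directly into AIC and BIC gaps of twice that size in favour of the reference model.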
Related to these concerns was a concern about data exclusion.
5) This worry is compounded by the exclusion of data that shows aberrant confidence behaviour. One subject was excluded for a strong high-confidence bias and absence of confidence-accuracy correlation, which is indeed at odds with the proposed mechanism linking choice and confidence, but perhaps not with other models. Similarly, the second half of subject 4's data is excluded because they changed the way they reported confidence halfway through the session. These exclusions seem to unfairly skew the playing field in favour of the authors' single-mechanism account, particularly as the remaining subjects' data constitute only a little over half of all data that was originally collected.
During the discussion it was made clear that although it is understandable that subjects may have effort-related or other biases that make the data uninteresting, it is a concern when this is 2 of 6 subjects, and it is a concern that the particular exclusions seem to act in your favour. Because of this, it was thought to be essential that the excluded data were presented in the supplement, and that the reason for the odd behaviour was investigated and described.
We required subjects to have sufficient perceptual skills and motivation to perform the task. One subject was excluded very early as they were at chance on all coherence levels during initial training.
A second subject was excluded because he responded with high confidence on 95% of trials (and with high confidence on 90% of trials even at 0% coherence). After 5 sessions (3330 trials) we decided to replace this subject. We found that despite the idiosyncrasy in his data, our model can fit the data quite well. We now show the fit to his initial choices in a new Figure 2—figure supplement 1. Note that because of the low frequency of low-confidence decisions, the fits are dominated by the high-confidence choices, which are fitted well.
With regard to the subject for whom we only used the first half of the data, no stationary model can account for nonstationary behavior. This subject showed a qualitative change in behavior in the second half of their data (after ~5000 trials), as shown in new Figure 2—figure supplement 2A. Importantly, we are able to fit each half of this subject's data separately with our model. We have now included the second half of the data and the model fits in new Figure 2—figure supplement 2B.
6) There was also an important question about perceived asymmetries in the model that the reviewers required clarification about. It was unclear why you switch between working in accumulator space and log-odds space. In the discussion, it was agreed that it might be possible for you to clarify in the text if there was a strong and compelling reason, but if not then it would be interesting to see a symmetric model.
My largest comment is that, while confidence and decisions are clearly put together in a single framework, I found the framework to be a little awkward and/or unclear in what information/type of model informed decisions versus what informed confidence. The authors use a classic "race to bound" model to inform the decision. However, they then appear to use a separate probabilistic judgment (log odds, or maybe log odds ratio; this was a bit unclear) for confidence and for changes of decision and confidence. This seems awkward: why not just use the log odds metric for all stages of the task instead of artificially making a race to threshold for two completely independent integrators for the decision and then a separate probabilistic framework for confidence and changes of decision?
Or, maybe I am misinterpreting the log odds graph and this could simply be interpreted as a one-to-one mapping between the decision variable and the log odds (so that one could indeed reformulate the race to bound in the decision variable as a race to bound in log odds). If this is the case, then I find the log odds measure for change of choice to be somewhat strange: in the race to threshold model, there are two completely independent accumulators that are racing against each other. In the log odds analysis, it would seem very awkward if only the odds for the initially winning accumulator were used (i.e. I'm assuming either the authors are using odds or odds ratios based upon using both accumulators, or they are just using the winning accumulator, since the figures only indicate a single trace in log odds). If only the winning accumulator is used, then it seems strange that the losing accumulator was never consulted on a change of choice, i.e. that this change of choice was solely driven by a change of log odds in the winning accumulator.
So the bottom line is that, at least as written, it appears to me that one of two awkward modeling choices is occurring:
A) The decision variable is determined by 2 independent integrators of a decision variable but confidence and changes of decision are determined by an odds ratio that combines the 2 integrators, in which case, why wasn't the initial decision made based upon a measure that combines the two integrators, rather than having this sudden switch between what determines the decision?
B) The log odds are determined only by the winning accumulator's log odds, in which case, it seems odd to have a change of choice that occurs without consideration of the new winner's accumulator.
We accept the reviewers’ concern about “awkwardness” and will address it in detail after responding to the “bottom line,” which is that neither alternative (A) nor (B) characterizes our modeling choice. Beginning with the latter (B), we agree that it would have been odd to ignore the new winner’s accumulator, but that is not what we did. Log-odds is calculated by taking all available information into account, i.e., the state of the winning accumulator, the state of the losing accumulator, and elapsed time. Regarding the first point (A), we actually conceive of similar operations in the initial and post-initiation epochs. Both rely on the two integrators (which are negatively correlated, so not independent) and elapsed time to establish the log-odds for confidence, and in both epochs the choice is based on the states of the competing accumulators. We were opaque on this last point because we adopted, and failed to explain, a simplification, applied to the post-initiation epoch only, that allowed us to display the change-of-direction bound as a threshold in belief space. We will justify the simplification in a moment, but the important point is that there is really no conceptual “asymmetry.” Before justifying the simplification, we need to explain the more fundamental awkwardness that the reviewers spotlight – that is, why wasn't the initial decision based upon a measure that combines the two integrators, such as log-odds?
The reason is that such a termination on log-odds is inconsistent with existing behavioral and neurophysiological data. For example: (1) If decisions terminated on the same log-odds correct, confidence would (by definition) be the same for all choices, independent of reaction time and stimulus strength. Even if the log-odds bounds were to move in time, confidence would be purely time-dependent and would not depend on coherence (inconsistent with Kiani et al., Neuron, 2014). (2) A constant log-odds-correct bound would produce a very flat choice function (probability correct as a function of coherence) and bizarrely long reaction times. (3) Neural representations of accumulated evidence have been reported in several areas of the monkey brain. The drift and noise are consistent with the representation of a DV, and there is a signature of a terminating bound applied to the neural representation of the winning DV. All of the information needed to compute log-odds correct is available in the state of the competing DVs and elapsed time, but there is no evidence that the firing rates of the competing accumulators combine these sources to represent log-odds explicitly.
Therefore, there is ample justification for assuming different circuits for DV and belief (if belief is explicitly represented at all). Importantly, however, it is easy to show that any computation based on belief (e.g., reporting confidence or its changes) can be formulated as a time-dependent criterion on the DVs.
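The equivalence between a belief-based rule and a time-dependent criterion on the DVs can be illustrated with a Monte Carlo sketch. The assumptions are ours, not the paper's: two correlated races to a flat bound, a hypothetical set of coherences and parameters, and a log-odds map estimated by binning decision time and the balance of evidence (winner minus loser, which at termination encodes the losing accumulator's state), pooled across coherences. A confidence report then compares the estimated log-odds with a hypothetical criterion `theta`.

```python
import math
import random
from collections import defaultdict

# Hypothetical parameters for illustration only (not fitted values from the paper).
RHO = -1.0 / math.sqrt(2.0)  # assumed correlation between the two races
BOUND, DT, KAPPA = 1.0, 0.01, 10.0
COHERENCES = [0.0, 0.032, 0.064, 0.128, 0.256, 0.512]
T_WIDTH, B_WIDTH = 0.2, 0.5  # bin widths for time (s) and balance of evidence


def race_trial(coh, direction, rng, max_t=5.0):
    """One correlated race to a flat bound; returns (time, balance, correct)."""
    mu = KAPPA * coh * direction
    sd = math.sqrt(DT)
    mix = math.sqrt(1.0 - RHO ** 2)
    right = left = 0.0
    steps = 0
    while right < BOUND and left < BOUND and steps * DT < max_t:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        right += mu * DT + sd * z1
        left += -mu * DT + sd * (RHO * z1 + mix * z2)
        steps += 1
    choice = 1 if right >= left else -1
    # Balance of evidence = winner minus loser; it carries the losing race's state.
    return steps * DT, abs(right - left), choice == direction


def bin_key(t, balance):
    return (int(t / T_WIDTH), int(balance / B_WIDTH))


def build_log_odds_map(n_per_coh=500, seed=1):
    """Estimate log-odds correct per (time, balance) bin, marginalizing coherence."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: [1, 1])  # [correct, error] with add-one smoothing
    trials = []
    for coh in COHERENCES:
        for _ in range(n_per_coh):
            direction = rng.choice((-1, 1))  # at 0% coherence "correct" is arbitrary
            t, bal, correct = race_trial(coh, direction, rng)
            counts[bin_key(t, bal)][0 if correct else 1] += 1
            trials.append((t, bal, correct))

    def log_odds(t, balance):
        c, e = counts.get(bin_key(t, balance), (1, 1))
        return math.log(c / e)

    return log_odds, trials


def report_confidence(log_odds, t, balance, theta=1.0):
    """High confidence iff the estimated log-odds correct exceeds theta (hypothetical)."""
    return "high" if log_odds(t, balance) > theta else "low"
```

Because `log_odds` is a function of elapsed time and the state of both accumulators, the set of states reported as "high" is simply a time-dependent region of DV space; no separate representation of belief is needed to implement the criterion.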
In principle, everything we just stated about the initial decision should apply to the post-initiation epoch to explain changes of mind about confidence and/or direction. However, we made a simplifying assumption that allowed us to perform both calculations in belief space. Because the information available in this epoch is limited by t_{nd} and motor costs, we assumed that subjects would base their choices on all the available information up to some point (t_{pip} ≤ t_{nd}). Thus we are explicitly ignoring (it is now explicit in the revision) the likelihood that a change of mind has a termination rule. We think it does (with all kinds of interesting motor costs), but there are too few trials to gain traction on this, as we now explain. Expressing changes of confidence via a criterion in log-odds is already the natural choice, of course. We conceive of the change of mind about direction as operating on the state of the DVs at t_{pip}. For example, were the criterion for a change of choice simply a change in the sign of the balance of evidence, it would be identical to a log-odds criterion at zero. However, the hysteresis effect for choice places the criterion on the DV on one or the other side of zero. In summary, our simplifying assumption introduces a minor approximation that allows us to illustrate changes of confidence and choice in the same figure, which we view as useful to readers – or should be now that we have cleared up the confusion.
Finally, we acknowledge that it is possible to view both confidence and choice as operating on DVs and elapsed time. This is the approach that yields the optimal policy (stopping rule) for deciding among the four options (left/right direction × high/low confidence) based on optimizing the points per unit time (or some other desideratum), as pursued by the Drugowitsch and Rao groups, for example.
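A minimal sketch of the post-initiation rule described above, under stated assumptions: the two DVs keep integrating the increments still in the processing pipeline for a window t_pip that may not exceed t_nd; a change of direction occurs when the balance of evidence crosses a criterion displaced from zero by a hypothetical hysteresis offset `delta`; and confidence in the final choice is read from a caller-supplied log-odds map. The function and parameter names are hypothetical, and, as in the response, any termination rule for changes of mind is ignored.

```python
def revise_after_initiation(choice, right, left, t_dec, pipeline, dt, t_nd,
                            log_odds, theta_conf, delta):
    """Hypothetical post-initiation rule for changes of mind.

    choice      : +1 if the rightward race won the initial decision, -1 otherwise
    right, left : states of the two decision variables (DVs) at commitment, t_dec
    pipeline    : (d_right, d_left) increments still in the processing pipeline
    dt          : duration represented by each increment (s)
    t_nd        : non-decision time; the window t_pip may not exceed it
    log_odds    : function(t, support) -> log-odds that an option is correct, where
                  support is the balance of evidence for that option (may be < 0)
    theta_conf  : log-odds criterion separating high from low confidence
    delta       : hysteresis offset; the initial choice is kept unless the balance
                  falls below -delta (delta may be positive or negative)
    """
    t_pip = len(pipeline) * dt
    if t_pip > t_nd + 1e-9:
        raise ValueError("t_pip cannot exceed the non-decision time t_nd")
    for d_right, d_left in pipeline:  # the same two DVs keep integrating
        right += d_right
        left += d_left
    balance = choice * (right - left)  # > 0 still favours the initial choice
    final_choice = choice if balance >= -delta else -choice
    support = final_choice * (right - left)  # evidence for the final choice
    confidence = "high" if log_odds(t_dec + t_pip, support) > theta_conf else "low"
    return final_choice, confidence
```

Setting `delta` to zero recovers the case in which a change of direction is just a change in the sign of the balance of evidence, which coincides with a log-odds criterion at zero.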
We have clarified the approximation and our rationale in the revised manuscript (see Methods).
Finally, there were several important points that do not require more analysis but that the reviewers thought merited serious consideration and discussion.
7) The evidence for separate mechanisms includes a correlation of confidence parameters across subjects when completing distinct tasks that is not seen in the decision-making parameters. The model in this paper seems to be distinct from that – it seems that one would expect the low-high confidence boundaries to remain intact but the decision boundary to change when a new task is performed. Hard to imagine why this would occur in one model, so this issue should at least be discussed.
The reviewers seem to be making an intriguing connection to a controversial literature bearing on the question of whether individuals performing different tasks tend to exhibit consistent “metacognitive” sensitivities, such that some individuals are consistently better than others at discriminating correct from incorrect decisions (e.g., Fleming et al., Science 2010). We do not have much to say about this, but the analysis in Figure 7 demonstrates one reason why a subject might have higher metacognitive sensitivity than another. When confidence is reported after a choice, confidence can be influenced by additional evidence accrued after the commitment to a choice. The idea is similar to one discussed by Pleskac and Busemeyer (2010), and more recently by Moran, Teodorescu and Usher (2015) and Yu, Pleskac and Zeigenfuse (2015). The analysis of Figure 7 demonstrates that the opportunity to revise the initial confidence based on additional evidence leads to higher ‘metacognitive’ sensitivity. Therefore, an intriguing possibility is that individuals with higher metacognitive sensitivity are those that can accumulate evidence for longer periods of time (or more efficiently) after the commitment to a choice. This mechanism could account for individual differences in sensitivity without the need to postulate separate mechanisms for choice and for confidence. Furthermore, it is empirically testable, with analyses like those shown in Figure 4. We now have added the following to Discussion:
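The claim that post-decision evidence raises metacognitive sensitivity can be illustrated with a deliberately simplified model. Assumptions (ours): a one-dimensional diffusion with a single, known drift and flat bounds, for which decision time carries no information about accuracy, so a confidence score at commitment cannot discriminate correct from incorrect choices. Sensitivity is measured here as the area under the type-2 ROC; all names and parameters are hypothetical, and this is not the computation behind Figure 7.

```python
import math
import random


def auroc(pos, neg):
    """Area under the type-2 ROC: P(score on a correct trial > score on an error).

    Ties count as one half.
    """
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def metacognitive_sensitivity(n_trials=1500, drift=1.0, bound=1.0,
                              post_steps=50, dt=0.01, seed=0):
    """Compare a confidence score at commitment with one after extra evidence.

    Returns (auroc_at_decision, auroc_after_extra_evidence).
    """
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    at_dec = {True: [], False: []}
    after = {True: [], False: []}
    for _ in range(n_trials):
        x, steps = 0.0, 0
        while abs(x) < bound:
            x += drift * dt + sd * rng.gauss(0.0, 1.0)
            steps += 1
        choice = 1 if x > 0 else -1
        correct = choice == 1  # the drift is positive, so +1 is the correct choice
        at_dec[correct].append(-steps)  # faster decision -> higher score
        for _ in range(post_steps):  # evidence still arriving after commitment
            x += drift * dt + sd * rng.gauss(0.0, 1.0)
        after[correct].append(choice * x)  # support for the chosen option
    return (auroc(at_dec[True], at_dec[False]),
            auroc(after[True], after[False]))
```

In this toy setting the score at commitment hovers around chance (0.5), whereas the score that includes post-decision evidence discriminates correct from error trials well above chance, because extra samples tend to confirm correct choices and contradict errors.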
“The evaluation of additional evidence may explain why confidence ratings are more strongly correlated with accuracy when they are reported with less time pressure (Yu et al., 2015), and why some individuals appear to be better than others at discriminating correct from incorrect decisions (e.g. Ais et al., 2016).”
Perhaps the authors' data do explain this, but it is unclear to me – there is a well-known "high-low effect" whereby subjects are more confident than they should be at hard decisions (low coherence) and less confident than they should be at easy decisions (high coherence). My intuition is that if a decision is made yet more information affecting confidence is received post-decision, then that extra information would more likely favor the decision (so boost confidence) in easy decisions but as often as not favor the non-chosen target (so reduce confidence) in difficult decisions. Is this effect accounted for in the authors' framework? The universal increase in log-odds ratio shown in Figure 7 suggests not (but I have not gone through the calculated effect here so may be wrong).
We don’t need to resort to changes of mind to explain the high-low effect. For example, Kiani et al. (Neuron 2014) showed that even at the lowest coherence, where participants are at chance, the predicted certainty is around 0.7. And for the high-coherence trials certainty is around 0.9 despite almost perfect performance. So the high-low effect arises naturally from the diffusion model. The reviewers’ intuitions are partly correct, but the deterministic component of the DV contributes a bias toward higher confidence for all correct trials. The deeper intuition behind the high-low effect is to think about all the motion strengths and directions that could have given rise to the DVs at decision time. The DVs achieved in a high-coherence trial could occasionally arise in a low- or intermediate-coherence trial. This possibility biases confidence to lower values despite nearly perfect accuracy on high-coherence trials. Similarly, the DVs achieved in a low-coherence trial could occasionally be observed in an intermediate- or high-coherence trial, biasing confidence on low-coherence trials to higher values despite near-chance accuracy.
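The marginalization argument can be checked with a small simulation. Assumptions (ours, for illustration): a one-dimensional, flat-bound diffusion model with hypothetical parameters, in which the DV at termination always equals the bound, so certainty can depend only on decision time; certainty is approximated as P(correct | decision-time bin) pooled over all coherences, standing in for the paper's log-odds map, which also uses the DV states.

```python
import math
import random
from collections import defaultdict

# Hypothetical parameters for a simplified 1-D diffusion model (not fitted values).
COHERENCES = [0.0, 0.032, 0.064, 0.128, 0.256, 0.512]
KAPPA, BOUND, DT = 10.0, 1.0, 0.01
STEPS_PER_BIN = 5  # 50-ms decision-time bins


def ddm_trial(coh, direction, rng, max_steps=2000):
    """One flat-bound diffusion trial; returns (steps to bound, correct)."""
    mu = KAPPA * coh * direction
    sd = math.sqrt(DT)
    x, steps = 0.0, 0
    while abs(x) < BOUND and steps < max_steps:
        x += mu * DT + sd * rng.gauss(0.0, 1.0)
        steps += 1
    return steps, (1 if x > 0 else -1) == direction


def high_low_table(n_per_coh=800, seed=0):
    """Map each coherence to (accuracy, mean certainty).

    Certainty is P(correct | decision-time bin) pooled over all coherences,
    i.e. the decision maker does not know which coherence was shown.
    """
    rng = random.Random(seed)
    trials = []
    for coh in COHERENCES:
        for _ in range(n_per_coh):
            steps, correct = ddm_trial(coh, rng.choice((-1, 1)), rng)
            trials.append((coh, steps // STEPS_PER_BIN, correct))
    tally = defaultdict(lambda: [0, 0])  # bin -> [n_correct, n_total]
    for _, b, correct in trials:
        tally[b][0] += correct
        tally[b][1] += 1
    table = {}
    for coh in COHERENCES:
        subset = [(b, c) for k, b, c in trials if k == coh]
        acc = sum(c for _, c in subset) / len(subset)
        cert = sum(tally[b][0] / tally[b][1] for b, _ in subset) / len(subset)
        table[coh] = (acc, cert)
    return table
```

In this sketch, mean certainty at 0% coherence sits clearly above the chance-level accuracy, while certainty at the highest coherence falls short of its near-perfect accuracy, which is the high-low pattern obtained without any change of mind.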
8) In trying to think through the arguments about the amount of stimulus time not affecting the initial decision, I started wondering about a related time scale that I'm not sure was considered: how long does it take to reverse an already begun motor action? If this takes a long time, then it seems that there may be a very narrow window to actually accomplish a change of mind reflected in a hand movement direction change. Is this accounted for implicitly in the time scale arguments and model for the deadline for change of mind, i.e. can the state of belief at t_{pip} actually turn around the hand this late into the process?
Yes, this is implicitly included in t_{pip}, but we were less than clear on this, and we thank the reviewers for bringing this to our attention. We suggest that t_{pip} (Figure 3) is the amount of the non-decision time that participants use after the initial decision to determine whether to change their mind. This is partially determined by the physical effort required to change the trajectory of the hand. We showed previously (Burk et al., Plos One, 2014) that manipulating the effort required to change one’s mind can affect the number of changes of mind and t_{pip}. We revised the legend to Figure 3 to make the latencies clearer, and we reinforce the concept that t_{pip} cannot exceed t_{nd}.
9) I think the authors should mention and compare their model with a recent paper by Wei and Wang (J Neurophysiol) which purports a biophysical mechanism that produces similar results. I think one discrepancy may be the confidence in errors (Figure 2C) being of incorrect trend in their model.
With respect to this final point, note that in the reviewer discussion it was made clear that we do not intend you to code up the Wang model, but instead to discuss the likely similarities and differences.
The Wei and Wang paper focuses mainly on the post-decision wagering task (Kiani & Shadlen, 2009), although they devote a section to the RT version of the task. Wei and Wang estimated confidence based only on the difference of neural activities in a ring attractor model. Computation of confidence in their model is in essence similar to the Balance of Evidence model explained above (see point #4), and it suffers from the same shortcomings. For example, as the reviewers mentioned, the model predicts that confidence in errors decreases with stimulus strength, contrary to the data. Despite this failure, we appreciate Wei and Wang’s model as a step toward understanding the circuit that underlies choice and confidence. They are aware of the discrepancy between their model and experimental results and are working toward a better solution (see their Discussion).
We now cite this paper when we mention the alternative model that bases confidence solely on balance of evidence (i.e., ignoring elapsed time).
https://doi.org/10.7554/eLife.12192.020
Article and author information
Author details
Funding
Wellcome Trust
 Ronald van den Berg
 Kavi Anandalingam
 Daniel M Wolpert
Howard Hughes Medical Institute
 Ariel Zylberberg
 Michael N Shadlen
Royal Society
 Ronald van den Berg
 Kavi Anandalingam
 Daniel M Wolpert
Human Frontier Science Program
 Ronald van den Berg
 Kavi Anandalingam
 Michael N Shadlen
 Daniel M Wolpert
National Eye Institute
 Ariel Zylberberg
 Michael N Shadlen
Simons Foundation
 Roozbeh Kiani
Alfred P. Sloan Foundation
 Roozbeh Kiani
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank the Wellcome Trust, Human Frontier Science Program, Royal Society (Noreen Murray Professorship in Neurobiology to DMW), Howard Hughes Medical Institute, National Eye Institute Grant EY11378 to MNS, a Sloan Research Fellowship to RK, and a Simons Collaboration on the Global Brain grant to RK.
Ethics
Human subjects: The Cambridge Psychology Research Ethics Committee approved the experimental protocol, and subjects gave written informed consent.
Reviewing Editor
 Timothy EJ Behrens, University College London, United Kingdom
Publication history
 Received: October 9, 2015
 Accepted: January 31, 2016
 Accepted Manuscript published: February 1, 2016 (version 1)
 Version of Record published: March 7, 2016 (version 2)
Copyright
© 2016, van den Berg et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.