Neuroscience

Human VMPFC encodes early signatures of confidence in perceptual decisions

Sabina Gherman, Marios G. Philiastides (corresponding author)
University of Glasgow, United Kingdom
Research Article
Cite this article as: eLife 2018;7:e38293 doi: 10.7554/eLife.38293

Abstract

Choice confidence, an individual’s internal estimate of judgment accuracy, plays a critical role in adaptive behaviour, yet its neural representations during decision formation remain underexplored. Here, we recorded simultaneous EEG-fMRI while participants performed a direction discrimination task and rated their confidence on each trial. Using multivariate single-trial discriminant analysis of the EEG, we identified a stimulus-independent component encoding confidence, which appeared prior to subjects’ explicit choice and confidence report, and was consistent with a confidence measure predicted by an accumulation-to-bound model of decision-making. Importantly, trial-to-trial variability in this electrophysiologically-derived confidence signal was uniquely associated with fMRI responses in the ventromedial prefrontal cortex (VMPFC), a region not typically associated with confidence for perceptual decisions. Furthermore, activity in the VMPFC was functionally coupled with regions of the frontal cortex linked to perceptual decision-making and metacognition. Our results suggest that the VMPFC holds an early confidence representation arising from decision dynamics, preceding and potentially informing metacognitive evaluation.

https://doi.org/10.7554/eLife.38293.001

eLife digest

While waiting to cross the road on a foggy morning, you see a shape in the distance that appears to be an approaching car. How do you decide if it is safe to cross? We often have to make important decisions about the world based on imperfect information. What guides our subsequent actions in these situations is a sense of accuracy, or confidence, that we associate with our initial judgments. You would not step off the kerb if you were only 10% confident the car was a safe distance away. But how, when, and where in the brain does such confidence emerge?

Gherman and Philiastides examined how brain activity relates to confidence during the early stages of decision-making, that is, before people have explicitly committed to a particular choice. Healthy volunteers were asked to judge the direction in which dots were moving across a screen. They then had to rate how confident they were in their decision. Two techniques – EEG and fMRI – tracked their brain activity during the task. EEG uses scalp electrodes to reveal when and how electrical activity is changing inside the brain, while fMRI, a type of brain scan, shows where these changes in brain activity occur. Used together, the two techniques provide a greater understanding of brain activity than either used alone.

Activity in multiple regions of the brain correlated with confidence at different stages of the task. Certain brain networks showed confidence-related activity while the volunteers tried to judge the direction of movement, and others were engaged when volunteers made their confidence ratings. However, activity in only one area reliably indicated how confident the volunteers felt before they had made their choice. This area, the ventromedial prefrontal cortex, also helps process rewards. This suggests that feelings of confidence early in the decision-making process could guide our behaviour by virtue of being rewarding.

Many brain disorders – including depression, schizophrenia and Parkinson's disease – compromise decision-making. Patients show changes in accuracy, response times, and in their ability to accurately evaluate their decisions. The methods used in the current study could help reveal the neural changes that cause these impairments. This could lead to new methods to diagnose and predict cognitive deficits, and new ways to treat them at an earlier stage.

https://doi.org/10.7554/eLife.38293.002

Introduction

Our everyday lives involve situations where we must make judgments based on noisy or incomplete sensory information – for example deciding whether crossing the street on a foggy morning, in poor visibility, is safe. Being able to rely on an internal estimate of whether our perceptual judgments are accurate is fundamental to adaptive behaviour and accordingly, recent years have seen a growing interest in understanding the neural basis of confidence judgments.

Within the perceptual decision making field, several studies have sought to characterise the neural correlates of confidence during metacognitive evaluation (i.e., while subjects actively judge their performance following a choice), revealing the functional involvement of frontal networks, in particular the lateral anterior and anterior cingulate prefrontal cortices (Fleming et al., 2012; Hilgenstock et al., 2014; Morales et al., 2018). Concurrently, psychophysiological work in humans and non-human primates using time-resolved measurements has shown that confidence encoding can also be observed at earlier stages, and as early as the decision process itself (Kiani and Shadlen, 2009; Zizlsperger et al., 2014; Gherman and Philiastides, 2015).

In line with these latter observations, recent fMRI studies have reported confidence-related signals nearer the time of decision (e.g., during perceptual stimulation) in regions such as the striatum (Hebart et al., 2016), dorsomedial prefrontal cortex (Heereman et al., 2015), cingulate and insular cortices (Paul et al., 2015), and other areas of the prefrontal, parietal, and occipital cortices (Heereman et al., 2015; Paul et al., 2015). Interestingly, confidence-related processing has also been reported in the ventromedial prefrontal cortex (VMPFC) during value-based decisions and various ratings tasks (De Martino et al., 2013; Lebreton et al., 2015); however, the extent to which this region is additionally involved in perceptual judgments relying on temporal integration of sensory evidence remains unclear.

Importantly, the studies above suggest that confidence is likely to involve a temporal progression of neural events requiring the involvement of multiple networks, as opposed to a single event or quantity. Identifying neural confidence representations that arise early in the decision process (e.g., prior to metacognitive report or as early as the choice itself) is an important prerequisite in understanding the broader confidence-related dynamics, as these signals may provide the basis for higher-order and more deliberate processes such as metacognitive appraisal. Nevertheless, efforts to characterise early confidence representations in the human brain have been limited.

One potential limitation in previous approaches to studying the neural representations of confidence is the exclusive reliance on correlations with behavioural measures, most commonly in the form of subjective ratings given by participants after the decision (Grimaldi et al., 2015). However, theoretical and empirical work suggests that post-decisional metacognitive reports may be affected by processes occurring after termination of the initial decision (Resulaj et al., 2009; Pleskac and Busemeyer, 2010; Fleming et al., 2015; Moran et al., 2015; Murphy et al., 2015; Yu et al., 2015; Navajas et al., 2016; van den Berg et al., 2016; Fleming and Daw, 2017), such as integration of existing information, processing of novel information arriving post-decisionally, or decay (Moran et al., 2015), and may consequently be only partly reflective of early confidence-related states.

Here we aimed to derive a more faithful representation of these early confidence signals using EEG, and exploit the trial-by-trial variability in these signals to build parametric EEG-informed fMRI predictors, thus providing a starting point to a more comprehensive spatiotemporal account of decision confidence. We hypothesised that using an electrophysiologically-derived (i.e., endogenous) representation of confidence to detect associated fMRI responses would provide not only a more temporally precise, but also a more accurate spatial representation of confidence around the time of decision.

To test this hypothesis, we collected simultaneous EEG-fMRI data while participants performed a random-dot direction discrimination task and rated their confidence in each choice. Using a multivariate single-trial classifier to discriminate between High vs. Low confidence trials in the EEG data, we extracted an early, stimulus-independent discriminant component appearing prior to participants’ behavioural response. These early representations of confidence correlated across subjects with measures of confidence predicted by an accumulation-to-bound model of decision making. We then used the trial-to-trial variability in the resulting confidence signal as a predictor for the fMRI response, revealing a positive correlation within a region of the VMPFC not commonly associated with confidence for perceptual decisions. Crucially, activation of this region was unique to our EEG-informed fMRI predictor (i.e., additional to those detected with a conventional fMRI regressor, which relied solely on participants’ post-decisional confidence reports). Furthermore, a functional connectivity analysis revealed a link between activation in the VMPFC and regions of the prefrontal cortex involved in perceptual decision making and metacognition.

Results

Behaviour

Subjects (N = 24) performed a speeded perceptual discrimination task whereby they were asked to judge the motion direction of random dot kinematograms (left vs. right), and rate their confidence in each choice on a 9-point scale (Figure 1A). Stimulus difficulty (i.e., motion coherence) was held constant across all trials, at individually determined psychophysical thresholds. We found that on average, subjects indicated their direction decision 994 ms (SD = 172 ms) after stimulus onset and performed correctly on 75% (SD = 5.2%) of the trials. In providing behavioural confidence reports, subjects tended to employ the entire rating scale, showing that subjective confidence varied from trial to trial despite perceptual evidence remaining constant throughout the task (Figure 1B).

Figure 1. Experimental design and behavioural performance.

(A) Schematic representation of the behavioural paradigm. Subjects made speeded left vs. right motion discriminations of random dot kinematograms calibrated to each individual’s perceptual threshold. Stimulus difficulty (i.e., motion coherence) was held constant across trials. Stimuli were presented for up to 1.2 s, or until a behavioural response was made. After each direction decision, subjects rated their confidence on a 9-point scale (3 s). The response mapping for high vs. low confidence ratings alternated randomly across trials to control for motor preparation effects, and was indicated by the horizontal position of the scale, with the tall end representing high confidence. All behavioural responses were made on a button box, using the right hand. (B) Mean confidence rating behaviour, showing the frequency with which subjects selected each point on the confidence scale. (C) Mean proportion of correct direction choices as a function of reported confidence. (D) Mean response time as a function of reported confidence. Faint grey lines in (B), (C), and (D) indicate individual subject data. For (C) and (D) we excluded any trial averages based on fewer than five trials.

https://doi.org/10.7554/eLife.38293.003

As a general measure of validity of subjects’ confidence reports, we first examined the relationship with behavioural task performance. Specifically, confidence is largely known to scale positively with decision accuracy and negatively with response time (Vickers and Packer, 1982; Baranski and Petrusic, 1998), though this relationship is not perfect, and is subject to individual differences (Baranski and Petrusic, 1994; Fleming et al., 2010; Fleming and Dolan, 2012). As expected, we found a positive correlation with accuracy (subject-averaged R = 0.30; one-sample t-test, t(23) = 13.9, p<0.001) (Figure 1C), and a negative correlation with response time (subject-averaged R = −0.27; one-sample t-test, t(23) = −7.8, p<0.001) (Figure 1D). Thus, subjects’ confidence ratings were generally reflective of their performance on the perceptual decision task.
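This section, and several later analyses, follow the same group-level logic. A correlation is computed within each subject, and the subject-level coefficients are then tested against zero with a one-sample t-test. The sketch below illustrates this on simulated data, not the study's data. The function names are our own, and p-values are omitted to avoid a SciPy dependency.

```python
import numpy as np

def per_subject_r(xs_by_subject, ys_by_subject):
    """Pearson R between two trial-wise measures, computed separately for each subject."""
    return np.array([np.corrcoef(x, y)[0, 1]
                     for x, y in zip(xs_by_subject, ys_by_subject)])

def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for testing mean(values) against mu."""
    values = np.asarray(values, dtype=float)
    n = values.size
    t = (values.mean() - mu) / (values.std(ddof=1) / np.sqrt(n))
    return t, n - 1

# Toy data (assumption, not the study's): 24 simulated subjects, 200 trials each,
# where correctness loosely tracks a 9-point confidence rating.
rng = np.random.default_rng(0)
ratings = [rng.integers(1, 10, size=200) for _ in range(24)]
accuracy = [(r + rng.normal(0, 4, size=r.size) > 5).astype(float) for r in ratings]
rs = per_subject_r(ratings, accuracy)
t, df = one_sample_t(rs)
print(f"subject-averaged R = {rs.mean():.2f}, t({df}) = {t:.1f}")
```

With binary accuracy, the per-subject R is a point-biserial correlation.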

Next, we asked whether the observed variability in subjects’ confidence reports could be explained by sustained fluctuations in attention (i.e., spanning multiple trials). We reasoned that decreases in attention may be reflected as serial correlations in confidence ratings across trials. To test this possibility, we performed a serial autocorrelation regression analysis on a single subject basis, which predicted confidence ratings on the current trial from ratings given on the immediately preceding five trials. On average, this model accounted for only a minimal fraction of the variance in confidence ratings (subject-averaged R2 = 0.07). Finally, we sought to rule out the possibility that trial-to-trial variability in confidence could be explained by potential subtle differences in low-level physical properties of the stimulus that may go beyond motion coherence (e.g., location and/or timing of individual dots). To this end, we compared subjects’ confidence reports on the two experimental blocks (consisting of identical sequences of random-dot kinematograms), and found no significant correlation between these (subject-averaged R = 0.02, one-sample t-test, p=0.44). Taken together, these results support the hypothesis that subjects’ reports reflected internal fluctuations in their sense of confidence, which are largely unaccounted for by external factors.
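A minimal sketch of the serial autocorrelation check is given below. It assumes an ordinary least-squares model with an intercept and the five preceding values as predictors; the authors' exact specification is given in their Materials and methods, and `lagged_r2` is a hypothetical name. The same function could be applied to single-trial EEG component amplitudes.

```python
import numpy as np

def lagged_r2(series, n_lags=5):
    """R^2 of an OLS model predicting series[t] from its n_lags preceding values."""
    x = np.asarray(series, dtype=float)
    # Column k-1 holds the value k trials back (x[t-k]) for every predicted trial t.
    lags = np.column_stack([x[n_lags - k:len(x) - k] for k in range(1, n_lags + 1)])
    design = np.column_stack([np.ones(len(lags)), lags])   # assumption: intercept + lags
    target = x[n_lags:]
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ beta
    return 1.0 - residuals.var() / target.var()

rng = np.random.default_rng(1)
independent = rng.integers(1, 10, size=400).astype(float)   # ratings with no carry-over
drifting = np.zeros(400)                                     # slow, attention-like drift
for t in range(1, 400):
    drifting[t] = 0.9 * drifting[t - 1] + rng.normal()
print(round(lagged_r2(independent), 3), round(lagged_r2(drifting), 3))
```

Independent ratings give an R² near zero, while a slowly drifting series gives a large one. This contrast motivates reading a small R² as evidence against sustained attentional fluctuations.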

EEG-derived measure of confidence

To identify confidence-related signals in the EEG data, we first separated trials into three confidence groups (Low, Medium, and High) on the basis of subjects’ confidence ratings. We then conducted a single-trial multivariate classifier analysis (Parra et al., 2005; Sajda et al., 2009) on the stimulus-locked EEG data, designed to estimate linear spatial weightings of the EEG sensors (i.e., spatial projections) discriminating between Low- vs. High-confidence trials (see Materials and methods). Applying the estimated electrode weights to single-trial data produced a measurement of the discriminating component amplitudes (henceforth yCONF), which represent the distance of individual trials from the discriminating hyperplane, and which we treat as a surrogate for the neural confidence of the decision.

Note that even though participants’ post-decision ratings may not form an entirely faithful representation of earlier confidence signals, they can nevertheless be used to separate trials into broad confidence groups for training the classifier and estimating the relevant discrimination weights at the time of decision. Data from individual trials, including those not originally used in the discrimination analysis, were subsequently projected through these electrode weights to obtain a trial-specific, graded measure of internal confidence. In other words, these electrophysiologically-derived confidence measures depart from their behavioural counterparts in that they contain trial-to-trial information from the neural generator giving rise to the relevant discriminating components. As such, these estimates can potentially offer additional insight into the internal processes that underlie confidence at these early stages of the decision.
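The discriminant step can be sketched as logistic regression on electrode values within one time window, in the spirit of Parra et al. (2005). This is a simplified illustration under stated assumptions:

- the data are synthetic;
- fitting uses Newton-Raphson;
- a ridge penalty is added by us for numerical stability.

The study's window averaging, regularisation, and preprocessing are described in Materials and methods, and all names here are hypothetical. Only Low and High trials are used for training. The fitted weights are then applied to the held-out Medium trials.

```python
import numpy as np

def fit_spatial_discriminator(X, labels, n_iter=50, ridge=1.0):
    """Logistic regression on X (trials x electrodes, one time window).

    Returns weights w and bias c; y = X @ w + c is the discriminating component
    amplitude, a signed, distance-like score relative to the hyperplane.
    """
    n, d = X.shape
    A = np.column_stack([X, np.ones(n)])           # append a bias column
    theta = np.zeros(d + 1)
    for _ in range(n_iter):                        # Newton-Raphson (IRLS) steps
        p = 1.0 / (1.0 + np.exp(-(A @ theta)))     # P(High | trial)
        grad = A.T @ (labels - p) - ridge * theta  # assumption: ridge penalty, bias included
        hess = A.T @ (A * (p * (1 - p))[:, None]) + ridge * np.eye(d + 1)
        theta += np.linalg.solve(hess, grad)
    return theta[:-1], theta[-1]

# Synthetic example: a hypothetical 32-electrode topography scaled by confidence.
rng = np.random.default_rng(2)
n_elec = 32
pattern = rng.normal(size=n_elec)

def simulate(n_trials, level):                     # level 0 = Low, 0.5 = Medium, 1 = High
    return level * pattern + rng.normal(scale=3.0, size=(n_trials, n_elec))

low, medium, high = simulate(150, 0.0), simulate(150, 0.5), simulate(150, 1.0)
X_train = np.vstack([low, high])                   # Medium trials are held out of training
labels = np.r_[np.zeros(len(low)), np.ones(len(high))]
w, c = fit_spatial_discriminator(X_train, labels)

# Project every trial, including the unseen Medium ones, through the same weights.
y_conf = {name: data @ w + c for name, data in
          [("low", low), ("medium", medium), ("high", high)]}
print({name: round(float(y.mean()), 2) for name, y in y_conf.items()})
```

Because the simulated signal scales linearly with confidence, the Medium trials land between Low and High even though the classifier never saw them.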

To quantify the discriminator's performance over time we used the area under a receiver operating characteristic curve (i.e., Az value) with a leave-one-out trial cross validation approach to control for overfitting (see Materials and methods).
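Az can be computed as the Mann-Whitney estimate of the area under the ROC curve. Leave-one-out cross-validation retrains the discriminator once per held-out trial and scores only that trial. In the sketch below, `mean_difference_fit` is a hypothetical stand-in for the logistic-regression discriminator, used only to keep the example short. The data are synthetic.

```python
import numpy as np

def az(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (pos.size * neg.size)

def leave_one_out_az(X, labels, fit):
    """Az where each trial is scored by a discriminator trained without that trial."""
    labels = np.asarray(labels)
    n = len(labels)
    held_out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        w, c = fit(X[keep], labels[keep])
        held_out[i] = X[i] @ w + c
    return az(held_out, labels)

def mean_difference_fit(X, labels):
    """Hypothetical stand-in discriminator: weights = High mean minus Low mean."""
    mu_high, mu_low = X[labels == 1].mean(axis=0), X[labels == 0].mean(axis=0)
    w = mu_high - mu_low
    return w, -w @ (mu_high + mu_low) / 2

rng = np.random.default_rng(3)
pattern = rng.normal(size=16)
labels = np.r_[np.zeros(100), np.ones(100)]
X = labels[:, None] * pattern + rng.normal(scale=1.5, size=(200, 16))
print(round(leave_one_out_az(X, labels, mean_difference_fit), 2))
```

Az = 0.5 indicates chance discrimination and Az = 1 indicates perfect separation.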

We found that discrimination performance (Az) between the two confidence trial groups peaked, on average, 708 ms after stimulus onset (SD = 162 ms, Figure 2A; see Figure 2—figure supplement 1 for Az locked to the time of rating). To visualise the spatial extent of this confidence component, we computed a forward model of the discriminating activity (Materials and methods), which can be represented as a scalp map (Figure 2A). Importantly, both the temporal profile and electrode distribution of confidence-related discriminating activity were consistent with our previous work (Gherman and Philiastides, 2015) where we used stand-alone EEG to identify time-resolved signatures of confidence during a face vs. car visual categorisation task. Together these observations are an indication that the temporal dynamics of decision confidence can be reliably captured using EEG data acquired inside the MR scanner, and that these early confidence-related signals may generalise across tasks.
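The forward model summarises how strongly each electrode covaries with the discriminating component, which is what the scalp map displays. The sketch below assumes the least-squares form a = Xy/(yᵀy) used in this line of work. It is written for a trials × electrodes matrix, hence the transpose, and uses synthetic data.

```python
import numpy as np

def forward_model(X, y):
    """Scalp projection a of a component y: the least-squares solution of X ≈ y a^T.

    X: trials x electrodes; y: one component amplitude per trial.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return X.T @ y / (y @ y)

# Exact recovery when the data contain only the component: returns [0.5, -1.0].
a_exact = forward_model(np.outer([1.0, 2.0, 3.0], [0.5, -1.0]), [1.0, 2.0, 3.0])

# Noisy synthetic case: a hypothetical 32-electrode topography plus sensor noise.
rng = np.random.default_rng(4)
topography = rng.normal(size=32)
y = rng.normal(size=500)
X = np.outer(y, topography) + rng.normal(size=(500, 32))
a = forward_model(X, y)
print(a_exact, round(float(np.corrcoef(a, topography)[0, 1]), 3))
```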

Figure 2. Neural representation of confidence in the EEG.

(A) Classifier performance (Az) during High- vs. Low-confidence discrimination for stimulus-locked data. Each row represents the Az as a function of time, for a single subject (warm colours indicate higher values). The overlaid line (orange) shows the mean classifier performance across subjects. Outlined in white are the pre-response time windows of peak confidence discrimination used subsequently to extract single-trial measures of confidence (i.e., discriminant component amplitudes). In selecting these, we considered only discrimination periods ending, on average, at least 100 ms (across-subject mean 271 ± 162 ms) before subjects’ mean response times. This minimised potential confounds with activity related to motor execution, given the sudden increase in corticospinal excitability in this period (Chen et al., 1998). Inset shows the average (normalised) topography associated with the discriminating component at subject-specific times of peak confidence discrimination. (B) Mean amplitude of the confidence discriminant component as a function of reported confidence, showing a parametric effect across the Low, Medium, and High bins. The mean component amplitudes for individual confidence ratings (weighted by each subject’s trial count per rating) are also shown (inset). (C) Trial-by-trial confidence discriminant component amplitudes were positively correlated with accuracy. To visualise this relationship, single-trial component amplitudes were grouped into five bins. (D) Mean amplitude of the confidence discriminant component for correct vs. error responses, showing a significant effect of choice accuracy. (E) Mean amplitude of the confidence discriminant component as a function of reported confidence, for correct trials only (to control for accuracy). The same pattern as in (B) is observed. (F) Mean amplitudes of the confidence discriminant component did not differ significantly between trials associated with High vs. Low prestimulus oscillatory power in the alpha band (which we used as a proxy for subjects’ prestimulus attentional state). (G) Relationship between the strength of electrophysiological confidence signals on the current trial (i.e., confidence-discriminating component amplitudes) and the tendency to repeat a choice on the immediately subsequent trial, for trial pairs showing stimulus motion in the same direction (i.e., nominally identical stimuli). Faint orange (in B) and grey lines (in C–G) represent individual subject data.

https://doi.org/10.7554/eLife.38293.004

To provide additional support linking this discriminating component to choice confidence, we considered the Medium-confidence trials. Importantly, these trials can be regarded as ‘unseen’ data, as they are independent from those used to train the classifier. We projected these trials through the same neural generators (i.e., spatial projections) estimated during discrimination of High- vs. Low-confidence trials. As expected from a graded quantity, the mean component amplitudes for Medium-confidence trials were situated between, and significantly different from, those in the High- and Low-confidence trial groups (both p<0.001, Figure 2B). To ensure these results were not due to overfitting, we also repeated the above comparisons using fully out-of-sample discriminant component amplitudes obtained from our leave-one-out cross-validation procedure (see Materials and methods), and found that the differences remained significant (both p<0.001, Figure 2—figure supplement 2).

We next examined the relationship between the confidence-discriminating component and objective performance on the perceptual discrimination task. We found that component amplitudes were positively correlated with decision accuracy (one-sample t-test on logistic regression coefficients, t(23)=8.6, p<0.001, Figure 2C), and were consistently higher for correct vs. incorrect responses across subjects (t(23)=7.58, p<0.001, Figure 2D), in line with the well-established relationship between confidence and accuracy. To rule out the possibility that the modulation of discriminant component amplitude by confidence was purely explained by objective performance, we compared component amplitudes for Medium-confidence against High-/Low-confidence using only trials associated with correct responses, and showed that differences between these trial groups remained significant (both p<0.001, Figure 2E). The same pattern was found when repeating the analysis separately on error trials (both p<0.001). These results indicate that the confidence-related neural component can be dissociated from objective performance, as might be expected from previous reports (Lau and Passingham, 2006; Rounis et al., 2010; Komura et al., 2013; Lak et al., 2014; Fleming and Daw, 2017).

As the duration of the visual motion stimulus varied across trials in our task (i.e., the stimulus remained on until subjects made a motor response on the perceptual task), another potential concern is that the variability in the EEG-derived confidence signatures we identified here could be explained by this stimulus-related factor. We reasoned that if that were the case, we would expect a strong correlation between stimulus duration and discriminant component amplitudes. However, we found that this correlation was weak (subject-averaged R = −0.15), suggesting that our classification results could not have been solely driven by this factor.

Finally, we addressed the possibility that the observed variability in the confidence discriminating component could be attributed to sustained fluctuations in attention, by conducting a serial autocorrelation analysis which predicted component amplitudes on a given trial from those on the preceding five trials (separately for each subject). As before, we expected that if attentional fluctuations are driving the variability in our EEG-derived confidence measures, component amplitudes on a given trial would be reliably predicted by those observed in the immediately preceding trials. We found that this model only explained a small fraction of the variance in component amplitudes (subject-averaged R2 = 0.03).

We also assessed the influence of a neural signal known to correlate with attention (Thut et al., 2006) and predict visual discrimination (van Dijk et al., 2008), namely occipitoparietal prestimulus alpha power. To do this, we separated trials into High vs. Low alpha power groups, individually for each subject, and compared the corresponding average discriminant component amplitudes. We found that these did not differ significantly between the two groups (paired t-test, p=0.19, Figure 2F). Note that variability in the confidence discriminant component was also independent of stimulus difficulty, as this was held constant across all trials. In line with this, discriminant component amplitudes for the two identical-stimulus experimental blocks were not significantly correlated (subject-averaged R = 0.02; one-sample t-test, p=0.39).

Confidence-dependent influences on behaviour

We next sought to identify potential influences of neural confidence signals on decision-related behaviour. In particular, there is evidence that confidence, as reflected in behavioural (Braun et al., 2018) and physiological (Urai et al., 2017) correlates, can play a role in the modulation of history-dependent choice biases. Here, we tested whether the strength of our EEG-derived confidence signals (i.e., confidence discriminant component amplitude yCONF) on a given trial might influence the probability of repeating a choice on the immediately subsequent trial (PREPEAT). We observed no significant overall link between yCONF and subsequent choice behaviour when considering the entire data set. However, we found a positive relationship between yCONF and PREPEAT if stimulus motion on the immediately subsequent trial was in the same direction as in the current trial (F(2,46)=5.89, p=.005, with post-hoc tests showing a significant difference in PREPEAT following Low vs. High yCONF trials, p=.015, Bonferroni corrected), as shown in Figure 2G. Thus, stronger confidence signals were associated with an increased tendency to repeat the previous choice.
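The repetition analysis can be sketched for a single subject as follows. The sketch makes three assumptions:

- yCONF is split into terciles, consistent with the three levels implied by the F(2,46) test;
- choices and motion directions are coded as integers;
- only trial pairs with the same motion direction on trials t and t+1 are kept.

The function name is hypothetical, and the group-level F test is not shown.

```python
import numpy as np

def p_repeat_by_confidence(choices, directions, y_conf, n_bins=3):
    """P(choice on trial t+1 repeats choice on trial t), per quantile bin of y_conf on trial t.

    Only trial pairs with the same motion direction on t and t+1 are used.
    """
    choices = np.asarray(choices)
    directions = np.asarray(directions)
    y_conf = np.asarray(y_conf, dtype=float)
    same_direction = directions[1:] == directions[:-1]
    repeat = (choices[1:] == choices[:-1])[same_direction]
    y = y_conf[:-1][same_direction]                  # confidence signal on the earlier trial
    # Assumption: quantile (tercile) bin edges computed within the subject.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_bins - 1)
    return np.array([repeat[bins == b].mean() for b in range(n_bins)])

# Toy example: 7 trials, all with the same motion direction (coded 1).
# Low, Medium, High yCONF bins give 0.0, 0.5, 1.0 here.
print(p_repeat_by_confidence(choices=[0, 0, 0, 1, 1, 0, 1],
                             directions=[1] * 7,
                             y_conf=[5, 6, 1, 4, 2, 3, 0]))
```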

In contrast, we did not find any modulatory effect of yCONF on choice repetition/alternation behaviour when motion on the current trial was in the opposite direction from that of the previous trial. Thus, choices were only affected by previous confidence when no global change in motion direction had occurred from one trial to the next. Interestingly, this dependence of confidence-related repetition bias on stimulus identity points to a mechanism by which the representation of confidence interacts with a putative process of (subliminal) stimulus-consistency detection (distinguishable from the decision process itself) on the subsequent trial, to influence the decision and/or behaviour.

Dynamic model of decision making

To seek preliminary insight into how our confidence-related EEG measure relates to the decision formation process, we compared our neural signals with a measure of confidence derived from a dynamic model of decision making. Namely, we fitted subjects’ behavioural data (i.e., accuracy and response time) with an adapted version of the race model (Vickers, 1979; Vickers and Packer, 1982; De Martino et al., 2013) (see Materials and methods). This class of models describes the decision process as a stochastic accumulation of perceptual evidence over time by independent signals representing the possible choices (Figure 3A). The decision terminates when one of the accumulators reaches a fixed threshold, with choice being determined by the winning accumulator. Importantly, confidence for binary choices can be estimated in these models as the absolute distance (Δe) between the states of the two accumulators at the time of decision (i.e., ‘balance of evidence’ hypothesis).
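The balance-of-evidence idea can be illustrated with a simplified simulation. It assumes two independent accumulators with Gaussian increments, a shared fixed threshold, and arbitrary (not fitted) drift, threshold, and noise values. The authors' adapted race model (see Materials and methods) may differ in its parameterisation and details. Confidence is read out as Δe, the absolute gap between the two accumulators when the first one reaches the bound.

```python
import numpy as np

def simulate_race(n_trials, drift_correct, drift_error, threshold,
                  noise_sd=1.0, dt=0.001, max_t=2.0, seed=0):
    """Simulate a two-accumulator race; return (correct, rt, delta_e) for trials that terminate.

    delta_e is the absolute difference between the two accumulators at the moment
    the first one reaches the threshold ('balance of evidence').
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(max_t / dt))
    drifts = np.array([drift_correct, drift_error])        # [correct, error] accumulators
    increments = drifts * dt + noise_sd * np.sqrt(dt) * rng.normal(size=(n_trials, n_steps, 2))
    e = np.cumsum(increments, axis=1)                       # accumulated evidence over time
    crossed = e.max(axis=2) >= threshold                    # either accumulator at/above bound
    decided = crossed.any(axis=1)
    first = crossed.argmax(axis=1)                          # index of first crossing
    state = e[np.arange(n_trials), first]                   # both accumulators at decision time
    correct = state[:, 0] > state[:, 1]                     # the winning accumulator is the larger
    rt = (first + 1) * dt
    delta_e = np.abs(state[:, 0] - state[:, 1])
    return correct[decided], rt[decided], delta_e[decided]

# Assumption: illustrative parameter values only.
correct, rt, delta_e = simulate_race(500, drift_correct=1.5, drift_error=0.5, threshold=1.0)
print(f"accuracy = {correct.mean():.2f}")
print(f"mean delta_e: correct = {delta_e[correct].mean():.2f}, error = {delta_e[~correct].mean():.2f}")
```

In this toy race, Δe tends to be larger on correct than on error trials. On error trials the losing accumulator has the higher drift and is usually close to the bound. This is the qualitative pattern compared against yCONF later in this section.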

Figure 3. Modelling results.

(A) Schematic representation of the decision model for one trial. Evidence in favour of the two choice alternatives (here, leftward and rightward motion) accumulates gradually over time. A decision is made when one of the accumulators reaches a decision threshold (θ). The model quantifies confidence as the absolute difference in the accumulated evidence for the two options, at the time of decision (Δe). (B) Correlation between behavioural vs. model-predicted choice accuracy. Each point represents trial-averaged data for one subject. (C) Behavioural (circles) and model-predicted (crosses) response time distribution. On the x axis from left to right, data points represent the RT below which 10%, 30%, 50%, 70% and 90% of the data, respectively, are situated. The y axis shows the associated proportion of data for correct (upper symbols) and incorrect (bottom symbols) responses. (D) Across-subject correlation between the model-predicted and neurally observed relationship of confidence with choice accuracy (quantified as the difference in confidence estimates between correct and error trials). Each dot represents data for one subject.

https://doi.org/10.7554/eLife.38293.007

Overall, we found that this model provided a good fit to the behavioural data (Accuracy: R = 0.76, p<0.001, Figure 3B; RT: subject-averaged R = 0.965, all p≤0.0016; see Figure 3—figure supplement 1 for individual subject fits). We illustrate model fits to response time data in Figure 3C (see Figure 3—figure supplement 2 for individual subject fits), whereby response time distributions for correct and error trials are summarised separately using five quantile estimates of the associated cumulative distribution functions (Forstmann et al., 2008).
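The quantile summary in Figure 3C can be sketched as follows. The sketch assumes that each plotted y-value is the quantile level scaled by the proportion of correct or error trials (a defective CDF), which is one standard way of drawing such plots. The function name is hypothetical. The same summary could be applied to simulated trials to compare model and data.

```python
import numpy as np

QUANTILES = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

def quantile_summary(rt, correct):
    """RT quantiles per response type, with y-values scaled by that type's proportion of trials."""
    rt = np.asarray(rt, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    summary = {}
    for label, mask in (("correct", correct), ("error", ~correct)):
        summary[label] = {
            "quantiles": np.quantile(rt[mask], QUANTILES),   # x-values: RT below which q of trials fall
            "proportion": mask.mean(),                       # share of all trials of this type
            "cumulative": mask.mean() * QUANTILES,           # assumption: y-values (defective CDF)
        }
    return summary

s = quantile_summary(np.arange(1, 11), [True] * 8 + [False] * 2)
print(s["correct"]["quantiles"], s["error"]["cumulative"])
```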

Here, we were interested in how our neural measures of confidence (EEG-derived discriminant component yCONF) compared against the confidence estimates predicted by the decision model (Δe), at the subject group level. To this end, we computed the mean difference in confidence (as reflected by yCONF and Δe, respectively) between correct and error trials, separately for each subject, and tested the extent to which these quantities were correlated across participants. This relative measure, which captured the relationship between confidence and choice accuracy, also ensured that comparisons across subjects remained meaningful after averaging across trials. We found a significant positive correlation: subjects who showed a larger difference in yCONF between correct and error trials also showed a larger difference in Δe (R=.48, p=.019, robust correlation coefficient obtained using the percentage bend correlation analysis (Wilcox, 1994); see Figure 3D). This result opens the possibility that neural confidence signals might be informed by a process similar to the race-like dynamic implemented by the current model.
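This across-subject test can be sketched in two steps. First, each subject is reduced to a correct-minus-error difference. Second, those differences are correlated with a percentage bend correlation. The sketch follows our reading of Wilcox's (1994) estimator with bend constant β = 0.2. Implementations differ slightly in how the index m is rounded, and β and the function names are assumptions, not taken from the study. Significance can then be assessed with t = r√((n − 2)/(1 − r²)) on n − 2 degrees of freedom.

```python
import numpy as np

def correct_minus_error(values, correct):
    """Per-subject summary: mean of a trial-wise measure on correct minus error trials."""
    values = np.asarray(values, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    return values[correct].mean() - values[~correct].mean()

def percentage_bend_corr(x, y, beta=0.2):
    """Percentage bend correlation between two equally long samples (beta is an assumption)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    def bend(v):
        med = np.median(v)
        m = int(np.floor((1 - beta) * n + 0.5))          # rank of the bending constant
        omega = np.sort(np.abs(v - med))[m - 1]          # m-th smallest absolute deviation
        z = (v - med) / omega
        i1, i2 = np.sum(z < -1), np.sum(z > 1)           # points beyond the bend on each side
        s = v[(z >= -1) & (z <= 1)].sum()
        phi = (omega * (i2 - i1) + s) / (n - i1 - i2)    # percentage bend location estimate
        return np.clip((v - phi) / omega, -1.0, 1.0)     # bounded (Huber-type) scores

    a, b = bend(x), bend(y)
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

x = np.arange(10.0)
print(percentage_bend_corr(x, 2 * x + 1))  # r = 1 (up to rounding) for an exact linear relation
```

Because the scores are clipped at ±1, a single extreme subject cannot dominate the coefficient, which is the motivation for using a robust correlation with small samples.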

Exploratory mediation analysis

We sought to further clarify the link between model-derived confidence estimates (Δe), early neural signatures of confidence (yCONF), and subjects’ behavioural reports during the rating phase of the trial (Ratings), by performing an exploratory mediation analysis on these measures. We hypothesised that yCONF may be informed by quantities equivalent to Δe, and in turn influence the confidence estimates reflected in post-choice reports. Thus, we tested whether yCONF may act as a statistical mediator on the link between Δe and Ratings. As with our previous analysis linking yCONF and Δe (Figure 3D), we first computed the mean difference between correct and error trials for each of the three variables of interest, to produce comparable measures across subjects (i.e., by removing potentially task-irrelevant individual differences in the trial-averaged scores, such as rating biases). These quantities (henceforth referred to as ΔeDIFF, yCONF_DIFF, and RatingsDIFF) were then submitted to the mediation analysis.

Specifically, we defined a three-variable path model (Wager et al., 2008) with ΔeDIFF as the predictor variable, RatingsDIFF as the dependent variable, and yCONF_DIFF as the mediator (Materials and methods). In line with our prediction, we found that: 1) ΔeDIFF was a significant predictor of yCONF_DIFF (p=.01), 2) yCONF_DIFF reliably predicted RatingsDIFF after accounting for the effect of predictor ΔeDIFF (p<.001), and 3) the indirect effect of yCONF_DIFF, defined as the coefficient product of effects 1) and 2), was also significant (p=.004). While the across-subject nature of the analysis calls for caution in interpreting the results, these observations are consistent with the possibility that yCONF reflects a (potentially noisy) readout of decision-related balance of evidence (as modelled by Δe), and informs eventual confidence reports.
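The three-variable path model can be sketched with ordinary least squares on the per-subject difference scores. It estimates path a (predictor to mediator) and path b (mediator to outcome, controlling for the predictor). The indirect effect a×b is tested with a percentile bootstrap over subjects. This follows the general approach of Wager et al. (2008), but the bootstrap settings, p-value definition, toy data, and function names are our assumptions.

```python
import numpy as np

def ols(y, *predictors):
    """OLS coefficients for y ~ 1 + predictors (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mediation(x, m, y, n_boot=2000, seed=0):
    """Paths a (x -> m), b (m -> y given x), indirect effect a*b, and a bootstrap p-value."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    a = ols(m, x)[1]
    b = ols(y, m, x)[1]
    rng = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))          # resample subjects with replacement
        boot[i] = ols(m[idx], x[idx])[1] * ols(y[idx], m[idx], x[idx])[1]
    # Assumption: two-sided percentile-bootstrap p-value (twice the smaller tail beyond zero).
    p = min(1.0, 2 * min((boot <= 0).mean(), (boot >= 0).mean()))
    return a, b, a * b, p

# Toy data for 24 hypothetical subjects (not the study's values).
rng = np.random.default_rng(5)
delta_e_diff = rng.normal(size=24)                                   # predictor
y_conf_diff = 2.0 * delta_e_diff + 0.1 * rng.normal(size=24)         # mediator
ratings_diff = 3.0 * y_conf_diff + 0.5 * delta_e_diff + 0.1 * rng.normal(size=24)
a, b, ab, p = mediation(delta_e_diff, y_conf_diff, ratings_diff)
print(f"a = {a:.2f}, b = {b:.2f}, a*b = {ab:.2f}, p = {p:.3f}")
```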

fMRI correlates of confidence

We sought primarily to identify fMRI activations correlating uniquely with the endogenous signatures of confidence at the time of the perceptual decision, as obtained from our EEG discrimination analysis. In particular, we were interested in confidence-related variability in the fMRI response that might be over and above what can be inferred from behavioural confidence reports alone. To this end, we constructed a general linear model (GLM; see Materials and methods) of the fMRI data. The model used an EEG-derived regressor for confidence (yCONF) together with additional regressors accounting for variance related to subjects’ behavioural confidence reports (i.e., ratings) and other potentially confounding factors (task performance, response time, attention, and visual stimulation).

fMRI correlates of behavioural confidence reports. We first investigated the activation patterns associated with confidence ratings during the perceptual decision phase of the trial (Figure 4A), defined as the time window beginning at the onset of the random-dot stimulus and ending before the onset of the confidence rating prompt. The coordinates of all activations are listed in Supplementary Table 1 (Supplementary file 1). We found that the BOLD response increased with reported confidence in the striatum, lateral orbitofrontal cortex (OFC) and ventral anterior cingulate cortex (ACC), areas thought to play a role in human valuation and reward (O'Doherty, 2004; Rushworth et al., 2007; Grabenhorst and Rolls, 2011), as well as in the right anterior middle frontal gyrus, amygdala/hippocampus, and visual association areas. Overall, these activations appear consistent with previous studies that have identified spatial correlates of decision confidence (Rolls et al., 2010; De Martino et al., 2013; Heereman et al., 2015; Hebart et al., 2016). Negative activations (i.e., regions whose BOLD response increased as reported confidence decreased) were found in the right supplementary motor area, dorsomedial prefrontal cortex, right inferior frontal gyrus (IFG), and anterior insula/frontal operculum, in line with previous reports of decision uncertainty near the time of decision (Heereman et al., 2015; Hebart et al., 2016).

Figure 4. Parametric modulation of the BOLD signal by reported confidence.

(A) Clusters showing positive correlation with confidence during the decision phase of the trial. (B) Clusters showing negative correlation with confidence at the onset of the rating cue (i.e., rating phase). All results are reported at |Z| ≥ 2.57, and cluster-corrected using a resampling procedure (minimum cluster size 162 voxels; see Materials and methods). Ang Gyr, angular gyrus; Ant Ins, anterior insula; IFG (orb), inferior frontal gyrus (orbital region); LOFC, lateral orbitofrontal cortex; MedFG, medial frontal gyrus; MidFG, middle frontal gyrus; NAcc, nucleus accumbens; pgACC, pregenual anterior cingulate cortex; RLPFC, rostrolateral prefrontal cortex; SFG, superior frontal gyrus. The complete lists of activations are shown in Supplementary Tables 1 and 2 (Supplementary file 1).

https://doi.org/10.7554/eLife.38293.010

During the metacognitive report stage of the trial (i.e., the 'rating phase', defined as the time window beginning at the onset of the confidence prompt; Figure 4B), we found negative correlations with confidence ratings in extended networks (Supplementary Table 2; Supplementary file 1), which included regions of the rostrolateral prefrontal cortex (bilateral, right-lateralised), middle frontal gyrus, superior frontal gyrus (extending along the cortical midline and into the medial prefrontal cortex), orbital regions of the IFG, angular gyrus, precuneus, posterior cingulate cortex (PCC), and regions of the occipital and middle temporal cortices. These activations are largely in line with research on the spatial correlates of choice uncertainty (Grinband et al., 2006; Fleming et al., 2012) and metacognitive evaluation (Fleming et al., 2010; Molenberghs et al., 2016). Finally, positive correlations were observed in the striatum and amygdala/hippocampus, as well as in motor cortices.

fMRI correlates of EEG-derived confidence signals. To identify potential brain regions encoding early representations of confidence as captured by our confidence-discriminating EEG component, we turned to the parametric EEG-derived fMRI regressor (i.e., yCONF regressor), which captured the inherent single-trial variability in these signals. Our approach therefore allowed us to model the fMRI response using time-resolved neural signatures of confidence, which were specific to each subject. Crucially, as these measures captured the variability in the neural representation of confidence near the time of the perceptual decision itself (i.e., prior to behavioural response), they may be better suited for spatially characterising confidence during this time window compared to the behavioural confidence reports obtained later on in the trial (as the latter may be more reflective of confidence-related information arriving post-decisionally). Note that these signals were only moderately correlated with reported confidence (subject-averaged R=.39, SD=.07), and thus could potentially provide additional explanatory power in our fMRI model.

This EEG-informed fMRI analysis revealed a large cluster in the ventromedial prefrontal cortex (VMPFC, peak MNI coordinates [−8 40 −14]), extending into the subcallosal region and ventral striatum, and a smaller cluster in the right precentral gyrus (peak MNI coordinates [30 −20 64]), where the BOLD response correlated positively with the EEG-derived confidence-discriminating component (Figure 5). The VMPFC has been linked to confidence-related processes in value-based, as well as other complex, decisions (De Martino et al., 2013; Lebreton et al., 2015); however, this region is not typically associated with confidence in perceptual decisions (though see Heereman et al., 2015; Fleming et al., 2018).

Figure 5.
Positive parametric modulation of the BOLD signal by an EEG-derived single-trial confidence measure (see Materials and methods), during the decision phase of the trial.

Results are reported at |Z|≥2.57, and cluster-corrected using a resampling procedure (minimum cluster size 162 voxels). Bottom right: Time course of VMPFC BOLD response, showing parametric modulation by neural confidence (presented for illustration purposes only). Trials are separated by the strength of confidence-discriminating component amplitudes (yCONF). VMPFC, ventromedial prefrontal cortex.

https://doi.org/10.7554/eLife.38293.011

Note also that, because regression parameter estimates from a standard GLM analysis reflect variability unique to each regressor (i.e., they disregard shared variability; Mumford et al., 2015), the correlation we observed between the EEG-derived yCONF regressor and the VMPFC during the perceptual decision period is over and above what can be explained by behavioural confidence ratings alone (i.e., the RatingsDEC regressor, Figure 4A). Consistent with this, the correlation between the RatingsDEC regressor and activity in the relevant VMPFC cluster (including in a supplementary GLM analysis in which the yCONF regressor was removed) failed to pass statistical thresholding; this activity would therefore have been missed using behavioural ratings alone.
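The "unique variance" property can be illustrated with the Frisch–Waugh–Lovell result: in a joint OLS fit, the coefficient on one regressor equals the slope obtained after first removing from that regressor everything linearly explained by the other regressors. The sketch below uses a two-regressor case with hypothetical names (bold, ratings, yconf) and made-up values; it is an illustration of the general GLM property, not of the study's actual design matrix.

```python
from statistics import fmean


def centre(v):
    m = fmean(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def joint_betas(y, x1, x2):
    """OLS slopes of y on x1 and x2 fitted jointly (intercept handled by centring)."""
    x1c, x2c, yc = centre(x1), centre(x2), centre(y)
    s11, s22, s12 = dot(x1c, x1c), dot(x2c, x2c), dot(x1c, x2c)
    s1y, s2y = dot(x1c, yc), dot(x2c, yc)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def residualise(x, on):
    """Part of x that is not linearly explained by `on` (centred residuals)."""
    xc, oc = centre(x), centre(on)
    slope = dot(oc, xc) / dot(oc, oc)
    return [a - slope * b for a, b in zip(xc, oc)]


def simple_slope(y, x):
    """Single-predictor OLS slope of y on x."""
    yc, xc = centre(y), centre(x)
    return dot(xc, yc) / dot(xc, xc)
```

For correlated regressors, the joint coefficient on yconf matches the slope of bold on yconf residualised against ratings, so variance shared between the two regressors does not contribute to either parameter estimate.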

Interestingly, the scalp map associated with our confidence-discriminating EEG component showed a diffuse topography, including contributions from several centroparietal electrode sites. One possibility is that the observed spatial pattern reflects sources of shared variance between the EEG component and the confidence ratings themselves (which was otherwise controlled for in our original fMRI analysis). To test this, we ran a separate control GLM analysis in which the confidence ratings regressor (RatingsDEC) was removed, and found that with this model the yCONF regressor explained additional variability of the BOLD signal within several regions, including precuneus/PCC regions of the parietal cortex (Figure 5—figure supplement 1). Notably, activity in these regions has previously been shown to scale with confidence (De Martino et al., 2013; White et al., 2014) and has been hypothesised to play a role in metacognition (McCurdy et al., 2013).

In a separate analysis, we also explored BOLD signal correlations with the yCONF regressor locked to the confidence rating stage (as part of a GLM that only included regressors at the time of rating). We found no correlation with yCONF in the VMPFC, suggesting that confidence-related activation in this region was specific to the earlier stages of the decision. Clusters showing positive correlation with yCONF were found in the (bilateral) motor cortex, left planum temporale, putamen/pallidum, and lateral occipital cortex (Figure 5—figure supplement 2). As these activations are suggestive mainly of motor-related processes, they may have been partially confounded by repeated movement (i.e., button presses) during the rating stage of the trial. More speculatively, confidence representations may be present within motor regions, in line with the idea that decision-related information 'leaks' into the motor systems that support the relevant action (Gold and Shadlen, 2000; Song and Nakayama, 2009). We found no clusters showing negative correlation with yCONF at this stage of the trial.

Psychophysiological interaction (PPI) analysis

Having identified the VMPFC as uniquely encoding a confidence signal early in the trial (i.e., near the time of the perceptual decision), we next sought to explore potential functional interactions of this region with the rest of the brain (for instance, with networks involved in perceptual decision making and/or post-decision metacognitive processes). To this end, we conducted a whole-brain PPI analysis (see Materials and methods), in which we searched for areas showing increased correlation of their BOLD response with that of a VMPFC seed during the perceptual decision phase of the trial (defined here as the trial-by-trial time window between the onset of the motion stimulus and the subject’s explicit commitment to a choice).
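The core of a PPI model is an interaction regressor formed from a psychological variable (here, an indicator of the decision window) and the physiological seed time series, entered alongside both main effects; a negative interaction coefficient means that coupling with the seed becomes more negative during the task window. The sketch below is a simplified version that multiplies mean-centred signals directly at the BOLD level. Many implementations instead deconvolve the seed signal before forming the product, and the exact procedure used here is described in the Materials and methods; all names are hypothetical.

```python
from statistics import fmean


def centre(v):
    m = fmean(v)
    return [x - m for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ppi_regressor(seed_ts, psych):
    """Interaction term: mean-centred task indicator times mean-centred seed time series."""
    return [p * s for p, s in zip(centre(psych), centre(seed_ts))]


def ols(y, predictors):
    """OLS slopes via the normal equations on centred data (Gauss-Jordan elimination)."""
    cols = [centre(p) for p in predictors]
    yc = centre(y)
    k = len(cols)
    # Augmented matrix [X'X | X'y]
    a = [[dot(cols[i], cols[j]) for j in range(k)] + [dot(cols[i], yc)] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]
```

A voxel's time series would be regressed on [psych, seed, ppi], and the third coefficient tested across subjects; including both main effects ensures the interaction term captures a change in coupling rather than a shared task response.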

Based on existing literature showing negative BOLD correlations with confidence ratings in regions recruited post-decisionally (e.g., during explicit metacognitive report), such as the anterior prefrontal cortex (Fleming et al., 2012; Hilgenstock et al., 2014; Morales et al., 2018), we expected that increased functional connectivity of such regions with the VMPFC would be reflected in stronger negative correlation in our PPI. Similarly, we hypothesised that fMRI activity in regions encoding the perceptual decision would also correlate negatively with confidence/VMPFC activation, in line with the idea that easier (and thus more confident) decisions are characterised by faster evidence accumulation to threshold (Shadlen and Newsome, 2001) and weaker fMRI signal in reaction time tasks (Ho et al., 2009; Kayser et al., 2010; Liu and Pleskac, 2011; Filimon et al., 2013; Pisauro et al., 2017). Accordingly, we expected that if such regions increased their functional connectivity with the VMPFC during the decision, this would manifest as stronger negative correlation in the PPI analysis.

We found that clusters in the bilateral orbitofrontal cortex (OFC; peak MNI: [16 18 −16] and [−28 28 −20]), left anterior prefrontal cortex (aPFC; peak MNI: [−40 46 4]), and right dorsolateral prefrontal cortex (dlPFC; peak MNI: [48 22 30]) (Figure 6) showed increased negative correlation with VMPFC activation during the perceptual decision. Interestingly, regions in the aPFC and dlPFC in particular have previously been linked to perceptual decision making (Noppeney et al., 2010; Liu and Pleskac, 2011; Philiastides et al., 2011; Filimon et al., 2013), as well as to post-decisional confidence-related processes (Fleming et al., 2012; Hilgenstock et al., 2014; Morales et al., 2018) and metacognition (Fleming et al., 2010; Rounis et al., 2010; McCurdy et al., 2013).

Figure 6. Psychophysiological interaction (PPI) analysis showing functional connectivity with the ventromedial prefrontal cortex (i.e., the seed region of interest; approximate location shown in green) during the perceptual decision phase of the trial.

Clusters in the anterior and dorsolateral prefrontal cortices, as well as the orbitofrontal cortex (shown in blue), show increased negative correlation with the VMPFC during the perceptual decision. All results are reported at |Z| ≥ 2.57, and cluster-corrected using a resampling procedure (minimum cluster size 162 voxels).

https://doi.org/10.7554/eLife.38293.017

Discussion

Here, we used a simultaneous EEG-fMRI approach to investigate the neural correlates of confidence during perceptual decisions. Our method capitalised on the unique explanatory power of time-resolved, internal measures of confidence to identify associated responses in the fMRI, allowing for a more precise spatiotemporal characterisation of confidence than relying solely on behavioural measures would allow. We found that the BOLD response in the VMPFC was uniquely explained by single-trial variability in an early, EEG-derived neural signature of confidence occurring before subjects’ behavioural response. This activity was over and above what could be explained by subjects’ behavioural reports alone. Our results provide empirical support for the involvement of the VMPFC in confidence in perceptual decisions, and suggest that this region may support an early readout of confidence (i.e., at, or near, the time of decision) preceding explicit choice or metacognitive evaluation.

We first showed that our EEG results, namely the temporal and spatial profile of the confidence-discriminating activity, were consistent with our previous work (Gherman and Philiastides, 2015), in which we used a different perceptual task involving face vs. car visual categorisations, indicating that these confidence-related signals may generalise across a broader range of tasks. Interestingly, the spatial topography associated with this activity appears consistent with centroparietal scalp projections arising from signals that culminate near the time of decision (O'Connell et al., 2012; Kelly and O'Connell, 2013; Philiastides et al., 2014). While the limited spatial resolution of EEG precludes conclusive interpretations based on this similarity, this pattern could reflect a mixture of decision- and confidence-related signals, in line with evidence suggesting that these quantities may unfold together around the decision process itself (Kiani and Shadlen, 2009; Gherman and Philiastides, 2015; van den Berg et al., 2016; Dotan et al., 2018). Signals such as the centroparietal positivity (CPP; O'Connell et al., 2012) and/or the related P300 may themselves hold information about confidence, as suggested by electrophysiological work (Boldt and Yeung, 2015; see also Urai and Pfeffer, 2014; Twomey et al., 2015, for brief discussions).

Further, our fMRI data revealed activation patterns suggesting that distinct neural networks carry information about confidence during the perceptual decision and explicit confidence reporting stages of the trial, respectively. Indeed, it seems plausible that qualitatively distinct representations of confidence may be encoded at different times relative to the decision process. In particular, activations during the decision phase of the trial, such as those in the VMPFC or anterior cingulate cortex, are in line with a more automatic encoding of confidence, i.e., in the absence of an explicit confidence report (Lebreton et al., 2015; Bang and Fleming, 2018). Consistent with this idea, we also observed activations in regions associated with the human reward/valuation system, such as the striatum and orbitofrontal cortex. In contrast, regions showing correlation with confidence during the confidence rating stage, in particular the anterior prefrontal cortex, have previously been associated with explicit metacognitive judgment/report (Fleming et al., 2012; Morales et al., 2018), potentially serving a role in higher-order monitoring and confidence communication.

We presented several findings that sought to further clarify the nature and role of the early confidence signals observed in the EEG data, as well as their relationship with the perceptual decision and metacognition. Our computational modelling approach provided preliminary insight into the decision dynamics that might inform early confidence. Namely, we showed that these neural signals were consistent with predictions from a dynamic model of decision making that quantifies confidence as the difference in accumulated evidence in favour of the possible choice alternatives at the termination of the decision process. A possible interpretation is that the early confidence representations reflect a readout of this difference (for instance, by a system distinct from the one supporting the perceptual choice itself). In other words, early confidence representations could be informed by, yet be distinct from, the quantities reflected in the model-derived confidence, in line with a dissociation between the information supporting the decision and that supporting confidence. Our exploratory mediation analysis agrees with this interpretation, suggesting that EEG-derived confidence representations can be thought of as a statistical mediator between model-derived confidence measures (reflecting the balance of accumulated evidence at the time of decision) and confidence ratings.
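A balance-of-evidence confidence measure of this kind can be illustrated with a simple two-accumulator race: each accumulator integrates noisy evidence for one alternative, the first to reach the bound determines the choice, and Δe is the gap between the winning and losing accumulators at that moment. The sketch below is an independent race with hypothetical parameters (drifts, bound, noise, time step); it illustrates the general idea and is not the model or parameter set fitted in this study.

```python
import math
import random


def race_trial(drift_correct, drift_error, bound=1.0, sigma=1.0,
               dt=0.005, max_t=5.0, rng=None):
    """Simulate one trial of an independent two-accumulator race (Euler scheme).

    Returns (correct, decision_time, delta_e), or None if no bound is reached by max_t.
    delta_e is the winning minus the losing accumulator at decision termination.
    """
    rng = rng or random.Random()
    x_correct = x_error = 0.0
    noise_sd = sigma * math.sqrt(dt)
    t = 0.0
    while t < max_t:
        x_correct += drift_correct * dt + rng.gauss(0.0, noise_sd)
        x_error += drift_error * dt + rng.gauss(0.0, noise_sd)
        t += dt
        if x_correct >= bound or x_error >= bound:
            correct = x_correct >= x_error  # the larger accumulator wins ties at a step
            return correct, t, abs(x_correct - x_error)
    return None
```

Because the losing accumulator on error trials is usually the one with positive drift, it tends to sit close to the bound, so simulated Δe is typically smaller on error trials than on correct ones, the same direction as the correct-minus-error differences used in the mediation analysis.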

In another exploratory analysis that aimed to better understand the potential impact of neural confidence signals on subsequent behaviour, we found that stronger signal amplitude increased the likelihood of repeating a choice on the subsequent trial, when the motion direction of the stimulus was consistent with that of the previous trial. Interestingly however, we did not observe this effect when subsequent motion was in the opposite direction. This dependence of the confidence-related choice repetition bias on stimulus identity is counterintuitive yet intriguing, as it points to a process that detects stimulus consistency (i.e., independently of the decision process itself), which interacts with representations of previous confidence to alter decision/behaviour (e.g., through selective re-weighting of evidence). While our current decision model cannot account for this confidence-driven trial-to-trial dependence, future computational developments may help reconcile these observations with formal models of decision and confidence.

Our main fMRI finding, linking early confidence representations with VMPFC activity, suggests partial independence of these signals from decision centres. Specifically, as the VMPFC is not typically known to support perceptual decision processes, it seems more plausible that the confidence signals we observe here represent a (potentially noisy) readout of confidence-related information. In line with this, computational and neurobiological accounts of confidence processing have proposed architectures in which a first-level form of confidence in a decision emerges as a natural property of the neural processes that support the decision, and is in turn read out (i.e., summarised) by separate higher-order monitoring network(s) (Insabato et al., 2010; Meyniel et al., 2015; Pouget et al., 2016).

The timing of our EEG-derived confidence representations, which arise in close temporal proximity to the decision (but prior to commitment to a motor response), further supports the hypothesis that the VMPFC may encode an automatic readout of confidence (Lebreton et al., 2015) in decision making, or an early (and automatic) ‘feeling of rightness’ (Hebscher and Gilboa, 2016) in memory judgments. While dedicated research will be necessary to establish the functional role of these early signals, fast pre-response confidence signals could help regulate the link between a decision and the impending action, for example with low confidence signalling the need for additional evidence (Desender et al., 2018).

Consistent with a role in providing a confidence readout, recent work suggests the VMPFC may encode confidence in a task-independent and possibly domain-general manner. Specifically, several functional neuroimaging studies have shown positive modulation of VMPFC activation by confidence, across a range of decision making tasks (Rolls et al., 2010; De Martino et al., 2013; Heereman et al., 2015; Lebreton et al., 2015; Fleming et al., 2018). Notably, one study showed that fMRI activation in the VMPFC was modulated by confidence across four different tasks involving both value-based and non-value based rating judgments (Lebreton et al., 2015). Furthermore, evidence from memory-related decision making research appears to also implicate the VMPFC in confidence processing (Hebscher and Gilboa, 2016).

An outstanding question is whether, and how, the early confidence signals we identified in the VMPFC might further contribute to post-decisional metacognitive signals and eventual confidence reports. It has long been proposed that metacognitive evaluation relies on additional processes taking place post-decisionally (Pleskac and Busemeyer, 2010; Moran et al., 2015; Yu et al., 2015). For instance, recent evidence suggests that choice itself (and the corresponding motor-related activity) affects confidence (Fleming et al., 2015; Gajdos et al., 2018) and may help calibrate metacognitive reports (Siedlecka et al., 2016; Fleming and Daw, 2017). The early confidence signals in the VMPFC could serve as one of multiple inputs to networks supporting retrospective metacognitive processes, e.g., anterior prefrontal regions (Fleming et al., 2012). Interestingly, our functional connectivity analysis revealed a strengthening of the link between the VMPFC and frontal areas (notably the aPFC and dlPFC) during the perceptual decision stage of the trial. While the functional significance of these connections remains to be determined, the previously reported involvement of these regions in perceptual decision making and metacognition makes them likely candidates for providing input to, or receiving input from, the VMPFC within a confidence-related network.

The observation that the VMPFC, a region known for its involvement in choice-related subjective valuation (Philiastides et al., 2010; Rangel and Hare, 2010; Bartra et al., 2013; Pisauro et al., 2017) encodes confidence signals during perceptual decisions raises an interesting possibility for interpreting our results. Our behavioural paradigm did not involve any explicit reward/feedback manipulation and accordingly, the observed confidence-related activation cannot be interpreted as an externally driven value signal. Instead, as has been suggested previously (Barron et al., 2015; Lebreton et al., 2015), a likely explanation is that as an internal measure of performance accuracy, confidence is inherently valuable. Such a signal may represent implicit reward and possibly act as a teaching signal (Daniel and Pollmann, 2012; Guggenmos et al., 2016; Hebart et al., 2016; Lak et al., 2017) to drive learning.

In line with this interpretation, recent work suggests that confidence may be used in the computation of prediction errors (i.e., the difference between expected and currently experienced reward) (Lak et al., 2017; Colizoli et al., 2018), thus guiding a reinforcement-based learning mechanism. Relatedly, confidence prediction error (the difference between expected and experienced confidence) has been hypothesised to act as a teaching signal and guide learning in the absence of feedback. In particular, regions in the human mesolimbic dopamine system, namely the striatum and ventral tegmental area, have been shown to encode both anticipation and prediction error related to decision confidence in the absence of feedback (Guggenmos et al., 2016), similar to what is typically observed during reinforcement learning tasks in which feedback is explicit (Preuschoff et al., 2006; Fouragnan et al., 2015; Fouragnan et al., 2017; Fouragnan et al., 2018). Importantly, these effects were predictive of subjects’ perceptual learning efficiency. Thus, confidence signals in valuation/reward networks could be propagated back to the decision systems to optimise the dynamics of the decision process, possibly by means of a reinforcement-learning mechanism. At the neural level, this could be implemented by strengthening or weakening the information processing pathways that result in high and low confidence, respectively (Guggenmos and Sterzer, 2017). Though testing this hypothesis extends beyond the scope of the current study, we might expect that fluctuations in expected vs. actual confidence signals observed in our data have a similar influence on learning (e.g., perceptual learning; Law and Gold, 2009; Kahnt et al., 2011; Diaz et al., 2017).
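The confidence prediction error described above can be written as a delta-rule update: on each trial, δ = experienced confidence minus expected confidence, and the expectation moves a fraction α of the way towards the experienced value. The sketch below is purely illustrative; the learning rate, the initial expectation and the function name are hypothetical, and no such model was fitted in this study.

```python
def confidence_prediction_errors(confidences, alpha=0.1, initial=0.5):
    """Delta-rule (Rescorla-Wagner-style) tracking of expected confidence.

    Returns the per-trial prediction errors and the final expected confidence.
    """
    expected = initial
    deltas = []
    for c in confidences:
        delta = c - expected          # experienced minus expected confidence
        deltas.append(delta)
        expected += alpha * delta     # move the expectation towards the experienced value
    return deltas, expected
```

With α = 0.5, a starting expectation of 0 and three fully confident trials, the prediction errors shrink as 1, 0.5, 0.25 while the expectation approaches 1, which is the kind of signal hypothesised to act as an internal teaching signal.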

In conclusion, we showed that by employing a simultaneous EEG-fMRI approach, we were able to localise an early representation of confidence in the brain with higher spatiotemporal precision than allowed by fMRI alone. In doing so, we provided novel empirical evidence for the encoding of a generalised confidence readout signal in the VMPFC preceding explicit metacognitive report. Our findings provide a starting point for further investigations into the neural dynamics of confidence formation in the human brain and its interaction with other cognitive processes such as learning, and the decision itself.

Materials and methods

Participants

Thirty subjects participated in the simultaneous EEG-fMRI experiment. Four were subsequently removed from the analysis due to near-chance (n = 3) and near-ceiling (n = 1) performance, respectively, on the perceptual discrimination task. Additionally, one subject was excluded whose confidence reports covered only a limited fraction of the provided rating scale, thus yielding an insufficient number of trials for the EEG discrimination analysis (see below). Finally, one subject had to be removed due to poor (chance) performance of the EEG decoder. All results presented here are based on the remaining 24 subjects (age range 20–32 years). All were right-handed, had normal or corrected-to-normal vision, and reported no history of neurological problems. The study was approved by the College of Science and Engineering Ethics Committee at the University of Glasgow (CSE01355), and informed consent was obtained from all participants. While we conducted no explicit power analysis to determine sample size, note that our EEG analysis was performed on individual subjects using cross-validation, such that in estimating our electrophysiologically derived measure of confidence, each subject became their own replication unit (Smith and Little, 2018).

Stimuli and task

All stimuli were created and presented using the PsychoPy software (Peirce, 2007). They were displayed via an LCD projector (frame rate = 60 Hz) on a screen placed at the rear opening of the bore of the MRI scanner, and viewed through a mirror mounted on the head coil (distance to screen = 95 cm). Stimuli consisted of random dot kinematograms (Newsome and Pare, 1988), in which a proportion of the dots moved coherently in one direction (left or right), while the remaining dots moved at random. Specifically, each stimulus consisted of a dynamic field of white dots (number of dots = 150; dot diameter = 0.1 degrees of visual angle, dva; dot lifetime = 4 frames; dot speed = 6 dva/s), displayed centrally on a grey background through a circular aperture (diameter = 6 dva). Task difficulty was controlled by manipulating the proportion of dots moving coherently in the same direction (i.e., motion coherence).
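The stimulus parameters above translate into per-frame quantities: at 60 Hz, a speed of 6 dva/s corresponds to 0.1 dva per frame, and a 1.2 s stimulus spans 72 frames. The sketch below generates dot positions with these parameters under several assumptions that the text does not specify: signal dots are re-drawn at random on every frame, noise dots move in random directions, and dots that exceed their lifetime or leave the aperture are re-plotted at a random location. PsychoPy provides its own dot-stimulus implementation; this sketch does not reproduce its algorithm. A fixed seed makes the sequence reproducible, as with the fixed stimulus seed described under Main task.

```python
import math
import random

FRAME_RATE = 60.0   # Hz
N_DOTS = 150
DOT_LIFE = 4        # frames
SPEED = 6.0         # dva/s, i.e. 0.1 dva per frame at 60 Hz
RADIUS = 3.0        # dva (6 dva aperture diameter)


def _random_point(rng):
    """Uniformly distributed position inside the circular aperture."""
    r = RADIUS * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return [r * math.cos(theta), r * math.sin(theta)]


def rdk_frames(coherence, direction, n_frames, seed):
    """Dot positions (x, y) in dva for each frame; direction is +1 (right) or -1 (left)."""
    rng = random.Random(seed)  # fixed seed -> identical stimulus on every call
    step = SPEED / FRAME_RATE
    dots = [_random_point(rng) for _ in range(N_DOTS)]
    ages = [rng.randrange(DOT_LIFE) for _ in range(N_DOTS)]  # stagger dot lifetimes
    frames = []
    for _ in range(n_frames):
        signal = set(rng.sample(range(N_DOTS), round(coherence * N_DOTS)))
        for i, dot in enumerate(dots):
            ages[i] += 1
            if ages[i] >= DOT_LIFE:  # lifetime expired: re-plot at a random location
                dots[i], ages[i] = _random_point(rng), 0
                continue
            angle = (0.0 if direction > 0 else math.pi) if i in signal else rng.uniform(0.0, 2.0 * math.pi)
            dot[0] += step * math.cos(angle)
            dot[1] += step * math.sin(angle)
            if math.hypot(dot[0], dot[1]) > RADIUS:  # left the aperture: re-plot inside it
                dots[i], ages[i] = _random_point(rng), 0
        frames.append([tuple(d) for d in dots])
    return frames
```

For example, `rdk_frames(0.2, -1, 72, seed=7)` would give one hypothetical 1.2 s leftward trial at 20% coherence.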

We aimed to maintain overall performance on the main perceptual decision task consistent across subjects (i.e., near perceptual threshold, at approximately 75% correct). For this reason, task difficulty was calibrated individually for each subject on the basis of a separate training session, prior to the day of the main experiment.

Training

To familiarise subjects with the random dot stimuli and facilitate learning on the motion discrimination task, subjects first performed a short, simplified version of the main task (lasting approx. 10 min), in which feedback was provided on each trial. The task, which required speeded direction discriminations of random dot stimuli (see below), began at a low difficulty level (motion coherence = 40%) and gradually increased in difficulty in accordance with subjects’ online behavioural performance (a 3-down-1-up staircase procedure, in which three consecutive correct responses resulted in a 5% decrease in motion coherence, whereas one incorrect response yielded a 5% increase). This was followed by a second, similar task, which served to determine subject-specific psychophysical thresholds. Seven motion coherence levels (5%, 8%, 12%, 18%, 28%, 44%, 70%) were equally and randomly distributed across 350 trials. The proportion of correct responses was computed separately for each motion coherence level, and a logarithmic function was fitted to the resulting values in order to estimate the motion coherence expected to yield a mean performance of approximately 75% correct. Subjects who showed near-chance performance across all coherence levels, or no improvement in performance with increasing motion coherence, were not tested further and did not participate in the main experiment. No feedback was given for this or any of the subsequent tasks.
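Both calibration steps can be sketched in a few lines. The staircase below treats the 5% step as percentage points of coherence, and the threshold fit assumes the logarithmic function has the form p(c) = a + b·ln(c), fitted by least squares and inverted at p = 0.75. Both interpretations, the coherence floor and ceiling, and the function names are assumptions made for illustration; the text does not specify them.

```python
import math
from statistics import fmean


def staircase(responses, start=40.0, step=5.0, floor=1.0, ceiling=100.0):
    """3-down-1-up staircase; returns the coherence (%) presented on each trial."""
    coherence, streak, history = start, 0, []
    for correct in responses:
        history.append(coherence)
        if correct:
            streak += 1
            if streak == 3:  # three consecutive correct responses: make it harder
                coherence = max(floor, coherence - step)
                streak = 0
        else:                # any error: make it easier and reset the streak
            coherence = min(ceiling, coherence + step)
            streak = 0
    return history


def fit_log_threshold(levels, accuracy, target=0.75):
    """Fit accuracy = a + b*ln(coherence) by least squares; return coherence at `target`."""
    xs = [math.log(c) for c in levels]
    mx, my = fmean(xs), fmean(accuracy)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, accuracy)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return math.exp((target - a) / b)
```

A 3-down-1-up rule converges near 79% correct, so the staircase serves mainly as training, while the fitted function supplies the subject-specific coherence near 75% correct used in the main task.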

Main task

On the day of the main experiment, subjects practised the main task once outside the scanner, and again inside the scanner before the start of the scan (a short 80-trial block each time). Subjects made left vs. right direction discriminations of random dot kinematograms and rated how confident they were in their choices on a trial-by-trial basis (Figure 1A). Each trial began with a random dot stimulus lasting for a maximum of 1.2 s, or until the subject made a behavioural response. Subjects were instructed to respond as quickly as possible and had a time limit of 1.5 s to do so. The message ‘Oops! Too slow’ was displayed if this time limit was exceeded or no direction response was made. Once the dot stimulus disappeared, the screen remained blank for the remainder of the 1.2 s stimulation period and for an additional random delay (1.5–4 s).

Next, subjects were presented with a rating scale for 3 s, during which they reported their confidence in the preceding direction decision. The confidence scale was represented intuitively by a white horizontal bar of linearly varying thickness, with the thick end representing high confidence. Its orientation on the horizontal axis (thin-to-thick vs. thick-to-thin) informed subjects of the response mapping, and this was equally and randomly distributed across trials to control for motor preparation effects. To make a confidence response, subjects moved an indicator (a small white triangle) along a 9-point marked line. The indicator changed colour from white to yellow when a confidence response was selected, and remained on the screen until the 3 s had elapsed. A final delay (blank screen, jittered between 1.5 and 4 s) ended the trial. The timing of the inter-stimulus jitters was optimised using a genetic algorithm (Wager and Nichols, 2003) in order to increase estimation efficiency in the fMRI analysis. Failing to provide either a direction or a confidence response within the respective time limits on a given trial rendered it invalid, and such trials were removed from further analyses. This resulted in a total fraction of .04 of trials being discarded (.02 for missing direction responses and .02 for missing confidence responses).

Subjects performed two experimental blocks of 160 trials each, corresponding to two separate fMRI runs. Each block contained two short (30 s) rest breaks, during which the MR scanner continued to run. Subjects were instructed to remain still throughout the entire experiment, including during rest breaks and between scans. Motion coherence was held constant across trials, at the subject-specific level estimated during training. The direction of the dots was equally and randomly distributed across trials. To control for confounding effects of low-level trial-to-trial variability in stimulus properties on decision confidence, an identical set of stimuli was used in the two experimental blocks. Specifically, for each subject, the random seed that controlled dot motion parameters in the stimulus presentation software was set to a fixed value. This manipulation allowed for subsequent control comparisons between pairs of identical stimuli.

Subjects were encouraged to explore the entire scale when making their responses and to abstain from making a confidence response on a given trial if a motor mapping error had been made (for instance, a premature or accidental button press that was inconsistent with the perceptual representation). They were instructed to make their responses as quickly and accurately as possible, and provide a response on every trial. All behavioural responses were executed using the right hand, on an MR-compatible button box.

EEG data acquisition

EEG data was collected using an MR-compatible EEG amplifier system (Brain Products, Germany). Continuous EEG data was recorded using the Brain Vision Recorder software (Brain Products, Germany) at a sampling rate of 5000 Hz. We used 64 Ag/AgCl scalp electrodes positioned according to the 10–20 system, and one nasion electrode. Reference and ground electrodes were embedded in the EEG cap and were located along the midline, between electrodes Fpz and Fz, and between electrodes Pz and Oz, respectively. Each electrode had in-line 10 kOhm surface-mount resistors to ensure subject safety. Input impedance was adjusted to < 25 kOhm for all electrodes. Acquisition of the EEG data was synchronized with the MR data acquisition (Syncbox, Brain Products, Germany), and MR-scanner triggers were collected separately to enable offline removal of MR gradient artifacts from the EEG signal. Scanner trigger pulses were lengthened to 50 μs using a built-in pulse stretcher, to facilitate accurate capture by the recording software. Experimental event markers (including participants’ responses) were synchronized, and recorded simultaneously, with the EEG data.

EEG data processing

Preprocessing of the EEG signals was performed using Matlab (Mathworks, Natick, MA). EEG signals recorded inside an MR scanner are contaminated with gradient artifacts and ballistocardiogram (BCG) artifacts due to magnetic induction on the EEG leads. To correct for gradient-related artifacts, we constructed average artifact templates from sets of 80 consecutive functional volumes centred on each volume of interest, and subtracted these from the EEG signal. This process was repeated for each functional volume in our dataset. Additionally, a 12 ms median filter was applied in order to remove any residual spike artifacts. Further, we corrected for standard EEG artifacts and applied a 0.5 – 40 Hz band-pass filter in order to remove slow DC drifts and high frequency noise. All data were downsampled to 1000 Hz.
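The sliding-template gradient correction can be sketched in NumPy as follows, assuming volume onsets (in samples) are available from the recorded scanner triggers and that every volume spans the same number of samples. `remove_gradient_artifact` is a hypothetical helper; the median filter, band-pass filter and downsampling steps are omitted.

```python
import numpy as np

def remove_gradient_artifact(eeg, vol_onsets, vol_len, n_template=80):
    """Average artifact subtraction with a sliding template (sketch).

    eeg        : (n_channels, n_samples) array
    vol_onsets : first sample of each functional volume
    vol_len    : samples per volume (TR x sampling rate)
    """
    clean = np.array(eeg, dtype=float)               # working copy
    onsets = np.asarray(vol_onsets, dtype=int)
    # One epoch per volume, copied from the uncorrected data: (n_vol, n_channels, vol_len)
    epochs = np.stack([clean[:, o:o + vol_len] for o in onsets])
    n_vol, half = len(onsets), n_template // 2
    for k, o in enumerate(onsets):
        # Up to n_template volumes centred on volume k, shifted inwards at the run edges
        start = min(max(0, k - half), max(0, n_vol - n_template))
        stop = min(n_vol, start + n_template)
        clean[:, o:o + vol_len] -= epochs[start:stop].mean(axis=0)
    return clean
```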

To remove eye movement artifacts, subjects performed an eye movement calibration task prior to the main experiment (with the MRI scanner turned off, to avoid gradient artifacts), during which they were instructed to blink repeatedly while a fixation cross was displayed in the centre of the computer screen, and to make lateral and vertical saccades following the position of the fixation cross. We recorded the timing of these visual cues and used principal component analysis to identify linear components associated with blinks and saccades, which were subsequently removed from the EEG data (Parra et al., 2005).

Next, we corrected for cardiac-related (i.e., ballistocardiogram, BCG) artifacts. As these share frequency content with the EEG, they are more challenging to remove. To minimise loss of signal power in the underlying EEG signal, we adopted a conservative approach by only removing a small number of subject-specific BCG components, using principal component analysis. We relied on the single-trial classifiers to identify discriminating components that are likely to be orthogonal to the BCG. BCG principal components were extracted from the data after the data were first low-pass filtered at 4 Hz to extract the signal within the frequency range where BCG artifacts are observed. Subject-specific principal components were then determined (average number of components across subjects: 1.8). The sensor weightings corresponding to those components were projected onto the broadband data and subtracted out. Finally, data were baseline corrected by removing the average signal during the 100 ms prestimulus interval.
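A NumPy sketch of the BCG step: principal components of the 4 Hz low-passed data supply spatial patterns, whose contribution is then projected out of the broadband data, followed by prestimulus baseline correction. The brick-wall FFT low-pass filter, the function names and the epoch layout are assumptions; the same projection could in principle be applied to ocular components estimated from the calibration task.

```python
import numpy as np

def lowpass_fft(x, fs, cutoff_hz):
    """Crude zero-phase low-pass filter: zero all Fourier components above the cutoff."""
    spec = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spec[..., freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spec, n=x.shape[-1], axis=-1)

def remove_bcg_components(eeg, fs, n_components=2, cutoff_hz=4.0):
    """Project the leading spatial PCs of the low-passed data out of the broadband data.

    eeg : (n_channels, n_samples) broadband data
    """
    lp = lowpass_fft(eeg, fs, cutoff_hz)
    lp = lp - lp.mean(axis=1, keepdims=True)
    # Left singular vectors are the spatial patterns (sensor weightings) of the PCs.
    u, _, _ = np.linalg.svd(lp, full_matrices=False)
    patterns = u[:, :n_components]                  # orthonormal columns
    return eeg - patterns @ (patterns.T @ eeg)

def baseline_correct(epochs, onset_idx, n_baseline):
    """Subtract the mean of the n_baseline samples preceding stimulus onset.

    epochs : (n_trials, n_channels, n_times)
    """
    base = epochs[..., onset_idx - n_baseline:onset_idx].mean(axis=-1, keepdims=True)
    return epochs - base
```

With data at 1000 Hz, the 100 ms prestimulus baseline corresponds to `n_baseline=100`.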

Single-trial EEG analysis

To increase the statistical power of the EEG data analysis, trials were separated into three confidence groups (Low, Medium, High) on the basis of the original 9-point confidence rating scale. Specifically, we isolated High- and Low-confidence trials by pooling across each subject’s three highest and three lowest ratings, respectively. To ensure robustness of our single-trial EEG analysis, we imposed a minimum of 50 trials per confidence group. For data sets in which subjects had an insufficient number of trials at the extreme ends of the confidence scale, neighbouring confidence bins were included to meet this limit.
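One way to implement the grouping rule, sketched in Python under the assumption that "three highest and three lowest ratings" refers to the rating levels a subject actually used, and that neighbouring levels are added one at a time from the extremes inwards. `confidence_groups` is a hypothetical helper.

```python
from collections import Counter

def confidence_groups(ratings, n_edge=3, min_trials=50):
    """Split trials into Low / Medium / High confidence groups.

    Starts from the n_edge lowest and highest rating levels used and adds
    neighbouring levels until each extreme group has at least min_trials trials.
    """
    counts = Counter(ratings)
    levels = sorted(counts)                      # rating levels actually used

    def grow(ordered):
        chosen = list(ordered[:n_edge])
        k = n_edge
        while sum(counts[r] for r in chosen) < min_trials and k < len(ordered):
            chosen.append(ordered[k])            # include the next neighbouring level
            k += 1
        return set(chosen)

    low, high = grow(levels), grow(levels[::-1])
    if low & high:
        raise ValueError("too few trials: Low and High groups would overlap")
    labels = ["low" if r in low else "high" if r in high else "medium" for r in ratings]
    return labels, low, high
```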

We used a single-trial multivariate discriminant analysis, combined with a sliding window approach (Parra et al., 2005; Sajda et al., 2009) to discriminate between High- and Low-confidence trials in the stimulus-locked EEG data. This method aims to estimate, for predefined time windows of interest, an optimal combination of EEG sensor linear weights (i.e., a spatial filter) which, applied to the multichannel EEG data, yields a one-dimensional projection (i.e., a 'discriminant component') that maximally discriminates between the two conditions of interest. Importantly, unlike univariate trial-average approaches for event-related potential analysis, this method spatially integrates information across the multidimensional sensor space, thus increasing signal-to-noise ratio whilst simultaneously preserving the trial-by-trial variability in the signal, which may contain task-relevant information. In our data, we identified confidence-related discriminating components, y(t), by applying a spatial weighting vector w to our multidimensional EEG data x(t), as follows:

$$y(t) = \mathbf{w}^{T}\mathbf{x}(t) = \sum_{i=1}^{D} w_i\, x_i(t) \tag{1}$$

where D is the number of channels, indexed by i, and T denotes the transpose. To estimate the optimal discriminating spatial weighting vector w, we used logistic regression fitted with an iteratively reweighted least squares algorithm (Jordan and Jacobs, 1994). We estimated w for short (60 ms) overlapping time windows centred at 10 ms intervals, between −100 and 1000 ms relative to the onset of the random dot stimulus (i.e., the perceptual decision phase of the trial). This procedure was repeated for each subject and time window. Applied to an individual trial, a spatial filter (w) obtained this way yields a measure of the discriminant component amplitude for that trial. In separating the High and Low trial groups, the discriminator was designed to map the component amplitudes for one condition to positive values and those of the other condition to negative values. Here, we mapped High-confidence trials to positive values and Low-confidence trials to negative values; note, however, that this mapping is arbitrary.
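A sketch of the sliding-window discriminator in NumPy, assuming an L2-regularised logistic regression fitted by Newton (IRLS) updates and, for simplicity, averaging the samples inside each 60 ms window per trial (the original method may treat samples within a window differently). Label 1 denotes High confidence, so High trials map to positive y; the function names are hypothetical.

```python
import numpy as np

def logistic_irls(X, labels, n_iter=25, ridge=1.0):
    """L2-regularised logistic regression fitted by iteratively reweighted least squares.

    X : (n_trials, n_channels); labels : 1 = High, 0 = Low confidence.
    Returns spatial weights w and bias b.
    """
    n, d = X.shape
    Xa = np.hstack([X, np.ones((n, 1))])               # append a bias column
    beta = np.zeros(d + 1)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xa @ beta, -30, 30)))
        hess = Xa.T @ (Xa * (p * (1.0 - p))[:, None]) + ridge * np.eye(d + 1)
        grad = Xa.T @ (labels - p) - ridge * beta       # gradient of penalised log-likelihood
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta[:-1], beta[-1]

def sliding_window_discriminant(eeg, labels, times, centres, win=0.060):
    """eeg : (n_trials, n_channels, n_times); times and centres in seconds.

    Returns a list of (centre, w, y) with y = w^T x + b for every trial.
    """
    out = []
    for c in centres:
        in_win = np.abs(times - c) <= win / 2 + 1e-9
        X = eeg[:, :, in_win].mean(axis=2)              # (n_trials, n_channels)
        w, b = logistic_irls(X, labels)
        out.append((c, w, X @ w + b))
    return out
```

Window centres every 10 ms from −100 to 1000 ms would be `np.arange(-100, 1001, 10) / 1000.0`.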

To quantify the performance of the discriminator for each time window, we computed the area under a receiver operating characteristic (ROC) curve (i.e., the Az value), using a leave-one-out cross-validation procedure (Duda et al., 2001). Specifically, for every iteration, we used N-1 trials to estimate a spatial filter (w), which was then applied to the remaining trial to obtain out-of-sample discriminant component amplitudes (y) for High- and Low-confidence trials and compute the Az. Note that these out-of-sample y values were highly correlated with the y values resulting from the original High- vs. Low-confidence discrimination described above (subject-averaged R=.93). We determined significance thresholds for the discriminator performance using a bootstrap analysis whereby trial labels were randomised and submitted to a leave-one-out test. This randomisation procedure was repeated 500 times, producing a probability distribution for Az, which we used as reference to estimate the Az value leading to a significance level of p<0.01.
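The leave-one-out Az and the label-permutation threshold can be sketched as below. Az is computed as the Mann-Whitney estimate of the area under the ROC curve; the compact logistic fit, the function names and the random generator are assumptions made to keep the block self-contained.

```python
import numpy as np

def roc_area(scores, labels):
    """Az: probability that a random High trial scores above a random Low trial (ties count 0.5)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

def fit_logistic(X, labels, n_iter=15, ridge=1.0):
    """Compact L2-regularised logistic regression (Newton/IRLS); last weight is the bias."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    beta = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xa @ beta, -30, 30)))
        hess = Xa.T @ (Xa * (p * (1 - p))[:, None]) + ridge * np.eye(Xa.shape[1])
        beta = beta + np.linalg.solve(hess, Xa.T @ (labels - p) - ridge * beta)
    return beta

def loo_az(X, labels):
    """Fit on N-1 trials, score the held-out trial, repeat for all trials, then compute Az."""
    n = X.shape[0]
    y_out = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        beta = fit_logistic(X[train], labels[train])
        y_out[i] = X[i] @ beta[:-1] + beta[-1]
    return roc_area(y_out, labels), y_out

def permutation_az_threshold(X, labels, n_perm=500, p=0.01, seed=0):
    """(1 - p) quantile of leave-one-out Az values obtained with shuffled labels."""
    rng = np.random.default_rng(seed)
    null = [loo_az(X, rng.permutation(labels))[0] for _ in range(n_perm)]
    return np.quantile(null, 1 - p)
```

Run once per time window, `n_perm=500` and `p=0.01` correspond to the settings described above.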

Given the linearity of our model we also computed scalp projections of the discriminating components resulting from Equation 1 by estimating a forward model for each component:

$$\mathbf{a} = \frac{\mathbf{X}\mathbf{y}}{\mathbf{y}^{T}\mathbf{y}} \tag{2}$$

where the EEG data (X) and discriminating components (y) are now in a matrix and vector notation, respectively, for convenience (i.e., both X and y now contain a time dimension). Equation 2 describes the electrical coupling of the discriminating component y that explains most of the activity in X. Strong coupling indicates low attenuation of the component y and can be visualised as the intensity of vector a.
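Equation 2 reduces to a single matrix-vector product. A NumPy sketch, assuming trials have been concatenated along the time axis; `forward_model` is a hypothetical name.

```python
import numpy as np

def forward_model(X, y):
    """Scalp projection a = X y / (y^T y) of a discriminant component.

    X : (n_channels, n_samples) data, trials concatenated along time
    y : (n_samples,) component time course, e.g. y = w^T X
    """
    return X @ y / (y @ y)
```

If the data were exactly a rank-one mixture `np.outer(a, y)`, the function would recover `a`.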

Single-trial power analysis

We calculated prestimulus alpha power (8 – 12 Hz) in the 400 ms epoch beginning at −500 ms relative to the onset of the random dot stimulus. To do this, we used the multitaper method (Mitra and Pesaran, 1999) as implemented in the FieldTrip toolbox for Matlab (http://www.ru.nl/neuroimaging/fieldtrip). Specifically, for each epoch data were tapered using discrete prolate spheroidal sequences (two tapers for each epoch; frequency smoothing of ± 4 Hz) and Fourier transformed. Resulting frequency representations were averaged across tapers and frequencies. Single-trial power estimates were then extracted from the occipitoparietal sensor with the highest overall alpha power and baseline normalised through conversion to decibel units (dB).
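A sketch of the single-trial alpha estimate with SciPy's DPSS tapers (`scipy.signal.windows.dpss`) rather than FieldTrip. With a 0.4 s window and ±4 Hz smoothing, NW = 1.6 and the common 2NW − 1 rule gives the two tapers stated above. Zero-padding to 1 Hz resolution and expressing power in dB relative to a chosen reference (for example, the mean across trials) are assumptions, since the exact reference for the dB conversion is not specified here.

```python
import numpy as np
from scipy.signal.windows import dpss

def alpha_power_multitaper(trials, fs=1000.0, band=(8.0, 12.0), smoothing_hz=4.0):
    """Single-trial band power from DPSS tapers.

    trials : (n_trials, n_samples) data from one sensor (e.g. the 400 ms prestimulus epoch)
    """
    trials = np.asarray(trials, dtype=float)
    n = trials.shape[1]
    nw = (n / fs) * smoothing_hz                 # time-halfbandwidth product: 0.4 s * 4 Hz = 1.6
    k = max(1, int(2 * nw - 1))                  # 2NW - 1 tapers -> 2 tapers here
    tapers = dpss(n, nw, Kmax=k)                 # (k, n)
    n_fft = max(n, int(fs))                      # zero-pad to 1 Hz resolution (assumption)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    spec = np.fft.rfft(trials[:, None, :] * tapers[None, :, :], n=n_fft, axis=-1)
    power = (np.abs(spec) ** 2).mean(axis=1)     # average over tapers
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return power[:, in_band].mean(axis=1)        # average over frequencies

def to_db(power, reference):
    """Decibel conversion relative to a reference power (reference choice is an assumption)."""
    return 10.0 * np.log10(power / reference)
```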

Assessing the influence of neural confidence on behaviour

To test whether fluctuations in the confidence-discriminating component amplitudes, yCONF, were predictive of the probability of repeating a choice on the immediately subsequent trial (PREPEAT), we divided yCONF into three equal bins (Low, Medium, and High), separately for each subject, and compared the corresponding PREPEAT across subjects using a one-way repeated-measures ANOVA. To ensure that any observed modulation of PREPEAT by yCONF was independent of the correlation between yCONF and accuracy on the current trial, we first equalised the number of correct and error trials within each yCONF bin. Specifically, for each subject, we removed either exclusively correct or exclusively error trials (depending on which of the two was in excess) by random selection, repeated over 500 permutations of the trial set. We report results based on the average yCONF values obtained with this procedure (see Results).
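The binning and accuracy-balancing procedure for one subject can be sketched in NumPy. Tertile edges via `np.quantile`, the function name and the random generator are assumptions; the group-level repeated-measures ANOVA is not shown.

```python
import numpy as np

def p_repeat_by_bin(y_conf, choice, correct, n_perm=500, seed=0):
    """P(repeat the current choice on the next trial) per yCONF tertile (Low, Medium, High).

    Within each bin the surplus of correct or error trials is dropped at random so that
    accuracy is balanced; results are averaged over n_perm random draws.
    A bin without any correct or any error trial yields NaN.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y_conf, dtype=float)[:-1]            # last trial has no successor
    choice = np.asarray(choice)
    correct = np.asarray(correct, dtype=bool)[:-1]
    repeat = (choice[1:] == choice[:-1]).astype(float)
    bins = np.digitize(y, np.quantile(y, [1 / 3, 2 / 3]))   # 0 = Low, 1 = Medium, 2 = High
    p_rep = np.zeros((n_perm, 3))
    for k in range(n_perm):
        for b in range(3):
            idx_c = np.flatnonzero((bins == b) & correct)
            idx_e = np.flatnonzero((bins == b) & ~correct)
            m = min(idx_c.size, idx_e.size)
            keep = np.concatenate([rng.choice(idx_c, m, replace=False),
                                   rng.choice(idx_e, m, replace=False)])
            p_rep[k, b] = repeat[keep].mean()
    return p_rep.mean(axis=0)
```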

Modelling decision confidence

We modelled the perceptual decision process using a variant of the original race model of decision making (Vickers, 1979; Vickers and Packer, 1982; De Martino et al., 2013). Specifically, each decision was represented as a race to threshold between two independent accumulating signals, variables L and R, which collected evidence in favour of the left and right choices, respectively. At each time step of the accumulation (time increment = 1 ms), the two variables were updated separately with an evidence sample s(t) drawn at random from a normal distribution with mean μ and standard deviation σ, s(t) ~ N(μ, σ), such that:

$$\begin{aligned} L(t+1) &= L(t) + s_L(t) \\ R(t+1) &= R(t) + s_R(t) \end{aligned} \tag{3}$$

Here, we assumed that evidence samples for the two possible choices are drawn from distributions with identical variances but distinct means, whereby the mean of the distribution depends on the identity of the presented stimulus. For instance, a leftward motion stimulus would be associated with a larger distribution mean (and thus, on average, a faster rate of evidence accumulation) in the left (stimulus-congruent) than in the right (stimulus-incongruent) accumulator. We defined the mean of the distribution associated with the stimulus-congruent accumulator as μcongr = 0.1 (arbitrary units), and that of the stimulus-incongruent accumulator as μincongr = μcongr/r, where r is a free parameter in the model. For each simulated trial, evidence accumulation for the two accumulator variables began at 0 and progressed towards a fixed decision threshold θ, with the choice determined by the first accumulator to reach this threshold. Finally, response time was defined as the time taken to reach the decision threshold plus a non-decision time (nDT) accounting for early visual encoding and motor preparation processes.
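A NumPy simulation of the race model under these definitions. The confidence readout is assumed here to be the balance of evidence between the two accumulators at the moment of decision, Δe = |L − R|, in the spirit of Vickers' model; the chunked simulation loop and the function name are illustrative.

```python
import numpy as np

def simulate_race(n_trials, sigma, theta, ndt, r, mu_congr=0.1, chunk=2000, seed=0):
    """Two independent accumulators race to a common bound in 1 ms steps.

    Index 0 is the stimulus-congruent accumulator (mean mu_congr), index 1 the
    incongruent one (mean mu_congr / r). Returns accuracy flags, RTs (ms) and the
    assumed confidence readout delta_e = |L - R| at the moment of decision.
    """
    rng = np.random.default_rng(seed)
    mu = np.array([mu_congr, mu_congr / r])
    correct = np.empty(n_trials, dtype=bool)
    rt = np.empty(n_trials)
    delta_e = np.empty(n_trials)
    for i in range(n_trials):
        state, elapsed = np.zeros(2), 0
        while True:
            # A block of evidence samples s(t) ~ N(mu, sigma) for both accumulators.
            path = state + np.cumsum(rng.normal(mu, sigma, size=(chunk, 2)), axis=0)
            hit = np.flatnonzero((path >= theta).any(axis=1))
            if hit.size:
                step = hit[0]
                final = path[step]
                correct[i] = final[0] >= final[1]        # larger accumulator wins
                rt[i] = elapsed + step + 1 + ndt          # decision time + non-decision time
                delta_e[i] = abs(final[0] - final[1])
                break
            state, elapsed = path[-1], elapsed + chunk
    return correct, rt, delta_e
```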

We fitted the model to each subject’s response time data, using a maximum likelihood function (as in Pisauro et al., 2017). Namely, we combined RTs for correct and incorrect trials into a single distribution by mirroring the distribution of incorrect trials at the 0 point on the time axis, and thus transforming all error RTs into negative values. We compared resulting distributions and mean choice accuracies obtained from behavioural data vs. model simulations. The log likelihood function was estimated according to:

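$$LL \sim \log\left(\mathrm{KS}\left(RT_{data},\, RT_{model}\right)\right) + \log\left(\exp\left(-\frac{Accuracy_{data} - Accuracy_{model}}{0.1^{2}}\right)\right) \tag{4}$$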

KS represents the p-value of the two-sample Kolmogorov-Smirnov test (implemented in Matlab function kstest2) comparing behavioural and simulated RTs; it indexes how compatible the two independent samples are with the hypothesis that they come from populations with the same distribution.
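The mirroring of error RTs and the KS term of Equation 4 can be sketched with `scipy.stats.ks_2samp`, whose p-value plays the role of Matlab's `kstest2` output. Only the KS component is shown; the accuracy term is omitted, and the floor on the p-value (to avoid log 0) is an assumption.

```python
import numpy as np
from scipy.stats import ks_2samp

def signed_rts(rt, correct):
    """Mirror error RTs to negative values so that RT and choice form one distribution."""
    rt = np.asarray(rt, dtype=float)
    return np.where(np.asarray(correct, dtype=bool), rt, -rt)

def ks_log_term(rt_data, correct_data, rt_model, correct_model, floor=1e-300):
    """log KS(RT_data, RT_model): log p-value of the two-sample KS test on signed RTs."""
    p = ks_2samp(signed_rts(rt_data, correct_data),
                 signed_rts(rt_model, correct_model)).pvalue
    return np.log(max(p, floor))
```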

For each subject, the free model parameters were iteratively adjusted to maximise the LL. This was done by performing a grid search through a fixed range of values (σ=[.6:0.1:1], θ=[55:7:97], nDT=[250:50:450], r=[1.2:0.05:1.6]), determined after an initial exploratory search which sought to identify parameter ranges that generated plausible behavioural measures (RT and accuracy) (i.e., comparable to those observed in subjects’ behaviour). For each set of parameters, we simulated 500 trials and recorded mean choice accuracy, RT, and confidence (Δe).
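The grid, written in Matlab colon notation above, can be reproduced in Python as below; it contains 5 × 7 × 5 × 9 = 1575 parameter combinations. `score` is a hypothetical callable that would simulate 500 trials for a parameter set and return the log likelihood.

```python
import itertools
import numpy as np

def colon_range(start, step, stop):
    """Equivalent of Matlab's start:step:stop for ranges that land exactly on stop."""
    n = int(round((stop - start) / step)) + 1
    return np.round(start + step * np.arange(n), 10)

SIGMA = colon_range(0.6, 0.1, 1.0)     # 5 values
THETA = colon_range(55, 7, 97)         # 7 values
NDT = colon_range(250, 50, 450)        # 5 values (ms)
R = colon_range(1.2, 0.05, 1.6)        # 9 values

def grid_search(score):
    """Return the (sigma, theta, nDT, r) combination that maximises score(...)."""
    best, best_ll = None, -np.inf
    for params in itertools.product(SIGMA, THETA, NDT, R):
        ll = score(*params)
        if ll > best_ll:
            best, best_ll = params, ll
    return best, best_ll

print(len(list(itertools.product(SIGMA, THETA, NDT, R))))  # 1575
```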

To assess the quality of the model fits, we computed the correlation between observed vs. model-predicted behaviour (namely response time quantiles for correct and error responses, as well as mean choice accuracy), using the robust percentage bend correlation analysis (Wilcox, 1994).
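A NumPy/SciPy sketch of the percentage bend correlation with the conventional bending constant β = 0.2 (an assumption, as the value used is not stated), following our reading of Wilcox (1994); the p-value uses a t distribution with n − 2 degrees of freedom.

```python
import numpy as np
from scipy import stats

def percbend_corr(x, y, beta=0.2):
    """Percentage bend correlation; returns (r, two-sided p)."""
    X = np.column_stack([x, y]).astype(float)
    n = X.shape[0]
    med = np.median(X, axis=0)
    W = np.sort(np.abs(X - med), axis=0)
    omega = W[int(np.floor((1 - beta) * n)) - 1]          # robust scale per variable
    psi = (X - med) / omega
    i1 = (psi < -1).sum(axis=0)                            # low outliers
    i2 = (psi > 1).sum(axis=0)                             # high outliers
    s = np.where(np.abs(psi) <= 1, X, 0.0).sum(axis=0)     # sum of non-outlying values
    phi = (omega * (i2 - i1) + s) / (n - i1 - i2)          # percentage bend location
    A = np.clip((X - phi) / omega, -1.0, 1.0)              # bent scores
    r = np.sum(A[:, 0] * A[:, 1]) / np.sqrt(np.sum(A[:, 0] ** 2) * np.sum(A[:, 1] ** 2))
    t = r * np.sqrt((n - 2) / max(1.0 - r ** 2, 1e-12))
    p = 2 * stats.t.sf(abs(t), n - 2)
    return r, p
```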

Exploratory mediation analysis

To examine the relationship between model-derived confidence estimates (Δe), neural confidence signals (yCONF), and behavioural confidence reports (Ratings), we performed an exploratory mediation analysis on these measures (M3 toolbox for Matlab; Wager, 2018 http://wagerlab.colorado.edu/tools). A mediation analysis aims to identify whether the link between a predictor variable (here, Δe) and an outcome (Ratings) can be explained, fully or partially, by the indirect effect of a mediator variable (yCONF). For each of the three variables of interest, we computed the mean difference between correct and error trials, and the resulting values (ΔeDIFF, yCONF_DIFF, and RatingsDIFF, respectively) were subjected to the mediation analysis. To establish significance of the mediator effect of yCONF_DIFF, three conditions must be met: (1) ΔeDIFF reliably predicts yCONF_DIFF; (2) yCONF_DIFF reliably predicts RatingsDIFF when the effect of ΔeDIFF is accounted for; and (3) a significant indirect effect of yCONF_DIFF, defined as the product of the coefficients of effects (1) and (2), can be observed. We established coefficient significance in the three models using a 5000-sample bootstrap test (Wager et al., 2008).
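A minimal single-level mediation sketch with ordinary least squares and a percentile bootstrap for the indirect effect a·b. This is not the M3 toolbox, whose bootstrap options may differ; the function names are hypothetical, and the inputs would be the per-subject correct-minus-error differences.

```python
import numpy as np

def _ols(design, target):
    return np.linalg.lstsq(design, target, rcond=None)[0]

def mediation_paths(x, m, y):
    """a: x -> m; b: m -> y controlling for x; c_prime: direct effect of x on y."""
    ones = np.ones_like(x)
    a = _ols(np.column_stack([ones, x]), m)[1]
    _, c_prime, b = _ols(np.column_stack([ones, x, m]), y)
    return a, b, c_prime

def mediation(x, m, y, n_boot=5000, seed=0):
    """Path estimates plus a percentile bootstrap 95% CI for the indirect effect a*b."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    a, b, c_prime = mediation_paths(x, m, y)
    rng = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, x.size, x.size)          # resample subjects with replacement
        ak, bk, _ = mediation_paths(x[idx], m[idx], y[idx])
        boot[k] = ak * bk
    return {"a": a, "b": b, "c_prime": c_prime, "indirect": a * b,
            "ci": np.percentile(boot, [2.5, 97.5])}
```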

MRI data acquisition

Imaging was performed at the Centre for Cognitive Neuroimaging, Glasgow, using a 3-Tesla Siemens TIM Trio MRI scanner (Siemens, Erlangen, Germany) with a 12-channel head coil. Cushions were placed around the head to minimize head motion. We recorded two experimental runs of 794 whole-brain volumes each, corresponding to the two blocks of trials in the main experimental task. Functional volumes were acquired using a T2*-weighted gradient echo, echo-planar imaging sequence (32 interleaved slices, gap: 0.3 mm, voxel size: 3 × 3 × 3 mm, matrix size: 70 × 70, FOV: 210 mm, TE: 30 ms, TR: 2000 ms, flip angle: 80°). Additionally, a high-resolution anatomical volume was acquired at the end of the experimental session using a T1-weighted sequence (192 slices, gap: 0.5 mm, voxel size: 1 × 1 × 1 mm, matrix size: 256 × 256, FOV: 256 mm, TE: 2.96 ms, TR: 2300 ms, flip angle: 9°), which served as anatomical reference for the functional scans.

fMRI preprocessing

The first 10 volumes prior to task onset were discarded from each fMRI run to ensure a steady-state MR signal. Additionally, 13 volumes were discarded from the post-task period at the end of each block. The remaining 771 volumes were used for statistical analyses. Preprocessing of the MRI data was performed using the FEAT tool of the FSL software (FMRIB Software Library, http://www.fmrib.ox.ac.uk/fsl) and included slice-timing correction, high-pass filtering (>100 s), spatial smoothing (with a Gaussian kernel of 8 mm full width at half maximum), and head motion correction (using the MCFLIRT tool). The motion correction step generated motion parameters, which were subsequently included as regressors of no interest in the general linear model (GLM) analysis (see fMRI analysis below). Brain extraction of the structural and functional images was performed using the Brain Extraction Tool (BET). Registration of EPI images to standard space (Montreal Neurological Institute, MNI) was performed using the Non-linear Image Registration Tool with a 10 mm warp resolution. The registration procedure involved transforming the EPI images into an individual’s high-resolution space (with a linear, boundary-based registration algorithm [Greve and Fischl, 2009]) prior to transforming to standard space. Registration outcome was visually checked for each subject to ensure correct alignment.

fMRI analysis

Whole-brain statistical analyses of functional data were conducted using a general linear model (GLM) approach, as implemented in FSL (FEAT tool):

$$Y = X\beta + \varepsilon = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_N X_N + \varepsilon \tag{5}$$

where Y represents the BOLD response time series for a given voxel, structured as a T × 1 column vector (T time samples), and X represents the T × N design matrix (N regressors), with each column representing one of the psychological regressors (see GLM analysis below for details) convolved with a canonical hemodynamic response function (double-gamma function). β represents the parameter estimates (i.e., regressor betas) resulting from the GLM analysis, in the form of an N × 1 column vector. Lastly, ε is a T × 1 column vector of residual error terms. A first-level analysis was performed to analyse each subject’s individual runs. These were then combined at the subject level using a second-level analysis (fixed effects). Finally, a third-level mixed-effects model (FLAME 1) was used to combine data across all subjects.
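A NumPy sketch of Equation 5 for a single voxel: event regressors are built on a 0.1 s grid, convolved with a double-gamma HRF and sampled once per TR, then fitted by ordinary least squares. The SPM-style HRF shape parameters, the helper names and the use of plain OLS (FSL's FEAT additionally prewhitens the data) are assumptions.

```python
import math
import numpy as np

def double_gamma_hrf(dt, length=32.0, peak_shape=6.0, under_shape=16.0, under_ratio=1.0 / 6.0):
    """Double-gamma HRF sampled every dt seconds (SPM-style shape parameters, assumed)."""
    t = np.arange(0.0, length, dt)

    def gamma_pdf(shape):                          # unit-scale gamma density
        return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)

    h = gamma_pdf(peak_shape) - under_ratio * gamma_pdf(under_shape)
    return h / h.sum()

def event_regressor(onsets, amplitudes, durations, n_vols, tr, dt=0.1):
    """Events on a fine grid (stick if duration <= dt), HRF-convolved, sampled once per TR."""
    n_fine = int(round(n_vols * tr / dt))
    x = np.zeros(n_fine)
    for on, amp, dur in zip(onsets, amplitudes, durations):
        i0 = int(round(on / dt))
        i1 = max(i0 + 1, int(round((on + dur) / dt)))
        x[i0:i1] += amp
    conv = np.convolve(x, double_gamma_hrf(dt))[:n_fine]
    return conv[::int(round(tr / dt))][:n_vols]

def fit_glm(Y, X):
    """Ordinary least squares for Y = X beta + eps; returns betas and residuals."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta, Y - X @ beta
```

Parametric modulators such as yCONF would typically be mean-centred before being used as event amplitudes.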

Simultaneous EEG-fMRI analysis

With the combined EEG-fMRI approach, we sought to identify confidence-related activation in the fMRI surpassing what could be explained by the relevant behavioural predictors alone. In particular, we looked for brain regions where BOLD responses correlated with the confidence-discriminating component derived from the EEG analysis. Our primary motivation behind this approach was the hypothesis that endogenous trial-by-trial variability in the confidence-discriminating EEG component (near the time of the perceptual decision, and prior to the behavioural response) would be more reflective of early internal representations of confidence at the single-trial level than the metacognitive reports, which are provided post-decisionally and are therefore likely to be subject to additional processes. We predicted that the simultaneous EEG-fMRI approach would enable identification of latent brain states that might remain unobserved with a conventional analysis approach. To this end, we extracted trial-by-trial amplitudes of y(t) (resulting from Equation 1) at the time window of maximum confidence discrimination, and used these to build a BOLD predictor (i.e., the yCONF regressor). Importantly, to avoid possible confounding effects of motor preparation/response, the time of this component was determined on a subject-specific basis, by only considering the period prior to the behavioural choice (mean peak discrimination time = 708 ms from stimulus onset, SD = 162 ms). Thus, on average, this window was selected 287 ms (SD = 171 ms) prior to each subject’s mean response time. To ensure our results were not affected by potential overfitting during the estimation of y, we conducted a control GLM analysis in which the yCONF regressor was built using fully out-of-sample y values resulting from our leave-one-out cross-validation procedure detailed above (Figure 5—figure supplement 5).
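The subject-specific choice of the yCONF time window can be sketched as a constrained argmax over the Az time course. Requiring the 60 ms window to end before the subject's mean RT is our reading of "only considering the period prior to the behavioural choice"; `pre_response_peak` is hypothetical.

```python
import numpy as np

def pre_response_peak(centres_ms, az, mean_rt_ms, half_win_ms=30.0):
    """Centre of the window with maximal Az among windows that end before the mean RT."""
    centres_ms = np.asarray(centres_ms, dtype=float)
    az = np.asarray(az, dtype=float)
    valid = np.flatnonzero(centres_ms + half_win_ms < mean_rt_ms)
    return centres_ms[valid[np.argmax(az[valid])]]
```

The single-trial y values at the returned window would then serve as the yCONF amplitudes.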

Note that the trial-by-trial variability in our EEG component amplitudes is driven mostly by cortical regions found in close proximity to the recording sensors and to a lesser extent by distant (e.g., subcortical) structures. Nonetheless, an advantage of our EEG-informed fMRI predictors is that they can also reveal relevant fMRI activations within deeper structures, provided that their BOLD activity covaries with that of the cortical sources of our EEG signal.

GLM analysis

We designed our GLM to account for variance in the BOLD signal at two key stages of the trial, namely the perceptual decision period (beginning at the onset of the random dot visual stimulus) and the metacognitive evaluation/rating period (beginning at the onset of the rating scale display). A total of 10 regressors were included in the model. Our primary predictor of interest was the EEG-derived endogenous measure of confidence (yCONF regressor). We modelled this as a stick function (duration = 0.1 s) locked to the stimulus onset, with event amplitudes parametrically modulated by the trial-to-trial variability in the confidence-discriminating component y(t). To ensure that the variance explained by this regressor was unique (i.e., not explained by subjects’ behavioural reports), we included a second regressor whose event amplitudes were parametrically modulated by confidence ratings, and which was otherwise identical to the yCONF regressor (i.e., RatingsDEC regressor, duration = 0.1 s, locked to stimulus onset). Importantly, yCONF amplitudes were only moderately correlated with behavioural confidence ratings, thus allowing us to exploit additional explanatory power inherent to this regressor. Other regressors of no interest for the perceptual decision stage included: one regressor parametrically modulated by prestimulus alpha power in the EEG signal (to control for potential attentional baseline effects), one categorical regressor (1/0) accounting for variability in response accuracy, and one unmodulated regressor (all event amplitudes set to 1) modelling stimulus-related visual responses of no interest across both valid and non-valid (missed) trials (all event durations = 0.1 s, locked to stimulus onset). To control for motor preparation/response, we also included a parametric regressor modulated by subjects’ reaction time on the direction discrimination task (duration = 0.1 s, locked to the time of behavioural response).
Note that including an additional unmodulated regressor locked to the time of the behavioural response did not alter our results.

Additionally, locked to the onset of the metacognitive rating period, we included one parametric regressor (duration = 0.1 s) with event amplitudes modulated by subjects’ confidence ratings, one boxcar regressor with duration equivalent to subjects’ active behavioural engagement in confidence rating (to minimise effects relating to motor processes), and one unmodulated regressor (duration = 0.1 s). Lastly, we included one categorical boxcar regressor (1/0) to model non-task activation (i.e., rest breaks within each run). Motion correction parameters obtained from fMRI preprocessing were entered as additional covariates of no interest.

As we included two rating-modulated regressors in our model, which were identical except for their onset times (i.e., decision and rating phases, respectively), we sought to ensure that these were not highly correlated. We computed the correlation between the convolved regressors, separately for each subject and experimental run (mean R = -.13; Figure 5—figure supplement 3). Additionally, we conducted two separate control GLM analyses whereby only the regressors pertaining to one trial phase (i.e., decision or rating, respectively) were included at a time. This allowed us to further validate our results, to ensure they remained unaffected by potential correlations between regressors at the two stages of the trial (Figure 5—figure supplement 4). Finally, we also assessed the correlations between all regressors by computing the variance inflation factors (VIF) for the regressors in our model. We found that mean VIF = 3.57 (±1.83), with multicollinearity typically being considered high if VIF > 5 – 10.
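Variance inflation factors can be computed by regressing each regressor on all the others, VIF_j = 1/(1 − R_j²). A NumPy sketch, assuming the design matrix is passed without its constant column; the function name is hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X (n_samples, n_regressors), using an intercept in each auxiliary fit."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vif = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid.var() / X[:, j].var()      # R^2 of regressor j on the others
        vif[j] = 1.0 / (1.0 - r2)
    return vif
```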

Resampling procedure for fMRI thresholding

To estimate a significance threshold for our fMRI statistical maps whilst correcting for multiple comparisons, we performed a nonparametric permutation analysis that took into account the a priori statistics of the trial-to-trial variability in our primary regressor of interest (yCONF), in a way that trades off cluster size and maximum voxel Z-score (Debettencourt et al., 2011). For each resampled iteration, we kept the onset and duration of the regressor identical, whilst shuffling amplitude values across trials, runs and subjects. Thus, the resulting regressors for each subject were different, as they were constructed from a random sequence of regressor amplitude events. This procedure was repeated 200 times. For each of the 200 resampled iterations, we performed a full 3-level analysis (run, subject, and group). Our design matrix included the same regressors of no interest used in all our GLM analyses. This allowed us to construct the null hypothesis H0, and establish a threshold on cluster size and Z-score based on the cluster outputs from the permuted parametric regressors. Specifically, we extracted cluster sizes from all activations exceeding a minimal cluster size (5 voxels) and Z-score (2.57 per voxel) for positive correlations with the permuted parametric regressors. Finally, we examined the distribution of cluster sizes (number of voxels) for the permuted data and found that the largest 5% of cluster sizes exceeded 162 voxels. We therefore used these results to derive a corrected threshold for our statistical maps, which we then applied to the clusters observed in the original data (that is, Z = 2.57, minimum cluster size of 162 voxels, corrected at p = 0.05).
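A sketch of the cluster-extent threshold using `scipy.ndimage.label`. The 26-connectivity neighbourhood and the pooling of all null clusters (rather than only the largest cluster per permutation) are assumptions based on the description above; the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

CONN26 = ndimage.generate_binary_structure(3, 3)    # 26-connectivity (assumption)

def cluster_sizes(z, z_thresh=2.57, min_size=5):
    """Sizes of positive suprathreshold clusters with at least min_size voxels."""
    labels, _ = ndimage.label(z > z_thresh, structure=CONN26)
    sizes = np.bincount(labels.ravel())[1:]          # drop background label 0
    return sizes[sizes >= min_size]

def cluster_extent_threshold(null_maps, alpha=0.05, **kw):
    """Cluster size exceeded by the largest alpha fraction of all pooled null clusters."""
    pooled = np.concatenate([cluster_sizes(z, **kw) for z in null_maps])
    return np.percentile(pooled, 100 * (1 - alpha))

def apply_cluster_threshold(z, min_cluster, z_thresh=2.57):
    """Boolean mask keeping only clusters with at least min_cluster voxels."""
    labels, _ = ndimage.label(z > z_thresh, structure=CONN26)
    keep = np.bincount(labels.ravel()) >= min_cluster
    keep[0] = False                                  # never keep the background
    return keep[labels]
```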

Psychophysiological interaction analysis

We conducted a psychophysiological interaction (PPI) analysis to explore potential functional connectivity, during the perceptual decision phase of the trial, between the rest of the brain and the region of the VMPFC found to uniquely explain trial-to-trial variability in our electrophysiologically derived measures of confidence. To carry out the PPI analysis, we first extracted the time-series data from the seed region. Specifically, we identified the cluster of interest at the group level (i.e., in standard space) by applying the cluster correction procedure described in the previous section. Using this as a template, we constructed subject-specific masks of the voxels exhibiting the strongest correlation with the VMPFC region of interest, and back-projected these into the functional space of each individual. The resulting masks were used to compute average time-series data, separately for each subject and functional run, which subsequently served as the physiological regressor(s) in the PPI model. We then performed a new GLM analysis, which included the following regressors, locked to the time of stimulus onset: (1) an unmodulated regressor (all event amplitudes set to 1), (2) the physiological regressor (time course of the VMPFC seed), (3) the psychological regressor (a boxcar function with event amplitudes set to one and duration parametrically modulated by trial-specific decision times, i.e., the interval between stimulus presentation and behavioural response on the perceptual task), and (4) the interaction regressor. Additionally, motion parameters estimated during motion correction (see fMRI preprocessing) were included as regressors of no interest. The statistical output from the interaction regressor thus reveals regions of the brain whose correlation with the BOLD signal in the VMPFC is stronger during the perceptual decision than during the rest of the trial.
Importantly, this represents variance additional to that explained by the psychological and physiological regressors alone. Correction for multiple comparisons was performed on the whole brain using the outcome of the resampling procedure as described earlier.
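A minimal sketch of the PPI design in NumPy, forming the interaction as the product of a zero-centred task regressor and the demeaned seed time series. Centring the task regressor on the midpoint of its range is one common choice (demeaning is another), and no deconvolution of the seed signal is performed; the remaining regressors (unmodulated events, motion parameters) would be appended as extra columns. `ppi_design` is hypothetical.

```python
import numpy as np

def ppi_design(psych, phys):
    """Design matrix [psych, phys, interaction, constant] for a PPI GLM.

    psych : task regressor, e.g. the decision-period boxcar (HRF-convolved)
    phys  : seed (VMPFC) time series, one value per volume
    """
    psych = np.asarray(psych, dtype=float)
    phys_c = np.asarray(phys, dtype=float) - np.mean(phys)         # demeaned seed signal
    psych_c = psych - (psych.max() + psych.min()) / 2.0            # zero-centred task regressor
    interaction = psych_c * phys_c
    return np.column_stack([psych, phys_c, interaction, np.ones_like(psych)])
```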

Extracting BOLD response time course

To illustrate the activation time course within the VMPFC cluster identified with our EEG-informed fMRI analysis, we first extracted the average BOLD response time-series from this region, separately for each subject and functional run (as detailed in the previous section). We aligned our data to the onset of the random-dot stimulus, by approximating to the time of the nearest fMRI volume, and defined the temporal window of interest as the -4 s to 10 s interval relative to stimulus onset. We proceeded to separate trials into three bins according to the magnitude of the confidence discriminating component yCONF (i.e., Low, Medium, and High yCONF), and computed the respective percent signal change as follows:

$$\%\mathrm{BOLD\ Change}_j(t) = \frac{\mathrm{BOLD}_j(t) - \mathrm{BOLD}_j^{\mathrm{baseline}}}{\overline{\mathrm{BOLD}}} \tag{6}$$

where j is the trial index, $\mathrm{BOLD}_j(t)$ is the stimulus-locked data at time point t, and $\mathrm{BOLD}_j^{\mathrm{baseline}}$ is the mean baseline signal, with the baseline window defined as the 4 s interval prior to stimulus onset. Finally, $\overline{\mathrm{BOLD}}$ is the average signal across the entire functional run. The resulting signals were averaged across trials, runs, and subjects.
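A NumPy sketch of Equation 6 for one run, followed by averaging within yCONF tertiles. Trials whose window extends beyond the run are skipped (an assumption), and multiplying the result by 100 expresses it in percent; the function names are hypothetical.

```python
import numpy as np

def bold_change_epochs(ts, onset_vols, tr=2.0, pre_s=4.0, post_s=10.0):
    """Stimulus-locked BOLD change relative to baseline, normalised by the run mean.

    ts         : (n_vols,) mean time series of the region for one run
    onset_vols : index of the volume nearest to each stimulus onset
    Returns epochs (n_kept, n_pre + n_post + 1) and the indices of the kept trials.
    """
    ts = np.asarray(ts, dtype=float)
    n_pre, n_post = int(round(pre_s / tr)), int(round(post_s / tr))
    run_mean = ts.mean()                               # average over the entire run
    epochs, kept = [], []
    for j, v in enumerate(onset_vols):
        if v - n_pre < 0 or v + n_post >= ts.size:
            continue                                   # window exceeds the run
        baseline = ts[v - n_pre:v].mean()              # 4 s before onset
        epochs.append((ts[v - n_pre:v + n_post + 1] - baseline) / run_mean)
        kept.append(j)
    return np.array(epochs), np.array(kept)

def average_by_tertile(epochs, y_conf):
    """Mean time course for Low, Medium and High yCONF tertiles."""
    y = np.asarray(y_conf, dtype=float)
    bins = np.digitize(y, np.quantile(y, [1 / 3, 2 / 3]))
    return np.array([epochs[bins == b].mean(axis=0) for b in range(3)])
```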

References

  9. Colizoli O, de Gee JW, Urai AE, Donner TH (2018) Task-evoked pupil responses reflect internal belief states. Scientific Reports 8. https://doi.org/10.1038/s41598-018-31985-3. PMID: 30209335.
  16. Duda RO, Hart PE, Stork DG (2001) Pattern Classification. New York: Wiley.
  19. Fleming SM, Dolan RJ (2012) The neural basis of metacognitive ability. Philosophical Transactions of the Royal Society B: Biological Sciences 367:1338–1349. https://doi.org/10.1098/rstb.2011.0417
  41. Ho TC, Brown S, Serences JT (2009) Domain general mechanisms of perceptual decision making in human cortex. The Journal of Neuroscience 29:8675–8687. https://doi.org/10.1523/JNEUROSCI.5984-08.2009
  58. Molenberghs P, Trautwein F-M, Böckler A, Singer T, Kanske P (2016) Neural correlates of metacognitive ability and of feeling confident: a large-scale fMRI study. Social Cognitive and Affective Neuroscience 28:nsw093.
  85. Smith PL, Little DR (2018) Small is beautiful: In defense of the small-N design. Psychonomic Bulletin & Review 349. https://doi.org/10.3758/s13423-018-1451-8
  93. Vickers D (1979) Decision Processes in Visual Perception. Academic Press.

Decision letter

  1. Tobias H Donner
    Reviewing Editor; University Medical Center Hamburg-Eppendorf, Germany
  2. Joshua I Gold
    Senior Editor; University of Pennsylvania, United States
  3. Tobias H Donner
    Reviewer; University Medical Center Hamburg-Eppendorf, Germany

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: a previous version of this study was rejected after peer review, but the authors submitted for reconsideration. The first decision letter after peer review is shown below.]

Thank you for submitting your work entitled "Human VMPFC encodes early signatures of confidence in perceptual decisions" for consideration by eLife. Your article has been reviewed by four peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The reviewers have opted to remain anonymous.

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

All reviewers agreed that your study addresses an important topic, and that the simultaneous fMRI-EEG approach has high potential for doing so. However, all reviewers also raised a number of substantive concerns about the specifics of the approach, and none of the reviewers was sufficiently convinced by the conceptual advance afforded by this study.

Specifically, the reviewers identified one central issue as limiting the conclusions that can be drawn from the results – the functional meaning of the discriminant component amplitude Y remains unclear. This is for a number of reasons: (i) there is no model that explains how Y is computed and links it to the elements of the decision process, which are better understood (sensory input, decision variable, internal noise); (ii) Y is only partially correlated with confidence ratings; (iii) the behaviour of Y is at odds with most current models of confidence (e.g. Y does not predict choice accuracy); (iv) no predictive effect of Y on future behaviour is established; and (v) by design, reaction times (and with them stimulus duration and time of response preparation) vary from trial to trial, raising concerns about trivial explanations for Y. In addition, the reviewers were not convinced that the functional connectivity analysis adds anything of substance to the paper.

That said, most reviewers felt that an improved version of this paper that convincingly addresses the above concerns and those of the individual reviewers copied below might warrant another round of formal review. An essential prerequisite for this would be the presentation of a clear generative model for Y and a new version of the analysis and/or an improved experimental design that eliminates the reaction time confound from the decoding analysis. In such a case, we would need to evaluate the changes first to decide if further review is justified, and therefore we would treat the paper as a new submission (but considering its full history).

Reviewer #1:

This is an interesting and timely study into the neural basis of perceptual decision confidence behavior. The overall approach is original and state-of-the-art. The early confidence signal uncovered in VMPFC is novel and potentially important. That said, I am troubled by a number of conceptual and methodological issues.

1) One conceptual limitation is that the paper does not provide any insight into how the early confidence signal is constructed in the brain – specifically, how the confidence signal relates to sensory evidence, and the internal decision variable that the brain derives from that evidence. This issue is central to current theoretical work on perceptual decision confidence. Addressing this issue would substantially raise the significance of the paper. The functional connectivity analysis might shed some light on this issue, but the authors should take the VMPFC (rather than RLPFC, as they do now) as a seed. Then, the analysis should reveal two sets of regions: those that drive the VMPFC signal (regions encoding the decision variable?) and those that are driven by the VMPFC signal (RLPFC, as the authors speculate?).

2) A second limitation is that the functional role of the neural confidence signals is not assessed. Many influential papers (theoretical and experimental) in this field have begun to uncover the roles of confidence in controlling behaviour and learning. While the task design is not tailored to addressing this issue, the authors could test for confidence-dependent, short-term changes in choice behavior or longer-term learning effects. Again, this would raise the significance of the findings reported.

3) The logic behind the PPI analysis needs to be unpacked – it is not clear if the result provides any conceptual insight. First, the authors seem to suggest that the VMPFC confidence signal drives RLPFC – then, why should the strength of this correlation scale with confidence? Should the correlation not be the same, regardless of whether confidence is high or low? Second, the functional consequences of this coupling result are unclear. This part of the authors' conclusions is purely speculative ("informing metacognitive evaluation and learning") – a meaningful link to behavior would help.

Reviewer #2:

In this paper, the authors describe a combined EEG-MRI study that links early predictors of confidence in a perceptual discrimination task to stimulus-locked activity in the ventromedial prefrontal cortex (vmPFC). Globally the paper is well written, with many aspects to praise in the design (notably controls for difficulty and motor confounds) and very sophisticated analyses.

To me it is not uncommon to see vmPFC activity associated with confidence in perceptual decision tasks, even if this is generally not the main message. So I think this claim is not particularly novel. However, the dataset reported here is rather unique because it allows using EEG measurements to track the neural noise that is added to perceptual evidence in the generation of confidence. This construct (called Y by the authors) can then be used as a regressor in the analysis of fMRI data.

The main issue is the circularity in the approach: the multivariate EEG decoder is trained to predict confidence ratings, and then the output of this decoder (i.e., Y) is said to represent an early predictor of confidence that is different from the rating. I am convinced that vmPFC activity indeed correlates with Y and not rating, but what is Y? That is the question. Without a precise specification of this construct, I do not think there is much information to get from the result. It remains open to uninteresting interpretations: for instance Y could represent the fact that at this time point the decision has been made or not, or the proximity of the motor response, since confidence correlates with response time.

My suggestion is to build a generative model of confidence rating, in which Y would be a factor among others. What needs to be explained is how Y is generated (possibly something like evidence plus neural noise) and then how it is transformed into choice, response time and confidence rating. If Y could be estimated independently of rating, then the circularity would be broken and the dissociation between neural representations of Y and rating would be meaningful. Computational modeling may be helpful here, perhaps a race model as that used in De Martino et al., 2013.

Another concern relates to the PPI analysis. I cannot make sense of the result that vmPFC activity reflects the interaction between confidence rating and time series in the rostrolateral PFC. If this region already signals confidence level, then the interaction regressor is something like confidence squared. Does this really tell us anything about the passage of information from vmPFC to rlPFC?

Reviewer #3:

This study investigates the neural mechanisms of perceptual decision confidence using a combination of EEG, fMRI and machine learning. Participants discriminate the drift direction of moving dot kinematograms. Motion coherence was varied on a per-subject basis to achieve around 75% performance in each participant. The perceptual task is speeded, meaning that the response is given as soon as the participant feels ready and then the stimulus presentation is terminated. There are three key analyses. The first links trial-wise reported confidence during the decision-making stage directly to BOLD signals and finds the striatum, lateral OFC, ACC and other regions to be positively related to confidence, consistent with previous work. The second analysis links reported confidence during the rating stage to BOLD signals and finds the striatum, medial temporal regions and motor cortex to be positively related to confidence. The third main analysis is based on a trial-wise EEG classifier that classifies the confidence on each trial. This third component was related to BOLD signals in the ventromedial PFC.

The findings here are interesting and add to and extend the literature on confidence signals in the brain. However, I have a number of points that still need to be clarified.

1) The use of a speeded perceptual task means that the stimulus presentation duration is shorter for high confidence than for low confidence trials. I was wondering what effect this confound has on the EEG classifier and, in turn, on the BOLD signals observed in the third main (i.e. EEG-based) analysis. Could it be that the classifier is partly picking up the effect of the longer stimulus duration?

2) I would be more upfront about the approach used for defining and dissociating the different temporal stages (decision making, rating). I couldn't work this out until I reached the Materials and methods section, but it is vital to understanding the design.

3) I didn't understand the logic of the autocorrelation analysis that was used to control for attention. Please explain.

4) The summary of time series as "delay" and "peak" is too dense (Figure 2A). It would be better to show individual time courses to confirm that the data can be appropriately summarized by a delay and peak.

5) How can it be ensured that the EEG-derived measures are independent of difficulty, accuracy and attention? For this it would be necessary to assess the relationship between the EEG-measure and these behavioral properties explicitly (ideally to plot them as well).

Other comments:

In order to accord with requirements for reporting statistics the paper here should add a statement that "No explicit power analysis was conducted for determining sample size".

Reviewer #4:

In this paper, the authors investigate the correlates of confidence using single trial multivariate analyses based on EEG signals that were concurrently recorded during fMRI. Crucially, the authors derive interesting relations between BOLD-fMRI and EEG decoded signals during the decision stage (before an actual motor action is executed), thus allowing them to elegantly show in humans with high spatial and temporal specificity the neural correlates of confidence in perceptual decisions before a motor action is observed.

Overall, this study offers new insights into the origins of confidence during perceptual decisions by showing that the vmPFC also encodes an early confidence readout for this type of choice. I am a fan of the authors' methodological strategy for studying human decision-making. However, my criticisms of this study mainly concern the statistical analyses on which the authors base their conclusions, which I think should be revised. I provide some suggestions that may help to strengthen the authors' conclusions.

1) An important concern is the statistical fMRI modelling approach. The delay between the response to the stimulus and the confidence rating is extremely short if one wants to include the same parametric regressors at both the decision and the confidence rating stages. I appreciate that there is a jitter of 1.5-4 s; however, given this short average duration (~2.6 s, roughly a bit more than 1 TR), I suspect that the confidence-rating parametric modulators at the rating and decision stages will be highly correlated. Even if FSL allows such a model to be run, highly correlated regressors (after convolution with the HRF) can have dramatic effects on the beta weights due to variance inflation (see, for instance, Mumford et al., 2015).
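
To illustrate the variance-inflation concern, the following Python sketch simulates one hypothetical run, places a mean-centred confidence modulator at the choice and the same modulator at the rating 1.5-4 s later, convolves both with an SPM-style double-gamma HRF, and reports their correlation r and the variance inflation factor VIF = 1/(1 - r²) that applies to either regressor in a two-regressor model. The trial spacing, the 1-4 rating scale, the uniform jitter distribution and the HRF shape are all assumptions (FSL's default HRF differs in detail), and stick functions are a simplification of the regressors actually used.

```python
import math

import numpy as np


def double_gamma_hrf(dt=0.1, length=32.0):
    """SPM-style double-gamma HRF: gamma(6) response minus gamma(16)/6 undershoot."""
    t = np.arange(0.0, length, dt)

    def gamma_pdf(x, shape):
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)

    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()


def pm_regressor(onsets, values, n, dt, hrf):
    """Sticks at `onsets` (s) scaled by mean-centred `values`, convolved with the HRF."""
    x = np.zeros(n)
    for onset, v in zip(onsets, values - values.mean()):
        x[int(round(onset / dt))] += v
    return np.convolve(x, hrf)[:n]


rng = np.random.default_rng(0)
dt, n_trials = 0.1, 100
decision_onsets = np.cumsum(rng.uniform(8.0, 12.0, n_trials))      # hypothetical trial spacing
rating_onsets = decision_onsets + rng.uniform(1.5, 4.0, n_trials)  # jitter quoted by the reviewer
confidence = rng.integers(1, 5, n_trials).astype(float)            # hypothetical 1-4 ratings
n = int((rating_onsets[-1] + 40.0) / dt)
hrf = double_gamma_hrf(dt)

decision_pm = pm_regressor(decision_onsets, confidence, n, dt, hrf)
rating_pm = pm_regressor(rating_onsets, confidence, n, dt, hrf)
r = np.corrcoef(decision_pm, rating_pm)[0, 1]
vif = 1.0 / (1.0 - r ** 2)  # VIF of either regressor in a two-regressor model
print(f"r = {r:.2f}, VIF = {vif:.2f}")
```

The exact values depend on the assumed timing; with real data, the same quantities can be computed directly from the convolved design matrix.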

On a related issue, the authors write: "we also included a parametric regressor modulated by subjects' reaction time on the direction discrimination task (duration = 0.1 s, locked to the time of behavioural response)". First, did the authors also include the main effect regressor? This is not reported. If it wasn't included, then the model is wrongly specified. You cannot include a parametric regressor without including the main effect regressor. In any case, if this main effect regressor is indeed included, then once more, I suspect that this main effect regressor will be highly correlated with the main effect regressor that is included on trial onset. The mean response times are less than 1 s. If you convolve two stick functions that are less than 1 s apart, the resulting convolved regressors will be highly correlated.
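
The point about stick functions less than 1 s apart can be checked in the same way. In this sketch (hypothetical trial onsets, RTs drawn uniformly below the 1.35 s response limit reported in the author response, and an SPM-style double-gamma HRF as an assumption), one unit-height stick regressor is placed at each trial onset and another at onset plus RT, and the correlation between the two convolved regressors is printed.

```python
import math

import numpy as np


def double_gamma_hrf(dt=0.1, length=32.0):
    """SPM-style double-gamma HRF (an assumed shape; FSL's default differs in detail)."""
    t = np.arange(0.0, length, dt)

    def gamma_pdf(x, shape):
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)

    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()


def stick_regressor(onsets, n, dt, hrf):
    """Unit sticks at `onsets` (seconds) convolved with the HRF."""
    x = np.zeros(n)
    x[np.round(onsets / dt).astype(int)] = 1.0
    return np.convolve(x, hrf)[:n]


rng = np.random.default_rng(1)
dt, n_trials = 0.1, 100
onsets = np.cumsum(rng.uniform(8.0, 12.0, n_trials))  # hypothetical trial onsets
rts = rng.uniform(0.6, 1.35, n_trials)                # hypothetical RTs under a 1.35 s limit
n = int((onsets[-1] + 40.0) / dt)
hrf = double_gamma_hrf(dt)

onset_reg = stick_regressor(onsets, n, dt, hrf)
response_reg = stick_regressor(onsets + rts, n, dt, hrf)
r = np.corrcoef(onset_reg, response_reg)[0, 1]
print(f"r(onset, response) = {r:.2f}")
```

Because both regressors follow the same trial sequence and differ only by a sub-second shift, their correlation is high under these assumptions, which is the collinearity the reviewer anticipates.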

I would like to see, for the current design, an example of the design and design_cov figures produced by FSL for two or three subjects (and I think this should be formally included as supplementary information in the manuscript), or more generally, the correlations not only between the original parametric regressors but also between all the regressors, including both main effects and parametric regressors, after convolution with the HRF.

Regarding the first point, I understand that the authors wanted to split explanatory variance between the decision and rating stages, but what I recommend and find more appropriate given my above-mentioned concern (if the authors insist on investigating the separation of confidence between the decision and the rating stages) is to run two separate GLMs, one with the parametric regressor on the decision stage and one with the parametric regressor on the confidence rating stage, and investigate whether the main conclusions of the authors still hold. Then, in a second-level analysis, the authors could investigate at which time point (decision or rating) there is a stronger relationship between the confidence ratings and BOLD responses.

2) In the results presented in Figure 2B, if I understood correctly, the authors report the y(t) out-of-sample values of the "unseen" data for the middle confidence level. I am not convinced that the results reported by the authors are strong evidence for their decoder's ability to generate sensible out-of-sample y(t) values as this could simply reflect regression to the mean (if the authors use the middle confidence level for reading out y(t) by using a dichotomous decoder). I do not think that this result is especially revealing nor necessary to conduct the subsequent fMRI analysis (see my next point).

3) Regarding the use of the decoded value y(t) as a parametric regressor for the fMRI analysis: if I understood correctly, the authors use out-of-sample values of y(t) only for the middle confidence level (see my concern in the point above), but not for the low and high confidence levels. I think that a more appropriate analysis would be to obtain all y(t) values fully out of sample. The authors can split the data into n folds (for instance 10), train the decoder on n-1 folds, obtain the yCONF regressor by decoding the remaining fold, and then use these yCONF values as regressors for the fMRI data (for a similar n-fold cross-validation approach, see for instance van Bergen et al., 2015, Nature Neuroscience). Given the nature of the decoder used by the authors (dichotomous predictions), it should be enough to split the data into high and low confidence levels to train the decoder (so it is not necessary to use three or more levels).
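
The n-fold scheme suggested here can be sketched as follows. So that the example is self-contained, a shrinkage-regularised Fisher linear discriminant stands in for the authors' discriminator (which may be a different linear classifier), the data are simulated (trials × channels, labels 0 = low and 1 = high confidence), and the function names are hypothetical. Each trial's amplitude y = w·x + b comes from a model that never saw that trial.

```python
import numpy as np


def fisher_lda(X, y, shrink=0.1):
    """Shrinkage-regularised Fisher discriminant; returns weights w and bias b."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    within = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    cov = np.cov(within, rowvar=False)
    d = cov.shape[0]
    cov = (1 - shrink) * cov + shrink * np.trace(cov) / d * np.eye(d)
    w = np.linalg.solve(cov, mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2  # threshold halfway between the class means
    return w, b


def cross_validated_y(X, y, k=10, seed=0):
    """Out-of-fold discriminant amplitude y = X @ w + b for every trial."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    out = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        w, b = fisher_lda(X[train_idx], y[train_idx])
        out[test_idx] = X[test_idx] @ w + b
    return out


# Simulated data: 300 trials x 32 channels (hypothetical), 0 = low, 1 = high confidence
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 300)
pattern = rng.normal(size=32)
X = rng.normal(size=(300, 32)) + 0.3 * np.outer(labels, pattern)
y_conf = cross_validated_y(X, labels, k=10)
print(y_conf[labels == 1].mean() - y_conf[labels == 0].mean())
```

Trials from the middle confidence level, which are never used for training, can simply be scored with a decoder trained on all low and high trials.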

4) Subsection “EEG-derived measure of confidence”, last paragraph: Maybe I missed the point, but it is not entirely clear to me why the discriminant component amplitudes are expected not to differ between correct and incorrect answers. One of the well-established statistical signatures of confidence is that confidence is markedly different for correct and incorrect responses (e.g. see Sanders, Hangya and Kepecs, 2016; Urai, Braun and Donner, 2017). Why didn't the authors expect a separation (see my next point for a related concern), and if not, what does the discriminant component amplitude really reflect? More discussion on this point in general would be great.
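
The statistical signature referred to here can be illustrated with a minimal signal-detection simulation in which confidence is defined as the posterior probability that the choice is correct given the evidence (the "statistical confidence" of Sanders, Hangya and Kepecs, 2016). Gaussian evidence with unit noise, equal priors and the signal strength mu are assumptions; mu = 0.7 gives roughly 75% accuracy, similar to the performance level targeted in this study. Even at a single difficulty level, mean confidence is higher for correct than for incorrect choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n = 0.7, 100_000                       # assumed signal strength and number of trials
direction = rng.choice([-1, 1], n)         # true motion direction
evidence = direction * mu + rng.normal(size=n)
choice = np.sign(evidence)
correct = choice == direction
# Posterior P(correct | evidence) with equal priors and unit noise: logistic in 2*mu*|evidence|
confidence = 1.0 / (1.0 + np.exp(-2.0 * mu * np.abs(evidence)))

print(f"accuracy {correct.mean():.3f}")
print(f"mean confidence: correct {confidence[correct].mean():.3f}, "
      f"error {confidence[~correct].mean():.3f}")
```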

5) Subsection “Stimuli and task”, last paragraph: On a related point (and perhaps an important caveat of this study), I do not understand why the authors explicitly asked the participants to "abstain from making a confidence response on a given trial if they became aware of having made an incorrect response", thereby excluding these trials. Again, one of the well-established signatures of confidence is that confidence is markedly different for correct and incorrect responses. How would the results have been affected without such an explicit instruction to the participants? In my opinion, this confidence information should not be excluded or spuriously biased via instructions to the participants. Therefore, I am afraid that what the authors are capturing with their actual confidence ratings (and therefore the decoded values y(t)) is a biased response that is not formally confidence per se. I urge the authors to report this potential caveat and make a clear case (from the beginning) for why this strategy was adopted in the first place.

[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]

Thank you for submitting your article "Human VMPFC encodes early signatures of confidence in perceptual decisions" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Tobias H Donner as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Sabine Kastner as the Senior Editor.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The authors have carried out substantial new analyses in their revision, and I acknowledge the paper has improved on the methodological level. Again, we are impressed by the technical achievement that clearly establishes a link between the EEG-derived construct and fMRI activity. However, the reviewers are still concerned about the interpretation of this construct at a functional/cognitive level. The authors fitted response time distributions for correct and error trials with a race model. The reviewers appreciate the effort made to address these concerns, but are not sufficiently convinced by the insight this provides into the underlying mechanisms.

Major comments:

1) One issue is that the model makes the same predictions for every trial, as task difficulty (motion coherence) is held constant. So the authors are limited to between-subject correlations, which they indeed found between the modeled sensory evidence at the bound (Δe) and the EEG-derived predictor of confidence rating (Y). My understanding of the current interpretation is that Y represents evidence plus some neural noise, and then that confidence rating (R) is Y plus something else (perhaps noise again) which could be loosely defined as 'meta-cognitive reappraisal'. This is not very informative, and not even directly tested.

A more straightforward test would be a mediation analysis, assessing whether Y could indeed mediate the link from Δe to R. The alternative hypothesis to be discarded is that Δe is actually closer to R, which would take us back to the fundamental question of how Y can be specified in cognitive terms.
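
A minimal version of the suggested mediation test, using the product-of-coefficients approach with a percentile bootstrap (one common choice; the reviewers did not specify a method), is sketched below. Here a is the effect of Δe on Y, b is the effect of Y on R controlling for Δe, and the indirect effect is a·b. The data are simulated under the reviewer's reading (Y = Δe + noise, R = Y + noise), and the variable and function names are hypothetical.

```python
import numpy as np


def ols(y, *xs):
    """OLS coefficients of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]


def indirect_effect(x, m, y):
    a = ols(m, x)[1]        # x -> m
    b = ols(y, x, m)[2]     # m -> y, controlling for x
    return a * b


def mediation(x, m, y, n_boot=2000, seed=0):
    """Indirect effect a*b of x on y through m, with a percentile-bootstrap 95% CI."""
    rng = np.random.default_rng(seed)
    n = len(x)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample observations with replacement
        boots.append(indirect_effect(x[idx], m[idx], y[idx]))
    return indirect_effect(x, m, y), np.percentile(boots, [2.5, 97.5])


# Simulated under the reading Y = delta_e + noise, R = Y + noise (hypothetical data)
rng = np.random.default_rng(1)
delta_e = rng.normal(size=200)
Y = delta_e + rng.normal(scale=0.5, size=200)
R = Y + rng.normal(scale=0.5, size=200)
effect, ci = mediation(delta_e, Y, R)
print(f"indirect effect {effect:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Under the alternative that Δe is closer to R than Y is, the direct effect of Δe on R (controlling for Y) would remain large and a·b would shrink towards zero.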

Yet a more informative use of the race model would be to fit trial-by-trial variations in R. This means allowing free parameters to vary across trials, and to test their potential relationship with Y. It could be for instance that Y fluctuations arise from variations in the starting point, or, within a Bayesian framework, in the prior on motion direction. Having said this, my agenda is not to bury the paper under requests for additional work. A clarified relationship between Δe, Y and R may be a reasonable limit to what can be inferred from the dataset.

2) A related issue comes with the new analysis provided to substantiate a role for confidence in behavioral control. The authors found that higher confidence predicts repetition of the same choice in the next trial, if motion direction is the same as in the current trial. This is not in line with the computational model in its present form. It could mean that confidence influences the prior on motion direction, but this would obviously not be adaptive. There is therefore a need to reconcile this finding with the generative model of choice and confidence.

3) Y is not different for correct/incorrect responses: The authors should add a figure plotting Y split for correct and incorrect responses (perhaps next to the new panel 2C) indicating the quantitative difference. Also, please add an explicit explanation (in the Results/Discussion section) of this result. This is essential, as I believe it reveals quite a lot about what type of information Y carries, namely, it is not the classical statistical signature of confidence (see the work of Sanders et al. and Urai et al.) but something different, in line with the arguments that the authors give at the end of the response to this point, i.e. not the probability that the choice is correct, but something else. The authors try to describe this "dissociation" to some extent in other parts of the text, but should be more explicit about this point.

4) Please add the neural correlates of Y at the rating stage (these can be in the supplement), and briefly comment on this in the Results section. The reviewers understood that the authors did not want to focus on this stage, but it is quite a nice result that information about Y transitions from the vmPFC at the decision stage to motor-related areas at the confidence rating stage.

https://doi.org/10.7554/eLife.38293.026

Author response

[Editors’ note: the author responses to the first round of peer review follow.]

Reviewer #1:

This is an interesting and timely study into the neural basis of perceptual decision confidence behavior. The overall approach is original and state-of-the-art. The early confidence signal uncovered in VMPFC is novel and potentially important. That said, I am troubled by a number of conceptual and methodological issues.

1) One conceptual limitation is that the paper does not provide any insight into how the early confidence signal is constructed in the brain – specifically, how the confidence signal relates to sensory evidence, and the internal decision variable that the brain derives from that evidence. This issue is central to current theoretical work on perceptual decision confidence. Addressing this issue would substantially raise the significance of the paper.

We agree that a better understanding of the mechanisms underlying these early confidence signals is essential for linking theoretical and empirical work on decision confidence. In the revised manuscript, we have employed a computational modelling approach to address this question, and provide a prospective link between observed neural confidence signals and the perceptual decision process.

Specifically, we fitted our behavioural data with a variant of the race model of decision making (Vickers, 1979; Vickers and Packer, 1982; De Martino et al., 2013) (Materials and methods, subsection “Modelling decision confidence”, first paragraph), which describes the decision process as a stochastic accumulation of perceptual evidence over time by two independent signals representing the possible choices (with confidence represented as the difference in the evidence accumulated towards the two choices at the termination of the decision process – Δe). Overall the model fitted our behavioural data well, and importantly, we found that our neural measures of confidence (EEG-derived discriminant component – Y) were able to capture patterns in the model estimates of confidence (Δe) across participants (Results, subsection “Dynamic model of decision making”, last paragraph). In particular, for each subject, we computed the mean confidence difference between correct and error trials, as reflected by the neural signals (Y) and the model estimates (Δe) and tested the extent to which these quantities were correlated across subjects. This relative measure, which captured the relationship between confidence and choice accuracy, also ensured that any potential between-subject differences in the overall magnitude of the discriminant component (e.g., due to across-subject variability in overall EEG power) were subtracted out. Indeed, we found a significant positive correlation, such that subjects who showed stronger Y difference between correct and error trials also showed higher correct vs. error difference in Δe (R=.48, p=.019, robust correlation coefficient obtained using the percentage bend correlation analysis (Wilcox, 1994); see Figure 3D), suggesting that neural confidence could arise from a race-like process similar to that implemented by the current model. 
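
For reference, the percentage bend correlation (Wilcox, 1994) used for this across-subject test can be written in a few lines. This sketch follows the usual formulation with bending constant beta = 0.2 (an assumption; the value used here is not stated); it returns r_pb and a t statistic with n - 2 degrees of freedom, from which a p-value can be obtained with any t-distribution routine. The per-subject differences below are simulated, not the study's data.

```python
import numpy as np


def _bend_scores(x, beta=0.2):
    """Percentage-bend scores of one variable, clipped to [-1, 1]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    med = np.median(x)
    omega = np.sort(np.abs(x - med))[int(np.floor((1 - beta) * n)) - 1]
    psi = (x - med) / omega
    i1, i2 = np.sum(psi < -1), np.sum(psi > 1)
    inlier_sum = x[(psi >= -1) & (psi <= 1)].sum()
    phi = (inlier_sum + omega * (i2 - i1)) / (n - i1 - i2)  # percentage-bend location
    return np.clip((x - phi) / omega, -1.0, 1.0)


def percentage_bend_corr(x, y, beta=0.2):
    """Return (r_pb, t); under H0, t follows a t distribution with n - 2 df."""
    a, b = _bend_scores(x, beta), _bend_scores(y, beta)
    r = np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    t = r * np.sqrt((a.size - 2) / (1 - r ** 2))
    return r, t


# Hypothetical per-subject correct-minus-error differences for 24 subjects
rng = np.random.default_rng(0)
d_y = rng.normal(size=24)
d_delta_e = 0.6 * d_y + rng.normal(scale=0.8, size=24)
r_pb, t_stat = percentage_bend_corr(d_y, d_delta_e)
print(f"r_pb = {r_pb:.2f}, t({d_y.size - 2}) = {t_stat:.2f}")
```

Because the scores are clipped to [-1, 1] after robust centring and scaling, a single extreme subject has bounded influence on r_pb, unlike Pearson's r.
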
In other words, it is possible that, in line with the “balance of evidence hypothesis” (Vickers, 1979) and the idea that confidence emerges from the process of decision formation itself (Kiani and Shadlen, 2009; Gherman and Philiastides, 2015), the observed early EEG-derived measures of confidence (Y) may reflect the difference in the evidence accumulated towards the two choices at the time of decision. Alternatively, Y could also represent a (potentially noisy) readout of this difference (e.g., by a system distinct from the one supporting the perceptual choice itself).
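
The balance-of-evidence idea can be made concrete with a small race-model simulation: two independent accumulators with Gaussian increments race to a common bound, the first to reach it determines the choice, and Δe is the difference between the two accumulators at termination (for trials that do not finish, the leader at the deadline is taken as the choice). All parameter values, the discrete-step formulation and the noisy readout Y = Δe + noise (the second possibility raised above) are illustrative assumptions, not the fitted model.

```python
import numpy as np


def simulate_race(n_trials=5000, drift=(0.12, 0.08), noise=0.3, bound=10.0,
                  n_steps=135, seed=0):
    """Two independent accumulators (0 = correct option) race to a common bound."""
    rng = np.random.default_rng(seed)
    steps = np.asarray(drift) + noise * rng.normal(size=(n_trials, n_steps, 2))
    paths = np.cumsum(steps, axis=1)                   # accumulated evidence over time
    crossed = (paths >= bound).any(axis=2)             # either accumulator at the bound
    finished = crossed.any(axis=1)
    t_end = np.where(finished, crossed.argmax(axis=1), n_steps - 1)
    final = paths[np.arange(n_trials), t_end]          # both accumulators at termination
    correct = final[:, 0] > final[:, 1]                # winner (or leader at the deadline)
    delta_e = np.abs(final[:, 0] - final[:, 1])        # balance of evidence
    return correct, delta_e, t_end + 1


correct, delta_e, rt_steps = simulate_race()
# Hypothetical noisy readout of delta_e by a separate system
Y = delta_e + np.random.default_rng(1).normal(scale=2.0, size=delta_e.size)
print(f"accuracy {correct.mean():.2f}")
print(f"mean delta_e: correct {delta_e[correct].mean():.2f}, error {delta_e[~correct].mean():.2f}")
```

In such a race, Δe is on average larger for correct than for error trials, because on error trials the losing accumulator has the higher drift and tends to be closer to the bound when the race ends.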

How these early signatures of confidence contribute to post-decisional metacognitive signals and eventual confidence reports remains an open question that might be more adequately addressed with specifically tailored experimental designs (for example, by explicitly interrogating the transfer of information between networks associated with decisional and post-decisional confidence; e.g., Fleming et al., 2018). We now discuss this issue in the seventh paragraph of the Discussion.

The functional connectivity analysis might shed some light on this issue, but the authors should take the VMPFC (rather than RLPFC, as they do now) as a seed. Then, the analysis should reveal two sets of regions: those that drive the VMPFC signal (regions encoding the decision variable?) and those that are driven by the VMPFC signal (RLPFC, as the authors speculate?).

We thank the reviewer for this suggestion. We now report results from a separate PPI analysis where we examined whole-brain functional connectivity using the VMPFC as seed (Materials and methods, subsection “Psychophysiological interaction analysis”). In particular, we sought to identify regions that might increase their connectivity (i.e., show stronger signal correlation) with the VMPFC seed during the decision phase of the trial (defined as the time interval between stimulus presentation and subjects’ behavioural expression of choice), relative to baseline. Based on existing literature showing negative BOLD correlations with confidence ratings in regions recruited post-decisionally (e.g., during explicit metacognitive report), such as the anterior prefrontal cortex (Fleming et al., 2012; Hilgenstock et al., 2014; Morales et al., 2018), we expected that increased functional connectivity of such regions with the VMPFC would be reflected in stronger negative correlation in our PPI.
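
The interaction regressor at the heart of a PPI can be sketched as the product of a centred psychological regressor (here a hypothetical decision-phase indicator) and the standardised seed time course. This is the simple BOLD-level product; some implementations (SPM's PPI routine, for example) instead deconvolve the seed to an estimated neural signal before forming the product, and in practice the psychological regressor is convolved with the HRF. The seed and target time courses are simulated, with negative coupling built into the target during the decision phase, and the function name is hypothetical.

```python
import numpy as np


def ppi_design(seed_ts, psych):
    """Columns: centred psychological, standardised seed, and their product (the PPI term)."""
    phys = (seed_ts - seed_ts.mean()) / seed_ts.std()
    psy = psych - (psych.max() + psych.min()) / 2.0   # centre, e.g. 0/1 -> -0.5/+0.5
    return np.column_stack([psy, phys, psy * phys])


rng = np.random.default_rng(0)
n_vols = 400
psych = (np.arange(n_vols) % 20 < 5).astype(float)   # hypothetical decision-phase indicator
seed_ts = rng.normal(size=n_vols)                     # hypothetical VMPFC time course
X = ppi_design(seed_ts, psych)
target = -0.5 * X[:, 2] + rng.normal(size=n_vols)     # simulated negative coupling
design = np.column_stack([np.ones(n_vols), X])
betas = np.linalg.lstsq(design, target, rcond=None)[0]
print(f"PPI (interaction) beta: {betas[3]:.2f}")
```

A negative interaction estimate corresponds to the stronger negative correlation during the decision phase that the authors predicted.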

With respect to potential functional connectivity with regions involved in perceptual decision making, we hypothesised that fMRI activity in regions encoding the decision variable would correlate negatively with confidence, in line with the idea that easier (and thus more confident) decisions are characterised by faster evidence accumulation to threshold (Shadlen and Newsome, 2001) and weaker fMRI signal in reaction time tasks (Ho et al., 2009; Kayser et al., 2010; Liu and Pleskac, 2011; Filimon et al., 2013; Pisauro et al., 2017). Accordingly, we expected that if such regions increased their functional connectivity with the VMPFC during the decision, this would also manifest as stronger negative correlation in the PPI analysis.
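
The link between faster accumulation and weaker BOLD signal can be illustrated under a simple, commonly used assumption: accumulator activity is a unit-amplitude boxcar lasting until the decision (approximately the RT), and the BOLD response is that boxcar convolved with a canonical HRF. The SPM-style double-gamma HRF and the two durations are assumptions; the sketch only shows that, for durations in the sub-second to one-second range, the predicted peak grows with duration.

```python
import math

import numpy as np


def double_gamma_hrf(dt=0.1, length=32.0):
    """SPM-style double-gamma HRF (an assumed shape)."""
    t = np.arange(0.0, length, dt)

    def gamma_pdf(x, shape):
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def predicted_peak(duration, dt=0.1):
    """Peak predicted BOLD for a unit boxcar lasting `duration` seconds."""
    box = np.ones(max(1, int(round(duration / dt))))
    return np.convolve(box, double_gamma_hrf(dt)).max() * dt


for rt in (0.7, 1.2):  # hypothetical fast (confident) vs slow (less confident) decisions
    print(f"duration {rt:.1f} s -> peak {predicted_peak(rt):.3f}")
```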

We found increased negative correlations with the VMPFC signal in the orbitofrontal cortex (OFC), left anterior PFC (aPFC), and right dorsolateral PFC (dlPFC), shown in the updated Figure 6. Regions of the aPFC and dlPFC, in particular, have previously been linked to perceptual decision making (Noppeney et al., 2010; Liu and Pleskac, 2011; Filimon et al., 2013), as well as post-decisional confidence-related processes (Fleming et al., 2012; Hilgenstock et al., 2014; Morales et al., 2018) and metacognition (Fleming et al., 2010; Rounis et al., 2010; McCurdy et al., 2013). We now report these results in the subsection “Psychophysiological interaction (PPI) analysis”, and discuss their potential involvement in decisional confidence in the seventh paragraph of the Discussion.

2) A second limitation is that the functional role of the neural confidence signals is not assessed. Many influential papers (theoretical and experimental) in this field have begun to uncover the roles of confidence in controlling behaviour and learning. While the task design is not tailored to addressing this issue, the authors could test for confidence-dependent, short-term changes in choice behavior or longer-term learning effects. Again, this would raise the significance of the findings reported.

We thank the reviewer for this comment, and as suggested, we have conducted a series of additional analyses to address this matter. We discuss these below.

We first tested for potential influences of confidence signals on short-term decision-related behaviour. Two recent studies have shown that confidence, as captured by behavioural (Braun et al., 2018) or physiological (Urai et al., 2017) correlates, can play a role in modulating history-dependent choice biases. We thus asked whether the neural confidence signals derived from our EEG discrimination analysis might show a similar influence on subjects’ choices.

Specifically, we tested whether trial-to-trial fluctuations in the confidence discriminant component amplitudes (YCONF) were predictive of the probability to repeat a choice on the immediately subsequent trial (PREPEAT). To this end, we divided YCONF into 3 equal bins (Low, Medium, and High) and compared the associated PREPEAT across subjects. While we found no overall significant links between YCONF and subsequent choice behaviour when considering the entire data set, we did observe a positive relationship between YCONF and PREPEAT if stimulus motion on the immediately subsequent trial was in the same direction as in the current trial (one-way repeated measures ANOVA, F(2,46)=5.89, p=.005, with post-hoc tests showing a significant difference in the probability to repeat a choice after Low vs. High YCONF trials, p=.015, Bonferroni corrected; Figure 2F). Note that for this analysis, we first equalised the number of correct and error trials within each YCONF bin. This ensured that any observed modulation of PREPEAT by YCONF was independent of the correlation of Y with accuracy on the current trial(s). Specifically, for each subject, we removed either exclusively correct or error trials (depending on which of the two was in excess) via random selection from 500 permutations of the trial set. Results reported here are based on the average Y values obtained with this procedure.
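
The binning and trial-equalisation steps can be sketched as follows for a single subject. This is one reading of the procedure: "3 equal bins" is taken as equal-count tertiles, and P(repeat) is averaged over the random subsamplings (the text says results are based on average Y values, so the authors' averaging step may differ). The function and input names are hypothetical, and restricting the trials to those followed by a same-direction stimulus is assumed to happen before the call.

```python
import numpy as np


def p_repeat_by_bin(y_conf, correct, repeat, n_bins=3, n_perm=500, seed=0):
    """Mean P(repeat) per equal-count YCONF bin after equalising correct and error trials.

    y_conf  : discriminant amplitude on trial t
    correct : bool array, choice on trial t was correct
    repeat  : bool array, choice on trial t + 1 matched the choice on trial t
    A bin with no correct or no error trials yields NaN (with a RuntimeWarning).
    """
    rng = np.random.default_rng(seed)
    ranks = np.argsort(np.argsort(y_conf))
    bins = ranks * n_bins // len(y_conf)               # 0 = Low ... n_bins - 1 = High
    totals = np.zeros(n_bins)
    for _ in range(n_perm):                            # random subsamplings
        for b in range(n_bins):
            idx_c = np.flatnonzero((bins == b) & correct)
            idx_e = np.flatnonzero((bins == b) & ~correct)
            k = min(idx_c.size, idx_e.size)            # drop the excess class at random
            kept = np.concatenate([rng.choice(idx_c, k, replace=False),
                                   rng.choice(idx_e, k, replace=False)])
            totals[b] += repeat[kept].mean()
    return totals / n_perm


# Simulated single subject (hypothetical): repetition more likely after high YCONF
rng = np.random.default_rng(2)
n = 600
y_conf = rng.normal(size=n)
correct = rng.random(n) < 0.75
repeat = rng.random(n) < 1.0 / (1.0 + np.exp(-0.8 * y_conf))
p = p_repeat_by_bin(y_conf, correct, repeat, n_perm=100)
print(np.round(p, 3))
```

Per-subject outputs of this kind could then enter a repeated-measures ANOVA across subjects.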

We found that stronger confidence signals were associated with an increased tendency to repeat the previous choice. However, there was no modulatory effect of YCONF on choice repetition/alternation behaviour when the direction of motion on the current trial differed from that of the previous trial.

Thus, choices were only affected by previous confidence when no change in motion direction had occurred from one trial to the next. Interestingly, this suggests that subjects might be able to detect consistency with the previous stimulus without necessarily having full conscious access to the motion direction of the current stimulus, which in turn impacts the modulatory effect of previous choice confidence on subjects’ tendency to repeat their choice. We now report this analysis in the subsection “Confidence-dependent influences on behaviour”.

In a separate set of analyses, we asked whether confidence in a choice might influence subjects’ decision times on subsequent trials. Error monitoring research indicates that individuals tend to respond more slowly after having committed an error (Dutilh et al., 2012), an effect known as “post-error slowing” and thought to indicate an increase in caution. We tested whether low confidence (as captured by both subjective ratings and EEG-derived neural signatures of confidence) might have a similar impact on response time slowing; however, we found no evidence for such an effect. One reason could be that response time slowing occurs when one is more confident about having made an error than a correct response (i.e., when the estimated probability of being correct is below chance), whereas our behavioural results suggest this was likely a rare occurrence (the lowest confidence ratings were on average associated with chance or above-chance performance on the perceptual choice). We note that these results may differ under stronger speed emphasis (the response time limit in the current paradigm was 1.35 s, with a mean response time across subjects of 994 ± 35 ms). Should the reviewer deem it necessary, we would be happy to report these analyses in the revised manuscript.

With respect to the potential role of confidence on learning, we wish to emphasise that (as the reviewer has also pointed out) the design of our behavioural paradigm was not optimised for addressing this question. In fact, we designed our experiment to specifically minimise perceptual learning effects, to avoid potential confounds with confidence (e.g., lower confidence and higher confidence trials clustering towards the beginning and towards the end of the experiment respectively, which in turn could have resulted in trivial EEG discrimination performance of low-vs.-high confidence trials due to overall signal changes in the course of the experiment – e.g., impedance changes, signal adaptation, etc.). In particular, subjects underwent task training prior to participating in the simultaneous EEG/fMRI experiment, which partly served to allow subjects’ performance to reach a plateau. Though perceptual learning can also be assessed using paradigms that maintain performance constant through online adjustments of the stimulus difficulty, we opted against such an approach to avoid potential confounding effects of stimulus difficulty on confidence.

Thus, as expected, little or no improvement can be observed in subjects’ behavioural performance over the course of the task (e.g., mean difference in the proportion of correct responses between the first and second halves of the task = .03). Similarly, we found no significant increase in confidence ratings or neural confidence signals (YCONF) across trials.

Recent work suggests that confidence may act as an implicit (expected) reward signal and be used in the computation of prediction errors (i.e., the difference between expected and currently experienced reward) (Lak et al., 2017; Colizoli et al., 2018), thus guiding a reinforcement-based learning mechanism. Relatedly, confidence prediction error (the difference between expected and experienced confidence) has been hypothesised to act as a teaching signal and guide learning in the absence of feedback (Guggenmos et al., 2016). In the brain, this could potentially be implemented through a mechanism of strengthening or weakening information processing pathways that result in high and low confidence, respectively (Guggenmos and Sterzer, 2017). Though testing this hypothesis extends beyond the scope of the current study (see previous paragraph on purposely “clamping” learning effects), we might expect that fluctuations in expected vs. actual confidence signals as derived from the EEG data have a similar influence on perceptual learning. We now discuss this point in the Discussion (eighth paragraph).

3) The logic behind the PPI analysis needs to be unpacked – it is not clear if the result provides any conceptual insight. First, the authors seem to suggest that the VMPFC confidence signal drives RLPFC – then, why should the strength of this correlation scale with confidence? Should the correlation not be the same, regardless of whether confidence is high or low? Second, the functional consequences of this coupling result are unclear. This part of the authors' conclusions is purely speculative ("informing metacognitive evaluation and learning") – a meaningful link to behavior would help.

This is a valid point and we have aimed to rectify this issue in the revised manuscript by conducting a separate PPI analysis (see Materials and methods subsection “Psychophysiological interaction analysis”, and our earlier response) in which we removed the parametric modulation by confidence when searching for functional connectivity with our seed region. Specifically, we now use the VMPFC region (which showed modulation by neural confidence in our original GLM analysis) as a PPI seed, and search instead for regions across the brain where connectivity (i.e., BOLD signal correlation) increased during the decision phase of the trial, which we defined as the interval between stimulus presentation and behavioural choice.
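For readers less familiar with PPI, the regression logic can be sketched as below. This is a simplified illustration under stated assumptions: the double-gamma HRF shape, the absence of a deconvolution step, and the helper names (`double_gamma_hrf`, `ppi_design`, `fit_glm`) are hypothetical and do not reproduce FSL's implementation. A target region whose coupling with the seed changes during the decision phase loads on the interaction column; a negative interaction beta corresponds to a stronger negative correlation with the seed during that phase.

```python
import numpy as np
from math import gamma

def double_gamma_hrf(tr, length=32.0):
    """Canonical-style double-gamma HRF sampled every `tr` seconds (assumed shape)."""
    t = np.arange(0.0, length, tr)
    h = t**5 * np.exp(-t) / gamma(6) - t**15 * np.exp(-t) / gamma(16) / 6.0
    return h / h.sum()

def ppi_design(seed_ts, decision_boxcar, tr):
    """Design columns: intercept, psychological, physiological, interaction (PPI)."""
    seed = np.asarray(seed_ts, dtype=float)
    seed = (seed - seed.mean()) / seed.std()           # physiological regressor
    psych = np.convolve(np.asarray(decision_boxcar, dtype=float),
                        double_gamma_hrf(tr))[: len(seed)]
    psych = psych - psych.mean()                       # centred psychological regressor
    ppi = psych * seed                                 # interaction term
    return np.column_stack([np.ones_like(seed), psych, seed, ppi])

def fit_glm(X, y):
    """Ordinary least-squares betas for one voxel or region time series."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta
```

In a whole-brain analysis, `fit_glm` would be applied to every voxel and the interaction beta taken to the group level.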

As we note above, we found increased negative correlations with the VMPFC signal in the OFC, left aPFC, and right dlPFC. Regions of the aPFC and dlPFC have been linked to perceptual decision making (Ho et al., 2009; Noppeney et al., 2010; Liu and Pleskac, 2011; Filimon et al., 2013; Pisauro et al., 2017), as well as post-decisional confidence-related processes (Fleming et al., 2012; Hilgenstock et al., 2014; Morales et al., 2018) and metacognition (Fleming et al., 2010; Rounis et al., 2010; McCurdy et al., 2013).

Reviewer #2:

[…] The main issue is the circularity in the approach: the multivariate EEG decoder is trained to predict confidence ratings, and then the output of this decoder (i.e., Y) is said to represent an early predictor of confidence that is different from the rating.

The reviewer is correct in that the EEG-derived measures of confidence rely on a classification analysis between Low- and High-confidence trials as defined by subjects’ behavioural ratings. However, we wish to clarify that the EEG classifier is not trained to predict confidence ratings per se. Rather, we make use of subjects’ ratings only for the purpose of extracting the Low- and High-confidence trial groups and training the classifier, but subsequently rely only on the single-trial graded measures of “neural” confidence (Y) to make any subsequent inferences.

The underlying assumption is that, while ratings per se may not be entirely faithful representations of early confidence signals, they may carry sufficient explanatory power to reliably estimate a set of spatial weights representing the topographical contributions to confidence signals at the time of decision. Importantly, the classification output Y, obtained by projecting the original multichannel data through these spatial weights (i.e., the estimated neural generators), will depart from the behavioural measures of confidence in that it will contain trial-to-trial information about neural signals generated by these sources, thus potentially offering additional insight into the internal processes that underlie confidence at these early stages of the decision. We now clarify these points in the Results section (subsection “EEG-derived measure of confidence”, second paragraph).
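A minimal sketch of how a graded single-trial Y can be obtained from weights trained only on the Low- and High-confidence groups follows. As an assumption, it uses a shrinkage-regularised Fisher linear discriminant as a stand-in for the discriminator used in the study, with window-averaged multichannel EEG as input; the function names are hypothetical.

```python
import numpy as np

def fit_spatial_weights(X_low, X_high, reg=1e-3):
    """Fisher LDA weights separating Low- from High-confidence trials.

    X_low, X_high: (trials, channels) arrays of window-averaged EEG.
    """
    mu_l, mu_h = X_low.mean(axis=0), X_high.mean(axis=0)
    Sw = np.cov(X_low, rowvar=False) + np.cov(X_high, rowvar=False)
    # Shrink the pooled within-class covariance towards a scaled identity.
    Sw += reg * np.trace(Sw) / Sw.shape[0] * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, mu_h - mu_l)
    c = -w @ (mu_l + mu_h) / 2.0   # place the zero point midway between class means
    return w, c

def project(X, w, c):
    """Graded discriminant amplitude Y = w.x + c for every trial, including unrated bins."""
    return np.asarray(X, dtype=float) @ w + c
```

The weights are estimated from the extreme groups only, but `project` can be applied to all trials, which is what yields a graded measure that is not simply a copy of the rating.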

I am convinced that vmPFC activity indeed correlates with Y and not rating, but what is Y? That is the question. Without a precise specification of this construct, I do not think there is much information to get from the result. It remains open to uninteresting interpretations: for instance Y could represent the fact that at this time point the decision has been made or not, or the proximity of the motor response, since confidence correlates with response time.

My suggestion is to build a generative model of confidence rating, in which Y would be a factor among others. What needs to be explained is how Y is generated (possibly something like evidence plus neural noise) and then how it is transformed into choice, response time and confidence rating. If Y could be estimated independently of rating, then the circularity would be broken and the dissociation between neural representations of Y and rating would be meaningful. Computational modeling may be helpful here, perhaps a race model as that used in De Martino et al., 2013.

We thank the reviewer for this thoughtful comment, which led us to the addition of a computational modelling component and inclusion of additional control analyses. We recognise the importance of providing a more concrete interpretation of the neural mechanisms that generate the observed confidence signals and have addressed this question more thoroughly as detailed below.

Firstly, with regard to the possibility that our EEG-derived measures of confidence (Y) might merely represent the termination of a decision or the proximity of the motor response, we note that Y was only weakly correlated with subjects’ response times (subject-averaged R=-.15; we now report this in Results, p. 11). In addition, Y values were extracted at least 100 ms (on average 271 ± 162 ms) prior to subjects’ mean response times to minimise potential confounds with activity related to motor execution (due to the increase in corticospinal excitability during this period; Chen et al., 1998).

To control for potentially confounding effects of motor response in our fMRI analysis, we included a regressor which is parametrically modulated by subjects’ response times on the perceptual task. We reasoned that this regressor would absorb any variance related to motor planning and execution processes.

Finally, in our previous EEG work on decision confidence (Gherman and Philiastides, 2015) we used a delayed-response behavioural paradigm in which subjects were unaware of the mapping between choice and response effector whilst they made their perceptual decision. In that study we could still observe the same neural signature of confidence (i.e., consistent in terms of both timing and scalp topography). On the basis of the points above, we argue that neural measures Y are unlikely to be merely explained by motor-related processes.

Most importantly, in order to tackle the question of how confidence signals might emerge, we have used a computational modelling approach, as the reviewer suggested above. Specifically, we fitted our behavioural data with a variant of the race model of decision making (Vickers, 1979; Vickers and Packer, 1982; De Martino et al., 2013) (Materials and methods, subsection “Modelling decision confidence”, first paragraph).

This class of models describes the decision process as a stochastic accumulation of perceptual evidence over time by independent signals representing the possible choices. The decision terminates when one of the accumulators reaches a fixed threshold, with choice being determined by the winning accumulator. Importantly, confidence for binary choices can be estimated in these models as the absolute distance (Δe) between the states of the two accumulators at the time of decision (i.e., “balance of evidence” hypothesis).

In our model, the state of the accumulator is represented by two variables, L and R, which collect evidence in favour of the left and right choices, respectively (Figure 3A). At each time step of the accumulation, the two variables are updated separately with an evidence sample s(t) drawn randomly from a normal distribution with mean μ and standard deviation σ, $s(t) \sim \mathcal{N}(\mu, \sigma)$, such that:

$$L(t+1) = L(t) + s_L(t)$$

$$R(t+1) = R(t) + s_R(t)$$

Here, we assumed that evidence samples for the two possible choices are drawn from distributions with identical variances but distinct means, whereby the mean of the distribution is dependent on the identity of the presented stimulus. For instance, a leftward motion stimulus will be associated with a larger distribution mean (and thus on average a faster rate of evidence accumulation) in the left (stimulus-congruent) than right (stimulus-incongruent) accumulator. We defined the mean of the distribution associated with the stimulus-congruent accumulator as $\mu_{congr} = 0.1$ (arbitrary units), and that of the stimulus-incongruent accumulator as $\mu_{incongr} = \mu_{congr}/r$, where r is a free parameter in the model. For each trial, evidence accumulation for the two accumulator variables begins at 0 and progresses towards a fixed decision threshold θ. Finally, response time is defined as the time taken to reach the decision threshold plus a non-decision time (nDT), which accounts for early visual encoding and motor preparation processes.
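The race model described above can be simulated in a few lines. The sketch below is illustrative only: σ, the time step `dt` that converts accumulation steps into seconds, the cap on the number of steps, and all parameter values in the test are assumptions rather than fitted values. For simplicity the two accumulators are labelled by their congruence with the stimulus rather than as L and R; the choice is set by the accumulator that is ahead when θ is first crossed, and confidence is the balance of evidence Δe at that moment.

```python
import numpy as np

def simulate_race(n_trials, r, theta, ndt, sigma=0.3, mu_congr=0.1,
                  dt=0.01, max_steps=10_000, seed=0):
    """Simulate a two-accumulator race; returns accuracy flags, RTs (s) and Δe."""
    rng = np.random.default_rng(seed)
    mu_incongr = mu_congr / r  # r > 1 makes the stimulus-incongruent accumulator slower
    correct = np.zeros(n_trials, dtype=bool)
    rt = np.zeros(n_trials)
    delta_e = np.zeros(n_trials)
    for i in range(n_trials):
        congr = incongr = 0.0  # both accumulators start at 0 on every trial
        steps = 0
        while max(congr, incongr) < theta and steps < max_steps:
            congr += rng.normal(mu_congr, sigma)      # sample for the congruent accumulator
            incongr += rng.normal(mu_incongr, sigma)  # independent sample for the other one
            steps += 1
        correct[i] = congr > incongr       # the accumulator ahead at termination wins
        rt[i] = steps * dt + ndt           # decision time plus non-decision time
        delta_e[i] = abs(congr - incongr)  # balance-of-evidence confidence
    return correct, rt, delta_e
```

With settings like those in the test, the simulated Δe is on average larger on correct than on error trials, which is the property used in the correct-versus-error comparison of Y and Δe.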

We illustrate model fits in Figure 3C (with individual subject fits shown in Figure 3—figure supplements 1 and 2). Response time distributions for correct and error trials are summarised separately using 5 quantile estimates of the associated cumulative distribution functions (Forstmann et al., 2008). Overall, we found that this model provided a good fit to the behavioural data (accuracy: R=.76, p<.001, Figure 3B; RT: subject-averaged R=.965, all p≤.0016).
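The quantile summary of the RT distributions can be computed as below. The quantile levels (.1, .3, .5, .7, .9) are an assumption, as the text states only that five quantiles were used, and the fitting objective that compares observed and simulated quantiles is not shown; the function name is hypothetical.

```python
import numpy as np

QUANTILES = (0.1, 0.3, 0.5, 0.7, 0.9)  # assumed quantile levels

def rt_quantile_summary(rt, correct, q=QUANTILES):
    """RT quantiles for correct and error trials, plus the proportion correct."""
    rt = np.asarray(rt, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    return {
        "correct": np.quantile(rt[correct], q),
        "error": np.quantile(rt[~correct], q),
        "p_correct": float(correct.mean()),
    }
```

Applying the same function to observed and simulated trials gives directly comparable summaries for fitting.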

Crucially, we proceeded to inspect the relationship between our neural measures of confidence (EEG-derived discriminant component Y) and the confidence estimates predicted by the decision model (Δe) at the subject group level. Specifically, for each subject, we extracted the mean difference in confidence (as reflected by Y and Δe, respectively) between correct and error trials. We then tested the extent to which these quantities correlated across subjects. This relative measure, which captured the relationship between confidence and choice accuracy, also ensured that any potential between-subject differences in the overall magnitude of the discriminant component Y (e.g., due to across-subject variability in overall EEG power) were subtracted out. We found a significant positive correlation (i.e., subjects who showed stronger difference in Y between correct and error trials also showed a higher difference in Δe, R=.48, p=.019, see Figure 3D), opening the possibility that neural confidence signals might arise directly from a process similar to the race-like dynamic implemented by the current model.
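The across-subject test amounts to computing one correct-minus-error difference per subject for each measure and correlating the two vectors. In this hypothetical sketch, Y differences come from observed trials and Δe differences from trials simulated with each subject's fitted model; the tuple layout and function names are assumptions, and a p-value could be obtained, for example, with `scipy.stats.pearsonr`.

```python
import numpy as np

def correct_minus_error(values, correct):
    """Mean of `values` on correct trials minus the mean on error trials."""
    values = np.asarray(values, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    return values[correct].mean() - values[~correct].mean()

def across_subject_r(subjects):
    """Pearson r between per-subject differences in Y and in model Δe.

    `subjects` is a list of (y, y_correct, delta_e, sim_correct) tuples: observed
    discriminant amplitudes with their accuracy, and simulated Δe with the
    simulated accuracy from that subject's fitted model (hypothetical layout).
    """
    d_y = np.array([correct_minus_error(y, yc) for y, yc, _, _ in subjects])
    d_e = np.array([correct_minus_error(de, dc) for _, _, de, dc in subjects])
    return np.corrcoef(d_y, d_e)[0, 1]
```

Because each subject contributes a within-subject difference, overall scaling of Y (e.g., from EEG power) cancels before the correlation is computed.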

Two possible interpretations of Y may be proposed based on our modelling results. Following the “balance of evidence” hypothesis (Vickers, 1979), the observed early EEG-derived measures of confidence may reflect the difference in the evidence accumulated towards the two choices at the time of decision, consistent with the idea that confidence emerges from the process of decision formation itself (Kiani and Shadlen, 2009; Gherman and Philiastides, 2015). An alternative interpretation is that Y represents a (potentially noisy) readout of this difference by a system distinct from the one supporting the perceptual choice itself.

How these early signatures of confidence contribute to post-decisional metacognitive signals and eventual confidence reports remains an open question that might be more adequately addressed with specifically tailored experimental designs (for example, by explicitly interrogating the transfer of information between networks associated with decisional and post-decisional confidence). We discuss these interpretations in the seventh paragraph of the Discussion.

Another concern relates to the PPI analysis. I cannot make sense of the result that vmPFC activity reflects the interaction between confidence rating and time series in the rostrolateral PFC. If this region already signals confidence level, then the interaction regressor is something like confidence squared. Does this really tell us anything about the passage of information from vmPFC to rlPFC?

We thank the reviewer for their comment. To address this point, we have conducted a new PPI analysis (see Materials and methods subsection “Psychophysiological interaction analysis”) where we use the VMPFC region as a seed (as kindly suggested by two of the reviewers), and have removed the parametric modulation by confidence from the psychological regressor. Specifically, we searched for potential regions across the brain which may increase their connectivity with confidence-encoding VMPFC during the decision phase of the trial (defined as the interval between stimulus presentation and behavioural choice) relative to baseline, such as those involved in the formation of the decision and/or metacognition.

Based on existing literature showing negative BOLD correlations with confidence ratings in regions recruited post-decisionally (e.g., during explicit metacognitive report), such as the anterior prefrontal cortex (Fleming et al., 2012; Hilgenstock et al., 2014; Morales et al., 2018), we expected that increased functional connectivity of such regions with the VMPFC would be reflected in stronger negative correlation in our PPI.

Similarly, we hypothesised that fMRI activity in regions encoding the perceptual decision would also correlate negatively with confidence / VMPFC activation, in line with the idea that easier (and thus more confident) decisions are characterised by faster evidence accumulation to threshold (Shadlen and Newsome, 2001) and weaker fMRI signal in reaction time tasks (Ho et al., 2009; Kayser et al., 2010; Liu and Pleskac, 2011; Filimon et al., 2013; Pisauro et al., 2017). Accordingly, we expected that if such regions increased their functional connectivity with the VMPFC during the decision, this would manifest as stronger negative correlation in the PPI analysis.

We found increased negative correlations with the VMPFC signal in the orbitofrontal cortex (OFC), left anterior PFC (aPFC), and right dorsolateral PFC (dlPFC), shown in the updated Figure 6. Regions in the aPFC and dlPFC have been linked to perceptual decision making (Noppeney et al., 2010; Liu and Pleskac, 2011; Filimon et al., 2013), as well as post-decisional confidence-related processes (Fleming et al., 2012; Hilgenstock et al., 2014; Morales et al., 2018) and metacognition (Fleming et al., 2010; Rounis et al., 2010; McCurdy et al., 2013).

Reviewer #3:

[…] The findings here are interesting and add to and extend the literature on confidence signals in the brain. However, I have a number of points that still need to be clarified.

1) The use of a speeded perceptual task means that the stimulus presentation duration is shorter for high confidence than for low confidence trials. I was wondering what effect this contamination has on the EEG classifier, and thus in turn on the BOLD signals observed in the third main (i.e. EEG-based) analysis. Could it be that the classifier is partly picking up the effect of the longer stimulus duration?

We thank the reviewer for pointing out this potential confound. We have now performed additional analyses to address it more directly.

We first investigated the correlation between our EEG-derived measures of confidence (Y) and the duration of the visual stimulus (which, as the reviewer points out, is equal to subjects’ response time). We reasoned that if stimulus presentation time had an influence on the classification analysis and consequently on the estimation of Y, we might expect these two measures to be highly correlated. However, we found only a weak correlation between Y and stimulus duration (subject-averaged R=-.15), suggesting that classification results could not have been solely driven by this factor. We report this additional analysis in the revised manuscript (Results, subsection “EEG-derived measure of confidence”, seventh paragraph). Please also note that in a previous study from our lab (Gherman and Philiastides, 2015) in which we used EEG alone to temporally characterise decision confidence, the duration of stimulus presentation was fixed at 0.1 s (this was followed by a forced delay of 2-2.5 s prior to response, during which subjects were not aware of the mapping between stimulus and motor response). Importantly, we observed a similar temporal profile and scalp topography in a Low-vs.-High confidence discrimination.

With regards to the impact on the fMRI data, we first wish to clarify that the three sets of fMRI results we report in relation to the neural correlates of confidence were obtained using a single GLM model, which included regressors for each of our variables of interest (namely, confidence reports at the time of decision and rating, respectively, and EEG-derived confidence measures), as well as additional nuisance regressors. Crucially, we included a regressor parametrically modulated by stimulus duration (i.e., response time) which served to regress out potential variance shared with the EEG-derived regressor (note that parameter estimates obtained with standard GLM analysis in FSL reflect variability that is unique to each regressor, thus ignoring common variability, Mumford et al., 2015). We have amended the text to make the above points explicit (Results subsection “fMRI correlates of confidence”).
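The point that each GLM beta reflects only variance unique to its regressor can be illustrated with simulated data. In this sketch (all values hypothetical, and the helper `ols_betas` is illustrative), the simulated BOLD signal depends only on response time, while the EEG-derived regressor shares variance with response time; once both are in the model, the EEG beta falls to approximately zero.

```python
import numpy as np

def ols_betas(X, y):
    """OLS slopes (intercept dropped) for design columns X and signal y."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(0)
n = 2000
rt = rng.standard_normal(n)                          # hypothetical RT regressor
y_conf = 0.6 * rt + 0.8 * rng.standard_normal(n)     # EEG regressor sharing variance with RT
bold = 2.0 * rt + 0.5 * rng.standard_normal(n)       # simulated signal driven by RT only

# Alone, the EEG regressor absorbs the shared variance (about 1.2 in expectation).
print(ols_betas(y_conf[:, None], bold))
# Jointly, the RT beta is near 2 and the EEG beta near 0.
print(ols_betas(np.column_stack([rt, y_conf]), bold))
```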

2) I would be more upfront with approach used for defining and dissociating the different temporal stages (decision making, rating). I couldn't work this out until I reached the Materials and methods section, but it is vital to understanding the design.

We have added explicit definitions of the “decision” and “rating” phases of the trial in the Results (subsection “fMRI correlates of behavioural confidence reports”).

3) I didn't understand the logic of the autocorrelation analysis that was used to control for attention. Please explain.

The aim of the autocorrelation analysis was to test for potential sustained fluctuations of attention that span multiple trials and might therefore be reflected in serial dependencies of either behaviour (e.g., choice) (De Martino et al., 2013) or neural signals. In particular, we were interested in ruling out the possibility that such attentional fluctuations might be the driving factor behind the variability in our EEG-derived confidence measures (Y). We expected that if that were the case, Y values on a given trial would be reliably predicted by those observed in the immediately preceding trials. However, the regression model we used to test for this possibility explained only a small fraction of the variance in Y (subject-averaged R² = .03; Results subsection “EEG-derived measure of confidence”, eighth paragraph). We have amended the text to clarify the autocorrelation analyses (Results, subsection “Behaviour”, last paragraph and subsection “EEG-derived measure of confidence”, eighth paragraph).

4) The summary of time series as "delay" and "peak" is too dense (Figure 2A). It would be better to show individual time courses to confirm that the data can be appropriately summarized by a delay and peak.

We have updated our figure to contain the time course of the confidence discrimination performance (Az) for individual subjects (Figure 2A). Note that we only considered peaks occurring at least 250 ms after stimulus onset (to avoid early visual processes) and at least 100 ms (on average 271 ± 162 ms) prior to subjects’ average response times, to minimise potential confounds with activity related to motor execution (due to a sudden increase in corticospinal excitability in this period; Chen et al., 1998) (Materials and methods).
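The peak-selection constraints can be expressed as a masked argmax over the discrimination time course. In this hypothetical sketch, `times_ms` are window centres relative to stimulus onset and `az` the corresponding discrimination performance; the function name and default arguments are illustrative.

```python
import numpy as np

def select_peak_window(times_ms, az, mean_rt_ms, min_after_onset=250.0, min_before_rt=100.0):
    """Return (time, Az) of the best window within the allowed interval."""
    times_ms = np.asarray(times_ms, dtype=float)
    az = np.asarray(az, dtype=float)
    # Exclude early visual responses and windows too close to the motor response.
    allowed = (times_ms >= min_after_onset) & (times_ms <= mean_rt_ms - min_before_rt)
    if not allowed.any():
        raise ValueError("no window satisfies the timing constraints")
    best = np.flatnonzero(allowed)[np.argmax(az[allowed])]
    return times_ms[best], az[best]
```

A higher Az occurring too early or too close to the response is ignored in favour of the best window inside the permitted interval.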

5) How can it be ensured that the EEG-derived measures are independent of difficulty, accuracy and attention? For this it would be necessary to assess the relationship between the EEG-measure and these behavioral properties explicitly (ideally to plot them as well).

To control for confounding effects of difficulty on the neural measures of confidence, we maintained the motion coherence (i.e., difficulty) of the visual stimuli constant across the entire experiment, and for each subject. In addition, each stimulus in the first half of the experiment was presented again (in an identical form) in the second half of the experiment. This enabled us to compare both behavioural and neural responses to identical stimuli and further assess whether subjects might have been sensitive to subtle differences in low-level physical properties of the stimulus that go beyond motion coherence (e.g., the motion dynamics of individual dots). Importantly, we found no correlation between the EEG-derived confidence measures (i.e., confidence-discriminating component amplitudes, Y) associated with the two sets of identical stimuli (subject-averaged R=.02) (Results subsection “EEG-derived measure of confidence”, last paragraph). We believe these observations represent strong evidence that the EEG-derived measures of confidence are independent of objective difficulty.

We have also performed a control analysis to verify whether our EEG-derived confidence measures are independent of accuracy. Namely, Figure 2D illustrates that these neural signatures continue to show significant modulation by (reported) confidence when accuracy is constant (i.e., when only correct trials are considered).

Finally, we tested for potential attentional effects on EEG-derived confidence measures as follows. Firstly, we investigated the influence of occipitoparietal prestimulus α, a neural signal thought to correlate with attention and predict visual discrimination (Thut et al., 2006; van Dijk et al., 2008), on the EEG-derived confidence measures. We found that Y measures associated with High vs. Low prestimulus alpha power did not differ significantly (Results subsection “EEG-derived measure of confidence”, last paragraph; Figure 2E). Nevertheless, we also included a parametric regressor modulated by prestimulus alpha power in our fMRI GLM model in order to absorb potential variability associated with this signal.

Secondly, we focused on potential effects of sustained fluctuations in subjects’ attention (i.e., across trials). Specifically, we looked for correlations in the EEG-derived measures between neighbouring trials. We found that a serial autocorrelation analysis predicting component amplitudes Y based on the immediately preceding 5 trials provided limited explanatory power (subject-averaged R² = .03) (Results, see the aforementioned paragraph). Overall, these observations suggest that our results are unlikely to be purely explained by attentional factors.
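The serial autocorrelation control can be sketched as an ordinary least-squares regression of Y on its own five preceding values. The function name and the in-sample R² definition are assumptions; the study's exact regression specification may differ.

```python
import numpy as np

def lagged_r2(y, n_lags=5):
    """In-sample R² of predicting y[t] from y[t-1], ..., y[t-n_lags] (plus intercept)."""
    y = np.asarray(y, dtype=float)
    target = y[n_lags:]
    # Column k holds y[t-k] aligned with target y[t].
    lags = np.column_stack([y[n_lags - k: len(y) - k] for k in range(1, n_lags + 1)])
    X = np.column_stack([np.ones(len(target)), lags])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return 1.0 - resid.var() / target.var()
```

Serially independent Y values give an R² close to zero, whereas a slowly drifting (e.g., attention-driven) signal gives a large R².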

Other comments:

In order to accord with requirements for reporting statistics the paper here should add a statement that "No explicit power analysis was conducted for determining sample size".

We have updated the text of the manuscript accordingly (subsection “Participants”).

Reviewer #4:

[…] Overall this study offers new insights on the origins of confidence during perceptual decisions, by showing that the vmPFC also encodes an early confidence readout for this type of choices. I am a fan of the authors' methodological strategy to study human decision-making. However, for this study, my points of criticism are mainly related to the set of statistical analyses that the authors stand on to make their conclusions, which I consider should be revised. I provide some suggestions that may help to strengthen the authors' conclusions.

1) An important concern is the statistical fMRI modelling approach. The delay between stimulus response and confidence rating is extremely short if one wants to incorporate the same parametric regressors at both the decision and confidence rating stage. I appreciate that there is a jitter of 1.5-4 s; however, given this short average duration (~2.6 s, roughly corresponding to a bit more than 1 TR), I suspect that the confidence-rating parametric modulators at the rating and decision stages will be highly correlated. Even if FSL allows one to run such a model, highly correlated regressors (after convolution with the HRF) can have dramatic effects on the beta weights due to variance inflation (see for instance, Mumford et al., 2015).

We wish to clarify that the rating regressor at the time of decision (RatingsDEC) is locked to the onset of the random dot stimulus (i.e., rather than the behavioural response to the stimulus). Thus, the actual delay between the onsets of the decision-locked (RatingsDEC) and rating-locked (RatingsRAT) regressors is on average 3.84 (SD=.02) seconds. The jitter of 1.5-4 s that the reviewer is referring to spans only the time interval between the end of the response time window and onset of the rating prompt.

Relatedly, we note that in our experimental design, the timing of the inter-stimulus jitters was optimised using a genetic algorithm (Wager and Nichols, 2003) which served to increase estimation efficiency (we now report this in Materials and methods subsection “Main task”, second paragraph).

As per the reviewer’s suggestion we have now calculated the correlation between these two confidence rating regressors and show that they are only weakly correlated (mean R=-.13) (see Figure 5—figure supplement 3, top panel). To be fully transparent, we show the correlations for individual subjects and runs separately (Figure 5—figure supplement 3).

To more directly address this concern, we conducted two additional GLM analyses whereby only the regressors pertaining to one phase of the trial were included at a time (i.e. either the decision, or the rating, respectively). We found that activations for RatingsDEC, RatingsRAT, as well as YCONF remained qualitatively and quantitatively virtually identical to those of the original design that included both regressors (see Figure 5—figure supplement 4).

On a related issue, the authors write: "we also included a parametric regressor modulated by subjects' reaction time on the direction discrimination task (duration = 0.1 s, locked to the time of behavioural response)". First, did the authors also include the main effect regressor? This is not reported. If it wasn't included, then the model is wrongly specified. You cannot include a parametric regressor without including the main effect regressor. In any case, if this main effect regressor is indeed included, then once more, I suspect that this main effect regressor will be highly correlated with the main effect regressor that is included on trial onset. The mean response times are less than 1 s. If you convolve two stick functions that are less than 1 s apart, the resulting convolved regressors will be highly correlated.

Our model included a main effect regressor locked to the onset of the stimulus, which served to account for all parametric regressors in the decision phase of the trial, including the RT-modulated regressor (considering the short time span between RT and stimulus onset and the slow nature of the HRF).

We aimed to address the reviewer’s concern by running a separate GLM analysis which formally assessed whether including an unmodulated regressor at the time of RT would alter our results. We found that while this new regressor was indeed correlated with the unmodulated regressor at the time of stimulus onset (VSTIMDEC; R=.73), as the reviewer speculates, activations for the YCONF regressor remained unchanged (see Author response image 1).

Positive parametric modulation of the BOLD signal by EEG-derived measures of confidence (during the decision phase of the trial), resulting from a GLM analysis whereby we included an additional unmodulated regressor locked to the time of response on the perceptual decision. Correlations with the EEG-derived confidence regressor have remained largely identical to those observed with the original GLM analysis (see Figure 4 for comparison). Results are reported at |Z|≥2.57, and cluster-corrected using a resampling procedure (minimum cluster size 162 voxels).
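The reviewer's point about closely spaced stick functions can be checked with a small simulation. The sketch assumes a canonical-style double-gamma HRF, a 0.1 s time grid and uniformly jittered trial onsets 6-10 s apart; these values and the function names are hypothetical and are not the study's design, so the correlations it returns illustrate only how regressor overlap grows as the lag between events shrinks.

```python
import numpy as np
from math import gamma

def double_gamma_hrf(dt, length=32.0):
    """Canonical-style double-gamma HRF on a grid of step `dt` seconds (assumed shape)."""
    t = np.arange(0.0, length, dt)
    h = t**5 * np.exp(-t) / gamma(6) - t**15 * np.exp(-t) / gamma(16) / 6.0
    return h / h.max()

def convolved_sticks(onsets_s, total_s, dt=0.1):
    """Unit stick functions at the given onsets, convolved with the HRF."""
    n = int(round(total_s / dt))
    sticks = np.zeros(n)
    np.add.at(sticks, np.round(np.asarray(onsets_s) / dt).astype(int), 1.0)
    return np.convolve(sticks, double_gamma_hrf(dt))[:n]

def lagged_regressor_correlation(lag_s, n_trials=80, iti_s=(6.0, 10.0), dt=0.1, seed=0):
    """Correlation between convolved stick trains at trial onset and onset + lag."""
    rng = np.random.default_rng(seed)
    onsets = np.cumsum(rng.uniform(iti_s[0], iti_s[1], size=n_trials))
    total = onsets[-1] + lag_s + 40.0   # leave room for the last HRF to decay
    first = convolved_sticks(onsets, total, dt)
    second = convolved_sticks(onsets + lag_s, total, dt)
    return np.corrcoef(first, second)[0, 1]

for lag in (0.99, 3.84):
    print(f"lag {lag:.2f} s: r = {lagged_regressor_correlation(lag):.2f}")
```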

I would like to see (and I think this should be formally included as supplementary information in the manuscript), for the current design, an example of the design and design_cov figures produced by FSL for two or three subjects (or, more generally, for the authors to report the correlations not only between the original parametric regressors, but also between all the regressors, including both main effects and parametric regressors, after convolution with the HRF).

We thank the reviewer for this suggestion. We have now computed variance inflation factors (VIFs) for all regressors in our model and found a mean VIF of 3.57 (±1.83), with multicollinearity typically being considered high if VIF > 5-10. We included these results in the manuscript (Materials and methods subsection “GLM analysis”, last paragraph). We also illustrate the correlations between the identical confidence-related parametric regressors locked to the decision vs. rating stages of the trial (please see Figure 5—figure supplement 3), separately for each subject and experimental run (see response to earlier comment above).
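A variance inflation factor can be computed directly from a design matrix: each regressor is regressed on all the others plus an intercept, and VIF_j = 1/(1 − R_j²). The sketch below assumes a matrix holding the HRF-convolved regressors of one run, one column per regressor; the function name and the synthetic columns are illustrative and not taken from the authors' pipeline.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of a design matrix X (n_scans x n_regressors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from an OLS regression of column j
    on all remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ beta
        # With an intercept the residuals have zero mean, so this ratio is SS_res / SS_tot.
        r_squared = 1.0 - residuals.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


# Synthetic example: column c is built to overlap strongly with column a.
rng = np.random.default_rng(1)
a = rng.standard_normal(500)
b = rng.standard_normal(500)
c = 0.9 * a + 0.45 * rng.standard_normal(500)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(np.round(vifs, 2), "mean VIF =", round(vifs.mean(), 2))
```

Here the independent column b should sit near 1, while a and c, which share most of their variance, land in the range usually flagged as problematic.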

Regarding the first point, I understand that the authors wanted to split explanatory variance between the decision and rating stages, but what I recommend, and find more appropriate given my above-mentioned concern (if the authors insist on investigating the separation of confidence between the decision and the rating stages), is to run two separate GLMs, one with the parametric regressor on the decision stage and one with the parametric regressor on the confidence rating stage, and to investigate whether the main conclusions of the authors still hold.

We have conducted the two suggested analyses using separate GLMs (please refer to our earlier comment). Our results remain quantitatively and qualitatively nearly identical (Figure 5—figure supplement 4).

Then, in a second level analysis the authors could investigate at which time point (decision or rating) there is a stronger relationship between the confidence ratings and BOLD responses.

We wish to clarify that our goal here was not necessarily to assess whether BOLD signals show stronger correlation with confidence ratings at one stage or the other. In fact, our results are particularly intriguing in that distinct neural networks appear to carry information about confidence during these two stages of the trial. In particular, activations during the decision phase of the trial, such as those in the VMPFC or anterior cingulate cortex, appear consistent with a more automatic encoding of confidence, i.e., in the absence of explicit confidence report (Lebreton et al., 2015; Bang and Fleming, 2018). In line with this, we also observed activations in regions associated with the human reward/valuation system, such as the striatum and orbitofrontal cortex. In contrast, regions showing correlation with confidence during the confidence rating stage, in particular the anterior prefrontal cortex, have been previously associated with explicit metacognitive judgment/report (Fleming et al., 2012; Morales et al., 2018), perhaps serving a role in higher-order monitoring and confidence communication. We now address these points in the Discussion (third paragraph).

2) In the results presented in Figure 2B, if I understood correctly, the authors report the y(t) out-of-sample values of the "unseen" data for the middle confidence level. I am not convinced that the results reported by the authors are strong evidence for their decoder's ability to generate sensible out-of-sample y(t) values as this could simply reflect regression to the mean (if the authors use the middle confidence level for reading out y(t) by using a dichotomous decoder). I do not think that this result is especially revealing nor necessary to conduct the subsequent fMRI analysis (see my next point).

Please note that the results we report in relation to the Low- vs. High-confidence discrimination analysis (i.e., classifier performance) were based on a cross-validation procedure to ensure there was no overfitting. Specifically, classifier performance (Az) for each subject was computed on the basis of Y values obtained from a leave-one-out procedure, whereby confidence-discriminating spatial filters (w) estimated using N-1 trials at a time were applied to the remaining trial(s) to obtain out-of-sample Y values. For clarity, we now describe this procedure in more detail in the revised paper (Materials and methods subsection “Single-trial EEG analysis”, third paragraph).

To address the reviewer’s concern, we compared these out-of-sample Ys with the values of Y obtained from the original Low- vs. High-confidence discrimination, and found that they were highly correlated (mean R value across subjects: .93, now reported in the aforementioned paragraph). Further, we repeated the analyses presented in Figure 2B, and found that on average, Y values for Medium-confidence trials continued to be situated between, and significantly different from, those in the Low-confidence (t(23)=-4.37, p<.001) and High-confidence (t(23)=-5.04, p<.001) trials; see new Figure 2—figure supplement 2.
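The cross-validation logic described above can be sketched as follows. As an assumption for illustration, a shrinkage-regularised Fisher linear discriminant stands in for the authors' discriminant analysis, and `X` is a hypothetical trials × channels matrix of EEG amplitudes from a single time window, labelled 0 (Low confidence) or 1 (High confidence). Each trial's Y comes from a spatial filter w estimated without that trial, Az is the area under the ROC curve of those out-of-sample values, and the in-sample vs. out-of-sample correlation mirrors the check reported above. Setting `n_folds=10` gives the n-fold variant suggested by the reviewer in point 3.

```python
import numpy as np


def fit_lda(X, labels, shrinkage=1e-3):
    """Fisher discriminant w = (S_w + lambda*I)^-1 (mu_high - mu_low), with a bias
    that places the midpoint between the class means at Y = 0."""
    X0, X1 = X[labels == 0], X[labels == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    s_w = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    s_w += shrinkage * np.trace(s_w) / X.shape[1] * np.eye(X.shape[1])
    w = np.linalg.solve(s_w, mu1 - mu0)
    bias = -w @ (mu0 + mu1) / 2.0
    return w, bias


def out_of_sample_y(X, labels, n_folds=None, seed=0):
    """Cross-validated discriminant amplitudes Y; n_folds=None is leave-one-out."""
    n = len(labels)
    n_folds = n if n_folds is None else n_folds
    order = np.random.default_rng(seed).permutation(n)
    y = np.empty(n)
    for test in np.array_split(order, n_folds):
        train = np.setdiff1d(order, test)
        w, bias = fit_lda(X[train], labels[train])
        y[test] = X[test] @ w + bias    # filter never saw the held-out trials
    return y


def az(y, labels):
    """Area under the ROC curve (Mann-Whitney form; ties count one half)."""
    diff = y[labels == 1][:, None] - y[labels == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


# Synthetic example: 120 trials x 16 channels; High confidence shifts 4 channels.
rng = np.random.default_rng(2)
labels = np.repeat([0, 1], 60)                 # 0 = Low, 1 = High confidence
X = rng.standard_normal((120, 16))
X[labels == 1, :4] += 0.8
y_loo = out_of_sample_y(X, labels)             # leave-one-out
y_10 = out_of_sample_y(X, labels, n_folds=10)  # 10-fold
w, bias = fit_lda(X, labels)                   # in-sample fit on all trials
r = np.corrcoef(X @ w + bias, y_loo)[0, 1]
print(f"Az (LOO) = {az(y_loo, labels):.2f}, Az (10-fold) = {az(y_10, labels):.2f}, r = {r:.2f}")
```

Folds are drawn from a random permutation so that each fold contains both classes even when trials are sorted by label.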

3) Regarding the use of the decoded value y(t) as a parametric regressor for the fMRI analysis, if I understood correctly, the authors use out-of-sample values of y(t) only for the middle confidence level (see my concern in the point above), whereas this was not the case for the low and high confidence levels. I think that a more appropriate analysis would be to obtain values of y(t) fully out of sample. The authors can split the data into n folds (for instance 10), use n-1 folds to train the decoder, obtain the yCONF regressor by decoding the remaining fold, and then use these yCONF values as regressors for the fMRI data (see, for instance, van Bergen et al., 2015 Nature Neuroscience for a similar n-fold cross-validation approach). Given the nature of the decoder used by the authors (dichotomous predictions), it should be enough for the authors to split the data into high and low confidence levels to train the decoder (and it is therefore not necessary to use three or more levels).

We have now extracted out-of-sample values for the High- and Low-confidence levels resulting from a leave-one-trial-out procedure (please see previous point) and found that these values were highly correlated with the original Ys (mean R across subjects = .93), which we take as evidence for the generalisability of our decoder. Repeating the main GLM analysis using these values yielded nearly identical results (Figure 5—figure supplement 2).

Note that the use of 3 confidence bins (i.e., performing the EEG classification analysis using the extreme ends of the confidence rating scale) was done in an effort to increase sensitivity of the classification analysis and obtain more reliable discrimination weights (that is, minimise overlap between internal representations of High vs. Low confidence caused by potential within-subject inconsistency in confidence ratings), which would in turn improve the quality of our EEG-informed fMRI results. For this reason, we opted to perform the above analysis using the original trial split. Finally, please note that due to the limited number of trials per subject in the current experiment (≤320), we avoided estimating neural signals on subsets of the data in order to preserve the reliability of our results.

4) Subsection “EEG-derived measure of confidence”, last paragraph: Maybe I missed the point, but it is not entirely clear to me why the discriminant component amplitudes are expected not to differ between correct and incorrect answers. One of the well-established statistical signatures of confidence is that confidence is markedly different for correct and incorrect responses (e.g. see Sanders, Hangya and Kepecs, 2016; Urai, Braun and Donner, 2017). Why didn't the authors expect a separation (see my next point for a related concern), and if not, what does the discriminant component amplitude really reflect? More discussion on this point in general would be great.

This is an important consideration and we thank the reviewer for pointing it out. The goal here was to ensure that variability in discriminant component amplitudes was not driven solely by fluctuations in decision accuracy. For example, one could argue that the confidence discrimination patterns we observe in the EEG might be purely explained by an unbalanced proportion of correct and error responses in the confidence trial bins used for discrimination (e.g., more correct trials in the High-confidence bin than the Low-confidence bin). We wanted to demonstrate that even when accuracy is constant (i.e., when correct and error trials are considered separately) we continue to see significant effects of confidence in the discriminant amplitude Y (Figure 2D). However, we agree that the lack of an effect of accuracy on Y calls for additional investigation. We ran a correlation analysis to assess this relationship and found a small but significant positive correlation between accuracy and Y (t-test on regression coefficients: t(23)=8, p<.001). The original analysis might have been less sensitive to this effect due to the binning of trials by both confidence and accuracy, which meant that mean Y estimates per bin were made from only a few trials (≤10) in some cases. We report this new analysis in the revised paper (Results subsection “EEG-derived measure of confidence”, sixth paragraph) and have also added a new panel which illustrates this relationship (Figure 2C).
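The correlation analysis described above follows a two-level (summary-statistics) logic, sketched below under the assumption that accuracy is coded 0/1 and that Y is available for every trial: fit a per-subject regression of Y on accuracy, then test the slopes against zero across subjects with a one-sample t-test (df = number of subjects − 1, i.e. t(23) for 24 subjects). The data generated here are synthetic and the function names are illustrative; `scipy.stats.ttest_1samp` would return the same t statistic together with a p-value.

```python
import numpy as np


def per_subject_slopes(y_by_subject, acc_by_subject):
    """First level: OLS slope of Y on accuracy (0 = error, 1 = correct) per subject."""
    slopes = []
    for y, acc in zip(y_by_subject, acc_by_subject):
        design = np.column_stack([np.ones(len(acc)), np.asarray(acc, dtype=float)])
        beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
        slopes.append(beta[1])
    return np.array(slopes)


def one_sample_t(values):
    """Second level: t statistic for mean(values) != 0, with df = n - 1."""
    values = np.asarray(values, dtype=float)
    n = values.size
    return values.mean() / (values.std(ddof=1) / np.sqrt(n)), n - 1


# Synthetic data for 24 hypothetical subjects, 300 trials each.
rng = np.random.default_rng(3)
ys, accs = [], []
for _ in range(24):
    acc = (rng.random(300) < 0.8).astype(int)
    ys.append(0.3 * acc + rng.standard_normal(300))
    accs.append(acc)
t, df = one_sample_t(per_subject_slopes(ys, accs))
print(f"t({df}) = {t:.2f}")
```

With a 0/1 accuracy regressor, each first-level slope is simply that subject's mean Y on correct trials minus their mean Y on error trials.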

More broadly, the point we wish to make is that while our EEG-derived neural measures of confidence might correlate partly with performance (i.e., choice accuracy), as would be expected from existing work and as pointed out by the reviewer, they can, more importantly, be decoupled from it. Indeed, much recent work supports the idea of a dissociation between performance and confidence/metacognition (Lau and Passingham, 2006; Rounis et al., 2010; Komura et al., 2013; Lak et al., 2014; Fleming and Daw, 2017).

5) Subsection “Stimuli and task”, last paragraph: On a related point (and perhaps an important caveat of this study), I do not understand why the authors excluded trials or explicitly asked the participants to "abstain from making a confidence response on a given trial if they became aware of having made an incorrect response". Again, one of the well-established signatures of confidence is that confidence is markedly different for correct and incorrect responses. How would the results have been affected without such an explicit instruction to the participants? In my opinion, this confidence information should not be excluded or spuriously biased via instructions to the participants. Therefore, I am afraid that what the authors are capturing with their actual confidence ratings (and therefore the decoded values y(t)) is a biased response that is not formally confidence per se. I urge the authors to report this potential caveat and make a clear case (from the beginning) for why this strategy was adopted in the first place.

We apologise for the ambiguity in the description of the task instructions. To clarify, subjects were instructed to refrain from making a confidence rating only if a motor mapping error had been made, for example a premature response that was accidentally initiated in favour of one motion direction despite clear perceptual representation of the opposite direction (this was reported by some subjects following practice sessions, and in previous experiments). Our motivation for wanting to exclude these trials from our analyses was that we were interested in the representations of confidence associated with the perceptual judgment/choice per se, as opposed to the physical action that accompanied it.

As there was no strong emphasis on speed in our paradigm (subjects were allowed up to 1.3s to make a response) we anticipated that this would be a relatively rare occurrence. Indeed, the number of trials in which no confidence rating was recorded following a perceptual choice was very small, on average 6.13 (SD=5.4) trials per subject, representing only 1.9% (SD=1.7%) of the total number of trials, thus suggesting that this instruction could not have had a substantial impact on our results.

We have amended the revised paper to clarify the description of the task instructions (see Materials and methods subsection “Main task”).

[Editors' note: the author responses to the re-review follow.]

Major comments:

1) One issue is that the model makes the same predictions for every trial, as task difficulty (motion coherence) is held constant. So the authors are bound to between-subject correlations, which they found indeed between modeled sensory evidence at the bound (Δe) and EEG-derived predictor of confidence rating (Y). My understanding of the current interpretation is that Y represents evidence plus some neural noise, and then that confidence rating (R) is Y plus something else (perhaps noise again) which could be loosely defined as 'meta-cognitive reappraisal'. This is not very informative, and not even directly tested.

A more straightforward test would be a mediation analysis, assessing whether Y could indeed mediate the link from Δe to R. The alternative hypothesis to be discarded is that Δe is actually closer to R, which would take us back to the fundamental question of how Y can be specified in cognitive terms.

Yet a more informative use of the race model would be to fit trial-by-trial variations in R. This means allowing free parameters to vary across trials, and to test their potential relationship with Y. It could be for instance that Y fluctuations arise from variations in the starting point, or, within a Bayesian framework, in the prior on motion direction. Having said this, my agenda is not to bury the paper under requests for additional work. A clarified relationship between Δe, Y and R may be a reasonable limit to what can be inferred from the dataset.

We thank the reviewer for their constructive feedback. We have now carried out the suggested mediation analysis and report these results in the paper (subsection “Exploratory mediation analysis”). We have also made changes to the Discussion where we attempt to clarify the relationship between Δe, Y, and Ratings, whilst acknowledging current limitations in interpretation and highlighting some remaining open questions (Discussion, fifth to eleventh paragraphs).

Please note that as our current computational model does not provide trial-to-trial correspondence between model data (Δe) and observed data (Y/Ratings), the mediation analysis was performed at the subject group level (Materials and methods subsection “Exploratory mediation analysis”). As in our previous analysis linking Y and Δe (Figure 3D), we first computed the mean difference between correct and error trials for each of the three variables of interest, to produce measures that are comparable across subjects (i.e., by removing individual differences in the trial-averaged scores that may be due to task-irrelevant factors, e.g., rating biases). These quantities (henceforth referred to as ΔeDIFF, YDIFF, and RatingsDIFF) were then subjected to a mediation analysis testing the hypothesis that Y mediates the link between Δe and ratings. Specifically, we defined a three-variable path model (Wager et al., 2008) with ΔeDIFF as the predictor variable, RatingsDIFF as the dependent variable, and YDIFF as the mediator. Consistent with the initial prediction, we found that: 1) ΔeDIFF was a significant predictor of YDIFF (p=.01), 2) YDIFF reliably predicted RatingsDIFF after accounting for the effect of the predictor ΔeDIFF (p<.001), and 3) the indirect effect of YDIFF, defined as the product of the coefficients for effects 1) and 2), was also significant (p=.004) (Results subsection “Exploratory mediation analysis”).
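The three-variable path model described above can be written as two ordinary least-squares regressions: path a from the predictor (ΔeDIFF) to the mediator (YDIFF), and path b from the mediator to the outcome (RatingsDIFF) while controlling for the predictor; the indirect effect is the product a·b. The sketch below assumes one value per subject for each variable and uses a percentile bootstrap over subjects to obtain a confidence interval for a·b. That bootstrap is a common choice assumed here for illustration; the text above does not state how its p-values were computed, and the data are synthetic.

```python
import numpy as np


def _ols(predictors, outcome):
    """OLS coefficients [intercept, predictors...]."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta


def mediation_paths(x, m, y):
    """Paths of a three-variable model X -> M -> Y.

    a: effect of X on M; b: effect of M on Y controlling for X;
    c_prime: direct effect of X on Y controlling for M; indirect = a * b.
    """
    a = _ols([x], m)[1]
    _, c_prime, b = _ols([x, m], y)
    return a, b, c_prime, a * b


def bootstrap_indirect_ci(x, m, y, n_boot=5000, seed=0):
    """Percentile 95% CI for a*b, resampling subjects with replacement."""
    rng = np.random.default_rng(seed)
    n = len(x)
    ab = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, n)
        ab[k] = mediation_paths(x[idx], m[idx], y[idx])[3]
    return np.percentile(ab, [2.5, 97.5])


# Hypothetical per-subject difference scores for 24 subjects
# (x ~ dE_DIFF, m ~ Y_DIFF, y ~ Ratings_DIFF), generated with a true indirect path.
rng = np.random.default_rng(4)
x = rng.standard_normal(24)
m = 0.6 * x + 0.5 * rng.standard_normal(24)
y = 0.8 * m + 0.5 * rng.standard_normal(24)
a, b, c_prime, indirect = mediation_paths(x, m, y)
lo, hi = bootstrap_indirect_ci(x, m, y)
print(f"a={a:.2f} b={b:.2f} c'={c_prime:.2f} a*b={indirect:.2f} CI=[{lo:.2f}, {hi:.2f}]")
```

An interval for a·b that excludes zero is the bootstrap analogue of a significant indirect effect; with only 24 subjects such intervals are wide, which is one reason across-subject mediation results call for caution.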

Indeed, the mediator effect of the EEG-derived confidence is in agreement with the idea that Y represents a (potentially noisy) readout of decision-related balance of evidence (as modelled by Δe) by the vmPFC, which in turn serves as a basis for subjective confidence ratings. Please note, however, that given the across-subject nature of the analysis, these findings must be interpreted with some caution.
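To make the quantity Δe more concrete, the sketch below simulates a generic two-accumulator race to a common bound and records, on every trial, the balance of evidence between the winning and losing accumulators at the moment the bound is crossed. The drift rates, noise level and bound are hypothetical, and this is not the authors' fitted model. Because all trials share the same parameters (as when motion coherence is held constant), the model yields a distribution of Δe values that can be split by accuracy rather than a prediction for any individual trial, which is why the comparisons above rely on correct-minus-error differences across subjects.

```python
import numpy as np


def simulate_race(n_trials, drift_correct=0.12, drift_error=0.06, noise=1.0,
                  bound=10.0, max_steps=2000, seed=0):
    """Generic two-accumulator race with a common bound (hypothetical parameters).

    Returns choice (1 if the accumulator for the correct direction won),
    decision time in steps, and delta_e = winner - loser when the bound is hit.
    Trials that never reach the bound within max_steps go to the larger accumulator.
    """
    rng = np.random.default_rng(seed)
    drifts = np.array([drift_error, drift_correct])
    choice = np.zeros(n_trials, dtype=int)
    decision_time = np.zeros(n_trials, dtype=int)
    delta_e = np.zeros(n_trials)
    for i in range(n_trials):
        x = np.zeros(2)
        for step in range(1, max_steps + 1):
            x += drifts + noise * rng.standard_normal(2)   # noisy evidence increments
            if x.max() >= bound:
                break
        winner = int(np.argmax(x))
        choice[i] = winner
        decision_time[i] = step
        delta_e[i] = x[winner] - x[1 - winner]             # balance of evidence
    return choice, decision_time, delta_e


choice, decision_time, delta_e = simulate_race(1000, seed=1)
print(f"accuracy = {choice.mean():.2f}")
print(f"mean delta_e: correct = {delta_e[choice == 1].mean():.2f}, "
      f"error = {delta_e[choice == 0].mean():.2f}")
```

In this toy version Δe should on average be larger for correct than for error trials, which is the sign of the correct-minus-error quantity entering the analyses above.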

In conjunction with the observations presented in the manuscript, we interpret Y as largely relying on, though potentially distinct from, the quantity represented by Δe, in line with the idea of a dissociation between the information that supports the decision vs. confidence. Importantly, the timing of Y (i.e., the fact that it is recorded prior to commitment to a motor response) suggests that these early neural estimates of confidence arise in close temporal proximity to the decision, in line with an automatic readout of confidence (i.e., in the absence of explicit report) (Lebreton et al., 2015) and the proposal that vmPFC might encode an early and automatic “feeling of rightness” (Moscovitch and Winocur, 2002; Hebscher and Gilboa, 2016) in memory judgments. While dedicated research will be necessary to establish the functional role of this quantity, early/fast pre-response confidence signals could be necessary to regulate the link between decision and impending action, e.g. with low confidence signalling the need for additional evidence (Desender et al., 2018). We now make these points more explicit in the Discussion (eighth paragraph).

Regarding the link between Y and the neural activity leading to explicit confidence ratings, it has long been proposed that metacognitive evaluation relies on additional post-decisional processing (Pleskac and Busemeyer, 2010; Moran et al., 2015; Yu et al., 2015). For instance, recent evidence suggests that choice itself (and corresponding motor-related activity) impacts confidence (Fleming et al., 2015; Gajdos et al., 2018) and may help calibrate/optimise metacognitive reports (Siedlecka et al., 2016; Fleming and Daw, 2017). In this framework, Y could serve as one of multiple inputs to networks supporting retrospective metacognitive processes such as the anterior prefrontal regions (Fleming et al., 2012), which would explain both the correlation with, and dissociation from, subjects’ confidence reports (Discussion, tenth paragraph).

2) A related issue comes with the new analysis provided to substantiate a role for confidence in behavioral control. The authors found that higher confidence predicts repetition of the same choice in the next trial, if motion direction is the same as in the current trial. This is not in line with the computational model in its present form. It could mean that confidence influences the prior on motion direction, but this would obviously not be adaptive. There is therefore a need to reconcile this finding with the generative model of choice and confidence.

Our analyses investigating the influence of neural confidence signals (Y) on subsequent behaviour indicated that high Y amplitude increased the likelihood of repeating a choice only if stimulus direction was consistent with that of the previous trial. Crucially, we did not see this repetition bias when the subsequent trial showed a stimulus moving in the opposite direction. This dependence of choice repetition on stimulus identity is not straightforward to interpret/model (indeed, we are not aware of similar observations in the literature), as it suggests the existence of a potentially separate process which detects consistency between the previous and current stimulus, and which interacts with the representation of previous confidence to influence decision/behaviour (e.g., through selective re-weighting of evidence).

As such, our results cannot be explained by a direct effect of confidence on the motion prior/starting point, as this would predict a generic repetition bias impacting subsequent choices equally, regardless of whether the next presented stimulus motion is in the same or opposite direction. Indeed, we confirmed this hypothesis with a new analysis using a modified version of the model whereby starting point on a given trial was biased in the direction of the previous choice, in proportion to the magnitude of Δe associated with that choice.
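The argument above can be checked qualitatively with a toy simulation. In the sketch below, each trial is a two-accumulator race, and the accumulator for the previous choice starts ahead by `bias_gain` times the previous trial's Δe; motion directions are drawn at random, and repetition rates are computed separately for trials whose direction matches or opposes the previous one. The race parameters and `bias_gain` are hypothetical, and this is not the authors' modified model. A starting-point bias of this kind favours repeating the previous choice whatever the new stimulus is, so it should raise repetition rates for both same- and opposite-direction trials, rather than only for same-direction trials as observed in the data.

```python
import numpy as np


def race_trial(rng, drifts, start, bound=10.0, noise=1.0, max_steps=2000):
    """One two-accumulator race from `start`; returns (winner, delta_e)."""
    x = np.array(start, dtype=float)
    for _ in range(max_steps):
        x += drifts + noise * rng.standard_normal(2)
        if x.max() >= bound:
            break
    winner = int(np.argmax(x))
    return winner, x[winner] - x[1 - winner]


def repetition_rates(bias_gain, n_trials=3000, strong=0.12, weak=0.06, seed=0):
    """P(repeat previous choice) on same- vs opposite-direction trials when the
    previously chosen accumulator starts bias_gain * previous delta_e ahead."""
    rng = np.random.default_rng(seed)
    directions = rng.integers(0, 2, n_trials)   # random motion direction per trial
    repeats = {True: [], False: []}              # key: same direction as previous?
    prev_choice, prev_dir, prev_de = None, None, 0.0
    for d in directions:
        drifts = np.full(2, weak)
        drifts[d] = strong                       # evidence favours the true direction
        start = np.zeros(2)
        if prev_choice is not None:
            start[prev_choice] = bias_gain * prev_de
        choice, de = race_trial(rng, drifts, start)
        if prev_choice is not None:
            repeats[bool(d == prev_dir)].append(choice == prev_choice)
        prev_choice, prev_dir, prev_de = choice, d, de
    return np.mean(repeats[True]), np.mean(repeats[False])


same_0, opp_0 = repetition_rates(bias_gain=0.0)
same_b, opp_b = repetition_rates(bias_gain=0.5)
print(f"no bias:   P(repeat | same) = {same_0:.2f}, P(repeat | opposite) = {opp_0:.2f}")
print(f"with bias: P(repeat | same) = {same_b:.2f}, P(repeat | opposite) = {opp_b:.2f}")
```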

If indeed stimulus congruency interacts with the representation of previous confidence to influence decision/behaviour via a separate process, one might not necessarily be able to capture the choice repetition bias observed here only by modelling the decision process itself. Therefore, for the purposes of this paper, we designed our model to describe the decision process without accounting for potential trial-to-trial dependencies arising from the interaction with confidence, to offer an initial mechanistic interpretation of Y and provide a plausible link to the eventual confidence ratings. We now acknowledge this point in the Discussion (sixth paragraph) and highlight the need to reconcile our findings with formal models of decision making.

Another potential interpretation of this finding is that this repetition bias serves as a means of balancing speed/accuracy demands in rapid decision tasks, even if this might inevitably lead to sub-optimal (local) behaviour. Since our task was not specifically designed to manipulate such variables (e.g. transitions in stimulus identity, speed/accuracy, etc.), we cannot, at this stage, offer unequivocal support for this idea, though we have started to think carefully about possible extensions of this work and are beginning to plan future experiments accordingly.

We hope our results will serve as a starting point for future work that will be tailored specifically towards sequential trial dependencies between neural confidence and stimulus/decision dynamics.

3) Y is not different for correct/incorrect responses: The authors should add a figure plotting Y split by correct and incorrect responses (perhaps next to the new panel 2C) indicating the quantitative difference. Also, please add an explicit explanation of this result (in the Results/Discussion section). This is essential, as I believe it reveals quite a lot about what type of information Y carries; namely, it is not the classical statistical signature of confidence (see the work of Sanders and Urai) but something different, in line with the arguments that the authors give at the end of their response to this point, i.e. not the probability that the choice is correct, but something else. The authors try to describe this "dissociation" to some extent in other parts of the text, but should be more explicit about this point.

We have now added a new panel (Figure 2D) showing the trial-averaged Y for correct and incorrect responses. We quantified this difference and found that across subjects, Y was consistently higher for correct than error responses (t(23)=7.58, p<.001, Results subsection “EEG-derived measure of confidence”, sixth paragraph), in line with the assumption that these early confidence signals are at least partially informed by the same information leading to the decision. That said, the observation that effects of confidence in the discriminant amplitude Y remain significant when accuracy is kept constant (i.e., when correct and error trials are considered separately) could indicate that, as the reviewer suggests, a dissociation of the confidence-related signals from statistical confidence exists. Y may be more in line with what has been referred to as “subjective confidence” (Odegaard et al., 2018), i.e., a quantity that can largely track statistical confidence (Sanders et al., 2016) but be prone to additional discrepancy or bias. Additional work could help directly address this question, for example by explicitly manipulating perceived confidence independently of task performance (Odegaard et al., 2018). The nature of discrepancies between statistical and subjective confidence has been addressed in behavioural and modelling studies (e.g., Zylberberg et al., 2012; Samaha et al., 2018); however, it is less well understood at the neural level and lies beyond the main scope of this paper. We have adapted the Discussion to make the above points more explicit (sixth paragraph).

4) Please add the results of the neural correlates of Y at the rating stage (this can be in the supplement), and briefly comment on this in the Results section. Reviewers understood that the authors did not want to focus on this stage, but it is quite a nice result that information about Y transitions from vmPFC at the decision stage to motor-related areas at the confidence rating stage.

We have now included these results in the manuscript, along with the brief interpretation as suggested by the reviewer (Results subsection “fMRI correlates of EEG-derived confidence signals”, last paragraph).

References:

Dutilh G, Vandekerckhove J, Forstmann BU, Keuleers E, Brysbaert M, Wagenmakers EJ (2012) Testing theories of post-error slowing. Attention, perception & psychophysics 74:454-465.

Moscovitch M, Winocur G (2002) The frontal cortex and working with memory. Principles of frontal lobe function:188-209.

Odegaard B, Grimaldi P, Cho SH, Peters MAK, Lau H, Basso MA (2018) Superior colliculus neuronal ensemble activity signals optimal rather than subjective confidence. Proceedings of the National Academy of Sciences of the United States of America 115:E1588-E1597.

Samaha J, Switzky M, Postle BR (2018) Confidence boosts serial dependence in orientation estimation. bioRxiv.

Sanders JI, Hangya B, Kepecs A (2016) Signatures of a Statistical Computation in the Human Sense of Confidence. Neuron 90:499-506.

Zylberberg A, Barttfeld P, Sigman M (2012) The construction of confidence in a perceptual decision. Frontiers in integrative neuroscience 6:79.

https://doi.org/10.7554/eLife.38293.027

Article and author information

Author details

  1. Sabina Gherman

    Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9918-3692
  2. Marios G. Philiastides

    Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
    Contribution
    Conceptualization, Resources, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    marios.philiastides@glasgow.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7683-3506

Funding

Economic and Social Research Council (ES/L012995/1)

  • Marios Philiastides

British Academy (SG121587)

  • Marios Philiastides

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by the Economic and Social Research Council (ESRC; grant ES/L012995/1 to MGP) and the British Academy (BA; grant SG121587 to MGP).

Ethics

Human subjects: The study was approved by the College of Science and Engineering Ethics Committee at the University of Glasgow (CSE01355) and informed consent, and consent to publish, was obtained from all participants.

Senior Editor

  1. Joshua I Gold, University of Pennsylvania, United States

Reviewing Editor

  1. Tobias H Donner, University Medical Center Hamburg-Eppendorf, Germany

Reviewer

  1. Tobias H Donner, University Medical Center Hamburg-Eppendorf, Germany

Publication history

  1. Received: May 29, 2018
  2. Accepted: September 20, 2018
  3. Accepted Manuscript published: September 24, 2018 (version 1)
  4. Version of Record published: October 23, 2018 (version 2)
  5. Version of Record updated: November 9, 2018 (version 3)

Copyright

© 2018, Gherman et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
