Distinct neural contributions to metacognition for detecting, but not discriminating visual stimuli
Abstract
Being confident in whether a stimulus is present or absent (a detection judgment) is qualitatively distinct from being confident in the identity of that stimulus (a discrimination judgment). In particular, in detection, evidence can only be available for the presence, not the absence, of a target object. This asymmetry suggests that higher-order cognitive and neural processes may be required for confidence in detection, and more specifically, in judgments about absence. In a within-subject, pre-registered and performance-matched fMRI design, we observed quadratic confidence effects in frontopolar cortex for detection but not discrimination. Furthermore, in the right temporoparietal junction, confidence effects were enhanced for judgments of target absence compared to judgments of target presence. We interpret these findings as reflecting qualitative differences between a neural basis for metacognitive evaluation of detection and discrimination, potentially in line with counterfactual or higher-order models of confidence formation in detection.
Introduction
When foraging for berries, one first needs to decide whether a certain bush bears fruit or not. Only if berries are detected can one proceed to examine and classify them into a category: are these raspberries or blackberries? The first is a detection task: a decision about whether something is there or not, and the second is a discrimination task: a decision about which item is there. For these types of decisions, it is important not only to understand the decision process that leads to deciding present or absent, or raspberries or blackberries, but also our ability to reflect on and estimate the quality of the decision, known as metacognition. For instance, two foragers working together may want to share their confidence in deciding which bush to tackle next (Bahrami et al., 2010; Frith, 2012).
There is an increasing understanding of the neural basis of confidence in simple decisions, with a network of prefrontal and parietal regions being identified as important for tracking metacognitive beliefs about the accuracy of both perceptual and value-based decisions (see Domenech and Koechlin, 2015; Meyniel et al., 2015, for reviews). Accordingly, neuropsychological data in humans suggest that damage or impairment of prefrontal function can lead to metacognitive impairments such as noisy or inappropriate confidence judgments (see Rouault et al., 2018, for a review). However, in a majority of these cases, the study of confidence has been restricted to discrimination, or deciding whether a stimulus is from category A or B. Despite the ubiquity and importance of detection judgments in decision-making, much less is known about how confidence is formed in detection settings, in which subjects are asked to make a judgment about whether a target stimulus is present or not.
Computational considerations and behavioral findings suggest that computing confidence in detection judgments may differ from computing confidence in the more commonly studied discrimination tasks. In particular, detection is unique in the landscape of perceptual tasks in that evidence can only be available to support the presence, not the absence, of a target object. This makes confidence ratings in judgments about absence a unique case, where confidence is decoupled from the amount of supporting perceptual evidence. Accordingly, behavioral evidence indicates that metacognitive sensitivity, or the alignment between subjective confidence and objective performance, for judgments about absence is typically impaired compared to metacognitive sensitivity for judgments about presence (Meuwese et al., 2014; Kanai et al., 2010).
Under one family of models (first-order models), confidence in detection judgments is formed in the same way as confidence in discrimination judgments. For example, in evidence-accumulation models, confidence can be evaluated as the distance of the losing accumulator from the threshold at the time of decision (Vickers, 1979; Merkle and Van Zandt, 2006). Similarly, in models of discrimination confidence based on Signal Detection Theory (SDT), decision confidence is assumed to be proportional to the strength of the available evidence supporting the decision, which is modeled as the distance of the perceptual sample from the decision criterion on a strength-of-evidence axis (Wickens, 2002, section 5.2). While first-order models are traditionally symmetric, they can be adapted to account for the asymmetry between judgments about presence and absence. For example, unequal-variance (uvSDT) and multi-dimensional SDT models account for the inherent difference between presence and absence by making the signal distribution wider than the noise distribution (Wickens, 2002, section 3.4), or by assuming a high-dimensional stimulus space, in which the absence of a signal is represented as a distribution centered around the origin (King and Dehaene, 2014; Wickens, 2002, section 7.2). Importantly, first-order models treat the process of metacognitive evaluation of detection and discrimination as qualitatively similar, with any differences between detection and discrimination emerging from differences in the underlying distributions (uvSDT), or the mapping between stimulus features and responses (two-dimensional SDT).
In contrast with first-order models of detection confidence, higher-order models treat confidence in judgments about target absence as emerging from a distinct, higher-order cognitive process. For instance, in one version of the higher-order approach, confidence in judgments about absence is assumed to be based on counterfactual estimation of the likelihood of a hypothetical stimulus to be detected, if presented. In other words, subjects may be more confident in the absence of a target object when they believe they would not have missed it, based on their global estimation of task difficulty, or on their current level of attention. A similar type of modeling has been successfully employed in studies of memory, to explain how participants form judgments that an item was not presented during the preceding learning phase, based on their counterfactual expectations about remembering an item (Glanzer and Adams, 1990). When applied to the comparison of detection and discrimination, this approach predicts that qualitatively distinct cognitive and neural resources will be recruited when judging confidence in detection responses, due to the additional demand on counterfactual and self-monitoring processes, and that this recruitment will be most pronounced for confidence about absence. In particular, the counterfactual account predicts that responses in the frontopolar cortex, a region which has been shown to track counterfactual world states (Boorman et al., 2009), will show specificity for confidence judgments when inferring the absence of a target.
To test for such qualitative differences, here we set out to directly compare the neural basis of metacognitive evaluation of detection and discrimination responses within two similar low-level perceptual tasks, while controlling for differences in task performance. In a pre-registered design, we asked whether parametric relationships between subjective confidence ratings and the blood-oxygenation-level-dependent (BOLD) signal in a set of predefined prefrontal and parietal regions of interest (ROIs) would show systematic interaction with task (detection/discrimination) and, within detection, type of response (present/absent). To anticipate our results, we observed a quadratic effect of confidence on regional responses in frontopolar cortex for detection, but not for discrimination judgments. In further whole-brain exploratory analyses, we found stronger confidence-related effects for judgments of absence compared to presence in right temporoparietal junction.
Results
A total of 35 participants performed two perceptual decision-making tasks while being scanned in a 3T MRI scanner: an orientation discrimination task (‘was the grating tilted clockwise or anticlockwise?’), and a detection task (‘was any grating presented at all?’; see Figure 1). The discrimination and detection tasks were performed in separate blocks each lasting 40 trials. At the end of each trial, participants rated their confidence in the accuracy of their decision on a 6-point scale. We adjusted the difficulty of the two tasks in a preceding behavioral session to achieve equal performance of around 70% accuracy. At scanning, 10 discrimination and detection blocks were presented in 5 scanner runs.
Behavioral results
Task performance was similar for detection (75% accuracy, d’=1.48) and discrimination blocks (76% accuracy, d’=1.50). Repeated measures t-tests failed to detect a difference between tasks both in mean accuracy (t(34) = −0.90, p=0.37, BF_{01}=5.15), and d’ (t(34) = −0.30, p=0.76, BF_{01}=7.29), indicating that performance was well matched. Responses were also balanced for the two tasks. The probability of responding yes (target present) in the detection task was 0.49 ± 0.11, and not significantly different from 0.5 (t(34) = −0.39, p=0.70, BF_{01}=7.07). The probability of responding clockwise in the discrimination task was 0.50 ± 0.08, and not significantly different from 0.5 (t(34) = 0.22, p=0.83, BF_{01}=7.43).
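As an illustration of the sensitivity measure reported above, the following is a minimal sketch of the standard equal-variance SDT formula, d’ = z(hit rate) − z(false-alarm rate). The text does not spell out how d’ was computed, so the formula choice, the function names and the example rates are assumptions made for illustration only.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the "z-transform" used in SDT).
_z = NormalDist().inv_cdf


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Equal-variance SDT sensitivity: z(H) - z(FA).

    Rates must lie strictly between 0 and 1; rates of exactly 0 or 1
    need a correction that is not shown here.
    """
    return _z(hit_rate) - _z(false_alarm_rate)


def criterion(hit_rate: float, false_alarm_rate: float) -> float:
    """Response bias c: zero when 'yes' and 'no' are equally favoured."""
    return -0.5 * (_z(hit_rate) + _z(false_alarm_rate))


if __name__ == "__main__":
    # Hypothetical rates, not taken from the study.
    print(round(d_prime(0.75, 0.25), 3))
    print(round(criterion(0.75, 0.25), 3))
```

In the discrimination task the same formula applies once one of the two stimulus categories (for example, clockwise) is arbitrarily treated as the "signal".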
The distribution of confidence ratings was generally similar between the two tasks and four responses. For all four responses, participants were most likely to report the highest confidence rating compared to any other option. Within detection, a significant difference in mean confidence was observed between yes (target present) and no (target absent) responses, such that participants were more confident in their yes responses (t(34) = −4.85, p<0.0001; see Figure 2). This difference in mean confidence was mostly driven by the higher proportion of maximum confidence ratings in yes responses compared to no responses (46% of all yes responses compared to 26% of all no responses, t(34)=5.63, p<0.00001), but persisted even when ignoring the highest ratings (t(34)=2.39, p<0.05).
Metacognitive sensitivity, quantified as the area under the type-II ROC curve, was significantly higher for yes compared to no responses (t(34) = 7.83, p<10^{−8}; see Figure 2), as expected (Meuwese et al., 2014). In other words, confidence ratings about the presence of a target stimulus were more diagnostic of accuracy than ratings about target absence, even though both sets of ratings tended to cover the full range of the scale, from low to high confidence. Taking metacognitive sensitivity following discrimination responses as a baseline, we found that this effect was driven by a decrease in metacognitive sensitivity for no responses (t(34) = −4.89, p<0.0001), whereas a quantitative increase in metacognitive sensitivity for yes responses compared to discrimination was not significant (t(34)=1.84, p=0.07). No difference was observed in metacognitive sensitivity between the two discrimination responses (clockwise and anticlockwise; t(34) = 0.06, p=0.95, BF_{01}=7.6). Taken together, these results are consistent with the previously reported selective asymmetry in the fidelity of metacognitive evaluation following judgments about target absence (Meuwese et al., 2014; Kanai et al., 2010).
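To make the measure concrete, the sketch below computes the area under a type-II ROC curve from confidence ratings and trial accuracy. It uses the rank-based identity that this area equals the probability that a randomly chosen correct trial carries higher confidence than a randomly chosen error trial, with ties counted as one half. This is an illustrative implementation under that assumption, not necessarily the procedure used in the study; the function name and example data are hypothetical. To obtain response-specific values, it would be applied separately to the trials of each response (for example, yes and no).

```python
def type2_auroc(confidence, correct):
    """Area under the type-II ROC curve.

    confidence: sequence of ratings (e.g. 1-6), one per trial.
    correct:    sequence of booleans, True when the decision was correct.

    Returns P(conf_correct > conf_error) + 0.5 * P(conf_correct == conf_error),
    which equals the trapezoidal area under the type-II ROC curve.
    0.5 means confidence carries no information about accuracy.
    """
    conf_correct = [c for c, ok in zip(confidence, correct) if ok]
    conf_error = [c for c, ok in zip(confidence, correct) if not ok]
    if not conf_correct or not conf_error:
        raise ValueError("need at least one correct and one incorrect trial")

    score = 0.0
    for c in conf_correct:
        for e in conf_error:
            if c > e:
                score += 1.0
            elif c == e:
                score += 0.5
    return score / (len(conf_correct) * len(conf_error))


if __name__ == "__main__":
    # Hypothetical ratings for six trials of a single response type.
    ratings = [6, 4, 2, 5, 3, 1]
    accuracy = [True, True, True, False, False, False]
    print(type2_auroc(ratings, accuracy))
```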
Response times were faster on average for correct responses (849 ± 79 milliseconds) compared to incorrect responses (938 ± 95 milliseconds; t(34)=10.59, p<10^{−11} for a paired t-test on the log-transformed response times). Within the detection task, yes responses were significantly faster than no responses (850 ± 90 milliseconds and 896 ± 103 milliseconds, respectively; t(34)=3.16, p<0.005 for a paired t-test on the log-transformed response times).
Imaging results
Parametric effect of confidence
We next turned to our fMRI data to ask whether confidence-related responses were similar or distinct across tasks (detection/discrimination) and response (target present: yes/target absent: no). We first established the presence of linear confidence-related effects in our a priori ROIs, both across tasks and response types and across correct and incorrect responses, in line with previous findings of ‘generic’ or task-invariant confidence signals in these regions (Morales et al., 2018). Specifically, higher confidence ratings were associated with increased activation in the ventromedial prefrontal cortex (vmPFC), the ventral striatum, and the precuneus. Conversely, activations in the posterior medial frontal cortex (pMFC) were negatively correlated with confidence (see Figure 3). For the confidence effect pattern obtained from the Global-Confidence Design Matrix (GCDM), see Appendix 3—figure 1.
Interaction of linear confidence effects with task and response
We next asked whether the linear parametric relationship between confidence and BOLD activity differed as a function of task (discrimination vs. detection) and response type (yes vs. no in detection). In the pMFC, vmPFC, ventral striatum and precuneus ROIs, the parametric effect of confidence failed to show a significant difference between the two tasks (all p-values>0.3), between the two discrimination responses (all p-values>0.24), or between the two detection responses (all p-values>0.09). Similarly, no cluster within the prespecified frontopolar ROI showed a differential effect of confidence as a function of task or response. We show below that this absence of a linear interaction should not be taken as evidence of absence of differences between detection and discrimination, due to the presence of nonlinear interaction effects. In the next section we first explain the analysis steps we took to uncover nonlinear effects of confidence.
Interaction of nonlinear confidence effects with task and response
An exploratory whole-brain analysis (p<0.05, corrected for multiple comparisons at the cluster level) revealed no differential confidence effect as a function of task anywhere in the brain. However, within detection, whole-brain analysis revealed that the linear effect of confidence was significantly more negative for no compared to yes responses in the right temporoparietal junction (rTPJ: 101 voxels, peak voxel: [54,46, 26], z = 5.10). To further characterize the nature of the interaction between confidence and response in the rTPJ, we fitted a new design matrix for each task (Categorical-Confidence Design Matrices (post-hoc analysis; CCDM)) where confidence was represented as a categorical variable with 6 levels instead of one parametric modulator. In contrast to our original design matrix (Main Design Matrix (DM1)) that assumed a linear effect of confidence, this analysis is agnostic as to the functional form of the confidence effect. We then plotted the mean activation level for each combination of response and confidence level in the rTPJ cluster (see Figure 3, panel c).
The categorical-confidence design matrix revealed a positive quadratic effect of confidence on activation levels in the rTPJ, with stronger activation levels for the two extremities of the confidence scale. We confirmed the presence of a significant quadratic effect of confidence in this region by fitting a second-order polynomial to the response-specific confidence curve of each participant (see Materials and methods). This analysis revealed a main quadratic effect of confidence in this region (t(34) = 5.21, p<0.00001), an effect which was stronger in detection compared to discrimination (t(34)=2.06, p<0.05, d = 0.35). Importantly, the linear interaction of confidence with detection responses remained significant for this quadratic model, establishing that this response-specific effect is not explained by an overall quadratic pattern (t(33)=2.09, p<0.05, d = 0.36; see Figure 3). More generally, these analyses make clear that linear effects of parametric modulators and their interactions are not exhaustive in their characterization of the confidence-related BOLD response, in this region and potentially in our other ROIs too.
To formally test for such nonlinear differences in the activation profile of other ROIs, we extracted the coefficients from the categorical model for each ROI, and fitted a second-order polynomial to the ensuing confidence-related response. Within our a priori ROIs, no quadratic effect of confidence was observed in the pMFC, the precuneus, the ventral striatum, or the vmPFC (Appendix 5—figure 1). In contrast, in all three anatomical subregions of the frontopolar cortex, we found a positive quadratic effect of confidence, with stronger activations for the two extremities of the confidence scale. Strikingly, in both the FPl and the FPm, this positive quadratic effect of confidence was entirely driven by the detection task (FPm: t(34)=3.04, p<0.005, d = 0.51; FPl: t(34)=3.90, p<0.001, d = 0.66; see Figure 4). Confidence ratings for the discrimination task, however, showed a quadratic effect that was not statistically different from zero (FPm: t(34)=0.54, p=0.59, d = −0.09, BF_{01}=6.61; FPl: t(34)=1.42, p=0.16, d = 0.24, BF_{01}=2.92). In the FPm, the linear effect of confidence was more negative for detection than for discrimination (t(34) = −2.11, d = −0.36, p<0.05), and within detection, more negative for confidence in judgments about absence (no responses; t(34) = 2.10, d = −0.36, p<0.05).
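The logic of summarizing a six-level confidence curve by a linear and a quadratic term can be sketched as follows. This example uses the standard orthogonal polynomial contrasts for six equally spaced levels, which is one common way to fit a second-order polynomial; it is an assumption for illustration and may differ from the exact procedure described in Materials and methods. The function names and the example coefficients are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Standard orthogonal polynomial contrasts for 6 equally spaced levels.
LINEAR = [-5, -3, -1, 1, 3, 5]
QUADRATIC = [5, -1, -4, -4, -1, 5]


def polynomial_effects(betas):
    """Project six per-level coefficients onto linear and quadratic contrasts.

    betas: activation estimates for confidence levels 1..6 (one participant,
    one response type). Returns (linear, quadratic); a positive quadratic
    value indicates a U-shaped profile with stronger activation at both
    ends of the confidence scale.
    """
    if len(betas) != 6:
        raise ValueError("expected one coefficient per confidence level (6)")
    linear = sum(w * b for w, b in zip(LINEAR, betas)) / sum(w * w for w in LINEAR)
    quadratic = sum(w * b for w, b in zip(QUADRATIC, betas)) / sum(w * w for w in QUADRATIC)
    return linear, quadratic


def one_sample_t(values):
    """t statistic for testing whether the group mean differs from zero."""
    return mean(values) / (stdev(values) / sqrt(len(values)))


if __name__ == "__main__":
    # Hypothetical U-shaped profile for a single participant.
    print(polynomial_effects([1.2, 0.4, 0.1, 0.0, 0.5, 1.1]))
```

Per-participant quadratic terms obtained this way can then be compared against zero, or between tasks, with a t-test across participants.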
Finally, to test for similar quadratic effects of confidence at the whole-brain level, we constructed a new design matrix (in a departure from our pre-registered analysis plan) in which confidence was modeled by a parametric modulator with a polynomial expansion of 2 (Quadratic-Confidence Design Matrix (post-hoc analysis; QCDM)). Three clusters showed a significantly stronger quadratic effect of confidence in detection compared to discrimination (Figure 5). These were located in the right superior temporal sulcus (72 voxels, peak voxel: [60,43,2], Z = 3.99), pre-SMA (130 voxels, peak voxel: [0,35,47], Z = 4.07), and right frontopolar cortex, overlapping with our FPl and FPm frontopolar anatomical subregions (51 voxels, peak voxel: [9,65,10], Z = 4.00). Importantly, no region showed stronger quadratic effects of confidence in discrimination compared to detection.
To visualize activity patterns in these regions, we extracted the mean coefficients from the categorical model for these three clusters, and fitted a second-order polynomial separately to each response estimate (see Figure 5). In addition to the effect of task on the quadratic effect of confidence in all three clusters, the linear effect of confidence in the right frontopolar cluster was significantly more negative for detection, compared to discrimination (t(34)=3.13, d = −0.53, p<0.005). For both tasks, inter-subject variability in metacognitive efficiency (measured as meta-d’/d’; Maniscalco and Lau, 2012) was not reliably correlated with linear or quadratic parametric effect of confidence in any of the three regions (see Appendix 7).
Computational models
We next considered alternative computational-level explanations for the detection-specific quadratic activation profile. Specifically, we evaluated how latent model variables or belief states change nonlinearly as a function of confidence in three candidate model architectures (see Figure 6): a static ‘Signal Detection’ model, a ‘Dynamic Criterion’ model where policy changes as a function of previous perceptual samples, and an ‘Attention Monitoring’ model in which beliefs about fluctuations in attention inform decisions and confidence judgments. A detailed formal description of the three models is available in the appendix (sections 9, 10 and 11).
First, we consider the static Signal Detection Theory (SDT) model. In SDT models of confidence formation, the log likelihood-ratio between the two competing hypotheses ($LLR=\log\frac{p(x \mid S_{1})}{p(x \mid S_{2})}$) is a useful measure for determining the certainty with which one should commit to a choice. The mapping between the perceptual sample $x$ and the LLR is linear for equal-variance SDT, which is often used to model discrimination, but quadratic for unequal-variance SDT, which is often used to model detection. It then follows that if confidence is proportional to the distance of the sample $x$ from the decision criterion, neuronal populations that represent the relative likelihood of a choice being correct (be it LLR or an analogous quantity) will show a quadratic tuning function of confidence in detection and a linear tuning function in discrimination, similar to that observed in FPC, pre-SMA and STS. However, LLR is also expected to scale more strongly with confidence in yes responses (see simulation results in Figure 6, upper panel), which was not observed in these brain regions. This model also predicts a stronger quadratic effect of confidence in participants for whom the variance ratio between the signal and noise distributions is particularly high. However, the variance ratio was not significantly correlated with the quadratic effect of confidence in any of these regions, as would be expected if they were representing LLR or a similar quantity (see Appendix 6—figure 1).
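The linear versus quadratic mapping between the perceptual sample and the LLR can be verified with a short worked derivation. The Gaussian parameterization below is an assumption for illustration (the formal model is given in the appendix): in the equal-variance case the two stimulus classes have means $\mu_1$ and $\mu_2$ and a shared standard deviation $\sigma$; in the unequal-variance case the signal has mean $\mu$ and standard deviation $\sigma_s$, and the noise has mean 0 and standard deviation $\sigma_n$.

```latex
% Equal-variance SDT: S_1 ~ N(\mu_1, \sigma^2), S_2 ~ N(\mu_2, \sigma^2)
\begin{align}
LLR(x) &= \log\frac{p(x \mid S_1)}{p(x \mid S_2)}
        = \frac{(x-\mu_2)^2 - (x-\mu_1)^2}{2\sigma^2} \\
       &= \frac{\mu_1-\mu_2}{\sigma^2}\left(x - \frac{\mu_1+\mu_2}{2}\right)
        \qquad \text{(linear in } x\text{)}
\end{align}

% Unequal-variance SDT: signal ~ N(\mu, \sigma_s^2), noise ~ N(0, \sigma_n^2)
\begin{align}
LLR(x) &= \log\frac{p(x \mid \text{signal})}{p(x \mid \text{noise})}
        = \log\frac{\sigma_n}{\sigma_s}
          + \frac{x^2}{2\sigma_n^2}
          - \frac{(x-\mu)^2}{2\sigma_s^2} \\
       &= \frac{\sigma_s^2-\sigma_n^2}{2\sigma_n^2\sigma_s^2}\,x^2
          + \frac{\mu}{\sigma_s^2}\,x
          - \frac{\mu^2}{2\sigma_s^2}
          + \log\frac{\sigma_n}{\sigma_s}
        \qquad \text{(quadratic in } x\text{)}
\end{align}
```

Under these assumptions the coefficient of $x^2$ is positive whenever $\sigma_s > \sigma_n$ and vanishes when the two variances are equal, which is why a larger variance ratio would be expected to produce a stronger quadratic profile.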
For the next two models, confidence was assumed to be directly proportional to the LLR, with the measured signal representing internal beliefs about hidden model parameters. In the ‘Dynamic Criterion’ model, we considered whether a quadratic effect of confidence in detection may reflect the active tuning of decision policy in the absence of explicit feedback (Guggenmos et al., 2016; Ko and Lau, 2012). In the model, beliefs about the underlying distributions are updated on a trial-to-trial basis, and in turn affect the placement of the decision criterion (for a formal description of the model, see Appendix section 10). The Dynamic Criterion model predicts that the magnitude of shift in decision criterion will display a positive quadratic relation to confidence (LLR) in detection but not discrimination (see simulation results in Figure 6, middle panel). This is because the problem is asymmetric in detection, and decision policy should depend on beliefs about both sensory precision (or the relative variance of the noise and signal distribution) and expected signal strength (mean of the signal distribution), which is not the case for a symmetric discrimination problem.
Notably, the pattern of criterion shifts in the Dynamic Criterion model resembled the task-specific effect of confidence in the FPC, STS and pre-SMA. As a post-hoc test of a role for these regions in criterion adjustment, we examined sequential pairs of trials of the same stimulus category (for example, a signal present trial that was followed by a signal present trial), and contrasted ‘repeat’ trials with ‘switch’ trials (for example, [yes, yes] vs. [yes, no]). The Dynamic Criterion model predicts stronger activation in switch compared to repeat trials in both detection and discrimination. The FPl showed a weak effect in this direction (t = 2.03, p=0.05, d = 0.34), whereas FPm, pre-SMA, right BA10 and STS did not (all p-values>0.15).
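The trial-pair classification described above can be sketched as follows. The function name, the label strings and the example sequence are hypothetical, and the sketch only covers the labelling step, not the subsequent fMRI contrast.

```python
def label_trial_pairs(stimuli, responses):
    """Label each trial relative to the one before it.

    A trial is labelled only when it shares its stimulus category with the
    preceding trial: 'repeat' if the response is the same as on the preceding
    trial, 'switch' if it differs. All other trials (including the first)
    are labelled None and would be left out of the contrast.
    """
    if len(stimuli) != len(responses):
        raise ValueError("stimuli and responses must have the same length")

    labels = [None] * len(stimuli)
    for i in range(1, len(stimuli)):
        if stimuli[i] == stimuli[i - 1]:
            labels[i] = "repeat" if responses[i] == responses[i - 1] else "switch"
    return labels


if __name__ == "__main__":
    # Hypothetical detection block.
    stimuli = ["present", "present", "absent", "absent", "absent"]
    responses = ["yes", "yes", "no", "yes", "yes"]
    print(label_trial_pairs(stimuli, responses))
```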
Finally, we considered a higher-order ‘Attention Monitoring’ model in which beliefs about one’s current attentional state (precision or inverse variance in SDT) are taken into account when making perceptual decisions and confidence ratings on detection trials. This model formalizes the notion that after not detecting a target the participant may ask ‘Given my current attentional state, would I have missed the target?’. The Attention Monitoring model thus makes different predictions for confidence in detection ‘target absent’ (no) responses, where the participant is assumed to reflect on the detection-likelihood of hypothetical targets, compared to ‘target present’ (yes) responses, similar to the activation profile observed in the rTPJ. However, this model also predicts a pronounced quadratic confidence profile for all four responses, which we do not see in our data.
Discussion
Previous studies of the neural basis of human perceptual decision-making have tended to focus on discrimination judgments, such as sorting stimuli into category A or B. The general computational architecture supporting discrimination judgments can be naturally extended to support detection (for instance, within signal detection theory). However, computational considerations and behavioral findings suggest that forming confidence in detection judgments may rest on qualitatively distinct cognitive and neural processes in comparison to generating confidence in discrimination judgments.
To test for such differences, here we acquired functional MRI data from 35 participants who reported their subjective confidence in judgments about stimulus type (discrimination), and target presence or absence (detection). These judgments were given on separate trials that were well-matched for stimulus characteristics, response requirements and task difficulty. Across both tasks, we found the expected linear effects of confidence in our prespecified regions of interest in the prefrontal and parietal cortex. Specifically, in the precuneus, vmPFC, pMFC and ventral striatum, the effect of confidence was invariant to task and response. In contrast, having adjusted our planned design matrix to be sensitive to non-monotonic effects of confidence, we observed a quadratic effect of confidence in detection judgments in the frontopolar cortex (medial and lateral surfaces of BA10), that was absent for discrimination judgments. Similar quadratic activation profiles were observed for both yes and no responses. Whole-brain analysis revealed a similar effect of task on the quadratic effect of confidence in the right STS and the pre-SMA. Since task performance was matched across the two tasks and since we did not observe overall differences in activation between detection and discrimination (see Appendix 4—figure 1), these differences in confidence profiles are unlikely to originate from experimental confounds such as task difficulty, but instead indicate a unique neurocognitive contribution to metacognition of detection judgments. In what follows we will unpack what this contribution might be.
The three regions that showed an interaction of the quadratic expansion of confidence with task in our whole-brain analysis (right frontopolar cortex, right STS, and pre-SMA), as well as two anatomical subcomponents of our frontopolar ROI (FPl and FPm), all shared a very similar activation profile. In detection, the quadratic effect of confidence was positive, but was almost entirely absent for the discrimination task. Follow-up analysis confirmed that this difference was not driven by motor aspects of the confidence rating procedure, such as the number of increase or decrease confidence steps taken to reach the desired confidence level, which was similar for the two tasks (see Appendix 1—figure 1). Ours is not the first report of a quadratic relation between activation in prefrontal cortical structures and different subjective ratings. For example, in a study by Christensen et al. (2006), participants were presented with masked stimuli and gave subjective visibility ratings on a three-point scale. The right frontopolar cortex showed decreased activation for ‘clear perception’ and ‘no perception’ categories relative to a middle ‘vague perception’ category. Similarly, De Martino et al. (2017) reported a quadratic effect of product desirability in the pMFC. However, for both of the above cases, a quadratic effect can reflect a monotonic relationship with an implicit representation of subjective confidence (Lebreton et al., 2015). For example, participants may be more confident in the ‘clear perception’ and ‘no perception’ responses compared to the ‘vague perception’ option, or more confident about liking or not liking a product, compared to when using the middle parts of the liking scale. This explanation cannot account for the observed quadratic trend in our case, where in addition to strong activation levels for the highest confidence ratings in target presence and absence, we also find strong activation levels for the lowest levels of confidence.
We are unable to determine whether this effect originates from one homogeneous population of neurons that shows a quadratic effect of detection confidence, or from two overlapping populations that show nonlinear positive and negative effects of detection confidence, summing to an overall quadratic effect at the voxel level (similar to positive and negative confidence-selective neurons in the human posterior parietal cortex; Rutishauser et al., 2018). Addressing this question would require higher spatial resolution, for example using single-cell recordings in patients. Furthermore, because confidence judgments were always preceded by perceptual decisions in our design, we cannot determine whether the observed effects reflect an implicit representation of uncertainty, computed in parallel with the perceptual decision itself, or a higher-order representation that emerges at the explicit confidence rating phase. Future studies which use model-based estimates of covert decision confidence (Bang and Fleming, 2018) or EEG-informed fMRI to resolve early and late processing stages (Gherman and Philiastides, 2018) may answer this question.
We considered three alternative computational models that were able to account for asymmetries between detection and discrimination activation profiles. An unequal-variance signal detection theory model provided a simple account of the asymmetry between detection and discrimination, but could not account for the similar quadratic profiles observed for yes and no responses. A more direct test of the proposal that a detection-specific quadratic effect of confidence originates from the unequal-variance properties of stimulus distributions in detection would be to test for similar effects in a discrimination task in which one category of stimuli is of higher variance (e.g., Denison et al., 2018). In contrast, the Dynamic Criterion model provided good qualitative accounts for distinct regional activation profiles, and the Attention Monitoring account predicted an interaction between confidence in judgments about presence and absence. However, the Attention Monitoring model also predicted a quadratic effect in discrimination, which we did not see.
Notably, both of these models share the need to learn (in the Dynamic Criterion model) or estimate (in the Attention Monitoring model) the current level of precision (inverse variance) in detection. Such online precision estimation evinces a profound asymmetry between detection and discrimination tasks: in discrimination tasks, one simply has to evaluate the relative evidence for different causes of sensory samples, under some prior belief about sensory precision; namely, the precision of the likelihood that any particular cause (e.g., clockwise or anticlockwise orientation) would generate sensory samples. In contrast, detection presents a difficult (ill-posed, dual estimation) problem. When assessing the evidence for the absence of a target, there could be no sensory evidence because the target is not there or because precision is low (or both). This puts pressure on the estimation of precision to resolve conditional dependencies between posterior beliefs about target presence and the precision with which it can be detected. In short, two things have to be estimated: the posterior expectation about the target and posterior beliefs about precision (Clark, 2013; Feldman and Friston, 2010; Haarsma et al., 2018; Palmer et al., 2019; Parr et al., 2018).
In line with a role in monitoring of attention or precision, right TPJ showed a negative effect of confidence that was stronger for ‘target absent’ responses compared to ‘target present’ responses in detection. This cluster was closest to the posterior subdivision of the right TPJ (TPJpR; Igelström et al., 2015), which is most strongly associated with reasoning about others’ beliefs (Igelström et al., 2016). In addition to its role in Theory of Mind (Saxe and Wexler, 2005; Lee and McCarthy, 2016), previous work has highlighted the importance of the rTPJ in controlling attention (Marois et al., 2004; Geng and Vossel, 2013; Lee and McCarthy, 2016; Dugué et al., 2018) and filtering distractors in visual search (Shulman et al., 2007). Furthermore, damage to the rTPJ can result in visual hemineglect: a condition in which stimuli in the left visual hemifield fail to reach awareness (Corbetta et al., 2005). Together, these observations have led to a proposal (the ‘Attention Schema Theory’) that the rTPJ is maintaining a simplified representation of one’s own and others’ attentional states, and that this function makes this region essential for maintaining conscious awareness (Graziano and Webb, 2015).
The current Attention Monitoring model fits well with the Attention Schema Theory. A representation of one’s current attentional state is a useful source of information for determining confidence in detection judgments, because stimuli are more likely to be missed when participants are not paying careful attention. This will be specifically useful for judgments about stimulus absence: if a target was not observed, the participant may reason something along the lines of ‘given my current state of attention, I was not very likely to miss a target, therefore I can be very confident that a target was not presented’. In support of this idea, the typically poor metacognitive evaluations of decisions about stimulus absence are partially recovered when task difficulty is controlled by manipulating attention rather than stimulus visibility (Kanai et al., 2010; Kellij et al., 2018), suggesting that subjects may harness information about their attentional state to inform their confidence judgments. Interestingly, the frontopolar cortex, which showed a detection-specific quadratic effect of confidence in our experiment, has also been implicated in attentional control via the gating of internal and external modes of attention (Burgess et al., 2007) and in discriminating between imagined and externally perceived memory items (Simons et al., 2006; Turner et al., 2008). Together, the engagement of this set of regions in detection confidence hints at a potential role for self-monitoring of attention in metacognition of detection.
To conclude, we find a quadratic effect of confidence in detection judgments in several brain regions, including the frontopolar cortex and rTPJ. In the frontopolar cortex, this quadratic effect was not seen for discrimination judgments. In the rTPJ, we also found a linear effect of confidence that was more negative for judgments about stimulus absence compared to judgments about stimulus presence. We consider three computational accounts of our results, two of which implicate the learning and estimation of signal-to-noise statistics as promising accounts of the observed detection-specific activation profiles. However, while each of these accounts could explain some of our findings, none of the models could provide a complete account of the data. Further work is needed to decide between these alternatives, or to suggest new ones.
Materials and methods
All design and analysis details were pre-registered before data acquisition and time-locked using preRNG randomization (Mazor et al., 2019). The time-locked protocol folder is available at https://github.com/matanmazor/detectionVsDiscrimination_fMRI (Mazor, 2020; copy archived at https://github.com/elifesciencespublications/detectionVsDiscrimination_fMRI). The entire set of pre-registered analysis results is available at https://osf.io/98mv4/.
Participants
46 participants took part in the study (ages 18–36, mean = 24 ± 4; 29 females). 35 participants met our pre-specified inclusion criteria (ages 18–36, mean = 24 ± 4; 20 females). After applying our run-wise exclusion criteria to the data of these 35 participants, our dataset consisted of 5 usable experimental runs from 15 participants, 4 usable experimental runs from 14 participants, 3 usable experimental runs from 5 participants, and 2 usable experimental runs from one participant. We pre-specified a sample size of 35, balancing statistical power and resource considerations.
Design and procedure
After a temporally jittered rest period of 500–4000 milliseconds, each trial started with a fixation cross (500 milliseconds), followed by a presentation of a target for 33 milliseconds. In discrimination trials, the target was a circle of diameter 3° containing randomly generated white noise, merged with a sinusoidal grating (2 cycles per degree; oriented 45° or −45°). In half of the detection trials, targets did not contain a sinusoidal grating and consisted of random noise only. After stimulus offset, participants used their right-hand index and middle fingers to make a perceptual decision about the orientation of the grating (discrimination blocks), or about the presence or absence of a grating (detection blocks). The response mapping was counterbalanced between blocks, such that an index finger press was used to indicate a clockwise tilt on half of the trials, and an anticlockwise tilt on the other half. Similarly, in half of the detection trials the index finger was mapped to a yes (‘target present’) response, and on the other half to a no (‘target absent’) response.
Immediately after making a decision, participants rated their confidence on a 6-point scale by using two keys to increase and decrease their reported confidence level with their left-hand thumb. Confidence levels were indicated by the size and color of a circle presented at the center of the screen. The initial size and color of the circle were determined randomly at the beginning of the confidence rating phase, to decorrelate the number of button presses and the final confidence rating. The mapping between color and size to confidence was counterbalanced between participants: for half of the participants high confidence was mapped to small, red circles, and for the other half high confidence was mapped to large, blue circles. This counterbalancing was employed to isolate confidence-related activations from activations that originate from the perceptual properties of the confidence scale or from differences in the motor requirement to press the upper and lower buttons. The perceptual decision and the confidence rating phases were restricted to 1500 and 2500 milliseconds, respectively. No feedback was delivered to subjects about their performance.
Participants were acquainted with the task in a preceding behavioral session. During this session, task difficulty was adjusted independently for detection and for discrimination, targeting around 70% accuracy on both tasks. We achieved this by adaptively controlling the stimulus signal-to-noise ratio (SNR) once in every 10 trials: increasing the SNR when accuracy fell below 60%, and decreasing it when accuracy exceeded 80%. Performance on the detection and discrimination tasks was further calibrated to the scanner environment at the beginning of the scanning session, during the acquisition of anatomical (MPRAGE and fieldmap) images. After completing the calibration phase, participants underwent five ten-minute functional scanner runs, each comprising one detection and one discrimination block of 40 trials each, presented in random order.
To avoid stimulus-driven fluctuations in confidence, grating SNR was fixed within each experimental block. Nevertheless, following experimental blocks with markedly bad (≤ 52.5%) or good (≥ 85%) accuracy, grating SNR was adjusted for the next block of the same task (SNR level was divided or multiplied by a factor of 0.9 for bad and good performance, respectively). Finally, grating SNR was adjusted for both tasks following runs in which the difference in performance between the two tasks exceeded 16.25% (SNR level was multiplied by the square root of 0.9 for the easier task and divided by the square root of 0.9 for the more difficult task).
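The between-block and between-run adjustment rules can be sketched as follows. This is a minimal illustration, not the experiment code: the function names are hypothetical, accuracies are expressed as proportions, and how the two rules interact when both apply is not specified in the text and is left to the caller.

```python
import math

FACTOR = 0.9  # adjustment factor stated in the text


def adjust_after_block(snr, accuracy):
    """Adjust SNR for the next block of the same task (hypothetical helper)."""
    if accuracy <= 0.525:   # markedly bad performance: make the task easier
        return snr / FACTOR
    if accuracy >= 0.85:    # markedly good performance: make the task harder
        return snr * FACTOR
    return snr


def adjust_after_run(snr_det, snr_dis, acc_det, acc_dis):
    """Rebalance the two tasks when their accuracies differ by more than 16.25%."""
    if abs(acc_det - acc_dis) <= 0.1625:
        return snr_det, snr_dis
    root = math.sqrt(FACTOR)
    if acc_det > acc_dis:   # detection was the easier task
        return snr_det * root, snr_dis / root
    return snr_det / root, snr_dis * root
```

Applying the square root of 0.9 in opposite directions shifts the ratio between the two SNR levels by a full factor of 0.9, the same step size as the single-task rule.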
To incentivize participants to do their best at the task and rate their confidence accurately, we offered a bonus payment according to the following payment schedule: bonus = £$\frac{\overrightarrow{accuracy}\cdot \overrightarrow{confidence}}{200}$, where $\overrightarrow{accuracy}$ is a vector of 1 and −1 for correct and incorrect responses, and $\overrightarrow{confidence}$ is a vector of integers in the range of 1 to 6, representing confidence reports for all trials. We explained the payment structure to participants in the preceding behavioral session. Specifically, we advised participants that to maximize their bonus they should do their best at the main task, rate their confidence higher when they believe they are correct, and rate their confidence lower when they believe they might be wrong.
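As a worked illustration of the payment schedule, the dot product can be computed as below. The function name and the toy trial data are hypothetical.

```python
def bonus_pounds(correct, confidence):
    """Bonus in pounds: dot product of a +1/-1 accuracy vector with confidence, over 200."""
    accuracy = [1 if c else -1 for c in correct]
    return sum(a * k for a, k in zip(accuracy, confidence)) / 200


# Three toy trials: correct with confidence 6, incorrect with 2, correct with 4.
example = bonus_pounds([True, False, True], [6, 2, 4])  # (6 - 2 + 4) / 200
```

Under this rule, high confidence on an error is penalized as much as high confidence on a correct response is rewarded.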
Scanning parameters
Scanning took place at the Wellcome Centre for Human Neuroimaging, London, using a 3 Tesla Siemens Prisma MRI scanner with a 64-channel head coil. We acquired structural images using an MPRAGE sequence (1×1×1 mm voxels, 176 slices, in-plane FoV = 256×256 mm²), followed by a double-echo FLASH (gradient echo) sequence with TE1 = 10 ms and TE2 = 12.46 ms (64 slices, slice thickness = 2 mm, gap = 1 mm, in-plane FoV = 192 × 192 mm², resolution = 3 × 3 mm²) that was later used for field inhomogeneity correction. Functional scans were acquired using a 2D EPI sequence, optimized for regions near the orbitofrontal cortex (3×3×3 mm voxels, TR = 3.36 s, TE = 30 ms, 48 slices tilted by −30 degrees with respect to the T > C axis, matrix size = 64×72, Z-shim = −1.4).
Analysis
The preregistered objectives of this study were to:
Replicate findings of a generic (task-invariant) confidence signal in the activity of medial prefrontal cortex (De Martino et al., 2013; Morales et al., 2018).
Test for an interaction between the parametric effect of confidence level and task (detection/discrimination) in the BOLD response in prefrontal cortex ROIs.
Within detection trials, test for an interaction between the parametric effect of confidence level and response (yes/no) in the BOLD response, specifically in the prefrontal cortex and in frontopolar regions that have previously been associated with counterfactual reasoning (Boorman et al., 2009; Donoso et al., 2014).
Test for relationships between fluctuations in metacognitive adequacy (a trial-by-trial measure of metacognitive sensitivity; Wokke et al., 2017), and the BOLD signal separately for detection and for discrimination, and for yes and no responses within detection.
Replicate previous findings of between-subject correlations between lateral prefrontal cortex (lPFC) function and metacognitive efficiency (meta-d’/d’; Fleming and Lau, 2014) in discrimination (Yokoyama et al., 2010).
Identify between-subject functional correlates of metacognitive efficiency in detection. Specifically, ask if metacognitive efficiency in detection is predicted by activity in distinct networks compared to metacognitive efficiency in discrimination.
Exclusion criteria
Subjects were excluded from all analyses for any of the following pre-specified reasons: missing more than 20% of the trials, performing one of the tasks with accuracy below 60%, exceeding the 4 mm affine motion cutoff criterion in more than 2 experimental runs, and showing a consistent response bias (i.e. using the same response in more than 75% of the trials) in at least one task. Individual scan runs were excluded from all analyses if the participant exceeded the affine motion cutoff, if more than 20% of trials were missed, if mean accuracy was below 60% or if the response bias for one of the tasks exceeded 80%.
In addition, we applied a confidence-related exclusion criterion: participants were excluded if they used the same confidence level in more than 80% of all trials globally or for a particular response, and individual scan runs were excluded if the same confidence level was used in more than 95% of the trials, either globally or for particular response types. Our pre-registration document specified that the confidence exclusion criterion would be used to exclude participants from confidence-related analyses only, but we subsequently revised this plan in order to use identical design matrices for all participants.
Behavioral analysis
Response-conditional type-II ROC curves
Response-conditional type-II ROC (Receiver Operating Characteristic) curves were extracted for the two discrimination and two detection responses. This was done by plotting the cumulative distribution of confidence levels in correct responses against the cumulative distribution of confidence levels in incorrect responses. As a measure of response-specific metacognitive sensitivity, we extracted the area under these curves (AUROC2). The expected AUROC2 for no metacognitive insight (i.e., the confidence distributions are identical for correct and incorrect responses) is 0.5. Perfect metacognitive insight (i.e., confidence in all correct responses is higher than confidence in all incorrect responses) will result in an AUROC2 of 1.
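A minimal sketch of the AUROC2 computation is given below, assuming confidence ratings on the 1–6 scale and trapezoidal integration. The function name is hypothetical; to obtain response-conditional curves, the function would be called separately on the trials of each response.

```python
import numpy as np


def auroc2(confidence, correct, n_levels=6):
    """Area under the type-II ROC curve for one set of trials (hypothetical helper)."""
    confidence = np.asarray(confidence)
    correct = np.asarray(correct, dtype=bool)
    hits, false_alarms = [0.0], [0.0]
    # Sweep the confidence criterion from strictest to most lenient.
    for c in range(n_levels, 0, -1):
        hits.append(np.mean(confidence[correct] >= c))           # P(conf >= c | correct)
        false_alarms.append(np.mean(confidence[~correct] >= c))  # P(conf >= c | incorrect)
    hits, false_alarms = np.array(hits), np.array(false_alarms)
    # Trapezoidal area under the curve running from (0, 0) to (1, 1).
    return float(np.sum(np.diff(false_alarms) * (hits[1:] + hits[:-1]) / 2))
```

The function requires at least one correct and one incorrect trial in the set; otherwise one of the conditional proportions is undefined.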
Imaging analysis
fMRI data preprocessing
Data preprocessing followed the procedure described in Morales et al. (2018): ‘Imaging analysis was performed using SPM12 (Statistical Parametric Mapping; www.fil.ion.ucl.ac.uk/spm). The first five volumes of each run were discarded to allow for T1 stabilization. Functional images were realigned and unwarped using local field maps (Andersson et al., 2001) and then slice-time corrected (Sladky et al., 2011). Each participant’s structural image was segmented into gray matter, white matter, CSF, bone, soft tissue, and air/background images using a nonlinear deformation field to map it onto template tissue probability maps (Ashburner and Friston, 2005). This mapping was applied to both structural and functional images to create normalized images in Montreal Neurological Institute (MNI) space. Normalized images were spatially smoothed using a Gaussian kernel (6 mm FWHM). We set a within-run 4 mm affine motion cutoff criterion’.
Preprocessing and construction of first- and second-level models used standardized pipelines and scripts available at https://github.com/metacoglab/MetaLabCore/.
Regions of interest
In addition to an exploratory whole-brain analysis (corrected for multiple comparisons at the cluster level), our analysis focused on the following a priori regions of interest, largely following the ROIs used by Fleming et al. (2018):
Frontopolar cortex (FPC, defined anatomically). We used a connectivity-based parcellation (Neubert et al., 2014) to define a general FPC region of interest as the total area spanned by areas FPl, FPm and BA46. The right hemisphere mask was mirrored to create a bilateral mask.
Ventromedial prefrontal cortex (vmPFC). The vmPFC ROI was defined as an 8 mm sphere around MNI coordinates [0,46,–7], obtained from a meta-analysis of subjective-value related activations (Bartra et al., 2013) and aligned to the cortical midline.
Bilateral ventral striatum. The ventral striatum ROI was specified anatomically from the Oxford-Imanova Striatal Structural Atlas included with FSL (http://fsl.fmrib.ox.ac.uk).
Posterior medial frontal cortex (pMFC). The pMFC ROI was defined as an 8 mm sphere around MNI coordinates [0, 17, 46], obtained from a functional MRI study on decision confidence and aligned to the cortical midline (Fleming et al., 2012).
Precuneus. The precuneus ROI was defined as an 8 mm sphere around MNI coordinates [0,–57,18], based on Voxel Based Morphometry studies of metacognitive efficiency (Fleming et al., 2010; McCurdy et al., 2013) and aligned to the cortical midline.
For the general FPC ROI, small-volume correction was applied to individual voxels within the ROI for all univariate contrasts. For the multivariate analysis, we used a searchlight approach to scan for spatial patterns within the ROI, followed by a correction for multiple comparisons. For all other ROIs, a GLM was fitted to the mean time course of voxels within the region, and multivariate analysis was performed on all voxels within the ROI. While our pre-registered analysis defined the frontopolar cortex as a single region, we subsequently decided to separately analyze the 3 anatomical subregions identified by Neubert et al. (2014) (FPl, FPm and BA46). The decision to separate the FPC ROI into its subcomponents was made after data collection and these anatomical subregions should not be taken as a priori ROIs.
Univariate analysis
Univariate analysis was based on a design matrix in which different trial types are modeled by different regressors (main design matrix, below). Additionally, to examine the global effect of confidence across trial types, a simpler design matrix was fitted to the data as a first step (global confidence design matrix, below). Experimental runs for each subject were temporally concatenated before estimating the GLM coefficients. This was done in order to maximize sensitivity to response- and task-specific modulations of confidence, given the limited and varying number of trials within each experimental run.
Main design matrix (DM1)
The main design matrix for the univariate GLM analysis consisted of 16 regressors of interest. There was a regressor for each of the eight combinations of task × condition × response: for example, a regressor for detection trials where a signal was present and the subject reported seeing a signal with a yes response (present and present, P_P). The relevant trials were modeled by a boxcar regressor with nonzero entries at the interval starting at the offset of the stimulus and ending immediately after the confidence rating phase, convolved with the canonical hemodynamic response function (HRF). The duration of this interval was 4300 milliseconds, and not 4000 milliseconds as mistakenly indicated in the pre-registration document. Each of these primary regressors was accompanied by a linear parametric modulation of the confidence reported for each trial. Together, the design matrix included 16 regressors of interest (see Table 1).
Trials in which the participant did not respond within the 1500 millisecond time frame were modeled by a separate regressor. The design matrix also included a run-wise constant term regressor, an instruction-screen regressor for the beginning of each block, motion regressors (the 6 motion parameters and their first derivatives as extracted by SPM in the head motion correction preprocessing phase) and regressors for physiological measures. Button presses were modeled as stick functions, convolved with the canonical HRF, in three regressors: two regressors for the right and left right-hand buttons, and one regressor for both up and down left-hand presses. We decided to have one regressor for both types of left-hand presses due to the strong positive correlation of the final confidence rating with the number of ‘increase confidence’ button presses, and the strong negative correlation with the number of ‘decrease confidence’ button presses.
Global confidence design matrix (GCDM)
The global confidence design matrix consisted of 4 regressors of interest. The first two primary regressors were ‘correct trials’ (trials in which the participant was correct, across tasks and responses) and ‘incorrect trials’ (trials in which the participant was incorrect, across tasks and responses). Single events were modeled by a boxcar regressor with nonzero entries at the 4300 millisecond interval starting at the offset of the stimulus and ending immediately after the confidence rating phase, convolved with the canonical hemodynamic response function (HRF). Additionally, the design matrix included a confidence parametric modulator for each of the first two regressors. The construction of the regressors and the additional nuisance regressors was handled similarly to the main design.
Quadratic-confidence design matrix (post-hoc analysis; QCDM)
The quadratic-confidence design matrix for the univariate GLM analysis consisted of 12 regressors of interest. There was a regressor for each of the four responses: yes, no, clockwise and anticlockwise. Similar to the main design matrix, the relevant trials were modeled by a boxcar regressor with nonzero entries at the 4300 millisecond interval starting at the offset of the stimulus and ending immediately after the confidence rating phase, convolved with the canonical hemodynamic response function (HRF). Each of these primary regressors was accompanied by two parametric modulators, representing the linear and quadratic effects of confidence. Together, the design matrix included 12 regressors (4 responses + 4 linear confidence regressors + 4 quadratic confidence regressors). The QCDM included the same set of nuisance regressors as the main design matrix.
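To make the construction of one primary regressor and its two modulators concrete, the following sketch builds them for a single response type. It is an illustration under several assumptions and does not reproduce SPM's implementation: the HRF is a double-gamma approximation written out by hand, the fine time grid `DT` and the mean-centering of the modulators are choices made here, and any orthogonalization SPM may apply between modulators is omitted. All function names are hypothetical.

```python
import math
import numpy as np

TR = 3.36         # s, repetition time from the scanning parameters
DT = 0.08         # s, fine time grid (assumption; TR / DT = 42)
EVENT_DUR = 4.3   # s, boxcar duration stated in the text


def double_gamma_hrf(dt, length=32.0):
    """Double-gamma approximation to a canonical HRF (assumed shape parameters 6 and 16)."""
    t = np.arange(0, length, dt)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    h = peak - undershoot / 6.0
    return h / h.sum()


def build_regressors(onsets, confidence, n_scans):
    """Return main, linear-confidence and quadratic-confidence regressors sampled at each TR."""
    conf = np.asarray(confidence, dtype=float)
    lin = conf - conf.mean()                 # mean-centered linear modulator
    quad = lin ** 2 - np.mean(lin ** 2)      # mean-centered quadratic modulator
    weights = {"main": np.ones_like(conf), "linear": lin, "quadratic": quad}
    step = int(round(TR / DT))
    n_fine = n_scans * step
    hrf = double_gamma_hrf(DT)
    regressors = {}
    for name, w in weights.items():
        fine = np.zeros(n_fine)
        for onset, weight in zip(onsets, w):
            start = int(round(onset / DT))
            stop = int(round((onset + EVENT_DUR) / DT))
            fine[start:stop] += weight       # weighted boxcar for this trial
        convolved = np.convolve(fine, hrf)[:n_fine]
        regressors[name] = convolved[::step][:n_scans]
    return regressors
```

Repeating this for the four responses gives the 4 + 4 + 4 = 12 regressors of interest described above.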
Categorical-confidence design matrices (post-hoc analysis; CCDM)
In order to better understand the nature of the linear interaction between confidence in yes and no responses, we specified a pair of design matrices (one for each task) in which confidence level was modeled as a categorical variable. Instead of the 8 primary regressors in the main design matrix, this design matrix consisted of only one regressor of interest for all trials, modeled by a boxcar with nonzero entries at the 4300 millisecond interval starting at the offset of the stimulus and ending immediately after the confidence rating phase, convolved with the canonical hemodynamic response function (HRF). This regressor was in turn modulated by a series of 12 dummy (0/1) parametric modulators, one for every response (yes and no for detection and clockwise and anticlockwise for discrimination) and confidence rating (1–6 for both tasks). Using two design matrices instead of one allowed us to set discrimination trials to be the baseline category for detection, and detection trials as the baseline for discrimination. These design matrices included the same set of nuisance regressors as the main design matrix.
For each participant, we used the beta estimates from the categorical-confidence design matrices as the input to four response-specific multiple linear regression models, with linear confidence and quadratic confidence as predictors, in addition to an intercept term. The subject-specific coefficients were then subjected to ordinary least squares group-level inference, to compare linear and quadratic effects of confidence between responses. The rationale for choosing this two-step approach was its indifference to the confidence distributions for the four responses, which may bias the estimation of the quadratic and linear terms.
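The first step of this two-step approach can be sketched as a small least-squares fit per participant and response. The function name is hypothetical, and mean-centering the linear and quadratic predictors is an assumption made here for interpretability, not a detail given in the text.

```python
import numpy as np


def linear_quadratic_fit(betas):
    """Regress per-confidence-level beta estimates on linear and quadratic confidence terms."""
    betas = np.asarray(betas, dtype=float)
    conf = np.arange(1, len(betas) + 1, dtype=float)   # confidence levels 1..6
    lin = conf - conf.mean()
    quad = lin ** 2 - np.mean(lin ** 2)
    design = np.column_stack([np.ones_like(lin), lin, quad])
    coefs, *_ = np.linalg.lstsq(design, betas, rcond=None)
    return {"intercept": coefs[0], "linear": coefs[1], "quadratic": coefs[2]}


# Toy input: a purely U-shaped profile across the six confidence levels.
fit = linear_quadratic_fit([(c - 3.5) ** 2 for c in range(1, 7)])
```

Because each confidence level contributes exactly one beta estimate, the fit weights all levels equally regardless of how often a participant used them, which is the property the two-step rationale relies on.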
Multivariate analysis
Multi-voxel pattern analysis (Norman et al., 2006) was used to test for consistent spatial patterns in the fMRI data. We used The Decoding Toolbox (Hebart et al., 2015) and followed the procedures described by Morales et al. (2018). In order to identify brain regions that are implicated in inference about presence and absence, we trained and tested a linear classifier on detection decisions. We classified hits and correct rejections, instead of hits and misses as originally planned, due to an insufficient number of detection misses in some experimental blocks. We then compared the resulting classification accuracy with the cross-classification accuracy of training on detection responses and testing on discrimination confidence and vice versa. The purpose of this comparison was to isolate neural correlates of inference about stimulus absence or presence that should be specific to detection from more general neural correlates of stimulus visibility that are also expected to affect confidence in discrimination judgements (see Appendix 8—figure 1).
The other pre-specified multivariate tests were designed to find universal and response-specific spatially multivariate representations of confidence. After conducting this analysis we came to realize that our experimental design was not appropriate for estimating the degree to which the representation of confidence is ‘response-general’. In our experimental design, confidence is confounded with visual feedback during the confidence-rating phase, such that ‘response-general’ representations of confidence could appear if the spatial pattern of activation was sensitive to the visual feedback in the confidence rating. For completeness, we include the results of this analysis in the OSF project page, but do not interpret them further.
Statistical inference
T-test and ANOVA Bayes factors use a Jeffreys-Zellner-Siow prior for the null distribution, with a unit prior scale (Rouder et al., 2009; Rouder et al., 2012). Whole-brain fMRI significance was corrected for family-wise error rate at the cluster level (p<0.05), with a cluster-defining threshold of p<0.001.
Appendix 1
Confidence button presses
Appendix 2
zROC curves
Appendix 3
Global confidence design matrix
From our prespecified ROIs, only the vmPFC and BA46 ROIs showed a significant linear effect of confidence in correct responses, in the opposite direction to what we expected based on previous studies. This is likely to be due to the differences in confidence profiles between the detection and discrimination tasks:
ROI | Average beta | T value | P value | Standard deviation
vmPFC | 0.35 | 3.06 | $4 \times 10^{-3}$ | 0.67
pMFC | 0.31 | 2.48 | 0.02 | 0.74
precuneus | 0.25 | 2.30 | 0.03 | 0.64
ventral striatum | 0.056 | 1.51 | 0.14 | 0.22
FPl | 0.16 | 1.52 | 0.14 | 0.64
FPm | 0.12 | 1.46 | 0.16 | 0.48
BA 46 | 0.37 | 3.77 | $6 \times 10^{-4}$ | 0.57
Appendix 4
Main effect of task
ROI | Average beta | T value | P value | Standard deviation
vmPFC | 0.01 | 0.05 | 0.96 | 1.64
pMFC | 0.15 | 0.60 | 0.55 | 1.45
precuneus | 0.04 | 0.16 | 0.87 | 1.65
ventral striatum | 0.09 | 0.77 | 0.45 | 0.72
FPl | 0.28 | 1.08 | 0.29 | 1.55
FPm | $5 \times 10^{-3}$ | 0.02 | 0.98 | 1.22
BA 46 | 0.38 | 1.19 | 0.24 | 1.89
Appendix 5
Effect of confidence in our prespecified ROIs
Appendix 6
SDT variance ratio correlation with the quadratic confidence effect
Appendix 7
Correlation of metacognitive efficiency with linear and quadratic confidence effects
Appendix 8
Confidence-decision cross-classification
Appendix 9
Static signal detection theory
Discrimination
Generative model
According to SDT, a decision variable $x$ is sampled from one of two distributions on each experimental trial.
Inference
$x$ is compared against a criterion to generate a decision about which of the two distributions was most likely, given the sample. For a discrimination task with equally likely symmetric distributions around 0, the optimal placement for a criterion is at 0.
In standard discrimination tasks, a common assumption is that the two distributions are Gaussian with equal variance. This assumption has a convenient computational consequence: the log-likelihood ratio (LLR), a quantity that reflects the degree to which the sample is more likely under one distribution or another, is linear with respect to $x$. Confidence is then assumed to be proportional to the distance of ${x}_{t}$ from the decision criterion.
In what follows $\varphi (x,\mu ,\sigma )$ is the likelihood of observing x when sampling from a normal distribution with mean μ and standard deviation $\sigma $.
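The linearity of the LLR under equal variance can be verified directly. The derivation below is a sketch that assumes the two distributions have means $+\mu$ and $-\mu$ (symmetric around 0, as stated above) and a common standard deviation $\sigma$; which orientation is assigned the positive mean is an arbitrary choice made here.

```latex
\begin{aligned}
LLR(x) &= \log \varphi(x, \mu, \sigma) - \log \varphi(x, -\mu, \sigma) \\
       &= -\frac{(x-\mu)^2}{2\sigma^2} + \frac{(x+\mu)^2}{2\sigma^2} \\
       &= \frac{2\mu}{\sigma^2}\, x
\end{aligned}
```

The normalizing constants cancel because the two standard deviations are equal, so the LLR is zero at $x = 0$ and grows in proportion to the distance of $x$ from that criterion.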
Detection
Generative model
A common assumption is that in detection the signal distribution is wider than the noise distribution (unequal-variance SDT; Wickens, 2002, section 3.4).
Inference
Here $med(x)$ represents the median sensory sample $x$. This criterion was chosen to ensure that detection responses are balanced.
Importantly, in uvSDT, LLR is quadratic in x.
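The quadratic form follows from expanding the two Gaussian log-likelihoods. The sketch below assumes a noise distribution centred at 0 with standard deviation $\sigma_n$ and a signal distribution with mean $\mu$ and standard deviation $\sigma_s > \sigma_n$; these symbols are introduced here for illustration.

```latex
\begin{aligned}
LLR(x) &= \log \varphi(x, \mu, \sigma_s) - \log \varphi(x, 0, \sigma_n) \\
       &= \log\frac{\sigma_n}{\sigma_s} + \frac{x^2}{2\sigma_n^2} - \frac{(x-\mu)^2}{2\sigma_s^2} \\
       &= \left(\frac{1}{2\sigma_n^2} - \frac{1}{2\sigma_s^2}\right) x^2
          + \frac{\mu}{\sigma_s^2}\, x
          + \log\frac{\sigma_n}{\sigma_s} - \frac{\mu^2}{2\sigma_s^2}
\end{aligned}
```

When $\sigma_s = \sigma_n$ the coefficient of $x^2$ vanishes and the expression reduces to a linear function of $x$, as in the equal-variance case.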
Appendix 10
Dynamic criterion
In SDT, task performance depends on the degree of overlap between the underlying distributions (d’) and on the positioning of the decision criterion (c). Participants may optimize criterion placement based on their changing beliefs about the underlying distributions (Lau, 2007; Ko and Lau, 2012). To model this dynamic process of criterion setting we simulated a model where beliefs about the underlying distributions are the Maximum Likelihood Estimates of the mean and standard deviation, based on the last 5 samples that were (correctly or not) categorized.
Discrimination
Generative Model
As in the Static Signal Detection model.
Inference
Means and standard deviations of the two distributions are estimated based on the last 5 samples in each category. To model prior beliefs about these parameters, each participant starts the task with 5 imaginary samples from the veridical distributions. Means and standard deviations are then extracted from these imaginary samples. In what follows, $\overrightarrow{cw}$ and $\overrightarrow{acw}$ are vectors with entries corresponding to the last 5 samples that were (correctly or not) labelled as clockwise and anticlockwise, respectively. ${\overline{x}}_{cw}$ and ${\overline{x}}_{acw}$ correspond to the sample means of these vectors. ${\sigma}_{cw}$ and ${\sigma}_{acw}$ correspond to their standard deviations.
Decisions and confidence are extracted from the LLR as in the Static Signal Detection model.
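The windowed estimation described above can be sketched as a small simulation class. This is an illustrative reading of the verbal description rather than the simulation code used for the paper: the class name, the veridical parameters, the assignment of the positive mean to clockwise, and the use of the absolute LLR as confidence are all assumptions.

```python
import math
from collections import deque

import numpy as np

WINDOW = 5  # number of most recent samples kept per category


def log_normal_pdf(x, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)


class DynamicCriterionObserver:
    def __init__(self, mu=1.0, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Prior beliefs: 5 imaginary samples per category from the veridical distributions.
        self.cw = deque(rng.normal(mu, sigma, WINDOW), maxlen=WINDOW)
        self.acw = deque(rng.normal(-mu, sigma, WINDOW), maxlen=WINDOW)

    def llr(self, x):
        # Maximum likelihood estimates (np.std uses ddof=0 by default).
        m_cw, s_cw = np.mean(self.cw), np.std(self.cw)
        m_acw, s_acw = np.mean(self.acw), np.std(self.acw)
        return log_normal_pdf(x, m_cw, s_cw) - log_normal_pdf(x, m_acw, s_acw)

    def respond(self, x):
        value = self.llr(x)
        label = "cw" if value > 0 else "acw"
        # The sample is stored under the label it received, correct or not;
        # maxlen drops the oldest sample in that category.
        (self.cw if label == "cw" else self.acw).append(x)
        return label, abs(value)
```

Because samples are stored under the label they received rather than their true category, a run of errors can shift the estimated distributions, which is what makes the criterion dynamic.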
Detection
Generative Model
As in the Static Signal Detection model.
Inference
As in discrimination. In what follows, $\overrightarrow{a}$ and $\overrightarrow{p}$ are vectors with entries corresponding to the last 5 samples that were (correctly or not) labelled as ‘signal absent’ and ‘signal present’, respectively. ${\overline{x}}_{a}$ and ${\overline{x}}_{p}$ correspond to the sample means of these vectors. ${\sigma}_{a}$ and ${\sigma}_{p}$ correspond to their standard deviations.
In detection, LLR = 0 at two points (see Figure 6). The decision criterion ${c}_{t}$ is chosen to coincide with the rightmost point, which is positioned between the Signal and Noise distribution means.
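The two zero-crossings can be found by treating the LLR as a quadratic in $x$ and solving for its roots. The sketch below assumes the estimated signal distribution is wider than the estimated noise distribution (${\sigma}_{p} > {\sigma}_{a}$), in which case both roots are real; the function names and the example parameter values are hypothetical.

```python
import math

import numpy as np


def detection_llr(x, mu_a, sigma_a, mu_p, sigma_p):
    """Log-likelihood ratio of 'present' versus 'absent' for a sample x."""
    return (math.log(sigma_a / sigma_p)
            - (x - mu_p) ** 2 / (2 * sigma_p ** 2)
            + (x - mu_a) ** 2 / (2 * sigma_a ** 2))


def rightmost_criterion(mu_a, sigma_a, mu_p, sigma_p):
    """Rightmost point at which the LLR equals zero."""
    a = 1 / (2 * sigma_a ** 2) - 1 / (2 * sigma_p ** 2)
    b = mu_p / sigma_p ** 2 - mu_a / sigma_a ** 2
    c = (math.log(sigma_a / sigma_p)
         - mu_p ** 2 / (2 * sigma_p ** 2)
         + mu_a ** 2 / (2 * sigma_a ** 2))
    roots = np.roots([a, b, c])   # solves a*x**2 + b*x + c = 0
    return float(max(roots.real))


# Example with noise ~ N(0, 1) and signal ~ N(1, 1.5).
criterion = rightmost_criterion(0.0, 1.0, 1.0, 1.5)
```

With these example values the rightmost root falls between the two means, while the second root lies far to the left, where very negative samples are more likely under the wider signal distribution.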
Appendix 11
Attention monitoring
Similar to the Dynamic Criterion model, in the Attention Monitoring model participants adjust a decision criterion based on changing beliefs about the underlying distributions. However, unlike the Dynamic Criterion model, here beliefs change not as a function of recent perceptual samples, but as a function of access to an internal variable that represents the expected sensory precision (attention).
Discrimination
Generative model
In our schematic formulation of this model, participants have a true attentional state, which for simplicity we treat as either being on (1) or off (0). When attending, participants enjoy higher sensitivity than when they are not attending.
The attentional state determines the means of sensory distributions.
However, participants do not have direct access to their attentional state, but only to a noisy approximation of the probability that they were attending.
Inference
Participants are then assumed to use their knowledge about the $onTask$ variable when making a decision and confidence estimate.
Detection
Generative model
In detection, attentional states only affect the signal distribution, as noise is always centred at 0.
Inference
The likelihood of observing ${x}_{t}$ if no stimulus was presented is independent of the attention state.
Nevertheless, confidence in judgments about stimulus absence is dependent on beliefs about the attentional state. This is mediated by the effect of attention on the likelihood of observing ${x}_{t}$ if a stimulus were present. This is the counterfactual part.
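The counterfactual dependence can be illustrated with a small numerical sketch. This is one possible reading of the verbal description, not the model equations themselves: the likelihood of a sample under 'present' is assumed to be a mixture over the two attentional states weighted by the $onTask$ belief, and the signal means `MU_ON` and `MU_OFF` are hypothetical values.

```python
import math

MU_ON, MU_OFF = 2.0, 0.5   # hypothetical signal means when attending / not attending


def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def detection_llr(x, on_task):
    """LLR of 'present' versus 'absent', given a belief on_task in [0, 1] about attending."""
    # Likelihood under presence depends on the believed attentional state.
    p_present = on_task * normal_pdf(x, MU_ON) + (1 - on_task) * normal_pdf(x, MU_OFF)
    # Likelihood under absence does not: noise is always centred at 0.
    p_absent = normal_pdf(x, 0.0)
    return math.log(p_present / p_absent)


def confidence_in_absence(x, on_task):
    return -detection_llr(x, on_task)


# The same weak sample supports absence more strongly when the observer believes they attended.
attended = confidence_in_absence(0.0, on_task=0.9)
distracted = confidence_in_absence(0.0, on_task=0.1)
```

In this sketch the sample and the noise likelihood are identical in both calls; only the belief about attention changes what the sample would have looked like had a stimulus been present.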
References
7. Function and localization within rostral prefrontal cortex (area 10). Philosophical Transactions of the Royal Society B: Biological Sciences 362:887–899. https://doi.org/10.1098/rstb.2007.2095
10. Neural basis and recovery of spatial attention deficits in spatial neglect. Nature Neuroscience 8:1603–1610. https://doi.org/10.1038/nn1574
12. Social information is integrated into value and confidence judgments according to its reliability. The Journal of Neuroscience 37:6066–6074. https://doi.org/10.1523/JNEUROSCI.388016.2017
14. Executive control and decision-making in the prefrontal cortex. Current Opinion in Behavioral Sciences 1:101–106. https://doi.org/10.1016/j.cobeha.2014.10.007
16. Specific visual subregions of TPJ mediate reorienting of spatial attention. Cerebral Cortex 28:2375–2390. https://doi.org/10.1093/cercor/bhx140
17. Attention, uncertainty, and free-energy. Frontiers in Human Neuroscience 4:215. https://doi.org/10.3389/fnhum.2010.00215
19. Prefrontal contributions to metacognition in perceptual decision making. Journal of Neuroscience 32:6117–6125. https://doi.org/10.1523/JNEUROSCI.648911.2012
20. Neural mediators of changes of mind about perceptual decisions. Nature Neuroscience 21:617–624. https://doi.org/10.1038/s4159301801046
21. How to measure metacognition. Frontiers in Human Neuroscience 8:443. https://doi.org/10.3389/fnhum.2014.00443
22. The role of metacognition in human social interactions. Philosophical Transactions of the Royal Society B: Biological Sciences 367:2213–2223. https://doi.org/10.1098/rstb.2012.0123
23. Re-evaluating the role of TPJ in attentional control: contextual updating? Neuroscience & Biobehavioral Reviews 37:2608–2620. https://doi.org/10.1016/j.neubiorev.2013.08.010
25. The mirror effect in recognition memory: data and theory. Journal of Experimental Psychology: Learning, Memory, and Cognition 16:5–16. https://doi.org/10.1037/02787393.16.1.5
26. The attention schema theory: a mechanistic account of subjective awareness. Frontiers in Psychology 06:500. https://doi.org/10.3389/fpsyg.2015.00500
34. A model of subjective report and objective discrimination as categorical decisions in a vast representational space. Philosophical Transactions of the Royal Society B: Biological Sciences 369:20130204. https://doi.org/10.1098/rstb.2013.0204
35. A detection theoretic explanation of blindsight suggests a link between conscious perception and metacognition. Philosophical Transactions of the Royal Society B: Biological Sciences 367:1401–1411. https://doi.org/10.1098/rstb.2011.0380
36. A higher order Bayesian decision theory of consciousness. Progress in Brain Research 168:35–48. https://doi.org/10.1016/S00796123(07)680042
37. Automatic integration of confidence in the brain valuation signal. Nature Neuroscience 18:1159–1167. https://doi.org/10.1038/nn.4064
39. A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings. Consciousness and Cognition 21:422–430. https://doi.org/10.1016/j.concog.2011.09.021
41. A novel tool for time-locking study plans to results. The European Journal of Neuroscience 49:1149–1156. https://doi.org/10.1111/ejn.14278
43. Anatomical coupling between distinct metacognitive systems for memory and visual perception. Journal of Neuroscience 33:1897–1906. https://doi.org/10.1523/JNEUROSCI.189012.2013
44. An application of the Poisson race model to confidence calibration. Journal of Experimental Psychology: General 135:391–408. https://doi.org/10.1037/00963445.135.3.391
45. The subjective experience of object recognition: comparing metacognition for object detection and object categorization. Attention, Perception, & Psychophysics 61:1057–1068. https://doi.org/10.3758/s1341401406431
47. Domain-general and domain-specific patterns of activity supporting metacognition in human prefrontal cortex. The Journal of Neuroscience 38:3534–3546. https://doi.org/10.1523/JNEUROSCI.236017.2018
49. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences 10:424–430. https://doi.org/10.1016/j.tics.2006.07.005
51. Precision and false perceptual inference. Frontiers in Integrative Neuroscience 12:39. https://doi.org/10.3389/fnint.2018.00039
53. Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review 16:225–237. https://doi.org/10.3758/PBR.16.2.225
54. Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology 56:356–374. https://doi.org/10.1016/j.jmp.2012.08.001
61. Confidence. In: Decision Processes in Visual Perception, pp. 171–200. Academic Press. 10.1016/B9780127215501.500119
62. Elementary Signal Detection Theory. USA: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195092509.001.0001
63. Sure I'm sure: prefrontal oscillations support metacognitive monitoring of decision making. The Journal of Neuroscience 37:781–789. https://doi.org/10.1523/JNEUROSCI.161216.2016
Decision letter

Thorsten Kahnt, Reviewing Editor; Northwestern University, United States

Joshua I Gold, Senior Editor; University of Pennsylvania, United States

Michael Graziano, Reviewer; Princeton University
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This study explores whether neural correlates of metacognitive judgments differ between detection and discrimination tasks. Results show that neural correlates of confidence in frontopolar cortex differ by task, such that activity correlates nonlinearly with confidence in detection tasks. In addition, the authors show that right temporoparietal junction is uniquely recruited for absence judgments.
Decision letter after peer review:
Thank you for submitting your article "Distinct neural contributions to metacognition for detecting (but not discriminating) visual stimuli" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Joshua Gold as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Michael Graziano (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
This study explores the neural correlates of metacognitive judgments of detection compared to discrimination. The authors find that confidence in detection judgments differs from that of discrimination in frontopolar cortex, and that right TPJ may be involved in confidence for absence judgments. Most importantly, they show that activity in detection judgements has a quadratic relationship with confidence ratings. All reviewers agreed that the paper is interesting and informative. However, there were also questions regarding the Discussion and whether alternative explanations have been adequately explored.
Essential revisions:
1) More needs to be said in the Discussion about what the quadratic confidence effects really mean. Do these signals originate from neurons that show a quadratic relationship between firing and confidence, or do they come from two overlapping populations that show positive and negative linear relationships with confidence, respectively? In the former case, what do neurons that respond only to very high and very low confidence really signal? Do these neurons encode confidence as such or a function of confidence? For instance, such a function could be the confidence of the confidence rating, which, as for any subjective report, should be higher at the extremes of the scale. However, in this case, it is the confidence about the confidence rating, not the confidence about the perceptual judgement. The question is whether you would still observe quadratic confidence effects in a detection task if there was no confidence rating. The authors bring up the related concept of metaconfidence as it can be derived from the Bayesian inference model. But it is not made clear enough that here confidence is on precision, not on the detection judgement itself. These issues of metaconfidence need to be discussed more expansively and also independently of the Bayesian model.
2) Related to this, in several areas the quadratic effect is stronger in the detection than in the discrimination task. So this says that these regions care about whether you're really confident, in which case you should probably learn something about your environment based on feedback regarding the outcome of your choice (e.g. Guggenmos et al., 2016), or you're really not confident, in which case you should not update your model of the world at all. So what are these areas coding for? How much you care to update your world model based on the outcome of a choice? What does the quadratic relationship buy the organism?
3) Still related to the interpretation of the quadratic effect, it would be helpful to expand on the models and mechanisms that can give rise to nonlinear confidence effects. For instance, the suggestion that equal vs. unequal variance of the sample in the SDT model can explain linear vs. nonlinear effects is very interesting but this needs to be unpacked and explained in much more detail. The same is true (though to a lesser degree) for the Bayesian model. It may be helpful to include a figure illustrating how different models can explain nonlinear confidence effects. In general, these theoretical/conceptual considerations are an essential part of the current study and it would be important for the authors to expand on this.
4) The difference in response-conditional metacognitive sensitivity between yes and no responses, as shown also by Meuwese et al., and the lack of difference between YES-conditional metacognitive sensitivity and discrimination-based metacognitive sensitivity, suggests that the difference in metacognitive sensitivity may result from a difference in variance of the underlying internal response distributions. If YES- and discrimination-dependent metacognitive sensitivity is similar, and the only one that is different is no-dependent metacognitive sensitivity, doesn't that imply simply that the signal distribution has larger variance than the noise across both tasks? Some simulations could be done to evaluate this possibility. This would be in line with a report from Kellij et al. (2018) on PsyArXiv (doi: 10.31234/osf.io/xky38), suggesting that asymmetric variance is wholly responsible for differences in t2AUC.
5) The exploratory analysis sought regions in which the quadratic effect of confidence was stronger in the detection than the discrimination task. What about the other way around? Were there ROIs where the quadratic effect of confidence was stronger for discrimination than detection?
6) Was the difference between quadratic effects in detection versus discrimination related to the difference in the (negative) linear effects, in any of these ROIs? In other words, is this just a regression to the mean problem, where you're having trouble finding areas that show either linear or quadratic effects of confidence? It seems that the explanation offered in the Discussion may be of utility here, such that if one specifies the SDT system in LLR space, the relationship between the internal estimate x and the LLR is quadratic for unequal-variance systems (Discussion, eighth paragraph). Although the measure of variance inequality across individuals was not correlated with the quadratic effect of confidence in the reported ROIs, I wonder if this might be mediated by the SNR of the BOLD signal in each individual, which could maybe be informed by the relationship between the linear differences between detection and discrimination and their quadratic relationship difference. In other words, if the linear relationship is weak in a given person, that could also imply a weaker quadratic relationship due to irrelevant factors such as SNR of the BOLD signal. Unless I am missing a point, this could destroy cross-subject relationships between zROC-based estimates of variance inequality and quadratic magnitude. At the least, the authors could test whether the variance imbalance revealed by the slope of the zROC is related to any of these measures.
https://doi.org/10.7554/eLife.53900.sa1
Author response
Essential revisions:
1) More needs to be said in the Discussion about what the quadratic confidence effects really mean. Do these signals originate from neurons that show a quadratic relationship between firing and confidence, or do they come from two overlapping populations that show positive and negative linear relationships with confidence, respectively? In the former case, what do neurons that respond only to very high and very low confidence really signal? Do these neurons encode confidence as such or a function of confidence? For instance, such a function could be the confidence of the confidence rating, which, as for any subjective report, should be higher at the extremes of the scale. However, in this case, it is the confidence about the confidence rating, not the confidence about the perceptual judgement. The question is whether you would still observe quadratic confidence effects in a detection task if there was no confidence rating. The authors bring up the related concept of metaconfidence as it can be derived from the Bayesian inference model. But it is not made clear enough that here confidence is on precision, not on the detection judgement itself. These issues of metaconfidence need to be discussed more expansively and also independently of the Bayesian model.
We thank the reviewers for prompting further reflection on the interpretation of the confidence response profiles here. The observed activation profile in the FPC, STS and pre-SMA could indeed originate from one homogeneous population of neurons that shows a quadratic effect of confidence, or from two overlapping populations that show nonlinear positive and negative effects of confidence – summing to an overall quadratic effect at the voxel level. It is difficult to evaluate these alternatives with the current design, as they do not make different predictions at the level of the BOLD signal. We are also unable to tell whether these activations would continue to be observed in the absence of an explicit confidence rating, given that a rating was always required in the current design. However, the quadratic response profile cannot be explained as a result of simply using a confidence scale: if this were the case, we would have observed the same pattern in the discrimination trials, which also required an explicit subjective judgment. We have now added the following section to the Discussion to make clear where further work is needed to evaluate these alternatives:
“We are unable to determine whether this effect originates from one homogeneous population of neurons that shows a quadratic effect of detection confidence, or from two overlapping populations that show nonlinear positive and negative effects of detection confidence – summing to an overall quadratic effect at the voxel level (similar to positive and negative confidence-selective neurons in the human posterior parietal cortex; Rutishauser et al., 2015). […] Future studies which use model-based estimates of covert decision confidence (Bang and Fleming, 2018) or EEG-informed fMRI to resolve early and late processing stages (Gherman and Philiastides, 2018) may answer this question.”
In response to point (3) below, we now also unpack in greater detail the possible computational mechanisms that may have given rise to a quadratic profile.
2) Related to this, in several areas the quadratic effect is stronger in the detection than in the discrimination task. So this says that these regions care about whether you're really confident, in which case you should probably learn something about your environment based on feedback regarding the outcome of your choice (e.g. Guggenmos et al., 2016), or you're really not confident, in which case you should not update your model of the world at all. So what are these areas coding for? How much you care to update your world model based on the outcome of a choice? What does the quadratic relationship buy the organism?
We agree with the reviewers that this asymmetry in the quadratic profile with respect to confidence is the key result of our paper. The notion that this is related to updating of a model of the task, or learning, is interesting to pursue – and forms the basis of new simulations that we present in response to point (3) below.
3) Still related to the interpretation of the quadratic effect, it would be helpful to expand on the models and mechanisms that can give rise to nonlinear confidence effects. For instance, the suggestion that equal vs. unequal variance of the sample in the SDT model can explain linear vs. nonlinear effects is very interesting but this needs to be unpacked and explained in much more detail. The same is true (though to a lesser degree) for the Bayesian model. It may be helpful to include a figure illustrating how different models can explain nonlinear confidence effects. In general, these theoretical/conceptual considerations are an essential part of the current study and it would be important for the authors to expand on this.
We agree and are glad of the opportunity to expand on potential models of the quadratic effect. We now simulate three different models that predict detection-specific nonlinear effects of confidence: a Signal Detection model, a Dynamic Criterion model, and an Attention Monitoring model. We include simulations from the three models in the appendix and as part of our GitHub repository here: https://github.com/matanmazor/detectionVsDiscrimination_fMRI/tree/master/simulation
We describe the three models and their predictions in a new section of the Results (subsection “Computational models”) and include Figure 6 for clarity.
4) The difference in response-conditional metacognitive sensitivity between yes and no responses, as shown also by Meuwese et al., and the lack of difference between YES-conditional metacognitive sensitivity and discrimination-based metacognitive sensitivity, suggests that the difference in metacognitive sensitivity may result from a difference in variance of the underlying internal response distributions. If YES- and discrimination-dependent metacognitive sensitivity is similar, and the only one that is different is no-dependent metacognitive sensitivity, doesn't that imply simply that the signal distribution has larger variance than the noise across both tasks? Some simulations could be done to evaluate this possibility. This would be in line with a report from Kellij et al. (2018) on PsyArXiv (doi: 10.31234/osf.io/xky38), suggesting that asymmetric variance is wholly responsible for differences in t2AUC.
We thank the reviewer for suggesting this analysis. Because we used two independent staircase procedures, stimulus visibility (SNR) in detection ‘signal’ trials was generally higher than in discrimination. For this reason, signal variance in detection cannot be directly extrapolated from signal variance in discrimination in the same way as in Kellij et al. However, we carried out an additional analysis of the detection and discrimination zROC curves to evaluate the extent of variance asymmetries, which we now include as Appendix 2—figure 1. For both tasks the zROC curves were relatively linear, showing a good fit to the SDT assumptions. The discrimination task zROC had a slope of approximately 1, supporting an equal-variance model, whereas the detection task zROC showed a shallower slope below 1, consistent with an unequal-variance SDT (uvSDT) model. However, uvSDT does not preclude interpretations of the metacognitive disparity effect at the decision-making or metacognitive levels, especially in light of the sensitivity of this effect to the means by which stimuli are made difficult to perceive (Kanai et al., 2011; Kellij et al., 2018).
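The link between zROC slope and variance asymmetry can be made concrete with a minimal sketch. This is not the analysis code used for Appendix 2—figure 1. The function names `cumulative_rates` and `zroc_slope`, the criterion placements, and the Gaussian parameters are hypothetical. The rates are computed analytically from an assumed Gaussian SDT model, not taken from participant data. Under that model, with unit-variance noise and signal standard deviation σ, the zROC is a straight line with slope 1/σ.

```python
from statistics import NormalDist

def cumulative_rates(criteria, mu, sigma):
    """Hit and false-alarm rates at each criterion under a Gaussian SDT model.

    Noise is N(0, 1); signal is N(mu, sigma). All values are illustrative.
    """
    noise = NormalDist(0.0, 1.0)
    signal = NormalDist(mu, sigma)
    fa = [1.0 - noise.cdf(c) for c in criteria]
    hit = [1.0 - signal.cdf(c) for c in criteria]
    return hit, fa

def zroc_slope(hit_rates, fa_rates):
    """Least-squares slope of z(hit rate) regressed on z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    zh = [z(h) for h in hit_rates]
    zf = [z(f) for f in fa_rates]
    mean_f = sum(zf) / len(zf)
    mean_h = sum(zh) / len(zh)
    sxy = sum((f - mean_f) * (h - mean_h) for f, h in zip(zf, zh))
    sxx = sum((f - mean_f) ** 2 for f in zf)
    return sxy / sxx

criteria = [-0.5, 0.0, 0.5, 1.0, 1.5]  # hypothetical confidence criteria
hit_eq, fa_eq = cumulative_rates(criteria, mu=1.0, sigma=1.0)  # equal variance
hit_uv, fa_uv = cumulative_rates(criteria, mu=1.5, sigma=1.5)  # unequal variance
slope_eq = zroc_slope(hit_eq, fa_eq)
slope_uv = zroc_slope(hit_uv, fa_uv)
print(f"equal-variance slope:   {slope_eq:.3f}")
print(f"unequal-variance slope: {slope_uv:.3f}")
```

With empirical confidence ratings, the cumulative rates would instead be computed from response counts, and rates of exactly 0 or 1 would need correcting before the z-transform.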
5) The exploratory analysis sought regions in which the quadratic effect of confidence was stronger in the detection than the discrimination task. What about the other way around? Were there ROIs where the quadratic effect of confidence was stronger for discrimination than detection?
We did not find any brain region that showed stronger effects of confidence (linear or quadratic) for discrimination over detection. This point is now clarified in the manuscript:
“…voxels, peak voxel: [9,65,10], Z=4.00). Importantly, no region showed stronger quadratic effects of confidence in discrimination compared to detection.”
6) Was the difference between quadratic effects in detection versus discrimination related to the difference in the (negative) linear effects, in any of these ROIs? In other words, is this just a regression to the mean problem, where you're having trouble finding areas that show either linear or quadratic effects of confidence? It seems that the explanation offered in the Discussion may be of utility here, such that if one specifies the SDT system in LLR space, the relationship between the internal estimate x and the LLR is quadratic for unequal-variance systems (Discussion, eighth paragraph). Although the measure of variance inequality across individuals was not correlated with the quadratic effect of confidence in the reported ROIs, I wonder if this might be mediated by the SNR of the BOLD signal in each individual, which could maybe be informed by the relationship between the linear differences between detection and discrimination and their quadratic relationship difference. In other words, if the linear relationship is weak in a given person, that could also imply a weaker quadratic relationship due to irrelevant factors such as SNR of the BOLD signal. Unless I am missing a point, this could destroy cross-subject relationships between zROC-based estimates of variance inequality and quadratic magnitude. At the least, the authors could test whether the variance imbalance revealed by the slope of the zROC is related to any of these measures.
We thank the reviewer for suggesting this useful control analysis. We agree that a cross-subject correlation may indeed have been masked by differences in the overall noisiness of the signal between participants, especially with this relatively modest sample size for examining between-subject effects. We have now tackled this potential concern in a new analysis. To estimate the noisiness of single-subject data, we extracted subject-wise R-squared for the second-order polynomial model predicting BOLD signal from confidence level and response. We replotted Appendix 6—figure 1, this time with a color code that indicates this goodness of fit for each subject. If the relation between zROC slope and the quadratic coefficient is masked by variability in overall data quality, a correlation should be unmasked when focusing on only the participants with high R-squared scores. We did not see evidence for such an effect in any of the three clusters – instead, the relationship with log(SD ratio) seemed similar for different subgroups of subjects.
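A goodness-of-fit measure of this kind can be sketched as follows. This is a simplified illustration, not the analysis code: it fits a second-order polynomial in confidence only (the response factor is omitted), and the function name `quadratic_r_squared` and the example values are hypothetical. It uses only the standard library; in practice a routine such as `numpy.polyfit` would be the more usual choice.

```python
def quadratic_r_squared(confidence, bold):
    """R-squared of the least-squares fit bold ~ b0 + b1*c + b2*c**2."""
    n = len(confidence)
    # Centre confidence so the normal equations are better conditioned.
    c_mean = sum(confidence) / n
    c = [v - c_mean for v in confidence]
    # Normal equations for the design matrix [1, c, c^2].
    s = [sum(v ** k for v in c) for k in range(5)]  # power sums s0..s4
    t = [sum((v ** k) * y for v, y in zip(c, bold)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    # Back-substitution for the three coefficients.
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(a[i][j] * b[j] for j in range(i + 1, 3))
        b[i] = (a[i][3] - rest) / a[i][i]
    pred = [b[0] + b[1] * v + b[2] * v * v for v in c]
    y_mean = sum(bold) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(bold, pred))
    ss_tot = sum((y - y_mean) ** 2 for y in bold)
    return 1.0 - ss_res / ss_tot

confidence = [1, 2, 3, 4, 5, 6]                   # hypothetical rating levels
clean = [(c - 3.5) ** 2 for c in confidence]      # perfectly quadratic profile
noisy = [y + e for y, e in zip(clean, [2, -2, 2, -2, 2, -2])]
r2_clean = quadratic_r_squared(confidence, clean)
r2_noisy = quadratic_r_squared(confidence, noisy)
print(f"R-squared, clean profile: {r2_clean:.3f}")
print(f"R-squared, noisy profile: {r2_noisy:.3f}")
```

A subject whose profile is dominated by noise yields a lower R-squared, which is the property used above to index single-subject data quality.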
https://doi.org/10.7554/eLife.53900.sa2
Article and author information
Funding
Royal Society (Sir Henry Dale Fellowship 206648/Z/17/Z)
 Stephen M Fleming
Wellcome Trust (Sir Henry Dale Fellowship 206648/Z/17/Z)
 Stephen M Fleming
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Maayan Keshev, Dan Bang, Madeleine Scott, Peter Zeidman, Nadège Corbin, Tim Tierney, Emma Holmes, Max Rollwage, Roni Maimon, Rani Moran, Noam Mazor and the FIL imaging team for their help in different stages of this project. The Wellcome Centre for Human Neuroimaging is supported by core funding from the Wellcome Trust (203147/Z/16/Z). SMF is supported by a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (206648/Z/17/Z).
Ethics
Human subjects: Participants gave their informed consent to take part in the experiment. The experiment was approved by the UCL ethics committee (approval numbers 8231/001 and 1260/003).
Publication history
 Received: November 23, 2019
 Accepted: March 24, 2020
 Version of Record published: April 20, 2020 (version 1)
Copyright
© 2020, Mazor et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.