Confidence phenotypes: a unified computational account of value and decision certainty in reinforcement learning

Nicolás A Comay; Guillermo Solovey; Pablo Barttfeld

doi:10.7554/eLife.111820.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Nathan Faivre
Centre National de la Recherche Scientifique, Grenoble, France
Senior Editor
Michael Frank
Brown University, Providence, United States of America

Reviewer #1 (Public review):

Summary:

This study addresses an important question in reinforcement learning and metacognition by distinguishing value confidence from decision confidence and testing how each is computationally represented. The findings are significant because they suggest that value confidence is well captured by Bayesian uncertainty, whereas decision confidence reflects a hybrid computation combining probability correct with broader value certainty. The evidence is promising, supported by multiple datasets and model comparisons.

Strengths:

(1) A major strength of the study is that the authors test their hypotheses across multiple datasets, including previously published datasets and newly collected data. This broad empirical approach increases the generality of the findings.

(2) The Bayesian model of value confidence has a clear theoretical basis. The proposed hybrid model of decision confidence is also intuitive. It appears to capture important aspects of the decision confidence data.

(3) The paper provides a useful framework for linking certainty about value estimates to the subsequent choice and to the corresponding decision confidence.

Weaknesses:

(1) The conceptual link between value confidence and decision confidence is not yet fully established. The manuscript argues that overall value certainty contributes to decision confidence, but this conclusion is based largely on the latent variable that the model infers from the decision confidence experiment alone. A more direct test would require measuring value confidence and decision confidence within the same participants and task, and analysing how these two types of confidence interact.

(2) The individual-difference analyses in Figure 5 are methodologically challenging. The predictors used in these analyses are derived from model fits to the behavioural data and are then correlated with behaviour in the same task. This circularity means that correlations may arise inevitably, so their presence does not guarantee that they are cognitively meaningful.

(3) The model recovery results suggest that some candidate models are not clearly distinguishable.

(4) The manuscript would benefit from clearer explanations of why specific models capture particular behavioural patterns.

(5) The claim that value confidence modulates the exploration-exploitation trade-off should be interpreted carefully, because the model uses global uncertainty across both options, not option-specific value confidence.

https://doi.org/10.7554/eLife.111820.1.sa2

Reviewer #2 (Public review):

Summary:

In this work, the authors propose a common value-estimation framework based on Bayesian inference and show that it can account for both participants' confidence in their value estimates ("value confidence") and for their confidence in their final choices ("decision confidence").

Strengths:

The study extends several established findings in the confidence and reinforcement-learning literature. In particular, the authors not only examine decision confidence but also directly model value confidence, and they replicate the idea that decision confidence reflects a combination of multiple computations, previously described for categorical decisions (Navajas et al., 2017), in the context of continuous value-based decisions. I therefore consider the work a useful contribution to the field.

Weaknesses:

However, I believe that the scope of the conclusions is overstated relative to the results that are actually presented.

(1) Interaction between value confidence and decision confidence

The abstract and introduction frame the study as addressing a major gap in the literature, namely, the lack of direct investigation of the interaction between value confidence and decision confidence. Yet the manuscript never directly tests the interaction between these two quantities. Instead, the authors show that the reported decision confidence depends not only on the probability of being correct, but also on the precision of the decision variable (DV), which is related to the precision of the value estimates underlying value confidence. While this is related to the proposed research question, it is not a direct analysis of the interaction between value confidence and decision confidence themselves.

(2) Unified computational framework

Similarly, the claim that the study provides a "unified computational framework" appears somewhat overstated. The proposed models build on standard and well-established Bayesian frameworks and extend them specifically to account for decision confidence. While this demonstrates that both forms of confidence can be expressed within a common Bayesian formalism, the manuscript does not establish a direct computational interaction or shared mechanism between them beyond their dependence on the same underlying uncertainty estimates.

(3) "Phenotypes" interpretation

The interpretation of the observed individual differences as distinct "behavioural phenotypes" also appears overstated. The reported analyses primarily show continuous variability across participants in the relative weighting of different components contributing to confidence reports, rather than evidence for qualitatively distinct categories or computational subtypes of decision-makers.

(4) Decision confidence terminology

I also found some conceptual ambiguity in the terminology used throughout the manuscript. Early in the paper, decision confidence is defined normatively as the subjective probability of having made the correct choice, corresponding to P(DV>0). Later, however, the authors show that participants' confidence reports are better explained by a combination of this probability and the precision of the decision-variable distribution. Despite this distinction, the manuscript continues referring to the reported quantity simply as "decision confidence." Clarifying the distinction between the theoretical construct and the empirical reports (for example, by referring to "reported decision confidence") would improve conceptual clarity.
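To make the distinction between the two quantities concrete, the sketch below computes both from Gaussian value posteriors. It assumes, for illustration only, that the values of the chosen and unchosen options have independent Gaussian posteriors and that DV = V_chosen - V_unchosen; the function names and numbers are hypothetical and are not taken from the authors' code. Under these assumptions the normative quantity is P(DV > 0), while the precision of the DV distribution is a separate number that can be reported alongside it.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def dv_moments(mu_chosen, var_chosen, mu_unchosen, var_unchosen):
    """Mean and variance of DV = V_chosen - V_unchosen (independent Gaussian posteriors assumed)."""
    return mu_chosen - mu_unchosen, var_chosen + var_unchosen

def prob_correct(mu_dv, var_dv):
    """Normative decision confidence: P(DV > 0)."""
    return normal_cdf(mu_dv / math.sqrt(var_dv))

def dv_precision(var_dv):
    """Precision (inverse variance) of the DV distribution."""
    return 1.0 / var_dv

# Hypothetical posterior means and variances for the two options.
mu_dv, var_dv = dv_moments(0.6, 0.01, 0.4, 0.01)
print(round(prob_correct(mu_dv, var_dv), 3), round(dv_precision(var_dv), 1))
```

Keeping the two functions separate mirrors the reviewer's suggestion to distinguish the theoretical construct, P(DV > 0), from "reported decision confidence", which the manuscript models as depending on both numbers.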

https://doi.org/10.7554/eLife.111820.1.sa1

Reviewer #3 (Public review):

Summary:

Comay, Solovey, and Barttfeld aim to provide a unified computational account of confidence in reinforcement learning by distinguishing value confidence (the certainty associated with latent value estimates) from decision confidence (the confidence that a particular choice is correct). Across new experiments and reanalyses of previously published datasets, they argue that value confidence is best described by Bayesian posterior precision, that this form of confidence adaptively reduces decision noise as learning progresses, and that decision confidence is better captured by a hybrid model combining Bayesian probability correct with a more global estimate of value certainty. They further propose that individual differences in the relative weighting of these components define "confidence phenotypes" that predict task performance, exploration-exploitation behavior, and metacognitive accuracy.

Strengths:

A major strength of the study is that it addresses an important conceptual distinction that is often blurred in the confidence literature. The paper usefully separates uncertainty about latent environmental states from confidence in an action derived from those latent beliefs. This distinction is especially important in reinforcement learning, where uncertainty is not merely a retrospective judgment about accuracy but can directly shape future sampling, learning, and action selection. The manuscript is therefore well positioned to bridge work on Bayesian confidence in perceptual decision-making with work on uncertainty-guided learning and exploration.

A second strength is the authors' use of multiple datasets and model comparisons. The claim that value confidence tracks Bayesian uncertainty is supported across tasks in which participants explicitly report confidence in value estimates, including datasets where reward variance is manipulated. The latter manipulation is particularly useful because it helps distinguish a Bayesian uncertainty account from simpler models based only on the number of observations. The finding that value confidence modulates the softmax slope and thereby promotes more exploitative choices as uncertainty decreases is also theoretically coherent and supported across several datasets, including a preregistered replication.
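Why a reward-variance manipulation separates a Bayesian account from a count-based one can be seen in a standard conjugate Gaussian update with known reward variance. After n rewards, posterior precision equals the prior precision plus n / reward_var, so two options sampled equally often can still differ in precision. The sketch below assumes this textbook update, hypothetical prior values, and a hypothetical linear mapping in which a confidence signal scales the softmax inverse temperature. It illustrates the general idea and is not the authors' fitted model.

```python
import math
from dataclasses import dataclass

@dataclass
class GaussianBelief:
    """Posterior over one option's value; the prior values are hypothetical."""
    mean: float = 0.5
    precision: float = 1.0  # inverse variance, read here as value confidence

    def update(self, reward: float, reward_var: float) -> None:
        # Conjugate Gaussian update with known reward variance:
        # precisions add, and the mean is a precision-weighted average.
        obs_precision = 1.0 / reward_var
        new_precision = self.precision + obs_precision
        self.mean = (self.precision * self.mean + obs_precision * reward) / new_precision
        self.precision = new_precision

def softmax_probs(values, beta0, confidence):
    """Choice probabilities whose inverse temperature is scaled by confidence (assumed linear form)."""
    beta = beta0 * confidence
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Same number of samples, different reward variance, different precision.
low_noise, high_noise = GaussianBelief(), GaussianBelief()
for r in (0.6, 0.7, 0.65, 0.62, 0.68):
    low_noise.update(r, reward_var=0.01)
    high_noise.update(r, reward_var=0.1)
print(low_noise.precision, high_noise.precision)
```

In this sketch a higher confidence value sharpens the softmax, so choices become more exploitative as uncertainty falls. Whether the confidence signal should be global or option-specific (Reviewer #1, point 5) is left open by the single `confidence` argument.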

The manuscript's most interesting and potentially impactful contribution is the hybrid model of decision confidence. The authors show that a model based only on Bayesian probability correct captures confidence on correct trials better than on incorrect trials, whereas adding an "overall value confidence" term improves the fit. This is a useful result because it suggests that confidence reports in reinforcement learning may not be a pure readout of decision-level discriminability, but instead may combine decision-specific evidence with more global latent-state uncertainty. This could help explain why human confidence often deviates from ideal Bayesian predictions, especially on error trials.

Weaknesses:

However, the interpretation of the hybrid model remains the main weakness of the paper. The second term, overall value confidence, is not equivalent to the precision of the decision variable. It can dissociate from decision difficulty: two options can be far apart but individually uncertain, or nearly identical but individually well estimated. The authors appear to recognize this issue and have reframed the term as "overall value confidence" rather than decision-variable precision. This is a useful clarification, but the conceptual role of the term still requires sharper treatment. In its current form, it is sometimes described as part of a unified confidence computation, but it may be more accurately understood as a biasing or contextual signal that modulates reported confidence without necessarily improving decision calibration.
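The dissociation described above can be checked numerically. The sketch assumes independent Gaussian posteriors, defines "overall value confidence" hypothetically as the mean posterior precision across the two options, and combines it with probability correct through an assumed linear rule with free weights. None of these choices is confirmed by the manuscript; they only illustrate how the two components can move in opposite directions.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_correct(mu_c, var_c, mu_u, var_u):
    """P(V_chosen - V_unchosen > 0) under independent Gaussian posteriors."""
    return normal_cdf((mu_c - mu_u) / math.sqrt(var_c + var_u))

def overall_value_confidence(var_c, var_u):
    """Hypothetical global term: mean posterior precision across both options."""
    return 0.5 * (1.0 / var_c + 1.0 / var_u)

def hybrid_confidence(mu_c, var_c, mu_u, var_u, w_pc, w_ovc, bias=0.0):
    """Assumed linear combination of the two components; weights are free parameters."""
    return (bias + w_pc * prob_correct(mu_c, var_c, mu_u, var_u)
            + w_ovc * overall_value_confidence(var_c, var_u))

# Far apart but individually uncertain vs. nearly identical but well estimated.
far_uncertain = dict(mu_c=0.9, var_c=0.25, mu_u=0.1, var_u=0.25)
close_precise = dict(mu_c=0.51, var_c=0.001, mu_u=0.49, var_u=0.001)
for name, t in (("far/uncertain", far_uncertain), ("close/precise", close_precise)):
    print(name, round(prob_correct(**t), 3),
          round(overall_value_confidence(t["var_c"], t["var_u"]), 1))
```

Here the first pair has the higher probability correct while the second has far higher overall value confidence, so a positive weight on the global term can raise reported confidence on the harder decision. In an actual fit the two components would usually be rescaled before weighting; that choice is also an assumption.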

A related concern is model identifiability. In many reinforcement-learning tasks, probability correct and overall value confidence both change systematically over the course of learning. As a result, the hybrid model may gain predictive power partly because it captures generic time-on-task or learning-progress effects, rather than because participants explicitly combine two separable uncertainty signals. The manuscript would be stronger if it more clearly demonstrated that the two latent variables are distinguishable in the behavioral data, for example, through model recovery, parameter recovery, cross-validated prediction, and analyses of the correlation between latent regressors across task conditions and individuals.
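One simple diagnostic for the identifiability concern is to simulate a learner and measure how strongly the two latent regressors co-vary across trials. The sketch below assumes a hypothetical two-armed task with Gaussian rewards, full feedback on both options every trial, conjugate Gaussian updates, probability correct computed for the currently favored option, and overall value confidence defined as mean posterior precision. All parameters are illustrative, and the printed correlation depends on them and on the random seed.

```python
import math
import random

def normal_cdf(x: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_regressors(n_trials=40, true_means=(0.7, 0.3), reward_sd=0.2,
                        prior_mean=0.5, prior_precision=1.0, seed=0):
    """Per-trial P(correct) and overall value confidence for a simulated Bayesian learner."""
    rng = random.Random(seed)
    obs_precision = 1.0 / reward_sd ** 2
    means = [prior_mean, prior_mean]
    precs = [prior_precision, prior_precision]
    p_correct, overall_conf = [], []
    for _ in range(n_trials):
        # Latent regressors at decision time, before this trial's feedback.
        best = 0 if means[0] >= means[1] else 1
        other = 1 - best
        var_dv = 1.0 / precs[best] + 1.0 / precs[other]
        p_correct.append(normal_cdf((means[best] - means[other]) / math.sqrt(var_dv)))
        overall_conf.append(0.5 * (precs[0] + precs[1]))
        # Full-feedback assumption: both options are observed on every trial.
        for i in (0, 1):
            reward = rng.gauss(true_means[i], reward_sd)
            new_prec = precs[i] + obs_precision
            means[i] = (precs[i] * means[i] + obs_precision * reward) / new_prec
            precs[i] = new_prec
    return p_correct, overall_conf

if __name__ == "__main__":
    pc, ovc = simulate_regressors()
    print(f"r(P(correct), overall value confidence) = {pearson(pc, ovc):.2f}")
```

Repeating such simulations across task conditions and fitted parameter sets, and then running model and parameter recovery on the simulated reports, would indicate whether the two terms can be separated in practice or mainly track learning progress.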

The link between the decision rule and confidence model also deserves more scrutiny. The authors use value confidence to modulate decision noise in the choice model, and then use a related global value-confidence term in the confidence-report model. This creates an appealing unified architecture, but it also raises the possibility that the same latent variable is doing multiple kinds of explanatory work. The paper would benefit from a clearer separation between uncertainty as a driver of choices, uncertainty as a determinant of confidence reports, and uncertainty as an inferred latent variable extracted from the same behavioral data.

From a computational neuroscience perspective, the manuscript would also benefit from a more explicit discussion of how these confidence quantities might be represented neurally. The current model treats value confidence, probability correct, and overall value confidence as scalar latent variables available to the observer. Yet uncertainty-related computations may be represented nonlinearly in neural population activity rather than as explicit scalar readouts. Work on nonlinear neural decoding and population codes has shown that task-relevant variables can be carried by nonlinear statistics of neural activity, especially when nuisance variables obscure mean tuning, and that behavioral choices can reveal whether such nonlinear information is efficiently decoded. This literature provides a useful framework for connecting the present behavioral model to possible neural implementations of value and decision confidence.

Overall, the authors largely achieve their goal of demonstrating that value confidence and decision confidence are computationally dissociable in reinforcement learning. The evidence for Bayesian value confidence is strong, and the evidence that confidence-guided exploitation improves the account of choice behavior is convincing. The evidence for the hybrid account of decision confidence is promising but would be strengthened by additional analyses clarifying model identifiability, the interpretation of the overall value-confidence term, and the conditions under which the model makes distinct predictions from simpler time-, value-, or evidence-based alternatives. The paper is likely to be useful for researchers interested in computational models of confidence, metacognition, and adaptive behavior under uncertainty.

https://doi.org/10.7554/eLife.111820.1.sa0
