Trading mental effort for confidence in the metacognitive control of value-based decision-making
Abstract
Why do we sometimes opt for actions or items that we do not value the most? Under current neurocomputational theories, such preference reversals are typically interpreted in terms of errors that arise from the unreliable signaling of value to brain decision systems. But an alternative explanation is that people may change their mind because they are reassessing the value of alternative options while pondering the decision. So, why do we carefully ponder some decisions, but not others? In this work, we derive a computational model of the metacognitive control of decisions, or MCD. In brief, we assume that fast and automatic processes first provide initial (and largely uncertain) representations of options' values, yielding prior estimates of decision difficulty. These uncertain value representations are then refined by deploying cognitive (e.g., attentional, mnesic) resources, the allocation of which is controlled by an effort-confidence trade-off. Importantly, the anticipated benefit of allocating resources varies on a decision-by-decision basis, according to the prior estimate of decision difficulty. The ensuing MCD model predicts response time, subjective feeling of effort, choice confidence, changes of mind, as well as choice-induced preference change and certainty gain. We test these predictions in a systematic manner, using a dedicated behavioral paradigm. Our results provide a quantitative link between mental effort, choice confidence, and preference reversals, which could inform interpretations of related neuroimaging findings.
Introduction
Why do we carefully ponder some decisions, but not others? Decisions permeate every aspect of our lives – what to eat, where to live, whom to date, etc. – but the amount of effort that we put into different decisions varies tremendously. Rather than processing all decision-relevant information, we often rely on fast habitual and/or intuitive decision policies, which can lead to irrational biases and errors (Kahneman et al., 1982). For example, snap judgments about others are prone to unconscious stereotyping, which often has enduring and detrimental consequences (Greenwald and Banaji, 1995). Yet we don't always follow the fast but negligent lead of habits or intuitions. So, what determines how much time and effort we invest when making decisions?
Biased and/or inaccurate decisions can be triggered by psychobiological determinants such as stress (Porcelli and Delgado, 2009; Porcelli et al., 2012), emotions (Harlé and Sanfey, 2007; De Martino et al., 2006; Sokol-Hessner et al., 2013), or fatigue (Blain et al., 2016). But, in fact, they also arise in the absence of such contextual factors. That is why they are sometimes viewed as the outcome of inherent neurocognitive limitations on the brain's decision processes, e.g., bounded attentional and/or mnemonic capacity (Giguère and Love, 2013; Lim et al., 2011; Marois and Ivanoff, 2005), unreliable neural representations of decision-relevant information (Drugowitsch et al., 2016; Wang and Busemeyer, 2016; Wyart and Koechlin, 2016), or physiologically constrained neural information transmission (Louie and Glimcher, 2012; Polanía et al., 2019). However, an alternative perspective is that the brain has a preference for efficiency over accuracy (Thorngate, 1980). For example, when making perceptual or motor decisions, people frequently trade accuracy for speed, even when time constraints are not tight (Heitz, 2014; Palmer et al., 2005). Related neural and behavioral data are best explained by 'accumulation-to-bound' process models, in which a decision is emitted when the accumulated perceptual evidence reaches a bound (Gold and Shadlen, 2007; O'Connell et al., 2012; Ratcliff and McKoon, 2008; Ratcliff et al., 2016). Further computational work demonstrated that if the bound is properly set, these models actually implement an optimal solution to speed-accuracy trade-off problems (Ditterich, 2006; Drugowitsch et al., 2012). From a theoretical standpoint, this implies that accumulation-to-bound policies can be viewed as an evolutionary adaptation, in response to selective pressure that favors efficiency (Pirrone et al., 2014).
This line of reasoning, however, is not trivial to generalize to value-based decision-making, for which objective accuracy remains an elusive notion (Dutilh and Rieskamp, 2016; Rangel et al., 2008). This is because, in contrast to evidence-based (e.g., perceptual) decisions, there are no right or wrong value-based decisions. Nevertheless, people still make choices that deviate from subjective reports of value, with a rate that decreases with value contrast. From the perspective of accumulation-to-bound models, these preference reversals count as errors and arise from the unreliable signaling of value to decision systems in the brain (Lim et al., 2013). That value-based variants of accumulation-to-bound models are able to capture the neural and behavioral effects of, e.g., overt attention (Krajbich et al., 2010; Lim et al., 2011), external time pressure (Milosavljevic et al., 2010), confidence (De Martino et al., 2013), or default preferences (Lopez-Persem et al., 2016) lends empirical support to this type of interpretation. Further credit also comes from theoretical studies showing that these process models, under some simplifying assumptions, optimally solve the problem of efficient value comparison (Tajima et al., 2016; Tajima et al., 2019). However, they do not solve the issue of adjusting the amount of effort to invest in reassessing an uncertain prior preference with yet-unprocessed value-relevant information. Here, we propose an alternative computational model of value-based decision-making that suggests that mental effort is optimally traded against choice confidence, given how value representations are modified while pondering decisions (Slovic, 1995; Tversky and Thaler, 1990; Warren et al., 2011).
We start from the premise that the brain generates representations of options' value in a quick and automatic manner, even before attention is engaged for comparing option values (Lebreton et al., 2009). The brain also encodes the certainty of such value estimates (Lebreton et al., 2015), from which a priori feelings of choice difficulty and confidence could, in principle, be derived. Importantly, people are reluctant to make a choice that they are not confident about (De Martino et al., 2013). Thus, when faced with a difficult decision, people should reassess option values until they reach a satisfactory level of confidence about their preference. This effortful mental deliberation would engage neurocognitive resources, such as attention and memory, in order to process value-relevant information. In line with recent proposals regarding the strategic deployment of cognitive control (Musslick et al., 2015; Shenhav et al., 2013), we assume that the amount of allocated resources optimizes a trade-off between expected effort cost and confidence gain. The main issue here is that the impact of yet-unprocessed information on value representations is a priori unknown. Critically, we show how the system can anticipate the expected benefit of allocating resources before having processed value-relevant information. The ensuing metacognitive control of decisions, or MCD, thus adjusts mental effort on a decision-by-decision basis, according to prior decision difficulty and importance (Figure 1).
As we will see, the MCD model makes clear quantitative predictions about several key decision variables (cf. Model section below). We test these predictions by asking participants to report their judgments about each item's subjective value and their subjective certainty about their value judgments, both before and after choosing between pairs of the items. Note that we also measure choice confidence, response time, and subjective effort for each decision.
The objective of this work is to show how most non-trivial properties of value-based decision-making can be explained with a minimal (and mutually consistent) set of assumptions. The MCD model predicts response time, subjective effort, choice confidence, the probability of changing one's mind, as well as choice-induced preference change and certainty gain, out of two properties of pre-choice value representations, namely value ratings and value certainty ratings. Relevant details regarding the model derivations, as well as the decision-making paradigm we designed to evaluate those predictions, can be found in the Model and Methods sections below. In the subsequent section, we present our main dual computational/behavioral results. Finally, we discuss our results in light of the existing literature on value-based decision-making.
The MCD model
In what follows, we derive a computational model of the metacognitive control of decisions, or MCD. In brief, we assume that the amount of cognitive resources that is deployed during a decision is controlled by an effort-confidence trade-off. Critically, this trade-off relies on a prospective anticipation of how these resources will perturb the internal representations of subjective values. As we will see, the MCD model eventually predicts how cognitive effort expenditure depends upon prior estimates of decision difficulty, and what impact this will have on post-choice value representations.
Deriving the expected value of decision control
Let $z$ be the amount of cognitive (e.g., executive, mnemonic, or attentional) resources that serve to process value-relevant information. Allocating these resources will be associated with both a benefit $B\left(z\right)$ and a cost $C\left(z\right)$. As we will see, both are increasing functions of $z$: $B\left(z\right)$ derives from the refinement of internal representations of the subjective values of the alternative options or actions that compose the choice set, and $C\left(z\right)$ quantifies how aversive engaging cognitive resources is (mental effort). In line with the framework of the expected value of control, or EVC (Musslick et al., 2015; Shenhav et al., 2013), we assume that the brain chooses to allocate the amount of resources $\widehat{z}$ that optimizes the following cost–benefit trade-off:

$\widehat{z} = \underset{z}{\operatorname{argmax}}\, E\left[B\left(z\right) - C\left(z\right)\right]$ (Equation 1)
where the expectation accounts for predictable stochastic influences that ensue from allocating resources (this will be clarified below). Note that the benefit term $B\left(z\right)$ is the (weighted) choice confidence ${P}_{c}\left(z\right)$:

$B\left(z\right) = R\,{P}_{c}\left(z\right)$ (Equation 2)
where the weight $R$ is analogous to a reward and quantifies the importance of making a confident decision (see below). As we will see, ${P}_{c}\left(z\right)$ plays a pivotal role in the model, in that it captures the efficacy of allocating resources for processing value-relevant information. So, how do we define choice confidence?
We assume that decision makers may be unsure about how much they like/want the alternative options that compose the choice set. In other words, the internal representations of values ${V}_{i}$ of alternative options are probabilistic. Such a probabilistic representation of value can be understood in terms of, for example, an uncertain prediction regarding the to-be-experienced value of a given option. Without loss of generality, the probabilistic representation of option value takes the form of Gaussian probability density functions, as follows:

${V}_{i} \sim N\left({\mu}_{i}, {\sigma}_{i}\right)$ (Equation 3)
where ${\mu}_{i}$ and ${\sigma}_{i}$ are the mode and the variance of the probabilistic value representation, respectively (and $i$ indexes alternative options in the choice set).
This allows us to define choice confidence ${P}_{c}$ as the probability that the (predicted) experienced value of the (to-be-)chosen item is higher than that of the (to-be-)unchosen item. Assuming that the choice follows the sign of the preference $\Delta \mu = {\mu}_{1} - {\mu}_{2}$, and using a moment-matching approximation to the Gaussian cumulative density function (Daunizeau, 2017a), this yields:

${P}_{c} \approx s\left(\frac{\pi \left|\Delta \mu \right|}{\sqrt{3\left({\sigma}_{1}+{\sigma}_{2}\right)}}\right)$ (Equation 4)

where $s\left(x\right)=1/\left(1+{e}^{-x}\right)$ is the standard sigmoid mapping.
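For illustration, the accuracy of this sigmoidal approximation can be probed numerically. The following sketch (an illustration of ours, with arbitrary parameter values; it is not part of the original analyses) compares a Monte Carlo estimate of choice confidence with the moment-matched sigmoid approximation of the Gaussian cumulative density function, $\Phi(x) \approx s(\pi x/\sqrt{3})$:

```python
import numpy as np

def confidence_approx(mu1, mu2, v1, v2):
    """Moment-matched sigmoid approximation of choice confidence.

    Approximates P(V1 > V2) for V1 ~ N(mu1, v1), V2 ~ N(mu2, v2)
    (v1, v2 are variances), assuming the choice follows sign(mu1 - mu2).
    Phi(x) is replaced by the variance-matched logistic s(pi * x / sqrt(3)).
    """
    dmu = abs(mu1 - mu2)
    x = np.pi * dmu / np.sqrt(3.0 * (v1 + v2))
    return 1.0 / (1.0 + np.exp(-x))

def confidence_mc(mu1, mu2, v1, v2, n=200_000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = np.random.default_rng(seed)
    d = rng.normal(mu1 - mu2, np.sqrt(v1 + v2), n)
    # confidence = probability that the experienced value difference
    # agrees with the prior preference sign(mu1 - mu2)
    return np.mean(np.sign(d) == np.sign(mu1 - mu2))

if __name__ == "__main__":
    pc_hat = confidence_approx(0.6, 0.4, 0.04, 0.04)
    pc_mc = confidence_mc(0.6, 0.4, 0.04, 0.04)
    print(f"approx: {pc_hat:.3f}, Monte Carlo: {pc_mc:.3f}")
```

With these values, the approximation and the Monte Carlo estimate agree to within a few percentage points, which is the order of magnitude of the moment-matching error.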
As stated in the Introduction section, we assume that the brain's valuation system automatically generates uncertain estimates of options' value (Lebreton et al., 2009; Lebreton et al., 2015), before cognitive effort is invested in decision-making. In what follows, ${\mu}_{i}^{0}$ and ${\sigma}_{i}^{0}$ are the mode and variance of the ensuing prior value representations (we treat them as inputs to the MCD model). We also assume that these prior representations neglect existing value-relevant information that would require cognitive effort to be retrieved and processed (Lopez-Persem et al., 2016).
Now, how does the system anticipate the benefit of allocating resources to the decision process? Recall that the purpose of allocating resources is to process (yet unavailable) value-relevant information. The critical issue is thus to predict how both the uncertainty ${\sigma}_{i}$ and the modes ${\mu}_{i}$ of value representations will eventually change, before having actually allocated the resources (i.e., without having processed the information). In brief, allocating resources essentially has two impacts: (i) it decreases the uncertainty ${\sigma}_{i}$, and (ii) it perturbs the modes ${\mu}_{i}$ in a stochastic manner.
The former impact derives from assuming that the amount of information that will be processed increases with the amount of allocated resources. Here, this implies that the precision of a given probabilistic value representation increases in proportion to the amount of allocated effort, that is:

${\sigma}_{i}\left(z\right) = \frac{1}{\frac{1}{{\sigma}_{i}^{0}} + \beta z}$ (Equation 5)
where ${\sigma}_{i}^{0}$ is the prior variance of the representation (before any effort has been allocated), and $\beta $ controls the efficacy with which resources increase the precision of the value representation. Formally speaking, Equation 5 has the form of a Bayesian update of the belief's precision in a Gaussian-likelihood model, where the precision of the likelihood term is $\beta z$. More precisely, $\beta $ is the precision increase that follows from allocating a unitary amount of resources $z$. In what follows, we will refer to $\beta $ as the 'type #1 effort efficacy'.
The latter impact follows from acknowledging the fact that the system cannot know how processing more value-relevant information will affect its preference before having allocated the corresponding resources. Let ${\delta}_{i}\left(z\right)$ be the change in the position of the mode of the $i$th value representation, having allocated an amount $z$ of resources. The direction of the mode's perturbation ${\delta}_{i}\left(z\right)$ cannot be predicted because it is tied to the information that would be processed. However, a tenable assumption is to consider that the magnitude of the perturbation increases with the amount of information that will be processed. This reduces to stating that the variance of ${\delta}_{i}\left(z\right)$ increases in proportion to $z$, that is:

${\mu}_{i}\left(z\right) = {\mu}_{i}^{0} + {\delta}_{i}\left(z\right), \quad {\delta}_{i}\left(z\right) \sim N\left(0, \gamma z\right)$ (Equation 6)
where ${\mu}_{i}^{0}$ is the mode of the value representation before any effort has been allocated, and $\gamma $ controls the relationship between the amount of allocated resources and the variance of the perturbation term $\delta $. The higher the parameter $\gamma $, the greater the expected perturbation of the mode for a given amount of allocated resources. In what follows, we will refer to $\gamma $ as the 'type #2 effort efficacy'. Note that Equation 6 treats the impact of future information processing as a nonspecific random perturbation on the mode of the prior value representation. Our justification for this assumption is twofold: (i) it is simple, and (ii) it captures the idea that the MCD controller does not know how the allocated resources will be used (here, by the value-based decision system downstream). We will see that, in spite of this, the MCD controller can still make quantitative predictions regarding the expected benefit of allocating resources.
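To make the two effects concrete, the following sketch (ours; parameter values are arbitrary) simulates one post-effort value representation: the precision increases deterministically by $\beta z$ (Equation 5), while the mode is shifted by a random perturbation of variance $\gamma z$ (Equation 6):

```python
import numpy as np

def refine_representation(mu0, sigma0, z, beta, gamma, rng):
    """One draw of the post-effort value representation.

    sigma0 is the prior *variance*; beta and gamma are the type #1 and
    type #2 effort efficacies. Returns the perturbed mode and the
    reduced variance.
    """
    sigma_z = 1.0 / (1.0 / sigma0 + beta * z)     # precision grows by beta*z
    delta = rng.normal(0.0, np.sqrt(gamma * z))   # random mode perturbation
    return mu0 + delta, sigma_z

rng = np.random.default_rng(1)
mu_z, sigma_z = refine_representation(mu0=0.5, sigma0=0.1, z=2.0,
                                      beta=1.0, gamma=0.05, rng=rng)
print(sigma_z)  # 1 / (1/0.1 + 1.0*2.0) = 1/12
```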
Taken together, Equations 5 and 6 imply that predicting the net effect of allocating resources onto choice confidence is not trivial. On one hand, allocating effort will increase the precision of value representations (cf. Equation 5), which mechanically increases choice confidence, all other things being equal. On the other hand, allocating effort can either increase or decrease the absolute difference $\left|\Delta \mu \left(z\right)\right|$ between the modes. This, in fact, depends upon the sign of the perturbation terms $\delta $, which are not known in advance. Having said this, it is possible to derive the expected absolute difference between the modes that would follow from allocating an amount $z$ of resources:

$E\left[\left|\Delta \mu \right| \mid z\right] = \sqrt{\frac{4\gamma z}{\pi}}\, {e}^{-\frac{{\left(\Delta {\mu}^{0}\right)}^{2}}{4\gamma z}} + \Delta {\mu}^{0}\left(1 - 2\,s\left(-\frac{\pi \,\Delta {\mu}^{0}}{\sqrt{6\gamma z}}\right)\right)$ (Equation 7)
where we have used the expression for the first-order moment of the so-called 'folded normal distribution', and the second term in the right-hand side of Equation 7 derives from the same moment-matching approximation to the Gaussian cumulative density function as above. The expected absolute means' difference $E\left[\left|\Delta \mu \right| \mid z\right]$ depends upon both the absolute prior mean difference $\left|\Delta {\mu}^{0}\right|$ and the amount of allocated resources $z$. This is depicted in Figure 2.
One can see that $E\left[\left|\Delta \mu \right| \mid z\right] - \left|\Delta {\mu}^{0}\right|$ is always greater than zero whenever $z>0$, and increases with $z$ (if $z=0$, then $E\left[\left|\Delta \mu \right| \mid z\right] = \left|\Delta {\mu}^{0}\right|$). In other words, allocating resources is expected to increase the value difference, despite the fact that the impact of the perturbation term can go either way. In addition, the expected gain in value difference afforded by allocating resources decreases with the absolute prior means' difference.
Similarly, the variance $V\left[\left|\Delta \mu \right| \mid z\right]$ of the absolute means' difference is derived from the expression of the second-order moment of the corresponding folded normal distribution:

$V\left[\left|\Delta \mu \right| \mid z\right] = {\left(\Delta {\mu}^{0}\right)}^{2} + 2\gamma z - E{\left[\left|\Delta \mu \right| \mid z\right]}^{2}$ (Equation 8)
One can see in Figure 2 that $V\left[\left|\Delta \mu \right| \mid z\right]$ increases with the amount $z$ of allocated resources (but if $z=0$, then $V\left[\left|\Delta \mu \right| \mid z\right]=0$).
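The folded-normal moments invoked above can be checked against simulations. The sketch below (ours; it uses the exact Gaussian CDF rather than its sigmoid approximation, and arbitrary parameter values) computes both moments of $\left|\Delta\mu^0 + \delta\right|$ and compares the first one with a Monte Carlo estimate:

```python
import numpy as np
from math import erf, sqrt, pi, exp

def folded_moments(dmu0, z, gamma):
    """First two moments of |dmu0 + delta|, with delta ~ N(0, 2*gamma*z).

    Exact folded-normal formulas (assumes z > 0); the Gaussian CDF is
    used directly here, rather than its sigmoid approximation, so that
    the check is exact.
    """
    s2 = 2.0 * gamma * z                    # variance of the perturbation
    s = sqrt(s2)
    phi = 0.5 * (1.0 + erf(-dmu0 / (s * sqrt(2.0))))   # Phi(-dmu0 / s)
    mean = s * sqrt(2.0 / pi) * exp(-dmu0 ** 2 / (2.0 * s2)) \
        + dmu0 * (1.0 - 2.0 * phi)
    var = dmu0 ** 2 + s2 - mean ** 2        # E[X^2] - E[X]^2
    return mean, var

# Monte Carlo check (arbitrary illustrative values)
rng = np.random.default_rng(0)
dmu0, z, gamma = 0.1, 1.0, 0.05
samples = np.abs(dmu0 + rng.normal(0.0, sqrt(2.0 * gamma * z), 500_000))
mean, var = folded_moments(dmu0, z, gamma)
print(f"analytical mean: {mean:.4f}, empirical mean: {samples.mean():.4f}")
```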
Knowing the moments of the distribution of $\left|\Delta \mu \right|$ now enables us to derive the expected confidence level ${\overline{P}}_{c}\left(z\right)$ that would result from allocating the amount of resources $z$:
where we have assumed, for the sake of conciseness, that both prior value representations are similarly uncertain (i.e., ${\sigma}_{1}^{0}\approx {\sigma}_{2}^{0}\triangleq {\sigma}^{0}$). It turns out that the expected choice confidence ${\overline{P}}_{c}\left(z\right)$ always increases with $z$, irrespective of the efficacy parameters, as long as $\beta \ne 0$ or $\gamma \ne 0$. These efficacy parameters, however, control the magnitude of the confidence gain that can be expected from allocating an amount $z$ of resources. Equation 9 is important, because it quantifies the expected benefit of resource allocation, before having processed the ensuing value-relevant information. More details regarding the accuracy of Equation 9 can be found in section 1 of the Appendix. In addition, section 2 of the Appendix summarizes the dependence of MCD-optimal choice confidence on $\left|\Delta {\mu}^{0}\right|$ and ${\sigma}^{0}$.
To complete the cost–benefit model, we simply assume that the cost of allocating resources to the decision process linearly scales with the amount of resources, that is:

$C\left(z\right) = \alpha z$ (Equation 10)
where $\alpha $ determines the effort cost of allocating a unitary amount of resources $z$. In what follows, we will refer to $\alpha $ as the ‘effort unitary cost’. We note that weak nonlinearities in the cost function (e.g., quadratic terms) would not qualitatively change the model predictions.
In brief, the MCD-optimal resource allocation $\widehat{z}\triangleq \widehat{z}\left(\alpha ,\beta ,\gamma \right)$ is simply given by:

$\widehat{z} = \underset{z}{\operatorname{argmax}}\left\{R\,{\overline{P}}_{c}\left(z\right) - C\left(z\right)\right\}$ (Equation 11)
which does not have any closed-form analytic solution. Nevertheless, it can easily be identified numerically, after inserting Equations 7–9 into Equation 11. We refer readers interested in the impact of the model parameters $\left\{\alpha ,\beta ,\gamma \right\}$ on the MCD-optimal control to section 2 of the Appendix.
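As an illustration of such a numerical identification (ours, not the authors' implementation), the sketch below performs a one-dimensional grid search over $z$. Note that, for brevity, the expected-confidence term is a simplified stand-in for Equation 9: it plugs the expected absolute mean difference and the posterior variance $\sigma\left(z\right)$ into the confidence sigmoid, and neglects the variance of $\left|\Delta\mu\right|$:

```python
import numpy as np
from math import erf, sqrt, pi, exp

def mcd_optimal_effort(dmu0, sigma0, R=1.0, alpha=0.1, beta=1.0, gamma=0.05):
    """Grid search for the MCD-optimal resource allocation (Equation 11).

    Simplified sketch: expected confidence plugs E[|dmu| given z] (the
    folded-normal mean) and the posterior variance sigma(z) into the
    confidence sigmoid, neglecting the variance of |dmu|.
    """
    def exp_abs_dmu(z):
        # expected absolute mean difference after allocating z (cf. Eq. 7)
        if z == 0.0:
            return abs(dmu0)
        s2 = 2.0 * gamma * z                   # variance of the perturbation
        s = sqrt(s2)
        phi = 0.5 * (1.0 + erf(-abs(dmu0) / (s * sqrt(2.0))))
        return s * sqrt(2.0 / pi) * exp(-dmu0 ** 2 / (2.0 * s2)) \
            + abs(dmu0) * (1.0 - 2.0 * phi)

    def expected_confidence(z):
        sigma_z = 1.0 / (1.0 / sigma0 + beta * z)   # posterior variance (Eq. 5)
        x = pi * exp_abs_dmu(z) / sqrt(3.0 * 2.0 * sigma_z)
        return 1.0 / (1.0 + np.exp(-x))

    zs = np.linspace(0.0, 10.0, 1001)
    evc = np.array([R * expected_confidence(z) - alpha * z for z in zs])
    return float(zs[int(np.argmax(evc))])

# a hard decision (small prior mean difference) vs an easy one
z_easy = mcd_optimal_effort(dmu0=0.5, sigma0=0.05)
z_hard = mcd_optimal_effort(dmu0=0.05, sigma0=0.05)
print(z_easy, z_hard)
```

Consistent with the MCD intuition, under these parameter values the hard decision attracts strictly more effort than the easy one, for which deliberation is not worth its cost.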
At this point, we do not specify how Equation 11 is solved by neural networks in the brain. Many alternatives are possible, from gradient ascent (Seung, 2003) to winnertakeall competition of candidate solutions (Mao and Massaquoi, 2007). We will also comment on the specific issue of prospective (offline) versus reactive (online) MCD processes in the Discussion section.
Note: implicit in the above model derivation is the assumption that the allocation of resources is similar for both alternative options in the choice set (i.e., ${z}_{1}\approx {z}_{2}\triangleq z$). This simplifying assumption is justified by eye-tracking data (cf. section 8 of the Appendix).
Corollary predictions of the MCD model
In the previous section, we derived the MCD-optimal resource allocation $\widehat{z}$, which effectively best balances the expected choice confidence with the expected effort costs, given the predictable impact of stochastic perturbations that arise from processing value-relevant information. This quantitative prediction is effectively shown in Figures 5 and 6 of the Results section below, as a function of (empirical proxies for) the prior absolute difference between modes $\left|\Delta {\mu}^{0}\right|$ and the prior certainty $1/{\sigma}^{0}$ of value representations. But this resource allocation mechanism has a few interesting corollary implications.
To begin with, note that knowing $\widehat{z}$ enables us to predict what confidence level the system should eventually reach. In fact, one can define the MCD-optimal confidence level as the expected confidence evaluated at the MCD-optimal amount of allocated resources, that is, ${\overline{P}}_{c}\left(\widehat{z}\right)$. This is important, because it implies that the model can predict both the effort the system will invest and its associated confidence, on a decision-by-decision basis. The impact of the efficacy parameters on this quantitative prediction is detailed in section 2 of the Appendix.
Additionally, $\widehat{z}$ determines the expected improvement in the certainty of value representations (hereafter: the 'certainty gain'), which trivially relates to type #1 efficacy, that is: $1/\sigma \left(\widehat{z}\right) - 1/{\sigma}^{0} = \beta \widehat{z}$. This also means that, under the MCD model, no choice-induced value certainty gain can be expected when $\beta =0$.
Similarly, one can predict the MCD-optimal probability of changing one's mind. Recall that the probability $Q\left(z\right)$ of changing one's mind depends on the amount of allocated resources $z$, that is:

$Q\left(z\right) = P\left(\operatorname{sign}\left(\Delta \mu \left(z\right)\right) \ne \operatorname{sign}\left(\Delta {\mu}^{0}\right)\right) \approx s\left(-\frac{\pi \left|\Delta {\mu}^{0}\right|}{\sqrt{6\gamma z}}\right)$ (Equation 12)
One can see that the MCD-optimal probability of changing one's mind $Q\left(\widehat{z}\right)$ is a simple monotonic function of the allocated effort $\widehat{z}$. Importantly, $Q\left(z\right)=0$ when $\gamma =0$. This implies that MCD agents do not change their minds when effort cannot change the relative position of the modes of the options' value representations (irrespective of type #1 effort efficacy). In retrospect, this is critical, because there should be no incentive to invest resources in deliberation if it were not possible to change one's pre-deliberation preference.
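This monotonicity can be checked numerically. The sketch below is our own reconstruction (with arbitrary parameter values): it assumes that a change of mind occurs when the cumulative perturbation $\Delta\delta\left(z\right) \sim N\left(0, 2\gamma z\right)$ flips the sign of the prior preference, and replaces the Gaussian CDF with its moment-matched sigmoid:

```python
import numpy as np
from math import sqrt, pi

def q_change_of_mind(dmu0, z, gamma):
    """Sigmoid approximation of the probability of changing one's mind.

    Reconstruction: a change of mind occurs when the cumulative
    perturbation delta ~ N(0, 2*gamma*z) overturns the prior mean
    difference dmu0; the Gaussian CDF is replaced by its moment-matched
    sigmoid.
    """
    if gamma * z == 0.0:
        return 0.0   # no perturbation, hence no change of mind
    return 1.0 / (1.0 + np.exp(pi * abs(dmu0) / sqrt(6.0 * gamma * z)))

# Monte Carlo check (arbitrary illustrative values)
rng = np.random.default_rng(0)
dmu0, z, gamma = 0.1, 1.0, 0.05
flips = (dmu0 + rng.normal(0.0, sqrt(2.0 * gamma * z), 400_000)) < 0.0
print(q_change_of_mind(dmu0, z, gamma), flips.mean())
```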
Lastly, we can predict the magnitude of choice-induced preference change, that is, how value representations are supposed to spread apart during the decision. Such an effect is typically measured in terms of the so-called 'spreading of alternatives', or SoA, which is defined as follows:
where $\Delta \delta \left(z\right)\sim N\left(0,2\gamma z\right)$ is the cumulative perturbation term of the modes' difference. Taking the expectation of the right-hand term of Equation 13 under the distribution of $\Delta \delta \left(z\right)$ and evaluating it at $z=\widehat{z}$ now yields the MCD-optimal spreading of alternatives $\overline{SoA}\left(\widehat{z}\right)$:
where the last line derives from the expression of the first-order moment of the truncated Gaussian distribution. Note that the expected preference change also increases monotonically with the allocated effort $\widehat{z}$. Here again, under the MCD model, no preference change can be expected when $\gamma =0$.
We note that all of these corollary predictions essentially capture choice-induced modifications of value representations. This is why we will refer to choice confidence, value certainty gain, change of mind, and spreading of alternatives as 'decision-related' variables.
Correspondence between model variables and empirical measures
In summary, the MCD model predicts cognitive effort (or, more properly, the amount of allocated resources) and decision-related variables, given the prior absolute difference between modes $\left|\Delta {\mu}^{0}\right|$ and the prior certainty $1/{\sigma}^{0}$ of value representations. In other words, the inputs to the MCD model are the prior moments of value representations, whose trial-by-trial variations determine variations in model predictions. Here, we simply assume that pre-choice value and value certainty ratings provide us with an approximation of these prior moments. More precisely, we use ΔVR^{0} and VCR^{0} (cf. Methods section below) as empirical proxies for $\Delta {\mu}^{0}$ and $1/{\sigma}^{0}$, respectively. Accordingly, we consider post-choice value and value certainty ratings as empirical proxies for the posterior mean $\mu \left(\widehat{z}\right)$ and precision $1/\sigma \left(\widehat{z}\right)$ of value representations, at the time when the decision was triggered (i.e., after having invested the effort $\widehat{z}$). Similarly, we match the expected choice confidence ${\overline{P}}_{c}\left(\widehat{z}\right)$ (i.e., after having invested the effort $\widehat{z}$) with empirical choice confidence.
Note that the MCD model does not specify what the allocated resources are. In principle, both mnesic and attentional resources may be engaged when processing value-relevant information. Nevertheless, what really matters is assessing the magnitude $z$ of decision-related effort. We think of $z$ as the cumulative engagement of neurocognitive resources, which varies both in terms of duration and intensity. Empirically, we relate $\widehat{z}$ to two different 'effort-related' empirical measures, namely the subjective feeling of effort and response time. The former relies on the subjective cost incurred when deploying neurocognitive resources, which would be signaled by experiencing mental effort. The latter makes sense if one thinks of response time in terms of effort duration. Although it is a more objective measurement than the subjective rating of effort, response time only approximates $\widehat{z}$ if effort intensity shows relatively small variations. We will comment on this in the Discussion section.
Finally, the MCD model is also agnostic about the definition of ‘decision importance’, that is, the weight $R$ in Equation 2. In this work, we simply investigate the effect of decision importance by comparing subjective effort and response time in ‘neutral’ versus ‘consequential’ decisions (cf. section 'Task conditions' below). We will also comment on this in the Discussion section.
Materials and methods
Participants
Participants for our study were recruited from the RISC (Relais d'Information sur les Sciences de la Cognition) subject pool through the ICM (Institut du Cerveau et de la Moelle – Paris Brain Institute). All participants were native French speakers, with no reported history of psychiatric or neurological illness. A total of 41 people (28 female; age: mean = 28, SD = 5, min = 20, max = 40) participated in this study. The experiment lasted approximately 2 hr, and participants were paid a flat rate of 20€ as compensation for their time, plus a bonus, which was given to participants to compensate for potential financial losses in the 'penalized' trials (see below). More precisely, in 'penalized' trials, participants lost 0.20€ (out of a 5€ bonus) for each second that they took to make their choice. This resulted in an average 4€ bonus (across participants). One group of 11 participants was excluded from the cross-condition analysis only (see below) due to technical issues.
Materials
Written instructions provided detailed information about the sequence of tasks within the experiment, the mechanics of how participants would perform the tasks, and images illustrating what a typical screen within each task section would look like. The experiment was developed using Matlab and PsychToolbox, and was conducted entirely in French. The stimuli for this experiment were 148 digital images, each representing a distinct food item (50 fruits, 50 vegetables, and 48 various snack items including nuts, meats, and cheeses). Food items were selected such that most items would be well known to most participants.
Eye gaze position and pupil size were continuously recorded throughout the duration of the experiment using The Eye Tribe eye-tracking devices. Participants' head positions were fixed using stationary chinrests. In case of incidental movements, we corrected the pupil size data for distance to screen, separately for each eye.
Task design
Prior to commencing the testing session of the experiment, participants underwent a brief training session. The training tasks were identical to the experimental tasks, although different stimuli were used (beverages). The experiment itself began with an initial section where all individual items were displayed in a random sequence for 1.5 s each, in order to familiarize the participants with the set of options they would later be considering and to let them form an impression of the range of subjective value for the set. The main experiment was divided into three sections, following the classic Free-Choice Paradigm protocol (e.g., Izuma and Murayama, 2013): pre-choice item ratings, choice, and post-choice item ratings. There was no time limit for the overall experiment, nor for the different sections, nor for the individual trials. The item rating and choice sections are described below (see Figure 3).
Item rating (same for pre-choice and post-choice sessions)
Participants were asked to rate the entire set of items in terms of how much they liked each item. The items were presented one at a time in a random sequence (pseudo-randomized across participants). At the onset of each trial, a fixation cross appeared at the center of the screen for 750 ms. Next, a solitary image of a food item appeared at the center of the screen. Participants had to respond to the question, 'How much do you like this item?' using a horizontal slider scale (from 'I hate it!' to 'I love it!') to indicate their value rating for the item. The middle of the scale was the point of neutrality ('I don't care about it.'). Hereafter, we refer to the reported value as the 'pre-choice value rating'. Participants then had to respond to the question, 'What degree of certainty do you have?' (about the item's value) by expanding a solid bar symmetrically around the cursor of the value slider scale, to indicate the range of possible value ratings that would be compatible with their subjective feeling. We measured participants' certainty about the value rating in terms of the percentage of the value scale that is not occupied by the reported range of compatible value ratings. We refer to this as the 'pre-choice value certainty rating'. The next trial then began.
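In other words, the certainty rating is one minus the fraction of the scale occupied by the reported interval. A minimal sketch (the function name is ours):

```python
def value_certainty_rating(interval_width, scale_width=1.0):
    """Value certainty rating: the fraction of the rating scale that is
    NOT occupied by the reported range of compatible value ratings."""
    return 1.0 - interval_width / scale_width

print(value_certainty_rating(0.25))  # a bar covering 25% of the scale -> 0.75
```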
Note
In the Results section below, ΔVR^{0} is the difference between the pre-choice value ratings of the items composing a choice set. Similarly, VCR^{0} is the average of the pre-choice value certainty ratings across the items composing a choice set. Both the value and value certainty rating scales range from 0 to 1 (but participants were unaware of the quantitative units of the scales).
Choice
Participants were asked to choose between pairs of items in terms of which item they preferred. The entire set of items was presented one pair at a time in a random sequence. Each item appeared in only one pair. At the onset of each trial, a fixation cross appeared at the center of the screen for 750 ms. Next, two images of snack items appeared on the screen: one toward the left and one toward the right. Participants had to respond to the question, 'Which do you prefer?' using the left or right arrow key. We measured response time in terms of the delay between the stimulus onset and the response. Participants then had to respond to the question, 'Are you sure about your choice?' using a vertical slider scale (from 'Not at all!' to 'Absolutely!'). We refer to this as the report of choice confidence. Finally, participants had to respond to the question, 'To what extent did you think about this choice?' using a horizontal slider scale (from 'Not at all!' to 'Really a lot!'). We refer to this as the report of subjective effort. The next trial then began.
Task conditions
We partitioned the task trials into three conditions, which were designed to test the following two predictions of the MCD model: all else equal, effort should increase with decision importance and decrease with related costs. We aimed to check the former prediction by asking participants to make some decisions where they knew that the choice would be real, that is, they would actually have to eat the chosen food item at the end of the experiment. We refer to these trials as ‘consequential’ decisions. To check the latter prediction, we imposed a financial penalty that increased with response time. More precisely, participants were instructed that they would lose 0.20€ (out of a 5€ bonus) for each second that they would take to make their choice. The choice section of the experiment was composed of 60 ‘neutral’ trials, 7 ‘consequential’ trials, and 7 ‘penalized’ trials, which were randomly intermixed. Instructions for both ‘consequential’ and ‘penalized’ decisions were repeated at each relevant trial, immediately prior to the presentation of the choice items.
Probabilistic model fit
The MCD model predicts trial-by-trial variations in the probability of changing one’s mind, choice confidence, spreading of alternatives, certainty gain, response time, and subjective effort ratings (MCD outputs) from trial-by-trial variations in value rating difference ΔVR^{0} and mean value certainty rating VCR^{0} (MCD inputs). Together, three unknown parameters control the quantitative relationship between MCD inputs and outputs: the effort unitary cost $\alpha $, type #1 effort efficacy $\beta $, and type #2 effort efficacy $\gamma $. However, additional parameters are required to capture variations induced by experimental conditions. Recall that we expect ‘consequential’ decisions to be more important than ‘neutral’ ones, and ‘penalized’ decisions effectively include an extraneous cost-of-time term. One can model the former condition effect by making $R$ (cf. Equation 2) sensitive to whether the decision is consequential or not. We proxy the latter condition effect by making the effort unitary cost $\alpha $ a function of whether the decision is penalized (where effort induces both intrinsic and extrinsic costs) or not (intrinsic effort cost only). This means that condition effects require one additional parameter each.
In principle, all of these parameters may vary across people, thereby capturing idiosyncrasies in people’s (meta)cognitive apparatus. However, in addition to estimating these five parameters, fitting the MCD model to each participant’s data also requires a rescaling of the model’s output variables. This is because there is no reason to expect the empirical measure of these variables to match their theoretical scale. We thus inserted two additional nuisance parameters per MCD output variable, which implement a linear rescaling (an affine transformation, with a positivity constraint on slope parameters). Importantly, these nuisance parameters cannot change the relationship between MCD inputs and outputs. In other words, the MCD model really has only five degrees of freedom.
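A minimal sketch of this parameterization may help (all names and numerical values below are hypothetical, chosen for illustration only; the real estimates are subject-specific):

```python
import numpy as np

# Three core MCD parameters plus one extra parameter per condition effect.
theta = dict(alpha=1.0,                 # effort unitary cost
             beta=2.0, gamma=0.5,       # type #1 and type #2 effort efficacies
             dR_consequential=0.8,      # extra importance R on consequential trials
             dalpha_penalized=1.5)      # extrinsic effort cost on penalized trials

def effective_params(theta, consequential, penalized, R0=1.0):
    """Condition-dependent decision importance R and effort cost alpha."""
    R = R0 + (theta["dR_consequential"] if consequential else 0.0)
    alpha = theta["alpha"] + (theta["dalpha_penalized"] if penalized else 0.0)
    return R, alpha

def rescale(x, log_slope, intercept):
    """Affine nuisance rescaling of an MCD output; exp() keeps the slope positive."""
    return np.exp(log_slope) * x + intercept

R, alpha = effective_params(theta, consequential=True, penalized=False)
print(R, alpha)  # 1.8 1.0
```

The nuisance rescaling only stretches and shifts each output, so it cannot alter how MCD inputs map onto outputs, consistent with the five-degrees-of-freedom point above.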
For each subject, we fit all MCD dependent variables concurrently with a single set of MCD parameters. Within-subject probabilistic parameter estimation was performed using the variational Laplace approach (Daunizeau, 2017b; Friston et al., 2007), which is made available from the VBA toolbox (Daunizeau et al., 2014). We refer the reader interested in the mathematical details of within-subject MCD parameter estimation to section 3 of the Appendix (which also includes a parameter recovery analysis). In what follows, we compare empirical data to MCD-fitted dependent variables (when binned according to ΔVR^{0} and VCR^{0}). We refer to the latter as ‘postdictions’, in the sense that they derive from a posterior predictive density that is conditional on the corresponding data.
We also fit the MCD model on reduced subsets of dependent variables (e.g., only ‘effort-related’ variables), and report proper out-of-sample predictions of data that were not used for parameter estimation (e.g., ‘decision-related’ variables). We note that this is a strong test of the model, since it does not rely on any train/test partition of the predicted variable (see the next section below).
Results
Here, we test the predictions of the MCD model. We note that basic descriptive statistics of our data, including measures of test–retest reliability and replications of previously reported effects on confidence in value-based choices (De Martino et al., 2013), are appended in sections 5–7 of the Appendix.
Withinsubject model fit accuracy and outofsample predictions
To capture idiosyncrasies in participants’ metacognitive control of decisions, the MCD model was fitted to subject-specific trial-by-trial data, where all MCD outputs (namely change of mind, choice confidence, spreading of alternatives, value certainty gain, response time, and subjective effort ratings) were considered together. In the next section, we present summary statistics at the group level, which validate the predictions that can be derived from the MCD model when fitted to all dependent variables. But can we provide even stronger evidence that the MCD model is capable of predicting all dependent variables at once? In particular, can the model make out-of-sample predictions regarding effort-related variables (i.e., RT and subjective effort ratings) given decision-related variables (i.e., choice confidence, change of mind, spreading of alternatives, and certainty gain), and vice versa?
To address this question, we performed two partial model fits: (i) with decision-related variables only, and (ii) with effort-related variables only. In both cases, out-of-sample predictions for the remaining dependent variables were obtained directly from within-subject parameter estimates. For each subject, we then estimated the cross-trial correlation between each pair of observed and predicted variables. Figure 4 reports the ensuing group-average correlations, for each dependent variable and each model fit. In this context, the predictions derived when fitting the full dataset only serve as a reference point for evaluating the accuracy of out-of-sample predictions. For completeness, we also show chance-level prediction accuracy (i.e., the 95th percentile of group-average correlations between observed and predicted variables under the null).
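Such a chance level can be estimated, for instance, by shuffling predictions across trials within each subject, which destroys any genuine coupling. The sketch below illustrates this idea on synthetic data (all names and numbers are ours; we do not know the authors' exact null procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def null_chance_level(observed, predicted, n_perm=1000, q=95):
    """observed, predicted: lists of per-subject trial vectors.
    Returns the q-th percentile of group-average correlations obtained
    after shuffling predictions across trials (breaking true coupling)."""
    null_stats = []
    for _ in range(n_perm):
        rs = [np.corrcoef(obs, rng.permutation(pred))[0, 1]
              for obs, pred in zip(observed, predicted)]
        null_stats.append(np.mean(rs))
    return np.percentile(null_stats, q)

# Toy data: 20 subjects, 60 trials each, predictions unrelated to observations.
observed = [rng.standard_normal(60) for _ in range(20)]
predicted = [rng.standard_normal(60) for _ in range(20)]
chance = null_chance_level(observed, predicted)
print(round(chance, 3))  # a small positive value close to zero
```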
In what follows, we refer to model predictions of dependent variables that were actually fitted by the model as ‘postdictions’ (full data fits: all dependent variables; partial model fits: variables included in the fit). As one would expect, the accuracy of postdictions is typically higher than that of out-of-sample predictions. Slightly more interesting, perhaps, is the fact that the accuracy of model predictions/postdictions depends upon which output variable is considered. For example, choice confidence is always better predicted/postdicted than spreading of alternatives. This is most likely because the latter data has lower reliability.
But the main result of this analysis is the fact that out-of-sample predictions of dependent variables perform systematically better than chance. In fact, all across-trial correlations between observed and predicted (out-of-sample) data were statistically significant at the group level (all p<10^{−3}). In particular, this implies that the MCD model makes accurate out-of-sample predictions regarding effort-related variables given decision-related variables, and reciprocally.
Predicting effortrelated variables
In what follows, we inspect the three-way relationships between pre-choice value and value certainty ratings and each effort-related variable: namely, RT and subjective effort rating. The former can be thought of as a proxy for the duration of resource allocation, whereas the latter is a metacognitive readout of resource allocation cost. Unless stated otherwise, we will focus on both the absolute difference between pre-choice value ratings (hereafter: ΔVR^{0}) and the mean pre-choice value certainty rating across paired choice items (hereafter: VCR^{0}). Under the MCD model, increasing ΔVR^{0} and/or VCR^{0} will decrease the demand for effort, which should result in smaller expected RT and subjective effort rating. We will now summarize the empirical data and highlight the corresponding quantitative MCD model postdictions and out-of-sample predictions (here: predictions are derived from model fits on decision-related variables only, that is, all dependent variables except RT and subjective effort rating).
First, we checked how RT relates to pre-choice value and value certainty ratings. For each subject, we regressed (log) RT data against ΔVR^{0} and VCR^{0}, and then performed a group-level random-effect analysis on the regression weights. The results of this model-free analysis provide a qualitative summary of the impact of trial-by-trial variations in pre-choice value representations on RT. We also compare RT data with both MCD model postdictions (full data fit) and out-of-sample predictions. In addition to summarizing the results of the model-free analysis, Figure 5 shows the empirical, predicted, and postdicted RT data, when median-split (within subjects) according to both ΔVR^{0} and VCR^{0}.
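This model-free analysis can be sketched as follows, on synthetic data with effect sizes chosen to mimic those reported below (the per-subject design and code are ours, not the authors'):

```python
import numpy as np
from scipy import stats

def standardized_betas(y, X):
    """OLS regression weights after z-scoring y and each column of X."""
    z = lambda v: (v - v.mean()) / v.std()
    Xz = np.column_stack([np.ones(len(y))] + [z(x) for x in X.T])
    b, *_ = np.linalg.lstsq(Xz, z(y), rcond=None)
    return b[1:]  # drop the intercept

rng = np.random.default_rng(1)
betas = []
for _ in range(40):  # toy group of 40 subjects, 60 trials each
    dVR0, VCR0 = rng.random(60), rng.random(60)
    log_rt = -0.16 * dVR0 - 0.08 * VCR0 + 0.3 * rng.standard_normal(60)
    betas.append(standardized_betas(log_rt, np.column_stack([dVR0, VCR0])))
betas = np.array(betas)

# Group-level random-effect analysis: t-test on each weight across subjects
# (halving the two-sided p-value gives the one-sided test for a negative effect).
t, p_two = stats.ttest_1samp(betas, 0.0, axis=0)
p_one = p_two / 2
```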
One can see that RT data behave as expected under the MCD model, that is, RT decreases when ΔVR^{0} and/or VCR^{0} increases. The random-effect analysis shows that both variables have a significant negative effect at the group level (ΔVR^{0}: mean standardized regression weight = −0.16, s.e.m. = 0.02, p<10^{−3}; VCR^{0}: mean standardized regression weight = −0.08, s.e.m. = 0.02, p<10^{−3}; one-sided t-tests). Moreover, MCD postdictions are remarkably accurate at capturing the effect of both ΔVR^{0} and VCR^{0} in a quantitative manner. Although MCD out-of-sample predictions are also very accurate, they tend to slightly underestimate the quantitative effect of ΔVR^{0}. This is because this effect is typically less pronounced in decision-related variables than in effort-related variables (see below), which then yields MCD parameter estimates that eventually attenuate the impact of ΔVR^{0} on effort.
Second, we checked how subjective effort ratings relate to pre-choice value and value certainty ratings. We performed the same analysis as above, the results of which are summarized in Figure 6.
Here as well, subjective effort rating data behave as expected under the MCD model, that is, subjective effort decreases when ΔVR^{0} and/or VCR^{0} increases. The random-effect analysis shows that both variables have a significant negative effect at the group level (ΔVR^{0}: mean standardized regression weight = −0.21, s.e.m. = 0.03, p<10^{−3}; VCR^{0}: mean regression weight = −0.05, s.e.m. = 0.02, p=0.027; one-sided t-tests). One can see that MCD postdictions and out-of-sample predictions accurately capture the effect of both ΔVR^{0} and VCR^{0}. More quantitatively, we note that MCD postdictions slightly overestimate the effect of VCR^{0}, whereas out-of-sample predictions also tend to underestimate the effect of ΔVR^{0}.
At this point, we note that the MCD model makes two additional predictions regarding effort-related variables, which relate to our task conditions. In brief, all else equal, effort should increase in ‘consequential’ trials, while it should decrease in ‘penalized’ trials. To test these predictions, we modified the model-free regression analysis of RT and subjective effort ratings by including two additional subject-level regressors, encoding consequential and penalized trials, respectively. Figure 7 shows the ensuing augmented set of standardized regression weights for both RT and subjective effort ratings.
First, we note that accounting for task conditions does not modify the statistical significance of the impact of ΔVR^{0} and VCR^{0} on effort-related variables, except for the effect of VCR^{0} on subjective effort ratings (p=0.09, one-sided t-test). Second, one can see that the impact of the ‘consequential’ and ‘penalized’ conditions on effort-related variables globally conforms to MCD predictions. More precisely, both RT and subjective effort ratings were significantly higher for ‘consequential’ decisions than for ‘neutral’ decisions (log-RT: mean standardized regression weight = 0.07, s.e.m. = 0.03, p=0.036; effort ratings: mean standardized regression weight = 0.12, s.e.m. = 0.03, p<10^{−3}; one-sided t-tests). In addition, response times are significantly faster for ‘penalized’ than for ‘neutral’ decisions (mean standardized regression weight = −0.26, s.e.m. = 0.03, p<10^{−3}; one-sided t-test). However, the difference in subjective effort ratings between ‘neutral’ and ‘penalized’ decisions does not reach statistical significance (mean effort difference = 0.012, s.e.m. = 0.024, p=0.66; two-sided t-test). We will comment on this in the Discussion section.
Predicting decisionrelated variables
Under the MCD model, ‘decision-related’ dependent variables (i.e., choice confidence, change of mind, spreading of alternatives, and value certainty gain) are determined by the amount of resources allocated to the decision. However, their relationship to features of prior value representation is not trivial (see section 2 of the Appendix for the specific case of choice confidence). For this reason, we will recapitulate the qualitative MCD prediction that can be made about each of them, prior to summarizing the empirical data and its corresponding postdictions and out-of-sample predictions. Note that here, the latter are obtained from a model fit on effort-related variables only.
First, we checked how choice confidence relates to ΔVR^{0} and VCR^{0}. Under the MCD model, choice confidence reflects the discriminability of the options’ value representations after optimal resource allocation. Recall that more resources are allocated to the decision when either ΔVR^{0} or VCR^{0} decreases. However, under moderate effort efficacies, this does not overcompensate for decision difficulty, and thus choice confidence should decrease. As with effort-related variables, we regressed trial-by-trial confidence data against ΔVR^{0} and VCR^{0}, and then performed a group-level random-effect analysis on the regression weights. The results of this analysis, as well as the comparison between empirical, predicted, and postdicted confidence data, are shown in Figure 8.
The results of the group-level random-effect analysis confirm our qualitative predictions. In brief, both ΔVR^{0} (mean standardized regression weight = 0.25, s.e.m. = 0.02, p<10^{−3}; one-sided t-test) and VCR^{0} (mean standardized regression weight = 0.16, s.e.m. = 0.03, p<10^{−3}; one-sided t-test) have a significant positive impact on choice confidence. Here again, MCD postdictions and out-of-sample predictions are remarkably accurate at capturing the effect of both ΔVR^{0} and VCR^{0} (though predictions slightly underestimate the effect of ΔVR^{0}).
Second, we checked how change of mind relates to ΔVR^{0} and VCR^{0}. Note that we define a change of mind according to two criteria: (i) the choice is incongruent with the prior preference inferred from the pre-choice value ratings, and (ii) the choice is congruent with the posterior preference inferred from the post-choice value ratings. The latter criterion distinguishes a change of mind from a mere ‘error’, which may arise from attentional and/or motor lapses. Under the MCD model, we expect no change of mind unless type #2 efficacy $\gamma \ne 0$. In addition, the rate of change of mind should decrease when either ΔVR^{0} or VCR^{0} increases. This is because increasing ΔVR^{0} and/or VCR^{0} will decrease the demand for effort, which implies that the probability of reversing the prior preference will be smaller. Figure 9 shows the corresponding model predictions/postdictions and summarizes the corresponding empirical data.
Note that, on average, the rate of change of mind reaches about 14.5% (s.e.m. = 0.008, p<10^{−3}, one-sided t-test), which is significantly higher than the rate of ‘errors’ (mean rate difference = 2.3%, s.e.m. = 0.01, p=0.032; two-sided t-test). The results of the group-level random-effect analysis confirm our qualitative MCD predictions. In brief, both ΔVR^{0} (mean standardized regression weight = −0.17, s.e.m. = 0.02, p<10^{−3}; one-sided t-test) and VCR^{0} (mean standardized regression weight = −0.08, s.e.m. = 0.03, p<10^{−3}; one-sided t-test) have a significant negative impact on the rate of change of mind. Again, MCD postdictions and out-of-sample predictions are remarkably accurate at capturing the effect of both ΔVR^{0} and VCR^{0} (though predictions slightly underestimate the effect of ΔVR^{0}).
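The two-criterion classification of changes of mind versus errors can be written compactly. A sketch on hypothetical rating values (the helper name is ours):

```python
import numpy as np

def classify_choice(pre_ratings, post_ratings, chosen):
    """Classify one trial following the two criteria in the text.
    pre_ratings/post_ratings: (value_left, value_right); chosen: 0 or 1."""
    prior_pref = int(np.argmax(pre_ratings))
    posterior_pref = int(np.argmax(post_ratings))
    if chosen == prior_pref:
        return "congruent"
    # The choice went against the prior preference: a 'change of mind' only if
    # it agrees with the posterior preference; otherwise a mere 'error'.
    return "change_of_mind" if chosen == posterior_pref else "error"

print(classify_choice((0.7, 0.4), (0.5, 0.6), chosen=1))  # change_of_mind
```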
Third, we checked how spreading of alternatives relates to ΔVR^{0} and VCR^{0}. Recall that spreading of alternatives measures the magnitude of choice-induced preference change. Under the MCD model, the reported value of alternative options cannot spread apart unless type #2 efficacy $\gamma \ne 0$. In addition, and as with change of mind, spreading of alternatives should globally follow the optimal effort allocation, that is, it should decrease when ΔVR^{0} and/or VCR^{0} increase. Figure 10 shows the corresponding model predictions/postdictions and summarizes the corresponding empirical data.
One can see that there is a significant positive spreading of alternatives (mean = 0.04 A.U., s.e.m. = 0.004, p<10^{−3}, one-sided t-test). This is reassuring, because it dismisses the possibility that $\gamma =0$ (which would mean that effort does not perturb the mode of value representations). In addition, the results of the group-level random-effect analysis confirm that both ΔVR^{0} (mean standardized regression weight = −0.09, s.e.m. = 0.03, p=0.001; one-sided t-test) and VCR^{0} (mean standardized regression weight = −0.04, s.e.m. = 0.02, p=0.03; one-sided t-test) have a significant negative impact on spreading of alternatives. Note that this replicates previous findings on choice-induced preference change (Lee and Coricelli, 2020; Lee and Daunizeau, 2020). Finally, MCD postdictions and out-of-sample predictions accurately capture the effect of both ΔVR^{0} and VCR^{0} in a quantitative manner (though predictions slightly underestimate the effect of ΔVR^{0}).
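For reference, a common operationalization of spreading of alternatives takes the post-minus-pre rating change of the chosen item minus that of the rejected item; we assume this form here as an illustration, and the paper's exact formula may differ:

```python
def spreading_of_alternatives(pre, post, chosen):
    """Choice-induced preference spread for one trial (a common operationalization,
    not necessarily the authors' exact formula). pre/post: (left, right) ratings."""
    rejected = 1 - chosen
    return (post[chosen] - pre[chosen]) - (post[rejected] - pre[rejected])

# Chosen item re-rated higher and rejected item lower -> positive spreading.
print(spreading_of_alternatives(pre=(0.6, 0.5), post=(0.7, 0.4), chosen=0))  # ~0.2
```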
Fourth, we checked how ΔVR^{0} and VCR^{0} impact value certainty gain. Under the MCD model, the certainty of value representations cannot improve unless type #1 efficacy $\beta \ne 0$. In addition, value certainty gain should globally follow the optimal effort allocation, i.e., it should decrease when ΔVR^{0} and/or VCR^{0} increase. Figure 11 shows the corresponding model predictions/postdictions and summarizes the corresponding empirical data.
Importantly, there is a small but significantly positive certainty gain (mean = 0.11 A.U., s.e.m. = 0.06, p=0.027, one-sided t-test). This is reassuring, because it dismisses the possibility that $\beta =0$ (which would mean that effort does not increase the precision of value representation). This time, the results of the group-level random-effect analysis only partially confirm our qualitative MCD predictions. In brief, although VCR^{0} has a very strong negative impact on certainty gain (mean standardized regression weight = −0.61, s.e.m. = 0.04, p<10^{−3}; one-sided t-test), the effect of ΔVR^{0} does not reach statistical significance (mean standardized regression weight = −0.009, s.e.m. = 0.01, p=0.35; one-sided t-test). We note that a simple regression-to-the-mean artifact (Stigler, 1997) likely inflates the observed negative correlation between VCR^{0} and certainty gain, beyond what would be predicted under the MCD model. Accordingly, both MCD postdictions and out-of-sample predictions clearly underestimate the effect of VCR^{0} (and overestimate the effect of ΔVR^{0}).
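This artifact is easy to reproduce in simulation: with test–retest rating noise alone, measured certainty gain correlates negatively with measured pre-choice certainty even when nothing truly changes between sessions (all numbers below are toy values of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000

# True underlying certainty per trial, plus independent rating noise per session.
true_certainty = rng.uniform(0.3, 0.9, n)
pre = np.clip(true_certainty + 0.1 * rng.standard_normal(n), 0, 1)
post = np.clip(true_certainty + 0.1 * rng.standard_normal(n), 0, 1)
gain = post - pre  # zero on average: no real certainty change was simulated

# Regression to the mean: trials with high measured pre-certainty tend to show
# negative gain purely because their pre-rating noise was positive.
r = np.corrcoef(pre, gain)[0, 1]
print(round(r, 3))  # clearly negative
```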
Discussion
In this work, we have presented a novel computational model of decision-making that explains the intricate relationships between effort-related variables (response time, subjective effort) and decision-related variables (choice confidence, change of mind, spreading of alternatives, and choice-induced value certainty gain). This model assumes that deciding between alternative options whose values are uncertain induces a demand for allocating cognitive resources to value-relevant information processing. Cognitive resource allocation then optimally trades mental effort for confidence, given the prior discriminability of the value representations.
Such metacognitive control of decisions, or MCD, provides an alternative theoretical framework to accumulation-to-bound models of decision-making, e.g., drift-diffusion models or DDMs (Milosavljevic et al., 2010; Ratcliff et al., 2016; Tajima et al., 2016). Recall that DDMs assume that decisions are triggered once the noisy evidence in favor of a particular option has reached a predefined bound. Standard DDM variants make quantitative predictions regarding both response times and decision outcomes, but are agnostic about choice confidence, spreading of alternatives, value certainty gain, and/or subjective effort ratings. We note that simple DDMs are significantly less accurate than MCD at making out-of-sample predictions on dependent variables common to both models (e.g., change of mind). We refer the reader interested in the details of the MCD–DDM comparison to section 9 of the Appendix.
But how do MCD and accumulation-to-bound models really differ? For example, if the DDM can be understood as an optimal policy for value-based decision-making (Tajima et al., 2016), then how can these two frameworks both be optimal? The answer lies in the distinct computational problems that they solve. The MCD solves the problem of finding the optimal amount of effort to invest under the possibility that yet-unprocessed value-relevant information might change the decision maker’s mind. In fact, this resource allocation problem would be vacuous were it not possible to reassess preferences during the decision process. In contrast, the DDM provides an optimal solution to the problem of efficiently comparing option values, which may be unreliably signaled, but remain nonetheless stationary. Of course, the DDM decision variable (i.e., the ‘evidence’ for a given choice option over the alternative) may fluctuate, e.g., it may first drift toward the upper bound, but then eventually reach the lower bound. This is the typical DDM explanation for why people change their mind over the course of deliberation (Kiani et al., 2014; Resulaj et al., 2009). But, critically, these fluctuations are not caused by changes in the underlying value signal (i.e., the DDM’s drift term). Rather, the fluctuations are driven by neural noise that corrupts the value signals (i.e., the DDM’s diffusion term). This is why the DDM cannot predict choice-induced preference changes, or changes in options’ values more generally. This distinction between MCD and DDM extends to other types of accumulation-to-bound models, including race models (De Martino et al., 2013; Tajima et al., 2019). We note that either of these models (DDM or race) could be equipped with pre-choice value priors (initial bias), and then driven with ‘true’ values (drift term) derived from post-choice ratings.
But then, simulating these models would require both pre-choice and post-choice ratings, which implies that choice-induced preference changes cannot be predicted from pre-choice ratings using a DDM. In contrast, the MCD model assumes that the value representations themselves are modified during the decision process, in proportion to the effort expenditure. Now, the latter is maximal when the prior value difference is minimal, at least when type #2 efficacy dominates (γ-effect, see section 2 of the Appendix). In turn, the MCD model predicts that the magnitude of (choice-induced) value spreading should decrease when the prior value difference increases (cf. Equation 14). Together with (choice-induced) value certainty gain, this quantitative prediction is unique to the MCD framework, and cannot be derived from existing variants of the DDM.
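To make the contrast concrete, here is a minimal DDM simulation in its standard textbook form (not the authors' implementation): the drift, which stands for the value difference, is held fixed, so any momentary reversal of the decision variable reflects diffusion noise only:

```python
import numpy as np

rng = np.random.default_rng(3)

def ddm_trial(drift, bound=1.0, sigma=1.0, dt=2e-3, t_max=2.0):
    """One drift-diffusion trial with a FIXED drift (stationary value signal).
    Returns (correct, wandered, rt); 'wandered' flags trials where the decision
    variable made a substantial excursion toward the wrong bound."""
    x, t, wandered = 0.0, 0.0, False
    while abs(x) < bound and t < t_max:
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        wandered |= (np.sign(x) != np.sign(drift)) and (abs(x) > 0.3 * bound)
        t += dt
    return np.sign(x) == np.sign(drift), wandered, t

trials = [ddm_trial(drift=0.5) for _ in range(100)]
accuracy = np.mean([c for c, _, _ in trials])
wander_rate = np.mean([w for _, w, _ in trials])
# Excursions toward the wrong bound occur on a sizeable fraction of trials,
# even though the underlying value signal (the drift) never changed.
```

In the DDM, such excursions are the model's 'changes of mind'; in MCD, by contrast, a change of mind reflects an actual revision of the value representation.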
As a side note, the cognitive essence of spreading of alternatives has been debated for decades. Its typical interpretation is that of ‘cognitive dissonance’ reduction: if people feel uneasy about their choice, they later convince themselves that the chosen (rejected) item was actually better (worse) than they originally thought (Bem, 1967; Harmon-Jones et al., 2009; Izuma and Murayama, 2013). In contrast, the MCD framework would rather suggest that people tend to reassess value representations until they reach a satisfactory level of confidence prior to committing to their choice. Interestingly, recent neuroimaging studies have shown that spreading of alternatives can be predicted from brain activity measured during the decision (Colosio et al., 2017; Jarcho et al., 2011; Kitayama et al., 2013; van Veen et al., 2009; Voigt et al., 2019). This is evidence against the idea that spreading of alternatives only occurs after the choice has been made. In addition, key regions of the brain’s valuation and cognitive control systems are involved, including the right inferior frontal gyrus, the ventral striatum, the anterior insula, and the anterior cingulate cortex (ACC). This further corroborates the MCD interpretation, under the assumption that the ACC is involved in controlling the allocation of cognitive effort (Musslick et al., 2015; Shenhav et al., 2013). Having said this, both MCD and cognitive dissonance reduction mechanisms may contribute to spreading of alternatives, on top of its known statistical artifact component (Chen and Risen, 2010). The latter is a consequence of the fact that pre-choice value ratings may be unreliable, and is known to produce an apparent spreading of alternatives that decreases with pre-choice value difference (Izuma and Murayama, 2013). Although this pattern is compatible with our results, the underlying statistical confound is unlikely to drive our results. The reason is twofold.
First, effort-related variables yield accurate within-subject out-of-sample predictions about spreading of alternatives (cf. Figure 10). Second, we have already shown that the effect of pre-choice value difference on spreading of alternatives is higher here than in a control condition where the choice is made after both rating sessions (Lee and Daunizeau, 2020).
A central tenet of the MCD model is that involving cognitive resources in value-related information processing is costly, which calls for an efficient resource allocation mechanism. A related notion is that information processing resources may be limited; in particular, value-encoding neurons may have a bounded firing range (Louie and Glimcher, 2012). In turn, ‘efficient coding’ theory assumes that the brain has evolved adaptive neural codes that optimally account for such capacity limitations (Barlow, 1961; Laughlin, 1981). In our context, efficient coding implies that value-encoding neurons should optimally adapt their firing range to the prior history of experienced values (Polanía et al., 2019). When augmented with a Bayesian model of neural encoding/decoding (Wei and Stocker, 2015), this idea was successful in explaining the nontrivial relationship between choice consistency and the distribution of subjective value ratings. Both MCD and efficient coding frameworks assume that value representations are uncertain, which stresses the importance of metacognitive processes in decision-making control (Fleming and Daw, 2017). However, they differ in how they operationalize the notion of efficiency. In efficient coding, the system is ‘efficient’ in the sense that it changes the physiological properties of value-encoding neurons to minimize the information loss that results from their limited firing range. In MCD, the system is ‘efficient’ in the sense that it allocates the amount of resources that optimally trades effort cost against choice confidence. These two perspectives may not be easy to reconcile. A possibility is to consider, for example, energy-efficient population codes (Hiratani and Latham, 2020; Yu et al., 2016), which would tune the amount of neural resources involved in representing value to optimally trade information loss against energetic costs.
Now, let us highlight that the MCD model offers a plausible alternative interpretation for the two main reported neuroimaging findings regarding confidence in value-based choices (De Martino et al., 2013). First, the ventromedial prefrontal cortex or vmPFC was found to respond positively to both value difference (i.e., ΔVR^{0}) and choice confidence. Second, the right rostrolateral prefrontal cortex or rRLPFC was more active during low-confidence versus high-confidence choices. These findings were originally interpreted through a so-called ‘race model’, in which a decision is triggered whenever the first of two option-specific value accumulators reaches a bound. Under this model, choice confidence is defined as the final gap between the two value accumulators. We note that this scenario predicts the same three-way relationship between response time, choice outcome, and choice confidence as the MCD model (see section 7 of the Appendix). In brief, rRLPFC was thought to perform a readout of choice confidence (for the purpose of subjective metacognitive report) from the racing value accumulators hosted in the vmPFC. Under the MCD framework, the contribution of the vmPFC to value-based decision control might rather be to construct item values, and to anticipate and monitor the benefit of effort investment (i.e., confidence). This would be consistent with recent fMRI studies suggesting that vmPFC confidence computations signal the attainment of task goals (Hebscher and Gilboa, 2016; Lebreton et al., 2015). Now, recall that the MCD model predicts that confidence and effort should be anticorrelated. Thus, the puzzling negative correlation between choice confidence and rRLPFC activity could be simply explained under the assumption that rRLPFC provides the neurocognitive resources that are instrumental for processing value-relevant information during decisions (and/or for comparing item values).
This resonates with the known involvement of rRLPFC in reasoning (Desrochers et al., 2015; Dumontheil, 2014) or memory retrieval (Benoit et al., 2012; Westphal et al., 2019).
At this point, we note that the current MCD model clearly has limited predictive power. Arguably, this limitation is partly due to the imperfect reliability of the data, and to the fact that MCD does not model all decision-relevant processes. In addition, assigning variations in many effort- and/or decision-related variables to a unique mechanism with few degrees of freedom necessarily restricts the model’s expected predictive power. Nevertheless, the MCD model may also not yield a sufficiently tight approximation to the mechanism that it focuses on. In turn, it may unavoidably distort the impact of prior value representations and other decision input variables. The fact that it can only explain 81% of the variability in dependent variables that can be captured using simple linear regressions against ΔVR^{0} and VCR^{0} (see section 11 of the Appendix) supports this notion. A likely explanation here is that the MCD model includes constraints that prevent it from matching the model-free postdiction accuracy level. In turn, one may want to extend the MCD model with the aim of relaxing these constraints. For example, one may allow for deviations from the optimal resource allocation framework, e.g., by considering candidate systematic biases whose magnitudes would be controlled by specific additional parameters. Having said this, some of these constraints may be necessary, in the sense that they derive from the modeling assumptions that enable the MCD model to provide a unified explanation for all dependent variables (and thus make out-of-sample predictions). What follows is a discussion of what we perceive as the main limitations of the current MCD model, and the directions of improvement they suggest.
First, we did not specify what determines decision 'importance', which effectively acts as a weight for confidence against effort costs (cf. $R$ in Equation 2 of the Model section). We know from the comparison of 'consequential' and 'neutral' choices that increasing decision importance eventually increases effort, as predicted by the MCD model. However, decision importance may have many determinants, such as, for example, the commitment duration of the decision (e.g., life partner choices), the breadth of its repercussions (e.g., political decisions), or its instrumentality with respect to the achievement of superordinate goals (e.g., moral decisions). How these determinants are combined and/or moderated by the decision context is virtually unknown (Locke and Latham, 2002; Locke and Latham, 2006). In addition, decision importance may also be influenced by the prior (intuitive/emotional/habitual) appraisal of choice options. For example, we found that, all else equal, people spent more time and effort deciding between two disliked items than between two liked items (results not shown). This reproduces recent results regarding the evaluation of choice sets (Shenhav and Karmarkar, 2019). One may also argue that people should care less about decisions between items that have similar values (Oud et al., 2016). In other terms, decision importance would be an increasing function of the absolute difference in pre-choice value ratings. However, this would predict that people invest fewer resources when deciding between items of similar pre-choice values, which directly contradicts our results (cf. Figures 5 and 6). Importantly, options with similar values may still be very different from each other, when decomposed on some value-relevant feature space. For example, although two food items may be similarly liked and/or wanted, they may be very different in terms of, e.g., tastiness and healthiness, which would induce some form of decision conflict (Hare et al., 2009).
In such a context, making a decision effectively implies committing to a preference about feature dimensions. This may be deemed consequential, when contrasted with choices between items that are similar in all regards. In turn, decision importance may rather be a function of options' feature conflict. In principle, this alternative possibility is compatible with our results, under the assumption that options' feature conflict is approximately orthogonal to pre-choice value difference. Considering how decision importance varies with feature conflict may significantly improve the amount of explained trial-by-trial variability in the model's dependent variables. We note that the brain's quick/automatic assessment of option features may also be the main determinant of the prior value representations that eventually serve to compute the MCD-optimal resource allocation. Probing these computational assumptions will be the focus of forthcoming publications.
Second, our current version of the MCD model relies on a simple variant of resource costs and efficacies. One may thus wonder how sensitive model predictions are to these assumptions. For example, one may expect type #2 efficacy to saturate, i.e., that the magnitude of the perturbation $\delta \left(z\right)$ to the modes $\mu \left(z\right)$ of the value representations eventually reaches a plateau instead of growing linearly with $z$ (cf. Equation 6). We thus implemented and tested such a model variant. We report the results of this analysis in section 10 of the Appendix. In brief, a saturating type #2 efficacy brings no additional explanatory power for the model's dependent variables. Similarly, rendering the cost term nonlinear (e.g., quadratic) does not change the qualitative nature of the MCD predictions. More problematic, perhaps, is the fact that we did not consider distinct types of effort, which could, in principle, be associated with different costs and/or efficacies. For example, the efficacy of allocating attention may depend upon which option is considered. In turn, the brain may dynamically refocus its attention on maximally uncertain options when prospective information gains exceed switch costs (Callaway et al., 2021; Jang et al., 2021). Such optimal adjustment of divided attention might eventually explain systematic decision biases and shortened response times for 'default' choices (Lopez-Persem et al., 2016). Another possibility is that effort might be optimized along two canonical dimensions, namely duration and intensity. The former dimension essentially justifies the fact that we used RT as a proxy for the amount of allocated resources: if effort intensity stays constant, then longer RT signals greater resource expenditure. In fact, as is evident from the comparison between 'penalized' and 'neutral' choices, imposing an external penalty cost on RT reduces, as expected, the ensuing effort duration.
More generally, however, the dual optimization of effort dimensions might render the relationship between effort and RT more complex. For example, beyond memory span or attentional load, effort intensity could be related to processing speed. This would explain why, although 'penalized' choices are made much faster than 'neutral' choices, the associated subjective feeling of effort is not as strongly impacted as RT (cf. Figure 7). In any case, the relationship between effort and RT might depend upon the relative costs and/or efficacies of effort duration and intensity, which might themselves be partially driven by external availability constraints (cf. time pressure or multitasking). We note that the essential nature of the cost of mental effort in cognitive tasks (e.g., neurophysiological cost, interference cost, or opportunity cost) is still a matter of intense debate (Kurzban et al., 2013; Musslick et al., 2015; Ozcimder et al., 2017). Progress toward addressing this issue will be highly relevant for future extensions of the MCD model.
Third, we did not consider the issue of identifying plausible neurocomputational implementations of MCD. This issue is tightly linked to the previous one, in that distinct cost types would likely impose different constraints on candidate neural network architectures (Feng et al., 2014; Petri et al., 2017). For example, underlying brain circuits are likely to operate MCD in a more reactive manner, eventually adjusting resource allocation from the continuous monitoring of relevant decision variables (e.g., experienced costs and benefits). Such a reactive process contrasts with our current, prospective-only variant of MCD, which sets resource allocation based on anticipated costs and benefits. We already checked that simple reactive scenarios, where the decision is triggered whenever the online monitoring of effort or confidence reaches the optimal threshold, make predictions qualitatively similar to those we have presented here. We tend to think, however, that such reactive processes should be based on a dynamic programming perspective on MCD, as was already done for the problem of optimal, efficient value comparison (Tajima et al., 2016; Tajima et al., 2019). We will pursue this and related neurocomputational issues in subsequent publications.
Code availability
The computer code and algorithms that support the findings of this study will soon be made available from the open academic freeware VBA (http://mbbteam.github.io/VBAtoolbox/). Until then, they are available from the corresponding author upon reasonable request.
Ethical compliance
This study complies with all relevant ethical regulations and received formal approval from the INSERM Ethics Committee (CEEIIRB00003888, decision no 16–333). In particular, in accordance with the Helsinki declaration, all participants gave written informed consent prior to commencing the experiment, which included consent to disseminate the results of the study via publication.
Appendix 1
1. On the approximation accuracy of the expected confidence gain
The MCD model relies on the system's ability to anticipate the benefit of allocating resources to the decision process. Given the mathematical expression of choice confidence (Equation 4 in the main text), this reduces to finding an analytical approximation to the following expression:
$E\left[s\left(\lambda \left|x\right|\right)\right]$ (A1)
where $x\to s\left(x\right)=1/\left(1+{\mathrm{e}}^{-x}\right)$ is the sigmoid mapping, $\lambda $ is an arbitrary constant, and the expectation is taken under the Gaussian distribution $x\sim N\left(\mu ,{\sigma}^{2}\right)$, whose mean and variance are $\mu $ and ${\sigma}^{2}$, respectively.
Note that the absolute value mapping $x\to \left|x\right|$ yields a folded normal distribution, whose first two moments $E\left[\left|x\right|\right]$ and $V\left[\left|x\right|\right]$ have known expressions:
where the first line relies on a moment-matching approximation to the cumulative normal distribution function (Daunizeau, 2017a). This allows us to derive the following analytical approximation to Equation A1:
where setting $a\approx 3/{\pi}^{2}$ makes this approximation tight (Daunizeau, 2017a).
The quality of this approximation can be evaluated by drawing samples of $x\sim N\left(\mu ,{\sigma}^{2}\right)$, and comparing the Monte-Carlo average of $s\left(\lambda \left|x\right|\right)$ with the expression given in Equation A3. This is summarized in Appendix 1—figure 1, where the range of variation for the moments of $x$ was set as follows: $\mu \in \left[-4,4\right]$ and ${\sigma}^{2}\in \left[0,4\right]$.
One can see that the error rarely exceeds 5% across the whole range of moments $\left\{\mu ,{\sigma}^{2}\right\}$ of the parent distribution. This establishes how tight the analytical approximation of the expected confidence gain (Equation 9 in the main text) is.
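This numerical check is easy to reproduce. In the sketch below, the folded-normal moments are exact textbook formulas, but the final sigmoid expression (with $a\approx 3/{\pi}^{2}$) is our reading of Equations A2–A3, and should be treated as an assumption rather than the paper's verbatim implementation:

```python
import numpy as np
from math import erf, sqrt, pi, exp

def s(x):
    # sigmoid mapping s(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def folded_moments(mu, sigma):
    # exact first two moments of |x| for x ~ N(mu, sigma^2)
    Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))  # standard normal CDF
    m = sigma * sqrt(2.0 / pi) * exp(-mu**2 / (2.0 * sigma**2)) \
        + mu * (1.0 - 2.0 * Phi(-mu / sigma))
    return m, mu**2 + sigma**2 - m**2

def approx_Pc(mu, sigma, lam=1.0, a=3.0 / pi**2):
    # moment-matching approximation to E[s(lam*|x|)] (our reading of Eq. A3)
    m, v = folded_moments(mu, sigma)
    return float(s(lam * m / sqrt(1.0 + a * lam**2 * v)))

def mc_Pc(mu, sigma, lam=1.0, n=400_000, seed=0):
    # Monte-Carlo estimate of E[s(lam*|x|)]
    x = np.random.default_rng(seed).normal(mu, sigma, n)
    return float(np.mean(s(lam * np.abs(x))))
```

For instance, at $\mu =1$, $\sigma =1$, the two quantities agree to within a few percent, consistent with the error levels reported in Appendix 1—figure 1.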
2. On the impact of model parameters for the MCD model
To begin with, note that the properties of the metacognitive control of decisions (in terms of effort allocation and/or confidence) actually depend on the demand for resources, which is itself determined by prior value representations (or, more properly, by the prior uncertainty ${\sigma}^{0}$ and the absolute means' difference $\left|\Delta {\mu}^{0}\right|$). Now, the way the MCD-optimal control responds to the resource demand is determined by effort efficacy and unitary cost parameters. In addition, MCD-optimal confidence may not trivially follow resource allocation, because it may be overcompensated by choice difficulty.
First, recall that the amount $\widehat{z}$ of allocated resources maximizes the EVC:
$\widehat{z}=\underset{z}{\mathrm{arg\,max}}\ EVC\left(z\right)$ (A4)
where ${\overline{P}}_{c}\left(z\right)$ is given in Equation 9 in the main text. According to the implicit function theorem, the derivatives of $\widehat{z}$ w.r.t. ${\sigma}^{0}$ and $\left|\Delta {\mu}^{0}\right|$ are given by (Gould et al., 2016):
The double derivatives in Equations A5 are not trivial to obtain.
First, the gradient $\partial {\overline{P}}_{c}\left(z\right)/\partial \left|\Delta {\mu}^{0}\right|$ of choice confidence w.r.t. $\left|\Delta {\mu}^{0}\right|$ writes:
where $K\left(z\right)\ge 0$ is given by:
Note that the gradient $\partial E\left[\left|\Delta \mu \right|\mid z\right]/\partial \left|\Delta {\mu}^{0}\right|\ge 0$ in Equation A6 can be obtained analytically from Equation 7 in the main text. However, we refrain from doing this, because it is clear that deriving the right-hand term of Equation A6 w.r.t. both ${\sigma}^{0}$ and $z$ will not bring any simple insight regarding the impact of $\left|\Delta {\mu}^{0}\right|$ on $\widehat{z}$.
Also, although the gradient $\partial {\overline{P}}_{c}\left(\widehat{z}\right)/\partial {\sigma}^{0}$ of choice confidence w.r.t. ${\sigma}^{0}$ takes a much more concise form:
it still remains tedious to derive its expression with respect to both ${\sigma}^{0}$ and $z$. This is why we opt for separating the respective effects of type #1 and type #2 efficacies.
First, let us ask what would be the MCD-optimal effort $\widehat{z}$ and confidence ${\overline{P}}_{c}\left(\widehat{z}\right)$ when $\gamma =0$, that is, if the only effect of allocating resources is to increase the precision of value representations. We call this the 'β-effect'. In this case, $E\left[\left|\Delta \mu \right|\mid z\right]=\left|\Delta {\mu}^{0}\right|$ and $V\left[\left|\Delta \mu \right|\mid z\right]=0$ irrespective of $z$. This greatly simplifies Equations A6–A8:
Inserting Equation A9 back into Equation A5 now yields:
Now the signs of the gradients of $\widehat{z}$ w.r.t. ${\sigma}^{0}$ and $\left|\Delta {\mu}^{0}\right|$ are driven by the numerators of Equation A10, because all partial derivatives of $K\left(z\right)$ have unambiguous signs:
Replacing the expression for $\partial K\left(z\right)/\partial z$ in Equation A11 into Equation A10 now yields:
At the limit $\left|\Delta {\mu}^{0}\right|\to 0$, then: $\partial \widehat{z}/\partial \left|\Delta {\mu}^{0}\right|\ge 0$ and $\partial \widehat{z}/\partial {\sigma}^{0}\ge 0$. However, one can see from Equation A12 that there may be a critical value of $\left|\Delta {\mu}^{0}\right|$, above which the gradient $\partial \widehat{z}/\partial \left|\Delta {\mu}^{0}\right|$ eventually becomes negative. This means that the amount of allocated resources behaves as a bell-shaped function of $\left|\Delta {\mu}^{0}\right|$. This may not be the case along the ${\sigma}^{0}$ direction though, because ${\sigma}^{0}\ge \sigma \left(z\right)$ and the last term in the brackets shrinks as ${\sigma}^{0}$ increases.
Similar derivations eventually yield expressions for the gradients of MCDoptimal confidence:
Equation A13 implies that, under moderate type #1 efficacy ($\beta \approx 0$), MCD-optimal confidence decreases when $\left|\Delta {\mu}^{0}\right|$ decreases and/or when ${\sigma}^{0}$ increases, irrespective of the amount $\widehat{z}$ of allocated resources. In other terms, variations in choice confidence are dominated by variations in the discriminability of prior value representations.
This analysis is exemplified in Appendix 1—figure 2, which summarizes the β-effect, in terms of how MCD-optimal resource allocation and choice confidence depend upon $\left|\Delta {\mu}^{0}\right|$ and ${\sigma}^{0}$.
One can see that, overall, increasing the prior variance ${\sigma}^{0}$ increases the resource demand, which eventually increases the MCD-optimal allocated effort $\widehat{z}$. This, however, does not overcompensate for the loss of confidence incurred when increasing the prior variance. This is why the MCD-optimal confidence ${\overline{P}}_{c}\left(\widehat{z}\right)$ decreases with the prior variance ${\sigma}^{0}$. Note that, for the same reason, the MCD-optimal confidence increases with the absolute prior means' difference $\left|\Delta {\mu}^{0}\right|$.
Now the impact of the absolute prior means' difference $\left|\Delta {\mu}^{0}\right|$ on $\widehat{z}$ is less trivial. In brief, when $\left|\Delta {\mu}^{0}\right|$ is high, the MCD-optimal allocated effort $\widehat{z}$ decreases when $\left|\Delta {\mu}^{0}\right|$ increases. This is due to the fact that the resource demand decreases with $\left|\Delta {\mu}^{0}\right|$. However, there is a critical value of $\left|\Delta {\mu}^{0}\right|$, below which the MCD-optimal allocated effort $\widehat{z}$ increases with $\left|\Delta {\mu}^{0}\right|$. This is because, although the resource demand still increases when $\left|\Delta {\mu}^{0}\right|$ decreases, the cost of allocating resources overcompensates the gain in confidence. For such difficult decisions, the system no longer follows the demand, and progressively demotivates the allocation of resources as $\left|\Delta {\mu}^{0}\right|$ continues to decrease. In brief, the amount $\widehat{z}$ of allocated resources decreases away from a 'sweet spot', which is the absolute prior means' difference that yields the maximal confidence gain per effort unit. Critically, the position of this sweet spot along the $\left|\Delta {\mu}^{0}\right|$ dimension decreases with $\beta $ and increases with $\alpha $. This is because confidence gain increases, by definition, with effort efficacy, whereas effort becomes more costly when $\alpha $ increases.
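The bell-shaped effort profile and the monotonic confidence profile can be illustrated numerically. The functional forms below — confidence $s\left(\lambda \left|\Delta {\mu}^{0}\right|/\sigma \left(z\right)\right)$ with $\sigma {\left(z\right)}^{2}={\left({\sigma}^{0}\right)}^{2}/\left(1+\beta z\right)$, and $EVC\left(z\right)=R\,{\overline{P}}_{c}\left(z\right)-\alpha z$ — and all parameter values are simplified stand-ins for Equations 2 and 4–9 of the main text, not the paper's exact expressions:

```python
import numpy as np

# Toy beta-effect (gamma = 0): effort only sharpens the value representation.
# All functional forms and parameter values below are illustrative assumptions.
def s(x):
    return 1.0 / (1.0 + np.exp(-x))

def z_hat(dmu0, sigma0=1.0, beta=2.0, alpha=0.1, R=1.0, lam=np.pi / np.sqrt(3.0)):
    z = np.linspace(0.0, 5.0, 2001)                              # candidate resource amounts
    Pc = s(lam * abs(dmu0) * np.sqrt(1.0 + beta * z) / sigma0)   # confidence at effort z
    evc = R * Pc - alpha * z                                     # expected value of control
    i = int(np.argmax(evc))
    return float(z[i]), float(Pc[i])                             # optimal effort, confidence
```

On this toy landscape, optimal effort is near zero both for very easy (large $\left|\Delta {\mu}^{0}\right|$) and hopeless (near-zero $\left|\Delta {\mu}^{0}\right|$) decisions and peaks in between, while optimal confidence increases monotonically with $\left|\Delta {\mu}^{0}\right|$, in line with Equations A12–A13.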
Second, let us ask what would be the MCD-optimal effort $\widehat{z}$ and confidence ${\overline{P}}_{c}\left(\widehat{z}\right)$ when $\beta =0$, that is, if the only effect of allocating resources is to perturb the value difference. The ensuing 'γ-effect' is depicted in Appendix 1—figure 3.
In brief, the overall picture is reversed, with a few minor differences. One can see that increasing the absolute prior means' difference $\left|\Delta {\mu}^{0}\right|$ decreases the resource demand, which eventually decreases the MCD-optimal allocated effort $\widehat{z}$. This can decrease confidence, if $\gamma $ is high enough to overcompensate the effect of variations in $\left|\Delta {\mu}^{0}\right|$. When no effort is allocated, however, confidence is driven by $\left|\Delta {\mu}^{0}\right|$, that is, it becomes an increasing function of $\left|\Delta {\mu}^{0}\right|$. In contrast, variations in the prior variance ${\sigma}^{0}$ always overcompensate the ensuing changes in effort, which is why confidence always decreases with ${\sigma}^{0}$. In addition, the amount $\widehat{z}$ of allocated resources decreases away from a sweet prior variance spot, which is the prior variance ${\sigma}^{0}$ that yields the maximal confidence gain per effort unit. Critically, the position of this sweet spot increases with $\gamma $ and decreases with $\alpha $, for reasons similar to the β-effect.
Now one can ask what happens in the presence of both the β-effect and the γ-effect. If the effort unitary cost $\alpha $ is high enough, the MCD-optimal effort allocation is essentially the superposition of both effects. This means that there are two 'sweet spots': one around some value of $\left|\Delta {\mu}^{0}\right|$ at high ${\sigma}^{0}$ (β-effect) and one around some value of ${\sigma}^{0}$ at high $\left|\Delta {\mu}^{0}\right|$ (γ-effect). If the effort unitary cost $\alpha $ decreases, then the position of the β-sweet spot (along the $\left|\Delta {\mu}^{0}\right|$ axis) decreases and that of the γ-sweet spot (along the ${\sigma}^{0}$ axis) increases, until they effectively merge together. This is exemplified in Appendix 1—figure 4.
One can see that, somewhat paradoxically, the effort response is now much simpler. In brief, the MCD-optimal effort allocation $\widehat{z}$ increases with the prior variance ${\sigma}^{0}$ and decreases with the absolute prior means' difference $\left|\Delta {\mu}^{0}\right|$. The landscape of the ensuing MCD-optimal confidence level ${\overline{P}}_{c}\left(\widehat{z}\right)$ is slightly less trivial, but globally, it can be thought of as increasing with $\left|\Delta {\mu}^{0}\right|$ and decreasing with ${\sigma}^{0}$. Here again, this is because variations in $\left|\Delta {\mu}^{0}\right|$ and/or ${\sigma}^{0}$ almost always overcompensate the ensuing effects of changes in allocated effort.
3. On MCD parameter estimation
Let ${y}_{t}$ be a 6 × 1 vector composed of measured choice confidence, spreading of alternatives, value certainty gain, change of mind, response time, and subjective effort rating on trial $t$. Let ${u}_{t}$ be a 4 × 1 vector, whose first two entries are composed of pre-choice value difference (ΔVR^{0}) and average value certainty (VCR^{0}) ratings, and whose last two entries encode consequential and penalized trials. Finally, let $\phi $ be the set of unknown MCD parameters (i.e., intrinsic effort cost $\alpha $ and effort efficacies $\beta $ and $\gamma $), augmented with condition-effect parameters and affine transform parameters (see below). From a statistical perspective, the MCD model then reduces to the following observation equation:
${\overline{y}}_{t}=g\left(\phi ,{u}_{t}\right)+{\epsilon}_{t}$ (A14)
where $\overline{y}$ denotes data that have been z-scored across trials, ${\epsilon}_{t}$ are model residuals, and the observation mapping $g\left(\phi ,{u}_{t}\right)$ is given by:
where $E\left[\left|\Delta \mu \right|\mid \widehat{z}\right]$ and $V\left[\left|\Delta \mu \right|\mid \widehat{z}\right]$ depend upon $\gamma $ (see Equations 7 and 8 in the main text). In Equation A15, ${a}_{1:6}$ and ${b}_{1:6}$ are the unknown offset and slope parameters of the (nuisance) affine transform on MCD outputs. Note that when fitting the MCD model to empirical data, theoretical pre-choice value difference and value certainty ratings are replaced by their empirical proxies, that is, $\Delta {\mu}^{0}\approx \Delta {\text{VR}}^{0}$ and $1/{\sigma}^{0}\approx {\text{VCR}}^{0}$. In turn, given MCD parameters, Equations A14 and A15 predict trial-by-trial variations in choice confidence, spreading of alternatives, value certainty gain, change of mind, response time, and subjective effort rating from variations in prior moments of value representations. We note that Equation A15 does not yet include condition-specific effects. As we will see, it will be easier to complete the definition of the model parameters $\phi $ once we have explained the variational Laplace scheme for parameter estimation.
Recall that the variational Laplace scheme is an iterative algorithm that indirectly optimizes an approximation to both the model evidence $p\left(y\mid m,u\right)$ and the posterior density $p\left(\phi \mid y,m,u\right)$, where $m$ is the so-called generative model (i.e., the set of assumptions that are required for inference). The key trick is to decompose the log model evidence into:
$\mathrm{ln}\,p\left(y\mid m,u\right)=F\left(q\right)+{D}_{KL}\left(q\left(\phi \right)\parallel p\left(\phi \mid y,m,u\right)\right)$ (A16)
where $q\left(\phi \right)$ is any arbitrary density over the model parameters, ${D}_{KL}$ is the Kullback-Leibler divergence, and the so-called free energy $F\left(q\right)$ is defined as:
$F\left(q\right)={\left\langle \mathrm{ln}\,p\left(y,\phi \mid m,u\right)\right\rangle}_{q}+S\left(q\right)$ (A17)
where $S\left(q\right)$ is the Shannon entropy of $q$ and the expectation ${\left\langle \cdot \right\rangle}_{q}$ is taken under $q$.
From Equation A16, maximizing the functional $F\left(q\right)$ w.r.t. $q$ indirectly minimizes the Kullback-Leibler divergence between $q\left(\phi \right)$ and the exact posterior $p\left(\phi \mid y,m\right)$. This decomposition is complete in the sense that if $q\left(\phi \right)=p\left(\phi \mid y,m\right)$, then $F\left(q\right)=\mathrm{ln}\,p\left(y\mid m\right)$.
The variational Laplace algorithm iteratively maximizes the free energy $F\left(q\right)$ under simplifying assumptions (see below) about the functional form of $q$, rendering $q$ an approximate posterior density over model parameters and $F\left(q\right)$ an approximate log model evidence (Daunizeau, 2017a; Friston et al., 2007). The free energy optimization is then made with respect to the sufficient statistics of $q$, which makes the algorithm generic, quick, and efficient.
Under normal i.i.d. model residuals (i.e., ${\epsilon}_{t}\sim N\left(0,1/\lambda \right)$), the likelihood function writes:
where $\lambda $ is the residuals' precision or inverse variance hyperparameter and the observation mapping $g\left(\phi ,{u}_{t}\right)$ is given in Equation A15.
We also use Gaussian priors $p\left(\phi \mid m\right)=N\left({\eta}_{0},{\Sigma}_{0}\right)$ for model parameters and Gamma priors $p\left(\lambda \mid m\right)=Ga\left({\varpi}_{0},{\kappa}_{0}\right)$ for precision hyperparameters.
In what follows, we derive the variational Laplace algorithm under a 'mean-field' separability assumption between parameters and hyperparameters, that is: $q\left(\phi ,\lambda \right)=q\left(\phi \right)q\left(\lambda \right)$. We will see that this eventually yields a Gaussian posterior density $q\left(\phi \right)\approx N\left(\eta ,\Sigma \right)$ on model parameters, and a Gamma posterior density $q\left(\lambda \right)=Ga\left(\varpi ,\kappa \right)$ on the precision hyperparameter.
First, let us note that, under the Laplace approximation, the free energy bound on the log-model evidence can be written as:
where ${n}_{\phi}$ is the number of parameters, $\Gamma \left(\cdot \right)$ is the Gamma function, $\psi \left(\cdot \right)$ is the digamma function, and $I\left(\phi \right)$ is defined as:
Given the Gamma posterior $q\left(\lambda \right)$ on the precision hyperparameter, $I\left(\phi \right)$ can be simply expressed as follows:
where we have ignored the terms that do not depend upon $\phi $, and $\left\langle \lambda \right\rangle =E\left[\lambda \mid y,m\right]=\varpi /\kappa $ is the posterior mean of the data precision hyperparameter $\lambda $.
The variational Laplace update rule for the approximate posterior density $q\left(\phi \right)$ on model parameters now simply reduces to an update rule for its sufficient statistics:
In Equation A22, the first-order moment $\eta $ of $q\left(\phi \right)$ is obtained from the following Gauss-Newton iterative gradient ascent scheme:
where the gradient and Hessians of $I\left(\phi \right)$ are given by:
At convergence of the above gradient ascent, the approximate posterior density $q\left(\phi \right)$ on the precision hyperparameter is updated as follows:
where ${n}_{t}$ is the number of trials.
The variational Laplace scheme alternates between Equations A22 and A25 iteratively until convergence of the free energy.
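For intuition, this alternation can be sketched on a deliberately simplified case: a linear observation mapping $g\left(\phi ,{u}_{t}\right)=U\phi $, for which the Gauss-Newton ascent of Equation A23 converges in a single step. The priors and update forms below are standard variational-Bayes choices for this linear-Gaussian toy problem, not the exact toolbox implementation:

```python
import numpy as np

# Minimal variational update loop for a linear-Gaussian toy model:
# alternate a Gaussian update on parameters (cf. Equation A22) with a
# Gamma update on the noise precision (cf. Equation A25).
rng = np.random.default_rng(1)
n_t, n_phi = 200, 3
U = rng.normal(size=(n_t, n_phi))                    # design matrix (inputs u_t)
phi_true = np.array([1.0, -2.0, 0.5])
y = U @ phi_true + rng.normal(scale=0.5, size=n_t)   # noise std 0.5, i.e. precision 4

Sigma0_inv = np.eye(n_phi) / 10.0**2                 # prior N(0, 10^2 I), as in the text
a, b = 1.0, 1.0                                      # Gamma(1, 1) prior on precision

for _ in range(50):
    lam = a / b                                      # posterior mean <lambda>
    Sigma = np.linalg.inv(lam * U.T @ U + Sigma0_inv)
    eta = Sigma @ (lam * U.T @ y)                    # posterior mean of phi
    resid = y - U @ eta
    a = 1.0 + n_t / 2.0
    b = 1.0 + 0.5 * (resid @ resid + np.trace(U @ Sigma @ U.T))
```

After a few iterations, $\eta $ approaches the true parameters and $a/b$ approaches the true noise precision; for the nonlinear mapping of Equation A15, the inner Gauss-Newton loop of Equations A23–A24 replaces this one-shot Gaussian update.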
Now, let us complete the definition of the model parameter vector $\phi ={\phi}_{1:17}$.
First, note that effort efficacy parameters are necessarily positive. This constraint can be enforced using the following simple change of variables in Equation A15: $\beta =\mathrm{exp}\left({\phi}_{1}\right)$ and $\gamma =\mathrm{exp}\left({\phi}_{2}\right)$. In other words, ${\phi}_{1:2}$ effectively measure efficacy parameters in log-space. Second, recall that we want to insert condition-specific effects into the model. More precisely, we expect 'consequential' decisions to be more important than 'neutral' ones, and 'penalized' decisions effectively include an extraneous cost-of-time term. One can model the former condition effect by making $R$ (Equation 2 in the main text) sensitive to whether the decision is consequential (${u}^{\left(c\right)}=1$) or not (${u}^{\left(c\right)}=0$), that is: ${R}_{t}=\mathrm{exp}\left({\phi}_{3}\,{u}_{t}^{\left(c\right)}\right)$, where $t$ indexes trials, and ${\phi}_{3}$ is the unknown weight of consequential choices on decision importance. This parameterization makes decision importance necessarily positive, and forces non-consequential trials to act as reference choices (in the sense that their decision importance is set to 1). We proxy the latter condition effect by making the effort unitary cost a function of whether the decision is penalized (${u}^{\left(p\right)}=1$) or not (${u}^{\left(p\right)}=0$), that is: ${\alpha}_{t}=\mathrm{exp}\left({\phi}_{4}+{\phi}_{5}\,{u}_{t}^{\left(p\right)}\right)$, where ${\phi}_{4}$ is the unknown intrinsic effort cost (in log-space), and ${\phi}_{5}$ is the unknown weight of penalized choices on effort cost. The remaining parameters ${\phi}_{6:17}$ lump together the offsets (${a}_{1:6}$) and log-slopes ($\mathrm{log}\,{b}_{1:6}$; this enforces a positivity constraint on slope parameters) of the affine transform.
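The change of variables above can be summarized in a few lines (the function name `unpack` and the argument layout are ours, for illustration):

```python
import numpy as np

# Map unconstrained parameters phi to the constrained MCD quantities of this
# section: efficacies and costs are estimated in log-space, so that
# exponentiation enforces positivity.
def unpack(phi, u_c, u_p):
    beta = np.exp(phi[0])                  # type #1 efficacy > 0
    gamma = np.exp(phi[1])                 # type #2 efficacy > 0
    R = np.exp(phi[2] * u_c)               # importance; equals 1 on neutral trials (u_c = 0)
    alpha = np.exp(phi[3] + phi[4] * u_p)  # effort cost, inflated on penalized trials
    return beta, gamma, R, alpha
```

With $\phi =0$, all four quantities reduce to 1, so non-consequential, non-penalized trials indeed act as reference choices.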
Finally, we set the prior probability density functions on model parameters and hyperparameters as follows:
$p\left({\phi}_{i}\mid m\right)=N\left(0,{10}^{2}\right)\ \forall i$, that is, the prior mean of model parameters is ${\eta}_{0}=0$ and their prior variance is ${\Sigma}_{0}={10}^{2}\times I$.
$p\left(\lambda \mid m\right)=Ga\left(1,1\right)$. Since the data have been z-scored prior to model inversion, this ensures that the prior and likelihood components of $I\left(\phi \right)$ are balanced when the variational Laplace algorithm starts.
This completes the description of the variational Laplace approach to MCD inversion. For more details, we refer the interested reader to the existing literature on variational approaches to approximate Bayesian inference (Beal, 2003; Daunizeau, 2017b; Friston et al., 2007). We note that the above variational Laplace approach is implemented in the opensource VBA toolbox (Daunizeau et al., 2014).
In what follows, we use Monte-Carlo numerical simulations to evaluate the ability of this approach to recover MCD parameters. Our parameter recovery analyses proceed as follows. First, we sample a set of model parameters $\phi $ under a standard i.i.d. normal distribution. Here, we refer to ${\phi}_{ij}$ as the i^{th} element of $\phi $ in the j^{th} Monte-Carlo simulation. Second, for each of these parameter sets ${\phi}_{\cdot j}$, we simulate a series of N=100 decision trials according to Equations A14 and A15 above (under random prior moments of value representations). Note that we set the variance of model residuals ($\epsilon $ in Equation A14) to match the average correlation between MCD predictions and empirical data (about 20%, see Figure 4 in the main text). We also used the same rate of neutral, consequential, and penalized choices as in our experiment. Third, we fit the model to the resulting simulated data (after z-scoring) and extract parameter estimates ${\eta}_{\cdot j}$ (at convergence of the variational Laplace approach). We repeat these three steps 1000 times, yielding a series of 1000 simulated parameter sets, and their corresponding 1000 estimated parameter sets. Should ${\eta}_{\cdot j}\approx {\phi}_{\cdot j}\ \forall j$, parameter recovery would be perfect. Appendix 1—figure 5 compares simulated and estimated parameters to each other across Monte-Carlo simulations. Note that we only report recovery results for ${\phi}_{1:5}$, since we do not care about nuisance affine transform parameters.
We also quantify pairwise non-identifiability issues, which arise when the estimation method confuses two parameters with each other. We do this using so-called 'recovery matrices', which summarize whether variations (across the 1000 Monte-Carlo simulations) in estimated parameters faithfully capture variations in simulated parameters. We first z-score simulated and estimated parameters across Monte-Carlo simulations. We then regress each estimated parameter against all simulated parameters through the following multiple linear regression model:
where ${\theta}_{ii\text{'}}$ are regression weights, and ${\varsigma}_{ij}$ are regression residuals. Here, regression weights are partial correlation coefficients between simulated and estimated parameters (across Monte-Carlo simulations). More precisely, ${\theta}_{ii\text{'}}$ quantifies the impact that variations of the simulated parameter ${\phi}_{i\text{'}\cdot}$ have on variations of the estimated parameter ${\eta}_{i\cdot}$, conditional on all other simulated parameters. If parameters were perfectly identifiable, then ${\theta}_{ii}\approx 1$ and ${\theta}_{ii\text{'}}\approx 0\ \forall i\text{'}\ne i$. Pairwise non-identifiability issues arise when ${\theta}_{ii\text{'}}\ne 0$ for $i\text{'}\ne i$. In other words, the regression model in Equation A26 effectively decomposes the observed variability in the series of estimated parameters ${\eta}_{i\cdot}$ into 'correct variations' that are induced by variations in the corresponding simulated parameter ${\phi}_{i\cdot}$, and 'incorrect variations' that are induced by the remaining simulated parameters ${\phi}_{i\text{'}\cdot}$ (with $i\text{'}\ne i$). This analysis is then summarized in terms of 'recovery matrices', which simply report the squared regression weights ${\theta}_{ii\text{'}}^{2}$ for each simulated parameter (see right panel of Appendix 1—figure 5).
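The recovery matrix of Equation A26 thus amounts to a multiple linear regression across Monte-Carlo simulations; a minimal sketch (the array-shape convention is ours):

```python
import numpy as np

# Regress each z-scored estimated parameter on all z-scored simulated
# parameters; squared weights give the 'recovery matrix' of Equation A26.
def recovery_matrix(phi_sim, phi_est):
    # phi_sim, phi_est: (n_simulations, n_parameters) arrays
    zs = (phi_sim - phi_sim.mean(0)) / phi_sim.std(0)
    ze = (phi_est - phi_est.mean(0)) / phi_est.std(0)
    theta, *_ = np.linalg.lstsq(zs, ze, rcond=None)  # theta[i', i]: weight of sim i' on est i
    return (theta.T) ** 2                            # row i: variability sources of estimate i
```

Diagonal entries measure 'correct variability' and off-diagonal entries 'incorrect variability'; perfect identifiability corresponds to an identity-like matrix.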
One can see that parameter recovery is far from perfect. This is in fact expected, given the high amount of simulation noise. However, no parameter estimate exhibits any noticeable estimation bias, that is, estimation error is nonsystematic and directly results from limited data reliability. Recovery matrices provide further quantitative insight regarding the accuracy of parameter estimation.
First, variability in all parameter estimates is mostly driven by variability in the corresponding simulated parameter (amount of 'correct variability': ${\phi}_{2}$: 5.3%, ${\phi}_{3}$: 17.4%, ${\phi}_{4}$: 22.1%, ${\phi}_{5}$: 22.7%, to be compared with 'incorrect variability' – see below), except for type #1 efficacy (${\phi}_{1}$: 0.3%). The latter estimate is thus comparatively much less efficient than the other MCD parameters. This is because $\beta =\mathrm{exp}\left({\phi}_{1}\right)$ only has a limited impact on MCD outputs. Second, there are no strong non-identifiability issues (the total amount of 'incorrect variability' is always below 2.7%, even when including nuisance affine transform parameters ${\phi}_{6:17}$), except for type #2 effort efficacy. In particular, the latter estimate may be partly confused with type #1 efficacy (amount of 'incorrect variability' driven by ${\phi}_{1}$: 1.6%).
Having said this, the reliability of MCD parameter recovery is globally much weaker than in the ideal case, where data is not polluted with simulation noise (the amount of ‘correct variability’ in this case is higher than 95% for all parameters – results not shown). This means that acquiring data of higher quality and/or quantity may significantly improve inference on MCD parameters.
We note that the weak identifiability of type #1 effort efficacy (β) does not imply that some dependent variables will be less well predicted/postdicted than others. Recall that β indirectly influences all dependent variables, through its impact on the optimal amount of allocated resources. Therefore, all dependent variables provide information about β. Importantly, some dependent variables are more useful than others for estimating β. If empirical measures of these variables become unreliable (e.g., because they are very noisy), then β will not be identifiable. However, the reverse is not true. In fact, in our recovery analysis, we found no difference in postdiction accuracy across dependent variables. Now, the question of whether weak β identifiability may explain (out-of-sample) prediction errors regarding the impact of MCD input variables (such as ΔVR^{0}) on dependent variables is more subtle. This is because, by construction, MCD parameters control the way MCD input variables eventually influence dependent variables. As one can see from the analytical derivations in section 2 of this Appendix, the impact of input variables on MCD dependent variables (in particular, the optimal amount of allocated resources) depends upon whether β dominates effort efficacy (cf. 'β-effect') or not (cf. 'γ-effect'). For example, if β dominates, then the relationship between ΔVR^{0} and effort is bell-shaped (cf. Figure S6), whereas it is monotonic if β = 0 (cf. Figure S7). This means that estimation errors on β may confuse the predicted relationship between input variables and MCD dependent variables.
4. Data descriptive statistics and sanity checks
Recall that we collected value ratings and value certainty ratings both before and after the choice session. We did this for the purpose of validating specific predictions of the MCD model (in particular, choice-induced preference changes: see Figure 10 in the main text). It turns out this also enables us to assess the test–retest reliability of both value and value certainty ratings. We found that both ratings were significantly reproducible (value: mean correlation = 0.88, s.e.m. = 0.01, p<0.001; value certainty: mean correlation = 0.37, s.e.m. = 0.04, p<0.001).
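The test–retest analysis above can be sketched as follows, using toy data (the numbers of subjects and items, the noise level, and the resulting statistics are all illustrative, not the actual data):

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_items = 30, 50

# Toy data: per-subject pre- and post-session ratings of the same items,
# where post = pre + noise (test-retest reliability by construction).
pre = rng.standard_normal((n_subjects, n_items))
post = pre + 0.3 * rng.standard_normal((n_subjects, n_items))

# Per-subject test-retest correlation, summarized at the group level
# (random-effect analysis: one-sample t-test against zero).
r = np.array([np.corrcoef(pre[s], post[s])[0, 1] for s in range(n_subjects)])
mean_r = r.mean()
sem_r = r.std(ddof=1) / np.sqrt(n_subjects)
t_stat = mean_r / sem_r
print(f"mean r = {mean_r:.2f}, s.e.m. = {sem_r:.3f}, t = {t_stat:.1f}")
```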
We also checked whether choices were consistent with pre-choice ratings. For each participant, we thus performed a logistic regression of choices against the difference in value ratings. We found that the balanced prediction accuracy was beyond chance level (mean accuracy = 0.68, s.e.m. = 0.01, p<0.001).
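A minimal sketch of this per-participant analysis is given below, on simulated choices (the choice model, slope, and trial count are assumptions for illustration; logistic regression is fit with a few Newton–Raphson steps to keep the example self-contained):

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials = 400

dv = rng.uniform(-2, 2, n_trials)          # difference in value ratings
p_choose = 1 / (1 + np.exp(-1.5 * dv))     # toy choice model (slope = 1.5)
choice = (rng.uniform(size=n_trials) < p_choose).astype(int)

# Logistic regression of choice on dv (intercept + slope, Newton-Raphson).
X = np.column_stack([np.ones(n_trials), dv])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (choice - p))

# Balanced accuracy: average of the per-class hit rates.
pred = (X @ beta > 0).astype(int)
ba = 0.5 * (pred[choice == 1].mean() + (1 - pred[choice == 0]).mean())
print(f"slope = {beta[1]:.2f}, balanced accuracy = {ba:.2f}")
```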
5. Does choice confidence moderate the relationship between choice and pre-choice value ratings?
Previous studies of confidence in value-based choices showed that choice confidence moderates choice prediction accuracy (De Martino et al., 2013). We thus split our logistic regression of choices into high- and low-confidence trials, and tested whether higher confidence was consistently associated with increased choice prediction accuracy. A random effect analysis showed that the regression slopes were significantly higher for high- than for low-confidence trials (mean slope difference = 0.14, s.e.m. = 0.03, p<0.001). For the sake of completeness, the impact of choice confidence on the slope of the logistic regression (of choice onto the difference in pre-choice value ratings) is shown in Appendix 1—figure 6.
These results clearly replicate the findings of De Martino et al., 2013, which were interpreted with a race-model variant of the accumulation-to-bound principle. We note, however, that this effect is also predicted by the MCD model. Here, variations in both (i) the accuracy with which choice can be predicted from pre-choice value ratings and (ii) choice confidence are driven by variations in resource allocation. In brief, the expected magnitude of the perturbation of value representations increases with the amount of allocated resources. This eventually increases the probability of a change of mind. However, although more resources are allocated to the decision, this does not overcompensate for decision difficulty, and thus choice confidence decreases. Thus, low-confidence choices will be those choices that are more likely to be associated with a change of mind. We note that the anticorrelation between choice confidence and change of mind can be seen by comparing Figures 7 and 8 in the main text.
6. How do choice confidence, difference in pre-choice value ratings, and response time relate to each other?
In the main text, we show that trial-by-trial variation in choice confidence is concurrently explained by both pre-choice value and value certainty ratings. Here, we reproduce previous findings relating choice confidence to both the absolute value difference ΔVR^{0} and response time (De Martino et al., 2013). First, for each participant, we regressed response time concurrently against both ΔVR^{0} and choice confidence. A random effect analysis showed that both have a significant main effect on response time (ΔVR^{0}: mean GLM beta = −0.016, s.e.m. = 0.003, p<0.001; choice confidence: mean GLM beta = −0.014, s.e.m. = 0.002, p<0.001), without any two-way interaction (p=0.133). This analysis is summarized in Appendix 1—figure 7, together with the full three-way relationship between ΔVR^{0}, confidence, and response time.
In brief, confidence increases with the absolute value difference and decreases with response time. This effect is also predicted by the MCD model, for reasons identical to the explanation of the relationship between confidence and choice accuracy (see above). Recall that, overall, an increase in choice difficulty is expected to yield an increase in response time and a decrease in choice confidence. This would produce the same data pattern as Appendix 1—figure 7, although the causal relationships implicit in this data representation are partially incongruent with the computational mechanisms underlying MCD.
7. Do post-choice ratings better predict choice and choice confidence than pre-choice ratings?
The MCD model assumes that value representations are modified during the decision process, until the MCD-optimal amount of resources has been allocated. This eventually triggers the decision, whose properties (i.e., which alternative option is eventually preferred, and with which confidence level) then reflect the modified value representations. If post-choice ratings are reports of the modified value representations at the time when the choice is triggered, then choice and its associated confidence level should be better predicted with post-choice ratings than with pre-choice ratings. In what follows, we test this prediction.
In Section 4 of this Appendix, we report the result of a logistic regression of choice against pre-choice value ratings (see also Appendix 1—figure 6). We performed the same regression analysis, but this time against post-choice value ratings. For each subject, we then measured the ensuing predictive power (here, in terms of balanced accuracy or BA) for both pre-choice and post-choice ratings. The main text also features the result of a multiple linear regression of choice confidence ratings onto ΔVR^{0} and VCR^{0} (Figure 8 in the main text). Again, we performed the same regression, this time against post-choice ratings. For each subject, we then measured the ensuing predictive power (here, in terms of percentage of explained variance or R^{2}) for both pre-choice and post-choice ratings.
A simple random effect analysis shows that the predictive power of post-choice ratings is significantly higher than that of pre-choice ratings, both for choice (mean difference in BA = 7%, s.e.m. = 0.01, p<0.001) and choice confidence (mean difference in R^{2} = 3%, s.e.m. = 0.01, p=0.004).
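The random-effect comparison above can be sketched as follows (toy per-subject accuracy scores, not the actual data; the paired comparison reduces to a one-sample t-test on within-subject differences):

```python
import numpy as np

rng = np.random.default_rng(6)
n_subjects = 30

# Toy per-subject predictive power (e.g., balanced accuracy) for choice,
# computed once with pre-choice and once with post-choice ratings.
ba_pre = rng.uniform(0.6, 0.75, n_subjects)
ba_post = ba_pre + 0.07 + 0.03 * rng.standard_normal(n_subjects)

# Random-effect analysis: one-sample t-test on the within-subject difference.
diff = ba_post - ba_pre
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(n_subjects))
print(f"mean difference = {diff.mean():.3f}, t = {t_stat:.1f}")
```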
8. Analysis of eye-tracking data
We first checked whether pupil dilation positively correlates with participants' subjective effort ratings. We epoched the pupil size data into trial-by-trial time series, and temporally co-registered the epochs either at stimulus onset (starting 1.5 s before stimulus onset and lasting 5 s) or at choice response (starting 3.5 s before the choice response and lasting 5 s). Data were baseline-corrected at stimulus onset. For each participant, we then regressed, at each time point during the decision, pupil size onto effort ratings (across trials). Time series of regression coefficients were then reported at the group level and tested for statistical significance (correction for multiple comparisons was performed using one-dimensional random field theory, 1D-RFT). Appendix 1—figure 8 summarizes this analysis, in terms of the baseline-corrected time series of regression coefficients.
We found that the correlation between subjective effort ratings and pupil dilation became significant from 500 ms after stimulus onset onwards. Note that, using the same approach, we found a negative correlation between pupil dilation and the pre-choice absolute value difference ΔVR^{0}. However, this relationship disappeared when we entered both ΔVR^{0} and effort into the same regression model.
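The time-point-wise regression above can be sketched on simulated pupil epochs (the epoch length, sampling grid, noise level, and the effort-related dilation kernel are all assumptions for illustration; significance testing and 1D-RFT correction are omitted):

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_time = 80, 100
t = np.linspace(-1.5, 3.5, n_time)   # peristimulus time (s), stimulus at 0

effort = rng.standard_normal(n_trials)
# Toy pupil epochs: an effort-related dilation ramping up from ~0.5 s
# after stimulus onset, plus white noise.
kernel = np.clip(t - 0.5, 0, None)
pupil = effort[:, None] * kernel[None, :] \
    + rng.standard_normal((n_trials, n_time))
pupil -= pupil[:, t < 0].mean(axis=1, keepdims=True)  # baseline correction

# At each time point, regress pupil size onto (z-scored) effort across trials.
x = (effort - effort.mean()) / effort.std()
betas = (x @ pupil) / (x @ x)        # OLS slope per time point

print(f"max |beta| before onset: {np.abs(betas[t < 0]).max():.2f}")
print(f"beta at end of epoch:    {betas[-1]:.2f}")
```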
Our eye-tracking data also allowed us to ascertain which item was being gazed at for each point in peristimulus time (during decisions). Using the choice responses, we classified each time point as a gaze at the (to-be) chosen item or at the (to-be) rejected item. We then derived, for each decision, the ratio of time spent gazing at the chosen/rejected item versus the total duration of the decision (between stimulus onset and choice response). The difference between these two gaze ratios measures the overt attentional bias toward the chosen item; we refer to this as the gaze bias. Consistent with previous studies, we found that chosen items were gazed at more than rejected items (mean gaze bias = 0.02, s.e.m. = 0.01, p=0.067). However, we also found that this effect was in fact limited to low-effort choices. Appendix 1—figure 9 shows the gaze bias for low- and high-effort trials, based on a median split of subjective effort.
We found a significant gaze bias for low-effort choices (mean gaze ratio difference = 0.033, s.e.m. = 0.013, p=0.009), but not for high-effort choices (mean gaze ratio difference = 0.002, s.e.m. = 0.014, p=0.453). A trivial potential explanation for the large gaze bias in low-effort trials is that these are the trials in which participants immediately recognize their favorite option, which then attracts their attention. More interesting is the fact that the gaze bias is null for high-effort trials. This may be taken as evidence that, on average, people allocate the same amount of (attentional) resources to both options. This is important, because we use this simplifying assumption in our MCD model derivations.
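The gaze-bias measure and the median split can be sketched as follows (all quantities are simulated; the effort-dependent shrinkage of the bias is built into the toy data by assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
n_trials = 200

effort = rng.uniform(0, 1, n_trials)   # subjective effort per trial

# Toy gaze ratios: fraction of the decision spent gazing at the (to-be)
# chosen vs rejected item; the chosen-item advantage shrinks with effort.
bias_true = 0.06 * (1 - effort)
chosen_ratio = 0.5 + bias_true / 2 + 0.02 * rng.standard_normal(n_trials)
rejected_ratio = 1.0 - chosen_ratio    # the two items share the epoch
gaze_bias = chosen_ratio - rejected_ratio

# Median split on subjective effort, as in the analysis above.
low = effort <= np.median(effort)
print(f"low-effort gaze bias:  {gaze_bias[low].mean():.3f}")
print(f"high-effort gaze bias: {gaze_bias[~low].mean():.3f}")
```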
9. Comparison with evidence-accumulation (DDM) models
In the main text, we evaluate the accuracy of the MCD model predictions without considering alternative computational scenarios. Here, we report the results of a model-based data analysis that relies on the standard drift-diffusion model (DDM) of value-based decision-making (De Martino et al., 2013; Lopez-Persem et al., 2016; Milosavljevic et al., 2010; Ratcliff et al., 2016; Tajima et al., 2016).
In brief, DDMs tie together decision outcomes and response times by assuming that decisions are triggered once the accumulated evidence in favor of a particular option has reached a predefined threshold or bound (Ratcliff and McKoon, 2008; Ratcliff et al., 2016). Importantly here, evidence accumulation has two components: a drift term that quantifies the strength of evidence, and a random diffusion term that captures some form of neural perturbation of evidence accumulation. The latter term allows choice outcomes to deviate from otherwise deterministic, evidence-driven decisions.
Importantly, standard DDMs do not predict choice confidence, spreading of alternatives, value certainty gain, or subjective effort ratings. This is because these concepts have no straightforward definition under the standard DDM. However, DDMs can be used to make out-of-sample trial-by-trial predictions of, for example, decision outcomes, from parameter estimates obtained with response times alone. This enables a straightforward comparison of the MCD and DDM frameworks, in terms of the accuracy of RT 'postdictions' and out-of-sample predictions of changes of mind. Here, we also make sure both models rely on the same inputs: namely, pre-choice value (ΔVR^{0}) and value certainty (VCR^{0}) ratings, as well as information about task conditions.
The simplest DDM variant includes the following set of five unknown parameters: the drift rate $v$, the bound height $b$, the standard deviation $\sigma$ of the diffusion term, the initial decision bias ${x}_{0}$, and the non-decision time ${T}_{nd}$. Given these model parameters, the expected response time (conditional on the decision outcome) is given by (Srivastava et al., 2016):
where $o\in \left\{-1,1\right\}$ is the decision outcome. One can then evaluate Equation A27 at each trial, given its corresponding set of DDM parameters. In particular, if one knows how, for example, drift rates vary over trials, then one can predict the ensuing expected RT variations. In typical applications to value-based decision-making, drift rates are set proportional to the difference ΔVR^{0} in value ratings (De Martino et al., 2013; Krajbich et al., 2010; Lopez-Persem et al., 2016; Milosavljevic et al., 2010). One can then define a likelihood function for observed response times from the following observation equation: $RT=E\left[RT|o,v,{x}_{0},b,\sigma ,{T}_{nd}\right]+\epsilon$, where $\epsilon$ are trial-by-trial DDM residuals. The variational Laplace treatment of the ensuing generative model then yields estimates of the remaining DDM parameters.
Out-of-sample predictions of changes of mind (i.e., decision errors) can then be derived from DDM parameter estimates (Bogacz et al., 2006):
where ${Q}_{DDM}$ is the DDM equivalent to the probability $Q\left(\widehat{z}\right)$ of a change of mind under the MCD model (see Equation 14 in the main text).
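Equations A27 and A28 themselves are not reproduced above. For orientation only, in the unbiased special case ($x_0 = 0$, symmetric bounds at $\pm b$), the classical first-passage results (Bogacz et al., 2006) reduce to the following; the full conditional moments with bias $x_0$, as used in Equation A27, are more involved (Srivastava et al., 2016):

```latex
% Unbiased DDM (x_0 = 0, bounds at ±b): error rate and mean decision time;
% RT = DT + T_nd then adds the non-decision time.
P(\text{error}) = \frac{1}{1 + \exp\left( 2 v b / \sigma^2 \right)},
\qquad
E[\mathrm{DT}] = \frac{b}{v}\,\tanh\!\left( \frac{v b}{\sigma^2} \right)
```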
Here, we use two modified variants of the standard DDM for value-based decisions. In both variants, we allow the DDM system to change its speed–accuracy tradeoff according to whether the decision is consequential ($u^{(c)}=1$) or not ($u^{(c)}=0$), and/or 'penalized' ($u^{(p)}=1$) or not ($u^{(p)}=0$). This is done by enabling the decision bound to vary over trials, that is, ${b}_{t}\equiv \mathrm{exp}\left({b}^{(0)}+{b}^{(c)}{u}_{t}^{(c)}+{b}^{(p)}{u}_{t}^{(p)}\right)$, where $t$ indexes trials. Here, ${b}^{(0)}$, ${b}^{(c)}$, and ${b}^{(p)}$ are unknown parameters that quantify the bound height for 'neutral' decisions, and the strength of the 'consequential' and 'penalized' condition effects, respectively. The exponential mapping is used to impose a positivity constraint on the resulting bound (see section 8 above). One might then expect that ${b}^{(c)}>0$ and ${b}^{(p)}<0$, that is, 'consequential' decisions demand more evidence than 'neutral' ones, whereas 'penalized' decisions favor speed over accuracy.
The two DDM variants then differ in terms of how prechoice value certainty is taken into account (Lee and Usher, 2020):
DDM1: at each trial, the drift rate is set to the affine-transformed certainty-weighted value difference, that is, ${\nu}_{t}\equiv {\nu}^{(0)}+{\nu}^{(s)}\times VC{R}_{t}^{0}\times \Delta V{R}_{t}^{0}$, where ${\nu}^{(0)}$ and ${\nu}^{(s)}$ are unknown parameters that control the offset and slope of the affine transform, respectively. Here, the strength of evidence in favor of a given alternative option is measured in terms of a signal-to-noise ratio on value. Note that the diffusion standard deviation $\sigma$ is kept fixed across trials.
DDM2: at each trial, the drift rate is set to the affine-transformed value difference, that is, ${\nu}_{t}\equiv {\nu}^{(0)}+{\nu}^{(s)}\times \Delta V{R}_{t}^{0}$, and the diffusion standard deviation is allowed to vary over trials with value certainty ratings: ${\sigma}_{t}\equiv \mathrm{exp}\left({\sigma}^{(0)}-\mathrm{exp}\left({\sigma}^{(1)}\right)\times VC{R}_{t}^{0}\right)$. Here, ${\sigma}^{(0)}$ and ${\sigma}^{(1)}$ are unknown parameters that quantify the fixed and varying components of the diffusion standard deviation, respectively. In this parameterization, value representations that are more certain will be signaled more reliably. Note that the statistical complexity of DDM2 is higher than that of DDM1 (one additional unknown parameter).
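A minimal sketch of the two trial-wise parameterizations is given below. All parameter values and ratings are made up for illustration (they would normally be estimated from RT data), and the negative sign on the certainty term in DDM2 is an assumption chosen so that higher certainty yields a smaller diffusion standard deviation, as described above:

```python
import numpy as np

# Hypothetical trial data: value difference, value certainty, condition flags.
dVR = np.array([0.8, -0.3, 1.2, 0.1])
VCR = np.array([0.9, 0.4, 0.7, 0.2])
u_c = np.array([1, 0, 1, 0])     # 'consequential' condition indicator
u_p = np.array([0, 1, 0, 1])     # 'penalized' condition indicator

# Illustrative parameter values (assumptions, not estimates).
b0, bc, bp = 0.0, 0.4, -0.3      # bound: neutral height + condition effects
v0, vs = 0.0, 1.0                # drift offset and slope
s0, s1 = 0.0, 0.5                # diffusion: fixed and certainty-varying parts

bound = np.exp(b0 + bc * u_c + bp * u_p)    # shared by both variants

drift_ddm1 = v0 + vs * VCR * dVR            # DDM1: certainty-weighted drift
sigma_ddm1 = np.ones_like(dVR)              # DDM1: fixed diffusion s.d.

drift_ddm2 = v0 + vs * dVR                  # DDM2: plain value difference
sigma_ddm2 = np.exp(s0 - np.exp(s1) * VCR)  # DDM2: certainty shrinks diffusion

print(bound, drift_ddm1, sigma_ddm2)
```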
For each subject and each DDM variant, we estimate the unknown parameters from RT data alone using Equation A27, and derive out-of-sample predictions for changes of mind using Equation A28. We then measure the accuracy of trial-by-trial RT postdictions and out-of-sample change-of-mind predictions, in terms of the correlation between observed and predicted/postdicted variables. We also perform the exact same analysis under the MCD model (this is slightly different from the analysis reported in the main text, because only RT data is included in model fitting here).
To begin with, we compare the accuracy of RT postdictions, which is summarized in Appendix 1—figure 10.
One can see that the RT postdiction accuracy of both DDMs is higher than that of the MCD model. In fact, one-sample paired t-tests on the difference between DDM and MCD within-subject accuracy scores show that this comparison is statistically significant (DDM1: mean accuracy difference = 12.3%, s.e.m. = 2.6%, p<10^{−3}; DDM2: mean accuracy difference = 10.5%, s.e.m. = 2.6%, p<10^{−3}; two-sided t-tests). In addition, one can see that DDM1 accurately captures variations in RT data that are induced by ΔVR^{0} and VCR^{0}. However, DDM2 is unable to reproduce the impact of VCR^{0} (cf. wrong effect direction). This is because, in DDM2, increasing value certainty ratings decrease the diffusion standard deviation, which lowers the probability that the DDM bounds are hit early (hence prolonging RT on average). These results reproduce recent investigations of the impact of value certainty ratings on DDM predictions (Lee and Usher, 2020).
Now, Appendix 1—figure 11 summarizes the accuracy of outofsample change of mind predictions.
It turns out that the MCD model exhibits the highest accuracy of out-of-sample change-of-mind predictions. One-sample paired t-tests on the difference between DDM and MCD within-subject accuracy scores show that this comparison reaches statistical significance for both DDM1 (mean accuracy difference = −5%, s.e.m. = 2.4%, p=0.046; two-sided t-test) and DDM2 (mean accuracy difference = −9.9%, s.e.m. = 3.4%, p=0.006; two-sided t-test). One can also see that neither DDM variant accurately predicts the effects of ΔVR^{0} and VCR^{0}.
In brief, the DDM framework might be better than the MCD model at capturing trial-by-trial variations in RT data. This may not be surprising, given the long-standing success of the DDM on this issue (Ratcliff et al., 2016). The result of this comparison, however, depends upon how the DDM is parameterized (cf. wrong effect direction of VCR^{0} for DDM2). More importantly, in our context, DDMs make poor out-of-sample predictions of decision outcomes, at least when compared to the MCD model. For the purpose of predicting decision-related variables from effort-related variables, one would thus favor the MCD framework.
10. Accounting for a saturating γ-effect
When deriving the MCD model, we considered a linear γ-effect, that is, we assumed that the variance of the perturbation $\delta \left(z\right)$ of value representation modes increases linearly with the amount $z$ of allocated resources (Equation 6 in the main text). However, one might argue that the marginal impact of effort on the variance of $\delta \left(z\right)$ may decrease as further resources are allocated to the decision. In other terms, the magnitude of the perturbation (per unit of resources) that one might expect when no resources have yet been allocated may be much higher than when most resources have already been allocated. In turn, Equation 6 would be replaced by:
where the variance $f\left(z,\gamma \right)$ of the modes' perturbations would be a saturating function of $z$, e.g.:
where ${\gamma}_{1}$ is the maximum or plateau variance that perturbations can exhibit and ${\gamma}_{2}$ is the decay rate toward the plateau variance.
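The specific saturating function used here is not reproduced above. One simple form consistent with the stated properties (a plateau at $\gamma_1$, approached at rate $\gamma_2$, behaving like a linear γ-effect with slope $\gamma_1\gamma_2$ for small $z$) would be the following; this is a hypothetical example, not necessarily the exact expression used in the original analysis:

```latex
% One saturating variance function matching the description above (an assumption):
f(z, \gamma) = \gamma_1 \left( 1 - e^{-\gamma_2 z} \right),
\qquad
f(z, \gamma) \approx \gamma_1 \gamma_2 \, z \ \text{ for } \gamma_2 z \ll 1
```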
It turns out that this does not change the mathematical derivations of the MCD model, that is, model predictions still follow Equations 9–14 in the main text, having replaced $\gamma z$ with $f\left(z,\gamma \right)$ everywhere.
Model simulations with this modified MCD model show no qualitative difference from its simpler variant (linear γ-effect), across a wide range of ${\gamma}_{1,2}$ parameters. Having said this, the modified MCD model is in principle more flexible than its simpler variant, and may thus exhibit additional explanatory power. We thus performed a formal statistical model comparison to evaluate the potential advantage of considering saturating γ-effects. In brief, we performed the same within-subject analysis as with the simpler MCD variant (see main text). We then measured the accuracy of model postdictions on each dependent variable and performed a random-effect group-level Bayesian model comparison (Rigoux et al., 2014; Stephan et al., 2009). The results of this comparison are summarized in Appendix 1—figure 12.
First, one can see that considering saturating γ-effects does not provide any meaningful advantage in terms of MCD postdiction accuracy. Second, Bayesian model selection clearly favors the simpler (linear γ-effect) MCD variant (linear efficacy: estimated model frequency = 84.4 ± 5.5%, exceedance probability = 1, protected exceedance probability = 0.89). We note that other variants of the MCD model may be proposed, with similar modifications (e.g., nonlinear effort costs, non-Gaussian – skewed – value representations). Preliminary simulations seem to confirm that such modifications would not change the qualitative nature of MCD predictions. In other terms, the MCD model may be quite robust to these kinds of assumptions. Note that these modifications would necessarily increase the statistical complexity of the model (by inserting additional unknown parameters). Therefore, the limited reliability of behavioral data (such as we report here) may not afford subtle deviations from the simple MCD model variant we evaluate here.
11. Comparing MCD and model-free postdiction accuracy
The MCD model provides quantitative predictions for both effort-related and decision-related variables, from estimates of three native parameters (the unitary effort cost and two types of effort efficacy), which jointly control all dependent variables. However, the model's prediction accuracy is not perfect, and one may wonder what the added value of MCD is, compared to model-free analyses.
To begin with, recall that one cannot make out-of-sample predictions in a model-free manner (e.g., there is nothing one can learn about effort-related variables from regressions of decision-related variables on ΔVR^{0} and VCR^{0}). In contrast, a remarkable feature of model-based analyses is that training the model on some subset of variables is enough to make out-of-sample predictions on other (yet unseen) variables. In this context, MCD-based analyses show that variations in response times, subjective effort ratings, changes of mind, spreading of alternatives, choice confidence, and precision gain can be predicted from each other under a small subset of modeling assumptions.
Having said this, model-free analyses can be used to provide a reference for the accuracy of MCD postdictions. For example, one may regress each dependent variable onto ΔVR^{0}, VCR^{0}, and indicator variables of experimental conditions (whether or not the choice is 'consequential' and/or 'penalized'), and measure the correlation between observed and postdicted variables. This provides a benchmark against which MCD postdiction accuracy can be evaluated. To enable a fair statistical comparison, we re-performed the MCD model fits, this time fitting each dependent variable one by one (leaving the others out). In what follows, we refer to this as 'MCD 1-variable fits'. The results of this analysis are summarized in Appendix 1—figure 13.
As expected, MCD 1-variable fits have better postdiction accuracy than the MCD 'full-data' fit. This is because the latter approach attempts to explain all dependent variables with the same parameter set, which requires finding a compromise between all dependent variables.
Now, model-free regressions seem to show globally better postdiction accuracy than MCD 1-variable fits: on average, the MCD model captures about 81% of the variance explained using linear regressions. However, the postdiction accuracy difference is only significant for effort-related variables (RT: p=0.0002, subjective effort rating: p=0.0007) and for certainty gain (p<10^{−4}), but not for the other decision-related variables (choice confidence: p=0.06, spreading of alternatives: p=0.28, change of mind: p=0.24).
A likely explanation here is that the MCD model includes constraints that prevent 1-variable fits from matching the model-free postdiction accuracy level. In turn, one may want to extend the MCD model with the aim of relaxing these constraints. Having said this, these constraints necessarily derive from the modeling assumptions that enable the MCD model to make out-of-sample predictions. We comment on this and related issues in the Discussion section of the main text.
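The model-free benchmark described above amounts to the following sketch (toy data; the design matrix contains ΔVR^0, VCR^0, and the two condition indicators, and all effect sizes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n_trials = 300

dVR = rng.uniform(0, 2, n_trials)    # absolute value difference
VCR = rng.uniform(0, 1, n_trials)    # value certainty
u_c = rng.integers(0, 2, n_trials)   # 'consequential' indicator
u_p = rng.integers(0, 2, n_trials)   # 'penalized' indicator

# Toy dependent variable (e.g., log-RT) generated from the regressors + noise.
y = -0.5 * dVR - 0.3 * VCR + 0.4 * u_c - 0.2 * u_p \
    + rng.standard_normal(n_trials)

# Model-free benchmark: linear regression on dVR, VCR, and condition
# indicators, then correlate observed and postdicted values.
X = np.column_stack([np.ones(n_trials), dVR, VCR, u_c, u_p])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
postdicted = X @ beta
r = np.corrcoef(y, postdicted)[0, 1]
print(f"postdiction accuracy (correlation): {r:.2f}")
```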
Data availability
Empirical data as well as model-fitting code have been uploaded as part of this submission. The data are also publicly available at Dryad: https://doi.org/10.5061/dryad.7h44j0zsg.

Dryad Digital Repository: Lee and Daunizeau, choice data from 'Trading mental effort for confidence in the metacognitive control of value-based decision-making'. https://doi.org/10.5061/dryad.7h44j0zsg
References

Book: Possible principles underlying the transformations of sensory messages. In: Rosenblith WA, editor. Sensory Communication. MIT Press. pp. 217–234. https://doi.org/10.7551/mitpress/9780262518420.003.0013

Book: Variational Algorithms for Approximate Bayesian Inference (Doctoral Dissertation). UCL (University College London).

Self-perception: an alternative interpretation of cognitive dissonance phenomena. Psychological Review 74:183–200. https://doi.org/10.1037/h0024835

Rostral prefrontal cortex and the focus of attention in prospective memory. Cerebral Cortex 22:1876–1886. https://doi.org/10.1093/cercor/bhr264

Fixation patterns in simple choice reflect optimal information sampling. PLOS Computational Biology 17:e1008863. https://doi.org/10.1371/journal.pcbi.1008863

How choice affects and reflects preferences: revisiting the free-choice paradigm. Journal of Personality and Social Psychology 99:573–594. https://doi.org/10.1037/a0020217

Neural mechanisms of cognitive dissonance (revised): an EEG study. The Journal of Neuroscience 37:5074–5083. https://doi.org/10.1523/JNEUROSCI.3209-16.2017

VBA: a probabilistic treatment of nonlinear models for neurobiological and behavioural data. PLOS Computational Biology 10:e1003441. https://doi.org/10.1371/journal.pcbi.1003441

Evidence for time-variant decision making. European Journal of Neuroscience 24:3628–3641. https://doi.org/10.1111/j.1460-9568.2006.05221.x

The cost of accumulating evidence in perceptual decision making. Journal of Neuroscience 32:3612–3628. https://doi.org/10.1523/JNEUROSCI.4010-11.2012

Development of abstract thinking during childhood and adolescence: the role of rostrolateral prefrontal cortex. Developmental Cognitive Neuroscience 10:57–76. https://doi.org/10.1016/j.dcn.2014.07.009

Comparing perceptual and preferential decision making. Psychonomic Bulletin & Review 23:723–737. https://doi.org/10.3758/s13423-015-0941-1

Multitasking versus multiplexing: toward a normative account of limitations in the simultaneous execution of control-demanding behaviors. Cognitive, Affective, & Behavioral Neuroscience 14:129–146. https://doi.org/10.3758/s13415-013-0236-9

The neural basis of decision making. Annual Review of Neuroscience 30:535–574. https://doi.org/10.1146/annurev.neuro.29.051605.113038

A dimer-type saddle search algorithm with preconditioning and linesearch. Mathematics of Computation 85:2939–2966. https://doi.org/10.1090/mcom/3096

Implicit social cognition: attitudes, self-esteem, and stereotypes. Psychological Review 102:4–27. https://doi.org/10.1037/0033-295X.102.1.4

Action-based model of dissonance: a review, integration, and expansion of conceptions of cognitive conflict. Advances in Experimental Social Psychology 41:119–166. https://doi.org/10.1016/S0065-2601(08)00403-6

The speed-accuracy tradeoff: history, physiology, methodology, and behavior. Frontiers in Neuroscience 8:150. https://doi.org/10.3389/fnins.2014.00150

The neural basis of rationalization: cognitive dissonance reduction during decision-making. Social Cognitive and Affective Neuroscience 6:460–467. https://doi.org/10.1093/scan/nsq054

Book: Judgment Under Uncertainty: Heuristics and Biases. Cambridge University Press. https://doi.org/10.1007/978-94-010-1834-0_8

Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience 13:1292–1298. https://doi.org/10.1038/nn.2635

An opportunity cost model of subjective effort and task performance. Behavioral and Brain Sciences 36:661–679. https://doi.org/10.1017/S0140525X12003196

A simple coding procedure enhances a neuron's information capacity. Zeitschrift für Naturforschung C 36:910–912. https://doi.org/10.1515/znc-1981-9-1040

Automatic integration of confidence in the brain valuation signal. Nature Neuroscience 18:1159–1167. https://doi.org/10.1038/nn.4064

An empirical test of the role of value certainty in decision making. Frontiers in Psychology 11:574473. https://doi.org/10.3389/fpsyg.2020.574473

New directions in goal-setting theory. Current Directions in Psychological Science 15:265–268. https://doi.org/10.1111/j.1467-8721.2006.00449.x

Efficient coding and the neural representation of value. Annals of the New York Academy of Sciences 1251:13–32. https://doi.org/10.1111/j.1749-6632.2012.06496.x

Dynamics of winner-take-all competition in recurrent neural networks with lateral inhibition. IEEE Transactions on Neural Networks 18:55–69. https://doi.org/10.1109/TNN.2006.883724

Capacity limits of information processing in the brain. Trends in Cognitive Sciences 9:296–305. https://doi.org/10.1016/j.tics.2005.04.010

The drift diffusion model can account for value-based choice response times under high and low time pressure. Judgment and Decision Making 5:437–449.

Conference: A computational model of control allocation based on the expected value of control. Reinforcement Learning and Decision Making Conference.

A supramodal accumulation-to-bound signal that determines perceptual decisions in humans. Nature Neuroscience 15:1729–1735. https://doi.org/10.1038/nn.3248

Irrational time allocation in decision-making. Proceedings of the Royal Society B: Biological Sciences 283:20151439. https://doi.org/10.1098/rspb.2015.1439

When natural selection should optimize speed-accuracy tradeoffs. Frontiers in Neuroscience 8:73. https://doi.org/10.3389/fnins.2014.00073

Efficient coding of subjective value. Nature Neuroscience 22:134–142. https://doi.org/10.1038/s41593-018-0292-0

Acute stress influences neural circuits of reward processing. Frontiers in Neuroscience 6:157. https://doi.org/10.3389/fnins.2012.00157

Acute stress modulates risk taking in financial decision making. Psychological Science 20:278–283. https://doi.org/10.1111/j.1467-9280.2009.02288.x

A framework for studying the neurobiology of value-based decision making. Nature Reviews Neuroscience 9:545–556. https://doi.org/10.1038/nrn2357

Diffusion decision model: current issues and history. Trends in Cognitive Sciences 20:260–281. https://doi.org/10.1016/j.tics.2016.01.007

The construction of preference. American Psychologist 50:364–371. https://doi.org/10.1037/0003-066X.50.5.364

Emotion regulation reduces loss aversion and decreases amygdala responses to losses. Social Cognitive and Affective Neuroscience 8:341–350. https://doi.org/10.1093/scan/nss002

Explicit moments of decision times for single- and double-threshold drift-diffusion processes. Journal of Mathematical Psychology 75:96–109. https://doi.org/10.1016/j.jmp.2016.03.005

Bayesian model selection for group studies. NeuroImage 46:1004–1017. https://doi.org/10.1016/j.neuroimage.2009.03.025

Regression towards the mean, historically considered. Statistical Methods in Medical Research 6:103–114. https://doi.org/10.1177/096228029700600202

Optimal policy for value-based decision-making. Nature Communications 7:12400. https://doi.org/10.1038/ncomms12400

Optimal policy for multi-alternative decisions. Nature Neuroscience 22:1503–1511. https://doi.org/10.1038/s41593-019-0453-9

Anomalies: preference reversals. Journal of Economic Perspectives 4:201–211. https://doi.org/10.1257/jep.4.2.201

Neural activity predicts attitude change in cognitive dissonance. Nature Neuroscience 12:1469–1474. https://doi.org/10.1038/nn.2413

Hard decisions shape the neural coding of preferences. The Journal of Neuroscience 39:718–726. https://doi.org/10.1523/JNEUROSCI.1681-18.2018

Values and preferences: defining preference construction. Wiley Interdisciplinary Reviews: Cognitive Science 2:193–205. https://doi.org/10.1002/wcs.98

A Bayesian observer model constrained by efficient coding can explain 'anti-Bayesian' percepts. Nature Neuroscience 18:1509–1517. https://doi.org/10.1038/nn.4105

Anodal transcranial direct current stimulation to the left rostrolateral prefrontal cortex selectively improves source memory retrievalJournal of Cognitive Neuroscience 31:1380–1391.https://doi.org/10.1162/jocn_a_01421

Choice variability and suboptimality in uncertain environmentsCurrent Opinion in Behavioral Sciences 11:109–115.https://doi.org/10.1016/j.cobeha.2016.07.003
Decision letter

Tobias H Donner, Reviewing Editor; University Medical Center Hamburg-Eppendorf, Germany

Michael J Frank, Senior Editor; Brown University, United States

Andrew Westbrook, Reviewer
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This work addresses a timely and heavily debated subject: the role of mental effort in value-based decision-making. Plenty of models attempt to explain value-based choice behavior, and there is a growing number of computational accounts concerning the allocation of mental effort therein. Yet, little theoretical work has been done to relate the two literatures. The current paper contributes a novel and inspiring step in this direction.
Decision letter after peer review:
Thank you for submitting your article "Trading Mental Effort for Confidence in the Metacognitive Control of Value-Based Decision-Making" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by Tobias Donner as the Reviewing Editor and Michael Frank as the Senior Editor. The following individual involved in the review of your submission has agreed to reveal their identity: Andrew Westbrook (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
As the editors have judged that your manuscript is of interest but, as described below, additional experiments are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to their labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option.
Summary:
This manuscript addresses a timely subject: the role of cognitive control (or mental effort) in value-based decision making. While there are plenty of models explaining value-based choice, and there is a growing number of computational accounts concerning effort allocation, little theoretical work has been done to relate the two literatures. This manuscript contributes a novel and interesting step in this direction, by introducing a computational account of metacontrol in value-based decision making. According to this account, metacontrol can be described as a cost-benefit analysis that weighs the benefits of allocating mental effort against the associated costs. The benefits of mental effort pertain to the integration of value-relevant information to form posterior beliefs about option values. Given a small set of parameters, as well as pre-choice value ratings and pre-choice uncertainty ratings as inputs, the model can predict relevant decision variables as outputs, such as choice accuracy, choice confidence, choice-induced preference changes, response time, and subjective effort ratings. The study fits the model to data from a behavioral experiment involving value-based decisions between food items. The resulting behavioral fits reproduce a number of predictions derived from the model. Finally, the article describes how the model relates to established accumulator models of decision-making.
The (relatively simple) model is impressive in its apparent ability to successfully reproduce qualitative patterns across diverse data, including choices, RTs, choice confidence ratings, subjective effort, and choice-induced changes in relative preferences. The model also appears well-motivated, well-reasoned, and well-formulated. While all reviewers agreed that the manuscript is of potential interest, they also all felt that a stronger case needs to be made for the explanatory power of the model, and that the model should be embedded more thoroughly in the existing literature on this topic.
Essential revisions:
1. Evaluation of the explanatory power of the model.
1a. Parameter recoverability: Please include an analysis of parameter recoverability: How well can the fitting procedure recover model parameters from data generated by the model?
1b. Fitting procedure: Rather than fitting the model to all dependent variables at once, it would be more compelling to fit the model to a subset of established decision-related variables (e.g., accuracy, choice confidence, choice-induced preference changes) and then evaluate if, and how well, the fitted model can predict out-of-sample variables related to effort allocation (e.g., response time and subjective effort ratings). The latter would be a more stringent test of the model, and may serve to highlight its value for linking variables related to value-based decision making to variables related to metacontrol.
1c. Model complexity: Assess (through model comparison) how many degrees of freedom are needed to account for the data (e.g., by fixing some of the crucial parameters and evaluating the fit). Currently, the authors show that their model explains more variance in dependent variables when fit to real data than random data. Almost any model which systematically relates independent variables to dependent variables would explain more variance when fit to real data than to random data. It would be more useful to know whether (and if so, how much) the model explains the data better than, e.g., a model where effort only affects precision (β efficacy), or a model in which effort only impacts the value mode (γ efficacy).
1d. Single-subject data.
The model appears to do fairly well in predicting aggregate, group-level data, but does it predict subject-level data? Or does it sometimes make unrealistic predictions when fitting to individual subjects? The authors should provide evidence of whether it can or cannot describe subject-level choices, confidence ratings, subjective effort, etc.
2. Qualify central assumptions underlying the model.
2a. The model assumes that it is "rewarding" to choose the correct (highest-value) option (B = R*P). Is this realistic? If the two options have approximately the same value, then R should be small (it doesn't matter which one you choose); if the options have different values, it is important to choose the correct one. Of course, the probability P_{c} continuously differentiates between the two options, but that is not the same as the reward. Can the predictions generalise toward a more general R that depends on the value difference?
2b. Is it reasonable to assume that variance would increase as a linear function of resource allocation? It seems to me that variance might increase initially, but then each increment of resources would add diminishing variance to the mode since, e.g., new mnesic evidence should tend to follow old evidence. How sensitive are model predictions to this assumption? What about if each increment of resources added to variance in an exponentially decreasing fashion? What about anchoring biases? Because anchoring biases suggest that we estimate things with reference to other value cues, should we always expect that additional resources increase the expected value difference, or might additional effort actually yield smaller value differences over time? If we relax this assumption, how does this impact model predictions?
3. Address relationship to other accounts.
3a. Does the current model predict the diverse dependent variables better than standard accumulator models of decision-making?
3b. The model could also situate itself better in the broader existing literature on the topic. For instance, how does the model compare to existing computational work on this matter, e.g., the models described in Izuma and Murayama (2013) or the efficient coding account of Polanía, Woodford, and Ruff (2019)? We understand that the presented model can account for some phenomena that the other models cannot account for, at least without auxiliary assumptions (e.g., subjective effort ratings), but the interested reader might want to know how well the presented model can explain established decision-related variables, such as decision confidence, choice accuracy, or choice-induced preference changes compared to existing models, by having them contrasted in a formal manner. Finally, it would seem fair to relate the presented account to emerging, more mechanistically explicit accounts of metacontrol in value-based decision making (e.g., Callaway, Rangel and Griffiths, 2020; Jang, Sharma, and Drugowitsch, 2020). Ideally, some of the above would be addressed in the form of formal model comparisons, but we realise that this may be difficult to achieve in practice within a reasonable time frame. At the least, the manuscript should discuss in detail how the above-mentioned models differ from the model presented here.
https://doi.org/10.7554/eLife.63282.sa1

Author response
Essential revisions:
1. Evaluation of the explanatory power of the model.
1a. Parameter recoverability: Please include an analysis of parameter recoverability: How well can the fitting procedure recover model parameters from data generated by the model?
We have now included a parameter recovery analysis of the MCD model, as part of the new section 3 of the revised Appendix. Importantly, our parameter recovery was performed on simulated data with an SNR similar to that of our empirical data. In brief, MCD parameter recovery does not suffer from any strong non-identifiability issue. However, its reliability is much weaker than in the ideal case, where data are not polluted with simulation noise.
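Schematically, such a recovery analysis amounts to a simple loop: draw ground-truth parameters, simulate data with realistic noise, refit, and correlate true with recovered values. Below is a minimal sketch using a toy one-parameter model as a stand-in for MCD (the `simulate`/`fit` pair and all numerical settings are illustrative, not the paper's actual procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, x, noise_sd=0.5):
    # Toy stand-in for a generative model with one free parameter theta,
    # polluted with simulation noise (mimicking a realistic SNR).
    return theta * x + rng.normal(0.0, noise_sd, size=x.shape)

def fit(y, x):
    # Least-squares estimate of the single parameter.
    return float(x @ y / (x @ x))

x = np.linspace(0.1, 1.0, 50)                      # trial-wise design
true_params = rng.uniform(0.5, 2.0, size=40)       # ground-truth parameters
recovered = np.array([fit(simulate(t, x), x) for t in true_params])

# Recovery quality is summarized by the true-vs-recovered correlation.
r = np.corrcoef(true_params, recovered)[0, 1]
print(f"recovery correlation: {r:.3f}")
```

With weaker simulation noise, the correlation approaches 1; non-identifiability would instead show up as systematic structure in the true-vs-recovered scatter.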
1b. Fitting procedure: Rather than fitting the model to all dependent variables at once, it would be more compelling to fit the model to a subset of established decision-related variables (e.g., accuracy, choice confidence, choice-induced preference changes) and then evaluate if, and how well, the fitted model can predict out-of-sample variables related to effort allocation (e.g., response time and subjective effort ratings). The latter would be a more stringent test of the model, and may serve to highlight its value for linking variables related to value-based decision making to variables related to metacontrol.
This is an excellent suggestion. In fact, we have decided to generalize it, with the aim of providing a strong test of the model's ability to explain all dependent variables at once. We thus performed three distinct model fits: (i) with all dependent variables, (ii) with effort-related variables only (leaving "decision-related" variables out), and (iii) with decision-related variables only (leaving effort-related variables out). We did this for each subject, each time estimating a single (within-subject) set of model parameters. We then quantified, for each dependent variable, the model's prediction accuracy. This allows us to distinguish between the accuracy of "postdictions" (i.e., the trial-by-trial correlation between data and predictions for variables that were used for fitting the model) and the accuracy of proper out-of-sample predictions (i.e., the trial-by-trial correlation between data and predictions for variables that were not used for fitting the model). Note: the latter are formally derived from parameter estimates obtained when leaving the corresponding data out. The accuracy of postdictions and out-of-sample predictions is summarized in Figure 4 of the revised Results section. In our opinion, this analysis also addresses point 1d below, which relates to single-subject fit accuracy (see our response below). Note that we also report group-level summaries of out-of-sample predictions for each dependent variable, plotted against pre-choice value ratings and value certainty ratings (along with experimental data and model postdictions; see Figures 5 to 11 of the revised Results section).
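The logic of this out-of-sample test can be sketched as follows (a toy linear model rather than the MCD equations; the variable names, noise levels, and single shared parameter are illustrative): fit the parameter on one set of dependent variables, then correlate the model's predictions with the held-out variable, trial by trial:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in: one latent within-subject parameter drives two dependent
# variables (e.g., an "effort-related" and a "decision-related" measure).
n_trials = 200
difficulty = rng.uniform(0, 1, n_trials)       # trial-wise regressor
theta_true = 1.7                               # within-subject parameter

rt = theta_true * difficulty + rng.normal(0, 0.3, n_trials)                 # fitted
confidence = 1.0 - theta_true * difficulty + rng.normal(0, 0.3, n_trials)   # held out

# Fit theta on the "effort-related" variable only...
theta_hat = float(difficulty @ rt / (difficulty @ difficulty))

# ...then predict the held-out variable out of sample.
confidence_pred = 1.0 - theta_hat * difficulty
r_oos = np.corrcoef(confidence, confidence_pred)[0, 1]
print(f"out-of-sample prediction accuracy: r = {r_oos:.2f}")
```

A significant trial-by-trial correlation on the held-out variable is what distinguishes a genuine out-of-sample prediction from a mere postdiction.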
1c. Model complexity: Assess (through model comparison) how many degrees of freedom are needed to account for the data (e.g., by fixing some of the crucial parameters and evaluating the fit). Currently, the authors show that their model explains more variance in dependent variables when fit to real data than random data. Almost any model which systematically relates independent variables to dependent variables would explain more variance when fit to real data than to random data. It would be more useful to know whether (and if so, how much) the model explains the data better than, e.g., a model where effort only affects precision (β efficacy), or a model in which effort only impacts the value mode (γ efficacy).
We agree with you that we did not highlight explicit evidence for the existence of β and/or γ effects in the previous version of our manuscript. We have now revised our Results section to provide evidence for this. In fact, β and γ effects can be tested directly against empirical data. More precisely, under the MCD model, non-zero type #1 efficacy trivially implies that the precision of post-choice value representations should be higher than the precision of pre-choice value representations. Similarly, under the MCD model, non-zero type #2 efficacy implies the existence of spreading of alternatives. In our modified manuscript, we highlight and assess these predictions using simple significance testing on our data (see Figures 10 and 11 in the revised Results section). We note that we find this procedure more robust than model comparison in this case, given the limited reliability of parameter recovery.
1d. Single-subject data.
The model appears to do fairly well in predicting aggregate, group-level data, but does it predict subject-level data? Or does it sometimes make unrealistic predictions when fitting to individual subjects? The authors should provide evidence of whether it can or cannot describe subject-level choices, confidence ratings, subjective effort, etc.
We entirely agree with you. In the previous version of our manuscript, we had reported the accuracy of within-subject "postdictions" in the Appendix (former Figure S3). In the revised manuscript, we now report the accuracy of within-subject postdictions and out-of-sample predictions. In particular, we test for the significance of out-of-sample predictions, which provide direct evidence for the model's ability to predict within-subject trial-by-trial variations in each dependent variable. These results are reported in section 4.1 of the revised Results section (see Figure 4).
2. Qualify central assumptions underlying the model.
2a. The model assumes that it is "rewarding" to choose the correct (highest-value) option (B = R*P). Is this realistic? If the two options have approximately the same value, then R should be small (it doesn't matter which one you choose); if the options have different values, it is important to choose the correct one. Of course, the probability P_{c} continuously differentiates between the two options, but that is not the same as the reward. Can the predictions generalise toward a more general R that depends on the value difference?
If you mean that people do not care about the decision when the pre-choice values are similar, then we disagree with you. In brief, we have shown that both response time and subjective effort ratings decrease when the difference in pre-choice values increases (NB: this result has been reproduced many times for RT). In other words, effort is maximal when pre-choice values are similar. This is direct evidence against the idea that decision importance (i.e., R in the MCD model) should tend to zero for such "iso-value" decisions.
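To illustrate why the effort-confidence tradeoff predicts this pattern, here is a toy numerical sketch of the MCD logic (the efficacy functions, cost term, and all parameter values below are illustrative choices, not the paper's fitted model): confidence is the anticipated probability of choosing the truly best option, allocating resources z both sharpens the value representations (type #1 efficacy) and is anticipated to spread the value modes apart (type #2 efficacy), and the controller picks the z that maximizes R*Pc(z) - cost*z:

```python
from math import erf, sqrt

def p_correct(anticipated_dv, var):
    # Anticipated confidence: probability that the estimated value difference
    # (mean anticipated_dv, variance 2*var) has the correct sign.
    return 0.5 * (1.0 + erf(abs(anticipated_dv) / (2.0 * sqrt(var))))

def optimal_effort(dv, R=1.0, cost=0.15, sigma0=1.0, beta=2.0, gamma=0.3):
    # Illustrative efficacies: allocating z resources shrinks each option's
    # representation variance (type #1) and is anticipated to grow the
    # absolute value difference (type #2).
    def net_value(z):
        var = sigma0**2 / (1.0 + beta * z)   # type #1: sharper representations
        anticipated_dv = dv + gamma * z      # type #2: value-mode perturbation
        return R * p_correct(anticipated_dv, var) - cost * z
    z_grid = [i * 0.01 for i in range(501)]
    return max(z_grid, key=net_value)

# Smaller pre-choice value differences (harder decisions) attract more effort:
print([optimal_effort(dv) for dv in (0.1, 0.5, 1.5)])
```

Under these settings the optimal z decreases monotonically with the pre-choice value difference, i.e., harder decisions attract more effort, consistent with the RT and subjective-effort findings cited above.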
Of course, decision importance is a critical component of the MCD model. This is why we had included an empirical way of manipulating it, by contrasting trials where subjects had to consume the item they chose (so-called "consequential" decisions) with trials where this was not the case ("neutral" decisions). The MCD model then predicts that people should allocate more resources (spend more time and report higher subjective effort) for "consequential" than for "neutral" decisions. In our revised manuscript, we highlight this qualitative prediction and its corresponding empirical test (cf. Figure 7 in section 4.2 of the revised Results section).
Having said this, we acknowledge that decision importance still lacks a complete and concise computational definition. In the previous version of our manuscript, we had discussed possible cognitive determinants of decision importance that are independent of option values. Now, whether and how decision importance depends upon the prior assessment of choice options is virtually unknown. We have now modified the corresponding Discussion paragraph as follows (new lines 783–815):
“First, we did not specify what determines decision “importance”, which effectively acts as a weight for confidence against effort costs (cf. Equation 2 of the Model section). […] Probing these computational assumptions will be the focus of forthcoming publications.”
2b. Is it reasonable to assume that variance would increase as a linear function of resource allocation? It seems to me that variance might increase initially, but then each increment of resources would add diminishing variance to the mode since, e.g., new mnesic evidence should tend to follow old evidence. How sensitive are model predictions to this assumption? What about if each increment of resources added to variance in an exponentially decreasing fashion? What about anchoring biases? Because anchoring biases suggest that we estimate things with reference to other value cues, should we always expect that additional resources increase the expected value difference, or might additional effort actually yield smaller value differences over time? If we relax this assumption, how does this impact model predictions?
This is an intriguing suggestion. We recognize that, under some simple Bayesian algorithm for value estimation, one would expect some form of saturating type #2 efficacy. In other words, the magnitude of the perturbation (per unit of resources) that one might expect when no resources have yet been allocated may be much higher than when most resources have already been allocated. We thus implemented and tested such a model. We report the results of this analysis in section 10 of our revised Appendix. In brief, a saturating type #2 efficacy brings no additional explanatory power for the model's dependent variables.
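The contrast between the original (linear) and saturating type #2 efficacy can be written down directly; the functional forms below are illustrative (the revised Appendix may parameterize saturation differently):

```python
import math

def linear_gain(z, gamma=0.3):
    # Linear type #2 efficacy: the value-mode perturbation grows in
    # proportion to the amount of allocated resources z.
    return gamma * z

def saturating_gain(z, gamma_max=0.3, lam=1.0):
    # Saturating alternative: early resource increments perturb the value
    # mode most; later increments add exponentially less. The initial slope
    # matches the linear variant because gamma_max * lam == gamma.
    return gamma_max * (1.0 - math.exp(-lam * z))

# The two variants agree for small z but diverge once resources accumulate:
for z in (0.1, 1.0, 4.0):
    print(z, round(linear_gain(z), 3), round(saturating_gain(z), 3))
```

Because the two curves only diverge at high resource allocations, discriminating them empirically requires trials where substantial effort is invested, which may explain why the saturating variant added no explanatory power here.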
3. Address relationship to other accounts.
3a. Does the current model predict the diverse dependent variables better than standard accumulator models of decision-making?
This is a fair point, with which we wholeheartedly concur. We have thus implemented two simple variants of the drift-diffusion model (DDM) which can, in principle, exploit the same information as the MCD model (namely: pre-choice value difference, pre-choice value certainty, and encodings of the "consequential"/"penalized"/"neutral" task conditions). We then compared these models with MCD with respect to their ability to predict out-of-sample data. The results of this comparison are reported in section 9 of our revised Appendix. In brief, standard DDM variants make quantitative predictions regarding both response times and decision outcomes, but are agnostic about choice confidence, spreading of alternatives, value certainty gain, and subjective effort ratings. In addition, simple DDM variants are less accurate than MCD at making out-of-sample predictions for dependent variables common to both models (e.g., changes of mind).
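For reference, the kind of DDM variant entering this comparison can be simulated in a few lines (a generic Euler-scheme sketch with arbitrary bound, noise, and drift settings; the Appendix's actual variants additionally map pre-choice ratings and task conditions onto DDM parameters):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_ddm(drift, bound=1.0, dt=0.001, sigma=1.0, max_t=5.0):
    # Euler simulation of one drift-diffusion trial: noisy evidence
    # accumulates at rate `drift` until it hits +bound or -bound
    # (trials are truncated at max_t for safety).
    x, t, sdt = 0.0, 0.0, sigma * np.sqrt(dt)
    while abs(x) < bound and t < max_t:
        x += drift * dt + sdt * rng.normal()
        t += dt
    return x >= bound, t          # (drift-favored choice, response time)

def mean_rt(drift, n=200):
    return float(np.mean([simulate_ddm(drift)[1] for _ in range(n)]))

# Like MCD, the DDM predicts faster decisions for larger value differences
# (here, drift rate stands in for the pre-choice value difference):
rt_hard, rt_easy = mean_rt(0.5), mean_rt(2.0)
print(f"mean RT, hard: {rt_hard:.2f}s; easy: {rt_easy:.2f}s")
```

Note that nothing in this generative scheme speaks to subjective effort, value certainty gain, or spreading of alternatives, which is why those dependent variables fall outside the scope of the DDM variants.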
3b. The model could also situate itself better in the broader existing literature on the topic. For instance, how does the model compare to existing computational work on this matter, e.g., the models described in Izuma and Murayama (2013) or the efficient coding account of Polanía, Woodford, and Ruff (2019)? We understand that the presented model can account for some phenomena that the other models cannot account for, at least without auxiliary assumptions (e.g., subjective effort ratings), but the interested reader might want to know how well the presented model can explain established decision-related variables, such as decision confidence, choice accuracy, or choice-induced preference changes compared to existing models, by having them contrasted in a formal manner. Finally, it would seem fair to relate the presented account to emerging, more mechanistically explicit accounts of metacontrol in value-based decision making (e.g., Callaway, Rangel and Griffiths, 2020; Jang, Sharma, and Drugowitsch, 2020). Ideally, some of the above would be addressed in the form of formal model comparisons, but we realise that this may be difficult to achieve in practice within a reasonable time frame. At the least, the manuscript should discuss in detail how the above-mentioned models differ from the model presented here.
This is a fair point, which we have addressed by augmenting the revised Discussion section with topicspecific paragraphs.
First, the model described in Izuma and Murayama (2013) concerns a well-known statistical artifact in measured spreading of alternatives. We have now included the following paragraph in the revised Discussion section (new lines 714–738):
“As a side note, the cognitive essence of spreading of alternatives has been debated for decades. […] Second, we have already shown that the effect of pre-choice value difference on spreading of alternatives is higher here than in a control condition where the choice is made after both rating sessions (Lee and Daunizeau, 2020).”
Second, the model by Polanía and Ruff (2019) describes how limited neural coding resources shape the transmission of information about subjective value. We comment on the relationship between this model and the MCD framework in the following paragraph of the revised Discussion section (new lines 739–759):
“A central tenet of the MCD model is that involving cognitive resources in value-related information processing is costly, which calls for an efficient resource allocation mechanism. […] A possibility is to consider, for example, energy-efficient population codes (Hiratani and Latham, 2020; Yu et al., 2016), which would tune the amount of neural resources involved in representing value to optimally trade information loss against energetic costs.”
Third, the models of Callaway et al. (2020) and Jang et al. (2020) effectively consider optimal policies for dividing attention between items in the choice set. They are very similar to each other, although the work by Jang et al. (2020) has a more solid theoretical grounding. We thank you for pointing us to these papers, which we were not aware of. We now refer to these works in the following modified paragraph of the Discussion section (new lines 824–830):
“More problematic, perhaps, is the fact that we did not consider distinct types of effort, which could, in principle, be associated with different costs and/or efficacies. […] Such optimal adjustment of divided attention might eventually explain systematic decision biases and shortened response times for “default” choices (Lopez-Persem et al., 2016).”
https://doi.org/10.7554/eLife.63282.sa2

Article and author information
Author details
Funding
LabEx BIOPSY
 Douglas G Lee
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
DL was supported by a grant from the Laboratory of Excellence of Biology for Psychiatry (LabEx BIOPSY, Paris, France).
Ethics
Human subjects: This study complies with all relevant ethical regulations and received formal approval from the INSERM Ethics Committee (CEEI-IRB 00003888, decision no 16333). In particular, in accordance with the Helsinki declaration, all participants gave written informed consent prior to commencing the experiment, which included consent to disseminate the results of the study via publication.
Senior Editor
 Michael J Frank, Brown University, United States
Reviewing Editor
 Tobias H Donner, University Medical Center HamburgEppendorf, Germany
Reviewer
 Andrew Westbrook
Publication history
 Received: September 20, 2020
 Accepted: April 23, 2021
 Accepted Manuscript published: April 26, 2021 (version 1)
 Version of Record published: May 17, 2021 (version 2)
Copyright
© 2021, Lee and Daunizeau
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.