Neuroscience

Response-based outcome predictions and confidence regulate feedback processing and learning

  1. Romy Frömer (corresponding author)
  2. Matthew R Nassar
  3. Rasmus Bruckner
  4. Birgit Stürmer
  5. Werner Sommer
  6. Nick Yeung
  1. Humboldt-Universität zu Berlin, Germany
  2. Brown University, United States
  3. Freie Universität Berlin, Germany
  4. Max Planck School of Cognition, Germany
  5. International Max Planck Research School LIFE, Germany
  6. International Psychoanalytic University, Germany
  7. University of Oxford, United Kingdom
Research Article
Cite this article as: eLife 2021;10:e62825 doi: 10.7554/eLife.62825

Abstract

Influential theories emphasize the importance of predictions in learning: we learn from feedback to the extent that it is surprising, and thus conveys new information. Here, we explore the hypothesis that surprise depends not only on comparing current events to past experience, but also on online evaluation of performance via internal monitoring. Specifically, we propose that people leverage insights from response-based performance monitoring – outcome predictions and confidence – to control learning from feedback. In line with predictions from a Bayesian inference model, we find that people who are better at calibrating their confidence to the precision of their outcome predictions learn more quickly. Further in line with our proposal, EEG signatures of feedback processing are sensitive to the accuracy of, and confidence in, post-response outcome predictions. Taken together, our results suggest that online predictions and confidence serve to calibrate neural error signals to improve the efficiency of learning.

Introduction

Feedback is crucial to learning and adaptation. Across domains it is thought that feedback drives learning to the degree that it is unexpected and, hence, provides new information, for example in the form of prediction errors that express the discrepancy between actual and expected outcomes (McGuire et al., 2014; Yu and Dayan, 2005; Behrens et al., 2007; Diederen and Schultz, 2015; Diederen et al., 2016; Pearce and Hall, 1980; Faisal et al., 2008; Sutton and Barto, 1998; Wolpert et al., 2011). Yet, the same feedback can be caused by multiple sources: we may be wrong about what is the correct thing to do, or we may know what to do but accidentally still do the wrong thing (McDougle et al., 2016). When we know we did the latter, we should discount learning about the former (McDougle et al., 2019; Parvin et al., 2018). Imagine for instance learning to throw darts. You know the goal you want to achieve – hit the bullseye – and you might envision yourself performing the perfect throw to do so. However, you find that the throw you performed as intended missed the target entirely and did not yield the desired outcome: In this case, you should adjust what you believe to be the right angle to hit the bullseye, based on how you missed that last throw. On a different throw you might release the dart at a different angle than intended and thus anticipate the ensuing miss: In this case, you may not want to update your beliefs on what is the right angle of throw. How do people assign credit to either of these potential causes of feedback when learning how to perform a new task? How do they regulate how much to learn from a given piece of feedback depending on how much they know about its causes?

Performance monitoring, that is, the internal evaluation of one’s own actions, could reduce surprise about feedback and uncertainty about its causes by providing information about execution errors. For instance, in the second dart-throw example, missing the target may be unsurprising if performance monitoring detected that, for example, the dart was released differently than desired (Figure 1A). In simple categorical choices, people are often robustly aware of their response errors (Maier et al., 2011; Yeung et al., 2004; Riesel et al., 2013; Maier et al., 2012), and this awareness is reflected in neural markers of error detection (Murphy et al., 2015). Although errors are often studied in simple categorization tasks in which responses are either correct or incorrect, in many tasks errors occur on a graded scale (e.g. a dart can miss the target narrowly or by a large margin), and both error detection and feedback processing are sensitive to error magnitude (Luft et al., 2014; Ulrich and Hewig, 2014; Frömer et al., 2016a; Arbel and Donchin, 2011). People are even able to report gradual errors reasonably accurately (Kononowicz et al., 2019; Akdoğan and Balcı, 2017; Kononowicz and van Wassenhove, 2019).

Interactions between performance monitoring and feedback processing.

(A) Illustration of dynamic updating of predicted outcomes based on response information. Pre-response, the agent aims to hit the bullseye and selects the action he believes achieves this goal. Post-response, the agent realizes that he made a mistake and predicts that he will miss the target entirely, being reasonably confident in this prediction. In line with his prediction, and thus unsurprisingly, the dart hits the floor. (B) Illustration of key concepts. Left: The feedback received is plotted against the prediction. Performance and prediction can vary in their accuracy independently. Perfect performance (zero deviation from the target, dark blue line) can occur for accurate or inaccurate predictions, and any performance, including errors, can be predicted perfectly (predicted error is identical to performance, orange line). When predictions and feedback diverge, outcomes (feedback) can be better (closer to the target, area highlighted with coarse light red shading) or worse (farther from the target, area highlighted with coarse light blue shading) than predicted. The more they diverge, the less precise the predictions are. Right: The precision of the prediction is plotted against confidence in that prediction. If confidence closely tracks the precision of the predictions, that is, if agents know when their predictions are probably right and when they are not, confidence calibration is high (green). If confidence is independent of the precision of the predictions, then confidence calibration is low. (C) Illustration of theoretical hypotheses. Left: We expect the correspondence between predictions and feedback to be stronger when confidence is high and weaker when confidence is low. Right: We expect that agents with better confidence calibration learn better. (D) Trial schema. Participants learned to produce a time interval by pressing a button following a tone with their left index finger. Following each response, they indicated on a visual analog scale, in sequence, the estimate of their accuracy (anchors: ‘much too short’ = ‘viel zu kurz’ to ‘much too long’ = ‘viel zu lang’) and their confidence in that estimate (anchors: ‘not certain’ = ‘nicht sicher’ to ‘fully certain’ = ‘völlig sicher’) by moving an arrow slider. Finally, feedback was provided on a visual analog scale for 150 ms. The current error was displayed as a red square on the feedback scale relative to the target interval, indicated by a tick mark at the center (Target, t), with undershoots shown to the left of the center and overshoots to the right, scaled relative to the feedback anchors of ±1 s (Scale, s; cf. E). Participants were told neither the Target nor the Scale and instead needed to learn them from the feedback. (E) Bayesian Learner with Performance Monitoring. The learner selects an intended response (i) based on the current estimate of the Target. The Intended Response and independent Response Noise produce the Executed Response (r). The Efference Copy (c) of this response varies in its precision as a function of Efference Copy Noise. It is used to generate a Prediction as the deviation from the estimate of the Target, scaled by the estimate of the Scale. The Efference Copy Noise is estimated and expressed as Confidence (co), approximating the precision of the Prediction. Learners vary in their Confidence Calibration (cc), that is, in how faithfully Confidence tracks the precision of their predictions; higher Confidence Calibration (arrows: green > yellow > magenta) leads to a more reliable translation from Efference Copy precision to Confidence. Feedback is provided according to the Executed Response and depends on the Target and Scale, which are unknown to the learner. Target and Scale are inferred based on Feedback (f), Response Noise, Prediction, and Confidence. Variables that are observable to the learner are displayed in solid boxes, whereas variables that are only partially observable are displayed in dashed boxes. (F) Target and scale error (absolute deviation of the current estimates from the true values) for the Bayesian Learner with Performance Monitoring (green, optimal calibration), a Feedback-only Bayesian Learner (solid black), and a Bayesian Learner with Outcome Prediction (dashed black).
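To make the generative structure in panel (E) concrete, the sketch below simulates a single trial in R. The Gaussian noise distributions, parameter values, and variable names here are illustrative choices of ours, not the authors' implementation.

```r
# One trial of the generative process sketched in Figure 1E (illustrative).
set.seed(1)

target <- 1.5   # true Target interval in seconds (unknown to the learner)
scale  <- 1.0   # true feedback Scale (unknown to the learner)

target_hat <- 1.0   # learner's current estimate of the Target
scale_hat  <- 0.8   # learner's current estimate of the Scale

sd_response <- 0.15                  # Response Noise
sd_copy     <- runif(1, 0.05, 0.30)  # Efference Copy Noise, varies by trial

intended <- target_hat                                   # Intended Response (i)
executed <- rnorm(1, mean = intended, sd = sd_response)  # Executed Response (r)
e_copy   <- rnorm(1, mean = executed, sd = sd_copy)      # Efference Copy (c)

# Prediction: deviation of the efference copy from the estimated Target,
# expressed in units of the estimated Scale
prediction <- (e_copy - target_hat) / scale_hat

# Confidence (co): a read-out of prediction precision; under perfect
# calibration it tracks 1 / sd_copy
confidence <- 1 / sd_copy

# Feedback (f): the executed response's true deviation from the Target,
# scaled by the true Scale; this is all the learner ever observes
feedback <- (executed - target) / scale

# The update of target_hat and scale_hat from feedback (omitted here) weights
# the Prediction by Confidence and the Intended Response by Response Noise
```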

This ability may be afforded by reliance on internal models to predict the outcome of movements (Wolpert and Flanagan, 2001), for example, based on an efference copy of a motor command. These predictions could help discount execution errors in learning from feedback. In fact, if these predictions perfectly matched the execution error that occurred, the remaining mismatch between predicted and obtained feedback (sensory prediction error) could serve as a reliable basis for adaptation and render feedback maximally informative about the mapping from actions to outcomes (Figure 1B).

Although participants are able to evaluate their own performance reasonably well, error detection is far less certain than outlined in the ideal scenario above, and the true cause of feedback often remains uncertain to some extent. People are critically sensitive to uncertainty, and learn more from feedback when they expect it to be more informative (McGuire et al., 2014; Schiffer et al., 2017; Bland and Schaefer, 2012; Nassar et al., 2010; O'Reilly, 2013). Uncertainty about what caused a given piece of feedback inevitably renders it less informative, similar to decreases in reliability, and this uncertainty should be taken into account when learning from it. Confidence could support such adaptive learning from feedback by providing a read-out of the subjective precision of predicted outcomes (Nassar et al., 2010; Vaghi et al., 2017; Meyniel et al., 2015; Pouget et al., 2016), possibly relying on shared neural correlates of confidence with error detection (Boldt and Yeung, 2015; van den Berg et al., 2016). Similar to its role in regulating learning of transition probabilities (Meyniel et al., 2015; Meyniel and Dehaene, 2017), information seeking/exploration in decision making (Desender et al., 2018a; Boldt et al., 2019), and hierarchical reasoning (Sarafyazd and Jazayeri, 2019), people could leverage confidence to calibrate their use of online predictions. In line with this suggestion, people learn more about advice givers when they are more confident in the choices that advice is about (Carlebach and Yeung, 2020). In the throwing example above, the more confident you are about the exact landing position of the dart, the more surprised you should be when you find that landing position to be different: The more confident you are, the more evidence you have that your internal model linking angles to landing positions is wrong, and the more information you get about how this model is wrong. Thus, you should learn more when you are more confident. However, this reasoning assumes that your predictions are in fact more precise when you are more confident, i.e., that your confidence is well calibrated (Figure 1B).

In the present study, we tested the hypothesis that performance monitoring – error detection and confidence (Yeung and Summerfield, 2012) – adaptively regulates learning from feedback. This hypothesis predicts that error detection and confidence afford better learning, with confidence mediating the relationship between outcome predictions and feedback, and that learning is compromised when confidence is mis-calibrated (Figure 1C). It further predicts that established neural correlates of feedback processing, such as the feedback-related negativity (FRN) and the P3a (Ullsperger et al., 2014a), should integrate information about post-response outcome predictions and confidence. That is to say, an error that could be predicted based on internal knowledge of how an action was executed should not yield a large surprise (P3a) or reward prediction error (FRN) signal in response to an external indicator of the error (feedback). However, any prediction error should be more surprising when predictions were made with higher confidence. We formalize our predictions using a Bayesian model of learning and test them using behavioral and EEG data in a modified time-estimation task.

Results

Rationale and approach

Our hypothesis that performance monitoring regulates adaptive learning from feedback makes two key behavioral predictions (Figure 1C): (1) The precision of outcome predictions (i.e. the correlation between predicted and actual outcomes) should increase with confidence. (2) Learners with superior calibration of confidence to the precision of their outcome predictions should learn more quickly. Our hypothesis further predicts that feedback processing will be critically modulated by an agent’s outcome prediction and confidence. We tested these predictions mechanistically using computational modeling and empirically based on behavioral and EEG data from 40 participants performing a modified time-estimation task (Figure 1D). In comparison to darts throwing as used in our example, the time estimation task requires a simple response – a button press – such that errors map onto a single axis that defines whether the response was provided too early, timely, or too late and by how much. These errors can be mapped onto a feedback scale and, just as in the darts example where one learns the correct angle and acceleration to hit the bullseye, participants here can learn the target timing interval. In addition to requiring participants to learn and produce a precisely timed action on each trial, our task also included two key measurements that allowed us to better understand how performance monitoring affects feedback processing: (1) Participants were required to predict the feedback they would receive on each trial and indicate it on a scale visually identical to the feedback scale (Figure 1D, Prediction) and (2) Participants indicated their degree of confidence in this prediction (Figure 1D, Confidence). Only following these judgments would they receive feedback about their time estimation performance.

A mechanism for performance monitoring-augmented learning

As a proof of concept for the hypothesized learning principles, we implemented a computational model that uses performance monitoring to optimize learning from feedback in that same task (Figure 1E). The agent’s goal is to learn the mapping between its actions and their outcomes (sensory consequences) in the time-estimation task, wherein feedback on an initially unknown scale must be used to learn accurately timed actions. Learning in this task is challenged in two ways: First, errors signaled by feedback include contributions of response noise, for example, through variability in the motor system or in the representations of time (Kononowicz and van Wassenhove, 2019; Balci et al., 2011). Second, the efference copy of the executed response (or the estimate of what was done) varies in its precision. To overcome these challenges, the agent leverages performance monitoring: It infers the contribution of response noise to a given outcome based on an outcome prediction derived from the efference copy, and the degree of confidence in its prediction based on an estimate of the current efference copy noise. The agent then weighs Prediction and Intended Response as a function of Confidence and Response Noise when updating beliefs about the Target and the Scale based on Feedback.
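The weighting step can be pictured, in a simplified form, as a precision-weighted combination of two Gaussian cues to where the executed response actually fell. This is a sketch of the principle only, with a function name and values of our own; the model's actual update is a full Bayesian inference over Target and Scale.

```r
# Precision-weighted estimate of the executed response, combining the
# prediction (precision given by confidence) with the intended response
# (precision given by response noise); illustrative, not the full model.
infer_response <- function(intended, predicted_response,
                           sd_response, sd_prediction) {
  w_pred <- (1 / sd_prediction^2) /
            (1 / sd_prediction^2 + 1 / sd_response^2)
  w_pred * predicted_response + (1 - w_pred) * intended
}

# High confidence (precise prediction): the estimate leans on the prediction
infer_response(intended = 1.0, predicted_response = 1.3,
               sd_response = 0.15, sd_prediction = 0.05)  # ~1.27
# Low confidence: the estimate falls back toward the intended response
infer_response(intended = 1.0, predicted_response = 1.3,
               sd_response = 0.15, sd_prediction = 0.50)  # ~1.02
```

With high confidence, feedback that deviates from the prediction is credited to wrong Target and Scale estimates and drives updating; with low confidence, the same feedback is more plausibly explained by unnoticed execution noise and drives less updating.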

We compare this model to one that has no insights into its trial-by-trial performance, but updates based on feedback and its fidelity due to response noise alone (Feedback), and another model that has insights into its trial-by-trial performance allowing it to generate predictions, and into the average precision of its predictions, but not the precision of its current prediction (Feedback + Prediction). We find that performance improves as the amount of insight into the agent’s performance increases (Figure 1F): The optimally calibrated Bayesian learner with performance monitoring outperforms both other models. Further, in line with our behavioral predictions, we find in this model that confidence varies with the precision of predictions (Figure 2A, Figure 2—figure supplement 1) and, when varying the fidelity of confidence as a read-out of precision (Confidence Calibration), agents with superior Confidence Calibration learn better (Figure 2B, Figure 2—figure supplement 1). We next sought to test whether participants’ behavior likewise displays these hallmarks of our hypothesis.

Figure 2 with 3 supplements
Relationships between outcome predictions and actual outcomes in the model and observed data (top vs. bottom).

(A) Model prediction for the relationship between Prediction and actual outcome (Feedback) as a function of Confidence. The relationship between predicted and actual outcomes is stronger for higher confidence. Note that systematic errors in the model’s initial estimates of target (overestimated) and scale (underestimated) give rise to systematically late responses, as well as underestimation of predicted outcomes in early trials, visible as a plume of datapoints extending above the main cloud of simulated data. (B) The model-predicted effect of Confidence Calibration on learning. Better Confidence Calibration leads to better learning. (C) Observed relationship between predicted and actual outcomes. Each data point corresponds to one trial of one participant; all trials of all participants are plotted together. Regression lines are local linear models visualizing the relationship between predicted and actual error separately for high, medium, and low confidence. At the edges of the plot, the marginal distributions of actual and predicted errors are depicted by confidence levels. (D) Change in error magnitude across trials as a function of confidence calibration. Lines represent LMM-predicted error magnitude for low, medium, and high confidence calibrations, respectively. Shaded error bars represent corresponding SEMs. Note that the combination of linear and quadratic effects approximates the shape of the learning curves better than a linear effect alone, but predicts an exaggerated uptick in errors toward the end (Figure 2—figure supplement 3). Inset: Average Error Magnitude for every participant plotted as a function of Confidence Calibration level. The vast majority of participants show positive confidence calibration. The regression line represents a local linear model fit and the error bar represents the standard error of the mean.

Confidence reflects precision of outcome predictions

To test the predictions of our model empirically, we examined behavior of 40 human participants performing the modified time-estimation task. To test whether the precision of outcome predictions increases with confidence, we regressed participants’ signed timing production errors (signed error magnitude; scale: undershoot [negative] to overshoot [positive]) on their signed outcome predictions (Predicted Outcome; same scale as for signed error magnitude), Confidence, Block, as well as their interactions. Our results support our first behavioral prediction (Table 1): As expected, predicted outcomes and actual outcomes were positively correlated, indicating that participants could broadly indicate the direction and magnitude of their errors. Crucially, this relationship between predicted and actual outcomes was stronger for predictions made with higher confidence (Figure 2C).

Table 1
Relations between actual performance outcome (signed error magnitude), predicted outcome, confidence in predictions and their modulations due to learning across blocks of trials.
Signed error magnitude
Predictors | Estimates | SE | CI | t | p
Intercept | 4.63 | 9.99 | −14.94 to 24.20 | 0.46 | 6.427e-01
Predicted Outcome | 523.99 | 29.66 | 465.86 to 582.12 | 17.67 | 7.438e-70
Block | 29.47 | 8.12 | 13.56 to 45.37 | 3.63 | 2.832e-04
Confidence | −27.07 | 11.05 | −48.73 to −5.42 | −2.45 | 1.428e-02
Predicted Outcome: Block | −149.70 | 21.90 | −192.62 to −106.78 | −6.84 | 8.145e-12
Predicted Outcome: Confidence | 322.56 | 27.31 | 269.03 to 376.09 | 11.81 | 3.477e-32
Block: Confidence | −25.52 | 9.15 | −43.46 to −7.58 | −2.79 | 5.297e-03
Predicted Outcome: Block: Confidence | 90.68 | 33.65 | 24.73 to 156.64 | 2.69 | 7.043e-03

Random effects | | Model parameters |
Residuals | 54478.69 | N | 40
Intercept | 3539.21 | Observations | 9996
Confidence | 2813.79 | log-Likelihood | −68816.092
Predicted Outcome | 22357.33 | Deviance | 137632.185
  1. Formula: Signed error magnitude ~ Predicted Outcome * Block * Confidence + (Confidence + Predicted Outcome + Block | participant); Note: ‘:’ indicates interactions between predictors.
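For concreteness, the mixed model reported in Table 1 corresponds to the following lme4 call in R. The toy data frame below merely stands in for the real single-trial data; its column names and structure are our assumptions.

```r
library(lme4)

# Toy stand-in for the single-trial data (40 participants x 250 trials);
# on random data lmer may warn about singular fits, which is expected here
set.seed(1)
dat <- data.frame(
  participant       = factor(rep(1:40, each = 250)),
  signed_error      = rnorm(10000),
  predicted_outcome = rnorm(10000),
  block             = rep(rep(1:5, each = 50), times = 40),
  confidence        = runif(10000)
)

# Table 1: Signed error magnitude ~ Predicted Outcome * Block * Confidence
#          + (Confidence + Predicted Outcome + Block | participant)
m1 <- lmer(signed_error ~ predicted_outcome * block * confidence +
             (confidence + predicted_outcome + block | participant),
           data = dat)
summary(m1)  # key term: the predicted_outcome:confidence interaction
```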

In addition to this expected pattern, we found that both outcome predictions and confidence calibration improved across blocks, suggestive of learning at the level of performance monitoring (Figure 2—figure supplement 2). Note, however, that participants tended to bias their predictions toward the center of the scale in early blocks, when they had little knowledge about the target interval and could thus determine neither the direction (over- vs. undershoot) nor the magnitude of their errors. This strategic behavior may give rise to the apparent improvements in performance monitoring.

To test more directly our assumption that Confidence tracks the precision of predictions, we followed up on these findings with a complementary analysis of Confidence as the dependent variable and tested how it relates to the precision of predictions (absolute discrepancy between predicted and actual outcome, see sensory prediction error, SPE below), the precision of performance (error magnitude), and how those change across blocks (Table 2). Consistent with our assumption that Confidence tracks the precision of predictions, we find that it increases as the discrepancy between predicted and actual outcome decreases. Confidence was also higher for larger errors, presumably because their direction (i.e. overshoot or undershoot) is easier to judge. The relationships with both the precision of the prediction and error magnitude changed across blocks, and confidence increased across blocks as well.

Table 2
Relations of confidence with the precision of prediction and the precision of performance and changes across blocks.
Confidence
Predictors | Estimates | SE | CI | t | p
(Intercept) | 0.26 | 0.04 | 0.18 to 0.33 | 6.35 | 2.187e-10
Block | 0.05 | 0.02 | 0.02 to 0.08 | 3.05 | 2.257e-03
Sensory Prediction Error (SPE) | −0.44 | 0.04 | −0.52 to −0.36 | −10.84 | 2.289e-27
Error Magnitude (EM) | 0.17 | 0.05 | 0.08 to 0.27 | 3.73 | 1.910e-04
Block: SPE | −0.08 | 0.04 | −0.15 to −0.00 | −1.99 | 4.642e-02
Block: EM | 0.15 | 0.05 | 0.05 to 0.25 | 3.07 | 2.167e-03

Random effects | | Model parameters |
Residuals | 0.12 | N | 40
Intercept | 0.06 | Observations | 9996
SPE | 0.03 | log-Likelihood | −3640.142
Error Magnitude | 0.06 | Deviance | 7280.284
Block | 0.01 | |
Error Magnitude: Block | 0.04 | |
  1. Formula: Confidence ~ (SPE + Error Magnitude) * Block + (SPE + Error Magnitude * Block | participant); Note: ‘:’ indicates interactions between predictors.

To test whether these effects reflect monotonic increases in confidence and its relationships with prediction error and error magnitude, as expected with learning, we fit a model with block as a categorical predictor and SPE and Error Magnitude nested within blocks (Supplementary file 1). We found that confidence increased numerically from each block to the next, with significant differences between blocks 1 and 2, as well as between blocks 3 and 4. Its relationship to error magnitude was reduced in the first block compared to the remaining blocks and enhanced in the final two blocks compared to the remaining blocks. These findings are thus consistent with learning effects. While the precision of predictions was more strongly related to confidence in the final block compared to the remaining blocks, it was not less robustly related in the first block; instead, it was somewhat weaker in the third block. This pattern is thus not consistent with learning. Importantly, whereas error magnitude was robustly related to confidence only in the last two blocks, the precision of the prediction was robustly related to confidence throughout.

Having demonstrated that, across individuals, confidence reflects the precision of their predictions (via the correlation with SPE), we next quantified this relationship for each participant separately as an index of their confidence calibration. While quantifying the relationship, we controlled for changes in performance across blocks, and to ease interpretation, we sign-reversed the obtained correlations so that higher values correspond to better confidence calibration. We next tested our hypothesis that confidence calibration relates to learning.
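A minimal sketch of this per-participant index follows, continuing the toy data frame from the sketch above; the exact way block effects were partialled out is our assumption.

```r
# Confidence calibration: sign-reversed correlation between confidence and
# SPE (the precision of the prediction), residualized on block
dat$spe <- abs(dat$signed_error - dat$predicted_outcome)

calibration <- sapply(split(dat, dat$participant), function(d) {
  conf_resid <- resid(lm(confidence ~ block, data = d))
  spe_resid  <- resid(lm(spe ~ block, data = d))
  -cor(conf_resid, spe_resid)   # higher values = better calibration
})
```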

Superior calibration of confidence judgments relates to superior learning

To empirically test our second behavioral prediction, that people with better confidence calibration learn faster, we modeled log-transformed trial-wise error magnitude as a function of Trial (linear and quadratic effects to account for non-linearity in learning, that is, stronger improvements in the beginning), Confidence Calibration for each participant (Figure 2D inset), and their interaction (Table 3). As expected, Confidence Calibration interacted significantly with the linear Trial component, that is, with learning (Figure 2D). Thus, participants with better confidence calibration showed greater performance improvements during the experiment. Importantly, Confidence Calibration did not significantly correlate with overall performance (Figure 2D inset), supporting the assumption that confidence calibration relates to learning (performance change) rather than performance per se. Confidence calibration was also not correlated with individual differences in response variance (r = −2.07e-4, 95% CI = [−0.31, 0.31], p=0.999), and the interaction of confidence calibration and block was robust to controlling for running average response variance (Supplementary file 2).

Table 3
Confidence calibration modulation of learning effects on performance.
log Error Magnitude
Predictors | Estimates | SE | CI | t | p
(Intercept) | 5.17 | 0.06 | 5.05 to 5.30 | 80.74 | 0.000e+00
Confidence Calibration | 0.58 | 0.58 | −0.57 to 1.72 | 0.99 | 3.228e-01
Trial (linear) | −0.59 | 0.07 | −0.72 to −0.45 | −8.82 | 1.197e-18
Trial (quadratic) | 0.16 | 0.02 | 0.11 to 0.20 | 6.80 | 1.018e-11
Trial (linear): Confidence Calibration | −0.86 | 0.32 | −1.48 to −0.24 | −2.72 | 6.467e-03

Random effects | | Model parameters |
Residuals | 1.18 | N | 40
Intercept | 0.12 | Observations | 9996
Trial (linear) | 0.03 | log-Likelihood | −15106.705
 | | Deviance | 30213.411
  1. Formula: log Error Magnitude ~ Confidence Calibration * Trial(linear) + Trial(quadratic) + (Trial(linear) | participant); Note: ‘:’ indicates interactions between predictors.
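In lme4 terms, and again continuing the toy data, the Table 3 model can be sketched as below; the trial and log error magnitude columns are constructed here purely for illustration.

```r
dat$trial  <- rep(1:250, times = 40)            # trial index per participant
dat$calib  <- calibration[dat$participant]      # per-participant calibration
dat$log_em <- log(abs(dat$signed_error) + 1e-3) # log error magnitude

tp <- poly(dat$trial, 2)   # orthogonal linear and quadratic trial terms
dat$trial_lin  <- tp[, 1]
dat$trial_quad <- tp[, 2]

# Table 3: log Error Magnitude ~ Confidence Calibration * Trial(linear)
#          + Trial(quadratic) + (Trial(linear) | participant)
m3 <- lmer(log_em ~ calib * trial_lin + trial_quad +
             (trial_lin | participant), data = dat)
summary(m3)  # key term: calib:trial_lin (calibration modulates learning)
```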

Thus, taken together, our model simulations and behavioral results align with the behavioral predictions of our hypothesis: Participants’ outcome predictions were better related to actual outcomes when those outcome predictions were made with higher confidence, and individuals with superior confidence calibration showed better learning.

Outcome predictions and confidence modulate feedback signals and processing

At the core of our hypothesis and model lies the change in feedback processing as a function of outcome predictions and confidence. It is typically assumed that learning relies on prediction errors, and signatures of prediction errors have been found in scalp-recorded EEG signals. Before testing directly how feedback is processed, as reflected in distinct feedback-related ERP components, we first show how these prediction errors vary over time and as a function of confidence.

We dissociate three signals that can be processed to evaluate feedback (Figure 3A): The objective magnitude of the error (Error Magnitude) reflects the degree to which performance needs to be adjusted regardless of whether that error was predicted or not. The reward prediction error (RPE), thought to drive reinforcement learning, indexes whether the outcome of a particular response was better or worse than expected. The sensory prediction error (SPE), thought to underlie forward model-based and direct policy learning in the motor domain (Hadjiosif et al., 2020), indexes whether the outcome of a particular response was close to or far off the predicted one. To illustrate the difference between the two prediction errors, one might expect to miss a target 20 cm to the left but find that the arrow misses it 20 cm to the right instead. There is no RPE, as the actual outcome is exactly as good or bad as the predicted one; however, there is a large SPE, because the actual outcome is very different from the predicted one.
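In code, the three signals are simple functions of the signed prediction and the signed outcome, each expressed as a deviation from the goal at zero; the helper names are ours, and the check reproduces the archery example above.

```r
# Error signals from Figure 3A; inputs are signed deviations from the goal
error_magnitude <- function(outcome) abs(outcome)

# RPE = (-EM) - (-predicted EM): positive when the outcome is better
# (closer to the goal) than predicted
rpe <- function(outcome, prediction) abs(prediction) - abs(outcome)

# SPE: absolute discrepancy between the outcome and the prediction
spe <- function(outcome, prediction) abs(outcome - prediction)

# Example from the text: predicted 20 cm left (-20), landed 20 cm right (+20)
rpe(outcome = 20, prediction = -20)  # 0:  exactly as good as predicted
spe(outcome = 20, prediction = -20)  # 40: far from the predicted outcome
```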

Changes in objective and subjective feedback.

(A) Dissociable information provided by feedback. An example for a prediction (hatched box) and a subsequent feedback (red box) are shown overlaid on a rating/feedback scale. We derived three error signals that make dissociable predictions across combinations of predicted and actual outcomes. The solid blue line indicates Error Magnitude (distance from outcome to goal). As smaller errors reflect greater rewards, we computed Reward Prediction Error (RPE) as the signed difference between negative Error Magnitude and the negative predicted error magnitude (solid orange line, distance from prediction to goal). Sensory Prediction Error (SPE, dashed line) was quantified as the absolute discrepancy between feedback and prediction. Values of Error Magnitude (left), RPE (middle), and SPE (right) are plotted for all combinations of prediction (x-axis) and outcome (y-axis) location. (B) Predictions and confidence associate with reduced error signals. Average error magnitude (left), Reward Prediction Error (center), and Sensory Prediction Error (right) are shown for each block and confidence tercile. Average prediction errors are smaller than average error magnitudes (dashed circles), particularly for higher confidence.

Our hypothesis holds that predictions should help discount noise in the error signal, and more so for higher confidence. Prediction errors should thus be smaller than error magnitude, particularly when confidence is higher. We find that this is true in our data (Figure 3B, Supplementary files 3 and 4; note that, unlike SPE, RPE by definition cannot be larger than error magnitude, and that its magnitude, but not its sign, varies robustly with confidence).

To examine changes in these error signals with trial-to-trial changes in confidence and learning, we regressed each of these signals onto Confidence, Block, and their interaction (Supplementary file 5, Figure 3B). Consistent with our assumption that confidence tracks the precision of predictions, SPE decreased as confidence increased (b = −71.20, p<0.001), but there were no significant main effects on error magnitude or reward prediction error. However, Confidence significantly interacted with Block on all variables (Error Magnitude: b = 30.09, p<0.001; RPE: b = −64.48, p<0.001; SPE: b = 16.99, p=0.005), such that in the first block increased Confidence was associated with smaller Error Magnitudes, less negative RPEs, and smaller SPEs. All error signals further decreased significantly across blocks (Error Magnitude: b = −37.10, p<0.001; RPE: b = 36.26, p<0.001; SPE: b = −17.54, p<0.001; blockwise comparisons significant only from block 1 to 2, Supplementary file 6). These parallel patterns might emerge because prediction errors are derived from, and thus might covary with, error magnitude. To test whether changes in prediction errors were primarily driven by improvements in error magnitude rather than predictions, we reran the previous RPE and SPE models with error magnitude as a covariate (Supplementary file 7). Controlling for error magnitude notably reduced the linear block effect on RPE (from b = 36.26 to b = 10.3). It further eliminated the block effect on SPE (from b = −17.54 to b = 3.65, p=0.274), as well as the interaction of Confidence and Block (b = 0.10, p=0.984), while the hypothesized main effect of Confidence prevailed (b = −60.12, p<0.001).

In summary, we find that all error signals decrease across blocks as performance improves. Although higher confidence is associated with smaller error signals in all three variables early in learning, across all blocks we find that confidence only has a consistent relationship with smaller sensory prediction errors.

Taken together, these results are consistent with our hypothesis that outcome predictions and confidence optimize feedback processing. Accordingly, we predicted that participants’ internal evaluations would modulate feedback processing as indexed by distinct feedback-related potentials in the EEG: the feedback-related negativity (FRN), P3a and P3b. Thus, the amplitude of a canonical index of signed RPE (Holroyd and Coles, 2002), the FRN, should increase to the extent that outcomes were worse than predicted, that is, with more negative-going RPE. P3a amplitude, a neural signature of surprise (Polich, 2007), should increase with the absolute difference between participants’ outcome predictions and actual outcomes (i.e. with SPE) and be enhanced in trials in which participants indicated higher confidence in their outcome predictions. To further explore the possible role of performance monitoring in learning, we also tested the joint effects of our experimental variables on the P3b as a likely index of learning (Fischer and Ullsperger, 2013).

If participants did not take their predictions into account, ERP amplitudes should scale with the actual error magnitude reflected in the feedback (Error Magnitude). Note that both RPE and SPE are equivalent to Error Magnitude in the special case where predicted errors are zero (Figure 3A), and thus Error Magnitude can be thought of as the default RPE and SPE that would arise if an individual predicted perfect execution on each trial. Thus, if participants did not take knowledge of their own execution errors into account, their FRN and P3a should both simply reflect Error Magnitude. A key advantage of our experimental design is that RPE, SPE, and Error Magnitude vary differentially as a function of actual and predicted outcomes (Figure 3A), which allowed us to test our predictions by examining whether ERP amplitudes are modulated by prediction errors (SPE and RPE) and Confidence, while controlling for other factors including Error Magnitude.

Reward prediction error modulates the feedback-related negativity

The feedback-related negativity (FRN) is an error-sensitive ERP component with a fronto-central scalp distribution that peaks between 230 and 330 ms following feedback onset (Miltner et al., 1997; Figure 4A). It is commonly thought to index neural encoding of RPE (Holroyd and Coles, 2002): Its amplitude increases with the degree to which an outcome is worse than expected and, conversely, decreases to the extent that outcomes are better than expected (Hajcak et al., 2006; Holroyd et al., 2006; Walsh and Anderson, 2012; Holroyd et al., 2003; Sambrook and Goslin, 2015). Its amplitude thus decreases with increasing reward magnitude (Frömer et al., 2016a) and reward expectancy (Lohse et al., 2020). However, it is unknown whether reward prediction errors signaled by the FRN contrast current feedback with predictions based only on previous (external) feedback, or whether they might incorporate ongoing (internal) performance monitoring. Based on our overarching hypothesis, we predicted that FRN amplitude would scale with our estimate of RPE, which quantifies the degree to which actual feedback was ‘better’ than the feedback predicted after action execution, causing more negative RPEs to produce larger FRN amplitudes (Figure 4B). A key alternative possibility is that the FRN indexes the magnitude of an error irrespective of the participant’s post-response outcome prediction (e.g. with a large FRN to feedback indicating a large error, even when the participant already knows that this error was committed) (Pfabigan et al., 2015; Sambrook and Goslin, 2014; Talmi et al., 2013). Note that the prediction errors experienced by most error-driven learning models would fall into this alternative category, as they would reflect the error magnitude minus some long-term expectation of that magnitude, but not update these expectations after action execution. Thus, to test whether RPE explains variation in the FRN above and beyond Error Magnitude, and to control for other factors, we included Error Magnitude, SPE, Confidence, and Block in the model (Table 4).

Multiple prediction errors in feedback processing.

(A–C) FRN amplitude is sensitive to predicted error magnitude. (A) FRN, grand mean; the shaded area marks the time interval for peak-to-peak detection of the FRN. Negative peaks between 200 and 300 ms post feedback were quantified relative to positive peaks in the preceding 100 ms time window. (B) Expected change in FRN amplitude as a function of RPE (color) for two predictions (black curves represent schematized predictive distributions around the reported prediction for a given confidence), one too early (top: high confidence in a low reward prediction) and one too late (bottom: low confidence in a higher reward prediction). Vertical black arrows mark a sample outcome (deviation from the target; abscissa) resulting in different RPEs/expected changes in FRN amplitude for the two predictions, indicated by shades. Blue shades indicate negative RPEs/larger FRN, red shades indicate positive RPEs/smaller FRN, and gray denotes zero. Note that these are mirrored at the goal for any predictions, and that the likelihood of the actual outcome given the prediction (y-axis) does not affect RPE. In the absence of a prediction, or for a predicted error of zero, FRN amplitude should increase with the deviation from the target (abscissa). (C) LMM-estimated effects of RPE on peak-to-peak FRN amplitude visualized with the effects package; shaded error bars represent 95% confidence intervals. (D–I) P3a amplitude is sensitive to SPE and Confidence. (D) Grand mean ERP with the time window for quantification of the P3a, 330–430 ms, highlighted. (E) Hypothetical internal representation of predictions. Curves represent schematized predictive distributions around the reported prediction (zero on abscissa). Confidence is represented by the width of the distributions. (F) Predictions for SPE (x-axis) and Confidence (y-axis) effects on surprise as estimated with Shannon information (darker shades signify larger surprise) for varying Confidence and SPE (center). The margins visualize the predicted main effects for Confidence (left) and SPE (bottom). (G) P3a LMM fixed effect topographies for SPE and Confidence. (H–I) LMM-estimated effects on P3a amplitude visualized with the effects package; shaded areas in (H) (SPE) and (I) (Confidence) represent 95% confidence intervals.
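The peak-to-peak quantification described in (A) can be sketched as follows for a single-trial waveform; the vector names are ours, and we read 'the preceding 100 ms time window' as the 100 ms before the detected negative peak.

```r
# Peak-to-peak FRN: most negative point 200-300 ms post-feedback, measured
# against the most positive point in the 100 ms preceding that negative peak
frn_peak_to_peak <- function(erp, times) {
  neg_win <- which(times >= 0.200 & times <= 0.300)
  i_neg   <- neg_win[which.min(erp[neg_win])]    # negative peak sample
  pos_win <- which(times >= times[i_neg] - 0.100 & times < times[i_neg])
  erp[i_neg] - max(erp[pos_win])                 # negative for a typical FRN
}

# Toy epoch: positive bump near 180 ms followed by a negative bump near 260 ms
times <- seq(-0.2, 0.8, by = 1 / 500)  # seconds, feedback onset at 0
erp   <- 4 * exp(-(times - 0.18)^2 / (2 * 0.03^2)) -
         5 * exp(-(times - 0.26)^2 / (2 * 0.03^2))
frn_peak_to_peak(erp, times)  # about -8.7 for these toy peaks
```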

Table 4
LMM statistics of learning effects on FRN.
Peak-to-Peak FRN amplitude
Predictors | Estimates | SE | CI | t | p
Intercept | −12.67 | 0.49 | −13.62 to −11.71 | −26.03 | 2.322e-149
Confidence | −0.19 | 0.15 | −0.49 to 0.11 | −1.25 | 2.126e-01
Reward prediction error | 1.43 | 0.41 | 0.62 to 2.24 | 3.47 | 5.302e-04
Sensory prediction error | −0.67 | 0.42 | −1.49 to 0.15 | −1.61 | 1.078e-01
Error magnitude | 0.51 | 0.55 | −0.57 to 1.58 | 0.92 | 3.553e-01
Block | −0.15 | 0.11 | −0.36 to 0.06 | −1.43 | 1.513e-01

Random effects | | Model parameters |
Residuals | 27.69 | N | 40
Intercept | 9.23 | Observations | 9678
Error magnitude | 2.24 | log-Likelihood | −29908.910
Block | 0.22 | Deviance | 59817.821
  1. Formula: FRN ~ Confidence + RPE + SPE + EM + Block + (EM + Block | participant).

As predicted, FRN amplitude decreased with more positive-going RPEs (b = 1.43, p<0.001, Figure 4C), extending previous work that investigated prediction errors as a function of reward magnitude and frequency (Holroyd and Coles, 2002; Sambrook and Goslin, 2015). In contrast, error magnitude and SPE did not significantly affect FRN amplitude, suggesting, in the case of error magnitude, that when errors can be accounted for by faulty execution, they do not drive internal reward prediction error signals. We found no other reliable effects, and when including interaction terms, they were neither significant nor supported by model selection (Δχ²(10) = 10.98, p=0.359; AIC(reduced) − AIC(full) = −9, BIC(reduced) − BIC(full) = −81). We conclude that FRN amplitude reflects the degree to which feedback is better than predicted and, critically, that the outcome predictions incorporate information about likely execution errors.

Sensory prediction error and confidence modulate P3a

The frontocentral P3a is a surprise-sensitive positive-going deflection between 250 and 500 ms following stimulus onset (Figure 4D; Polich, 2007). Its functional significance can be summarized as signaling the recruitment of attention for action in response to surprising and motivationally relevant stimuli (Polich, 2007; Nieuwenhuis et al., 2011). P3a has been shown to increase with larger prediction errors in probabilistic learning tasks (Fischer and Ullsperger, 2013), with higher goal-relevance in a go/no-go task (Walentowska et al., 2016), with increasing processing demands (Frömer et al., 2016b), and with meta-memory mismatch (feedback about incorrect responses given with high confidence; Butterfield and Mangels, 2003).

Surprise can be quantified using Shannon Information, which reflects the amount of information provided by an outcome given a probability distribution over outcomes (O'Reilly et al., 2013). As seen in Figure 4F, this measure scales with increasing confidence as well as with SPE, that is, with increasing deviations between predicted and actual outcomes (margins). To generate these predictions, we computed the Shannon Information for a range of outcomes given a range of predictive distributions with varying precision, assuming that confidence reflects the precision of a distribution of predicted outcomes (Figure 4E). Thus, P3a amplitude should scale with both SPE and Confidence. We tested these predictions by examining whether P3a was modulated by SPE and Confidence in a model that also included Error Magnitude, RPE, and Block as control variables.
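The simulation behind Figure 4F can be sketched by mapping confidence onto the precision (1/sd) of a Gaussian predictive distribution centered on the prediction; the Gaussian form and that mapping are our assumptions.

```r
# Surprise as Shannon information (surprisal): -log p(outcome | prediction)
surprisal <- function(spe, confidence) {
  -dnorm(spe, mean = 0, sd = 1 / confidence, log = TRUE)
}

spe_grid  <- seq(0, 2, length.out = 50)    # deviation from the prediction
conf_grid <- seq(0.5, 4, length.out = 50)  # precision of the prediction

surprise <- outer(spe_grid, conf_grid, surprisal)

# Surprise grows with SPE at fixed confidence and, for nonzero SPE, with
# confidence at fixed SPE: the two main effects visible in Figure 4F
image(spe_grid, conf_grid, surprise, xlab = "SPE", ylab = "Confidence")
```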

As predicted, our analyses showed that P3a amplitude significantly increased with increasing SPE, in line with the idea of stronger violations of expectations by less accurately predicted outcomes (Figure 4G and I, Table 5), and with increasing Confidence (Figure 4G and H). Our Shannon Information simulation also predicts a small interaction between SPE and Confidence (see the slight diagonal component in Figure 4F). However, when including the interaction term it was not significant and did not improve model fit, Δχ²(10) = 10.36, p=0.410 (AIC(reduced) − AIC(full) = −10, BIC(reduced) − BIC(full) = −81), suggesting that any such effect was minimal.
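Model comparisons of this kind reduce to a likelihood-ratio test between nested lmer fits, sketched below on the toy data with assumed column names; the p3a, rpe, and em columns are constructed only so that the call runs, and anova() refits both models with ML before comparing.

```r
dat$p3a <- rnorm(nrow(dat))                                   # placeholder DV
dat$rpe <- abs(dat$predicted_outcome) - abs(dat$signed_error) # toy RPE
dat$em  <- abs(dat$signed_error)                              # toy EM

m_reduced <- lmer(p3a ~ confidence + block + spe + rpe + em +
                    (spe | participant), data = dat)
m_full    <- lmer(p3a ~ (confidence + block + spe + rpe + em)^2 +  # adds all
                    (spe | participant), data = dat)   # 10 two-way interactions
anova(m_reduced, m_full)  # chi-square difference test, plus AIC and BIC
```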

Table 5
LMM statistics of learning effects on P3a.
P3a Amplitude
Predictors | Estimates | SE | CI | t | p
Intercept | 4.10 | 0.42 | 3.28 to 4.93 | 9.79 | 1.293e-22
Confidence | 0.97 | 0.14 | 0.70 to 1.24 | 6.96 | 3.338e-12
Block | −0.91 | 0.07 | −1.05 to −0.77 | −12.93 | 3.201e-38
Sensory prediction error | 2.06 | 0.48 | 1.11 to 3.00 | 4.27 | 1.969e-05
Reward prediction error | −0.75 | 0.38 | −1.49 to −0.01 | −1.98 | 4.794e-02
Error magnitude | −1.95 | 0.44 | −2.81 to −1.09 | −4.43 | 9.512e-06

Random effects | | Model parameters |
Residuals | 22.98 | N | 40
Intercept | 6.83 | Observations | 9678
SPE | 3.02 | log-Likelihood | −28997.990
 | | Deviance | 57995.981
  1. Formula: P3a ~ Confidence + Block + SPE + RPE + EM + (SPE | participant).

In addition, P3a amplitude decreased across blocks, perhaps reflecting the decreased motivational relevance of feedback as participants improved their performance and predictions (Walentowska et al., 2016; Severo et al., 2020). We also observed a significant decrease of P3a with increasing Error Magnitude and larger P3a amplitudes for more negative reward prediction errors. However, these effects showed more posterior scalp distributions than those of SPE and Confidence. As the P3a temporally overlaps with the more posteriorly distributed P3b, these effects are likely a spillover of the P3b, and we discuss them below. Taken together, our results support our hypothesis that predictions and confidence shape feedback processing at the level of the P3a.

Prediction errors, objective errors, and confidence converge in the P3b

Our overarching hypothesis and model predict that outcome predictions and confidence should affect the degree to which feedback is used for future behavioral adaptation. The parietally distributed P3b scales with learning from feedback (Ullsperger et al., 2014a; Fischer and Ullsperger, 2013; Yeung and Sanfey, 2004; Sailer et al., 2010; Chase et al., 2011) and predicts subsequent behavioral adaptation (Fischer and Ullsperger, 2013; Chase et al., 2011). P3b amplitude has been found to increase with feedback salience (reward magnitude irrespective of valence; Yeung and Sanfey, 2004), behavioral relevance (choice vs. no choice; Yeung et al., 2005), with more negative-going RPE (Ullsperger et al., 2014a; Fischer and Ullsperger, 2014), but also with better outcomes in more complex tasks (Pfabigan et al., 2014).

Consistent with a role in governing behavioral adaptation, the P3b was sensitive to participants’ outcome predictions (Table 6). P3b amplitude increased with increasing SPE (Figure 5B,E), indicating that participants extracted more information from the feedback stimulus when outcomes were less expected. As for the P3a, this SPE effect decreased across blocks, and so did overall P3b amplitude, suggesting that participants made less use of the feedback as they improved on the task (Fischer and Ullsperger, 2013).

Figure 5 with 1 supplement
Performance-relevant information converges in the P3b.

(A) Grand average ERP waveform at Pz with the time window for quantification, 416–516 ms, highlighted. (B) Effect topographies as predicted by LMMs for RPE, error magnitude, SPE, and the RPE by Confidence by Block interaction. (C–F) LMM-estimated effects on P3b amplitude visualized with the effects package in R; shaded areas represent 95% confidence intervals. (C) RPE. Note the interaction effects with Block and Confidence (D) that modulate this main effect. (D) Three-way interaction of RPE, Confidence, and Block. Asterisks denote significant RPE slopes within cells. (E) P3b amplitude as a function of SPE. (F) P3b amplitude as a function of Error Magnitude.

Table 6
LMM statistics of learning effects on P3b.
P3b Amplitude
Predictors | Estimates | SE | CI | t | p
Intercept | 4.12 | 0.29 | 3.55 to 4.70 | 14.12 | 2.937e-45
Block | −0.48 | 0.09 | −0.66 to −0.30 | −5.20 | 2.037e-07
Confidence | 0.08 | 0.20 | −0.31 to 0.48 | 0.42 | 6.740e-01
Reward prediction error | −1.12 | 0.46 | −2.03 to −0.22 | −2.43 | 1.493e-02
Sensory prediction error | 1.75 | 0.47 | 0.84 to 2.66 | 3.76 | 1.691e-04
Error magnitude | −2.35 | 0.46 | −3.24 to −1.45 | −5.14 | 2.743e-07
Confidence: Reward prediction error | −0.51 | 0.55 | −1.60 to 0.57 | −0.92 | 3.556e-01
Block: Confidence | 0.07 | 0.18 | −0.28 to 0.43 | 0.41 | 6.823e-01
Block: Reward prediction error | −0.52 | 0.44 | −1.39 to 0.34 | −1.19 | 2.359e-01
Block: Sensory prediction error | −0.98 | 0.46 | −1.88 to −0.07 | −2.12 | 3.405e-02
Block: Confidence: Reward prediction error | 2.22 | 0.72 | 0.81 to 3.64 | 3.08 | 2.057e-03

Random effects | | Model parameters |
Residuals | 23.95 | N | 40
Intercept | 3.17 | Observations | 9678
Sensory prediction error | 2.16 | log-Likelihood | −29197.980
Reward prediction error | 1.67 | Deviance | 58395.960
Confidence | 0.63 | |
  1. Formula: P3b ~ Block * (Confidence * RPE + SPE) + Error Magnitude + (SPE + RPE + Confidence | participant); Note: ‘:’ indicates interactions.

P3b amplitude also increased with negative-going RPE (Figure 5B,C), hence for worse-than-expected outcomes, replicating previous work (Ullsperger et al., 2014a; Fischer and Ullsperger, 2014). This RPE effect was significantly modulated by Confidence and Block, indicating that the main effect needs to be interpreted with caution and that the relationship between P3b and RPE is more nuanced than previous literature suggested. As shown in Figure 5D, in the first block P3b amplitude was highest for large negative RPE and high Confidence, whereas in the last block it was highest for large negative RPE and low Confidence (see below for follow-up analyses).

In line with previous findings (Pfabigan et al., 2014; Ernst and Steinhauser, 2018), we further observed significant increases of P3b amplitude with decreasing Error Magnitude, thus with better outcomes (Figure 5B,F, Table 6). We found no further significant interactions, and excluding the non-significant interaction terms from the full model did not significantly diminish goodness of fit, Δχ²(5) = 10.443, p=0.064 (AIC(reduced) − AIC(full) = 0; BIC(reduced) − BIC(full) = −35).

Our hypothesis states that the degree to which people rely on their predictions when learning from feedback should vary with their confidence in those predictions. In the analysis above, we observed such an interaction with confidence only for RPE (and Block). RPE is derived from the contrast between Error Magnitude and Predicted Error Magnitude, and changes in either variable or their weighting could drive the observed interaction. To better understand this interaction and test explicitly whether confidence regulates the impact of predictions, we therefore ran complementary analyses where instead of RPE we included Predicted Error Magnitude (Table 7). Confirming its relevance for the earlier interaction involving RPE, Predicted Error Magnitude indeed interacted with Confidence and Block. Consistent with confidence-based regulation of learning, a follow-up analysis showed that in the first block, P3b significantly increased with higher Confidence, and importantly decreased significantly more with increasing Predicted Error Magnitude as Confidence increased (Supplementary file 8). Main effects of Predicted Error Magnitude emerged only in the late blocks when participants were overall more confident.

Table 7
LMM statistics of confidence weighted predicted error discounting on P3b.
P3b Amplitude
Predictors | Estimates | SE | CI | t | p
Intercept | 4.26 | 0.30 | 3.68 to 4.85 | 14.22 | 7.239e-46
Confidence | 0.31 | 0.22 | −0.12 to 0.75 | 1.41 | 1.595e-01
Predicted error magnitude | −0.83 | 0.46 | −1.74 to 0.07 | −1.80 | 7.133e-02
Block | −0.32 | 0.11 | −0.52 to −0.11 | −2.98 | 2.860e-03
Error magnitude | −1.06 | 0.49 | −2.03 to −0.09 | −2.13 | 3.277e-02
Sensory prediction error | 1.49 | 0.40 | 0.71 to 2.28 | 3.72 | 1.992e-04
Confidence: Predicted error magnitude | −0.98 | 0.69 | −2.34 to 0.38 | −1.41 | 1.582e-01
Confidence: Block | −0.50 | 0.20 | −0.90 to −0.11 | −2.50 | 1.249e-02
Predicted error magnitude: Block | −1.12 | 0.56 | −2.22 to −0.02 | −2.00 | 4.540e-02
Confidence: Predicted error magnitude: Block | 3.12 | 0.84 | 1.47 to 4.78 | 3.70 | 2.141e-04

Random effects | | Model parameters |
Residuals | 23.98 | N | 40
Intercept | 3.30 | Observations | 9678
Error magnitude | 3.43 | log-Likelihood | −29201.951
Confidence | 0.72 | Deviance | 58403.902
  1. Formula: P3b ~ Block * (Confidence * Predicted Error Magnitude + SPE) + Error Magnitude + (Error Magnitude + Confidence | participant); Note: ‘:’ indicates interactions.

Hence, our P3b findings indicate that early on in learning, when little is known about the task, participants learn more and discount their predictions more when they have high confidence in those predictions. In later trials, however, when confidence is higher overall, participants discount their predicted errors even when confidence is relatively lower.

We next explored whether P3b amplitude is associated with trial-by-trial adjustments. To that aim, we computed the improvement on trial n as the difference between the error on trial n and the error on trial n-1. Time-estimation responses are noisy, and thus provide only a coarse trial-by-trial indicator of learning. Consistent with regression to the mean, where larger errors are more likely followed by smaller errors, improvements increased with the magnitude of the error on the previous trial (b = 0.85, p<0.001, Supplementary file 9). We find, however, that this effect varies across blocks (b = 0.13, p<0.001), and is least pronounced in the first block when most learning takes place (Block 1: b = 0.66, p<0.001; Blocks 2–5: b >= 0.90, p<0.001, Supplementary file 10). We thus next tested whether P3b on trial n-1 mediates the relationship between error magnitude on trial n-1 and the improvement on the current trial, leading to stronger improvements following a given error, particularly in the first block when most learning takes place and performance is less determined by previous error alone. Indeed, we found a significant three-way interaction between previous error magnitude, previous P3b amplitude, and Block (b = −0.03, p=0.031, Supplementary file 9, Figure 5—figure supplement 1) on improvement. A follow-up analysis confirmed that P3b mediated the relationship between previous error magnitude and improvement in the first block (b = 0.06, p<0.001, Supplementary file 10). This interaction was not significant within any of the remaining blocks. While intriguing and in line with previous work linking P3b to trial-by-trial adjustments in behavior, these results should be interpreted with a degree of caution given that the present task is not optimized to test for trial-to-trial adjustments in behavior.
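A sketch of the trial-level improvement measure, continuing the toy data frame from above; the sign convention (previous error minus current error, so that positive values mean improvement) is our reading of the definition in the text.

```r
library(dplyr)

dat <- dat %>%
  group_by(participant) %>%
  mutate(
    em_trial    = abs(signed_error),
    prev_em     = lag(em_trial),      # error magnitude on trial n-1
    improvement = prev_em - em_trial  # positive = error shrank on trial n
  ) %>%
  ungroup()
# A lagged P3b amplitude (lag(p3b)) would enter the moderation/mediation
# models the same way, crossed with prev_em and block
```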

Taken together, our ERP findings support our main hypothesis that individuals take their internal evaluations into account when processing feedback, such that distinct ERP components reflect different aspects of internal evaluations rather than just signaling objective error.

Discussion

The present study explored the hypothesis that learning from feedback depends on internal performance evaluations as reflected in outcome predictions and confidence. Comparing different Bayesian agents with varying insights into their trial-by-trial performance, we show that performance monitoring provides an advantage in learning, as long as agents’ confidence is accurately calibrated. To test our hypothesis empirically, we collected participants’ trial-wise outcome predictions and confidence judgments in a time-estimation task prior to receiving feedback, while recording EEG. Like the simulations from the Bayesian learner with performance monitoring, our empirical results show that trial-by-trial confidence tracks the precision of outcome predictions, and individuals with better coupling between confidence and the precision of their predictions (confidence calibration) showed greater improvements in performance over the course of the experiment. Moreover, participants’ subjective predictions, as well as their confidence in those predictions, influenced feedback processing as revealed by feedback-related potentials.

Our study builds on an extensive body of work on performance monitoring, proposing that deviations from performance goals are continuously monitored, and expectations are updated as soon as novel information becomes available (Holroyd and Coles, 2002; Ullsperger et al., 2014b). Hence, performance monitoring at later stages should depend on performance monitoring at earlier stages (Holroyd and Coles, 2002). Specifically, learning from feedback should critically depend on internal performance monitoring. Our results extend previous work demonstrating a shift from feedback-based to response-based evaluations as learning progresses (Bellebaum and Colosio, 2014; Bultena et al., 2017): They show that performance monitoring and learning from feedback are not mutually exclusive modes of performance evaluation; instead, they operate in concert, with confidence in response-based outcome predictions determining the degree to which this information is relied on.

Participants’ behavior displayed hallmarks of error monitoring (Kononowicz et al., 2019; Akdoğan and Balcı, 2017; Kononowicz and van Wassenhove, 2019), such that outcome predictions tracked actual errors in both direction and magnitude. Crucially, extending those previous findings, our empirical results align with unique predictions based on our hypothesis: Confidence reflected the precision of participants’ outcome predictions, and participants with superior calibration of their confidence judgments to the accuracy of their predictions learned better than those with poorer calibration. This latter finding is notable given that overall confidence calibration was similar for participants with different performance quality (error magnitude, response variance). Therefore, the empirical confidence calibration effect on learning is unlikely to be a consequence of better overall ability as described in the ‘unskilled and unaware effect’ (Kruger and Dunning, 1999) or of the dependence of confidence calibration (or metacognitive sensitivity) on performance (Fleming and Lau, 2014). Instead, the finding supports our hypothesis that confidence supports learning via optimized feedback processing.

Our simulations and ERP data reveal two critical mechanisms through which performance monitoring may impact learning from feedback: modulation of surprise and reduction of uncertainty via credit assignment. The main impact of error monitoring is to reduce the surprise about outcomes. All else being equal, a given outcome is less surprising the better it matches the predicted outcome. Consistent with discounting of predicted deviations from the goal, we found that participants’ trial-by-trial subjective outcome predictions consistently modulated feedback-based evaluation reflected in ERPs as evidenced by prediction error effects. Participants’ response-based outcome predictions were reflected in the amplitudes of FRN reward prediction error signals (Holroyd and Coles, 2002; Walsh and Anderson, 2012; Sambrook and Goslin, 2015; Correa et al., 2018), of P3a surprise signals, as well as the P3b signals combining information about reward prediction error and surprise. In our computational models, reducing surprise by taking response-based outcome predictions into account led to more accurate updating of internal representations supporting action selection and thus superior learning.

Learning – in our computational model and in our participants – was further supported by the adaptive regulation of uncertainty-driven updating via confidence. Specifically, as deviations from the goal were predicted with higher confidence, these more precise outcome predictions enhanced the surprise elicited by a given prediction error. This mechanism implemented in our model is mirrored in participants’ increased P3a amplitudes for higher confidence, and further reflected in confidence-weighted impacts of predicted error magnitude on P3b, as well as larger P3b amplitudes for higher confidence in the first block when most learning took place. Thus, a notable finding revealed by our simulations and empirical data is that, counterintuitively, agents and participants learned more from feedback when confidence in their predictions had been high.

Although FRN amplitude was not modulated by confidence, we found that P3a increased with confidence, as predicted by uncertainty-driven changes in surprise. Our results align with previous findings of larger P3a amplitude for metacognitive mismatch (Butterfield and Mangels, 2003) and offer a computational mechanism underlying previous theorizing that feedback about errors committed with high confidence attracts more attention, and therefore leads to hypercorrection (Butterfield and Metcalfe, 2006; Butterfield and Metcalfe, 2001). We also found that confidence modulated the degree to which predicted error magnitude reduced P3b amplitude, such that in initial blocks, where most learning took place, predicted error magnitude effects were amplified for higher confidence, whereas this effect diminished in later blocks, where predicted error magnitude effects were present also for low confidence (and performance and prediction errors were attenuated when confidence was high). This shift is intriguing and may indicate a functional change in feedback use as certainty in the response-outcome mapping increases and less about this mapping is learned from feedback, but the effect was not directly predicted and therefore warrants further research and replication.

Confidence has typically been studied in two-alternative choice tasks, and only rarely in relation to continuous outcomes (Meyniel et al., 2015; Meyniel and Dehaene, 2017; Boldt et al., 2019; Lebreton et al., 2015; Nassar et al., 2012; Arbuzova, 2020). By reconceptualizing error detection as outcome prediction, our results shed new light on the well-supported claim that error monitoring and confidence are tightly intertwined (Boldt and Yeung, 2015; Yeung and Summerfield, 2012; Charles and Yeung, 2019; Desender et al., 2018b; Desender et al., 2019) and forge valuable links between research on performance monitoring (Ullsperger et al., 2014a; Holroyd and Coles, 2002; Ullsperger et al., 2014b) and on learning under uncertainty (McGuire et al., 2014; Behrens et al., 2007; O'Reilly et al., 2013; Nassar et al., 2019). In doing so, our results provide further evidence to the growing literature on the role of confidence in learning and behavioral adaptation (Meyniel and Dehaene, 2017; Desender et al., 2018a; Boldt et al., 2019; Colizoli et al., 2018).

While we captured the main effects of interest with our Bayesian model and our key behavioral results are in line with our overarching hypothesis, our behavioral findings also reveal other aspects of learning that remain to be followed up on. Unlike our Bayesian agents, participants exhibited signatures of learning not only at the level of first order performance, but also at the level of performance monitoring. The precision of their outcome predictions increased as learning progressed, as did confidence. Identifying the mechanisms that drive this metacognitive learning, that is, whether changes in confidence follow the uncertainty in the internal model or reflect refinement of the confidence calibration to the efference copy noise, is an exciting question for future work.

Anyone who has tried to learn a sport can relate to the intuition that just because you find out that what you did was wrong does not mean you know how to do it right. Our task also evokes this so-called distal problem, which refers to the difficulty of translating distal sensory outcomes of responses (e.g. the location of a red dot on a feedback scale) into required proximal changes in movement parameters (here, changes in the timing of the response). Indeed, when practicing complex motor tasks, individuals prefer and learn better from feedback following successful trials compared to error trials (Chiviacowsky and Wulf, 2007; Chiviacowsky and Wulf, 2002; Chiviacowsky and Wulf, 2005). In line with the notion that in motor learning feedback about success is more informative than feedback about failure, we, like others using the time-estimation task (Pfabigan et al., 2014; Ernst and Steinhauser, 2018), observed increasing P3b amplitude after feedback about more accurate performance (i.e. for smaller error magnitude), in addition to prediction error effects.

In our study, the P3b component, previously shown to scale with the amount of information provided by a stimulus (Cockburn and Holroyd, 2018) and the amount of learning from a stimulus (Fischer and Ullsperger, 2013), was sensitive to both RPE and SPE, indicating that multiple learning mechanisms may act in parallel, supported by different aspects of feedback. Our findings resonate with recent work in rodents showing that prediction error signals encode multiple features of outcomes (Langdon et al., 2018), and are based on distributed representations of predictions (Dabney et al., 2020). This encoding of multiple features of outcomes, like the uncertainty in predictions, may help credit assignment and support learning at multiple levels. It is still unclear to what degree different learning mechanisms – error-based, model-based, reinforcement learning – contribute to motor learning (Wolpert and Flanagan, 2016). Further research is needed to identify whether the same or different learning mechanisms operate across levels, for example, via hierarchical reinforcement learning (Holroyd and Yeung, 2012; Lieder et al., 2018), and how learning interacts between levels.

Taken together, our findings provide evidence that feedback evaluation is fundamentally affected by an individual’s internal representations of their own performance at the time of feedback. These internal representations in turn influence how people learn and thus which beliefs they will have and which actions they will take, driving what internal and external states they will encounter in the future. The present study is a first step toward elucidating this recursive process of performance optimization via internal performance monitoring and monitoring of external task outcomes.

Materials and methods

Task variables

  • $t$ denotes the target interval, which was set to $t := 19$ (this simulation parameter choice was necessarily somewhat arbitrary; choosing a different value does not change the model’s behavior),

  • $s$ denotes the feedback scale, which was set to $s := 90$,

  • $r$ denotes the model’s or participant’s response,

  • $f$ denotes the feedback in the task, which was defined as

(1) $f := (r - t) \cdot s$
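
For concreteness, the feedback rule can be written as a one-line function. The sketch below is a minimal illustration in R (the language used for the statistical analyses reported later), using the simulation parameters above; the function and variable names are ours, not from the original implementation.

    # Feedback rule (Equation 1) with the simulation parameters given above.
    t_true <- 19                                   # target interval
    s_true <- 90                                   # feedback scale
    feedback <- function(r) (r - t_true) * s_true
    feedback(25)                                   # an overshoot of 6 yields feedback of 540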

Computational model


The Bayesian learner with performance monitoring attempted to sequentially infer the target interval t and feedback scale s (defining how the magnitude of a given response error translates to the magnitude of the error displayed on the visual feedback scale) over multiple trials, based on its intended response i, an efference copy c of its executed response and feedback f indicating the magnitude and direction of its timing errors. On each trial the model computed its intended response based on the inferred target interval. During learning, the model faced several obstacles including (1) the initially unknown scale of the feedback, making it difficult to judge whether feedback indicates small or large timing errors, (2) response noise, which offsets executed responses from intended ones, and (3) efference copy noise, which makes the efference copy unreliable to a degree that varies from trial to trial. Formally, the Bayesian learner with performance monitoring is represented by the following variables:

  • $p(t) = U(t; [0, 100])$ denotes the model’s prior distribution over the target interval $t$, a uniform distribution (denoted by $U$ throughout) over possible values of $t$ within the range 0 to 100.

  • $p(s) = U(s; [0.1, 100])$ denotes the model’s prior distribution over the feedback scale $s$, a uniform distribution over possible values of $s$ within the range 0.1 to 100.

  • $p_{\sigma_r^2}(r \mid i) = N(r; i, \sigma_r^2)$ denotes the model’s response distribution ($N$ denoting normal distributions throughout), where $i$ denotes the model’s intended response, which corresponds to the expected target interval $i := E_{p(t)}[t] = \sum_t p(t) \cdot t$, and $\sigma_r^2$ denotes the response noise, which was set to $\sigma_r := 10$ in terms of the standard deviation.

  • $p_{\sigma_c^2}(c \mid r) = N(c; r, \sigma_c^2)$ denotes the model’s efference-copy distribution with efference-copy noise (we simulated three levels: low, medium, and high) expressed as standard deviations $\sigma_c \in \{5, 10, 20\}$, where $p(\sigma_c) = Cat(\sigma_c; 1/3)$, that is, the three levels were equally probable. Here we assumed that the model was aware of its trial-by-trial efference-copy noise. That is, from the perspective of the Bayesian learner with performance monitoring, efference-copy noise was not a random variable.
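
Continuing the R sketch above, the priors and noise parameters can be set up on a discrete grid as below. The grid resolutions (100 values of $t$ by 50 values of $s$, matching the matrix dimensions mentioned in the Learning section) reflect our reading of the text; all names are illustrative.

    # Discrete grids approximating the uniform priors p(t) and p(s).
    t_grid <- seq(0, 100, length.out = 100)        # support of p(t)
    s_grid <- seq(0.1, 100, length.out = 50)       # support of p(s)
    prior  <- matrix(1, nrow = 100, ncol = 50)     # uniform joint prior p(t, s)
    prior  <- prior / sum(prior)

    sigma_r        <- 10                           # response noise (SD)
    sigma_c_levels <- c(5, 10, 20)                 # efference-copy noise levels, p = 1/3 each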

Intended response


During the response phase in the task, the model first computed its intended response i and then sampled its actual response r from the Gaussian response distribution. We assumed that the model’s internal response monitoring system subsequently generated the noisy efference copy c.
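
Continuing the sketch, one trial of the response phase might look as follows; `c_copy` is our name for the efference copy (avoiding R’s built-in `c`), and `r_exec` for the executed response.

    # Response phase: intended response, noisy execution, noisy efference copy.
    intended <- sum(rowSums(prior) * t_grid)             # i = E_p(t)[t]
    sigma_c  <- sample(sigma_c_levels, 1)                # trial-wise copy noise level
    r_exec   <- rnorm(1, mean = intended, sd = sigma_r)  # executed response
    c_copy   <- rnorm(1, mean = r_exec, sd = sigma_c)    # noisy efference copy
    f_obs    <- feedback(r_exec)                         # observed feedback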

Learning


Based on the definition of the task and the Bayesian learner with performance monitoring, the joint distribution over the variables of interest during a trial of the task is given by

(2) $p_{i,\sigma_c^2,\sigma_r^2}(t, s, f, r, c) := p(f \mid r, t, s)\, p_{\sigma_c^2}(c \mid r)\, p_{i,\sigma_r^2}(r)\, p(t, s)$

To infer the target interval $t$ and the feedback scale $s$, we can evaluate the posterior distribution conditional on the efference copy $c$ and feedback $f$, given the intended response $i$, response noise $\sigma_r^2$, and efference-copy noise $\sigma_c^2$, according to Bayes’ rule:

(3) $p_{i,\sigma_c^2,\sigma_r^2}(t, s \mid c, f) \propto \int p_{i,\sigma_c^2,\sigma_r^2}(t, s, f, r, c)\, dr = \int p(f \mid r, t, s)\, p_{\sigma_c^2}(c \mid r)\, p_{i,\sigma_r^2}(r)\, p(t, s)\, dr$

Note that we assumed that the Bayesian learner with performance monitoring was aware of how feedback $f$ was generated in the task (Equation 1); that is, conditional on the response $r$, target interval $t$, and feedback scale $s$, the model was able to compute the probability of the feedback exactly, according to

(4) $p(f \mid r, t, s) = \begin{cases} 1, & \text{if } f = (r - t) \cdot s \\ 0, & \text{else} \end{cases}$

We approximated inference using a grid over the target interval $t \in [0, 100]$ and feedback scale $s \in [0.1, 100]$. The model first computed the probability of the currently received feedback. Although it was aware of how feedback was generated in the task, it suffered from uncertainty over its own response due to noise in the efference copy. For each $s$ and $t$ on the grid, the model evaluated the Gaussian distribution

(5) $p_{i,\sigma_c^2,\sigma_r^2}(f \mid c, t, s) := N\!\left(f;\, (m - t)^T s,\; v \cdot A\right)$

where A was a 100 × 50 matrix containing the grid values of t and s.

(6) $v = \frac{1}{\frac{1}{\sigma_c^2} + \frac{1}{\sigma_r^2}}$

denotes the expected variance in feedback under consideration of both efference-copy noise $\sigma_c^2$ and response noise $\sigma_r^2$, and

(7) $m = v \cdot \frac{1}{\sigma_c^2} \cdot c + v \cdot \frac{1}{\sigma_r^2} \cdot i$

denotes the expected feedback under the additional consideration of efference copy c and intended response i. When computing the probability of the feedback, our model thus took into account the efference copy c and the response it intended to produce i, which were weighted according to their respective reliabilities.

Second, the model multiplied the computed probability of the observed feedback by the prior over the scale and target interval, that is

(8) $p_{i,\sigma_c^2,\sigma_r^2}(t, s \mid c, f) \propto N\!\left(f;\, (m - t)^T s,\; v \cdot A\right) \cdot p(s, t)$

In the grid approximation, the model started with a uniform prior on the joint distribution over t and s and was applied recursively, such that the posterior joint distribution on t and s for each trial served as the prior distribution for the subsequent one.
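
Putting Equations 5–8 together, one recursive grid update might look like the sketch below. Note one assumption on our part: we read the variance term $v \cdot A$ in Equation 5 as the response-level variance $v$ scaled by the squared grid value of $s$ (since $f = (r - t) \cdot s$ implies $\mathrm{Var}(f) = s^2 v$), so the likelihood is evaluated with standard deviation $s\sqrt{v}$ at each grid point.

    # One recursive Bayesian update (Equations 5-8) on the (t, s) grid.
    v <- 1 / (1 / sigma_c^2 + 1 / sigma_r^2)               # Equation 6
    m <- v * (c_copy / sigma_c^2 + intended / sigma_r^2)   # Equation 7

    # Likelihood of the observed feedback at every grid point (Equation 5),
    # assuming Var(f | t, s) = s^2 * v, i.e. SD = s * sqrt(v).
    lik <- outer(t_grid, s_grid,
                 function(t, s) dnorm(f_obs, mean = (m - t) * s, sd = s * sqrt(v)))

    posterior <- lik * prior                               # Equation 8
    posterior <- posterior / sum(posterior)
    prior     <- posterior    # the posterior serves as the next trial's prior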

Outcome prediction


On each trial, the model reports an outcome prediction $p_o$ and the confidence in this prediction, which we refer to as $c_o$. The outcome prediction maps the discrepancy between the intended response and the efference copy onto the feedback scale, given the best guess of the current $t$ and $s$:

(9) $p_o = (c - i) \cdot \sum_s p(s) \cdot s$

Note that this outcome prediction is different from the mean of the uncertainty-weighted expectancy distribution defined in Equation 7, in that it does not take uncertainty into account but reflects the expectation given the efference copy alone. The inverse uncertainty (precision) $1/\sigma_c^2$ of the efference copy, as described below, is translated into the agent’s confidence report.
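
In the sketch’s terms, Equation 9 amounts to scaling the intended-versus-sensed discrepancy by the currently expected feedback scale:

    # Outcome prediction (Equation 9): map (c - i) onto the feedback scale
    # using the posterior-expected value of s.
    p_s_marginal <- colSums(posterior)                      # marginal p(s)
    p_o <- (c_copy - intended) * sum(p_s_marginal * s_grid)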

Confidence calibration


Confidence calibration $cc \in \{0, 0.75, 1\}$ denotes the probability that the agent assumes the correct efference-copy variance $\sigma_c^2$ for learning (cf. Equations 6 and 7). $cc = 1$ indicates that the subjectively assumed efference-copy precision is always equal to the true precision; $cc = 0$, in contrast, indicates that the assumed precision of the efference copy is always different from the true one. In the case of $cc = 0.75$, the agent most often assumes the true precision of the efference copy but sometimes fails to take it accurately into account during learning. As shown in Figure 1, we simulated the behavior of three agents that differed in their confidence calibration according to this idea.

Confidence report


In our model, the confidence report $c_o \in \{3, 2, 1\}$ (we simulate three levels, as for the efference-copy noise) reflects how certain the agent thinks its efference copy is, where 3 refers to 'completely certain' and 1 to 'not certain'. In particular,

(10) $c_o = \begin{cases} 3, & \text{if } \sigma_c = 5 \\ 2, & \text{if } \sigma_c = 10 \\ 1, & \text{if } \sigma_c = 20 \end{cases}$

That is, the confidence report is directly related to the subjective precision of the efference copy, which, as shown above, depends on the agent’s level of confidence calibration.
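
A minimal way to simulate the confidence report together with the calibration parameter $cc$ is sketched below. Treating miscalibration as substituting one of the other noise levels with probability $1 - cc$ is our interpretation of the definition above, not a detail the text spells out.

    # Confidence report (Equation 10): map the noise level to {3, 2, 1}.
    conf_report <- c(`5` = 3, `10` = 2, `20` = 1)[as.character(sigma_c)]

    # Confidence calibration: with probability cc the agent uses the true
    # sigma_c for learning, otherwise one of the other levels (our reading).
    cc <- 0.75
    sigma_c_used <- if (runif(1) < cc) sigma_c else
      sample(setdiff(sigma_c_levels, sigma_c), 1)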

Model with incomplete performance monitoring


We also applied a model that had no insight into the precision of its current predictions. The agent was thus unaware of its trial-by-trial efference-copy noise $\sigma_c$ and therefore relied on the average value of $\sigma_c$, that is, $E_{p(\sigma_c)}[\sigma_c] \approx 12$. As the model does not differentiate between precise and imprecise predictions but treats every prediction as being of average precision, it relies too much on imprecise predictions and too little on precise ones.

Model without performance monitoring


Finally, we applied a model that was aware of its response noise but, because it completely failed to consider its efference copy, lacked insight into its trial-by-trial performance. In this version, we had

(11) $v = \frac{1}{1/\sigma_r^2} = \sigma_r^2$

and $m = i$.

This model accounts for the expected variance in the feedback, but cannot differentiate between trials in which the feedback is driven more by incorrect beliefs about the target versus incorrect execution. Therefore, it adjusts its beliefs too much following feedback primarily driven by execution errors and too little following feedback primarily driven by incorrect beliefs.
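
In the sketch’s terms, the two reduced models differ from the full model only in the $v$ and $m$ that enter the likelihood in Equation 5:

    # Incomplete performance monitoring: replace the trial-wise sigma_c by
    # its average value (12, as stated above) when forming v and m.
    v_inc <- 1 / (1 / 12^2 + 1 / sigma_r^2)
    m_inc <- v_inc * (c_copy / 12^2 + intended / sigma_r^2)

    # No performance monitoring: ignore the efference copy (Equation 11).
    v_non <- sigma_r^2     # = 1 / (1 / sigma_r^2)
    m_non <- intended      # m = i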

Participants

The experimental study included 40 participants (13 males) whose average age was 25.8 years (SD = 4.3) and whose mean handedness score (Oldfield, 1971) was 63.96 (SD = 52.09; i.e., most participants were right-handed). Participants gave informed consent to the experiment and were remunerated with course credits or 8 € per hour.

Task and procedure


Participants performed an adapted time-estimation task (Luft et al., 2014; Miltner et al., 1997) that included subjective accuracy and confidence ratings (similar to Kononowicz et al., 2019; Akdoğan and Balcı, 2017; Kononowicz and van Wassenhove, 2019). Participants were instructed that their primary goal in this task was to learn to produce an initially unknown time interval. In addition, they were asked to predict the direction and magnitude of any errors they produced, and to rate their confidence in those predictions. The time-estimation task is well established for ERP analyses (e.g., Luft et al., 2014; Miltner et al., 1997) and has the advantages that it limits the degrees of freedom of the response and precludes concurrent visual feedback that might affect performance evaluation. Each trial consisted of four parts, illustrated in Figure 1B. After a fixation cross lasting for a random interval of 300–900 ms, a tone (600 Hz, 200 ms duration) was presented. Participants’ task was to terminate an initially unknown target interval of 1504 ms from tone onset by pressing a response key with their left hand. We chose a supra-second duration to make the task sufficiently difficult (Luft et al., 2014). Following the response, a fixation cross was presented for 800 ms. Participants then estimated the accuracy of the interval they had just produced by moving an arrow on a visual analogue scale (too short – too long; ±125 pixels, 3.15° visual angle) using a mouse cursor with their right hand. Then, on a scale of the same size, participants rated their confidence in this estimate (not certain – fully certain). The confidence rating was followed by a blank screen for 800 ms. Finally, participants received feedback about their performance via a red square (0.25° visual angle) placed on a scale identical to the accuracy-estimation scale but without any labels. The placement of the square on the scale visualized the error magnitude of the produced interval, with undershoots shown to the left and overshoots to the right of the center mark, which indicated a correct estimate. Feedback was presented for only 150 ms to preclude eye movements. The interval until the start of the next trial was 1500 ms.

The experiment comprised five blocks of 50 trials each, with self-paced rests between blocks. We used Presentation software (Neurobs.) for stimulus presentation and event and response logging. Visual stimuli were presented on a 17’’ BenQ monitor (4:3 aspect ratio, resolution: 1280 × 1024, refresh rate: 60 Hz) placed at 60 cm distance from the participant. A standard computer mouse and a customized response button (accuracy 2 ms, response latency 9 ms) were used for response registration.

Prior to the experiment, participants filled in demographic and personality questionnaires: the Neuroticism and Conscientiousness scales of the NEO PI-R (Costa and McCrae, 1992) and the BIS/BAS scales (Strobel et al., 2001), as well as a subset of Raven’s progressive matrices (Raven, 2000) as an index of figural-spatial intelligence. These measures were registered as potential control variables and for other purposes not addressed here. Participants were then seated in a shielded EEG cabin, where the experiment including EEG recording was conducted. Prior to the experiment proper, participants performed three practice trials.

Psychophysiological recording and processing


Using BrainVision Recorder software (Brain Products, München, Germany), we recorded EEG data from 64 Ag/AgCl electrodes mounted in an electrode cap (ECI Inc.), referenced against Cz, at a sampling rate of 500 Hz. Electrodes below the eyes (IO1, IO2) and at the outer canthi (LO1, LO2) recorded vertical and horizontal ocular activity. We kept electrode impedance below 5 kΩ and applied a 100 Hz low-pass filter, a time constant of 10 s, and a 50 Hz notch filter. At the beginning of the session, we recorded 20 trials each of prototypical eye movements (up, down, left, right) for offline ocular artifact correction.

EEG data were processed using custom Matlab (The MathWorks Inc) scripts (Frömer et al., 2018) and EEGlab toolbox functions (Delorme and Makeig, 2004). We re-referenced the data to the average reference and retrieved the Cz channel. The data were band-pass filtered between 0.5 and 40 Hz. Ocular artifacts were corrected using BESA (Ille et al., 2002). We segmented the ongoing EEG from −200 to 800 ms relative to feedback onset. Segments containing artifacts, defined as amplitudes exceeding ±150 µV or gradients larger than 50 µV between two adjacent sampling points, were excluded from analyses. Baselines were corrected to the 200 ms interval preceding feedback onset.

The FRN was quantified in single-trial ERP waveforms as peak-to-peak amplitude at electrode FCz, specifically as the difference between the minimum voltage in a window from 200 to 300 ms post-feedback onset and the preceding positive maximum in a window from −100 to 0 ms relative to the detected negative peak. To define the time windows for single-trial analyses of P3a and P3b amplitudes, we first determined the average subject-wise peak latencies at FCz and Pz, respectively, and exported 100 ms time windows centered on the respective latencies. Accordingly, the P3a was quantified on single trials as the average voltage within an interval from 330 to 430 ms after feedback onset across all electrodes within a fronto-central region of interest (ROI: F1, Fz, F2, FC1, FCz, FC2, C1, Cz, C2). P3b amplitude was quantified in single trials as the average voltage within a 416–516 ms interval post-feedback across all electrodes within a parietally-focused region of interest (ROI: CP1, CPz, CP2, P1, Pz, P2, PO3, POz, PO4).
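
To illustrate the single-trial FRN rule described above, the sketch below applies the peak-to-peak quantification to one epoch. The epoch layout (−200 to 800 ms at 500 Hz, so feedback onset falls at sample 101) follows the segmentation described earlier; the function name is ours, and the original quantification was implemented in Matlab.

    # Peak-to-peak FRN for one FCz epoch (numeric vector of voltages):
    # minimum in 200-300 ms post-feedback minus the preceding positive
    # maximum within 100 ms before that minimum.
    frn_peak_to_peak <- function(epoch, srate = 500, onset = 101) {
      neg_win <- onset + round(0.200 * srate):round(0.300 * srate)
      neg_idx <- neg_win[which.min(epoch[neg_win])]        # negative peak
      pos_win <- (neg_idx - round(0.100 * srate)):neg_idx  # preceding 100 ms
      max(epoch[pos_win]) - epoch[neg_idx]                 # peak-to-peak amplitude
    }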

Analyses


Outlier inspection of the behavioral data identified one participant with aberrant behavior (average RT > 10 s) and one trial each in four additional participants (RTs > 6 s, 0.4% of the remaining participants’ data). These data were excluded from further analyses. We computed two kinds of prediction errors (Figure 3A): SPE was determined as the absolute difference between the predicted and actual interval length: |Prediction – Feedback|. RPE was computed as the difference between the absolute predicted error and the absolute actual error as revealed by feedback: |Prediction| – |Feedback|. We quantified confidence calibration as each participant’s correlation of confidence and SPE (the absolute deviation of the prediction from the actual outcome) across all trials, controlling for average error magnitude per block to account for shared changes of our confidence-calibration measure with performance. To ease interpretation, we sign-reversed the correlations, such that higher values correspond to better confidence calibration.
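
In code, the two prediction errors and the calibration index might be computed as below. Residualizing on block-wise average error magnitude (using |feedback| as the error-magnitude proxy, since feedback is proportional to the produced error) is one reasonable implementation of the control described above; the data frame and its columns are hypothetical names.

    # 'd' is a hypothetical trial-wise data frame with columns prediction,
    # feedback (both signed, on the feedback scale), confidence, and block.
    d$SPE <- abs(d$prediction - d$feedback)        # sensory prediction error
    d$RPE <- abs(d$prediction) - abs(d$feedback)   # reward prediction error

    # Confidence calibration: sign-reversed correlation of confidence and
    # SPE, controlling for block-wise average error magnitude.
    d$block_em  <- ave(abs(d$feedback), d$block)   # mean |error| per block
    calibration <- -cor(resid(lm(confidence ~ block_em, data = d)),
                        resid(lm(SPE ~ block_em, data = d)))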

Statistical analyses were performed by means of linear mixed models (LMMs) using R (R Development Core Team, 2014) and the lme4 package (R Package, 2014). We chose LMMs because, like linear multiple regression models, they allow for parametric analyses of single-trial measures. Further, LMMs are robust to unequally distributed numbers of observations across participants, and they simultaneously estimate fixed effects and random variance between participants in both intercepts and slopes. For all dependent variables, full models including all predictors were reduced step-wise until model comparisons indicated significantly decreased fit.

We report model comparisons and fit indices: Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), which decrease with improving model fit. Random effect structures were determined using singular value decomposition. Variables explaining zero variance were removed from the random effects structure (Bates et al., 2015; Matuschek et al., 2017).

Prior to the analyses, error magnitude, RPE, and SPE were rescaled from milliseconds to seconds, and confidence and block were scaled to a range of ±1, so that all predictors were on similar scales. Furthermore, block, error magnitude, confidence, and SPE were centered on their medians for accurate intercept estimation. RPE was not centered, as zero represents a meaningful value on its scale (predicted and actual error magnitude are the same), and positive and negative values are qualitatively different (negative and positive values represent outcomes that are worse or better than expected, respectively). Model formulas are reported in the respective tables. Fixed effects are visualized using the effects package (Fox and Weisberg, 2019).
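
For illustration, one model-fitting step in this spirit might look as follows with lme4. The formula is a placeholder rather than any of the reported models (those are given in the tables), and `participant` is a hypothetical grouping variable.

    library(lme4)

    # Full model with illustrative fixed effects and by-participant random
    # intercepts and slopes.
    m_full <- lmer(P3b ~ RPE * Confidence * Block + (1 + RPE | participant),
                   data = d)

    # Step-wise reduction: drop the highest-order term and compare fits
    # (AIC and BIC decrease with improving fit, as noted above).
    m_red <- update(m_full, . ~ . - RPE:Confidence:Block)
    anova(m_red, m_full)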

Data availability


The datasets generated and analyzed during the current study are available at https://github.com/froemero/Outcome-Predictions-and-Confidence-Regulate-Learning (copy archived at swh:1:rev:e8bfacf8fdb8126aade59581b98616b4f2fae7b3; Frömer, 2021).

Code availability


Scripts for all analyses are available at https://github.com/froemero/Outcome-Predictions-and-Confidence-Regulate-Learning.


References

    Butterfield B, Metcalfe J (2001) Errors committed with high confidence are hypercorrected. Journal of Experimental Psychology: Learning, Memory, and Cognition 27:1491–1494. https://doi.org/10.1037/0278-7393.27.6.1491

    Costa PT, McCrae RR (1992) Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI): Professional Manual. Psychological Assessment Resources.

    Fox J, Weisberg S (2019) An R Companion to Applied Regression. Thousand Oaks, CA: Sage.

    R Development Core Team (2014) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

    Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction. MIT Press.

    Yeung N, Summerfield C (2012) Metacognition in human decision-making: confidence and error monitoring. Philosophical Transactions of the Royal Society B: Biological Sciences 367:1310–1321. https://doi.org/10.1098/rstb.2011.0416

Decision letter

  1. Tadeusz Wladyslaw Kononowicz
    Reviewing Editor; Cognitive Neuroimaging Unit, CEA DRF/Joliot, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, France
  2. Richard B Ivry
    Senior Editor; University of California, Berkeley, United States
  3. Tadeusz Wladyslaw Kononowicz
    Reviewer; Cognitive Neuroimaging Unit, CEA DRF/Joliot, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, France
  4. Simon van Gaal
    Reviewer; University of Amsterdam, Netherlands

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The authors show a novel and important finding that participants use self-knowledge to optimize learning. Participants in a time estimation task used post-response information about their temporal errors to optimize learning. This is evident in the neural prediction error signals that indexed deviations from the intended target response. This work nicely integrates reinforcement-learning, time estimation and performance monitoring.

Decision letter after peer review:

Thank you for submitting your article "I knew that! Response-based Outcome Predictions and Confidence Regulate Feedback Processing and Learning" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Tadeusz Wladyslaw Kononowicz as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Richard Ivry as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Simon van Gaal (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The authors tested 40 human volunteers in a time-production task with post-production performance evaluation (with an initially unknown target duration and feedback scale) while recording EEG. The authors tested the hypothesis that confidence (both its absolute value and its calibration to performance) has an effect on learning and that it affects the processing of reward and sensory prediction errors.

The reviewers all found the results interesting and the work well conducted. At the same time, the reviewers agreed that the authors should be able to address several issues and clarify multiple aspects of the task, the performed analyses, and the data interpretation. The comments were compiled into the essential revisions below, separating remarks that call for additional data analysis from those proposing changes to the manuscript.

Essential revisions:

Additional analyses:

1. The authors analyze correlations between Error Magnitude, Predicted Outcome and Confidence; however, before proceeding to the analysis of ERPs, the manuscript could be improved by including a similar analysis of confidence correlations with RPE and SPE, beyond the one relying only on Predicted Outcome (Table 1).

2. Related to point 1, panel 3A should belong to Figure 1, especially if analyses proposed in the first point are included.

3. The authors showed that Error Magnitude decreases on average. However, all ERP analyses were focused on the current trial. If these ERP signals indeed reflect some "updating of internal representations", they should have a relationship with the behavior or neural measures observed on the next trial. It would have been very interesting to see how the processing of feedback (in behavior and ERP responses) relates to performance on the next trial. Such analyses would better support the claims of "updating of internal representations", and the manuscript would considerably improve in impact and quality if they were reported.

4. The precision (variance) of temporal performance plausibly changes over the course of the experiment, and such variance dynamics across the experimental session could have affected the confidence-calibration measure. The authors rightfully show that Confidence Calibration was not related to Average Error Magnitude. The same check should be performed for time-production variance. Moreover, the effects within participants and over the course of the experiment should be considered and presumably included as covariates in the LMM.

5. Specific point from one of the reviewers: The authors mention again on page 24: "We also found that confidence modulated RPE effects on P3b amplitude, such that in initial blocks, where most learning took place, RPE effects were amplified for higher confidence, whereas this effect reversed in later blocks, where RPE effects were present for low, but not high confidence. This shift is intriguing and may indicate a functional change in feedback use as certainty in the response-outcome mapping increases and less about this mapping is learned from feedback, but the effect was not directly predicted and therefore warrants further research and replication." This is the one result where confidence interacts with other behavioral measures, in this case RPE, which is interesting; however, it does so in an unpredicted and counterintuitive way. I wonder whether the authors can in some way get a better understanding of what is going on here? Possibly the paper by Colizoli et al. (2018, Sci Rep.) may be relevant. The authors there show how task difficulty (related to confidence) and error processing are reflected in pupil size responses.

Another reviewer raised concerns about how the different Confidence splits were computed. Although the authors provide an intriguing interpretation in the paragraph referenced above, is it possible that the early and late effects in fact originate from different groups of subjects?

To sum up, extending the analyses with respect to the interaction of confidence and RPE in the modulation of the P3b component would strongly benefit the manuscript.

6. There is no explicit statement of the exact instructions given to participants beyond the following one: "participants were required to predict the feedback they would receive on each trial". The caption of Figure 1B says "scaled relative to the feedback anchors". Therefore, it is not clear what the primary objective of the task was – accurate time production or accurate feedback prediction. Participants could have increased their time-production variance to perform better on feedback prediction. If participants employed that kind of strategy, it could have impacted indices of learning from feedback.

Given the lack of clarity about the instructions provided to participants, it is still unclear which aspect of the task participants focused on in their learning. Error Magnitude decreases over trials; however, do RPE and SPE increase over trials as well?

Reshaping the manuscript:

1. It was evident from all reviews that in many places an explicit link between interpretative statements and performed analyses was far from clear. Below we list a few specific examples:

– "Taken together, our findings provide evidence that feedback evaluation is an active constructive process that is fundamentally affected by an individual's internal representations of their own performance at the time of feedback." I wonder what results the authors refer to here and on what results this statement is based on.

– The authors say "In line with the notion that positive feedback is more informative than error feedback in motor learning, we, like others in the time estimation task (65,66), observed increasing P3b amplitude after more positive feedback, in addition to prediction error effect". It is not clear which outcome the authors are referring to. Is "better than expected" referred to as "positive feedback"? In this case "worse than expected" triggered higher P3b amplitude.

– On page 24 the authors conclude that "Learning was further supported by the adaptive regulation of uncertainty-driven updating via confidence." Although this sounds interesting I do not see the results supporting this conclusion (but maybe I have missed those). I also think this conclusion is rather difficult to follow. The sentence thereafter they say "Specifically, as deviations from the goal were predicted with higher confidence, these more precise outcome predictions enhanced the surprise elicited by a given prediction error. Thus, a notable finding revealed by our simulations and empirical data is that, counterintuitively, agents and participants learned more from feedback when confidence in their predictions had been high." Also here I have difficulty extracting what the authors really mean. What does it mean "surprise elicited by a prediction error"? To me these are two different measures, one signed one unsigned. Further, where is it shown that participants learn more from feedback when confidence in their prediction was high?

– Differences between blocks in the effect of confidence. This result is discussed twice: in the Results (p. 19) and in the Discussion. Only in the latter do the authors acknowledge that their interpretation of the effect is rather speculative. I would also flag that in the Results, as it was neither part of the model predictions nor their design.

2. Performed transformations involving confidence should be clearly explained.

3. Model specification (the formula) should be included in the table legend to aid readability and interpretation as it makes it immediately clear what was defined as a random or fixed effect.

4. On a more conceptual level, the authors rely on the assumption that 'Feedback Prediction' is derived from the efference copy, which carries motor noise only. In light of the goal of the current manuscript, that is an appropriate strategy. However, I think it should be acknowledged that in the employed paradigm part of the behavioral variance may originate from the inherent uncertainty of temporal representations (Balci, 2011). Typically, time-production variance is partitioned into a 'clock' variance and a 'motor' variance. I feel this distinction should be spelled out in the manuscript, and if assumptions are made they should be spelled out more clearly. Moreover, recent work attempted to tease apart the origins of 'Feedback Predictions', indicating that it is unlikely that they originate solely from motor variability (Kononowicz and Van Wassenhove, 2019).

5. The main predictions of the experiment are described in the first paragraph of the Results. But they are not reflected in Figure 1, which is referenced in that paragraph. I would have expected an illustration of the effects of confidence, and instead that only appears on Figure 2. The authors have clear predictions that drive the analysis, but this is not reflected in the flow of the text.

6. Simulations (Figure 2B, D): As far as I can tell, the model does not capture the data in two ways: it fails to address the cross-over effect (which the authors address), and it does not account for the apparent tendency of the data to increase in error on later trials (whereas the model predicts a strict decrease in error over the course of the experiment). The second aspect is not addressed in the Discussion, I think (or I missed it). Do the authors think this is just fatigue, and therefore not a reason to modify the model? Also, panels 2A and 2C do not really match, in the sense that the simulation is done over a much wider range of predicted outcomes. It seems the model parameters were not fine-tuned to the data. Perhaps this is not strictly necessary if the quantitative predictions of the effects of confidence remain unchanged with a narrower range, but it is perhaps worth discussing.

7. "… it is unknown whether reward prediction errors signaled by the FRN rely on predictions based only on previous feedback, or whether they might incorporate ongoing performance monitoring". I think that phrase should be rephrased based on the findings of Miltner et al. (1997), correctly cited in the manuscript, which showed that FRN was responsive to correct and incorrect feedback in time estimation.

8. Relevance of the dart-throwing example: In the task, participants initially had no idea about the length of the to-be-reproduced interval, and instead had to approximate it iteratively. It was not immediately clear to me how this relates to a dart-throw, where the exact target position is known. I think I understand the authors that the unknown target here is "internal" – the specific motor commands that would lead to a bulls-eye are unknown and only iteratively approximated. If that interpretation is correct, I would recommend the authors clarify it explicitly in the paper, to aid the reader to make a connection. Or perhaps I misunderstood it. Either way it would be important to clarify it.

Balci, F., Freestone, D., Simen, P., Desouza, L., Cohen, J. D., and Holmes, P. (2011). Optimal temporal risk assessment. Front Integr Neurosci, 5, 56.

Colizoli, O., De Gee, J. W., Urai, A. E., and Donner, T. H. (2018). Task-evoked pupil responses reflect internal belief states. Scientific reports, 8(1), 1-13.

Correa, C. M., Noorman, S., Jiang, J., Palminteri, S., Cohen, M. X., Lebreton, M., and van Gaal, S. (2018). How the level of reward awareness changes the computational and electrophysiological signatures of reinforcement learning. Journal of Neuroscience, 38(48), 10338-10348.

Kononowicz, T. W., and Van Wassenhove, V. (2019). Evaluation of Self-generated Behavior: Untangling Metacognitive Readout and Error Detection. Journal of cognitive neuroscience, 31(11), 1641-1657.

https://doi.org/10.7554/eLife.62825.sa1

Author response

Essential revisions:

Additional analyses:

1. The authors analyze correlations between Error Magnitude, Predicted Outcome and Confidence; however, before proceeding to the analysis of ERPs, the manuscript could be improved by including a similar analysis of confidence correlations with RPE and SPE, beyond the one relying only on Predicted Outcome (Table 1).

Thank you for this recommendation. We agree that our claim that Confidence reflects the precision of predictions can be tested more directly. As a critical test of our assumption that confidence varies with the precision of the prediction (i.e., SPE), we now analyze Confidence as the dependent variable and test how it relates to the precision of the prediction (sensory prediction error), the precision of performance (error magnitude), and how these relationships change across blocks. Consistent with our theoretical assumption, we find a robust relationship with SPE. We also find that Confidence increases with increasing error magnitude, and more so in later blocks. The latter finding is important because it shows that participants were in fact reporting their confidence in the accuracy of their predictions and not confidence in their performance.

The novel Results section reads as follows: “To test more directly our assumption that Confidence tracks the precision of predictions, we followed up on these findings with a complementary analysis of Confidence as the dependent variable and tested how it relates to the precision of predictions (absolute discrepancy between predicted and actual outcome, see sensory prediction error, SPE below), the precision of performance (error magnitude), and how those change across blocks (Table 2). […] This pattern is thus not consistent with learning. Importantly, whereas error magnitude was robustly related to confidence only in the last two blocks, the precision of predictions was robustly related to confidence throughout.”

As to the relationship with RPE, we agree that this is an important relationship to look at, particularly given the somewhat surprising 3-way interaction on the P3b (point 5). We think that in the context of feedback processing and ERPs this relationship is the most relevant and informative and therefore we now introduce a novel set of analyses that specifically investigates changes in our feedback regressors (RPE, SPE and EM) over time and their interaction with confidence.

The novel section reads as follows: “At the core of our hypothesis and model lies the change in feedback processing as a function of outcome predictions and confidence. […] Accordingly, we predicted that participants’ internal evaluations would modulate feedback processing as indexed by distinct feedback-related potentials in the EEG: the feedback-related negativity (FRN), P3a and P3b.”

2. Related to point 1, panel 3A should belong to Figure 1, especially if analyses proposed in the first point are included.

We agree that foreshadowing the different dimensions along which feedback can be evaluated would help the readers. We have now altered Figure 1 to include the dissociation between performance and prediction and how one being better (or worse) than the other can alter the subjective valence of the outcome. In order to maintain continuity of Figure 1, we have introduced these concepts as part of the cartoon example, rather than in terms of our exact task. Thus, we still include panel 3A in a separate figure in the new subsection we added following your recommendation, where we unpack the different kinds of prediction errors in our task and how they change across blocks and as a function of confidence. We hope that this provides the relevant information at the appropriate locations in the manuscript.

3. The authors showed that Error Magnitude decreases on average. However, all ERP analyses were focused on the current trial. If these ERP signals indeed reflect some "updating of internal representations" they should have a relationship with the behavior or neural measures observed on the next trial. It would've been very interesting to see how the processing of feedback (in behavior and ERP responses) relates to performance on the next trial. These analyses should better support the claims of "updating of internal representations", which would considerably improve in impact and quality if these analyses will be reported.

We agree that it would be great if we could link the ERPs to adjustments in behavior, and we have now added an exploratory analysis linking P3b to trial-by-trial adjustments to feedback. To show such trial-by-trial adjustments, we quantify the degree to which the performance improvement on the current trial relates to the error on the previous trial and demonstrate that this relationship is contingent on P3b amplitude, specifically in the first block when most learning takes place. We set up a model for improvement on the current trial (previous error minus current error) as a function of error magnitude on the previous trial in interaction with P3b amplitude, and their interaction with block. We found the expected 3-way interaction (see Figure S4). When participants’ performance improves the most (in block 1), larger P3b amplitudes to the feedback on the previous trial lead to larger improvements on the current trial. Note, however, that this finding is mostly driven by large errors and that participants are overall likely to perform worse following smaller errors. Responses at the single-trial level are subject to substantial noise – the very prerequisite for our study – likely masking local adjustments in the underlying representations. Thus, we are wary of overinterpreting this result and highlight potential caveats to this analysis in a new section we have now added.

The new section reads: “We next explored whether P3b amplitude is associated with trial-by-trial adjustments. […] While intriguing and in line with previous work linking P3b to trial-by-trial adjustments in behavior, these results should be interpreted with a degree of caution given that the present task is not optimized to test for trial-to-trial adjustments in behavior.”

4. The precision (variance) of temporal performance plausibly changes over the course of the experiment, and such variance dynamics across the experimental session could have affected the confidence-calibration measure. The authors rightfully show that Confidence Calibration was not related to Average Error Magnitude. The same check should be performed for time-production variance. Moreover, the effects within participants and over the course of the experiment should be considered and presumably included as covariates in the LMM.

This is an excellent point. We addressed this concern in two ways, as suggested by the reviewer: First, we computed the correlation between participants’ response variance and their confidence calibration. Second, to capture changes over time, we added participants’ running average response variance as a covariate to the model of error magnitude.

In the manuscript we now write: “Confidence calibration was also not correlated with individual differences in response variance (r = −2.07e−4, 95% CI = [−0.31, 0.31], p = .999), and the interaction of confidence calibration and block was robust to controlling for running average response variance (Supplementary File 2).”

5. Specific point from one of the reviewers: The authors mention again on page 24: "We also found that confidence modulated RPE effects on P3b amplitude, such that in initial blocks, where most learning took place, RPE effects were amplified for higher confidence, whereas this effect reversed in later blocks, where RPE effects were present for low, but not high confidence. This shift is intriguing and may indicate a functional change in feedback use as certainty in the response-outcome mapping increases and less about this mapping is learned from feedback, but the effect was not directly predicted and therefore warrants further research and replication." This is the one result where confidence interacts with other behavioral measures, in this case RPE, which is interesting, however it does so in an unpredicted and counterintuitive way. I wonder whether the authors can in some way get a better understanding of what's going on here?

We agree that this three-way interaction deserves more unpacking, particularly given the relevance of interactions with confidence for our theoretical hypothesis. The new analyses in response to comments 1 and 2 made it clear that changes in RPE effects with Confidence and Block are complicated by the fact that RPE is not systematically related to Confidence, nor the degree to which it reduces error signals relative to error magnitude. The interaction could reflect changes in the components of RPE – Error Magnitude and Predicted Error Magnitude.

To address this point, we now report a complementary analysis that uses predicted error magnitude rather than RPE. This has the advantage that it allows us to test the specific prediction that predictions are weighted by confidence when processing feedback. This is exactly what we find. In particular in the first block, when most learning takes place, the degree to which predicted errors are discounted (as reflected in a decrease in P3b amplitude) depends on Confidence, and higher confidence is overall associated with larger P3b amplitudes. In later blocks, main effects of predicted error magnitude emerge (and we know from prior analyses, that performance is more variable when confidence is low in those late blocks allowing for larger errors to discount), likely underlying the late confidence by RPE interaction in our original analysis.

The novel Results section reads as follows: “Our hypothesis states that the degree to which people rely on their predictions when learning from feedback should vary with their confidence in those predictions. […] In later trials however, when confidence is higher overall, participants discount their predicted errors even when confidence is relatively low.”

The corresponding section in the discussion now reads: “We also found that confidence modulated the degree to which predicted error magnitude reduced P3b amplitude, such that in initial blocks, where most learning took place, predicted error magnitude effects were amplified for higher confidence, whereas this effect diminished in later blocks, where predicted error magnitude effects were present also for low confidence (and performance and prediction errors were attenuated when confidence was high).”

Possibly the paper by Colizoli et al. (2018, Sci Rep.) may be relevant. The authors there show how task difficulty (related to confidence) and error processing are reflected in pupil size responses.

Thank you for pointing out the Colizoli reference to us. This is indeed very relevant and we now cite it in the discussion.

Another reviewer raised concerns about how the different Confidence splits were computed. Although the authors provide an intriguing interpretation in the paragraph referenced above, is it possible that the early and late effects in fact originate from different groups of subjects?

The confidence splits in the original analysis were merely performed to get a sense of the underlying pattern. We have now removed these follow-up analyses, as we follow up on the 3-way interaction as described above. The pattern in these novel analyses renders a between-group effect unlikely. When separating out Confidence mean and z-scored variations for each participant, we find that the within-subject variability drives the effects we observe – reassuring us about our interpretation. However, confidence levels between subjects seem important as well, as results become unstable when they are not included, maybe because confidence changes with learning.

To sum up, extending the analyses with respect to the interaction of confidence and RPE in the modulation of the P3b component would strongly benefit the manuscript.

We agree that the extension of the analyses has benefitted the manuscript and thank the reviewers for their recommendation.

6. There is no explicit statement of the exact instructions given to participants beyond the following one: "participants were required to predict the feedback they would receive on each trial". The caption of Figure 1B says "scaled relative to the feedback anchors". Therefore, it is not clear what the primary objective of the task was – accurate time production or accurate feedback prediction. Participants could have increased their time-production variance to perform better on feedback prediction. If participants employed that kind of strategy, it could have impacted indices of learning from feedback.

Given the lack of clarity about the instructions provided to participants, it is still unclear which aspect of the task participants focused on in their learning. Error Magnitude decreases over trials; however, do RPE and SPE increase over trials as well?

Participants were instructed to learn to produce the correct time interval. Thus, the emphasis was on accurate time production. In addition, they were asked to estimate the error in their response.

We now clarify in the Method: “Participants were instructed that their primary goal in this task is to learn to produce an initially unknown time interval. In addition, they were asked to predict the direction and magnitude of any errors they produced and their confidence in those predictions.”

As now reported in the manuscript, SPE decreased over the course of the experiment just like Error magnitude (primarily from block 1 to 2). Changes in RPE are difficult to interpret given that both better than expected and worse than expected outcomes are still “incorrectly” predicted. SPE is thus clearly the superior indicator. However, we can also look at changes in absolute RPE across blocks and we find that, like Error Magnitude and SPE, it decreases across blocks and primarily from block 1 to 2. Note, however, that these changes are primarily driven by improvements in time estimation performance and diminish substantially once we control for Error Magnitude. We have now added all these analyses in the Feedback section prior to the ERP analyses.

Reshaping the manuscript:

1. It was evident from all reviews that in many places an explicit link between interpretative statements and performed analyses was far from clear. Below we list a few specific examples:

– "Taken together, our findings provide evidence that feedback evaluation is an active constructive process that is fundamentally affected by an individual's internal representations of their own performance at the time of feedback." I wonder what results the authors refer to here and on what results this statement is based on.

We can see how some of this phrasing goes beyond the key findings of our study. We have now simplified the sentence to more distinctly reflect our contributions: “Taken together, our findings provide evidence that feedback evaluation is fundamentally affected by an individual’s internal representations of their own performance at the time of feedback.”

– The authors say "In line with the notion that positive feedback is more informative than error feedback in motor learning, we, like others in the time estimation task (65,66), observed increasing P3b amplitude after more positive feedback, in addition to prediction error effect". It is not clear which outcome the authors are referring to. Is "better than expected" referred to as "positive feedback"? In this case "worse than expected" triggered higher P3b amplitude.

Thank you, we now realize that this was ambiguous. This statement refers to objective performance and we have now changed the statement to make this clear. “In line with the notion that in motor learning feedback about success is more informative than feedback about failure, we, like others in the time estimation task 66,67, observed increasing P3b amplitude after feedback about more accurate performance (i.e. for smaller error magnitude), in addition to prediction error effects.”

– On page 24 the authors conclude that "Learning was further supported by the adaptive regulation of uncertainty-driven updating via confidence." Although this sounds interesting I do not see the results supporting this conclusion (but maybe I have missed those). I also think this conclusion is rather difficult to follow. The sentence thereafter they say "Specifically, as deviations from the goal were predicted with higher confidence, these more precise outcome predictions enhanced the surprise elicited by a given prediction error. Thus, a notable finding revealed by our simulations and empirical data is that, counterintuitively, agents and participants learned more from feedback when confidence in their predictions had been high." Also here I have difficulty extracting what the authors really mean. What does it mean "surprise elicited by a prediction error"? To me these are two different measures, one signed one unsigned. Further, where is it shown that participants learn more from feedback when confidence in their prediction was high?

It seems that there are some misunderstandings here that we have now tried to clarify. To address the unclear link between the conclusion and our findings, we have now extended this section to read: “Learning – in our computational model and in our participants – was further supported by the adaptive regulation of uncertainty-driven updating via confidence. […] Thus, a notable finding revealed by our simulations and empirical data is that, counterintuitively, agents and participants learned more from feedback when confidence in their predictions had been high.”

We believe it is important to dissociate between prediction errors and surprise. In particular, we quantify two types of prediction errors, where only RPE is signed (better or worse than predicted) and SPE (how different than predicted) is not. However, we propose that surprise is even more nuanced than the latter, because it is not only dependent on the absolute mismatch between prediction and outcome, but also on the confidence with which the prediction was made. That is what we simulate using Shannon information. To make this more apparent from the beginning and provide intuitions, we now foreshadow this concept in the introduction: “In the throwing example above, the more confident you are about the exact landing position of the dart, the more surprised you should be when you find that landing position to be different: The more confident you are, the more evidence you have that your internal model linking angles to landing positions is wrong, and the more information you get about how this model is wrong. […] However, this reasoning assumes that your predictions are in fact more precise when you are more confident, i.e., that your confidence is well calibrated (Figure 1B).”
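
To make the intuition concrete, here is a minimal sketch of the Shannon-information notion of surprise described above. The labels and the confidence-to-precision mapping are illustrative assumptions, not the paper's fitted model: we simply assume a Gaussian outcome prediction whose standard deviation shrinks as confidence grows.

```python
# Surprise = -log p(outcome) under the outcome prediction; higher confidence
# means a narrower (more precise) predictive distribution.
from scipy.stats import norm

def shannon_surprise(outcome, predicted, confidence, sd_lo=2.0, sd_hi=0.5):
    """Shannon information of an outcome under a Gaussian prediction."""
    sd = sd_lo + (sd_hi - sd_lo) * confidence  # confidence assumed in [0, 1]
    return -norm.logpdf(outcome, loc=predicted, scale=sd)

# The same absolute prediction error (here, 2 units) is more surprising when
# the prediction was made with high confidence:
print(shannon_surprise(outcome=2.0, predicted=0.0, confidence=0.9))  # ~5.2 nats
print(shannon_surprise(outcome=2.0, predicted=0.0, confidence=0.1))  # ~2.1 nats
```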

We have further altered the relevant sentences in the last paragraph of the introduction to read: “That is to say, an error that could be predicted based on internal knowledge of how an action was executed should not yield a large surprise (P3a) or reward prediction error (FRN) signal in response to an external indicator of the error (feedback). However, any prediction error should be more surprising when predictions were made with higher confidence.”

– Differences between blocks in the effect of confidence. This result is discussed twice: in the Results (p. 19) and in the Discussion. Only in the latter do the authors acknowledge that their interpretation of the effect is rather speculative. I would flag this in the Results as well, as it was neither part of the model predictions nor of the design.

Thank you for pointing out this oversight on our end. We have now entirely removed the interpretation of the three-way interaction from the Results section. As described in our response to point 5 above, we have added extensive additional analyses that provide better insight into the confidence effects across blocks as they relate to our hypothesis, and we now rely on these additional findings for our interpretations instead.

2. Performed transformations involving confidence should be clearly explained.

We assume that this comment refers to the computation of confidence calibration. The reviewers are right that we did not clearly explain this in the Results. Now that we have added the novel group-level analysis of the underlying relationship, we build on it to unpack more clearly how we derive the individual-difference measure before moving on to the section where we test for differences in learning varying with confidence calibration.

“Having demonstrated that, across individuals, confidence reflects the precision of their predictions (via the correlation with SPE), we next quantified this relationship for each participant separately as an index of their confidence calibration. […] We next tested our hypothesis that confidence calibration relates to learning.”
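
As a rough sketch of how such a per-participant index could be computed – the column names and the rank-correlation operationalization are our reconstruction from the text, not the paper's code:

```python
import numpy as np
import pandas as pd

def confidence_calibration(df):
    """Per-participant negative Spearman correlation between confidence and
    SPE: well-calibrated participants are more confident when SPE is small."""
    return (df.groupby("subject")
              .apply(lambda g: -g["confidence"].corr(g["spe"], method="spearman"))
              .rename("calibration"))

# Toy check with two simulated participants: one well calibrated, one random.
rng = np.random.default_rng(1)
spe = rng.exponential(1.0, 200)
sim = pd.DataFrame({
    "subject": np.repeat([1, 2], 100),
    "spe": spe,
    "confidence": np.concatenate([
        -spe[:100] + rng.normal(0, 0.3, 100),  # tracks prediction precision
        rng.normal(0, 1, 100),                 # unrelated to SPE
    ]),
})
print(confidence_calibration(sim))  # participant 1 high, participant 2 near 0
```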

3. The model specification (the formula) should be included in the table legends to aid readability and interpretation, as it makes immediately clear what was defined as a random or fixed effect.

We have now added formulas to all tables.
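
To illustrate what such a legend formula pins down, here is a hypothetical specification in Wilkinson-style formula notation via statsmodels; the outcome and predictor names are placeholders, not the models actually reported in the paper:

```python
# Illustrative only: a formula of this shape makes the fixed effects
# (rpe, confidence, and their interaction) and the random effects
# (per-subject intercept and rpe slope) explicit at a glance.
import statsmodels.formula.api as smf

def fit_example_model(df):
    return smf.mixedlm(
        "frn_amplitude ~ rpe * confidence",  # fixed-effects part
        data=df,
        groups=df["subject"],                # grouping factor: subject
        re_formula="~rpe",                   # random intercept + rpe slope
    ).fit()
```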

4. On a more conceptual level, the authors rely on the assumption that 'Feedback Prediction' is derived from an efference copy, which carries motor noise only. In light of the goal of the current manuscript, that is an appropriate strategy. However, I think it should be acknowledged that in the employed paradigm part of the behavioral variance may originate from the inherent uncertainty of temporal representations (Balci, 2011). Typically, time production variance is partitioned into 'clock' variance and 'motor' variance. I feel this distinction should be spelled out in the manuscript, and if assumptions are made they should be spelled out more clearly. Moreover, recent work has attempted to tease apart the origins of 'Feedback Predictions', indicating that it is unlikely that they originate solely from motor variability (Kononowicz and Van Wassenhove, 2019).

First of all, apologies for having missed these papers. Thank you for pointing them out to us! It’s true, we previously used motor noise as a blanket term to account for multiple sources of variability. This was more of a convenience than a strong assumption. We have now replaced motor noise with response noise throughout the manuscript and briefly mention the two different drivers, citing the mentioned papers. “First, errors signaled by feedback include contributions of response noise, e.g. through variability in the motor system or in the representations of time 24,41. Second, the efference copy of the executed response (or the estimate of what was done) varies in its precision.”
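
A toy generative sketch may make the distinction concrete; all parameter values below are illustrative assumptions, not fitted estimates:

```python
# Response noise decomposed into a 'clock' and a 'motor' component, with the
# efference copy carrying its own, separate readout precision.
import numpy as np

rng = np.random.default_rng(2)
target = 1.5  # target interval in seconds

clock_noise = rng.normal(0, 0.15)  # variability in the representation of time
motor_noise = rng.normal(0, 0.05)  # variability in executing the button press
produced = target + clock_noise + motor_noise  # both feed into response noise

# The agent's estimate of what it just did (the efference copy) varies in its
# precision independently of how the response noise arose:
efference_copy = produced + rng.normal(0, 0.10)
print(f"produced: {produced:.3f} s, estimated: {efference_copy:.3f} s")
```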

5. The main predictions of the experiment are described in the first paragraph of the Results, but they are not reflected in Figure 1, which is referenced in that paragraph. I would have expected an illustration of the effects of confidence, and instead that only appears in Figure 2. The authors have clear predictions that drive the analysis, but this is not reflected in the flow of the text.

Thank you for your comment. We agree that it would help to visualize our predictions in the beginning. We have now revised Figure 1 to clarify key concepts and show our main predictions. We still show the model predictions and the empirical data as tests of these predictions in Figure 2.

6. Simulations (Figure 2B, D): As far as I can tell, the model does not capture the data in two ways: it fails to capture the cross-over effect (which the authors address), and it also does not account for the apparent tendency of the data to show increasing error on later trials (whereas the model predicts a strict decrease in error over the course of the experiment). The second aspect is not addressed in the Discussion, I think (or I missed it). Do the authors think this is just fatigue, and therefore not consider it a reason to modify the model? Also, panels 2A and C do not really match, in the sense that the simulation is done over a much wider range of predicted outcomes. It seems the model parameters were not fine-tuned to the data. Perhaps this is not strictly necessary if the quantitative predictions of the effects of confidence remain unchanged with a narrower range, but it is perhaps worth discussing.

To follow up on this, we plotted the log-transformed running average error magnitude for three bins of confidence calibration. As can be seen in Figure 2—figure supplement 3, our statistical model approximates, but does not properly capture, the shape of the learning curves, which for low confidence calibration seem to saturate within the first 100 trials rather than showing a marked increase towards the end. We now note this in the figure caption. The figure shows the running average log-transformed error magnitude (10 trials) averaged within Confidence Calibration terciles across trials; computing running averages was necessary to denoise the raw data for display. The edited figure caption reads: “Note that the combination of linear and quadratic effects approximates the shape of the learning curves better than a linear effect alone, but predicts an exaggerated uptick in errors towards the end, cf. Figure 2—figure supplement 3.”
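
For reference, a sketch of this denoising step under stated assumptions – a trial-level data frame with our own column names, not the original plotting code:

```python
import numpy as np
import pandas as pd

def tercile_learning_curves(df):
    """10-trial running average of log error magnitude, averaged within
    confidence-calibration terciles (assumed columns: subject, trial,
    error_mag, calibration)."""
    df = df.copy()
    df["log_err"] = np.log(df["error_mag"])
    # 10-trial running average computed within each participant
    df["smooth_err"] = (df.groupby("subject")["log_err"]
                          .transform(lambda s: s.rolling(10, min_periods=1).mean()))
    # Assign each participant to a confidence-calibration tercile
    calib = df.groupby("subject")["calibration"].first()
    terciles = pd.qcut(calib, 3, labels=["low", "mid", "high"])
    df["tercile"] = df["subject"].map(terciles)
    # One smoothed curve per tercile, indexed by trial number
    return df.groupby(["tercile", "trial"])["smooth_err"].mean().unstack(0)
```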

7. "… it is unknown whether reward prediction errors signaled by the FRN rely on predictions based only on previous feedback, or whether they might incorporate ongoing performance monitoring". I think that phrase should be rephrased based on the findings of Miltner et al. (1997), correctly cited in the manuscript, which showed that FRN was responsive to correct and incorrect feedback in time estimation.

We think this is a misunderstanding. As the Reviewer describes, Miltner et al. demonstrated that, on average, error feedback elicits a negative deflection over fronto-central sites (the FRN) relative to correct feedback. They did not consider expectations/predictions at all – whether based on performance history or on performance monitoring. This finding was later built on and extended (Holroyd and Coles, 2002) by showing that the processing of error and correct feedback is sensitive to contextual expectations, that is, reflects a reward prediction error, not just error feedback per se. We extend this line of work further by asking whether, beyond the contextually defined reward prediction error, FRN amplitude is sensitive to response-based outcome predictions derived through internal monitoring. Thus, the key question in our paper is whether error detection feeds into the prediction that underlies the prediction error processing reflected in the FRN – something Miltner et al. neither showed nor tested. To avoid confusion, we have now changed the corresponding sentence to read: “However, it is unknown whether reward prediction errors signaled by the FRN contrast current feedback with predictions based only on previous (external) feedback, or whether they might incorporate ongoing (internal) performance monitoring.”
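
To spell out the contrast at stake, a minimal hypothetical sketch – the function names and the mixing weight are ours, purely for illustration:

```python
# Two candidate reward prediction errors that differ only in whether the
# prediction folds in the trial's response-based outcome prediction.
def rpe_feedback_only(feedback, history_expectation):
    return feedback - history_expectation

def rpe_with_monitoring(feedback, history_expectation, outcome_prediction, w=0.5):
    # w mixes the feedback-history expectation with the internal, response-
    # based outcome prediction; w is a free, purely illustrative weight.
    prediction = (1 - w) * history_expectation + w * outcome_prediction
    return feedback - prediction

# If the FRN tracks the second quantity, a miss that was already anticipated
# via internal monitoring yields a smaller prediction error:
print(rpe_feedback_only(feedback=-0.4, history_expectation=0.0))    # -0.4
print(rpe_with_monitoring(feedback=-0.4, history_expectation=0.0,
                          outcome_prediction=-0.4))                 # -0.2
```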

8. Relevance of the dart-throwing example: In the task, participants initially had no idea about the length of the to-be-reproduced interval, and instead had to approximate it iteratively. It was not immediately clear to me how this relates to a dart throw, where the exact target position is known. I understand the authors to mean that the unknown target here is "internal" – the specific motor commands that would lead to a bullseye are unknown and only iteratively approximated. If that interpretation is correct, I would recommend the authors clarify it explicitly in the paper, to help the reader make the connection. Or perhaps I misunderstood it. Either way, it would be important to clarify.

Thank you, yes, that is exactly right. One can think of the feedback scale just like the target. In each case, the right movement that produces the desired outcome needs to be learned. It doesn’t matter if I know that the target interval is 1.5 s if I don’t have a good sense of what that means for my time production. Similarly, it doesn’t help me to know where the target is if I don’t know how to reach it. Thus, the reviewer is exactly right that what is being iteratively approximated is the correct response. We now unpack the dart-throwing example in more detail throughout the introduction and when we introduce the task. To explicitly tie the relevant concepts together we now write: “In comparison to darts throwing as used in our example, the time estimation task requires a simple response – a button press – such that errors map onto a single axis that defines whether the response was provided too early, timely, or too late and by how much. These errors can be mapped onto a feedback scale and, just as in the darts example where one learns the correct angle and acceleration to hit the bullseye, participants here can learn the target timing interval.”

Balci, F., Freestone, D., Simen, P., Desouza, L., Cohen, J. D., and Holmes, P. (2011). Optimal temporal risk assessment. Frontiers in Integrative Neuroscience, 5, 56.

Colizoli, O., De Gee, J. W., Urai, A. E., and Donner, T. H. (2018). Task-evoked pupil responses reflect internal belief states. Scientific Reports, 8(1), 1-13.

Correa, C. M., Noorman, S., Jiang, J., Palminteri, S., Cohen, M. X., Lebreton, M., and van Gaal, S. (2018). How the level of reward awareness changes the computational and electrophysiological signatures of reinforcement learning. Journal of Neuroscience, 38(48), 10338-10348.

Kononowicz, T. W., and Van Wassenhove, V. (2019). Evaluation of self-generated behavior: Untangling metacognitive readout and error detection. Journal of Cognitive Neuroscience, 31(11), 1641-1657.

https://doi.org/10.7554/eLife.62825.sa2

Article and author information

Author details

  1. Romy Frömer

    1. Humboldt-Universität zu Berlin, Berlin, Germany
    2. Brown University, Providence, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    romy_fromer@brown.edu
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-9468-4014
  2. Matthew R Nassar

    Brown University, Providence, United States
    Contribution
    Formal analysis, Supervision, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-5397-535X
  3. Rasmus Bruckner

    1. Freie Universität Berlin, Berlin, Germany
    2. Max Planck School of Cognition, Leipzig, Germany
    3. International Max Planck Research School LIFE, Berlin, Germany
    Contribution
    Formal analysis, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-3033-6299
  4. Birgit Stürmer

    International Psychoanalytic University, Berlin, Germany
    Contribution
    Writing - review and editing
    Competing interests
    No competing interests declared
  5. Werner Sommer

    Humboldt-Universität zu Berlin, Berlin, Germany
    Contribution
    Resources, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Nick Yeung

    University of Oxford, Oxford, United Kingdom
    Contribution
    Conceptualization, Supervision, Writing - review and editing
    Competing interests
    No competing interests declared

Funding

NIH Office of the Director (R00 AG054732)

  • Matthew R Nassar

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Lena Fliedner and Lara Montau for support in data acquisition and helpful discussions during the setup of the task, Rainer Kniesche for advice on programming the stimulus presentation, Markus Ullsperger, Adrian Haith, Martin Maier, and Rasha Abdel Rahman for valuable discussions, and Mehrdad Jazayeri for valuable feedback on a previous draft. RF is further grateful for the continuous scientific and personal support by her office mates at Humboldt-University, Benthe Kornrumpf and Florian Niefind, who made her life and work a lot more fun and happened to also have inspired the original title of this paper.

Ethics

Human subjects: The study was performed following the guidelines of the ethics committee of the department of Psychology at Humboldt University. Participants gave informed consent to the experiment and were remunerated with course credits or 8€ per hour.

Senior Editor

  1. Richard B Ivry, University of California, Berkeley, United States

Reviewing Editor

  1. Tadeusz Wladyslaw Kononowicz, Cognitive Neuroimaging Unit, CEA DRF/Joliot, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, France

Reviewers

  1. Tadeusz Wladyslaw Kononowicz, Cognitive Neuroimaging Unit, CEA DRF/Joliot, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, France
  2. Simon van Gaal, University of Amsterdam, Netherlands

Publication history

  1. Received: September 4, 2020
  2. Accepted: April 30, 2021
  3. Accepted Manuscript published: April 30, 2021 (version 1)
  4. Version of Record published: May 14, 2021 (version 2)

Copyright

© 2021, Frömer et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
