Statistical context dictates the relationship between feedback-related EEG signals and learning
Abstract
Learning should be adjusted according to the surprise associated with observed outcomes but calibrated according to statistical context. For example, when occasional changepoints are expected, surprising outcomes should be weighted heavily to speed learning. In contrast, when uninformative outliers are expected to occur occasionally, surprising outcomes should be less influential. Here we dissociate surprising outcomes from the degree to which they demand learning using a predictive inference task and computational modeling. We show that the P300, a stimulus-locked electrophysiological response previously associated with adjustments in learning behavior, relates to learning conditionally on the source of surprise. Larger P300 signals predicted greater learning in a changing context, but less learning in a context where surprise was indicative of a one-off outlier (oddball). Our results suggest that the P300 provides a surprise signal that is interpreted by downstream learning processes differentially according to statistical context in order to appropriately calibrate learning across complex environments.
https://doi.org/10.7554/eLife.46975.001
Introduction
People are capable of rationally adjusting the degree to which they incorporate new information into their beliefs about the world (Behrens et al., 2007; Nassar et al., 2010; Cheadle et al., 2014; d'Acremont and Bossaerts, 2016; Diederen et al., 2016). In environments that include discontinuous changes (changepoints), normative learning requires increasing learning when beliefs are uncertain or when observations are most surprising (Nassar et al., 2010; Nassar et al., 2012). Human participants display both of these tendencies, albeit to varying degrees (Nassar et al., 2010; Nassar et al., 2012; Nassar et al., 2016).
A major open question in the learning domain is how the brain achieves such apparent adjustments in learning rate. This question has fueled a number of recent studies that have identified neural correlates of surprise in functional magnetic resonance imaging (fMRI) (McGuire et al., 2014), electroencephalography (EEG) (Jepma et al., 2016; Jepma et al., 2018), and pupil signals (Nassar et al., 2012) that predict subsequent learning behavior. These signals might reflect candidate mechanisms for a general system to adjust learning rate (Behrens et al., 2007; O'Reilly et al., 2013; Iglesias et al., 2013), yet the generality has yet to be established outside of discontinuously changing environments, where surprise and learning are tightly coupled.
The relationship between surprise and learning is complex and depends critically on the overarching statistical context. We refer to learning as the degree to which an observed prediction error promotes measurable behavioral updating. While changing environments require increased learning in the face of surprising information, stable environments with outliers (‘oddballs’) dictate less learning from surprising information (d'Acremont and Bossaerts, 2016). People are capable of this type of robust learning rate adjustment that deemphasizes surprising information (Cheadle et al., 2014; d'Acremont and Bossaerts, 2016; Summerfield and Tsetsos, 2015), yet the learning signals measured under such conditions do not correspond directly to those observed in changing environments. Most notably, a number of candidate learning signals measured through fMRI do not reflect learning rate when considering a broader set of statistical contexts (d'Acremont and Bossaerts, 2016).
However, prior studies on EEG correlates of learning seem to favor the idea that a late, stimulus-locked positivity referred to as the P300 tracks learning in a broader range of statistical contexts. While the central parietal component of the P300 (P3b) has long been known to reflect surprise (Mars et al., 2008; Kolossa et al., 2015; Kopp et al., 2016; Seer et al., 2016; Kolossa et al., 2012), recent work suggests it relates to learning (Fischer and Ullsperger, 2013) even after controlling for the degree of surprise in changing environments (Jepma et al., 2016; Jepma et al., 2018). In a stationary environment where integration of sequential samples is required to make a subsequent decision, a late posterior positivity, reminiscent of the P300, predicts the degree to which a particular sample influences the subsequent decision (Wyart et al., 2012). Interestingly, within this particular task, more surprising outcomes tended to exert less influence on decisions (Cheadle et al., 2014; Summerfield and Tsetsos, 2015), suggesting that this late positivity might provide a general learning or updating signal, irrespective of statistical context. This idea would be in line with a prominent theory of P3b function, which emphasizes its role in updating context representations, sometimes defined in terms of items stored in working memory (Donchin, 1981; Donchin and Coles, 1988; Polich, 2003; Polich, 2007).
Here we tested the idea that the P3b provides a general learning signal that is independent of the statistical context. In particular, we measured learning behavior using a modified predictive inference task and a normative learning model and examined how learning behavior and surprise related to evoked potentials measured through EEG. We found that people are capable of contextually adjusting learning in response to surprise: they tended to learn more from surprising outcomes when those outcomes were indicative of changepoints, but learned less from surprising outcomes when those outcomes were indicative of an oddball. Outcome evoked potentials reminiscent of a parietal P300 were related to surprising events irrespective of context. The magnitude of this P300 response on a given trial positively predicted learning in the presence of changepoints, but negatively predicted learning in the presence of oddballs. This conditional relationship between the P300 signal and learning was most pronounced in individuals who showed the largest behavioral adjustments in the two conditions. Furthermore, early P300 signaling predicted subsequent learning even when controlling for variability in learning behavior that could be explained by the best behavioral model.
Taken together these findings suggest that the P300 does not naively reflect increased behavioral updating, but may play a role in adaptively increasing or decreasing learning in response to surprising information, depending on the statistical context.
Results
We used EEG to measure electrophysiological signatures of feedback processing while participants performed a modified predictive inference task (Nassar et al., 2010) designed to dissociate surprise from learning. Predictions were made in the context of a video game that required participants to place a shield at a location on a circle in order to block cannonballs that would be fired from a cannon located at the center of the circle (Figure 1A). Surprise and learning were manipulated independently using two different task conditions. In the oddball condition, the aim of the cannon drifted slowly from one trial to the next (Figure 1B, dotted line) and cannonball locations were distributed around the point of cannon aim (Figure 1B, green points near the dotted line) or, occasionally and unpredictably, uniformly distributed around the circle (oddballs; see green point on trial 11 of Figure 1B for example). In the changepoint condition, the cannon aim remained constant for an unpredictable duration, and was then re-aimed at a new location on the circle at random (changepoints; Figure 1C, dotted line). Cannonball locations were always distributed around the point of cannon aim in this condition (Figure 1C, green points).
Behavior of human participants and normative model
In both conditions, participants were instructed to place a shield on each trial in order to maximize the chances of blocking the upcoming cannonball (Figure 1B and C, orange line). However, behavior differed qualitatively in these two conditions, which can be observed clearly in the example participant data in Figure 1. In particular, shield placements were not updated in response to extreme outcomes in the oddball condition (oddballs; Figure 1B) but were updated dramatically in response to extreme outcomes in the changepoint condition (changepoints; Figure 1C).
To quantitatively analyze the differences between the two task conditions, we extended a previously developed normative learning model (Nassar et al., 2010; Nassar et al., 2016). The model approximates optimal inference using an error-driven learning rule by adjusting learning from trial to trial according to two latent variables. The first latent variable tracks the probability with which the most recent outcome was generated from an unexpected generative process (oddball probability in Figure 1D; changepoint probability in Figure 1E), whereas the second latent variable tracks the model’s uncertainty about the true cannon aim (Figure 1D and E; uncertainty). Critically, the model stipulates that surprising events in the oddball condition, which are tracked through the model’s estimate of oddball probability, should reduce learning, as oddballs are unrelated to future cannonball locations (d'Acremont and Bossaerts, 2016). In contrast, the model stipulates that surprising events in the changepoint condition, which are tracked through the model’s estimate of changepoint probability, should amplify learning, as changepoints render prior cannonballs (and thus prior beliefs) irrelevant to the problem of predicting future ones (Adams and MacKay, 2007; Wilson et al., 2010). Qualitatively, behavior from the example participant seems to follow these prescriptions, with adjustments in shield position fairly minimal on trials that include a spike in oddball probability (Figure 1B,D), but fairly large on trials that include a spike in changepoint probability (Figure 1C,E).
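The qualitative logic of this model can be sketched as a delta rule whose learning rate depends on the two latent variables. The sketch below is illustrative only: the function names, the scaling of uncertainty to the range 0 to 1, and the specific way surprise probability and uncertainty are combined are assumptions introduced here, not the fitted model described in the text. It also ignores the circular task space for simplicity.

```python
def learning_rate(surprise_prob, rel_uncertainty, condition):
    """Illustrative learning rate in [0, 1] (assumed functional forms).

    surprise_prob   : probability that the outcome came from the unexpected
                      process (changepoint or oddball probability)
    rel_uncertainty : uncertainty about the cannon aim, assumed scaled to [0, 1]
    condition       : "changepoint" or "oddball"
    """
    if condition == "changepoint":
        # Assumed form: surprise pushes the learning rate toward 1.
        return surprise_prob + rel_uncertainty - surprise_prob * rel_uncertainty
    if condition == "oddball":
        # Assumed form: surprise pushes the learning rate toward 0.
        return rel_uncertainty * (1.0 - surprise_prob)
    raise ValueError("condition must be 'changepoint' or 'oddball'")


def update_belief(belief, outcome, lr):
    """Error-driven update: move the belief a fraction lr toward the outcome."""
    prediction_error = outcome - belief
    return belief + lr * prediction_error


if __name__ == "__main__":
    for cond in ("changepoint", "oddball"):
        low = learning_rate(0.1, 0.2, cond)
        high = learning_rate(0.9, 0.2, cond)
        print(cond, "low surprise:", round(low, 3), "high surprise:", round(high, 3))
```

Under these assumed forms, the same increase in surprise probability raises the learning rate in the changepoint condition and lowers it in the oddball condition, which is the dissociation the task is designed to produce.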
The normative model also makes quantitative prescriptions for how learning should be adjusted according to surprise differentially in the changepoint and oddball conditions. The surprise of a given outcome can be measured crudely through the degree to which a cannonball location differed from that which was predicted (e.g., the shield position). Larger absolute prediction errors indicate a higher degree of surprise, and higher oddball or changepoint probabilities depending on the task condition. Learning in this task can be measured through the degree to which a participant adjusts the shield position in response to a given prediction error (Nassar et al., 2010), and a fixed rate of learning would correspond to a straight line mapping each prediction error onto a corresponding shield update, where the slope of the line can be thought of as the learning rate (Figure 2C, gray lines). The normative learning model does not prescribe a fixed learning rate across all levels of surprise; instead it prescribes higher learning rates for more surprising outcomes in the changepoint condition (Figure 2C, orange) and lower learning rates for more surprising outcomes in the oddball condition (Figure 2C, blue).
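As a concrete illustration of the learning measure described above, a single-trial learning rate can be computed as the ratio of the shield update to the prediction error, with both wrapped around the circle. This is a minimal sketch: the function names and the use of degrees are assumptions made here for illustration.

```python
import math


def circ_diff(a, b):
    """Signed angular difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def single_trial_learning_rate(shield_prev, shield_next, outcome):
    """Fraction of the prediction error that was translated into an update.

    shield_prev : shield position on the current trial (the prediction)
    shield_next : shield position on the following trial
    outcome     : cannonball location on the current trial
    """
    prediction_error = circ_diff(outcome, shield_prev)
    update = circ_diff(shield_next, shield_prev)
    if prediction_error == 0:
        return math.nan  # learning rate is undefined without an error
    return update / prediction_error


if __name__ == "__main__":
    # Shield at 350 deg, cannonball lands at 10 deg, shield moved to 0 deg.
    print(single_trial_learning_rate(350.0, 0.0, 10.0))
```

A fixed learning rate corresponds to this ratio being constant across prediction errors; the normative model instead predicts that it rises with absolute prediction error in the changepoint condition and falls in the oddball condition.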
Participants adjusted learning behavior in accordance with normative predictions, albeit with considerable heterogeneity across trials and participants. Shield updating behavior and corresponding prediction errors for an example participant reveal the basic trend predicted by the normative model, although exact updates were variable from one trial to the next (Figure 2D). To summarize the degree to which updating behavior of individual subjects was contingent on key task variables, we constructed a linear regression model that described trial-by-trial updates in terms of prediction errors as well as key task variables thought to modulate the degree to which prediction errors are translated into updates (Figure 2E), including condition (changepoint versus oddball block), surprise (as measured by changepoint or oddball probability estimates from the normative model), and their multiplicative interaction (capturing the degree to which learning is increased for surprising outcomes in the changepoint context, but decreased for surprising outcomes in the oddball context). As expected, prediction error coefficients were positive, capturing a tendency for participants to update shield position toward the most recent cannonball position (Figure 2F, red; mean/SEM beta = 0.58/0.04, t = 14.4, dof = 38, p=6×10^{−17}). Furthermore, participants systematically adjusted the degree to which they did so according to condition (Figure 2F, green; mean/SEM beta = 0.08/0.02, t = 3.1, dof = 38, p=0.003), but not significantly according to surprise (Figure 2F, blue; mean/SEM beta = 0.03/0.03, t = 0.8, dof = 38, p=0.43). Critically, surprise robustly impacted learning in opposite directions for the two conditions, as indicated by the interaction between surprise and condition (Figure 2F, orange; mean/SEM beta = 0.71/0.07, t = 9.9, dof = 38, p=4×10^{−12}). Specifically, positive coefficients indicate that sensitivity to prediction errors was increased for surprising outcomes in the changepoint condition and decreased for surprising outcomes in the oddball condition (Figure 2—figure supplement 1), as predicted by the normative model.
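The structure of such a regression can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function name, the inclusion of an intercept, ordinary least squares as the estimator, and the coding of condition as +1 (changepoint) and −1 (oddball) are all choices made here for the example.

```python
import numpy as np

TERM_NAMES = ("intercept", "pe", "pe_x_condition", "pe_x_surprise",
              "pe_x_surprise_x_condition")


def fit_update_regression(pe, condition, surprise, update):
    """Regress trial-by-trial updates onto prediction error and its modulators.

    pe        : signed prediction error on each trial
    condition : assumed coding, +1 for changepoint and -1 for oddball blocks
    surprise  : model-derived changepoint or oddball probability per trial
    update    : signed shield update on each trial
    """
    pe = np.asarray(pe, dtype=float)
    condition = np.asarray(condition, dtype=float)
    surprise = np.asarray(surprise, dtype=float)
    update = np.asarray(update, dtype=float)

    # Each modulator enters as a multiplicative interaction with prediction error.
    design = np.column_stack([
        np.ones_like(pe),
        pe,
        pe * condition,
        pe * surprise,
        pe * surprise * condition,
    ])
    beta, *_ = np.linalg.lstsq(design, update, rcond=None)
    return dict(zip(TERM_NAMES, beta))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 400
    pe = rng.normal(0, 30, n)
    cond = rng.choice([-1.0, 1.0], n)
    surprise = rng.uniform(0, 1, n)
    # Synthetic updates with a known surprise-by-condition interaction.
    upd = 0.5 * pe + 0.6 * pe * surprise * cond + rng.normal(0, 2, n)
    print(fit_update_regression(pe, cond, surprise, upd))
```

With this coding, a positive coefficient on the three-way term means that surprise increases sensitivity to prediction errors in the changepoint condition and decreases it in the oddball condition.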
Electrophysiological signatures of feedback processing
We took a data-driven approach to identify electrophysiological signatures of feedback processing. First we regressed feedback-locked EEG data collected simultaneously with task performance onto an explanatory matrix that included separate binary variables reflecting changepoint and oddball trials (as opposed to neutral trials that did not involve a rare event), amongst other terms (Figure 3A, left). Spatiotemporal maps for changepoint and oddball coefficients were combined to create a surprise contrast (changepoint + oddball) and a learning contrast (changepoint – oddball) for each subject. Contrasts were aggregated across subjects to create a map of t-statistics (Figure 3A, right), and spatiotemporal clusters of electrode/timepoints exceeding a cluster-forming threshold were tested against a permutation distribution of cluster mass to identify spatially and temporally organized fluctuations in voltage that related to task variables.
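The logic of the contrast and cluster-mass procedure can be sketched in a simplified form. The example below is an assumption-laden illustration rather than the pipeline used in the study: it works on a single time axis (ignoring electrode adjacency), uses sign-flipping of subject-level contrasts to build the permutation distribution, and the threshold, function names, and synthetic data are all hypothetical.

```python
import numpy as np


def t_map(x):
    """One-sample t-statistic across subjects (axis 0) at each time point."""
    n = x.shape[0]
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))


def max_cluster_mass(t, threshold):
    """Largest summed |t| over contiguous same-sign supra-threshold points."""
    best, current, prev_sign = 0.0, 0.0, 0
    for value in t:
        sign = int(np.sign(value)) if abs(value) > threshold else 0
        if sign != 0 and sign == prev_sign:
            current += abs(value)      # extend the running cluster
        elif sign != 0:
            current = abs(value)       # start a new cluster
        else:
            current = 0.0              # below threshold: no cluster
        prev_sign = sign
        best = max(best, current)
    return best


def cluster_permutation_p(contrast, threshold=2.0, n_perm=500, seed=0):
    """Compare the observed max cluster mass with a sign-flip null distribution."""
    rng = np.random.default_rng(seed)
    observed = max_cluster_mass(t_map(contrast), threshold)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Randomly flip the sign of each subject's whole contrast map.
        signs = rng.choice([-1.0, 1.0], size=(contrast.shape[0], 1))
        null[i] = max_cluster_mass(t_map(contrast * signs), threshold)
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_subjects, n_times = 39, 50
    changepoint = rng.normal(0, 1, (n_subjects, n_times))
    oddball = rng.normal(0, 1, (n_subjects, n_times))
    changepoint[:, 20:30] += 1.0   # synthetic effect shared by both trial types
    oddball[:, 20:30] += 1.0
    surprise = changepoint + oddball   # surprise contrast
    learning = changepoint - oddball   # learning contrast
    print("surprise:", cluster_permutation_p(surprise))
    print("learning:", cluster_permutation_p(learning))
```

Because the synthetic effect is shared by changepoint and oddball coefficients, it adds in the surprise contrast and cancels in the learning contrast, mirroring the distinction the two contrasts are meant to draw.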
When applied to the surprise contrast, this procedure yielded a number of significant clusters distributed across electrodes and timepoints (Figure 3C). One cluster of positive coefficients spanning 300–700 ms after onset of the cannonball location was of particular interest, given its consistency with the timing and direction of the canonical P300 response. Examining the spatial distribution of coefficients during this period revealed an early frontocentral locus of positive coefficients (350 ms; Figure 3B, left) that moves posterior and eventually dissipates over the subsequent 350 ms (Figure 3B, middle and right). The positive surprise contrast within the cluster included positive contributions of both changepoint and oddball trials (Figure 3—figure supplement 1).
The time course of positive surprise coefficients (peak t-statistic at 390 ms) is consistent with a P300 response locked to the outcome (cannonball location). Furthermore, the dynamics with which the positivity moves from anterior to posterior central electrodes are reminiscent of a transition from P3a to P3b signaling, with the spatial profile of early time points (e.g., 350 ms) consistent with the frontal P3a and the spatial profile of later time points more consistent with the parietal P3b (e.g., 500 ms). Average outcome-locked event-related potentials in a frontocentral electrode (FCz) reveal a positive deflection from 300 to 500 ms (Figure 3D, black). This deflection is enhanced on both changepoint and oddball trials (Figure 3D,E, orange and blue), reminiscent of the P3a component, also referred to as the novelty P300. Posterior electrode (Pz) event-related potentials (ERPs) reveal a later and longer lasting positive deflection in response to a new outcome (Figure 3F, black). This positive deflection is enhanced on both changepoint and oddball trials (Figure 3F,G, orange and blue), reminiscent of the P3b, or updating component of the P300. Since the spatial and temporal profiles of this cluster were consistent with what has been referred to in previous literature as the P300, we will refer to it as a P300 signal.
In contrast to the EEG signature of surprise, which included a robust and extended P300 response, no signals were identified as reflecting the learning contrast (changepoint – oddball) after correcting for multiple comparisons using a permutation test (Figure 3—figure supplement 2).
Behavioral relevance of the P300
Competing theories posit different functional roles for the signal underlying the P300. In particular, some theories suggest that the P300 reflects a general surprise signal, whereas others attribute a more specific role in accumulating information, for example about the current state of the world. To test how the P300 may relate to learning behavior in our task we extracted trial-to-trial measures of these components by taking the dot product of the cluster t-map and each single trial ERP (Figure 4A; Collins and Frank, 2018). The dot product indexes the degree to which a single trial ERP displays the profile of a given spatiotemporal cluster, thereby allowing us to test the degree to which the measured signal on any given trial might relate to behavior. We then examined how trial-to-trial behavioral updates in shield position related to these single trial EEG signal strengths using a regression model similar to that employed in the behavioral analysis (Figure 4B). The regression model included two key terms to characterize the influence of 1) the multiplicative interaction of prediction error with the EEG signal strength, and 2) the interaction between prediction error, EEG signal strength and condition. The first EEG-based term provided a measure of the relationship between learning and the P300 that was independent of condition, and thus allowed us to test the prediction that the P300 reflects a direct learning signal (Figure 4C, left). The second EEG-based term provided a measure of the relationship between learning and the P300 that depended on condition (conditional learning), and thus allowed us to test the prediction that any learning impact of the P300 is bidirectionally sensitive to the source of surprise (Figure 4C, right).
Indeed, participant learning behavior systematically related to trial-by-trial measures of the P300, but only in a manner that depended critically on task condition. Direct learning coefficients from the model revealed that the P300 signal was not systematically related to learning in the same manner across both conditions (Figure 4D, left; mean/SEM = −0.014/0.01, dof = 38, t = −1.5, p=0.14). In contrast, conditional learning coefficients tended to be positive (mean/SEM = 0.09/0.02, dof = 38, t = 4.7, p=3×10^{−5}), albeit with considerable heterogeneity across participants (Figure 4D, right). Individual differences in the degree to which the P300 conditionally predicted learning were related to individual differences in the degree to which participant updates were conditionally responsive to surprise, as measured by our behavioral regression model (Figure 4E). In particular, the participants who showed the greatest behavioral modulation of learning according to surprise and condition (e.g., the behavioral effect that one would expect to be mediated by a conditional learning signal) tended to also have the highest conditional learning coefficients indicating the degree to which P300 conditionally predicted learning (Figure 4E; r = 0.54, p=3×10^{−4}). Learning rate predictions derived from the EEG-based regression model show that higher P300 signal strength predicts more learning in the changepoint condition (Figure 4E, orange), but less learning in the oddball condition (Figure 4E, blue), and this prediction was validated in a follow-up analysis that separately modeled the effect of EEG on learning in the changepoint and oddball conditions (Figure 4—figure supplement 1). Thus, there was a systematic relationship between P300 and learning, but that relationship was oppositely modulated by the task condition and hence the inferred source of surprise.
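The single-trial projection and the two EEG-based regression terms described above can be sketched as follows. Array shapes, function names, and the +1/−1 coding of condition are assumptions made for illustration; the t-map is assumed to be zero outside the cluster of interest.

```python
import numpy as np


def single_trial_signal(trials, cluster_tmap):
    """Project each single-trial ERP onto a spatiotemporal cluster t-map.

    trials       : array of shape (n_trials, n_electrodes, n_times)
    cluster_tmap : array of shape (n_electrodes, n_times), assumed to be
                   zero everywhere outside the cluster
    Returns one dot-product value per trial.
    """
    trials = np.asarray(trials, dtype=float)
    cluster_tmap = np.asarray(cluster_tmap, dtype=float)
    # Sum over electrodes and time points of (ERP * t-map) for every trial.
    return np.tensordot(trials, cluster_tmap, axes=([1, 2], [0, 1]))


def eeg_learning_terms(pe, eeg, condition):
    """Build the direct and conditional learning regressors.

    direct      : prediction error x EEG signal strength
    conditional : prediction error x EEG signal strength x condition
                  (condition assumed coded +1 changepoint, -1 oddball)
    """
    pe = np.asarray(pe, dtype=float)
    eeg = np.asarray(eeg, dtype=float)
    condition = np.asarray(condition, dtype=float)
    return np.column_stack([pe * eeg, pe * eeg * condition])


if __name__ == "__main__":
    tmap = np.array([[1.0, 0.0, 2.0],
                     [0.0, 0.0, -1.0]])
    trials = np.stack([np.ones((2, 3)),
                       np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])])
    strength = single_trial_signal(trials, tmap)
    print(strength)
    print(eeg_learning_terms([2.0, -1.0], strength, [1.0, -1.0]))
```

In a full analysis these two columns would be added to a regression of updates alongside the behavioral terms, so that the first coefficient captures a condition-independent relationship and the second captures one that reverses sign between conditions.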
The relationship between the P300 and participant learning behavior persisted even after controlling for all known sources of variability in learning behavior. To establish this, we conducted a similar analysis to that described above to test whether the P300 displayed direct or conditional learning relationships to behavior, but also: 1) included an additional predictor term that could account for variability in updating captured by the behavioral regression model (Figure 2E), and 2) conducted the analysis in sliding windows of time from 300 to 700 ms after outcome presentation (Figure 5B). These modifications allowed us to test if and when the P300 could explain variance in updating behavior that was unrelated to observable task features (Figure 5A). Consistent with our previous analysis, conditional learning coefficients were positive at early time points within the P300 signal window (peak time = 318 ms, peak mean/SEM coefficient = 0.04/0.01) and the extent of contiguous positive coefficients was more extreme than would be expected due to chance (permutation test for cluster mass: mass = 52.8, p=0.01). We also observed in later time points that direct learning coefficients tended to be negative across participants, and this negativity was significant after correcting for multiple comparisons (permutation test for cluster mass: mass = 61.8, p=0.01), however this result should be interpreted cautiously given that we did not see a systematic direct learning effect before including behavioral predictions in the regression model (Figure 4D). Taken together, our results demonstrate that the magnitude of the P300 signal predicted learning increases in changepoint contexts and learning decreases in oddball contexts, and did so beyond what could otherwise be predicted with behavioral modeling alone.
Discussion
The brain receives a steady stream of sensory inputs, but these inputs differ dramatically from moment to moment in the degree to which they should affect ongoing inferences about the world. People and animals do not treat each datum in this stream as the same, and instead tend to rely more heavily on some pieces of information than others. Identifying the mechanisms through which these adjustments occur could be an important step toward understanding why learning occurs more rapidly in some domains or for some people, yet our understanding of these mechanisms has been heavily conditioned on specific statistical contexts, namely changing environments in which the degree to which one should learn from information is closely coupled to the surprise associated with it. Here we examined how relationships between learning and a specific brain signal, the P300 evoked EEG potential, depend on the statistical context that they are measured in.
We show that the P300 relates systematically to learning, but that the direction of this relationship depends critically on the statistical context. In a context where surprising events indicated changepoints (Figure 1C,E) and participants learned more from surprising information (Figure 2), larger P300 responses predicted increased learning (Figure 4). In contrast, in a context where surprising events indicated oddballs (Figure 1B,D) and participants deemphasized surprising information (Figure 2), larger P300 responses predicted reduced learning (Figure 4). These context-dependent predictive relationships explained variance in learning beyond what could be captured through computational modeling of behavior alone (Figure 5), suggesting that the P300 signal may be involved in adjustments of learning rate, but does so by mediating the subjective response to surprise, rather than translating surprise into a conditionally appropriate learning signal.
Neural representations of surprise and updating
A key question that has motivated a number of recent studies is how the brain represents surprise differently from the belief updating it sometimes prescribes. Under most conditions, the degree of surprise is tightly linked to the update that is required. However, recent fMRI studies have exploited cued updating paradigms (O'Reilly et al., 2013), irrelevant stimulus dimensions (Schwartenbeck et al., 2016; Nour et al., 2018), and complementary statistical contexts (d'Acremont and Bossaerts, 2016) in order to tease apart neural representations of surprise and updating. While there are trends that seem to generalize across task boundaries (for example, dorsal anterior cingulate cortex (dACC) reflecting updating in cued updating and irrelevant stimulus dimension paradigms; O'Reilly et al., 2013; Nour et al., 2018), there is also a good deal of inconsistency across different tasks in terms of the roles of specific signals. For example, even though BOLD responses in dACC were identified as reflecting updating in two studies, they were shown to represent surprise in another (d'Acremont and Bossaerts, 2016) and manipulations of statistical context failed to reveal any brain regions that provide a pure updating signal (d'Acremont and Bossaerts, 2016).
One possible explanation for this discrepancy is that the component processes of updating and non-updating might overlap in some specific paradigms. For example, the oddball outcomes that led to reduced learning in our paradigm and that of d’Acremont and Bossaerts were dissimilar to all previous outcomes and indistinguishable on other feature dimensions (in contrast to O'Reilly et al., 2013). Thus, while these outcomes do not contain information pertinent to ongoing beliefs about future outcomes, they did contain information critical for perception, namely that prior expectations should not be used to bias their perceptual representations (Krishnamurthy et al., 2017). Interestingly, recent work has suggested that people dynamically adjust the degree to which percepts are biased using systems, including the pupil-linked arousal system, that are closely linked to the systems implicated in adjusting learning rate (Nassar et al., 2012; Krishnamurthy et al., 2017; Nieuwenhuis et al., 2011; Vazey et al., 2018; Urai et al., 2017; de Gee et al., 2017). Thus, one possible explanation for the inconsistency in previous studies attempting to dissociate surprise from updating is that these studies have differed in the degree to which they inadvertently manipulated systems for controlling perceptual biases.
As in the previous fMRI study relying on statistical context to dissociate learning from surprise (d'Acremont and Bossaerts, 2016), our EEG results revealed a large number of signals related to surprise and no signals that convincingly reflected learning rate in a context-independent manner. This comes as somewhat of a surprise given previous work identifying EEG signals analogous to a late P300 component reflecting surprise, predicting learning and influence on choice even in paradigms where this influence was unrelated to surprise (Cheadle et al., 2014; Jepma et al., 2016; Jepma et al., 2018; Fischer and Ullsperger, 2013; Wyart et al., 2012). In line with previous work from fMRI studies, we interpret the differences between our results and what might have been predicted based on previous work as stemming from the unique strategy we employed for dissociating learning from surprise through the use of different statistical contexts.
Mechanisms of learning rate adjustment
Our results, particularly when taken in the context of previous studies examining how the brain adjusts learning in accordance with surprise, constrain possible models of learning rate adjustment in the brain. We show that the updating P300 signal, which positively predicts learning in changing environments (Figure 4E), also negatively predicts learning in a context with infrequent statistical outliers (Figure 4E). Thus, in a most basic sense, our results suggest that the P300 signal reflects an early contribution to learning rate adjustment, and that this signal is untangled according to statistical context at some downstream stage of processing. The lack of robust ERP correlates of direct learning signals (Figure 3—figure supplement 2) suggests that this downstream process does not have a task-locked electrophysiological signature.
One potential mechanism for learning rate adjustment that fits well with these constraints is the notion that adjustments in learning might be implemented through flexible replacement of state representations (Collins and Koechlin, 2012; Collins and Frank, 2013; Wilson et al., 2014). Learning rate adjustment is adaptive in changing environments because it can effectively partition data relevant to the current predictive context from data that are no longer relevant to prediction (Adams and MacKay, 2007; Wilson et al., 2010). One possible implementation of this partitioning would be to change the active state representations that serve as the substrate for contextual associations. Recent work has identified signals in OFC, a region implicated in representations of latent states (Schuck et al., 2016), that change more rapidly during periods of rapid learning (Nassar et al., 2019). If this is indeed the implementation through which learning rate adjustments occur, observed learning rate signals might actually signal the need to adjust the representation of the latent state.
Interestingly, replacement of the active latent state, or partitioning of data more generally, might also be an effective way to implement the decreased learning observed in response to surprising observations in the oddball condition of our task. In the case of an oddball, one strategy would be to recognize the oddball as having been generated by an alternative causal process (e.g., oddball distribution) and to attribute learning to a latent representation of this process (Gershman and Niv, 2010). Under such conditions, implementation would require a surprise signal that reflects the relevance of this oddball latent state. After the new observation is attributed to the oddball context, the system would require a transition back into the original ‘non-oddball’ state in order to make a prediction that is unaffected by the most recent oddball outcome. The more effectively surprise is recognized and responded to through latent state changes (e.g., the stronger the surprise signal) the more effectively this implementation would partition an oddball observation from ongoing beliefs about the standard generative process, and therefore the smaller learning rates would be. Thus, one mechanistic interpretation of the P300 results might be that it is providing a partitioning signal that results in transitions in the internal latent state representation, which can either increase or decrease learning depending on the statistical context.
Implications for theories of P300 function
We took a data-driven approach to identifying signals that related to surprising outcomes in different statistical contexts. The primary signal that we identified, however, was similar in timing (Figure 3D and F), location (Figure 3B), and sensitivity to surprise (Figure 3E and G) to those previously reported for the P300 (Kopp et al., 2016; Kolossa et al., 2012). The topography of our spatiotemporal cluster changed over time from frontocentral to centroparietal, consistent with inclusion of both an early frontocentral P3a component as well as a later centroparietal P3b component. Thus, although our methods were agnostic to detection of a specific signal, we interpret our results in the context of the larger literature relating to P300 signaling.
Our findings are consistent with a number of studies that have demonstrated the P300 is related to surprise (Jepma et al., 2016; Donchin, 1981; Wessel, 2018; Garrido et al., 2016), but extend them to reveal how the P300 differentially relates to learning in different contexts. Our results are inconsistent with standard interpretations of the context updating theory of the P300 in which context is defined as a working memory for an observable stimulus (Donchin, 1981; Donchin and Coles, 1988; Polich, 2003; Polich, 2007), as under this definition a larger P300 should always lead to more learning. However, if the updated ‘contexts’ were defined in terms of the latent states described above, the predictions of the context updating theory would indeed match our results. Thus, our results can constrain potential interpretations of the context updating theory, although they do not falsify the theory altogether. Nor do our results directly conflict with other prominent theories of P300 signaling including the idea that central parietal positivity might reflect accumulated evidence for a particular decision or course of action (Kelly and O'Connell, 2013; O'Connell et al., 2012), or anticipate the need to inhibit responding (Wessel, 2018; Wessel and Aron, 2017), as both of these theories could be framed in terms of the latent states above (e.g., the accumulated evidence for a change in latent state or the need to inhibit responding until the appropriate latent state is loaded). Thus, our results do not arbitrate between these theories, but do require expansion of their interpretation (to include latent variables involved in the generation of outcomes) and also highlight their implications for learning when mechanistic interpretations are refined and applied to our task and data.
Confirming our proposed mechanistic interpretation of these results in terms of latent states would require future studies more closely relating P300 signals to purported state representations (Nassar et al., 2019). Furthermore, given that our study relied completely on computational modeling and correlations with behavior, our results raise important questions as to whether the observed associations could be manipulated directly pharmacologically or through biofeedback paradigms. Thus, our work provides new insight into the underlying mechanisms of learning rate adjustment and the role of the P300 in this process, but leaves many unanswered questions to be addressed in future research.
Materials and methods
Participants
Participants were recruited from the Brown University community: n = 39, 22 female, mean age = 20.2 (SD = 3.1, range = 18–36). Data from all 39 participants were included for both behavioral and EEG analysis. Sample size was selected based on a recent EEG study using a similar task and focusing on the P300 (Jepma et al., 2016). All human subject procedures were approved by the Brown University Institutional Review Board and conducted in agreement with the Declaration of Helsinki.
Cannon task
Participants performed a modified predictive inference task that is available on GitHub (Bruckner, 2019; copy archived at https://github.com/elifesciencespublications/AdaptiveLearning) and was programmed in Matlab (Mathworks, Natick, MA, USA), using the Psychtoolbox2 (http://psychtoolbox.org/) package. The task was based on predictive inference tasks in which participants are asked to predict the next in a series of outcomes (Nassar et al., 2010; Nassar et al., 2012; Nassar et al., 2016), but differed from previous such tasks in the following ways: (1) the outcomes were generated from both changepoint and oddball processes to dissociate learning from surprise, (2) information necessary for performance evaluation was not available at the time of outcome so that signals related to belief updating could be dissociated from valenced performance evaluation signals, (3) the task space was circular, and (4) the generative process was cast in terms of a cannon shooting cannonballs.
Participants were instructed to place a shield at some position along a circle subtending 5 degrees of visual angle in order to maximize the chances of catching a cannonball that would be shot on that trial (Figure 1A). During an instructional training period, the generative process that gave rise to cannonball locations was made explicit to participants. During this phase, participants were shown a cannon in the center of the screen. On each trial, a cannonball would be ‘shot’ from that cannon with some angular variability (Von Mises distributed ‘Noise’, concentration = 10 degrees). A key manipulation in our design was how the aim of the cannon evolved from one trial to the next. The cannon would either (1) remain stationary on the majority of trials and re-aim to a random angle with an average hazard rate of 0.14 (changepoint condition) or (2) change position slightly from one trial to the next according to a Von Mises distributed random walk with mean zero and concentration 30 degrees (oddball condition). In the changepoint condition, all cannonballs were displayed as originating at the cannon in the center of the circle, whereas in the oddball condition a small fraction (0.14) of trials were oddballs, in which the cannonball location was sampled uniformly across the entire circle and the cannonball appeared without a trajectory.
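The two generative processes described above can be sketched in a few lines of Python. This is an illustrative simulation, not the task code: the function names are hypothetical, angles are in degrees, and Gaussian draws wrapped onto the circle stand in for the Von Mises distributions, with the stated values (10 and 30) treated as standard deviations in degrees. Both choices are assumptions made for the sketch.

```python
import random


def wrap(angle):
    """Wrap an angle in degrees onto the circle [0, 360)."""
    return angle % 360.0


def simulate_changepoint(n_trials, hazard=0.14, noise_sd=10.0, rng=None):
    """Cannon holds its aim and re-aims uniformly with probability `hazard`."""
    rng = rng or random.Random()
    aim = rng.uniform(0.0, 360.0)
    trials = []
    for t in range(n_trials):
        changed = t > 0 and rng.random() < hazard
        if changed:
            aim = rng.uniform(0.0, 360.0)  # changepoint: new random aim
        outcome = wrap(aim + rng.gauss(0.0, noise_sd))
        trials.append({"aim": aim, "outcome": outcome, "surprise": changed})
    return trials


def simulate_oddball(n_trials, hazard=0.14, noise_sd=10.0, drift_sd=30.0, rng=None):
    """Cannon aim drifts every trial; a fraction `hazard` of outcomes are oddballs."""
    rng = rng or random.Random()
    aim = rng.uniform(0.0, 360.0)
    trials = []
    for t in range(n_trials):
        if t > 0:
            aim = wrap(aim + rng.gauss(0.0, drift_sd))  # random-walk drift
        oddball = rng.random() < hazard
        if oddball:
            outcome = rng.uniform(0.0, 360.0)  # uninformative about the aim
        else:
            outcome = wrap(aim + rng.gauss(0.0, noise_sd))
        trials.append({"aim": aim, "outcome": outcome, "surprise": oddball})
    return trials
```

In the changepoint sketch a surprising outcome signals a lasting change in the aim, whereas in the oddball sketch it carries no information about the aim, which is the dissociation the design relies on.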
Each experimental condition was preceded by (1) instructions that included an explicit description of the generative process for that condition and (2) a set of training trials with the cannon visible such that the generative process could be observed directly. These instructions and training trials were designed to ensure that all participants were aware of the statistical context, so as to improve our ability to detect EEG signals that related to it.
After completing the instructional training, in which the generative process was fully observable, participants were asked to perform the same basic task without being able to see the cannon. In this experimental phase participants were forced to use knowledge of the generative structure gained during training, along with the sequence of prior cannonball locations, in order to infer the aim of the cannon and to inform shield placement. When making a prediction, tick marks on the circle indicated the locations of the cannonball and shield placement from the previous trial. Participants completed four blocks of 60 trials for each task condition (changepoint and oddball) in an order randomized across participants. The 240 experimental trials for each condition always followed the instructional training period for that condition in order to minimize ambiguity over which generative structure was giving rise to the experimental outcomes.
On each trial of the experimental task, participants would adjust the position of the shield through key presses (starting at the shield position from the previous trial) until they were satisfied with its location (Figure 1A; prediction phase). After participants locked in their prediction (through a key press) there was a 500 ms delay and then the cannonball location was revealed for 500 ms (Figure 1A; outcome phase). The cannonball then disappeared for 1000 ms before it reappeared, along with a full depiction of the participant's shield (Figure 1A; shield phase). The shield was always centered on the position indicated by the participant during the prediction phase, but differed in size from one trial to the next in a random and unpredictable fashion that ensured subjects could not predict whether they would successfully ‘catch’ the cannonball during the outcome phase. Thus, the outcome phase provided all information necessary to update beliefs about the cannon aim, but did not contain sufficient information to determine whether the cannonball would be successfully blocked on the trial. In addition to trial feedback provided during the shield phase, participants were also provided information about their performance at the end of each block that included the fraction of cannonballs that were blocked. Participants were paid an incentive bonus at task completion that was based on the number of cannonballs that were blocked.
Computational model
Optimal inference in the changepoint condition would require considering all possible durations of stable cannon position (Adams and MacKay, 2007; Wilson et al., 2010). It can, however, be approximated by collapsing the mixture of predictive distributions expected to arise from this optimal solution into a single Gaussian distribution. This approximation captures the posterior probability distribution over cannon locations, achieves near-optimal inference, reduces to an error-driven learning rule in which learning rate is adjusted from moment to moment according to environmental statistics, and provides a detailed account of human behavior (Nassar et al., 2010; Nassar et al., 2016). Similarly, the ideal observer for the oddball generative process would require tracking the predictive distributions and posterior probabilities associated with each possible sequence of oddball/non-oddball trials that could have preceded the time step of interest. As in the changepoint condition, this algorithm can be simplified by approximating the set of all possible predictive distributions with a single Gaussian distribution, leading to an error-driven learning rule in which learning rate is adjusted dynamically from trial to trial, allowing us to derive normative prescriptions for learning for both conditions (see supplementary material for full derivation).
While the normative model for the changepoint condition has been described elsewhere (Nassar et al., 2016), the analogous model for the oddball condition has not, and thus we describe the normative account of oddball learning in full detail. In order to minimize the differences between experienced and modeled latent variables, we formulated our model in terms of the prediction errors made by participants on each trial (rather than those that would have been made by the model) (Nassar et al., 2016). On each trial of the oddball condition, the normative model: (1) updated its representation of uncertainty, (2) observed a prediction error and computed the probability that the prediction error reflects an oddball, (3) computed the normative learning rate by combining uncertainty (step 1) and oddball probability (step 2), (4) adjusted its prediction about cannon position according to the learning rate and prediction error.
Relative uncertainty, which reflects the fraction of uncertainty about an upcoming cannonball location that is due to imperfect knowledge of the cannon aim and is analogous to the Kalman gain, was updated on each trial according to the most recent observation (which should decrease uncertainty about cannon position) and the expected drift in the aim of the cannon occurring between trials (which should increase uncertainty about cannon position). Given that relative uncertainty is expressed as a fraction of total uncertainty, it is useful to think of the numerator of the fraction, or the estimation uncertainty over possible cannon aims, which is the variance on a gaussian mixture distribution and is updated as follows:
where $\Omega_t$ is the probability that an oddball occurred on trial $t$, $\sigma_N^2$ reflects the variance on the distribution of cannonball locations around the true cannon aim (noise), $\tau_t$ reflects the relative uncertainty on trial $t$, $\delta_t$ is the prediction error made in predicting the outcome on trial $t$, and $\sigma_{drift}^2$ reflects the degree to which the cannon position drifts from one trial to the next. The first two terms in the model reflect the oddball and non-oddball contributions to the updated uncertainty, the third term reflects uncertainty resulting from the difference between predictions for trial $t+1$ conditioned on an oddball or non-oddball having occurred on trial $t$, and the last term reflects uncertainty resulting from the expected drift of the cannon position between trials. Relative uncertainty for trial $t+1$ is then updated as the fraction of uncertainty about the upcoming outcome that is attributable to imprecise knowledge of the true cannon position, rather than to noise in the distribution of cannonball locations around that position:
The updated relative uncertainty, along with assumed knowledge of the overall noise and hazard rate, were used to calibrate the oddball probability associated with each new prediction error:
where $H$ is the average hazard of an oddball (0.14) and $\delta_{t+1}$ is the new prediction error, and the second term in the denominator reflects the probability density on a normal distribution centered on the predicted location and with variance derived from relative uncertainty. The model’s prediction about cannon aim was then updated according to a fraction of the prediction error $\delta_{t+1}$ with the exact fraction, or learning rate, determined according to the updated uncertainty and oddball probability:
Note that relative uncertainty ($\tau_{t+1}$) contributes positively to the learning rate, whereas oddball probability ($\Omega_{t+1}$) reduces the learning that would otherwise be dictated by the current level of uncertainty.
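The four model steps can be sketched in Python as follows. This is a reconstruction from the verbal description in this section and from the learning rate given in Appendix 1 ($\alpha_t = \tau_t - \Omega_t \tau_t$), not the authors' code. The function names are hypothetical, and the exact form of each uncertainty term is an assumption derived from the definition of relative uncertainty as $\tau = \sigma_\mu^2 / (\sigma_\mu^2 + \sigma_N^2)$. Angles are in degrees, the oddball likelihood is taken to be uniform over the circle, and a normal density approximates the circular predictive distribution.

```python
import math

CIRCLE = 360.0  # degrees; oddball outcomes are assumed uniform over the circle


def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance `var`."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def update_relative_uncertainty(tau, omega, delta, noise_var, drift_var):
    """Step 1: four-term estimation uncertainty, then its share of total variance."""
    est_var = (
        omega * noise_var * tau / (1.0 - tau)       # oddball: prior uncertainty kept
        + (1.0 - omega) * noise_var * tau           # non-oddball: uncertainty reduced
        + omega * (1.0 - omega) * (delta * tau) ** 2  # disagreement between the two
        + drift_var                                 # expected drift between trials
    )
    return est_var / (est_var + noise_var)


def oddball_probability(delta, tau, noise_var, hazard=0.14):
    """Step 2: posterior probability that the prediction error is an oddball."""
    total_var = noise_var / (1.0 - tau)  # predictive variance implied by tau
    odd = hazard / CIRCLE
    non = (1.0 - hazard) * normal_pdf(delta, total_var)
    return odd / (odd + non)


def learning_rate(tau, omega):
    """Step 3: uncertainty raises learning, oddball probability lowers it."""
    return tau - omega * tau


def oddball_model_step(tau_t, omega_t, delta_t, delta_next,
                       noise_var, drift_var, hazard=0.14):
    """Run steps 1 to 4 for one trial; returns (tau, omega, alpha, belief update)."""
    tau_next = update_relative_uncertainty(tau_t, omega_t, delta_t, noise_var, drift_var)
    omega_next = oddball_probability(delta_next, tau_next, noise_var, hazard)
    alpha_next = learning_rate(tau_next, omega_next)
    return tau_next, omega_next, alpha_next, alpha_next * delta_next  # step 4
```

Under these assumptions a 90 degree prediction error with a noise standard deviation of 10 degrees is attributed almost entirely to an oddball, so the resulting belief update is close to zero.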
Behavioral analysis
Two key behavioral measures were extracted from each trial. First, the prediction error on a trial was defined as the circular distance between the cannonball location and the shield position for that trial. Second, the update on a given trial was defined as the circular distance between the shield position on that trial and the shield position on the subsequent trial (i.e., the updated shield position). In order to better understand the computational factors governing adjustments in shield position, we fit updates with a linear model that included an intercept term to model overall biases in learning along with a prediction error term to capture general tendencies to adjust the shield towards the most recent cannonball location. The model also included additional terms to model how the influence of recent cannonball locations changed dynamically according to task context. These terms included: (1) prediction error times uncertainty (to model how much more participants updated shield position under conditions of uncertainty – as assessed by the computational model), (2) prediction error times surprise (where surprise was indexed by changepoint probability or oddball probability from the computational model, depending on the context), (3) prediction error times surprise times condition (where condition was +1 for changepoint blocks and −1 for oddball blocks), (4) prediction error times block (a categorical variable indicating whether the shield ‘blocked’ the most recent cannonball). The model was fit to updates from each participant individually, excluding updates in response to outcomes immediately following oddballs, as such updates are difficult to attribute to learning (e.g., movements toward the recent outcome) rather than memory (returning to a pre-oddball location). Nonetheless, inclusion of these trials in our behavioral models does not change the primary results reported here.
Unlike standard regression in which the error distribution is assumed to be normal, our model imposed a circular (von Mises) distribution of errors around the predicted update. Maximum a posteriori coefficients for each individual subject were estimated using the fmincon optimization tool in Matlab (Mathworks, Natick, MA, USA) and t-tests were performed on the regression coefficients across participants to test for significant contributions of each term to update behavior. Weak zero-centered Gaussian priors were included to regularize coefficients of interest. In the purely behavioral analysis the width of priors over coefficients on standardized predictors was set to 5. In the analyses that included EEG predictors, EEG-based predictors were regularized using a zero-centered Gaussian prior with a standard deviation of 0.1 (as compared to a standard deviation of 1 for the predicted learning rates from the behavioral model), making the regularization for the EEG terms stronger than that on the competing behavioral prediction term by a factor of 10 (thereby allowing preferential explanation of shared variance by the other terms in the model).
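The two behavioral measures and the regression terms listed above can be illustrated with a short Python sketch. The helper names are hypothetical, angles are assumed to be in degrees, and the standardization of predictors mentioned elsewhere in the Methods is omitted for brevity.

```python
def circ_diff(a, b):
    """Signed circular distance a - b in degrees, mapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def prediction_errors_and_updates(outcomes, shields):
    """Per-trial prediction errors and trial-to-trial shield updates."""
    pes = [circ_diff(o, s) for o, s in zip(outcomes, shields)]
    updates = [circ_diff(nxt, cur) for cur, nxt in zip(shields, shields[1:])]
    return pes, updates


def design_row(pe, uncertainty, surprise, condition, blocked):
    """One row of the explanatory matrix for the update regression.

    condition is +1 for changepoint blocks and -1 for oddball blocks;
    blocked is 1 if the shield caught the most recent cannonball, else 0.
    """
    return [
        1.0,                        # intercept: overall update bias
        pe,                         # fixed tendency to move toward the outcome
        pe * uncertainty,           # (1) uncertainty-driven learning
        pe * surprise,              # (2) surprise-driven learning
        pe * surprise * condition,  # (3) context-dependent effect of surprise
        pe * blocked,               # (4) effect of having blocked the cannonball
    ]
```

For example, a shield at 350 degrees and a cannonball at 10 degrees give a prediction error of +20 degrees rather than −340, which is why the circular distance is used instead of a plain difference.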
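A minimal Python sketch of the kind of objective such a fit minimizes is given below. It is not the authors' Matlab code: the function names are hypothetical, the von Mises concentration `kappa` is treated as a fixed input (how the original fit handled it is not stated here), and the Bessel function is computed with a simple power series that is adequate only for moderate `kappa`.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, by power series."""
    total, term, q = 1.0, 1.0, x * x / 4.0
    for k in range(1, terms):
        term *= q / (k * k)  # term_k = q**k / (k!)**2
        total += term
    return total


def von_mises_logpdf(angle_deg, mean_deg, kappa):
    """Log density of a von Mises distribution, with angles given in degrees."""
    diff = math.radians(angle_deg - mean_deg)
    return kappa * math.cos(diff) - math.log(2.0 * math.pi * bessel_i0(kappa))


def neg_log_posterior(coefs, design, updates_deg, kappa, prior_sds):
    """Negative log posterior: von Mises errors plus zero-centered Gaussian priors."""
    log_lik = 0.0
    for row, update in zip(design, updates_deg):
        predicted = sum(b * x for b, x in zip(coefs, row))
        log_lik += von_mises_logpdf(update, predicted, kappa)
    log_prior = sum(-0.5 * (b / sd) ** 2 for b, sd in zip(coefs, prior_sds))
    return -(log_lik + log_prior)
```

Passing this function to a numerical optimizer, with `prior_sds` set to 5 for behavioral terms or 0.1 for EEG terms, would yield maximum a posteriori coefficients analogous to those described above.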
EEG acquisition
EEG was recorded from a 64-channel Synamps2 system (0.1–100 Hz bandpass; 500 Hz sampling rate). Data were collected using CPz as a reference channel and re-referenced to the grand mean for analysis. Continuous EEG data were epoched with respect to the outcome presentation for each trial. Preprocessing was done manually in Matlab (Mathworks, Natick, MA, USA) using the EEGLAB toolbox (https://sccn.ucsd.edu/eeglab/index.php) as described previously (Collins and Frank, 2018) and included the following steps: (1) epoching and alignment to outcome onset, (2) epoch rejection by inspection, (3) channel removal and interpolation by inspection, (4) bandpass filtering [0.1–50 Hz], (5) removal of blink and eye movement components using ICA.
EEG analysis
EEG data for individual participants were analyzed using a mass univariate approach. Specifically, the trial series EEG data for a given participant, channel, and time relative to outcome onset were regressed onto an explanatory matrix that included the following explanatory variables: (1) intercept, (2) changepoint, (3) oddball, (4) condition, (5) block. Explanatory variables 2 and 3 were binary variables marking trials in which a surprising event occurred (i.e., changepoint or oddball), whereas variable 4 reflected the overall task context (i.e., whether oddballs or changepoints were present in the current statistical context), and variable 5 conveyed whether the participant successfully ‘blocked’ the cannonball on each trial. Surprise and learning contrasts were created as the sum and difference of the changepoint and oddball coefficients, respectively. T-statistics were computed across subjects to assess the consistency of contrasts at each electrode and timepoint.
T-statistic maps were thresholded (cluster-forming threshold of p=0.001, two-tailed) and spatiotemporal clusters were identified as temporally and/or spatially contiguous signals that shared a common sign of effect and exceeded the cluster-forming threshold. Cluster mass was computed as the average absolute t-statistic within a cluster times its size (number of electrode-timepoints contained within it). Cluster mass for each spatiotemporal cluster was compared to a permutation distribution for cluster mass generated using sign flipping to correct for multiple comparisons (Nichols and Holmes, 2002).
Trial-to-trial EEG analyses were conducted by computing the dot product of the t-statistic map for a given spatiotemporal cluster and the ERP measured on a given trial. The resulting measure of EEG signal strength was then z-scored across all trials and included in a behavioral regression model to explain trial-to-trial updating behavior. As in the behavioral analyses, trial-to-trial updates were regressed onto an explanatory matrix that included intercept and prediction error terms to capture updating biases and static tendencies to update toward recent cannonball locations. In addition, the EEG-informed linear model included (1) the interaction between the EEG signal strength computed above and prediction error (direct learning), and (2) the three-way interaction between EEG signal strength, prediction error, and condition (conditional learning). Positive direct learning coefficients indicated an unconditional increase in learning for trials in which EEG signal strength was greater, whereas positive conditional learning coefficients indicated a positive relationship between EEG signal strength and learning in the changepoint condition but a negative relationship between EEG signal strength and learning in the oddball condition.
In order to test if and when the EEG-updating relationships could explain behavior beyond the best descriptions afforded by our behavioral model, we applied the method described above with two changes. First, we computed EEG signal strength in sliding windows of time by masking the unthresholded surprise t-statistic map with a sliding 40 ms window and taking the dot product of the masked t-map and masked ERPs from each trial. Second, we included signal strength calculated in this way in a regression to explain updating behavior as described above, except that we also included the predicted update from our purely behavioral model as a competing explanatory variable. Direct learning and conditional learning coefficients for each participant were smoothed in time by convolving them with a Gaussian kernel (std = 8 ms). T-statistics were computed for each sliding window and coefficient across participants, and temporal clusters of extreme t-statistics were formed using a cluster-forming threshold of p<0.05. Cluster mass was computed as described above and compared to a permutation distribution created with iterative sign-flipping (10000 permutations) to estimate a cluster-corrected p-value.
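As a small illustration of the contrast step for a single electrode and timepoint, the following Python sketch forms the surprise and learning contrasts from per-participant regression coefficients and computes a one-sample t-statistic across participants. Function names are hypothetical and the per-participant regression itself is assumed to have been run already.

```python
import math
import statistics


def contrasts(beta_changepoint, beta_oddball):
    """Surprise = sum, learning = difference of the two surprise-event coefficients."""
    surprise = beta_changepoint + beta_oddball
    learning = beta_changepoint - beta_oddball
    return surprise, learning


def one_sample_t(values):
    """t-statistic of the mean of `values` against zero (needs at least 2 values)."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def group_contrast_t(betas_changepoint, betas_oddball):
    """Across-participant t-statistics for the surprise and learning contrasts."""
    pairs = [contrasts(cp, ob) for cp, ob in zip(betas_changepoint, betas_oddball)]
    t_surprise = one_sample_t([s for s, _ in pairs])
    t_learning = one_sample_t([l for _, l in pairs])
    return t_surprise, t_learning
```

A signal that responds equally to changepoints and oddballs loads on the surprise contrast only, while a signal that responds in opposite directions loads on the learning contrast.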
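The cluster-mass permutation logic can be sketched in Python for the simplified case of a single channel, where clusters are contiguous runs in time only. This simplification, the function names, and the choice to pass the cluster-forming threshold as a critical t value are all assumptions of the sketch. The analysis described above clusters over electrodes and time.

```python
import math
import random
import statistics


def one_sample_t(values):
    """t-statistic of the mean of `values` against zero."""
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(len(values)))


def clusters_1d(tvals, thresh):
    """Same-sign runs with |t| > thresh, as (start, end_exclusive, mass) tuples."""
    found, start, sign = [], None, 0
    for i, t in enumerate(list(tvals) + [0.0]):  # sentinel closes a final run
        s = (t > thresh) - (t < -thresh)
        if s != sign:
            if sign != 0:
                # mass = mean |t| times size = sum of |t| within the run
                found.append((start, i, sum(abs(v) for v in tvals[start:i])))
            start, sign = i, s
    return found


def max_cluster_mass_null(data, thresh, n_perm, rng):
    """Null distribution of the largest cluster mass under random sign flips.

    data[subject][time] holds one contrast value per subject and timepoint;
    each permutation flips the sign of whole subjects.
    """
    n_time = len(data[0])
    null = []
    for _ in range(n_perm):
        flipped = []
        for row in data:
            s = rng.choice((-1.0, 1.0))
            flipped.append([v * s for v in row])
        tvals = [one_sample_t([row[j] for row in flipped]) for j in range(n_time)]
        masses = [m for _, _, m in clusters_1d(tvals, thresh)]
        null.append(max(masses, default=0.0))
    return null


def cluster_p_value(mass, null):
    """Share of permutations whose largest mass is at least the observed mass."""
    return (1 + sum(m >= mass for m in null)) / (1 + len(null))
```

Comparing each observed cluster against the distribution of the largest permuted cluster is what provides the correction for multiple comparisons.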
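The single-trial measure and the EEG-informed regressors can be sketched in Python as follows. The names are hypothetical. The sketch assumes the t-statistic map has been flattened to a list with entries outside the cluster set to zero, that each trial's ERP is flattened in the same order, and that z-scoring uses the population standard deviation.

```python
import statistics


def signal_strength(tmap, erp):
    """Dot product of a (flattened) cluster t-map and one trial's (flattened) ERP."""
    return sum(t * e for t, e in zip(tmap, erp))


def zscore(values):
    """Standardize values across trials (undefined if all values are equal)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def eeg_design_row(pe, strength_z, condition):
    """Regressors for one trial; condition is +1 (changepoint) or -1 (oddball)."""
    return [
        1.0,                          # intercept: updating bias
        pe,                           # static tendency to update toward outcome
        pe * strength_z,              # direct learning
        pe * strength_z * condition,  # conditional learning
    ]
```

A positive coefficient on the last term means that a stronger signal goes with more updating in changepoint blocks and less updating in oddball blocks.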
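The sliding-window step can be sketched in Python under a few stated assumptions: the window advances one sample at a time, the 500 Hz sampling rate from the acquisition section applies (so 40 ms is 20 samples and 8 ms is 4 samples), and the Gaussian kernel is truncated at three standard deviations and renormalized at the edges. Function names are hypothetical.

```python
import math


def window_strengths(tmap, erp, srate_hz=500, window_ms=40):
    """Signal strength per sliding window; tmap and erp are [channel][sample] lists."""
    width = int(round(window_ms * srate_hz / 1000.0))
    n_samples = len(tmap[0])
    strengths = []
    for start in range(0, n_samples - width + 1):
        total = 0.0
        for t_chan, e_chan in zip(tmap, erp):
            for i in range(start, start + width):
                total += t_chan[i] * e_chan[i]  # masked dot product
        strengths.append(total)
    return strengths


def gaussian_smooth(values, sd_samples):
    """Smooth a coefficient time course with a truncated Gaussian kernel."""
    half = int(math.ceil(3 * sd_samples))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sd_samples) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):  # renormalize where the kernel runs off the edge
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed
```

Each window's strength would then be z-scored and entered into the regression in place of the cluster-based strength, giving one pair of learning coefficients per window.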
Appendix 1
Computational modeling
Derivation of normative learning model for changepoint condition
The generative process in the changepoint condition can be defined in terms of the following sampling statements:
Here we assume that the hazard rate is known, given the extensive training participants receive under visible cannon conditions; however, the more general case where hazard rate is unknown is similar and has been addressed elsewhere (Wilson et al., 2010). Note that the generative hazard rate differs from the empirical hazard rate (0.14) because the first trial of each block was considered to be a changepoint.
The inference problem is thus to infer the current cannon aim based on the sequence of observed cannonballs, which has a recursive solution according to the conditional probability:
that can be derived based on the above-defined random variables. The joint distribution in the numerator can be expanded and partitioned according to the generative graph as follows:
and the marginal distribution in the denominator is obtained from
which together thus yields
Thus, on each trial it is possible to infer the probability distribution over possible cannon locations given all previous cannonball locations using a recursive rule that updates a prior inference from the previous trial $p(C_{t-1} \mid B_{1:t-1})$ according to the appropriate transition function $p(C_t \mid S_t, C_{t-1})$ and the likelihood of the observed cannonball location from this trial $p(B_t \mid C_t)$. Expanding the summation over the changepoint variable reveals that $p(C_t \mid B_{1:t})$ is a mixture with two components:
Where the first component reflects the ‘changepoint’ predictive distribution and the latter reflects the ‘non changepoint’ predictive distribution. In principle, we could maintain this mixture distribution and add a new mixture component with each observation (Appendix 1—figure 2, top), producing exact parametric inference at a computational cost that scales linearly with time (Adams and MacKay, 2007). However, instead, we approximate the mixture distribution by replacing it with a Gaussian distribution that shares the same mean and variance as the full mixture (Appendix 1—figure 2, bottom). The mean of the updated mixture distribution can be approximated with the weighted sum of the two mixture components:
In the case of a changepoint, the mean of the predictive distribution over cannon aim $(\hat{c}_{t+1} \mid s_t = 1)$ is equivalent to the most recent outcome $B_t$. The mean of the non-changepoint distribution is a convex combination of the prior predictive mean $\hat{c}_t$ and the most recent outcome $B_t$. Together, this yields an error-driven learning rule, in which the influence of unpredicted outcomes, or learning rate, is adjusted on each trial (Nassar et al., 2010):
where $\hat{c}$ is the mean of the predictive distribution over cannon locations, $\delta_t := (B_t - \hat{c}_t)$ is the prediction error, and $\alpha_t := \tau_t + \Omega_t - \tau_t \Omega_t$ is a learning rate that depends on changepoint probability ($\Omega_t$, the integral over $C_t$ in the first term of Equation 5) and relative uncertainty $\tau_t$. The contribution of $\Omega_t$ to the learning rate emerges because, as shown above, the mean is equal to the probability-weighted average of the means of the individual mixture components (changepoint and non-changepoint terms in Equation 6a) and thus $\Omega_t$ controls the weights of these two terms. The mean of the changepoint component is centered at the most recent outcome (corresponding to an $\alpha_t$ of 1) and the mean of the non-changepoint component is an uncertainty ($\tau_t$) weighted average of the most recent outcome and the mean of the prior predictive distribution over cannon location $\hat{c}_t$ (corresponding to an $\alpha_t$ of $\tau_t$). A full derivation of the $\tau_t$ term has been provided in previous work (Nassar et al., 2012).
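As a worked check of how the two mixture means combine into this learning rate, the following algebra uses only the quantities defined above and substitutes $B_t = \hat{c}_t + \delta_t$. It is offered as an illustrative sketch rather than a reproduction of the numbered equations:

$$
\begin{aligned}
\hat{c}_{t+1} &= \Omega_t B_t + (1 - \Omega_t)\left[\hat{c}_t + \tau_t (B_t - \hat{c}_t)\right] \\
&= \Omega_t (\hat{c}_t + \delta_t) + (1 - \Omega_t)(\hat{c}_t + \tau_t \delta_t) \\
&= \hat{c}_t + (\tau_t + \Omega_t - \tau_t \Omega_t)\,\delta_t .
\end{aligned}
$$

With $\Omega_t = 1$ the belief jumps to the most recent outcome ($\alpha_t = 1$), and with $\Omega_t = 0$ the update reduces to $\tau_t \delta_t$.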
Derivation of normative learning model for oddball condition
The generative process in the oddball condition can be defined in terms of the following sampling statements:
Once again, the inference problem is thus to infer the current cannon aim based on the sequence of observed cannonballs, which has a recursive solution that is similar to the solution of the changepoint condition described above:
Note that this equation differs from the changepoint solution only in that the likelihood of ${B}_{t}$ is now conditional on ${S}_{t}$ (now reflecting a binary oddball variable) whereas the ${C}_{t}$ is now conditionally independent of ${S}_{t}$. Once again, the equation can be expanded to reveal a mixture of two components:
Where, once again, the first component reflects the ‘oddball’ predictive distribution and the latter reflects the ‘non oddball’ predictive distribution. As described for the changepoint condition, we could maintain this mixture distribution and propagate new mixture components with each observation (Appendix 1—figure 2, top). However, instead, we approximate the mixture distribution by replacing it with a Gaussian distribution that shares the same mean and variance as the full mixture (Appendix 1—figure 2, bottom). The mean of the updated mixture distribution can be written as the weighted sum of the two mixture components:
In the case of an oddball, the mean of the predictive distribution over cannon aim $(\hat{c}_{t+1} \mid s_t = 1)$ is equivalent to the prior predictive mean, as the likelihood distribution $(B_t \mid s_t = 1)$ is uniformly distributed for oddball trials. The mean of the non-oddball distribution is, as in the non-changepoint case in the changepoint condition, a weighted average of the prior predictive mean and the most recent outcome. This combination can also be expressed in terms of an error-driven learning rule:
Note that the learning rate equation $\alpha_t := (\tau_t - \Omega_t \tau_t)$ differs from that in the changepoint condition in that it does not include a $(+\Omega_t)$ term. This is because the oddball mixture component is centered on the previous belief (as oddballs don’t affect the position of the cannon), and thus higher levels of oddball probability $\Omega$ push learning rates towards zero, rather than towards one.
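The same substitution $B_t = \hat{c}_t + \delta_t$ gives a worked check for the oddball case, again as an illustrative sketch built only from the quantities defined in this appendix:

$$
\begin{aligned}
\hat{c}_{t+1} &= \Omega_t \hat{c}_t + (1 - \Omega_t)\left[\hat{c}_t + \tau_t (B_t - \hat{c}_t)\right] \\
&= \hat{c}_t + (\tau_t - \Omega_t \tau_t)\,\delta_t .
\end{aligned}
$$

As a numerical illustration with hypothetical values $\tau_t = 0.5$ and $\Omega_t = 0.8$, the changepoint rule gives $\alpha_t = 0.5 + 0.8 - 0.4 = 0.9$, whereas the oddball rule gives $\alpha_t = 0.5 - 0.4 = 0.1$. The same level of surprise therefore prescribes near-complete updating in one context and near-zero updating in the other.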
Data availability
All analysis code has been made available on GitHub (https://github.com/learningmemoryanddecisionlab/NassarBrucknerFrank_eLife_2019.git; copy archived at https://github.com/elifesciencespublications/NassarBrucknerFrank_eLife_2019). All behavioral and EEG data have been made available on Dryad (https://doi.org/10.5061/dryad.570pf8n).

References

Learning the value of information in an uncertain world. Nature Neuroscience 10:1214–1221. https://doi.org/10.1038/nn1954

Cognitive control over learning: creating, clustering, and generalizing task-set structure. Psychological Review 120:190–229. https://doi.org/10.1037/a0030852

Presidential address, 1980. Surprise!...surprise? Psychophysiology 18:493–513. https://doi.org/10.1111/j.14698986.1981.tb01815.x

Is the P300 component a manifestation of context updating? Behavioral and Brain Sciences 11:357. https://doi.org/10.1017/S0140525X00058027

Learning latent structure: carving nature at its joints. Current Opinion in Neurobiology 20:251–256. https://doi.org/10.1016/j.conb.2010.02.008

Catecholaminergic regulation of learning rate in a dynamic environment. PLOS Computational Biology 12:e1005171. https://doi.org/10.1371/journal.pcbi.1005171

Noradrenergic and cholinergic modulation of belief updating. Journal of Cognitive Neuroscience 30:1803–1820. https://doi.org/10.1162/jocn_a_01317

Internal and external influences on the rate of sensory evidence accumulation in the human brain. Journal of Neuroscience 33:19434–19441. https://doi.org/10.1523/JNEUROSCI.335513.2013

A model-based approach to trial-by-trial P300 amplitude fluctuations. Frontiers in Human Neuroscience 6:359. https://doi.org/10.3389/fnhum.2012.00359

P300 amplitude variations, prior probabilities, and likelihoods: a Bayesian ERP study. Cognitive, Affective, & Behavioral Neuroscience 16:1–18. https://doi.org/10.3758/s1341501604423

Rational regulation of learning dynamics by pupil-linked arousal systems. Nature Neuroscience 15:1040–1046. https://doi.org/10.1038/nn.3130

Dissociable forms of uncertainty-driven representational change across the human brain. Journal of Neuroscience 39:1688–1698. https://doi.org/10.1523/JNEUROSCI.171318.2018

A supramodal accumulation-to-bound signal that determines perceptual decisions in humans. Nature Neuroscience 15:1729–1735. https://doi.org/10.1038/nn.3248

Updating P300: an integrative theory of P3a and P3b. Clinical Neurophysiology 118:2128–2148. https://doi.org/10.1016/j.clinph.2007.04.019

Neural signals encoding shifts in beliefs. NeuroImage 125:578–586. https://doi.org/10.1016/j.neuroimage.2015.10.067

Do humans make good decisions? Trends in Cognitive Sciences 19:27–34. https://doi.org/10.1016/j.tics.2014.11.005

Bayesian online learning of the hazard rate in change-point problems. Neural Computation 22:2452–2476. https://doi.org/10.1162/NECO_a_00007
Decision letter

Tobias H Donner, Reviewing Editor; University Medical Center Hamburg-Eppendorf, Germany

Timothy E Behrens, Senior Editor; University of Oxford, United Kingdom

Tobias H Donner, Reviewer; University Medical Center Hamburg-Eppendorf, Germany

Redmond G O'Connell, Reviewer; Trinity College Dublin, Ireland
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Statistical context dictates the relationship between feedback-related EEG signals and learning" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Tobias H Donner as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by a Reviewing Editor and Timothy Behrens as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Jonas Obleser (Reviewer #2); Redmond G O'Connell (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
Nassar, Bruckner, and Frank examine the context dependence of the impact of surprising sensory events on learning and choice behaviour. To this end, they have participants perform a continuous sensory decision-making task in two different statistical contexts: one in which the mean of the process generating the evidence changes at unpredictable times ("change point task") and one in which this generative process does not change, but – by design – generates occasional outliers. Surprising evidence samples should elicit an adjustment of choice behavior in the first context, but be ignored in the second context.
Behavioural modelling shows that this normative prediction holds for human participants: They do factor surprising evidence samples into behaviour differently depending on the statistical context. The authors also examine EEG data for signatures of this context-dependent encoding of surprise and learning. A centroparietal positivity in the evoked response scales with surprise irrespective of statistical context; but the influence of this response component on choice updating is conditional on context. The authors link the response component to the classic P3b or P300 of the EEG. The authors conclude that the P3 provides a general surprise signal, which is fed to a downstream process which translates surprise into a contextually appropriate behavioural adjustment.
This study addresses an interesting and timely question. It uses an original approach and is generally well executed. Specifically, all reviewers were impressed by the behavioral modelling part, but there are some issues pertaining to the approach and interpretation of the EEG part, which should be resolved prior to publication.
Essential revisions:
1) Behavioral modelling.
Please present the normative model in more depth, in a longer methods section or supplement. Specifically, you should (i) derive the oddball version of the model, and (ii) explain the difference between the changepoint and oddball versions of the model. A description of how the application of the model to a circular stimulus space differs from previous versions of the model would also be helpful.
2) Relationship to previous literature and theory on P3.
2a) Tease apart the novel aspects and the replication of established findings more explicitly.
The finding that P300 indexes surprise is already well-established in the oddball literature. For example, Kolossa et al., 2013 state: "It has long been recognized that fluctuations in P300 amplitude reflect the degree of surprise." What is novel here seems to be two things: first, the establishment of the same surprise sensitivity in the context of a change point process; and second, the context-dependent relation of the signal to learning. This should be clarified throughout the manuscript, including the Impact Statement. Previous studies quantifying the link between P3 and surprise should be cited and discussed – specifically:
 Mars et al., (2008)
 Work by Bruno Kopp, e.g. Kopp et al., (2016).
2b) Relation to existing accounts of P3.
The discussion of how the present results relate to previous accounts of the P3 is somewhat confusing and should be clarified. In the Introduction the authors initially only allude to the context updating theory of the P3 but other accounts are mentioned in the Discussion section. A key issue here is that no clear initial hypotheses are derived from these models in the Introduction and the efforts to relate the present results to the models in the Discussion section is unclear – in several instances it is initially suggested that the present findings are at odds with a given theory but then acknowledged that the findings are potentially reconcilable. If the authors are unable to generate unique hypotheses for the present data based on previous theories of the P3, then this should be stated clearly.
Much of the P3 literature and the explanatory accounts have tended to centre on responses to stimuli that call for an immediate choice and report. Here the authors are effectively looking at feedback-related responses and it does seem difficult to generate specific predictions from the models in this particular context but that is not necessarily a limitation of the models, more a matter of unexplored territory requiring additional simulations and empirical research. For example, if the P3 reflects the perceptual decision relating to the location of the cannonball (i.e. an evidence accumulation process) then, in line with the expectation-related bound modulations proposed in sequential sampling models, one would expect larger responses for less expected cannonball locations and this would effectively fit the bill of a surprise response and would be expected to relate to conditional behaviour updating. Alternatively, and the peak timing of the P3 might speak to this, the P3 may largely/partly reflect the process of selecting the next shield position in light of the current outcome and more surprising events may prompt more careful deliberation resulting in high decision bounds and larger signal amplitudes. Our impression is that the present results are interesting in their own right but do not necessarily arbitrate among existing explanatory accounts of the P3.
In the final paragraphs the authors actually lay out a compelling account of what might be going on here without necessitating a full functional account of the P3: the P3 surprise response can play a role in triggering a change in the latent state. We suggest leading with this and following up with a discussion of the relevant theories.
2c) Relationship between P3 and the EEG component identified here.
While we appreciate the general "data-driven" approach used by the authors, we noticed that it inevitably raises questions about the relation to the so-called "P3 components" characterised in the oddball literature. This point requires more discussion.
3) Data analysis.
We believe that several aspects of the data analysis require further attention, specifically:
3a) The authors state that the critical PE*surprise*condition (or PE*EEG*condition) regressors indicate whether "surprise (EEG signal) tends to increase learning in the changepoint condition but decrease learning in the oddball condition". But these interaction term regressors only test for a significant interaction – significant β weights do not imply a sign flip between conditions (increase in one condition, decrease in the other). For example, if surprise increased learning in the change point task, but does not correlate with learning in the oddball task, this might still yield a significant interaction. A sign flip should be assessed via post-hoc comparisons.
3b) Subsection “Electrophysiological signatures of feedback processing”: With two conditions, oddball and changepoint, in this experiment, how can we have separate regressor weight estimates for both (one dummy variable coding condition would suffice/avoid collinearity)?
3c) The contrast "surprise" based on those two regressors might be modelled too liberally: A contrast setting both conditions to "1" is not necessarily identical to a true conjunction (i.e., both regressors driving the EEG significantly). This has been dealt with extensively in the fMRI/GLM literature. In short, outcomes from this "surprise" contrast are not necessarily as decisive as outcomes from a true difference contrast ("learning").
3c) Figure 5: Why should (behavioural) learning outcome be used to predict (on the y-axis) the temporally preceding positivity in the EEG? Also, the entire figure seems to stand on statistically shaky grounds, with p values in the .02-.04 range in highly sophisticated/convoluted models with many researcher degrees of freedom. Under the null, the result in Figure 5C would be as surprising (4 to 5 heads in a row). The authors should do more to convince the reader that we are not looking at some lucky, highly selective results.
3d) The authors highlight an early frontocentral modulation in Figure 3 as being the P3a however the traces 3D indicate that this signal is equal in amplitude for expected and oddball stimuli. Shouldn't it be larger for oddballs if it is indeed a P3a?
3e) We suggest toning down the language in certain instances where the authors seem to imply that they have established a causal role for the P3 in belief updating e.g. Our findings are consistent with a number of studies that have suggested the P300 is related to surprise (9,14,17,24), but extend them by demonstrating the role of the signal in controlling the degree to which new information affects updated beliefs.
3f) The authors excluded 12/37 subjects from EEG analysis because of low data quality. The criterion of excluding any subject with >25% artifactual trials seems rather stringent. Can you provide more rationale for the procedure? Are the main results robust with respect to such (arbitrary) selection criteria?
3g) 0.5 Hz is quite a severe high-pass cutoff and likely to attenuate some of the P3 activity. We don't think this could account for the significant effects of surprise etc but we would encourage the authors to repeat their key analyses with a substantially lower cutoff (e.g. 0.05 Hz) just to make sure that nothing changes.
3f) What reference channel did the authors use for the EEG analyses – grand average?
https://doi.org/10.7554/eLife.46975.019
Author response
Summary:
Nassar, Bruckner, and Frank examine the context-dependence of the impact of surprising sensory events on learning and choice behaviour. To this end, they have participants perform a continuous sensory decision-making task in two different statistical contexts: one in which the mean of the process generating the evidence changes at unpredictable times ("change point task") and one in which this generative process does not change, but – by design – generates occasional outliers. Surprising evidence samples should elicit an adjustment of choice behavior in the first context, but be ignored in the second context.
Behavioural modelling shows that this normative prediction holds for human participants: They do factor surprising evidence samples into behaviour differently depending on the statistical context. The authors also examine EEG data for signatures of this context-dependent encoding of surprise and learning. A centroparietal positivity in the evoked response scales with surprise irrespective of statistical context; but the influence of this response component on choice updating is conditional on context. The authors link the response component to the classic P3b or P300 of the EEG. The authors conclude that the P3 provides a general surprise signal, which is fed to a downstream process which translates surprise into a contextually appropriate behavioural adjustment.
This study addresses an interesting and timely question. It uses an original approach and is generally well executed. Specifically, all reviewers were impressed by the behavioral modelling part, but there are some issues pertaining to the approach and interpretation of the EEG part, which should be resolved prior to publication.
Essential revisions:
1) Behavioral modelling.
Please present the normative model in more depth, in a longer methods section or supplement. Specifically, you should (i) derive the oddball version of the model, and (ii) explain the difference between the changepoint and oddball versions of the model. A description of how the application of the model to a circular stimulus space differs from previous versions of the model would also be helpful.
We now include a full derivation of the normative learning model for the oddball condition in the supplementary material and refer interested readers to it:
“…leading to an error driven learning rule in which learning rate is adjusted dynamically from trial to trial, allowing us to derive normative prescriptions for learning for both conditions (see supplementary material for full derivation).”
We now also provide additional information about how the circular distributions were handled with respect to modeling and fitting the regression model as follows:
“Unlike standard regression in which the error distribution is assumed to be normal, our model imposed a circular (von Mises) distribution of errors around the predicted update. Maximum posterior coefficients for each individual subject were estimated using the fmincon optimization tool in Matlab (Mathworks, Natick, MA, USA) and t-tests were performed on the regression coefficients across participants to test for significant contributions of each term to update behavior. Weak Gaussian zero-centered priors were included to regularize coefficients of interest. In the purely behavioral analysis the width of priors over coefficients on standardized predictors was set to 5. In the analyses that included EEG predictors, EEG-based predictors were regularized using a zero-centered Gaussian prior with a standard deviation of 0.1 (as compared to a standard deviation of 1 for the predicted learning rates from the behavioral model) making the regularization for the EEG terms stronger than that on the competing behavioral prediction term by a factor of 10 (thereby allowing preferential explanation of shared variance by the other terms in the model).”
We also plan to make the analysis and modeling code available on the first author’s website once the paper has been accepted in final form.
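As an illustration of the kind of fit described in the response to point 1, the following is a minimal sketch of maximum a posteriori regression with von Mises errors and zero-centered Gaussian priors. It is not the authors' code: it swaps Matlab's fmincon for SciPy's `minimize`, and the function names, the log-parameterized concentration `kappa`, and the bounds are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import i0e


def neg_log_posterior(params, X, update, prior_sd):
    """Negative log posterior: von Mises likelihood plus Gaussian priors."""
    beta, log_kappa = params[:-1], params[-1]
    kappa = np.exp(log_kappa)          # concentration must be positive
    resid = update - X @ beta          # circular error in radians
    # log I0(kappa) = log(i0e(kappa)) + kappa keeps the Bessel term stable
    log_norm = np.log(2 * np.pi) + np.log(i0e(kappa)) + kappa
    log_lik = np.sum(kappa * np.cos(resid) - log_norm)
    log_prior = -0.5 * np.sum((beta / prior_sd) ** 2)  # zero-centered priors
    return -(log_lik + log_prior)


def fit_map(X, update, prior_sd):
    """Return MAP coefficients and the fitted concentration (hypothetical helper)."""
    n_coef = X.shape[1]
    x0 = np.zeros(n_coef + 1)          # start at beta = 0, kappa = 1
    bounds = [(None, None)] * n_coef + [(-5.0, 8.0)]  # assumed range for log kappa
    res = minimize(neg_log_posterior, x0, args=(X, update, prior_sd),
                   method="L-BFGS-B", bounds=bounds)
    return res.x[:-1], np.exp(res.x[-1])
```

Passing a vector for `prior_sd` lets each coefficient have its own prior width, for example a narrower prior on EEG-based predictors than on the behavioral prediction term, as the quoted Methods text describes.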
2) Relationship to previous literature and theory on P3.
2a) Tease apart the novel aspects and the replication of established findings more explicitly.
The finding that P300 indexes surprise is already well-established in the oddball literature. For example, Kolossa et al., 2013 state: "It has long been recognized that fluctuations in P300 amplitude reflect the degree of surprise." What is novel here seems to be two things: first, the establishment of the same surprise sensitivity in the context of a change point process; and second, the context-dependent relation of the signal to learning. This should be clarified throughout the manuscript, including the Impact Statement. Previous studies quantifying the link between P3 and surprise should be cited and discussed – specifically:
 Mars et al., (2008)
 Work by Bruno Kopp, e.g. Kopp et al., (2016).
We now make this clear in the Impact statement:
“The P300, an EEG component known to be evoked by surprising events, predicts learning in a bidirectional manner that depends critically on the surrounding statistical context.”
And in the Introduction:
“While the central parietal component of the P300 (P3b) has been long known to reflect surprise (Mars et al., 2008; Kolossa, Kopp and Fingscheidt, 2015; Kopp et al., 2016; Seer et al., 2016; Kolossa et al., 2012), recent work suggests it relates to learning (Fischer and Ullsperger, 2013) even after controlling for the degree of surprise in changing environments (Jepma et al., 2016; Jepma et al., 2018).”
2b) Relation to existing accounts of P3.
The discussion of how the present results relate to previous accounts of the P3 is somewhat confusing and should be clarified. In the Introduction the authors initially only allude to the context updating theory of the P3 but other accounts are mentioned in the Discussion section. A key issue here is that no clear initial hypotheses are derived from these models in the Introduction and the efforts to relate the present results to the models in the Discussion is unclear – in several instances it is initially suggested that the present findings are at odds with a given theory but then acknowledged that the findings are potentially reconcilable. If the authors are unable to generate unique hypotheses for the present data based on previous theories of the P3, then this should be stated clearly.
Much of the P3 literature and the explanatory accounts have tended to centre on responses to stimuli that call for an immediate choice and report. Here the authors are effectively looking at feedback-related responses and it does seem difficult to generate specific predictions from the models in this particular context but that is not necessarily a limitation of the models, more a matter of unexplored territory requiring additional simulations and empirical research. For example, if the P3 reflects the perceptual decision relating to the location of the cannonball (i.e. an evidence accumulation process) then, in line with the expectation-related bound modulations proposed in sequential sampling models, one would expect larger responses for less expected cannonball locations and this would effectively fit the bill of a surprise response and would be expected to relate to conditional behaviour updating. Alternatively, and the peak timing of the P3 might speak to this, the P3 may largely/partly reflect the process of selecting the next shield position in light of the current outcome and more surprising events may prompt more careful deliberation resulting in high decision bounds and larger signal amplitudes. Our impression is that the present results are interesting in their own right but do not necessarily arbitrate among existing explanatory accounts of the P3.
In the final paragraphs the authors actually lay out a compelling account of what might be going on here without necessitating a full functional account of the P3: the P3 surprise response can play a role in triggering a change in the latent state. We suggest leading with this and following up with a discussion of the relevant theories.
We agree with the reviewers that our results are interesting in their own right and do not arbitrate between existing theories of the P3. Thus, we have taken the suggested course of action and moved our discussion of theories of P3 to the end of the discussion, after our interpretation of the role of P3 signaling in our specific paradigm. We have also simplified this section, and made it clear that our results do not arbitrate between broader theories:
“Our findings are consistent with a number of studies that have demonstrated the P300 is related to surprise (Donchin, 1981; Wessel, 2016; Garrido et al., 2016; Jepma et al., 2016), but extend on them to reveal how the P300 relates to learning in different contexts. Our results are inconsistent with standard interpretations of the context updating theory of the P300 in which context is defined as a working memory for an observable stimulus (Donchin, 1981; Donchin and Coles, 2010; Polich, 2003; Polich, 2007), as under this definition a larger P300 should always lead to more learning. However, if the updated “contexts” were defined in terms of the latent states described above, the predictions of the context updating theory would indeed match our results. Thus, our results can constrain potential interpretations of the context updating theory, although they do not falsify the theory altogether. Nor do our results directly conflict with other prominent theories of P300 signaling including the idea that central parietal positivity might reflect accumulated evidence for a particular decision or course of action (Kelly and O'Connell, 2013; O'Connell, Dockree and Kelly 2012), or anticipate the need to inhibit responding (Wessel, 2016; Wessel and Aron, 2017), as both of these theories could be framed in terms of the latent states above (e.g., the accumulated evidence for a change in latent state or the need to inhibit responding until the appropriate latent state is loaded). Thus, our results do not arbitrate between these theories, but do require expansion of their interpretation (to include latent variables involved in the generation of outcomes) and also highlight their implications for learning when mechanistic interpretations are refined and applied to our task and data.”
2c) Relationship between P3 and the EEG component identified here.
While we appreciate the general "data-driven" approach used by the authors, we noticed that it inevitably raises questions about the relation to the so-called "P3 components" characterised in the oddball literature. This point requires more discussion.
We now discuss the relationship between our purported P300 signals and those that have previously been characterized in the literature:
“We took a data-driven approach to identifying signals that related to surprising outcomes in different statistical contexts. The primary signal that we identified, however, was similar in timing (3D and F), location (3B), and sensitivity to surprise (3E and G) to those previously reported for the P300 (Kolossa et al., 2012; Kopp et al., 2016). The topography of our spatiotemporal cluster changed over time from frontocentral to centroparietal, consistent with inclusion of both an early frontocentral P3a component as well as a later centroparietal P3b component. Thus, although our methods were agnostic to detection of a specific signal, we interpret our results in the context of the larger literature relating to P300 signaling.”
3) Data analysis.
We believe that several aspects of the data analysis require further attention, specifically:
3a) The authors state that the critical PE*surprise*condition (or PE*EEG*condition) regressors indicate whether "surprise (EEG signal) tends to increase learning in the changepoint condition but decrease learning in the oddball condition". But these interaction term regressors only test for a significant interaction – significant β weights do not imply a sign flip between conditions (increase in one condition, decrease in the other). For example, if surprise increased learning in the change point task, but does not correlate with learning in the oddball task, this might still yield a significant interaction. A sign flip should be assessed via post-hoc comparisons.
We agree with the reviewers and now have performed the additional analysis that they recommended.
In terms of behavior, we have conducted an additional behavioral regression that includes separate PE*surprise terms for the changepoint condition (in which surprising trials reflect changepoints) and oddball condition (in which surprising trials reflect oddballs). We see that the changepoint coefficients are significantly positive across participants, whereas the oddball coefficients are significantly negative. We have created a supplementary figure (Figure 2—figure supplement 1) to report these results, and now include a reference to this figure in the main Results section.
To address the concern with our model of behavior that included the PE*condition*EEG interaction term, we have now performed another version of the same analysis where we include separate PE*EEG terms for each condition (changepoint, oddball). These terms include the mean-centered product for the modeled condition but are set to zero for the other condition. Applying this model to our updating behavior, we found that coefficients were positive for the changepoint condition, negative for the oddball condition, and significant in 3 of 4 cases when evaluated using the t-test method that we employ. It is also worth noting that in the case where we did not see statistical significance (Early P300 cluster in the changepoint condition) the majority of participants showed an effect in the predicted direction (19/25, p-value for sign test for null hypothesis median = 0: 0.01), and thus we suspect that our inability to reject the null using a t-test was more related to the shape of the distribution than to a lack of effect. We now report this analysis in the Results section:
“Learning rate predictions derived from the regression model show that higher P300 signal strength predicts more learning in the changepoint condition (Figure 4E, orange), but less learning in the oddball condition (Figure 4E, blue) and this prediction was validated in a follow-up analysis that separately modeled the effect of EEG on learning in the changepoint and oddball conditions (Figure 4—figure supplement 1).”
And display the results from it in figure 4—figure supplement 1.
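The condition-specific regressors and the sign test mentioned in the response to 3a can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, and centering the product within each condition is an assumption, since the response does not say over which trials the mean was taken.

```python
import numpy as np
from math import comb


def condition_specific_terms(pe, eeg, is_changepoint_cond):
    """Build separate PE*EEG regressors for the two task conditions.

    Each regressor holds the mean-centered product on trials of its own
    condition and zero on trials of the other condition.
    """
    product = np.asarray(pe, float) * np.asarray(eeg, float)
    cp = np.asarray(is_changepoint_cond, bool)
    cp_term = np.zeros_like(product)
    ob_term = np.zeros_like(product)
    # Assumption: centering is done within each condition
    cp_term[cp] = product[cp] - product[cp].mean()
    ob_term[~cp] = product[~cp] - product[~cp].mean()
    return cp_term, ob_term


def sign_test_p(n_positive, n_total):
    """Two-sided exact sign test of the null hypothesis median = 0."""
    k = max(n_positive, n_total - n_positive)
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)
```

With 19 of 25 participants in the predicted direction, `sign_test_p(19, 25)` gives approximately 0.0146, which matches the reported value of 0.01 at two decimal places, assuming a two-sided exact binomial test was used.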
3b) Subsection “Electrophysiological signatures of feedback processing”: With two conditions, oddball and changepoint, in this experiment, how can we have separate regressor weight estimates for both (one dummy variable coding condition would suffice/avoid collinearity)?
Apologies for the lack of clarity. All trials that did not involve a rare event (i.e., neutral trials) were the implicit baseline to which our changepoint and oddball trials were compared. Thus, there were three types of trials – changepoints (rare events in the changepoint condition), oddballs (rare events in the oddball condition) and neutral trials (outcomes emerging from the expected transition). As the reviewers note, the trial type that we did not model explicitly (neutral trials) is captured by the intercept in our model. We have changed the description of our terms to make this point clearer:
“First we regressed feedback-locked EEG data collected simultaneously with task performance onto an explanatory matrix that included separate binary variables reflecting changepoint and oddball trials (as opposed to neutral trials that did not involve a rare event), amongst other terms (Figure 3A, left).”
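The coding scheme in the response to 3b, with three trial types and neutral trials as the implicit baseline, can be illustrated with a small ordinary least squares sketch. The trial labels and function names are assumptions for this example, and the "other terms" of the actual explanatory matrix are omitted.

```python
import numpy as np


def build_design(trial_type):
    """Intercept plus binary changepoint and oddball regressors.

    Neutral trials get no regressor of their own, so they are absorbed
    by the intercept and serve as the baseline.
    """
    trial_type = np.asarray(trial_type)
    intercept = np.ones(len(trial_type))
    changepoint = (trial_type == "changepoint").astype(float)
    oddball = (trial_type == "oddball").astype(float)
    return np.column_stack([intercept, changepoint, oddball])


def fit_ols(X, y):
    """Least-squares coefficients for one channel and time point."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, float), rcond=None)
    return coef
```

Because a third trial type exists, the three columns are linearly independent: the intercept estimates the neutral-trial mean, and each binary coefficient estimates the difference between its rare-event trials and the neutral baseline.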
3c) The contrast "surprise" based on those two regressors might be modelled too liberally: A contrast setting both conditions to "1" is not necessarily identical to a true conjunction (i.e., both regressors driving the EEG significantly). This has been dealt with extensively in the fMRI/GLM literature. In short, outcomes from this "surprise" contrast are not necessarily as decisive as outcomes from a true difference contrast ("learning").
We now directly examine the raw coefficients from which the surprise contrast was composed (changepoint and oddball) within the spatiotemporal cluster of interest and find that surprise is reflected similarly in the changepoint and oddball conditions. We have added a sentence to the Results section to describe this:
“In each cluster, changepoint and oddball trials contributed similarly to the overall surprise effect (Figure 3—figure supplement 1).”
And Figure 3—figure supplement 1 showing the raw coefficients.
3c) Figure 5: Why should (behavioural) learning outcome be used to predict (on the y-axis) the temporally preceding positivity in the EEG? Also, the entire figure seems to stand on statistically shaky grounds, with p values in the .02-.04 range in highly sophisticated/convoluted models with many researcher degrees of freedom. Under the null, the result in Figure 5C would be as surprising (4 to 5 heads in a row). The authors should do more to convince the reader that we are not looking at some lucky, highly selective results.
Our intention in Figure 5 of our previous manuscript was to make two points:
1) individuals who conditionally modulate updating in response to surprise to a greater degree, also have P300 responses that conditionally predict updating to a greater degree. Or, more concisely, the P300 signal is most predictive of behavior in individuals that show the behavior we are trying to predict.
2) trial-to-trial variability in the P300 predicts trial-to-trial updating behavior, even beyond what can be inferred using our behavioral model.
We now make these points in separate figures. In order to more clearly make the point about individual differences, we now include a panel in Figure 4 showing that the conditional learning EEG coefficients are positively correlated with our behavioral measure of the conditional effect of surprise on updating. This differs from our previous individual difference analysis in that it is based on the coefficients from our original EEG-based model of updates, rather than the one that includes an additional term to soak up variance that could be accounted for by our behavioral model (see below for updates to that analysis). We see a correlation between these variables that provides compelling evidence for a link between individual differences in behavior and EEG signaling (r = 0.54, p = 3 x 10^{-4}).
We have also revised our analysis of the degree to which trial-to-trial EEG signals explain behavioral variability beyond that afforded by our original model. Our previous analysis was focused on two separate spatiotemporal clusters that were identified in our regression coefficient clustering procedure. However, after rerunning our analysis using the reviewer suggestions (more inclusive high pass filter and eliminating subject exclusion criterion) we now identify one temporally extended spatiotemporal cluster. To more carefully test whether there is information in our cluster that can predict behavior, we now compute EEG signal strength in 40 ms sliding windows of time across the duration of the P300 response. For each sliding window, we include the EEG signal strength in a linear model of participant updating behavior, and estimate coefficients that describe the degree to which the EEG signal directly predicts learning (direct learning) and conditionally predicts learning (conditional learning), while including the predicted updates of our behavioral model as a competing explanatory variable. We smooth coefficients over time and use cluster-based permutation testing to identify contiguous epochs over which coefficients deviated significantly from zero. We find that early in the P300 window (peak = 318 ms) conditional learning coefficients are positive (peak mean/SEM coefficient = 0.04/0.01; cluster-corrected p value = 0.01), providing direct support for our claim that trial-to-trial variability in EEG signals can explain behavioral variability beyond what can be explained with behavioral measures alone.
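The cluster-based permutation step described above can be sketched for a participants-by-time-window matrix of coefficients. This is one common variant (sign-flip permutations with a summed-t cluster mass), offered as an assumption about the general approach rather than the authors' exact procedure; the threshold, permutation count, and function names are illustrative.

```python
import numpy as np


def _runs(mask):
    """Return (start, stop) index pairs for contiguous True runs."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs


def _tvals(data):
    """One-sample t statistic against zero at each time window."""
    n = data.shape[0]
    return data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n))


def _clusters(t, t_thresh):
    """Supra-threshold clusters of either sign with their absolute mass."""
    found = []
    for sign in (1, -1):
        for a, b in _runs(sign * t > t_thresh):
            found.append((a, b, abs(t[a:b].sum())))
    return found


def cluster_permutation_test(coefs, t_thresh=2.0, n_perm=1000, seed=0):
    """Return (start, stop, p) for each observed cluster.

    coefs: participants x time windows. The null distribution is built by
    randomly flipping the sign of each participant's whole time course and
    keeping the largest cluster mass from each permutation.
    """
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for p in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(coefs.shape[0], 1))
        masses = [m for _, _, m in _clusters(_tvals(coefs * flips), t_thresh)]
        null[p] = max(masses, default=0.0)
    observed = _clusters(_tvals(coefs), t_thresh)
    return [(a, b, (np.sum(null >= m) + 1) / (n_perm + 1)) for a, b, m in observed]
```

Taking the maximum cluster mass per permutation is what provides the correction across time windows: an observed cluster is compared against the largest cluster expected by chance anywhere in the window.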
3d) The authors highlight an early frontocentral modulation in Figure 3 as being the P3a however the traces 3D indicate that this signal is equal in amplitude for expected and oddball stimuli. Shouldn't it be larger for oddballs if it is indeed a P3a?
The early frontocentral modulation noted in Figure 3 is more positive for changepoint and oddball trials than for expected stimuli. This is indicated by the hot colors at frontocentral locations in Figure 3B (t-statistic on CP+Oddball contrast) and is also evident in the positive event-related difference signals in the FCz channel [changepoint/oddball ERP – expected ERP] shown in Figure 3E. Our inclusion of error bars (SEM) in Figure 3D obscures this difference, as individual differences in the shape and magnitude of the ERP contribute substantially to the variance in the signal, and this was the motivation for including the event-related difference signals in the right panels, in which these components are removed.
3e) We suggest toning down the language in certain instances where the authors seem to imply that they have established a causal role for the P3 in belief updating e.g. Our findings are consistent with a number of studies that have suggested the P300 is related to surprise (9,14,17,24), but extend them by demonstrating the role of the signal in controlling the degree to which new information affects updated beliefs
We agree with the reviewers that our results do not unequivocally demonstrate a causal role for the P300 in controlling the degree to which new information affects updated beliefs. We have changed this sentence as follows:
“Our findings are consistent with a number of studies that have demonstrated the P300 is related to surprise (Donchin, 1981; Wessel, 2016; Garrido et al., 2016; Jepma et al., 2016), but extend them to reveal how the P300 differentially relates to learning in different contexts.”
3f) The authors excluded 12/37 subjects from EEG analysis because of low data quality. The criterion of excluding any subject with >25% artifactual trials seems rather stringent. Can you provide more rationale for the procedure? Are the main results robust with respect to such (arbitrary) selection criteria?
The primary author had been advised by a technician to remove low signal-to-noise participants from EEG analysis – specifically using epoch rejections greater than 25% as a criterion. However, after a thorough search of the literature and consulting with several experienced EEG researchers, we realized that (1) there is no accepted approach for removing participants who are likely to have low signal-to-noise and (2) many (the majority?) of high-quality papers in the field exclude very few participants, if any.
To assess the degree to which our subject exclusion affected our results, we reproduced our primary analyses using all possible rejection criteria. We found our primary result (that the P300-like EEG signals conditionally relate to learning) was apparent and statistically significant for all but the most conservative criterion values (e.g., removing all participants but 5). The size of the effects noticeably decreased for more liberal criteria – potentially consistent with the idea that some subjects were contributing more noise than signal – however effects were still reliable even when no subjects are excluded. However, performing the same robustness check on the residual analysis (the EEG-based regression in which we included a competing term in the model to soak up known sources of behavioral variability) yielded mixed results – with a wide range of criterion values over which the conditional learning effect was significant and in the appropriate direction, but with a number of criteria, including the most liberal case in which all subjects are included, unable to reject the null hypothesis (see Author response image 1).
In light of this lack of consistency, and with reproducibility and best practices in mind, we have decided to remove the exclusion criterion altogether (i.e., include all EEG data). This is not a perfect solution – however we feel that this conservative approach is the lesser of two evils. As we reported above, inclusion of all participants contributed to a change in the clusters identified in our initial analysis, but our key conclusions remain unchanged.
3g) 0.5 Hz is quite a severe high-pass cutoff and likely to attenuate some of the P3 activity. We don't think this could account for the significant effects of surprise etc. but we would encourage the authors to repeat their key analyses with a substantially lower cutoff (e.g. 0.05 Hz) just to make sure that nothing changes.
We have now reanalyzed the data with a lower cutoff (0.05 Hz) and find similar results. However, using this lower frequency cutoff changes the clusters formed in our EEG analysis slightly; the main difference is that the two clusters that we previously referred to as the “early” and “late” P300 clusters are merged into a single prolonged cluster. In addition, the magnitude of P300 effects is larger when using the lower cutoff. Both of these findings are consistent with the reviewers' conjecture that some P3 activity might have been attenuated in earlier analyses. Thus, we have redone all analyses using this lower cutoff.
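To give a rough sense of why the cutoff matters for slow components such as the P3, the sketch below evaluates the magnitude response of an idealized first-order high-pass filter at a few low frequencies. This is an assumption made purely for illustration: the filter type and order used in the actual preprocessing are not stated here, and `highpass_gain` is a hypothetical helper.

```python
import math


def highpass_gain(freq_hz, cutoff_hz):
    """Gain (0 to 1) of an idealized first-order high-pass filter at freq_hz.

    Uses |H(f)| = (f / fc) / sqrt(1 + (f / fc)^2), where fc is the cutoff.
    """
    ratio = freq_hz / cutoff_hz
    return ratio / math.sqrt(1.0 + ratio * ratio)


if __name__ == "__main__":
    # Compare the original (0.5 Hz) and lowered (0.05 Hz) cutoffs.
    for cutoff in (0.5, 0.05):
        gains = {f: round(highpass_gain(f, cutoff), 3) for f in (0.25, 0.5, 1.0, 2.0)}
        print(f"cutoff {cutoff} Hz:", gains)
```

Under this simplified model, a 0.5 Hz component passes with a gain of about 0.71 at the 0.5 Hz cutoff and about 0.995 at the 0.05 Hz cutoff, which is the direction of the effect the reviewers anticipated.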
3f) What reference channel did the authors use for the EEG analyses – grand average?
EEG data were collected in reference to CPz and re-referenced during preprocessing to the grand average. We now report this explicitly in the Materials and methods section:
“Data were collected using CPz as a reference channel and re-referenced to the grand mean for analysis.”
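Re-referencing to the grand mean amounts to subtracting, at every time point, the mean across channels. A minimal sketch is given below; the function name `average_reference` and the optional `restore_reference` flag are assumptions for illustration. Whether the original reference (CPz) was added back as a flat channel before averaging is not stated here, so that step is off by default.

```python
def average_reference(data, restore_reference=False):
    """Re-reference EEG data to the across-channel mean.

    data: list of channels, each a list of samples of equal length.
    restore_reference: if True, append the original reference as an all-zero
        channel before averaging (an assumption, not a documented step).
    Returns a new list of channels with the mean across channels removed
    at every sample.
    """
    channels = [list(ch) for ch in data]
    n_samples = len(channels[0])
    if restore_reference:
        # The recording reference is zero by construction in the raw data.
        channels.append([0.0] * n_samples)
    n_channels = len(channels)
    # Mean across channels at each time point.
    means = [sum(ch[t] for ch in channels) / n_channels for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in channels]


if __name__ == "__main__":
    # Two channels, two samples, made-up values.
    print(average_reference([[1.0, 2.0], [3.0, 6.0]]))
```

After this step the channels sum to zero at each sample, which is a quick sanity check on any implementation.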
https://doi.org/10.7554/eLife.46975.020
Article and author information
Author details
Funding
National Institute of Mental Health (F32MH102009)
 Matthew R Nassar
National Institute on Aging (K99AG054732)
 Matthew R Nassar
National Institute of Mental Health (R01 MH08006601)
 Michael J Frank
National Science Foundation (1460604)
 Michael J Frank
German Academic Exchange Service London (Promos travel grant)
 Rasmus Bruckner
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We would like to thank Julie Helmers and Andrea Mueller for their help collecting EEG and behavioral data, Rob Chambliss for help cleaning EEG data, and Romy Frömer and Rachel Ratz-Lubashevsky for helpful discussion. This work was funded by NIH grants F32MH102009 and K99AG054732 (MRN), NIMH R01 MH08006601 and NSF Proposal #1460604 (MJF). RB was supported by a Promos travel grant from the German Academic Exchange Service (DAAD). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Ethics
Human subjects: Informed consent was obtained from each participant in the study and all procedures were performed in accordance with the Declaration of Helsinki. All procedures were approved by the Brown University Institutional Review Board (Brown University Federal Wide Assurance #00004460).
Senior Editor
 Timothy E Behrens, University of Oxford, United Kingdom
Reviewing Editor
 Tobias H Donner, University Medical Center Hamburg-Eppendorf, Germany
Reviewers
 Tobias H Donner, University Medical Center Hamburg-Eppendorf, Germany
 Redmond G O'Connell, Trinity College Dublin, Ireland
Publication history
 Received: March 19, 2019
 Accepted: August 12, 2019
 Accepted Manuscript published: August 21, 2019 (version 1)
 Version of Record published: August 30, 2019 (version 2)
Copyright
© 2019, Nassar et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.