The temporal representation of experience in subjective mood
Abstract
Humans refer to their mood state regularly in day-to-day as well as clinical interactions. Theoretical accounts suggest that when reporting on our mood we integrate over the history of our experiences; yet, the temporal structure of this integration remains unexamined. Here, we use a computational approach to quantitatively answer this question and show that early events exert a stronger influence on reported mood (a primacy weighting) compared to recent events. We show that a Primacy model accounts better for mood reports compared to a range of alternative temporal representations across random, consistent, or dynamic reward environments, different age groups, and in both healthy and depressed participants. Moreover, we find evidence for neural encoding of the Primacy, but not the Recency, model in frontal brain regions related to mood regulation. These findings hold implications for the timing of events in experimental or clinical settings and suggest new directions for individualized mood interventions.
Introduction
Self-reports of momentary mood carry broad implications, yet their underpinnings are poorly understood. We report on our momentary mood to convey to others an impression of our well-being in everyday life (Clark and Watson, 1988; Forgas et al., 1984); clinically, self-reports of momentary mood form a cornerstone of psychiatric interviewing (Daviss et al., 2006; Wood et al., 1995); in research, momentary mood is widely used to quantify human emotional responses, such as in ecological momentary assessment (EMA) (Kahneman et al., 2004; Larson et al., 1980; Taquet et al., 2020). Moreover, theoretical accounts suggest that when we report on our mood we integrate over the history of our experiences with the environment (Eldar et al., 2016; Katsimerou et al., 2014; Nettle and Bateson, 2012; Rutledge et al., 2014; Vinckier et al., 2018). In this paper, we address the fundamental question of the temporal pattern of this integration: which events, for example early versus recent ones, matter the most for how we report our mood.
The standard account is that momentary mood reporting is predominantly affected by recent reward prediction errors (Rutledge et al., 2014) (RPEs, or how much better or worse outcomes were relative to what was expected). Accordingly, the more surprising an event is (operationalized as a positive or negative RPE) and the more recent it is, the more it will affect momentary mood. In modeling terms, the standard account posits that humans apply a recency weighting such that their most recent experiences override those in the more distant past (e.g., experiences that happened early in the course of a conversation or a game would matter far less). This computational account has several real-life implications. In terms of measurement of momentary mood, a momentary happiness rating would be a good proxy for one’s most recent experiences. In terms of clinical interactions (such as during an interview or a treatment session), a person’s momentary mood could be lifted by the addition of a positive event.
This standard temporal account is widely applied in models of mood (Eldar et al., 2016; Katsimerou et al., 2014; Rutledge et al., 2014; Vinckier et al., 2018), yet is largely unexamined and has not been compared to plausible alternatives. Indeed, at the opposite end of the standard recency model stands a primacy account of momentary mood. According to a primacy model, experiences that occur early in a conversation, a game, or an interaction prevail over more recent ones. The intuition for such a model comes from idiomatic expressions such as ‘starting off on the right foot’ and from empirical evidence showing that the first instances of an interaction can be highly informative (Ambady and Rosenthal, 1992; Houser and Furler, 2007). Computationally speaking, early events would be weighted more heavily than recent events, which has several real-world implications. From a measurement perspective, the time scales of momentary mood reporting and of experience would overlap less than in the recency model—the current mood rating would be less of a reflection of the current environment. Moreover, the emphasis on the start of interactions such as interviews or treatment sessions would be much greater.
A computational approach can help us answer this important question as it allows us to make explicit in model terms how humans integrate over their experiences in order to arrive at a selfreport of their moods (Huys et al., 2016). For this purpose, we developed a novel Primacy model that we pitted against the standard Recency model. We then also examine a host of other plausible models as suggested in disparate literatures about valuation timing (Kahneman and Tversky, 2000; Olsson et al., 2017).
We examine these different temporal integration models across a range of conditions in order to establish their generalizability.
First, we examine the Primacy and the Recency models in their generalizability across reward environments. To do this, we exploited the flexibility of a standard probabilistic task (Rutledge et al., 2014) and adapted it to create different task conditions. (1) A random environment, where there was no consistent trend over time in the direction or value of surprises (RPEs); (2) a structured environment, where events in the form of RPEs were organized in positive and negative blocks; and (3) a structured-adaptive environment, where the intensity of RPEs was enhanced in real time to maximize the influence of task events on mood over time and across individuals. We could not be certain that the fixed stimuli of the structured task would be sufficient to drive large changes in mood in each participant, which might influence the temporal integration of events. We therefore developed the adaptive task to compensate for individual and temporal differences in mood response, by delivering personalized stimuli that increase the likelihood of observing a large variation in mood from each participant (implemented by adding a closed-loop controller into the standard probabilistic task).
Second, we examine the generalizability of the different temporal integration models across age groups given that previous studies have shown important differences in reward processing particularly between adolescent and adult groups (Braams et al., 2015; Casey et al., 2010; Heller and Casey, 2016; Kayser et al., 2015; Somerville et al., 2010; Walker et al., 2017).
Third, we also examine the generalizability of these models across healthy volunteers and depressed participants, given the wealth of evidence that depression is associated with aberrations in reward processing (Keren et al., 2018; Ng et al., 2019; Stringaris et al., 2015; Whitmer et al., 2012), which might also affect the temporal integration of experiences.
We then examine the generalizability of the performance of the two models across simulated data in a model recovery analysis that protects against model selection biases (Hastie et al., 2009; Wilson and Collins, 2019).
Additionally, we examine the generalizability of the Primacy model performance in comparison to other variants of the Recency model.
Finally, we compare the neural correlates of key terms of these competing models using whole-brain fMRI. Previous work has shown that the reporting of mood and evoking emotional responses leads to activations in a network of brain areas encompassing the fronto-limbic circuit (Etkin et al., 2015; Etkin et al., 2011; Rutledge et al., 2014). Concordance between computational model parameters and neural activity levels provides evidence that the mechanisms described by the model correspond to the neural processes underlying that behavior. For this reason, we test the correlation of Primacy and Recency model parameters with neural activity measured as blood oxygen-level-dependent (BOLD) signal during fMRI and then directly contrast between the relations of the two models.
Results
The Primacy and Recency models of mood
As a first step, we compared the Primacy model versus the Recency model of mood. These were designed to correspond to the general experimental setup that is presented in Figure 1A and has been used extensively before to answer questions about mood (Rutledge et al., 2017; Rutledge et al., 2014). In brief, participants first chose whether to receive a certain amount or to gamble between two values. These values allowed each trial to present to the participant an expectation and an RPE value, where the latter is the difference between the outcome and the expectation values. Participants were also asked to rate their momentary mood every 2–3 trials by moving a cursor along a scale between unhappy and happy mood. Such mood ratings have been shown before to correspond to the general state of well-being of participants (Rutledge et al., 2017); we validated this in our dataset with a significant correlation between baseline mood ratings and participants’ depressive symptom scores (with the Mood and Feelings Questionnaire [MFQ] measure in the adolescent sample: CC = −0.62, p = 2.62e-8, CI = [−0.75, −0.44]; with the Center for Epidemiologic Studies Depression [CESD] measure in the adult sample: CC = −0.69, p = 7.12e-13, CI = [−0.79, −0.56]) and in strong concordance with the gold standard psychiatric interview (KSADS) in distinguishing between patients with depression and healthy volunteers (the mean initial mood of healthy volunteers was also significantly higher than that of depressed participants: t = −3.36, df = 69, p = 0.0012, Cohen’s d effect size = 0.97).
The two principal models, Recency and Primacy, are described in Figure 1B. Both models consider a cumulative and discounted impact of the expectation term on mood, as shown in Figure 1B, Equation 1 (Equations 6–8 in Materials and methods provide the complete formulation of these models). The Recency model represents the standard models applied in computational accounts of mood in such setups: expectation is defined as the average of the two gambling values in the current trial (Figure 1B, Equation 2). By contrast, the Primacy model is our hypothesized account of mood in such setups: expectation is defined as a weighted average of all previous outcomes (Figure 1B, Equation 3). The critical difference between the two models is illustrated by their different theoretical scaling curves for the influence that events have on mood across the task. As can be appreciated, the Recency model places an emphasis on the most recent trials, whereas the Primacy model emphasizes the early ones. The stronger weight of earlier outcomes in the Primacy model emerges from two separate aspects of the model: first, one’s expectation for the next outcome is based on the average of all previously received outcomes; second, mood is determined by the sum of all such past expectations (see Figure 1—figure supplement 1 for a graphical illustration of these two aspects). The Primacy model is therefore distinct from the prior Recency model because it defines expectation on the basis of the full history of events. We should note, though, that because the RPEs are recency weighted and are calculated from the difference between the (primacy-weighted) expectation and the actual outcome, recency-weighted outcomes still influence mood in the Primacy model.
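The contrast between the two weighting schemes can be sketched in code. This is a minimal, illustrative implementation, not the paper’s exact formulation (Equations 6–8 in Materials and methods give that): the parameter values are hypothetical, and the Primacy expectation is shown as a simple, unweighted running average of past outcomes.

```python
def recency_expectation(high, low):
    # Recency model: expectation is the average of the current trial's
    # two gamble values (cf. Figure 1B, Equation 2)
    return (high + low) / 2.0

def primacy_expectation(past_outcomes):
    # Primacy model: expectation is an average over all previous outcomes
    # (cf. Figure 1B, Equation 3; shown unweighted here for simplicity)
    if not past_outcomes:
        return 0.0
    return sum(past_outcomes) / len(past_outcomes)

def mood(expectations, rpes, beta0=0.5, beta_e=0.1, beta_r=0.1, gamma=0.7):
    # Both models sum discounted expectation and RPE terms (cf. Equation 1);
    # gamma < 1 discounts older trials within the sums
    t = len(expectations)
    m = beta0
    for j in range(t):
        w = gamma ** (t - 1 - j)  # discount weight of trial j at trial t
        m += beta_e * w * expectations[j] + beta_r * w * rpes[j]
    return m
```

Because the Primacy expectation at every trial already averages over the whole history, early outcomes keep re-entering the mood sum through every subsequent expectation term, which is what produces the primacy weighting.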
Primacy versus variants of the Recency model
Next, we tested the Primacy model performance against several alternative variants of the Recency model.
First, we addressed the learning component implemented in the Primacy model by creating a Recency model with learning from previous trials (termed the ‘Recency with dynamic win probability model’). This model considered the actual individual winning probability in the expectation term, instead of a fixed win probability of 50% (using a trial-level individual winning probability derived from the percentage of previous win trials).
Specifically, the win probability was formulated as follows:

$$p_{win}(t)=\frac{5+\sum_{i=1}^{t-1}I_{i}}{10+\sum_{i=1}^{t-1}G_{i}}$$

where the sum in the numerator counts the previous trials on which the outcome was the higher value H (I is a binary vector of this condition, with 1 for outcome H and 0 for the lower outcome L) and the sum in the denominator counts the previous trials on which the participant chose to gamble (G is a binary vector of this condition, with 1 for the choice to gamble and 0 for the choice of the certain value). The additional bias of 5 in the numerator and 10 in the denominator implements Bayesian shrinkage corresponding to 10 prior observations with an average success rate of 0.5. The expectation term was modified accordingly:

$$E_{t}=p_{win}(t)\cdot H_{t}+\left(1-p_{win}(t)\right)\cdot L_{t}$$
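In code, the shrunken win-probability estimate is a one-liner; this sketch is illustrative only (the function and variable names are ours, not the paper’s):

```python
def win_probability(I, G):
    # I: 1 where a past gamble paid the higher value H, else 0
    # G: 1 where the participant chose to gamble, else 0
    # The +5 / +10 bias shrinks the estimate toward 0.5, acting like
    # 10 prior observations with a 50% success rate
    return (5 + sum(I)) / (10 + sum(G))
```

With no gambles yet, the estimate starts at 5/10 = 0.5 and moves toward the empirical win rate as trials accumulate.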
We then addressed the elimination of the Certain term in the Primacy model by testing it against a ‘Recency without the Certain outcome’ model.
Additionally, we addressed the unique feature of the Primacy model where individuals use experienced outcomes to generate their expectations. We therefore compared the Primacy model to a Recency model where the expectation term is based on the previous outcome rather than current trial’s gamble values (the ‘Recency with outcome as expectation model’).
Next, we compared the Primacy model to a Recency model that merges the dynamic win probability and the elimination of the Certain term modifications, which is the most similar Recency model to the Primacy model (termed the ‘Recency with both dynamic win and no Certain model’).
See Supplementary file 2 for the formulation of these alternative Recency models.
Primacy versus Recency models comparison criteria
We started with two main criteria to compare the models. The first was a training error: the mean squared error (MSE) of fitting the model to participants’ mood ratings. The second was a streaming prediction error: a within-subject prediction of each mood rating using the preceding mood ratings (with the first 10 mood ratings discarded, as we found this criterion to be unstable in the first trials due to fewer available data points). A model performed better if it had a significantly smaller error between predicted and rated mood values on these criteria, as tested across participants with a one-sided Wilcoxon signed-rank test at p<0.05 (testing the null hypothesis that two related paired samples come from the same distribution). We chose a one-sided null because the conservative null would be that the new approach is equal to or worse than the existing approach. Moreover, we used leave-out sample validation and independent confirmatory datasets in all model comparisons.
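The streaming prediction criterion can be sketched as follows; `fit` and `predict` are hypothetical stand-ins for any mood model’s fitting and prediction routines, and the burn-in of 10 ratings matches the discard described above:

```python
def streaming_prediction_error(features, ratings, fit, predict, burn_in=10):
    # For each rating after the burn-in, train only on the preceding
    # ratings and predict the next one (within-subject, forward in time)
    sq_errors = []
    for t in range(burn_in, len(ratings)):
        params = fit(features[:t], ratings[:t])   # past data only
        pred = predict(features[t], params)       # held-out next rating
        sq_errors.append((pred - ratings[t]) ** 2)
    return sum(sq_errors) / len(sq_errors)
```

Unlike the training error, every prediction here is made before the rating is seen, so a model cannot reduce this criterion by overfitting.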
We then performed a model recovery assessment to validate the model selection criteria by which we compared the performance of the Primacy and Recency models. We first generated simulated datasets using each of the models, then fit both models and tested whether we could correctly identify the model that generated the data. According to both the training error and the streaming prediction criteria, it was possible to recover the true model from the simulated data (the Recency model performed better on data simulated with the Recency model, and vice versa; see Supplementary file 1). We subsequently used the streaming prediction error across all comparisons, as it is the more valid criterion given that the training error favors overfitting (Hastie et al., 2009).
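The model-recovery logic reduces to a confusion matrix over generating and fitted models; `simulate` and `score` below are hypothetical stand-ins for the actual data-generation and error-scoring steps:

```python
def model_recovery(models, simulate, score, n_sims=100):
    # confusion[gen][best] counts how often `best` wins the comparison
    # criterion on data generated by `gen`; successful recovery puts
    # most of the mass on the diagonal
    confusion = {g: {m: 0 for m in models} for g in models}
    for gen in models:
        for _ in range(n_sims):
            data = simulate(gen)
            best = min(models, key=lambda m: score(m, data))
            confusion[gen][best] += 1
    return confusion
```

A criterion is validated when, for every generating model, the diagonal entry dominates its row.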
Primacy versus Recency models across different reward environments
Here, we compare the models and assess their validity across differently structured reward environments.
Random environment
In order to generate a random reward environment, we used the standard probabilistic task (Rutledge et al., 2014) as described above, where the RPE values were drawn randomly from a predefined range of values (Figure 2A). As shown in Figure 2B (left panel), this elicited mood fluctuations in keeping with previous results (the panel presents the mean across n = 60 participants, with a significant effect of linear time, but not squared time, on mood in a linear mixed-effects model: ${\beta}_{time}$ = 0.31, SE = 0.11, p = 0.006; ${\beta}_{{time}^{2}}=$ 0.0009, SE = 0.002, p>0.05, and mood change effect size [mean ± SD] = 0.93 ± 1.70).
Comparing the Primacy versus the different Recency models, we found that the Primacy model outperformed each of the Recency models on the streaming prediction criterion (see Figure 3 and Table 1 for model performance comparison and Figure 3—figure supplement 1 for the distributions of the fitting coefficients of each of the tested models in this environment).
Structured environment
In order to generate an environment that has some structure (consistently positive or negative events), we modified the probabilistic task as shown in Figure 2A (middle panel): RPE values were divided into three blocks, one of positive RPEs (+5), the second of negative RPEs (–5), and a third block of positive RPEs again (+5). We found that the experimental setup leads to substantial fluctuations in mood as can be seen in the middle panel of Figure 2B (the panel presents the mean across n = 89 participants, with a significant effect of time, both linear and squared, on mood in a linear mixed-effects model: ${\beta}_{time}$ = 0.02, SE = 0.006, p<0.0001; ${\beta}_{{time}^{2}}$ = 0.0004, SE = 0.0001, p = 0.009, and an effect size per block of [mean ± SD]: 0.56 ± 1.90, –1.42 ± 1.42, –0.55 ± 0.55, for the first, second, and third blocks, respectively).
Comparing the Primacy versus the different Recency models, we found that the Primacy model outperformed each of the Recency models on the streaming prediction criterion (see Figure 3 and Table 1 for model performance comparison and Figure 3—figure supplement 1 for the distributions of the fitting coefficients of each of the tested models in this environment).
Structured-adaptive environment
Since there can be substantial individual differences in response to events in the environment, we developed a third, adaptive task that tracks individual performance and modifies the environment accordingly. In this paradigm, RPE values were not predefined but were modified in real time, in an individualized manner, by a proportional-integral (PI) controller (Levine, 2011) to enhance their potential positive or negative influence on mood over time (see the rightmost panel of Figure 2A for the average RPE values across all n = 80 participants). This task also consisted of three blocks of RPE values, pushing mood towards the highest mood value in the first block, the lowest in the second, and the highest again in the third. We found that this experimental setup led to the largest changes in mood, as can be seen in the rightmost panel of Figure 2B (a significant effect of time, both linear and squared, on mood in a linear mixed-effects model: ${\beta}_{time}$ = 0.04, SE = 0.004, p<0.0001; ${\beta}_{{time}^{2}}$ = 0.001, SE = 0.0001, p<0.0001, and an effect size per block of [mean ± SD]: 0.92 ± 1.60, –1.75 ± 1.10, 1.45 ± 1.70, for the first, second, and third blocks, respectively).
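The closed-loop adjustment can be illustrated with a textbook PI update; this is a sketch only, with hypothetical gains and an assumed mood scale, not the controller actually used in the task:

```python
def next_rpe(mood_history, target, kp=2.0, ki=0.5):
    # Proportional term: distance of the latest mood rating from the
    # block's target (e.g., the top or bottom of the mood scale)
    error = target - mood_history[-1]
    # Integral term: accumulated distance over the block so far
    integral = sum(target - m for m in mood_history)
    # The controller output sets the next trial's RPE magnitude, growing
    # when mood resists moving toward the block's target
    return kp * error + ki * integral
```

A participant whose mood barely responds accumulates error in the integral term, so the task automatically escalates RPE intensity for that individual, which is the stated purpose of the adaptive design.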
Comparing the Primacy versus the different Recency models, we found that the Primacy model outperformed each of the Recency models on the streaming prediction criterion (see Figure 3 and Table 1 for model performance comparison and Figure 3—figure supplement 1 for the distributions of the fitting coefficients of each of the tested models in this environment).
Next, we tested the performance of the Primacy model versus the Recency models across different age groups (which also entailed different experimental conditions) and in depressed participants.
Primacy versus Recency models across different age groups
We found no differences in the strength of the mood changes by age group (no significant group effect on mood in a linear mixed-effects model: F(152,1) = 2.35, p = 0.12, between an adult sample [n = 80] with mean age ± SD = 37.76 ± 11.23 and an adolescent sample [n = 72] with mean age 15.49 ± 1.48). The two datasets differed not only in age but also in experimental conditions: the adult sample was collected online (see Materials and methods for details and the preregistered analysis link), whereas the adolescent sample was collected in the lab, in an fMRI scanner.
We found that the Primacy model outperformed the different Recency models in both the online adult and the lab-based adolescent samples (see Figure 3 and Table 1 for model performance comparison and Figure 3—figure supplement 1 for the distributions of the fitting coefficients in each age group).
Primacy versus Recency models across different diagnostic groups
We found no differences in the strength of the mood changes between the healthy and depressed adolescent participants (when controlling for the difference in baseline mood, there was no significant group effect on mood in a linear mixed-effects model: F(70,1) = 0.77, p = 0.38; between healthy participants [n = 29] with a mean ± SD depression score [MFQ] of 1.84 ± 2.49, versus participants diagnosed with major depressive disorder [n = 43] with a mean depression score of 8.31 ± 6.27; an MFQ score of 12 or higher is the cutoff indicating depression).
Comparing the Primacy model versus the different Recency models in the depressed adolescent sample, we found that the Primacy model outperformed each of the Recency models (see Figure 3 and Table 1 for model performance comparison). Moreover, we confirmed the superior performance of the Primacy model in adult participants at high risk for depression as well (n = 28 participants with CESD scores above 16, the cutoff for high risk for depression, showed significantly lower MSEs for the Primacy against the Recency model in a Wilcoxon test with p<0.001).
For completeness, we also tested the Primacy model against models with other weighting of past events. Figure 3—figure supplement 2 presents a model in which we added to the expectation term a decay parameter and a parameter for how many previous outcomes are included, resulting in various possible scaling curves for the influence of previous events (Equations S1 and S2). Comparing five such alternative models showed that the Primacy model outperformed these models too (significantly lower streaming prediction errors for the Primacy model in a Wilcoxon test with p<0.001).
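The extended family of weighting curves can be sketched with a decay parameter `d` and a window `t_max`; this parameterization is illustrative of the idea behind Equations S1 and S2, not their exact form:

```python
def windowed_expectation(outcomes, d, t_max):
    # Average of the last t_max outcomes, with outcome age a (0 = most
    # recent) down-weighted by d**a; d = 1 recovers an equal-weight
    # average over the window, while small d approaches pure recency
    recent = outcomes[-t_max:]
    n = len(recent)
    weights = [d ** (n - 1 - i) for i in range(n)]  # i = 0 is oldest in window
    total = sum(weights)
    return sum(w, * (1,)) if False else sum(
        w * x for w, x in zip(weights, recent)) / total
```

Sweeping `d` and `t_max` generates the various scaling curves for the influence of previous events against which the Primacy model was compared.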
See also Figure 3—figure supplement 3 for a demonstration of the change over time in task values and the Primacy versus the Recency model parameters, shown both for a single participant and on average across the group.
Primacy versus Recency models in relation to brain responses
Finally, we compared the Primacy and the Recency models on the basis of their relationship to brain activity measured using fMRI. To this end, participants were scanned whilst completing the structured-adaptive version of the task. We correlated BOLD signal with the participant-level weights of the parameters of the Primacy model and two of the Recency models (the original Recency model and the Recency model most similar to the Primacy model, i.e., the one with both dynamic win probability and no Certain term). We found that neural activity preceding the mood rating phase (Figure 4A) was significantly correlated with the Primacy model expectation term (${\beta}_{E}$), which reflects the relationship between mood and previous events (Figure 4B, cluster at the anterior cingulate cortex [ACC] and ventromedial prefrontal cortex [vmPFC] regions, n = 56, peak beta = 44.80, t = 3.37, p = 0.0017; this corresponds to a corrected p<0.05, obtained first by using 3dClustSim in the AFNI software [Analysis of Functional NeuroImages] with an autocorrelation function [ACF], yielding p = 0.005 and a minimal cluster size of 100 voxels, followed by a Bonferroni correction for the three compared models: p = 0.005/3 = 0.0017). By contrast, the individual parameters of both Recency models showed no significant relation to neural activity. To formally compare the two models in their relationship to brain activity, we contrasted the two voxel-wise correlation images, that is, the correlation of the BOLD signal across participants with the Primacy model individual weights ${\beta}_{E}$ (coefficients of the expectation term) versus its correlation with the Recency models’ ${\beta}_{E}$. This showed a significantly stronger relation of the Primacy model to neural activity, specifically in the ACC and vmPFC regions (Figure 4C, t = 5.00, using the corrected threshold of p = 0.0017).
This result provides a possible neural underpinning specific to the Primacy model’s mathematical realization of expectations and mood. Additionally, mood ratings were correlated with the preceding neural activity level in the striatum during the structured-adaptive task (Figure 4—figure supplement 1), in congruence with previous accounts of the relation of mood to striatal activity in the random task design.
Discussion
A fundamental assumption about how humans report on their mood is that they integrate over the history of their experiences. In this paper, we sought to test this assumption and establish the temporal structure of this integration.
We show that when humans report on their momentary mood, they do indeed integrate over past events in the environment. However, we find no support for the commonly assumed recency model of integration, that is, for the assumption that most recent events matter the most in this integration. Instead, we find several lines of evidence to support a primacy model of the integration of past experiences in mood reports.
The first line of evidence comes from comparing the primacy, the recency, and several other models in a probabilistic task that has been widely used in the past to influence subjective mood reports. This version of the probabilistic task presents participants with random RPE values (Rutledge et al., 2014). The model comparison, conducted using errors of within-subject prospective prediction of consecutive mood ratings, indicated that a primacy model, that is, a model that places greater and longer-lasting emphasis on early events, best described mood ratings. The Primacy model also outperformed several other plausible models, including an extended model that allowed other timings of previous events to be most influential, as well as models resulting from modifications of the Recency model (where terms were excluded or replaced by alternative task values).
We then sought to test whether our findings generalized beyond a random reward environment. We did so in order to emulate real-life situations where negative or positive events tend to cluster over periods of time, such as a conversation between two people that can include consistently pleasant (or unpleasant) events throughout the interaction time frame. We did this by adapting the probabilistic task in two different ways. First, by introducing blocks of predetermined consecutively negative or positive RPEs. In this structured environment, the Primacy model outperformed the alternative Recency models. Second, by introducing a PI control algorithm. This created a structured-adaptive task, with blocks of consecutively negative or positive RPEs that were tailored in real time to each individual’s initial mood level and mood response, to maximize the influence of these events (much as we modify our tone of speech in real time during a conversation, e.g., according to who we are interacting with and the response we aim for). The Primacy model clearly outperformed the alternative recency models in this task as well.
It is conceivable that what appears to be a primacy effect is actually due to longer-lasting effects of, say, positive RPEs on mood—this could be particularly exacerbated in the structured-adaptive task. However, our result was also robust in the random task design, as well as when testing a model with a varying time window parameter that considered different numbers of previous trials (t_{max}, see Figure 3—figure supplement 2); we therefore find no evidence that the block valence order accounts for the better performance of the Primacy model. In addition, the interaction between individual behavior and the controller in the structured-adaptive environment could raise interpretative difficulties. It is therefore important that the Primacy model also fit better in the structured and random tasks, where the task did not respond to individual differences in responsiveness to RPEs. Still, it is possible that this interaction contributed to the advantage of the Primacy model over the Recency models being greater in the structured-adaptive task. There may also be additional mechanisms at play in the structured-adaptive task, such as hedonic extinction towards RPEs, that explain some of the increased performance of the Primacy model compared to the Recency model in this task.
We then also sought to test whether our findings generalized across two important variables. One such variable is age. Substantial evidence shows that adolescence is a time when levels of selfreported mood can change dramatically, for example, through overall increases in the levels of depression (Heller and Casey, 2016; Maciejewski et al., 2015; Ronen et al., 2016; Stringaris and Goodman, 2009). Also, adolescence marks a time when reward processing appears to be different to that of adulthood with reported increases in the sensitivity of mood (Braams et al., 2015; Casey et al., 2010; Heller and Casey, 2016; Kayser et al., 2015; Somerville et al., 2010; Walker et al., 2017). Our primacy model fitted better than the recency alternative in both adolescent and adult samples.
The other variable is participants’ depression. Its importance is twofold. First, there is the self-evident fact that persons with depression report a lower mood than non-depressed persons; this is the case in clinical settings but also in the experimental task setting that we and others have used (Rutledge et al., 2017). This difference in mean scores could reflect a different way in which persons with depression report on their mood; in keeping with this, their integration of experiences could follow a different temporal structure. The second reason is that persons with depression are thought to display reward processing aberrations (Keren et al., 2018; Ng et al., 2019; Stringaris et al., 2015; Whitmer et al., 2012)—for example, being less sensitive to rewards or learning less from them—that could affect the way in which they integrate across environmental experiences. We addressed this question specifically in adolescents, given the sharp increase in depression incidence at that age (Beesdo et al., 2009), and found no evidence that depressed adolescents applied a different model from that of their non-depressed counterparts. Moreover, the Primacy model was also the better account of mood reports in adult participants with high depression scores. These findings strongly suggest that the temporal representation of experiences offered by the model is robust to important personal characteristics.
We then formally compared the relation of brain activation to the Primacy versus the Recency mood models. We linked the Primacy model to neural activity through correlations between the model parameters and neural activation at the time preceding mood ratings. Specifically, we show that individual activation in the ACC and vmPFC correlated with the weight of the expectation parameter of the Primacy model, but not of the Recency model. These regions are implicated in mood regulation (Bush et al., 2000; Etkin et al., 2015; Etkin et al., 2011; Hiser and Koenigs, 2018; Rudebeck et al., 2014; Stevens et al., 2011; Zald et al., 2002) and in decision making relative to previous outcomes (Behrens et al., 2007; Scholl et al., 2017; Scholl et al., 2015; Wittmann et al., 2016). Activity in these regions was higher in individuals with a larger weight on the expectation parameter (${\beta}_{E}$). Since the weight of this parameter determines the influence of previous outcomes on mood, this result suggests that activity in these regions plays a role in mediating the integration of previous outcomes into a subjective mood report. The strength of this model-based fMRI analysis (Cohen et al., 2017; O'Doherty et al., 2007), therefore, is in allowing us to link neural signals to the computational relation between previous experiences and subjective mood reports.
Importantly, Vinckier et al., 2018 also reported that activity in the vmPFC region correlates positively with changes in mood ratings, supporting the role that our model suggests for the vmPFC in mediating mood ratings according to a primacy weighting of previous events. We did not find, however, the negative neural correlations with mood ratings that were also reported by that study.
Our experiments examine only a short span of time, no longer than 40 min. Human experiences are undoubtedly integrated over longer periods, including temporally distant events in childhood. Whilst these are inherently difficult to model experimentally, it is noteworthy that early-life experiences, such as early adversity, are thought to exert long-term influences on mood (Douglas et al., 2010; Lewis-Morrarty et al., 2015; Raby et al., 2015). We also note that the time scales of our experiments are comparable to a number of real-life situations, in both research and clinical terms.
In research terms, self-reported mood in EMA is typically collected within the span of hours (Kahneman et al., 2004; Larson et al., 1980; Taquet et al., 2020). Given that the goal of EMA is often to uncover mood dynamics in relation to experiences in the environment, our results strongly indicate that explicitly modeling the relative timing of these two variables may be crucial. Similarly, during fMRI and other scanning sessions, researchers often ask participants to report on their mood and relate these reports to neuroimaging results. Our results suggest that not just the value of events as such (whether, e.g., an aversive film was shown to participants), but also when they occurred, may differentially impact such reports.
In terms of clinical events, such as patients’ interactions with healthcare professionals for psychotherapy or medication treatment, these typically last about an hour. Importantly, the assessment of treatment progress relies on self-report (or clinician assessment of patients’ reports). Our results suggest that the timing of such reports in relation to experiences during treatment could be an important source of variance.
Moreover, although our experiments test different temporal structures of reward, they use a single type of task, a simple gambling decision task. The temporal structure of mood dependence might be sensitive to the type of task or context (e.g., social situations such as a conversation may involve a different integration), which is an important matter for future studies. Another potential caveat relates to the online data collection using the Amazon Mechanical Turk (MTurk) platform, which can possibly involve a different subset of participant characteristics (Ophir et al., 2020); importantly, our results were robust to this difference and were well replicated in our lab-based participants. With respect to the Primacy model’s characteristics, we aimed to minimize the divergence from the existing Recency model (we therefore changed the expectation term to consider the average of all previous outcomes but maintained the sum and the overall exponential discounting of that term). This computational modification reflected our hypothesis that, in non-random temporal structures of reward, expectations would also be influenced by the history of previous outcomes. The modification then resulted in a Primacy weighting. Nevertheless, the better performance of a Primacy weighting was consistent also when considering other formulations for weighing previous events (and without taking into account the fewer parameters of the Primacy model).
Our results demonstrate that the Primacy model is superior to the Recency model, indicating that the full history of prior events influences mood. However, the inclusion of recency-weighted outcomes in the RPE term of the Primacy model prevents us from concluding simply that early events matter more than recent events for the ultimate outcome of self-reported mood. We therefore also note that, when fitting the Primacy model, the coefficients of the expectation term were significantly larger than the coefficients of the RPE term (which includes recency-weighted outcomes), supporting the dominance of the expectation term’s primacy weighting (paired t-test, t = 2.6, p = 0.009, CI = [0.008, 0.059]).
Additionally, there may be alternative, mathematically equivalent formulations of these models that would support different interpretations. Future work should compare the overall impacts of primacy and recency effects on mood with approaches robust to reparameterization, such as analysis of the causal effect of previous outcomes on mood using the potential-outcomes framework (Imbens and Rubin, 2015).
Overall, our conclusion that the effect of outcomes on mood through expectations has a primacy weighting in our tasks holds robustly when we consider a variety of different but similar models with either primacy weighting (Figure 3—figure supplement 2) or recency weighting (Table 1). All the models with primacy weighting share the property that the expectation is based on an average over previous outcomes or potential outcome values. We stress that the expectation itself does not have to have primacy weighting for our conclusions to hold. The model we chose as our representative Primacy model (because its performance was superior or statistically indistinguishable relative to the alternative primacy models) applies equal weights to all past outcomes to form the expectation, but we also tried models where the weighting within the expectation gave higher weights to more recent outcomes. In all these cases, the combination of current and past expectations still results in a primacy-weighted aggregate effect of previous outcomes on mood. The dependence of mood on an accumulation of previous expectations is therefore what causes the primacy weighting, as initial outcomes enter every subsequent expectation term and thereby acquire a larger aggregate influence on mood. In an intuitive sense, the primacy effect represents the greater weight first experiences have in a new environment or context, simply by virtue of coming first. The first event has nothing against which it can be compared; the second can be compared only against the first; the third only against the first two, and so on, until eventually each additional event has a minimal impact in the face of all the events that have come before. The more trials we experience, the more information we gain, and the less meaning each event has on its own. This process has clear parallels to learning, but our models are agnostic to the exact mechanism by which expectations are accumulated.
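This intuition can be checked numerically. If the expectation averages all previous outcomes, then outcome $A_i$ contributes with weight $1/(j-1)$ to every subsequent expectation $E_j$, each discounted by $\gamma^{t-j}$; summing these contributions gives the aggregate weight of $A_i$ on mood at trial $t$. A small illustrative sketch (the function name, $\gamma$ value, and trial indices are our choices, not the authors' code):

```python
import numpy as np

def expectation_weight(i, t, gamma=0.9):
    """Aggregate weight of outcome A_i on mood at trial t via the
    accumulated expectation terms: A_i enters every later expectation
    E_j (j > i) with weight 1/(j - 1), discounted by gamma^(t - j).
    Indices are 1-based, matching the equations; illustrative only."""
    j = np.arange(i + 1, t + 1)
    return float(np.sum(gamma ** (t - j) / (j - 1)))

# An early outcome feeds many subsequent expectations, so its aggregate
# influence on later mood exceeds that of a recent outcome:
w_early = expectation_weight(1, 30)   # the very first outcome
w_late = expectation_weight(25, 30)   # a recent outcome
```

Under these assumptions `w_early` exceeds `w_late`, reproducing the primacy-weighted aggregate effect described above without the expectation itself being primacy weighted.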
It is likely that there are equivalent formulations to our models in which expectation is a learned parameter controlled by a learning rate. The details of this mechanism are certainly of interest, but these will need to be elucidated by future studies.
More generally, our findings point to the importance of studying the temporal architecture of the interplay between experiences and mood. So far, computational and theoretical accounts of mood have focused on event value (in terms of expectations, outcomes, or both) (Eldar et al., 2016; Katsimerou et al., 2014; Russell et al., 1989; Rutledge et al., 2014; Vinckier et al., 2018; Watson and Tellegen, 1985) as the influence on subjective reports of well-being, while neglecting the importance of time. Our results suggest that, in addition to these influential properties of the environment, the dimension of time, that is, the temporal structure of previous events, also plays an important role, and that rather than being a matter of what happened most recently, the temporal representation of experience in mood seems to be dominated by a long-lasting effect of early events.
Materials and methods
1. The Primacy versus Recency mood models
The formulation of both models consisted of two dynamic terms: the expectation term (E) and the RPE term (R), which is the difference of the outcome relative to the expected value.
Specifically, the Recency model of mood at trial t (M_{t}) was defined as

$$M_t = M_0 + \beta_C \sum_{j=1}^{t} \gamma^{t-j} C_j + \beta_E \sum_{j=1}^{t} \gamma^{t-j} E_j + \beta_R \sum_{j=1}^{t} \gamma^{t-j} R_j + \epsilon_t$$

where $\epsilon_t$ is a random noise variate with some unknown distribution (we may assume it to be normal with mean 0 and standard deviation $\sigma$), ${M}_{0}$ is the participant’s baseline mood, $\gamma \in \left(0,1\right)$ is an exponential discounting factor, C_{j} is the non-gamble certain amount at trial j (if not chosen then C_{j} = 0, and when chosen instead of a gamble then E_{j} = R_{j} = 0), ${\beta}_{C}$ is the participant’s sensitivity to certain rewards during non-gambling trials, ${\beta}_{E}$ is the participant’s sensitivity to expectation, and ${\beta}_{R}$ is the sensitivity to surprise during gambles.
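To make the model concrete, here is a minimal sketch of the Recency model's mood prediction, computing the exponentially discounted sums over per-trial certain amounts, expectations, and RPEs (function and variable names are ours, not the authors' released code):

```python
import numpy as np

def recency_mood(M0, beta_C, beta_E, beta_R, gamma, C, E, R):
    """Recency-model prediction of mood at every trial t:
    M_t = M0 + sum_j gamma^(t-j) * (beta_C*C_j + beta_E*E_j + beta_R*R_j),
    where C, E, R hold per-trial certain amounts, expectations, and RPEs."""
    C, E, R = map(np.asarray, (C, E, R))
    T = len(C)
    mood = np.empty(T)
    for t in range(T):
        w = gamma ** (t - np.arange(t + 1))   # exponential discounting of past trials
        mood[t] = M0 + np.dot(w, beta_C * C[:t + 1]
                                 + beta_E * E[:t + 1]
                                 + beta_R * R[:t + 1])
    return mood
```

Per the definitions above, a non-gambled trial carries only its C term (E = R = 0 there), and a gambled trial only its E and R terms (C = 0).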
In this model, the expectation term at trial t (E_{t}) was defined as the average of the two gamble values (see Figure 1, Equation 2), and the RPE term R was defined as

$$R_t = A_t - E_t$$

with A_{t} being the trial outcome.
In the Primacy model, the expectation term was replaced by the average of all previous outcomes (Figure 1, Equation 3), and R was defined similarly, as shown in Equation 7. The overall Primacy model for mood at trial t was

$$M_t = M_0 + \beta_E \sum_{j=1}^{t} \gamma^{t-j} E_j + \beta_R \sum_{j=1}^{t} \gamma^{t-j} R_j + \epsilon_t, \qquad E_j = \frac{1}{j-1} \sum_{i=1}^{j-1} A_i$$

(with $E_1 = 0$ for the first trial), where ${\beta}_{E}$ and ${\beta}_{R}$ are the participant’s sensitivity to previous outcomes and to how surprising these outcomes are relative to expectation, respectively. Note that this model performed better when we did not distinguish between gambling and non-gambling trials, which was another divergence from the standard Recency model.
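A corresponding sketch of the Primacy model, with the expectation at each trial taken as the running average of all previous outcomes (again an illustrative reimplementation under our naming conventions, with E set to 0 on the first trial):

```python
import numpy as np

def primacy_terms(outcomes):
    """Per-trial expectation and RPE under the Primacy model:
    E_t is the average of all outcomes before trial t (0 for the first
    trial), and R_t = A_t - E_t."""
    A = np.asarray(outcomes, dtype=float)
    E = np.zeros_like(A)
    E[1:] = np.cumsum(A)[:-1] / np.arange(1, len(A))  # running mean of past outcomes
    R = A - E
    return E, R

def primacy_mood(M0, beta_E, beta_R, gamma, outcomes):
    """M_t = M0 + sum_j gamma^(t-j) * (beta_E*E_j + beta_R*R_j)."""
    E, R = primacy_terms(outcomes)
    T = len(E)
    mood = np.empty(T)
    for t in range(T):
        w = gamma ** (t - np.arange(t + 1))   # same exponential discounting
        mood[t] = M0 + np.dot(w, beta_E * E[:t + 1] + beta_R * R[:t + 1])
    return mood
```

Note that, as in the equations, the only structural change from the Recency sketch is the content of the expectation term; the discounted sum over trials is unchanged.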
2. Model fitting
All models were fit using a TensorFlow package (code can be downloaded using the link provided in Section 6). We chose group regularization constants by creating simulated datasets with realistic parameters and selecting the regularization parameters from a grid that had the best performance. The grid consisted of powers of 10 from 0.001 to 10,000.
For optimization, we used the following generic parametric model across subjects:

$$M_s(t) = \mu_s + \sum_{v=1}^{p} \beta_{v,s} \sum_{j=1}^{t} \gamma_s^{t-j} X_{v,s}(j) + \epsilon_{s,t}$$

where s indexes the subject, M_{s}(t) is the subject’s mood rating at trial t, µ_{s} is the subject-specific baseline mood, v indexes the p time-varying task variables X (e.g., expectation or RPE values at each trial j), and β_{v,s} are subject-specific coefficients for each time-varying variable X_{v,s} (note that we constrain β_{1},…,β_{3} ≥ 0).
To facilitate optimization, we further reparameterized the discount factor γ_{s} by defining

$$\gamma_s = \frac{1}{1 + e^{-\xi_s}}$$

so that ${\xi}_{s}$ is an unbounded real number.
We found that the use of group-level regularization was necessary in order to stabilize the estimated coefficients. This took the form of imposing a variance penalty on $\xi$ and on each coefficient $\beta_v$. The empirical variance is defined as

$$\mathrm{Var}(\xi) = \frac{1}{n_s} \sum_{s=1}^{n_s} \left(\xi_s - \bar{\xi}\right)^2$$

where n_{s} is the number of subjects, and $\bar{\xi}$ is the group mean:

$$\bar{\xi} = \frac{1}{n_s} \sum_{s=1}^{n_s} \xi_s$$

We define $\mathrm{Var}(\beta_v)$ for v = 1,…,p likewise.
The objective function is therefore

$$\sum_{s=1}^{n_s} \sum_{t \in T} \left(M_s(t) - \hat{M}_s(t)\right)^2 + \lambda_{\xi}\,\mathrm{Var}(\xi) + \lambda_{\beta} \sum_{v=1}^{p} \mathrm{Var}(\beta_v)$$

where $\hat{M}_s(t)$ is the model prediction and T is the set of trials where M_{s}(t) was defined (optionally, one can also discard the first few trials in T to minimize window effects; we required t ≥ 11), with λ_{ξ} = 10 and λ_{β} = 100 as the regularization parameters with the best performance in recovering the simulation ground truth for both models.
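The pieces of the fitting procedure can be sketched as follows: the sigmoid mapping keeps the discount factor in (0, 1) while optimizing an unbounded ξ, and the loss adds group variance penalties to the squared prediction error. This is our illustrative reimplementation (the released TensorFlow code may differ in detail):

```python
import numpy as np

def gamma_from_xi(xi):
    # gamma_s = 1 / (1 + exp(-xi_s)): maps unbounded xi to (0, 1)
    return 1.0 / (1.0 + np.exp(-xi))

def group_objective(residuals, xi, betas, lam_xi=10.0, lam_beta=100.0):
    """Group-regularized least-squares loss: summed squared mood-prediction
    errors plus empirical-variance penalties on xi and on each coefficient
    column of betas.
    residuals: list of per-subject residual arrays (M_s(t) - prediction);
    xi: array of shape (n_subjects,); betas: shape (n_subjects, p)."""
    sse = sum(float(np.sum(r ** 2)) for r in residuals)
    var_xi = np.var(xi)                      # empirical (population) variance
    var_beta = np.var(betas, axis=0).sum()   # sum of Var(beta_v) over v
    return sse + lam_xi * var_xi + lam_beta * var_beta
```

The default `lam_xi=10.0` and `lam_beta=100.0` mirror the λ values reported above; the non-negativity constraints on the β coefficients would be enforced by the optimizer and are omitted here.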
A held-out validation approach was used in all model fitting: models were first fit on a randomly selected subsample of 40% of participants, and the results were then confirmed on the entire sample.
3. Testing the Primacy model across reward environments
The random task
Participants played a gambling task in which they experienced a series of different RPE values while rating their mood every 2–3 trials. Each trial consisted of a choice of whether to gamble between two monetary values or receive a certain amount. RPE values were randomly varied (ranging from −2.5 to +2.5) by assigning random values to the two gambles and a 50% probability of receiving one of these values as the outcome. The certain value was the average of the two gamble values. Specifically, each trial consisted of three phases: (1) gamble choice: 3 s during which participants pressed left to take the certain value or right to gamble between the two values (using a four-button response device); (2) expectation: only the chosen certain value or the two gamble options remained on the screen for 4 s; and (3) outcome: feedback of the outcome value was presented for 1 s, followed by an inter-trial interval of 2–8 s. Participants completed 81 trials.
The mood rating consisted of two separate phases: (1) a pre-rating mood phase, where the mood question ‘How happy are you at this moment?’ was presented for a random duration between 2.5 and 4 s while the option to rate mood was still disabled; and (2) mood rating, by moving a cursor along a scale labeled ‘unhappy’ on the left end and ‘happy’ on the right end. Each rating started from the center of the scale, and participants had a time window of 4 s to rate their mood. The cursor could be moved smoothly by holding down a single button press towards the left or the right. Each rating was followed by a 2–8 s jittered interval. Participants completed 34 mood ratings, and the overall task lasted 15 min.
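The random task's trial generation can be sketched as follows. Only the RPE range, the 50% outcome rule, and the certain value being the gamble average come from the text; the range used for the gamble average is our illustrative choice:

```python
import numpy as np

def random_trial(rng):
    """One random-task trial (illustrative sketch): draw an RPE uniformly
    in [-2.5, +2.5], build two gamble values symmetric around a random
    average, take the certain option as that average, and realize the
    outcome as the gamble value lying RPE away from the average."""
    rpe = rng.uniform(-2.5, 2.5)
    mean = rng.uniform(2.0, 10.0)        # gamble average (range is ours)
    high, low = mean + abs(rpe), mean - abs(rpe)
    outcome = high if rpe > 0 else low   # outcome = mean + rpe
    certain = mean                       # certain option = gamble average
    return high, low, certain, outcome

rng = np.random.default_rng(0)
high, low, certain, outcome = random_trial(rng)
```

By construction, the realized outcome sits at most 2.5 points from the certain value, matching the stated RPE range.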
The structured task
In this version, participants experienced blocks of high or low RPE values, that is, patterns of positive or negative events, where RPE values were predefined and identical for all participants. RPE values were set by a premade choice of the two gamble values and the outcome value, such that these values were random but the RPE value (the difference between the outcome and the average of the two gamble values) took a predefined value: positive blocks of RPE = +5 during the first and third blocks, and a negative block of RPE = −5 in the middle.
The certain value was the average of the two gamble values. To keep outcomes unpredictable within a block and avoid an obvious pattern of wins and losses, 30% of the trials were incongruent, with gamble and outcome values yielding an RPE of opposite valence to the block (negative during the first and third positive blocks, positive during the middle negative block) but of smaller magnitude (RPE = ±1.5). More specifically, the task consisted of three blocks, each with 27 trials and 11–12 mood ratings. Trials were identical in appearance to the random version. Participants again completed 34 mood ratings overall, and the task lasted 15 min.
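The structured schedule can be sketched as follows (the placement of incongruent trials is randomized here for illustration; in the actual task the schedule was predefined and identical across participants):

```python
import numpy as np

def structured_rpe_schedule(rng, n_per_block=27):
    """Three-block RPE schedule (+5, -5, +5), 27 trials per block, with
    ~30% of trials in each block made incongruent: opposite sign to the
    block, magnitude 1.5. Illustrative sketch of the block design, not
    the exact trial order used in the experiment."""
    blocks = []
    for block_rpe in (5.0, -5.0, 5.0):
        rpes = np.full(n_per_block, block_rpe)
        n_incon = int(round(0.3 * n_per_block))          # 8 of 27 trials
        idx = rng.choice(n_per_block, size=n_incon, replace=False)
        rpes[idx] = -np.copysign(1.5, block_rpe)         # sign flipped, smaller
        blocks.append(rpes)
    return np.concatenate(blocks)

rng = np.random.default_rng(0)
schedule = structured_rpe_schedule(rng)
```

This yields the 81-trial layout described above: a positive, a negative, and a positive block, each diluted with smaller opposite-valence trials.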
The structured-adaptive task
The structured-adaptive version was designed to maximally influence mood upwards or downwards by increasing or decreasing RPE values in real time. This task was identical to the structured task in block design and number of trials but differed in that RPE values were calculated in real time using closed-loop control (a proportional-integral [PI] algorithm used in the control of nonlinear systems in engineering; Levine, 2011). Specifically, following each mood rating, RPE values were increased or decreased according to the difference of the rated mood from the target mood, which was set to the highest mood value in the first and third blocks and to the lowest mood value during the second block. This setup therefore generated personalized ‘reward environments’, as the task values were calculated online according to individual mood responses and were not predetermined as in conventional paradigms.
More specifically, at each mood rating, the current mood, M(t), was compared to the block mood target value (M_{T}), which was set prior to the task with the aim of generating maximal mood transitions. The mood target value was defined as the maximal mood value on the mood scale in the first and third blocks and the minimal mood value during the second block. To bring the mood value as close as possible to M_{T}, the algorithm aimed at minimizing the error between the rated mood and the target mood value (M_{E}).
The resulting ${M}_{E}$ value was between 0 and 1 and was then mapped to a change in the task’s RPE value using a PI controller algorithm. This control algorithm uses a proportional and an integral error term derived from ${M}_{E}$. Importantly, the integral error term enables an RPE modification even when mood remains at the same distance from the target mood value, and it was reset at each block.
Next, the RPE value was calculated such that the larger the mood error the stronger the modification of the RPE value, as follows:
where RPE_{baseline} is a fixed value that was pre-calibrated to 14 points (to have a moderate yet efficient influence of the RPE change on mood). Congruent trials, aligned with the control algorithm’s direction, made up 70% of trials; the remaining 30% were incongruent, providing an RPE value with the opposite sign to the block context (set to be smaller in amplitude: on average, incongruent RPE values were −1.5 ± 0.8 SD). The two gamble values were then calculated for the next 2–3 trials as follows:
where H was the higher value, randomly assigned from a list ranging between [−1.5, 14] with a step size of 0.2, and L was the lower gamble outcome. The allocation of these values to the upper or lower squares on the screen was random.
The certain value (CR) that appeared on the left side was set to the average between the two values (unless this resulted in a certain value higher than two points, in which case it was half of the lower value L(t + 1)).
Last, the outcome value (A) was set to H(t + 1) in 70% of the trials of the first and third (positive) blocks and to L(t + 1) in 70% of the second-block trials (and vice versa in the 30% of incongruent trials).
This closed-loop circuit continued throughout the task, with each new mood rating used to update the reward values for the next series of 2–3 trials.
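The closed-loop update can be sketched as a standard PI step. The gains and the exact mapping from mood error to RPE are our illustrative choices; only the 0–1 mood-error scale, the per-block integral reset, and RPE_baseline = 14 come from the text:

```python
def pi_rpe_update(mood, target, integral, kp=1.0, ki=0.1, rpe_baseline=14.0):
    """One proportional-integral (PI) step of the closed-loop task:
    the signed mood error (target minus last rating, both on a 0-1
    scale) accumulates into an integral term, and the combined control
    signal scales the baseline RPE for the next 2-3 trials.
    Gains kp and ki are illustrative, not the calibrated task values."""
    error = target - mood                 # proportional term
    integral += error                     # integral term (reset per block)
    control = kp * error + ki * integral
    return rpe_baseline * control, integral

# While mood stays below the block target, the integral term keeps
# pushing the RPE further in the same direction:
rpe1, integ = pi_rpe_update(0.5, 1.0, 0.0)
rpe2, integ = pi_rpe_update(0.5, 1.0, integ)
```

The second call returns a larger RPE than the first even though the mood error is unchanged, illustrating why the integral term matters when mood plateaus short of the target.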
In all the above task designs, participants were not informed of the probability of winning the gamble. We probed whether participants noticed that the win probability was manipulated between blocks in the structured and structured-adaptive tasks with a follow-up questionnaire, which showed that most participants (90%, 65/72) were unaware of the manipulation (on a scale from 0 to 3, the average rating for whether the task was unfair was 0.36 ± 0.69 SD, with 7/72 subjects indicating ‘agree’ or ‘strongly agree’).
Participants
Participants completed either the random task (n = 60, mean age ± SD = 39.81 ± 13, 44% female), the structured task (n = 89, mean age ± SD = 37.55 ± 10.46, 44% female), or the structured-adaptive task (n = 80, mean age ± SD = 37.76 ± 11.23, 42% female). See Table 2 for participant characteristics.
These participants were recruited via the Amazon Mechanical Turk (MTurk) platform and completed the tasks online. Analyses of the structured-adaptive dataset were publicly preregistered on an open-science online repository to confirm our modeling results (https://osf.io/g3u6n/). The MTurk Worker ID was used to distribute a compensation of $8 for completing the task and a separate task bonus between $1 and $6 according to the points gained during the task. Participants were instructed before the task that they would receive a payment proportional to the points they gained during the task. These study populations were ordinary, non-selected adults 18 years of age or older. Participants were not screened for eligibility; all individuals living in the US who wished to participate were able to do so. Each participant could complete the task only once. Three participants were excluded from analyses due to an error in the task script whereby mood ratings were inconsistently spread along the three blocks. All participants received similar scripted instructions and provided informed consent to a protocol approved by the NIH Institutional Review Board.
Statistical testing of the influence of reward environments on mood
Request a detailed protocolWe applied a linear mixed effects model to estimate the task influence on mood using the nlme package in RStudio (2020). This model enabled the estimation of the acrossparticipants significance of mood change while controlling for the withinparticipant variability in mood change slopes and intercepts, defined as random effects. Specifically, the independent variable was the response variable of interest mood (M), and the dependent variables were time (t, which is the trial index) and time squared (t2), with the two different time variables considered as random effects, as follows:
Effects were considered significant at p<0.05. All t-tests were two-sided.
4. Testing the Primacy model across participant characteristics and neural signals
An additional dataset of the structured-adaptive task was collected in an fMRI scanner, providing different experimental conditions, a different age group of adolescent participants, data from participants diagnosed with depression, and a recording of neural signals during the task (n = 72, mean age ± SD = 15.49 ± 1.48, 76% female, mean depression score MFQ ± SD = 5.81 ± 5.98; n = 43 participants met diagnostic criteria for depression according to DSM-5, of whom at the time of the experiment n = 18 had an ongoing depressive episode and n = 35 were medicated). See Table 2 for participant characteristics. These participants completed the task in an fMRI scanner and were compensated for doing the task and for scanning, as well as receiving a separate bonus proportional to the points earned during the task (a value between $5 and $35). This task version lasted 24 min instead of the 15 min of the online versions, to allow for an optimal analysis of brain data. Participants were screened for eligibility; inclusion criteria were the capability to be scanned in the MRI scanner and not satisfying DSM-5 diagnostic criteria for disorders other than depression. Overall, five participants were excluded from analyses due to incomplete data files, and three additional participants were excluded due to repeatedly rating a single fixed mood value for an entire block of the task. These participants received the same scripted instructions and provided informed consent to a protocol approved by the NIH Institutional Review Board.
5. Analyzing the neural correlates of the Primacy model
fMRI data acquisition
Participants in the adolescent sample performed the structured-adaptive task while being scanned in a General Electric (Waukesha, WI) Signa 3-Tesla MR750s magnet, being randomly assigned to one of two similar scanners. Task stimuli were displayed via back-projection from a head-coil-mounted mirror. Foam padding was used to constrain head movement. Behavioral choice responses were recorded using a handheld Fiber Optic Response Pad (FORP). Forty-seven oblique axial slices (3.0 mm thickness) per volume were obtained using a T2-weighted echo-planar sequence (echo time, 30 ms; flip angle, 75°; 64 × 64 matrix; field of view, 240 mm; in-plane resolution, 2.5 mm × 2.5 mm; repetition time, 2000 ms). To improve the localization of activations, a high-resolution structural image was also collected from each participant during the same scanning session using a T1-weighted standardized magnetization-prepared spoiled gradient recalled echo sequence with the following parameters: 176 1-mm axial slices; repetition time, 8100 ms; echo time, 32 ms; flip angle, 7°; 256 × 256 matrix; field of view, 256 mm; in-plane resolution, 0.86 mm × 0.86 mm; NEX, 1; bandwidth, 25 kHz. During this structural scan, all participants watched a short neutral-mood documentary movie about bird migration.
Data preprocessing
Analysis of fMRI data was performed using Analysis of Functional and Neural Images (AFNI; Cox, 1996) software (version 19.3.14). Standard preprocessing of EPI data included slice-time correction, motion correction, spatial smoothing with a 6 mm full-width half-maximum Gaussian kernel, normalization into Talairach space, and 3D nonlinear registration. Each participant's data were transformed to percent signal change using the voxel-wise time-series mean blood oxygen-level-dependent (BOLD) activity. Time series were analyzed using multiple regression (Neter et al., 1990), where the entire trial was modeled using a gamma-variate basis function. The model included the following task phases: gamble choice, an interval lasting up to 3 s, from the presentation of the three monetary values to the choice button press (left for the certain amount or right to gamble); expectation, a 4 s interval from making the choice of whether to gamble to receiving the gamble outcome; outcome, a 1 s interval during which the received outcome was shown; the pre-rating interval, a variable interval between 2.5 and 4 s when the mood question was presented but the option to rate mood was still disabled; and the mood rating phase, a 4 s interval during which participants rated their mood. The model also included six nuisance variables modeling the effects of residual translational (motion in the x, y, and z planes) and rotational (roll, pitch, and yaw) motion, and a regressor for baseline plus slow-drift effects, modeled with polynomials (baseline being defined as the non-modeled phases of the task). Echo-planar images (EPIs) were visually inspected to confirm image quality and minimal movement. The code for generating the full processing stream for each participant was created using the afni_proc.py command. This script also creates quantitative and qualitative quality control (QC) outputs, which were used to verify the processing in the present study.
We then ran a whole-brain, group-level ANOVA (3dMVM [Chen et al., 2014] in AFNI) with the weights of the Primacy or the Recency model as between-participant covariates of neural activation (each participant's neural activity was represented by a single whole-brain image of activation across all trials).
Statistical significance
This was determined at the group level using 3dClustSim (the latest acceptable version in AFNI, with an ACF model), which generated a voxel-wise significance threshold of p<0.005 and a minimal cluster size of 100 voxels, corrected to p<0.05. We analyzed the relation of model parameters to neural activity during three different phases of the task: activation during the pre-mood-rating period, mood rating encoding (with mood values as a parametric regressor on the pre-rating period), and task-based RPE encoding (with RPE values as a parametric regressor on the outcome period). Since these are three separate tests, we added a Bonferroni correction on top of the cluster-based correction, resulting in a final p-value threshold of 0.005/3 ≈ 0.0017.
6. Code and data availability
To enable the reproducibility of this study, we made all scripts and datasets available online at https://osf.io/vw7sz. This online repository includes the scripts for modeling the mood data, the source data of Figure 2 (tasks and mood rating data of all participants), the afni_proc and 3dMVM neural analysis scripts, and the whole-brain neural images presented in Figure 4. All unprocessed neuroimaging data can be found online at https://openneuro.org/datasets/ds003709.
Data availability
To enable the reproducibility of this study, we made scripts and datasets available online at https://osf.io/vw7sz/. This repository includes: mood modeling code; source data of Figure 2 (trial-wise task values and mood rating values of all participants); neural analysis code; and files of the whole-brain neural images presented in Figure 4. All unprocessed neuroimaging data can be found online at https://openneuro.org/datasets/ds003709.

Open Science Framework, ID vw7sz: Primacy Weight on Mood, analyses code and datasets.
References

Thin slices of expressive behavior as predictors of interpersonal consequences: a meta-analysis. Psychological Bulletin 111:256–274. https://doi.org/10.1037/00332909.111.2.256

Learning the value of information in an uncertain world. Nature Neuroscience 10:1214–1221. https://doi.org/10.1038/nn1954

Cognitive and emotional influences in anterior cingulate cortex. Trends in Cognitive Sciences 4:215–222. https://doi.org/10.1016/S13646613(00)014832

The storm and stress of adolescence: insights from human imaging and mouse genetics. Developmental Psychobiology 52:225–235. https://doi.org/10.1002/dev.20447

Mood and the mundane: relations between daily life events and self-reported mood. Journal of Personality and Social Psychology 54:296–308. https://doi.org/10.1037/00223514.54.2.296

Computational approaches to fMRI analysis. Nature Neuroscience 20:304–313. https://doi.org/10.1038/nn.4499

AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research 29:162–173. https://doi.org/10.1006/cbmr.1996.0014

Criterion validity of the mood and feelings questionnaire for depressive episodes in clinic and non-clinic subjects. Journal of Child Psychology and Psychiatry 47:927–934. https://doi.org/10.1111/j.14697610.2006.01646.x

Mood as representation of momentum. Trends in Cognitive Sciences 20:15–24. https://doi.org/10.1016/j.tics.2015.07.010

Emotional processing in anterior cingulate and medial prefrontal cortex. Trends in Cognitive Sciences 15:85–93. https://doi.org/10.1016/j.tics.2010.11.004

The neural bases of emotion regulation. Nature Reviews Neuroscience 16:693–700. https://doi.org/10.1038/nrn4044

The influence of mood on perceptions of social interactions. Journal of Experimental Social Psychology 20:497–513. https://doi.org/10.1016/00221031(84)900404

Predicting relational outcomes: an investigation of thin slice judgments in speed dating. Human Communication 102:69–81.

Computational psychiatry as a bridge from neuroscience to clinical applications. Nature Neuroscience 19:404–413. https://doi.org/10.1038/nn.4238

Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. New York: Cambridge University Press. https://doi.org/10.1017/CBO9781139025751

Choices, Values, and Frames. New York: Cambridge University Press. https://doi.org/10.1037/0003066x.39.4.341

A computational model for mood recognition. User Modeling, Adaptation, and Personalization, UMAP 2014:122–133. https://doi.org/10.1007/9783319087863_11

A functional MRI study of exploratory behaviors in early adolescence. Neurology 84:P2.243.

Reward processing in depression: a conceptual and meta-analytic review across fMRI and EEG studies. American Journal of Psychiatry 175:1111–1120. https://doi.org/10.1176/appi.ajp.2018.17101124

Mood variability and the psychosocial adjustment of adolescents. Journal of Youth and Adolescence 9:469–490. https://doi.org/10.1007/BF02089885

A 5-year longitudinal study on mood variability across adolescence using daily diaries. Child Development 86:1908–1921. https://doi.org/10.1111/cdev.12420

Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs (3rd ed). Homewood, IL: Irwin.

The evolutionary origins of mood and its disorders. Current Biology 22:R712–R721. https://doi.org/10.1016/j.cub.2012.06.020

Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences 1104:35–53. https://doi.org/10.1196/annals.1390.022

The turker blues: hidden factors behind increased depression rates among Amazon’s Mechanical Turkers. Clinical Psychological Science 8:65–83. https://doi.org/10.1177/2167702619865973

Subjective well-being in adolescence: the role of self-control, social support, age, gender, and familial crisis. Journal of Happiness Studies 17:81–104. https://doi.org/10.1007/s1090201495855

Affect grid: a single-item scale of pleasure and arousal. Journal of Personality and Social Psychology 57:493–502. https://doi.org/10.1037/00223514.57.3.493

Anterior cingulate cortex: unique role in cognition and emotion. The Journal of Neuropsychiatry and Clinical Neurosciences 23:121–125. https://doi.org/10.1176/jnp.23.2.jnp121

Mood lability and psychopathology in youth. Psychological Medicine 39:1237–1245. https://doi.org/10.1017/S0033291708004662

Adolescence and reward: making sense of neural and behavioral changes amid the chaos. The Journal of Neuroscience 37:10855–10866. https://doi.org/10.1523/JNEUROSCI.183417.2017

Toward a consensual structure of mood. Psychological Bulletin 98:219–235. https://doi.org/10.1037/00332909.98.2.219

Properties of the mood and feelings questionnaire in adolescent psychiatric outpatients: a research note. Journal of Child Psychology and Psychiatry 36:327–334. https://doi.org/10.1111/j.14697610.1995.tb01828.x
Decision letter

Jonathan Roiser, Reviewing Editor; University College London, United Kingdom

Timothy E Behrens, Senior Editor; University of Oxford, United Kingdom
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This is a very interesting study whose goal is to determine what drives subjective mood over time during a reward-based decision-making task. The authors report data from a series of online behavioural studies and one performed during neuroimaging. Participants played a well-established gambling task during which they had to select between a sure outcome and a 50:50 gamble, reporting momentary mood assessments throughout the game. The authors compared the performance of a number of computational models of how the mood ratings were generated.
The authors identify as their "baseline" model that proposed by Rutledge and colleagues, in which an important determinant of mood seems to be the reward prediction error – this is called the Recency model. They contrast it with a Primacy model, in which earlier events (in this case, aggregated and weighted experienced outcomes) play a more important role. They validate the model across different behavioural conditions, involving healthy participants and depressed patients. The conclusion is that the data are more consistent with their Primacy model, in other words a stronger weight of earlier events on reported mood. In the fMRI experiment they found that the weights of the Primacy model correlated with prefrontal activation across participants, while this was not the case for the Recency model.
The paper is clearly written and easy to understand. The question of how humans combine experienced events into reported mood is topical and the conclusions are striking, given the dominance of recency-based models in the literature (e.g. Kahneman's peak-end heuristic). The paper takes an interesting approach and presents an impressive amount of data, and the reviewers and editor felt that it makes a substantial contribution to the literature.
Decision letter after peer review:
Thank you for submitting your article "The temporal representation of experience in subjective mood" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Timothy Behrens as the Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
As the editors have judged that your manuscript is of interest, but as described below that substantial additional analyses are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)
Summary:
This is a very interesting study whose goal is to determine what drives subjective mood over time during a reward-based decision-making task. The authors report data from a series of online studies and one performed with fMRI. Participants played a well-established gambling task during which they had to select between a sure outcome and a 50:50 gamble, reporting momentary mood assessments throughout the game. The authors compared the performance of a number of models of how the mood ratings were generated.
The authors identify as their "baseline" model that proposed by Rutledge and colleagues, in which an important determinant of mood seems to be the reward prediction error – the authors call this the Recency model. They contrast it with a Primacy model, where earlier events (in this case, average experienced outcomes) play a more important role. They validate the model across different behavioural conditions, involving healthy subjects, teenagers and depressed patients. The conclusion is that the data are more consistent with their Primacy model, in other words a higher weight of earlier events on reported mood. In the fMRI experiment they found that the weights of the Primacy model correlated with prefrontal activation across subjects, while this was not the case for the Recency model.
The paper is clearly written and easy to understand. The question of how humans combine experienced events into reported mood is topical and the conclusions are striking, given the dominance of recency-based models in the literature (e.g. Kahneman's peak-end heuristic). The paper takes an interesting approach and presents an impressive amount of data.
However, at some points the arguments seemed a considerable stretch, in part because important experimental and methodological detail is missing, and in part because the analyses do not currently consider a number of potential confounds in both the models and the task design. It is not clear whether these concerns can be addressed or not, but we would like to give you the opportunity to do so. Ultimately, these concerns come down to whether we can be certain that the results reflect a true primacy effect, as opposed to some other process that simply appears at face value to be a primacy effect.
To this end, some important checks need to be made concerning both the computational and the fMRI analyses, as detailed below. These do require substantial extra modelling work, and it is quite possible that the conclusions will not survive these control analyses.
Essential revisions:
1. In relation to model comparison, the authors need to show us whether or not their model selection criteria allow them to correctly recover the true generative model in simulated datasets. Are we sure that the model selection criteria are not biased toward either of the two models?
2. Related to point (1), can the authors provide a qualitative signature of mood data that falsifies the Recency model at the group level (see Palminteri, Wyart and Koechlin, 2017). They do so in Figure S2 for one participant, but it would be important to show the same (or similar) result at the group level (this should be easier in the structured or in the structured-adaptive conditions).
3. It is not clear where the weights on the primacy graph (Figure 1B) come from. The recency weights make sense – there is a discount factor in the model that is less than 1, so there is an exponential discount of more distant past events. However, for the primacy model the expectation is apparently calculated as the arithmetic mean of previous outcomes (which suggests a flat weight across previous trials) and the discount factor remains. So how can this generate the decreasing pattern of weights? It would be really useful if the authors could spell this out as it is currently quite confusing.
4. The models differ in terms of whether they learn about the expected value of the gamble outcomes, or whether they assume a 50:50 gamble (the Recency model assumes this, the Primacy model generates an average of all experienced outcomes). This leaves open the possibility that the benefit of the Primacy model is simply that people do in fact largely use experienced outcomes to generate their expectations, rather than believing the outcome probabilities displayed in the experiment. Can the authors exclude this possibility?
5. Related to the above point, the structured and adaptive environments seem to have something to learn about (blocks with positive vs. negative RPEs), so it is perhaps not surprising that humans show evidence of learning here, and a model with some learning outperforms one with none.
The description of these environments is insufficient at present  can the authors explain how RPEs were manipulated? Was it by changing the probability of win/loss outcomes, and if so how? Or was it by changing the magnitudes of the options? For the adaptive design was the change deterministic? And was the outcome (and thus the RPE) therefore always positive if mood was low; or was this probabilistic, and if so with what probability? Finally, did the Recency model still estimate its expectations here as 50:50, even when this was not the case? If so, this requires justification.
6a. In addition to changing the expectation term of the Recency model, the Primacy model also drops the term for the certain outcomes (because this improves model performance). Can this account for the relative advantage of the Primacy over the Recency model? In other words, if the certain outcome term is dropped from the Recency model as well, does the Primacy model still win? If the authors want to establish conclusively that Primacy is a better model than Recency, then surely more models ought to be compared, at the very least using a 2x2 design with primacy/recency of expectations/outcomes.
6b. On a related point, the standard Recency model was originally designed such that the certain option C was NOT the average of the two gambles, so C was required in the model (at least in the 2014 PNAS paper). Here, C is the average of the gambles, so presumably it would be identical to E in the Recency model, and therefore be extraneous in the Recency model as well as the Primacy model. Did the authors perform model comparison to see if C could be eliminated from the Recency model? If so, this is not another difference between the models after all.
7. The structured and structuredadaptive tasks seem to have some potential problems when it comes to assessing their impact on mood ratings:
i. the valence of the blocks was not randomised, meaning that the results could be confounded. E.g. what if negative RPE effects are longer-lasting than positive RPE effects? This seems plausible given the downward trend in mood in the random environment despite an average RPE of zero. Could this also explain the pattern of mood in the other two tasks, rather than primacy?
ii. scaling: if there is a marginally decreasing relationship between cumulative RPE and mood (such that greater and greater RPEs are required to lift/decrease mood by the same amount), then could this resemble a primacy effect? This is unlikely to be an issue in the random task but could be a problem in the structured and certainly in the structured-adaptive tasks.
iii. individual differences in responsiveness to RPE: in the structured-adaptive task, some subjects' mood ratings may be very sensitive to RPE, and others very insensitive. One might expect that, given the control algorithm has a target mood, the former group would reach this target fairly soon and then have trials without any RPEs, while the latter group would not reach the target despite ever-increasing RPEs. In both cases the Primacy model would presumably win, due to sensitivity to outcomes in the first half or insensitivity to bigger outcomes in the second half, respectively? Can the authors exclude these possibilities?
8. In relation to the fMRI analyses, the results in the main text seem to result from a second-level ANCOVA, where the individual weights of the Primacy model are shown to correlate with activation in the prefrontal cortex. Similar analyses using the weights of the Recency model do not produce significant results at the chosen threshold. This analysis is problematic for two reasons. First, absence of evidence does not imply evidence of absence – was a formal comparison of the regression coefficients conducted? Second, to really validate the model the authors should show that the trial-by-trial correlates of expectations and prediction errors are more consistent with the Primacy than the Recency model, using a parametric analysis at the participant level.
9. Similar to point 6, it is hard to conclude much about the models from the fact that the Primacy model E beta (but not the Recency model E beta) correlates with BOLD responses in a prefrontal cluster, when the Recency model E term is based on previous expectations, not previous outcomes. Likewise with the direct comparison of the models' voxelwise correlation images.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Thank you for submitting your article "The temporal representation of experience in subjective mood" for consideration by eLife. Your article has been reviewed by the same 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Timothy Behrens as the Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted the below summary to help you prepare a revised submission. The reviewers felt that you had been very receptive to their comments, making extensive changes and performing the requested additional analyses. Consequently, the manuscript is nearly ready for publication, subject to a few final clarifications that are listed below.
Essential Revisions:
1. The new Figure S1 is helpful, but caused some confusion as the description in the legend could be clearer. The reviewers assume that the blue bars are the weights attached to all outcomes (not expectations) from trials 1–8, all contributing to the expectation on trial 8; the orange bars are the weights attached to all outcomes from trials 1–7, all contributing to the expectation on trial 7; and so on. Assuming this understanding is correct, please amend the text in the legend to clarify this better, and also provide a colour key for the figure to make it clear that the bars refer to outcomes from specific trials.
2. The Primacy vs Recency model comparison is of critical importance, so it would help the paper to be as clear as possible about it. The key comparison here is between the Primacy model and the variant of the Recency model that matches it in both respects: having a learned, outcome-based expectation and lacking the certainty term. Your model comparison seems to test these changes separately rather than together (apologies if we have misunderstood this).
Can you explain what it is, specifically, about the Primacy model that causes it to perform better than the best Recency one (e.g. is it the nature of the way the expectation is learned, or something else)? As we understand it, the central claim of your paper is that people learn the expectation term in a way that effectively puts more weight on the initial outcomes encountered, so it would be really useful to understand what it is about the learning process that causes this effect, and whether it is this that improves the fit of the model.
3. Relating to the above point, the key comparisons between Primacy and Recency models in the main manuscript (e.g. the comparative fMRI analyses) should be between these similar models, not the Primacy and original Recency model.
https://doi.org/10.7554/eLife.62051.sa1

Author response
Essential revisions:
1. In relation to model comparison, the authors need to show us whether or not their model selection criteria allow them to correctly recover the true generative model in simulated datasets. Are we sure that the model selection criteria are not biased toward either of the two models?
Thank you for asking for this validation. It has helped us to fine-tune our model selection criteria and reach a single optimal predictive criterion (as described below) that indeed allows us to recover the true generative model in simulated data. Using this selection criterion, we replicated our finding that the Primacy model provides a better temporal description of the mood ratings, and in addition, we used this criterion to test a set of new models that we developed to address the questions raised by the reviewers.
Method:
We performed a model recovery assessment to validate our model selection criteria comparing the Primacy and Recency models (Wilson and Collins, 2019). In order to simulate the mood ratings, we first fit the Primacy and Recency models to the Structured task data. We then used the variance of the residuals as the variance for Gaussian distributions around the predicted mood values produced by each model. Next, we generated simulated data for each participant for each model based on these distributions. Finally, we fit the models to both sets of simulated data and evaluated the ability of each of our model selection criteria to correctly identify the model that generated the data. Significant differences for each criterion were evaluated with the Wilcoxon test, as was done for our main analysis.
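As a schematic illustration, the logic of this recovery check can be sketched in a few lines of Python. This is our toy sketch, not the analysis code: the two "models" are arbitrary stand-in prediction curves, and a real recovery analysis would refit both models to each simulated dataset rather than reuse fixed predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)

# Two toy stand-in "models" with distinct mood predictions
# (placeholders for the fitted Primacy and Recency models).
pred_A = 0.5 + 0.002 * t            # slow upward drift
pred_B = 0.5 + 0.2 * np.sin(t / 8)  # oscillating pattern

def recover(generating_pred, noise_sd=0.05):
    """Simulate ratings from one model's predictions plus Gaussian noise
    (the noise level playing the role of the residual variance of the
    original fit), then report which model fits the simulated data better."""
    sim = generating_pred + rng.normal(0.0, noise_sd, n)
    err_A = np.mean((sim - pred_A) ** 2)
    err_B = np.mean((sim - pred_B) ** 2)
    return "A" if err_A < err_B else "B"
```

A selection criterion passes this check when data simulated from each model is best fit by that same model, i.e. the generating model wins on its own data.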
Results:
According to both the training error and the streaming prediction criteria, the Recency model performs better on data simulated with the Recency model and the Primacy model performs better on data simulated with the Primacy model (see Table S1). In the process of conducting the model recovery assessment, we found that the streaming prediction error criterion was unstable in the first few trials due to fewer available data points, so we modified this criterion to discard the first ten mood ratings. We then opted to use the streaming prediction error for model comparison, as it is a more valid criterion: the training error favors overfitting (a model with more parameters will fit the training data better). Other performance criteria, such as the AIC and BIC, do place a penalty on each additional parameter added to the model, but the AIC still tends to favor overfitting while the BIC tends to be conservative. Furthermore, both AIC and BIC require the assumption of a parametric distribution for the noise term, while metrics based on prediction error do not; we could therefore apply the streaming prediction criterion directly to our models, whereas applying AIC or BIC would have required additional assumptions about the distributional form of the noise. The streaming prediction error we use is a held-out prediction error, reflecting sequential prediction of each mood rating (Hastie, Tibshirani et al., 2009). We therefore used the streaming prediction criterion to replicate our previous result as well as to test the newly developed models.
It is not a cross-validation, which is unfeasible because of the temporal dependency in our data.
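As a sketch of how such a streaming (one-step-ahead) criterion can be computed, the following illustration fits on the history up to each rating and scores the prediction of the next one, discarding the early ratings. The running-mean predictor is a hypothetical stand-in for the actual mood models, and all names are ours:

```python
import numpy as np

def streaming_prediction_error(ratings, fit_predict, n_burn_in=10):
    """One-step-ahead ('streaming') prediction error: for every rating after
    the burn-in, fit on all earlier ratings and predict the next one.
    The first `n_burn_in` ratings are discarded, where the criterion is
    unstable due to fewer available data points."""
    errors = []
    for t in range(n_burn_in, len(ratings)):
        pred = fit_predict(ratings[:t])        # train on history only
        errors.append((ratings[t] - pred) ** 2)
    return float(np.mean(errors))

# Toy stand-in "model": predict the next rating as the mean of the history.
def running_mean(history):
    return float(np.mean(history))
```

Because each prediction uses only earlier ratings, the score is held-out by construction, without shuffling the temporally dependent data.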
Changes made:
On page 9 of the results:
“The Primacy vs Recency models performance criteria
“We then performed a model recovery assessment to validate the model selection criteria by which we compare the performance of the Primacy and Recency models. We first generated simulated datasets using each of the models, then fit the models and tested whether we could correctly identify the model that generated the data. According to both the training error and the streaming prediction criteria, it was possible to recover the true model from the simulated data (the Recency model performed better on data simulated with the Recency model, and vice versa; see Table S1). We then preferred to use the streaming prediction error across all comparisons, as it is a more valid criterion due to the training error favoring overfitting (Hastie, Tibshirani et al., 2009).”
On Page 5, of the Introduction:
“We then examine the generalizability of the performance of the two models across simulated data, in a model recovery analysis that protects from model selection biases (Hastie, Tibshirani et al., 2009).”
2. Related to point (1), can the authors provide a qualitative signature of mood data that falsifies the Recency model at the group level (see Palminteri, Wyart and Koechlin, 2017). They do so in Figure S2 for one participant, but it would be important to show the same (or similar) result at the group level (this should be easier in the structured or in the structured-adaptive conditions).
We have extended our Figure S2 (now Figure S4 on page 48) to include a group-level representation of mood ratings and the respective trial-level Recency and Primacy model parameters:
3. It is not clear where the weights on the primacy graph (Figure 1B) come from. The recency weights make sense  there is a discount factor in the model that is less than 1, so there is an exponential discount of more distant past events. However, for the primacy model the expectation is apparently calculated as the arithmetic mean of previous outcomes (which suggests a flat weight across previous trials) and the discount factor remains. So how can this generate the decreasing pattern of weights? It would be really useful if the authors could spell this out as it is currently quite confusing.
This point is a very important one to make clear to the reader and therefore we thank the reviewers for asking us to clarify.
We now further explain in the paper (on page 6) that the stronger weight of earlier outcomes in the Primacy model emerges from two separate aspects of the model: first, that one’s expectation for the next outcome is based on the average of all previously received outcomes, and second, that mood is determined by the sum of all such past expectations. We also illustrate these two aspects of the model, and how they are integrated into a Primacy weighting, with the following graphical illustration (on page 40):
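These two ingredients can be checked numerically. The sketch below (our illustration; it omits the model's discount factor and regression coefficients) computes the effective weight each outcome receives when every expectation is the running average of all earlier outcomes and mood sums those expectations:

```python
import numpy as np

def primacy_weights(n_trials):
    """Effective weight of each outcome on mood when (i) the expectation on
    trial t is the average of all outcomes received before t, and (ii) mood
    sums these expectations across trials. Outcome j enters every later
    expectation with weight 1/(t-1), so earlier outcomes accumulate a
    larger total weight."""
    w = np.zeros(n_trials)
    for t in range(2, n_trials + 1):   # an expectation exists from trial 2 on
        w[: t - 1] += 1.0 / (t - 1)    # each earlier outcome contributes equally
    return w
```

For five trials this yields weights of roughly [2.08, 1.08, 0.58, 0.25, 0]: a strictly decreasing, primacy-like profile, even though each individual expectation weights its inputs flatly.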
4. The models differ in terms of whether they learn about the expected value of the gamble outcomes, or whether they assume a 50:50 gamble (the Recency model assumes this, the Primacy model generates an average of all experienced outcomes). This leaves open the possibility that the benefit of the Primacy model is simply that people do in fact largely use experienced outcomes to generate their expectations, rather than believing the outcome probabilities displayed in the experiment. Can the authors exclude this possibility?
To test the possibility that the better performance of the Primacy model is due to individuals using experienced outcomes to generate their expectations, rather than to the Primacy weighting, we compared a new model against the Primacy model: a Recency model where the expectation term is based on the previous outcome rather than the current trial’s possible outcome values (the “Recency with outcome as expectation” model).
This test showed that the Primacy model performed better than this “Recency with outcome as expectation” model in the Random, Structured, and Structured-Adaptive tasks, according to the streaming prediction performance criterion. This also held for the additional sample of adolescents, which includes clinically depressed participants as well as different experimental conditions (lab-based rather than online collection). See Table 1 for a summary of these results.
We further addressed this possibility, along with the possible learning component of the Primacy model in the Structured and Structured-Adaptive tasks, by creating a Recency model that considers the actual individual winning probability instead of a fixed win probability of 50%. The trial-level individual winning probability is the percentage of previous trials with a win outcome. It is calculated for each trial by dividing the number of preceding trials that resulted in winning the higher outcome by the overall number of previous trials. This model also introduces a component of learning from previous trials into the Recency model, allowing us to test whether this “learning” accounts for the superior performance of the Primacy model, as suggested in point (5).
This “Dynamic win probability recency model” was implemented by computing the win probability as (Equation 4):

p_win(t) = (5 + Σ_{j<t} I_j) / (10 + Σ_{j<t} G_j)
where the sum in the numerator counts the previous trials on which the higher value H was the outcome (I is a binary vector of this condition with 1 for the outcome H and 0 for the lower outcome L) and the sum in the denominator counts the previous trials on which the participant chose to gamble (G is a binary vector of this condition with 1 for the choice to gamble and 0 for the choice of the certain value). The additional bias of 5 in the numerator and 10 in the denominator implement Bayesian shrinkage corresponding to 10 prior observations with an average success of 0.5.
The expectation term uses the dynamically updated win probability to compute the expected gambling outcome (Equation 5):

E(t) = p_win(t) · H(t) + (1 − p_win(t)) · L(t)
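Equations 4 and 5, as described, can be sketched as follows. This is a schematic re-implementation based on the verbal description above (variable and function names are ours, not the paper's):

```python
import numpy as np

def dynamic_win_probability(won_high, gambled):
    """Equation 4 as described: trial-level win probability with Bayesian
    shrinkage, (5 + wins on previous trials) / (10 + gambles on previous
    trials), i.e. 10 prior observations with an average success rate of 0.5.
    `won_high` is the binary vector I (1 if the higher outcome H occurred),
    `gambled` is the binary vector G (1 if the participant gambled)."""
    wins = np.concatenate(([0], np.cumsum(won_high)[:-1]))      # counts before trial t
    gambles = np.concatenate(([0], np.cumsum(gambled)[:-1]))
    return (5 + wins) / (10 + gambles)

def expected_outcome(p_win, high, low):
    """Equation 5 as described: expectation term of the dynamic Recency variant."""
    return p_win * high + (1 - p_win) * low
```

On the first trial the probability is the prior, 5/10 = 0.5, and it then drifts toward the empirically observed win rate as gambles accumulate.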
This test showed that the Primacy model performed better than this dynamic Recency model for all three reward tasks (Random, Structured, and Structured-Adaptive), according to the validated streaming prediction criterion (see Table 1 for statistics).
Changes made:
We summarize these new results in Table 1, which presents all model fitting values and statistics for each of the three reward environments:
We describe these new results in the added subsection “Primacy vs other variants of the Recency model” of the Results on page 14:
In addition, these results are presented in the new supplemental Table S2 on page 46, which describes the formulations of the new alternative Recency models:
And in the new Figure 3B on page 17:
We also added another section to the discussion, where we address additional aspects of the Primacy model on page 25:
“Our results demonstrate that the Primacy model is superior to the Recency model, indicating that the full history of prior events influences mood. However, the inclusion of recency-weighted outcomes in the RPE term of the Primacy model prevents us from concluding simply that early events are more important than recent events in determining self-reported mood. We therefore also note that, when fitting the Primacy model, the coefficients of the expectation term were significantly larger than the coefficients of the RPE term (which includes recency-weighted outcomes), supporting the dominance of the expectation term’s primacy weighting (paired t-test with t = 2.6, p = 0.009, CI = [0.008, 0.059]). Additionally, there may be alternative, mathematically equivalent formulations of these models that would support different interpretations. Future work should compare the overall impacts of primacy and recency effects on mood with approaches robust to reparameterization, such as analysis of the causal effect of previous outcomes on mood using the potential-outcomes framework (Imbens and Rubin, 2015).”
5. Related to the above point, the structured and adaptive environments seem to have something to learn about (blocks with positive vs. negative RPEs), so it is perhaps not surprising that humans show evidence of learning here, and a model with some learning outperforms one with none.
The description of these environments is insufficient at present  can the authors explain how RPEs were manipulated? Was it by changing the probability of win/loss outcomes, and if so how? Or was it by changing the magnitudes of the options? For the adaptive design was the change deterministic? And was the outcome (and thus the RPE) therefore always positive if mood was low; or was this probabilistic, and if so with what probability? Finally, did the Recency model still estimate its expectations here as 50:50, even when this was not the case? If so, this requires justification.
We have added elaborate explanations of the different reward environments, addressing both the manipulation of RPEs in general as well as specifically in the adaptive task.
We have extended and modified the task descriptions as follows (starting on page 30 of the Methods section):
“The Random task:
Participants played a gambling task where they experienced a series of different RPE values while rating their mood after every 2–3 trials. In this task, each trial consisted of a choice of whether to gamble between two monetary values or to receive a certain amount. RPE values were randomly varied [ranging between −2.5 and +2.5] by assigning random values to the two gambles, with a 50% probability of receiving one of these values as the outcome. The certain value was the average of the two gamble values.”
“The Structured task:
In this version participants experienced blocks of high or low RPE values, i.e., patterns of positive or negative events, where RPE values were predefined and identical for all participants. RPE values were set by a premade choice of the two gamble values and the outcome value, such that these values were random but the RPE, the difference between the outcome and the average of the two gamble values, took a predefined value (positive blocks of RPE = +5 during the first and third blocks, and a negative block of RPE = −5 in the middle, during the second block). To maintain the unpredictability of outcomes within a block in this fixed version, 30% of the trials were incongruent with the block valence (i.e., small negative RPE values of −1.5 during the first and third positive blocks, and a positive value of +1.5 during the second negative block).
The certain value was the average of the two gamble values. To avoid a predictable pattern of wins and losses, 30% of the trials were incongruent trials, where gamble and outcome values produced an RPE of the opposite valence to the block (negative during the first and third positive blocks, and positive during the middle negative block), but of a smaller magnitude (RPE incongruent = ±1.5). More specifically, the task consisted of three blocks, where each block had 27 trials and 11–12 mood ratings. Trials were identical in appearance to the Random version. Participants again completed 34 mood ratings overall in 15 min of task time.”
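The construction of a trial with a predefined RPE, as described in this passage, can be illustrated schematically. In this simplified sketch (ours, with illustrative value ranges) the expectation of a 50:50 gamble is the average of its two values, so placing the outcome at `expectation + target_rpe` yields the desired RPE; the actual task additionally constrains which values can serve as the outcome.

```python
import numpy as np

def make_structured_trial(target_rpe, rng, lo=-10.0, hi=10.0):
    """Draw two random gamble values, set the certain option to their
    average (as in the task), and place the outcome so that
    outcome - expectation equals the predefined RPE for the block."""
    v1, v2 = rng.uniform(lo, hi, size=2)
    certain = (v1 + v2) / 2.0   # certain option = average of the gambles
    expectation = certain       # 50:50 gamble -> expected value = average
    outcome = expectation + target_rpe
    return v1, v2, certain, outcome
```

With `target_rpe = +5` this reproduces the positive-block trials, and with `target_rpe = -5` the negative-block trials, while the gamble values themselves remain random.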
“The Structured-Adaptive task:
More specifically, in each iteration of mood rating, the current mood, M(t), was compared to the block mood target value (M_{T}), which was set prior to the task with the aim of generating maximal mood transitions. The mood target value was defined as the maximal value on the mood scale in the first and third blocks, and the minimal value during the second block. To bring the mood value as close as possible to the target value M_{T}, the algorithm aimed to minimize the error between the rated mood and the target mood value (M_{E}).
[…]
This closed-loop circuit continued throughout the task, with each new mood rating used to update the reward values for the next series of 2–3 trials.”
The Structured-Adaptive task design was probabilistic, as there was a 70% chance of a congruent trial.
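One step of this closed-loop logic can be sketched as follows. This is a hypothetical illustration of the control rule, not the task code; the function name and interface are ours:

```python
import numpy as np

def next_rpe_sign(mood, mood_target, congruent_prob=0.7, rng=None):
    """One iteration of the closed loop: compare the rated mood M(t) to the
    block target M_T and choose the RPE sign for the next trials so as to
    reduce the mood error M_E. With probability 1 - congruent_prob an
    incongruent (opposite-sign) trial is delivered instead."""
    rng = rng or np.random.default_rng()
    mood_error = mood_target - mood     # M_E: positive -> mood below target
    sign = np.sign(mood_error) or 1.0   # corrective direction (default +1)
    if rng.random() > congruent_prob:
        sign = -sign                    # incongruent trial
    return float(sign)
```

Repeating this after every mood rating, with reward values scaled accordingly, drives mood toward the block target while the 30% incongruent trials keep outcomes unpredictable.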
As for the second part of the question (“Finally, did the Recency model still estimate its expectations here as 50:50, even when this was not the case”):
The Recency model indeed assumed a 50:50 win probability, which was not the case in this task design. We therefore validated our result on the Random task design, where the win probability was indeed 50:50. We also validated the Primacy model against a Recency model with a dynamic win probability (see point 4 above).
6a. In addition to changing the expectation term of the Recency model, the Primacy model also drops the term for the certain outcomes (because this improves model performance). Can this account for the relative advantage of the Primacy over the Recency model? In other words, if the certain outcome term is dropped from the Recency model as well, does the Primacy model still win? If the authors want to establish conclusively that Primacy is a better model than Recency, then surely more models ought to be compared, at the very least using a 2x2 design with primacy/recency of expectations/outcomes.
6b. On a related point, the standard Recency model was originally designed such that the certain option C was NOT the average of the two gambles, so C was required in the model (at least in the 2014 PNAS paper). Here, C is the average of the gambles, so presumably it would be identical to E in the Recency model, and therefore be extraneous in the Recency model as well as the Primacy model. Did the authors perform model comparison to see if C could be eliminated from the Recency model? If so, this is not another difference between the models after all.
To answer this question, we have conducted the suggested analysis and dropped the Certain outcome term from the Recency model, forming the model “Recency without a Certain term”.
We found that the Primacy model performed better than this alternative model for all three reward tasks (Random, Structured, and Structured-Adaptive) according to the validated streaming prediction criterion (see Table 1 for statistics). This model is described in the new subsection of the Results on page 14.
We appreciate the contribution of the reviewers in pointing out these alternatives and hope that the results presented in response to this item and the previous items demonstrate that the Primacy effect is unlikely to be a result of these alternative explanations.
7. The structured and structuredadaptive tasks seem to have some potential problems when it comes to assessing their impact on mood ratings:
i. the valence of the blocks was not randomised, meaning that the results could be confounded. E.g. what if negative RPE effects are longer-lasting than positive RPE effects? This seems plausible given the downward trend in mood in the random environment despite an average RPE of zero. Could this also explain the pattern of mood in the other two tasks, rather than primacy?
ii. scaling: if there is a marginally decreasing relationship between cumulative RPE and mood (such that greater and greater RPEs are required to lift/decrease mood by the same amount), then this will resemble a primacy effect. This is unlikely to be an issue in the random task but could be a problem in the structured and certainly in the structured-adaptive tasks.
iii. individual differences in responsiveness to RPE: in the structured-adaptive task, some subjects' mood ratings may be very sensitive to RPE, and others very insensitive. One might expect that, given the control algorithm has a target mood, the former group would reach this target fairly soon and then have trials without any RPEs, while the latter group would not reach the target despite ever-increasing RPEs. In both cases the Primacy model would presumably win, due to sensitivity to outcomes in the first half or insensitivity to bigger outcomes in the second half, respectively? Can the authors exclude these possibilities?
The reviewers raise several important concerns related to potential experimental confounds that might have led to the Primacy weighting advantage. We believe the following points address these concerns (we also discuss them in the manuscript on page 21):
First, we agree that the lack of randomization of block order is a weakness of the experiments we present, and we have added language to the discussion highlighting this point. However, we do not find evidence in our data for negative RPE effects being longer-lasting than positive ones. For example, in the adaptive task we find a sharp increase of mood at the beginning of the third block (from negative mood back up to positive mood). Moreover, since the negative block in the adaptive task is second, longer-lasting negative RPEs would, on the contrary, interfere with a Primacy weighting of events rather than explain it.
Second, as the reviewers mention, we find a primacy weighting of events on mood in the Random task, as well as in the Structured task. Since RPEs do not adaptively increase over time in these tasks, it is unlikely that this explains the superior performance of the Primacy model in general. However, the difference between the Primacy and Recency models is larger in the Structured and Structured-Adaptive tasks, and some of this difference may be due to an adaptation to rewards.
Third, we agree that the interaction between individual behavior and the controller in the Structured-Adaptive environment could raise numerous interpretative difficulties. However, the Primacy model also fits better in the Structured and Random tasks, where the tasks did not respond to individual differences in responsiveness to RPEs. This indicates that the better fit of the Primacy model is unlikely to be solely driven by the adaptive controller in the Structured-Adaptive task. However, it is possible that the controller does provide an advantage to the Primacy model over the Recency models in the Structured-Adaptive task, and we have added these caveats to the discussion, as follows:
On page 21 of the Discussion:
“It is conceivable that what appears to be a primacy effect is actually due to longer-lasting effects of, say, positive RPEs on mood—this could be particularly exacerbated in the structured-adaptive task. However, since our result was robust also in a random task design, as well as when testing a model with a varying time-window parameter that considered different numbers of previous trials (t_{max}, see Figure S3), we do not find evidence for the block valence order accounting for the better performance of the Primacy model. In addition, the interaction between individual behavior and the controller in the structured-adaptive environment could raise interpretative difficulties. It is therefore important to stress that the Primacy model also fit better in the structured and random tasks, where the tasks did not respond to individual differences in responsiveness to RPEs. Yet, it is possible that this contributes to the fact that the advantage of the Primacy model over the Recency models is greater in the structured-adaptive task. There may also be additional mechanisms at play in the structured-adaptive task, such as hedonic extinction towards RPEs, that explain some of the increased performance of the Primacy model compared to the Recency model in this task.”
9. In relation to the fMRI analyses, the results in the main text seem to result from a second-level ANCOVA, where the individual weights of the Primacy model are shown to correlate with activation in the prefrontal cortex. Similar analyses using the weights of the Recency model do not produce significant results at the chosen threshold. This analysis is problematic for two reasons. First, absence of evidence does not imply evidence of absence: was a formal comparison of the regression coefficients conducted? Second, to really validate the model the authors should show that the trial-by-trial correlates of expectations and prediction errors are more consistent with the Primacy than the Recency model, using a parametric analysis at the participant level.
To address this point, we now explain the comparison of the regression coefficients we conducted in more detail. In this analysis, we contrasted the regression coefficients of the two models and showed that the activation in the prefrontal cortex was more strongly correlated with the individual expectation weights of the Primacy model versus the Recency model (t = 5.00, p = 0.0017, peak at [11,49,9], and a cluster size of 529 voxels). We explain this analysis in more detail in the figure legend, as follows:
On page 19 of the Results, Figure 4:
“A formal comparison of the relation of brain activation to the Primacy versus the Recency models was conducted. We compared the regression coefficients of the correlation between participants’ brain activation and the Primacy expectation term weights versus the regression coefficients of the relation to the Recency model expectation term (see Figure S5 for the two images before thresholding and before contrasting against each other).”
Although we strongly agree that a trial-level correlation would be an interesting analysis, we believe the number of trials in our task is insufficient to allow us to test this question with sufficient rigor. We definitely agree that this question of the trial-level correlations will be important to study, as it might inform a different resolution of such task–brain relations, e.g., temporal processes that occur across trials and within blocks. Although this is of great interest to us, it is unfortunately beyond the scope of our collected data. However, we believe that our current results answer a different but no less important question, namely the relation between the overall individual weight of the expectation term across the entire task and the overall neural activation along the 34 mood ratings.
9. Similar to point 6, it is hard to conclude much about the models from the fact that the Primacy model E beta (but not the Recency model E beta) correlates with BOLD responses in a prefrontal cluster, when the Recency model E term is based on previous expectations, not previous outcomes. Likewise with the direct comparison of the models' voxelwise correlation images.
We thank the reviewers for pointing out that we were unclear in our explanation of this. We hope that our updated explanation (legend of Figure 4) makes the point clearer. The 𝛽_{E} coefficients of both the Recency and the Primacy models are based on previous expectations E, while the E term differs between the models in being based either on the current gamble options (in the Recency model) or on the average of all previous outcomes (in the Primacy model). As our result shows (Figure 4), the expectation term weight 𝛽_{E} from the Primacy model correlates significantly with BOLD responses in the prefrontal cluster (peak at [3,52,6], size of 132 voxels, peak beta = 44.80, t = 3.37, threshold at p = 0.0017). We then directly tested this correlation against the correlation of the 𝛽_{E} of the Recency model, and found that the Primacy model 𝛽_{E} was more strongly correlated with brain activity (peak at [11,49,9], t = 5.00, extending to a cluster of 529 voxels, threshold at p = 0.0017). This result provides a possible neural underpinning specific to the Primacy model’s mathematical realization of expectations and mood.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Essential Revisions:
1. The new Figure S1 is helpful, but caused some confusion as the description in the legend could be clearer. The reviewers assume that the blue bars are the weights attached to all outcomes (not expectations) from trials 1–8, all contributing to the expectation on trial 8; the orange bars are the weights attached to all outcomes from trials 1–7, all contributing to the expectation on trial 7; and so on. Assuming this understanding is correct, please amend the text in the legend to clarify this better, and also provide a colour key for the figure to make it clear that the bars refer to outcomes from specific trials.
Thank you for asking for this clarification; it helped us improve the clarity of this figure for readers. We have modified the text of the legend in the following way (on pages 46–47):
“The Primacy effect of outcomes on mood. In the Primacy model, the expectation term ${E}_{j}$ is the unweighted average of previous outcomes. At each trial, all previous expectation terms are combined in an exponentially weighted sum: ${\sum}_{j=1}^{t}{\gamma}^{t-j}{E}_{j}$. Here we illustrate how this gives rise to a primacy weighting of previous outcomes that depends on the value of $\gamma$ (each subplot represents a different magnitude of exponential weighting $\gamma$). The total height of each bar represents the influence of the outcome of the corresponding trial on the result of the exponential sum at the end of trial 9. Each color indicates the contributions of the outcomes that form an expectation term ${E}_{j}$ at the end of trial j. The dark yellow block represents the contribution of the expectation term ${E}_{1}$ from the end of trial 1 (comprised only of the first outcome). The grey blocks represent the contributions of the expectation term ${E}_{2}$ that is added from the end of trial 2 (which is the average of the outcomes from trials 1 and 2, and therefore appears in both the first and the second bars). This continues for the rest of the expectation terms until the last expectation term ${E}_{9}$ is added, which is formed by averaging the outcomes from trials 1 to 9, as shown by the blue bars.”
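The primacy weighting described in this legend can be reproduced numerically. The sketch below computes, directly from the formula above, the effective weight each outcome carries in the exponentially weighted sum of running-average expectations (the function name is ours, for illustration):

```python
import numpy as np

def effective_outcome_weights(t, gamma):
    """Aggregate weight of each outcome R_i in sum_{j=1..t} gamma^(t-j) * E_j,
    where E_j = mean(R_1..R_j). Since R_i enters every E_j with j >= i,
    its weight is w_i = sum_{j=i..t} gamma^(t-j) / j."""
    w = np.zeros(t)
    for i in range(1, t + 1):          # outcome index (1-based)
        for j in range(i, t + 1):      # expectation terms containing R_i
            w[i - 1] += gamma ** (t - j) / j
    return w

w = effective_outcome_weights(9, 0.7)
# earliest outcomes carry the most weight (primacy), for any gamma in (0, 1]
assert all(w[k] > w[k + 1] for k in range(8))
```

As a sanity check, the weights of each expectation term sum to $\gamma^{t-j}$, so the total across outcomes equals $(1-\gamma^t)/(1-\gamma)$, matching the bar heights in the figure.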
2. The Primacy vs Recency model comparison is of critical importance, so it would help the paper to be as clear as possible about it. The key comparison here is between the variant of the Recency model that is identical to the Primacy model in terms of having both a learned outcome as the expectation and without the certainty term. Your model comparison seems to test these changes separately rather than together (apologies if we have misunderstood this).
Can you explain what it is, specifically, about the Primacy model that causes it to perform better than the best Recency one (e.g. is it the nature of the way the expectation is learned, or something else)? As we understand it, the central claim of your paper is that people learn the expectation term in a way that effectively puts more weight on the initial outcomes encountered, so it would be really useful to understand what it is about the learning process that causes this effect, and whether it is this that improves the fit of the model.
We have addressed this modelling question by building an additional model as suggested, with both a learned dynamic outcome probability as the expectation and no Certain value term. We had previously tested these two properties separately in order to follow a simple one-modification-at-a-time process, but we fully agree on the importance of testing such a merged Recency model that is most similar to the Primacy model. We have therefore rerun all the analyses of this manuscript with this new merged model (termed the “Recency with both dynamic win and no Certain term” model). We now present these new results in the paper; they strengthen and support the conclusion of this paper, as we found that the Primacy model also performs better than this more similar Recency model.
Our conclusion that the effect of outcomes on mood through expectations has a primacy weighting in our tasks holds robustly when we consider a variety of different but similar models, which either have primacy weighting (Figure 3—figure supplement 2) or recency weighting (Table 1). All the models with primacy weighting share the property that the expectation is based on an average over previous outcomes or potential outcome values. We stress that the expectation itself does not have to have primacy weighting for our conclusions to hold. The Primacy model that we have chosen as our representative primacy model (due to having superior or statistically indistinguishable performance relative to the alternative primacy models) applies equal weights to all past outcomes to form the expectation, but we have also tried models where the weighting within the expectation gave higher weights to more recent outcomes. In all these cases, the combination of current and past expectations still results in a primacy-weighted aggregate effect of previous outcomes on mood. The dependence of mood on an accumulation of previous expectations is therefore what causes the primacy weighting, as the initial outcomes enter many expectation terms and thus have a larger influence on mood. In an intuitive sense, the primacy effect represents the greater weight first experiences have in a new environment or context, simply by virtue of coming first. The first event has nothing against which it can be compared; the second event has only itself and the first; the third event can be compared only against the first two, and so on, until eventually each additional event has a minimal impact in the face of all the events that have come before. The more trials we experience, the more information we gain, and the less meaning each event has on its own. This process has clear parallels to learning, but our models are agnostic to the exact mechanism by which expectations are accumulated.
It is likely that there are equivalent formulations of our models in which the expectation is a learned parameter controlled by a learning rate. The details of this mechanism are certainly of interest, but they will need to be elucidated by future studies.
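This intuition can be made precise using the definitions above (our own short derivation, not a passage from the manuscript). The effective weight of outcome $R_i$ in the exponentially weighted sum of expectations at trial $t$ is:

```latex
% With E_j = \frac{1}{j}\sum_{i=1}^{j} R_i, the aggregate weight of
% outcome R_i in \sum_{j=1}^{t} \gamma^{t-j} E_j is
w_i^{(t)} = \sum_{j=i}^{t} \frac{\gamma^{t-j}}{j},
\qquad
w_i^{(t)} - w_{i+1}^{(t)} = \frac{\gamma^{t-i}}{i} > 0 .
% The weights therefore decrease strictly in i: earlier outcomes receive
% larger aggregate weight, even though each E_j weights its own outcomes
% equally.
```

The strictly positive difference holds for any $\gamma \in (0,1]$, which is why a primacy-weighted aggregate emerges even when each individual expectation term weights its outcomes uniformly.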
We have added the above interpretation of the Primacy model to the Discussion on pages 26–27.
3. Relating to the above point, the key comparisons between Primacy and Recency models in the main manuscript (e.g. the comparative fMRI analyses) should be between these similar models, not the Primacy and original Recency model.
Method:
As requested, we have performed all the analyses throughout this manuscript using the new merged Recency model that is most similar to the Primacy model, termed the “Recency with both dynamic win and no Certain term” model. These analyses included model comparison on fitting the data from all the different datasets, i.e., the random, structured, and structured-adaptive tasks, adult and adolescent samples, clinically depressed adolescents, and online and lab-based conditions. Moreover, the new model coefficients were used to repeat the brain-activity correlation analysis, searching for subject-level correlations between this merged Recency model variant and neural activity when rating mood.
Results:
The Primacy model performed better than the new Recency model with both dynamic win probability and no Certain term on all the different datasets and samples used in this study, according to the streaming prediction criterion (see the new Table 1, which presents these values, and the new section of the Results that summarizes this work).
Moreover, the brain analysis revealed a result similar to that obtained with the original Recency model. No significant neural correlations were found for this model’s coefficients, but only for the Primacy model coefficients (there were also no frontal clusters before thresholding the images, as shown in Figure 4—figure supplement 1, where the non-thresholded images of this analysis are presented). We conducted the whole-brain analysis with the new merged Recency model and also contrasted the resulting images against those of the Primacy model, to formally show the stronger relation of the Primacy model against this new Recency model as well (see the legend of Figure 4 for these results).
Changes made:
First, the ordering of the manuscript was changed so that the merged Recency model is now presented throughout the sections of the Results (i.e., the different Recency models are introduced prior to reporting the results).
Results section on page 10:
“Next, we merged the dynamic win probability and the elimination of the Certain term into an additional Recency model that is most similar in its characteristics to the Primacy model (i.e., the “Recency with both dynamic win and no Certain” model).”
Table 1 on page 16: Now reports the performance of the Primacy model against the new Recency model as well.
Results section on page 18:
“We correlated BOLD signal with the participantlevel weights of the parameters of the Primacy and two of the Recency models (the original Recency model and the Recency model that is most similar to the Primacy model, i.e., the one with both dynamic win probability and no Certain term).”
Figure 4 legend, on pages 1920:
“(see Figure 4—figure supplement 1 for the images of the relation to the Primacy model and each of the Recency models before thresholding and before contrasting against each other). This contrast showed a significantly stronger relation of the Primacy model expectation weight to brain signals around the ACC region (at p = 0.0017), with 529 voxels around [11,49,9] for the original Recency model and 328 voxels around [11.2, 48.8, 3.8] for the Recency with both dynamic win and no Certain term model.”
https://doi.org/10.7554/eLife.62051.sa2
Article and author information
Author details
Funding
National Institute of Mental Health (Intramural Research Program no. ZIAMH00295701)
 Hanna Keren
 Katharine Chang
 Aria Vitale
 Dylan Nielson
 Argyris Stringaris
National Institute of Mental Health (Intramural Research Program)
 Charles Zheng
 David C Jangraw
 Francisco Pereira
National Institute of Mental Health
 Robb B Rutledge
Medical Research Council (Career Development Award MR/N02401X/1)
 Robb B Rutledge
Brain and Behavior Research Foundation (NARSAD Young Investigator Award)
 Robb B Rutledge
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Elisabeth A Murray and Nathaniel D Daw for helpful comments and questions. This research was supported in part by the Intramural Research Program of the National Institute of Mental Health, National Institutes of Health (NIH) (Grant No. ZIAMH00295701 to AS). RBR is supported by the National Institute of Mental Health (1R01MH124110), a Medical Research Council Career Development Award (MR/N02401X/1), and a NARSAD Young Investigator Award from the Brain & Behavior Research Foundation, P&S Fund. This work used the computational resources of the NIH high-performance computing (HPC) Biowulf cluster (http://hpc.nih.gov). The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, and approval of the manuscript; or decision to submit the manuscript for publication. The views expressed in this article do not necessarily represent the views of the National Institutes of Health, the Department of Health and Human Services, or the United States Government.
Ethics
Clinical trial registration NCT03388606.
Human subjects: All participants signed informed consent to a protocol approved by the NIH Institutional Review Board. The protocol is registered under the clinical trial no. NCT03388606.
Senior Editor
 Timothy E Behrens, University of Oxford, United Kingdom
Reviewing Editor
 Jonathan Roiser, University College London, United Kingdom
Version history
 Received: August 12, 2020
 Accepted: June 2, 2021
 Accepted Manuscript published: June 15, 2021 (version 1)
 Accepted Manuscript updated: June 18, 2021 (version 2)
 Version of Record published: June 29, 2021 (version 3)
Copyright
This is an openaccess article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.