Primate prefrontal neurons signal economic risk derived from the statistics of recent reward experience
Abstract
Risk derives from the variation of rewards and governs economic decisions, yet how the brain calculates risk from the frequency of experienced events, rather than from explicit risk-descriptive cues, remains unclear. Here, we investigated whether neurons in dorsolateral prefrontal cortex process risk derived from reward experience. Monkeys performed a probabilistic choice task in which the statistical variance of experienced rewards evolved continually. During these choices, prefrontal neurons signaled the reward variance associated with specific objects (‘object risk’) or actions (‘action risk’). Crucially, risk was not derived from explicit, risk-descriptive cues but calculated internally from the variance of recently experienced rewards. Support-vector-machine decoding demonstrated accurate neuronal risk discrimination. Within trials, neuronal signals transitioned from experienced reward to risk (risk updating) and from risk to upcoming choice (choice computation). Thus, prefrontal neurons encode the statistical variance of recently experienced rewards, complying with formal decision variables of object risk and action risk.
https://doi.org/10.7554/eLife.44838.001
Introduction
Rewards vary intrinsically. The variation can be characterized by a probability distribution over reward magnitudes. Economists distinguish between risk, when probabilities are known, and ambiguity, when probabilities are only incompletely known. The variability of risky rewards can be quantified by the higher statistical ‘moments’ of probability distributions, such as variance, skewness or kurtosis (Figure 1A). The most frequently considered measure of economic risk is variance (D'Acremont and Bossaerts, 2008; Kreps, 1990; Markowitz, 1952; Rothschild and Stiglitz, 1970), although skewness and even kurtosis are also feasible risk measures that capture important components of variability (Burke and Tobler, 2011; D'Acremont and Bossaerts, 2008; Genest et al., 2016; Symmonds et al., 2010). Thus, among the different definitions of economic risk, variance constitutes the most basic form, and this study considers only variance as economic risk.
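To make the moment-based definitions concrete, the following Python sketch computes the mean, variance, skewness and excess kurtosis of a discrete reward distribution. It is a generic illustration of these statistics, not the study's analysis code, and the function name `reward_moments` is hypothetical.

```python
import numpy as np


def reward_moments(magnitudes, probs):
    """Return mean, variance, skewness and excess kurtosis of a discrete
    reward distribution given reward magnitudes and their probabilities."""
    x = np.asarray(magnitudes, dtype=float)
    p = np.asarray(probs, dtype=float)
    if x.shape != p.shape or not np.isclose(p.sum(), 1.0):
        raise ValueError("magnitudes and probabilities must align and sum to 1")
    mean = np.sum(p * x)
    var = np.sum(p * (x - mean) ** 2)
    if var == 0.0:
        # A sure reward has no spread; higher moments are undefined, report 0.
        return float(mean), 0.0, 0.0, 0.0
    skew = np.sum(p * (x - mean) ** 3) / var ** 1.5
    kurt = np.sum(p * (x - mean) ** 4) / var ** 2 - 3.0
    return float(mean), float(var), float(skew), float(kurt)


# A 50:50 gamble between 0 and 1 unit of reward: maximal variance, no skew.
print(reward_moments([0.0, 1.0], [0.5, 0.5]))
```

A binary reward with p = 0.2 has the same possible outcomes but a positive skew, which is why variance alone does not capture every aspect of a reward distribution.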
Existing neurophysiological studies on risk have used explicit, well-established informative cues indicating specific levels of risk in an unequivocal manner (Fiorillo et al., 2003; Lak et al., 2014; Ledbetter et al., 2016; McCoy and Platt, 2005; Monosov, 2017; Monosov and Hikosaka, 2013; O'Neill and Schultz, 2010; O'Neill and Schultz, 2013; Raghuraman and Padoa-Schioppa, 2014; Stauffer et al., 2014; White and Monosov, 2016). However, in daily life, outside of testing laboratories, animals are confronted with risky rewards without being overtrained on explicit, risk-descriptive cues; they need to estimate the risk themselves from experienced rewards in order to make economic decisions. Thus, the more natural way to address the inherently risky nature of rewards is to sample the occurrence of reward from experience in a continuous manner, integrate it over time, and compute risk estimates from that information.
The current study aimed to obtain a more representative view of neuronal risk processing by studying variance risk estimated from experience. To this end, we examined behavioral and neurophysiological data acquired in a probabilistic choice task (Herrnstein, 1961; Tsutsui et al., 2016) in which reward probabilities, and thus risk, changed continuously depending on the animal’s behavior, without being explicitly indicated by specific risk-descriptive cues. Similar to previous risk studies, experienced rewards following choices for specific objects or actions constituted external cues for risk estimation. The terms risk seeking and risk avoidance indicate that risk processing is subjective. Therefore, we estimated subjective risk from the recent history of the animal’s own choices and rewards, based on logistic regression; we investigated the objective risk derived from experimentally programmed reward probabilities only for benchmark tests, again without explicit risk-descriptive cues. The individually distinct risk attitudes reflect the fact that risk influences economic decisions (Holt and Laury, 2002; Stephens and Krebs, 1986; Weber and Milliman, 1997). Therefore, we followed concepts of decision-making based on competition between object values or action values (Deco et al., 2013; Sutton and Barto, 1998; Wang, 2008) and, by analogy, defined object risk as the risk attached to individual choice objects and action risk as the risk attached to individual actions for obtaining the objects. We studied individual neurons in the dorsolateral prefrontal cortex (DLPFC) of rhesus monkeys, where neurons are engaged in reward decisions (Barraclough et al., 2004; Donahue and Lee, 2015; Kennerley et al., 2009; Seo et al., 2012; Tsutsui et al., 2016; Watanabe, 1996). The results suggest that DLPFC neurons signal risk derived from internal estimates of ongoing reward experiences.
Results
Choice task and behavior
Two monkeys performed a probabilistic choice task (Figure 1B) in which they repeatedly chose between two visual objects (A and B) to obtain liquid rewards (Tsutsui et al., 2016). The matching behavior in this task has previously been reported (Tsutsui et al., 2016). Here, we briefly show the basic pattern of matching behavior across animals before addressing the novel question of how risk influenced behavior and neuronal activity. In the task, both options had the same, constant reward amount but specific, independently set base probabilities during blocks of typically 50–150 trials. Although each object had a set base probability, its instantaneous reward probability increased in each trial in which the object was not chosen and fell back to its base probability after the object had been chosen (Equation 1). Importantly, once reward probability had reached p=1.0, the reward remained available until the animal chose the object. Thus, reward probabilities changed depending on the animal’s choice, and the current, instantaneous level of reward probability was not explicitly cued. Accordingly, to maintain an estimate of reward probability, the animal would need to track its own choices and experienced rewards internally.
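Equation 1 is not reproduced in this section. A common way to implement such "baited" schedules, assumed here, is that an object left unchosen for n consecutive trials has instantaneous reward probability 1 − (1 − p_base)^(n+1): it equals the base probability right after a choice, climbs toward 1 while the object is skipped, and resets once the object is chosen, because a scheduled reward stays available until collected. The Python sketch below implements that assumed rule; both function names are hypothetical.

```python
def instantaneous_reward_prob(p_base, n_unchosen):
    """Probability that reward is available for an object with base
    probability p_base that was not chosen on the last n_unchosen trials
    (assumed baiting rule, not a transcription of Equation 1)."""
    if not 0.0 <= p_base <= 1.0 or n_unchosen < 0:
        raise ValueError("p_base must lie in [0, 1] and n_unchosen must be >= 0")
    return 1.0 - (1.0 - p_base) ** (n_unchosen + 1)


def reward_prob_trajectory(p_base, chosen):
    """Instantaneous reward probability on each trial for one object, given
    a sequence of booleans saying whether that object was chosen."""
    probs, n_unchosen = [], 0
    for was_chosen in chosen:
        probs.append(instantaneous_reward_prob(p_base, n_unchosen))
        # Choosing the object resets it to base probability; skipping it lets
        # the probability build up.
        n_unchosen = 0 if was_chosen else n_unchosen + 1
    return probs


print(reward_prob_trajectory(0.5, [False, False, True, False]))
```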
Under these conditions, an efficient strategy consists of repeatedly choosing the object with the higher base probability and choosing the alternative only when its instantaneous reward probability has exceeded the base probability of the currently sampled object (Corrado et al., 2005; Houston and McNamara, 1981). Aggregate behavior in such tasks usually conforms to the matching law (Herrnstein, 1961), which states that the ratio of choices to two alternatives matches the ratio of the number of rewards received from each alternative. Such behavior has been observed in monkeys (Corrado et al., 2005; Lau and Glimcher, 2005; Lau and Glimcher, 2008; Sugrue et al., 2004). Consistent with these previous studies, the animals allocated their choices proportionally to the relative object-reward probabilities (Figure 1C,D). Through their alternating choices, they detected uncued base-probability changes and adjusted their behavior accordingly (Figure 1E).
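The stay-or-switch strategy and its relation to matching can be illustrated with a small simulation. The Python sketch below assumes the baiting rule 1 − (1 − p_base)^(n+1) (an assumption, since Equation 1 is not shown here) and a deterministic version of the switching rule; it then compares the fraction of choices with the fraction of rewards obtained from object A, which the matching law predicts to be approximately equal. Parameter values and the function name are hypothetical.

```python
import numpy as np


def simulate_switching_rule(p_a, p_b, n_trials=10000, seed=0):
    """Simulate a baited two-object task in which the agent stays on the
    richer object and switches only when the leaner object's instantaneous
    reward probability exceeds the richer object's base probability.
    Returns (fraction of choices to A, fraction of rewards obtained from A)."""
    rng = np.random.default_rng(seed)
    base = {"A": p_a, "B": p_b}
    baited = {"A": False, "B": False}
    since_chosen = {"A": 0, "B": 0}
    rich, lean = ("A", "B") if p_a >= p_b else ("B", "A")
    n_choose_a = n_reward_a = n_reward = 0
    for _ in range(n_trials):
        # Each object holding no reward gets one with its base probability;
        # a scheduled reward stays until the object is chosen.
        for obj in base:
            if not baited[obj] and rng.random() < base[obj]:
                baited[obj] = True
        p_lean = 1.0 - (1.0 - base[lean]) ** (since_chosen[lean] + 1)
        choice = lean if p_lean > base[rich] else rich
        rewarded = baited[choice]
        baited[choice] = False
        for obj in base:
            since_chosen[obj] = 0 if obj == choice else since_chosen[obj] + 1
        n_choose_a += choice == "A"
        n_reward += rewarded
        n_reward_a += rewarded and choice == "A"
    return n_choose_a / n_trials, n_reward_a / max(n_reward, 1)


print(simulate_switching_rule(0.3, 0.1))
```

With p_a = 0.3 and p_b = 0.1, this rule settles into a fixed pattern of three choices of A followed by one choice of B, so 75% of choices go to A, and the fraction of rewards collected from A comes out close to that value.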
Thus, the animals’ behavior in the choice task corresponded well to the theoretical assumptions (Herrnstein, 1961; Houston and McNamara, 1981) and suggested that they estimated the current reward probabilities of the choice options well. On the basis of these choices, we derived specific risk measures that we used as regressors for neuronal responses in DLPFC.
Definition of risk measures
We used two measures of variance risk. The first, objective risk measure linked our study to previous work and provided a foundation for investigating subjective risk. In the choice task, the objective reward probability evolved continually from the animal’s choices (Equation 1). This characteristic allowed us to calculate objective risk in each trial as statistical variance derived only from the objective reward probability (Equation 2; reward amount was constant and identical for each option). The second, subjective measure of risk addressed the origin of the neuronal risk information in the absence of informative risk cues. Here, risk was derived directly and subjectively from the evolving statistical variance of recently experienced reward outcomes resulting from specific choices (Equations 5–8). We then investigated whether responses in DLPFC neurons associated these two different risk estimates with choice objects (‘object risk’) or with actions required for choosing these objects (‘action risk’), irrespective of the animal’s choice.
Neuronal coding of objective risk
The first, most basic risk measure derived risk as variance in each trial and for each choice object directly from the ‘inverted-U’ function (Figure 1A) of the true, objective, ‘physical’ reward probability (which depended in each trial on the animal’s previous choices; Equation 2). This risk measure derived from the programmed binary probability (Bernoulli) distribution on each trial that governed actual reward delivery and assumed no temporal decay in subjective perception, attention or memory of reward occurrence and risk. Risk as variance defined in this way increased between p=0.0 and p=0.5 and decreased thereafter, following an inverted-U function (Figure 1A). Thus, probability and risk were not one and the same measure; they correlated positively for the lower half of the probability range (p=0.0 to p=0.5) and inversely for the upper half of the probability range (p=0.5 to p=1.0); higher probability did not necessarily mean higher risk.
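For a reward of fixed magnitude m delivered with probability p (a Bernoulli distribution), the variance is m²p(1 − p), which traces the inverted U described above. The short Python sketch below computes this quantity; it is consistent with the description of Equation 2 but is not a transcription of it, and the function name is hypothetical.

```python
import numpy as np


def objective_risk(p, magnitude=1.0):
    """Variance of a Bernoulli reward of fixed magnitude delivered with probability p."""
    p = np.asarray(p, dtype=float)
    return magnitude ** 2 * p * (1.0 - p)


probs = np.linspace(0.0, 1.0, 101)
risk = objective_risk(probs)
print(probs[np.argmax(risk)])                    # peak of the inverted U
print(objective_risk(0.2), objective_risk(0.8))  # equal risk, different probability
```

The second print line shows why probability and risk are not interchangeable: p = 0.2 and p = 0.8 yield the same variance.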
Among 205 task-related DLPFC neurons, 102 neurons had activity related to the objective form of risk, defined as variance derived from the true probability (50% of task-related neurons; p<0.05 for object risk coefficient, multiple linear regression, Equation 3; 185 of 1222 significant task-related responses in different trial periods, 15%; task-relatedness assessed by Wilcoxon test with p<0.005, corrected for multiple comparisons). During the fixation periods, the activity of the neuron shown in Figure 2A reflected the risk for object B (Figure 2A,B; p=0.0172, multiple linear regression, Equation 3), whereas the coefficient for object-A risk was nonsignificant (p=0.281). Critically, the signal reflected the variance risk associated with object B and neither reward probability for object A or B (both p>0.36) nor choice, action or left-right cue position (all p>0.25). To further evaluate the object-specificity of the risk response, we classified the response using the angle of regression coefficients (Equation 4, see Materials and methods), which confirmed object risk coding rather than risk difference or risk sum coding of the two objects (Figure 2C). Thus, the neuron’s activity signaled the continually evolving true risk for a specific choice object. Across neuronal responses, classification based on the angle of regression coefficients (Equation 4) showed 195 significant risk responses in 89 neurons (p<0.05, F-test), of which 75 responses (39%, 47 neurons, Figure 2C) coded object risk rather than risk difference or risk sum, thus confirming the object-risk coding described above. In the population of DLPFC neurons, object-risk responses occurred in all task periods (Figure 2D), including the pre-choice periods, in time to inform decision-making.
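The angle-based classification can be sketched in Python as follows. The sketch assumes that each response has already passed the significance test (F-test) and that the plane spanned by the two risk coefficients is divided into eight equal 45° sectors centred on the axes and diagonals; the exact sector boundaries of Equation 4 are not given in this section, so they are an assumption here, as is the function name.

```python
import math


def classify_by_angle(beta_a, beta_b):
    """Classify a risk response from its regression coefficients for
    object-A risk (beta_a) and object-B risk (beta_b).

    Assumed scheme: eight 45-degree sectors centred on the axes and diagonals.
    Responses near an axis code risk for one object; responses near a
    diagonal code risk sum (same sign) or risk difference (opposite sign).
    """
    angle = math.degrees(math.atan2(beta_b, beta_a)) % 360.0
    sector = int(((angle + 22.5) % 360.0) // 45.0)  # 0..7, one per sector
    labels = ("object A risk", "risk sum", "object B risk", "risk difference")
    return angle, labels[sector % 4]


print(classify_by_angle(0.9, 0.1))   # close to the object-A axis
print(classify_by_angle(0.5, -0.5))  # on the A-minus-B diagonal
```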
We also tested for neuronal coding of objective action risk. Once the left-right position of the risky choice objects was revealed on each trial, the animals could assess the risk associated with leftward or rightward saccade actions. A total of 57 DLPFC neurons coded action risk in cue or post-cue task periods (p<0.05; multiple linear regression, Equation 4, action risk regressors). During the choice cue phase, the activity of the neuron in Figure 2E reflected the action risk for leftward saccades (p=0.041, multiple linear regression, Equation 4), whereas the coefficient for rightward saccade risk was nonsignificant (p=0.78). The signal reflected the variance risk associated with leftward actions but not the reward probability for left or right actions (both p>0.44) or the actual choice or left-right action (both p>0.34); in addition, it reflected the left-right cue position (p=0.0277).
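Because the left-right positions of the objects vary from trial to trial, the same object risk can attach to either saccade direction. A minimal Python sketch of this mapping, assuming that objective action risk on a trial is simply the Bernoulli variance of the object shown at each saccade target (the function name is hypothetical):

```python
def objective_action_risk(p_a, p_b, a_on_left, magnitude=1.0):
    """Return (left-saccade risk, right-saccade risk) on one trial.

    Object risk is the Bernoulli variance of each object's current reward
    probability; action risk is the risk of whichever object is displayed
    at that saccade target (assumed mapping)."""
    risk_a = magnitude ** 2 * p_a * (1.0 - p_a)
    risk_b = magnitude ** 2 * p_b * (1.0 - p_b)
    return (risk_a, risk_b) if a_on_left else (risk_b, risk_a)


print(objective_action_risk(0.5, 0.9, a_on_left=False))
```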
Taken together, a significant number of DLPFC neurons showed activity related to the true, objective risk associated with specific objects or actions. Although this basic risk measure accounted for the continually evolving risk levels in the choice task, it did not reflect the assumption that the animals’ risk estimates were likely subjective, owing to imperfect knowledge and memory of the true reward probabilities. Accordingly, we next defined a subjective risk measure, validated its relevance to the animals’ behavior, and tested its coding by DLPFC neurons.
Subjective risk: definition and behavior
Whereas our first, objective risk measure concerned the variance derived from objective (true, programmed) reward probability, our second risk measure assumed imperfect, temporally degrading assessment of recently experienced rewards. To obtain this measure, we established subjective weights for recently experienced rewards using logistic regression on the animal’s reward and choice history (Equation 5), following standard procedures for analyzing subjective decision variables in similar choice tasks (Corrado et al., 2005; Lau and Glimcher, 2005). These weights (Figure 3A,B) revealed declining influences of past rewards and past choices on the animal’s current-trial choice, in line with previous matching studies (Corrado et al., 2005; Lau and Glimcher, 2005). On the basis of this result, the assessment of subjective variance risk in each trial considered only data from the preceding 10 trials.
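The history-weighting step can be sketched in Python as a logistic regression of the current choice on lagged reward and choice differences, in the style of Lau and Glimcher (2005). The exact regressors of Equation 5 are not reproduced here, so the coding below (reward and choice differences between objects A and B, a Newton-Raphson fit, and a toy simulated agent) is an assumption for illustration; all function names are hypothetical.

```python
import numpy as np


def lagged(x, n_lags):
    """Matrix whose columns are x[t-1], ..., x[t-n_lags] for t = n_lags..T-1."""
    T = len(x)
    return np.column_stack([x[n_lags - k:T - k] for k in range(1, n_lags + 1)])


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression with intercept (Newton-Raphson)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        hessian = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hessian, X1.T @ (y - p))
    return beta


def history_weights(choice_a, reward, n_lags=10):
    """Weights of past rewards and past choices on the current choice of A.
    choice_a: 1 if A was chosen, else 0; reward: 1 if the trial was rewarded."""
    choice_a = np.asarray(choice_a, dtype=float)
    c_diff = 2.0 * choice_a - 1.0                      # +1 for A, -1 for B
    r_diff = np.asarray(reward, dtype=float) * c_diff  # reward from A minus reward from B
    X = np.column_stack([lagged(r_diff, n_lags), lagged(c_diff, n_lags)])
    beta = fit_logistic(X, choice_a[n_lags:])
    return beta[1:n_lags + 1], beta[n_lags + 1:]


def simulate_history_agent(n_trials=5000, n_lags=5, seed=1):
    """Toy agent whose choices depend on exponentially decaying reward history."""
    rng = np.random.default_rng(seed)
    true_w = 1.5 * np.exp(-np.arange(n_lags) / 2.0)
    choice_a = np.zeros(n_trials)
    reward = np.zeros(n_trials)
    r_diff = np.zeros(n_trials)
    for t in range(n_trials):
        past = r_diff[max(0, t - n_lags):t][::-1]      # most recent trial first
        logit = np.dot(true_w[:len(past)], past)
        choice_a[t] = rng.random() < 1.0 / (1.0 + np.exp(-logit))
        reward[t] = rng.random() < 0.4
        r_diff[t] = reward[t] * (1.0 if choice_a[t] else -1.0)
    return choice_a, reward


choices, rewards = simulate_history_agent()
w_reward, w_choice = history_weights(choices, rewards, n_lags=5)
print(np.round(w_reward, 2))
```

On this simulated agent, the fitted reward weights decline with lag; truncating the history where the weights approach zero is one way to motivate a finite window such as the ten trials used here.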
For behavioral and neuronal comparisons between subjective risk and value, we first defined ‘object value’ as the recency-weighted reward value for a specific choice object (Tsutsui et al., 2016). We followed previous studies of matching behavior (Lau and Glimcher, 2005) that distinguished two influences on value: the history of recent rewards and the history of recent choices. The first value component, related to reward history, was estimated by the mean of subjectively weighted reward history over the past ten trials (Figure 3C, dashed blue curve, Equation 6) and provided a useful comparison for our subjective risk measure, which also derived from reward history (described in the next paragraph). To estimate a comprehensive measure of object value for behavioral and neuronal analysis, we incorporated the additional effect of choice history on value, which is distinct from reward history as shown in previous studies of matching behavior (Lau and Glimcher, 2005). Thus, we estimated object value based on both subjectively weighted reward history and subjectively weighted choice history (Equation 7); this constituted our main value measure for behavioral and neuronal analyses. (We consider distinctions between reward and choice history and their potential influence on risk in the Discussion.)
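A minimal Python sketch of the two value components, assuming that Equation 6 is a weighted mean of the object's recent rewards and that Equation 7 adds a weighted sum of the object's recent choices, as in the linear predictor of a history-based logistic choice model. The exact forms and weights are not given in this section, so the functions and numbers below are hypothetical.

```python
import numpy as np


def reward_history_value(rewards_obj, reward_weights):
    """Weighted mean of an object's recent rewards (most recent trial first)."""
    r = np.asarray(rewards_obj, dtype=float)
    w = np.asarray(reward_weights, dtype=float)
    return float(np.sum(w * r) / np.sum(w))


def object_value(rewards_obj, choices_obj, reward_weights, choice_weights):
    """Value as weighted reward history plus weighted choice history
    (assumed linear combination)."""
    return float(np.dot(reward_weights, rewards_obj) + np.dot(choice_weights, choices_obj))


w_r = [0.5, 0.3, 0.2]   # hypothetical reward weights for the last three trials
w_c = [0.1, 0.05, 0.0]  # hypothetical choice weights
print(reward_history_value([1, 0, 0], w_r))           # rewarded on the last trial only
print(object_value([1, 0, 0], [1, 1, 0], w_r, w_c))
```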
We next calculated for each trial the subjective measure of ‘object risk’ as the statistical variance of the distribution of rewards of the preceding ten trials, separately for each object (Figure 3C, dashed magenta curve; Equation 8). Specifically, subjective object risk was derived from the sum of the weighted, squared deviations of object-specific rewards from the mean of the object-specific reward distribution over the past ten trials (Equation 8); subjective weighting of the squared deviations with empirically estimated reward weights (Figure 3A, Equation 5) accounted for declining influences of more remote past trials on behavior. Object risk defined in this manner varied continuously as a function of past rewards and past choices as follows. When blockwise reward probability was low, each reward increased both object risk and value; without further rewards, both risk and value decreased gradually over subsequent trials (Figure 3C; compare blue and magenta curves; note that the blue curve shows the effect of reward history on value). When reward probability was high, object risk increased when a reward was omitted (which drove the instantaneous probability toward the center of the inverted-U function); with further rewards, object risk decreased gradually over subsequent trials (driving the instantaneous probability toward the high end of the inverted-U function). Risk was highest in medium-probability blocks with alternating rewarded and unrewarded trials (variations around the peak of the inverted-U function).
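The weighted-variance idea behind Equation 8 can be sketched in Python. The sketch assumes that each of the last ten trials contributes a 1 (rewarded) or 0 (unrewarded) for the object in question and that the recency weights are normalized to sum to one; how unchosen trials and normalization enter Equation 8 is not specified in this section, and the exponential weights merely stand in for the empirically estimated weights of Equation 5. The function name is hypothetical.

```python
import numpy as np


def subjective_object_risk(rewards_obj, weights):
    """Weighted variance of an object's rewards over recent trials.

    rewards_obj: 1 (rewarded) or 0 for each past trial, most recent first.
    weights: non-negative recency weights of the same length.
    """
    r = np.asarray(rewards_obj, dtype=float)
    w = np.asarray(weights, dtype=float)
    if r.shape != w.shape:
        raise ValueError("rewards and weights must have the same length")
    w = w / w.sum()                      # normalize so the weights sum to 1
    mean = np.sum(w * r)                 # weighted mean reward
    return float(np.sum(w * (r - mean) ** 2))


# Hypothetical exponential recency weights for the last ten trials.
weights = np.exp(-np.arange(1, 11) / 3.0)
no_reward = np.zeros(10)
one_reward = np.r_[1.0, np.zeros(9)]
alternating = np.tile([1.0, 0.0], 5)
for name, hist in [("none", no_reward), ("one", one_reward), ("alternating", alternating)]:
    print(name, round(subjective_object_risk(hist, weights), 3))
```

Under these assumptions, a single reward after a run of omissions raises risk from zero, an unbroken run of rewards or of omissions drives it back to zero, and alternating outcomes give values near the maximum of 0.25, matching the qualitative pattern described above.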
We next assessed the animals’ risk attitude by testing the influence of subjective risk on the animals’ choices. Choice probability increased monotonically with increasing difference in object value (Figure 3D, ΔValue, derived from the sum of weighted reward and choice histories, Equation 7). Importantly, risk had an additional influence on object choice: with increasing risk difference between objects (ΔRisk, Equation 8), choice probability consistently increased for the higher-risk object even at a constant value-difference level (Figure 3D, yellow and orange bars). The more frequent choice of the riskier object at the same value level indicated risk seeking. A specific logistic regression (Equation 9, different from the logistic regression estimating the subjective weights for past trials) confirmed the risk-seeking attitude by a significant positive weight (i.e. beta) of risk on the animals’ choices, independent of value (Figure 3E,F). When using a subset of trials with minimal value difference for this logistic regression, we again found a significant influence of risk on choices (Figure 3E, inset). Formal comparisons favored this choice model based on subjective object value (derived from both weighted reward history and choice history) and subjective object risk over several alternatives, including models without risk, models with different value definitions, models based on objective (true) reward probabilities and risks, and variants of reinforcement learning models (see Table 1). Our subjective risk measure also showed specific relationships to saccadic reaction times, alongside value influences on reaction times (Figure 3—figure supplement 1). These data confirmed the previously observed positive attitudes of macaques towards objective risk (Genest et al., 2016; Lak et al., 2014; McCoy and Platt, 2005; O'Neill and Schultz, 2010; Stauffer et al., 2014) and validated our subjective, experience-based object-risk measure as a regressor for neuronal activity.
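The logic of the risk-attitude test can be illustrated with a logistic choice rule in Python. The coefficients are hypothetical and the functional form is a generic logistic model, not a transcription of Equation 9: with a positive risk weight, the probability of choosing the riskier object rises even when the value difference is held fixed, which is the signature of risk seeking described above.

```python
import math


def p_choose_a(delta_value, delta_risk, b0=0.0, b_value=4.0, b_risk=2.0):
    """Logistic choice probability for object A given value and risk
    differences (A minus B). A positive b_risk corresponds to risk seeking."""
    z = b0 + b_value * delta_value + b_risk * delta_risk
    return 1.0 / (1.0 + math.exp(-z))


# Same value difference, increasing risk difference in favour of A.
for d_risk in (0.0, 0.1, 0.2):
    print(d_risk, round(p_choose_a(0.05, d_risk), 3))
```

Flipping the sign of `b_risk` reverses the ordering, which is how a risk-averse animal would appear in the same analysis.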
Neuronal coding of subjective risk associated with choice objects
The subjective variance risk for specific choice objects, derived from reward history, was coded in 95 of the 205 task-related DLPFC neurons (46%; p<0.05 for object-risk regressors, multiple linear regression, Equation 10). These 95 neurons showed 153 object-risk-related responses (among 1222 task-related responses, 13%; 28 of the 153 responses coded risk for both objects, and 125 responses only for one object). Importantly, object-risk coding in these neurons was not explained by object value, which was included as a covariate in the regression (shared variance between risk and value regressors: R² = 0.148 across sessions). A distinct object-choice regressor significantly improved the regression for only 12 responses (p<0.05, partial F-test, Equation 10), suggesting most object-risk responses (141/153, 92%) were choice-independent (p=1.8 × 10⁻²⁵, z-test). A subset of 66 of 153 risk responses (43%) fulfilled our strictest criteria for coding object risk: they coded risk before choice, only for one object, and irrespective of the actual choice, thus complying with requirements for coding a decision variable analogous to object value.
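The choice-independence test compares a regression with and without an object-choice regressor. A generic partial F-test for nested least-squares models can be sketched in Python as follows; the simulated firing rates and coefficients are hypothetical, and the model is not the study's Equation 10.

```python
import numpy as np


def sse(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def partial_f(y, X_reduced, X_extra):
    """F statistic for adding X_extra to a model that already contains
    X_reduced (an intercept is added to both models)."""
    n = len(y)
    Xr = np.column_stack([np.ones(n), X_reduced])
    Xf = np.column_stack([Xr, X_extra])
    q = Xf.shape[1] - Xr.shape[1]  # number of added regressors
    df = n - Xf.shape[1]           # residual degrees of freedom of the full model
    sse_r, sse_f = sse(Xr, y), sse(Xf, y)
    return ((sse_r - sse_f) / q) / (sse_f / df), q, df


rng = np.random.default_rng(0)
n = 200
risk, value = rng.random(n), rng.random(n)
choice = (rng.random(n) < 0.5).astype(float)
rate_no_choice = 5 + 4 * risk + 2 * value + rng.normal(0, 1, n)
rate_with_choice = rate_no_choice + 5 * choice
F0, q, df = partial_f(rate_no_choice, np.column_stack([risk, value]), choice)
F1, _, _ = partial_f(rate_with_choice, np.column_stack([risk, value]), choice)
print(round(F0, 2), round(F1, 2), q, df)
# A p-value would follow from the F distribution, e.g. scipy.stats.f.sf(F, q, df).
```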
The activity of the neuron in Figure 4A–B illustrates the response pattern of an object-risk neuron. During fixation, the activity of the neuron in Figure 4A reflected the current risk estimate for object A (p=0.0316, multiple linear regression, Equation 10) but not the risk for object B (p=0.69) or object values (both p>0.22). True to the concept of a decision input, the risk signal occurred well before the monkey made its choice (in time to inform decision-making) and it was not explained by current-trial choice, cue position, or action (all p>0.46). Classification based on the angle of regression coefficients confirmed coding of object risk, rather than relative risk (Figure 4C). Thus, the neuron’s activity signaled the continually evolving subjective risk estimate for a specific choice object and may constitute a suitable input for decision mechanisms under risk.
Classification of neuronal responses based on the angle of regression coefficients (Equation 4) showed 159 significant risk responses in 80 neurons (p<0.05, F-test), of which 83 responses (52%, 53 neurons, Figure 4C) coded object risk rather than risk difference or risk sum. This result confirmed that a substantial number of neurons encoded risk for specific objects; in addition, other neurons encoded risk signals related to both objects as risk difference or risk sum, similar to encoding of value difference and value sum in previous studies (Cai et al., 2011; Tsutsui et al., 2016; Wang et al., 2013). Object risk signals occurred with high prevalence in early trial epochs, timed to potentially inform decision-making (Figure 4D). They were recorded in upper and lower principal sulcus banks, confirmed by histology (Figure 4—figure supplement 1).
Activity of these object-risk neurons conformed to key patterns that distinguish risk coding from value coding. Their population activity followed the typical inverted-U relationship with reward value (cf. Figure 1A) by increasing as a function of value within the low-value range and decreasing with value within the high-value range (Figure 4E). Accordingly, reward outcomes should increase or decrease risk-related activity depending on whether the additional reward increased or decreased the variance of recent rewards. This feature of reward-variance risk was implicit in the formulation of our risk regressor (Figure 3C, magenta curve) and is also illustrated in the population activity in Figure 4F. Following reward receipt on trial N−1 (blue curves), activity of risk neurons on the subsequent trial increased only when the reward led to an increase in reward variance (Figure 4F magenta curve, cf. Equation 8; ascending slope in Figure 4E). By contrast, when reward receipt led to decreased reward variance, neuronal activity on the subsequent trial also decreased (Figure 4F green curve; descending slope in Figure 4E). Thus, activity of risk neurons followed the evolving statistical variance of rewards, rather than reward probability.
Control analyses for subjective object-risk coding
Further controls confirmed subjective object-risk coding in DLPFC. Sliding-window regression without preselecting responses for task-relatedness identified similar numbers of object-risk neurons as our main fixed-window analysis (82/205 neurons, 40%; Equation 11). This sliding-window regression also confirmed that object-risk signals were not explained by past-trial reward, choice, or reward × choice history, which were regression covariates.
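A sliding-window regression of this kind can be sketched in Python: spike counts are summed in a window that is stepped across the trial, and each window is regressed on trial-wise variables. Window width, step size, regressors and the simulated data below are hypothetical choices for illustration, not the settings of Equation 11; history covariates such as past-trial reward or choice would simply be additional columns of `regressors`.

```python
import numpy as np


def sliding_window_regression(binned, regressors, win=20, step=5):
    """t-statistics of each regressor for spike counts in sliding windows.

    binned: trials x time-bins array of spike counts.
    regressors: trials x k array of trial-wise variables (e.g. risk, value).
    Returns window start bins and a windows x k array of t-values.
    """
    n_trials, n_bins = binned.shape
    X = np.column_stack([np.ones(n_trials), regressors])
    xtx_inv = np.linalg.inv(X.T @ X)
    starts = np.arange(0, n_bins - win + 1, step)
    tvals = np.empty((len(starts), X.shape[1] - 1))
    for i, s in enumerate(starts):
        y = binned[:, s:s + win].sum(axis=1)  # spike count in this window
        beta = xtx_inv @ X.T @ y
        resid = y - X @ beta
        sigma2 = resid @ resid / (n_trials - X.shape[1])
        se = np.sqrt(sigma2 * np.diag(xtx_inv))
        tvals[i] = beta[1:] / se[1:]           # skip the intercept
    return starts, tvals


rng = np.random.default_rng(2)
n_trials, n_bins = 300, 100
risk, value = rng.random(n_trials), rng.random(n_trials)
rate = np.full((n_trials, n_bins), 0.05)
rate[:, 40:60] += 0.2 * risk[:, None]          # risk signal only in bins 40-59
spikes = rng.poisson(rate)
starts, t = sliding_window_regression(spikes, np.column_stack([risk, value]))
print(starts[np.argmax(t[:, 0])], round(t[:, 0].max(), 1))
```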
Because our behavioral model estimated risk over multiple past trials, we also tested whether history variables from the past two trials could explain object-risk coding (Equation 12). An extended regression identified some responses that reflected nonlinear interactions between rewards over two consecutive past trials (Figure 4—figure supplement 2A); these responses might contribute to risk estimation by detecting changes in reward rate. However, they were rare and did not explain our main finding of object-risk coding (Figure 4—figure supplement 2B; 99 risk neurons from Equation 12).
Varying the integration-time windows for risk estimation (using different exponential weighting functions and integration up to 15 past trials) resulted in some variation in the number of identified risk neurons but did not affect our main finding of risk coding in DLPFC neurons (Figure 4—figure supplement 3A).
A direct comparison of objective and subjective risk showed that neuronal activity tended to be better explained by subjective risk. We compared the amount of variance explained by both risk measures when fitting separate regressions. The distributions of partial-R² values were significantly different between risk measures (Figure 4—figure supplement 3B, p=0.0015, Kolmogorov-Smirnov test), attesting to the neuronal separation of these variables. Specifically, subjective risk explained significantly more variance in neuronal responses compared to objective risk (p=0.0406, Wilcoxon test). When both risk measures were included in a stepwise regression model (Equation 13), and thus competed to explain variance in neuronal activity, we identified more neurons related to subjective risk than to objective risk (107 compared to 83 neurons, Figure 4—figure supplement 3C), of which 101 neurons were exclusively related to subjective risk but not objective risk (shared variance between the two risk measures across sessions: R² = 0.111 ± 0.004, mean ± s.e.m.).
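Partial R² for one regressor, given the others, can be computed by comparing residual sums of squares with and without that regressor. The Python sketch below uses simulated data in which activity depends on one of two candidate risk regressors; all variables are hypothetical, and this generic definition is not necessarily the exact computation used in the study. Per-neuron values like these could then be compared across the two risk measures with standard tests such as `scipy.stats.ks_2samp` or `scipy.stats.wilcoxon`.

```python
import numpy as np


def _sse(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def partial_r2(y, X_other, x_target):
    """Fraction of the residual variance left by X_other that x_target explains."""
    n = len(y)
    X_red = np.column_stack([np.ones(n), X_other])
    X_full = np.column_stack([X_red, x_target])
    sse_red, sse_full = _sse(X_red, y), _sse(X_full, y)
    return (sse_red - sse_full) / sse_red


rng = np.random.default_rng(3)
n = 400
value = rng.normal(size=n)
risk_subj, risk_obj = rng.normal(size=n), rng.normal(size=n)  # simulated regressors
rate = 2.0 * risk_subj + 0.5 * value + rng.normal(0, 1, n)
print(round(partial_r2(rate, value, risk_subj), 2),
      round(partial_r2(rate, value, risk_obj), 2))
```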
We also considered alternative, more complex definitions of subjective risk that incorporated either weighted reward history or both weighted reward and choice history in the risk calculation. These alternative definitions yielded identical or only slightly higher numbers of identified risk neurons compared to our main risk definition (Figure 4—figure supplement 4; less than 5% variation in identified neurons). We therefore focused on our main risk definition (Equation 8), which was simpler and more conservative as it incorporated fewer assumptions.
Finally, we examined effects of potential nonstationarity of neuronal activity (Elber-Dorozko and Loewenstein, 2018) by including a first-order autoregressive term in Equation 10. This resulted in 88 identified risk neurons (compared to 95 neurons in our original analysis). In a further test, we subtracted the activity measured in a control period (at trial start) of the same trial before performing the regression analysis; this procedure should remove effects due to slowly fluctuating neuronal activities. This analysis identified 56 neurons with activity related to risk (note that the control period itself was excluded from this analysis; our original analysis without the control period yielded 81 risk neurons).
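Both nonstationarity controls can be sketched in Python: adding the previous trial's activity as a regressor (a first-order autoregressive term) and subtracting same-trial control-period activity before regression. The simulated drift, coefficients and function names are hypothetical, and the regression is a simplified stand-in for Equation 10.

```python
import numpy as np


def fit_with_ar1(activity, risk):
    """OLS of trial-wise activity on risk plus previous-trial activity.
    Returns (intercept, risk coefficient, AR(1) coefficient)."""
    y = activity[1:]
    X = np.column_stack([np.ones(len(y)), risk[1:], activity[:-1]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def subtract_control_period(task_counts, control_counts):
    """Remove slow fluctuations by subtracting same-trial control-period activity."""
    return np.asarray(task_counts, dtype=float) - np.asarray(control_counts, dtype=float)


rng = np.random.default_rng(4)
n_trials = 2000
risk = rng.random(n_trials)
activity = np.zeros(n_trials)
for t in range(1, n_trials):
    # Activity carries over from the previous trial (slow drift) and tracks risk.
    activity[t] = 0.8 * activity[t - 1] + 2.0 * risk[t] + rng.normal()
print(np.round(fit_with_ar1(activity, risk), 2))
```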
Taken together, object-risk signals reflecting our subjective risk measure occurred in significant numbers of DLPFC neurons and were robust to variation in statistical modeling.
Neuronal coding of subjective risk associated with actions
Neurons in DLPFC process reward values not only for objects (Tsutsui et al., 2016) but also for actions (Khamassi et al., 2015; Seo et al., 2012). Accordingly, we derived subjective action risk from the experienced variance of recent rewards related to specific actions. As our task varied reward probability for particular objects independently of reward probability for particular actions, object risk and action risk showed only low intercorrelation (R² = 0.153). Among 205 task-related neurons, 90 (44%) coded action risk (148 of 1222 task-related responses, 12%; Equation 14). A subset of 77 of 148 action-risk signals (52%) fulfilled our strictest criteria for action risk: they coded risk before the saccadic choice, only for one action, and irrespective of the actual choice, thus complying with requirements for coding a decision variable analogous to action value.
The fixation-period activity of the neuron in Figure 5A signaled the risk associated with rightward saccades, reflecting the variance of rewards that resulted from recent rightward saccades (Figure 5A; p=0.0233, multiple linear regression, Equation 14). The neuronal response was linearly related to risk for rightward but not leftward saccades and failed to correlate with action choice, object choice or action value (all p>0.11; Figure 5B). Classification based on the angle of regression coefficients confirmed the designation as an action-risk signal (Figure 5C). Thus, the neuron’s activity signaled the continually evolving subjective risk estimate for a specific action.
Classification of responses based on the angle of regression coefficients (Equation 4) showed 149 significant risk responses in 90 neurons (p<0.05, F-test), of which 71 responses (48%, 56 neurons) coded action risk rather than risk difference or risk sum (Figure 5C). This result confirmed that a substantial number of neurons encoded risk for specific actions, in addition to neurons encoding risk sum or difference. Action-risk signals occurred throughout all trial periods (Figure 5D). Adding an action-choice regressor improved the regression model in only 15 of 148 risk responses (p<0.05, partial F-test). Thus, most responses (133/148, 90%) coded action risk without additional action-choice coding (p=3.2 × 10⁻²², z-test for dependent samples). Activity of action-risk neurons followed the typical inverted-U relationship with reward value (Figure 5E). Reward increased the activity of these risk neurons only when the reward increased current reward variance, but decreased neuronal activity when it led to decreased reward variance (Figure 5F). Thus, activity of action-risk neurons followed the evolving statistical variance of rewards.
Object risk and action risk were often coded by distinct neurons. Separate multiple regression analyses (Equations 10 and 14) revealed that 43 of the 205 task-related neurons (21%) encoded object risk but not action risk, and 38 of the 205 task-related neurons (19%) encoded action risk but not object risk. A stepwise regression on both object risk and action risk (Equation 15) resulted in 55 neurons encoding object risk but not action risk (27%) and 38 neurons encoding action risk but not object risk (19%, Figure 4—figure supplement 3C). Controlling for nonstationarity of neuronal responses, we identified 83 action-risk neurons when including a first-order autoregressive term and 56 neurons when subtracting neuronal activity at trial start. Neurons encoding object risk and action risk were intermingled without apparent anatomical clustering in DLPFC (Figure 4—figure supplement 1), similar to previous studies that failed to detect clustering of object-selective and location-selective neurons (Everling et al., 2006).
Population decoding of object risk and action risk
To quantify the precision with which downstream neurons could read risk information from DLPFC neurons, we used previously validated population-decoding techniques, including nearest-neighbor and linear support-vector-machine classifiers (Grabenhorst et al., 2012; Tsutsui et al., 2016). We subjected all 205 DLPFC neurons to this analysis and did not preselect risk neurons (‘unselected neurons’). We grouped trials according to terciles of object risk and action risk and performed classification based on low vs. high terciles (see Materials and methods).
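The tercile-based decoding scheme can be sketched in Python with a leave-one-out nearest-neighbor classifier. The simulated population (linear risk tuning plus Gaussian noise, with all neurons treated as recorded on the same trials) and all names and parameters are assumptions for illustration; a linear support vector machine, for example scikit-learn's `LinearSVC`, could replace the nearest-neighbor step.

```python
import numpy as np


def tercile_labels(variable):
    """Label trials 0 (lowest tercile), 1 (highest tercile) or -1 (middle, unused)."""
    lo, hi = np.quantile(variable, [1 / 3, 2 / 3])
    labels = np.full(len(variable), -1)
    labels[variable <= lo] = 0
    labels[variable >= hi] = 1
    return labels


def loo_nearest_neighbor_accuracy(population, labels):
    """Leave-one-out 1-nearest-neighbor decoding of low vs. high terciles.
    population: trials x neurons matrix of activity."""
    keep = labels >= 0
    X, y = population[keep], labels[keep]
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a trial may not serve as its own neighbor
    predicted = y[np.argmin(dist, axis=1)]
    return float(np.mean(predicted == y))


rng = np.random.default_rng(5)
n_trials, n_neurons = 300, 40
risk = rng.random(n_trials)
slopes = rng.choice([-3.0, 3.0], size=n_neurons)  # each neuron's risk sensitivity
activity = risk[:, None] * slopes[None, :] + rng.normal(size=(n_trials, n_neurons))
labels = tercile_labels(risk)
print(loo_nearest_neighbor_accuracy(activity, labels))
print(loo_nearest_neighbor_accuracy(activity, rng.permutation(labels)))  # chance control
```

Shrinking the slopes or the number of neurons in this toy population lowers accuracy, in line with the dependence on population size and individual risk sensitivity described below.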
We successfully decoded object risk and action risk from the population of 205 DLPFC neurons, with accuracies of up to 85% correct in pre-choice trial periods (Figure 6A,B). Decoding accuracy increased as a function of the number of neurons in the decoding sample (Figure 6A). Both object risk and action risk were coded with good accuracy across task periods, although object risk was coded significantly more accurately than action risk in several periods (Figure 6B). Decoding from randomly sampled small subsets of neurons (N = 20 per sample) showed that risk-decoding accuracy depended on individual neurons’ risk sensitivities (standardized regression slopes; Figure 6C). Decoding from specifically defined subsets showed that even small numbers of individually significant risk neurons enabled accurate risk decoding (Figure 6D). However, individually significant object-risk neurons carried little information about action risk and individually significant action-risk neurons carried little information about object risk (Figure 6D), attesting to the neuronal separation of object risk and action risk in DLPFC. Decoding of risk from neuronal responses remained significantly above chance in control analyses in which we held constant the value of other task-related variables including object choice, action and cue position (Figure 6E).
Taken together, unselected population activity carried accurate codes for object risk and action risk. These neuronal population codes depended on population size and individual neurons’ risk sensitivities.
Dynamic integration of risk with reward history, value and choice in single neurons
Neurons often signaled object risk irrespective of other factors. However, over the course of a trial, many neurons dynamically integrated risk with behaviorally important variables in specific ways that were predicted on theoretical grounds. We assessed these coding dynamics with a sliding-window regression (Equation 11), which also served to confirm the robustness of our main fixed-window analysis reported above.
Our main, subjective risk measure was derived from the history of recently received rewards (Equation 8). Consistent with this concept, neurons often combined object-risk signals with information about rewards or choices from previous trials (‘history’ variables). Early in the trial, the neuron in Figure 7A signaled whether or not reward had been received on the last trial following the choice of a particular object. This signal was immediately followed by an explicit object-risk signal, reflecting the updated, current-trial risk level given the outcome of the preceding trial. In total, 44 neurons showed such joint coding of reward-choice history variables and explicit object risk (54% of 82 risk-coding neurons from sliding-window regression, 21% of 205 recorded neurons; Equation 11; Figure 7B). By dynamically coding information about recent rewards alongside explicit object-risk signals, these DLPFC neurons seem suited to contribute to internal risk calculation from experience.
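Equation 8 is not reproduced in this section. As a hypothetical sketch only, assuming that subjective risk is the population variance of the reward amounts (0 or 0.7 ml) obtained over a fixed window of the last W choices of an object, the trial-by-trial updating of risk after each outcome could look as follows. The window length W = 10, the restriction to chosen trials and the function name are assumptions, not the authors' parameters.

```python
from statistics import pvariance

def windowed_risk(outcomes, window=10):
    """Population variance of the last `window` reward amounts for one object.

    outcomes: reward amounts (0.0 or 0.7 ml) from the trials on which the
    object was chosen, oldest first. The window length is a hypothetical choice.
    """
    recent = outcomes[-window:]
    if len(recent) < 2:
        return 0.0  # too little experience to estimate variability
    return pvariance(recent)

# Hypothetical history: alternating rewarded and unrewarded choices of object A
history_a = [0.7, 0.0] * 5
print(round(windowed_risk(history_a), 4))   # → 0.1225, maximal for a 50% rate

# After a run of rewards, the updated risk estimate for object A drops
history_a += [0.7] * 6
print(round(windowed_risk(history_a), 4))   # → 0.0784
```

Using the population rather than the sample variance keeps the estimate on the same scale as the Bernoulli variance used for objective risk.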
According to the mean-variance approach of finance theory (D'Acremont and Bossaerts, 2008; Markowitz, 1952), the integration of expected value and risk into utility is thought to underlie behavioral preferences. In agreement with this basic concept, some DLPFC neurons dynamically combined object-risk signals with object-value signals (34 neurons, 41% of 82 risk-coding neurons, 17% of 205 neurons; Equation 11; Figure 7C,D). The neuron in Figure 7C showed overlapping object-value and object-risk signals early in trials during the fixation period, in time to inform object choice on the current trial. This result is potentially consistent with the integration of risk and value into utility. Supplementary analyses (Figure 7—figure supplement 2, described below) provided further evidence that some individual neurons integrated risk and value into utility-like signals (although formal confirmation of utility coding would require additional behavioral testing).
If neuronal risk signals in DLPFC contributed to decisions, the activity of individual neurons might reflect the forward information flow predicted by computational decision models (Deco et al., 2013; Grabenhorst et al., 2019; Wang, 2008), whereby reward and risk evaluations precede choice. Thus, object risk, as an important decision variable, and the resulting object choice should be jointly represented by neurons during decision-making. Indeed, activity in some DLPFC neurons dynamically combined object risk with the choice the animal was going to make (29 neurons, 35% of 82 risk-coding neurons, 14% of 205 recorded neurons; Equation 11; Figure 7E,F). At the time of choice, the neuron in Figure 7E signaled the risk of a specific object moments before it signaled the object choice for that trial, consistent with theoretically predicted transformations of risk and value signals into choice. Comparing the coding latencies for different variables across the population of DLPFC neurons, signals for reward history, object risk and object value arose significantly earlier than choice signals (p<0.0001, rank-sum tests; Figure 7G).
The percentages of neurons coding specific pairs of variables were not significantly different from those expected given the probabilities of neurons coding each individual variable (history and risk: χ² = 1.58, p=0.2094; value and risk: χ² = 3.54, p=0.0599; choice and risk: χ² = 0.845, p=0.358). We also tested for relationships in the coding scheme (measured by signed regression coefficients) among neurons with joint risk and choice coding or joint risk and value coding. Across neurons, there was no significant relationship between the regression coefficients (standardized slopes) for the different variables (Figure 7H). This suggested that, while some neurons used corresponding coding schemes for these variables (risk and choice, risk and value), other neurons used opposing coding schemes (see Discussion).
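The comparison of observed joint coding with the count expected from independent coding can be sketched as a Pearson chi-square test on a 2 × 2 table (neuron codes variable X: yes/no × neuron codes risk: yes/no), in which the expected count in each cell is the product of the two marginal proportions times the number of neurons. This is a minimal sketch assuming a test without continuity correction and one degree of freedom; under that assumption the p-values reported above are approximately recovered from the reported χ² statistics. The function names and the cell counts in the example are hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows: neuron codes variable X (yes / no); columns: neuron codes risk (yes / no).
    """
    n = a + b + c + d
    cells = [(a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d)]
    chi2 = 0.0
    for observed, row_total, col_total in cells:
        expected = row_total * col_total / n   # count expected under independence
        chi2 += (observed - expected) ** 2 / expected
    return chi2

def chi2_pvalue_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical counts (not the study's data) for 205 neurons
stat = chi2_2x2(20, 30, 40, 115)
print(round(stat, 3), round(chi2_pvalue_df1(stat), 4))

# Mapping the reported statistics onto one-degree-of-freedom p-values
for reported in (1.58, 3.54, 0.845):
    print(reported, round(chi2_pvalue_df1(reported), 4))
```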
Overall, the majority of DLPFC neurons coded task-related variables in combination with other variables (Figure 7I). Thus, ‘pure’ coding of any given variable, including object risk, was rare in DLPFC, and many neurons dynamically combined these signals with one or more additional variables (z-tests for dependent samples comparing the proportions of joint and pure coding: p < 1.6 × 10⁻¹³ for all variables in Figure 7I). In addition to the risk-related dynamic coding transitions described above, activity in some DLPFC neurons transitioned from coding risk to coding spatial variables such as cue position or action choice (Figure 7—figure supplement 1).
Thus, in addition to pure risk coding, DLPFC neurons frequently combined object risk with other reward and decision parameters on individual trials. These neurons provide a suitable basis for internal risk calculation and for the influence of risk on economic choices.
Utility control
According to approaches in finance theory, risk is integrated with expected value into utility (D'Acremont and Bossaerts, 2008; Markowitz, 1952). We tested whether neuronal risk responses were accounted for by subjective integration of value (derived from the mean of the recent reward and choice histories) and risk (derived from the variance of the recent reward history). We calculated this mean-variance utility as a weighted sum of object value and risk, with the weights for value and risk derived from logistic regression (Equation 9).
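In its simplest form, the mean-variance utility is a linear combination of the two quantities. The sketch below uses hypothetical weights as placeholders for the logistic-regression weights of Equation 9 (not reproduced here); a negative risk weight would express risk aversion and a positive one risk seeking. Function and constant names are illustrative.

```python
def mean_variance_utility(value, risk, w_value, w_risk):
    """Mean-variance utility: weighted sum of object value and object risk."""
    return w_value * value + w_risk * risk

# Hypothetical weights standing in for the fitted logistic-regression weights;
# the negative risk weight corresponds to a risk-averse animal.
W_VALUE, W_RISK = 2.0, -4.0

# Two hypothetical objects with equal value but different risk
u_a = mean_variance_utility(0.5, 0.1225, W_VALUE, W_RISK)
u_b = mean_variance_utility(0.5, 0.0300, W_VALUE, W_RISK)
print(round(u_a, 4), round(u_b, 4))   # → 0.51 0.88; riskier object A has lower utility
```

Because the utility is linear in value and risk, a neuron coding utility would show regression slopes on value and risk in the same ratio as the behavioral weights, which is one reason risk and utility regressors can be separated in the analysis that follows.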
The neuron in Figure 7—figure supplement 2A reflected the utility of object A (p=0.025; Equation 10, with object-utility regressors replacing the object-value regressors); it failed to reflect the utility of object B, object risk, cue position or action (Figure 7—figure supplement 2B, all p>0.075) but additionally reflected the object choice (p=0.0029). Multiple regression identified such responses reflecting the utility of individual objects (Figure 7—figure supplement 2C,D, Equation 10). Specifically, 109 responses (97 neurons) were classified as coding utility (based on the angle of utility coefficients, Equation 4). Population decoding accuracy for utility was significant across task periods (Figure 7—figure supplement 2E). As for risk, decoding accuracy increased with more neurons and depended on individual neurons’ sensitivities (Figure 7—figure supplement 2F,G).
Neuronal coding of utility did not account for our main finding of risk coding. Regression with utility and risk regressors (in addition to object choice, cue position and action regressors) showed 108 neurons with significant risk responses (52.7% of neurons). Among them, 34 neurons were significant for risk but not utility. A stepwise regression resulted in 222 risk responses (of 1222 task-related responses, 18%) and 186 utility responses (15.2%). Thus, risk-related signals were not accounted for by utility.
Discussion
These data suggest that neurons in primate DLPFC signal the variance of fluctuating rewards derived from recent experience. Neuronal risk signals correlated with the subjective risk estimated from the animal’s choices and with experimentally programmed, objective risk. The frequently changing risk levels in the choice task required the animal to derive risk from the statistics of experienced rewards, rather than from cues that explicitly signaled risk levels (such as pre-trained risk-associated bar stimuli or fractals), in order to make choices. The variance-tracking neurons encoded risk information explicitly, and distinctly, as object risk and action risk; these risk signals were specific for particular objects or actions, occurred before the animal’s choice, and typically showed little dependence on the object or action being chosen, thus complying with criteria for decision inputs (Sutton and Barto, 1998). Some neurons dynamically combined risk with information about past rewards, current object values or future choices, and showed transitions between these codes within individual trials; these characteristics are consistent with theoretical predictions of decision models (Deco et al., 2013; Grabenhorst et al., 2019; Wang, 2008) and with our model of subjective risk estimation from recent experience (Figure 3C). These prefrontal risk signals seem to provide important information about dynamically evolving risk estimates as crucial components of economic decisions under uncertainty.
Risk is an abstract variable that is not readily sensed by decision-makers but requires construction from experience or description (Hertwig et al., 2004). Reward probabilities in our choice task varied continually, encouraging the animals to alternately choose different objects and actions. Under such conditions, reward value and risk for objects and actions vary naturally with the animals’ behavior. We followed previous studies that estimated subjective reward values, rather than objective physical values, as explanatory variables for behavior and neurons (Lau and Glimcher, 2005; Lau and Glimcher, 2008; Samejima et al., 2005; Sugrue et al., 2004). To estimate subjective risk, we adapted an established approach for reward-value estimation based on the integration of rewards over a limited window of recent experiences (Lau and Glimcher, 2005). We used this approach to calculate subjective risk from the variance of recent reward experiences and showed that the animals’ choices (Figure 3D–F) and prefrontal neurons (Figures 4 and 5) were sensitive to these continually evolving subjective risk estimates, irrespective of value.
We could also detect risk neurons with an objective risk measure derived from true reward probability. However, our subjective risk measure took into account that the animals likely had imperfect knowledge of true reward probabilities and imperfect memory for past rewards. Accordingly, the observed neuronal relationships to objective risk are unlikely to reflect perfect tracking of the environment by the neurons; rather, the correlation with objective risk is likely explained by the fact that objective and subjective risk were related. This situation is similar to previous studies that compared coding of objective and subjective values (Lau and Glimcher, 2008; Samejima et al., 2005).
Our definition of subjective risk has some limitations. To facilitate comparisons with previous studies, we restricted our definition to the variance of past reward outcomes; we did not extend the risk measure to the variance of past choices, which would not have a clear correspondence in the economic or neuroeconomic literature. Our subjective risk measure provided a reasonable account of behavioral and neuronal data: it had a distinct relationship to choices, was encoded by a substantial number of neurons, and tended to better explain neuronal responses compared to objective risk. We followed the common approach of calculating reward statistics over a fixed temporal window because of its generality and simplicity, and to link our data to previous studies. An extension of this approach could introduce calculation of reward statistics over flexible time windows, possibly depending on the number of times an option was recently chosen (similar to an adaptive learning rate in reinforcement learning models).
Many neurons encoded risk for specific objects or actions prior to choice and independently of choice. Such pure risk signals could provide inputs for competitive, winner-take-all decision mechanisms that operate on separate inputs for different options (Deco et al., 2013; Grabenhorst et al., 2019; Wang, 2008). The presence of DLPFC neurons that transitioned from risk coding to choice coding (Figure 7E) is consistent with this interpretation. In addition to providing pure risk inputs to decision-making, object-risk signals could converge with separately coded value signals to inform utility calculations, which require integration of risk with expected value according to individual risk attitude (D'Acremont and Bossaerts, 2008; Markowitz, 1952). This possibility is supported by DLPFC neurons that dynamically encoded both risk and value for specific choice objects (Figure 7C). However, confirmation that DLPFC neurons encode utility will require further experimental testing with formal utility curves (Genest et al., 2016; Stauffer et al., 2014). Thus, risk neurons in DLPFC seem well suited to contribute to economic decisions, either directly by providing risk-specific decision inputs or indirectly by informing utility calculations.
Notably, while some DLPFC neurons jointly coded risk with value or choice in a common coding scheme (indicated by regression coefficients of equal sign), this was not the rule across all neurons with joint coding (Figure 7H). This result and the observed high degree of joint coding, with most DLPFC neurons dynamically coding several task-related variables (Figure 7I), accord well with previous reports that neurons in DLPFC show heterogeneous coding and mixed selectivity (Rigotti et al., 2013; Wallis and Kennerley, 2010). An implication for the present study might be that risk signals in DLPFC can support multiple cognitive processes in addition to decision-making, as also suggested by the observed relationship between risk and reaction times (Figure 3—figure supplement 1).
Objects (or goods) represent the fundamental unit of choice in economics (Padoa-Schioppa, 2011; Schultz, 2015), whereas reinforcement learning in machine learning conceptualizes choice in terms of actions (Sutton and Barto, 1998). The presently observed neuronal separation of object risk and action risk is analogous to neuronal separation of value signals for objects (Grabenhorst et al., 2019; Padoa-Schioppa, 2011; So and Stuphorn, 2010; Tsutsui et al., 2016) and actions (Lau and Glimcher, 2008; Samejima et al., 2005; Seo et al., 2012). Accordingly, object-risk and action-risk signals could provide inputs to separate mechanisms contributing to the selection of competing objects and actions. Additionally observed neuronal signals related to the sum or difference of object risk could contribute separately to decision-making and motivation processes, similar to previously observed neuronal coding of value sum and value difference (Cai et al., 2011; Tsutsui et al., 2016; Wang et al., 2013).
Previous studies reported risk-sensitive neurons in dopaminergic midbrain (Fiorillo et al., 2003; Lak et al., 2014; Stauffer et al., 2014), orbitofrontal cortex (O'Neill and Schultz, 2010; O'Neill and Schultz, 2013; Raghuraman and Padoa-Schioppa, 2014), cingulate cortex (McCoy and Platt, 2005; Monosov, 2017), basal forebrain (Ledbetter et al., 2016; Monosov and Hikosaka, 2013), and striatum (White and Monosov, 2016). In one study, orbitofrontal neurons encoded ‘offer risk’ for specific juice types (Raghuraman and Padoa-Schioppa, 2014), analogous to the presently observed risk signals for specific visual objects. Critically, most previous studies used explicit risk-descriptive cues indicating fixed risk levels and tested neuronal activity when the animals had already learned cue-associated risk levels. One series of studies (Monosov and Hikosaka, 2013; White and Monosov, 2016) documented how risk responses evolved for novel cues with fixed risk levels, although relations to the statistical variance of reward history were not examined. Here, we examined how neuronal risk estimates are derived internally and continually updated based on distinct reward experiences.
How could variance estimates be computed neurally? In neurophysiological models, reward signals modify synaptic strengths of valuation neurons, with decaying influences for more temporally remote rewards (Wang, 2008). Variance risk could be derived from such neurons by a mechanism that registers deviations of reward outcomes from synaptically stored mean values. This process may involve risk-sensitive prediction error signals in dopamine neurons (Lak et al., 2014; Stauffer et al., 2014), orbitofrontal cortex (O'Neill and Schultz, 2013) and insula (Preuschoff et al., 2008). Our data cannot determine whether DLPFC neurons perform such variance computation or whether the observed risk signals reflected processing elsewhere; resolving this question will require simultaneous recordings from multiple structures. Nevertheless, a role of DLPFC neurons in local risk computation would be consistent with known prefrontal involvement in temporal reward integration (Barraclough et al., 2004; Seo et al., 2007), including recently described integration dependent on volatility (Massi et al., 2018), processing of numerical quantities and mathematical rules (Bongard and Nieder, 2010; Nieder, 2013), and the currently observed transitions from past-reward coding to object-risk coding (Figure 7A,B). Moreover, in one recent study, reward information from past trials enhanced the encoding of current-trial, task-relevant information in DLPFC neurons (Donahue and Lee, 2015).
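One way to make the proposed mechanism concrete is a pair of delta-rule updates: a reward prediction error updates a stored mean with a fixed learning rate (so that older rewards have exponentially decaying influence), and the squared prediction error updates a stored variance estimate, in the spirit of the risk-sensitive prediction errors cited above. This is a hypothetical illustration of the idea, not a model fitted in this study; the function name and the learning rate alpha = 0.05 are arbitrary assumptions.

```python
def update_mean_and_variance(mean, var, reward, alpha=0.05):
    """One delta-rule step for running estimates of reward mean and variance."""
    delta = reward - mean                    # reward prediction error
    mean = mean + alpha * delta              # exponentially weighted mean
    var = var + alpha * (delta ** 2 - var)   # variance tracks squared errors
    return mean, var

# Alternating outcomes (0.7 ml, 0 ml) mimic a 50% reward rate
mean, var = 0.0, 0.0
for trial in range(2000):
    reward = 0.7 if trial % 2 == 0 else 0.0
    mean, var = update_mean_and_variance(mean, var, reward)
print(round(mean, 3), round(var, 4))
```

With a fixed learning rate the variance estimate settles slightly above the true Bernoulli variance of 0.1225, because the running mean itself fluctuates from trial to trial; smaller learning rates reduce this bias at the cost of slower updating.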
The prefrontal cortex has long been implicated in adaptive behavior (Miller and Cohen, 2001), although economic risk processing is often associated with its orbital part, rather than the dorsolateral region studied here (O'Neill and Schultz, 2010; Padoa-Schioppa, 2011; Stolyarova and Izquierdo, 2017). However, DLPFC neurons are well suited to signal object risk and action risk based on recent reward variance: DLPFC neurons process numerical quantities and basic mathematical rules (Bongard and Nieder, 2010; Nieder, 2013), integrate reward information over time (Barraclough et al., 2004; Donahue and Lee, 2015; Massi et al., 2018; Seo et al., 2007), and process both visual objects and actions (Funahashi, 2013; Suzuki and Gottlieb, 2013; Watanabe, 1996). Previous studies implicated DLPFC neurons in reward valuation during decision-making (Cai and Padoa-Schioppa, 2014; Hosokawa et al., 2013; Kennerley et al., 2009; Kim and Shadlen, 1999; Wallis and Miller, 2003). We recently showed that DLPFC neurons encoded object-specific values and their conversion to choices (Tsutsui et al., 2016). A previous imaging study detected a risk-dependent reward value signal in human lateral prefrontal cortex but no separate, value-independent risk signal (Tobler et al., 2009), perhaps due to insufficient spatiotemporal resolution. Importantly, the presently described risk signals were not explained by value, which we controlled for in our regressions. Thus, the presently observed DLPFC risk neurons may contribute to economic decisions beyond separately coded value signals.
Activity in DLPFC has been implicated in attention (Everling et al., 2002; Squire et al., 2013; Suzuki and Gottlieb, 2013), and the presently observed risk signals may contribute to DLPFC’s object and spatial attentional functions (Miller and Cohen, 2001; Watanabe, 1996). However, several observations argue against an interpretation of our results solely in terms of attention. First, DLPFC neurons encoded risk with both positive and negative slopes, suggesting no simple relationship to attention, which is usually associated with activity increases (Beck and Kastner, 2009; Hopfinger et al., 2000; Squire et al., 2013). Second, in many neurons, risk signals were dynamically combined with signals related to other task-relevant variables, suggesting specific functions in risk updating and decision-making. Thus, the present neuronal risk signals may contribute to established DLPFC functions in attention (Everling et al., 2002; Squire et al., 2013; Suzuki and Gottlieb, 2013) but also seem to play distinct, more specific roles in decision processes.
Our findings are consistent with a potential role for DLPFC neurons in signaling economic value and risk, and in converting these signals to choices. This general notion is supported by an earlier study implicating DLPFC in conversions from sensory evidence to oculomotor acts (Kim and Shadlen, 1999). However, the choices may not be computed in DLPFC. Indeed, previous studies showed that value signals occur late in DLPFC, at least following those in orbitofrontal cortex (Kennerley et al., 2009; Wallis and Miller, 2003). A recent study using explicitly cued juice rewards demonstrated conversions from chosen-juice signals to action signals in DLPFC, but apparently few DLPFC neurons encoded the value inputs to these choices (Cai and Padoa-Schioppa, 2014). By contrast, risk in our task was derived from integrated reward history, to which DLPFC neurons are sensitive (Barraclough et al., 2004; Massi et al., 2018; Seo et al., 2007). It is possible that DLPFC’s involvement in converting decision parameters (including object risk, as shown here) to choice signals depends on task requirements. This interpretation is supported by a recent study showing that the temporal evolution of decision signals in DLPFC differs between delay-based and effort-based choices, and that orbitofrontal and anterior cingulate cortex differentially influence DLPFC decision signals for these different choice types (Hunt et al., 2015).
In summary, these results show that prefrontal neurons tracked the evolving statistical variance of recent rewards that resulted from specific object choices and actions. Such variance-sensing neurons in prefrontal cortex may provide a physiological basis for integrating discrete event experiences and converting them into representations of abstract quantities such as risk. By coding this quantity as object risk and action risk, these prefrontal neurons provide distinct and specific risk inputs to economic decision processes.
Materials and methods
Animals
All animal procedures conformed to US National Institutes of Health Guidelines and were approved by the Home Office of the United Kingdom (Home Office Project Licenses PPL 80/2416, PPL 70/8295, PPL 80/1958, PPL 80/1513). The work has been regulated, ethically reviewed and supervised by the following UK and University of Cambridge (UCam) institutions and individuals: UK Home Office, implementing the Animals (Scientific Procedures) Act 1986, Amendment Regulations 2012, and represented by the local UK Home Office Inspector; UK Animals in Science Committee; UCam Animal Welfare and Ethical Review Body (AWERB); UK National Centre for Replacement, Refinement and Reduction of Animal Experiments (NC3Rs); UCam Biomedical Service (UBS) Certificate Holder; UCam Welfare Officer; UCam Governance and Strategy Committee; UCam Named Veterinary Surgeon (NVS); UCam Named Animal Care and Welfare Officer (NACWO).
Two adult male macaque monkeys (Macaca mulatta) weighing 5.5–6.5 kg were used in the experiments. The animals had no history of participation in previous experiments. The numbers of animals used and of neurons recorded are typical for studies in this field of research; we did not perform an explicit power analysis. A head holder and recording chamber were fixed to the skull under general anaesthesia and aseptic conditions. We used standard electrophysiological techniques for extracellular recordings from single neurons in the sulcus principalis area of the frontal cortex via stereotaxically oriented vertical tracks, as confirmed by histological reconstruction. After completion of data collection, recording sites were marked with small electrolytic lesions (15–20 µA, 20–60 s). The animals received an overdose of pentobarbital sodium (90 mg/kg iv) and were perfused with 4% paraformaldehyde in 0.1 M phosphate buffer through the left ventricle of the heart. Recording positions were reconstructed from 50-µm-thick, stereotaxically oriented coronal brain sections stained with cresyl violet.
Behavioral task
The animals performed an oculomotor free-choice task involving choices between two visual objects, to each of which reward was independently and stochastically assigned. Trials started with presentation of a red fixation spot (diameter: 0.6°) in the center of a computer monitor (viewing distance: 41 cm; Figure 1B). The animal was required to fixate the spot and contact a touch-sensitive, immobile resting key. An infrared eye-tracking system continuously monitored eye positions (ISCAN, Cambridge, MA). During the fixation period, at 1.0–2.0 s after eye fixation and key touch, an alert cue covering the fixation spot appeared for 0.7–1.0 s. At 1.4–2.0 s following offset of the alert cue, two different visual fractal objects (A, B; square, 5° visual angle) appeared simultaneously as ocular choice targets on each side of the fixation spot, at 10° lateral to the center of the monitor. Left and right positions of objects A and B alternated pseudorandomly across trials. The animal was required to make a saccadic eye movement to its target of choice within a time window of 0.25–0.75 s. A red peripheral fixation spot replaced the target after 1.0–2.0 s of target fixation. This fixation spot turned green after 0.5–1.0 s, and the monkey released the touch key immediately after the color change. Rewarded trials ended with a fixed quantity of 0.7 ml juice delivered immediately upon key release. A computer-controlled solenoid valve delivered juice reward from a spout in front of the animal's mouth. Unrewarded trials ended at key release without further stimuli. The fixation requirements restricted eye movements from trial start to cue appearance and, following the animals’ saccade choice, from choice acquisition to reward delivery. This ensured that neuronal activity was minimally influenced by oculomotor activity, especially in our main periods of interest before cue appearance.
Reward probabilities of objects A and B were independently calculated in every trial, depending on the number of consecutive unchosen trials (Equation 1):

$$P = 1 - (1 - P_0)^{\,n+1}$$
with $P$ as instantaneous reward probability, $P_0$ as the experimentally imposed base probability setting, and $n$ as the number of trials that the object had been consecutively unchosen. This equation describes the probabilistic reward schedule of the Matching Law (Herrnstein, 1961): the likelihood of being rewarded on a target increased with the number of trials since the object was last chosen but stayed at base probability while the object was repeatedly chosen (irrespective of whether that choice was rewarded or not). Reward was thus probabilistically assigned to the object in every trial, and once a reward was assigned, it remained available until the associated object was chosen.
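Equation 1 can be checked against the baiting rule it summarizes. The sketch below (hypothetical function names) computes P directly and, separately, simulates the rule in which a reward is assigned with base probability P0 on every trial and, once assigned, stays available until the object is chosen. After n unchosen trials there have been n + 1 assignment opportunities since the last choice, which yields the same probability.

```python
import random

def reward_probability(p0, n_unchosen):
    """Equation 1: P = 1 - (1 - P0)^(n + 1)."""
    return 1.0 - (1.0 - p0) ** (n_unchosen + 1)

def simulate_baited(p0, n_unchosen, rng):
    """Return True if a reward is waiting for the object on the current trial.

    A reward is assigned with probability p0 on each of the n unchosen trials
    and on the current trial; an assigned reward stays until the object is chosen.
    """
    baited = False
    for _ in range(n_unchosen + 1):
        if rng.random() < p0:
            baited = True
    return baited

rng = random.Random(1)
p0, n = 0.3, 2
estimate = sum(simulate_baited(p0, n, rng) for _ in range(20000)) / 20000
print(round(reward_probability(p0, n), 3), round(estimate, 3))
```

While an object keeps being chosen (n = 0), its reward probability stays at P0; each additional unchosen trial raises it toward 1.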
We varied the base reward probability in blocks of typically 50–150 trials (chosen randomly) without signaling these changes to the animal. We used different base probabilities from the range of p=0.05 to p=0.55, which we chose randomly for each block. The sum of base reward probabilities for objects A and B was held constant so that only relative reward probability varied. The trial-by-trial reward probabilities for objects A and B, calculated according to Equation 1, varied within the range of p=0.05 to p=0.99.
Calculation of objective risk
A most basic, assumption-free definition of objective risk derives risk directly from the variance of the true, specifically set (i.e. programmed) reward probabilities that changed on each trial depending on the animal’s matching behavior (Equation 1). Accordingly, we used the conventional definition of variance to calculate risk in each trial from the programmed, binary probability distribution (Bernoulli distribution) that governed actual reward delivery in each trial (Equation 2):

$$\mathrm{var} = \sum_{k} (m_k - EV)^2 \, p_k, \qquad EV = \sum_{k} m_k \, p_k$$

with $p_k$ as the trial-specific probability of outcome $k$ derived from Equation 1 (the reward probability $P$ for the rewarded outcome and $1 - P$ for the unrewarded outcome), $m_k$ as reward magnitude (0.7 ml for reward, 0 for the no-reward outcome), $k$ as outcome (0 ml or the set ml of reward) and $EV$ as expected value (defined as the sum of probability-weighted reward amounts). In our task, reward magnitude on rewarded trials was held constant at 0.7 ml; the definition generalizes to situations with different magnitudes. With magnitude held constant, variance risk follows an inverted-U function of probability (Figure 1A). The risk for objects A and B, calculated as variance according to Equation 2, varied within the range of var = 0.0003 to var = 0.1225.
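For the two outcomes used here (m ml with probability p, 0 ml with probability 1 − p), Equation 2 reduces to p(1 − p)m², which peaks at p = 0.5 with 0.25 × 0.49 = 0.1225. A short sketch (hypothetical function name) evaluates the sum in Equation 2 directly and can be compared with this closed form and its inverted-U shape.

```python
def bernoulli_risk(p, m=0.7):
    """Equation 2: variance of a reward of m ml delivered with probability p."""
    outcomes = [(m, p), (0.0, 1.0 - p)]          # (magnitude m_k, probability p_k)
    ev = sum(mk * pk for mk, pk in outcomes)     # expected value EV
    return sum((mk - ev) ** 2 * pk for mk, pk in outcomes)

for p in (0.05, 0.25, 0.5, 0.75, 0.99):
    # direct evaluation of Equation 2 next to the closed form p(1 - p)m^2
    print(p, round(bernoulli_risk(p), 4), round(p * (1 - p) * 0.7 ** 2, 4))
```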
Objective risk: analysis of neuronal data
We counted neuronal impulses in each neuron on correct trials relative to different task events with 500 ms time windows that were fixed across neurons: before fixation spot (Prefix, starting 500 ms before fixation onset), early fixation (Fix, following fixation onset), late fixation (Fix2, starting 500 ms after fixation spot onset), pre-cue (Precue, starting 500 ms before cue onset), cue (Cue, following cue onset), post-fixation (Postfix, following fixation offset), before cue offset (Precue off, starting 500 ms before cue offset), after cue offset (Postcue off, following cue offset), pre-outcome (Preoutc, starting 500 ms before reinforcer delivery), outcome (Outc, starting at outcome delivery), late outcome (Outc2, starting 500 ms after outcome onset).
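The fixed-window counting can be sketched as a table of (aligning event, onset offset) pairs and a counting function. The event keys, the spike and event times in the example, and the half-open window convention [start, start + 0.5 s) are hypothetical assumptions for illustration.

```python
# (aligning event, window onset relative to that event in s) per 500 ms window
WINDOWS = {
    "Prefix": ("fixation_on", -0.5),
    "Fix": ("fixation_on", 0.0),
    "Fix2": ("fixation_on", 0.5),
    "Precue": ("cue_on", -0.5),
    "Cue": ("cue_on", 0.0),
    "Postfix": ("fixation_off", 0.0),
    "Precue off": ("cue_off", -0.5),
    "Postcue off": ("cue_off", 0.0),
    "Preoutc": ("outcome", -0.5),
    "Outc": ("outcome", 0.0),
    "Outc2": ("outcome", 0.5),
}

def count_spikes(spike_times, event_times, width=0.5):
    """Count one trial's spikes in each fixed window [start, start + width)."""
    counts = {}
    for name, (event, offset) in WINDOWS.items():
        start = event_times[event] + offset
        counts[name] = sum(start <= t < start + width for t in spike_times)
    return counts

# Hypothetical single trial (times in s from trial start)
events = {"fixation_on": 1.0, "cue_on": 3.5, "fixation_off": 3.7,
          "cue_off": 4.5, "outcome": 7.0}
spikes = [0.6, 0.7, 1.2, 3.6, 7.1, 7.6]
print(count_spikes(spikes, events))
```

Dividing each count by the window width would give impulse rates for use as the dependent variable y in the regressions below.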
We first identified task-related responses in individual neurons and then used multiple regression analysis to test for different forms of risk-related activity while controlling for the most important behaviorally relevant covariates. We identified task-related responses by comparing activity to a control period (Prefix) using the Wilcoxon test (p<0.005, Bonferroni-corrected for multiple comparisons). A neuron was included as task-related if its activity in at least one task period was significantly different from that in the control period. Because the prefixation period served as control period, we did not select for task-relatedness in this period and included all neurons with observed impulses in the analysis. We chose the prefixation period as control period because it was the earliest period at the start of a trial in which no sensory stimuli were presented. The additional use of a sliding-window regression approach, for which no comparison with a control period was performed (see below), confirmed the results of the fixed-window analysis that involved testing for task relationship.
We used multiple regression analysis to assess relationships between neuronal activity and task-related variables. Statistical significance of regression coefficients was determined using a t-test with p<0.05 as criterion. Our analysis followed established approaches previously used to test for value coding in different brain structures (Lau and Glimcher, 2008; Samejima et al., 2005). All tests performed were two-sided.
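Each neuronal response was tested with the following multiple regression model to identify responses related to objective, true risk derived from reward probability (Equation 3):

$$y = \beta_0 + \beta_1\,\mathrm{ObjectChoice} + \beta_2\,\mathrm{CuePosition} + \beta_3\,\mathrm{Action} + \beta_4\,\mathrm{TrueProbA} + \beta_5\,\mathrm{TrueProbB} + \beta_6\,\mathrm{TrueRiskA} + \beta_7\,\mathrm{TrueRiskB} + \varepsilon$$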
with $y$ as trial-by-trial neuronal impulse rate, ObjectChoice as current-trial object choice (0 for A, 1 for B), CuePosition as current-trial spatial cue position (0 for object A on the left, 1 for object A on the right), Action as current-trial action (0 for left, 1 for right), TrueProbA as the true current-trial reward probability of object A calculated from Equation 1, TrueProbB as the true current-trial reward probability of object B calculated from Equation 1, TrueRiskA as the true current-trial risk of object A calculated from TrueProbA according to Equation 2, TrueRiskB as the true current-trial risk of object B calculated from TrueProbB according to Equation 2, $\beta_1$ to $\beta_7$ as corresponding regression coefficients, $\beta_0$ as constant, and $\varepsilon$ as residual. A neuronal response was classified as coding object risk if it had a significant coefficient for TrueRiskA or TrueRiskB.
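A minimal sketch of fitting Equation 3 by ordinary least squares with numpy, on synthetic trials for a hypothetical neuron whose firing depends only on TrueRiskA. As a simplifying assumption, t-values are converted to two-sided p-values with a normal approximation, which is close to the t distribution when the residual degrees of freedom are large, as here; a full analysis would use the exact t distribution. The function name and simulated data are illustrative.

```python
import math
import numpy as np

def fit_equation3(X, y):
    """OLS fit of y = b0 + X @ b; returns coefficients, t-values and p-values."""
    X1 = np.column_stack([np.ones(len(y)), X])             # beta_0 (constant)
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = len(y) - X1.shape[1]
    sigma2 = resid @ resid / dof                            # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X1.T @ X1)))
    t = beta / se
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])  # normal approx.
    return beta, t, p

rng = np.random.default_rng(0)
n = 400
choice = rng.integers(0, 2, n)             # ObjectChoice: 0 = A, 1 = B
cue = rng.integers(0, 2, n)                # CuePosition: 0 = A on the left
action = choice ^ cue                      # Action: 0 = left, 1 = right
prob_a = rng.uniform(0.05, 0.99, n)        # TrueProbA (hypothetical values)
prob_b = rng.uniform(0.05, 0.99, n)        # TrueProbB
risk_a = prob_a * (1 - prob_a) * 0.7 ** 2  # TrueRiskA via Equation 2
risk_b = prob_b * (1 - prob_b) * 0.7 ** 2  # TrueRiskB
rate = 10 + 30 * risk_a + rng.normal(0, 1, n)   # hypothetical risk-A neuron

X = np.column_stack([choice, cue, action, prob_a, prob_b, risk_a, risk_b])
beta, t, p = fit_equation3(X, rate)
codes_object_risk = p[6] < 0.05 or p[7] < 0.05  # TrueRiskA or TrueRiskB
print(np.round(beta, 2), codes_object_risk)
```

Because Action is determined jointly by ObjectChoice and CuePosition (through an exclusive-or, not a linear sum), all three can enter the model together without exact collinearity.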
We used the same model to test for neuronal coding of objective action risk by substituting the object-risk regressors with action-specific risk regressors, defined by the left-right object arrangement on each trial: if object A appeared on the left on a given trial, the action-L risk regressor took the value of the object-A risk regressor for that trial.
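The substitution of object-risk regressors by action-risk regressors can be sketched as a per-trial lookup. This is a minimal illustration, not the authors' code; the function name and the toy risk values are hypothetical, and cue positions follow the coding given for Equation 3 (0 = object A on the left, 1 = object A on the right).

```python
def object_to_action_risk(cue_position, risk_a, risk_b):
    """Return (left-action risk, right-action risk) for one trial.

    cue_position: 0 = object A on the left, 1 = object A on the right.
    """
    if cue_position == 0:
        return risk_a, risk_b  # A on the left: left action inherits object-A risk
    if cue_position == 1:
        return risk_b, risk_a  # A on the right: left action inherits object-B risk
    raise ValueError("cue_position must be 0 or 1")


# Hypothetical toy regressors for four trials.
cue = [0, 1, 1, 0]
risk_a = [0.25, 0.10, 0.21, 0.16]
risk_b = [0.09, 0.24, 0.00, 0.25]
pairs = [object_to_action_risk(c, a, b) for c, a, b in zip(cue, risk_a, risk_b)]
risk_left = [left for left, _ in pairs]
risk_right = [right for _, right in pairs]
```

The resulting `risk_left` and `risk_right` lists would then replace the object-risk columns of the design matrix.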
An alternative method for classifying neuronal risk responses used the angle of regression coefficients (Tsutsui et al., 2016; Wang et al., 2013). This classification method is ‘axis-invariant’ as it is independent of the choice of axes for the regression model, that is, whether the model includes separate variables for object risk or separate variables for risk sum and difference (Wang et al., 2013). However, the regression in this form omits relevant variables coded by DLPFC neurons (choice, cue position, action); accordingly, we used this approach only as additional confirmation of our main regression above. We fitted the following regression model (Equation 4):
Using this method, a neuronal response was categorized as risk-related if it showed a significant overall model fit (p<0.05, F-test), rather than by testing the significance of individual regressors. For responses with a significant overall model fit, we plotted the values of the beta coefficients (slopes) of the two object-risk regressors on an x-y plane (Figure 2C). Following a previous study (Wang et al., 2013), we divided the coefficient space into eight equally spaced segments of 45° and categorized neuronal responses by the polar angle of their coefficients. We classified responses as coding object risk (‘absolute risk’) if their coefficients fell in the segments pointing toward 0° or 180° (object risk A) or toward 90° or 270° (object risk B). We used an analogous procedure for action-risk classification. We classified responses as coding risk difference if their coefficients fell in the segments pointing toward 135° or 315°, and as coding risk sum if their coefficients fell in the segments pointing toward 45° or 225°.
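The angle-based classification can be sketched as follows. This assumes (our reading, not stated explicitly above) that each 45° segment is centred on its named direction and spans ±22.5°, that the x axis carries the object-risk-A slope and the y axis the object-risk-B slope, and that the function is applied only to responses that already passed the overall F-test. The function and dictionary names are hypothetical.

```python
import math

# Segment centre (degrees) -> category, following the text.
SEGMENT_LABELS = {
    0: "object risk A", 180: "object risk A",
    90: "object risk B", 270: "object risk B",
    45: "risk sum", 225: "risk sum",
    135: "risk difference", 315: "risk difference",
}


def classify_by_angle(beta_risk_a, beta_risk_b):
    """Categorize a response by the polar angle of its two object-risk slopes.

    Assumes eight 45-degree segments centred on 0, 45, ..., 315 degrees.
    """
    angle = math.degrees(math.atan2(beta_risk_b, beta_risk_a)) % 360.0
    centre = int(((angle + 22.5) % 360.0) // 45.0) * 45  # nearest segment centre
    return SEGMENT_LABELS[centre]


print(classify_by_angle(0.8, 0.1))   # object risk A
print(classify_by_angle(-0.5, 0.5))  # risk difference
```

Because only the angle matters, rescaling both slopes by the same positive factor leaves the category unchanged, which is what makes the scheme independent of the regressor axes.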
Logistic regression for defining the weight of past reward
The animal may have paid more attention to immediately past rewards than to earlier, more remote rewards. To establish a potential subjective weighting of reward value, we used logistic regression to model subjective reward value from the animals’ past rewards and choices, similar to previous behavioral and neurophysiological studies in macaques (Corrado et al., 2005; Lau and Glimcher, 2005; Sugrue et al., 2004). As our task involved tracking the changing values of objects (one fractal image for each of the two choice options), we formulated the model in terms of object choices rather than action choices. We fitted a logistic regression to the animal’s trial-by-trial choice data to estimate beta coefficients for the recent history of received rewards and recently made choices. Note that in choice tasks such as the one used here, choices and reward outcomes depend not only on reward history but also on choice history (Lau and Glimcher, 2005). Thus, rewards and choices were both included in the logistic regression to avoid omitted-variable bias and to provide a more accurate estimate of the weighting coefficients. The resulting coefficients quantified the extent to which the animals based their choices on recently received rewards and recently made choices for a given option; thus, the coefficients effectively weighted past trials by their importance for the animal’s behavior. We used the following logistic regression to determine the weighting coefficients for reward history (${\beta}_{j}^{r}$) and choice history (${\beta}_{j}^{c}$) (Equation 5):
with ${p}_{A}(i)$ [or ${p}_{B}(i)$] as the probability of choosing object A [or B] on the current, ith trial, ${R}_{A}$ [or ${R}_{B}$] as reward delivery after choice of object A [or B] on the ith trial, j as the past trial relative to the ith trial, ${C}_{A}$ [or ${C}_{B}$] as choice of object A [or B] on the ith trial, N as the number of past trials included in the model (N = 10), and ${\beta}_{0}$ as bias term. Exploratory analysis had shown that the beta coefficients did not differ significantly from 0 for trials more than 10 trials in the past. Thus, Equation 5 modeled the dependence of the monkeys’ choices on recent rewards and recent choices for specific objects; by fitting the model we estimated the subjective weights (i.e. betas) that the animals placed on recent rewards and choices. The crucial weighting coefficients reflecting the subjective influence of past rewards and choices were therefore ${\beta}_{j}^{r}$ and ${\beta}_{j}^{c}$. As described below, the ${\beta}_{j}^{r}$ coefficients were also used to calculate the variance that served as the subjective risk measure of our study. The logistic regression was estimated by fitting the regressors to a binary indicator (dummy) variable, set to 0 for the choice of one object and to 1 for the choice of the alternative object, using a binomial distribution with a logit link function. The coefficients for reward and choice history from this analysis are plotted in Figure 3A and B as reward and choice weights.
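The variables of Equation 5 are defined above, but its exact functional form is not reproduced in this text. The sketch below assumes the difference form used by Lau and Glimcher (2005), in which the log odds of choosing A are a weighted sum of reward differences and choice differences on the N preceding trials plus a bias. It only builds the history regressors and evaluates the model for given weights; the fit itself (a binomial GLM with logit link) would be done with a standard logistic-regression routine. Function names and the toy history are hypothetical.

```python
import math


def history_regressors(i, rew_a, rew_b, ch_a, ch_b, n_back=10):
    """Reward and choice differences on trials i-1, ..., i-n_back (0/1 codes).

    Assumed form of Equation 5 (after Lau and Glimcher, 2005):
    log(pA(i)/pB(i)) = sum_j br_j*(RA(i-j)-RB(i-j)) + sum_j bc_j*(CA(i-j)-CB(i-j)) + b0
    """
    if i < n_back:
        raise ValueError("trial i needs n_back preceding trials")
    rewards = [rew_a[i - j] - rew_b[i - j] for j in range(1, n_back + 1)]
    choices = [ch_a[i - j] - ch_b[i - j] for j in range(1, n_back + 1)]
    return rewards, choices


def prob_choose_a(rewards, choices, beta_r, beta_c, beta_0):
    """Logistic model: probability of choosing object A given history regressors."""
    z = beta_0
    z += sum(b * x for b, x in zip(beta_r, rewards))
    z += sum(b * x for b, x in zip(beta_c, choices))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical five-trial history: A chosen on trials 0, 2, 3; B on trials 1, 4.
ch_a, ch_b = [1, 0, 1, 1, 0], [0, 1, 0, 0, 1]
rew_a, rew_b = [1, 0, 0, 1, 0], [0, 1, 0, 0, 0]
r, c = history_regressors(4, rew_a, rew_b, ch_a, ch_b, n_back=3)
```

Stacking `r + c` for every trial i ≥ N gives the design matrix whose fitted coefficients would play the role of ${\beta}_{j}^{r}$ and ${\beta}_{j}^{c}$.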
Calculation of weighted, subjective value
To calculate the subjective value of each reward object from the animal’s experience, we used the weights estimated from the logistic regression (Equation 5). We followed previous studies of matching behavior (Lau and Glimcher, 2005) that distinguished two influences on value: the history of recent rewards and the history of recent choices. The first object-value component, related to reward history, ${OV}_{A}^{r}$, can be estimated as the mean of the subjectively weighted reward history over the past 10 trials (Equation 6):
with ${R}_{A}$ again as reward delivery after choice of object A on the ith trial, j as the past trial relative to the ith trial, N as the number of past trials included in the model (N = 10), and ${\beta}_{j}^{r}$ as the regression coefficient for the weight of past rewards (estimated by Equation 5).
In tasks used to study matching behavior, such as the present one, choice history has been shown to exert an additional influence on behavior that can be estimated using logistic regression (Lau and Glimcher, 2005). To account for this second object-value component related to choice history, we estimated a subjective measure of object value that incorporated both a dependence on weighted reward history and a dependence on weighted choice history (Equation 7):
with ${R}_{A}$ as reward delivery after choice of object A on the ith trial, ${C}_{A}$ [or ${C}_{B}$] as choice of object A [or B] on the ith trial, j as the past trial relative to the ith trial, N as the number of past trials included in the model (N = 10), ${\beta}_{j}^{r}$ as the regression coefficient for the weight of past rewards and ${\beta}_{j}^{c}$ as the regression coefficient for the weight of past choices (both estimated by Equation 5). This measure of subjective object value based on both weighted reward and choice history, ${OV}_{A}^{r,c}$ (Equation 7), constituted our main value measure for behavioral and neuronal analyses.
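A sketch of the two value components for object A. The exact expressions of Equations 6 and 7 are not reproduced in this text, so the code assumes that each component is the weighted sum over the last N trials scaled by 1/N (the "mean of subjectively weighted reward history") and that Equation 7 simply adds the reward and choice components. Index j = 1 is the immediately preceding trial, so `beta_r[0]` weights trial i-1. Names, weights and history are hypothetical.

```python
def value_reward_component(i, rew_a, beta_r):
    """OV_A^r: weighted rewards from object A over the last N trials, scaled by 1/N.

    beta_r[j-1] weights trial i-j (assumed form of Equation 6).
    """
    n = len(beta_r)
    return sum(beta_r[j - 1] * rew_a[i - j] for j in range(1, n + 1)) / n


def value_reward_choice(i, rew_a, ch_a, beta_r, beta_c):
    """OV_A^{r,c}: assumed additive combination of reward and choice components (Equation 7)."""
    n = len(beta_c)
    choice_part = sum(beta_c[j - 1] * ch_a[i - j] for j in range(1, n + 1)) / n
    return value_reward_component(i, rew_a, beta_r) + choice_part


# Hypothetical weights for N = 2 and a three-trial history in which A was always chosen.
beta_r, beta_c = [0.6, 0.3], [-0.4, 0.2]
rew_a, ch_a = [1, 0, 1], [1, 1, 1]
ov_r = value_reward_component(3, rew_a, beta_r)
ov_rc = value_reward_choice(3, rew_a, ch_a, beta_r, beta_c)
```

With a negative weight on the most recent choice, as in this toy example, having just chosen an object lowers its estimated value, which mirrors the baited-reward structure of matching tasks.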
Calculation of subjective risk
Our aim was to construct a subjective risk measure that derived risk from the variance of recent reward experiences and that incorporated the typically observed decreasing influence of past trials on the animal’s behavior. We used the following definition as our main measure of subjective object risk (Equation 8):
with ${\beta}_{j}^{r}$ representing the weighting coefficients for past rewards (derived from Equation 5), ${R}_{A}$ as reward delivery after choice of object A, j as index for past trials relative to the current ith trial, and N as the number of past trials included in the model (N = 10); the term $\left(\sum_{j=1}^{N} {R}_{A}(i-j)\right)/N$ represents the mean reward over the last ten trials. Thus, the equation derives subjective object risk from the summed, subjectively weighted, squared deviations of reward amounts in the last ten trials from the mean reward over the last ten trials. By defining risk in this manner, we followed the common economic definition of risk as the mean squared deviation from the expected outcome and in addition accounted for each animal’s subjective weighting of past trials. This definition (Equation 8) constituted our main subjective object-risk measure for behavioral and neuronal analyses.
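Equation 8 can be sketched as below. As described above, the mean is the unweighted mean of the last N outcomes for object A, and each squared deviation is weighted by the corresponding reward weight from Equation 5. The normalizing denominator is not stated in this text; the sketch assumes N - 1 (a sample-variance convention) and exposes it as a parameter. The function name and inputs are hypothetical.

```python
def subjective_object_risk(i, rew_a, beta_r, denominator=None):
    """Weighted variance of object-A rewards over the last N trials (sketch of Equation 8).

    The mean is unweighted; each squared deviation is weighted by beta_r[j-1]
    for trial i-j. The denominator defaults to N - 1, which is an ASSUMPTION.
    """
    n = len(beta_r)
    past = [rew_a[i - j] for j in range(1, n + 1)]  # past[j-1] = R_A(i-j)
    mean_reward = sum(past) / n
    weighted_ss = sum(b * (r - mean_reward) ** 2 for b, r in zip(beta_r, past))
    if denominator is None:
        denominator = n - 1
    return weighted_ss / denominator


# Hypothetical decaying weights over the last four trials.
risk_now = subjective_object_risk(4, [1, 0, 1, 0, 1], [0.5, 0.25, 0.125, 0.0625])
```

A constant reward history gives zero risk regardless of the weights, while alternating rewarded and unrewarded trials give the largest deviations from the mean; the weights then determine how much recent versus remote trials contribute.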
Alternative, more complex definitions of subjective risk in our task could incorporate the weighting of past trials in the calculation of the mean reward (the subtrahend in the numerator of Equation 8), or incorporate both weighted reward history and weighted choice history in this calculation. We explore these possibilities in a supplementary analysis (Figure 4—figure supplement 4). In our main risk definition, we also assumed that the animals used a common reward-weighting function for value and risk. As described in the Results, for neuronal analysis we also explored alternative past-trial weighting functions for the risk calculation (Figure 4—figure supplement 3). The alternative weights identified similar, although slightly lower, numbers of risk neurons compared with those obtained with the weights defined by Equation 5.
Testing the influence of subjective object risk on choices
For out-of-sample validation, we used one half of each animal’s behavioral data to derive the weighting coefficients (Equation 5) and the remaining half to test the behavioral relevance of the object-risk and object-value variables. To do so, we used logistic regression to relate each animal’s choices to the subjective object values and object risks, according to the following equation (Equation 9):
with ${p}_{L}(i)$ [or ${p}_{R}(i)$] as the probability of choosing left [or right] on the ith trial, ObjectValueLeft as current-trial value of the left object (derived from ${OV}_{A}^{r,c}$, Equation 7), ObjectValueRight as current-trial value of the right object (Equation 7), ObjectRiskLeft as current-trial risk of the left object (Equation 8), ObjectRiskRight as current-trial risk of the right object (Equation 8), β_{1} and β_{2} as corresponding regression coefficients, β_{0} as constant, and ε as residual. The resulting regression coefficients are shown in Figure 3E. Thus, object choice was modeled as a function of relative object value and relative object risk.
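A sketch of how Equation 9 turns relative value and relative risk into a choice probability. It assumes, from the statement that choice depended on relative object value and relative object risk with two slope coefficients, that the regressors enter as left-minus-right differences; the coefficient values below are placeholders, not fitted results.

```python
import math


def prob_choose_left(value_left, value_right, risk_left, risk_right, b0, b1, b2):
    """Logistic choice rule on left-minus-right value and risk differences (assumed form)."""
    z = b0 + b1 * (value_left - value_right) + b2 * (risk_left - risk_right)
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder coefficients: b2 > 0 would mean that higher risk on the left raises the
# probability of choosing left (risk seeking); b2 < 0 would indicate risk aversion.
p_equal = prob_choose_left(0.4, 0.4, 0.1, 0.1, b0=0.0, b1=3.0, b2=2.0)
p_riskier_left = prob_choose_left(0.4, 0.4, 0.2, 0.1, b0=0.0, b1=3.0, b2=2.0)
```

Holding the value difference at zero, as in these two calls, isolates the contribution of the risk term to choice.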
We compared this model with several alternative behavioral models using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC; Table 1). The alternative models included variations of the above model (described in the legend to Table 1), a model based on objective (true) reward probabilities and risks, a standard reinforcement learning model that updated the object-value estimate of the chosen option based on the obtained outcome (Sutton and Barto, 1998), a reinforcement learning model that updated the object-value estimates of both the chosen and the unchosen option, and a modified reinforcement learning model that captured time-dependent increases in reward probability in tasks used to study matching behavior (Huh et al., 2009).
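The model comparison rests on the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, with k free parameters, n trials and maximized log-likelihood ln L; lower values indicate a better trade-off between fit and complexity. A minimal sketch for binary-choice models follows; the helper names, predicted probabilities and parameter counts are hypothetical.

```python
import math


def bernoulli_log_likelihood(choices, probs):
    """Log-likelihood of binary choices (1 = option chosen) under predicted probabilities."""
    return sum(math.log(p if c == 1 else 1.0 - p) for c, p in zip(choices, probs))


def aic(log_lik, k):
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    return k * math.log(n) - 2 * log_lik


# Hypothetical comparison on four trials: a 3-parameter model against a 1-parameter model.
choices = [1, 0, 1, 1]
ll_full = bernoulli_log_likelihood(choices, [0.8, 0.3, 0.7, 0.9])
ll_bias = bernoulli_log_likelihood(choices, [0.75] * 4)
scores = {
    "full": (aic(ll_full, 3), bic(ll_full, 3, len(choices))),
    "bias only": (aic(ll_bias, 1), bic(ll_bias, 1, len(choices))),
}
```

BIC penalizes each extra parameter by ln n rather than 2, so with many trials it favors simpler models more strongly than AIC.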
Subjective risk: analysis of neuronal data
Subjective risk for objects
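Each neuronal response was tested with the following multiple regression model to identify responses related to subjective risk derived from weighted reward history (Equation 10):

$$y = \beta_0 + \beta_1\,\mathrm{ObjectChoice} + \beta_2\,\mathrm{CuePosition} + \beta_3\,\mathrm{Action} + \beta_4\,\mathrm{ObjectValueA} + \beta_5\,\mathrm{ObjectValueB} + \beta_6\,\mathrm{ObjectRiskA} + \beta_7\,\mathrm{ObjectRiskB} + \varepsilon$$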
with y as trial-by-trial neuronal impulse rate, ObjectChoice as current-trial object choice (0 for A, 1 for B), CuePosition as current-trial spatial cue position (0 for object A on the left, 1 for object A on the right), Action as current-trial action (0 for left, 1 for right), ObjectValueA as current-trial value of object A (Equation 7), ObjectValueB as current-trial value of object B (Equation 7), ObjectRiskA as current-trial risk of object A (Equation 8), ObjectRiskB as current-trial risk of object B (Equation 8), β_{1} to β_{7} as corresponding regression coefficients, β_{0} as constant, and ε as residual. This equation differs from Equation 3 in replacing the regressors for true risk and true probability with regressors for subjective risk and subjective value. A neuronal response was classified as coding object risk if it had a significant coefficient for ObjectRiskA or ObjectRiskB.
Regression including lasttrial history variables
Object risk was calculated from the rewards received in previous trials. Accordingly, we used an additional regression to test whether DLPFC neurons directly encoded these reward-history variables, and whether object-risk responses were better explained by them. To test for encoding of object risk alongside explicit regressors for last-trial reward, last-trial choice, and last-trial choice × reward, we used the following regression model (Equation 11):
To test whether inclusion of additional regressors for the past two trials affected our main results (Figure 4—figure supplement 2), we used the following regression model (Equation 12) that included regressors for rewards, choices, and their interactions on the last two trials:
Stepwise regression model for testing coding of objective and subjective object risk
We used this stepwise regression as an additional analysis to test for coding of objective and subjective object risk when these variables directly competed to explain variance in neuronal responses. Note that the objective, true probabilities used for the analysis were not the base probabilities but the trial-specific probabilities that evolved trial by trial from the baseline probabilities according to Equation 1. The following variables were included as regressors in the starting set (Equation 13):
Subjective risk for actions
To test for encoding of action risk, we used the following multiple regression model (Equation 14):
with y as trial-by-trial neuronal impulse rate, ActionValueL as current-trial value of a leftward saccade, ActionValueR as current-trial value of a rightward saccade, ActionRiskL as current-trial risk of a leftward saccade, and ActionRiskR as current-trial risk of a rightward saccade (all other variables as defined above). Note that subjective action values and action risks were not simply spatially referenced object values and object risks but were estimated separately, based on object reward histories and action reward histories. Specifically, regressors for action value and action risk were estimated analogously to those for object value and object risk as described in Equation 7 and Equation 8, based on filter weights derived from fitting the model in Equation 5 for action choice rather than object choice. Thus, for defining action risk, we calculated the variance of rewards that resulted from rightward and leftward saccades within the last ten trials, with coefficients from Equation 5 (calculated for actions) determining how strongly each trial was weighted in the variance calculation. A neuronal response was classified as coding action risk if it had a significant coefficient for ActionRiskL or ActionRiskR.
Stepwise regression model for testing coding of object risk and action risk
We used this stepwise regression as an additional analysis to test for object-risk coding and action-risk coding when these variables directly competed to explain variance in neuronal responses. The following variables were included as regressors in the starting set (Equation 15):
Sliding window regression analysis
We used additional sliding-window multiple regression analyses (using the regression model in Equation 11) with a 200 ms window that we moved in steps of 25 ms across each trial. To determine whether neuronal activity was significantly related to a given variable, we used a bootstrap approach based on shuffled data, as follows. For each neuron, we performed the sliding-window regression 1000 times on trial-shuffled data and determined a false-positive rate by counting the number of consecutive windows in which the regression was significant at p<0.05. Fewer than 5% of neurons with trial-shuffled data showed more than six consecutive significant analysis windows; in other words, we used the shuffled data to obtain the percentage of neurons with at least one case of six consecutively significant windows. Therefore, we counted a sliding-window analysis as significant if a neuron showed a significant (p<0.05) effect in more than six consecutive windows.
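The run-length criterion can be sketched as follows. The window width, step and threshold (more than six consecutive windows at p<0.05) follow the text; the function names and the assumption that windows start at trial onset and must lie entirely within the trial are ours. The per-window p-values would come from the regressions themselves, which are not reproduced here.

```python
def window_starts(trial_ms, width_ms=200, step_ms=25):
    """Start times (ms) of windows that fit entirely within a trial of length trial_ms."""
    return list(range(0, trial_ms - width_ms + 1, step_ms))


def longest_significant_run(p_values, alpha=0.05):
    """Length of the longest run of consecutive windows with p < alpha."""
    best = run = 0
    for p in p_values:
        run = run + 1 if p < alpha else 0
        best = max(best, run)
    return best


def passes_run_criterion(p_values, alpha=0.05, more_than=6):
    """True if more than `more_than` consecutive windows are significant."""
    return longest_significant_run(p_values, alpha) > more_than


def shuffled_exceedance_rate(shuffled_p_series, alpha=0.05, more_than=6):
    """Fraction of shuffled-data p-value series that pass the same criterion."""
    hits = sum(passes_run_criterion(ps, alpha, more_than) for ps in shuffled_p_series)
    return hits / len(shuffled_p_series)
```

Calibrating `more_than` on shuffled series, so that `shuffled_exceedance_rate` stays below 0.05, is what converts the per-window threshold into a per-neuron false-positive rate.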
Normalization of population activity
We subtracted the mean impulse rate of the control period from the measured impulse rate in a given task period and divided by the standard deviation of the control period (z-score normalization). Next, we distinguished neurons that showed a positive relationship to object value from those with a negative relationship, based on the sign of the regression coefficient, and sign-corrected responses with a negative relationship. Normalized activity was used for all population decoding analyses and for Figure 4E,F and Figure 5E,F.
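A minimal sketch of the normalization and sign correction. It assumes that the control-period mean and standard deviation are taken across trials and that the sample standard deviation is used; the text does not specify the estimator. Function names and values are hypothetical.

```python
import statistics


def zscore_to_control(rates, control_rates):
    """z-score task-period rates against the control-period mean and SD (across trials)."""
    mu = statistics.mean(control_rates)
    sd = statistics.stdev(control_rates)  # sample SD: an assumption
    return [(r - mu) / sd for r in rates]


def sign_correct(z_rates, regression_coefficient):
    """Flip responses with a negative coefficient so all relationships point the same way."""
    if regression_coefficient < 0:
        return [-z for z in z_rates]
    return list(z_rates)


z = zscore_to_control([4.0, 8.0, 10.0], control_rates=[2.0, 4.0, 6.0])
z_corrected = sign_correct(z, regression_coefficient=-0.3)
```

Sign correction lets positively and negatively coding neurons be pooled without their contributions cancelling in population averages.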
Normalization of regression coefficients
Standardized regression coefficients were defined as $x_i (s_i / s_y)$, with $x_i$ being the raw slope coefficient for regressor i, and $s_i$ and $s_y$ the standard deviations of independent variable i and of the dependent variable, respectively. These coefficients were used for Figure 2B,C, Figure 3E, Figure 4B,C, Figure 5B,C, Figure 6C, Figure 7H, Figure 7—figure supplement 2B,C.
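The standardization can be sketched in a few lines; the function name is hypothetical and the sample standard deviation is assumed. For a single-regressor fit, the standardized slope equals the Pearson correlation, which provides a quick sanity check: an exact linear relation should give 1.

```python
import statistics


def standardized_coefficient(raw_slope, regressor_values, response_values):
    """Scale a raw slope x_i by SD(regressor i) / SD(response)."""
    return raw_slope * statistics.stdev(regressor_values) / statistics.stdev(response_values)


# Single-regressor check: y = 2x + 1 exactly, so the standardized slope should be 1.
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0 * v + 1.0 for v in x]
beta_std = standardized_coefficient(2.0, x, y)
```

Standardizing makes coefficients comparable across regressors measured on different scales, such as binary choice codes and continuous risk estimates.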
Population decoding
We used support vector machine (SVM) and nearest-neighbor (NN) classifiers to quantify the information contained in DLPFC population activity in defined task periods. This method determines how accurately our main variables, object risk and action risk, were encoded by groups of DLPFC neurons. The SVM classifier was trained on a set of training data to find a linear hyperplane that provides the best separation between two patterns of neuronal population activity defined by a grouping variable (e.g. high vs. low object risk). Decoding was typically not improved by nonlinear (e.g. quadratic) kernels. Both SVM and NN classification are biologically plausible in that a downstream neuron could perform similar classification by comparing the input on a given trial with a stored vector of synaptic weights. Both classifiers performed qualitatively similarly, although SVM decoding was typically more accurate. We therefore focused our main results on SVM decoding.
We aggregated z-normalized trial-by-trial impulse rates of independently recorded DLPFC neurons from specific task periods into pseudopopulations. We used all recorded neurons that met the inclusion criterion for a minimum trial number, without preselecting for risk coding, except where explicitly stated. For each decoding analysis, we created two m × n matrices, one for each group for which decoding was performed (e.g. high vs. low object risk), with n columns defined by the number of neurons and m rows by the number of trials. Thus, each cell in a matrix contained the impulse rate of a single neuron on a single trial measured for a given group. Because neurons were not recorded simultaneously, we randomly matched up trials from different neurons for the same group and then repeated the decoding analysis with different random trial matchings (within-group trial matching) 150 times for the SVM and 500 times for the NN classifier. We found these numbers to produce very stable classification results. (We note that this approach likely provides a lower bound for decoding performance as it ignores potential contributions from cross-correlations between neurons; investigation of cross-correlations would require data from simultaneously recorded neurons.) We used a leave-one-out cross-validation procedure whereby a classifier was trained to learn the mapping from impulse rates to groups on all trials except one; the remaining trial was then used for testing the classifier, and the procedure was repeated until all trials had been tested. An alternative approach of using 80% of trials as training data and testing on the remaining 20% produced highly similar results (Pagan et al., 2013). We included in the decoding analyses only neurons with a minimum of eight trials per group, where ‘group’ refers to a trial category for which decoding was performed, such as low risk, high risk, A chosen, B chosen, etc. This minimum defined the lower cutoff in case a recording session contained few trials belonging to a specific group, as in decoding based on risk terciles within each session, separately for object A and object B.
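The study's main decoder, a linear SVM fitted in MATLAB, is not reproduced here. As a standard-library-only sketch of the rest of the pipeline, the code below uses the nearest-neighbor classifier that the text names as its second decoder, assuming Euclidean distance and a single nearest neighbor. It applies the eight-trial inclusion criterion, builds pseudopopulations by random within-group trial matching, scores each matching with leave-one-out cross-validation, and averages over matchings. All names are hypothetical, and `neurons_a[k]` and `neurons_b[k]` are assumed to hold the same neuron's z-scored rates for the two groups.

```python
import math
import random

MIN_TRIALS_PER_GROUP = 8  # inclusion criterion stated in the text


def eligible(neuron_rates_a, neuron_rates_b):
    return min(len(neuron_rates_a), len(neuron_rates_b)) >= MIN_TRIALS_PER_GROUP


def pseudo_trials(rates_by_neuron, n_trials, rng):
    """Randomly match trials across separately recorded neurons (rows = pseudo-trials)."""
    columns = [rng.sample(rates, n_trials) for rates in rates_by_neuron]
    return [list(row) for row in zip(*columns)]


def loo_nearest_neighbour(group_a, group_b):
    """Leave-one-out 1-NN accuracy (% correct) with Euclidean distance."""
    data = [(x, 0) for x in group_a] + [(x, 1) for x in group_b]
    correct = 0
    for k, (x, label) in enumerate(data):
        _, predicted = min(
            (math.dist(x, y), other) for m, (y, other) in enumerate(data) if m != k
        )
        correct += predicted == label
    return 100.0 * correct / len(data)


def decode(neurons_a, neurons_b, n_matchings=150, seed=0):
    """Average leave-one-out accuracy over random within-group trial matchings."""
    keep = [k for k in range(len(neurons_a)) if eligible(neurons_a[k], neurons_b[k])]
    if not keep:
        raise ValueError("no neuron meets the minimum-trial criterion")
    a = [neurons_a[k] for k in keep]
    b = [neurons_b[k] for k in keep]
    n_a = min(len(r) for r in a)
    n_b = min(len(r) for r in b)
    rng = random.Random(seed)
    scores = [
        loo_nearest_neighbour(pseudo_trials(a, n_a, rng), pseudo_trials(b, n_b, rng))
        for _ in range(n_matchings)
    ]
    return sum(scores) / len(scores)


# Hypothetical z-scored rates: two neurons, eight trials per group, well separated.
low = [[0.1 * t for t in range(8)] for _ in range(2)]
high = [[5.0 + 0.1 * t for t in range(8)] for _ in range(2)]
accuracy = decode(low, high, n_matchings=20)
```

Because trials are matched at random rather than recorded together, any correlation structure between neurons is destroyed, consistent with the lower-bound caveat in the text.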
The SVM decoding was implemented in MATLAB (Version R2013b, Mathworks, Natick, MA) using the ‘svmtrain’ and ‘svmclassify’ functions with a linear kernel and the default sequential minimal optimization method for finding the separating hyperplane. We quantified decoding accuracy as the percentage of correctly classified trials, averaged over all decoding analyses for different random within-group trial matchings. To investigate how decoding accuracy depends on population size, we randomly selected a given number of neurons at each step and then determined the percentage correct. For each step (i.e. each possible population size) this procedure was repeated 10 times. We also performed decoding on randomly shuffled data (shuffled group assignment without replacement) with 5000 iterations to test whether decoding on real data differed significantly from chance. Statistical significance (p<0.0001) was determined by comparing vectors of percentage-correct decoding accuracy between real data and randomly shuffled data using the rank-sum test (Quian Quiroga et al., 2006). For all analyses, decoding was performed on neuronal responses taken from the same task period. We trained classifiers to distinguish high from low risk terciles (decoding based on a median split produced very similar results).
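Two supporting steps can be sketched: selecting the lowest and highest risk terciles within a session, and the shuffled-label control used to estimate chance performance. The sketch assumes a rank-based split with equal trial counts (ties keep their original trial order because Python's `sorted` is stable); the text does not say how terciles were cut. Names and values are hypothetical. Comparing real and shuffled accuracy vectors with a rank-sum test could then use a statistics library such as SciPy, which is not shown.

```python
import random


def risk_tercile_trials(risk_values):
    """Indices of trials in the lowest and highest risk terciles of one session.

    Rank-based split with equal counts (an assumption).
    """
    order = sorted(range(len(risk_values)), key=lambda k: risk_values[k])
    n = len(order) // 3
    if n == 0:
        return [], []
    return sorted(order[:n]), sorted(order[-n:])


def shuffle_group_labels(group_a, group_b, rng):
    """Chance-level control: reassign trials to groups at random without replacement."""
    pooled = list(group_a) + list(group_b)
    rng.shuffle(pooled)
    return pooled[:len(group_a)], pooled[len(group_a):]


low, high = risk_tercile_trials([0.10, 0.50, 0.20, 0.90, 0.30, 0.80])
```

Discarding the middle tercile leaves only trials with clearly different risk levels, which is why the per-group minimum trial count matters most for this analysis.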
References

1. Prefrontal cortex and decision making in a mixed-strategy game. Nature Neuroscience 7:404–410. https://doi.org/10.1038/nn1209
4. Reward skewness coding in the insula independent of probability and loss. Journal of Neurophysiology 106:2415–2422. https://doi.org/10.1152/jn.00471.2011
7. Linear-Nonlinear-Poisson models of primate choice dynamics. Journal of the Experimental Analysis of Behavior 84:581–617. https://doi.org/10.1901/jeab.2005.23-05
8. Neurobiological studies of risk assessment: a comparison of expected utility and mean-variance approaches. Cognitive, Affective, & Behavioral Neuroscience 8:363–374. https://doi.org/10.3758/CABN.8.4.363
9. Brain mechanisms for perceptual and reward-related decision-making. Progress in Neurobiology 103:194–213. https://doi.org/10.1016/j.pneurobio.2012.01.010
12. Filtering of neural signals by focused attention in the monkey prefrontal cortex. Nature Neuroscience 5:671–676. https://doi.org/10.1038/nn874
13. Selective representation of task-relevant objects and locations in the monkey prefrontal cortex. European Journal of Neuroscience 23:2197–2214. https://doi.org/10.1111/j.1460-9568.2006.04736.x
15. Space representation in the prefrontal cortex. Progress in Neurobiology 103:131–155. https://doi.org/10.1016/j.pneurobio.2012.04.002
19. Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior 4:267–272. https://doi.org/10.1901/jeab.1961.4-267
20. Decisions from experience and the effect of rare events in risky choice. Psychological Science 15:534–539. https://doi.org/10.1111/j.0956-7976.2004.00715.x
21. Risk aversion and incentive effects. American Economic Review 92:1644–1655. https://doi.org/10.1257/000282802762024700
22. The neural mechanisms of top-down attentional control. Nature Neuroscience 3:284–291. https://doi.org/10.1038/72999
23. Single-neuron mechanisms underlying cost-benefit analysis in frontal cortex. Journal of Neuroscience 33:17385–17397. https://doi.org/10.1523/JNEUROSCI.2221-13.2013
24. How to maximize reward rate on two variable-interval paradigms. Journal of the Experimental Analysis of Behavior 35:367–396. https://doi.org/10.1901/jeab.1981.35-367
27. Neurons in the frontal lobe encode the value of multiple decision variables. Journal of Cognitive Neuroscience 21:1162–1178. https://doi.org/10.1162/jocn.2009.21100
29. Neural correlates of a decision in the dorsolateral prefrontal cortex of the macaque. Nature Neuroscience 2:176–185. https://doi.org/10.1038/5739
32. Dynamic response-by-response models of matching behavior in rhesus monkeys. Journal of the Experimental Analysis of Behavior 84:555–579. https://doi.org/10.1901/jeab.2005.110-04
34. Multiple mechanisms for processing reward uncertainty in the primate basal forebrain. The Journal of Neuroscience 36:7852–7864. https://doi.org/10.1523/JNEUROSCI.1123-16.2016
35. Portfolio selection. The Journal of Finance 7:77–91. https://doi.org/10.1111/j.1540-6261.1952.tb01525.x
37. Risk-sensitive neurons in macaque posterior cingulate cortex. Nature Neuroscience 8:1220–1227. https://doi.org/10.1038/nn1523
38. An integrative theory of prefrontal cortex function. Annual Review of Neuroscience 24:167–202. https://doi.org/10.1146/annurev.neuro.24.1.167
41. Coding of abstract quantity by ‘number neurons’ of the primate brain. Journal of Comparative Physiology A 199:1–16. https://doi.org/10.1007/s00359-012-0763-9
43. Risk prediction error coding in orbitofrontal neurons. Journal of Neuroscience 33:15810–15814. https://doi.org/10.1523/JNEUROSCI.4236-12.2013
44. Neurobiology of economic choice: a good-based model. Annual Review of Neuroscience 34:333–359. https://doi.org/10.1146/annurev-neuro-061010-113648
46. Human insula activation reflects risk prediction errors as well as risk. Journal of Neuroscience 28:2745–2752. https://doi.org/10.1523/JNEUROSCI.4286-07.2008
47. Movement intention is better predicted than attention in the posterior parietal cortex. Journal of Neuroscience 26:3615–3620. https://doi.org/10.1523/JNEUROSCI.3468-05.2006
48. Integration of multiple determinants in the neuronal computation of economic values. Journal of Neuroscience 34:11583–11603. https://doi.org/10.1523/JNEUROSCI.1235-14.2014
50. Increasing risk: I. A definition. Journal of Economic Theory 2:225–243. https://doi.org/10.1016/0022-0531(70)90038-4
52. Neuronal reward and decision signals: from theories to data. Physiological Reviews 95:853–951. https://doi.org/10.1152/physrev.00023.2014
55. Supplementary eye field encodes option and action value for saccades with variable reward. Journal of Neurophysiology 104:2634–2653. https://doi.org/10.1152/jn.00430.2010
56. Prefrontal contributions to visual selective attention. Annual Review of Neuroscience 36:451–466. https://doi.org/10.1146/annurev-neuro-062111-150439
57. Dopamine reward prediction error responses reflect marginal utility. Current Biology 24:2491–2500. https://doi.org/10.1016/j.cub.2014.08.064
62. Distinct neural mechanisms of distractor suppression in the frontal and parietal lobe. Nature Neuroscience 16:98–104. https://doi.org/10.1038/nn.3282
63. A behavioral and neural evaluation of prospective decision-making under risk. Journal of Neuroscience 30:14380–14389. https://doi.org/10.1523/JNEUROSCI.1459-10.2010
65. A dynamic code for economic object valuation in prefrontal cortex neurons. Nature Communications 7:12554. https://doi.org/10.1038/ncomms12554
66. Heterogeneous reward signals in prefrontal cortex. Current Opinion in Neurobiology 20:191–198. https://doi.org/10.1016/j.conb.2010.02.009
67. Neuronal activity in primate dorsolateral and orbital prefrontal cortex during performance of a reward preference task. European Journal of Neuroscience 18:2069–2081. https://doi.org/10.1046/j.1460-9568.2003.02922.x
71. Perceived risk attitudes: relating risk perception to risky choice. Management Science 43:123–144. https://doi.org/10.1287/mnsc.43.2.123
Decision letter

Daeyeol Lee, Reviewing Editor; Yale School of Medicine, United States

Richard B Ivry, Senior Editor; University of California, Berkeley, United States

Kenway Louie, Reviewer; New York University, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Primate prefrontal neurons signal economic risk derived from the statistics of recent reward experience" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Richard Ivry as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Kenway Louie (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
The authors of this work have examined the activity of single neurons in the monkey dorsolateral prefrontal cortex (DLPFC) during a probabilistic "matching task" in which the reward probability was adjusted dynamically according to the baseline probability as well as the animal's choice history. The focus of this manuscript is to characterize the signals related to "risk", namely the uncertainty associated with the reward probability. Whereas previous studies have examined the neural signals related to "objective risk", the current study has dissected the DLPFC signals related to experiential/subjective risk, and demonstrated how such risk signals coexist with other signals previously identified in this brain area.
Essential revisions:
1) The Introduction needs better organization. Unfortunately, the term risk has been used in a few different ways, and the authors have attempted to clarify how this term is used in the present study, but the first two paragraphs of the Introduction are still somewhat confusing. The manuscript might be easier to digest if the authors clarified only the terms used in this study and avoided mistakenly equating reward probability and risk (which are different) or using terms such as "variance risk", which are unnecessarily confusing. In particular, the discussion of loss and risk is not clear, and is perhaps unnecessary since the focus of the paper is on variance.
2) Behavioral effects of risk on choice. The authors should provide more information about how reward risk (or variance) plays into monkey choice behavior in this task. As is well documented in this type of matching-law task, choice behavior is a product of both past rewards and past choices (as also addressed by the reward and choice kernels quantified by the authors). This arises because these kinds of environments are dynamic and complex: the baited reward outcomes (probabilistic rewards that remain once armed) are constructed to generate matching-law behavior, and thus the values of options explicitly depend on past outcomes and choices.
The question is how risk (true or subjective) affects behavior, and whether it is independent of past rewards and choices. The crucial issue is whether risk (or some measure of reward variance) captures an aspect of behavior beyond what is captured by past rewards and choices alone. In the analyses in Figure 3, the authors decompose task-relevant information into object value and subjective object risk, and show that both have an effect on monkey choice. However, object value is the weighted sum of reward history (with the weights determined by regression on overall monkey choices), meaning that the effect of past choices is explicitly not captured in the value variable. The key question is whether subjective risk is just capturing the effect of past choices, and I think the authors need to find a way of quantifying the relative influence of risk, above and beyond past choices, on behavior. One way would be to do a formal comparison between the logistic regression based on past rewards and choices and a model using value and risk, or alternatively a full model with rewards, choices, AND risk in the same model. [The authors may have included this kind of analysis (as seems to be the case in the model comparison noted in the Materials and methods, subsection “Testing the influence of subjective object risk on choices”), but the description in the main text only refers to value as a function of past rewards; please correct me if I am mistaken.]
Understanding of risk, as defined, affects choice behavior is important two reasons. At the behavioral level, it is not yet clear to me how reward risk – objective or subjective – is related to monkey choice behavior in this experiment. One possibility, as detailed above, is that risk is merely capturing the effect of past monkey choices. Alternatively, monkeys may have a preference/aversion for risk itself. The latter point is what I believe the authors are getting at in their analyses (subsection “Subjective risk: definition and behavior”, last paragraph, Figure 3D), but as discussed above this is done by examining the influence of risk on choices for various object value differences, with object values determined solely by reward kernel weighting – this ignores the influence of past choices that may well govern the effect of risk in this task. At the neural level, the authors electrophysiological results show a robust coding for risk in DLPFC neurons, and it is important to distinguish whether these neurons are coding for (objective or subjective) risk itself, or simply some aspect of choicerelated strategy that correlates with risk.
3) Methods to quantify subjective risk need to be justified better or improved. The authors have demonstrated that the animals incorporated their choice history to determine their values (Equation 5). Given this, it seems difficult to justify that choice history is not incorporated in the estimation of risk (Equations 6 and 7).
First, Equation 6 (that estimates value) does so in a classical RL way by assigning higher values to options that have been recently rewarded (more often). However, in this task the true value of an option is considerably lower after having been recently chosen (Equation 1 says the reward probability is lowest after a choice and then grows with each trial the option is not chosen) and animals seem to understand this as shown by their β_{c} weights. Shouldn't the subjective value estimate be based on both past reward and past choice since both influence the animal's choice? The same point applies to Equation 7. If animals understand the task, they should know that the option's risk on the current trial is influenced by its choice history in addition to its reward history.
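To make the dependence on choice history concrete, the sketch below simulates a generic baited ("dual assignment with hold") option. It assumes, hypothetically, that on every trial an unarmed option becomes armed with a fixed probability p and stays armed until it is chosen; the manuscript's Equation 1 may differ in detail. Under that assumption, the probability that choosing the option is rewarded after n consecutive trials without choosing it is 1 − (1 − p)^(n + 1).

```python
import random


def baited_reward_prob(p: float, n_unchosen: int) -> float:
    """Closed-form probability that the option is armed when it is chosen.

    Assumes one arming opportunity per trial (including the choice trial)
    and that an armed option stays armed until it is chosen.
    """
    return 1.0 - (1.0 - p) ** (n_unchosen + 1)


def simulate_baited(p: float, n_unchosen: int, n_reps: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability under the same assumptions."""
    rng = random.Random(seed)
    rewarded = 0
    for _ in range(n_reps):
        armed = False  # the option was just chosen, so any bait was collected
        # n_unchosen trials without choosing it, then the trial on which it is chosen
        for _ in range(n_unchosen + 1):
            if not armed and rng.random() < p:
                armed = True
        rewarded += armed
    return rewarded / n_reps


if __name__ == "__main__":
    for n in range(5):
        print(n, round(baited_reward_prob(0.2, n), 3), round(simulate_baited(0.2, n), 3))
```

Under these assumptions the reward probability grows with every trial the option goes unchosen, which is why past choices, and not only past rewards, carry information about an option's current reward probability and hence its variance.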
Second, Equation 7 attempts to estimate variance. It does not seem to be a true variance measure (i.e. mean squared deviation of quantities from the mean of their distribution) because it is based on the deviation from the sum of those quantities (OV_{A}, from Equation 6) not their mean.
Beyond that, an estimate of variance should involve the variance of past outcomes, but Equation 7 is also strongly influenced by the variance of the β weights β_{r} and the variance of choice history, and treats reward and non-reward asymmetrically. For instance, if the animal chose option A 10 consecutive times and always received reward, then the reward history has zero variance but var_{A} will still be high because of the variance in the β_{r}. Also, if the animal chose A 10 consecutive times and always received no reward, then var_{A} will be zero, even though it is symmetrical to the previous case and should have the same variance. Finally, if the animal chose B 5 times and then chose A 5 times and always received reward, then var_{A} will be higher than if the animal chose A all 10 times and always received reward, even though the variance of the reward histories from choices of A are the same (5/5 vs. 10/10 rewards), because of the variance in choice history.
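The first two examples can be reproduced with a minimal sketch. It assumes, for illustration only, that the original Equation 7 applied the weight inside the square, Σ_j (β_j R_j − OV)² / N, with OV taken as the mean of the weighted rewards (using the sum instead, as the reviewer notes, gives the same qualitative pattern). The exponentially decaying weights are hypothetical stand-ins for the fitted reward kernel.

```python
import numpy as np

# Hypothetical, exponentially decaying weights for the last 10 trials
# (most recent first); a fitted reward kernel would take their place.
BETA = 0.5 ** np.arange(10)


def var_weight_inside(rewards, beta=BETA):
    """Assumed original form: the weight is applied before squaring,
    sum_j (beta_j * r_j - OV)^2 / N, with OV the mean weighted reward."""
    r = np.asarray(rewards, dtype=float)
    weighted = beta * r
    ov = weighted.mean()
    return float(np.mean((weighted - ov) ** 2))


print(var_weight_inside(np.ones(10)))   # > 0: driven only by the spread of the weights
print(var_weight_inside(np.zeros(10)))  # 0.0: every weighted reward is zero
```

Ten rewarded choices thus yield non-zero "variance" while ten unrewarded choices yield zero, which is the asymmetry described above.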
The authors must either 1) use another modelling approach that considers the structure of the "dual assignment with hold" task or 2) substantially revise to make it clear to the reader what the limitations are of the current approach and why they chose to utilize it.
4) Relationship between neural signals related to objective and subjective risk. One important aspect, however, that is not addressed is the relationship between true risk (i.e. Equation 2) and subjective risk (i.e. Equation 7). How correlated are these measures? At the behavioral level, does subjective risk do a better job at explaining choices compared to true risk?
At the neural level, the major conclusions of the paper center around the DLPFC coding of subjective object and action risk, but the paper does not clearly show that subjective risk better explains neural responses than true objective risk. For example, based on the stated results, 102 of 205 neurons (subsection “Neuronal coding of objective risk”, second paragraph) significantly responded to true object risk and 96 of 205 neurons (subsection “Neuronal coding of subjective risk associated with choice objects”, first paragraph) coded for subjective object risk. Is there a way for the authors to statistically distinguish whether DLPFC is encoding subjective rather than objective risk? If the different risk measures are uncorrelated (or only mildly correlated), this should be testable; if they are strongly correlated, I am not sure that the neural analyses centered on subjective risk (rather than true risk) are the right approach.
Showing that the authors' subjective risk estimate better captures choice and/or neural data is important because the specific quantification of subjective risk is not well known or clearly justified. Aside from the issue of the potential relationship between past choice effects and risk (see point 1 above), the weighting of past rewards in the variance calculation feels a bit arbitrary (not the reward kernel itself, which is well known in quantifying choice, but its use in estimating variance). In addition to showing that such a measure better captures behavior/neural responses, it would help if the authors could provide a more formal justification for their form of subjective risk.
5) Potential problems and weaknesses of decoding analysis. The decoding analysis needs to be strengthened, because currently it does not attempt to distinguish between risk signals and other potentially covarying signals (as was done in the regression analysis). This can be accomplished, for example, by balancing the trials with low and high levels of risk in terms of other potentially confounding variables (c.f. Massi et al., 2018).
The "% of neurons" plots common to neuroeconomics studies (and SVM-style decoding, which only shows what information is in a pool of neurons and not how good individual neurons are at "encoding") may not be well suited for dlPFC. dlPFC is a complex, spatially and object-selective, attention-, motivation- and reward-related area, and accordingly multiplexes many signals. It is simply very hard to tell what is going on from the current figures on a cell-by-cell basis. I want to see whether the coding strategies (e.g. value, risk) of single neurons in dlPFC are consistent across the task epochs in this task, to get a better sense of what dlPFC may be doing, and to get a better understanding of the relationship of those value-related variables with the spatial and object preferences of single neurons.
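One way to implement the suggested balancing is to stratify trials by the potential confound (for example, binned object value) and subsample so that low- and high-risk trials are equally frequent within every stratum. The sketch below is a hypothetical helper; the number of bins, the median split on risk and all names are assumptions, not the procedure of Massi et al. (2018).

```python
import numpy as np


def balance_by_confound(risk, confound, n_bins=4, seed=0):
    """Return trial indices in which low- and high-risk trials are equally
    frequent within each quantile bin of a confounding variable.

    risk, confound: 1-D arrays with one entry per trial.
    Risk is split at its median into 'low' and 'high' classes (assumption).
    """
    rng = np.random.default_rng(seed)
    risk = np.asarray(risk, dtype=float)
    confound = np.asarray(confound, dtype=float)
    high = risk > np.median(risk)
    # Inner quantile edges of the confound; digitize assigns each trial to a bin.
    edges = np.quantile(confound, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(confound, edges)
    keep = []
    for b in np.unique(bins):
        hi_idx = np.flatnonzero((bins == b) & high)
        lo_idx = np.flatnonzero((bins == b) & ~high)
        n = min(len(hi_idx), len(lo_idx))  # equalise class counts within the bin
        keep.extend(rng.choice(hi_idx, size=n, replace=False))
        keep.extend(rng.choice(lo_idx, size=n, replace=False))
    return np.sort(np.array(keep, dtype=int))


rng = np.random.default_rng(1)
value = rng.normal(size=500)
risk = 0.5 * value + rng.normal(size=500)  # synthetic risk that covaries with value
idx = balance_by_confound(risk, value)
```

A decoder trained on the returned trials can no longer separate low from high risk simply by tracking the coarse value bin; finer bins trade residual confounding against the number of retained trials.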
6) Description of models and equations need to be improved.
6a) The authors introduce two equations (Equations 2 and 7) to define objective and subjective risk, respectively, which are central to this study. However, these two equations can be simplified. For example, Equation 2 would be much easier to understand if it were replaced or supplemented by the much more common expression p(1 − p) for the variance of a Bernoulli distribution. Including the reward magnitude in Equation 2 does not seem necessary, and this causes confusion. In addition, in Equation 7, the coefficient β should be outside the square of the difference between reward and OV. If not, this needs to be justified or explained better.
6b) The authors have used Equation 2 in their stepwise regression analysis. However, this seems problematic, because it seems to violate the full-rank assumption, given that ValueA + ValueB = ValueL + ValueR and RiskA + RiskB = RiskL + RiskR? Similarly, is it possible to have both TrueProbA and TrueProbB in Equation 3, given that they sum to 1, i.e., TrueProbB = 1 − TrueProbA?
6c) The exact formulation of Equation 8 is a little unclear. The text states that "To account for known choice biases in matching tasks (Lau and Glimcher, 2005), we added to the object-value terms the weighted choice-history using weights derived from Equation 5." Can the authors state explicitly how ObjectValue was calculated?
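The reviewer's suggestion can be checked numerically. Assuming Equation 2 describes a reward of fixed magnitude m delivered with probability p (an assumption about its form), the variance is m²·p(1 − p), so a constant magnitude only rescales the Bernoulli variance p(1 − p). The sketch compares the definition E[(X − E[X])²] with the closed form.

```python
def variance_from_definition(p: float, m: float = 1.0) -> float:
    """E[(X - E[X])^2] for X = m with probability p, else 0."""
    mean = p * m
    return p * (m - mean) ** 2 + (1 - p) * (0.0 - mean) ** 2


def bernoulli_variance(p: float, m: float = 1.0) -> float:
    """Closed form m^2 * p * (1 - p); maximal at p = 0.5."""
    return m ** 2 * p * (1 - p)


for p in (0.1, 0.5, 0.9):
    print(p, variance_from_definition(p), bernoulli_variance(p))
```

If m is constant across trials, dropping it changes only the scale of a risk regressor, not its correlation with choices or neuronal activity.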
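The collinearity concern can be illustrated with synthetic data. The sketch assumes that object A is placed at random on the left or right, so the action values are the object values re-indexed by position and ValueL + ValueR = ValueA + ValueB on every trial; likewise, complementary probabilities plus an intercept are exactly collinear. Any regression that includes all of these columns at once has no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200
value_a = rng.uniform(size=n_trials)
value_b = rng.uniform(size=n_trials)
a_on_left = rng.random(n_trials) < 0.5  # random left/right placement of object A

# Action values are the same numbers re-indexed by spatial position.
value_l = np.where(a_on_left, value_a, value_b)
value_r = np.where(a_on_left, value_b, value_a)

intercept = np.ones(n_trials)
X = np.column_stack([intercept, value_a, value_b, value_l, value_r])
print(X.shape[1], np.linalg.matrix_rank(X))  # 5 4

# Same issue for complementary probabilities alongside an intercept.
prob_a = rng.uniform(size=n_trials)
X_prob = np.column_stack([intercept, prob_a, 1.0 - prob_a])
print(X_prob.shape[1], np.linalg.matrix_rank(X_prob))  # 3 2
```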
7) Novelty of the task. The authors claim that the animals are not cued and must derive a measure of risk (that would then influence their choices). This is important because the authors claim that previous studies used explicit cues during learning while they do not in this study, and claim this is a key advance. While it is true that the cited papers did not utilize trial-by-trial information to look at how subjective value and risk are updated on a trial-by-trial basis, their approaches seemed similar to this study in many other ways. In particular, in this study, the key moment is when the reward probabilities associated with the two options change and the animal must figure out the new probabilities by experiencing two types of external cues: choice options and feedback. Broadly, the idea that this is the first paper to look at risk estimates that are independent of cueing is factually wrong and requires revision.
Furthermore, even if it were the case, the impact of this is unclear from the current manuscript. Perhaps the authors are most excited about pre-option-presentation object-risk signals in the context of behavioral control? If so, how would this control take place? Is this an arousal / "let's get ready" signal? Or would this signal serve to bias choice to the risky object in a spatial manner (consistent with previous work on dlPFC)? Or does this risk signal influence SV derivations elsewhere in the brain?
Specifically, are there reaction time correlates of this early "object risk signal" with action after options are on?
8) Problem with non-stationarity. The authors should take into account the fact that activity of prefrontal cortex is often non-stationary and is likely to be serially correlated (autocorrelated) across successive trials. This diminishes the effective degrees of freedom, and can inflate the estimated number of neurons encoding signals that are related to events in multiple trials (equivalent to low-pass filtering). The authors should refer to a recent paper in eLife ("Striatal action-value neurons reconsidered").
9) Dynamics of DLPFC coding. The authors show that subpopulations of neurons carry multiple signals that could integrate various aspects of reward and choice information (Figure 7). Two additional analyses are important to include. First, is the percentage of neurons showing coding of two variables (last reward x choice and object risk, object value and object risk, etc.) different from that expected given the probabilities of neurons representing either variable?
Second, for those neurons that carry multiple signals, is the information coded in a consistent manner? For example, are the neurons that represent both value and risk information (Figure 7C, D) modulated in the same direction by value and risk? Figure 7 plots the timeline of explained variance, but not the actual direction of modulation. Since the behavioral data suggest that choice is driven by increases in value and risk, one would expect, for example in the case of value and risk, that neurons integrating that information would represent both in the same manner. A similar argument could be made for risk and choice. A plot of regression weight against regression weight, similar to that used for the angle analyses elsewhere in the paper, would be helpful in understanding how this information is integrated across different variable pairs.
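To illustrate how serial correlation reduces the effective degrees of freedom, the sketch below uses the standard AR(1) approximation n_eff ≈ n(1 − ρ)/(1 + ρ), where ρ is the lag-1 autocorrelation of a neuron's trial-by-trial firing rate. This textbook approximation is offered for intuition only; it is not the method of the cited eLife paper, and the simulated firing rates are synthetic.

```python
import numpy as np


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D series (mean removed)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


def effective_n(x):
    """AR(1) approximation to the effective number of independent trials."""
    rho = lag1_autocorr(x)
    return len(x) * (1 - rho) / (1 + rho)


def simulate_ar1(n, rho, seed=0):
    """Slowly drifting synthetic 'firing rate': x_t = rho * x_{t-1} + noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal()
    return x


rates = simulate_ar1(1000, rho=0.8)
print(round(lag1_autocorr(rates), 2), round(effective_n(rates)))
```

With ρ near 0.8, 1000 trials carry roughly the information of about a hundred independent ones, so regression statistics that assume independent trials will be overconfident.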
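Under the null hypothesis that coding of the two variables is independent across neurons, the expected number of neurons coding both is N·(n₁/N)·(n₂/N). The sketch below computes this expectation and an exact one-sided binomial tail, treating the marginal proportions as fixed (an approximation; a hypergeometric or permutation test would be alternatives). The counts in the example are hypothetical.

```python
from math import comb


def overlap_test(n_total, n_a, n_b, n_both):
    """Expected overlap under independence and a one-sided binomial p-value
    for observing at least n_both neurons coding both variables."""
    p_both = (n_a / n_total) * (n_b / n_total)
    expected = n_total * p_both
    p_value = sum(
        comb(n_total, k) * p_both**k * (1 - p_both) ** (n_total - k)
        for k in range(n_both, n_total + 1)
    )
    return expected, p_value


# Hypothetical counts, for illustration only.
expected, p = overlap_test(n_total=200, n_a=80, n_b=60, n_both=40)
print(expected, p)
```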
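A minimal sketch of the suggested weight-by-weight comparison: for each neuron coding both variables, the angle of its (β_value, β_risk) vector indicates whether the two variables modulate activity in the same direction (first or third quadrant) or in opposite directions. The coefficient arrays are hypothetical placeholders for fitted regression weights, and the sign-agreement fraction is one simple summary among several.

```python
import numpy as np


def coding_angles(beta_x, beta_y):
    """Angle (degrees, in [-180, 180]) of each neuron's (beta_x, beta_y) vector."""
    return np.degrees(np.arctan2(np.asarray(beta_y, float), np.asarray(beta_x, float)))


def fraction_same_sign(beta_x, beta_y):
    """Fraction of neurons whose two coefficients share the same sign."""
    bx, by = np.asarray(beta_x, float), np.asarray(beta_y, float)
    return float(np.mean(np.sign(bx) == np.sign(by)))


# Hypothetical standardized coefficients for five dual-coding neurons.
beta_value = np.array([0.4, -0.3, 0.2, 0.5, -0.1])
beta_risk = np.array([0.3, -0.2, -0.4, 0.1, -0.3])
print(coding_angles(beta_value, beta_risk).round(1))
print(fraction_same_sign(beta_value, beta_risk))  # 0.8
```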
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Primate prefrontal neurons signal economic risk derived from the statistics of recent reward experience" for further consideration at eLife. Your revised article has been favorably evaluated by Richard Ivry as the Senior Editor, a Reviewing Editor, and two reviewers.
The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:
The authors have largely addressed the concern regarding the influence of past choices on subjective value, and Table 1 shows that a model with value (from reward and choice history) and risk performs best. However, the present manuscript is still confusing in that it includes two different measures of subjective object value (Equation 6 and Equation 7).
First, it is confusing that the two equations both use the same term OV_{A} for the value measure; since they are different definitions, they should have different names. Second, it is not at all clear which measure is used in which analyses. According to the Materials and methods, Equation 6 (value from rewards alone) is used for the calculation of subjective risk (Equation 8), and Equation 7 (value from rewards and choices) is used for behavioral analyses and neural analyses involving value. So were all neural subjective risk analyses performed only with the OV_{A} measure derived from Equation 6? This seems odd, given that the results of the model comparison suggest that value is a function of reward and choice, and the authors use the reward/choice definition of value for neural analyses. Shouldn't the subjective risk measure use deviance from OV_{A} determined from reward and choice as well?
The paper would read more clearly, and be more conceptually unified, if they use a single measure for OV_{A} derived from reward and past choices (Equation 8). Note that this is different than the "risk from choice variance" addressed by the authors in their response letter – it is simply risk as variance of rewards from subjective value (calculated from reward and choice).
The description in the Materials and methods (subsection “Final calculation of subjective risk”) is incorrect given the revised Equation 8, as it implies that β weights are applied to rewards rather than to the squared deviations of rewards from OV_{A} (as described by the revised equation). For example, "βjrRAβ" term is not in the equation, and "the summed, squared deviation of subjectively weighted reward amounts from the mean weighted value of past rewards)" does not match Equation 8.
https://doi.org/10.7554/eLife.44838.032
Author response
Essential revisions:
1) Introduction needs a better organization. Unfortunately, the term risk has been used in a few different ways, and the authors have attempted to clarify how this term is used in the present study, but the first two paragraphs of the Introduction are still somewhat confusing. The manuscript might be easier to digest if the authors clarified only the terms used in this study, and avoided mistakenly equating reward probability and risk (which are different) or using terms such as "variance risk", which is unnecessarily confusing. In particular, the discussion of loss and risk is not clear, and perhaps unnecessary for the paper since the focus is on variance.
Thank you for pointing out the need for better organization. We have revised the Introduction accordingly and now focus on the terms used in the present study.
“Rewards vary intrinsically. The variation can be characterized by a probability distribution over reward magnitudes. […] Thus, among the different definitions of economic risk, variance constitutes the most basic form, and this study will consider only variance as economic risk.”
2) Behavioral effects of risk on choice. The authors should provide more information about how reward risk (or variance) plays into monkey choice behavior in this task. As is well documented in this type of matching-law task, choice behavior is a product of both past rewards and past choices (as also addressed by the reward and choice kernels quantified by the authors). This arises because these kinds of environments are dynamic and complex: the baited reward outcomes (probabilistic rewards that remain once armed) are constructed to generate matching-law behavior, and thus the values of options explicitly depend on past outcomes and choices.
The question is how risk (true or subjective) affects behavior, and whether it is independent of past rewards and choices. The crucial issue is whether risk (or some measure of reward variance) captures an aspect of behavior beyond what is captured by past rewards and choices alone. In the analyses in Figure 3, the authors decompose task-relevant information into object value and subjective object risk, and show that both have an effect on monkey choice. However, object value is the weighted sum of reward history (with the weights determined by regression on overall monkey choices), meaning that the effect of past choices is explicitly not captured in the value variable. The key question is whether subjective risk is just capturing the effect of past choices, and I think the authors need to find a way of quantifying the relative influence of risk, above and beyond past choices, on behavior. One way would be to do a formal comparison between the logistic regression based on past rewards and choices and a model using value and risk, or alternatively to fit a full model with rewards, choices, AND risk in the same model. [The authors may have included this kind of analysis (as seems to be the case in the model comparison noted in the Materials and methods, subsection “Testing the influence of subjective object risk on choices”), but the description in the main text only refers to value as a function of past rewards; please correct me if I am mistaken.]
Understanding how risk, as defined, affects choice behavior is important for two reasons. At the behavioral level, it is not yet clear to me how reward risk (objective or subjective) is related to monkey choice behavior in this experiment. One possibility, as detailed above, is that risk is merely capturing the effect of past monkey choices. Alternatively, monkeys may have a preference for, or aversion to, risk itself. The latter point is what I believe the authors are getting at in their analyses (subsection “Subjective risk: definition and behavior”, last paragraph, Figure 3D), but as discussed above this is done by examining the influence of risk on choices for various object value differences, with object values determined solely by reward kernel weighting; this ignores the influence of past choices, which may well govern the effect of risk in this task. At the neural level, the authors' electrophysiological results show robust coding of risk in DLPFC neurons, and it is important to distinguish whether these neurons are coding for (objective or subjective) risk itself, or simply some aspect of choice-related strategy that correlates with risk.
We have included a formal comparison of different models of the animals’ behavioral choices. Specifically, we systematically compared models that included different forms of subjective values, with values based on weighted reward history, weighted choice history or both weighted reward and weighted choice history. We compared the effect of adding our main risk measure as a separate regressor to these different forms of value. We also tested a model based on true, objective reward probabilities and risk. Finally, we tested three versions of reinforcement learning models: (i) a standard Rescorla-Wagner model that updated the value of the chosen option following outcomes, (ii) an adaptation of this model that incorporated time-dependent effects related to choice history, which has been proposed as a suitable model of matching behavior (Huh et al., 2009), and (iii) a variant of the Rescorla-Wagner model that updated the values of both the chosen and the unchosen option. The results are shown in a new table (Table 1).
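For readers less familiar with these reinforcement-learning models, the sketch below shows the value update of a standard Rescorla-Wagner model (only the chosen option is updated) and one possible variant that also updates the unchosen option. The learning rates, and the choice to let the unchosen value decay toward zero, are assumptions for illustration; the variant fitted by the authors, and the time-dependent model of Huh et al. (2009), may differ.

```python
def rw_update(values, chosen, reward, alpha=0.3):
    """Standard Rescorla-Wagner: move the chosen value toward the outcome."""
    new = list(values)
    new[chosen] += alpha * (reward - new[chosen])
    return new


def rw_update_both(values, chosen, reward, alpha=0.3, alpha_unchosen=0.1):
    """Hypothetical variant: the chosen value is updated as above, and the
    unchosen value decays toward zero at its own rate."""
    new = rw_update(values, chosen, reward, alpha)
    other = 1 - chosen  # two options, indexed 0 and 1
    new[other] += alpha_unchosen * (0.0 - new[other])
    return new


v = rw_update([0.5, 0.5], chosen=0, reward=1.0)        # approximately [0.65, 0.5]
v2 = rw_update_both([0.5, 0.5], chosen=0, reward=1.0)  # approximately [0.65, 0.45]
print(v, v2)
```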
In both animals, the model comparisons favored a model that included subjective value and subjective risk regressors, with subjective value based on both weighted reward and choice history. This result confirms that our main measure of subjective risk was behaviorally meaningful and explained variation in choices that is independent of reward history and choice history.
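As an illustration of this kind of comparison (not a reproduction of Table 1), the sketch below fits two logistic regressions of choice by Newton-Raphson and compares them with the Bayesian information criterion (BIC, lower is better). The data are simulated with a built-in risk effect, and the use of BIC is an assumption; the criterion used for Table 1 is not stated in this response.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    Returns the coefficients and the maximised log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        w += np.linalg.solve(hess, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return w, loglik


def bic(loglik, n_params, n_obs):
    return n_params * np.log(n_obs) - 2.0 * loglik


rng = np.random.default_rng(0)
n = 2000
value_diff = rng.normal(size=n)  # synthetic subjective value difference (A - B)
risk_diff = rng.normal(size=n)   # synthetic subjective risk difference (A - B)
logit = 1.5 * value_diff + 1.0 * risk_diff
choice_a = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

models = {
    "value only": np.column_stack([np.ones(n), value_diff]),
    "value + risk": np.column_stack([np.ones(n), value_diff, risk_diff]),
}
scores = {}
for name, X in models.items():
    _, ll = fit_logistic(X, choice_a)
    scores[name] = bic(ll, X.shape[1], n)
print(scores)
```

In a real comparison the value regressor would itself be built from weighted reward and choice history, so that a risk term is credited only with choice variance that those histories do not explain.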
To further illustrate this point, we have performed a new logistic regression of choices on value and risk in a subset of trials that minimized the value difference between options. For these trials, value difference did not explain variation in choices (as expected by design of this test) whereas the effect of risk remained significant. Thus, the effect of risk on choices was not explained by value difference. The result of this analysis is shown in Figure 3E, inset and described in the last paragraph of the Results subsection “Subjective risk: definition and behavior”.
3) Methods to quantify subjective risk need to be justified better or improved. The authors have demonstrated that the animals incorporated their choice history to determine their values (Equation 5). Given this, it seems difficult to justify that choice history is not incorporated in the estimation of risk (Equations 6 and 7).
First, Equation 6 (that estimates value) does so in a classical RL way by assigning higher values to options that have been recently rewarded (more often). However, in this task the true value of an option is considerably lower after having been recently chosen (Equation 1 says the reward probability is lowest after a choice and then grows with each trial the option is not chosen) and animals seem to understand this as shown by their β_{c} weights. Shouldn't the subjective value estimate be based on both past reward and past choice since both influence the animal's choice? The same point applies to Equation 7. If animals understand the task, they should know that the option's risk on the current trial is influenced by its choice history in addition to its reward history.
Second, Equation 7 attempts to estimate variance. It does not seem to be a true variance measure (i.e. mean squared deviation of quantities from the mean of their distribution) because it is based on the deviation from the sum of those quantities (OV_{A}, from Equation 6) not their mean.
Beyond that, an estimate of variance should involve the variance of past outcomes, but Equation 7 is also strongly influenced by the variance of the β weights β_{r} and the variance of choice history, and treats reward and non-reward asymmetrically. For instance, if the animal chose option A 10 consecutive times and always received reward, then the reward history has zero variance but var_{A} will still be high because of the variance in the β_{r}. Also, if the animal chose A 10 consecutive times and always received no reward, then var_{A} will be zero, even though it is symmetrical to the previous case and should have the same variance. Finally, if the animal chose B 5 times and then chose A 5 times and always received reward, then var_{A} will be higher than if the animal chose A all 10 times and always received reward, even though the variance of the reward histories from choices of A are the same (5/5 vs. 10/10 rewards), because of the variance in choice history.
The authors must either 1) use another modelling approach that considers the structure of the "dual assignment with hold" task or 2) substantially revise to make it clear to the reader what the limitations are of the current approach and why they chose to utilize it.
We thank the reviewers for raising these issues and pointing out the need for clearer and better-justified definitions. In responding to the points raised we have revised our value definition to incorporate choice history and have recalculated all our main analyses accordingly and updated all relevant figures. We now also explicitly discuss the assumptions and limitations of our approach to defining subjective risk. Below we explain these changes in more detail.
1,1) Should value include choice history? We agree that it is important to use a comprehensive definition of value for behavioral and neuronal analyses. Accordingly, we have now modelled value in direct correspondence to our logistic regression model and incorporated choice history into one scalar value measure. We then recalculated all our models with this revised, more comprehensive value definition, which resulted in small changes in the numbers of identified neurons. We have updated all the numbers of identified neurons and all relevant figures to reflect these changes. The new value definition is described in Results, and in Materials and methods, section “Calculation of weighted, subjective value.”
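A minimal sketch of a scalar value built in correspondence to a logistic regression on reward and choice history: each object's value is the weighted sum of its recent rewards plus the weighted sum of its recent choices. The kernel weights, including the negative sign of the choice kernel, are hypothetical placeholders for the fitted weights of Equation 5, and the exact form in the manuscript's Materials and methods may differ.

```python
import numpy as np

# Hypothetical kernels for lags 1..10 (most recent trial first).
BETA_REWARD = 0.8 * 0.7 ** np.arange(10)
BETA_CHOICE = -0.2 * 0.7 ** np.arange(10)  # sign assumed: recent choices lower value


def object_value(rewards, choices, beta_r=BETA_REWARD, beta_c=BETA_CHOICE):
    """rewards[j] / choices[j]: 1 if the object was rewarded / chosen j+1 trials ago."""
    r = np.asarray(rewards, float)
    c = np.asarray(choices, float)
    return float(beta_r @ r + beta_c @ c)


rewards = np.zeros(10)
choices = np.zeros(10)
rewards[0] = choices[0] = 1.0  # chosen and rewarded on the previous trial
print(object_value(rewards, choices))  # approximately 0.8 - 0.2 = 0.6
```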
Results: “As our aim was to study the risk associated with specific objects, we estimated object value by the mean of subjectively weighted reward history over the past ten trials (Figure 3C, dashed blue curve, Equation 6); this object value definition provided the basis for estimating subjective risk as described next. […] (We consider distinctions between reward and choice history and their influence and risk in the Discussion).”
1,2) Should risk include choice history? One of the main aims of our study was to extend the well-established notion of the risk of choice options by introducing a risk measure derived from an animal’s recent experiences, rather than from pretrained explicit risk cues. Accordingly, in order to facilitate comparisons with previous behavioral and neurophysiological risk studies, we restricted our definition of risk to the variance of past reward outcomes; we did not extend the risk measure to include the variance of past choices irrespective of rewards, as such “risk from choice variance” does not have a correspondence in the economic or neuroeconomic risk literature. Our results suggest that defining risk based on the variance of past rewards provided a reasonable account of behavioral and neuronal data: we show that our main subjective risk measure has a distinct influence on choices, is encoded by a substantial number of neurons, and seems to provide a better explanation of many neuronal responses than an alternative, objective risk measure. To better explain our motivation for defining risk based on reward history, we have included the following additional text in the Discussion:
“Our definition of subjective risk has some limitations. To facilitate comparisons with previous studies, we restricted our definition to the variance of past reward outcomes; we did not extend the risk measure to the variance of past choices, which would not have a clear correspondence in the economic or neuroeconomic literature. […] An extension of this approach could introduce calculation of reward statistics over flexible time windows, possibly depending on the number of times an option was recently chosen (similar to an adaptive learning rate in reinforcement learning models).”
2) Should risk involve subtraction of value calculated from sum or mean? Thank you for pointing this out. For the results presented previously in the manuscript we did follow the general definition of risk and subtracted the mean (rather than the sum) although we used the sum for value definition. (Note that for behavioral and neuronal analyses that test the effect of value, using the sum or mean of the weighted reward history would yield identical results as the number of trials over which the mean is calculated is constant (N = 10 in our case)). For the variance calculation it is of course critical that deviations of single trials are referenced to the mean rather than the sum. We have now corrected the equation in the Materials and methods (Equation 6).
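The distinction can be checked numerically: over a fixed window of N trials, value computed as a sum and as a mean differ only by the constant factor N, so they are perfectly correlated and give equivalent regression results, whereas deviations referenced to the sum rather than the mean inflate the variance estimate. The reward sequences below are synthetic; N = 10 follows the text.

```python
import numpy as np

N = 10  # trials in the window, as stated in the response
rng = np.random.default_rng(0)
windows = rng.integers(0, 2, size=(500, N)).astype(float)  # synthetic 0/1 reward histories

value_sum = windows.sum(axis=1)
value_mean = windows.mean(axis=1)
# Sum and mean differ by the constant factor N, so they are perfectly correlated.
print(np.corrcoef(value_sum, value_mean)[0, 1])

history = windows[0]
var_about_mean = np.mean((history - history.mean()) ** 2)  # proper (population) variance
var_about_sum = np.mean((history - history.sum()) ** 2)    # deviations from the sum: inflated
print(var_about_mean, var_about_sum, np.var(history))
```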
3) Influence of weights and choice history on variance. We have rewritten the risk equation (now Equation 8) to reflect correctly how we calculated risk: the weight vector was applied to the vector of squared deviations from the mean. With this definition, deviations calculated for more recent trials are given a larger weight in the variance calculation. This approach is similar to the well-established definition of value based on weighted reward history, and it is consistent with the notion that the animals base their choices more strongly on reward outcomes of recent trials than on those of more remote trials.
With this definition, variance would not be artificially inflated by the weight vector, and reward and non-reward are not treated asymmetrically. For example, if the animal chose option A 10 consecutive times and always received reward, then reward history has zero variance and multiplication with the weight vector still results in zero variance. The same would be true for the case of 10 consecutive non-rewarded trials.
The reviewer notes correctly that “if the animal chose B 5 times and then chose A 5 times and always received reward, then var_{A} will be higher than if the animal chose A all 10 times and always received reward”. This is a consequence of the time window (here: 10 trials) used for the variance calculation, which also affects the calculation of value in a similar manner. We chose to follow this common approach of using a fixed temporal window over which reward statistics are calculated because of its generality and simplicity, and in order to link our study with these established approaches. An extension of this approach would be to introduce a flexible temporal window that calculates reward statistics over varying time windows, possibly depending on the number of times an option was recently chosen (similar to an adaptive learning rate in reinforcement learning models).
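A minimal sketch of a recency-weighted variance in which the weights multiply the squared deviations, as described in the response. It assumes normalised weights and deviations taken from the weighted mean reward; the exponential weights are hypothetical stand-ins for the fitted reward kernel. The printed checks mirror the reviewer's examples: constant reward histories give zero variance, and swapping reward and non-reward leaves the variance unchanged.

```python
import numpy as np

# Hypothetical recency weights for the last 10 trials (most recent first).
WEIGHTS = 0.7 ** np.arange(10)


def weighted_risk(rewards, weights=WEIGHTS):
    """Weighted variance: sum_j w_j (r_j - rbar_w)^2 with normalised weights."""
    r = np.asarray(rewards, float)
    w = weights / weights.sum()
    rbar = np.sum(w * r)  # recency-weighted mean reward
    return float(np.sum(w * (r - rbar) ** 2))


mixed = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
print(weighted_risk(np.ones(10)), weighted_risk(np.zeros(10)))  # both (numerically) zero
print(weighted_risk(mixed), weighted_risk(1 - mixed))           # equal up to rounding
```

For 0/1 rewards this weighted variance equals m(1 − m), with m the weighted mean reward, so it never exceeds the Bernoulli maximum of 0.25.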
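The window effect can be illustrated under one explicit assumption: that trials on which A was not chosen enter A's 10-trial reward history as zeros (this coding is hypothetical and may not match the manuscript). Choosing B five times and then A five times, always rewarded, then yields a mixed history for A and non-zero variance, whereas ten rewarded choices of A yield none. An unweighted variance is used for brevity.

```python
import numpy as np


def reward_history_for(option, choices, rewards):
    """Reward history of `option` over the window; unchosen trials count as 0 (assumption)."""
    return np.array([r if c == option else 0.0 for c, r in zip(choices, rewards)])


always_rewarded = [1.0] * 10
h_all_a = reward_history_for("A", ["A"] * 10, always_rewarded)
h_b_then_a = reward_history_for("A", ["B"] * 5 + ["A"] * 5, always_rewarded)
print(np.var(h_all_a), np.var(h_b_then_a))  # 0.0 0.25
```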
To acknowledge the assumptions and limitations of our risk definition we have included the following new text in the Discussion section:
“We followed the common approach of calculating reward statistics over a fixed temporal window because of its generality and simplicity, and to link our data to previous studies. An extension of this approach could introduce calculation of reward statistics over flexible time windows, possibly depending on the number of times an option was recently chosen (similar to an adaptive learning rate in reinforcement learning models).”
4) Relationship between neural signals related to objective and subjective risk. One important aspect, however, that is not addressed is the relationship between true risk (i.e. Equation 2) and subjective risk (i.e. Equation 7). How correlated are these measures? At the behavioral level, does subjective risk do a better job at explaining choices compared to true risk?
At the neural level, the major conclusions of the paper center around the DLPFC coding of subjective object and action risk, but the paper does not clearly show that subjective risk better explains neural responses than true objective risk. For example, based on the stated results, 102 of 205 neurons (subsection “Neuronal coding of objective risk”, second paragraph) significantly responded to true object risk and 96 of 205 neurons (subsection “Neuronal coding of subjective risk associated with choice objects”, first paragraph) coded for subjective object risk. Is there a way for the authors to statistically distinguish whether DLPFC is encoding subjective rather than objective risk? If the different risk measures are uncorrelated (or only mildly correlated), this should be testable; if they are strongly correlated, I am not sure that the neural analyses centered on subjective risk (rather than true risk) are the right approach.
Showing that the authors' subjective risk estimate better captures choice and/or neural data is important because the specific quantification of subjective risk is not well known or clearly justified. Aside from the issue of the potential relationship between past choice effects and risk (see point 1 above), the weighting of past rewards in the variance calculation feels a bit arbitrary (not the reward kernel itself, which is well known in quantifying choice, but its use in estimating variance). In addition to showing that such a measure better captures behavior/neural responses, it would help if the authors could provide a more formal justification for their form of subjective risk.
The objective and subjective risk measures only showed a moderate correlation: mean shared variance was R² = 0.111 ± 0.004 (mean ± s.e.m. across sessions). To establish their relevance for behavioral choices, we have included a formal model comparison, summarized in Table 1, which favored a model based on subjective value and subjective risk over a model based on objective value and objective risk. We also examined in direct comparisons whether neuronal responses were better explained by objective or subjective risk. These analyses are described in Results and in Figure 4—figure supplement 3B and C:
“A direct comparison of objective and subjective risk showed that neuronal activity tended to be better explained by subjective risk. […] When both risk measures were included in a stepwise regression model (Equation 13), and thus competed to explain variance in neuronal activity, we identified more neurons related to subjective risk than to objective risk (107 compared to 83 neurons, Figure 4—figure supplement 3C), of which 101 neurons were exclusively related to subjective risk but not objective risk (shared variance between the two risk measures across sessions: R² = 0.111 ± 0.004, mean ± s.e.m.).”
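The shared-variance figure above is a per-session statistic averaged across sessions. A minimal Python sketch of that summary, assuming that shared variance means the squared Pearson correlation between the trial-by-trial objective and subjective risk series within a session, and that the error term is the standard error across sessions; `shared_variance` and `mean_sem` are hypothetical helpers, not the authors' code.

```python
import math
from statistics import mean, stdev

def shared_variance(x, y):
    """Squared Pearson correlation (R^2) between two equal-length sequences,
    e.g. trial-by-trial objective and subjective risk within one session."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n)) across sessions."""
    return mean(values), stdev(values) / math.sqrt(len(values))

# Hypothetical usage: one (objective, subjective) risk series per session.
# sessions = [(obj_risk_1, subj_risk_1), (obj_risk_2, subj_risk_2), ...]
# r2_mean, r2_sem = mean_sem([shared_variance(o, s) for o, s in sessions])
```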
5) Potential problems and weaknesses of decoding analysis. The decoding analysis needs to be strengthened, because currently this analysis does not attempt to distinguish between risk signals and other potentially covarying signals (as was done in the regression analysis). This can be accomplished, for example, by balancing the trials with low and high levels of risk in terms of other potentially confounding variables (cf. Massi et al., 2018).
The '% of neurons' plots common to neuroeconomics studies (and SVM-style decoding, which only shows what information is in a pool of neurons and not how good individual neurons are at "encoding") may not be well suited for dlPFC. dlPFC is a complex area with spatially selective, object-selective, attention-related and motivation/reward-related activity, and accordingly multiplexes many signals. It is simply very hard to tell what is going on from the current figures on a cell-by-cell basis. I want to see whether the coding strategies (e.g. value, risk) of single dlPFC neurons are consistent across the task epochs, to get a better sense of what dlPFC may be doing and a better understanding of the relationship of those value-related variables with the spatial and object preferences of single neurons.
We have performed additional decoding analyses in which we balanced risk levels with respect to other task-related variables. The results of these analyses are shown in Figure 6E and confirm significant decoding of risk levels and are described in Results:
“Decoding of risk from neuronal responses remained significantly above chance in control analyses in which we held constant the value of other task-related variables including object choice, action and cue position (Figure 6E).”
We also clarify that we used these decoding analyses to examine the extent to which a biologically realistic decoder, such as a downstream neuron, could read out risk levels from neuronal population responses, rather than to provide an alternative to the single-neuron regression analysis (subsection “Population decoding of object risk and action risk”). Such a downstream neuron decoding risk from its inputs would, of course, need to perform the decoding on naturally occurring, “unbalanced” data.
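The balancing control can be illustrated by subsampling trials so that, within every level of a potentially confounding variable, each risk level contributes the same number of trials before a decoder is trained. The sketch below assumes a subsampling scheme of this kind; the authors' exact procedure and the SVM decoder itself are not reproduced, and `balance_trials` is a hypothetical helper.

```python
import random
from collections import defaultdict

def balance_trials(risk_labels, confound_labels, seed=0):
    """Return sorted trial indices subsampled so that, within each level of the
    confounding variable, every risk level has the same number of trials."""
    rng = random.Random(seed)
    # Group trial indices by (confound level, risk level).
    groups = defaultdict(list)
    for idx, (risk, conf) in enumerate(zip(risk_labels, confound_labels)):
        groups[(conf, risk)].append(idx)
    risk_levels = sorted(set(risk_labels))
    keep = []
    for conf in sorted(set(confound_labels)):
        cells = [groups[(conf, r)] for r in risk_levels]
        n = min(len(cell) for cell in cells)  # the smallest cell sets the count
        for cell in cells:
            keep.extend(rng.sample(cell, n))  # draw n trials without replacement
    return sorted(keep)
```

To balance against several variables at once (for example object choice, action and cue position), pass one tuple per trial as the confound label; a trial then only competes with trials that match it on all of those variables.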
We thank the reviewer(s) for raising the interesting issue of the complexity of neuronal responses in DLPFC and their relationships to spatial variables. We have now examined this issue in more detail in our data set, included these analyses in Figure 7I and Figure 7—figure supplement 1, and described them in the Results and in the Discussion.
Results: “The percentages of neurons coding specific pairs of variables were not significantly different from those expected given the probabilities of neurons coding each individual variable (history and risk: χ² = 1.58, P = 0.2094; value and risk: χ² = 3.54, P = 0.0599; choice and risk: χ² = 0.845, P = 0.358). We also tested for relationships in the coding scheme (measured by signed regression coefficients) among neurons with joint risk and choice coding or joint risk and value coding. […] In addition to the risk-related dynamic coding transitions described above, activity in some DLPFC neurons transitioned from coding risk to coding of spatial variables such as cue position or action choice (Figure 7—figure supplement 1).”
Discussion: “Notably, while some DLPFC neurons jointly coded risk with value or choice in a common coding scheme (indicated by regression coefficients of equal sign), this was not the rule across all neurons with joint coding (Figure 7H). […] An implication for the present study might be that risk signals in DLPFC can support multiple cognitive processes in addition to decision-making, as also suggested by the observed relationship between risk and reaction times (Figure 3—figure supplement 1).”
6) Descriptions of models and equations need to be improved.
6a) The authors introduce two equations (Equations 2 and 7) to define objective and subjective risk, respectively, which are central to this study. However, these two equations can be simplified. For example, Equation 2 would be much easier to understand if it were replaced or supplemented by the much more common expression p(1 − p) for a Bernoulli distribution. Including the reward magnitude in Equation 2 does not seem necessary, and this causes confusion. In addition, in Equation 7, the coefficient β should be outside the square of the difference between reward and OV. If not, this needs to be justified or explained better.
Thank you for pointing out the misplaced β in Equation 7 which we have corrected (now Equation 8). We prefer to keep the reward magnitude term in Equation 2 as the definition of variance in this notation is consistent with previous neuroscientific studies of risk and readily generalizes to situations in which different magnitudes are used. We have included a statement that explains this below the equation: “In our task, reward magnitude on rewarded trials was held constant at 0.7 ml; the definition generalizes to situations with different magnitudes.”
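For the binary rewards used here, the relationship between the two expressions discussed in point 6a is a standard identity: a reward of magnitude m delivered with probability p has variance m²p(1 − p), which reduces to p(1 − p) when m = 1. The Python sketch below computes this variance directly from the two outcomes, with m defaulting to the 0.7 ml stated above; it illustrates the identity and is not a transcription of Equation 2.

```python
def bernoulli_reward_variance(p, magnitude=0.7):
    """Variance of a reward equal to `magnitude` with probability p and 0 otherwise.

    Algebraically equal to magnitude**2 * p * (1 - p); with magnitude = 1 it
    reduces to the Bernoulli form p * (1 - p).
    """
    mean_reward = p * magnitude
    # E[(R - E[R])^2] summed over the two possible outcomes (reward, no reward)
    return p * (magnitude - mean_reward) ** 2 + (1 - p) * (0.0 - mean_reward) ** 2
```

Because m is fixed in this task, the magnitude term only rescales the regressor by a constant (0.49 for m = 0.7 ml); in an ordinary linear regression such a rescaling changes the coefficient but not its t-statistic or P-value.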
6b) The authors have used Equation 2 in their stepwise regression analysis. However, this seems problematic, because it appears to violate the full-rank assumption, given that ValueA + ValueB = ValueL + ValueR and RiskA + RiskB = RiskL + RiskR. Similarly, is it possible to include both TrueProbA and TrueProbB in Equation 3, given that they sum to 1, i.e., TrueProbB = 1 − TrueProbA?
Note that subjective action values and action risks were not simply spatially referenced object values and object risks but were estimated separately, based on object reward histories and action reward histories. Accordingly, the stepwise regression approach was not invalidated by the joint inclusion of these regressors in the starting set. Moreover, the true probabilities used for the analysis were not the base probabilities but the trial-specific probabilities that evolved trial-by-trial from the baseline probabilities according to Equation 1. We have clarified these points in the Materials and methods section: “Note that subjective action values and action risks were not simply spatially referenced object values and object risks but were estimated separately, based on object reward histories and action reward histories.”
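The full-rank concern in point 6b can be checked numerically. The numpy sketch below uses hypothetical random values to contrast two cases: if left/right action values were merely object values re-referenced to space, then ValueL + ValueR = ValueA + ValueB on every trial and the design matrix loses a rank; if action values are estimated separately (here stood in for by independent random numbers, an assumption made only for illustration), the matrix keeps full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200

# Hypothetical object values and a hypothetical left/right position of object A.
value_a = rng.random(n_trials)
value_b = rng.random(n_trials)
a_on_left = rng.random(n_trials) < 0.5

# Case 1: action values are just object values re-referenced to space,
# so value_l + value_r == value_a + value_b on every trial.
value_l = np.where(a_on_left, value_a, value_b)
value_r = np.where(a_on_left, value_b, value_a)
X_remapped = np.column_stack([np.ones(n_trials), value_a, value_b, value_l, value_r])

# Case 2: action values estimated separately; independent random numbers are
# used here purely as stand-ins for values derived from action reward histories.
value_l_sep = rng.random(n_trials)
value_r_sep = rng.random(n_trials)
X_separate = np.column_stack([np.ones(n_trials), value_a, value_b, value_l_sep, value_r_sep])

print(np.linalg.matrix_rank(X_remapped))  # 4: one exact linear dependency among 5 columns
print(np.linalg.matrix_rank(X_separate))  # 5: full column rank
```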
6c) The exact formulation of Equation 8 is a little unclear. The text states that "To account for known choice biases in matching tasks (Lau and Glimcher, 2005), we added to the object-value terms the weighted choice-history using weights derived from Equation 5." Can the authors state explicitly how ObjectValue was calculated?
We have now clarified this section by introducing the new Equation 7 and related new text.
7) Novelty of the task. The authors claim that the animals are not cued and must derive a measure of risk (that would then influence their choices). This is important because the authors claim that previous studies used explicit cues during learning while they do not in this study, and claim this is a key advance. While it is true that the cited papers did not use trial-by-trial information to look at how subjective value and risk are updated on a trial-by-trial basis, their approaches in many other ways seemed pretty similar to this study. In particular, in this study, the key moment is when the reward probabilities associated with the two options change and the animal must figure out the new probabilities by experiencing two types of external cues: choice options and feedback. Broadly, the idea that this is the first paper to look at risk estimates that are independent of cueing is factually wrong and requires revision.
Furthermore, even if it were the case, the impact of this is unclear from the current manuscript. Perhaps the authors are most excited about pre-option-presentation object-risk signals in the context of behavioral control? If so, how would this control take place? Is this an arousal / "let's get ready" signal? Or would this signal serve to bias choice to the risky object in a spatial manner (consistent with previous work on dlPFC)? Or does this risk signal influence SV derivations elsewhere in the brain?
Specifically, are there reaction time correlates of this early "object risk signal" with action after options are on?
Novelty of task: We have toned down the aspect of cue-independence throughout. In the Introduction, we added the following sentence: “Similar to previous studies (cited above), experienced rewards following choices for specific objects or actions constituted external cues for risk estimation.” To acknowledge that rewards and choice options of course constituted critical cues for risk estimation, we also removed the emphasis “without requiring explicit, risk-informative cues.” from the last sentence of the Introduction, and we removed “in the absence of explicit risk information” from the first sentence of the Discussion. In the first paragraph of the Discussion, when referring to explicit cues, we have added the following: “(such as pretrained risk-associated bar stimuli or fractals)”. We have also revised the Abstract accordingly.
Influences on behavior: The Results section ‘Dynamic integration of risk with reward history, value and choice in single neurons’ provides evidence for how DLPFC neurons may integrate risk with choice and value, and we have included new analyses to show that there is also integration with spatially referenced variables in Figure 7—figure supplement 1: “In addition to the risk-related dynamic coding transitions described above, activity in some DLPFC neurons transitioned from coding risk to coding of spatial variables such as cue position or action choice (Figure 7—figure supplement 1).” Moreover, the Discussion covers this topic in several places (sixth and seventh paragraphs, and dedicated Discussion paragraphs covering DLPFC’s contribution to risk and decision-making processes). Taken together, we acknowledge that risk signals in DLPFC may support several functions, in addition to influencing choices, either through local processing or connections to other brain structures.
Reaction time correlates: We have performed a new analysis in which we regress saccadic reaction times on value and risk variables and other factors. These results are shown in Figure 3—figure supplement 1 and mentioned in the Results (subsection “Neuronal coding of subjective risk associated with actions”) and Discussion (sixth paragraph).
8) Problem with nonstationarity. The authors should take into account the fact that activity of prefrontal cortex is often nonstationary and is likely to be serially correlated (autocorrelated) across successive trials. This diminishes the effective degrees of freedom and can inflate the estimated number of neurons encoding signals that are related to events in multiple trials (equivalent to low-pass filtering). The authors should refer to a recent paper in eLife (“Striatal action-value neurons reconsidered”).
We have addressed this issue with control analyses, as described below. We note that in tasks such as the present one, probabilities and associated risk change and reset frequently even within trial blocks; accordingly, related neuronal signals tracking value or risk should be quite distinct from any potential nonstationary activity due to noise, drift or unknown sources. We explored whether potential nonstationarity in neuronal activity could have inflated estimates of risk-coding signals, as risk evolved over trials. We performed two control analyses and include the results as follows:
“Finally, we examined effects of potential nonstationarity of neuronal activity (Elber-Dorozko and Loewenstein, 2018), by including a first-order autoregressive term in Equation 10. […] This analysis identified 56 neurons with activity related to risk (note that the control period itself was excluded from this analysis; our original analysis without the control period yields 81 risk neurons).”
“Controlling for nonstationarity of neuronal responses, we identified 83 action-risk neurons when including a first-order autoregressive term and 56 neurons when subtracting neuronal activity at trial start.”
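The first control, adding a first-order autoregressive term, amounts to including the previous trial's activity as an extra regressor. A minimal numpy sketch, assuming ordinary least squares on trial-wise firing rates; the stepwise selection and full regressor set of Equation 10, as well as the trial-start subtraction control, are not reproduced, and `fit_with_ar1` and the simulation parameters are hypothetical.

```python
import numpy as np

def fit_with_ar1(activity, regressors):
    """OLS of trial-wise activity on task regressors plus the previous trial's
    activity (first-order autoregressive term). The first trial is dropped
    because it has no predecessor. Returns coefficients ordered as
    [intercept, task regressor(s)..., AR(1) term]."""
    y = activity[1:]
    lagged = activity[:-1]
    X = np.column_stack([np.ones(len(y)), regressors[1:], lagged])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated example with hypothetical parameters: activity depends on the
# current trial's risk and on the previous trial's activity.
rng = np.random.default_rng(1)
n_trials = 2000
risk = rng.random(n_trials)
activity = np.zeros(n_trials)
for t in range(1, n_trials):
    activity[t] = 1.0 + 2.0 * risk[t] + 0.5 * activity[t - 1] + rng.normal(0.0, 0.1)

coef = fit_with_ar1(activity, risk)
print(coef)
```

The fitted coefficients come out close to the generating values of 1.0, 2.0 and 0.5. Because the lagged term absorbs slow, serially correlated fluctuations, a risk coefficient that survives its inclusion is less likely to reflect nonstationarity alone.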
9) Dynamics of DLPFC coding. The authors show that subpopulations of neurons carry multiple signals that could integrate various aspects of reward and choice information (Figure 7). Two additional analyses are important to include. First, is the percentage of neurons showing coding of two variables (last reward x choice and object risk, object value and object risk, etc.) different than that expected given the probabilities of neurons representing either variable?
Second, for those neurons that carry multiple signals, is the information coded in a consistent manner? For example, are the neurons that represent both value and risk information (Figure 7C, D) modulated in the same direction by value and risk? Figure 7 plots the timeline of explained variance, but not the actual direction of modulation. One would expect, for example in the case of value and risk, that since the behavioral data suggest that choice is driven by increases in value and risk, neurons integrating that information would represent both in the same manner. A similar argument could be made for risk and choice. A regression-weight-by-regression-weight plot, similar to that used for the angle analyses elsewhere in the paper, would be helpful in understanding how this information is integrated across different variable pairs.
Thank you for suggesting these new analyses which we have now included in Results, Figure 7H, and Discussion:
Results: “The percentages of neurons coding specific pairs of variables were not significantly different from those expected given the probabilities of neurons coding each individual variable (history and risk: χ² = 1.58, P = 0.2094; value and risk: χ² = 3.54, P = 0.0599; choice and risk: χ² = 0.845, P = 0.358). We also tested for relationships in the coding scheme (measured by signed regression coefficients) among neurons with joint risk and choice coding or joint risk and value coding. […] This suggested that while some neurons used corresponding coding schemes for these variables (risk and choice, risk and value), other neurons used opposing coding schemes (see Discussion for further interpretation).”
Discussion: “Notably, while some DLPFC neurons jointly coded risk with value or choice in a common coding scheme (indicated by regression coefficients of equal sign), this was not the rule across all neurons with joint coding (Figure 7H). This result and the observed high degree of joint coding, with most DLPFC neurons dynamically coding several task-related variables (Figure 7I), match well with previous reports that neurons in DLPFC show heterogeneous coding and mixed selectivity (Rigotti et al., 2013; Wallis and Kennerley, 2010).”
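The joint-coding test compares the observed number of neurons coding both variables with the number expected if coding of each variable were independent (N·p_A·p_B). A minimal Python sketch, assuming a 2 × 2 Pearson chi-square test on neuron counts without continuity correction; the authors' exact test settings are not stated here, `joint_coding_chi2` is a hypothetical helper, and any counts passed to it are illustrative.

```python
import math

def joint_coding_chi2(n_both, n_only_a, n_only_b, n_neither):
    """Pearson chi-square (1 df, no continuity correction) on the 2 x 2 table
    of neurons coding variable A and/or variable B.

    Under independence the expected 'both' count is N * p_A * p_B,
    i.e. (n_both + n_only_a) * (n_both + n_only_b) / N.
    Returns (chi2, p)."""
    a, b, c, d = n_both, n_only_a, n_only_b, n_neither
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column of the table needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / margins
    p = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with 1 df
    return chi2, p
```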
[Editors' note: further revisions were requested prior to acceptance, as described below.]
The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:
The authors have largely addressed the concern re: the influence of past choices on subjective value, and Table 1 shows that a model with value (from reward and choice history) and risk performs best. However, the present manuscript is still confusing in that it includes two different measures of subjective object value (Equation 6 and Equation 7).
First, it is confusing that the two equations both use the same term $OV_A$ for the value measure; since they are different definitions, they should have different names. Second, it is not at all clear which measure is used in which analyses. According to the Materials and methods, Equation 6 (value from rewards alone) is used for the calculation of subjective risk (Equation 8), and Equation 7 (value from rewards and choices) is used for behavioral analyses and neural analyses involving value. So were all neural subjective risk analyses performed only with the $OV_A$ measure derived from Equation 6? This seems odd, given that the results of the model comparison suggest that value is a function of reward and choice, and the authors use the reward/choice definition of value for neural analyses – shouldn't the subjective risk measure use deviance from $OV_A$ determined from reward and choice as well?
The paper would read more clearly, and be more conceptually unified, if they use a single measure for $OV_A$ derived from reward and past choices (Equation 8). Note that this is different than the "risk from choice variance" addressed by the authors in their response letter – it is simply risk as variance of rewards from subjective value (calculated from reward and choice).
Thank you for pointing these issues out. We have fully addressed the points by (i) using different names for the two value terms, (ii) clearly stating the purpose and use of each term, (iii) extending our analysis to test the suggested alternative risk definition – the results are shown in a table and indicate that numbers of risk neurons were very similar for the extended risk definition.
In detail, to rectify the first point, we revised the Results section to clarify for what purposes these definitions were used and, in the Materials and methods, we now use distinct terms for these value definitions. We also clarify that Equation 7 was our main value measure used for behavioral and neuronal analyses whereas Equation 6 was used for comparisons to risk in Figure 3C. These changes to the text are shown below.
With respect to the second point, our neural subjective risk analyses were performed with the risk measure in Equation 8, based on the sum of the weighted, squared deviations from the mean of the object-specific reward distribution. We prefer this risk definition because it is simple (no additional assumptions about how choice history might be incorporated and weighted), directly interpretable (as deviation of reward from the mean of the reward distribution), and follows previous neuronal studies (which tested risk as variance of a reward distribution). We note that it is partly a conceptual question of whether choice history should be incorporated into neuronal measures of value or risk, or considered as a separate behavioral influence (e.g. Lau and Glimcher, 2008, modelled choice history for behavior but based their neuronal value measure on reward history without choice history).
Nevertheless, we appreciate that other, more elaborate risk definitions are possible and of interest and therefore include the following new analyses. We added a table (Figure 4—figure supplement 4) to compare the numbers of risk neurons obtained with different risk definitions, including the one suggested by the reviewer (incorporating reward and choice history). These alternative definitions yielded identical or only slightly higher numbers of risk neurons compared to our main risk definition (< 5% variation in identified neurons). We therefore focus on our main risk definition (Equation 8), which is simpler and conservative as it makes fewer assumptions.
Revised Results text: “We followed previous studies of matching behavior (Lau and Glimcher, 2005) that distinguished two influences on value: the history of recent rewards and the history of recent choices. […] Thus, we estimated object value based on both subjectively weighted reward history and subjectively weighted choice history (Equation 7); this constituted our main value measure for behavioral and neuronal analyses.”
Revised Materials and methods text: “We followed previous studies of matching behavior (Lau and Glimcher, 2005) that distinguished two influences on value: the history of recent rewards and the history of recent choices. The first object-value component related to reward history, $OV_A^{r}$, can be estimated by the mean of subjectively weighted reward history over the past ten trials (Equation 6):…”
“In tasks used to study matching behavior, such as the present one, it has been shown that choice history has an additional influence on behavior and that this influence can be estimated using logistic regression (Lau and Glimcher, 2005). To account for this second object-value component related to choice history, we estimated a subjective measure of object value that incorporated both a dependence on weighted reward history and a dependence on weighted choice history (Equation 7):…”
New Results text: “We also considered alternative, more complex definitions of subjective risk that incorporated either weighted reward history or both weighted reward and choice history in the risk calculation. […] We therefore focused on our main risk definition (Equation 8), which was simpler and more conservative as it incorporated fewer assumptions.”
New Materials and methods text: “Alternative, more complex definitions of subjective risk in our task could incorporate the weighting of past trials in the calculation of the mean reward (the subtrahend in the numerator of Equation 8) or incorporate both weighted reward history and weighted choice history in this calculation. We explore these possibilities in a supplementary analysis (Figure 4—figure supplement 4).”
The description in the Materials and methods (subsection “Final calculation of subjective risk”) is incorrect given the revised Equation 8, as it implies that β weights are applied to rewards rather than to the squared deviations of rewards from $OV_A$ (as described by the revised equation). For example, the "$\beta_j^{r} R_A$" term is not in the equation, and "the summed, squared deviation of subjectively weighted reward amounts from the mean weighted value of past rewards)" does not match Equation 8.
Thank you for pointing this out. We have revised the section as follows:
Revised Materials and methods text: “We used the following definition as our main measure of subjective object risk (Equation 8):
$$\mathrm{var}_A = \frac{\sum_{j=1}^{N} \beta_j^{r} \left( R_{A,i-j} - \left(\sum_{j=1}^{N} R_{A,i-j}\right)/N \right)^{2}}{N-1}$$
with $\beta_j^{r}$ representing the weighting coefficients for past rewards (derived from Equation 5), $R_A$ as reward delivery after choice of object A, $j$ as index for past trials relative to the current $i$th trial, and $N$ as the number of past trials included in the model ($N = 10$); the term $\left(\sum_{j=1}^{N} R_{A,i-j}\right)/N$ represents the mean reward over the last ten trials. Thus, the equation derives subjective object risk from the summed, subjectively weighted, squared deviation of reward amounts in the last ten trials from the mean reward over the last ten trials.”
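Equation 8 can be sketched in a few lines of Python. The sketch assumes that the caller supplies the last N = 10 values of R_A, most recent first, together with one weight per lag; how trials on which object A was not chosen enter this series is not specified in the text above and is left to the caller. The weights in the usage line are hypothetical placeholders, since in the study they are derived from Equation 5, and the alternative definitions with a weighted mean are not implemented.

```python
def subjective_object_risk(past_rewards, beta):
    """Weighted reward variance following Equation 8.

    past_rewards[j - 1] is R_A on trial i - j (j = 1..N, most recent first);
    beta[j - 1] is the reward-history weight beta_j^r for lag j.
    """
    n = len(past_rewards)
    if n != len(beta) or n < 2:
        raise ValueError("need equal-length inputs with N >= 2")
    mean_reward = sum(past_rewards) / n  # unweighted mean over the last N trials
    # Weights multiply the squared deviations, not the rewards themselves.
    weighted_sq_dev = sum(b * (r - mean_reward) ** 2 for r, b in zip(past_rewards, beta))
    return weighted_sq_dev / (n - 1)

# Hypothetical usage with an illustrative, exponentially decaying kernel.
hypothetical_beta = [0.5 ** (j - 1) for j in range(1, 11)]
rewards = [0.7, 0.0, 0.7, 0.7, 0.0, 0.0, 0.7, 0.0, 0.7, 0.0]  # ml, most recent first
risk_a = subjective_object_risk(rewards, hypothetical_beta)
```

With all weights set to 1, the expression reduces to the ordinary unbiased sample variance of the last N rewards.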
https://doi.org/10.7554/eLife.44838.033
Article and author information
Author details
Funding
Wellcome Trust (Principal Research Fellowship)
 Wolfram Schultz
Wellcome Trust (Programme Grant 095495)
 Wolfram Schultz
Wellcome Trust (Sir Henry Dale Fellowship 206207/Z/17/Z)
 Fabian Grabenhorst
European Research Council (Advanced Grant 293549)
 Wolfram Schultz
National Institutes of Health (Caltech Conte Center P50MH094258)
 Wolfram Schultz
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This work was supported by the Wellcome Trust (Principal Research Fellowship and Programme Grant 095495 to WS; Sir Henry Dale Fellowship 206207/Z/17/Z to FG), the European Research Council (ERC Advanced Grant 293549 to WS), and the National Institutes of Health (NIH) Caltech Conte Center (P50MH094258).
Ethics
Animal experimentation: All animal procedures conformed to US National Institutes of Health Guidelines and were approved by the Home Office of the United Kingdom (Home Office Project Licenses PPL 80/2416, PPL 70/8295, PPL 80/1958, PPL 80/1513). The work has been regulated, ethically reviewed and supervised by the following UK and University of Cambridge (UCam) institutions and individuals: UK Home Office, implementing the Animals (Scientific Procedures) Act 1986, Amendment Regulations 2012, and represented by the local UK Home Office Inspector; UK Animals in Science Committee; UCam Animal Welfare and Ethical Review Body (AWERB); UK National Centre for Replacement, Refinement and Reduction of Animal Experiments (NC3Rs); UCam Biomedical Service (UBS) Certificate Holder; UCam Welfare Officer; UCam Governance and Strategy Committee; UCam Named Veterinary Surgeon (NVS); UCam Named Animal Care and Welfare Officer (NACWO).
Senior Editor
 Richard B Ivry, University of California, Berkeley, United States
Reviewing Editor
 Daeyeol Lee, Yale School of Medicine, United States
Reviewer
 Kenway Louie, New York University, United States
Publication history
 Received: January 7, 2019
 Accepted: July 12, 2019
 Version of Record published: July 25, 2019 (version 1)
Copyright
© 2019, Grabenhorst et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.