Neuronal origins of reduced accuracy and biases in economic choices under sequential offers

Economic choices are characterized by a variety of biases. Understanding their origins is a long-term goal for neuroeconomics, but progress on this front has been limited. Here, we examined choice biases observed when two goods are offered sequentially. In the experiments, rhesus monkeys chose between different juices offered simultaneously or in sequence. Choices under sequential offers were less accurate (higher variability). They were also biased in favor of the second offer (order bias) and in favor of the preferred juice (preference bias). Analysis of neuronal activity recorded in the orbitofrontal cortex revealed that these phenomena emerged at different computational stages. Lower choice accuracy reflected weaker offer value signals (valuation stage), the order bias emerged during value comparison (decision stage), and the preference bias emerged late in the trial (post-comparison). By neuronal measures, each phenomenon reduced the value obtained on average in each trial and was thus costly to the monkey.


Introduction
Some of the most mysterious aspects of economic behavior are choice biases documented in behavioral economics (Camerer et al., 2003; Kahneman and Tversky, 2000; Lichtenstein and Slovic, 2006). Standard economic theory fails to account for these effects, and shedding light on their origins is a long-term goal for neuroeconomics (Camerer et al., 2005; Glimcher and Rustichini, 2004). Progress on this front has been relatively modest, largely because the neural underpinnings of (even simple) choices were poorly understood until recently. However, the last [...]

First, the identification in OFC and other brain regions of distinct groups of neurons encoding different decision variables is essential to ultimately understand the neural circuit and the mechanisms through which economic decisions are formed.

Second, in a more conceptual sense, the results summarized above provide a long-sought validation for the construct of value. The proposal that choices entail computing and comparing subjective values was put forth by early economists such as Bernoulli and Bentham (Niehans, 1990). Although this idea has remained influential, values defined at the behavioral level suffer from a fundamental problem of circularity. On the one hand, choices supposedly maximize values; on the other hand, values cannot be measured behaviorally independent of choices (Samuelson, 1938). Because of this problem, the construct of value gradually lost centrality in economic theory. Thus in the standard neoclassic formulation choices are "as if" driven by values, but there is no commitment to the idea that agents actually compute values (Samuelson, 1947). In this perspective, the fact that neuronal firing rates in any brain region are linearly related to values defined at the behavioral level constitutes powerful evidence that choices indeed entail the computation of values (Camerer, 2008).
Third and less frequently discussed, the identification of neurons encoding offer values and other decision variables, together with some rudimentary understanding of the decision circuit, provides the opportunity to break the circularity problem described above. To appreciate this point, consider the fact that economic choices are often affected by seemingly idiosyncratic biases. For example, while choosing between two options offered sequentially, people and monkeys typically show a bias favoring the second option (Ballesta and Padoa-Schioppa, 2019; Krajbich et al., 2010; Rustichini et al., 2021). This order bias might occur for at least two reasons. (1) Subjects might assign a higher value to any given good if that good is offered second. (2) Alternatively, subjects might assign identical values independent of the presentation order, and the bias might emerge downstream of valuation, for example during value comparison. In the latter scenario, by introducing the order bias, the decision process would actually fail to maximize the value obtained by the agent. Due to the circularity problem described above, these two hypotheses are ultimately not distinguishable based on behavior alone. However, access to a credible neural measure for the offer values makes it possible, at least in principle, to disambiguate between them. The results presented in this study build on this fundamental idea.

We focused on choice biases measured when two goods are offered sequentially. In the experiments, monkeys chose between two juices offered in variable amounts. In each session, we randomly interleaved two types of trials referred to as two tasks. In Task 1, offers were presented simultaneously; in Task 2, offers were presented in sequence. Comparing choices across tasks revealed three phenomena. (1) Choices in Task 2 were less accurate than in Task 1 (higher variability). (2) Choices in Task 2 were biased in favor of the second offer (order bias). (3) Choices in Task 2 were biased in favor of the preferred juice (preference bias). These effects are especially interesting because in most daily situations offers available for choice appear or are examined sequentially. Thus we investigated the neuronal origins of these phenomena.

[...] values are compared. The output of this circuit feeds brain regions involved in working memory and the construction of action plans (Fig.1).

This framework guided a series of analyses relating the activity of each cell group to the choice biases described above. Our results revealed that different phenomena emerged at different computational stages. The lower choice accuracy observed under sequential offers reflected weaker offer value signals (valuation stage). Conversely, the order bias did not have neural correlates at the valuation stage, but rather emerged during value comparison (decision stage). Finally, the preference bias did not have neural correlates at the valuation stage or during value comparison; it emerged late in the trial, shortly before the motor response.

Reduced accuracy and biases in choices under sequential offers
Two monkeys participated in the experiments. In each session, they chose between two juices labeled A and B, with A preferred. Offers were represented by sets of colored squares on a monitor, and animals indicated their choice with a saccade. In each session, two choice tasks were randomly interleaved. In Task 1, offers were presented simultaneously (Fig.2A); in Task 2, offers were presented in sequence (Fig.2B). A cue displayed at the beginning of the trial revealed to the animal the task for that trial. Offers varied from trial to trial, and we indicate the quantities offered in any given trial with $q_A$ and $q_B$. An "offer type" was defined by two quantities $[q_A, q_B]$, and the same offer types were used for the two tasks in each session. For Task 2, trials in which juice A was offered first and trials in which juice B was offered first are referred to as "AB trials" and "BA trials", respectively. The first and second offers are referred to as "offer1" and "offer2", respectively.

The data set included 241 sessions (101 from monkey J, 140 from monkey G; see Methods).

Sessions lasted for 217-880 trials (mean ± std = 589 ± 160). For each session, we analyzed choices in the two tasks separately using probit regressions. For Task 1 (simultaneous offers), we used the following model:

$$\text{choice B} = \Phi(X), \qquad X = a_0 + a_1 \log(q_B/q_A) \tag{1}$$

where choice B = 1 if the animal chose juice B and 0 otherwise, $\Phi$ was the cumulative function of the standard normal distribution, and $q_A$ and $q_B$ were the quantities of juices offered on any given trial. From the fitted parameters $a_0$ and $a_1$, we derived measures for the relative value of the two juices $\rho_{\text{Task1}} = \exp(-a_0/a_1)$ and for the sigmoid steepness $\eta_{\text{Task1}} = a_1$. Intuitively, the relative value was the quantity ratio $q_B/q_A$ that made the animal indifferent between the two juices, and the sigmoid steepness was inversely related to choice variability.

For Task 2 (sequential offers), we used the following model:

$$\text{choice B} = \Phi(X), \qquad X = a_2 + a_3 \log(q_B/q_A) + a_4 (\delta_{\text{order,AB}} - \delta_{\text{order,BA}}) \tag{2}$$

where $\delta_{\text{order,AB}} = 1$ for AB trials and 0 otherwise, and $\delta_{\text{order,BA}} = 1 - \delta_{\text{order,AB}}$. In essence, AB trials and BA trials were analyzed separately but assuming that the two sigmoids were parallel. From the fitted parameters $a_2$, $a_3$ and $a_4$, we derived measures for the relative value of the two juices $\rho_{\text{Task2}} = \exp(-a_2/a_3)$, for the sigmoid steepness $\eta_{\text{Task2}} = a_3$, and for the order bias $\varepsilon = 2 \rho_{\text{Task2}}\, a_4/a_3$. Intuitively, the order bias was a bias favoring the first or the second offer. Specifically, $\varepsilon < 0$ indicated a bias favoring offer1; $\varepsilon > 0$ indicated a bias favoring offer2. We also defined relative values specific to AB trials and BA trials as $\rho_{AB} = \exp(-(a_2 + a_4)/a_3)$ and $\rho_{BA} = \exp(-(a_2 - a_4)/a_3)$. Of note, the order bias was defined such that

$$\varepsilon \approx \rho_{BA} - \rho_{AB} \tag{3}$$

The experimental design gave us the opportunity to compare choices across tasks independently of factors such as selective satiation or changes in the internal state.
The relative values measured in the two tasks were highly correlated (Fig.2EF). At the same time, our analyses revealed three interesting phenomena. First, for both animals, sigmoids measured in Task 2 were significantly shallower compared to Task 1 (Fig.2GH). In other words, presenting offers in sequence reduced choice accuracy. Second, in Task 2, both animals showed a consistent order bias favoring offer2 (Fig.2IJ). Third, in both animals, relative values in Task 2 were significantly higher than in Task 1 ($\rho_{\text{Task2}} > \rho_{\text{Task1}}$), and this effect increased with the relative value (Fig.2EF). In other words, the ellipse marking the 90% confidence interval for the joint distribution of relative values lay above the identity line and was rotated counterclockwise compared to the identity line.

To further investigate the differences in relative values measured across tasks, we quantified them separately in AB trials and BA trials in each monkey. We thus examined the relation between $\rho_{\text{Task1}}$ and $\rho_{\text{Task2,AB}}$ and, separately, that between $\rho_{\text{Task1}}$ and $\rho_{\text{Task2,BA}}$ (Fig.3). In both animals and in both sets of trials, the ellipse marking the 90% confidence interval was rotated counterclockwise compared to the identity line. Furthermore, the ellipse measured for BA trials was higher than that for AB trials. We quantified these observations with an analysis of covariance (ANCOVA) using the presentation order (AB, BA) as a covariate and imposing parallel lines (Fig.3C [...] had an additional bias favoring juice A in Task 2, and that this bias increased as a function of the relative value ρ. We refer to this phenomenon as the preference bias.
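The behavioral measures defined through Eq.1 and Eq.2 can be illustrated with a minimal sketch. The code below is not the analysis code used in the study: it assumes the probit coefficients have already been fitted, the function names are hypothetical, and the coefficient values in the usage example are invented for illustration. It evaluates the Task 2 choice probability and derives the relative value, the sigmoid steepness, the order bias and the order-specific relative values.

```python
import math


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal (Phi)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_choose_b_task2(q_a, q_b, a2, a3, a4, is_ab_trial):
    """Eq.2: probability of choosing juice B under sequential offers.

    is_ab_trial is True when juice A was offered first (AB trial),
    so that delta_AB - delta_BA equals +1, and -1 otherwise.
    """
    order_term = 1.0 if is_ab_trial else -1.0
    return std_normal_cdf(a2 + a3 * math.log(q_b / q_a) + a4 * order_term)


def task2_measures(a2, a3, a4):
    """Derive the Task 2 behavioral measures from fitted coefficients."""
    rho = math.exp(-a2 / a3)             # relative value
    eta = a3                             # sigmoid steepness
    eps = 2.0 * rho * a4 / a3            # order bias
    rho_ab = math.exp(-(a2 + a4) / a3)   # relative value in AB trials
    rho_ba = math.exp(-(a2 - a4) / a3)   # relative value in BA trials
    return {"rho": rho, "eta": eta, "eps": eps,
            "rho_AB": rho_ab, "rho_BA": rho_ba}


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only for illustration.
    m = task2_measures(a2=-2.0, a3=3.0, a4=0.15)
    print(m)
    # The order bias is close to rho_BA - rho_AB when a4/a3 is small.
    print(m["rho_BA"] - m["rho_AB"], m["eps"])
```

With these definitions, a positive `a4` yields `eps > 0` (bias favoring offer2), and the animal is indifferent in AB trials when the quantity ratio equals `rho_AB`.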

Computational framework
The following sections present a series of results on the neuronal origins of these behavioral phenomena. We begin by discussing the computational framework for the analyses.

Economic choice is thought to entail two stages: values are assigned to the available offers and a decision is made by comparing values. Importantly, in our tasks and in most circumstances, choices elicit an ensemble of mental operations taking place before, during and after the computation and comparison of offer values. Upstream of valuation, choices examined here entail the sensory processing of visual stimuli and the retrieval from memory of relevant information (e.g., the association between color and juice type). Downstream of value comparison, the decision outcome must guide a suitable motor response. In addition, performance in Task 2 requires holding in working memory the value of offer1 until offer2, remembering the decision outcome for an additional delay, and mapping that outcome onto the appropriate saccade target (Fig.2B). In principle, choice biases could emerge at any of these computational stages. Likewise, each of these mental operations could be noisy and thus contribute to choice variability.

Neuronal activity in OFC does not capture all of these processes. However, previous work indicates that neurons in this area participate both in value computation and value comparison. In the framework proposed here (Fig.1 [...] we noted that offer value signals in Task 2 were significantly weaker than in Task 1. Fig.4AC illustrates one example cell. In both tasks, this neuron encoded the offer value B. However, the activity range (see Methods) measured in Task 2 was smaller than that measured in Task 1.

This effect was also observed at the population level. For this analysis, we pooled offer value cells associated with juices A and B, and with positive or negative encoding (see Methods). For Task 1, we focused on the post-offer time window; for Task 2, we focused on post-offer1 and post-offer2 time windows, pooling trial types from both windows. For each cell, we imposed that the response be significantly tuned in these time windows in each task, and we quantified the mean activity and the activity range (Δr, see Methods). At the population level, the mean activity did not differ significantly across tasks (p = 0.6, t test; p = 0.4, Wilcoxon test; Fig.4D). In contrast, the activity range was significantly lower in Task [...] activity range (Fig.4E). We thus examined the relation between the difference in sigmoid steepness ($\Delta\eta = \eta_{\text{Task2}} - \eta_{\text{Task1}}$) and the difference in activity range ($\Delta\Delta r = \Delta r_{\text{Task2}} - \Delta r_{\text{Task1}}$). The two measures were positively correlated (Spearman r = 0.2, p = 0.01; Pearson r = 0.3, p = 0.003; Fig.4F). In other words, the drop in choice accuracy observed in Task 2 compared to Task 1 correlated with weaker offer value signals. A similar analysis of chosen value cells found that the activity range Δr was reduced in Task 2 compared to Task 1. However, this effect and the drop in choice accuracy were not significantly correlated (Fig.4-figure supplement 1).

In conclusion, the lower choice accuracy measured in Task 2 compared to Task 1 correlated with weaker offer value signals in OFC. Thus this behavioral phenomenon emerged, at least partly, during valuation.
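The correlation between the change in sigmoid steepness and the change in activity range can be sketched as follows. This is an illustrative implementation under stated assumptions, not the study's code: the inputs are assumed to be per-cell lists in which each cell carries the steepness of its recording session, all function names are hypothetical, and p values are not computed. Spearman's coefficient is obtained as the Pearson correlation of average ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def accuracy_vs_signal(eta_task1, eta_task2, dr_task1, dr_task2):
    """Correlate delta-eta with delta-delta-r across cells."""
    d_eta = [e2 - e1 for e1, e2 in zip(eta_task1, eta_task2)]
    dd_r = [r2 - r1 for r1, r2 in zip(dr_task1, dr_task2)]
    return spearman_r(d_eta, dd_r), pearson_r(d_eta, dd_r)
```

A positive coefficient returned by `accuracy_vs_signal` corresponds to the relation reported above: cells with a larger drop in activity range were recorded in sessions with a larger drop in steepness.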

The order bias emerged during value comparison
The next series of analyses focused on the neural origins of the order bias (ε). Since this phenomenon pertains only to choices under sequential offers, we included in the analyses an additional data set recorded in the same animals performing only Task 2 (see Methods).

In the framework of Fig.1 [...] (post-offer2 time window). If the order bias emerged during valuation, the mean activity and/or the activity range should be higher for the latter (Fig.5-figure supplement 1A). Contrary to this prediction, across a population of 128 cells, we did not find any systematic difference in mean activity or activity range (Fig.5-figure supplement 1BC). Furthermore, the difference between the activity parameters measured in OE and EO trials did not correlate with the order bias (Fig.5-figure supplement 1D). In conclusion, assigned values did not depend on the presentation order.

We next examined whether the order bias emerged during value comparison. If so, the bias should be reflected in the activity of both chosen juice and chosen value cells (Fig.1). For chosen value cells, the hypothesis might be tested noting that in post-offer1 and post-offer2 time windows these neurons encoded the value currently offered independently of the juice type ( [...] were correlated with each other (Fig.5B). Most importantly, the difference $\Delta\rho_{\text{neuronal}}$ and the order bias ε were significantly correlated across the population (Spearman r = 0.3, p = 0.007; Pearson r = 0.2, p = 0.02; Fig.5C). Hence, session-to-session fluctuations in the activity of chosen value cells correlated with fluctuations in the order bias.

Further insights on the order bias came from the analysis of chosen juice cells. Again, for each neuron, E and O indicated the juice encoded by the cell and the other juice, respectively.
A previous study found that the baseline activity of chosen juice cells recorded in OE trials immediately before offer2 was negatively correlated with the value of offer1 (i.e., the value of the other juice), a phenomenon termed circuit inhibition (Ballesta and Padoa-Schioppa, 2019). If the decision is conceptualized as the evolution of a dynamic system (Rustichini and Padoa-Schioppa, 2015; Wang, 2002), circuit inhibition sets the system's initial conditions and is thus integral to value comparison. In this account, the evolving decision is essentially captured by the activity of chosen juice cells in OE trials, which reflects a competition between the negative offset set by the value of offer1 (initial condition) and the incoming signal encoding the value of offer2. If so, the intensity of circuit inhibition should be negatively correlated with the order bias.

We tested this prediction as follows. First, we replicated previous findings and confirmed the presence of circuit inhibition in our primary data set (Fig.6A). Second, we focused on a 300 ms time window starting 250 ms before offer2 onset. For each chosen juice cell, we regressed the firing rate against the normalized offer1 value (see Methods). Thus the regression slope $c_1$ quantified circuit inhibition for individual cells. Across a population of 295 chosen juice cells, mean($c_1$) was significantly below 0 ($p = 5 \times 10^{-6}$, t test; $p = 9 \times 10^{-8}$, Wilcoxon test; Fig.6B). Third, we examined the relation between circuit inhibition ($c_1$) and the order bias (ε). Confirming the prediction, the two measures were significantly correlated across the population (Spearman r = 0.1, p = 0.02; Pearson r = 0.1, p = 0.02; Fig.6C). In other words, stronger circuit inhibition (more negative $c_1$) corresponded to a weaker order bias (smaller ε).

In conclusion, the order bias did not originate before or during valuation. Analysis of chosen juice cells and chosen value cells indicated that the order bias emerged during value comparison (decision stage).
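The per-cell measure of circuit inhibition described above can be sketched in a few lines. The code is a simplified illustration, not the study's pipeline: spike times are assumed to be given in seconds, the offer1 values are assumed to be already normalized by the caller (the normalization is defined in Methods and is not reproduced here), and the function names are hypothetical. The slope of an ordinary least-squares fit of firing rate against normalized offer1 value plays the role of $c_1$.

```python
def window_rate(spike_times, offer2_onset, start_offset=-0.25, duration=0.3):
    """Firing rate (spikes/s) in a 300 ms window starting 250 ms before offer2."""
    start = offer2_onset + start_offset
    end = start + duration
    count = sum(1 for t in spike_times if start <= t < end)
    return count / duration


def ols_slope(x, y):
    """Slope of the least-squares line y = c0 + c1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def circuit_inhibition(trials):
    """Estimate c1 for one cell.

    trials: list of (spike_times, offer2_onset, normalized_offer1_value),
    restricted by the caller to OE trials.
    """
    values = [v for _, _, v in trials]
    rates = [window_rate(spikes, onset) for spikes, onset, _ in trials]
    return ols_slope(values, rates)
```

A negative slope returned by `circuit_inhibition` indicates that the cell's activity before offer2 decreased as the value of offer1 increased.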

The preference bias emerged late in the trial (post-comparison)
When offers were presented sequentially (Task 2), both monkeys showed an additional preference bias that favored juice A and was more pronounced when the relative value of the two juices was larger (Fig.3). Our last series of analyses focused on the origins of this bias.

First, we inquired whether the preference bias emerged during valuation. If this were the case, one or both of the following should be true: (a) offer value A cells encoded higher values in Task 2 than in Task 1 and/or (b) offer value B cells encoded lower values in Task 2 than in Task 1. Furthermore, these putative effects should increase as a function of the relative value. To test these predictions, we examined the tuning functions of offer value cells. For each cell group (offer value A, offer value B), we pooled neurons with positive and negative encoding. For Task 1, we focused on the post-offer time window; for Task 2, we focused on post-offer1 and post-offer2 time windows, pooling trial types from both windows. Indicating with $b_0$ and $b_1$ the tuning intercept and tuning slope (see Methods, Eq.8), we computed the difference in intercept $\Delta b_0 = b_{0,\text{Task2}} - b_{0,\text{Task1}}$ and the difference in slope $\Delta b_1 = b_{1,\text{Task2}} - b_{1,\text{Task1}}$ for each cell. We then examined the relation between these measures and the relative value ρ across the population, separately for each cell group. Contrary to the prediction, we did not find any correlation between neuronal measures ($\Delta b_0$, $\Delta b_1$) and the behavioral measure (ρ) for either offer value A or offer value B cells (Fig.7-figure supplement 1). Thus the preference bias did not seem to emerge at the valuation stage.

We next examined chosen value cells. As discussed above, their activity provided a neuronal measure for the relative value ($\rho_{\text{neuronal}}$), which reflected the internal subjective values of the juices emerging during value comparison.
In principle, $\rho_{\text{neuronal}}$ might differ from the relative value derived from choices through the probit regression ($\rho_{\text{behavioral}}$) because choices might be affected by systematic biases originating downstream of value comparison (Fig.1). In the light of this consideration, we examined the relation between the neuronal measure of relative value in Task 2 ($\rho^{\text{neuronal}}_{\text{Task2}}$, see Methods) and the behavioral measures obtained in the two tasks ($\rho^{\text{behavioral}}_{\text{Task1}}$, $\rho^{\text{behavioral}}_{\text{Task2}}$). We envisioned two possible scenarios (Fig.7A). In scenario 1, the preference bias reflected a difference in values across tasks. In other words, the subjective values of the juices in the two tasks were different and such that the relative value of juice A was higher in Task 2 than in Task 1. If so, $\rho^{\text{neuronal}}_{\text{Task2}}$ should be statistically indistinguishable from $\rho^{\text{behavioral}}_{\text{Task2}}$ and systematically larger than $\rho^{\text{behavioral}}_{\text{Task1}}$. In scenario 2, the subjective values of the juices were the same in both tasks and the preference bias reflected some neuronal process taking place downstream of value comparison. If so, $\rho^{\text{neuronal}}_{\text{Task2}}$ should be statistically indistinguishable from $\rho^{\text{behavioral}}_{\text{Task1}}$ and systematically smaller than $\rho^{\text{behavioral}}_{\text{Task2}}$.

The results of our analysis clearly conformed with scenario 2 (Fig.7B [...] in Task 2 was equal to that inferred from choices in Task 1, and significantly different from that inferred from choices in Task 2. This fact implies that the preference bias was costly for the monkey, as it reduced the value obtained on average at the end of each trial (see Discussion).

In summary, the preference bias did not reflect differences in the values assigned to individual offers (offer values). Furthermore, insofar as the activity of chosen value cells reflects the decision process (Fig.1), the preference bias did not seem to emerge during value comparison.
So how can one make sense of this behavioral phenomenon? At the cognitive level, the preference bias might be interpreted as due to the higher demands of Task 2. When the two saccade targets appeared on the monitor, information about values was no longer on display (Fig.2B). If at that point the animal had not finalized its decision, or if it had failed to retain in working memory the decision outcome, the animal might have selected the target associated with the better juice (juice A). Such bias would have been especially strong when the value difference between the two juices was large. In this view, the preference bias would reflect a "second thought" occurring after value comparison, in some trials.

To test this intuition, we turned to the activity of chosen juice cells. As noted above, in Task 2, the evolving decision was captured by the activity of these neurons recorded in OE trials immediately before and after offer2 onset (Fig.8A). More specifically, the state of the ongoing decision was captured by the distance between the two traces corresponding to the two possible choice outcomes (E chosen, O chosen). For any neuron, we quantified this distance with an ROC analysis, which provided a choice probability (CP). In essence, CP can be interpreted as the probability with which an ideal observer may guess the eventual choice outcome based on the activity of the cell. For each chosen juice cell, we computed the CP at different times in the trial. Across the population, mean(CP) exceeded chance level starting shortly before offer2, consistent with the above discussion on circuit inhibition. We then proceeded to investigate the origins of the preference bias.

We reasoned that, net of noise in measurements and cell-to-cell variability, CPs ultimately quantify the animal's commitment to the eventual choice outcome.
If the preference bias emerged late in the trial (perhaps after target presentation, if animals had not already finalized their decision), the intensity of the preference bias should be inversely related to the animals' commitment to the eventual choice outcome measured earlier in the trial. In other words, there should be a negative correlation between the preference bias and CPs computed at the time when decisions normally take place (shortly before or after offer2 onset). Our analyses supported this prediction. To quantify the preference bias intensity independent of the juice pair, we defined the preference bias index $\text{PBI} = 2 (\rho_{\text{Task2}} - \rho_{\text{Task1}}) / (\rho_{\text{Task2}} + \rho_{\text{Task1}})$. We then focused on four 250 ms time windows before offer1 (control window), before and after offer2 onset, and before juice delivery (Fig.8B-E). Confirming our predictions, CP and PBI were significantly anti-correlated immediately before and during offer2 presentation, but not in the control time window or late in the trial (Fig.8F-I).

In conclusion, our results indicated that the preference bias did not emerge during valuation or during value comparison. Conversely, our results suggest that the preference bias emerged late in the trial, as a "second thought" process that guided choices when decisions were not finalized based on offer values alone.
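The two quantities related in this analysis, the preference bias index and the choice probability, can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the function names are hypothetical, and CP is computed as the area under the ROC curve through its equivalence with the probability that a firing rate drawn from "E chosen" trials exceeds one drawn from "O chosen" trials (ties counted as one half).

```python
def preference_bias_index(rho_task1, rho_task2):
    """PBI = 2 (rho_Task2 - rho_Task1) / (rho_Task2 + rho_Task1)."""
    return 2.0 * (rho_task2 - rho_task1) / (rho_task2 + rho_task1)


def choice_probability(rates_e_chosen, rates_o_chosen):
    """Area under the ROC curve separating the two choice outcomes.

    Computed by comparing every pair of trials: the fraction of pairs in
    which the rate on an 'E chosen' trial exceeds the rate on an
    'O chosen' trial, with ties contributing one half.
    """
    wins = 0.0
    for e in rates_e_chosen:
        for o in rates_o_chosen:
            if e > o:
                wins += 1.0
            elif e == o:
                wins += 0.5
    return wins / (len(rates_e_chosen) * len(rates_o_chosen))
```

A value of 0.5 returned by `choice_probability` corresponds to chance level; in this sketch, values above 0.5 indicate higher activity when the encoded juice was eventually chosen.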

Behavioral values, neuronal values and the origins of choice biases
Early economists proposed that choices between goods entail the computation and comparison of subjective values. However, the concept of value is somewhat slippery, because values relevant to choices cannot be measured behaviorally other than from choices themselves. This circularity problem haunted generations of scholars, dominating academic debates in the 19th and 20th centuries. In the end, neoclassic economic theory came to reject (cardinal) values and to rely only on (ordinal) preferences (Niehans, 1990; Samuelson, 1947 [...] quantifying the degree to which the decision was finalized when offer values are "normally" compared (i.e., upon presentation of the second offer). These findings indicate that the preference bias emerged late in the trial. As a caveat, the hypothesis discussed above, linking different cell groups in OFC to specific decision stages, awaits further confirmation.

Two of our findings are particularly relevant to the distinction between behavioral values and neuronal values. First, the activity of offer value cells did not present any difference associated with the presentation order or with the juice preference. Second, relative values derived from chosen value cells under sequential offers differed significantly from behavioral measures obtained in the same task, and were indistinguishable from behavioral measures obtained in the other task (simultaneous offers). Thus the order bias and the preference bias highlighted significant differences between neuronal and behavioral measures of value. These observations imply that the order bias and the preference bias emerged downstream of valuation. Importantly, they also imply that the two choice biases imposed a cost on the animals, in the sense that they reduced the (neuronal) value obtained on average in any given trial. Notably, it would be impossible to draw such a conclusion based on choices alone.
[...] juices were independent of the choice task, and independent of the presentation order in Task 2. Thus scenario (b) held true with respect to both the order bias and the preference bias. Consequently, both biases were detrimental to the animals.

With respect to the preference bias, one question is whether the bias affected choices in Task 1 or in Task 2 (in principle, there could be a bias favoring the unpreferred juice in Task 1). The fact that $\rho^{\text{behavioral}}_{\text{Task1}}$, $\rho^{\text{neuronal}}_{\text{Task1}}$, and $\rho^{\text{neuronal}}_{\text{Task2}}$ were all indistinguishable from each other while $\rho^{\text{behavioral}}_{\text{Task2}}$ differed from them (Fig.7) argues for the latter understanding.

It is interesting to speculate whether the choice biases documented here might benefit the animal in some more general sense. For example, one might wonder whether the cost imposed by the preference bias was lower than the metabolic cost the monkey would have incurred to increase its performance level and avoid that bias. If so, the preference bias would be, in fact, ecologically adaptive. Addressing this question would require quantifying the metabolic cost of increasing performance in the same value units used for the juices, a challenge open for future studies. However, independent of that assessment, our present results indicate that the putative metabolic cost of increasing performance in the task did not explicitly enter the decision process. If metabolic cost affected behavior, it did so in a meta-decision sense.

[...] That sigmoids were shallower in Task 2 means that the average payoff was lower in that task, a detriment to the animal. Again, it is interesting to speculate whether weaker offer value signals recorded in Task 2 might also benefit the animal in some way, perhaps by reducing cognitive or metabolic costs. This question remains open for future studies.
Importantly, such costs did not explicitly enter the decision process; if they affected behavior, they did so in a meta-decision sense.

Conclusions
The past two decades have witnessed a lively interest in the neural underpinnings of choice behavior. In this effort, a significant breakthrough came from the adoption of behavioral [...]

In each session, the animal chose between two juices labeled A and B (A preferred) offered in variable amounts. Trials with two choice tasks, referred to as Task 1 and Task 2, were pseudo-randomly interleaved. In both tasks, offers were represented by sets of colored squares displayed on the monitor. For each offer, the color indicated the juice type and the number of squares indicated the quantity. Each trial began with the animal fixating a large dot. After 0.5 s, the initial fixation point changed to a small dot or a small cross; the new fixation point cued the animal to the choice task used in that trial. In Task 1 (Fig.2A), cue fixation (0.5 s) was followed by the simultaneous presentation of the two offers. After a randomly variable delay (1-1.5 s), the center fixation point disappeared and two saccade targets appeared near the offers (go signal). The animal indicated its choice with an eye movement. It maintained peripheral fixation for 0.75 s, after which the chosen juice was delivered. In Task 2 (Fig.2B), cue fixation (0.5 s) was followed by the presentation of one offer (0.5 s), an inter-offer delay (0.5 s), presentation of the other offer (0.5 s), and a wait period (0.5 s). Two colored saccade targets then appeared on the two sides of the fixation point. After a randomly variable delay (0.5-1 s), the center fixation point disappeared (go signal). The animal indicated its choice with a saccade and maintained peripheral fixation for 0.75 s, after which the chosen juice was delivered. Central and peripheral fixation were imposed within 4-6 and 5-7 degrees of visual angle, respectively.
Aside from the initial cue, the choice tasks were nearly identical to those used in previous studies (Ballesta and Padoa-Schioppa, 2019; Padoa-Schioppa and Assad, 2006).

For any given trial, $q_A$ and $q_B$ indicate the quantities of juices A and B offered to the animal, respectively. An "offer type" was defined by two quantities $[q_A, q_B]$. On any given session, we used the same juices and the same sets of offer types for the two tasks. For Task 1, the spatial configuration of the offers varied randomly from trial to trial. For Task 2, the presentation order varied pseudo-randomly and was counterbalanced across trials for any offer type. The terms "offer1" and "offer2" indicated, respectively, the first and second offer, independently of the juice type and amount. Trials in which juice A was offered first and trials in which juice B was offered first were referred to as "AB trials" and "BA trials", respectively. The spatial location (left/right) of saccade targets varied randomly. The juice volume corresponding to one square (quantum) was set equal for the two choice tasks and remained constant within each session. It varied across sessions between 70 and 100 μl for both monkeys. The association between the initial cue (small dot, small cross) and the choice task varied across sessions in blocks. Across sessions, we used 12 different juices (and colors) and 45 different juice pairs. Based on a power analysis, in most sessions the number of trials for Task 2 was set equal to 1.5 times that for Task 1.

Neuronal recordings were guided by structural MRI scans (1 mm sections) obtained before and after surgery and targeted area 13m (Ongur and Price, 2000). We recorded from both hemispheres in both monkeys. Tungsten single electrodes (100 µm shank diameter; FHC) were advanced remotely using a custom-built motorized micro-drive.
Typically, one motor advanced two electrodes placed 1 mm apart, and 1-2 such pairs of electrodes were advanced unilaterally or bilaterally in each session. Neural signals were amplified (gain: 10,000), band-pass filtered (300 Hz-6 kHz; Lynx 8, Neuralynx), digitized (frequency: 40 kHz) and saved to disk (Power 1401, Cambridge Electronic Design). Spike sorting was performed off-line (Spike2, v6, Cambridge Electronic Design). Only cells that appeared well isolated and stable throughout the session were included in the analysis.

Behavioral analyses
In each session, choice patterns were analyzed using probit regressions as described in the main text (Eq.1 and Eq.2). For convenience, we repeat here the equation only for Task 1:

$$\text{choice B} = \Phi(X), \qquad X = a_0 + a_1 \log(q_B/q_A) \tag{4}$$
Here Φ indicates the cumulative distribution function of the standard normal distribution. This model is referred to as the "log value ratio" model. For Task 1 (simultaneous offers), the probit fit provided measures for the relative value $\rho_{\mathrm{Task1}}$ and the sigmoid steepness $\eta_{\mathrm{Task1}}$. For Task 2 (sequential offers), the probit fit provided measures for the relative value $\rho_{\mathrm{Task2}}$, the sigmoid steepness $\eta_{\mathrm{Task2}}$ and the order bias ε. Subsequent analyses of neuronal activity relied on these behavioral measures.

To test the robustness of our findings, we conducted a series of control analyses. First, we fitted a probit using a "value difference" model, defined as follows:

$$X = a_0\, q_A + a_1\, q_B$$
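Second, we fitted a logit using a log value ratio model:

$$\text{choice B} = \frac{1}{1 + e^{-X}}, \qquad X = a_0 + a_1 \log(q_B/q_A)$$

Third, we fitted a logit using a value difference model:

$$\text{choice B} = \frac{1}{1 + e^{-X}}, \qquad X = a_0\, q_A + a_1\, q_B$$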
Each of these fits provided measures for each of the parameters characterizing choices in the two tasks ($\rho_{\mathrm{Task1}}$, $\rho_{\mathrm{Task2}}$, etc.). For each session and for each model we obtained an $R^2$. We then compared different models by computing the distribution of BIC across sessions for each pair of models. We generally found that log value ratio models provided a better fit compared to value difference models, consistent with theoretical considerations (Padoa-Schioppa, 2022). We also found that logit models provided a better fit compared to probit models, although measures of relative value, sigmoid steepness and order bias were very similar and highly correlated. For consistency with previous studies, we report the results of neuronal analyses based on neuronal measures derived from Eqs.1-2. However, all our results held true using measures derived from logit regressions.

Notably, Eq.2 describes two parallel sigmoids. In a control analysis, we relaxed this assumption and fitted choices in AB and BA trials with two independent sigmoids. Analyzing neuronal activity based on measures derived from this analysis did not substantially alter any of the results.

Finally, we defined the order bias as $\varepsilon = 2 \rho_{\mathrm{Task2}}\, a_4 / a_3$. This definition is particularly convenient for the present analyses because ε equals the difference $\rho_{BA} - \rho_{AB}$ (Eq.3). Alternative and valid definitions include $\varepsilon = a_4$ and $\varepsilon = a_4 / a_3$. Control analyses showed that using these definitions did not substantially alter any of the results.

Task 1, we defined four time windows: post-offer (0.5 s after offer onset), late-delay (0.5-1 s after offer onset), pre-juice (0.5 s before juice onset) and post-juice (0.5 s after juice onset). A "trial type" was defined by two offered quantities and a choice. For Task 2, we defined three time windows: post-offer1 (0.5 s after offer1 onset), post-offer2 (0.5 s after offer2 onset) and post-juice (0.5 s after juice onset). A "trial type" was defined by two offered quantities, their order and a choice. For each task, each trial type and each time window, we averaged spike counts across trials. A "neuronal response" was defined as the firing rate of one cell in one time window as a function of the trial type. Neuronal responses in each task were submitted to an ANOVA (factor: trial type). Neurons passing the p<0.01 criterion in ≥1 time window in either task were identified as "task-related" and included in subsequent analyses.

significantly from zero (p<0.05), the variable was said to "explain" the response. In this case, we set the signed $R^2$ as $sR^2 = \mathrm{sign}(b_1)\, R^2$; if the variable did not explain the response, we set $sR^2 = 0$. After repeating the operation for each time window, we computed for each cell the $\mathrm{sum}(sR^2)$ across time windows. Neurons explained by at least one variable in one time window, such that $\mathrm{sum}(sR^2) \neq 0$, were said to be tuned; other neurons were labeled "untuned". Tuned cells were assigned to the variable and sign providing the maximum $|\mathrm{sum}(sR^2)|$, where |·| indicates the absolute value. Thus indicating with "+" and "-" the sign of the encoding, each neuron was classified in one of 9 groups:

neurons presented one of 8 patterns referred to as "sequences". Classification proceeded as follows.
For each cell and each time window, we regressed the neuronal response against each of the variables predicted by each sequence. If the regression slope $b_1$ differed significantly from zero (p<0.05), the variable was said to explain the response and we set the signed $R^2$ as $sR^2 = \mathrm{sign}(b_1)\, R^2$; if the variable did not explain the response, we set $sR^2 = 0$. After repeating the operation for each time window, we computed for each cell the $\mathrm{sum}(sR^2)$ across time windows for each of the 8 sequences. Neurons such that $\mathrm{sum}(sR^2) \neq 0$ for at least one sequence were said to be tuned; other neurons were untuned. Tuned cells were assigned to the sequence that provided the maximum $|\mathrm{sum}(sR^2)|$. As a result, each neuron was classified in one of 9 groups: seq #1, seq #2, seq #3, seq #4, seq #5, seq #6, seq #7, seq #8 and untuned (Table 1).

The results of the two classifications were compared using analyses for categorical data. In essence, we found a strong correspondence between the cell classes identified in the two choice tasks (Shi et al., 2022a). Hence, we may refer to the different groups of cells using the standard nomenclature (offer value, chosen juice and chosen value) independently of the choice task. Based on this result, we proceeded with a comprehensive classification based on the activity recorded in both choice tasks. For each task-related cell, we calculated the $\mathrm{sum}(sR^2)$ for the eight variables in Task 1 ($\mathrm{sum}(sR^2)_{\mathrm{Task1}}$) and eight sequences in Task 2 ($\mathrm{sum}(sR^2)_{\mathrm{Task2}}$) as described above. We then added the corresponding $\mathrm{sum}(sR^2)_{\mathrm{Task1}}$ and $\mathrm{sum}(sR^2)_{\mathrm{Task2}}$ to obtain the final $\mathrm{sum}(sR^2)_{\mathrm{final}}$. Neurons such that $\mathrm{sum}(sR^2)_{\mathrm{final}} \neq 0$ for at least one class were said to be tuned; other neurons were untuned. Tuned cells were assigned to the cell class that provided the maximum $|\mathrm{sum}(sR^2)_{\mathrm{final}}|$.
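The assignment rule described above can be sketched in a few lines of Python. This is only an illustration: the function names, the input layout (one tuple of slope, R², p-value per candidate variable and time window) and the example numbers are hypothetical, and the regressions themselves are assumed to have been run beforehand.

```python
def signed_r2(slope, r2, p_value, alpha=0.05):
    """Signed R^2 for one regression: sign(slope) * R^2 if p < alpha, else 0."""
    if p_value >= alpha or slope == 0:
        return 0.0
    return r2 if slope > 0 else -r2


def classify_cell(fits, alpha=0.05):
    """Assign one cell to the variable (or sequence) with the largest |sum(sR^2)|.

    fits maps each candidate label to a list of (slope, r2, p_value) tuples,
    one tuple per time window. Returns (label, sign) or ("untuned", None).
    """
    sums = {
        label: sum(signed_r2(b1, r2, p, alpha) for b1, r2, p in windows)
        for label, windows in fits.items()
    }
    best = max(sums, key=lambda label: abs(sums[label]))
    if sums[best] == 0:
        return "untuned", None  # no candidate has a nonzero sum(sR^2)
    return best, "+" if sums[best] > 0 else "-"


# Hypothetical regression results for one cell (four time windows).
example_fits = {
    "offer value A": [(2.0, 0.6, 0.001), (1.5, 0.4, 0.01), (0.3, 0.05, 0.40), (0.1, 0.01, 0.80)],
    "chosen value": [(1.0, 0.3, 0.02), (0.2, 0.02, 0.60), (0.1, 0.01, 0.70), (0.0, 0.0, 1.0)],
    "chosen juice": [(-0.5, 0.1, 0.20), (0.2, 0.03, 0.50), (0.1, 0.01, 0.90), (0.0, 0.0, 1.0)],
}
example_label, example_sign = classify_cell(example_fits)
```

The same function applies to the combined classification if the per-window lists for Task 1 and Task 2 are concatenated for each cell class, since the final score is the sum of the two task-specific sums.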

Data sets
In some sessions, one or both choice patterns presented complete or quasi-complete separation, i.e., the animal split choices for <2 offer types in Task 1 and/or in Task 2. In these cases, the probit regression did not converge, the resulting steepness η was high and unstable, and the relative value was not unique. This issue affected the classification analyses described above only marginally, but for the present study it was critical that behavioral measures be accurate and precise. We thus restricted our analyses to stable sessions by imposing an interquartile criterion on the sigmoid steepness (Tukey, 1977). Defining IQR as the interquartile range, values below the first quartile minus 1.5*IQR or above the third quartile plus 1.5*IQR were identified as outliers and excluded. Thus our entire data set included 1,204 neurons (577 from monkey J, 627 from monkey G) recorded in 241 sessions (101 from monkey J, 140 from monkey G). In this population, the classification procedures identified 183 offer value cells, 160 chosen juice cells and 174 chosen value cells. These neurons constitute the primary data set for this study.

Most of our analyses compared choices and neuronal activity across tasks and were restricted to the primary data set. However, some analyses included only trials from Task 2 and quantified the effects due to the presentation order (AB vs. BA). In these analyses we included an additional data set recorded previously from the same two animals performing only Task 2 (Ballesta and Padoa-Schioppa, 2019). All the procedures for behavioral control and neuronal recording were essentially identical to those described above. Furthermore, behavioral analyses and inclusion criteria were identical to those used for the primary data set.
The resulting data set included 1,205 neurons (414 from monkey J, 791 from monkey G) recorded in 196 sessions (51 from monkey J, 145 from monkey G). In this population, the classification procedures identified 243 offer value cells, 182 chosen juice cells and 187 chosen value cells. We refer to these neurons as the additional data set. Importantly, the order bias was also observed in these sessions (Ballesta and Padoa-Schioppa, 2019).

The interquartile criterion was also used to identify outliers in all the analyses conducted throughout this study. In practice, this criterion became relevant only for the analyses shown in Fig.6 and Fig.5-figure supplement 1, as indicated in the respective figure legends.
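For concreteness, the interquartile criterion can be sketched as follows. The text does not specify how quartiles were computed, so the use of Python's `statistics.quantiles` with the "inclusive" method is an assumption, and the function name and steepness values are hypothetical.

```python
import statistics


def tukey_inliers(values, k=1.5):
    """Flag values inside [Q1 - k*IQR, Q3 + k*IQR] (True = kept, False = outlier)."""
    # Quartile convention is an assumption; other conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]


# Hypothetical sigmoid steepness values, one per session; the last one is unstable.
example_steepness = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 40.0]
example_keep = tukey_inliers(example_steepness)
stable_sessions = [v for v, keep in zip(example_steepness, example_keep) if keep]
```

With these example values the quartiles are 2.875 and 4.625, so the fences fall at 0.25 and 7.25 and only the last session is excluded.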

Comparing tuning functions across choice tasks
Several analyses compared the tuning functions recorded in the two tasks. Tuning functions were defined by the linear regression of the firing rate $r$ onto the encoded variable $S$:

$$r = b_0 + b_1 S$$

Regression coefficients $b_0$ and $b_1$ were referred to as tuning intercept and tuning slope, respectively. Positive and negative encoding corresponded to $b_1 > 0$ and $b_1 < 0$, respectively. We also defined the mean activity and the activity range as follows. Indicating with $S_{\max}$ the maximum value of $S$, the mean activity was defined as $r_{\mathrm{mean}} = b_0 + b_1 S_{\max}/2$. The activity range was defined as $\Delta r = |b_1 S_{\max}|$, where |·| indicates the absolute value.

For any neuronal response, the tuning was considered significant if $b_1$ differed significantly from zero (p<0.05) and if the sign of the encoding was consistent with the cell class (e.g., $b_1 > 0$ for offer value A + cells). All the analyses comparing tuning functions across tasks were restricted to neuronal responses with significant tuning.
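The quantities defined in this section can be illustrated with a short Python sketch. It fits the tuning function by ordinary least squares and derives the mean activity and activity range; the function name and the example data are hypothetical, and the significance test on the slope is omitted.

```python
def tuning_function(s, r):
    """Least-squares fit of r = b0 + b1 * s, plus mean activity and activity range."""
    n = len(s)
    s_bar = sum(s) / n
    r_bar = sum(r) / n
    sxx = sum((x - s_bar) ** 2 for x in s)
    sxy = sum((x - s_bar) * (y - r_bar) for x, y in zip(s, r))
    b1 = sxy / sxx                 # tuning slope
    b0 = r_bar - b1 * s_bar        # tuning intercept
    s_max = max(s)
    return {
        "b0": b0,
        "b1": b1,
        "r_mean": b0 + b1 * s_max / 2,   # mean activity
        "delta_r": abs(b1 * s_max),      # activity range
    }


# Hypothetical response: firing rate (sp/s) as a function of the encoded variable.
example_tuning = tuning_function([0, 1, 2, 3, 4], [5.0, 7.0, 9.0, 11.0, 13.0])
```

In this noiseless example the slope is 2 and the intercept is 5, so the mean activity is 9 sp/s and the activity range is 8 sp/s.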

Neuronal measures of relative value
Several analyses relied on neuronal measures for the relative value of the juices ($\rho^{\mathrm{neuronal}}$) derived from the activity of chosen value cells. In Task 1, these neurons encode the chosen value independently of the juice type. For each neuronal response, we performed a bilinear regression:

proportional to the value of a quantum of juice A (uA), $\theta_B$ should be proportional to the value of a quantum of juice B (uB), and the ratio $\theta_A/\theta_B$ should equal the value ratio, i.e., the relative value of the two juices. We thus defined

$$\rho^{\mathrm{neuronal}} = \theta_A / \theta_B$$

Previous studies showed that this measure is statistically indistinguishable from the behavioral measure $\rho^{\mathrm{behavioral}}$ derived from the probit analysis of choice patterns (Padoa-Schioppa and Assad, 2006).

In Task 2, in the post-offer1 and post-offer2 time windows, chosen value cells encoded the value of the current offer, independent of the juice type (Table 1). For each neuron, we thus performed a bilinear regression for each of the two time windows:

$$r_1 = \theta_{10} + \theta_{1A}\, q_A\, \delta_{\mathrm{order,AB}} + \theta_{1B}\, q_B\, \delta_{\mathrm{order,BA}} \tag{11}$$

$$r_2 = \theta_{20} + \theta_{2A}\, q_A\, \delta_{\mathrm{order,BA}} + \theta_{2B}\, q_B\, \delta_{\mathrm{order,AB}} \tag{12}$$

where $r_1$ and $r_2$ were the responses recorded in the post-offer1 and post-offer2 time windows, respectively, and $\theta_{10}$, $\theta_{1A}$, $\theta_{1B}$, $\theta_{20}$, $\theta_{2A}$ and $\theta_{2B}$ were regression coefficients. These coefficients provided four neuronal measures of relative value:

In essence, these four measures corresponded to the two time windows (post-offer1 and post-offer2) and to the two presentation orders (AB and BA). Importantly, all these measures were computed conditioned on $\theta_{1A}$, $\theta_{1B}$, $\theta_{2A}$ and $\theta_{2B}$ differing significantly from zero (p<0.05). The analyses illustrated in Fig.5 and Fig.7 were restricted to neurons satisfying this criterion.

In terms of notation, we often omit the superscript in $\rho^{\mathrm{behavioral}}$ and we indicate behavioral measures simply as ρ (with the relevant subscripts). We use the superscript "behavioral" only when we explicitly compare behavioral and neuronal measures, for clarity. In contrast, for neuronal measures of relative value we always use the superscript "neuronal".
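As a minimal illustration of the bilinear regression in Eq.11, the Python sketch below fits the three coefficients by ordinary least squares and forms the ratio of the two slopes. Taking that ratio as a relative value measure follows the Task 1 ratio $\theta_A/\theta_B$ by analogy and is an assumption here, as are the function names and the synthetic trials; significance tests on the coefficients are omitted.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_post_offer1(trials):
    """Fit r1 = th10 + th1A*qA*d_AB + th1B*qB*d_BA (Eq.11) by least squares.

    trials is a list of (q_a, q_b, order, rate), with order "AB" or "BA".
    Returns [th10, th1A, th1B].
    """
    rows = [(1.0, qa * (order == "AB"), qb * (order == "BA"))
            for qa, qb, order, _ in trials]
    rates = [t[3] for t in trials]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, rates)) for i in range(3)]
    return solve_linear(xtx, xty)


# Synthetic, noiseless trials generated with th10 = 5, th1A = 4, th1B = 2.
example_trials = [
    (1, 3, "AB", 9.0), (2, 1, "AB", 13.0), (3, 2, "AB", 17.0),
    (1, 2, "BA", 9.0), (2, 4, "BA", 13.0), (1, 6, "BA", 17.0),
]
th10, th1a, th1b = fit_post_offer1(example_trials)
example_ratio = th1a / th1b  # slope ratio, read as a relative value under the stated assumption
```

The regression for the post-offer2 window (Eq.12) has the same form with the two order indicators swapped.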

Activity profiles of chosen juice cells
To conduct population analyses, we pooled all chosen juice cells. The juice eliciting higher firing rates was labeled "E" (encoded) and the other juice was labeled "O". In Task 2, we thus referred to EO trials and OE trials, depending on the presentation order.

To illustrate the activity profiles of chosen juice cells in Task 2, we aligned spike trains at offer1 and, separately, at juice delivery. For each trial, the spike train was smoothed using a kernel that mimicked the post-synaptic potential by exerting influence only forward in time (decay time constant = 20 ms) (So and Stuphorn, 2010). In Fig.6A and Fig.8A we used moving averages of 100 ms with 25 ms steps for display purposes.
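A causal smoothing of this kind can be sketched as below. The sketch uses a pure exponential decay with a 20 ms time constant, which is a simplification: only the decay constant is given in the text, and the kernel used in the study may include other shape parameters. The function name, the 1 ms grid and the example spike train are hypothetical.

```python
import math


def smooth_spike_train(spike_times_ms, duration_ms, tau_ms=20.0, dt_ms=1.0):
    """Causal exponential smoothing of a spike train; returns rate in spikes/s.

    Each spike adds 1/tau at its own time bin and then decays with time
    constant tau, so it influences only later time points.
    """
    n_bins = int(round(duration_ms / dt_ms))
    decay = math.exp(-dt_ms / tau_ms)
    counts = [0] * n_bins
    for t in spike_times_ms:
        idx = int(t // dt_ms)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    rate, level = [], 0.0
    for c in counts:
        level = level * decay + c / tau_ms   # spikes per ms, approximately unit-area kernel
        rate.append(level * 1000.0)          # convert to spikes per second
    return rate


# Hypothetical trial: a single spike at 100 ms in a 200 ms window.
example_rate = smooth_spike_train([100.0], 200.0)
```

Because the kernel is one-sided, the smoothed rate is zero before the spike, jumps at the spike time and then decays, so activity changes are never shifted backward in time.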

Under sequential offers, chosen juice cells encode different variables in different time windows (see Table 1). During offer1 and offer2 presentation, these cells encode in a binary way the juice type currently on display. Later, as the decision develops, these neurons gradually come to encode the binary choice outcome (i.e., the chosen juice). We previously showed that the activity of these neurons recorded in OE trials shortly before offer2 is inversely related to the value of offer1 (Ballesta and Padoa-Schioppa, 2019). This phenomenon, termed circuit inhibition, resembles the setting of a dynamic system's initial conditions and is regarded as an integral part of the decision process (Ballesta and Padoa-Schioppa, 2019).

For a quantitative analysis of circuit inhibition, we focused on a 300 ms time window starting 250 ms before offer2 onset. We excluded forced choice trials, for which one of the two offers was null. For each neuron, we examined OE trials and we regressed the firing rates against the normalized value of offer1:

where $\Delta V_O$ was the value range for juice O. The normalization allowed us to pool neurons recorded with different value ranges. The regression slope $c_1$ quantified circuit inhibition for individual cells, and we studied this parameter at the population level.

choices (y-axis) is plotted against the log quantity ratio (x-axis). Each data point indicates one offer type in Task 1 (gray circles) or Task 2 (red and blue triangles for AB trials and BA trials, respectively). Sigmoids were obtained from probit regressions. The relative value (ρ) and sigmoid steepness (η) measured in each task and the order bias (ε) measured in Task 2 are indicated. In this session, the animal presented all three biases. Compared to Task 1, choices in Task 2 were less accurate ($\eta_{\mathrm{Task2}} < \eta_{\mathrm{Task1}}$) and biased in favor of juice A ($\rho_{\mathrm{Task2}} > \rho_{\mathrm{Task1}}$; preference bias). Furthermore, choices in Task

format. The results closely resemble those for monkey J but the preference bias is weaker.
correspond to the firing rate (y-axis) and to the offered juice quantity (x-axis). The two colors correspond to the two orders (AB, BA

envisioned at the outset of this analysis. In both panels, the x-axis represents behavioral measures from either Task 1 (green) or Task 2 (yellow); the y-axis represents the neuronal measure from Task 2. In scenario 1, the animal assigned higher relative value to juice A in Task

taken as equal to $\rho^{\mathrm{neuronal}}_{\mathrm{offer2}}$ (Eq.14). Other definitions provided similar results (data not shown).

Correlation between CP and preference bias index. Each panel corresponds to the histogram immediately above it. CPs are plotted against the preference bias index (PBI), which quantifies the preference bias independently of the juice types. Each symbol represents one cell and the line is from a linear regression. CP and PBI were negatively correlated immediately before and after offer2 onset, but not later in the trial. This pattern suggests that the preference bias emerged late in the trial, when decisions were not finalized shortly after offer2 presentation.