1. Neuroscience

Sequential selection of economic good and action in medial frontal cortex of macaques during value-based decisions

  1. Xiaomo Chen (corresponding author)
  2. Veit Stuphorn (corresponding author)
  1. Johns Hopkins University, United States
  2. Johns Hopkins University School of Medicine, United States
Research Article
Cite this article as: eLife 2015;4:e09418 doi: 10.7554/eLife.09418

Abstract

Value-based decisions could rely either on the selection of desired economic goods or on the selection of the actions that will obtain the goods. We investigated this question by recording from the supplementary eye field (SEF) of monkeys during a gambling task that allowed us to distinguish chosen good from chosen action signals. Analysis of the individual neuron activity, as well as of the population state-space dynamic, showed that SEF encodes first the chosen gamble option (the desired economic good) and only ~100 ms later the saccade that will obtain it (the chosen action). The action selection is likely driven by inhibitory interactions between different SEF neurons. Our results suggest that during value-based decisions, the selection of economic goods precedes and guides the selection of actions. The two selection steps serve different functions and can therefore not compensate for each other, even when information guiding both processes is given simultaneously.

https://doi.org/10.7554/eLife.09418.001

eLife digest

Much of our decision making seems to involve selecting the best option from among those currently available, and then working out how to attain that particular outcome. However, while this might sound straightforward in principle, exactly how this process is organized within the brain is not entirely clear.

One possibility is that the brain compares all the possible outcomes of a decision with each other before constructing a plan of action to achieve the most desirable of these. This is known as the 'goods-based' model of decision making. However, an alternative possibility is that the brain instead considers all the possible actions that could be performed at any given time. One specific action is then chosen based on a range of factors, including the potential outcomes that might result from each. This is an 'action-based' model of decision making.

Chen and Stuphorn have now distinguished between these possibilities by training two monkeys to perform a gambling task. The animals learned to make eye movements to one of two targets on a screen to earn a reward. The identity of the targets varied between trials, with some associated with larger rewards or a higher likelihood of receiving a reward than others. The location of the targets also changed in different trials, which meant that the choice of 'action' (moving the eyes to the left or right) could be distinguished from the choice of 'goods' (the reward).

By using electrodes to record from a region of the brain called the supplementary eye field, which helps to control eye movements, Chen and Stuphorn showed that the activity of neurons in this region predicted the monkeys’ decision-making behavior. Crucially, it did so in two stages: neurons first encoded the reward chosen by the monkey, before subsequently encoding the action that the monkey selected to obtain that outcome.

These data argue against an action-based model of decision making because outcomes are encoded before actions. However, they also argue against a purely goods-based model. This is because all possible actions are encoded by the brain (including those that are subsequently rejected), with the highest levels of activity seen for the action that is ultimately selected. The data instead support a new model of decision making, in which outcomes and actions are selected sequentially via two independent brain circuits.

https://doi.org/10.7554/eLife.09418.002

Introduction

Value-based decision-making requires the ability to select the reward option with the highest available value, as well as the appropriate action necessary to obtain the desired option. It is still unclear how the brain compares value signals and uses them to select an action (Gold and Shadlen, 2007; Cisek, 2012). The goods-based model of decision-making (Padoa-Schioppa, 2011) suggests that the brain computes the subjective value of each offer, selects one of these option value signals, and then prepares the appropriate action plan (Figure 1A). Support for this model comes from recording studies in orbitofrontal cortex (OFC) during an economic choice task (Padoa-Schioppa and Assad, 2006; Cai and Padoa-Schioppa, 2012). In contrast, the action-based model of decision making (Tosoni et al., 2008; Cisek and Kalaska, 2010; Christopoulos et al., 2015a) suggests that all potential actions are represented in the brain in parallel and compete with each other (Figure 1B). This competition is influenced by a variety of factors, including the value of each action’s outcome. According to this model, option value signals should not predict the chosen option, since these signals only serve as input into the decision process, which is determined by the competition among the potential actions. Support for this model comes primarily from recording studies in parietal and premotor cortex (Platt and Glimcher, 1999; Sugrue et al., 2004; Shadlen et al., 2008; Cisek and Kalaska, 2010; Christopoulos et al., 2015b).

Architecture of different decision models.

(A, B) Goods- and action-based models envision the important selection step during value-based decisions to be either at the value (A) or action (B) representation stage. (C, D) The other two models presume that important selection processes occur at both the value and the action representation stage. However, they differ in their underlying architecture and in the resulting pattern of activity across the network as it unfolds in time. (C) The distributed consensus model assumes reciprocal interactions between the value and the action representation. These reciprocal interactions allow the action selection to influence the simultaneous ongoing value selection. The selection of the chosen good and action proceeds therefore in parallel. (D) In contrast, the sequential model assumes that there are no meaningful functional reciprocal connections from the action to the value representation. Because of this the action value representations cannot influence the value selection process, which has to finish first, before the action selection can begin. Thus, this decision architecture by necessity implies a sequential decision process. Red arrows indicate excitatory connections. Green buttons indicate inhibitory connections. Thickness of the connection indicates relative strength of the neural activity.

https://doi.org/10.7554/eLife.09418.003

As there is evidence supporting both theories, it is unlikely that either the goods-based or the action-based model in their pure form are correct. However, the exact role of goods- and action-based selection processes in decision making is not known. The distributed consensus model (Cisek, 2012) combines elements of the goods-based and the action-based model (Figure 1C). It is characterized by strong reciprocal interactions between the goods and the action representation levels that allow the action selection to influence the simultaneous ongoing value selection and vice versa. This model predicts therefore that the selection of the chosen good and action are closely integrated and proceed in parallel.

Here, we test these different models by recording neuronal activity in the supplementary eye field (SEF). Previous research indicates that neurons in the SEF participate in the use of value signals to select eye movements (So and Stuphorn, 2010). Its anatomical connections make the SEF ideally suited for this role. It receives input from areas that represent option value, such as the OFC, the anterior cingulate cortex (ACC), and the amygdala (Huerta and Kaas, 1990; Matsumoto et al., 2003; Ghashghaei et al., 2007), and projects to oculomotor areas, such as the frontal eye field (FEF) and the superior colliculus (Huerta and Kaas, 1990).

We designed an oculomotor gamble task, in which the monkey had to choose between two gamble options indicated by visual cues. The monkeys indicated their choice by making a saccade to the cue indicating the desired gamble option. Across different trials, the visual cues were presented in different locations and required saccades in different directions to be chosen. This allowed us to distinguish the selection of gamble options or economic goods from the selection of actions. We found that the activity of SEF neurons predicted the monkey’s choice. Importantly, this decision process unfolded sequentially. First, the chosen gamble option was selected and only then the chosen action. The saccade selection process seemed to be driven by competition between directionally tuned SEF neurons. Our findings are not in agreement with any of the previously suggested models (Figure 1A–C). Instead, they support a new sequential decision model (Figure 1D). According to this model, at the beginning of the decision two selection processes start independently on the goods and action level. Our data indicate that the SEF activity is part of the action selection process. The action selection process receives input from the goods selection. However, due to the absence of recurrent feedback, the goods selection process does not receive input from the action selection process. Once the competition on the goods level is resolved, the value signals for the chosen gamble option increase in strength and the ones for the non-chosen one decrease in strength. This activity difference cascades downward to the action level and determines the outcome of the action selection.
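The sequential architecture sketched in Figure 1D can be illustrated with a minimal firing-rate model. This is not a fitted model of the data; all parameters, time constants, and function names below are illustrative assumptions. Value selection resolves first through mutual inhibition at the goods level, and its output then feeds forward, with no feedback, into mutually inhibiting action units:

```python
# Minimal rate-model sketch of the sequential decision architecture
# (Figure 1D). All parameters are illustrative, not fitted to data.

def simulate_sequential(v_a, v_b, steps=200, dt=0.01,
                        feedforward=1.0, inhibition=0.6):
    """Goods-level competition resolves first; its output then
    drives mutually inhibiting action-level units."""
    goods = [v_a, v_b]          # option value inputs
    goods_rate = [0.0, 0.0]     # goods-level firing rates
    action = [0.0, 0.0]         # action-level firing rates
    for _ in range(steps):
        # goods level: winner-take-all via mutual inhibition
        for i in (0, 1):
            j = 1 - i
            drive = goods[i] - inhibition * goods_rate[j]
            goods_rate[i] += dt * (drive - goods_rate[i])
            goods_rate[i] = max(goods_rate[i], 0.0)
        # action level: feedforward input from the goods level only
        # (no feedback to the goods level, per the sequential model)
        for i in (0, 1):
            j = 1 - i
            drive = feedforward * goods_rate[i] - inhibition * action[j]
            action[i] += dt * (drive - action[i])
            action[i] = max(action[i], 0.0)
    return goods_rate, action

goods, action = simulate_sequential(0.8, 0.4)
# the higher-valued option dominates at both levels
```

Because the action level receives no feedback connections, the goods-level winner is decided independently and its advantage simply cascades downward, reproducing the sequential order of selection described above.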

Results

Behavior

Two monkeys (A and I) were trained to perform a gambling task in which they chose between two different gamble options with different maximum reward and/or reward probability (Figure 2A,B). The maximum and minimum reward amounts were indicated by the color of the target. The portion of a color within the target corresponded to the probability of receiving the corresponding reward amount (see experimental procedures). We estimated the subjective value of each target based on the choice preferences of the monkeys across all combinations of options (Figure 2C) (Maloney and Yang, 2003; Kingdom and Prins, 2010). The subjective value estimate (referred to in the rest of the paper as ‘value’) is measured on a relative scale, with 0 and 1 being the least and most preferred option in our set. Consistent with earlier findings (So and Stuphorn, 2010), the mean saccade reaction times during no-choice trials were significantly negatively correlated with the value of the target (Figure 2D, monkey A: t(5) = 8.40, p = 0.03; monkey I: t(5) = 27.35, p = 0.003). On choice trials, the mean reaction times were significantly correlated with the signed value difference between the chosen and non-chosen targets (Figure 2E, monkey A: t(40) = 159.23, p<10⁻¹⁴; monkey I: t(38) = 16.18, p<10⁻⁴). Note that there was a small number of trials with very negative value differences, indicating that in these trials the monkey chose a normally non-preferred option. The unusually short reaction times in these trials suggest that these choices were not driven by the normal value-based decision process, but by other mechanisms, such as history effects, spatial selection bias, express saccades, or lapses of attention to the task. For this reason, these trials were excluded from the analysis and are marked by a separate color.
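The negative relationship between target value and no-choice reaction time can be illustrated with a small sketch. The value and reaction-time numbers below are hypothetical, chosen only to mimic the qualitative trend in Figure 2D, not the monkeys' data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical no-choice data: higher-value targets -> shorter RTs
values = [0.1, 0.25, 0.4, 0.55, 0.7, 0.85, 1.0]
rts_ms = [245, 238, 231, 226, 220, 215, 211]
r = pearson_r(values, rts_ms)   # strongly negative, as in Figure 2D
```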

Figure 2 with 1 supplement
Oculomotor gambling task and behavioral results.

(A) Visual cues used in the gambling task. Four different colors (cyan, red, blue, and green) indicated four different reward amounts (1, 3, 5, and 9 units of water, where 1 unit equaled 30 µl). The expected value of the gamble targets along the diagonal axis was the same. For example, the expected value of the bottom right green/cyan target is: 9 units (maximum reward) × 0.2 (maximum reward probability) + 1 unit (minimum reward) × 0.8 (minimum reward probability) = 2.6 units. (B) Sequence of events during choice trials (top) and no-choice trials (bottom). The lines below indicate the duration of various time periods in the gambling task. The black arrow is not part of the visual display; it indicates the monkeys' choices. (C–E) Behavioral results for monkey A (top) and monkey I (bottom). (C) The mean subjective value of the seven gamble options plotted as a function of expected value. Different colors indicate different amounts of maximum reward. (D) The mean reaction times in no-choice trials as a function of subjective value. (E) The mean reaction times in choice trials as a function of the subjective value difference between the chosen and non-chosen targets.

https://doi.org/10.7554/eLife.09418.004
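The expected value computation described in the figure caption can be written out directly. The function name is ours; the reward units follow the task, where 1 unit equals 30 µl of water:

```python
def expected_value(max_amt, p_max, min_amt):
    """Expected value of a two-outcome gamble: the maximum reward
    with probability p_max, otherwise the minimum reward."""
    return max_amt * p_max + min_amt * (1.0 - p_max)

# the bottom-right green/cyan target from Figure 2A:
ev = expected_value(9, 0.2, 1)   # 9*0.2 + 1*0.8 = 2.6 units
```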

SEF neurons predict chosen option and chosen action sequentially

We recorded 516 neurons in SEF (329 from monkey A, 187 from monkey I, Figure 2—figure supplement 1). In the following analysis, we concentrated on a subset of 128 SEF neurons, whose activity was tuned for saccade direction (see experimental procedures).

First, we asked whether SEF activity predicted the chosen gamble option or the saccade direction. We performed a trial-by-trial analysis using linear classification to decode the chosen direction and chosen value from the spike density function at 1 ms temporal resolution. Figure 3A shows the classification accuracy across all 128 directionally tuned SEF neurons. Single neuron activity clearly predicted both the chosen gamble option and the chosen direction better than chance, but sequentially, not simultaneously. The activity began to predict the chosen gamble option around 160 ms before saccade onset and reached a peak around 120 ms before saccade onset, after which it gradually decreased. The activity started to predict saccade direction only around 60 ms before saccade onset. The same pattern is shown by the number of neurons with significant classification accuracy as a function of time (Figure 3B).
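As a rough illustration of decoding choice from single-neuron activity, the sketch below uses a nearest-centroid linear classifier with leave-one-out validation on hypothetical firing rates from one analysis window. The paper's classifier, features, and preprocessing details may differ; this only shows the logic of trial-by-trial linear classification:

```python
import random

def nearest_centroid_accuracy(rates, labels):
    """Leave-one-out accuracy of a nearest-centroid (linear)
    classifier on scalar firing rates."""
    correct = 0
    for i in range(len(rates)):
        cents = {}
        for lab in set(labels):
            vals = [r for j, (r, l) in enumerate(zip(rates, labels))
                    if l == lab and j != i]
            cents[lab] = sum(vals) / len(vals)
        pred = min(cents, key=lambda lab: abs(rates[i] - cents[lab]))
        correct += pred == labels[i]
    return correct / len(rates)

random.seed(1)
# hypothetical single-neuron rates late in the trial, when the
# neuron has become direction-selective
left = [random.gauss(20, 2) for _ in range(50)]
right = [random.gauss(30, 2) for _ in range(50)]
acc = nearest_centroid_accuracy(left + right, ["L"] * 50 + ["R"] * 50)
```

Run at every millisecond of the trial, such an accuracy trace rises above chance only once the neural activity starts to carry information about the decoded variable, which is how the onset times above were estimated.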

Figure 3 with 1 supplement
Time course of chosen gamble option and saccade direction representation in SEF.

(A) Significant classification accuracy for the chosen gamble option (red) and the chosen saccade direction (black) across 128 neurons. We excluded values that were not significantly different from chance (permutation test; p≤0.05). (B) Number of neurons showing significant classification accuracy for the chosen gamble option (red) and the chosen saccade direction (black). (C) Average mutual information between SEF activity and the chosen and non-chosen gamble option (top panel; dark and light red) and saccade direction (bottom panel; dark and light grey). The time periods when the amount of information about the chosen and non-chosen option/direction was significantly different (paired t-test adjusted for multiple comparisons, p≤0.05) are indicated by the thick black line at the bottom of the plots. The onset of a significant difference is indicated by the vertical dashed line. SEF, supplementary eye field.

https://doi.org/10.7554/eLife.09418.006

This result indicates a sequence of decisions, whereby first an economic good, here a gamble option, is chosen and only later the action that will bring it about. To confirm this finding, we employed an independent information theoretic analysis to study how SEF activity encoded the chosen and non-chosen gamble option, as well as the chosen and non-chosen direction, throughout the decision process (Figure 3C). We used 106 neurons that were tested with at least 8 of the 12 possible target position combinations. We assumed that the onset of significantly more information about the chosen than the non-chosen variable in the neural firing rate indicated the moment at which the selection process had finished and the choice could be predicted. This moment was reached 66 ms earlier for gamble option information (113 ms before saccade onset; permutation test adjusted for multiple comparisons, p≤0.05) than for saccade direction information (47 ms before saccade onset; permutation test adjusted for multiple comparisons, p≤0.05). Thus, the onset and timing of the information representation in SEF is consistent with the results of the classification analysis and also indicates a sequential decision process.
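The information theoretic analysis rests on the mutual information between discretized firing rates and a task variable. A minimal sketch, with a hypothetical binning scheme and hypothetical spike counts (the paper's estimator and bias corrections may differ):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits between two discrete sequences of equal length."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        p = c / n
        mi += p * math.log2(p * n * n / (px[x] * py[y]))
    return mi

# hypothetical binned spike counts whose rate depends on the chosen option
option = ["A"] * 4 + ["B"] * 4
counts = [3, 3, 2, 3, 7, 8, 7, 8]
binned = ["high" if c > 5 else "low" for c in counts]
mi = mutual_information(binned, option)   # 1 bit: perfectly informative
```

Computed in sliding windows, the chosen-option information trace rises above the non-chosen trace at the moment the selection is resolved, which is the onset criterion used above.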

In our gamble task, the monkey was free to indicate his choice as soon as he was ready. Because of this design feature, saccade onset is likely more closely aligned with the conclusion of the decision process than target onset. The fact that reaction time reflected chosen value and value difference (i.e. choice difficulty), as shown in Figure 2, also supports this idea. We therefore analyzed the neural activity aligned on movement onset, because it likely reflects the dynamics of the decision process more accurately. The analysis of the neural activity aligned on target onset further confirms this conclusion (Figure 3—figure supplement 1).

SEF neurons reflect the value of both choice options in an opposing way

Our findings indicated that SEF neurons show signs of a sequential decision process, whereby first a desired economic good is chosen and only then the action that is necessary to obtain the good. Next, we investigated the neural activity in the SEF neurons more closely to test if the SEF only reflects the outcome of the decision, or whether it also reflects one or both of the selection steps. Specifically, we searched for opposing contributions of the two choice options to the activity of SEF neurons, which would indicate a competitive network that could select a winning option from a set of possibilities.

The directionally tuned SEF neurons represented the value of targets in the preferred direction (PD) (Table 1). The PD is the saccade direction for which a neuron is maximally active, irrespective of reward value obtained by the saccade. We estimated each neuron’s PD using a non-linear regression analysis of activity for saccades to all four possible target locations. We defined here PD as the target direction that is closest to the estimated PD. Figure 4A,B shows the activity of the SEF neurons during no-choice trials, that is, when only one target appears on the screen. Although the SEF neurons are strongly active for PD targets and show value-related modulations (Figure 4A), they are not active for saccades into the non-preferred direction (NPD), independent of their value (Figure 4B; regression coefficient = 0.013, t(5) = 1.324, p=0.243). The SEF neurons encode, therefore, the value of saccades to the PD target, confirming previous results (So and Stuphorn, 2010).

Figure 4 with 4 supplements
SEF neurons represent the difference in action value associated with targets in the preferred and non-preferred direction.

The neural activity of 128 directionally tuned SEF neurons was normalized and compared across trials with different values of targets in the preferred or non-preferred direction. (A) The neural activity in no-choice trials, when the target was in the preferred direction. (B) The neural activity in no-choice trials, when the target was in the non-preferred direction. (C, D) The neural activity in choice trials. To visualize the contrasting effect of targets in the preferred or non-preferred direction on neural activity, the value of one of the targets was held constant, while the value of the other target was varied. Activity was sorted by target value, but not by saccade choice. (C) The neural activity, when the value of the target in the preferred direction varied, while the value of the target in the non-preferred direction was held constant at a medium value. (D) The neural activity, when the value of the target in the non-preferred direction varied, while the value of the target in the preferred direction was held constant at a medium value. The color of the spike density histograms indicates the target value [high value = 6–7 units (red line); medium value = 3–5 units (orange line); low value = 1–2 units (yellow line)]. (E-H) The regression analysis corresponding to (A-D). A t-test was used to determine whether the regression coefficients were significantly different from 0. The regression coefficients, confidence intervals, t-values, and p-values are listed in Table 1. SEF, supplementary eye field.

https://doi.org/10.7554/eLife.09418.008
Table 1

Average value effect on neural activity across all directional SEF neurons. The upper two rows show the effect of the preferred and non-preferred direction target value on normalized neuronal activity in no-choice trials, and the lower two rows show their effect in choice trials. Within each set, the upper row (VPD) shows the effect of the preferred direction target value on normalized neural activity, whereas the lower row (VNPD) shows the effect of the non-preferred direction target value. Significance was calculated using a t-test, which shows whether the regression coefficient is significantly different from zero. The analysis corresponds to the results presented in Figure 4.

https://doi.org/10.7554/eLife.09418.013
All neurons (n = 128)

                   Regression    Lower confidence   Upper confidence   t(5)     p
                   coefficient   bound              bound
No-choice   VPD     0.057         0.020              0.095              3.945   0.011
            VNPD    0.013        -0.012              0.039              1.324   0.243
Choice      VPD     0.119         0.090              0.148             10.629   <0.001
            VNPD   -0.058        -0.093             -0.023             -4.345   0.007

SEF, supplementary eye field.

There are a number of subtypes of value signals that are associated with actions, such as saccades. These signals are related to different stages of the decision process (Schultz, 2015; Stuphorn, 2015). First, there are signals that represent the value of the alternative actions irrespective of the choice. These signals represent the decision variables on which the decision process is based and are commonly referred to as ‘action value’ signals. Second, there are signals that encode the central step in the decision process, namely the comparison between the values of the alternative actions. Such ‘relative action value’ signals represent a combination of different ‘action value’ signals. They should be positively correlated with the action value of one alternative and negatively correlated with the action value of the other alternatives. Third, there are signals that indicate the value of the chosen action. These ‘chosen action value’ signals represent the output of the decision process. Single target trials do not allow to distinguish between these functionally very different signals, but choice trials do.
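The distinction between these signal types can be operationalized, crudely, by the signs of a neuron's correlations with the two action values: an 'action value' neuron tracks only V_PD, while a 'relative action value' neuron is also negatively correlated with V_NPD. The sketch below uses hypothetical data and an arbitrary threshold; the simulated neuron weights V_PD about twice as strongly as V_NPD, mirroring the roughly 2:1 influence reported in Table 1:

```python
import math

def corr(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def classify_value_signal(rates, v_pd, v_npd, thresh=0.2):
    """Crude taxonomy based on the signs of the two correlations."""
    r_pd, r_npd = corr(rates, v_pd), corr(rates, v_npd)
    if r_pd > thresh and r_npd < -thresh:
        return "relative action value"
    if r_pd > thresh and abs(r_npd) <= thresh:
        return "action value"
    return "other"

# hypothetical neuron: rate ~ 10 + 2*V_PD - 1*V_NPD
v_pd  = [1, 1, 2, 2, 3, 3, 4, 4]
v_npd = [1, 3, 2, 4, 1, 3, 2, 4]
rates = [10 + 2 * p - n for p, n in zip(v_pd, v_npd)]
kind = classify_value_signal(rates, v_pd, v_npd)
```

As the text notes, only choice trials can separate these types: with a single target, V_NPD is undefined and an 'action value' neuron is indistinguishable from a 'relative action value' neuron.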

On their own, NPD targets did not evoke neural activity. However, the value of NPD targets clearly modulated the response of the SEF neurons to the PD targets in choice trials (Figure 4D; Table 1). To isolate the effect that targets in the two directions have on the neural activity, we first held the value of the NPD target constant at a medium amount and compared the SEF population activity across trials with PD targets of varying value (Figure 4C). The neural activity clearly increased with the value of the PD target (regression coefficient = 0.119, t(5) = 10.629, p<0.001; Figure 4G). Next, we held the value of the PD target constant at a medium value and compared the population activity across trials with NPD targets of varying value (Figure 4D). The neural activity clearly decreased with the value of the NPD target (regression coefficient = -0.058, t(5) = -4.345, p = 0.007; Figure 4H). Thus, the SEF neurons represented a relative action value signal. The influence of the PD target value on SEF activity was about twice as large as that of the NPD target value. This means that the SEF neurons do not encode the exact value difference. Nevertheless, the opposing influence of the targets indicates that SEF represents the essential step in decision-making, namely a comparison of the relative value of the available actions. All these effects were present well before saccade onset (Figure 4—figure supplement 1) and did not depend on the chosen saccade direction (Figure 4—figure supplement 2). A similar activity pattern can also be observed when pooling all task-related neurons (N = 353, Figure 4—figure supplement 3). In contrast, the neurons were not significantly influenced by the relative spatial location of each target (Figure 4—figure supplement 4).

We used a regression analysis to further quantify the relative contributions of the chosen and non-chosen gamble option and saccade to the neural activity during the decision. In modeling the neural activity in choice trials, we used each neuron's activity in no-choice trials as a predictor of its response in choice trials. Specifically, we modeled the neural activity as a weighted sum of the activity in no-choice trials for saccades to targets with the same gamble option or direction as the chosen and non-chosen targets in the choice trials. The strength of the coefficients is a measure of the relative influence that each target has on the neural activity in a particular time period during the decision process. Figure 5 shows the time course of the coefficient strength for the chosen and non-chosen target when we sorted trials either by gamble option or by saccade direction. In both cases, the regression coefficients for the two targets were initially of equal value, indicating that the SEF neurons reflected each target equally during this time period. However, 110 ms before saccade onset (permutation test adjusted for multiple comparisons, p≤0.05) the strength of the chosen gamble option coefficient started to rise, while the non-chosen gamble coefficient strength stayed the same. Later in the trial, 60 ms before saccade onset (permutation test adjusted for multiple comparisons, p≤0.05), the coefficient strength for the chosen saccade direction increased. Simultaneously, the coefficient strength for the non-chosen saccade direction decreased. The results of the regression analysis allow a number of conclusions about the mechanism underlying decision-making and the role of SEF in it. First, they confirm the findings of the decoding and encoding analyses (Figure 3) and indicate that the decision process involves a sequence of two different selection processes. Second, the opposing pattern of influence on neural activity in the case of saccade direction suggests that the later action selection step could at least partially involve the SEF. The mechanism underlying this selection involves competition between the action value signals associated with the two saccade targets. Thus, the increasing influence of one target leads simultaneously to a decreasing influence of the other target. However, the neural activity associated with the earlier gamble option selection step does not show such a pattern of competing influences. Instead, the influence of the chosen option simply increases. This indicates that the choice of an economic good does not involve competitive interactions between SEF neurons. Instead, only the output of the gamble option selection is represented in SEF. This signal could reflect input from other brain regions.
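The weighted-sum regression can be sketched as an ordinary least-squares fit of choice-trial activity to the two no-choice predictors. The data below are hypothetical values for a single late time bin; the actual analysis was run at every time point across the population:

```python
def two_predictor_ols(y, x1, x2):
    """Least squares for y ~ b1*x1 + b2*x2 (no intercept),
    solved via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    sy1 = sum(a * b for a, b in zip(y, x1))
    sy2 = sum(a * b for a, b in zip(y, x2))
    det = s11 * s22 - s12 * s12
    b1 = (sy1 * s22 - sy2 * s12) / det
    b2 = (sy2 * s11 - sy1 * s12) / det
    return b1, b2

# hypothetical late-epoch data: choice-trial activity dominated by the
# no-choice response to the chosen target, suppressed by the non-chosen one
chosen_nc    = [10.0, 12.0, 14.0, 16.0, 18.0]
nonchosen_nc = [11.0, 15.0, 13.0, 17.0, 12.0]
choice = [0.9 * c - 0.3 * n for c, n in zip(chosen_nc, nonchosen_nc)]
b_chosen, b_nonchosen = two_predictor_ols(choice, chosen_nc, nonchosen_nc)
```

Tracking these two coefficients over time yields traces like those in Figure 5: initially equal weights, then a rising chosen-target weight and, for saccade direction, a falling non-chosen-target weight.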

Relative influence of chosen and non-chosen target on SEF activity.

(A) Regression coefficients for the chosen and non-chosen gamble options (dark and light red). (B) Regression coefficients for the chosen and non-chosen saccade directions (dark and light grey). Time periods in which the regression coefficients for the chosen and non-chosen option/direction are significantly different (paired t-test adjusted for multiple comparisons, p≤0.05) are indicated by a thick black line. The onset of a significant difference is indicated by a vertical dashed line. All panels are aligned on saccade onset. The shaded areas represent SEM. SEF, supplementary eye field; SEM, standard error of the mean.

https://doi.org/10.7554/eLife.09418.014

Relative action value map in SEF reflects competition between available saccade choices

Each directionally tuned SEF neuron represents the relative action value of saccades directed toward its PD. Together these neurons form a map encoding the relative value of all possible saccades during the decision process. Our analysis of the activity pattern in individual neurons suggested that the action selection relied on competition between different relative action value signals. In that case, the relative action value map in SEF should contain different groups of neurons that represent the competing relative action values of the two saccade choices. If the activities of these two groups of neurons indeed reflect inhibitory competition, the selection of a particular action should lead to increased activity of the neurons representing the chosen and decreased activity in the neurons representing the non-chosen saccade. Furthermore, the inhibition that the winning neurons can exert on the losing neurons should depend on their relative strength. We should therefore see differences in the dynamic of the neural activity within the relative action value map if we compare trials with small or large value differences.

To reconstruct the SEF relative action value map, we combined the activity of all directionally tuned neurons in both monkeys (Figure 6). We sorted each SEF neuron according to its PD and normalized their activity across all trial types (choice, no-choice trials). We then smoothed the matrix by linear interpolation at a bin size of 7.2° and plotted the activity. For each successive moment in time, the resulting vector represented the relative action value of all possible saccade directions, because all task-relevant saccades were equidistant to the fixation point. The succession of states of the map across time represented the development of relative action value-related activity in SEF over the course of decision making.
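The construction of the map at one time slice can be sketched as circular linear interpolation of PD-sorted, normalized firing rates onto a 7.2° grid. The four PD groups and rate values below are hypothetical; the actual map pooled all 128 directionally tuned neurons:

```python
def interpolate_map(pd_angles, rates, bin_deg=7.2):
    """Linearly interpolate normalized rates onto a circular grid of
    directions (relative to the chosen target) at bin_deg steps."""
    pts = sorted(zip(pd_angles, rates))
    # close the circle so the last segment wraps back to the first point
    pts.append((pts[0][0] + 360.0, pts[0][1]))
    n_bins = int(round(360.0 / bin_deg))
    grid, out = [], []
    for k in range(n_bins):
        a = pts[0][0] + k * bin_deg
        for (a0, r0), (a1, r1) in zip(pts, pts[1:]):
            if a0 <= a <= a1:
                w = 0.0 if a1 == a0 else (a - a0) / (a1 - a0)
                grid.append(a % 360.0)
                out.append(r0 + w * (r1 - r0))
                break
    return grid, out

# four hypothetical PD groups, chosen target at 90 degrees
angles = [0.0, 90.0, 180.0, 270.0]
rates  = [0.2, 1.0, 0.2, 0.6]     # chosen direction most active
grid, smooth = interpolate_map(angles, rates)
```

Stacking one such interpolated vector per millisecond produces the space-by-time maps of Figure 6, with activity bumps centered on the chosen and non-chosen target directions.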

Figure 6 with 1 supplement
Action value maps showing population activity in SEF during decision making.

Each neuron’s activity was normalized across all trial conditions. The maps in the left column are aligned on target onset and the panels in the right column on saccade onset. In each map, horizontal rows represent the average activity of cells whose preferred direction lies at a given angle relative to the chosen target (red circle on left). Color indicates change in normalized firing rate relative to the background firing rate (scale on the right). (A, B) Population activity during no-choice (A) and choice (B) trials. (C, D) Population activity in choice trials divided into trials with small (C) and large (D) value differences between the reward options. The subplots above the action value maps show the time course of the neural activity associated with the chosen (45–135°) and non-chosen (225–315°) target. The brown lines underneath show the times when population activity was significantly different from baseline (permutation test adjusted for multiple comparisons). The blue lines underneath show the times when the neural activity associated with the chosen target was significantly different from that associated with the non-chosen target (permutation test adjusted for multiple comparisons). SEF, supplementary eye field.

https://doi.org/10.7554/eLife.09418.015

In choice trials, activity started to rise in two sets of neurons (Figure 6B). One was centered on the chosen target (indicated by the red dot), while the other was centered on the non-chosen target (indicated by the black dot). The initial rise in activity was not significantly different between choice and no-choice trials (onset time in no-choice trials: 44 ms, choice trials: 40 ms; permutation test adjusted for multiple comparisons, p≤0.05). However, there was a longer delay between the initial rise in activity and saccade onset (onset time in no-choice trials: 141 ms before saccade onset, choice trials: 185 ms before saccade onset; permutation test adjusted for multiple comparisons, p≤0.05), in keeping with the fact that reaction times were longer when the monkey had to choose between two response options (Figure 2E). At the beginning, the activity associated with the two possible targets was of similar strength, but 70 ms before saccade onset (permutation test adjusted for multiple comparisons, p≤0.05) a significant activity difference developed between the two sets of cells that predicted which saccade would be chosen. The activity centered on the chosen saccade became much stronger than that centered on the non-chosen saccade. This differentiation reflected the decision outcome within the SEF relative action value map.
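The permutation tests used throughout compare an observed difference against a null distribution obtained by shuffling trial labels. A minimal sketch for a single time bin, with hypothetical activity values and no multiple-comparisons correction:

```python
import random

def permutation_pvalue(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of means."""
    rng = random.Random(seed)
    observed = abs(sum(xs) / len(xs) - sum(ys) / len(ys))
    pooled = list(xs) + list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(xs)], pooled[len(xs):]
        diff = abs(sum(px) / len(px) - sum(py) / len(py))
        count += diff >= observed
    return (count + 1) / (n_perm + 1)

# hypothetical chosen vs non-chosen normalized activity in one
# pre-saccade bin
chosen    = [1.1, 1.3, 1.2, 1.4, 1.3, 1.2, 1.5, 1.3]
nonchosen = [0.6, 0.7, 0.5, 0.8, 0.6, 0.7, 0.6, 0.5]
p = permutation_pvalue(chosen, nonchosen)
```

Applying such a test at every time bin, with an adjustment for multiple comparisons, yields onset times like the 70 ms pre-saccade divergence reported above.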

The chosen saccade was often also the one with the larger value. We therefore performed a separate analysis of those choice trials in which the monkeys made different choices for the same pair of gambles, which allowed us to differentiate between the representation of action value and choice. A comparison of the trials in which the larger (‘correct’) or smaller (‘error’) value target was chosen shows a strong increase of neural activity for the chosen target regardless of its average value (Figure 6—figure supplement 1). This confirmed that neural activity in SEF represented not only the relative action value of the competing saccades, but also the final choice.

We hypothesized that the action selection process is driven by competition between the action values of the two targets. If that were true, we would expect the reduction in activity related to the non-chosen saccade to be less pronounced and to occur later when the differences in action values were smaller, because of weaker inhibition. We therefore divided the choice trials into two groups with small (value difference smaller than 0.4) and large value differences (value difference larger than or equal to 0.4), while controlling that the mean chosen value in both conditions was the same (Figure 6C,D). As predicted, the neural activity associated with the non-chosen target was stronger and lasted longer when the value differences were small (onset of the significant activity difference between chosen and non-chosen target: 68 ms before saccade onset for large value difference trials and 42 ms before saccade onset for small value difference trials; permutation test adjusted for multiple comparisons, p < 0.05, Table 2). This longer lasting activity was consistent with the longer reaction times for smaller value difference trials (Figure 2E). In contrast, the activity for the chosen target was weaker when the value differences were small, especially early on (100–150 ms after target onset). The stronger activity associated with the non-chosen target in these trials was likely better able to withstand the competition from the activity associated with the chosen target and in turn suppressed this activity more strongly.

Table 2

The onset times in the time-direction maps. The first main column shows the onset times calculated from trials aligned on target onset and the second main column shows the onset times calculated from trials aligned on saccade onset. Within each main column, the first minor column shows the time when the neural activity was significantly different from background activity (-20 to 0 ms before target onset). The second minor column shows the time when the neural activity represented the choice. In no-choice trials, this corresponds to the time when the activity of neurons with a preferred direction within ±30° of the target was significantly different from the activity of neurons for which no target was presented (neurons with a preferred direction within 240–300°). For choice trials, it corresponds to the time when the activity for the chosen target was significantly different from the activity for the non-chosen target (in both cases neurons with a preferred direction within ±30° of their respective target). A permutation test with multiple comparison correction was used to calculate the onset times.

https://doi.org/10.7554/eLife.09418.017
| Condition | Time from target onset: activity vs background | Time from target onset: chosen vs non-chosen | Time from saccade onset: activity vs background | Time from saccade onset: chosen vs non-chosen |
|---|---|---|---|---|
| No-choice | 44 ms | – | -141 ms | – |
| Choice | 40 ms | 105 ms | -185 ms | -70 ms |
| Choice (dV >= 0.4) | 44 ms | 92 ms | -129 ms | -68 ms |
| Choice (dV < 0.4) | 41 ms | 139 ms | -169 ms | -42 ms |
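The onset times above come from permutation tests with a multiple-comparison correction across time bins. The paper does not show the Matlab implementation; the following Python sketch illustrates one standard way to run such a test, using a max-statistic correction across bins (the function name and data shapes are our own illustration, not the authors' code):

```python
import numpy as np

def onset_time(cond_a, cond_b, times, n_perm=1000, alpha=0.05, seed=0):
    """Estimate the first time bin where two sets of trials differ,
    using a permutation test with max-statistic correction across bins.

    cond_a, cond_b : arrays of shape (n_trials, n_bins), e.g. activity of
        cells tuned to the chosen vs. the non-chosen target.
    times : array of bin centres (ms relative to target/saccade onset).
    Returns the earliest significant time, or None if never significant.
    """
    rng = np.random.default_rng(seed)
    obs = np.abs(cond_a.mean(0) - cond_b.mean(0))        # per-bin statistic
    pooled = np.vstack([cond_a, cond_b])
    n_a = len(cond_a)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(len(pooled))               # shuffle trial labels
        d = np.abs(pooled[idx[:n_a]].mean(0) - pooled[idx[n_a:]].mean(0))
        max_null[i] = d.max()                            # max over bins controls FWER
    thresh = np.quantile(max_null, 1 - alpha)
    sig = obs > thresh
    return times[np.argmax(sig)] if sig.any() else None
```

Taking the maximum over bins on each shuffle yields a single null distribution whose threshold holds simultaneously for all bins, which is the family-wise correction this kind of onset analysis needs.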

On choice trials, there is a simultaneous onset of activity in the two areas of the relative action value map that correspond to the locations of the two target options (Figure 6B). Throughout the task, a robust representation of both options is maintained in SEF, even after the divergence of activity that indicates the chosen option and action (Figure 4C,D; Figure 6B,C). However, during the initial rise in activity the SEF population does not indicate the value of the target in its PD. At the time when the neurons start to differentiate their activity according to the value of the target in their preferred direction, they also reflect the value of the target in the NPD. This can be seen very clearly in Figure 4D, which shows the activity of SEF neurons for PD targets of medium value. Depending on the value of the NPD target, the activity starts to change ~110–120 ms after target onset. However, it took this much time for the SEF neurons to indicate value even during no-choice trials, when there was no competing target. It seems therefore that SEF neurons always indicate relative action value. There is no time period in which two populations of SEF neurons represent the absolute action value of a target independent of the value of any competing target. Nevertheless, there is clear evidence of a progression from an initial undifferentiated state to an increasingly differentiated state, in which the influence of the chosen action value on the neuronal activity increases and the influence of the non-chosen one decreases (Figure 4D; Figure 6B,C). This indicates a dynamic process, as would be expected from a decision mechanism driven by competition via inhibitory interactions.

Our results therefore support the idea that an ongoing process of inhibitory competition underlies the action selection. SEF neurons might directly participate in this action selection process, or at least reflect it.

Instantaneous changes in SEF activity state space reflect decision process

So far, all analyses have been performed using individual SEF neurons or comparisons of specific subsets of neurons. However, the decision process should also manifest itself in the dynamic changes in the instantaneous activity distribution across the entire SEF population. To study how the SEF population dynamically encodes the task variables underlying the monkeys’ behavior, we analyzed the average population responses as trajectories in the neural state space (Yu et al., 2009; Shenoy et al., 2011). As the previous analyses show, the activity patterns of individual SEF neurons during the decision process are not completely independent of each other, but follow particular patterns. The movement of SEF activity state trajectories in a lower-dimensional subspace therefore captured most of the relationship between the SEF activity state and the behaviorally relevant task variables (Figure 7). We estimated this task-related subspace by using linear regression to define three orthogonal axes: chosen saccade direction along the horizontal and vertical dimensions, and value of the chosen option (Mante et al., 2013).
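The subspace estimation follows the targeted dimensionality reduction of Mante et al. (2013): each neuron's rate is regressed on the task variables, and the per-neuron coefficient vectors define the axes, which are then orthogonalized. A minimal Python sketch under simplifying assumptions (trial-averaged rates, QR decomposition for orthogonalization; function names and data layout are illustrative, not the authors' pipeline):

```python
import numpy as np

def task_subspace(rates, horiz, vert, value):
    """Estimate a 3-D task-related subspace by per-neuron regression.

    rates : (n_trials, n_neurons) trial-averaged firing rates.
    horiz, vert, value : per-trial task variables (chosen saccade
        direction components and chosen-option value).
    Returns an (n_neurons, 3) matrix of orthonormal task axes.
    """
    X = np.column_stack([horiz, vert, value, np.ones(len(horiz))])
    # Per-neuron regression coefficients: rates ~ X @ B
    B, *_ = np.linalg.lstsq(X, rates, rcond=None)
    axes = B[:3].T                      # (n_neurons, 3) raw regression axes
    Q, _ = np.linalg.qr(axes)           # orthogonalize (Gram-Schmidt style)
    return Q

def project(rates, Q):
    """Project population activity onto the task axes -> state-space trajectory."""
    return rates @ Q
```

Projecting the time-resolved population activity with `project` then yields the trajectories plotted in Figure 7.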

Figure 7 with 2 supplements see all
Dynamics of SEF population activity trajectories in state space during decision making.

The average population response for a given condition and time period (10 ms) is represented as a point in state space. Responses are shown from 200 ms before to 10 ms after saccade onset. The time of saccade initiation is indicated by the larger dot. The four different chosen saccade directions are indicated by different colors (up right: red; down right: orange; up left: black; down left: blue) and the value of the chosen target by line style (high value (value>=0.7): solid line; medium value (value<0.7 and value>0.3): dashed line; low value (value<=0.3): dotted line). (A) Trajectories of up-left and down-right movements in the value and horizontal (left/right) subspace for three different values. (B) Trajectories of movements in the value and action subspace. (C) The effect of the chosen option value on the state space trajectory at saccade onset. The subjective value of each chosen option was measured relative to the option with the smallest chosen value. The Euclidean distance in 3-D task space between the state vectors of each pair of chosen options increased as a function of their difference in subjective value. The significance of the relationship between difference in Euclidean distance and value was tested using a regression analysis (t-test; the p-value indicates the probability that the regression slope is significantly different from zero). (D) The effect of the non-chosen option value on the state space trajectory at saccade onset. For trajectories with fixed saccade direction and chosen option value, the difference in Euclidean distance increased as a function of the difference in subjective value of each non-chosen option relative to the option with the largest non-chosen value. SEF, supplementary eye field.

https://doi.org/10.7554/eLife.09418.018

First, we compare only trajectories for saccades in two different directions and at three different value levels in a simplified two-dimensional space spanned by the value axis and the vertical movement direction axis (Figure 7A). The trajectories of upward and downward saccades (indicated by the different colors) are clearly well separated along the direction axis. In addition, the trajectories for different chosen values are also separated from each other along the value axis (indicated by different line styles). Thus, the trajectories move in an orderly fashion with respect to the two task-related axes. As a result of the separation across both axes, the trajectories reach six different points in state space when the respective saccade is initiated.

Similarly, in the full three-dimensional (3-D) task space, the trajectories for all four directions and the different chosen values are also well separated from each other and do not converge (Figure 7B). As a result, the trajectories reach different positions in 3-D task space at the moment of saccade initiation, and their distance is significantly correlated with the difference in chosen value for all four saccade directions (Figure 7C). Our previous analysis suggests that the neurons encoding the relative action value of saccades in different directions compete through mutual inhibition. This mutual inhibition should change the direction and endpoint of the different trajectories, so that they depend not only on the saccade direction and subjective value of the chosen target, but also on the value of the non-chosen target. To test this, for a fixed direction and chosen value, we computed the distance between the trajectory with the largest non-chosen value and all other trajectories with decreasing non-chosen values. The regression analysis shows that, when saccade direction and chosen value are fixed, the distance between trajectories is significantly modulated by the non-chosen value (Figure 7D). The larger the difference in non-chosen value, the further apart the trajectories are when the saccade is initiated. This can also be observed by comparing population activity trajectories in the state space (Figure 7—figure supplement 1). Thus, the trajectories in state space reflect both the chosen and the non-chosen target values. In this context, no-choice trials could be considered as choice trials with zero non-chosen value. In this case, one would expect the trajectory to be similar to trials with a large chosen and a small non-chosen value. However, this is not the case (Figure 7—figure supplement 1). Although trajectories for no-choice trials are also modulated by both value and direction, they always reach a point along both the value and direction axes that is less extended than during choice trials.
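The distance analysis in Figure 7C,D reduces to computing pairwise Euclidean distances between trajectory endpoints and regressing them on the corresponding differences in subjective value. A sketch of that computation in Python (the endpoint data and function name are hypothetical illustrations, not the authors' code):

```python
import numpy as np

def distance_value_regression(endpoints, values):
    """Test whether state-space separation scales with value difference.

    endpoints : (n_conditions, 3) state vectors at saccade onset in the
        3-D task space (one per condition).
    values : (n_conditions,) subjective value associated with each condition.
    For every pair of conditions, compute the Euclidean distance between
    endpoints and the absolute value difference, then fit a line.
    Returns (slope, pairwise value diffs, pairwise distances).
    """
    n = len(values)
    dv, dist = [], []
    for i in range(n):
        for j in range(i + 1, n):
            dv.append(abs(values[i] - values[j]))
            dist.append(np.linalg.norm(endpoints[i] - endpoints[j]))
    slope, _ = np.polyfit(dv, dist, 1)      # linear regression of distance on dV
    return slope, np.array(dv), np.array(dist)
```

A significantly positive slope corresponds to the relationship reported in the figure: the larger the value difference, the further apart the endpoints.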

Lastly, we asked whether the sequential value and action selection can also be observed in the state space analysis. An indication of this can be seen in Figure 7A. The trajectories first start to separate along the value axis, before they separate along the direction axis. Consistent with this observation and the single neuron analysis, the variance explained by the value axis increased earlier than the variance explained by the saccade direction axis (Figure 7—figure supplement 2).

Discussion

Our results indicate that SEF represents the relative action value of all possible saccades, forming a relative action value map. During value-based decisions, the SEF population first encodes the chosen gamble option and only later the chosen direction. Our data suggest that neural activity in SEF reflects the action selection, the second step in the decision process. This selection process likely occurs through competitive inhibition between groups of neurons carrying relative action value signals for different saccades. This inhibition could occur locally between SEF neurons, either through mutual inhibition between different SEF neurons within the relative action value map, or through a global pool of inhibitory neurons that receive input from excitatory neurons in SEF (Schlag et al., 1998; Wang, 2008; Nassi et al., 2015). Alternatively, the neural activity in SEF and the competitive process manifested by it could reflect shared signals within the more distributed action selection network that SEF is part of. Similar action value signals have been reported in lateral prefrontal cortex (Matsumoto et al., 2003; Wallis and Miller, 2003), anterior cingulate cortex (Matsumoto et al., 2003), and basal ganglia (Samejima et al., 2005; Lau and Glimcher, 2008). Of course, it is also possible that the action selection involves both interactions between neurons in a larger network and local inhibitory interactions. Future perturbation experiments will be required to test whether SEF plays a causal role in decision making, and whether the relative action value encoding is at least partly the result of local inhibitory mechanisms or only reflects input from connected brain regions. Independent of these considerations, our results allow us to draw some conclusions about the basic functional architecture of decision making in the brain.
Specifically, they invalidate a number of previously suggested decision models and instead support a new sequential model of decision making.

Currently, three major hypotheses about the mechanism underlying value-based decision making have been suggested: the goods-based model (Figure 1A), the action-based model (Figure 1B), and the distributed consensus model (Figure 1C). Our gamble experiment design allows us to test these models by dissociating the value selection process from the action selection process. Due to the uncertainty of reward for each individual gamble, the large number of gamble option pairs, and the fact that each gamble option pairing could be presented in multiple spatial configurations, the task design prevented the subjects from making direct associations between the visual representation of the gamble options and the action choice. Therefore, the task required on each trial a goods-based as well as an action-based selection.

Our findings support none of the previously suggested models. First, the pure goods-based model of decision making would predict that action-related representations are downstream of the decision stage and should therefore only represent the decision outcome (i.e. the chosen action; see Figure 1A). However, we found evidence for competition between relative action value-encoding neurons in SEF that is spatially organized, that is, in an action-based frame of reference (Figure 4). This, together with the fact that we find activity corresponding to both response options (Figure 6), clearly rules out the pure goods-based model (Figure 1A). Second, the pure action-based model would predict that the competition happens only in the action value space. In Figure 1B, this is indicated by the absence of inhibitory connections between the nodes representing the reward options (or goods). This model therefore predicts that information about the chosen saccade direction should appear simultaneously with, or even slightly earlier than, information about the chosen reward option, since the selected action value signal contains both direction and value information. However, this prediction is contradicted by our observation that the chosen value information is present earlier than the chosen action information (Figure 3, Figure 5 and Figure 7). Thus, there is a moment during the decision-making process (100–50 ms before saccade onset) when the SEF neurons encode which option is chosen, but not yet which saccade will be chosen. This could never happen in an action-selection model of decision making. Lastly, the distributed consensus model (Figure 1C) suggests strong recurrent connections from the action selection level back to the option selection level. This reciprocal interaction should lead to a synchronization of the selection process in both stages, so that the chosen gamble option and chosen action should be selected simultaneously.
This prediction is clearly not supported by our findings, given the robust 100 ms time difference in the onset of chosen option and direction information.

Instead, our data are most consistent with a model that predicts selection at both the option and the action representation level, but with asymmetric connections between them, so that the option selection level influences the action selection level, but not vice versa. This is the sequential model of decision making (Figure 1D). According to this model, value-based decisions require two different selection processes within two different representational spaces. First, a preferred option has to be chosen within an offer or goods space by comparing value representations (Padoa-Schioppa, 2011). Second, within an action space the response has to be chosen that will most likely bring about the preferred option. A similar sequence has been found when initially only information relevant for the selection of an economic good is provided, and information relevant for the selection of an action is only given after a delay (Cai and Padoa-Schioppa, 2014). Here, we show that this sequence is obligatory, since it occurs even when information guiding the selection of goods and actions is given simultaneously. This suggests that these two selection steps are related, but functionally independent from each other and involve different brain circuits. This explains why evidence for decision-related neural activity has been found at both selection stages (Shadlen et al., 2008; Cisek and Kalaska, 2010; Wunderlich et al., 2010; Padoa-Schioppa, 2011; Cisek, 2012). A similar separation between stimulus categorization and action selection has also been found in other decision processes (Schall, 2013). The competition between subjective value representations most likely takes place in OFC (Padoa-Schioppa and Assad, 2006) and vmPFC (Wunderlich et al., 2010; Lim et al., 2011; Strait et al., 2014), while the competition between action value representations takes place in SEF and DLPFC (Wallis and Miller, 2003; Kim et al., 2008).
The selection of action value signals in turn can influence the neural activity in primarily motor-related areas, such as FEF and SC that encode the final commitment to a particular course of action (Schall et al., 2002; Brown et al., 2008; Thura and Cisek, 2014).

It has been suggested that different neural and functional architectures underlie different types of value-based decision making (Hunt et al., 2013). In contrast, our sequential decision model predicts that decisions are always made using the same decision architecture: an initial goods-based selection, followed by an action-based selection, because both stages are necessary and not functionally interchangeable. Nevertheless, the relative importance of each selection stage likely depends on the behavioral context. To understand strictly economic behavior (such as savings behavior or consumption of goods), the goods selection is the more important step. Preferred options can be selected without knowledge about the action necessary to indicate the chosen option (Gold and Shadlen, 2003; Bennur and Gold, 2011; Grabenhorst et al., 2012; Cai and Padoa-Schioppa, 2014). In such situations, there is no evidence of ongoing competition between potential actions (Cai and Padoa-Schioppa, 2012, 2014). Obtaining a desired good is typically considered to be a trivial act in a well-functioning market (Padoa-Schioppa, 2011). On the other hand, during perceptual or rule-based decision making, the action selection is the most important step in the decision process, because only one type of good can be achieved by engaging in the task. An example is competitive games such as chess, where the goal (checkmate) is clear and implicitly chosen when a player starts a game, but where the player still has to find the most appropriate actions to achieve this goal. This implies that within different behavioral contexts, different elements of the decision circuit become critical. Altogether, we think that actual behavior under a wide range of different conditions is best understood by a model that respects that behavioral choices are the result of two independent and functionally different selection mechanisms.

Materials and methods

Two rhesus monkeys (both male; monkey A: 7.5 kg, monkey I: 7.2 kg) were trained to perform the tasks used in this study. This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. All the animals were handled according to approved institutional animal care and use committee (IACUC) protocols (PR13A337) of Johns Hopkins University.

Behavioral task

In the gambling task, the monkeys had to make saccades to peripheral targets that were associated with different amounts of reward (Figure 2A). The targets were squares of various colors, 2.25×2.25° in size. They were always presented 10° away from the central fixation point at a 45, 135, 225, or 315° angle. There were seven different gamble targets (Figure 2B), each consisting of two colors corresponding to the two possible reward amounts. The portion of the target filled with each color corresponded to the probability of receiving the corresponding reward amount. Four different colors indicated four different reward amounts (increasing from 1, 3, 5 to 9 units of water, where 1 unit equaled 30 µl). The minimum reward amount for the gamble options was always 1 unit of water (indicated by cyan), while the maximum reward amount ranged from 3 (red) and 5 (blue) to 9 units (green), with three different probabilities of receiving the maximum (20, 40, and 80%). This resulted in a set of gambles whose expected values on the diagonal axis were identical, as shown in the matrix (Figure 2B).
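The expected value of each gamble follows directly from this description. A minimal sketch, using the reward units and probabilities given above (the function name is our own), which also illustrates the diagonal structure of the gamble matrix: for instance, an 80% chance of 3 units, a 40% chance of 5 units, and a 20% chance of 9 units all share the same expected value.

```python
def expected_value(p_max, max_units, min_units=1):
    """Expected value (in water units, 1 unit = 30 ul) of a gamble that
    pays max_units with probability p_max and min_units otherwise."""
    return p_max * max_units + (1 - p_max) * min_units

# Expected values for all maximum amounts (3, 5, 9 units) and
# probabilities of the maximum (20%, 40%, 80%):
for p in (0.2, 0.4, 0.8):
    print(p, [round(expected_value(p, m), 2) for m in (3, 5, 9)])
```

Note that subjective values were not taken from these expected values but estimated behaviorally with MLDS, as described below.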

The task consisted of two types of trials: choice trials and no-choice trials. All trials started with the appearance of a fixation point at the center of the screen (Figure 2B), on which the monkeys were required to fixate for 500–1000 ms. In choice trials, two targets appeared at two randomly chosen locations across the four quadrants. Simultaneously, the fixation point disappeared, and within 1000 ms the monkeys had to choose between the gambles by making a saccade toward one of the targets. Following the choice, the non-chosen target disappeared from the screen. The monkeys were required to keep fixating on the chosen target for 500–600 ms, after which the target changed color: the two-colored square changed into a single-colored square associated with the final reward amount. This indicated the result of the gamble to the monkeys. The monkeys were required to continue to fixate on the target for another 300 ms until the reward was delivered. Each gamble option was paired with all six other gamble options. This resulted in 21 different combinations of options that were offered in choice trials. The sequence of events in no-choice trials was the same as in choice trials, except that only one target was presented. In those trials, the monkeys were forced to make a saccade to the given target. All seven gamble options were presented during no-choice trials.

We presented no-choice and choice trials mixed together in blocks of 28 trials that consisted of 21 choice trials and 7 no-choice trials. Within a block, the order of trials was randomized. The locations of the targets in each trial were also randomized, which prevented the monkeys from preparing a movement toward a certain direction before the target appearance.

For reward delivery, we used an in-house-built fluid delivery system. The system was based on two syringe pumps connected to a fluid container. A piston in the middle of the two syringes was connected to the plunger of each syringe. The movement of the piston in one direction pressed the plunger of one syringe inward and ejected fluid. At the same time, it pulled the plunger of the other syringe outward and drew fluid into that syringe from the fluid container. The position of the piston was controlled by a stepper motor. In this way, the size of the piston movement controlled the amount of fluid that was ejected from one of the syringes. The accuracy of the fluid delivery was high across the entire range of fluid amounts used in the experiment, because we used relatively small syringes (10 ml). Importantly, it was also constant across the duration of the experiment, unlike conventional gravity-based solenoid systems.

Estimation of subjective value of gamble options

We used Maximum Likelihood Difference Scaling (MLDS) (Maloney and Yang, 2003; Kingdom and Prins, 2010) to estimate the subjective value of the different targets. MLDS is an optimization procedure that finds the best estimate of the subjective values and internal noise by maximizing the across-trial likelihood, which is defined as:

(1) $L\big(\psi(1), \psi(2), \ldots, \psi(N), \sigma_d \mid \mathbf{r}\big) = \sum_{k=1}^{T} \log_e p\big(r_k \mid D_k;\, \psi(1), \psi(2), \ldots, \psi(N), \sigma_d\big)$

where the ψ(i) are the subjective values of all the targets, σd is the internal noise, rk is the response (chosen: 1 or non-chosen: 0) on the kth trial, Dk is the estimated subjective value difference between the two targets in the kth trial given the set of subjective values and the internal noise, r is the full set of responses across all trials, and T is the number of trials. We performed the MLDS using the Matlab-based toolbox 'Palamedes' developed by Prins and Kingdom (Kingdom and Prins, 2010).
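Equation 1 can be made concrete with a small sketch. Here we assume, as is standard for MLDS-style models, that the probability of choosing a target is a cumulative Gaussian of the subjective-value difference scaled by the internal noise σd; the function below is our own illustration, not the Palamedes implementation:

```python
import math

def mlds_loglik(psi, sigma_d, trials):
    """Across-trial log-likelihood for an MLDS-style choice model.

    psi : sequence mapping target index -> subjective value.
    sigma_d : internal noise (standard deviation).
    trials : iterable of (i, j, r) tuples -- targets i and j offered;
        r = 1 if target i was chosen, 0 if target j was chosen.
    """
    def phi(x):  # standard normal CDF via the error function
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    ll = 0.0
    for i, j, r in trials:
        d = psi[i] - psi[j]                    # subjective value difference D_k
        p_choose_i = phi(d / sigma_d)          # probability of choosing i
        ll += math.log(p_choose_i if r == 1 else 1.0 - p_choose_i)
    return ll
```

Maximizing this quantity over the ψ(i) and σd (e.g. with a numerical optimizer) yields the subjective value estimates used in the paper.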

Neurophysiological methods and data analysis

After training, we placed a hexagonal chamber (29 mm in diameter) centered over the midline, 28 mm (monkey A) and 27 mm (monkey I) anterior of the interaural line. During each recording session, single units were recorded using 1–4 tungsten microelectrodes with an impedance of 2–4 MΩ (Frederick Haer, Bowdoinham, ME). The microelectrodes were advanced using a self-built microdrive system. Data were collected using the PLEXON system (Plexon, Inc., Dallas, TX). Up to four template spikes were identified using principal component analysis. The time stamps and local field potential were then collected at a sampling rate of 1,000 Hz. Data were subsequently analyzed off-line to ensure that only single units were included in subsequent analyses. We used custom software written in Matlab (Mathworks, Natick, MA), which is available at the following GitHub repository: https://github.com/XMoChen/Sequential-good-and-action-selection-during-decision-making.

Recording location

To determine the location of the SEF, we obtained magnetic resonance images (MRI) for monkey A and monkey I. A 3-D model of the brain was constructed using MIPAV (BIRSS, NIH) and custom Matlab code. As an anatomical landmark, we used the location of the branch of the arcuate sulcus. The locations of the neural recording sites are shown in Figure 2—figure supplement 1. In both monkeys, we found neurons active during the saccade preparation period in the region from 0 to 11 mm anterior to the genu of the arcuate branch and within 5 mm to 2 mm of the longitudinal fissure. We designated these neurons as belonging to the SEF, consistent with previous studies from our lab and the existing literature (Tehovnik et al., 2000; So and Stuphorn, 2010).

Task-related neurons

We used several criteria to determine whether a neuron was task related. To test whether a neuron was active while the monkey generated saccades to the targets, we analyzed the neural activity in the time period between target onset and saccade initiation. We performed a permutation test on the spike rate in 50 ms intervals throughout the saccade preparation period (150–0 ms before saccade onset or 50–200 ms after target onset) against the baseline period (200–150 ms prior to target onset). If the p-value was ≤0.05 for any of the intervals, the cell was determined to have activity significantly different from baseline. Out of 516 neurons, 353 were classified as task-related using these criteria.

Furthermore, we used a more stringent way to define task-related neurons by fitting a family of regression models to the neural activity and determining the best-fitting model (So and Stuphorn, 2010).

The influence of value (V) on neuronal activity was described using a sigmoid function

(2) $f(V) = \frac{b_1}{1 + e^{-s(V - t)}}$

where b1 is the weight coefficient, s (s ∈ (0, 1)) is the steepness, and t (t ∈ (0, 1)) is the threshold value. Often, the influence of expected value on neuronal activity is described using a linear function. However, SEF neurons are better described using a sigmoid function. The reasons for this are twofold: 1) many SEF neurons actually had a ‘curved’ relationship with increasing value (So and Stuphorn, 2010); 2) more importantly, a substantial number of value-related SEF neurons showed floor or ceiling effects, that is, they showed no modulation for value increases in a certain range, but started to indicate value above or below a certain threshold. In addition, the sigmoid function is flexible enough to easily approximate linear value coding. In Equation 2, by setting t = 0.5 and b1 > 1, the relatively linear part of the sigmoid function can be used for value coding. Thus, the sigmoid function is flexible enough to fit the behavior of a large number of neurons with monotonically increasing or decreasing activity for varying value (including linearly related ones).

The influence of saccade direction (D) on neuronal activity was described using a circular Gaussian function

(3) $g(D) = b_2 \times e^{\,w\,[\cos(D - p) - 1]}$

where b2 is the weight coefficient, w (w ∈ (0, 4π]) is the tuning width, and p (p ∈ [0, 2π]) is the PD of the neuron.

The interaction of value and direction was described using the product of f(V) and g(D)

(4) $h(V, D) = f(V) \times g(D) = b_3 \times \frac{1}{1 + e^{-s(V - t)}} \times e^{\,w\,[\cos(D - p) - 1]}$

where b3 is the weight coefficient.

For each neuron, we fitted the average neuronal activity around the saccade (50 ms before saccade onset to 20 ms after saccade onset) on each no-choice trial with all possible linear combinations of the three terms f(V), g(D), and h(V,D), as well as with a simple constant model (b0). We identified the best-fitting model for each neuron by finding the model with the minimum Bayesian information criterion (Burnham and Anderson, 2002; Busemeyer and Diederich, 2010)

(5) $BIC = n \times \log\!\left(\frac{RSS}{n}\right) + k \times \log(n)$

where n is the number of trials (a constant in our case), and RSS is the residual sum of squares after fitting. We used a loosely defined BIC in order to include more neurons in the analysis, where k is the number of independent variables in the equation rather than the number of parameters in the regression model. A lower BIC value indicates that the model fits the data better: a lower residual sum of squares indicates better predictive power, while a larger k penalizes less parsimonious models. All neurons with a lower BIC value than the baseline model containing only a constant (b0) were considered task related. Among the 353 task-related neurons, 128 neurons were further classified as directionally tuned and were used in the following analyses.
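The model family and the BIC comparison can be sketched compactly in Python. The functions below implement Equations 2, 3, and 5 as written above; the parameter values in any usage are arbitrary illustrations, and the actual pipeline (searching all term combinations and fitting by least squares) is not reproduced here:

```python
import numpy as np

def f_value(V, b1, s, t):
    """Sigmoid value term (Equation 2)."""
    return b1 / (1.0 + np.exp(-s * (V - t)))

def g_dir(D, b2, w, p):
    """Circular Gaussian direction term (Equation 3); peaks at D == p."""
    return b2 * np.exp(w * (np.cos(D - p) - 1.0))

def bic(rss, n, k):
    """Loosely defined BIC (Equation 5); k counts independent variables."""
    return n * np.log(rss / n) + k * np.log(n)
```

For a strongly direction-tuned neuron, the RSS of the direction model is far smaller than that of a constant model, so its BIC is lower despite the penalty term, and the neuron is classified as directionally tuned.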

All neurons were tested with all 21 gamble option combinations and at least the four diagonal direction combinations, in which the two targets were 180° apart. One hundred and six neurons (26 from monkey A and 80 from monkey I) were tested with no fewer than 8 out of 12 direction combinations (the 4 diagonal combinations and the 4 combinations with targets 90° apart within the same visual hemifield), and 86 neurons (6 from monkey A and 80 from monkey I) were tested with all 12 direction combinations.

Averages of neural activity across the entire population of all 128 directionally tuned SEF neurons were computed after each individual neuron's activity was normalized: we searched for the minimum and maximum activity across all choice and no-choice trial conditions and set the minimum activity to 0 and the maximum activity to 1. The only exception is the construction of the relative action value map, where we used a slightly different definition of the zero reference point.
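The normalization step can be sketched as a one-line min–max rescaling (Python, as an illustrative stand-in for the study's Matlab code):

```python
import numpy as np

def normalize(rates):
    # rates: mean firing rates of one neuron across all choice and
    # no-choice conditions; map the minimum to 0 and the maximum to 1
    lo, hi = rates.min(), rates.max()
    return (rates - lo) / (hi - lo)
```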

Relative action value map

The normalized time-direction maps show the population activity of all directional SEF neurons based on their preferred direction relative to the chosen and non-chosen target (Cisek and Kalaska, 2005). For each neuron, we computed the mean firing rate separately for all 16 combinations of choices and target configurations (choice trials: 12; no-choice trials: 4). The neuron's firing rate was normalized by setting the baseline activity (mean activity between 50 and 0 ms before target onset, averaged across the 16 conditions) to 0 and the maximum activity across all 16 conditions to 1. Each cell's preferred direction was defined by the circular Gaussian term in the best fitting model in the BIC analysis. Population data were displayed as a 2D color plot, in which the spike density functions of the neurons were sorted along the vertical axis according to their preferred direction with respect to the location of the selected target. Because the PDs were unevenly distributed, the resulting matrix sampled the relative action value map unevenly; the sorted matrix was therefore smoothed by linear interpolation in steps of 7.2°. The horizontal axis shows the development of the population activity across time, aligned to either target or movement onset. For all population maps, the same baseline activity and maximum activity were used for each neuron.

Neural decoding of chosen reward option and direction

Binary linear classification was performed using Matlab toolboxes and custom code. The analysis was performed on neural activity from 200 ms before to 20 ms after movement onset at 1 ms time resolution. For each neuron, we used the neural activity in those choice trials in which the monkey chose a particular value or direction to train the classifier (around half of the trials for direction, and different numbers of trials for different values, depending on the monkeys' choice behavior). We then used the classifier to predict either the direction or the value of the chosen target on each trial. When predicting the chosen direction, for example, there are two target locations in a choice trial. We used the neural activity in all choice trials in which the monkey chose either of the two target locations to train the classifier, and then used the optimized classifier to predict the chosen saccade direction from the observed neural activity in a particular trial. The overall classification accuracy was calculated by averaging across all trials for each neuron. We used a permutation test, in which we shuffled the chosen and non-chosen target value and direction, to test whether the classification accuracy was significant (1000 shuffles; p < 0.05).
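The label-shuffling step of the permutation test can be sketched as follows (Python; a simplified illustration of the procedure rather than the study's actual classifier pipeline — the classifier's predictions are taken as given here):

```python
import numpy as np

def permutation_p(labels, predictions, n_shuffles=1000, seed=0):
    # Significance of decoding accuracy: compare the observed accuracy
    # against accuracies computed after shuffling the trial labels
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    predictions = np.asarray(predictions)
    observed = np.mean(predictions == labels)
    null = np.array([np.mean(predictions == rng.permutation(labels))
                     for _ in range(n_shuffles)])
    # add-one correction keeps the p-value strictly positive
    return (np.sum(null >= observed) + 1) / (n_shuffles + 1)
```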

Mutual information

In order to compare the relative strength of the relationship between neural activity and saccade value and direction, we calculated separately for each neuron the mutual information between neural activity and chosen and non-chosen value or direction, respectively. To capture the dynamics of value and direction encoding, we performed the calculation repeatedly for consecutive time periods during saccade preparation using spike density at a 1 ms time resolution.

To reduce the bias in estimating the mutual information and to make the estimates comparable between trial conditions, we discretized the neural activity in the same way for all conditions. During no-choice trials, we sampled the space of possible values and directions evenly, in contrast to choice trials, where the values and directions depended on the monkey's preferences. We assumed that the neural activity in no-choice trials captured how a neuron encoded value and direction information, and we therefore used the neural activity in no-choice trials to determine the bins for neural activity in all trial conditions. We set the number of bins for neural activity (NF) to four. At each time window, we collected the mean firing rates (F) from every no-choice trial and divided them into four bins, each holding an equal number of no-choice trials. This yielded Q1, Q2, and Q3 as the boundaries of the four quartiles. For all trial conditions, at the same time window, neural activity below Q1 was classified as F1, between Q1 and Q2 as F2, between Q2 and Q3 as F3, and above Q3 as F4.
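The quartile binning can be sketched in a few lines (Python, illustrative only):

```python
import numpy as np

def quartile_bins(no_choice_rates):
    # Bin edges Q1, Q2, Q3 chosen so the no-choice trials split into
    # four bins with (approximately) equal trial counts
    return np.percentile(no_choice_rates, [25, 50, 75])

def discretize(rates, edges):
    # Map firing rates from any trial condition to states F1..F4
    # (encoded 0..3) using the edges derived from no-choice trials
    return np.searchsorted(edges, rates, side='right')
```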

The mutual information between neural activity F and the variable X, which in our case can be the chosen or non-chosen value or the chosen or non-chosen direction, was approximated as follows:

(6) I(F, X) = [Σ(i=1 to NF) Σ(j=1 to NX) (Mij/M) × log2((Mij × M) / (Mi· × M·j))] − Bias

where Mij is the number of trials having both Fi and Xj, Mi· is the number of trials having Fi, and M·j is the number of trials having Xj. M is the total number of trials. As mentioned above, we set NF, the number of distinct states of neural activity, to four. In the case of direction, we set NX, the number of distinct states of the relevant variable, to four, because we tested four different saccade directions. In the case of value, we tested seven different values. However, distinguishing seven different value levels would have resulted in different maximum amounts of mutual information for the two variables (direction: 2.00 bits; value: 2.81 bits). This would have led to an overestimation of value information relative to directional information. In order to make the value and direction information estimates directly comparable, we set NX for value to four as well. In grouping the seven different values into four bins, we followed the same binning procedure as for the neural activity: the chosen values were divided into four quartiles so that each bin held an equal number of no-choice trials. We computed a first approximation of the bias as follows:

(7) Bias = (UFX − UF − UX + 1) / (2 × M × ln 2)

where UFX is the number of nonzero Mij values over all i and j, UF is the number of nonzero Mi· values over all i, and UX is the number of nonzero M·j values over all j. This procedure follows the approach described in Ito and Doya (2009).

Finally, we performed a bootstrap procedure to test whether the amount of mutual information was significant and to further reduce any remaining bias. We generated random sets of (Fi, Xj) pairs by permuting the F and X arrays. We calculated the mutual information between the permuted F and X using the method described above and repeated this process 100 times. The mean of the mutual information obtained from these 100 permutations represented the remaining bias and was subtracted from I(F, X). To test whether the final estimate of the mutual information was significant (p < 0.05), we compared it with the sixth highest value obtained from the 100 permutations. If it was not significant, we set the mutual information to zero. The bias reduction sometimes led to negative estimates of mutual information; in that case, we also set the final estimate to zero.
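Equations 6 and 7 together can be sketched as follows (Python; an illustrative re-implementation, not the study's code — it omits the additional 100-permutation correction described above):

```python
import numpy as np

def mutual_information(f, x, n_f=4, n_x=4):
    # Sketch of Equation 6 with the analytic bias term of Equation 7.
    # f, x: integer-coded trial states (0..n_f-1 and 0..n_x-1)
    m = np.zeros((n_f, n_x))
    for fi, xi in zip(f, x):
        m[fi, xi] += 1                     # joint count M_ij
    total = m.sum()                        # M, the total trial count
    pij = m / total
    pi = pij.sum(axis=1, keepdims=True)    # marginal over F states
    pj = pij.sum(axis=0, keepdims=True)    # marginal over X states
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = np.where(m > 0, pij * np.log2(pij / (pi * pj)), 0.0)
    info = terms.sum()
    # Equation 7: Bias = (U_FX - U_F - U_X + 1) / (2 M ln 2)
    u_fx = np.count_nonzero(m)
    u_f = np.count_nonzero(m.sum(axis=1))
    u_x = np.count_nonzero(m.sum(axis=0))
    bias = (u_fx - u_f - u_x + 1) / (2 * total * np.log(2))
    return info - bias
```

With four equiprobable states, perfectly dependent F and X give roughly the 2-bit maximum, while independent variables give an estimate near zero.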

To determine when the SEF population carried different amounts of information about the chosen and non-chosen direction or value, we compared the information about the chosen and non-chosen option across all neurons in each time bin using a paired t-test. We defined the onset of differences in information as the first time bin in which p-values were less than 0.05 for 10 or more consecutive time bins.
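The onset criterion (the first of a run of consecutive significant bins) is a simple scan; a minimal Python sketch:

```python
def divergence_onset(p_values, alpha=0.05, run_length=10):
    # Return the index of the first bin that begins a run of
    # `run_length` consecutive bins with p < alpha, or None if absent
    count = 0
    for t, p in enumerate(p_values):
        count = count + 1 if p < alpha else 0
        if count == run_length:
            return t - run_length + 1
    return None
```

The same routine covers the regression analysis below by setting `run_length=3`.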

Regression analysis

A linear regression was used to determine the temporally evolving contribution of the chosen and non-chosen target to the neural firing rate in choice trials. First, for each neuron, we calculated the mean firing rate on no-choice trials for each direction (Sno-choice(D, t)) or value (Sno-choice(V, t)) for sequential time steps in the trial, using a sliding time window with 20 ms width and 10 ms step size. Then, in the regression analysis, the contribution of the chosen and non-chosen directions was described as:

(8) Schoice(t) = b1 × Sno-choice(Dchosen, t) + b2 × Sno-choice(Dnon-chosen, t)

The contribution of the chosen and non-chosen values was described as:

(9) Schoice(t) = b1 × Sno-choice(Vchosen, t) + b2 × Sno-choice(Vnon-chosen, t)

The data were fitted with a linear least-square fitting routine implemented in Matlab (The Math Works, Natick, MA).
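Equations 8 and 9 are two-regressor least-squares fits; a minimal sketch (Python standing in for the study's Matlab routine):

```python
import numpy as np

def fit_contributions(s_choice, s_chosen, s_nonchosen):
    # Equations 8/9: S_choice(t) ~ b1 * S_no-choice(chosen, t)
    #                             + b2 * S_no-choice(non-chosen, t)
    X = np.column_stack([s_chosen, s_nonchosen])
    coeffs, *_ = np.linalg.lstsq(X, s_choice, rcond=None)
    return coeffs  # [b1, b2]
```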

To determine when the SEF population showed a significant (p < 0.05) difference in the influence of the chosen and non-chosen regression coefficients for direction and value, we performed paired t-tests for each time bin. We defined the onset of differences in the strength of the coefficients as the first time bin in which p-values were less than 0.05 for 3 or more consecutive time bins.

State-space analysis

Population activity can be represented within the state-space framework (Yu et al., 2009; Shenoy et al., 2011). In this framework, the state of activity of all n recorded neurons (i.e., the activity distribution) is represented by a vector in an n-dimensional state space. The successive vectors during a trial form a trajectory in state space that describes the development of the neural activity. Our state-space analysis generally follows the one described in Mante et al. (2013). The main difference is that we did not perform a principal component analysis to reduce the dimensionality of the state space.

To construct population responses, we first computed the average activity of all recorded neurons in both monkeys for each trial condition. Then, we combined the 128 average activity values into a 128-dimensional vector array representing the population activity trajectory in state space for each trial condition. Next, we used linear regression to identify the dimensions in state space containing task-related variance. For the z-scored responses of neuron i at time t, we have:

(10) ri,t(k) = βi,t(1) × chosen_direction_left_right(k) + βi,t(2) × chosen_direction_up_down(k) + βi,t(3) × chosen_value(k) + βi,t(4) × nonchosen_direction_left_right(k) + βi,t(5) × nonchosen_direction_up_down(k) + βi,t(6) × nonchosen_value(k) + βi,t(7)

where ri,t(k) is the z-scored response of neuron i at time t on trial k; chosen_direction_left_right(k) and nonchosen_direction_left_right(k) are the monkey's chosen and non-chosen horizontal directions on trial k (+1: right; −1: left), and chosen_direction_up_down(k) and nonchosen_direction_up_down(k) are the monkey's chosen and non-chosen vertical directions on trial k (+1: up; −1: down). There are six independent variables (var) that can influence the responses of neuron i in Equation 10. To estimate the respective regression coefficients βi,t(var), for var = 1 to 6, we define, for each unit i, a matrix Fi of size Ncoef × Ntrial, where Ncoef is the number of regression coefficients to be estimated and Ntrial is the number of trials recorded for neuron i. The regression coefficients can then be estimated as:

(11) βi,t = (Fi × Fi^T)^(−1) × Fi × ri,t

where βi,t is a vector of length Ncoef with elements βi,t(var), var = 1 to 6, corresponding to the regression coefficients for the task variables at time t for neuron i. For each task variable, we built a vector βvar,t whose entries are the βi,t(var) across neurons. The vector βvar,t corresponds to the direction in state space along which the task variable is represented at the level of the population.
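Equation 11 is the ordinary least-squares normal equation and can be solved directly; a minimal sketch (Python, illustrative stand-in for the Matlab implementation):

```python
import numpy as np

def regression_axes(F, r):
    # Equation 11: beta = (F F^T)^{-1} F r, where F is the
    # N_coef x N_trial design matrix and r the z-scored responses
    return np.linalg.solve(F @ F.T, F @ r)
```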

For each task variable var, we then determined the time tvar_max at which the norm of the corresponding coefficient vector was largest, and defined βvar_max = βvar,tvar_max with tvar_max = argmax_t ||βvar,t||. Last, we orthogonalized the axes of direction and value with QR decomposition. The new axes βvar span the same 'regression subspace' as the original regression vectors; however, each now explains a distinct portion of the variance in the responses. At a specific time t, the projections of the population response onto the time-independent axes are then defined by:

(12) pvar,c = βvar^T × Xc

where pvar,c is the set of time-series vectors over all task variables and conditions, and Xc is the matrix of firing rates in the different trial conditions, of size Nunit × T.
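A minimal sketch of this final step, assuming the per-variable coefficient vectors βvar_max have already been collected column-wise into one matrix (Python, illustrative only):

```python
import numpy as np

def project_population(beta_max, X):
    # beta_max: N_unit x N_var matrix whose columns are the regression
    # vectors at their respective t_var^max; X: N_unit x T condition-
    # averaged firing rates. Orthogonalize the task axes with QR, then
    # project the population response onto them (Equation 12).
    Q, _ = np.linalg.qr(beta_max)
    return Q.T @ X
```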

References

  1. KP Burnham, DR Anderson (2002) Model Selection and Multimodel Inference. New York, NY: Springer.
  2. JR Busemeyer, A Diederich (2010) Cognitive Modeling. Thousand Oaks, CA: SAGE.
  3. P Cisek (2012) Making decisions through a distributed consensus. Current Opinion in Neurobiology 22. https://doi.org/10.1016/j.conb.2012.05.007
  4. JI Gold, MN Shadlen (2003) The influence of behavioral context on the representation of a perceptual decision in developing oculomotor commands. The Journal of Neuroscience 23:632–651.
  5. F Grabenhorst, I Hernádi, W Schultz (2012) Prediction of economic choice by primate amygdala neurons. Proceedings of the National Academy of Sciences of the United States of America 109:18950–18955. https://doi.org/10.1073/pnas.1212706109
  6. MF Huerta, JH Kaas (1990) Supplementary eye field as defined by intracortical microstimulation: connections in macaques. The Journal of Comparative Neurology 293:299–330.
  7. FAA Kingdom, N Prins (2010) Psychophysics: A Practical Introduction. Amsterdam: Academic Press.
  8. JD Schall, V Stuphorn, JW Brown (2002) Monitoring and control of action by the frontal lobes. Neuron 36:309–322.
  9. J Schlag, P Dassonville, M Schlag-Rey (1998) Interaction of the two frontal eye fields before saccade onset. Journal of Neurophysiology 79:64–72.
  10. MN Shadlen, R Kiani, TD Hanks, AK Churchland (2008) Neurobiology of Decision Making, An Intentional Framework. In: C Engel, W Singer, editors. Better Than Conscious? Cambridge, MA: MIT Press. pp. 71–101.
  11. EJ Tehovnik, MA Sommer, IH Chou, WM Slocum, PH Schiller (2000) Eye fields in the frontal lobes of primates. Brain Research. Brain Research Reviews 32:413–448.
  12. JD Wallis, EK Miller (2003) Neuronal activity in primate dorsolateral and orbital prefrontal cortex during performance of a reward preference task. The European Journal of Neuroscience 18:2069–2081.
  13. K Wunderlich, A Rangel, JP O'Doherty (2010) Economic choices can be made using only stimulus values. Proceedings of the National Academy of Sciences of the United States of America 107:15005–15010.

Decision letter

  1. Wolfram Schultz
    Reviewing Editor; University of Cambridge, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your work entitled "Sequential selection of economic good and action in medial frontal cortex of macaques during value-based decisions" for peer review at eLife. Your submission has been favorably evaluated by Timothy Behrens (Senior editor), a Reviewing editor, and two reviewers.

Although both reviewers were supportive of the theme of your work and the experimental procedures and data, they voiced a number of concerns that need to be addressed carefully. You will find these concerns carefully numbered, and we hope you can address them all in a satisfactory way. In addition, the Reviewing editor has made the following comments:

1) It is unclear how you define 'preferred direction' and 'non-preferred direction'. This may be hidden somewhere in the text, but this must be made in a prominent place. The question is whether the preference relates to spatial position or direction (spatial preference), or whether it relates to the larger reward value obtained from movement in a particular direction (preference based on reward).

2) Only when this point has been clarified can we understand what you mean by 'action value': is this the value obtained by a specific action irrespective of that action being performed (the correct definition for an input to a competitive decision mechanism: an action value neuron would reflect that value irrespective of the animal choosing this action; and for each different action there would be a different pool of action value neurons that compete at the input), or is this the value obtained when the animal actually makes this action but not when it doesn't make that action (which is called chosen value, which is usually not considered an input to a competitive decision mechanism). It would be good to present these definitions clearly immediately after having defined 'preferred direction' and 'non-preferred direction'.

3) I would recommend transferring the interpretation of a difference signal in terms of inhibitory interactions between (action value?) neurons from the Results into the Discussion, which would also partly address one of the referees' issues. The results are fine as they are, but their interpretation is debatable in the absence of a visible inhibitory response.

4) In Figure 2A, how did you arrive at the EV values of 1.8, 2.6 and 4.2, and how can you say that all points on the same diagonal have same EV? E.g. 3 x 0.4 = 1.2 but 5 x 0.2 = 1.0 for the lower diagonal, and the other two diagonals present the same issue.

5) I would recommend not to put any abbreviations in the figures, even if they are explained in the legends. This makes the figures less understandable, and the paper less cited. There is always a more informative way to indicate what is required.

Reviewer #1:

Most authors agree that economic decisions entail computing and comparing the subjective values of available options. There is also general agreement that subjective values are computed in the OFC/vmPFC, with the possible participation of other areas such as the amygdala. However, there is no general consensus of where and how exactly subjective values are compared to make a decision. Several models have been put forth in the past few years, as described in Figure 1 of this manuscript. These models are not necessarily mutually exclusive – for example, Paul Cisek who proposed the "distributed consensus" model, also pointed out that economic decisions can be made in a good-based representation (Cisek, 2012). However, even if one accepts the idea that each model has some domain of applicability, data shedding light on the relative extent of these domains are very valuable. In this study Chen and Stuphorn use a clever task design, in which monkeys choose between different gambles. Across trials, the expected values of the chosen and non-chosen gambles vary in ways that allow several insightful analyses. The animals revealed their choices through eye movements and the authors recorded and analyzed the activity of neurons in the supplementary eye fields (SEF). Contrary to most neurophysiology studies, here the trial structure did not impose a delay between the offer and the saccade targets, which provided a measure of the RTs as a function of the decision difficulty. The lack of a delay imposed particular care in the analysis of neuronal data, but the senior author is an expert of this sort of analyses and the approach(es) taken here are adequate. 
The main results of the study are (1) information about both value options and saccade directions is present in the SEF, (2) the decision between values is resolved significantly earlier than that between actions, and (3) in SEF there is evidence of mutual inhibition between the signals associated with different possible saccades (but this effect is not present in signals associated with different possible values). The authors' conclusions are (A) in this task design decisions are made sequentially, first in goods space (selection of one value/option) and then in action space (selection of a suitable action) and (B) SEF participates in the process of action selection, but not in the process of value/option selection. For the reasons outlined above, (A) is particularly notable because the task design did not impose or even invite monkeys to make the primary choice in goods space. In other words, this would have been a perfect situation for an action-based decision, but even in this situation the evidence is against that model. The authors thus propose a new "sequential decision" model, which differs from the good-based model in that both possible actions are computed and selected between. My view of the study is positive overall, although I have several reservations. The experiment was well designed, the data analysis is for the most part credible, and the discussion of the results is mostly appropriate. My main concerns are due to the fact that in a number of circumstances the authors do not provide sufficient information and/or their description is unclear – specific instances are indicated below. I'd like to see a revision of the paper, but if the authors can address these issues adequately, I would consider the study worthy of publication.

1) The authors recorded from 516 neurons but only 128 entered the analysis. I am confused about the selection criteria. It appears that 353 cells were task-related, but only about one third of them made it into the analysis. Was this because the remaining 2/3 did not have a response field? I thought that the majority of neurons in SEF are selective for saccades in some direction. Also, the authors established the "influence" of value using the function in Equation 2. I am not sure where this function came from, but there are many cases in the literature in which value was encoded in a linear way, which is at odds with the present assumption. The authors should elaborate on their criteria and also test alternative forms of value encoding.

2) The evidence showing that neuronal activity is positively related to the value in the PD and negatively related to the value in the NPD is convincing. However, the authors repeatedly state that neurons encode the difference in action values. This is a much more stringent statement and there is no evidence to support it. For example, the influence of NPD could be through mechanisms of divisive normalization akin to those observed in area LIP (Louie et al.). If the value modulation was really according to the value difference, one would expect the two regression coefficients shown in Table 1 to be similar in absolute value. In contrast, the absolute value of the regression coefficient for PD is roughly twice that for NPD. Incidentally, the authors should provide a confidence interval for these coefficients.

3) Throughout the analysis, the authors treat no-choice trials separately from choice trials. But it is not clear that this is the most appropriate way to think about the data. In many studies, no-choice trials are considered forced choices in which simply one of the two values is zero, and all trials are pooled. This is relevant to the issue discussed in the previous point because if the value modulation was according to the value difference, forced choices in the PD should present the highest activity, and forced choices in the NPD should present the lowest activity (all for a given chosen value). I don't see evidence of this in Figure 4, comparing panels A and B, although I am not sure how exactly the activity was normalized for these plots (please describe). Also, it would be good to see the equivalent of Figure 4A for no-choice, NPD trials.

4) To reiterate the point, unless the authors can provide clear evidence for modulation based on value difference as distinguished from other context dependent value modulation, they should remove the phrase "value difference" completely from the manuscript.

5) Figure 4. I presume that all the data shown in the figure are conditioned on the animal choosing the target in the PD (please clarify!). If so, the number of trials contributing to the yellow line in Figure 4B should be fewer than those contributing to the red line in the same panel. Similarly, in Figure 4C, there should be fewer/more trials in the red/yellow line. This difference in the number of trials, which should be very substantial, should translate into a difference in the SEM traces shown in the figure, and I am puzzled by the fact that I don't see any of that.

6) Subsection “SEF neurons reflects value difference between offered gamble options”: I am not convinced by the argument that mutual inhibition implies that the neuronal population participates in the decision (between actions), while lack of such mutual inhibition (a.k.a., menu invariance) implies that this neuronal population does not participate in the decision (between values/options). For example, context effects of the sort described here are relatively minor in area MT, even though neurons in MT clearly participate in, or are upstream of, the decision. Similarly, "offer value" cells in OFC, which are thought to provide the input for value-based decisions, don't show such effects of mutual inhibition. So this is a tricky argument, even though I agree that neurons in SEF are likely downstream of the decision between values while they participate in the decision between actions.

Reviewer #2:

The authors report on a study in which they trained monkeys to make choices among pairs of gambles that differed in reward magnitude and probability. Monkeys chose well, preferring options with larger expected values. They examined neural activity in the SEF while monkeys carried out the task and found that the representation of the chosen option occurred about 120 ms before saccade onset and then decreased before the representation of the chosen direction about 40 ms before saccade onset. They further found that the response to reward magnitude did not scale responses in the unchosen direction; the response in the chosen direction did depend on the target that was not chosen. Further, there was an increase in activity for neurons that represented both the chosen and unchosen options before the activity diverged and began to represent almost only the chosen option. They interpret this effect in terms of a competition model of decision making.

This paper addresses an important question about the contribution of the SEF to choices among gambles. The study was carefully carried out and the results are clearly and thoroughly presented. The task and the analytical approach to the task clarify several points about the representation of choices and actions to obtain those choices. I have several comments, however which, if addressed, could further clarify the results.

1) My two main comments have to do with the interpretation of the data relative to the models. First, how do the authors eliminate the possibility that they are observing a read-out of a choice process (possibly mutual inhibition) that is actually occurring elsewhere? The fact that activity for both options begins to increase and then separate does not clearly rule out this possibility.

2) If I had to choose one of the models in Figure 1 that was most consistent with the data, I would probably choose model B. Can the authors address more directly why their data is more consistent with model D and not model B? Is the evidence for a recurrent choice process of mutual inhibition the presence of activity representing both options for a period before they diverge? What other mechanisms could give rise to this? Is it really just mutual inhibition? Is there evidence for a representation of both choice options for a period before they diverge? An analysis similar to the one shown in Figure 6 for options instead of directions could clarify this.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Sequential selection of economic good and action in medial frontal cortex of macaques during value-based decisions" for further consideration at eLife. Your revised article has been favorably evaluated by Timothy Behrens (Senior editor), a Reviewing editor, and two reviewers. The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

1) Both referees have some remaining issues that we would like you to address carefully.

2) The Reviewing editor is unhappy about your reply concerning action value (your point #2 and the text 'The SEF neurons encode therefore the action value of saccades to the PD target'): of the three types of action value that you mention, only the first type is real action value. The other two types are a combination of action values (your type 2) and straight chosen value (your type 3, which is the opposite of action value, a decision output variable). To avoid confusion in the community, please accept the only valid definition of action value, which comes out of machine learning: the value of an action that is independent of the action being chosen (a decision input variable). To me it looks like you did not find coding of action value of your type 1. Please rephrase your description of the type of value being coded.

Reviewer #1:

My concerns were addressed for the most part. However there remain a few relatively minor issues. The numeration below refers to the points raised in my first review.

1) How many neurons are "task-related"? In the first paragraph of the subsection “Task-related neurons” the authors state 353; in the fifth paragraph they state 362. Also, in the rebuttal, they say that the results reported in the paper, obtained with the smaller population of 128 neurons that pass the BIC criterion, are valid also for the larger population of 353/362 task-related cells. This fact should be reported in the paper (the additional figure could be included as supplementary material).

2) In the rebuttal the authors explain why they fitted responses with a sigmoid. The two reasons – (1) there are cells with "curved" tuning and (2) there are cells with floor/ceiling effects – seem to me one and the same. In any case, the authors should clarify this point in the Methods, not just with this reviewer.

3) Pooling forced-choice trials with other trials. Here the authors did not really address my point. The question is this: if, instead of separating forced choices from other trials, the authors had pooled all trials, as was done in many other studies, would the results presented here be affected in any significant way?

Reviewer #2:

It would be useful if the authors provided additional discussion, in the manuscript, on the predictions they believe each of the models make, and how their data is consistent/inconsistent with each of the models. They have provided this information in the replies to the reviewers, but it needs to be in the manuscript. Otherwise, I have no further comments.

https://doi.org/10.7554/eLife.09418.022

Author response

Although both reviewers were supportive of the theme of your work and the experimental procedures and data, they voiced a number of concerns that need to be addressed carefully. You will find these concerns carefully numbered, and we hope you can address them all in a satisfactory way. In addition, the Reviewing editor has made the following comments:

1) It is unclear how you define 'preferred direction' and 'non-preferred direction'. This may be hidden somewhere in the text, but this must be made in a prominent place. The question is whether the preference relates to spatial position or direction (spatial preference), or whether it relates to the larger reward value obtained from movement in a particular direction (preference based on reward).

We thank the editor for pointing this out. We define “preferred direction” (PD) as the saccade direction for which a neuron is maximally active, irrespective of the reward value obtained by the saccade. PD therefore refers to spatial preference. This is illustrated in the revised Figure 4, which shows the activity of the SEF neurons during no-choice trials, i.e. when only one target appears on the screen. While the SEF neurons are strongly active for saccades into the preferred direction and show value-related modulations (Figure 4A), they are not active for saccades into the non-preferred direction, even when this leads to large reward amounts (Figure 4B). We agree with the reviewer that the definition of ‘preferred direction’ (PD) is important and should be in a prominent place. In the revised manuscript, we added an explanation in the Results section, where we first refer to PD (subsection “SEF neurons reflect the value of both choice options in an opposing way”). In addition, we explain in more detail how PD is estimated in the Methods section of our manuscript. We used a nonlinear regression to estimate value modulation and direction modulation simultaneously for each neuron. The estimation of preferred direction is therefore based on the residual variation in neuronal activity, after the influence of reward was taken into account. Furthermore, in our experimental design, every direction was paired with every possible reward, so that reward amount and direction were dissociated from each other (across different trials).

2) Only when this point has been clarified can we understand what you mean by 'action value': is this the value obtained by a specific action irrespective of that action being performed (the correct definition for an input to a competitive decision mechanism: an action value neuron would reflect that value irrespective of the animal choosing this action; and for each different action there would be a different pool of action value neurons that compete at the input), or is this the value obtained when the animal actually makes this action but not when it doesn't make that action (which is called chosen value, which is usually not considered an input to a competitive decision mechanism). It would be good to present these definitions clearly immediately after having defined 'preferred direction' and 'non-preferred direction'.

We thank the editor for asking us to clarify what we mean by action value. We would define it as neural signals that combine information that specifies a particular type of action (e.g. saccade direction and amplitude) with information about the value associated with the outcome of that action. There are a number of subtypes of action value signals that are related to the different stages of the decision process: 1) signals that represent the value of the alternative actions irrespective of the choice (this is what the editor has in mind, the input on which a decision is based); 2) signals that represent the essential step in decision making, namely the comparison between the action values of the various alternatives – such signals should be positively correlated with the action value of one alternative and negatively correlated with the action values of the other alternatives; 3) signals that indicate the value of the chosen action (the output of the decision process). We have found in SEF ‘action value’ signals of the 2nd and 3rd type, i.e. comparative action value signals that develop into chosen action value signals. Please see our clarification in the new revision (subsection “SEF neurons reflect the value of both choice options in an opposing way”).

3) I would recommend transferring the interpretation of a difference signal in terms of inhibitory interactions between (action value?) neurons from the Results into the Discussion, which would also partly address one of the referees' issues. The results are fine as they are, but their interpretation is debatable in the absence of a visible inhibitory response.

We feel that we show convincingly that the activity of the SEF population is positively correlated with the value of the PD and negatively correlated with the value of the NPD target (Figure 4). We think it is very hard to explain this finding without presuming some form of inhibitory interaction between the alternatives at some stage in the decision process. However, we agree with the editor and the reviewers that there are a number of different possibilities regarding the exact nature of the inhibitory interactions and the stage in the decision process at which they take place. They could simply reflect activity in SEF that is correlated with the decision process that takes place in other brain areas, or they could reflect inhibitory processes within SEF. Likewise, local inhibitory processes could take the form of specific mutual inhibition between SEF neurons with different PD, or global inhibition, or a mixture of both. Accordingly, we followed the suggestion of the editor and shifted text describing the interpretation of the result to the Discussion section. We left some references to inhibitory interactions in the Results section to motivate particular analyses. Without them, we felt the paper would be harder to follow.

4) In Figure 2A, how did you arrive at the EV values of 1.8, 2.6 and 4.2, and how can you say that all points on the same diagonal have same EV? E.g. 3 x 0.4 = 1.2 but 5 x 0.2 = 1.0 for the lower diagonal, and the other two diagonals present the same issue.

We apologize for the confusion. For each target, the red, blue and green colors denote the maximum reward amounts the animals can receive, while the cyan color denotes the minimum reward amount the animal can receive. The minimum reward amount is always 1 (and not 0, as the editor quite naturally assumed). For example, the expected value of the red/cyan targets is calculated as 3 (maximum reward) × 0.4 (maximum reward probability) + 1 (minimum reward) × 0.6 (minimum reward probability), which adds up to an EV of 1.8. In the revision, we added a description to the legend of Figure 1 to better explain this fact.
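For concreteness, the EV arithmetic can be written as a one-line helper (a sketch; the function name and default minimum reward of 1 unit are ours, taken from the explanation above):

```python
def expected_value(max_reward, p_max, min_reward=1.0):
    """Expected value of a two-outcome gamble that pays either its maximum
    reward (with probability p_max) or its minimum reward (with probability
    1 - p_max). In the task, the minimum reward is 1 unit, not 0."""
    return max_reward * p_max + min_reward * (1.0 - p_max)

# Both targets on the lowest diagonal share the same EV of 1.8:
ev_red = expected_value(3, 0.4)   # 3*0.4 + 1*0.6 = 1.8
ev_alt = expected_value(5, 0.2)   # 5*0.2 + 1*0.8 = 1.8
```

This also resolves the editor's worry: 3 × 0.4 = 1.2 and 5 × 0.2 = 1.0 differ, but once the minimum reward of 1 unit is added the two points on the diagonal agree.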

5) I would recommend not to put any abbreviations in the figures, even if they are explained in the legends. This makes the figures less understandable, and the paper less cited. There is always a more informative way to indicate what is required.

We thank the editor for the helpful suggestion. We changed all the abbreviations in the figures into their spelled out version when there is enough space.

Reviewer #1:

[…] I'd like to see a revision of the paper, but if the authors can address these issues adequately, I would consider the study worth of publication.

1) The authors recorded from 516 neurons but only 128 entered the analysis. I am confused about the selection criteria. It appears that 353 cells were task-related, but only about one third of them made it into the analysis. Was this because the remaining 2/3 did not have a response field? I thought that the majority of neurons in SEF are selective for saccades in some direction.

We thank the reviewer for pointing this out. It is indeed true that earlier recording studies in SEF have reported that a large majority of SEF neurons showed saccade direction-related selectivity (Schlag and Schlag-Rey, 1987; Schall, 1991; Schlag et al., 1992; Chen and Wise, 1995a, b; Hanes et al., 1995; Olson and Gettner, 1995; Russo and Bruce, 1996, 2000; Isoda and Tanji, 2002; Moorman and Olson, 2007; Stuphorn et al., 2010). However, most of these studies were performed with only one electrode at a time, and often with the capability to record the action potentials of only one neuron at a time (this was simply the technical standard in the 1980s and 1990s). This required the use of strong selection criteria for recording neurons. Typically, a researcher would search for some time within a population of neurons for cells whose activity patterns were of interest. Clearly, such a recording scheme introduces a strong recording bias. Therefore, the numbers concerning the frequency of neurons with particular functional characteristics in older papers have to be taken cautiously. In contrast, in our study we used multiple electrodes (2-4), and we were able to identify an unlimited number of individual spike waveforms, since we recorded the analog voltage signal at each electrode (the current technical standard). We also started to record only after we had found at least one ‘interesting’ neuron on an electrode, but nevertheless we recorded a much larger number of surrounding SEF neurons. Accordingly, our sample of recorded SEF neurons includes a much larger number of ‘randomly’ sampled neurons. It is therefore perhaps not too surprising that we found a large number of neurons that did not seem to be engaged in the task.

Thus, the topic of our study is not SEF as a whole or even a large majority of SEF neurons, but rather the attributes of a particular subclass of SEF neurons with characteristics that were of interest to us. We defined ‘task-related’ very generously as any neuron that modulated its activity relative to baseline (the time immediately before trial start, i.e., fixation cue onset). We were interested in neurons with saccade-direction modulation and selected neurons according to the Bayesian information criterion (BIC) as described in the Methods (subsection “Task-related neurons”). The BIC is a very strict criterion, and therefore a large number of neurons with weak direction or value modulation were not included in the analysis. However, the population results with all 353 task-related neurons are similar to the results with only the 128 selected neurons, just overall somewhat weaker. For example, Author response image 1 shows the same analysis as Figure 4 in the main text, but across all 353 task-related neurons.

Author response image 1
SEF neurons represent the difference in action value of saccades in the preferred and non-preferred direction.

The neural activity of 353 directionally tuned SEF neurons was normalized and compared across trials with different values of the preferred direction or non-preferred direction target. (A–B) The neural activity in no-choice trials when the saccade in the preferred direction (A) and non-preferred direction (B) was performed. (C–D) The neural activity in choice trials. To visualize the contrasting effect of the preferred and non-preferred target on neural activity, the value of one of the targets was held constant, while the value of the other target was varied. Either the value of the preferred target varied, while the value of the non-preferred target was held at a medium value, or the value of the non-preferred target varied, while the value of the preferred target was held at a medium value. The color of the spike density histograms indicates the target value [high value=6-7 units (red line); medium value=3-5 units (orange line); low value=1-2 units (yellow line)]. (E–H) The regression analysis corresponding to (A–D). A t-test was used to determine whether the regression coefficients were significantly different from 0.

https://doi.org/10.7554/eLife.09418.021

In sum, we are confident that our selection criteria did not distort our findings regarding the activity of SEF neurons in the gamble task.
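As an illustration of the kind of model comparison the BIC supports, here is a sketch on synthetic data (the function names, the regressors, the Gaussian-error BIC formula, and all numerical parameters are our illustrative choices; the paper's actual selection uses the nonlinear regression described in the Methods):

```python
import numpy as np

def gaussian_bic(residuals, n_params):
    # BIC for a least-squares fit with Gaussian errors:
    #   BIC = n * ln(RSS / n) + k * ln(n)
    # A lower BIC means a better trade-off between fit and complexity.
    n = len(residuals)
    rss = float(np.sum(np.square(residuals)))
    return n * np.log(rss / n) + n_params * np.log(n)

rng = np.random.default_rng(0)
n_trials = 200
direction = rng.integers(0, 2, n_trials)   # 1 = saccade into the PD
value = rng.integers(1, 8, n_trials)       # target value, 1-7 reward units
# Synthetic firing rate that depends on both direction and value.
rate = 5.0 + 8.0 * direction + 1.5 * value + rng.normal(0.0, 1.0, n_trials)

bics = {}
for name, X in [
    ("direction only", np.column_stack([np.ones(n_trials), direction])),
    ("direction + value", np.column_stack([np.ones(n_trials), direction, value])),
]:
    beta, *_ = np.linalg.lstsq(X, rate, rcond=None)
    bics[name] = gaussian_bic(rate - X @ beta, X.shape[1])
```

For a neuron whose rate genuinely depends on value, the richer model wins despite its complexity penalty; for a neuron without value modulation, the penalty term makes the simpler model win, which is why the criterion excludes weakly modulated cells.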

Also, the authors established the "influence" of value using the function in Equation 2. I am not sure where this function came from, but there are many cases in the literature in which value was encoded in a linear way, which is at odds with the present assumption. The authors should elaborate on their criteria and also test alternative forms of value encoding.

We first introduced the use of the sigmoid function in an earlier publication (So and Stuphorn, 2010), in which we described SEF neurons that encode both saccade direction and expected value (i.e., action value). The reason we used a sigmoid instead of a linear function is that it resulted in a better fit and in a larger number of neurons in which we could describe the relationship with value. The reasons for this are two-fold: 1) Many SEF neurons actually had a ‘curved’ relationship with increasing value. An example of this is shown in Figure 9 from So and Stuphorn (2010). 2) The more important reason is that a substantial number of value-related SEF neurons showed floor or ceiling effects, that is, they showed no modulation for value increases in a certain range, but started to indicate value above or below a certain threshold. In general, we agree with the reviewer that many neurons encode value in a relatively linear way. However, the sigmoid function is flexible enough to easily approximate linear value coding. For example, in Equation 2, by setting t = 0.5 and b1 > 1, the approximately linear central part of the sigmoid function can be used for value coding. Thus, the sigmoid function is flexible enough to fit the behavior of a large number of neurons with monotonically increasing or decreasing activity for varying value (including linearly related ones). The main concern of the reviewer, namely that we might have missed such neurons, seems therefore unwarranted. Of course, it might be argued that the additional parameters of the sigmoid function are not necessary to capture the main trend and that there might be a danger of over-fitting. However, this seems unlikely, given the fact that the BIC criterion was rather selective (see #1 above). In any case, we used the multiple regression analysis only for classification purposes. None of the main analyses depends on using the sigmoid function.

2) The evidence showing that neuronal activity is positively related to the value in the PD and negatively related to the value in the NPD is convincing. However, the authors repeatedly state that neurons encode the difference in action values. This is a much more stringent statement and there is no evidence to support it. For example, the influence of NPD could be through mechanisms of divisive normalization akin to those observed in area LIP (Louie et al.). If the value modulation was really according to the value difference, one would expect the two regression coefficients shown in Table 1 to be similar in absolute value. In contrast, the absolute value of the regression coefficient for PD is roughly twice that for NPD.

We apologize for the misunderstanding here. We made multiple changes in the revised version of the paper to clarify this issue (Results).

We fully agree with the reviewer that the SEF neurons do not encode the difference in action values in the strict mathematical sense. Our reference to the ‘difference in action value’ was meant as a description of the fact that the SEF neurons represent the action value strength of saccades in the preferred direction relative to the strength of saccades in the non-preferred direction. Thus, they signal a relative, not absolute, action value. The reviewer is of course quite right in pointing out that our terminology is misleading. Clearly, the effect of the negative modulation by alternative targets is not as strong as the effect of the positive modulation by the preferred target. This is in fact an interesting finding and we now note it in the revised manuscript (subsection “SEF neurons reflect the value of both choice options in an opposing way”). Nevertheless, despite the fact that the effects of PD and NPD value are not exactly equal at the level of SEF neurons, the neural activity pattern implies that somewhere in the neural circuit leading from target detection and evaluation to the SEF activity we record, there is at least one step (possibly multiple) in which the neurons representing the competing targets exert an inhibitory effect on each other, which embodies a competitive interaction. Of course, based on the existing data we cannot say whether this step occurs: 1) upstream of the SEF, 2) locally within the SEF, or 3) at multiple levels within the decision network. The demonstration of direct local inhibition between SEF neurons would require experiments such as the ones performed by the Schlags in FEF and SC (Schlag-Rey et al., 1992; Schlag et al., 1998), in which they recorded activity in an oculomotor neuron and demonstrated that activation of a different part of FEF or SC led to decreased activity in that neuron, or optogenetic stimulation experiments (Nassi et al., 2015).
However, given the knowledge that inhibition plays an important role in many other cortical areas, including FEF and LIP (Schlag et al., 1998; Falkner et al., 2010), we think the suggestion that this might also be the case in SEF is not far-fetched and certainly would fit with our results.

Likewise, there are multiple possible inhibitory mechanisms within the SEF that could give rise to the activity pattern we describe. The inhibition could operate directly between SEF neurons with different PDs (some form of lateral inhibition). Alternatively, a global inhibitory network could be involved, as in standard neuronal decision models (Wang, 2002) or in divisive normalization (Nassi et al., 2015). The exact mechanism (or mechanisms) must of course be worked out in future experiments. Nevertheless, it seems important to note that all of these mechanisms have the same functional consequence: they generate a situation in which the incentive to choose one course of action is suppressed by the simultaneously existing incentive to choose alternative actions. Thus, the various action value signals have to compete with each other, which constitutes the essential step in decision-making.

We thank the reviewer for this comment that encouraged us to discuss the different possible interpretations of our data in more detail and to distinguish more clearly between data and interpretation.

Incidentally, the authors should provide a confidence interval for these coefficients.

Please see the new Table 1.

3) Throughout the analysis, the authors treat non-choice trials separately from choice trials. But it is not clear that this is the most appropriate way to think about the data. In many studies, non-choice trials are considered forced choices in which simply one of the two values is zero, and all trials are pooled. This is relevant to the issue discussed in the previous point because if the value modulation was according to the value difference, forced choices in the PD should present the highest activity, and forced choices in the NPD should present the lowest activity (all for a given chosen value). I don't see evidence of this in Figure 4, comparing panels A and B.

We thank the reviewer for this insightful comment. In general, we concentrated in this manuscript on choice trials simply because only these trials allow us to study the decision process and the interaction of the two targets with each other. With respect to the encoding of value difference, we agree with the predictions of the reviewer. If the neurons encoded the exact value difference, activity should be highest on no-choice trials with a saccade in the PD. Instead, the activity on choice trials with a medium value NPD target was higher than on no-choice trials when the PD target had a higher value, but lower when it had a smaller value (compare Figure 4A, E with Figure 4C, G). This shows that SEF neurons encode a relative action value signal, but not the exact mathematical difference between the absolute value amounts.

Although I am not sure how exactly the activity was normalized for these plots (please describe).

In the revised manuscript, we describe the normalization method in more detail (subsection “Task-related neurons”). In brief – for each trial of a particular neuron, we generate a smooth spike density function. Within this trial set, we then search for the maximum and minimum neuronal activity across all trial conditions (choice and no-choice trials) and re-scale the spike density functions, so that the maximum activity is 1 and the minimum activity is 0. Thus, the activity scale of the normalized activity is the same across all trial types and the activity can be directly compared.
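In code, the normalization described here amounts to a per-neuron min-max rescaling across all conditions (a sketch with made-up arrays; in the paper the inputs are smoothed spike density functions, and the function name is ours):

```python
import numpy as np

def normalize_neuron(sdfs):
    """Min-max normalize a neuron's spike density functions so that the
    maximum across ALL conditions (choice and no-choice trials) maps to 1
    and the minimum maps to 0. Using one shared minimum and maximum keeps
    the normalized scale directly comparable across trial types."""
    lo = min(float(s.min()) for s in sdfs)
    hi = max(float(s.max()) for s in sdfs)
    return [(s - lo) / (hi - lo) for s in sdfs]

# Hypothetical firing rates (spikes/s) over three time bins:
choice = np.array([12.0, 30.0, 55.0])
no_choice = np.array([10.0, 20.0, 40.0])
norm_choice, norm_no_choice = normalize_neuron([choice, no_choice])
```

Note that normalizing each condition separately would instead force every condition to span [0, 1] and destroy the across-condition comparison the authors rely on.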

Also, it would be good to see the equivalent of Figure 4A for no-choice, NPD trials.

Please see the new Figure 4 and new Figure 4—figure supplement 1.

4) To reiterate the point, unless the authors can provide clear evidence for modulation based on value difference as distinguished from other context dependent value modulation, they should remove the phrase "value difference" completely from the manuscript.

We agree with the reviewer that “value difference” is a potentially misleading term and removed it from the manuscript. In the revised version, we discuss in detail the different possible interpretations of our findings, as well as their general meaning (Discussion).

5) Figure 4. I presume that all the data shown in the figure are conditioned on the animal choosing the target in the PD (please clarify!). If so, the number of trials contributing to the yellow line in Figure 4B should be fewer than those contributing to the red line in the same panel. Similarly, in Figure 4C there should be fewer/more trials in the red/yellow line. This difference in the number of trials, which should be very substantial, should translate into a difference in the SEM traces shown in the figure, and I am puzzled by the fact that I don't see any of that.

We apologize for this misunderstanding, which was caused by a misleading figure labeling. We have corrected the figure legend and thank the reviewer for noticing the ambiguity. In Figure 4, the trials (different colored lines) are sorted by the value of the PD and NPD target regardless of the saccade direction that the monkey chose. The number of trials is therefore similar in all conditions, which explains why the SEM traces have a similar range. We did this because we wanted to clearly demonstrate that the neuronal activity was influenced by both PD and NPD value (in an inverse fashion) independent of the additional complication of saccade choice. It is of course true that the SEF activity reflects not only PD and NPD value, but also chosen saccade direction. We therefore decided to analyze the SEF activity by PD and NPD value, as well as by chosen saccade direction. The results are included in the revised manuscript as Figure 4—figure supplement 2. The trials are sorted by direction (preferred: right; non-preferred: left) and value of the chosen or non-chosen target. We grouped the subjective value of the reward options into three groups (red: high value; orange: medium value; yellow: low value). The inset above the histograms indicates the location and value of the targets in the trials shown in the histograms below. The grey oval indicates the location of the preferred direction of the neuron, while the arrow indicates the chosen saccade direction. The p-values indicate the significance of a regression using all 7 individual target values without grouping. The shaded areas represent the standard error of the mean (SEM).

Figure 4—figure supplement 2A shows the neural activity in no-choice trials. The color of the spike density histograms indicates the target value. As shown in Figure 4A, B in the revised manuscript, the neurons reflect the value of the PD target, but not of the NPD target. Figure 4—figure supplement 2B shows the neural activity in those choice trials in which the chosen target had the highest possible value (7 units). We chose this reference point, instead of a medium value (as in Figure 4D), because it allowed us to make a comparison with the widest possible range of non-chosen target values (indicated by the color of the spike density histograms). Figure 4—figure supplement 2C shows the neural activity in those choice trials in which the non-chosen target had the lowest possible value (1 unit). Again, this reference point allowed for the widest possible range of chosen target values (indicated by the color of the spike density histograms).

Consistent with Figure 4, we can observe that increasing PD target value increases neuronal activity (Figure 4—figure supplement 2B left panel; Figure 4—figure supplement 2C right panel), while increasing NPD target value decreases neuronal activity (Figure 4—figure supplement 2C left panel). This is true, whether the PD or the NPD target is chosen. However, the value of the chosen target has a stronger influence than the non-chosen one, and in the extreme case, when the PD target has the highest possible value and is chosen, the NPD value has no significant effect on the neural activity (Figure 4—figure supplement 2B right panel).

6) Subsection “SEF neurons reflects value difference between offered gamble options”: I am not convinced by the argument that mutual inhibition implies that the neuronal population participates in the decision (between actions), while lack of such mutual inhibition (a.k.a., menu invariance) implies that this neuronal population does not participate in the decision (between values/options). For example, context effects of the sort described here are relatively minor in area MT, even though neurons in MT clearly participate in, or are upstream of, the decision. Similarly, "offer value" cells in OFC, which are thought to provide the input for value-based decisions, don't show such effects of mutual inhibition. So this is a tricky argument, even though I agree that neurons in SEF are likely downstream of the decision between values while they participate in the decision between actions.

We thank the reviewer for helping us to clarify our argument. We agree with the reviewer that mutual inhibition is not a necessary condition for neural signals to participate in the decision process. Our argument is slightly different. The decision process consists of: 1) a representation of the alternatives (e.g. options, actions, or perceptual states), 2) a comparison of the alternatives, and 3) a representation of the chosen alternative. All 3 types of variables are necessary, but they indicate different stages of the decision process. We observe in SEF neuronal signals that match the last stage of economic goods selection, i.e., a chosen option signal. This is evident in the asymmetric effect of the chosen option in the regression analysis (Figure 5), but also in the fact that there is almost no increase in mutual information about the non-chosen option (Figure 3C). This is in contrast to the action-related signals, which imply that SEF represents both the comparison stage and the chosen-alternative stage in the case of action selection. Altogether, we would argue this suggests that SEF receives the output of the option selection process that takes place upstream of SEF, but participates in the action selection process. It seems that this interpretation is not too far off the reviewer’s ideas. We agree that our line of reasoning is not completely decisive. In the revised paper, we therefore clarified our reasoning and made sure to indicate clearly that this is only our interpretation of the data (subsection “SEF neurons reflect the value of both choice options in an opposing way”).

Reviewer #2:

1) My two main comments have to do with the interpretation of the data relative to the models. First, how do the authors eliminate the possibility that they are observing a read-out of a choice process (possibly mutual inhibition) that is actually occurring elsewhere? The fact that activity for both options begins to increase and then separate does not clearly rule out this possibility.

We agree with the reviewer that our results do not rule out that the neuronal processes in SEF might reflect processes that actually occur in some other part of the brain. However, our results are also compatible with the possibility that SEF at least partly participates in the action selection. More importantly, our main results regarding the sequential nature of decision making are independent of this question. We have tried to be clear about this in the previous manuscript and have made additional modifications to be even clearer in this revised version (Discussion).

In fact, it seems likely that SEF is part of a larger network of cortical and subcortical areas (including vLPFC, dLPFC, FEF, ACC, caudate) that all participate in the value-based decision. We know that many of these regions are closely interconnected and that activity in these regions is likely correlated. On the other hand, the fact that SEF is part of a larger network and is influenced by the ongoing activity in this network does not preclude an active role of SEF in shaping the dynamic and outcome of this process. Separating and identifying the specific role of all the different nodes in this network is a complex process and will require a number of additional future experiments, but this is true for practically all neurophysiological recordings at the present time.

Our main results are independent of whether SEF actively participates in the decision process or only passively reflects it. In either case, the SEF activity allows us to describe two central elements of value-based decision making: 1) decisions are made sequentially, with option selection preceding action selection, and 2) the selection process involves some form of inhibitory competition between the neural representations of the different alternatives. We think that the first point is the most novel contribution of our work and should go some way toward improving our understanding of value-based decision making. The second point is of course not so novel and should not be controversial. We hope we made clear in our response to Reviewer 1 that we do not wish to make claims about more specific aspects of the neural architecture underlying the competition, which also await future experiments to uncover.

2) If I had to choose one of the models in Figure 1 that was most consistent with the data, I would probably choose model B. Can the authors address more directly why their data is more consistent with model D and not model B?

If the pure action-selection model B were correct, the competition would only happen in the action value space. This is graphically indicated by the absence of inhibitory connections between the nodes representing the reward options (or goods). In model B, we would expect the chosen direction information to appear simultaneously with or even slightly earlier than the chosen reward option information, since the selected action value signal contains both direction and value information. However, this prediction is contradicted by our observation that the chosen value information is present earlier than the chosen action information (Figure 3, Figure 5 and Figure 7). Thus, there is a moment during the decision-making process (100-50 ms before saccade onset) when the SEF neurons encode which option is chosen, but not yet which saccade will be made. This could never happen in an action selection model of decision-making. On the other hand, we also found evidence for competition between action value-encoding neurons in SEF that is spatially organized, i.e., in an action-based frame of reference (Figure 4). This fact, together with the fact that we find activity corresponding to both response options (Figure 6), clearly rules out the pure good-based model A. The time difference in the onset of chosen option and direction information also argues against strong recurrent connections from the action selection level back to the option selection level, as suggested by the distributed consensus model C. Thus, our data are most consistent with a model that predicts selection both on the option and the action representation level, but with asymmetric connections between them, so that the option selection level influences the action selection level, but not vice versa. This is the sequential model D.

Is the evidence for a recurrent choice process of mutual inhibition the presence of activity representing both options for a period before they diverge? What other mechanisms could give rise to this? Is it really just mutual inhibition?

The most important evidence for the presence of mutual inhibition is the fact that, as shown in Figure 4, the neuronal activity is negatively correlated with the action value of NPD targets (A), while the neuron does not respond to the NPD target if it appears alone (B). Thus, the PD target value representation is modulated negatively in proportion to the NPD target value. As shown in Figure 6, the magnitude of the negative contribution and the speed with which one representation is influenced by the other are closely related to the relative strength of each representation. We hypothesize that these effects are caused by inhibitory mechanisms. As discussed in our reply to Reviewer 1, there are a number of inhibitory mechanisms that would be in line with this finding (mutual inhibition of local neuron pools across SEF, or activation of a global pool of inhibitory neurons). In the revised paper, we now discuss these different possibilities (Discussion).

Is there evidence for a representation of both choice options for a period before they diverge? An analysis similar to the one shown in Figure 6 for options instead of directions could clarify this.

In addition to the ‘static’ argument following from the differential effect of PD and NPD value on SEF activity, there is also a ‘dynamic’ argument, in the sense that a process of mutual inhibition should lead from a state of equal strength of the two choice representations to a state in which one strongly dominates. (At least, that is how we understand the reviewer’s question.) We thank the reviewer for bringing up this question, because it forced us to be clearer in the revised manuscript about this aspect of the decision process (subsection “Action value map in SEF reflects competition between available saccade choices”). On choice trials, there is a simultaneous onset of activity in the two areas of the action value map that correspond to the locations of the two target options (Figure 6B). Throughout the task, a robust representation of both options is maintained in SEF, even after the divergence of activity that indicates the chosen option and action (Figure 4C, D; Figure 6B, C). However, during the initial rise in activity the SEF population does not indicate the value of the target in its PD. At the time when the neurons start to differentiate their activity according to PD value, they also reflect the value of the NPD target. This can be seen very clearly in Figure 4D, which shows the activity of SEF neurons for PD targets of medium value. Depending on the value of the NPD target, the activity starts to change ~110–120 ms after target onset. The activity in this figure is not sorted by saccade choice, but it can be presumed that the monkey chose the PD target when the NPD target was smaller in value (yellow line) and chose the NPD target when it was larger in value (red line). However, it took the SEF neurons this much time to indicate value even during no-choice trials, when there was no competing target. It seems therefore that SEF neurons always indicate relative action value.
There is therefore no time period in which two populations of SEF neurons represent the absolute action value of a target independent of the value of any competing target. Nevertheless, there is clear evidence of a progression from an initial undifferentiated state to an increasingly differentiated state, in which the chosen action value representation increases in activity and the non-chosen one decreases (again see Figure 4D). Thus, there is clear evidence for a dynamic process, as would be expected from a decision mechanism driven by competition via inhibitory interactions.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Sequential selection of economic good and action in medial frontal cortex of macaques during value-based decisions" for further consideration at eLife. Your revised article has been favorably evaluated by Timothy Behrens (Senior editor), a Reviewing editor, and two reviewers. The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

1) Both referees have some remaining issues that we would like you to address carefully.

2) The Reviewing editor is unhappy about your reply concerning action value (your point #2 and the text 'The SEF neurons encode therefore the action value of saccades to the PD target'): of the three types of action value that you mention, only the first type is real action value. The other two types are a combination of action values (your type 2) and straight chosen value (your type 3, which is the opposite of action value, a decision output variable). To avoid confusion in the community, please accept the only valid definition of action value, which comes out of machine learning: the value of an action that is independent of the action being chosen (a decision input variable). To me it looks like you did not find coding of action value of your type 1. Please rephrase your description of the type of value being coded.

The reviewer is correct that ‘action value’ is a term that is often used in the literature with slightly varying meanings, and that a consistent use of nomenclature is important for the field. We have therefore modified our description of the different types of signals that represent the value associated with particular actions.

“There are a number of subtypes of value signals that are associated with actions, such as saccades […] These ‘chosen action value’ signals represent the output of the decision process.”

We fully agree that we found ‘relative action value’, and not ‘action value’ signals using the terminology used in the paragraph above. We rephrased our description accordingly throughout the manuscript.

Reviewer #1:

My concerns were addressed for the most part. However there remain a few relatively minor issues. The numeration below refers to the points raised in my first review.

1) How many neurons are "task-related"? In the first paragraph of the subsection “Task-related neurons” the authors state 353; in the fifth paragraph they state 362. Also, in the rebuttal, they say that the results reported in the paper, obtained with the smaller population of 128 neurons that pass the BIC criterion, are valid also for the larger population of 353/362 task-related cells. This fact should be reported in the paper (the additional figure could be included as supplementary material).

We thank the reviewer for pointing this out. The number of “task-related” neurons is 353. We apologize for the confusion and have corrected the numbers in the manuscript. We also now report the result for the larger population. We added a new Figure 4—figure supplement 3 and a description of it (subsection “SEF neurons reflect the value of both choice options in an opposing way”).

2) In the rebuttal the authors explain why they fitted responses with a sigmoid. The two reasons – (1) there are cells with "curved" tuning and (2) there are cells with floor/ceiling effects – seem to me one and the same. In any case, the authors should clarify this point in the Methods, not just with this reviewer.

We have included in the Methods section a description that explains the reasons for using the sigmoidal function and its relationship to linear functions (subsection “Task-related neurons”).
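To illustrate why a sigmoid accommodates both "curved" tuning and floor/ceiling effects, a logistic tuning curve is near-linear around its midpoint but saturates at both ends. The sketch below fits such a curve to synthetic firing rates; the data, parameter names, and values are hypothetical, not taken from the recorded neurons.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(v, base, amp, v50, slope):
    """Logistic tuning curve: saturates at `base` (floor) and
    `base + amp` (ceiling), and is approximately linear near v50."""
    return base + amp / (1.0 + np.exp(-(v - v50) / slope))

# Synthetic value-tuned firing rates with a ceiling effect (hypothetical data)
rng = np.random.default_rng(0)
values = np.linspace(0, 10, 50)                      # target values
rates = sigmoid(values, 5.0, 20.0, 4.0, 1.0) + rng.normal(0, 1, 50)

# Initial guesses derived from the data range
p0 = [rates.min(), float(rates.max() - rates.min()), 5.0, 1.0]
popt, _ = curve_fit(sigmoid, values, rates, p0=p0)
fitted = sigmoid(values, *popt)
```

A purely linear fit would force the extrapolated rate below zero or above any physiological ceiling for extreme values, whereas the shallow-slope limit of the sigmoid recovers near-linear tuning, which is one way the two reasons can be seen as facets of the same saturation issue.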

3) Pooling forced choice trials with other trials. Here the authors did not really address my point. The question is this: If instead of separating forced choices from other trials the authors had pooled all trials, as was done in many other studies, would the results presented here be affected in any significant way?

We thank the reviewer for raising this interesting point. We apologize for not fully addressing it in our previous reply. We added a new figure (Figure 7—figure supplement 1) and some discussion in the paper (subsection “Instantaneous changes in SEF activity state space reflect decision process”) to explain why we did not pool forced choice trials with other trials. We agree that in many studies, no-choice trials can be considered forced choices in which one of the two values is simply zero, so that all trials can be pooled. This is a reasonable strategy if neuronal activity in no-choice trials is indeed similar to that in choice trials. However, there are examples in the literature where there is an obvious difference. In a human psychophysical experiment that we conducted recently, the reaction time distributions for choice trials with zero alternative value (that is, two targets, one of which is associated with no reward) differed from those for no-choice trials (Chen, Mihalas, Niebur and Stuphorn, 2013). Pastor-Bernier and Cisek (2011) reported activity differences in the premotor cortex between choice and no-choice trials. In our initial analysis, we therefore separated the choice trials from the no-choice trials to avoid overlooking useful information by combining the two conditions. It turned out that we did find neuronal activity differences between no-choice and choice trials. In contrast to premotor cortex neurons (Pastor-Bernier and Cisek, 2011), SEF neurons did show value tuning in no-choice trials similar to that in choice trials. However, SEF neurons showed higher activity in choice trials than in no-choice trials when the targets in the receptive field (RF) were chosen, and lower activity in choice trials than in no-choice trials when the targets in the RF were not chosen (see Figures 4 and 6).
Therefore, we could not treat no-choice trials as equivalent to choice trials in which one of the two values is zero. It is worth pointing out that this differs from the activity pattern observed in LIP (Louie and Glimcher, 2012), which indicates a process of divisive normalization. We will address the difference between no-choice and choice trials in detail in a follow-up manuscript with some descriptive models.

The difference between choice and no-choice trials is also apparent in the state space analysis (see the new Figure 7—figure supplement 1 in the revised manuscript). The trajectories show the population neuronal activity when the down-left targets were chosen (the black trajectories in Figure 7A, B in the main manuscript). The trajectories are grouped according to both chosen and non-chosen values. The red, orange and blue colors indicate large (L), medium (M) and small (S) chosen values in choice trials. Solid, dashed and dotted lines indicate small, medium and large non-chosen values. Please note that there are fewer trajectories when the target was less valuable, because it was chosen less often. In comparison, the purple, black and green colors indicate large, medium and small chosen values in no-choice trials. The trajectories for choice trials were influenced by both chosen and non-chosen values. Specifically, for a given chosen value (indicated by color), the trajectories were slightly lower along the value axis if the non-chosen value was larger (indicated by line pattern). Accordingly, if no-choice trials simply represented a situation in which the non-chosen value is 0, the trajectories for no-choice trials should be similar to those on choice trials with a small non-chosen value. Instead, the trajectories for no-choice trials are very different and always reach a smaller point along both the value and direction axes.

Given these neuronal activity differences, we decided to keep the choice and no-choice trials separate and to show the SEF activity on both types of trials where appropriate.

Reviewer #2:

It would be useful if the authors provided additional discussion, in the manuscript, on the predictions they believe each of the models make, and how their data is consistent/inconsistent with each of the models. They have provided this information in the replies to the reviewers, but it needs to be in the manuscript. Otherwise, I have no further comments.

We added a more detailed discussion of the different predictions for each model and whether our data support these predictions (Discussion).

https://doi.org/10.7554/eLife.09418.023

Article and author information

Author details

  1. Xiaomo Chen

    Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, United States
    Contribution
    XC, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    xiaomo@stanford.edu
    Competing interests
The authors declare that no competing interests exist.
  2. Veit Stuphorn

    1. Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, United States
    2. Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, United States
    3. Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University School of Medicine, Baltimore, United States
    Contribution
    VS, Conception and design, Drafting or revising the article
    For correspondence
    veit@jhu.edu
    Competing interests
The authors declare that no competing interests exist.

Funding

National Institute of Neurological Disorders and Stroke (R01NS086104)

  • Veit Stuphorn

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We are grateful to S Everling, Shreesh P Mysore and JD Schall for helpful comments on the manuscript. This work was supported by the National Institutes of Health through grant 2R01NS086104 to VS.

Ethics

Animal experimentation: This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. All of the animals were handled according to approved institutional animal care and use committee (IACUC) protocols (PR13A337) of Johns Hopkins University.

Reviewing Editor

  1. Wolfram Schultz, University of Cambridge, United Kingdom

Publication history

  1. Received: June 13, 2015
  2. Accepted: November 26, 2015
  3. Accepted Manuscript published: November 27, 2015 (version 1)
  4. Version of Record published: February 8, 2016 (version 2)

Copyright

© 2015, Chen et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


