Introduction

Many decisions involve a process of value computation and comparison between options. Imagine you are planning a vacation based on online travel-guides describing two locations. As you research these options, you sample pieces of information that support one option or another to varying degrees. For example, one destination might have cheaper flights while the other has cheaper hotels. Additionally, some information may attract more of your attention, such as pictures of waterfalls or ancient ruins. As you evaluate your options, you must integrate these pieces of information and eventually decide when to stop and make a final decision.

Decisions like this are thought to rely on a bounded evidence-accumulation process that depends on factors such as the value of the sampled information and shifts in attention. These factors produce reliable patterns in choice, response time (RT), and eye-tracking data (Ashby et al., 2016; Callaway et al., 2021; Gluth et al., 2018; Krajbich et al., 2010; Smith & Krajbich, 2018). For instance, decisions that are less predictable also tend to take more time (Konovalov & Krajbich, 2019) and can be influenced by attention manipulations (Pärnamets et al., 2015; Tavares et al., 2017; Gwinn et al., 2019; Bhatnagar & Orquin, 2022). The quantitative relations between these measures argue for an evidence-accumulation process.

Sequential sampling models (SSMs) offer a framework to understand this deliberative choice process. SSMs vary in specific details, but generally share some core features. When faced with a decision, people begin to sample information in favor of each option. This information is evaluated and converted into relative evidence for one option or the other.

Relative evidence builds up over time until there is enough to commit to an option (Busemeyer & Townsend, 1993; Ratcliff & Smith, 2004).

Most SSMs can be broken down into two key components: inputs and integrators (Bogacz et al., 2006). Inputs encode the drift rate – the rate of evidence accumulation – and integrators encode the decision variable – the amount of accumulated evidence.

Each option has its own input. In the context of value-based decisions, the input represents the value of the currently considered piece of information. For a given option, the average input value is generally assumed to be constant over the course of the decision but does vary randomly from one instant to the next due to stochasticity in the sampling process (Shadlen & Shohamy, 2016).

Integrators accumulate the stochastic sequences of sampled input values. Typically, each option has its own integrator (Busemeyer & Townsend, 1993; Gold & Shadlen, 2007; Krajbich & Rangel, 2011; Usher & McClelland, 2001; Wang, 2002) which accumulates the evidence from that option’s input, but is also inhibited by the other options’ inputs or integrators. Thus, each integrator represents the accumulated, relative evidence favoring a given option. Once one of the integrators’ accumulated evidence reaches a pre-determined threshold, the corresponding option is chosen. In contrast to the input values, these accumulated values dynamically evolve over the course of the decision.
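
The input/integrator architecture described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions rather than the model fit in this paper: the function name, the Gaussian input noise, the pure feed-forward inhibition, and the threshold value are all illustrative choices.

```python
import numpy as np

def simulate_relative_accumulation(mean_left, mean_right, noise_sd=1.0,
                                   threshold=10.0, max_steps=10_000, seed=0):
    """Accumulate relative evidence until one integrator reaches threshold.

    Hypothetical illustration: inputs have constant means but fluctuate from
    one instant to the next; each integrator adds its own input and is
    inhibited by the other option's input.
    """
    rng = np.random.default_rng(seed)
    left_total, right_total = 0.0, 0.0
    for step in range(1, max_steps + 1):
        # Inputs: noisy samples around a constant mean value.
        left_input = rng.normal(mean_left, noise_sd)
        right_input = rng.normal(mean_right, noise_sd)
        # Integrators: with full feed-forward inhibition the two totals are
        # mirror images, which reduces to a single relative accumulator.
        left_total += left_input - right_input
        right_total += right_input - left_input
        if left_total >= threshold:
            return "left", step
        if right_total >= threshold:
            return "right", step
    return None, max_steps  # no threshold crossing within max_steps

choice, n_steps = simulate_relative_accumulation(7.0, 3.0)
```

Under these assumptions, a larger difference between the mean inputs produces faster threshold crossings, which is the qualitative link between difficulty and decision time that SSMs predict.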

The neural inputs and accumulated values have been successfully identified in perceptual decision making (for recent reviews see Forstmann et al., 2016; Hanks & Summerfield, 2017; O’Connell et al., 2018; Ratcliff et al., 2016), but this has proven more challenging for value-based decisions. The main reason is that decisions are typically very quick, with RTs shorter than the time resolution of functional magnetic resonance imaging (fMRI) (but see Gluth et al., 2012).

Evidence for the neural substrates of value-based SSMs has typically come as trial-level measures correlating with model parameters (Hare et al., 2011; Rodriguez et al., 2015), scalp-level electric activity from electroencephalography (EEG) (Polanía et al., 2014), or a combination of the two (Pisauro et al., 2017). Taken together, these studies have implicated a fronto-parietal network underlying value-based decision-making, with more ventral/frontal regions serving as inputs (for reviews see: Bartra et al., 2013; Clithero, 2018) and more dorsal/parietal regions serving as integrators.

Despite their many strengths, these past value-based experiments have been limited by their inability to determine whether purported integrator regions are accumulating evidence or instead representing unchosen values (Boorman et al., 2011; Kolling et al., 2016; Wittmann et al., 2016), decision conflict (Frömer et al., 2024; Hunt et al., 2018; Kaanders et al., 2021; Kolling et al., 2012; Shenhav et al., 2014, 2016; Vassena et al., 2020), or time on task (Holroyd et al., 2018). In most experiments, these variables are highly correlated and difficult to distinguish. Increasing the value of the worse option while holding the better option constant will simultaneously increase the perceived conflict, increase the deliberation time, and slow the rate of evidence accumulation.

To distinguish between accumulated evidence and the other confounding explanations, we sought a factor that modulates accumulated evidence within a decision, independent of time. For this we turned to visual attention, measured with eye-tracking. Research on the attentional drift diffusion model (aDDM) has argued that gaze amplifies value during the choice process (Bhatnagar & Orquin, 2022; Krajbich et al., 2010; Smith & Krajbich, 2019; Westbrook et al., 2020; Sepulveda et al., 2020). This means that current gaze location should amplify value signals in the input regions, and that the balance of gaze allocation over the course of the decision should amplify accumulated evidence signals in the integrator regions. Both human and non-human primate research has confirmed gaze effects on value inputs in the orbitofrontal cortex (Lim et al., 2011; McGinty et al., 2016; Rich & Wallis, 2016; Hunt et al., 2018; but see McGinty, 2019). However, it has yet to be shown that these gaze-modulated inputs are integrated into accumulated decision values. Gaze-modulated signals in purported integration regions would provide critical evidence against the alternative explanations (i.e., conflict, time, or unchosen value).

Here, we present the results of an fMRI experiment designed to provide evidence that integrator regions accumulate gaze-weighted evidence. Our approach was to slow down the decision process by gradually presenting choice-relevant information. Our task design allowed us to extend the decision-making period to approximately a minute, while also allowing us to dissociate the inputs’ sampled value (SV) signals from the integrators’ accumulated value (AV) signals (Gwinn, 2019). The inputs represent the perceived value of stimuli currently on the screen, while the integrators represent the values of previously presented stimuli within that choice problem. We simultaneously collected eye-tracking data, allowing us to test whether gaze modulates SV and AV representations. To preview the results, we found evidence for gaze-weighted SV signals in the reward network – the ventromedial prefrontal cortex (vmPFC) and ventral striatum – and gaze-weighted AV signals in the pre-supplementary motor area (pre-SMA) in the dorsomedial prefrontal cortex.

Results

Experiment description

Our choice task builds on an extensive literature examining choices between familiar snack foods. Instead of choosing between two food items, which typically only takes a few seconds, we asked subjects to choose between two food lotteries. A lottery consisted of 3-6 different items, each with a different probability of being selected. Subjects did not know anything else about the lotteries; they had to learn about them from experience (Hertwig et al., 2004). Specifically, subjects sampled a random draw from both lotteries every 4-8 seconds. They continued to sample random draws until they were ready to stop and choose one of the two lotteries (Fig. 1). Choosing a lottery led to a final random draw from that lottery, revealing the actual food that the subject would receive if that trial was rewarded at the end of the experiment.

Task timeline.

Subjects chose between two snack food lotteries on each trial. Subjects learned about the lotteries through random food draws. Every 4-8 seconds, subjects sampled a new draw from each lottery. They were allowed to sample as many times as they wanted but were incentivized to sample approximately 7 draws per trial. Sampled food draws were presented for 2 seconds, followed by a fixation cross appearing for 2-6 seconds with random jitter. The trial ended when the subject chose the left or right lottery, using the respective index finger. Upon making their choice, subjects were presented with a food drawn from their chosen lottery.

We placed no explicit limit on the number of draws subjects could sample within each trial (i.e., pair of lotteries). To prevent subjects from spending the entire session on a single trial, we gave them 45 minutes to make at least 60 choices. Subjects were informed that any unmade choices would be randomly completed by the computer, and any trials beyond 60 would be added to the list from which the rewarded trial would be drawn. If a subject were to sample the same number of stimuli in each trial, the optimal number would be seven. However, subjects could (and did) vary their number of samples throughout the experiment; we observed substantial variability in the number of samples per trial within most of our subjects (mean number of samples = 6.37 and mean within-subject SD = 2.61).
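
One back-of-the-envelope way to arrive at roughly seven draws per trial is sketched below. The text does not spell out this calculation, so the sketch rests on assumptions: draw onsets are taken to be 6 seconds apart on average (the midpoint of the 4-8 second range), and the time spent on the choice and feedback screens is ignored.

```python
# Time budget stated in the text: 45 minutes for at least 60 choices.
session_seconds = 45 * 60
required_choices = 60
seconds_per_trial = session_seconds / required_choices   # 45 s per trial

# Assumption: draws arrive every 4-8 s, so 6 s on average.
mean_draw_seconds = (4 + 8) / 2

# Whole draws that fit into one trial's share of the session.
draws_per_trial = seconds_per_trial / mean_draw_seconds  # 7.5
whole_draws = int(draws_per_trial)                       # 7
```

Rounding down reflects that a partial draw cannot be sampled; with the unmodeled choice and feedback time included, the budget per trial would be somewhat tighter.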

We constructed each trial’s sequence of items pseudo-randomly to minimize the correlation between the sampled value (SV) (i.e., the difference between the left and right values of the currently presented stimuli) and the accumulated value (AV) (i.e., the sum of SV at a given timepoint within a trial) (see Methods). For the first sample in each trial, SV and AV were always equal. After subsequent samples, SV diverged from AV, yielding two distinct time courses to look for in the fMRI data (Fig. 2).

Example trial with the sampled value and accumulated value.

The sampled value (SV; red) and accumulated value (AV; black) are plotted for this example trial. For the first draw, the SV and AV are identical. However, as the trial proceeds, the two signals diverge. In the model, a choice is made when the AV reaches a pre-specified decision boundary.
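
The two time courses can be computed directly from the definitions above. The following is a minimal sketch; the function name and the ratings are hypothetical and serve only to show that SV and AV coincide on the first draw and diverge afterwards.

```python
import numpy as np

def value_time_courses(left_values, right_values):
    """Return sampled value (left - right per draw) and its running sum."""
    left = np.asarray(left_values, dtype=float)
    right = np.asarray(right_values, dtype=float)
    sv = left - right        # sampled value of each draw
    av = np.cumsum(sv)       # accumulated value up to and including each draw
    return sv, av

# Hypothetical ratings for four draws from the left and right lotteries.
left_ratings = [6, 2, 8, 5]
right_ratings = [3, 7, 4, 5]
sv, av = value_time_courses(left_ratings, right_ratings)
# sv is [3, -5, 4, 0] and av is [3, -2, 2, 2]
```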

To measure the subjective value of each stimulus, we separately asked subjects to rate all the food items. Before entering the scanner, subjects rated 148 unique food items based on how much they would like to eat them at the end of the experiment. These ratings were incentivized (see Methods) and we retained only the positively rated items (0 to 10) for the choice task. We used each subject’s ratings to calculate SV and AV.

Behavioral results

A core assumption of SSMs is that individuals decide based on the evidence accumulated over the course of the decision. We thus anticipated that subjects would choose in line with AV and not just the most recent SV in the trial. We tested this key assumption with a mixed-effects logistic regression of choosing the left lottery on SV and AV at the time of choice. Subjects chose in line with both (AV excluding the final samples: β = 0.062, SE = 0.010, p < 0.001; SV: β = 0.257, SE = 0.024, p < 0.001) (Fig. 3A). The larger coefficient for SV compared to AV is an inevitable consequence of an SSM choice process. For any data generated by such a model, regressing the probability of choosing an option on the final SV and the total AV would produce larger coefficients on the final SV.

Choice data.

(a) The probability of choosing left based on the left – right value difference for both SV and AV. As the value difference becomes greater in favor of one option, the probability of choosing that option increases, for both SV and AV. (b) The effect of gaze on choice. The longer that subjects looked at one lottery over the other, over the course of the whole trial, the more likely they were to choose that lottery.

A second feature of SSM data is that easier decisions generally take less time (i.e., fewer samples). Therefore, we expected a negative correlation between the number of samples and the absolute difference in expected value between the two lotteries, as well as a higher probability of terminating a trial when the absolute value difference (|AV|) is higher. A mixed-effects regression of log(n samples) on the absolute expected value difference between the two lotteries revealed a negative relationship (β = -0.025, SE = 0.009, p = 0.006, two-sided t-test). A mixed-effects logistic regression of P(stop sampling) on |AV| also revealed a significant positive relationship (β = 0.062, SE = 0.009, p < 0.001, two-sided t-test). These tests confirm that our subjects were sampling more in more difficult trials.

A third behavioral pattern predicted by the aDDM and other gaze-based SSMs is that individuals should generally choose options that they have looked at more (Krajbich et al., 2010; Thomas et al., 2019; Westbrook et al., 2020). We thus anticipated a positive correlation between choice and relative dwell time over the course of the whole trial. We added dwell proportion advantage (left – right dwell time divided by total dwell time) to the choice regression and observed a positive effect on choosing the left lottery (β = 1.642, SE = 0.459, p < 0.001).
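
The dwell proportion advantage follows directly from its definition. In this sketch the function name is hypothetical, and returning zero when no gaze was recorded is an assumption, not a rule taken from the Methods.

```python
def dwell_proportion_advantage(left_dwell_s, right_dwell_s):
    """(left - right dwell time) / total dwell time, ranging from -1 to 1."""
    total = left_dwell_s + right_dwell_s
    if total == 0:
        # Assumption: with no recorded gaze, treat the trial as unbiased.
        return 0.0
    return (left_dwell_s - right_dwell_s) / total

# 3 s on the left lottery and 1 s on the right gives an advantage of 0.5.
example = dwell_proportion_advantage(3.0, 1.0)
```

Positive values indicate more time spent looking at the left lottery, so a positive regression coefficient means that looking more at an option goes with choosing it.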

To be certain that gaze-weighted evidence accumulated over the course of the whole trial, and not simply on the final sample of the trial, we excluded the final sample from each trial and re-ran the previous regression. All regressors were significant and positive (SV: β = 0.290, SE = 0.028, p < .001; AV: β = 0.059, SE = 0.011, p < 0.001; dwell proportion advantage: β = 1.636, SE = 0.462, p < 0.001). Thus, the influence of dwell time on choice occurred over the course of the entire decision, not simply on the final sample (Fig. 3B).

Neuroimaging results

Our general strategy for the fMRI data was to identify regions with BOLD activity correlating with the time series of |SV| or |AV|. We primarily focused on the absolute value differences since we were looking for evidence in favor of making any choice, not specifically the left or right choice. This way we could identify the key components of the SSM choice process: the inputs and the integrators. We then tested whether these representations were modulated by gaze.

We tested the following hypotheses: (1) vmPFC and striatum contain input but not integrator representations; (2) the vmPFC and striatum input representations are modulated by gaze; (3) the pre-supplementary motor area (pre-SMA), the intraparietal sulci (IPS), and dorsolateral PFC (dlPFC) contain integrator but not input representations; (4) the pre-SMA, IPS, and dlPFC integrator representations are modulated by accumulated gaze.

All general linear models (GLMs) included variants of |SV| and lagged |AV|, either gaze weighted or not, interacted with boxcar functions covering each sample period (2 seconds). We use lagged |AV| (i.e., AV that excludes the current sample’s SV) because that helps to decorrelate |ΔSV| and |ΔAV| and ensure that any neural correlations with |ΔAV| are not due to the currently presented stimuli. In addition to the regressors of interest, each GLM contained a stick function for the button press onset, modulated by lagged |AV|, as a nuisance regressor event, as well as a boxcar function during the feedback screen, modulated by the value of the received item. We also added motion parameter time series to account for variation due to motion.
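
A parametrically modulated boxcar regressor of the kind described above could be built as in the following sketch. Several details are assumptions, not taken from this paper: the double-gamma response function with peak and undershoot shape parameters of 6 and 16, the mean-centering of the modulator, the 0.1 second resolution, and all function names.

```python
import math
import numpy as np

def double_gamma_hrf(dt, duration=32.0):
    """Assumed canonical-style haemodynamic response sampled every dt seconds."""
    t = np.arange(0.0, duration, dt)
    peak = t**5 * np.exp(-t) / math.gamma(6)          # gamma density, shape 6
    undershoot = t**15 * np.exp(-t) / math.gamma(16)  # gamma density, shape 16
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()

def modulated_boxcar_regressor(onsets, modulator, n_seconds, dt=0.1, box_len=2.0):
    """Boxcars of box_len seconds scaled by a mean-centered modulator."""
    n = int(round(n_seconds / dt))
    centered = np.asarray(modulator, dtype=float) - np.mean(modulator)
    signal = np.zeros(n)
    for onset, weight in zip(onsets, centered):
        start = int(round(onset / dt))
        stop = min(n, start + int(round(box_len / dt)))
        signal[start:stop] += weight
    # Convolve with the response function and trim to the run length.
    return np.convolve(signal, double_gamma_hrf(dt))[:n]

# Hypothetical draw onsets (s) and |SV| values for one short stretch of data.
regressor = modulated_boxcar_regressor([0, 6, 12], [1.0, 3.0, 2.0], 40.0)
```

In practice the regressor would be resampled to the scanner's repetition time before entering the GLM; that step is omitted here.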

GLM 1: Sampled and Accumulated Value Results

The variables of interest in GLM 1 were absolute sampled value difference (|ΔSV|) and absolute accumulated value difference (|ΔAV|), where $|\Delta AV_T| = \left|\sum_{t=1}^{T-1} \Delta SV_t\right|$, t is in units of draws (i.e., pairs of samples), and T is the current draw. Note again that the sum in |ΔAV| runs only to T-1 in order to exclude the current sample.
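
The lagged accumulation can be made concrete with a short sketch. The function name is hypothetical; the computation simply follows the description above, with the accumulated value at draw T summing the signed sampled values of draws 1 to T-1.

```python
import numpy as np

def glm1_modulators(delta_sv):
    """Return |dSV| and lagged |dAV| for one trial's sequence of draws."""
    delta_sv = np.asarray(delta_sv, dtype=float)
    # Running sum minus the current draw = sum over all previous draws.
    lagged_av = np.cumsum(delta_sv) - delta_sv
    return np.abs(delta_sv), np.abs(lagged_av)

# Hypothetical left-minus-right values for four draws.
abs_sv, abs_av = glm1_modulators([3, -5, 4, 0])
# abs_sv is [3, 5, 4, 0]; abs_av is [0, 3, 2, 2]
```

Because the current draw is excluded, the accumulated regressor is zero on the first draw of every trial.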

We first investigated the vmPFC and striatum, regions that we hypothesized represent the inputs (i.e., SV). Looking specifically at BOLD activity in the vmPFC ROI defined in Bartra et al. (2013), we found a positive correlation with |ΔSV| (peak voxel: x = -8, y = 56, z = -2; p < 0.05) but no significant correlation with |ΔAV| (peak voxel: x = 6, y = 38, z = 26; p = 0.14) (Fig. 4A). In contrast, the striatum showed no significant relationship with either |ΔSV| or |ΔAV| (peak voxel: x = 16, y = 20, z = 12; p = 0.18 and peak voxel: x = 28, y = -10, z = -8; p = 0.13, respectively). A contrast analysis revealed that the vmPFC indeed correlated more strongly with |ΔSV| than with |ΔAV| (t(22) = 3.52, p = .002), as did the striatum (t(22) = 2.24, p = .035).

Regions responding to sampled and accumulated value

(a) vmPFC showed a significantly positive correlation with |ΔSV|, but did not respond to |ΔAV|. (b) Both pre-SMA and IPS (as well as the dlPFC, not pictured) showed a significantly positive correlation with |ΔAV|, but no correlation with |ΔSV|. Voxels thresholded at p < .05.

We next investigated the pre-SMA, IPS, and dlPFC regions, which we hypothesized represent the integrated values (i.e., AV). In the pre-SMA, whose ROI we defined based on Hare et al. (2011), we found a significant, positive relationship between BOLD activity and |ΔAV| (peak voxel: x = 4, y = 12, z = 50; p < 0.001), but no relationship with |ΔSV| (peak voxel: x = 8, y = 16, z = 54; p = 0.56) (Fig. 4B). In the IPS, identified with the Harvard-Oxford Cortical Structural Atlas, we also saw a significant increase in activity as |ΔAV| increased (peak voxel: x = 42, y = -48, z = 38; p < 0.001), but no relationship with |ΔSV| (peak voxel: x = 38, y = -66, z = 46; p = 0.30) (Fig. 4C). In the dlPFC, whose ROI we defined based on Hare et al. (2011), we found the same pattern as the pre-SMA and IPS. There was a significantly positive relationship between BOLD activity and |ΔAV| (peak voxel: x = 42, y = 34, z = 28; p < 0.001), and no relationship between BOLD activity and |ΔSV| (peak voxel: x = 40, y = 38, z = 26; p = 0.55). Contrast analyses revealed that these three regions all correlated more strongly with |ΔAV| than with |ΔSV| (pre-SMA: t(22) = 2.87, p = .009; IPS: t(22) = 2.80, p = .010; dlPFC: t(22) = −2.10, p = .047).

In summary, we found evidence that the vmPFC represents inputs but not integrators, while the pre-SMA, IPS, and dlPFC represent integrators but not inputs (Fig. 5).

Beta plots from the vmPFC, striatum, pre-SMA, IPS, and dlPFC (GLM1).

Displayed are regression coefficients from each region for absolute sampled value difference (|ΔSV|) and absolute accumulated value difference (|ΔAV|). Both vmPFC and striatum show a similar pattern of BOLD activity that scales positively with |ΔSV|, but does not respond to |ΔAV|. The opposite pattern can be seen in the pre-SMA, IPS, and dlPFC, which all show a strong positive correlation between BOLD activity and |ΔAV|, but no relationship to |ΔSV|.

GLM 2: Gaze Weighted Sampled and Accumulated Values

Having identified input and integrator regions, we next asked whether the activity in these regions was affected by gaze. The aDDM (and other gaze-weighted SSMs) predict that gaze to one option should amplify that option’s value relative to the other option (Krajbich et al., 2010; Thomas et al., 2019; Westbrook et al., 2020). Consider a simple model where an option’s value is weighted by the proportion of time during which it is looked at. Imagine two trials with the same pair of values, 7 on the left and 3 on the right. In Trial A, the subject looks left 30% of the time and right 70% of the time. In Trial B, the subject looks left 70% of the time and right 30% of the time. In Trial A, the net input value (“drift rate”) would be |0.3 · 7 − 0.7 · 3| = 0. In Trial B, the drift rate would be |0.7 · 7 − 0.3 · 3| = 4. In Trial A, the value advantage for the left option is canceled out by the gaze advantage for the right option. In Trial B, both the value and gaze advantage favor the left option, leading to strong evidence in favor of a left choice. In sum, there is stronger evidence when gaze difference is aligned with value difference. This should be true for both SV and AV, though SV is only affected by gaze during the current draw, while AV is affected by gaze over the entire trial.

To test this prediction, GLM 2 used the gaze-weighted values of the items, looking at the BOLD signal for the entire duration of each presentation of food pairs. The sampled gaze-weighted values (SVGaze) were derived, as in the example above, by multiplying the proportion of left gaze time with the left value, and the proportion of right gaze time with the right value, within a sample. Accumulated gaze-weighted values (AVGaze) were the sums of SVGaze across samples. GLM 2 included both |ΔSVGaze| = |SVGaze,Left − SVGaze,Right| and |ΔAVGaze| = |AVGaze,Left − AVGaze,Right|. Again, to reduce the correlation between sampled and accumulated values, ΔAVGaze did not include the currently presented pair of food items.
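
The gaze-weighted regressors can be sketched as follows, reusing the two example trials with values 7 and 3. The function name is hypothetical, and taking the right gaze proportion as one minus the left proportion (that is, ignoring gaze to neither option) is a simplifying assumption.

```python
import numpy as np

def gaze_weighted_modulators(left_values, right_values, left_gaze_prop):
    """Return |dSVGaze| and lagged |dAVGaze| for one trial's draws."""
    left = np.asarray(left_values, dtype=float)
    right = np.asarray(right_values, dtype=float)
    p_left = np.asarray(left_gaze_prop, dtype=float)
    # Weight each side's value by the share of gaze it received in that draw.
    sv_gaze_left = p_left * left
    sv_gaze_right = (1.0 - p_left) * right   # assumption: gaze is left or right
    # Accumulate each side over previous draws only (current draw excluded).
    av_gaze_left = np.cumsum(sv_gaze_left) - sv_gaze_left
    av_gaze_right = np.cumsum(sv_gaze_right) - sv_gaze_right
    abs_dsv = np.abs(sv_gaze_left - sv_gaze_right)
    abs_dav = np.abs(av_gaze_left - av_gaze_right)
    return abs_dsv, abs_dav

# Two draws with values 7 (left) and 3 (right): gaze 30% left, then 70% left.
abs_dsv, abs_dav = gaze_weighted_modulators([7, 7], [3, 3], [0.3, 0.7])
# abs_dsv is approximately [0, 4], matching Trial A and Trial B in the text.
```

The results are only approximately 0 and 4 because of floating-point rounding, so comparisons should use a tolerance.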

In the vmPFC, we found a significant correlation with |ΔSVGaze| (peak voxel: x = -4, y = 36, z = 4; p < 0.005) but no effect of |ΔAVGaze| (peak voxel: x = -8, y = 36, z = 2; negative beta, p = 0.42). The striatum also showed a correlation with |ΔSVGaze| (peak voxel: x = 8, y = 10, z = -6; p < 0.01), but no corresponding effect for |ΔAVGaze| (peak voxel: x = -20, y = 4, z = -6; negative beta, p = 0.18).

In contrast, in the pre-SMA we found a significantly positive relationship with |ΔAVGaze| (peak voxel: x = 4, y = 18, z = 54; p = 0.03), but a non-significant correlation with |ΔSVGaze| (peak voxel: x = -8, y = 16, z = 54; p = 0.06). The IPS, on the other hand, did not seem to respond to either |ΔSVGaze| or |ΔAVGaze| (peak voxel: x = -14, y = -54, z = 48; p = 0.12, and peak voxel: x = 48, y = -34, z = 44; p = 0.48, respectively) (Fig. 6). Similar to the IPS, the dlPFC did not appear to incorporate gaze, responding to neither |ΔSVGaze| nor |ΔAVGaze| (peak voxel: x = 42, y = 28, z = 26; p = 0.20; peak voxel: x = -46, y = 26, z = 18; p = 0.20, respectively).

Regions responding to sampled gaze-weighted value (SGWV) and accumulated gaze-weighted value (AGWV).

SGWV correlates with activity in (a) vmPFC, and (b) striatum, while AGWV correlates with activity in (c) pre-SMA. Voxels thresholded at p < .05.

GLM 3: Gaze Contrast Results

Another way to test for the effect of gaze on value representations is to directly contrast cases where gaze is focused on the better option to cases where gaze is focused on the worse option. As described earlier, there should be more evidence when gaze and high value are aligned than when they are misaligned. Thus, regions representing accumulated evidence should show a significant effect in this contrast. The sign of that effect is obvious in the case of SV. For any given sample, if the current left item is better than the right item, then we should see stronger input activity when the subject is currently looking left compared to right. The prediction is less obvious for the integrators and accumulated values. Suppose the left lottery is better than the right lottery: how should the current gaze location affect the integrator activity? What matters for the integrator activity is where the subject has looked more in the past, since the integrator represents all the evidence accumulated so far. As it turns out, at any time point t, the gaze location at t is negatively correlated with the amount of time spent looking at that location so far in the trial. That is, if the subject is currently looking left then she has, on average, spent more time looking right up until that point.

We established this fact with a mixed-effects logistic regression of whether subjects at time t were looking left (1) or right (0) on AV, SV, and dwell advantage up until t − 1. The regression revealed a significantly negative beta on dwell advantage (β = -0.308, p < 0.001), indicating that subjects had looked at the currently fixated lottery at time t less than the other lottery. What this means is that if the left lottery is better than the right lottery, we should see stronger integrator activity when the subject is looking right compared to left.

To test these predictions, we ran a third GLM (GLM3) that included SV, AV, a dummy variable for gaze location (left = 1, right = 0) at time t, and the interactions of this dummy with both SV and AV.
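
The per-sample modulators entering GLM3 can be laid out as columns of a small design matrix. This is only a sketch of the interaction structure: the function name and example values are hypothetical, and the boxcar construction and haemodynamic convolution used in the actual fMRI analysis are left out.

```python
import numpy as np

def glm3_columns(sv, av, gaze_left):
    """Stack SV, AV, the gaze dummy, and the two gaze interactions."""
    sv = np.asarray(sv, dtype=float)
    av = np.asarray(av, dtype=float)
    gaze = np.asarray(gaze_left, dtype=float)   # left = 1, right = 0
    return np.column_stack([sv, av, gaze, sv * gaze, av * gaze])

# Hypothetical values for three samples of one trial.
design = glm3_columns(sv=[3, -5, 4], av=[0, 3, -2], gaze_left=[1, 0, 1])
# Interaction columns are zero whenever gaze is on the right lottery.
```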

For the first hypothesis we focused on the fixated vs. non-fixated contrast for sampled value (SV · gazeleft > SV · gazeright). Since this is a replication of Lim et al. (2011), we report the results of one-sided statistical tests. Here we found a marginally significant effect in the vmPFC (peak voxel: x = 4, y = 54, z = -2; one-sided p = 0.05; Fig. 7). We found no such effect in the striatum (peak voxel: x = -16, y = 18, z = -6; negative beta, one-sided p = 0.17).

Representation of gaze-weighted evidence.

The vmPFC shows a positive interaction between sample value (SV) and gaze location, while the pre-SMA shows a negative interaction between accumulated value (AV) and gaze location. Both results are consistent with gaze enhancing the value of fixated items.

We also looked for these effects in our integrator regions. While some of the effects were marginal, they were in the opposite direction from what we expected: pre-SMA (negative beta, peak voxel: x = 4, y = 8, z = 52; p = 0.14), IPS (negative beta, peak voxel: x = 34, y = -50, z = 46; p = 0.07), and dlPFC (negative beta, peak voxel: x = -46, y = 26, z = 18; p = 0.09).

For the second hypothesis, we focused on the fixated vs. non-fixated contrast for accumulated value (AV · gazeleft > AV · gazeright). Here we found that activity was marginally lower for fixated versus non-fixated lotteries in the pre-SMA (peak voxel: x = -6, y = 14, z = 58; p = 0.06), and non-significantly so in the IPS (peak voxel: x = 36, y = -60, z = 30; p = 0.12) (Fig. 6) and the dlPFC (peak voxel: x = 42, y = 28, z = 26; p = 0.17).

We also looked for these effects in our input regions. Here we found no effects in the vmPFC (peak voxel: x = 4, y = 58, z = -2; p = 0.13) and marginal effects in the striatum (peak voxel: x = 32, y = -10, z = -10; p = 0.07). If anything, these perhaps reflect a trace of the input activity from the previous sample.

Modeling a non-uniform temporal weighting function

While sequential sampling models like the DDM assume equal weighting of information during evidence accumulation, other models allow for information sampled at different time points to differentially impact choice. For example, information that arrives early (i.e., primacy) or late (i.e., recency) can preferentially influence decision-making (Usher & McClelland, 2001). This could especially be the case in our task where information has to be integrated over a long period of time.

To account for temporal biases during evidence accumulation, we fit participant data with a model that incorporates primacy and recency biases into the sequential sampling process (Galdo et al., 2022; Pooley et al., 2011). Within a trial t, the weight of sample i is determined by the following temporal weighting function:

where εp and εr are weights on primacy and recency, respectively, η is a lower bound on the weight of any sample, and Nt is the number of samples on trial t.

In the context of the current experiment, we assume that decision-makers accumulate evidence (AV) based on the sum of sampled evidence (SV) weighted by the temporal weighting function:

At the time of choice, the decision-maker chooses according to the logit function:

with inverse-temperature parameter β governing how strongly AVnon-uniform,t determines the selection of the higher value lottery.
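
The weighted accumulation and choice rule can be illustrated with the sketch below. It rests on assumptions: the temporal weights are supplied by the caller and are not computed from the primacy and recency parameters, the choice rule is taken to be a standard logistic function of β times the signed (left minus right) weighted AV, and the function names are hypothetical.

```python
import math

def weighted_accumulated_value(sv, weights):
    """Sum of sampled values, each scaled by its temporal weight."""
    if len(sv) != len(weights):
        raise ValueError("sv and weights must have the same length")
    return sum(w * v for w, v in zip(weights, sv))

def p_choose_left(av, beta):
    """Assumed logistic choice rule: P(left) = 1 / (1 + exp(-beta * av))."""
    x = beta * av
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form that avoids overflow for large negative x.
    return math.exp(x) / (1.0 + math.exp(x))

sv = [3, -5, 4]
uniform_av = weighted_accumulated_value(sv, [1.0, 1.0, 1.0])   # 2.0
recency_av = weighted_accumulated_value(sv, [0.0, 0.0, 1.0])   # 4.0
p_left = p_choose_left(uniform_av, beta=0.5)
```

With uniform weights the rule reduces to choosing on the plain sum of SV, whereas weights concentrated on the last sample make the final draw dominate the choice.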

For models incorporating visual attention, the only modification is to use SVGaze instead of SV.

Behavioral model results

We observed a substantial recency bias, with all participants showing εr > εp when using SV and all but one participant showing εr > εp when using SVGaze (Fig. 8) as inputs. Independent samples t-tests (with the Welch approximation to degrees of freedom) confirmed that participants’ εr parameters were significantly larger than their εp parameters, for both SV (t(36) = 12.54, p = 10^-14) and SVGaze (t(20) = 7.25, p = 10^-6) inputs. Goodness-of-fit measures based on the Bayesian Information Criterion (BIC) also favored a non-uniform temporal weighting function (No gaze: BIC = 1262; Gaze: BIC = 1593) over a uniform temporal weighting function (No gaze: BIC = 1307; Gaze: BIC = 1658). However, it is worth noting that the recency bias is surely over-estimated based on our parameter-recovery exercise (Methods) and the bias introduced by allowing participants to choose when to stop collecting evidence (Discussion).
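
For reference, the BIC comparison can be expressed in a few lines. The `bic` function uses the standard definition (number of parameters times the log of the number of observations, minus twice the log-likelihood); the dictionary only restates the BIC values reported above, and the variable names are hypothetical.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian Information Criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# BIC values as reported in the text.
reported_bic = {
    "no_gaze": {"non_uniform": 1262, "uniform": 1307},
    "gaze": {"non_uniform": 1593, "uniform": 1658},
}

# Positive differences favor the non-uniform temporal weighting function.
bic_advantage = {
    name: values["uniform"] - values["non_uniform"]
    for name, values in reported_bic.items()
}
# bic_advantage is {"no_gaze": 45, "gaze": 65}
```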

Non-uniform temporal weighting.

In both gaze-weighted and non-gaze-weighted models, participants showed stronger recency than primacy effects, both in terms of (A) the model parameters, and (B) the resulting temporal weighting functions averaged across all trials. Error bars are standard errors clustered by participant.

fMRI model results

To identify brain regions that encoded |AVnon-uniform|, we tested whether activation parametrically varied as a function of these new values using updated versions of the previously fit GLMs in the previously defined ROIs. Within each ROI, we used a FWE corrected threshold of p<0.05 and cluster-forming threshold of p<0.001 with 5000 permutations.

For GLM1 we again found significant correlations between |AVnon-uniform| and BOLD activity in the pre-SMA (peak voxel: x = 2, y = 18, z = 50; p = 0.0006), IPS (peak voxel: x = 46, y = -46, z = 42; p = 0.0004), and dlPFC (peak voxel: x = 42, y = 34, z = 28; p = 0.0002). Additionally, we found weaker correlations in the vmPFC (peak voxel: x = 4, y = 38, z = 24; p = 0.010) and striatum (peak voxel: x = 16, y = 16, z = -10; p = 0.032).

The results for GLM2 with gaze-weighted AVnon-uniform were very similar, with significant correlations in the pre-SMA (peak voxel: x = 2, y = 18, z = 50; p = 0.0008), IPS (peak voxel: x = 50, y = -40, z = 46; p = 0.0002), and dlPFC (peak voxel: x = 42, y = 34, z = 28; p = 0.0006), as well as a weaker correlation in the vmPFC (peak voxel: x = 6, y = 38, z = 24; p = 0.030).

For GLM3, where results were already marginal, the new analysis with |AVnon-uniform| yielded no significant clusters.

Discussion

In this paper we presented results from a simultaneous eye-tracking and fMRI study of value-based decision-making, using an expanded-judgment task where subjects sampled from, and then chose between, food lotteries. We found that the vmPFC, and to a lesser extent the striatum, represent sampled input values, and the pre-SMA, IPS, and dlPFC appear to compute accumulated values. We found that sampled value signals in vmPFC and striatum are modulated by gaze allocation (Lim et al., 2011; McGinty et al., 2016), and more importantly, we found that this gaze modulation extends to accumulated value signals in pre-SMA.

These results provide novel evidence for the neural mechanisms underlying the SSM process, as exemplified by the DDM, which appears to govern many types of decisions (Busemeyer, 1985; Ratcliff, 1978). The gaze modulation of the accumulated value signals in the pre-SMA provides critical evidence that this region indeed represents accumulated evidence, as opposed to unchosen values (Boorman et al., 2011; Kolling et al., 2016; Wittmann et al., 2016), decision conflict (Hunt et al., 2018; Kaanders et al., 2021; Kolling et al., 2012; Shenhav et al., 2014, 2016; Vassena et al., 2020), or time on task (Holroyd et al., 2018). While accumulated evidence is typically correlated with these other measures, we were able to dissociate them by taking advantage of the fact that accumulated evidence, but not the other measures, is modulated by gaze location. Value is amplified by gaze (Smith & Krajbich, 2019), leading to stronger value signals in the brain when the decision maker is looking at the higher value option. This is what we observed in the vmPFC and striatum for input values, and in the pre-SMA for accumulated values – consistent with an SSM account.

These findings were made possible by considering the role that visual attention plays in the decision process. While SSMs capture choice behavior and RTs extremely well, most do not consider the effects of attention. Attention is thought to shift over the course of the decision, amplifying the attended inputs and/or inhibiting the non-attended inputs (Diederich, 1997; Johnson & Busemeyer, 2005; Roe et al., 2001). These shifts in attention are reflected in eye-movements (Hoffman & Subramaniam, 1995), which affect choice outcomes (Fiedler et al., 2013; Fiedler & Glöckner, 2015; Folke et al., 2016; Glaholt & Reingold, 2009; Gwinn et al., 2019; Janiszewski et al., 2013; Jiang et al., 2016; Kim et al., 2012; Konovalov & Krajbich, 2016; Lopez-Persem et al., 2016; Orquin & Mueller Loose, 2013; Pärnamets et al., 2015; Polonio et al., 2015; Russo & Leclerc, 1994; Shi et al., 2012; Smith & Krajbich, 2018; Stewart et al., 2016; Vaidya & Fellows, 2015; Venkatraman et al., 2014; Wang et al., 2010; Sheng et al., 2020; Vanunu et al., 2021), and their effect on the choice process is captured by the attentional drift diffusion model (aDDM) and other related SSMs (Ashby et al., 2016; Fisher, 2021; Glickman et al., 2019; Jang et al., 2021; Krajbich et al., 2010; Li & Ma, 2021; Smith & Krajbich, 2019; Teoh et al., 2020; Westbrook et al., 2020; X. Yang & Krajbich, 2023; Zilker & Pachur, 2023).

Neural implementations of SSMs generally require at least two sets of neurons, one set to represent the current information from the stimuli and a second set to integrate that information over time. In the current study, the information being used to make decisions was subjective value (i.e., utility). A large body of work has implicated the vmPFC and striatum in representing value (Bartra et al., 2013; Boorman et al., 2009; Chib et al., 2009; Levy & Glimcher, 2011). Interestingly, our results confirm that these two regions do represent value information, but primarily just for the presented stimuli. The integrated value information, which is what ultimately determines the decision, is instead encoded in the pre-SMA, IPS, and dlPFC, regions that have received less attention in the literature.

By using eye-tracking, our study extends previous work connecting computational models to fMRI data. Our results align with Hare et al.’s (2011) proposed neural model in which the vmPFC provides inputs to the pre-SMA and IPS. While the Hare model was based on dynamic causal modeling results, our task provides a more direct test of the proposed neural network. Additionally, our eye-tracking data allows us to identify additional features of the network. First, we identify the striatum as an input region, a result that only appears in analyses that account for gaze-weighted value (Lim et al., 2011). Second, we find that the pre-SMA is sensitive to gaze-weighted accumulated value, while the IPS is not. The reason for this distinction between the pre-SMA and IPS is unclear, but it does suggest that the pre-SMA is more likely to be the final decision-making region, consistent with some recent studies (Juechems et al., 2017; Pisauro et al., 2017; Rodriguez et al., 2015; Rouault et al., 2019). Of course, the recruitment of the pre-SMA may be because our subjects made their decisions with a button press, which is supported by the correlations between accumulated value and motor cortex activity. Had our study required eye-movements to indicate a choice, we may very well have observed integrator activity in other regions such as the frontal eye fields or posterior parietal cortex (O’Connell et al., 2018).

The striatum activity is difficult to interpret. It showed no correlation with sampled or accumulated value in the models without gaze. However, once gaze was included, the results were somewhat contradictory. When analyzing gaze at the dwell level, we found greater striatal activity in response to accumulated values, but when gaze was used as a modifier to the true value of each item, we found greater striatal activity in response to the sampled input value. This may be due to the limitations of running a GLM on the dwell level, since our TR was 2.6 seconds and dwells lasted for 0.66 seconds (SE = 0.03 seconds). Additional research is needed to resolve this issue.

A major advantage of this study is its use of a task designed to slow down the decision process and force sequential integration of information. Such expanded judgment tasks have been used to study SSM assumptions in perceptual decision-making, more recently in combination with neural recordings, but mostly with electrophysiology in rats and monkeys (Brunton et al., 2013; Cisek et al., 2009; Gluth et al., 2012; Tsetsos et al., 2012; T. Yang & Shadlen, 2007).

On the other hand, one concern with longer decision times is that decision-makers might either under-weight (i.e., forget), or put too much weight on, early information. Our analysis of subjects’ temporal weighting functions did reveal a primacy effect, where the first sample in each trial was overweighted, as well as a recency effect, where the last samples in each trial were also overweighted. Nonetheless, using a temporal weighting function with these primacy and recency effects did not substantially change the conclusions from our fMRI analysis. While these analyses did reveal weak support for accumulator dynamics in the vmPFC and striatum, these results should be interpreted with caution because adding recency effects into the accumulated-value signal increased the correlation between |SV| and |AV| from 0.12 to 0.21. Moreover, the recency effect is surely overestimated due to well-known statistical artifacts (Mullett & Stewart, 2016). In short, because decision-makers tend to terminate the decision process after strong pieces of evidence, and because a strong piece of evidence will tend to have a large noise component, information that appears at the end of a trial will appear to have a stronger influence on choices than it should.
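The stopping-rule artifact described above can be illustrated with a toy simulation. This is a minimal sketch, not the analysis reported here: it assumes pure-noise Gaussian samples, a symmetric bound of 3, and a running-sum accumulator, and the function name and parameters are hypothetical. Because a trial ends only when a sample pushes the total across the bound, the final sample is always aligned with the choice, even though no sample carries any true preference.

```python
import numpy as np

def simulate_last_sample_bias(n_trials=2000, bound=3.0, max_samples=50, seed=0):
    """Compare choice-aligned evidence in the last sample vs. earlier samples.

    Assumption: samples are i.i.d. standard-normal noise (no true preference),
    and a choice is made as soon as the running sum reaches +/- bound.
    """
    rng = np.random.default_rng(seed)
    last, earlier = [], []
    for _ in range(n_trials):
        total, samples = 0.0, []
        for _ in range(max_samples):
            s = rng.normal()
            samples.append(s)
            total += s
            if abs(total) >= bound:
                break
        else:
            continue  # bound never reached: no choice, skip this trial
        choice = np.sign(total)
        last.append(choice * samples[-1])            # final sample, aligned to choice
        earlier.extend(choice * x for x in samples[:-1])
    return float(np.mean(last)), float(np.mean(earlier))

mean_last, mean_earlier = simulate_last_sample_bias()
print(f"choice-aligned mean, last sample: {mean_last:.2f}")
print(f"choice-aligned mean, earlier samples: {mean_earlier:.2f}")
```

In this sketch the last sample appears to carry more choice-consistent evidence than earlier samples purely because of the stopping rule, which is the sense in which an estimated recency effect can be inflated.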

Taken together, we separated sampled input values from the overall decision value, or accumulated value, and found a network of brain regions that are involved in an aDDM-like choice process. This process involves passing sampled input values to an integrator which responds to not only the values themselves, but also to the gaze-modulated values. These results indicate that gaze effects on value representations are not epiphenomenal; rather, they reflect how gaze is incorporated into the decision process, affecting how we perceive the value of each option over the course of the entire decision.

Materials and Methods

Experimental Design and Statistical Analyses

Subjects

Twenty-eight undergraduate students at The Ohio State University participated in this study. Due to time constraints, 4 subjects were unable to finish all 3 runs of the scan. One subject was excluded for not choosing in line with their ratings. This left 23 subjects in the analysis (14 men, 9 women, average age: 22.61). Another 3 subjects could not be calibrated on the eye-tracker and so were discarded from any analyses involving eye tracking (leaving 13 men, 7 women, average age: 22.9). All subjects were right-handed, had normal or corrected-to-normal vision, and no history of neurological disorders. This study was approved by The Ohio State Biomedical Sciences IRB.

Stimuli and Tasks

Rating Task: Outside of the scanner, subjects rated 148 food items using a continuous rating scale from -10 (extreme dislike) to 10 (extreme like), with 0 being indifference towards an item. To choose their preferred rating, subjects moved the mouse across the rating scale and then clicked the left mouse button when the cursor was at their desired value for the item. We used an incentivization procedure for these ratings. There was a 50% chance that the rating task would be used to determine the subject’s reward. In such cases, the computer would randomly select two foods and the subject would receive the one with the higher rating. If both items were rated negatively, the subject would not receive any food. Negative values (-10 to 0) were excluded from the choice task except for two subjects who did not have enough positively valued items.

Choice Task: Once in the MRI scanner, subjects chose between pairs of lotteries. Lotteries were constructed by creating 1000 potential lotteries of randomly selected items evenly split between having 3, 4, 5, or 6 items. Each item in a lottery was then assigned a probability of being drawn, with probabilities summing up to 1 in each lottery. For each of these lotteries we calculated its expected utility (EU) by multiplying the subjective value (i.e., rating) of each item (V_i) in the lottery by its associated probability (P_i) of being drawn, and summing the results:

$$EU = \sum_{i=1}^{N} P_i V_i,$$

where N is the total number of items in that lottery. Subjects were not told these probabilities, nor were they told which items were in each lottery – they had to learn this every trial.
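The expected-utility calculation described above can be sketched in a few lines. This is a minimal illustration only; the function name and the example lottery are hypothetical and are not taken from the study materials.

```python
from typing import Sequence

def expected_utility(values: Sequence[float], probs: Sequence[float]) -> float:
    """Expected utility of a lottery: the sum over items of P_i * V_i."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in zip(probs, values))

# Hypothetical 3-item lottery: ratings on the -10..10 scale and draw probabilities.
eu = expected_utility([8.0, 2.0, 5.0], [0.5, 0.25, 0.25])
print(eu)  # 0.5*8 + 0.25*2 + 0.25*5 = 5.75
```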

We tracked subjects’ eye movements in the scanner using an EyeLink 1000 Plus (SR Research) sampling at 500 Hz. Eye position was monitored with the camera and infrared source reflected in the mirror attached to the head coil. The eye tracker was calibrated at the beginning of the session.

Food items were sampled from the lotteries and presented on the screen one pair at a time. Each draw was presented for 2 seconds, followed by a fixation cross for 2-6 seconds. This process repeated until the subject made a choice. Once a subject was ready to choose a lottery, they used the index finger of their left (right) hand to press a button corresponding to the left (right) lottery. They were then presented with a random item from the lottery they had chosen (on the same side of the screen as the chosen basket) for 2 seconds, indicating the food they would receive from this trial, should it be randomly selected at the end of the study (Fig. 1).

We constructed each trial’s sequence of items pseudo-randomly to minimize the correlation between the sampled value signal (|SV|; i.e., the absolute difference in sampled input values) and the accumulated value signal (|AV|; i.e., the absolute difference in accumulated values). For the first draw in each trial, sampled and accumulated value signals are equal. On subsequent draws, the SV diverges from the AV signal, yielding two distinct time courses to look for in the fMRI data (Fig. 2). Across subjects, |SV| and |AV| had an average correlation of 0.23 (SD = 0.15, min = 0.11, max = 0.43), while |SV| and lagged |AV| (i.e., the variables in our GLMs) had an average correlation of 0.12 (SD = 0.10, min = -0.01, max = 0.24).
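To make the two time courses concrete, the sketch below builds |SV|, |AV|, and lagged |AV| for one hypothetical trial. It assumes that accumulated value is a running sum of the sampled value differences and that the lagged signal is the accumulated value before the current draw (zero on the first draw); both are assumptions for illustration, and the function name and example values are hypothetical.

```python
import numpy as np

def value_signals(left, right):
    """Return |SV|, |AV|, and lagged |AV| for one trial's sequence of draws."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    sv = left - right                               # sampled value difference per draw
    av = np.cumsum(sv)                              # assumed: running sum of differences
    lagged_av = np.concatenate(([0.0], av[:-1]))    # accumulated value before each draw
    return np.abs(sv), np.abs(av), np.abs(lagged_av)

# Hypothetical ratings of the items drawn from the left and right lotteries.
left = [3.0, 1.0, 6.0, 2.0]
right = [1.0, 4.0, 2.0, 5.0]
abs_sv, abs_av, abs_lag = value_signals(left, right)

# On the first draw the sampled and accumulated signals coincide; later they diverge.
print(abs_sv, abs_av, abs_lag)
# Correlation between the two GLM-style regressors for this one trial.
print(np.corrcoef(abs_sv, abs_lag)[0, 1])
```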

The task structure was designed to incentivize subjects to average 7 samples per trial. They had 45 minutes to make 60 choices, and any trials that they did not complete by the end of the experiment were made for them randomly by the computer. Any trials they completed beyond 60 were simply added to the pool of potentially rewarded trials. At the end of the study, there was a 50% chance that one of the choice trials would be randomly selected for payment (otherwise the rating task was selected for payment), in which case the subject received the corresponding food from that trial.

Outside of the scanner, subjects first completed a 5-minute practice section where they chose between baskets made up of cars. After each trial, they were given feedback on how many samples they took and were reminded that the goal was to take 7 samples on average. These choices were not incentivized.

Temporal-weighting-function model fits

We estimated model parameters using an iterative maximum a posteriori (MAP) approach (Huys et al., 2011; Wittmann et al., 2020). This method improves upon maximum likelihood estimation (MLE) by simultaneously estimating parameters at both the subject- and group-level. This hierarchical procedure constrains subject-level parameters and reduces the influence of outlier data.

Group-level parameters were initialized with uninformative Gaussian priors with a mean of 0.1 and a variance of 100. For all models, η was held constant at 1. During the expectation step, we estimated model parameters (εp, εr, β) for each participant using MLE and calculated the log-likelihood of their choices given the model parameters. During the maximization step, we calculated the maximum posterior probability based on the observed choices and prior group-level parameters, and then updated the group-level parameters to generate posterior parameter estimates. These posterior parameter estimates were then used as the priors in subsequent steps in this procedure. We repeated the expectation and maximization steps until the posterior likelihood, summed over group-level parameters, changed by less than 0.0001 from the previous iteration (up to a maximum of 800 iterations). During this procedure, bounded free parameters were transformed from Gaussian space to native model space using link functions (e.g., a sigmoidal function for εp and εr) to ensure accurate estimation near the bounds.
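The core of a MAP fit of this kind is a subject-level objective that adds a Gaussian log-prior, defined by the current group-level mean and variance, to the choice log-likelihood. The sketch below shows only that objective and a sigmoid link; it is not the fitting code used here. The function names are hypothetical, and the toy Bernoulli likelihood stands in for the temporal-weighting-function model, which is not reproduced.

```python
import math
from typing import Callable, Sequence

def sigmoid(x: float) -> float:
    """Link function: maps an unbounded Gaussian-space value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neg_log_posterior(theta: Sequence[float],
                      group_mean: Sequence[float],
                      group_var: Sequence[float],
                      log_lik: Callable[[Sequence[float]], float]) -> float:
    """Negative of [log p(choices | theta) + log N(theta; group_mean, group_var)]."""
    log_prior = sum(
        -0.5 * math.log(2.0 * math.pi * v) - (t - m) ** 2 / (2.0 * v)
        for t, m, v in zip(theta, group_mean, group_var)
    )
    return -(log_lik(theta) + log_prior)

# Toy stand-in likelihood (hypothetical): 7 of 10 choices favor option A,
# with choice probability given by the sigmoid of a single parameter.
def toy_log_lik(theta: Sequence[float]) -> float:
    p = sigmoid(theta[0])
    return 7 * math.log(p) + 3 * math.log(1.0 - p)

# Wide prior matching the initialization described above (mean 0.1, variance 100).
near_mode = neg_log_posterior([0.85], [0.1], [100.0], toy_log_lik)
far_away = neg_log_posterior([3.0], [0.1], [100.0], toy_log_lik)
print(near_mode, far_away)
```

In a full procedure, an optimizer would minimize this objective per subject, the group-level mean and variance would be updated from the subject-level estimates, and the two steps would alternate until the stopping rule is met.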

We assessed parameter recovery of this model on simulated choices using best-fitting model parameters. We then refit the simulated choices using the same MAP process described above. We found strong Pearson’s r correlations between the generated and estimated parameter values (r > .8; Fig. 9), though both parameters were systematically over-estimated.

Fig. 9. Correlation plots illustrating parameter recovery.

MRI Data Acquisition

MRI scanning was carried out at the OSU Center for Cognitive and Behavioral Brain Imaging. We used a 3T Siemens Magnetom Prisma scanner with a 32-channel head array coil to collect the neural data. Functional data were acquired with a T2*-weighted gradient-echo sequence (48 slices, interleaved, with a field of view of 155×x1554, with an in-plane resolution of 3 mm isotropic and a 3 mm slice gap, TR = 2600 ms, TE = 28 ms, 80° flip angle). Slices were oriented such that the anterior side of the acquisition was raised dorsally by 30 degrees compared to the line formed by joining the anterior commissure to the posterior commissure. A high-resolution MPRAGE anatomical scan (256 slices, field of view 22×x256, with an in-plane resolution of 1 mm and no slice gap, TR = 1900 ms, TE = 4.44 ms, 12° flip angle) was also acquired for each participant. Each participant was scanned in one 1.5-hour session, which included the three experimental runs (15 minutes each) and the high-resolution MPRAGE anatomical scan. Additionally, a resting state scan (5 minutes) and a DTI scan were acquired, but these data are not presented here. Stimuli were presented using Psychtoolbox (Brainard, 1997; Kleiner et al., 2007; Pelli, 1997) for MATLAB (MATLAB and Statistics Toolbox, 2016) and displayed with a DLP projector onto a screen mounted in the rear of the scanner bore.

MRI Preprocessing and analyses

Statistical parametric mapping (SPM12, Update Rev. Nr. 6905; Functional Imaging Laboratory, University College London) was used to carry out the preprocessing of fMRI data. First, we corrected for the different slice times per echo planar image (EPI) across the total volume (using the bottom slice as a reference) and then realigned each volume in a run to the mean EPI volume from that run. Next, the anatomical scan was coregistered with the MNI average of 152 brains template, and the mean EPI per run was used to coregister all functional scans to this coregistered anatomical scan. In order to warp the EPIs to MNI space, SPM12’s normalise function was applied to the coregistered anatomical scan and the resulting warping parameters were applied to the coregistered EPIs. The resulting images were smoothed using an isotropic Gaussian kernel (8 mm full width at half maximum). First-level GLMs were run using SPM on each subject individually, including contrasts of interest. Runs were combined using fslmerge. We then used FSL’s randomise function to run second-level, non-parametric significance tests with threshold-free clustering and family-wise error (FWE) correction to find significant clusters for the described effects.

We used a canonical hemodynamic response function (HRF) without time derivatives. We modeled the noise as an AR(1) process. We additionally used a high-pass filter set to 128 s. We also used global signal normalization with a value of 0.8. We used no corrections for susceptibility distortions. All general linear models (GLMs) included variants of |ΔSV| and lagged |ΔAV|, either gaze weighted or not, interacted with boxcar functions covering each sample period (2 seconds) – see details below. In addition to the regressors of interest, each GLM contained the following nuisance terms: a stick function for trial number, a stick function for the button press onset modulated by lagged |ΔAV|, and a boxcar function during the feedback screen, modulated by the value of the received item. We also added motion parameter time series to account for variation due to motion.
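The way a parametric regressor of this kind is assembled can be sketched as follows: a 2-second boxcar at each sample onset is scaled by the modulator (e.g., |ΔSV|), convolved with an HRF, and sampled once per TR. SPM builds such regressors internally, so this is an illustration of the idea rather than the pipeline used here. The double-gamma HRF is an approximation with shapes 6 and 16 and a 1/6 undershoot ratio, the mean-centering of the modulator is an assumption, and all names and example onsets are hypothetical.

```python
import math
import numpy as np

TR = 2.6          # seconds, from the acquisition parameters
SAMPLE_DUR = 2.0  # seconds each draw stayed on screen

def double_gamma_hrf(t):
    """Approximate canonical HRF: gamma(shape 6) minus 1/6 of gamma(shape 16)."""
    t = np.asarray(t, dtype=float)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def modulated_regressor(onsets, modulator, n_scans, dt=0.1):
    """Boxcar x modulator, convolved with the HRF, sampled at each scan time."""
    n_fine = int(round(n_scans * TR / dt))
    t_fine = np.arange(n_fine) * dt
    mod = np.asarray(modulator, dtype=float)
    mod = mod - mod.mean()                      # assumed: mean-centered modulator
    boxcar = np.zeros(n_fine)
    for onset, m in zip(onsets, mod):
        active = (t_fine >= onset) & (t_fine < onset + SAMPLE_DUR)
        boxcar[active] = m
    hrf = double_gamma_hrf(np.arange(0, 32, dt))
    convolved = np.convolve(boxcar, hrf)[:n_fine] * dt
    scan_idx = np.round(np.arange(n_scans) * TR / dt).astype(int)
    return convolved[scan_idx]

# Hypothetical sample onsets (s) and |dSV| values for three draws.
reg = modulated_regressor(onsets=[0.0, 10.0, 20.0], modulator=[1.0, 3.0, 2.0], n_scans=20)
print(np.round(reg, 3))
```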

In GLM1, in addition to the nuisance terms, we included the following regressors: |ΔSV| and lagged |ΔAV|.

In GLM2, in addition to the nuisance terms, we included the following regressors: |ΔSVGaze| and lagged |ΔAVGaze|.

In GLM3, in addition to the nuisance terms, we included the following regressors: |ΔSV|, lagged |ΔAV|, and gaze location.

None of the regressors in the models were orthogonalized.

Prior to second-level statistical modeling, data were smoothed using a 6.0 mm3 FWHM Gaussian kernel. For the second-level analyses, we used permutation-based random-effects models to run one-sample t-tests across subjects. For significance testing, we used an FWE-corrected threshold of p < 0.05 and a cluster-forming threshold of p < 0.001 with 5000 permutations.
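The logic of a permutation-based one-sample t-test can be sketched with sign flipping: under the null hypothesis each subject's contrast value is symmetric around zero, so its sign can be flipped at random, and the maximum statistic across voxels in each permutation gives an FWE-corrected null distribution. The sketch below is a simplified voxelwise version for illustration; it does not implement cluster-level inference and is not FSL's randomise. The function name and simulated data are hypothetical.

```python
import numpy as np

def sign_flip_test(data, n_perm=1000, seed=0):
    """One-sample sign-flip permutation test with max-statistic FWE correction.

    data: (n_subjects, n_voxels) array of first-level contrast values.
    Returns the observed t-values and voxelwise FWE-corrected p-values.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_sub = data.shape[0]

    def t_stat(x):
        return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n_sub))

    t_obs = t_stat(data)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n_sub, 1))  # one sign per subject
        max_null[i] = t_stat(data * signs).max()
    # Share of permutations whose maximum t reaches the observed t (+1 for the observed).
    exceed = (max_null[None, :] >= t_obs[:, None]).sum(axis=1)
    p_fwe = (1 + exceed) / (n_perm + 1)
    return t_obs, p_fwe

# Simulated example: 20 subjects, 50 voxels, a true effect in voxel 0 only.
rng = np.random.default_rng(1)
toy = rng.normal(size=(20, 50))
toy[:, 0] += 3.0
t_obs, p_fwe = sign_flip_test(toy)
print(t_obs[0], p_fwe[0])
```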

Region of interest specifications

ROIs were based upon previously published brain atlas parcellations and relevant literature. We used the Harvard-Oxford atlas for the intraparietal sulcus (IPS) and striatum (Desikan et al., 2006). The dorsolateral prefrontal cortex (dlPFC) and pre-supplementary motor area (pre-SMA) were defined based on Hare et al. (2011). The ventromedial prefrontal cortex (vmPFC) was defined following Bartra et al. (2013).

Data Availability

Experiment and analysis code, as well as behavioral and eye-tracking data, are available on the Open Science Framework: https://osf.io/eyxvb/files/.

fMRI statistical maps are available on NeuroVault: upload in progress. Raw data are available upon request.

Acknowledgements

Thanks to Kareem Soliman for research assistance with recruiting and to the Cattell Sabbatical Fund for financial support.