Tuning the speedaccuracy tradeoff to maximize reward rate in multisensory decisionmaking
 Cited 19
 Views 1,765
 Annotations
Abstract
For decisions made under time pressure, effective decision making based on uncertain or ambiguous evidence requires efficient accumulation of evidence over time, as well as appropriately balancing speed and accuracy, known as the speed/accuracy tradeoff. For simple unimodal stimuli, previous studies have shown that human subjects set their speed/accuracy tradeoff to maximize reward rate. We extend this analysis to situations in which information is provided by multiple sensory modalities. Analyzing previously collected data (Drugowitsch et al., 2014), we show that human subjects adjust their speed/accuracy tradeoff to produce nearoptimal reward rates. This tradeoff can change rapidly across trials according to the sensory modalities involved, suggesting that it is represented by neural population codes rather than implemented by slow neuronal mechanisms such as gradual changes in synaptic weights. Furthermore, we show that deviations from the optimal speed/accuracy tradeoff can be explained by assuming an incomplete gradientbased learning of these tradeoffs.
https://doi.org/10.7554/eLife.06678.001Introduction
In the uncertain and ambiguous world we inhabit, effective decision making not only requires efficient processing of sensory information, but also evaluating when enough information has been accumulated to commit to a decision. One can make fast, but uninformed and thus inaccurate, decisions or one can elect to make slower, but wellinformed, choices. Choosing this socalled speedaccuracy tradeoff (SAT) becomes even more complex if several sensory modalities provide decisionrelated information. For example, the strategy for crossing a busy street will be very different in bright daylight, when one can rely on both eyes and ears to detect oncoming vehicles, as compared to complete darkness, in which case the ears will prove to be the more reliable source of information.
The SAT has been extensively studied for perceptual decisions based on information provided by a single sensory modality. For the most commonly studied visual modality, it has been shown that animals accumulate evidence nearoptimally over time (Kiani and Shadlen, 2009). In this context, the efficiency of the chosen SAT is assessed in comparison to diffusion models, a family of models that trigger decisions as soon as a drifting and diffusing particle reaches one of two bounds (Ratcliff, 1978). In these models, which describe the SAT surprisingly well despite their simplicity (Ratcliff, 1978; Palmer et al., 2005; Ratcliff and McKoon, 2008), the drift represents the available sensory information, and the diffusion causes variability in decision times and choices. The level of the bound controls the SAT, with a higher bound leading to slower, more accurate choices. Instructed changes to the SAT have been shown to be well captured by changes to only the bound in a diffusion model (Reddi and Carpenter, 2000; Reddi et al., 2003; Palmer et al., 2005). Without being explicitly instructed to make either fast or accurate decisions, welltrained human subjects are known to adjust their SAT to maximize their reward rate (Simen et al., 2009; Balci et al., 2011), or a combination of reward rate and choice accuracy (Bogacz et al., 2010). These SAT adjustments are also well captured by tuning the corresponding diffusion model bounds. Thus, we can define the SAT directly in terms of these bounds: a constant SAT refers to behavior predicted by diffusion models with constant bounds, and a SAT that changes across trials requires a diffusion model with bounds that vary on the same timescale.
Here, we extend the analysis of how human decisionmakers adjust their SAT to situations in which they receive information from multiple sensory modalities. We have previously shown that, even in the case of multiple modalities and timevarying evidence reliability, humans are able to accumulate evidence across time and modalities in a statistically nearoptimal fashion (Drugowitsch et al., 2014). This analysis was based on a variant of diffusion models that retains optimal evidence accumulation even for multiple sources of evidence whose reliability varies differentially over time. As we focused on evidence accumulation in that study, we were agnostic as to how the SAT varied across stimulus conditions; thus, we left the model bounds, which controlled the SAT, as free parameters that were adjusted to best explain the subjects' behavior.
In this followup study, we use the previously devised model to analyze whether and how effectively human subjects adjust their SAT if they have evidence from multiple modalities at their disposal. Specifically, we find that subjects adjust their SAT on a trial by trial basis, depending on whether the stimuli are unisensory or multisensory. Moreover, the changes in SAT result in reward rates that are close to those achievable by the besttuned model, a finding that is robust to changes in assumptions about how the reward rate is computed. Finally, we demonstrate that small deviations from the optimal SAT seem to stem from an incomplete reward rate maximization process. Overall, our findings hint at decisionmaking strategies that are more flexible than previously assumed, with SATs that are efficiently changed on a trialbytrial basis.
Results and discussion
Our analysis is based on previously reported behavioral data from human subjects performing a reactiontime version of a heading discrimination task based on optic flow (visual condition), inertial motion (vestibular condition), or a combination of both cues (combined condition) (Drugowitsch et al., 2014). Reliability of the visual cue was varied randomly across trials by changing the motion coherence of the optic flow. Subjects experienced forward translation with a small leftward or rightward deviation, and were instructed to report as quickly and as accurately as possible whether they moved leftward or rightward relative to straight ahead.
First, we ask whether subjects can adjust their SAT from trial to trial. Having related changes in the SAT to changes in diffusion model bounds, this is akin to asking if their behavior could arise from a diffusion model with a bound that changes on a trialbytrial basis. Our diffusion model necessitates the use of a scaled bound, which is the constant actual bound per modality divided by the diffusion standard deviation that depends on optic flow coherence. The use of such a scaled bound prohibits us from fitting actual bound levels, but rather scaled versions thereof. For the same reason, we cannot unambiguously predict the behavior that would emerge from a model with actual bounds matched across modalities (i.e., a constant SAT). Therefore, we instead rely on a qualitative argument about how such matched bounds would be reflected in the relation between decision speed and accuracy across modalities.
As Figure 1A illustrates for subject B2, increasing the coherence of the optic flow caused subjects to make faster, more accurate choices. This pattern was similar if only the visual modality (solid blue lines, Figure 1A) or both modalities were present (solid red lines, Figure 1A). This result is qualitatively compatible with the idea that subjects used a single SAT within conditions in which the same modality (visual/vestibular) or modality combination (combined) provided information about heading. Within the framework of diffusion models with fixed actual bounds on the diffusing particle, such a single SAT predicts that, once the amount of evidence per unit time (in our case controlled by the coherence) increases, choices ought to be on average either faster, more accurate, or both in combination, but never slower or less accurate. However, our data violate this prediction, thus showing that the SAT changes across conditions. Consider, for example, the choice accuracy and reaction times of subject B2 in both the visualonly (top blue circle, Figure 1A) and combined condition (top red square, Figure 1A) trials at 70% motion coherence. Although the combined condition provides more evidence per unit time due to the additional presence of the vestibular modality, responses in the combined condition are less accurate than in the visualonly condition, violating the idea of a single SAT (that is, a fixed diffusion model bound) across conditions. The same pattern emerged across all subjects, whose choices in the combined condition were on average significantly less accurate than in the visual condition (for 70% coherence; onetailed Wilcoxon signedrank W = 54, p < 0.002). As these stimulus conditions were interleaved across trials, our results clearly indicate that subjects were able to change their SAT on a trialbytrial basis.
A less common variant of diffusion models bounds the posterior belief rather than the diffusing particle. In this case, changing the amount of evidence per unit time only affects the response time but not its accuracy, which remains unchanged. When increasing coherence, we observed a change of both response time and choice accuracy within each condition (Figure 1A), supporting a bounded diffusing particle rather than a bounded posterior belief. In rare cases, the two model variants predict the same behavior (Drugowitsch et al., 2012), but this is not the case in our context.
Next, we explore whether these adjustments in the SAT serve to maximize subjects' reward rate. Even though subjects did not receive an explicit reward for correct trials, we assumed that correct decisions evoke an internal reward of magnitude one. Therefore, we computed reward rate as the fraction of correct decisions across all trials, divided by the average time between the onset of consecutive trials. We proceed in two steps: first, we ask whether subjects have a higher reward rate across trials of the multisensory condition compared to both unimodal conditions. This is an important question because we have found previously that subjects accumulate evidence optimally across modalities (Drugowitsch et al., 2014), which implies that, with proper setting of the SAT, they should be able to obtain higher reward rates in the multisensory condition compared to the unimodal conditions. As shown in Figure 1B, reward rate is indeed greater, for all subjects, when both sensory modalities are presented than for either modality alone (both unimodal vs combined: Wilcoxon signedranks W = 0, p < 0.002). This confirms that subjects combined evidence across modalities to improve their choices.
We now turn to the question of whether subjects tune their SATs to maximize the reward rate. For this purpose, we focus on the reward rate across all trials rather than for specific stimulus conditions, as subjects might, for example, trade off decision accuracy in unimodal conditions with decision speed in the combined condition. To determine how close subjects were to maximizing their reward rate, we needed to compute the best achievable reward rate. To do this, we tuned the bounds of our modified diffusion model to maximize its reward rate, while keeping all other model parameters, including the nondecision times and choice biases, fixed to those resulting from fits to the behavior of individual subjects. As a starting point, we allowed bounds to vary freely for each stimulus modality and each motion coherence, to provide the greatest degrees of freedom for the maximization. As described further below, we also performed the same analysis with more restrictive assumptions. We call the reward rate resulting from this procedure the optimal reward rate. This reward rate was subjectdependent, and was used as a baseline against which the empirical reward rates were compared.
Figure 2A shows the outcome of this comparison. As can be seen, all but one subject featured a reward rate that was greater than 90% of the optimum, with two subjects over 95%. As a comparison, the best performance when completely ignoring the stimulus and randomly choosing one option at trial onset (i.e., all actual bounds set to zero) causes a significant 25–30% drop in reward rate (subjects vs random: Wilcoxon signedrank W = 55, p < 0.002). Thus, subjects featured nearoptimal reward rates that were significantly better than those resulting from rapid, uninformed choices.
Our analysis of the subjects' reward rate relative to the optimum is fairly robust to assumptions we make about how this reward rate and its optimum are defined. Thus far, we have assumed implicit, constant rewards for correct decisions and the absence of any losses for the passage of time or incorrect choices. However, accumulating evidence is effortful, and this effort might offset the eventual gains resulting from correct choices. In fact, previous work suggests that human decision makers incur such a cost, possibly related to mental effort, in the range of 0.1–0.2 units of reward per second for accumulating evidence (Drugowitsch et al., 2012). Importantly, this cost modulates both the subjects' and the optimal reward rate, causing the median reward rate across subjects to actually rise slightly to 95.4% and 95.1% (costs of 0.1 and 0.2) of the optimum value (Figure 2B, second and third columns), compared to the costfree median of 93.7%.
The optimal reward rates so far were obtained from a model in which we allowed independent bounds for each stimulus modality and each motion coherence, which implies that subjects can rapidly and accurately estimate coherence. Using instead the more realistic assumption (Drugowitsch et al., 2014) that bounds only vary across modalities while coherence modulates diffusion variance but not bound height, we reduce the number of parameters and thus degrees of freedom for reward rate maximization. As a result, subjects' reward rates relative to the optimum rise slightly (median 94.5%), where the optimal model is now restricted to use the same bound across all coherences (Figure 2B, fourth column). Furthermore, we have assumed the model to feature the same choice biases as the subjects. These biases reduce the probability of performing correct choices, and thus the reward rate, such that removing them from our model boosts the model's optimal reward rate. As a consequence, removing these biases causes a consistent drop in subjects' relative reward rate (Figure 2B, last two columns). Even then, reward rates are still around 90% of the optimum (median 87.8% and 88.7% for free and parametric bounds, respectively). If instead of featuring the observed behavior, subjects were to ignore the stimulus and randomly choose one option at trial onset, they would incur a significant drop in reward rate for all of the different assumptions about how we define this optimum (e.g., with/without accumulation cost, …) as outlined above (subject vs random, blue vs red in Figure 2B: Wilcoxon signedrank W = 55, p < 0.002, except cost 0.2: W = 54, p < 0.004).
Despite exhibiting nearoptimal reward rates, all subjects feature small deviations from optimality. These deviations may result from incomplete learning of the optimal SAT. We only provided feedback about the correctness of choices in early stages of the experiment, until performance stabilized, and subjects did not receive feedback during the main experiment. Nevertheless, subjects' speed/accuracy tradeoff remained rather stable after removing feedback, which includes all trials we analyzed. Thus, incomplete learning in the initial training period should be reflected equally in all of these trials. To test the incompletelearning hypothesis, we assumed that subjects adjusted their strategy in small steps by using gradientbased information about how the reward rate changed in the local neighborhood of the currently chosen bounds. For our argument, it does not matter if the gradientbased strategy was realized through stochastic trialanderror or more refined approaches involving analytic estimates of the gradient, as long as it involved an unbiased estimate of the gradient. What is important, however, is that such an approach would lead to faster learning along directions of steeper gradients (Figure 3A). As a result, incomplete learning should lead to nearoptimal bounds along directions having a steep gradient, but large deviations from the optimal bound settings along directions having shallow gradients.
To measure the steepness of the gradient for different nearoptimal bounds, we used the reward rate's curvature (that is, its second derivative) with respect to each of these bounds. If these bounds were set by incomplete gradient ascent, we would expect bounds associated with a strong curvature to be nearoptimal (red dimension in Figure 3A; large curvature, close to optimal bound in inset) and bounds in directions of shallow curvature to be far away from their optimum (blue dimension in Figure 3A; small curvature, distant from optimal bound in inset). In contrast, strongly mistuned bounds associated with a large curvature (points far away from either axis in Figure 3B) would violate this hypothesis. If we plot reward rate curvature against the distance between estimated and optimal bounds, the data clearly show the predicted relationship (Figure 3B). Specifically, reward rate curvature is generally moderate to strong in the vestibularonly and combined conditions, and most of these bounds are found to be nearoptimal. In contrast, curvature is rather low for the visual condition, and many of the associated bounds are far from their optimal settings. This is exactly the pattern one would expect to observe if deviations from optimality result from a prematurely terminated gradientbased learning strategy. This analysis rests on the assumption that the manner in which reward rate varies with changes in the bounds is well approximated by a quadratic function. If this were the case, then the estimated loss in reward rate featured by the subjects when compared to the tuned model should also be well approximated by this quadratic function. These two losses are indeed close to each other for most subjects (Figure 3C), thus validating the assumption.
Previous studies have suggested that deviations from optimal bound settings may arise if subjects are uncertain about the intertrial interval (Bogacz et al., 2010; Zacksenhouse et al., 2010). With such uncertainty, subjects should set their bound above that deemed to be optimal when the intertrial interval is perfectly known. A similar aboveoptimal bound would arise if subjects are either uncertain about the optimal bound, or have difficulty in maintaining their bounds at the same level across trials. This is because the reward rate drops off more quickly below than above the optimal bounds (Figure 4A). Thus, if the subject's bounds fluctuate across trials, or the subjects are uncertain about the optimal bounds, they should aim at setting their bounds above rather than below this optimum. Indeed, this would minimize the probability that the bound would fluctuate well below the optimal value, which would result in a very sharp drop in reward rate. However, our data indicate that, in contrast to previous findings from singlemodality tasks (Simen et al., 2009; Bogacz et al., 2010), subjects consistently set their bounds below the optimum level (Figure 4B). In other words, they make faster and less accurate decisions than predicted by either of the above considerations. Figure 1A (data vs tuned) illustrates an extreme case for subject B2, in which the best reward rate is achieved in some conditions by waiting until stimulus offset. While not always as extreme as shown for this subject, a distinct discrepancy between observed and reward ratemaximizing behavior exists for all subjects, and is a reflection of the fact that nearoptimal reward rates can be achieved with remarkably different joint tunings of reaction times and choice accuracy.
What are the potential neural correlates of the highly flexible decision bounds and associated SATs that are reflected in the subjects' behavior? One possibility is the observed bound on neural activity (Roitman and Shadlen, 2002; Schall, 2003; Churchland et al., 2008; Kiani et al., 2008) in the lateral intraparietal cortex in monkeys, an area that seems to reflect the accumulation of noisy and ambiguous evidence (Yang and Shadlen, 2007). It still needs to be clarified if similar mechanisms are involved in our experimental setup, in which we observed modalitydependent trialbytrial changes in the SAT. In contrast to suggestions from neuroimaging studies (Green et al., 2012), such trialbytrial changes are unlikely to emerge from slow changes in connectivity. A more likely alternative, that is compatible with neurophysiological findings, is a neuronal ‘urgency signal’ that modulates this tradeoff by how quickly it drives decisionrelated neuronal activity to a common decision threshold (Hanks et al., 2014). Although only observed for blocked designs, a similar modalitydependent urgency signal could account for the trialbytrial SAT changes of our experiment, and qualitatively mimic a change in diffusion model bounds. Currently, our model can only predict changes in scaled decision boundaries, which conflate actual boundary levels with the diffusion standard deviation. It does not predict how the actual bound level changes, which is the quantity that relates to the magnitude of such an urgency signal. In general, quantitatively relating diffusion model parameters to neural activity strongly depends on how specific neural populations encode accumulated evidence, which has only been investigated for cases that are substantially simpler (e.g., Kira et al., 2015) than the ones we consider here.
Further qualitative evidence for neural mechanisms that support trialbytrial changes in the SAT comes from monkeys performing a visual search task with different, visually cued, response deadlines (Heitz and Schall, 2012). Even though the different deadline conditions were blocked, analysis of FEF neural activity revealed a change in baseline activity that emerged already in the first trial of each consecutive block, hinting at flexible mechanisms that preemptively govern changes in SAT. In general, such changes in SAT are likely to emerge through orchestrated changes in multiple neural mechanisms, such as changes in baseline, visual gain, duration of perceptual processing, and the other effects observed by Heitz and Schall (2012), or through combined changes to perceptual processing and motor preparation, as suggested by Salinas et al. (2014).
The observed SATs support the hypothesis that gradientbased information is used by subjects during the initial training trials to try to learn the optimal bound settings. We do not make strong assumptions about exactly how this training information is used, and even a very simple strategy of occasional bound adjustments in the light of positive or negative feedback is, in fact, gradientbased (albeit not very efficient) (e.g., Myung and Busemeyer, 1989). The clearest example of a strategy that is not gradientbased is one that does not at all adjust the SAT, or one that does so randomly, without regard to the error feedback that was given to subjects during the initial training period. Such strategies are not guaranteed to lead to the consistent curvature/bound distance relationship observed in Figure 3B. For a single speed/accuracy tradeoff, adjusting this tradeoff has already been thoroughly investigated, albeit with conflicting results (Myung and Busemeyer, 1989; Simen et al., 2006; Simen et al., 2009; Balci et al., 2011). Greater insight into the dynamics of learning this tradeoff will require further experiments that keep the task stable throughout acquisition of the strategy, and reduce the number of conditions and potential confounds to explain the observed changes in behavior.
In summary, we have shown that subjects performing a multisensory reactiontime task tune their SAT to achieve reward rates close to those achievable by the besttuned model. This nearoptimal performance is invariant under various assumptions about how the reward rate is computed, and is, even under the most conservative assumptions, in the range of 90% of the optimal reward rate. Deviations from optimality are unlikely to have emerged from a strategy of setting bounds to make them robust to perturbations. Instead, our data support the idea that decision bounds have been tuned by a gradientbased strategy. Such tuning is also in line with the observation of nearoptimal reward rates, which are unlikely to result from a random boundsetting strategy. Overall, our study provides novel insights into the flexibility with which human decision makers choose between speed and accuracy of their choices.
Materials and methods
Seven subjects (3 male) aged 23–38 years participated in a reactiontime version of a heading discrimination task with three different coherence levels of the visual stimulus. Of these subjects, three (subjects B, D, F; 1 male) participated in a followup experiment with six coherence levels. The sixcoherence version of their data is referred to as B2, D2, and F2. More details about the subjects and the task can be found in Drugowitsch et al. (2014). Not discussed in this reference is the intertrial interval, which is the time from decision to stimulus onset in the next trial. This interval is required to compute the reward rate, and was 6 s on average across trials.
Unless otherwise noted, we used a variant of the modified diffusion model described in Drugowitsch et al. (2014) to fit the subjects' behavior, and we tuned its parameters to maximize reward rates. Rather than using a constant decision bound for each modality and parameterizing how the diffusion variance depends on the coherence of visual motion (as in Drugowitsch et al., 2014), the model variant used here allowed for a separate bound/variance combination per modality and coherence. Thus, it featured 7 bound parameters for the 3coherence experiments, and 13 bound parameters for the 6coherence experiments. This variant was chosen to increase the model's flexibility when maximizing its reward rate. The original model variant with constant bounds and a changing variance led to qualitatively comparable results (Figure 1A, ‘tuned’, and Figure 2B, ‘parametric bounds’).
For each subject, we adjusted the model's parameters to fit the subject's behavior as in Drugowitsch et al. (2014), through a combination of posterior sampling and gradient ascent. Based on these maximumlikelihood parameters, we then found the model parameters that maximized reward rate by adjusting the bound/variance parameters using gradient ascent on the reward rate, while keeping all other model parameters fixed. To avoid getting trapped in local maxima, we performed this maximization 50 times with random restarts, and chose the parameters that led to the overall highest reward rate. When performing the maximization, we only modified the parameters controlling the bounds, while keeping all other parameters fixed to the maximumlikelihood values. The latter differed across subjects, such that this maximization led to different maximum reward rates for different subjects. For the ‘no bias’ variant in Figure 2B, we set the choice biases to zero before performing the reward rate maximization.
In all cases, the reward rate was computed as the fraction of correct choices across trials, divided by the average trial time, which is the time between the onsets of consecutive trials. Any nonzero evidence accumulation cost (Figures 2, 4) was first multiplied with the average decision time (that is, reaction time minus estimated nondecision time) across all trials, and then subtracted from the numerator.
Our argument about the speed of convergence of steepest gradient ascent is based on the assumption that bounds are updated according to ${\mathit{\theta}}^{n}={\mathit{\theta}}^{n1}+\alpha \nabla f\left({\mathit{\theta}}^{n1}\right)$, where ${\mathit{\theta}}^{n1}$ and ${\mathit{\theta}}^{n}$ are the bound vectors before and during the nth steepest gradient ascent step, $f\left(\mathit{\theta}\right)$ and $\nabla f\left(\mathit{\theta}\right)$ are the reward rate and its gradient for bounds $\mathit{\theta}$, and $\alpha $ is the step size. The speed of this procedure (i.e., the bound change between consecutive steps) depends for each bound on the size of the corresponding element in the reward rate gradient. For optimal bounds, this gradient is zero, which makes the gradient itself unsuitable as a measure of gradient ascent speed. Instead, we use the rate of change of this gradient close to the bounds $\widehat{\mathit{\theta}}$ estimated for individual subjects. This rate of change, called the curvature, is proportional to the gradient close to $\widehat{\mathit{\theta}}$, and therefore also proportional to the speed at which $\widehat{\mathit{\theta}}$ is approached. Closeto and at the optimal reward rate, which is a maximum, this curvature is negative. As we were more interested in its size than its sign, Figure 3 shows the absolute value of this curvature. We estimated this curvature at $\widehat{\mathit{\theta}}$ by computing the Hessian of $f\left(\widehat{\mathit{\theta}}\right)$ by finite differences (D'Errico, John [2006]. Adaptive Robust Numerical Differentiation. MATLAB Central File Exchange. Retrieved 3 July 2014), where we used the model that allowed for a different bound level per modality and coherence (7 and 13 bound parameters/dimensions for 3 and 6 coherence experiment, respectively). Before computing the distance between estimated and reward ratemaximizing bounds, we projected bound parameter vectors into the eigenspace of this Hessian, corresponding to the orientations of decreasing curvature strength. The absolute bound difference was then computed for each dimension (i.e., modality and coherence) of this eigenspace separately, with the corresponding curvature given by the associated eigenvalue (Figure 3B).
In Figure 3B, each bound dimension (i.e., modality and coherence, see figure legend) is associated with a different color. As described in the previous paragraph, this figure shows bound differences and curvatures not in the space of original bound levels, but rather in a projected space. To illustrate this bound coordinate transformation in the figure colors, we performed the same coordinate transform on the RGB values associated with each dimension, to find the colors associated with the dimensions of the projected space. The projected colors (filled cirles in Figure 3B plot) closely match the original ones (Figure 3B legend), which reveals that the curvature eigenspace is well aligned to that of the bound parameters. This indicates that the reward rate curvatures associated with each of the bound parameters, that is, each modality/coherence combination, are fairly independent. Due to the close match between projected and original colors, we do not mention the color transformation in the legend of Figure 3.
Our analysis is also valid if subjects do not follow the reward rate gradient explicitly. They could, for example, approximate this gradient stochastically on a stepbystep basis. As long as the stochastic approximation is unbiased, our argument still holds. One such stochastic approximation would be to test if a change in a single bound (corresponding to a single trial) improves the noisy estimate of the reward rate, that is, if $f\left({\mathit{\theta}}^{n}\right)>f\left({\mathit{\theta}}^{n1}\right)+\epsilon $, where only a single element (i.e., bound) is changed between ${\mathit{\theta}}^{n1}$ and ${\mathit{\theta}}^{n}$, and $\epsilon $ is zeromean symmetric random noise. In this case, larger changes, which are more likely to occur in directions of larger gradient, are more likely accepted. As a result, faster progress is made along steeper directions, which is the basic premise upon which our analysis is based.
To illustrate how the reward rate changed with bound height (Figure 4), we assumed that all (7 or 13) bound parameters varied along a straight line drawn from the origin to the reward ratemaximizing parameter settings. To project the maximumlikelihood bound parameters from the subject fits onto this line (dots in Figure 4A,B), we followed the isoreward rate contour from these parameters until they intersected with the line. We also tried an alternative approach by projecting these parameters onto the line by vector projection, which resulted in a change of the reward rate, but otherwise led to qualitatively similar results as those shown in Figure 4B. In both cases, the subjects' bound parameters were well below those found to maximize the reward rate.
References

1
Acquisition of decision making criteria: reward rate ultimately beats accuracyAttention, Perception & Psychophysics 73:640–657.https://doi.org/10.3758/s1341401000497

2
Do humans produce the speedaccuracy tradeoff that maximizes reward rate?Quarterly Journal of Experimental Psychology 63:863–891.https://doi.org/10.1080/17470210903091643

3
Decisionmaking with multiple alternativesNature Neuroscience 11:693–702.https://doi.org/10.1038/nn.2123
 4

5
The cost of accumulating evidence in perceptual decision makingThe Journal of Neuroscience 32:3612–3628.https://doi.org/10.1523/JNEUROSCI.401011.2012

6
Changes in neural connectivity underlie decision threshold modulation for reward maximizationThe Journal of Neuroscience 32:14942–14950.https://doi.org/10.1523/JNEUROSCI.057312.2012
 7
 8

9
Bounded integration in parietal cortex underlies decisions even when viewing duration is dictated by the environmentThe Journal of Neuroscience 28:3017–3029.https://doi.org/10.1523/JNEUROSCI.476107.2008
 10
 11

12
Criterion learning in a deferred decisionmaking taskAmerican Journal of Psychology 102:1–16.https://doi.org/10.2307/1423113
 13

14
Theory of memory retrievalPsychological Review 85:59–108.https://doi.org/10.1037/0033295X.85.2.59

15
The diffusion decision model: theory and data for twochoice decision tasksNeural Computation 20:873–922.https://doi.org/10.1162/neco.2008.1206420

16
Accuracy, information, and response time in a saccadic decision taskJournal of Neurophysiology 90:3538–3546.https://doi.org/10.1152/jn.00689.2002

17
The influence of urgency on decision timeNature Neuroscience 3:827–830.https://doi.org/10.1038/77739

18
Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time taskThe Journal of Neuroscience 22:9475–9489.
 19

20
Neural correlates of decision processes: neural and mental chronometryCurrent Opinion in Neurobiology 13:182–186.https://doi.org/10.1016/S09594388(03)000394

21
Rapid decision threshold modulation by reward rate in a neural networkNeural Networks 19:1013–1026.https://doi.org/10.1016/j.neunet.2006.05.038

22
Reward rate optimization in twoalternative decision making: empirical tests of theoretical predictionsJournal of Experimental Psychology. Human Perception and Performance 35:1865–1897.https://doi.org/10.1037/a0016926
 23

24
Robust versus optimal strategies for twoalternative forced choice tasksJournal of Mathematical Psychology 54:230–246.https://doi.org/10.1016/j.jmp.2009.12.004
Decision letter

Timothy BehrensReviewing Editor; Oxford University, United Kingdom
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
Thank you for sending your work entitled “Tuning the speedaccuracy tradeoff to maximize reward rate in multisensory decisionmaking” for consideration at eLife. Your article has been favorably evaluated by Timothy Behrens (Senior editor) and two reviewers, one of whom, Bruno Averbeck, has agreed to share his identity.
The editor and the reviewers discussed their comments before we reached this decision, and agree that the study potentially represents an important advance on your previous work.
For example:
“This manuscript is essentially a continuation of the work that these authors published recently in eLife on multisensory decision making, and as such is an appropriate update/complement to those results.
The current study presents model fits to the original psychophysical data specifically investigating whether the behavior of the subjects was consistent with a strategy to maximize the reward rate. This is indeed an important point because the answer is not entirely obvious given the original observations: in a reaction time (RT) paradigm, when subjects based their perceptual choices on information from two sensory modalities (visual and vestibular) rather than a single one, they generally chose more rapidly rather than more accurately. What the authors found was: (1) that the subjects' behavior across experimental conditions (difficulty x modality) was indeed consistent with rewardrate maximization, and (2) that the observed deviations from optimality were small and can be explained as the consequence of incomplete or imperfect learning.”
However, there are some key points that the reviewers would like addressed before we can reach a final decision.
Most critically, there are: (a) questions about how the model relate to the experimental data:
Reviewer 1:
The demonstration that the reward rate is higher in the multimodal than the singlemodality conditions is given in Figure 1B, and the demonstration that the SAT setting affects the reward rate is given in Figure 2, and there are two important issues about how these key results relate to each other.
In Figure 2 the baseline condition or null hypothesis is the set of red bars, which represent an extreme situation in which decisions are made randomly and as fast as possible; accuracy doesn't matter at all (presumably this corresponds to a zero, or very low bound). It is fine to show that, but the question is whether the SAT is tuned, not whether it exists at all. The relevant comparison is the condition in which the bound is not zero but just constant across conditions; i.e., when the SAT setting is simply fixed. The key piece of information is, how large is the discrepancy between the maximum reward rate obtainable with a variable SAT (i.e., adjustable bound) and one obtainable with a constant SAT (nonadjustable bound)?
A second, related issue, is how those model results square with the experimental data. In particular, suppose you take the bestfitting model with fixed bounds (constant SAT) and make a plot of the model results identical to that in Figure 1B. What will it look like? If it matches the real data, then no SAT tuning is necessary across conditions, and conversely, if it does not match the real data then it will provide compelling evidence that the SAT setting does vary across conditions. Then, assuming that the fixed bound is indeed insufficient (which is difficult to infer from the current results), a similar plots can be constructed for the bestfitting model with adjustable bounds. This would directly answer the question of whether the SAT setting is likely to be tuned based on the experimental data, regardless of whether the setting is the optimal one or not. This distinction would be very useful.
Reviewer 2:
1) What would the fraction correct look like in Figure 1A for the bound that optimizes reward rate?
2) The SAT is explicitly linked to the bound height in the Introduction and then referred to as SAT throughout. But this was slightly confusing. If one explicitly unpacks the speedaccuracy tradeoff then it's not clear what it means to use one for all conditions. Perhaps at times it would be useful to link it back to the bound, or to just say bound when talking about the SAT across conditions. This is particularly confusing because in the final paragraph, the paragraph starts by talking about adjusting the SAT from trialtotrial and later suggests that one SAT is used across conditions. If this is discussed in terms of the bound it makes sense, but discussing it in terms of the SAT is hard to follow. Also, during the modeling the bounds are allowed to vary by condition. So why is it suggested that a single bound is used? How much do the bounds vary by condition? How close to optimal do the subjects get if a single bound is used across modalities?
and (b) questions about potential circularities in the analysis:
Reviewer 2:
The gradient analysis seems circular. Or rather it seems like the results of the gradient analysis are consistent with the subjects doing relatively well. If they set parameters far from optimal in dimensions where the gradient has high curvature they would be quite suboptimal. In fact, it should be possible to relate the total deviation of the subject parameters from optimal, normalized by the hessian to the performance of the subject. In other words, take the difference between the optimal bound parameters and the subject's actual bound parameters (call this delta_b) and multiply them by the inverse of the Hessian. Specifically, delta_b * inv(H) * delta_b. Does this predict how well the subjects do?
However, it is also important that you address the following substantive issues:
The SAT is also confusing because it is in diffusion particle space and not in belief space. If the SAT was in belief space, and various assumptions are met (lack of side bias etc.) then accuracy should be on average the same across conditions, and the speed should be the only thing that changes as information increases, correct? This should perhaps be made more clear and developed in a bit more detail.
Isn't there an explicit link between increasing drift rate (i.e. information rate) and whether speed and/or accuracy both increases for a fixed bound?
Is there a significant difference between subject reward rate and random choices for the cost 0.2 condition? These appear to differ by the least amount.
https://doi.org/10.7554/eLife.06678.006Author response
Most critically, there are: (a) questions about how the model relate to the experimental data:
Reviewer 1:
The demonstration that the reward rate is higher in the multimodal than the singlemodality conditions is given in Figure 1B, and the demonstration that the SAT setting affects the reward rate is given in Figure 2, and there are two important issues about how these key results relate to each other.
In Figure 2 the baseline condition or null hypothesis is the set of red bars, which represent an extreme situation in which decisions are made randomly and as fast as possible; accuracy doesn't matter at all (presumably this corresponds to a zero, or very low bound). It is fine to show that, but the question is whether the SAT is tuned, not whether it exists at all. The relevant comparison is the condition in which the bound is not zero but just constant across conditions; i.e., when the SAT setting is simply fixed. The key piece of information is, how large is the discrepancy between the maximum reward rate obtainable with a variable SAT (i.e., adjustable bound) and one obtainable with a constant SAT (nonadjustable bound)?
A second, related issue, is how those model results square with the experimental data. In particular, suppose you take the bestfitting model with fixed bounds (constant SAT) and make a plot of the model results identical to that in Figure 1B. What will it look like? If it matches the real data, then no SAT tuning is necessary across conditions, and conversely, if it does not match the real data then it will provide compelling evidence that the SAT setting does vary across conditions. Then, assuming that the fixed bound is indeed insufficient (which is difficult to infer from the current results), a similar plots can be constructed for the bestfitting model with adjustable bounds. This would directly answer the question of whether the SAT setting is likely to be tuned based on the experimental data, regardless of whether the setting is the optimal one or not. This distinction would be very useful.
We fully agree that it would have been desirable to address the above questions regarding a constant vs. tuned SAT (and respective model bound) across conditions. Unfortunately, when fitting the diffusion model, we never fit the bound per condition, ϴ_{vis}, ϴ_{vest} , and ϴ_{comb}, directly. Instead, we only ever fitted the fraction of the bound over the diffusion standard deviation per condition, that is, ϴ_{vis}/ σ_{vis} (c), ϴ_{vest}/ σ_{vest}, and ϴ_{comb}/σ_{comb}(c). These fractions are a sufficient measure to predict reaction times and choice probabilities per condition. As a consequence, reaction times and choice probabilities only allow us to infer these fraction, but not the absolute bound magnitudes independent of the diffusion standard deviations.
This has two important consequences. First, the observed behavior (i.e. choices and reaction times) does not allow us to estimate the subjects’ bounds directly, such we cannot tell by how much they differ for the different conditions. Second, fixing these bounds ϴ_{j} in the model to be the same across conditions j does not sufficiently constrain this model, as we can rescale the diffusion standard deviations σ_{j} to achieve arbitrary factions ϴ_{j}/σ_{j} (c). In other words, setting these bounds to the same level across conditions is not sufficient to guarantee that the SAT is the same across conditions.
Furthermore, we cannot even relate the estimated fraction of bound over diffusion standard deviation, ϴ_{j}/σ_{j} (c), across conditions j, as we assume the diffusion standard deviation σ_{j} (c) to be a function of the coherence of the visual flow field. Therefore, it varies within conditions, which makes it unclear which coherence to pick as a baseline for comparison.
Overall, it becomes unclear which model parameters to fix to achieve a constant SAT, such that we could not find a modelbased definition for a “constant SAT”. For this reason we decided to fall back to using qualitative measures, such as faster, less accurate decisions (e.g. Figure 1A) as an indicator for a change in the SAT. The same applies for the baseline reward rate, where we used immediate, random choices, which—as the reviewer correctly points out—corresponds to all bounds set to zero.
Reviewer 2:
1) What would the fraction correct look like in Figure 1A for the bound that optimizes reward rate?
We have added this information to Figure 1A. It illustrates that the optimally tuned bounds cause slower and more accurate decisions than those featured by the subject. This confirms the analysis underlying Figure 4.
2) The SAT is explicitly linked to the bound height in the Introduction and then referred to as SAT throughout. But this was slightly confusing. If one explicitly unpacks the speedaccuracy tradeoff then it's not clear what it means to use one for all conditions. Perhaps at times it would be useful to link it back to the bound, or to just say bound when talking about the SAT across conditions. This is particularly confusing because in the final paragraph, the paragraph starts by talking about adjusting the SAT from trialtotrial and later suggests that one SAT is used across conditions. If this is discussed in terms of the bound it makes sense, but discussing it in terms of the SAT is hard to follow. Also, during the modeling the bounds are allowed to vary by condition. So why is it suggested that a single bound is used? How much do the bounds vary by condition? How close to optimal do the subjects get if a single bound is used across modalities?
The relation between SAT and bound was indeed unclear in the previous version of the manuscript. We now make this link more explicit, first at the end of the second paragraph in the Introduction, and then again in the second paragraph of Results and Discussion. There, we had accidentally stated that this SAT does not change across conditions, which was incorrect, and which we have now fixed.
Due to the reasons outlined further above, we could not relate the bound magnitudes across conditions. The same reasons forbid us to ask how well subjects would fare with a single bound across modalities.
and (b) questions about potential circularities in the analysis:
Reviewer 2:
The gradient analysis seems circular. Or rather it seems like the results of the gradient analysis are consistent with the subjects doing relatively well. If they set parameters far from optimal in dimensions where the gradient has high curvature they would be quite suboptimal. In fact, it should be possible to relate the total deviation of the subject parameters from optimal, normalized by the hessian to the performance of the subject. In other words, take the difference between the optimal bound parameters and the subject's actual bound parameters (call this delta_b) and multiply them by the inverse of the Hessian. Specifically, delta_b * inv(H) * delta_b. Does this predict how well the subjects do?
Regarding circularity, we agree that it seems counterintuitive to observe a bound distance vs. curvature pattern different from the one we show in Figure 3B. Deviations from the optimal bound settings in directions of strong curvature cause larger drops in the reward rate than deviations of similar magnitude in directions of weak curvature (illustrated below by a different spread of isoreward rate contours in different directions). Thus, if subjects feature closetooptimal reward rates, one would expect the bounds to be closetooptimal in directions of strong curvature. However, this might not be the case in directions of weak curvature, as bound mistunings in these directions do not strongly impact the reward rate. In other words, one would expect the bounds to be further from optimal in directions of weak curvature, as illustrated in the top panel in Author response image 1 (assuming twodimensional bounds, one dot per subject), just to achieve the observed reward rate. However, this result is not as obvious as it may appear. We could equally well take all these dots and move them along the ellipsoidal isoreward curves (along which the reward rate does not change) until they are aligned along directions of strong curvature (bottom panel in below figure). Thus, for the same closetooptimal reward rates we could have observed the opposite pattern—closer tooptimal bounds in directions of weak curvature rather than directions of strong curvature—but we didn’t. Therefore, closetooptimal reward rates do not necessarily predict the pattern we observe.
The second part of the comment seems to suggest a test for how close the reward rate is to a quadratic function. If this reward rate were perfectly quadratic, then the estimated reward rate loss should coincide with that predicted from the quadratic model. We have performed this analysis (see Figure 3C), and the results suggest that there is indeed a close match to the quadratic model, thus validating the implicit assumptions about the functional form of the reward rate that underlies our analysis.
However, it is also important that you address the following substantive issues:
The SAT is also confusing because it is in diffusion particle space and not in belief space. If the SAT was in belief space, and various assumptions are met (lack of side bias etc.) then accuracy should be on average the same across conditions, and the speed should be the only thing that changes as information increases, correct? This should perhaps be made more clear and developed in a bit more detail.
This is indeed correct and a good point that we omitted in the previous version of the manuscript. We now make clear that the bound is on the diffusing particle, and elaborate in a new paragraph in Results and discussion that this is only in rare cases the same as having a bound on the posterior belief.
Isn't there an explicit link between increasing drift rate (i.e. information rate) and whether speed and/or accuracy both increases for a fixed bound?
To our knowledge this link only exists explicitly for timeinvariant drift rates within individual trials, in which case there are analytic expressions for mean reaction time and choice probability. In our case, the drift rate changes within individual trials (due to timevarying velocity/acceleration), in which case these quantities needed to be determined numerically.
Is there a significant difference between subject reward rate and random choices for the cost 0.2 condition? These appear to differ by the least amount.
The difference is still significant. To make this clear, we have added:
“For all of the different assumptions about how we define this optimum as outlined above, if subjects were to randomly choose one option immediately at trial onset instead of featuring the observed behavior, they would incur a significant drop in reward rate (subject vs. random, blue vs. red in Figure 2B: Wilcoxon signedrank W=55, p<0.002, except cost 0.2: W=54, p<0.004)” to the relevant paragraph in Results and Discussion.
https://doi.org/10.7554/eLife.06678.007Article and author information
Author details
Funding
National Institutes of Health (NIH) (R01 DC007620)
 Dora E Angelaki
National Institutes of Health (NIH) (R01 EY016178)
 Gregory C DeAngelis
National Science Foundation (NSF) (BCS0446730)
 Alexandre Pouget
U.S. Department of Defense (Multidisciplinary University Research Initiative (N000140710937))
 Alexandre Pouget
Air Force Office of Scientific Research (FA95501010336)
 Alexandre Pouget
James S. McDonnell Foundation
 Alexandre Pouget
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
Experiments and DEA were supported by NIH grant R01 DC007620. GCD was supported by NIH grant R01 EY016178. AP was supported by grants from the National Science Foundation (BCS0446730), a Multidisciplinary University Research Initiative (N000140710937), the Air Force Office of Scientific Research (FA95501010336), and the James McDonnell Foundation.
Ethics
Human subjects: Human subjects: Informed consent was obtained from all participants and all procedures were reviewed and approved by the Washington University Office of Human Research Protections (OHRP), Institutional Review Board (IRB; IRB ID# 201109183). Consent to publish was not obtained in writing, as it was not required by the IRB, but all subjects were recruited for this purpose and approved verbally. Of the initial seven subjects, three participated in a followup experiment roughly 2 years after the initial data collection. Procedures for the followup experiment were approved by the Institutional Review Board for Human Subject Research for Baylor College of Medicine and Affiliated Hospitals (BCM IRB, ID# H29411) and informed consent and consent to publish was given again by all three subjects.
Reviewing Editor
 Timothy Behrens, Oxford University, United Kingdom
Publication history
 Received: January 29, 2015
 Accepted: June 18, 2015
 Accepted Manuscript published: June 19, 2015 (version 1)
 Version of Record published: July 1, 2015 (version 2)
Copyright
© 2015, Drugowitsch et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,765
 Page views

 413
 Downloads

 19
 Citations
Article citation count generated by polling the highest count across the following sources: Scopus, Crossref, PubMed Central.
Download links
Downloads (link to download the article as PDF)
Download citations (links to download the citations from this article in formats compatible with various reference manager tools)
Open citations (links to open the citations from this article in various online reference manager services)
Further reading

 Computational and Systems Biology
 Microbiology and Infectious Disease

 Cancer Biology
 Computational and Systems Biology