Putting perception into action with inverse optimal control for continuous psychophysics
Abstract
Psychophysical methods are a cornerstone of psychology, cognitive science, and neuroscience where they have been used to quantify behavior and its neural correlates for a vast range of mental phenomena. Their power derives from the combination of controlled experiments and rigorous analysis through signal detection theory. Unfortunately, they require many tedious trials and preferably highly trained participants. A recently developed approach, continuous psychophysics, promises to transform the field by abandoning the rigid trial structure involving binary responses and replacing it with continuous behavioral adjustments to dynamic stimuli. However, what has precluded wide adoption of this approach is that current analysis methods do not account for the additional variability introduced by the motor component of the task and therefore recover perceptual thresholds that are larger compared to equivalent traditional psychophysical experiments. Here, we introduce a computational analysis framework for continuous psychophysics based on Bayesian inverse optimal control. We show via simulations and previously published data that this not only recovers the perceptual thresholds but additionally estimates subjects’ action variability, internal behavioral costs, and subjective beliefs about the experimental stimulus dynamics. Taken together, we provide further evidence for the importance of including acting uncertainties, subjective beliefs, and, crucially, the intrinsic costs of behavior, even in experiments seemingly only investigating perception.
Editor's evaluation
This important article presents a Bayesian model framework for estimating individual perceptual uncertainty from continuous tracking data – a powerful and exciting alternative to traditional binarychoice psychophysical experiments. The model takes into account motor variability, action cost, and possible misestimation of the generative dynamics. The analyses provide compelling evidence for the framework. The article is clearly written and provides a didactic resource for students wishing to implement similar models on continuous action data.
https://doi.org/10.7554/eLife.76635.sa0eLife digest
Humans often perceive the world around them subjectively. Factors like light brightness, the speed of a moving object, or an individual's interpretation of facial expressions may influence perception. Understanding how humans perceive the world can provide valuable insights into neuroscience, psychology, and even people’s spending habits, making human perception studies important. However, these socalled psychophysical studies often consist of thousands of simple yes or no questions, which are tedious for adult volunteers, and nearly impossible for children.
A new approach called ‘continuous psychophysics’ makes perception studies shorter, easier, and more fun for participants. Instead of answering yes or no questions (like in classical psychophysics experiments), the participants follow an object on a screen with their fingers or eyes. One question about this new approach is whether it accounts for differences that affect how well participants follow the object. For example, some people may have jittery hands, while others may be unmotivated to complete the task.
To overcome this issue, Straub and Rothkopf have developed a mathematical model that can correct for differences between participants in the variability of their actions, their internal costs of actions, and their subjective beliefs about how the target moves. Accounting for these factors in a model can lead to more reliable study results. Straub and Rothkopf used data from three previous continuous psychophysics studies to construct a mathematical model that could best predict the experimental results. To test their model, they then used it on data from a continuous psychophysics study conducted alongside a classical psychophysics study. The model was able to correct the results of the continuous psychophysics study so they were more consistent with the results of the classical study.
This new technique may enable wider use of continuous psychophysics to study a range of human behavior. It will allow larger, more complex studies that would not have been possible with conventional approaches, as well as enable research on perception in infants and children. Brain scientists may also use this technique to understand how brain activity relates to perception.
Introduction
Psychophysical methods such as forcedchoice tasks are widely used in psychology, cognitive science, neuroscience, and behavioral economics because they provide precise and reliable quantitative measurements of the relationship between external stimuli and internal sensations (Gescheider, 1997; Wichmann and Jäkel, 2018). The tasks employed by traditional psychophysics are typically characterized by a succession of hundreds of trials in which stimuli are presented briefly and the subject responds with a binary decision, for example, whether target stimuli at different contrast levels were perceived to be present or absent.
The analysis of such experimental data using signal detection theory (SDT; Green and Swets, 1966) employs a model that assumes that a sensory signal generates an internal representation corrupted by Gaussian variability. A putative comparison of this stochastic signal with an internal criterion leads to a binary decision. These assumptions render human response rates amenable to analysis based on Bayesian decision theory, by probabilistically inverting the model of the decisiongenerating process. This provides psychologically interpretable measures of perceptual uncertainty and of the decision criterion, for example, for contrast detection. Thus, the power of classical psychophysics derives from the combination of controlled experimental paradigms with computational analysis encapsulated in SDT.
Psychophysical experiments and their analysis with SDT have revealed invaluable knowledge about perceptual and cognitive abilities and their neuronal underpinnings, and found widespread application to tasks as diverse as eyewitness identification and medical diagnosis (for recent reviews, see, e.g., Lynn and Barrett, 2014; Wixted, 2020). One central drawback, however, is that collecting data in such tasks is often tedious, as famously noted already by James, 1890. This leads to participants’ engagement levels being low, particularly in untrained subjects, resulting in measurements contaminated by additional variability (Manning et al., 2018). A common solution is to rely on few but highly trained observers to achieve consistent measurements (Green and Swets, 1966; Jäkel and Wichmann, 2006).
Recent work has suggested overcoming this shortcoming by abandoning the rigid structure imposed by independent trials and instead eliciting continuous behavioral adjustments to dynamic stimuli (Bonnen et al., 2015; Bonnen et al., 2017; Knöll et al., 2018; Huk et al., 2018). In their first study using continuous psychophysics, Bonnen et al., 2015, used a tracking task, in which subjects moved a computer mouse to track targets of different contrasts moving according to a random walk governed by linear dynamics with additive Gaussian noise. This experimental protocol not only requires orders of magnitude less time compared to traditional psychophysical methods, but participants report it to be more natural, making it more suitable for untrained subjects and potentially children and infants. Similar approaches have recently been used to measure contrast sensitivity (Mooney et al., 2018), eye movements toward optic flow (Chow et al., 2021), or retinal sensitivity (Grillini et al., 2021).
To obtain measures of perceptual uncertainty from the tracking data, Bonnen et al., 2015, proposed analyzing these data with the Kalman filter (KF; Kalman, 1960), the Bayesoptimal estimator for sequential observations with Gaussian uncertainty. The KF, however, models the perceptual side of the tracking task only, that is, how an ideal observer (Geisler, 1989) sequentially computes an estimate of the target’s position. Unfortunately, although the perceptual thresholds estimated with the KF from the tracking task were highly correlated with the perceptual uncertainties obtained with an equivalent classical forcedchoice task employing stimuli with the same contrast, Bonnen et al., 2015, reported that these were systematically larger by a large margin.
This is not surprising since every perceptual laboratory task involves some motor control component at least for generating a response, albeit to different degrees. While a detection task may require pressing a button, a reproduction task involves a motor control component, and a tracking task involves continuous visuomotor control, as mentioned by Bonnen et al., 2015. In addition to the problem of estimating the current position of the target, a tracking task encompasses the motor control problem of moving the finger, computer mouse, or gaze toward the target. This introduces additional sources of variability and bias. First, repeated movements toward a target exhibit variability (Faisal et al., 2008), which arises because of neural variability during execution of movements (Jones et al., 2002) or their preparation (Churchland et al., 2006). Second, a subject might trade off the instructed behavioral goal of the tracking experiment with subjective costs, such as biomechanical energy expenditure (di Prampero, 1981) or mental effort (Shenhav et al., 2017). Third, subjects might have mistaken assumptions about the statistics of the task (Petzschner and Glasauer, 2011; Beck et al., 2012), which can lead to different behavior from a model that perfectly knows the task structure, that is, ideal observers. A model that only considers the perceptual side of the task will, therefore, tend to overestimate perceptual uncertainty because these additional factors get lumped into perceptual model parameters, as we will show in simulations.
Here, we account for these factors by applying ideas from optimal control under uncertainty (see, e.g., Wolpert and Ghahramani, 2000; Todorov and Jordan, 2002; Shadmehr and MussaIvaldi, 2012, for reviews in the context of sensorimotor neuroscience) to the computational analysis of continuous psychophysics. Partially observable Markov decision processes (POMDPs; Åström, 1965; Kaelbling et al., 1998) offer a general framework for modeling sequential perception, and actions under sensory uncertainty, action variability, and explicitly include behavioral costs (Hoppe and Rothkopf, 2019). They can be seen as a generalization of Bayesian decision theory, on which classical analysis of psychophysics including SDT is based, to sequential tasks. Specifically, we employ the wellstudied linear quadratic Gaussian framework (LQG; Anderson and Moore, 2007), which accommodates continuous states and actions under linear dynamics and Gaussian variability. While the KF, being the Bayesoptimal estimator for a linearGaussian system, can be shown to minimize the squared error between the latent target and its estimate, the optimal control framework can flexibly incorporate different kinds of costs such as the intrinsic costs of actions. Thus, it can accommodate task goals beyond estimation as well as physiological and cognitive costs of performing actions.
Modeling the particular task as it is implemented by the researcher allows deriving a normative model of behavior such as ideal observers in perceptual science (Green and Swets, 1966; Geisler, 1989) or optimal feedback control models in motor control (Wolpert and Ghahramani, 2000; Todorov and Jordan, 2002). This classic task analysis at the computational level (Marr, 1982) can now be used to produce predictions of behavior, which can be compared to the actual empirically observed behavior. However, the fundamental assumption in such a setting is that the subject is carrying out the instructed task and that subject’s internal model of the task is identical to the underlying generative model of the task employed by the researcher. Here, instead, we allow for the possibility that the subject is not acting on the basis of the model the researcher has implemented in the experiment. Instead, we allow for the possibility that subjects’ cost function does not only capture the instructed task goals but also the experienced subjective cost such as physiological and cognitive costs of performing actions. Such an approach is particularly useful in a more naturalistic task setting, where subjective costs and benefits of behavior are difficult to come by a priori. Similarly, we allow subjects’ subjective internal model of stimulus dynamics to differ from the true model employed by the researcher. In the spirit of rational analysis (Simon, 1955; Anderson, 1991; Gershman et al., 2015), we subsequently invert this model of human behavior by developing a method for performing Bayesian inference of parameters describing the subject. Importantly, inversion of the model allows all parameters to be inferred from behavioral data and does not presuppose their value, so that, for example, if subjects’ actions were not influenced by subjective internal behavioral costs, the internal cost parameter would be estimated to be zero. This approach therefore reconciles normative and descriptive approaches to sensorimotor behavior as it answers the question which subjective internal model, beliefs, and costs best explain observed behavior.
We show through simulations with synthetic data that models not accounting for behavioral cost and action variability overestimate perceptual uncertainty when applied to data that includes these factors. We apply our method to data from three previously published experiments, infer perceptual uncertainty, and obtain overwhelming evidence through Bayesian model comparison with the widely applicable information criterion (WAIC; Watanabe and Opper, 2010) that this model explains the data better than models only considering perceptual contributions to behavioral data. Additionally, the method provides inferences of subjects’ action variability, subjective behavioral costs, and subjective beliefs about stimulus dynamics. Taken together, the methodology presented here improves current analyses and should be the preferred analysis technique for continuous psychophysics paradigms.
Results
Computational models of psychophysical tasks
In a classical psychophysics task (e.g., position discrimination; Figure 1A and B), the stimuli x_{t} presented to the observer are usually independent and identically distributed between trials. This allows for straightforward application of Bayesian decision theory: in any trial, the observer receives a stochastic measurement ${y}_{t}\sim p({y}_{t}{x}_{t},\mathit{\theta})$, forms a belief $p({x}_{t}{y}_{t},\mathit{\theta})\propto p({y}_{t}{x}_{t},\mathit{\theta})p({x}_{t},\mathit{\theta})$, and makes a single decision minimizing a cost function ${u}_{t}=\text{argmin}\int J({u}_{t},{x}_{t},\mathit{\theta})p({x}_{t}{y}_{t},\mathit{\theta})d{x}_{t}$. In SDT, for example, observations are commonly assumed to be Gaussian distributed and the cost function J assigns a value to correct and wrong decisions. These assumptions make it straightforward to compute the observer’s decision probabilities $p({u}_{t}{x}_{t})$ given parameters θ (e.g., sensitivity and criterion in SDT) and invert the model to infer those parameters from behavior, yielding a posterior distribution $p(\mathit{\theta}{u}_{1:T},{x}_{1:T})$ (see Figure 1C). Similarly, a psychometric curve can be interpreted as the decision probabilities of a Bayesian observer with a particular choice of measurement distribution, which determines the shape of the curve. The assumption of independence between trials is critical in the application of these modeling techniques.
Continuous psychophysics abandons the independence of stimuli between individual trials. Instead, stimuli are presented in a continuous succession. For example, in a positiontracking task, the stimulus moves from frame to frame and the observer’s goal is to track the stimulus with their mouse cursor (Figure 1D and E). In a computational model of a continuous task, the stimulus dynamics are characterized by a state transition probability $p({\mathbf{x}}_{t}{\mathbf{x}}_{t1},{\mathbf{u}}_{t1},\mathit{\theta})$ and can potentially be influenced by the agent’s actions ${\mathbf{u}}_{t1}$. As in the trialbased task, the agent receives a stochastic measurement ${\mathbf{y}}_{t}\sim p({\mathbf{y}}_{t}{\mathbf{x}}_{t},\mathit{\theta})$ and has the goal to perform a sequence of actions ${\mathbf{u}}_{1:T}=\text{argmin}\int J({\mathbf{u}}_{1:T},{\mathbf{x}}_{1:T})p({\mathbf{x}}_{1:T}{\mathbf{y}}_{1:T})d{\mathbf{x}}_{1:T}$, minimizing the cost function. Formally, this problem of acting under uncertainty is a POMDP and is computationally intractable in the general case (Åström, 1965; Kaelbling et al., 1998).
In targettracking tasks used in previous continuous psychophysics studies (Bonnen et al., 2015; Bonnen et al., 2017; Huk et al., 2018; Knöll et al., 2018), in which the target is on a random walk, the dynamics of the stimulus are linear and the subjects’ goal is to track the target. Tracking can be reasonably modeled with a quadratic cost function penalizing the separation between the target and the position of the tracking device. The variability in the dynamics and the uncertainty in the observation model can be modeled with Gaussian distributions. The resulting LQG control problem is a wellstudied special case of the general POMDP setting, which can be solved exactly by determining an optimal estimator and an optimal controller (Edison, 1971; Anderson and Moore, 2007). It is defined by a discretetime linear dynamical system with Gaussian noise,
a linear observation model with Gaussian noise,
and a quadratic cost function,
where Q defines the costs for the state (e.g., squared distance between the target and a mouse or hand) and R defines the cost of actions (biomechanical, cognitive, etc.).
For the LQG control problem, the separation principle between estimation and control holds (Davis and Vinter, 1985). This means that the optimal solutions for the estimator (KF) and controller (LQR) can be computed independently from one another. The KF (Kalman, 1960) is used to form a belief ${\widehat{\mathbf{x}}}_{t}$ about the current state based on the previous estimate and the current observation:
This belief is then used to compute the optimal action ${\mathbf{u}}_{t}$ based on the linear quadratic regulator (LQR, Kalman, 1964): ${\mathbf{u}}_{t}=L{\widehat{\mathbf{x}}}_{t}$. For the derivations of $K$ and $L$, see section ‘LQG control derivations’.
We consider four models, each of which is a generalization of the previous one. Importantly, this means that the most general model contains the remaining three as special cases. We introduce the models in the following paragraphs and describe their respective free parameters. For a complete specification of how these parameters need to be set to model a tracking task and how this gives rise to the matrices defining the linear dynamical systems for each model, see Appendix 2—table 1.
First, the ideal observer model in the tracking task is the KF as it is the optimal Bayesian sequential estimator of the latent, unobserved state of the system (Figure 2A). The state represents the current position of the target and there is no explicit representation of the agent’s response or action. The computational goal of the KF is to perform sequential inference about the position of the target given sensory measurements and an internal model of the target’s dynamics. As a proxy for a behavioral response, one can therefore use the best estimate ${\widehat{\mathbf{x}}}_{t}$ that corresponds to the target’s position. Because the ideal observer assumption implies that the task dynamics, that is, the standard deviation of the random walk are perfectly known to the subject, the KF has only one free parameter, the perceptual uncertainty σ. This parameter describes the uncertainty in the sensory observation of target’s true position.
Second, the optimal actor model maintains a representation of its own response, that is, the position of the mouse cursor in addition to the target’s position: the state space is now twodimensional. This allows us, in addition to the sensory uncertainty about the target, to model sensory uncertainty about the position of the cursor ${\sigma}_{p}$ (Figure 2B). It is now possible to explicitly model a behavioral goal different from optimal sensory inference, as in the case of the KF model. The behavioral goal is encapsulated in the cost function and for the optimal actor the cost function is fully determined by the task goal intended by the researcher, that is, the tracking of the target. Therefore, the cost function does not contain any free parameters in the optimal actor model. Different from the KF model that only carries out estimation, the optimal actor model generates a sequences of actions with their associated action variability ${\sigma}_{m}$. The optimal behavior of the model is now to move the cursor as closely as possible toward the estimated position of the target.
Third, the bounded actor model is based on the optimal actor model by also using a state representation involving the target and its own response as well as uncertain sensory observations of both these state variables. We use the term "bounded actor" as it is customary in parts of the cognitive science literature, where, according to Herbert Simon's definition (Simon, 1955), bounded rationality, which is related to the concept of resourcerationality, takes the limitations on cognitive capacity into a account. It is also a control model so that sequential actions are generated. The difference lies in the cost function, which now can additionally accommodate internal behavioral costs c (Figure 2C) that may penalize, for example, large actions and thus result in a larger lag between the response and the target. The control cost parameter c therefore can implement a tradeoff between tracking the target and expending effort (see Figure 2—figure supplement 1).
Finally, the subjective actor is a version of the bounded actor but now bases its decisions on an internal model of the stimulus dynamics that may differ from the true generative model employed by the experimenter. The causes for differences in the internal model may be sensory, perceptual, cognitive, or stemming from constraints on planning. For example, in an experimental design with a target moving according to a random walk on position (Bonnen et al., 2015) with a true standard deviation ${\sigma}_{\text{rw}}$, the subject could instead assume that the target follows a random walk on position with standard deviation ${\sigma}_{s}$ and an additional random walk on the velocity with standard deviation ${\sigma}_{v}$. These subjective assumptions about the target dynamics can, for example, lead the subjective actor to overshoot the target, for which they then have to correct (Figure 2D). For the subjective actor, the state space is threedimensional because it contains the target’s velocity in addition to the target and response positions (see Appendix 2). One could also formulate a version of any of the other models to include subjective beliefs. However, the subjective actor model is a generalization that includes all of these possible models. Accordingly, if the subjective beliefs played an important role but the costs or variability did not, we would infer the relevant parameters to have a value close to zero.
Bayesian inverse optimal control
The normative models described in the previous section treat the problem from the subject’s point of view. That is, they describe how optimal actors with different internal models and different goals as encapsulated by the cost function should behave in a continuous psychophysics task. The above normative models may give rise to different sequences of actions, which are the optimal solutions given the respective models with their associated parameters and computational goals. From the subject’s perspective, the true state of the experiment ${\mathbf{x}}_{t}$ is only indirectly observed via the uncertain sensory observation ${\mathbf{y}}_{t}$ (see Figure 1D). From the point of view of a researcher, the true state of the experiment is observed because it is under control of the experiment, for example, using a computer that presents the target and mouse position. The computational goal for the researcher is to estimate parameters θ that describe the perceptual, cognitive, and motor processes of the subject for each model, given observed trajectories ${\mathbf{x}}_{1:T}$ (Figure 1F). In the case of continuous psychophysics, these parameters include the perceptual uncertainty about the target σ and about one’s own cursor position ${\sigma}_{p}$, the control cost c, and the action variability ${\sigma}_{m}$, as well as parameters that describe how the subject’s internal model differs from the true generative model of the experiment employed by the researcher. These properties are not directly observed and can only be inferred from the subject’s behavior. Importantly, in the framework presented here, we subsequently need to carry out model comparison across the considered models potentially describing the subject as we do not assume to know the model describing the subject’s internal beliefs, costs, and model a priori.
To compute the posterior distribution according to Bayes’ theorem $p(\mathit{\theta}{\mathbf{x}}_{1:T})\propto p({\mathbf{x}}_{1:T}\mathit{\theta})p(\mathit{\theta})$, we need the likelihood $p({\mathbf{x}}_{1:T}\mathit{\theta})$, which we can derive using the probabilistic graphical model from the researcher’s point of view (Figure 1F). The graph’s structure is based on that describing the subject’s point of view. However, because different variables are observed, the decoupling of the perceptual and control processes no longer holds and we need to marginalize out the latent internal observations of the subject: $p({\mathbf{x}}_{1:T}\mathit{\theta})=\int ({\mathbf{x}}_{1:T},{\mathbf{y}}_{1:T}\mathit{\theta})d{\mathbf{y}}_{1:T}$. The section ‘Likelihood function’ describes how to do this efficiently. We compute posterior distributions using probabilistic programming and Bayesian inference (see section ‘Bayesian inference’). In the following, we show empirically through simulations involving synthetic data that the models’ parameters can indeed be recovered with Bayesian inverse optimal control.
Simulation results
To establish that the inference algorithm can recover the model parameters from ground truth data, we generated 200 sets of parameters from uniform distributions resulting in realistic tracking data (see Appendix 3, ‘Parameter recovery’ for details). For each set of parameters, we simulated datasets consisting of 10, 25, and 50 trials (corresponding to 2, 5, and 10 min) from the subjective actor model. Figure 3A shows one example posterior distribution. Average posterior means and credible intervals relative to the true value of the parameter are shown in Figure 3B. With only 2 min of tracking, the 95% posterior credible intervals of the perceptual uncertainty σ are [0.8, 1.3] relative to the true value on average. This means that the tracking data of an experiment lasting 2 min is sufficient to obtain posterior distributions over model parameters for which 95% of probability is within a range of 20% underestimation and 30% overestimation of the true values. As a comparison, we consider the simulations for estimating the width of a psychometric function in classical psychophysical tasks conducted by Schütt et al., 2015. With 800 forcedchoice trials (corresponding to roughly 20 min), the 95% posterior credible intervals are within [0.6, 1.6] relative to the true value on average. This suggests a temporal efficiency gain for continuous psychophysics of at least a factor of 10. These simulation results are in accordance with the empirical results of Bonnen et al., 2015, who had shown that the tracking task yields stable but biased estimates of perceptual uncertainty in under 2 min of data collection.
Behavior in a task with an interplay of perception and action such as a tracking task contains additional sources of variability and bias beyond sensory influences: repeated actions with the same goal are variable and the cost of actions may influence behavior. We model these factors as action variability ${\sigma}_{m}$ and control cost c, respectively, in the bounded actor model. A model without these factors needs to attribute all the experimentally measured behavioral biases and variability to perceptual factors, even when they are potentially caused by additional cognitive and motor processes. We substantiate this theoretical argument via simulations. We simulated 20 datasets with different values for c and ${\sigma}_{m}$ sampled from uniform ranges from the bounded actor model and the subjective actor model. Then, we fit each of the four different models presented in Figure 2 to the tracking data and computed posterior mean perceptual uncertainties. The results are shown in Figure 3C. The posterior means of σ from the bounded actor and subjective actor scatter around the true value, with the model used to simulate the data being the most accurate for estimating σ. The other two models (KF and optimal actor), which do not contain intrinsic behavioral costs, both overestimate σ. This overestimation is worse when the model does also not account for action variability (KF). Furthermore, Bayesian model comparison confirms that the model used to simulate the data is also the model with the highest predictive accuracy in all cases (Figure 3—figure supplement 1).
Continuous psychophysics
We now show that the ability of the optimal control models to account for control costs and action variability results in more accurate estimates of perceptual uncertainty in a targettracking task that are in closer agreement to the values obtained through an equivalent twointerval forcedchoice (2IFC) task. We reanalyze data from three subjects in two previously published experiments (Bonnen et al., 2015, Appendix 4, ‘Continuous psychophysics’ for details): a position discrimination and a positiontracking task. Both experiments employed the same visual stimuli: Gaussian blobs with six different widths within white noise backgrounds were used to manipulate the stimulus contrast (see Figure 4A). In the 2IFC discrimination task, the psychophysical measurements of perceptual uncertainty increased with decreasing contrast (Figure 4A, green lines). In the tracking experiment, the same visual stimuli moved on a Gaussian random walk with a fixed standard deviation and subjects tracked the target with a small red mouse cursor.
We fit all four models presented in Figure 2 to the data of the tracking experiment. To a first approximation, it is reasonable to assume that action variability, perceptual uncertainty about the mouse position, and control costs are independent of the target blob width, so that these parameters are shared across the different contrast conditions. The perceptual uncertainty σ should be different across contrasts, so we inferred individual parameters σ per condition. We assume ideal temporal integration over individual frames, including for the reanalysis of the KF model (see Appendix 4). We focus on the KF and the subjective actor in the following.
The posterior means of estimated perceptual uncertainty in both models were highly correlated with the classical psychophysical measurements (r > 0.88 for both models and all subjects). However, the parameters from the subjective actor model are smaller in magnitude than KF’s in all three subjects and all but one blob width condition (Figure 4A). The reason for this is that the KF attributes all variability to perceptual uncertainty, while the other models explicitly include cognitive and motor influences (see section ‘Simulation results’). The average factor between the posterior mean perceptual uncertainty in the continuous task and the 2IFC task in the five higher contrast conditions is 1.20 for the subjective actor, while it is 1.73 for the KF. Only in the lowest contrast condition, it increases to 2.20 and 2.93, respectively. Thus, accounting for action variability and control effort in our model leads to estimates of perceptual uncertainty closer to those obtained with the classical 2IFC psychophysics task. Note that a deviation between the two psychophysical tasks should not be surprising based on previous research comparing the thresholds obtained with different traditional paradigms, as we further discuss below.
Importantly, in addition to the perceptual uncertainty parameter, we also obtain posterior distributions over the other model parameters (Figure 4B). The three subjects differ in how variable their actions are (${\sigma}_{m}$) and how much they subjectively penalize control effort (c). Additionally, the subjective actor model infers subjects’ implicit assumptions about how the target blob moved. The model estimates that two of the three subjects assume a velocity component ${\sigma}_{v}$ to the target’s dynamics, although the target’s true motion according to the experimenter’s generative model does not contain such a motion component, while in the third participant this estimated velocity component is close to zero. Similarly, the model also infers the subjectively perceived randomness of the random walk, that is, the standard deviation ${\sigma}_{s}$ (Figure 4C), for each subject, which can be smaller or larger than the true standard deviation used in generating the stimulus.
As an important test, previous research has proposed comparing the autocorrelation structure of the tracking data with that of the model’s prediction (Bonnen et al., 2015; Knöll et al., 2018; Huk et al., 2018). We simulated data from both the KF and the subjective actor (20 trials, using the posterior mean parameter estimates) and compared it to the empirical data. To this end, we computed crosscorrelograms (CCGs, i.e., the crosscorrelation between the velocities of the target and the response, see section ‘Crosscorrelograms’), for each trial of the real data and simulated data from both models (Figure 4C). Indeed, the subjective actor captures more of the autocorrelation structure of the tracking data, quantified by the correlation between the CCGs of the model and those of the data, which was 0.61 for the KF and 0.86 for the subjective actor.
To quantitatively compare the models, we employ Bayesian model comparison using the WAIC (Watanabe and Opper, 2010, see section ‘Model comparison’). WAIC is computed from posterior samples and estimates the outofsample predictive accuracy of a model, thereby taking into account that models with more parameters are more expressive. Overwhelming evidence for the subjective actor model is provided by Akaike weights larger than 0.98 in all three subjects, with the weight for the KF model being equal to 0 within floating point precision. This means that the behavioral data was overwhelmingly more likely under the subjective actor than the KF model, even after taking the wider expressivity of the model into account. Such strong evidence in favor of one of the models is the consequence of the large number of data samples obtained in continuous psychophysics paradigms.
Application to other datasets
We furthermore applied all four models to data from an additional experiment in which subjects tracked a target with their finger using a motion tracking device. Experiment 2 in the study by Bonnen et al., 2017 involved five conditions with different standard deviations for the target’s random walk. These five conditions were presented in separate blocks (see Appendix 4, ‘Motion tracking’). Comparisons of CCGs for one subject are shown in Figure 5A (see Figure 5—figure supplement 1 for all subjects). Qualitatively, the subjective actor captures the crosscorrelation structure of the data better than the other models. To quantitatively evaluate the model fits, we again used WAIC (Figure 5B). In two of the three subjects, the subjective actor performs best, while in one subject the bounded actor with accurate stimulus model accounts better for the data. We also performed a crossvalidation analysis by fitting the models to four of the conditions and evaluating it in the remaining condition. Our reasoning was that the quality of different models in capturing participants’ behavior can be quantified by measuring the likelihood of parameters across conditions, that is, by crossvalidation, and in the consistency of estimated parameters, that is, their respective variance. The crossvalidation analysis provides additional support for the subjective model since it gives more consistent estimates of perceptual uncertainty and has higher log likelihood than the other models (Appendix 5—figure 1). Again, we can also inspect and interpret the other model parameters such as control costs, action variability, and subjective belief about target dynamics (Figure 5—figure supplement 2). Similarly, when applied to another experiment, in which one human participant, two monkeys, and a marmoset tracked optic flow fields with their gaze (Knöll et al., 2018), model selection favored the subjective actor (Appendix 5—figure 2).
Discussion
We introduce an analysis method for continuous psychophysics experiments (Bonnen et al., 2015; Knöll et al., 2018; Bonnen et al., 2017; Huk et al., 2018) that allows estimating subjects’ perceptual uncertainty, the quantity of interest in the analysis of traditional psychophysics experiments with SDT. We validated the method on synthetic data and demonstrated its feasibility for actions involving human and nonhuman primate participants’ tracking with a computer mouse (Bonnen et al., 2015), tracking by pointing a finger (Bonnen et al., 2017), and tracking using gaze (Knöll et al., 2018).
Importantly, we applied our method to experimental data from a manual tracking task, which were collected together with equivalent stimuli in a 2IFC paradigm for the purpose of empirical validation (Bonnen et al., 2015). The perceptual uncertainties inferred using inverse optimal control were in better agreement with the 2IFC measurements than those inferred using the previously proposed KF. In the lowest contrast conditions, all models we considered show a large and systematic deviation in the estimated perceptual uncertainty compared to the equivalent 2IFC task. Note that when considering synthetic data we did not see such a discrepancy. Thus, the observed bias points toward additional mechanisms such as a computational cost or computational uncertainty that are not captured by the current models at very low contrast. One reason for this could be that the assumption of constant behavioral costs across different contrast conditions might not hold at very low contrasts because subjects might simply give up tracking the target although they can still perceive its location. Another possible explanation is that the visual system is known to integrate visual signals over longer times at lower contrasts (Dean and Tolhurst, 1986; Bair and Movshon, 2004), which could affect not only sensitivity in a nonlinear fashion but could also lead to nonlinear control actions extending across a longer time horizon. Further research will be required to isolate the specific reasons. The analysis methods presented in this study are well suited to investigate these questions systematically because they allow modeling these assumptions explicitly and quantifying parameters describing subjects’ behavior.
Because subjects’ behavior is conceptualized as optimal control under uncertainty (Åström, 1965; Kaelbling et al., 1998; Anderson and Moore, 2007; Hoppe and Rothkopf, 2019), the optimal actor model additionally contains action variability and a cost function encapsulating the behavioral goal of tracking the target. The present analysis method probabilistically inverts this model similarly to approaches in inverse reinforcement learning (Ng and Russell, 2000; Ziebart et al., 2008; Rothkopf and Dimitrakakis, 2011) and inverse optimal control (Chen and Ziebart, 2015; Herman et al., 2016; Schmitt et al., 2017). The inferred action variability can be attributed to motor variability (Faisal et al., 2008) but also to other cognitive sources, including decision variability (Gold and Shadlen, 2007). We extended this model to a bounded actor model in the spirit of rational analysis (Simon, 1955; Anderson, 1991; Gershman et al., 2015) by including subjective costs for carrying out tracking actions. These behavioral costs correspond to intrinsic, subjective effort costs, which may include biomechanical costs (di Prampero, 1981), as well as cognitive effort (Shenhav et al., 2017), trading off with the subject’s behavioral goal of tracking the target.
The model was furthermore extended to allow for the possibility that subjects may act upon beliefs about the dynamics of the target stimuli that differ from the true dynamics employed by the experimenter. Such a situation may arise due to perceptual biases in the observation of the target’s dynamics. Model selection using WAIC (Watanabe and Opper, 2010) favored this bounded actor with subjective beliefs about stimulus dynamics in all four experiments involving human and monkey subjects with a single exception, in which the bounded actor model better accounted for the behavioral data. Thus, we found overwhelming evidence that subjects’ behavior in the continuous psychophysics paradigm was better explained by a model that includes action variability and internal costs for actions increasing with the magnitude of the response.
One possible criticism of continuous psychophysics is that it introduces additional unmeasured factors such as action variability, intrinsic costs, and subjective internal models. While classical psychophysical paradigms take great care to minimize the influence of these factors by careful experimental design, they can nevertheless still be present, for example, as serial dependencies (Green, 1964; Fischer and Whitney, 2014; Fründ et al., 2014). Similarly, estimates of perceptual uncertainty often differ between classical psychophysical tasks when compared directly. For example, Yeshurun et al., 2008, reviewed previous psychophysical studies and conducted experiments to empirically test the theoretical predictions of SDT between YesNo and 2IFC tasks. They found a broad range of deviations between the empirically measured perceptual uncertainty and the theoretical value predicted by SDT. Similarly, there are differences between 2AFC tasks, in which two stimuli are separated spatially, and their equivalent 2IFC tasks, in which the stimuli are separated in time (Jäkel and Wichmann, 2006). While the former task engages spatial attention, the latter engages memory. These factors are typically also not accounted for in SDTbased models. Thus, substantial empirical evidence suggests that factors often not modeled by SDT, such as attention, memory, intrinsic costs, beliefs of the subject differing from the true task statistics, and learning, nevertheless very often influence the behavior in psychophysical experiments. This applies to classical trialbased tasks as well as continuous tasks. Crucially, the framework we employ here allows modeling and inferring some of these additional cognitive factors responsible for some of the observed deviations.
On a conceptual level, the results of this study underscore the fact that psychophysical methods, together with their analysis methods, always imply a generative model of behavior (Green and Swets, 1966; Swets, 1986; Wixted, 2020). Nevertheless, different models can be used in the analysis, that is, models that may or may not be aligned with the actual generative model of the experiment as designed by the researcher or with the internal model that the subject is implicitly using. For example, while classical SDT assumes independence between trials and an experiment may be designed in that way, the subject may assume temporal dependencies between trials. The analysis framework we use here accounts for both these possibilities. A participant may engage in an experiment with unknown subjective costs and false beliefs, as specified in the subjective actor model. Similarly, this analysis framework also allows for the researcher to consider multiple alternative models of behavior and quantify both the uncertainty over individual models’ parameters as well as uncertainty over models through Bayesian model comparison.
The limitations of the current method are mainly due to the limitations of the LQG framework. For instance, the assumption that perceptual uncertainty is constant across the whole range of the stimulus domain is adequate for position stimuli, but would not be correct for stimuli that behave according to Weber’s law (Weber, 1834). This could be addressed using an extension of the LQG to signaldependent uncertainty (Todorov, 2005) to which our inverse optimal control method can be adapted, as we have recently shown (Schultheis et al., 2021). Similarly, the assumption of linear dynamics can be overcome using generalizations of the LQG framework (Todorov and Li, 2005) or policies parameterized with neural networks (Kwon et al., 2020). Finally, assuming independent noise across time steps at the experimental sampling rate of (60 Hz) is certainly a simplifying assumption. Nevertheless, the assumption of independent noise across time steps is very common both in models of perceptual inference as well as in models of motor control, and there is to our knowledge no computationally straightforward way around it in the LQG framework.
Taken together, the current analysis framework opens up the prospect of a wider adoption of continuous psychophysics paradigms in psychology, cognitive science, and neuroscience as it alleviates the necessity of hundreds of repetitive trials with binary forcedchoice responses in expert observers, recovering perceptual uncertainties closer to classical paradigms than previous analysis methods. Additionally, it extracts meaningful psychological quantities capturing behavioral variability, effort costs, and subjective perceptual beliefs and provides further evidence for the importance of modeling the behavioral goal and subjective costs in experiments seemingly only investigating perception as perception and action are inseparably intertwined.
Materials and methods
LQG control derivations
Request a detailed protocolFor the LQG control problem as defined in section ‘Computational models of psychophysical tasks,’ the separation principle between estimation and control holds (Davis and Vinter, 1985). This means that the optimal solutions for the estimator $K$ and controller $L$ can be computed independent from one another. Note that this does not mean that actions are independent of perception because ${\mathbf{u}}_{t}$ depends on ${\widehat{\mathbf{x}}}_{t}$.
The optimal estimator is the KF (Kalman, 1960)
where the Kalman gain ${K}_{t}={P}_{t}{C}^{T}{(C{P}_{t}{C}^{T}+W{W}^{T})}^{1}$ is computed using a discretetime Riccati equation
initialized with ${\widehat{\mathbf{x}}}_{0}=0$ and ${P}_{0}=V{V}^{T}$. Note that our analyses were insensitive to the initialization because state uncertainty and Kalman gains converged after a few time steps, while trials were about 1000 time steps long, which led to a negligible influence of the initial conditions. The optimal controller
is the LQR ${L}_{t}={({B}^{T}{S}_{t+1}B+R)}^{1}{B}^{T}{S}_{t+1}A$ and is also computed with a discretetime Riccati equation
initialized with ${S}_{T}=Q$. In principle, all matrices of the dynamical system and thus also $K$ and $L$ could be timedependent, but since this is not the case in the models considered in this work, we leave the time indices out for notational simplicity. Due to this time invariance of the dynamical system, the gain matrices ${K}_{t}$ and ${L}_{t}$ typically converge after a few time steps. This allows us to use the converged versions, which we call $K$ and $L$ in our inference method.
Crosscorrelograms
Request a detailed protocolAs a summary statistic of the tracking data, we use CCGs as defined by Mulligan et al., 2013. The CCG at time lag $\tau $ is the correlation between the velocities of the target ${v}_{t}^{\text{(target)}}$ at time $t$ and the velocity of the response ${v}_{t\tau}^{\text{(response)}}$ at time $(t\tau )$:
where $\text{cov}(x,y)$ and $s(x)$ are the covariance and standard deviation across all time steps and trials. The velocities are estimated from position data via finite differences.
Likelihood function
Request a detailed protocolDue to the conditional independence assumption implied by the graphical model from the researcher’s point of view (Figure 1F), the next state ${\mathbf{x}}_{t+1}$ depends on the whole history of the subject’s noisy observations ${\mathbf{y}}_{1:t}$ and internal estimates ${\widehat{\mathbf{x}}}_{1:t}$. Since, for the researcher, the subject’s observations ${\mathbf{y}}_{t}$ are latent variables, the Markov property between the ${\mathbf{x}}_{t}$ no longer holds and each ${\mathbf{x}}_{t}$ depends on all previous ${\mathbf{x}}_{1:t1}$, that is, $p({\mathbf{x}}_{1:T}\theta )={\prod}_{t=1}^{T1}{p}_{\theta}({\mathbf{x}}_{t}{\mathbf{x}}_{1:t1})$. We can, however, find a more manageable representation of the dynamical system by describing how the ${\mathbf{x}}_{t}$ and ${\widehat{\mathbf{x}}}_{t}$ evolve jointly (van den Berg et al., 2011). This allows us to write down a joint dynamical system of ${\mathbf{x}}_{t}$ and $\hat{\mathbf{x}}}_{t$, which only depends on the previous time step:
For the derivation of this joint dynamical system and the definitions of F and G, see Appendix 1, ‘Derivation.’ The derivation is extended to account for differences between the true generative model and the agent’s subjective internal model in Appendix 1, ‘Extension to subjective internal models’.
This allows us to recursively compute the likelihood function in the following way. If we know the distribution $p({\mathbf{z}}_{t}{x}_{1:t1})=\mathcal{N}({\mu}_{tt1},{\mathrm{\Sigma}}_{tt1})$, we can compute the joint distribution $p({\mathbf{z}}_{t},{\mathbf{z}}_{t+1}{\mathbf{x}}_{1:t1})$ using the dynamical system from (10). Since this distribution is a multivariate Gaussian, we can condition on ${\mathbf{x}}_{t}$ and marginalize out the $\hat{\mathbf{x}}}_{t$ and $\hat{\mathbf{x}}}_{t+1$ to arrive at ${p}_{\theta}({\mathbf{x}}_{t+1}{\mathbf{x}}_{1:t})$. For details, see Appendix 1. The computation of the likelihood function involves looping over a possibly large number of time steps. To make computing gradients w.r.t. the parameters computationally tractable, we use the Python automatic differentiation library jax (Frostig et al., 2018). Our implementation is available at https://github.com/RothkopfLab/lqg; Straub, 2022.
Bayesian inference
Request a detailed protocolEquipped with the likelihood function derived above, we can use Bayes’ theorem to compute the posterior distribution for the parameters of interest:
Since most parameters (except the internal costs c) are interpretable in degrees of visual angle, prior distributions $p(\mathit{\theta})$ can be chosen in a reasonable range depending on the experimental setup. We used halfCauchy priors with the scale parameter γ set to 50 for σ, 25 for ${\sigma}_{p}$, 0.5 for ${\sigma}_{m}$, and 1 for c. We verified that the prior distributions lead to reasonable tracking data using prior predictive checks with the CCGs of the tracking data.
Samples from the posterior were drawn using NUTS (Hoffman and Gelman, 2014) as implemented in the probabilistic programming package numpyro (Phan et al., 2019).
Model comparison
Request a detailed protocolWe compare models using the WAIC (Watanabe and Opper, 2010), which approximates the expected log pointwise predictive density. Importantly, unlike AIC it does not make parametric assumptions about the form of the posterior distribution. It can be computed using samples from the posterior distribution (see Vehtari et al., 2017, for a detailed explanation):
where ${V}_{s=1}^{S}$ is the variance over samples from the posterior. Since WAIC requires individual data points ${\mathbf{x}}_{i}$ to be independent given the parameters θ, we define the terms $p\left({\mathbf{x}}_{i}{\mathit{\theta}}_{s}\right)$ as the likelihood of a trial $i$ given posterior sample $s$, which is computed as explained in section ‘Likelihood function.’ We use the implementation from the Python package arviz (Kumar et al., 2019).
When comparing multiple models, the WAIC values can be turned into Akaike weights summing to 1
where ${\mathrm{\Delta}}_{i}$ is the difference between the current model’s and the best model’s WAIC. The Akaike weights can be interpreted as the relative likelihood of each model.
Note that the assumption of independent noise across time steps might lead to WAIC values that are larger than those obtained under a more realistic noise model involving correlations across time. However, this should not necessarily affect the ranking between models in a systematic way, that is, favoring individual models disproportionately more than others.
Appendix 1
Inverse LQG likelihood
Derivation
We define $\mathit{\theta}$ to be the vector containing any parameters of interest, which influence the LQG system defined above. This could be parameters of the matrices of the dynamical system ($A$, $B$, $V$), the observation model ($C$, $W$), or the cost function ($Q$, $R$). For notational simplicity, we do not explicitly indicate this dependence of the matrices on the parameters. We are interested in the posterior probability
We start by writing the likelihood using repeated applications of the chain rule:
The current state ${\mathbf{x}}_{t}$ depends on the whole history of previous states ${\mathbf{x}}_{1:t1}$ since the internal observations ${\mathbf{y}}_{1:T}$ and estimates $\hat{\mathbf{x}}}_{1:T$ are unobserved from the researcher’s perspective (see Figure 1F). The linear Gaussian assumption, however, allows us to marginalize over these variables efficiently. To do this, we start by expressing the current state ${\mathbf{x}}_{t}$ and estimate $\hat{\mathbf{x}}}_{t$ such that they only depend on the previous time step. This idea is based on LQGMP (van den Berg et al., 2011), a method for planning in LQG problems, which computes the distribution of states and estimates without any observations. We start by inserting the LQR control law (Equation 7) into the state update (Equation 1):
Then, we rewrite the KF update equation (Equation 5)
Here, we have again inserted the LQR control law (Equation 7), then the observation model (Equation 2), then the state transition model (Equation 1), and finally rearranged the terms. Putting these together, we can write
This allows us to recursively compute the likelihood function (Equation S2) in the following way. If we know the distribution $p({\mathit{z}}_{t}{x}_{1:t1})=\mathcal{N}({\mathit{\mu}}_{tt1},{\mathrm{\Sigma}}_{tt1})$, we can compute the joint distribution $p({\mathbf{z}}_{t},{\mathbf{z}}_{t+1}{\mathbf{x}}_{1:t1})$ using the dynamical system from Equation S10 and the formulas for linear transformations of Gaussian distributions, giving us the mean and covariance matrix
Since this distribution is a multivariate Gaussian, we can marginalize out $\hat{\mathbf{x}}}_{t$ to obtain ${p}_{\mathit{\theta}}({\mathbf{x}}_{t},{\mathbf{z}}_{t+1}{\mathbf{x}}_{1:t1})$ and then condition on ${\mathbf{x}}_{t}$ to arrive at ${p}_{\mathit{\theta}}({\mathbf{z}}_{t+1}{\mathbf{x}}_{1:t})$. We do this by partitioning the mean and covariance matrix as
To marginalize out $\hat{\mathbf{x}}}_{t$, we simply drop those rows and columns that correspond to $\hat{\mathbf{x}}}_{t$, yielding
To condition on ${\mathbf{x}}_{t}$, we proceed by applying the equations for conditioning in a multivariate Gaussian distribution to obtain
with
and
To obtain the contribution to the likelihood at the current time step $p({\mathbf{x}}_{t+1}{\mathbf{x}}_{1:t})$, we take ${p}_{\mathit{\theta}}({\mathbf{z}}_{t+1}{\mathbf{x}}_{1:t})={p}_{\mathit{\theta}}({\mathbf{x}}_{t+1},{\hat{\mathbf{x}}}_{t+1}{\mathbf{x}}_{1:t})$ and marginalize over $\hat{\mathbf{x}}}_{t+1$. Again, since everything is Gaussian, we can achieve this by simply dropping those rows of the mean and rows and columns of the covariance matrix that correspond to $\hat{\mathbf{x}}}_{t+1$. This concludes our iterative algorithm for computing the likelihood factors. Without loss of generality (since we can subtract the initial time step from the observed trajectories to let them start at zero), we initialize $p({\mathbf{z}}_{1}{\mathbf{x}}_{0})$ with ${\mathit{\mu}}_{10}=0$ and ${\mathrm{\Sigma}}_{10}=G{G}^{T}$.
Extension to subjective internal models
If we allow the subjective internal model to differ from the true generative model of the experiment, we need to adapt the model equations accordingly. We denote matrices of the true generative model with superscript g and matrices representing the subjective internal model with superscript $s$. The state update model
and the observation model
remain the same and contain only the matrices of the generative model. The KF update equation, being internal to the subject, is now influenced by several subjective components
The joint dynamical system of ${\mathbf{x}}_{t}$ and ${\widehat{\mathbf{x}}}_{t}$ is then
Then, we proceed by conditioning on the observed states and marginalizing out the subject’s internal estimates at each time step as described in the previous subsection.
Appendix 2
Models of the tracking task
As explained in section ‘Computational models of psychophysical tasks,’ we consider a succession of increasingly more complex models, which can be seen as generalizations of the previous ones. In this section, we provide a verbal description of each of the models. Appendix 2—table 1 contains full definitions of all the matrices needed to define the models.
Ideal observer (KF)
For the KF model, the state is simply the position of the target, ${\mathbf{x}}_{t}=\left[\begin{array}{c}\hfill {x}_{t}^{(\text{target})}\hfill \end{array}\right]$. The subject’s cursor position is taken as the internal estimate of the state ${\widehat{\mathbf{x}}}_{t}$. The likelihood of the observed data $({\mathbf{x}}_{1:T},{\widehat{\mathbf{x}}}_{1:T})$ can be computed as above (Appendix 1), but since the model does not explicitly include actions and instead treats the internal estimate as the subject’s cursor position, we need to condition on ${\mathbf{x}}_{1:T}$ and ${\widehat{\mathbf{x}}}_{1:T}$ at every time step.
Optimal and bounded actor
The models based on LQG control explicitly include the subject’s cursor position as a part of the state. In the simplest case, the state is twodimensional and contains the position of the target stimulus and the subject’s position of the response: ${\mathbf{x}}_{t}={\left[\begin{array}{cc}\hfill {x}_{t}^{(\text{target})}\hfill & \hfill {x}_{t}^{(\text{response})}\hfill \end{array}\right]}^{T}$. The continuous psychophysics experiments considered here used target stimuli moving according to a simple Gaussian random walk.
Accordingly, the model may assume that the target only moves because of the random walk with standard deviation ${\sigma}_{\text{rw}}$ and that the response is influenced by the agent’s control input and by action variability ${\sigma}_{m}$. The agent observes the target and response position separately, with independent observation noise in both dimensions ($\sigma $ for the target and ${\sigma}_{p}$ for the response). The behavioral cost of actions is penalized by the parameter $c$. For the optimal actor without internal costs, we set $c=0$, while for the bounded actor $c$ is a free parameter.
Subjective actor
The target’s position in the tracking task by Bonnen et al., 2015 is on a random walk,
$\begin{array}{r}{x}_{t+1}^{\text{(target)}}={x}_{t}^{\text{(target)}}+{\sigma}_{\text{rw}}{\u03f5}_{t},\phantom{\rule{thickmathspace}{0ex}}{\u03f5}_{t}\sim \mathcal{N}(0,1).\end{array}$(S27)
The subjective actor may assume that the target dynamics differ from this true random walk in that there is still a random walk on the target’s position, but the standard deviation can be different from the true standard deviation. In addition, the subjective actor may assume that in addition to the target position’s random walk, the target’s velocity can change over time according to its own random walk,
$\begin{array}{r}{x}_{t+1}^{\text{(target)}}={x}_{t}^{\text{(target)}}+dt\cdot {v}_{t}+{\sigma}_{s}{\u03f5}_{t},\phantom{\rule{thickmathspace}{0ex}}{\u03f5}_{t}\sim \mathcal{N}(0,1)\end{array}$(S28)
Thus, for the subjective actor, the actual matrices of the generative model remain identical to the basic model, but the agent assumes a different dynamical system, in which the target has an additional dimension representing its velocity ${\mathbf{x}}_{t}={\left[\begin{array}{ccc}\hfill {x}_{t}^{(\text{target})}\hfill & \hfill {x}_{t}^{(\text{response})}\hfill & \hfill {v}_{t}\hfill \end{array}\right]}^{T}$. The agent still receives an observation of the positions only, with independent observation noise on both dimensions. The parameters ${\sigma}_{s}$ and ${\sigma}_{v}$ represent the subject’s belief about the standard deviation of the target’s position and velocity. The matrices of the subjective part of the model are changed to represent the subjective dynamics described here verbally (see Appendix 2—table 1).
Appendix 3
Simulations
Parameter recovery
To evaluate the inference method, 200 sets of parameters for the subjective actor model were sampled from the following uniform distributions:
For each set of parameters, we simulated datasets with the following number of trials: 10, 25, and 50 trials. Each trial was 720 time steps long. For each dataset, four Markov chains with 7500 samples each were drawn from the posterior distribution, after 2500 warmup steps.
Model recovery
To investigate how different models behave when fit to tracking data generated from one of the other models, we performed a model recovery analysis. To this end, 20 different values for the parameters $c$ and ${\sigma}_{m}$ were sampled from uniform distributions defined above. The values of the other model parameter were fixed at $\sigma =6$, ${\sigma}_{p}=1$, ${\sigma}_{s}=0.1$ and ${\sigma}_{v}=10$. For each set of parameters, we simulated datasets of 50 trials with 500 time steps per trial from the bounded actor model and the subjective model. Then, we estimated fits for each of the four different models presented in Figure 2 to the tracking data. We ran four chains with 5000 samples each after 2000 warmup steps.
Appendix 4
Experiments
We reanalyze data from three previous publications (Bonnen et al., 2015; Bonnen et al., 2017; Knöll et al., 2018). In this section, we describe our data analysis procedures for these datasets. For detailed information about the experiments, we refer to the original publications.
Continuous psychophysics
First, we reanalyze data from three subjects in two previously published experiments (Bonnen et al., 2015): a position discrimination and a position tracking task. The stimuli were ‘Gaussian blobs,’ twodimensional Gaussian functions on Gaussian pixel noise, which changed from frame to frame. The visibility of the stimuli was manipulated via the standard deviation of the Gaussian blobs, with larger standard deviations leading to lower luminance increments and thus to a weaker signal. The standard deviations were 11, 13, 17, 21, 25, and 29 arcmin.
In the discrimination experiment, position discrimination thresholds for each blob width were measured in a twointerval forcedchoice (2IFC) task. We fit cumulative Gaussian psychometric functions with lapse rates in a BetaBinomial model to the discrimination performance using psignifit (Schütt et al., 2015). We use the posterior mean of the standard deviation of the Gaussian psychometric function as our estimate of the perceptual uncertainty in the 2IFC task.
In the tracking experiment, the same visual stimuli moved on a random walk with a standard deviation of one pixel (1.32 arcmin of visual angle) per frame at a frame rate of 60 Hz. Subjects were instructed to track the target with a small red mouse cursor. Each subject completed 20 trials for each blob width, with each trial lasting 1200 time steps (20 s). As in Bonnen et al., 2015, the response time series were shifted by $\tau =12$ time steps w.r.t. the target time series to account for a constant time lag. Instead, one could explicitly account for the time lag in the observation function of the model by extending the state space to include the $\tau $ previous time steps, but we found no appreciable differences in the model fits. We obtain posterior mean perceptual uncertainty estimates using the methods described in sections ‘Likelihood function’ and ‘Bayesian inference’. Since there are two stimulus presentations in a 2IFC task and we are interested in perceptual uncertainty for a single stimulus presentation, we divided the estimates of the perceptual uncertainty by √2 to obtain singleinterval sensory uncertainties. This corresponds to the assumption that the subject judges the difference between the stimuli observed in the two intervals. Although this assumption is debated and empirical deviations in both directions were observed (Yeshurun et al., 2008), as discussed in the main text, this assumption is a natural starting point when converting between 2IFC and singleinterval perceptual uncertainties. Because the stimuli in the 2IFC task were shown for 15 frames and the computational models of the tracking task operate on single frames, we apply a second correction factor to the perceptual uncertainty estimates. Assuming optimal integration of the Gaussian uncertainty across frames, the 2IFC sensory uncertainty estimates were multiplied by a constant factor of √15.
Motion tracking
We reanalyzed the data from experiment 2 of Bonnen et al., 2017, in which subjects tracked a circular target on a gray background with their cursor. Instead of the computer mouse, the cursor was controlled used a Leap Motion hand tracking device (Leap Motion, San Francisco, USA). There were five different standard deviations for the target’s random walk. Each subject completed 20 trials per standard deviation in randomly interleaved blocks of 10 trials. As above, the response time series were shifted by 12 time steps w.r.t. the target time series.
Gaze tracking
We reanalyzed the data from an experiment on gaze tracking (Knöll et al., 2018), in which two macaques (M1 and M2), one human participant (H), and a marmoset (C) tracked the center of an optical flow field displayed on a screen, while their eye movements were recorded. In this experiment, the temporal shift applied to the response time series was 10 time steps, which was determined from the CCGs.
Appendix 5
Crossvalidation
For the experiment from Bonnen et al., 2017, in addition to the model fits on the data from all conditions, we performed a crossvalidation evaluation for the tracking experiment with different random walk conditions. In experiment 2 from the study by Bonnen et al., 2017, there were five different standard deviation conditions for the target’s random walk, which were presented in blocks (see Appendix 4, ‘Motion tracking’). Specifically, we fit all models to the data in a leaveoneconditionout crossvalidation scheme, that is, we fit the model on data from four conditions and evaluate it on the remaining condition, and repeated this procedure for each condition. As a metric for evaluation, we compute the log predictive posterior density relative to the best performing model ($\mathrm{\Delta}\mathrm{log}p$). It was higher for the subjective model than for the other models in all but one condition of one participant (Appendix 5—figure 1A). To check which model yields the most reliable estimates of perceptual uncertainty, we look at the posterior mean estimates of $\sigma $ (Appendix 5—figure 1B). The subjective model’s estimates are most consistent across crossvalidation conditions, with an average standard deviation of 0.25 arcmin, compared to 0.60 for the basic LQG and 0.52 for the KF. This suggests that the model recovers the subjects’ perceptual uncertainty more reliably than the alternative models.
Application to eyetracking data
We reanalyze data from the experiment described in Appendix 4, ‘Gaze tracking.’ Since the generative model of the target’s random walk was slightly different compared to the other experiments, the model was adapted accordingly.
The target dynamics has a velocity component that depends on the current position, such that the target tends toward the center of the screen. The matrices of the dynamical system and cost function are
Thus, the agent’s estimate is also threedimensional, that is, the agent estimates a velocity from their observations of the position.
As for the other experiments, we fit all four models to the data and computed WAIC as a model comparison metric (Appendix 5—figure 2). Again, the behavioral data is better accounted for by the LQG models compared to the KF model. As in the other analyses of experimental data, the bounded actor models with internal costs better account for the data than the optimal actor model.
Data availability
The current manuscript is a computational study, so no new data have been generated for this manuscript. Modelling code is uploaded at https://github.com/RothkopfLab/lqg, (copy archived at swh:1:rev:58ab4c621081d6eb9eccefd0f3f3c91032ddca38).

github.comID BonnenEtAl2015_KalmanFilterCode. Data from: Continuous psychophysics: Targettracking to measure visual sensitivity.
References

Is human cognition adaptive?Behavioral and Brain Sciences 14:471–485.https://doi.org/10.1017/S0140525X00070801

Optimal control of markov processes with incomplete state informationJournal of Mathematical Analysis and Applications 10:174–205.https://doi.org/10.1016/0022247X(65)90154X

Adaptive temporal integration of motion in directionselective neurons in macaque visual cortexThe Journal of Neuroscience 24:7305–7323.https://doi.org/10.1523/JNEUROSCI.055404.2004

Dynamic mechanisms of visually guided 3d motion trackingJournal of Neurophysiology 118:1515–1531.https://doi.org/10.1152/jn.00831.2016

ConferencePredictive inverse optimal control for linearquadraticgaussian systemsEighteenth International Conference on Artificial Intelligence and Statistics. PMLR. pp. 165–173.

BookStochastic Modelling and ControlLondon: Chapman and Hall.https://doi.org/10.1007/9789400948280

Energetics of muscular exerciseReviews of Physiology, Biochemistry and Pharmacology 89:143–222.https://doi.org/10.1007/BFb0035266

On the optimal control of stochastic linear systemsIEEE Transactions on Automatic Control 16:776–785.https://doi.org/10.1109/TAC.1971.1099840

Serial dependence in visual perceptionNature Neuroscience 17:738–743.https://doi.org/10.1038/nn.3689

ConferenceCompiling machine learning programs via highlevel tracingSystems for Machine Learning. pp. 23–24.

Sequential idealobserver analysis of visual discriminationsPsychological Review 96:267–314.https://doi.org/10.1037/0033295x.96.2.267

The neural basis of decision makingAnnual Review of Neuroscience 30:535–574.https://doi.org/10.1146/annurev.neuro.29.051605.113038

Consistency of auditory detection judgmentsPsychological Review 71:392–407.https://doi.org/10.1037/h0044520

ConferenceInverse reinforcement learning with simultaneous estimation of rewards and dynamicsIn Artificial Intelligence and Statistics. pp. 102–110.

The nouturn sampler: adaptively setting path lengths in hamiltonian monte carloJournal of Machine Learning Research: JMLR 15:1593–1623.https://doi.org/10.5555/2627435.2638586

Multistep planning of eye movements in visual searchScientific Reports 9:1–12.https://doi.org/10.1038/s41598018375360

Beyond trialbased paradigms: continuous behavior, ongoing neural activity, and natural stimuliThe Journal of Neuroscience 38:7551–7558.https://doi.org/10.1523/JNEUROSCI.192017.2018

Sources of signaldependent noise during isometric force productionJournal of Neurophysiology 88:1533–1544.https://doi.org/10.1152/jn.2002.88.3.1533

Planning and acting in partially observable stochastic domainsArtificial Intelligence 101:99–134.https://doi.org/10.1016/S00043702(98)00023X

A new approach to linear filtering and prediction problemsJournal of Basic Engineering 82:35–45.https://doi.org/10.1115/1.3662552

When is a linear control system optimal?Journal of Basic Engineering 86:51–60.https://doi.org/10.1115/1.3653115

ArviZ a unified library for exploratory analysis of bayesian models in pythonJournal of Open Source Software 4:1143.https://doi.org/10.21105/joss.01143

Inverse rational control with partially observable continuous nonlinear dynamicsNeural Information Processing Systems 33:7898–7909.https://doi.org/10.48550/arXiv.2009.12576

“Utilizing” signal detection theoryPsychological Science 25:1663–1673.https://doi.org/10.1177/0956797614541991

Psychophysics with children: investigating the effects of attentional lapses on threshold estimatesAttention, Perception & Psychophysics 80:1311–1324.https://doi.org/10.3758/s1341401815102

BookVision: A Computational Investigation into the Human Representation and Processing of Visual InformationW. H. Freeman.

ConferenceReflexive and voluntary control of smooth eye movementsIS&T/SPIE Electronic Imaging.https://doi.org/10.1117/12.2010333

In Proc. 17th International Conf. on Machine Learning663–670, Algorithms for inverse reinforcement learning, In Proc. 17th International Conf. on Machine Learning, Morgan Kaufmann.

Iterative bayesian estimation as an explanation for range and regression effects: a study on human path integrationThe Journal of Neuroscience 31:17220–17229.https://doi.org/10.1523/JNEUROSCI.202811.2011

ConferencePreference elicitation and inverse reinforcement learningIn Joint European conference on machine learning and knowledge discovery in databases.

ConferenceI see what you see: inferring sensor and policy models of human realworld motor behaviorProceedings of the AAAI Conference on Artificial Intelligence. pp. 3797–3803.https://doi.org/10.1609/aaai.v31i1.11049

Toward a rational and mechanistic account of mental effortAnnual Review of Neuroscience 40:99–124.https://doi.org/10.1146/annurevneuro072116031526

A behavioral model of rational choiceThe Quarterly Journal of Economics 69:99.https://doi.org/10.2307/1884852

SoftwareRothkopfLab/lqg, version swh:1:rev:58ab4c621081d6eb9eccefd0f3f3c91032ddca38Software Heritage.

Indices of discrimination or diagnostic accuracy: their rocs and implied modelsPsychological Bulletin 99:100–117.https://doi.org/10.1037/00332909.99.1.100

Optimal feedback control as a theory of motor coordinationNature Neuroscience 5:1226–1235.https://doi.org/10.1038/nn963

ConferenceA generalized iterative lqg method for locallyoptimal feedback control of constrained nonlinear stochastic systemsIn Proceedings of the 2005, American Control Conference.

LQGmp: optimized path planning for robots with motion uncertainty and imperfect state informationThe International Journal of Robotics Research 30:895–913.https://doi.org/10.1177/0278364911406562

Practical bayesian model evaluation using leaveoneout crossvalidation and waicStatistics and Computing 27:1413–1432.https://doi.org/10.1007/s1122201696964

Asymptotic equivalence of bayes cross validation and widely applicable information criterion in singular learning theoryJournal of Machine Learning Research 11:12.

BookDe Pulsu, Resorptione, Auditu et Tactu: Annotationes Anatomicae et PhysiologicaeCF Koehler.

Methods in psychophysicsExperimental Psychology and Cognitive Neuroscience 5:1–42.https://doi.org/10.1002/9781119170174

The forgotten history of signal detection theoryJournal of Experimental Psychology. Learning, Memory, and Cognition 46:201–233.https://doi.org/10.1037/xlm0000732

Computational principles of movement neuroscienceNature Neuroscience 3 Suppl:1212–1217.https://doi.org/10.1038/81497

ConferenceMaximum entropy inverse reinforcement learningIn Proceedings of the 23rd national conference on Artificial intelligence. pp. 1433–1438.
Decision letter

Jörn DiedrichsenReviewing Editor; Western University, Canada

Joshua I GoldSenior Editor; University of Pennsylvania, United States

Jörn DiedrichsenReviewer; Western University, Canada
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "Putting perception into action: Inverse optimal control for continuous psychophysics" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Jörn Diedrichsen as Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Joshua Gold as the Senior Editor.
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission. Overall, all reviewers thought that the paper has the potential to become a very valuable contribution to the field. The revisions focus mainly on the clarity of the submission. The extensive original reviews are attached below and should be responded to in detail.
Essential revisions:
1) The assumption of independence of motor noise across observations (reviewer #1, comment #1) is likely problematic in terms of obtaining valid WAIC values – and this problem should be addressed.
2) The technical exposition in the introduction is sometimes confusing and the clarity of the explanations should be improved. The specific comments provide a detailed list of constructive suggestions here.
3) The discussion of the previous literature is not always as balanced and would be ideal (see reviewer #2)
4) For a tools and resource paper, I would hold that the associated code and documentation should be visible to the reviewers at the point of submission, so the quality of the online resource can be reviewed together with the paper. So I would ask you to provide a working link with the revision of the paper.
Reviewer #1: All comments in public review.
Reviewer #2 (Recommendations for the authors):
Specific Remarks:
– There are passages that read like poorly grounded indictments of wellestablished literature. Supplementary Section A, which functions as support for claims in the main text and the figure 3 caption, references a study by Yeshurun et al., (2008). The authors of the present paper write that Yeshurun et al., 'investigated the validity of the theoretical result (Green and Swets, 1966), that the forcedchoice sensitivity d'FC differs from the sensitivity obtained through an equivalent YesNo task d'YN by a factor of √2, i.e. d'FC = √2d'YN. Apart from running a serious of experiments targeted at evaluating the validity of this theoretical result, the authors reported the actual values of this factor based on extensive literature review involving numerous studies investigating perceptual sensitivities across sensory modalities. Indeed, the literature review collected a broad range of deviations from the theoretical value.'
The authors then go on to specifically reference two of the papers cited by Yeshurun (2008). One of these papers (Jesteadt and Bilger, 1974) reported that dprime estimated with a forcedchoice task was better by a factor of 2.1 than performance in an equivalent yesno task. The other (Markowitz and Swets, 1967) reported dprime estimated with a forcedchoice task was better by only 1.15. These two studies deviated from the predicted 1.4 improvement by factors of 1.5x and 0.8x, respectively.
The logic of the passage is problematic, and the passage seems only tenuously connected to the aims of the paper. In my view, I think the supplementary passage and the connected threads in the main text are harmful to the paper's messaging and should be removed. Here is why: The theoretical result relating sensitivity in forcedchoice and yesno tasks as reported by Green and Swets (1966) is easy to derive in a few lines and is most certainly valid, presuming that certain assumptions hold. The authors can of course fairly question whether these assumptions hold in certain experiments (as did Yeshurun et al., 2008), but experiments do not, and could not, evaluate the 'validity of this theoretical result'. The theoretical result stands on its own. The authors' writing seems to conflate a lack of empirical variability (e.g. due to measurement error, assumptions not holding) with a lack of theoretical validity. I thought, initially, that this could perhaps be explained away by inattentive writing.
But the authors' then use the range of deviations aforementioned deviationsthe 1.5x deviation reported by Jesteadt and Bliger was on auditory frequency and intensity discrimination; the 0.8x deviation reported by Markowitz and Swets, was on sound detectionto suggest that the observed 1.36x deviations between tracking and forcedchoicebased estimates (from Model 4 fits) are within an established range of deviations in the literature. The relevance of these numbers to the present work is dubious at best. Consider, for example, if some other pair of studies on gustatory sodium discrimination or detection had reported deviations ranging from 0.25x to 8x the theoretically predicted values. Would that somehow be relevant to the interpretation of the current work? I am not sure why the authors thought this a valuable addition to the paper. They seem to be using these studies to suggest that because other studies have deviated from predictions (whatever the causes) that discrepancies in the present study should be thought of as small. It all feels forced. After writing out these notes, I was concerned that perhaps I had been making too much of it. But the next issue has a similar flavor.
– The manuscript appears both to misleadingly describe findings in Bonnen et al., (2015), and to selfcontradict its own assertions regarding those findings. The authors several times write that Bonnen et al.,'s trackingbased estimates of position uncertainty (i.e. position discrimination thresholds) are an order of magnitude (i.e. ~10x) higher than corresponding forcedchoicebased estimates. The authors contrast this discrepancy with their own results, saying that they 'infer values that differ by a factor of 1.36 on average'. It is true that the raw estimates of Bonnen et al., reported trackingbased estimates that were ~10x larger. But that was before Bonnen et al., took several obvious factors into account (e.g. the benefit of integrating across 15 frames of each stimulus presentation, the predicted improvement in sensitivity in 'twolook' forced choice tasks). After taking these factors into account (as it is appropriate to do), the ~10x difference between the tracking and forcedchoicebased was reduced to a ~2x difference. In the supplement, the authors acknowledge that Bonnen et al., took these factors into account, but their writing in the abstract, introduction, and discussion (references to order of magnitude differences) reads as if they are unaware. I am unclear about the reasons for this discrepancy, but it should be corrected.
Further, on page 5, left column the authors report that estimates of perceptual uncertainty of target position from the Kalman Filter model differ from the forcedchoicebased estimates by approximately 2x. They write: 'The average factor between the posterior mean perceptual uncertainty in the continuous task and the 2IFC task in the 5 higher contrast conditions is 1.20 for the LQG model, while it is 1.73 for the KF. Only in the lowest contrast condition, it increases to 2.20 and 2.93 arcmin, respectively.'. This difference is in line with the conclusions of Bonnen et al., and does not square with the assertion that trackingbased estimates were off by an order of magnitude (~10x). Given that the current Kalman filter analysis essentially replicates the analysis performed by Bonnen et al., these two assertions (10x differences vs 2x differences) cannot both be correct. Again, in keeping with the first point above, all this seems like a forced attempt to convince the reader that observed deviations are small between the tracking vs. forcedchoicebased estimates of perceptual uncertainty about target position. But it seems unnecessary. Model 4 does a more accurate job of estimating perceptual uncertainty about target position than simpler models (and seems to provide a closer match to forcedchoicebased estimates). Let those results stand on their own.
– For a computational paper, in which the primary contribution is to develop methods for data analysis and parameter estimation, there should be substantially more discussion of which parameters have their values trade off of one another. For example, the authors have taken the trouble visualize a posterior distribution over the parameters (Figure 3A). They should help the reader develop an intuition for why the posterior has the structure that it does. All models other than the Kalman filter model include perceptual uncertainty about target position (\σ), perceptual uncertainty about cursor position (\σ_p), and motor variability. All of these quantities affect the reliability with which discrepancies between target and cursor position can be evaluated, and should have similar effects on performance. The figure shows that these parameters tradeoff with each other (strong negative correlations in the posterior distributions over the model parameters). The authors should include some discussion of these tradeoffs, helping to provide the reader intuition for why these parameters tradeoff in the way that they do. Similarly for the positive correlation between motor variability (\σ_m) and perceptual uncertainty about cursor position.
– Regarding their analysis of simulated data, the authors report that attempts to infer the value of the target position uncertainty (\σ) from Model 4 were accurate (Figure 3C). It would be helpful for the authors to describe in more detail the parameters of the simulation. Specifically, in would be useful to explain and provide intuition for how, if the observer had a mistaken belief regarding the drift dynamics, it is possible to accurately infer the value of perceptual uncertainty about target position. A naïve reader might presume that, because there is an infinite number of pairs of position uncertainty (\σ) and presumed drift variance (\σ_s) that determine the Kalman gain, that \σ could not in fact be accurately estimated if \σ_s does not equal the true drift parameter (\σ_rw). Please explain for the reader how this works.
Tone
The current paper describes shortcomings of the Bonnen et al., (2015) data analyses. Many of these shortcomings were explicitly acknowledged by Bonnen et al., It would help the tone and tenor of the manuscript if, when the current authors describe the acknowledged shortcomings, the current authors cite Bonnen et al., (2015).
Examples include:
– "A model, which only considers the perceptual side of the task, will therefore tend to overestimate perceptual uncertainty because these additional factors get lumped into perceptual model parameters, as we will show in simulations."
– "A model without these factors needs to attribute all the experimentally measured behavioral biases and variability to perceptual factors, even when they are potentially caused by additional cognitive and motor processes."
The current paper makes a nice contribution in demonstrating these points quantitatively. But, in places (and in keeping with the flavor of remarks above), the writing seems concerned that readers might not recognize the paper's unique contribution. I think the paper's contribution is clear. And I think that their paper will read better, and leave a better impression, if it looks for ways to portray previous work in a more generous light.
Specific Comments
– Experimentor/Participant distinction
The experimenter has access to the true target position and the true cursor position is available. The experimental participant him/herself has access only to the estimates of target and cursor positions. The article should more explicitly discuss this issue in the main text. Figure 2 and the Supplemental derivations allude to it, but it should be discussed more prominently.
– Explanation of matrix values.
The authors should explain why the B=[0 dt]' parameter takes on the particular values that it does. Is the implication that the control action is best understood as a 'rate of change' so that result matrixmultiplying the control action with B is a position? Please explain. The value of the nonzero element of B trades off perfectly with the square root (?or similar?) of the movement cost parameter 'c'. So it would be valuable for the authors to explain why they chose to set B to have the value it does.
Telegraphic mathematical development.
In the supplement, the mathematics associated with transitioning between Supplemental Equations S16S18 needs to be expanded. Conditioning the nD Gaussian on x_t and marginalizing out x_hat_t are two separate steps, but the manner in which the passage is written encourages the reader to presume that they are a single step. Too much analytical work is foisted on the reader if he/she wants to carefully follow the derivation along. The derivation is correct (I worked it out myself), but it should be laid out more explicitly. Further, the final expression for the likelihood of the state (x_{t+1}) on time step t+1 should be provided. This likelihood is straightforwardly obtained by marginalizing out the estimate (x^{hat}_{t+1}) of the (2vector) state, but it would be nice for the reader if the final expression was actually made explicit for the reader.
Subjective model of state dynamics
The authors should make explicit in equation form the fallacious state dynamics that are assumed by the subjective observer. The authors describe it in words, but equations will prevent any uncertainty that the description may produce. Something like:
position walk: p_{t+1} = p_t + eps; where eps ~ N(0, σ_{rw})
velocity walk: v_{t+1} = v_t + eta; where eta ~ N(0, σ_v )
x_{t+1} = p_{t+1} + sum( v_{ 0:(t+1) } ) where the sum goes from 0 to t+1
Subjective model of estimate of velocity
Also, it is not made completely clear how the subjective observer's fallacious estimate of velocity is computed, how it resides in the model, whether the state is expressed as a 2vector [ x_t x_p ] or a 3vector [ x_t x_p v_t ], and whether the estimate of the state is expressed as a twovector [ xhat_t xhat_p ] or a 3vector [ xhat_t xhat_p vhat_t ]. An expanded discussion of these issues should be included. The expressions are correct but as they stand, but the mathematical development and discussion of these points are a bit too telegraphic.
Typo in the supplementary equation.
I believe that there is a typo in equation S2 of the supplement where the Ricatti equation is laid out. The term with the inverse currently reads ( C*P*C' )^1 whereas this term in standard expressions for the Ricatti equation should be ( C*P*C' + W*W' )^1. This term follows the same form as the inverse term in the expression for the Kalman gain (see the line above) which reflects the total covariance ( prior + observation covariances ).
Reviewer #3 (Recommendations for the authors):
The paper is already in a very polished state. I have only a few comments about clarity and completeness.
– I found myself getting confused while reading the paper about what parameters were actually involved in each model (e.g. pg 3: "the KF model has only one parameter: perceptual uncertainty σ". My recommendation to the authors is to move equations 14 out of the Methods section and into the main text. Personally I would consider this part of the results (ie. "what are the models we're using"?) and so I would rather see this presented pedagogically within the main paper. Keep implementation details or other technical issues in the Methods, but present the models themselves in the Results section. Just a recommendation, but that's my 2 cents!
– This leads to a related note on clarity: it's not entirely clear to me which parameters are being fit in each model. Presumably you're not fitting the C matrix in equation 2? What about B in equation 1? Come to think of it, I'm not sure where the cursor fits in, allowing for uncertainty in cursor position). (Does that become part of the state vector x_t? Please unpack this more clearly so that we can understand what these equations correspond to for each of the models discussed in the paper!
– "Our implementation will be made available on github"  I want to note that this should be a requirement for acceptance! (i.e., the code should be posted before the paper is accepted).
– The name "bounded actor" seems like a poor one, since there aren't any bounds on the actor. (There are just costs). Figure 2 C refers to it as "bounded actor with behavioral costs"  personally I would keep the "with behavioral costs" and drop the "bounded".
– One other point that would be worth making: you could also formulate a "with subjective belief" version of any of the simpler models. Even in the simple KF model, you could allow for mismatch between the true dynamics and the subject's belief about the dynamics. I think it's probably ok to leave the model comparisons as is, since obviously it becomes a bit messy if we have to include "optimal" and "subjective" versions of each of the models, but you should at least mention somewhere in the paper that this can be used to improve the accuracy (in terms of matching observer behavior) for any of the models.
– Can you say anything about the discrete time assumptions of the model? i.e., how would you expect the model parameters (or the accuracy of the fit) to change if you switched to 120 Hz frame rate or 20 Hz frame rate?
– Finally, a small note on grammar/usage: the paper frequently uses "… , which…" in cases where "that" (with no comma). e.g., in the abstract: "recover perceptual thresholds, which are one order of magnitude larger compared to equivalent traditional psychophysical experiments". This makes it sound like you're providing a definition of threshold, rather than describing a property of the recovered thresholds. More correct would be: "We recover perceptual thresholds that are one order of magnitude larger than …". I found this issue repeatedly in the text. You could basically look for nearly every occurrence of ", which" and replace by "that".
https://doi.org/10.7554/eLife.76635.sa1Author response
Essential revisions:
1) The assumption of independence of motor noise across observations (reviewer #1, comment #1) is likely problematic in terms of obtaining valid WAIC values – and this problem should be addressed.
2) The technical exposition in the introduction is sometimes confusing and the clarity of the explanations should be improved. The specific comments provide a detailed list of constructive suggestions here.
3) The discussion of the previous literature is not always as balanced and would be ideal (see reviewer #2)
4) For a tools and resource paper, I would hold that the associated code and documentation should be visible to the reviewers at the point of submission, so the quality of the online resource can be reviewed together with the paper. So I would ask you to provide a working link with the revision of the paper.
Yes, thank you for summarizing the main points by the reviewers. We address all of these points in the responses to individual reviewers in the following. The link to the code associated with our manuscript is https://github.com/RothkopfLab/lqg and has been added in the manuscript.
Reviewer #1: All comments in public review.
Responses provided in response to public review.
Reviewer #2 (Recommendations for the authors):
Specific Remarks:
– There are passages that read like poorly grounded indictments of wellestablished literature. Supplementary Section A, which functions as support for claims in the main text and the figure 3 caption, references a study by Yeshurun et al., (2008). The authors of the present paper write that Yeshurun et al., 'investigated the validity of the theoretical result (Green and Swets, 1966), that the forcedchoice sensitivity d'FC differs from the sensitivity obtained through an equivalent YesNo task d'YN by a factor of √2, i.e. d'FC = √2d'YN. Apart from running a serious of experiments targeted at evaluating the validity of this theoretical result, the authors reported the actual values of this factor based on extensive literature review involving numerous studies investigating perceptual sensitivities across sensory modalities. Indeed, the literature review collected a broad range of deviations from the theoretical value.'
The authors then go on to specifically reference two of the papers cited by Yeshurun (2008). One of these papers (Jesteadt and Bilger, 1974) reported that dprime estimated with a forcedchoice task was better by a factor of 2.1 than performance in an equivalent yesno task. The other (Markowitz and Swets, 1967) reported dprime estimated with a forcedchoice task was better by only 1.15. These two studies deviated from the predicted 1.4 improvement by factors of 1.5x and 0.8x, respectively.
The logic of the passage is problematic, and the passage seems only tenuously connected to the aims of the paper. In my view, I think the supplementary passage and the connected threads in the main text are harmful to the paper's messaging and should be removed. Here is why: The theoretical result relating sensitivity in forcedchoice and yesno tasks as reported by Green and Swets (1966) is easy to derive in a few lines and is most certainly valid, presuming that certain assumptions hold. The authors can of course fairly question whether these assumptions hold in certain experiments (as did Yeshurun et al., 2008), but experiments do not, and could not, evaluate the 'validity of this theoretical result'. The theoretical result stands on its own. The authors' writing seems to conflate a lack of empirical variability (e.g. due to measurement error, assumptions not holding) with a lack of theoretical validity. I thought, initially, that this could perhaps be explained away by inattentive writing.
But the authors' then use the range of deviations aforementioned deviationsthe 1.5x deviation reported by Jesteadt and Bliger was on auditory frequency and intensity discrimination; the 0.8x deviation reported by Markowitz and Swets, was on sound detectionto suggest that the observed 1.36x deviations between tracking and forcedchoicebased estimates (from Model 4 fits) are within an established range of deviations in the literature. The relevance of these numbers to the present work is dubious at best. Consider, for example, if some other pair of studies on gustatory sodium discrimination or detection had reported deviations ranging from 0.25x to 8x the theoretically predicted values. Would that somehow be relevant to the interpretation of the current work? I am not sure why the authors thought this a valuable addition to the paper. They seem to be using these studies to suggest that because other studies have deviated from predictions (whatever the causes) that discrepancies in the present study should be thought of as small. It all feels forced. After writing out these notes, I was concerned that perhaps I had been making too much of it. But the next issue has a similar flavor.
We were not as careful in our wording as we should have been. Of course, experiments finding empirical differences in perceptual uncertainties between psychophysical tasks deviating from the differences predicted by SDT cannot invalidate the theoretical result of a sqrt(2) factor between 2AFC and YN experiments based on signal detection theory. We fully agree on that. We have therefore replaced phrases like “investigated the validity of the theoretical result” with more adequate wording such as “empirically tested the theoretical result”.
The point we were trying to make by citing Yeshurun et al., (2008) is that, irrespective of the sensory modality, it is very common to find discrepancies between the predictions of SDT across different psychophysical tasks and the empirical data across different psychophysical tasks. How large are these discrepancies? We turned to Yeshurun et al., (2008) to provide a quantitative value for these discrepancies.
This is not an “indictment” of “wellestablished literature” but a scientific fact. The first sentence of our abstract should leave no doubt that no “indictments” are involved here. If there are other studies showing that empirical data in different psychophysical tasks agree with the theoretical predictions, we will be happy to cite them.
Why are we trying to make this point?
Because it is not at all straightforward how to interpret these discrepancies, scientifically. One common criticism towards continuous psychophysics tracking experiments that we have encountered repeatedly is that they introduce additional uncontrolled factors and therefore yield biased estimates of sensory uncertainty. Classic psychophysics paradigms such as 2AFC and YN experiments together with SDT, on the other hand, are said to provide unbiased and consistent estimates of sensory uncertainty. But, Yeshurun et al., (2008) have carefully reviewed the literature and carried out experiments to establish that, empirically, there are significant discrepancies across numerous classical tasks. Jäkel and Wichmann (2006) also report systematic discrepancies between the empirical results and the results expected by SDT across experiments. Numerous studies also have reported pervasive systematic sequential effects across classic psychophysical experiments, a fact that has attracted a lot of interest recently. Thus, in our view, there is enough empirical evidence to support the view that the analysis of behavioral data obtained through classical psychophysical experiments with SDT is useful, but that there are almost always factors in the experiments that are not perfectly captured by models based on SDT, such as attention, memory, intrinsic costs, beliefs on the side of the subject differing from the true task statistics employed by the researcher, learning, and potentially more.
This applies to trialbased tasks as well as continuous tasks. In our models, we have included some of these additional factors for the tracking task, and shown that this helps decrease the discrepancy between the sensory uncertainty inferred from the tracking task and the 2AFC task. Therefore, it is important to us to highlight that the remaining discrepancy is not a particularity of continuous psychophysics as a new experimental paradigm, but typical when comparing different psychophysical tasks.
We agree that the concrete values in different sensory modalities derived from the studies by Jestead and Bilger (1974) and Markowitz and Swets (1967) may not be as relevant to the visual position discrimination and tracking tasks conducted by Bonnen et al., (2015). We have therefore removed references to these concrete values from the Results section (Figure 4A, where we now plot posterior CIs instead), the Discussion section, and the supplementary material. In the discussion, we instead cite Yeshurun et al., (2008) simply as evidence for the discrepancies between empirical measurements and the predictions by SDT across tasks.
As requested by the reviewer, we have removed the passage in the supplementary material and revised the discussion of this point in the Discussion section in the following way:
“One possible criticism of continuous psychophysics is that it introduces additional unmeasured factors such as action variability, intrinsic costs, and subjective internal models. While classical psychophysical paradigms take great care to minimize the influence of these factors by careful experimental design, they can nevertheless still be present, e.g. as serial dependencies (Green, 1964; Fischer and Whitney, 2014; Fründ et al., 2014). Similarly, estimates of perceptual uncertainty often differ between classical psychophysical tasks when compared directly. For example, Yeshurun et al., (2008) reviewed previous psychophysical studies and conducted experiments to empirically test the theoretical predictions of SDT between YesNo and 2IFC tasks. They found a broad range of deviations between the empirically measured perceptual uncertainty and the theoretical value predicted by SDT. Similarly, there are differences between 2AFC tasks, in which two stimuli are separated spatially, and their equivalent 2IFC tasks, in which the stimuli are separated in time (Jäkel and Wichmann, 2006). While the former task engages spatial attention, the latter engages memory. These factors are typically also not accounted for in SDTbased models. Thus, substantial empirical evidence suggests that factors often not modeled by SDT, such as attention, memory, intrinsic costs, beliefs of the subject differing from the true task statistics, and learning nevertheless very often influence the behavior in psychophysical experiments.”
– The manuscript appears both to misleadingly describe findings in Bonnen et al., (2015), and to selfcontradict its own assertions regarding those findings. The authors several times write that Bonnen et al.,'s trackingbased estimates of position uncertainty (i.e. position discrimination thresholds) are an order of magnitude (i.e. ~10x) higher than corresponding forcedchoicebased estimates. The authors contrast this discrepancy with their own results, saying that they 'infer values that differ by a factor of 1.36 on average'. It is true that the raw estimates of Bonnen et al., reported trackingbased estimates that were ~10x larger. But that was before Bonnen et al., took several obvious factors into account (e.g. the benefit of integrating across 15 frames of each stimulus presentation, the predicted improvement in sensitivity in 'twolook' forced choice tasks). After taking these factors into account (as it is appropriate to do), the ~10x difference between the tracking and forcedchoicebased was reduced to a ~2x difference. In the supplement, the authors acknowledge that Bonnen et al., took these factors into account, but their writing in the abstract, introduction, and discussion (references to order of magnitude differences) reads as if they are unaware. I am unclear about the reasons for this discrepancy, but it should be corrected.
Further, on page 5, left column the authors report that estimates of perceptual uncertainty of target position from the Kalman Filter model differ from the forcedchoicebased estimates by approximately 2x. They write: ‘The average factor between the posterior mean perceptual uncertainty in the continuous task and the 2IFC task in the 5 higher contrast conditions is 1.20 for the LQG model, while it is 1.73 for the KF. Only in the lowest contrast condition, it increases to 2.20 and 2.93 arcmin, respectively.’. This difference is in line with the conclusions of Bonnen et al., and does not square with the assertion that trackingbased estimates were off by an order of magnitude (~10x). Given that the current Kalman filter analysis essentially replicates the analysis performed by Bonnen et al., these two assertions (10x differences vs 2x differences) cannot both be correct. Again, in keeping with the first point above, all this seems like a forced attempt to convince the reader that observed deviations are small between the tracking vs. forcedchoicebased estimates of perceptual uncertainty about target position. But it seems unnecessary. Model 4 does a more accurate job of estimating perceptual uncertainty about target position than simpler models (and seems to provide a closer match to forcedchoicebased estimates). Let those results stand on their own.
Again, thank you for pointing out that this could potentially be misunderstood by a reader. But, we would like the reviewer to be as generous with us as with the authors of the Bonnen et al., study. We were referring to the fact that in the Bonnen et al., (2015) paper in Figure 10 the factor between forced choice and tracking is approximately one order of magnitude. It is worth pointing out that Bonnen et al., (2015) discuss that this factor can be reduced by accounting for temporal integration.
We understand that the way we have referred to these values could potentially be read in a misleading way. We agree that our results can stand on their own simply due to the fact that the estimates are closer to forcedchoice based estimates. Therefore, in the revised manuscript all references to "an order of magnitude" are removed and instead we simply state the values as they result from our reanalysis of the data from Bonnen et al., (2015).
We have now written in the abstract,
“However, what has precluded wide adoption of this approach is that current analysis methods do not account for the additional variability introduced by the motor component of the task and therefore recover perceptual thresholds, which are larger compared to equivalent traditional psychophysical experiments.”
and rewritten the parts that mention “orders of magnitude” in the introduction and discussion as well.
– For a computational paper, in which the primary contribution is to develop methods for data analysis and parameter estimation, there should be substantially more discussion of which parameters have their values trade off of one another. For example, the authors have taken the trouble visualize a posterior distribution over the parameters (Figure 3A). They should help the reader develop an intuition for why the posterior has the structure that it does. All models other than the Kalman filter model include perceptual uncertainty about target position (\σ), perceptual uncertainty about cursor position (\σ_p), and motor variability. All of these quantities affect the reliability with which discrepancies between target and cursor position can be evaluated, and should have similar effects on performance. The figure shows that these parameters tradeoff with each other (strong negative correlations in the posterior distributions over the model parameters). The authors should include some discussion of these tradeoffs, helping to provide the reader intuition for why these parameters tradeoff in the way that they do. Similarly for the positive correlation between motor variability (\σ_m) and perceptual uncertainty about cursor position.
Sorry, but we first need to clarify a conceptual point about Bayesian inference. While there are some correlations in the twodimensional marginals of the posterior distributions, these cannot be interpreted as perfect linear tradeoffs. Instead, there are nonlinear relationships between multiple variables in the posterior distributions and one should always be cautious when interpreting posterior distributions using only twodimensional marginals. The mode of the full nonGaussian joint posterior is also not in general equal to the mode of the marginal distributions.
For example, there is a positive correlation between motor variability (parameter σ_m) and behavioral costs (parameter c) in the twodimensional marginal posterior distribution shown in Figure 3A. The posterior distribution infers that these parameters lie in a certain range: e.g., most of the posterior mass for σ_m is between 0.48 and 0.52. Within that range, there is some remaining uncertainty about which exact values the parameters take. However, this does not mean that one can arbitrarily increase perceptual uncertainty while linearly increasing the cost, because this relationship is nonlinear and breaks down outside of the region with high posterior density. Similarly, the Figure2—figure supplement1 “Pareto efficiency plot” is a clear example that tradeoffs can interact highly nonlinearly. We also want to emphasize that the ability to infer a full joint posterior distribution (with correlations between variables) is a desirable property of Bayesian statistics instead of a shortcoming. With a maximumlikelihood approach, we would obtain a single bestfitting value per parameter without any insight into its associated uncertainty.
We agree that, intuitively, it may seem like “these quantities affect the reliability with which discrepancies between target and cursor position can be evaluated, and should have similar effects on performance”. But, importantly, our modelbased analysis shows that their influence is not the same. This can be seen in our parameter recovery analysis, which shows that our inference method can recover the true values of these parameters used to simulate the trajectories quite well (Figure 3B). Note that our inference method considers full trajectories in the space of states, sensory measurements, actions and associated noises. Thus, the inference is based not on summary statistics such as position errors or crosscorrelations, but takes into account how the different parameters interact at each time step.
All these points together emphasize the importance of including extensive simulations to check for parameter recovery (‘Simulations results’) and proper model comparison on experimental data (‘Model comparison’).
As for a more in depth discussion of the influence of the parameters, we have decided to substantially expand the final paragraph of the section “Computational models of psychophysical tasks”. This includes a qualitative discussion of the effects of most of the parameters when introducing the different models and will hopefully aid in developing better intuitions about the behavior of the overall model and the inferred properties. Here are some example sentences:
“[…] the KF has only one free parameter, the perceptual uncertainty \σ. This parameter describes the uncertainty in the sensory observation of target's true position.”
“The difference lies in the cost function, which now can additionally accommodate internal behavioral costs c (Figure 2 C) that may penalize e.g. large actions and thus result in a larger lag between the response and the target. The control cost parameter c therefore can implement a tradeoff between tracking the target and expending effort (see Figure 2—figure supplement1).“
“[…], the subject could instead assume that the target follows a random walk on position with standard deviation \σ_s and an additional random walk on the velocity with standard deviation σ_v. These subjective assumptions about the target dynamics can, for example, lead the subjective actor to overshoot the target, for which they then have to correct.”
– Regarding their analysis of simulated data, the authors report that attempts to infer the value of the target position uncertainty (\σ) from Model 4 were accurate (Figure 3C). It would be helpful for the authors to describe in more detail the parameters of the simulation. Specifically, in would be useful to explain and provide intuition for how, if the observer had a mistaken belief regarding the drift dynamics, it is possible to accurately infer the value of perceptual uncertainty about target position. A naïve reader might presume that, because there is an infinite number of pairs of position uncertainty (\σ) and presumed drift variance (\σ_s) that determine the Kalman gain, that \σ could not in fact be accurately estimated if \σ_s does not equal the true drift parameter (\σ_rw). Please explain for the reader how this works.
Sorry, but this again refers back to the issue of properties of Bayesian inference of parameters in the present models. This is why validation of inference on synthetic data and proper model comparison are fundamentally important.
For what it is worth, our intuition for why we can disentangle the subjective random walk drift from the perceptual uncertainty goes as follows: As you note correctly, both parameters influence the Kalman gain. However, the state estimate (x^hat_t) is not only a function of the Kalman gain, but also of the concrete sensory observations (y_t) that the subject receives (see equation 5). These sensory observations depend on the perceptual uncertainty, but not on the subjective random walk drift.
Another way to get an intuition is to see that the prediction of the next position may be rendered uncertain in equivalent ways by the two factors but with new incoming sensory observations, there is less variability in the estimates when the perceptual uncertainty is low. This leads to a difference in behavior that the statistical model picks up.
Additionally, to show that both the perceptual uncertainty and the subjective standard deviation of the random walk can be inferred, we ran the same simulations as for Figure 3A and B (which were previously for the bounded actor) for the subjective actor. The results show that all of its parameters can be inferred accurately. The updated versions of Figure 3A and B now show all six parameters of the subjective actor model.
Tone
The current paper describes shortcomings of the Bonnen et al., (2015) data analyses. Many of these shortcomings were explicitly acknowledged by Bonnen et al., It would help the tone and tenor of the manuscript if, when the current authors describe the acknowledged shortcomings, the current authors cite Bonnen et al., (2015).
Examples include:
– "A model, which only considers the perceptual side of the task, will therefore tend to overestimate perceptual uncertainty because these additional factors get lumped into perceptual model parameters, as we will show in simulations."
– "A model without these factors needs to attribute all the experimentally measured behavioral biases and variability to perceptual factors, even when they are potentially caused by additional cognitive and motor processes."
The current paper makes a nice contribution in demonstrating these points quantitatively. But, in places (and in keeping with the flavor of remarks above), the writing seems concerned that readers might not recognize the paper's unique contribution. I think the paper's contribution is clear. And I think that their paper will read better, and leave a better impression, if it looks for ways to portray previous work in a more generous light.
We are sorry that we have written some passages in a way that the reviewer feels potentially could be understood as a misrepresentation of Bonnen et al., (2015). This was not our intent. The other reviewers did not comment on this. In the passages cited above, we do not refer to the data analysis by Bonnen et al., (2015), specifically. For example, the statement that “a model, which only considers the perceptual side of the task, will tend to overestimate perceptual uncertainty” seems fairly neutral to us. We are making a general statement that a model without a motor component has to attribute all the variability and biases in the tracking data to perception. This is simply a fact that is clearly shown in our simulations in Figure 3C.
However, there is still a difference between mentioning factors such as actions and motor variability and explicitly accounting for these factors in a quantitative computational model, whose implementation is provided to the research community.
Nevertheless, to acknowledge the comments of the reviewer, we have revised a paragraph in the introduction and now explicitly mention that Bonnen et al., (2015) acknowledge that motor processes play a role in the tracking task. If there are any other specific passages that paint the contributions of Bonnen et al., (2015) in an unfavorable light, we are more than happy to consider changing them.
The revised paragraph reads:
“The KF models the perceptual side of the tracking task only, i.e. how an ideal observer sequentially computes an estimate of the target’s position. Tracking, however, is not merely a perceptual task but involves motor processes as well: In addition to the problem of estimating the current position of the target, a tracking task encompasses the motor control problem of moving the finger, computer mouse, or gaze towards the target. This introduces additional sources of variability and bias. First, repeated movements towards a target exhibit variability (Faisal et al., 2008), which arises because of neural variability during execution of movements (Jones et al., 2002) or their preparation (Churchland et al., 2006). Second, a subject might trade off the instructed behavioral goal of the tracking experiment
With subjective costs, such as biomechanical energy expenditure (Di Prampero,1981) or mental effort (Shenhav et al., 2017). Third, subjects might have mistaken assumptions about the statistics of the task (Petzschner and Glasauer, 2011, Beck et al., 2012), which can lead to different behavior from a model which perfectly knows the task structure, i.e. ideal observers. Bonnen et al., (2015) acknowledge that motor processes play a role, but their computational model, the KF, does not account for these processes. A model that only considers the perceptual side of the task will, therefore, tend to overestimate perceptual uncertainty because these additional factors get lumped into perceptual model parameters, as we will show in simulations.”
Specific Comments
– Experimentor/Participant distinction
The experimenter has access to the true target position and the true cursor position is available. The experimental participant him/herself has access only to the estimates of target and cursor positions. The article should more explicitly discuss this issue in the main text. Figure 2 and the Supplemental derivations allude to it, but it should be discussed more prominently.
We discuss this more prominently in the revised manuscript first in the introduction:
“By modeling the particular task as it is implemented by the researcher allows deriving a normative model of behavior such as ideal observers in perceptual science (Green and Swets,1966; Geisler, 1989) or optimal feedback control models in motor control (Wolpert and Ghahramani, 2000; Todorov and Jordan, 2002). This classic task analysis at the computational level (Marr, 1982) can now be used to produce predictions of behavior, which can be compared to the actual empirically observed behavior. However, the fundamental assumption in such a setting is that the subject is carrying out the instructed task and that subject's internal model of the task is identical to the underlying generative model of the task employed by the researcher. Here, instead, we allow for the possibility that the subject is not acting on the basis of the model the researcher has implemented. Instead, we allow for the possibility that subjects' cost function does not only capture the instructed task goals but also the experienced subjective cost such as physiological and cognitive costs of performing actions. Such an approach is particularly useful in more naturalistic task setting, where subjective costs and benefits of behavior are difficult to come by a priori.
Similarly, we allow subjects' subjective internal model of stimulus dynamics to differ from the true model employed by the researcher in the specific experimental task.
In the spirit of rational analysis (Simon, 1955; Anderson, 1991; Gershman et al., 2015) we subsequently invert this model of human behavior, by developing a method for performing Bayesian inference of parameters describing the subject. Importantly, inversion of the model allows all parameters to be inferred from behavioral data and does not presuppose their value, so that, e.g., if subjects' actions were not influenced by subjective internal behavioral costs, the internal cost parameter would be estimated to be zero. This approach therefore reconciles normative and descriptive approaches to sensorimotor behavior.”
and explain the graphical models in Figure 1 in more detail. Specifically, we have expanded the beginning of the section “Bayesian inverse optimal control” in the following way to better explain the distinction between the researcher’s and the subject’s perspective:
“The normative models described in the previous section treat the problem from the subject's point of view. That is, they describe how optimal actors with different internal models and goals should behave in a continuous psychophysics task. From the subject's perspective, the true state of the experiment x_t is only indirectly observed via the uncertain sensory information y_t (see Figure 1D). From the point of view of a researcher, the true state of the experiment is observed because they control the experiment, e.g. using a computer that presents the target and mouse position. The goal is to estimate parameters \theta that describe the perceptual, cognitive, and motor processes of the subject for each model, given observed trajectories x_{1:T} (Figure 1F).“
and the Discussion section also includes
“On a conceptual level, the results of this study underscore the fact that psychophysical methods together with their analysis methods always imply a generative model of behavior (Green and Swets, 1966; Swets, 1986; Wixted, 2020).
Nevertheless, different models can be used in the analysis i.e. models that may or may not be aligned with the actual generative model of the experiment as designed by the researcher or with the internal model that the subject is implicitly using. For example, while classical SDT assumes independence between trials and an experiment may be designed in that way, the subject may assume temporal dependencies between trials.
The analysis framework we use here accounts for both these possibilities. A participant may engage in an experiment with unknown subjective costs and false beliefs, as specified in the subjective actor model. Similarly, this analysis framework also allows for the researcher to consider multiple alternative models of behavior and to quantify both the uncertainty over individual models' parameters as well as uncertainty over models through Bayesian model comparison.”
– Explanation of matrix values.
The authors should explain why the B=[0 dt]' parameter takes on the particular values that it does. Is the implication that the control action is best understood as a 'rate of change' so that result matrixmultiplying the control action with B is a position? Please explain. The value of the nonzero element of B trades off perfectly with the square root (?or similar?) of the movement cost parameter 'c'. So it would be valuable for the authors to explain why they chose to set B to have the value it does.
The nonzero element of B is the gain applied to the action to yield the change in position. As you explain correctly, the choice of the nonzero element of B trades off with the action cost parameter c. If we had set it to a different value, we would simply have a different scaling of the action cost parameter. This is why B is not a free parameter of the model but set to a fixed value. Setting this value to the duration of a time step is customary in modeling of dynamical systems including in motor control, because it allows interpreting the magnitude of the action in physical terms, i.e. as a velocity.
Telegraphic mathematical development.
In the supplement, the mathematics associated with transitioning between Supplemental Equations S16S18 needs to be expanded. Conditioning the nD Gaussian on x_t and marginalizing out x_hat_t are two separate steps, but the manner in which the passage is written encourages the reader to presume that they are a single step. Too much analytical work is foisted on the reader if he/she wants to carefully follow the derivation along. The derivation is correct (I worked it out myself), but it should be laid out more explicitly. Further, the final expression for the likelihood of the state (x_{t+1}) on time step t+1 should be provided. This likelihood is straightforwardly obtained by marginalizing out the estimate (x^{hat}_{t+1}) of the (2vector) state, but it would be nice for the reader if the final expression was actually made explicit for the reader.
Thanks for verifying our derivations! We have inserted some details between Equations S16 and S18 (which is now Equation S20) to hopefully make the derivations easier to follow. Specifically, we have treated the marginalization and conditioning as separate operations. We have also clarified how x^{hat}_{t+1} can be marginalized out from p(x_{t+1}, x^{hat}_{t+1}  x_{1:t}).
Subjective model of state dynamics
The authors should make explicit in equation form the fallacious state dynamics that are assumed by the subjective observer. The authors describe it in words, but equations will prevent any uncertainty that the description may produce. Something like:
position walk: p_{t+1} = p_t + eps; where eps ~ N(0, σ_{rw})
velocity walk: v_{t+1} = v_t + eta; where eta ~ N(0, σ_v )
x_{t+1} = p_{t+1} + sum( v_{ 0:(t+1) } ) where the sum goes from 0 to t+1
Thank you for this suggestion. We have added the univariate descriptions of the subjective position and velocity random walks to the supplementary text to clarify the subjective dynamics. We have also extended Appendix 2 Table 1 to clarify the dynamics of all models.
Subjective model of estimate of velocity
Also, it is not made completely clear how the subjective observer's fallacious estimate of velocity is computed, how it resides in the model, whether the state is expressed as a 2vector [ x_t x_p ] or a 3vector [ x_t x_p v_t ], and whether the estimate of the state is expressed as a twovector [ xhat_t xhat_p ] or a 3vector [ xhat_t xhat_p vhat_t ]. An expanded discussion of these issues should be included. The expressions are correct but as they stand, but the mathematical development and discussion of these points are a bit too telegraphic.
Sorry, the definition of the state space for our models could indeed have been clearer. We have extended Appendix 2 Table 1 to include definitions of the state vectors and added a statement in the main text pointing these differences out.
Typo in the supplementary equation.
I believe that there is a typo in equation S2 of the supplement where the Ricatti equation is laid out. The term with the inverse currently reads ( C*P*C' )^1 whereas this term in standard expressions for the Ricatti equation should be ( C*P*C' + W*W' )^1. This term follows the same form as the inverse term in the expression for the Kalman gain (see the line above) which reflects the total covariance ( prior + observation covariances ).
Thanks for catching this error, we fixed it. The error was only in the equation and does not affect the implementation.
Reviewer #3 (Recommendations for the authors):
The paper is already in a very polished state. I have only a few comments about clarity and completeness.
– I found myself getting confused while reading the paper about what parameters were actually involved in each model (e.g. pg 3: "the KF model has only one parameter: perceptual uncertainty σ". My recommendation to the authors is to move equations 14 out of the Methods section and into the main text. Personally I would consider this part of the results (ie. "what are the models we're using"?) and so I would rather see this presented pedagogically within the main paper. Keep implementation details or other technical issues in the Methods, but present the models themselves in the Results section. Just a recommendation, but that's my 2 cents!
This is a good idea! We have moved these equations into the main text. We have also expanded the verbal description of the models in the section “Computational models of psychophysical tasks” and included a reference to Appendix 2 Table 1, which clarifies what parameters are included in which model. In addition, we have changed some of the model descriptions in the section “Computational models of psychophysical tasks” to hopefully improve the comprehensibility of this section.
– This leads to a related note on clarity: it's not entirely clear to me which parameters are being fit in each model. Presumably you're not fitting the C matrix in equation 2? What about B in equation 1? Come to think of it, I'm not sure where the cursor fits in, allowing for uncertainty in cursor position). (Does that become part of the state vector x_t? Please unpack this more clearly so that we can understand what these equations correspond to for each of the models discussed in the paper!
We hope that our changes in the section “Computational models of psychophysical tasks” as mentioned in our response to the previous comment clarify this!
– "Our implementation will be made available on github"  I want to note that this should be a requirement for acceptance! (i.e., the code should be posted before the paper is accepted).
Of course. The implementation is available at https://github.com/RothkopfLab/lqg and we have added the link in the manuscript.
– The name "bounded actor" seems like a poor one, since there aren't any bounds on the actor. (There are just costs). Figure 2 C refers to it as "bounded actor with behavioral costs"  personally I would keep the "with behavioral costs" and drop the "bounded".
We agree that, formally, there are no bounds as this term is used within the optimization literature. Instead, we use this term as it is customary in parts of the cognitive science literature, where, according to Herbert Simon’s definition, bounded rationality takes the limitations on cognitive capacity into account. In our model, the costs capture both biomechanical costs in the motor system and cognitive costs.
– One other point that would be worth making: you could also formulate a "with subjective belief" version of any of the simpler models. Even in the simple KF model, you could allow for mismatch between the true dynamics and the subject's belief about the dynamics. I think it's probably ok to leave the model comparisons as is, since obviously it becomes a bit messy if we have to include "optimal" and "subjective" versions of each of the models, but you should at least mention somewhere in the paper that this can be used to improve the accuracy (in terms of matching observer behavior) for any of the models.
Yes, we fully agree that we could compare any possible combination of models with or without subjective beliefs, action variability, and behavioral costs. We decided against this for two reasons.
First, the introduction of behavioral costs leads to the largest difference in the model comparison (Figure 4E), while the subjective beliefs make a smaller difference. Thus, it would be unlikely that a KF with subjective beliefs would fare better than the other models.
Second, and more importantly, the subjective actor is a generalization of the other models, so if action variability and costs did not play a large role but subjective beliefs were important, our method would infer posterior distributions over the action variability and cost parameters that are essentially zero but the posterior distributions over the subjective beliefs would be very different from the true dynamics, i.e. we would essentially recover a KF with subjective beliefs. This is not what we find, though (Figure 4B and C).
Following your suggestion, we point the following out in the section introducing the subjective actor:
“One could also formulate a version of any of the other models to include subjective beliefs. However, the subjective actor model is a generalization that includes all of these possible models. Accordingly, if the subjective beliefs played an important role but the costs or variability did not, we would infer the relevant parameters to have a value close to zero.”
– Can you say anything about the discrete time assumptions of the model? i.e., how would you expect the model parameters (or the accuracy of the fit) to change if you switched to 120 Hz frame rate or 20 Hz frame rate?
For the analyses in the present paper, we have chosen the sampling rate to match the frame rate at which the experiment was conducted. If one were to change the frame rate, the statistics of the target’s random walk would no longer be captured by the model in a straightforward way. For example, if we fit the model at a frame rate of 120Hz to data that was collected at 60Hz, the assumption of linear dynamics with independent Gaussian noise would no longer hold, because there would only be noise at every second time step.
Theoretically, if one were to run a tracking experiment at a different frame rate, we would certainly expect a higher uncertainty in the parameter estimates at lower frame rates because there would be less data. At higher frame rates, on the other hand, one could run into problems with the assumption of independent noise across time steps, as discussed in our response to Reviewer #1.
– Finally, a small note on grammar/usage: the paper frequently uses "… , which…" in cases where "that" (with no comma). e.g., in the abstract: "recover perceptual thresholds, which are one order of magnitude larger compared to equivalent traditional psychophysical experiments". This makes it sound like you're providing a definition of threshold, rather than describing a property of the recovered thresholds. More correct would be: "We recover perceptual thresholds that are one order of magnitude larger than …". I found this issue repeatedly in the text. You could basically look for nearly every occurrence of ", which" and replace by "that".
Thanks for pointing this out. We have changed several of our "which" constructions to "that".
https://doi.org/10.7554/eLife.76635.sa2Article and author information
Author details
Funding
Hessian Ministry of Higher Education, Science, Research and Art (The Adaptive Mind)
 Constantin A Rothkopf
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Kathryn Bonnen and Lawrence Cormack for sharing their behavioral data and for discussion of continuous psychophysics. Calculations for this research were conducted on the Lichtenberg highperformance computer of the TU Darmstadt. This research was supported by 'The Adaptive Mind,' funded by the Excellence Program of the Hessian Ministry of Higher Education, Science, Research and Art.
Senior Editor
 Joshua I Gold, University of Pennsylvania, United States
Reviewing Editor
 Jörn Diedrichsen, Western University, Canada
Reviewer
 Jörn Diedrichsen, Western University, Canada
Publication history
 Preprint posted: December 23, 2021 (view preprint)
 Received: December 23, 2021
 Accepted: August 8, 2022
 Version of Record published: September 29, 2022 (version 1)
 Version of Record updated: October 7, 2022 (version 2)
 Version of Record updated: October 10, 2022 (version 3)
Copyright
© 2022, Straub and Rothkopf
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 609
 Page views

 144
 Downloads

 0
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Neuroscience
The automatic initiation of actions can be highly functional. But occasionally these actions cannot be withheld and are released at inappropriate times, impulsively. Striatal activity has been shown to participate in the timing of action sequence initiation and it has been linked to impulsivity. Using a selfinitiated task, we trained adult male rats to withhold a rewarded action sequence until a waiting time interval has elapsed. By analyzing neuronal activity we show that the striatal response preceding the initiation of the learned sequence is strongly modulated by the time subjects wait before eliciting the sequence. Interestingly, the modulation is steeper in adolescent rats, which show a strong prevalence of impulsive responses compared to adults. We hypothesize this anticipatory striatal activity reflects the animals’ subjective reward expectation, based on the elapsed waiting time, while the steeper waiting modulation in adolescence reflects agerelated differences in temporal discounting, internal urgency states, or explore–exploit balance.

 Computational and Systems Biology
 Neuroscience
How dynamic interactions between nervous system regions in mammals performs online motor control remains an unsolved problem. In this paper, we show that feedback control is a simple, yet powerful way to understand the neural dynamics of sensorimotor control. We make our case using a minimal model comprising spinal cord, sensory and motor cortex, coupled by long connections that are plastic. It succeeds in learning how to perform reaching movements of a planar arm with 6 muscles in several directions from scratch. The model satisfies biological plausibility constraints, like neural implementation, transmission delays, local synaptic learning and continuous online learning. Using differential Hebbian plasticity the model can go from motor babbling to reaching arbitrary targets in less than 10 min of in silico time. Moreover, independently of the learning mechanism, properly configured feedback control has many emergent properties: neural populations in motor cortex show directional tuning and oscillatory dynamics, the spinal cord creates convergent force fields that add linearly, and movements are ataxic (as in a motor system without a cerebellum).