Influence of sensory modality and control dynamics on human path integration

  1. Akis Stavropoulos (corresponding author)
  2. Kaushik J Lakshminarasimhan
  3. Jean Laurens
  4. Xaq Pitkow (corresponding author)
  5. Dora E Angelaki (corresponding author)
  1. Center for Neural Science, New York University, United States
  2. Center for Theoretical Neuroscience, Columbia University, United States
  3. Ernst Strüngmann Institute for Neuroscience, Germany
  4. Department of Electrical and Computer Engineering, Rice University, United States
  5. Department of Neuroscience, Baylor College of Medicine, United States
  6. Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, United States
  7. Tandon School of Engineering, New York University, United States

Abstract

Path integration is a sensorimotor computation that can be used to infer latent dynamical states by integrating self-motion cues. We studied the influence of sensory observation (visual/vestibular) and latent control dynamics (velocity/acceleration) on human path integration using a novel motion-cueing algorithm. Sensory modality and control dynamics were both varied randomly across trials, as participants controlled a joystick to steer to a memorized target location in virtual reality. Visual and vestibular steering cues allowed comparable accuracies only when participants controlled their acceleration, suggesting that vestibular signals, on their own, fail to support accurate path integration in the absence of sustained acceleration. Nevertheless, performance in all conditions reflected a failure to fully adapt to changes in the underlying control dynamics, a result that was well explained by a bias in the dynamics estimation. This work demonstrates how an incorrect internal model of control dynamics affects navigation in volatile environments in spite of continuous sensory feedback.

Editor's evaluation

This paper investigates the importance of visual and inertial sensory cues as well as the underlying motion dynamics to the accuracy of spatial navigation. When motion control was artificially manipulated in a virtual environment, subjects could navigate accurately using vision, but not inertial signals alone. Overall, these findings shed new light on how the brain combines sensory information and internal models of control dynamics for self-motion perception and navigation.

https://doi.org/10.7554/eLife.63405.sa0

Introduction

Imagine driving a car onto an icy road, where steering dynamics can change rapidly. To avoid crashing, one must rapidly infer the new dynamics and respond appropriately to keep the car on the desired path. Conversely, when the car leaves the ice patch, the control dynamics change again, compelling the driver to readjust the steering. The quality of sensory cues may also vary depending on environmental factors (e.g. reduced visibility in fog or twilight, sub-threshold vestibular stimulation under near-constant travel velocity). Humans are adept at using time-varying sensory cues to adapt quickly to a wide range of latent control dynamics in volatile environments. However, the relative contributions of different sensory modalities and the precise impact of latent control dynamics on goal-directed navigation remain poorly understood. Here, we study this in the context of path integration.

Path integration, a natural computation in which the brain uses dynamic sensory cues to infer the evolution of latent world states to continuously maintain a self-position estimate, has been studied in humans, but past experimental paradigms imposed several constraints. First, in many tasks, the motion was passive and/or restricted along predetermined, often one-dimensional (1D), trajectories (Klatzky, 1998; Jürgens and Becker, 2006; Petzschner and Glasauer, 2011; Campos et al., 2012; Tramper and Medendorp, 2015). Second, unlike time-varying actions that characterize navigation under natural conditions, participants’ responses were often reduced to single, binary end-of-trial decisions (ter Horst et al., 2015; Chrastil et al., 2016; Koppen et al., 2019). Third, even studies that explored contributions of different sensory modalities in naturalistic settings failed to properly disentangle vestibular from motor cues generated during active locomotion (Kearns et al., 2002; Campos et al., 2010; Bergmann et al., 2011; Chen et al., 2017; Schubert et al., 2012; Péruch et al., 1999; Péruch et al., 2005). Furthermore, varying constraints have presumably resulted in inconsistent findings on the contribution of vestibular cues to path integration (Jürgens and Becker, 2006; Campos et al., 2010; ter Horst et al., 2015; Tramper and Medendorp, 2015; Koppen et al., 2019; Chrastil et al., 2019; Glasauer et al., 1994; Seidman, 2008).

There is a tight link between path integration and spatial navigation on the one hand, and internal models and control dynamics on the other. To accurately estimate self-motion, we rely not only on momentary sensory evidence but also on the knowledge of motion dynamics, that is, an internal model of the world. Knowledge of the dynamics makes the sensory consequences of actions predictable, allowing for more dexterous steering. However, although there is a large body of research focused on dynamics and adaptation for motor control (Shadmehr and Mussa-Ivaldi, 1994; Lackner and Dizio, 1994; Krakauer et al., 1999; Takahashi et al., 2001; Burdet et al., 2001; Kording et al., 2007; Berniker et al., 2010), studies of perceptual inference of latent dynamics during navigation have been limited. Some pioneering studies demonstrated participants’ ability to reproduce arbitrary 1D velocity profiles (Grasso et al., 1999; Israël et al., 1997), while more recent efforts showed that the history of linear (Petzschner and Glasauer, 2011) and angular (Prsa et al., 2015) displacements affects how participants process sensory input in the current trial. We previously observed that false expectations about the magnitude of self-motion can have a drastic effect on path integration (Lakshminarasimhan et al., 2018). We wondered whether prior expectations about the temporal dynamics of self-motion, that is, how velocities are temporally correlated, can also propagate over time to influence navigation.

To explore how dynamics influence navigation across sensory modalities (visual, vestibular, or both), we have built upon a naturalistic paradigm of path integration in which participants navigate to a briefly cued target location using a joystick to control their velocity in a virtual visual environment (Lakshminarasimhan et al., 2018; Alefantis et al., 2021). Here, we generalize this framework by varying both the control dynamics (joystick control varied along a continuum from velocity to acceleration) and the available sensory cues (vestibular, visual, or both). To achieve this, we designed a motion-cueing (MC) algorithm that renders self-motion stimuli from joystick control inputs specifying sustained accelerations, while preserving correspondence between visual (optic flow) and inertial cues. Using a motion platform with six degrees of freedom to approximate the accelerations that an observer would feel under the imposed control dynamics, we ensured that the MC algorithm would generate matching visual and vestibular cues to closely approximate the desired self-motion (see Materials and methods, Figure 1—figure supplements 1 and 2). The development of the MC algorithm represents a departure from classical paradigms of navigation research in humans (Chrastil et al., 2019; Israël et al., 1996; Koppen et al., 2019; Seemungal et al., 2007; ter Horst et al., 2015), as it helps eliminate artificial constraints while still allowing for the isolation of different sensory contributions, most notably vestibular/somatosensory cues, during active, volitional steering.

We found that participants’ steering responses were biased (undershooting), and the biases were more prominent in the vestibular condition. Furthermore, steering biases were strongly modulated by the underlying control dynamics. These findings suggest that inertial cues alone (as generated by motion cueing) lack the reliability to support accurate path integration in the absence of sustained acceleration, and that an accurate internal model of control dynamics is needed to make use of sensory observations when navigating in volatile environments.

Results

Task structure

Human participants steered toward a briefly cued target location on a virtual ground plane, with varying sensory conditions and control dynamics interleaved across trials. Participants sat on a motion platform in front of a screen displaying a virtual environment (Figure 1A). Stereoscopic depth cues were provided using polarizing goggles. On each trial, a circular target appeared briefly at a random location (drawn from a uniform distribution within the field of view; Figure 1B and C) and participants had to navigate to the remembered target location in the virtual world using a joystick to control linear and angular self-motion. The virtual ground plane was defined visually by a texture of many small triangles, each of which appeared only transiently; they could therefore provide only optic-flow information and could not be used as landmarks. The self-motion process evolved according to Markov dynamics, such that the movement velocity at the next time step depended only on the current joystick input and the current velocity (Materials and methods – Equation 1a).

Figure 1 with 3 supplements
Experimental design.

(A) Experimental setup. Participants sat on a six-degrees-of-freedom motion platform with a coupled rotator that allowed unlimited yaw displacements. Visual stimuli were back-projected on a screen (see Materials and methods). The joystick participants used to navigate in the virtual world was mounted in front of the participants’ midline. (B) Schematic view of the experimental virtual environment. Participants used a joystick to navigate to a cued target (yellow disc) using optic-flow cues generated by ground plane elements (brown triangles; visual and combined conditions only). The ground plane elements appeared transiently at random orientations to ensure they could not serve as spatial or angular landmarks. (C) Left: Overhead view of the spatial distribution of target positions across trials. The red dot shows the starting position of the participant. Positions were uniformly distributed within the participant’s field of view. Right: Movement trajectories of one participant during a representative subset of trials. The starting location is denoted by the red dot. (D) Control dynamics. Inset: Linear joystick input from a subset of trials in the visual condition of an example participant. Left: Simulated maximum pulse joystick input (max joystick input = 1) (see also Figure 1—figure supplement 3). This input is lowpass filtered to mimic the existence of inertia. The time constant τ of the filter varies across trials. In our framework, the maximum velocity also varies according to the time constant τ of each trial to ensure comparable travel times across trials (see Materials and methods – Control Dynamics). Right: The same joystick input (scaled by the corresponding maximum velocity for each τ) produces different velocity profiles for different time constants (τ = 0.6 s corresponds to velocity control; τ = 3 s corresponds to acceleration control; τ values varied randomly along a continuum across trials, see Materials and methods).
Also depicted is the brief cueing period of the target at the beginning of the trial (gray zone, 1 s long). (E) Markov decision process governing self-motion sensation (Materials and methods – Equation 1a). u, v, and o denote joystick input, movement velocity, and sensory observations, respectively, and subscripts denote time indices. Note that due to the 2D nature of the task, these variables are all vector-valued, but we depict them as scalars for the purpose of illustration. By varying the time constant, we manipulated the control dynamics (i.e., the degree to which the current velocity carried over to the future, indicated by the thickness of the horizontal lines) along a continuum such that the joystick position primarily determined either the participant’s velocity (top; thin lines) or acceleration (bottom; thick lines) (compare with (D) top and bottom, respectively). Sensory observations were available in the form of vestibular (left), optic flow (middle), or both (right).

A time constant for the control filter (control timescale) governed the control dynamics: in trials with a small time constant and a fast filter, joystick position essentially controlled velocity, providing participants with responsive control over their self-motion, resembling regular road-driving dynamics. However, when the time constant was large and the control filter was slow, joystick position mainly controlled acceleration, mimicking high inertia under viscous damping, as one would experience on an icy road where steering is sluggish (Figure 1D right and 1E – top vs. bottom). For these experiments, as the control timescale changed, the maximum velocity was adjusted so that the participant could reach the typical target in about the same amount of time on average. This design ensured that the effect of changing control dynamics would not be confused with the effect of integrating sensory signals over a longer or shorter time.
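As a concrete sketch, the lowpass filtering of joystick input can be written as a discrete-time leaky integrator. Since Equation 1a is not reproduced in this section, the update rule, the 60 Hz sampling rate, and the parameter values below are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np

def simulate_velocity(u, tau, v_max, dt=1/60):
    """Integrate joystick input u (in [-1, 1]) under a first-order
    lowpass filter with time constant tau (seconds).
    Assumed discrete-time form of the task dynamics:
        v[t+1] = a * v[t] + (1 - a) * v_max * u[t],  a = exp(-dt / tau)
    """
    a = np.exp(-dt / tau)
    v = np.zeros(len(u))
    for t in range(len(u) - 1):
        v[t + 1] = a * v[t] + (1 - a) * v_max * u[t]
    return v

# A sustained unit pulse of joystick input (10 s at an assumed 60 Hz)
u = np.ones(600)
v_fast = simulate_velocity(u, tau=0.6, v_max=2.0)  # velocity-like control
v_slow = simulate_velocity(u, tau=3.0, v_max=2.0)  # acceleration-like control
```

With a small τ the velocity tracks the joystick almost immediately, while with a large τ it ramps up slowly, mimicking inertia under viscous damping, reproducing the contrast illustrated in Figure 1D.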

Concurrently, we manipulated the modality of sensory observations to generate three conditions: (1) a vestibular condition in which participants navigated in darkness, and sensed only the platform’s motion (note that this condition also engages somatosensory cues, see Materials and methods), (2) a visual condition in which the motion platform was stationary and velocity was signaled by optic flow, and (3) a combined condition in which both cues were available (Figure 1E – left to right). Across trials, sensory conditions were randomly interleaved while manipulation of the time constant followed a bounded random walk (Materials and methods – Equation 2). Participants did not receive any performance-related feedback.
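The trial-to-trial drift of the time constant can be sketched as a reflecting random walk in log τ. Equation 2 is not reproduced here, so the bounds, step size, and reflection rule below are hypothetical stand-ins chosen only to illustrate the idea that consecutive trials have similar dynamics:

```python
import numpy as np

def tau_random_walk(n_trials, tau_min=0.6, tau_max=3.0, step=0.2, seed=0):
    """Bounded random walk over the time constant across trials:
    log(tau) takes Gaussian steps and reflects at the bounds
    (hypothetical stand-in for the paper's Equation 2)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.log(tau_min), np.log(tau_max)
    phi = np.empty(n_trials)
    phi[0] = rng.uniform(lo, hi)
    for t in range(1, n_trials):
        p = phi[t - 1] + rng.normal(0, step)
        # reflect proposals that cross a bound back into the interval
        if p > hi:
            p = 2 * hi - p
        elif p < lo:
            p = 2 * lo - p
        phi[t] = np.clip(p, lo, hi)
    return np.exp(phi)

taus = tau_random_walk(500)
```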

Effect of sensory modality on performance

We first compared the participants’ stopping locations on each trial to the corresponding target locations, separately for each sensory condition. We calculated the radial distance r~ and angular eccentricity θ~ of the participants’ final position relative to the initial position (Figure 2A), and compared them to the initial target distance r and angle θ, as shown for all trials (all time constants together) of a typical participant in Figure 2B and C. This revealed biased performance with notable undershooting (participants stopped short of the true target location), in both distance and angle, which was well described by a linear model without intercept (radial distance R2 ± standard deviation – vestibular: 0.39 ± 0.06, visual: 0.67 ± 0.1, combined: 0.64 ± 0.11; angular eccentricity R2 ± standard deviation – vestibular: 0.85 ± 0.06, visual: 0.95 ± 0.05, combined: 0.96 ± 0.04. Adding a non-zero intercept term offered negligible improvement; radial distance ΔR2 – vestibular: 0.02 ± 0.02, visual: 0.03 ± 0.03, combined: 0.03 ± 0.02; angular eccentricity ΔR2 – vestibular: 0.02 ± 0.03, visual: 0.01 ± 0.01, combined: 0.01 ± 0.01). We refer to the slope of the linear regression as ‘response gain’: a response gain of unity indicates no bias, while gains larger (smaller) than unity indicate overshooting (undershooting). As shown with the example participant in Figure 2B and C, there was substantial undershooting in the vestibular condition, whereas performance was relatively unbiased under the combined and visual conditions (see also Figure 2—figure supplement 1A). These results were consistent across participants (Figure 2D, mean radial gain ± standard deviation – vestibular: 0.76 ± 0.25, visual: 0.88 ± 0.23, combined: 0.85 ± 0.22; mean angular gain ± standard deviation – vestibular: 0.79 ± 0.22, visual: 0.98 ± 0.14, combined: 0.95 ± 0.12), and no significant sex differences were observed (see Figure 2—figure supplement 1B).
The difference in response gain between modalities could be traced back to the control exerted by the participants on the joystick. Both linear and angular components of the control input had shorter duration in the vestibular condition (mean ± SEM of total area of joystick input across participants (a.u.): radial – vestibular: 5.62 ± 0.27, visual: 7.31 ± 0.33, combined: 7.07 ± 0.34; angular – vestibular: 2.39 ± 0.30, visual: 3.29 ± 0.42, combined: 3.79 ± 0.46), and produced smaller displacements, as summarized by the response gains (Figure 2, Figure 2—figure supplement 2).
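The response gain defined above is the slope of a zero-intercept regression of responses on targets, which has a simple closed-form least-squares solution. A minimal sketch with hypothetical trial data:

```python
import numpy as np

def response_gain(target, response):
    """Slope of a zero-intercept linear regression of response on target:
    gain = sum(target * response) / sum(target**2).
    A gain below 1 indicates undershooting; above 1, overshooting."""
    target = np.asarray(target, float)
    response = np.asarray(response, float)
    return (target @ response) / (target @ target)

# Hypothetical trials in which the participant stops ~20% short on average
rng = np.random.default_rng(0)
r_target = rng.uniform(1.0, 6.0, size=200)             # target distances (m)
r_response = 0.8 * r_target + rng.normal(0, 0.3, 200)  # noisy stopping distances
gain = response_gain(r_target, r_response)             # close to 0.8
```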

Figure 2 with 2 supplements
Effect of sensory modality on participants' responses.

(A) Geometric definition of analysis variables.

The gray solid line indicates an example trajectory. The target and response distance and angle relative to the starting position of the participant are given by r,θ (thin lines) and r~,θ~ (thick lines), respectively. (B, C) Example participant: Comparison of the radial distance r~ of the participant’s response (final position) against the radial distance r of the target (B), as well as the angular eccentricity of the response θ~ vs. the target angle θ (C), across all trials, colored according to the sensory condition (green: vestibular, cyan: visual, purple: combined visual and vestibular; Figure 2—source data 1). Radial and angular response gains were defined as the slopes of the corresponding regressions. Black dashed lines show unity slope, and the solid lines represent slopes of the regression fits (intercept set to 0). (D) All participants: Radial and angular gains in each sensory condition plotted for each individual participant (Figure 2—source data 2). Ellipses show 68% confidence intervals of the distribution of data points for the corresponding sensory condition. Diamonds (centers of the ellipses) represent the mean radial and angular response gains across participants. Dashed lines indicate unbiased radial or angular position responses. The solid diagonal line has unit slope. (E) Magnitudes of radial and angular components of control inputs across sensory conditions for an example participant. Shaded regions represent ±1 standard deviation across trials. The gray zone corresponds to the target presentation period.

Effect of control dynamics on performance

To examine whether control dynamics affected the response gain, we performed three complementary analyses. First, we recomputed response gains by stratifying the trials into three groups of equal size based on the time constants. We found that smaller time constants (velocity control) were associated with smaller response gains (Figure 3A; Appendix 2—table 1). This relationship was most pronounced in the vestibular condition, where larger time constants (acceleration control) resulted in better (closer to ideal) performance (Figure 3, green; see Discussion). Control dynamics had a smaller but considerable effect on steering responses in the visual and combined conditions, with participants exhibiting modest overshooting (undershooting) when the time constant was large (small) (Figure 3A, cyan/purple).

Figure 3 with 2 supplements
Effect of control dynamics on participants’ responses.

(A) Participant average of radial and angular response gains in each condition, with trials grouped into tertiles of increasing time constant τ. Error bars denote ±1 SEM. (B) Effect of time constant τ on radial (left) and angular (right) residual error, for an example participant (Figure 3—source data 1). Solid lines represent linear regression fits and ellipses the 68% confidence interval of the distribution for each sensory condition. Dashed lines denote zero residual error (i.e. stopping location matches mean response). (C) Correlations of radial (εr) and angular (εθ) residual errors with the time constant for all participants. Ellipses indicate the 68% confidence intervals of the distribution of data points for each sensory condition. Solid diagonal line has unit slope. Across participants, radial correlations, which were larger for the vestibular condition, were greater than angular correlations (see also Appendix 2—table 2). (D) Linear regression coefficients for the prediction of participants’ response location (final position: r~, θ~; left and right, respectively) from initial target location (r,θ) and the interaction between initial target location and the time constant (rτ,θτ) (all variables were standardized before regressing, see Materials and methods; Figure 3—source data 2). Asterisks denote statistical significance of the difference in coefficient values of the interaction terms across sensory conditions (paired t-test; *: p<0.05, **: p<0.01, ***: p<0.001; see main text). Error bars denote ±1 SEM. Note a qualitative agreement between the terms that included target location only and the gains calculated with the simple linear regression model (Figure 2B). (E) Comparison of actual and null-case (no adaptation) response gains, for radial (top) and angular (bottom) components, respectively (average across participants). Dashed lines represent unity lines, that is, actual response gain corresponds to no adaptation.
Inset: Regression slopes between actual and null-case response gains. A slope of 0 or 1 corresponds to perfect or no adaptation (gray dashed lines), respectively. Error bars denote ±1 SEM.

Second, we performed a fine-grained version of the above analysis by computing residual errors on each trial, that is, the deviation of the response from the mean response predicted from target location alone (Materials and methods – Equation 3). Since participants try to stop at their believed target location, ideally their mean responses should depend only on target location, and not on control dynamics. In other words, if participants adapted their control appropriately to the varying control dynamics, their responses should cluster around their mean response, and as a result, their residual errors should be centered around zero without any mean dependence on dynamics. However, we found a significant correlation between residual errors and the time constant across trials (Figure 3B, C; Figure 3—figure supplement 1, Appendix 2—table 2, see Materials and methods; no significant sex differences were observed, and they are therefore not investigated in subsequent analyses, see also Figure 2—figure supplement 1C). This correlation and the corresponding regression slopes were substantially higher in the vestibular condition (mean Pearson’s r ± SEM: radial component – vestibular: 0.52 ± 0.02, visual: 0.36 ± 0.03, combined: 0.37 ± 0.03; angular component – vestibular: 0.23 ± 0.02, visual: 0.23 ± 0.03, combined: 0.26 ± 0.03; see also Appendix 2—Tables 2 and 3). Thus, for a given target distance, participants tended to travel further when the time constant was larger (acceleration control), indicating that they did not fully adapt their steering control to the underlying dynamics.
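The residual-error analysis can be sketched as follows. The exact form of Equation 3 is not reproduced here, so the zero-intercept mean-response model and the synthetic data below are illustrative assumptions:

```python
import numpy as np

def residual_error(target, response):
    """Deviation of each response from the mean response predicted by
    target location alone (here, a zero-intercept regression)."""
    target = np.asarray(target, float)
    response = np.asarray(response, float)
    gain = (target @ response) / (target @ target)
    return response - gain * target

# Hypothetical data in which responses grow with the time constant tau
# on top of an overall undershooting gain, so residuals correlate with tau.
rng = np.random.default_rng(1)
n = 300
r_target = rng.uniform(1.0, 6.0, n)
tau = rng.uniform(0.6, 3.0, n)
r_response = (0.75 + 0.05 * tau) * r_target + rng.normal(0, 0.2, n)

eps = residual_error(r_target, r_response)
rho = np.corrcoef(tau, eps)[0, 1]   # positive: further travel for larger tau
```

A positive correlation between τ and the residual error, as in the vestibular condition, is the signature of incomplete adaptation to the dynamics.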

Third, to quantify the contribution of the time constant to the participants’ responses, we expanded the linear model to accommodate a dependence of the response (final stopping position) on target location, the time constant, and their interaction. A partial correlation analysis revealed that the time constant contributed substantially to participants’ response gain, albeit only by modulating the radial and angular distance dependence (Appendix 2—table 4; Figure 3—figure supplement 2; see Materials and methods – Equation 4). Again, the contribution of the time constant-dependent term was much greater for the vestibular condition (Figure 3D), especially for the radial distance (p-values of the difference in coefficient values across modalities obtained by a paired t-test – radial: vestibular vs. visual: p < 10–4, vestibular vs. combined: p < 10–4; angular: vestibular vs. visual: p = 0.016, vestibular vs. combined: p = 0.013). While perfect adaptation should lead to a response gain that is independent of control dynamics, all three independent analyses revealed that control dynamics did substantially influence the steering response gain, exposing participants’ failure to adapt their steering to the underlying dynamics. Adaptation was lowest for the vestibular condition; in contrast, for the visual and combined conditions, the response gain was less affected, indicating greater compensation when visual information was available.
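The expanded model can be sketched as a regression on standardized predictors that include a target-by-time-constant interaction. Since Equation 4 is not reproduced here, the predictor set below (target distance and its interaction with τ, for the radial component only) is an assumed simplification with synthetic data:

```python
import numpy as np

def zscore(x):
    return (x - x.mean()) / x.std()

# Hypothetical trials: the response depends on target distance r and,
# through incomplete adaptation, on the interaction r * tau.
rng = np.random.default_rng(2)
n = 300
r = rng.uniform(1.0, 6.0, n)
tau = rng.uniform(0.6, 3.0, n)
resp = (0.75 + 0.05 * tau) * r + rng.normal(0, 0.2, n)

# Standardize all variables before regressing, then fit jointly.
X = np.column_stack([zscore(r), zscore(r * tau)])
y = zscore(resp)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0]: target-location term; beta[1]: interaction term quantifying
# how strongly the time constant modulates the distance dependence.
```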

We quantified the extent to which participants failed to adapt to the control dynamics, by simulating a null case for no adaptation. Specifically, we generated null-case trajectories by using the steering input from actual trials and re-integrating it with time constants from other trials. In this set of null-case trajectories, the steering control corresponds to different time constants; in other words, steering is not adapted to the underlying dynamics (see Materials and methods). We then grouped these trajectories based on the simulation time constant (as in Figure 3A) and computed the corresponding response gains. We found that the true response gains in the vestibular condition were much closer to the no-adaptation null case, compared to visual/combined conditions (Figure 3E). Interestingly, this finding was more prominent in the radial component of the response gain (Figure 3E insets), consistent with our earlier observations of a stronger influence of the dynamics on the radial component of the responses.
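The null-case construction, re-integrating recorded steering input under a different time constant, can be sketched as below. For simplicity this sketch keeps the maximum velocity fixed across time constants, whereas the experiment scaled it with τ; the dynamics form, sampling rate, and inputs are hypothetical:

```python
import numpy as np

def total_displacement(u, tau, v_max, dt=1/60):
    """Integrate joystick input under an assumed first-order lowpass
    filter and return the total displacement."""
    a = np.exp(-dt / tau)
    v = np.zeros(len(u))
    for t in range(len(u) - 1):
        v[t + 1] = a * v[t] + (1 - a) * v_max * u[t]
    return np.sum(v) * dt

# Null case: take the input "recorded" under one time constant and
# re-integrate it under another, as if the participant had not adapted.
u_recorded = np.ones(300)   # 5 s of full forward input (hypothetical trial)
d_actual = total_displacement(u_recorded, tau=0.6, v_max=2.0)
d_null = total_displacement(u_recorded, tau=3.0, v_max=2.0)
# d_null differs from d_actual: unadapted control under different dynamics
# produces a different displacement, shifting the null-case response gain.
```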

We have shown how various measures of the participants’ final responses (stopping positions, response gains, residual errors) are influenced by the time constant of the dynamics. This strong dependence of the final responses on the time constant exposes participants’ failure to fully adapt their steering to the underlying dynamics. In other words, the influence of the dynamics on steering control was relatively weak, especially in the vestibular condition.

For best performance, however, control dynamics should influence the time course of steering behavior. We directly quantified the influence of the control dynamics on steering by comparing participants’ braking (negative control input) across time constants: when the time constant is large, we ideally expect to see more braking as a countermeasure for the sluggish control (Figure 1D) to minimize travel duration (see Materials and methods). Indeed, participants do tend to brake more for higher time constants, but this effect is weaker in the vestibular condition (Figure 4 and inset). Nevertheless, correlations between the time constant and cumulative braking (total area below zero linear control input) were significant in all sensory conditions (mean Pearson’s r ± SEM – vestibular: 0.20 ± 0.03, visual: 0.62 ± 0.04, combined: 0.57 ± 0.04; p-values of Pearson’s r difference from zero – vestibular: p = 10−5, visual: p < 10−7, combined: p < 10−7). Overall, it appears that behavior in the vestibular condition is minimally influenced by the dynamics (i.e. smaller modulation of control input by the time constant, as shown by the cumulative braking). When optic flow is available, however, participants are more flexible in adjusting their control.
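The cumulative braking measure is the absolute sum of negative linear joystick input within a trial. A minimal sketch with a hypothetical control trace:

```python
import numpy as np

def cumulative_braking(u_linear):
    """Absolute sum of negative linear joystick input over a trial."""
    u_linear = np.asarray(u_linear, float)
    return -np.sum(u_linear[u_linear < 0])

# Hypothetical control trace: full forward push, a braking pulse, then coast.
u = np.concatenate([np.ones(200), -0.5 * np.ones(40), np.zeros(60)])
braking = cumulative_braking(u)   # 40 samples of magnitude 0.5
```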

Figure 4

Linear and angular control inputs for each condition, grouped by time constant (see legend; bottom right), for an example participant.

Shaded regions represent ±1 standard deviation across trials. Yellow zones denote target presentation period. Inset: Cumulative braking (i.e. absolute sum of negative linear input) for each condition across time constant groups. Braking was averaged across trials. Error bars denote ±1 SEM across participants.

We have shown previously that accumulating sensory noise over an extended time (~10 s) would lead to a large uncertainty in the participant’s beliefs about their position, causing them to undershoot (Lakshminarasimhan et al., 2018). The exact amount of undershooting depends both on the reliability of self-motion cues, which determines the instantaneous uncertainty in the self-motion estimate, and on travel duration, which governs how much uncertainty is accumulated while navigating to the target. With recent findings ascribing uncertainty accumulation to noise in the velocity input (Stangl et al., 2020), the observed differences in navigation performance across sensory modalities can be readily attributed to greater measurement noise (lower reliability) in vestibular signals. On the other hand, we observed performance differences across control dynamics within each sensory modality, so those differences cannot be attributed to differences in the reliability of self-motion cues (instantaneous uncertainty). However, it might seem that this effect of control dynamics must be due to either differences in travel duration or velocity profiles, which would both affect the accumulated uncertainty. We adjusted stimulus parameters to ensure that the average travel time and average velocity were similar across different control dynamics (Materials and methods – Equations 1.2–1.10); however, we found that travel duration and average velocity depended weakly on the time constant in some participants. Simulations suggest that both dependencies are a consequence of maladaptation to the dynamics rather than a cause of the observed effect of the dynamics on the responses. Interestingly, the dependence is stronger in the vestibular condition, where there is less adaptation to the dynamics, agreeing with our simulations (Figure 5—figure supplement 1A, B).
Differences in velocity profiles are also an unlikely explanation, since their expected effect on the participants’ responses (undershooting) is the opposite of the observed effect of the control dynamics (an overshooting tendency; Figure 5—figure supplement 1C). Consequently, unlike the effect of sensory modality on response gain, neither instantaneous nor accumulated differences in uncertainty can fully account for the influence of control dynamics, that is, the time constant. Instead, we will now show that the data are well explained by strong prior expectations about motion dynamics that bias the estimate of the time constant.

Modeling the effect of control dynamics across sensory modalities

From a normative standpoint, to optimally infer movement velocity, one must combine sensory observations with the knowledge of the time constant. Misestimating the time constant would produce errors in velocity estimates, which would then propagate to position estimates, leading control dynamics to influence response gain (Figure 5A, middle-right). This is akin to misestimating the slipperiness of an ice patch on the road causing an inappropriate steering response, which would culminate in a displacement that differs from the intended one (Figure 5—figure supplement 2). However, in the absence of performance-related feedback at the end of the trial, participants would be unaware of this discrepancy, wrongly believing that the actual trajectory was indeed the intended one. In other words, participants’ imperfect adaptation to changes in control dynamics could be a consequence of control dynamics misestimation.

Figure 5 with 2 supplements
Bayesian model framework and correlations between the time constant and model-implied residual errors.

(A) Left: Illustration of the Bayesian estimator model. We fit two parameters: the ratio λ of the standard deviations of prior and likelihood (λ = σprior/σlikelihood) and the mean of the prior (μprior) of the normally distributed variable φ = log τ (black dotted box). The likelihood function is centered on the log-transform of the actual τ, φ* = log τ* (black dashed line). The time constant estimate τ^ corresponded to the median of the posterior distribution over τ, which corresponds to the median φ^ over φ, τ^ = exp(φ^) (red dotted box; red dashed line; see Materials and methods). Middle: Control dynamics implied by the actual time constant τ (top; gray shade) and the estimated time constant τ^ (bottom; red shade). u, v, and o denote joystick input, movement velocity, and sensory observations, respectively, and subscripts denote time indices. v^ denotes the inferred velocity implied by the model. Misestimation of the time constant leads to erroneous self-motion velocity estimates v^, which result in biased position beliefs. Right: Illustration of the actual (black) and believed (red) trajectories produced by integrating (box) the actual velocity v and the estimated velocity v^, respectively. White and yellow dots denote the starting and target positions, respectively. Inset: Illustration of residual errors that are correlated (black dots) or uncorrelated (red dots) with the time constant, for actual and model-implied responses (simulated data). For simplicity, we depict residual errors as one-dimensional and assume unbiased responses (response gain of 1). Blown-up dots with a yellow halo correspond to the actual and model-implied trajectories of the right panel. The solid black horizontal line corresponds to zero residual error (i.e. stopping on the target location). (B) Comparison of correlations between real and subjective residual errors with τ (Figure 5—source data 1). On the right, participant averages of these correlations are shown.
Colored bars: ‘Subjective’ correlations, open bars: Actual correlations. Error bars denote ±1 SEM across participants. Asterisks denote the level of statistical significance of differences between real and subjective correlations (*: p<0.05, **: p<0.01, ***: p<0.001).

Figure 5—source data 1

Correlations between the time constant and model-implied residual errors.

https://cdn.elifesciences.org/articles/63405/elife-63405-fig5-data1-v2.zip

We tested the hypothesis that participants misestimated the time constant using a two-step model that reconstructs the participants’ believed trajectories according to their point estimate of the time constant τ, as follows. First, a Bayesian observer model infers the participant’s belief about τ on individual trials, that is, the subjective posterior distribution over the time constant (τ inference step; Figure 5A, left). Second, we used the median of that belief to reconstruct the believed trajectory by integrating the actual joystick input according to the estimated time constant on that trial (integration step), resulting in a believed stopping location (Figure 5A, middle-right). In the absence of bias (response gain of one), the believed stopping locations should land on or near the target. However, various unmeasurable fluctuations in that belief across trials should lead to variability clustered around the target location. When the behavior is biased (response gain different from one, as was the case here – Figure 2D), this cluster should instead be centered around the participants’ mean belief for that target location (determined from their biased responses and henceforth referred to as mean stopping location). Since the participants’ goal is to stop as close to their perceived target location as possible, the deviation of believed stopping locations from the mean stopping location for a given target should be small. We call this deviation the subjective residual error. Therefore, we inferred the parameters of the Bayesian model separately for each participant by minimizing the subjective residual errors induced by the control dynamics using the principle of least squares (see Materials and methods for further details). We next describe the parameters of the Bayesian model and then describe the results of fitting the model to our data.

Because the time constant τ is always positive, we model both the prior distribution and the likelihood function over the variable φ=logτ as Gaussians in log-space. We parameterized both the prior and the likelihood with a mean (μ) and standard deviation (σ). The mean of the prior (μ) was allowed to vary freely across sensory conditions but assumed to remain fixed across trials. On each trial, the likelihood was assumed to be centered on the actual value of the log time constant τ* on that trial according to μ=φ*=logτ* and was therefore not a free parameter. Finally, we allowed the ratio λ of prior σ over likelihood σ to vary freely across sensory conditions. Thus, for each sensory condition, we fit two parameters: the μ of the prior, and the ratio (λ) of prior σ to likelihood σ. As mentioned above, we fit the model to minimize the difference between the participants’ believed stopping locations and their experimentally measured mean stopping location (subjective residual errors), using a least-squares approach (Materials and methods), and obtained one set of parameters for each condition. Finally, the participant’s estimated time constant τ^ on each trial was taken to be the median of the posterior distribution over τ under the best-fit model, which equals the exponential of the median of the distribution over φ (Figure 5A, left). By integrating the subject’s joystick inputs on each trial using τ^ rather than the actual time constant τ, we computed the believed stopping location and the subjective residual errors implied by the best-fit model.
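Because both prior and likelihood are Gaussian over φ=logτ, the posterior is also Gaussian, and the median of the implied distribution over τ has a closed form. The following sketch illustrates this estimation step only; it is not the fitting code, and the numerical values for μprior and λ are illustrative, loosely based on the fits reported below:

```python
import numpy as np

def estimate_tau(tau_true, mu_prior, lam):
    """Posterior median over tau for a Gaussian prior and likelihood in
    phi = log(tau), with lam = sigma_prior / sigma_likelihood."""
    phi_star = np.log(tau_true)  # likelihood is centered on the actual log tau
    # precision-weighted combination of prior mean and likelihood center
    phi_hat = (mu_prior + lam**2 * phi_star) / (1 + lam**2)
    # the median of a log-normal equals the exponential of the Gaussian median
    return np.exp(phi_hat)

# a small lam (narrow prior relative to likelihood) pulls the estimate
# toward exp(mu_prior), producing regression toward the mean
tau_hat_strong_prior = estimate_tau(4.0, mu_prior=0.58, lam=0.3)
tau_hat_weak_prior = estimate_tau(4.0, mu_prior=0.58, lam=1.0)
```

In the limit λ→∞ the estimate converges to the true τ, while for λ→0 it collapses to exp(μprior); the fitted values reported in Figure 6 fall between these two regimes.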

We then compared the correlations between the time constant and the residual errors for real responses (from data in Figure 3B and C) or subjective responses (from model), separately for radial and angular components. Because participants try to stop at their believed target location, the believed stopping position should depend only on target location and not on the control dynamics. Any departure would suggest that participants knowingly failed to account for the effect of the control dynamics, which would manifest as a dependence of the subjective residual errors on the time constant τ. In other words, a good model of the participants’ beliefs would predict that the subjective residual errors should be uncorrelated with the time constant τ (Figure 5A inset – red) even if the real residual errors are correlated with the time constant (Figure 5A inset – black). In all cases, we observed that the correlation between residual error and time constant was indeed significantly smaller when these errors were computed using the subjective (believed) rather than real stopping location (Figure 5B). In fact, subjective residual errors were completely uncorrelated with the time constant, suggesting that the Bayesian model is a good model of participants’ beliefs, and that the apparent influence of control dynamics on behavioral performance arose entirely because participants misestimated the time constant of the underlying dynamics.
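The logic of this comparison can be sketched with simulated data: residual errors generated with a built-in τ dependence correlate with the time constant, whereas residuals whose τ dependence has been explained away do not. The generative assumptions below are toy ones for illustration, not the paper’s trajectory integration:

```python
import numpy as np

rng = np.random.default_rng(0)
tau = rng.uniform(0.6, 4.8, 500)  # trial time constants (illustrative range)

# actual residual errors: overshoot grows with tau when the dynamics are
# misestimated (toy linear dependence plus noise)
actual_resid = 0.4 * (tau - tau.mean()) + rng.normal(0.0, 0.3, tau.size)

# subjective residual errors: if the model captures participants' beliefs,
# recomputing errors from believed trajectories removes the tau dependence
subjective_resid = rng.normal(0.0, 0.3, tau.size)

r_actual = np.corrcoef(tau, actual_resid)[0, 1]
r_subjective = np.corrcoef(tau, subjective_resid)[0, 1]
```

Here `r_actual` is large and positive while `r_subjective` hovers near zero, mirroring the pattern of open versus colored bars in Figure 5B.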

We next examined the model posterior estimates to assess how subjects’ internal estimate of the control dynamics departed from the true dynamics. The relationship between real and model-estimated time constants for all participants can be seen in Figure 6A. In the vestibular condition, all participants consistently misestimated τ, exhibiting a substantial regression toward the mean (Figure 6A, green). This effect was much weaker in the visual condition. Only a few participants showed relatively flat estimates, with the majority showing smaller departures from ideal estimates (dashed line). The data for the combined condition followed a similar trend, with properties between those in the visual and vestibular conditions (Figure 6A, purple). These results suggest that the better control adaptation in the visual and combined conditions shown in Figure 3 is due to participants’ improved estimates of the time constant when optic flow was available.

Model parameters.

(A) Relationship between the model-estimated and actual time constant across all participants in vestibular (green), visual (cyan), and combined (purple) conditions. Participant averages are superimposed (thick lines). Dashed line: unbiased estimation (Figure 6—source data 1). (B) Fitted model parameters: ratio λ of prior (σp) over likelihood (σl) standard deviation and mean (μ) of prior. Error bars denote ±1 SEM. Dashed lines represent the corresponding values of the sampling distribution of φ=logτ, which is normal (see Materials and methods; Figure 6—source data 2). The prior distribution’s μ was comparable in the vestibular condition to the μ of the actual sampling distribution (sampling distribution μ: 0.58 logs – p-value of prior μ difference obtained by bootstrapping – vestibular: p = 0.014, visual: p < 10–7; combined: p < 10–7). Asterisks denote the level of statistical significance of differences in the fitted parameters across conditions (*: p<0.05, **: p<0.01, ***: p<0.001).

The source of inaccuracies in the estimated time constant can be understood by examining the model parameters (Figure 6B). The ratio λ of prior over likelihood standard deviations was significantly lower in the vestibular condition than in the other conditions, suggesting stronger relative weighting of the prior over the likelihood (Figure 6B left, green bar; mean ratio λ ± SEM – vestibular: 0.30 ± 0.09, visual: 1.02 ± 0.17, combined: 0.80 ± 0.10; p-value of ratio λ paired differences obtained by bootstrapping – vestibular vs. visual: p = 0.0007, vestibular vs. combined: p = 0.0087; visual vs. combined: p = 0.016). Notably, the ratio was close to one for the visual and combined conditions, suggesting equal weighting of prior and likelihood. Thus, participants’ estimate of the control dynamics in the vestibular condition was plagued by a combination of a strong prior and a weak likelihood, which explains the much stronger regression toward the mean in Figure 6A.

Alternative models

To test whether our assumption of a static prior distribution over time constants was reasonable, we fit an alternative Bayesian model in which the prior distribution was updated iteratively on every trial, as a weighted average of the prior on the previous trial and the current likelihood over φ (dynamic prior model; see Materials and methods). For this version, the initial prior μ was taken to be the time constant on the first trial, and we once again modeled the likelihood and prior as normal distributions over the log-transformed variable φ, where the likelihood was centered on the actual φ and was therefore not a free parameter. Thus, we fit one parameter: the ratio λ of prior σ over likelihood σ. On each trial, the relative weighting of prior and likelihood responsible for the update of the prior depended solely on λ, that is, on the relative widths of the two distributions. The performance of the static and dynamic prior models was comparable in all conditions, for both distance and angle, suggesting that a static prior is adequate in explaining the participants’ behavior on this task (Figure 7; light vs. dark bars). In line with our expectations, when updating the prior in the dynamic model, the previous-trial prior received significantly more weight in the vestibular condition (weights in the range [0,1]; mean prior weights ± SEM – vestibular: 0.93 ± 0.03, visual: 0.48 ± 0.10, combined: 0.61 ± 0.09; p-value of paired weight differences obtained by bootstrapping – vestibular vs. visual: p = 10–5, vestibular vs. combined: p = 4 · 10–4; visual vs. combined: p = 0.08). The comparable goodness of the models with static and dynamic priors suggests that sensory observations were not reliable enough to cause rapid changes in prior expectations during the course of the experiment.
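A minimal sketch of this trial-by-trial update, assuming the prior weight is derived from λ as w = 1/(1 + λ²) (one natural mapping from the ratio of standard deviations to a precision-weighted average; the exact parameterization of the fitted model may differ):

```python
import numpy as np

def update_prior(mu_prior, phi_obs, lam):
    """One trial of the dynamic-prior update: the new prior mean is a
    weighted average of the previous prior mean and the current
    likelihood center, with lam = sigma_prior / sigma_likelihood."""
    w_prior = 1.0 / (1.0 + lam**2)  # small lam -> the prior dominates
    return w_prior * mu_prior + (1.0 - w_prior) * phi_obs

# track a correlated random walk of log time constants across trials
rng = np.random.default_rng(1)
phi = 0.58 + np.cumsum(rng.normal(0.0, 0.1, 50))  # illustrative walk
mu = phi[0]  # initial prior centered on the first trial's value
for phi_t in phi[1:]:
    mu = update_prior(mu, phi_t, lam=0.3)  # strong prior, slow updating
```

With a small λ (as fitted in the vestibular condition) the prior mean drifts only slowly toward new observations, which is the behavior summarized by the high vestibular prior weights above.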

Figure 7 with 2 supplements see all
Comparison of the correlations between the actual τ and the subjective residual errors implied by three different τ-estimation models (Bayesian estimation with a static prior [S], Bayesian estimation with a dynamic prior [D], fixed estimate [F]).

We tested the hypotheses that the prior distribution was not static, or that participants ignored changes in the control dynamics and navigated according to a fixed time constant across all trials (fixed τ estimate model; see Materials and methods). For this, we compared the correlations between the subjective residual error and the actual trial τ produced by each model. The dynamic prior model performs similarly to the static prior model in all conditions, indicating that a static prior is adequate in explaining our data (p-values of paired t-test between correlation coefficients of the two models: distance – vestibular: p = 0.96, visual: p = 0.19, combined: p = 0.91; angle – vestibular: p = 0.87, visual: p = 0.09, combined: p = 0.59). For visual and combined conditions, the fixed τ model not only fails to minimize the correlations but, in fact, strongly reverses them, for both distance (left) and angle (right). Since these correlations arise from the believed trajectories that the fixed τ model produces, this suggests that participants knowingly stop before their believed target location for higher time constants. Model performance was only comparable in the vestibular condition, where the average correlation of the fixed τ model (F) was contained within the 95% confidence intervals (CI) of the static prior Bayesian model (S), for both distance and angle (distance – F: mean Pearson’s correlation coefficient ρ = 0.03, S: 95% CI of Pearson’s correlation coefficient ρ = [–0.10 0.25]; angle – F: mean Pearson’s correlation coefficient ρ = –0.01, S: 95% CI of Pearson’s correlation coefficient ρ = [–0.12 0.15]). Error bars denote ±1 SEM.

At the other extreme, to test whether participants used sensory observations at all to estimate control dynamics, we compared the static prior Bayesian model to a parsimonious model that assumed a fixed time constant across all trials (i.e. completely ignoring changes in control dynamics). This latter model can be understood as a Bayesian model instantiated with a very strong static prior. In line with our expectations (see Figure 6A), this latter model performed comparably in the vestibular condition, but substantially worse in the visual and combined conditions (Figure 7).

Due to the correlated nature of the random walk process dictating the evolution of time constants, an alternative strategy by which participants could avoid estimating the time constant in the vestibular condition would be to carry over their estimate from the previous combined/visual trial to the current vestibular trial. To test this, we considered two models: the time constant estimate in the current vestibular trial was taken to be either the real time constant or the posterior estimate of the time constant from the previous visual/combined trial. Neither model, however, could account for the participants’ behavior, as neither could fully explain away the correlation between the residual errors and the time constant (Figure 7—figure supplement 1). Intuitively, although choosing actions with respect to the previous trial’s time constant should result in estimates that regress toward the mean, the predicted effect is weaker than that observed in the data.

Finally, we tested a variation of previously suggested sensory feedback control models (Glasauer et al., 2007; Grasso et al., 1999) in which a controller relies solely on sensory inputs to adjust its control without explicitly estimating the latent variables governing the control dynamics. Specifically, the model assumes that participants implement a type of bang-bang control that switches at a certain distance from the target (or, more accurately, from the mean response). However, this model predicts a much stronger dependence of the responses on the dynamics compared to our data, and characteristics of the predicted control input differ significantly from the actual control (Figure 7—figure supplement 2). Overall, our results suggest that optic flow, but not vestibular signals, primarily contributes to inferring the latent velocity dynamics.

Discussion

We showed that human participants can navigate using different sensory cues and that changes in the control dynamics affect their performance. Specifically, we showed that participants can path integrate to steer toward a remembered target location quite accurately in the presence of optic flow. In contrast, inertial (vestibular/somatosensory) cues generated by motion cueing alone lacked the reliability to support accurate path integration, leading to substantially biased responses under velocity control. Performance was also influenced by the changing control dynamics in all sensory conditions. Because control dynamics were varied on a trial-by-trial basis, sensory cues were crucial for inferring those dynamics. We used probabilistic inference models to show that the observed responses are consistent with estimates of the control dynamics that were biased toward the center of the experimental distribution. This was particularly strong under the vestibular condition such that the response gain substantially increased as the motion dynamics tended toward acceleration control. Although control dynamics were correlated across trials, our models showed that participants did not take advantage of those correlations to improve their estimates.

Relation to past work

In the paradigm used here, participants actively controlled linear and angular motion, allowing us to study multisensory path integration in two dimensions with few constraints. This paradigm was made possible by the development of an MC algorithm to render visual and vestibular cues either synchronously or separately. In contrast, previous studies on human path integration used restricted paradigms in which motion was either 1D or passively rendered, and participants’ decisions were typically reduced to end-of-trial binary evaluations of relative displacement (Campos et al., 2012; Chrastil et al., 2016; Chrastil et al., 2019; Jürgens and Becker, 2006; Koppen et al., 2019; ter Horst et al., 2015; Tramper and Medendorp, 2015). As a result, findings from past studies that evaluate the contributions of different sensory modalities to self-motion perception (Chrastil et al., 2019; Israël et al., 1996; Koppen et al., 2019; Seemungal et al., 2007; ter Horst et al., 2015) may be more limited in generalizing to real-world navigation.

Our results show that, at least in humans, navigation is driven primarily by visual cues under conditions of near-constant travel velocity (velocity control). This dominance of vision suggests that the reliability of the visual cues is much higher than that of the vestibular cues (as generated by our platform), as corroborated by the data from the combined condition, in which performance resembles the visual condition. This makes sense because the vestibular system is mainly sensitive to acceleration, exhibiting higher sensitivity to higher-frequency motion compared to the visual system (Karmali et al., 2014). Consequently, it may only be reliable when motion is dominated by acceleration. This interpretation is further supported by the observation that participants’ vestibular performance was much less biased in the regime of acceleration joystick control, where accelerations are prolonged during navigation.

Experimental constraints in past navigation studies have also precluded examining the influence of control dynamics. In fact, the importance of accurately inferring control dynamics, which are critical for predicting the sensory consequences of actions, has largely been studied in the context of limb control and motor adaptation (Burdet et al., 2001; Kording et al., 2007; Krakauer et al., 1999; Lackner and Dizio, 1994; Shadmehr and Mussa-Ivaldi, 1994; Takahashi et al., 2001). Here, we provide evidence for the importance of accurately inferring control dynamics in the context of path integration and spatial navigation. Although participants were not instructed to expect changes in the latent dynamics and received no feedback, we showed that they nevertheless partly adapted to those dynamics while exhibiting a bias toward prior expectations about these dynamics. This biased estimation of control dynamics led to biased path integration performance. This result is analogous to findings about the effect of changing control dynamics in motor control: first, adaptation to the dynamics happens even in the absence of performance-related feedback (Batcho et al., 2016; Lackner and Dizio, 1994) and, second, this adaptation relies on prior experience (Arce et al., 2009) and leads to systematic errors when knowledge of the dynamics is inaccurate (Körding et al., 2004). Thus, participants try to exploit the additional information that the dynamics contain about their self-motion in order to achieve the desired displacement.

A Bayesian estimator with a static prior over the dynamics sufficiently explained participants’ beliefs in our data, while results were comparable with a dynamic prior that was updated on every trial. This could be attributed to the structure of the random walk of the control dynamics across trials, as a static prior is less computationally demanding and potentially more suitable when the time constant changes quickly. These Bayesian models attempt to explain behavior in an optimal way given the task structure. Meanwhile, alternative suboptimal models (fixed estimate, carry-over estimate, sensory feedback model) failed to explain behavior successfully, especially when optic flow was available. These results strongly favor near-optimal underlying computations in the presence of optic flow.

Task performance was substantially worse in the vestibular condition, in a manner suggesting that vestibular inputs from motion cueing lack the reliability to precisely estimate control dynamics on individual trials. Nevertheless, the vestibular system could still facilitate inference by integrating trial history to build expectations about their statistics. Consistent with this, the mean of the prior distribution over the dynamics fit to data was very close to the mean of the true sampled distribution, suggesting that even if within-trial vestibular observations are not sufficient, participants possibly combine information about the dynamics across trials to construct their prior beliefs. This is consistent with the findings of Prsa et al., 2015, where vestibular cues were used to infer an underlying pattern of magnitude of motion across trials. However, the measurement of the dynamics in that study substantially differs from ours: here, motion dynamics are inferred using self-motion cues within each trial whereas in Prsa et al., 2015, the dynamics were inferred by integrating observations about the magnitude of the displacement across trials. If vestibular cues can in fact support inference of dynamics – as recent findings suggest in eye-head gaze shifts (Sağlam et al., 2014) – a common processing mechanism could be shared across sensory modalities. Overall, this finding highlights the importance of incorporating estimates of the control dynamics in models of self-motion perception and path integration.

Limitations and future directions

Note that restrictions of our motion platform limited the range of velocities that could be tested, allowing only for relatively small velocities (see Materials and methods). Consequently, trial durations were long, but the motion platform also restricted total displacement, so we could not test larger target distances. We previously studied visual path integration with larger velocities and our results in the visual and combined conditions are comparable for similar travel times (as trials exceeded durations of 10 s, undershooting became more prevalent; Lakshminarasimhan et al., 2018). However, it is unclear how larger velocities (and accelerations) would affect participants’ performance (especially under the vestibular condition) and whether the present conclusions are also representative of the regime of velocities not tested.

The design of the MC algorithm allowed us to circumvent the issues associated with the physical limitations of the platform to a large degree. This was achieved in part by exploiting the tilt/translation ambiguity and substituting linear translation with tilt (see Materials and methods). However, high-frequency accelerations, such as those found at movement onset, generated tilts that briefly exceeded the tilt-detection threshold of the semicircular canals (Figure 1—figure supplement 2). Although the duration of suprathreshold stimulation was very small, we cannot exclude the possibility that the perceived tilt affected the interpretation of vestibular inputs. For example, participants may not attribute tilt to linear translation, hence underestimating their displacement. This, however, would lead to overshooting to compensate for the lack of perceived displacement, which is not what we observed in our experiment. Another potential explanation for the poor vestibular performance could be that participants perceive tilt as a conflicting cue with respect to their expected motion or visual cues. In that case, participants would only use the vestibular inputs to a small extent, if at all. Manipulating vestibular inputs (e.g. gain, noise manipulations) in future experiments, either alone or in conjunction with visual cues, would offer valuable insights on two fronts: first, to help clarify the efficiency of our MC algorithm and its implications for the design of driving simulators in the future, and second, to precisely quantify the contribution of vestibular cues to path integration in natural settings.

For the sake of simplicity, we modeled each trial’s control dynamics as a single measurement per trial when, in reality, participants must infer the dynamics over the course of a trial using a dynamic process of evidence accumulation. Specifically, participants must measure their self-motion velocity over time and combine a series of measurements to extract information about the underlying dynamics. Although we were able to explain the experimental findings of the influence of control dynamics on steering responses with our model, this approach could be expanded into a more normative framework using hierarchical Bayesian models (Mathys et al., 2011) to infer subjective position estimates by marginalizing over possible control dynamics.

One interesting question is whether providing feedback would eliminate the inference bias in the estimation of control dynamics, and future studies should explicitly test this hypothesis. Furthermore, it would be interesting to jointly introduce sensory conflict and manipulate sensory reliability to study dynamic multisensory integration, such that sensory contributions during navigation can be better disentangled. Although it has been shown that cue combination takes place during path integration (Tcheang et al., 2011), previous studies have had contradicting results regarding the manner in which body-based and visual cues are combined (Campos et al., 2010; Chrastil et al., 2019; Koppen et al., 2019; Petrini et al., 2016; ter Horst et al., 2015). Since visual and vestibular signals differ in their sensitivity to different types of motion (Karmali et al., 2014), the outcomes of their integration may depend on the self-motion stimuli employed. Combined with hierarchical models of self-motion inference that consider the control dynamics, it should be possible to develop an integrated, multi-level model of navigation, while dramatically constraining the hypothesized brain computations and their neurophysiological correlates.

Materials and methods

Equipment and task

Fifteen participants (9 male, 6 female; all adults aged 18–32) participated in the experiments. Apart from two participants, all were unaware of the purpose of the study. Experiments were first performed in these two participants before testing the others. All experimental procedures were approved by the Institutional Review Board at Baylor College of Medicine, and all participants signed an approved consent form.

Experimental setup

Request a detailed protocol

The participants sat comfortably on a chair mounted on an electric motor allowing unrestricted yaw rotation (Kollmorgen motor DH142M-13–1320, Kollmorgen, Radford, VA), itself mounted on a six-degree-of-freedom motion platform (comprised of MOOG 6DOF2000E, Moog Inc, East Aurora, NY). Participants used an analog joystick (M20U9T-N82, CTI electronics, Stratford, CT) with two degrees of freedom and a circular displacement boundary to control their linear and angular speed in a virtual environment based on visual and/or vestibular feedback. The visual stimulus was projected (Canon LV-8235 UST Multimedia Projector, Canon USA, Melville, NY) onto a large rectangular screen (width × height: 158 × 94 cm) positioned in front of the participant (77 cm from the rear of the head). Participants wore crosstalk-free ferroelectric active-shutter 3D goggles (RealD CE4s, ColorLink Japan, Ltd, Tokyo, Japan) to view the stimulus. Participants wore headphones generating white noise to mask the auditory motion cues. The participants’ head was fixed on the chair using an adjustable CIVCO FirmFit Thermoplastic face mask (CIVCO, Coralville, IA).

Spike2 software (Power 1401 MkII data acquisition system from Cambridge Electronic Design Ltd, Cambridge, United Kingdom) was used to record the joystick position and all event markers for offline analysis at a sampling rate of 833.33 Hz.

Visual stimulus

Request a detailed protocol

Visual stimuli were generated and rendered using C++ Open Graphics Library (OpenGL) by continuously repositioning the camera based on joystick inputs to update the visual scene at 60 Hz. The camera was positioned at a height of 70 cm above the ground plane, whose textural elements’ lifetimes were limited (∼250 ms) to avoid their serving as landmarks. The ground plane was circular with a radius of 37.5 m (near and far clipping planes at 5 and 3750 cm, respectively), with the participant positioned at its center at the beginning of each trial. Each texture element was an isosceles triangle (base × height: 5.95 × 12.95 cm) that was randomly repositioned and reoriented at the end of its lifetime. The floor density was held constant across trials at ρ = 2.5 elements/m2. The target, a circle of radius 25 cm whose luminance was matched to the texture elements, flickered at 5 Hz and appeared at a random location between θ = ±38 deg of visual angle at a distance of r = 2.5–5.5 m (average distance r¯ = 4 m) relative to where the participant was stationed at the beginning of the trial. The stereoscopic visual stimulus was rendered in an alternate frame sequencing format and participants wore active-shutter 3D goggles to view the stimulus.

Behavioral task – visual, inertial, and multisensory motion cues

Request a detailed protocol

Participants were asked to navigate to a remembered target (‘firefly’) location on a horizontal virtual plane using a joystick, rendered in 3D from a forward-facing vantage point above the plane. Participants pressed a button on the joystick to initiate each trial and were tasked with steering to a randomly placed target that was cued briefly at the beginning of the trial. A short tone at every button push indicated the beginning of the trial and the appearance of the target. After 1 s, the target disappeared, which was a cue for the participant to start steering. During steering, visual and/or vestibular/somatosensory sensory feedback was provided (see below). Participants were instructed to stop at the remembered target location, and then push the button to register their final position and start the next trial. Participants did not receive any feedback about their performance. Prior to the first session, all participants performed about 10 practice trials to familiarize themselves with joystick movements and the task structure.

The three sensory conditions (visual, vestibular, combined) were randomly interleaved. In the visual condition, participants had to navigate toward the remembered target position given only visual information (optic flow). Visual feedback was stereoscopic, composed of flashing triangles to provide self-motion information but no landmark. In the vestibular condition, after the target disappeared, the entire visual stimulus was shut off too, leaving the participants to navigate in complete darkness using only vestibular/somatosensory cues generated by the motion platform. In the combined condition, participants were provided with both visual and vestibular information during their movement.

Independently of the manipulation of the sensory information, the properties of the motion controller also varied from trial to trial. Participants experienced different time constants in each trial, which affected the type and amount of control that was required to complete the task. In trials with short time constants, joystick position mainly controlled velocity, whereas in trials with long time constants, joystick position approximately controlled the acceleration (explained in detail in Control dynamics below).

Each participant performed a total of about 1450 trials (mean ± standard deviation (SD): 1450 ± 224), split equally among the three sensory conditions (mean ± SD – vestibular: 476 ± 71, visual: 487 ± 77, combined: 487 ± 77). We aimed for at least 1200 total trials per participant, and collected extended data from participants whose availability was compatible with the long runtime of our experiment.

Joystick control

Participants navigated in the virtual environment using a joystick placed in front of the participant’s midline, in a holder mounted on the bottom of the screen. This ensured that the joystick was parallel to the participant’s vertical axis, and its horizontal orientation aligned to the forward movement axis. The joystick had two degrees of freedom that controlled linear and angular motion. Joystick displacements were physically bounded to lie within a disk, and digitally bounded to lie within a square. Displacement of the joystick over the anterior-posterior axis resulted in forward or backward translational motion, whereas displacement in the left-right axis resulted in rotational motion. The joystick was enabled after the disappearance of the target. To avoid skipping trials and abrupt stops, the button used to initiate trials was activated only when the participant’s velocity dropped below 1 cm/s.

The joystick controlled both the visual and vestibular stimuli through an algorithm that involved two processes. The first varied the control dynamics, producing velocities given by a lowpass filtering of the joystick input, mimicking an inertial body under viscous damping. The time constant for the control filter (control timescale) was varied from trial to trial, according to a correlated random process as explained below.

The second process was a motion-cueing (MC) algorithm applied to the output of the control dynamics process, which defined physical motion that approximated the accelerations an observer would feel under the desired control dynamics, while avoiding the hardware constraints of the motion platform. This MC algorithm trades translation for tilt, allowing extended acceleration without hitting the displacement limits (24 cm).

These two processes are explained in detail below.

Control dynamics

Inertia under viscous damping was introduced by applying a lowpass filter on the control input, following an exponentially weighted moving average with a time constant that slowly varied across trials. On each trial, the system state evolved according to a first-order Markov process in discrete time, such that the movement velocity at the next time step depended only on the current joystick input and the current velocity. Specifically, the vertical and horizontal joystick positions $u_t^{v}$ and $u_t^{\omega}$ determined the linear and angular velocities $v_t$ and $\omega_t$ as:

(1a) $v_{t+1} = \alpha v_t + \beta_v u_t^{v} \quad\text{and}\quad \omega_{t+1} = \alpha \omega_t + \beta_\omega u_t^{\omega}$

The time constant τ of the lowpass filter determined the coefficient α (Figure 1—figure supplement 3A):

(1b) $\alpha = e^{-\Delta t / \tau}$

Sustained maximal controller inputs of $u_t^{v} = 1$ or $u_t^{\omega} = 1$ produce velocities that saturate at:

(1c) $v_{max} = \beta_v / (1 - \alpha) \quad\text{and}\quad \omega_{max} = \beta_\omega / (1 - \alpha)$
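As a minimal sketch of Equations 1a–1c (illustrative Python; the experiment used MATLAB, and the gain value here is arbitrary), a discrete-time leaky integrator driven by sustained maximal input saturates at β/(1 − α):

```python
import math

# Discrete-time leaky integration of joystick input (Equation 1a):
#   v[t+1] = alpha * v[t] + beta * u[t]
def integrate(u_series, tau, beta, dt=1/60):
    alpha = math.exp(-dt / tau)        # Equation 1b
    v = 0.0
    for u in u_series:
        v = alpha * v + beta * u
    return v

tau, beta = 2.0, 0.01                              # illustrative values
v_sat = integrate([1.0] * 100_000, tau, beta)      # sustained maximal input u = 1
v_max = beta / (1 - math.exp(-(1/60) / tau))       # Equation 1c: saturation velocity
```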

We wanted to set $v_{max}$ and $\omega_{max}$ so as to ensure that a target at an average linear or angular displacement x is reachable in an average time T, regardless of τ (we set x = 4 m and T = 8.5 s). This constrains the input gains $\beta_v$ and $\beta_\omega$. We derived these desired gains based on a 1D bang-bang control model (i.e. purely forward movement, or pure turning), which assumes maximal positive control until a switch time s, followed by maximal negative control until time T (Figure 1—figure supplement 3A). Although we implemented the leaky integration in discrete time with a frame rate of 60 Hz, we derived the input gains in continuous time and translated them to discrete time.

The velocity at any time $0 \le t \le T$ during the control is:

(1d) $$\frac{v_t}{v_{max}} = \begin{cases} 1 - e^{-t/\tau} & 0 < t \le s \\ -1 + \left(\dfrac{v_s}{v_{max}} + 1\right) e^{-(t-s)/\tau} & s < t < T \end{cases}$$

where $v_s$ is the velocity at the switching time s, when control switched from positive to negative, given by:

(1e) $v_s = v_{max}\left(1 - e^{-s/\tau}\right)$

By substituting vs into Equation 1d and using the fact that at time T, the controlled velocity should return to 0, we obtain an expression that we can use to solve for s:

(1f) $v_T = 0 = -1 + \left(\dfrac{v_{max}\left(1 - e^{-s/\tau}\right)}{v_{max}} + 1\right) e^{-(T-s)/\tau}$

Observe that $v_{max}$ cancels in this equation, so the switching time s is independent of $v_{max}$ and therefore also independent of the displacement x (see also Figure 1—figure supplement 3A):

(1g) $s = \tau \log\left(\dfrac{1 + e^{T/\tau}}{2}\right)$

Integrating the velocity profile of Equation 1d to obtain the distance travelled by time T, substituting the switch time s (Figure 1—figure supplement 3A), and simplifying, we obtain:

(1h) $x = x_T = 2 \tau v_{max} \log\left(\cosh\dfrac{T}{2\tau}\right)$

We can then solve for the desired maximum linear speed $v_{max}$ for any time constant τ, average displacement x, and trial duration T:

(1i) $v_{max}(\tau) = \dfrac{x}{2\tau} \cdot \dfrac{1}{\log\cosh(T/2\tau)}$

Similarly, the maximum angular velocity was $\omega_{max}(\tau) = \dfrac{\theta}{2\tau} \cdot \dfrac{1}{\log\cosh(T/2\tau)}$, where θ is the average angle we want our participants to be able to turn within the average time T.

These equations can also be re-written in terms of a dimensionless time $z = \tau/T$ (duration of trial in units of the time constant) and average velocities $\bar{v} = x/T$ and $\bar{\omega} = \theta/T$:

(1j) $v_{max} = \bar{v}\,\dfrac{1/2z}{\log\cosh(1/2z)} \quad\text{and}\quad \omega_{max} = \bar{\omega}\,\dfrac{1/2z}{\log\cosh(1/2z)}$


Setting control gains according to Equation 1i allows us to manipulate the control timescale τ while approximately maintaining the average trial duration for each target location (Figure 1—figure supplement 3B). Converting these maximal velocities into discrete-time control gains using Equations 1a–1c gives us the desired inertial control dynamics.
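The full chain — the velocity scale of Equation 1i, the discrete-time gains of Equations 1b and 1c, and the switch time of Equation 1g — can be checked numerically. The sketch below is illustrative Python (the experiment itself used MATLAB):

```python
import math

def control_gains(tau, x=4.0, T=8.5, dt=1/60):
    """Input gain from Equation 1i, converted to discrete time (Equations 1b, 1c)."""
    v_max = x / (2 * tau * math.log(math.cosh(T / (2 * tau))))
    alpha = math.exp(-dt / tau)
    beta = v_max * (1 - alpha)
    return alpha, beta

def bang_bang(tau, x=4.0, T=8.5, dt=1/60):
    """Maximal input until the switch time s (Equation 1g), then maximal braking."""
    alpha, beta = control_gains(tau, x, T, dt)
    s = tau * math.log((1 + math.exp(T / tau)) / 2)
    pos = v = t = 0.0
    while t < T:
        u = 1.0 if t < s else -1.0
        v = alpha * v + beta * u
        pos += v * dt
        t += dt
    return pos, v
```

For any τ, the simulated bang-bang trajectory reaches approximately x = 4 m by T = 8.5 s with near-zero final velocity, which is the invariance the gain derivation was designed to achieve.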

Slow changes in time constant

The time constant τ was sampled according to a temporally correlated log-normal distribution. The log of the time constant, $\phi = \log\tau$, followed a bounded random walk across trials according to (Figure 1—figure supplement 3C):

(2) $\phi_{t+1} = c\,\phi_t + \eta_t$

The marginal distribution of ϕ was normal, $N(\mu_\phi, \sigma_\phi^2)$, with mean $\mu_\phi = \frac{1}{2}\left(\log\tau_- + \log\tau_+\right)$ and standard deviation $\sigma_\phi = \frac{1}{4}\left(\log\tau_+ - \log\tau_-\right)$, which ensured that 95% of the velocity timescales lay between $\tau_-$ and $\tau_+$. The velocity timescales changed across trials with their own timescale $\tau_\phi$, related to the update coefficient by $c = e^{-\Delta t/\tau_\phi}$, where we set $\Delta t$ to be one trial and $\tau_\phi$ to be two trials. To produce the desired equilibrium distribution of ϕ, we set the scale of the random walk Gaussian noise $\eta \sim N(\mu_\eta, \sigma_\eta^2)$ with $\mu_\eta = \mu_\phi(1-c)$ and $\sigma_\eta^2 = \sigma_\phi^2\left(1-c^2\right)$.
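This sampling scheme can be sketched as follows (illustrative Python; the values of τ− and τ+ are assumptions here, and the reflections that keep the walk bounded are omitted):

```python
import math, random

def sample_time_constants(n, tau_lo=0.6, tau_hi=6.0, tau_phi=2.0, seed=1):
    """Draw phi = log(tau) from the AR(1) process of Equation 2."""
    mu_phi = 0.5 * (math.log(tau_lo) + math.log(tau_hi))
    sigma_phi = 0.25 * (math.log(tau_hi) - math.log(tau_lo))
    c = math.exp(-1.0 / tau_phi)                   # delta-t = one trial
    mu_eta = mu_phi * (1 - c)                      # keeps the marginal mean at mu_phi
    sigma_eta = sigma_phi * math.sqrt(1 - c * c)   # keeps the marginal SD at sigma_phi
    rng = random.Random(seed)
    phi, taus = mu_phi, []
    for _ in range(n):
        phi = c * phi + rng.gauss(mu_eta, sigma_eta)
        taus.append(math.exp(phi))
    return taus
```

With these noise parameters, the marginal distribution of log τ settles at the desired mean and spread, so roughly 95% of sampled timescales fall between τ− and τ+.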

MC algorithm

Each motion trajectory consisted of a linear displacement in the 2D virtual space combined with a rotation in the horizontal plane. While the motion platform could reproduce the rotational movement using the yaw motor (which was unconstrained in movement range and powerful enough to render any angular acceleration or speed in this study), its ability to reproduce linear movement was limited by the platform’s maximum range of 25 cm and maximum velocity of 50 cm/s (in practice, the platform was powerful enough to render any linear acceleration in this study). To circumvent this limitation, we designed an MC algorithm that takes advantage of the gravito-inertial ambiguity (Einstein, 1907) inherent to the vestibular organs (Angelaki and Dickman, 2000; Fernandez et al., 1972; Fernández and Goldberg, 1976).

Specifically, the otolith organs in the inner ear sense both linear acceleration (A) and gravity (G); that is, they sense the gravito-inertial acceleration (GIA): $F = G + A$. Consequently, a forward acceleration of the head ($a_x$, expressed in g, with 1 g = 9.81 m/s²) and a backward pitch (by an angle θ, in radians) will generate a total GIA $F_x = \theta + a_x$. The MC algorithm took advantage of this ambiguity to replace linear acceleration by tilt. Specifically, it controlled the motion platform to produce a total GIA (Figure 1—figure supplement 1, ‘desired platform GIA’) that matched the linear acceleration of the simulated motion in the virtual environment. As long as the rotation that induced this simulated acceleration was slow enough, the motion felt subjectively was a linear acceleration.
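As a numeric illustration of this ambiguity (a sketch following only from F = G + A; the function name is ours), the static tilt whose gravity component mimics a given forward acceleration is:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def tilt_for_acceleration(a_x):
    """Static tilt angle (rad) whose gravity component mimics a forward
    linear acceleration a_x (m/s^2), exploiting the otolith ambiguity."""
    return math.asin(a_x / G)

# A sustained 1 m/s^2 forward acceleration can be rendered by roughly 6 degrees
# of backward pitch, since the otoliths sense only the combined GIA.
theta = tilt_for_acceleration(1.0)
```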

This control algorithm was based on a trade-off where the high-pass component of the simulated inertial acceleration (Figure 1—figure supplement 1, ‘desired platform linear acceleration’) was produced by translating the platform, whereas the low-pass component was produced by tilting the platform (Figure 1—figure supplement 1, ‘desired platform tilt’).

Even though this method is generally sufficient to ensure that platform motion remains within its envelope, it does not guarantee it. Thus, the platform’s position, velocity, and acceleration commands were fed through a sigmoid function f (Figure 1—figure supplement 1, ‘platform limits’). This function was equal to the identity function ($f(x) = x$) as long as motion commands were within 75% of the platform’s limits, so these motion commands were unaffected. When motion commands exceeded this range, the function bent smoothly to saturate at a value set slightly below the limit, thus preventing the platform from reaching its mechanical range (in position, velocity, or acceleration) while ensuring a smooth trajectory. Thus, if the desired motion exceeded 75% of the platform’s performance envelope, the actual motion of the platform was diminished, such that the total GIA actually experienced by the participant (‘actual platform GIA’) may not have matched the desired GIA. If left uncorrected, these GIA errors would result in a mismatch between inertial motion and the visual VR stimulus. To prevent these mismatches, we designed a feedback loop that estimates the GIA error and updates the simulated motion in the visual environment. For instance, if the joystick input commands a large forward acceleration and the platform is unable to reproduce this acceleration, then the visual motion is updated to represent a slower acceleration that matches the platform’s motion. Altogether, the inertial control (IC) and MC algorithms are applied sequentially as follows: (1) the velocity signal produced by the IC process controls the participant’s attempted motion in the virtual environment; (2) the participant’s acceleration in the VR environment is calculated and fed to the MC algorithm (‘desired platform GIA’); (3) the MC algorithm computes the platform’s motion commands, and the actual platform GIA is computed; (4) the difference between the desired and actual GIA (GIA error) is computed and used to update the motion in the virtual environment; (5) the updated position is sent to the visual display.

A summary of the performance and efficiency of the MC algorithm during the experiment can be seen in Figure 1—figure supplement 2. For a detailed view of the implementation of the MC algorithm, refer to Appendix 1.

Quantification and statistical analysis

Customized MATLAB code was written to analyze data and to fit models. Depending on the quantity estimated, we report statistical dispersions either using the 95% confidence interval, standard deviation, or standard error of the mean. The specific dispersion measure is identified in the portion of the text accompanying the estimates. For error bars in figures, we provide this information in the caption of the corresponding figure. We report and describe the outcome as significant if p<0.05.

Estimation of response gain

In each sensory condition, we first computed the τ-independent gain for each participant: we regressed (without an intercept term) each participant’s response positions ($\tilde{r}$, $\tilde{\theta}$) against target positions (r, θ) separately for the radial ($\tilde{r}$ vs. r) and angular ($\tilde{\theta}$ vs. θ) coordinates, and the radial and angular response gains ($g_r$, $g_\theta$) were quantified as the slopes of the respective regressions (Figure 2A). In addition, we followed the same process to calculate gain terms within three τ groups of equal size (Figure 3A).
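The no-intercept regression slope has a simple closed form; a minimal sketch (illustrative Python; the analysis itself was done in MATLAB):

```python
def response_gain(targets, responses):
    """Slope of a regression through the origin: g = sum(t*r) / sum(t*t)."""
    num = sum(t * r for t, r in zip(targets, responses))
    den = sum(t * t for t in targets)
    return num / den
```

A gain of 1 indicates unbiased responses on average; gains below 1 indicate systematic undershooting.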

Correlation between residual error and time constant τ

To evaluate the influence of the time constant on the steering responses, we computed the correlation coefficient between the time constants and the residual errors from the mean response (estimated using the response gain) for distance and angle. Under each sensory condition, the radial residual error (εr) for each trial i was given by:

(3a) $\varepsilon_{r,i} = \tilde{r}_i - g_r r_i$

where $\tilde{r}_i$ is the radial response, and the mean radial response is given by multiplying the target distance $r_i$ by the radial gain $g_r$. Similarly, the angular residual error ($\varepsilon_\theta$) was calculated as:

(3b) $\varepsilon_{\theta,i} = \tilde{\theta}_i - g_\theta \theta_i$
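Equation 3 and the correlation with τ can be sketched as follows (illustrative Python; a hand-rolled Pearson correlation stands in for the MATLAB routine actually used):

```python
def residual_errors(responses, targets, gain):
    """Equation 3: deviation of each response from the gain-scaled target."""
    return [resp - gain * targ for resp, targ in zip(responses, targets)]

def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5
```

A positive correlation between residual errors and τ means participants overshoot more (relative to their mean response) on trials with longer time constants.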

Regression model containing τ

To assess the manner in which the time constant affected the steering responses, we augmented the simple linear regression models for response gain estimation mentioned above with τ-dependent terms (Figure 3—figure supplement 2; predictors τ and τ·r for the radial response $\tilde{r}$, and τ and τ·θ for the angular response $\tilde{\theta}$). Subsequently, we calculated Pearson’s linear partial correlations between the response positions and each of the three predictors.

Estimation of τ-dependent gain

To quantify the extent to which the time constant modulates the response gain, we linearly regressed each participant’s response positions (r~,θ~) against target positions (r,θ) and the interaction between target positions and the time constant τ according to:

(4a) $\tilde{r} = b_r r + a_r r\tau \quad\text{and}\quad \tilde{\theta} = b_\theta \theta + a_\theta \theta\tau$

where $b_r$, $b_\theta$ and $a_r$, $a_\theta$ are the coefficients of the target locations and the interaction terms, respectively. All quantities were first standardized by dividing them by their respective standard deviations, to avoid size effects of the different predictors. This form allows for modulation of the response gain by the time constant, which becomes clear when the target location is factored out:

(4b) $\tilde{r} = r\,(b_r + a_r \tau) \quad\text{and}\quad \tilde{\theta} = \theta\,(b_\theta + a_\theta \tau)$
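This interaction regression can be sketched with ordinary least squares via the 2×2 normal equations (illustrative Python; the standardization step described above is omitted for brevity):

```python
def fit_tau_gain(targets, taus, responses):
    """Least squares for response = b*target + a*(target*tau), as in Equation 4a
    (without the standardization used in the paper)."""
    x1 = targets
    x2 = [t * tau for t, tau in zip(targets, taus)]
    s11 = sum(v * v for v in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(p * q for p, q in zip(x1, x2))
    sy1 = sum(p * y for p, y in zip(x1, responses))
    sy2 = sum(p * y for p, y in zip(x2, responses))
    det = s11 * s22 - s12 * s12          # 2x2 normal equations, Cramer's rule
    b = (sy1 * s22 - sy2 * s12) / det
    a = (s11 * sy2 - s12 * sy1) / det
    return b, a
```

A nonzero interaction coefficient a means the effective response gain b + aτ changes with the trial's time constant, as in Equation 4b.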

Estimation of simulated no-adaptation response gains

We quantified the extent to which participants failed to adapt to the underlying control dynamics, by generating a simulated null case for no adaptation. First, we selected trials in which the time constant was close to the mean of the sampling distribution (±0.2 s). Then, we integrated the steering input of those trials with time constants from other trials (see Equations 1a, 1b). This generated a set of trajectories for which the steering corresponded to a different time constant, providing us with a null case of no adaptation to the underlying dynamics. We then stratified the simulated trajectories into equal-sized groups based on the time constants (same as in Figure 3A) and computed the corresponding radial and angular response gains. Note that the response gains were computed according to the target locations of the initial set of trials.
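This re-integration can be sketched as follows (illustrative Python; the input series and τ values here are arbitrary). The same joystick input stops at different places under different control dynamics, which is what makes it a valid no-adaptation benchmark:

```python
import math

def integrate_with_tau(u_series, tau, x=4.0, T=8.5, dt=1/60):
    """Re-integrate a joystick input series under control dynamics with a
    different time constant (Equations 1a, 1b), including the tau-dependent
    input gain implied by Equation 1i."""
    v_max = x / (2 * tau * math.log(math.cosh(T / (2 * tau))))
    alpha = math.exp(-dt / tau)
    beta = v_max * (1 - alpha)
    pos = v = 0.0
    for u in u_series:
        v = alpha * v + beta * u
        pos += v * dt
    return pos

u = [1.0] * 300 + [0.0] * 200        # 5 s of full forward input, then release
d_short = integrate_with_tau(u, 1.0)  # velocity-like dynamics
d_long = integrate_with_tau(u, 3.0)   # acceleration-like dynamics
```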

Rationale behind modeling approach

We tested the hypothesis that the τ-dependent errors in steering responses arise from participants misestimating the control dynamics on individual trials. Specifically, if participants’ estimate of the time constant τ differs from the actual value, then their believed trajectories (computed using the estimated τ) would differ accordingly from the actual trajectories along which they travelled. Because participants try to stop at their believed target location, their believed stopping locations should land on or near the target; unmeasurable fluctuations in that belief across trials should only produce variability clustered around the target location. Consequently, after adjusting for the average response gain, believed stopping locations should be distributed evenly around the participant’s mean response (mean belief): if the distribution of believed responses depended on the time constant, that would imply that participants knowingly misestimated the control dynamics. Mathematically, the subjective residual errors (deviations of the believed stopping locations from the mean response for a given target; see Materials and methods: Correlation between residual error and time constant τ) should be distributed evenly around zero and be uncorrelated with the time constant τ. Therefore, a good model of the participants’ beliefs should predict that subjective residual errors are statistically independent of the time constant.

Bayesian observer model for τ estimation

To account for the effect of the time constant τ on steering performance, we considered a two-step observer model that uses a measurement m of the real time constant τ and a prior distribution over hypothesized time constants in logarithmic scale to compute an estimate $\hat{\tau}$ on each trial (first step), and then integrates the actual joystick input using that estimate to reconstruct the participant’s believed trajectory (second step). We formulated our model in the logarithmic space of $\varphi = \log\tau$; therefore, the prior distribution over the hypothesized time constants $p(\varphi)$ was assumed to be normal in log-space with mean $\mu_{prior}$ and standard deviation $\sigma_{prior}$. The measurement distribution $p(m|\varphi)$ was also assumed to be normal in log-space, with mean φ and standard deviation $\sigma_{likelihood}$. Note that whereas the prior $p(\varphi)$ remains fixed across trials of a particular sensory modality, the mean of the measurement distribution is governed by φ and thus varies across trials. For each sensory modality, we fit two parameters, $\Theta = \{\mu_{prior}, \lambda\}$, where λ was taken to be the ratio of $\sigma_{prior}$ to $\sigma_{likelihood}$ (i.e. their relative weight).

Model fitting

When inferring the participant’s beliefs about the control dynamics, we computed the posterior distribution on trial i as $p(\varphi|m_i) \propto p(\varphi)\,p(m_i|\varphi)$ (Figure 5A, left) and then selected the median over φ (equal to the maximum a posteriori estimate, since the posterior is Gaussian in log-space), and back-transformed it from log-space to obtain an estimate of the time constant $\hat{\tau}_i$ for that trial:

(5) $\hat{\tau}_i = \exp\left\{\underset{\varphi}{\arg\max}\; p(\varphi|m_i)\right\}$

Subsequently, $\hat{\tau}_i$ is used to integrate the actual joystick input and produce the participant’s believed trajectory, according to Equations 1a–1c in the Control dynamics section.
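Because prior and likelihood are both Gaussian in φ = log τ, the MAP estimate of Equation 5 has a closed form: a precision-weighted average controlled by λ. An illustrative sketch:

```python
import math

def estimate_tau(measured_tau, mu_prior, lam):
    """MAP (= posterior mean) of phi = log(tau) for a Gaussian prior
    N(mu_prior, sigma_p^2) and a Gaussian likelihood centered on log(measured_tau),
    with lam = sigma_p / sigma_l:
        phi_hat = (mu_prior + lam^2 * log m) / (1 + lam^2)"""
    phi_hat = (mu_prior + lam ** 2 * math.log(measured_tau)) / (1 + lam ** 2)
    return math.exp(phi_hat)
```

A small λ (narrow prior relative to the likelihood) pulls $\hat{\tau}$ toward exp(μ_prior), producing regression toward the prior mean; a large λ makes $\hat{\tau}$ follow the measurement.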

The Bayesian model had two free parameters Θ{μprior,λ}. We fit the model by assuming that participants stop as close to the target as possible given their understanding of the task. Specifically, we minimized the mean squared error (MSE) between the measured mean stopping position (computed using the response gains gr and gθ from Equation 3) and our model of the participant’s believed stopping location x^i given the inferred dynamics τ^i . For each sensory condition:

(6) $\Theta = \underset{\Theta}{\arg\min}\; \dfrac{1}{n} \sum_{i=1}^{n} \left\{\hat{x}_i(\hat{\tau}_i, u_i) - G\,x_i^{tar}\right\}^2$

where, for each trial i, $\hat{x}_i$ is the participant’s believed position, $\hat{\tau}_i$ is the estimated time constant, $u_i$ is the time series of the joystick control input, $x_i^{tar}$ is the actual target position, G is the response gain matrix determined from $g_r$ and $g_\theta$, and n is the total number of trials.

Model validation

To evaluate the performance of our model, we examined the correlations between the subjective residual error and τ that are given by the model. The subjective residual error is defined as the difference between the believed (subjective) stopping location that a model produces and the mean response of the actual trajectories, adjusted for the response gain. The subjective residual errors are calculated for the radial and angular components of the response separately, according to Equation 3 (where actual responses r~,θ~ are substituted by believed responses r~^,θ~^ , respectively). Ideally, these correlations should not exist for the model predictions (explained in text; Figure 5B). We determined the statistical significance of the model-implied correlations by adjusting for multiple comparisons (required level of statistical significance: p = 0.0085). To assess the performance of the Bayesian model, we compared the correlations between believed and actual stopping location with the time constant (Figure 5B; Wilcoxon signed-rank test).

Dynamic prior model

Since the time constant changes randomly across trials, we tested whether the history of time constants influenced the estimate $\hat{\tau}$. If so, the Bayesian model would imply a prior distribution over $\varphi = \log\tau$ that changes dynamically according to the recent history of time constants, rather than being fixed. To explore this possibility, we repeated the two-step model outlined above, with the difference that the mean of the prior distribution is updated on every trial i by a weighted average of the prior mean on the previous trial and the current measurement of φ:

(7) $\mu_{prior,\,i} = (1-k)\,\mu_{prior,\,i-1} + k\,\varphi_i\,, \quad\text{where } k = \dfrac{\lambda^2}{\lambda^2 + 1}$

and where λ is the ratio of the prior standard deviation to the likelihood standard deviation. As k indicates, the relative weighting between prior and measurement on each trial depends solely on their relative widths. Finally, the initial prior mean was taken to be the log time constant of the first trial. Thus, the only free parameter we fit was λ.
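Equation 7 is an exponential moving average in log-space; a minimal sketch:

```python
def update_prior_mean(mu_prev, phi_measured, lam):
    """Equation 7: trial-by-trial update of the prior mean over phi = log(tau)."""
    k = lam ** 2 / (lam ** 2 + 1)        # relative weight of the new measurement
    return (1 - k) * mu_prev + k * phi_measured
```

Under a stable sequence of time constants, the prior mean converges geometrically to the measured log time constant, with a rate set by λ.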

Sensory-independent model

As another alternative to the Bayesian model with a static prior, we also constructed a model where participants ignored changes in the time constants and navigated according to a fixed estimate τ^ across all trials in each sensory condition. This model had only one free parameter: the time constant estimate τ^ , which was integrated with the actual joystick input of each trial to reconstruct the believed trajectory of the participant. We fit τ^ for each sensory condition by minimizing the MSE between the believed stopping location and the mean response (according to Equation 6).

Model comparison

To compare the static prior Bayesian model against the dynamic prior Bayesian and the sensory-independent models, we compared the correlations between believed stopping locations and time constants that each model produces (Figure 7; paired Student’s t-test).

Sensory feedback control model

We tested a sensory feedback control model in which the controller uses bang-bang control and switches from forward to backward input at a constant, predetermined distance from the target position (corrected for the bias, i.e. the mean response). Specifically, we preserved the actual angular control input and only fitted the linear control input for each trial. The switch distance thus refers to the Euclidean distance from the bias-corrected target position. We fit the mean and standard deviation of the switch distance for each participant in each condition separately, by minimizing the distance between the actual and the model-predicted stopping locations. To evaluate how well this model describes our data, we compared the correlations and regression slopes between the time constant and residual errors from the stopping locations predicted by the model with those from our actual data (Figure 7—figure supplement 2).
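A 1D sketch of this switch-distance rule under the leaky control dynamics (illustrative Python; the parameter values are arbitrary and the fitted trial-to-trial variability of the switch distance is omitted):

```python
import math

def switch_model_stop(target_dist, switch_dist, tau=2.0, v_max=0.7, dt=1/60):
    """Bang-bang feedback rule: full forward input until within switch_dist of
    the target, then full braking until velocity reaches zero; returns the
    stopping position."""
    alpha = math.exp(-dt / tau)
    beta = v_max * (1 - alpha)
    pos = v = 0.0
    braking = False
    while True:
        if not braking and target_dist - pos <= switch_dist:
            braking = True
        v = alpha * v + beta * (-1.0 if braking else 1.0)
        if braking and v <= 0.0:
            return pos
        pos += v * dt
```

Larger switch distances produce earlier stops, so fitting the switch distance trades off undershoot against overshoot around the bias-corrected target.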

Appendix 1

Implementation of MC algorithm

Step 1

In the first step, the participant’s velocity is transformed into the VR (screen) coordinates. This transformation is necessary to derive centrifugal components from the participant’s trajectory and include them in the motor commands:

$v_{t+1}^{VR,x} = v_{t+1}^{JS} \cos(\varphi_t^{VR})$
$v_{t+1}^{VR,y} = v_{t+1}^{JS} \sin(\varphi_t^{VR})$
$\omega_{t+1}^{VR} = \omega_{t+1}^{JS}$

where $v^{VR,x}$ and $v^{VR,y}$ are the linear velocities of the participant in VR coordinates, $\omega^{VR}$ is the angular velocity in the VR system, and $\varphi^{VR}$ is the heading direction in the virtual environment.

Step 2

As mentioned before, the arena diameter is finite, and it is necessary to keep track of the participant’s position in the arena to avoid ‘crashing’ into the invisible walls. In this step, the participant’s velocity is slowed down as the participant approaches the boundaries of the arena, producing a ‘smooth crash’.

Step 3

Here, the current acceleration is calculated in VR coordinates ($a^{VR,x}$, $a^{VR,y}$). This is also where the GIA error feedback loop (see Step 10) updates the VR acceleration.

$a_{t+1}^{VR,x} = \dfrac{v_{t+1}^{VR,x} - v_t^{VR,x} + \frac{dt}{\tau_{MC}}\left(v_t^{VR,x} - \hat{v}_t^{VR,x}\right)}{dt}$
$a_{t+1}^{VR,y} = \dfrac{v_{t+1}^{VR,y} - v_t^{VR,y} + \frac{dt}{\tau_{MC}}\left(v_t^{VR,y} - \hat{v}_t^{VR,y}\right)}{dt}$

where $\hat{v}_t$ is the updated velocity from the previous timestep ($\tau_{MC}$ is explained in Step 10). After the acceleration is obtained, it is transformed back to the participant’s coordinates ($a^{sub,x}$, $a^{sub,y}$):

$a_{t+1}^{sub,x} = a_{t+1}^{VR,x} \cos\varphi_t^{VR} + a_{t+1}^{VR,y} \sin\varphi_t^{VR}$
$a_{t+1}^{sub,y} = -a_{t+1}^{VR,x} \sin\varphi_t^{VR} + a_{t+1}^{VR,y} \cos\varphi_t^{VR}$

Step 4

Now, the acceleration $a^{sub}$ in the participant’s coordinates is transformed into platform coordinates to take into account the orientation of the participant on the motion platform ($\varphi_t^{moog}$), which is controlled by the yaw motor. For instance, if the participant faces toward the left of the platform and accelerates forward in egocentric coordinates, then the platform should move to the left:

$a_{t+1}^{desired,x} = a_{t+1}^{sub,x} \cos\varphi_t^{moog} - a_{t+1}^{sub,y} \sin\varphi_t^{moog}$
$a_{t+1}^{desired,y} = a_{t+1}^{sub,x} \sin\varphi_t^{moog} + a_{t+1}^{sub,y} \cos\varphi_t^{moog}$

where $a_{t+1}^{desired,x}$ is the desired platform acceleration.

Step 5

This is the MC step. Here, the amount of tilt and translation that will be commanded is computed, based on the tilt-translation trade-off we set. First, the platform’s desired acceleration is computed by applying a step response function ft to the acceleration input:

$a^{MC,x}(t) = \displaystyle\int_0^{+\infty} a^{desired,x}(s)\, f(t-s)\, ds$

where:

$f(t) = k_1 e^{-t/T_1} + k_2 e^{-t/T_2} + k_3 e^{-t/T_3}$
$T = [0.07\;\; 0.3\;\; 1]\ \text{s}, \quad K = [-0.4254\;\; 1.9938\;\; -0.5684]$

These coefficients were adjusted with respect to the following constraints:

  1. f(0) = 1, that is, the output corresponds to the input at t = 0. This was chosen to ensure that the high-frequency content of the motion would be rendered by translating the platform.

  2. $\int_0^{\infty} f(t)\,dt = 0$: This was chosen to ensure that, if the input was an infinitely long acceleration, the motion of the platform would stabilize to a point where the linear velocity was 0.

  3. df/dt = 0 at t = 0. This was chosen because the tilt velocity of the platform is equal to −df/dt. Since the tilt velocity at t < 0 is zero, this constraint ensures that tilt velocity is continuous and prevents excessive angular acceleration at t = 0.

The same process is repeated for the y component of the acceleration.
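The three constraints above can be checked numerically. The printed gain magnitudes satisfy them only if $k_1$ and $k_3$ are taken as negative, so this sketch assumes K = [−0.4254, 1.9938, −0.5684]:

```python
import math

T = [0.07, 0.3, 1.0]                 # time constants (s)
K = [-0.4254, 1.9938, -0.5684]       # gains; signs assumed so the constraints hold

def f(t):
    """Step-response kernel: sum of three decaying exponentials."""
    return sum(k * math.exp(-t / tc) for k, tc in zip(K, T))

value_at_zero = f(0.0)                                  # constraint 1: f(0) = 1
integral = sum(k * tc for k, tc in zip(K, T))           # constraint 2: integral of f = 0
slope_at_zero = -sum(k / tc for k, tc in zip(K, T))     # constraint 3: f'(0) = 0
```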

Finally, the amount of tilt (θ, in degrees) is calculated based on the difference between the desired platform motion and the deliverable motion:

$\theta_{t+1}^{MC,x} = \sin^{-1}\left(\dfrac{a_{t+1}^{moog,x} - a_{t+1}^{MC,x}}{g}\right)$
$\theta_{t+1}^{MC,y} = \sin^{-1}\left(\dfrac{a_{t+1}^{moog,y} - a_{t+1}^{MC,y}}{g}\right)$

where g = 9.81 m/s2.

Step 6

Afterward, the tilt velocity and acceleration are calculated:

$\dot{\theta}_{t+1}^{MC,x} = \dfrac{\theta_{t+1}^{MC,x} - \theta_t^{MC,x}}{dt}\,, \quad \dot{\theta}_{t+1}^{MC,y} = \dfrac{\theta_{t+1}^{MC,y} - \theta_t^{MC,y}}{dt}$
$\ddot{\theta}_{t+1}^{MC,x} = \dfrac{\dot{\theta}_{t+1}^{MC,x} - \dot{\theta}_t^{MC,x}}{dt}\,, \quad \ddot{\theta}_{t+1}^{MC,y} = \dfrac{\dot{\theta}_{t+1}^{MC,y} - \dot{\theta}_t^{MC,y}}{dt}$

In a next step, we compute the motion command that should be sent to the platform. Note that the platform is placed at a height h below the head. Therefore, tilting the platform by an angle θ induces a linear displacement of the head corresponding to $-h\theta\pi/180$, and a linear displacement is added to the platform’s motion to compensate for this. Next, we limit the platform’s acceleration, velocity, and position commands to ensure that they remain within the limits of the actuators. For this purpose, we define the following function $f_{\lambda,x_{max}}(x)$:

$$f_{\lambda,x_{max}}(x) = \begin{cases} x & \text{if } |x| \le \lambda x_{max} \\ x_{max}\,\mathrm{sign}(x)\left[\left|\dfrac{x}{x_{max}}\right| - \dfrac{1}{4(1-\lambda)}\left(\left|\dfrac{x}{x_{max}}\right| - \lambda\right)^2\right] & \text{if } \lambda x_{max} < |x| \le (2-\lambda)\,x_{max} \\ \mathrm{sign}(x)\,x_{max} & \text{if } |x| > (2-\lambda)\,x_{max} \end{cases}$$

This function is designed so that if the input x increases continuously, for example $x(t) = t$, the output $f_{\lambda,x_{max}}(x(t))$ will be identical to x until x reaches the threshold $\lambda x_{max}$. Beyond this, the output decelerates smoothly (with $d^2 f_{\lambda,x_{max}}(x(t))/dt^2$ constant) until it saturates at $x_{max}$. We fed the platform’s acceleration, velocity, and position commands through this function, as follows:

$a_{t+1}^{moog,x} = f_{\lambda,a_{max}}\left(a_{t+1}^{MC,x} + h\,\ddot{\theta}_{t+1}^{MC,x}\,\pi/180\right)$
$v_{t+1}^{moog,x} = f_{\lambda,v_{max}}\left(v_t^{moog,x} + dt \cdot a_{t+1}^{moog,x}\right)$
$x_{t+1}^{moog,x} = f_{\lambda,x_{max}}\left(x_t^{moog,x} + dt \cdot v_{t+1}^{moog,x}\right)$

The same operation takes place for the y component of the acceleration, as well as for the platform velocity and position. The process is repeated for the tilt command itself.

We set λ = 0.75 and $a_{max}$ = 4 m/s², $v_{max}$ = 0.4 m/s, $x_{max}$ = 0.23 m, $\ddot{\theta}_{max}$ = 300 deg/s², $\dot{\theta}_{max}$ = 30 deg/s, and $\theta_{max}$ = 10 deg, slightly below the platform’s and actuators’ physical limits. This ensured that the platform’s motion matched exactly the MC algorithm’s output as long as it stayed within 75% of the platform’s range. Otherwise, the function f ensured a smooth trajectory and, as detailed in Steps 8–10, a feedback mechanism was used to update the participant’s position in the VR environment, so as to guarantee that visual motion always matched inertial motion.
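The limiting function $f_{\lambda,x_{max}}$ can be sketched and sanity-checked as follows (illustrative Python):

```python
def soft_limit(x, lam=0.75, x_max=1.0):
    """Sigmoid limiter f_{lam,x_max}: identity below lam*x_max, smooth quadratic
    roll-off in between, hard saturation at x_max beyond (2 - lam)*x_max."""
    u = abs(x) / x_max
    sign = 1.0 if x >= 0 else -1.0
    if u <= lam:
        return x
    if u <= 2 - lam:
        return sign * x_max * (u - (u - lam) ** 2 / (4 * (1 - lam)))
    return sign * x_max
```

Both the function and its first derivative are continuous at the two breakpoints, which is what keeps the limited platform trajectory smooth.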

Step 7

The motor commands for tilt and translation are sent to the platform:

$x_{t+1}^{moog},\; y_{t+1}^{moog},\; \theta_{t+1}^{moog,x},\; \theta_{t+1}^{moog,y}$

Step 8

Because of Step 6, the total GIA of the platform may differ from what is commanded by the MC algorithm. To detect any discrepancy, we computed the GIA provided by the platform:

$v_{t+1}^{actual,x} = \dfrac{x_{t+1}^{moog} - x_t^{moog}}{dt}\,, \quad v_{t+1}^{actual,y} = \dfrac{y_{t+1}^{moog} - y_t^{moog}}{dt}$
$a_{t+1}^{actual,x} = \dfrac{v_{t+1}^{actual,x} - v_t^{actual,x}}{dt}\,, \quad a_{t+1}^{actual,y} = \dfrac{v_{t+1}^{actual,y} - v_t^{actual,y}}{dt}$
$GIA_{t+1}^{actual,x} = a_{t+1}^{actual,x} + g\sin\theta_{t+1}^{moog,x} - h\,\ddot{\theta}_{t+1}^{moog,x}\,\pi/180$
$GIA_{t+1}^{actual,y} = a_{t+1}^{actual,y} + g\sin\theta_{t+1}^{moog,y} - h\,\ddot{\theta}_{t+1}^{moog,y}\,\pi/180$

Step 9

We transform the platform’s GIA into the participant’s reference frame:

$GIA_{t+1}^{sub,x} = GIA_{t+1}^{actual,x} \cos\varphi_t^{moog} + GIA_{t+1}^{actual,y} \sin\varphi_t^{moog}$
$GIA_{t+1}^{sub,y} = -GIA_{t+1}^{actual,x} \sin\varphi_t^{moog} + GIA_{t+1}^{actual,y} \cos\varphi_t^{moog}$

Also, the error $e^{sub}$ between the obtained GIA and the desired GIA (from Step 3) is calculated and fed through the same sigmoid function (λ = 0.75, $GIA_{max}$ = 1 m/s²) discussed previously, to avoid computational instability in case of a large mismatch:

$e_{t+1}^{x} = f_{\lambda,GIA_{max}}\left(GIA_{t+1}^{sub,x} - a_{t+1}^{sub,x}\right)$
$e_{t+1}^{y} = f_{\lambda,GIA_{max}}\left(GIA_{t+1}^{sub,y} - a_{t+1}^{sub,y}\right)$

Step 10

The GIA error is now used to update the system in the case of a mismatch. First, it is transformed into VR coordinates. Then, the velocity and position in VR coordinates are recomputed based on the joystick input and on the error signal:

$e_{t+1}^{VR,x} = e_{t+1}^{sub,x} \cos\varphi_t^{VR} - e_{t+1}^{sub,y} \sin\varphi_t^{VR}$
$e_{t+1}^{VR,y} = e_{t+1}^{sub,x} \sin\varphi_t^{VR} + e_{t+1}^{sub,y} \cos\varphi_t^{VR}$
$\hat{v}_{t+1}^{VR,x} = \hat{v}_t^{VR,x} + \left(a_{t+1}^{VR,x} + e_{t+1}^{VR,x}\right) dt$
$\hat{v}_{t+1}^{VR,y} = \hat{v}_t^{VR,y} + \left(a_{t+1}^{VR,y} + e_{t+1}^{VR,y}\right) dt$
$x_{t+1}^{VR,x} = x_t^{VR,x} + \hat{v}_{t+1}^{VR,x}\, dt$
$x_{t+1}^{VR,y} = x_t^{VR,y} + \hat{v}_{t+1}^{VR,y}\, dt$
$\varphi_{t+1}^{VR} = \varphi_t^{VR} + \omega_t^{VR}\, dt$

Note that the error signal is also fed into the acceleration in VR coordinates (see Step 3). Ideally, linear acceleration should be computed based on the updated velocity value at time t, that is:

$a_{t+1}^{VR,x} = \dfrac{v_{t+1}^{VR,x} - \hat{v}_t^{VR,x}}{dt}$

However, we found that this led to numerical instability; instead, we introduced a time constant $\tau_{MC} = 1$ s in the computation, as shown in Step 3.

Appendix 2

Appendix 2—table 1
Average radial (top) and angular (bottom) behavioral response gains across participants, for groups of time constant τ magnitudes (mean ± SEM).
Radial bias table
τ range (s)       Vestibular       Visual           Combined
τ: [0.34–1.53]    0.649 ± 0.056    0.818 ± 0.057    0.786 ± 0.055
τ: [1.53–2.16]    0.733 ± 0.063    0.871 ± 0.059    0.836 ± 0.056
τ: [2.16–8.89]    0.902 ± 0.077    0.944 ± 0.061    0.917 ± 0.058

Angular bias table

τ range (s)       Vestibular       Visual           Combined
τ: [0.34–1.53]    0.731 ± 0.053    0.919 ± 0.036    0.902 ± 0.032
τ: [1.53–2.16]    0.770 ± 0.060    0.984 ± 0.038    0.944 ± 0.029
τ: [2.16–8.89]    0.878 ± 0.061    1.024 ± 0.040    1.012 ± 0.033
Appendix 2—table 2
Pearson’s correlation coefficient (r) and corresponding p-value (p) for radial (top) and angular (bottom) correlation between residual error and the time constant τ across participants.

Mean Pearson’s r ± SEM: radial component – vestibular: 0.52±0.02, visual: 0.36±0.03, combined: 0.37±0.03; angular component – vestibular: 0.23±0.02, visual: 0.23±0.03, combined: 0.26±0.03.

Radial correlations
Subject      Vestibular                    Visual                        Combined
Subject 1    r = 0.585, p = 4.2·10^-45     r = 0.502, p = 1.2·10^-37     r = 0.617, p = 1.0·10^-59
Subject 2    r = 0.622, p = 5.5·10^-43     r = 0.338, p = 7.4·10^-12     r = 0.377, p = 2.9·10^-14
Subject 3    r = 0.433, p = 3.5·10^-25     r = 0.280, p = 2.7·10^-11     r = 0.374, p = 8.7·10^-20
Subject 4    r = 0.492, p = 9.1·10^-31     r = 0.494, p = 3.1·10^-31     r = 0.350, p = 2.1·10^-15
Subject 5    r = 0.411, p = 4.4·10^-17     r = 0.314, p = 3.4·10^-10     r = 0.360, p = 3.7·10^-13
Subject 6    r = 0.601, p = 2.0·10^-58     r = 0.233, p = 1.2·10^-08     r = 0.233, p = 1.2·10^-08
Subject 7    r = 0.606, p = 1.6·10^-44     r = 0.522, p = 1.5·10^-31     r = 0.474, p = 1.1·10^-25
Subject 8    r = 0.477, p = 9.6·10^-34     r = 0.255, p = 5.7·10^-10     r = 0.294, p = 4.6·10^-13
Subject 9    r = 0.478, p = 1.0·10^-22     r = 0.517, p = 7.9·10^-27     r = 0.523, p = 6.3·10^-28
Subject 10   r = 0.573, p = 7.2·10^-39     r = 0.497, p = 4.7·10^-28     r = 0.576, p = 3.5·10^-39
Subject 11   r = 0.375, p = 5.9·10^-16     r = 0.224, p = 2.1·10^-06     r = 0.144, p = 0.002
Subject 12   r = 0.522, p = 2.1·10^-39     r = 0.341, p = 1.3·10^-16     r = 0.319, p = 1.1·10^-14
Subject 13   r = 0.512, p = 1.1·10^-38     r = 0.385, p = 1.4·10^-21     r = 0.401, p = 4.7·10^-23
Subject 14   r = 0.461, p = 8.3·10^-30     r = 0.241, p = 1.3·10^-08     r = 0.276, p = 7.0·10^-11
Subject 15   r = 0.703, p = 1.7·10^-61     r = 0.214, p = 1.1·10^-05     r = 0.213, p = 1.3·10^-05
Angular correlations
Subject | Vestibular | Visual | Combined
Subject 1 | r = 0.254, p = 1.9·10–08 | r = 0.302, p = 1.8·10–13 | r = 0.437, p = 2.3·10–27
Subject 2 | r = 0.156, p = 0.002 | r = 0.287, p = 8.6·10–09 | r = 0.270, p = 9.2·10–08
Subject 3 | r = 0.301, p = 2.2·10–12 | r = 0.274, p = 7.3·10–11 | r = 0.351, p = 1.7·10–17
Subject 4 | r = 0.315, p = 1.3·10–12 | r = 0.299, p = 1.7·10–11 | r = 0.343, p = 8.9·10–15
Subject 5 | r = 0.153, p = 0.003 | r = 0.291, p = 6.7·10–09 | r = 0.387, p = 3.8·10–15
Subject 6 | r = 0.292, p = 5.9·10–13 | r = 0.121, p = 0.003 | r = 0.224, p = 4.8·10–08
Subject 7 | r = 0.098, p = 0.042 | r = 0.356, p = 2.4·10–14 | r = 0.275, p = 6.0·10–09
Subject 8 | r = 0.346, p = 2.0·10–17 | r = –0.004, p = 0.920 | r = 0.005, p = 0.902
Subject 9 | r = 0.093, p = 0.071 | r = 0.349, p = 4.1·10–12 | r = 0.348, p = 3.1·10–12
Subject 10 | r = 0.294, p = 4.7·10–10 | r = 0.336, p = 9.6·10–13 | r = 0.235, p = 9.0·10–07
Subject 11 | r = 0.064, p = 0.183 | r = –0.032, p = 0.507 | r = 0.027, p = 0.575
Subject 12 | r = 0.271, p = 1.2·10–10 | r = 0.278, p = 2.7·10–11 | r = 0.333, p = 5.6·10–16
Subject 13 | r = 0.238, p = 1.2·10–08 | r = 0.312, p = 2.5·10–14 | r = 0.255, p = 1.0·10–09
Subject 14 | r = 0.215, p = 4.3·10–07 | r = 0.138, p = 0.001 | r = 0.217, p = 3.7·10–07
Subject 15 | r = 0.328, p = 1.2·10–11 | r = 0.134, p = 0.006 | r = 0.137, p = 0.005
Appendix 2—table 3
Linear regression slope coefficients for radial (α, top) and angular (β, bottom) components of residual error against the time constant τ across participants.

Mean regression slope ± SEM: radial (m/s) – vestibular: 0.62 ± 0.06, visual: 0.28 ± 0.03, combined: 0.29 ± 0.03; angular (deg/s) – vestibular: 2.05 ± 0.20, visual: 1.04 ± 0.23, combined: 1.09 ± 0.19.
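The slope coefficients tabulated below correspond to an ordinary least-squares fit of residual error against the time constant; a minimal sketch (hypothetical variable names, not the authors' MATLAB pipeline):

```python
import numpy as np

def regression_slope(tau, residual_error):
    """OLS slope of residual error regressed on the time constant tau.
    In the paper's units this yields the alpha (m/s) or beta (deg/s)
    coefficients, depending on which error component is passed in."""
    slope, _intercept = np.polyfit(np.asarray(tau, dtype=float),
                                   np.asarray(residual_error, dtype=float), 1)
    return float(slope)
```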

Radial regression coefficients (m/s)
Subject | Vestibular | Visual | Combined
Subject 1 | α = 0.775 | α = 0.247 | α = 0.337
Subject 2 | α = 0.776 | α = 0.464 | α = 0.470
Subject 3 | α = 0.255 | α = 0.138 | α = 0.157
Subject 4 | α = 0.406 | α = 0.138 | α = 0.269
Subject 5 | α = 1.009 | α = 0.559 | α = 0.487
Subject 6 | α = 0.829 | α = 0.151 | α = 0.149
Subject 7 | α = 0.512 | α = 0.351 | α = 0.330
Subject 8 | α = 0.582 | α = 0.245 | α = 0.222
Subject 9 | α = 0.321 | α = 0.330 | α = 0.311
Subject 10 | α = 0.943 | α = 0.365 | α = 0.445
Subject 11 | α = 0.522 | α = 0.322 | α = 0.177
Subject 12 | α = 0.484 | α = 0.166 | α = 0.210
Subject 13 | α = 0.570 | α = 0.324 | α = 0.327
Subject 14 | α = 0.507 | α = 0.253 | α = 0.321
Subject 15 | α = 0.799 | α = 0.091 | α = 0.102
Angular regression coefficients (deg/s)
Subject | Vestibular | Visual | Combined
Subject 1 | β = 1.664 | β = 1.045 | β = 1.553
Subject 2 | β = 1.645 | β = 2.022 | β = 1.632
Subject 3 | β = 1.317 | β = 0.552 | β = 1.232
Subject 4 | β = 2.165 | β = 0.919 | β = 1.155
Subject 5 | β = 2.349 | β = 3.201 | β = 3.045
Subject 6 | β = 2.620 | β = 0.563 | β = 0.870
Subject 7 | β = 1.434 | β = 1.101 | β = 0.843
Subject 8 | β = 4.185 | β = –0.039 | β = 0.040
Subject 9 | β = 1.254 | β = 1.562 | β = 1.394
Subject 10 | β = 2.937 | β = 1.971 | β = 1.152
Subject 11 | β = 1.849 | β = –0.193 | β = 0.194
Subject 12 | β = 1.382 | β = 0.836 | β = 0.954
Subject 13 | β = 1.619 | β = 1.233 | β = 1.165
Subject 14 | β = 2.141 | β = 0.585 | β = 0.790
Subject 15 | β = 2.214 | β = 0.256 | β = 0.264
Appendix 2—table 4
Partial correlation coefficients (mean ± standard deviation) for prediction of the radial (r̃, top) and angular (θ̃, bottom) components of the final stopping location (relative to the starting position) from the initial target distance (r) and angle (θ), the time constant τ, and the corresponding interaction term (r·τ or θ·τ).
Radial partial correlation coefficients ± standard deviation

Predictor | Vestibular | Visual | Combined
Radial distance (r) | 0.20 ± 0.05 | 0.48 ± 0.13 | 0.45 ± 0.10
Time constant (τ) | –0.06 ± 0.07 | 0.01 ± 0.06 | –0.03 ± 0.06
Interaction term (r·τ) | 0.20 ± 0.09 | 0.07 ± 0.06 | 0.12 ± 0.09

Angular partial correlation coefficients ± standard deviation

Predictor | Vestibular | Visual | Combined
Angular distance (θ) | 0.57 ± 0.13 | 0.90 ± 0.08 | 0.90 ± 0.06
Time constant (τ) | –0.06 ± 0.08 | –0.01 ± 0.06 | –0.07 ± 0.06
Interaction term (θ·τ) | 0.27 ± 0.11 | 0.28 ± 0.15 | 0.33 ± 0.14
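One standard way to obtain such partial correlations is to correlate the residuals of the two variables of interest after linearly regressing the remaining predictors out of both; a sketch under that assumption (not necessarily the authors' exact pipeline):

```python
import numpy as np

def partial_corr(x, y, controls):
    """Partial correlation between x and y, controlling for the predictors
    in `controls` (a 1-D array or an (n, k) matrix of control variables)."""
    def residualize(a, Z):
        # regress a on [1, Z] and return the residuals
        Z1 = np.column_stack([np.ones(len(a)), Z])
        beta, *_ = np.linalg.lstsq(Z1, a, rcond=None)
        return a - Z1 @ beta
    rx = residualize(np.asarray(x, dtype=float), controls)
    ry = residualize(np.asarray(y, dtype=float), controls)
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))
```

For the table above, x would be a predictor such as τ or the interaction term, y the stopping-location component (r̃ or θ̃), and `controls` the remaining predictors.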

Data availability

MATLAB code implementing all quantitative analyses in this study is available online (https://github.com/AkisStavropoulos/matlab_code, copy archived at swh:1:rev:03fcaf8a8170d99f80e2bd9b40c888df34513f89). The dataset was made available online at the following address: https://gin.g-node.org/akis_stavropoulos/humans_control_dynamics_sensory_modality_steering.

The following data sets were generated
    1. Stavropoulos A
    2. Laurens J
    3. Pitkow X
    (2021) G-Node
    ID humans_control_dynamics_sensory_modality_steering. Steering data from a navigation study in virtual reality under different sensory modalities and control dynamics, named "Influence of sensory modality and control dynamics on human path integration".


Decision letter

  1. Adrien Peyrache
    Reviewing Editor; McGill University, Canada
  2. Richard B Ivry
    Senior Editor; University of California, Berkeley, United States
  3. Benjamin J Clark
    Reviewer; University of New Mexico, United States
  4. Gunnar Blohm
    Reviewer; Queen's University, Canada
  5. Stefan Glasauer
    Reviewer; Ludwig-Maximilians-Universität München, Germany

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "Influence of sensory modality and control dynamics on human path integration" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Richard Ivry as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Benjamin Clark (Reviewer #1); Gunnar Blohm (Reviewer #2); Stefan Glasauer (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

In this manuscript, the authors investigated the importance of visual and vestibular sensory cues and the underlying motion dynamics to the accuracy of spatial navigation by human subjects. A virtual environment coupled with a 6-degrees-of-freedom motion platform, as described in prior studies, allowed precise control over sensory cues and motion dynamics. To investigate whether control dynamics influence performance, the transfer function between joystick deflection and self-motion velocity was modified on each trial, requiring subjects to rely more on either velocity or acceleration to find their way. To explain the main result that navigation error depends on control dynamics, the authors propose a probabilistic model in which an internal estimate of dynamics is biased by a strong prior. Overall, the three reviewers agree this manuscript might be suitable for publication in eLife and that additional data are not necessary. However, the analyses need to be clarified and the conclusion better justified. You will find below a summary of the main concerns. Please refer to the reviewers' comments appended at the end for more details.

Essential revisions:

1. Concerns were raised regarding motion cueing that was used to approximate the vestibular cues that would be present during real motion. The reviewers think that it should be better to refrain from generalizing and to restrict the conclusions to this specific artificial type of vestibular input. It could even be interesting, since motion cueing is used in driving simulators. See reviewer #2, point #3 and reviewer #3, point #3.

2. One possible interpretation of the data is that the subjects rely almost exclusively on sensory feedback, and that no estimate of control dynamics is necessary. One caveat of the current design is that the different trial types were interleaved, possibly resulting in unreliable efferent copies (leading subjects to estimate velocity from sensory inputs only) and a history effect in the estimation of tau (biasing vestibular trials). The authors should provide more evidence that their effect is not the result of feedback control only and that there is no history effect. See reviewer #2 point #2 and reviewer #3, point #1-2.

3. The relationship between tau and performance is unclear and should be clarified. Figure 3A seems to contradict Figure 5A. See reviewer #2, point #1.

4. It is unclear why the authors did not propose a more normative framework, e.g. using a hierarchical Bayesian model, as suggested in the discussion. This would be a very interesting addition to the manuscript. See reviewer #2 point #4.

5. The manuscript lacks important information and details: the number of trials, the maximal velocity, differences between males and females, and the slope of the dependence between time constant and error. The actual control signal, the joystick command, should be shown and analyzed. See reviewer #1, point #1-2; reviewer #3, point #4-5.

6. It seems that tau was correlated with trial duration and velocity (Supp Figure 4), unlike what is stated in the manuscript (the effects of both factors are said to be "unlikely", p 161-167). The authors should clarify this point. See reviewer #3, point #5.

7. Data presentation can be improved. See reviewer #1 point #3-5.

Reviewer #1:

1) The study tested performance by both male and female subjects. Could the authors comment as to whether sex differences were observed across performance measures? Perhaps sex can be indicated in some of the scatter plots.

2) Figure 2A. It would be helpful if the authors identified the start-point of the trajectory and also provided more explanation of the schematic in the caption.

3) Figure 2B-C. It would be helpful if the authors could expand this section to show some example trajectories and the relationship between examples and plotted data points. This could be done by presenting measures (radial distance, angular eccentricity, grain) for each example trajectory.

4) Because the range of sampled time-constants can vary across subjects, it would nice to show plots as in Figure 3B for each subject (i.e., in supplementary material).

5) Discussion. The broader implications of the findings from the models are not sufficiently discussed. In addition, some comparison could also be made to other recent efforts to model path integration error (e.g., PMC7250899).

Reviewer #2:

The authors asked how the brain uses different sensory signals to estimate self-motion for path integration in the presence of different movement dynamics. They used a new paradigm to show that path integration based on vision was mostly accurate, but vestibular signals alone led to systematic errors particularly for velocity-based control.

While I really like the general idea and approach, the conclusions of this study hinge on a number of assumptions for which it would be helpful if the authors could provide better justifications. I also have some clarification questions for certain parts of the manuscript.

1) lines 26-7: "performance in all conditions was highly sensitive to the underlying control dynamics". This is hard to really appreciate from the residual error regressions in Figure 3 and seems to be contradicting Figure 5A (for vestibular condition). A more explicit demonstration of how tau affects performance would be helpful.

2) One of the main potential caveats I see in the study design is the fact that trial types (vest, visual, combined) were randomly interleaved. In the combined condition, this could potentially result in a form of calibration of the vestibular signal and/or a better estimate of tau that is then used for a subsequent vestibular-only trial. As such, you'd expect a history effect based on trial type more so (or in addition to) simple sequence effects. This is particularly true since you have a random walk design for across-trial changes of tau. In other words, my question is whether in the vestibular condition participants simply use their previous estimate of tau, since that would be on average close enough to the real tau?

3) I thought the experimental design was very clever, but I was missing some crucial information regarding the design choices and their consequences. First, has there been a psychophysical validation of GIA vs pure inertial acceleration? Second, were GIAs always well above the vestibular motion detection threshold? In other words could the worse performance in the vestibular condition be simply related to signal detection limitations? Third, how often did the motion platform enter the platform motion range limit regime (non-linear portion of sigmoid)?

4) lines 331-345: it's unclear to me why you did not propose a more normative framework as outlined here. Especially, a model that would "constrain the hypothesized brain computations and their neurophysiological correlates" would be highly desirable and really strengthen the future impact of this study.

5) I would highly recommend all data to be made available online in the same way as the analysis code has been made available.

Reviewer #3:

The manuscript describes interesting experimental and modelling results of a novel study of human navigation in virtual space, where participants had to move towards a briefly flashed target using optic flow and/or vestibular cues to infer their trajectory via path integration. To investigate whether control dynamics influence performance, the transfer function between joystick deflection and self-motion velocity was modified trial-by-trial in a clever way. To explain the main result that navigation error depends on control dynamics, the authors propose a probabilistic model in which an internal estimate of dynamics is biased by a strong prior. Even though the paper is clearly written and contains most of the necessary information, the study has several shortcomings, as outlined below, and an important alternative hypothesis has not been considered, so that some of the conclusions are not fully supported by results and modelling.

Substantive concerns

1) The main idea of the paper for explaining the influence of control dynamics is that for accurate path integration performance participants have to estimate dynamics. This idea is apparently inspired by studies on limb motor control. However, tasks in these studies are often ballistic, because durations are short compared to feedback delays. In navigation, this is not the case and participants can therefore rely on feedback control (for another reason, why reliance on sensory feedback in the present study is a good idea, see point 2 below). This means that the task can be solved, even though not perfectly, without actually knowing the control dynamics. Thus, an alternative hypothesis for explaining the results that has not been considered is that the error dependence of control dynamics is a direct consequence of feedback control. Feedback control models have previously been suggested for goal-directed path integration (e.g., Grasso et al., 1999; Glasauer et al., 2007).

To test this assumption, I modelled the experiment assuming a simple bang-bang feedback control that switches at a predefined and constant perceived distance from the target from +1 to -1 and stops when perceived velocity is smaller than an epsilon. Sensory feedback is perceived position, which is assumed to be computed via integration of optic flow. This model predicts a response gain of unity, a strong dependence of error on time constant (slope similar to Figure 3) or of response gain on time constant (Equation 4.1) with regression coefficients of 0.8 and 0.05 (cf. Figure 3D), and a modest correlation between movement duration and time constant (r approximately 0.2, similar to Figure 3A). Thus, a feedback model uninformed about actual motion dynamics and without any attempt to estimate them can explain most features of the data. Modifications (velocity uncertainty, delayed perception, noise on the stopping criterion, etc.) do not change the main features of the simulation results.
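The reviewer's bang-bang feedback controller can be sketched roughly as follows (a toy Python illustration; the dynamics and all parameter values are simplifying assumptions, not the reviewer's actual simulation):

```python
def simulate_bang_bang(target=4.0, tau=2.0, d_switch=1.0,
                       v_max=0.5, v_eps=0.01, dt=0.01, t_max=60.0):
    """Toy goal-directed controller: full forward joystick (+1) until the
    perceived distance to target drops below d_switch, then full reverse (-1);
    stop once speed falls below v_eps. Velocity follows first-order dynamics
    with time constant tau, so the stopping position varies with tau even
    though the policy itself knows nothing about the dynamics."""
    x = v = t = 0.0
    u = 1.0
    while t < t_max:
        if target - x < d_switch:
            u = -1.0                        # switch to braking
        v += (u * v_max - v) * dt / tau     # leaky integration of the command
        x += v * dt
        t += dt
        if u < 0.0 and abs(v) < v_eps:
            break                           # stopping criterion reached
    return x, t
```

With a fixed switching distance, the final position depends systematically on tau, reproducing the qualitative error dependence the reviewer describes without any estimate of the control dynamics.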

Accordingly, since simple feedback control seems to be an alternative to estimating control dynamics in this experiment, the authors’ conclusion in the abstract “that people need an accurate internal model of control dynamics when navigating in volatile environments” is not supported by the current results.

2) Modelling: the main rationale of the model (line 173 ff: “From a normative standpoint, …”) is correct, but an accurate estimate of the dynamics is only required if the uncertainty of the velocity estimate based on the efference copy is not too large. Otherwise, velocity estimation should rely predominantly on sensory input. In my opinion that’s what happens here: due to the trial-by-trial variation in dynamics, estimates based on efference copy are very unreliable (the same command generates a different sensory feedback in each trial), and participants resort to sensory input for velocity estimation. This results in feedback control, which, as mentioned above, seems to be compatible with the results.

3) Motion cueing: Motion cueing can, in the best case, approximate the vestibular cues that would be present during real motion. Furthermore, it is not clear whether the applied tilt is really perceived as linear acceleration, or whether the induced semi-circular canal stimulus is too strong so that subjects experience tilt. Participants might have used the tilt as an indicator of onset or offset of translational motion, specifically because it is self-generated, but the contribution of the vestibular cues found in the present experiment might be completely different from what would happen during real movement. Therefore, conclusions about vestibular contributions are not warranted here and cannot solve the questions around "conflicting findings" mentioned in the introduction.

4) Methods: I was not able to find an important piece of information: how many trials were performed in each condition? Without this information, the statistical results are incomplete. It was also not possible to compute the maximal velocity allowed by joystick control, since for Equation 1.9 not just the displacement x and the time constant are required, but also the trial duration T, which is not reported. One can only guess from Figure 1D that vmax is about 50 cm/s for tau=0.6 s and therefore the average T is assumed to be around 8.5 s.

5) Results: information that would be useful is not reported. On page 6 it is mentioned that the "effect of control dynamics must be due to either differences in travel duration or velocity profiles"; it is then stated that both are "unlikely", but no results are given. It turns out that in the supplementary Figure 4A the correlation between time constant and duration/velocity is shown, and apparently the correlation with duration is significant (but small) in the majority of cases. Why is that not discussed in the Results section? Other results are also not reported; for example, what was the slope of the dependence between time constant and error? Why is the actual control signal, the joystick command, not shown and analyzed?

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled “Influence of sensory modality and control dynamics on human path integration” for further consideration by eLife. Your revised article has been evaluated by Richard Ivry (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but reviewer #3 has raised several issues that need to be addressed, as outlined below:

Reviewer #3:

The present version of the manuscript has clearly improved, and the authors responded adequately to the comments, and a link to the data was also provided. Some very helpful additional analysis was added, such as shown in Figure 3E. There are, however, some critical points left which are outlined below.

Introduction, line 83f: “These findings suggest that inertial cues alone lack the reliability to support accurate path integration …” Even though in general I’d agree with this statement, the findings in the current paper do not support this claim. Since the inertial cues were generated by motion cueing rather than being natural, it could be that natural inertial cues would yield much better path integration performance. Please change accordingly. See also next comments.

Figure 1 suppl. 2: I agree that the initial tilt cannot contribute to linear path integration, but if it is processed by the central estimator (see, for example, your co-author Jean Laurens' models), it would change the perceived orientation of the participant to a tilted position. Consequently, the GIA after the tilt would be correctly perceived as being due to tilt; this means it would not be interpreted as resulting from linear displacement, and vestibular input would not be used at all, or only to a very small part, as input to the path integration system. This could be an explanation for the findings of inferior performance in the vestibular condition (see comment above). It would mean that motion cueing as applied here is not appropriate for simulating linear travel, which would be an important finding for designing driving simulators. Please discuss …

Results, page 4: it seems that the fit for the combined condition, specifically for distance (both in terms of R2 and of response gain), was worse than for the visual condition. This would be surprising, since adding a second sensory input should not have that effect. However, if the vestibular stimulus, specifically for distance, is not appropriate, then this is exactly what should happen. A conflicting vestibular stimulus could decrease response gain (and the fit).

Results, page 6, line 164ff: “A partial correlation analyses revealed..” A summary statistical result should be shown here as well to support the result of time constant dependence.

Line 165: “…albeit only by modulating the distance dependence” I first misunderstood this and thought it would only modulate radial distance dependence. After looking at Figure 3 suppl 2: maybe better write “…albeit only by modulating both angular and radial distance dependence.”

Figure 5: text in figure caption is missing (probably due to clipping of the text box).

Results page 12-13, Bayesian model: I’m surprised that both SD of likelihood and prior were free parameters. For a Bayes model with Gaussian distributions and fixed prior, only the quotient of both standard deviations is a free parameter (the model is basically equivalent to a weighted sum of the mean of prior and the measurement, with the weight being determined by the quotient of the variances). So, either I misunderstand your model, or there’s a mistake. If the latter is the case, then Figure 6 and the corresponding results are also partly wrong, since likelihood σ and prior σ cannot be determined on their own, but only their quotient. See next comment, I suppose there is really a mistake.

Results page 14, dynamic prior model: here you can easily see from equation 7 (page 25) that there are in fact only 2 free parameters, not three (as you state), if you re-express the weight k: the weight k is given as k=varp/(varp+varm)=1/(1+varm/varp). So only varm/varp is free, not both, you cannot determine both from the fit. Note: in this model, it is usually sufficient to take the first measurement as mean of the first prior (corresponding to a maximum likelihood estimate on the first trial, or uninformative prior). This reduces the model to one free parameter.
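The reviewer's identifiability point — that only the variance ratio matters — can be seen directly in the weighted-sum form of the Gaussian posterior mean (a generic sketch of the standard result, not the paper's fitted model):

```python
def posterior_mean(prior_mean, measurement, var_ratio):
    """Posterior mean when a Gaussian prior is combined with a Gaussian
    likelihood. Only var_ratio = var_measurement / var_prior enters:
    k = var_prior / (var_prior + var_measurement) = 1 / (1 + var_ratio),
    so the two variances cannot be determined separately from the fit."""
    k = 1.0 / (1.0 + var_ratio)   # weight on the measurement
    return k * measurement + (1.0 - k) * prior_mean
```

Any pair of variances with the same ratio yields the identical posterior mean, which is why fitting both standard deviations as free parameters over-parameterizes the model.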

Discussion, line 343-344: “In contrast, inertial (vestibular/somatosensory cues) alone lacked the reliability to support accurate path integration …” this is the case for the motion cueing inertial cues, so please make clear here and at other points that your data only refer to this type of inertial cues.

Discussion: I miss a general discussion of the limits of the study due to using motion cueing. As mentioned several times, the results concerning the vestibular and combined conditions of this study cannot be generalized to vestibular stimuli under natural conditions.

Along these lines I’m also very puzzled to read in the authors’ responses the following statement: “Therefore, there is no need to ensure that these accelerations are perceived identically: they are identical.”

(This reminds me of an astronaut who once stated that there is no need to study perception of up and down in space, because in weightlessness there is no up and down.)

Two identical linear accelerations can very well be perceived completely differently depending on the rotational history and context. That’s the reason why we perceive a tilt of the head as what it is, and not as rapid linear displacement. Please ask your coauthors Dora Angelaki and Jean Laurens, who are long enough in the field to know this. And this is extremely relevant in the present context.

https://doi.org/10.7554/eLife.63405.sa1

Author response

Essential revisions:

1. Concerns were raised regarding motion cueing that was used to approximate the vestibular cues that would be present during real motion. The reviewers think that it should be better to refrain from generalizing and to restrict the conclusions to this specific artificial type of vestibular input. It could even be interesting, since motion cueing is used in driving simulators. See reviewer #2, point #3 and reviewer #3, point #3.

We agree with the remark and we apologize for our overgeneralization. We have responded to the reviewers’ specific questions about the motion cueing algorithm below. Although we have not claimed to have solved any questions regarding conflicting findings in the literature, we rephrased lines 47-50 such that we do not create the impression that we make such a claim. However, we are confident that our less restricted experimental design generalizes better than the strong vestibular-visual cue conflict that is present when simulating real-world navigation without motion cueing (lines 75-78 and 352-355).

2. One possible interpretation of the data is that the subjects rely almost exclusively on sensory feedback, and that no estimate of control dynamics is necessary. One caveat of the current design is that the different trial types were interleaved, possibly resulting in unreliable efferent copies (leading subjects to estimate velocity from sensory inputs only) and a history effect in the estimation of tau (biasing vestibular trials). The authors should provide more evidence that their effect is not the result of feedback control only and that there is no history effect. See reviewer #2 point #2 and reviewer #3, point #1-2.

History effect: we now test the history effect explicitly. We hypothesize that subjects try to compensate for whatever time constant (tau) they currently believe in, in order to stop at their believed target location. If so, their stopping location should depend only on the target location and not on tau; otherwise they knowingly failed to compensate for tau. In other words, their subjective residual errors (from their believed trajectories) should not depend on tau (correlation equal to zero). Accordingly, if participants used the tau from the previous visual/combined trial as their estimate of tau in the current vestibular trial, there should be zero correlation between that tau and the (model-based) subjective residual errors. The data show that for this model the correlation does not drop to zero when the estimate comes from a visual or combined trial (Figure 7 suppl. 1). In other words, the lack of adaptation cannot be fully explained away by subjects carrying over the tau estimates from the previous combined/visual trials. Intuitively, although the previous tau is, on average, in the direction of the mean of the sampling distribution, it is still far away from the overall mean due to the correlated nature of the random walk. Therefore, carrying over the previous estimate cannot fully explain how oblivious the participants were to dynamics changes in the vestibular condition – something that the Bayesian model readily does (Figure 6A). We reference the corresponding figure and analysis in lines 315-323.

Sensory feedback control: With respect to feedback control, we agree that, in principle, it is possible to perform this task suboptimally by directly estimating velocity using sensory feedback in the form of optic flow without estimating the control dynamics. To minimize travel time, bang-bang control is the optimal policy. But to implement it correctly, one must predict accurately when to start braking (switch from +1 to -1) based on the current dynamics. For this reason, we argue that an accurate internal model of the dynamics is required. On the other hand, a bang-bang control policy that is oblivious to the dynamics, such as a purely sensory feedback control model, is guaranteed to result in errors unless the participant performs further corrections. In other words, a bang-bang control policy that is based purely on momentary sensory feedback only allows for inaccurate but never perfect performance.

We appreciate the reviewer’s care in substantiating their idea about feedback control in a simulation of their own! To test whether the reviewer’s model explains our observations, we fit a sensory feedback control model with fixed-distance bang-bang policy to our data with two parameters: mean and standard deviation of switching distance from target (Methods – lines 707-715). The problem with the fixed policy proposed by the reviewer is that, without adaptation to the control dynamics, the errors have a stronger dependence on those dynamics than we measure experimentally. This fixed model predicts much higher correlations and regression slopes between the time constant and the residual error than the ones found in the actual data (lines 324-331, Figure 7 suppl. 2A). In fact, the empirical distributions of the switching distances were broad and much closer to that predicted by an ideal bang-bang control policy (that anticipates when to brake using knowledge of the dynamics) than the best-fit fixed-distance bang-bang control policy (Figure 7 suppl. 2C).

We conclude that participants do not fully adapt to the control dynamics, but neither do they ignore them. The previous version of this paper emphasized the imperfection of adaptation, while underemphasizing that there was indeed some significant adaptation. Now we compute the responses expected from both a perfectly adapting policy and a fixed, unadapting one, and show (Figure 3E) that the data lie between these extremes, revealing partial adaptation in all conditions. Visual information allows more adaptation, but even in the vestibular condition we still see more adaptation than a fixed controller that does not adapt its policy based on the current dynamics.

3. The relationship between tau and performance is unclear and should be clarified. Figure 3A seems to contradict Figure 5A. See reviewer #2, point #1.

(Please note that Figure 5 has changed to Figure 6 due to our edits). We apologize for the lack of clarity in our description. Under ideal steering control adaptation, stopping positions should depend only on target location, and nothing else (see lines 145-149, 165-168). However, we observed a relationship between the responses and the time constant (Figure 3A), indicating a deviation from ideal adaptation. Our model attributes this deviation to misestimated dynamics (lines 226-228): better/worse control adaptation corresponds to better/worse estimation of the time constant. As pointed out in lines 278-280, better dynamics estimation (Figure 6A; visual/combined) results in smaller modulation of the actual responses by the dynamics (Figure 3A; visual/combined). Conversely, greater misestimation (i.e. vestibular in Figure 6A) leads to stronger modulation (i.e. vestibular in Figure 3A). We rephrased lines 25-27 to better convey the effect of the dynamics on performance. For an explicit demonstration of how the participants’ steering was influenced by the changes in the dynamics, we added lines 185-195 and Figure 4.

4. It is unclear why the authors did not propose a more normative framework, e.g. using a hierarchical Bayesian model, as suggested in the discussion. This would be a very interesting addition to the manuscript. See reviewer #2 point #4.

In this study, we explored the contribution of different sensory modalities, control dynamics, and their interaction in human path integration. The extensive analyses needed to describe model performance and beliefs under this novel approach do not leave room for a highly technical, multi-level model of goal-oriented path integration. Such a model should include a detailed description of how participants estimate the dynamics. Our goal was to use this study as a foundation and to develop such a model in follow-up studies.

5. The manuscript lacks important information and details: number of trials, maximal velocity, difference between males and females, slope of the dependence between time constant and error. The actual control signal, the joystick command, should be shown and analyzed. See reviewer #1, points #1-2; reviewer #3, points #4-5.

We apologize for our omissions. We provided all missing information and additional figures as indicated by the reviewers. We would like to point out that an explicit demonstration of how the participants’ control was influenced by the changes in the dynamics was added in lines 130-134, 185-195 and Figures 2E, 4.

6. It seems that tau was correlated with trial duration and velocity (Supp Figure 4), unlike what is stated in the manuscript (the effects of both factors are said to be “unlikely”, p 161-167). The authors should clarify this point. See reviewer #3, point #5.

We thank the reviewers for giving us a chance to explain. Prior to the start of data collection, we adjusted stimulus parameters to ensure that travel time and mean velocity would be similar across different dynamics for a controller with perfect knowledge of the dynamics. However, participants’ knowledge of the dynamics was not perfect, as revealed by the dynamics-dependent responses that we attributed to erroneous adaptation/estimation. Additionally, we found a small dependence of travel duration and average travel velocity on the dynamics for some participants. Importantly, travel duration is a feature of the control policy. Since perfect adaptation would yield identical travel times across dynamics, the observed dependence on the dynamics again shows that participants failed to adapt perfectly. Simulations confirmed this, as maladaptation results in a dependence of travel duration on the dynamics. We edited lines 207-214 to reflect this explanation and updated Figure 5 suppl. 1 to include our simulations.

7. Data presentation can be improved. See reviewer #1 point #3-5.

Reviewer #1’s comments about data presentation were greatly appreciated. All suggestions were accommodated to the fullest extent we could, and we adjusted other figures in the same spirit (e.g. Figure 1D, 4A).

Reviewer #1:

1) The study tested performance by both male and female subjects. Could the authors comment as to whether sex differences were observed across performance measures? Perhaps sex can be indicated in some of the scatter plots.

Since no significant sex differences were observed (see lines mentioned below for statistics), we do not indicate the sex of participants in the main figures (lines 125-129, 149-152); however, we added Figure 2 suppl. 1B, C to illustrate the comparison of performance across sexes.

2) Figure 2A. It would be helpful if the authors identified the start-point of the trajectory and also provided more explanation of the schematic in the caption.

Figure 2A has been updated accordingly and the corresponding caption was expanded.

3) Figure 2B-C. It would be helpful if the authors could expand this section to show some example trajectories and the relationship between examples and plotted data points. This could be done by presenting measures (radial distance, angular eccentricity, gain) for each example trajectory.

We think that the update in Figure 2A addresses the point of the reviewer by identifying the variables in 2B, C as they relate to the trajectory. Because the response gain is not a trial-by-trial measure, it can only be shown as is in Figure 2B, C. Nevertheless, we added Figure 2 suppl. 1A (referenced in line 126) where we display trajectories of an example subject under each sensory condition with the corresponding response gain values displayed on top.

4) Because the range of sampled time-constants can vary across subjects, it would be nice to show plots as in Figure 3B for each subject (i.e., in supplementary material).

The sampling distributions of the time constant across subjects and conditions were added in new Figure 3 suppl. 1, along with plots as in Figure 3B for more subjects.

5) Discussion. The broader implications of the findings from the models are not sufficiently discussed. In addition, some comparison could also be made to other recent efforts to model path integration error (e.g., PMC7250899).

We added a discussion paragraph about the model comparisons and the implications of our findings in lines 378-385, while we discuss the proposed study in lines 200-201.

Reviewer #2:

The authors asked how the brain uses different sensory signals to estimate self-motion for path integration in the presence of different movement dynamics. They used a new paradigm to show that path integration based on vision was mostly accurate, but vestibular signals alone led to systematic errors particularly for velocity-based control.

While I really like the general idea and approach, the conclusions of this study hinge on a number of assumptions for which it would be helpful if the authors could provide better justifications. I also have some clarification questions for certain parts of the manuscript.

1) lines 26-7: “performance in all conditions was highly sensitive to the underlying control dynamics”. This is hard to really appreciate from the residual error regressions in Figure 3 and seems to be contradicting Figure 5A (for vestibular condition). A more explicit demonstration of how tau affects performance would be helpful.

We rephrased lines 25-27 to better convey the effect of the control dynamics on performance (i.e. failure to adapt steering to the underlying dynamics). The observed relationship between the responses and the time constant (Figure 3A) denotes a deviation from ideal steering control adaptation (ideal adaptation would manifest as an absence of modulation because stopping positions should depend only on target location and nothing else, see lines 145-149, 165-168). With our model, we attributed this modulation, and thereby, the corresponding erroneous adaptation, to dynamics misestimation (lines 226-228). Therefore, better/worse control adaptation corresponds to better/worse estimation of the time constant. As pointed out in lines 278-280, better dynamics estimation (Figure 6A; visual/combined) results in smaller modulation of the actual responses by the dynamics (Figure 3A; visual/combined). Conversely, greater misestimation (i.e. vestibular in Figure 6A) leads to stronger response modulation (i.e. vestibular in Figure 3A).

Last but not least, for an explicit demonstration of how the participants’ steering was influenced by the changes in dynamics, we added lines 185-195 and Figure 4 describing how participants’ control adapts to changes of the time constant.

2) One of the main potential caveats I see in the study design is the fact that trial types (vest, visual, combined) were randomly interleaved. In the combined condition, this could potentially result in a form of calibration of the vestibular signal and/or a better estimate of tau that then is used for a subsequent vestibular-only trial. As such, you’d expect a history effect based on trial type more so than (or in addition to) simple sequence effects. This is particularly true since you have a random walk design for across-trial changes of tau. In other words, my question is whether in the vestibular condition participants simply use their previous estimate of tau, since that would be on average close enough to the real tau?

The motivation for our stimulus design was expressly to provide an opportunity for participants to rely on history. Despite our efforts, we did not find a significant history effect.

As clarified in the response to the question above and in the revised text, the model tries to explain why participants failed to adapt to the changes in tau. In the data, this failure manifests itself as a correlation between response errors and tau (lines 149-155; Figure 3A-D). The proposed model successfully attributes this failure to participants misestimating tau, because the response errors obtained by integrating the participants’ control input according to the MAP estimate of tau (instead of the real tau) no longer exhibit such a correlation (Figure 5B; lines 278-280). If, alternatively, participants simply used the real tau from the previous visual/combined trial as their estimate of tau in the current vestibular trial, then integrating the participants’ control input using this new way of estimating tau should also result in a correlation of zero.

We tested this and found that, using the previous estimate, model predicted correlations do not drop to zero when that estimate comes from a visual or combined trial (Figure 7 suppl. 1). In other words, the lack of adaptation cannot be fully explained away by subjects carrying over the tau estimates from the previous combined/visual trials. Intuitively, although the previous tau is, on average, in the direction of the mean of the sampling distribution, it is still far away from the mean due to the correlated nature of the random walk. Therefore, carrying over the previous estimate cannot fully explain away the regression towards the mean (the degree to which the participants were oblivious to dynamics changes) – something that the Bayesian model readily does (Figure 6A). We reference the corresponding figure and analysis in lines 315-323.
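To make the logic of this test concrete, here is a minimal sketch (with our own illustrative names and parameter values, not the actual analysis code) of re-integrating a joystick trace under a believed time constant, assuming first-order control dynamics of the form tau · dv/dt = vmax·u − v:

```python
def integrate_response(u, tau, vmax=0.5, dt=0.01):
    """Integrate a joystick trace u(t) under first-order control
    dynamics with time constant tau; returns travelled distance."""
    v, x = 0.0, 0.0
    for ut in u:
        v += dt * (vmax * ut - v) / tau  # tau * dv/dt = vmax*u - v
        x += v * dt
    return x

# Replaying the same control input under a different (believed) time
# constant yields a different stopping position; this is the sense in
# which believed responses follow from an estimate of tau.
u = [1.0] * 800  # 8 s of full forward input
assert integrate_response(u, tau=0.5) > integrate_response(u, tau=2.0)
```

If the believed tau equals the real tau, the replayed stopping position coincides with the actual one, so any residual correlation between believed errors and tau isolates what the estimate fails to capture.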

3) I thought the experimental design was very clever, but I was missing some crucial information regarding the design choices and their consequences. First, has there been a psychophysical validation of GIA vs pure inertial acceleration? Second, were GIAs always well above the vestibular motion detection threshold? In other words could the worse performance in the vestibular condition be simply related to signal detection limitations? Third, how often did the motion platform enter the platform motion range limit regime (non-linear portion of sigmoid)?

1. To determine the parameters of the Motion Cueing Algorithm, we performed several pilot experiments on ourselves to verify the effectiveness of the resulting motion percepts. More detailed answers to the reviewer’s questions follow.

2. Has there been a psychophysical validation of GIA vs pure inertial acceleration? From the point of view of physics, GIA and pure inertial acceleration are indistinguishable (Einstein, 1907). Therefore, there is no need to ensure that these accelerations are perceived identically: they are identical. However, it is possible to sense the rotational movements generated when the MC algorithm tilts the subjects. We analyzed the rotation velocities (Figure 1 suppl. 2B) and found that they could exceed the rotation velocity thresholds found in the literature (Lim et al., 2017; MacNeilage et al., 2010) for brief periods (~0.6 s), but we argue that these periods are too short to influence our experiment’s outcome (see Figure 1 suppl. 2B).

3. Were GIAs always well above the vestibular motion detection threshold? Yes, we verified (in Figure 1 suppl. 2A) that the GIA profiles were higher than a conservative motion detection threshold (8 cm/s2, based on Kingma 2005, MacNeilage et al., 2010, and Zupan and Merfeld; the thresholds range between 5 and 8.5 cm/s2 in these studies).

4. How often did the motion platform enter the platform motion range limit regime (non-linear portion of sigmoid)? To evaluate this, we show the GIA error in Figure 1 suppl. 2A. Indeed, these GIA errors occur if (and only if) the platform is in this range limit regime. We show in Figure 1 suppl. 2A that these errors remain well below the GIA threshold; therefore, we don’t think that the limitations of the platform could have influenced the subjects’ perception.

4) lines 331-345: it’s unclear to me why you did not propose a more normative framework as outlined here. Especially, a model that would “constrain the hypothesized brain computations and their neurophysiological correlates” would be highly desirable and really strengthen the future impact of this study.

In short, this paper is already too dense. We plan to develop such a model, along with behavior and neural recordings in monkeys, which are currently underway. See also response to Reviewing Editor’s comment #4.

5) I would highly recommend all data to be made available online in the same way as the analysis code has been made available.

The dataset was made available online at the following address, and the link was provided in line 719: https://gin.g-node.org/akis_stavropoulos/humans_control_dynamics_sensory_modality_steering

Reviewer #3:

The manuscript describes interesting experimental and modelling results of a novel study of human navigation in virtual space, where participants had to move towards a briefly flashed target using optic flow and/or vestibular cues to infer their trajectory via path integration. To investigate whether control dynamics influence performance, the transfer function between joystick deflection and self-motion velocity was modified trial-by-trial in a clever way. To explain the main result that navigation error depends on control dynamics, the authors propose a probabilistic model in which an internal estimate of dynamics is biased by a strong prior. Even though the paper is clearly written and contains most of the necessary information, the study has several shortcomings, as outlined below, and an important alternative hypothesis has not been considered, so that some of the conclusions are not fully supported by results and modelling.

Substantive concerns

1) The main idea of the paper for explaining the influence of control dynamics is that for accurate path integration performance participants have to estimate dynamics. This idea is apparently inspired by studies on limb motor control. However, tasks in these studies are often ballistic, because durations are short compared to feedback delays. In navigation, this is not the case and participants can therefore rely on feedback control (for another reason, why reliance on sensory feedback in the present study is a good idea, see point 2 below). This means that the task can be solved, even though not perfectly, without actually knowing the control dynamics. Thus, an alternative hypothesis for explaining the results that has not been considered is that the error dependence of control dynamics is a direct consequence of feedback control. Feedback control models have previously been suggested for goal-directed path integration (e.g., Grasso et al., 1999; Glasauer et al., 2007).

To test this assumption, I modelled the experiment assuming a simple bang-bang feedback control that switches at a predefined and constant perceived distance from the target from +1 to -1 and stops when perceived velocity is smaller than an epsilon. Sensory feedback is perceived position, which is assumed to be computed via integration of optic flow. This model predicts a response gain of unity, a strong dependence of error on time constant (slope similar to Figure 3) or of response gain on time constant (Equation 4.1) with regression coefficients of 0.8 and 0.05 (cf. Figure 3D), and a modest correlation between movement duration and time constant (r approximately 0.2, similar to Figure 3A). Thus, a feedback model uninformed about actual motion dynamics and without any attempt to estimate them can explain most features of the data. Modifications (velocity uncertainty, delayed perception, noise on the stopping criterion, etc.) do not change the main features of the simulation results.

Accordingly, since simple feedback control seems to be an alternative to estimating control dynamics in this experiment, the authors' conclusion in the abstract "that people need an accurate internal model of control dynamics when navigating in volatile environments" is not supported by the current results.

Indeed, in principle, it is possible to perform this task suboptimally by directly estimating velocity using sensory feedback in the form of optic flow, without estimating the control dynamics.

We appreciate the reviewer’s care in substantiating their idea about feedback control with a simulation of their own! To test whether the reviewer’s model explains our observations, we fit a sensory feedback control model with a fixed-distance bang-bang policy to our data, with two parameters: the mean and standard deviation of the switching distance from the target (Methods – lines 707-715). The problem with the fixed policy proposed by the reviewer is that, without adaptation to the control dynamics, the errors depend more strongly on those dynamics than we measure experimentally. This fixed model predicts much higher correlations and regression slopes between the time constant and the residual error than those found in the actual data (lines 324-331, Figure 7 suppl. 2A). In fact, the empirical distributions of the switching distances were broad and much closer to those predicted by an ideal bang-bang control policy (one that anticipates when to brake using knowledge of the dynamics) than to the best-fit fixed-distance bang-bang control policy (Figure 7 suppl. 2C).
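For readers who want the intuition, a minimal simulation (our own illustrative parameter values, not the fitted ones) of a fixed-distance bang-bang controller under first-order dynamics shows why its stopping error inherits a strong, systematic dependence on the time constant:

```python
def fixed_bang_bang_error(tau, target=4.0, d_switch=1.0,
                          vmax=0.5, eps=0.01, dt=0.01):
    """Stopping error of a fixed-distance bang-bang controller under
    first-order dynamics tau * dv/dt = vmax*u - v. The controller
    pushes forward (u = +1) until the remaining distance drops below
    d_switch, then brakes (u = -1) until |v| < eps."""
    x, v, braking = 0.0, 0.0, False
    for _ in range(100_000):
        braking = braking or (target - x < d_switch)
        u = -1.0 if braking else 1.0
        v += dt * (vmax * u - v) / tau
        x += v * dt
        if braking and abs(v) < eps:
            break
    return x - target  # signed stopping error

# Because the switching distance never adapts, the stopping error
# varies systematically with the time constant tau.
errors = {tau: fixed_bang_bang_error(tau) for tau in (0.3, 1.0, 3.0)}
assert errors[0.3] < errors[1.0] < errors[3.0]
```

An adaptive controller would instead scale its braking point with tau, flattening this error-versus-tau relationship; partial adaptation lands between the two regimes.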

We conclude that participants do not fully adapt to the control dynamics, but neither do they ignore them. The previous version of this paper emphasized the imperfection of adaptation while underemphasizing that there was indeed significant adaptation. We now compute the responses expected from both a perfectly adapting policy and a fixed, non-adapting one, and show (Figure 3E) that the data lie between these extremes, revealing partial adaptation in all conditions. Visual information allows more adaptation, but even in the vestibular condition we still see more adaptation than a fixed controller that does not adjust its policy to the current dynamics.

2) Modelling: the main rationale of the model (line 173 ff: "From a normative standpoint, …") is correct, but an accurate estimate of the dynamics is only required if the uncertainty of the velocity estimate based on the efference copy is not too large. Otherwise, velocity estimation should rely predominantly on sensory input. In my opinion that's what happens here: due to the trial-by-trial variation in dynamics, estimates based on efference copy are very unreliable (the same command generates a different sensory feedback in each trial), and participants resort to sensory input for velocity estimation. This results in feedback control, which, as mentioned above, seems to be compatible with the results.

We manipulated the dynamics in this way precisely because we did not want the efference copy to be fully informative about self-motion velocity. As explained above, we agree that velocity estimation relies predominantly on sensory input. However, as mentioned in the response to the previous question, reliance on sensory input need not necessarily result in pure feedback control, since sensory observations can contribute significantly to the estimation of the control dynamics, which seems to be what the participants are attempting based on our findings (lines 411-412). Pure feedback control would certainly become valuable, and thus much more likely, if we alter the control dynamics within the duration of each trial. This is something that we would like to investigate in future studies.

3) Motion cueing: Motion cueing can, in the best case, approximate the vestibular cues that would be present during real motion. Furthermore, it is not clear whether the applied tilt is really perceived as linear acceleration, or whether the induced semicircular canal stimulus is too strong so that subjects experience tilt. Participants might have used the tilt as an indicator for onset or offset of translational motion, specifically because it is self-generated, but the contribution of the vestibular cues found in the present experiment might be completely different from what would happen during real movement. Therefore, conclusions about vestibular contributions are not warranted here and cannot solve the questions around "conflicting findings" mentioned in the introduction.

This comment has also been answered in response to comments of Rev. 2 (see also response to Rev. Editor’s comment #1). Specifically, we have added Figure 1 suppl. 2B (referenced in lines 74-75 and 592-593), where we show the tilt velocity profile over time with a tilt/translation discrimination threshold we chose according to the canal thresholds literature (Lim et al., 2017, MacNeilage et al., 2010). Tilt velocity exceeds the proposed threshold briefly right after trial onset, however, the displacement of the subjects during that period is negligible and should not influence navigation. Thus, perceived tilt can be used as an indicator of trial onset, but it cannot contribute to path integration for 3 reasons: (a) the displacement during that period is negligible, (b) tilt velocity is kept below the perceptual threshold for the remainder of the trajectory, (c) GIA is always above the motion detection threshold of the vestibular system (Figure 1 suppl. 2A).

We have also rephrased lines 47-50 such that we do not create the impression that we make the claim to solve the questions around "conflicting findings". However, we are confident that our relatively less restricted experimental design can be more generalizable when it comes to vestibular contributions in real-world navigation (lines 75-78 and 352-355).

4) Methods: I was not able to find an important piece of information: how many trials were performed in each condition? Without this information, the statistical results are incomplete. It was also not possible to compute the maximal velocity allowed by joystick control, since Equation 1.9 requires not just the displacement x and the time constant, but also the trial duration T, which is not reported. One can only guess from Figure 1D that vmax is about 50 cm/s for tau = 0.6 s and therefore the average T is assumed to be around 8.5 s.

We apologize for our omission. The values for x and T that we used are added in line 518. Also, we added the number of trials each participant performed in each condition in lines 484-487.

5) Results: information that would be useful is not reported. On page 6 it is mentioned that the "effect of control dynamics must be due to either differences in travel duration or velocity profiles"; it is then stated that both are "unlikely", but no results are given. It turns out that in the supplementary Figure 4A the correlation between time constant and duration/velocity is shown, and apparently the correlation with duration is significant (but small) in the majority of cases. Why is that not discussed in the Results section? Other results are also not reported; for example, what was the slope of the dependence between time constant and error? Why is the actual control signal, the joystick command, not shown and analyzed?

We thank the reviewer for allowing us to fix these problems. Prior to the start of data collection, we adjusted stimulus parameters to ensure that travel time and mean velocity would be similar across different dynamics for a controller with perfect knowledge of the dynamics. Nevertheless, participants’ knowledge of the dynamics was incorrect, as revealed by the dynamics-dependent responses that we attributed to erroneous adaptation/estimation. Additionally, we found a small dependence of travel duration and average travel velocity on the dynamics for some participants. Importantly, travel duration is a feature of the control policy. Since, according to our design, perfect adaptation would yield similar travel times across dynamics, the resulting dependence on the dynamics again shows that participants failed to adapt perfectly. Simulations confirmed this, as maladaptation results in a dependence of travel duration on the dynamics.

We edited lines 207-214 to reflect this explanation and updated Figure 5 suppl. 1 to include our simulations. We added the slopes of the regression between the time constant and the residual errors in Figure 3 suppl. 1 and Table 3 (referenced in line 155).

We also added a discussion of these analyses (lines 185-195) and Figures 1D (inset), 2D (inset), 4, and Figure 2 suppl. 2, which refer to the joystick input.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Reviewer #3:

The present version of the manuscript has clearly improved, and the authors responded adequately to the comments, and a link to the data was also provided. Some very helpful additional analysis was added, such as shown in Figure 3E. There are, however, some critical points left which are outlined below.

Introduction, line 83f: "These findings suggest that inertial cues alone lack the reliability to support accurate path integration …" Even though in general I'd agree with this statement, the findings in the current paper do not support this claim. Since the inertial cues were generated by motion cueing rather than being natural, it could be that natural inertial cues would yield much better path integration performance. Please change accordingly. See also next comments.

We agree with this comment. We edited this sentence to include this consideration (lines 87-88), and also adjusted the wording in other parts of the text where we refer to inertial cues (lines 358, 381, 412).

Figure 1 suppl. 2: I agree that the initial tilt cannot contribute to linear path integration, but if it is processed by the central estimator (see, for example, your co-author Jean Laurens' models), it would change the perceived orientation of the participant to a tilted position. Consequently, the GIA after the tilt would be correctly perceived as being due to tilt, this means it would not be interpreted as resulting from linear displacement, and vestibular input would not at all, or only to a very little part, be used as input to the path integration system. This could be an explanation for the findings of inferior performance in the vestibular condition (see comment above). It would mean that motion cueing as applied here is not appropriate for simulating linear travel, which would be an important finding for designing driving simulators. Please discuss …

We edited the legend in Figure 1 Suppl. 2 to reflect this view and point to the main text for a more detailed discussion (lines 435-450). We want to point out that, in a previous study, we measured participants’ performance in the complete absence of sensory cues (Lakshminarasimhan, 2018, Figure S1B). Compared to the responses measured in the present study using vestibular feedback, performance is much worse in the absence of sensory feedback, suggesting that participants used the generated vestibular cues to some extent (no-sensory-cues condition, correlation ± SD between target position and response: radial component 0.39 ± 0.12, angular component 0.58 ± 0.2; vestibular condition: radial component 0.64 ± 0.05, angular component 0.93 ± 0.03; radial component t-test p = 9·10⁻⁷, angular component t-test p = 10⁻⁸). However, we agree that the possibility that perceived tilt influences the processing of vestibular inputs cannot be ruled out. We hope that our revised text clearly highlights this limitation.

Results, page 4: it seems that the fit for the combined condition, specifically for distance (both in terms of R2 and of response gain), was worse than for the visual condition. This would be surprising, since adding a second sensory input should not have that effect. However, if the vestibular stimulus, specifically for distance, is not appropriate, then this is exactly what should happen. A conflicting vestibular stimulus could decrease response gain (and the fit).

This is a good observation. Although R2 and the response gain for the combined condition are slightly lower than for the visual condition (only for the radial component), the difference is not significant, as the combined-condition R2 and response gain fall within the 95% CI of the mean R2 and response gain of the visual condition, respectively (Radial component: combined condition mean R2 = 0.64, visual condition 95% CI of mean R2 = [0.62, 0.72], paired t-test p = 0.074; Angular component: combined condition mean R2 = 0.96, visual condition 95% CI of mean R2 = [0.93, 0.98], paired t-test p = 0.37). Other than the concerns raised in the previous comment about the effect of perceived tilt on path integration and performance, our experimental design should not allow for a sensory conflict. We propose further manipulations for future experiments that would investigate the relationship between vestibular and visual cues in our edited Discussion section (lines 446-450).

Results, page 6, line 164ff: "A partial correlation analyses revealed.." A summary statistical result should be shown here as well to support the result of time constant dependence.

We apologize for this omission. We added Table 4 that shows these values, since the total number of values to be shown (3 conditions x 3 predictors) would take up too much space and would be hard to read. We refer to the Table on line 174.

Line 165: "…albeit only by modulating the distance dependence" I first misunderstood this and thought it would only modulate radial distance dependence. After looking at Figure 3 suppl 2: maybe better write "…albeit only by modulating both angular and radial distance dependence."

We apologize for the confusion; we changed the wording as suggested.

Figure 5: text in figure caption is missing (probably due to clipping of the text box).

Apologies, corrected.

Results page 12-13, Bayesian model: I'm surprised that both SD of likelihood and prior were free parameters. For a Bayes model with Gaussian distributions and fixed prior, only the quotient of both standard deviations is a free parameter (the model is basically equivalent to a weighted sum of the mean of prior and the measurement, with the weight being determined by the quotient of the variances). So, either I misunderstand your model, or there's a mistake. If the latter is the case, then Figure 6 and the corresponding results are also partly wrong, since likelihood σ and prior σ cannot be determined on their own, but only their quotient. See next comment, I suppose there is really a mistake.

Thank you for pointing this out. Indeed, there was a subtle error on our behalf. We corrected and re-fitted our models (static and dynamic prior) with just the ratio λ of prior σ over likelihood σ instead of both σ values separately, thereby reducing the number of fitted parameters to 2 and 1 for the static and dynamic prior models, respectively. We updated the text and the relevant statistics and figures.
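The identifiability point the reviewer raised can be verified in a few lines (illustrative notation, not the actual fitting code): for a Gaussian prior and likelihood, the posterior mean depends on the two standard deviations only through their ratio, so scaling both by a common factor changes nothing.

```python
def posterior_mean(measurement, prior_mean, sigma_prior, sigma_like):
    """Posterior mean for a Gaussian prior and Gaussian likelihood:
    a weighted sum whose weight depends only on the variance ratio."""
    w = sigma_prior**2 / (sigma_prior**2 + sigma_like**2)
    return w * measurement + (1.0 - w) * prior_mean

# Scaling both sigmas by 3 leaves the estimate unchanged, which is why
# only the ratio lambda = sigma_prior / sigma_like can be fit.
a = posterior_mean(1.2, 0.5, sigma_prior=2.0, sigma_like=1.0)
b = posterior_mean(1.2, 0.5, sigma_prior=6.0, sigma_like=3.0)
assert abs(a - b) < 1e-12
```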

Results page 14, dynamic prior model: here you can easily see from equation 7 (page 25) that there are in fact only 2 free parameters, not three (as you state), if you re-express the weight k: the weight k is given as k=varp/(varp+varm)=1/(1+varm/varp). So only varm/varp is free, not both, you cannot determine both from the fit. Note: in this model, it is usually sufficient to take the first measurement as mean of the first prior (corresponding to a maximum likelihood estimate on the first trial, or uninformative prior). This reduces the model to one free parameter.

We want to clarify that we use k only to update the mean of the prior distribution across trials. Following our corrections in response to the previous comment, we fit just the ratio λ of prior σ over likelihood σ, and set the mean of the initial prior to the time constant of the first trial. This reduced the number of fitted parameters of the dynamic prior model to one.
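The resulting one-parameter scheme can be sketched as follows: a schematic of the trial-by-trial prior update under the assumptions above (prior mean initialized to the first measurement, fixed weight derived from λ); variable names are ours, not from the paper's code:

```python
def dynamic_prior_estimates(measurements, lam):
    """Trial-by-trial estimates when the prior mean is initialized to the
    first measurement and updated with a fixed weight derived from
    lam = sigma_prior / sigma_likelihood."""
    k = lam**2 / (lam**2 + 1.0)      # weight on each new measurement
    mu = measurements[0]             # uninformative prior on trial 1
    estimates = [mu]
    for m in measurements[1:]:
        mu = k * m + (1.0 - k) * mu  # posterior mean becomes next prior mean
        estimates.append(mu)
    return estimates

print(dynamic_prior_estimates([10.0, 0.0, 0.0], 1.0))  # [10.0, 5.0, 2.5]
```

With λ = 1 the estimate converges exponentially toward the recent measurements, matching the reviewer's point that a single free parameter suffices once the first prior is anchored to the first measurement.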

Discussion, line 343-344: "In contrast, inertial (vestibular/somatosensory cues) alone lacked the reliability to support accurate path integration …" this is the case for the motion cueing inertial cues, so please make clear here and at other points that your data only refer to this type of inertial cues.

We already adjusted the wording of this sentence in response to the first comment.

Discussion: I miss a general discussion of the limits of the study due to using motion cueing. As mentioned several times, the results concerning the vestibular and combined conditions of this study cannot be generalized to vestibular stimuli under natural conditions.

We have considered the comments and concerns raised carefully and made the necessary adjustments to the text. As mentioned in the responses above, we clarified wherever applicable that any conclusions based on our findings apply only to this specific paradigm (lines 87-88, 358, 381, 412). We also added a paragraph to the Discussion describing the limitations of the motion-cueing algorithm and opportunities for future work (lines 435-450).

Along these lines I'm also very puzzled to read in the authors' responses the following statement: "Therefore, there is no need to ensure that these accelerations are perceived identically: they are identical."

(This reminds me of an astronaut who once stated that there is no need to study perception of up and down in space, because in weightlessness there is no up and down.)

Two identical linear accelerations can very well be perceived completely differently depending on the rotational history and context. That's the reason why we perceive a tilt of the head as what it is, and not as a rapid linear displacement. Please ask your coauthors Dora Angelaki and Jean Laurens, who have been in the field long enough to know this. And this is extremely relevant in the present context.

All our apologies: this is a misunderstanding. Yes, the combination of rotation and acceleration experienced during tilt can be perceived differently from the acceleration experienced during translation. The misunderstanding originated from the way we think about it: in our mind, it is the rotation history (sensed by the canals) that makes the difference, whereas the accelerations are the same (that is to say, in the absence of rotation sensors, the accelerations induced by tilt and by translation are indistinguishable); hence our response.

https://doi.org/10.7554/eLife.63405.sa2

Article and author information

Author details

  1. Akis Stavropoulos

    Center for Neural Science, New York University, New York, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft
    Contributed equally with
    Kaushik J Lakshminarasimhan
    For correspondence
    ges6@nyu.edu
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-4613-9422
  2. Kaushik J Lakshminarasimhan

    Center for Theoretical Neuroscience, Columbia University, New York, United States
    Contribution
    Formal analysis, Investigation, Methodology, Project administration, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review and editing
    Contributed equally with
    Akis Stavropoulos
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-3932-2616
  3. Jean Laurens

    Ernst Strüngmann Institute for Neuroscience, Frankfurt, Germany
    Contribution
    Conceptualization, Investigation, Methodology, Software, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-9101-2802
  4. Xaq Pitkow

    1. Department of Electrical and Computer Engineering, Rice University, Houston, United States
    2. Department of Neuroscience, Baylor College of Medicine, Houston, United States
    3. Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, United States
    Contribution
    Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – review and editing
    For correspondence
    xaq@rice.edu
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-6376-329X
  5. Dora E Angelaki

    1. Center for Neural Science, New York University, New York, United States
    2. Department of Neuroscience, Baylor College of Medicine, Houston, United States
    3. Tandon School of Engineering, New York University, New York, United States
    Contribution
    Conceptualization, Funding acquisition, Investigation, Project administration, Resources, Validation, Writing – review and editing
    For correspondence
    da93@nyu.edu
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-9650-8962

Funding

NIH Blueprint for Neuroscience Research (NIH DC007620)

  • Dora E Angelaki

National Science Foundation (NeuroNex DBI-1707398)

  • Kaushik J Lakshminarasimhan

Gatsby Charitable Foundation

  • Kaushik J Lakshminarasimhan

Simons Foundation (324143)

  • Xaq Pitkow
  • Dora E Angelaki

National Institutes of Health (R01NS120407)

  • Xaq Pitkow
  • Dora E Angelaki

National Science Foundation (1707400)

  • Xaq Pitkow

National Science Foundation (1552868)

  • Xaq Pitkow

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Jing Lin and Jian Chen for their technical support, and Baptiste Caziot, Panos Alefantis, Babis Stavropoulos, Evangelia Pappou, and Emmanouela Pechynaki for their useful insights. This work was supported by the Simons Collaboration on the Global Brain, grant no. 324143, and NIH DC007620. GCD was supported by NIH EY016178.

Ethics

Human subjects: All experimental procedures were approved by the Institutional Review Board at Baylor College of Medicine and all participants signed an approved consent form (H-29411).

Senior Editor

  1. Richard B Ivry, University of California, Berkeley, United States

Reviewing Editor

  1. Adrien Peyrache, McGill University, Canada

Reviewers

  1. Benjamin J Clark, University of New Mexico, United States
  2. Gunnar Blohm, Queen's University, Canada
  3. Stefan Glasauer, Ludwig-Maximilians-Universität München, Germany

Publication history

  1. Preprint posted: September 23, 2020
  2. Received: September 23, 2020
  3. Accepted: December 11, 2021
  4. Version of Record published: February 18, 2022 (version 1)
  5. Version of Record updated: May 3, 2022 (version 2)

Copyright

© 2022, Stavropoulos et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

  1. Akis Stavropoulos
  2. Kaushik J Lakshminarasimhan
  3. Jean Laurens
  4. Xaq Pitkow
  5. Dora E Angelaki
(2022)
Influence of sensory modality and control dynamics on human path integration
eLife 11:e63405.
https://doi.org/10.7554/eLife.63405