Introduction

Physical reasoning—the intuitive ability to predict how objects and environments behave under the laws of physics—is a fundamental component of human cognition1,2. Even the simplest daily activities, as moving a cup away from a table’s brink or catching a ball in mid-air, require humans to harness cognitive resources to anticipate and respond to their physical environment. Yet, despite the importance of physical reasoning for human survival, the question is still open regarding its underlying mechanisms.

A contentious debate in cognitive psychology and neuroscience has revolved around whether human reasoning is an abstract, modular process3,4 that relies on amodal symbols, rules and conventions3, or whether it is fundamentally grounded in human bodily interactions with the environment5-8. In other words, is physical reasoning a result of a ‘high-level’ rational, logic-based inference or is it honed and refined through ‘low-level’ embodied sensorimotor experience? This is not a trivial debate, and its outcomes could fundamentally influence our theoretical understanding of cognition and inform not only psychological research but also applied fields such as artificial intelligence and robotics, where the replication of human-like common sense remains a formidable challenge9.

Traditionally, cognitive researchers argue that physical reasoning is grounded in innate knowledge of the relationships between cues and outcomes and is largely detached from the real-time sensorimotor experiences that shape cognition10-13. Similarly, Bayesian models of cognition suggest that individuals use priors grounded in an intuitive knowledge of Newtonian physics, informed by past observations, to refine real-time physical predictions11,14,15. Under perceptual uncertainty, these priors integrate with new sensory data to forecast causal relationships and potential outcomes of actions16-18.

An alternative view on physical reasoning has emerged in the last decades, which argues that cognitive processes are fundamentally grounded in bodily experiences and interactions with the physical world3,5. For example, humans’ prediction of the trajectory of moving objects (e.g., a ball thrown towards them) relies on their need to physically act upon this prediction (e.g., catch the ball). Moreover, the potency of physical reasoning in this context is affected by individual variations in skills19,20. A seasoned athlete with robust physical prowess may be more adept at foreseeing the course of a swiftly thrown ball, granting them a competitive edge in timely reactions21,22. This interplay between prediction and action, which is driven by the precepts of physical reasoning, allows humans to efficiently navigate through dynamic environments, adapt to novel scenarios, and react with precision and agility to unpredictable situations.

Gravity’s ubiquitous presence on Earth offers a unique opportunity to test this fundamental question about cognition. All living species on our planet have been continuously exposed to a constant gravitational acceleration of 9.8 ms2 (1g) which has influenced the way we successfully interact with the environment23,24. The vestibular system in the inner ear continuously monitors the position of the head with respect to gravity. By integrating vestibular signals with visual, proprioceptive, and visceral cues, the brain creates an internal model of terrestrial gravity, which provides a fundamental prior for reasoning about motion and spatial orientation and predicting trajectories, weight, and stability of objects in everyday contexts24. Developmental evidence suggests that aspects of this gravity prior are innate, as events that defy expected gravitational behaviour violate the expectations of human toddlers25. Recent studies in newborn chicks raised in a controlled lab setting further supported this perspective by revealing the presence of an innate gravity prior26.

However, it remains unclear to what extent real-time embodied experience can recalibrate this internal mental model. In other words, does immediate sensorimotor exposure to a novel gravitational environment alter high-level physical reasoning, or is this gravity model largely fixed? Addressing this question is important because it speaks to the flexibility of the human mind: if brief physical experiences can update an internal gravity model, then one of the most abstract mental skills is grounded in recent bodily states. If not, reasoning may rely on (and be limited by) a more ‘hard-wired’ prior. Here, we address this question by conducting two pre-registered studies, guided by a simple logic: if real-time embodied experience affects the mechanisms underlying physical reasoning, then altering a person’s gravitational cues while solving a physical reasoning task should affect their internal gravity model (and thereby their task performance).

In the first study, we tested participants under terrestrial gravity with and without perturbation of vestibular gravity cues. Healthy adult participants completed the Virtual Tools task16—a set of computerized reasoning games in which they had to reason about the movement of objects in a virtual 2D environment (Fig. 1a and Supplementary Video 1). While participants played the reasoning games, we disrupted gravitational signalling in real time using disruptive Galvanic Vestibular Stimulation (GVS27), which introduces noise to their vestibular organ. We also tested the same participants in a Sham stimulation condition with similar skin sensations but no vestibular stimulation (order of conditions was counterbalanced). Participants played a range of games that varied in their reliance on gravity-based predictions, allowing us to assess whether disrupting vestibular input selectively impairs reasoning in gravity-dependent contexts, or whether it produces a more general cognitive disruption. We assessed participants’ performance (measured by success rate, average number of attempts, and average attempt duration) and strategies (measured by the change in their tool selection and positioning from one attempt to another within each game). Better performance and more efficient strategies in the Sham condition compared to the GVS condition in gravity-dependent games would suggest that an intact real-time experience is necessary for high-level reasoning about physical dynamics. Alternatively, a similar or better performance in the GVS condition would suggest that disrupting the real-time experience of gravity does not selectively degrade reasoning performance. In our pre-registration, we hypothesised that when participants do not experience changes in gravity (sham stimulation), they would perform better in physical-reasoning tasks that are based on terrestrial gravity.

Task design

a, Virtual Tools task. The participants were required to click on one out of the three tools depicted in the upper right corner of the figure (second panel from the left) and to place it with another click on the screen to get the red ball into the green area (third panel from the left). As soon as the tool is released (panel on the right), the laws of physics/gravity are applied, and the tool interacts with the other objects. b, Galvanic Vestibular Stimulation set up with two pairs of electrodes shown on a head model. The upper electrodes deliver the Galvanic Vestibular Stimulation and the ones below are used as a Sham stimulation to control for non-vestibular specific effects. c, Procedure. After a series of practice games followed by baseline games designed under terrestrial gravity, participants played two sets of games designed under terrestrial gravity (in study 1) and hypo or hyper gravities (in study 2), with concurrent stimulation (either GVS or Sham during a given set).

In the second study, we aimed to test whether the hindering effects of real-time embodied experience can be reduced in case of reasoning about altered gravity. To that end, we manipulated the gravity within the Virtual Tools games, which required reasoning about hypogravity (0.5 g) or hypergravity (2 g) while each participant experienced GVS or Sham condition. This meant that object movements were influenced by gravity at either a higher or lower acceleration than Earth’s standard gravity (9.8 m/s²). We calculated the performance and strategy measures as in study 1. Then, for each measure, we computed a ‘gravity-weighted index’ (GWI) by calculating the ratio between GVS-performance to Sham-performance, scaled by each game’s reliance on gravity. GWI is a direct quantification of the effects of real-time vestibular noise on physical reasoning. A higher GWI for success rate in study 2 compared to study 1 would suggest that the effects of real-time embodied experience are reduced when reasoning about non-terrestrial gravity. Conversely, a lower GWI would suggest that high-level reasoning depends mainly on a fixed gravity prior, and that real-time vestibular noise leaves participants’ performance unchanged, regardless of the gravity within the game. We hypothesized that participants would perform better with altered sense of gravity when the task involves non-terrestrial gravity.

Results

Study 1: Intact Real-Time Embodied Experience is Necessary for High-Level Reasoning

We first examined whether the altered vestibular input influenced participants’ performance in the reasoning task - defined as success rate, number of attempts to solve the games and time per attempt. High success rates and low number and time per attempt indicate good performance. Preliminary analyses showed that males and females did not perform significantly differently between the Sham and GVS conditions (male; t(9) = -0.58, p = .71, Cohen’s d = -0.22; female: t(33) = 0.47, p = .32, Cohen’s d = 0.10), so we combined all participants in subsequent analyses.

When examining all reasoning games, participants performed similarly in the Sham condition compared to the GVS condition in all performance measures (tsuccess_rate(43) = 0.27, psucess_rate = .39, Cohen’s dsucess_rate = 0.05; tnumber_of_attempt (43) = -0.88, pnumber_of_attempt = .19, Cohen’s dnumber_of_attempt = -0.14; ttime_per_attempt(43) = 0.32, ptime_per_attempt = .62, Cohen’s dtime_per_attempt = 0.05; See full statistics in Table 1).

Means, Standard Deviations and t test Statistics for Study 1 measures

At game level, we found a significant negative impact of the GVS in a subset of games only (see Fig. 2). GVS had a negative impact on the success rate in ‘Remove’ (t(42) = 2.12, p = .02, Cohen’s d = 0.63) and ‘GoalMove’ (t(41) = 1.87, p = .03, Cohen’s d = 0.56). We also found a detrimental effect of the GVS on the number of attempts for two games, ‘Spiky’ (t (41) = -2.10, p = .02, Cohen’s d = -0.63) and ‘Falling_A’ (t (42) = -1.78, p = .04, Cohen’s d = -0.53).

Study 1 - Subset of games with a significant impact of the GVS

a, on the success rate. The games significantly impacted by the GVS are displayed on the left with their name below the illustration. The bar chart on the right shows the average success rate for the game when played under the Sham stimulation (blue bar) and the GVS stimulation (orange bar). The colour of the borders indicates the impact of gravity in the game. High-gravity-dependent games: turquoise box; Medium-gravity-dependent-games: magenta box. b, on the number of attempts. c, on the distance between placements. d, on the tool switching across attempts. Independent t-tests * p < .05

Study 1: Real-Time Embodied Experience Shapes Reasoning Strategies

We next examined whether the altered vestibular input influenced participants’ high-level strategies - defined as changes in tool selection and spatial positioning across successive attempts within each game - as these patterns reflect how participants adapt their behaviour to solve the task (see Methods). Big changes in tool positioning and frequent tool switching indicate poor evaluation of the failed outcome. In their successive placements across games, participants used not-significantly larger distance between their tool placements in the GVS condition compared to the Sham but switched between tools near significantly more in the Sham condition compared to the GVS condition (tdistance(43) = -0.52 pdistance = .61, Cohen’s ddistance = -0.11; ttool_switching(43) = 1.9, ptool_switching = .06, Cohen’s dtool_switching = 0.28; See full statistics in Table 1).

As for performance measures, the effects of GVS were specific to a subset of games (see Fig. 2). Participants in the GVS condition placed tools significantly farther apart across attempts compared to Sham in two games, ‘Shafts_B’ (t(25) = -2.29, p =.03, Cohen’s d = -0.85) and ‘Gap’ (t (23) = -2.30, p = .03, Cohen’s d = -0.90). Regarding tool switching, in ‘Chaining’, participants under GVS switched between tools more frequently than under the Sham condition (t(36) = -3.27, p = .002, Cohen’s d = -1.04).

To determine whether the observed differences in distance between successive tool placements reflect distinct reasoning strategies induced by altered gravitational signalling - rather than incidental performance fluctuations, we conducted two additional exploratory analyses (not pre-registered; see Methods). These analyses explicitly tested if the distances between tool placements across attempts could reliably distinguish stimulation groups, directly quantifying differences in participants’ strategic approaches. In the first analysis, we explicitly investigated whether GVS increased the likelihood of participants changing their spatial tool placement strategies between attempts. Using a Dirichlet Process Gaussian Mixture Model on participants’ placement coordinates, we quantified how frequently participants switched from one identified strategy cluster to another (Fig. 3a). This analysis confirmed that GVS significantly increased the probability of switching strategies compared to Sham stimulation (odds ratio [OR] = 0.83, 95% confidence interval [CI] = [0.70, 0.98], p = .03 - Fig. 3b), and that switching probability decreased over successive attempts (OR = 0.87, 95% CI = [0.83, 0.92], p < .001), suggesting stabilization of strategy use across trials. We then conducted a second complementary analysis using a leave-one-out kernel density classification approach to address whether participants adopted systematically different strategies across stimulation conditions. Here, we assessed if a participant’s overall distance in tool placements patterns could reliably predict their stimulation group (GVS or Sham). Predictive accuracy was computed for each left-out participant by comparing their placement patterns against those of other participants. Because the observed accuracy exceeds the range expected by chance (Fig. 3c), this result confirms that altered vestibular signaling systematically influenced participants’ spatial strategies, leading to reliably distinct patterns between stimulation groups.

Clustering analysis

a, Illustration of results of the Dirichlet Process Gaussian Mixture Model by stimulation type for a selection of games. Each dot represents an individual placement, the ellipses represent the different clusters or strategies b, Strategy switch. Probability of strategy switch on overall games played under GVS versus Sham. c, Leave-one-out Kernel density analysis. This figure shows the distribution of accuracies obtained from 1000 iterations with randomly shuffled group labels and their corresponding 95% Equally-Tailed Intervals (ETIs). The black curve represents the null distribution obtained by randomizing group labels; red dashed lines mark its 95% credible interval. The blue bar marks the observed accuracy, which exceeds this range—indicating that tool placement patterns reliably distinguish between groups.

Study 2: Real-time embodied experience supports physical reasoning adaptation

In Study 2, participants performed the Virtual Tools games in non-terrestrial gravity. The task required reasoning about hypogravity (0.5 g) or hypergravity (2 g) while experiencing GVS or Sham condition. Similar to previous research manipulating visual gravity in Virtual Tools28 and showing no difference in the effect of hypo vs hyper gravity on physical reasoning, we found no difference in the effect of the stimulation on success rates in hypo and hyper gravity games (hypogravity: t(39) = -0.43, p = .66, Cohen’s d = -0.09; hypergravity: t(39) = 0.10, p = .92, Cohen’s d = 0.02;), so we aggregated 0.5 and 2g results as ‘altered gravity’ in subsequent analyses.

Across all games, participants succeeded not significantly better in the GVS condition compared to the Sham condition, with similar number of attempts (tsuccess_rate(39) = -0.18, psuccess_rate = .57, Cohen’s dsuccess_rate = -0.03; tnumber_of_attempt(39) = -0.64, pnumber_of_attempt = .26, Cohen’s dnumber_of_attempt= -0.09; See full statistics in Table 2). At game level, we found a significant negative impact of the GVS in a subset of games only (see Fig. 4). GVS had a negative impact on the success rate in ‘Goal Move’ (t(37) = 1.85, p = .04, Cohen’s d = 0.58). We found a detrimental effect of the GVS on the number of attempts for two games, ‘Towers_B’ (t (38) = -1.96, p = .03, Cohen’s d = 0.61) and ‘CatapultAlt’ (t (37) = -1.78, p = .04, Cohen’s d = -0.56). We also observed a detrimental effect of GVS on time per attempt, for one game, ‘Chaining’ (t(38) = -1.74, p = 0.044, Cohen’s d = -0.54).

Study 2 - Subset of games with a significant impact of the GVS

a, on the success rate. The games significantly impacted by the GVS are displayed on the left with their name below the illustration. The bar chart on the right shows the average success rate for the game when played under the Sham stimulation (blue bar) and the GVS stimulation (orange bar). The colour of the borders indicates the impact of gravity in the game. High-gravity-dependent games: turquoise box; Medium-gravity-dependent-games: magenta box; Low-gravity-dependent-games: dark orange box. b, on the number of attempts. c, on the time per attempt. d, on the tool switching across attempts. Independent t-tests, * p < .05

Means, Standard Deviations and t test Statistics for Study 2

Across the two studies, there was no significant difference in the average success rates in the baseline games played in both studies (independent two-sided t-test: t(82) = 0.15, p = .88, Cohen’s d = 0.03). Potential differences in the GVS effect in altered vs terrestrial gravity could therefore not be explained by potential differences in participants’ reasoning performance between the two studies. To compare the effect of the GVS on physical reasoning in terrestrial vs altered gravities, we defined a ‘gravity-weighted index’ (GWI) as the ratio between results for each measure in GVS and Sham, to which we applied a multiplier to account for the differentiated gravity impact in the different games (see equation (1) in Methods: this index was not pre-registered). Importantly, for this specific analysis, we included only the games from study 1 which were also played in study 2. As shown in Fig. 5, we observed a significantly greater gravity-weighted index for success rate in altered gravity compared to terrestrial gravity (tGWI_success_rate(82) = 1.7; pGWI_success_rate = .046, Cohen’s dGWI_success_rate = 0.37), confirming the facilitation role of GVS in physical reasoning in altered gravity. But no significant difference was found for the gravity-weighted indexes in terrestrial versus altered gravity for the number of attempts nor the time per attempt (tGWI_number_of_attempt (82) = -0.84; pGWI_number_of_attempt = .20, Cohen’s dGWI_number_of_attempt = -0.18; tGWI_time_per_attempt (82) = 0.55; pGWI_time_per_attempt = .71, Cohen’s d GWI_time_per_attempt = 0.12).

Gravity-weighted indexes in terrestrial gravity and altered gravity.

a, Success rate. The plot shows the gravity-weighted index averaged across conditions (black dots), as well as the individual participant gravity-weighted index data (transparent dots). An independent one-sided t-test: t(82) = 1.70; p = .046, Cohen’s d = 0.37 showed that gravity-weighted index for SR in altered gravity was significantly higher than in terrestrial gravity. b, Number of attempts. c, Time per attempt. d, Distance between tool placements. e, Tool Switching between attempts Independent t-tests, * p < .05.

Study 2: Real-time embodied experience does not impact on strategies

Across all games, participants used not significantly larger distances between placements in the Sham compared to the GVS condition (tdistance(39) = 0.27, pdistance= .78, Cohen’s ddistance = 0.05) and switched between tools at similar levels under the two stimulations (ttool_switching(39) = -0.14, ptool_switching = .89, Cohen’s dtool_switching = -0.03, See Table 2). At game level, participants switched significantly less between tools in only one game, ‘Table_A’ (t(19) = 2.37, p = .03, Cohen’s d = 1.02- See Fig. 4). No significant negative impact of the GVS was observed for distance across successive placements.

The GWI for tool switching was near significantly greater in altered gravity games compared to terrestrial gravity games (tGWI_tool_switching(79) = 1.9; pGWI_tool_switching = .06, Cohen’s dGWI_tool_switching = 0.42, See Fig. 5). For the distance between tool positionings across attempts, the GWI was not significantly lower in altered gravity games compared to terrestrial gravity games (tGWI_distance(79) = -0.57; pGWI_distance = .57, Cohen’s dGWI_distance = -0.13, See Fig. 5).

Discussion

Our study tests a body-dependence in human high-level reasoning. Real-time vestibular perturbation via GVS was found to significantly impair participants’ reasoning about object motion under Earth gravity, yet the same perturbation was different when reasoning in altered-gravity environments. This paradoxical dual effect suggests that high-level cognitive judgments about physics are not fixed computations but are dynamically modulated by the body’s ongoing sensory state. In essence, the brain’s internal “physics engine” appears to be embodied and context-sensitive—robustly tuned to terrestrial gravity under stable conditions, but capable of flexibly reconfiguring its assumptions when sensory cues signal a novel gravitational context.

Our data lend strong support to theories of embodied cognition, which posit that high-level reasoning is deeply rooted in the sensorimotor systems of the body 29. Rather than viewing physical reasoning as an abstract process operating in isolation, our findings show it to be exquisitely sensitive to the state of the body and its interaction with the environment. Disrupting the vestibular sense degraded performance in the familiar 1g context, indicating that steady bodily feedback normally scaffolds accurate intuitive physics. Yet the same disruption became beneficial in altered-gravity context, implying that the brain can rapidly reweight sensory inputs to meet task demands 30. This ability to reconfigure cognitive strategies based on bodily context reflects a form of neural plasticity on short timescales. It resonates with evidence that vestibular pathways are not only crucial for balance but also project widely to cognitive centers involved in spatial memory, attention, and executive function 23,31. In effect, our “common sense” physical reasoning may emerge from neural computations that evolved for sensorimotor control. For example, humans possess an internalised sense of gravity through lifelong experience, and this embodiment of gravity guides both our actions and intuitions 32. The present results show that this embodied knowledge is not rigid: when the ground truth changes (as in altered gravity), the mind-body system can adapt by drawing on alternative cues (visual, proprioceptive, or even erratic vestibular signals) to revise its predictions. Such sensorimotor integration ensures that cognition remains grounded and can cope with environmental novelty. More broadly, the context-dependent reversal we observed—where perturbing the body helps rather than hinders cognition under changed physics—underscores a core principle of embodied intelligence: mental processes exploit bodily feedback and will adjust their operation to maintain alignment with the external world.

Our findings also invite interpretation through the lens of predictive coding and internal models of physics. The human brain maintains an implicit model of Earth’s gravity that normally aids perception and action 32, for example by allowing accurate catching of falling objects. In conventional settings this internal model functions as a reliable prior expectation, integrated with vestibular and other sensory inputs to stabilise our experience of the world 33. However, a strong gravitational prior can become a liability in an altered-gravity context, where it produces systematic prediction errors. Our results suggest that adding vestibular “noise” via GVS effectively perturbs or downregulates this prior, forcing the cognitive system to rely more on immediate sensory evidence. In predictive-processing terms, the perturbation injects precisely the kind of prediction errors that prompt the brain to update its beliefs about physical dynamics. Notably, there is independent evidence that introducing stochastic vestibular input can improve performance by amplifying information throughout in neural circuits 34. Consistent with the principle of stochastic resonance, a moderate level of vestibular perturbation may boost the brain’s sensitivity to new gravitational contingencies rather than being purely disruptive 34. Thus, when gravity is altered, the GVS-induced sensory prediction errors appear to accelerate the revision of the brain’s internal model of gravity, yielding more adaptive reasoning. This perspective aligns with recent views that the cerebellum and vestibular pathways implement forward models to predict gravity’s effects. By momentarily destabilising these predictions, GVS allowed new sensorimotor information to recalibrate cognitive expectations on the fly. In short, an embodied predictive-coding mechanism may underlie the flexible physics reasoning observed here, with vestibular signals serving as a key “teaching signal” when our default assumptions about the physical world no longer hold.

Another important point in our findings is that GVS disrupted physical reasoning in games where terrestrial gravity was critical for reaching the solution, selectively reducing performance in those problems. This selective impairment is consistent with previous demonstrations that GVS perturbs spatial cognition, including short-term spatial memory and egocentric mental rotation. For instance, Dilda, et al. 35 observed an impact of GVS on short-term spatial memory and egocentric mental rotation.

Importantly, the vestibular perturbation also altered participants’ cognitive strategies, particularly their spatial exploration. Clustering of tool placements showed that participants under GVS shifted more frequently between distinct spatial clusters from one attempt to the next than in the Sham condition, indicating noisier and less stable spatial exploration, in line with evidence that GVS modulates spatial perception and biases distance judgements27. By contrast, the near-significant increase in tool switching in the Sham condition may seem at odds with the idea that frequent switching reflects poor evaluation of failed outcomes. Given the relatively small number of attempts per game, however, this pattern is better interpreted as suggesting that GVS dampens cognitive flexibility, consistent with previous reports of its detrimental effects on flexibility36.

When gravity within the game was altered, the same vestibular disruption became functionally beneficial. The gravity-weighted success-rate index was higher under GVS than Sham, indicating that vestibular noise facilitated adaptation to the recalibrated gravitational field and helped reduce sensory prediction errors. This rapid improvement, achieved after less than 15 minutes of exposure, is striking given the omnipresence of gravity in everyday life and prior evidence that enhanced performance in novel vestibular environments typically follows much longer pre-adaptation to noisy GVS (around 120 minutes)30. Although we did not observe robust stimulation effects on strategy measures in altered gravity, the trend in the gravity-weighted tool-switching index suggests that vestibular perturbation may also start to reshape strategic exploration—a possibility that future studies should examine more systematically.

Taken together, our findings highlight the remarkable flexibility of the human cognitive apparatus and open several avenues for deeper inquiry. The ability of a momentary vestibular manipulation to reshape high-level reasoning speaks to a brain that continually recalibrates its internal rules to match the external context, even for fundamental concepts like gravity. This challenges the notion of intuitive physics as a fixed, encapsulated module; instead, it appears as a fluid interplay between memory, expectation, and real-time sensation. An exciting implication is that cognitive plasticity extends into domains once thought stable, with potential applications in training and rehabilitation. For instance, in astronautics and extreme environments, strategically perturbing astronauts’ vestibular inputs during training might accelerate the update of internal models for novel gravitational fields, thereby speeding up adaptation 37,38. Likewise, therapeutic vestibular stimulation could possibly be used to push the brain toward alternative strategies when habitual sensorimotor predictions contribute to errors or biases in perception.

On the theoretical front, our findings encourage an expansion of frameworks like predictive coding to more fully include embodiment: the brain’s “predictions” are not only about sensory events but are inextricably linked to the physical self in a given world state. This perspective dovetails with emerging trends in artificial intelligence and robotics, where embodied agents—equipped with physical sensors and the capacity for real-time feedback—show more robust problem-solving and generalization than disembodied algorithms 29. Just as our study demonstrates that a human’s reasoning can improve by perturbing and thereby enriching the sensory input in unfamiliar settings, an embodied AI might benefit from injecting controlled noise or variability to avoid overfitting to one environmental regime. In summary, the present work not only highlights the adaptability in human reasoning under sensorimotor perturbation, but also provides a conceptual bridge between neuroscience, cognitive science, and embodied AI: it suggests that intelligence, whether biological or artificial, may achieve its highest flexibility when it harnesses the dynamics of an embodied predictive model of the world.

Methods

An analysis plan for this study was pre-registered in Open Science Framework (https://osf.io/8vutf). Data and analysis code have been deposited on GitHub at https://github.com/Physical-Cognition-Lab/Adaptability-in-altered-gravity.

Participants

Study 1

A total of 48 healthy adults with no history of neurological and vestibular disorders took part in the study (M = 25.2 years old, SD = 4.9; range of age: 18.1 to 34.3; 38 females, 45 right-handed as per the Edinburgh Handedness Questionnaire39). Four participants were excluded from analyses as they reported dizziness and did not complete the experiment. Therefore, the final sample consisted of 44 participants (M = 25.7 years, SD = 4.8; 34 females). All participants gave their written informed consent. An a priori power analysis was conducted using G*Power version 3. Results indicated the required sample size to achieve 95% power for detecting a medium effect, at a significance criterion of α = .05, was N = 40 for a 2-way ANOVA. Thus, the obtained sample size of N = 44 is adequate to test the study hypothesis. Participants were recruited through the Birkbeck College website and were informed that they could withdraw at any time. The study was granted ethical approval by the Psychology Ethics Committee at Birkbeck, University of London (references #2122009 and #2122028). The participants were either compensated with credits or with 10 pounds.

Study 2

This study was covered by the same ethical approval than study 1. Participants to study 2 were recruited through the Birkbeck College website and did not participate to study 1. All participants gave their written informed consent. A total of 41 healthy adults with no history of neurological and vestibular disorders took part in the study (M = 24.4 years old, SD = 5.1; range of age: 18.5 to 34.9; 30 females, 39 right-handed as per the Edinburgh Handedness Questionnaire39). One participant was excluded from analyses as they reported dizziness and did not complete the experiment. Therefore, the final sample consisted of 40 participants (M = 24.4 years, SD = 5.2; 29 females).

Design

Study 1

To test the impact of altered gravity on participants’ physical reasoning and to account for the high intrasubject variability40 while keeping the overall experiment and stimulation times in line with standard practices27, we used a within-subject design, where each participant played games with concomitant GVS and a control Sham stimulation. All the games were designed in terrestrial gravity (Fig. 1 and Extended Data Fig. 1). After practicing four games, each participant played twelve baseline games without any stimulation, allowing us to compare the two studies against a baseline. Then, participants completed two sets of fourteen games with concomitant GVS or Sham stimulations. To mitigate for carry-over and practice effects with the games and the stimulations, we counterbalanced the order of the stimulations and the order of sets of games across participants, resulting in four groups. In addition, we randomized the order of the games. Finally, each participant would only play a given game once during the experiment.

Study 2

Participants played a version of the same games designed in altered terrestrial gravity (Fig. 1 and Extended Data Fig. 1). After practicing four games, each participant played nine baseline games in terrestrial gravity and without any stimulation. These baseline games were chosen from the baseline games used in study 1. Then, participants completed two sets of ten games in altered gravity with concomitant GVS or Sham stimulations (counterbalanced across participants). Half of the games were in half-terrestrial gravity and half of them in double-terrestrial gravity (Fig. 1, Extended Data Table 1 and Supplementary Videos 2 and 3). A red text above the game indicated the gravity in the game. The number of games included in study 2 was slightly reduced compared to study 1 to a number deemed to be sufficient for assessing the effects of altered gravity on reasoning. The split of the games between the two sets was slightly amended compared to study 1 to ensure a balanced mix of games in terms of gravity level, difficulty and gravity dependency. As in study 1, to mitigate for carry-over and practice effects with the games and the stimulations, we counterbalanced the order of the stimulations and the order of sets of games across participants, and we randomized the order of the games. Finally, each participant would only play a given game once during the experiment.

Experimental procedure

Participants performed the experiment in a dimly lit room, seated, with their head restrained by a chinrest, facing a screen placed approximately at 30 cm and centered at eye level. After participants provided their consent and completed questionnaires assessing their eligibility to receive the GVS and their hand laterality, they received explanations on the Galvanic Vestibular Stimulation. The experimenter then placed the electrodes and let the participants try the GVS for 10 seconds27. After participants confirmed they agreed to proceed with the stimulation, they were given standardized instructions on the task. Next, they practiced 4 practice games and then the baseline games, allowing them to get familiarized with the Virtual Tools games before the GVS stimulation. Finally, they played 2 sets of games with concurrent stimulation (either GVS or Sham). They moved to the next game either when they found the solution or after 8 failed attempts or after 1 minute had passed. They received visually presented feedback (green tick or red cross) after each attempt. After the last game, the experimenter removed the electrodes and debriefed the participants.

Physical reasoning task

Virtual Tools Games

Participants practiced an adaptation of the ‘Virtual Tools’ task, an online gaming framework developed by Allen, et al. 16 which has been used in previous research to assess physical reasoning in adults and children16,28,41,42. This task consists of a series of computerized games in which participants are asked to select one of three tools in one click and to place it on the screen in another click, to achieve a goal – getting a red object into a green goal area. Their tool placement triggers physical cascades, which approximate the physics of the real word, including gravity and object-to-object interactions (see Figure 1a and Supplementary Video 1). To succeed, participants must use their knowledge about physical laws such as gravity and collision forces and predict environmental changes and object-to-object interactions, without performing physical motor movement. For each game, participants could attempt to place tools up to 8 times with a time limit of one minute. The game is reset to its initial state after a failed attempt.

Block types and gravity manipulation

Participants were exposed to different gravities. In the baseline block, participants played the games designed under terrestrial gravity and no stimulation. Then they played two sets of games with a stimulation on, either GVS or Sham. These sets of games were either designed under terrestrial gravity - study 1-or altered gravity - study 2 (See Extended data Figure 1 for details of the games in each study). The games were of different levels of difficulty and relied to different extents on gravity-related predictions for their solution (See Extended data Table 1), based on piloting and previous computational modelling work in the lab classifying games’ difficulty and gravity impact in three levels (gravity: low-gravity-dependent, medium-gravity-dependent, high-gravity-dependent; difficulty: low, medium, high). In each study, the two experimental sets were matched on both factors. In addition, for study 2, the sets were matched on the two gravity levels (hyper/hypo gravity).

Galvanic Vestibular Stimulation (GVS)

Vestibular stimulation was delivered using a commercial stimulator (Good Vibrations Engineering Ltd., Nobleton, Ontario, Canada), controlled via LabVIEW. Carbon rubber electrodes (16 cm²) coated with electrode gel were placed bilaterally over the mastoid processes and secured with adhesive tape. Prior to electrode placement, the application sites were cleaned, and electrode gel was applied to minimize impedance. The stimulation protocol used a stochastic waveform, consisting of an alternating sum-of-sines voltage with dominant frequencies at 0.16, 0.32, 0.43, and 0.61 Hz. This stochastic GVS induces a sense of postural instability without producing consistent or directional illusory motion. To prevent compensatory effects from the non-stimulated side, stimulation was delivered binaurally. A Sham stimulation condition was also implemented, which produced similar skin sensations under the electrodes as the GVS, to control for non-vestibular specific effects. The Sham electrodes were placed on the left and right sides of the neck, approximately 5 cm below the GVS electrodes, which were positioned above the mastoid bones. The maximum intensity for both stimulation conditions was set at 1 mA. These parameters were selected to optimize vestibular disruption43,44, eliciting a mild sensation of dizziness that dissipated immediately after stimulation45. These settings have been shown to mimic spatial disorientation, and suprathreshold GVS is considered an analog to spaceflight. In this experiment, the maximum duration of GVS was 14 minutes (14 games × 1-minute maximum), which has been shown to be well-tolerated, with no lasting effects beyond the stimulation.

Data and statistical analyses

Data and statistical analyses were carried out with MATLAB R2020a (MathWorks), the R software environment (version 4.3.3) for statistical computing and graphics and Python (Jupyter Notebook version 6.4.1) for clustering analyses.

Practice games were not included in the analysis. The completion of a minimum of 10 games under each stimulation was set as a requirement for participants’ data to be included in the analysis. This threshold had been defined to ensure the data were sufficient to assess the effects of the GVS versus Sham stimulations. For each participant and each game, we quantified the success rate, the number of attempts needed and the time per attempt as performance measures. We then quantified the tool switching between attempts and the distance between tool positioning in successive attempts as strategy measures.

For each study, for our three performance measures, we conducted one-sided independent t-tests for game level comparisons under the GVS and Sham stimulations and one-sided paired t-tests for comparing participants’ performance under each stimulation. For strategy measures, we used two-sided t-tests as we had no direction in our hypotheses. For comparing the weighted gravity indexes between study 1 and study 2, we conducted one-sided independent t-tests for performance measures and two-sided independent t-tests for strategy measures. For all the analyses, the level of significance was always set at p < .05.

Computation of primary measures

Performance analysis

For each participant, for each trial and for each attempt, the following data were recorded: success (yes/no), time elapsed (in seconds) between the start of the attempt and the placement of the tool, tool selected and its x,y coordinates on the screen. The data were analysed in MATLAB, by averaging data across games and per stimulation, and by assessing the average number of attempts, success rates and time per attempt under each stimulation.

Strategy analysis

For each game and for each stimulation type, we evaluated how participants attempted to solve each game by evaluating two measures: the percent of tool switching and the average tool-positioning distance. We focused only on games in which participants failed to solve the game on the first attempt. To calculate the percent of tool switching, for each game and each attempt (starting from the second attempt), we compared whether the selected tool was identical to the one used in the previous attempt. We evaluated the percent of attempts for each game in which the selected tool was different. We then averaged the percent of tool switching across all games within each stimulation type. For the average tool-use positioning distance, we calculated for each game the average Euclidean distance in pixels between the positioning of the tool in each attempt and the previous attempt. For each game, we averaged this Euclidean distance across all attempts (starting from the second attempt), yielding an average tool-use positioning distance per game. Then, we averaged the positioning distance across all games within each stimulation.

Gravity-weighted index

Participants completed the virtual tool-use task in both GVS and sham conditions, allowing us to assess the effects of altered gravity on their performance and strategies. As the games relied on gravity to different extent, we calculated a gravity-weighted index for each measure, defined as the ratio between participants’ performance and strategy outcomes in the GVS versus Sham conditions, weighted by the influence of gravity on each game. This gravity-weighted index is a direct quantification of the effects of real-time vestibular noise on high-level reasoning. We then compared the gravity-weighted index across tasks played under terrestrial and altered gravities. As an example, let IndexSR1(respectively IndexSR2 and IndexSR3) be the averaged index for success rates for low-gravity-dependent games (respectively medium-gravity-dependent games, high-dependent-gravity games) for a participant.

Let low-gravity-dependent games be weighted with a coefficient 1, medium-gravity-dependent games with a coefficient of 2 and high-gravity-dependent games with a coefficient of 3. Then, the gravity-weighted index for Success rate for a participant is equal to:

For success rate gravity-weighted index, a value close to -1 indicates a large negative difference between GVS and Sham gravity-weighted results, a value close to 0 similar results and a value close to 1 indicates a large positive difference between GVS and Sham gravity-weighted results.

A gravity-weighted index is similarly calculated for each performance and strategy measures for each participant of study 1 and study 2. For this analysis, only games included in both studies were included and two outliers games (‘Spiky’ and ‘SeeSaw’) were excluded. Then for each performance and strategy measure, we averaged the gravity-weighted indexes across participants to calculate the gravity-weighted indexes for study 1 (terrestrial gravity games) and for study 2 (altered gravity games).

Dirichlet Process Gaussian Mixture Model

To quantify strategic switching behaviour, we employed a Dirichlet Process Gaussian Mixture Model approach adapted from Allen, et al. 16. This unsupervised clustering method automatically identified distinct spatial placement strategies from the [x,y] coordinates of tool attempts, without requiring a priori assumptions about the number or nature of strategies. For each combination of stimulation condition and trial level, we fitted separate mixture models that captured the unique cluster structure and estimated the probability of each attempt belonging to each identified strategy.

Strategic switches were defined as transitions between different spatial strategies on consecutive attempts within the same game. A switch occurred when the cluster differed between two sequential attempts by the same participant. Given the probabilistic nature of the Dirichlet Process, we repeated the entire clustering procedure across 1,000 iterations. For each iteration, we fitted a logistic regression model with switch probability as the dependent variable and both stimulation type (GVS vs. Sham) and attempt number as predictors, extracting the p-value for the stimulation type effect. From this bootstrap analysis, we identified the iteration that produced results closest to the modal p-value distribution by creating kernel density estimates of these stimulation effect p-values across all 1,000 iterations. We selected the iteration whose p-value was nearest to the peak density value, representing the most probable statistical outcome under our clustering approach. We report the results from this selected iteration and its corresponding logistic regression model.

Leave-one-out Kernel density analysis

While the Dirichlet Process Gaussian Mixture Model enables to assess strategic switching behaviour under the GVS and Sham stimulations, it does not address whether participants adopted systematically different strategies across stimulation conditions. To determine whether altered vestibular input led to systematically different distance patterns across placements between GVS and Sham conditions, we implemented a leave-one-out kernel density classification approach based on the [x,y] coordinates of each attempt. For each participant, we excluded their data and used the remaining participants’ placements to construct two separate two-dimensional kernel density estimates (KDEs): one for the GVS group and one for the Sham group. These KDEs captured the spatial distribution of tool placements within each condition.

Next, we estimated the probability density of the left-out participant’s placements under each of the two KDEs. These estimates were used to compute a relative likelihood, reflecting how much more likely a given placement was under the correct group distribution compared to the incorrect one. Relative likelihood values were binarized into classification accuracy scores: a value of 1 was assigned if the placement was more likely under the correct group, and 0 otherwise. Accuracy scores were then averaged across all trials for each participant to produce a single accuracy value.

To evaluate whether the observed classification performance exceeded chance, we repeated the procedure for a null model in which stimulation labels were randomly shuffled. This control analysis was run over 1,000 iterations, each time generating KDEs from the permuted data and computing classification accuracy in the same leave-one-out manner. The resulting distribution of accuracies served as a baseline against which the observed accuracy was compared.

Two versions of this analysis were conducted. The first focused exclusively on the first attempt of each game, evaluating early strategy use under each condition. The second was performed at the game level, averaging placement patterns across trials.

Extended data

Details of the games played under concomitant stimulation in study 1 and study 2.

The gravity dependency column indicates the level of reliance on gravity to solve each game. The difficulty column indicates the level of difficulty to solve the game. The two columns on the right indicate the gravitational accelerations used in each of the two study for a given game.

Left panel Games in Study 1

a, Baseline games. b, Games in Set 1. c, Games in Set 2. The colour of the borders indicates the impact of gravity in the game. High-gravity-dependent games: turquoise box; Medium-gravity-dependent-games: magenta box; Low-gravity-dependent games: orange box. Right panel Games in Study 2

Data availability

Data and analysis code have been deposited on GitHub at https://anonymous.4open.science/r/Adaptability-in-altered-gravity-B318

Acknowledgements

This work was supported by the ESRC New Investigator grant ES/W009242/1, BA Talent Award TDA21\210038, Waterloo Foundation grant 917-4975, Leverhulme Trust research grant RPG-2022-327, and the Birkbeck/Wellcome Trust Institutional Strategic Support Fund to OO and by the UK Experimental Psychology Society and BIAL Foundation grant (grant number 041/2020) to E.R.F. We thank Ruben Zamora for helping with the online data collection.

Additional information

Author contributions

HG, ERF, and OO conceptualized the study and designed the experiment. HG collected the data. HG and TG analyzed the data under the supervision of ERF and OO. All authors wrote the manuscript and approved the final version of the manuscript for submission.

Funding

UKRI | Economic and Social Research Council (ESRC) (New Investigator grant ES/W009242/1)

  • Ori Ossmy

British Academy (BA) (Talent Award TDA21/210038)

  • Ori Ossmy

Waterloo Foundation (TWF) (917-4975)

  • Ori Ossmy

Leverhulme Trust (The Leverhulme Trust) (RPG-2022-327)

  • Ori Ossmy

Birkbeck/ Wellcome Trust (Institutional Strategic Support Fund)

  • Ori Ossmy

UK Experimental Psychology Society (041/2020)

  • Elisa Raffaella Ferrè

BIAL Foundation (041/2020)

  • Elisa Raffaella Ferrè

Additional files

Supplementary information Video 1. Illustrative attempt of a game in terrestrial gravity (1g) – Study 1. The participant correctly positions the chosen tool on the screen, enabling the red ball to get into the green target area (successful attempt).

Supplementary information Video 2. Illustrative attempt of a game in hyper gravity (2g) – Study 2.

Supplementary information Video 3. Illustrative attempt of a game in hypo gravity (0.5g) – Study 2.