Object representation in a gravitational reference frame

  1. Alexandriya MX Emonds
  2. Ramanujan Srinath
  3. Kristina J Nielsen
  4. Charles E Connor (corresponding author)
  1. Department of Biomedical Engineering, Johns Hopkins University School of Medicine, United States
  2. Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, United States
  3. Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, United States

Abstract

When your head tilts laterally, as in sports, reaching, and resting, your eyes counterrotate by less than 20% of the tilt, and thus retinal images rotate, over a total range of about 180°. Yet, the world appears stable and vision remains normal. We discovered a neural strategy for rotational stability in anterior inferotemporal cortex (IT), the final stage of object vision in primates. We measured object orientation tuning of IT neurons in macaque monkeys tilted +25° and –25° laterally, producing a ~40° difference in retinal image orientation. Among IT neurons with consistent object orientation tuning, 63% remained stable with respect to gravity across tilts. Gravitational tuning depended on vestibular/somatosensory cues but also on visual cues, consistent with previous evidence that IT processes scene cues for gravity’s orientation. In addition to stability across image rotations, an internal gravitational reference frame is important for physical understanding of a world where object position, posture, structure, shape, movement, and behavior interact critically with gravity.

Editor's evaluation

In this study, the authors investigate whether neurons in the inferior temporal (IT) cortex encode features relative to the absolute gravitational vertical, by recording responses to objects in varying orientations while monkeys viewed them sitting in physically rotated chairs. They find surprising and compelling evidence that neural tuning is unaffected by physical whole-body tilt, which cannot be explained by any compensatory torsional rotations of the eyes. These findings are of fundamental importance because they indicate that IT neurons may play a role not only in object recognition but more broadly in physical scene understanding.

https://doi.org/10.7554/eLife.81701.sa0

Introduction

Reflexive eye movements compensate for up/down and right/left head movements, but when your head tilts laterally, as during sports, driving (Zikovitz and Harris, 1999), social communication (Halberstadt and Saitta, 1987; Mignault and Chaudhuri, 2003; Krumhuber et al., 2007; Mara and Appel, 2015), working in cramped environments, reaching for distant objects, and resting in bed, your eyes compensate less than 20% (Miller, 1962; Schworm et al., 2002), so retinal images rotate around the point of fixation. But the perceptual compensation for this is so automatic and complete that we are usually unaware of the image rotation, and visual abilities are not strongly affected. This perceptual stability is more than just a generalization of recognition across orientations. Critically, our perceptual reference frame for objects remains stable with respect to the environment and gravity. As a result, trees still appear vertical and apples still appear to fall straight to the ground, even though their orientations and trajectories on the retina have changed.

Here, we explored the hypothesis that this perceptual stability is produced by transforming visual objects into a stable, non-retinal reference frame. Our previous work has shown that the primate ventral visual pathway (Felleman and Van Essen, 1991) implements an object-centered reference frame (Pasupathy and Connor, 1999; Pasupathy and Connor, 2001; Pasupathy and Connor, 2002; Carlson et al., 2011; Srinath et al., 2021; Brincat and Connor, 2004; Brincat and Connor, 2006; Yamane et al., 2008; Hung et al., 2012; Connor and Knierim, 2017), stabilizing against position and size changes on the retina. But this still leaves open the orientation of the ventral pathway reference frame. Our recent work has shown that one channel in anterior ventral pathway processes scene-level visual cues for the orientation of the gravitational reference frame (Vaziri et al., 2014; Vaziri and Connor, 2016), raising the possibility that the ventral pathway reference frame is aligned with gravity. Here, we confirmed this hypothesis in anterior IT (Felleman and Van Essen, 1991), and found that gravitational alignment depends on both visual and vestibular/somatosensory (Brandt et al., 1994; Baier et al., 2012) cues. To a lesser extent, we observed tuning aligned with the retinal reference frame, and object orientation in either reference frame was linearly decodable from IT population responses with high accuracy. This is consistent with psychophysical results showing voluntary perceptual access to either reference frame (Attneave and Reid, 1968). The dominant, gravitationally aligned reference frame not only confers stability across image rotations but also enables physical understanding of objects in a world dominated by the force of gravity.

Results

Object tuning in a gravitational reference frame

Monkeys performed a dot fixation task while we flashed object stimuli on a high-resolution LED monitor spanning 100° of the visual field in the horizontal direction. We used evolving stimuli guided by a genetic algorithm (Carlson et al., 2011; Srinath et al., 2021; Yamane et al., 2008; Hung et al., 2012; Connor and Knierim, 2017; Vaziri et al., 2014; Vaziri and Connor, 2016) to discover 3D objects that drove strong responses from IT neurons. We presented these objects centered at fixation, across a range of screen orientations, with the monkey head-fixed and seated in a rotating chair tilted clockwise (–) or counterclockwise (+) by 25° about the axis of gaze (through the fixation point and the interpupillary midpoint; Figure 1a and b). Compensatory ocular counter-rolling was measured to be 6° based on iris landmarks visible in high-resolution photographs, consistent with previous measurements in humans (Miller, 1962; Schworm et al., 2002) and larger than previous measurements in monkeys (Rosenberg and Angelaki, 2014), making it unlikely that we failed to adequately account for the effects of counterroll. Eye rotation would need to be five times greater than previously observed to mimic gravitational tuning. Our rotation measurements required detailed color photographs that could only be obtained with full lighting and closeup photography. This was not possible within the experiments themselves, where only low-resolution monochromatic infrared images of the eyes were available. Importantly, our analytical compensation for counter-rotation did not depend on our measurement of ocular rotation. Instead, we tested our data for correlation in retinal coordinates across a wide range of rotational compensation values. 
The fact that maximum correspondence, for those neurons tuned in the retinal reference frame, was observed at a compensation value of 6° (Figure 1—figure supplement 1) indicates that counterrotation during the experiments was consistent with our measurements outside the experiments.
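The logic of this compensation sweep can be illustrated with a small simulation. The Python sketch below uses hypothetical tuning curves and helper names of our own (retinal_correlation, best_compensation); it assumes a retina-tuned neuron whose tuning shifts by ±(25° minus the true counter-roll) across tilts, and recovers the counter-roll as the compensation value that maximizes the cross-tilt correlation in retinal coordinates.

```python
def pearson(a, b):
    # Pearson correlation between two equal-length response vectors.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def lin_interp(x, xs, ys):
    # Linear interpolation with clamping at the ends (xs ascending).
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = 0
    while xs[i + 1] < x:
        i += 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

def retinal_correlation(orients, resp_neg, resp_pos, comp_deg, tilt_deg=25.0):
    # Map screen orientations into retinal coordinates under each tilt,
    # assuming a candidate counter-roll compensation, then correlate the
    # two tuning curves over their overlapping retinal range.
    shift = tilt_deg - comp_deg              # net image rotation after counter-roll
    ret_neg = [o + shift for o in orients]   # -25 deg tilt condition
    ret_pos = [o - shift for o in orients]   # +25 deg tilt condition
    lo, hi = max(ret_neg[0], ret_pos[0]), min(ret_neg[-1], ret_pos[-1])
    grid = [lo + k * (hi - lo) / 24 for k in range(25)]
    a = [lin_interp(x, ret_neg, resp_neg) for x in grid]
    b = [lin_interp(x, ret_pos, resp_pos) for x in grid]
    return pearson(a, b)

def best_compensation(orients, resp_neg, resp_pos):
    # Sweep candidate compensation values (0-25 deg) and return the one
    # giving maximum cross-tilt correlation in retinal coordinates.
    return max(range(26),
               key=lambda c: retinal_correlation(orients, resp_neg, resp_pos, c))
```

For a simulated retina-tuned neuron whose responses reflect 6° of counter-rolling, the sweep peaks at a compensation of 6°, mirroring the pattern in the real data.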

Figure 1 with 10 supplements
Example neuron tuned for object orientation in a gravitational reference frame.

(a, b) Stimuli demonstrating example object orientations in the full scene condition. At each object orientation, the object was positioned on the ground-like surface naturalistically by virtually immersing or ‘planting’ 15% of its mass below ground, providing physical realism for orientations that would otherwise be visibly unbalanced and ensuring that most of the object was visible at each orientation. The high-response object shape and orientation discovered in the genetic algorithm experiments was always at the center of the tested orientation range and labeled 0°. The two monkey tilt conditions are diagrammed at left. The small white dots at the center of the head (connected by vertical dashed lines) represent the virtual axis of rotation produced by a circular sled supporting the chair. Stimuli were presented on a 100°-wide display screen for 750 ms (separated by 250 ms blank screen intervals) while the monkey fixated a central dot. Stimuli were presented in random order for a total of five repetitions each. (c, d) Responses of an example IT neuron to full scene stimuli, as a function of object orientation on the screen and thus with respect to gravity, across a 100° orientation range, while the monkey was tilted –25° (c) and +25° (d). Response values are averaged across the 750 ms presentation time and across five repetitions and smoothed with a boxcar kernel of width 50° (3 orientation values). For this neuron, object orientation tuning remained consistent with respect to gravity across the two tilt conditions, with a peak response centered at 0° (dashed vertical line). The pink triangles indicate the object orientations compared across tilts in the gravitational alignment analysis. The two leftmost values are eliminated to equate the number of comparisons with the retinal alignment analysis. (e, f) The same data plotted against orientation on the retina, corrected for 6° counter-rolling of the eyes (Figure 1—figure supplement 1).
The cyan triangles indicate the response values compared across tilts in the retinal analysis. Due to the 6° shift produced by ocular counter-rolling, these comparison values were interpolated between tested screen orientations using a Catmull-Rom spline. Since for this cell orientation tuning was consistent in gravitational space, the peaks are shifted right or left by 19° each, that is, 25° minus the 6° compensation for ocular counter-rotation. (g–j) Similar results were obtained for this neuron with isolated object stimuli.
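The boxcar smoothing described in the caption amounts to a moving average over neighboring orientation values. A minimal Python sketch (the shrinking-window treatment at the ends of the tested range is our assumption, not specified in the text):

```python
def boxcar_smooth(values, width=3):
    # Moving-average smoothing of a tuning curve with an odd-width boxcar
    # kernel; the averaging window shrinks near the ends of the tested range.
    half = width // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

With seven tested orientations spaced ~16.7° apart across the 100° range, width=3 corresponds to the 50° kernel used for Figure 1.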

The Figure 1 example neuron was tested with both full scene stimuli (Figure 1a), which included a textured ground surface and horizon, providing visual cues for the orientation of gravity, and isolated objects (Figure 1b), presented on a gray background, so that primarily vestibular and somatosensory cues indicated the orientation of gravity. The contrast between the two conditions helps to elucidate the additional effects of visual cues on top of vestibular/somatosensory cues. In addition, the isolated object condition controls for the possibility that tuning is affected by a shape-configuration (i.e. overlapping orientation) interaction between the object and the horizon or by differential occlusion of the object fragment buried in the ground (which was done to make the scene condition physically realistic for the wide variety of object orientations that would otherwise appear improbably balanced on a hard ground surface).

In Figure 1c and d, responses for the full scene condition are plotted as a function of orientation in the gravitational reference frame, that is, orientation on the display screen. Despite the difference in body, head, and eye orientation between Figure 1c and d, the object orientation tuning pattern is stable; for example, the peak at 0° lines up (vertical dashed line). The correlation between the two curves in gravitational coordinates is 0.99 (t=25.89, p=3.28 × 10⁻⁸). Thus, the object information signaled by this neuron, which necessarily originates in retinal coordinates, has been transformed into the gravitational reference frame.

When the same data are plotted in the retinal reference frame (Figure 1e and f), the peak near 0° shifts right or left by 19° (25° tilt minus 6° counterrotation of the eyes). This reflects the transformation of retinal information into a new reference frame. Because the eyes were rotated in different directions under the two tilt directions, the overlap of tested orientations in retinal coordinates is limited to seven screen orientations. In addition, to account for ocular counterrotation, the tested orientation values (black dots) in the two curves must be shifted 6° in the positive direction for the –25° tilt and 6° negative for the +25° tilt. Thus, the appropriate comparison points between Figure 1e and f, indicated by the cyan triangles, must be interpolated from the Catmull-Rom spline curves used to connect the tested orientations (black dots). A comparable set of seven comparison points in the gravitational reference frame (Figure 1c and d, pink triangles) falls directly on the tested orientations.
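Catmull-Rom splines pass exactly through the tested data points, so interpolated comparison values at the ±6°-shifted orientations remain faithful to the measured responses. The Python sketch below uses our own helper names (catmull_rom, spline_eval) and assumes boundary points are repeated at the ends of the tested range:

```python
def catmull_rom(p0, p1, p2, p3, t):
    # Uniform Catmull-Rom cubic between p1 and p2; t runs from 0 to 1.
    return 0.5 * (2 * p1
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t * t * t)

def spline_eval(orients, resp, x):
    # Evaluate the Catmull-Rom curve through (orients, resp) at x.
    # Boundary segments reuse the end points (an assumption of this sketch).
    n = len(orients)
    i = 0
    while i < n - 2 and x > orients[i + 1]:
        i += 1
    t = (x - orients[i]) / (orients[i + 1] - orients[i])
    p0 = resp[i - 1] if i > 0 else resp[0]
    p3 = resp[i + 2] if i + 2 < n else resp[n - 1]
    return catmull_rom(p0, resp[i], resp[i + 1], p3, t)
```

spline_eval returns the measured response exactly when x is a tested orientation, and a smooth cubic estimate in between.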

Object orientation tuning remained stable with respect to gravity across tilts, peaking at orientation 0°, for both full scene (Figure 1c and d) and isolated object (Figure 1g and h) stimuli. Correspondingly, orientation tuning profiles shifted relative to retinal orientation by about 40° between the two tilt conditions (Figure 1e, f, i and j), shifting the peak to the right and left of 0°. A similar example neuron is presented in Figure 1—figure supplement 2, along with an example neuron for which tuning aligned with the retina, and thus shifted with respect to gravity. Expanded versions of the stimuli and neural data for these examples and others are shown in Figure 1—figure supplements 3–10.

Distribution of gravity- and retina-aligned tuning

Figure 2a scatterplots correlation values between object orientation tuning functions in the two tilt conditions calculated with respect to retinal orientation (x axis) and gravity (y axis), for a sample of 89 IT neurons tested with full scene stimuli. In both the scatterplot and the marginal histograms, color indicates the result of a 1-tailed randomization t-test on each cell for significant positive correlation (p<0.05) in the gravitational reference frame (pink), retinal reference frame (cyan), or both reference frames (dark gray) presumably due to the broad object orientation tuning of some IT neurons (Hung et al., 2012).

Figure 2 with 6 supplements
Scatterplots of object orientation tuning function correlations across tilts.

(a) Scatterplot of correlations for full scene stimuli. Correlations of tuning in the gravitational reference frame (y axis) are plotted against correlations in the retinal reference frame (x axis). Marginal distributions are shown as histograms. Neurons with significant correlations with respect to gravity are colored pink and neurons with significant correlations with respect to the retinae are colored cyan. Neurons with significant correlations in both dimensions are colored dark gray, and neurons with no significant correlation are colored light gray. (b) Scatterplot for isolated object stimuli. Conventions the same as in (a). (c) Same scatterplot as in (a), but balanced for number of comparison orientations between gravitational and retinal analysis. (d) Same as (b), but balanced for number of comparison orientations between gravitational and retinal analysis. Comparable plots based on individual monkeys are shown in the figure supplements. Anatomical locations of neurons in individual monkeys are shown in Figure 2—figure supplements 4 and 5.

Of the 52 neurons with consistent object orientation tuning in one or both reference frames, 63% (33/52) were aligned with gravity, 21% (11/52) were aligned with the retinae, and 15% (8/52) were aligned with both. The population tendency toward positive correlation was strongly significant along the gravitational axis (two-tailed randomization t-test for center-of-mass relative to 0; p=6.49 × 10⁻²⁹) and also significant though less so along the retinal axis (p=5.76 × 10⁻¹⁰). Similar results were obtained for a partially overlapping sample of 99 IT neurons tested with isolated object stimuli with no background (i.e. no horizon or ground plane; Figure 2b). In this case, of the 53 neurons with significant object orientation tuning in one or both reference frames, 60% (32/53) showed significant correlation in the gravitational reference frame, 26% (14/53) showed significant correlation in the retinal reference frame, and 13% (7/53) were significant in both reference frames. The population tendency toward positive correlation was again significant in this experiment along both gravitational (p=3.63 × 10⁻²²) and retinal (p=1.63 × 10⁻⁷) axes. This suggests that gravitational tuning can depend primarily on vestibular/somatosensory cues for self-orientation. However, we cannot rule out a contribution of visual cues for gravity in the visual periphery, including screen edges and other horizontal and vertical edges and planes, which in the real world are almost uniformly aligned with gravity (but see Figure 2—figure supplement 1). Nonetheless, the Figure 2b result confirms that gravitational tuning did not depend on the horizon or ground surface in the background condition. This is further confirmed through cell-by-cell comparison between scene and isolated object conditions for those cells tested with both (Figure 2—figure supplement 6).
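The per-neuron significance tests can be illustrated schematically. The sketch below implements a generic one-tailed randomization test for positive tuning-curve correlation by shuffling one curve's orientation labels; the permutation count and exact shuffling scheme are our assumptions, not the paper's specification.

```python
import random

def pearson(a, b):
    # Pearson correlation between two equal-length response vectors.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def randomization_corr_test(curve_a, curve_b, n_perm=10000, seed=1):
    # One-tailed test: how often does shuffling the orientation labels of
    # curve_a produce a correlation at least as large as the observed one?
    rng = random.Random(seed)
    observed = pearson(curve_a, curve_b)
    shuffled = list(curve_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(shuffled, curve_b) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    p = (count + 1) / (n_perm + 1)
    return observed, p
```

With only seven comparison orientations per curve, the null distribution has at most 7! = 5040 distinct permutations, so repeated draws are expected; this does not bias the estimated p-value.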

The analyses above were based on the full set of orientation comparisons possible for the gravitational reference frame (7), while the experimental design inevitably produced fewer comparisons for the retinal reference frame (5). Rerunning the analyses based on just 5 comparable object orientations in both reference frames (Figure 1, pink and cyan triangles) produced the results shown in Figure 2c and d. For full scene stimuli, this yielded 56% (23/41) significant gravitational alignment, 27% (11/41) retinal alignment, and 17% (7/41) dual alignment (Figure 2c). For isolated object stimuli, this reanalysis yielded 58% (28/48) gravitational alignment, 29% (14/48) retinal alignment, and 13% (6/48) dual alignment (Figure 2d).

Population coding of orientation in both reference frames

Neurons with no significant correlation in either reference frame might actually combine signals from both reference frames, as in other brain systems that interact with multiple reference frames (Stricanne et al., 1996; Buneo et al., 2002; Avillac et al., 2005; Mullette-Gillman et al., 2005; Cohen and Groh, 2009; Caruso et al., 2021; Chang and Snyder, 2010; McGuire and Sabes, 2011; Chen et al., 2013). This would be consistent with human psychophysical results showing mixed influences of retinal and gravitational reference frames, with stronger weight for gravitational (Bock and Dalecki, 2015; Corballis et al., 1978). For mixed reference frame tuning of this kind, it has been shown that simple linear decoding can extract information in any one reference frame with an appropriate weighting pattern across neurons (Stricanne et al., 1996; Deneve et al., 2001; Pouget et al., 2002). We tested that idea here and found that object orientation information in either gravitational or retinal space could be decoded with high accuracy from the responses of the IT neurons in our sample. The decoding task was to determine whether two population responses, across the 89 neurons tested with (different) full scene stimuli, were derived from the same or different orientations (the same two orientation values were chosen for each neuron's stimulus), either in gravitational space or retinal space (corrected for counter-rolling). This match/non-match task allowed us to analyze population information about orientation equivalence even though individual neurons were tested using different stimuli with no comparability between orientations. (Across neurons, orientations were aligned according to their order in the tested range, so that each non-match trial involved the same orientation difference, in the same direction, for each neuron.)
Our decoding method was linear discriminant analysis of the neural population response patterns for each stimulus pair, implemented with the Matlab function fitcdiscr.
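The published analysis used Matlab's fitcdiscr; the Python sketch below is only a simplified stand-in, a two-class linear discriminant with a shared diagonal covariance (the names, the diagonal assumption, and the synthetic data in the test are ours). It illustrates how a weighted sum across neurons, plus a threshold, can read out match versus non-match from population response patterns.

```python
def fit_diag_lda(X0, X1):
    # Fit a two-class linear discriminant assuming a shared, diagonal
    # covariance: w_j = (mu1_j - mu0_j) / var_j, with a midpoint threshold.
    d = len(X0[0])
    mu0 = [sum(x[j] for x in X0) / len(X0) for j in range(d)]
    mu1 = [sum(x[j] for x in X1) / len(X1) for j in range(d)]
    w = []
    for j in range(d):
        devs = ([(x[j] - mu0[j]) ** 2 for x in X0]
                + [(x[j] - mu1[j]) ** 2 for x in X1])
        var = sum(devs) / max(len(devs) - 2, 1) + 1e-9  # pooled variance
        w.append((mu1[j] - mu0[j]) / var)
    b = -0.5 * sum(w[j] * (mu0[j] + mu1[j]) for j in range(d))
    return w, b

def predict(w, b, x):
    # Linear readout: weighted sum of population responses plus threshold.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0
```

In the actual analysis, the two classes would be match and non-match stimulus pairs, each feature a neuron's response, and accuracy would be estimated with 10-fold cross-validation.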

The accuracy of these linear models for orientation match/non-match in the gravitational reference frame was 97% (10-fold cross-validation). The accuracy of models for orientation match/non-match in the retinal reference frame was 98%. (The accuracies for analyses based on the partially overlapping population of 99 neurons tested with isolated objects were 81% gravitational and 90% retinal.) The success of these simple linear models shows that information in both reference frames was easily decodable as weighted sums across the neural population. No complex, nonlinear, time-consuming neural processing would be required. This easy, linear decoding of information in both reference frames is consistent with psychophysical results showing that humans have voluntary access to either reference frame (Attneave and Reid, 1968). High accuracy was obtained even with models based solely on neurons that showed no significant correlation in either gravitational or retinal reference frames (Figure 2a, light gray): 89% for gravitational discrimination and 97% for retinal discrimination. This supports the idea that these neurons carry a mix of retinal and gravitational object orientation signals.

Gravity-aligned tuning based on purely visual cues

The results for isolated object stimuli in Figure 2b and d indicate that alignment of object information with gravity does not require the visual cues present in the full scene stimuli (ground surface and horizon) and can be based purely on vestibular and somatosensory cues for the direction of gravity in a dark room. We also tested the converse question of whether purely visual cues (tilted horizon and ground surface) could produce alignment of object orientation tuning with the visually apparent orientation of gravity, even in the presence of conflicting vestibular and somatosensory cues (i.e. with the monkey in a normal upright orientation). In spite of the conflict, many neurons showed object orientation tuning functions aligned with the visually cued direction of gravity, as exemplified in Figure 3a and c–f. The five object orientations that were comparable in a gravitational reference frame (pink triangles) produced consistent responses to object orientations relative to the ground surface and horizon (Figure 3c and d). For example, the top left stimulus in Figure 3a (horizon tilt –25°, retinal orientation –25°) has the same orientation with respect to the ground surface as the bottom right stimulus (horizon tilt +25°, retinal orientation +25°). Thus, in the visually-cued gravitational reference frame, these two stimuli line up at 0° orientation in both Figure 3c and d, and they evoke similar responses. Conversely, the nine orientations comparable in the retinal reference frame (black dots and cyan triangles) produced inconsistent responses (Figure 3e and f). A different example neuron (Figure 3b and g–j) exhibited object orientation tuning aligned with the retinae (Figure 3i and j) and not gravity (Figure 3g and h).

Example neurons tested with tilted horizon stimuli while the monkey remained in an upright orientation.

(a, b) Stimuli used to study two different neurons, demonstrating example object orientations in two conditions, with the ground surface, horizon, and sky gradient tilted –25° (clockwise, top row) or with ground surface, etc. tilted +25° (counterclockwise, second row). The monkey was in a normal upright orientation during these experiments, producing conflicting vestibular/somatosensory cues. The retinal orientation discovered in the genetic algorithm experiments is arbitrarily labeled 0°. (c, d) For one of the example IT neurons, tested with the stimuli in (a), object orientation tuning with respect to the visually cued direction of gravity was consistent across the two ground tilts. (e, f) Correspondingly, the neuron gave very different responses to retinal object orientation values between the two ground tilts. (g, h) This different example IT neuron, tested with the stimuli in (b), did not exhibit consistent object orientation tuning in visually-cued gravitational space. (i, j) Instead, this neuron maintained consistent tuning for retinal-screen orientation despite changes in ground tilt.

Across a sample of 228 IT neurons studied in this cue conflict experiment, 123 showed significant correlation across visual ground/horizon tilt in one or both reference frames. Of these, 54% (67/123) showed object orientation tuning aligned with the retinal reference frame, 35% (43/123) with the gravitational reference frame, and 11% (13/123) with both (Figure 4a). The population tendency toward retina-aligned orientation tuning was significant (two-tailed randomization t-test for center-of-mass relative to 0; p=8.14 × 10⁻²⁸) as was the tendency toward gravity-aligned orientation tuning (p=6.23 × 10⁻⁶). The experimental design in this case produced more comparisons in the retinal reference frame, and balancing the numbers of comparisons resulted in more equal percentages (Figure 4b). The main result in this experiment, that many IT neurons exhibit object orientation tuning aligned with visual cues for the direction of gravity, even in the presence of conflicting vestibular/somatosensory cues, argues that visual cues contribute to gravity-aligned tuning under normal circumstances, where they combine with convergent vestibular/somatosensory cues. That would be consistent with our previous discovery that many neurons in IT are strongly tuned for the orientation of large-scale ground surfaces and edges, in the orientation ranges experienced across head tilts (Brincat and Connor, 2004; Brincat and Connor, 2006), and more generally with the strong visual representation of scene information in temporal lobe (Epstein and Kanwisher, 1998; Epstein, 2008; Lescroart and Gallant, 2019; Kornblith et al., 2013).

Scatterplots of object orientation tuning function correlations across visual horizon tilts on the screen, with the monkey in an upright orientation.

(a) Scatterplot of correlations for full scene stimuli. Correlations of tuning in gravitational space as cued by horizon tilt (y axis) are plotted against correlations in retinal space (x axis). Marginal distributions are shown as histograms. Neurons with significant correlations in visually-cued gravitational space are colored pink and neurons with significant correlations in retinal space are colored cyan. Neurons with significant correlations in both dimensions are colored dark gray, and neurons with no significant correlation are colored light gray. (b) Same scatterplot as in (a), but with correlation values balanced for number of comparison orientations between gravitational and retinal analysis.

Discussion

The fundamental goal of visual processing is to transform photoreceptor sheet responses into usable, essential information—readable, compressed, stable signals, for the specific things we need to understand about the world. In this sense, the transformation described here achieves both stability and specificity of object information. The gravitational reference frame remains stable across retinal image rotations, a critical advantage for vision from the tilting platform of the head and body. And, it enables understanding of object structure, posture, shape, motion, and behavior relative to the strong gravitational force that constrains and interacts with all these factors. It provides information about whether and how objects and object parts are supported and balanced against gravity, how flexible, motoric objects like bodies are interacting energetically with gravity, what postural or locomotive behaviors are possible or likely, and about potential physical interactions with other objects or with the observer under the influence of gravity. In other words, it provides information critical for guiding our mechanistic understanding of and skillful interactions with the world.

It is important to distinguish this result from the notion of increasing invariance, including rotational invariance, at higher levels in the ventral pathway. There is a degree of rotational invariance in IT, but even by the broadest definition of invariance (angular range across which responses to an optimal stimulus remain significantly greater than average responses to random stimuli) the average is ~90° for in-plane rotation and less for out-of-plane rotations (Hung et al., 2012). It has often been suggested that the ventral pathway progressively discards information about spatial positions, orientations, and sizes as a way to standardize the neural representations of object identities. But, in fact, these critical dimensions for understanding the physical world of objects and environments are not discarded but rather transformed. In particular, spatial position information is transformed from retinal coordinates into relative spatial relationships between parts of contours, surfaces, object parts, and objects (Connor and Knierim, 2017). Our results here indicate a novel kind of transformation of orientation information in the ventral pathway, from the original reference frame of the eyes to the gravitational reference frame that defines physical interactions in the world. Because this is an allocentric reference frame, the representation of orientation with respect to gravity is invariant to changes in the observer system (especially lateral head tilt), making representation more stable and more relevant to external physical events. However, our results do not suggest a change in orientation tuning breadth, increased invariance to object rotation, or a loss of critical object orientation information.

A similar hypothesis about gravity-related tuning for tilted planes has been tested in parietal area CIP (central intraparietal area). Rosenberg and Angelaki, 2014 measured the responses of 46 CIP neurons with two monkey tilts, right and left 30°, and fit the responses with linear models. They reported significant alignment with eye orientation for 45 of 92 (49%) tilt tests (two separate tests for each neuron, right and left), intermediate between eye and gravity for 26/92 tilt tests (28%), and alignment with gravity for 6/92 tilt tests (7%). However, of the 5 neurons in this last category, only one appeared to show significant alignment with gravity for both tilt directions (Rosenberg and Angelaki, 2014; Figure 4D). Thus, while orientation tuning of ~35% of CIP neurons was sensitive to monkey tilt and gravity-aligned information could be extracted with a neural network (Rosenberg and Angelaki, 2014), there was no explicit tuning in a gravitational reference frame or dominance of gravitational information as found here. There is however compelling human fMRI evidence that parietal and frontal cortex are deeply involved in perceiving and predicting physical events (Fischer et al., 2016), and have unique abstract signals for stability not detected in ventral pathway (Pramod et al., 2022), though these could reflect decision-making processes (Shadlen and Newsome, 2001; Gold and Shadlen, 2007). Our results and others (Gallivan et al., 2014; Gallivan et al., 2016; Cesanek et al., 2021) suggest nonetheless that ventral pathway object and scene processing may be a critical source of information about gravity and its effects on objects, especially when detailed object representations are needed to assess precise shape, structure, support, strength, flexibility, compressibility, brittleness, specific gravity, mass distribution, and mechanical features to understand real world physical situations.

Our results raise the interesting question of how visual information is transformed into a gravity-aligned reference frame, and how that transformation incorporates vestibular, somatosensory, and visual cues for the direction of gravity. Previous work on reference frame transformation has involved shifts in the position of the reference frame. There is substantial evidence that these shifts are causally driven by anticipatory signals for attentional shifts and eye movements from prefrontal cortex, acting on ventral pathway cortex to activate neurons with newly relevant spatial sensitivities (Tolias et al., 2001; Moore and Armstrong, 2003; Moore et al., 2003; Armstrong et al., 2006; Schafer and Moore, 2011; Noudoost and Moore, 2011). Here, the more difficult geometric problem is rotation of visual information, such that “up”, “down”, “right” and “left” become associated with signals from different parts of the retina, based on a change in the perceived direction of gravity. This could also involve spatial remapping, but in circular directions, within an object-centered reference frame (Pasupathy and Connor, 2001; Pasupathy and Connor, 2002; Carlson et al., 2011; Srinath et al., 2021; Brincat and Connor, 2004; Brincat and Connor, 2006; Yamane et al., 2008; Hung et al., 2012; Connor and Knierim, 2017). Humans can perform tasks requiring mental rotation of shapes, but this is time consuming in proportion to the angle of required rotation (Shepard and Metzler, 1971), and seems to rely on an unusual strategy of covert motor simulation (Wexler et al., 1998). The rotation required here is fast and so automatic as to be unnoticeable. Discovering the underlying transformation mechanism will likely require extensive theoretical, computational, and experimental investigation.

Materials and methods

Behavioral task, stimulus presentation, and electrophysiological recording

Two head-restrained male rhesus monkeys (Macaca mulatta) were trained to maintain fixation within 1° (radius) of a 0.3° diameter spot for 4 s to obtain a juice reward. Eye position was monitored with an infrared eye tracker (EyeLink). Image stimuli were displayed on a 3840 × 2160 resolution, 80.11 DPI television screen placed directly in front of the monkey, centered at eye level at a distance of 60 cm. The screen subtended 70° vertically and 100° horizontally. Monkeys were seated in a primate chair attached to a ±25° full-body rotation mechanism with the center of rotation at the midpoint between the eyes, so that the angle of gaze toward the fixation point remained constant across rotations. The rotation mechanism locked at body orientations of –25° (tilted clockwise), 0°, and +25° (counterclockwise). After fixation was initiated by the monkey, 4 stimuli were presented sequentially, for 750 ms each, separated by 250 ms intervals with a blank, gray background. All stimuli in a given generation were tested in random order for a total of five repetitions. The electrical activity of well-isolated single neurons was recorded with epoxy-coated tungsten electrodes (FHC Microsystems). Action potentials of individual neurons were amplified and electrically isolated using a Tucker-Davis Technologies recording system. Recording positions ranged from 5 to 25 mm anterior to the external auditory meatus within the inferior temporal lobe, including the ventral bank of the superior temporal sulcus, lateral convexity, and basal surface. Positions were determined on the basis of structural magnetic resonance images and the sequence of sulci and response characteristics observed while lowering the electrode. A total of 368 object-selective IT neurons were studied with different combinations of experiments.
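The tilt geometry above can be summarized in a small sketch (an illustrative calculation, not code from the study; the counterclockwise-positive sign convention and the representation of counterroll as a signed angle sharing the tilt's sign are assumptions):

```python
def retinal_from_gravitational(grav_deg, tilt_deg, counterroll_deg):
    """Retinal orientation of a gravitationally fixed object.

    tilt_deg: lateral body tilt; counterroll_deg: compensatory ocular
    counter-rolling, given with the same sign as the tilt (it cancels
    part of it). Counterclockwise-positive convention is an assumption.
    """
    residual_rotation = tilt_deg - counterroll_deg  # uncompensated tilt
    return grav_deg - residual_rotation

# A 25° tilt with 6° of counter-roll leaves a 19° residual image
# rotation, so the +25° and -25° tilt conditions differ by 38°
# (~40°) in retinal orientation for the same gravitational object.
```

This is the relation used throughout the analyses: the two body tilts differ by 50° but, after counterroll, by roughly 40° in retinal image orientation.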
All animal procedures were approved by the Johns Hopkins Animal Care and Use Committee (protocol # PR21M442) and conformed to US National Institutes of Health and US Department of Agriculture guidelines.

Stimulus generation

Initially random 3D stimuli evolved through multiple generations under control of a genetic algorithm (Carlson et al., 2011; Srinath et al., 2021; Brincat and Connor, 2004; Brincat and Connor, 2006; Yamane et al., 2008), leading to high-response stimuli used to test object orientation tuning as a function of eye/head/body rotation. Random shapes were created by defining 3D mesh surfaces surrounding medial axis skeletons (Srinath et al., 2021). These shapes were assigned random or evolved optical properties including color, surface roughness, specularity/reflection, translucency/transparency, and subsurface scattering. They were depicted as projecting from (partially submerged in) planar ground surfaces covered with a random scrub grass texture extending toward a distant horizon meeting a blue, featureless sky, with variable ambient light color and lighting direction consistent with random or evolved virtual times of day. Ground surface tilt and object orientation were independent variables of interest as described in the main text. These varied across ranges of 100–200° at intervals of 12.5 or 25°. Ground surface slant, texture gradient, and horizon level, as well as object size and appearance, varied with random or evolved virtual viewing distances. The entire scenes were rendered with multi-step ray tracing using Blender Cycles running on a cluster of GPU-based machines.

Data analysis and statistics

Response rates for each stimulus were calculated by counting action potentials during the presentation window and averaging across five repetitions. Orientation tuning functions were smoothed with boxcar averaging across three neighboring values. Pearson correlation coefficients between object orientation tuning functions in different conditions (in some cases corrected for ocular counter-rolling) were calculated from the averaged, smoothed values. Significance of positive correlations was assessed with a one-tailed randomization t-test (criterion p < 0.05). (There was no a priori reason to predict or test for negative correlations between orientation tuning functions.) A null distribution was created by randomly assigning response values across the tested orientations within each of the two tuning functions and recalculating the t-statistic 10,000 times. Significant biases of population correlation distributions toward positive or negative values were measured with two-tailed randomization t-tests, with exact p-values reported. A null distribution was created by randomly assigning response values across the tested orientations within each of the two tuning functions for each neuron, recalculating the t-statistic for each neuron, and recalculating the correlation distribution center of mass on the correlation domain 10,000 times.
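The smoothing and single-neuron significance test can be sketched in a few stdlib functions (a hypothetical reconstruction, not the study's code; the paper permutes responses and recomputes a t-statistic, but because t is a monotonic function of r for a fixed number of orientations, using r itself as the permutation statistic yields the same p-value):

```python
import math
import random

def boxcar3(values):
    """Smooth a tuning function by averaging each point with its two
    neighbors (edge points average the two available values)."""
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out

def pearson_r(x, y):
    """Pearson correlation coefficient between two tuning functions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def randomization_p(x, y, n_shuffles=10_000, seed=0):
    """One-tailed randomization test for positive tuning correlation:
    shuffle responses across orientations within each tuning function
    and count how often the shuffled r reaches the observed r."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    xs, ys = list(x), list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(xs)
        rng.shuffle(ys)
        if pearson_r(xs, ys) >= observed:
            hits += 1
    return hits / n_shuffles
```

Shuffling within each tuning function, rather than shuffling pairings, matches the null described above: responses are randomly reassigned to the tested orientations in both conditions.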

Population decoding analysis

We pooled data across 89 neurons tested with full scene stimuli at the two monkey tilts and used cross-validated linear discriminant analysis to discriminate matching from non-matching orientations in both the retinal and gravitational reference frames. Ground truth matches were identical (in either gravitational or counter-rolling corrected retinal coordinates, depending on which reference frame was being tested). Ground truth non-matches differed by more than 25°. We equalized the numbers of retinal and gravitational match and non-match conditions by subsampling. This yielded five potential pairs of matches and 20 potential pairs of non-matches for each reference frame. For each member of a test pair, we randomly selected one raw response value for each neuron from among the five individual repetitions for that object orientation. We generated a dataset for all possible test pairs under these conditions. We used Matlab function fitcdiscr to build optimal models for linear discrimination of matches from non-matches based on response patterns across the 89 neurons. We built separate models for retinal and gravitational reference frame match/non-match discrimination. We report the accuracy of the models as 1 – misclassification rate using 10-fold cross validation.
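A simplified stand-in for this decoding analysis can be sketched as follows (hypothetical: the study used Matlab's fitcdiscr, a full linear discriminant; here a nearest-class-mean readout, equivalent to LDA with identity covariance, stands in, and encoding each pair of trials as the vector of per-neuron absolute response differences is an assumed featurization):

```python
import random

def class_means(X, y):
    """Mean feature vector for each class label (0 = non-match, 1 = match)."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
            counts[yi] = 0
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_mean_predict(x, means):
    """Linear two-class readout: assign to the closer class mean."""
    dists = {c: sum((a - b) ** 2 for a, b in zip(x, m))
             for c, m in means.items()}
    return min(dists, key=dists.get)

def kfold_accuracy(X, y, k=10, seed=0):
    """k-fold cross-validated accuracy, i.e. 1 - misclassification rate."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        Xtr = [X[i] for i in idx if i not in held_out]
        ytr = [y[i] for i in idx if i not in held_out]
        means = class_means(Xtr, ytr)
        correct += sum(nearest_mean_predict(X[i], means) == y[i]
                       for i in fold)
    return correct / len(X)

# Synthetic demo: "match" pairs have small per-neuron response
# differences, "non-match" pairs have large ones (5 neurons, 40 each).
rng = random.Random(1)
matches = [[abs(rng.gauss(0.0, 0.3)) for _ in range(5)] for _ in range(40)]
nonmatches = [[abs(rng.gauss(2.0, 0.3)) for _ in range(5)] for _ in range(40)]
acc = kfold_accuracy(matches + nonmatches, [1] * 40 + [0] * 40)
```

The nearest-mean readout illustrates the logic of the match/non-match discrimination; the actual analysis fits a full linear discriminant over the 89-neuron response patterns.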

Data availability

All relevant data are publicly accessible in our GitHub repository https://github.com/amxemonds/ObjectGravity (copy archived at Emonds, 2023). Further information requests should be directed to and will be fulfilled by the corresponding author, Charles E. Connor (connor@jhu.edu).

References

    1. Schworm HD
    2. Ygge J
    3. Pansell T
    4. Lennerstrand G
    (2002)
    Assessment of ocular counterroll during head tilt using binocular video oculography
    Investigative Ophthalmology & Visual Science 43:662–667.

Decision letter

  1. SP Arun
    Reviewing Editor; Indian Institute of Science Bangalore, India
  2. Tirin Moore
    Senior Editor; Howard Hughes Medical Institute, Stanford University, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Object representation in a gravitational reference frame" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Tirin Moore as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) The high tuning correlation between the whole-body tilt conditions could also occur if IT neurons encoded the angle between the horizon and the object in the object-with-horizon experiment, and/or the angle between the object and the frame of the computer monitor which may potentially be visible in the object-alone conditions. The authors will need to think carefully about how to address this confound, or acknowledge clearly that this is an alternate explanation for their findings, which would also dilute the overall novelty of the results. One possibility could be to perform identical analyses on pre-trained deep neural networks. Another could be to quantify the luminance of the monitor, and maybe also how brightly lit other objects are by the monitor in their setup. Finally, object-orientation tuning could be compared in the object-alone and object-in-scene conditions.

2) The authors should provide more details about the torsional eye movements they have measured in each animal. For instance, have the authors measured torsional eye rotations on every trial? Is it fixed always at ±6° or does it change from trial to trial? If it changes, then could the high tuning correlation between the whole-body rotations be simply driven by trials in which the eyes compensated more?

3) A lot of details are dense in the manuscript. The authors should clearly present their control analyses and also the correlation analyses reported in the main figures. Please refer to the specific comments in the individual reviews for details.

Reviewer #1 (Recommendations for the authors):

In addition to the comments in the public review, it would also be good if the authors could quantify the overall tendency of the population in some way, rather than reporting the proportions of neurons that show a high correlation in the two reference frames. For instance, is the average tuning correlation in the absolute gravitational reference frame stronger than the average tuning correlation for the retinal reference frame? Are the proportions of neurons encoding the two reference frames different in the two experiments?

Specific comments:

In Figure 1a, the object orientations given are –50°, 0°, and +50°, but in figures 1c and 1d, we can see that orientations go up to 100°, in both directions. For these bigger rotations, do the objects penetrate the ground surface? Can the authors show more object orientations?

Figure 1a: please rearrange the columns to show the object rotating consistently in one direction (CW or CCW). For instance, swap the leftmost and rightmost columns in the stimuli.

Figure 1e, f – Can the authors quantify the shift from 1c and 1d explicitly? In line 102, it says the shift is about 20°. Is there any variability in the magnitude of shift across trials/neurons etc.? If so, can the authors explain it clearly?

Figure 1,3: The Cyan and pink triangles are not explained clearly at all. The authors should elaborate on this in the Results and in the figure legends.

Figure 1e, f, i, j – We understand that x-axis values are estimated from monkey tilt and torsional rotation. Can the authors show some details on torsional rotation, as in, is this observed for every trial? Is there trial-to-trial variability here? Are there any trials for which there is complete compensation by ocular counter-rolling? Though it is mentioned in the supplementary section (line 548), it is not very clear what is meant by the comment "For all the data from both monkeys".

Figure 2c, d – I suggest the authors move panels c and d to supplementary material, as it is not central to the arguments. Can the authors explain the matched analysis in detail on how it was done?

Line 135 – It says a sample of 99 neurons, but in lines 136-138 while giving the % of each set of neurons, the denominator is 53. Please clarify.

Figure 3: Since there are two neurons shown in this figure, label them as Cell 1 and Cell 2 in the figure itself. Also, it would be better to explicitly mention, which one of the figures 3c or 3e, has the x-axis inferred.

Line 476: Materials and methods: Provide the details of recording sites – left/right hemisphere, probe, and data acquisition process.

Methods: Can the authors show one full set of example stimuli indicating all object orientations used in each experiment?

Reviewer #2 (Recommendations for the authors):

– The data is presented in a very compact form right now. For both Figure 1 and Figure 3, I would have found helpful a figure showing the responses of a cell to the 5 repeated presentations and showing 'each of the stimuli' (and monkey physical orientation) presented for each condition in the gravitational and retinal reference frame comparisons.

And to show such a plot (possibly in Supplementary data) for more example cells. A huge amount of work went into collecting this data, and I think it would really help to bring readers closer to the raw data through more examples and a complete and hand-holding presentation format (even if takes up more pdf pages).

– For plots of single-cell tuning curves, error bars indicating SEM would be helpful.

– The result of the decoding analysis, that one can build decoders for both the gravitational reference frame and the retinal reference frame same-different task, is interesting. To what extent does this depend on specialized mechanisms? If one were to attempt the same decoding using a deep neural network trained on classification by presenting the same images presented to the monkey in the experiment, could one achieve similar decoding for the gravitational frame same/different task? Or would it completely fail?

– Additional discussion of the relation of current findings to known functional architecture of IT would be helpful. For example, the recordings were from AP5 to AP25. Were any differences observed across this span? Were cells recorded in object or scene regions of IT (cf. Vaziri and Connor)?

– Also, how do results relate to the notion of IT cells generating an invariant representation? If IT cells were completely rotation invariant, then all the points should cluster in the top right in their scatter plots, and that is clearly not the case. Is the suggestion then that in general IT cells are less invariant to rotations than to translations, scalings, etc., and furthermore that this selectivity for rotation angle is represented in a mixed reference frame, enabling robust decoding of identity and orientation in retinal and gravitational coordinates? A more explicit statement on the relation of the findings to the idea of IT encoding a hierarchy of increasingly invariant representations would be helpful.

Reviewer #3 (Recommendations for the authors):

1. The authors employ a correlation analysis to examine quantitatively the effect of tilt on orientation tuning. However, it is not clear to me how well the correlation analysis can distinguish the two reference frames (retinal versus gravitational). For instance, for the data in Figure 1, I expect that the retinal reference frame also would have provided a positive correlation although the orientation tuning shifted as predicted in retinal coordinates. Furthermore, a lack of correlation can reflect an absence of orientation tuning. Therefore, I suggest that the authors select first those neurons that show a significant orientation tuning for at least one of the two tilts. For those neurons, they can determine for each tilt the preferred orientation and examine the difference in preferred orientation between the two tilts. Each of the two reference frames provides clear predictions about the expected (absence of) difference between the preferred orientations for the two tilts. Using such an analysis they can also determine whether neurons tested with and without a scene background show effects of tilt on orientation preference that are consistent across the scene manipulation (i.e. without and with scene background). Then the scene data would be useful.

2. I have two issues with the population decoding analysis. First, the authors should provide a better description of the implementation of the decoding analysis. It was unclear to me how the match-nonmatch task was implemented. Second, they should perform this analysis for the object without scene background data, since as I discussed above, the scene data are difficult to interpret.

3. The authors pooled the data of the two monkeys. They should provide also individual monkey data so that the reader knows how consistent the effects are for the two animals.

https://doi.org/10.7554/eLife.81701.sa1

Author response

Essential revisions:

1) The high tuning correlation between the whole-body tilt conditions could also occur if IT neurons encoded the angle between the horizon and the object in the object-with-horizon experiment, and/or the angle between the object and the frame of the computer monitor which may potentially be visible in the object-alone conditions. The authors will need to think carefully about how to address this confound, or acknowledge clearly that this is an alternate explanation for their findings, which would also dilute the overall novelty of the results. One possibility could be to perform identical analyses on pre-trained deep neural networks. Another could be to quantify the luminance of the monitor, and maybe also how brightly lit other objects are by the monitor in their setup. Finally, object-orientation tuning could be compared in the object-alone and object-in-scene conditions.

We agree that a shape-configuration (i.e. overlapping orientation) interaction between horizon and object was possible, as opposed to the horizon serving purely as a gravitational cue. That is why we tested neurons in the isolated object condition. We now make that concern, and the importance of the isolated object condition as a control, explicit in the text discussion of Figure 1 (where we also eliminate the claim that the room was otherwise dark): The Figure 1 example neuron was tested with both full scene stimuli (Figure 1a), which included a textured ground surface and horizon, providing visual cues for the orientation of gravity, and isolated objects (Figure 1b), presented on a gray background, so that primarily vestibular and somatosensory cues indicated the orientation of gravity. The contrast between the two conditions helps to elucidate the additional effects of visual cues on top of vestibular/somatosensory cues. In addition, the isolated object condition controls for the possibility that tuning is affected by a shape-configuration (i.e. overlapping orientation) interaction between the object and the horizon or by differential occlusion of the object fragment buried in the ground (which was done to make the scene condition physically realistic for the wide variety of object orientations that would otherwise appear improbably balanced on a hard ground surface).

This control condition, in which the main results in Figure 2b were replicated, addresses the reasonable concern about the horizon/object shape configuration interaction. In addition, we recognize that remaining visual cues for gravity in the room, including the screen edges, could still contribute to tuning in gravitational coordinates: Similar results were obtained for a partially overlapping sample of 99 IT neurons tested with isolated object stimuli with no background (i.e. no horizon or ground plane) (Figure 2b). In this case, 60% of neurons (32/53) showed significant correlation in the gravitational reference frame, 26% (14/53) significant correlation in the retinal reference frame, and within these groups 13% (7/53) were significant in both reference frames. The population tendency toward positive correlation was again significant in this experiment along both gravitational (p = 3.63 × 10–22) and retinal axes (p = 1.63 × 10–7). This suggests that gravitational tuning can depend primarily on vestibular/somatosensory cues for self-orientation. However, we cannot rule out a contribution of visual cues for gravity in the visual periphery, including screen edges and other horizontal and vertical edges and planes, which in the real world are almost uniformly aligned with gravity and thus strong cues for its orientation (but see Figure 2–supplement figure 1). Nonetheless, the Figure 2b result confirms that gravitational tuning did not depend on the horizon or ground surface in the background condition.

Figure 2–supplement figure 1 shows that the results were comparable for a subset of cells studied with a circular aperture surrounding the floating object, with gray background in the circular aperture and black screen outside it. Under this condition, the circular aperture edge, which conveys no information about the direction of gravity and would maintain a constant relationship to the object regardless of object tilt, would be more high-contrast, salient, and closer to the object than the screen edges.

Finally, we show the reviewer-suggested cell-by-cell comparisons of scene and isolated stimuli, for those cells tested with both, in Figure 2–supplement figure 6. This figure shows 8 neurons with significant gravitational tuning only in the floating object condition, 11 neurons with tuning only in the scene condition, and 23 neurons with significant tuning in both. Thus, a majority of significantly tuned neurons were tuned in both conditions. A two-tailed paired t-test across all 79 neurons tested in this way showed that there was no significant tendency toward stronger tuning in the scene condition. The 11 neurons with tuning only in the scene condition by themselves might suggest a critical role for visual cues in some neurons. However, the converse result for 8 cells, with tuning only in the floating condition, suggests a more complex dependence on cues or a conflicting effect of interaction with the background scene for a minority of cells.

Main text: “This is further confirmed through cell-by-cell comparison between scene and isolated conditions for those cells tested with both (Figure 2–supplement figure 6).”

We do not think the further suggestion of orientation interactions between object and screen edges in the isolated object condition needs mentioning in the paper itself, given that the closest screen edges on our large display were 28° in the periphery, and there is no reason to suspect that IT encodes orientation relationships between distant, disconnected visual elements. Screen edges have been present in most studies of IT, and no such interactions have been reported. However, we will also discuss this point in online responses.

2) The authors should provide more details about the torsional eye movements they have measured in each animal. For instance, have the authors measured torsional eye rotations on every trial? Is it fixed always at ±6° or does it change from trial to trial? If it changes, then could the high tuning correlation between the whole-body rotations be simply driven by trials in which the eyes compensated more?

We now clarify that we could only measure ocular rotation outside the experiment with high-resolution closeup color photography. Our measurements were consistent with previous reports showing that counterroll is limited to 20% of tilt. Moreover, they are consistent with our analyses showing that maximum correlation with retinal coordinates is obtained with a 6° correction for counterroll, indicating equivalent counterroll during experiments. Counterroll would need to be five times greater than previous observations to completely compensate for tilt and mimic the gravitational tuning we observed. For these reasons, counterroll is not a reasonable explanation for our results:

“Compensatory ocular counter-rolling was measured to be 6° based on iris landmarks visible in high-resolution photographs, consistent with previous measurements in humans6,7, and larger than previous measurements in monkeys41, making it unlikely that we failed to adequately account for the effects of counterroll. Eye rotation would need to be five times greater than previously observed to mimic gravitational tuning. Our rotation measurements required detailed color photographs that could only be obtained with full lighting and closeup photography. This was not possible within the experiments themselves, where only low-resolution monochromatic infrared images were available. Importantly, our analytical compensation for counter-rotation did not depend on our measurement of ocular rotation. Instead, we tested our data for correlation in retinal coordinates across a wide range of rotational compensation values. The fact that maximum correspondence was observed at a compensation value of 6° (Figure 1–supplement figure 1) indicates that counterrotation during the experiments was consistent with our measurements outside the experiments.”

3) A lot of details are dense in the manuscript. The authors should clearly present their control analyses and also the correlation analyses reported in the main figures. Please refer to the specific comments in the individual reviews for details.

See below.

Reviewer #1 (Recommendations for the authors):

In addition to the comments in the public review, it would also be good if the authors could quantify the overall tendency of the population in some way, rather than reporting the proportions of neurons that show a high correlation in the two reference frames. For instance, is the average tuning correlation in the absolute gravitational reference frame stronger than the average tuning correlation for the retinal reference frame? Are the proportions of neurons encoding the two reference frames different in the two experiments?

We scatterplot the complete distributions of joint tuning values in Figure 2 (with marginals for the two tuning dimensions), which is the most direct way to convey the entire underlying datasets. We report overall tendencies in terms of the significance of the distance of the mean or center-of-mass from 0 in the positive direction. This conveys the strength of tuning tendencies conditioned by variability in the data. We now point out the comparative strength of the p values:

“Of the 52 neurons with consistent object orientation tuning in one or both reference frames, 63% (33/52) were aligned with gravity, 21% (11/52) were aligned with the retinae, and 15% (8/52) were aligned with both. The population tendency toward positive correlation was strongly significant along the gravitational axis (two-tailed randomization t-test for center-of-mass relative to 0; p = 6.49 × 10–29) and also significant though less so along the retinal axis (p = 5.76 × 10–10).”

Specific comments:

In Figure 1a, the object orientations given are –50°, 0°, and +50°, but in figures 1c and 1d, we can see that orientations go up to 100°, in both directions. For these bigger rotations, do the objects penetrate the ground surface? Can the authors show more object orientations?

We now explain in the Figure 1 caption that at each orientation 15% of the virtual object mass was planted in the ground to provide a physically realistic presentation of an orientation that would be unbalanced if it merely rested on the ground. Additional examples of how object orientation interacted with the ground are shown in Figure 3 and Figure 1—figure supplements 3–10.

“At each object orientation, the object was virtually placed on the ground-like surface naturalistically by immersing or “planting” 15% of its mass below ground, providing physical realism for orientations that would otherwise be visibly unbalanced, and ensuring that most of the object was visible at each orientation. The high-response object shape and orientation discovered in the genetic algorithm experiments was always at the center of the tested range and labeled 0°.”

Figure 1a: please rearrange the columns to show the object rotating consistently in one direction (CW or CCW). For instance, swap the leftmost and rightmost columns in the stimuli.

We are not sure what is desired here. All of the subplots in Figure 1 effectively rotate counterclockwise going from left to right as they are now. This makes sense so that the rotation scale in the response plots can progress from negative numbers to positive numbers going left to right, as is the convention, given the additional convention that counterclockwise rotations are usually considered positive. Maybe there is a confusion about the fact that “0” is the orientation found by the genetic algorithm, and the stimuli were rotated in both directions away from this roughly optimum orientation; this should be cleared up by the new text in the Figure 1 legend.

Figure 1e, f – Can the authors quantify the shift from 1c and 1d explicitly? In line 102, it says the shift is about 20o. Is there any variability in the magnitude of shift across trials/neurons etc? If so, can the authors explain it clearly?

We have changed this to:

“the peaks are shifted right or left by 19° each, i.e. 25° minus the 6° compensation for ocular counter-rotation.”

Figure 1,3: The Cyan and pink triangles are not explained clearly at all. The authors should elaborate on this in the Results and in the figure legends.

We have changed the main text to clarify this:

“When the same data are plotted in the retinal reference frame (Figure 1e and f), the peak near 0° shifts right or left by 19° (25° tilt minus 6° counterrotation of the eyes). This reflects the transformation of retinal information into a new reference frame. Because the eyes were rotated in different directions under the two tilt directions, the overlap of tested orientations in retinal coordinates is limited to seven screen orientations. In addition, to account for ocular counterrotation, the tested orientation values (black dots) in the two curves must be shifted 6° in the positive direction for the –25° tilt and 6° negative for the +25° tilt. Thus, the appropriate comparison points between Figure 1e and f, indicated by the cyan triangles, must be interpolated from the Catmull-Rom spline curves used to connect the tested orientations (black dots). A comparable set of seven comparison points in the gravitational reference frame (Figure 1c and d, pink triangles) falls directly on the tested orientations.”

Figure 1e, f, i, j – We understand that x-axis values are estimated from monkey tilt and torsional rotation. Can the authors show some details on torsional rotation, as in, is this observed for every trial? Is there trial-to-trial variability here? Are there any trials for which there is complete compensation by ocular counter-rolling? Though it is mentioned in the supplementary section (line 548), it is not very clear what is meant by the comment "For all the data from both monkeys".

We expanded and clarified the description of the analysis of compensation of ocular rotation:

“For the tilt experiments on all the neurons, combined across monkeys, we searched for the counterroll compensation that would produce the strongest agreement in retinal coordinates. At each compensation level tested, we normalized and summed the mean squared error (MSE) between responses at corresponding retinal positions. The best agreement in retinal coordinates (minimum MSE) was measured at 12° offset, corresponding to 6° rotation from normal in each of the tilt conditions (lower left).”
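The compensation search described in the quoted passage can be reconstructed in miniature (a hypothetical sketch: the orientation grid, the synthetic Gaussian tuning curves, the linear interpolation, and the omission of the normalization step are all assumptions). For each candidate counterroll, one condition's tuning curve is shifted one way and the other condition's the opposite way, and the squared error between the interpolated curves over their overlap is minimized:

```python
import math

def lin_interp(xs, ys, x):
    """Piecewise-linear interpolation; xs ascending, x within range."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside tested range")

def counterroll_mse(orients, resp_a, resp_b, roll_deg):
    """MSE between the two tilt conditions' tuning curves after
    shifting condition a by +roll_deg and condition b by -roll_deg,
    evaluated on a grid covering the overlap of the shifted ranges."""
    lo, hi = orients[0] + abs(roll_deg), orients[-1] - abs(roll_deg)
    grid = [lo + i * (hi - lo) / 20 for i in range(21)]
    xs_a = [o + roll_deg for o in orients]
    xs_b = [o - roll_deg for o in orients]
    err = sum((lin_interp(xs_a, resp_a, g) - lin_interp(xs_b, resp_b, g)) ** 2
              for g in grid)
    return err / len(grid)

def best_counterroll(orients, resp_a, resp_b, candidates):
    """Candidate per-condition counterroll minimizing the MSE."""
    return min(candidates,
               key=lambda c: counterroll_mse(orients, resp_a, resp_b, c))

# Demo: synthetic Gaussian tuning curves whose peaks are offset by
# +/-6 deg should yield a best per-condition compensation of 6 deg.
ORIENTS = list(range(-100, 101, 5))
curve_a = [math.exp(-((o + 6) ** 2) / 400.0) for o in ORIENTS]
curve_b = [math.exp(-((o - 6) ** 2) / 400.0) for o in ORIENTS]
recovered = best_counterroll(ORIENTS, curve_a, curve_b, [0, 2, 4, 6, 8, 10])
```

A per-condition value of 6° corresponds to the 12° total offset between the two tilt conditions reported in the quoted text.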

As mentioned above, we now clarify as much as possible the methods and limitations of our eye rotation measurements, and we emphasize that our method for compensation did not depend on these measurements but was instead optimized for retinal correlation:

“Compensatory ocular counter-rolling was measured to be 6° based on iris landmarks visible in high-resolution photographs, consistent with previous measurements in humans6,7, and larger than previous measurements in monkeys41, making it unlikely that we failed to adequately account for the effects of counterroll. Eye rotation would need to be five times greater than previously observed to mimic gravitational tuning. Our rotation measurements required detailed color photographs that could only be obtained with full lighting and closeup photography. This was not possible within the experiments themselves, where only low-resolution monochromatic infrared images were available. Importantly, our analytical compensation for counter-rotation did not depend on our measurement of ocular rotation. Instead, we tested our data for correlation in retinal coordinates across a wide range of rotational compensation values. The fact that maximum correspondence was observed at a compensation value of 6° (Figure 1—figure supplement 1) indicates that counterrotation during the experiments was consistent with our measurements outside the experiments.”

Figure 2c, d – I suggest the authors move panels c and d to the supplementary material, as they are not central to the arguments. Can the authors explain in detail how the matched analysis was done?

Figure 2c and 2d are important because they control for the possibility that the larger number of matching positions in the gravitational comparison biases the results toward gravitational correlation. This is explained in the main text:

“The analyses above were based on the full set of orientation comparisons possible for the gravitational reference frame (7), while the experimental design inevitably produced fewer comparisons for the retinal reference frame (5). Rerunning the analyses based on just 5 comparable object orientations in both reference frames (Figure 1, pink and cyan triangles) produced the results shown in Figures 2c and d. For full scene stimuli, this yielded 56% (23/41) significant gravitational alignment, 27% (11/41) retinal alignment, and 17% (7/41) dual alignment (Figure 2c). For isolated object stimuli, this reanalysis yielded 58% (28/48) gravitational alignment, 29% (14/48) retinal alignment, and 13% (6/48) dual alignment (Figure 2d).”

Line 135 – It says a sample of 99 neurons, but in lines 136-138 while giving the % of each set of neurons, the denominator is 53. Please clarify.

As in the description of the scene condition results, percentages are given for neurons with one or both significant results; now clarified:

“In this case, 60% of the 53 neurons with significant object orientation tuning in one or both reference frames (32/53)”

Figure 3: Since there are two neurons shown in this figure, label them as Cell 1 and Cell 2 in the figure itself. Also, it would be better to explicitly mention, which one of the figures 3c or 3e, has the x-axis inferred.

This is now clarified in the figure legend:

“(a,b) Stimuli used to study two different neurons, demonstrating example object orientations in two conditions: with the ground surface, horizon, and sky gradient tilted –25° (clockwise, top row), and with the same scene elements tilted +25° (counterclockwise, second row). The monkey was in a normal upright orientation during these experiments, producing conflicting vestibular/somatosensory cues. The retinal orientation discovered in the genetic algorithm experiments is arbitrarily labeled 0°. (c,d) For one of the example IT neurons, tested with the stimuli in (a), object orientation tuning with respect to the visually cued direction of gravity was consistent across the two ground tilts. (e,f) Correspondingly, this neuron gave very different responses to retinal object orientation values between the two ground tilts. (g,h) A different example IT neuron, tested with the stimuli in (b), did not exhibit consistent object orientation tuning in visually cued gravitational space. (i,j) Instead, this neuron maintained consistent tuning for retinal/screen orientation despite changes in ground tilt.”

Line 476: Materials and methods: Provide the details of recording sites – left/right hemisphere, probe, and data acquisition process.

The electrical activity of well-isolated single neurons was recorded with epoxy-coated tungsten electrodes (FHC Microsystems). Action potentials of individual neurons were amplified and electrically isolated using a Tucker-Davis Technologies recording system. Recording positions ranged from 5 to 25 mm anterior to stereotaxic 0 within the left inferior temporal lobe, including the ventral bank of the superior temporal sulcus, lateral convexity, and basal surface. Positions were determined on the basis of structural magnetic resonance images and the sequence of sulci and response characteristics observed while lowering the electrode.

In addition, locations of individual neurons and distribution between subdivisions of IT are now described in Figure 2—figure supplements 4,5.

Reviewer #2 (Recommendations for the authors):

– The data is presented in a very compact form right now. For both Figure 1 and Figure 3, I would have found helpful a figure showing the responses of a cell to the 5 repeated presentations and showing 'each of the stimuli' (and monkey physical orientation) presented for each condition in the gravitational and retinal reference frame comparisons.

And to show such a plot (possibly in Supplementary data) for more example cells. A huge amount of work went into collecting this data, and I think it would really help to bring readers closer to the raw data through more examples and a complete, hand-holding presentation format (even if it takes up more PDF pages).

– For plots of single-cell tuning curves, error bars indicating SEM would be helpful.

We have added expanded data presentations in Figure 1—figure supplements 3–10, reproduced above, to show each of the stimuli for the two examples (one gravitationally tuned and one retinally tuned) in that figure. In these plots, the individual temporally smoothed tuning curves for each of the 5 repetitions are shown to indicate variability of responses directly. Temporal smoothing is critical because the low number of stimulus repetitions (5) is balanced by close sampling of stimulus orientation in our experimental design.
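The smoothing of sampled tuning curves mentioned above can be illustrated with a simple normalized Gaussian kernel (a generic sketch; the kernel shape and width actually used in the paper are not specified here, and the function name is hypothetical):

```python
import numpy as np

def smooth_tuning(responses, sigma_bins=2.0):
    """Smooth a sampled tuning curve with a Gaussian kernel whose rows are
    normalized to sum to 1, so a flat curve is left unchanged."""
    idx = np.arange(len(responses))
    # Pairwise Gaussian weights between sample positions.
    kernel = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma_bins) ** 2)
    kernel /= kernel.sum(axis=1, keepdims=True)
    return kernel @ responses
```

Averaging neighboring samples this way trades some orientation resolution for reduced trial-to-trial noise, which is the balance invoked above between few repetitions and dense orientation sampling.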

– The result of the decoding analysis, that one can build decoders for both the gravitational and the retinal reference frame same/different task, is interesting. To what extent does this depend on specialized mechanisms? If one were to attempt the same decoding using a deep neural network trained on classification, presented with the same images shown to the monkey in the experiment, could one achieve similar decoding for the gravitational frame same/different task? Or would it completely fail?

We should have explained in the main text that our match/nonmatch decoding model was a simple linear discriminant analysis implemented with the Matlab function fitcdiscr. Given that linear discrimination worked with high accuracy, there was no point in exploring more complex, nonlinear classification schemes like deep networks, which could easily capture linear decoding mechanisms. This is now clarified:

This match/non-match task allowed us to analyze population information about orientation equivalence even though individual neurons were tested using different stimuli with no comparability between orientations. (Across neurons, orientations were aligned according to their order in the tested range, so that each non-match trial involved the same orientation difference, in the same direction, for each neuron.) Our decoding method was linear discriminant analysis of the neural population response patterns for each stimulus pair, implemented with Matlab function fitcdiscr.

The accuracy of these linear models for orientation match/non-match in the gravitational reference frame was 97% (10-fold cross-validation). The accuracy of models for orientation match/non-match in the retinal reference frame was 98%. (The accuracies for analyses based on the partially overlapping population of 99 neurons tested with isolated objects were 81% gravitational and 90% retinal.) The success of these simple linear models shows that information in both reference frames was decodable as weighted sums across the neural population. No complex, nonlinear, time-consuming neural processing would be required. This easy, linear decoding of information in both reference frames is consistent with psychophysical results showing that humans have voluntary access to either reference frame23. High accuracy was obtained even with models based solely on neurons that showed no significant correlation in either gravitational or retinal reference frames (Figure 2a, light gray): 89% for gravitational discrimination and 97% for retinal discrimination. This supports the idea that these neurons carry a mix of retinal and gravitational object orientation signals.
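For readers without Matlab, the discriminant described above can be sketched as a minimal two-class linear discriminant in numpy (an illustration on synthetic data, not the fitcdiscr implementation; the function names and the synthetic population-response setup are assumptions):

```python
import numpy as np

def fit_lda(X, y):
    """Two-class linear discriminant: project onto
    w = pooled_cov^-1 (mu1 - mu0), threshold midway between class means."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, lightly regularized for stability.
    cov = (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)) / 2
    cov += 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(cov, mu1 - mu0)
    b = -0.5 * (w @ (mu0 + mu1))
    return w, b

def lda_predict(w, b, X):
    # Classification is just a thresholded weighted sum of the
    # population response pattern, as emphasized in the text.
    return (X @ w + b > 0).astype(int)
```

The point of the linearity argument above is visible here: the decoder reduces to a single weighted sum across the neural population, so high match/non-match accuracy implies the information is linearly decodable.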

– Additional discussion of the relation of current findings to known functional architecture of IT would be helpful. For example, the recordings were from AP5 to AP25. Were any differences observed across this span? Were cells recorded in object or scene regions of IT (cf. Vaziri and Connor)?

We have added anatomical plots and a table to present these results in Figure 2—figure supplements 4,5.

– Also, how do results relate to the notion of IT cells generating an invariant representation? If IT cells were completely rotation invariant, then all the points should cluster in the top right in their scatter plots, and that is clearly not the case. Is the suggestion then that in general IT cells are less invariant to rotations than to translations, scalings, etc., and furthermore that this selectivity for rotation angle is represented in a mixed reference frame, enabling robust decoding of identity and orientation in retinal and gravitational coordinates? A more explicit statement on the relation of the findings to the idea of IT encoding a hierarchy of increasingly invariant representations would be helpful.

Terrific suggestion; this really is a point of confusion throughout the ventral pathway field. We have added a new second paragraph to the discussion:

“It is important to distinguish this result from the notion of increasing invariance, including rotational invariance, at higher levels in the ventral pathway. There is a degree of rotational invariance in IT, but even by the broadest definition of invariance (angular range across which responses to an optimal stimulus remain significantly greater than average responses to random stimuli) the average is ~90° for in-plane rotation and less for out-of-plane rotations.17 It has often been suggested that the ventral pathway progressively discards information about spatial positions, orientations, and sizes as a way to standardize the neural representations of object identities. But, in fact, these critical dimensions for understanding the physical world of objects and environments are not discarded but rather transformed. In particular, spatial position information is transformed from retinal coordinates into relative spatial relationships between parts of contours, surfaces, object parts, and objects.18 Our results here indicate a novel kind of transformation of orientation information in the ventral pathway, from the original reference frame of the eyes to the gravitational reference frame that defines physical interactions in the world. Because this is an allocentric reference frame, the representation of orientation with respect to gravity is invariant to changes in the observer system (especially lateral head tilt), making representation more stable and more relevant to external physical events. However, our results do not suggest a change in orientation tuning breadth, increased invariance to object rotation, or a loss of critical object orientation information.”

Reviewer #3 (Recommendations for the authors):

1. The authors employ a correlation analysis to examine quantitatively the effect of tilt on orientation tuning. However, it is not clear to me how well the correlation analysis can distinguish the two reference frames (retinal versus gravitational). For instance, for the data in Figure 1, I expect that the retinal reference frame also would have provided a positive correlation although the orientation tuning shifted as predicted in retinal coordinates. Furthermore, a lack of correlation can reflect an absence of orientation tuning. Therefore, I suggest that the authors select first those neurons that show a significant orientation tuning for at least one of the two tilts. For those neurons, they can determine for each tilt the preferred orientation and examine the difference in preferred orientation between the two tilts.

We understand the intent of this suggestion, and it is certainly desirable to have a measure that definitively differentiates between the two reference frames for each neuron. However, the response profiles for object orientations of IT neurons are not always unimodal. Worse, in most cases the breadth of tuning characteristic of IT neurons makes the orientation peak response negligibly different from a wide range of neighboring orientation responses. This can be seen in the examples in notes to Figure S2. As a result, using peak or preferred orientation would be hopelessly noisy and uninformative. The suggested analysis would be good for narrow V1 bar/grating orientation tuning but not for IT object orientation tuning. The only reliable way to measure similarity of orientation tuning is correlation across all the tested orientations, and that is why we use it as the measure of reference frame alignment throughout the paper.
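The correlation measure described above can be sketched concretely (an illustrative reconstruction with hypothetical names; a synthetic Gaussian tuning curve stands in for recorded responses):

```python
import numpy as np

def reference_frame_correlations(resp_a, resp_b, orientations, tilt_deg=25.0):
    """Correlate object orientation tuning across two whole-body tilts
    (-tilt_deg and +tilt_deg) in two candidate reference frames.

    resp_a, resp_b: callables mapping gravitational orientation (deg) to the
    neuron's response in the two tilt conditions.
    """
    ra = np.array([resp_a(o) for o in orientations])
    # Gravitational alignment: same orientation relative to gravity.
    rb_grav = np.array([resp_b(o) for o in orientations])
    # Retinal alignment: same orientation on the retina, i.e. gravitational
    # orientations offset by the total tilt difference (2 * tilt_deg).
    rb_ret = np.array([resp_b(o + 2 * tilt_deg) for o in orientations])
    r_grav = np.corrcoef(ra, rb_grav)[0, 1]
    r_ret = np.corrcoef(ra, rb_ret)[0, 1]
    return r_grav, r_ret
```

A neuron whose tuning is identical relative to gravity in both tilt conditions yields a gravitational correlation near 1 and a weaker retinal correlation, and vice versa for a retinally stable neuron; no estimate of a single preferred orientation is required.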

Each of the two reference frames provides clear predictions about the expected (absence of) difference between the preferred orientations for the two tilts. Using such an analysis, they can also determine whether neurons tested with and without a scene background show effects of tilt on orientation preference that are consistent across the scene manipulation (i.e., without and with scene background). Then the scene data would be useful.

We now make this comparison, using correlation for the reasons just explained, between the two experimental conditions in Figure 2—figure supplement 6.

2. I have two issues with the population decoding analysis. First, the authors should provide a better description of the implementation of the decoding analysis. It was unclear to me how the match-nonmatch task was implemented. Second, they should perform this analysis for the object without scene background data, since as I discussed above, the scene data are difficult to interpret.

We now specify exactly how this analysis was done and report the results for isolated object experiments:

“Our decoding method was linear discriminant analysis of the neural population response patterns for each stimulus pair, implemented with Matlab function fitcdiscr.”

The accuracy of these linear models for orientation match/non-match in the gravitational reference frame was 97% (10-fold cross-validation). The accuracy of models for orientation match/non-match in the retinal reference frame was 98%. (The accuracies for analyses based on the partially overlapping population of 99 neurons tested with isolated objects were 81% gravitational and 90% retinal.) The success of these simple linear models shows that information in both reference frames was easily decodable as weighted sums across the neural population. No complex, nonlinear, time-consuming neural processing would be required.

3. The authors pooled the data of the two monkeys. They should provide also individual monkey data so that the reader knows how consistent the effects are for the two animals.

This is now done in Figure 2—figure supplements 2,3.

https://doi.org/10.7554/eLife.81701.sa2

Article and author information

Author details

  1. Alexandriya MX Emonds

    1. Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, United States
    2. Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, United States
    Present address
    University of Chicago, Chicago, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
  2. Ramanujan Srinath

    1. Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, United States
    2. Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, United States
    Present address
    Department of Neurobiology and Neuroscience Institute, University of Chicago, Chicago, United States
    Contribution
    Software, Formal analysis
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-1832-7250
  3. Kristina J Nielsen

    1. Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, United States
    2. Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Validation, Investigation, Methodology, Writing – original draft, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
    Additional information
    Senior authors
    ORCID iD: 0000-0002-9155-2972
  4. Charles E Connor

    1. Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, United States
    2. Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Investigation, Methodology, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    connor@jhu.edu
    Competing interests
    No competing interests declared
    Additional information
    Senior authors
    ORCID iD: 0000-0002-8306-2818

Funding

National Institutes of Health (EY029420)

  • Kristina J Nielsen
  • Charles E Connor

Office of Naval Research (N00014-22206)

  • Charles E Connor

Office of Naval Research (N00014-18-1-2119)

  • Charles E Connor

National Institutes of Health (NS086930)

  • Charles E Connor

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors acknowledge the design and manufacturing contributions of William Nash, William Quinlan, and James Garmon, the software and hardware engineering contributions of Justin Killibrew, and the animal care, handling, training, and surgery contributions of Ofelia Garalde. Dr. Amy Bastian commented on the manuscript.

Ethics

All animal procedures were approved by the Johns Hopkins Animal Care and Use Committee (protocol # PR21M422) and conformed to US National Institutes of Health and US Department of Agriculture guidelines.

Senior Editor

  1. Tirin Moore, Howard Hughes Medical Institute, Stanford University, United States

Reviewing Editor

  1. SP Arun, Indian Institute of Science Bangalore, India

Version history

  1. Received: July 8, 2022
  2. Preprint posted: August 7, 2022 (view preprint)
  3. Accepted: July 19, 2023
  4. Version of Record published: August 10, 2023 (version 1)

Copyright

© 2023, Emonds et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

  1. Alexandriya MX Emonds
  2. Ramanujan Srinath
  3. Kristina J Nielsen
  4. Charles E Connor
(2023)
Object representation in a gravitational reference frame
eLife 12:e81701.
https://doi.org/10.7554/eLife.81701

