Perceiving animacy in ‘identical’ images

  1. Johns Hopkins University, Baltimore, United States

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.


Editors

  • Reviewing Editor
    Xilin Zhang
    Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, South China Normal University, Guangzhou, China
  • Senior Editor
    Yanchao Bi
    Peking University, Beijing, China

Reviewer #1 (Public review):

Summary:

Evidence for visual representation of animacy.

Strengths:

This is a very cool paper that casts light on a persistent problem in the psychology and philosophy of visual representation: is there high-level perception? Every vision scientist agrees that low-level features such as shape, color, texture, motion and spatial frequency are represented in visual perception, but there is a great deal of controversy about the representation of high-level properties such as causation, faces, agency and animacy. Animacy is especially problematic because there are large differences in line curvature between stimuli that represent animate and inanimate items.

This article uses a novel approach: visual "anagrams", pairs of stimuli that are exactly the same image except that one is rotated 90 degrees relative to the other. The authors found persistent differences in visual processing between animate and inanimate stimuli. (Of course, the stimuli aren't animate; they represent animate items.) For example, there were processing differences for changes between animate and inanimate items (rabbit to boot) that were not present for changes within a category (rabbit to dog). They also showed such differences in two kinds of visual search tasks.

Of course, there are feature differences that exploit orientation. A classic example is the difference between a square and a diamond that is produced from the square by rotating it 45 degrees.

The authors addressed one aspect of this challenge, concerning some of these orientation-dependent features, by using silhouettes. There was no search advantage for silhouetted stimuli.

Weaknesses:

I thought this was an excellent submission. I have two suggestions for revision:

(1) I thought that experiment 7 should have been described in more detail, with the upshot explained better. What exactly do the authors take it to show?

(2) There should be a candid discussion of what the loose ends are and how they might be addressed. It would be good to have some examples like the square/diamond case with some indication of what would address such challenges.

Reviewer #2 (Public review):

Summary:

The authors present a creative approach using visual anagrams matched on low-level image statistics to isolate animacy from low-level visual features and report consistent effects of animacy on visual working memory and attention. While this is a thoughtful design and is well executed across seven pre-registered experiments, it remains unclear whether the reported effect is truly driven by animacy, as opposed to broader differences in ensemble statistics or semantic structure across the "mixed animacy" versus "uniform animacy" conditions. As such, the interpretation of a "pure" animacy effect may be overstated.

Strengths:

(1) An important methodological advance in controlling low-level confounds that have historically complicated the study of animacy.

(2) The converging effects across multiple experiments, together with the pre-registered design, strengthen the reliability of the reported findings.

Weaknesses:

(1) Specificity of the animacy effect vs. category-level ensemble structure

The central claim is that animacy itself drives the observed effects. However, the key manipulation ("mixed animacy" versus "uniform animacy") also introduces differences in category-level ensemble structure. For example, in Experiments 1-2, cross-category change detection (e.g., dog to chair) may be easier not because of animacy per se, but because of a change in overall ensemble statistics (Brady & Alvarez, 2011, 2015). In addition, since each display contains five objects (two in one category and three in the other category), cross-category changes may also alter category balance in a way that further facilitates detection. In contrast, within-category changes preserve both ensemble structure and category composition, making them more difficult to detect.

Brady, T. F., & Alvarez, G. A. (2011). Hierarchical encoding in visual working memory: Ensemble statistics bias memory for individual items. Psychological Science.

Brady, T. F., & Alvarez, G. A. (2015). Contextual effects in visual working memory reveal hierarchically structured memory representations. Journal of Vision.

(2) Limited stimulus set and potential learning effects

The relatively small stimulus set (six anagram pairs) and repeated exposure raise the possibility of learning or familiarity effects. Does performance change over time? For example, are there meaningful differences between early and late trials (e.g., first 10% vs. last 10%)? If such differences are present, they could suggest the development of task-specific strategies or increased efficiency with repeated exposure, rather than stable effects driven by the experimental manipulation itself.
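The early-versus-late comparison suggested above could be sketched roughly as follows. This is only an illustration, not the authors' analysis: the function name `early_late_accuracy`, the `fraction` parameter, and the trial data are all hypothetical, and a real analysis would compare these values across participants and conditions.

```python
from statistics import mean


def early_late_accuracy(correct, fraction=0.1):
    """Return (early, late) mean accuracy over the first and last
    `fraction` of a participant's trials (hypothetical helper)."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    # Use at least one trial per window, even for short sessions.
    n = max(1, int(len(correct) * fraction))
    return mean(correct[:n]), mean(correct[-n:])


# Hypothetical trial-by-trial accuracy (1 = correct, 0 = incorrect), in presentation order.
trials = [0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1]
early, late = early_late_accuracy(trials, fraction=0.25)
print(f"early = {early:.2f}, late = {late:.2f}")
```

A reliably higher late-window accuracy in data of this kind would be consistent with learning or strategy development, whereas similar early and late values would favor a stable effect.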

(3) Role of semantics

Although the anagram paradigm effectively controls low-level visual features, it still relies on high-level semantics (e.g., "dog" vs. "boot"). These stimuli differ not only in animacy but also along other semantic dimensions such as natural versus manmade categories. From a semantic standpoint, it remains unclear whether the observed effects can be uniquely attributed to animacy or whether they reflect broader conceptual distinctions.

Reviewer #3 (Public review):

Summary:

This study makes clever use of generative AI to create stimuli that are pixel-for-pixel identical but which have radically different meanings depending on their orientation, to investigate the perception of animacy while retaining control over low-level image features (so-called 'anagram' stimuli).

The authors present seven elegantly designed experiments in a commendably compact format.

Experiments 1 and 2 involved a working memory paradigm in which participants had to spot which of five objects in an array changed after a pause. Importantly, the changed object was an anagram stimulus that in one orientation matched the animacy/inanimacy of the changed object, and in the other orientation was the opposite (e.g., a rabbit is replaced by either a dog or a boot, where the dog and boot stimuli are actually identical, just rotated by 90 degrees). They found a difference in accuracy depending on whether the animacy of the objects matched.

Experiments 3 and 4 used a visual search task in which the participants had to localize the target, and the distractors were anagrams that either matched the target in terms of animacy or did not. There was a significant cost in terms of response time when the animacy of the target was the same as that of the distractors. Experiments 5 and 6 also used a similar visual search design, except that the task was to determine if the target was present or absent from the display, and the distractors again either matched or differed from the target in terms of animacy. Again, the authors found slower responses when the distractor arrays matched the animacy of the target than when they differed.

An obvious potential concern about the studies is addressed by Experiment 7. It is unclear if the observed effects are related to the specific orientations of the target and distractor stimuli selected in each condition. For example, it could be that all the animate versions of the anagrams involved tall and skinny shapes, while all the inanimate versions involved wide and short objects, due to the 90-degree rotational difference between the two versions of the stimuli. To control for this, the authors repeated the visual search experiment but with convex-hull silhouettes of each of the stimuli. In other words, all targets and distractors from each trial were replaced by a black splotch with approximately the same overall outline (envelope) as the corresponding stimulus. Importantly, in contrast to the anagram stimuli, the silhouettes had no meaningful semantic interpretation, and their animacy did not change depending on their orientation.

Strengths:

The main strength is the elegant use of stimuli that control almost perfectly for low-level image features.

Weaknesses:

My only real concern about the study is whether the findings truly provide evidence for a high-level visual representation of animacy independent of the low-level stimulus characteristics, or whether, instead, the effects are essentially semantic priming, which is independent of visual processing per se. For example, if all the stimuli in the experiments were replaced with the verbal names of the depicted objects instead of pictures, would we expect different results? Words can also access semantic representations of the animacy of objects, and also don't suffer from low-level visual confounds. It would be helpful to add a discussion of this possibility to the article.

Reviewer #4 (Public review):

In this article, the authors investigate whether perceived animacy influences visual processing independently of lower-level visual features by using "visual anagrams." Across seven experiments, they test whether animacy, isolated from many lower-level visual properties, structures visual working memory and guides visual attention. The central claim is that the visual system may represent animacy itself, rather than animacy emerging solely from associations among low-level visual properties.

I find this investigation compelling. The experiments described provide strong control over several lower-level visual features, including curvature, texture, and related image properties. However, the visual anagrams are not pixelwise-identical across orientations. Because the images are rotated, the retinal configuration of pixels and the spatial organization of some low- to mid-level shape features also change. As a result, the configural arrangement of mid-level visual features may still contribute to perceived animacy.
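The distinction drawn here can be illustrated with a toy example. The sketch below assumes a made-up 3x3 grayscale grid and a hypothetical helper `rotate90`; it is not the authors' stimuli or code. It shows that a 90-degree rotation preserves the set of pixel values (and hence any histogram-based statistic) while changing which value sits at each location.

```python
def rotate90(image):
    """Rotate a 2D list of pixel values 90 degrees clockwise (hypothetical helper)."""
    return [list(row) for row in zip(*image[::-1])]


# Hypothetical 3x3 grayscale "image" with a bright right-hand column.
image = [
    [0, 0, 9],
    [0, 5, 9],
    [0, 0, 9],
]
rotated = rotate90(image)

# Same multiset of pixel values before and after rotation...
same_pixel_values = sorted(sum(image, [])) == sorted(sum(rotated, []))
# ...but not the same value at each (row, column) position.
same_layout = image == rotated

print(same_pixel_values, same_layout)
```

In this toy case the bright column becomes a bright bottom row, which is the kind of change in spatial configuration that could carry mid-level shape cues.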

I encourage the authors to discuss how independent perceived animacy is in this context from the contribution of mid-level visual features, such as configural shape cues that are diagnostic of animacy. This distinction would help sharpen the interpretation of the results and more precisely define the level of visual representation isolated by the visual-anagram approach.

Additionally, previous studies have argued that low- and mid-level curvilinear features may contribute to animate/inanimate categorization, and may in some cases be sufficient to support such distinctions (e.g., PMID: 33798259; PMID: 28654965). I encourage the authors to clarify how these previous findings on curvilinearity and rectilinearity fit with the overarching claim of the current study, namely that the visual system may represent animacy itself rather than animacy emerging solely from associations among lower-level visual properties.
