A picture is worth more than one word

The brain’s visual areas independently process rich levels of information from a scene to help identify objects while retaining other details.

Image credit: Liam Huang via Flickr (CC BY 2.0)

When looking at a picture, we can quickly identify a recognizable object, such as an apple, applying a single word label to it. Although extensive neuroscience research has focused on how human and monkey brains achieve this recognition, our understanding of how the brain and brain-like computer models interpret other complex aspects of a visual scene – such as object position and environmental context – remains incomplete.

In particular, it was not clear to what extent object recognition comes at the expense of other important scene details. For example, various aspects of the scene might be processed simultaneously. On the other hand, general object recognition may interfere with processing of such details.

To investigate this, Lindsey and Issa analyzed 12 monkey and human brain datasets, as well as numerous computer models, to explore how different aspects of a scene are encoded in neurons and how these aspects are represented by computational models. The analysis revealed that preventing effective separation and retention of information about object pose and environmental context worsened object identification in monkey cortex neurons. In addition, the computer models that were the most brain-like could independently preserve the other scene details without interfering with object identification.

The findings suggest that human and monkey high level ventral visual processing systems are capable of representing the environment in a more complex way than previously appreciated. In the future, studying more brain activity data could help to identify how rich the encoded information is and how it might support other functions like spatial navigation. This knowledge could help to build computational models that process the information in the same way, potentially improving their understanding of real-world scenes.