A battery of image classification challenges reveals shared and distinct object categorization behavior across monkeys, humans, and deep networks

Han Zhang; Zhihao Zheng; Jiaqi Hu; Qiao Wang; Mengya Xu; Zhaojiayi Zhou; Zixuan Li; Gouki Okazawa

doi:10.7554/eLife.111725.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Reviewing Editor
Shuo Wang
Washington University in St. Louis, St. Louis, United States of America
Senior Editor
Joshua Gold
University of Pennsylvania, Philadelphia, United States of America

Reviewer #1 (Public review):

Summary:

This study presents a systematic behavioral characterization of object classification abilities in macaque monkeys using a high-throughput touchscreen-based paradigm. The work shows that monkeys can learn and generalize many binary object classification rules, and compares their behavior with humans and computational models. A key finding is that monkey behavior is more closely aligned with visual deep neural networks, whereas human behavior is better captured by language-informed models. The study provides a useful benchmark for understanding visually grounded object categorization in nonhuman primates.

Strengths:

The study introduces a scalable and well-controlled behavioral paradigm for testing many object classification rules in macaques. The comparison across monkeys, humans, and computational models is a major strength and makes the work broadly relevant to visual neuroscience, comparative cognition, and computational modeling. The results provide an informative framework for distinguishing categorization based primarily on visual representations from categorization supported by semantic or language-based knowledge.

Weaknesses:

Some aspects of the interpretation would benefit from clarification. In particular, it remains somewhat unclear what stimulus-level factors drive image difficulty, how much training performance reflects general rule learning versus repeated reinforcement of specific images, and whether monkeys and humans apply the same category rules. The link between macaque IT representations and monkey behavior is also suggestive but not yet fully resolved, given the limited and separate neural dataset.

https://doi.org/10.7554/eLife.111725.1.sa2

Reviewer #2 (Public review):

Summary:

The paper tackles a very interesting question and provides a solid and systematic piece of data that may be useful for numerous NeuroAI works in the future. The question is how well macaque monkeys, whose visual systems are "pretrained" but lack human knowledge, can learn to categorize images based on different kinds of (sometimes arbitrary) category definitions. In general, I love the paper, and I think both the data and their presentation are beautiful.

Strengths:

(1) The authors developed a scalable method for training and studying this behavior, and did an exhaustive evaluation of monkeys' behavior and learning process.

(2) Beyond the behavioral results, they performed extensive analysis and control experiments to isolate the cues monkeys use to perform the categorization.

(3) The extensive comparison of behavior with deep neural networks is also super interesting.

(4) The authors performed a very careful examination of generalization behavior in monkeys, similar to standard practise in machine learning.

(5) The presentation of the data is very beautiful and deliberately designed; kudos to the authors for their efforts!

(6) I really enjoyed the further categorization task based on human knowledge, and the arbitrary rule task; this really pushes our understanding of the visual categorization and learning capability of monkeys.

(7) The examination of *learning dynamics* in humans vs. monkeys is also quite interesting, i.e., humans can "understand the rule" and learn much faster, whereas monkeys learn over a few days.

Weaknesses:

(1) Though all the results are pretty cool, the organization of the results, figures, and sections could be modified to flow even better.

(2) Maybe provide DNN categorization and generalization results for the non-main monkey experiments (Figures 2 and 3); those comparisons could be really interesting too!

https://doi.org/10.7554/eLife.111725.1.sa1

Author response:

We sincerely thank the editors and reviewers for their time and thoughtful feedback on our manuscript. The reviewers' constructive comments have been very helpful in guiding our revision plan. Below, we outline our plan.

In response to Reviewer #1's comments on clarifying the factors that affect image difficulty and categorization rules, we will implement several revisions. First, to clarify what drives image difficulty, we will test whether image typicality within categories, quantified using methods such as Kramer et al. (2023; Sci Adv 9.17: eadd2981), can explain monkey categorization performance. Second, we will examine whether performance on generalization images depends on their similarity to specific repeated images and on their category typicality. Third, to address whether monkeys and humans apply similar category rules, we will focus on images for which monkeys consistently made errors and examine whether these same images also yielded lower performance (i.e., longer reaction times) in humans.

Reviewer #1 also raised an important question about how well macaque IT representations and behavior align. The IT categorization performance estimated in our manuscript is currently lower than monkey behavioral performance, but this may reflect the limited number of recorded neurons. We will estimate how IT performance scales with neuron count, derive its ceiling, and compare it with monkey and human behavior.

In response to Reviewer #2's suggestion to enhance narrative flow, we will reorganize the text and adjust the ordering of certain figures and sections to ensure smoother transitions between findings and analyses. Specifically, we will more clearly state which parts of the manuscript establish monkeys' categorization ability and which parts compare their behavior with models or humans before performing a triangular comparison across all three.

Regarding Reviewer #2's suggestion to test DNN performance on control experiments (non-natural stimuli, arbitrary categorization), we agree this is an excellent addition. We will perform these analyses and plan to report the results in the revised manuscript.

We believe these revisions will substantially strengthen the manuscript and fully address the reviewers' feedback.

https://doi.org/10.7554/eLife.111725.1.sa0
