Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Editors
- Reviewing Editor: Arun SP, Indian Institute of Science Bangalore, Bangalore, India
- Senior Editor: Joshua Gold, University of Pennsylvania, Philadelphia, United States of America
Reviewer #1 (Public Review):
In this study, the authors have compared object recognition in humans and rats. To this end, they trained rats to touch a target object shown along with a distractor object. The rats were initially trained on a base pair, and then tested on sets of variant pairs where the target or distractor could be transformed through size, position, 3D rotation or lighting variations. In addition, the authors used a cDNN to find image pairs that would elicit different performance from early vs late layers, and tested rats and humans on these pairs (zero vs high and high vs zero protocols). A similar protocol was used for humans to obtain their performance on the base pair and test pairs. Finally, the authors computed the correlation between cDNN performance at each layer and rat or human performance across all test protocols. The main finding is that rat performance is best matched by earlier cDNN layers than human performance is. Based on this, the authors conclude that humans and rats show contrasting performance on object recognition.
General comments
Whether rats and humans have similar object representations is an interesting and fundamental question, and I commend the authors for their extensive matched experiments on rats and humans. However, the conclusions must be tempered by the fact that the authors are testing a limited set of object variations derived from just two objects. There are also potentially substantial differences in the tasks given to rats and humans, if I understand the methods and procedures correctly. My concerns are detailed below.
My main concern is that the authors find very low agreement between rats and humans on comparable tasks, but it would be important for them to identify qualitative differences in performance. For instance, can they say that rats rely on low-level visual cues more than humans do? They could compare several low-level visual models (see below) and report how human and rat accuracy compares to each of these models. Since the visual representations of cDNNs are unknown, such a comparison would be useful.
The authors should also discuss the potential reasons for the human-rat differences and, importantly, discuss whether these differences arise from the rather unusual training approach used in rats (i.e. to identify one item among a single pair of images), or perhaps from visual differences in the stimuli used (what were the image sizes used in rats and humans?). Can they address whether rats trained on more generic visual tasks (e.g. same-different, or category matching tasks) would show similar performance to humans?
I also found that a lot of essential information is not conveyed clearly in the manuscript. Perhaps it is there in earlier studies but it is very tedious for a reader to go back to some other studies to understand this one. For instance, the exact number of image pairs used for training and testing for rats and humans was either missing or hard to find out. The task used on rats was also extremely difficult to understand. An image of the experimental setup or a timeline graphic showing the entire trial with screenshots would have helped greatly.
The authors state that the rats received random reward on 80% of the trials, but is that on 80% of the correctly responded trials or on 80% of trials regardless of the correctness of the response? If these are free choice experiments, then the task demands are quite different. This needs to be clarified. Similarly, the authors mention that 1/3 of the trials in a given test block contained the old base pair - are these included in the accuracy calculations?
It would be good for the authors to articulate more clearly the nature of the differences between humans and rats. For instance, rat behaviour was found to be correlated with low-level image properties like brightness, whereas presumably, human behaviour is not. It would be useful if the authors could compare rat behaviour against several possible alternative models, including the cDNN layers in Figure 4. These models could include other rats (giving a reliability estimate), luminance-based models, contrast-based models, models based on V1 simple cells, etc. These models would further elucidate the nature of the rat performance. A similar analysis could be done for human performance.
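The comparison suggested above could be sketched roughly as follows. This is only an illustration, not the authors' analysis: it assumes that per-image-pair accuracies are available for the animals and that each candidate model yields one score per image pair. The function names (`spearman`, `compare_models`) and all numbers in the usage example are hypothetical placeholders, and rank correlation is just one possible choice of agreement measure.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def compare_models(behaviour, models):
    """Correlate per-pair behavioural accuracy with each model's per-pair score."""
    return {name: spearman(behaviour, scores) for name, scores in models.items()}


if __name__ == "__main__":
    # Made-up placeholder values for five image pairs (NOT data from the study).
    rat_accuracy = [0.55, 0.62, 0.70, 0.81, 0.90]
    candidate_models = {
        "luminance_difference": [0.10, 0.20, 0.25, 0.40, 0.60],  # hypothetical
        "contrast_difference": [0.50, 0.10, 0.40, 0.30, 0.20],   # hypothetical
        # Held-out animals as a "model" give a reliability estimate.
        "other_rats": [0.58, 0.60, 0.73, 0.79, 0.88],            # hypothetical
    }
    for name, rho in compare_models(rat_accuracy, candidate_models).items():
        print(f"{name}: rho = {rho:.2f}")
```

Treating the accuracy of held-out animals as one more "model" gives a reference value against which the low-level models and the network layers can be judged.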
The authors injected noise into the stimuli given to the cDNN to match its accuracy to that of the rats. However, that noise could potentially interact with the signal in the cDNN and further influence the results, which could introduce a hidden confound. Can they acknowledge/discuss this possibility?
The authors claimed that the discrimination task in rats was more dependent on concavity than on component arrangement (Figure 1, left panel). But that could be just an artifact of sampling more values of concavity than of component arrangement. In that case, these two attributes are not comparable at all. Could the authors address this issue in some way?
Reviewer #2 (Public Review):
Schnell et al. performed two extensive behavioral experiments concerning the processing of objects in rats and humans. To this aim, they designed a set of objects parametrically varying along alignment and concavity and then they used activations from a pretrained deep convolutional neural network to select stimuli that would require one of two different discrimination strategies, i.e. relying on either low- or high-level processing exclusively. The results show that rodents rely more on low-level processing than humans.
Strengths:
1. The results are challenging and call for a different interpretation of previous evidence. Indeed, this work shows that common assumptions about task complexity and visual processing are probably biased by our personal intuitions and are not equivalent in rodents, which instead tend to rely more on low-level properties.
2. This is an innovative (and assumption-free) approach that will prove useful to many visual neuroscientists. Personally, I second the authors' excitement about the proposed approach, and its potential to overcome the limits of experimenters' creativity and intuitions. In general, the claims seem well supported and the effects sufficiently clear.
3. This work provides an insightful link between rodent and human literature on object processing. Given the increasing number of studies on visual perception involving rodents, these kinds of comparisons are becoming crucial.
4. The paper raises several novel questions that will prompt more research in this direction.
Weaknesses:
1. There are a few inconsistencies in the number of subjects reported. Sometimes 45 humans are mentioned and sometimes 50. Probably they are just typos, but it's unclear.
2. A few aspects mentioned in the introduction and results are only defined in the Methods, making the manuscript a bit hard to follow (e.g. the alignment dimension); thus I often had to jump from the main text to the Methods to get a sense of their meaning.
3. The choices related to the stimulus design and the network used to categorize them are not fully described and would benefit from some further clarification/justification. The choice of alignment and concavity as baseline properties of the stimuli is not properly discussed. Also, from the low correlations I got the feeling that AlexNet is just not a good model of rat visual processing. This can indeed be interpreted as further evidence of what the authors are trying to demonstrate, but it might also be an isolated case.
4. Many important aspects of the task are not fully described in the Methods (e.g. size of the stimuli, reaction times and basic statistics on the responses).
Reviewer #3 (Public Review):
Schnell and colleagues trained rats on a visual categorization task. They found that rats could discriminate objects across various image transformations. Rat performance correlated best with late convolutional layers of an artificial neural network. In contrast, human performance showed the strongest correlation with higher, fully connected layers, indicating that rats employed simpler strategies to accomplish this task as compared to humans. This is a methodologically rigorous study. The authors tested a substantial number of rats across a large variety of stimuli. One notable strength is the use of neural networks to generate stimuli with varying levels of complexity. This approach shows significant potential as a principled model for conducting studies on object recognition and other related visual behavioral phenomena. The data strongly support the conclusion that rats and humans rely on different visual features for discrimination tasks. Overall, this is a valuable study that provides novel, important insights into the visual capabilities of rats. However, some aspects of the study need further clarification. The study does not provide clear insights into the visual features that enable rats to perform these discriminations. The relationship between neural network layers and specific aspects of visual behavior is not well-defined, representing a key limitation of the current work. Further, the current analyses do not adequately address the consistency of visual behaviors across different rats or whether different rats rely on the same visual features to accomplish the task. Lastly, rodent performance was substantially lower compared to humans and generally worse than neural network classification. The factors contributing to this disparity are unclear.