Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Read more about eLife's peer review process.

Editors
- Reviewing Editor: Joshua Gold, University of Pennsylvania, Philadelphia, United States of America
- Senior Editor: Joshua Gold, University of Pennsylvania, Philadelphia, United States of America
Reviewer #1 (Public review):
Summary:
This paper describes a complex series of studies that measure and explain object recognition in mice. The authors demonstrate that mice can solve an object recognition task, that object identity is decodable in different regions of cortex, and that this decodability can, to some extent, be captured by extant theory on object manifolds in deep neural networks. The authors further add correlational analyses of the time courses of object discriminability to bolster their claim of an object-processing hierarchy in the mouse cortex.
The behavioral and neural data described in this paper are likely to be of interest to the general neuroscience community. That said, I have some issues with the analyses, modeling, and image dataset that I'll detail below.
Strengths:
(1) The behavioral work is incredibly cool. Getting mice to solve this task is a real achievement and opens up new avenues for mice as a model for complex visual tasks.
(2) Similarly, the neural recordings are astounding in their scale.
(3) This could be the most complete demonstration of a primate-analogous object processing network in the mouse.
Weaknesses:
No new weaknesses were noted by this reviewer.
Reviewer #2 (Public review):
Summary:
The paper argues that mice are capable of some view-invariant object recognition and that some of their visual areas (especially LM, LI, and AL) carry linearly decodable signals that could, in principle, support this process. Further, it argues that the population code in those areas makes linear decoding easier in two ways: the object manifolds have fewer dimensions and a smaller radius.
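[Editorial context, not the reviewer's wording: in the manifold-capacity framework such papers typically draw on (Chung, Lee & Sompolinsky, 2018), the number of object manifolds per neuron that a linear readout can separate scales, in the large-margin regime, approximately as:]

```latex
% Approximate linear classification capacity of object manifolds with
% effective radius R_M and effective dimension D_M
% (Chung, Lee & Sompolinsky, 2018), valid when R_M * sqrt(D_M) >> 1:
\alpha_M \;\simeq\; \frac{1 + R_M^{-2}}{D_M}
```

[So shrinking either the radius or the dimensionality of the manifolds raises the capacity of a linear readout, which is why the two geometric changes claimed here both point in the same direction.]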
Strengths:
It is very useful to see the performance of the mice in this difficult task, and to compare it to the performance of neurons in the mouse visual system. It is also useful to see analyses of the neural code that seek to understand how the code in some visual areas may be particularly suited to decoding object identity.
Weaknesses:
Though the paper has improved from the previous submission, there are still some open questions, especially about whether some lower-level properties of the neurons (such as receptive field location) might explain the differences between visual areas. This and other concerns are outlined below.
(1) Do the signals from the visual areas outperform or underperform the mice? It is hard to tell, because for mice we get numbers in percent correct (Figure 1e, based on 2 alternatives), whereas for visual areas we get numbers in bits (Figure 2c, where it is not clear whether there are 2 or 4 alternatives). This makes it very hard to compare the two. The authors should provide a statement or figure where readers can compare the two. Also, if the behavioral data are obtained with 2AFC, why not run the analyses as 2AFC too?
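[For illustration of the unit mismatch raised here: percent correct maps onto bits only under additional assumptions. A minimal sketch, assuming a decoder whose errors fall uniformly on the wrong alternatives, which the paper's decoder need not satisfy:]

```python
import numpy as np

def accuracy_to_bits(p, n_classes=2):
    # Mutual information (bits) between true and decoded labels for an
    # idealised "symmetric" decoder: accuracy p, with errors spread
    # uniformly over the n_classes - 1 wrong labels. Illustrative only.
    eps = 1e-12
    p = np.clip(p, eps, 1 - eps)
    h = -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # entropy of hit/miss
    spread = (1 - p) * np.log2(n_classes - 1) if n_classes > 2 else 0.0
    return np.log2(n_classes) - h - spread

# 65% correct on 2 alternatives carries at most ~0.07 bits; on 4
# alternatives, ~0.51 bits. The number of alternatives therefore
# matters greatly for any bits-vs-behavior comparison.
print(accuracy_to_bits(0.65, 2), accuracy_to_bits(0.65, 4))
```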
(2) Differences in discriminability across objects (Figure 1f). Are these differences also seen for the model based on the difference of Gaussians? (The authors should add those predictions to the plot.) If so, this could further point to possible low-level explanations. It is already quite interesting that the difference of Gaussians model predicts ~58% accuracy, which is not far from the ~65% accuracy of the mice.
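[For concreteness, a low-level baseline of the kind invoked here could be built roughly as follows. This is a minimal sketch, not the authors' model: the filter scales `sigma_c` and `sigma_s` are placeholders, and the nearest-centroid readout is one arbitrary choice of classifier.]

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_features(img, sigma_c=1.0, sigma_s=3.0):
    # Difference-of-Gaussians: center Gaussian minus surround Gaussian.
    # sigma_c and sigma_s are placeholder scales, not fitted values.
    return gaussian_filter(img, sigma_c) - gaussian_filter(img, sigma_s)

def per_object_accuracy(images, labels):
    # Nearest-centroid readout on DoG features, reported per object.
    # images: (n_trials, H, W) float array; labels: (n_trials,) ids.
    # No train/test split here for brevity; a real baseline would
    # cross-validate before comparing against the mice.
    feats = np.stack([dog_features(im).ravel() for im in images])
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    dists = ((feats[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
    pred = classes[np.argmin(dists, axis=1)]
    return {c: float(np.mean(pred[labels == c] == c)) for c in classes}
```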
(3) Similarly, in a later figure about decoding visual cortical activity, the authors should show a similar breakdown by object. Are certain objects more decodable than others?
(4) Number of neurons. It is wonderful to see so many neurons (489,182, i.e., an average of ~15k per mouse). But might the same neurons have been recorded multiple times? Has a tool like ROICaT or similar been run to exclude this? If not, that is ok, but the authors should add a sentence in Results to indicate that these are not unique neurons (some neurons may be duplicates or triplicates).
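[As an illustration of the kind of check meant here (not what ROICaT actually does internally), one crude screen flags ROI pairs that are both spatially close and temporally correlated; the distance and correlation thresholds below are arbitrary placeholders:]

```python
import numpy as np

def flag_possible_duplicates(centroids, traces, dist_thresh=10.0, corr_thresh=0.8):
    # centroids: (n_rois, 2) x-y positions (e.g., microns);
    # traces: (n_rois, n_timepoints) activity. Flags pairs that are
    # within dist_thresh of each other AND correlate above corr_thresh.
    n = len(centroids)
    z = (traces - traces.mean(1, keepdims=True)) / traces.std(1, keepdims=True)
    corr = z @ z.T / traces.shape[1]          # pairwise Pearson correlation
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            close = np.linalg.norm(centroids[i] - centroids[j]) < dist_thresh
            if close and corr[i, j] > corr_thresh:
                pairs.append((i, j))
    return pairs
```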
(5) Retinotopy: "within the same ∼20° area of visual space". This is a useful analysis, but which 20 deg area was considered? Was it the one in front of the mice? This would be surprising, because some of the regions do not cover that area (Zhuang et al, eLife 2017). Was a different area chosen? What are its coordinates in azimuth and elevation? And how does it compare to the region where the stimulus was shown during imaging? The Methods do not explain where the stimulus was placed (only that it was in front of the left eye). This information should be added. Also, the screen covered ~120 deg of visual space (63 cm monitor placed 15 cm away), so the emphasis on a 20 deg area is not clear. The authors should provide a figure showing coverage of the screen by each visual area and the position of the stimuli presented during imaging.
(6) If during imaging the stimuli were presented slightly above the horizontal meridian, then a possible explanation for the superiority of LM, AL, and LI is that their receptive fields tend to be in the upper visual field, whereas the rest of the higher visual areas tend to have receptive fields in the lower visual field (Zhuang et al, eLife 2017).
(7) Dimensionality: "number of directions in which this variability is spread". Unless I missed the explanation, the Methods don't provide any information on how the dimensionality is computed. Is it done with cross-validation? If not, noise can be interpreted as having high dimension. There are methods to estimate dimensionality with cross-validation, thus excluding the contribution of noise (e.g., Stringer et al 2019). The authors should confirm that this was done with cross-validation and provide information in the Methods.
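[To make the concern concrete: one common cross-validated approach fits principal components on one half of the trials and measures how much held-out variance they capture, so trial-to-trial noise that does not replicate across splits is discounted. A sketch in the spirit of, but not identical to, Stringer et al. (2019):]

```python
import numpy as np

def cv_participation_ratio(X_train, X_test, n_max=200):
    # X_train, X_test: (n_trials, n_neurons) responses, split by trial.
    # PCs are fit on training trials only; held-out trials are projected
    # onto them, so noise dimensions that do not replicate across splits
    # carry little held-out variance.
    Xtr = X_train - X_train.mean(0)
    Xte = X_test - X_test.mean(0)
    _, _, Vt = np.linalg.svd(Xtr, full_matrices=False)
    var = ((Xte @ Vt.T) ** 2).mean(0)[:n_max]    # held-out variance per PC
    var = np.clip(var - var[-1], 0, None)        # crude noise-floor subtraction
    return var.sum() ** 2 / (var ** 2).sum()     # participation-ratio dimension
```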
(8) Temporal dynamics: "evidence for temporal integration during a trial". Are there really dynamics in the visual responses that last on the scale of seconds? That would be remarkable, since image recognition is usually thought to occur within ~100 ms. The long timescales presented here are more likely associated with behavioral or brain-state variables. Might there be different brain-state correlates in the different cases? For instance, pupil dilation might differ.
(9) Methods: "to ensure animals were in an attentive state (eyes clear and open)". This sounds peculiar. Did the mice ever close their eyes? If so, that's a discovery. Mice keep their eyes open at all times, even when they are sleeping. So, using eye closure for online detection of "inattentive states" does not seem to make sense. (Also, and this is a minor point: why stop a scan when the animal is "inattentive"? Wouldn't one want to acquire the associated data for comparison? Is the point to save disk space?). This whole set of statements is a bit concerning.