Figures and data

Experimental procedure for behavioral training.
a. Single frames from the movies of the objects that were presented to the animals. b. Behavioral training sequence. c. Probability of licking either probe during the early training period (upper bar plot) and the later training period (lower bar plot) for 1 animal. Error bars represent S.E.M. Student's t-test, * p < 0.05. d. Performance as a function of training time, N = 7 animals. e. Performance across repetitions of familiar (gray) and novel (red) object transformations during one session, N = 5 animals. Gray dashed line represents chance level. Blue line represents the performance of a pixel-based linear classifier. Green line represents the performance of a DoG-filter-based linear classifier. For both (d) and (e), shaded areas represent S.E.M. f. Performance for different objects. Lines indicate the median of the distribution, and the variable bars indicate the 75, 90, 95, and 100 percentiles. Dots represent the raw values. N = 5 animals; N = 80, 20, 20, 80 conditions for objects A, B, C and D, respectively. g. Contributions of the object latent parameters toward predicting the average performance of the animals, N = 5 animals. Error bars show the s.e. of the regression coefficients scaled by the mean difference.
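The pixel-based linear classifier baseline in (e) can be illustrated with a minimal sketch on synthetic pixel vectors. The real classifier would be trained on the actual movie frames; here a ridge-regularized least-squares readout stands in for it, and the image size and class separation are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for flattened 32 x 32 frames of two object classes;
# the real baseline would use the actual stimulus frames.
n_per_class, n_pixels = 100, 32 * 32
class_a = rng.normal(0.0, 1.0, (n_per_class, n_pixels))
class_b = rng.normal(0.5, 1.0, (n_per_class, n_pixels))

X = np.vstack([class_a, class_b])
y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])

# Ridge-regularized least-squares linear readout on raw pixels.
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(n_pixels), X.T @ y)

accuracy = np.mean(np.sign(X @ w) == y)
```

On cleanly separated synthetic classes such as these, the linear readout classifies the training set essentially perfectly; the interesting comparison in (e) is how this baseline behaves on transformed or novel stimuli.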

Object identity decoding across the visual hierarchy.
a. Example large-field-of-view recording (green) with area boundaries overlaid. Scale bar represents 1 mm. A small inset depicts the two-photon average image for a small segment of the large field of view captured with the mesoscope. b. Example responses of all neurons to moving objects (shown on top) from the recording shown in (a). Each clip is presented for 3-5 seconds before a short pause switches to a new clip that may show the same or a different object identity. c. Discriminability of object identity as a function of the number of neurons sampled. Each line represents the average across all recorded sites. d. Scatter plot of the discriminability of different areas with a population of 64 neurons compared to V1 for all the recording sites. Inset histogram represents the difference between the discriminability of each area and V1. Red line and number indicate the mean difference. Diamonds represent the results with 2 objects, whereas circles represent the results with 4 objects. Outliers have been omitted for better visualization. Wilcoxon signed rank test, *** p < 0.001, ** p < 0.01, * p < 0.05. e. Average discriminability of all visual areas with a population of 128 neurons. The number below each area represents the recording sites sampled. f. Same as in (e) but using a single neuron at a time to decode the object identity. The number below each area represents the cells sampled. g. Low-dimensional representation of the 128-dimensional neural activity space, illustrating the separation of the responses to four different objects for three example areas. Each dot represents the average activity in one 500 ms bin. The side histograms represent the distances of the data projected onto each of the four object category axes for the same-class (colored) and different-class (gray) distributions. Each inset represents the confusion matrix after decoding.
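The dependence of discriminability on the number of neurons sampled, as in (c), can be sketched on synthetic population data. A leave-one-out nearest-centroid decoder stands in for the actual classifier, and the tuning and noise levels are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic population responses: trials x neurons for 4 object identities.
n_trials, n_neurons, n_objects = 80, 128, 4
labels = np.repeat(np.arange(n_objects), n_trials // n_objects)
tuning = rng.normal(0, 0.5, (n_objects, n_neurons))
responses = tuning[labels] + rng.normal(0, 1.5, (n_trials, n_neurons))

def decode_accuracy(X, y, n_sample):
    """Leave-one-out nearest-centroid decoding from a random neuron subsample."""
    cols = rng.choice(X.shape[1], n_sample, replace=False)
    Xs = X[:, cols]
    correct = 0
    for t in range(len(y)):
        mask = np.arange(len(y)) != t
        centroids = np.stack([Xs[mask & (y == c)].mean(0) for c in range(n_objects)])
        correct += np.argmin(np.linalg.norm(centroids - Xs[t], axis=1)) == y[t]
    return correct / len(y)

accuracies = [decode_accuracy(responses, labels, n) for n in (8, 32, 128)]
```

With weak per-neuron tuning, accuracy rises steadily as more neurons are pooled, which is the qualitative behavior the panel summarizes across areas.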

Generalization performance across background noise and identity-preserving transformations.
a. Left top: average performance of the decoder for objects without background. Left bottom: average performance of the decoder for objects with background. Right: low-dimensional representation of the responses to the objects with background, similar to Fig. 2e. Each inset represents the confusion matrix after decoding. b. Generalization test across background noise. The decoder was trained on the responses to objects without background and tested on the responses to objects that contained background noise. c. Bar plot indicating the difference in performance when testing the ability of the decoder to generalize across background noise, compared to the performance of the decoder in V1. * indicates p < 0.05, Kruskal-Wallis test with multiple comparisons. d. Example parameter space of the four nuisance classes: Translation (x/y), Scale, Pose (tilt/rotation) and Light (four light sources). The example images for the 4 bins depict objects that are located more to the right (Translation), objects that are larger (Scale), objects that are rotated more to the right (Pose) and a light source that is located lower (Light). The decoder was tested on a region of the parameter space of each of the four nuisance variables that had not been part of the training set. e. Bar plot indicating the difference in performance when testing on the untrained parameter space, compared to the performance of V1. * indicates p < 0.05, Kruskal-Wallis test with multiple comparisons.
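The generalization test in (b), training the decoder without background and testing with background, can be sketched on synthetic responses. Background is modeled here simply as extra response variability, which is an illustrative assumption rather than the paper's noise model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic responses to two object identities (labels +/-1).
n_trials, n_neurons = 200, 64
y = rng.integers(0, 2, n_trials) * 2 - 1
signal = rng.normal(0, 1, n_neurons)
X_clean = np.outer(y, signal) + rng.normal(0, 1, (n_trials, n_neurons))
# Background condition: same identity signal, larger trial-to-trial noise.
X_bg = np.outer(y, signal) + rng.normal(0, 3, (n_trials, n_neurons))

# Train a ridge-regularized linear decoder on the clean responses only.
lam = 1e-1
w = np.linalg.solve(X_clean.T @ X_clean + lam * np.eye(n_neurons), X_clean.T @ y)

acc_clean = np.mean(np.sign(X_clean @ w) == y)
acc_generalize = np.mean(np.sign(X_bg @ w) == y)
```

If the identity signal is preserved under the background manipulation, the decoder trained on clean responses still performs above chance on the noisy set, which is the quantity compared across areas in (c).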

Classification capacity and geometry of manifolds across the visual hierarchy.
a. Scatter plot of the classification capacity of different areas compared to V1 for 4 objects. Inset histogram represents the difference between the classification capacity of each area and V1. Red line and number indicate the mean difference. Wilcoxon signed rank test, *** p < 0.001, ** p < 0.01, * p < 0.05. b. Average classification capacity of all visual areas with a population of 128 neurons. The number below each area represents the recording sites sampled. c. Illustration of low-dimensional representations of object manifolds for two visual areas. Left: each point in an object manifold corresponds to neural responses to an object under certain identity-preserving transformations. Right: demonstration of two possible changes in the manifold geometry in a higher-order area: reduction of the radius of one manifold through reduction of its extent in all directions (top), and reduction of the dimension of one manifold by concentrating variability along certain elongated axes, reducing the spread along the other axes (bottom). Such changes have predictable effects on the ability to perform linear classification of those objects. d. Bar plots of the manifold radius differences to V1 (left) and the manifold dimension differences to V1 (right) for all areas. Wilcoxon signed rank test, * p < 0.05.
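The two geometric quantities in (c) and (d) can be illustrated with common estimators: RMS distance to the centroid for the manifold radius, and the participation ratio of the covariance eigenvalues for the manifold dimension. This is a sketch on synthetic point clouds; the paper's manifold analysis may use different (e.g. mean-field) estimators:

```python
import numpy as np

rng = np.random.default_rng(3)

def manifold_radius_and_dimension(points):
    """Radius: RMS distance of points to their centroid.
    Dimension: participation ratio (sum lam)^2 / sum lam^2 of the
    covariance eigenvalues."""
    centered = points - points.mean(0)
    radius = np.sqrt((centered ** 2).sum(1).mean())
    eigvals = np.clip(np.linalg.eigvalsh(np.cov(centered.T)), 0, None)
    dimension = eigvals.sum() ** 2 / (eigvals ** 2).sum()
    return radius, dimension

# Synthetic manifolds: responses of 128 neurons to one object under many
# transformations, either isotropic or concentrated along 4 elongated axes.
n_points, n_neurons = 500, 128
isotropic = rng.normal(0, 1, (n_points, n_neurons))
scales = np.r_[np.full(4, 5.0), np.full(n_neurons - 4, 0.2)]
elongated = rng.normal(0, 1, (n_points, n_neurons)) * scales

r_iso, d_iso = manifold_radius_and_dimension(isotropic)
r_el, d_el = manifold_radius_and_dimension(elongated)
```

Concentrating variability along a few axes sharply reduces the participation-ratio dimension even when the overall radius stays comparable, which is the second geometric change illustrated in (c).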

Temporal dynamics and cross-area dependencies.
a. Schematic representation of the classification scores as the distances of the response trajectories to the decision boundary (left) and their resulting temporal dependencies across different areas (right). b. Score correlations across all recorded areas (left) and raw pairwise correlations of the single neuron activity between areas (right). Significance was estimated by bootstrapping across all correlations, * p < 0.025/45. c. Schematic representation of the score partial correlation coefficients between areas.
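The score partial correlations in (c) can be sketched as follows: a third area's scores are regressed out of a pair of score time series before correlating. The data and the dependency structure (a common area C driving areas A and B) are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([z, np.ones(len(z))])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# Synthetic classification scores in three areas; C drives both A and B.
n = 1000
score_c = rng.normal(0, 1, n)
score_a = score_c + rng.normal(0, 1, n)
score_b = score_c + rng.normal(0, 1, n)

raw = np.corrcoef(score_a, score_b)[0, 1]
partial = partial_corr(score_a, score_b, score_c)
```

A sizeable raw correlation between A and B that vanishes after conditioning on C indicates that the shared score fluctuations are inherited rather than exchanged directly, which is the logic behind the partial-correlation diagram.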

Behavioral performance and object latent parameters.
Performance vs object latent parameters: X offset from center, Y offset from center, Magnification, Rotation, Tilt and Speed. X offset, Rotation and Tilt are plotted as absolute values relative to the default object parameter set. N = 200, all data from 5 animals. The regression line is indicated in each plot, and the explained variance of the regression is noted at the top left of each plot.
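The per-parameter regressions described above can be sketched with ordinary least squares on synthetic data; the slope, noise level, and parameter range here are arbitrary stand-ins, not the measured values:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-in: performance vs one latent parameter (e.g. speed).
n = 200
speed = rng.uniform(0, 60, n)
performance = 0.9 - 0.004 * speed + rng.normal(0, 0.05, n)

# Ordinary least-squares regression line and its explained variance (R^2).
design = np.column_stack([speed, np.ones(n)])
coef, *_ = np.linalg.lstsq(design, performance, rcond=None)
fit = design @ coef
r2 = 1 - np.sum((performance - fit) ** 2) / np.sum((performance - performance.mean()) ** 2)
```

`coef[0]` is the fitted slope and `r2` the explained variance noted on each plot; repeating the fit per latent parameter reproduces the layout of this figure.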

Reliability across all visual areas.
a. Comparison of the average reliability of the responses to the object stimuli across neurons of all visual areas (y-axis) and neurons in non-visual areas (x-axis). Inset histogram represents the difference in average reliability between each visual area and the non-visual areas. Red line and number indicate the mean difference across all recording sites. Wilcoxon signed rank test, *** p < 0.001, ** p < 0.01, * p < 0.05. b. Discriminability vs average reliability for all the cells within each recording, plotted separately for objects with (red) and without (gray) background. The regression line is indicated in each plot, and the explained variance of the regression is noted at the top left of each plot. Boxplots for each dataset are shown along the sides of each axis.
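Response reliability of the kind compared in (a) is often estimated by split-half correlation across stimulus repeats; the following is a minimal sketch on synthetic single-neuron data, and the paper's exact reliability metric may differ:

```python
import numpy as np

rng = np.random.default_rng(6)

def split_half_reliability(responses):
    """Correlate mean responses across two random halves of the repeats.
    responses: repeats x stimuli."""
    n_rep = responses.shape[0]
    perm = rng.permutation(n_rep)
    half1 = responses[perm[: n_rep // 2]].mean(0)
    half2 = responses[perm[n_rep // 2:]].mean(0)
    return np.corrcoef(half1, half2)[0, 1]

# Synthetic single-neuron responses: 20 repeats of 50 stimuli.
tuning = rng.normal(0, 1, 50)
reliable = tuning + rng.normal(0, 0.5, (20, 50))   # stable stimulus tuning
unreliable = rng.normal(0, 1, (20, 50))            # no stimulus tuning

r_reliable = split_half_reliability(reliable)
r_unreliable = split_half_reliability(unreliable)
```

A neuron with stable stimulus tuning yields a split-half correlation near 1, while an untuned neuron hovers near 0, giving the reliability axis used in both panels.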

Discriminability with single neurons, with RF restriction, or with all neurons, and pairwise decoding of object identity.
a. Bar graph of the average discriminability when using single neurons to decode the object identity. Horizontal lines indicate p < 0.01, Kruskal-Wallis test with multiple comparisons. Number of recording sites is reported in Fig. 2c. b. Average discriminability for all visual areas when selecting a population of 20 cells that have their RF centered within the same 20° of visual space. The number below each area represents the recording sites sampled. * p < 0.05, Wilcoxon signed rank test compared to V1. c. Candle plot of the average pairwise discriminability in % correct. Dots indicate different samples for each area, and their color indicates the number of neurons recorded. d. Scatter plot of the pairwise object discriminability in % correct of different areas with a population of 64 neurons compared to V1 for all the recording sites. Inset histogram represents the difference between the discriminability of each area and V1. Red line and number indicate the mean difference. Outliers have been omitted for better visualization. Wilcoxon signed rank test, *** p < 0.001, ** p < 0.01, * p < 0.05.

Models of RF and decoding performance.
a. The 256 RFs (ICA 100%), their enlarged versions (ICA 150%) and their combination (ICA multi) that were used for the computational model. b. Discriminability of the simulated responses of 128 units to objects, and the generalization test on objects with background and translation, using the filters in (a) as well as the raw stimulus pixels as input. Horizontal lines indicate p < 0.05, Kruskal-Wallis test with multiple comparisons.
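Simulated responses from a filter front end, as in (b), amount to correlating each stimulus with a filter bank and feeding the outputs to the decoder. A minimal sketch with a single hand-built difference-of-Gaussians filter standing in for the ICA filters (an illustrative substitution; the filter size and scales are arbitrary):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(8)

def dog_filter(size, sigma_c, sigma_s):
    """Difference-of-Gaussians: narrow center Gaussian minus broader surround."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    gauss = lambda s: np.exp(-(xx ** 2 + yy ** 2) / (2 * s ** 2)) / (2 * np.pi * s ** 2)
    return gauss(sigma_c) - gauss(sigma_s)

# Hypothetical 64 x 64 stimulus frame; real inputs would be the movie frames.
frame = rng.normal(0, 1, (64, 64))
filt = dog_filter(15, 2.0, 4.0)

# Valid-mode cross-correlation of the frame with the filter; each output
# pixel is one simulated unit response for that filter position.
windows = sliding_window_view(frame, filt.shape)
response = np.einsum('ijkl,kl->ij', windows, filt)
```

Stacking such response maps for every filter in the bank and flattening them yields the simulated population vectors on which the same decoding and generalization analyses can be run.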

Object size and object speed effect on decoding.
a. Discriminability as a function of object size for all visual areas. b. Discriminability as a function of object speed for all visual areas. Shaded areas represent S.E.M. c. Difference in discriminability between low (<8°/s) and high (>32°/s) object speeds. Horizontal lines indicate a significant difference, Kruskal-Wallis test with multiple comparisons, p < 0.05.

State dependent decoding performance.
a. Comparison of the discriminability of the responses to the object stimuli across neurons of all visual areas while the animals are running (y-axis) and while the animals remain stationary (x-axis). Inset histogram represents the difference in discriminability between the two states. Red line and number indicate the mean difference across all recording sites. Wilcoxon signed rank test, *** p < 0.001. b. Average discriminability for all visual areas while the animals are running, and c. while they are stationary. The number in each area in (c) represents the recording sites sampled.

Capacity shuffling control.
Gray: Classification capacity of different areas. Red: Classification capacity estimated after shuffling of the object identity labels. Lines indicate the median of the distribution, and the variable bars indicate the 75, 90, 95, and 100 percentiles. Dots represent the raw values.
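The logic of the shuffle control can be sketched on synthetic data: decoding with the true identity labels should succeed, while decoding after shuffling the labels should fall to chance. A leave-one-out nearest-centroid decoder stands in here for the capacity estimator (an illustrative substitution, not the capacity measure itself):

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic responses to 4 object identities with clear identity tuning.
n_trials, n_neurons, n_objects = 200, 64, 4
labels = np.repeat(np.arange(n_objects), n_trials // n_objects)
X = rng.normal(0, 1, (n_objects, n_neurons))[labels] \
    + rng.normal(0, 1, (n_trials, n_neurons))

def loo_centroid_accuracy(X, y):
    """Leave-one-out nearest-centroid decoding accuracy."""
    correct = 0
    for t in range(len(y)):
        mask = np.arange(len(y)) != t
        cents = np.stack([X[mask & (y == c)].mean(0) for c in range(n_objects)])
        correct += np.argmin(np.linalg.norm(cents - X[t], axis=1)) == y[t]
    return correct / len(y)

acc_true = loo_centroid_accuracy(X, labels)
acc_shuffled = loo_centroid_accuracy(X, rng.permutation(labels))
```

The shuffled estimate collapses to chance because the label permutation destroys the correspondence between responses and identities while leaving the response statistics intact, which is what the red distributions in the figure verify for the capacity estimate.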

Temporal dynamics when object identity is switching.
a. Discriminability across time for trials where the object identity is preserved (cis-trials) or switched (trans-trials). b. Discriminability normalized to the average discriminability of the cis-trials. c. Difference of the normalized discriminabilities relative to V1. Shaded areas represent S.E.M. d. Bar plot of the difference in normalized discriminability of the cis (left) and trans (right) trials during 0.5-1.5 seconds of the trial. Wilcoxon signed rank test, * p < 0.05.

Score correlation for different latent parameters.
a. Schematic representation of the score correlation coefficients between areas when the decoder is trained to decode both object identity and the different latent parameters of the objects, compared to the control in which we randomly sampled from the latent parameter distributions. The bar width and luminance indicate the score correlation coefficient, and the color indicates the % change from the control (bottom row, right). b. Average difference between the score correlations of all visual areas and the control for different latent parameters, * p < 0.05.
