Framework for quantifying factorization in neural and model representations

(A) A subspace encoding object identity in a linearly separable manner can be achieved either by becoming invariant to non-identity variables (compact spheres, middle column; colored dots represent example images within each class) or by encoding the variance induced by non-identity variables along neural axes orthogonal to the identity subspace (extended cylinders, left column); only the latter, factorization strategy simultaneously represents multiple variables in a disentangled fashion. A code that is sensitive to non-identity parameters within the identity subspace corrupts the ability to decode identity (right column; identity subspace in orange). (B) Variance across images within a class can be measured in two different linear subspaces: the subspace containing the majority of the variance for all other parameters (a, “other_param_subspace”) and the subspace containing the majority of the variance for the parameter of interest (b, “param_subspace”). Factorization is defined as the fraction of parameter-induced variance that avoids the other-parameter subspace (left). By contrast, invariance to the parameter of interest is computed by comparing the overall parameter-induced variance to the variance induced by the other parameters (c, “var_other_param”) (right). (C) In a simulation of coding strategies for two binary variables among 10 varying dimensions in total (see Methods), decreasing the orthogonality of the encoding of the two variables, a > 0 (going from a square to a parallelogram geometry; only 3 of the 10 simulated dimensions are shown), results in poor classifier performance in the few-training-samples regime, despite maintaining linear separability of the variables.
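
As a minimal sketch of the metrics in (B), assuming population responses are arranged as image-by-unit matrices, the factorization and invariance scores could be computed as follows (the PCA threshold and the exact normalization of the invariance score are illustrative choices; function and variable names are not from the original code):

```python
# Sketch of the factorization and invariance metrics in (B).
# responses_param: responses while the parameter of interest varies (n_images x n_units)
# responses_other: responses while all other parameters vary (n_images x n_units)
import numpy as np
from sklearn.decomposition import PCA

def other_param_subspace(responses_other, var_threshold=0.90):
    """PCA axes capturing the bulk of the variance induced by the other parameters."""
    pca = PCA().fit(responses_other - responses_other.mean(axis=0))
    n_pcs = np.searchsorted(np.cumsum(pca.explained_variance_ratio_), var_threshold) + 1
    return pca.components_[:n_pcs]                         # (n_pcs x n_units)

def factorization(responses_param, responses_other, var_threshold=0.90):
    """Fraction of parameter-induced variance avoiding the other-parameter subspace."""
    basis = other_param_subspace(responses_other, var_threshold)
    centered = responses_param - responses_param.mean(axis=0)
    var_total = centered.var(axis=0).sum()
    var_in_other = (centered @ basis.T).var(axis=0).sum()  # variance inside other_param_subspace
    return 1.0 - var_in_other / var_total

def invariance(responses_param, responses_other):
    """Invariance to the parameter: parameter-induced variance compared with the
    variance induced by the other parameters (normalization is illustrative)."""
    var_param = (responses_param - responses_param.mean(axis=0)).var(axis=0).sum()
    var_other = (responses_other - responses_other.mean(axis=0)).var(axis=0).sum()
    return 1.0 - var_param / (var_param + var_other)
```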

Benefit of factorization to neural decoding in macaque V4 and IT

(A) Factorization of object identity and position increased from macaque V4 to IT (dataset E1, multiunit activity in macaque visual cortex) (left). Like factorization, invariance also increased from V4 to IT (note that for “identity,” invariance is computed with respect to all non-identity factors, here position; solid black line) (right). Combined with the increased factorization of the remaining variance, this led to higher invariance within the variable’s subspace (orange lines), representing a neural subspace for identity information, invariant to nuisance parameters, that decoders can target for read-out. (B) Applying a transformation to the data that rotates the relative positions of the mean responses to object classes (see Methods), designed to preserve relevant activity statistics (including invariance to non-class factors) while decreasing the factorization of class information from non-class factors, reduces object class decoding performance (light vs. dark red bars; chance = 1/64; n = 128 multi-unit sites in V4 and 128 in IT).
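
The decoding comparison in (B) amounts to training a cross-validated linear classifier on the original versus the rotated population responses; a minimal sketch is below (the classifier choice is illustrative, and rotate_class_means is a hypothetical stand-in for the rotation described in the Methods):

```python
# Sketch of the object-class decoding comparison in (B); chance = 1/64.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def class_decoding_accuracy(responses, class_labels, n_folds=5):
    """Cross-validated accuracy of a linear decoder for object class."""
    clf = LinearSVC(max_iter=10000)
    return cross_val_score(clf, responses, class_labels, cv=n_folds).mean()

# acc_original = class_decoding_accuracy(population_responses, class_labels)
# acc_rotated  = class_decoding_accuracy(
#     rotate_class_means(population_responses, class_labels),  # hypothetical helper (see Methods)
#     class_labels)
```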

Measurement of factorization in DNN models and relationship to neural predictivity

(A) Schematic showing how the meta-analysis of models and brain data was conducted: various representational metrics were first computed on models, and each model’s predictive power was then measured across a variety of datasets. A choice of model, layer, metric, and dataset specifies the coordinates of a single dot on the scatter plots in (C) (model-layer metric versus model-layer dataset predictivity), and the across-model correlation coefficient between a particular representational metric and neural predictivity for a dataset summarizes the potential importance of that metric in producing more brainlike models (see Figure 4). (B) For computing the representational metrics of factorization of, and invariance to, a scene parameter, variance in model responses was induced by individually varying each of four scene parameters (n = 10 parameter levels) for each base scene (n = 100 base scenes). (C) Scatter plots for an example neural dataset (IT single-units, macaque E2 dataset) showing the correlation between a model’s predictive power as an encoding model for IT neural data and its ability to factorize, or become invariant to, different scene parameters (each dot is a different model, using each model’s penultimate layer). Note that factorization in trained models is consistently higher than in an untrained, randomly initialized ResNet-50 DNN architecture (rightward shift relative to the yellow vertical dashed line). Invariance to background and lighting, but not to object pose and camera viewpoint, increased in trained models relative to the untrained control (rightward versus leftward shift relative to the yellow vertical dashed line). (D) Same as (C) except for predicting human behavioral performance patterns across images (human I2 dataset).
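
A minimal sketch of the parameter sweep in (B) is given below; render_scene and model_features are hypothetical stand-ins for the scene renderer and the DNN feature extractor, and the factorization function is the one sketched for Figure 1:

```python
# Sketch of how parameter-induced variance is generated for the metrics in (B):
# each of four scene parameters is varied individually (10 levels) for 100 base scenes.
import numpy as np

PARAMS = ["object_pose", "camera_viewpoint", "lighting", "background"]

def parameter_responses(base_scenes, param, n_levels=10):
    """Model responses (n_scenes * n_levels x n_units) as `param` alone is varied."""
    feats = []
    for scene in base_scenes:
        for level in range(n_levels):
            image = render_scene(scene, **{param: level})   # hypothetical renderer
            feats.append(model_features(image))             # hypothetical feature extractor
    return np.stack(feats)

# resp_pose  = parameter_responses(base_scenes, "object_pose")
# resp_other = np.concatenate([parameter_responses(base_scenes, p)
#                              for p in PARAMS if p != "object_pose"])
# factorization_pose = factorization(resp_pose, resp_other)
```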

Scene parameter factorization correlates with more brainlike DNN models

(A) Factorization of scene parameters in model representations consistently correlated with a model being more brainlike across multiple independent datasets measuring monkey neurons, human fMRI voxels, or behavioral performance in both macaques and humans (left vs. right column) (gray bars). By contrast, increased invariance to camera viewpoint or object pose was not indicative of brainlike models (black bars). In all cases, the model representational metric and neural predictivity score were computed by averaging scores across the last 5 model layers. (B) Recomputing camera viewpoint or object pose factorization from natural movie datasets that primarily contained camera or object motion, respectively (right: example movie frames; also see Methods), gave results similar to those obtained when computing these factorization scores from our synthetic images, in terms of predicting which model representations would be more brainlike. (C) Summary of the results from (A) across datasets (x-axis) for invariance (open symbols) versus factorization (closed symbols).
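
For panel (A), each model contributes one metric value and one predictivity value, each averaged over its last five layers, and the across-model correlation summarizes how diagnostic the metric is; a minimal sketch follows (the input dictionaries and the use of a Pearson correlation are assumptions for illustration):

```python
# Sketch of the across-model correlation between a representational metric and
# brain predictivity; per-layer scores are averaged over the last five layers.
import numpy as np
from scipy.stats import pearsonr

def metric_vs_predictivity(metric_per_layer, predictivity_per_layer, n_layers=5):
    """metric_per_layer / predictivity_per_layer: dicts of model name -> per-layer scores."""
    models = sorted(metric_per_layer)
    metric = [np.mean(metric_per_layer[m][-n_layers:]) for m in models]
    pred = [np.mean(predictivity_per_layer[m][-n_layers:]) for m in models]
    return pearsonr(metric, pred)   # (correlation, p-value)
```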

Adding factorization to classification improves identification of the most brainlike DNN models

(A) Brain predictivity, averaged across datasets, of classification (black solid bar) and factorization (colored solid bars) in a model representation. Adding factorization to classification in a regression model produced significant improvements in predicting the most brainlike models; cross-validated (across models) performance is averaged across datasets (unfaded bars exceed the dashed line for classification alone as a metric). (B) Example scatter plots for neural and fMRI datasets (IT multi-units & single-units, macaque E1 & E2; fMRI with grayscale & color images, human F1 & F2) showing a reversing trend in neural (voxel) predictivity for models that are increasingly good at classification (left column). This saturating/reversing trend is no longer present when object pose factorization is added to classification as a combined metric for predicting the brainlikeness of a model (right column).
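
A minimal sketch of the combined metric in (A), assuming per-model vectors of classification accuracy, factorization, and brain predictivity (the regression and cross-validation setup shown here illustrate the described approach and are not the original code):

```python
# Sketch of predicting a model's brain predictivity from classification alone versus
# classification plus factorization, cross-validated across models.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def cv_r2(features, brain_predictivity, n_folds=5):
    """Cross-validated (across models) R^2 of a linear regression."""
    X = np.atleast_2d(features).T if np.ndim(features) == 1 else np.asarray(features)
    return cross_val_score(LinearRegression(), X, brain_predictivity, cv=n_folds).mean()

# r2_classification_only = cv_r2(classification_acc, brain_predictivity)
# r2_combined = cv_r2(np.column_stack([classification_acc, factorization_score]),
#                     brain_predictivity)
```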

Factorization and invariance in V4 & IT neural data

Normalized factorization and invariance as in Figure 2A, but after subtracting a shuffle control, for the V4 and IT neural dataset. Shuffling the image identities of the population vectors accounts for increases in factorization driven purely by changes in the covariance statistics of population responses between V4 and IT. However, normalized factorization scores remain significantly above zero for both brain areas.
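
A minimal sketch of the shuffle control, assuming factorization is computed by a function like the one sketched for Figure 1, wrapped to take image labels (compute_factorization below is a hypothetical wrapper):

```python
# Sketch of the shuffle control: permuting which image each population vector is
# assigned to preserves the covariance statistics of the responses but destroys the
# image-specific structure; the shuffled score is subtracted from the raw score.
import numpy as np

def shuffle_control(responses, image_ids, n_shuffles=100, seed=0):
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_shuffles):
        shuffled_ids = rng.permutation(image_ids)
        scores.append(compute_factorization(responses, shuffled_ids))  # hypothetical wrapper
    return np.mean(scores)

# normalized_factorization = (compute_factorization(responses, image_ids)
#                             - shuffle_control(responses, image_ids))
```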

Scatter plots for all datasets

Scatter plots as in Figure 3C,D for all datasets. Brain metrics (y-axes) by panel: (A) macaque neuron/human voxel fits in V4 cortex, (B) macaque neuron/human voxel fits in ITC/HVC, and (C) macaque/human per-image classification performance (I1) and image-by-distractor-class performance (I2). In all panels, plots in the top half use DNN factorization scores on the x-axis, while those in the bottom half use DNN invariance scores.

Predictivity of factorization and invariance, restricting to models with the ResNet-50 architecture

Same format as Figure 4C, except with the analyses restricted to models with the ResNet-50 architecture. The main finding, that factorization of scene parameters in DNNs is generally positively correlated with better predictions of brain data, is replicated in this architecture-matched subset of models, controlling for potential confounds from model architecture.

Predictivity of factorization and invariance for RDMs

Same format as Figure 4C, except for predicting population representational dissimilarity matrices (RDMs) of macaque neurophysiological and human fMRI data (in the main analyses, linear encoding fits of each single neuron/voxel were used to measure the brain predictivity of a model). The main finding, that factorization of scene parameters in DNNs is positively correlated with better predictions of brain data, is replicated when using RDMs instead of neural/voxel goodness of fit.
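
A minimal sketch of the RDM-based comparison, with correlation distance for the RDMs and a Spearman correlation between their condensed forms as illustrative choices:

```python
# Sketch of comparing model and brain representational dissimilarity matrices (RDMs).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Condensed RDM: 1 - Pearson correlation between image response patterns."""
    return pdist(responses, metric="correlation")

def rdm_similarity(model_features, brain_responses):
    """Spearman correlation between model and brain RDMs over the same image set."""
    rho, _ = spearmanr(rdm(model_features), rdm(brain_responses))
    return rho
```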

Effect of the PCA threshold used for computing factorization on neural and behavioral predictivity, Related to Figure 4

The percent-variance threshold used in the main text for estimating a PCA linear subspace that captures the bulk of the variance induced by all parameters other than the parameter of interest is somewhat arbitrary. Here we show that the results of our main analysis change little when this threshold is varied from 50% to 99%. In the main text, a PCA threshold of 90% was used for computing factorization scores.
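
A minimal sketch of this robustness check, reusing the factorization function sketched for Figure 1 and an illustrative grid of thresholds:

```python
# Sketch of sweeping the PCA variance threshold used to define the
# other-parameter subspace (90% in the main text).
thresholds = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
# factorization_by_threshold = {
#     thr: factorization(resp_param, resp_other, var_threshold=thr)
#     for thr in thresholds
# }
```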

Datasets used for measuring similarity of models to the brain

Datasets from both macaque and human high-level visual cortex, as well as high-level visual behavior, were collated for testing the brainlikeness of computational models. For the neural and fMRI datasets, the features of the model were used to predict the image-by-image response pattern of each neuron or voxel. For the behavioral datasets, the performance of linear decoders built atop model representations was compared to the per-image performance of macaques and humans.
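
For the neural and fMRI datasets, the encoding fits could be implemented as cross-validated regularized regression from model features to each neuron's or voxel's responses; a minimal sketch (ridge regression and the scoring choice are illustrative assumptions):

```python
# Sketch of per-neuron (or per-voxel) encoding-model fits from model features.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

def neuron_predictivity(model_features, neuron_responses, n_folds=5):
    """Cross-validated R^2 of a ridge regression predicting one neuron/voxel."""
    reg = RidgeCV(alphas=np.logspace(-3, 3, 13))
    return cross_val_score(reg, model_features, neuron_responses,
                           cv=n_folds, scoring="r2").mean()

# site_scores = [neuron_predictivity(features, responses[:, i])
#                for i in range(responses.shape[1])]
```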

Models tested

For each model, we measured representational factorization and invariance in each of the final five representational layers and evaluated its brainlikeness using the datasets in Table S1.
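
Per-layer features for the final layers can be collected with forward hooks; a minimal PyTorch sketch (the architecture and layer names below are illustrative, and in practice each model's own trained weights and layer naming would be used):

```python
# Sketch of extracting activations from a model's late layers with forward hooks.
import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval()          # substitute each trained model here
layer_names = ["layer2", "layer3", "layer4", "avgpool", "fc"]  # five example late layers
activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach().flatten(start_dim=1)
    return hook

modules = dict(model.named_modules())
for name in layer_names:
    modules[name].register_forward_hook(make_hook(name))

with torch.no_grad():
    _ = model(torch.randn(4, 3, 224, 224))            # stand-in batch of rendered images
# activations[name] now holds (batch x features) activations for each hooked layer.
```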