Task and model.

A) Top: the Lost in Migration task. Bottom: the seven stimulus layouts (shown with random target/flanker directions). B) VAM schematic. The numbers after the CNN layer names give the number of channels in each layer. See Methods for additional details.

Comparison of model/participant behavior.

For panels B-E, each point is one model/participant pair (n = 75); black line: unity; red line: linear best fit. A) Example model/participant RT distributions. B) Mean RT (Pearson’s r = 0.99, bootstrap 95% CI = (0.99, 0.99), best-fit slope = 1.07). C) Accuracy (r = 0.91, 95% CI = (0.87, 0.94), slope = 1.15). D) RT congruency effect (r = 0.77, 95% CI = (0.67, 0.86), slope = 1.01). E) Accuracy congruency effect (r = 0.92, 95% CI = (0.88, 0.94), slope = 1.20). F) Drift rates averaged across all trials and models. G) Mean RT vs. age averaged across models. H) Example model/participant mean RT vs. stimulus layout (Pearson’s r = 0.67). I) Example model/participant mean RT vs. horizontal stimulus position (negative values: left of center; Pearson’s r = 0.79). J) Empirical CDF of Pearson’s r between model/participant mean RTs across stimulus feature bins (only participants with significant RT modulation are shown; layout: n = 60 models/participants, x-position: n = 72, y-position: n = 69). Error bars in panels F-I are bootstrap 95% confidence intervals.
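The Pearson correlations, bootstrap 95% CIs, and best-fit slopes in panels B-E can be illustrated with a short NumPy sketch. It assumes a percentile bootstrap that resamples model/participant pairs with replacement (1000 resamples) and a least-squares line of model values on participant values; the resample count, the regression direction, and the helper name `bootstrap_pearson` are assumptions rather than details from the Methods.

```python
import numpy as np

def bootstrap_pearson(x, y, n_boot=1000, seed=0):
    """Pearson's r, percentile-bootstrap 95% CI, and least-squares slope of y on x.

    Hypothetical helper: x = participant metric, y = model metric, one entry per pair.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope = np.polyfit(x, y, 1)[0]  # coefficient of the linear term
    rng = np.random.default_rng(seed)
    n = len(x)
    boot_r = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample model/participant pairs with replacement
        boot_r[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    # nanpercentile skips any degenerate resample whose r is undefined
    ci = tuple(np.nanpercentile(boot_r, [2.5, 97.5]))
    return r, ci, slope
```

For example, `bootstrap_pearson(participant_mean_rt, model_mean_rt)` would return one (r, CI, slope) triple of the kind reported in panel B (the argument names are placeholders).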

Neural representations of target direction.

A) Schematic of the CNN features extracted from each network layer. B) Decoding accuracy of stimulus target direction. C) Normalized mutual information for target direction conveyed by single units, averaged across units. Mutual information was normalized by the entropy of the target direction distribution. D) Dimensionality of target representations as measured by the participation ratio of the target-centered feature covariance matrix. E) Proportion of units exhibiting selectivity for target direction. Panels B-E show the average across n = 75 models; error bars correspond to bootstrap 95% confidence intervals.
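The participation ratio used in panel D is PR = (Σ λ_i)² / Σ λ_i², where λ_i are the eigenvalues of a covariance matrix. It equals k when variance is spread evenly over k dimensions and 1 when all variance lies along a single axis. The minimal sketch below takes an (n_samples, n_units) feature matrix. How the target-centered features are formed is described in Methods and is left to the caller here, so both that input and the name `participation_ratio` are assumptions.

```python
import numpy as np

def participation_ratio(features):
    """Participation ratio of the covariance of an (n_samples, n_units) feature matrix.

    Hypothetical helper; the caller is assumed to pass the target-centered features.
    """
    cov = np.cov(np.asarray(features, dtype=float), rowvar=False)
    # Covariance is symmetric PSD; clip tiny negative eigenvalues from round-off
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return eigvals.sum() ** 2 / np.sum(eigvals ** 2)
```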

Suppression of task-irrelevant information and tolerance in task-relevant representations.

A) Decoding accuracy of stimulus target direction in a new distracter context (generalization performance). Context was defined by the values of a given stimulus feature (flanker direction, layout, horizontal/vertical position). B) Decoding accuracy of irrelevant stimulus features. C) Normalized mutual information for irrelevant stimulus features conveyed by single units, averaged across units. For each stimulus feature, the mutual information was normalized by the entropy of the stimulus feature distribution. All panels show the average across n = 75 models; error bars correspond to bootstrap 95% confidence intervals.
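The generalization test in panel A trains a target-direction decoder on trials from one distracter context (one value of a stimulus feature) and scores it on trials from another. The caption does not name the decoder, so this sketch substitutes a nearest-centroid classifier purely for illustration. The classifier choice and all function names are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Class labels and per-class mean feature vectors (nearest-centroid decoder)."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_centroids(classes, centroids, X):
    """Assign each row of X to the class with the nearest centroid (Euclidean)."""
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(sq_dist, axis=1)]

def generalization_accuracy(X, target, context, train_ctx, test_ctx):
    """Train on trials from one context value; report target accuracy on another."""
    X, target, context = np.asarray(X, float), np.asarray(target), np.asarray(context)
    train, test = context == train_ctx, context == test_ctx
    classes, centroids = fit_centroids(X[train], target[train])
    predictions = predict_centroids(classes, centroids, X[test])
    return np.mean(predictions == target[test])
```

A representation that is tolerant to the distracter keeps target information along directions the context does not shift, so accuracy stays high when the test context differs from the training context.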

Orthogonality of target/flanker subspaces predicts accuracy congruency effects.

A) Target/flanker subspace alignment averaged across models. B) Pearson’s correlation coefficient between target/flanker subspace alignment and accuracy congruency effect calculated across models. C) Target/flanker subspace alignment vs. accuracy congruency effect for layers Conv4-FC1. Each point corresponds to one model; the red line is the linear best-fit. For all panels, n = 75 models. Error bars in panels A-B correspond to bootstrap 95% confidence intervals. Asterisks in panel B indicate a significant Pearson’s r (adjusted p-value < 0.05, permutation test with n = 1000 shuffles, Bonferroni correction for 7 comparisons).
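The significance test in panel B (permutation test with 1000 shuffles, Bonferroni correction for 7 layers) can be sketched as follows. The two-sided form and the "+1" convention, which counts the observed statistic in the null distribution, are assumptions, as are the function names.

```python
import numpy as np

def permutation_pearson(x, y, n_shuffles=1000, seed=0):
    """Two-sided permutation p-value for Pearson's r (y shuffled relative to x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_shuffles)])
    # +1 in numerator and denominator counts the observed statistic (assumed convention)
    p = (np.sum(np.abs(null) >= np.abs(r_obs)) + 1) / (n_shuffles + 1)
    return r_obs, p

def bonferroni(p_values, n_comparisons):
    """Bonferroni adjustment: multiply by the number of comparisons, capped at 1."""
    return np.minimum(np.asarray(p_values, dtype=float) * n_comparisons, 1.0)
```

Running `permutation_pearson` once per layer and passing the seven p-values to `bonferroni(p_layers, 7)` gives adjusted p-values comparable to the 0.05 threshold used for the asterisks (`p_layers` is a placeholder name).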

Comparison of VAMs and task-optimized models.

A) Accuracy congruency effect. B) Target/flanker subspace alignment. C) Dimensionality of target representations, as measured by the participation ratio of the target-centered feature covariance matrix. D) Normalized mutual information for target/flanker direction conveyed by single units, averaged across units. Mutual information was normalized by the entropy of the target/flanker direction distribution. E) Decoding accuracy of target/flanker direction. F) Proportion of units exhibiting selectivity for target direction in layers Conv5-Conv6. All panels show the average across n = 75 task-optimized models and n = 75 VAMs; error bars correspond to bootstrap 95% confidence intervals. The VAM data shown in panels A-F are the same as those shown in Figs. 2E, 5A, 3D, 3C/4C, 3B/4B, and 3E, respectively.

VAM inference algorithm

Example model/participant RT distributions and dependence of RTs on stimulus features.

A) Example model/participant RT distributions (all trials). B) Examples of model/participant mean RT vs. stimulus layout. C) Examples of model/participant mean RT vs. horizontal stimulus position (negative values: left of center). D) Examples of model/participant mean RT vs. vertical stimulus position (negative values: above center). For all panels, error bars correspond to bootstrap 95% confidence intervals.

Age dependence of LBA parameters.

For all panels, we tested for age dependence with a one-way ANOVA and report Bonferroni-adjusted p-values, corrected for 4 comparisons (n = 75 models). We also report adjusted p-values from a post-hoc comparison of the 20–29 vs. 70–89 age groups conducted with Tukey’s HSD. Error bars correspond to bootstrap 95% confidence intervals. A) Non-decision time parameter t0 (F (5, 69) = 13.3, p < 1e-7). Tukey’s HSD for 20–29 vs. 70–89 age groups: p < 1e-8. B) Response caution (b - A; F (5, 69) = 0.49, p = 1.0). C) Mean target drift rate (F (5, 69) = 3.4, p = 0.026). Tukey’s HSD for 20–29 vs. 70–89 age groups: p = 0.002. D) Mean flanker drift rate (F (5, 69) = 0.72, p = 1.0).
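The F statistics above come from a one-way ANOVA across age groups. With 75 models in 6 groups, the degrees of freedom are (6 - 1, 75 - 6) = (5, 69). The sketch below computes the statistic, assuming each group is passed as a 1-D array; `one_way_anova_f` is a hypothetical name. The p-value would then come from the F distribution's survival function (for example `scipy.stats.f.sf(F, df_between, df_within)`), and `scipy.stats.f_oneway` can serve as a cross-check.

```python
import numpy as np

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over a list of samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Between-group sum of squares: group size times squared deviation of the group mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviation of each observation from its group mean
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```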

Dependence of RTs on stimulus layout and position.

For each participant/model, we calculated the mean RT in each stimulus feature bin, then subtracted the average of these mean RTs from each bin. The panels show the average of these centered RTs across all participants with significant modulation of RT for that particular stimulus feature. For all panels, we conducted a one-way ANOVA separately for models and participants and report Bonferroni-adjusted p-values, corrected for 3 comparisons (n = 75 models). We also report results from post-hoc comparisons between select feature bins conducted with Tukey’s HSD. Error bars correspond to bootstrap 95% confidence intervals. A) RT vs. stimulus layout (models: F (6, 53) = 7.43, p < 1e-6; RTs for the vertical line layout were significantly faster (Tukey’s HSD adjusted p-value < 0.05) than RTs from all other layouts except ‘>’; participants: F (6, 53) = 23.1, p < 1e-22; RTs for the vertical line layout were significantly faster than RTs from all other layouts). B) RT vs. horizontal stimulus position (negative values: left of center; models: F (7, 64) = 16.8, p < 1e-18; RTs for the leftmost and rightmost position bins were significantly slower than RTs from all intermediate position bins; participants: F (7, 64) = 72.6, p < 1e-73; RTs for the leftmost and rightmost position bins were significantly slower than RTs from all intermediate position bins). C) RT vs. vertical stimulus position (negative values: above center; models: F (5, 66) = 17.2, p < 1e-14; RTs for the topmost and bottommost position bins were significantly slower than RTs from the two centermost position bins; participants: F (5, 66) = 113.3, p < 1e-74; RTs for the topmost and bottommost position bins were significantly slower than RTs from the two centermost position bins).
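The centering step described above (mean RT per feature bin, minus the unweighted average of those bin means) can be sketched for a single model or participant as follows. The function name is hypothetical. Averaging the centered vectors across participants, e.g. with `np.mean(np.stack(...), axis=0)`, assumes every participant contributes trials to every bin.

```python
import numpy as np

def centered_bin_rts(rts, bins):
    """Mean RT per feature bin, minus the average of those bin means (one model/participant)."""
    rts = np.asarray(rts, dtype=float)
    bins = np.asarray(bins)
    labels = np.unique(bins)  # sorted bin labels
    bin_means = np.array([rts[bins == b].mean() for b in labels])
    # Subtract the unweighted average of bin means, not the overall trial mean
    return labels, bin_means - bin_means.mean()
```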

RT delta plots and conditional accuracy functions.

A) RT delta plots for participants and models trained with the standard training paradigm (all RT data; n = 75 models/participants). B) Conditional accuracy functions for participants and models trained with the standard training paradigm. C) RT delta plots for participants and models trained separately on data in each RT quantile, then combined (n = 10 combined models/participants). D) Conditional accuracy functions for participants and models trained separately on data in each RT quantile, then combined. For all panels, error bars correspond to bootstrap 95% confidence intervals.
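RT delta plots and conditional accuracy functions are typically built from RT quantile bins. This sketch follows one common convention that the caption does not confirm. Each condition's RTs are sorted and split into equal-count bins. The delta plot pairs the average of matched congruent/incongruent bin means (x) with their difference (y). The conditional accuracy function gives the proportion correct within each RT bin. The default of 5 bins and all function names are assumptions.

```python
import numpy as np

def quantile_bin_means(rts, n_bins):
    """Sort RTs, split them into n_bins (near) equal-count bins, return each bin's mean."""
    sorted_rts = np.sort(np.asarray(rts, dtype=float))
    return np.array([chunk.mean() for chunk in np.array_split(sorted_rts, n_bins)])

def delta_plot(rt_congruent, rt_incongruent, n_bins=5):
    """x: average of matched bin means; y: incongruent minus congruent bin mean."""
    con = quantile_bin_means(rt_congruent, n_bins)
    inc = quantile_bin_means(rt_incongruent, n_bins)
    return (con + inc) / 2.0, inc - con

def conditional_accuracy(rts, correct, n_bins=5):
    """Proportion correct within each RT quantile bin (trials sorted by RT)."""
    order = np.argsort(np.asarray(rts, dtype=float), kind="stable")
    correct_sorted = np.asarray(correct, dtype=float)[order]
    return np.array([chunk.mean() for chunk in np.array_split(correct_sorted, n_bins)])
```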

Activity of all selective (+) units for one example model.

Each row shows the activity of one unit for 100 randomly selected stimuli, sorted by target direction. The activity of each unit was centered and then normalized by its activation for the stimulus with the largest-magnitude activation. The few selective (+) units in layer Conv1 are not shown.
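The per-unit normalization used for this display can be sketched as follows. The sketch assumes that centering subtracts each unit's mean over the displayed stimuli and that the scale is the largest absolute value of the centered activity; whether the original scaled by the centered or the raw activation is an assumption. Selecting the 100 stimuli is left to the caller, and the function name is hypothetical.

```python
import numpy as np

def normalize_and_sort(activity, target_direction):
    """Center each unit (row) across stimuli, scale by its largest |activation|, sort columns by target.

    activity: (n_units, n_stimuli); target_direction: (n_stimuli,) integer labels.
    """
    activity = np.asarray(activity, dtype=float)
    centered = activity - activity.mean(axis=1, keepdims=True)
    scale = np.abs(centered).max(axis=1, keepdims=True)
    scale[scale == 0] = 1.0  # leave constant units at zero instead of dividing by zero
    order = np.argsort(target_direction, kind="stable")
    return (centered / scale)[:, order]
```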

Absence of correlation between flanker suppression metrics and congruency effects.

All panels show the Pearson’s correlation coefficient between the specified suppression and behavior metrics, calculated across models. A) Flanker direction decoding accuracy vs. accuracy congruency effect. B) Mutual information for flanker direction conveyed by single units vs. accuracy congruency effect. C) Flanker direction decoding accuracy vs. RT congruency effect. D) Mutual information for flanker direction conveyed by single units vs. RT congruency effect. For all panels, n = 75 models; error bars correspond to bootstrap 95% confidence intervals. Asterisks indicate a significant Pearson’s r (adjusted p-value < 0.05, permutation test with n = 1000 shuffles, Bonferroni correction for 7 comparisons).

Absence of correlation between target/flanker subspace alignment and RT congruency effect.

Pearson’s correlation coefficient between target/flanker subspace alignment and RT congruency effect across models (n = 75 models, error bars correspond to bootstrap 95% confidence intervals). The correlation was not significant for any layer (adjusted p-value > 0.05, permutation test with n = 1000 shuffles, Bonferroni correction for 7 comparisons).

Additional analysis of VAMs and task-optimized models.

A) Normalized mutual information for stimulus layout and horizontal/vertical stimulus position conveyed by single units, averaged across units. Mutual information was normalized by the entropy of the corresponding stimulus feature distribution. B) Decoding accuracy of stimulus layout and horizontal/vertical stimulus position. All panels show the average across n = 75 task-optimized models and n = 75 VAMs; error bars correspond to bootstrap 95% confidence intervals. The VAM data shown in panels A and B are the same as those shown in Figs. 4C and 4B, respectively.
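The normalized single-unit mutual information, I(unit; feature) / H(feature), can be sketched with a plug-in estimator. The sketch assumes each unit's activity is discretized into equal-count quantile bins (4 by default). Both the estimator and the binning are assumptions, since the captions only specify the normalization. The function names are hypothetical. Averaging across units would apply this function once per unit and take the mean.

```python
import numpy as np

def entropy_bits(labels):
    """Shannon entropy (bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def normalized_mutual_information(activity, feature, n_bins=4):
    """Plug-in I(activity; feature) / H(feature), activity discretized into quantile bins."""
    activity = np.asarray(activity, dtype=float)
    # Interior quantile edges give (roughly) equal-count activity bins
    edges = np.quantile(activity, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    act_bins = np.digitize(activity, edges)
    _, feat_codes = np.unique(np.asarray(feature), return_inverse=True)  # integer codes
    # Joint entropy from counts of unique (activity bin, feature) rows
    _, joint_counts = np.unique(np.stack([act_bins, feat_codes], axis=1), axis=0,
                                return_counts=True)
    p_joint = joint_counts / joint_counts.sum()
    h_joint = -np.sum(p_joint * np.log2(p_joint))
    h_feature = entropy_bits(feat_codes)  # assumed > 0 (feature takes at least two values)
    mi = entropy_bits(act_bins) + h_feature - h_joint
    return mi / h_feature
```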