Deep neural network approach captures tuning properties of individual monkey V4 neurons

a, Schematic illustrating experimental setup: Awake, head-fixed macaque monkeys were presented with static natural images after fixating for 300 ms (120 ms presentation time per image, 15 images per trial, 1200 ms inter-trial period), while neuronal activity in V4 was recorded using 32-channel probes. Animals fixated on a spot positioned such that the recorded neurons’ population receptive field was centered on the monitor. Post-hoc spike sorting resulted in single-unit activity of individual V4 neurons. b, Schematic illustrating model architecture: The pre-processed stimuli (100 × 100 pixel crops) and neuronal responses were used to train a neuron-specific read-out of a ResNet50 pre-trained on an image classification task. Specifically, we selected the ResNet50 layer with the best V4 predictions and computed the neuronal responses by passing the feature activations to a neuron-specific Gaussian readout and a subsequent non-linearity. Traces on the right show average responses (gray) to 75 test images of two example neurons and their corresponding model predictions (black). c, Schematic illustrating the 32 channels along the probe used for electrophysiological recordings and the number of recording sessions per monkey. In total, we recorded the single-unit activity of n=1,244 neurons. d, Explainable variance as a measure of response reliability to natural images plotted versus model prediction performance (correlation between prediction and average neural response to repeated presentations) of all cells. Dotted red line indicates a prediction performance of 0.3 used in subsequent analyses (explainable variance mean ± s.d. = 0.33 ± 0.19; correlation to average mean ± s.d. = 0.43 ± 0.21). e, Schematic illustrating optimization of most exciting images (MEIs). For each in silico neuron, we optimized its MEI using gradient ascent over n=100 iterations. The whole gray box (full extent) is 14.82° of visual angle in width and height. f, MEIs of ten example neurons.
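The gradient-ascent MEI optimization in (e) can be sketched as follows. This is a minimal numpy illustration only: the actual optimization backpropagates through the trained CNN model, whereas the linear "neuron" and the fixed-norm constraint below are purely illustrative stand-ins.

```python
import numpy as np

def optimize_mei(response_grad, x0, n_iter=100, lr=0.1, norm=10.0):
    """Gradient ascent on the input image to maximize a model neuron's
    response. The image is rescaled to a fixed norm after every step,
    a common way to keep the stimulus contrast bounded."""
    x = x0.copy()
    for _ in range(n_iter):
        x = x + lr * response_grad(x)
        x = x * norm / (np.linalg.norm(x) + 1e-8)
    return x

# Toy linear "neuron": response = w @ x, so the gradient is w and the
# optimized image should converge toward the direction of w.
rng = np.random.default_rng(0)
w = rng.standard_normal(100 * 100)
mei = optimize_mei(lambda x: w, rng.standard_normal(100 * 100))
```

With a real model, `response_grad` would be the gradient of the selected readout neuron's activation with respect to the input pixels.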

Closed-loop paradigm verifies model-derived optimal stimuli of single V4 neurons

a, Schematic illustrating closed-loop experimental paradigm for acute recordings in monkey V4. In brief, after recording and spike sorting of the “generation session”, we train a model, select neurons for experimental confirmation, generate MEIs and identify the most exciting natural image control stimuli, and present both MEIs and controls back to the animal while recording from the same neurons in the “verification session”. Functional and waveform matching of units across recordings is performed offline. b, MEI and the seven most exciting natural image crops, selected from 5,000 natural images, for four example neurons. Natural images were matched in size, location and contrast to the MEI. c, Peak-normalized recorded responses of the neurons in (b) to their MEI (orange) and control images (black; mean across n=20 repeats). d, Recorded versus predicted neuronal activity of two example neurons to their MEI and control stimuli, as well as to MEIs and control stimuli of other neurons. e, Scatter plot of model performance on the test set of natural images and the closed-loop stimuli (as shown in d, but for all neurons). Correlation to average: mean ± s.d. = 0.61 ± 0.11; synthesized and selected stimuli: mean ± s.d. = 0.61 ± 0.20; n = 55 neurons. A paired t-test showed no significant difference (p = 0.61). f, Distribution of peak-normalized mean responses to each neuron’s MEI and control stimuli, as well as MEIs and control stimuli of other neurons, for all closed-loop neurons (n = 55 neurons, n = 24 sessions, n = 1 monkey). P-values for paired t-tests are: MEI-Control, 3.22e-08; MEI-OtherMEIs, 2.57e-14; MEI-OtherControls, 3.06e-19; Control-OtherMEIs, 2.86e-07; Control-OtherControls, 1.62e-19; OtherMEIs-OtherControls, 2.99e-05. P-values were corrected for multiple comparisons with Bonferroni correction.
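The paired comparisons with Bonferroni correction in (e) and (f) can be sketched as follows. This is a hedged stand-in: the figure reports paired t-tests, while the sketch uses a distribution-free sign-flip permutation test on synthetic data to illustrate the same pairing and correction logic.

```python
import numpy as np

def paired_sign_flip_test(a, b, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test for paired samples, a
    distribution-free stand-in for a paired t-test."""
    rng = np.random.default_rng(seed)
    d = np.asarray(a, float) - np.asarray(b, float)
    observed = abs(d.mean())
    flips = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = np.abs((flips * d).mean(axis=1))
    return (1 + (null >= observed).sum()) / (n_perm + 1)

def bonferroni(pvals):
    """Bonferroni correction: scale each p-value by the number of tests."""
    p = np.asarray(pvals, float)
    return np.minimum(p * p.size, 1.0)

# Synthetic per-neuron responses (illustrative only, not the real data):
rng = np.random.default_rng(1)
mei_resp = rng.normal(1.0, 0.2, 30)    # responses to each neuron's own MEI
ctrl_resp = rng.normal(0.6, 0.2, 30)   # responses to control images
p = paired_sign_flip_test(mei_resp, ctrl_resp)
```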

Topographic organization of model-derived feature selectivity in macaque V4

a, MEIs of 17 neurons recorded in a single experimental session, arranged according to each neuron’s channel location along the recording probe. Numbers indicate channel, with higher channel numbers corresponding to greater recording depth. b, MEIs of varying numbers of neurons for four different sessions (indicated by different colors). c, Schematic illustrating the paradigm of the simple psychophysics experiment. In each trial, subjects were presented with MEIs of 9 neurons recorded within one session (left) or randomly sampled from all neurons except the target session (right), and reported the location (left or right) of the set of MEIs that looked more consistent (i.e., shared the same image features). The experiment included n=50 trials, one per session. d, Distribution of the fraction of sessions correctly identified across n=25 observers, with chance level and observer average indicated by dotted lines. Mean across subjects = 0.73; s.d. across subjects = 0.13; s.d. across sessions = 0.21.
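The comparison of observer accuracy against chance in (d) can be illustrated with a simple Monte-Carlo binomial test. This is not the paper's analysis, just a sketch of how far the reported average of 0.73 over 50 trials sits above chance guessing:

```python
import numpy as np

def p_above_chance(n_correct, n_trials, chance=0.5, n_sim=100_000, seed=0):
    """One-sided Monte-Carlo test: probability of observing at least
    `n_correct` correct responses out of `n_trials` under chance guessing."""
    rng = np.random.default_rng(seed)
    draws = rng.binomial(n_trials, chance, size=n_sim)
    return (draws >= n_correct).mean()

# An observer at roughly the reported average (0.73 * 50 ≈ 37 correct):
p = p_above_chance(37, 50)
```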

Contrastive clustering of MEIs confirms topographic organization of V4 visual tuning selectivity

a, Schematic illustrating contrastive learning approach to quantify MEI similarity. Per neuron, we optimize n=50 MEIs initialized with different random seeds, then select highly activating MEIs, and use one MEI per neuron (n=889) as a training batch. Each MEI is transformed and augmented twice, and the model’s objective is then to minimize the distance in a 2D MEI similarity space between different transforms of the same MEI, while maximizing the distance to MEI transforms of other neurons. b, Position of all highly activating MEIs (n=19,688) of n=889 neurons in a 2D MEI similarity space, with MEIs of five example neurons indicated in different colors. Dots of the same color indicate MEIs optimized from different random seeds of the same neuron. c, Schematic illustrating analyses performed on the 2D MEI similarity space. We computed the pairwise 2D distances across all MEIs of one neuron to estimate MEI consistency (left), and all pairwise distances across MEIs of the same recording session to estimate recording session consistency (right). For the latter, we used the distances across a random selection of neurons from other sessions as control. d, Distribution of embedding distances across MEIs of the same neuron. Vertical dotted line indicates the mean of the distribution. e, Mean distance across neurons from one example session (vertical line), with a null distribution generated by bootstrapping distances across the same number of neurons randomly sampled from all other sessions. Orange shading indicates values below the 5th percentile. Note that the null distribution depends on how many neurons were recorded in each session, as it estimates the standard error of the mean for each session. f, Histogram of session means as in (e), but for all sessions. Grand mean across all sessions is indicated by the vertical line. Mean ± s.d. = 423.97 ± 105.20.
g, Mean within-session distance across all sessions from (f) along with the mean null distribution across sessions in gray. The population mean significantly deviates from the null distribution (p < 4 × 10⁻⁵; 25,000 bootstrap samples). Orange shading indicates values below the 5th percentile. h, Percentage of sessions with a within-session distance below the 5th percentile of the null distribution for different numbers of neurons per session (x-axis) and different model prediction thresholds (shades of gray). The percentiles obtained from the embedding space (including all neurons above a prediction threshold of 0.3) were significantly correlated with the observer agreement (percent correct) of the psychophysics experiment (ρ = 0.33, p = 0.019, n=50 sessions).
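The contrastive objective in (a) can be sketched with a SimCLR-style NT-Xent loss on two augmented views. This toy numpy version is only an approximation: the paper's actual network, augmentations, and 2D embedding space are not reproduced here, and the random "embeddings" below are stand-ins.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style contrastive loss on two augmented views (rows of z1
    and z2 are embeddings of the same MEIs). Views of the same MEI are
    pulled together; views of other neurons' MEIs are pushed apart."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)   # exclude self-similarity
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

rng = np.random.default_rng(2)
z = rng.standard_normal((8, 16))
# Loss is low when the two views of each MEI nearly coincide,
# and higher when the pairing is destroyed:
aligned = nt_xent(z, z + 0.01 * rng.standard_normal((8, 16)))
mismatched = nt_xent(z, rng.standard_normal((8, 16)))
```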

V4 neurons cluster into distinct response modes that resemble feature maps of artificial vision systems

a, Position of all highly activating MEIs (n=19,688) of n=889 neurons in the 2D MEI similarity space, color coded based on cluster assignment obtained from the hierarchical clustering algorithm HDBSCAN. For n=12 of the clusters, we show a random selection of MEIs of different neurons assigned to each cluster. For examples of the other clusters, see Suppl. Fig. 4, and for independent verification of the clusters, see Suppl. Fig. 3. Light gray dots indicate MEIs that could not be assigned to any of the clusters with high probability. b, Feature visualizations of early- to mid-level units in the deep neural network InceptionV1 trained on an image classification task (Olah et al., 2020). Units are grouped into distinct categories based on (Olah et al., 2020), with clusters from (a) resembling these categories indicated below. c, Example units of the neural network trained on image classification compared with example MEIs exhibiting similar spatial patterns. The resemblance between the two can be used to generate hypotheses, such as predicting color boundary encoding in primate V4 neurons, that can subsequently be tested experimentally.
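The cluster assignment in (a) uses HDBSCAN; as a rough, dependency-light illustration of grouping nearby embeddings, the sketch below substitutes scipy's agglomerative (Ward) hierarchy. It is a stand-in, not the actual HDBSCAN pipeline, and the toy blobs are not real MEI embeddings.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def assign_clusters(embedding_xy, cut=2.0):
    """Group nearby 2D embeddings into clusters by cutting a Ward
    hierarchy at a fixed distance (illustrative stand-in for HDBSCAN)."""
    tree = linkage(embedding_xy, method="ward")
    return fcluster(tree, t=cut, criterion="distance")

# Two well-separated toy blobs of "MEI embeddings":
rng = np.random.default_rng(3)
blob_a = rng.normal(0.0, 0.1, size=(20, 2))
blob_b = rng.normal(5.0, 0.1, size=(20, 2))
labels = assign_clusters(np.concatenate([blob_a, blob_b]))
```

Unlike this fixed-cut hierarchy, HDBSCAN also leaves low-probability points unassigned, which is what the light gray dots in (a) represent.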

Spike waveform and functional matching of single units across recordings.

a, Schematic illustrating spike sorting of the closed-loop experimental paradigm. The “generation session” is spike sorted directly after the recording, resulting in “Sorting 1”. This data is then used for model training and optimization of MEIs, which are presented back to the animal during the “verification session”. The verification session recording starts immediately after the generation session recording ends, ensuring continuous monitoring of the recorded units over time. After the experiment, the generation and verification session recordings are concatenated (“full session”) and spike sorted, resulting in “Sorting 2”. b, Unit matching based on spike waveforms across Sorting 1 (generation session) and Sorting 2 (full session) for an example session. The left plot shows the percentage of spikes of the Sorting 1 units assigned to the units of Sorting 2. Units were assigned by passing the principal components of each spike, extracted using the Sorting 1 Gaussian mixture model (GMM), to the Sorting 2 model. For a potential match (orange), at least 95% of the spikes of a single unit of Sorting 1 had to be assigned to an individual unit of Sorting 2. The right plot shows the percentage of spikes of Sorting 2 units assigned to the units of Sorting 1. For a final match (red), at least 95% of the spikes of a single Sorting 2 unit had to be assigned to the potential match of Sorting 1. c, Distribution of correlations of mean test set responses for all final matches. We only included matched units in the analysis if their functional correlation was at least 0.5.
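The forward half of the matching criterion in (b) can be sketched as follows. This simplified version operates directly on per-spike unit labels; in the actual pipeline the labels come from passing spike principal components through the two GMMs, and a final match additionally requires the reverse (Sorting 2 → Sorting 1) check.

```python
import numpy as np

def potential_matches(labels_sort1, labels_sort2, threshold=0.95):
    """For each Sorting 1 unit, find the Sorting 2 unit that receives at
    least `threshold` of its spikes (the 'potential match' criterion).
    labels_sort1/labels_sort2 give per-spike unit assignments."""
    matches = {}
    for unit in np.unique(labels_sort1):
        dest, counts = np.unique(labels_sort2[labels_sort1 == unit],
                                 return_counts=True)
        best = counts.argmax()
        if counts[best] / counts.sum() >= threshold:
            matches[int(unit)] = int(dest[best])
    return matches

# Toy example: unit 0's spikes land almost entirely on unit 5 (98%),
# while unit 1's spikes split 50/50 and therefore fail the criterion.
s1 = np.array([0] * 100 + [1] * 100)
s2 = np.array([5] * 98 + [6] * 2 + [7] * 50 + [8] * 50)
m = potential_matches(s1, s2)
```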

Diverse model-derived stimuli of individual monkey V4 neurons.

a, For a set of 14 example neurons, we show 10 MEIs per neuron, generated from different starting points (random seeds) during MEI optimization. The different MEIs exhibit the same visual feature but differ somewhat in orientation, scale and position. b, Same as (a), for another set of 14 neurons.

Similarity of optimal stimuli in neuronal response space.

a, MEIs of three example neurons. Right shows a schematic illustrating how we compared the similarity of MEIs using representational similarity. In brief, each MEI was presented to the trained CNN model that was used to produce the MEIs, yielding a response vector. The response vectors were then compared using cosine similarity. b, Mean cosine similarity of MEIs within a single recording session (diagonal) and across recording sessions for n=88 sessions, peak-normalized per row. c, Distribution of percentiles of within-session similarity. For example, a percentile of 0.05 means that the MEI similarity within the session was larger than the MEI similarity to 95% of the other sessions. For n=30/55 sessions, the percentile was <0.05. d, Cosine similarity of MEIs of n=889 neurons, sorted based on cluster assignment (cf. Fig. 5). e, Mean cosine similarity of MEIs within a cluster (diagonal) and across clusters, peak-normalized per row. The matrix depicts the mean across n=100 similarity matrices, each generated based on a different random selection of MEIs per neuron. Top shows the distribution of cosine similarity within an example cluster (black) and the mean similarity to all other clusters (gray). f, Mean within-cluster and across-cluster similarity for all clusters.
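The representational-similarity comparison in (a) reduces to cosine similarity between model response vectors; a minimal numpy sketch (the hand-picked vectors below are stand-ins for the CNN's responses to each MEI):

```python
import numpy as np

def cosine_similarity_matrix(responses):
    """Pairwise cosine similarity between response vectors (one row per
    MEI), as used to compare MEIs in the model's response space."""
    r = responses / np.linalg.norm(responses, axis=1, keepdims=True)
    return r @ r.T

# Three stand-in response vectors: rows 0 and 2 are parallel (identical
# tuning up to gain), row 1 is orthogonal to both.
R = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [2.0, 0.0, 4.0]])
S = cosine_similarity_matrix(R)
```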

Overview of optimal stimuli of V4 response modes.

a, MEIs of example neurons for five response modes not shown in Fig. 5.