Figures and data

Experimental approach.
a: Experimental recordings from macaque visual areas V1 (453 neurons from 2 monkeys) and V4 (394 neurons from 2 monkeys) during presentation of natural images. Monkeys fixated on the screen (fixation spot shown in red). The stimulus was centered on the population receptive field of neurons recorded that day, indicated by the white circle. b: Architecture of functional “digital twin” models, consisting of a core shared across neurons (V1: first layer of ConvNext; V4: layer 3 of Robust ResNet50) and a neuron-specific readout. This design creates in silico neurons that model the response properties of individual biological neurons, with example receptive field locations of two neurons illustrated. c: Prediction accuracy of the model on test images not used during training, for both V1 (orange) and V4 (blue). Left panel shows distribution of single trial correlations, and right panel displays correlation to the average response across stimulus repeats. The dotted line indicates an inclusion threshold of 0.4. For further analysis, we only included neurons with a correlation to average above that threshold (n = 443 neurons for V1, n = 205 neurons for V4).
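For reference, a minimal sketch (not the authors' code; names are illustrative) of the inclusion criterion in panel c: model predictions are correlated with the repeat-averaged recorded response, and neurons are kept if that correlation exceeds 0.4.

```python
# Minimal sketch (not the authors' code): the "correlation to average" inclusion
# criterion from panel c. `responses` is a (n_repeats, n_images) array of recorded
# responses for one neuron; `predicted` is the model's (n_images,) prediction for
# the same images. Names are illustrative.
import numpy as np

def correlation_to_average(predicted, responses, threshold=0.4):
    """Correlate model predictions with the repeat-averaged recorded response."""
    avg_response = responses.mean(axis=0)            # average across stimulus repeats
    r = np.corrcoef(predicted, avg_response)[0, 1]   # Pearson correlation
    return r, bool(r > threshold)                    # correlation and inclusion decision

# Example with synthetic data
rng = np.random.default_rng(0)
signal = rng.normal(size=75)
responses = signal + rng.normal(scale=0.5, size=(10, 75))
predicted = signal + rng.normal(scale=0.3, size=75)
print(correlation_to_average(predicted, responses))
```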

Continuum of single neuron response sparseness in early and mid-level macaque visual cortex.
a, Response profiles of non-sparse (left) and sparse (right) V1 neurons. Curves display neuronal activity sorted from lowest to highest response, derived from model predictions across 1.2 million ImageNet images (gray, top row) and recorded responses to 75 test images (black, bottom row), averaged over stimulus repeats. We used responses to test images to obtain mean stimulus-driven activity and average out stimulus-unrelated signals. Skewness values quantify lifetime sparsity, with higher values indicating neurons that respond selectively to fewer stimuli while remaining silent to most others. Green dotted lines indicate recorded baseline firing rate (Hz) during grey screen presentation prior to stimulus onset. b, Comparable response profiles for representative V4 neurons, demonstrating similar variations in lifetime sparsity in this higher-order visual area, with some neurons showing broadly tuned responses and others exhibiting highly selective activation patterns. c, Correlation analysis between prediction-based and recording-based skewness values for qualifying V1 (orange, n = 443) and V4 (blue, n = 205) neurons. The strong correlation (r = 0.54 for V1 and r = 0.66 for V4, both p < 0.001) validates that our model accurately captures intrinsic sparsity characteristics across natural scenes, supporting the use of model-predicted responses for systematic in silico analyses across large image datasets. d, Population-level distribution of lifetime sparsity across V1 (orange) and V4 (blue) neuronal populations (V1: n = 443 neurons, V4: n = 205 neurons), revealing a continuous spectrum rather than discrete categories. Representative activation curves above illustrate how response profiles change along this continuum. Neurons with skewness below 2.0 are defined as non-sparse, though this threshold represents a point along a gradual transition. This distribution highlights the functional diversity within each visual area. e, Baseline firing rate extracted from a 300 ms fixation window before stimulus onset plotted versus skewness of predicted responses. V1 (orange): n = 443 neurons, V4 (blue): n = 205 neurons. R2 from exponential fit. Neurons with low baseline firing rates exhibit variable skewness that likely reflects the prevalence of their preferred features in natural scenes: rare features produce highly skewed responses while common features yield more symmetric distributions. f, Median predicted activity to natural scenes plotted versus baseline firing rate extracted from a 300 ms fixation window before stimulus onset. V1 (orange): n = 443 neurons, V4 (blue): n = 205 neurons. R2 from linear regression.
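A minimal sketch (assumed implementation; names are illustrative) of how lifetime sparsity can be quantified as in panels a-d: the skewness of a neuron's response distribution over a large image set, with 2.0 as the non-sparse cutoff.

```python
# Minimal sketch (assumed implementation): lifetime sparsity quantified as the
# skewness of a neuron's response distribution over a large image set, with a
# skewness of 2.0 used as the non-sparse cutoff (as in panel d).
import numpy as np
from scipy.stats import skew

def lifetime_sparsity(responses, nonsparse_threshold=2.0):
    """Skewness of the response distribution; higher values = sparser neuron."""
    s = skew(responses)
    return s, ("non-sparse" if s < nonsparse_threshold else "sparse")

# Synthetic examples: a broadly tuned and a highly selective response profile
rng = np.random.default_rng(1)
broad = rng.gamma(shape=5.0, scale=1.0, size=100_000)
selective = rng.gamma(shape=0.3, scale=5.0, size=100_000)
print(lifetime_sparsity(broad), lifetime_sparsity(selective))
```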

Two complementary methods to study neuronal selectivity: optimization-based feature visualization and large-scale image screening.
a, Schematic of the feature visualization procedure, in which a starting noise image is iteratively optimized using gradient ascent on a neural predictive model to either maximize or minimize the activity of a single neuron. This yields the most exciting input (MEI) or the least exciting input (LEI), respectively. The process begins with a noise image and updates pixel values until the neuron's predicted activation is maximized (for MEIs) or minimized (for LEIs). Example shows an LEI for a V4 neuron. b, Schematic of the image screening procedure, in which a large dataset of 1.2 million ImageNet images is used to probe neuronal responses. Each image elicits a predicted neuronal response from the model, allowing construction of a response profile across the dataset. Sorting these responses identifies the most activating (MAI) and least activating (LAI) images for each neuron.
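The two procedures in panels a and b can be sketched as follows (a hedged illustration, not the paper's exact optimizer, step count, or image set; `model` stands for any differentiable digital-twin predictor returning per-neuron responses):

```python
# Hedged sketch of both procedures; `model` maps an image batch to per-neuron
# responses. Optimizer, learning rate, and step count are illustrative.
import torch

def optimize_input(model, neuron_idx, image_shape=(1, 3, 100, 100),
                   n_steps=500, lr=0.05, maximize=True):
    """Gradient ascent (or descent) on the pixels to obtain an MEI (or LEI)."""
    img = torch.randn(image_shape, requires_grad=True)   # start from noise
    opt = torch.optim.Adam([img], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        activation = model(img)[0, neuron_idx]
        loss = -activation if maximize else activation    # maximize or minimize
        loss.backward()
        opt.step()
    return img.detach()

def screen_images(model, images, neuron_idx):
    """Rank a (large, in practice batched) image set to find the MAI and LAI."""
    with torch.no_grad():
        responses = model(images)[:, neuron_idx]
    order = torch.argsort(responses)                       # ascending
    return images[order[-1]], images[order[0]]             # MAI, LAI
```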

Identification of most and least activating stimuli of macaque V1 neurons.
Least (left) and most (right) activating inputs for five example V1 neurons. For each neuron, the top row shows optimized images generated from different initialization (i.e., noise) seeds (LEIs on the left, MEIs on the right), and the bottom row shows the least and most activating images identified by screening 1.2 million ImageNet images (LAIs on the left, MAIs on the right). Images are 2.3 × 2.3 degrees visual angle, with each neuron's receptive field located in the center of the image.

Identification of most and least activating stimuli of macaque V4 neurons.
Least (left) and most (right) activating inputs for five example V4 neurons. For each neuron, the top row shows optimized images generated from different initialization (i.e., noise) seeds (LEIs on the left, MEIs on the right), and the bottom row shows the least and most activating images identified by screening 1.2 million ImageNet images (LAIs on the left, MAIs on the right). Images are 14.81 × 14.81 degrees visual angle, with each neuron's receptive field located in the center of the image.

High and low activity reflect structured and perceptually coherent feature combinations.
a, Schematic illustrating the computation of image similarities. All naturalistic rendered images were embedded into DreamSim, a perceptual similarity space fine-tuned on human judgments. Within this high-dimensional space, we computed cosine similarity among the top 10 most activating images (MAIs) and among the least activating images (LAIs), as well as their similarity to random images. b, Distributions of cosine similarity among MAIs (top 10 images, red) and between MAIs and random images (gray). These distributions were used to compute discriminability using the d-prime metric. c, d-prime values for MAIs and LAIs across all non-sparse V1 and V4 neurons. Gray bars indicate a control condition comparing similarities between random image sets. Across the population, d-prime values for both MAIs and LAIs were significantly higher in V1 and V4 compared to random image sets (two-sample t-test, p < 0.001). After applying false discovery rate correction, all V4 neurons (n = 168) retained p-values below 0.05, indicating that the likelihood of observing such discriminability by chance was consistently low. In the larger population, n = 293 out of n = 315 neurons showed significant p-values for MAIs, and n = 281 neurons showed significant p-values for LAIs.
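A minimal sketch of the similarity and d-prime computations in panels a-c, under the assumption that `mai_embeddings` and `random_embeddings` hold DreamSim vectors for a neuron's top-10 MAIs and for randomly drawn images:

```python
# Minimal sketch of the discriminability analysis; `mai_embeddings` (top-10 MAIs)
# and `random_embeddings` are assumed DreamSim vectors, one row per image.
import numpy as np

def pairwise_cosine(x):
    """Cosine similarities among all pairs within one set of embeddings."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = x @ x.T
    return sims[np.triu_indices(len(x), k=1)]

def cross_cosine(a, b):
    """Cosine similarities between two sets of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a @ b.T).ravel()

def d_prime(target, null):
    """Discriminability of within-MAI similarities from MAI-to-random similarities."""
    pooled_sd = np.sqrt(0.5 * (target.var(ddof=1) + null.var(ddof=1)))
    return (target.mean() - null.mean()) / pooled_sd

# d = d_prime(pairwise_cosine(mai_embeddings),
#             cross_cosine(mai_embeddings, random_embeddings))
```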

Model predictions accurately identify extreme stimuli in recorded neuronal responses from V1 and V4.
a, Left: Model-predicted responses to 75 test ImageNet images for an example non-sparse (top) and sparse (bottom) V1 neuron. The predicted least and most activating test images are marked in blue and red, respectively. Right: Actual recorded responses of the same neurons to the same test images, averaged across stimulus repeats and sorted by recorded response magnitude. The blue and red dots indicate the images that the model predicted to elicit the lowest and highest responses. b, Distribution of response percentiles in recorded data for the predicted least (blue) and most (red) activating images, shown separately for non-sparse (top) and sparse (bottom) V1 neurons. Note that a non-selective ordering would be expected to yield a uniform distribution, similar to the blue distribution observed here. c, Same as (b), but for non-sparse and sparse neurons in area V4.
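The percentile analysis in panels b and c can be sketched as follows (illustrative names; `recorded` and `predicted` are one neuron's repeat-averaged recorded and model-predicted responses to the 75 test images):

```python
# Illustrative sketch: percentile rank, within one neuron's recorded responses,
# of the test images the model predicted to be least and most activating.
import numpy as np
from scipy.stats import percentileofscore

def extreme_image_percentiles(recorded, predicted):
    most_idx = int(np.argmax(predicted))    # image predicted most activating
    least_idx = int(np.argmin(predicted))   # image predicted least activating
    return (percentileofscore(recorded, recorded[most_idx]),
            percentileofscore(recorded, recorded[least_idx]))
```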

Independent evaluator models confirm the identification of optimal stimuli for V1 and V4 neuronal responses.
a, Schematic of the verification pipeline. Least and most activating images for V1 neurons were identified using a generator model through both optimization and screening of images. These images were then passed to an independent evaluator model, comprising a separate model ensemble with the same core architecture but a readout trained from scratch. b, Example V1 neuron. Left: Distribution of the evaluator model's predicted responses to size- and contrast-matched natural images, with its predicted responses to the MEI and LEI (identified by the generator model) highlighted. Right: Distribution of response percentiles for MEIs, LEIs, MAIs, and LAIs as predicted by the evaluator model, relative to the distribution of natural image responses. MAIs and LAIs were identified based on 200k naturalistic rendered images. c,d, As in (a,b), but for V4 neurons. The evaluator model in this case had a distinct architecture and training objective and was trained from scratch on the neural data.

Responses of V4 neurons vary continuously depending on preferred and non-preferred stimuli.
a, Schematic illustrating construction of the 2D image similarity space for each neuron. Each of ∼200k naturalistic rendered images was assigned x- and y-coordinates based on its cosine similarity (computed using DreamSim) to the neuron’s least activating image (LAI, x-coordinate) and most activating image (MAI, y-coordinate), respectively. MAIs and LAIs were defined from the same image set. b, Example 2D similarity space for a non-sparse V4 neuron. Bins are color-coded by the mean predicted neuronal activity of the images within each bin. The arrow denotes the principal activity gradient, and R2 indicates the variance explained by a linear fit. The color bar spans the 0.1 to 99.9th percentile of neuronal responses across all images. c, Same as (b), but for a sparse V4 neuron. Here, the activity gradient aligns primarily along the y-axis, indicating stronger modulation by similarity to the MAI than to the LAI. d, Variance explained (R2) by linear regression predicting neuronal activity from the 2D similarity space (as in b,c), shown separately for sparse (dark blue) and non-sparse (light blue) V4 neurons. e, Results of three control analyses validating the similarity space for the V4 neuron shown in panel b. Control 1 replaces both MAIs and LAIs with random images. Control 2 replaces MAIs with random images but retains LAIs. Control 3 replaces LAIs with random images but retains MAIs. f, Variance explained (R2) by linear regression for the original analysis (similarity to MAIs and LAIs on x- and y-axes) and for Control 1 (left), Control 2 (middle), and Control 3 (right). Sparse (dark blue) and non-sparse (light blue) neurons are shown separately. Lines indicate linear regression fits, with slopes reported in each panel.
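A minimal sketch (assumed inputs) of the regression underlying panels b-d: each image's coordinates are its DreamSim cosine similarities to the neuron's LAI (x) and MAI (y), and the reported R2 is the variance in predicted activity explained by a linear fit on those two coordinates.

```python
# Minimal sketch (assumed inputs): variance in predicted activity explained by a
# linear fit on the 2D similarity coordinates. `sim_to_lai`, `sim_to_mai`, and
# `predicted_activity` are (n_images,) arrays for one neuron.
import numpy as np
from sklearn.linear_model import LinearRegression

def similarity_space_r2(sim_to_lai, sim_to_mai, predicted_activity):
    X = np.column_stack([sim_to_lai, sim_to_mai])    # (n_images, 2) coordinates
    fit = LinearRegression().fit(X, predicted_activity)
    return fit.score(X, predicted_activity)           # R2 of the linear fit
```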

Most and least activating stimuli reveal distributed feature tuning across neuronal populations.
a, Examples of least activating images (left) and most activating images (right) for two V1 neurons. Arrows indicate the perceptual similarity between MAIs and LAIs shared across both neurons. b, Distribution of response percentiles evoked by MAIs, LAIs, and randomly sampled natural images across the V1 neuronal population. The left histogram shows the probability that one neuron’s MAI will elicit a specific response percentile in other neurons. Response percentiles were calculated from each neuron’s responses to 1.2 million ImageNet images. Data represent population means with 99% confidence intervals. c,d, Same analyses as in (a,b) applied to neurons recorded in visual area V4. e, Schematic illustrating how we used the MAIs and LAIs identified from neurons recorded in one monkey to predict the responses of neurons recorded in another monkey. The resulting response percentiles indicate how these images ranked compared to the predicted responses to 1.2 million ImageNet images, as shown in panels (f,g). f, Cross-animal generalization: probability that MAIs (left) and LAIs (right) from V1 neurons in monkey A will evoke specific response percentiles in V1 neurons from monkey B. g, Same cross-animal analysis as in (e), applied to V4 neurons.

Dual-feature selectivity in the mouse visual cortex.
a, Experimental recordings from mouse visual areas V1 (598 neurons), LM (350 neurons), and LI (126 neurons) from 8 C57BL/6 mice during presentation of natural images. Mice were head-fixed on a treadmill and passively viewed the stimuli presented on a screen in front of them. b, Prediction accuracy of the model on test images not used during training, for V1 (orange), LM (cyan), and LI (blue), shown as the correlation to the average response across stimulus repeats. The dotted line indicates an inclusion threshold of 0.4. For further analysis, we only included neurons with a correlation to average above that threshold (n = 561 neurons for V1, n = 325 neurons for LM, n = 113 neurons for LI). c, Correlation analysis between prediction-based and recording-based skewness values for qualifying V1 (orange, n = 561), LM (cyan, n = 325), and LI (blue, n = 113) neurons. Similar to the primates, there is a strong correlation (r = 0.834 for V1, r = 0.828 for LM, and r = 0.801 for LI, all p < 0.001), indicating that the model accurately captures intrinsic sparsity characteristics across natural scenes. d, Population-level distribution of lifetime sparsity across V1 (orange), LM (cyan), and LI (blue) neuronal populations, revealing a continuous spectrum rather than discrete categories. Neurons with skewness below 2.0 are defined as non-sparse. e, Response profiles of non-sparse (left) and sparse (right) V1 neurons. Curves display neuronal activity sorted from lowest to highest response, derived from model predictions across 200,000 ImageNet images (gray, top row) and recorded responses to 100 test images (black, bottom row), averaged over stimulus repeats. Skewness values quantify lifetime sparsity, as in Fig. 2a,b. f,g, Same as (e) for LM neurons (f) and LI neurons (g). h, Baseline firing rate extracted from a 200 ms window before stimulus onset plotted versus skewness of predicted responses. V1 (orange, n = 561), LM (cyan, n = 325), LI (blue, n = 113) neurons. R2 from exponential fit. i, Least (blue, LAI) and most (red, MAI) activating images for 4 example non-sparse V1 neurons. LAIs and MAIs were identified through screening of 200,000 ImageNet images. Images are 84 × 114 degrees visual angle. j,k, Same as (i) for LM neurons (j) and LI neurons (k). l, Distribution of response percentiles evoked by MAIs, LAIs, and random images across the V1 neuronal population. The left histogram shows the probability that one neuron's MAI will elicit a specific response percentile in other neurons. Response percentiles were calculated from each neuron's responses to 200,000 ImageNet images. Data represent population means with 99% confidence intervals. m,n, Same analysis as (l) for LM neurons (m) and LI neurons (n).

Single neuron coding strategies in sparse and non-sparse visual neurons.
This schematic illustrates how sparse and non-sparse neurons exhibit selectivity for distinct concepts, where each concept (e.g., C1, C2) represents a specific combination of latent visual features such as color, shape, and texture (dimensions known to be encoded in area V4). These concepts can be thought of as points in a high-dimensional perceptual space. Left: Sparse neurons respond selectively and strongly to a single concept (e.g., C1: green dot texture) and remain silent for most other stimuli. These neurons exhibit high lifetime sparseness and encode only a narrow portion of stimulus space. Right: Non-sparse neurons may exhibit dual-feature selectivity, characterized by excitation to one concept (e.g., C2: concave curvature) and suppression to another (e.g., C1). These neurons tend to maintain non-zero baseline activity and modulate their firing rates bidirectionally, enabling graded responses to a broader range of stimuli. This bidirectional modulation likely reflects feature-selective excitation (red arrows) and inhibition (blue arrows), with firing rates encoding the similarity of each stimulus to both excitatory and suppressive features. Sparse and non-sparse neurons are intermingled within the same cortical area, forming a distributed population code in which different neurons anchor their selectivity to partially overlapping sets of concepts. This organizational motif of shared feature selectivity and bidirectional modulation appears conserved across visual cortical areas, including earlier regions such as V1, though the underlying concepts differ with the feature space represented at each stage.

Relationship between model performance and response skewness
a, Single trial correlation (left) and correlation to average (right) of the V1 model plotted versus skewness of recorded V1 responses. b, Same as (a), but for the V4 model and responses.

Most and least activating images of example V1 neurons
Least activating (left) and most activating (right) images of ten example V1 neurons. Per neuron, the top row shows optimized images (LEIs and MEIs) and the bottom row shows screened ImageNet images (LAIs and MAIs). Each image is 2.3 × 2.3 degrees visual angle, with the receptive field of the neuron in the center.

Most and least activating images of example V4 neurons
Least activating (left) and most activating (right) images of ten example V4 neurons. Per neuron, the top row shows optimized images (LEIs and MEIs) and the bottom row shows screened ImageNet images (LAIs and MAIs). Each image is 14.83 × 14.83 degrees visual angle, with the receptive field of the neuron in the center.

Most and least activating images of simulated simple and complex cells
a, Example simulation of a V1 simple and complex cell based on the same Gabor receptive field (left). Activations to 200,000 naturalistic images were computed as the dot product between the Gabor filter and each image. For simple cells, this revealed least (blue) and most (red) activating images with a bimodal response distribution, with LAIs sharing the same orientation as the MAIs but differing in phase. Complex cells were modeled by squaring and pooling responses across different phases, resulting in sparse activation profiles without coherent LAIs.
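A minimal sketch (illustrative filter parameters, not those used in the simulation) of the simple- and complex-cell models described above: simple-cell responses as the dot product between a Gabor filter and each image, and complex-cell responses as squared responses pooled across Gabor phases (an energy model).

```python
# Illustrative sketch of the simulation: a simple cell as a linear Gabor filter
# and a complex cell as an energy model (squared responses pooled over phase).
# Filter parameters are placeholders; `images` is an (n, size, size) array.
import numpy as np

def gabor(size=32, wavelength=8.0, sigma=6.0, theta=0.0, phase=0.0):
    ax = np.arange(size) - size / 2
    x, y = np.meshgrid(ax, ax)
    xr = x * np.cos(theta) + y * np.sin(theta)      # coordinate along the carrier
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength + phase)

def simple_cell(images, filt):
    """Linear response: dot product of each image with the Gabor filter."""
    return images.reshape(len(images), -1) @ filt.ravel()

def complex_cell(images, phases=(0.0, np.pi / 2)):
    """Energy model: square and pool simple-cell responses across phases."""
    return sum(simple_cell(images, gabor(phase=p)) ** 2 for p in phases)
```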

Responses of V1 neurons vary continuously depending on preferred and non-preferred stimuli
a, Example 2D DreamSim similarity space (cf. Fig. 9) for a non-sparse V1 neuron. Bins are color-coded by the mean predicted neuronal activity of the images within each bin. The R2 indicates the variance explained by a linear fit. The color bar spans the 0.1 to 99.9th percentile of neuronal responses across all images. b, Same as (a), but for a sparse V1 neuron. Here, the activity gradient aligns primarily along the y-axis, indicating stronger modulation by similarity to the MAI than to the LAI. c, Variance explained (R2) by linear regression predicting neuronal activity from the 2D similarity space (as in a,b), shown separately for sparse (dark orange) and non-sparse (light orange) V1 neurons. The explained variance values are generally lower for V1 compared to V4 neurons (cf. Fig. 9), likely because the DreamSim representational space is more closely aligned with the representational space of V4 than with that of V1. d, Variance explained (R2) by linear regression for the original analysis (similarity to MAIs and LAIs on x- and y-axes) and for Control 1, where both MAIs and LAIs were replaced with random images.