Neural response properties in visual cortex that affect perception under challenging viewing conditions.

A: Humans are able to recognize temporally varying targets (for example, an approaching car) under suboptimal viewing conditions (for example, mist). B: Temporal adaptation refers to the reduction of neural responses when stimuli are repeated, with more pronounced reductions for similar than for different repeated inputs. C: Contrast modulation is characterized by the contrast response function; neural responses are reduced for low input contrast.
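The contrast response function referenced in panel C is commonly modeled with a Naka-Rushton form; the snippet below is a minimal illustrative sketch of that behavior, where the parameter values (r_max, c50, n) are arbitrary assumptions and not values fitted in this study.

```python
import numpy as np

def contrast_response(c, r_max=1.0, c50=0.3, n=2.0):
    """Illustrative Naka-Rushton contrast response function: responses grow
    with input contrast c (0-1) and saturate at r_max. Parameter values are
    hypothetical, not fitted to the data in this study."""
    c = np.asarray(c, dtype=float)
    return r_max * c**n / (c**n + c50**n)

# Responses are reduced for low input contrast and approach saturation at high contrast.
print(contrast_response([0.1, 0.5, 0.9]))
```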

Experimental design and network modeling.

A: Object recognition task with test images consisting of objects (digits 3, 6, 8 and 9 from the MNIST dataset) embedded in a pixelated noise pattern. The contrast of the digit image was varied (50%, 60%, 70%, 80% and 90%). B: Adaptation trials, consisting of the presentation of the same (left) or different (right) noise prior to the test image. C: Control trials, consisting of the presentation of the test image in isolation. D: DCNNs were trained with a feedforward backbone consisting of three convolutional layers, one fully connected layer and a readout layer. After each convolution, history-dependent adaptation was applied (depicted in green), feeding back activations from the previous model time step, i.e. the previous feedforward pass. Filter sizes are given within parentheses. E: Temporal adaptation mechanisms as implemented by Vinken et al. (2020). Left, additive suppression, whereby each unit feeds back its own activation from the previous time step. Right, lateral recurrence, which feeds unit activations across feature maps. F: Divisive normalization introduced by Heeger (1992, 1993), which feeds back activations from the previous time step using divisive rather than additive suppression, here implemented as a recursive multiplicative feedback signal (see Methods: DCNN modeling).
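As a rough illustration of the history-dependent adaptation applied after each convolution, the sketch below implements additive suppression and divisive normalization updates on toy activations. The update rules and the roles of α, β, K and σ are assumptions in the spirit of Vinken et al. (2020) and Heeger (1992, 1993); the exact parameterization used here is described in Methods: DCNN modeling, so treat this as a minimal sketch rather than the paper's implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def additive_suppression(drive, alpha=0.9, beta=0.7):
    """Hypothetical intrinsic adaptation with additive suppression: each unit
    subtracts an exponentially decaying trace of its own past responses."""
    s = np.zeros_like(drive[0])          # suppression state per unit
    responses = []
    for d in drive:                      # one feedforward drive per time step
        r = relu(d - beta * s)           # response suppressed by its own history
        s = alpha * s + (1 - alpha) * r  # update the history trace
        responses.append(r)
    return responses

def divisive_normalization(drive, alpha=0.9, K=1.0, sigma=0.1):
    """Hypothetical history-dependent divisive normalization: past responses
    divide, rather than subtract from, the current drive."""
    g = np.zeros_like(drive[0])          # normalization pool state
    responses = []
    for d in drive:
        r = relu(d) / (sigma + K * g)    # divisive rather than additive suppression
        g = alpha * g + (1 - alpha) * r  # recursive feedback signal
        responses.append(r)
    return responses

# Example: the same unit drive over three "time steps" (adapter, blank, test).
drive = [np.ones(4), np.zeros(4), np.ones(4)]
print(additive_suppression(drive)[-1], divisive_normalization(drive)[-1])
```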

Effects of temporal adaptation on categorization performance and reaction times for noise-embedded objects.

A: Accuracy across contrast levels for test images shown after a blank adapter (grey) or after a noise adapter with the same (blue) or different (yellow) noise as the test image. The solid red line shows performance for objects without noise; the dotted black line shows chance level (25%). Each point depicts an individual subject. B: Same as (A) but for reaction times. Adaptation to the same noise improves recognition performance most at higher contrast levels, while adaptation to both same and different noise results in faster reaction times overall. Linear mixed-effects model, post-hoc Tukey test, ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001. This figure can be reproduced by mkFigure3.py.
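The statistical comparison named in the caption (a linear mixed-effects model followed by post-hoc tests) could in principle be set up as below; the column names, simulated effect sizes and random-intercept structure are illustrative assumptions, not the analysis code used for this figure.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: accuracy per subject, adapter type and contrast
# (column names and simulated effects are assumptions, not the paper's data).
rng = np.random.default_rng(0)
rows = []
for subj in range(20):
    for adapter in ["blank", "same", "different"]:
        for contrast in [50, 60, 70, 80, 90]:
            acc = 0.3 + 0.005 * contrast + (0.05 if adapter == "same" else 0.0)
            rows.append({"subject": subj, "adapter": adapter,
                         "contrast": contrast,
                         "accuracy": acc + rng.normal(0, 0.05)})
df = pd.DataFrame(rows)

# Linear mixed-effects model with a random intercept per subject.
fit = smf.mixedlm("accuracy ~ adapter * contrast", df, groups=df["subject"]).fit()
print(fit.summary())
```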

Temporal adaptation results in enhanced object-evoked activity at parieto-occipital electrodes.

A: Topomaps showing ERP amplitudes for test images presented without an adapter (blank, left) and for test images presented after adapting to the same (middle) or different noise (right). B: Difference in evoked potentials to test images with and without adapters (left, same noise versus blank; right, different noise versus blank). C: Left, average ERP per condition for electrodes Iz, Oz, O1 and O2. Middle, differences in ERPs with and without adaptation. Time windows in which the same- (s) and different-noise (d) conditions differ significantly from the blank (b) condition are depicted by shaded red areas. Right, average ERP amplitudes for same versus different noise adaptation within the identified time windows. Shaded regions and error bars depict the SEM across subjects. D: Same as C, but for electrodes P9 and P10. Adaptation-specific differences arise in parietal electrodes later in the response, with stronger deflections when adapting to the same as opposed to different noise. T-test (two-sided), ∗∗ p < 0.01. This figure can be reproduced by mkFigure456.py.

Responses in (occipito-)parietal, but not occipital, electrodes are modulated by the contrast level of noise-embedded objects.

A: Topomaps representing ERPs during presentation of the test image for three object contrast levels. B: Left, ERPs shown separately per contrast level for P9 and P10. Right, response magnitude computed as the mean amplitude of the P300 component. C-D: Same as B for occipito-parietal electrodes (C), including Pz, P1, P2, P3 and P4, and occipital electrodes (D), including Iz, Oz, O1 and O2. The P300 component of (occipito-)parietal electrodes is modulated by object contrast. One-sample t-test (linear-fit coefficient against 0), ∗∗∗ p < 0.001. This figure can be reproduced by mkFigure456.py.
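The contrast-modulation statistic reported here (a one-sample t-test on per-subject linear-fit coefficients against 0, applied to P300 mean amplitudes) can be sketched as below; the array shapes, sampling rate and P300 window are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical ERP array: subjects x contrast levels x time points, 1 s epochs
# (shape, sampling rate and P300 window are placeholders, not the study's values).
fs = 512
erps = np.random.default_rng(1).normal(size=(20, 5, fs))
contrasts = np.array([50, 60, 70, 80, 90])

# Mean amplitude in a P300-like window (e.g. 300-500 ms after stimulus onset).
window = slice(int(0.3 * fs), int(0.5 * fs))
p300 = erps[:, :, window].mean(axis=2)                 # subjects x contrast levels

# Per subject, fit a line over contrast; test the slopes against 0 across subjects.
slopes = np.polyfit(contrasts, p300.T, deg=1)[0]
print(stats.ttest_1samp(slopes, popmean=0))
```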

More pronounced object contrast modulation at parietal electrodes after same-noise adaptation.

A: ERPs for P9 and P10 shown separately for trials without adaptation (left) and for adaptation with the same (middle) or different (right) noise as the test image. The shaded regions depict the SEM across subjects. B: Response magnitude computed as the mean amplitude of the P300 component for the same experimental conditions as in panel (A). ERP signals exhibit significant object contrast-dependent modulation, evident as increasingly negative response deflections for higher contrast levels, only after adapting to the same noise. One-sample t-test (linear-fit coefficient against 0), ∗∗ p < 0.01. This figure can be reproduced by mkFigure456.py.

Temporal adaptation improves decoding of objects from EEG responses.

A: Average decoding accuracy for predicting the presented object class from evoked potentials for test images without (dark grey) and with embedded noise, including blank (light grey), same- (blue) and different-noise (orange) trials. Decoding accuracies are averaged over the [0, 1] s time window after stimulus onset. The lower asterisks denote significant differences from chance level (0.25, one-sided t-test). The upper asterisks denote significant differences across trial types (one-way ANOVA, post-hoc Tukey test). ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001. B: Decoding accuracy for the different trial types across time points, with the number of time points for which decoding accuracy was significantly different from chance level (i.e. 25%) noted on the right. Shaded regions depict the SEM across subjects. This figure can be reproduced by mkFigure7_SFig2.ipynb.
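A minimal sketch of the kind of time-resolved decoding described here, assuming a logistic-regression classifier on channel patterns per time point; the classifier choice, data shapes and cross-validation scheme are assumptions and not necessarily those used for the figure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical single-subject data: trials x channels x time points and a digit
# label (3, 6, 8 or 9) per trial; all shapes here are placeholders.
rng = np.random.default_rng(0)
epochs = rng.normal(size=(200, 64, 512))          # 200 trials, 64 channels, 1 s
labels = rng.choice([3, 6, 8, 9], size=200)

# Time-resolved decoding: fit a classifier on the channel pattern at each
# (subsampled) time point and average accuracy over the [0, 1] s window.
scores = []
for t in range(0, epochs.shape[-1], 32):
    clf = LogisticRegression(max_iter=1000)
    scores.append(cross_val_score(clf, epochs[:, :, t], labels, cv=5).mean())
print("mean decoding accuracy:", np.mean(scores), "(chance = 0.25)")
```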

DCNNs with extended intrinsic adaptation show human-like benefits in performance and neural object representations.

A: Human object recognition performance for same- (blue) and different- (yellow) noise trials (same data as in Fig. 3A). B: Network classification accuracy on the test set after same- and different-noise adapters for DCNNs without a temporal adaptation mechanism. C: Network classification accuracy for DCNNs with one of three temporal adaptation mechanisms (from left to right): additive suppression, lateral recurrence and divisive normalization. Input sequences consisted of three images: an adapter (A), a blank (B) and a test (T) image (ABT). D: Same as panel C for networks trained on input sequences of 21 images (AAAAAAAAAAAAAAABTTTTT). Networks with intrinsic adaptation mechanisms better approximate human behavior, showing an increasing benefit of temporal adaptation at higher object contrasts. E: Decoding accuracy for the first convolutional layer on the test set for same (blue) and different (yellow) noise adapters for DCNNs with temporal adaptation. Decoding performance for later layers is shown in Supp. Fig. 8. Adapting to the same noise leads to better object decoding for all adaptation mechanisms. Panels A-D can be reproduced by mkFigure8ABCD.py and panel E can be reproduced by mkFigure8E_SFig8.py.
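The short and long input sequences referenced in panels C and D can be assembled as in the sketch below; the helper name and the toy images are hypothetical and only illustrate the ABT versus 21-frame structure.

```python
import numpy as np

def make_sequence(adapter, test, blank=None, long=False):
    """Hypothetical assembly of model input sequences: a short adapter-blank-test
    (ABT) sequence, or a long 21-frame sequence (15 adapters, 1 blank, 5 tests)."""
    if blank is None:
        blank = np.zeros_like(test)
    if long:
        return [adapter] * 15 + [blank] + [test] * 5      # AAAAAAAAAAAAAAABTTTTT
    return [adapter, blank, test]                         # ABT

# Toy 28x28 example: same-noise trials reuse the test image's noise pattern as
# the adapter; different-noise trials would use an independently drawn pattern.
rng = np.random.default_rng(1)
noise = rng.uniform(size=(28, 28))
test_img = np.clip(noise + 0.5 * rng.uniform(size=(28, 28)), 0, 1)  # digit stand-in
print(len(make_sequence(noise, test_img, long=True)))               # 21 frames
```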

DCNNs with divisive normalization show higher robustness against spatial shifts of input.

A: Average accuracy across spatial shifts for DCNNs with temporal adaptation trained on short (n = 3, ABT) or long (n = 21, AAAAAAAAAAAAAAABTTTTT) input sequences. Accuracy is normalized with respect to no shift. As the sequence length increases, DCNNs with divisive normalization (DN) become more robust to spatially shifted noise at test time than DCNNs with additive suppression (AS) or lateral recurrence (LR). Error bars show the SEM. Independent t-test, ∗∗∗ p < 0.001. B: Effect of spatially shifting the noise in the test image on network performance. Depicted is the drop in accuracy compared to a 0 pixel shift for the three temporal adaptation mechanisms optimized on the different sequence lengths. This figure can be reproduced by mkFigure9.py.
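The spatial-shift manipulation and the accuracy drop in panel B can be sketched as below; the shift implementation (a circular roll) and the accuracy values are illustrative assumptions, not the paper's results.

```python
import numpy as np

def shift_noise(noise, shift):
    """Hypothetical spatial shift of the test-time noise pattern: circularly
    roll the noise image by `shift` pixels along both spatial axes."""
    return np.roll(noise, shift=(shift, shift), axis=(0, 1))

# Normalized accuracy (panel A) and accuracy drop (panel B) relative to no shift;
# the accuracy values below are made-up placeholders.
shifts = np.array([0, 1, 2, 4, 8])
accuracy = np.array([0.92, 0.90, 0.87, 0.80, 0.70])
print("normalized accuracy:", accuracy / accuracy[0])
print("accuracy drop vs. no shift:", accuracy[0] - accuracy)
```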

Number of trainable parameters per model.

Shown are the numbers of parameters for a feedforward model without (baseline) and with temporal adaptation, arising from intrinsic (int) or recurrent (rec) mechanisms.

Temporal adaptation parameters for DCNNs with additive suppression and divisive normalization.

A: Trained parameter values for each convolutional layer for DCNNs with additive suppression, including α and β. Each row depicts a different sequence length and error bars depict the SEM across network initializations (n = 5). B: Same as panel A for DCNNs with divisive normalization, including the temporal adaptation parameters α, K and σ. C: Parameter values for DCNNs with additive suppression averaged across the convolutional layers. Error bars depict the SEM across network initializations (n = 5). D: Same as panel C for DCNNs with divisive normalization. This figure can be reproduced by mkSFigure1.py.

Object-related representations in the neural data per object contrast.

A: Average decoding accuracy for predicting the presented object class from evoked potentials for test images varying in object contrast level. Decoding accuracies are averaged over the [0, 1] s time window after stimulus onset. B: Decoding accuracy for the different object contrast levels, with the number of time points for which decoding accuracy was significantly different from chance level (i.e. 25%) noted at the top left. Shaded regions depict the SEM across subjects. This figure can be reproduced by mkFigure7_SFig2.ipynb.

Model activations.

A: Activations in the first convolutional layer for the test set for same (blue) and different (yellow) noise adapters for the first DCNN initialization without a temporal adaptation mechanism. Results are shown for three object contrast levels: 10%, 50% and 90%. B-D: Same as panel A for a DCNN initialization endowed with a temporal adaptation mechanism: additive suppression (B), lateral recurrence (C) or divisive normalization (D). This figure can be reproduced by mkSFigure3-7.py.

Model activations.

A: Activations in the first convolutional layer for the test set for same (blue) and different (yellow) noise adapters for the second DCNN initialization without a temporal adaptation mechanism. Results are shown for three object contrast levels: 10%, 50% and 90%. B-D: Same as panel A for a DCNN initialization endowed with a temporal adaptation mechanism: additive suppression (B), lateral recurrence (C) or divisive normalization (D). This figure can be reproduced by mkSFigure3-7.py.

Model activations.

A: Activations in the first convolutional layer for the test set for same (blue) and different (yellow) noise adapters for the third DCNN initialization without a temporal adaptation mechanism. Results are shown for three object contrast levels: 10%, 50% and 90%. B-D: Same as panel A for a DCNN initialization endowed with a temporal adaptation mechanism: additive suppression (B), lateral recurrence (C) or divisive normalization (D). This figure can be reproduced by mkSFigure3-7.py.

Model activations.

A: Activations in the first convolutional layer for the test set for same (blue) and different (yellow) noise adapters for the fourth DCNN initialization without a temporal adaptation mechanism. Results are shown for three object contrast levels: 10%, 50% and 90%. B-D: Same as panel A for a DCNN initialization endowed with a temporal adaptation mechanism: additive suppression (B), lateral recurrence (C) or divisive normalization (D). This figure can be reproduced by mkSFigure3-7.py.

Model activations.

A: Activations in the first convolutional layer for the test set for same (blue) and different (yellow) noise adapters for the fifth DCNN initialization without a temporal adaptation mechanism. Results are shown for three object contrast levels: 10%, 50% and 90%. B-D: Same as panel A for a DCNN initialization endowed with a temporal adaptation mechanism: additive suppression (B), lateral recurrence (C) or divisive normalization (D). This figure can be reproduced by mkSFigure3-7.py.

Decoding accuracy for the convolutional layers across adapter types.

A: Decoding accuracy for the first convolutional layer for the test set for same (blue) and different (yellow) noise adapters for DCNNs endowed with one of three temporal adaptation mechanisms (from left to right): additive suppression, lateral recurrence and divisive normalization. Input sequences varied in length, including short (ABT) and long (AAAAAAAAAAAAAAABTTTTT) sequences. B-C: Same as panel A for the second (B) and third (C) convolutional layer. This figure can be reproduced by mkFigure8E_SFig8.py.