Experimental paradigm and decoding analysis.

(a) In each trial of the emotion recognition task, participants fixated centrally before a face stimulus was presented for 1,000 ms. Subsequently, they selected the recognized emotional expression using the VR controller. (b) Face stimuli were presented either with (right) or without (left) stereoscopic depth information. In the stereoscopic condition, a 3D model of the face was rendered by two slightly offset virtual cameras (a “binocular VR camera rig”), providing binocular depth cues. In the monoscopic condition, the same 3D model was first rendered from a single virtual camera to produce a flat 2D texture, eliminating stereoscopic disparity. This texture was then presented to both eyes via the binocular VR camera rig, removing binocular depth cues while preserving all other visual properties. (c) Overview of all stimulus combinations: three face identities (rows) expressing four emotional facial expressions (columns: happy, angry, surprised, neutral). Face identity was irrelevant for the emotion recognition task. Face models were originally created and evaluated by Gilbert et al. (2021). (d) Schematic of the decoding approach: a multivariate linear classifier (logistic regression) was trained on successive time windows of the EEG data, treating each channel as a feature. The emotional expression shown in the respective trial served as the classification label. Using 5-fold cross-validation, the classifier was repeatedly trained on 80% of the trials and its predictions were validated on the remaining 20%. This procedure yielded a decoding performance score per time window, reflecting the amount of stimulus information available in the EEG data at the corresponding time in the epoch.

Canon AT-1 Retro Camera (https://skfb.ly/6ZwNB) by AleixoAlonso and EyeBall (https://skfb.ly/osJMS) by CrisLArt are licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/).
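
The sliding-window decoding scheme summarized in panel (d) can be sketched as follows. This is a minimal illustration assuming NumPy and scikit-learn, an epochs array of shape trials × channels × time, and integer expression labels; the window length, feature scaling, and scoring metric are illustrative assumptions rather than the settings used in the study.

```python
# Minimal sketch of the time-resolved decoding in panel (d); all parameters
# (window length, scaling, scoring) are assumptions, not the study's settings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def time_resolved_decoding(X, y, win_len=10, n_folds=5):
    """Mean cross-validated decoding score per successive time window.

    X: EEG epochs, shape (n_trials, n_channels, n_times); y: expression labels.
    """
    _, _, n_times = X.shape
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = []
    for start in range(0, n_times - win_len + 1, win_len):
        # Average samples within the window so that each channel is one feature.
        X_win = X[:, :, start:start + win_len].mean(axis=2)
        # 5-fold CV: train on 80% of the trials, validate on the held-out 20%.
        scores.append(cross_val_score(clf, X_win, y, cv=n_folds).mean())
    return np.array(scores)
```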

Time-resolved classification performance for different decoding targets and conditions.

(a) Classification performance (mean ± 1 SEM) of the decoder distinguishing between the four facial expressions, trained separately for the mono- and stereoscopic viewing conditions, as well as on data pooled across both depth conditions. Horizontal lines at the bottom indicate clusters in which decoding was significantly above chance. Shaded rectangles in the background mark relevant ERP time windows associated with face processing. (b) Selection of three (out of six) binary contrasts that underlie the multiclass classification in (a). Angry vs neutral: highest decoding performance. Happy vs surprised: classification performance lacks an early peak and only rises later in the trial. Angry vs surprised: lowest (but still significant) decoding performance. (c) Average performance per time window, depth condition (top; colors as in (a)), and binary contrast (bottom; *: selection and colors as in (b)). Thick bars: mean ± 1 SEM. (d) Time-resolved classification performance (mean ± 1 SEM) for the task-irrelevant decoding targets: identity of the stimulus face (top) and presence vs absence of stereoscopic depth information (bottom). ROC-AUC: area under the receiver operating characteristic curve.
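
The horizontal significance lines in (a) correspond to a cluster-level test of the per-participant decoding time courses against chance. A hypothetical sketch using MNE-Python's one-sample cluster permutation test is given below; the chance level of 0.5 (ROC-AUC), the number of permutations, and the one-sided tail are assumptions, not necessarily the parameters used in the analysis.

```python
# Hypothetical sketch of the cluster-level significance test indicated by the
# horizontal bars in (a). Chance level, permutations, and tail are assumptions.
import numpy as np
from mne.stats import permutation_cluster_1samp_test


def significant_clusters(scores, chance=0.5, alpha=0.05):
    """scores: array (n_participants, n_windows) of ROC-AUC values."""
    t_obs, clusters, cluster_p, _ = permutation_cluster_1samp_test(
        scores - chance,      # test decoding performance against chance level
        n_permutations=10000,
        tail=1,               # one-sided: above chance only
        seed=0,
    )
    # Keep only clusters whose corrected p-value falls below alpha.
    return [c for c, p in zip(clusters, cluster_p) if p < alpha]
```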

Spatial patterns and localized sources related to decoding (a) facial expressions and (b) the availability of stereoscopic depth information.

(a) Top: Classification of the four facial expressions. Spatial patterns of the classifier (top row) and their projection onto the cortical surface using eLORETA (second and third rows) for the four time windows. Colors indicate the absolute, normalized weight of each sensor or cortical parcel (warmer colors represent higher absolute weights). In the topographies, the six most informative channels (top 10%) are highlighted in yellow. Projections onto the cortical surface are masked to show only the top 5% most informative parcels. Second row: Lateral view of both hemispheres. Third row: View of the occipital and parietal cortices of the right hemisphere. Colored lines on the cortex mark the outlines of relevant regions (see (c)). Bottom: Time course of decoding performance for multiclass classification of the four emotional expressions (as in Figure 2). (b) As in (a), but in reversed order and for the classifier trained to distinguish between trials with and without stereoscopic depth information. (c) Outlines of relevant cortical regions, following the reduced parcellation atlas by Glasser et al. (2016) and Mills (2016). SPC: Superior Parietal Cortex; MT+: MT+ Complex and Neighboring Visual Areas; VSVC: Ventral Stream Visual Cortex; EVC: Early Visual Cortex; PVC: Primary Visual Cortex (V1).
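
The caption names only the inverse method (eLORETA); a hypothetical sketch of how a sensor-space pattern could be projected onto the cortical surface with MNE-Python is given below. The forward model, noise covariance, regularization (SNR), and subsequent averaging within parcels are assumptions and are not specified in the caption.

```python
# Hypothetical sketch of projecting a sensor-space classifier pattern onto the
# cortex with eLORETA via MNE-Python. Forward model, noise covariance, and
# regularization are illustrative assumptions, not the authors' pipeline.
import numpy as np
import mne
from mne.minimum_norm import make_inverse_operator, apply_inverse


def project_pattern(pattern, info, fwd, noise_cov, snr=3.0):
    """pattern: (n_channels,) spatial pattern of the classifier for one window."""
    # Wrap the pattern as a single-time-point "evoked" object.
    evoked = mne.EvokedArray(pattern[:, np.newaxis], info, tmin=0.0)
    inv = make_inverse_operator(info, fwd, noise_cov)
    stc = apply_inverse(evoked, inv, lambda2=1.0 / snr**2, method="eLORETA")
    return stc  # source estimate; values can then be averaged within parcels
```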

Eye tracking analyses and comparison with the EEG decoding results.

(a) Top left: Example gaze trace (one trial) on a stimulus face. Saccades are plotted in red, fixations in blue. The fixation cross preceding the stimulus was displayed at (0,0). Top right: Vertical component of the same gaze trace over time. Bottom left: Horizontal gaze component over time. Bottom right: Saccade rate (saccades per second) over the course of the trial, across all trials and participants. Colors as in (b) and (c). We observed a dip in saccade rate immediately after stimulus onset, followed by a sharp increase peaking during the EPN time window. (b) Heatmaps showing fixation distributions within the first second after stimulus onset, across all trials and participants, separated by emotional expression. (c) Circular histograms of saccade directions (polar angles relative to the preceding fixation), plotted per time window and emotional expression (colors as in (b)). Saccade counts are normalized by the length of the time window. Most saccades occurred during the time window of the EPN (an ERP component associated with reflexive attention to the stimulus), predominantly in downward (especially for angry and surprised faces) or lateral directions. (d) Decoding performance of classifiers trained on the gaze data (spherical coordinates) for different decoding targets. EEG-based decoding performance is overlaid in gray for comparison (in the subset of participants with eye tracking data). Horizontal bars at the bottom of each plot indicate time points with decoding significantly above chance level. Red bars mark significant differences in performance between eye-tracking-based and EEG-based decoding (two-sided, cluster-corrected t-test). Note that the EEG decoding results shown here are based on the subsample with eye tracking data (n=17), resulting in lower scores compared to Figures 2 and 3.
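
The circular histograms in (c) can be reproduced schematically as follows, assuming each saccade is given by its start and end coordinates on the stimulus plane; the number of angular bins and the normalization by window length are illustrative assumptions.

```python
# Minimal sketch of the circular saccade-direction histogram in (c).
# Coordinate convention, bin count, and normalization are assumptions.
import numpy as np


def saccade_direction_histogram(start_xy, end_xy, win_len_s, n_bins=16):
    """Direction histogram of the saccades within one time window.

    start_xy, end_xy: (n_saccades, 2) on-stimulus coordinates;
    win_len_s: length of the time window in seconds.
    """
    dxy = end_xy - start_xy
    angles = np.arctan2(dxy[:, 1], dxy[:, 0])    # polar angle relative to preceding fixation
    bins = np.linspace(-np.pi, np.pi, n_bins + 1)
    counts, _ = np.histogram(angles, bins=bins)
    return counts / win_len_s, bins              # saccade counts per second per direction bin
```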