Neuroscience

Hierarchical priors enable neural prediction of perceived biological motion

Ingmar EJ de Vries
Floris P de Lange
Moritz F Wurm

Centre for Mind/Brain Sciences, University of Trento, Rovereto, Italy
Donders Centre for Cognitive Neuroimaging, Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, Netherlands

https://doi.org/10.7554/eLife.111118.1

Open access

Figures and data

Experimental design and behavior.
(A) A run consisted of 108 trials separated by a fixation cross (2 ± 0.2 sec, uniformly distributed). Each trial showed one of 14 unique 5-sec ballet dancing sequences, such that each unique sequence was shown at least 7 times per run. (B) Participants observed the ballet sequences in 3 different viewing conditions (33% of trials per condition): normal, up-down inverted, and piecewise temporally scrambled, with piece durations uniformly distributed between 200 and 500 msec. (C) Top: To ensure attention to the ballet sequence, in 16 trials per run the video was occluded by a black screen for 400 msec, after which the video continued with either the correct or an incorrect sequence. Participants indicated by button press whether the continuation was correct. Bottom: To encourage fixation, on 8 trials per run the fixation cross changed color for 200 msec at a random time between 0.4 and 4.8 sec after onset, and participants pressed a button upon detection. (D) Behavioral results. Percentage correct and trial-averaged RT on catch trials in top and bottom panels, respectively. RT on the fixation task is measured from color change onset, while RT on the occlusion task is measured from occlusion offset. Dots represent single-participant data (n = 40 healthy human participants). Bars represent the group mean. ITI = inter-trial interval. * p < 0.05, ** p < 0.01.

Dynamic RSA results for motion models.
Region of interest (ROI)-based analysis, with normalized dRSA regression weights illustrated as latency plots. The three viewing conditions are displayed in rows with normal on top (A), inverted in the middle (B), and temporally scrambled at the bottom (C). Different stimulus models are displayed in columns with pixelwise motion reflecting optical flow vector direction in each pixel, and body motion reflecting 3D motion of 13 kinematic markers placed on the dancer. Lines and shaded areas indicate participant-average and SEM, respectively, with n = 40 in (A) and (B), and n = 37 in (C). Light-to-dark colors indicate posterior-to-anterior ROIs. Horizontal bars indicate beta weights significantly larger than zero (one-sided t-test with p < 0.001 for each time sample), corrected for multiple comparisons across time using cluster-based permutation testing (p < 0.05 for cluster-size), with colors matching the respective ROI line plot. (D) Individual-participant (n = 40) peak magnitude in normal versus inverted condition, with peak magnitudes averaged over the first 4 ROIs for pixelwise and view-dependent body motion, and the first 3 ROIs for view-invariant body motion. A dot in the upper left or lower right triangle indicates a larger peak magnitude in the inverted or normal condition, respectively. The red dot reflects the participant average. (E) Same as (D) but for normal versus scrambled condition (n = 37). ** p < 0.01, *** p < 0.001.
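The cluster-size correction mentioned in this legend can be illustrated with a minimal sketch. It assumes a sign-flip permutation scheme and a fixed t-threshold passed in by the caller; the legend only states that cluster-based permutation testing on cluster size was used, so the function names, the permutation scheme, and the toy data below are illustrative assumptions and not the authors' implementation.

```python
import numpy as np

def t_values(x):
    """One-sample t-value against zero per time sample; x is (n_subjects, n_times)."""
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(x.shape[0]))

def runs(mask):
    """(start, stop) index pairs of consecutive True runs in a 1D boolean array."""
    out, start = [], None
    for t, m in enumerate(list(mask) + [False]):
        if m and start is None:
            start = t
        elif not m and start is not None:
            out.append((start, t))
            start = None
    return out

def cluster_size_test(x, t_thresh, n_perm=500, alpha=0.05, seed=0):
    """One-sided (greater than zero) cluster-size test across time.

    Assumption: the null distribution of the maximum cluster size is built by
    randomly flipping the sign of each subject's whole time course.
    """
    rng = np.random.default_rng(seed)
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(x.shape[0], 1))
        sizes = [b - a for a, b in runs(t_values(x * flips) > t_thresh)]
        null_max[i] = max(sizes, default=0)
    critical = np.quantile(null_max, 1.0 - alpha)
    significant = np.zeros(x.shape[1], dtype=bool)
    for a, b in runs(t_values(x) > t_thresh):
        if b - a > critical:  # keep only clusters larger than the null quantile
            significant[a:b] = True
    return significant

# Toy data (hypothetical): 40 subjects, 100 lags, an effect at samples 40..59.
rng = np.random.default_rng(1)
weights = rng.standard_normal((40, 100))
weights[:, 40:60] += 1.5
# 3.31 is roughly the one-sided p < 0.001 threshold for 39 degrees of freedom.
sig = cluster_size_test(weights, t_thresh=3.31)
```

The boolean vector `sig` plays the role of the horizontal significance bars in the latency plots.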

dRSA between neural RDMs of normal and inverted movie conditions.
(A) Similarity between the neural RDMs of the normal and inverted movies was computed for each normal-by-inverted time point (top), resulting in a normal-time-by-inverted-time dRSA matrix (middle). Last, the 2D dRSA matrix was averaged along the diagonals to create a lag-plot (i.e., lag between normal and inverted RDMs; bottom), in which a peak to the left or right of the vertical zero-lag midline indicates that the neural representation of the normal or the inverted condition is leading, respectively. Similarity between neural RDMs was computed as correlation (B), or partial correlation (C), where two low-level visual models with lagged representations were partialed out to account for a large part of shared variance between neural RDMs being explained by low-level visual features (i.e., pixelwise luminance and pixelwise motion magnitude; see Methods). In (B) and (C): The first column reflects the average over the first 4 ROIs. Lines and shaded areas indicate participant-average (n = 40) and SEM, respectively. The horizontal black bars indicate correlation coefficients significantly larger than zero (one-sided t-test with p < 0.001 for each time sample), corrected for multiple comparisons across time using cluster-based permutation testing (p < 0.05 for cluster-size). The second column reflects the participant-average per ROI, with light-to-dark colors reflecting posterior-to-anterior ROIs, while the third column shows the same but zoomed in. Note that the first column reflects absolute correlation coefficients, while the second and third columns reflect proportion of maximum correlation coefficient per ROI. The second and third columns are smoothed using a 20 msec sliding window for plotting purposes only.
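The diagonal-averaging step in (A) can be sketched as follows. This is a minimal illustration, assuming a square time-by-time similarity matrix and the convention that lag equals column index minus row index; `lag_curve` and the toy matrix are hypothetical names introduced here.

```python
import numpy as np

def lag_curve(sim, max_lag):
    """Average a square time-by-time similarity matrix along its diagonals.

    sim[i, j] is the similarity between the row series at time i and the
    column series at time j, so each diagonal collects one lag = j - i.
    """
    lags = np.arange(-max_lag, max_lag + 1)
    curve = np.array([np.diagonal(sim, offset=int(k)).mean() for k in lags])
    return lags, curve

# Toy matrix (hypothetical): similarity is 1 only when the column series
# is 3 samples later than the row series, and 0 everywhere else.
toy = np.eye(50, k=3)
lags, curve = lag_curve(toy, max_lag=10)
peak_lag = int(lags[np.argmax(curve)])
```

Diagonals far from the main diagonal contain fewer samples, so their averages are noisier; limiting `max_lag` keeps every lag estimate based on a reasonable number of time points.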

Dynamic RSA results for body posture models.
Region of interest (ROI)-based analysis, with normalized dRSA regression weights illustrated as latency plots. Two viewing conditions are displayed in rows with normal on top (A) and temporally scrambled below (B). Different stimulus models are displayed in columns with body posture capturing 3D position of 13 kinematic markers placed on the dancer. Lines and shaded areas indicate participant-average and SEM, respectively, with n = 40 in (A) and n = 37 in (B). Light-to-dark colors reflect posterior-to-anterior ROIs. Horizontal bars indicate beta weights significantly larger than zero (one-sided t-test with p < 0.001 for each time sample), corrected for multiple comparisons across time using cluster-based permutation testing (p < 0.05 for cluster-size), with colors matching the respective ROI line plot. (C) Individual-participant (n = 37) peak magnitude in normal versus scrambled condition, with peak magnitudes averaged over the first 4 ROIs. A dot in the upper left or lower right triangle indicates a larger peak magnitude in the scrambled or normal condition, respectively. The red dot reflects the participant average. * p < 0.05.

dRSA results for all models.
Region of interest (ROI) analysis, with normalized dRSA regression weights illustrated as latency plots for all tested models. The three viewing conditions are displayed in rows with normal on top (A), inverted in the middle (B), and piecewise temporally scrambled at the bottom (C). Different stimulus models are displayed in columns. Lines and shaded areas indicate participant-average and SEM, respectively, with n = 40 in (A) and (B), and n = 37 in (C). Light-to-dark colors indicate posterior-to-anterior ROIs. Horizontal bars indicate beta weights significantly larger than zero (one-sided t-test with p < 0.001 for each time sample), corrected for multiple comparisons across time using cluster-based permutation testing (p < 0.05 for cluster-size), with colors matching the respective ROI line plot.

Simulations.
(A) dRSA on simulated data using simple correlation as similarity measure, in the normal and inverted viewing conditions. Note that the model RDMs do not differ between the normal and inverted viewing conditions. Colors indicate separate simulations of individual model RDMs. Columns indicate which model was tested. These results illustrate the effect of shared variance between various models and the effect of temporal autocorrelation within a given model. (B) Same but with regression weight as similarity measure for dRSA using principal component regression (PCR; see “Materials and Methods” for details). In short, simulated neural RDMs are exactly the same as in (A), but at the final step in dRSA (i.e., comparing neural and model RDMs) all models are included in a single regression, and only the beta weight of the tested model (columns) is illustrated. (C) and (D) Same as (A) and (B), respectively, but for the piecewise scrambled viewing condition. Importantly, these results indicate that PCR is effective at extracting only the model of interest from the simulated neural RDM, while largely regressing out the other models.
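The contrast between simple correlation and regression weights can be reproduced in a toy setting. The sketch below assumes a plain principal component regression that keeps the components explaining a chosen fraction of variance; `pcr_weights`, the variance cutoff, and the two simulated models are illustrative assumptions, and the authors' procedure (see their Materials and Methods) may differ in its details.

```python
import numpy as np

def pcr_weights(X, y, var_keep=0.99):
    """Principal component regression of y on the columns of X.

    X: (n_values, n_models) vectorized model RDMs, y: (n_values,) neural RDM.
    Returns one standardized weight per model.
    """
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    U, s, Vt = np.linalg.svd(Xz, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(explained, var_keep)) + 1, s.size)
    scores = U[:, :k] * s[:k]              # component scores (orthogonal columns)
    gamma = scores.T @ yz / s[:k] ** 2     # least squares on each component
    return Vt[:k].T @ gamma                # map back to one weight per model

# Toy simulation (hypothetical): model B covaries with model A, but the
# simulated neural RDM is driven by model A only.
rng = np.random.default_rng(0)
n_values = 5000
model_a = rng.standard_normal(n_values)
model_b = 0.7 * model_a + np.sqrt(1 - 0.7**2) * rng.standard_normal(n_values)
neural = model_a + 0.5 * rng.standard_normal(n_values)

corr_b = np.corrcoef(neural, model_b)[0, 1]   # inflated by shared variance
beta_a, beta_b = pcr_weights(np.column_stack([model_a, model_b]), neural)
```

In this toy case the simple correlation with model B is substantial even though model B contributes nothing, whereas its regression weight stays close to zero. With only two models and a 0.99 cutoff all components are retained, so the result coincides with ordinary least squares; a lower cutoff would discard low-variance components.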

Dynamic representational similarity analysis (dRSA) approach.
Figure adapted from De Vries et al. 2023, Nat Commun, CC BY 4.0 (81). (A) Subjects observed ∼46 repetitions of 14 unique 5-sec dancing videos (Table S1) in 3 different viewing conditions (Fig. 1a and b) during MEG (i.e., ∼15 repetitions per video per condition). (B) Stimuli were characterized at different levels using dynamic stimulus models (e.g., body posture; see Fig. S4 for all models). (C) Individual-subject source-reconstructed MEG signals within regions of interest (ROI; see Fig. S5 for ROI definitions) were used as features for subsequent steps. (D) Neural and model representational dissimilarity matrices (RDMs) were created at each time point based on pairwise dissimilarity in neural responses to the 14 stimuli and pairwise dissimilarity in stimulus feature models, respectively. Bottom: model and neural RDMs are shown for 5 time points. (E) Similarity between neural and model RDMs was computed for each neural-by-model time point (lower panel), using regression weights to test a specific model RDM, while regressing out other covarying model RDMs. This approach was validated through simulations (see subsection ‘Simulations’ and Fig. S2). Last, the 2-dimensional dRSA matrix was averaged along the diagonal to create a latency-plot (i.e., the lag between neural and model RDM; upper panel), in which peaks to the right or left of the vertical zero-lag midline reflect reactive or predictive neural representations, respectively. These dRSA latency plots are computed separately for each subject, ROI, model and condition, and then statistically tested.
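Steps (D) and (E) can be sketched on toy data. The example assumes 1 minus Pearson correlation as the dissimilarity measure and plain correlation between RDMs as the similarity measure, which is simpler than the regression used in the main analysis; all function names, array shapes, and the simulated 5-sample delay are hypothetical.

```python
import numpy as np

def rdm_timecourse(data):
    """data: (n_stimuli, n_features, n_times) -> (n_times, n_pairs).

    At each time point, dissimilarity between two stimuli is taken as
    1 - Pearson correlation of their feature vectors (an assumption).
    """
    n_stim, _, n_times = data.shape
    upper = np.triu_indices(n_stim, k=1)
    out = np.empty((n_times, upper[0].size))
    for t in range(n_times):
        out[t] = 1.0 - np.corrcoef(data[:, :, t])[upper]
    return out

def drsa_matrix(model_rdms, neural_rdms):
    """Correlate every model-time RDM with every neural-time RDM.

    Rows index model time and columns index neural time.
    """
    m = model_rdms - model_rdms.mean(axis=1, keepdims=True)
    n = neural_rdms - neural_rdms.mean(axis=1, keepdims=True)
    m /= np.linalg.norm(m, axis=1, keepdims=True)
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    return m @ n.T

# Toy data (hypothetical): 14 stimuli, 6 features, 60 time samples.
rng = np.random.default_rng(0)
model = rng.standard_normal((14, 6, 60))
# The simulated neural signal follows the model with a 5-sample delay plus noise.
neural = np.roll(model, 5, axis=2) + 0.1 * rng.standard_normal((14, 6, 60))

drsa = drsa_matrix(rdm_timecourse(model), rdm_timecourse(neural))
lags = np.arange(-10, 11)   # lag = neural time - model time
latency = np.array([np.diagonal(drsa, offset=int(k)).mean() for k in lags])
peak_lag = int(lags[np.argmax(latency)])
```

With this sign convention a positive peak lag means the neural RDM follows the model (reactive), while a negative peak lag would mean the neural RDM precedes it (predictive).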

Illustration of stimulus models.
Figure adapted from De Vries et al. 2023, Nat Commun, CC BY 4.0 (81). (A) Example frames of two videos (rows) to be correlated for creating the model RDMs. (B) Low-level visual models. Left: pixelwise luminance (spatially smoothed grayscale). Middle: magnitude of pixelwise motion (operationalized as optical flow vectors), with brighter colors indicating higher magnitude. Right: direction of pixelwise motion, with optical flow vectors indicated in blue and scaled 10 times for illustrative purposes. (C) Models based on 3D kinematic marker positions, from left to right: view-dependent body posture, view-invariant body posture (i.e., after aligning the kinematic markers between videos without changing their internal structure, through translation and rotation along the vertical axis), view-dependent body motion as indicated by blue lines (i.e., difference in kinematic marker position between two subsequent frames), and view-invariant body motion. Note that the main dRSA regression-based analysis also included view-dependent and view-invariant acceleration models, as well as gaze-position models based on individual participant eye-tracker data.
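The kinematic models in (C) can be illustrated with a short sketch. It assumes marker arrays of shape (frames, 13, 3) with the third coordinate as the vertical axis, a least-squares choice of rotation angle, and alignment of the full 3D centroid; these conventions and the function names are assumptions made for illustration, since the legend does not specify them.

```python
import numpy as np

def body_motion(positions):
    """Frame-to-frame difference in marker position.

    positions: (n_frames, n_markers, 3) -> (n_frames - 1, n_markers, 3).
    """
    return np.diff(positions, axis=0)

def align_about_vertical(p, q):
    """Translate p and rotate it about the vertical (z) axis to best match q.

    p, q: (n_markers, 3). The internal structure of p is preserved because
    only a rigid translation and a rotation about z are applied.
    """
    pc, qc = p - p.mean(axis=0), q - q.mean(axis=0)
    num = np.sum(pc[:, 0] * qc[:, 1] - pc[:, 1] * qc[:, 0])
    den = np.sum(pc[:, 0] * qc[:, 0] + pc[:, 1] * qc[:, 1])
    theta = np.arctan2(num, den)          # least-squares rotation angle
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return pc @ rot.T + q.mean(axis=0)

# Toy check (hypothetical): the same posture seen from another viewpoint.
rng = np.random.default_rng(0)
posture = rng.standard_normal((13, 3))
angle = 0.7
turn = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                 [np.sin(angle),  np.cos(angle), 0.0],
                 [0.0, 0.0, 1.0]])
other_view = posture @ turn.T + np.array([2.0, -1.0, 0.5])
aligned = align_about_vertical(posture, other_view)

motion = body_motion(rng.standard_normal((100, 13, 3)))
```

After alignment, two identical postures seen from different viewpoints coincide, which is the property a view-invariant model needs; a view-dependent model would compare the marker positions without this step.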

ROI definitions.
Figure reproduced from De Vries et al. 2023, Nat Commun, CC BY 4.0 (81). Cortical regions of interest (ROIs) for main analysis based on combinations of parcels as defined according to the Human Connectome Project (HCP) atlas (82). ROIs were a priori defined as follows (ROI name = atlas parcels): V1 = V1 [487 vertices]; V2 = V2 [491 vertices]; V3+V4 = V3 and V4 [490 vertices]; LOTC = V4t, FST, MT, MST, LO1, LO2, LO3, PH, PHT, TPOJ2 and TPOJ3 [565 vertices]; aIPL = PF, PFt, AIP and IP2 [403 vertices]; PMv = IFJa, IFJp, 6r, 6v, PEF, IFSp, 44 and 45 [466 vertices]. Note that these vertex counts are based on the ICBM152 template cortical surface; exact counts differ slightly between individual subjects. MEG responses at all vertices from both hemispheres were combined into a single vector to compute the neural RDM at a single time point (i.e., pairwise dissimilarity in the MEG response to the 14 action sequences). For visualization, all vertices within a single ROI are given the same color, which matches the color for each ROI in the main results (Fig. 2, 6 and S1). LOTC = lateral occipitotemporal cortex, aIPL = anterior inferior parietal lobe, PMv = ventral premotor cortex.

Temporal subsampling and realignment.
Figure reproduced from De Vries et al. 2023, Nat Commun, CC BY 4.0 (81). Temporal subsampling and realignment were used to attenuate idiosyncratic temporal heterogeneity in dRSA results caused by arbitrary pairwise alignment specific to these 14 stimuli (see subsection “Temporal subsampling” in the “Methods” section). (A) On each of 1000 subsampling iterations, a 3-sec segment is randomly selected independently for each of the 14 stimuli (orange boxes). Crucially, while a different random 3-sec window was selected for each of the 14 stimuli, for a given stimulus the same 3-sec window was selected for both neural and model data, thus keeping temporal alignment between those intact (indicated by vertical orange dotted lines). (B) and (C) Next, the 14 new 3-sec segments are realigned, after which model and neural representational dissimilarity matrices (RDMs) are computed at each realigned time point t_R. Last, similarity is computed for each combination of realigned model time (x) and neural time (y). Note that the steps in (A), (B) and (C), as well as the last step to compute the dynamic representational similarity analysis (dRSA) curves, are all done within a single subsampling iteration. After 1000 subsampling iterations, dRSA results are averaged over iterations.
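One subsampling iteration, as described in (A) to (C), might look like the sketch below. The array shapes, the 10 Hz toy sampling rate, and the name `subsample_and_realign` are assumptions for illustration; only the logic of drawing one window per stimulus and applying it to both neural and model data follows the legend.

```python
import numpy as np

def subsample_and_realign(neural, model, win_len, rng):
    """Draw one random window per stimulus and cut it from both data sets.

    neural: (n_stimuli, n_neural_features, n_times)
    model:  (n_stimuli, n_model_features, n_times)
    Returns realigned segments of length win_len plus the window onsets.
    """
    n_stim, _, n_times = neural.shape
    # A different random onset for each stimulus ...
    starts = rng.integers(0, n_times - win_len + 1, size=n_stim)
    idx = starts[:, None] + np.arange(win_len)[None, :]
    # ... but the same window for neural and model data of that stimulus.
    neural_seg = np.stack([neural[s][:, idx[s]] for s in range(n_stim)])
    model_seg = np.stack([model[s][:, idx[s]] for s in range(n_stim)])
    return neural_seg, model_seg, starts

# Toy data (hypothetical): 5-sec stimuli sampled at 10 Hz, 3-sec windows.
# Every value encodes its own time index, which makes alignment easy to check.
neural = np.tile(np.arange(50.0), (14, 4, 1))
model = np.tile(np.arange(50.0), (14, 2, 1))
rng = np.random.default_rng(0)
neural_seg, model_seg, starts = subsample_and_realign(neural, model, 30, rng)
```

In a full analysis this function would be called once per iteration, with the RDMs and dRSA curves computed on the realigned segments and the curves then averaged over iterations.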

Ballet figures per stimulus.
Each video stimulus consisted of a unique sequence of 4 smoothly connected ballet figures. The 4 figures were selected from a total of 5 unique figures. Note that all figures were presented from multiple viewpoints, to allow for both view-dependent and view-invariant body posture and motion models. The pirouette was always performed counterclockwise and could start and end at different angles.

Peak latencies of the participant-average dRSA curves.
Peak latencies are indicated in msec relative to zero lag between neural and model RDMs. Results are shown only for condition-ROI-model combinations with a significant main dRSA result (Fig. 2, 6, and S1), since latency estimation is inaccurate otherwise. We observed both a predictive (negative) and lagged (positive) peak for pixelwise motion direction in the inverted condition (Fig. 2b) and therefore computed peak latency separately for each. LOTC = lateral occipitotemporal cortex, aIPL = anterior inferior parietal lobe, PMv = ventral premotor cortex. Lum. = luminance, mag. = magnitude.
