Sustained vs transient channels.

A. Responses of sustained and transient channels to onset and offset step stimuli, and to periodic signals comprising sequences of onsets and offsets. Note that while the sustained channels closely follow the intensity profile of the input stimuli, transient channels only respond to changes in stimulus intensity, and such a response is always positive, irrespective of whether stimulus intensity increases or decreases. Therefore, when presented with periodic signals, the sustained channels respond at the same frequency as the input stimulus (frequency following), whereas transient channels respond at a frequency that is twice that of the input (frequency doubling). B. Synchrony as measured from the cross-correlation between pairs of step stimuli, as seen through sustained (top) and transient (bottom) channels (transient and sustained channels are simulated using Equations 1 and 10, respectively). Note how synchrony (i.e., correlation) for sustained channels peaks at zero lag when the intensity of the input stimuli changes in the same direction, whereas it is minimal at zero lag when the steps have opposite polarities (negatively correlated stimuli). Conversely, because transient channels are insensitive to the polarity of intensity changes, their synchrony always peaks at zero lag. C. Synchrony (i.e., cross-correlation) of periodic onset and offset stimuli as seen through sustained and transient channels. While synchrony peaks once per period (at zero phase shift) for sustained channels, it peaks twice for transient channels (at zero and pi radians phase shift), as a consequence of their frequency-doubling response. D. Experimental apparatus: participants sat in front of a black cardboard panel with a circular opening, through which audiovisual stimuli were delivered by a white LED and a loudspeaker. E. Predicted effects of Experiments 1 and 2, depending on whether audiovisual integration relies on transient or sustained input channels. If the effects of interest are present in both experiments, or absent in both, the result is inconclusive and cannot be interpreted in the light of our hypotheses.
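As an illustration of the frequency-following versus frequency-doubling behaviour described in panel A, the minimal sketch below simulates a sustained channel as a simple leaky integrator and a transient channel as a rectified temporal derivative. These are illustrative stand-ins with hypothetical parameters, not the paper's Equations 1 and 10.

```python
# Minimal sketch: frequency following (sustained) vs. frequency doubling (transient).
import numpy as np

fs = 1000.0                                   # sampling rate (Hz)
t = np.arange(0, 2, 1 / fs)                   # 2 s of time
stim = (np.sin(2 * np.pi * 2 * t) > 0).astype(float)   # 2 Hz periodic on/off stimulus

# Sustained channel stand-in: leaky integration (first-order low-pass, tau = 50 ms)
tau = 0.05
kernel = np.exp(-np.arange(0, 5 * tau, 1 / fs) / tau)
sustained = np.convolve(stim, kernel / kernel.sum())[:t.size]

# Transient channel stand-in: unsigned rate of change (responds to onsets AND offsets)
transient = np.abs(np.diff(stim, prepend=stim[0])) * fs
transient = np.convolve(transient, kernel / kernel.sum())[:t.size]

# Dominant response frequency of each channel
freqs = np.fft.rfftfreq(t.size, 1 / fs)
f_sus = freqs[1:][np.argmax(np.abs(np.fft.rfft(sustained - sustained.mean()))[1:])]
f_tra = freqs[1:][np.argmax(np.abs(np.fft.rfft(transient - transient.mean()))[1:])]
print(f"sustained peak: {f_sus:.1f} Hz, transient peak: {f_tra:.1f} Hz")  # ~2 Hz vs ~4 Hz
```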

Experiment 1, results.

A. Responses in the TOJ task and psychometric fits (averaged across participants) for the four experimental conditions. B. Responses in the SJ task and psychometric fits (averaged across participants) for the four experimental conditions. Each dot in panels A and B corresponds to 80 trials. C. Window of subjective simultaneity for each condition and task. D. Point of subjective simultaneity for each condition and task.
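The PSS and WSS in panels C and D are derived from the psychometric fits in panels A and B. As a rough illustration of how such quantities can be extracted, the sketch below fits a cumulative Gaussian to hypothetical TOJ data with SciPy; the data values, the exact psychometric model, and the WSS definition used in the paper are assumptions of this sketch.

```python
# Minimal sketch: PSS and a width measure from a cumulative-Gaussian fit to TOJ data.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

soa = np.array([-240, -120, -60, 0, 60, 120, 240], dtype=float)         # audiovisual lags (ms), hypothetical
p_vision_first = np.array([0.05, 0.20, 0.35, 0.55, 0.70, 0.85, 0.95])   # hypothetical response proportions

def cum_gauss(x, mu, sigma):
    return norm.cdf(x, loc=mu, scale=sigma)

(mu, sigma), _ = curve_fit(cum_gauss, soa, p_vision_first, p0=[0.0, 100.0])
pss = mu                            # point of subjective simultaneity: the 50% point
wss = 2 * norm.ppf(0.75) * sigma    # one possible width measure (central 50% of the fitted curve)
print(f"PSS = {pss:.1f} ms, WSS = {wss:.1f} ms")
```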

Experiment 2, stimuli and results.

A. Schematic representation of the periodic stimuli. B. Frequency-domain representation of the psychometric function: note the amplitude peak at 2 cycles per period. Error bars represent the 99% confidence intervals. The inset represents the phase angle of the 2-cycles-per-period frequency component for each participant (thin lines), and the average phase (arrow). C. Results of Experiment 2 and psychometric fit, averaged across all participants. Each dot corresponds to 75 trials.
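To make the frequency-domain representation in panel B concrete, the sketch below computes the amplitude and phase of the 2-cycles-per-period (cpp) component from a psychometric function sampled over one period of phase shifts, plus a circular mean of per-participant phases. The response values and phases are hypothetical placeholders, not the experimental data.

```python
# Minimal sketch: amplitude and phase of the 2 cpp component of a psychometric function.
import numpy as np

phase = np.linspace(0, 2 * np.pi, 16, endpoint=False)    # phase shifts spanning one period
p_sync = 0.5 + 0.3 * np.cos(2 * phase - 0.4)             # hypothetical proportion "synchronous"

spec = np.fft.rfft(p_sync - p_sync.mean()) / p_sync.size
amp_2cpp = 2 * np.abs(spec[2])        # amplitude of the 2 cpp component
phi_2cpp = np.angle(spec[2])          # phase of the 2 cpp component
print(f"2 cpp amplitude = {amp_2cpp:.2f}, phase = {phi_2cpp:.2f} rad")

# Average phase across participants: circular mean of unit phase vectors
phis = np.array([0.35, 0.42, 0.31, 0.50])                # hypothetical per-participant phases
mean_phase = np.angle(np.mean(np.exp(1j * phis)))
```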

MCD model.

A. Model schematics: the impulse-response functions of the channels are represented in the small boxes, and call-outs represent the transfer functions. B. Lag detector step responses as a function of the lag between visual and acoustic steps. C. Correlation detector responses as a function of the lag between visual and acoustic steps. D. Population of MCD units, each receiving input from spatiotopic receptive fields. E. Normalization, where the output of each unit is divided by the sum of the activity of all units. F. Optimal integration of audiovisual spatial cues, as achieved using a population of MCDs with divisive normalization. Lines represent the likelihood functions for the unimodal and bimodal stimuli; dots represent the response of the MCD model, which is indistinguishable from the bimodal likelihood function.
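The divisive normalization step in panel E can be sketched as follows: each unit's output is divided by the summed activity of the whole population, so the normalized profile can be read out like a probability distribution over space. The unit responses and receptive-field positions below are hypothetical; only the normalization operation follows the caption.

```python
# Minimal sketch: divisive normalization over a population of spatially tuned MCD units.
import numpy as np

responses = np.array([0.1, 0.4, 2.3, 0.9, 0.2])          # hypothetical MCD unit outputs
normalized = responses / responses.sum()                  # each unit divided by the summed activity

positions = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])     # hypothetical receptive-field centres (deg)
estimate = np.sum(positions * normalized)                  # population read-out of audiovisual location
```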

MCD simulations of Experiments 1 and 2.

A. Schematics of the observer model, which receives input from one MCD unit to generate a behavioural response. The output of the MCD unit is integrated over a given temporal window (whose width depends on the duration of the stimuli) and corrupted by late additive noise before being compared to an internal criterion to generate a binary response. Such a perceptual decision-making process is modelled using a generalised linear model (GLM); depending on the task, the predictors of the GLM were those of either Equation 8 or Equation 9. B. Responses for the TOJ task of Experiment 1 (dots) and model responses (red curves). C. Responses for the SJ task of Experiment 1 (dots) and model responses (blue curves). D. Human responses in Experiment 2 (dots) and model responses (blue curve). E. Scatterplot of human vs model responses for both experiments.
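A minimal sketch of such an observer stage as a probit GLM is given below: the temporally integrated MCD output enters as a predictor, while the late Gaussian noise and the internal criterion are absorbed into the link function and the fitted coefficients. The predictor values and weights are hypothetical and do not reproduce the paper's Equations 8 and 9.

```python
# Minimal sketch: probit GLM observer mapping integrated MCD output to binary responses.
import numpy as np
from scipy.stats import norm

def p_response(mcd_output, beta0=-1.0, beta1=2.5):
    """Probability of a 'synchronous' response given an integrated MCD output,
    under a probit GLM with hypothetical weights."""
    return norm.cdf(beta0 + beta1 * mcd_output)

mcd_outputs = np.array([0.1, 0.4, 0.8])                 # hypothetical integrated MCD outputs
rng = np.random.default_rng(3)
responses = rng.random(3) < p_response(mcd_outputs)     # simulated binary choices (noise + criterion)
```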

MCD simulations of published results.

A. Results of the causality judgment task of Parise & Ernst (2016). The left panel represents the empirical classification image (grey) and the one obtained using the MCD model (blue). The right panel represents the output of the model plotted against human responses. Each dot corresponds to 315 responses. B. Results of the temporal order judgment task of Parise & Ernst (2016). The left panel represents the empirical classification image (grey) and the one obtained using the MCD model (red). The right panel represents the output of the model plotted against human responses. Each dot corresponds to 315 responses. C. Results of the causality judgment task of Locke & Landy (2017). The left panel represents the empirical classification image (grey) and the one obtained using the MCD model (blue). The right panel represents the effect of maximum audiovisual lag on perceived causality. Each dot represents on average 876 trials (range = [540, 1103]). D. Results of the temporal order judgment task of Wen et al. (2020). Squares represent the onset condition, whereas circles represent the offset condition. Each dot represents ≈745 trials. E. Results of the detection task of Andersen & Mamassian (2008), showing auditory facilitation of visual detection. Each dot corresponds to 336 responses. F. Results of the audiovisual amplitude modulation detection task of Nidiffer et al. (2018), where the audiovisual correlation was manipulated by varying the frequency and phase of the modulation signals. Each dot represents ≈140 trials. The datapoints represented by a star correspond to the stimuli displayed in Supplementary Figure S7.

Sample size of the datasets modelled and analysed in the present study.

Two of the datasets listed here (i.e., Experiment 1 and Parise & Ernst, 2016) consisted of two tasks, each tested on the same pool of observers. The last row reports the total number of observers and trials, the average number of trials per participant, and the average correlation between MCD simulations and human data. Note how the revised MCD tightly replicated human responses in all of the datasets included in this study, despite major differences in stimuli, tasks, and sample sizes across the individual studies (see last column).

Results of Friedman test for Experiment 1.

Four separate tests were used to assess whether the four experimental conditions differed in terms of PSS and WSS in the TOJ and SJ tasks. The first column reports the variable, the second the χ2 value, the third the degrees of freedom, and the fourth the p-value. Given that we ran four tests on the same dataset, statistical significance should be assessed by comparing each p-value against a Bonferroni-adjusted alpha level of 0.0125 (i.e., 0.05/4).
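For illustration, the sketch below runs one such Friedman test with SciPy and applies the Bonferroni-adjusted alpha of 0.05/4 = 0.0125. The data matrix is a hypothetical placeholder (participants x conditions), not the experimental data.

```python
# Minimal sketch: Friedman test across four conditions with Bonferroni-corrected alpha.
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
pss = rng.normal(size=(12, 4))          # hypothetical PSS values: 12 participants x 4 conditions

chi2, p = friedmanchisquare(pss[:, 0], pss[:, 1], pss[:, 2], pss[:, 3])
alpha = 0.05 / 4                         # Bonferroni-adjusted alpha for the four tests
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, significant at corrected alpha: {p < alpha}")
```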

Results and psychometric fits of Experiment 1.

Data from different observers are represented in different rows. A. represents the data of the TOJ task; B. represents the data of the SJ task. The icons on top represent the different conditions. Each dot corresponds to 10 trials. C. represents the scatterplot of the PSS measured with the TOJ plotted against the PSS measured with the SJ. D. represents the scatterplot of the WSS measured with the TOJ plotted against the WSS measured with the SJ. Each dot in panels C and D represents the PSS or WSS from one condition and observer (that is, there are 4 dots per observer, one per condition); datapoints from the same participant are linked by a grey line.

Results and psychometric fits of Experiment 2.

Data from different observers are represented in different rows. The first four columns represent the data in polar coordinates, whereas the next column represents the same data in Cartesian coordinates. The counter-clockwise rotation of the polar psychometric functions indicates that maximum perceived synchrony across the senses occurs when vision changes slightly before audition. Each dot corresponds to 15 trials. The last column represents the psychometric curve in the frequency domain: note the peak at 2 cpp in every participant, indicating the frequency-doubling effect. The error bars in the last column represent the 99% confidence intervals.

Stimuli and reverse correlation analyses of Parise & Ernst (2016).

Panel A represents three pairs of audiovisual stimuli (left), and their cross-correlation (right). Panel B shows a schematic representation of the reverse correlation analyses. This figure is adapted from [18].
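As an illustration of the reverse-correlation logic sketched in panel B, the snippet below computes a classification image by averaging a per-trial stimulus feature (e.g., the audiovisual cross-correlogram) separately for the two response categories and taking the difference. Stimuli and responses here are random placeholders, not the actual data or the paper's exact pipeline.

```python
# Minimal sketch: classification image via reverse correlation of trial features with responses.
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_lags = 500, 41
cross_corr = rng.normal(size=(n_trials, n_lags))      # per-trial cross-correlograms (hypothetical)
responses = rng.integers(0, 2, size=n_trials)          # binary judgments (hypothetical)

classification_image = (cross_corr[responses == 1].mean(axis=0)
                        - cross_corr[responses == 0].mean(axis=0))
```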

Stimuli of Locke & Landy (2017).

Three example pairs of audiovisual stimuli and their cross-correlograms. Note how the stimuli vary in terms of both temporal rate (i.e., the total number of clicks and flashes) and audiovisual correlation (the bottom pair being fully correlated).
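A cross-correlogram of this kind can be sketched as below for a pair of binary click/flash trains. The sequences are random placeholders, with the click train simply a delayed copy of the flash train so that the peak falls away from zero lag.

```python
# Minimal sketch: cross-correlogram of a flash train and a click train.
import numpy as np

rng = np.random.default_rng(2)
flashes = rng.integers(0, 2, size=200).astype(float)   # hypothetical visual sequence
clicks = np.roll(flashes, 3)                            # auditory sequence: same events, delayed by 3 samples

fz = (flashes - flashes.mean()) / flashes.std()
cz = (clicks - clicks.mean()) / clicks.std()
xcorr = np.correlate(fz, cz, mode='full') / flashes.size   # normalized cross-correlogram
lags = np.arange(-flashes.size + 1, flashes.size)
print("peak at lag:", lags[np.argmax(xcorr)])              # -3 here: the clicks lag the flashes
```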

Stimuli of Wen et al. (2020).

Stimuli consisted of rectangular temporal envelopes, with a variable audiovisual lag either at the onset (top) or offset (bottom). The amount of audiovisual lag varied across trials, following a staircase procedure.

Stimuli of Andersen & Mamassian (2008).

Stimuli consisted of on-steps, with a parametric manipulation of the lag between vision and audition, determined using the method of constant stimuli.

Stimuli of Nidiffer et al. (2018).

Stimuli consisted of a sinusoidal amplitude modulation over a pedestal intensity (Panel A). The pedestal had a 10 ms linear ramp at onset and offset. While the visual sinusoidal amplitude modulation was constant throughout the experiment (frequency = 6 Hz, zero phase offset), the auditory amplitude modulation varied in both frequency (from 6 Hz to 7 Hz, in 5 steps) and phase offset (from 0 to 360 deg, in 8 steps). Next to each pair of stimuli, we represent the scatterplot of the visual and auditory envelopes, which highlights the audiovisual correlation. The call-out boxes zoom in on the correlation between the audiovisual sinusoidal amplitude modulations (excluding the linear onset and offset ramps), whereas the main scatterplots also display the ramps. Panel B represents the audiovisual Pearson correlation for all the stimuli used by Nidiffer et al. The left panel shows the total audiovisual correlation, calculated while also including the onset and offset ramps; the right panel represents the partial correlation, calculated by considering only the sinusoidal amplitude modulation (i.e., the area inside the call-outs in Panel A), without the onset and offset linear ramps (as done in Nidiffer et al., 2018). Note that once the ramps are included, all audiovisual stimuli are strongly correlated. The stars in the correlation matrices mark the cells corresponding to the four stimuli represented in Panel A and to the datapoints represented by a star in Figure 5F.
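The two correlation measures in Panel B can be sketched as below: the Pearson correlation of the full envelopes (ramps included) versus that of the sinusoidal modulation alone. The stimulus duration, modulation depth, and the single frequency/phase combination used here are simplified assumptions, so the numerical values are only illustrative.

```python
# Minimal sketch: audiovisual envelope correlation with and without the onset/offset ramps.
import numpy as np

fs = 1000.0
t = np.arange(0, 0.5, 1 / fs)                              # 500 ms stimulus (hypothetical duration)
ramp = np.clip(np.minimum(t, t[-1] - t) / 0.010, 0, 1)     # 10 ms linear onset/offset ramps

vis = ramp * (1 + 0.5 * np.sin(2 * np.pi * 6 * t))                  # 6 Hz visual AM over a pedestal
aud = ramp * (1 + 0.5 * np.sin(2 * np.pi * 7 * t + np.pi / 2))      # 7 Hz auditory AM, 90 deg phase offset

r_total = np.corrcoef(vis, aud)[0, 1]                      # correlation of the full envelopes (ramps included)
inside = ramp == 1                                          # samples belonging to the modulation only
r_partial = np.corrcoef(vis[inside], aud[inside])[0, 1]     # correlation of the modulation alone (ramps excluded)
print(f"with ramps: r = {r_total:.2f}; modulation only: r = {r_partial:.2f}")
```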