Stimuli consisted of a sinusoidal amplitude modulation over a pedestal intensity (Panel A). The pedestal had a 10 ms linear ramp at onset and offset. While the visual sinusoidal amplitude modulation was constant throughout the experiment (frequency = 6 Hz, zero phase offset), the auditory amplitude modulation varied in both frequency (from 6 Hz to 7 Hz, in 5 steps) and phase offset (from 0 to 360 deg, in 8 steps). Next to each pair of stimuli, we show a scatterplot of the visual and auditory envelopes, which highlights the audiovisual correlation. The call-out boxes zoom in on the correlation between the audiovisual sinusoidal amplitude modulations (excluding the linear onset and offset ramps), whereas the main scatterplots also display the ramps. Panel B represents the audiovisual Pearson correlation for all the stimuli used by Nidiffer et al. (2018). The left panel shows the total audiovisual correlation, which was calculated including the onset and offset ramps; the right panel shows the partial correlation, calculated by considering only the sinusoidal amplitude modulation (i.e., the area inside the call-outs in Panel A) and excluding the linear onset and offset ramps (as done in Nidiffer et al., 2018). Note that once the ramps are included in the analyses, all audiovisual stimuli are strongly correlated (with only minor differences across conditions). The stars in the correlation matrices mark the cells corresponding to the four stimuli represented in Panel A and to the datapoints marked by a star in Figure 5F. Panel C compares the stimuli used by Nidiffer et al. (2018) with those of our Experiment 2. Both the time axis (abscissa) and the intensity envelope (ordinate) are drawn to scale. Although both experiments used stimuli with periodic amplitude modulations, there are important differences.
First, while the visual and auditory stimuli were completely off between consecutive trials in Nidiffer et al. (2018), in our study the pedestal was always present (without any interruption across consecutive trials). This introduces transients at both the beginning and the end of each stimulus in the study of Nidiffer et al. (2018), which are absent in our stimuli. The magnitude of such transients relative to the comparatively low depth of amplitude modulation (of the sinusoidal component) is the ultimate reason for the absence of frequency doubling in Nidiffer et al. (2018). In our study, transients at the beginning and end of each trial are prevented by presenting a constant pedestal level across trials, and by applying a Gaussian envelope to the depth of our square-wave modulation. Second, the two studies differ markedly in the duration and frequency of their stimuli. Specifically, our stimuli were 12 times longer than those of Nidiffer et al. (2018), but the frequency of amplitude modulation in our study was about 12 times lower (the exact ratio varying across the conditions of Nidiffer et al.). Finally, note the difference in the depth of amplitude modulation across the two experiments.
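The distinction between the total and partial correlations in Panel B can be illustrated with a minimal sketch. It is not the analysis code used for the figure. The 10 ms linear ramps, the 6 Hz visual modulation, and the range of auditory frequencies and phases are taken from the caption; the sampling rate, stimulus duration, pedestal level, modulation depth, the choice to gate the whole envelope with the ramp, and the exact phase steps (0 to 315 deg in steps of 45 deg) are assumptions made here for illustration only.

```python
import numpy as np

FS = 1000          # sampling rate in Hz (assumed)
DURATION_S = 0.5   # stimulus duration in s (assumed)
RAMP_S = 0.010     # 10 ms linear onset/offset ramp (from the caption)
PEDESTAL = 1.0     # pedestal level, arbitrary units (assumed)
DEPTH = 0.05       # depth of the sinusoidal modulation (assumed)
N_RAMP = int(round(RAMP_S * FS))


def envelope(freq_hz, phase_deg):
    """Pedestal plus sinusoidal modulation, gated by linear ramps (assumed form)."""
    t = np.arange(int(round(DURATION_S * FS))) / FS
    gate = np.ones_like(t)
    gate[:N_RAMP] = np.linspace(0.0, 1.0, N_RAMP)    # onset ramp
    gate[-N_RAMP:] = np.linspace(1.0, 0.0, N_RAMP)   # offset ramp
    phase = np.deg2rad(phase_deg)
    return gate * (PEDESTAL + DEPTH * np.sin(2 * np.pi * freq_hz * t + phase))


def pearson(x, y):
    """Pearson correlation coefficient between two envelopes."""
    return float(np.corrcoef(x, y)[0, 1])


def total_and_partial(aud_freq_hz, aud_phase_deg):
    """Correlation with the ramps (total) and without them (partial)."""
    vis = envelope(6.0, 0.0)                  # visual: 6 Hz, zero phase offset
    aud = envelope(aud_freq_hz, aud_phase_deg)
    steady = slice(N_RAMP, -N_RAMP)           # samples outside the two ramps
    return pearson(vis, aud), pearson(vis[steady], aud[steady])


freqs = np.linspace(6.0, 7.0, 5)   # 6 to 7 Hz in 5 steps
phases = np.arange(0, 360, 45)     # 8 phase steps (assumed spacing)
total = np.array([[total_and_partial(f, p)[0] for p in phases] for f in freqs])
partial = np.array([[total_and_partial(f, p)[1] for p in phases] for f in freqs])
```

Under these assumed values, the anti-phase 6 Hz condition has a partial correlation close to -1, whereas its total correlation remains positive because the shared onset and offset ramps dominate the covariance. The exact values depend on the assumed depth, duration, and sampling rate.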