(a) Button-press data, aligned at cue onset, were averaged over all crossmodal cue and visual-only periods per subject, then averaged over subjects for each cue condition. Y-axis represents the proportion of button-presses reporting congruent crossmodal and visual flicker at each time point, sampled at 60 Hz (or every 16.7 ms). Colored lines and their shading show mean ± 1 standard error across 34 subjects during attended and ignored cues (thick and thin lines) for low- and high-frequency (green and red colors). Black lines represent the equivalent probability for visual-only periods, serving as baseline (Materials and methods). Asterisks indicate a significant difference between cues at each time point (repeated-measures ANOVA followed by planned comparisons). We use FDR q = 0.05 for the statistical threshold unless noted otherwise. (b) Crossmodal effects are mediated by task-relevant attention. Our measure of crossmodal effects, the perceptual switch index (PSI, y-axis), is defined as the mean difference for the probability of seeing congruent flicker during 1–4 s after the cue onset for attended-low-frequency cues (thick green in panel a) compared to other cue types. Attention-task performance (x-axis) is the correlation coefficient between the reported and actual congruent stimuli when comparing between rivalry percepts and crossmodal cues at offset (See Materials and methods for details). The across-subject correlation between the two variables was strong (r(32) = .46, p = 0.006, two-tailed), demonstrating the crossmodal effects were strongly dependent on performance during the attention task. (c) and (d) Button-press data aligned at cue onset, with lines and shading as in panel (a). Y-axis showing the proportion of button-presses reporting the mismatched flicker at each time point, after (c) visual-crossmodal mismatch, or (d) visual-crossmodal match at cue onset. Only the data of the attended low-frequency condition differed significantly from visual-only periods.