Auditory selective attention is enhanced by a task-irrelevant temporally coherent visual stimulus in human listeners

  1. Ross K Maddox
  2. Huriye Atilgan
  3. Jennifer K Bizley
  4. Adrian KC Lee (corresponding author)
  1. University of Washington, United States
  2. University College London, United Kingdom
3 figures and 3 videos

Figures

Construction of the auditory and visual stimuli.

(A) Amplitude envelopes shown for 2 s of the target (black) and masker (red) auditory streams. Trials were 14 s long, over which the target and masker envelopes were independent. (B) Visual radius envelopes for the three audio–visual coherence conditions: match-target (black), match-masker (red), and match-neither (blue). (C) Example frames of the visual disc at three radius values, according to the match-target envelope in B. (D) Carrier frequency modulation events for the pitch task. Each deflection was one period of a sinusoid, reaching ±1.5 semitones over 100 ms. (E) Changes in vowel formants F1 and F2 for the timbre events. There were two streams, one with vowel /u/ and the other with vowel /a/. Timbre events lasted 200 ms and morphed formants F1 and F2 slightly toward /ε/ and /i/, respectively, and then back to /u/ and /a/. The closed circle endpoints show the steady-state vowel and the open circle point shows the average reversal point across subjects. Note that the change in formants during the morph event was small compared to the distance between vowels in the F1–F2 space. (F) The visual stimulus during a flash (100 ms duration).

https://doi.org/10.7554/eLife.04995.003
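
The pitch event described in panel D can be sketched in a few lines of Python. This is only an illustration of a one-period sinusoidal deflection of ±1.5 semitones over 100 ms, not the authors' stimulus code. The function names, the 440 Hz carrier, the 1 kHz sampling of the trajectory, and the choice to deflect upward first are all assumptions made for the example.

```python
import math


def pitch_event_semitones(t, duration=0.1, depth=1.5):
    """Deflection in semitones at time t (seconds) after event onset.

    One full period of a sinusoid spans `duration`; the peak deflection is
    +/- `depth` semitones. Starting with an upward deflection is an
    assumption of this sketch.
    """
    if t < 0 or t > duration:
        return 0.0
    return depth * math.sin(2 * math.pi * t / duration)


def deflected_frequency(f0, semitones):
    """Convert a semitone offset into a frequency in Hz (12 semitones per octave)."""
    return f0 * 2 ** (semitones / 12)


def event_trajectory(f0, fs=1000, duration=0.1, depth=1.5):
    """Carrier frequency (Hz) sampled at rate fs across one pitch event."""
    n = int(round(duration * fs))
    return [
        deflected_frequency(f0, pitch_event_semitones(i / fs, duration, depth))
        for i in range(n + 1)
    ]


# Hypothetical 440 Hz carrier, used only for illustration.
trajectory = event_trajectory(440.0)
```

The trajectory starts and ends at the carrier frequency, and its extremes sit 1.5 semitones above and below it, which matches the shape described in the caption.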
Behavioral results.

Each behavioral measure is shown in two panels: (A) d′ sensitivity, (B) bias, (C) hit rate, (D) false alarm rate, (E) visual hit rate. Left: mean ± SEM for each condition across all subjects (solid squares) as well as for pitch and timbre events separately (empty triangles and circles, respectively). Right: normalized mean ± SEM across all subjects demonstrating the within-subjects effects. Measurements with significant effects of coherence (viz., sensitivity and hit rate) are denoted with bold type and an asterisk on their vertical axis label. Post hoc differences between conditions that are significant at p < 0.017 are shown with brackets and asterisks. See ‘Results’ for outcomes of all statistical tests.

https://doi.org/10.7554/eLife.04995.007
Figure 2—source data 1

Behavioral results for individual subjects.

Raw performance data for each subject for each of the panels in Figure 2. The data are in plaintext CSV format and can be opened with any text or spreadsheet editor. See ‘Materials and methods’ for specific descriptions of how each category was calculated.

https://doi.org/10.7554/eLife.04995.008
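
For readers working with the per-subject hit and false alarm rates, the sketch below shows the textbook signal-detection definitions of sensitivity (d′) and criterion. It is an assumption that these match the paper's measures: the exact calculations, including the bias measure and any handling of rates equal to 0 or 1, are given in ‘Materials and methods’ and are not reproduced here. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def z(p):
    """Inverse standard-normal CDF of a rate strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        # Rates of exactly 0 or 1 need a correction that this sketch
        # deliberately leaves to the caller.
        raise ValueError("rate must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(p)


def d_prime(hit_rate, fa_rate):
    """Sensitivity: z(hit rate) minus z(false alarm rate)."""
    return z(hit_rate) - z(fa_rate)


def criterion(hit_rate, fa_rate):
    """Criterion c: negative mean of the two z-scores (0 means no bias)."""
    return -0.5 * (z(hit_rate) + z(fa_rate))
```

With these definitions, equal hit and false alarm rates give a d′ of zero, and rates that are symmetric about 0.5 give a criterion of zero.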
Conceptual model of coherence-based cross-modal object formation in the pitch task.

Each sensory stream is shown as a box containing a connected set of features. Auditory streams are on the left half of the gray sensory boundary and visual streams on the right. Cross-modal coherence, where present, is shown as a line connecting the coherent auditory and visual features: specifically, the auditory amplitude and the visual size. This results in cross-modal binding of the coherent auditory and visual streams, enhancing each stream's features, which is beneficial in the match-target condition (A), problematic in the match-masker condition (B), and not present in the match-neither condition (C). Attended features are indicated with a yellow ellipse. Enhancement/suppression resulting from object formation is reflected in the strength of the box drawn around each stream's features (i.e., thick lines indicate enhancement, broken lines show suppression).

https://doi.org/10.7554/eLife.04995.009

Videos

Video 1
Two example trials from the match-target condition.

The video shows two trials from the pitch task in which the target auditory stream is coherent with the visual stimulus. The target auditory stream starts 1 s before the masker stream (lower pitch in the first trial, higher pitch in the second). The task is to respond by pressing a button to brief pitch perturbations in the target auditory stream but not the masker auditory stream, as well as to cyan flashes in the ring of the visual stimulus.

https://doi.org/10.7554/eLife.04995.004
Video 2
Two example trials from the match-masker condition.

As in Video 1, except the visual stream is coherent with the masker auditory stream.

https://doi.org/10.7554/eLife.04995.005
Video 3
Two example trials from the match-neither condition.

As in Video 1, except the visual stream is coherent with neither auditory stream.

https://doi.org/10.7554/eLife.04995.006

Cite this article

  1. Ross K Maddox
  2. Huriye Atilgan
  3. Jennifer K Bizley
  4. Adrian KC Lee
(2015)
Auditory selective attention is enhanced by a task-irrelevant temporally coherent visual stimulus in human listeners
eLife 4:e04995.
https://doi.org/10.7554/eLife.04995