
Vestigial auriculomotor activity indicates the direction of auditory attention in humans

  1. Daniel J Strauss (corresponding author)
  2. Farah I Corona-Strauss
  3. Andreas Schroeer
  4. Philipp Flotho
  5. Ronny Hannemann
  6. Steven A Hackley
  1. Systems Neuroscience and Neurotechnology Unit, Faculty of Medicine, Saarland University & School of Engineering, htw saar, Germany
  2. Audiological Research Unit, Sivantos GmbH, Germany
  3. Clinical and Cognitive Neuroscience Laboratory, Department of Psychological Sciences, University of Missouri, United States
Research Article
Cite this article as: eLife 2020;9:e54536 doi: 10.7554/eLife.54536

Abstract

Unlike dogs and cats, people do not point their ears as they focus attention on novel, salient, or task-relevant stimuli. Our species may nevertheless have retained a vestigial pinna-orienting system that has persisted as a ‘neural fossil’ within the brain for about 25 million years. Consistent with this hypothesis, we demonstrate that the direction of auditory attention is reflected in sustained electrical activity of muscles within the vestigial auriculomotor system. Surface electromyograms (EMGs) were taken from muscles that either move the pinna or alter its shape. To assess reflexive, stimulus-driven attention, we presented novel sounds from speakers at four different lateral locations while the participants silently read a boring text in front of them. To test voluntary, goal-directed attention, we instructed participants to listen to a short story coming from one of these speakers while ignoring a competing story from the corresponding speaker on the opposite side. In both experiments, EMG recordings showed larger activity at the ear on the side of the attended stimulus, but with slightly different patterns. Upward movement (perking) differed according to the lateral focus of attention only during voluntary orienting; rearward folding of the pinna’s upper-lateral edge exhibited such differences only during reflexive orienting. The existence of a pinna-orienting system in humans, one that is experimentally accessible, offers opportunities for basic as well as applied science.

eLife digest

Dogs, cats, monkeys and other animals perk their ears in the direction of sounds they are interested in. Humans and their closest ape relatives, however, appear to have lost this ability. Some humans are able to wiggle their ears, suggesting that some of the brain circuits and muscles that allow automatic ear movements towards sounds are still present. This may be a ‘vestigial feature’, an ability that is maintained even though it no longer serves its original purpose.

Now, Strauss et al. show that vestigial movements of muscles around the ear indicate the direction of sounds a person is paying attention to. In the experiments, human volunteers tried to read a boring text while surprising sounds, such as a traffic jam, a baby crying, or footsteps, were played. During this exercise, Strauss et al. recorded the electrical activity in the muscles of their ears to see whether these muscles responded according to the direction the sound came from. In a second set of experiments, the same electrical recordings were made as participants listened to a podcast while a second podcast was playing from a different direction. The individuals’ ears were also recorded using high-resolution video.

Both sets of experiments revealed tiny involuntary movements in muscles surrounding the ear closest to the direction of a sound the person is listening to. When the participants tried to listen to one podcast and tune out another, they also made ear ‘perking’ movements in the direction of their preferred podcast.

The results suggest that movements of the vestigial muscles in the human ear indicate the direction of sounds a person is paying attention to. These tiny movements could be used to develop better hearing aids that sense the electrical activity in the ear muscles and amplify sounds the person is trying to focus on, while minimizing other sounds.

Introduction

Watching the ears allows an equestrian to gauge their mount’s shifting attention. Ear movements are not a useful cue in humans or apes because higher primates have lost the ability to orient by adjusting pinna shape and focal direction. Instead we judge a person’s attention by their gaze direction. In thousands of research reports each year, though, casual observation of ocular orienting is replaced by sophisticated recording techniques. We show in the present paper that similar electrical and optical techniques allow us to extract muscular correlates of pinna-orienting in our species and even render subtle pinna-orienting movements visible. Activation of the ear muscles is directionally specific and it occurs during voluntary as well as reflexive attention.

A review by Hackley, 2015 of research on pinna-orienting in humans identified three relevant findings scattered across the preceding 100-or-so years. The first was Wilson’s oculo-auricular phenomenon (Wilson, 1908), in which shifting the gaze hard to one side elicits a 1 to 4 mm deflection of the lateral rim of both ears. Its relevance to spatial attention is uncertain, though, with diverging results across studies (see, for example, Gerstle and Wilkinson, 1929; Urban et al., 1993; O'Beirne and Patuzzi, 1999). Additional evidence comes from a 1987 study (Hackley et al., 1987) of the bilateral postauricular muscle (PAM) reflex (onset latency = 10 ms) to acoustic onset transients. Increased amplitudes were observed when subjects directed their attention to a stream of tones on the same side as the recorded muscle while ignoring a competing, contralateral stream. Comparisons across left/right stimulus, attention, and PAM combinations localized the modulation to the motor limb of the reflex arc. This pattern could indicate that the muscle behind an ear is primed when attention is directed toward that side. Finally, Stekelenburg and van Boxtel, 2002 found that the automatic capture of attention by unexpected sounds coming from a speaker hidden to the left of the participant elicited greater activity in the left than in the right PAM.

Apart from the research just described, functional studies of the human auriculomotor system have been mainly limited to the PAM reflex, in the context of audiometry or affective psychophysiology. The role of the auriculomotor system in spatial attention remains essentially unexplored. Here we present evidence that our brains retain vestigial circuitry for orienting the pinnae during both exogenous, stimulus-driven attention to brief, novel sounds and endogenous, goal-directed attention to sustained speech. We also demonstrate a complex interplay of different auricular muscles that may be causally linked to subtle movements of the pinnae.

Results

Experiment 1 – Exogenous attention

To examine automatic, stimulus-driven attention, we used novel sounds similar to those in the Stekelenburg and van Boxtel, 2002 study (for example, a traffic jam, a baby crying, or footsteps). However, we presented them randomly from four different speakers (at ± 30° and ± 120°; Figure 1) rather than from just one, while the subject read a boring essay. As we were interested in the interactive role of distinct muscles in attempting to shape and point the pinnae, we recorded EMG from the posterior, anterior, superior, and transverse auricular muscles (PAM, AAM, SAM, and TAM). Visual evidence had previously been limited to still photos of Wilson’s oculo-auricular phenomenon (Wilson, 1908), so we supplemented our EMG data with videos from four high-definition cameras (see Methods and Video 1, Video 2, and Video 3). To confirm that our findings would generalize to different age groups, older (62.7 ± 5.9 y) as well as younger (24.1 ± 3.1 y) adults were tested.

Figure 1
Experimental setup.

(A) Four loudspeakers presented novel sounds (Exp. 1) or stories (Exp. 2) at 30° to the left or right of fixation (± 30°) or 30° behind the interaural axis (± 120°). Instructions, text, or a fixation cross was displayed on a 55-inch flat screen. (B) Surface EMGs were recorded bilaterally from four auricular muscles, as well as from the left zygomaticus major, frontalis, and sternocleidomastoideus, using a bandpass of 10–1000 Hz and a sampling rate of 9600 Hz. Separation of paired auricular electrodes was 1 cm.
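The caption specifies only the passband (10–1000 Hz) and the sampling rate (9600 Hz), not the filter design. As a minimal pure-Python sketch, assuming a second-order Butterworth-type high-pass at 10 Hz cascaded with a second-order low-pass at 1000 Hz (biquad coefficients from the widely used RBJ audio-EQ cookbook formulas), band-limiting followed by the full-wave rectification used for the rectified traces in the videos could look as follows. The function names are hypothetical, and this is not the authors' processing chain.

```python
import math


def biquad_coeffs(kind, f0, fs, q=1 / math.sqrt(2)):
    """RBJ cookbook biquad (Butterworth-type for q = 1/sqrt(2)).

    Returns coefficients normalized by a0: (b0, b1, b2, a1, a2).
    """
    w0 = 2 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b0 = b2 = (1 - cos_w0) / 2
        b1 = 1 - cos_w0
    elif kind == "highpass":
        b0 = b2 = (1 + cos_w0) / 2
        b1 = -(1 + cos_w0)
    else:
        raise ValueError(f"unsupported filter kind: {kind}")
    a0 = 1 + alpha
    a1 = -2 * cos_w0
    a2 = 1 - alpha
    return b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0


def apply_biquad(coeffs, x):
    """Causal direct-form-I difference equation."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def bandpass_emg(x, fs=9600.0, low=10.0, high=1000.0):
    """Hypothetical 10-1000 Hz band-pass: high-pass at `low`, then low-pass at `high`."""
    y = apply_biquad(biquad_coeffs("highpass", low, fs), x)
    return apply_biquad(biquad_coeffs("lowpass", high, fs), y)


def rectify(x):
    """Full-wave rectification (absolute value of each sample)."""
    return [abs(v) for v in x]
```

Because this filter is causal, it delays the signal. A zero-phase (forward-backward) variant would avoid that lag, which matters when onset latencies of tens of milliseconds are read off averaged waveforms.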

Video 1
Experiment 1 – Ear movement example from a trial with a novel sound at the right posterior speaker.

The right half of the display portrays evoked movements of the ipsilateral pinna in three ways. The large video clip of the pinna uses digital magnification to render the overall pattern of movement apparent. The color overlay in these videos indicates the motion magnitude. Just below the video and to the right, an unrectified EMG recording of the postauricular muscle is shown in co-registration with the video. The global head motion was reduced by a 2-dimensional rigid pre-registration with respect to a set of manually specified reference points on the head (see also Figure 2—figure supplement 1). The 3-dimensional graph medial to the 2-D graph includes a vector that indicates moment-by-moment changes in EMG activity of the superior auricular muscle (SAM, the vertical axis), transverse auricular muscle (TAM, a horizontal axis), and the difference between activity in the posterior and anterior auricular muscles (PAM-AAM, the other horizontal axis). The left half of this video gives corresponding information for the contralateral ear which, consistent with evidence presented in the main text, was not as active as the ipsilateral one.
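The rigid pre-registration is described only as 2-dimensional and based on manually specified reference points. Assuming it estimates a rotation plus a translation (no scaling) by least squares from paired landmarks, the 2-D counterpart of the Kabsch/Procrustes solution has a closed form. The sketch below is illustrative; the landmark coordinates and function names are hypothetical.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src landmarks onto dst.

    Assumes a rigid model (rotation + translation, no scaling), i.e. the 2-D
    Kabsch/Procrustes problem: minimize sum |R(theta) p_i + t - q_i|^2.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two paired reference points")
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    a = b = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy  # center both sets
        a += px * qx + py * qy  # sum of dot products
        b += px * qy - py * qx  # sum of 2-D cross products
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    t = (dx - (c * sx - s * sy), dy - (s * sx + c * sy))
    return theta, t


def apply_rigid_2d(theta, t, pts):
    """Rotate points by theta (radians) about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in pts]
```

In such a scheme, each frame's landmarks would be fitted to those of a reference frame and the resulting transform applied to the whole frame before motion magnification, so that residual motion reflects the pinna rather than the head.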

Video 2
Experiment 1 – The right ear example from the previous video, but with four different videos in sequence.

The first video of the sequence shows the raw recording (without digital magnification). The second video shows the digitally magnified motion, the third shows the magnified motion with color overlay as in Video 1, and the fourth shows the three-dimensional motion from a different angle. This sequence illustrates the impact of the digital motion magnification and the depth information about ear motion that can be derived from a stereo computer vision setup such as the one used here.
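The depth information mentioned above follows from standard stereo geometry. Assuming a calibrated, rectified camera pair with pinhole optics (the calibration of the present setup is not described here), the depth $Z$ of a point is recovered from its horizontal disparity $d$ between the two images:

$$Z = \frac{f\,B}{d}, \qquad d = x_{\mathrm{left}} - x_{\mathrm{right}},$$

where $f$ is the focal length in pixels and $B$ the baseline between the cameras. Differentiating with respect to $d$ gives the depth change per unit disparity:

$$\lvert \Delta Z \rvert \approx \frac{Z^{2}}{f\,B}\,\lvert \Delta d \rvert .$$

With purely hypothetical values of $Z = 0.5$ m, $f = 2000$ px, and $B = 0.1$ m, one pixel of disparity corresponds to $0.25/200$ m $= 1.25$ mm of depth, which illustrates why resolving submillimeter pinna displacements in depth calls for sub-pixel disparity estimates.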

Video 3
Experiment 2 – Ear movement example from a participant who exhibited exceptionally large, long-lasting involuntary auricular muscle activations and ear motion during the endogenous attention task in Experiment 2.

The attention of the participant was directed to the story played from the posterior right speaker. The organization of the plots and the co-registration are as in Video 1; however, this time the raw videos without digital magnification are shown. The raw videos are played at increased speed, time-locked to the time axis (in minutes) of the one-dimensional plots of the rectified postauricular muscle activity. Note that the time axis covers the entire session, including the instructions and the introduction to the stories that preceded the directional listening task. The listening task started at approximately 2 min. The video also documents the end of the listening task (around 7 min), accompanied by a time-locked offset of the muscle activation and the pinna displacement. A close temporal correspondence between the rectified postauricular muscle activity and the motion magnitude in the videos, consistent with a causal relation, is clearly noticeable, especially for the ipsilateral ear.
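The visual impression of a close correspondence can be quantified. A minimal sketch, assuming the rectified PAM envelope and the per-frame motion magnitude have already been resampled to a common rate (for example, the video frame rate), computes the Pearson correlation at a range of non-negative lags and reports the lag with the highest correlation. A positive best lag, with motion trailing muscle activity, is consistent with, but does not by itself prove, a causal role of the muscle. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(emg_env, motion, max_lag):
    """Return (lag, r) for the lag in 0..max_lag maximizing corr(emg[t], motion[t + lag]).

    Both sequences must be sampled at the same rate and have equal length.
    """
    best = None
    for lag in range(max_lag + 1):
        r = pearson(emg_env[: len(emg_env) - lag], motion[lag:])
        if best is None or r > best[1]:
            best = (lag, r)
    return best
```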

The signal-averaged EMG waveforms of Figure 2 show well-defined responses with an onset latency of about 70 ms, responses that vary in amplitude, duration, and morphology according to the relative direction of the sound source. The inter- and intra-subject variability for the analyzed auricular muscles is shown in Figure 3 and Figure 4, respectively. These plots portray the consistency of the PAM, AAM, and TAM responses across stimuli and subjects, especially for stimulation from the back. For the statistical analysis, mean amplitudes were subjected to a mixed, repeated-measures analysis of variance, with factors of age group, stimulus-muscle correspondence (ipsi-/contralateral), and anterior/posterior stimulus direction. EMG amplitudes were larger for stimulus sources on the same side as the recorded ear for PAM, AAM, and TAM [F(1,26) = 47.44, 17.01, and 47.53, respectively; p-values < 0.001; ηp² = 0.65, 0.40, and 0.65] but not for SAM.
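Signal averaging of this kind can be sketched in a few lines. The sketch assumes epochs time-locked to sound onset, full-wave rectification, and subtraction of the mean pre-stimulus level. The normalization applied in Figure 2 and the criterion used to read off the onset latency are not specified here, so the fixed threshold below is a hypothetical choice, as are the function names.

```python
def event_related_average(signal, onsets, pre, post):
    """Average rectified, baseline-corrected epochs time-locked to stimulus onsets.

    Each epoch spans samples [onset - pre, onset + post); its mean
    pre-stimulus value is subtracted before averaging across trials.
    """
    epochs = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > len(signal):
            continue  # skip epochs that extend beyond the recording
        seg = [abs(v) for v in signal[onset - pre:onset + post]]  # full-wave rectification
        baseline = sum(seg[:pre]) / pre
        epochs.append([v - baseline for v in seg])
    if not epochs:
        raise ValueError("no complete epochs to average")
    return [sum(col) / len(epochs) for col in zip(*epochs)]


def onset_latency_ms(average, pre, fs, threshold):
    """Latency (ms) of the first post-stimulus sample above `threshold`, or None."""
    for i in range(pre, len(average)):
        if average[i] > threshold:
            return (i - pre) * 1000.0 / fs
    return None
```

Averaging across trials suppresses EMG activity that is not time-locked to the sound, which is what allows small, stimulus-locked auricular responses to emerge from the background.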

Figure 2
Experiment 1.

Grand average (N = 28) of the baseline-corrected and normalized event-related electromyograms at the four auricular muscles, recorded ipsilateral (left panel) and contralateral (right panel) to stimulation; top: front speakers (30°), bottom: back speakers (120°). The contralateral/ipsilateral organization of our data set is justified by a preliminary analysis that obtained null effects for left versus right using a more complete factorial structure (left/right stimulus direction × left/right recording site). The following figure supplement is available for Figure 2: Figure 2—figure supplement 1, analysis of video recordings from one participant who exhibited submillimeter pinna displacements in response to stimulation from the back speakers. This figure supplement is complemented by Video 1 and Video 2.

Figure 3
Experiment 1 – Responses of the PAM to stimuli from the back speakers, showing intersubject variability.

Top panels: every row corresponds to the averaged response of one participant; amplitude is encoded in color. The top rows (1–16) represent younger adult participants; the bottom rows (17–28), older adults. Bottom panels: mean and standard deviation based on the above plots. The following figure supplements are available for Figure 3: Figure 3—figure supplement 1, the described intersubject variability analysis for the AAM; Figure 3—figure supplement 2, the same analysis for the SAM; and Figure 3—figure supplement 3, the same analysis for the TAM.

Figure 4
Experiment 1 – Intrasubject variability of the PAM: mean and standard deviation of the phasic responses (50–300 ms) of every participant.

Top panels: responses to the front speakers. Bottom panels: responses to the back speakers. Left panels: responses of the younger adults. Right panels: responses of the older adults. Blue represents ipsilateral responses; red represents contralateral responses. The following figure supplements are available for Figure 4: Figure 4—figure supplement 1, the described intrasubject variability analysis for the AAM; Figure 4—figure supplement 2, the same analysis for the SAM; and Figure 4—figure supplement 3, the same analysis for the TAM.

Responses were also larger to sounds emanating from the back than from the front speakers for PAM, AAM, and TAM [F(1,26) = 32.1, 12.0, and 19.9, respectively; p-values < 0.003; ηp² = 0.55, 0.32, and 0.43]. Posterior, ipsilateral stimulation elicited the most vigorous responses from these three muscles: the interaction between the factors ipsi/contralateral and anterior/posterior for PAM, AAM, and TAM yielded F(1,26) = 40.4, 14.9, and 23.6, respectively; p-values < 0.002; ηp² = 0.61, 0.36, and 0.48. There were no interactions involving age group, but a main effect indicated that older participants had smaller AAM responses [F(1,26) = 6.0, p < 0.03, ηp² = 0.19].
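Because each reported effect size is determined by its F value and degrees of freedom, the figures above can be cross-checked with the standard identity ηp² = F·df_effect / (F·df_effect + df_error). A small Python check using the Experiment 1 values (df = 1, 26); the helper name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic:

    eta_p^2 = F * df_effect / (F * df_effect + df_error)
    """
    return f_value * df_effect / (f_value * df_effect + df_error)


# F values and effect sizes as reported for Experiment 1 (df = 1, 26).
reported = [
    ("PAM, ipsi/contra", 47.44, 0.65),
    ("AAM, ipsi/contra", 17.01, 0.40),
    ("TAM, ipsi/contra", 47.53, 0.65),
    ("PAM, front/back", 32.1, 0.55),
    ("AAM, front/back", 12.0, 0.32),
    ("TAM, front/back", 19.9, 0.43),
]

for label, f_value, eta_reported in reported:
    eta = partial_eta_squared(f_value, 1, 26)
    print(f"{label}: recomputed {eta:.2f}, reported {eta_reported:.2f}")
```

The same identity reproduces, for example, ηp² = 0.46 for the SAM effect in Experiment 2 (F(1, 19) = 16.3). A value that lands exactly on a rounding boundary, such as 6.0/32 = 0.1875 for the AAM age effect, may round to 0.18 or 0.19 depending on the rounding convention.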

These results support the hypothesis that the human brain retains circuits that attempt to point the ears in the direction of unexpected, potentially relevant sounds. The corresponding vestigial auriculomotor drive appears to be causally linked to very small ear displacements, see Figure 2—figure supplement 1, Video 1, and Video 2. Having documented the existence of directionally-appropriate responses of the ear muscles to brief novel sounds, we turn now to a qualitatively distinct type of attention.

Experiment 2 – Endogenous attention

To examine voluntary, goal-directed attention, we used the classic dichotic-listening paradigm (Cherry, 1953; Hillyard et al., 1973). Two competing short stories were played either over the two front speakers or over the two back speakers. To increase motivation, participants were allowed to choose, after a brief introduction, which of the two stories (podcasts) they would like to listen to. They were then told which speaker that story would be presented from. Participants were instructed to listen carefully while looking at a fixation cross and, as in the preceding experiment, to hold their head still on a chin rest. Upon completion, a new story was picked and the listening direction was switched to one of the other speakers. Recording methods were identical to those of the exogenous experiment. Muscle activity was quantified as the mean absolute EMG energy, computed over the entire course of each 5-min listening trial (Figure 5) and over consecutive 10-s segments for the temporal analysis (Figure 6).
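The energy measure is not defined in detail here. A minimal sketch, assuming that ‘energy’ means the mean squared amplitude of the band-passed EMG and that the segment-wise normalization of Figure 6 divides each side by the ipsilateral-plus-contralateral total (both are assumptions, and the function names are hypothetical), could look as follows. The whole-trial measure of Figure 5 corresponds to a single segment spanning the 5-min trial.

```python
def segment_energy(x, fs, seg_s=10.0):
    """Mean squared amplitude in consecutive, non-overlapping segments of seg_s seconds.

    A trailing partial segment is discarded.
    """
    n = int(round(seg_s * fs))
    return [sum(v * v for v in x[i:i + n]) / n for i in range(0, len(x) - n + 1, n)]


def normalize_pair(ipsi, contra):
    """Segment-wise normalization so that ipsilateral + contralateral energy sum to 1."""
    ipsi_n, contra_n = [], []
    for a, b in zip(ipsi, contra):
        total = a + b
        ipsi_n.append(a / total if total > 0 else 0.5)
        contra_n.append(b / total if total > 0 else 0.5)
    return ipsi_n, contra_n
```

Normalizing within each segment removes slow drifts in overall EMG level (for example, from electrode impedance or fatigue) that affect both ears alike, leaving the ipsilateral share of activity as the quantity tracked over time.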

Figure 5
Experiment 2.

Grand average of the PAM, AAM, and SAM activity when stories were played from the front (top) and back speakers (bottom). Shown is the normalized (total) energy of the left/right recording channels during attention to the left or right story (error bars represent the standard error). The following figure supplements are available for Figure 5: Figure 5—figure supplement 1, subband analysis of the described ipsi- vs. contralateral effect for PAM, AAM, and SAM; Figure 5—figure supplement 2, reported results for a selected narrow frequency band.

Figure 6
Experiment 2.

Time-resolved activity (each sampling point represents the energy in consecutive 10-s segments) after pooling the ipsi- and contralateral signals with a segment-wise normalization, for the front (top) and back speakers (bottom).

As in the exogenous study, EMG energy at PAM and AAM was larger on the side to which attention was focused [analysis corresponding to Figure 5: F(1, 19) = 15.2 and 4.6, respectively; p = 0.001 and 0.04; ηp² = 0.44 and 0.20], an effect that is particularly strong in narrow-band, middle-frequency components of the signal (see Figure 5—figure supplements 1 and 2 and the tables in Supplementary file 1).

A different pattern emerged for the other two muscles. Whereas TAM but not SAM activity had reflected the lateralization of transient, exogenous attention, the reverse was true for sustained, goal-directed attention. That is, mean EMG energy at SAM was larger at the ipsi- than at the contralateral ear in Experiment 2 [F(1, 19) = 16.3; p = 0.001; ηp² = 0.46], but there was no such difference for TAM.

Another main effect indicated that auricular muscle activation was generally enhanced when participants listened to one of the two speakers that were slightly behind, as opposed to in front of, them [PAM, AAM, TAM, SAM: F(1, 19) = 5.7, 3.1, 8.1, and 12.0, respectively; p = 0.03, 0.09, 0.01, and 0.003; ηp² = 0.23, 0.14, 0.30, and 0.39], although for AAM this effect reached only trend level. These effects did not interact with each other or with age. Although PAM activity declines over time, EMG energy of all three muscles is clearly sustained across the 5-min sessions (Figure 6). A corresponding sustained deflection of the pinna is also noticeable in the co-registered Video 3.

Potential motor confounds

An alternative to the account we have been developing is that participants in Experiments 1 and 2 may have shifted their gaze toward the attended source. This would then have triggered Wilson’s oculo-auricular phenomenon (Wilson, 1908), that is, auriculomotor activity secondary to large gaze shifts. To test this hypothesis, we segmented the horizontal electrooculogram (EOG) in the same way as the auricular EMG. Voltages were converted to degrees of visual angle separately for each participant, based on a cursor-tracking calibration protocol (± 35°). Figure 7 and Figure 8 document an absence of eye movements that were systematically related to attention direction. A limitation of these findings is that electro-oculographic recordings have a resolution of only 1–2°. However, gaze shifts smaller than 30° are rarely accompanied by auriculomotor activity (Urban et al., 1993). Representative examples of macrosaccades during reading in Experiment 1, with co-registered auricular muscle activity, can be found in Figure 7—figure supplement 1; there is no obvious linkage between saccades and PAM responses. The mean range of visual angles observed in this example generalizes across subjects (Figure 7—figure supplement 2). Moreover, when all macrosaccades from all subjects in Experiment 1 are considered, our data show no systematic relation between auditory stimuli and macrosaccades (Figure 7—figure supplement 3).
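The per-participant calibration can be illustrated with a least-squares line. The sketch assumes that the cursor-tracking protocol yields paired EOG voltages and known target angles within ± 35° and that the EOG is linear over that range. The simple velocity-threshold saccade detector and its 30 °/s default are hypothetical stand-ins, since the macrosaccade detection criterion is not given here, and all function names are illustrative.

```python
def fit_eog_calibration(volts, degrees):
    """Ordinary least-squares fit of degrees = gain * volts + offset for one participant."""
    n = len(volts)
    mean_v = sum(volts) / n
    mean_d = sum(degrees) / n
    sxx = sum((v - mean_v) ** 2 for v in volts)
    sxy = sum((v - mean_v) * (d - mean_d) for v, d in zip(volts, degrees))
    gain = sxy / sxx
    return gain, mean_d - gain * mean_v


def to_degrees(volts, gain, offset):
    """Convert EOG voltages to horizontal gaze angle using the fitted line."""
    return [gain * v + offset for v in volts]


def detect_saccade_onsets(gaze_deg, fs, vel_threshold=30.0):
    """Indices where absolute gaze velocity (deg/s) first rises above vel_threshold."""
    onsets, in_saccade = [], False
    for i in range(1, len(gaze_deg)):
        velocity = abs(gaze_deg[i] - gaze_deg[i - 1]) * fs
        if velocity > vel_threshold and not in_saccade:
            onsets.append(i)
            in_saccade = True
        elif velocity <= vel_threshold:
            in_saccade = False
    return onsets
```

Detected saccade onsets could then be aligned with stimulus onsets to check, as in the supplementary analyses, whether gaze shifts cluster around the sounds.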

Figure 7
Experiment 1 – Averaged horizontal EOG activity around the time of stimulation from the back speakers, showing an apparent absence of systematic shifts in gaze direction.

Top panels: every row corresponds to the averaged response of one participant. Gaze angle is encoded in color, such that positive values (yellow) indicate rightward eye movements (positive angles). The top rows (1–16) represent younger adult participants; the bottom rows (17–28), older adults. Bottom panels: mean and standard deviation based on the above plots. The following figure supplements are available for Figure 7: Figure 7—figure supplement 1, macrosaccades during reading for one example subject; Figure 7—figure supplement 2, boxplots of the EOG for all subjects; Figure 7—figure supplement 3, density of all detected macrosaccades during Experiment 1; Figure 7—figure supplements 4 and 5, responses of the M. sternocleidomastoideus to stimuli from the front and back speakers, respectively; Figure 7—figure supplements 6 and 7, responses of the M. frontalis to stimuli from the front and back speakers, respectively; Figure 7—figure supplements 8 and 9, responses of the M. zygomaticus to stimuli from the front and back speakers, respectively.

Figure 8
Experiment 2 – Intrasubject variability of the horizontal EOG: mean and standard deviation of the EOG during the complete trial of every participant.

Top panels: attending the front speakers. Bottom panels: attending the back speakers. Left panels: younger adults. Right panels: older adults. Blue represents the EOG when attending the left speaker; red, when attending the right speaker. A positive EOG indicates that the gaze is directed toward the side of the attended speaker. The deviation of the mean from 0 is well within one standard deviation, indicating that participants did not systematically divert their gaze toward the attended speaker. The following figure supplements are available for Figure 8: Figure 8—figure supplement 1, time-resolved EOG analysis in Experiment 2; Figure 8—figure supplement 2, activity of the frontalis and zygomaticus muscles in Experiment 2.

Another line of evidence that the auricular responses observed in our study were not secondary to eye movements concerns their pattern of lateralization. Activation of TAM during Wilson’s oculo-auricular phenomenon is more vigorous on the side opposite the direction of gaze (Gerstle and Wilkinson, 1929; Urban et al., 1993). By contrast, we found in Experiment 1 that TAM activation was relatively enhanced at the ear on the same side as the attention-engaging sounds. The PAM component of Wilson's phenomenon does exhibit enhanced activity on the ipsilateral side, but this effect appears to be reliable only for gaze shifts greater than about 40° (Patuzzi and O'Beirne, 1999, their Figure 4).

Another alternative interpretation is that participants oriented not with their ears or eyes, but by lifting their chin from the chin rest and rotating their head toward the attended sound. If humans have a vestibulo-auricular response, as cats do (Tollin et al., 2009), such head rotations could have indirectly triggered activity in the ear muscles. However, recent research has shown that azimuthal head rotations have little effect on auricular activity in humans (Cook and Patuzzi, 2014). Moreover, analysis of sternocleidomastoid EMG in our data suggests that movements of the neck were rare, small, and unsystematic (Figure 7—figure supplements 4 and 5). An additional statistical analysis in Supplementary file 1 also rejects an influence of the sternocleidomastoid EMG and horizontal EOG. Finally, head rotations would have been too slow to generate the rapid responses, with onset latencies of around 70 ms, observed in the ear muscles in Experiment 1. We note with interest, though, the possibility that subtle, covert activation of head-turning muscles (Corneil et al., 2008) might be correlated with ocular and auricular orienting. Note also that there was no corresponding co-activation of the other measured (non-auricular) facial muscles, the zygomaticus and frontalis, in either Experiment 1 (Figure 7—figure supplements 6–9) or Experiment 2 (Figure 8—figure supplement 2).

Discussion

These data provide compelling evidence that our brains retain, in vestigial form, circuitry for orienting the pinnae during both exogenous and endogenous modes of attention. The neural drive to our ear muscles is so weak that the actual movements (see co-registered video data in the Supplementary Information) are at least one to two orders of magnitude smaller than those generated during biting, smiling, grimacing, or voluntary ear-wiggling. To understand what remains of the vestigial pinna-orienting system, so as to exploit it for practical or scientific purposes, it is helpful to take a comparative, phylogenetic approach (Hackley, 2015; Hackley et al., 2017).

Vestigial pinna-orienting

The ability to swivel and point the pinnae seems to have been lost during the transition from the primarily nocturnal lifestyles of prosimians to the diurnal ones of New World monkeys and then Old World monkeys (Coleman and Ross, 2004). Mobility continued to decline as the ears became shorter and more rigid (Coleman and Ross, 2004; Waller et al., 2008), and the musculature degenerated. For example, an inferior auricular muscle to oppose SAM still exists in lesser apes such as gibbons and siamangs (Burrows et al., 2011), but not in chimpanzees (Burrows et al., 2011) or humans (Cattaneo and Pavesi, 2014). Given that head rotation has little effect on PAM activity (Cook and Patuzzi, 2014), it seems likely that the vestibulo-auricular reflex documented in cats (Tollin et al., 2009) has not been conserved in our species. Also presumably lost is the ability, documented in cats, to use proprioceptive information to adjust auditory processing in accordance with pinna position, orientation, and shape (Kanold and Young, 2001): although the ear muscles of Old World monkeys have spindles (Lovell et al., 1977), those of humans do not (Cattaneo and Pavesi, 2014).

When pinna-orienting movements became too small to modify acoustic input substantially, possibly 25 million years ago when the ape lineage, whose earliest-diverging living members are the lesser apes, branched off from Old World monkeys (Gibbs et al., 2007, as discussed in Hackley, 2015), selective environmental pressure ceased. The neural system became more-or-less ‘frozen’ in a form optimized for controlling taller, more flexible ears mounted on a smaller, more spherical head. This evolutionary perspective helps us to understand the surprising finding that AAM, which pulls the base of the pinna forward, was activated in Experiment 1 by novel sounds coming from the rear. Co-activation of the opposing muscles AAM and PAM in our remote ancestors would have reduced occlusion of the ear canal by the tragus; such occlusion occurs in monkeys when contraction of the PAM homolog is unopposed (Waller et al., 2008, supplementary video clip 17). In addition, PAM-AAM co-activation would have stabilized the base of the pinna and reduced myotendinous elasticity, thereby allowing quick changes in position or orientation. This perspective also illuminates our unexpected finding of ipsilateral SAM suppression in Experiment 1: a study of pinna orienting in cats, whose tall ears resemble those of prosimians, showed that they tend to tilt the ear downward slightly when orienting to a lateral target (Populin and Yin, 1998).

Potential neural mechanisms

Neurobiologists have distinguished two types of pinna-orienting movements in cats based on onset latency (Siegmund and Santibáñez, 1982). The short-latency response is specific to auditory stimuli and is chronometrically uncorrelated with saccades toward the target. By contrast, the long-latency response can be elicited by visual as well as auditory stimuli, and it is roughly synchronous with ocular orienting (Populin and Yin, 1998). Using a 4-speaker set-up similar to that of the present Experiment 1, Siegmund and Santibáñez, 1982 found cats’ unconditioned pinna responses to have an EMG onset latency that averaged 78 ms, similar to our value of about 70 ms. The animals were then trained to make gaze shifts toward the sound sources. Onset latency of the auriculomotor responses dropped to a remarkable 29 ms, and the responses were resistant to extinction over the course of 125 trials. Both findings were replicated by Populin and Yin, 1998 (mean = 26 ms; failure to extinguish across 10,000 unreinforced trials). The latter authors obtained an even more rapid response (mean = 21 ms) when the sound was preceded by a visual stimulus that served as a warning signal and indicated that the cat should maintain gaze at a fixation point. They argued that the short-latency pinna response is too rapid to be mediated by the brain region most centrally involved in orienting, the superior colliculus (SC). This is because an earlier study (Populin and Yin, 1997) had found the average first-spike latency in the relevant portion of this structure to be 19 ms, which would leave only about 2 ms for the signal to travel from the SC, through the facial nucleus, to the ear muscles.

Comparisons with these cat studies suggest that our participants' auriculomotor responses may also have been primarily of the short-latency variety that is not mediated by the SC. Two of the conditions tested by Populin and Yin, 1998 involved brief, lateralized auditory stimuli that, as in the present Experiment 1, were not task-relevant. The stimuli elicited short-latency ipsilateral pinna movements that were temporally uncorrelated with gaze shifts (see their Figures 6 and 8). During the delayed-saccade condition of their study, laterally presented sounds were task-relevant and forward fixation was required, as in our Experiment 2. Ipsilateral pinna movements triggered by the onset of these sounds were of the short-latency variety (21 ms, as noted above). Subsequent, smaller movements were then observed in synchrony with ocular orienting, roughly 400 ms after the fixation point was extinguished (their Figure 7). It is long-latency pinna movements of this sort that Populin and Yin, 1998 suggested might be mediated by the SC.

Given the major role of the SC in controlling eye fixation (Krauzlis et al., 2017), this structure may also be responsible for sustained maintenance of pinna orientation, such as in Experiment 2. Pinna movements can be triggered by electrical stimulation of the deep and intermediate layers of the SC in accordance with a topographical pattern that is in register with that of eye movements (Stein and Clamann, 1981). Lesions of this structure reduce the likelihood of pinna orienting as well as its accuracy, see Czihak et al., 1983. Although monosynaptic connections from SC to auriculomotor neurons in the facial nucleus do exist (Vidal et al., 1988), pinna control is dominated by disynaptic pathways from the SC that include the paralemniscal, oculomotor, or pontine reticular zones, see Henkel and Edwards, 1978; Takeuchi et al., 1979; Vidal et al., 1988. Among these, the paralemniscal zone appears to be the most important, and its auditory input originates in the nearby nucleus sagulum, see Henkel, 1981.

Portions of neocortex also play a role in controlling pinna movements. Lesions of auditory cortex reduce the kinematic complexity of pinna orienting and slow its habituation (Alvarado and Santibañez, 1971). Stimulation and recording studies in the macaque have identified a premotor ear-eye field (area 8B), which is connected with both auditory cortical areas and the SC (Lanzilotto et al., 2013). These animal neuroanatomy and physiology studies, coupled with Wilson’s (1908) seminal report, make it clear that the eyes and pinnae work together during endogenously cued attentional orienting.

Future work

A recent study in humans by Gruters et al., 2018, showed a close relationship between movements of the left and right eardrums and multiple parameters of task-related left- and right-directed eye movements. It will be important in future research to test whether muscular responses of the middle and outer ears are linked in a coordinated manner to ocular orienting. Furthermore, exploration of the relationship between human auriculomotor activity and subtle markers of covert attention (see the recent review by van Ede et al., 2019), corticofugal modulation of ascending auditory pathways in endogenous attention (Perrot et al., 2006), and the neural mechanisms of orienting discussed in the preceding section has scarcely begun.

Our results have implications for applied science as well. They suggest that patterns of auricular muscle activity might serve as an easily accessible correlate of top-down processing in endogenous modes of attention. As such, the described effects might complement electroencephalographic indices of attentional focus (de Cheveigné et al., 2018; Schäfer et al., 2018) in that their sensitivity is exclusively spatial, rather than reflecting a context-specific mixture of modality, feature, location, and object representations. Recordings of pinna-orienting activity might better support near-real-time decoding of the attentional focus and, unlike EEG-based stimulus reconstruction approaches, do not require access to the exogenous sound source (see, for example, the discussion in Schäfer et al., 2018). Thus, auricular muscle monitoring might support the decoding of auditory attention in technical applications such as attentionally controlled hearing aids that preferentially amplify the sounds the user is attempting to listen to. We wish to underscore, though, that the development of such applications would benefit crucially from a better understanding of how the auditory and visual attention systems interact. We hope that the results presented here will stimulate research in this direction.

Materials and methods

Participants

Older (N = 12, mean age = 62.7 ± 5.9 y, 8 F, all right-handed) and younger adult volunteers (N = 16, mean age = 24.1 ± 3.1 y, 8 F, 15 right-handed, one left-handed) took part in Experiment 1; all had age-typical pure-tone audiometric thresholds (1, 2, 4, and 8 kHz; young < 20 dB; old < 40 dB). All served in both studies, but after the 8th participant, Experiment 2 was altered in several ways (e.g., four stimulus directions rather than two). Only data from the final 21 subjects were retained for Experiment 2. The two groups in this experiment comprised 11 older adults (mean age = 62.6 ± 6.2 y, 8 F, all right-handed) and 10 younger adults (mean age = 24.1 ± 3.6 y, 5 F, nine right-handed, one left-handed). After a detailed explanation of the procedure, all subjects signed a consent form. The study was approved by the responsible ethics committee (Ethics Commission at the Ärztekammer des Saarlandes, Saarbrücken, Germany; identification number: 79/16).

Stimuli and tasks

The four active loudspeakers (KH120A, Neumann, Germany) were positioned at head level at a distance of 115 cm. Sounds in Experiments 1 and 2 were reproduced with a soundcard (Scarlett 18i20, Focusrite, UK). The experimental paradigms were programmed in Matlab (Mathworks, USA) with Psychtoolbox 3. In Experiment 1, sounds lasted 1.7–10.0 s, were delivered every 15–40 s, and had an average intensity of 70 dBC, except for footsteps (65 dBC). Each of the nine stimuli (lemur howling, dog barking, helicopter flying, cell phone vibrating, birds singing, baby crying, mosquito buzzing, footsteps, and traffic jam) was repeated four times (i.e., once per speaker). In Experiment 2, the stories were 5 min long, with an average intensity of 50 dBA for younger and 60 dBA for older participants. Participants answered content questions at the end of each condition in this experiment.
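The Experiment 1 design (nine stimuli, each presented once from each of the four speakers, at intervals of 15 to 40 s) can be illustrated with a short Python sketch that builds one possible trial schedule. The shuffled trial order, the uniform distribution of onset-to-onset intervals, the speaker labels and the function name `make_schedule` are illustrative assumptions: the text does not state how trial order or intervals were randomized, and the original paradigm was programmed in Matlab with Psychtoolbox 3.

```python
import random

# Stimulus names from the text; speaker labels (left/right x front/back) are illustrative.
STIMULI = ["lemur howling", "dog barking", "helicopter flying",
           "cell phone vibrating", "birds singing", "baby crying",
           "mosquito buzzing", "footsteps", "traffic jam"]
SPEAKERS = ["left-front", "left-back", "right-front", "right-back"]


def make_schedule(seed=0, isi_range=(15.0, 40.0)):
    """Return a list of (onset_s, stimulus, speaker) tuples.

    Every stimulus is played once from every speaker (9 x 4 = 36 trials).
    Trial order is shuffled and onset-to-onset intervals are drawn uniformly
    from isi_range; both choices are assumptions, not the authors' scheme.
    """
    rng = random.Random(seed)
    trials = [(stim, spk) for stim in STIMULI for spk in SPEAKERS]
    rng.shuffle(trials)
    schedule, t = [], 0.0
    for stim, spk in trials:
        t += rng.uniform(*isi_range)  # hypothetical onset-to-onset interval
        schedule.append((t, stim, spk))
    return schedule
```

Reading the 15 to 40 s range as onset-to-onset spacing keeps every sound (at most 10 s long) separated from the next by at least 5 s of silence.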

Electrophysiological recordings and signal processing

Surface EMGs were recorded with non-recessed Ag/AgCl electrodes (BioMed Electrodes, USA), 4 mm in diameter (BME4) for TAM and 6 mm (BME6) in all other cases (see Figure 1). The signals were digitized at 9600 Hz with 24-bit resolution per channel (4 × USBamp, g.tec GmbH, Austria). Skin temperature, skin resistance, electrocardiograms, and EOGs were also recorded. All signal processing algorithms were implemented in the scientific computing software Matlab (Mathworks, USA, version 2018a). Because surface electrodes had not previously been used to record from intrinsic ear muscles, we conducted preliminary tests with a participant who exhibited a large, reliable Wilson's phenomenon and who could voluntarily contract her SAM and PAM. Isolating the corresponding responses indicated that EMG from the TAM electrodes was not an artifact of volume conduction from PAM or SAM; in other words, TAM activity was not correlated with forced SAM/PAM innervation.

Sternocleidomastoid EMG signals were zero-phase band-pass filtered from 60 to 1000 Hz (FIR, 2000th order), and the auricular EMG signals from 10 to 1000 Hz (FIR, 2000th order) with a notch filter at 50 Hz (IIR, 2nd order). Horizontal EOG signals were zero-phase filtered from 0.01 to 20 Hz (IIR, 2nd order). All filter operations used Matlab's filtfilt function for zero-phase filtering. The filtered signals were then downsampled to 2400 Hz for further processing.

Statistical analysis was performed using repeated-measures ANOVA (IBM SPSS Statistics 26). Within-subjects factors were stimulus-muscle correspondence (ipsilateral vs. contralateral responses) and anteriority (front vs. back speakers). The only between-subjects factor in the statistical model was age group, which in the endogenous experiment was also associated with stimulus level (50 dBA for younger and 60 dBA for older participants). Other factors, such as head size, audiogram shape, or small differences in electrode placement, were not included in the model. All main and interaction effects were tested.
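The auricular EMG filtering chain described above can be approximated in Python with SciPy. This is a minimal sketch, not the authors' Matlab code: `preprocess_emg` is a hypothetical name, the 2000th-order band-pass is realized as a 2001-tap windowed FIR from `scipy.signal.firwin`, the notch quality factor (Q = 30) is an assumption because it is not reported, and downsampling by 4 is done by simply keeping every fourth sample.

```python
import numpy as np
from scipy import signal

FS_RAW = 9600  # acquisition rate (Hz), from the text
DECIM = 4      # 9600 Hz -> 2400 Hz


def preprocess_emg(x, fs=FS_RAW, band=(10.0, 1000.0), numtaps=2001,
                   notch_hz=50.0, notch_q=30.0):
    """Zero-phase FIR band-pass, zero-phase 50 Hz notch, then downsample by 4.

    numtaps=2001 gives a 2000th-order FIR. notch_q is an assumed value.
    Pass band=(60.0, 1000.0) for sternocleidomastoid channels.
    The input must be longer than about 3 * numtaps samples (filtfilt padding).
    """
    x = np.asarray(x, dtype=float)
    # Linear-phase FIR band-pass (Hamming window, firwin's default).
    b_bp = signal.firwin(numtaps, band, pass_zero=False, fs=fs)
    y = signal.filtfilt(b_bp, [1.0], x)  # forward-backward: zero phase
    # Second-order IIR notch at the mains frequency, also applied forward-backward.
    b_n, a_n = signal.iirnotch(notch_hz, notch_q, fs=fs)
    y = signal.filtfilt(b_n, a_n, y)
    # Plain sample dropping; acceptable only because the band-pass has already
    # removed content above the new 1200 Hz Nyquist limit (apart from the
    # narrow FIR transition band just above 1000 Hz).
    return y[::DECIM]
```

Because `filtfilt` pads the signal by three filter lengths by default, inputs must be longer than about 6000 samples (well under 1 s at 9600 Hz). For the 0.01 Hz EOG edge, a second-order-section implementation (`scipy.signal.sosfiltfilt`) is numerically safer than transfer-function coefficients.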

Exogenous (transient) data

Root-mean-square (RMS) envelopes of the filtered and downsampled EMG signals were calculated with a sliding window (step size = 1 sample, window length = 150 samples, i.e., 62.5 ms). The data were then segmented into epochs extending from 3 s before stimulus onset until 3 s after the end of the auditory stimulus, whose duration varied. Each epoch was baseline corrected by subtracting the mean RMS envelope amplitude of its pre-stimulus interval. For every participant, normalization was performed separately for each monitored auricular muscle (e.g., left PAM, right SAM) and each stimulus type (e.g., traffic jam). Because every stimulus type was presented four times, the reference value was the largest voltage at any time point within the four corresponding epochs (e.g., the four directions/trials with a traffic jam stimulus) in the EMG recording of that muscle. Each participant's data were then pooled according to whether the side of the stimulus and the recorded muscle did or did not match, and averaged into ipsilateral and contralateral waveforms. Eighteen trials contributed to each of these per-subject ipsilateral and contralateral waveforms, nine from the left speaker and nine from the right. Mean amplitudes were computed over a measurement window from 100 to 1500 ms after stimulus onset and then subjected to statistical analysis.
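The transient-response analysis can be sketched as four small NumPy functions: a sliding RMS envelope, epoch extraction with baseline subtraction, normalization by the largest value across the epochs of one muscle and stimulus type, and the 100 to 1500 ms window mean. The function names and the sample-index interface are assumptions; the `'same'`-mode convolution centers the 150-sample window on each sample, whereas the original implementation may have aligned it differently. Pooling into ipsilateral and contralateral waveforms then amounts to averaging the normalized epochs grouped by whether speaker side and muscle side match.

```python
import numpy as np

FS = 2400  # Hz, sampling rate after downsampling


def rms_envelope(x, win=150):
    """Sliding RMS with a step of 1 sample and a 150-sample (62.5 ms) window."""
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve(np.asarray(x, dtype=float) ** 2, kernel, mode="same"))


def baseline_corrected_epoch(env, onset, offset, fs=FS, margin_s=3.0):
    """Cut from 3 s before onset to 3 s after offset (sample indices) and
    subtract the mean of the pre-stimulus part. Assumes >= 3 s of data on each side."""
    m = int(round(margin_s * fs))
    epoch = np.array(env[onset - m: offset + m], dtype=float)  # copy
    epoch -= epoch[:m].mean()
    return epoch


def normalize_epochs(epochs):
    """Scale the epochs of one muscle x stimulus type (four in the study)
    by the largest value found at any time point in any of them."""
    ref = max(e.max() for e in epochs)
    return [e / ref for e in epochs]


def window_mean(epoch, fs=FS, margin_s=3.0, t0=0.1, t1=1.5):
    """Mean amplitude from 100 to 1500 ms after stimulus onset."""
    m = int(round(margin_s * fs))
    return epoch[m + int(round(t0 * fs)): m + int(round(t1 * fs))].mean()
```

Because epochs have different lengths (the stimuli vary in duration), they are kept as a list rather than stacked into one array.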

Endogenous (sustained) data

Artifacts in the filtered and downsampled EMG data were reduced by computing the mean signal energy of non-overlapping 1-s segments and rejecting segments that deviated by more than two standard deviations from the mean. For each participant, the mean energy of a given channel was calculated for each of the four listening conditions (left/right × front/back) and then normalized to the largest value across the 5-min run. These normalized data were then averaged into ipsilateral and contralateral categories and subjected to statistical analysis.
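A comparable sketch for the sustained (Experiment 2) analysis follows. It assumes that segment energy is the mean squared amplitude, uses the population standard deviation for the 2 SD criterion, and reads "normalized to the largest value" as division by the largest of the four condition means for that channel; all three are interpretations rather than details given in the text, and the function names are hypothetical.

```python
import numpy as np


def clean_segment_energy(x, fs, seg_s=1.0, z_max=2.0):
    """Mean energy of non-overlapping 1-s segments after dropping segments whose
    energy lies more than z_max standard deviations from the mean energy."""
    n = int(round(seg_s * fs))
    n_seg = len(x) // n                         # an incomplete tail segment is ignored
    seg = np.asarray(x[: n_seg * n], dtype=float).reshape(n_seg, n)
    energy = np.mean(seg ** 2, axis=1)          # one energy value per segment
    keep = np.abs(energy - energy.mean()) <= z_max * energy.std()
    return energy[keep].mean()


def ipsi_contra(cond_energy, ear):
    """cond_energy maps (side, position) -> mean energy of one muscle on `ear`.

    Values are normalized to the largest of the conditions (assumed reading),
    then averaged by whether the attended speaker is on the muscle's side."""
    peak = max(cond_energy.values())
    norm = {k: v / peak for k, v in cond_energy.items()}
    ipsi = np.mean([v for (side, _), v in norm.items() if side == ear])
    contra = np.mean([v for (side, _), v in norm.items() if side != ear])
    return ipsi, contra
```

For example, a left-ear muscle with condition energies of 2 and 4 (left front/back) and 1 and 1 (right front/back) yields an ipsilateral value of 0.75 and a contralateral value of 0.25.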

EMG time-frequency decomposition (for Figure 5—figure supplements 1 and 2)

According to convolution models of the EMG (Farina et al., 2014), the largely sustained muscle activity during endogenous attention might be reflected in low-frequency components. The filtered and downsampled EMG signals from Experiment 2 were therefore decomposed into eight frequency bands by a nonsubsampled octave-band filter bank (5th-order Daubechies filter). Each frequency band was then processed in the same way as the broadband signals reported in the main text.
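The nonsubsampled octave-band decomposition can be illustrated with an undecimated (à trous) wavelet filter bank. To keep the example self-contained, this sketch substitutes the Haar filter pair for the 5th-order Daubechies filters used in the study and handles the signal ends circularly; it is therefore a simplified stand-in, not the decomposition used for the figure supplements. Seven levels yield eight bands; at 2400 Hz their nominal ranges are roughly 600 to 1200 Hz, 300 to 600 Hz, and so on down to about 9 to 19 Hz, plus a residual approximation below about 9 Hz.

```python
import numpy as np


def atrous_haar_bands(x, levels=7):
    """Undecimated (a trous) Haar octave-band split into levels + 1 bands.

    Returns [d1, d2, ..., d_levels, a_levels], all the same length as x;
    d1 is the highest octave. Haar filters stand in for the Daubechies
    filters of the study, and np.roll makes the boundaries circular.
    Each step splits a into (a + shifted)/2 and (a - shifted)/2, so the
    bands sum back to the input exactly.
    """
    approx = np.asarray(x, dtype=float)
    bands = []
    for j in range(levels):
        shifted = np.roll(approx, 2 ** j)        # filter taps spaced 2**j apart
        bands.append(0.5 * (approx - shifted))   # high-pass output: detail band j + 1
        approx = 0.5 * (approx + shifted)        # low-pass output: passed to next level
    bands.append(approx)                         # residual low-frequency band
    return bands
```

Each band could then be passed through the same energy, normalization and ipsilateral/contralateral pooling steps as the broadband signal.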

Computer vision setup and motion analysis (for Videos 1, 2 and 3)

Videos were acquired using four Ximea MQ022CG-CM color sensors with a resolution of 1936 × 1216 pixels at 120 frames per second and an exposure time of 2 ms. Two cameras were positioned on each side of the head and focused on the ears to record pairwise stereo videos. We used hardware triggering for all four cameras and recorded each camera onto a separate M.2 solid-state drive to reduce frame loss. We used a KOWA 35 mm macro lens with an aperture of F0.4, which gave a close-up view of the ear with an acceptable depth of field to allow slight movements towards the camera, and enough distance that the cameras did not cast shadows onto the scene. We illuminated the face uniformly with flicker-free LED studio lighting. The cameras were calibrated with the Stereo Camera Calibrator app from the Mathworks Matlab Computer Vision System Toolbox. Calibration was repeated whenever camera adjustment required re-alignment of the relative stereo camera positions or a change of focus of one of the cameras.

For 3D reconstruction and motion visualization/quantification, we used functions from the Mathworks Matlab Computer Vision System Toolbox together with custom-written code. Our analysis system reduced redundancies in optic flow and stereo depth estimation by exploiting the unilateral scene composition and the limited degrees of freedom of ear and head movements. For 3D reconstructions, we initialized each sequence with a single disparity estimate and subsequently tracked points independently in the left and right image sequences.

We tracked points relative to the first frame of the sequence, which served as the reference frame, using dense optical flow initialized with a rigid motion estimate. Motion was visualized with a Lagrangian motion magnification approach that applied a constant magnification factor relative to the reference frame, after prior removal of affine motion estimated from manually selected stable points. The results of the motion analysis, with and without magnification, can be seen in the videos.
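The point-based core of the Lagrangian magnification step (remove the affine motion defined by stable points, then scale the remaining displacement from the reference frame by a constant factor) can be sketched as follows. This assumes that point trajectories are already available from the optical-flow tracking; the array layout, the least-squares affine fit and the function names are assumptions, and re-rendering the video from the magnified positions (image warping) is omitted.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine map dst ~= [src, 1] @ A, with A of shape (3, 2)."""
    X = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A


def magnify_tracks(tracks, stable_idx, alpha):
    """tracks: (T, N, 2) positions of N tracked points; frame 0 is the reference.

    For each frame, the affine motion of the stable points (e.g. head motion)
    is fitted and removed; the remaining displacement from the reference
    frame is multiplied by the constant factor alpha.
    """
    tracks = np.asarray(tracks, dtype=float)
    ref = tracks[0]
    ref_h = np.hstack([ref, np.ones((len(ref), 1))])   # homogeneous coordinates
    out = np.empty_like(tracks)
    for t in range(len(tracks)):
        A = fit_affine(ref[stable_idx], tracks[t, stable_idx])
        residual = tracks[t] - ref_h @ A   # motion not explained by the affine fit
        out[t] = ref + alpha * residual    # stabilized, magnified positions
    return out
```

At least three non-collinear stable points are needed for the affine fit to be determined.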

References

  1. Alvarado M, Santibañez G (1971) Targeting reflex: some features and inhibition. Acta Neurobiologiae Experimentalis 31:33–45.
  8. Czihak E, Santibañez M, Klimann M, Santibañez G (1983) Audio-visual reaction after unilateral lesions of the superior colliculus in cats. Acta Neurobiologiae Experimentalis 43:15–25.
  11. Gerstle M, Wilkinson P (1929) The oculo-aural movement. Journal of Neurology and Psychopathology 9:228–230. https://doi.org/10.1136/jnnp.s1-9.35.228
  12. Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, et al.; Rhesus Macaque Genome Sequencing and Analysis Consortium (2007) Evolutionary and biomedical insights from the rhesus macaque genome. Science 316:222–234. https://doi.org/10.1126/science.1139247
  21. Krauzlis RJ, Goffart L, Hafed ZM (2017) Neuronal control of fixation and fixational eye movements. Philosophical Transactions of the Royal Society B: Biological Sciences 372:20160205. https://doi.org/10.1098/rstb.2016.0205
  27. Populin LC, Yin TCT (1997) Sensitivity of auditory cells in the superior colliculus to eye position in the behaving cat. In: Palmer AR, Rees A, Summerfield A, Meddis R, editors. Psychophysical and Physiological Advances in Hearing. London: Whurr. pp. 441–448.
  30. Siegmund H, Santibáñez G (1982) Effector pattern of the audio-visual targeting reflex in cats. Acta Neurobiologiae Experimentalis 42:311–326.
  39. Wilson SAK (1908) A note on an associated movement of the eyes and ears in man. Review of Neurology and Psychiatry 6:331–336.

Decision letter

  1. Jennifer M Groh
    Reviewing Editor; Duke University, United States
  2. Barbara G Shinn-Cunningham
    Senior Editor; Carnegie Mellon University, United States
  3. Sarah Verhulst
    Reviewer; Ghent University, Belgium
  4. Christopher Shera
    Reviewer; University of Southern California, United States
  5. Brian D Corneil
    Reviewer; University of Western Ontario, Canada

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Unlike cats and dogs, humans can at most barely move their ears. This study reports that the largely vestigial muscles of the ear nevertheless retain a sensitivity to spatial attention: changes in electrical activity of ear muscles could be evoked both in subjects deliberately attempting to listen to a story from one location while ignoring another, and when subjects were surprised by novel sounds while reading an irrelevant essay. The attentional effects on ear muscle electrical activity reported in this study substantially expand the scope of knowledge regarding top-down mechanisms in the brain and how they influence the sense organs.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "Vestigial auriculomotor activity indicates the direction of auditory attention in humans" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Christopher Shera (Reviewer #1); Sarah Verhulst (Reviewer #2).

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that this submission of your work will not be considered further for publication in eLife. However, the reviewers found merit in the project. Should you choose to revise the manuscript, we ask that the authors include the original submission manuscript number, title, and corresponding author and upload a point-by-point response to the reviews including any changes they have made to the manuscript.

The comments of the reviewers are included below. As you may be aware, at eLife those initial reviews set the stage for a subsequent consultation among the reviewers to achieve consensus. Most of our discussion centered on the most challenging comments from reviewer 3 regarding the relationship between your findings and eye movements. A possible relationship to eye movements would be interesting either way, but much of this material is only provided in the supplementary figures, and lacks narrative description to make the implications clear (in particular the series of orange and black figures in the supplementary materials).

We recognize that EOG recordings provide limited resolution that likely precludes looking at microsaccades in Experiment 2, but Experiment 1 provides an opportunity to ascertain whether there is any ear muscle activity associated with the macrosaccades that occur during reading.

We also suggest that the results concerning the frontal speakers that are currently presented only in the supplementary materials be included in the main manuscript. As there is no obvious conceptual reason to discount the results from frontal space, they should be given equal prominence as the results from rear space.

Reviewer #1:

The manuscript is generally well written, the experiments thorough and compelling, and the content novel and genuinely intriguing. My comments are relatively minor.

Abstract: The logic of the first paragraph of the Abstract is confusing and the phrasing sometimes awkward. I suggest reworking it along these lines:

"Humans, unlike dogs and cats, are not commonly thought to move their ears when focusing auditory attention, either reflexively toward novel sounds or voluntarily toward those that are goal-relevant. Nevertheless, humans may retain a vestigial pinna-orienting system that persists as a "neural fossil" within the brain. Consistent with this hypothesis, we demonstrate that the direction of auditory attention is reflected in the sustained electrical activity of muscles within the vestigial auriculomotor system."

Results paragraph two: What is ps? A typo? Or is it somehow supposed to be the plural of p? If the latter, it would be much clearer to write "p values < 0.001".

Results paragraph four: This paragraph is out of place in the Results and should be moved to the Discussion.

Final paragraph of the Results: This paragraph is also rather jarringly out of place and should be moved to the Discussion.

In the same paragraph: Citation needed for "found in multiple languages".

Discussion paragraph two: This might be a good place to include the phrase "about 25 million years ago" which was lost in the rewrite of the Abstract.

In the same paragraph: Should be "New World monkeys" and "Old World monkeys"

Subsection “Exogeneous (transient) data”: What was the overlap between consecutive windows (step size)?

Discussion: The manuscript should cite and discuss the recent and likely related work of Gruters et al., 2018.

Reviewer #2:

The manuscript presents how auriculomotor activity forms an objective correlate of exogenous and endogenous auditory attention. To this end, study participants had their heads in a fixed position while listening to sounds coming from different directions to steer auditory attention. At the same time, TAM, PAM, SAM and AAM EMG muscle activity and pinna movements were measured, the latter with a camera. The study is convincing in relating the exogenous auriculomotor activity (increase of PAM/AAM) to auditory attention, because SCM and EOG signals were used to rule out other explanatory variables such as eye gaze and horizontal movement of the head. The presented analysis and supplementary material support the main conclusions. I have one major point related to data-processing, and others are mostly related to interpretation/applicability of results. I am expecting that the data-processing point will not change the main outcome of the study.

1) I have difficulties understanding the normalization procedure described in the text and its relationship to the values labeled on Figure 2.

"Epochs were baseline corrected..[].. Amplitudes were normalized separately for each participant and muscle according to the largest value among the four stimulus presentations at any time point within the epoch data"

When reading this, I interpret that for each person, the peak amplitude of the signal in the largest condition should be one (i.e., normalized), and that in the other three conditions for this person the amplitude should be less than one. Then the data were averaged across conditions, after which a mean amplitude across a window of 1400 ms was calculated, or pooled across participants to yield the grand average waveform. When looking at Figure 2, the waveforms have amplitudes of 100 to 125 [-], and I do not follow the relationship between those numbers and the description in the text. Also, in case magnitudes are normalized to peak maxima, I would expect quite a variability in the baselines of different individuals, which should show up strongly in the grand-averaging across people with different baseline estimates.

2) The characterization of the vestigial network was performed on the basis of a still head during the task, which was necessary to demonstrate the main point of the paper. However, this study does not really go into whether and how strongly this auriculomotor activity plays a role when people are allowed to move their heads during an attention task, i.e., would this mechanism be complementary to attention-driven gaze or movement steering, or does it only occur when the head itself cannot move? This differentiation might be important to consider when translating this work to hearing-aid applications. This point is not a drawback of the paper, but should perhaps be discussed more strongly when discussing the potential application areas.

Reviewer #3:

General assessment: There are some aspects of the paper that I found intriguing, but there are a number of points that I found either under-explored, or unconvincing. A stronger mechanistic case should also be made relating these findings to others in the literature. The case for using this to aid decoding is also weak. Ultimately, I find that this article falls (fairly far) below the standard of what I would expect for eLife.

Substantive concerns.

1) The authors could do a much better job placing the current results in the context of other subtle indicators of covert attention. There is an extensive literature on any number of subtle indicators of covert attention (e.g., microsaccades, pupil dilation, even subtle levels of neck muscle recruitment; see van Ede Chekroud Nobre, Nat Human Behavior, 2019 for a recent article in this field), and mechanistic evidence tying these to the superior colliculus. Given that the authors invoke the superior colliculus as a possible node within the auriculomotor pathway, it would help to speculate on how the current results fit in with other work in this literature.

2) Consideration of these other measures leads to concerns about discounting the possibility of subtle eye or head movements. For eye movements, the use of electrooculography to address gaze orientation is not sufficient. EOG has good temporal resolution, but its spatial resolution is very poor, and can't be used to rule out anything with saccadic amplitudes less than 1-2 deg. Further, given recent results linking eye movements to movements of the eardrum in humans and monkeys (Gruters et al., 2018), a much more precise linking of eye movements and auricular muscle recruitment is warranted, and this could be done much more systematically (e.g., are the auricular muscles actually recruited during eye movements?). Discounting head movements using surface EMG recordings of SCM is also insufficient for a number of reasons. As a powerful head turning muscle, SCM tends not to be recruited for subtle movements of the head. Further, SCM contributes to contralateral, not ipsilateral, head turns, so the focus on the ipsilateral SCM muscle in Experiment 1 supplementary figure 5 is incorrect. Overall, I found the measures used to discount the possibility of subtle eye or head movements to be unconvincing.

3) The authors speculate that signals from the auricular muscles could be used to decode the locus of auditory spatial attention in near real time. While of potential interest, this claim is highly speculative given the coarseness and apparent variability in the signals shown in the manuscript (which generally show grand averages, with little to no sense of variability). If the authors wish to make the decoding argument, then why not try this? How well can target location actually be extracted from the current data? This would seem to be a tractable question for Experiment 1 (e.g., use data from some subset of trials to train a classifier, and then see how well the classifier works on the other set of trials). Chance performance would be 25% -- can a classifier based on auricular muscle activity do substantially better? I must admit that I am sceptical that signals extracted from these small signals could be useful at all in the real world, given how much the ears move during facial expressions or voluntary ear wiggling. Unless the decoding case can be made more strongly, my advice would be to drop the "decoding" angle from the paper and focus on basic findings.

4) For Experiment 2, EMG activity is basically averaged across the entire 5 min range. This is a very coarse timeframe and approach, and I can't help but think there would be something more interesting in the data. Is there any way of looking for transient changes in auricular muscle recruitment, and then tying that back to some sort of event during the stream of auditory information? There is the chance of some potentially rich data that really hasn't been mined with the current approach.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Vestigial Auriculomotor Activity Indicates the Direction of Auditory Attention in Humans" for further consideration by eLife. Your revised article has been evaluated by Barbara Shinn-Cunningham (Senior Editor) and a Reviewing Editor.

The manuscript has been improved and all three reviewers appreciate the importance of this work, but there are some remaining issues that need to be addressed. In particular, reviewer 3's concerns center on the eye movement results as well as several issues concerning head and neck musculature. These concerns will need to be addressed before the paper can be accepted. All three reviewers and the reviewing editor have consulted and agree on the importance of incorporating these additional analyses. The full reviews are included below.

Reviewer #1:

I carefully read both rebuttal letter and the revised manuscript and in my view, the revised manuscript is very clear, original and of high scientific standard. The added section on auditory-visual interactions and expanded discussion have turned this paper into a very nice and complete paper.

Reviewer #2:

The authors have generally done a fine job addressing the reviewer concerns. …

Reviewer #3:

General assessment

This short manuscript reports that tasks that engage auditory attention either exogenously (Experiment 1) or endogenously (Experiment 2) lead to the recruitment of auricular muscles that subtly change the shape of the pinna, doing so in a spatially-dependent manner. The conclusion is that such recruitment attests to the presence of a vestigial brain circuit. The topic is timely given recent findings linking saccadic eye movements to movements of the eardrum, and a number of other subtle indicators of covert attention driven, for example, by the oculomotor system. The results are intriguing, but more can be done to address other potential confounds, particularly on the oculomotor side.

Substantive concerns

The authors have extensively revised the manuscript, and established the phenomena both within and across their subject pool. I still have some concerns about other potential confounds that need to be addressed; as the authors say, the neural drive to the ear muscles is so weak that the resultant movements are minuscule compared to those generated during broad smiles or wiggling.

1) Previously, I had raised concerns about potential confounds from the oculomotor system that orients the line of sight via eye and/or head movements (what the authors term the "visuomotor system" in their response). The authors have added a number of analyses that go to some length to assuage concerns about eye movements. However, grand average measures of EOG across many trials could mask some interactions between eye movements and auricular muscle activity; the data shown in Figure 7 also show how the variance of the EOG signal decreases after stimulus onset, particularly for stimuli presented at the left-back speaker. More analyses and details are warranted.

1a) For Experiment 1, the auditory stimuli are presented while subjects are "reading a boring essay". Please provide details about how large the eye movement excursions were; from Multimodal Figure supplement 1, it appears that the text spanned about +/-12 deg of horizontal visual angle, but this is just one subject.

1b) The authors acknowledge that they can detect "macro" saccades greater than about 1 degree on average, and these should be analyzed in a more systematic manner than relying on average EOG traces, which could wash out effects. The oculomotor literature on microsaccades has a number of ways of presenting spatial and temporal patterns of saccades timed to external events (e.g., see saccadic “rasters” in Figure 4 of Tian, Yoshida and Hafed Front Syst Neurosci 2016), and I think these should be applied to the data from Experiment 1. Something similar for the "macrosaccade" data from Experiment 1 is needed to establish whether or not stimulus onset is altering the patterning of larger saccadic eye movements.

1c) For Experiment 2, please provide "time-resolved" plots for the EOG data, similar to those provided for auricular muscle activity in Figure 6. The EOG data shown in Figure 8 are helpful but, I believe, are averaged across an entire 5-min segment, which is a very large window.

2) With regard to a potential concern about head movements, I agree that the vestibular-auricular reflex is unlikely, given that the head was stabilized on a chin-rest. My previous concern was more about whether the act of spatially deploying auditory attention was related to neck muscle contraction that introduced cross-talk at the auricular muscles (the absence of head movement can't be used to infer the absence of neck muscle contraction in this regard). The work by Cooke and Patuzzi, 2014, doesn't address this concern, since they examined sternocleidomastoid activity during ipsilateral head turns (right PAM and right SCM recordings during right head turns). SCM is a contralateral, not ipsilateral, head turner, so it would be directly recruited by leftward turns, which do not appear to have been studied in the Cooke and Patuzzi setup. The Cooke and Patuzzi paper actually mentions the concerns I have about potential cross-talk from other nearby muscles (see the end of the first paragraph of their "subjects and methods"). Muscles on the back of the neck (e.g., splenius capitis or suboccipital muscles; insertion on the occiput, which lies close to the mastoid) are ipsilateral head turners. The activity of the suboccipital muscles in particular has also been related to reflexive visuospatial attention in a Posner-type task in head-restrained monkeys (e.g., see Corneil et al., 2008). To be clear, I don't think that the entirety of the results could be "explained" by cross-talk from nearby dorsal neck muscles, but the authors should consider this perspective.

3) A final point about other muscles: Figure 1 shows that the authors recorded EMG activity on zygomaticus major and frontalis, but these recordings are not analyzed. Given how small the movements of interest are compared to those related to smiling, please establish the independence of the auricular muscle recordings from these facial muscles.

https://doi.org/10.7554/eLife.54536.sa1

Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

Reviewer #1:

The manuscript is generally well written, the experiments thorough and compelling, and the content novel and genuinely intriguing. My comments are relatively minor.

Abstract: The logic of the first paragraph of the Abstract is confusing and the phrasing sometimes awkward. I suggest reworking it along these lines:

"Humans, unlike dogs and cats, are not commonly thought to move their ears when focusing auditory attention, either reflexively toward novel sounds or voluntarily toward those that are goal-relevant. Nevertheless, humans may retain a vestigial pinna-orienting system that persists as a "neural fossil" within the brain. Consistent with this hypothesis, we demonstrate that the direction of auditory attention is reflected in the sustained electrical activity of muscles within the vestigial auriculomotor system."

Thank you very much for disentangling and sharpening this paragraph. We adopted your suggestions verbatim in the revised version of the manuscript (we just added “for about 25 million years”).

Results paragraph two: What is ps? A typo? Or is it somehow supposed to be the plural of p? If the latter, it would be much clearer to write "p values < 0.001".

An anecdote: the “ps” had sparked a discussion among the authors before the first submission. Some liked it, as it is very common in psychology; others disliked it, as it is horrible from a mathematical point of view. We followed your advice and used “p-values” in the revised version of the manuscript.

Results paragraph four: This paragraph is out of place in the Results and should be moved to the Discussion.

Thanks for the observation. We moved this paragraph along with some associated text to the Discussion.

Final paragraph of the Results: This paragraph is also rather jarringly out of place and should be moved to the Discussion.

Thanks for the observation. We deleted most of this paragraph and moved the rest to the Discussion.

In the same paragraph: Citation needed for "found in multiple languages".

In the interest of space, we have omitted the comments about ear-straining metaphors from the revised manuscript.

Discussion paragraph two: This might be a good place to include the phrase "about 25 million years ago" which was lost in the rewrite of the Abstract.

Included in the revised version. Thanks.

In the same paragraph: Should be "New World monkeys" and "Old World monkeys"

Amended in the revised version. Thanks.

Subsection “Exogeneous (transient) data”: What was the overlap between consecutive windows (step size)?

The RMS value (window size = 150 samples) was calculated at every sample, without skipping any (i.e., a step size of 1, so consecutive windows overlap by 149 samples); that is why we refer to it as an “RMS envelope”. This is stated more clearly in the revised version (Materials and methods subsection “Electrophysiological Recordings and Data Processing”).
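To make the step-size-1 RMS envelope concrete, here is a minimal NumPy sketch. The function name `rms_envelope` and the choice to return only full windows (no edge padding) are assumptions for illustration, not the authors' code; the window is given in samples because the sampling rate is not stated here.

```python
import numpy as np

def rms_envelope(x, window=150):
    """Sliding-window RMS evaluated at every sample (step size 1).

    Consecutive windows overlap by window - 1 samples. Only full windows
    are returned (len(x) - window + 1 values); edge handling is an assumption.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or len(x) < window:
        raise ValueError("x must be 1-D and at least `window` samples long")
    # Cumulative sum of squares yields every window's sum of squares in O(1)
    csum = np.concatenate(([0.0], np.cumsum(x ** 2)))
    window_sums = csum[window:] - csum[:-window]
    # Clip tiny negative values caused by floating-point cancellation
    return np.sqrt(np.maximum(window_sums, 0.0) / window)
```

The cumulative-sum form gives the same result as recomputing the RMS in each of the overlapping windows, but avoids re-summing 150 samples at every step.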

Discussion: The manuscript should cite and discuss the recent and likely related work of Gruters et al., 2018.

We modified the Discussion accordingly; see the general comments and the comments to reviewer 3. This important citation is now included along with a more complete discussion of ear-eye interactions.

Reviewer #2:

The manuscript presents how auriculomotor activity forms an objective correlate of exogenous and endogenous auditory attention. To this end, study participants had their heads in a fixed position while listening to sounds coming from different directions to steer auditory attention. At the same time, EMG activity of the TAM, PAM, SAM and AAM, as well as pinna movements, were measured, the latter with a camera. The study is convincing in relating the exogenous auriculomotor activity (increase of PAM/AAM) to auditory attention, because SCM and EOG signals were used to rule out other explanatory variables such as eye gaze and horizontal movement of the head. The presented analysis and supplementary material support the main conclusions. I have one major point related to data processing; the others are mostly related to the interpretation/applicability of the results. I expect that the data-processing point will not change the main outcome of the study.

1) I have difficulties understanding the normalization procedure described in the text and its relationship to the values labeled on Figure 2.

"Epochs were baseline corrected..[].. Amplitudes were normalized separately for each participant and muscle according to the largest value among the four stimulus presentations at any time point within the epoch data"

When reading this, I interpret that for each person, the peak amplitude of the signal in the largest condition should be one (i.e., normalized), and that in the other three conditions for this person the amplitude should be less than one. Then data were averaged across conditions, after which a mean amplitude across a window of 1400 ms was calculated, or pooled across participants to yield the grand-average waveform. When looking at Figure 2, the waveforms have amplitudes of 100 to 125 [-], and I do not follow the relationship between those numbers and the description in the text. Also, in case magnitudes are normalized to peak maxima, I would expect quite a variability in the baselines of different individuals, which should show up strongly in the grand average across people with different baseline estimates.

In the first step, every epoch was baseline corrected by subtracting the mean baseline value from the epoch. Then, normalization was performed, which you understood correctly: the largest (absolute) value across the four conditions (for every stimulus type/subject) is set to 1 (or -1, if the baseline subtraction introduced negative values), and all other values are scaled accordingly. This normalization procedure is now described more clearly in the revised version. Along these lines, we also point out more clearly that the normalization is done independently not only for every subject and channel, but also for every stimulus type. For example, for subject x, all 4 responses to the stimulus “baby crying” (which was presented 4 times, once from each speaker) recorded at the same channel (for example, right PAM) were normalized with respect to each other. The next stimulus, for instance “dog barking”, was then processed independently. In the next phase, responses of the same stimulus type were pooled (and averaged) to form ipsi- or contralateral responses, defined according to the lateral congruence of sound source and recorded muscle. (It was at this point that the orange-and-black epoch matrices presented in the supplement were generated.) Then, responses were averaged for every subject, and mean values between 100 and 1500 ms were calculated for use in the statistical analyses.
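The per-stimulus steps described above (baseline correction, peak normalization across the four speakers, ipsi/contra pooling, and the 100 to 1500 ms mean) can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the array layout, the `'L'`/`'R'` speaker-side labels, and both function names are hypothetical. Averaging the resulting ipsi/contra values across stimulus types would follow as a separate step.

```python
import numpy as np

def normalize_stimulus_epochs(epochs, baseline):
    """Baseline-correct, then peak-normalize the responses to ONE stimulus type.

    epochs   : array (n_speakers, n_samples), one epoch per speaker, for a
               single subject and a single channel (e.g. right PAM).
    baseline : slice selecting the pre-stimulus baseline samples.
    """
    epochs = np.asarray(epochs, dtype=float)
    # Subtract each epoch's own mean baseline value
    corrected = epochs - epochs[:, baseline].mean(axis=1, keepdims=True)
    # The largest absolute value across all speakers and time points becomes +/-1
    peak = np.max(np.abs(corrected))
    return corrected / peak if peak > 0 else corrected

def ipsi_contra_means(norm_epochs, speaker_sides, muscle_side, times_ms,
                      window_ms=(100, 1500)):
    """Pool normalized epochs by lateral congruence, then average over a window."""
    sides = np.asarray(speaker_sides)
    ipsi = norm_epochs[sides == muscle_side].mean(axis=0)    # same side as muscle
    contra = norm_epochs[sides != muscle_side].mean(axis=0)  # opposite side
    t = np.asarray(times_ms)
    in_win = (t >= window_ms[0]) & (t <= window_ms[1])
    return ipsi[in_win].mean(), contra[in_win].mean()
```

Because the peak is taken jointly over all four speakers, the relative size of ipsi- and contralateral responses to a given stimulus is preserved, while differences in absolute EMG amplitude between subjects, channels, and stimulus types are removed.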

Regarding the plot: for plotting purposes in the previous version of the manuscript, we expressed the values as a percentage of baseline, which is why the baseline had an average of 100 in the plot. The initial idea was to make those plots more comparable to those in our Stekelenburg and van Boxtel, 2002, reference. However, we see that there was room for confusion, especially as this was not in line with the matrix plots in the supplementary material. Therefore, we removed the percentage-based scale for the y-axis in the revised version. Thanks for drawing our attention to this issue.

Regarding baseline variability: While there are cases in which peak normalization produces large baseline variability, averaging across epochs ameliorates this problem because the activity is not locked to any event. In the revised version, we have added single-epoch plots along with the average and standard deviation (see Figure 3 and related material in the supplement). Thus, it should be much easier to appreciate the variance within our data in the revised manuscript.

2) The characterization of the vestigial network was performed with the head kept still during the task, which was necessary to demonstrate the main point of the paper. However, this study does not really address whether, and how strongly, this auriculomotor activity plays a role when people are allowed to move their heads during an attention task; i.e., would this mechanism be complementary to attention-driven gaze and movement steering, or does it only occur when the head itself cannot move? This differentiation might be important to consider when translating this work to hearing-aid applications. This point is not a drawback of the paper, but should perhaps be discussed more strongly when discussing the potential application areas.

This is an important insight, and one that we will be addressing in an upcoming paper focused on decoding attention direction from pinna EMG signals. Thank you very much for bringing this up, Dr. Verhulst. However, based on comments from another reviewer, we have decided to reduce discussion of hearing aids and other potential applications in the present manuscript. As you mentioned, it was our main goal in this initial study to isolate the auriculomotor effect as much as possible, so head-ear interactions were not emphasized. Neurobiological research in cats has begun to identify the ways in which pinna orienting is coordinated with head and eye movements (Populin and Yin, 1998). It seems to be more complicated than eye-head coordination due to the acoustic shadow of the head and multidimensionality of ear movements. We briefly allude to the topic in our discussion of the vestibulo-auricular response and future work.

Reviewer #3:

General assessment: There are some aspects of the paper that I found intriguing, but there are a number of points that I found either under-explored, or unconvincing. A stronger mechanistic case should also be made relating these findings to others in the literature. The case for using this to aid decoding is also weak. Ultimately, I find that this article falls (fairly far) below the standard of what I would expect for eLife.

The authors would like to thank reviewer 3 for the critical feedback. Our initial submission was motivated by our finding that there is neural drive to auricular muscles when paying attention. We presented evidence in the form of a short report, which documented for the first time a direct spatial correspondence between sustained auditory attention and sustained auriculomotor activity, as measured electromyographically. The auriculomotor system as such is almost untouched in the literature. Indeed, all previous relevant work on the “auriculomotor” system was reviewed within the strict space limits of our short report submission. However, Reviewer 3 looked at the data from a new but important angle, focusing mainly on the interactions between the auditory and the visual motor systems. Even though we are convinced that our results are of merit when focusing on the auditory modality (the 1st and 2nd reviewers are also positive in this sense), we completely agree that this new aspect is a very important component for the discussion of our results. This is especially true in light of the recent work linking eye movements to eardrum movements. Furthermore, we believe that a careful consideration of this topic, embedding our findings within the literature, will extend the study’s impact and more effectively stimulate further research. We now state this clearly in “Future Work”.

Substantive concerns.

1) The authors could do a much better job placing the current results in the context of other subtle indicators of covert attention. There is an extensive literature on any number of subtle indicators of covert attention (e.g., microsaccades, pupil dilation, even subtle levels of neck muscle recruitment; see van Ede, Chekroud and Nobre, Nat Human Behavior, 2019 for a recent article in this field), and mechanistic evidence tying these to the superior colliculus. Given that the authors invoke the superior colliculus as a possible node within the auriculomotor pathway, it would help to speculate on how the current results fit in with other work in this literature.

As we mentioned before, we completely agree with the reviewer regarding this critique. We have added a new paragraph in the Discussion section focusing on this topic.

2) Consideration of these other measures leads to concerns about discounting the possibility of subtle eye or head movements. For eye movements, the use of electrooculography to address gaze orientation is not sufficient. EOG has good temporal resolution, but its spatial resolution is very poor, and it can't be used to rule out anything with saccadic amplitudes less than 1-2 deg. Further, given recent results linking eye movements to movements of the eardrum in humans and monkeys (Gruters et al., 2018), a much more precise linking of eye movements and auricular muscle recruitment is warranted, and this could be done much more systematically (e.g., are the auricular muscles actually recruited during eye movements?). Discounting head movements using surface EMG recordings of SCM is also insufficient for a number of reasons. As a powerful head-turning muscle, SCM tends not to be recruited for subtle movements of the head. Further, SCM contributes to contralateral, not ipsilateral, head turns, so the focus on the ipsilateral SCM muscle in Experiment 1 supplementary figure 5 is incorrect. Overall, I found the measures used to discount the possibility of subtle eye or head movements to be unconvincing.

The focus of the initial submission as a short report was the analysis of auricular muscle activation during spatial auditory attention, not the analysis of the interaction between the auditory and visual systems per se. We merely wanted to monitor relationships between the visual and auditory motor systems that are currently known to exist: the oculo-auricular phenomenon of Wilson. In this way, we could rule out that the described effect is secondary to Wilson’s phenomenon (as we understand it today) or to gross neck movements. Note that the latter has also been ruled out earlier (see Cooke and Patuzzi, 2014). We did not plan our study, or look at the data/results, from the new but important angle recommended by the third reviewer: that of possible interactions between the auditory and the visual motor systems that might go beyond Wilson’s phenomenon. Accepting the reviewer’s critique and also following the editors’ advice, we analyzed and discussed much more carefully the interactions between the visual and the auditory systems reflected in our data. We also stated clearly that our setup only allows us to rule out spatial attention effects that are secondary to Wilson’s phenomenon. In particular, it becomes clear that:

1) Grand-average waveforms (shown in the supplementary information, Experiment 1) show a complete absence of location-specific eye movements prior to or synchronous with auricular responses. This does not rule out an effect of eye movements that were too small to be recorded with EOG which, as the reviewer notes, has a resolution of about 2 degrees of arc.

2) However, Wilson’s phenomenon is rarely elicited by eye movements smaller than 30 degrees (Urban, Marczynski, and Hopf, 1993). Furthermore, the enhancement of PAM activity appears to become laterally specific (ipsi > contra) only for gaze shifts greater than about 40 degrees (Patuzzi and O’Beirne, 1999, Figure 4). Our EOG recordings were certainly adequate for detecting ocular movements of this size.

3) There were large saccades in Experiment 1 as participants moved from the right edge of one line of text to the left edge of the next. Inspection of single epochs failed to identify any systematic association between PAM responses and large saccades; see Author response image 1 for four representative trials.

Author response image 1
Macrosaccades in the EOG (black line) during reading in Experiment 1 for subject 15 as example (a subject with large, clear PAM activations).

Note that the muscle activations are not linked to the macrosaccades.

4) Activation of TAM during Wilson’s phenomenon exhibits lateral asymmetry in the direction opposite to that of our findings. Visible movements (Gertle and Wilkinson, 1929) and EMG activity (Urban et al., 1993) for this muscle are greater on the side opposite to the direction of gaze. By contrast, in our study TAM activation was enhanced when attention was elicited by sounds on the same side as the muscle.

5) Only the very fastest saccades (Fischer and Ramsperger, 1984) have a latency comparable to that of the auricular responses documented by our study (70 ms).

6) The latency of TAM responses with respect to onset of the eye movement in Wilson’s phenomenon averages about 340 ms (Schmidt and Thoden, 1978), far too slow to contribute to the attention effects we have documented in Experiment 1.

Nonetheless, we now state clearly in the revision that our experiment was not designed to discover or analyze interactions beyond what is currently known about Wilson’s phenomenon in ear-eye interaction (i.e., that it is slow and only occurs during large, sustained gaze shifts). Especially in light of the recent work of Jennifer Groh’s group, there is a lot of room for future investigations on this topic. We hope that our results help to stimulate such research.

3) The authors speculate that signals from the auricular muscles could be used to decode the locus of auditory spatial attention in near real time. While of potential interest, this claim is highly speculative given the coarseness and apparent variability in the signals shown in the manuscript (which generally show grand averages, with little to no sense of variability). If the authors wish to make the decoding argument, then why not try this? How well can target location actually be extracted from the current data? This would seem to be a tractable question for Experiment 1 (e.g., use data from some subset of trials to train a classifier, and then see how well the classifier works on the other set of trials). Chance performance would be 25% -- can a classifier based on auricular muscle activity do substantially better? I must admit that I am sceptical that signals extracted from these small signals could be useful at all in the real world, given how much the ears move during facial expressions or voluntary ear wiggling. Unless the decoding case can be made more strongly, my advice would be to drop the "decoding" angle from the paper and focus on basic findings.

Because of the current interest in decoding spatial auditory attention from the EEG signal, the authors thought that this might be an interesting application of the proposed auriculomotor monitoring. Apart from the stimulus reconstruction approach, in which EEG signals are used to reconstruct the envelope of the attended speaker (so that identifying the direction of attention requires correlating the EEG with the speech envelopes), decoding endogenous spatial attention from the EEG alone (i.e., without the speech envelope) still remains a challenge. One could assume that an activation of the auricular muscles due to spatial attention might “amplify” the endogenous signal. In fact, we originally found the demonstrated auricular muscle activation in endogenous modes of attention while trying to decode spatial auditory attention from the EEG (in the BMBF Attentional Microphone project; PI: DJS). A robust effect was identified only at the mastoid electrodes, at higher frequencies, and it turned out to be related to PAM activation. Driven by this, we recorded the EMG with carefully attached electrodes and applied a hybrid machine learning scheme developed earlier in DJS's group (Strauss and Steidl, JCAM, 2002) to decode spatial auditory attention. For left/right decisions, the decoding scheme performed well above chance level (abstracts at IEEE EMBC 2018 and SPR 2018). However, instead of using a black-box learning scheme with abstract EMG features, we were interested in quantifying and analyzing the regularities between spatial auditory attention and the auricular EMG and, perhaps, associated pinna movements. This was the motivation for the present study. The experiments were not designed for machine learning-based decoding, in which many other factors matter, such as recording time and the associated stability of the electrode-skin interface.
We agree with the reviewer that we should remove the engineering application/decoding from the paper, as this is indeed not its subject. In the revised version, we mention a possible decoding application only briefly in the Discussion, to stimulate interest among those who work on EEG-based decoding.

However, just to complete this response, we would like to show that decoding the left/right listening direction from the back speakers (similar to setups in the EEG literature; a related setup was used in Schäfer et al., 2018) is possible with the described data. In particular, we demonstrate the performance of two different endogenous attention decoding approaches (i.e., for Experiment 2) in Author response image 2 and Author response image 3.

Author response image 2 shows the application of a hybrid wavelet-support vector machine classification of waveforms (Strauss and Steidl, “Hybrid Wavelet-Support Vector Machine Classification of Waveforms”, J of Comput and Appl. Math 2002) to the right/left listening direction, jointly for the PAM and SAM. This machine is individualized (trained) for each participant and each side of the head/ear using EMG data segments of 1 s. If the trained machines on both sides give the same output at segment n, the corresponding direction is decoded. Otherwise, the decoding system is in a doubt state and makes no new decision (i.e., the decoding scheme outputs the same direction that it had at n-1, its previous state). For this simple test, neither was the filter bank adapted nor were the hyperparameters of the support vector machine specifically tuned. For each of the 21 participants in Experiment 2, the training set consisted of 162 observations (selected based on K-nearest-neighbor to the mean) and the independent test set of 226 ± 32 observations (the number varied because of an energy-threshold-based artefact rejection). The performance (>90%) is far above chance level (50%). The technical details of such an individualized decoding scheme with hybrid kernel learning machines for hearing aids can be found in Corona-Strauss, Hannemann, Strauss. Method for Operating a Hearing Aid Device. US Patent Application # 16102983 (Priority date: 14.08.2017 from the German application # 102017214163.8).
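The two-sided decision rule described above (decode a direction only when the left-ear and right-ear machines agree, otherwise hold the previous output) can be sketched independently of the classifiers themselves. This is a minimal illustration under stated assumptions: the hybrid wavelet-SVM machines are not reproduced, the `'L'`/`'R'` labels and the function name `fuse_decisions` are hypothetical, and the output before the first agreement is assumed to be `None`.

```python
from typing import List, Optional, Sequence

def fuse_decisions(left_ear: Sequence[str], right_ear: Sequence[str],
                   initial: Optional[str] = None) -> List[Optional[str]]:
    """Combine per-segment outputs of the two ear-specific classifiers.

    If both classifiers agree at segment n, that direction is decoded;
    otherwise (the 'doubt' state) the direction decoded at n-1 is held.
    """
    if len(left_ear) != len(right_ear):
        raise ValueError("both classifiers must label the same segments")
    state = initial
    decoded = []
    for left_out, right_out in zip(left_ear, right_ear):
        if left_out == right_out:  # agreement: update the decoded direction
            state = left_out
        decoded.append(state)      # disagreement: keep the previous state
    return decoded
```

Holding the previous state trades responsiveness for stability: a single disagreeing segment cannot flip the decoded direction.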

Apart from these adaptive concepts, we also evaluated the mean energy (broadband signal) described in the paper as a feature for a support vector classifier by means of “leave-one-participant-out” cross-validation. That is, the machine is trained with N-1 participants (N=21 in Experiment 2) and participant #N (who was not part of the training set) is classified; the participants are then rotated so that a new set of N-1 participants is used for training and another participant, #N-1, is classified (i.e., one participant serves as the test set in each fold). As already mentioned, the mean energy is used as the feature, as in the time-resolved analysis in Figure 6, but with segments of 1 s (instead of 10 s in the paper) to provide near-real-time decoding. Note that here we applied learning across participants without individualization (unlike the analysis in Author response image 2). 10602 ± 21 data segments/observations (from 20 participants) were used for training, and an independent test set of 530 ± 21 from 1 participant. Author response image 3 shows the result for the 21 independently classified participants. Even here, the mean classification performance (75%) is above chance level.
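The leave-one-participant-out scheme described above can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier is swapped in for the support vector machine the authors used; with scikit-learn installed, `sklearn.model_selection.LeaveOneGroupOut` combined with `sklearn.svm.SVC` would implement the same fold structure. The feature layout (e.g., one mean-energy value per muscle per 1-s segment) and all function names are assumptions.

```python
import numpy as np

def fit_nearest_centroid(X, y):
    """Stand-in classifier (not an SVM): one mean feature vector per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_nearest_centroid(model, X):
    classes, centroids = model
    # Euclidean distance from every observation to every class centroid
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

def leave_one_participant_out(X, y, participant):
    """Hold out each participant once and train on all the others.

    Returns {participant_id: accuracy on that participant's observations}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    participant = np.asarray(participant)
    scores = {}
    for p in np.unique(participant):
        test = participant == p  # this participant's segments form the test set
        model = fit_nearest_centroid(X[~test], y[~test])
        pred = predict_nearest_centroid(model, X[test])
        scores[p.item()] = float(np.mean(pred == y[test]))
    return scores
```

Grouping the folds by participant, rather than by segment, ensures that no segment of the tested participant is seen during training, which is what makes the reported accuracy a test of generalization across people.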

Author response image 2
Left/right decoding performance for a conjoint classification of PAM/SAM EMG using an individualized decoding scheme in Experiment 2.
Author response image 3
Left/right decoding performance for a conjoint classification of PAM/SAM EMG using a (non-individualized) decoding scheme in Experiment 2 with a leave-one-participant-out cross validation.

Author response image 4 shows the results of the same “leave-one-participant-out” cross-validation for Experiment 1, i.e., the exogenous attention setting. Here we used the RMS value of the entire response to generate the learning/testing associations for the support vector machine. We used a training set of 486 observations ((subjects - 1) × stimuli × directions = 27 × 9 × 2) and an independent test set of 18 observations (stimuli × directions = 9 × 2). Here, too, the classification accuracy is far above chance level.

Author response image 4
Left/right decoding performance for a conjoint classification of PAM/AAM/SAM/TAM EMG using a (non-individualized) decoding scheme in Experiment 1 with a leave-one-participant-out cross validation.

4) For Experiment 2, EMG activity is basically averaged across the entire 5 min range. This is a very coarse timeframe and approach, and I can't help but think there would be something more interesting in the data. Is there any way of looking for transient changes in auricular muscle recruitment, and then tying that back to some sort of event during the stream of auditory information? There is the chance of some potentially rich data that really hasn't been mined with the current approach.

The reviewer is posing an excellent question regarding events which trigger particular movements. This is certainly a subject for future research, which we now note in the Discussion section. However, as we focused on sustained spatial listening in Experiment 2, the speech material was not designed for segmentation into events. In fact, we chose speech material with a balanced saliency and arousal level and a homogeneous information density. There was also, as described, the freedom for participants to pick a story, and consequently the attended material varied across participants. Nevertheless, we now explain more carefully in the revised version that Figure 6 and Figure 7 present a time-resolved analysis (resolution of 10-s segments) showing sustained activation over the course of the experiment.
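A time-resolved analysis of this kind can be sketched as the mean energy of consecutive segments of one channel. This is a hedged illustration only: defining "mean energy" as the mean squared amplitude, using non-overlapping segments, and dropping a trailing partial segment are assumptions, and `segment_mean_energy` is a hypothetical name. The same function could be applied to an EOG channel to obtain the time-resolved EOG plots requested in comment 1c.

```python
import numpy as np

def segment_mean_energy(x, fs, seg_s=10.0):
    """Mean energy (mean squared amplitude) of consecutive, non-overlapping segments.

    x     : 1-D broadband EMG (or EOG) trace for one channel
    fs    : sampling rate in Hz
    seg_s : segment length in seconds (10 s here; 1 s for near-real-time decoding)
    A trailing partial segment is discarded.
    """
    x = np.asarray(x, dtype=float)
    seg_len = int(round(seg_s * fs))
    n_seg = len(x) // seg_len
    if n_seg == 0:
        raise ValueError("signal is shorter than one segment")
    # Reshape into (segment, sample) and average the squared samples per segment
    return (x[: n_seg * seg_len] ** 2).reshape(n_seg, seg_len).mean(axis=1)
```

With 10-s segments, a 5-min listening block yields 30 values per channel, which is enough to check whether the lateralized activation is sustained rather than confined to the start of the block.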

[Editors’ note: what follows is the authors’ response to the second round of review.]

Reviewer #3:

General assessment

This short manuscript reports that tasks that engage auditory attention either exogenously (Experiment 1) or endogenously (Experiment 2) lead to the recruitment of auricular muscles that subtly change the shape of the pinna, doing so in a spatially-dependent manner. The conclusion is that such recruitment attests to the presence of a vestigial brain circuit. The topic is timely given recent findings linking saccadic eye movements to movements of the eardrum, and a number of other subtle indicators of covert attention driven, for example, by the oculomotor system. The results are intriguing, but more can be done to address other potential confounds, particularly on the oculomotor side.

The authors would like to thank reviewer 3 for the critical feedback and the suggested new analyses to strengthen our argumentation regarding possible confounds between the auriculo- and oculomotor systems or other muscular co-activations. The suggested analyses were a clever set of ideas, and the results are now reported in the new submission (see the figure supplements specified below). We greatly appreciate this interest in oculo-auriculomotor interactions. The suggested analysis techniques, along with the recently increasing interest in these interactions, provide a solid plan for future research in the involved labs and, we hope, for others too. But as discussed already, our study was designed for the analysis of auricular muscle activation during spatial auditory attention and not for the analysis of the interaction between the auditory and visual systems per se. We merely wanted to monitor relationships between eye and ear movements that are currently known to exist: the oculo-auricular phenomenon of Wilson. In this way, we could rule out that the described effect is secondary to Wilson’s phenomenon (as we understand it today) or to gross neck movements. Therefore, we tried not to over-analyze our data regarding oculo-auriculomotor interactions or to draw conclusions that go too far. We really appreciate that reviewer 3 (and the editors) left room for this. Thanks to reviewer 3, the limits of our techniques regarding the oculomotor system are now carefully discussed in the manuscript. Our discussion states clearly that exploring these interactions will be an interesting future research path. Rather than a fixation cross or reading task, as in the present studies, such work should employ a free-gaze paradigm, which would maximize the chances of observing oculo-auricular co-activation.

Substantive concerns

The authors have extensively revised the manuscript, and established the phenomena both within and across their subject pool. I still have some concerns about other potential confounds that need to be addressed; as the authors say, the neural drive to the ear muscles is so weak that the resultant movements are minuscule compared to those generated during broad smiles or wiggling.

1) Previously, I had raised concerns about potential confounds from the oculomotor system that orients the line of sight via eye and/or head movements (what the authors term the "visuomotor system" in their response). The authors have added a number of analyses that go some way toward assuaging concerns about eye movements. However, grand-average measures of EOG across many trials could mask some interactions between eye movements and auricular muscle activity; the data shown in Figure 7 also show that the variance of the EOG signal decreases after stimulus onset, particularly for stimuli presented at the left-back speaker. More analyses and details are warranted.

As mentioned above, we added all the suggested new analyses. In particular, we extended the supplements to Figures 7 and 8; see the specifications below.

1a) For Experiment 1, the auditory stimuli are presented while subjects are "reading a boring essay". Please provide details about how large the eye movement excursions were; from Multimodal Figure supplement 1, it appears that the text spanned about +/-12 deg of horizontal visual angle, but this is just one subject.

We have now included the box-plot analysis for all the subjects, see Figure 7—figure supplement 2.

1b) The authors acknowledge that they can detect "macro" saccades larger than about 1 degree on average, and these should be analyzed more systematically than by relying on average EOG traces, which could wash out effects. The oculomotor literature on microsaccades offers a number of ways of presenting spatial and temporal patterns of saccades timed to external events (e.g., the saccadic "rasters" in Figure 4 of Tian, Yoshida and Hafed, Front Syst Neurosci 2016), and I think these should be applied to the data from Experiment 1. Something similar for the "macrosaccade" data from Experiment 1 is needed to establish whether or not stimulus onset alters the patterning of larger saccadic eye movements.

The authors would like to thank reviewer 3 for this excellent suggestion. We included a similar (macrosaccadic) raster plot, which shows a rather uniform distribution of macrosaccades over time after stimulus onset; see Figure 7—figure supplement 3.

1c) For Experiment 2, please provide "time-resolved" plots for the EOG data, similar to those provided for auricular muscle activity in Figure 6. The EOG data shown in Figure 8 are helpful but, I believe, show data averaged across an entire 5-min segment, which is a very large window.

We agree. A time-resolved plot, similar to the one used for the auricular muscles, is now shown in “Figure 8—figure supplement 1”.

2) With regard to a potential concern about head movements, I agree that the vestibular-auricular reflex is unlikely, given that the head was stabilized on a chin-rest. My previous concern was more about whether the act of spatially deploying auditory attention was related to neck muscle contraction that introduced cross-talk at the auricular muscles (the absence of head movement can't be used to infer the absence of neck muscle contraction in this regard). The work by Cooke and Patuzzi, 2014, doesn't address this concern, since they examined sternocleidomastoid activity during ipsilateral head turns (right PAM and right SCM recordings during right head turns). SCM is a contralateral, not an ipsilateral, head turner, so it would be directly recruited by leftward turns, which do not appear to have been studied in the Cooke and Patuzzi setup. The Cooke and Patuzzi paper actually mentions the concerns I have about potential cross-talk from other nearby muscles (see the end of the first paragraph of their "subjects and methods"). Muscles on the back of the neck (e.g., splenius capitis or the suboccipital muscles, which insert on the occiput close to the mastoid) are ipsilateral head turners. The activity of the suboccipital muscles in particular has also been related to reflexive visuospatial attention in a Posner-type task in head-restrained monkeys (e.g., see Corneil et al., 2008). To be clear, I don't think that the entirety of the results could be "explained" by cross-talk from nearby dorsal neck muscles, but the authors should consider this perspective.

The authors agree, even though our bipolar electrode configuration should be rather robust to the possible volume-conduction effects mentioned in the Cooke and Patuzzi paper. Also, the fact that widely separated auricular muscles (not just PAM, but also AAM and SAM) show these effects is of course promising with respect to a co-activation interpretation. We note in the revised manuscript the possibility that subtle, covert activation of head-turning muscles, as suggested by the reviewer, might be correlated with ocular and auricular orienting. Thanks for mentioning this and for the reference. We certainly did consider this perspective; see the last paragraph before the Discussion.

3) A final point about other muscles; Figure 1 shows that the authors recorded EMG activity on zygomaticus major and frontalis, but results are not analyzed. Given the point about how small the movements of interest are compared to those related to smiling, please establish the independence of the auricular muscle recordings from these facial muscles.

Indeed, we had electrodes at these positions, mainly for a different, subsequent experiment that analyzed the interaction of these muscles and the auricular ones during positive affective states. The frontalis and zygomaticus data are now reported for Experiments 1 and 2; see Figure 7—figure supplements 6–9 and Figure 8—figure supplement 2, respectively.

https://doi.org/10.7554/eLife.54536.sa2

Article and author information

Author details

  1. Daniel J Strauss

    Systems Neuroscience and Neurotechnology Unit, Faculty of Medicine, Saarland University & School of Engineering, htw saar, Homburg/Saar, Germany
    Contribution
    Conceptualization, Resources, Formal analysis, Supervision, Validation, Investigation, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    daniel.strauss@uni-saarland.de
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-8481-499X
  2. Farah I Corona-Strauss

    Systems Neuroscience and Neurotechnology Unit, Faculty of Medicine, Saarland University & School of Engineering, htw saar, Homburg/Saar, Germany
    Contribution
    Conceptualization, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology
    Competing interests
    No competing interests declared
  3. Andreas Schroeer

    Systems Neuroscience and Neurotechnology Unit, Faculty of Medicine, Saarland University & School of Engineering, htw saar, Homburg/Saar, Germany
    Contribution
    Software, Formal analysis, Validation, Investigation, Visualization
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7904-3622
  4. Philipp Flotho

    Systems Neuroscience and Neurotechnology Unit, Faculty of Medicine, Saarland University & School of Engineering, htw saar, Homburg/Saar, Germany
    Contribution
    Data curation, Software, Formal analysis, Validation, Investigation, Visualization
    Competing interests
    No competing interests declared
  5. Ronny Hannemann

    Audiological Research Unit, Sivantos GmbH, Erlangen, Germany
    Contribution
    Conceptualization, Methodology
    Competing interests
    No competing interests declared
  6. Steven A Hackley

    Clinical and Cognitive Neuroscience Laboratory, Department of Psychological Sciences, University of Missouri, Columbia, United States
    Contribution
    Conceptualization, Formal analysis, Investigation, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared

Funding

Bundesministerium für Bildung und Forschung (BMBF-FZ No. 03FH004IX5)

  • Daniel J Strauss

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This study was partially supported by the German Federal Ministry of Education and Research, Grant No. BMBF-FZ 03FH004IX5 (PI: DJS). We thank Larissa Arand for assistance with data collection and Becca Sullinger for the artwork in Figure 1. We acknowledge support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) and Saarland University within the funding programme Open Access Publishing.

Ethics

Human subjects: The study was approved by the responsible ethics committee (ethics commission at the Ärztekammer des Saarlandes, Saarbrücken, Germany; app. number 79/16). After a detailed explanation of the procedure, all subjects signed a consent form.

Senior Editor

  1. Barbara G Shinn-Cunningham, Carnegie Mellon University, United States

Reviewing Editor

  1. Jennifer M Groh, Duke University, United States

Reviewers

  1. Sarah Verhulst, Ghent University, Belgium
  2. Christopher Shera, University of Southern California, United States
  3. Brian D Corneil, University of Western Ontario, Canada

Publication history

  1. Received: January 24, 2020
  2. Accepted: May 28, 2020
  3. Version of Record published: July 3, 2020 (version 1)

Copyright

© 2020, Strauss et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


