Neural synchronization is strongest to the spectral flux of slow music and depends on familiarity and beat salience

  1. Kristin Weineck (corresponding author)
  2. Olivia Xin Wen
  3. Molly J Henry
  1. Research Group “Neural and Environmental Rhythms”, Max Planck Institute for Empirical Aesthetics, Germany
  2. Goethe University Frankfurt, Institute for Cell Biology and Neuroscience, Germany
  3. Department of Psychology, Toronto Metropolitan University, Canada

Abstract

Neural activity in the auditory system synchronizes to sound rhythms, and brain–environment synchronization is thought to be fundamental to successful auditory perception. Sound rhythms are often operationalized in terms of the sound’s amplitude envelope. We hypothesized that – especially for music – the envelope might not best capture the complex spectro-temporal fluctuations that give rise to beat perception and synchronized neural activity. This study investigated (1) neural synchronization to different musical features, (2) tempo-dependence of neural synchronization, and (3) dependence of synchronization on familiarity, enjoyment, and ease of beat perception. In this electroencephalography study, 37 human participants listened to tempo-modulated music (1–4 Hz). Independent of whether the analysis approach was based on temporal response functions (TRFs) or reliable components analysis (RCA), the spectral flux of music – as opposed to the amplitude envelope – evoked strongest neural synchronization. Moreover, music with slower beat rates, high familiarity, and easy-to-perceive beats elicited the strongest neural response. Our results demonstrate the importance of spectro-temporal fluctuations in music for driving neural synchronization, and highlight its sensitivity to musical tempo, familiarity, and beat salience.

Editor's evaluation

This study investigated the neural tracking of music using novel methodology. The core finding was stronger neuronal entrainment to "spectral flux" rather than other more commonly tested features such as amplitude envelope. The study is methodologically sophisticated and provides novel insight on the neuronal mechanisms of music perception.

https://doi.org/10.7554/eLife.75515.sa0

eLife digest

When we listen to a melody, the activity of our neurons synchronizes to the music: in fact, it is likely that the closer the match, the better we can perceive the piece. However, it remains unclear exactly which musical features our brain cells synchronize to. Previous studies, which have often used ‘simplified’ music, have highlighted that the amplitude envelope (how the intensity of the sounds changes over time) could be involved in this phenomenon, alongside factors such as musical training, attention, familiarity with the piece or even enjoyment. Whether differences in neural synchronization could explain why musical tastes vary between people is also still a matter of debate.

In their study, Weineck et al. aim to better understand what drives neuronal synchronization to music. A technique known as electroencephalography was used to record brain activity in 37 volunteers listening to instrumental music whose tempo ranged from 60 to 240 beats per minute. The tunes varied across an array of features such as familiarity, enjoyment and how easy the beat was to perceive. Two different approaches were then used to calculate neural synchronization, which yielded converging results.

The analyses revealed that three types of factors were associated with strong neural synchronization. First, among the tempi tested, 60–120 beats per minute elicited the strongest match with neuronal activity. Interestingly, this tempo range is common in Western pop music, is usually preferred by listeners, and often matches spontaneous body rhythms such as walking pace. Second, synchronization was linked to variations in pitch and sound quality (known as ‘spectral flux’) rather than in the amplitude envelope. And finally, familiarity and perceived beat saliency, but not enjoyment or musical expertise, were connected to stronger synchronization.

These findings help to better understand how our brains allow us to perceive and connect with music. The work conducted by Weineck et al. should help other researchers to investigate this field; in particular, it shows how important it is to consider spectral flux rather than amplitude envelope in experiments that use actual music.

Introduction

Neural activity synchronizes to different types of rhythmic sounds, such as speech and music (Doelling and Poeppel, 2015; Nicolaou et al., 2017; Ding et al., 2017; Kösem et al., 2018) over a wide range of rates. In music, neural activity synchronizes with the beat, the most prominent isochronous pulse in music to which listeners sway their bodies or tap their feet (Tierney and Kraus, 2015; Nozaradan et al., 2012; Large and Snyder, 2009; Doelling and Poeppel, 2015). Listeners show a strong behavioral preference for music with beat rates around 2 Hz (here, we use the term tempo to refer to the beat rate). The preference for 2 Hz coincides with the modal tempo of Western pop music (Moelants, 2002) and the most prominent frequency of natural adult body movements (MacDougall and Moore, 2005). Indeed, previous research showed that listeners perceive rhythmic sequences at beat rates around 1–2 Hz especially accurately when they are able to track the beat by moving their bodies (Zalta et al., 2020). Despite the perceptual and motor evidence, studies looking at tempo-dependence of neural synchronization are scarce (Doelling and Poeppel, 2015; Nicolaou et al., 2017) and we are not aware of any human EEG study using naturalistic polyphonic musical stimuli that were manipulated in the tempo domain.

In the current study, we aimed to test whether the preference for music with beat rates around 2 Hz is reflected in the strength of neural synchronization by examining neural synchronization across a relatively wide and finely spaced range of musical tempi (1–4 Hz, corresponding to the neural δ band). In addition, a number of different musical, behavioral, and perceptual measures have been shown to modulate neural synchronization and influence music perception, including complexity, familiarity, repetition of the music, musical training of the listener, and attention to the stimulus (Kumagai et al., 2018; Madsen et al., 2019; Doelling and Poeppel, 2015). Thus, we investigated the effects of enjoyment, familiarity and the ease of beat perception on neural synchronization.

Most studies assessing neural synchronization to music have examined synchronization to either the stimulus amplitude envelope, which quantifies intensity fluctuations over time (Doelling and Poeppel, 2015; Kaneshiro et al., 2020; Wollman et al., 2020), or ‘higher order’ musical features such as surprise and expectation (Di Liberto et al., 2020). This mimics approaches used for studying neural synchronization to speech, where neural activity has been shown to synchronize with the amplitude envelope (Peelle and Davis, 2012), which roughly corresponds to syllabic fluctuations (Doelling et al., 2014), as well as to ‘higher order’ semantic information (Broderick et al., 2019). Notably, most studies that have examined neural synchronization to musical rhythm have used simplified musical stimuli, such as MIDI melodies (Kumagai et al., 2018) and monophonic melodies (Di Liberto et al., 2020), or rhythmic lines comprising clicks or sine tones (Nozaradan et al., 2012; Nozaradan et al., 2011; Wollman et al., 2020); only a few studies have focused on naturalistic, polyphonic music (Tierney and Kraus, 2015; Madsen et al., 2019; Kaneshiro et al., 2020; Doelling and Poeppel, 2015). ‘Higher order’ musical features are difficult to compute for naturalistic music, which is typically polyphonic and has complex spectro-temporal properties (Zatorre et al., 2002). However, amplitude-envelope synchronization is well documented: neural activity synchronizes to amplitude fluctuations in music between 1 Hz and 8 Hz, and synchronization is especially strong for listeners with musical expertise (Doelling and Poeppel, 2015).

Because of the complex nature of natural polyphonic music, we hypothesized that amplitude envelope might not be the only or most dominant feature to which neural activity could synchronize (Müller, 2015). Thus, the current study investigated neural responses to different musical features that evolve over time and capture different aspects of the stimulus dynamics. Here, we use the term musical feature to refer to time-varying aspects of music that fluctuate on time scales corresponding roughly to the neural δ band, as opposed to elements of music such as key, harmony or syncopation. We examined amplitude envelope, the first derivative of the amplitude envelope (usually more sensitive to sound onsets than the amplitude envelope), beat times, and spectral flux, which describes spectral changes of the signal on a frame-to-frame basis by computing the difference between the spectral vectors of subsequent frames (Müller, 2015). One potential advantage of spectral flux over the envelope or its derivative is that spectral flux is sensitive to rhythmic information that is communicated by changes in pitch even when they are not accompanied by changes in amplitude. Critically, temporal and spectral information jointly influence the perceived accent structure in music, which provides information about beat locations (Pfordresher, 2003; Ellis and Jones, 2009; Jones, 1993).
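To illustrate the claim that spectral flux can register pitch changes that leave the amplitude envelope untouched, the sketch below compares a simplified spectral flux with a half-wave-rectified envelope derivative on a synthetic constant-amplitude tone whose pitch jumps halfway through. It is a minimal, single-band sketch under stated assumptions: Hann-windowed magnitude spectra, half-wave-rectified frame-to-frame differences summed over bins, no logarithmic compression or gammatone filtering, and arbitrary frame and hop sizes. It is not the feature-extraction code used in this study.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_flux(x, frame_len=1024, hop=256):
    """Sum over frequency bins of half-wave-rectified increases in the
    magnitude spectrum between consecutive frames (simplified, single band)."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.maximum(np.diff(mag, axis=0), 0.0).sum(axis=1)


def envelope_derivative(x, frame_len=1024, hop=256):
    """Half-wave-rectified first derivative of a frame-wise RMS envelope."""
    rms = np.sqrt(np.mean(frame_signal(x, frame_len, hop) ** 2, axis=1))
    return np.maximum(np.diff(rms), 0.0)


# Constant-amplitude tone whose pitch jumps from 440 Hz to 660 Hz at 0.5 s;
# the phase is accumulated so the waveform stays continuous at the jump.
fs = 16000
t = np.arange(fs) / fs
freq = np.where(t < 0.5, 440.0, 660.0)
x = np.sin(2 * np.pi * np.cumsum(freq) / fs)

flux = spectral_flux(x)
deriv = envelope_derivative(x)
print("spectral flux: peak", flux.max(), "median", np.median(flux))
print("envelope derivative: peak", deriv.max())
```

The spectral flux shows a clear peak in the frames that straddle the pitch change, whereas the envelope derivative stays near zero because the sound's intensity never changes.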

The current study investigated neural synchronization to natural music by using two different analysis approaches: Reliable Components Analysis (RCA) (Kaneshiro et al., 2020) and temporal response functions (TRFs) (Di Liberto et al., 2020). A theoretically important distinction here is whether neural synchronization observed using these techniques reflects phase-locked, unidirectional coupling between a stimulus rhythm and activity generated by a neural oscillator (Lakatos et al., 2019) or the convolution of a stimulus with the neural activity evoked by that stimulus (Zuk et al., 2021). TRF analyses model relatively broad-band neural activity (e.g. 1–15 Hz or 1–30 Hz; Crosse et al., 2016; Crosse et al., 2021) as a linear convolution of the stimulus with a response function; as such, papers applying TRFs naturally tend to interpret neural synchronization through the lens of convolution (although there are many exceptions, e.g. Crosse et al., 2015; Di Liberto et al., 2015). RCA-based analyses usually calculate correlation or coherence between a stimulus and relatively narrow-band activity, and in turn interpret neural synchronization as reflecting entrainment of a narrow-band neural oscillation to a stimulus rhythm (Doelling and Poeppel, 2015; Assaneo et al., 2019). Ultimately, understanding under which circumstances, and with which techniques, the observed neural synchronization arises from either of these physiological mechanisms is an important scientific question (Doelling et al., 2019; Doelling and Assaneo, 2021; van Bree et al., 2022). However, answering it is beyond the scope of the present study, and we remain agnostic about the potential generator of synchronized neural activity.
Here, we refer to and discuss ‘entrainment in the broad sense’ (Obleser and Kayser, 2019) without making assumptions about how neural synchronization arises, and we will moreover show that these two classes of analysis techniques strongly agree with each other.

We aimed to answer four questions. (1) Does neural synchronization to natural music depend on tempo? (2) Which musical feature shows the strongest neural synchronization during natural music listening? (3) How compatible are RCA- and TRF-based methods at quantifying neural synchronization to natural music? (4) How do enjoyment, familiarity, and ease of beat perception affect neural synchronization? To answer these research questions, we recorded electroencephalography (EEG) data while participants listened to instrumental music presented at different tempi (1–4 Hz). Strongest neural synchronization was observed in response to the spectral flux of music, for tempi between 1 and 2 Hz, to familiar songs, and to songs with an easy-to-perceive beat.

Results

Scalp EEG activity of 37 human participants was measured while they listened to instrumental segments of natural music from different genres (Appendix 1—table 1). Music segments were presented at thirteen parametrically varied tempi (1–4 Hz in 0.25 Hz steps; see Materials and methods). We assessed neural synchronization to four different musical features: amplitude envelope, first derivative of the amplitude envelope, beat times, and spectral flux. Neural synchronization was quantified and compared using two different analysis pipelines: (1) RCA combined with time- and frequency-domain analyses (Kaneshiro et al., 2020), and (2) TRFs (Crosse et al., 2016). As different behavioral and perceptual measures have been shown to influence neural synchronization to music (Madsen et al., 2019; Cameron et al., 2019), we investigated the effects of enjoyment, familiarity, and the ease with which a beat was perceived (Figure 1A). To use a large variety of musical stimuli at the group level, and to reduce effects that may have arisen from individual stimuli occurring at certain tempi but not others, participants were divided into four subgroups that listened to different pools of stimuli (for more details, see Materials and methods). The subgroups’ stimulus pools overlapped, but the individual song stimuli were presented at different tempi for each subgroup.

Figure 1 (with 2 supplements)
Experimental design and musical features.

(A) Schematic of the experimental procedure. Each trial consisted of the presentation of one music segment, during which participants were instructed to listen attentively without moving. After a 1 s silence, the last 5.5 s of the music segment was repeated while participants tapped their finger along with the beat. At the end of each trial, participants rated their enjoyment of and familiarity with the music segment, as well as the ease with which they were able to tap to the beat (translated English example shown in the figure: “How much did you like the song?” rated from “not at all” to “very much”). (B) Example traces of the four musical features of one music segment. (C) Z-scored mean amplitude spectrum of all four musical features. Light orange dashed boxes mark FFT frequencies corresponding to the stimulation tempo or its first harmonic. (D) Mutual information (MI) for all possible feature combinations (green) compared to a surrogate distribution (yellow, three-way ANOVA, *pFDR <0.001, rest: pFDR <0.05). Boxplots indicate the median and the 25th and 75th percentiles (n=52). (E) MI scores between all possible feature combinations (*pFDR <0.001, rest: pFDR <0.05).

Figure 1—source data 1

Source data for visualizing and analyzing the musical features.

https://cdn.elifesciences.org/articles/75515/elife-75515-fig1-data1-v1.zip

Musical features

We examined neural synchronization to the time courses of four different musical features (Figure 1B). First, we quantified energy fluctuations over time as the gammatone-filtered amplitude envelope (we report analyses on the full-band envelope in Figure 2—figure supplement 1 and Figure 3—figure supplement 1). Second, we computed the half-wave-rectified first derivative of the amplitude envelope, which is typically considered to be sensitive to the presence of onsets in the stimulus (Bello et al., 2005). Third, a percussionist drummed along with the musical segments to define beat times, which were here treated in a binary manner. Fourth, a spectral novelty function, referred to as spectral flux (Müller, 2015), was computed to capture changes in frequency content (as opposed to amplitude fluctuations) over time. In contrast to the first derivative, the spectral flux is better able to identify note onsets that are characterized by changes in spectral content (pitch or timbre), even if the energy level remains the same. To ensure that each musical feature possessed acoustic cues to the stimulation-tempo manipulation, we computed a fast Fourier transform (FFT) on the musical-feature time courses separately for each stimulation-tempo condition; the mean amplitude spectra are plotted in Figure 1C.

Overall, amplitude peaks were observed at the intended stimulation tempo and at the harmonic rates for all stimulus features.
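The FFT check described above can be sketched as follows. This is a minimal illustration, assuming a feature sampling rate of 128 Hz, a 30 s excerpt, and a hypothetical onset-like feature with strong events on the beat and weaker events halfway between beats; the z-scoring across frequency bins mirrors the z-scored spectra in Figure 1C only in spirit.

```python
import numpy as np


def zscored_amplitude_spectrum(feature, fs):
    """Single-sided FFT amplitude spectrum of a (mean-removed) feature time
    course, z-scored across frequency bins."""
    amp = np.abs(np.fft.rfft(feature - feature.mean())) / len(feature)
    freqs = np.fft.rfftfreq(len(feature), d=1.0 / fs)
    return freqs, (amp - amp.mean()) / amp.std()


def value_at(freqs, values, target_hz):
    """Value in the frequency bin closest to target_hz."""
    return values[np.argmin(np.abs(freqs - target_hz))]


fs = 128      # assumed sampling rate of the feature time course (Hz)
tempo = 2.0   # stimulation tempo (Hz)
n = 30 * fs   # 30 s of feature data

# Hypothetical onset-like feature: strong events on the beat and weaker
# events halfway between beats.
feature = np.zeros(n)
period = int(fs / tempo)              # 64 samples per beat
feature[::period] = 1.0
feature[period // 2::period] = 0.5

freqs, z_amp = zscored_amplitude_spectrum(feature, fs)
for f in (tempo, 1.5 * tempo, 2 * tempo):
    print(f"{f:.1f} Hz: z = {value_at(freqs, z_amp, f):.2f}")
```

In this toy example, the weaker off-beat events make the first harmonic (4 Hz) stronger than the stimulation tempo itself, while an off-tempo frequency such as 3 Hz carries essentially no energy.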

To assess the degree to which the different musical features might have been redundant, we calculated mutual information (MI) for all possible pairwise feature combinations and compared MI values to surrogate distributions calculated separately for each feature pair (Figure 1D and E). MI quantifies the amount of information gained about one random variable by observing a second variable (Cover and Thomas, 2005). MI values were analyzed using separate three-way ANOVAs (MI data vs. MI surrogate × Tempo × Subgroup) for each feature pair.
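The text does not state which MI estimator was used or how the surrogates were built, so the sketch below rests on two assumptions: a histogram (plug-in) estimator with 16 bins, and surrogates obtained by randomly circularly shifting one feature relative to the other. The two "feature" time courses are hypothetical random signals that share information by construction.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Plug-in (histogram) estimate of I(X; Y) in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y, shape (1, bins)
    nz = pxy > 0                            # 0 * log(0) is taken as 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])))


def mi_surrogate(x, y, n_iter=200, seed=None):
    """Null distribution of MI from random circular shifts of y, which break
    the x-y alignment but keep each signal's own statistics."""
    rng = np.random.default_rng(seed)
    shifts = rng.integers(1, len(y), size=n_iter)
    return np.array([mutual_information(x, np.roll(y, s)) for s in shifts])


# Hypothetical pair of feature time courses that share information.
rng = np.random.default_rng(0)
feat_a = rng.standard_normal(5000)
feat_b = feat_a + 0.5 * rng.standard_normal(5000)

mi_data = mutual_information(feat_a, feat_b)
mi_null = mi_surrogate(feat_a, feat_b, seed=1)
print(f"MI = {mi_data:.3f} bits; surrogate mean = {mi_null.mean():.3f} bits")
```

Histogram MI estimates are biased upward for finite samples; comparing against a surrogate distribution, as done here, is one way to account for that bias.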

Spectral flux shared significant information with all other musical features: MI was significantly higher than the surrogate for amplitude envelope and spectral flux (F(1,102)=24.68, pFDR = 1.01e-5, η2=0.18), derivative and spectral flux (F(1,102)=82.3, pFDR = 1.92e-13, η2=0.45), and beat times and spectral flux (F(1,102)=23.05, pFDR = 1.3e-5, η2=0.13). This indicates that spectral flux captures information from all three other musical features; as such, we expected spectral flux to be associated with the strongest neural synchronization. Unsurprisingly, there was also significant shared information between the amplitude envelope and first derivative (F(1,102)=14.11, pFDR = 4.67e-4, η2=0.09). The remaining feature pairs also shared significant information (Fenv-beat(1,102)=8.44, pFDR = 0.006, η2=0.07; Fder-beat(1,102)=6.06, pFDR = 0.016, η2=0.05).

There was a main effect of Tempo on MI shared between the amplitude envelope and derivative (F(12,91)=4, pFDR = 2e-4, η2=0.32) and the spectral flux and beat times (F(12,91)=5.48, pFDR = 4.35e-6, η2=0.37) (Figure 1—figure supplement 1). This is likely due to the presence of slightly different songs in the different tempo conditions, as the effect of tempo on MI was unsystematic for both feature pairs (see Materials and methods and Appendix 1—table 1). MI for the remaining feature pairs did not differ significantly across tempi.

No significant differences in MI were observed between subgroups, despite the subgroups hearing slightly different pools of musical stimuli: (Fenv-der(3,100)=0.71, pFDR = 0.94, η2=0.01; Fenv-beat(3,100)=2.63, pFDR = 0.33, η2=0.07; Fenv-spec(3,100)=0.3, pFDR = 0.94, η2=0.01; Fder-beat(3,100)=0.43, pFDR = 0.94, η2=0.01; Fder-spec(3,100)=0.46, pFDR = 0.94, η2=0.01; Fbeat-spec(3,100)=0.13, pFDR = 0.94, η2=0.002).

Neural synchronization was strongest in response to slow music

Neural synchronization to music was investigated using two converging analysis pipelines based on (1) RCA followed by time- (stimulus-response correlation, SRCorr) and frequency- (stimulus-response coherence, SRCoh) domain analysis and (2) TRFs.

First, an RCA-based analysis approach was used to assess tempo effects on neural synchronization to music (Figure 2, Figure 2—figure supplement 1). RCA involves estimating a spatial filter that maximizes correlation across data sets from multiple participants (for more details see Materials and methods) (Kaneshiro et al., 2020; Parra et al., 2018). The resulting time course from a single reliable component can then be assessed in terms of its correlation in the time domain (SRCorr) or coherence in the frequency domain (SRCoh) with different musical feature time courses. Our analyses focused on the first reliable component, which exhibited an auditory topography (Figure 2A). To control for inherent tempo-dependent effects that could influence our results (such as higher power or variance at lower frequencies, i.e. 1/f), SRCorr and SRCoh values were normalized by a surrogate distribution. The surrogate distribution was obtained by randomly circularly shifting the neural time course relative to the musical features, separately per tempo condition and stimulation subgroup, for 50 iterations (Zuk et al., 2021). This destroyed the temporal alignment between the stimulus and neural time courses while preserving the spectrotemporal composition of each signal. Subsequently, the ‘raw’ SRCorr or SRCoh values were z-scored by subtracting the mean and dividing by the standard deviation of the surrogate distribution.
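The surrogate normalization described above can be sketched for SRCorr as follows; the same recipe applies to SRCoh and TRF correlations. This is a minimal sketch, assuming Pearson correlation as the SRCorr measure and hypothetical white-noise signals in place of a real musical feature and RC1 time course. The range from which circular shifts are drawn (excluding the shortest 10% of lags at each end) is our assumption; the text does not specify it.

```python
import numpy as np


def srcorr(neural, feature):
    """Pearson correlation between a neural time course and a feature."""
    return float(np.corrcoef(neural, feature)[0, 1])


def zscore_against_surrogate(neural, feature, n_iter=50, seed=None):
    """Z-score the observed SRCorr against a circular-shift surrogate."""
    rng = np.random.default_rng(seed)
    n = len(neural)
    raw = srcorr(neural, feature)
    # Assumption: shifts are drawn uniformly, excluding the shortest 10% of
    # lags at each end so that surrogates are not near-copies of the data.
    shifts = rng.integers(n // 10, n - n // 10, size=n_iter)
    null = np.array([srcorr(np.roll(neural, s), feature) for s in shifts])
    return raw, (raw - null.mean()) / null.std()


rng = np.random.default_rng(0)
feature = rng.standard_normal(4000)              # hypothetical feature
neural = feature + rng.standard_normal(4000)     # hypothetical RC1 time course
raw, z = zscore_against_surrogate(neural, feature, seed=1)
print(f"raw SRCorr = {raw:.2f}, z = {z:.1f}")
```

Because the shift is circular, each surrogate keeps the full power spectrum of the neural signal, so tempo-dependent differences in signal variance affect the raw value and its null distribution alike.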

Figure 2 (with 2 supplements)
Stimulus–response correlation and stimulus–response coherence are tempo dependent for all musical features.

(A) Projected topography of the first reliable component (RC1). (B) Average SRCorr of the aligned neural response and surrogate distribution (grey) across tempi for each musical feature (left) and the z-scored SRCorr based on a surrogate distribution (right) (± SEM; shaded area). Highest correlations were found at slow tempi (repeated-measure ANOVA, Greenhouse-Geisser correction where applicable). The slopes of regression models were used to compare the tempo-specificity between musical features. (C) Mean SRCorr across musical features. Highest correlations were found in response to spectral flux, with significant differences between all possible feature combinations, pFDR <0.001, except between the envelope or derivative and beat onsets, pFDR <0.01 (n=34, repeated-measure ANOVA, Tukey’s test, median, 25th and 75th percentiles). Z-scored SRCoh in response to the (D) amplitude envelope, (E) first derivative, (F) beat onsets and (G) spectral flux. Each panel depicts the SRCoh as a colorplot (left) and the pooled SRCoh values at the stimulation tempo and first harmonic (right, n=34, median, 25th and 75th percentile). (H) Same as (C) for the SRCoh, with significant differences between all possible feature combinations (pFDR <0.001) except between the envelope and beat onsets. Coherence values were averaged over the stimulus tempo and first harmonic. (I) Mean differences of SRCoh values at the stimulation tempo and first harmonic (n=34, negative values: higher SRCoh at harmonic, positive values: higher SRCoh at stimulation tempo, paired-sample t-test, pFDR <0.05). (J) Same as (I) based on the FFT amplitudes (pFDR <0.001).

Figure 2—source data 1

Source data for the RCA-based measures stimulus-response correlation (SRCorr) and stimulus-response coherence (SRCoh).

https://cdn.elifesciences.org/articles/75515/elife-75515-fig2-data1-v1.zip
Figure 2—source data 2

Output of the RCA-based analysis of the first two stimulation subgroups (based on Kaneshiro et al., 2020).

https://cdn.elifesciences.org/articles/75515/elife-75515-fig2-data2-v1.zip
Figure 2—source data 3

Output of the RCA-based analysis of the last two stimulation subgroups (based on Kaneshiro et al., 2020).

https://cdn.elifesciences.org/articles/75515/elife-75515-fig2-data3-v1.zip

The resulting z-scored SRCorrs were significantly tempo-dependent for the amplitude envelope and the spectral flux (repeated-measure ANOVAs with Greenhouse-Geisser correction where required: Fenv(12,429)=2.5, pGG = 0.015, η2=0.07; Fder(12,429)=1.67, p=0.07, η2=0.05; Fbeat(12,429)=0.94, p=0.5, η2=0.03; Fspec(12,429)=2.92, pGG = 6.88e-4, η2=0.08). Highest correlations were found at slower tempi (~1–2 Hz).

No significant differences were observed across subgroups (Fenv(3,30)=1.13, pFDR=0.55, η2=0.1; Fder(3,30)=0.72, pFDR = 0.55, η2=0.07; Fbeat(3,30)=0.85, pFDR = 0.55, η2=0.08; Fspec(3,30)=0.9, pFDR = 0.55, η2=0.08). The results for the z-scored SRCorr were qualitatively similar to those for the ‘raw’ SRCorr, with the largest differences for the beat feature.

In the frequency domain, z-scored SRCoh (Figure 2D–G) showed clear peaks at the stimulation tempo and its harmonics. Overall, SRCoh was stronger at the first harmonic of the stimulation tempo than at the stimulation tempo itself, regardless of the musical feature (Figure 2I, paired-sample t-test, envelope: t(12)=-5.16, pFDR = 0.001, re = 0.73; derivative: t(12)=-5.11, pFDR = 0.001, re = 0.72; beat: t(12)=-4.13, pFDR = 0.004, re = 0.64; spectral flux: t(12)=-3.3, pFDR = 0.01, re = 0.56). Most stimulus features themselves also had their highest FFT amplitudes at the first harmonic (Figure 2J, envelope: t(12)=-6.81, pFDR = 5.23e-5, re = 0.81; derivative: t(12)=-6.88, pFDR = 5.23e-5, re = 0.81; spectral flux: t(12)=-8.04, pFDR = 2.98e-5, re = 0.85); the exception was the beat onsets, which had higher amplitudes at the stimulation tempo (beat: t(12)=6.27, pFDR = 8.56e-5, re = 0.79).

To evaluate tempo-dependent effects, we averaged z-scored SRCoh across the stimulation tempo and first harmonic and submitted the average z-SRCoh values to repeated-measure ANOVAs for each musical feature. Z-SRCoh was highest for slow music, but this tempo dependence was significant only for the spectral flux and beat onsets (Fenv(12,429)=1.31, p=0.21, η2=0.04; Fder(12,429)=1.71, p=0.06, η2=0.05; Fbeat(12,429)=2.07, pGG = 0.04, η2=0.06; Fspec(12,429)=2.82, pGG = 0.006, η2=0.08). No significant differences in SRCoh were observed across subgroups (Fenv(3,30)=0.93, pFDR=0.58, η2=0.09; Fder(3,30)=3.07, pFDR = 0.17, η2=0.24; Fbeat(3,30)=2.26, pFDR = 0.2, η2=0.18; Fspec(3,30)=0.29, pFDR = 0.83, η2=0.03). Individual data examples of the SRCorr and SRCoh can be found in Figure 2—figure supplement 2.
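Stimulus–response coherence at the stimulation tempo and its first harmonic can be sketched as below. This is a minimal illustration, assuming magnitude-squared coherence estimated from non-overlapping Hann-windowed 4 s segments, a 64 Hz sampling rate, and hypothetical signals (a rhythm at 2 Hz plus its harmonic, with independent noise in the "feature" and the delayed "neural" copy). It is not the study's pipeline; `scipy.signal.coherence` offers an overlapping-segment (Welch) alternative.

```python
import numpy as np


def coherence(x, y, fs, seg_len):
    """Magnitude-squared coherence from Hann-windowed, non-overlapping
    segments: |<X Y*>|^2 / (<|X|^2> <|Y|^2>)."""
    n_seg = len(x) // seg_len
    win = np.hanning(seg_len)
    X = np.fft.rfft(x[:n_seg * seg_len].reshape(n_seg, seg_len) * win, axis=1)
    Y = np.fft.rfft(y[:n_seg * seg_len].reshape(n_seg, seg_len) * win, axis=1)
    sxy = np.mean(X * np.conj(Y), axis=0)
    sxx = np.mean(np.abs(X) ** 2, axis=0)
    syy = np.mean(np.abs(Y) ** 2, axis=0)
    return np.fft.rfftfreq(seg_len, d=1.0 / fs), np.abs(sxy) ** 2 / (sxx * syy)


def coh_tempo_and_harmonic(freqs, coh, tempo):
    """Average coherence at the stimulation tempo and its first harmonic."""
    def pick(f):
        return coh[np.argmin(np.abs(freqs - f))]
    return 0.5 * (pick(tempo) + pick(2 * tempo))


fs, tempo = 64, 2.0
t = np.arange(240 * fs) / fs
rng = np.random.default_rng(0)
rhythm = np.sin(2 * np.pi * tempo * t) + np.sin(2 * np.pi * 2 * tempo * t)
feature = rhythm + rng.standard_normal(t.size)
# Hypothetical neural signal: delayed copy of the rhythm plus independent noise.
neural = np.roll(rhythm, 5) + rng.standard_normal(t.size)

freqs, coh = coherence(feature, neural, fs, seg_len=4 * fs)
print("tempo + harmonic:", coh_tempo_and_harmonic(freqs, coh, tempo))
print("off-tempo (3 Hz):", coh[np.argmin(np.abs(freqs - 3.0))])
```

Coherence is insensitive to the constant delay between signals, which is why it is high at 2 and 4 Hz even though the neural copy lags the feature.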

Second, TRFs were calculated for each stimulation tempo. A TRF is a linear filter, estimated by system identification, that describes the mapping of stimulus features onto the neural response (forward model) (Crosse et al., 2016). The TRF was computed by mapping each musical feature onto ‘training’ EEG data using linear convolution, with ridge regression to avoid overfitting. Using a leave-one-trial-out approach, the EEG response for the left-out trial was predicted based on the TRF and the stimulus feature of the same trial. The predicted EEG data were then correlated with the actual, unseen EEG data (we refer to this correlation value throughout as TRF correlation). We analyzed both outputs of the TRF analysis: the filter weights at different time lags, which typically resemble evoked potentials, and the TRF correlations (Figure 3, Figure 3—figure supplement 1).
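The TRF procedure described above can be sketched as follows: a lagged design matrix, a ridge-regularized least-squares fit pooled over training trials, and a leave-one-trial-out prediction correlated with the held-out EEG. This is a minimal single-channel sketch, not the mTRF Toolbox; the sampling rate (64 Hz), lag range (0 to ~400 ms), fixed ridge parameter (λ = 10, normally tuned by cross-validation), and simulated data with a hypothetical response peaking at 150 ms are all assumptions.

```python
import numpy as np


def lagged_design(stim, n_lags):
    """Design matrix whose column j holds the stimulus delayed by j samples."""
    X = np.zeros((len(stim), n_lags))
    for j in range(n_lags):
        X[j:, j] = stim[:len(stim) - j]
    return X


def fit_trf(stims, eegs, n_lags, lam):
    """Ridge-regularized forward TRF pooled over training trials."""
    xtx = np.zeros((n_lags, n_lags))
    xty = np.zeros(n_lags)
    for s, r in zip(stims, eegs):
        X = lagged_design(s, n_lags)
        xtx += X.T @ X
        xty += X.T @ r
    return np.linalg.solve(xtx + lam * np.eye(n_lags), xty)


def loo_trf_correlation(stims, eegs, n_lags, lam):
    """Leave-one-trial-out: train on all other trials, predict the held-out
    EEG, and correlate the prediction with the actual data."""
    rs = []
    for k in range(len(stims)):
        w = fit_trf(stims[:k] + stims[k + 1:], eegs[:k] + eegs[k + 1:], n_lags, lam)
        pred = lagged_design(stims[k], n_lags) @ w
        rs.append(np.corrcoef(pred, eegs[k])[0, 1])
    return float(np.mean(rs))


fs = 64                        # assumed sampling rate (Hz)
n_lags = int(0.4 * fs) + 1     # lags 0 to ~400 ms
lags_s = np.arange(n_lags) / fs
true_trf = np.exp(-((lags_s - 0.15) ** 2) / (2 * 0.03 ** 2))  # hypothetical
rng = np.random.default_rng(0)
stims = [rng.standard_normal(20 * fs) for _ in range(8)]
eegs = [np.convolve(s, true_trf)[:len(s)] + rng.standard_normal(len(s)) for s in stims]

r = loo_trf_correlation(stims, eegs, n_lags, lam=10.0)
w = fit_trf(stims, eegs, n_lags, lam=10.0)
print(f"mean TRF correlation = {r:.2f}; peak lag = {lags_s[np.argmax(w)] * 1000:.0f} ms")
```

The fitted weights `w` play the role of the TRF time course, and `r` corresponds to the TRF correlation.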

Figure 3 (with 3 supplements)
TRFs are tempo dependent.

(A) Mean TRF (± SEM) correlations as a function of stimulation tempo per stimulus feature (p-values next to the legend correspond to a repeated-measure ANOVA across tempi for every musical feature and the p-value below to the slope comparison of a linear regression model). TRF correlations were highest for spectral flux and combined musical features for slow tempi. The TRF correlations were z-scored based on a surrogate distribution (right panel). (B) Violin plots of the TRF correlations across musical features. Boxplots illustrate the median, 25th and 75th percentiles (n=34). Significant pairwise musical feature comparisons were calculated using a repeated-measure ANOVA with follow-up Tukey’s test, *pFDR <0.001. (C) Top panel: Topographies of the TRF correlations and TRF time lags (0–400ms) in response to the amplitude envelope. Each line depicts one stimulation tempo (13 tempi between 1 Hz, blue and 4 Hz, green). Lower panel: Colormap of the normalized TRF weights of the envelope in the same time window across stimulation tempi. (D) Same as (C) for the first derivative, (E) beat onsets and (F) spectral flux. Cluster-based permutation testing was used to identify significant tempo-specific time windows (red dashed box, p<0.05). Inset: Mean TRF weights in response to the spectral flux for time lags between 102 and 211ms (n=34, median, 25th and 75th percentile).

Figure 3—source data 1

Source data of the TRF correlations and weights.

https://cdn.elifesciences.org/articles/75515/elife-75515-fig3-data1-v1.zip

Again, strongest neural synchronization (here quantified as the Pearson correlation coefficient between the predicted and actual EEG data) was observed for slower music (Figure 3A). After the TRF correlations were z-scored with respect to surrogate distributions, as described for the SRCorr and SRCoh measures, repeated-measures ANOVAs revealed significant effects of Tempo for all musical features, with z-TRF correlations being strongest at slower tempi (~1–2 Hz) (Fenv(12,429)=2.47, p=0.004, η2=0.07; Fder(12,429)=1.84, pGG = 0.04, η2=0.05; Fbeat(12,429)=3.81, pGG = 3.18e-4, η2=0.11; Fspec(12,429)=12.87, pGG = 3.87e-13, η2=0.29).

The original tempi of the music segments prior to tempo manipulation fell mostly into the range spanning 1.25–2.5 Hz (Figure 1—figure supplement 2A). Thus, music presented at stimulation tempi in this range was shifted to a smaller degree than music presented at tempi outside of this range, and music presented at slow tempi tended to be shifted to a smaller degree than music presented at fast tempi (Figure 1—figure supplement 2B,C). We therefore conducted a control analysis, which showed no significant effect of the amount of tempo shifting on z-TRF correlations (2.25 Hz: F(2,96)=0.45, p=0.43; 1.5 Hz: F(2,24)=0.49, p=0.49; Figure 3—figure supplement 3; for more details see Materials and methods).

Spectral flux drives strongest neural synchronization

As natural music is a complex, multi-layered auditory stimulus, we sought to explore neural synchronization to different musical features and to identify the stimulus feature or features that would drive strongest neural synchronization. Regardless of the dependent measure (RCA-SRCorr, RCA-SRCoh, TRF correlation), strongest neural synchronization was found in response to spectral flux (Figures 2C, H, 3B). In particular, significant differences (as quantified with a repeated-measure ANOVA followed by Tukey’s test) were observed between the spectral flux and all other musical features based on z-scored SRCorr (FSRCorr(3,132)=39.27, pGG = 1.2e-16, η2=0.55), z-SRCoh (FSRCoh(3,132)=26.27, pGG = 1.72e-12, η2=0.45) and z-TRF correlations (FTRF(4,165)=30.09, pGG = 1.21e-13, η2=0.48).

As the TRF approach allows multivariate analysis, all musical features were combined and the resulting z-scored TRF correlations were compared to the single-feature TRF correlations (Figure 3B). Although z-TRF correlations for the multivariate model were significantly higher than those for the amplitude envelope (repeated-measure ANOVA with follow-up Tukey’s test, pFDR = 1.66e-8), first derivative (pFDR = 1.66e-8), and beat onsets (pFDR = 1.66e-8), spectral flux alone outperformed the multivariate TRF (pFDR = 6.74e-8). Next, we ran a multivariate TRF analysis combining amplitude envelope, first derivative, and beat onsets, and subtracted the predicted EEG data from the actual EEG data (Figure 3—figure supplement 2). We then calculated a TRF forward model using spectral flux to predict the EEG data residualized with respect to this multivariate predictor. The resulting TRF weights were qualitatively similar to those of the model with spectral flux as the only predictor of the neural response. Thus, taking all stimulus features together does not describe the neural response better than spectral flux alone; together with the MI results (Figure 1), this indicates that spectral flux is a more complete representation of the rhythmic structure of the music than the other musical features.
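The residualization control can be sketched in two steps: fit a multivariate TRF on envelope, derivative, and beat onsets; subtract its prediction from the EEG; then fit a spectral-flux TRF to the residual. This minimal sketch simplifies the procedure described above in several labeled ways: a single continuous recording fitted in-sample (no cross-validation), a fixed ridge parameter, mutually independent hypothetical features, and simulated EEG built from hypothetical flux and envelope kernels.

```python
import numpy as np


def lagged_design(stim, n_lags):
    """Design matrix whose column j holds the stimulus delayed by j samples."""
    X = np.zeros((len(stim), n_lags))
    for j in range(n_lags):
        X[j:, j] = stim[:len(stim) - j]
    return X


def ridge(X, y, lam):
    """Closed-form ridge regression weights."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


fs, n_lags = 64, 26            # assumed sampling rate and lags (0 to ~400 ms)
rng = np.random.default_rng(0)
n = 120 * fs
env, deriv, beat, flux = (rng.standard_normal(n) for _ in range(4))  # hypothetical
lags_s = np.arange(n_lags) / fs
k_flux = np.exp(-((lags_s - 0.15) ** 2) / (2 * 0.03 ** 2))   # hypothetical kernel
k_env = -np.exp(-((lags_s - 0.10) ** 2) / (2 * 0.02 ** 2))   # hypothetical kernel
eeg = (lagged_design(flux, n_lags) @ k_flux
       + lagged_design(env, n_lags) @ k_env
       + rng.standard_normal(n))

# Step 1: multivariate TRF on envelope, derivative, and beat onsets.
X_other = np.hstack([lagged_design(f, n_lags) for f in (env, deriv, beat)])
residual = eeg - X_other @ ridge(X_other, eeg, lam=10.0)

# Step 2: spectral-flux TRF fitted to the residualized EEG.
w_flux = ridge(lagged_design(flux, n_lags), residual, lam=10.0)
print("similarity to true flux kernel:", np.corrcoef(w_flux, k_flux)[0, 1])
```

With correlated real features, step 1 would also absorb variance shared with spectral flux, so a flux TRF that survives residualization indicates information not carried by the other predictors.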

To test how strongly tempo modulated TRF correlations for each musical feature, a linear regression was fitted to single-participant z-TRF correlations as a function of tempo, and the slopes were compared across musical features (Figure 3A). Linear slopes were significantly higher for spectral flux and the multivariate model compared to the remaining three musical features (repeated-measure ANOVA with follow-up Tukey’s test, envelope-spectral flux: pFDR = 2.8e-6; envelope – all: pFDR = 2.88e-4; derivative-spectral flux: pFDR = 7.47e-8; derivative – all: pFDR = 2.8e-6; beat-spectral flux: pFDR = 2.47e-8; beat – all: pFDR = 2.09e-5; spectral flux – all: pFDR = 0.01). The results for z-SRCorr were qualitatively similar, except for the comparison between the envelope and spectral flux (envelope-spectral flux: pFDR = 0.12; derivative-spectral flux: pFDR = 0.04; beat-spectral flux: pFDR = 6e-4; Figure 2B).
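Extracting one tempo slope per participant can be sketched with `np.polyfit`, which fits several data sets at once when given a 2-D `y`. The data below are hypothetical (34 participants, 13 tempi, made-up slopes and noise), and the paired t-statistic is only a simplified stand-in for the repeated-measure ANOVA with Tukey's test reported above.

```python
import numpy as np

tempi = np.linspace(1.0, 4.0, 13)    # 1-4 Hz in 0.25 Hz steps


def tempo_slopes(z_corr):
    """Least-squares slope of z-TRF correlation vs. tempo, one per participant.

    z_corr has shape (n_participants, n_tempi)."""
    return np.polyfit(tempi, z_corr.T, deg=1)[0]


def paired_t(a, b):
    """Paired-sample t-statistic for a - b."""
    d = a - b
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))


# Hypothetical z-TRF correlations for two features.
rng = np.random.default_rng(0)
n_part = 34
z_flux = 6.0 - 1.0 * tempi + rng.normal(0, 0.5, (n_part, tempi.size))
z_env = 2.0 - 0.2 * tempi + rng.normal(0, 0.5, (n_part, tempi.size))

s_flux, s_env = tempo_slopes(z_flux), tempo_slopes(z_env)
print(f"mean slopes: flux {s_flux.mean():.2f}, envelope {s_env.mean():.2f}")
print(f"paired t (flux - envelope) = {paired_t(s_flux, s_env):.1f}")
```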

Finally, we examined the time courses of TRF weights (Figure 3C–F) for time lags between 0 and 400 ms, and how they depended on tempo. Cluster-based permutation testing (1,000 repetitions) was used to identify time windows in which TRF weights differed across tempi for each musical feature (see Materials and methods for more details). Significant effects of tempo on TRF weights were observed for spectral flux between 102 and 211 ms (p=0.01; Figure 3F). The tempo effect was visible in the amplitudes of the TRF weights, which were largest for slower music (Figure 3F). The TRFs for the amplitude envelope and first derivative showed similar patterns to each other, with strong deflections in time windows consistent with a canonical auditory P1–N1–P2 complex, but did not differ significantly between stimulation tempi (Figure 3C–D). In contrast, the full-band (Hilbert) amplitude envelope and its first derivative (Figure 3—figure supplement 1) displayed tempo-specific effects at time lags of 250–400 ms (envelope, p=0.01) and 281–400 ms (derivative, p=0.02). Visual inspection suggested that the TRF differences for these musical features were related to latency rather than amplitude (Figure 3—figure supplement 1E–F, I–J). Therefore, we identified the latencies of the TRF-weight time courses within the P3 time window and fitted a piece-wise linear regression to the mean latency values per musical feature (Figure 3—figure supplement 1G, K). TRF latency in the P3 time window decreased across stimulation tempi from 1 to 2.5 Hz and from 2.75 to 4 Hz for both stimulus features, but this decrease was only significant for the envelope (T1-2.5Hz=-6.1, p=0.002, R2=0.86; T2.75-4Hz=-5.66, p=0.005, R2=0.86), not for the derivative (T1-2.5Hz=-1.08, p=0.33, R2=0.03; T2.75-4Hz=-2.2, p=0.09, R2=0.43).
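The exact cluster settings are given in Materials and methods, which is not part of this section, so the following sketch fills them in with assumptions: a one-way repeated-measures F statistic at each time lag as the cluster-forming statistic, a fixed (hypothetical) F threshold, cluster mass defined as the summed F over a run of contiguous supra-threshold lags, and a null distribution of maximum cluster mass built by shuffling tempo labels within each participant. It uses NumPy only, and all names are hypothetical.

```python
import numpy as np

def rm_anova_f(y):
    """One-way repeated-measures F at every time lag.

    y: (n_participants, n_tempi, n_lags). Returns F with shape (n_lags,).
    """
    s, k, _ = y.shape
    grand = y.mean(axis=(0, 1))
    ss_cond = s * ((y.mean(axis=0) - grand) ** 2).sum(axis=0)
    ss_subj = k * ((y.mean(axis=1) - grand) ** 2).sum(axis=0)
    ss_err = ((y - grand) ** 2).sum(axis=(0, 1)) - ss_cond - ss_subj
    return (ss_cond / (k - 1)) / (ss_err / ((k - 1) * (s - 1)))

def find_clusters(stat, threshold):
    """(start, stop) index pairs of contiguous lags where stat > threshold."""
    above = np.concatenate(([False], stat > threshold, [False]))
    edges = np.flatnonzero(np.diff(above.astype(int)))
    return list(zip(edges[::2], edges[1::2]))

def cluster_permutation_test(y, threshold, n_perm=1000, seed=0):
    """Cluster-mass permutation test for a tempo effect on TRF weights."""
    rng = np.random.default_rng(seed)
    s, k, _ = y.shape
    f_obs = rm_anova_f(y)
    clusters = find_clusters(f_obs, threshold)
    masses = [f_obs[a:b].sum() for a, b in clusters]
    rows = np.arange(s)[:, None]
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle tempo labels independently within each participant.
        perm = np.array([rng.permutation(k) for _ in range(s)])
        f_perm = rm_anova_f(y[rows, perm])
        null[i] = max((f_perm[a:b].sum() for a, b in find_clusters(f_perm, threshold)),
                      default=0.0)
    p_values = [(1 + np.sum(null >= m)) / (1 + n_perm) for m in masses]
    return clusters, p_values
```

Cluster indices refer to lag samples; multiplying by the sampling interval converts them to time windows in ms.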

Results of TRF and SRCorr/SRCoh converge

So far, we have demonstrated that RCA- and TRF-based measures of neural synchronization lead to similar results at the group level, revealing the strongest neural synchronization to spectral flux and at slow tempi. Next, we quantified the relationship between SRCorr/SRCoh and TRF correlations across individuals (Figure 4, Figure 4—figure supplement 1), as this relationship has implications for interpreting studies that rely on only one of the two methods. To this end, we predicted TRF correlations from SRCorr or SRCoh values (fixed effect) in separate linear mixed-effects models with Participant and Tempo as random effects (grouping variables). Each musical feature was modeled independently. For all further analyses, we used the ‘raw’ (non-z-scored) values of all dependent measures, since raw and z-scored values yielded qualitatively similar results in the previous analyses (Figures 2 and 3).

Figure 4 (with 1 supplement)
Significant relationships between SRCorr and TRF correlations for all musical features.

(A) Linear mixed-effects models of the SRCorr (predictor variable) and TRF correlations (response variable) in response to the amplitude envelope. Each dot represents the mean correlation of one participant (n=34) at one stimulation tempo (n=13); participant and tempo were the grouping variables (color coding from blue, 1 Hz, to green, 4 Hz). Violin plots illustrate fixed-effects coefficients (β). (B–D) Same as (A) for the first derivative, beat onsets, and spectral flux. For all musical features, the fixed effects were significant.

Figure 4—source data 1

Source data for comparing the results of the TRF and RCA-based measures.

https://cdn.elifesciences.org/articles/75515/elife-75515-fig4-data1-v1.mat

For all four musical features, SRCorr significantly predicted TRF correlations (tenv(440) = 9.77, βenv=0.53, pFDR <1e-15, R2=0.51; tder(440) = 8.09, βder=0.46, pFDR = 5.77e-14, R2=0.28; tbeat(440) = 12.12, βbeat=0.67, pFDR <1e-15, R2=0.61; tspec(440) = 12.49, βspec=0.56, pFDR = 1e-15, R2=0.76). The strongest correlations between neural synchronization measures were found for the beat onsets and spectral flux of music (Figure 4C and D).
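The pFDR values reported throughout are false-discovery-rate-corrected p-values. This section does not state which FDR procedure was used; assuming the common Benjamini–Hochberg step-up procedure, adjusted p-values can be computed as in this NumPy sketch (the function name is hypothetical).

```python
import numpy as np

def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)  # restore the original order
    return out
```

For example, raw p-values of 0.01, 0.04, 0.03 and 0.005 become 0.02, 0.04, 0.04 and 0.02.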

In the frequency domain, we examined SRCoh values at the stimulation tempo and at the first harmonic separately (Figure 4—figure supplement 1). SRCoh values at both the intended stimulation tempo and the first harmonic significantly predicted TRF correlations for all musical features. Except for the beat onsets, the first harmonic was a better predictor of TRF correlations than the intended stimulation tempo (intended tempo: tenv(440) = 4.78, βenv=0.17, pFDR = 3.15e-6, R2=0.34; tder(440) = 3.06, βder=0.1, pFDR = 0.002, R2=0.13; tbeat(440) = 8.12, βbeat=0.28, pFDR = 1.95e-14, R2=0.5; tspec(440) = 3.42, βspec=0.09, pFDR = 7.9e-4, R2=0.64; first harmonic: tenv(440) = 6.17, βenv=0.09, pFDR = 3.07e-9, R2=0.33; tder(440) = 4.98, βder=0.09, pFDR = 1.43e-6, R2=0.16; tbeat(440) = 8.79, βbeat=0.2, pFDR <1e-15, R2=0.51; tspec(440) = 6.87, βspec=0.09, pFDR = 5.82e-11, R2=0.64). Overall, these results suggest that, despite their analytical differences and the different interpretations commonly attached to them, TRF- and RCA-based measures (SRCorr/SRCoh) pick up on similar features of the neural response and may strengthen each other’s explanatory power when used together.
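SRCoh is a stimulus–response coherence read out at specific frequencies. As a rough sketch, assuming a single response time series in place of the RCA component used in the study, Welch-averaged magnitude-squared coherence can be computed with NumPy and read out at the stimulation tempo and at twice that rate (our reading of “first harmonic” here). Segment length, Hann window and 50% overlap are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def welch_coherence(x, y, fs, nperseg):
    """Magnitude-squared coherence via Welch averaging (Hann window, 50% overlap)."""
    win = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, nperseg // 2)

    def spectra(sig):
        # Remove each segment's mean, window it, and take its FFT.
        segs = [sig[i:i + nperseg] - sig[i:i + nperseg].mean() for i in starts]
        return np.fft.rfft(np.array(segs) * win, axis=1)

    X = spectra(np.asarray(x, dtype=float))
    Y = spectra(np.asarray(y, dtype=float))
    sxy = (X * Y.conj()).mean(axis=0)  # averaged cross-spectrum
    sxx = (np.abs(X) ** 2).mean(axis=0)
    syy = (np.abs(Y) ** 2).mean(axis=0)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, np.abs(sxy) ** 2 / (sxx * syy)

def coherence_at(freqs, coh, f0):
    """Coherence at the frequency bin nearest to f0."""
    return coh[np.argmin(np.abs(freqs - f0))]
```

Choosing `nperseg` so that the frequency resolution (`fs / nperseg`) places both the tempo and its harmonic exactly on a bin avoids reading out a neighboring bin.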

Familiar songs and songs with an easy-to-tap beat drive strongest neural synchronization

Next, we tested whether neural synchronization to music depended on (1) how much the song was enjoyed, (2) the familiarity of the song, and (3) how easy it was to tap to the beat of the song; each of these characteristics was rated on a scale from –100 to +100. We hypothesized in particular that difficulty perceiving and tapping to the beat would be associated with weaker neural synchronization. Ratings on all three dimensions are shown in Figure 5A. To evaluate the effects of tempo on the individuals’ ratings, separate repeated-measure ANOVAs were conducted for each behavioral rating. None of the behavioral ratings differed significantly across tempi (enjoyment: F(12,429)=0.58, p=0.85, η2=0.02; familiarity: F(12,429)=1.44, pGG = 0.18, η2=0.04; ease of beat tapping: F(12,429)=1.62, p=0.08, η2=0.05).

Figure 5 (with 3 supplements)
TRF correlations are highest in response to familiar songs.

(A) Normalized (to the maximum value per rating/participant), averaged behavioral ratings of enjoyment, familiarity, and ease of tapping to the beat (± SEM). No significant differences across tempo conditions were observed (repeated-measure ANOVA with Greenhouse–Geisser correction). (B) Topography of mean TRF correlations across all ratings (based on the analysis of the 15 trials with the highest and lowest ratings per behavioral measure). (C) Violin plots of TRF correlations comparing less vs. more enjoyed, less vs. more familiar, and subjectively difficult vs. easy beat trials. The strongest TRF correlations were found in response to familiar music and music with an easy-to-perceive beat (n=34, paired-sample t-test, *pFDR <0.05). Boxplots indicate the median and the 25th and 75th percentiles. (D) Mean TRFs (± SEM) at time lags between 0 and 400 ms for more and less enjoyed songs. (E–F) Same as (D) for trials with low vs. high familiarity and difficult vs. easy beat ratings.

Figure 5—source data 1

Source data of the behavioral ratings and TRF correlations.

https://cdn.elifesciences.org/articles/75515/elife-75515-fig5-data1-v1.zip

To assess the effects of familiarity, enjoyment, and beat-tapping ease on neural synchronization, TRFs in response to spectral flux were calculated for the 15 trials with the highest and the 15 trials with the lowest ratings per participant and behavioral rating (Figure 5B–F). TRF correlations did not differ significantly between less and more enjoyed music (paired-sample t-test, t(33)=1.91, pFDR = 0.06, re = 0.36; Figure 5C). In contrast, TRF correlations were significantly higher for familiar than for unfamiliar songs (t(33)=-2.57, pFDR = 0.03, re = 0.46), and for songs with an easier-to-perceive beat (t(33)=-2.43, pFDR = 0.03, re = 0.44). These results were also reflected in the TRFs at time lags between 0 and 400 ms (Figure 5D–F). To test whether these TRF differences could be attributed to acoustic features such as the beat salience of the musical stimuli, which could affect both the behavioral ratings and the TRFs, we computed single-trial FFTs on the spectral flux of the 15 highest- vs. lowest-rated trials (Figure 5—figure supplement 1). Pairwise comparisons revealed higher stimulus-related FFT peaks for more enjoyed music (t-test, t(33)=-2.79, pFDR = 0.01, re = 0.49), less familiar music (t(33)=2.73, pFDR = 0.01, re = 0.49), and easier-to-perceive beats (t(33)=-3.33, pFDR = 0.01, re = 0.56).
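The acoustic control analysis above compares single-trial FFT peaks of the spectral flux between highest- and lowest-rated trials. A minimal NumPy sketch follows, assuming that the “stimulus-related peak” is the single-sided FFT amplitude at the bin nearest the stimulation tempo, and that per-participant means over the 15 trials in each group are compared with a paired t statistic. The p-value and FDR steps are omitted, and the function names are hypothetical.

```python
import numpy as np

def amplitude_at(x, fs, f0):
    """Single-sided FFT amplitude (2|X|/N) of x at the bin nearest f0, mean removed."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    spectrum = np.abs(np.fft.rfft(x)) * 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - f0))]

def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for a - b."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
    return t, d.size - 1
```

Here `a` and `b` would each hold one value per participant (for example, the mean peak amplitude over the 15 most and 15 least familiar trials), giving 33 degrees of freedom for 34 participants.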

Next, we tested whether musical expertise modulates neural synchronization to music (Doelling and Poeppel, 2015). We used the Goldsmiths Musical Sophistication Index (Gold-MSI) to quantify musical ‘sophistication’, which covers not only years of musical training but also, e.g., musical engagement and self-reported perceptual abilities (Müllensiefen et al., 2014), and correlated it with neural synchronization. No significant correlations were observed between musical sophistication and TRF correlations (Pearson correlation, envelope: R=−0.21, pFDR = 0.32; derivative: R=−0.24, pFDR = 0.31; beats: R=−0.04, pFDR = 0.81; spectral flux: R=−0.34, pFDR = 0.2; Figure 5—figure supplement 2).

Discussion

We investigated neural synchronization to naturalistic, polyphonic music presented at different tempi. The music stimuli varied along a number of dimensions in idiosyncratic ways, including the familiarity and enjoyment of the music, and the ease with which the beat was perceived. The current study demonstrates that neural synchronization is strongest to (1) music with beat rates between 1 and 2 Hz, (2) spectral flux of music, and (3) familiar music and music with an easy-to-perceive beat. In addition, (4) analysis approaches based on TRF and RCA revealed converging results.

Neural synchronization was strongest to music with beat rates in the 1–2 Hz range

The strongest neural synchronization was found in response to stimulation tempi between 1 and 2 Hz in terms of SRCorr (Figure 2B), TRF correlations (Figure 3A), and TRF weights (Figure 3C–F). Tapping behavior also showed a preference for this frequency range: at the group level, the preferred rate for tapping to music was 1.55 Hz (Figure 5—figure supplement 3). Previous studies have shown a preference for listening to music with beat rates around 2 Hz (Bauer et al., 2015), which is also the modal beat rate in Western pop music (Moelants, 2002) and the rate at which the modulation spectrum of natural music peaks (Ding et al., 2017). Even in nonmusical contexts, spontaneous adult human locomotion is characterized by strong energy around 2 Hz (MacDougall and Moore, 2005). Moreover, when asked to move their bodies rhythmically at a comfortable rate, adults spontaneously move at rates around 2 Hz (McAuley et al., 2006), regardless of whether they use their hands or feet (Rose et al., 2020). Thus, there is a tight link between preferred rates of human body movement and preferred rates for the music we make and listen to, and this link was also reflected in our neural data. This is perhaps not surprising, as musical rhythm perception activates motor areas of the brain, such as the basal ganglia and supplementary motor area (Grahn and Brett, 2007), and is associated with increased auditory–motor functional connectivity (Chen et al., 2008). In turn, involving the motor system in rhythm perception tasks improves temporal acuity (Morillon et al., 2014), but only for beat rates in the 1–2 Hz range (Zalta et al., 2020).

The tempo range within which we observed strongest synchronization partially coincides with the original tempo range of the music stimuli (Figure 1—figure supplement 2). A control analysis revealed that the amount of tempo manipulation (difference between original music tempo and tempo at which the music segment was presented to the participant) did not affect TRF correlations. Thus, we interpret our data as reflecting a neural preference for specific musical tempi rather than an effect of naturalness or the amount that we had to tempo shift the stimuli. However, since our experiment was not designed to answer this question, we were only able to conduct this analysis for two tempi, 2.25 Hz and 1.5 Hz (Figure 3—figure supplement 3), and thus are not able to rule out the influence of the magnitude of tempo manipulation on other tempo conditions.

In the frequency domain, SRCoh was strongest at the stimulation tempo and its harmonics (Figure 2D–G, I). In fact, the highest coherence was observed at the first harmonic rather than at the stimulation tempo itself (Figure 2I). This replicates previous work that also showed higher coherence (Kaneshiro et al., 2020) and spectral amplitude (Tierney and Kraus, 2015) at the first harmonic than at the musical beat rate. There are several potential reasons for this finding. One is that the stimulation tempo we defined for each musical stimulus was based on the beat rate, but natural music can be subdivided into smaller units (e.g. notes) that occur at faster time scales. A recent MEG study demonstrated inter-trial phase coherence for note rates up to 8 Hz (Doelling and Poeppel, 2015). Hence, the neural responses to the music stimuli in the current experiment were likely synchronized not only to the beat rate but also to faster elements such as notes. In line with this hypothesis, FFTs conducted on the stimulus features themselves showed higher amplitudes at the first harmonic than at the stimulation tempo for all musical features except the beat onsets (Figure 2J). Another possible explanation is that low-frequency, beat-rate neural responses fall on a steeper part of the 1/f slope and may therefore simply suffer from a worse signal-to-noise ratio than their harmonics.

Regardless of the reason, since frequency-domain analyses separate the neural response into individual frequency-specific peaks, it is easy to interpret neural synchronization (SRCoh) or stimulus spectral amplitude at the beat rate and the note rate – or at the beat rate and its harmonics – as independent (Keitel et al., 2021). However, music is characterized by a nested, hierarchical rhythmic structure, and it is unlikely that neural synchronization at different metrical levels goes on independently and in parallel. One potential advantage of TRF-based analyses is that they operate on relatively wide-band data compared to Fourier-based approaches, and as such are more likely to preserve nested neural activity and perhaps less likely to lead to over- or misinterpretation of frequency-specific effects.

Neural synchronization is driven by spectral flux

Neural synchronization was strongest in response to the spectral flux of music, regardless of whether the analysis was based on TRFs or RCA. As in studies using speech stimuli, music studies typically use the amplitude envelope of the sound to characterize the stimulus rhythm (Vanden Bosch der Nederlanden et al., 2020; Kumagai et al., 2018; Doelling and Poeppel, 2015; Decruy et al., 2019; Reetzke et al., 2021). Although speech and music share features such as amplitude fluctuations over time and hierarchical grouping (Patel, 2003), differences in their spectro-temporal composition make spectral information especially important for music perception. For example, while successful speech recognition requires 4–8 spectral channels, successful recognition of musical melodies requires at least 16 spectral channels (Shannon, 2005); the flip side is that music is more difficult than speech to understand from amplitude-envelope information alone. Moreover, increasing the spectral complexity of a music stimulus enhances neural synchronization (Wollman et al., 2020). Previous work on joint accent structure indicates that spectral information is an important contributor to beat perception (Ellis and Jones, 2009; Pfordresher, 2003). Thus, in designing the current study, we hypothesized that a feature incorporating spectral changes over time, as opposed to amplitude differences only, would better capture how neural activity entrains to musical rhythm.
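Spectral flux is commonly defined as the frame-to-frame increase in the short-time magnitude spectrum, summed over frequency (see e.g. Müller, 2015). The NumPy sketch below implements that textbook definition; the 25 ms Hann window, 10 ms hop, half-wave rectification and absence of log compression are illustrative assumptions, not the parameters used to compute this study's stimulus features, and the function name is hypothetical.

```python
import numpy as np

def spectral_flux(audio, fs, win_s=0.025, hop_s=0.01):
    """Half-wave-rectified spectral flux of a mono signal.

    Each output value is the summed increase in STFT magnitude across all
    frequency bins between one frame and the previous one.
    Returns (flux, feature_rate_hz); the first value is 0 by construction.
    """
    x = np.asarray(audio, dtype=float)
    n_win = int(round(win_s * fs))
    n_hop = int(round(hop_s * fs))
    n_frames = 1 + (len(x) - n_win) // n_hop
    window = np.hanning(n_win)
    frames = np.stack([x[i * n_hop:i * n_hop + n_win] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    increase = np.maximum(np.diff(mag, axis=0), 0.0)  # keep only magnitude increases
    flux = np.concatenate(([0.0], increase.sum(axis=1)))
    return flux, fs / n_hop
```

As an illustration of the point about spectral change: one second of a 440 Hz tone that jumps to 660 Hz at 0.5 s, with constant amplitude throughout, yields a flux peak close to 0.5 s, although an amplitude envelope of the same signal would stay essentially flat.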

Using TRF analysis, we found that neural synchronization to spectral flux was not only stronger than to any other single musical feature, but also stronger than the response to a multivariate predictor combining all musical features. To explore this result, we calculated the mutual information (MI) shared between each pair of musical features and found that spectral flux shared significant information with all other musical features (Figure 1). Hence, spectral flux seems to capture information contained in, for example, the amplitude envelope, while also containing unique information about rhythmic structure that cannot be gleaned from the other acoustic features (Figure 3).
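The MI estimator behind Figure 1 is not described in this section, so the sketch below swaps in a simple plug-in estimate from a 2-D histogram (the bin count is an assumption) and tests significance against circularly shifted surrogates, which preserve each feature's autocorrelation. It is an illustration of pairwise feature MI in general, not the study's method; names and parameters are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in mutual information (bits) from a 2-D histogram of x and y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0  # empty cells contribute nothing
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def mi_shift_test(x, y, n_shifts=200, bins=16, seed=0):
    """Observed MI and a p-value from circularly shifted copies of y."""
    rng = np.random.default_rng(seed)
    observed = mutual_information(x, y, bins)
    # Avoid shifts close to zero, which would leave x and y nearly aligned.
    shifts = rng.integers(len(x) // 10, len(x) - len(x) // 10, size=n_shifts)
    null = np.array([mutual_information(x, np.roll(y, s), bins) for s in shifts])
    return observed, (1 + np.sum(null >= observed)) / (1 + n_shifts)
```

Plug-in histogram estimates are biased upward for short signals, which is one reason to judge them against surrogates rather than against zero.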

One hurdle to performing any analysis of the coupling between neural activity and a stimulus time course is knowing in advance which feature, or set of features, will best characterize the stimulus on the time scale relevant to the research question. Indeed, the feature that best drives neural synchronization need not be the most obvious or prominent stimulus feature. Here, we treated feature comparison as an empirical question (Di Liberto et al., 2015) and found that spectral flux is a better predictor of neural activity than the amplitude envelope of music. Beyond this comparison, feature selection also has important implications for comparisons of neural synchronization across, for example, different modalities.

For example, a recent study found that neural activity synchronizes less strongly to music than to speech (Zuk et al., 2021); notably, this study used the amplitude envelope to characterize the rhythms of both stimulus types. However, our results show that neural synchronization is especially strong to the spectral content of music, and that spectral flux may be a better measure than the amplitude envelope for capturing musical dynamics (Müller, 2015). Imagine listening to a melody played as a glissando on a violin: there might never be a clear onset for the amplitude envelope to represent, because all of the rhythmic structure is communicated by spectral changes. Indeed, many automated beat-extraction tools in the music information retrieval (MIR) literature rely on spectral flux (Olivera et al., 2010). In the context of body movement, too, spectral flux has been associated with the type and temporal acuity of synchronization between the body and music at the beat rate (Burger et al., 2018) to a greater extent than other acoustic characterizations of musical rhythmic structure. Consistent with this, we found that spectral flux synchronized brain activity better than the amplitude envelope.

Neural synchronization was strongest to familiar songs and songs with an easy beat

We found that the strength of neural synchronization depended on the familiarity of the music and the ease with which a beat could be perceived (Figure 5). This is in line with previous studies showing stronger neural synchronization to familiar music (Madsen et al., 2019) and familiar sung utterances (Vanden Bosch der Nederlanden et al., 2022). Moreover, stronger synchronization in musicians than in nonmusicians has been interpreted as reflecting musicians’ stronger expectations about musical structure. On the surface, these findings might appear to contradict work showing stronger responses to music that violated expectations in some way (Kaneshiro et al., 2020; Di Liberto et al., 2020). However, we believe these findings are compatible: familiar music would give rise to stronger expectations and stronger neural synchronization, and stronger expectations would give rise to a stronger ‘prediction error’ when violated. In the current study, the musical stimuli never contained violations of any expectations, and so we observed stronger neural synchronization to familiar than to unfamiliar music. Neural synchronization was also stronger to music with subjectively ‘easy-to-tap-to’ beats. Overall, we interpret our results as indicating that stronger neural synchronization is evoked by more predictable music: familiar music and music with an easy-to-track beat structure.

Musical training did not affect the degree of neural synchronization to tempo-modulated music (Figure 5—figure supplement 2). This contrasts with previous research showing that musicians’ neural activity was entrained more strongly by music than non-musicians’ (Madsen et al., 2019; Doelling and Poeppel, 2015; Di Liberto et al., 2020). There are several possible reasons for this discrepancy. One is that most studies that observed differences between musicians and nonmusicians focused on classical music (Doelling and Poeppel, 2015; Madsen et al., 2019; Di Liberto et al., 2020), whereas we used music stimuli with different instruments and from different genres (e.g. Rock, Pop, Techno, Western, Hip Hop, or Jazz). We suspect that musicians are more likely to be familiar with classical music in particular, and since we have shown that familiarity with the individual piece increases neural synchronization, these studies may have inadvertently confounded musical training with familiarity. Another potential reason for the lack of an effect of musical training in the current study is our choice of acoustic audio descriptors rather than ‘higher order’ musical features. However, ‘higher order’ features that have been shown to be influenced by musical expertise, such as surprise or entropy (Di Liberto et al., 2020), are difficult to compute for natural, polyphonic music.

TRF- and RCA-based measures show converging results

RCA and TRF approaches share the ability to characterize neural responses to single-trial, ongoing, naturalistic stimuli. As such, both techniques afford something that is challenging or impossible to accomplish with ‘classic’ ERP analysis. We used the two techniques in parallel to leverage their unique advantages. RCA allows for frequency-domain analyses such as SRCoh, which can be useful for identifying neural synchronization specifically at the beat rate, for example. This frequency specificity could be an advantage of SRCoh over the TRF measures, which were computed on broadband EEG signals. However, RCA-based approaches (Kaneshiro et al., 2020) have been criticized for their potential susceptibility to autocorrelation, which is argued to be minimized in the TRF approach (Zuk et al., 2021) because it uses ridge regression to dampen fast oscillatory components (Crosse et al., 2021). One concern, however, is that minimizing the effects of autocorrelation could also remove neural oscillations of interest. TRFs also offer both univariate and multivariate analysis approaches, which allowed us to show that adding other musical features to the model did not improve the correspondence to the neural data over and above spectral flux alone.

Despite their differences, we found strong correspondence between the dependent variables from the two types of analyses. Specifically, TRF correlations were strongly correlated with stimulation-tempo SRCoh, and this correlation was higher than for SRCoh at the first harmonic of the stimulation tempo for the amplitude envelope, derivative and beat onsets (Figure 4—figure supplement 1). Thus, despite being computed on a relatively broad range of frequencies, the TRF seems to be correlated with frequency-specific measures at the stimulation tempo. The strong correspondence between the two analysis approaches has implications for how users interpret their results. Although certainly not universally true, we have noticed a tendency for TRF users to interpret their results in terms of a convolution of an impulse response with a stimulus, whereas users of stimulus–response correlation or coherence tend to speak of entrainment of ongoing neural oscillations. The current results demonstrate that the two approaches produce similar results, even though the logic behind the techniques differs. Thus, whatever the underlying neural mechanism, using one or the other does not necessarily allow us privileged access to a specific mechanism.

Conclusions

This study presented new insights into neural synchronization to natural music. We compared neural synchronization to different musical features and showed strongest neural responses to the spectral flux. This has important implications for research on neural synchronization to music, which has so far often quantified stimulus rhythm with what we would argue is a subpar acoustic feature – the amplitude envelope. Moreover, our findings demonstrate that neural synchronization is strongest for slower beat rates, and for predictable stimuli, namely familiar music with an easy-to-perceive beat.

Materials and methods

Participants

Thirty-seven participants completed the study (26 female, 11 male, mean age = 25.7 years, SD = 4.33 years, age range = 19–36 years). The target sample size was estimated using G*Power 3, assuming 80% power to detect a significant medium-sized effect. This yielded a target sample size of 24 (+4) for within-participant condition comparisons and 32 (+4) for correlations; we used the larger value, since this experiment was designed to investigate both types of effects. The values in parentheses were added as padding to allow for discarding ~15% of the recorded data. The datasets of three participants were discarded because of large artifacts in the EEG signal (see section EEG data preprocessing), technical problems, or failure to follow the experimental instructions. The behavioral and neural data of the remaining 34 participants were included in the analysis.

Prior to the EEG experiment, all participants completed an online survey about their demographic and musical background using LimeSurvey (LimeSurvey GmbH, Hamburg, Germany, http://www.limesurvey.org). All participants self-identified as German speakers. Most participants self-reported normal hearing (seven participants reported occasional ringing in one or both ears). Thirty-four participants were right-handed and three were left-handed. Musical expertise was assessed using the Goldsmiths Musical Sophistication Index (Gold-MSI; Müllensiefen et al., 2014). Participants received financial compensation for participating (online survey: €2.50; EEG: €7 per 30 min). All participants signed an informed consent form before starting the experiment. The study was approved by the Ethics Council of the Max Planck Society in compliance with the Declaration of Helsinki (Application No: 2019_04).

Stimuli


The stimulus set started from 39 instrumental versions of musical pieces from different genres, including techno, rock, blues, and hip-hop. The musical pieces were available in *.wav format from the Qobuz Downloadstore (https://www.qobuz.com/de-de/shop). Each musical piece was segmented manually using Audacity (Version 2.3.3, Audacity Team, https://www.audacityteam.org) at musical phrase boundaries (e.g. between chorus and verse), leading to a pool of 93 musical segments with lengths between 14.4 and 38 s. We did not use the beat count from publicly available beat-tracking software, because such software did not track beats reliably across genres. Due to the first Covid-19 lockdown, we assessed the original tempo of each musical segment using an online method. Eight tappers, including the authors, listened to each segment and tapped along on their computer keyboard for a minimum of 17 taps; the tempo was recorded using an online BPM estimation tool (https://www.all8.com/tools/bpm.htm). To select stimuli with strong, unambiguous beats that are easy to tap to, we excluded 21 segments because of high variability in tapped metrical levels (i.e., if more than two tappers tapped at a different metrical level from the others) or poor sound quality.
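The tempo screening step can be sketched as converting each tapper's tap times to a tempo and flagging segments where tappers disagree. The online BPM tool's own algorithm is not described, so the median inter-tap interval used here is an assumption, as is the 15% tolerance for deciding that a tapper chose a different metrical level; only the “more than two tappers” rule comes from the text. Function names are hypothetical.

```python
import numpy as np

def tapped_tempo(tap_times):
    """Tempo in Hz and BPM from tap times (s), via the median inter-tap interval."""
    itis = np.diff(np.sort(np.asarray(tap_times, dtype=float)))
    hz = 1.0 / np.median(itis)
    return hz, 60.0 * hz

def ambiguous_meter(tempi_hz, tol=0.15, max_deviants=2):
    """True if more than max_deviants tappers deviate from the median tempo by > tol."""
    t = np.asarray(tempi_hz, dtype=float)
    deviants = np.abs(t / np.median(t) - 1.0) > tol
    return int(deviants.sum()) > max_deviants
```

For example, 17 taps spaced 0.5 s apart give 2 Hz (120 BPM); a segment where three of eight tappers tapped at half that rate would be flagged for exclusion.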

The remaining 72 segments were then tempo-manipulated using a custom-written MAX patch (Max 8.1.0, Cycling ’74, San Francisco, CA, USA). Each segment was shifted to tempi between 1 and 4 Hz in steps of 0.25 Hz. All musical stimuli were generated using the MAX patch, even if the original tempo coincided with the stimulation tempo. Subsequently, the authors screened all of the tempo-shifted music and eliminated versions in which the tempo manipulation led to acoustic distortions, made individual notes indistinguishable, or made the music excessively repetitive. Overall, 703 music stimuli with durations of 8.3–56.6 s remained. All stimuli had a sampling rate of 44,100 Hz, were converted from stereo to mono, linearly ramped with 500 ms fade-in and fade-out, and root-mean-square normalized using Matlab (R2018a; The MathWorks, Natick, MA, USA). A full overview of the stimulus segments, the original tempi and the modulated tempo range can be found in the Appendix (Appendix 1—table 1, Figure 1—figure supplement 2).
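The final preprocessing steps (stereo to mono, 500 ms linear ramps, RMS normalization) were done in Matlab; an equivalent NumPy sketch follows. Averaging the two channels for the mono conversion, applying the ramps before normalizing, and the `target_rms` value are assumptions, and the function name is hypothetical.

```python
import numpy as np

def prepare_stimulus(audio, fs, ramp_s=0.5, target_rms=0.05):
    """Stereo-to-mono conversion, linear fade-in/out, and RMS normalization.

    audio: (n_samples,) or (n_samples, n_channels). target_rms is hypothetical.
    """
    x = np.array(audio, dtype=float)  # copy, so the caller's array is untouched
    if x.ndim == 2:
        x = x.mean(axis=1)  # average channels to mono
    n_ramp = int(round(ramp_s * fs))
    if n_ramp > 0:
        ramp = np.linspace(0.0, 1.0, n_ramp)
        x[:n_ramp] *= ramp          # linear fade-in
        x[-n_ramp:] *= ramp[::-1]   # linear fade-out
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))
```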

Each participant was assigned to one of four pseudo-randomly generated stimulus lists. Each list comprised 4–4.6 min of musical stimulation per tempo condition (Kaneshiro et al., 2020), resulting in 7–17 different musical segments per tempo and a total of 159–162 segments (trials) per participant. Each segment was presented at most once per tempo but could occur up to three times at different tempi within one experimental session, with a minimum tempo difference of 0.5 Hz between two presentations of the same segment. The presentation order of the musical segments was randomly generated for each participant prior to the experiment. The music stimuli were played at 50 dB sensation level (SL), based on individual hearing thresholds determined using the method of limits (Leek, 2001).

Experimental design


After the EEG electrodes were attached and the participant was seated in an acoustically and electrically shielded booth, the participant was asked to follow the instructions on the computer screen (BenQ Monitor XL2420Z, 144 Hz, 24”, 1920 × 1080, Windows 7 Pro (64-bit)). Auditory and visual stimuli were presented using custom-written Matlab scripts with Psychtoolbox (PTB-3, Brainard, 1997) in Matlab (R2017a; The MathWorks, Natick, MA, USA). The source code for stimulus presentation can be found in the project’s OSF repository (Weineck et al., 2022).

The overall experimental flow for each participant is shown in Figure 1A. First, each participant performed a self-paced spontaneous motor tempo task (SMT; Fraisse, 1982), a commonly used technique to assess an individual’s preferred tapping rate (Rimoldi, 1951; McAuley, 2010). To obtain the SMT, each participant tapped for 30 seconds (3 repetitions) at a comfortable rate with a finger on the table close to a contact microphone (Oyster S/P 1605, Schaller GmbH, Postbauer-Heng, Germany). Second, we estimated each participant’s hearing threshold using the method of limits. All sounds in this study were delivered by a Fireface soundcard (RME Fireface UCX Audiointerface, Audio AG, Haimhausen, Germany) via on-ear headphones (Beyerdynamic DT-770 Pro, Beyerdynamic GmbH & Co. KG, Heilbronn, Germany). After a short three-trial training, the main task was performed. The music stimuli in the main task were grouped into eight blocks of approximately 20 trials each, with the possibility to take a break in between.

Each trial comprised two parts: attentive listening (music stimulation without movement) and tapping (music stimulation + finger tapping; Figure 1A). During attentive listening, one music stimulus was presented (8.3–56.6 s) while the participant looked at a fixation cross on the screen; the participant was instructed to mentally locate the beat without moving. Tapping began after a 1 s interval: the last 5.5 s of the previously heard musical segment were repeated, and participants were instructed to tap a finger to the beat of the musical segment (as indicated by the replacement of the fixation cross by a hand on the computer screen). Note that 5.5 s of tapping data is not sufficient for standard analyses of sensorimotor synchronization; rather, our goal was to confirm that participants tapped at the intended beat rate based on our tempo manipulation. After each trial, participants rated the segment on enjoyment/pleasure, familiarity, and ease of tapping to the beat using the computer mouse on a visual analogue scale ranging from –100 to +100. At the end of the experiment, the participant performed the SMT task again for three repetitions.

EEG data acquisition

EEG data were acquired using BrainVision Recorder (v.1.21.0303, Brain Products GmbH, Gilching, Germany) and a Brain Products actiCap system with 32 active electrodes attached to an elastic cap according to the international 10–20 system (actiCAP 64Ch Standard-2 Layout Ch1-32, Brain Products GmbH, Gilching, Germany). The signal was referenced to the FCz electrode and grounded at the AFz position. Electrode impedances were kept below 10 kΩ. EEG was recorded at a sampling rate of 1000 Hz via a BrainAmp DC amplifier (BrainAmp ExG, Brain Products GmbH, Gilching, Germany). To ensure correct timing between the recorded EEG data and the auditory stimulation, a TTL trigger pulse was sent over a parallel port at the onset and offset of each musical segment, and the stimulus envelope was recorded on an additional channel using a StimTrak (Brain Products GmbH, Gilching, Germany).

Data analysis

Behavioral data

Tapping data were processed offline with a custom-written Matlab script. To extract the taps, the *.wav files were imported and downsampled (from 44.1 kHz to 2205 Hz). The threshold for tap extraction was manually adjusted for each trial (both SMT and music tapping), and trials with irregular tap intervals were rejected. The SMT results were not analyzed as part of this study and will not be discussed further. For music tapping, only trials with at least three taps (two intervals) were included in further analysis. Five participants were excluded from the music tapping analysis due to irregular and inconsistent taps within a trial (exclusion criterion: >40% of trials rejected).
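
The tap-extraction steps can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab script: it assumes a rising-edge threshold detector with a 100 ms refractory period and plain decimation for downsampling, and all function names are hypothetical.

```python
import math

def downsample(signal, factor=20):
    """Keep every `factor`-th sample (44.1 kHz / 20 = 2205 Hz).
    Assumption: plain decimation; a real pipeline would low-pass filter first."""
    return signal[::factor]

def detect_taps(signal, fs, threshold, refractory_s=0.1):
    """Return tap times (s) at upward crossings of `threshold` by |signal|.
    Crossings within `refractory_s` of the previous tap are ignored (hypothetical rule)."""
    taps = []
    last_tap = -math.inf
    was_above = False
    for i, sample in enumerate(signal):
        above = abs(sample) >= threshold
        t = i / fs
        if above and not was_above and t - last_tap >= refractory_s:
            taps.append(t)
            last_tap = t
        was_above = above
    return taps

def inter_tap_intervals(taps):
    return [b - a for a, b in zip(taps, taps[1:])]

def trial_usable(taps, min_taps=3):
    """A trial needs at least three taps (two intervals)."""
    return len(taps) >= min_taps

def exclude_participant(n_rejected, n_trials, max_fraction=0.4):
    """Exclude a participant if more than 40% of trials were rejected."""
    return n_rejected / n_trials > max_fraction
```

In the study the threshold was tuned by hand per trial, which here corresponds to choosing `threshold` separately for each call.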

On each trial, participants rated the musical segment on enjoyment/pleasure, familiarity, and ease of tapping to the beat. The rating scores were normalized to the maximum absolute rating per participant and per category. For the group analysis, the mean and standard error of the mean (SEM) were calculated. To assess the effects of each subjective dimension on neural synchronization, the 15 trials with the highest and the 15 trials with the lowest ratings per participant (regardless of tempo) were further analyzed (see EEG – Temporal Response Function).
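
The rating normalization and the selection of extreme trials amount to a few lines of plain Python. This sketch applies to one participant and one rating category; the function names are hypothetical and ties are assumed to be broken by trial order.

```python
def normalize_ratings(ratings):
    """Scale one participant's ratings in one category by the maximum absolute rating."""
    peak = max(abs(r) for r in ratings)
    return [r / peak for r in ratings] if peak > 0 else list(ratings)

def extreme_trials(ratings, n=15):
    """Indices of the n highest- and n lowest-rated trials (tempo ignored)."""
    order = sorted(range(len(ratings)), key=lambda i: ratings[i])  # ascending rating
    return order[-n:], order[:n]
```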

Audio analysis

We assessed neural synchronization to four different musical features (Figure 1B–C). Note that the term 'musical feature' here describes time-varying properties of music that operate on a time scale similar to that of neural synchronization, as opposed to classical musical elements such as syncopation or harmony. (1) Amplitude envelope: the gammatone-filtered amplitude envelope in the main manuscript, and the absolute value of the full-band Hilbert envelope in the figure supplement; the gammatone filterbank consisted of 128 channels linearly spaced between 60 and 6000 Hz. (2) Half-wave rectified first derivative of the amplitude envelope, which captures energy changes over time and is typically more sensitive to onsets (Daube et al., 2019; Di Liberto et al., 2020). (3) Binary-coded beat onsets (0 = no beat; 1 = beat): a professionally trained percussionist tapped with a wooden drumstick on a MIDI drum pad to the beat of each musical segment at its original tempo (three takes per piece). After latency correction, the final beat times were taken as the average of the two takes with the smallest difference (Harrison and Müllensiefen, 2018). (4) Spectral novelty ('spectral flux'; Müller, 2015), computed using a custom-written Python script (Python 3.6, Spyder 4.2.0) with the packages numpy and librosa. To compute the spectral flux of each sound, the spectra of consecutive frames (frame length = 344 samples) were compared. The calculation is based on the logarithmic amplitude spectrogram and yields a one-dimensional time series of spectral change. All stimulus features were z-scored and downsampled to 128 Hz for computing stimulus–brain synchrony. Because the stimulus features differed slightly in their numbers of samples, they were truncated to a common length.
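
A NumPy sketch of two of these features, following Müller's (2015) textbook definition of spectral novelty rather than the authors' librosa script: log-compressed magnitude spectra of consecutive Hann-windowed frames are differenced, half-wave rectified, and summed over frequency. The 344-sample frame length comes from the text; the hop of 344 samples (about 128 frames/s at 44.1 kHz), the Hann window, and the compression constant `gamma` are assumptions.

```python
import numpy as np

def spectral_flux(x, frame_len=344, hop=344, gamma=1.0):
    """Spectral novelty: half-wave rectified change of the log-compressed
    magnitude spectrum between consecutive frames, summed over frequency."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    log_mag = np.log1p(gamma * np.abs(np.fft.rfft(frames, axis=1)))
    increase = np.maximum(np.diff(log_mag, axis=0), 0.0)   # keep only increases in energy
    return np.concatenate([[0.0], increase.sum(axis=1)])   # one value per frame

def hwr_derivative(envelope):
    """Half-wave rectified first derivative of the amplitude envelope."""
    envelope = np.asarray(envelope, dtype=float)
    return np.maximum(np.diff(envelope, prepend=envelope[0]), 0.0)

def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def match_lengths(*features):
    """Truncate all feature vectors to the length of the shortest one."""
    n = min(len(f) for f in features)
    return [np.asarray(f)[:n] for f in features]
```

Because the rectification discards decreases, a note onset produces a large positive value while an offset contributes almost nothing. This is the property that makes spectral flux and the envelope derivative onset-sensitive.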

To validate that each musical feature contained acoustic cues to our tempo manipulation, we computed a Hamming-windowed discrete Fourier transform of each musical segment (frequency resolution of 0.0025 Hz), and averaged and z-scored the amplitude spectra per tempo and per musical feature (Figure 1C).
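
At a feature sampling rate of 128 Hz, a 0.0025 Hz bin spacing implies a 128 / 0.0025 = 51,200-point DFT. This is longer than any segment (at most about 56.6 s, or roughly 7,250 samples), so we assume the resolution was obtained by zero-padding. The sketch below makes that assumption explicit. Zero-padding interpolates the spectrum more finely but does not add true frequency resolution beyond roughly 1/duration.

```python
import numpy as np

def tempo_spectrum(feature, fs=128.0, resolution=0.0025):
    """Hamming-windowed amplitude spectrum, zero-padded so bins are `resolution` Hz apart."""
    feature = np.asarray(feature, dtype=float)
    n_fft = int(round(fs / resolution))                    # 51,200 points at 128 Hz
    amp = np.abs(np.fft.rfft(feature * np.hamming(len(feature)), n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, amp

def mean_zscored_spectrum(features, fs=128.0):
    """Average the spectra of all segments in one tempo condition, then z-score.
    All segments share one frequency axis because n_fft is fixed."""
    spectra = [tempo_spectrum(f, fs)[1] for f in features]
    mean_amp = np.mean(spectra, axis=0)
    return (mean_amp - mean_amp.mean()) / mean_amp.std()
```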

To assess how much information the different musical features share, a mutual information (MI) score was computed between each pair of musical features (Figure 1D). MI (in bits) quantifies the reduction of uncertainty about one variable after observing a second variable (Cover and Thomas, 2005); because it is computed on time-aligned samples, it is sensitive to the temporal correspondence between features. MI was computed using quickMI from the Neuroscience Information Theory Toolbox with 4 bins, no delay, and a p-value cut-off of 0.001 (Timme and Lapish, 2018). For each stimulus feature, all trials were concatenated in the same order for each tempo condition and stimulation subgroup (Time x 13 Tempi x 4 Subgroups). MI values for pairs of musical features were compared to surrogate datasets in which one musical feature was time-reversed (Figure 1D). To statistically assess the shared information between musical features, a three-way ANOVA was performed (first factor: data vs. surrogate; second factor: tempo; third factor: stimulation subgroup).
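
For intuition, here is a plug-in MI estimate with 4 bins and no delay, together with the time-reversed surrogate. This is a NumPy sketch, not quickMI. The equal-count (quantile) binning rule is an assumption, and the toolbox's significance testing is not reproduced.

```python
import numpy as np

def discretize(x, n_bins=4):
    """Equal-count (quantile) binning into n_bins states (binning rule assumed)."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")

def mutual_information(x, y, n_bins=4):
    """Plug-in MI estimate (bits) between two time-aligned features, no delay."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (bx, by), 1.0)                 # joint histogram of bin pairs
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))

def mi_vs_time_reversed(x, y, n_bins=4):
    """Observed MI and the surrogate MI with y played backwards."""
    y = np.asarray(y)
    return mutual_information(x, y, n_bins), mutual_information(x, y[::-1], n_bins)
```

With 4 equally populated bins, a feature's MI with itself is log2(4) = 2 bits, which is the ceiling for any pair.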

EEG data preprocessing

Unless stated otherwise, all EEG data were analyzed offline using custom-written Matlab code (R2019b; The MathWorks, Natick, MA, USA) combined with the FieldTrip toolbox (Oostenveld et al., 2011). The continuous EEG data were bandpass filtered between 0.5 and 30 Hz (Butterworth filter), re-referenced to the average reference, downsampled to 500 Hz, and epoched from 1 s after stimulus onset (to remove onset responses to the start of the music stimulus) to the end of the initial musical segment presentation (attentive listening part of the trial). Single trials and channels containing large artefacts were removed based on an initial visual inspection. Missing channels were interpolated from neighbouring channels with a maximum distance of 3 (ft_prepare_neighbours). Subsequently, independent component analysis (ICA) was applied to semi-automatically remove artefacts, including eye movements. After transforming the data back from component to electrode space, electrodes that exceeded 4 standard deviations of the mean squared data for at least 10% of the recording time were excluded. If bad electrodes were identified, pre-processing for that recording was repeated after removing the identified electrodes (Kaneshiro et al., 2020). For the RCA analysis, if 10% of an electrode's trial data exceeded a threshold of mean + 2 standard deviations of the single-trial, single-electrode mean squared amplitude, the data of that electrode for the entire trial were replaced by NaNs. Next, noisy transients in the single-trial, single-electrode recordings were rejected: data points exceeding a threshold of two standard deviations of the single-trial, single-electrode mean squared amplitude were replaced by NaNs. This procedure was repeated four times to ensure that all artefacts were removed (Kaneshiro et al., 2020).
For the TRF analysis, which does not operate on NaNs, noisy transients were replaced by estimates using shape-preserving piecewise cubic spline interpolation or by the interpolation of neighbouring channels for single-trial bad electrodes.
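
The iterative NaN-based transient rejection used for the RCA data can be sketched in NumPy as below. We assume the threshold is the mean plus two standard deviations of each electrode's squared amplitude within the trial, recomputed on the surviving samples in each of the four passes. The interpolation used for the TRF data is not shown.

```python
import numpy as np

def reject_transients(trial, n_sd=2.0, n_iter=4):
    """Iteratively replace noisy transients by NaN.

    trial: array (time, electrodes) for one trial. A sample is rejected when its
    squared amplitude exceeds mean + n_sd * SD of that electrode's squared
    amplitude (threshold definition assumed); statistics are recomputed on the
    remaining samples in each pass.
    """
    data = np.array(trial, dtype=float)          # work on a copy
    for _ in range(n_iter):
        power = data ** 2
        threshold = np.nanmean(power, axis=0) + n_sd * np.nanstd(power, axis=0)
        data[power > threshold] = np.nan          # NaN compares False, so stays NaN
    return data
```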

Next, the data were restructured to match the requirements of the RCA or TRF (see sections EEG – Temporal Response Function and EEG – Reliable Component Analysis), downsampled to 128 Hz, and z-scored. If necessary, the neural data were cut to match the exact sample duration of the stimulus feature per trial. For the RCA, the trials in each tempo condition were concatenated, resulting in a time-by-electrode matrix (Time x 32 Electrodes, with the number of time samples varying across tempo conditions). Subsequently, the data of participants in the same subgroup were pooled into a time-by-electrode-by-participant matrix (Time x 32 Electrodes x 9 or 10 Participants, depending on the subgroup). In contrast, for the TRF analysis, trials in the same stimulation condition were not concatenated in time but grouped into cell arrays per participant according to the stimulus condition (Tempo x Trials x Electrodes x Time).

EEG – reliable component analysis

To reduce data dimensionality and enhance the signal-to-noise ratio, we performed RCA (reliable components analysis, also known as correlated components analysis; Dmochowski et al., 2012). RCA finds linear combinations of electrodes (spatial filters) that maximize the correlation between the datasets of different participants. An important feature of this technique is that a single set of spatial weights is shared across participants, which distinguishes it from the related canonical correlation analysis (Madsen et al., 2019). Using the rcaRun Matlab function (Dmochowski et al., 2012; Kaneshiro et al., 2020), the time-by-electrode matrix was transformed into a time-by-component matrix, with the maximum across-participant correlation in the first reliable component (RC1), followed by components with correlation values in descending order. For each RCA calculation (i.e., for each tempo condition and subgroup), the first three RCs were retained, together with forward-model projections for visualizing the scalp topographies. Subsequent analysis steps in the time and frequency domains were conducted on the maximally correlated component, RC1.
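
The core of RCA can be written as a generalized eigenvalue problem: find spatial weights w maximizing (w' Rb w) / (w' Rw w). Here Rw is the pooled within-participant covariance and Rb is the summed cross-participant covariance. The NumPy sketch below follows this standard correlated-components formulation. It is not rcaRun: the shrinkage regularization `reg` is an assumption, and rcaRun's own regularization may differ.

```python
import numpy as np

def rca(datasets, n_components=3, reg=1e-6):
    """Correlated/reliable components shared by all participants.

    datasets: list of (time, electrodes) arrays, one per participant, time-aligned.
    Returns spatial weights W (electrodes, n_components), eigenvalues, and
    forward models A (scalp topographies).
    """
    X = [np.asarray(d, dtype=float) - np.mean(d, axis=0) for d in datasets]
    n_el = X[0].shape[1]
    Rw = sum(x.T @ x for x in X)                  # within-participant covariance
    total = sum(X)
    Rb = total.T @ total - Rw                     # sum over i != j of Xi' Xj
    Rw_reg = Rw + reg * np.trace(Rw) / n_el * np.eye(n_el)
    L_inv = np.linalg.inv(np.linalg.cholesky(Rw_reg))
    M = L_inv @ Rb @ L_inv.T                      # whitened, symmetric problem
    evals, evecs = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(evals)[::-1][:n_components]
    W = L_inv.T @ evecs[:, order]                 # back to electrode space
    A = Rw @ W @ np.linalg.inv(W.T @ Rw @ W)      # forward-model projections
    return W, evals[order], A

def project(datasets, W):
    """Component time courses (time, components) for each participant."""
    return [(np.asarray(d) - np.mean(d, axis=0)) @ W for d in datasets]
```

Because one W is applied to every participant, RC1 time courses are directly comparable across participants. This shared projection is what separates RCA from canonical correlation analysis.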

To examine the correlation between the neural signal and the stimulus over time, the stimulus–response correlation (SRCorr) was calculated for every musical feature, following the procedure of Kaneshiro et al. (2020). In brief, every stimulus feature was concatenated in time across trials of the same tempo condition and subgroup to match the neural component-by-time matrix. The stimulus features were temporally filtered to account for the stimulus–brain time lag, and the filtered stimulus features were correlated with the neural time courses. To create the temporal filter, every stimulus feature was transformed into a Toeplitz matrix in which each column repeats the stimulus-feature time course, shifted by one additional sample relative to the previous column, up to a maximum shift of 1 s, plus an additional intercept column. The filter was estimated by least squares using the Moore-Penrose pseudoinverse of this Toeplitz matrix, and the SRCorr was computed between the filtered stimulus and the RC1 time course. To report the SRCorr, the mean (± SEM) correlation coefficient across tempo conditions was calculated for every stimulus feature. To compare tempo-specificity between musical features, a linear regression was fit to the SRCorr values (and TRF correlations) as a function of tempo for every participant and every musical feature (using fitlm), and the resulting slopes were compared across musical features with a one-way ANOVA.
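
A NumPy sketch of the SRCorr computation under the stated settings (lags of 0 to 1 s at 128 Hz plus an intercept). It assumes the filter is the least-squares solution obtained with the pseudoinverse. The function names are hypothetical. The fit is in-sample with 130 free parameters, so even unrelated signals yield a positive r. This is one reason the text z-scores SRCorr against surrogate distributions.

```python
import numpy as np

def toeplitz_design(stimulus, max_lag):
    """Column k holds the stimulus delayed by k samples (k = 0..max_lag, zero-padded),
    followed by an intercept column of ones."""
    stimulus = np.asarray(stimulus, dtype=float)
    n = len(stimulus)
    X = np.zeros((n, max_lag + 1))
    for k in range(max_lag + 1):
        X[k:, k] = stimulus[:n - k]
    return np.column_stack([X, np.ones(n)])

def sr_corr(stimulus, component, fs=128, max_lag_s=1.0):
    """Least-squares temporal filter via the Moore-Penrose pseudoinverse, then
    Pearson r between the filtered stimulus and the neural component (e.g., RC1)."""
    X = toeplitz_design(stimulus, int(round(max_lag_s * fs)))
    h = np.linalg.pinv(X) @ np.asarray(component, dtype=float)   # temporal filter
    filtered = X @ h
    return float(np.corrcoef(filtered, component)[0, 1]), h
```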

Stimulus–response coherence (SRCoh) quantifies the consistency of phase and amplitude of two signals in a specific frequency band and ranges from 0 (no coherence) to 1 (perfect coherence) (Srinivasan et al., 2007). Here, the magnitude-squared coherence between each stimulus feature and the neural data was computed using the Matlab function mscohere with a 5 s Hamming window and 50% overlap, yielding a frequency range of 0–64 Hz with a 0.125 Hz resolution (consistent with mscohere's default 1024-point FFT for a 640-sample window at 128 Hz). Because strong coherence was found at the stimulation tempo and its first harmonic, the SRCoh values at these frequencies were compared between musical features.
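
The coherence settings can be mirrored with a Welch-style estimate in NumPy: a 640-sample Hamming window, 320-sample overlap, and an FFT length equal to the next power of two (1024). This is a sketch rather than mscohere itself. In particular, it applies no detrending, which is an assumption.

```python
import numpy as np

def ms_coherence(x, y, fs=128.0, win_s=5.0, overlap=0.5):
    """Magnitude-squared coherence |Pxy|^2 / (Pxx * Pyy) from averaged,
    windowed segment spectra (no detrending in this sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nperseg = int(round(win_s * fs))                  # 640 samples
    nfft = 1 << (nperseg - 1).bit_length()            # next power of two: 1024
    step = int(nperseg * (1 - overlap))               # 320 samples
    win = np.hamming(nperseg)
    starts = range(0, len(x) - nperseg + 1, step)
    X = np.array([np.fft.rfft(win * x[s:s + nperseg], n=nfft) for s in starts])
    Y = np.array([np.fft.rfft(win * y[s:s + nperseg], n=nfft) for s in starts])
    pxx = np.mean(np.abs(X) ** 2, axis=0)
    pyy = np.mean(np.abs(Y) ** 2, axis=0)
    pxy = np.mean(np.conj(X) * Y, axis=0)
    return np.fft.rfftfreq(nfft, d=1.0 / fs), np.abs(pxy) ** 2 / (pxx * pyy)
```

The coherence at a given tempo is then `coh[np.argmin(np.abs(freqs - tempo))]`. With 0.125 Hz bins, tempi on the 0.25 Hz grid fall exactly on a bin.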

To control for frequency-specific differences in the overall power of the neural data that could artificially inflate the observed neural synchronization at lower frequencies, the SRCorr and SRCoh values were z-scored against a surrogate distribution (Zuk et al., 2021). Each surrogate was generated by shifting the neural time course by a random amount relative to the musical-feature time courses, which keeps the internal structure of both the neural data and the musical features intact. For each stimulation subgroup and tempo condition, 50 such surrogates were created. The z-score was calculated by subtracting the mean of the surrogate distribution and dividing by its standard deviation.
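
The surrogate z-scoring can be sketched generically for any stimulus-response statistic. We assume circular shifts via `np.roll`, which keep both time courses intact but misalign them, and a minimum shift of 10% of the signal length to avoid near-aligned surrogates. Both choices are assumptions not stated in the text.

```python
import numpy as np

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def surrogate_z(stat_fn, stimulus, response, n_iter=50, seed=None, min_shift=None):
    """z-score an observed statistic against surrogates in which the response is
    circularly shifted by a random lag (shift scheme assumed)."""
    rng = np.random.default_rng(seed)
    response = np.asarray(response, dtype=float)
    n = len(response)
    lo = n // 10 if min_shift is None else min_shift
    observed = stat_fn(stimulus, response)
    null = np.array([stat_fn(stimulus, np.roll(response, rng.integers(lo, n - lo)))
                     for _ in range(n_iter)])
    return (observed - null.mean()) / null.std(), observed, null
```

Passing a different `stat_fn` (for example an SRCoh value at the stimulation tempo) reuses the same null-distribution logic.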

EEG – temporal response function

The TRF is a modeling technique that computes a filter that optimally describes the relationship between stimulus features and the brain response (Ding and Simon, 2012; Crosse et al., 2016). Via linear convolution, the filter describes how the stimulus features map onto the neural response (forward model); ridge regression is used to avoid overfitting (range of regularization parameters λ: 10⁻⁶ to 10⁶). All TRF computations used the Matlab toolbox "The multivariate Temporal Response Function (mTRF) Toolbox" (Crosse et al., 2016). The TRF was calculated with leave-one-out cross-validation across all trials of each stimulation tempo; this procedure was repeated for each musical feature separately, and additionally for all musical features together in a multivariate model (using mTRFcrossval and mTRFtrain), with time lags of 0–400 ms (Di Liberto et al., 2020). For the multivariate TRF, the stimulus features were combined by replacing the single time-lag vector with several time-lag vectors, one for every musical feature (Time x 4 musical features at different time lags). Using mTRFpredict, the neural time course of the left-out trial was predicted from the time course of the corresponding musical feature(s) of that trial. The quality of the prediction was assessed by computing Pearson correlations between the predicted and actual EEG data separately for each electrode (TRF correlations). We averaged over the seven to eight electrodes with the highest TRF correlations that also corresponded to a canonical auditory topography. To quantify differences between TRFs, the mean TRF correlation across stimulation tempi and/or musical features was calculated per participant. The TRF weights across time lags were Fisher-z-scored (Figure 3C–F; Crosse et al., 2016).
Analogous to the SRCorr and SRCoh, the TRF correlations were z-scored by subtracting the mean and dividing by the standard deviation of a surrogate distribution, which was generated by randomly shifting the neural data relative to the musical features during training and prediction of the TRF (50 iterations per participant and stimulation tempo).
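
The leave-one-trial-out ridge TRF can be sketched in NumPy for a single electrode. The sketch builds a lagged design matrix for lags of 0 to 51 samples (about 0–400 ms at 128 Hz) and fits on all but one trial for each λ from 10⁻⁶ to 10⁶. It then correlates the prediction with the held-out trial. This is not the mTRF Toolbox: mTRF scales λ differently, and picking λ from the same held-out correlations used for reporting is a simplification.

```python
import numpy as np

def lag_matrix(stim, lags):
    """(time,) or (time, features) -> (time, features * len(lags)), zero-padded delays."""
    stim = np.asarray(stim, dtype=float)
    if stim.ndim == 1:
        stim = stim[:, None]
    n, f = stim.shape
    X = np.zeros((n, f * len(lags)))
    for j, lag in enumerate(lags):
        X[lag:, j * f:(j + 1) * f] = stim[:n - lag]
    return X

def trf_loo(stims, resps, fs=128, tmax_s=0.4, lambdas=10.0 ** np.arange(-6, 7)):
    """Leave-one-trial-out ridge TRF for one electrode.
    Returns the best lambda and the per-trial prediction correlations."""
    lags = np.arange(0, int(round(tmax_s * fs)) + 1)          # 0..51 samples
    Xs = [lag_matrix(s, lags) for s in stims]
    XtX = [X.T @ X for X in Xs]
    Xty = [X.T @ np.asarray(y, dtype=float) for X, y in zip(Xs, resps)]
    eye = np.eye(Xs[0].shape[1])
    r = np.zeros((len(stims), len(lambdas)))
    for i in range(len(stims)):                               # left-out trial
        A = sum(XtX[:i] + XtX[i + 1:])
        b = sum(Xty[:i] + Xty[i + 1:])
        for j, lam in enumerate(lambdas):
            w = np.linalg.solve(A + lam * eye, b)             # ridge solution
            r[i, j] = np.corrcoef(Xs[i] @ w, resps[i])[0, 1]
    best = int(np.argmax(r.mean(axis=0)))
    return lambdas[best], r[:, best]
```

For the multivariate model, each element of `stims` would be a (time, 4) array. `lag_matrix` then places the lagged copies of all four features side by side.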

We tested the effects of more vs. less tempo-modulated music segments on the neural response by comparing TRF correlations within a stimulation tempo condition (Figure 3—figure supplement 3). To this end, we took up to three trials per participant within the 2.25 Hz stimulation tempo condition whose original tempo ranged from 2.01 to 2.35 Hz and compared them to up to three trials whose original tempo was slower (1.25–1.5 Hz). The same analysis was repeated in the 1.5 Hz stimulation tempo condition (original tempo ~1.25–1.6 Hz vs. originally faster music at ~2.1–2.5 Hz).

TRF weights across time lags were assessed with a cluster-based permutation approach for each musical feature, in which significant data clusters were compared to clusters from a random distribution (Figure 3C–F). To identify time windows in which the TRF weights differentiated the tempo conditions, a one-way ANOVA was performed at each time point. Clusters (runs of consecutive time points) were identified where the p-value was below 0.01, and the size and F-statistic of each cluster were retained. The clusters were then compared to a surrogate dataset that underwent the same procedure, except that the tempo-condition labels were randomly shuffled before entering the ANOVA; this step was repeated 1000 times (permutation testing). The p-value of each cluster was computed as 1 minus the proportion of permutations in which its summed F-value exceeded the summed F-value of the surrogate clusters, and p < 0.05 was considered significant (Figure 3G). This approach yielded significant regions for the full-band (Hilbert) envelope and the derivative (Figure 3—figure supplement 1). Because these clusters reflected differences in latency rather than in amplitude, a latency analysis was conducted: local minima around the grand-average minimum or maximum within the significant time-lag window were identified for every participant and tempo condition, and their latencies were retained. As there was no significant correlation between latencies and tempo conditions, the stimulation tempi were split into two groups upon visual inspection (1–2.5 Hz and 2.75–4 Hz). Subsequently, a piecewise linear regression was fitted to the data, and the R² and p-values were calculated (Figure 3—figure supplement 1).
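
A NumPy sketch of the cluster-based permutation test. It uses a one-way between-groups F at every time lag and runs of supra-threshold time points as clusters, with summed F as the cluster mass. The null distribution is the maximum cluster mass per permutation, the common Maris and Oostenveld variant, which may differ in detail from the comparison used here. We also assume the cluster-forming threshold is passed as an F value (the F corresponding to p < 0.01 for the given degrees of freedom), and we ignore the repeated-measures structure.

```python
import numpy as np

def oneway_f(data, labels):
    """One-way ANOVA F at every time point. data: (observations, time)."""
    grand = data.mean(axis=0)
    ss_b = np.zeros(data.shape[1])
    ss_w = np.zeros(data.shape[1])
    groups = np.unique(labels)
    for g in groups:
        d = data[labels == g]
        ss_b += len(d) * (d.mean(axis=0) - grand) ** 2
        ss_w += ((d - d.mean(axis=0)) ** 2).sum(axis=0)
    return (ss_b / (len(groups) - 1)) / (ss_w / (len(data) - len(groups)))

def find_clusters(mask):
    """Runs of consecutive True values as (start, stop) index pairs."""
    runs, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i
        elif not m and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs

def cluster_permutation(data, labels, f_thresh, n_perm=1000, seed=None):
    """Return (start, stop, summed F, p) for every observed cluster."""
    rng = np.random.default_rng(seed)
    data, labels = np.asarray(data, dtype=float), np.asarray(labels)
    f_obs = oneway_f(data, labels)
    observed = [(a, b, float(f_obs[a:b].sum())) for a, b in find_clusters(f_obs > f_thresh)]
    null_max = np.zeros(n_perm)
    for k in range(n_perm):                          # shuffle tempo labels
        f_perm = oneway_f(data, rng.permutation(labels))
        masses = [f_perm[a:b].sum() for a, b in find_clusters(f_perm > f_thresh)]
        null_max[k] = max(masses, default=0.0)
    # p = 1 - proportion of permutations in which the observed mass exceeded the null
    return [(a, b, mass, float(np.mean(null_max >= mass))) for a, b, mass in observed]
```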

To test whether spectral flux predicted the neural signal over and above the information it shared with the amplitude envelope, first derivative, and beat onsets, we calculated TRFs for spectral flux after 'partialing out' their effects (Figure 3—figure supplement 2). This was achieved by (1) calculating TRF predictions from a multivariate model comprising the amplitude envelope, first derivative, and beat onsets, (2) subtracting those predictions from the actual EEG data, and (3) using the residual EEG data to compute a spectral flux model.
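
The partialing-out logic is sketched below in NumPy: fit a multivariate lagged ridge model on the covariates, subtract its prediction, then model the residual with spectral flux. For brevity the sketch fits and predicts in-sample with a fixed λ. The described analysis used cross-validated TRF predictions. The helper names are hypothetical.

```python
import numpy as np

def lag_matrix(stim, lags):
    """(time,) or (time, features) -> (time, features * len(lags)), zero-padded delays."""
    stim = np.asarray(stim, dtype=float)
    if stim.ndim == 1:
        stim = stim[:, None]
    n, f = stim.shape
    X = np.zeros((n, f * len(lags)))
    for j, lag in enumerate(lags):
        X[lag:, j * f:(j + 1) * f] = stim[:n - lag]
    return X

def ridge_fit_predict(stim, resp, lags, lam=1.0):
    """Fit a lagged ridge model and return its (in-sample) prediction."""
    X = lag_matrix(stim, lags)
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ np.asarray(resp, dtype=float))
    return X @ w

def partial_out(resp, covariates, lags, lam=1.0):
    """Residual EEG after removing the prediction from the covariates
    (e.g., envelope, derivative, and beat onsets as columns)."""
    return np.asarray(resp, dtype=float) - ridge_fit_predict(covariates, resp, lags, lam)
```

Any variance that spectral flux shares with the covariates is attributed to the covariates. A flux model fitted to the residual therefore captures only its unique contribution.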

TRFs were evaluated based on participant ratings of enjoyment, familiarity, and ease of tapping to the beat. Two TRFs were calculated per participant from the trials with the 15 highest and the 15 lowest ratings on each measure (ignoring tempo condition and subgroup), and the TRF correlations and time lags were compared between the two groups of trials (Figure 5). Significant differences between groups were evaluated with paired-sample t-tests.

The effect of musical sophistication was analyzed by computing the Pearson correlation coefficients between the maximum TRF correlation across tempi per participant and the general musical sophistication (Gold-MSI) per participant (Figure 5—figure supplement 2).

EEG – comparison of TRF and RCA measures

The relationship between the TRF analysis approach and the SRCorr was assessed using a linear mixed-effects model (using fitlme), with participant and tempo as random (grouping) effects, SRCorr as the fixed (predictor) effect, and TRF correlations as the response variable. To check the model assumptions, the residuals of the linear mixed-effects model were plotted and checked for consistency. The best linear unbiased predictors of the random effects and the fixed-effects coefficients (beta) were computed for every musical feature and illustrated as violin plots (Figure 4).

Statistical analysis

For each analysis, we assessed the overall difference between multiple subgroups using a one-way ANOVA. To test for significant differences across tempo conditions and musical features (TRF correlation, SRCorr, and SRCoh), repeated-measures ANOVAs were conducted, followed by Tukey's test; the Greenhouse-Geisser correction was applied when the assumption of sphericity was violated (as assessed with Mauchly's test). As effect size measures, we report partial η² for repeated-measures ANOVAs and r_equivalent for paired-sample t-tests (Rosenthal and Rubin, 2003). Where applicable, p-values were corrected using the false discovery rate (FDR).
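
The text does not name the FDR procedure. Assuming the standard Benjamini-Hochberg step-up method, the adjusted p-values can be computed in plain Python as follows (function name hypothetical):

```python
def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg adjusted p-values and rejection flags at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])     # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                         # from largest p downwards
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min                        # enforce monotonicity
    return adjusted, [a <= q for a in adjusted]
```

For example, p = [0.01, 0.04, 0.03, 0.20] gives adjusted values [0.04, 0.053, 0.053, 0.20], so only the first test survives at q = 0.05.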

Appendix 1

Appendix 1—table 1
Overview of the music stimuli.

Parameters of stimulus creation for all 72 musical stimulus segments. The columns indicate (1) the stimulus number, (2) the title of the musical piece, (3) the artist of each musical piece, (4) the CD each piece was taken from (available at the Qobuz download store), (5) the timestamp of the music segment onset relative to the start of the recording [min.sec,ms], (6) the duration of the music segment [s], (7) the original tempo of the excerpt [BPM (beats per minute) / Hz], based on the taps of the authors and their colleagues, and (8) the frequency range [Hz] of the tempo modulation (in 0.25 Hz steps) for each music piece.

| No. | Title | Artist | CD | Start [min.sec,ms] | Duration [s] | Tempo [BPM / Hz] | Range [Hz] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Abba Medley | Super Troopers | Instrumental Pop Hits | 7.30,71 | 22,99 | 136.09 / 2.27 | 1.5–3.75 |
| 2 | Abba Medley | Super Troopers | Instrumental Pop Hits | 8.59,57 | 21,69 | 135.92 / 2.27 | 1.75–4 |
| 3 | All is Alive | Francesco P. | Instrumental Hits, Vol.1 | 1.22,85 | 21,41 | 128.27 / 2.14 | 1.5–4 |
| 4 | All is Alive | Francesco P. | Instrumental Hits, Vol.1 | 2.06,72 | 21,62 | 127.98 / 2.13 | 1.5–4 |
| 5 | Apache | The Shadows | Rock Story "Instrumental Versions" | 0.39,59 | 14,78 | 135.22 / 2.25 | 1.75–4 |
| 6 | Apache | The Shadows | Rock Story "Instrumental Versions" | 0.54,25 | 21,60 | 133.86 / 2.23 | 1.75–4 |
| 7 | La Bikina | Rubén Fuentes Gasson | Bachata | 0.47,49 | 14,99 | 124.83 / 2.08 | 1.75–4 |
| 8 | Bulldog | The Ventures | Rock Story "Instrumental Versions" | 0.06,15 | 18,13 | 151.33 / 2.52 | 2–4 |
| 9 | Careless Whisper | Mads Haaber | Instrumental Pop Hits | 2.41,93 | 25,50 | 76.93 / 1.28 | 1–2.75 |
| 10 | Cocaine | Corben Cassavette | Instrumental Pop Hits | 1.39,29 | 25,74 | 105.13 / 1.75 | 1.5–4 |
| 11 | Dark Place | Beataddictz | Street Beatz, Vol.2 | 0.20,83 | 22,22 | 92.13 / 1.54 | 1–3.75 |
| 12 | F.B.I. | The Shadows | Rock Story "Instrumental Versions" | 0.20,59 | 15,31 | 140.05 / 2.33 | 1.25–4 |
| 13 | Five Trips | Tr3ntatr3 Giri | Instrumental Hits, Vol.1 | 1.48,79 | 16,48 | 123.24 / 2.05 | 1.5–4 |
| 14 | Guybo | Eddie Cochran | Rock Story "Instrumental Versions" | 0.18,99 | 16,28 | 110.00 / 1.83 | 1.5–3 |
| 15 | Gypsy Salsa, Cha Cha Beat | Corp Latino Dance Group | Hot Latin Dance | 1.57,06 | 24,26 | 100.04 / 1.67 | 1.5–3 |
| 16 | Highway Riderz | Beataddictz | Street Beatz, Vol.2 | 0.19,61 | 30,30 | 97.13 / 1.62 | 1–4 |
| 17 | In Go | Chuck Berry | Rock Story "Instrumental Versions" | 0.26,10 | 24,00 | 116.30 / 1.94 | 1.5–4 |
| 18 | Oh by Jingo! | Chet Atkins | Rock Story "Instrumental Versions" | 0.07,47 | 23,59 | 120.55 / 2.01 | 1–3.5 |
| 19 | Keep It 1,000 | Beataddictz | Street Beatz, Vol.2 | 1.13,87 | 25,10 | 78.06 / 1.30 | 1–3 |
| 20 | The Last Day | Beataddictz | Street Beatz, Vol.2 | 1.27,40 | 22,52 | 88.13 / 1.47 | 1–2.5 |
| 21 | For the Last Time | Beataddictz | Street Beatz, Vol.2 | 1.16,88 | 26,21 | 74.86 / 1.25 | 1–3.25 |
| 22 | Lights Out | Beataddictz | Street Beatz, Vol.2 | 0.32,12 | 21,60 | 89.12 / 1.49 | 1.25–3.5 |
| 23 | I like | Francesco P. | Instrumental Hits, Vol.1 | 1.34,19 | 27,95 | 112.13 / 1.87 | 1.25–3.75 |
| 24 | Dark Line | Alex Cundari | Instrumental Hits, Vol.1 | 0.37,81 | 19,75 | 106.08 / 1.77 | 1.25–3 |
| 25 | Live Forever | The Wonderwalls | Instrumental Pop Hits | 1.16,28 | 22,61 | 90.11 / 1.50 | 1.5–3.75 |
| 26 | Lucy in the Sky with Diamonds | Ricardo Caliente | Instrumental Pop Hits | 2.43,55 | 23,93 | 81.11 / 1.35 | 1–3.5 |
| 27 | Monalisa | Ken Laszlo | Instrumental Hits, Vol.1 | 1.13,37 | 29,79 | 129.97 / 2.17 | 1.5–4 |
| 28 | Monalisa | Ken Laszlo | Instrumental Hits, Vol.1 | 2.13,01 | 28,80 | 130.02 / 2.17 | 1.5–4 |
| 29 | Can't Fight the Moonlight | Jon Carran | Instrumental Pop Hits | 1.10,02 | 18,56 | 97.95 / 1.63 | 1.5–3.75 |
| 30 | Muy Tranquilo | Gramatik | Muy Tranquilo | 2.05,12 | 6,59 | 90.04 / 1.50 | 1–3.25 |
| 31 | No Mercy | Beataddictz | Street Beatz, Vol.2 | 0.12,15 | 26,41 | 76.11 / 1.27 | 1–3.5 |
| 32 | I'm A Pusha | Beataddictz | Street Beatz, Vol.2 | 0.11,07 | 22,60 | 85.15 / 1.42 | 1–3.25 |
| 33 | Rockin' the Blues Away | Tiny Grimes Quintet | Rock Story "Instrumental Versions" | 0.05,61 | 20,84 | 141.05 / 2.35 | 1.5–4 |
| 34 | The Rocking Guitar | Ini Kamoze | Rock Story "Instrumental Versions" | 0.17,39 | 16,21 | 118.66 / 1.98 | 1.5–3.25 |
| 35 | Country Rodeo Song | Marco Rinaldo | Country Instrumental Mix | 1.46,35 | 27,70 | 112.94 / 1.88 | 1.5–3.75 |
| 36 | I Shot the Sheriff | Corben Cassavette | Instrumental Pop Hits | 0.19,52 | 25,54 | 94.12 / 1.57 | 1.25–4 |
| 37 | Sing Sing Sing | Benny Goodman | Sing Sing Sing | 0.18,23 | 36,01 | 108.68 / 1.81 | 1.5–3 |
| 38 | Si Una Vez | Pete Astudillo | Bachata | 0.46,54 | 16,95 | 124.54 / 2.08 | 1.5–3 |
| 39 | I'm Still Standing | Ricardo Caliente | Instrumental Pop Hits | 0.39,14 | 21,72 | 86.04 / 1.43 | 1.25–2.75 |
| 40 | Streets On Fire | Beataddictz | Street Beatz, Vol.2 | 0.23,33 | 24,99 | 81.16 / 1.35 | 1–3.75 |
| 41 | Tequila | The Champs | Rock Story "Instrumental Versions" | 1.02,85 | 21,45 | 89.40 / 1.49 | 1–3 |
| 42 | Vegas Dream | Vegas Project | Instrumental Hits, Vol.1 | 1.13,73 | 22,73 | 128.08 / 2.13 | 1.5–3.25 |
| 43 | I Can't Wait | Alex Cundari | Instrumental Hits, Vol.1 | 0.23,73 | 24,08 | 83.59 / 1.39 | 1–2.5 |
| 44 | Who Dat | Beataddictz | Street Beatz, Vol.2 | 1.15,58 | 24,50 | 87.10 / 1.45 | 1–3.25 |
| 45 | Abba Medley | Super Troopers | Instrumental Pop Hits | 0.28,38 | 27,05 | 136.09 / 2.27 | 1.25–4 |
| 46 | Abba Medley | Super Troopers | Instrumental Pop Hits | 5.09,89 | 21,71 | 136.09 / 2.27 | 1–3 |
| 47 | Abba Medley | Super Troopers | Instrumental Pop Hits | 6.03,59 | 20,64 | 136.09 / 2.27 | 1.5–4 |
| 48 | La Bikina | Rubén Fuentes Gasson | Bachata | 1.18,31 | 46,00 | 124.83 / 2.08 | 1.75–4 |
| 49 | Bulldog | The Ventures | Rock Story "Instrumental Versions" | 0.44,98 | 19,20 | 151.33 / 2.52 | 2–4 |
| 50 | Bulldog | The Ventures | Rock Story "Instrumental Versions" | 1.21,25 | 38,85 | 151.33 / 2.52 | 2–4 |
| 51 | Careless Whisper | Mads Haaber | Instrumental Pop Hits | 3.09,44 | 24,41 | 76.93 / 1.28 | 1–2.75 |
| 52 | Dark Place | Beataddictz | Street Beatz, Vol.2 | 1.45,73 | 35,10 | 92.13 / 1.54 | 1–3.75 |
| 53 | F.B.I. | The Shadows | Rock Story "Instrumental Versions" | 0.37,30 | 20,91 | 140.05 / 2.33 | 1.5–4 |
| 54 | Guybo | Eddie Cochran | Rock Story "Instrumental Versions" | 0.36,31 | 17,53 | 110.00 / 1.83 | 1.5–3 |
| 55 | Highway Riderz | Beataddictz | Street Beatz, Vol.2 | 0.59,19 | 20,40 | 97.13 / 1.62 | 1–4 |
| 56 | In Go | Chuck Berry | Rock Story "Instrumental Versions" | 0.48,42 | 26,80 | 116.30 / 1.94 | 1.5–4 |
| 57 | Oh by Jingo! | Chet Atkins | Rock Story "Instrumental Versions" | 0.40,89 | 23,00 | 120.55 / 2.01 | 1.25–3.5 |
| 58 | Live Forever | The Wonderwalls | Instrumental Pop Hits | 1.41,02 | 34,00 | 90.11 / 1.50 | 1.25–3.5 |
| 59 | Live Forever | The Wonderwalls | Instrumental Pop Hits | 3.35,72 | 25,80 | 90.11 / 1.50 | 1.25–3.5 |
| 60 | Lucy in the Sky with Diamonds | Ricardo Caliente | Instrumental Pop Hits | 3.06,40 | 23,00 | 81.11 / 1.35 | 1–3.5 |
| 61 | Can't Fight the Moonlight | Jon Carran | Instrumental Pop Hits | 1.42,26 | 21,82 | 97.95 / 1.63 | 1.5–3.75 |
| 62 | No Mercy | Beataddictz | Street Beatz, Vol.2 | 0.50,16 | 26,74 | 76.11 / 1.27 | 1–3.5 |
| 63 | No Mercy | Beataddictz | Street Beatz, Vol.2 | 2.33,53 | 26,41 | 76.11 / 1.27 | 1–3.5 |
| 64 | Rockin' the Blues Away | Tiny Grimes Quintet | Rock Story "Instrumental Versions" | 0.47,65 | 25,50 | 141.05 / 2.35 | 1.5–4 |
| 65 | Rockin' the Blues Away | Tiny Grimes Quintet | Rock Story "Instrumental Versions" | 1.12,78 | 36,07 | 141.05 / 2.35 | 1.5–4 |
| 66 | The Rocking Guitar | Ini Kamoze | Rock Story "Instrumental Versions" | 0.33,58 | 15,08 | 118.66 / 1.98 | 1.5–3.25 |
| 67 | Country Rodeo Song | Marco Rinaldo | Country Instrumental Mix | 2.13,99 | 24,03 | 112.94 / 1.88 | 1.5–3.75 |
| 68 | Sing Sing Sing | Benny Goodman | Sing Sing Sing | 1.08,65 | 18,87 | 108.68 / 1.81 | 1.5–3 |
| 69 | Sing Sing Sing | Benny Goodman | Sing Sing Sing | 2.46,03 | 36,29 | 108.68 / 1.81 | 1.5–3 |
| 70 | Si Una Vez | Pete Astudillo | Bachata | 1.03,70 | 29,27 | 124.54 / 2.08 | 1.5–3 |
| 71 | Streets on fire | Beataddictz | Street Beatz, Vol.2 | 2.00,63 | 38,00 | 81.16 / 1.35 | 1–3.75 |
| 72 | I Can't Wait | Alex Cundari | Instrumental Hits, Vol.1 | 1.10,71 | 34,48 | 83.59 / 1.39 | 1–2.5 |

Data availability

The source data and source code of all main results, the source code of the musical stimulus presentation and the raw EEG data are freely available on the OSF repository (https://doi.org/10.17605/OSF.IO/Y5XHS).

The following data sets were generated
    1. Weineck K
    2. Wen O
    3. Henry MJ
    (2022) Open Science Framework
    Neural synchronization is strongest to the spectral flux of slow music and depends on familiarity and beat salience.
    https://doi.org/10.17605/OSF.IO/Y5XHS

References

  1. Book
    1. Fraisse P
    (1982)
    6 - Rhythm and Tempo, Psychology of Music
    Academic Press.
  2. Book
    1. Jones MR
    (1993)
    Dynamics of Musical Patterns: How Do Melody and Rhythm Fit Together? Psychology and Music: The Understanding of Melody and Rhythm
    Lawrence Erlbaum Associates, Inc.
  3. Book
    1. McAuley JD
    (2010)
    Tempo and Rhythm. Music Perception
    Springer.
  4. Conference
    1. Moelants D
    (2002)
    Preferred tempo reconsidered
    Proceedings of the 7th International Conference on Music Perception and Cognition. pp. 1–4.
  5. Journal Article
    1. Rimoldi HJA
    (1951)
    Personal tempo
    The Journal of Abnormal and Social Psychology 46:283–303.
    https://doi.org/10.1037/h0057479
  6. Book
    1. Shannon RV
    (2005)
    Speech and Music Have Different Requirements for Spectral Resolution
    Academic Press.

Decision letter

  1. Ole Jensen
    Reviewing Editor; University of Birmingham, United Kingdom
  2. Barbara G Shinn-Cunningham
    Senior Editor; Carnegie Mellon University, United States
  3. Benedikt Zoefel
    Reviewer; CNRS, France

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Neural entrainment is strongest to the spectral flux of slow music and depends on familiarity and beat salience" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Barbara Shinn-Cunningham as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Benedikt Zoefel (Reviewer #1) and Nate Zuk (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

In general, the reviewers were positive; however, as they pointed out, more work needs to be done to validate that the results are not a consequence of the analyses or of the specific choice of music before tempo manipulation. It would also be important to better clarify the concepts of neural entrainment, synchronization, and neural tracking and their meaning in this specific context.

Reviewer #1 (Recommendations for the authors):

I believe the authors could emphasize their most important results more when they appear in the Results section. An example is the beginning of page 11 where one important sentence ("SRCoh was highest…") is hidden in the text. This result is interesting – highlighting it would also make main results easier to extract.

RCA: "These approaches have been criticized because of their potential susceptibility to autocorrelation". As the authors mention a possible involvement of neural oscillations (in their introduction), it could be useful to point out that a removal of autocorrelation (to calculate TRF) might actually remove oscillations as well.

Due to differences in x and y axes, I was initially confused by Figure 1c, wondering why stimulation tempo (Hz) does not correspond to FFT frequency (Hz). Maybe the authors could include lines to show where stimulation tempo = FFT frequency and the first harmonic? This would make the relevant information much easier to extract.

I leave it to the authors, but an additional point that might be worth discussing is the fact that humans seem to be most sensitive to amplitude modulations at higher frequencies (around 4 Hz) than those that seem to play an important role in the current study (1-2 Hz). This is for example summarized in a review by Edwards and Chang (2013, Hear Res). Other relevant work is that by Teng, Poeppel and colleagues showing theta activity to most reliably follow acoustic rhythms. Could the authors discuss whether this means that music is special or other reasons for relatively low preferred rates in their work?

Reviewer #2 (Recommendations for the authors):

The paper was very nice but sometimes hard to read because I am not so confident about the difference between engagement, entrainment, synchronization and tracking. In the literature, those terms are sometimes used interchangeably, and sometimes they are used with a precise meaning. I suggest the authors think about rephrasing parts of the paper and clarifying those terms from the beginning to broaden the range of readers who can enjoy the paper in depth. I found very few typos, and the visualizations were nice. Sometimes I think that the plots are too small and hard to read, and the x/y axis proportions must be chosen carefully.

Reviewer #3 (Recommendations for the authors):

p. 4, line 72: "…and we are not aware…tempo-modulated". I think Kaneshiro et al., 2020 fits this description. They did not use a controlled spacing of tempos like was done in this study, but they used naturalistic polyphonic music with a variety of tempos. Rajendran et al., 2020 also fits this description, although they did extracellular recordings and not EEG.

p. 6, line 106-107: "…spectral flux is sensitive…changes in amplitude." Can you provide a citation where rhythmic information is provided by changes in pitch but not in amplitude, perhaps including a behavioral measure of rhythm salience?

Figure 1C: I think the reason the z-scored amplitude decreases with decreasing stimulation tempo is because z-scoring reduces the magnitude of the components as more non-zero spectral components are introduced in the range of interest. This could be confusing to readers given that EEG may track these tempos better. The authors could alternatively z-score the original signals (before computing the spectra), which might result in more consistent amplitudes across stimulation tempo.

p. 20, lines 368-370: "In this study…tapped frequency." I think this should go earlier in the results, around where Figure 1 is presented, in order to clarify the difference between "stimulation tempo" (which is referred to earlier) and "tapped beat rate".

p. 25-26, lines 481-510: This discussion aims to highlight the importance of considering other stimulus features besides the envelope (like spectral flux) when comparing the neural tracking of speech and music, partly via criticizing Zuk et al. (2021) for focusing on the envelope. However, the current study does not fully back up some of the authors' critiques, mainly because speech stimuli were not included in this study. We don't know how much spectral flux improves EEG prediction of speech over envelope-based measures, although it likely will (see the use of spectrotemporal changes and acoustic edges in Daube et al., 2019). But we can instead compare the prediction accuracies to Di Liberto et al., 2015, Current Biology.

Envelope-based forward modeling of speech produces the worst average prediction accuracies around 0.05, and the accuracies increase as features such as the spectrogram and phonetic features are added (Figure 2 of that paper). These prediction accuracies are above the prediction accuracies in the current study (0.04 on average with spectral flux, and 0.06 for the slowest tempos, see Figure 3 of this study), supporting the possibility that speech is generally tracked better than music. I agree it is possible that other musical features may predict EEG better and could produce comparable prediction accuracies to speech. But for the discussion here, the authors should instead focus on how similarities and differences in relevant features for speech and music could affect interpretations for studies comparing neural tracking, which would nicely follow the first paragraph of this section. More specifically, the authors should clarify what "characterize stimuli as fairly as possible" (line 509) means and suggest alternative ways of comparing speech and music. If they choose to criticize studies using the envelope alone (which might not be necessary to make their point), I recommend also acknowledging that their study does not directly address questions about speech and music comparisons and that more work needs to be done to understand the differences between speech and music tracking.

p. 25, lines 497-498: "…we found…envelope." It is not clear if spectral flux causes stronger entrainment given the data. Rephrase to: "…we found that spectral flux was tracked better than the amplitude envelope."

p. 26, lines 520-521: "…and ease of beat tapping…stimulation tempi." The statistical analysis of the data in Figure 3 showed that ease of beat tapping did not significantly vary with stimulation tempo (lines 327-329), which contradicts this statement.

p. 28, lines 532-535: "One is that our study…entrainment." I don't understand why this would matter, because the analysis was done after selecting "musical" and "non-musical" subjects. Perhaps cut this?

p. 29, line 594: "correlate" should be "correlated".

p. 32, line 647-648: "Each segment…0.25 Hz." The details for the range of tempi are in Supplementary Table 1, but it would be useful here to state the maximum amount of tempo shift relative to the original, both for increasing and decreasing the tempo. Additionally, I mentioned in the public review that the original tempos seem to span the range with the best neural tracking. You should also state the original range of tempos here.

p. 34, lines 703-704: "This signal…AFz position." Why did you choose to reference this way? Is there a related citation?

p. 37, line 773: "To statistically asses" should be "assess".

p. 40, section EEG – Reliable Components Analysis, Temporal Response Function: In the public review, I suggested normalizing the measures of correlation, coherence, and prediction accuracy by null distributions. When you do so, make sure the shuffling of trials is done using trials with the same tempi. Otherwise, there will be a mismatch in the spectral content of the music due to differences in tempi, which defeats the point of generating the null distributions.

p. 40, line 855: "…a system identification technique…" This isn't quite true, change to "…a modeling technique…".

p. 41, lines 873-884, also Figure 3C: From my understanding, TRFs were fit to each stimulus feature separately. I am ok with this for comparing prediction accuracies because the models appear to have the same dimensionality (T x 1, where T is the number of lags). However, this is potentially an issue for interpreting TRF weights because the stimulus features in the separate models are correlated, so peaks and troughs found in the TRF for one model might be the effect of one of the other features. One possibility is to look at the TRF weights of the full model instead, but ridge unfortunately optimizes the model by making the weights correlated (an effect that produces smooth TRFs), so I don't think it alleviates the issue either. The best approach would be to iteratively partial out the contribution of each stimulus feature. Start with the envelope model, compute the EEG prediction with that model, and then subtract the prediction from the EEG data. Then fit the derivative model and then the beats model, removing each after they are fit. This way, the TRF weights found for the spectral flux model reflect the temporal response after removing the effects of the other stimulus features.

p. 42-43, lines 903-933: In the public review, I mentioned that the TRF-SVM modeling needed to be clearer. Specifically:

– How are the TRFs fit for each group? If you are using a leave-one-trial-out procedure like before, the resulting TRFs for each trial are the models fit to all trials except the testing trial. As a result, they are going to be highly similar to each other, which could explain why the SVM performs so well. Here, I recommend instead fitting the TRF to each trial separately (no cross-validation) using the same ridge parameter that you found for the model trained earlier (in the section EEG – Temporal Response Function). That way the TRFs will represent the stimulus-response mappings for each individual trial.

– Clarify which training step was referred to for calculating the surrogate data (line 920). I recommend shuffling prior to the TRF modeling step, although if you fit the TRF to each trial individually then the result of shuffling before vs after will be the same.

– Lines 913-916: There are 6 trials at tapped rate, and 6 trials at 2x tapped rate, which should result in n=12 not n=13.

Figure 5 – supplement 1: The distributions on the left don't look like they come from the points on the right in each plot. Can you check if there was a mistake with the points plotted or the y-axes?

https://doi.org/10.7554/eLife.75515.sa1

Author response

Essential revisions:

In general, the reviewers were positive; however, more work needs to be done to validate that the results are not a consequence of the analyses or of the specific choice of music before tempo manipulation, as pointed out by the reviewers. It would also be important to better clarify the concepts of neural entrainment, synchronization, and neural tracking, and their meaning in this specific context.

Reviewer #1 (Recommendations for the authors):

I believe the authors could emphasize their most important results more when they appear in the Results section. An example is the beginning of page 11 where one important sentence ("SRCoh was highest…") is hidden in the text. This result is interesting – highlighting it would also make main results easier to extract.

We tried to highlight our most important results by dividing the results into smaller paragraphs. This way, we hope that the main results will be easier to read.

RCA: "These approaches have been criticized because of their potential susceptibility to autocorrelation". As the authors mention a possible involvement of neural oscillations (in their introduction), it could be useful to point out that a removal of autocorrelation (to calculate TRF) might actually remove oscillations as well.

This is true. We added this concern to the discussion of the manuscript.

p. 28 l. 556-560: “However, the RCA-based approaches (Kaneshiro et al., 2020) have been criticized because of their potential susceptibility to autocorrelation, which is argued to be minimized in the TRF approach (Zuk et al., 2021), which uses ridge regression to dampen fast oscillatory components (Crosse et al., 2021). However, one concern is that minimizing the effects of autocorrelation could remove neural oscillations as well.”

Due to differences in x and y axes, I was initially confused by Figure 1c, wondering why stimulation tempo (Hz) does not correspond to FFT frequency (Hz). Maybe the authors could include lines to show where stimulation tempo = FFT frequency and the first harmonic? This would make the relevant information much easier to extract.

We added lines to Figure 1C to highlight where the stimulation tempo or the first harmonic equals the FFT frequency.

I leave it to the authors, but an additional point that might be worth discussing is the fact that humans seem to be most sensitive to amplitude modulations at frequencies (around 4 Hz) higher than those that seem to play an important role in the current study (1–2 Hz). This is summarized, for example, in a review by Edwards and Chang (2013, Hear Res). Other relevant work is that by Teng, Poeppel and colleagues showing that theta activity most reliably follows acoustic rhythms. Could the authors discuss whether this means that music is special, or whether there are other reasons for the relatively low preferred rates in their work?

Thanks for this. It is true that Edwards and Chang, 2013 (among others) identify the highest sensitivity to amplitude modulation at around 4 Hz. This is faster than the rates at which we saw the strongest neural synchronization to the amplitude envelope of music. It is possible, then, that the rates at which we examined neural synchronization were “suboptimal” with respect to the system’s sensitivity to amplitude modulation, which may have handicapped the amplitude envelope as a feature for describing music. However, Edwards and Chang also identify the highest sensitivity to frequency modulation in about the same range, that is, 2–5 Hz. Again, this is faster than the rates at which we saw the strongest neural synchronization to spectral flux. We would therefore argue that the difference between our amplitude and spectral features was not due to differences in the pre-existing sensitivity of the auditory system to these two types of modulation. We also have some (unpublished) data from another study showing that behavioral rate preferences are category-specific: amplitude- and frequency-modulated sounds, speech, and music can be told apart based on the rates at which listeners prefer to hear them presented. Although these are not neural data, they suggest to us that sensitivity to modulations in technical sounds (AM, FM) might not be sufficient to predict sensitivity to fluctuations in categories of natural sounds, and in music in particular. However, we would not necessarily propose that music is special in this way. Although we find this an extremely interesting topic, anything we added to the discussion would be fairly wild speculation. We therefore hope you will support our decision not to add this point to the manuscript, which would lengthen the discussion further.

Nonetheless, we do include a discussion about why these low rates are important in natural music, which we reproduce here for your convenience.

p. 23, l. 409-426: “Strongest neural synchronization was found in response to stimulation tempi between 1 and 2 Hz in terms of SRCorr (Figure 2B), TRF correlations (Figure 3A), and TRF weights (Figure 3C-F). Moreover, we observed a behavioral preference to tap to the beat in this frequency range, as the group preference for music tapping was at 1.55 Hz (Figure 5 —figure supplement 3). Previous studies have shown a preference to listen to music with beat rates around 2 Hz (Bauer et al., 2015), which is moreover the modal beat rate in Western pop music (Moelants, 2002) and the rate at which the modulation spectrum of natural music peaks (Ding et al., 2017). Even in nonmusical contexts, spontaneous adult human locomotion is characterized by strong energy around 2 Hz (MacDougall and Moore, 2005). Moreover, when asked to rhythmically move their bodies at a comfortable rate, adults will spontaneously move at rates around 2 Hz (McAuley et al., 2006) regardless of whether they use their hands or feet (Rose et al., 2020). Thus, there is a tight link between preferred rates of human body movement and preferred rates for the music we make and listen to that was moreover reflected in our neural data. This is perhaps not surprising, as musical rhythm perception activates motor areas of the brain, such as the basal ganglia and supplementary motor area (Grahn and Brett, 2007), and is further associated with increased auditory–motor functional connectivity (Chen et al., 2008). In turn, involving the motor system in rhythm perception tasks improves temporal acuity (Morillon et al., 2014), but only for beat rates in the 1–2 Hz range (Zalta et al., 2020).”

Reviewer #2 (Recommendations for the authors):

The paper was very nice but sometimes hard to read because I am not always sure about the difference between engagement, entrainment, synchronization, and tracking. In the literature, those terms are sometimes used interchangeably and sometimes with a precise meaning. I suggest the authors consider rephrasing parts of the paper and clarifying those terms from the beginning, to broaden the range of readers who can enjoy the paper in depth. I found very few typos, and the visualizations were nice. Some plots, I think, are too small and hard to read, and the x/y axis proportions should be chosen carefully.

Thank you for your helpful comments and your positive review. We rephrased parts of the paper and moved explanatory sections from the discussion to the introduction. We decided to use the term “neural synchronization” throughout, but do discuss how our study relates to the concept of neural entrainment, and how the analysis frameworks we tested relate to different theoretical backgrounds. We also rearranged some Figures that appeared to be quite small (especially Figures 2 and 3). We hope that the Figures are now more readable and easier to understand.

Reviewer #3 (Recommendations for the authors):

p. 4, line 72: "…and we are not aware…tempo-modulated". I think Kaneshiro et al., 2020 fits this description. They did not use a controlled spacing of tempos like was done in this study, but they used naturalistic polyphonic music with a variety of tempos. Rajendran et al., 2020 also fits this description, although they did extracellular recordings and not EEG.

Kaneshiro et al., 2020 used four natural (unmanipulated) songs, which had tempi of 156 BPM, 94 BPM, 90 BPM and 86 BPM. Although it is true that this covers a range of tempi, the experimental manipulation was not to parametrically vary tempo as we did here. In contrast, Rajendran et al., 2020 used a larger musical stimulus set with tempi ranging from 0.7 to 3.7 Hz and investigated the neural response to those musical stimuli in rats.

Despite these differences, we can see why those studies could fit this description in our Introduction, and we have changed the sentence.

p. 3 l. 54-57: “Despite the perceptual and motor evidence, studies looking at tempo-dependence of neural synchronization are scarce (Doelling and Poeppel, 2015, Nicolaou et al., 2017) and we are not aware of any human EEG study using naturalistic polyphonic musical stimuli that were manipulated in the tempo domain.”

p. 6, line 106-107: "…spectral flux is sensitive…changes in amplitude." Can you provide a citation where rhythmic information is provided by changes in pitch but not in amplitude, perhaps including a behavioral measure of rhythm salience?

A number of past studies (such as Jones, 1987; Jones and Pfordresher, 1997; Ellis and Jones, 2009) describe the “joint accent structure” of music, in which the positions of perceived accents are determined both by changes in pitch (deviations) relative to the pre-existing melody (melodic accents) and by timing changes relative to the temporal flow of an auditory stimulus (temporal accents, Jones and Pfordresher, 1997).

For a more concrete example, you could imagine classical music played in a glissando style by a violin or cello. There may never be a clear sound on-/offset, and the perceived rhythm can be based entirely on spectral changes of the music piece.

Finally, with respect to behavioral consequences, a study by Burger et al., 2013, demonstrated that spectral flux at low frequencies (50–100 Hz) in music correlated positively with perceived beat strength and the urge to move.

We briefly mentioned parts of this in the discussion, but partially moved it up to the introduction and added citations:

p. 5 l. 95-100: “One potential advantage of spectral flux over the envelope or its derivative is that spectral flux is sensitive to rhythmic information that is communicated by changes in pitch even when they are not accompanied by changes in amplitude. Critically, temporal and spectral information jointly influence the perceived accent structure in music, which provides information about beat locations (Pfordresher, 2003, Ellis and Jones, 2009, Jones, 1993).”

p. 25 l. 476-477: “Previous work on joint accent structure indicates that spectral information is an important contributor to beat perception (Ellis and Jones, 2009, Pfordresher, 2003).”
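The glissando-style argument can be illustrated with a toy signal. The sketch below is not the authors' feature-extraction pipeline: it assumes a frame-wise RMS amplitude as a stand-in for the amplitude envelope and a common textbook definition of spectral flux (the half-wave-rectified frame-to-frame increase in STFT magnitude, summed over frequency bins). The frame length, hop size, and pitch sequence are hypothetical. A constant-amplitude, phase-continuous tone that changes pitch every 0.5 s should yield an essentially flat envelope, while spectral flux peaks at each pitch change.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def rms_envelope(x, frame_len=1024, hop=512):
    """Frame-wise RMS amplitude, a simple stand-in for the amplitude envelope."""
    frames = frame_signal(x, frame_len, hop)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def spectral_flux(x, frame_len=1024, hop=512):
    """Sum over bins of the positive frame-to-frame change in STFT magnitude."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.maximum(np.diff(mag, axis=0), 0.0).sum(axis=1)

# Toy "legato melody" (hypothetical): constant amplitude, a new pitch every 0.5 s
# (a 2 Hz rhythm), continuous phase so that pitch changes create no amplitude onsets.
sr = 16000
notes = [220.0, 330.0, 262.0, 392.0] * 2
freq = np.repeat(notes, sr // 2)
phase = 2 * np.pi * np.cumsum(freq) / sr
x = 0.5 * np.sin(phase)

env = rms_envelope(x)
flux = spectral_flux(x)
print(f"envelope coefficient of variation: {env.std() / env.mean():.4f}")
print(f"spectral flux max / median: {flux.max() / np.median(flux):.1f}")
```

Here the rhythm lives entirely in the spectral changes, which is the situation in which an envelope-based description would miss the beat structure.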

Figure 1C: I think the reason the z-scored amplitude decreases with decreasing stimulation tempo is because z-scoring reduces the magnitude of the components as more non-zero spectral components are introduced in the range of interest. This could be confusing to readers given that EEG may track these tempos better. The authors could alternatively z-score the original signals (before computing the spectra), which might result in more consistent amplitudes across stimulation tempo.

We z-scored the original signal before computing the FFT. For plotting, we z-scored the FFT again. However, we compared the z-scored FFTs to the FFTs that were computed without subsequent z-scoring and similar trends were found (please see plots in Author response image 1). Therefore, we decided to keep the z-scored FFT (as shown in the original version of the manuscript).

Author response image 1
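A minimal sketch of the two-step normalization described above (z-score the feature time course, compute its amplitude spectrum, then z-score the spectrum again for plotting), using only NumPy; the sampling rate and the synthetic two-component feature are assumptions. Because the second z-scoring is a positive affine transform within each spectrum, it cannot change the shape or peak frequency of any single spectrum. It only rescales spectra relative to one another, which is where the reviewer's concern about comparisons across stimulation tempi applies.

```python
import numpy as np

def zscore(v):
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def feature_spectrum(feature, fs, rezscore=True):
    """Amplitude spectrum of a z-scored feature; optionally z-scored again across frequency."""
    x = zscore(feature)                          # 1) z-score the time course first
    amp = np.abs(np.fft.rfft(x)) / len(x)        # 2) one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, (zscore(amp) if rezscore else amp)  # 3) optional second z-scoring

# Hypothetical feature: 20 s at 100 Hz with a 1.25 Hz beat and a weaker 2.5 Hz harmonic.
fs = 100.0
t = np.arange(2000) / fs
feature = np.cos(2 * np.pi * 1.25 * t) + 0.5 * np.cos(2 * np.pi * 2.5 * t)

freqs, spec_z = feature_spectrum(feature, fs)
_, spec_raw = feature_spectrum(feature, fs, rezscore=False)
print(freqs[np.argmax(spec_z)], freqs[np.argmax(spec_raw)])
```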

p. 20, lines 368-370: "In this study…tapped frequency." I think this should go earlier in the results, around where Figure 1 is presented, in order to clarify the difference between "stimulation tempo" (which is referred to earlier) and "tapped beat rate".

Thank you. As we removed most of the tapping-related analysis from the manuscript, we also deleted this sentence.

p. 25-26, lines 481-510: This discussion aims to highlight the importance of considering other stimulus features besides the envelope (like spectral flux) when comparing the neural tracking of speech and music, partly via criticizing Zuk et al. (2021) for focusing on the envelope. However, the current study does not fully back up some of the authors' critiques, mainly because speech stimuli were not included in this study. We don't know how much spectral flux improves EEG prediction of speech over envelope-based measures, although it likely will (see the use of spectrotemporal changes and acoustic edges in Daube et al., 2019). But we can instead compare the prediction accuracies to Di Liberto et al., 2015, Current Biology.

Envelope-based forward modeling of speech produces the worst average prediction accuracies around 0.05, and the accuracies increase as features such as the spectrogram and phonetic features are added (Figure 2 of that paper). These prediction accuracies are above the prediction accuracies in the current study (0.04 on average with spectral flux, and 0.06 for the slowest tempos, see Figure 3 of this study), supporting the possibility that speech is generally tracked better than music. I agree it is possible that other musical features may predict EEG better and could produce comparable prediction accuracies to speech. But for the discussion here, the authors should instead focus on how similarities and differences in relevant features for speech and music could affect interpretations for studies comparing neural tracking, which would nicely follow the first paragraph of this section. More specifically, the authors should clarify what "characterize stimuli as fairly as possible" (line 509) means and suggest alternative ways of comparing speech and music. If they choose to criticize studies using the envelope alone (which might not be necessary to make their point), I recommend also acknowledging that their study does not directly address questions about speech and music comparisons and that more work needs to be done to understand the differences between speech and music tracking.

Thank you for this important comment and sorry for the potential misunderstanding that originated from this part of the discussion. In this section of the discussion, we did not mean to write negatively about studies that solely focus on the stimulus amplitude envelope. And we certainly did not mean to imply that our paper has anything definitive to say about direct comparisons of neural synchronization to music and speech. Every paper follows a different agenda and focuses on different aspects of the research in the field. In the current study, we wanted to test the effects of different acoustic features on neural synchronization (to music only). The point we were trying to make is not that the amplitude envelope is a “bad” acoustic feature, but rather that one could also consider using different acoustic features (going beyond the often-used amplitude envelope) to arrive at a more nuanced understanding of neural synchronization to music. As you say, this approach has been taken in speech work in the past and has improved forward-model predictions of neural data. Here, we are attempting a similar approach in the musical domain, where our choice of musical features was grounded in some intuition or understanding of the kinds of acoustic fluctuations that might give rise to a sense of temporal regularity or pulse.

Our aim with this discussion was not to create a conflict or a debate in the literature, but rather just to provide a little food for thought for future work. For that reason, we have substantially cut down this part of the discussion, and no longer speak of “characterizing stimuli as fairly as possible”. We reproduce this revised bit of the discussion here for your convenience:

p. 26-27 l. 498-511: “For example, a recent study found that neuronal activity synchronizes less strongly to music than to speech (Zuk et al., 2021); notably this paper focused on the amplitude envelope to characterize the rhythms of both stimulus types. However, our results show that neural synchronization is especially strong to the spectral content of music, and that spectral flux may be a better measure for capturing musical dynamics than the amplitude envelope (Müller, 2015). Imagine listening to a melody played in a glissando fashion on a violin. There might never be a clear onset that would be represented by the amplitude envelope – all of the rhythmic structure is communicated by spectral changes. Indeed, many automated tools for extracting the beat in music used in the musical information retrieval (MIR) literature rely on spectral flux information (Oliveira et al., 2010). Also, in the context of body movement, spectral flux has been associated with the type and temporal acuity of synchronization between the body and music at the beat rate (Burger et al., 2018) to a greater extent than other acoustic characterizations of musical rhythmic structure. As such, we found that spectral flux synchronized brain activity better than the amplitude envelope.”

p. 25, lines 497-498: "…we found…envelope." It is not clear if spectral flux causes stronger entrainment given the data. Rephrase to: "…we found that spectral flux was tracked better than the amplitude envelope."

Thank you. We changed the word “entrainment” to “synchronization” throughout the manuscript. For more consistency, we therefore rephrased the sentence to:

p. 27 l. 510-511: “As such, we found that spectral flux synchronized brain activity better than the amplitude envelope.”

p. 26, lines 520-521: "…and ease of beat tapping…stimulation tempi." The statistical analysis of the data in Figure 3 showed that ease of beat tapping did not significantly vary with stimulation tempo (lines 327-329), which contradicts this statement.

Yes, it is correct that we did not see any significant differences in the behavioral ratings across tempo conditions. We removed the sentence from the manuscript.

p. 28, lines 532-535: "One is that our study…entrainment." I don't understand why this would matter, because the analysis was done after selecting "musical" and "non-musical" subjects. Perhaps cut this?

We removed the “musicians” vs. “non-musicians” part from the manuscript due to the suggestions of Reviewer 2. Therefore, we cut this sentence.

p. 29, line 594: "correlate" should be "correlated".

Thank you. Done.

p. 32, line 647-648: "Each segment…0.25 Hz." The details for the range of tempi are in Supplementary Table 1, but it would be useful here to state the maximum amount of tempo shift relative to the original, both for increasing and decreasing the tempo. Additionally, I mentioned in the public review that the original tempos seem to span the range with the best neural tracking. You should also state the original range of tempos here.

We added the original music tempo range and maximum amount of tempo change as histograms to Figure 1 —figure supplement 2. We also added statements of the original music tempo in the Results section (p. 13 l. 265-273) and mention that it coincides with the tempo range of highest neural synchronization in the discussion (p. 23-24 l. 427-436).

p. 34, lines 703-704: "This signal…AFz position." Why did you choose to reference this way? Is there a related citation?

We chose to reference this way because FCz and AFz are the standard positions for grounding and referencing the electrodes when using the actiCAP 64Ch Standard-2 system and layout from Brain Products. As these central positions are not ideal for auditory experiments, we re-referenced our data to the average reference. This procedure has been used previously, for example, by Falk et al., 2017 and Cabral-Calderin and Henry, 2022.
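For readers unfamiliar with this step, the sketch below shows average re-referencing as it is commonly done: subtract the across-channel mean from every channel at each time point. Re-adding the online reference electrode (here FCz) as a zero-valued channel before averaging is an assumption about common practice, not a detail stated in the response, and the channel count is illustrative. MNE-Python offers `mne.add_reference_channels` and `set_eeg_reference(ref_channels="average")` for the same operations.

```python
import numpy as np

def average_reference(eeg, add_online_ref=True):
    """Re-reference EEG of shape (n_channels, n_times) to the common average.

    If add_online_ref is True, a zero-valued row is appended first for the online
    reference electrode (e.g., FCz), which is implicitly zero in the raw recording."""
    data = np.asarray(eeg, dtype=float)
    if add_online_ref:
        data = np.vstack([data, np.zeros((1, data.shape[1]))])
    return data - data.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
raw = rng.standard_normal((63, 500))  # illustrative: 63 recorded channels, 500 samples
reref = average_reference(raw)         # 64 rows: recorded channels + restored reference
```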

p. 37, line 773: "To statistically asses" should be "assess".

Done. (Oops.)

p. 40, section EEG – Reliable Components Analysis, Temporal Response Function: In the public review, I suggested normalizing the measures of correlation, coherence, and prediction accuracy by null distributions. When you do so, make sure the shuffling of trials is done using trials with the same tempi. Otherwise, there will be a mismatch in the spectral content of the music due to differences in tempi, which defeats the point of generating the null distributions.

When calculating the z-score of the data, we took care that we calculated the surrogate distribution per tempo and participant. Details of the implementation can be found in the Materials and methods section:

p. 39 l. 823-831: “In order to control for any frequency-specific differences in the overall power of the neural data that could have led to artificially inflated observed neural synchronization at lower frequencies, the SRCorr and SRCoh values were z-scored based on a surrogate distribution (Zuk et al., 2021). Each surrogate distribution was generated by shifting the neural time course by a random amount relative to the musical feature time courses, keeping the time courses of the neural data and musical features intact. For each of 50 iterations, a surrogate distribution was created for each stimulation subgroup and tempo condition. The z-scoring was calculated by subtracting the mean and dividing by the standard deviation of the surrogate distribution.”
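The surrogate procedure in the quoted Methods text can be sketched as follows. Assumptions not stated in the response: the shift is circular (`np.roll`), a zero-lag Pearson correlation stands in for SRCorr, shifts shorter than a hypothetical `min_shift` samples are excluded, and the data are synthetic. Looping over conditions keeps each null distribution within a single tempo condition, as the reviewer requested.

```python
import numpy as np

def pearson_r(a, b):
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

def surrogate_zscore(neural, feature, n_iter=50, min_shift=100, rng=None):
    """z-score an observed correlation against circularly shifted surrogates."""
    rng = np.random.default_rng() if rng is None else rng
    observed = pearson_r(neural, feature)
    shifts = rng.integers(min_shift, len(neural) - min_shift, size=n_iter)
    # Shifting keeps each time course intact but breaks their temporal alignment.
    null = np.array([pearson_r(np.roll(neural, int(s)), feature) for s in shifts])
    return (observed - null.mean()) / null.std(), null

# Hypothetical data: one smoothed-noise feature and a noisy "neural" copy per tempo.
rng = np.random.default_rng(1)
n = 2000
results = {}
for tempo in (1.0, 1.25, 1.5):  # surrogates are built within each tempo condition only
    feature = np.convolve(rng.standard_normal(n), np.ones(10) / 10, mode="same")
    neural = feature + 0.5 * rng.standard_normal(n)
    results[tempo] = surrogate_zscore(neural, feature, rng=rng)
```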

p. 40, line 855: "…a system identification technique…" This isn't quite true, change to "…a modeling technique…".

Thank you, we changed it.

p. 41, lines 873-884, also Figure 3C: From my understanding, TRFs were fit to each stimulus feature separately. I am ok with this for comparing prediction accuracies because the models appear to have the same dimensionality (T x 1, where T is the number of lags). However, this is potentially an issue for interpreting TRF weights because the stimulus features in the separate models are correlated, so peaks and troughs found in the TRF for one model might be the effect of one of the other features. One possibility is to look at the TRF weights of the full model instead, but ridge unfortunately optimizes the model by making the weights correlated (an effect that produces smooth TRFs), so I don't think it alleviates the issue either. The best approach would be to iteratively partial out the contribution of each stimulus feature. Start with the envelope model, compute the EEG prediction with that model, and then subtract the prediction from the EEG data. Then fit the derivative model and then the beats model, removing each after they are fit. This way, the TRF weights found for the spectral flux model reflect the temporal response after removing the effects of the other stimulus features.

Our mutual-information analysis showed that the musical features are indeed correlated, and in particular that spectral flux shares statistically significant mutual information with all other musical features (Figure 1). Thus, the TRF weights for the individual features are necessarily nonindependent (though it is not clear to us that they might be “an effect of one of the other features” alone). We therefore took you up on your analysis suggestion. However, we decided against a step-wise regression analysis, as this would require multiple orthogonalizations and there is no principled basis for choosing the order in which the TRFs should be computed. Instead, we calculated a multivariate TRF based on the amplitude envelope, the first derivative of the envelope, and beat onsets (all features except spectral flux). Then, as you suggested, we subtracted the resulting predictions from the EEG data. The residual data were used to compute the TRF weights in response to spectral flux. The resulting TRF weights look qualitatively similar to the originals (compare to Figure 3F), and we added them to the analysis (Figure 3 —figure supplement 2):

Author response image 2

We also plotted the TRF amplitude in the previously computed significant time-lag window (102–211 ms):

Author response image 3
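The residualization approach described above can be sketched with a plain ridge-regression forward model. This is a minimal NumPy sketch, not the authors' code: the lag range, ridge parameter `lam`, and synthetic features are hypothetical, and the EEG is simulated so that the spectral-flux kernel is known. The multivariate model for the other three features is fit first, its prediction is subtracted from the EEG, and the spectral-flux TRF is then estimated on the residual.

```python
import numpy as np

def lagged_design(stim, lags):
    """Time-lagged copies of each stimulus feature, zero-padded at the edges.

    stim: (n_times, n_features) -> (n_times, len(lags) * n_features)."""
    n, f = stim.shape
    X = np.zeros((n, len(lags) * f))
    for j, lag in enumerate(lags):
        cols = slice(j * f, (j + 1) * f)
        if lag >= 0:
            X[lag:, cols] = stim[:n - lag]
        else:
            X[:lag, cols] = stim[-lag:]
    return X

def fit_trf(stim, eeg, lags, lam):
    """Ridge forward model. eeg: (n_times, n_channels). Returns weights and prediction."""
    X = lagged_design(stim, lags)
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)
    return w, X @ w

def flux_trf_on_residual(envelope, derivative, beats, flux, eeg, lags, lam):
    # 1) Multivariate TRF on every feature except spectral flux.
    others = np.column_stack([envelope, derivative, beats])
    _, pred_others = fit_trf(others, eeg, lags, lam)
    # 2) Remove what those features already explain.
    residual = eeg - pred_others
    # 3) Spectral-flux TRF estimated on the residual EEG.
    w_flux, _ = fit_trf(flux[:, None], residual, lags, lam)
    return w_flux  # (len(lags), n_channels), since flux is a single feature

# Hypothetical simulation with a known spectral-flux kernel peaking at lag 8.
rng = np.random.default_rng(0)
n, lags = 5000, np.arange(20)
env = rng.standard_normal(n)
deriv = np.gradient(env)
beats = (rng.random(n) < 0.02).astype(float)
flux = rng.standard_normal(n)
k_flux = np.zeros(len(lags)); k_flux[8], k_flux[9] = 1.0, 0.5
k_env = np.zeros(len(lags)); k_env[3] = 0.8
eeg = (lagged_design(flux[:, None], lags) @ k_flux
       + lagged_design(env[:, None], lags) @ k_env
       + 0.1 * rng.standard_normal(n))[:, None]  # one simulated channel
w_flux = flux_trf_on_residual(env, deriv, beats, flux, eeg, lags, lam=1.0)
```

Because the synthetic spectral flux is generated independently of the other features, the recovered kernel matches the simulated one. With correlated real features, variance shared between spectral flux and the other features is assigned to the first model, so the residual spectral-flux TRF is likely a conservative estimate.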

p. 42-43, lines 903-933: In the public review, I mentioned that the TRF-SVM modeling needed to be clearer. Specifically:

– How are the TRFs fit for each group? If you are using a leave-one-trial-out procedure like before, the resulting TRFs for each trial are the models fit to all trials except the testing trial. As a result, they are going to be highly similar to each other, which could explain why the SVM performs so well. Here, I recommend instead fitting the TRF to each trial separately (no cross-validation) using the same ridge parameter that you found for the model trained earlier (in the section EEG – Temporal Response Function). That way the TRFs will represent the stimulus-response mappings for each individual trial.

– Clarify which training step was referred to for calculating the surrogate data (line 920). I recommend shuffling prior to the TRF modeling step, although if you fit the TRF to each trial individually then the result of shuffling before vs after will be the same.
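The reviewer's recommendation (one TRF per trial, no cross-validation, with a ridge parameter fixed in advance) can be sketched as below. This is an illustration under assumptions: single-channel toy data, lags of 0 to n_lags-1 samples, and hypothetical helper names; it is not the mTRF Toolbox implementation.

```python
import numpy as np


def lag_matrix(x, n_lags):
    """Stack time-lagged copies of x (lags 0..n_lags-1 samples) into a T x n_lags matrix."""
    T = len(x)
    X = np.zeros((T, n_lags))
    for k in range(n_lags):
        X[k:, k] = x[:T - k]
    return X


def per_trial_trfs(stimuli, responses, n_lags, lam):
    """Fit a separate TRF to every trial (no cross-validation).

    lam is a fixed ridge parameter chosen beforehand (e.g. from an earlier
    cross-validated fit), so each TRF reflects only its own trial."""
    trfs = []
    for x, y in zip(stimuli, responses):
        X = lag_matrix(x, n_lags)
        trfs.append(np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ y))
    return np.stack(trfs)


# Toy data (hypothetical): each trial has its own response kernel.
rng = np.random.default_rng(1)
n_lags, T = 4, 3000
kernels = rng.standard_normal((6, n_lags))
stimuli = [rng.standard_normal(T) for _ in kernels]
responses = [lag_matrix(x, n_lags) @ k + 0.01 * rng.standard_normal(T)
             for x, k in zip(stimuli, kernels)]
trfs = per_trial_trfs(stimuli, responses, n_lags, lam=1.0)
```

Because no trial's TRF is trained on the other trials, the per-trial models are not forced to resemble one another, unlike leave-one-trial-out models.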

Thank you for making this important comment. We had used a leave-one-trial-out procedure as before, and we agree that this is not the most suitable way to analyze the data. Following your suggestions, we either calculated the TRFs from individual trials or generated the TRF surrogate dataset by shuffling the labels prior to the TRF analysis (instead of at the training step of the SVM classifier). However, in neither case were the SVM accuracies for the actual data significantly better than those for the surrogate dataset. We have therefore removed this analysis from the manuscript.
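A surrogate (permutation) test of classification accuracy might be sketched as follows. To keep the example dependency-free, a leave-one-out nearest-centroid classifier is swapped in for the SVM. The toy features stand in for per-trial TRFs: six trials per condition, mirroring the tapped-rate and 2x-tapped-rate trials. When TRFs are fit to each trial individually, shuffling labels before or after TRF fitting yields the same null distribution. All names here are hypothetical.

```python
import numpy as np


def loo_nearest_centroid_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    A simple stand-in for the SVM; assumes at least two trials per class so a
    centroid always exists after leaving one trial out."""
    n = len(labels)
    classes = np.unique(labels)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        centroids = np.stack([features[train & (labels == c)].mean(axis=0)
                              for c in classes])
        pred = classes[np.argmin(np.linalg.norm(centroids - features[i], axis=1))]
        correct += int(pred == labels[i])
    return correct / n


def permutation_test(features, labels, n_perm=200, seed=0):
    """Compare observed accuracy against a null built from shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = loo_nearest_centroid_accuracy(features, labels)
    null = np.array([loo_nearest_centroid_accuracy(features, rng.permutation(labels))
                     for _ in range(n_perm)])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value


# Toy example (hypothetical): 6 per-trial TRFs per condition, well separated.
rng = np.random.default_rng(2)
features = np.vstack([rng.normal(0.0, 0.3, (6, 4)), rng.normal(2.0, 0.3, (6, 4))])
labels = np.array([0] * 6 + [1] * 6)
acc, p = permutation_test(features, labels)
```

The +1 in the numerator and denominator keeps the p-value from being exactly zero when no shuffled dataset matches the observed accuracy.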

– Lines 913-916: There are 6 trials at tapped rate, and 6 trials at 2x tapped rate, which should result in n=12 not n=13.

By “n” we did not mean the number of trials, but rather the number of participants and tempo conditions that fulfilled this requirement. However, as we removed all SVM-related analyses from the manuscript, this passage has also been removed.

Figure 5 – supplement 1: The distributions on the left don't look like they come from the points on the right in each plot. Can you check if there was a mistake with the points plotted or the y-axes?

Thank you for this important comment. The left plot depicted the mean TRF correlations per participant, whereas the right plot displayed the maximum TRF correlations per participant. Based on the comments of Reviewer 2, we removed the left plots (contrasting musicians vs. non-musicians) from each panel and now only plot the general sophistication index (from the Gold-MSI) against the TRF correlations. To be consistent with the figures of the main manuscript, we replaced the maximum TRF correlations with the mean TRF correlations per participant, so that the right plot now corresponds to the previous “musician vs. non-musician” plot (Figure 5 – figure supplement 2).
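The mismatch the reviewer noticed arises because the mean and the maximum are different per-participant summaries of the same TRF correlations. The sketch below computes both and correlates each with a general sophistication score. It assumes a participants x trials array of TRF prediction accuracies and averages raw r values without a Fisher z-transform; the data and the helper name are hypothetical.

```python
import numpy as np


def summarize_and_correlate(trf_r, gold_msi):
    """Summarize TRF prediction accuracies per participant and correlate them
    with Gold-MSI general sophistication.

    trf_r: participants x trials array of TRF correlations (Pearson r).
    gold_msi: one general sophistication score per participant.
    Returns per-participant mean and maximum r and the Pearson correlation of
    each summary with gold_msi."""
    trf_r = np.asarray(trf_r, dtype=float)
    mean_r = trf_r.mean(axis=1)
    max_r = trf_r.max(axis=1)
    corr_mean = np.corrcoef(mean_r, gold_msi)[0, 1]
    corr_max = np.corrcoef(max_r, gold_msi)[0, 1]
    return mean_r, max_r, corr_mean, corr_max


# Hypothetical data for five participants and three trials each.
trf_r = np.array([[0.01, 0.02, 0.06],
                  [0.02, 0.03, 0.04],
                  [0.03, 0.05, 0.04],
                  [0.04, 0.05, 0.09],
                  [0.05, 0.06, 0.07]])
gold_msi = np.array([60.0, 70.0, 80.0, 90.0, 100.0])
mean_r, max_r, corr_mean, corr_max = summarize_and_correlate(trf_r, gold_msi)
```

Plotting the distribution of one summary next to scatter points of the other would therefore not line up, even though both come from the same trials.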

References

Burger, B., Ahokas, R., Keipi, A. and Toiviainen, P. (Year) Relationships between spectral flux, perceived rhythmic strength, and the propensity to move. City.

Cabral-Calderin, Y. and Henry, M.J. (2022) Reliability of Neural Entrainment in the Human Auditory System. J Neurosci, 42, 894-908.

Crosse, M.J., Di Liberto, G.M., Bednar, A. and Lalor, E.C. (2016) The Multivariate Temporal Response Function (mTRF) Toolbox: A MATLAB Toolbox for Relating Neural Signals to Continuous Stimuli. Frontiers in Human Neuroscience, 10.

Crosse, M.J., Zuk, N.J., Di Liberto, G.M., Nidiffer, A.R., Molholm, S. and Lalor, E.C. (2021) Linear Modeling of Neurophysiological Responses to Speech and Other Continuous Stimuli: Methodological Considerations for Applied Research. Frontiers in Neuroscience, 15.

Di Liberto, G.M., Pelofi, C., Bianco, R., Patel, P., Mehta, A.D., Herrero, J.L., de Cheveigné, A., Shamma, S. and Mesgarani, N. (2020) Cortical encoding of melodic expectations in human temporal cortex. eLife, 9, e51784.

Di Liberto, G.M., O’Sullivan, J.A. and Lalor, E.C. (2015) Low-Frequency Cortical Entrainment to Speech Reflects Phoneme-Level Processing. Current Biology, 25, 2457-2465.

Ding, N., Chatterjee, M. and Simon, J.Z. (2014) Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure. NeuroImage, 88, 41-46.

Edwards, E. and Chang, E.F. (2013) Syllabic (∼2–5 Hz) and fluctuation (∼1–10 Hz) ranges in speech and auditory processing. Hearing Research, 305, 113-134.

Ellis, R.J. and Jones, M.R. (2009) The role of accent salience and joint accent structure in meter perception. Journal of Experimental Psychology: Human Perception and Performance, 35(1), 264-280.

Falk, S., Lanzilotti, C. and Schön, D. (2017) Tuning Neural Phase Entrainment to Speech. Journal of Cognitive Neuroscience, 29, 1378-1389.

Jones, M.R. (1987) Dynamic pattern structure in music: Recent theory and research. Perception and Psychophysics, 41, 621-634.

Jones, M.R. and Pfordresher, P.Q. (1997) Tracking musical patterns using joint accent structure. Canadian Psychological Association, Canada, pp. 271-291.

Kaneshiro, B., Nguyen, D.T., Norcia, A.M., Dmochowski, J.P. and Berger, J. (2020) Natural music evokes correlated EEG responses reflecting temporal structure and beat. NeuroImage, 214, 116559.

Madsen, J., Margulis, E.H., Simchy-Gross, R. and Parra, L.C. (2019) Music synchronizes brainwaves across listeners with strong effects of repetition, familiarity and training. Scientific Reports, 9, 3576.

Rajendran, V., Harper, N. and Schnupp, J. (2020) Auditory cortical representation of music favours the perceived beat. Royal Society Open Science, 7, 191194.

Teng, X., Meng, Q. and Poeppel, D. (2021) Modulation Spectra Capture EEG Responses to Speech Signals and Drive Distinct Temporal Response Functions. eNeuro, 8, ENEURO.0399-0320.2020.

Vanden Bosch der Nederlanden, C.M., Joanisse, M.F., Grahn, J.A., Snijders, T.M. and Schoffelen, J.-M. (2022) Familiarity modulates neural tracking of sung and spoken utterances. NeuroImage, 252, 119049.

Zuk, N.J., Murphy, J.W., Reilly, R.B. and Lalor, E.C. (2021) Envelope reconstruction of speech and music highlights stronger tracking of speech at low frequencies. PLOS Computational Biology, 17, e1009358.

https://doi.org/10.7554/eLife.75515.sa2

Article and author information

Author details

  1. Kristin Weineck

    1. Research Group “Neural and Environmental Rhythms”, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
    2. Goethe University Frankfurt, Institute for Cell Biology and Neuroscience, Frankfurt am Main, Germany
    Contribution
    Conceptualization, Software, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft
    For correspondence
    kristin.weineck@ae.mpg.de
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-3204-860X
  2. Olivia Xin Wen

    Research Group “Neural and Environmental Rhythms”, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
    Contribution
    Software, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-8845-1233
  3. Molly J Henry

    1. Research Group “Neural and Environmental Rhythms”, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
    2. Department of Psychology, Toronto Metropolitan University, Toronto, Canada
    Contribution
    Conceptualization, Software, Formal analysis, Supervision, Funding acquisition, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2284-8884

Funding

European Research Council (ERC-STG-804029 BRAINSYNC)

  • Molly J Henry

Max-Planck-Gesellschaft (Max Planck Research Group Grant)

  • Molly J Henry

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank the lab staff of the Max Planck Institute for Empirical Aesthetics for technical support during data acquisition and Lauren Fink for valuable input during data analysis and stimulus feature design.

Ethics

All participants signed an informed consent form before starting the experiment. The study was approved by the Ethics Council of the Max Planck Society in compliance with the Declaration of Helsinki (Application No: 2019_04).

Senior Editor

  1. Barbara G Shinn-Cunningham, Carnegie Mellon University, United States

Reviewing Editor

  1. Ole Jensen, University of Birmingham, United Kingdom

Reviewer

  1. Benedikt Zoefel, CNRS, France

Publication history

  1. Received: November 12, 2021
  2. Preprint posted: November 30, 2021
  3. Accepted: July 25, 2022
  4. Version of Record published: September 12, 2022 (version 1)

Copyright

© 2022, Weineck et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

Weineck, K., Wen, O.X. and Henry, M.J. (2022) Neural synchronization is strongest to the spectral flux of slow music and depends on familiarity and beat salience. eLife, 11, e75515. https://doi.org/10.7554/eLife.75515