
Auditory cortical alpha/beta desynchronization prioritizes the representation of memory items during a retention period

  1. Nathan Weisz (corresponding author)
  2. Nadine Gabriele Kraft
  3. Gianpaolo Demarchi (corresponding author)
  1. Centre for Cognitive Neuroscience and Department of Psychology, Paris-Lodron Universität Salzburg, Austria
Research Article
Cite this article as: eLife 2020;9:e55508 doi: 10.7554/eLife.55508

Abstract

To-be-memorized information in working-memory could be protected against distracting influences by processes of functional inhibition or prioritization. Modulations of oscillations in the alpha to beta range in task-relevant sensory regions have been suggested to play an important role for both mechanisms. We adapted a Sternberg task variant to the auditory modality, with a strong or a weak distracting sound presented at a predictable time during the retention period. Using a time-generalized decoding approach, relatively decreased strength of memorized information was found prior to strong distractors, paralleled by decreased pre-distractor alpha/beta power in the left superior temporal gyrus (lSTG). Over the entire group, reduced beta power in lSTG was associated with relatively increased strength of memorized information. The extent of alpha power modulations within participants was negatively correlated with strength of memorized information. Overall, our results are compatible with a prioritization account, but point to nuanced differences between alpha and beta oscillations.

Introduction

Adaptive sensory processing entails the prioritization of task-relevant features with respect to competing information. Top-down modulation of activity in neural ensembles encoding task-relevant or distracting information is crucial in achieving this goal. In particular, regionally specific power changes around the alpha frequency range have been linked to such a putative top-down-mediated gain modulation, with enhanced power reflecting relatively inhibited states (Jensen and Mazaheri, 2010; Klimesch et al., 2007). For the visual modality especially, a vast amount of empirical evidence supports this notion. For example, increased alpha power in parieto-occipital cortical regions contralateral to the unattended hemifield is a very robust finding (e.g. Busch and VanRullen, 2010; Thut et al., 2006). The general inhibitory gating function of localized alpha increases has also been reported with respect to more specific visual features, leading to remarkable spatially circumscribed alpha modulations (Jokisch and Jensen, 2007; Zumer et al., 2014) even at a retinotopic level (Popov et al., 2019). Also for the domain of working memory, alpha increases have been reported during the retention period in the visual (e.g. Jensen et al., 2002; Klimesch et al., 1999), somatosensory (e.g. Haegens et al., 2009) and auditory modalities (e.g. Obleser et al., 2012), putatively protecting the to-be-remembered information against interference. This load-dependent top-down amplification of alpha and its concomitant inhibition account are widely accepted, but circumscribed decreases in alpha to beta power (often labeled as desynchronization) have also been deemed functionally important in the context of working-memory tasks. In a prioritization account, they reflect an enhanced activation of performance-relevant neural ensembles (e.g. Noh et al., 2014; Sauseng et al., 2009). 
A recent framework by Hanslmayr et al., 2016 explicitly links the extent of alpha/beta desynchronization to the representational strength of the information content in episodic memory (for supportive evidence see Griffiths et al., 2019). This is in line with a framework by van Ede, 2018 stressing the importance of regionally specific alpha and beta decreases when item-specific information needs to be prioritized in the retention period of working-memory tasks.

Distracting sounds are ever-present in natural listening environments and necessitate flexible exertion of inhibition or prioritization processes. Besides stimulus-feature information, which can influence the precise location of alpha modulations in the visual system (Popov et al., 2019), temporal cues can also be exploited (Rohenkohl and Nobre, 2011; van Ede and Chekroud, 2018): that is, when distracting sound input can be temporally predicted, inhibition or prioritization processes should be regulated in an anticipatory manner in relevant auditory regions. As in other sensory modalities (Frey et al., 2015; Weisz and Obleser, 2014), an increasing amount of evidence points to a functional role of alpha oscillations in listening tasks. Increased alpha oscillations have been observed in putatively visual brain regions when focusing attention on auditory input in cue-target periods (Frey et al., 2014; Fu et al., 2001; Snyder and Foxe, 2010). A similar posterior pattern is also observed in challenging listening situations, for example, with increased cognitive load or when faced with background noise (for reviews see Johnsrude and Rodd, 2016; Rönnberg et al., 2011). However, increases in alpha oscillations as a mechanism for selective inhibition (Strauß et al., 2014) have rarely been shown for auditory cortex, in which feature-specific processing of target and distractor sounds takes place. With regards to alpha desynchronization in auditory cortex, different lines of evidence showing an association between (also illusory) sound perception and low auditory cortical alpha power (e.g. Lange et al., 2014; Weisz et al., 2007; Weisz and Obleser, 2014; for invasive recordings illustrating sound-sensitive alpha desynchronization in anterolateral Heschl’s Gyrus, see Billig et al., 2019), suggest a link to representational content as described above.

The goal of the present study was to test whether power modulations in the alpha/beta range (Hanslmayr et al., 2016; van Ede, 2018) in task-relevant auditory cortical areas prior to a temporally predictable distractor, which was presented in the same (i.e. auditory) modality as the target, would better fit with an inhibition or prioritization account. On a general level, power increases would be predicted by an inhibition account, whereas decreases would be expected according to a prioritization account. Furthermore, both alternative accounts make opposing predictions regarding the relationship between pre-distractor alpha modulations and the strength of memorized information in the retention period (see Figure 1).

Modified auditory Sternberg paradigm and cartoon depiction of analysis rationale.

(A) A sequence of four consonants spoken by a female voice was presented. During the retention period, either a strong (consonant spoken by a male voice) or a weak (scrambled consonant) distractor was presented (at 1 s). Distractor type was kept constant during a block. Subsequently, participants indicated by a button press whether the probe was part of the memory set (‘part’) or not (‘no part’). At an individual level, temporal decoding was performed on whether the probe was part of the memory set or not. When the probe was part of the memory set, it should share distinct neural patterns with those elicited by the items of the memory set, whereas this should not be the case when the probe was not part of the memory set. By time-generalizing the classifiers trained on the probe to the period of the retention interval, we obtained a quantitative proxy for the strength of memorized information at the time of distractor presentation. The results were then statistically contrasted between weak and strong distractors across the group. (B) Alpha/beta power in lSTG was calculated at a single-trial level in a pre-distractor period and was used to bin high- and low-power trials. For a 0.5-s pre-distractor period, an analysis analogous to (A) was performed to quantify the relationship between regionally specific alpha/beta power and strength of memorized information. A prioritization account would predict that lower-power (‘desynchronized’) states go along with relatively increased strength of memorized information. This pattern should be captured when contrasting the bins across the entire group and when taking into account the extent of modulation within single participants. An inhibition account would predict the opposite pattern.

We adapted a Sternberg task variant introduced by Bonnefond and Jensen, 2012 to the auditory modality. These researchers illustrated pronounced alpha and beta increases, as well as phase effects, in parieto-occipital regions prior to the presentation of a more potent but temporally predictable visual distractor in the retention period. Using magnetoencephalography (MEG) and decoding, we first identified regions that were informative as to whether a speech item was part of a memory set or not, and focused subsequent spectral analysis on the left superior temporal gyrus (lSTG). This region, which is crucially involved in phonological short-term memory (Jacquemot and Scott, 2006), expressed marked alpha/beta desynchronization prior to a strong distractor. Importantly, by time-generalizing the aforementioned classifier (King and Dehaene, 2014), we implemented a proxy for the strength of memorized information that could be compared between trials with high or low power. Specifically, we show that lower pre-distractor beta power in lSTG goes along with relatively enhanced memory representation in the same period. For alpha power, however, a negative correlation was observed between the strength of memorized information and the extent to which power was modulated at an individual level. Overall, our study draws a nuanced picture that points to differential alpha and beta processes in the auditory cortex that altogether support the prioritization of relevant information in working memory (van Ede, 2018).

Results

Thirty-three healthy participants performed a modified Sternberg task (Bonnefond and Jensen, 2012) adapted to the auditory modality. In each trial they listened to a sequence of four consonants spoken by a female voice (see Figure 1A). These items had to be memorized across a 2-s retention period, after which a probe item was presented. Participants were requested to report whether the probe item was part of the memory set or not. Critically, precisely 1 s after onset of the last memory item, a distractor was presented that was either strong (a consonant spoken by a male voice) or weak (an acoustically scrambled version of a consonant). The different distractor types were presented blockwise in a manner counterbalanced across participants.

Adverse behavioral impact of strong distractors

We reasoned that the processing of a strong distractor would be more difficult to suppress and should affect behavioral performance. Comparing average accuracy between the strong and the weak distractor conditions showed a small (83% for strong vs 85% for weak) but statistically significant (t32 = −2.11, pone-sided = 0.02; Cohen’s d = 0.37, ‘small effect’) deterioration of performance for strong distractors. Reaction times were on average 9 ms slower for the strong distractor condition (579 ms vs 570 ms); however, this difference was not significant (t29 = 1.07, pone-sided = 0.14; Cohen’s d = −0.20, ‘very small effect’). It should be noted that response speed was not emphasized, to avoid interference from button presses on processing of the probe item. This may have reduced potential reaction time differences. Overall, the behavioral analysis supports the notion that the strong distractor condition was slightly more challenging.
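The paired comparison reported above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function name and the accuracy values are made up, and the choice of the paired-samples effect size d_z (mean difference divided by the standard deviation of the differences) is an assumption, since the text does not state which variant of Cohen's d was used. The reported values are at least numerically consistent with d_z = |t| / sqrt(n), for example 2.11 / sqrt(33) ≈ 0.37.

```python
import math
import statistics


def paired_t_and_dz(cond_a, cond_b):
    """Return (t, df, d_z) for paired samples cond_a and cond_b."""
    if len(cond_a) != len(cond_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff  # identical to t / sqrt(n)
    return t, n - 1, d_z


# Made-up accuracies for five hypothetical participants (not study data).
strong = [0.80, 0.85, 0.78, 0.90, 0.82]
weak = [0.83, 0.86, 0.80, 0.89, 0.86]
t, df, d_z = paired_t_and_dz(strong, weak)
print(f"t({df}) = {t:.2f}, d_z = {d_z:.2f}")
```

A one-sided p-value would additionally require the cumulative t distribution, which the Python standard library does not provide.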

Decoding probe-related information

Our main goal was to investigate the extent to which different distractor levels influence the strength of memorized information during the retention period, in particular prior to the predictable onset of the distractor. Also, we wanted to relate these effects to potential alpha power modulations in the pre-distractor period. To this end, we first applied temporal decoding, using linear discriminant analysis (LDA, 5-fold cross validation, repeated five times, AUC as a metric; see Figure 1A and 'Materials and methods' for details), on the post-probe MEG sensor-level activity, to classify whether a probe was part of the memory set or not. The results are depicted in Figure 2A and show robust and sustained above-chance classification performance rapidly commencing ~334 ms after probe onset (pcluster = 4e-4) and lasting until the onset of the response prompt at 700 ms post-probe (Cohen’s d in this period was generally >0.8, that is ‘strong and very strong effects’). The time course of this effect is very much in line with those seen in evoked response studies on old vs new effects in short-term memory (Danker et al., 2008; Kayser et al., 2003), indicating that early sensory activation is not informative on whether a probe was a memorized item or not. In subsequent analyses, a 0.4–0.7-s post-probe decoding period was used as a training set, and the derived classifiers were applied in a time-generalized manner to the preceding retention period (yielding a quantitative proxy for the strength of memorized information; see below and also Figure 1).
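To make the decoding procedure concrete, the following sketch runs a repeated, stratified 5-fold cross-validation at a single time point and scores it with AUC. It is a simplified stand-in under stated assumptions: instead of LDA it uses a difference-of-class-means projection (which ignores sensor covariance), the data layout `X[trial][sensor][time]` and all function names are hypothetical, and the data are simulated.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve: P(positive score > negative score)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split trial indices into k folds with balanced class counts."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def decode_timepoint(X, labels, t, k=5, repeats=5, seed=0):
    """Mean cross-validated AUC at time index t; X[trial][sensor][time]."""
    rng = random.Random(seed)
    n_sensors = len(X[0])
    aucs = []
    for _ in range(repeats):
        for test in stratified_folds(labels, k, rng):
            held_out = set(test)
            train = [i for i in range(len(X)) if i not in held_out]
            # Simplification (not LDA): weights = difference of class means.
            w = []
            for s in range(n_sensors):
                m1 = [X[i][s][t] for i in train if labels[i] == 1]
                m0 = [X[i][s][t] for i in train if labels[i] == 0]
                w.append(sum(m1) / len(m1) - sum(m0) / len(m0))
            scores = [sum(w[s] * X[i][s][t] for s in range(n_sensors))
                      for i in test]
            aucs.append(auc(scores, [labels[i] for i in test]))
    return sum(aucs) / len(aucs)


# Simulated data: 40 trials, 3 sensors, 2 time points; only time index 1
# carries a class difference (hypothetical, for illustration only).
rng = random.Random(1)
labels = [i % 2 for i in range(40)]
X = [[[rng.gauss(0, 1) + (3.0 * y if t == 1 else 0.0) for t in range(2)]
      for _ in range(3)] for y in labels]
for t in range(2):
    print(t, round(decode_timepoint(X, labels, t), 2))
```

Repeating the call for every time index gives a decoding time course; in the simulated data only the informative time point should rise clearly above the chance level of 0.5.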

Although the results so far show that whether a probe was part of a memory set or not can be differentiated based on the MEG data, the resulting temporal decoding pattern uses all sensors and is therefore spatially agnostic. In order to obtain insights into which brain regions may be contributing to the effect and aiding the identification of task-relevant region(s) in a data-driven manner (Jacquemot and Scott, 2006), we adapted an approach to derive Informative Activity in source space (Marti and Dehaene, 2017). In brief, this approach projects the sensor level classifier weights to source space using beamformer filters. To make the effects more interpretable, we implemented a within-subject permutation analysis and z-scored the classifier-weights in a first-level analysis. These data were subsequently tested at group-level against zero within a nonparametric cluster permutation test using a t-test, yielding a positive cluster (p=4e-4) collapsed over the relevant post-probe time period. As shown in Figure 2A, Informative Activity can be detected in widespread cortical regions encompassing temporal, parietal and frontal areas. Although the pattern was bilateral, there was a clear left-hemispheric dominance, with the most pronounced effect localizable to lSTG. Given the particular involvement of this latter area in processing speech sounds (e.g. Mesgarani et al., 2014; see also Billig et al., 2019) and its central role in phonological short-term memory (Jacquemot and Scott, 2006), it was used as a task-relevant region of interest for the spectral analysis (see below).
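The first-level z-scoring step can be illustrated as follows. This is only a sketch under assumptions: the text does not specify what was permuted, so trial labels are shuffled here, and a simple class-mean difference at one location stands in for a projected classifier weight. All names and data are hypothetical.

```python
import random
import statistics


def mean_difference(values, labels):
    """Stand-in statistic: class-1 mean minus class-0 mean."""
    ones = [v for v, y in zip(values, labels) if y == 1]
    zeros = [v for v, y in zip(values, labels) if y == 0]
    return statistics.fmean(ones) - statistics.fmean(zeros)


def permutation_z(values, labels, n_perm=500, seed=0):
    """z-score the observed statistic against a label-shuffled null."""
    rng = random.Random(seed)
    observed = mean_difference(values, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the link between data and labels
        null.append(mean_difference(values, shuffled))
    return (observed - statistics.fmean(null)) / statistics.stdev(null)


# Simulated single-location data for one hypothetical participant.
rng = random.Random(42)
labels = [i % 2 for i in range(60)]
values = [rng.gauss(2.0 * y, 1.0) for y in labels]
z = permutation_z(values, labels)
print(round(z, 2))
```

The resulting z-values per participant and location are the kind of quantity that can then be tested against zero at the group level.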

Decoding of probe-related information.

(A) Results of the temporal decoding of MEG sensor-level activity after the probe presentation to classify whether it was part of the memory set or not (see also Figure 1A for rationale). Above-chance detection performance (AUC = area under the receiver operating characteristic [ROC] curve) was found commencing ~300 ms after the probe onset (at 2.0 s) and lasting at least until the response was prompted. Informative activity for this decoding as a function of time is shown on the right (green areas outlining the expanse of the cluster that results following a nonparametric permutation test for the 0.3–0.7-s post-probe onset interval). Within the relevant time interval (in blue-dotted box) informative activity emerges early in left STG and progressively spreads to further temporal, parietal and frontal areas. Data used for plotting the results of the temporal decoding at 10.17605/OSF.IO/753MK. (B) The time generalization result is shown separately for the strong and weak distractor conditions (left and middle panels). Trivially, the strongest classification results are obtained approximately at the onset of the probe (at 2 s). Relatively decreased decoding performance (AUC <0.5) was obtained prior to the onset of the strong distractor. Statistical comparison of strong vs weak distractor conditions revealed two peak effects at ~400 ms and ~200 ms preceding the distractor onset, although only the difference closer to distractor onset was significant (pcluster = 0.0156) at the cluster level (right panel). Data used for plotting the time generalized result at 10.17605/OSF.IO/4CV83.

After illustrating the above-chance decoding performance for the probe, we tested the extent to which informative patterns are present in the retention period, and how they are modulated by the distractor especially prior to its anticipated onset. The classifier described above was employed for this purpose using time generalization (King and Dehaene, 2014). The rationale for this approach (see also Figure 1) was that if a probe was part of the memory set, it should share neural patterns with those elicited by the actual memory set. This should not be the case for probes that were not part of the memory set, so that time-generalizing these post-probe patterns to the retention period should give a quantitative proxy for the strength of memorized information. The full time generalization result is shown separately for the strong and weak distractor conditions in Figure 2B. Trivially, the strongest classification results are obtained approximately at the onset of the target (at 2 s), which corresponds to the area close to the diagonal of the time generalization matrix. We were mainly interested in the decoding performance in the period preceding the anticipated distractor at 1 s, that is, in the off-diagonal pattern. It is evident that for the late 0.4–0.7-s post-probe training time period (probe onset at 2 s shown on the y-axis), decoding accuracy gradually ramps up over the retention interval.
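The time-generalization idea can be sketched as training at one time point and scoring every other time point with the same weights. The example below is a toy version under assumptions: a difference-of-class-means projection replaces LDA, a single train/test split of trials replaces cross-validation, and the simulated data contain a class pattern at the training time, an inverted pattern at an earlier time, and no pattern at another. Names and the data layout `X[trial][sensor][time]` are hypothetical.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve: P(positive score > negative score)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_weights(X, labels, trials, t):
    """Difference of class means across sensors at time index t."""
    w = []
    for s in range(len(X[0])):
        m1 = [X[i][s][t] for i in trials if labels[i] == 1]
        m0 = [X[i][s][t] for i in trials if labels[i] == 0]
        w.append(sum(m1) / len(m1) - sum(m0) / len(m0))
    return w


def time_generalize(X, labels, train_trials, test_trials, t_train):
    """Train at t_train, then score every time point of held-out trials."""
    w = fit_weights(X, labels, train_trials, t_train)
    y_test = [labels[i] for i in test_trials]
    row = []
    for t in range(len(X[0][0])):
        scores = [sum(ws * X[i][s][t] for s, ws in enumerate(w))
                  for i in test_trials]
        row.append(auc(scores, y_test))
    return row


# Class effect per time index (hypothetical): none, inverted, present.
pattern = {0: 0.0, 1: -3.0, 2: 3.0}
rng = random.Random(0)
labels = [i % 2 for i in range(60)]
X = [[[rng.gauss(0, 1) + pattern[t] * y for t in range(3)]
      for _ in range(4)] for y in labels]
train, test = list(range(40)), list(range(40, 60))
row = time_generalize(X, labels, train, test, t_train=2)
print([round(a, 2) for a in row])
```

In this toy setting an AUC below 0.5 arises at the time point where the class pattern is inverted relative to the training time. This only shows one way in which below-chance generalization can occur and is not a claim about the neural processes in the study.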

Statistical analysis was done for a 500-ms window preceding the onset of the distractor (Figure 2B, right panel), focusing on the data-driven 0.4–0.7-s post-probe training time window (see above). This analysis yielded two peak effects at ~400 ms and ~200 ms (Cohen’s d in these periods ~0.6 and ~0.55, respectively, that is, medium effect sizes) preceding the distractor onset, during which critical t-values (±2.0369) were exceeded. However, only the latter difference was significant following a nonparametric permutation test (pcluster = 0.0156). These results suggest that memorized information is differentially activated prior to distractor onset, with relatively reduced activation prior to the strong distractor. In fact, when post-hoc testing the decoding accuracy over the entire test- and training-time window, average decoding accuracy prior to the strong distractor was significantly below chance (t32 = −3.98, p=3.63e-04), whereas it did not differ for weak distractors (t32 = −1.73, p=0.09). Below-chance decoding may appear surprising but is not uncommon in time- and condition-generalization decoding approaches (King and Dehaene, 2014).

Although the precise neural processes leading to these patterns are challenging to pinpoint, in functional terms, this result translates into a systematic relative absence of memory-item specific patterns akin to those elicited during relevant periods following probe onset. As this activation gradually ramps up toward the anticipated onset of the probe in both conditions, albeit somewhat delayed in the strong distractor condition, it is clear that some representation with regards to the memorized item would also need to be present during the periods of below chance decoding. Recently, Stokes, 2015 and Wolff et al., 2017 described network-level ‘activity silent’ processes encoding working memory content, and similar processes could be present in the early part of the retention period in the present task. Overall, the described decoding effect is in line with the behavioral results described above, implying an overall detrimental impact associated with anticipation of a strong distractor.

Pre-distractor modulations of induced oscillatory activity

In a next step, we focused on modulations of oscillatory activity in the lSTG, a region dominating our source level analysis of informative activity with regards to the probe decoding (i.e. whether memorized or not) and strongly suggested to be task-relevant for phonological short-term memory by previous research (see, for example, Jacquemot and Scott, 2006). We focused our (statistical) analysis on the period immediately preceding the predictable occurrence of the distractor. According to an inhibition account, alpha/beta enhancements would be expected to precede the strong distractor in particular, putatively reflecting an anticipatory suppression of its processing. A prioritization account, on the other hand, would predict a power reduction in the same frequency range, putatively reflecting an anticipatory activation of the memorized information.

The time-frequency representations in Figure 3A, which display the induced power in the 5–25 Hz range, show strong ongoing alpha/beta activity with a peak ~10 Hz in the lSTG. Analogous to the study by Bonnefond and Jensen, 2012, a 500-ms period preceding the occurrence of the distractor is marked, suggesting an alpha power decrease in the strong as compared with the weak distractor condition. This impression is supported by a nonparametric permutation test (pcluster = 0.0104), which yields a significant difference in this period with peak differences of ~12–13 Hz and ~21–22 Hz (Figure 3A, right panel), comparable to those recorded in the study in the visual domain (Bonnefond and Jensen, 2012). Given the perfect temporal predictability of the distractor occurrence, stronger prestimulus phase alignment of alpha oscillations could be expected (as reported in the visual modality by Bonnefond and Jensen, 2012; but see van Diepen et al., 2015). This process putatively exploits the fact that excitability varies over an alpha cycle, to align its inhibitory phase optimally to suppress processing of the irrelevant sound maximally. However, even though clear post-distractor evoked alpha enhancements could be observed (see Figure 3B), no prominent evoked alpha could be observed preceding the distractor (an analysis using ITC leads to an identical conclusion; data not shown). For the sake of completeness, we ran an analogous statistical test as for the induced power, showing no difference at the cluster corrected level (Figure 3B, right panel). Since no pronounced evoked alpha activity was identified in the pre-distractor period, we refrained from further analysis (such as phase opposition effects). This result extends a previous report (van Diepen et al., 2015) in finding no evidence that auditory cortical alpha phase is adjusted in a top-down manner.

Pre-distractor alpha power modulations in the left superior temporal gyrus.

(A) Time-frequency representations of the induced power show strong ongoing alpha/beta activity with a peak at ~10 Hz. No baseline normalization was applied. The vertical dots indicate a 500-ms period preceding the occurrence of the distractor. An alpha/beta power decrease in the strong vs the weak distractor condition can be seen (left and middle panel). The notion is supported by the outcome of a nonparametric permutation test leading to a significant difference at cluster level (marked by a black contour; pcluster = 0.0104) over an alpha to beta range with peak differences at ~12–13 and 21–22 Hz (right panel). For both ranges, the peak effects were observed at 0.7 s, that is, ~300 ms prior to the anticipated onset of the distractor. Data used for plotting induced power at 10.17605/OSF.IO/4WUYD. (B) Time-frequency representations of the evoked power. Post-distractor alpha enhancements are seen, but no prominent alpha preceded the distractor (left and middle panel). The nonparametric statistical test at cluster level showed no difference (right panel). Data used for plotting evoked power at 10.17605/OSF.IO/TYZC8.

Overall, the spectral analysis suggests a pronounced decrease of alpha to beta power prior to the expected occurrence of the strong distractor. Although this pattern appears to support a prioritization account, it does not fit well with the decoding result at first sight. An alternative interpretation, along the lines of the inhibition account, could be that the expectation of a more salient auditory distractor may involuntarily draw more selective attention towards it (reflected in an anticipatory alpha decrease), making it more difficult to suppress. On the basis of the results presented so far, however, these alternatives cannot be differentiated. In the next part, by exploiting the possibility of time-generalizing the classifiers trained above (for rationale see also Figure 1A), we will attempt to address this important issue by linking pre-distractor alpha power modulations to the strength of memorized information in the retention period.

Strength of memorized information and pre-distractor alpha power

To address the functional relevance of pre-distractor alpha and beta power modulations in the lSTG in greater detail, trials were sorted according to alpha (13 ± 3 Hz) or beta (21 ± 3 Hz) power in this region in a 400–1000-ms time period following the onset of the retention period (i.e. a 600-ms pre-distractor window centered on the peak latency effect at 700 ms). Subsequently, these trials were median split into high and low alpha or beta power bins. As in the analysis described above, we trained a classifier on all trials to discriminate whether a probe was part of a memory set or not, and applied the classifier to a 0.5-s time window prior to distractor presentation separately for the high- and low-power trials (analogous to the analysis shown in Figures 2 and 3). On the basis of the previous analysis, we again focused on a 400–700-ms training time period and calculated the average strength of memorized information for each bin as well as the difference between bins for each individual participant. A functional relationship between pre-distractor alpha/beta power and strength of memorized information should be reflected in two ways (with different directions predicted according to a prioritization or inhibition account; see Figure 1B). First, strength of memorized information should differ overall between low- and high-power bins. Second, stronger (relative) differences between the bins should be reflected in stronger concomitant differences in the strength of memorized information.
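The binning step described above amounts to a median split of trials on single-trial power, followed by averaging the decoding-based proxy within each bin. A minimal sketch, with hypothetical names and made-up numbers, and with the assumption that trials lying exactly on the median are dropped:

```python
import statistics


def median_split(power):
    """Return (low_idx, high_idx): trial indices below / above the median."""
    med = statistics.median(power)
    low = [i for i, p in enumerate(power) if p < med]
    high = [i for i, p in enumerate(power) if p > med]
    return low, high  # trials exactly at the median are dropped (a choice)


def mean_by_bin(values, power):
    """Average a per-trial value separately for low- and high-power trials."""
    low, high = median_split(power)
    low_mean = statistics.fmean([values[i] for i in low])
    high_mean = statistics.fmean([values[i] for i in high])
    return low_mean, high_mean


# Made-up single-trial band power and a made-up per-trial decoding proxy.
power = [4.0, 1.0, 3.0, 2.0, 6.0, 5.0]
proxy = [0.48, 0.55, 0.52, 0.54, 0.47, 0.49]
low_idx, high_idx = median_split(power)
low_mean, high_mean = mean_by_bin(proxy, power)
print(low_idx, high_idx, round(low_mean - high_mean, 3))
```

In the study the proxy is an AUC computed over the trials in a bin rather than a per-trial value, so the averaging here is only a placeholder for that step.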

To test for overall differences in the strength of memorized information between low- and high-power bins, we first ran a repeated measures ANOVA using frequency band (alpha and beta) and bin (high and low) as factors. This analysis showed that the strength of memorized information did not differ overall between the frequency bands (F1,32 = 1.19, p=0.28). A trend was observed for bins (F1,32 = 3.29, p=0.08), with low-power bins showing relatively increased strength of memorized information (see Figure 4A). However, the interaction effect (F1,32 = 4.06, p=0.05) indicated that this difference was not uniform for the alpha and beta bands. Comparing strength of memorized information within each frequency band showed that the difference was significant only for beta (t32 = 2.27, p=0.03) and not for alpha (t32 = 0.06, p=0.95). Thus, in line with a prioritization account (see Figure 1B), overall lower pre-distractor power in the beta frequency range in lSTG goes along with relatively increased strength of memorized information.

Relationship between alpha and beta power modulation and strength of memorized information (operationalized via the time-generalized decoding approach; see Figure 1) in the 0.5-s pre-distractor period.

(A) Average strength of memorized information in the relevant period split between strong and weak power trials. At the group-level, significant modulation is seen only for the beta band, with relatively increased strength of memorized information for weak power trials. (B) Interindividual variation in the extent to which alpha power was modulated between high- and low-power trials within a participant (a higher value on the x-axis reflects a more extreme power difference between high- and low -power trials) was negatively correlated with the modulation of strength of memorized information (see main text). This effect was in large part driven by the strong power trials. (C) The same correlation analysis showed no effect in the beta band. Overall, power in the alpha range was more strongly modulated as compared to that in the beta range. Data used for plotting the relation between alpha power modulations and probe-related information is at 10.17605/OSF.IO/QG3KB.

The previous approach treats low- and high-power bins equally among participants. If there is a functional relationship between the oscillatory processes in lSTG and strength of memorized information, however, more extreme power differences between the bins should accompany more extreme differences in terms of strength of memorized information (see Figure 1B). In this respect, an interesting first observation is that modulations between low- and high-power trials (computed in each participant as the log10 ratio between high- and low-power trials in the relevant band) were significantly stronger for alpha oscillations as compared to beta oscillations (t32 = 3.11, p=0.004; compare also Figure 4B and C). Correlating this power modulation measure with the differences in strength of memorized information between low- and high-power trials yielded a significant effect only for alpha oscillations (alpha: r = −0.36, p=0.04; beta: r = 0.06, p=0.72), a pattern that better fits a prioritization account (see Figure 1B). A follow-up correlation analysis on the separate bins shows that for alpha this effect is driven by the high-power trials (Figure 4B). Altogether, relatively desynchronized states in the alpha and beta bands appear to go along with relatively increased strength of memorized information, which is in accordance with a prioritization process. Interestingly, our analysis of these trials points to a more nuanced and differential picture for alpha and beta oscillations, which have frequently been treated in a homogeneous manner. Beta desynchronization appears to prioritize memorized information in general, whereas the alpha processes seem to be dependent on individually varying modulations.
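The across-participant analysis combines two simple quantities: a log10 power ratio per participant and a Pearson correlation with the bin difference in the decoding proxy. The sketch below uses made-up values for six hypothetical participants. The sign convention for the proxy difference (high-power minus low-power trials) is an assumption, as the text does not state the direction of subtraction.

```python
import math


def log_power_ratio(high_power, low_power):
    """Power modulation as log10(high / low); 0 means no modulation."""
    return math.log10(high_power / low_power)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up per-participant mean power in the high and low bins.
high = [12.0, 20.0, 9.0, 30.0, 15.0, 25.0]
low = [6.0, 5.0, 6.0, 5.0, 6.0, 5.0]
modulation = [log_power_ratio(h, l) for h, l in zip(high, low)]

# Made-up proxy difference (assumed convention: high minus low power trials).
proxy_diff = [-0.01, -0.04, 0.00, -0.06, -0.02, -0.03]
print(round(pearson_r(modulation, proxy_diff), 2))
```

Under this convention, a negative correlation means that participants with a larger power difference between bins show a relatively weaker proxy in their high-power trials.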

Memory-related information is mainly carried by low-frequency activity

So far we have established that overall alpha and beta power in lSTG is relatively reduced prior to the onset of an anticipated strong distractor, and that this process could be involved in prioritizing auditory information in working memory. A valid concern could be that if it were mainly alpha/beta power reductions in lSTG that were carrying the decodable information (trained post-probe onset and time-generalized to retention period) then this would make the analysis somewhat circular. Although this reasoning conflicts with the condition comparison effects showing that lower alpha/beta powers prior to strong distractors are associated with on average lower decoding accuracy for memorized information, we explicitly followed up this issue. For this purpose, the basic decoding shown in Figure 1A, that is whether a probe was part of a memory set or not, was repeated following filtering of the data in different frequency bands (broadband — 1–30 Hz; theta — 1–7 Hz; alpha — 9–17 Hz; beta — 18–24 Hz) and by either applying the Hilbert transform or not. Importantly for the purpose of addressing the potential circularity issue, decoding was poor for the alpha and beta bands in general (see Figure 5A), and did not differ from chance for the relevant time period upon which our time-generalized effects are based (see Figure 5B).
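The distinction between the real (band-limited) signal and its analytic amplitude can be illustrated with a small Hilbert-transform sketch. It is an illustration under assumptions: the band-pass filtering step is omitted, a direct discrete Fourier transform is used so that no external library is needed (slow, but adequate for short signals), and the function name and test signal are hypothetical.

```python
import cmath
import math


def analytic_amplitude(x):
    """Envelope |x + i*H(x)| via a direct DFT (O(n^2), short signals only)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies to obtain the analytic signal.
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue
        spectrum[k] *= 2.0 if k < n / 2 else 0.0
    analytic = [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]
    return [abs(z) for z in analytic]


# A cosine with exactly four cycles in 64 samples and unit amplitude.
n_samples = 64
signal = [math.cos(2 * math.pi * 4 * t / n_samples) for t in range(n_samples)]
envelope = analytic_amplitude(signal)
print(round(min(envelope), 6), round(max(envelope), 6))
```

The envelope discards the phase (the temporal fine structure) of the oscillation and keeps only its slowly varying amplitude, which is the contrast drawn between the 'real' and 'analytic amplitude' versions of the signal.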

Figure 5. Follow-up analysis to elucidate which frequency band drives decoding performance post-probe presentation (used as trained classifiers time-generalized to retention period).

(A) Results of temporal decoding on MEG sensor-level activity analogous to Figure 2A. (B) Average decoding performance for the relevant 0.4–0.7-s post-probe time period (used for results displayed in Figure 2B and Figure 4) shows that neither alpha nor beta band activity contains information with regard to the memorized item. As shown previously, above-chance decoding performance is seen for broadband activity and this effect appears to be driven strongly by activity in the theta range. Although amplitude information was sufficient for decoding above chance for broadband and theta activity, decoding performance was improved when the temporal fine-structure was maintained.

For the broadband analysis, decoding was also significantly above chance when using only the analytic amplitude part following the Hilbert transform (t32 = 4.19, p=0.0002), although this is significantly lower than when using the original broadband signal (t32 = −6.22, p=5.68e–07). A similar pattern can be observed for the theta band, with significant above-chance decoding for the real (t32 = 9.62, p=5.78e-11) and analytic amplitude (t32 = 6.26, p=5.11e-07) versions of the signal (for real vs analytic amplitude, t32 = −4.80, p=3.25e–05). Decoding when using only the theta band was overall superior as compared to that using the broadband signal (real signal — t32 = 10.71, p=4.10e–12; analytic amplitude — t32 = 7.11, p=4.53e–08). To summarize: alpha/beta band modulations, which are different on average prior to strong and weak distractors, do not appear to carry actual (decodable) information about the memorized item (see also Griffiths et al., 2019). The most important signal component contributing to our analysis (Figure 1A) is in the theta frequency range. Interestingly, the temporal fine structure, as compared to simple amplitude modulations, seems to contain relevant representational information.

Discussion

In the current study, we investigated the neural dynamics prior to an anticipated distractor in the auditory modality. We were particularly interested in potential modulations of alpha power in auditory cortical regions, which have shown patterns similar to those described in various cognitive tasks in the visual system (Frey et al., 2015). Alpha-power modulations, although mainly in non-auditory regions, have also been linked previously to listening effort or attentional control (McGarrigle et al., 2014; Pichora-Fuller et al., 2016; Wöstmann et al., 2017). In order to understand the functional relevance of potential alpha-power modulations, it was important to link them to the informational content carried by the neural patterns in the same time period. For this purpose, we adapted a modified Sternberg paradigm first proposed by Bonnefond and Jensen, 2012 to the auditory system: that is, we introduced (in a blockwise manner) putatively weak and strong auditory distractor items in the retention period with predictable timing. Although perfect temporal predictability of a distractor is rare in natural environments (e.g. the ticking of a clock or a dripping faucet), this was maintained in the present study to ensure maximum comparability. In particular, strict temporal predictability should boost some potential effects, especially those pertaining to increasing pre-distractor phase consistency.

The behavioral effects are weaker than in the original visual experiment by Bonnefond and Jensen, 2012. Overall, the behavioral finding of lower accuracy for strong distractors suggests an adverse effect of strong acoustic distractors on the representation of memorized information. This notion was supported on a neural level using a time-generalization approach (King and Dehaene, 2014). This encompassed first training of classifiers to decode whether a probe item was part of the memory set or not, with these classifiers being used subsequently as a quantitative proxy for the strength of memorized information. These classifiers were then time-generalized to the period around the distractor presentation in the retention interval. Interestingly, classifier accuracy, especially prior to the strong distractor, was significantly below chance level. Such patterns are not uncommon in M/EEG studies using time- and condition-generalized decoding (see King and Dehaene, 2014) and are also seen in fMRI studies (e.g. van Loon et al., 2018). Descriptively, in electrophysiology, below-chance decoding can arise when the neural patterns underlying representations are opposing and/or temporally shifted. Thus below-chance decoding cannot be interpreted as the absence of condition- or feature-relevant information. However, a functional interpretation is challenging (King and Dehaene, 2014). On the basis that our approach of training classifiers on the post-probe period and time-generalizing them to the retention period only provides a limited access (hence a proxy) to the strength of memorized information, we would hesitate to interpret the results in absolute terms. When contrasting the conditions in relative terms, we find that anticipation of a strong distractor went along with relatively weak memorized information prior to distractor onset.

Interestingly, alpha/beta power prior to the presentation of the strong distractor decreased, whereas it was relatively sustained in the weak distractor condition. This effect is at odds with findings reported in the visual modality using an analogous paradigm (Bonnefond and Jensen, 2012), where, in line with the idea of an inhibitory role for alpha oscillations (Jensen and Mazaheri, 2010; Klimesch et al., 2007), induced alpha/beta enhancements were seen prior to a strong distractor. As in our study, the induced power effect in the Bonnefond and Jensen study was broadband and thus included the beta frequency range (see Figure 2 in Bonnefond and Jensen, 2012), although these authors focused only on the alpha (8–12 Hz) parts in their follow-up analysis. This, in general, is in line with our induced power effect shown in Figure 3, which shows peak effects around 13 Hz and 22 Hz. Our follow-up analysis, however, shows that both alpha and beta powers are relevant in a seemingly differential manner with respect to prioritization of memorized information. In the Bonnefond and Jensen study, pronounced pre-distractor evoked alpha power effects were observed that could not be identified in our auditory task. This negative finding could be modality specific but also fits well with recently reported problems in identifying attentional pre-stimulus alpha-phase adjustment (van Diepen et al., 2015).

An integration of our alpha/beta findings with these lines of evidence appears challenging on the basis of power effects alone, as they could be interpreted either as the involuntary direction of attentional resources to the anticipated strong distractor (a sort of ‘failed inhibition’ within a gating account) or as a top-down driven prioritization of memorized information in anticipation of a strong distractor (see, for example, Hanslmayr et al., 2016 and van Ede, 2018). The group-level effect of reduced decoding accuracy prior to the strong distractor could be seen to support the first interpretation, but this analysis does not relate the pre-distractor alpha/beta power to estimated strength of memorized information. A desirable analysis would be to perform a single-trial analysis within each participant, but obtaining a reliable quantification of strength of memorized information at a single-trial level is challenging (because of low signal-to-noise, for example). As an alternative approach, we sorted trials according to the pre-distractor power in the respective bands and performed a median split. A prioritization account would entail two predictions: 1) that the low-power bin should be associated with relatively enhanced strength of memorized information prior to the distractor onset; and 2) that participants with more extreme power differences between the low- and high-power bins should also show more extreme differences in the strength of memorized information. We observed that for beta, the first prediction was met, whereas for alpha, it was the second prediction. Overall, while desynchronization in both frequency ranges appears to prioritize representations as suggested by other frameworks (Hanslmayr et al., 2016; van Ede, 2018), our results suggest functionally distinct contributions by auditory cortical alpha and beta oscillations. 
Interestingly, the extent to which alpha power was modulated within participants was significantly stronger than that for beta. Further studies will be needed to elaborate whether these distinct neural response patterns are linked to different cognitive processes that may subserve prioritization, such as temporal anticipation or selective attention. Altogether, our study underlines the value of combining conventional spectral analysis approaches with multi-voxel pattern analysis (MVPA) in advancing our understanding of the functional role of brain oscillations in the auditory system.

Our results may seem at odds with findings that frequently point to alpha power enhancements as an adaptive process within challenging listening tasks (for review see Strauß et al., 2014). In line with dominant views regarding the functional relevance of alpha oscillations (Jensen and Mazaheri, 2010; Klimesch et al., 2007), alpha enhancements in such circumstances have been linked to the selective inhibition of irrelevant ‘channels’ of auditory information (Strauß et al., 2014). In terms of tasks, a large proportion of studies illustrating enhanced task-related alpha power are not fully comparable to the present study because they are based on manipulation of selective attention (Fu et al., 2001; Worden et al., 2000) or on the processing of degraded (and thus more challenging) speech (but see alternative studies showing alpha decreases to track increased speech degradation, for example McMahon et al., 2016 and Miles et al., 2017; see also Hauswald et al., 2019). Some studies also reported the functional relevance of alpha enhancements during retention periods of auditory working memory tasks (e.g. Obleser et al., 2012; Wilsch et al., 2015), but these studies did not introduce strong/weak distractors at predictable time points. Importantly, however, the sources of these alpha enhancement effects have most frequently been identified in non-auditory brain regions (e.g. Obleser and Weisz, 2012; Wilsch et al., 2015) and rarely in the auditory cortex (e.g. Müller and Weisz, 2012). Indeed, reduction of auditory cortical alpha activity has been commonly linked to attended (Frey et al., 2014) as well as perceived (including illusory) auditory input (e.g. Leske et al., 2014; Müller et al., 2013; Weisz et al., 2007; Billig et al., 2019; for a more general perspective see Lange et al., 2014). The association of alpha modulations to attended/ignored or perceived auditory information has to date been very indirect.

Our study significantly advances this state by showing a relationship between alpha/beta power in the left auditory cortex and strength of memorized information in the retention period. This result supports the interpretations of studies showing alpha power reductions in the auditory modality and a more general assertion that cortical alpha/beta desynchronization during memory tasks represents the content of memorized information (Hanslmayr et al., 2016). Similar to a recent fMRI study using a representational similarity approach (Griffiths et al., 2019), we show that suppression of alpha/beta power itself does not carry the information content but is likely to be a process that enables this to occur. In our study, which used broadband signals in a first step of the decoding analysis, this content-specific information appears to be largely driven by slow (delta/theta) activity, with the temporal fine structure containing relevant information on top of the slower amplitude changes. Overall, our results can be reconciled with those of previous studies focusing on alpha enhancements in auditory tasks, as alpha reductions or enhancements may show engagements of different neural systems in processing relevant or blocking irrelevant information, respectively (van Ede, 2018). The functional versatility of alpha power modulations in listening tasks also serves as a precaution not to equate, for example, alpha power enhancements simplistically to concepts such as listening effort (McGarrigle et al., 2014; Pichora-Fuller et al., 2016).

In summary, precise predictability of the occurrence of an auditory distractor leads to an anticipatory prioritization of memorized information. We show that modulations of alpha/beta oscillations in task-relevant auditory cortical regions could be a relevant process mediating the ‘protection’ of relevant auditory information against interference. In doing so, our study significantly adds to our understanding of the functional role of alpha and beta oscillations in the auditory system. Interestingly, our results suggest a somewhat differential and to date unreported pattern for these frequency ranges in the auditory system: desynchronized beta states in task-relevant auditory regions appear to accompany the prioritization of information, but the extent of prioritization seems more level dependent for alpha. Illustrating a link to prioritization processes is a crucial first step in understanding the functional role of auditory cortical alpha/beta modulations during retention periods. Prospective studies using, for example, experimental neuromodulation techniques are needed to go beyond the correlational level. Future studies will need to test the extent to which the main direction of our findings generalizes to more natural listening situations such as speech, in which the temporal features of the distractor are not predictable with absolute precision.

Materials and methods

Participants

Thirty-three participants were included in the analyses (22 female; age range, 18–46 years; mean age, 26.8 years). Four participants were excluded because of technical issues during the testing or because the data were too noisy. All participants reported normal or corrected-to-normal vision and an absence of hearing problems in daily life. None of them had a history of, or was currently suffering from, a psychological or neurological disorder. Written informed consent was obtained from each participant before the experiment. They received either €10/h reimbursement or credits required for their bachelor studies in psychology. All procedures were approved by the Ethics Committee of the University of Salzburg.

Stimuli and procedure

Participants underwent standard preparation procedures for MEG experiments. Five head-position indicator (HPI) coils were applied (three on the forehead, and one behind each ear). Using a Polhemus FASTRAK digitizer, anatomical landmarks (nasion, left and right pre-auricular points) and HPI coils were recorded, and additionally approximately 300 head-shape points were sampled. To control for eye movements and heart rate, electrodes were applied horizontally and vertically to the eyes (electrooculogram), one electrode was placed on the lower left ribs and one next to the right clavicle (electrocardiogram), as well as one reference electrode on the back. After entering the MEG cabin, a 5 min resting state was recorded, which was not utilized for the present study. The experimental paradigm consisted of a Sternberg task, similar to the one used by Bonnefond and Jensen, 2012, but adapted to the auditory modality. Visual stimuli were displayed with the PROPixx projector (VPixx Technologies Inc) on an opaque screen. Auditory stimuli were delivered using the SOUNDPixx system (VPixx Technologies Inc) through two pneumatic tubes. The stimulus delay introduced by the tubes was measured using a microphone (16.5 ms ± 0.1 ms), and this delay was compensated for in the analysis phase. The experiment was programmed in MATLAB 9.1 (The MathWorks, Natick, Massachusetts, USA) using the open source Psychophysics Toolbox (Kleiner et al., 2007).

During the experiment, participants focused on a fixation point. They listened to a memory set of four consonants spoken by a female voice (see Figure 1A). The interstimulus interval between the consonants was set to 1 s, and a distractor was presented to the participants 2 s after the final (fourth) letter (timepoint 1 s in Figure 1A). Within each experimental block, the distractor was either a consonant spoken by a male voice (strong distractor) or a temporally scrambled consonant (weak distractor). The scrambling of the distractor was achieved using the Matlab-based shufflewins function (Ellis, 2011). This scrambling approach preserves the frequency content of the original voice but makes it unintelligible. One second after the distractor, the probe was presented, spoken by the same female voice as in the memory set. Thereafter, the participants needed to decide via button press whether the probe was part of the memory set or not. The participants were exposed to 12 blocks, six per distractor condition, each containing 24 trials. An intertrial interval from 1.5 to 2.5 s (mean 2.0 s, uniformly distributed) was used. One block had a duration of about 6 min. The sequence of the conditions and the assignment of the buttons were randomized across participants.
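The trial structure described above can be summarized in a short illustrative sketch. It is not the authors' stimulation code (which was written in MATLAB with the Psychophysics Toolbox); the function name `trial_events`, the constant names and the reading of the 1-s interstimulus interval as onset-to-onset are assumptions made only for illustration.

```python
import random

N_BLOCKS = 12           # six blocks per distractor condition
TRIALS_PER_BLOCK = 24
N_ITEMS = 4             # consonants in the memory set
ISI = 1.0               # s between memory items (assumed onset-to-onset)

def trial_events(rng):
    """Return event onsets (s) relative to the first memory item (hypothetical helper)."""
    items = [i * ISI for i in range(N_ITEMS)]
    distractor = items[-1] + 2.0     # distractor 2 s after the final letter
    probe = distractor + 1.0         # probe 1 s after the distractor
    iti = rng.uniform(1.5, 2.5)      # intertrial interval, uniformly distributed
    return {"items": items, "distractor": distractor, "probe": probe, "iti": iti}

total_trials = N_BLOCKS * TRIALS_PER_BLOCK   # trials per participant
events = trial_events(random.Random(0))
```

Under these assumptions each participant completes 288 trials, 144 per distractor condition.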

MEG acquisition and analysis

The brain magnetic signal was recorded (sampling rate, 1 kHz; hardware filters, 0.1–330 Hz) using a whole-head MEG device (Elekta Neuromag Triux, Elekta Oy, Finland) in a standard passive magnetically shielded room (AK3b, Vacuumschmelze, Germany). Signals were captured by 102 magnetometers and 204 orthogonally placed planar gradiometers at 102 different positions. We used a signal space separation algorithm (Taulu et al., 2005) implemented in the Maxfilter program (version 2.2.15) provided by the MEG manufacturer to remove external noise from the MEG signal (mainly 16.6 Hz, that is Austrian train AC power supply frequency, and 50 Hz plus harmonics) and to realign data to a common standard head position (to [0 0 40] mm, -trans default Maxfilter parameter) across different blocks on the basis of the measured head position at the beginning of each block.

First, a high-pass filter at 0.5 Hz (6th order zero-phase Butterworth filter) was applied to the continuous data. Then, continuous data were epoched around the onset of the retention phase using a 3-s pre- and post-stimulus window. For most analyses, the data were downsampled to 256 Hz (100 Hz to speed up decoding analysis, described below). The epoched data were subjected to an independent component analysis (ICA) using the runica algorithm (Delorme and Makeig, 2004). The Maxwell filtering greatly reduced the dimensionality of the data, from the original 306 sensors to usually 55–75 real components, depending on the single data block (Elekta Neuromag MaxFilter User’s Guide, 2012). Therefore, prior to the ICA computing stage, a principal components analysis (PCA) with a fixed number of components (n = 50) was performed in order to ease the convergence of the ICA algorithm (see Demarchi et al., 2019, for example). The ICA components were manually scrutinized to identify eye blinks, eye movements, train artifacts and heartbeat, resulting in approximately two to five components that were removed from the data. The final rank of the data then ranged from 45 to 48. Given this extensive preprocessing, no trials had to be rejected.

In a first step, before time- and condition-generalizing trained classifiers, we applied an LDA classifier to a time window −0.2 s to 0.7 s around the probe presentation to confirm that by using all MEG sensors and all 288 trials we could decode whether a probe was part of the four-item memory set or not (Figure 2A). For this purpose, we employed the standard settings of the MVPA toolbox, that is a five-fold cross-validation scheme (training on 230 trials, testing on 58 trials), stratified, repeated five times, averaging the AUC values across folds and repetitions to obtain a time-course of decoding performance (Treder, 2018). Apart from serving as a sanity check, this analysis also yielded training-time ranges of interest for the subsequent cross-condition (time-generalized) decoding analysis. Note that for this latter analysis, no cross-validation was performed. The classifier weights from the temporal decoding analysis were used to identify areas containing informative activity (see below). For the analysis (King and Dehaene, 2014) in which classifiers were trained on post-probe onset periods and time- and condition-generalized to the retention period separately for the strong and weak distractor conditions, we adopted the following rationale: given that neural activity driven by a memorized probe should share features with the activities elicited by the letters in the memory set, for our purposes, this time-generalization step yields a quantitative proxy for the strength of memorized information. By focusing in particular on the 0.5-s period prior to the presentation of the distractor and a training time period (of 0.4–0.7 s) in which the classifier showed above-chance performance, we could test the extent to which the strength of memorized information was modulated in anticipation of the distracting sound.
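The cross-validation arithmetic and the AUC metric mentioned above can be illustrated with a minimal sketch. This is not the MVPA-Light implementation: `stratified_kfold` and `auc` are hypothetical helpers, and the balanced split of the 288 trials into 144 "in set" and 144 "not in set" probes is an assumption made only for the example.

```python
import random

def stratified_kfold(labels, k, rng):
    """Yield (train_idx, test_idx) pairs that preserve class proportions per fold."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    offset = 0
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(j + offset) % k].append(idx)
        offset += len(idxs)  # rotate the start so leftover trials spread over folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, f in enumerate(folds) if j != i for idx in f)
        yield train, test

def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered pairs."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0] * 144 + [1] * 144          # assumed balanced classes
splits = list(stratified_kfold(labels, 5, random.Random(0)))
```

With 288 trials and five folds, each test fold holds 57 or 58 trials and the corresponding training set 231 or 230, which matches the approximate figures quoted in the text.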

Given the nature of our research question outlined in the introduction, we wanted to analyze pre-distractor alpha power modulations in task-relevant brain regions. For this purpose, covariance-corrected (Haufe et al., 2014) classifier weights were projected to source space using an approach adapted from Marti and Dehaene, 2017. A realistically shaped single-shell head model (Nolte, 2003) was computed by warping a template MNI brain to the participant’s head shape. A grid with 1 cm resolution on the template brain was morphed to fit the individual brain volume and lead fields were computed for each grid point. This information was used along with the covariance matrix of all sensors computed via the entire 30-Hz low-pass filtered epoch to obtain LCMV spatial filters (Van Veen et al., 1997). These beamformer filters were subsequently multiplied with the aforementioned covariance-corrected classifier weights to obtain ‘informative activity’ (Marti and Dehaene, 2017) in source space (taking the absolute value on source level). In order to make these data more interpretable, we implemented a permutation approach converting these time series to z-values and testing them across participants against 0 (see below). Overall, this data-driven approach yielded meaningful neuroanatomical regions that differentiated whether a probe was part of the memory set or not. Given our particular interest in auditory processes, we focused on the lSTG, which was the region providing the most prominent informative activity. For this region, we used the beamformer filters to project the single-trial data onto an lSTG virtual sensor (location of peak effect in auditory cortex across participants) and applied spectral analysis to it. More precisely, we used Fourier transform of Hanning-tapered data applied to a frequency range of 2–30 Hz (in 1 Hz steps) and time shifted between a period of −1.5 s to 0.5 s around onset of the distractor (shifted in steps of 0.025 s). 
The time window for the spectral analysis was adapted to each frequency (four cycles) and the analysis was performed separately for the strong and weak distractor condition.
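The time-frequency grid implied by these settings can be written out as a small sketch. The variable names are illustrative assumptions and the code only reproduces the arithmetic (four cycles per frequency, 1-Hz and 0.025-s steps); it does not perform the Fieldtrip spectral analysis itself.

```python
N_CYCLES = 4                       # window length in cycles per frequency
freqs = list(range(2, 31))         # 2-30 Hz in 1-Hz steps

# Frequency-dependent window length in seconds: four cycles at each frequency
win_len = {f: N_CYCLES / f for f in freqs}

# Window centers from -1.5 s to 0.5 s around distractor onset, in 0.025-s steps
T_START, T_END, T_STEP = -1.5, 0.5, 0.025
n_steps = round((T_END - T_START) / T_STEP)
times = [T_START + i * T_STEP for i in range(n_steps + 1)]
```

The window therefore shrinks from 2 s at 2 Hz to 0.4 s at 10 Hz and 0.2 s at 20 Hz, so temporal resolution is coarser at the low frequencies.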

Data preprocessing and spectral and source analyses were done using the Fieldtrip toolbox (Oostenveld et al., 2011). For the decoding analysis, we used the Matlab-based open-source MVPA-Light toolbox (https://github.com/treder/MVPA-Light; Treder, 2019).

Statistical analysis

The behavioral impact of the distractor types was tested using a paired t-test, comparing accuracies and reaction times. Given the hypothesis that the strong distractor would be detrimental to performance, one-tailed testing was performed. With regard to our trained classifier, decoding accuracy was tested against chance level (AUC = 0.5) between −0.2 s and 0.7 s around probe onset using a t-test. In order to make the source projected classifier weights (‘informative activity’) more interpretable, we generated randomly shuffled trial labels and re-ran the same classifier and source projection approach. This was done 500 times, and the empirically observed values at each time and grid point were z-transformed using the mean and standard deviation from the randomized data. The z-transformed data were tested against 0 across participants using a t-test. Also, the time-generalized decoding analysis and the spectral analysis described above were assessed using a t-test comparing the strong and weak distractor conditions. To control for multiple comparisons, we employed a nonparametric cluster permutation approach as proposed by Maris and Oostenveld, 2007, normally using 5000 randomizations. Finally, the relationship between power modulations and strength of memorized information was tested in a two-fold manner, separately for alpha (~13 Hz) and beta (~21 Hz) oscillations. Both tests were based on the same first step: sorting trials into high- and low-power bins according to power in a 0.6-s pre-distractor time window in our lSTG region of interest. The trained classifier was then applied to a 0.5-s pre-distractor window for low- and high-power trials, and AUC values were averaged over a 0.4–0.7-s post-probe training time period (analogously to the approach shown in Figure 2B). A first test along the predictions shown in Figure 1B addressed the question of whether lower-power trials would go along with increased strength of memorized information. 
Results of the aforementioned analysis were first entered into a repeated measures ANOVA using frequency band (alpha, beta) and power (high, low) as factors. Planned contrasts (high vs low power) were followed up using a paired t-test. A second test along the predictions shown in Figure 1B addressed whether individuals showing more extreme modulations between low- and high-power trials would also show more extreme differences in the strength of memorized information between these bins. To this end, power modulations were operationalized using a log10([high power] / [low power]) ratio and correlated with the respective difference in the strength of memorized information. This analysis was performed separately for alpha and beta.
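The binning and modulation measure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the helper names are hypothetical, the median split is implemented by ranking trials and taking equal halves (dropping the middle trial if the count is odd), and the log10 ratio is taken between the mean power of the two bins.

```python
import math
import statistics

def median_split(power):
    """Return (low_idx, high_idx): trial indices of the lower and upper power halves."""
    order = sorted(range(len(power)), key=power.__getitem__)
    half = len(order) // 2
    return order[:half], order[len(order) - half:]

def power_modulation(power):
    """log10 of mean high-bin power over mean low-bin power (assumed operationalization)."""
    low, high = median_split(power)
    mean_low = statistics.fmean(power[i] for i in low)
    mean_high = statistics.fmean(power[i] for i in high)
    return math.log10(mean_high / mean_low)

def pearson_r(x, y):
    """Pearson correlation across participants between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In use, `power_modulation` would be computed once per participant and frequency band, and `pearson_r` would relate these values to each participant's difference in decoding AUC between the two bins.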

References

  7. Elekta Neuromag (2012) Elekta Neuromag MaxFilter User’s Guide, Software Version 2.2 (NM24057A-A). Elekta Neuromag, Helsinki, Finland.
  20. Johnsrude IS, Rodd JM (2016) Factors That Increase Processing Demands When Listening to Speech. In: G Hickok, SL Small, editors. Neurobiology of Language. Elsevier. pp. 491–502. https://doi.org/10.1016/B978-0-12-407794-2.00040-7
  24. Kleiner M, Brainard D, Pelli D, Ingling A, Murray R, Broussard C (2007) What's new in psychtoolbox-3. Perception 36:1–16.
  53. Treder M (2019) MVPA-Light. GitHub. https://github.com/treder/MVPA-Light

Decision letter

  1. Timothy D Griffiths
    Reviewing Editor; University of Newcastle, United Kingdom
  2. Barbara G Shinn-Cunningham
    Senior Editor; Carnegie Mellon University, United States
  3. Alexander Billig
    Reviewer; University College London, United Kingdom
  4. William Sedley
    Reviewer; Newcastle University, United Kingdom

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The work describes a relationship between α activity and the prioritisation of elements held in working memory that will be of broad interest.

Decision letter after peer review:

Thank you for submitting your work entitled "Auditory cortical α desynchronization prioritizes the representation of memory items during a retention period" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: William Sedley (Reviewer #2).

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work is not currently suitable for publication in eLife. It is eLife policy that if needed revisions would take more than 8 weeks, we reject a manuscript. In your case, that is where we stand: your paper is of potential interest, but requires significant revisions that we believe require more than 2 months to resolve.

Specifically, the reviewers found the work interesting in measuring α changes in auditory areas measured using MEG as a function of the presence of a (temporally predictable) auditory distractor. The authors use a modified Sternberg technique with strong and weak distractors presented in different blocks and a phonological WM task after Bonnefond and Jensen's visual version. The group data show decreases in pre-distractor induced but not phase-locked α power in left temporal cortex and lower decoding accuracy for the target letter before the strong distractor compared to the weak. In contrast, analysis of the individual α changes showed that increased α correlated with decreased decoding accuracy at the time of the distractor, explaining about 1/3 of variance.

We encourage you to consider resubmitting the paper to eLife if you are able to address the issues described below. The most significant points include potential circularity in the analysis, the specificity of the effect with respect to α, the specification of time windows, and the discussion of the basis.

Major comments:

1) The cover letter mentions 'In daily life we are forced to prioritize ongoing sensory information, to be able to deal with the tremendous amount of input we are exposed to.' Does a situation ever occur in daily life when we have to deal with a distractor that is entirely predictable in time?

2) The central message of this paper relates to α, but the data suggests it is more of a β effect (e.g. the difference plot in Figure 3A, the 10-16Hz band used for the correlation analysis). While the authors make a strong case in the Introduction for the effect to arise in the α band, the data disputes this. Yet, the authors continue to focus on α throughout the Discussion.

3) Throughout the analysis, the authors use restricted windows of analysis. Such practices help limit the multiple comparison problem, but strong a priori reasons are required for window selection – reasons that are not supplied in the manuscript. Were these windows selected post hoc (intentionally or not)? This is most troublesome at the start of subsection “Pre-distractor α power modulations of probe-related information”, where the α band was defined as 10-16Hz. This selection is particularly worrying as it does not appear to be either a priori (10-16Hz is really pushing the boundaries of what is considered α) or driven by earlier analysis (the effect in Figure 3A extends from 10-25Hz). Given that this effect is only marginally significant (p = 0.0272), the concern is that these parameters may have been selected to drop the p-value below the "significance" threshold, and would not survive multiple comparison correction should the window be expanded.

4) The reviewers struggled with the rationale of the decoding approach. Subsection “Decoding probe-related information” paragraph one describes beginning by decoding from post-probe activity whether that probe was part of the memory set or not. Is decoding accuracy here intended to act as a proxy for how well the memory set was encoded/retained in memory? If so, shouldn't that accuracy be greater than chance earlier than 300 ms after the probe onset (Figure 2A shows it is at chance until then)? This also seems rather an indirect proxy – presumably the neural activity post-probe also heavily features the identity of the probe itself. Furthermore, the neural activity involved in distinguishing presence vs. absence in the memory set might also reflect differential adaptation of the probe representation (although the timescales are probably long enough for this not to be a consideration in sensory cortices). Perhaps decoding accuracy is not supposed to be a proxy for how well the memory set was encoded/retained in memory, since later (paragraph two and elsewhere) the authors refer to using this trained classifier to look for how well the probe is represented during the retention period. To address this question we might have expected the authors to have trained a classifier on neural representations of specific letters (e.g. using the period after each is presented during the memory set, or when they appear as probes) and test for these representations during the retention period. With such an analysis the expectation might be that in a successful trial (and perhaps more so for a weak distractor), that trial's probe letter would be better neurally represented during retention if it were present in the memory set but not if it were absent. We may have misunderstood the rationale of the analyses, in which case it likely needs to be set out more clearly in the manuscript.

5) A major concern was the potential circularity of the analysis. The same signal was used to get the classifier estimates and the α/β power. If α/β contributes to the classifier then the signal would essentially be correlated with itself. There are ways to get around this issue, for instance by using a band-stop filter (i.e. taking out α/β from the signal) on the data before feeding it into the classifier. The reviewers suggest orthogonalizing their α/β data and classifier data before attempting a correlation.

6) We would like to see the brain distribution of the AUC in Figure 2A as a function of time. We cannot work out what latency the brain data shown correspond to. We agree the data are left lateralised but there seems to be an effect in right operculum and DLPFC.

7) From 2B left and right, the decoding of probe-related information prior to the distractor seems to be at/below chance for strong distractors and below chance for weak distractors (assuming chance is AUC = 50%). On that basis, I'm not sure whether the difference (shown in 2B right) underpinning one of the paper's main conclusions is really interpretable. The authors do not mention or discuss this.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Auditory cortical α desynchronization prioritizes the representation of memory items during a retention period" for further consideration by eLife. Your revised article has been evaluated by Barbara Shinn-Cunningham (Senior Editor) and a Reviewing Editor.

The authors have addressed a number of concerns raised in a previous round, importantly providing a clearer rationale for their approach, and checking the α-specificity of the effects. The additional analysis of memory content decoding by frequency band has also strengthened the paper. There are two remaining issues that need to be addressed before acceptance, as outlined below.

Major Issue 1: Below chance classifier decoding.

1) There does not seem to be any reference to the cross-validation of the classifier in the main text. Before seeing whether the classifier can generalise to the retention window, it would be important to see whether the classifier can generalise across different folds of the probe window. This supplementary analysis would allow more confidence in the validity of the classifier.

2) A lack of detail about the classification approach in the Materials and methods makes it difficult to evaluate the suitability of said classification approach. It would be helpful to note explicitly the number of trials used for training/testing, and the number and nature of features included.

3) The below chance decoding in paragraph two of subsection “Decoding probe-related information” is a major concern, particularly because of the apparent absence of cross-validation during the probe window. It would be important to demonstrate that:

i) This is not due to overfitting to the training dataset. It is unclear from the Materials and methods, but perhaps 306 sensors and 50 timepoints (i.e. 15,300 features) have been used to train on (by my count) 288 trials. Such a large number of features relative to a low number of trials can mean that the classifier cannot generalise to any data set other than the training data set (potentially explaining the below chance decoding).

ii) This is not due to linear dependence between features. Again, it is unclear, but it would appear that raw time/sensor data has been used as features in the classifier. Though MEG has excellent spatial resolution, there is still some co-linearity in the signal between sensors. This can impede the generalisability of the classifier to test data sets.

Both of these points could be addressed by running a PCA on the features prior to classification. PCA will help minimise linear dependencies between features, and if one then takes only the top 100 components, then the number of features will not exceed the number of training trials.

Major Issue 2: Correlation between α power variability and ability to decode memorandum.

The authors have correlated, across subjects, a measure of α power variability with the effect of α power on memorandum decodability. As a previous reviewer points out, this is rather removed from the data and does not clearly support the interpretation that α power and decodability themselves are linked. A trial-by-trial analysis of α power versus memorandum decodability was suggested. This would require some consideration of across-subject differences in the raw measures. So either a correlation could be performed subject by subject, and then the correlation coefficients tested against zero as a group, or a single trial-by-trial regression could be run across subjects with appropriate normalization or inclusion of random subject effects.
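The first option described above (a correlation per subject, with the coefficients then tested against zero at the group level) can be sketched as follows. This is an illustration on simulated data only, not the authors' analysis: the number of trials, the size of the simulated effect and all function names are assumptions, and the 33 subjects are merely chosen to match the 32 degrees of freedom reported elsewhere in this exchange. Coefficients are Fisher z-transformed before the one-sample t-test.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def one_sample_t(values):
    """t statistic (against zero) and degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n), n - 1

rng = random.Random(0)
n_subjects, n_trials = 33, 200  # hypothetical sizes

fisher_z = []
for _ in range(n_subjects):
    baseline = rng.uniform(1.0, 5.0)  # subject-specific power offset
    alpha = [baseline + rng.gauss(0, 1) for _ in range(n_trials)]
    # Simulated assumption: higher alpha power goes with weaker decoding.
    decoding = [-0.3 * (a - baseline) + rng.gauss(0, 1) for a in alpha]
    # Correlating within subject removes the between-subject offset.
    fisher_z.append(math.atanh(pearson_r(alpha, decoding)))

t_stat, dof = one_sample_t(fisher_z)
print(f"group-level t({dof}) = {t_stat:.2f}")
```

Because each coefficient is computed within a subject, differences in raw power between subjects do not enter the group statistic, which is the normalization concern raised above.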

It might be that the trial-wise data are too noisy for such an approach, and that this is a reason for the authors' use of the median split into high-α and low-α trials. However, the appropriate analysis would then seem to be to compare memorandum decodability for low-α versus (minus) high-α trials, testing whether this difference is significantly greater than zero for the group. The correlation shows only that α power variability is associated with the effect of α power on memorandum decodability. This might not be meaningless, but as the scatter plots in 4B show, there is a fairly balanced split of subjects for whom higher α is associated with better decoding and those for whom lower α is associated with better decoding (positive and negative differences on the y axis). The axis labels are not too clear as to the direction of the comparisons. A possible interpretation is that for subjects on the left of the scatter plots with a lower log10 α ratio (nearer 0.7, i.e. nearer 10^0.7 = roughly five times more α power in high- than low-α trials), high-α trials lead to better decoding than low-α trials, whereas for subjects on the right of the scatter plots with a higher log10 α ratio (nearer 0.9, i.e. nearer 10^0.9 = roughly eight times more α power in high- than low-α trials), the reverse is true.
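The two per-subject quantities discussed in this comment, the low-minus-high α difference in decodability and the log10 ratio of α power between the two halves of a median split, can be written down in a few lines. The sketch below uses made-up toy values and a hypothetical helper name; it only illustrates the definitions and the back-transformation of the log10 ratio, which gives 10^0.7 ≈ 5.0 and 10^0.9 ≈ 7.9.

```python
import math
import statistics

def mean_at(indices, values):
    """Mean of values at the given trial indices."""
    return sum(values[i] for i in indices) / len(indices)

def median_split_effect(alpha_power, decoding):
    """Return (low-minus-high decoding difference, log10 high/low power ratio)."""
    cutoff = statistics.median(alpha_power)
    low = [i for i, a in enumerate(alpha_power) if a <= cutoff]
    high = [i for i, a in enumerate(alpha_power) if a > cutoff]
    decoding_diff = mean_at(low, decoding) - mean_at(high, decoding)
    log_ratio = math.log10(mean_at(high, alpha_power) / mean_at(low, alpha_power))
    return decoding_diff, log_ratio

# Toy single-subject data (hypothetical values, one entry per trial).
alpha_power = [1, 2, 3, 4, 10, 20, 30, 40]
decoding = [0.6, 0.6, 0.6, 0.6, 0.4, 0.4, 0.4, 0.4]
decoding_diff, log_ratio = median_split_effect(alpha_power, decoding)
print(f"low - high decoding: {decoding_diff:.2f}, log10 power ratio: {log_ratio:.2f}")

# Back-transforming the axis values mentioned in the comment.
ratio_07, ratio_09 = 10 ** 0.7, 10 ** 0.9
print(f"10^0.7 = {ratio_07:.2f}, 10^0.9 = {ratio_09:.2f}")
```

Collecting `decoding_diff` across subjects and testing it against zero would correspond to the group comparison proposed here, whereas correlating it with `log_ratio` corresponds to the across-subject analysis being questioned.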

In the absence of clarification or a new analysis along the lines suggested, it is not clear that the data support the conclusions of the paper.

https://doi.org/10.7554/eLife.55508.sa1

Author response

Major comments:

1) The cover mentions 'In daily life we are forced to prioritize ongoing sensory information, to be able to deal with the tremendous amount of input we are exposed to.' Does a situation ever occur in daily life when we have to deal with a distractor that is entirely predictable in time?

We do not claim that our task is particularly ecologically valid and agree that situations of perfect temporal predictability are very rare, albeit possible (e.g. ticking of a clock, dripping of a faucet, or listening to an “overlearned” song).

Nevertheless, when planning the experiment we decided that sticking to the original visual version of the experiment by Bonnefond and Jensen would make a lot of sense. In particular, we were hoping that maximising the similarity to their study would boost the chances of finding a pre-distractor increase of phase consistency (as would be seen in the evoked response). As described in the manuscript, we were unable to identify such an effect in the auditory modality, which, however, fits well with general problems in finding attentionally driven α phase effects in (stimulus-free) prestimulus periods (e.g. van Diepen et al., 2015). Overall, this domain appears to be quite controversial (van Diepen et al., 2019) and in the current version we are not placing excessive emphasis on this controversy.

In order to acknowledge this issue we added some additional comments to the Discussion.

“While perfect temporal predictability of a distractor is rare in natural environments (e.g. ticking of a clock, dripping faucet), this was maintained in the present study to assure maximum comparability. In particular, strict temporal predictability should boost some potential effects, especially ones pertaining to increasing pre-distractor phase consistency.”

“Future studies will need to test to what extent the main direction of our findings generalize to more natural listening situations such as speech in which temporal features of the distractor are not predictable with absolute precision.”

2) The central message of this paper relates to α, but the data suggests it is more of a β effect (e.g. the difference plot in Figure 3A, the 10-16Hz band used for the correlation analysis). While the authors make a strong case in the Introduction for the effect to arise in the α band, the data disputes this. Yet, the authors continue to focus on α throughout the Discussion.

The response to this very justified issue should be split into two parts: one pertaining to the seeming arbitrariness of the chosen time and frequency windows, especially for the correlation analysis, and another pertaining to the labels for frequency boundaries.

Regarding the first issue, we apologize that the impression of arbitrariness was elicited. However, the selection of time and frequency windows (following a reasonable restriction of the “search-window”; see Author response image 1) was strictly data-driven according to the spectrotemporal characteristics of the negative cluster resulting from the induced power contrast (shown in Figure 3A of the manuscript). Relevant features are displayed in Author response image 1.

Author response image 1
The image on the left shows the (unthresholded) T-values averaged in the .5 to 1 s period, i.e. the .5 s window preceding the distractor. Separate (negative) peaks can be identified at 13 Hz and 22 Hz. Also when inspecting which frequencies contribute to the negative cluster (middle panel) it is clear that the cluster comprises two peaks at the respective frequencies. Displaying the temporal profiles for these frequencies (right panel) it can be seen that peaks are reached at ~.7 s, i.e. ~.3 s prior to the presentation of the distractor. Given these follow-up inspections of the cluster, in the original manuscript we focused on the lower-frequency contribution, i.e. multitaper FFT centred at 13 Hz and .7 s using reasonable parameters to estimate power.

We apologize for this missing detail and have added clarifications:

“To address the functional relevance of pre-distractor α power modulations in the left STG in greater detail, trials were sorted according to α power (13 +/- 3 Hz) in this region in a 400-1000 ms time period following the onset of the retention period (i.e. a 600 ms pre-distractor window centered on peak latency effect at 700 ms).”

It may be argued whether effects at 13 Hz, i.e. not strictly falling into the canonical Berger band, can be labeled as “α” or not. But the reviewers are definitely right in noting that we ignored the higher frequency (~22 Hz) part in the description as well as in the follow-up analysis. In fact, upon re-inspecting the screenshot, it appears possible that the “true” effect is broadband but that noise at 16.6 Hz (stemming from the railway lines in the vicinity of the lab; perhaps not fully removed despite using maxfilter to clean the data as well as analysing data in source space) obscures it. This, however, cannot be resolved conclusively. In general, a broadband effect is in line with more current developments that see α and β as describing similar processes: e.g. a recent paper in eLife (Griffiths et al., 2019) simply used the shorthand “α / β” label. Also, upon reinspecting the study by Bonnefond and Jensen, which served as a model for ours, it is quite clear from their crucial Figure 2 that their pre-distractor effects are also rather broadband, ranging from ~8-20 Hz. It is just that the authors decided to restrict their follow-up analysis to the canonical Berger band (8-12 Hz; as indicated by the box). We raise awareness of the broadband nature of the induced power effect in Figure 3 at different points, e.g.:

“This impression is supported by a nonparametric permutation test (p_cluster = .0104), yielding a significant difference in this period with peak differences ~12-13 Hz and ~21-22 Hz (Figure 3A, right panel) comparable to the study in the visual domain (Bonnefond and Jensen, 2012).”

“Furthermore, given the broader spectral distribution of the power effect (Figure 3A), we reran the entire analysis described in this section using β power (centred at peak effect at 22 +/- 3 Hz) to sort trials. No negative cluster was obtained when decoding memorized information (one positive cluster at p = .67), meaning that at no time point did a correlation coefficient become statistically significant even at an uncorrected level (r’s at .895 s and 1.304 s = .06 and .19 respectively).”

“As in our study, the induced power effect in the Bonnefond and Jensen study was broadband including the β frequency range (see Figure 2 in (Bonnefond and Jensen, 2012)), however the authors only focused on the α (8-12 Hz) parts in their follow-up analysis. This in general is in line with our induced power effect shown in Figure 3, which shows peak effects around 13 Hz and 22 Hz. Our follow-up analyses however show that only the ~13 Hz (“α”) part is relevant with respect to prioritization of memorized information.”

Despite the induced power effect being broadband, however, it appears that functionally, i.e. with respect to our inhibition vs prioritization question, not all frequency parts are equal within this band. In particular, we reran the interindividual correlation analysis and now show that the prioritization effect seen is specific to the ~13 Hz part. Given this important follow-up analysis inspired by the reviews and the new accompanying explanations in the text, we hope that you are fine with maintaining the “α” label for the ~13 Hz effect.

3) Throughout the analysis, the authors use restricted windows of analysis. Such practices help limit the multiple comparison problem, but strong a priori reasons are required for window selection – reasons that are not supplied in the manuscript. Were these windows selected post-hoc (intentionally or not)? This is most troublesome at the start of subsection “Pre-distractor α power modulations of probe-related information”, where the α band was defined as 10-16Hz. This selection is particularly worrying as it does not appear to be either a priori (10-16Hz is really pushing the boundaries of what is considered α) or driven by earlier analysis (the effect in Figure 3A extends from 10-25Hz). Given that this effect is only marginally significant (p = 0.0272), the concern is that these parameters may have been selected to drop the p-value below the "significance" threshold, and would not survive multiple comparison correction should the window be expanded.

Again we apologize that the choices appear arbitrary, even though they are strictly literature- and data-driven. The time and frequency windows used in our study almost perfectly overlap with those used in the Bonnefond and Jensen study (5-30 Hz with focus of statistical analysis on .5 s prior to distractor onset; showing main effect spread between 8-20 Hz; see above). We identified the strongest induced power effects to be at ~.7 s and at ~13 and ~22 Hz (see Author response image 1). We admittedly ignored the higher frequency (~22 Hz) part in our original submission, as rightly pointed out by the reviewers. A window of +/- .3 s (centred on .7 s) is quite a standard choice for estimating power via FFT, hence the described .4 to 1 s time window for the correlational analysis between power changes and decoding. In the current version of the manuscript we used the same time window to estimate the correlation effects with the β power changes.

While the time and frequency bands were data-driven from the induced power analysis, we did not average over time for the analysis correlating the power with the decoding changes. Thus the negative clusters are also purely driven by the data (scatter plots provided only for visualization, making sure that effects are “convincing”, i.e. not outlier-driven). It may be argued whether to call a p = .027 “marginally significant” (similar to other variations on “significant” sometimes encountered in the literature), but we think that this pre-distractor effect should be appreciated together with the post-stimulus correlation (p = .001), overall supporting the claims forwarded by this analysis. It is challenging to follow up (and thus not pursued further in the manuscript), but it is plausible that the “true” cluster may actually encompass the pre- and post-stimulus periods, but that any effect is transiently interrupted by the evoked effects influencing power as well as decoding results.

4) The reviewers struggled with the rationale of the decoding approach. Subsection “Decoding probe-related information” paragraph one describes beginning by decoding from post-probe activity whether that probe was part of the memory set or not. Is decoding accuracy here intended to act as a proxy for how well the memory set was encoded/retained in memory? If so, shouldn't that accuracy be greater than chance earlier than 300 ms after the probe onset (Figure 2A shows it is at chance until then)? This also seems rather an indirect proxy – presumably the neural activity post-probe also heavily features the identity of the probe itself. Furthermore, the neural activity involved in distinguishing presence vs. absence in the memory set might also reflect differential adaptation of the probe representation (although the timescales are probably long enough for this not to be a consideration in sensory cortices). Perhaps decoding accuracy is not supposed to be a proxy for how well the memory set was encoded/retained in memory, since later (paragraph two and elsewhere) the authors refer to using this trained classifier to look for how well the probe is represented during the retention period. To address this question we might have expected the authors to have trained a classifier on neural representations of specific letters (e.g. using the period after each is presented during the memory set, or when they appear as probes) and test for these representations during the retention period. With such an analysis the expectation might be that in a successful trial (and perhaps more so for a weak distractor), that trial's probe letter would be better neurally represented during retention if it were present in the memory set but not if it were absent. We may have misunderstood the rationale of the analyses, in which case it likely needs to be set out more clearly in the manuscript.

Thank you for making it very clear to us that the entire rationale remains obscure based on the original manuscript. Next to adding more explanations at strategically chosen points, two measures were undertaken to improve comprehensibility: 1) We changed Figure 1 to include a schematic depiction of the analysis rationale as well as the predictions based on the inhibition vs prioritization account. The (not very informative) bar plots of the behavioral effects have been removed (effects only described in text). 2) We tried to streamline the terminology by referring to the time-generalized decoding effects as (strength of) “memorized information”. Indeed, our analysis approach of training the classifier on post-probe periods and time-generalizing into the retention period makes it (as rightly indicated in your comment) a “proxy” of memorized information. Given that a probe belonging to the memory set should share features with the actual memory set, we think that this argumentation is justified, especially when it comes to contrasting the experimental conditions.

Some additional information has been added to the text, hopefully helping to make the approach clearer, e.g.: “The rationale for this approach (see also Figure 1) was that if a probe was part of the memory set, it should share neural patterns with those elicited by the actual memory set. This should not be the case for probes that were not part of the memory set, so that time-generalizing these post-probe patterns to the retention period should give a quantitative proxy for the fidelity of the memory representation.”
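The logic of training on post-probe activity and generalizing to the retention period can be illustrated with a toy simulation. The sketch below is not the authors' pipeline: it replaces the LDA classifier with a plain difference of class means (equivalent to LDA only under an identity covariance), uses ten simulated sensors, and assumes a memory-related sensor pattern whose strength ramps up across four retention time points. All names, sizes and gains are hypothetical.

```python
import random

def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos_scores for q in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def simulate(n_trials, gain, rng, n_sensors=10):
    """Trials = fixed sensor pattern scaled by gain, plus unit Gaussian noise."""
    pattern = [1.0 if s % 2 == 0 else -1.0 for s in range(n_sensors)]
    return [[gain * p + rng.gauss(0, 1) for p in pattern] for _ in range(n_trials)]

def class_mean(trials):
    """Average sensor pattern across trials."""
    return [sum(column) / len(trials) for column in zip(*trials)]

def project(weights, trial):
    """Classifier decision value for one trial."""
    return sum(w * x for w, x in zip(weights, trial))

rng = random.Random(1)

# Training data: post-probe activity for probes that were / were not memorized.
member_probe = simulate(100, 1.0, rng)
other_probe = simulate(100, 0.0, rng)
weights = [m - o for m, o in zip(class_mean(member_probe), class_mean(other_probe))]

# Test data: retention-period activity, pattern strength ramping up over time.
retention_gains = [0.0, 0.0, 0.5, 1.0]
aucs = []
for gain in retention_gains:
    member_ret = simulate(100, gain, rng)
    other_ret = simulate(100, 0.0, rng)
    aucs.append(auc([project(weights, t) for t in member_ret],
                    [project(weights, t) for t in other_ret]))

print([round(a, 2) for a in aucs])
```

In this toy setting the AUC stays near chance while the pattern is absent and rises as it emerges, which is the sense in which the generalized AUC serves as a proxy for the strength of memorized information.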

Regarding the latency of the probe-locked effects, our results show that early latencies do not contain sufficient information on whether an item was part of the memory set or not, meaning that at a low (e.g. early sensory) level similar processes are involved. While we did not set out to identify such late effects (i.e. they are again data-driven), this finding fits actually very nicely with other reports. For example EEG studies on old vs new memory effects consistently report latencies beyond 300 ms.

We added the following sentence: “The time course of this effect is very much in line with evoked response studies on old vs new effects in short term memory (Danker et al., 2008; Kayser et al., 2003), indicating that early sensory activation is not informative on whether a probe was a memorized item or not.”

5) A major concern was the potential circularity of the analysis. The same signal was used to get the classifier estimates and the α/β power. If α/β contributes to the classifier then the signal would essentially be correlated with itself. There are ways to get around this issue, for instance by using a band-stop filter (i.e. taking out α/β from the signal) on the data before feeding it into the classifier. The reviewers suggest orthogonalizing their α/β data and classifier data before attempting a correlation.

This is a very interesting issue that we overlooked in our initial approach, even though the statement that “The same signal was used to get the classifier estimates and the α/β power” is not quite correct: the classifiers were trained post-probe presentation whereas α / β was estimated during the cue-target interval. Time-generalizing the classifier (trained on post-probe neural activity) to the retention period should actually greatly diminish the danger of “circularity”. Nevertheless, it is correct that if the post-probe decoding results are driven mainly by α (power) modulations and strong α differences are observed prior to the distractor, then some ambiguity exists. We followed up on this issue in a dual manner:

Firstly, we performed the post-probe decoding analysis using data filtered in different bands (broadband, theta, α and β) and either keeping or abolishing the temporal fine structure (using the norm of the Hilbert transform). These results are described in detail in a new section (subsection “Memory-related information is mainly carried by low-frequency activity”) and Figure 5, and clearly show that neither α nor β activity contributes significantly to the decoding. Decoding seems to be mainly driven by slower frequencies, with the temporal fine structure adding relevant information on top of the amplitude changes.

Secondly, we repeated the correlation analysis presented in Figure 4 also using pre-distractor β power (~22 Hz) to sort the trials and are able to confirm that the relationship to the decoding of memorized information is specific for α.

Altogether this shows that α power reductions do not carry representational information but, in the sense of a prioritization signal, enable such content-specific patterns to emerge, a conclusion much in line with a recent eLife paper (Griffiths et al., 2019).

We added a statement:

“This result supports the aforementioned interpretations of studies showing α power reductions in the auditory modality and a more general assertion of cortical α desynchronization during memory tasks to support representing the content of memorized information (Hanslmayr et al., 2016). Similar to a recent fMRI study using a representational similarity approach (Griffiths et al., 2019), we show that suppression of α power itself does not carry the information content but likely is an enabling process for this to occur. In our study, in which we used broadband signals in a first step of the decoding analysis, this content-specific information appears to be largely driven by slow (δ / theta) activity with the temporal fine structure containing relevant information on top of the slower amplitude changes.”
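The reviewers' suggestion of removing α/β activity before classification can be illustrated with a crude frequency-domain band-stop. The sketch below zeroes Fourier bins between 8 and 30 Hz in a synthetic signal; it is not the band-wise filtering actually used for Figure 5, a real analysis would normally use a properly designed FIR or IIR filter, and the sampling rate, band edges and function names are assumptions made for illustration.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns the real part of the reconstructed signal."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def fft_band_stop(x, fs, f_lo, f_hi):
    """Zero every Fourier bin whose absolute frequency lies in [f_lo, f_hi] Hz."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        freq = min(k, n - k) * fs / n  # bins above n/2 mirror negative frequencies
        if f_lo <= freq <= f_hi:
            spectrum[k] = 0j
    return idft(spectrum)

fs, n = 100, 200  # hypothetical: 2 s of data sampled at 100 Hz
slow = [math.sin(2 * math.pi * 4 * t / fs) for t in range(n)]          # 4 Hz
alpha = [0.8 * math.sin(2 * math.pi * 13 * t / fs) for t in range(n)]  # 13 Hz
mixed = [s + a for s, a in zip(slow, alpha)]

cleaned = fft_band_stop(mixed, fs, f_lo=8, f_hi=30)
max_residual = max(abs(c - s) for c, s in zip(cleaned, slow))
print(f"largest deviation from the slow component: {max_residual:.2e}")
```

Feeding only the cleaned signal to the classifier would ensure that any remaining relationship between decoding and α/β power cannot arise from the classifier having used that band directly.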

6) We would like to see the brain distribution of the AUC in Figure 2A as a function of time. We cannot work out what latency the brain data shown correspond to. We agree the data are left lateralised but there seems to be an effect in right operculum and DLPFC.

We have modified Figure 2A now to also include the temporal evolution of “informative activity” (i.e. source projection of classifier weights; the AUC is captured at the sensor level using all sensors). For the relevant time window it can be clearly seen that strong informative activity emerges early, between 300-500 ms following probe onset, with a dominance in left STG and subsequently spreads to other regions (including parietal, DLPFC and inferior temporal regions). Given its reported importance for phonological short-term memory (e.g. Jacquemot and Scott, 2006) and its status as an auditory processing region, we find that focusing our induced activity analysis on this area can be justified.

7) From 2B left and right, the decoding of probe-related information prior to the distractor seems to be at/below chance for strong distractors and below chance for weak distractors (assuming chance is AUC = 50%). On that basis, I'm not sure whether the difference (shown in 2B right) underpinning one of the paper's main conclusions is really interpretable. The authors do not mention or discuss this.

This is an excellent observation. Indeed it would be intuitive to expect constant activation of memorized information (i.e. reflected in electrophysiological recordings). However, in general we do not think that below-chance decoding invalidates the main conclusions of the paper, as these rely on contrasts, either between weak vs strong distractors (Figure 3) or α power (Figure 4). Training of the classifier was done on post-probe periods, and increases of decoding accuracy would be more pronounced the more similar (putatively content-related) patterns occur in the retention period. If this pattern is not activated, (significantly) below-chance decoding accuracy would be the consequence.

In general it is evident from the time courses in Figure 3B that neural patterns related to memorized information ramp up slowly, being largely absent in periods prior to the distractor (which could appear as below-chance decoding in the result) and becoming stronger towards the anticipated onset of the probe item. Our results effectively show that the process of ramping up neural patterns related to memorized information is delayed when a strong distractor is expected. Activation in a timely “on-demand” fashion seems like an economic functional organization. Alternative forms of maintaining memorized information that do not lead to striking surface-level electrophysiological patterns could be engaged in the pre-distractor period. The possibility of network-level “activity-silent” memory representations has recently been suggested and demonstrated by Stokes and colleagues (Stokes et al., 2015; Wolff et al., 2017).

We now address this issue explicitly:

“In fact, when post-hoc testing the decoding accuracy over the entire aforementioned test- and training-time window, average decoding accuracy prior to the strong distractor was significantly below chance (t(32) = -3.98, p = 3.63e-04) whereas it did not differ for weak distractors (t(32) = -1.73, p = .09). Below-chance decoding may appear surprising but simply means a relative absence of memory-item specific patterns akin to those elicited during relevant periods following probe onset. Since this activation gradually ramps up toward the onset of the probe in both conditions albeit somewhat delayed in the strong distractor condition, it is clear that some representation with regards to the memorized item would also need to be present during the periods of below chance decoding. Recently Stokes et al. (Stokes, 2015; Wolff et al., 2017) described network-level “activity silent” processes encoding working memory content and similar processes could be present early during the retention period in the present task.”

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Major Issue 1: Below chance classifier decoding.

1) There does not seem to be any reference to the cross-validation of the classifier in the main text. Before seeing whether the classifier can generalise to the retention window, it would be important to see whether the classifier can generalise across different folds of the probe window. This supplementary analysis would allow more confidence in the validity of the classifier.

Thanks for pointing out this important issue, which we failed to describe in the previous version of the manuscript. A cross-validation approach was actually already implemented to test whether our (temporal) classifier(s) could decode whether a probe was part of the memory set or not (i.e. AUC values in Figure 2A are the average across the different folds). We have now added the missing details to the text. In essence, for the pure temporal decoding step shown in Figure 2A, a “standard” (i.e. common in the field) 5-fold cross-validation scheme, repeated 5 times, was performed (see e.g. http://www.fieldtriptoolbox.org/tutorial/mvpa_light/#cross-validation).

“In a first step, before time- and condition-generalizing trained classifiers, we applied an LDA classifier to a time window -.2 to .7 s centered on the probe presentation to confirm that by using all MEG sensors and all 288 trials we could decode whether a probe was part of the four-item memory set or not (Figure 2A). For this purpose, we employed the standard settings of the MVPA toolbox, that is a 5-fold cross-validation scheme (training on 230 trials, testing on 58 trials), stratified, repeated 5 times, averaging the AUC values across folds and repetitions to obtain a time-course of decoding performance (Treder, 2018). Apart from serving as sanity check, this analysis also yielded training-time ranges of interest for the subsequent cross-condition (time-generalized) decoding analysis. Note that for this latter analysis no cross-validation was performed anymore. The classifier weights from the temporal decoding analysis were used to identify areas containing informative activity (see below). For the time-generalization analysis (King and Dehaene, 2014), in which classifiers were trained on post-probe onset periods and time- and condition-generalized to the retention period separately for the strong and weak distractor condition, we followed the rationale: Given that neural activity driven by a memorized probe should share features with that elicited by the ones in the memory set, for our purposes this time-generalization step yields a quantitative proxy for the strength of memorized information.”
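The split sizes quoted above follow from partitioning 288 trials into five stratified folds. The sketch below generates such splits in plain Python; it is not the MVPA-Light implementation, and it assumes balanced classes (144 probes from the memory set and 144 not), which is not stated in the text. Function names and the random seed are hypothetical.

```python
import random

def stratified_folds(labels, n_folds, rng):
    """Assign each trial index to one fold, balancing classes across folds."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def repeated_cv_splits(labels, n_folds=5, n_repeats=5, seed=0):
    """Yield (train, test) index lists for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = stratified_folds(labels, n_folds, rng)
        for k in range(n_folds):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, test

labels = [1] * 144 + [0] * 144  # assumption: balanced classes, 288 trials
splits = list(repeated_cv_splits(labels))
sizes = sorted({(len(train), len(test)) for train, test in splits})
print(len(splits), sizes)
```

Under the balanced-class assumption this gives 25 train/test splits; in each repetition four folds contain 230 training and 58 test trials and the fifth contains 232 and 56, in line with the approximate figures given in the quoted passage. The AUC would then be averaged over all 25 splits.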

2) A lack of detail about the classification approach in the Materials and methods makes it difficult to evaluate the suitability of said classification approach. It would be helpful to note explicitly the number of trials used for training/testing, and the number and nature of features included.

We apologize for not having provided these important details in the previous manuscript version. Relevant information about the classifier parameters has now been provided (see response 1).

3) The below chance decoding in paragraph two of subsection “Decoding probe-related information” is a major concern, particularly because of the apparent absence of cross-validation during the probe window. It would be important to demonstrate that:

i) This is not due to overfitting to the training dataset. It is unclear from the Materials and methods, but perhaps 306 sensors and 50 timepoints (i.e. 15,300 features) have been used to train on (by my count) 288 trials. Such a large number of features relative to a low number of trials can mean that the classifier cannot generalise to any data set other than the training data set (potentially explaining the below chance decoding).

ii) This is not due to linear dependence between features. Again, it is unclear, but it would appear that raw time/sensor data has been used as features in the classifier. Though MEG has excellent spatial resolution, there is still some co-linearity in the signal between sensors. This can impede the generalisability of the classifier to test data sets.

Both of these points could be addressed by running a PCA on the features prior to classification. PCA will help minimise linear dependencies between features, and if one then takes only the top 100 components, then the number of features will not exceed the number of training trials.
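The PCA step proposed in this comment could look roughly like the sketch below. It is only an illustration on simulated data: `pca_reduce` is a hypothetical helper, the number of latent sources is arbitrary, and in a cross-validated analysis the projection would have to be estimated on the training folds only to avoid leakage.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project trials x features data onto its top principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
# Simulated data: 288 trials x 306 sensors, with sensors made collinear
# by mixing 60 latent sources (values are not the study's data).
latent = rng.standard_normal((288, 60))
features = latent @ rng.standard_normal((60, 306))
features += 0.1 * rng.standard_normal((288, 306))

reduced = pca_reduce(features, n_components=100)

# Off-diagonal covariance between components should be numerically zero,
# i.e. the linear dependence between features has been removed.
cov = np.cov(reduced, rowvar=False)
max_offdiag = np.abs(cov - np.diag(np.diag(cov))).max()
```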

We would like to thank the Editors for giving us the opportunity to elaborate on this important issue. First of all, we want to point out that for the sanity check in Figure 2A a quite standard cross-validation approach was actually pursued (see response 1); however, we failed to communicate these important details. Nevertheless, “below chance decoding” deserves more attention than we previously dedicated to it in the manuscript. Addressing this issue requires, in a first step, excluding that it is a consequence of flawed analysis. In a second step, it requires some (at least attempted) clarification of what it could mean.

Regarding the potential of flawed analysis: we are confident that overfitting is not a severe issue in this study. There are different reasons for this, which would have been more obvious had we provided the aforementioned requested details in our previous version of the manuscript. 1) Our classifier(s) trained to decode whether a probe was part of a memory set or not (Figure 2A) were tested using a (quite standard) 5-fold cross-validation approach repeated 5 times, which reduces the peril of overfitting (see response 1). 2) A classifier was trained and tested for each time-point separately, which leaves us with 306 features per time-point. The use of Maxfilter for cleaning and repositioning the MEG data effectively reduces the number of independent “components” (i.e. the rank) to 55-75, serving effectively as a first run of (PCA-like) feature selection. Moreover, at the stage of the ICA cleaning, a PCA with n=50 components is performed to facilitate the convergence of the ICA algorithm. This has now been clarified in the text:

“The Maxwell filtering greatly reduces the dimensionality of the data, from the original 306 sensors to usually 55-75 real components, depending on the single data block (Elekta, 2012). Therefore, prior to the ICA computing stage, a PCA with a fixed number of components (n=50) is performed, in order to ease the convergence of the ICA algorithm (see e.g. Demarchi et al., 2019). The ICA components were manually scrutinized to identify eye blinks, eye movements, train artifact and heartbeat, resulting in approximately two to five components that were removed from the data. The final rank of the data then ranged from 45 to 48. Given this extensive preprocessing, no trials had to be rejected.”
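The rank bookkeeping in this passage (306 sensors, a PCA to 50 components, two to five components removed, final rank 45 to 48) can be checked on simulated data. The sketch below makes several assumptions: the data are random, a rank of 64 stands in for Maxwell-filtered data, and a random rotation stands in for the ICA unmixing step, so it only illustrates the linear algebra and not the actual preprocessing pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sensors, n_samples = 306, 2000

# Simulated sensor data whose rank (64) is far below the sensor count,
# standing in for Maxwell-filtered MEG data (rank 55-75 in the text).
data = rng.standard_normal((n_sensors, 64)) @ rng.standard_normal((64, n_samples))

# PCA to a fixed number of components (n=50) via SVD.
u, s, vt = np.linalg.svd(data, full_matrices=False)
n_pca = 50
mixing_pca = u[:, :n_pca] * s[:n_pca]   # sensors x components
sources_pca = vt[:n_pca]                # components x samples

# Assumption: a random rotation stands in for the ICA unmixing step.
rotation, _ = np.linalg.qr(rng.standard_normal((n_pca, n_pca)))
sources = rotation @ sources_pca
mixing = mixing_pca @ rotation.T

# Drop three hypothetical artifact components and back-project to sensors.
keep = np.setdiff1d(np.arange(n_pca), [0, 7, 19])
cleaned = mixing[:, keep] @ sources[keep]

rank_before = np.linalg.matrix_rank(data)
rank_after = np.linalg.matrix_rank(cleaned)
```

Each removed component lowers the rank by one, so removing two to five components from a 50-component decomposition leaves a rank between 48 and 45, in line with the range quoted above.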

“Interestingly, classifier accuracy especially prior to the strong distractor was significantly below chance level. Such patterns are not uncommon in M/EEG studies using time- and condition generalized decoding (see King and Dehaene, 2014) and are also seen in fMRI studies (e.g. van Loon et al., 2018). Descriptively, in electrophysiology below chance decoding can arise when neural patterns underlying representations are opposing and/or temporally shifted. Thus below chance decoding cannot be interpreted as absence of condition- or feature relevant information. However, a functional interpretation is challenging (King and Dehaene, 2014). Based on the fact that our approach training classifiers on the post-probe period and time-generalizing them to the retention period only provides a limited access (hence a proxy) to the strength of memorized information we would hesitate to interpret results in absolute terms. Contrasting the conditions in relative terms, we find that anticipation of a strong distractor went along with relatively weaker memorized information prior to distractor onset.”
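How opposing neural patterns can produce below-chance decoding may be illustrated with a toy simulation. The sketch below uses a difference-of-class-means classifier rather than the LDA used in the study, and all data, the sensor count and the helper names (`auc`, `simulate`) are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    ranks = np.empty(scores.size)
    ranks[np.argsort(scores)] = np.arange(1, scores.size + 1)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def simulate(pattern, n_trials, rng):
    """Trials x sensors data: class 1 carries +pattern, class 0 -pattern."""
    labels = np.repeat([0, 1], n_trials // 2)
    signal = np.outer(2 * labels - 1, pattern)
    return signal + rng.standard_normal(signal.shape), labels

rng = np.random.default_rng(2)
pattern = rng.standard_normal(30)  # hypothetical sensor topography

# Train a simple linear classifier (difference of class means).
x_train, y_train = simulate(pattern, 200, rng)
weights = x_train[y_train == 1].mean(axis=0) - x_train[y_train == 0].mean(axis=0)

# Test on data with the same pattern, and on data where it is inverted.
x_same, y_same = simulate(pattern, 200, rng)
x_flip, y_flip = simulate(-pattern, 200, rng)
auc_same = auc(x_same @ weights, y_same)
auc_flip = auc(x_flip @ weights, y_flip)
```

In this toy case the inverted test set yields an AUC well below 0.5 even though it carries exactly as much class information as the non-inverted one, which is why below-chance decoding cannot be read as an absence of information.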

Major Issue 2: Correlation between α power variability and ability to decode memorandum.

The authors have correlated, across subjects, a measure of α power variability with the effect of α power on memorandum decodability. As a previous reviewer points out, this is rather removed from the data and does not clearly support the interpretation that α power and decodability themselves are linked. A trial-by-trial analysis of α power versus memorandum decodability was suggested. This would require some consideration of across-subject differences in the raw measures. So either a correlation could be performed subject by subject, and then the correlation coefficients tested against zero as a group, or a single trial-by-trial regression could be run across subjects with appropriate normalization or inclusion of random subject effects.
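The first option mentioned in this comment (a correlation per subject, with the coefficients then tested against zero as a group) might be sketched as follows. The data are simulated, the number of subjects is hypothetical, and the Fisher z-transform before the one-sample t-test is a common choice rather than something specified in the comment.

```python
import numpy as np

def group_test_of_correlations(power_by_subject, decoding_by_subject):
    """Correlate power and decoding within each subject, then test r against 0."""
    r_values = np.array([
        np.corrcoef(power, decoding)[0, 1]
        for power, decoding in zip(power_by_subject, decoding_by_subject)
    ])
    z_values = np.arctanh(r_values)  # Fisher z-transform
    n = z_values.size
    t_stat = z_values.mean() / (z_values.std(ddof=1) / np.sqrt(n))
    return r_values, t_stat, n - 1   # t statistic and its degrees of freedom

rng = np.random.default_rng(3)
# Simulated data only: 20 hypothetical subjects x 288 trials with a weak
# negative trial-wise coupling between power and decoding strength.
power = [rng.standard_normal(288) for _ in range(20)]
decoding = [-0.2 * p + rng.standard_normal(288) for p in power]

r_values, t_stat, df = group_test_of_correlations(power, decoding)
```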

It might be that the trial-wise data are too noisy for such an approach, and that this is a reason for the authors' use of the median split into high-α and low-α trials. However, the appropriate analysis would then seem to be to compare memorandum decodability for low-α versus (minus) high-α trials, testing whether this difference is significantly greater than zero for the group. The correlation shows only that α power variability is associated with the effect of α power on memorandum decodability. This might not be meaningless, but as the scatter plots in 4B show, there is a fairly balanced split of subjects for whom higher α is associated with better decoding and those for whom lower α is associated with better decoding (positive and negative differences on the y axis). The axis labels are not too clear as to the direction of the comparisons. A possible interpretation is that for subjects on the left of the scatter plots with a lower log10 α ratio (nearer 0.7, i.e. nearer 10^0.7 ≈ five times more α power in high- than low-α trials), high-α trials lead to better decoding than low-α trials, whereas for subjects on the right of the scatter plots with a higher log10 α ratio (nearer 0.9, i.e. nearer 10^0.9 ≈ eight times more α power in high- than low-α trials), the reverse is true.

In the absence of clarification or a new analysis along the lines suggested, it is not clear that the data support the conclusions of the paper.

This is an insightful comment, and after careful deliberation we agree that the correlation approach alone does not fully support the interpretation that desynchronized states in auditory cortex are related to prioritization of representations in auditory working memory. We also agree that ideally power and classification results would be meaningfully linked at a single-trial level, but this is prone to fail given the noisiness of the data. We thus implemented an analysis along the lines suggested by the reviewer, comparing strength of memorized information (see above) between high and low power in the α and β bands (i.e. matching the peak effects shown in Figure 3A). A prioritization account would predict that lower power (i.e. desynchronized states) should exhibit greater strength of memorized information as compared to high power states (and vice versa for an inhibition account). We also maintained a correlation approach analogous to the previous submission and attempted to explain its rationale in clearer terms: more extreme differences in power between the bins should go along with more extreme differences in strength of memorized information. Across participants, a prioritization account would predict a negative correlation (for the predictions based on the prioritization and inhibition accounts see the updated Figure 1). We decided to restrict all analyses to the pre-distractor period now and also only show the analysis collapsed over the entire .5 s period. Even though this averaged approach leads to somewhat lower effects in terms of magnitude (e.g. the correlation shown in Figure 4B is lower than in the previous submission, in which the relationship was visualised for the greatest statistical effect obtained from the nonparametric cluster analysis), we think that showing the result in this manner leads to a clearer description of the main points.
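The two analyses described in this response (an overall comparison between power bins and an across-participant correlation with the magnitude of the power modulation) can be sketched on simulated data. Everything in the sketch is an assumption made for illustration: the number of participants, the simulated coupling, the helper `median_split_effect`, and the sign convention (high-minus-low power bin), which is chosen so that a prioritization-like pattern gives negative values.

```python
import numpy as np

def median_split_effect(power, strength):
    """Return (log10 high/low power ratio, high-minus-low strength difference)."""
    power, strength = np.asarray(power), np.asarray(strength)
    high = power > np.median(power)
    log_ratio = np.log10(power[high].mean() / power[~high].mean())
    return log_ratio, strength[high].mean() - strength[~high].mean()

rng = np.random.default_rng(4)
n_subjects, n_trials = 25, 288  # participant count is hypothetical

effects = []
for _ in range(n_subjects):
    # Simulated single-trial power (strictly positive) and a decoding-based
    # proxy that is larger when power is low (prioritization-like pattern).
    sigma = rng.uniform(0.3, 1.2)  # how strongly this participant modulates power
    power = rng.lognormal(mean=0.0, sigma=sigma, size=n_trials)
    strength = -0.1 * np.log10(power) + 0.05 * rng.standard_normal(n_trials)
    effects.append(median_split_effect(power, strength))

log_ratio, diff = np.array(effects).T

# Overall effect: one-sample t statistic of the bin difference against zero.
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(n_subjects))
# Modulation effect: correlation of the power ratio with the bin difference.
r_modulation = np.corrcoef(log_ratio, diff)[0, 1]
```

With this simulated coupling both the group-level t statistic and the across-participant correlation come out negative, which is the direction a prioritization account would predict under the chosen sign convention.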

The outcome of this revised analysis, albeit overall still supporting a prioritization account, has yielded some surprising insights that we missed previously. Importantly, it showed that desynchronized (low power) states were linked overall to relatively enhanced strength of memorized information only for β. However, when taking into account the magnitude of power modulation, a negative correlation (matching a prioritization account) was only observed for α. Interestingly, α power was more strongly modulated within a participant than β power. We are not aware of studies showing a similarly differential pattern of auditory cortical α and β oscillations. It will be interesting in future to explore in more detail to what extent these differential neural patterns map onto different aspects of the cognitive processes supporting prioritization of memory representations. In order to account for these altered findings, we decided to adapt the title accordingly.

The changes are reflected in a novel Figure 4 as well as at multiple points in the text, especially in the description of the results:

“A functional relationship between pre-distractor α / β power and strength of memorized information should be reflected in two manners (with different directions predicted according to a prioritization or inhibition account; see Figure 1B): Firstly, strength of memorized information should differ overall between low and high power bins. And secondly stronger (relative) differences between the bins should be reflected in stronger concomitant differences in strength of memorized information. […] Whereas β desynchronization appears to prioritize memorized information in general, the α processes seem to be dependent on individually varying modulations.”

Furthermore some comments are now added to the Discussion:

“The group-level effect of reduced decoding accuracy prior to the strong distractor could be seen as a support for the first interpretation, however this analysis does not relate the pre-distractor α / β power to estimated strength of memorized information. […] Altogether, our study underlines the value of combining conventional spectral analysis approaches with MVPA in advancing our understanding of the functional role of brain oscillations in the auditory system.”

References:

Van Diepen, Rosanne M., John J Foxe, and Ali Mazaheri. “The Functional Role of Alpha-Band Activity in Attentional Processing: The Current Zeitgeist and Future Outlook.” Current Opinion in Psychology 29 (October 2019): 229–38. https://doi.org/10.1016/j.copsyc.2019.03.015.

https://doi.org/10.7554/eLife.55508.sa2

Article and author information

Author details

  1. Nathan Weisz

    Centre for Cognitive Neuroscience and Department of Psychology, Paris-Lodron Universität Salzburg, Salzburg, Austria
    Contribution
    Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    nathan.weisz@sbg.ac.at
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7816-0037
  2. Nadine Gabriele Kraft

    Centre for Cognitive Neuroscience and Department of Psychology, Paris-Lodron Universität Salzburg, Salzburg, Austria
    Contribution
    Data curation, Formal analysis, Investigation, Visualization, Writing - original draft
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2818-2283
  3. Gianpaolo Demarchi

    Centre for Cognitive Neuroscience and Department of Psychology, Paris-Lodron Universität Salzburg, Salzburg, Austria
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    gianpaolo.demarchi@sbg.ac.at
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7597-9298

Funding

The authors declare that there was no funding for this work.

Acknowledgements

We would like to thank Jens Gfroerer-Kötschau and Manfred Seifter for their support during data collection.

Ethics

Human subjects: The study was conducted according to the Declaration of Helsinki (7th revision). Written informed consent was obtained from each participant prior to the experiment. All procedures were approved by the Ethics Committee of the University of Salzburg (EK-GZ:22/2016a).

Senior Editor

  1. Barbara G Shinn-Cunningham, Carnegie Mellon University, United States

Reviewing Editor

  1. Timothy D Griffiths, University of Newcastle, United Kingdom

Reviewers

  1. Alexander Billig, University College London, United Kingdom
  2. William Sedley, Newcastle University, United Kingdom

Publication history

  1. Received: January 27, 2020
  2. Accepted: May 5, 2020
  3. Accepted Manuscript published: May 7, 2020 (version 1)
  4. Version of Record published: May 21, 2020 (version 2)

Copyright

© 2020, Weisz et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

