Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Editors
- Reviewing Editor: Jonathan Peelle, Northeastern University, Boston, United States of America
- Senior Editor: Barbara Shinn-Cunningham, Carnegie Mellon University, Pittsburgh, United States of America
Reviewer #1 (Public Review):
Summary:
The manuscript describes a study in which younger, normal-hearing adults listened to two concurrent speech streams (audio-visual presentation) while magnetoencephalography (MEG) was recorded. They were asked to attend to one and ignore the other speech stream. Speech materials were processed using natural language processing (NLP) model approaches to categorize speech chunks of about 3.5 s duration as being of either high or low probability based on topic modeling. MEG results show that decoding performance (reconstruction of speech) was high for the high-probability speech chunks under both the attend and ignore conditions, suggesting that semantic information in the unattended speech was still processed. The conclusions of this paper are mostly well supported by the data.
Strengths:
The authors use sophisticated analyses based on natural language processing models that go beyond the state of the art to make inferences about semantic speech processing in the brain. The analytic methods are well described, which should enable readers to implement the approach in their own analyses.
The study shows that highly salient semantic information of speech is processed in the brain even when a listener attends to something different. The work has implications for selective attention models that are concerned with how individuals process speech.
Weaknesses:
The title of the manuscript may be a bit misleading: "Get the gist of the story: Neural map of topic keywords in multi-speaker environment". The study was not about the gist of the story but about the gist of speech chunks of about 3.5 s. The study shows important evidence that neural activity is sensitive to the gist of short speech segments, even in unattended speech, but the gist of the story is a yet more abstract level that cannot be reduced to the gist of short speech chunks.
The calculations of t-values for the spatial maps showing significant clusters were non-standard, which makes interpreting the magnitude of the t-values difficult. Better motivation for why the specific approach was chosen would be important, or perhaps replacing it with a more standard approach. It further appears that the region of interest analyses were carried out without multiple comparison corrections, possibly suggesting a note of caution about some of the source-localization results.
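To illustrate the concern about uncorrected region-of-interest tests, the following minimal sketch applies a Holm-Bonferroni correction to a set of ROI p-values. The ROI names and p-values are hypothetical and are not taken from the manuscript, and Holm's procedure is only one of several corrections the authors could reasonably choose.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return booleans marking which nulls are rejected under Holm's step-down rule."""
    m = len(p_values)
    # Visit the tests from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as more hypotheses are rejected.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Hypothetical ROI p-values, for illustration only.
roi_p = {"ROI_A": 0.004, "ROI_B": 0.03, "ROI_C": 0.02, "ROI_D": 0.20}
names = list(roi_p)
decisions = holm_bonferroni([roi_p[n] for n in names])
uncorrected = [roi_p[n] < 0.05 for n in names]

for name, raw, corrected in zip(names, uncorrected, decisions):
    print(name, "uncorrected:", raw, "Holm-corrected:", corrected)
```

In this made-up example, three ROIs pass an uncorrected threshold of 0.05 but only one survives correction, which is the kind of discrepancy that motivates the note of caution.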
Reviewer #2 (Public Review):
Summary:
This study by Park and Gross investigates the spatiotemporal neural representation of the semantic information most pertinent to the gist of speech materials presented to subjects while magnetoencephalography was recorded. Participants heard and saw naturalistic continuous speech recordings (with the auditory component presented to one ear), while distractor auditory speech was presented to the other ear. Participants were instructed to attend to the speech stream that matched the video of the speaker. The stimuli were semantically parsed to create short segments to which topic probabilities were assigned. These segments were then sorted into high and low topic probabilities for each of the four topics (determined using Latent Dirichlet Allocation (LDA) analysis). The results suggest clear differences in the fidelity of neural encoding of the speech envelope during high-topic-probability segments, which is interpreted as the brain representing key information for a story whether or not that information is explicitly attended to.
Strengths:
The use of LDA analysis makes possible the quantification of whether a particular speech segment is relevant to a particular topic and enables analysis based on this high-temporal resolution of semantic salience. The authors show clear differences between attended and unattended speech conditions, as well as, surprisingly, differences between semantically salient unattended speech and attended, less semantically relevant speech.
Weaknesses:
Though the effect sizes of the results of this study show clear differences between stimulus conditions, clarification of the experimental methods is needed to appreciate their interpretation. Broadly, I would suggest adding a clearer description of the task during data collection, even though it has been published elsewhere.
One key piece of information that is missing is how semantically relevant topics are assigned, so that salient semantic information can be compared between attended and unattended stories. It's unclear to me how results are combined across topics and stories. If a particular speech segment is assigned 4 topic probabilities, that segment has both a high probability of belonging to one topic and a low probability of belonging to another. I understand how this can be used to create the experimental conditions for a single topic, but how are results combined across topics?
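The ambiguity can be made concrete with a small sketch. It assumes that each segment carries four topic probabilities and that "high" and "low" labels are assigned separately per topic by ranking segments. The probabilities and the ranking rule (`n_extreme`) are hypothetical and are not the authors' documented procedure.

```python
# Hypothetical topic probabilities: one row per segment, one column per topic.
segments = [
    [0.70, 0.10, 0.05, 0.15],
    [0.05, 0.60, 0.30, 0.05],
    [0.20, 0.20, 0.50, 0.10],
    [0.10, 0.05, 0.05, 0.80],
    [0.40, 0.30, 0.20, 0.10],
    [0.05, 0.15, 0.70, 0.10],
]

def label_by_topic(segments, n_extreme=2):
    """For each topic, mark the n_extreme lowest and highest segments (assumed rule)."""
    n_topics = len(segments[0])
    labels = [dict() for _ in segments]
    for t in range(n_topics):
        ranked = sorted(range(len(segments)), key=lambda i: segments[i][t])
        for i in ranked[:n_extreme]:
            labels[i][t] = "low"
        for i in ranked[-n_extreme:]:
            labels[i][t] = "high"
    return labels

labels = label_by_topic(segments)

# Segments that are "high" for one topic and "low" for another.
conflicted = [
    i for i, lab in enumerate(labels)
    if "high" in lab.values() and "low" in lab.values()
]
print(labels[0])
print(conflicted)
```

Under this assumed rule, most segments fall into a high condition for one topic and a low condition for another, so pooling across topics would place the same neural data in both conditions unless the combination step is specified.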
I think some discussion of using the encoding and decoding of the speech envelope as a measure of what is semantically relevant is warranted. The fidelity with which the speech envelope is represented has been used as a proxy for how well that speech is attended to, but it is unclear to me whether we should expect to see high-fidelity encoding of the speech envelope outside of the primary and secondary auditory regions of the brain, or how it relates to the semantic information contained in the speech signal.
Additionally, I wonder if it might be more informative to decode the topic labels themselves directly by building a model to predict the topic probabilities from the neural data? This might give a more direct measure of where and when semantically relevant information is represented.