Author response:
Public Reviews:
Reviewer #1 (Public review):
Summary:
Liu and colleagues applied the hidden Markov model on fMRI to show three brain states underlying speech comprehension. Many interesting findings were presented: brain state dynamics were related to various speech and semantic properties, timely expression of brain states (rather than their occurrence probabilities) was correlated with better comprehension, and the estimated brain states were specific to speech comprehension but not at rest or when listening to non-comprehensible speech.
Strengths:
Recently, the HMM has been applied to many fMRI studies, including movie watching and rest. The authors cleverly used the HMM to test the external/linguistic/internal processing theory that was suggested in comprehension literature. I appreciated the way the authors theoretically grounded their hypotheses and reviewed relevant papers that used the HMM on other naturalistic datasets. The manuscript was well written, the analyses were sound, and the results had clear implications.
Weaknesses:
Further details are needed for the experimental procedure, adjustments needed for statistics/analyses, and the interpretation/rationale is needed for the results.
We greatly appreciate the reviewers for the insightful comments and constructive suggestions. Below are the revisions we plan to make:
(1) Experimental Procedure: We will provide a more detailed description of the stimuli and comprehension tests in the revised manuscript. Additionally, we will upload the corresponding audio files and transcriptions as supplementary data to ensure full transparency.
(2) Statistics/Analyses: In response to the reviewer's suggestions, we have reproduced the states' spatial maps using unnormalized activity patterns. For the resting state, we observed a state similar to the baseline state described by Song, Shim, & Rosenberg (2023). However, for the speech comprehension task, all three states showed network activity levels that deviated significantly from zero. Furthermore, we regenerated the null distribution for behavior-brain state correlations using a circular shift approach, and the results remain largely consistent with our previous findings. We have also made other adjustments to the analyses and introduced some additional analyses, as per the reviewer's recommendations. These changes will be incorporated into the revised manuscript.
(3) Interpretation/Rationale: We will expand on the interpretation of the relationship between state occurrence and semantic coherence. Specifically, we will highlight that higher semantic coherence may enable the brain to more effectively accumulate information over time. State #2 appears to be involved in the integration of information over shorter timescales (hundreds of milliseconds), while State #3 is engaged in longer timescales (several seconds).
Reviewer #2 (Public review):
Liu et al. applied hidden Markov models (HMM) to fMRI data from 64 participants listening to audio stories. The authors identified three brain states, characterized by specific patterns of activity and connectivity, that the brain transitions between during story listening. Drawing on a theoretical framework proposed by Berwick et al. (TICS 2023), the authors interpret these states as corresponding to external sensory-motor processing (State 1), lexical processing (State 2), and internal mental representations (State 3). States 1 and 3 were more likely to transition to State 2 than between one another, suggesting that State 2 acts as a transition hub between states. Participants whose brain state trajectories closely matched those of an individual with high comprehension scores tended to have higher comprehension scores themselves, suggesting that optimal transitions between brain states facilitated narrative comprehension.
Overall, the conclusions of the paper are well-supported by the data. Several recent studies (e.g., Song, Shim, and Rosenberg, eLife, 2023) have found that the brain transitions between a small number of states; however, the functional role of these states remains under-explored. An important contribution of this paper is that it relates the expression of brain states to specific features of the stimulus in a manner that is consistent with theoretical predictions.
(1) It is worth noting, however, that the correlation between narrative features and brain state expression (as shown in Figure 3) is relatively low (~0.03). Additionally, it was unclear if the temporal correlation of the brain state expression was considered when generating the null distribution. It would be helpful to clarify whether the brain state expression time courses were circularly shifted when generating the null.
We have regenerated the null distribution by circularly shifting the state time courses. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence.
We notice that in other studies which examined the relationship between brain activity and word embedding features, the group-mean correlation values are similarly low but statistically significant and theoretically meaningful (e.g., Fernandino et al., 2022; Oota et al., 2022). We think these relatively low correlations is primarily due to the high level of noise inherent in neural data. Brain activity fluctuations are shaped by a variety of factors, including task-related cognitive processing, internal thoughts, physiological states, as well as arousal and vigilance. Additionally, the narrative features we measured may account for only a small portion of the cognitive processes occurring during the task. As a result, the variance in narrative features can only explain a limited portion of the overall variance in brain activity fluctuations.
We will update Figure 3 and relevant supplementary figures to reflect the new null distribution generated via circular shift. Furthermore, we will expand the discussion to address why the observed brain-stimuli correlations are relatively small, despite their statistical significance.
(2) A strength of the paper is that the authors repeated the HMM analyses across different tasks (Figure 5) and an independent dataset (Figure S3) and found that the data was consistently best fit by 3 brain states. However, it was not entirely clear to me how well the 3 states identified in these other analyses matched the brain states reported in the main analyses. In particular, the confusion matrices shown in Figure 5 and Figure S3 suggests that that states were confusable across studies (State 2 vs. State 3 in Fig. 5A and S3A, State 1 vs. State 2 in Figure 5B). I don't think this takes away from the main results, but it does call into question the generalizability of the brain states across tasks and populations.
We identified matching states across analyses based on similarity in the activity patterns of the nine networks. For each candidate state identified in other analyses, we calculate the correlation between its network activity pattern and the three predefined states from the main analysis, and set the one it most closely resembled to be its matching state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly.
Each column in the confusion matrix depicts the similarity of each candidate state with the three predefined states. In Figure S3 (analysis for the replication dataset), the highest similarity occurred along the diagonal of the confusion matrix. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from two analyses.
For the comparison of speech comprehension task with the resting and the incomprehensible speech condition, there was some degree of overlap or "confusion." In Figure 5A, there were two candidate states showing the highest similarity to State #2. In this case, we labelled the candidate state with the the strongest similarity as State #2, while the other candidate state is assigned as State #3 based on this ranking of similarity. This strategy was also applied to naming of states for the incomprehensible condition. The observed confusion supports the idea that the tripartite-state space is not an intrinsic, task-free property. To make the labeling clearer in the presentation of results, we will use a prime symbol (e.g., State #3') to indicate cases where such confusion occurred, helping to distinguish these ambiguous matches.
In the revised manuscript, we will give a detailed illustration for how the correspondence of states across analyses were made.
(3) The three states identified in the manuscript correspond rather well to areas with short, medium, and long temporal timescales (see Hasson, Chen & Honey, TiCs, 2015). Given the relationship with behavior, where State 1 responds to acoustic properties, State 2 responds to word-level properties, and State 3 responds to clause-level properties, the authors may want to consider a "single-process" account where the states differ in terms of the temporal window for which one needs to integrate information over, rather than a multi-process account where the states correspond to distinct processes.
The temporal window hypothesis indeed provides a better explanation for our results. Based on the spatial maps and their modulation by speech features, States #1, #2, and #3 seem to correspond to the short, medium, and long processing timescales, respectively. We will update the discussion to reflect this interpretation.
We sincerely appreciate the constructive suggestions from the two anonymous reviewers, which have been highly valuable in improving the quality of the manuscript.