Neuroscience

Finding structure during incremental speech comprehension

Bingjiang Lyu
William D. Marslen-Wilson
Yuxing Fang
Lorraine K. Tyler

Changping Laboratory, Beijing, 102206, China
Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, CB2 3EB, United Kingdom

https://doi.org/10.7554/eLife.89311.2

Open access

Figures and data

Example spoken sentence stimuli and plausible structured interpretations.
The two target sentences in each set differ only in the transitivity of the first verb (Verb1). Each sentence has two possible structured interpretations before the actual main verb is presented: an active interpretation, where the subject noun (SN) performs the action, and a passive interpretation, where the SN is the recipient of the action. The interpretative preference hinges on the likelihood of the SN acting as an agent or a patient (i.e., its thematic role) in conjunction with the transitivity of Verb1. As the sentence progresses to the prepositional phrase, a combination of higher SN agenthood and greater Verb1 intransitivity (i.e., a higher Active index) generally favors an Active interpretation. Conversely, increased SN patienthood coupled with higher Verb1 transitivity (i.e., a higher Passive index) may lead to a Passive interpretation. Note that while the SN is the same for the two target sentences within the same set, it varies across different sentence sets. All images were generated using Midjourney for illustrative purposes.

Human incremental structural interpretations derived from continuation pre-tests.
(A) An example set of target sentences differing only in the transitivity of Verb1. HiTrans: high transitivity; LoTrans: low transitivity; Det: determiner; SN: subject noun; V1: Verb1; PP1-PP3: prepositional phrase; MV: main verb; END: the last word in the sentence. (B) Probability of a direct object (left) and a prepositional phrase (right) continuation after Verb1. (C) Probability of a main verb in the continuations after Verb1, which indicates an Active interpretation. (D) Correlations between corpus-based lexical constraints and probabilistic interpretations in the two pre-tests (Spearman rank correlation; black dots indicate significance determined by 10,000 permutations, FDR-corrected P < 0.05).
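
A minimal sketch of the significance test described in (D), assuming it is a two-sided permutation test on Spearman's rho (shuffling one variable to build the null distribution) followed by Benjamini-Hochberg FDR correction across all tested correlations. The function names and the add-one p-value correction are illustrative choices, not the authors' code.

```python
import numpy as np
from scipy.stats import spearmanr


def permutation_spearman(x, y, n_perm=10_000, seed=0):
    """Spearman's rho between x and y with a two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rho_obs = spearmanr(x, y)[0]
    # Null distribution: shuffling y breaks any pairing between x and y.
    null = np.array([spearmanr(x, rng.permutation(y))[0] for _ in range(n_perm)])
    # Add-one correction keeps the permutation p-value strictly positive.
    p = (np.sum(np.abs(null) >= abs(rho_obs)) + 1) / (n_perm + 1)
    return rho_obs, p


def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank that meets its threshold
        reject[order[: k + 1]] = True
    return reject
```

In use, one p-value would be computed per (constraint, pre-test) pair and the whole set passed to `fdr_bh` at once, so that the correction spans every correlation shown in the panel.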

Incremental interpretation of sentential structure by BERT.
(A) Context-free dependency parse trees of two plausible structural interpretations. Left: Passive interpretation where V1 is the head of a reduced relative clause. Right: Active interpretation where V1 is the main verb. (B) Incremental input to the BERT structural probing model, with the lightness of dots encoding different positions in the target sentences. Det: determiner, SN: subject noun, V1: Verb1, PP1-PP3: prepositional phrase, MV: main verb, END: the last word in the sentence. (C) The BERT structural probing model is trained to output a parse depth vector representing the parse depths of all the words in the sentence input. The BERT parse depth for a specific word is updated incrementally as the sentence unfolds word-by-word. In this example, the parse depth of “found” increases with the presence of the prepositional phrase, indicating an increased preference for the Passive interpretation [according to the context-free parse depths in (A)]. (D) Incremental interpretation of the dependency between SN and V1 in the model space defined by the parse depths of Det, SN and V1. Upper: each colored circle represents the parse depth vector up to V1 derived at a certain position in the sentence [with the same color scheme as in (B)]. The hollow triangle and circle represent the context-free dependency parse depth vectors for the Passive and Active interpretations in (A). Lower: incremental interpretation of the two target sentence types, represented by the trajectories of median parse depth. (E) Distance from the Passive and Active landmarks in the model space as the sentence unfolds [between each colored circle and the two landmarks in the upper panel of (D)] (two-tailed two-sample t-test; *: P < 0.05, **: P < 0.001; error bars represent SEM).
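
The landmark distances in (E) can be sketched as Euclidean distances in the three-dimensional [Det, SN, V1] parse-depth space. The landmark depths below are hypothetical: they assume parse depth is the number of dependency edges from the root, with V1 as the root in the Active tree and V1 attached under SN (as the head of the reduced relative clause) in the Passive tree. The Euclidean metric is also an assumption.

```python
import numpy as np

# Hypothetical context-free parse depths for [Det, SN, V1], assuming
# depth = number of edges from the root of the dependency tree.
# Active: V1 is the root (0), SN is its subject (1), Det hangs off SN (2).
# Passive: MV is the root; SN is its subject (1); Det and the reduced
# relative-clause head V1 both attach to SN (2).
ACTIVE_LANDMARK = np.array([2.0, 1.0, 0.0])
PASSIVE_LANDMARK = np.array([2.0, 1.0, 2.0])


def landmark_distances(depths):
    """Euclidean distance of each incremental [Det, SN, V1] depth vector
    (rows = successive sentence positions) from the two landmarks."""
    depths = np.asarray(depths, dtype=float)
    d_active = np.linalg.norm(depths - ACTIVE_LANDMARK, axis=1)
    d_passive = np.linalg.norm(depths - PASSIVE_LANDMARK, axis=1)
    return d_active, d_passive


def movement(distances):
    """Change in distance between consecutive positions; negative values
    mean the trajectory moved toward the landmark."""
    return np.diff(distances)
```

For a made-up trajectory in which the V1 depth grows as the prepositional phrase arrives, for example `[[2.1, 1.0, 0.3], [2.0, 1.1, 0.9], [2.0, 1.0, 1.6]]`, the distance to the Passive landmark shrinks at each step while the distance to the Active landmark grows. This is one way to read the dynamic-update (movement) measures derived from the same model space.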

Correlation between incremental BERT structural measures and explanatory variables.
BERT structural measures include (A, B) BERT interpretative mismatch represented by each sentence’s distance from the two landmarks in model space (Figure 3D); (C, D) dynamic updates of BERT interpretative mismatch represented by each sentence’s movement toward the two landmarks; (E, F) overall structural representations captured by the first two principal components (i.e., PC1 and PC2) of BERT parse depth vectors; (G, H) BERT Verb1 (V1) parse depth and its dynamic updates. Explanatory variables include lexical constraints derived from large corpora and the main verb probability derived from the continuation pre-test (Spearman correlation, permutation test, FDR-corrected P < 0.05 with correction across all BERT layers; results shown here are based on layer 14; see Appendix 1-figures 4-6 for the results of all layers and Appendix 1-figure 7 for the dynamic change of Verb1 parse depth). PP1-PP3: prepositional phrase; MV: main verb; END: the last word in the sentence.
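
PC1 and PC2 in (E, F) could be obtained as sketched below: mean-centre the sentences-by-words parse-depth matrix at one sentence position and take its leading principal components via a singular value decomposition. This is a standard PCA sketch under that assumption; any standardization the authors may have applied before PCA is not stated here and is omitted.

```python
import numpy as np


def first_two_pcs(depth_vectors):
    """Project parse-depth vectors (rows = sentences, columns = words) onto
    their first two principal components.

    Returns the (n_sentences, 2) scores and the fraction of variance
    explained by each of the two components."""
    X = np.asarray(depth_vectors, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each word's depth across sentences
    # Rows of vt are the principal axes, ordered by singular value.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:2].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:2]
```

The sign of each component is arbitrary, so the resulting scores would then be correlated with each explanatory variable (for example with `scipy.stats.spearmanr`) and interpreted by magnitude and consistency rather than by sign alone.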

Illustration of the pipeline for ssRSA.
For each pair of sentences, we extract their BERT or corpus-based measures and calculate the dissimilarity between these measures, resulting in a model representational dissimilarity matrix (RDM). Meanwhile, we also extract the neural activity recorded while participants are listening to these sentences and calculate their dissimilarity to create a data RDM. Specifically, we use a spatiotemporal searchlight in EMEG source space, which moves across the brain and captures the neural activity within a 10-mm-radius sphere over a 60-ms sliding time window. By correlating the model RDM with data RDMs from all spatiotemporal searchlights, we can identify whether, and if so, when and where the brain represents the information captured by the model RDM. The ssRSA is conducted in V1, PP1 and MV epochs respectively, with HiTrans and LoTrans sentences combined as one group (i.e., 120 sentences in total).
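
A minimal sketch of the RDM comparison at the core of ssRSA, assuming the activity inside each spatiotemporal searchlight (10-mm sphere, 60-ms window) has already been extracted into an array of shape (n_searchlights, n_sentences, n_features). The extraction step, that array layout, and the distance metrics (Euclidean for the model RDM, correlation distance for the data RDM by default) are assumptions; the second-order comparison uses Spearman's rho.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def model_rdm(features, metric="euclidean"):
    """Condensed model RDM: pairwise dissimilarity of per-sentence measures."""
    feats = np.asarray(features, dtype=float)
    if feats.ndim == 1:
        feats = feats[:, None]  # one scalar measure per sentence
    return pdist(feats, metric=metric)


def searchlight_rsa(neural, model, metric="correlation"):
    """Correlate a model RDM with the data RDM of every searchlight.

    neural: (n_searchlights, n_sentences, n_features); each searchlight's
    features are its vertex-by-time activity, flattened (hypothetical layout).
    Returns one Spearman rho per searchlight."""
    rhos = np.empty(neural.shape[0])
    for i, patterns in enumerate(neural):
        data_rdm = pdist(patterns, metric=metric)  # same pair order as model
        rhos[i] = spearmanr(model, data_rdm)[0]
    return rhos
```

With 120 sentences, each condensed RDM holds 120 × 119 / 2 = 7,140 pairwise dissimilarities. In the study, this comparison would be repeated for every searchlight position and time window in the V1, PP1 and MV epochs, and the resulting rho maps would then enter group-level statistics.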

Neural dynamics underpinning the emerging structure and interpretation of an unfolding sentence.
(A-C) ssRSA results of the BERT parse depth vector up to Verb1 (V1), the preposition (PP1) and the main verb (MV) in epochs separately time-locked to their onsets. (D-F) ssRSA results of the mismatch for the preferred structural interpretation (the BERT layer from which each structural measure was derived is given in parentheses). From top to bottom in each panel: vertex t-mass (each vertex’s summed t-value during its significant period); heatmap of the time-series of ROI peak t-value (the highest t-value in an ROI at each time-point), with a green bar indicating effect onset, and ROI t-mass (each ROI’s summed mean t-value during its significant period); cluster t-mass time-series (summed t-value of all the significant vertices of a cluster at each time-point) [cluster-based permutation test, vertex-wise P < 0.01, cluster-wise P < 0.05 in (A-E); marginal significance in (F) with cluster-wise P = 0.06]. Solid vertical lines indicate the timings of onset, average uniqueness point (UP), and average offset of the word time-locked in the epoch, with grey shades indicating the range of one SD. LH/RH: left/right hemisphere. See Appendix 1-table 2 for full anatomical labels, Appendix 1-figure 8 for Spearman’s rho time-series of ROIs in individual participants, and Appendix 1-figure 9 for the significant results of other BERT layers in the MV epoch.
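
The panel summaries defined in this caption can be computed from a vertices-by-time matrix of group-level t-values and a matching boolean significance mask, as sketched below. The input layout is an assumption, and so is the reading of an ROI's "significant period" as the time points at which any of its vertices is significant.

```python
import numpy as np


def vertex_t_mass(t, sig):
    """Each vertex's summed t-value over its significant time points.
    t, sig: arrays of shape (n_vertices, n_times); sig is boolean."""
    return np.sum(np.where(sig, t, 0.0), axis=1)


def roi_peak_t(t, roi_labels, roi):
    """Highest t-value among the vertices of one ROI at each time point."""
    return t[roi_labels == roi].max(axis=0)


def roi_t_mass(t, sig, roi_labels, roi):
    """ROI-mean t-value summed over the ROI's significant period, assuming
    that period = time points where any vertex in the ROI is significant."""
    in_roi = roi_labels == roi
    mean_t = t[in_roi].mean(axis=0)
    period = sig[in_roi].any(axis=0)
    return float(mean_t[period].sum())


def cluster_t_mass_series(t, cluster_mask):
    """Summed t-value of a cluster's significant vertices at each time point."""
    return np.sum(np.where(cluster_mask, t, 0.0), axis=0)
```

Because every summary is a sum or maximum over a masked region, effects that are both strong and sustained produce large t-mass values, while brief or weak effects contribute little.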

Neural dynamics updating the incremental structural interpretation.
(A) ssRSA results of BERT Verb1 (V1) parse depth change at the main verb (MV) relative to the parse depth of V1 when it is first encountered. (B) ssRSA results of the updated BERT V1 parse depth when the input sentence reaches the MV. (C) Spatiotemporal overlap between the effects in (A) and (B) (cluster-based permutation test, vertex-wise P < 0.01, cluster-wise P < 0.05). See Appendix 1-figure 14 for Spearman’s rho time-series of ROIs in individual participants.
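
The measure in (A) and the overlap in (C) reduce to simple array operations, sketched below under two assumptions: per-sentence V1 parse depths are available at both V1 and MV, and each effect is summarized as a boolean vertices-by-time significance mask sampled at known times in ms. All names are hypothetical.

```python
import numpy as np


def v1_depth_update(depth_at_v1, depth_at_mv):
    """Change in Verb1 parse depth at the main verb relative to its depth when
    Verb1 was first encountered (one value per sentence)."""
    return np.asarray(depth_at_mv, dtype=float) - np.asarray(depth_at_v1, dtype=float)


def spatiotemporal_overlap(sig_a, sig_b, times):
    """Vertex-time points significant in both effects.

    sig_a, sig_b: boolean arrays (n_vertices, n_times); times: (n_times,) in ms.
    Returns the joint mask, the overlapping vertex indices and the time span
    (first, last) of the overlap, or None if the effects never overlap."""
    both = np.logical_and(sig_a, sig_b)
    vertices = np.nonzero(both.any(axis=1))[0]
    t_idx = np.nonzero(both.any(axis=0))[0]
    span = (times[t_idx[0]], times[t_idx[-1]]) if t_idx.size else None
    return both, vertices, span
```

A positive update means V1 sits deeper in the parse at the MV than when first heard, which, in the caption's terms, signals a shift toward the Passive interpretation.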

Neural dynamics of multifaceted probabilistic constraints underpinning incremental structural interpretations.
(A, B) ssRSA results of SN agenthood and SN patienthood (i.e., plausibility of SN being the agent or the patient of V1) in PP1 and MV epochs separately. (C) ssRSA results of the non-directional index (i.e., interpretative coherence between SN and V1 regardless of the structure preferred) in the MV epoch. (D) ssRSA results of the Passive index (i.e., interpretative coherence for the Passive interpretation) in the MV epoch. (E) Influence of the Passive interpretative coherence on the emerging sentential structure in the MV epoch, revealed by Granger causal analysis (GCA) based on the non-negative matrix factorization (NMF) components of whole-brain ssRSA results (see Appendix 1-figure 16 for more details) [(A-D) cluster-based permutation test, vertex-wise P < 0.01, cluster-wise P < 0.05; (E) permutation test, FDR-corrected P < 0.05]. See Appendix 1-figure 15 for Spearman’s rho time-series of ROIs in individual participants.
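
The GCA in (E) operates on time courses of NMF components; these could be obtained, for example, with `sklearn.decomposition.NMF`, but here they are assumed to be precomputed. The sketch below is a textbook bivariate Granger F-test fitted by ordinary least squares: it asks whether lags of one component's time course improve prediction of another's beyond that component's own lags. The lag order and the parametric F-test are assumptions; the caption reports a permutation test with FDR correction instead, which this sketch does not reproduce.

```python
import numpy as np
from scipy.stats import f as f_dist


def _lagged(series, order):
    """Columns series[t-1], ..., series[t-order], aligned with series[order:]."""
    n = len(series)
    return np.column_stack([series[order - k : n - k] for k in range(1, order + 1)])


def granger_f_test(x, y, order=2):
    """Test whether the past of x helps predict y beyond y's own past.

    Compares a restricted AR(order) model of y with an unrestricted model that
    also includes `order` lags of x; returns the F statistic and p-value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[order:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, _lagged(y, order)])
    full = np.hstack([restricted, _lagged(x, order)])

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return resid @ resid

    rss_r, rss_f = rss(restricted), rss(full)
    df_num = order
    df_den = len(target) - full.shape[1]
    f_stat = ((rss_r - rss_f) / df_num) / (rss_f / df_den)
    return f_stat, f_dist.sf(f_stat, df_num, df_den)
```

Running the test in both directions (x to y and y to x) is what allows a directional claim such as the influence of Passive coherence on the emerging structure; a one-way test alone cannot rule out the reverse influence.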
